[jira] [Assigned] (IMPALA-6756) Order children of profiles in a canonical way

2021-10-14 Thread Balazs Jeszenszky (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky reassigned IMPALA-6756:
-

Assignee: (was: Balazs Jeszenszky)

> Order children of profiles in a canonical way
> -
>
> Key: IMPALA-6756
> URL: https://issues.apache.org/jira/browse/IMPALA-6756
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Lars Volker
>Priority: Trivial
>  Labels: supportability
>
> IMPALA-6694 addressed an issue where variations in the execution order across 
> multiple backends changed the order of a fragment instance's profile 
> children. Instead of relying on the order in which children are created on 
> the worker nodes and reported to the coordinator, we should introduce a 
> canonical order, e.g. have the non-exec-node children appear before the 
> exec-nodes, and order them alphabetically. [This discussion in the 
> review|https://gerrit.cloudera.org/#/c/9749/3/be/src/util/runtime-profile.cc@358]
>  has some more details.
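The proposed canonical order can be expressed as a simple sort key. The sketch below is illustrative Python, not Impala's C++ implementation; the child names and the is_exec_node flag are assumptions for the example.

```python
# Illustrative sketch of the proposed canonical order: non-exec-node children
# first, then exec nodes, with alphabetical ordering inside each group.
# The child names and the is_exec_node flag are hypothetical, not Impala's
# actual profile API.
def canonical_order(children):
    """children: list of (name, is_exec_node) pairs."""
    # Tuple key: False (non-exec-node) sorts before True, then by name.
    return sorted(children, key=lambda c: (c[1], c[0]))

profile_children = [
    ("HDFS_SCAN_NODE (id=0)", True),
    ("CodeGen", False),
    ("HASH_JOIN_NODE (id=2)", True),
    ("Filter 0", False),
]
# CodeGen and Filter 0 (non-exec-node children) sort before the exec nodes,
# regardless of the order in which the workers reported them.
print(canonical_order(profile_children))
```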



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-3880) Add list of all tables queried to runtime profile

2021-10-14 Thread Balazs Jeszenszky (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky reassigned IMPALA-3880:
-

Assignee: (was: Balazs Jeszenszky)

> Add list of all tables queried to runtime profile
> -
>
> Key: IMPALA-3880
> URL: https://issues.apache.org/jira/browse/IMPALA-3880
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.6.0
>Reporter: Henry Robinson
>Priority: Minor
>  Labels: newbie, ramp-up
>
> We list the tables missing stats in the runtime profile, but it would be 
> useful to see all tables referenced by a query.






[jira] [Assigned] (IMPALA-4331) Allow gaps in error codes in generate_error_codes.py

2021-10-14 Thread Balazs Jeszenszky (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky reassigned IMPALA-4331:
-

Assignee: (was: Balazs Jeszenszky)

> Allow gaps in error codes in generate_error_codes.py
> 
>
> Key: IMPALA-4331
> URL: https://issues.apache.org/jira/browse/IMPALA-4331
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.7.0
>Reporter: Lars Volker
>Priority: Minor
>  Labels: newbie
>
> Currently we don't allow gaps in error code numbers in 
> generate_error_codes.py, which sometimes makes backporting changes more 
> difficult. Since there doesn't seem to be a reason to forbid gaps we should 
> adapt generate_error_codes.py accordingly and allow them.
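The relaxed check might look like the following sketch (hypothetical; the actual data layout in generate_error_codes.py may differ): uniqueness and ordering are still enforced, but gaps in the numbering are tolerated.

```python
# Hedged sketch of the relaxed validation: error code numbers must still be
# unique and increasing, but consecutive numbering is no longer required, so
# a backported change can keep its upstream number.
def validate_error_codes(codes):
    """codes: list of (name, number) tuples."""
    numbers = [num for _, num in codes]
    if len(set(numbers)) != len(numbers):
        raise ValueError("duplicate error code number")
    if numbers != sorted(numbers):
        raise ValueError("error codes must be listed in increasing order")

# A gap between 1 and 3 is now acceptable:
validate_error_codes([("OK", 0), ("GENERAL", 1), ("CANCELLED", 3)])
```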






[jira] [Assigned] (IMPALA-5256) ERROR log files can get very large

2021-10-14 Thread Balazs Jeszenszky (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky reassigned IMPALA-5256:
-

Assignee: (was: Balazs Jeszenszky)

> ERROR log files can get very large
> --
>
> Key: IMPALA-5256
> URL: https://issues.apache.org/jira/browse/IMPALA-5256
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.5.5
>Reporter: Mala Chikka Kempanna
>Priority: Major
>  Labels: supportability
>
> There's a user who's reporting seeing very large ERROR log files (up to 7gb) 
> that don't seem to be obeying the --max_log_size parameter.






[jira] [Assigned] (IMPALA-4366) Partition is created in HMS even though the ALTER TABLE .. ADD PARTITION stmt fails

2021-10-14 Thread Balazs Jeszenszky (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky reassigned IMPALA-4366:
-

Assignee: (was: Balazs Jeszenszky)

> Partition is created in HMS even though the ALTER TABLE .. ADD PARTITION stmt 
> fails
> ---
>
> Key: IMPALA-4366
> URL: https://issues.apache.org/jira/browse/IMPALA-4366
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.7.0
>Reporter: Dimitris Tsirogiannis
>Priority: Major
>  Labels: catalog-server, newbie
>
> In some cases, even though adding a partition using an ALTER TABLE ADD 
> PARTITION statement fails, the partition is still created in HMS. To 
> reproduce the problem:
> {code}
> create table foo (a int) partitioned by (x int);
> alter table foo add partition (x = false); <-- This throws a CatalogException 
> due to the wrong type used.
> alter table foo add partition (x = false); <--- This throws an 
> AlreadyExistsException for partition x=False
> {code}
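The underlying ordering problem can be sketched abstractly. The mock below is not Impala's catalog code: it only illustrates that if the partition value is type-checked before anything is written to the (mock) HMS, a failed statement leaves no orphaned partition behind.

```python
# Hedged mock of the validate-before-write ordering. The bug above is the
# opposite order: the HMS entry was created before validation failed, leaving
# an orphaned partition. All names here are illustrative.
def add_partition(hms, table, col_type, value):
    # Type-check the partition value BEFORE any metastore write.
    # (type() rather than isinstance(), since bool subclasses int in Python.)
    if type(value) is not col_type:
        raise TypeError(
            f"partition value {value!r} does not match column type {col_type.__name__}")
    hms.setdefault(table, set()).add(value)

mock_hms = {}
try:
    add_partition(mock_hms, "foo", int, False)  # x is INT; False is a BOOLEAN
except TypeError:
    pass
assert "foo" not in mock_hms  # no orphaned partition left in the mock HMS
```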






[jira] [Assigned] (IMPALA-6665) Tag CatalogOp logs with query IDs

2021-10-14 Thread Balazs Jeszenszky (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky reassigned IMPALA-6665:
-

Assignee: (was: Balazs Jeszenszky)

> Tag CatalogOp logs with query IDs
> -
>
> Key: IMPALA-6665
> URL: https://issues.apache.org/jira/browse/IMPALA-6665
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.12.0
>Reporter: Bharath Vissapragada
>Priority: Major
>  Labels: supportability
>
> Similar to IMPALA-6664. The idea is to improve catalog server logging by 
> adding query-ID to each of the Catalog server log statements. This helps map 
> Catalog errors to specific queries, which is currently not possible. 
> Raising a separate jira for the Catalog server since fixing it could be a 
> little trickier than in other components: we don't have the query hash 
> readily available in the Catalog context. We need to augment the Catalog RPCs 
> with this data. 
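As an illustration only (Impala's catalog server is C++/Java, and as the description notes, the real fix needs the query ID carried on each catalog RPC), the tagging pattern looks like this with Python's stdlib logging:

```python
# Sketch of tagging every log statement from one operation with its query ID.
# The query ID value and message are hypothetical.
import logging

logging.basicConfig(format="%(levelname)s [query_id=%(query_id)s] %(message)s")
base = logging.getLogger("catalog")

def catalog_logger(query_id):
    # LoggerAdapter injects query_id into every record from this operation,
    # so the format string above can reference it.
    return logging.LoggerAdapter(base, {"query_id": query_id})

log = catalog_logger("d4a1b2c3e5f60708:0")  # hypothetical query ID
log.warning("Failed to load metadata for table foo")
# emits: WARNING [query_id=d4a1b2c3e5f60708:0] Failed to load metadata for table foo
```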






[jira] [Created] (IMPALA-10541) impyla fails to fetchall() in python3

2021-02-23 Thread Balazs Jeszenszky (Jira)
Balazs Jeszenszky created IMPALA-10541:
--

 Summary: impyla fails to fetchall() in python3
 Key: IMPALA-10541
 URL: https://issues.apache.org/jira/browse/IMPALA-10541
 Project: IMPALA
  Issue Type: Bug
  Components: Clients
Reporter: Balazs Jeszenszky


{{select *}} queries fail instantly, with:
{code}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File 
"/home/balazsj/.local/lib/python3.6/site-packages/impala/hiveserver2.py", line 
536, in fetchall
return list(self)
  File 
"/home/balazsj/.local/lib/python3.6/site-packages/impala/hiveserver2.py", line 
582, in __next__
self._buffer = self._last_operation.fetch(self.description,
  File 
"/home/balazsj/.local/lib/python3.6/site-packages/impala/hiveserver2.py", line 
187, in description
schema = self._last_operation.get_result_schema()
  File 
"/home/balazsj/.local/lib/python3.6/site-packages/impala/hiveserver2.py", line 
1295, in get_result_schema
entry = column.typeDesc.types[0].primitiveEntry
AttributeError: 'NoneType' object has no attribute 'types'
{code}
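The AttributeError above comes from a column whose typeDesc is None. A defensive guard on the client side might look like the sketch below; the field names mirror the HiveServer2 Thrift result schema, but this is not impyla's actual code.

```python
# Hedged sketch of a guard against a missing result schema; attribute names
# (typeDesc, types, primitiveEntry, columnName) follow the HiveServer2 Thrift
# definitions, but this function is illustrative, not impyla's fix.
def primitive_type(column):
    """Return the primitive type entry for a result-set column, raising a
    clear error instead of the bare AttributeError seen in the traceback."""
    type_desc = getattr(column, "typeDesc", None)
    if type_desc is None or not type_desc.types:
        raise ValueError(
            f"column {column.columnName!r} has no type metadata; the server "
            "may not have returned a result schema for this operation")
    return type_desc.types[0].primitiveEntry
```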

Selecting specific columns hangs indefinitely. The stack trace is always:
{code}
 File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/impala/hiveserver2.py",
 line 535, in fetchall
return list(self)
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/impala/hiveserver2.py",
 line 583, in __next__
convert_types=self.convert_types)
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/impala/hiveserver2.py",
 line 1242, in fetch
resp = self._rpc('FetchResults', req)
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/impala/hiveserver2.py",
 line 992, in _rpc
response = self._execute(func_name, request)
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/impala/hiveserver2.py",
 line 1009, in _execute
return func(request)
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/thrift.py",
 line 219, in _req
return self._recv(_api)
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/thrift.py",
 line 238, in _recv
result.read(self._iprot)
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/thrift.py",
 line 160, in read
iprot.read_struct(self)
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/protocol/binary.py",
 line 387, in read_struct
return read_struct(self.trans, obj, self.decode_response)
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/protocol/binary.py",
 line 316, in read_struct
read_val(inbuf, f_type, f_container_spec, decode_response))
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/protocol/binary.py",
 line 289, in read_val
read_struct(inbuf, obj, decode_response)
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/protocol/binary.py",
 line 316, in read_struct
read_val(inbuf, f_type, f_container_spec, decode_response))
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/protocol/binary.py",
 line 289, in read_val
read_struct(inbuf, obj, decode_response)
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/protocol/binary.py",
 line 316, in read_struct
read_val(inbuf, f_type, f_container_spec, decode_response))
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/protocol/binary.py",
 line 256, in read_val
result.append(read_val(inbuf, v_type, v_spec, decode_response))
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/protocol/binary.py",
 line 289, in read_val
read_struct(inbuf, obj, decode_response)
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/protocol/binary.py",
 line 316, in read_struct
read_val(inbuf, f_type, f_container_spec, decode_response))
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/protocol/binary.py",
 line 289, in read_val
read_struct(inbuf, obj, decode_response)
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/protocol/binary.py",
 line 316, in read_struct
read_val(inbuf, f_type, f_container_spec, decode_response))
  File 
"/Users/balazsjeszenszky/.pyenv/versions/3.7.3/lib/python3.7/site-packages/thriftpy2/protocol/binary.py",
 line 256, in read_val
result.append(read_val(inbuf, v_type, v_spec, decode_response))
  File 

[jira] [Commented] (IMPALA-9622) Exceeding user disk quota leads to crash

2020-04-23 Thread Balazs Jeszenszky (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17090611#comment-17090611
 ] 

Balazs Jeszenszky commented on IMPALA-9622:
---

Seeing this a few times a week now. I'll try to figure out the underlying 
(HBase?) issue to see if I'm right; it feels like a TIMED_WAITING thread can 
also trigger deadlock detection.

> Exceeding user disk quota leads to crash
> 
>
> Key: IMPALA-9622
> URL: https://issues.apache.org/jira/browse/IMPALA-9622
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Priority: Major
> Attachments: pause_monitor_kill.log, threads_in_deadlock.log
>
>
> Apparently, a user running out of disk quota gets detected by JvmPauseMonitor 
> as a deadlock and the impalad is killed. This is potentially bad since it 
> could lead to a node crashing continuously if the user keeps firing queries 
> without clearing up disk space, although this is the first time I'm seeing 
> this.
> Ideally, the query should fail and cite the quota as a reason.






[jira] [Commented] (IMPALA-9622) Exceeding user disk quota leads to crash

2020-04-08 Thread Balazs Jeszenszky (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078636#comment-17078636
 ] 

Balazs Jeszenszky commented on IMPALA-9622:
---

Glad I'm not the only one then :) I looked at the INFO logs; the only 
out-of-the-ordinary thing is a set of messages like:
{code}
I0407 14:36:43.525135  6027 status.cc:125] IOException: Failed to get result 
within timeout, timeout=6ms
@   0x980e8a
@   0xd372cf
@   0xe827ec
@   0xe84634
@   0xe7f193
@  0x1089d7c
@  0x108aee1
@   0xbd7706
@   0xbda49f
@   0xbc807a
@   0xdaa6ef
@   0xdaaeea
@  0x133383a
@ 0x7fc1847f0aa1
@ 0x7fc18453dbcd
{code}
The queries that failed due to this were run by the user who was over the 
quota. Maybe this timeout triggered the JvmPauseMonitor?

> Exceeding user disk quota leads to crash
> 
>
> Key: IMPALA-9622
> URL: https://issues.apache.org/jira/browse/IMPALA-9622
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Priority: Major
> Attachments: pause_monitor_kill.log, threads_in_deadlock.log
>
>
> Apparently, a user running out of disk quota gets detected by JvmPauseMonitor 
> as a deadlock and the impalad is killed. This is potentially bad since it 
> could lead to a node crashing continuously if the user keeps firing queries 
> without clearing up disk space, although this is the first time I'm seeing 
> this.
> Ideally, the query should fail and cite the quota as a reason.






[jira] [Commented] (IMPALA-9622) Exceeding user disk quota leads to crash

2020-04-08 Thread Balazs Jeszenszky (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17078605#comment-17078605
 ] 

Balazs Jeszenszky commented on IMPALA-9622:
---

Attached.

> Exceeding user disk quota leads to crash
> 
>
> Key: IMPALA-9622
> URL: https://issues.apache.org/jira/browse/IMPALA-9622
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Priority: Major
> Attachments: pause_monitor_kill.log, threads_in_deadlock.log
>
>
> Apparently, a user running out of disk quota gets detected by JvmPauseMonitor 
> as a deadlock and the impalad is killed. This is potentially bad since it 
> could lead to a node crashing continuously if the user keeps firing queries 
> without clearing up disk space, although this is the first time I'm seeing 
> this.
> Ideally, the query should fail and cite the quota as a reason.






[jira] [Updated] (IMPALA-9622) Exceeding user disk quota leads to crash

2020-04-08 Thread Balazs Jeszenszky (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-9622:
--
Attachment: threads_in_deadlock.log

> Exceeding user disk quota leads to crash
> 
>
> Key: IMPALA-9622
> URL: https://issues.apache.org/jira/browse/IMPALA-9622
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Priority: Major
> Attachments: pause_monitor_kill.log, threads_in_deadlock.log
>
>
> Apparently, a user running out of disk quota gets detected by JvmPauseMonitor 
> as a deadlock and the impalad is killed. This is potentially bad since it 
> could lead to a node crashing continuously if the user keeps firing queries 
> without clearing up disk space, although this is the first time I'm seeing 
> this.
> Ideally, the query should fail and cite the quota as a reason.






[jira] [Commented] (IMPALA-9622) Exceeding user disk quota leads to crash

2020-04-08 Thread Balazs Jeszenszky (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077920#comment-17077920
 ] 

Balazs Jeszenszky commented on IMPALA-9622:
---

FWIW, I know it's possible that this is the only way out, i.e. that it's a 
legitimate deadlock, but I'm not sure that's the case here.

> Exceeding user disk quota leads to crash
> 
>
> Key: IMPALA-9622
> URL: https://issues.apache.org/jira/browse/IMPALA-9622
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Priority: Major
> Attachments: pause_monitor_kill.log
>
>
> Apparently, a user running out of disk quota gets detected by JvmPauseMonitor 
> as a deadlock and the impalad is killed. This is potentially bad since it 
> could lead to a node crashing continuously if the user keeps firing queries 
> without clearing up disk space, although this is the first time I'm seeing 
> this.
> Ideally, the query should fail and cite the quota as a reason.






[jira] [Created] (IMPALA-9622) Exceeding user disk quota leads to crash

2020-04-08 Thread Balazs Jeszenszky (Jira)
Balazs Jeszenszky created IMPALA-9622:
-

 Summary: Exceeding user disk quota leads to crash
 Key: IMPALA-9622
 URL: https://issues.apache.org/jira/browse/IMPALA-9622
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 2.12.0
Reporter: Balazs Jeszenszky
 Attachments: pause_monitor_kill.log

Apparently, a user running out of disk quota gets detected by JvmPauseMonitor 
as a deadlock and the impalad is killed. This is potentially bad since it could 
lead to a node crashing continuously if the user keeps firing queries without 
clearing up disk space, although this is the first time I'm seeing this.

Ideally, the query should fail and cite the quota as a reason.
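For context, a JvmPauseMonitor-style check is essentially a sleep-and-measure loop. The sketch below (illustrative thresholds, not Impala's or Hadoop's code) shows why a long stall on a blocked write is indistinguishable from a hung process to such a monitor:

```python
# Hedged sketch of a pause monitor: sleep a fixed interval, then compare the
# observed wall-clock drift against a threshold. A long stall (GC pause, or a
# write blocked on an exhausted disk quota) looks identical to a hang, which
# matches the behaviour described above. Thresholds are illustrative.
import time

def detect_pause(sleep_s=0.01, threshold_s=0.5, stall_s=0.0):
    start = time.monotonic()
    time.sleep(sleep_s + stall_s)      # stall_s simulates a blocked process
    drift = time.monotonic() - start - sleep_s
    return drift > threshold_s         # True would trigger the kill path

assert detect_pause() is False             # normal scheduling jitter
assert detect_pause(stall_s=0.6) is True   # a long stall trips the detector
```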






[jira] [Assigned] (IMPALA-6756) Order children of profiles in a canonical way

2020-02-12 Thread Balazs Jeszenszky (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky reassigned IMPALA-6756:
-

Assignee: Balazs Jeszenszky

> Order children of profiles in a canonical way
> -
>
> Key: IMPALA-6756
> URL: https://issues.apache.org/jira/browse/IMPALA-6756
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Lars Volker
>Assignee: Balazs Jeszenszky
>Priority: Trivial
>  Labels: supportability
>
> IMPALA-6694 addressed an issue where variations in the execution order across 
> multiple backends changed the order of a fragment instance's profile 
> children. Instead of relying on the order in which children are created on 
> the worker nodes and reported to the coordinator, we should introduce a 
> canonical order, e.g. have the non-exec-node children appear before the 
> exec-nodes, and order them alphabetically. [This discussion in the 
> review|https://gerrit.cloudera.org/#/c/9749/3/be/src/util/runtime-profile.cc@358]
>  has some more details.






[jira] [Assigned] (IMPALA-6665) Tag CatalogOp logs with query IDs

2020-02-12 Thread Balazs Jeszenszky (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky reassigned IMPALA-6665:
-

Assignee: Balazs Jeszenszky

> Tag CatalogOp logs with query IDs
> -
>
> Key: IMPALA-6665
> URL: https://issues.apache.org/jira/browse/IMPALA-6665
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.12.0
>Reporter: Bharath Vissapragada
>Assignee: Balazs Jeszenszky
>Priority: Major
>  Labels: supportability
>
> Similar to IMPALA-6664. The idea is to improve catalog server logging by 
> adding query-ID to each of the Catalog server log statements. This helps map 
> Catalog errors to specific queries, which is currently not possible. 
> Raising a separate jira for the Catalog server since fixing it could be a 
> little trickier than in other components: we don't have the query hash 
> readily available in the Catalog context. We need to augment the Catalog RPCs 
> with this data. 






[jira] [Assigned] (IMPALA-5256) ERROR log files can get very large

2020-02-12 Thread Balazs Jeszenszky (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky reassigned IMPALA-5256:
-

Assignee: Balazs Jeszenszky

> ERROR log files can get very large
> --
>
> Key: IMPALA-5256
> URL: https://issues.apache.org/jira/browse/IMPALA-5256
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.5.5
>Reporter: Mala Chikka Kempanna
>Assignee: Balazs Jeszenszky
>Priority: Major
>  Labels: supportability
>
> There's a user who's reporting seeing very large ERROR log files (up to 7gb) 
> that don't seem to be obeying the --max_log_size parameter.






[jira] [Assigned] (IMPALA-3880) Add list of all tables queried to runtime profile

2020-02-12 Thread Balazs Jeszenszky (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky reassigned IMPALA-3880:
-

Assignee: Balazs Jeszenszky

> Add list of all tables queried to runtime profile
> -
>
> Key: IMPALA-3880
> URL: https://issues.apache.org/jira/browse/IMPALA-3880
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.6.0
>Reporter: Henry Robinson
>Assignee: Balazs Jeszenszky
>Priority: Minor
>  Labels: newbie, ramp-up
>
> We list the tables missing stats in the runtime profile, but it would be 
> useful to see all tables referenced by a query.






[jira] [Assigned] (IMPALA-4331) Allow gaps in error codes in generate_error_codes.py

2020-02-12 Thread Balazs Jeszenszky (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky reassigned IMPALA-4331:
-

Assignee: Balazs Jeszenszky

> Allow gaps in error codes in generate_error_codes.py
> 
>
> Key: IMPALA-4331
> URL: https://issues.apache.org/jira/browse/IMPALA-4331
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.7.0
>Reporter: Lars Volker
>Assignee: Balazs Jeszenszky
>Priority: Minor
>  Labels: newbie
>
> Currently we don't allow gaps in error code numbers in 
> generate_error_codes.py, which sometimes makes backporting changes more 
> difficult. Since there doesn't seem to be a reason to forbid gaps we should 
> adapt generate_error_codes.py accordingly and allow them.






[jira] [Assigned] (IMPALA-4366) Partition is created in HMS even though the ALTER TABLE .. ADD PARTITION stmt fails

2020-02-12 Thread Balazs Jeszenszky (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky reassigned IMPALA-4366:
-

Assignee: Balazs Jeszenszky

> Partition is created in HMS even though the ALTER TABLE .. ADD PARTITION stmt 
> fails
> ---
>
> Key: IMPALA-4366
> URL: https://issues.apache.org/jira/browse/IMPALA-4366
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.7.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Balazs Jeszenszky
>Priority: Major
>  Labels: catalog-server, newbie
>
> In some cases, even though adding a partition using an ALTER TABLE ADD 
> PARTITION statement fails, the partition is still created in HMS. To 
> reproduce the problem:
> {code}
> create table foo (a int) partitioned by (x int);
> alter table foo add partition (x = false); <-- This throws a CatalogException 
> due to the wrong type used.
> alter table foo add partition (x = false); <--- This throws an 
> AlreadyExistsException for partition x=False
> {code}






[jira] [Commented] (IMPALA-8985) Error in profile accounting of bytes read

2019-10-10 Thread Balazs Jeszenszky (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948409#comment-16948409
 ] 

Balazs Jeszenszky commented on IMPALA-8985:
---

Saw this on 5.15 as well:
{code}
- BytesRead: 81.0 GiB (86962377329)
- BytesReadDataNodeCache: 0 B (0)
- BytesReadLocal: 70.7 GiB (75937558874)
- BytesReadRemoteUnexpected: 30.7 GiB (32999585610)
- BytesReadShortCircuit: 70.7 GiB (75896212339)
{code}
Both of these profiles were complete, so I think that rules out IMPALA-8294. I 
can provide the complete profiles offline if someone takes this up.

> Error in profile accounting of bytes read
> -
>
> Key: IMPALA-8985
> URL: https://issues.apache.org/jira/browse/IMPALA-8985
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Balazs Jeszenszky
>Priority: Major
>
> I've seen this a few times by now:
> {code}
>- BytesRead: 571.14 MB (598886266)
>- BytesReadDataNodeCache: 0
>- BytesReadLocal: 1.12 GB (1197772532)
>- BytesReadRemoteUnexpected: 0
>- BytesReadShortCircuit: 1.12 GB (1197772532)
> {code}
> Haven't tried reproducing it on purpose, just keeps popping up. Counters like 
> this will get more important in cloud-backed clusters since we'll have a 
> mismatch between our charts and what the user ends up paying for S3-based 
> reads.






[jira] [Created] (IMPALA-8986) Typo in plan row count

2019-09-30 Thread Balazs Jeszenszky (Jira)
Balazs Jeszenszky created IMPALA-8986:
-

 Summary: Typo in plan row count
 Key: IMPALA-8986
 URL: https://issues.apache.org/jira/browse/IMPALA-8986
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.3.0
Reporter: Balazs Jeszenszky
Assignee: Tamas Mate


{code}
00:SCAN HDFS [default.genotype_256, RANDOM]
   HDFS partitions=1/1 files=78 size=19.27GB
   runtime filters: RF000[bloom] -> genotype_256.sampleid
   stored statistics:
 table: rows=8.23G size=19.27GB
 columns: all
   extrapolated-rows=disabled max-scan-range-rows=106.58M
   mem-estimate=88.00MB mem-reservation=16.00MB thread-reservation=0
   tuple-ids=0 row-size=44B cardinality=8.23G
   in pipelines: 00(GETNEXT)
{code}
Exec summary is correct:
{code}
00:SCAN HDFS 32   16s545ms   22s441ms   8.23B   8.23B
{code}






[jira] [Commented] (IMPALA-8985) Error in profile accounting of bytes read

2019-09-30 Thread Balazs Jeszenszky (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16940755#comment-16940755
 ] 

Balazs Jeszenszky commented on IMPALA-8985:
---

Having sub-counters that add up would be useful in many places. The parent 
counter would start/stop when any/none of its children are running, ensuring 
both that the totals add up and that there are no 'hidden' cases like the one 
here (expected remote reads are not explicitly called out, so a difference 
between {{BytesRead}} and the sum of the others is possible even without this 
bug).
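A minimal sketch of the sub-counters-that-add-up idea (not Impala's counter implementation): the children are the only write path, so the parent total equals their sum by construction and no 'hidden' category can appear.

```python
# Illustrative counter group: callers can only write to named children, so
# the parent total is the sum of its children by construction. Counter names
# are hypothetical stand-ins for Impala's BytesRead family.
class CounterGroup:
    def __init__(self, *names):
        self.children = {n: 0 for n in names}

    def add(self, name, n):
        # Unknown names raise KeyError, preventing untracked categories.
        self.children[name] += n

    @property
    def total(self):
        return sum(self.children.values())

bytes_read = CounterGroup("Local", "RemoteExpected", "RemoteUnexpected")
bytes_read.add("Local", 1024)
bytes_read.add("RemoteExpected", 512)
assert bytes_read.total == 1536  # the parent is always the exact sum
```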

> Error in profile accounting of bytes read
> -
>
> Key: IMPALA-8985
> URL: https://issues.apache.org/jira/browse/IMPALA-8985
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.3.0
>Reporter: Balazs Jeszenszky
>Priority: Major
>
> I've seen this a few times by now:
> {code}
>- BytesRead: 571.14 MB (598886266)
>- BytesReadDataNodeCache: 0
>- BytesReadLocal: 1.12 GB (1197772532)
>- BytesReadRemoteUnexpected: 0
>- BytesReadShortCircuit: 1.12 GB (1197772532)
> {code}
> Haven't tried reproducing it on purpose, just keeps popping up. Counters like 
> this will get more important in cloud-backed clusters since we'll have a 
> mismatch between our charts and what the user ends up paying for S3-based 
> reads.






[jira] [Created] (IMPALA-8985) Error in profile accounting of bytes read

2019-09-30 Thread Balazs Jeszenszky (Jira)
Balazs Jeszenszky created IMPALA-8985:
-

 Summary: Error in profile accounting of bytes read
 Key: IMPALA-8985
 URL: https://issues.apache.org/jira/browse/IMPALA-8985
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.3.0
Reporter: Balazs Jeszenszky


I've seen this a few times by now:
{code}
   - BytesRead: 571.14 MB (598886266)
   - BytesReadDataNodeCache: 0
   - BytesReadLocal: 1.12 GB (1197772532)
   - BytesReadRemoteUnexpected: 0
   - BytesReadShortCircuit: 1.12 GB (1197772532)
{code}

Haven't tried reproducing it on purpose, just keeps popping up. Counters like 
this will get more important in cloud-backed clusters since we'll have a 
mismatch between our charts and what the user ends up paying for S3-based reads.






[jira] [Resolved] (IMPALA-8429) Update docs to reflect default join distribution mode change

2019-08-12 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky resolved IMPALA-8429.
---
Resolution: Invalid

> Update docs to reflect default join distribution mode change
> 
>
> Key: IMPALA-8429
> URL: https://issues.apache.org/jira/browse/IMPALA-8429
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: Impala 3.2.0
>Reporter: Balazs Jeszenszky
>Assignee: Alex Rodoni
>Priority: Minor
>
> The 'DEFAULT_JOIN_DISTRIBUTION_MODE Query Option' page needs an update to 
> reflect the changes in IMPALA-5120.






[jira] [Commented] (IMPALA-8429) Update docs to reflect default join distribution mode change

2019-08-12 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16905518#comment-16905518
 ] 

Balazs Jeszenszky commented on IMPALA-8429:
---

Sorry for the delay. This took me a while to figure out again for some reason. 
I agree the docs are correct, I don't think I was aware of IMPALA-5381 at the 
time of submitting this request.

> Update docs to reflect default join distribution mode change
> 
>
> Key: IMPALA-8429
> URL: https://issues.apache.org/jira/browse/IMPALA-8429
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: Impala 3.2.0
>Reporter: Balazs Jeszenszky
>Assignee: Alex Rodoni
>Priority: Minor
>
> The 'DEFAULT_JOIN_DISTRIBUTION_MODE Query Option' page needs an update to 
> reflect the changes in IMPALA-5120.






[jira] [Updated] (IMPALA-8606) GET_TABLES performance in local catalog mode

2019-06-07 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-8606:
--
Description: 
With local catalog mode enabled, GET_TABLES JDBC requests will return more than 
the always available table information. Any request for more metadata about a 
table will trigger a full load of that table on the catalogd side, meaning that 
GET_TABLES triggers the load of the entire catalog. Also, as far as I can see, 
the requests for more metadata are made one table at a time. 

Once the tables are loaded on the catalogd-side, a coordinator needs 3 
roundtrips to the catalog to fetch all the details about a single table. My 
test case had around 57k tables, 1700 DBs, and ~120k partitions. 
GET_TABLES on a cold catalog takes 18 minutes. With a warm catalog, but cold 
impalad, it still takes ~70 seconds.

Many tools use GET_TABLES to populate dropdowns, etc. so this is bad for both 
end user experience and catalog memory usage.

  was:
With local catalog mode enabled, GET_TABLES JDBC requests will return more than 
the always available table information. Any request for more metadata about a 
table will trigger a full load of that table on the catalogd side, meaning that 
GET_TABLES triggers the load of the entire catalog. Also, as far as I can see, 
the requests for more metadata are made one table at a time. 

Once the tables are loaded, the coordinator needs 3 roundtrips to the catalog 
to fetch all the details about a single table. My test case had around 57k 
tables, 1700 DBs, and ~120k partitions. 
GET_TABLES on a cold catalog takes 18 minutes. With a warm catalog, but cold 
impalad, it still takes ~70 seconds.

Many tools use GET_TABLES to populate dropdowns, etc. so this is bad for both 
end user experience and catalog memory usage.


> GET_TABLES performance in local catalog mode
> 
>
> Key: IMPALA-8606
> URL: https://issues.apache.org/jira/browse/IMPALA-8606
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Balazs Jeszenszky
>Assignee: Todd Lipcon
>Priority: Critical
>
> With local catalog mode enabled, GET_TABLES JDBC requests will return more 
> than the always available table information. Any request for more metadata 
> about a table will trigger a full load of that table on the catalogd side, 
> meaning that GET_TABLES triggers the load of the entire catalog. Also, as far 
> as I can see, the requests for more metadata are made one table at a time. 
> Once the tables are loaded on the catalogd-side, a coordinator needs 3 
> roundtrips to the catalog to fetch all the details about a single table. My 
> test case had around 57k tables, 1700 DBs, and ~120k partitions. 
> GET_TABLES on a cold catalog takes 18 minutes. With a warm catalog, but cold 
> impalad, it still takes ~70 seconds.
> Many tools use GET_TABLES to populate dropdowns, etc. so this is bad for both 
> end user experience and catalog memory usage.






[jira] [Commented] (IMPALA-8606) GET_TABLES performance in local catalog mode

2019-06-07 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858453#comment-16858453
 ] 

Balazs Jeszenszky commented on IMPALA-8606:
---

Also, the v1 bugs seem preferable to this one, so reverting to only serving 
full info if the table's already loaded on the catalog would work, too. As long 
as the catalogd is used for reference on what's loaded, and not individual 
coordinators' local cache, I don't think it would be much more noticeable.

> GET_TABLES performance in local catalog mode
> 
>
> Key: IMPALA-8606
> URL: https://issues.apache.org/jira/browse/IMPALA-8606
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Balazs Jeszenszky
>Assignee: Todd Lipcon
>Priority: Critical
>
> With local catalog mode enabled, GET_TABLES JDBC requests will return more 
> than the always available table information. Any request for more metadata 
> about a table will trigger a full load of that table on the catalogd side, 
> meaning that GET_TABLES triggers the load of the entire catalog. Also, as far 
> as I can see, the requests for more metadata are made one table at a time. 
> Once the tables are loaded, the coordinator needs 3 roundtrips to the catalog 
> to fetch all the details about a single table. My test case had around 57k 
> tables, 1700 DBs, and ~120k partitions. 
> GET_TABLES on a cold catalog takes 18 minutes. With a warm catalog, but cold 
> impalad, it still takes ~70 seconds.
> Many tools use GET_TABLES to populate dropdowns, etc. so this is bad for both 
> end user experience and catalog memory usage.






[jira] [Commented] (IMPALA-8606) GET_TABLES performance in local catalog mode

2019-06-07 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16858433#comment-16858433
 ] 

Balazs Jeszenszky commented on IMPALA-8606:
---

A couple of things that sound useful independently, and would IMO add up to a 
nice behaviour in the end:
* more granular metadata loading on catalog side
* eagerly caching HMS metadata except incremental stats on the catalog side
* GET_TABLES could return a partial result set while fetching the rest from 
catalog. Not sure how useful this is to clients.

Currently, IIRC, the coordinator first fetches the table details, then the 
partition list, then info about individually specified partitions. According to 
https://github.com/cloudera/Impala/blob/cdh6.2.0/common/thrift/CatalogService.thrift#L271-L283,
 that could be one RPC. The handling of one of these on the catalog side is 
significantly slower than the rest (400-500ms); I think it was getting the 
partition info, but I'm not sure that's expected. 
We could add something like {{TMultipleTableInfoSelector}} to avoid having to 
request tables serially.
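A sketch of the batching idea, assuming a hypothetical {{fetch_batch}} callable standing in for a batched catalog RPC (like the {{TMultipleTableInfoSelector}} suggestion above; Impala has no such API today):

```python
def fetch_tables_batched(table_names, fetch_batch, batch_size=50):
    """Instead of one catalog round trip per table, group table names and
    issue one batched request per group. `fetch_batch` takes a list of
    names and returns a dict of name -> metadata in a single round trip."""
    results = {}
    for i in range(0, len(table_names), batch_size):
        batch = table_names[i:i + batch_size]
        results.update(fetch_batch(batch))   # one round trip per batch
    return results
```

For the ~57k-table case in the description this turns tens of thousands of serial round trips into a few hundred batched ones, independent of any catalogd-side improvements.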

> GET_TABLES performance in local catalog mode
> 
>
> Key: IMPALA-8606
> URL: https://issues.apache.org/jira/browse/IMPALA-8606
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Balazs Jeszenszky
>Assignee: Todd Lipcon
>Priority: Critical
>
> With local catalog mode enabled, GET_TABLES JDBC requests will return more 
> than the always available table information. Any request for more metadata 
> about a table will trigger a full load of that table on the catalogd side, 
> meaning that GET_TABLES triggers the load of the entire catalog. Also, as far 
> as I can see, the requests for more metadata are made one table at a time. 
> Once the tables are loaded, the coordinator needs 3 roundtrips to the catalog 
> to fetch all the details about a single table. My test case had around 57k 
> tables, 1700 DBs, and ~120k partitions. 
> GET_TABLES on a cold catalog takes 18 minutes. With a warm catalog, but cold 
> impalad, it still takes ~70 seconds.
> Many tools use GET_TABLES to populate dropdowns, etc. so this is bad for both 
> end user experience and catalog memory usage.






[jira] [Created] (IMPALA-8606) GET_TABLES performance in local catalog mode

2019-05-31 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-8606:
-

 Summary: GET_TABLES performance in local catalog mode
 Key: IMPALA-8606
 URL: https://issues.apache.org/jira/browse/IMPALA-8606
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Affects Versions: Impala 3.2.0
Reporter: Balazs Jeszenszky


With local catalog mode enabled, GET_TABLES JDBC requests will return more than 
the always available table information. Any request for more metadata about a 
table will trigger a full load of that table on the catalogd side, meaning that 
GET_TABLES triggers the load of the entire catalog. Also, as far as I can see, 
the requests for more metadata are made one table at a time. 

Once the tables are loaded, the coordinator needs 3 roundtrips to the catalog 
to fetch all the details about a single table. My test case had around 57k 
tables, 1700 DBs, and ~120k partitions. 
GET_TABLES on a cold catalog takes 18 minutes. With a warm catalog, but cold 
impalad, it still takes ~70 seconds.

Many tools use GET_TABLES to populate dropdowns, etc. so this is bad for both 
end user experience and catalog memory usage.






[jira] [Updated] (IMPALA-8601) Validate redaction rules before checking against glog_v=3

2019-05-30 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-8601:
--
Description: 
https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/common/init.cc#L224-L228
 just checks if there is a file configured or not. This results in startup 
failures for practically empty redaction rule files, like this one emitted by 
CM on an empty redaction rule configuration:
{code}
{
  "version": 1,
  "rules": []
}
{code}

  was:
https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/common/init.cc#L224-L228
 just checks if the file is empty or not. This results in startup failures for 
practically empty redaction rule files, like this one emitted by CM on an empty 
redaction rule configuration:
{code}
{
  "version": 1,
  "rules": []
}
{code}
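A minimal sketch of the requested validation (hypothetical function name; the real check is in Impala's C++ init code linked above), treating a well-formed file with zero rules as "redaction disabled" instead of a startup failure:

```python
import json

def load_redaction_rules(path):
    """Parse a redaction rules file and return its rule list.

    A structurally valid file with an empty "rules" list (as emitted by CM
    for an empty configuration) yields [], meaning 'nothing to redact',
    rather than being rejected just because a file is configured."""
    with open(path) as f:
        doc = json.load(f)          # malformed JSON still raises here
    rules = doc.get("rules", [])
    if not isinstance(rules, list):
        raise ValueError("'rules' must be a list")
    return rules                    # [] is a valid, no-op configuration
```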


> Validate redaction rules before checking against glog_v=3
> -
>
> Key: IMPALA-8601
> URL: https://issues.apache.org/jira/browse/IMPALA-8601
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Balazs Jeszenszky
>Priority: Minor
>
> https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/common/init.cc#L224-L228
>  just checks if there is a file configured or not. This results in startup 
> failures for practically empty redaction rule files, like this one emitted by 
> CM on an empty redaction rule configuration:
> {code}
> {
>   "version": 1,
>   "rules": []
> }
> {code}






[jira] [Created] (IMPALA-8601) Validate redaction rules before checking against glog_v=3

2019-05-30 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-8601:
-

 Summary: Validate redaction rules before checking against glog_v=3
 Key: IMPALA-8601
 URL: https://issues.apache.org/jira/browse/IMPALA-8601
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Balazs Jeszenszky


https://github.com/cloudera/Impala/blob/cdh6.2.0/be/src/common/init.cc#L224-L228
 just checks if the file is empty or not. This results in startup failures for 
practically empty redaction rule files, like this one emitted by CM on an empty 
redaction rule configuration:
{code}
{
  "version": 1,
  "rules": []
}
{code}






[jira] [Commented] (IMPALA-7107) [DOCS] Review docs for storage formats impala cannot insert into

2019-05-14 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839489#comment-16839489
 ] 

Balazs Jeszenszky commented on IMPALA-7107:
---

That's right. Sorry, I misunderstood your original comment. Disregard.

> [DOCS] Review docs for storage formats impala cannot insert into
> 
>
> Key: IMPALA-7107
> URL: https://issues.apache.org/jira/browse/IMPALA-7107
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Assignee: Alex Rodoni
>Priority: Minor
> Fix For: Impala 3.2.0
>
>
> There are several points to clear up or improve across these pages:
> * I'd refer to the Hive documentation on how to set compression codecs 
> instead of documenting Hive's behaviour for file formats Impala cannot write
> * Add 'Ingesting file formats Impala can't write' section to 'How Impala 
> Works with Hadoop File Formats' page, link that central location from 
> wherever applicable. Unify the recommendation on data loading (usage of LOAD 
> DATA or hive or manual copy).
> * add a compatibility matrix for compressions and file formats, clear up 
> compatibility on 'How Impala Works with Hadoop File Formats' (the page is 
> inconsistent even within itself, e.g. bzip2).
> * Remove references to Impala versions <2.0






[jira] [Updated] (IMPALA-8533) Impala daemon crash on sort

2019-05-10 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-8533:
--
Labels: crash  (was: )

> Impala daemon crash on sort
> ---
>
> Key: IMPALA-8533
> URL: https://issues.apache.org/jira/browse/IMPALA-8533
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.2.0
>Reporter: Jeremy Beard
>Priority: Major
>  Labels: crash
> Attachments: fatal_error.txt, hs_err_pid8552.log, query.txt
>
>
> Running the attached data generation query crashes the Impala coordinator 
> daemon.






[jira] [Updated] (IMPALA-8533) Impala daemon crash on sort

2019-05-10 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-8533:
--
Component/s: Backend

> Impala daemon crash on sort
> ---
>
> Key: IMPALA-8533
> URL: https://issues.apache.org/jira/browse/IMPALA-8533
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Jeremy Beard
>Priority: Blocker
>  Labels: crash
> Attachments: fatal_error.txt, hs_err_pid8552.log, query.txt
>
>
> Running the attached data generation query crashes the Impala coordinator 
> daemon.






[jira] [Updated] (IMPALA-8533) Impala daemon crash on sort

2019-05-10 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-8533:
--
Priority: Blocker  (was: Major)

> Impala daemon crash on sort
> ---
>
> Key: IMPALA-8533
> URL: https://issues.apache.org/jira/browse/IMPALA-8533
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.2.0
>Reporter: Jeremy Beard
>Priority: Blocker
>  Labels: crash
> Attachments: fatal_error.txt, hs_err_pid8552.log, query.txt
>
>
> Running the attached data generation query crashes the Impala coordinator 
> daemon.






[jira] [Updated] (IMPALA-8270) ASAN issue with MemTracker::LogUsage() called via webserver's /memz page

2019-04-17 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-8270:
--
Priority: Blocker  (was: Critical)

> ASAN issue with MemTracker::LogUsage() called via webserver's /memz page
> 
>
> Key: IMPALA-8270
> URL: https://issues.apache.org/jira/browse/IMPALA-8270
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: Joe McDonnell
>Priority: Blocker
>  Labels: broken-build
>
> I saw this on an ASAN build from a several days ago:
> {noformat}
> ==124622==ERROR: AddressSanitizer: heap-use-after-free on address 
> 0x61200337f2d8 at pc 0x01fdbdc5 bp 0x7f3b9e11db90 sp 0x7f3b9e11db88
> READ of size 8 at 0x61200337f2d8 thread T145092 (sq_worker)
> #0 0x1fdbdc4 in impala::MemTracker::LogUsage(int, std::string const&, long*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/mem-tracker.cc:297:7
> #1 0x1fded9a in impala::MemTracker::LogUsage(int, std::string const&, 
> std::list > const&, 
> long*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/mem-tracker.cc:362:36
> #2 0x1fdbb6c in impala::MemTracker::LogUsage(int, std::string const&, long*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/mem-tracker.cc:338:28
> #3 0x1fded9a in impala::MemTracker::LogUsage(int, std::string const&, 
> std::list > const&, 
> long*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/mem-tracker.cc:362:36
> #4 0x1fdbb6c in impala::MemTracker::LogUsage(int, std::string const&, long*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/mem-tracker.cc:338:28
> #5 0x1fded9a in impala::MemTracker::LogUsage(int, std::string const&, 
> std::list > const&, 
> long*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/mem-tracker.cc:362:36
> #6 0x1fdbb6c in impala::MemTracker::LogUsage(int, std::string const&, long*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/mem-tracker.cc:338:28
> #7 0x241766f in MemUsageHandler(impala::MemTracker*, impala::MetricGroup*, 
> std::map, 
> std::allocator > > const&, 
> rapidjson::GenericDocument, 
> rapidjson::MemoryPoolAllocator, 
> rapidjson::CrtAllocator>*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/util/default-path-handlers.cc:155:31
> #8 0x25296e5 in boost::function2 std::less, std::allocator std::string> > > const&, rapidjson::GenericDocument, 
> rapidjson::MemoryPoolAllocator, 
> rapidjson::CrtAllocator>*>::operator()(std::map std::less, std::allocator std::string> > > const&, rapidjson::GenericDocument, 
> rapidjson::MemoryPoolAllocator, 
> rapidjson::CrtAllocator>*) const 
> /data/jenkins/workspace/impala-asf-master-core-asan/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:766:14
> #9 0x2527849 in 
> impala::Webserver::RenderUrlWithTemplate(std::map std::less, std::allocator std::string> > > const&, impala::Webserver::UrlHandler const&, 
> std::basic_stringstream, std::allocator 
> >*, impala::ContentType*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/util/webserver.cc:447:3
> #10 0x2526ea7 in impala::Webserver::BeginRequestCallback(sq_connection*, 
> sq_request_info*) 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/util/webserver.cc:419:5
> #11 0x253f1bf in handle_request 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/service/impalad+0x253f1bf)
> #12 0x253ebed in process_new_connection 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/service/impalad+0x253ebed)
> #13 0x253e616 in worker_thread 
> (/data0/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/build/debug/service/impalad+0x253e616)
> #14 0x7f45dfbdbe24 in start_thread (/lib64/libpthread.so.0+0x7e24)
> #15 0x7f45df6f234c in __clone (/lib64/libc.so.6+0xf834c)
> 0x61200337f2d8 is located 152 bytes inside of 312-byte region 
> [0x61200337f240,0x61200337f378)
> freed by thread T138428 here:
> #0 0x17ce6c0 in operator delete(void*) 
> /mnt/source/llvm/llvm-5.0.1.src-p1/projects/compiler-rt/lib/asan/asan_new_delete.cc:137
> #1 0x200ed3c in impala::RuntimeState::~RuntimeState() 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/runtime/runtime-state.cc:111:1
> #2 0x2212e86 in 
> Java_org_apache_impala_service_FeSupport_NativeEvalExprsWithoutRow 
> /data/jenkins/workspace/impala-asf-master-core-asan/repos/Impala/be/src/service/fe-support.cc:275:1
> #3 0x7f45c60d1d74 ()
> #4 0x7f45c6e0921b ()
> previously allocated by thread T138428 

[jira] [Created] (IMPALA-8429) Update docs to reflect default join distribution mode change

2019-04-17 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-8429:
-

 Summary: Update docs to reflect default join distribution mode 
change
 Key: IMPALA-8429
 URL: https://issues.apache.org/jira/browse/IMPALA-8429
 Project: IMPALA
  Issue Type: Bug
  Components: Docs
Affects Versions: Impala 3.2.0
Reporter: Balazs Jeszenszky


The 'DEFAULT_JOIN_DISTRIBUTION_MODE Query Option' page needs an update to 
reflect the changes in IMPALA-5120.






[jira] [Resolved] (IMPALA-8408) Failure to load metadata for .deflate compressed text files

2019-04-11 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky resolved IMPALA-8408.
---
   Resolution: Fixed
Fix Version/s: Impala 2.13.0

Fixed indirectly.

> Failure to load metadata for .deflate compressed text files
> ---
>
> Key: IMPALA-8408
> URL: https://issues.apache.org/jira/browse/IMPALA-8408
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Priority: Major
> Fix For: Impala 2.13.0
>
>
> While the metadata is loaded successfully on the catalogd side, it can't be 
> applied on the impalads:
> {code}
> I1005 14:07:25.045325 27076 Frontend.java:962] Analyzing query: describe test1
> I1005 14:07:25.045603 27076 FeSupport.java:274] Requesting prioritized load 
> of table(s): default.test1
> ...
> E1005 14:07:30.871942 19685 ImpaladCatalog.java:201] Error adding catalog 
> object: Expected compressed text file with {.lzo,.gzip,.snappy,.bz2} suffix: 
> 00_0.deflate
> Java exception follows:
> java.lang.RuntimeException: Expected compressed text file with 
> {.lzo,.gzip,.snappy,.bz2} suffix: 00_0.deflate
> at 
> org.apache.impala.catalog.HdfsPartition.<init>(HdfsPartition.java:772)
> at 
> org.apache.impala.catalog.HdfsPartition.fromThrift(HdfsPartition.java:884)
> at 
> org.apache.impala.catalog.HdfsTable.loadFromThrift(HdfsTable.java:1678)
> at org.apache.impala.catalog.Table.fromThrift(Table.java:311)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:403)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:292)
> at 
> org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:199)
> at 
> org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:223)
> at 
> org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:175)
> {code}
> This results in the affected queries hanging indefinitely in planning phase.






[jira] [Created] (IMPALA-8408) Failure to load metadata for .deflate compressed text files

2019-04-11 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-8408:
-

 Summary: Failure to load metadata for .deflate compressed text 
files
 Key: IMPALA-8408
 URL: https://issues.apache.org/jira/browse/IMPALA-8408
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Affects Versions: Impala 2.12.0, Impala 2.11.0, Impala 2.10.0, Impala 2.9.0
Reporter: Balazs Jeszenszky


While the metadata is loaded successfully on the catalogd side, it can't be 
applied on the impalads:
{code}
I1005 14:07:25.045325 27076 Frontend.java:962] Analyzing query: describe test1
I1005 14:07:25.045603 27076 FeSupport.java:274] Requesting prioritized load of 
table(s): default.test1
...
E1005 14:07:30.871942 19685 ImpaladCatalog.java:201] Error adding catalog 
object: Expected compressed text file with {.lzo,.gzip,.snappy,.bz2} suffix: 
00_0.deflate
Java exception follows:
java.lang.RuntimeException: Expected compressed text file with 
{.lzo,.gzip,.snappy,.bz2} suffix: 00_0.deflate
at 
org.apache.impala.catalog.HdfsPartition.<init>(HdfsPartition.java:772)
at 
org.apache.impala.catalog.HdfsPartition.fromThrift(HdfsPartition.java:884)
at 
org.apache.impala.catalog.HdfsTable.loadFromThrift(HdfsTable.java:1678)
at org.apache.impala.catalog.Table.fromThrift(Table.java:311)
at 
org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:403)
at 
org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:292)
at 
org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:199)
at 
org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:223)
at 
org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:175)
{code}

This results in the affected queries hanging indefinitely in planning phase.
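A sketch of keeping the two sides consistent via one shared suffix set that includes {{.deflate}} (names here are illustrative; the real check is in HdfsPartition.java, and the suffix list below follows the error message above):

```python
# One shared set of recognized compressed-text suffixes. The bug report shows
# the impalad rejecting '.deflate' while catalogd accepts it; a single shared
# definition would make that divergence impossible.
KNOWN_COMPRESSED_TEXT_SUFFIXES = {".lzo", ".gzip", ".snappy", ".bz2", ".deflate"}

def compression_for(filename):
    """Return the compression codec name implied by a file's suffix,
    raising only for genuinely unknown suffixes."""
    for suffix in KNOWN_COMPRESSED_TEXT_SUFFIXES:
        if filename.endswith(suffix):
            return suffix.lstrip(".")
    raise ValueError("Expected compressed text file suffix, got: " + filename)
```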






[jira] [Commented] (IMPALA-8108) Impala query returns TIMESTAMP values in different types

2019-03-28 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16803682#comment-16803682
 ] 

Balazs Jeszenszky commented on IMPALA-8108:
---

Not sure if this is a good idea - if someone requires a certain format, it's 
best to use a specific format string, and I wouldn't expect every timestamp to 
have a bunch of trailing zeroes. [~grahn] thoughts?
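A minimal sketch of the ".0"-padding idea from the report below, assuming a hypothetical {{format_ts}} helper (Impala's actual formatting happens in C++ via boost, not here):

```python
def format_ts(ts_str):
    """If the formatted timestamp has no fractional part (boost's
    to_simple_string drops all-zero fractions), append '.0' so every
    row of the column comes back in the same shape."""
    date_part, _, time_part = ts_str.partition(" ")
    if "." not in time_part:
        time_part += ".0"
    return date_part + " " + time_part
```

This illustrates the mechanics only; as noted above, whether the padding is desirable at all is an open question.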

> Impala query returns TIMESTAMP values in different types
> 
>
> Key: IMPALA-8108
> URL: https://issues.apache.org/jira/browse/IMPALA-8108
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Robbie Zhang
>Assignee: Robbie Zhang
>Priority: Major
>
> When a timestamp's fractional seconds are all zeros (e.g. .000, .00, or .0), 
> the timestamp is displayed with no fraction of a second. For example:
> {code:java}
> select cast(ts as timestamp) from 
>  (values 
>  ('2019-01-11 10:40:18' as ts),
>  ('2019-01-11 10:40:19.0'),
>  ('2019-01-11 10:40:19.00'), 
>  ('2019-01-11 10:40:19.000'),
>  ('2019-01-11 10:40:19.'),
>  ('2019-01-11 10:40:19.0'),
>  ('2019-01-11 10:40:19.00'),
>  ('2019-01-11 10:40:19.000'),
>  ('2019-01-11 10:40:19.'),
>  ('2019-01-11 10:40:19.0'),
>  ('2019-01-11 10:40:19.1')
>  ) t;{code}
> The output is:
> {code:java}
> +-----------------------+
> | cast(ts as timestamp) |
> +-----------------------+
> | 2019-01-11 10:40:18   |
> | 2019-01-11 10:40:19   |
> | 2019-01-11 10:40:19   |
> | 2019-01-11 10:40:19   |
> | 2019-01-11 10:40:19   |
> | 2019-01-11 10:40:19   |
> | 2019-01-11 10:40:19   |
> | 2019-01-11 10:40:19   |
> | 2019-01-11 10:40:19   |
> | 2019-01-11 10:40:19   |
> | 2019-01-11 10:40:19.1 |
> +-----------------------+
> {code}
> As we can see, values of the same column are returned in two different types. 
> The inconsistency breaks some downstream use cases. 
> The reason is that Impala uses the function 
> boost::posix_time::to_simple_string(time_duration) to convert the timestamp 
> to a string, and to_simple_string() removes the fractional seconds if they 
> are all zeros. Perhaps we can append ".0" if the length of the string is 8 
> (HH:MM:SS).
> For now we can work around it by using from_timestamp(ts, 
> 'yyyy-MM-dd HH:mm:ss.S') to unify the output (convert to string), or by 
> using millisecond(ts) to get the fractional seconds.






[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2019-03-21 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797992#comment-16797992
 ] 

Balazs Jeszenszky commented on IMPALA-7310:
---

[~Paul.Rogers] are you working on this?

> Compute Stats not computing NULLs as a distinct value causing wrong estimates
> -
>
> Key: IMPALA-7310
> URL: https://issues.apache.org/jira/browse/IMPALA-7310
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, 
> Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Zsombor Fedor
>Priority: Major
>
> As seen in other DBMSs
> {code:java}
> NDV(col){code}
> not counting NULL as a distinct value. The same also applies to
> {code:java}
> COUNT(DISTINCT col){code}
> This is working as intended, but when computing column statistics it can 
> cause some anomalies (e.g. bad join order), as compute stats uses NDV() to 
> determine columns' NDVs.
>  
> For example when aggregating more columns, the estimated cardinality is 
> [counted as the product of the columns' number of distinct 
> values.|https://github.com/cloudera/Impala/blob/64cd0bb0c3529efa0ab5452c4e9e2a04fd815b4f/fe/src/main/java/org/apache/impala/analysis/Expr.java#L669]
>  If there is a column full of NULLs the whole product will be 0.
>  
> There are two possible fixes for this.
> Either we should count NULLs as a distinct value when computing stats in the 
> query:
> {code:java}
> SELECT NDV(a) + COUNT(DISTINCT CASE WHEN a IS NULL THEN 1 END) AS a, CAST(-1 
> as BIGINT), 4, CAST(4 as DOUBLE) FROM test;{code}
> instead of
> {code:java}
> SELECT NDV(a) AS a, CAST(-1 as BIGINT), 4, CAST(4 as DOUBLE) FROM test;{code}
>  
>  
> Or we should change the planner 
> [function|https://github.com/cloudera/Impala/blob/2d2579cb31edda24457d33ff5176d79b7c0432c5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L169]
>  to take care of this bug.
>  
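To make the estimation issue concrete, here is an illustrative Python sketch (not Impala's planner code) of how an all-NULL column zeroes the NDV product, and how counting NULL as one extra distinct value avoids it:

```python
# ndv() mimics NDV()/COUNT(DISTINCT): NULLs are not counted as a value.
def ndv(values):
    return len({v for v in values if v is not None})

# Proposed adjustment: count NULL as one extra distinct value if present.
def ndv_with_null(values):
    return ndv(values) + (1 if any(v is None for v in values) else 0)

# A column that is entirely NULL drives the per-column NDV product - and
# hence the estimated cardinality of grouping by (a, b) - to zero.
a = [1, 2, 1]
b = [None, None, None]
print(ndv(a) * ndv(b))                      # -> 0
print(ndv_with_null(a) * ndv_with_null(b))  # -> 2
```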






[jira] [Commented] (IMPALA-7802) Implement support for closing idle sessions

2019-03-20 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16797101#comment-16797101
 ] 

Balazs Jeszenszky commented on IMPALA-7802:
---

I agree with Tim that closing sessions is the lesser evil compared to the 
current state of affairs. I also think this is arguably a bug; it'd be good to 
be in a slightly better state sooner rather than later.

> Implement support for closing idle sessions
> ---
>
> Key: IMPALA-7802
> URL: https://issues.apache.org/jira/browse/IMPALA-7802
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Michael Ho
>Assignee: Zoram Thanga
>Priority: Critical
>  Labels: supportability
>
> Currently, the query option {{idle_session_timeout}} specifies a timeout in 
> seconds after which all running queries of that idle session will be 
> cancelled and no new queries can be issued to it. However, the idle session 
> will remain open and it needs to be closed explicitly. Please see the 
> [documentation|https://www.cloudera.com/documentation/enterprise/latest/topics/impala_idle_session_timeout.html]
>  for details.
> This behavior may be undesirable as each session still consumes an Impala 
> frontend service thread. The number of frontend service threads is bound by 
> the flag {{fe_service_threads}}. So, in a multi-tenant environment, an Impala 
> server can have a lot of idle sessions but they still consume against the 
> quota of {{fe_service_threads}}. If the number of sessions established 
> reaches {{fe_service_threads}}, all new session creations will block until 
> some of the existing sessions exit. There may be no time bound on when these 
> zombie idle sessions will be closed and it's at the mercy of the client 
> implementation to close them. In some sense, leaving many idle sessions open 
> is a way to launch a denial of service attack on Impala.
> To fix this situation, we should have an option to forcefully close a session 
> when it's considered idle so it won't unnecessarily consume the limited 
> number of frontend service threads. cc'ing [~zoram]
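An idle-session reaper along the lines proposed could look like this sketch (illustrative Python with assumed names, not Impala's implementation):

```python
import time

# Sketch of force-closing sessions idle longer than a timeout, so they
# stop pinning frontend service threads. Names are illustrative.
class Session:
    def __init__(self, session_id):
        self.session_id = session_id
        self.last_active = time.monotonic()
        self.closed = False

def close_idle_sessions(sessions, idle_timeout_s):
    now = time.monotonic()
    closed = []
    for s in sessions:
        if not s.closed and now - s.last_active > idle_timeout_s:
            s.closed = True   # connection torn down; thread returns to pool
            closed.append(s.session_id)
    return closed

s1, s2 = Session("a"), Session("b")
s1.last_active -= 100          # simulate 100 s of inactivity
print(close_idle_sessions([s1, s2], idle_timeout_s=60))  # -> ['a']
```

In a real server this loop would run periodically on a maintenance thread, and closing would also cancel the session's in-flight queries, as the existing idle_session_timeout cancellation already does.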






[jira] [Resolved] (IMPALA-8206) Codegen crash on analytic functions in specific environments

2019-02-15 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky resolved IMPALA-8206.
---
   Resolution: Fixed
Fix Version/s: Impala 2.10.0

Fixed indirectly (though I'm not sure which change was responsible).

> Codegen crash on analytic functions in specific environments
> 
>
> Key: IMPALA-8206
> URL: https://issues.apache.org/jira/browse/IMPALA-8206
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
>Reporter: Balazs Jeszenszky
>Priority: Blocker
>  Labels: crash
> Fix For: Impala 2.10.0
>
>
> The following sequence crashes in certain Impala versions on certain 
> environments:
> {code}
> create table test (c1 int, c2 decimal(2,2));
> insert into test values (1,0.0);
> SELECT ROW_NUMBER() OVER (PARTITION BY a ORDER BY c1) b
> FROM (SELECT c1, (case when c2 = 1 then 1 else 0 end) as a FROM test) t;
> {code}
> Any analytic function will do.
> FATAL log:
> {code}
> F0212 06:58:45.937505  3119 llvm-codegen.cc:106] LLVM hit fatal error: Cannot 
> select: 0x8e23130: i32 = X86ISD::CMP 0x61ff390, 0x61ffd10
>   0x61ff390: i1,ch = CopyFromReg 0x5c21830, Register:i1 %vreg72
> 0x8e23980: i1 = Register %vreg72
>   0x61ffd10: i1 = or 0x8e234c0, 0x87e0be0
> 0x8e234c0: i1,ch = CopyFromReg 0x5c21830, Register:i1 %vreg133
>   0x8e23ab0: i1 = Register %vreg133
> 0x87e0be0: i1,ch = CopyFromReg 0x5c21830, Register:i1 %vreg134
>   0x87b0720: i1 = Register %vreg134
> In function: Compare
> {code}
> Minidump:
> {code}
>  0  libc-2.12.so + 0x325e5
>  1  libc-2.12.so + 0x33dc5
>  2  libc-2.12.so + 0x6b58
>  3  impalad!llvm::MCJIT::emitObject(llvm::Module*) + 0x119
>  4  0x7f98ab6a74c0
>  5  impalad!llvm::SelectionDAGISel::DoInstructionSelection() + 0x254
>  6  impalad!llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 0x1d2
>  7  impalad!llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function 
> const&) + 0x36a
>  8  
> impalad!llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) 
> + 0x48f
>  9  impalad!(anonymous 
> namespace)::X86DAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 
> 0x14
> 10  impalad!llvm::FPPassManager::runOnFunction(llvm::Function&) + 0x27a
> 11  impalad!llvm::legacy::PassManagerImpl::run(llvm::Module&) + 0x263
> 12  impalad!llvm::MCJIT::emitObject(llvm::Module*) + 0x119
> 13  impalad!llvm::MCJIT::generateCodeForModule(llvm::Module*) + 0x301
> 14  impalad!llvm::MCJIT::finalizeObject() + 0x110
> 15  impalad!impala::LlvmCodeGen::FinalizeModule() [llvm-codegen.cc : 937 + 
> 0x2]
> 16  impalad!impala::PlanFragmentExecutor::OptimizeLlvmModule() 
> [plan-fragment-executor.cc : 300 + 0xd]
> 17  impalad!impala::PlanFragmentExecutor::Open() [plan-fragment-executor.cc : 
> 342 + 0x8]
> 18  impalad!impala::FragmentMgr::FragmentExecState::Exec() 
> [fragment-exec-state.cc : 58 + 0xb]
> 19  impalad!impala::FragmentMgr::FragmentThread(impala::TUniqueId) 
> [fragment-mgr.cc : 90 + 0xa]
> {code}
> Didn't test versions earlier than 2.7.
> We've used RHEL6.7. For the two environments where we managed to repro, lscpu 
> gives:
> {code}
> Architecture:  x86_64
> CPU op-mode(s):32-bit, 64-bit
> Byte Order:Little Endian
> CPU(s):6
> On-line CPU(s) list:   0-5
> Thread(s) per core:1
> Core(s) per socket:1
> Socket(s): 6
> NUMA node(s):  1
> Vendor ID: GenuineIntel
> CPU family:6
> Model: 61
> Stepping:  2
> CPU MHz:   2197.454
> BogoMIPS:  4394.90
> Hypervisor vendor: KVM
> Virtualization type:   full
> L1d cache: 32K
> L1i cache: 32K
> L2 cache:  4096K
> NUMA node0 CPU(s): 0-5
> {code}
> and
> {code}
> Architecture:  x86_64
> CPU op-mode(s):32-bit, 64-bit
> Byte Order:Little Endian
> CPU(s):40
> On-line CPU(s) list:   0-39
> Thread(s) per core:2
> Core(s) per socket:10
> Socket(s): 2
> NUMA node(s):  2
> Vendor ID: GenuineIntel
> CPU family:6
> Model: 85
> Model name:Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz
> Stepping:  4
> CPU MHz:   2394.367
> BogoMIPS:  4786.29
> Virtualization:VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache:  1024K
> L3 cache:  14080K
> NUMA node0 CPU(s): 0-9,20-29
> NUMA node1 CPU(s): 10-19,30-39
> {code}




[jira] [Created] (IMPALA-8206) Codegen crash on analytic functions in specific environments

2019-02-15 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-8206:
-

 Summary: Codegen crash on analytic functions in specific 
environments
 Key: IMPALA-8206
 URL: https://issues.apache.org/jira/browse/IMPALA-8206
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 2.9.0, Impala 2.8.0, Impala 2.7.0
Reporter: Balazs Jeszenszky


The following sequence crashes in certain Impala versions on certain 
environments:
{code}
create table test (c1 int, c2 decimal(2,2));
insert into test values (1,0.0);
SELECT ROW_NUMBER() OVER (PARTITION BY a ORDER BY c1) b
FROM (SELECT c1, (case when c2 = 1 then 1 else 0 end) as a FROM test) t;
{code}
Any analytic function will do.

FATAL log:
{code}
F0212 06:58:45.937505  3119 llvm-codegen.cc:106] LLVM hit fatal error: Cannot 
select: 0x8e23130: i32 = X86ISD::CMP 0x61ff390, 0x61ffd10
  0x61ff390: i1,ch = CopyFromReg 0x5c21830, Register:i1 %vreg72
0x8e23980: i1 = Register %vreg72
  0x61ffd10: i1 = or 0x8e234c0, 0x87e0be0
0x8e234c0: i1,ch = CopyFromReg 0x5c21830, Register:i1 %vreg133
  0x8e23ab0: i1 = Register %vreg133
0x87e0be0: i1,ch = CopyFromReg 0x5c21830, Register:i1 %vreg134
  0x87b0720: i1 = Register %vreg134
In function: Compare
{code}

Minidump:
{code}
 0  libc-2.12.so + 0x325e5
 1  libc-2.12.so + 0x33dc5
 2  libc-2.12.so + 0x6b58
 3  impalad!llvm::MCJIT::emitObject(llvm::Module*) + 0x119
 4  0x7f98ab6a74c0
 5  impalad!llvm::SelectionDAGISel::DoInstructionSelection() + 0x254
 6  impalad!llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 0x1d2
 7  impalad!llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) 
+ 0x36a
 8  
impalad!llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 
0x48f
 9  impalad!(anonymous 
namespace)::X86DAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 0x14
10  impalad!llvm::FPPassManager::runOnFunction(llvm::Function&) + 0x27a
11  impalad!llvm::legacy::PassManagerImpl::run(llvm::Module&) + 0x263
12  impalad!llvm::MCJIT::emitObject(llvm::Module*) + 0x119
13  impalad!llvm::MCJIT::generateCodeForModule(llvm::Module*) + 0x301
14  impalad!llvm::MCJIT::finalizeObject() + 0x110
15  impalad!impala::LlvmCodeGen::FinalizeModule() [llvm-codegen.cc : 937 + 0x2]
16  impalad!impala::PlanFragmentExecutor::OptimizeLlvmModule() 
[plan-fragment-executor.cc : 300 + 0xd]
17  impalad!impala::PlanFragmentExecutor::Open() [plan-fragment-executor.cc : 
342 + 0x8]
18  impalad!impala::FragmentMgr::FragmentExecState::Exec() 
[fragment-exec-state.cc : 58 + 0xb]
19  impalad!impala::FragmentMgr::FragmentThread(impala::TUniqueId) 
[fragment-mgr.cc : 90 + 0xa]
{code}

Didn't test versions earlier than 2.7.
We've used RHEL6.7. For the two environments where we managed to repro, lscpu 
gives:
{code}
Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):6
On-line CPU(s) list:   0-5
Thread(s) per core:1
Core(s) per socket:1
Socket(s): 6
NUMA node(s):  1
Vendor ID: GenuineIntel
CPU family:6
Model: 61
Stepping:  2
CPU MHz:   2197.454
BogoMIPS:  4394.90
Hypervisor vendor: KVM
Virtualization type:   full
L1d cache: 32K
L1i cache: 32K
L2 cache:  4096K
NUMA node0 CPU(s): 0-5
{code}
and
{code}
Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):40
On-line CPU(s) list:   0-39
Thread(s) per core:2
Core(s) per socket:10
Socket(s): 2
NUMA node(s):  2
Vendor ID: GenuineIntel
CPU family:6
Model: 85
Model name:Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz
Stepping:  4
CPU MHz:   2394.367
BogoMIPS:  4786.29
Virtualization:VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  1024K
L3 cache:  14080K
NUMA node0 CPU(s): 0-9,20-29
NUMA node1 CPU(s): 10-19,30-39
{code}






[jira] [Commented] (IMPALA-4337) Wrap long lines in explain plans

2019-02-04 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16759879#comment-16759879
 ] 

Balazs Jeszenszky commented on IMPALA-4337:
---

I don't think we should do this - text editors can wrap as needed, but if 
Impala wraps the text, there's no way to easily unwrap it. It'd make it more 
difficult to eyeball plans with long predicate lists.

> Wrap long lines in explain plans
> 
>
> Key: IMPALA-4337
> URL: https://issues.apache.org/jira/browse/IMPALA-4337
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.8.0
>Reporter: Henry Robinson
>Priority: Minor
>  Labels: newbie, ramp-up
>
> Explain plans can have very long lines, particularly when printing lists of 
> expressions. It should be possible to wrap, and still correctly indent, those 
> lines.
> This is trickier than it sounds because they have to be wrapped in the 
> context of their place in the plan (i.e. with appropriate prefixes etc). It's 
> a good opportunity to split out explain plan generation from presentation, 
> centralizing the logic so that this kind of change is easy to make.
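The prefix problem described above can be sketched as follows (hypothetical helper in Python; Impala's plan formatter is actually Java): continuation lines must be padded to the width of the tree prefix so indentation survives wrapping.

```python
import textwrap

# Sketch: wrap a plan line while preserving its tree-prefix indentation.
# wrap_plan_line is a hypothetical helper, not Impala's actual formatter.
def wrap_plan_line(prefix: str, text: str, width: int = 40):
    body_width = width - len(prefix)
    chunks = textwrap.wrap(text, body_width)
    pad = " " * len(prefix)  # continuation lines align under the first
    return [prefix + chunks[0]] + [pad + c for c in chunks[1:]]

lines = wrap_plan_line("|  ", "predicates: a = 1, b = 2, c = 3, d = 4, e = 5")
for line in lines:
    print(line)
```

A real implementation would also have to reproduce the vertical tree bars (`|`) in the padding rather than plain spaces, which is exactly why separating plan generation from presentation helps.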






[jira] [Commented] (IMPALA-7910) COMPUTE STATS does an unnecessary REFRESH after writing to the Metastore

2018-12-03 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707374#comment-16707374
 ] 

Balazs Jeszenszky commented on IMPALA-7910:
---

IMPALA-6994 is a similar issue.

> COMPUTE STATS does an unnecessary REFRESH after writing to the Metastore
> 
>
> Key: IMPALA-7910
> URL: https://issues.apache.org/jira/browse/IMPALA-7910
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.9.0, Impala 2.11.0, Impala 2.12.0
>Reporter: Michael Brown
>Assignee: Tianyi Wang
>Priority: Critical
>
> COMPUTE STATS and possibly other DDL operations unnecessarily do the 
> equivalent of a REFRESH after writing to the Hive Metastore. This unnecessary 
> operation can be very expensive, so should be avoided.
> The behavior can be confirmed from the catalogd logs:
> {code}
> compute stats functional_parquet.alltypes;
> +-------------------------------------------+
> | summary                                   |
> +-------------------------------------------+
> | Updated 24 partition(s) and 11 column(s). |
> +-------------------------------------------+
> Relevant catalogd.INFO snippet
> I0413 14:40:24.210749 27295 HdfsTable.java:1263] Incrementally loading table 
> metadata for: functional_parquet.alltypes
> I0413 14:40:24.242122 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=1: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.244634 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=10: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.247174 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=11: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.249713 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=12: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.252288 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=2: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.254629 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=3: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.256991 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=4: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.259464 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=5: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.262197 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=6: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.264463 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=7: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.266736 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=8: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.269210 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2009/month=9: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.271800 27295 HdfsTable.java:555] Refreshed file metadata for 
> functional_parquet.alltypes Path: 
> hdfs://localhost:20500/test-warehouse/alltypes_parquet/year=2010/month=1: 
> Loaded files: 1 Hidden files: 0 Skipped files: 0 Unknown diskIDs: 0
> I0413 14:40:24.274348 27295 HdfsTable.java:555] 

[jira] [Commented] (IMPALA-7695) Consolidate ACL inheritance

2018-10-11 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646759#comment-16646759
 ] 

Balazs Jeszenszky commented on IMPALA-7695:
---

[~fredyw] FYI.
[~zherczeg], so IIUC, with {{--insert_inherit_permissions=true}} all would be 
well? Do you think an OK fix would be to change the default?

> Consolidate ACL inheritance
> ---
>
> Key: IMPALA-7695
> URL: https://issues.apache.org/jira/browse/IMPALA-7695
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Security
>Reporter: Zsolt Herczeg
>Priority: Minor
>
> As of now Impala behavior is not consistent regarding ACL inheritance by 
> default.
> If --insert_inherit_permissions is not specified, then:
> {code:java}
> CREATE EXTERNAL TABLE acl_test (a int) partitioned by (b int) STORED AS 
> PARQUET LOCATION '/dataroot/acl_test/';
> {code}
> This will create the table directory (/dataroot/acl_test), and inherit the 
> parent dir (/dataroot) acls.
> {code:java}
> ALTER TABLE acl_test ADD PARTITION (b=10) 
> {code}
> This will create the partition directory (/dataroot/acl_test/b=10) and 
> inherit the parent dir (/dataroot/acl_test) acls.
> {code:java}
> INSERT INTO acl_test (a,b) VALUES (1,2) 
> {code}
> This will create the partition directory (/dataroot/acl_test/b=2) but will 
> *not* inherit any acls.
> The difference in the INSERT/ALTER behavior will lead to inconsistent 
> partition directory permissions, depending on whether they were created 
> explicitly beforehand or implicitly during an insert.
> This is documented, but generally unexpected. I'd recommend reviewing whether 
> a more consistent approach could be followed for ACLs on partition 
> directories.
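The divergent code paths can be modeled as follows (illustrative Python, not Impala's HDFS code; `DEFAULT_PERMS` stands in for whatever mode the filesystem assigns by default):

```python
# Model of the two paths: DDL-created partition directories inherit the
# parent's permissions, while INSERT-created ones get defaults unless
# --insert_inherit_permissions=true is set.
DEFAULT_PERMS = 0o755  # assumed default-created mode, for illustration

def create_partition_dir(parent_perms, insert_inherit_permissions, via_insert):
    if via_insert and not insert_inherit_permissions:
        return DEFAULT_PERMS   # INSERT default: no inheritance
    return parent_perms        # DDL, or INSERT with the flag enabled

parent = 0o770
print(oct(create_partition_dir(parent, False, via_insert=False)))  # -> 0o770
print(oct(create_partition_dir(parent, False, via_insert=True)))   # -> 0o755
print(oct(create_partition_dir(parent, True, via_insert=True)))    # -> 0o770
```

This is why flipping the default of --insert_inherit_permissions would make the two paths consistent.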






[jira] [Updated] (IMPALA-7695) Consolidate ACL inheritance

2018-10-11 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-7695:
--
Component/s: Security

> Consolidate ACL inheritance
> ---
>
> Key: IMPALA-7695
> URL: https://issues.apache.org/jira/browse/IMPALA-7695
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Security
>Reporter: Zsolt Herczeg
>Priority: Minor
>
> As of now Impala behavior is not consistent regarding ACL inheritance by 
> default.
> If --insert_inherit_permissions is not specified, then:
> {code:java}
> CREATE EXTERNAL TABLE acl_test (a int) partitioned by (b int) STORED AS 
> PARQUET LOCATION '/dataroot/acl_test/';
> {code}
> This will create the table directory (/dataroot/acl_test), and inherit the 
> parent dir (/dataroot) acls.
> {code:java}
> ALTER TABLE acl_test ADD PARTITION (b=10) 
> {code}
> This will create the partition directory (/dataroot/acl_test/b=10) and 
> inherit the parent dir (/dataroot/acl_test) acls.
> {code:java}
> INSERT INTO acl_test (a,b) VALUES (1,2) 
> {code}
> This will create the partition directory (/dataroot/acl_test/b=2) but will 
> *not* inherit any acls.
> The difference in the INSERT/ALTER behavior will lead to inconsistent 
> partition directory permissions, depending on whether they were created 
> explicitly beforehand or implicitly during an insert.
> This is documented, but generally unexpected. I'd recommend reviewing whether 
> a more consistent approach could be followed for ACLs on partition 
> directories.






[jira] [Commented] (IMPALA-7694) Add system resource utilization (CPU, disk, network) timelines to profiles

2018-10-11 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16646122#comment-16646122
 ] 

Balazs Jeszenszky commented on IMPALA-7694:
---

An easier version of this would be to note time spent waiting for CPU cycles 
per fragment instance, since that's the hardest to observe currently.

> Add system resource utilization (CPU, disk, network) timelines to profiles
> --
>
> Key: IMPALA-7694
> URL: https://issues.apache.org/jira/browse/IMPALA-7694
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Lars Volker
>Priority: Major
>  Labels: observability, supportability
>
> We often struggle to determine why a query was slow, in particular if it was 
> caused by other tasks on the same machine using resources. To help with this 
> we should include timelines for system resource utilization in the profiles. 
> These should include CPU and disk and network I/O.  If it is too expensive to 
> include these in all queries we should add a flag to add these to a 
> percentage of queries, and a query option to force-enable them.






[jira] [Resolved] (IMPALA-7685) Connect to impala database via JDBC and the connection is not closed when the query is finished

2018-10-10 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky resolved IMPALA-7685.
---
   Resolution: Not A Bug
Fix Version/s: Not Applicable

This is the expected behaviour; you have to close the session from the client. 
Impala will only close idle sessions after --idle_session_timeout; see the docs 
for specifics.

> Connect to impala database via JDBC and the connection is not closed when the 
> query is finished
> ---
>
> Key: IMPALA-7685
> URL: https://issues.apache.org/jira/browse/IMPALA-7685
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.12.0
>Reporter: yuxuqi
>Priority: Major
> Fix For: Not Applicable
>
>
> CDH:
> h2. Query Details
> DESCRIBE FDM_DB.F_T_MERCHANT_BASE_INFO
> h3. Query Information
>  * Query ID: *94492dca691a08fd:dc057ad2*
>  * User:**
>  * Database: *ADM_DB*
>  * Coordinator: 
> [node01|http://47.106.246.104:7180/cmf/hardware/hosts/8/status]
>  * Query Type: *DDL*
>  * Query State: *FINISHED*
>  * Start Time: *2018-10-10 9:35:01*
>  * Duration: *2 hours, 22 minutes*
>  * DDL Type: *DESCRIBE_TABLE*
>  * Impala Version: *impalad version 2.12.0-cdh5.15.0 RELEASE (build 
> 23f574543323301846b41fa5433690df32efe085)*
>  * Statistics Corrupt: *false*
>  * Session ID: *a8436493f9c60748:9ba7d78b5db3f082*
>  * Session Type: *HIVESERVER2*
>  * Out of Memory: *false*
>  * Client Fetch Wait Time: *19ms*
>  * Client Fetch Wait Time Percentage: *0*
>  * File Formats:**
>  * Query Status: *OK*
>  * Missing Stats: *false*
>  * Network Address: *172.18.230.235:52018*
>  * Planning Wait Time: *19ms*
>  * Planning Wait Time Percentage: *0*
>  * Admission Result: *Unknown*
> h3. Query Timeline
>  # Query submitted: *0ns (0ns)*
>  # Planning finished: *19ms (19ms)*
>  # Rows available: *30ms (11ms)*
>  # First row fetched: *49ms (19ms)*
>  I1010 10:35:00.015696 7925 impala-hs2-server.cc:418] ExecuteStatement(): 
> request=TExecuteStatementReq {
>  01: sessionHandle (struct) = TSessionHandle
> Unknown macro: \{ 01}
> ,
>  02: statement (string) = "DESCRIBE FDM_DB.F_T_MERCHANT_BASE_INFO",
>  04: runAsync (bool) = false,
>  }
>  I1010 10:35:00.015733 7925 impala-hs2-server.cc:234] TExecuteStatementReq: 
> TExecuteStatementReq {
>  01: sessionHandle (struct) = TSessionHandle
> Unknown macro: \{ 01}
> ,
>  02: statement (string) = "DESCRIBE FDM_DB.F_T_MERCHANT_BASE_INFO",
>  04: runAsync (bool) = false,
>  }
>  I1010 10:35:00.015977 7925 impala-hs2-server.cc:271] 
> TClientRequest.queryOptions: TQueryOptions
> { 01: abort_on_error (bool) = false, 02: max_errors (i32) = 100, 03: 
> disable_codegen (bool) = false, 04: batch_size (i32) = 0, 05: num_nodes (i32) 
> = 0, 06: max_scan_range_length (i64) = 0, 07: num_scanner_threads (i32) = 0, 
> 08: max_io_buffers (i32) = 0, 09: allow_unsupported_formats (bool) = false, 
> 10: default_order_by_limit (i64) = -1, 11: debug_action (string) = "", 12: 
> mem_limit (i64) = 0, 13: abort_on_default_limit_exceeded (bool) = false, 15: 
> hbase_caching (i32) = 0, 16: hbase_cache_blocks (bool) = false, 17: 
> parquet_file_size (i64) = 0, 18: explain_level (i32) = 1, 19: sync_ddl (bool) 
> = false, 23: disable_cached_reads (bool) = false, 24: disable_outermost_topn 
> (bool) = false, 25: rm_initial_mem (i64) = 0, 26: query_timeout_s (i32) = 0, 
> 28: appx_count_distinct (bool) = false, 29: disable_unsafe_spills (bool) = 
> false, 31: exec_single_node_rows_threshold (i32) = 100, 32: 
> optimize_partition_key_scans (bool) = false, 33: replica_preference (i32) = 
> 0, 34: schedule_random_replica (bool) = false, 35: 
> scan_node_codegen_threshold (i64) = 180, 36: 
> disable_streaming_preaggregations (bool) = false, 37: runtime_filter_mode 
> (i32) = 2, 38: runtime_bloom_filter_size (i32) = 1048576, 39: 
> runtime_filter_wait_time_ms (i32) = 0, 40: disable_row_runtime_filtering 
> (bool) = false, 41: max_num_runtime_filters (i32) = 10, 42: 
> parquet_annotate_strings_utf8 (bool) = false, 43: 
> parquet_fallback_schema_resolution (i32) = 0, 45: s3_skip_insert_staging 
> (bool) = true, 46: runtime_filter_min_size (i32) = 1048576, 47: 
> runtime_filter_max_size (i32) = 16777216, 48: prefetch_mode (i32) = 1, 49: 
> strict_mode (bool) = false, 50: scratch_limit (i64) = -1, 51: 
> enable_expr_rewrites (bool) = true, 52: decimal_v2 (bool) = false, 53: 
> parquet_dictionary_filtering (bool) = true, 54: parquet_array_resolution 
> (i32) = 2, 55: parquet_read_statistics (bool) = true, 56: 
> default_join_distribution_mode (i32) = 0, 57: disable_codegen_rows_threshold 
> (i32) = 5, 58: default_spillable_buffer_size (i64) = 2097152, 59: 
> min_spillable_buffer_size (i64) = 65536, 60: max_row_size (i64) = 524288, 61: 
> idle_session_timeout (i32) = 0, 62: compute_stats_min_sample_size (i64) = 
> 1073741824, 63: exec_time_limit_s (i32) = 0, }
> I1010 10:35:00.016960 7925 Frontend.java:952] Analyzing query: DESCRIBE 
> FDM_DB.F_T_MERCHANT_BASE_INFO
>  I1010 

[jira] [Commented] (IMPALA-6741) Profiles of running queries should tell last update time of counters

2018-10-08 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16641646#comment-16641646
 ] 

Balazs Jeszenszky commented on IMPALA-6741:
---

[~kwho] Having elapsed time can be unreliable in case the coordinator itself 
gets stuck. Having the last update timestamp is more robust but harder to 
interpret, and elapsed time would probably work more than 99% of the time. 
Ideally, I think we should include both, e.g. {{Last update time <timestamp> 
(<elapsed> ago)}}.
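A hypothetical rendering of that "both forms" counter annotation (names and layout are illustrative, not an Impala API):

```python
from datetime import datetime

# Sketch: report counter freshness both as an absolute timestamp and as
# an elapsed duration, so a stuck coordinator clock is still detectable.
def freshness_line(last_update: datetime, now: datetime) -> str:
    ago_s = (now - last_update).total_seconds()
    return (f"Last update time {last_update:%Y-%m-%d %H:%M:%S} "
            f"({ago_s:.0f}s ago)")

now = datetime(2018, 10, 8, 12, 0, 30)
print(freshness_line(datetime(2018, 10, 8, 12, 0, 0), now))
# -> Last update time 2018-10-08 12:00:00 (30s ago)
```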

> Profiles of running queries should tell last update time of counters
> 
>
> Key: IMPALA-6741
> URL: https://issues.apache.org/jira/browse/IMPALA-6741
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Balazs Jeszenszky
>Assignee: Michael Ho
>Priority: Major
>
> When looking at the profile of a running query, it's impossible to tell the 
> degree of accuracy. We've seen issues both with instances not checking in 
> with the coordinator for a long time, and with hung instances that never 
> update their counters. There are some specific issues as well, see 
> IMPALA-5200. This means that profiles taken off of running queries can't be 
> used for perf troubleshooting with confidence.
> Ideally, Impala should guarantee counters to be written at a certain 
> interval, and warn for counters or instances that are out of sync for some 
> reason.






[jira] [Commented] (IMPALA-7659) Collect count of nulls when collecting stats

2018-10-04 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16638777#comment-16638777
 ] 

Balazs Jeszenszky commented on IMPALA-7659:
---

Combination of IMPALA-7655 and IMPALA-7497.

> Collect count of nulls when collecting stats
> 
>
> Key: IMPALA-7659
> URL: https://issues.apache.org/jira/browse/IMPALA-7659
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Piotr Findeisen
>Priority: Major
>
> When Impala calculates table stats, the NULL count gets overridden with -1. 
> The number of NULLs in a table is useful information. Even if Impala does not 
> benefit from this information, some other tools do. Thus, not collecting this 
> information may pose a problem for Impala users (potentially forcing them to 
> run COMPUTE STATS elsewhere).
> Now, counting NULLs should be cheaper than counting 
> NDVs. However, a code comment in {{ComputeStatsStmt.java}} suggests otherwise 
> ([~tarmstrong] suggested this is because of IMPALA-7655).
> My suggestion would be to
> - improve expression used to collect NULL count
> - collect NULL count during COMPUTE STATS
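The proposed aggregates can be sketched in Python (a simulation with None standing in for NULL; Impala itself would use an ndv() sketch and a COUNT over an IS NULL predicate):

```python
def column_stats(values):
    """Simulate the per-column aggregates stats collection could compute:
    NDV over the non-NULL values plus an explicit NULL count (instead of -1)."""
    non_null = [v for v in values if v is not None]
    return {
        "ndv": len(set(non_null)),        # exact here; Impala uses ndv() sketches
        "num_nulls": values.count(None),  # cheap relative to NDV estimation
    }

print(column_stats([1, 2, 2, None, None, 3]))
# -> {'ndv': 3, 'num_nulls': 2}
```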






[jira] [Updated] (IMPALA-7653) Improve accuracy of compute incremental stats cardinality estimation

2018-10-04 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-7653:
--
Summary: Improve accuracy of compute incremental stats cardinality 
estimation  (was: Improve accuracy of incremental stats cardinality estimation)

> Improve accuracy of compute incremental stats cardinality estimation
> 
>
> Key: IMPALA-7653
> URL: https://issues.apache.org/jira/browse/IMPALA-7653
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Balazs Jeszenszky
>Priority: Major
>
> Currently, the operators of a compute [incremental] stats' subquery rely on 
> combined selectivities - as usual - to estimate cardinality, e.g. during 
> aggregation. For example, note the expected cardinality of the aggregation on 
> this subquery:
> {code}
> F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=4
> Per-Host Resources: mem-estimate=305.20GB mem-reservation=136.00MB
> 01:AGGREGATE [STREAMING]
> |  output: [...]
> |  group by: col_a, col_b, col_c
> |  mem-estimate=76.21GB mem-reservation=34.00MB spill-buffer=2.00MB
> |  tuple-ids=1 row-size=104.83KB cardinality=693000
> |
> 00:SCAN HDFS [default.test, RANDOM]
>partitions=1/554 files=1 size=109.65MB
>stats-rows=1506374 extrapolated-rows=disabled
>table stats: rows=821958291 size=unavailable
>column stats: all
>mem-estimate=88.00MB mem-reservation=0B
>tuple-ids=0 row-size=2.06KB cardinality=1506374
> {code}
> This was generated as a result of compute incremental stats on a single 
> partition, so the output of that aggregation is a single row. Due to the 
> width of the intermediate rows, such overestimations lead to bloated memory 
> estimates. Since the number of partitions to be updated is known at 
> plan-time, Impala could use that to set the aggregation's cardinality.
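The proposed cap can be sketched as follows (the per-column NDVs of 110, 70, and 90 are hypothetical values chosen to reproduce the 693000 estimate above; this is not actual planner code):

```python
def agg_output_cardinality(ndvs, input_rows, num_partitions=None):
    """Estimate a group-by's output rows as the product of the grouping
    columns' NDVs, capped by the input cardinality and, when known at
    plan time, by the number of partitions being updated."""
    est = 1
    for ndv in ndvs:
        est *= ndv
    est = min(est, input_rows)
    if num_partitions is not None:
        est = min(est, num_partitions)  # one output group per updated partition
    return est

print(agg_output_cardinality([110, 70, 90], input_rows=1506374))  # -> 693000
print(agg_output_cardinality([110, 70, 90], input_rows=1506374,
                             num_partitions=1))                   # -> 1
```

With the cap in place, a single-partition compute incremental stats would plan for a single output row instead of 693000, shrinking the memory estimate accordingly.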






[jira] [Created] (IMPALA-7653) Improve accuracy of incremental stats cardinality estimation

2018-10-04 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-7653:
-

 Summary: Improve accuracy of incremental stats cardinality 
estimation
 Key: IMPALA-7653
 URL: https://issues.apache.org/jira/browse/IMPALA-7653
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 3.0
Reporter: Balazs Jeszenszky


Currently, the operators of a compute [incremental] stats' subquery rely on 
combined selectivities - as usual - to estimate cardinality, e.g. during 
aggregation. For example, note the expected cardinality of the aggregation on 
this subquery:

{code}
F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=4
Per-Host Resources: mem-estimate=305.20GB mem-reservation=136.00MB
01:AGGREGATE [STREAMING]
|  output: [...]
|  group by: col_a, col_b, col_c
|  mem-estimate=76.21GB mem-reservation=34.00MB spill-buffer=2.00MB
|  tuple-ids=1 row-size=104.83KB cardinality=693000
|
00:SCAN HDFS [default.test, RANDOM]
   partitions=1/554 files=1 size=109.65MB
   stats-rows=1506374 extrapolated-rows=disabled
   table stats: rows=821958291 size=unavailable
   column stats: all
   mem-estimate=88.00MB mem-reservation=0B
   tuple-ids=0 row-size=2.06KB cardinality=1506374
{code}

This was generated as a result of compute incremental stats on a single 
partition, so the output of that aggregation is a single row. Due to the width 
of the intermediate rows, such overestimations lead to bloated memory 
estimates. Since the number of partitions to be updated is known at plan-time, 
Impala could use that to set the aggregation's cardinality.






[jira] [Commented] (IMPALA-4714) Idle session expired query goes in to exception state - And this is confusing

2018-10-03 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16637112#comment-16637112
 ] 

Balazs Jeszenszky commented on IMPALA-4714:
---

SGTM

> Idle session expired query goes in to exception state - And this is confusing
> -
>
> Key: IMPALA-4714
> URL: https://issues.apache.org/jira/browse/IMPALA-4714
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Mala Chikka Kempanna
>Priority: Major
>  Labels: query-lifecycle
>
> After setting idle_session_timeout, the Impala server moves a query into the 
> EXCEPTION state after it completes execution, if there is no client 
> activity.
> Example profile excerpt showing this behavior:
> {code}
> Query Timeline
> Start execution: 0ns (0ns)
> Planning finished: 9ms (9ms)
> Child queries finished: 8.3m (8.3m)
> Metastore update finished: 8.3m (661ms)
> Rows available: 8.3m (0ns)
> Cancelled: 11.3m (3.0m)
> Unregister query: 12.0m (42.55s)
> {code}
> Query status and query state-
> {code}
> Query Type: DDL
> Query State: EXCEPTION
> Start Time: Dec 22, 2016 11:45:01 AM
> End Time: Dec 22, 2016 11:57:01 AM
> Duration: 11m, 59s
> Admission Result: Unknown
> Client Fetch Wait Time: 3.7m
> Client Fetch Wait Time Percentage: 31
> Connected User: admin
> DDL Type: COMPUTE_STATS
> File Formats:
> Impala Version: impalad version 2.5.0-cdh5.7.2 RELEASE (build 
> 1140f8289dc0d2b1517bcf70454bb4575eb8cc70)
> Network Address: 10.17.100.123:44618
> Out of Memory: false
> Planning Wait Time: 9ms
> Planning Wait Time Percentage: 0
> Query Status: Query d141e0d996c91e72:bb8726fb917537bb expired due to client 
> inactivity (timeout is 3m)
> Session ID: 3043ff5042860968:8f92bc3bd2a0ca83
> Session Type: HIVESERVER2
> {code}
> Though the query status string very clearly says "expired due to client 
> inactivity (timeout is 3m)", the problem is with "Query State: EXCEPTION".
> This makes the user think something went wrong with query execution.
> So I recommend that queries which completed but expired due to client 
> inactivity be marked as 
> "Query State: FINISHED"






[jira] [Commented] (IMPALA-4714) Idle session expired query goes in to exception state - And this is confusing

2018-10-03 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16636601#comment-16636601
 ] 

Balazs Jeszenszky commented on IMPALA-4714:
---

[~tarmstrong], hmm. From a user POV, this is really a problem. There's a reason 
these clients don't close their queries, and timeouts are our recommended 
solution, making a timed-out query completely expected. This way the EXCEPTION 
state represents both normal and abnormal termination paths.

There are probably more complete (and more involved) solutions than just 
turning these queries into FINISHED ones (e.g. TIMEDOUT state or similar). For 
users, EXCEPTION means there's something to look into, not simply that it was 
closed by the server. Until we come up with a more complete solution, 
considering these queries normal (i.e. FINISHED) would ease the pain.

> Idle session expired query goes in to exception state - And this is confusing
> -
>
> Key: IMPALA-4714
> URL: https://issues.apache.org/jira/browse/IMPALA-4714
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Mala Chikka Kempanna
>Priority: Major
>  Labels: query-lifecycle
>
> After setting idle_session_timeout, the Impala server moves a query into the 
> EXCEPTION state after it completes execution, if there is no client 
> activity.
> Example profile excerpt showing this behavior:
> {code}
> Query Timeline
> Start execution: 0ns (0ns)
> Planning finished: 9ms (9ms)
> Child queries finished: 8.3m (8.3m)
> Metastore update finished: 8.3m (661ms)
> Rows available: 8.3m (0ns)
> Cancelled: 11.3m (3.0m)
> Unregister query: 12.0m (42.55s)
> {code}
> Query status and query state-
> {code}
> Query Type: DDL
> Query State: EXCEPTION
> Start Time: Dec 22, 2016 11:45:01 AM
> End Time: Dec 22, 2016 11:57:01 AM
> Duration: 11m, 59s
> Admission Result: Unknown
> Client Fetch Wait Time: 3.7m
> Client Fetch Wait Time Percentage: 31
> Connected User: admin
> DDL Type: COMPUTE_STATS
> File Formats:
> Impala Version: impalad version 2.5.0-cdh5.7.2 RELEASE (build 
> 1140f8289dc0d2b1517bcf70454bb4575eb8cc70)
> Network Address: 10.17.100.123:44618
> Out of Memory: false
> Planning Wait Time: 9ms
> Planning Wait Time Percentage: 0
> Query Status: Query d141e0d996c91e72:bb8726fb917537bb expired due to client 
> inactivity (timeout is 3m)
> Session ID: 3043ff5042860968:8f92bc3bd2a0ca83
> Session Type: HIVESERVER2
> {code}
> Though the query status string very clearly says "expired due to client 
> inactivity (timeout is 3m)", the problem is with "Query State: EXCEPTION".
> This makes the user think something went wrong with query execution.
> So I recommend that queries which completed but expired due to client 
> inactivity be marked as 
> "Query State: FINISHED"






[jira] [Commented] (IMPALA-7649) Missing ACL on implicitly created partitions

2018-10-02 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16635565#comment-16635565
 ] 

Balazs Jeszenszky commented on IMPALA-7649:
---

[~zherczeg] this is expected - from the doc 
(http://impala.apache.org/docs/build/html/topics/impala_insert.html):
bq. By default, if an INSERT statement creates any new subdirectories 
underneath a partitioned table, those subdirectories are assigned default HDFS 
permissions for the impala user. To make each subdirectory have the same 
permissions as its parent directory in HDFS, specify the 
--insert_inherit_permissions startup option for the impalad daemon.

> Missing ACL on implicitly created partitions
> 
>
> Key: IMPALA-7649
> URL: https://issues.apache.org/jira/browse/IMPALA-7649
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zsolt Herczeg
>Priority: Minor
>
> If a partition is created via ALTER TABLE, Impala will apply an ACL 
> (group:hive:rwx) to the partition folder to ensure compatibility with 
> HiveServer2.
> In case a partition is created implicitly, during an INSERT query, this ACL 
> is not applied, causing interoperability issues.
> Steps to reproduce:
> {code:java}
> CREATE EXTERNAL TABLE test (a int) PARTITIONED BY (y int) STORED AS TEXTFILE 
> LOCATION '/user/admin/test/';
> ALTER TABLE test ADD PARTITION (y=1);
> INSERT INTO test (a) PARTITION (y=2) VALUES (2);{code}
> Resulting ACLs are:
> {code:java}
> hdfs dfs -getfacl /user/admin/test/
> # file: /user/admin/test
> # owner: impala
> # group: admin
> user::rwx
> group::rwx
> group:hive:rwx
> mask::rwx
> other::rwx
> hdfs dfs -getfacl /user/admin/test/y=1
> # file: /user/admin/test/y=1
> # owner: impala
> # group: admin
> user::rwx
> group::rwx
> group:hive:rwx
> mask::rwx
> other::rwx
> hdfs dfs -getfacl /user/admin/test/y=2
> # file: /user/admin/test/y=2
> # owner: impala
> # group: admin
> user::rwx
> group::r-x
> other::r-x{code}
>  






[jira] [Updated] (IMPALA-7642) Optimize UDF jar handling in Catalog

2018-10-01 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-7642:
--
Attachment: test.html

> Optimize UDF jar handling in Catalog
> 
>
> Key: IMPALA-7642
> URL: https://issues.apache.org/jira/browse/IMPALA-7642
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 3.0
>Reporter: Miklos Szurap
>Priority: Major
>
> 1. Optimize UDF jar loading
> During startup and global invalidate metadata calls, for each database the 
> [CatalogServiceCatalog.loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L956]
>  is called, which calls 
> [extractFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FunctionUtils.java#L68]
>  for each function found in HMS, and for each function the related UDF jar 
> file is downloaded from HDFS to the localLibraryPath (file:///tmp). It is not 
> uncommon for UDFs to be packaged not separately but in all-in-one fat jars, 
> so they can be 10-50 MB in size. Sometimes there are hundreds of functions 
> in a database (usually related to the same project), all pointing to the 
> same UDF jar. The above method downloads the same jar hundreds of times, 
> "extracts the function", and deletes the local file.
> The suggestion would be to improve this by:
> - creating a local "cache" in CatalogServiceCatalog.loadJavaFunctions() as a 
> HashMap (map of jarUri -> localJarPath)
> - pass this cache to FunctionUtils.extractFunctions, which checks if the 
> cache already contains the jarUri. If not, downloads the jar, and puts it 
> into the cache (and does everything else needed)
> - move the FileSystemUtil.deleteIfExists(localJarPath) from extractFunctions 
> to loadJavaFunctions - in a finally block iterate over the cache entries 
> (values) and delete the local files, and at the end clear the cache.
> 2. Use {{Set}} instead of {{List}} for addedSignatures in 
> [FunctionUtils.extractFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FunctionUtils.java#L73]:
> It just tracks which function signatures were added, for that purpose a Set 
> is fine. 
> {noformat}
> if (!addedSignatures.contains(fn.signatureString())){noformat}
> This would be faster ( {{O( 1 )}} ) with a HashSet (compared to ArrayList's 
> {{O( n )}} for the contains method).
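The suggested flow can be sketched in Python (hypothetical structure, not the actual Catalog code): each distinct jar URI is fetched only once, and added signatures are tracked in a set for O(1) membership checks:

```python
def load_java_functions(functions, download_jar):
    """Sketch of the proposed flow: download each distinct jar URI once,
    extract every function from the cached local copy, then clean up."""
    jar_cache = {}            # jarUri -> localJarPath
    added_signatures = set()  # set gives O(1) contains vs a list's O(n)
    loaded = []
    try:
        for fn in functions:
            uri = fn["jar_uri"]
            if uri not in jar_cache:
                jar_cache[uri] = download_jar(uri)  # hit HDFS once per jar
            if fn["signature"] not in added_signatures:
                added_signatures.add(fn["signature"])
                loaded.append((fn["signature"], jar_cache[uri]))
    finally:
        jar_cache.clear()  # the real code would also delete the local files here
    return loaded

downloads = []
def fake_download(uri):
    downloads.append(uri)
    return "/tmp/" + uri.rsplit("/", 1)[-1]

fns = [{"jar_uri": "hdfs:///udfs/fat.jar", "signature": f"f{i}()"}
       for i in range(3)]
print(load_java_functions(fns, fake_download), len(downloads))
# three functions extracted, but only one download
```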






[jira] [Updated] (IMPALA-7642) Optimize UDF jar handling in Catalog

2018-10-01 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-7642:
--
Attachment: (was: test.html)

> Optimize UDF jar handling in Catalog
> 
>
> Key: IMPALA-7642
> URL: https://issues.apache.org/jira/browse/IMPALA-7642
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 3.0
>Reporter: Miklos Szurap
>Priority: Major
>
> 1. Optimize UDF jar loading
> During startup and global invalidate metadata calls, for each database the 
> [CatalogServiceCatalog.loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L956]
>  is called, which calls 
> [extractFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FunctionUtils.java#L68]
>  for each function found in HMS, and for each function the related UDF jar 
> file is downloaded from HDFS to the localLibraryPath (file:///tmp). It is not 
> uncommon for UDFs to be packaged not separately but in all-in-one fat jars, 
> so they can be 10-50 MB in size. Sometimes there are hundreds of functions 
> in a database (usually related to the same project), all pointing to the 
> same UDF jar. The above method downloads the same jar hundreds of times, 
> "extracts the function", and deletes the local file.
> The suggestion would be to improve this by:
> - creating a local "cache" in CatalogServiceCatalog.loadJavaFunctions() as a 
> HashMap (map of jarUri -> localJarPath)
> - pass this cache to FunctionUtils.extractFunctions, which checks if the 
> cache already contains the jarUri. If not, downloads the jar, and puts it 
> into the cache (and does everything else needed)
> - move the FileSystemUtil.deleteIfExists(localJarPath) from extractFunctions 
> to loadJavaFunctions - in a finally block iterate over the cache entries 
> (values) and delete the local files, and at the end clear the cache.
> 2. Use {{Set}} instead of {{List}} for addedSignatures in 
> [FunctionUtils.extractFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FunctionUtils.java#L73]:
> It just tracks which function signatures were added, for that purpose a Set 
> is fine. 
> {noformat}
> if (!addedSignatures.contains(fn.signatureString())){noformat}
> This would be faster ( {{O( 1 )}} ) with a HashSet (compared to ArrayList's 
> {{O( n )}} for the contains method).






[jira] [Comment Edited] (IMPALA-7641) Memory Limit Exceeded

2018-09-28 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631469#comment-16631469
 ] 

Balazs Jeszenszky edited comment on IMPALA-7641 at 9/28/18 7:31 AM:


[~ahshan...@gmail.com] that's just an estimated memory requirement and should 
not be relied on. The query can't run within 20GB, so it fails, which is the 
expected behaviour of mem_limit. The reason it can't run, judging by the error 
message {{HdfsParquetScanner::ReadDataPage() failed to allocate 270212389 bytes 
for dictionary.}}, has to do with a wide or just big parquet file. The average 
file size in that partition is ~35 MB, so it should be easy to spot the 
outlier, if any. I doubt we'd need a 270MB+ dictionary for a 35MB file.


was (Author: jeszyb):
[~ahshan...@gmail.com] that's just an estimated memory requirement and should 
not be relied on. The query can't run within 20GB, so it fails, which is the 
expected behaviour of mem_limit. The reason it can run, judging by the error 
message {{HdfsParquetScanner::ReadDataPage() failed to allocate 270212389 bytes 
for dictionary.}}, has to do with a wide or just big parquet file. The average 
file size in that partition is ~35 MB, so it should be easy to spot the 
outlier, if any. I doubt we'd need a 270MB+ dictionary for a 35MB file.

> Memory Limit Exceeded
> -
>
> Key: IMPALA-7641
> URL: https://issues.apache.org/jira/browse/IMPALA-7641
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.6.4
>Reporter: Ahshan
>Priority: Minor
>  Labels: memory
> Attachments: profile(8).txt
>
>
> We are using CDH distribution with impala version -impalad version 
> 2.6.0-cdh5.8.2 RELEASE 
>  
> As per my understanding, the memory requirement is 288 MB and we have a total 
> of 18 Impala daemons, which sums up to 5184 MB of total memory consumption. 
> Considering the above details, it should not lead to a memory issue when 
> MEM_LIMIT is set to 20 GB.
> Hence, could you please let us know the cause of the memory limit being exceeded: 
> select * from emp_sales where job_id = 55451 and uploaded_month = 201808 
> limit 1
>  +---+
> |Explain String|
> +---+
> |Estimated Per-Host Requirements: Memory=288.00MB VCores=1|
> | |
> |01:EXCHANGE [UNPARTITIONED]|
> | |limit: 1|
> | | |
> |00:SCAN HDFS [fenet5.hmig_os_changes_details_malicious]|
> |partitions=1/25 files=3118 size=110.01GB|
> |predicates: job_id = 55451|
> |limit: 1|
> +---+
> WARNINGS: 
>  Memory limit exceeded
>  HdfsParquetScanner::ReadDataPage() failed to allocate 269074889 bytes for 
> dictionary.
> Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 257.23 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.63 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.27 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.39 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 16.09 KB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.74 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.74 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.74 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could 

[jira] [Commented] (IMPALA-7641) Memory Limit Exceeded

2018-09-28 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16631469#comment-16631469
 ] 

Balazs Jeszenszky commented on IMPALA-7641:
---

[~ahshan...@gmail.com] that's just an estimated memory requirement and should 
not be relied on. The query can't run within 20GB, so it fails, which is the 
expected behaviour of mem_limit. The reason it can run, judging by the error 
message {{HdfsParquetScanner::ReadDataPage() failed to allocate 270212389 bytes 
for dictionary.}}, has to do with a wide or just big parquet file. The average 
file size in that partition is ~35 MB, so it should be easy to spot the 
outlier, if any. I doubt we'd need a 270MB+ dictionary for a 35MB file.
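The average-file-size reasoning above is simple arithmetic over the plan's scan numbers:

```python
# Back-of-the-envelope check from the scan node: 3118 files totalling
# 110.01 GB, so the average file is only ~36 MB -- nowhere near needing a
# ~270 MB dictionary page, which points at a single outlier file.
total_bytes = 110.01 * 1024 ** 3
avg_mb = total_bytes / 3118 / 1024 ** 2
print(round(avg_mb, 1))  # -> 36.1
```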

> Memory Limit Exceeded
> -
>
> Key: IMPALA-7641
> URL: https://issues.apache.org/jira/browse/IMPALA-7641
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.6.4
>Reporter: Ahshan
>Priority: Blocker
>  Labels: memory
> Attachments: profile(8).txt
>
>
> We are using CDH distribution with impala version -impalad version 
> 2.6.0-cdh5.8.2 RELEASE 
>  
> As per my understanding, the memory requirement is 288 MB and we have a total 
> of 18 Impala daemons, which sums up to 5184 MB of total memory consumption. 
> Considering the above details, it should not lead to a memory issue when 
> MEM_LIMIT is set to 20 GB.
> Hence, could you please let us know the cause of the memory limit being exceeded: 
> select * from emp_sales where job_id = 55451 and uploaded_month = 201808 
> limit 1
>  +---+
> |Explain String|
> +---+
> |Estimated Per-Host Requirements: Memory=288.00MB VCores=1|
> | |
> |01:EXCHANGE [UNPARTITIONED]|
> | |limit: 1|
> | | |
> |00:SCAN HDFS [fenet5.hmig_os_changes_details_malicious]|
> |partitions=1/25 files=3118 size=110.01GB|
> |predicates: job_id = 55451|
> |limit: 1|
> +---+
> WARNINGS: 
>  Memory limit exceeded
>  HdfsParquetScanner::ReadDataPage() failed to allocate 269074889 bytes for 
> dictionary.
> Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 257.23 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.63 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.27 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.39 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 16.09 KB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.74 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.74 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.74 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 15.20 KB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.64 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.64 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.64 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 14.61 KB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.64 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.64 GB
>  

[jira] [Updated] (IMPALA-7641) Memory Limit Exceeded

2018-09-28 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-7641:
--
Priority: Minor  (was: Blocker)

> Memory Limit Exceeded
> -
>
> Key: IMPALA-7641
> URL: https://issues.apache.org/jira/browse/IMPALA-7641
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.6.4
>Reporter: Ahshan
>Priority: Minor
>  Labels: memory
> Attachments: profile(8).txt
>
>
> We are using CDH distribution with impala version -impalad version 
> 2.6.0-cdh5.8.2 RELEASE 
>  
> As per my understanding, the memory requirement is 288 MB and we have a total 
> of 18 Impala daemons, which sums up to 5184 MB of total memory consumption. 
> Considering the above details, it should not lead to a memory issue when 
> MEM_LIMIT is set to 20 GB.
> Hence, could you please let us know the cause of the memory limit being exceeded: 
> select * from emp_sales where job_id = 55451 and uploaded_month = 201808 
> limit 1
>  +---+
> |Explain String|
> +---+
> |Estimated Per-Host Requirements: Memory=288.00MB VCores=1|
> | |
> |01:EXCHANGE [UNPARTITIONED]|
> | |limit: 1|
> | | |
> |00:SCAN HDFS [fenet5.hmig_os_changes_details_malicious]|
> |partitions=1/25 files=3118 size=110.01GB|
> |predicates: job_id = 55451|
> |limit: 1|
> +---+
> WARNINGS: 
>  Memory limit exceeded
>  HdfsParquetScanner::ReadDataPage() failed to allocate 269074889 bytes for 
> dictionary.
> Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 257.23 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.63 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.27 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 255.39 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=20.00 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=20.00 GB
>  HDFS_SCAN_NODE (id=0): Consumption=20.00 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 16.09 KB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.74 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.74 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.74 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 15.20 KB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.64 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.64 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.64 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 14.61 KB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.64 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.64 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.64 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  HDFS_SCAN_NODE (id=0) could not allocate 257.11 MB without exceeding limit.
>  Query(294eb435fbf8fc63:f529602818758c80) Limit: Limit=20.00 GB 
> Consumption=19.47 GB
>  Fragment 294eb435fbf8fc63:f529602818758c8b: Consumption=19.47 GB
>  HDFS_SCAN_NODE (id=0): Consumption=19.47 GB
>  DataStreamSender: Consumption=1.45 KB
>  Block Manager: Limit=16.00 GB Consumption=0
>  Memory Limit Exceeded
>  

[jira] [Commented] (IMPALA-7310) Compute Stats not computing NULLs as a distinct value causing wrong estimates

2018-09-19 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16621143#comment-16621143
 ] 

Balazs Jeszenszky commented on IMPALA-7310:
---

[~Paul.Rogers] this one is about the fact that compute stats will come up with 
a 0 NDV for an all-NULL column of non-zero cardinality. As a nice-to-have, we 
could make cardinality estimation more robust against columns with 0 NDV, but 
after this fix, that value would either be valid or manually set.
The rest sound like valid improvements (they're probably that way because no 
one has made them better yet), but they would need their own jiras.

FWIW, IMPALA-7528 is also somewhat related, if you want to fix these in one go.
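To illustrate the estimation anomaly being discussed, here is a minimal sketch (not Impala's actual planner code; class and method names are invented) of how a grouping-cardinality estimate built as the product of per-column NDVs collapses to 0 when one column is all NULLs and its stored NDV is 0, and how counting NULL as a distinct value avoids that:

```java
// Sketch of product-of-NDVs cardinality estimation and the all-NULL pitfall.
class NdvProductSketch {
    // Current-style estimate: multiply the per-column NDVs together.
    static long estimateGroupingCardinality(long[] columnNdvs) {
        long card = 1;
        for (long ndv : columnNdvs) {
            card *= ndv;  // a single 0 NDV zeroes the whole product
        }
        return card;
    }

    // Proposed-style estimate: count NULL as one extra distinct value,
    // and never let a column contribute less than 1.
    static long estimateWithNullAsValue(long[] ndvs, boolean[] hasNulls) {
        long card = 1;
        for (int i = 0; i < ndvs.length; i++) {
            long ndv = ndvs[i] + (hasNulls[i] ? 1 : 0);
            card *= Math.max(ndv, 1);
        }
        return card;
    }
}
```

With a second column that is entirely NULL (stored NDV 0), the first estimate returns 0 rows for any grouping that includes it, while the second keeps the estimate driven by the other columns.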

> Compute Stats not computing NULLs as a distinct value causing wrong estimates
> -
>
> Key: IMPALA-7310
> URL: https://issues.apache.org/jira/browse/IMPALA-7310
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, 
> Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Zsombor Fedor
>Assignee: Paul Rogers
>Priority: Major
>
> As seen in other DBMSs
> {code:java}
> NDV(col){code}
> not counting NULL as a distinct value. The same also applies to
> {code:java}
> COUNT(DISTINCT col){code}
> This is working as intended, but when computing column statistics it can 
> cause some anomalies (e.g. bad join order), as compute stats uses NDV() to 
> determine columns' NDVs.
>  
> For example when aggregating more columns, the estimated cardinality is 
> [counted as the product of the columns' number of distinct 
> values.|https://github.com/cloudera/Impala/blob/64cd0bb0c3529efa0ab5452c4e9e2a04fd815b4f/fe/src/main/java/org/apache/impala/analysis/Expr.java#L669]
>  If there is a column full of NULLs the whole product will be 0.
>  
> There are two possible fixes for this.
> Either we should count NULLs as a distinct value when computing stats in the 
> query:
> {code:java}
> SELECT NDV(a) + COUNT(DISTINCT CASE WHEN a IS NULL THEN 1 END) AS a, CAST(-1 
> as BIGINT), 4, CAST(4 as DOUBLE) FROM test;{code}
> instead of
> {code:java}
> SELECT NDV(a) AS a, CAST(-1 as BIGINT), 4, CAST(4 as DOUBLE) FROM test;{code}
>  
>  
> Or we should change the planner 
> [function|https://github.com/cloudera/Impala/blob/2d2579cb31edda24457d33ff5176d79b7c0432c5/fe/src/main/java/org/apache/impala/planner/AggregationNode.java#L169]
>  to take care of this bug.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-7592) Planner should correctly estimate UnionNode host count

2018-09-19 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-7592:
-

 Summary: Planner should correctly estimate UnionNode host count
 Key: IMPALA-7592
 URL: https://issues.apache.org/jira/browse/IMPALA-7592
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 3.0
Reporter: Balazs Jeszenszky


Currently, the planner estimates UnionNode host count to be the maximum of all 
its children. In reality, scheduler will put these nodes on the union of its 
inputs' hosts. We should update the planner to correctly account for this 
behaviour.
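The difference between the two estimates can be sketched as follows (host names and class names are invented for illustration; this is not the scheduler's actual code):

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Contrast the current planner estimate (maximum of the children's host
// counts) with the scheduler's actual behaviour (size of the union of the
// children's host sets) for a UnionNode.
class UnionHostEstimateSketch {
    // Current planner estimate: max over children.
    static int maxOfChildren(List<Set<String>> childHosts) {
        int max = 0;
        for (Set<String> hosts : childHosts) max = Math.max(max, hosts.size());
        return max;
    }

    // Actual scheduling behaviour: union of all children's hosts.
    static int unionOfChildren(List<Set<String>> childHosts) {
        Set<String> union = new HashSet<>();
        for (Set<String> hosts : childHosts) union.addAll(hosts);
        return union.size();
    }
}
```

With children on {host1, host2} and {host2, host3}, the current estimate is 2 hosts while the scheduler actually places the UnionNode on 3.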






[jira] [Created] (IMPALA-7587) Follow up for IMPALA-2636: consider fixing GetTables() for tables with not loaded metadata

2018-09-18 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-7587:
-

 Summary: Follow up for IMPALA-2636: consider fixing GetTables() 
for tables with not loaded metadata
 Key: IMPALA-7587
 URL: https://issues.apache.org/jira/browse/IMPALA-7587
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 3.0
Reporter: Balazs Jeszenszky


Since IMPALA-2636, the HS2 GetTables() call returns the correct type (TABLE or 
VIEW) for tables with already loaded metadata. However, for objects not yet 
loaded, it just returns the default table, even for views.

If possible, we should be able to return the correct response regardless of 
whether a table's metadata is fully loaded.






[jira] [Updated] (IMPALA-7578) More flexible placement rules

2018-09-17 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-7578:
--
Description: 
Currently, placement rules can be defined based on user names or 
(primary/secondary) group names as a direct mapping. This encourages creating 
many more pools than might be needed, and leads to administrative overhead when 
changing names or adding a new user/group.

It would be good to have a more elaborate mapping, e.g. translate user/group 
names into pools (including potentially mapping multiple users/groups to the 
same pool), either manually or via regular expressions.

  was:
Currently, placement rules can be defined based on user names or 
(primary/secondary) group names as a direct mapping. This promotes creating 
much more pools than might be needed, and leads to administrative overhead in 
when changing names or adding a new user/group.

It would be good to have a more elaborate mapping, e.g. translate user/group 
names into pools (including potentially mapping multiple users/groups to the 
same pool), either manually or via regular expressions.


> More flexible placement rules
> -
>
> Key: IMPALA-7578
> URL: https://issues.apache.org/jira/browse/IMPALA-7578
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Balazs Jeszenszky
>Priority: Major
>
> Currently, placement rules can be defined based on user names or 
> (primary/secondary) group names as a direct mapping. This encourages creating 
> many more pools than might be needed, and leads to administrative overhead 
> when changing names or adding a new user/group.
> It would be good to have a more elaborate mapping, e.g. translate user/group 
> names into pools (including potentially mapping multiple users/groups to the 
> same pool), either manually or via regular expressions.






[jira] [Created] (IMPALA-7578) More flexible placement rules

2018-09-17 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-7578:
-

 Summary: More flexible placement rules
 Key: IMPALA-7578
 URL: https://issues.apache.org/jira/browse/IMPALA-7578
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 3.0
Reporter: Balazs Jeszenszky


Currently, placement rules can be defined based on user names or 
(primary/secondary) group names as a direct mapping. This encourages creating 
many more pools than might be needed, and leads to administrative overhead when 
changing names or adding a new user/group.

It would be good to have a more elaborate mapping, e.g. translate user/group 
names into pools (including potentially mapping multiple users/groups to the 
same pool), either manually or via regular expressions.
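A minimal sketch of the kind of regex-based mapping proposed above (pool names, rule names, and the API shape here are all invented for illustration, not an existing Impala interface): rules are tried in insertion order and the first matching pattern decides the pool, so many users/groups can share one pool.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Ordered regex rules mapping a user or group name to a resource pool.
class PlacementRuleSketch {
    private final Map<Pattern, String> rules = new LinkedHashMap<>();

    // Register a rule: names fully matching 'regex' go to 'pool'.
    void addRule(String regex, String pool) {
        rules.put(Pattern.compile(regex), pool);
    }

    // First matching rule wins; otherwise fall back to the default pool.
    String resolvePool(String userOrGroup, String defaultPool) {
        for (Map.Entry<Pattern, String> e : rules.entrySet()) {
            if (e.getKey().matcher(userOrGroup).matches()) return e.getValue();
        }
        return defaultPool;
    }
}
```

For example, a single rule `etl_.*` -> `pool_etl` would cover every ETL group without creating one pool per group.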






[jira] [Resolved] (IMPALA-7562) Caused by: java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: Unknown.

2018-09-12 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky resolved IMPALA-7562.
---
Resolution: Not A Bug

You would probably have to recreate your JDBC connections and implement retry 
logic within your app.
Resolving, since Impala itself is working fine and things recover once you 
restart the web application.

> Caused by: java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500593) 
> Communication link failure. Failed to connect to server. Reason: Unknown.
> 
>
> Key: IMPALA-7562
> URL: https://issues.apache.org/jira/browse/IMPALA-7562
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.12.0
> Environment: centOS 7
>Reporter: ruiliang
>Priority: Major
>  Labels: impala, impala_jdbc
> Attachments: ecliseDeubgCosnle.log
>
>   Original Estimate: 12h
>  Remaining Estimate: 12h
>
>  
> I encountered a very strange problem with Spring Boot configured to query 
> Impala over JDBC. Under normal circumstances all SQL queries work and the 
> statements are fine. However, after I restart the Impala service, this 
> exception is thrown, with the same error code for every SQL query. Only 
> after I restart my Spring Boot web service do all queries work again, as I 
> have verified many times. I looked through the server logs and it seems the 
> server never received the requests. Could the driver be the cause? My Impala 
> is a three-node cluster built under CDH.
> I really cannot find the reason; please help take a look. Thank you.
>  
>  ClouderaImpalaJDBC41-2.6.4.1005.zip
> ImpalaJDBC41.jar
> {code:java}
> // code placeholder
> spring.secondary-datasource.type=com.cloudera.impala.jdbc41.Driver
> datasource.url=jdbc:impala://39.108.9.1:21050/ADM_DB;AuthMech=0;LogLevel=5;LogPath=d:\\temp;
> spring.secondary-datasource.druid.initialSize=2
> spring.secondary-datasource.druid.minIdle=2
> spring.secondary-datasource.druid.maxActive=30
> {code}
>  
>  
>  
> {code:java}
> // code placeholder
> Resolving exception from handler [public com.jx.data.biz.bean.ResultBean 
> com.jx.data.biz.distribution.web.DistributionAnalysisController.show(javax.servlet.http.HttpServletRequest,javax.servlet.http.HttpServletResponse,com.jx.data.biz.distribution.bean.DistributionAnalysisBean)]:
>  org.springframework.dao.DataAccessResourceFailureException: 
> PreparedStatementCallback; SQL [select BUS_DT dt, sum(CASE WHEN METRIC_VAL 
> >=1 and METRIC_VAL<=3 THEN USER_CNT ELSE 0 END) as '1-3', sum(CASE WHEN 
> METRIC_VAL >=4 and METRIC_VAL<=8 THEN USER_CNT ELSE 0 END) as '4-8', sum(CASE 
> WHEN METRIC_VAL >=9 THEN USER_CNT ELSE 0 END) as '9-x' from 
> A_T_BASE_KPI_USER_CNT_SUM_D where BUS_DT>='2018-09-1' and 
> BUS_DT<='2018-09-11' and METRIC_TYPE_CD='BUBAJRGWC01001_COUNT' GROUP BY 
> BUS_DT order by BUS_DT asc ]; [Cloudera][ImpalaJDBCDriver](500593) 
> Communication link failure. Failed to connect to server. Reason: Unknown.; 
> nested exception is java.sql.SQLException: 
> [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to 
> connect to server. Reason: Unknown.
> 2018-09-12 14:59:40.035 DEBUG [jx-data-analysis,,,] 10628 --- 
> [nio-9005-exec-2] o.s.web.servlet.DispatcherServlet : Could not complete 
> request
> Could not complete request
> org.springframework.dao.DataAccessResourceFailureException: 
> PreparedStatementCallback; SQL [select BUS_DT dt, sum(CASE WHEN METRIC_VAL 
> >=1 and METRIC_VAL<=3 THEN USER_CNT ELSE 0 END) as '1-3', sum(CASE WHEN 
> METRIC_VAL >=4 and METRIC_VAL<=8 THEN USER_CNT ELSE 0 END) as '4-8', sum(CASE 
> WHEN METRIC_VAL >=9 THEN USER_CNT ELSE 0 END) as '9-x' from 
> A_T_BASE_KPI_USER_CNT_SUM_D where BUS_DT>='2018-09-1' and 
> BUS_DT<='2018-09-11' and METRIC_TYPE_CD='BUBAJRGWC01001_COUNT' GROUP BY 
> BUS_DT order by BUS_DT asc ]; [Cloudera][ImpalaJDBCDriver](500593) 
> Communication link failure. Failed to connect to server. Reason: Unknown.; 
> nested exception is java.sql.SQLException: 
> [Cloudera][ImpalaJDBCDriver](500593) Communication link failure. Failed to 
> connect to server. Reason: Unknown.
> at 
> org.springframework.jdbc.support.SQLStateSQLExceptionTranslator.doTranslate(SQLStateSQLExceptionTranslator.java:105)
>  ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
> at 
> org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:73)
>  ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]
> at 
> org.springframework.jdbc.support.AbstractFallbackSQLExceptionTranslator.translate(AbstractFallbackSQLExceptionTranslator.java:82)
>  ~[spring-jdbc-4.3.12.RELEASE.jar:4.3.12.RELEASE]

[jira] [Commented] (IMPALA-7528) Division by zero when computing cardinalities of many to many joins on NULL columns

2018-09-05 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16604275#comment-16604275
 ] 

Balazs Jeszenszky commented on IMPALA-7528:
---

Having an NDV of 0 on the right side of the join also trips 
https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L305,
 turning every join involving that column into a many to many join.
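A minimal sketch of the division-by-zero hazard and the guard being asked for (this is not JoinNode.java itself; the formula shown is the generic |L| * |R| / NDV many-to-many estimate, with names invented here):

```java
// Generic join cardinality estimate with a guard against a 0 NDV
// reported for an all-NULL join column.
class JoinCardSketch {
    static long joinCardinality(long lhsRows, long rhsRows, long joinColNdv) {
        // Guard: clamp the NDV to at least 1 so an all-NULL column
        // (stored NDV 0) degrades to a cross-product estimate instead
        // of dividing by zero.
        long ndv = Math.max(joinColNdv, 1);
        return (lhsRows / ndv) * rhsRows;
    }
}
```

Without the `Math.max` clamp, the 0 NDV from the plan above would trigger a division by zero; with it, the estimate is merely (very) pessimistic.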

> Division by zero when computing cardinalities of many to many joins on NULL 
> columns
> ---
>
> Key: IMPALA-7528
> URL: https://issues.apache.org/jira/browse/IMPALA-7528
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Priority: Major
>
> The following:
> {code:java}
> | F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1 |
> | Per-Host Resources: mem-estimate=33.94MB mem-reservation=1.94MB|
> | 02:HASH JOIN [INNER JOIN, BROADCAST]   |
> | |  hash predicates: b.code = a.code|
> | |  fk/pk conjuncts: none   |
> | |  runtime filters: RF000 <- a.code|
> | |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB |
> | |  tuple-ids=1,0 row-size=163B cardinality=9223372036854775807 |
> | |  |
> | |--03:EXCHANGE [BROADCAST] |
> | |  |  mem-estimate=0B mem-reservation=0B   |
> | |  |  tuple-ids=0 row-size=82B cardinality=823 |
> | |  |   |
> | |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1  |
> | |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B |
> | |  00:SCAN HDFS [default.sample_07 a, RANDOM]  |
> | | partitions=1/1 files=1 size=44.98KB  |
> | | stats-rows=823 extrapolated-rows=disabled|
> | | table stats: rows=823 size=44.98KB   |
> | | column stats: all|
> | | mem-estimate=32.00MB mem-reservation=0B  |
> | | tuple-ids=0 row-size=82B cardinality=823 |
> | |  |
> | 01:SCAN HDFS [default.sample_08 b, RANDOM] |
> |partitions=1/1 files=1 size=44.99KB |
> |runtime filters: RF000 -> b.code|
> |stats-rows=823 extrapolated-rows=disabled   |
> |table stats: rows=823 size=44.99KB  |
> |column stats: all   |
> |mem-estimate=32.00MB mem-reservation=0B |
> |tuple-ids=1 row-size=82B cardinality=823|
> ++
> {code}
> is the result of both join columns having 0 as NDV.
> https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L368
> should handle this more gracefully.
> IMPALA-7310 makes it a bit more likely that someone will run into this. 






[jira] [Updated] (IMPALA-7528) Division by zero when computing cardinalities of many to many joins on NULL columns

2018-09-05 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-7528:
--
Description: 
The following:

{code:java}
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1 |
| Per-Host Resources: mem-estimate=33.94MB mem-reservation=1.94MB|
| 02:HASH JOIN [INNER JOIN, BROADCAST]   |
| |  hash predicates: b.code = a.code|
| |  fk/pk conjuncts: none   |
| |  runtime filters: RF000 <- a.code|
| |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB |
| |  tuple-ids=1,0 row-size=163B cardinality=9223372036854775807 |
| |  |
| |--03:EXCHANGE [BROADCAST] |
| |  |  mem-estimate=0B mem-reservation=0B   |
| |  |  tuple-ids=0 row-size=82B cardinality=823 |
| |  |   |
| |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1  |
| |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B |
| |  00:SCAN HDFS [default.sample_07 a, RANDOM]  |
| | partitions=1/1 files=1 size=44.98KB  |
| | stats-rows=823 extrapolated-rows=disabled|
| | table stats: rows=823 size=44.98KB   |
| | column stats: all|
| | mem-estimate=32.00MB mem-reservation=0B  |
| | tuple-ids=0 row-size=82B cardinality=823 |
| |  |
| 01:SCAN HDFS [default.sample_08 b, RANDOM] |
|partitions=1/1 files=1 size=44.99KB |
|runtime filters: RF000 -> b.code|
|stats-rows=823 extrapolated-rows=disabled   |
|table stats: rows=823 size=44.99KB  |
|column stats: all   |
|mem-estimate=32.00MB mem-reservation=0B |
|tuple-ids=1 row-size=82B cardinality=823|
++
{code}

is the result of both join columns having 0 as NDV.
https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L368
should handle this more gracefully.

IMPALA-7310 makes it a bit more likely that someone will run into this. 

  was:
The following:

{code:java}
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1 |
| Per-Host Resources: mem-estimate=33.94MB mem-reservation=1.94MB|
| 02:HASH JOIN [INNER JOIN, BROADCAST]   |
| |  hash predicates: b.code = a.code|
| |  fk/pk conjuncts: none   |
| |  runtime filters: RF000 <- a.code|
| |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB |
| |  tuple-ids=1,0 row-size=163B cardinality=9223372036854775807 |
| |  |
| |--03:EXCHANGE [BROADCAST] |
| |  |  mem-estimate=0B mem-reservation=0B   |
| |  |  tuple-ids=0 row-size=82B cardinality=823 |
| |  |   |
| |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1  |
| |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B |
| |  00:SCAN HDFS [default.sample_07 a, RANDOM]  |
| | partitions=1/1 files=1 size=44.98KB  |
| | stats-rows=823 extrapolated-rows=disabled|
| | table stats: rows=823 size=44.98KB   |
| | column stats: all|
| | mem-estimate=32.00MB mem-reservation=0B  |
| | tuple-ids=0 row-size=82B cardinality=823 |
| |  |
| 01:SCAN HDFS [default.sample_08 b, RANDOM] |
|partitions=1/1 files=1 size=44.99KB |
|runtime filters: RF000 -> b.code|
|stats-rows=823 extrapolated-rows=disabled   |
|table stats: rows=823 size=44.99KB  |
|column stats: all   |
|mem-estimate=32.00MB mem-reservation=0B |
|tuple-ids=1 

[jira] [Created] (IMPALA-7528) Division by zero when computing cardinalities of many to many joins on NULL columns

2018-09-05 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-7528:
-

 Summary: Division by zero when computing cardinalities of many to 
many joins on NULL columns
 Key: IMPALA-7528
 URL: https://issues.apache.org/jira/browse/IMPALA-7528
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 2.12.0
Reporter: Balazs Jeszenszky


The following:

{code:java}
| F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1 |
| Per-Host Resources: mem-estimate=33.94MB mem-reservation=1.94MB|
| 02:HASH JOIN [INNER JOIN, BROADCAST]   |
| |  hash predicates: b.code = a.code|
| |  fk/pk conjuncts: none   |
| |  runtime filters: RF000 <- a.code|
| |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB |
| |  tuple-ids=1,0 row-size=163B cardinality=9223372036854775807 |
| |  |
| |--03:EXCHANGE [BROADCAST] |
| |  |  mem-estimate=0B mem-reservation=0B   |
| |  |  tuple-ids=0 row-size=82B cardinality=823 |
| |  |   |
| |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1  |
| |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B |
| |  00:SCAN HDFS [default.sample_07 a, RANDOM]  |
| | partitions=1/1 files=1 size=44.98KB  |
| | stats-rows=823 extrapolated-rows=disabled|
| | table stats: rows=823 size=44.98KB   |
| | column stats: all|
| | mem-estimate=32.00MB mem-reservation=0B  |
| | tuple-ids=0 row-size=82B cardinality=823 |
| |  |
| 01:SCAN HDFS [default.sample_08 b, RANDOM] |
|partitions=1/1 files=1 size=44.99KB |
|runtime filters: RF000 -> b.code|
|stats-rows=823 extrapolated-rows=disabled   |
|table stats: rows=823 size=44.99KB  |
|column stats: all   |
|mem-estimate=32.00MB mem-reservation=0B |
|tuple-ids=1 row-size=82B cardinality=823|
++
{code}

is the result of both join columns having 0 as NDV.
https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L368
should handle a potential division by zero.

IMPALA-7310 makes it a bit more likely that someone will run into this. 






[jira] [Updated] (IMPALA-7528) Division by zero when computing cardinalities of many to many joins on NULL columns

2018-09-05 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-7528:
--
Issue Type: Bug  (was: Improvement)

> Division by zero when computing cardinalities of many to many joins on NULL 
> columns
> ---
>
> Key: IMPALA-7528
> URL: https://issues.apache.org/jira/browse/IMPALA-7528
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Priority: Major
>
> The following:
> {code:java}
> | F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=1 |
> | Per-Host Resources: mem-estimate=33.94MB mem-reservation=1.94MB|
> | 02:HASH JOIN [INNER JOIN, BROADCAST]   |
> | |  hash predicates: b.code = a.code|
> | |  fk/pk conjuncts: none   |
> | |  runtime filters: RF000 <- a.code|
> | |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB |
> | |  tuple-ids=1,0 row-size=163B cardinality=9223372036854775807 |
> | |  |
> | |--03:EXCHANGE [BROADCAST] |
> | |  |  mem-estimate=0B mem-reservation=0B   |
> | |  |  tuple-ids=0 row-size=82B cardinality=823 |
> | |  |   |
> | |  F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1  |
> | |  Per-Host Resources: mem-estimate=32.00MB mem-reservation=0B |
> | |  00:SCAN HDFS [default.sample_07 a, RANDOM]  |
> | | partitions=1/1 files=1 size=44.98KB  |
> | | stats-rows=823 extrapolated-rows=disabled|
> | | table stats: rows=823 size=44.98KB   |
> | | column stats: all|
> | | mem-estimate=32.00MB mem-reservation=0B  |
> | | tuple-ids=0 row-size=82B cardinality=823 |
> | |  |
> | 01:SCAN HDFS [default.sample_08 b, RANDOM] |
> |partitions=1/1 files=1 size=44.99KB |
> |runtime filters: RF000 -> b.code|
> |stats-rows=823 extrapolated-rows=disabled   |
> |table stats: rows=823 size=44.99KB  |
> |column stats: all   |
> |mem-estimate=32.00MB mem-reservation=0B |
> |tuple-ids=1 row-size=82B cardinality=823|
> ++
> {code}
> is the result of both join columns having 0 as NDV.
> https://github.com/cloudera/Impala/blob/cdh5-trunk/fe/src/main/java/org/apache/impala/planner/JoinNode.java#L368
> should handle a potential division by zero.
> IMPALA-7310 makes it a bit more likely that someone will run into this. 






[jira] [Commented] (IMPALA-7505) impalad webserver hang in getting a lock of QueryExecStatus

2018-08-29 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596092#comment-16596092
 ] 

Balazs Jeszenszky commented on IMPALA-7505:
---

[~wangchen1ren] FYI, IMPALA-1972 and IMPALA-3882 track some of the issues 
around this.

> impalad webserver hang in getting a lock of QueryExecStatus
> ---
>
> Key: IMPALA-7505
> URL: https://issues.apache.org/jira/browse/IMPALA-7505
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.8.0
>Reporter: Chen Wang
>Priority: Major
> Attachments: gdb.out
>
>
> Impalad's webserver sometimes hangs.
> The following is one of the cases: the webserver threads are stuck acquiring 
> a lock on QueryExecStatus, but I can't find where the lock is acquired in the 
> stack. The web requests are sent from the CDH agent, which checks the 
> activity of impalad.
> Full gdb log is in the attachment.
> {code}
> Thread 116 (Thread 0x7f288f5e1700 (LWP 31062)):
> #0  0x00378780e334 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1  0x0037878095d8 in _L_lock_854 () from /lib64/libpthread.so.0
> #2  0x0037878094a7 in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3  0x008d6eb8 in pthread_mutex_lock (this=0xcab4f50) at 
> /data/impala/toolchain/boost-1.57.0/include/boost/thread/pthread/mutex.hpp:62
> #4  boost::mutex::lock (this=0xcab4f50) at 
> /data/impala/toolchain/boost-1.57.0/include/boost/thread/pthread/mutex.hpp:116
> #5  0x00b7903c in lock_guard (this=0xa7b5800, query_id=) at 
> /data/impala/toolchain/boost-1.57.0/include/boost/thread/lock_guard.hpp:38
> #6  impala::ImpalaServer::GetRuntimeProfileStr (this=0xa7b5800, query_id=) at 
> /data/impala/be/src/service/impala-server.cc:573
> #7  0x00ba6a8c in 
> impala::ImpalaHttpHandler::QueryProfileEncodedHandler (this=0x3f56be0, args=) 
> at /data/impala/be/src/service/impala-http-handler.cc:219
> #8  0x00cafe75 in operator() (this=) at 
> /data/impala/toolchain/boost-1.57.0/include/boost/function/function_template.hpp:767
> #9  impala::Webserver::RenderUrlWithTemplate (this=) at 
> /data/impala/be/src/util/webserver.cc:443
> #10 0x00cb1295 in impala::Webserver::BeginRequestCallback (this=) at 
> /data/impala/be/src/util/webserver.cc:414
> #11 0x00cc4850 in handle_request ()
> #12 0x00cc6fcd in process_new_connection ()
> #13 0x00cc765d in worker_thread ()
> #14 0x003787807aa1 in start_thread () from /lib64/libpthread.so.0
> #15 0x0037874e8bcd in clone () from /lib64/libc.so.6
> {code}
> The hang situation appears on Impala 2.8.0, but I found that the code of the 
> be/service part hasn't changed much from 2.8.0 to 2.11.0, so the problem may 
> still exist.
> I hope you experts can give me some guidance on finding the root cause, or 
> workarounds to deal with these hang situations.






[jira] [Created] (IMPALA-7497) Consider reintroducing numNulls count in compute stats

2018-08-28 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-7497:
-

 Summary: Consider reintroducing numNulls count in compute stats
 Key: IMPALA-7497
 URL: https://issues.apache.org/jira/browse/IMPALA-7497
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Balazs Jeszenszky


IMPALA-1003 disabled numNulls calculations for performance reasons. A lot has 
changed since then, so it would be good to reevaluate computing numNulls. The 
planner still has some code that uses it for cardinality estimations, so just 
adding it back to compute stats could lead to easy wins.






[jira] [Commented] (IMPALA-4830) Add metrics for alter table recover partition

2018-08-23 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16590115#comment-16590115
 ] 

Balazs Jeszenszky commented on IMPALA-4830:
---

We should separate HMS calls, serialization, and lock wait time.

> Add metrics for alter table recover partition
> -
>
> Key: IMPALA-4830
> URL: https://issues.apache.org/jira/browse/IMPALA-4830
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.8.0
>Reporter: Peter Ebert
>Priority: Minor
>  Labels: ramp-up
>
> It would be nice to have some metrics in the query profile for alter table 
> recover partitions. Right now, if the CatalogOpExecTimer takes time, it's 
> hard to say why.
> Having the number of partitions added as a result of this operation, and the 
> files within them, would be helpful.






[jira] [Created] (IMPALA-7478) Add timer to impalad-side of DDLs

2018-08-23 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-7478:
-

 Summary: Add timer to impalad-side of DDLs
 Key: IMPALA-7478
 URL: https://issues.apache.org/jira/browse/IMPALA-7478
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 3.0
Reporter: Balazs Jeszenszky


Currently, profiles include a CatalogOpExecTimer, but it isn't helpful if the 
time is spent on the impalad side. Adding a scoped timer around 
ProcessCatalogUpdateResult would be useful for clarity.

This is what we've come across:
{code:java}
Query Timeline: 6s001ms
   - Query submitted: 159.835us (159.835us)
   - Planning finished: 1.686ms (1.526ms)
   - Rows available: 5s999ms (5s997ms)
   - First row fetched: 6s000ms (1.027ms)
   - Unregister query: 6s000ms (709.442us)
  ImpalaServer:
 - CatalogOpExecTimer: 3s188ms
 - ClientFetchWaitTimer: 1.690ms
 - RowMaterializationTimer: 0.000ns
{code}






[jira] [Commented] (IMPALA-7474) Tool to identify CPU bottlenecks

2018-08-22 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16588701#comment-16588701
 ] 

Balazs Jeszenszky commented on IMPALA-7474:
---

[~anujphadke] where would these 'calls' be printed? Profiles, logs, some new 
place? What is the expected output of seeing these calls, e.g. what would the 
output look like in the common but hard-to-identify case of concurrent queries 
competing for CPU?

> Tool to identify CPU bottlenecks
> 
>
> Key: IMPALA-7474
> URL: https://issues.apache.org/jira/browse/IMPALA-7474
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 2.12.0
>Reporter: Anuj Phadke
>Assignee: Anuj Phadke
>Priority: Major
>  Labels: supportability
>
> We run into a bunch of issues where Impala hangs or query performance 
> suffers due to very high CPU usage.
> A tool which periodically collects stacks from Impala (when enabled) and 
> prints calls with high CPU usage would be very useful for debugging such 
> issues.
> Running this tool should ideally incur minimal overhead on impalad while 
> collecting the stacks.






[jira] [Commented] (IMPALA-7168) DML query may hang if CatalogUpdateCallback() encounters repeated error

2018-08-07 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572224#comment-16572224
 ] 

Balazs Jeszenszky commented on IMPALA-7168:
---

To rephrase, the issue here is that the new subscriber upon joining will have a 
catalog version of 0, which gets propagated as the minimum topic version for 
the catalog topic. If the new subscriber fails to process the initial update 
and keeps re-requesting it (locking its catalog version at 0), SYNC_DDL queries 
will hang.
Without having an initial catalog update processed, the coordinator will not 
serve any queries, and so its metadata staleness isn't relevant for the 
purposes of SYNC_DDL. Maybe it's enough to just ignore 0 values for minimum 
subscriber topic version?
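The suggestion in the last sentence could be sketched as follows; the function and parameter names are illustrative, not the statestore's actual API:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Compute the minimum catalog topic version across subscribers, skipping
// subscribers still at version 0 (i.e. those that have not processed their
// initial catalog update and therefore serve no queries yet).
int64_t MinSubscriberTopicVersion(const std::vector<int64_t>& versions) {
  int64_t min_version = INT64_MAX;
  for (int64_t v : versions) {
    if (v == 0) continue;  // new/stuck subscriber: ignore for SYNC_DDL waits
    min_version = std::min(min_version, v);
  }
  // No subscriber has processed an update yet; report 0 as before.
  return min_version == INT64_MAX ? 0 : min_version;
}
```

With this filtering, a subscriber stuck re-requesting its initial update would no longer pin the minimum topic version at 0 and hang SYNC_DDL queries.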

> DML query may hang if CatalogUpdateCallback() encounters repeated error
> ---
>
> Key: IMPALA-7168
> URL: https://issues.apache.org/jira/browse/IMPALA-7168
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, 
> Impala 2.12.0
>Reporter: Pranay Singh
>Priority: Major
>
> DML queries or INSERTs will encounter a hang if 
> exec_env_->frontend()->UpdateCatalogCache() in 
> ImpalaServer::CatalogUpdateCallback encounters repeated errors like ENOMEM. 
> This happens with SYNC_DDL set to 1 when the coordinator node is waiting for 
> its catalog version to become current.
> The scenario shows up like this: let's say there are two coordinator nodes, 
> Node A and Node B, and catalogd and statestored are running on Node C.
> a) CREATE TABLE is executed on Node A with SYNC_DDL set to 1; the thread 
> running the query is going to block in 
> impala::ImpalaServer::ProcessCatalogUpdateResult(), waiting for its catalog 
> version to become current.
> b) Meanwhile, statestored running on Node C would call 
> ImpalaServer::CatalogUpdateCallback on Node B via thrift RPC to do a delta 
> topic update, which would not happen if we encounter repeated errors, say 
> the frontend is low on memory (low JVM heap situation).
> c) In such a case Node A will wait indefinitely for its catalog version to 
> become current, until Node B is shut down voluntarily.
> Note: This is a case where Node B is reachable (heartbeat is fine, but the 
> node is in a bad state, non-working).






[jira] [Created] (IMPALA-7346) Capture locality data for debug

2018-07-25 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-7346:
-

 Summary: Capture locality data for debug
 Key: IMPALA-7346
 URL: https://issues.apache.org/jira/browse/IMPALA-7346
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 2.12.0
Reporter: Balazs Jeszenszky


With IMPALA-5872, it will be possible to capture and recreate query-specific 
details when things break.
Currently there's no way to include data locality information in such test 
cases. For example, if the original issue was reported from a 150-node cluster, 
a reproduction on a 4-node test cluster is not guaranteed to be accurate.

Having the ability to 'pin' a plan (and just export that along with metadata) 
is not a complete solution (it doesn't enable testing the planner itself).

A way to do this could be to dump locality data to tblproperties.






[jira] [Updated] (IMPALA-7322) Add storage wait time to profile for operations with metadata load

2018-07-20 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-7322:
--
Description: The profile of a REFRESH or of the query triggering metadata 
load should point out how much time was spent waiting for source systems.  
(was: The profiles of a REFRESH or INVALIDATE should point out how much time 
was spent waiting for source systems.)
Summary: Add storage wait time to profile for operations with metadata 
load  (was: Add storage wait time to REFRESH and INVALIDATE profiles)

> Add storage wait time to profile for operations with metadata load
> --
>
> Key: IMPALA-7322
> URL: https://issues.apache.org/jira/browse/IMPALA-7322
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Priority: Major
>
> The profile of a REFRESH or of the query triggering metadata load should 
> point out how much time was spent waiting for source systems.






[jira] [Created] (IMPALA-7322) Add storage wait time to REFRESH and INVALIDATE profiles

2018-07-19 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-7322:
-

 Summary: Add storage wait time to REFRESH and INVALIDATE profiles
 Key: IMPALA-7322
 URL: https://issues.apache.org/jira/browse/IMPALA-7322
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 2.12.0, Impala 3.0
Reporter: Balazs Jeszenszky


The profiles of a REFRESH or INVALIDATE should point out how much time was 
spent waiting for source systems.






[jira] [Commented] (IMPALA-7282) Sentry privilege disappears after a catalog refresh

2018-07-13 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542724#comment-16542724
 ] 

Balazs Jeszenszky commented on IMPALA-7282:
---

Did some more testing. Here's a copy-pasteable set of statements to repro:

{code:java}
create table test (id int);
create role foo_role;
grant select(id) on table test to role foo_role;
grant all on server to role foo_role;
show grant role foo_role;
invalidate metadata;
show grant role foo_role;
revoke all on server from foo_role;
show grant role foo_role;
invalidate metadata;
show grant role foo_role;
{code}

I tested this all the way back to Impala 2.5, and it reproduces on all of the 
released versions from 2.5 to 2.12. The roles are removed from Sentry's DB by 
catalogd (so it's not just that the catalog fails to reload them). Since that's 
the case, the last 'invalidate metadata' is not even necessary; once the 
catalog update gets back to the impalad via the statestore, the privilege is 
removed.


> Sentry privilege disappears after a catalog refresh
> ---
>
> Key: IMPALA-7282
> URL: https://issues.apache.org/jira/browse/IMPALA-7282
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Security
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Fredy Wijaya
>Priority: Critical
>  Labels: security
>
> {noformat}
> [localhost:21000] default> grant select on database functional to role foo_role;
> Query: grant select on database functional to role foo_role
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.05s
> [localhost:21000] default> grant all on database functional to role foo_role;
> Query: grant all on database functional to role foo_role
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.03s
> [localhost:21000] default> show grant role foo_role;
> Query: show grant role foo_role
> +----------+------------+-------+--------+-----+-----------+--------------+-------------+
> | scope    | database   | table | column | uri | privilege | grant_option | create_time |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------+
> | database | functional |       |        |     | select    | false        | NULL        |
> | database | functional |       |        |     | all       | false        | NULL        |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------+
> Fetched 2 row(s) in 0.02s
> [localhost:21000] default> show grant role foo_role;
> Query: show grant role foo_role
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> | scope    | database   | table | column | uri | privilege | grant_option | create_time                   |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> | database | functional |       |        |     | all       | false        | Wed, Jul 11 2018 15:38:41.113 |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> Fetched 1 row(s) in 0.01s
> {noformat}






[jira] [Created] (IMPALA-7291) [DOCS] Document recommendation to use VARCHAR or STRING instead of CHAR(N)

2018-07-12 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-7291:
-

 Summary: [DOCS] Document recommendation to use VARCHAR or STRING 
instead of CHAR(N)
 Key: IMPALA-7291
 URL: https://issues.apache.org/jira/browse/IMPALA-7291
 Project: IMPALA
  Issue Type: Improvement
  Components: Docs
Affects Versions: Impala 2.12.0
Reporter: Balazs Jeszenszky


CHAR(N) currently does not have codegen support. For that reason, we should 
recommend that customers use VARCHAR or STRING instead; the gain from codegen 
outweighs the benefits of fixed-width CHARs.






[jira] [Commented] (IMPALA-7282) Sentry privilege disappears after a catalog refresh

2018-07-12 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16542083#comment-16542083
 ] 

Balazs Jeszenszky commented on IMPALA-7282:
---

Should this be critical / blocker?
Which change introduced this bug?

> Sentry privilege disappears after a catalog refresh
> ---
>
> Key: IMPALA-7282
> URL: https://issues.apache.org/jira/browse/IMPALA-7282
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Security
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Fredy Wijaya
>Priority: Major
>  Labels: security
>
> {noformat}
> [localhost:21000] default> grant select on database functional to role foo_role;
> Query: grant select on database functional to role foo_role
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.05s
> [localhost:21000] default> grant all on database functional to role foo_role;
> Query: grant all on database functional to role foo_role
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.03s
> [localhost:21000] default> show grant role foo_role;
> Query: show grant role foo_role
> +----------+------------+-------+--------+-----+-----------+--------------+-------------+
> | scope    | database   | table | column | uri | privilege | grant_option | create_time |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------+
> | database | functional |       |        |     | select    | false        | NULL        |
> | database | functional |       |        |     | all       | false        | NULL        |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------+
> Fetched 2 row(s) in 0.02s
> [localhost:21000] default> show grant role foo_role;
> Query: show grant role foo_role
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> | scope    | database   | table | column | uri | privilege | grant_option | create_time                   |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> | database | functional |       |        |     | all       | false        | Wed, Jul 11 2018 15:38:41.113 |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> Fetched 1 row(s) in 0.01s
> {noformat}






[jira] [Created] (IMPALA-7288) Codegen crash in FinalizeModule()

2018-07-12 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-7288:
-

 Summary: Codegen crash in FinalizeModule()
 Key: IMPALA-7288
 URL: https://issues.apache.org/jira/browse/IMPALA-7288
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 2.12.0
Reporter: Balazs Jeszenszky


The following sequence crashes Impala 2.12 reliably:
{code}
CREATE TABLE test (c1 CHAR(6),c2 CHAR(6));
select 1 from test t1, test t2
where t1.c1 = FROM_TIMESTAMP(cast(t2.c2 as string), 'MMdd');
{code}

hs_err_pid has:
{code}
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x03b36ce4, pid=28459, tid=0x7f2c49685700
#
# JRE version: Java(TM) SE Runtime Environment (8.0_162-b12) (build 
1.8.0_162-b12)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.162-b12 mixed mode linux-amd64 
compressed oops)
# Problematic frame:
# C  [impalad+0x3736ce4]  llvm::Value::getContext() const+0x4
{code}

Backtrace is:
{code}
#0  0x7f2cb217a5f7 in raise () from /lib64/libc.so.6
#1  0x7f2cb217bce8 in abort () from /lib64/libc.so.6
#2  0x7f2cb4de2f35 in os::abort(bool) () from 
/usr/java/latest/jre/lib/amd64/server/libjvm.so
#3  0x7f2cb4f86f33 in VMError::report_and_die() () from 
/usr/java/latest/jre/lib/amd64/server/libjvm.so
#4  0x7f2cb4de922f in JVM_handle_linux_signal () from 
/usr/java/latest/jre/lib/amd64/server/libjvm.so
#5  0x7f2cb4ddf253 in signalHandler(int, siginfo*, void*) () from 
/usr/java/latest/jre/lib/amd64/server/libjvm.so
#6  
#7  0x03b36ce4 in llvm::Value::getContext() const ()
#8  0x03b36cff in llvm::Value::getValueName() const ()
#9  0x03b36de9 in llvm::Value::getName() const ()
#10 0x01ba6bb2 in impala::LlvmCodeGen::FinalizeModule (this=0x9b53980)
at 
/usr/src/debug/impala-2.12.0-cdh5.15.0/be/src/codegen/llvm-codegen.cc:1076
#11 0x018f5c0f in impala::FragmentInstanceState::Open (this=0xac0b400)
at 
/usr/src/debug/impala-2.12.0-cdh5.15.0/be/src/runtime/fragment-instance-state.cc:255
#12 0x018f3699 in impala::FragmentInstanceState::Exec (this=0xac0b400)
at 
/usr/src/debug/impala-2.12.0-cdh5.15.0/be/src/runtime/fragment-instance-state.cc:80
#13 0x019028c3 in impala::QueryState::ExecFInstance (this=0x9c6ad00, 
fis=0xac0b400)
at /usr/src/debug/impala-2.12.0-cdh5.15.0/be/src/runtime/query-state.cc:410
#14 0x0190113c in impala::QueryStateoperator()(void) 
const (__closure=0x7f2c49684be8)
at /usr/src/debug/impala-2.12.0-cdh5.15.0/be/src/runtime/query-state.cc:350
#15 0x019034dd in 
boost::detail::function::void_function_obj_invoker0,
 void>::invoke(boost::detail::function::function_buffer &) 
(function_obj_ptr=...)
at 
/usr/src/debug/impala-2.12.0-cdh5.15.0/toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
{code}

Crash is at 
https://github.com/cloudera/Impala/blob/cdh5-2.12.0_5.15.0/be/src/codegen/llvm-codegen.cc#L1070-L1079.
The repro steps seem to be quite specific.






[jira] [Commented] (IMPALA-7282) Sentry privilege disappears after a catalog refresh

2018-07-12 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16541334#comment-16541334
 ] 

Balazs Jeszenszky commented on IMPALA-7282:
---

[~fredyw] this seems like a potentially pretty bad issue. Any more details 
would be welcome (in particular, are you confident it does not affect 2.x?).

> Sentry privilege disappears after a catalog refresh
> ---
>
> Key: IMPALA-7282
> URL: https://issues.apache.org/jira/browse/IMPALA-7282
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Security
>Affects Versions: Impala 3.0
>Reporter: Fredy Wijaya
>Priority: Major
>  Labels: security
>
> {noformat}
> [localhost:21000] default> grant select on database functional to role foo_role;
> Query: grant select on database functional to role foo_role
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.05s
> [localhost:21000] default> grant all on database functional to role foo_role;
> Query: grant all on database functional to role foo_role
> +---------------------------------+
> | summary                         |
> +---------------------------------+
> | Privilege(s) have been granted. |
> +---------------------------------+
> Fetched 1 row(s) in 0.03s
> [localhost:21000] default> show grant role foo_role;
> Query: show grant role foo_role
> +----------+------------+-------+--------+-----+-----------+--------------+-------------+
> | scope    | database   | table | column | uri | privilege | grant_option | create_time |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------+
> | database | functional |       |        |     | select    | false        | NULL        |
> | database | functional |       |        |     | all       | false        | NULL        |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------+
> Fetched 2 row(s) in 0.02s
> [localhost:21000] default> show grant role foo_role;
> Query: show grant role foo_role
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> | scope    | database   | table | column | uri | privilege | grant_option | create_time                   |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> | database | functional |       |        |     | all       | false        | Wed, Jul 11 2018 15:38:41.113 |
> +----------+------------+-------+--------+-----+-----------+--------------+-------------------------------+
> Fetched 1 row(s) in 0.01s
> {noformat}






[jira] [Commented] (IMPALA-7225) Refresh on single partition resets partition's row count to -1

2018-07-10 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16538119#comment-16538119
 ] 

Balazs Jeszenszky commented on IMPALA-7225:
---

[~bharathv] for now we should just leave it untouched (at 250 in the example). 
In the future it would be good to make it possible to auto-update stats, not 
just here but on insert, etc. Until then, Impala shouldn't change stats unless 
explicitly told to do so.

> Refresh on single partition resets partition's row count to -1
> --
>
> Key: IMPALA-7225
> URL: https://issues.apache.org/jira/browse/IMPALA-7225
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.10.0, Impala 2.12.0
>Reporter: Mala Chikka Kempanna
>Priority: Major
>
> Doing refresh on single partition resets it's row count to -1
>  
> {code:java}
> [host-2.x.y.z:21000] > show partitions web_logs_new;
> Query: show partitions web_logs_new
> ++---++--+--+---++---+-+
> | date_col | #Rows | #Files | Size | Bytes Cached | Cache Replication | 
> Format | Incremental stats | Location |
> ++---++--+--+---++---+-+
> | 2015-11-18 | -1 | 1 | 112.15KB | NOT CACHED | NOT CACHED | TEXT | false | 
> hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-18
>  |
> | 2015-11-19 | -1 | 1 | 98.83KB | NOT CACHED | NOT CACHED | TEXT | false | 
> hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-19
>  |
> | 2015-11-20 | -1 | 1 | 101.57KB | NOT CACHED | NOT CACHED | TEXT | false | 
> hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-20
>  |
> | 2015-11-21 | -1 | 1 | 82.99KB | NOT CACHED | NOT CACHED | TEXT | false | 
> hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-21
>  |
> | Total | -1 | 4 | 395.54KB | 0B | | | | |
> ++---++--+--+---++---+-+
> Fetched 5 row(s) in 0.01s
> [host-2.x.y.z:21000] > compute stats web_logs_new;
> Query: compute stats web_logs_new
> +--+
> | summary |
> +--+
> | Updated 4 partition(s) and 28 column(s). |
> +--+
> Fetched 1 row(s) in 1.31s
> [nightly513-unsecure-2.gce.cloudera.com:21000] > show partitions web_logs_new;
> Query: show partitions web_logs_new
> ++---++--+--+---++---+-+
> | date_col | #Rows | #Files | Size | Bytes Cached | Cache Replication | 
> Format | Incremental stats | Location |
> ++---++--+--+---++---+-+
> | 2015-11-18 | 250 | 1 | 112.15KB | NOT CACHED | NOT CACHED | TEXT | false | 
> hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-18
>  |
> | 2015-11-19 | 250 | 1 | 98.83KB | NOT CACHED | NOT CACHED | TEXT | false | 
> hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-19
>  |
> | 2015-11-20 | 250 | 1 | 101.57KB | NOT CACHED | NOT CACHED | TEXT | false | 
> hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-20
>  |
> | 2015-11-21 | 250 | 1 | 82.99KB | NOT CACHED | NOT CACHED | TEXT | false | 
> hdfs://nightly513-unsecure-1.gce.cloudera.com:8020/user/hive/warehouse/web_logs_new/date_col=2015-11-21
>  |
> | Total | 1000 | 4 | 395.54KB | 0B | | | | |
> ++---++--+--+---++---+-+
> Fetched 5 row(s) in 0.01s
> [host-2.x.y.z:21000] > refresh web_logs_new partition(date_col='2015-11-18');
> Query: refresh web_logs_new partition(date_col='2015-11-18')
> Query submitted at: 2018-06-29 12:53:32 (Coordinator: 
> 

[jira] [Commented] (IMPALA-7232) Display whether fragment instances' profile is complete

2018-07-01 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16529192#comment-16529192
 ] 

Balazs Jeszenszky commented on IMPALA-7232:
---

I think IMPALA-6741 would help. [~kwho] would that be a good solution?

> Display whether fragment instances' profile is complete
> ---
>
> Key: IMPALA-7232
> URL: https://issues.apache.org/jira/browse/IMPALA-7232
> Project: IMPALA
>  Issue Type: Task
>  Components: Distributed Exec
>Affects Versions: Impala 3.1.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Major
>
> While working on IMPALA-7213, it's noticed that we can fail to serialize or 
> deserialize a profile for random reasons. This shouldn't be fatal: the 
> fragment instance status can still be presented to the coordinator to avoid 
> hitting IMPALA-2990. A missing profile in ReportExecStatus() RPC may result 
> in incomplete or stale profile being presented to Impala client. It would be 
> helpful to mark whether the profile may be incomplete and/or final in the 
> profile output.






[jira] [Commented] (IMPALA-7208) Consider using quickstack instead of pstack

2018-06-25 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16522818#comment-16522818
 ] 

Balazs Jeszenszky commented on IMPALA-7208:
---

This sounds great. 
The repo mentions 'guessing' caller functions, which to me implies that it's 
inaccurate in edge cases. Do we know what these are?

> Consider using quickstack instead of pstack
> ---
>
> Key: IMPALA-7208
> URL: https://issues.apache.org/jira/browse/IMPALA-7208
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: supportability
> Attachments: quickstack.out
>
>
> This is an alternative to heavyweight methods like pstack and gdb that can 
> block the process while they're collecting stacks: 
> https://github.com/yoshinorim/quickstack . Yoshinori has a lot of experience 
> troubleshooting production systems so his recommendation carries some weight 
> for me.
> I tried it out and it seems to work as advertised so far. The binary is 
> pretty small and easy to build so isn't an unreasonable dependency.
> [~bharathv][~dgarg][~anujphadke]






[jira] [Commented] (IMPALA-7101) Builds are timing out/hanging

2018-06-19 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16517051#comment-16517051
 ] 

Balazs Jeszenszky commented on IMPALA-7101:
---

[~twmarshall] / [~dhecht], what versions does this issue affect? Seems like 
it's been around for a while.

> Builds are timing out/hanging
> -
>
> Key: IMPALA-7101
> URL: https://issues.apache.org/jira/browse/IMPALA-7101
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Thomas Tauber-Marshall
>Assignee: Dan Hecht
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> We've seen a large number of builds in the last week or two that appear to 
> have hung and gotten killed after a 24-hour timeout.
> Exactly where the hang is occurring is different in each build, but I 
> suspect it has something to do with cancellation not working correctly.






[jira] [Commented] (IMPALA-7179) Consider changing --allow_multiple_scratch_dirs_per_device default

2018-06-15 Thread Balazs Jeszenszky (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514185#comment-16514185
 ] 

Balazs Jeszenszky commented on IMPALA-7179:
---

I agree this should be default, this behaviour is what I would intuitively 
expect.

> Consider changing --allow_multiple_scratch_dirs_per_device default
> --
>
> Key: IMPALA-7179
> URL: https://issues.apache.org/jira/browse/IMPALA-7179
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: resource-management
>
> I've seen multiple instances of Impala users being tripped up by this 
> behaviour and zero instances of it being useful (although it's possible that 
> it helped someone and they didn't notice). 






[jira] [Created] (IMPALA-7107) [DOCS] Review docs for storage formats impala cannot insert into

2018-06-01 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-7107:
-

 Summary: [DOCS] Review docs for storage formats impala cannot 
insert into
 Key: IMPALA-7107
 URL: https://issues.apache.org/jira/browse/IMPALA-7107
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 2.12.0
Reporter: Balazs Jeszenszky


There are several points to clear up or improve across these pages:
* I'd refer to the Hive documentation on how to set compression codecs instead 
of documenting Hive's behaviour for file formats Impala cannot write.
* Add an 'Ingesting file formats Impala can't write' section to the 'How Impala 
Works with Hadoop File Formats' page, and link that central location from 
wherever applicable. Unify the recommendation on data loading (usage of LOAD 
DATA, Hive, or manual copy).
* Add a compatibility matrix for compressions and file formats, and clear up 
compatibility on 'How Impala Works with Hadoop File Formats' (the page is 
inconsistent even within itself, e.g. bzip2).
* Remove references to Impala versions <2.0.






[jira] [Commented] (IMPALA-6994) Avoid reloading a table's HMS data for file-only operations

2018-05-22 Thread Balazs Jeszenszky (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483561#comment-16483561
 ] 

Balazs Jeszenszky commented on IMPALA-6994:
---

Can you elaborate, [~pranay_singh]? I think SYNC_DDL is handled on the 
coordinator side alone, and doesn't have anything to do with the catalog. Why 
would not reloading something that hasn't changed cause inconsistency?

> Avoid reloading a table's HMS data for file-only operations
> ---
>
> Key: IMPALA-6994
> URL: https://issues.apache.org/jira/browse/IMPALA-6994
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Assignee: Pranay Singh
>Priority: Major
>
> Reloading file metadata for HDFS tables (e.g. as a final step in an 'insert') 
> is done via
> https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L628
> , which calls
> https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1243
> HdfsTable.load has no option to only load file metadata. HMS metadata will 
> also be reloaded every time, which is an unnecessary overhead (and potential 
> point of failure) when adding files to existing locations.
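The proposed split could look roughly like this; the struct, flag, and counter names below are hypothetical stand-ins for HdfsTable's real interface, not its actual code:

```cpp
// Sketch of a load entry point that can skip the Hive Metastore round trip
// and refresh only file/block metadata. Counters stand in for the real work.
struct HdfsTableSketch {
  int hms_loads = 0;      // expensive HMS metadata reloads
  int file_md_loads = 0;  // file/block metadata listings only

  void Load(bool reload_hms_metadata, bool reload_file_metadata) {
    if (reload_hms_metadata) ++hms_loads;
    if (reload_file_metadata) ++file_md_loads;
  }

  // File-only refresh, e.g. the final step of an INSERT into an
  // existing location.
  void RefreshFilesOnly() {
    Load(/*reload_hms_metadata=*/false, /*reload_file_metadata=*/true);
  }
};
```

The point of the flag is that adding files to existing locations would no longer pay for (or fail on) an HMS reload that cannot have changed anything.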






[jira] [Created] (IMPALA-7032) Codegen crash when UNIONing NULL and CHAR(N)

2018-05-15 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-7032:
-

 Summary: Codegen crash when UNIONing NULL and CHAR(N)
 Key: IMPALA-7032
 URL: https://issues.apache.org/jira/browse/IMPALA-7032
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 2.12.0
Reporter: Balazs Jeszenszky


A simple repro:

{code:java}
create table test (c1 int);
select null from test union select cast('a' as char(1)) from test;
{code}

{code}
#0  0x7f050c4a61d7 in raise () from sysroot/lib64/libc.so.6
#1  0x7f050c4a78c8 in abort () from sysroot/lib64/libc.so.6
#2  0x7f050e7816b5 in os::abort(bool) ()
   from sysroot/usr/java/jdk1.8.0_60/jre/lib/amd64/server/libjvm.so
#3  0x7f050e91fbf3 in VMError::report_and_die() ()
   from sysroot/usr/java/jdk1.8.0_60/jre/lib/amd64/server/libjvm.so
#4  0x7f050e786edf in JVM_handle_linux_signal ()
   from sysroot/usr/java/jdk1.8.0_60/jre/lib/amd64/server/libjvm.so
#5  0x7f050e77d673 in signalHandler(int, siginfo*, void*) ()
   from sysroot/usr/java/jdk1.8.0_60/jre/lib/amd64/server/libjvm.so
#6  
#7  0x01a9123d in llvm::FunctionType::get(llvm::Type*, 
llvm::ArrayRef, bool) ()
#8  0x00c2c04f in impala::LlvmCodeGen::FnPrototype::GeneratePrototype 
(this=0x7f03e7fdad90,
builder=0x0, params=0x7f03e7fdae30, print_ir=)
at /usr/src/debug/impala-2.9.0-cdh5.12.2/be/src/codegen/llvm-codegen.cc:710
#9  0x00846187 in impala::Expr::CreateIrFunctionPrototype 
(this=this@entry=0x9fded80,
codegen=codegen@entry=0xa5a2880, name=..., args=args@entry=0x7f03e7fdae30)
at /usr/src/debug/impala-2.9.0-cdh5.12.2/be/src/exprs/expr.cc:505
#10 0x00861e5c in impala::NullLiteral::GetCodegendComputeFn 
(this=0x9fded80,
codegen=0xa5a2880, fn=0x7f03e7fdaf18)
at /usr/src/debug/impala-2.9.0-cdh5.12.2/be/src/exprs/null-literal.cc:106
#11 0x00a79bc7 in impala::Tuple::CodegenMaterializeExprs 
(codegen=codegen@entry=0xa5a2880,
collect_string_vals=collect_string_vals@entry=false, desc=..., 
materialize_expr_ctxs=...,
use_mem_pool=use_mem_pool@entry=true, fn=0x7f03e7fdb410)
at /usr/src/debug/impala-2.9.0-cdh5.12.2/be/src/runtime/tuple.cc:307
#12 0x00d12828 in impala::UnionNode::Codegen (this=, 
state=)
at /usr/src/debug/impala-2.9.0-cdh5.12.2/be/src/exec/union-node.cc:105
#13 0x00c4aaa1 in impala::ExecNode::Codegen (this=this@entry=0xa0c5480,
{code}

NullLiteral::GetCodegendComputeFn is missing a check for the CHAR type, for 
which codegen isn't implemented.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-7007) Kudu duplicate key count can be off

2018-05-10 Thread Balazs Jeszenszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-7007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-7007:
--
Component/s: Backend

> Kudu duplicate key count can be off
> ---
>
> Key: IMPALA-7007
> URL: https://issues.apache.org/jira/browse/IMPALA-7007
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Priority: Major
>
> After inserting TPC-H data into Kudu by doing:
> {code}
> insert into lineitem select * from PARQUETIMPALA500.LINEITEM
> {code}
> , the query profile contains this error:
> {code}
> Errors: Key already present in Kudu table 'impala::kudu_impala_500.LINEITEM'. 
> (1 of -1831809966 similar)
> {code}
> Clearly, the count is off here. Also, it seems this issue can trigger a 
> DCHECK on debug builds. The accompanying log line in that case is:
> {code}
> F0507 09:46:12.673912 29258 error-util.cc:148] Check failed: log_entry.count 
> > 0 (-1831809966 vs. 0) 
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6994) Avoid reloading a table's HMS data for file-only operations

2018-05-08 Thread Balazs Jeszenszky (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-6994:
--
Summary: Avoid reloading a table's HMS data for file-only operations  (was: 
Avoid reloading a table's HMS data)

> Avoid reloading a table's HMS data for file-only operations
> ---
>
> Key: IMPALA-6994
> URL: https://issues.apache.org/jira/browse/IMPALA-6994
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Priority: Major
>
> Reloading file metadata for HDFS tables (e.g. as a final step in an 'insert') 
> is done via
> https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L628
> , which calls
> https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1243
> HdfsTable.load has no option to only load file metadata. HMS metadata will 
> also be reloaded every time, which is an unnecessary overhead (and potential 
> point of failure) when adding files to existing locations.






[jira] [Created] (IMPALA-6994) Avoid reloading a table's HMS data

2018-05-08 Thread Balazs Jeszenszky (JIRA)
Balazs Jeszenszky created IMPALA-6994:
-

 Summary: Avoid reloading a table's HMS data
 Key: IMPALA-6994
 URL: https://issues.apache.org/jira/browse/IMPALA-6994
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 2.12.0
Reporter: Balazs Jeszenszky


Reloading file metadata for HDFS tables (e.g. as a final step in an 'insert') 
is done via
https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L628
, which calls
https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1243

HdfsTable.load has no option to only load file metadata. HMS metadata will also 
be reloaded every time, which is an unnecessary overhead (and potential point 
of failure) when adding files to existing locations.






[jira] [Commented] (IMPALA-6985) Impala View Does Not Populate Table Input/Output Format Class

2018-05-08 Thread Balazs Jeszenszky (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467389#comment-16467389
 ] 

Balazs Jeszenszky commented on IMPALA-6985:
---

Looking at the Hive-side jira, I don't think Impala should be fixing this as 
it's within specification and it would add overhead (by having to keep these 
fields up to date).

> Impala View Does Not Populate Table Input/Output Format Class
> -
>
> Key: IMPALA-6985
> URL: https://issues.apache.org/jira/browse/IMPALA-6985
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: BELUGA BEHR
>Priority: Major
>
> When a view is created in Impala, the InputFormat and OutputFormat fields are 
> set to NULL.  This is breaking some aspects of Hive. See: [HIVE-19424].  
> Perhaps Impala can play nice here and set these fields the same as Hive.
> {code:sql}
> -- hive
> CREATE VIEW test_view_hive AS select * from sample_07;
> -- impala
> CREATE VIEW test_view_impala AS select * from sample_07;
> {code}
> {code:sql}
> -- Impala
> DESCRIBE extended test_view_impala;
> InputFormat - (blank)
> OutputFormat - (blank)
> DESCRIBE extended test_view_hive;
> InputFormat - org.apache.hadoop.mapred.TextInputFormat
> OutputFormat - org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> {code}
> You can see the difference in the Hive Metastore.
> {code}
> MariaDB [hive1]> SELECT TBLS.TBL_NAME FROM TBLS JOIN SDS ON 
> TBLS.SD_ID=SDS.SD_ID WHERE SDS.INPUT_FORMAT IS NULL;
> +--+
> | TBL_NAME |
> +--+
> | test_view_impala |
> +--+
> {code}






[jira] [Comment Edited] (IMPALA-6729) Provide startup option to disable file and block location cache

2018-05-04 Thread Balazs Jeszenszky (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463587#comment-16463587
 ] 

Balazs Jeszenszky edited comment on IMPALA-6729 at 5/4/18 9:08 AM:
---

[~stiga-huang] just commenting on your experiment. I don't think your 
assumption about small files and block count is safe. Averages based on your 
data are:

{code}
1479550865896345 (total size) / 9098905 (file count) bytes ~= 155MB per file
1479550865896345 (total size) / 1799131 (partition count) bytes ~= 784MB per 
partition
{code}

Both of these averages are too low IMO.
After bumping the average file size to 512MB (assumed equivalent reduction rate 
in block count) and average partition size to 4GB, using the 
[estimations|https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L140-L146],
 I got a catalog size of 2.82GB.

{code:java}
9098905*500+1799131*2048+13621520*150
4549452500 bytes  = 4.2GB  from files
3684620288 bytes  = 3.43GB from partitions
2043228000 bytes  = 2.05GB from blocks
10277300788 bytes = 9.57GB sum

2889748*500+361218*2048+(13621520*0.31)*150
1444874000 bytes  = 1.44GB from files
739774464 bytes   = 0.74GB from partitions
633400650 bytes   = 0.63GB from blocks
2818049114 bytes  = 2.82GB sum
{code}

Not saying this invalidates your idea, but there is a lot to be gained by 
compaction and by reducing partition count in this case.


was (Author: jeszyb):
[~stiga-huang] just commenting on your experiment. I don't think your 
assumption about small files and block count is safe. Averages based on your 
data are:

{code}
1479550865896345 (total size) / 9098905 (file count) bytes ~= 155MB per file
1479550865896345 (total size) / 1799131 (partition count) bytes ~= 784MB per 
partition
{code}

Both of these averages are too low IMO.
After bumping the average file size to 512MB (assumed equivalent reduction rate 
in block count) and average partition size to 4GB, using the 
[estimations|https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L140-L146],
 I got a catalog size of 3.35GB.

{code:java}
9098905*500+1799131*2048+13621520*150
4549452500 bytes  = 4.2GB  from files
3684620288 bytes  = 3.43GB from partitions
2043228000 bytes  = 2.05GB from blocks
10277300788 bytes = 9.57GB sum

2889748*500+361218*2048+(13621520*0.69)*150
1444874000 bytes  = 1.44GB from files
739774464 bytes   = 0.74GB from partitions
1409827200 bytes  = 1.41GB from blocks
3594475664 bytes  = 3.35GB sum
{code}

Not saying this invalidates your idea, but there is a lot to be gained by 
compaction and by reducing partition count in this case.

> Provide startup option to disable file and block location cache
> ---
>
> Key: IMPALA-6729
> URL: https://issues.apache.org/jira/browse/IMPALA-6729
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Reporter: Quanlong Huang
>Priority: Major
> Attachments: Screen Shot 2018-05-04 at 12.12.21 PM.png
>
>
> In HDFS, scheduling PlanFragments according to block locations can improve 
> the locality of queries. However, every coin has two sides. There’re some 
> scenarios that loading & keeping the block locations brings no benefits, 
> sometimes even becomes a burden.
> {panel:title=Scenario 1}
> In a Hadoop cluster with ~1000 nodes, Impala cluster is only deployed on tens 
> of computation nodes (i.e. with small disks but larger memory and powerful 
> CPUs). Data locality is poor since most of the blocks have no replicas in the 
> Impala nodes. Network bandwidth is 1Gbit/s so it’s ok for remote read. 
> Queries are only required to finish within 5 mins.
>  
> Block location info is useless since the scheduler always comes up with the 
> same plan.
> {panel}
> {panel:title=Scenario 2}
> load_catalog_in_background is set to false since there are several PB of data 
> in the Hive warehouse. If it's set to true, the Impala cluster won't be able 
> to start up (it will wait for block locations to load and eventually fill up 
> the memory of catalogd, crashing it).
> Accessing a Hive table containing >10,000 partitions for the first time will 
> get stuck for a long time. Sometimes it can't even finish for some large 
> tables. Users are annoyed when they only want to describe the table or select 
> a few partitions of the table.
>  
> Block location info is a burden here since its loading dominates the query 
> time. Finally, only a little portion of the block location info can be used.
> {panel}
> {panel:title=Scenario 3}
> There’re many ETL pipelines ingesting data into Hive warehouse. Some tables 
> are updated by replacing the whole data set. Some partitioned tables are 
> updated by inserting new partitions.
> Ad hoc queries 

[jira] [Commented] (IMPALA-6729) Provide startup option to disable file and block location cache

2018-05-04 Thread Balazs Jeszenszky (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463587#comment-16463587
 ] 

Balazs Jeszenszky commented on IMPALA-6729:
---

[~stiga-huang] just commenting on your experiment. I don't think your 
assumption about small files and block count is safe. Averages based on your 
data are:

{code}
1479550865896345 (total size) / 9098905 (file count) bytes ~= 155MB per file
1479550865896345 (total size) / 1799131 (partition count) bytes ~= 784MB per 
partition
{code}

Both of these averages are too low IMO.
After bumping the average file size to 512MB (assumed equivalent reduction rate 
in block count) and average partition size to 4GB, using the 
[estimations|https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L140-L146],
 I got a catalog size of 3.35GB.

{code:java}
9098905*500+1799131*2048+13621520*150
4549452500 bytes  = 4.2GB  from files
3684620288 bytes  = 3.43GB from partitions
2043228000 bytes  = 2.05GB from blocks
10277300788 bytes = 9.57GB sum

2889748*500+361218*2048+(13621520*0.69)*150
1444874000 bytes  = 1.44GB from files
739774464 bytes   = 0.74GB from partitions
1409827200 bytes  = 1.41GB from blocks
3594475664 bytes  = 3.35GB sum
{code}

Not saying this invalidates your idea, but there is a lot to be gained by 
compaction and by reducing partition count in this case.

> Provide startup option to disable file and block location cache
> ---
>
> Key: IMPALA-6729
> URL: https://issues.apache.org/jira/browse/IMPALA-6729
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Catalog
>Reporter: Quanlong Huang
>Priority: Major
> Attachments: Screen Shot 2018-05-04 at 12.12.21 PM.png
>
>
> In HDFS, scheduling PlanFragments according to block locations can improve 
> the locality of queries. However, every coin has two sides. There’re some 
> scenarios that loading & keeping the block locations brings no benefits, 
> sometimes even becomes a burden.
> {panel:title=Scenario 1}
> In a Hadoop cluster with ~1000 nodes, Impala cluster is only deployed on tens 
> of computation nodes (i.e. with small disks but larger memory and powerful 
> CPUs). Data locality is poor since most of the blocks have no replicas in the 
> Impala nodes. Network bandwidth is 1Gbit/s so it’s ok for remote read. 
> Queries are only required to finish within 5 mins.
>  
> Block location info is useless since the scheduler always comes up with the 
> same plan.
> {panel}
> {panel:title=Scenario 2}
> load_catalog_in_background is set to false since there are several PB of data 
> in the Hive warehouse. If it's set to true, the Impala cluster won't be able 
> to start up (it will wait for block locations to load and eventually fill up 
> the memory of catalogd, crashing it).
> Accessing a Hive table containing >10,000 partitions for the first time will 
> get stuck for a long time. Sometimes it can't even finish for some large 
> tables. Users are annoyed when they only want to describe the table or select 
> a few partitions of the table.
>  
> Block location info is a burden here since its loading dominates the query 
> time. Finally, only a little portion of the block location info can be used.
> {panel}
> {panel:title=Scenario 3}
> There’re many ETL pipelines ingesting data into Hive warehouse. Some tables 
> are updated by replacing the whole data set. Some partitioned tables are 
> updated by inserting new partitions.
> Ad hoc queries used to be served by Presto. When trying to introduce 
> Impala to replace Presto, we would have to add a REFRESH table step at the 
> end of each pipeline, which takes significant effort (many code changes 
> across the existing warehouse).
> IMPALA-4272 could solve this but has seen no progress. If the file and block 
> location metadata cache could be disabled, things would be simple.
> {panel}
> IMPALA-3127 is related. But we hope it's possible to not keep the block 
> locations.





