date:20181030

[jira] [Resolved] (IMPALA-2609) Table aliases with spaces do work properly

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2609.
---
Resolution: Cannot Reproduce
  Assignee: (was: Syed A. Hashmi)

> Table aliases with spaces do work properly
> --
>
> Key: IMPALA-2609
> URL: https://issues.apache.org/jira/browse/IMPALA-2609
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2.4, Impala 2.3.0
>Reporter: Saravana
>Priority: Minor
>
> If the query contains a subquery in the FROM clause and the alias of the 
> subquery contains a space, 
>  
> {code:sql}select `alias with space`.code from (select * from sample_07) 
> `alias with space`{code}
>  
> the JDBC driver throws an exception:
>  
> Exception in thread "main" java.sql.SQLException: 
> [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error 
> Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, 
> errorMessage:AnalysisException: Syntax error in line 1:
> ...lt`.`sample_07`) AS alias with space
>  ^
> Encountered: WITH
> Expected: CROSS, FROM, FULL, GROUP, HAVING, INNER, JOIN, LEFT, LIMIT, OFFSET, 
> ON, ORDER, RIGHT, UNION, USING, WHERE, COMMA
>  
> CAUSED BY: Exception: Syntax error
> ), Query: SELECT `alias with space`.`code` FROM (SELECT `sample_07`.`code`, 
> `sample_07`.`description`, `sample_07`.`total_emp`, `sample_07`.`salary` FROM 
> `default`.`sample_07`) AS alias with space.
> at 
> com.cloudera.impala.hivecommon.api.HS2Client.executeStatementInternal(Unknown 
> Source)
> at com.cloudera.impala.hivecommon.api.HS2Client.executeStatement(Unknown 
> Source)
> at 
> com.cloudera.impala.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.executeQuery(Unknown
>  Source)
> at 
> com.cloudera.impala.hivecommon.dataengine.HiveJDBCDSIExtQueryExecutor.execute(Unknown
>  Source)
> at com.cloudera.impala.jdbc.common.SStatement.executeNoParams(Unknown Source)
> at com.cloudera.impala.jdbc.common.SStatement.executeQuery(Unknown Source)
> Caused by: com.cloudera.impala.support.exceptions.GeneralException: 
> [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error 
> Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, 
> errorMessage:AnalysisException: Syntax error in line 1:
> ...lt`.`sample_07`) AS alias with space
>  ^
> Encountered: WITH
> Expected: CROSS, FROM, FULL, GROUP, HAVING, INNER, JOIN, LEFT, LIMIT, OFFSET, 
> ON, ORDER, RIGHT, UNION, USING, WHERE, COMMA
>  
> CAUSED BY: Exception: Syntax error
> ), Query: SELECT `alias with space`.`code` FROM (SELECT `sample_07`.`code`, 
> `sample_07`.`description`, `sample_07`.`total_emp`, `sample_07`.`salary` FROM 
> `default`.`sample_07`) AS alias with space.
> ... 6 more



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Resolved] (IMPALA-2692) Cancelling Timed Out Queries and/or Timed out connections

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2692.
---
Resolution: Cannot Reproduce

It's not clear what the problem actually is. Please reopen if you have steps to 
reproduce.

> Cancelling Timed Out Queries and/or Timed out connections
> -
>
> Key: IMPALA-2692
> URL: https://issues.apache.org/jira/browse/IMPALA-2692
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2.1
> Environment: ODBC and Hue connections
>Reporter: Summer
>Priority: Minor
>
> Hi-
> When running queries via Hue or via an ODBC connection (for example through the 
> impyla module), if the query or connection times out, the connection does not 
> actually get closed, even if you then execute a conn.close() statement. This 
> leaves an open connection, with possible security issues, that can only be 
> cancelled via Cloudera Manager.




[jira] [Resolved] (IMPALA-2957) Query Profile: Incorrect "#Hosts" count in ExecSummary for SCAN NODE

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2957.
---
Resolution: Duplicate

> Query Profile: Incorrect "#Hosts" count in ExecSummary for SCAN NODE
> 
>
> Key: IMPALA-2957
> URL: https://issues.apache.org/jira/browse/IMPALA-2957
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.1
>Reporter: Alan Choi
>Priority: Minor
>
> *Problem:*
> For a scan node that scans only one file with one block (but 3 
> replicas), the "#Hosts" count should be 1, but the ExecSummary shows 3 
> instead.




[jira] [Resolved] (IMPALA-3137) Switching between databases give a wrong result

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3137.
---
Resolution: Invalid

We don't track bugs with this driver on Apache JIRA.

> Switching between databases give a wrong result
> ---
>
> Key: IMPALA-3137
> URL: https://issues.apache.org/jira/browse/IMPALA-3137
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2, Impala 2.3.0
>Reporter: Jiri Novak
>Priority: Minor
> Attachments: ImpalaSwitchDB.zip
>
>
> Setting default database in ODBC driver setup is case sensitive and affects 
> the result of select in a very strange way.
> Use standard C# (.Net version is 4.5) and ADO.Net classes from 
> System.Data.Odbc namespace to access ODBC data source, ODBC driver 
> v2.5.30.1011 (32 bit) 
> Test case - 
> 1.Create two databases TESTDB1 and TESTDB2
> 2.Create two tables TESTDB1.TABX and TESTDB2.TABX
> 3.Insert in table TESTDB1.TABX 5 rows
> 4.Set database on “Cloudera ODBC Driver for Impala DSN Setup” as TESTDB1 
> (uppercase!)
> 5.Execute (both of them without statement terminator ';')
> {code:sql}
> USE TESTDB2
> SELECT COUNT(*) FROM TABX -- you will get the expected result – 0
> {code}
> 6.Repeat the step # 4 but in this case set the default database as 
> testdb1 (lower case)
> 7.Execute the same script as in step # 5. Now you will get the result – 
> 5! 
> The table TABX has been resolved as TESTDB1.TABX and it’s wrong.
> A simple .Net application is attached which can be used to reproduce the 
> issue. Only the connection string in the code needs to be changed.




[jira] [Created] (IMPALA-7785) Analyzer cannot handle GROUP BY clause rewrites

2018-10-30 Thread Paul Rogers (JIRA)

Paul Rogers created IMPALA-7785:
---

 Summary: Analyzer cannot handle GROUP BY clause rewrites
 Key: IMPALA-7785
 URL: https://issues.apache.org/jira/browse/IMPALA-7785
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.0
Reporter: Paul Rogers


The FE analyzer has a stage that rewrites expressions to make them simpler. The 
analyzer also has a stage that matches up {{GROUP BY}} expressions with 
{{SELECT}} clause expressions. Apparently, the two don't work together:

{code:sql}
SELECT coalesce(string_col, 'foo')
FROM functional.alltypes  
GROUP BY coalesce(string_col, 'foo') 
{code}

The above is rewritten using the new conditional function rewrite rules. Result:

{noformat}
org.apache.impala.common.AnalysisException:
  select list expression not produced by aggregation output
  (missing from GROUP BY clause?):
  CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END
{noformat}
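
A toy model may help picture the mismatch. The sketch below is not the analyzer's code: `rewrite_coalesce` and `select_matches_group_by` are hypothetical helpers, expressions are plain strings, and it assumes that the SELECT expression is rewritten while the GROUP BY expression is still compared in its original form. That assumption is only an inference from the error message above.

```python
def rewrite_coalesce(expr: str) -> str:
    """Toy rewrite: handles only a two-argument coalesce(a, b) call."""
    prefix = "coalesce("
    if expr.startswith(prefix) and expr.endswith(")"):
        first, second = [arg.strip() for arg in expr[len(prefix):-1].split(",", 1)]
        return f"CASE WHEN {first} IS NOT NULL THEN {first} ELSE {second} END"
    return expr


def select_matches_group_by(select_expr: str, group_by_exprs: list) -> bool:
    """Toy matcher: a SELECT expression is valid if it appears in GROUP BY."""
    return select_expr in group_by_exprs


select_expr = "coalesce(string_col, 'foo')"
group_by = ["coalesce(string_col, 'foo')"]

# Only the SELECT side is rewritten (assumed failure mode): no match.
mismatched = select_matches_group_by(rewrite_coalesce(select_expr), group_by)

# Both sides are rewritten before matching: the expressions line up again.
consistent = select_matches_group_by(
    rewrite_coalesce(select_expr), [rewrite_coalesce(e) for e in group_by]
)
```

Under that assumption, applying the same rewrite to both clauses before matching would let the comparison succeed.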




[jira] [Created] (IMPALA-7783) test_default_timezone failing on real cluster

2018-10-30 Thread David Knupp (JIRA)

David Knupp created IMPALA-7783:
---

 Summary: test_default_timezone failing on real cluster
 Key: IMPALA-7783
 URL: https://issues.apache.org/jira/browse/IMPALA-7783
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.1.0
Reporter: David Knupp


shell/test_shell_commandline.py/test_default_timezone is failing due to issues 
in asserting zoneinfo/tzname 
{noformat}
shell/test_shell_commandline.py:715: in test_default_timezone
assert os.path.isfile("/usr/share/zoneinfo/" + tzname)
E   assert (('/usr/share/zoneinfo/' + 'SystemV/PST8PDT'))
E+  where  = .isfile
E+where  = os.path
{noformat}




[jira] [Resolved] (IMPALA-2933) Impala-shell doesn't validate query options

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2933.
---
Resolution: Won't Fix

It's not clear that it's worth the shell doing an RPC per "set" command.

> Impala-shell doesn't validate query options
> ---
>
> Key: IMPALA-2933
> URL: https://issues.apache.org/jira/browse/IMPALA-2933
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: impala-shell
>
> Impala-shell doesn't validate query options and waits till a request is sent 
> to the back-end to validate. 
> As a result the user can pass arbitrary values without getting any errors. 
> {code}
> [node1:21000] > set REPLICA_PREFERENCE=10;
> REPLICA_PREFERENCE set to 10
> [node1:21000] > set REPLICA_PREFERENCE=;
> REPLICA_PREFERENCE set to 
> [node1:21000] > set REPLICA_PREFERENCE="hfhhff";
> REPLICA_PREFERENCE set to "hfhhff"
> {code}
> This behavior is not consistent with the Hive CLI, which validates query 
> options.




[jira] [Created] (IMPALA-7784) Partition pruning handles escaped strings incorrectly

2018-10-30 Thread Csaba Ringhofer (JIRA)

Csaba Ringhofer created IMPALA-7784:
---

 Summary: Partition pruning handles escaped strings incorrectly
 Key: IMPALA-7784
 URL: https://issues.apache.org/jira/browse/IMPALA-7784
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.0
Reporter: Csaba Ringhofer


Repro:
{code}
create table tpart (i int) partitioned by (p string);
insert into tpart partition (p="\"") values (1);

select  * from tpart where p = "\"";
Result:
Fetched 0 row(s)

select  * from tpart where p = '"';
Result:
1,

{code}

Hive returns the row for both queries.
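
To see why both predicates would be expected to match, note that the two literals denote the same one-character string once quotes and escapes are resolved. The sketch below is a simplified illustration, not Impala's or Hive's parser: `unescape_literal` is a hypothetical helper that treats a backslash as "keep the next character" and ignores sequences such as `\n` or `\t`.

```python
def unescape_literal(literal: str) -> str:
    """Strip the surrounding quotes and resolve backslash escapes (toy version)."""
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError(f"not a quoted literal: {literal!r}")
    body = literal[1:-1]
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(body[i + 1])  # keep the escaped character itself
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


double_quoted = unescape_literal('"\\""')  # the SQL text "\""
single_quoted = unescape_literal("'\"'")   # the SQL text '"'
```

Both calls yield the single character `"`, so a partition filter that compares unescaped values would treat the two queries identically.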




[jira] [Resolved] (IMPALA-3180) Impala Daemon Ready Status leading to Monitor-HostMonitor throttling_logger ERROR

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3180.
---
Resolution: Invalid

Looks like an issue with Cloudera Manager, which isn't an Apache project.

> Impala Daemon Ready Status leading to Monitor-HostMonitor throttling_logger 
> ERROR
> -
>
> Key: IMPALA-3180
> URL: https://issues.apache.org/jira/browse/IMPALA-3180
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.2
>Reporter: chaitanya
>Priority: Minor
>  Labels: impala
>
> When I check from cloudera-scm-agent logs.
> [11/Mar/2016 10:50:19 +] 17000 Metadata-Plugin navigator_plugin_pipeline 
> INFO Stopping Navigator Plugin Pipeline '' for cluster-host-inspector 
> (log dir: None)
> [11/Mar/2016 10:50:23 +] 17000 CP Server Thread-10 _cplogging   INFO 
> 10.81.80.34 - - [11/Mar/2016:10:50:23] "GET 
> /process/9408-cluster-host-inspector/files/inspector HTTP/1.1" 200 2425 "" 
> "Java/1.7.0_67"
> [11/Mar/2016 10:50:38 +] 17000 MainThread agentINFO Process 
> with same id has changed: 9408-cluster-host-inspector.
> [11/Mar/2016 10:50:38 +] 17000 MainThread agentINFO 
> Deactivating process 9408-cluster-host-inspector
> [11/Mar/2016 10:50:39 +] 17000 Metadata-Plugin navigator_plugin INFO 
> stopping Metadata Plugin for cluster-host-inspector with pipelines []
> [11/Mar/2016 10:50:39 +] 17000 Metadata-Plugin navigator_plugin_pipeline 
> INFO Stopping Navigator Plugin Pipeline '' for cluster-host-inspector 
> (log dir: None)
> [11/Mar/2016 10:50:41 +] 17000 Audit-Plugin navigator_plugin INFO 
> stopping Audit Plugin for cluster-host-inspector with pipelines []
> [11/Mar/2016 10:50:41 +] 17000 Audit-Plugin navigator_plugin_pipeline 
> INFO Stopping Navigator Plugin Pipeline '' for cluster-host-inspector 
> (log dir: None)
> [11/Mar/2016 10:52:33 +] 17000 Monitor-HostMonitor throttling_logger 
> ERRORKill subprocess exception with args 
> ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', 
> '/usr/share/cmf/lib/agent-5.4.1.jar', 'com.cloudera.cmon.agent.DnsTest']
> Traceback (most recent call last):
>   File "/usr/lib64/cmf/agent/src/cmf/subprocess_timeout.py", line 84, in 
> subprocess_with_timeout
> os.kill(p.pid, signal.SIGTERM)
> OSError: [Errno 3] No such process
> [11/Mar/2016 10:52:33 +] 17000 Monitor-HostMonitor throttling_logger 
> ERROR(1 skipped) Timeout with args 
> ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', 
> '/usr/share/cmf/lib/agent-5.4.1.jar', 'com.cloudera.cmon.agent.DnsTest']
> Traceback (most recent call last):
>   File "/usr/lib64/cmf/agent/src/cmf/subprocess_timeout.py", line 84, in 
> subprocess_with_timeout
> os.kill(p.pid, signal.SIGTERM)
> OSError: [Errno 3] No such process
> [11/Mar/2016 10:52:33 +] 17000 Monitor-HostMonitor throttling_logger 
> ERROR(1 skipped) Failed to collect java-based DNS names
> Traceback (most recent call last):
>   File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 64, in 
> collect
> result, stdout, stderr = self._subprocess_with_timeout(args, 
> self._poll_timeout)
>   File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 46, in 
> _subprocess_with_timeout
> return subprocess_with_timeout(args, timeout)
>   File "/usr/lib64/cmf/agent/src/cmf/subprocess_timeout.py", line 94, in 
> subprocess_with_timeout
> raise Exception("timeout with args %s" % args)
> Exception: timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', 
> '-classpath', '/usr/share/cmf/lib/agent-5.4.1.jar', 
> 'com.cloudera.cmon.agent.DnsTest']
> [11/Mar/2016 10:52:52 +] 17000 ImpalaDaemonQueryMonitoring 
> throttling_logger ERROR(54 skipped) Error fetching executing query ids at 
> 'http://dcslpd43.amat.com:25000/inflight_query_ids'
> Traceback (most recent call last):
>   File "/usr/lib64/cmf/agent/src/cmf/monitor/impalad/query_monitor.py", line 
> 479, in get_executing_query_ids
> password=password)
>   File "/usr/lib64/cmf/agent/src/cmf/url_util.py", line 62, in 
> urlopen_with_timeout
> return opener.open(url, data, timeout)
>   File "/usr/lib64/python2.6/urllib2.py", line 391, in open
> response = self._open(req, data)
>   File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
> '_open', req)
> File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
> result = func(*args)
>   File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
> return self.do_open(httplib.HTTPConnection, req)
>   File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
> raise URLError(err)
> URLError: 
> Could someone please assist with this?




[jira] [Resolved] (IMPALA-1159) fnv_hash UDF initialized with 32 bits offset basis

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1159.
---
Resolution: Won't Fix

We've moved away from using FNV anyway, so it doesn't seem worth enhancing it.

> fnv_hash UDF initialized with 32 bits offset basis
> --
>
> Key: IMPALA-1159
> URL: https://issues.apache.org/jira/browse/IMPALA-1159
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4
> Environment: Linux 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 
> UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Thierry Herrmann
>Priority: Minor
>  Labels: correctness, downgraded, incompatibility
>
> According to 
> http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_math_functions.html
> the fnv_hash UDF implements the 64 bits FNV-1a variation.
> According to 
> http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
> the algorithm should be seeded with the 64-bit FNV offset basis value: 
> 14695981039346656037 (in hex, 0xcbf29ce484222325)
> Implementing this, I did not obtain the same FNV 1a hashes as Impala
> E.g. with impala-shell I obtain
> {code}
> +-+
> | fnv_hash('hello')   |
> +-+
> | 6414202926103426347 |
> +-+
> {code}
> whereas it should be -6615550055289275125
> By looking at the Impala unit tests:
> https://github.com/cloudera/Impala/blob/8567b51f8c38bd389a338c761242a316d8ffe5c8/be/src/exprs/expr-test.cc
> Excerpt:
> {code}
> // Test fnv_hash
> string s("hello world");
> uint64_t expected = HashUtil::FnvHash64(s.data(), s.size(), 
> HashUtil::FNV_SEED);
> TestValue("fnv_hash('hello world')", TYPE_BIGINT, expected);
> {code} 
> I see that the algorithm is seeded with the 32 bits offset basis
> instead of FNV64_SEED.
> If I update my algorithm and seed it with the 32 bits offset basis, I obtain 
> the same hashes as impala.
> For backward compatibility, it may not be easy to fix. Or it could be 
> deprecated and replaced with a fixed UDF ?
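
The discrepancy described above can be reproduced with a minimal FNV-1a sketch. This is not Impala's code: `fnv1a_64`, `to_signed_64`, and the constant names are illustrative. It assumes, as the report states, that Impala runs the 64-bit FNV-1a loop but starts from the 32-bit offset basis (0x811C9DC5) rather than the 64-bit one.

```python
FNV64_PRIME = 0x100000001B3               # 64-bit FNV prime
FNV64_OFFSET_BASIS = 0xCBF29CE484222325   # 14695981039346656037
FNV32_OFFSET_BASIS = 0x811C9DC5           # 32-bit offset basis
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes, seed: int = FNV64_OFFSET_BASIS) -> int:
    """FNV-1a: XOR each byte into the hash, then multiply by the prime."""
    h = seed
    for byte in data:
        h = ((h ^ byte) * FNV64_PRIME) & MASK64
    return h


def to_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed BIGINT."""
    return value - (1 << 64) if value >= (1 << 63) else value


# Standard FNV-1a with the 64-bit offset basis.
standard = to_signed_64(fnv1a_64(b"hello"))

# Same loop seeded with the 32-bit offset basis (the behaviour the report describes).
seeded_32 = to_signed_64(fnv1a_64(b"hello", FNV32_OFFSET_BASIS))
```

Here `standard` is the value the reporter expected, -6615550055289275125. According to the report, the second seeding is what reproduces the hashes returned by impala-shell.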




[jira] [Resolved] (IMPALA-2626) In-flight queries fail when statestore comes back online.

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2626.
---
Resolution: Duplicate

> In-flight queries fail when statestore comes back online.
> -
>
> Key: IMPALA-2626
> URL: https://issues.apache.org/jira/browse/IMPALA-2626
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.3.0
>Reporter: Sailesh Mukil
>Assignee: Sailesh Mukil
>Priority: Minor
>  Labels: chaosmonkey, statestore, usability
>
> During a session, if the statestore goes down, the impalads continue 
> execution if they have enough metadata that they've already received from the 
> statestore prior to its failure.
> The impalads can continue execution without the statestore with the stale 
> metadata that they possess. However, when the statestore comes back online, 
> the first membership callback it makes to the impalad hosts erases the 
> "known_backends" list that the impalads have stored locally.
> Therefore, in-flight queries fail (sometimes without propagating the error to 
> the shell -> IMPALA-1325).
> Solution:
> Do not erase the list of "known_backends" in each impalad until the 
> statestore has a new list to provide to the impalads.
> _This bug was found during initial runs of ChaosMonkey on Impala._




[jira] [Resolved] (IMPALA-3457) Cloudera Manager when adding HA Name Node does not properly update LOCATION column in table HIVE.SDS, nor DB_LOCATION_URI column in table HIVE.DBS causing all impala qu

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3457.
---
Resolution: Invalid

Cloudera Manager is not an Apache project, so we don't track issues here.

> Cloudera Manager when adding HA Name Node does not properly update LOCATION 
> column in table HIVE.SDS, nor DB_LOCATION_URI column in table HIVE.DBS 
> causing all impala queries to fail
> -
>
> Key: IMPALA-3457
> URL: https://issues.apache.org/jira/browse/IMPALA-3457
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.5.0
> Environment: Cloudera Manager 5.7 with CDH 5.7.0 (Parcel)
>Reporter: Scott C
>Priority: Minor
>
> Started with functioning system using one Name Node plus a Secondary Name 
> Node.
> Used the Cloudera Manager to add the Name Node to another host for High 
> Availability.
> Afterwards 'hdfs' commands work fine, but any impala queries fail trying to 
> access internal parquet tables:
> {code}CAUSED BY: IOException: Port 9000 specified in URI 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=2/day=20
>  but host 'nameservice1' is a logical (HA) namenode and does not use port 
> information.
> CAUSED BY: TableLoadingException: Failed to load metadata for table: 
> arecordparquetpartition
> {code}
> {panel}We use port 9000 for the name node due to legacy starting from CDH 
> 4.8.6 installed from RPM and no Cloudera Manager.
> {panel}
> Manually fixed records in MySQL to correct the problem:
> {code}update DBS set DB_LOCATION_URI = 
> 'hdfs://nameservice1/user/hive/warehouse' where DB_ID=1; 
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition' where SD_ID 
> = 25447;
> select SD_ID,LOCATION from SDS;
> 25469 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=2/day=20
> 25470 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=3/day=12
> 25471 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=3/day=8
> 25472 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=3/day=9
> 25473 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=4/day=10
> 25474 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=4/day=11
> 25475 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=5/day=10
> 25476 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=5/day=8
> 25477 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=5/day=9
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=2/day=20'
>  where SD_ID = 25469;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=3/day=12'
>  where SD_ID = 25470;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=3/day=8'
>  where SD_ID = 25471;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=3/day=9'
>  where SD_ID = 25472;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=4/day=10'
>  where SD_ID = 25473;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=4/day=11'
>  where SD_ID = 25474;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=5/day=10'
>  where SD_ID = 25475;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=5/day=8'
>  where SD_ID = 25476;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=5/day=9'
>  where SD_ID = 25477;
> {code}
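
The per-row updates above all apply the same transformation: drop the port from the URI authority and leave the rest untouched. As a rough sketch of that rewrite, using only Python's standard `urllib.parse`, the hypothetical helper `strip_port` below operates on strings and does not touch the metastore.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_port(uri: str) -> str:
    """Return the URI without an explicit port; other components are kept.

    Note: the host comes from SplitResult.hostname, which lower-cases it
    and drops any user-info part. That is fine for 'nameservice1'.
    """
    parts = urlsplit(uri)
    if parts.port is None:
        return uri  # nothing to strip
    return urlunsplit(parts._replace(netloc=parts.hostname))


old = ("hdfs://nameservice1:9000/user/hive/warehouse/"
       "arecordparquetpartition/year=2015/month=2/day=20")
new = strip_port(old)
```

The resulting strings could then be fed into UPDATE statements like the ones shown; whether editing the metastore tables directly is appropriate depends on the deployment.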




[jira] [Resolved] (IMPALA-580) Inconsistent or blank fileFormats values passed to CM

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-580.
--
Resolution: Cannot Reproduce

> Inconsistent or blank fileFormats values passed to CM
> -
>
> Key: IMPALA-580
> URL: https://issues.apache.org/jira/browse/IMPALA-580
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.1
> Environment: Impala 1.1.0 and CM 4.6.2.
>Reporter: John Russell
>Priority: Minor
>
> In the CM "Query Details" page, one of the fields is "File Formats". If I 
> query a table created with STORED AS SEQFILE with the BZip2 compression 
> codec, CM shows a line like:
> File Formats: SEQUENCE_FILE/BZIP2
> That seems intuitive. However, for other combinations of file format and 
> compression codec, the "File Formats" value is blank or seems misleading. 
> select * from seqfile_snappy limit 5 -> file formats in CM is blank
> select * from rcfile_snappy limit 5 -> file formats in CM is blank
> select count(*) from seqfile_deflate -> file formats in CM = 
> SEQUENCE_FILE/DEFAULT
> select count(*) from rcfile_deflate -> file formats in CM = RC_FILE/DEFAULT 
> (is DEFAULT a typo for DEFLATE since this happens for both SEQFILE and RCFILE 
> tables?)
> select count(*) from parquet_snappy -> file formats =  PARQUET/NONE
> I also see PARQUET/NONE for a Parquet table compressed with GZip.
> I also see PARQUET/NONE for a Parquet table where the Impala data directory 
> contains data files compressed with different codecs. I understand CM could 
> in some cases display multiple values in this "File Formats" field, and 
> that's what I'd expect to happen in this case. (The same way I'd expect 
> multiple "File Formats" values for a join of tables with different file 
> formats, or a query against a partitioned table where partitions had 
> different file formats.)
> I did not have an LZO-compressed text table, so I didn't check if that case 
> would produce TEXT/LZO as expected.
> I did not have an Avro table, so I didn't check those combinations.
> I did not check Avro, SEQFILE, or RCFILE with data files from more than one 
> compression codec in the same directory.
> Other than the above cases, I think I checked every combination of file 
> format and codec, and the only issues I saw were those I listed.
> impala-shell PROFILE output or CM profile text available if desired.




[jira] [Resolved] (IMPALA-1798) Very slow performance of Views on top of another Views

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1798.
---
Resolution: Duplicate

Sounds like IMPALA-4242

> Very slow performance of Views on top of another Views
> --
>
> Key: IMPALA-1798
> URL: https://issues.apache.org/jira/browse/IMPALA-1798
> Project: IMPALA
>  Issue Type: Bug
>  Components: Perf Investigation
>Affects Versions: Impala 2.1, Impala 2.1.1, Impala 2.3.0
> Environment: Cluster 3 nodes Impala 2.1
>Reporter: Alex Finch
>Priority: Minor
>  Labels: performance, planner
>
> A query from a view has about the same performance as a query from the source 
> table. If we have a VIEW on top of another VIEW (even CREATE VIEW view_2 AS 
> SELECT * FROM view_1), the performance is much slower. For more complex 
> queries on top of view_2, the compilation of the query never finishes.




[jira] [Resolved] (IMPALA-7735) Expose admission control status in impala-shell

2018-10-30 Thread Bikramjeet Vig (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikramjeet Vig resolved IMPALA-7735.

   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Expose admission control status in impala-shell
> ---
>
> Key: IMPALA-7735
> URL: https://issues.apache.org/jira/browse/IMPALA-7735
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Bikramjeet Vig
>Priority: Critical
>  Labels: admission-control
> Fix For: Impala 3.2.0
>
> Attachments: Screenshot1.png
>
>
> Following on from IMPALA-7545 we should also expose this in impala-shell. I 
> left some notes on that JIRA.




[jira] [Resolved] (IMPALA-2876) Investigate on applying rtm to spinlock

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2876.
---
Resolution: Later

> Investigate on applying rtm to spinlock
> ---
>
> Key: IMPALA-2876
> URL: https://issues.apache.org/jira/browse/IMPALA-2876
> Project: IMPALA
>  Issue Type: Bug
>  Components: Perf Investigation
>Affects Versions: Impala 2.3.0
>Reporter: Zuo Wang
>Priority: Minor
>
> Investigate on applying rtm to spinlock and when to use it.




[jira] [Resolved] (IMPALA-3170) Large literal exponents cause many seconds of delay

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3170.
---
   Resolution: Fixed
Fix Version/s: Impala 3.0

3.0 no longer supports JDK7, so this should not be reproducible from there 
onwards.

> Large literal exponents cause many seconds of delay
> ---
>
> Key: IMPALA-3170
> URL: https://issues.apache.org/jira/browse/IMPALA-3170
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Jim Apple
>Priority: Minor
> Fix For: Impala 3.0
>
>
> {noformat}
> for foo in 0 10 210 3210 43210 543210; do time -p impala-shell.sh -q "select 
> 1e${foo}"; done
> {noformat}
> {{0}}, {{10}}, {{210}} succeed.
> {{3210}} and {{43210}} fail with a sensible error message in less than half a 
> second.
> {{543210}} takes 28 seconds to return that error message.
> My first guess is the {{BigDecimal}} constructor in {{NumericLiteral(String 
> value, Type t)}}.
> This is one of those "well, don't do that, then" type of bugs, but I'd 
> rather that malformed user input fail fast than cause impalad to spin, 
> especially since some user input is generated programmatically.
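
One way to fail fast, sketched here in Python rather than in Impala's Java frontend, is to inspect the exponent text before building any arbitrary-precision value. Everything in this sketch is an assumption made for illustration: the helper `check_exponent`, the limit `MAX_ABS_EXPONENT`, and the regular expression are not taken from Impala.

```python
import re

# Hypothetical limit, chosen only for illustration; not an Impala setting.
MAX_ABS_EXPONENT = 1000

_EXPONENT_RE = re.compile(r"[eE]([+-]?)(\d+)$")


def check_exponent(literal: str) -> None:
    """Raise ValueError if the literal's exponent exceeds the limit."""
    match = _EXPONENT_RE.search(literal)
    if match is None:
        return  # no exponent part, nothing to check
    digits = match.group(2).lstrip("0")
    # Compare digit counts first so a very long exponent is never converted.
    too_long = len(digits) > len(str(MAX_ABS_EXPONENT))
    if too_long or (digits and int(digits) > MAX_ABS_EXPONENT):
        raise ValueError(f"exponent out of range in literal: {literal}")


check_exponent("1e210")          # accepted
try:
    check_exponent("1e543210")   # rejected without constructing the number
except ValueError as err:
    message = str(err)
```

The check costs time proportional to the length of the literal text, so the rejection does not depend on how large the exponent is.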




[jira] [Resolved] (IMPALA-6397) IllegalStateException in planning of aggregation with float and decimal literal child expressions

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6397.
---
   Resolution: Fixed
Fix Version/s: Impala 3.0

This was fixed when decimal_v2 became the default.

> IllegalStateException in planning of aggregation with float and decimal 
> literal child expressions
> -
>
> Key: IMPALA-6397
> URL: https://issues.apache.org/jira/browse/IMPALA-6397
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
>Reporter: Alexander Behm
>Priority: Major
>  Labels: decimal, planner
> Fix For: Impala 3.0
>
>
> Reproduction:
> {code}
> select sum(float_col + d) from (select float_col, 1.2 d from 
> functional.alltypes) v;
> ERROR: IllegalStateException: Agg expr sum(float_col + 1.2) returns type 
> DOUBLE but its output tuple slot has type DECIMAL(38,9)
> {code}
> FE Stack:
> {code}
> I0113 14:44:36.300395  9285 jni-util.cc:211] java.lang.IllegalStateException: 
> Agg expr sum(f + 1.2) returns type DOUBLE but its output tuple slot has type 
> DECIMAL(38,9)
>   at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:145)
>   at 
> org.apache.impala.analysis.AggregateInfo.checkConsistency(AggregateInfo.java:702)
>   at 
> org.apache.impala.planner.AggregationNode.init(AggregationNode.java:165)
>   at 
> org.apache.impala.planner.SingleNodePlanner.createAggregationPlan(SingleNodePlanner.java:895)
>   at 
> org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:621)
>   at 
> org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:257)
>   at 
> org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:147)
>   at org.apache.impala.planner.Planner.createPlan(Planner.java:101)
>   at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1044)
>   at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1147)
>   at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:156)
> {code}
> This bug does not happen with DECIMAL_V2=true. It is specific to the implicit 
> casting behavior of DECIMAL_V1 with decimal literals.
> Note that the following equivalent query without the inline view works fine:
> {code}
> select sum(float_col + 1.2) from functional.alltypes;
> {code}
> Also note that this bug only happens in combination with a decimal literal. 
> The following query also works fine:
> {code}
> create table t (f float, d decimal (2,1));
> select sum(f + d) from (select f, d from t) v;
> // works fine
> {code}




[jira] [Created] (IMPALA-7787) python26-incompatibility-check failed because of docker 503 Service Unavailable

2018-10-30 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-7787:
-

 Summary: python26-incompatibility-check failed because of docker 
503 Service Unavailable
 Key: IMPALA-7787
 URL: https://issues.apache.org/jira/browse/IMPALA-7787
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Reporter: Tim Armstrong
Assignee: Philip Zeyliger


https://jenkins.impala.io/job/python26-incompatibility-check/529
https://jenkins.impala.io/job/python26-incompatibility-check/528

{noformat}
15:50:37 Initialized empty Git repository in /tmp/tmp.MKJUMZ3SBi/.git/
15:50:37 + git fetch http://gerrit.cloudera.org:8080/Impala-ASF 
refs/changes/00/11800/5
15:50:54 From http://gerrit.cloudera.org:8080/Impala-ASF
15:50:54  * branch            refs/changes/00/11800/5 -> FETCH_HEAD
15:50:54 + git archive --prefix=impala/ -o /tmp/impala.tar FETCH_HEAD
15:50:54 + docker run -u nobody -v /tmp/impala.tar:/tmp/impala.tar centos:6 
bash -o pipefail -c 'cd /tmp; python -c '\''import 
tarfile;tarfile.TarFile("/tmp/impala.tar").extractall()'\''; python -m 
compileall /tmp/impala'
15:50:54 Unable to find image 'centos:6' locally
15:50:55 docker: Error response from daemon: Get 
https://registry-1.docker.io/v2/library/centos/manifests/6: received unexpected 
HTTP status: 503 Service Unavailable.
15:50:55 See 'docker run --help'.
15:50:55 Build step 'Execute shell' marked build as failure
15:50:55 Set build name.
15:50:55 New build name is '#529 refs/changes/00/11800/5'
15:50:57 Finished: FAILURE
{noformat}

This happened a couple of times. Looks like flakiness but unsure if it was just 
a transient infra issue or something we're doing wrong.
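
If the 503 turns out to be a transient registry error, one possible mitigation is to pull the image with a few retries before the check runs. A minimal sketch, assuming a hypothetical helper `retry_with_backoff` (not part of the Jenkins job) and assuming that a non-zero exit status from `docker pull` means the pull failed:

```python
import subprocess
import time


def retry_with_backoff(action, attempts=3, initial_delay_s=1.0, sleep=time.sleep):
    """Call action() until it returns True or the attempts run out.

    Hypothetical helper: the delay doubles after each failed attempt.
    Returns True on success, False if every attempt failed.
    """
    delay = initial_delay_s
    for attempt in range(1, attempts + 1):
        if action():
            return True
        if attempt < attempts:
            sleep(delay)
            delay *= 2
    return False


def pull_image(image="centos:6"):
    """Return True if `docker pull` exits with status 0."""
    return subprocess.call(["docker", "pull", image]) == 0
```

Calling `retry_with_backoff(pull_image)` ahead of the `docker run` step would separate "registry was briefly unavailable" from a real failure of the compile check.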




[jira] [Resolved] (IMPALA-6111) Impala HBase compatability

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6111.
---
Resolution: Cannot Reproduce

Impala on HBase 2.x seems to be working now.

> Impala HBase compatability
> --
>
> Key: IMPALA-6111
> URL: https://issues.apache.org/jira/browse/IMPALA-6111
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zach Amsden
>Priority: Major
>
> We've discovered several compatibility issues while testing Impala downstream
> with HBase 2.0. Among them, key splitting fails because of changed APIs, and
> we probably should no longer depend on hbase-server.jar, which implements
> internal APIs and is in general not meant for public use.




[jira] [Resolved] (IMPALA-6276) incorrect projection of null column in hbase

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6276.
---
Resolution: Duplicate

> incorrect projection of null column in hbase
> 
>
> Key: IMPALA-6276
> URL: https://issues.apache.org/jira/browse/IMPALA-6276
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Vuk Ercegovac
>Priority: Major
>  Labels: correctness
>
> -- All rows
> select * from functional_hbase.nulltable;
> a,,b,c,d,e,f,g
> a,,NULL,NULL,NULL,ab,
> -- All rows that satisfy a predicate
> select * from functional_hbase.nulltable where a = 'a';
> a,,b,c,d,e,f,g
> a,,NULL,NULL,NULL,ab,
> -- A null column of the results that satisfy the predicate.
> select d from functional_hbase.nulltable where a = 'a';
> No Results
> We get no results regardless of whether the predicate is pushed down to HBase
> (pushdown can be prevented by using a complex predicate, e.g., "like").
> The row that satisfies the predicate has been materialized (see the "*"
> example), so projecting d, regardless of whether it is null, should yield
> one record and not an empty result set.
> Against a non-hbase table:
> select d from functional.nulltable where a = 'a';
> a,,b,c,d,e,f,g
> a,,NULL,NULL,NULL,ab,
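
HBase stores no cell for a NULL value, so a scan restricted to a column that is NULL for a given row may never see that row. The sketch below models this with plain dictionaries. It illustrates a suspected mechanism under that assumption and is not Impala's or HBase's actual code; `scan`, `rows`, `always_include_key`, and the stored cell are hypothetical.

```python
# Each row key maps to the cells that are physically stored; NULL cells are absent.
rows = {
    "a": {"e": "ab"},  # illustrative: column d has no stored cell for this row
}


def scan(table, columns, always_include_key=False):
    """Return (key, {column: value or None}) for the rows the scan sees.

    A row is only visible if at least one requested column has a stored cell,
    unless always_include_key is set, which models also fetching the row key.
    """
    result = []
    for key, cells in sorted(table.items()):
        if always_include_key or any(col in cells for col in columns):
            result.append((key, {col: cells.get(col) for col in columns}))
    return result


print(scan(rows, ["d"]))                           # the row is dropped
print(scan(rows, ["d"], always_include_key=True))  # the row survives with d = None
```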




[jira] [Resolved] (IMPALA-728) HBase: NULL values not taken into account depending on the columns in SELECT

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-728.
--
Resolution: Duplicate

> HBase: NULL values not taken into account depending on the columns in SELECT
> 
>
> Key: IMPALA-728
> URL: https://issues.apache.org/jira/browse/IMPALA-728
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.1.1
> Environment: IMPALA 1.1.1-1.p0.17
> CDH 4.4.0-1.cdh4.4.0.p0.39
>Reporter: Christophe S
>Priority: Minor
>  Labels: impala
>
> Data are in HBase:
> CREATE EXTERNAL TABLE ttt (
> id  STRING,
> ...
> ...
> ...
> )STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH 
> SERDEPROPERTIES ("hbase.columns.mapping" ="
> :key,
> ...
> ...
> ...
> ") TBLPROPERTIES("hbase.table.name" = "");
> select count(*) from TTT where ColA = 'ABCDE' 
> ==> 53 results
> select * from TTT where ColA = 'ABCDE' 
> ==> 178 results
> select ColA from TTT where ColA = 'ABCDE' 
> ==> 53 results
> Looks like sometimes the 'NULL' values are taken into account and sometimes 
> not.




[jira] [Created] (IMPALA-7786) Start Hive and HMS in debug mode in the mini cluster

2018-10-30 Thread Fredy Wijaya (JIRA)

Fredy Wijaya created IMPALA-7786:


 Summary: Start Hive and HMS in debug mode in the mini cluster
 Key: IMPALA-7786
 URL: https://issues.apache.org/jira/browse/IMPALA-7786
 Project: IMPALA
  Issue Type: Improvement
  Components: Infrastructure
Reporter: Fredy Wijaya
Assignee: Fredy Wijaya


For development, it is useful to be able to run Sentry and HMS in debug mode to 
make it easy to debug issues related to Sentry or HMS.




[jira] [Created] (IMPALA-7788) Impala Doc: ADLS Gen2 Support

2018-10-30 Thread Alex Rodoni (JIRA)

Alex Rodoni created IMPALA-7788:
---

 Summary: Impala Doc: ADLS Gen2 Support
 Key: IMPALA-7788
 URL: https://issues.apache.org/jira/browse/IMPALA-7788
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Reporter: Alex Rodoni
Assignee: Alex Rodoni







[jira] [Resolved] (IMPALA-7727) failed compute stats child query status no longer propagates to parent query

2018-10-30 Thread bharath v (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v resolved IMPALA-7727.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> failed compute stats child query status no longer propagates to parent query
> 
>
> Key: IMPALA-7727
> URL: https://issues.apache.org/jira/browse/IMPALA-7727
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Michael Brown
>Assignee: bharath v
>Priority: Blocker
>  Labels: regression, stress
> Fix For: Impala 3.1.0
>
> Attachments: 2.12-child-profile.txt, 2.12-compute-stats-profile.txt, 
> 3.1-child-profile.txt, 3.1-compute-stats-profile.txt
>
>
> [~bharathv] since you have been dealing with stats, please take a look. 
> Otherwise feel free to reassign. This bug prevents the stress test from 
> running with compute stats statements. It triggers in non-stressful 
> conditions, too.
> {noformat}
> $ impala-shell.sh -d tpch_parquet
> [localhost:21000] tpch_parquet> set mem_limit=24m;
> MEM_LIMIT set to 24m
> [localhost:21000] tpch_parquet> compute stats customer;
> Query: compute stats customer
> WARNINGS: Cancelled
> [localhost:21000] tpch_parquet>
> {noformat}
> The problem is that the child query didn't have enough memory to run, but 
> this error didn't propagate up.
> {noformat}
> Query (id=384d37fb2826a962:f4b10357):
>   DEBUG MODE WARNING: Query profile created while running a DEBUG build of 
> Impala. Use RELEASE builds to measure query performance.
>   Summary:
> Session ID: d343e1026d497bb0:7e87b342c73c108d
> Session Type: BEESWAX
> Start Time: 2018-10-18 15:16:34.036363000
> End Time: 2018-10-18 15:16:34.177711000
> Query Type: QUERY
> Query State: EXCEPTION
> Query Status: Rejected query from pool default-pool: minimum memory 
> reservation is greater than memory available to the query for buffer 
> reservations. Memory reservation needed given the current plan: 128.00 KB. 
> Adjust either the mem_limit or the pool config (max-query-mem-limit, 
> min-query-mem-limit) for the query to allow the query memory limit to be at 
> least 32.12 MB. Note that changing the mem_limit may also change the plan. 
> See the query profile for more information about the per-node memory 
> requirements.
> Impala Version: impalad version 3.1.0-SNAPSHOT DEBUG (build 
> 9f5c5e6df03824cba292fe5a619153462c11669c)
> User: mikeb
> Connected User: mikeb
> Delegated User: 
> Network Address: :::127.0.0.1:46458
> Default Db: tpch_parquet
> Sql Statement: SELECT COUNT(*) FROM customer
> Coordinator: mikeb-ub162:22000
> Query Options (set by configuration): MEM_LIMIT=25165824,MT_DOP=4
> Query Options (set by configuration and planner): 
> MEM_LIMIT=25165824,NUM_SCANNER_THREADS=1,MT_DOP=4
> Plan: 
> 
> Max Per-Host Resource Reservation: Memory=512.00KB Threads=5
> Per-Host Resource Estimates: Memory=146MB
> F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> |  Per-Host Resources: mem-estimate=10.00MB mem-reservation=0B 
> thread-reservation=1
> PLAN-ROOT SINK
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |
> 03:AGGREGATE [FINALIZE]
> |  output: count:merge(*)
> |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB 
> thread-reservation=0
> |  tuple-ids=1 row-size=8B cardinality=1
> |  in pipelines: 03(GETNEXT), 01(OPEN)
> |
> 02:EXCHANGE [UNPARTITIONED]
> |  mem-estimate=0B mem-reservation=0B thread-reservation=0
> |  tuple-ids=1 row-size=8B cardinality=1
> |  in pipelines: 01(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=1 instances=4
> Per-Host Resources: mem-estimate=136.00MB mem-reservation=512.00KB 
> thread-reservation=4
> 01:AGGREGATE
> |  output: sum_init_zero(tpch_parquet.customer.parquet-stats: num_rows)
> |  mem-estimate=10.00MB mem-reservation=0B spill-buffer=2.00MB 
> thread-reservation=0
> |  tuple-ids=1 row-size=8B cardinality=1
> |  in pipelines: 01(GETNEXT), 00(OPEN)
> |
> 00:SCAN HDFS [tpch_parquet.customer, RANDOM]
>partitions=1/1 files=1 size=12.34MB
>stored statistics:
>  table: rows=15 size=12.34MB
>  columns: all
>extrapolated-rows=disabled max-scan-range-rows=15
>mem-estimate=24.00MB mem-reservation=128.00KB thread-reservation=0
>tuple-ids=0 row-size=8B cardinality=15
>in pipelines: 00(GETNEXT)
> 
> Estimated Per-Host Mem: 153092096
> Per Host Min Memory Reservation: mikeb-ub162:22000(0) 
> mikeb-ub162:22001(128.00 KB)
> Request Pool: default-pool
> Admission result: Rejected
> Query Compilation: 126.903ms
>- Metadata of all 1 tables cached: 5.484ms (5.484ms)
>- Analysis

[jira] [Created] (IMPALA-7792) Disabling ORC scanner can cause query hang

2018-10-30 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-7792:
-

 Summary: Disabling ORC scanner can cause query hang
 Key: IMPALA-7792
 URL: https://issues.apache.org/jira/browse/IMPALA-7792
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.1.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong


{noformat}
$ start-impala-cluster.py --impalad_args=--enable_orc_scanner=false
> select * from functional_orc_def.alltypes;
{noformat}

The error in the impalad logs is:
{noformat}
E1030 17:36:12.035167  5934 ImpaladCatalog.java:217] Error adding catalog 
object: null
Java exception follows:
java.lang.NullPointerException
at org.apache.impala.catalog.Table.fromThrift(Table.java:286)
at 
org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:397)
at 
org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:284)
at 
org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:215)
at 
org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:105)
at 
org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:264)
at 
org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:187)
{noformat}




[jira] [Closed] (IMPALA-7614) Impala 3.1 Doc: Document the New Invalidate Options

2018-10-30 Thread Alex Rodoni (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-7614.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Impala 3.1 Doc: Document the New Invalidate Options
> ---
>
> Key: IMPALA-7614
> URL: https://issues.apache.org/jira/browse/IMPALA-7614
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc
> Fix For: Impala 3.1.0
>
>
> Document the new options:
> - invalidate_tables_timeout_s
> - invalidate_tables_on_memory_pressure




[jira] [Resolved] (IMPALA-4806) Error while flushing Kudu session: Timed out: Failed to write batch of 3 ops to tablet

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-4806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4806.
---
Resolution: Information Provided

> Error while flushing Kudu session: Timed out: Failed to write batch of 3 ops 
> to tablet
> --
>
> Key: IMPALA-4806
> URL: https://issues.apache.org/jira/browse/IMPALA-4806
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Kudu_Impala, Impala 2.8.0
>Reporter: vadde dheeraj
>Priority: Major
> Fix For: Kudu_Impala
>
>
> Status: Error while flushing Kudu session: Timed out: Failed to write batch 
> of 3 ops to tablet f268863887f5432583c08e41b235071e after 343 attempt(s): 
> Failed to write to server: (no server available): Write(tablet: 
> f268863887f5432583c08e41b235071e, num_ops: 3, num_attempts: 343) passed its 
> deadline: Illegal state: Replica aa61b1ac331f4bd79aee5ed96aab9498 is not 
> leader of this config. Role: FOLLOWER. Consensus state: current_term: 172577 
> leader_uuid: "" config { OBSOLETE_local: false peers { permanent_uuid: 
> "aa61b1ac331f4bd79aee5ed96aab9498" member_type: VOTER last_known_addr { host: 
> peers { permanent_uuid: "6c4f1825b73347b0be4e8c18a4100287" member_type: VOTER 
> last_known_addr




[jira] [Created] (IMPALA-7791) Aggregation Node memory estimates don't account for number of fragment instances

2018-10-30 Thread Pooja Nilangekar (JIRA)

Pooja Nilangekar created IMPALA-7791:


 Summary: Aggregation Node memory estimates don't account for 
number of fragment instances
 Key: IMPALA-7791
 URL: https://issues.apache.org/jira/browse/IMPALA-7791
 Project: IMPALA
  Issue Type: Sub-task
Affects Versions: Impala 3.1.0
Reporter: Pooja Nilangekar


AggregationNode's memory estimates are calculated based on the input 
cardinality of the node, without accounting for the division of input data 
across fragment instances. This results in very high memory estimates. In 
reality, the nodes often use only a part of this memory.   
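
The effect of ignoring the instance count can be shown with a little arithmetic. This is an illustration under stated assumptions, not the planner's actual formula: `estimate_agg_mem_bytes`, the 290-byte row size, and the even split of rows across instances are all hypothetical.

```python
def estimate_agg_mem_bytes(input_cardinality, row_size_bytes, num_instances=1):
    """Estimate hash-table memory for one aggregation instance.

    Assumes input rows are spread evenly across instances (a hypothetical
    simplification); the per-instance row count is rounded up.
    """
    if num_instances < 1:
        raise ValueError("num_instances must be >= 1")
    rows_per_instance = -(-input_cardinality // num_instances)  # ceiling division
    return rows_per_instance * row_size_bytes


# About 6 million input rows, an illustrative 290 bytes per row, 3 instances.
whole_input = estimate_agg_mem_bytes(6000000, 290)
per_instance = estimate_agg_mem_bytes(6000000, 290, num_instances=3)
print(whole_input, per_instance)
```

With these illustrative numbers, the estimate based on the whole input is three times the per-instance figure.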

Example query:

{code:java}
[localhost:21000] default> select distinct * from tpch.lineitem limit 5; 
{code}

Summary: 

{code:java}
+--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
| Operator     | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem  | Est. Peak Mem | Detail        |
+--------------+--------+----------+----------+-------+------------+-----------+---------------+---------------+
| 04:EXCHANGE  | 1      | 21.24us  | 21.24us  | 5     | 5          | 48.00 KB  | 16.00 KB      | UNPARTITIONED |
| 03:AGGREGATE | 3      | 5.11s    | 5.15s    | 15    | 5          | 576.21 MB | 1.62 GB       | FINALIZE      |
| 02:EXCHANGE  | 3      | 709.75ms | 728.91ms | 6.00M | 6.00M      | 5.46 MB   | 10.78 MB      | HASH(tpch.lineitem.l_orderkey,tpch.lineitem.l_partkey,tpch.lineitem.l_suppkey,tpch.lineitem.l_linenumber,tpch.lineitem.l_quantity,tpch.lineitem.l_extendedprice,tpch.lineitem.l_discount,tpch.lineitem.l_tax,tpch.lineitem.l_returnflag,tpch.lineitem.l_linestatus,tpch.lineitem.l_shipdate,tpch.lineitem.l_commitdate,tpch.lineitem.l_receiptdate,tpch.lineitem.l_shipinstruct,tpch.lineitem.l_shipmode,tpch.lineitem.l_comment) |
| 01:AGGREGATE | 3      | 4.37s    | 4.70s    | 6.00M | 6.00M      | 36.77 MB  | 1.62 GB       | STREAMING     |
| 00:SCAN HDFS | 3      | 437.14ms | 480.60ms | 6.00M | 6.00M      | 65.51 MB  | 264.00 MB     | tpch.lineitem

[jira] [Resolved] (IMPALA-7783) test_default_timezone failing on real cluster

2018-10-30 Thread David Knupp (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp resolved IMPALA-7783.
-
Resolution: Fixed

> test_default_timezone failing on real cluster
> -
>
> Key: IMPALA-7783
> URL: https://issues.apache.org/jira/browse/IMPALA-7783
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: David Knupp
>Priority: Major
>
> shell/test_shell_commandline.py/test_default_timezone is failing due to 
> issues in asserting zoneinfo/tzname 
> {noformat}
> shell/test_shell_commandline.py:715: in test_default_timezone
> assert os.path.isfile("/usr/share/zoneinfo/" + tzname)
> E   assert (('/usr/share/zoneinfo/' + 
> 'SystemV/PST8PDT'))
> E+  where  =  '/data0/jenkins/workspace/Quasar-Executor/testing/inf...Impala-cdh-cluster-test-runner/infra/python/env/lib64/python2.7/posixpath.pyc'>.isfile
> E+where  '/data0/jenkins/workspace/Quasar-Executor/testing/inf...Impala-cdh-cluster-test-runner/infra/python/env/lib64/python2.7/posixpath.pyc'>
>  = os.path {noformat}




[jira] [Closed] (IMPALA-7687) Impala 3.1 Doc: Add support for multiple distinct operators in the same query block

2018-10-30 Thread Alex Rodoni (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-7687.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Impala 3.1 Doc: Add support for multiple distinct operators in the same query 
> block
> ---
>
> Key: IMPALA-7687
> URL: https://issues.apache.org/jira/browse/IMPALA-7687
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Affects Versions: Impala 3.1.0
>Reporter: Adam Holley
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc
> Fix For: Impala 3.1.0
>
>
> https://gerrit.cloudera.org/#/c/11823/




[jira] [Resolved] (IMPALA-7719) COMPUTE STATS not working in Impala 2.10.0 on Hive created AVRO Table

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7719.
---
Resolution: Cannot Reproduce

> COMPUTE STATS not working in Impala 2.10.0 on Hive created AVRO Table
> -
>
> Key: IMPALA-7719
> URL: https://issues.apache.org/jira/browse/IMPALA-7719
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.10.0
> Environment: Hive version : 1.1.0 , Impala version : 2.10.0, 
> Linux  2.6.32-504.46.1.el6.x86_64
>Reporter: Golap Binod
>Priority: Major
>
> Hi Team,
> I created a Hive Avro table and tried to run COMPUTE STATS on it, but it
> failed with the message "Please re-create the table with column definitions,
> e.g., using the result of 'SHOW CREATE TABLE'". I also tried SHOW CREATE
> TABLE, but that did not work either. My Hive version is 1.1.0 and my Impala
> version is 2.10.0.
> I get the error below:
> COMPUTE STATS h011DEMO.CME_CCAR4_SCNR_MGMT_TARGET_VERSIONS
> FAILED SSH Client Call: Remote Execution of command returned with '1' exit 
> code and Error message 
> Starting Impala Shell using Kerberos authentication
> Using service name 'impala'
> Connected to impala-uat.statestr.com:21000
> Server version: impalad version 2.10.0-cdh5.13.1 RELEASE (build 
> 1e4b23c4eb52dac95c5be6316f49685c41783c51)
> Query: compute stats h011DEMO.CME_CCAR5_SCNR_MGMT_TARGET_VERSIONS
> ERROR: AnalysisException: Cannot COMPUTE STATS on Avro table 
> 'cme_ccar5_scnr_mgmt_target_versions' because its column definitions do not 
> match those in the Avro schema.
> Definition of column 'version_id' of type 'double' does not match the 
> Avro-schema column 'row_guid' of type 'STRING' at position '0'.
> Please re-create the table with column definitions, e.g., using the result of 
> 'SHOW CREATE TABLE'
>  
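
The error message suggests that the table's column definitions are compared with the Avro schema fields position by position. A rough sketch of such a check, assuming hypothetical names (`first_mismatch`, lists of `(name, type)` pairs) and a case-insensitive comparison; the real check in Impala may differ:

```python
def first_mismatch(table_cols, avro_cols):
    """Return a description of the first positional mismatch, or None.

    Both arguments are lists of (name, type) pairs. Names and types are
    compared case-insensitively (an assumption made for this sketch).
    """
    if len(table_cols) != len(avro_cols):
        return "column count differs: %d vs %d" % (len(table_cols), len(avro_cols))
    for pos, (tcol, acol) in enumerate(zip(table_cols, avro_cols)):
        if (tcol[0].lower(), tcol[1].lower()) != (acol[0].lower(), acol[1].lower()):
            return ("column '%s' of type '%s' does not match Avro-schema column "
                    "'%s' of type '%s' at position %d"
                    % (tcol[0], tcol[1], acol[0], acol[1], pos))
    return None


print(first_mismatch([("version_id", "double")], [("row_guid", "STRING")]))
```

Running a comparison like this against the output of DESCRIBE and the fields of the Avro schema file may help show which side is out of date.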




[jira] [Created] (IMPALA-7789) Impala 3.2 Doc: Admission Control Status in Impala Shell

2018-10-30 Thread Alex Rodoni (JIRA)

Alex Rodoni created IMPALA-7789:
---

 Summary: Impala 3.2 Doc: Admission Control Status in Impala Shell
 Key: IMPALA-7789
 URL: https://issues.apache.org/jira/browse/IMPALA-7789
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Reporter: Alex Rodoni
Assignee: Alex Rodoni







[jira] [Resolved] (IMPALA-5933) Compute incremental stats should always return a result set

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5933.
---
   Resolution: Fixed
 Assignee: Zoltán Borók-Nagy
Fix Version/s: Impala 3.0

Should return "No partitions selected for incremental stats update" now. See 
test in 
testdata/workloads/functional-query/queries/QueryTest/compute-stats-incremental.test

commit 2ee914d5b365c8230645fdd0604a67eff1edbeb2
Author: Zoltan Borok-Nagy 
Date:   Thu Apr 5 14:54:27 2018 +0200

IMPALA-5903: Inconsistent specification of result set and result set 
metadata

Before this commit it was quite random which DDL operations
returned a result set and which didn't.

With this commit, every DDL operation returns a summary of
its execution. They declare their result set schema in
Frontend.java, and provide the summary in CatalogOpExecutor.java.

Updated the tests according to the new behavior.

Change-Id: Ic542fb8e49e850052416ac663ee329ee3974e3b9
Reviewed-on: http://gerrit.cloudera.org:8080/9090
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins 


> Compute incremental stats should always return a result set
> ---
>
> Key: IMPALA-5933
> URL: https://issues.apache.org/jira/browse/IMPALA-5933
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Frontend
>Affects Versions: Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, 
> Impala 2.10.0
>Reporter: Alexander Behm
>Assignee: Zoltán Borók-Nagy
>Priority: Major
> Fix For: Impala 3.0
>
>
> Compute incremental stats should always return a result set that states which 
> stats were computed/modified. Today, we have a shortcut in the code for 
> compute incremental stats that does not return a result set when no new stats 
> are computed because all partitions already have incremental stats.
> The fact that the same command sometimes returns a result set and sometimes 
> not depending on the state of a table is strange, and can confuse clients 
> like JDBC/ODBC that might reasonably expect a result set for that statement.
> The issue can be reproduced by running compute incremental stats twice in a 
> row on the same table. The second run does not return a result set.
> The culprit is in client-request-state.cc ClientRequestState::WaitInternal():
> {code}
> ...
>   if (catalog_op_type() == TCatalogOpType::DDL &&
>   ddl_type() == TDdlType::COMPUTE_STATS && child_queries.size() > 0) {
> RETURN_IF_ERROR(UpdateTableAndColumnStats(child_queries));
>   }
> ...
> {code}
> For a no-op incremental stats the number of child queries is 0, so we never 
> set a result set or the result set metadata.
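
The intended behavior can be sketched as follows. This is a simplified Python model, not the C++ in client-request-state.cc; `compute_stats_result` and its argument are hypothetical. The message for the no-op case is the one quoted in the resolution comment; the other summary text is made up for the sketch.

```python
def compute_stats_result(child_query_results):
    """Return a one-row result set for COMPUTE [INCREMENTAL] STATS.

    child_query_results is a list with one entry per child query that ran.
    A result set is produced even when the list is empty (the no-op case).
    """
    if not child_query_results:
        return [["No partitions selected for incremental stats update"]]
    # Hypothetical summary text for the case where child queries did run.
    return [["Updated stats using %d child queries" % len(child_query_results)]]


print(compute_stats_result([]))
```

The point is that the caller always gets a result set, so a JDBC/ODBC client never has to guess from the table state.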




[jira] [Created] (IMPALA-7790) Kudu tests fail when with use_hybrid_clock=false

2018-10-30 Thread Thomas Tauber-Marshall (JIRA)

Thomas Tauber-Marshall created IMPALA-7790:
--

 Summary: Kudu tests fail when with use_hybrid_clock=false
 Key: IMPALA-7790
 URL: https://issues.apache.org/jira/browse/IMPALA-7790
 Project: IMPALA
  Issue Type: Bug
Reporter: Thomas Tauber-Marshall
Assignee: Thomas Tauber-Marshall


Since IMPALA-6812, we've run many of our tests against Kudu at the 
READ_AT_SNAPSHOT scan level, which ensures consistent results. This scan level 
is only supported if Kudu is run with the flag --use_hybrid_clock=true (which 
is the default).

This hasn't generally been a problem in the past, as we've primarily run 
Impala's functional tests against the minicluster, where Kudu is configured 
correctly, but there's been some effort around running these tests against real 
clusters, in which case --use_hybrid_clock=false may be set.

We should at a minimum recognize this situation and skip the tests.
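
A minimal sketch of the skip decision, assuming the test framework can obtain the Kudu flags as a dict by some cluster-specific means. `snapshot_scans_supported`, `skip_reason`, and the dict layout are hypothetical names, not existing test infrastructure.

```python
def snapshot_scans_supported(kudu_flags):
    """Return True if READ_AT_SNAPSHOT scans can be expected to work.

    kudu_flags maps flag names to string values. A missing flag counts as
    the default, which is true.
    """
    value = kudu_flags.get("use_hybrid_clock", "true")
    return str(value).strip().lower() == "true"


def skip_reason(kudu_flags):
    """Return a skip message, or None if the tests should run."""
    if snapshot_scans_supported(kudu_flags):
        return None
    return "Kudu runs with --use_hybrid_clock=false; READ_AT_SNAPSHOT unsupported"


print(skip_reason({"use_hybrid_clock": "false"}))
```

In a pytest-based suite, a non-None reason could be passed to `pytest.skip()` at the start of each affected test.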




[jira] [Resolved] (IMPALA-7785) GROUP BY clause cannot contain a CASE statement

2018-10-30 Thread Paul Rogers (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paul Rogers resolved IMPALA-7785.
-
Resolution: Invalid

> GROUP BY clause cannot contain a CASE statement
> ---
>
> Key: IMPALA-7785
> URL: https://issues.apache.org/jira/browse/IMPALA-7785
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Priority: Minor
>
> The FE cannot handle a {{CASE}} statement in a {{GROUP BY}} clause. As a 
> result, the change in IMPALA-7655 cannot be applied to queries with such a 
> clause for fear of ending up in the situation shown later.
> Consider this simple query:
> {code:sql}
> SELECT case when string_col is not null then string_col else 'foo' end
> 
> FROM functional.alltypestiny 
> GROUP BY case when string_col is not null then string_col else 'foo' end  
>
> {code}
> The above will fail with the following:
> {noformat}
>  org.apache.impala.common.AnalysisException:
>  select list expression not produced by aggregation output
>  (missing from GROUP BY clause?):
>  CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END
> {noformat}
> This then causes the rewrites in IMPALA-7655 to fail:
> {code:sql}
> SELECT coalesce(string_col, 'foo')
> FROM functional.alltypes  
> GROUP BY coalesce(string_col, 'foo') 
> {code}
> The above is rewritten using the new conditional function rewrite rules. 
> Result:
> {noformat}
> org.apache.impala.common.AnalysisException:
>   select list expression not produced by aggregation output
>   (missing from GROUP BY clause?):
>   CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END
> {noformat}




[jira] [Resolved] (IMPALA-6374) test tpcds-q98.test has some incorrect data

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6374.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> test tpcds-q98.test has some incorrect data 
> 
>
> Key: IMPALA-6374
> URL: https://issues.apache.org/jira/browse/IMPALA-6374
> Project: IMPALA
>  Issue Type: Test
>  Components: Infrastructure
>Affects Versions: Impala 2.9.0
>Reporter: Stephen Carlin
>Assignee: Tim Armstrong
>Priority: Critical
> Fix For: Impala 3.1.0
>
>
> I happened to look through the unit tests and it looks like tpcds-q98.test 
> has some bad data in it, but it is verifying correctly.
> One example (among maybe 12 or so) is on line 469:
> line 468: 'EKGD','Houses should 
> ','Books','mystery',1.77,3341.80,1.96
> line 469: 'FFDD',','Books','mystery',2.79,4237.23,2.49
> Note that the 2nd field for line 468 looks normal, but line 469 has just a 
> single quote.
> I believe this is happening on all strings that end with a comma for this 
> test.  The correct result for this line (I believe) should be (note the comma 
> after Poor):
> 'FFDD','French, civil hours must report essential values. 
> Reasonable, complete judges vary clearly homes; often pleasant women would 
> watch. Poor,','Books','mystery',2.79,4237.23,2.48
> My guess as to why this is happening is some code in test_result_verifier.py, 
> specifically in the part that says:
> for col_val in row_string.split(','):
>   # This is a bit tricky because we need to handle the case where a comma may be in
>   # the middle of a string. We detect this by finding a split that starts with an
>   # opening string character but that doesn't end in a string character. It is
>   # possible for the first character to be a single-quote, so handle that case
>   if (col_val.startswith("'") and not col_val.endswith("'")) or (col_val == "'"):
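
One alternative to hand-rolled splitting is to let Python's csv module handle the quoting. The sketch below is not the code in test_result_verifier.py; `split_row` is a hypothetical helper and the row is an abbreviated version of the one quoted above.

```python
import csv


def split_row(row_string):
    """Split one expected-result row into column values.

    Uses the csv module with a single quote as the quote character, so commas
    inside quoted strings, including a trailing comma, stay in the string.
    Note that the surrounding quotes are stripped from the returned values.
    """
    return next(csv.reader([row_string], quotechar="'"))


row = "'FFDD','French, civil hours. Poor,','Books','mystery',2.79,4237.23,2.48"
print(split_row(row))
```

Values that contain unescaped single quotes inside a quoted string would still need extra handling, and the verifier would have to tolerate the stripped quotes.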



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Created] (IMPALA-7793) CASE statement does not handle NULL from UDF overflow

2018-10-30 Thread Paul Rogers (JIRA)

Paul Rogers created IMPALA-7793:
---

 Summary: CASE statement does not handle NULL from UDF overflow
 Key: IMPALA-7793
 URL: https://issues.apache.org/jira/browse/IMPALA-7793
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.0
Reporter: Paul Rogers


The test suite {{QueryTest/decimal-exprs}} contains the following test:

{code:sql}
set decimal_v2=false;
set ENABLE_EXPR_REWRITES=false;
select coalesce(1.8, cast(0 as decimal(38,38)))
{code}

Which produces this result:

{noformat}
+--+
| coalesce(1.8, cast(0 as decimal(38,38))) |
+--+
| 0.00 |
+--+
WARNINGS: UDF WARNING: Decimal expression overflowed, returning NULL
{noformat}

Notice that the "1.8" overflowed when being put into a {{DECIMAL(38,38)}} type. 
(The precision and scale are both 38, meaning all digits are after the decimal 
point.)
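
Why 1.8 overflows can be checked directly: a DECIMAL(p, s) value has room for p - s digits before the decimal point. A minimal sketch, where `fits_decimal` is a hypothetical helper that looks only at the integer digits and ignores rounding of fractional digits:

```python
from decimal import Decimal


def fits_decimal(value, precision, scale):
    """Return True if value fits DECIMAL(precision, scale) without overflow.

    Only the digits before the decimal point are checked: a type with
    precision p and scale s has room for p - s integer digits.
    """
    integer_part = int(abs(Decimal(value)))  # truncates toward zero
    integer_digits = len(str(integer_part)) if integer_part else 0
    return integer_digits <= precision - scale


print(fits_decimal("1.8", 38, 38))  # one integer digit, zero allowed
print(fits_decimal("0", 38, 38))    # no integer digits needed
```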

The {{coalesce()}} function caught the overflow, treated it as a {{NULL}}, and 
selected the second value from the list, which is 0.

Very good. Now, try the equivalent CASE form (from IMPALA-7655):

{noformat}
select CASE WHEN 1.8 IS NOT NULL THEN 1.8 ELSE cast(0 as decimal(38,38)) END;

+---+
| case when 1.8 is not null then 1.8 else cast(0 as decimal(38,38)) end |
+---+
| NULL  |
+---+
WARNINGS: UDF WARNING: Decimal expression overflowed, returning NULL
{noformat}

Apparently, the overflow somehow caused the {{ELSE}} clause to not fire.

This one is likely a bug in the BE code generation. However, I tried the {{CASE}} 
query with a variety of options:

{noformat}
set disable_codegen=true;

and

set disable_codegen=false;
set disable_codegen_rows_threshold=0;

and

set disable_codegen_rows_threshold=10;
{noformat}

In all cases, the {{CASE}} produced the wrong result. Also tried wrapping the 
expression {{1.8 IS NOT NULL}} in a variety of forms: {{IS TRUE}}, {{IS NOT 
FALSE}}. None of this worked correctly.

The result of this bug is that the above-mentioned test case fails in a build 
that contains IMPALA-7655.




[jira] [Created] (IMPALA-7794) Rewrite ownership authorization tests

2018-10-30 Thread Fredy Wijaya (JIRA)

Fredy Wijaya created IMPALA-7794:


 Summary: Rewrite ownership authorization tests
 Key: IMPALA-7794
 URL: https://issues.apache.org/jira/browse/IMPALA-7794
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Reporter: Fredy Wijaya
Assignee: Fredy Wijaya


The ownership tests have been very flaky. There have been a few attempts to fix 
them, but none has truly fixed the flakiness because the tests are 
timing-based. Some of the tests are also not very readable. These tests need to 
be rewritten while keeping the code coverage.




[jira] [Updated] (IMPALA-5847) Some query options do not work as expected in .test files

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-5847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-5847:
--
Issue Type: Improvement  (was: Bug)

> Some query options do not work as expected in .test files
> -
>
> Key: IMPALA-5847
> URL: https://issues.apache.org/jira/browse/IMPALA-5847
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Alexander Behm
>Priority: Minor
>
> We often use "set" in .test files to alter query options. Theoretically, a 
> "set" command should change the session-level query options and in most cases 
> a single .test file is executed from the same Impala session. However, for 
> some options using "set" within a query section does not seem to work. For 
> example, "num_nodes" does not work as expected as shown below.
> PyTest:
> {code}
> import pytest
> from tests.common.impala_test_suite import ImpalaTestSuite
> class TestStringQueries(ImpalaTestSuite):
>   @classmethod
>   def get_workload(cls):
>     return 'functional-query'
>   def test_set_bug(self, vector):
>     self.run_test_case('QueryTest/set_bug', vector)
> {code}
> Corresponding .test file:
> {code}
> ====
> ---- QUERY
> set num_nodes=1;
> select count(*) from functional.alltypes;
> select count(*) from functional.alltypes;
> select count(*) from functional.alltypes;
> ---- RESULTS
> 7300
> ---- TYPES
> BIGINT
> ====
> {code}
> After running the test above, I validated that the 3 queries were run from 
> the same session, and that the queries run a distributed plan. The 
> "num_nodes" option was definitely not picked up. I am not sure which query 
> options are affected. In several .test files setting other query options does 
> seem to work as expected.
> I suspect that the test framework might keep its own list of default query 
> options which get submitted together with the query, so the session-level 
> options are overridden on a per-request basis. For example, if I change the 
> pytest to remove the "num_nodes" dictionary entry, then the test works as 
> expected.
> PyTest workaround:
> {code}
> import pytest
> from tests.common.impala_test_suite import ImpalaTestSuite
> class TestStringQueries(ImpalaTestSuite):
>   @classmethod
>   def get_workload(cls):
>     return 'functional-query'
>   def test_set_bug(self, vector):
>     # Workaround SET bug
>     vector.get_value('exec_option').pop('num_nodes', None)
>     self.run_test_case('QueryTest/set_bug', vector)
> {code}
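The suspected mechanism can be modelled in a few lines of Python. This is only a sketch of the hypothesis in the description (per-request options overriding session-level options); the function `effective_options` and the dictionaries are hypothetical and do not come from the test framework.

```python
def effective_options(session_options, request_options):
    """Merge options the way the description suspects: per-request values win.

    Hypothetical model, not framework code.
    """
    merged = dict(session_options)
    merged.update(request_options)  # request-level entries override the session
    return merged

# "set num_nodes=1;" changes the session-level value ...
session = {'num_nodes': '1'}
# ... but the framework's own defaults travel with every request.
request = {'num_nodes': '0', 'batch_size': '0'}

overridden = effective_options(session, request)['num_nodes']

# The workaround removes the entry from the per-request options,
# so the session-level value is no longer shadowed.
request.pop('num_nodes', None)
after_workaround = effective_options(session, request)['num_nodes']
```

Under this model the `set` statement appears to have no effect until the `num_nodes` entry is removed from the per-request options, which matches the behaviour reported above.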




[jira] [Updated] (IMPALA-1977) Slow cross join query - requires band join implementation

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1977:
--
Issue Type: Improvement  (was: Bug)

> Slow cross join query - requires band join implementation
> -
>
> Key: IMPALA-1977
> URL: https://issues.apache.org/jira/browse/IMPALA-1977
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.1.1, Impala 2.2
> Environment: CentOS 6.6, CDH 5.4.0
>Reporter: Isaac Hodes
>Priority: Minor
> Attachments: complete.txt, failed agg profile.txt, oom count 
> profile.txt, profile.txt, running select pileup chr20 query.txt, successful 
> hash join profile.txt
>
>
> I'm unable to get this query to return; according to Cloudera Manager, it 
> hangs around 12%, in F01, which is just an HDFS scan.
> Query (I'd like to be able to run this without restricting the reference and 
> bam to specific contigs, as well):
> {code:sql}
> select ref.contig, ref.position, bam.sequence
> from flatbam bam, reference_genome ref
> where ref.contig = 'chr20'
> and bam.contig__contigname = '20'
> and ref.`position` between bam.`start` and bam.`end`
> ;
> {code}
> Some stats (curiously, the job seemed to get to 98% before hanging without 
> table stats; I can't drop table stats to repro: 
> https://issues.cloudera.org/browse/IMPALA-1976)
> {code}
> > show table stats flatbam;
> +---++-+--+---+-+---+
> | #Rows | #Files | Size| Bytes Cached | Cache Replication | Format  | 
> Incremental stats |
> +---++-+--+---+-+---+
> | 856755914 | 627| 60.48GB | NOT CACHED   | NOT CACHED| PARQUET | 
> false |
> +---++-+--+---+-+---+
> Fetched 1 row(s) in 0.08s
> > show table stats reference_genome;
> +++-+--+---+-+---+
> | #Rows  | #Files | Size| Bytes Cached | Cache Replication | Format  
> | Incremental stats |
> +++-+--+---+-+---+
> | 3137327831 | 91 | 12.63GB | NOT CACHED   | NOT CACHED| PARQUET 
> | false |
> +++-+--+---+-+---+
> Fetched 1 row(s) in 0.02s
> > select count(*) from flatbam where contig__contigname = '20';
> +--+
> | count(*) |
> +--+
> | 17378815 |
> +--+
> Fetched 1 row(s) in 2.44s
> > select count(*) from reference_genome where contig = 'chr20';
> +--+
> | count(*) |
> +--+
> | 63025520 |
> +--+
> Fetched 1 row(s) in 3.66s
> {code}
> Explain:
> {code}
> 
> Estimated Per-Host Requirements: Memory=1.31GB VCores=2
> F02:PLAN FRAGMENT [UNPARTITIONED]
>   04:EXCHANGE [UNPARTITIONED]
>  hosts=91 per-host-mem=unavailable
>  tuple-ids=0,1 row-size=175B cardinality=3513626239162
> F00:PLAN FRAGMENT [RANDOM]
>   DATASTREAM SINK [FRAGMENT=F02, EXCHANGE=04, UNPARTITIONED]
>   02:CROSS JOIN [BROADCAST]
>   |  predicates: ref.position >= bam.start, ref.position <= bam.`end`
>   |  hosts=91 per-host-mem=814.93MB
>   |  tuple-ids=0,1 row-size=175B cardinality=3513626239162
>   |
>   |--03:EXCHANGE [BROADCAST]
>   | hosts=91 per-host-mem=0B
>   | tuple-ids=1 row-size=25B cardinality=34859198
>   |
>   00:SCAN HDFS [default.flatbam bam, RANDOM]
>  partitions=1/1 files=627 size=60.48GB
>  predicates: bam.contig__contigname = '20'
>  table stats: 856755914 rows total
>  column stats: all
>  hosts=91 per-host-mem=352.00MB
>  tuple-ids=0 row-size=150B cardinality=10079481
> F01:PLAN FRAGMENT [RANDOM]
>   DATASTREAM SINK [FRAGMENT=F00, EXCHANGE=03, BROADCAST]
>   01:SCAN HDFS [default.reference_genome ref, RANDOM]
>  partitions=1/1 files=91 size=12.63GB
>  predicates: ref.contig = 'chr20'
>  table stats: 3137327831 rows total
>  column stats: all
>  hosts=91 per-host-mem=176.00MB
>  tuple-ids=1 row-size=25B cardinality=34859198
> 
> {code}
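For readers unfamiliar with the term in the issue title: a band join evaluates a range predicate such as `position between start and end` without comparing every pair of rows. The Python sketch below shows one common way to do that (sort one side, then binary-search the range for each interval). The function name `band_join` and the in-memory lists are assumptions for illustration and say nothing about how Impala would implement it.

```python
from bisect import bisect_left, bisect_right

def band_join(positions, intervals):
    """Return (position, start, end) for every position with start <= position <= end.

    Hypothetical sketch: sorting once and binary-searching each interval avoids
    the full cross product that the CROSS JOIN plan above has to evaluate.
    """
    sorted_pos = sorted(positions)
    matches = []
    for start, end in intervals:
        lo = bisect_left(sorted_pos, start)   # first position >= start
        hi = bisect_right(sorted_pos, end)    # one past the last position <= end
        for pos in sorted_pos[lo:hi]:
            matches.append((pos, start, end))
    return matches
```

Each interval costs a logarithmic lookup plus the number of actual matches, rather than a scan of all positions.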




[jira] [Resolved] (IMPALA-3598) Timestamp casting is inconsistent

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3598.
---
Resolution: Fixed

> Timestamp casting is inconsistent
> -
>
> Key: IMPALA-3598
> URL: https://issues.apache.org/jira/browse/IMPALA-3598
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Taras Bobrovytsky
>Priority: Minor
>
> The following two queries result in different timestamp casting behavior.
> {code}
> select cast(25343000 as timestamp);
> {code}
> Result:
> {code}
> 1-11-16 14:13:20
> {code}
> The following query returns NULL.
> {code}
> select cast("1-11-16 14:13:20" as timestamp);
> {code}




[jira] [Updated] (IMPALA-988) Join strategy (broadcast vs shuffle) decision does not take mem limit and other joins into account

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-988:
-
Issue Type: Improvement  (was: Bug)

> Join strategy (broadcast vs shuffle) decision does not take mem limit and 
> other joins into account
> --
>
> Key: IMPALA-988
> URL: https://issues.apache.org/jira/browse/IMPALA-988
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.2.1
>Reporter: Alan Choi
>Priority: Minor
>  Labels: resource-management
>
> The amount of available memory changes the trade-off between broadcast and 
> shuffle join strategies: if switching to shuffle join can avoid spilling to 
> disk, it may be worth paying the cost of the additional network transfer.
> There are two issues:
> 1. Join strategy decision only takes query mem-limit into account but ignores 
> process mem-limit.
> 2. Join strategy decision does not take other joins of the same query into 
> account. When multiple joins are present, it'll go over the mem-limit.
> Note that when IMPALA-3200 is completed, this shouldn't prevent the query 
> running to completion, but still affects performance.
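The trade-off described above can be illustrated with a toy cost model in Python. Everything here is an assumption made for illustration (the function name, the cost formulas, and the idea of a single per-host limit); it is not the planner's actual logic.

```python
def choose_join_strategy(build_bytes, probe_bytes, num_hosts, per_host_mem_limit):
    """Toy model: prefer the cheapest network cost among strategies that fit in memory.

    Hypothetical formulas:
      broadcast: the build side is sent to every host and held there in full.
      shuffle:   both sides are repartitioned; each host holds 1/num_hosts of the build side.
    """
    candidates = [
        {'name': 'broadcast',
         'network': build_bytes * num_hosts,
         'mem': build_bytes},
        {'name': 'shuffle',
         'network': build_bytes + probe_bytes,
         'mem': build_bytes / num_hosts},
    ]
    fitting = [c for c in candidates if c['mem'] <= per_host_mem_limit]
    pool = fitting or candidates  # if nothing fits, fall back to the cheapest overall
    return min(pool, key=lambda c: c['network'])['name']
```

With a small build side and a generous limit the model picks broadcast. Lowering the limit below the build size flips the choice to shuffle, even though shuffle moves more data over the network.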




[jira] [Updated] (IMPALA-1023) Select nodes missing from plans result in minor inefficiencies.

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1023:
--
Issue Type: Improvement  (was: Bug)

> Select nodes missing from plans result in minor inefficiencies.
> ---
>
> Key: IMPALA-1023
> URL: https://issues.apache.org/jira/browse/IMPALA-1023
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.3.1
>Reporter: Alexander Behm
>Priority: Minor
>
> Our predicate assignment/propagation sometimes omits predicates that could 
> also be assigned in a Select node.
> The plans are still correct, i.e., the predicates don't simply disappear, but 
> the plans could be more efficient with an additional Select node.
> For example, consider these queries and plans:
> {code}
> select a.int_col, b.bigint_col from
> (select id, int_col, bigint_col from functional.alltypes) b
> inner join
> (select id, int_col, bigint_col from functional.alltypes order by id limit 
> 100) a
> on a.id = b.id
> where a.id > 10 and b.id > 20
> +-+
> | Explain String  |
> +-+
> | Estimated Per-Host Requirements: Memory=160.00MB VCores=2   |
> | |
> | 08:EXCHANGE [PARTITION=UNPARTITIONED]   |
> | |   |
> | 03:HASH JOIN [INNER JOIN, PARTITIONED]  |
> | |  hash predicates: id = id |
> | |   |
> | |--07:EXCHANGE [PARTITION=HASH(id)] |
> | |  |  limit: 100|
> | |  ||
> | |  05:TOP-N [LIMIT=100] |
> | |  |  order by: id ASC  |
> | |  ||
> | |  04:EXCHANGE [PARTITION=UNPARTITIONED]|
> | |  ||
> | |  02:TOP-N [LIMIT=100] |
> | |  |  order by: id ASC  |
> | |  ||
> | |  01:SCAN HDFS [functional.alltypes]   |
> | | partitions=24/24 size=478.45KB|
> | |   |
> | 06:EXCHANGE [PARTITION=HASH(id)]|
> | |   |
> | 00:SCAN HDFS [functional.alltypes]  |
> |partitions=24/24 size=478.45KB   |
> |predicates: functional.alltypes.id > 10, functional.alltypes.id > 20 |
> +-+
> {code}
> Here is the same query with the a and b tables flipped:
> {code}
> select a.int_col, b.bigint_col from
> (select id, int_col, bigint_col from functional.alltypes order by id limit 
> 100) a
> inner join
> (select id, int_col, bigint_col from functional.alltypes) b
> on a.id = b.id
> where a.id > 10 and b.id > 20
> +-+
> | Explain String  |
> +-+
> | Estimated Per-Host Requirements: Memory=160.00MB VCores=1   |
> | |
> | 08:EXCHANGE [PARTITION=UNPARTITIONED]   |
> | |   |
> | 04:HASH JOIN [INNER JOIN, BROADCAST]|
> | |  hash predicates: id = id |
> | |   |
> | |--07:EXCHANGE [BROADCAST]  |
> | |  ||
> | |  02:SELECT|
> | |  |  predicates: id > 10

[jira] [Updated] (IMPALA-1076) Add a shell option to limit the maximum number of rows that are pretty printed

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-1076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1076:
--
Issue Type: Improvement  (was: Bug)

> Add a shell option to limit the maximum number of rows that are pretty printed
> --
>
> Key: IMPALA-1076
> URL: https://issues.apache.org/jira/browse/IMPALA-1076
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 1.3.1
>Reporter: Nong Li
>Priority: Minor
>  Labels: impala-shell
>
> I think people like to run a query that returns a large number of rows and 
> redirect them as a simple benchmark. This results in a high amount of time 
> spent in pretty print.
> We should add an option "max_pretty_printed_rows" or something and when we 
> hit that value, the shell should disable pretty printing for that query.
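A minimal sketch of the proposed behaviour, in Python: the option name `max_pretty_printed_rows` comes from the suggestion above, while the function `format_rows` and the decision to look at the total row count up front (rather than switching mid-stream) are simplifying assumptions, not impala-shell code.

```python
def format_rows(rows, max_pretty_printed_rows):
    """Pretty print small results; fall back to tab-delimited output for large ones."""
    if len(rows) > max_pretty_printed_rows:
        # Cheap path: no column-width computation, no borders.
        return "\n".join("\t".join(str(v) for v in row) for row in rows)
    if not rows:
        return ""
    # Pretty path: compute each column's width, then draw a bordered table.
    widths = [max(len(str(row[i])) for row in rows) for i in range(len(rows[0]))]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    lines = [border]
    for row in rows:
        cells = (" " + str(v).ljust(w) + " " for v, w in zip(row, widths))
        lines.append("|" + "|".join(cells) + "|")
    lines.append(border)
    return "\n".join(lines)
```

The point of the threshold is that the width computation and padding in the pretty path touch every value, which is the cost the description wants to avoid for large result sets.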




[jira] [Resolved] (IMPALA-2692) Cancelling Timed Out Queries and/or Timed out connections

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2692.
---
Resolution: Cannot Reproduce

It's not clear what the problem actually is. Please reopen if you have steps to 
reproduce.

> Cancelling Timed Out Queries and/or Timed out connections
> -
>
> Key: IMPALA-2692
> URL: https://issues.apache.org/jira/browse/IMPALA-2692
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2.1
> Environment: ODBC and Hue connections
>Reporter: Summer
>Priority: Minor
>
> Hi-
> When running queries via Hue or via ODBC connection (for example through the 
> impyla module), if the query or connection times out, even if you then 
> execute a conn.close() statement, that connection does not actually get 
> closed, leaving an ongoing open connection with possible security issues that 
> can only be cancelled via the cloudera manager.




[jira] [Resolved] (IMPALA-3137) Switching between databases give a wrong result

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3137.
---
Resolution: Invalid

We don't track bugs with this driver on Apache JIRA.

> Switching between databases give a wrong result
> ---
>
> Key: IMPALA-3137
> URL: https://issues.apache.org/jira/browse/IMPALA-3137
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2, Impala 2.3.0
>Reporter: Jiri Novak
>Priority: Minor
> Attachments: ImpalaSwitchDB.zip
>
>
> Setting default database in ODBC driver setup is case sensitive and affects 
> the result of select in a very strange way.
> Use standard C# (.Net version is 4.5) and ADO.Net classes from 
> System.Data.Odbc namespace to access ODBC data source, ODBC driver 
> v2.5.30.1011 (32 bit) 
> Test case:
> 1. Create two databases, TESTDB1 and TESTDB2.
> 2. Create two tables, TESTDB1.TABX and TESTDB2.TABX.
> 3. Insert 5 rows into table TESTDB1.TABX.
> 4. Set the database in “Cloudera ODBC Driver for Impala DSN Setup” to TESTDB1 
> (uppercase!).
> 5. Execute the following (both statements without the statement terminator ';'):
> {code:sql}
> USE TESTDB2
> SELECT COUNT(*) FROM TABX -- you will get the expected result – 0
> {code}
> 6. Repeat step 4, but this time set the default database to 
> testdb1 (lowercase).
> 7. Execute the same script as in step 5. Now you will get the result 5! 
> The table TABX has been resolved as TESTDB1.TABX, which is wrong.
> A simple .Net application is attached which can be used to reproduce the 
> issue. Only the connection string needs to be changed, directly in the code.




[jira] [Updated] (IMPALA-3091) LLVM memory is not accounted for in query mem tracker

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3091:
--
Issue Type: Improvement  (was: Bug)

> LLVM memory is not accounted for in query mem tracker
> -
>
> Key: IMPALA-3091
> URL: https://issues.apache.org/jira/browse/IMPALA-3091
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.2
>Reporter: Skye Wanderman-Milne
>Priority: Minor
>  Labels: codegen, resource-management
>
> LLVM can use a lot of memory optimizing the module in pathological cases, 
> which is only accounted for in the process mem tracker. For instance, the 
> following query:
> {noformat}
> select * from functional_text_lzo.widetable_1000_cols;
> {noformat}
> Causes the process tracker to go to 3GB, while the query tracker reports 
> almost nothing:
> {noformat}
> Process: Limit=12.44 GB Consumption=2.98 GB
>   RequestPool=default-pool: Consumption=16.02 KB
> Query(2d44bf2d595eb55f:3641ee051e509f82) Limit: Consumption=16.02 KB
>   Fragment 2d44bf2d595eb55f:3641ee051e509f83: Consumption=8.00 KB
> EXCHANGE_NODE (id=1): Consumption=0
> DataStreamRecvr: Consumption=0
>   Block Manager: Limit=9.95 GB Consumption=0
>   Fragment 2d44bf2d595eb55f:3641ee051e509f84: Consumption=8.02 KB
> HDFS_SCAN_NODE (id=0): Consumption=0
> DataStreamSender: Consumption=16.00 B
> {noformat}
> This is related to IMPALA-967.




[jira] [Created] (IMPALA-7785) Analyzer cannot handle GROUP BY clause rewrites

2018-10-30 Thread Paul Rogers (JIRA)

Paul Rogers created IMPALA-7785:
---

 Summary: Analyzer cannot handle GROUP BY clause rewrites
 Key: IMPALA-7785
 URL: https://issues.apache.org/jira/browse/IMPALA-7785
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.0
Reporter: Paul Rogers


The FE analyzer has a stage that rewrites expressions to make them simpler. The 
analyzer also has a stage that matches up {{GROUP BY}} expressions with 
{{SELECT}} clause expressions. Apparently, the two don't work together:

{code:sql}
SELECT coalesce(string_col, 'foo')
FROM functional.alltypes  
GROUP BY coalesce(string_col, 'foo') 
{code}

The above is rewritten using the new conditional function rewrite rules. Result:

{noformat}
org.apache.impala.common.AnalysisException:
  select list expression not produced by aggregation output
  (missing from GROUP BY clause?):
  CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END
{noformat}
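The mismatch can be reproduced in miniature with a toy expression model in Python. The tuple representation, the `rewrite` rule, and the assumption that only the select list ends up rewritten are all illustrative guesses about the failure mode, not the analyzer's actual code.

```python
def rewrite(expr):
    """Toy rule: coalesce(a, b) -> CASE WHEN a IS NOT NULL THEN a ELSE b END."""
    if isinstance(expr, tuple) and len(expr) == 3 and expr[0] == 'coalesce':
        _, a, b = expr
        return ('case', ('is_not_null', a), a, b)
    return expr

def select_matches_group_by(select_exprs, group_by_exprs):
    """Every select-list expression must appear among the GROUP BY expressions."""
    return all(e in group_by_exprs for e in select_exprs)

select_list = [('coalesce', 'string_col', "'foo'")]
group_by = [('coalesce', 'string_col', "'foo'")]

# Rewriting only one side breaks the structural match ...
only_select_rewritten = select_matches_group_by(
    [rewrite(e) for e in select_list], group_by)
# ... while rewriting both sides (or neither) keeps it intact.
both_rewritten = select_matches_group_by(
    [rewrite(e) for e in select_list], [rewrite(e) for e in group_by])
```

In this model the error message above corresponds to the first case: the `CASE` expression in the select list no longer equals the `coalesce` expression that the grouping was matched against.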




[jira] [Updated] (IMPALA-3636) Regression in DecimalOperators::EQ with codegen disabled

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3636:
--
Issue Type: Improvement  (was: Bug)

> Regression in DecimalOperators::EQ with codegen disabled
> 
>
> Key: IMPALA-3636
> URL: https://issues.apache.org/jira/browse/IMPALA-3636
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: performance, regression
>
> Some of the decimal improvements that came in Impala 2.6 introduced a 
> regression in the non-codegened path.
> This regression was caused by 
> https://github.com/cloudera/Impala/blob/cdh5-trunk/testdata/workloads/targeted-perf/queries/primitive_orderby_all.test.
>  
> After
> ||Function Stack||CPU Time: Total||
> |impala::DecimalOperators::Eq_DecimalVal_DecimalVal|62.207s|
> |  --impala::Expr::GetConstantInt|55.458s|
> |  --impala::DecimalValue::Eq|1.480s|
> |  --impala::GetDecimal8Value|0.290s|
> |  --impala::DecimalValue<__int128>::Eq|0.190s|
>   
>  Before 
> ||Function Stack||CPU Time: Total||
> |impala::DecimalOperators::Eq_DecimalVal_DecimalVal|9.809s|
> |  --impala::DecimalValue::Compare|2.300s|
> |  --impala_udf::FunctionContext::GetArgType|2.130s|
> |  --func@0x812950|0.390s|
> This is a simplified version of the query which can be used as a repro
> {code}
> select *
> FROM (
>   SELECT Rank() OVER (
>   ORDER BY l_extendedprice
> ,l_quantity
> ,l_discount
> ,l_tax
>   ) AS rank
>   FROM lineitem
>   WHERE l_shipdate < '1992-05-09'
>   ) a
> WHERE rank < 10
> {code}




[jira] [Commented] (IMPALA-7780) Rebase PlannerTest expected output for estimates, errors

2018-10-30 Thread Sahil Takiar (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668898#comment-16668898
 ] 

Sahil Takiar commented on IMPALA-7780:
--

+1 to some type of long term solution to this. When modifying Planner tests 
there seems to be a lot of manual work involved in updating the {{.test}} 
files. Either (1) you can make the changes manually, or (2) you can run the 
tests and copy the files from {{$IMPALA_FE_TEST_LOGS_DIR/PlannerTest}} 
(however, if you do this you hit the issue described in the JIRA description).

Unless I'm missing an easier way to re-generate planner tests?

If the file sizes are already ignored during the diff operation, it would be 
nice to just mask them. This is what Hive does. It replaces certain patterns 
with {{#### A masked pattern was here ####}}. The target patterns are usually 
strings that can change depending on the test environment (e.g. a HDFS path, or 
in this case file sizes).
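A small Python sketch of what such masking could look like for file sizes. The regular expression, the placeholder text, and the function name `mask_sizes` are assumptions for illustration; they are not taken from PlannerTest or from Hive.

```python
import re

# Matches values such as size=54.20MB or size=478.45KB in plan output.
SIZE_PATTERN = re.compile(r'size=\d+(?:\.\d+)?(?:B|KB|MB|GB|TB)')

def mask_sizes(plan_text):
    """Replace environment-dependent file sizes with a fixed placeholder."""
    return SIZE_PATTERN.sub('size=<masked>', plan_text)

expected = "partitions=1/1 files=2 size=54.20MB"
actual = "partitions=1/1 files=2 size=54.21MB"

# After masking, the two lines compare equal, so a plain diff stays quiet.
same_after_masking = mask_sizes(expected) == mask_sizes(actual)
```

If both the expected and the actual output were written in masked form, a copied actual-results file would no longer introduce spurious differences in these fields.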

> Rebase PlannerTest expected output for estimates, errors
> 
>
> Key: IMPALA-7780
> URL: https://issues.apache.org/jira/browse/IMPALA-7780
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.0
>Reporter: Paul Rogers
>Priority: Trivial
>
> The front-end includes the {{PlannerTest}} test which works by running a 
> query, writing the plan to a file, comparing selected parts of the file to 
> expected results, and flagging if the results differ.
> A plan includes some things we test (operators) and some we do not (text of 
> error messages, value of memory estimates). Over time the expected and actual 
> files have drifted apart. Example:
> {noformat}
> Expected:partitions=1/1 files=2 size=54.20MB
> Actual:  partitions=1/1 files=2 size=54.21MB
> {noformat}
> While the tests still pass (because we ignore the parts which have drifted), 
> it is a pain to track down issues because we must learn to manually ignore 
> "unimportant" differences.
> This ticket asks to "rebase" planner tests on the latest results, copying 
> into the expected results file the current "noise" values from the actual 
> results.




[jira] [Created] (IMPALA-7783) test_default_timezone failing on real cluster

2018-10-30 Thread David Knupp (JIRA)

David Knupp created IMPALA-7783:
---

 Summary: test_default_timezone failing on real cluster
 Key: IMPALA-7783
 URL: https://issues.apache.org/jira/browse/IMPALA-7783
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.1.0
Reporter: David Knupp


shell/test_shell_commandline.py/test_default_timezone is failing due to issues 
in asserting zoneinfo/tzname 
{noformat}
shell/test_shell_commandline.py:715: in test_default_timezone
assert os.path.isfile("/usr/share/zoneinfo/" + tzname)
E   assert (('/usr/share/zoneinfo/' + 
'SystemV/PST8PDT'))
E+  where  = .isfile
E+where 
 = os.path {noformat}




[jira] [Resolved] (IMPALA-2876) Investigate on applying rtm to spinlock

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2876.
---
Resolution: Later

> Investigate on applying rtm to spinlock
> ---
>
> Key: IMPALA-2876
> URL: https://issues.apache.org/jira/browse/IMPALA-2876
> Project: IMPALA
>  Issue Type: Bug
>  Components: Perf Investigation
>Affects Versions: Impala 2.3.0
>Reporter: Zuo Wang
>Priority: Minor
>
> Investigate on applying rtm to spinlock and when to use it.




[jira] [Updated] (IMPALA-3025) Add empty string test coverage

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3025:
--
Issue Type: Test  (was: Bug)

> Add empty string test coverage
> --
>
> Key: IMPALA-3025
> URL: https://issues.apache.org/jira/browse/IMPALA-3025
> Project: IMPALA
>  Issue Type: Test
>  Components: Backend
>Affects Versions: Impala 2.5.0
>Reporter: Skye Wanderman-Milne
>Priority: Minor
>  Labels: ramp-up
>
> As revealed by IMPALA-3018, we have little to no test coverage for empty 
> strings. We should add more coverage.




[jira] [Resolved] (IMPALA-2957) Query Profile: Incorrect "#Hosts" count in ExecSummary for SCAN NODE

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2957.
---
Resolution: Duplicate

> Query Profile: Incorrect "#Hosts" count in ExecSummary for SCAN NODE
> 
>
> Key: IMPALA-2957
> URL: https://issues.apache.org/jira/browse/IMPALA-2957
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.1
>Reporter: Alan Choi
>Priority: Minor
>
> *Problem:*
> For a scan node, even if it's scanning only one file with one block (but 3 
> replicas), the "#Hosts" count should be one, but the ExecSummary showed "3" 
> instead.




[jira] [Assigned] (IMPALA-3149) Bind variable issue in ODBC

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-3149:
-

Assignee: (was: Syed A. Hashmi)

> Bind variable issue in ODBC
> ---
>
> Key: IMPALA-3149
> URL: https://issues.apache.org/jira/browse/IMPALA-3149
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2, Impala 2.3.0
>Reporter: Jiri Novak
>Priority: Minor
>
> For some reason Cloudera Impala does not recognize a bind variable in the HAVING 
> clause.
> If we execute the following simple query using .NET and ADO.NET
> {code:sql}
> SELECT COUNT(address.address_id) 
> , address.country
> FROM quest_stage.address address 
> GROUP BY address.country 
> HAVING (COUNT(address.address_id) > ?)
> {code}
> It returns the following error
> Error:
> {noformat}
> [Cloudera][ImpalaODBC] (110) Error while executing a query in Impala: [HY000] 
> : AnalysisException: Syntax error in line 5:
> HAVING (COUNT(address.address_id) > ?)
> ^
> Encountered: Unexpected character
> Expected: CASE, CAST, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, TRUE, IDENTIFIER
> {noformat}
> The bind variable works correctly in a WHERE clause. The query also returns the correct 
> result if we use a number instead of the bind variable.




[jira] [Created] (IMPALA-7784) Partition pruning handles escaped strings incorrectly

2018-10-30 Thread Csaba Ringhofer (JIRA)

Csaba Ringhofer created IMPALA-7784:
---

 Summary: Partition pruning handles escaped strings incorrectly
 Key: IMPALA-7784
 URL: https://issues.apache.org/jira/browse/IMPALA-7784
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.0
Reporter: Csaba Ringhofer


Repro:
{code}
create table tpart (i int) partitioned by (p string);
insert into tpart partition (p="\"") values (1);

select  * from tpart where p = "\"";
Result:
Fetched 0 row(s)

select  * from tpart where p = '"';
Result:
1,

{code}

Hive returns the row for both queries.




[jira] [Resolved] (IMPALA-3180) Impala Daemon Ready Status leading to Monitor-HostMonitor throttling_logger ERROR

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3180.
---
Resolution: Invalid

Looks like an issue with Cloudera Manager, which isn't an Apache project.

> Impala Daemon Ready Status leading to Monitor-HostMonitor throttling_logger 
> ERROR
> -
>
> Key: IMPALA-3180
> URL: https://issues.apache.org/jira/browse/IMPALA-3180
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.2
>Reporter: chaitanya
>Priority: Minor
>  Labels: impala
>
> When I check the cloudera-scm-agent logs, I see the following:
> [11/Mar/2016 10:50:19 +] 17000 Metadata-Plugin navigator_plugin_pipeline 
> INFO Stopping Navigator Plugin Pipeline '' for cluster-host-inspector 
> (log dir: None)
> [11/Mar/2016 10:50:23 +] 17000 CP Server Thread-10 _cplogging   INFO 
> 10.81.80.34 - - [11/Mar/2016:10:50:23] "GET 
> /process/9408-cluster-host-inspector/files/inspector HTTP/1.1" 200 2425 "" 
> "Java/1.7.0_67"
> [11/Mar/2016 10:50:38 +] 17000 MainThread agent INFO Process 
> with same id has changed: 9408-cluster-host-inspector.
> [11/Mar/2016 10:50:38 +] 17000 MainThread agent INFO 
> Deactivating process 9408-cluster-host-inspector
> [11/Mar/2016 10:50:39 +] 17000 Metadata-Plugin navigator_plugin INFO 
> stopping Metadata Plugin for cluster-host-inspector with pipelines []
> [11/Mar/2016 10:50:39 +] 17000 Metadata-Plugin navigator_plugin_pipeline 
> INFO Stopping Navigator Plugin Pipeline '' for cluster-host-inspector 
> (log dir: None)
> [11/Mar/2016 10:50:41 +] 17000 Audit-Plugin navigator_plugin INFO 
> stopping Audit Plugin for cluster-host-inspector with pipelines []
> [11/Mar/2016 10:50:41 +] 17000 Audit-Plugin navigator_plugin_pipeline 
> INFO Stopping Navigator Plugin Pipeline '' for cluster-host-inspector 
> (log dir: None)
> [11/Mar/2016 10:52:33 +] 17000 Monitor-HostMonitor throttling_logger 
> ERROR Kill subprocess exception with args 
> ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', 
> '/usr/share/cmf/lib/agent-5.4.1.jar', 'com.cloudera.cmon.agent.DnsTest']
> Traceback (most recent call last):
>   File "/usr/lib64/cmf/agent/src/cmf/subprocess_timeout.py", line 84, in 
> subprocess_with_timeout
> os.kill(p.pid, signal.SIGTERM)
> OSError: [Errno 3] No such process
> [11/Mar/2016 10:52:33 +] 17000 Monitor-HostMonitor throttling_logger 
> ERROR (1 skipped) Timeout with args 
> ['/usr/java/jdk1.7.0_67-cloudera/bin/java', '-classpath', 
> '/usr/share/cmf/lib/agent-5.4.1.jar', 'com.cloudera.cmon.agent.DnsTest']
> Traceback (most recent call last):
>   File "/usr/lib64/cmf/agent/src/cmf/subprocess_timeout.py", line 84, in 
> subprocess_with_timeout
> os.kill(p.pid, signal.SIGTERM)
> OSError: [Errno 3] No such process
> [11/Mar/2016 10:52:33 +] 17000 Monitor-HostMonitor throttling_logger 
> ERROR (1 skipped) Failed to collect java-based DNS names
> Traceback (most recent call last):
>   File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 64, in 
> collect
> result, stdout, stderr = self._subprocess_with_timeout(args, 
> self._poll_timeout)
>   File "/usr/lib64/cmf/agent/src/cmf/monitor/host/dns_names.py", line 46, in 
> _subprocess_with_timeout
> return subprocess_with_timeout(args, timeout)
>   File "/usr/lib64/cmf/agent/src/cmf/subprocess_timeout.py", line 94, in 
> subprocess_with_timeout
> raise Exception("timeout with args %s" % args)
> Exception: timeout with args ['/usr/java/jdk1.7.0_67-cloudera/bin/java', 
> '-classpath', '/usr/share/cmf/lib/agent-5.4.1.jar', 
> 'com.cloudera.cmon.agent.DnsTest']
> [11/Mar/2016 10:52:52 +] 17000 ImpalaDaemonQueryMonitoring 
> throttling_logger ERROR (54 skipped) Error fetching executing query ids at 
> 'http://dcslpd43.amat.com:25000/inflight_query_ids'
> Traceback (most recent call last):
>   File "/usr/lib64/cmf/agent/src/cmf/monitor/impalad/query_monitor.py", line 
> 479, in get_executing_query_ids
> password=password)
>   File "/usr/lib64/cmf/agent/src/cmf/url_util.py", line 62, in 
> urlopen_with_timeout
> return opener.open(url, data, timeout)
>   File "/usr/lib64/python2.6/urllib2.py", line 391, in open
> response = self._open(req, data)
>   File "/usr/lib64/python2.6/urllib2.py", line 409, in _open
> '_open', req)
> File "/usr/lib64/python2.6/urllib2.py", line 369, in _call_chain
> result = func(*args)
>   File "/usr/lib64/python2.6/urllib2.py", line 1190, in http_open
> return self.do_open(httplib.HTTPConnection, req)
>   File "/usr/lib64/python2.6/urllib2.py", line 1165, in do_open
> raise URLError(err)
> URLError: 
> Could someone please assist with this?




[jira] [Resolved] (IMPALA-1159) fnv_hash UDF initialized with 32 bits offset basis

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1159.
---
Resolution: Won't Fix

We've moved away from using FNV anyway, so it doesn't seem worth enhancing it.

> fnv_hash UDF initialized with 32 bits offset basis
> --
>
> Key: IMPALA-1159
> URL: https://issues.apache.org/jira/browse/IMPALA-1159
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.4
> Environment: Linux 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 
> UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Thierry Herrmann
>Priority: Minor
>  Labels: correctness, downgraded, incompatibility
>
> According to 
> http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_math_functions.html
> the fnv_hash UDF implements the 64-bit FNV-1a variation.
> According to 
> http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function
> the algorithm should be seeded with the 64-bit FNV offset basis value: 
> 14695981039346656037 (in hex, 0xcbf29ce484222325)
> Implementing this, I did not obtain the same FNV-1a hashes as Impala.
> E.g. with impala-shell I obtain
> {code}
> +---------------------+
> | fnv_hash('hello')   |
> +---------------------+
> | 6414202926103426347 |
> +---------------------+
> {code}
> whereas it should be -6615550055289275125
> By looking at the Impala unit tests:
> https://github.com/cloudera/Impala/blob/8567b51f8c38bd389a338c761242a316d8ffe5c8/be/src/exprs/expr-test.cc
> Excerpt:
> {code}
> // Test fnv_hash
> string s("hello world");
> uint64_t expected = HashUtil::FnvHash64(s.data(), s.size(), 
> HashUtil::FNV_SEED);
> TestValue("fnv_hash('hello world')", TYPE_BIGINT, expected);
> {code} 
> I see that the algorithm is seeded with the 32-bit offset basis
> instead of FNV64_SEED.
> If I update my algorithm and seed it with the 32-bit offset basis, I obtain 
> the same hashes as Impala.
> Because of backward compatibility, this may not be easy to fix. Alternatively, it could be 
> deprecated and replaced with a fixed UDF.
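
The seeding discrepancy in this report can be reproduced with a minimal Python sketch of 64-bit FNV-1a. The function and constant names below are illustrative and are not Impala's. The 64-bit offset basis is the value quoted in the report; the 32-bit value 2166136261 (0x811c9dc5) is the standard FNV 32-bit offset basis and is only assumed to be what FNV_SEED holds.

```python
# Minimal 64-bit FNV-1a sketch. Names are illustrative, not Impala's API.
FNV64_PRIME = 1099511628211                 # standard 64-bit FNV prime
FNV64_OFFSET_BASIS = 14695981039346656037   # 0xcbf29ce484222325, quoted in the report
FNV32_OFFSET_BASIS = 2166136261             # 0x811c9dc5, assumed value of FNV_SEED
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes, seed: int = FNV64_OFFSET_BASIS) -> int:
    """Return the unsigned 64-bit FNV-1a hash of data, starting from seed."""
    h = seed
    for byte in data:
        h ^= byte                        # FNV-1a: XOR the byte first...
        h = (h * FNV64_PRIME) & MASK64   # ...then multiply, keeping 64 bits
    return h


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed BIGINT."""
    return value - (1 << 64) if value >= (1 << 63) else value


if __name__ == "__main__":
    # Seeded with the 64-bit offset basis (what the report expects).
    print(to_signed64(fnv1a_64(b"hello")))
    # Seeded with the 32-bit offset basis (what the report says Impala does).
    print(to_signed64(fnv1a_64(b"hello", FNV32_OFFSET_BASIS)))
```

With the 64-bit offset basis, the first call yields -6615550055289275125, the value the report says fnv_hash('hello') should return. According to the report, seeding with the 32-bit basis instead reproduces Impala's output.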




[jira] [Updated] (IMPALA-1193) Restructure error handling within the impala-shell

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1193:
--
Issue Type: Improvement  (was: Bug)

> Restructure error handling within the impala-shell
> --
>
> Key: IMPALA-1193
> URL: https://issues.apache.org/jira/browse/IMPALA-1193
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 1.4
>Reporter: Abdullah Yousufi
>Priority: Minor
>  Labels: shell
>
> Take a look at comments (in patch set 1 and 6) 
> http://gerrit.sjc.cloudera.com:8080/#/c/4100/. 
> Essentially, there are two main points. First, move the main control loop of the shell 
> (the while shell_is_alive loop) into the shell class. Second, move the exception 
> handling that occurs in the _execute_stmt() method to the top 
> level, so that errors are caught by the main control loop in interactive mode and 
> by the loop within execute_queries_non_interactive_mode() in non-interactive 
> mode.
> To prevent redundant error handling, the two respective loops should be 
> wrapped/decorated, or should call the same method, which then handles all the errors.
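
One way to picture the shared error handling proposed here is a decorator applied to the method that both loops call. This is only a sketch under assumptions: ShellError, handle_shell_errors, Shell, run_one and run_non_interactive are hypothetical names, not the actual impala-shell code.

```python
import functools


class ShellError(Exception):
    """Hypothetical error type standing in for the shell's real exceptions."""


def handle_shell_errors(func):
    """Decorator that gives every wrapped call the same error handling."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except ShellError as exc:
            # Single place where errors are reported for both modes.
            print("ERROR: %s" % exc)
            return False
    return wrapper


class Shell:
    """Hypothetical shell that owns its control loops."""

    def __init__(self, statements):
        self.statements = list(statements)
        self.executed = []

    def _execute_stmt(self, stmt):
        # No try/except here: errors propagate up to the decorated caller.
        if not stmt.strip():
            raise ShellError("empty statement")
        self.executed.append(stmt)
        return True

    @handle_shell_errors
    def run_one(self, stmt):
        return self._execute_stmt(stmt)

    def run_non_interactive(self):
        """Run every statement; return True only if all of them succeeded."""
        results = [self.run_one(stmt) for stmt in self.statements]
        return all(results)
```

An interactive loop could call run_one() per input line in the same way, so neither loop needs its own try/except.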




[jira] [Updated] (IMPALA-6209) Impala - Allow override hadoop environmental variables for non-cdh installation

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6209:
--
Issue Type: Improvement  (was: Bug)

> Impala - Allow override hadoop environmental variables for non-cdh 
> installation
> ---
>
> Key: IMPALA-6209
> URL: https://issues.apache.org/jira/browse/IMPALA-6209
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 2.11.0
> Environment: Centos 7, Hadoop trunk, Hive trunk
>Reporter: ROHIT KRISHNAN
>Priority: Minor
>
> When building Apache Impala from git, the bin/impala-config.sh script 
> doesn't allow overriding certain variables (for instance HADOOP_HOME) from 
> impala-config-local.sh. When I try to install Impala with a non-Cloudera 
> installation (from the hadoop and hive git repos), I have to manually change 
> impala-config.sh.




[jira] [Resolved] (IMPALA-3457) Cloudera Manager when adding HA Name Node does not properly update LOCATION column in table HIVE.SDS, nor DB_LOCATION_URI column in table HIVE.DBS causing all impala qu

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3457.
---
Resolution: Invalid

Cloudera Manager is not an Apache project, so we don't track issues here.

> Cloudera Manager when adding HA Name Node does not properly update LOCATION 
> column in table HIVE.SDS, nor DB_LOCATION_URI column in table HIVE.DBS 
> causing all impala queries to fail
> -
>
> Key: IMPALA-3457
> URL: https://issues.apache.org/jira/browse/IMPALA-3457
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.5.0
> Environment: Cloudera Manager 5.7 with CDH 5.7.0 (Parcel)
>Reporter: Scott C
>Priority: Minor
>
> Started with functioning system using one Name Node plus a Secondary Name 
> Node.
> Used the Cloudera Manager to add the Name Node to another host for High 
> Availability.
> Afterwards 'hdfs' commands work fine, but any impala queries fail trying to 
> access internal parquet tables:
> {code}CAUSED BY: IOException: Port 9000 specified in URI 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=2/day=20
>  but host 'nameservice1' is a logical (HA) namenode and does not use port 
> information.
> CAUSED BY: TableLoadingException: Failed to load metadata for table: 
> arecordparquetpartition
> {code}
> {panel}We use port 9000 for the name node due to legacy starting from CDH 
> 4.8.6 installed from RPM and no Cloudera Manager.
> {panel}
> Manually fixed records in MySQL to correct the problem:
> {code}update DBS set DB_LOCATION_URI = 
> 'hdfs://nameservice1/user/hive/warehouse' where DB_ID=1; 
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition' where SD_ID 
> = 25447;
> select SD_ID,LOCATION from SDS;
> 25469 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=2/day=20
> 25470 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=3/day=12
> 25471 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=3/day=8
> 25472 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=3/day=9
> 25473 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=4/day=10
> 25474 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=4/day=11
> 25475 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=5/day=10
> 25476 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=5/day=8
> 25477 
> hdfs://nameservice1:9000/user/hive/warehouse/arecordparquetpartition/year=2015/month=5/day=9
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=2/day=20'
>  where SD_ID = 25469;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=3/day=12'
>  where SD_ID = 25470;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=3/day=8'
>  where SD_ID = 25471;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=3/day=9'
>  where SD_ID = 25472;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=4/day=10'
>  where SD_ID = 25473;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=4/day=11'
>  where SD_ID = 25474;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=5/day=10'
>  where SD_ID = 25475;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=5/day=8'
>  where SD_ID = 25476;
> update SDS set LOCATION = 
> 'hdfs://nameservice1/user/hive/warehouse/arecordparquetpartition/year=2015/month=5/day=9'
>  where SD_ID = 25477;
> {code}
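
The manual UPDATE statements above all apply the same rewrite: drop the explicit port from the location URI. A small Python sketch of that rewrite, for illustration only, is shown below. The helper name strip_port is hypothetical, and the sketch does not touch the metastore database itself.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_port(location: str) -> str:
    """Return location with any explicit port removed from its authority.

    Note: the rebuilt authority uses the parsed hostname only, so any
    user-info part is dropped and the host is lower-cased.
    """
    parts = urlsplit(location)
    if parts.port is None:
        return location
    return urlunsplit(parts._replace(netloc=parts.hostname))


if __name__ == "__main__":
    old = ("hdfs://nameservice1:9000/user/hive/warehouse/"
           "arecordparquetpartition/year=2015/month=2/day=20")
    print(strip_port(old))
```

Applied to each LOCATION value listed above, this produces the port-free form used in the corresponding UPDATE statement.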




[jira] [Commented] (IMPALA-7733) TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename

2018-10-30 Thread Steve Loughran (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668927#comment-16668927
 ] 

Steve Loughran commented on IMPALA-7733:


It could be that your assertions are just brittle to change, in which case spinning 
briefly until the listings are consistent is a tactic... but it is a symptom of 
a problem.

> TestInsertParquetQueries.test_insert_parquet is flaky in S3 due to rename
> -
>
> Key: IMPALA-7733
> URL: https://issues.apache.org/jira/browse/IMPALA-7733
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Assignee: Tianyi Wang
>Priority: Blocker
>
> I see two examples in the past two months or so where this test fails due to 
> a rename error on S3. The test's stacktrace looks like this:
> {noformat}
> query_test/test_insert_parquet.py:112: in test_insert_parquet
> self.run_test_case('insert_parquet', vector, unique_database, 
> multiple_impalad=True)
> common/impala_test_suite.py:408: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:625: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:176: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:350: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:371: in wait_for_finished
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E   Query aborted:Error(s) moving partition files. First error (of 1) was: 
> Hdfs op (RENAME 
> s3a:///test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
>  TO 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq)
>  failed, error was: 
> s3a:///test-warehouse/test_insert_parquet_968f37fe.db/orders_insert_table/_impala_insert_staging/4e45cd68bcddd451_3c7156ed/.4e45cd68bcddd451-3c7156ed0002_803672621_dir/4e45cd68bcddd451-3c7156ed0002_448261088_data.0.parq
> E   Error(5): Input/output error
> {noformat}
> Since we know this happens once in a while, some ideas to deflake it:
>  * retry
>  * check for this specific issue... if we think it's platform flakiness, then 
> we should skip it.
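
The "retry" idea from the list above could look roughly like the following Python sketch. Everything here is an assumption for illustration: retry and looks_like_s3_rename_flake are hypothetical helpers, not part of the Impala test framework, and matching on the error text is only one possible way to recognize the flaky failure.

```python
import time


def retry(operation, is_transient, attempts=3, delay_seconds=0.0):
    """Call operation(), retrying while is_transient(exc) is true.

    The last failure, or any failure that is not transient, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == attempts or not is_transient(exc):
                raise
            time.sleep(delay_seconds)


def looks_like_s3_rename_flake(exc):
    # Assumption: the flaky failure can be recognized by these substrings,
    # taken from the error message quoted in this issue.
    message = str(exc)
    return "RENAME" in message and "Input/output error" in message
```

As the comment on this issue points out, retrying hides the symptom rather than the cause, so a retry like this would belong in the test harness only.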




[jira] [Updated] (IMPALA-832) Negative look ahead using Regex does not work

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-832:
-
Issue Type: New Feature  (was: Bug)

> Negative look ahead using Regex does not work
> -
>
> Key: IMPALA-832
> URL: https://issues.apache.org/jira/browse/IMPALA-832
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 1.2.3
>Reporter: Udai Kiran Potluri
>Priority: Minor
>
> Provide support for negative lookahead in regex queries, such as 
> regexp '(?!dontmatchthis).*'. See the examples in 
> http://www.boost.org/doc/libs/1_31_0/libs/regex/doc/syntax.html 




[jira] [Updated] (IMPALA-905) Admission control should enforce hierarchical pool limits

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-905:
-
Issue Type: New Feature  (was: Bug)

> Admission control should enforce hierarchical pool limits
> -
>
> Key: IMPALA-905
> URL: https://issues.apache.org/jira/browse/IMPALA-905
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 1.3
>Reporter: Matthew Jacobs
>Priority: Minor
>  Labels: admission-control, resource-management
>
> When configuring a hierarchy of pools via a fair-scheduler.xml configuration, 
> parent pools can have limits that should be enforced in addition to those of the 
> leaf pools.
> E.g. admission to dev/qa requires meeting the limits of both dev/qa and 
> engineering:
> {code}
> ...
> 
>   ...
>   
> ... 
> 
>   ... 
> 
> ...
>   
>   ...
>   ...
> 
> {code}
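
The hierarchical check requested here can be sketched in Python as "admit only if the leaf pool and every ancestor have room". The Pool class, its max_running field and try_admit are hypothetical names for illustration. They are not Impala's admission controller, and the limits in the usage below are made-up numbers.

```python
class Pool:
    """Hypothetical pool node; max_running of None means no limit."""

    def __init__(self, name, max_running=None, parent=None):
        self.name = name
        self.max_running = max_running
        self.parent = parent
        self.running = 0

    def self_and_ancestors(self):
        """Yield this pool, then its parent, and so on up to the root."""
        node = self
        while node is not None:
            yield node
            node = node.parent


def try_admit(leaf):
    """Admit one query to leaf only if the leaf and every ancestor have room."""
    chain = list(leaf.self_and_ancestors())
    for pool in chain:
        if pool.max_running is not None and pool.running >= pool.max_running:
            return False
    # All limits are satisfied, so count the query against every level.
    for pool in chain:
        pool.running += 1
    return True


if __name__ == "__main__":
    engineering = Pool("engineering", max_running=2)
    dev = Pool("dev", max_running=5, parent=engineering)
    qa = Pool("qa", max_running=5, parent=engineering)
    print(try_admit(dev), try_admit(qa), try_admit(dev))
```

In this usage the third admission is rejected even though dev is below its own limit, because the parent pool engineering is already full.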




[jira] [Updated] (IMPALA-2581) Push down "LIMIT" when possible

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2581:
--
Issue Type: Improvement  (was: Bug)

> Push down "LIMIT"  when possible
> 
>
> Key: IMPALA-2581
> URL: https://issues.apache.org/jira/browse/IMPALA-2581
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Jim Apple
>Priority: Minor
>  Labels: performance
>
> In a table t with a column x with no null values, "SELECT DISTINCT x FROM t 
> LIMIT 1" should be roughly instant. Instead, it finds *all* the distinct 
> values, then returns one of them.
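
The intended behavior can be illustrated with a streaming Python sketch: once the limit is reached, the scan stops instead of collecting every distinct value. This is a single-process illustration with hypothetical helper names, not how Impala's planner or executor is implemented.

```python
from itertools import islice


def distinct(rows):
    """Yield each value the first time it is seen, streaming over rows."""
    seen = set()
    for value in rows:
        if value not in seen:
            seen.add(value)
            yield value


def distinct_limit(rows, limit):
    """Return up to limit distinct values, stopping the scan early."""
    return list(islice(distinct(rows), limit))
```

Because both helpers are lazy, distinct_limit(rows, 1) reads only as many input rows as it needs to produce the first distinct value.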




[jira] [Updated] (IMPALA-2581) Push down LIMIT past DISTINCT

2018-10-30 Thread Jim Apple (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Apple updated IMPALA-2581:
--
Summary: Push down LIMIT  past DISTINCT  (was: Push down "LIMIT"  when 
possible)

> Push down LIMIT  past DISTINCT
> --
>
> Key: IMPALA-2581
> URL: https://issues.apache.org/jira/browse/IMPALA-2581
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Jim Apple
>Priority: Minor
>  Labels: performance
>
> In a table t with a column x with no null values, "SELECT DISTINCT x FROM t 
> LIMIT 1" should be roughly instant. Instead, it finds *all* the distinct 
> values, then returns one of them.




[jira] [Updated] (IMPALA-2553) Impala should work correctly with Avro-based table containing Enum fields

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2553:
--
Issue Type: New Feature  (was: Bug)

> Impala should work correctly with Avro-based table containing Enum fields
> -
>
> Key: IMPALA-2553
> URL: https://issues.apache.org/jira/browse/IMPALA-2553
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 2.2.4, Impala 2.3.0
> Environment: Impala: Impala Shell v2.2.0-cdh5 (2ffd73a) built on Tue 
> Apr 21 12:09:21 PDT 2015)
> CDH: 5.4.0 
> OS: Linux quickstart.cloudera 2.6.32-358.el6.x86_64
> Avro 1.7.6
>Reporter: Santosh Kumar
>Priority: Minor
> Fix For: Impala 1.3
>
>
> Running a query on an Avro-based table with enum types in Hive works 
> correctly but running the same query on the same table in Impala fails with 
> an error message.
> Impala should handle enum types by returning a string value (just like Hive).




[jira] [Resolved] (IMPALA-2626) In-flight queries fail when statestore comes back online.

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2626.
---
Resolution: Duplicate

> In-flight queries fail when statestore comes back online.
> -
>
> Key: IMPALA-2626
> URL: https://issues.apache.org/jira/browse/IMPALA-2626
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Affects Versions: Impala 2.3.0
>Reporter: Sailesh Mukil
>Assignee: Sailesh Mukil
>Priority: Minor
>  Labels: chaosmonkey, statestore, usability
>
> During a session, if the statestore goes down, the impalads continue 
> execution if they have enough metadata that they've already received from the 
> statestore prior to its failure.
> The impalads can continue execution without the statestore with the stale 
> metadata that they possess. However, when the statestore comes back online, 
> the first membership callback it makes to the impalad hosts erases the 
> "known_backends" list that the impalads have stored locally.
> Therefore, in-flight queries fail (sometimes without propagating the error to 
> the shell -> IMPALA-1325).
> Solution:
> Do not erase the list of "known_backends" in each impalad until the 
> statestore has a new list to provide to the impalads.
> _This bug was found during initial runs of ChaosMonkey on Impala._
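
The proposed solution can be pictured with a small Python sketch. It is heavily simplified and rests on an assumption: that the problematic first callback after a restart carries an empty membership list. BackendTracker and on_membership_update are hypothetical names, not the real impalad code.

```python
class BackendTracker:
    """Hypothetical holder for an impalad's local view of cluster membership."""

    def __init__(self):
        self.known_backends = set()

    def on_membership_update(self, backends):
        # Assumption: an empty update right after a statestore restart means
        # "not repopulated yet", so keep the stale list instead of erasing it.
        # A real fix would need to tell this apart from a cluster that truly
        # has no live backends.
        if not backends:
            return
        self.known_backends = set(backends)
```

With this guard, the stale list stays available to in-flight queries until the statestore supplies a replacement.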




[jira] [Assigned] (IMPALA-558) HS2::FetchResults sets hasMoreRows in many cases where no more rows are to be returned

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-558:


Assignee: (was: Alexander Behm)

> HS2::FetchResults sets hasMoreRows in many cases where no more rows are to be 
> returned
> --
>
> Key: IMPALA-558
> URL: https://issues.apache.org/jira/browse/IMPALA-558
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 1.1
>Reporter: Henry Robinson
>Priority: Minor
>  Labels: query-lifecycle
>
> The first call to {{FetchResults}} always sets {{hasMoreRows}} even when 0 
> rows should be returned. The next call correctly sets {{hasMoreRows == 
> False}}. The upshot is there's always an extra round-trip, although 
> correctness isn't affected.
> {code}
> execute_statement_req = TCLIService.TExecuteStatementReq()
> execute_statement_req.sessionHandle = resp.sessionHandle
> execute_statement_req.statement = "SELECT COUNT(*) FROM 
> functional.alltypes WHERE 1 = 2"
> execute_statement_resp = 
> self.hs2_client.ExecuteStatement(execute_statement_req)
> 
> fetch_results_req = TCLIService.TFetchResultsReq()
> fetch_results_req.operationHandle = execute_statement_resp.operationHandle
> fetch_results_req.maxRows = 100
> fetch_results_resp = self.hs2_client.FetchResults(fetch_results_req)
> 
> assert not fetch_results_resp.hasMoreRows # Fails
> {code}




[jira] [Assigned] (IMPALA-532) Impala should tolerate bad locale settings.

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-532:


Assignee: (was: Henry Robinson)

> Impala should tolerate bad locale settings.
> ---
>
> Key: IMPALA-532
> URL: https://issues.apache.org/jira/browse/IMPALA-532
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.1
>Reporter: Ishaan Joshi
>Priority: Minor
>
> Currently, the Statestore does not tolerate a bad locale setting and crashes 
> while starting up.
> {code}
>  USE_DEBUG_BUILD=false
> + perl -pi -e 
> 's#{{CMF_CONF_DIR}}#/var/run/cloudera-scm-agent/process/2469-impala-STATESTORE#g'
>  
> /var/run/cloudera-scm-agent/process/2469-impala-STATESTORE/impala-conf/state_store_flags
> perl: warning: Setting locale failed.
> perl: warning: Please check that your locale settings:
> LANGUAGE = (unset),
> LC_ALL = (unset),
> LANG = "fr_FR.UTF-8"
> are supported and installed on your system.
> perl: warning: Falling back to the standard locale ("C").
> + '[' -f 
> /var/run/cloudera-scm-agent/process/2469-impala-STATESTORE/impala-conf/.htpasswd
>  ']'
> + chmod 600 
> /var/run/cloudera-scm-agent/process/2469-impala-STATESTORE/impala-conf/.htpasswd
> + false
> + export 
> IMPALA_BIN=/opt/cloudera/parcels/IMPALA-1.1-1.p0.8/lib/impala/sbin-retail
> + IMPALA_BIN=/opt/cloudera/parcels/IMPALA-1.1-1.p0.8/lib/impala/sbin-retail
> + '[' impalad = statestore ']'
> + '[' statestore = statestore ']'
> + exec 
> /opt/cloudera/parcels/IMPALA-1.1-1.p0.8/lib/impala/../../bin/statestored 
> --flagfile=/var/run/cloudera-scm-agent/process/2469-impala-STATESTORE/impala-conf/state_store_flags
> terminate called after throwing an instance of 'std::runtime_error'
>   what():  locale::facet::_S_create_c_locale name not valid
> {code}
> It should fall back to the standard locale ("C"), if the user's locale is 
> messed up.
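
The requested fallback can be sketched as follows. This is a Python analogue for illustration only: the statestore is a C++ program and the failure above comes from the C++ runtime, so an actual fix would live there. The helper name set_locale_with_fallback is hypothetical.

```python
import locale


def set_locale_with_fallback(name=""):
    """Try to apply the locale name; fall back to "C" if it is unsupported.

    An empty name asks for the locale configured in the environment
    (LANG, LC_ALL, and so on).
    """
    try:
        return locale.setlocale(locale.LC_ALL, name)
    except locale.Error:
        # Same fallback that perl reports in the log above.
        return locale.setlocale(locale.LC_ALL, "C")
```

The design choice is to treat an unusable locale as a recoverable condition at startup rather than letting the exception terminate the process.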




[jira] [Updated] (IMPALA-1333) Missing implicit casts between char/varchar/string.

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-1333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1333:
--
Issue Type: Improvement  (was: Bug)

> Missing implicit casts between char/varchar/string.
> ---
>
> Key: IMPALA-1333
> URL: https://issues.apache.org/jira/browse/IMPALA-1333
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.0, Impala 2.2, Impala 2.3.0
>Reporter: Lenni Kuff
>Priority: Minor
>  Labels: sql-language, usability
>
> The following scenario needs to work:
> {code}
> create table vc (v varchar(10));
> insert into vc select 'a';
> ERROR: AnalysisException: Possible loss of precision for target table 
> 'default.vc'.
> Expression ''a'' (type: STRING) would need to be cast to VARCHAR(10) for 
> column 'v'
> {code}
> *Workaround*
> Use explicit casts.
> {code}
> Query: insert into vc select cast('a' as varchar(10))
> Inserted 1 row(s) in 0.32s
> {code}




[jira] [Commented] (IMPALA-7190) Remove unsupported format write support

2018-10-30 Thread Bikramjeet Vig (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669079#comment-16669079
 ] 

Bikramjeet Vig commented on IMPALA-7190:


I don't think this is a compatibility-breaking release. We should probably just 
retain it and mark it as such.

> Remove unsupported format write support
> ---
>
> Key: IMPALA-7190
> URL: https://issues.apache.org/jira/browse/IMPALA-7190
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Bikramjeet Vig
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> Let's remove the formats gated by ALLOW_UNSUPPORTED_FORMATS since progress 
> stalled a long time ago. It sounds like there's a consensus on the mailing 
> list to remove the code:
> [https://lists.apache.org/thread.html/749bef4914350ae0756bc88961db2dd39901a649a9cef6949eda5870@%3Cdev.impala.apache.org%3E]




[jira] [Updated] (IMPALA-2398) [Nit]: Change error text on Explain Query for Compute Stats Command

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2398:
--
Issue Type: Improvement  (was: Bug)

> [Nit]: Change error text on Explain Query for Compute Stats Command
> ---
>
> Key: IMPALA-2398
> URL: https://issues.apache.org/jira/browse/IMPALA-2398
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.3.0, Impala 2.5.0
>Reporter: Aleksandr Shulman
>Priority: Minor
>
> Minor fit & finish nit, but we should take care of it.
> Expected behavior:
> Attempting to perform an 'Explain Query' operation on a command to compute 
> table stats should result in a message explaining that EXPLAIN on a COMPUTE 
> STATS statement is not a supported operation.
> Observed behavior:
> Command: {{EXPLAIN COMPUTE STATS orders;}}
> Output: {{AnalysisException: Syntax error in line 1: EXPLAIN COMPUTE STATS 
> orders ^ Encountered: COMPUTE Expected: CREATE, INSERT, SELECT, VALUES, WITH 
> CAUSED BY: Exception: Syntax error}}
> Why this should be fixed:
> The error should instruct the user that the operation they are trying to 
> perform is not supported, instead of claiming that it is a syntax error, as 
> the syntax is correct.




[jira] [Resolved] (IMPALA-2571) Flaky test: metadata.test_hdfs_permissions.TestHdfsPermissions.test_insert_into_read_only_table

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2571.
---
Resolution: Cannot Reproduce

> Flaky test: 
> metadata.test_hdfs_permissions.TestHdfsPermissions.test_insert_into_read_only_table
> ---
>
> Key: IMPALA-2571
> URL: https://issues.apache.org/jira/browse/IMPALA-2571
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.5.0
>Reporter: Jim Apple
>Priority: Minor
>  Labels: test, test-infra
>
> http://sandbox.jenkins.cloudera.com/job/impala-master-repeated-runs-cdh5/335/ 
> had no git changes, but 
> metadata.test_hdfs_permissions.TestHdfsPermissions.test_insert_into_read_only_table
>  failed:
> metadata/test_hdfs_permissions.py:59: in test_insert_into_read_only_table
>     assert 'does not have WRITE access to at least one HDFS path: hdfs:' in str(e)
> E   assert 'does not have WRITE access to at least one HDFS path: hdfs:' in "ImpalaBeeswaxException:\n INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>\n MESSAGE: AnalysisException: Table does not exist: default.read_only_tbl\n"
> E    +  where "ImpalaBeeswaxException:\n INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>\n MESSAGE: AnalysisException: Table does not exist: default.read_only_tbl\n" = str(ImpalaBeeswaxException())




[jira] [Updated] (IMPALA-2343) Capture operator timing information covering open/close & first/last batch close

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2343:
--
Issue Type: Improvement  (was: Bug)

> Capture operator timing information covering open/close & first/last batch 
> close
> 
>
> Key: IMPALA-2343
> URL: https://issues.apache.org/jira/browse/IMPALA-2343
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.2.4
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: performance, supportability
>
> Currently, the Impala query profile doesn't include an operator-level timeline, 
> which makes it difficult to understand the query timeline and fragment dependencies.
> Such information would allow us to provide query swim lanes.




[jira] [Updated] (IMPALA-2746) Backend tests should pass with leak sanitizer enabled

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2746:
--
Issue Type: Test  (was: Bug)

> Backend tests should pass with leak sanitizer enabled
> -
>
> Key: IMPALA-2746
> URL: https://issues.apache.org/jira/browse/IMPALA-2746
> Project: IMPALA
>  Issue Type: Test
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Martin Grund
>Priority: Minor
>  Labels: resource-management, test-infra
>
> Currently, when running the backend tests with ASAN, the build will fail if 
> memory leak detection is enabled. We should investigate where leaks occur and 
> fix them to make sure we can benefit from the leak detection as well.




[jira] [Updated] (IMPALA-2729) Support default values for DECIMAL in Avro.

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2729:
--
Issue Type: New Feature  (was: Bug)

> Support default values for DECIMAL in Avro.
> ---
>
> Key: IMPALA-2729
> URL: https://issues.apache.org/jira/browse/IMPALA-2729
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 2.0, Impala 2.1, Impala 2.2, Impala 2.3.0
>Reporter: Alexander Behm
>Priority: Minor
>  Labels: ramp-up, usability
>
> Impala may fail to scan Avro files of a table that has default values 
> specified on DECIMAL columns.
> The following criteria must be met for hitting this bug:
> 1. The Impala table was created from an Avro schema that has a default value 
> on a DECIMAL field
> 2. There is an Avro file storing data of that table that does not have the 
> DECIMAL field declared as part of the table schema from 1
> When querying such a table Impala will return the following error
> {code}
> Field 'decimalFieldName' is missing from file and default values of type 
> DECIMAL are not yet supported.
> {code}
> The relevant code can be found in hdfs-avro-scanner.cc:
> {code}
> Status HdfsAvroScanner::WriteDefaultValue(
>     SlotDescriptor* slot_desc, avro_datum_t default_value, const char* field_name) {
>   if (avro_header_->template_tuple == NULL) {
>     avro_header_->template_tuple = template_tuple_ != NULL ?
>         template_tuple_ : scan_node_->InitEmptyTemplateTuple(*scan_node_->tuple_desc());
>   }
>   switch (default_value->type) {
>     case AVRO_BOOLEAN: {
>       // We don't call VerifyTypesMatch() above the switch statement so we don't want to
>       // call it in the default case (since VerifyTypesMatch() can't handle every type
>       // either, and we want to return the correct error message).
>       RETURN_IF_ERROR(VerifyTypesMatch(slot_desc, default_value));
>       int8_t v;
>       if (avro_boolean_get(default_value, &v)) DCHECK(false);
>       RawValue::Write(&v, avro_header_->template_tuple, slot_desc, NULL);
>       break;
>     }
>     case AVRO_INT32: {
>       RETURN_IF_ERROR(VerifyTypesMatch(slot_desc, default_value));
>       int32_t v;
>       if (avro_int32_get(default_value, &v)) DCHECK(false);
>       RawValue::Write(&v, avro_header_->template_tuple, slot_desc, NULL);
>       break;
>     }
>     ...
>     // <--- case for AVRO_DECIMAL not handled
>     default:
>       return Status(TErrorCode::AVRO_UNSUPPORTED_DEFAULT_VALUE, field_name,
>           avro_type_name(default_value->type));
> {code}
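
For context on what a DECIMAL case would have to do: the Avro specification encodes a decimal as the big-endian two's-complement bytes of its unscaled integer value. The sketch below decodes such bytes into a 64-bit integer. It is a standalone illustration, not Impala or Avro C library code; `DecodeUnscaledDecimal` is a hypothetical name, and the sketch assumes the value fits in 8 bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: decodes big-endian two's-complement bytes (the Avro
// decimal encoding) into the unscaled integer value.
// Assumption: 1 to 8 bytes, i.e. the value fits in an int64_t.
int64_t DecodeUnscaledDecimal(const std::vector<uint8_t>& bytes) {
  assert(!bytes.empty() && bytes.size() <= 8);
  // Start from all ones for negative values so the sign is extended.
  uint64_t result = (bytes[0] & 0x80) ? ~uint64_t{0} : uint64_t{0};
  for (uint8_t b : bytes) result = (result << 8) | b;
  return static_cast<int64_t>(result);
}
```

For example, the bytes `0x30 0x39` decode to the unscaled value 12345, which a column of type DECIMAL(p, 2) would interpret as 123.45.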




[jira] [Updated] (IMPALA-2743) Propagate IS NOT NULL to scans from cols in inner joins

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2743:
--
Issue Type: Improvement  (was: Bug)

> Propagate IS NOT NULL to scans from cols in inner joins
> ---
>
> Key: IMPALA-2743
> URL: https://issues.apache.org/jira/browse/IMPALA-2743
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.2, Impala 2.3.0
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: performance, planner, ramp-up
>
> Columns involved in inner joins and filters should have an "IS NOT NULL" filter 
> pushed to the ScanNode. Other operators reading the value of the column 
> shouldn't need to re-check whether the value is null. 
> When using the Parquet file format, ideally we should use the stats to 
> short-circuit these filters. 
> Currently, whenever a slot is read by any of the operators, SlotRef checks if 
> the value is NULL, which adds unnecessary branches and instructions. 
> {code}
> BigIntVal SlotRef::GetBigIntVal(ExprContext* context, TupleRow* row) {
>   DCHECK_EQ(type_.type, TYPE_BIGINT);
>   Tuple* t = row->GetTuple(tuple_idx_);
>   if (t == NULL || t->IsNull(null_indicator_offset_)) return BigIntVal::null();
>   return BigIntVal(*reinterpret_cast<int64_t*>(t->GetSlot(slot_offset_)));
> }
> {code}
> {code}
> impalad!impala::Tuple::IsNull
> impalad!impala::SlotRef::GetBigIntVal - [Unknown]
> impalad!impala::SlotRef::GetBigIntVal - [Unknown]
> impalad!impala::ExprContext::GetValue+0x13b - [Unknown]:[Unknown]
> impalad!impala::HashTableCtx::EvalRow+0xb7 - [Unknown]:[Unknown]
> impalad!impala::PartitionedHashJoinNode::Partition::BuildHashTableInternal<(bool)0>+0x372
>  - [Unknown]:[Unknown]
> impalad!impala::PartitionedHashJoinNode::Partition::BuildHashTable+0xd - 
> [Unknown]:[Unknown]
> impalad!impala::PartitionedHashJoinNode::BuildHashTables+0xa0 - 
> [Unknown]:[Unknown]
> impalad!impala::PartitionedHashJoinNode::ProcessBuildInput+0xc71 - 
> [Unknown]:[Unknown]
> impalad!impala::PartitionedHashJoinNode::ConstructBuildSide+0x106 - 
> [Unknown]:[Unknown]
> impalad!impala::BlockingJoinNode::BuildSideThread+0x7f - [Unknown]:[Unknown]
> impalad!impala::Thread::SuperviseThread+0x1b9 - [Unknown]:[Unknown]
> impalad!boost::detail::thread_data (*)(std::string const&, std::string const&, boost::function, 
> impala::Promise*), boost::_bi::list4, 
> boost::_bi::value, boost::_bi::value (void)>>, boost::_bi::value*::run+0x7f - 
> [Unknown]:[Unknown]
> {code}
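
The idea can be sketched outside of Impala as follows. An inner equi-join can never match a row whose join key is NULL, so the NULL check can be done once at the scan and dropped from the per-row join path. All names here (`NullableBigInt`, `ScanNotNull`, `CountJoinMatches`) are hypothetical and only illustrate the proposal; this is not the planner or backend code.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a nullable BIGINT slot.
struct NullableBigInt {
  bool is_null;
  int64_t val;
};

// "Scan" with the IS NOT NULL predicate applied once, up front.
std::vector<int64_t> ScanNotNull(const std::vector<NullableBigInt>& column) {
  std::vector<int64_t> out;
  for (const NullableBigInt& k : column) {
    if (!k.is_null) out.push_back(k.val);
  }
  return out;
}

// Inner equi-join match count over pre-filtered keys: the build and probe
// loops contain no NULL branch at all.
size_t CountJoinMatches(const std::vector<int64_t>& build,
                        const std::vector<int64_t>& probe) {
  std::unordered_multiset<int64_t> table(build.begin(), build.end());
  size_t matches = 0;
  for (int64_t k : probe) matches += table.count(k);
  return matches;
}
```

Because NULL keys are removed before the join, the result is the same as a join that checks for NULL on every row, while the hot loops stay branch-free with respect to nullability.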




[jira] [Updated] (IMPALA-3181) build most of impala with -fno-exceptions

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3181:
--
Issue Type: Improvement  (was: Bug)

> build most of impala with -fno-exceptions
> -
>
> Key: IMPALA-3181
> URL: https://issues.apache.org/jira/browse/IMPALA-3181
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.5.0
>Reporter: Dan Hecht
>Assignee: Michael Ho
>Priority: Minor
>
> In the backend, we only use exceptions when interfacing with third party 
> libraries that throw exceptions (e.g. boost), which we wrap with try-catch. 
> Outside of that, we don't expect, nor handle, exceptions unwinding impala 
> code.
> So, we could isolate calls to those third-party libraries in dedicated .cc 
> files, and then compile the rest of Impala with -fno-exceptions.  The 
> advantages to doing so are:
> * Better code. All the stack unwinding code goes away and potentially more 
> optimizations are possible.
> * We'll get compiler errors if try-catch is written in places where it won't 
> work anyway (IR code since we disable generating exception handling code). 
> This probably could have caught IMPALA-2184 at compile time.
> A good starting point is probably to turn on -fno-exceptions for IR, since we 
> already don't generate exception handling code there.  This may require 
> refactoring our use of boost headers, however, since exceptions may still be 
> statically possible on these paths.
> An alternative would be to use 'noexcept', but this would require annotating 
> every function.
> Example:
> {code}
> #include <stdio.h>
> struct Foo {
>   Foo() { printf("Foo()"); }
>   ~Foo() { printf("~Foo()"); }
> };
> void NoExcept() noexcept;
> void Except();
> void CallNoExcept() {
>   Foo foo;
>   NoExcept();
> }
> void CallExcept() {
>   Foo foo;
>   Except();
> }
> int main() {
>   CallExcept();
> }
> {code}
> {code:title="-std=c++11 -O2 -S"}
> CallNoExcept():
> .LFB18:
>         .cfi_startproc
>         .cfi_personality 0x3,__gxx_personality_v0
>         .cfi_lsda 0x3,.LLSDA18
>         subq    $8, %rsp
>         .cfi_def_cfa_offset 16
>         movl    $.LC0, %edi
>         xorl    %eax, %eax
> .LEHB0:
>         call    printf
> .LEHE0:
>         call    NoExcept()
>         movl    $.LC1, %edi
>         xorl    %eax, %eax
>         addq    $8, %rsp
>         .cfi_def_cfa_offset 8
>         jmp     printf
>         .cfi_endproc
> ...
> CallExcept():
> .LFB19:
>         .cfi_startproc
>         .cfi_personality 0x3,__gxx_personality_v0
>         .cfi_lsda 0x3,.LLSDA19
>         pushq   %rbx
>         .cfi_def_cfa_offset 16
>         .cfi_offset 3, -16
>         movl    $.LC0, %edi
>         xorl    %eax, %eax
> .LEHB1:
>         call    printf
> .LEHE1:
> .LEHB2:
>         call    Except()
> .LEHE2:
>         popq    %rbx
>         .cfi_remember_state
>         .cfi_def_cfa_offset 8
>         movl    $.LC1, %edi
>         xorl    %eax, %eax
>         jmp     printf
> .L5:
>         .cfi_restore_state
>         movq    %rax, %rbx
>         movl    $.LC1, %edi
>         xorl    %eax, %eax
>         call    printf
>         movq    %rbx, %rdi
> .LEHB3:
>         call    _Unwind_Resume
> .LEHE3:
>         .cfi_endproc
> {code}
> {code:title="-std=c++11 -O2 -fno-exceptions -S"}
> CallNoExcept():
> .LFB18:
>         .cfi_startproc
>         subq    $8, %rsp
>         .cfi_def_cfa_offset 16
>         movl    $.LC0, %edi
>         xorl    %eax, %eax
>         call    printf
>         call    NoExcept()
>         movl    $.LC1, %edi
>         xorl    %eax, %eax
>         addq    $8, %rsp
>         .cfi_def_cfa_offset 8
>         jmp     printf
>         .cfi_endproc
> ...
> CallExcept():
> .LFB19:
>         .cfi_startproc
>         subq    $8, %rsp
>         .cfi_def_cfa_offset 16
>         movl    $.LC0, %edi
>         xorl    %eax, %eax
>         call    printf
>         call    Except()
>         movl    $.LC1, %edi
>         xorl    %eax, %eax
>         addq    $8, %rsp
>         .cfi_def_cfa_offset 8
>         jmp     printf
>         .cfi_endproc
> {code}
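
The isolation described in the issue could look roughly like this sketch: one small wrapper, living in a .cc file that is still compiled with exceptions enabled, converts any thrown exception into a status value, so that callers built with -fno-exceptions never see an exception. `std::stoi` merely stands in for a throwing third-party call, and the `Status` struct and `ParseInt` are simplified, hypothetical names rather than Impala's own types.

```cpp
#include <exception>
#include <string>

// Simplified, hypothetical status type (Impala's real Status is richer).
struct Status {
  bool ok;
  std::string msg;
};

// Boundary function: the only place where try-catch appears. Everything that
// calls it can be compiled with -fno-exceptions.
Status ParseInt(const std::string& text, int* out) {
  try {
    *out = std::stoi(text);  // stand-in for a throwing library call
    return {true, ""};
  } catch (const std::exception& e) {
    return {false, std::string("parse error: ") + e.what()};
  }
}
```

Callers then branch on `status.ok` instead of relying on stack unwinding, which matches the no-unwinding code shape shown in the second assembly listing.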




[jira] [Updated] (IMPALA-659) No Way to Escape Vertical Whitespace in Text Files

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-659:
-
Issue Type: New Feature  (was: Bug)

> No Way to Escape Vertical Whitespace in Text Files
> --
>
> Key: IMPALA-659
> URL: https://issues.apache.org/jira/browse/IMPALA-659
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Impala 1.1.1, Impala 2.5.0
>Reporter: David E. Wheeler
>Priority: Minor
>  Labels: impala, text
>
> There appears to be no way to escape vertical whitespace in Impala text 
> files. I tried creating a table using \ as an escape, but when I include 
> {{\n}} in it, the character is ignored.




[jira] [Updated] (IMPALA-3366) custom cluster tests should be more easily parallelizable

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3366:
--
Issue Type: Improvement  (was: Bug)

> custom cluster tests should be more easily parallelizable
> -
>
> Key: IMPALA-3366
> URL: https://issues.apache.org/jira/browse/IMPALA-3366
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.6.0
>Reporter: Jim Apple
>Priority: Minor
>
> The custom cluster tests are similar to the e2e tests, in that they are in 
> the same directory and can be run with pytest, but they are run separately 
> and must be run after the e2e tests because they start and stop the cluster.
> They should be made more parallelizable to reduce the wall clock time 
> required for testing.




[jira] [Updated] (IMPALA-3357) Sorter's quicksort implementation is very suboptimal for duplicate keys

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3357:
--
Issue Type: Improvement  (was: Bug)

> Sorter's quicksort implementation is very suboptimal for duplicate keys
> ---
>
> Key: IMPALA-3357
> URL: https://issues.apache.org/jira/browse/IMPALA-3357
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Tim Armstrong
>Priority: Minor
>  Labels: perf, ramp-up
>
> {code}
> void Sorter::TupleSorter::SortHelper(TupleIterator first, TupleIterator last) {
>   if (UNLIKELY(state_->is_cancelled())) return;
>   // Use insertion sort for smaller sequences.
>   while (last.index_ - first.index_ > INSERTION_THRESHOLD) {
>     TupleIterator iter(this, first.index_ + (last.index_ - first.index_) / 2);
>     DCHECK(iter.current_tuple_ != NULL);
>     // Partition() splits the tuples in [first, last) into two groups (<= pivot
>     // and >= pivot) in-place. 'cut' is the index of the first tuple in the second group.
>     TupleIterator cut = Partition(first, last,
>         reinterpret_cast<Tuple*>(iter.current_tuple_));
>     SortHelper(cut, last);
>     last = cut;
>     if (UNLIKELY(state_->is_cancelled())) return;
>   }
>   InsertionSort(first, last);
> }
> {code}
> The quicksort implementation in the sorter is based on dividing the input 
> into two partitions: <= pivot and >= pivot.
> If all of the input values in a partition are equal, then it will still 
> recursively divide and do insertion sort on the values. We could change the 
> sorter to partition the input into three partitions: <, == and >. Then it 
> doesn't need to recurse on the middle partition. This would mean it could 
> sort a partition full of duplicate values in a single pass over the input.
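
The three-partition scheme proposed above can be sketched on a plain `std::vector<int>` as follows. This is a generic "Dutch national flag" quicksort, not the Sorter's tuple-based code: `ThreeWayQuicksort` is a hypothetical name, and the insertion-sort threshold and cancellation checks from `SortHelper()` are left out for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Sorts v[lo, hi) by partitioning into three groups: < pivot, == pivot and
// > pivot. The middle group is already in place and is never revisited.
void ThreeWayQuicksort(std::vector<int>& v, size_t lo, size_t hi) {
  while (hi - lo > 1) {
    const int pivot = v[lo + (hi - lo) / 2];
    size_t lt = lo, i = lo, gt = hi;
    // Invariant: [lo, lt) < pivot, [lt, i) == pivot, [gt, hi) > pivot.
    while (i < gt) {
      if (v[i] < pivot) {
        std::swap(v[lt++], v[i++]);
      } else if (v[i] > pivot) {
        std::swap(v[i], v[--gt]);
      } else {
        ++i;
      }
    }
    ThreeWayQuicksort(v, gt, hi);  // recurse on the "> pivot" group
    hi = lt;                       // keep looping on the "< pivot" group
  }
}
```

With an input made up entirely of equal keys, the first pass leaves both outer groups empty, so the function returns after a single scan of the data, which is the behaviour the issue asks for.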




[jira] [Updated] (IMPALA-3366) custom cluster tests should be more easily parallelizable

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3366:
--
Issue Type: Test  (was: Improvement)

> custom cluster tests should be more easily parallelizable
> -
>
> Key: IMPALA-3366
> URL: https://issues.apache.org/jira/browse/IMPALA-3366
> Project: IMPALA
>  Issue Type: Test
>  Components: Infrastructure
>Affects Versions: Impala 2.6.0
>Reporter: Jim Apple
>Priority: Minor
>
> The custom cluster tests are similar to the e2e tests, in that they are in 
> the same directory and can be run with pytest, but they are run separately 
> and must be run after the e2e tests because they start and stop the cluster.
> They should be made more parallelizable to reduce the wall clock time 
> required for testing.




[jira] [Commented] (IMPALA-7782) discrepancy in results with a subquery containing an agg that produces an empty set

2018-10-30 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668899#comment-16668899
 ] 

Tim Armstrong commented on IMPALA-7782:
---

Looks like a bad rewrite in the planner. The plan doesn't make any sense to me; 
it seems to have applied an incorrect optimisation based on the subquery being 
empty:
{noformat}
**
[localhost:21000] default> use functional;
Query: use functional
[localhost:21000] functional> explain 
> SELECT id
> FROM alltypestiny
> WHERE -1 NOT IN (SELECT COUNT(id) FROM 
alltypestiny HAVING false);
Query: explain SELECT id
FROM alltypestiny
WHERE -1 NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false)
+------------------------------------------------------------+
| Explain String                                             |
+------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=8.00KB Threads=4 |
| Per-Host Resource Estimates: Memory=32MB                   |
| Codegen disabled by planner                                |
|                                                            |
| PLAN-ROOT SINK                                             |
| |                                                          |
| 04:EXCHANGE [UNPARTITIONED]                                |
| |                                                          |
| 02:NESTED LOOP JOIN [CROSS JOIN, BROADCAST]                |
| |                                                          |
| |--03:EXCHANGE [BROADCAST]                                 |
| |  |                                                       |
| |  01:EMPTYSET                                             |
| |                                                          |
| 00:SCAN HDFS [functional.alltypestiny]                     |
|    partitions=4/4 files=4 size=460B                        |
+------------------------------------------------------------+
{noformat}

> discrepancy in results with a subquery containing an agg that produces an 
> empty set
> ---
>
> Key: IMPALA-7782
> URL: https://issues.apache.org/jira/browse/IMPALA-7782
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0, Impala 3.1.0
>Reporter: Michael Brown
>Priority: Blocker
>  Labels: correctness, query_generator
>
> A discrepancy exists between Impala and Postgres when a subquery contains an 
> agg and results in an empty set, yet the WHERE clause looking at the subquery 
> should produce a "True" condition.
> Example queries include:
> {noformat}
> USE functional;
> SELECT id
> FROM alltypestiny
> WHERE -1 NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false);
> SELECT id
> FROM alltypestiny
> WHERE NULL NOT IN (SELECT COUNT(id) FROM alltypestiny HAVING false);
> SELECT id
> FROM alltypestiny
> WHERE (SELECT COUNT(id) FROM alltypestiny HAVING false) IS NULL;
> {noformat}
> These queries do not produce any rows in Impala. In Postgres, the queries 
> produce all 8 rows for the functional.alltypestiny id column.
> Thinking maybe there were Impala and Postgres differences with {{NOT IN}} 
> behavior, I also tried this:
> {noformat}
> USE functional;
> SELECT id
> FROM alltypestiny
> WHERE -1 NOT IN (SELECT 1 FROM alltypestiny WHERE bool_col IS NULL);
> {noformat}
> This subquery also produces an empty set just like the subquery in the 
> problematic queries at the top, but unlike those queries, this full query 
> returns the same results in Impala and Postgres (all 8 rows for the 
> functional.alltypestiny id column).
> For anyone interested in this bug, you can migrate data into postgres in a 
> dev environment using
> {noformat}
> tests/comparison/data_generator.py --use-postgresql --migrate-table-names 
> alltypestiny --db-name functional migrate
> {noformat}
> This is in 2.12 at least, so it's not a 3.1 regression.
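
The expected result follows from SQL's three-valued logic for NOT IN, sketched below in standalone form. `Tri`, `NullableInt` and `NotIn` are hypothetical names for illustration and have nothing to do with Impala's planner code. The key case is the first line of the function: against an empty subquery result, NOT IN is TRUE even when the left-hand side is NULL, which is why every row should be returned.

```cpp
#include <vector>

enum class Tri { kFalse, kTrue, kUnknown };

// Hypothetical stand-in for a nullable INT value.
struct NullableInt {
  bool is_null;
  int val;
};

// SQL semantics of `lhs NOT IN (rows produced by a subquery)`.
Tri NotIn(const NullableInt& lhs, const std::vector<NullableInt>& rows) {
  if (rows.empty()) return Tri::kTrue;  // empty set: TRUE, even for NULL lhs
  if (lhs.is_null) return Tri::kUnknown;
  bool saw_null = false;
  for (const NullableInt& r : rows) {
    if (r.is_null) {
      saw_null = true;
    } else if (r.val == lhs.val) {
      return Tri::kFalse;  // a definite match
    }
  }
  // No match found: a NULL in the set makes the answer UNKNOWN.
  return saw_null ? Tri::kUnknown : Tri::kTrue;
}
```

A WHERE clause keeps a row only when the predicate is TRUE, so both `-1 NOT IN (empty)` and `NULL NOT IN (empty)` keep the row, matching the Postgres behaviour described in the report.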




[jira] [Updated] (IMPALA-1306) Avoid passing (empty) tuples of non-materialized slots, if consumer does not need them

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1306:
--
Issue Type: Improvement  (was: Bug)

> Avoid passing (empty) tuples of non-materialized slots, if consumer does not 
> need them
> --
>
> Key: IMPALA-1306
> URL: https://issues.apache.org/jira/browse/IMPALA-1306
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.4.1
>Reporter: Ippokratis Pandis
>Assignee: Marcel Kornacker
>Priority: Minor
>  Labels: planner
>
> In the case of non-materialized slots, we should not produce tuples
> for those slots if it is the only slot in the tuple and the consumer
> node(s) do not need them. For example, in the query below nodes 03:ANALYTIC 
> and 06:EXCHANGE should not have tuple_id=1.
> This has some impact on backend performance, as in many codepaths we iterate 
> over all the tuples in the row.
> {code}
> [localhost:21000] > explain select AVG(t1.int_col) OVER ()  FROM alltypestiny 
> t1 WHERE EXISTS (SELECT t1.month FROM alltypestiny t1);
> Query: explain select AVG(t1.int_col) OVER ()  FROM alltypestiny t1 WHERE 
> EXISTS (SELECT t1.month FROM alltypestiny t1)
> +----------------------------------------------------------+
> | Explain String                                           |
> +----------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=64.00MB VCores=2 |
> |                                                          |
> | 03:ANALYTIC                                              |
> | |  functions: avg(t1.int_col)                            |
> | |  hosts=3 per-host-mem=unavailable                      |
> | |  tuple-ids=0,1,6 row-size=12B cardinality=8            |
> | |                                                        |
> | 06:EXCHANGE [UNPARTITIONED]                              |
> | |  hosts=3 per-host-mem=unavailable                      |
> | |  tuple-ids=0,1 row-size=4B cardinality=8               |
> | |                                                        |
> | 02:CROSS JOIN [BROADCAST]                                |
> | |  hosts=3 per-host-mem=0B                               |
> | |  tuple-ids=0,1 row-size=4B cardinality=8               |
> | |                                                        |
> | |--05:EXCHANGE [BROADCAST]                               |
> | |  |  hosts=3 per-host-mem=0B                            |
> | |  |  tuple-ids=1 row-size=0B cardinality=1              |
> | |  |                                                     |
> | |  04:EXCHANGE [UNPARTITIONED]                           |
> | |  |  limit: 1                                           |
> | |  |  hosts=3 per-host-mem=unavailable                   |
> | |  |  tuple-ids=1 row-size=0B cardinality=1              |
> | |  |                                                     |
> | |  01:SCAN HDFS [functional.alltypestiny t1, RANDOM]     |
> | |     partitions=4/4 size=460B                           |
> | |     table stats: 8 rows total                          |
> | |     column stats: all                                  |
> | |     limit: 1                                           |
> | |     hosts=3 per-host-mem=32.00MB                       |
> | |     tuple-ids=1 row-size=0B cardinality=1              |
> | |                                                        |
> | 00:SCAN HDFS [functional.alltypestiny t1, RANDOM]        |
> |    partitions=4/4 size=460B                              |
> |    table stats: 8 rows total                             |
> |    column stats: all                                     |
> |    hosts=3 per-host-mem=32.00MB                          |
> |    tuple-ids=0 row-size=4B cardinality=8                 |
> +----------------------------------------------------------+
> {code}




[jira] [Comment Edited] (IMPALA-7655) Codegen output for conditional functions (if,isnull, coalesce) is very suboptimal

2018-10-30 Thread Paul Rogers (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16669101#comment-16669101
 ] 

Paul Rogers edited comment on IMPALA-7655 at 10/30/18 5:36 PM:
---

This fix runs into deeper problems: looks like {{CASE}} may not be supported in 
{{GROUP BY}}:

{noformat}
MESSAGE: AnalysisException: select list expression not produced by aggregation 
output
 (missing from GROUP BY clause?): 
 CASE WHEN t1.smallint_col IS NOT NULL THEN t1.smallint_col
 WHEN t1.month IS NOT NULL THEN t1.month ELSE t1.month END int_col
{noformat}

Query:

{code:sql}
select t2.timestamp_col, t1.int_col_1
from
(select coalesce(t1.smallint_col, t1.month, t1.month) as int_col,
   (count(t1.int_col)) <= (coalesce(t1.smallint_col, t1.month, t1.month)) 
as boolean_col,
   (t1.bigint_col) + (t1.smallint_col) as int_col_1
from functional.alltypes t1
group by coalesce(t1.smallint_col, t1.month, t1.month), (t1.bigint_col) + 
(t1.smallint_col)
having (t1.bigint_col) + (t1.smallint_col) != (count(t1.bigint_col + 
t1.smallint_col))
) t1
inner join functional.alltypes t2
on (t2.month = t1.int_col and t2.month = t1.int_col_1 and t2.tinyint_col = 
t1.int_col)
where t2.int_col IN (t1.int_col_1, t1.int_col);
{code}

Apparently, the analyzer cannot match up the rewritten {{GROUP BY}} clause with 
the rewritten {{SELECT}} clause. Maybe we need to turn off this feature for 
queries with a {{GROUP BY}} clause?
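
For reference, the CASE expression in the error message is the expanded form of the three-argument coalesce() in the query. The sketch below shows why the two forms are semantically interchangeable, even though the analyzer apparently needs them to match structurally. `NullableInt`, `Coalesce` and `CaseForm` are hypothetical names; this is not the Impala rewrite rule itself.

```cpp
#include <vector>

// Hypothetical stand-in for a nullable INT value.
struct NullableInt {
  bool is_null;
  int val;
};

// coalesce(args...): the first non-NULL argument, or NULL if there is none.
NullableInt Coalesce(const std::vector<NullableInt>& args) {
  for (const NullableInt& a : args) {
    if (!a.is_null) return a;
  }
  return NullableInt{true, 0};
}

// CASE WHEN a IS NOT NULL THEN a WHEN b IS NOT NULL THEN b ELSE c END
NullableInt CaseForm(const NullableInt& a, const NullableInt& b,
                     const NullableInt& c) {
  if (!a.is_null) return a;
  if (!b.is_null) return b;
  return c;
}
```

Since both functions agree on every combination of NULL and non-NULL inputs, a rewrite applied consistently to the SELECT list and the GROUP BY list should not change which expressions group together.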


> Codegen output for conditional functions (if,isnull, coalesce) is very 
> suboptimal
> -
>
> Key: IMPALA-7655
> URL: https://issues.apache.org/jira/browse/IMPALA-7655
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Paul Rogers
>Priority: Major
>  Labels: codegen, perf, performance
>
> https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation 
> involving an if() function was very slow, 10x slower than the equivalent 
> version using a case:
> {noformat}
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case 
> when l_orderkey is NULL then 1 else NULL end) from 
> tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(case when l_orderkey is NULL then 1 else NULL end) from 
> tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:17:31 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a19642
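> +----------------------------------------------------------+
> | count(case when l_orderkey is null then 1 else null end) |
> +----------------------------------------------------------+
> | 0                                                        |
> +----------------------------------------------------------+
> Fetched 1 row(s) in 0.51s
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1      | 44.03ms  | 44.03ms  | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
> | 00:SCAN HDFS | 1      | 411.57ms | 411.57ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |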
>

[jira] [Updated] (IMPALA-2850) Populate equivalence classes for <=>

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2850:
--
Issue Type: Improvement  (was: Bug)

> Populate equivalence classes for <=>
> 
>
> Key: IMPALA-2850
> URL: https://issues.apache.org/jira/browse/IMPALA-2850
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Jim Apple
>Priority: Minor
>  Labels: performance
>
> For equality clauses, we populate equivalence classes in the Analyzer, but 
> this is not done for IS NOT DISTINCT FROM, aka `<=>`.
> For an example from 
> testdata/workloads/functional-planner/queries/PlannerTest/joins.test, the 
> following query plans differently with `=` and with `<=>`
> {code:sql}
> explain select 1 from functional.alltypes a
> inner join functional.alltypes b
> on a.id = b.id and a.id = b.int_col and a.id = b.bigint_col
> and a.tinyint_col = b.id and a.smallint_col = b.id
> and a.int_col = b.id and a.bigint_col = b.id
> and b.string_col = a.string_col and b.date_string_col = a.string_col
> where a.tinyint_col = a.smallint_col and a.int_col = a.bigint_col;
> {code}
> In the `=` version, more predicates are pushed down to the scanner, rather 
> than kept in the hash join.
> I think this will require distinguishing between different strengths of 
> equivalence classes. In particular, `=` is not really an equivalence 
> relation, since it is not reflexive: `NULL = NULL` evaluates to NULL rather 
> than TRUE. Furthermore, if we know that `a = b` and `b <=> c`, we can 
> conclude that `b = c`, because `a = b` being true implies `b` is not NULL.
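
The reasoning in the description can be checked with a small Python sketch of SQL's three-valued `=` and the null-safe `<=>`, using None for NULL. The function names are hypothetical and this is only an illustration of the semantics, not of how the Analyzer builds equivalence classes:

```python
from itertools import product
from typing import Optional


def sql_eq(a, b) -> Optional[bool]:
    """SQL `=`: the result is NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b) -> bool:
    """SQL `<=>` (IS NOT DISTINCT FROM): NULLs compare equal to each other."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def inference_holds(domain=(None, 1, 2)) -> bool:
    """Check: whenever a = b is TRUE and b <=> c holds, b = c is TRUE."""
    for a, b, c in product(domain, repeat=3):
        if sql_eq(a, b) is True and null_safe_eq(b, c):
            if sql_eq(b, c) is not True:
                return False
    return True
```

Here `sql_eq(None, None)` is None, which is why `=` is not reflexive, while `null_safe_eq(None, None)` is True.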



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Updated] (IMPALA-2783) Push down filters on rank similar to limit

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2783:
--
Issue Type: Improvement  (was: Bug)

> Push down filters on rank similar to limit
> --
>
> Key: IMPALA-2783
> URL: https://issues.apache.org/jira/browse/IMPALA-2783
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.2
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: performance, planner, ramp-up
>
> Similar to the limit pushdown optimization, we should extend the rule to cover 
> filters on rank(), dense_rank(), etc., as users tend to have explicit filters 
> on rank().
> Query 
> {code}
> select *
> FROM   (SELECT Rank()
>                  OVER(
>                    ORDER BY l_orderkey) AS rank
>         FROM   lineitem
>         WHERE  l_shipdate < '1992-05-09') a
> WHERE  rank < 10
> {code}
> Plan
> {code}
> +--------------------------------------------------------------+
> | Explain String                                               |
> +--------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=512.00MB VCores=1    |
> |                                                              |
> | 03:SELECT                                                    |
> | |  predicates: rank() < 10                                   |
> | |  hosts=9 per-host-mem=unavailable                          |
> | |  tuple-ids=6,5 row-size=50B cardinality=17999891           |
> | |                                                            |
> | 02:ANALYTIC                                                  |
> | |  functions: rank()                                         |
> | |  order by: l_orderkey ASC                                  |
> | |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW |
> | |  hosts=9 per-host-mem=unavailable                          |
> | |  tuple-ids=6,5 row-size=50B cardinality=179998909          |
> | |                                                            |
> | 04:MERGING-EXCHANGE [UNPARTITIONED]                          |
> | |  order by: l_orderkey ASC                                  |
> | |  hosts=9 per-host-mem=unavailable                          |
> | |  tuple-ids=6 row-size=38B cardinality=179998909            |
> | |                                                            |
> | 01:SORT                                                      |
> | |  order by: l_orderkey ASC                                  |
> | |  hosts=9 per-host-mem=336.00MB                             |
> | |  tuple-ids=6 row-size=38B cardinality=179998909            |
> | |                                                            |
> | 00:SCAN HDFS [tpch_300_parquet.lineitem, RANDOM]             |
> |    partitions=1/1 files=264 size=64.36GB                     |
> |    predicates: l_shipdate < '1992-05-09'                     |
> |    table stats: 1799989091 rows total                        |
> |    column stats: all                                         |
> |    hosts=9 per-host-mem=176.00MB                             |
> |    tuple-ids=0 row-size=38B cardinality=179998909            |
> +--------------------------------------------------------------+
> {code}
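
To illustrate why a filter such as `rank < 10` can be treated much like a limit, here is a minimal Python sketch. It assumes an ascending ORDER BY on a single key and standard rank() semantics (ties share a rank). The names `rank_less_than` and `rank_less_than_naive` are hypothetical, and this is not how Impala's planner implements anything; it only shows that a bounded top-N pass is enough to decide which rows survive the filter:

```python
import heapq


def rank_less_than(keys, k):
    """Return the keys whose ascending rank() is < k, using a bounded top-N.

    rank(v) = 1 + number of keys strictly smaller than v, so rank < k holds
    exactly for keys <= the (k-1)-th smallest key (ties included).
    """
    n = k - 1  # rank < k is the same as rank <= n
    if n <= 0 or not keys:
        return []
    smallest = heapq.nsmallest(n, keys)  # keeps at most n candidates
    threshold = smallest[-1]
    return sorted(key for key in keys if key <= threshold)


def rank_less_than_naive(keys, k):
    """Reference version: sort everything, compute rank(), then filter."""
    ordered = sorted(keys)
    return [
        key for key in ordered
        if 1 + sum(1 for other in ordered if other < key) < k
    ]
```

For example, with keys `[5, 1, 3, 3, 2, 2, 2, 9]` and `k = 3`, both functions keep `[1, 2, 2, 2]`: the three rows with key 2 tie at rank 2, and the next key, 3, has rank 5.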




[jira] [Updated] (IMPALA-3109) For Avro table schema defined in tblproperties, add column should have no effect instead of writing to meta store

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3109:
--
Issue Type: Improvement  (was: Bug)

> For Avro table schema defined in tblproperties, add column should have no 
> effect instead of writing to meta store
> -
>
> Key: IMPALA-3109
> URL: https://issues.apache.org/jira/browse/IMPALA-3109
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.2, Impala 2.3.0, Impala 2.5.0
>Reporter: Huaisi Xu
>Assignee: Huaisi Xu
>Priority: Minor
>  Labels: supportability, usability
>
> {code:java}
> create external table d2 like functional_avro.tinytable;
> alter table d2 add columns (ff int); -> succeeds
> Now run the same statement a second time:
> alter table d2 add columns (ff int);
> ERROR:
> ImpalaRuntimeException: Error making 'alter_table' RPC to Hive Metastore:
> CAUSED BY: MetaException: javax.jdo.JDODataStoreException: Add request failed 
> : INSERT INTO "COLUMNS_V2" 
> ("CD_ID","COMMENT","COLUMN_NAME","TYPE_NAME","INTEGER_IDX") VALUES (?,?,?,?,?)
> ...
> ...
> ...
> ... ( stack traces )
> {code}
> In Hive, the add columns statement appears to be ignored.
> We do not use the metadata's version during the query stage, so we should 
> behave like Hive here as well.




[jira] [Updated] (IMPALA-3171) data-source-tables.test is flaky when BATCH_SIZE is changed

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3171:
--
Issue Type: Test  (was: Bug)

> data-source-tables.test is flaky when BATCH_SIZE is changed
> ---
>
> Key: IMPALA-3171
> URL: https://issues.apache.org/jira/browse/IMPALA-3171
> Project: IMPALA
>  Issue Type: Test
>  Components: Infrastructure
>Affects Versions: impala 2.3
>Reporter: Juan Yu
>Priority: Minor
>
> data-source-tables.test is flaky and can return different results when 
> BATCH_SIZE is changed
> {code}
> [localhost:21000] > select count(*) from alltypes_datasource;
> Query: select count(*) from alltypes_datasource
> +----------+
> | count(*) |
> +----------+
> | 4510     |
> +----------+
> Fetched 1 row(s) in 0.40s
> [localhost:21000] > set batch_size=12345;
> BATCH_SIZE set to 12345
> [localhost:21000] > select count(*) from alltypes_datasource;
> Query: select count(*) from alltypes_datasource
> +----------+
> | count(*) |
> +----------+
> | 5000     |
> +----------+
> Fetched 1 row(s) in 0.40s
> [localhost:21000] > set batch_size=1;
> BATCH_SIZE set to 1
> [localhost:21000] > select count(*) from alltypes_datasource;
> Query: select count(*) from alltypes_datasource
> +----------+
> | count(*) |
> +----------+
> | 4501     |
> +----------+
> Fetched 1 row(s) in 0.40s
> {code}




[jira] [Updated] (IMPALA-3171) data-source-tables.test is flaky when BATCH_SIZE is changed

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3171:
--
Component/s: (was: Backend)
 Infrastructure





[jira] [Resolved] (IMPALA-1798) Very slow performance of Views on top of another Views

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-1798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1798.
---
Resolution: Duplicate

Sounds like IMPALA-4242

> Very slow performance of Views on top of another Views
> --
>
> Key: IMPALA-1798
> URL: https://issues.apache.org/jira/browse/IMPALA-1798
> Project: IMPALA
>  Issue Type: Bug
>  Components: Perf Investigation
>Affects Versions: Impala 2.1, Impala 2.1.1, Impala 2.3.0
> Environment: Cluster 3 nodes Impala 2.1
>Reporter: Alex Finch
>Priority: Minor
>  Labels: performance, planner
>
> A query against a view performs about the same as a query against the source 
> table. With a view on top of another view (even CREATE VIEW view_2 AS SELECT 
> * FROM view_1), performance is much slower; for more complex queries on 
> top of view_2, compilation of the query never finishes.




[jira] [Commented] (IMPALA-7166) ExecSummary should be a first class object

2018-10-30 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668880#comment-16668880
 ] 

ASF subversion and git services commented on IMPALA-7166:
-

Commit bae27edf532d4e29ad8a83bf2ddd3b1b43f8a23f in impala's branch 
refs/heads/master from [~yzhangal]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=bae27ed ]

IMPALA-7166: ExecSummary should be a first class object.

Impala RuntimeProfile currently contains "ExecSummary" as a string. We should 
make it a first class thrift object, so that tools can extract these fields 
(Est rows, etc.).

Testing:
Modified unit test.

Change-Id: I4791237a5579f16c9efda8e57876d48980739e13
Reviewed-on: http://gerrit.cloudera.org:8080/11555
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 


> ExecSummary should be a first class object
> --
>
> Key: IMPALA-7166
> URL: https://issues.apache.org/jira/browse/IMPALA-7166
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: sandeep akinapelli
>Assignee: Yongjun Zhang
>Priority: Major
>  Labels: resource-management, usability
> Fix For: Impala 3.1.0
>
>
> Impala RuntimeProfile currently contains "ExecSummary" as a string. We should 
> make it a first class thrift object, so that tools can extract these fields 
> (Est rows, etc.).




[jira] [Assigned] (IMPALA-6777) Whitespace inconsistencies in pretty printer across units

2018-10-30 Thread Andrew Sherman (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman reassigned IMPALA-6777:
--

Assignee: (was: Andrew Sherman)

> Whitespace inconsistencies in pretty printer across units
> -
>
> Key: IMPALA-6777
> URL: https://issues.apache.org/jira/browse/IMPALA-6777
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Lars Volker
>Priority: Minor
>  Labels: newbie
>
> Depending on the unit, we sometimes print a space between a value and its 
> unit and sometimes we don't:
>  
> {noformat}
> "human_readable": "Count: 9, min / max: 13.000us / 22.000us, 25th %-ile: 
> 13.000us, 50th %-ile: 16.000us, 75th %-ile: 17.000us, 90th %-ile: 18.000us, 
> 95th %-ile: 22.000us, 99.9th %-ile: 22.000us",
> "human_readable": "Count: 9, min / max: 80.00 B / 80.00 B, 25th %-ile: 80.00 
> B, 50th %-ile: 80.00 B, 75th %-ile: 80.00 B, 90th %-ile: 80.00 B, 95th %-ile: 
> 80.00 B, 99.9th %-ile: 80.00 B",
> {noformat}




[jira] [Resolved] (IMPALA-7735) Expose admission control status in impala-shell

2018-10-30 Thread Bikramjeet Vig (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikramjeet Vig resolved IMPALA-7735.

   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Expose admission control status in impala-shell
> ---
>
> Key: IMPALA-7735
> URL: https://issues.apache.org/jira/browse/IMPALA-7735
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Clients
>Affects Versions: Impala 3.1.0
>Reporter: Tim Armstrong
>Assignee: Bikramjeet Vig
>Priority: Critical
>  Labels: admission-control
> Fix For: Impala 3.2.0
>
> Attachments: Screenshot1.png
>
>
> Following on from IMPALA-7545 we should also expose this in impala-shell. I 
> left some notes on that JIRA.




[jira] [Resolved] (IMPALA-3473) Flaky Test Failure: KuduTableSinkTest.TestInsertJustKey

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3473.
---
Resolution: Cannot Reproduce

> Flaky Test Failure: KuduTableSinkTest.TestInsertJustKey
> ---
>
> Key: IMPALA-3473
> URL: https://issues.apache.org/jira/browse/IMPALA-3473
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.5.0
>Reporter: Lars Volker
>Priority: Minor
>
> http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge/2349/testReport/junit/(root)/KuduTableSinkTest/TestInsertJustKey/
> Casey, I picked you thinking you might have an idea what’s going on here; 
> feel free to find another person or assign back to me if you're swamped.
> I've only seen this fail once; I couldn't reproduce it locally, and other GVMs 
> don't seem to be affected. Here's the error message pointing to the test in 
> question ({{be/src/exec/kudu-table-sink-test.cc:264}}):
> {noformat}
> Value of: skip_val == 1 ? expected_num_rows : (expected_num_rows + 1) / 
> skip_val
>   Actual: 20
> Expected: row_idx
> Which is: 10
> {noformat}




[jira] [Updated] (IMPALA-6146) Rethink HdfsTextScanner::FillByteBuffer() extension point

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6146:
--
Issue Type: Improvement  (was: Bug)

> Rethink HdfsTextScanner::FillByteBuffer() extension point
> -
>
> Key: IMPALA-6146
> URL: https://issues.apache.org/jira/browse/IMPALA-6146
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Priority: Minor
>
> The HDFS text scanner allows extension via subclassing and overriding 
> FillByteBuffer(). FillByteBuffer() is poorly encapsulated from the rest of 
> the text scanner implementation: it modifies several member variables, which 
> makes it hard to fix bugs like IMPALA-6137. We should come up with a 
> better-defined interface.




[jira] [Updated] (IMPALA-2543) Typo in timestamp value produces corrupt result instead of an error

2018-10-30 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2543:
--
Priority: Major  (was: Minor)

> Typo in timestamp value produces corrupt result instead of an error
> ---
>
> Key: IMPALA-2543
> URL: https://issues.apache.org/jira/browse/IMPALA-2543
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.3.0
>Reporter: Ryan Blue
>Priority: Major
>  Labels: correctness, timestamp, usability
>
> Running timestamp tests, I hit the following bug where a typo in a timestamp 
> string produced a strange result without complaining that the input value was 
> invalid:
> {code}
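> Query: select to_utc_timestamp('2015-10-13-09:15:34.101', 'PDT')
>                                  Oops. ---^
> +----------------------------------------------------+
> | to_utc_timestamp('2015-10-13-09:15:34.101', 'pdt') |
> +----------------------------------------------------+
> | 2015-10-13 07:00:00                                | // ???
> +----------------------------------------------------+
> Query: select to_utc_timestamp('2015-10-13 09:15:34.101', 'PDT')
> +----------------------------------------------------+
> | to_utc_timestamp('2015-10-13 09:15:34.101', 'pdt') |
> +----------------------------------------------------+
> | 2015-10-13 16:15:34.10100                          | // MUCH BETTER
> +----------------------------------------------------+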
> {code}
> It looks like anything after the bad character is ignored:
> {code}
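> Query: select to_utc_timestamp('2015-10-13-09:15:34.101', 'UTC')
> +----------------------------------------------------+
> | to_utc_timestamp('2015-10-13-09:15:34.101', 'utc') |
> +----------------------------------------------------+
> | 2015-10-13 00:00:00                                |
> +----------------------------------------------------+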
> {code}
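
The behavior the report asks for, rejecting the malformed string instead of silently using only the prefix that parses, can be sketched with Python's standard `datetime.strptime`. This is not Impala's timestamp parser; the format string and the name `parse_timestamp_strict` are assumptions chosen to match the examples above:

```python
from datetime import datetime

# Assumed layout of the timestamp strings used in the report.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_timestamp_strict(text):
    """Parse the whole string or raise ValueError; never return a partial value."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)
```

With this approach, `'2015-10-13 09:15:34.101'` parses to a full timestamp, while the typo `'2015-10-13-09:15:34.101'` raises `ValueError` rather than yielding a date with the time portion dropped.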



