[jira] [Resolved] (IMPALA-7646) SHOW GRANT USER not working on kerberized clusters

2018-10-01 Thread Adam Holley (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Holley resolved IMPALA-7646.
-
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> SHOW GRANT USER not working on kerberized clusters
> --
>
>                 Key: IMPALA-7646
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7646
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Adam Holley
>            Assignee: Adam Holley
>            Priority: Major
>             Fix For: Impala 3.1.0
>
>
> SHOW GRANT USER foo_user;
> does not work on kerberized clusters because the requester name does not 
> match the user's name.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7520) NPE in SentryProxy

2018-10-01 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya resolved IMPALA-7520.
--
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> NPE in SentryProxy
> --
>
>                 Key: IMPALA-7520
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7520
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.1.0
>            Reporter: Adam Holley
>            Assignee: Fredy Wijaya
>            Priority: Major
>             Fix For: Impala 3.1.0
>
>
> In SentryProxy.refreshPrivilegesInCache(), the call to 
> allPrincipalPrivileges.get(principal.getName()) is sometimes returning null.
> {noformat}
> java.lang.NullPointerException
>   at org.apache.impala.util.SentryProxy$PolicyReader.refreshPrivilegesInCatalog(SentryProxy.java:245)
>   at org.apache.impala.util.SentryProxy$PolicyReader.refreshRolePrivileges(SentryProxy.java:197)
>   at org.apache.impala.util.SentryProxy$PolicyReader.run(SentryProxy.java:139)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> {noformat}
>  
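
The failure mode described above can be illustrated with a minimal sketch, written in Python rather than Impala's Java. A map lookup keyed by principal name yields nothing for a principal that has no entry, and using the result without a check fails. The names `refresh_privileges` and `all_principal_privileges` are hypothetical, and treating a missing entry as "no privileges" is an assumption for illustration, not the fix that was applied in Impala.

```python
from typing import Dict, List


def refresh_privileges(
    principal_names: List[str],
    all_principal_privileges: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Return the privileges to cache for each principal.

    Hypothetical helper: a principal with no entry in the map is treated as
    having no privileges instead of dereferencing a missing value.
    """
    refreshed: Dict[str, List[str]] = {}
    for name in principal_names:
        # dict.get() returns None for a missing key, much like Map.get() in Java.
        privileges = all_principal_privileges.get(name)
        if privileges is None:
            privileges = []  # assumption: a missing entry means "no privileges"
        refreshed[name] = list(privileges)
    return refreshed
```

Whether skipping the principal, logging, or retrying is the right reaction depends on why the entry is missing, which the report does not say.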






[jira] [Resolved] (IMPALA-4667) Incorrect timeline reported for queries with long running coordinator fragment

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4667.
---
Resolution: Not A Problem

This doesn't look like a bug. IMPALA-6497 makes this more obvious by adding 
additional events before UnregisterQuery().
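
For readers of the Query Timeline quoted in the description: each entry appears to list the cumulative time since the query started, followed in parentheses by the delta from the previous event. The Python sketch below checks that reading for the "Unregister query" entry. The helper `to_ms` is an illustration only and is not part of Impala.

```python
import re

# Milliseconds per unit for the duration strings used in the profile.
_UNITS_MS = {"s": 1000.0, "ms": 1.0, "us": 0.001}


def to_ms(text: str) -> float:
    """Convert a profile duration such as '13s137ms' or '37.469us' to milliseconds."""
    total = 0.0
    # "ms" and "us" are listed before "s" so the longer unit is matched first.
    for value, unit in re.findall(r"([\d.]+)(ms|us|s)", text):
        total += float(value) * _UNITS_MS[unit]
    return total


# "Unregister query: 13s137ms (7s152ms)" follows "First row fetched: 5s985ms".
delta_ms = to_ms("13s137ms") - to_ms("5s985ms")
```

Under this reading, the 7s152ms is the time between the first row being fetched and the query being unregistered, not the duration of a single "unregister" step.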

> Incorrect timeline reported for queries with long running coordinator fragment
> --
>
>                 Key: IMPALA-4667
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4667
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.8.0
>            Reporter: Mostafa Mokhtar
>            Priority: Major
>              Labels: profile, supportability
>         Attachments: IMPALA-4667_profile.txt
>
>
> Query profile shows that "Unregister query" ran for 7 seconds 
> Query
> {code}
> select *
> FROM   (SELECT Rank()
>                  OVER(partition by l_linenumber
>                       ORDER BY l_orderkey) AS rank
>         FROM   lineitem
>         WHERE  l_shipdate < '1992-05-09') a
> WHERE  rank < 10
> {code}
> Plan
> {code}
> PLAN-ROOT SINK
> |
> 05:EXCHANGE [UNPARTITIONED]
> |  hosts=7 per-host-mem=unavailable
> |  tuple-ids=6,5 row-size=50B cardinality=17999891
> |
> 03:SELECT
> |  predicates: rank() < 10
> |  hosts=7 per-host-mem=0B
> |  tuple-ids=6,5 row-size=50B cardinality=17999891
> |
> 02:ANALYTIC
> |  functions: rank()
> |  partition by: l_linenumber
> |  order by: l_orderkey ASC
> |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
> |  hosts=7 per-host-mem=0B
> |  tuple-ids=6,5 row-size=50B cardinality=179998909
> |
> 01:SORT
> |  order by: l_linenumber ASC NULLS FIRST, l_orderkey ASC
> |  hosts=7 per-host-mem=352.00MB
> |  tuple-ids=6 row-size=42B cardinality=179998909
> |
> 04:EXCHANGE [HASH(l_linenumber)]
> |  hosts=7 per-host-mem=0B
> |  tuple-ids=0 row-size=42B cardinality=179998909
> |
> 00:SCAN HDFS [tpch_300_parquet.lineitem, RANDOM]
>    partitions=1/1 files=259 size=63.71GB
>    predicates: l_shipdate < '1992-05-09'
>    table stats: 1799989091 rows total
>    column stats: all
>    hosts=7 per-host-mem=264.00MB
>    tuple-ids=0 row-size=42B cardinality=179998909
> {code}
> ExecSummary: 
> {code}
> Operator      #Hosts   Avg Time   Max Time   #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
> ------------------------------------------------------------------------------------------------------------------
> 05:EXCHANGE        1  156.063us  156.063us      63      18.00M          0        -1.00 B  UNPARTITIONED
> 03:SELECT          7   63.424ms  126.486ms      63      18.00M    9.02 MB              0
> 02:ANALYTIC        7    1s582ms    3s140ms  50.88M     180.00M   25.03 MB              0
> 01:SORT            7    4s099ms    8s368ms  50.88M     180.00M  616.07 MB      352.00 MB
> 04:EXCHANGE        7  497.343ms    1s304ms  50.88M     180.00M          0              0  HASH(l_linenumber)
> 00:SCAN HDFS       7   89.198ms   94.103ms  50.88M     180.00M    1.21 GB      264.00 MB  tpch_300_parquet.lineitem
> {code}
> Query timeline
> {code}
> Planner Timeline: 14.041ms
>- Analysis finished: 2.081ms (2.081ms)
>- Equivalence classes computed: 2.286ms (205.178us)
>- Single node plan created: 8.255ms (5.968ms)
>- Runtime filters computed: 8.308ms (52.728us)
>- Distributed plan created: 8.455ms (147.135us)
>- Lineage info computed: 8.650ms (195.300us)
>- Planning finished: 14.041ms (5.390ms)
> Query Timeline: 13s183ms
>- Query submitted: 37.469us (37.469us)
>- Planning finished: 18.856ms (18.819ms)
>- Submit for admission: 20.020ms (1.163ms)
>- Completed admission: 20.794ms (773.463us)
>- Ready to start 15 fragment instances: 21.443ms (648.888us)
>- All 15 fragment instances started: 33.322ms (11.879ms)
>- Rows available: 5s904ms (5s871ms)
>- First row fetched: 5s985ms (80.483ms)
>- Unregister query: 13s137ms (7s152ms)
> {code}
> This query has the same issue
> {code}
> select l_orderkey from lineitem ORDER BY  l_orderkey limit 100
> {code}
> Profile snippet
> {code}
> PLAN-ROOT SINK
> |
> 02:MERGING-EXCHANGE [UNPARTITIONED]
> |  order by: l_orderkey ASC
> |  limit: 100
> |  hosts=7 per-host-mem=unavailable
> |  tuple-ids=1 row-size=8B cardinality=100
> |
> 01:TOP-N [LIMIT=100]
> |  order by: l_orderkey ASC
> |  hosts=7 per-host-mem=7.63MB
> |  tuple-ids=1 row-size=8B cardinality=100
> |
> 00:SCAN HDFS [tpch_300_parquet.lineitem, RANDOM]
>partitions=1/1 files=259 size=63.71GB
>table stats: 1799989091 rows total
>column stats: all
>hosts=7 

[jira] [Assigned] (IMPALA-4275) Coordinator::GetNext() should not call WaitForAllInstances() at eos

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-4275:
-

Assignee: (was: Henry Robinson)

> Coordinator::GetNext() should not call WaitForAllInstances() at eos
> ---
>
>                 Key: IMPALA-4275
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4275
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.8.0
>            Reporter: Henry Robinson
>            Priority: Minor
>
> {{Coordinator::GetNext()}} calls {{WaitForAllInstances()}} (previously 
> {{WaitForAllBackends()}}) when it finishes pulling the last batch. This puts 
> fragment instance lifecycle management on the critical path to retrieve 
> results, which doesn't make a lot of sense - the client should get notified 
> that the query has returned all results without having to wait for every 
> fragment instance to finish.
> We should move that call to {{TearDown()}} instead. The reason that this is a 
> little tricky is because {{TearDown()}} currently happens _after_ the 
> containing {{QueryExecState}} has been removed from the Impala server's exec 
> state map. Once that's happened, fragment status reports can't get to the 
> {{Coordinator}}, so {{WaitForAllInstances()}} would wait forever. 
> This JIRA is to fix both issues at once - calling {{TearDown()}} before 
> deregistration to ensure fragments may be waited for, and to remove waiting 
> from the {{GetNext()}} path.
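
The ordering problem described above can be sketched with a toy model in Python. It is a simplification under stated assumptions: `Coordinator`, `Server`, and `close_query` are hypothetical stand-ins, the real wait is blocking rather than a loop over reports, and none of this mirrors Impala's actual C++ classes.

```python
from typing import Dict


class Coordinator:
    """Toy stand-in that tracks outstanding fragment instances."""

    def __init__(self, num_instances: int) -> None:
        self.outstanding = num_instances

    def on_status_report(self) -> None:
        # Called when a fragment instance reports that it has finished.
        self.outstanding -= 1

    def all_instances_done(self) -> bool:
        return self.outstanding == 0


class Server:
    """Toy stand-in for the server's exec state map."""

    def __init__(self) -> None:
        self.exec_state_map: Dict[str, Coordinator] = {}

    def deliver_report(self, query_id: str) -> bool:
        # A report can only be routed while the query is still registered.
        coord = self.exec_state_map.get(query_id)
        if coord is None:
            return False
        coord.on_status_report()
        return True


def close_query(server: Server, query_id: str, pending_reports: int,
                teardown_first: bool) -> bool:
    """Return True if the coordinator saw every instance finish."""
    coord = server.exec_state_map[query_id]
    if not teardown_first:
        del server.exec_state_map[query_id]  # deregister before waiting
    for _ in range(pending_reports):
        server.deliver_report(query_id)
    if teardown_first:
        del server.exec_state_map[query_id]  # deregister after waiting
    return coord.all_instances_done()
```

With `teardown_first=False` the reports are dropped and the coordinator never sees all instances finish, which is the "would wait forever" case in the description.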






[jira] [Updated] (IMPALA-2566) Result of casttochar() not handled properly in SQL operations

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2566:
--
Labels: crash  (was: )

> Result of casttochar() not handled properly in SQL operations
> -
>
>                 Key: IMPALA-2566
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2566
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 2.3.0
>            Reporter: John Russell
>            Priority: Critical
>              Labels: crash
>
> If I use casttochar() during a CTAS to set the type of a column, Impala 
> considers the result to be STRING. However, somehow the length information 
> for the CHAR results must be getting passed back and messing things up in the 
> output. Trying to query the resulting table causes the query to hang:
> {code}
> [blah:21000] > create table char_types as select casttochar('hello world') as 
> c1, casttochar('xyz') as c2, casttochar('x') as c3;
> Query: create table char_types as select casttochar('hello world') as c1, 
> casttochar('xyz') as c2, casttochar('x') as c3
> +-------------------+
> | summary           |
> +-------------------+
> | Inserted 1 row(s) |
> +-------------------+
> Fetched 1 row(s) in 6.89s
> [blah:21000] > desc char_types;
> Query: describe char_types
> +------+--------+---------+
> | name | type   | comment |
> +------+--------+---------+
> | c1   | string |         |
> | c2   | string |         |
> | c3   | string |         |
> +------+--------+---------+
> [blah:21000] > show functions in _impala_builtins like 'casttochar';
> Query: show functions in _impala_builtins like 'casttochar'
> +-------------+--------------------------+
> | return type | signature                |
> +-------------+--------------------------+
> | CHAR(*)     | casttochar(BIGINT)       |
> | CHAR(*)     | casttochar(BOOLEAN)      |
> | CHAR(*)     | casttochar(CHAR(*))      |
> | CHAR(*)     | casttochar(DECIMAL(*,*)) |
> | CHAR(*)     | casttochar(DOUBLE)       |
> | CHAR(*)     | casttochar(FLOAT)        |
> | CHAR(*)     | casttochar(INT)          |
> | CHAR(*)     | casttochar(SMALLINT)     |
> | CHAR(*)     | casttochar(STRING)       |
> | CHAR(*)     | casttochar(TIMESTAMP)    |
> | CHAR(*)     | casttochar(TINYINT)      |
> | CHAR(*)     | casttochar(VARCHAR(*))   |
> +-------------+--------------------------+
> Fetched 12 row(s) in 0.10s
> [blah:21000] > select * from char_types;
> Query: select * from char_types
> ^C Cancelling Query
> {code}
> The HDFS data file has the original text info plus extra control characters. 
> Doing hdfs dfs -cat on the data file causes the OS X terminal to go haywire 
> and lock up.






[jira] [Updated] (IMPALA-2566) Result of casttochar() not handled properly in SQL operations

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2566:
--
Target Version: Impala 3.1.0  (was: Product Backlog)







[jira] [Updated] (IMPALA-2566) Result of casttochar() not handled properly in SQL operations

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2566:
--
Priority: Critical  (was: Minor)







[jira] [Commented] (IMPALA-2566) Result of casttochar() not handled properly in SQL operations

2018-10-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634831#comment-16634831
 ] 

Tim Armstrong commented on IMPALA-2566:
---

I can't repro the hang but I can repro the DCHECK on a debug build. We should 
fix that.







[jira] [Commented] (IMPALA-3867) tool to kill runaway queries

2018-10-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634825#comment-16634825
 ] 

Tim Armstrong commented on IMPALA-3867:
---

We've been building a lot of this stuff into Impala, which seems overall better 
than an external tool. I'm sure there are cases where people might want to do 
something different, but the requirements are unclear - I think we need more 
concrete use cases for this to be worth tracking.

> tool to kill runaway queries
> 
>
>                 Key: IMPALA-3867
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3867
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Clients
>    Affects Versions: Impala 2.5.0
>            Reporter: Marcell Szabo
>            Priority: Minor
>
> We already have a [good 
> mechanism|http://www.cloudera.com/documentation/enterprise/latest/topics/impala_timeouts.html#timeouts__impalad_timeout]
>  for killing idle queries.
> In some situations it would be nice to have a tool that kills active queries 
> based on configurable rules, e.g. runtime or memory consumed goes over a 
> limit.
> A first approach could be a script using CM API calls impalaQueries and 
> impalaQueries/cancel.
> It would be nice though if the user who originally started the query would 
> see a custom error message that describes why and who killed the query. This 
> should also be stored in the profile.
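
The rule-based selection requested above could look roughly like the following Python sketch. It only decides which queries violate a rule and why. Fetching the running queries and cancelling them (for example through the CM API calls named in the description) is left to the caller, and `QueryInfo`, `Rule`, and `select_queries_to_cancel` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class QueryInfo:
    """Hypothetical snapshot of a running query."""
    query_id: str
    runtime_s: float
    memory_bytes: int


@dataclass
class Rule:
    """Hypothetical kill rule; a limit of None means the limit is not checked."""
    name: str
    max_runtime_s: Optional[float] = None
    max_memory_bytes: Optional[int] = None

    def violation(self, query: QueryInfo) -> Optional[str]:
        """Return a human-readable reason if the query breaks this rule."""
        if self.max_runtime_s is not None and query.runtime_s > self.max_runtime_s:
            return f"{self.name}: runtime {query.runtime_s}s over limit {self.max_runtime_s}s"
        if self.max_memory_bytes is not None and query.memory_bytes > self.max_memory_bytes:
            return f"{self.name}: memory {query.memory_bytes}B over limit {self.max_memory_bytes}B"
        return None


def select_queries_to_cancel(queries: List[QueryInfo],
                             rules: List[Rule]) -> List[Tuple[str, str]]:
    """Return (query_id, reason) pairs for queries that violate any rule."""
    to_cancel = []
    for query in queries:
        for rule in rules:
            reason = rule.violation(query)
            if reason is not None:
                to_cancel.append((query.query_id, reason))
                break  # the first matching rule is enough
    return to_cancel
```

The returned reason string is the kind of text that could be surfaced to the user who started the query, as the description asks.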






[jira] [Resolved] (IMPALA-3867) tool to kill runaway queries

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3867.
---
Resolution: Later







[jira] [Resolved] (IMPALA-4802) Fix lifecycle of RuntimeProfile

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4802.
---
Resolution: Duplicate

> Fix lifecycle of RuntimeProfile
> ---
>
>                 Key: IMPALA-4802
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4802
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.9.0
>            Reporter: Tim Armstrong
>            Priority: Major
>              Labels: query-lifecycle
>
> The lifecycle of RuntimeProfile objects and counters is confusing and 
> error-prone. In particular, there are various undocumented requirements about 
> the order destructors are run in.
> Some RuntimeProfile counters are registered to be periodically updated by a 
> thread. This thread touches the counter object, and in some cases other 
> objects (e.g. MemTrackers). References to RuntimeProfile counters are also 
> stored in various places. There are a couple of specific problems I'm aware 
> of:
> * ~RuntimeProfile deregisters counters from being updated by a background 
> thread. It must be run before those counters are destroyed. This currently 
> works because either the RuntimeProfile lives in the same ObjectPool as the 
> counters and ObjectPools destroy objects in FIFO order, or because the 
> RuntimeProfile is destroyed before the ObjectPool.
> * MemTracker destructors reference the consumption() counter, so the 
> MemTracker must be destroyed before the counter. In practice this means that 
> a MemTracker cannot always be managed by an ObjectPool because destructors 
> will be run in the wrong order (unless it is using 
> MemTracker::local_counter_).
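
The first ordering requirement listed above can be modelled with a small Python sketch. It is a toy under stated assumptions: `Counter`, `Profile`, and `teardown_is_safe` are hypothetical names, destruction is represented as an explicit list order, and no background thread is actually run.

```python
from typing import List


class Counter:
    """Toy counter that may be registered for periodic background updates."""

    def __init__(self) -> None:
        self.registered = False


class Profile:
    """Toy profile; destroying it deregisters its periodic counters."""

    def __init__(self) -> None:
        self.counters: List[Counter] = []

    def add_periodic_counter(self, counter: Counter) -> None:
        counter.registered = True
        self.counters.append(counter)

    def destroy(self) -> None:
        for counter in self.counters:
            counter.registered = False


def teardown_is_safe(destroy_order: List[object]) -> bool:
    """Return False if a counter would be destroyed while still registered."""
    for obj in destroy_order:
        if isinstance(obj, Counter) and obj.registered:
            return False  # a background updater could still touch this counter
        if isinstance(obj, Profile):
            obj.destroy()
    return True
```

Destroying the profile before its counters is safe in this model, and the reverse order is not, which is the undocumented requirement the issue calls out.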






[jira] [Resolved] (IMPALA-2122) Allocate objects associated with query/fragment from reserved mem

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-2122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2122.
---
Resolution: Later

> Allocate objects associated with query/fragment from reserved mem
> -
>
>                 Key: IMPALA-2122
>                 URL: https://issues.apache.org/jira/browse/IMPALA-2122
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.2
>            Reporter: Matthew Jacobs
>            Priority: Minor
>              Labels: query-lifecycle, resource-management
>
> We should be allocating _all_ per-query/per-fragment objects from memory 
> reserved for the query/fragment.






[jira] [Updated] (IMPALA-1226) need better cancellation tests

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1226:
--
Priority: Major  (was: Critical)

> need better cancellation tests
> --
>
>                 Key: IMPALA-1226
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1226
>             Project: IMPALA
>          Issue Type: Task
>          Components: Infrastructure
>    Affects Versions: Impala 2.0
>            Reporter: Dan Hecht
>            Assignee: Dan Hecht
>            Priority: Major
>              Labels: query-lifecycle, test, test-infra
>         Attachments: impala-1178-test.patch
>
>
> See IMPALA-1178 comments for details for how to manually reproduce the bug.
> I tried writing a pytest test case to reproduce the problem but it appears 
> that we don't get enough parallelism that way.  Even adding a long delay loop 
> to ImpalaServer::ExecuteStatement inside the race window shows that the 
> attached test will not execute the RPCs concurrently, and so doesn't 
> reproduce IMPALA-1178.
> I'll attach a patch of the test case attempt.  The patch also contains some 
> instrumentation I was using to see why I wasn't hitting the race window.
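
One generic way to make calls in a Python test start at the same moment is a barrier that releases all worker threads together. The sketch below shows that technique only. `run_concurrently` is a hypothetical helper, it is not the attached patch, and whether it would provide enough overlap to hit the race window in IMPALA-1178 is not known.

```python
import threading
from typing import Callable, List


def run_concurrently(calls: List[Callable[[], object]]) -> List[object]:
    """Start one thread per call and release them all at the same moment."""
    barrier = threading.Barrier(len(calls))
    results: List[object] = [None] * len(calls)

    def worker(index: int) -> None:
        barrier.wait()  # block until every thread has reached this point
        results[index] = calls[index]()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(calls))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

In a real test the callables would issue the RPCs under test, for example one executing a statement and one cancelling it.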






[jira] [Updated] (IMPALA-1605) Cancellation takes a long time if there are slow exprs

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-1605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-1605:
--
Issue Type: Improvement  (was: Bug)

> Cancellation takes a long time if there are slow exprs
> --
>
>                 Key: IMPALA-1605
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1605
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.0
>            Reporter: Skye Wanderman-Milne
>            Priority: Minor
>              Labels: query-lifecycle
>
> Impala generally checks for cancellation once per row batch, which by default 
> is 1024 rows. However, if each row is very expensive to evaluate (e.g. 
> involves a very complicated regular expression evaluation), this may not be 
> frequent enough for responsive cancellation.
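
The effect described above can be seen in a small Python sketch that polls a cancellation flag once every `check_every` rows. As a hypothetical figure, if each row took 10 ms to evaluate, a 1024-row batch would mean up to 1024 × 10 ms = 10.24 s before a cancel request is noticed. The function `evaluate_rows` is illustrative and does not reflect Impala's actual execution code.

```python
import threading
from typing import Callable, Iterable


def evaluate_rows(rows: Iterable[int],
                  eval_row: Callable[[int], None],
                  cancelled: threading.Event,
                  check_every: int = 1024) -> int:
    """Evaluate rows, polling the cancellation flag every `check_every` rows.

    Returns the number of rows that were evaluated before stopping.
    """
    evaluated = 0
    for row in rows:
        # The flag is only consulted at batch boundaries.
        if evaluated % check_every == 0 and cancelled.is_set():
            break
        eval_row(row)
        evaluated += 1
    return evaluated
```

If cancellation is requested while row 10 is being evaluated, the default setting still evaluates the remaining rows of the batch, while `check_every=1` stops right after that row.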






[jira] [Resolved] (IMPALA-4714) Idle session expired query goes in to exception state - And this is confusing

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4714.
---
Resolution: Won't Fix

> Idle session expired query goes in to exception state - And this is confusing
> -
>
>                 Key: IMPALA-4714
>                 URL: https://issues.apache.org/jira/browse/IMPALA-4714
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.6.0
>            Reporter: Mala Chikka Kempanna
>            Priority: Major
>              Labels: query-lifecycle
>
> After idle_session_timeout is set, the Impala server moves a query into the 
> EXCEPTION state once it has finished executing if there is no client 
> activity.
> Example profile excerpt showing this behavior:
> {code}
> Query Timeline
> Start execution: 0ns (0ns)
> Planning finished: 9ms (9ms)
> Child queries finished: 8.3m (8.3m)
> Metastore update finished: 8.3m (661ms)
> Rows available: 8.3m (0ns)
> Cancelled: 11.3m (3.0m)
> Unregister query: 12.0m (42.55s)
> {code}
> Query status and query state-
> {code}
> Query Type: DDL
> Query State: EXCEPTION
> Start Time: Dec 22, 2016 11:45:01 AM
> End Time: Dec 22, 2016 11:57:01 AM
> Duration: 11m, 59s
> Admission Result: Unknown
> Client Fetch Wait Time: 3.7m
> Client Fetch Wait Time Percentage: 31
> Connected User: admin
> DDL Type: COMPUTE_STATS
> File Formats:
> Impala Version: impalad version 2.5.0-cdh5.7.2 RELEASE (build 
> 1140f8289dc0d2b1517bcf70454bb4575eb8cc70)
> Network Address: 10.17.100.123:44618
> Out of Memory: false
> Planning Wait Time: 9ms
> Planning Wait Time Percentage: 0
> Query Status: Query d141e0d996c91e72:bb8726fb917537bb expired due to client 
> inactivity (timeout is 3m)
> Session ID: 3043ff5042860968:8f92bc3bd2a0ca83
> Session Type: HIVESERVER2
> {code}
> Though the query status string clearly says "expired due to client 
> inactivity (timeout is 3m)", the problem is with "Query State: EXCEPTION".
> This makes the user think something went wrong with query execution.
> So I recommend that queries that completed but expired due to client 
> inactivity be marked as 
> "Query State: FINISHED".






[jira] [Commented] (IMPALA-4714) Idle session expired query goes in to exception state - And this is confusing

2018-10-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634817#comment-16634817
 ] 

Tim Armstrong commented on IMPALA-4714:
---

I think there are a variety of considerations here, but I'd consider the 
FINISHED state for expired queries to be more problematic than the EXCEPTION 
state, since it implies that the query finished cleanly, but in fact it was 
terminated by the server. IMPALA-7561 provides some evidence that it's simpler 
for clients if the EXCEPTION state is used, because it's clear then that the 
query was terminated.

The right fix is for clients to close queries promptly; having them timed out 
by the server is not a clean way to close the query.

I.e. I don't think it's a good idea to make this change.
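
On the client side, closing promptly usually means tying the close call to the scope in which results are fetched. A minimal Python sketch, assuming a handle object with `fetch_all()` and `close()` methods (`FakeQueryHandle` is a hypothetical stand-in, not a real Impala client API):

```python
from contextlib import closing
from typing import List, Tuple


class FakeQueryHandle:
    """Hypothetical stand-in for a client-side query handle."""

    def __init__(self) -> None:
        self.closed = False

    def fetch_all(self) -> List[Tuple[int]]:
        return [(1,), (2,)]

    def close(self) -> None:
        self.closed = True


def fetch_and_close(handle: FakeQueryHandle) -> List[Tuple[int]]:
    """Fetch all rows and close the query even if fetching raises."""
    # contextlib.closing calls handle.close() when the block exits.
    with closing(handle) as query:
        return query.fetch_all()
```

With this pattern the query is closed as soon as the client is done with it, so the server-side idle timeout never has to expire it.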

> Idle session expired query goes in to exception state - And this is confusing
> -
>
> Key: IMPALA-4714
> URL: https://issues.apache.org/jira/browse/IMPALA-4714
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Mala Chikka Kempanna
>Priority: Major
>  Labels: query-lifecycle
>
> After setting idle_session_timeout, the Impala server moves a query into the 
> exception state after it completes execution if there is no client 
> activity.
> Example profile excerpt showing this behavior:
> {code}
> Query Timeline
> Start execution: 0ns (0ns)
> Planning finished: 9ms (9ms)
> Child queries finished: 8.3m (8.3m)
> Metastore update finished: 8.3m (661ms)
> Rows available: 8.3m (0ns)
> Cancelled: 11.3m (3.0m)
> Unregister query: 12.0m (42.55s)
> {code}
> Query status and query state-
> {code}
> Query Type: DDL
> Query State: EXCEPTION
> Start Time: Dec 22, 2016 11:45:01 AM
> End Time: Dec 22, 2016 11:57:01 AM
> Duration: 11m, 59s
> Admission Result: Unknown
> Client Fetch Wait Time: 3.7m
> Client Fetch Wait Time Percentage: 31
> Connected User: admin
> DDL Type: COMPUTE_STATS
> File Formats:
> Impala Version: impalad version 2.5.0-cdh5.7.2 RELEASE (build 
> 1140f8289dc0d2b1517bcf70454bb4575eb8cc70)
> Network Address: 10.17.100.123:44618
> Out of Memory: false
> Planning Wait Time: 9ms
> Planning Wait Time Percentage: 0
> Query Status: Query d141e0d996c91e72:bb8726fb917537bb expired due to client 
> inactivity (timeout is 3m)
> Session ID: 3043ff5042860968:8f92bc3bd2a0ca83
> Session Type: HIVESERVER2
> {code}
> Though the query status string clearly says "expired due to client 
> inactivity (timeout is 3m)", the problem is with "Query State: EXCEPTION".
> This makes the user think something went wrong with query execution.
> So I recommend that queries that completed but expired due to client inactivity 
> be marked as 
> "Query State: FINISHED"






[jira] [Updated] (IMPALA-7220) Queries processing very large strings hit "memory limit exceeded" instead of spilling

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7220:
--
Component/s: Backend

> Queries processing very large strings hit "memory limit exceeded" instead of 
> spilling
> -
>
> Key: IMPALA-7220
> URL: https://issues.apache.org/jira/browse/IMPALA-7220
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
> Environment: m4.4xl, ubuntu 16.04
>Reporter: Jim Apple
>Priority: Major
>  Labels: resource-management
>
> I saw {{primitive_shuffle_1mb_rows}} failing at scale factor 4 with 3 
> impalads running:
> {noformat}
> [scheduler] [tpch4_parquet Thread 0]: Running Query: 
> primitive_shuffle_1mb_rows
> [query_exec_functions] [tpch4_parquet Thread 0]: Connected to localhost:21001
> [query_exec_functions] [tpch4_parquet Thread 0]: Connected to localhost:21001
> [query_exec_functions] [tpch4_parquet Thread 0]: ImpalaBeeswaxException:
>  Query aborted:Memory limit exceeded: Error occurred on backend 
> ip-172-31-25-187:22001 by fragment cd4f016604f24316:c310bea6000f
> Memory left in process limit: -410.47 MB
> {noformat}
> Command run was 
> {noformat}
> ./bin/single_node_perf_run.py --iterations 2 --scale 4 --table_formats 
> parquet/none --workload targeted-perf --num_impalads 3 --query_names '.*' 
> --load --start_minicluster $HASH1 $HASH2
> {noformat}
> https://jenkins.impala.io/view/Experimental/job/perf-AB-test/182/consoleText
> cc: [~janulatha]






[jira] [Commented] (IMPALA-7220) Queries processing very large strings hit "memory limit exceeded" instead of spilling

2018-10-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634812#comment-16634812
 ] 

Tim Armstrong commented on IMPALA-7220:
---

[~jbapple] I guess I missed the last comment. I think we should do something 
about the single_node_perf_run issue - at least make it tolerate query failures 
more gracefully. The query is really only appropriate for running on a large 
cluster. For reference, the query is this beast:
{noformat}
with wide_lineitem
 AS (SELECT *,

repeat(concat(uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),

uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),uuid(),


[jira] [Updated] (IMPALA-7220) Queries processing very large strings hit "memory limit exceeded" instead of spilling

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7220:
--
Issue Type: Improvement  (was: Bug)

> Queries processing very large strings hit "memory limit exceeded" instead of 
> spilling
> -
>
> Key: IMPALA-7220
> URL: https://issues.apache.org/jira/browse/IMPALA-7220
> Project: IMPALA
>  Issue Type: Improvement
> Environment: m4.4xl, ubuntu 16.04
>Reporter: Jim Apple
>Priority: Major
>  Labels: resource-management
>
> I saw {{primitive_shuffle_1mb_rows}} failing at scale factor 4 with 3 
> impalads running:
> {noformat}
> [scheduler] [tpch4_parquet Thread 0]: Running Query: 
> primitive_shuffle_1mb_rows
> [query_exec_functions] [tpch4_parquet Thread 0]: Connected to localhost:21001
> [query_exec_functions] [tpch4_parquet Thread 0]: Connected to localhost:21001
> [query_exec_functions] [tpch4_parquet Thread 0]: ImpalaBeeswaxException:
>  Query aborted:Memory limit exceeded: Error occurred on backend 
> ip-172-31-25-187:22001 by fragment cd4f016604f24316:c310bea6000f
> Memory left in process limit: -410.47 MB
> {noformat}
> Command run was 
> {noformat}
> ./bin/single_node_perf_run.py --iterations 2 --scale 4 --table_formats 
> parquet/none --workload targeted-perf --num_impalads 3 --query_names '.*' 
> --load --start_minicluster $HASH1 $HASH2
> {noformat}
> https://jenkins.impala.io/view/Experimental/job/perf-AB-test/182/consoleText
> cc: [~janulatha]






[jira] [Updated] (IMPALA-7648) Add tests for all cases where OOM is expected

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7648:
--
Description: 
We should add tests for most or all of the subtasks of IMPALA-4834 that 
exercise the expected OOM code path. I'll add some bullet points here to track 
what the coverage is and what's missing.

* Aggregations with large var-len string expressions
* Top-N with large window
* -Kudu scans- : covered by kudu-scan-mem-usage.test
* -Nested loop join- : covered by single-node-nlj-exhaustive.test
* Many duplicate keys on build side of hash join
* -Large number of NULLS on build side of NAAJ- : covered by 
spilling-naaj-no-deny-reservation.test
* -HDFS table partitioned insert- : covered by insert-mem-limit.test
* Large analytic window can't be spilled
* Queries processing large strings (may need multiple tests to cover different 
places). large_strings.test has some coverage
* Parquet files with large pages
* Exchange uses a lot of memory

  was:
We should add tests for most or all of the subtasks of IMPALA-4834 that 
exercise the expected OOM code path. I'll add some bullet points here to track 
what the coverage is and what's missing.

* Aggregations with large var-len string expressions
* Top-N with large window
* -Kudu scans- : covered by kudu-scan-mem-usage.test
* -Nested loop join- : covered by single-node-nlj-exhaustive.test
* Many duplicate keys on build side of hash join
* -Large number of NULLS on build side of NAAJ- : covered by 
spilling-naaj-no-deny-reservation.test
* -HDFS table partitioned insert- : covered insert-mem-limit.test
* Large analytic window can't be spilled
* Queries processing large strings (may need multiple tests to cover different 
places). large_strings.test has some coverage
* Parquet files with large pages


> Add tests for all cases where OOM is expected
> -
>
> Key: IMPALA-7648
> URL: https://issues.apache.org/jira/browse/IMPALA-7648
> Project: IMPALA
>  Issue Type: Test
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: resource-management
>
> We should add tests for most or all of the subtasks of IMPALA-4834 that 
> exercise the expected OOM code path. I'll add some bullet points here to 
> track what the coverage is and what's missing.
> * Aggregations with large var-len string expressions
> * Top-N with large window
> * -Kudu scans- : covered by kudu-scan-mem-usage.test
> * -Nested loop join- : covered by single-node-nlj-exhaustive.test
> * Many duplicate keys on build side of hash join
> * -Large number of NULLS on build side of NAAJ- : covered by 
> spilling-naaj-no-deny-reservation.test
> * -HDFS table partitioned insert- : covered by insert-mem-limit.test
> * Large analytic window can't be spilled
> * Queries processing large strings (may need multiple tests to cover 
> different places). large_strings.test has some coverage
> * Parquet files with large pages
> * Exchange uses a lot of memory






[jira] [Commented] (IMPALA-4836) Kudu scans should operate within a memory constraint

2018-10-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634809#comment-16634809
 ] 

Tim Armstrong commented on IMPALA-4836:
---

IMPALA-7096 solved one of the more glaring issues here where we would start up 
too many scanner threads when memory was constrained. 

> Kudu scans should operate within a memory constraint
> 
>
> Key: IMPALA-4836
> URL: https://issues.apache.org/jira/browse/IMPALA-4836
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.9.0
>Reporter: Tim Armstrong
>Priority: Major
>  Labels: kudu, resource-management
>
> It is somewhat unclear how we will do this, given that Kudu does its own 
> memory management outside of our buffer pool.






[jira] [Updated] (IMPALA-4836) Kudu scans should operate within a memory constraint

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-4836:
--
Priority: Minor  (was: Major)

> Kudu scans should operate within a memory constraint
> 
>
> Key: IMPALA-4836
> URL: https://issues.apache.org/jira/browse/IMPALA-4836
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.9.0
>Reporter: Tim Armstrong
>Priority: Minor
>  Labels: kudu, resource-management
>
> It is somewhat unclear how we will do this, given that Kudu does its own 
> memory management outside of our buffer pool.






[jira] [Commented] (IMPALA-7564) Conservative FK/PK join type detection with complex equi-join conjuncts

2018-10-01 Thread Paul Rogers (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634796#comment-16634796
 ] 

Paul Rogers commented on IMPALA-7564:
-

The particular issue here appears to be the {{substr()}} function. We cannot 
know that the function produces unique keys, i.e. that {{NDV(substr(foo)) = 
NDV(foo)}}.
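
To make the concern concrete, here is a small sketch with made-up key values. ndv() is a hypothetical helper that counts distinct values exactly (unlike Impala's NDV(), which is an estimate). Taking a 6-character prefix, as substr(c3, 1, 6) does, can merge distinct keys:

```python
def ndv(values):
    """Number of distinct values, computed exactly."""
    return len(set(values))


# Made-up keys that are unique across the column.
keys = ["ABCDEF-001", "ABCDEF-002", "ABCDEG-001", "XYZXYZ-001"]

# Same idea as substr(c3, 1, 6): keep only the first six characters.
prefixes = [k[:6] for k in keys]

print(ndv(keys))      # 4
print(ndv(prefixes))  # 3
```

The first two keys share a prefix, so the expression has fewer distinct values than the underlying column.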

This is a very odd case. Can we use some kind of hint to tell the planner that 
the expression is still unique? (That is, the NDV of the expression = base 
table row count.)
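
For reference, the check quoted in the issue below can be sketched as follows. This is only an approximation of the planner logic: the statistics are made up, the value 0.05 for FK_PK_MAX_STATS_DELTA_PERC is an assumption, and cannot_disprove_pk is a hypothetical name.

```python
import math

# Assumed tolerance; the real constant lives in the planner code.
FK_PK_MAX_STATS_DELTA_PERC = 0.05


def cannot_disprove_pk(rhs_ndvs, rhs_num_rows,
                       delta=FK_PK_MAX_STATS_DELTA_PERC):
    """True if the product of the rhs NDVs is close to the rhs row count."""
    joint_ndv = 1.0
    for n in rhs_ndvs:
        joint_ndv *= n
    # math.floor(x + 0.5) mimics Java's Math.round for positive values.
    threshold = math.floor(rhs_num_rows * (1.0 - delta) + 0.5)
    return joint_ndv >= threshold


# Only the two simple conjuncts are counted: 10 * 20 = 200 < 950.
print(cannot_disprove_pk([10, 20], 1000))

# If the third conjunct contributed an NDV of 5: 10 * 20 * 5 = 1000 >= 950.
print(cannot_disprove_pk([10, 20, 5], 1000))
```

A hint that supplies the NDV of the complex expression would move the join from the first case to the second.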

> Conservative FK/PK join type detection with complex equi-join conjuncts
> ---
>
> Key: IMPALA-7564
> URL: https://issues.apache.org/jira/browse/IMPALA-7564
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0, Impala 2.13.0, Impala 3.1.0
>Reporter: bharath v
>Priority: Major
>
> With IMPALA-5547, we predict whether a join is an FK/PK join as follows.
> {noformat}
> // Iterate over all groups of conjuncts that belong to the same joined tuple id pair.
> // For each group, we compute the join NDV of the rhs slots and compare it to the
> // number of rows in the rhs table.
> for (List<EqJoinConjunctScanSlots> fkPkCandidate: scanSlotsByJoinedTids.values()) {
>   double jointNdv = 1.0;
>   for (EqJoinConjunctScanSlots slots: fkPkCandidate) jointNdv *= slots.rhsNdv();
>   double rhsNumRows = fkPkCandidate.get(0).rhsNumRows();
>   if (jointNdv >= Math.round(rhsNumRows * (1.0 - FK_PK_MAX_STATS_DELTA_PERC))) {
>     // We cannot disprove that the RHS is a PK.
>     if (result == null) result = Lists.newArrayList();
>     result.addAll(fkPkCandidate);
>   }
> }
> {noformat}
> We iterate through all the "simple" equi-join conjuncts on the RHS, multiply 
> their NDVs and check if the product is close to rhsNumRows. The issue here is 
> that this can result in conservative FK/PK detection if the equi-join 
> conjuncts are not simple (of the form <SlotRef> = <SlotRef>).
> {noformat}
> /**
>  * Returns a new EqJoinConjunctScanSlots for the given equi-join conjunct or null if
>  * the given conjunct is not of the form <SlotRef> = <SlotRef> or if the underlying
>  * table/column of at least one side is missing stats.
>  */
> public static EqJoinConjunctScanSlots create(Expr eqJoinConjunct) {
>   if (!Expr.IS_EQ_BINARY_PREDICATE.apply(eqJoinConjunct)) return null;
>   SlotDescriptor lhsScanSlot = eqJoinConjunct.getChild(0).findSrcScanSlot();
>   if (lhsScanSlot == null || !hasNumRowsAndNdvStats(lhsScanSlot)) return null;
>   SlotDescriptor rhsScanSlot = eqJoinConjunct.getChild(1).findSrcScanSlot();
> {noformat}
> For example, the following query contains a complex equi-join conjunct 
> {{substr(l.c3, 1, 6) = substr(r.c3, 1,6)}}, so while detecting if the left 
> outer join is an FK/PK, we just check if 
> {{NDVs(r.c1) * NDVs(r.c2) ~ r.numRows()}} which is incorrect. (This happens 
> because EqJoinConjunctScanSlots.create() returns null for any non-simple 
> predicates which are not considered later).
> {noformat}
> [localhost:21000]> explain select * from test_left l left outer join 
> test_right r on l.c1 = r.c1 and l.c2 = r.c2 and substr(l.c3, 1, 6) = 
> substr(r.c3, 1,6);
> Query: explain select * from test_left l left outer join test_right r on l.c1 
> = r.c1 and l.c2 = r.c2 and substr(l.c3, 1, 6) = substr(r.c3, 1,6)
> +-+
> | Explain String  
> |
> +-+
> | Max Per-Host Resource Reservation: Memory=1.95MB Threads=5  
> |
> | Per-Host Resource Estimates: Memory=66MB
> |
> | 
> |
> | F02:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1   
> |
> | |  Per-Host Resources: mem-estimate=0B mem-reservation=0B 
> thread-reservation=1  |
> | PLAN-ROOT SINK  
> |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0  
> |
> | |   
> |
> | 04:EXCHANGE [UNPARTITIONED] 
> |
> | |  mem-estimate=0B mem-reservation=0B thread-reservation=0  
> |
> | |  tuple-ids=0,1N row-size=94B cardinality=49334767023  
> |
> | |  in pipelines: 00(GETNEXT)

[jira] [Created] (IMPALA-7648) Add tests for all cases where OOM is expected

2018-10-01 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-7648:
-

 Summary: Add tests for all cases where OOM is expected
 Key: IMPALA-7648
 URL: https://issues.apache.org/jira/browse/IMPALA-7648
 Project: IMPALA
  Issue Type: Test
  Components: Infrastructure
Reporter: Tim Armstrong
Assignee: Tim Armstrong


We should add tests for most or all of the subtasks of IMPALA-4834 that 
exercise the expected OOM code path. I'll add some bullet points here to track 
what the coverage is and what's missing.

* Aggregations with large var-len string expressions
* Top-N with large window
* -Kudu scans- : covered by kudu-scan-mem-usage.test
* -Nested loop join- : covered by single-node-nlj-exhaustive.test
* Many duplicate keys on build side of hash join
* -Large number of NULLS on build side of NAAJ- : covered by 
spilling-naaj-no-deny-reservation.test
* -HDFS table partitioned insert- : covered by insert-mem-limit.test
* Large analytic window can't be spilled
* Queries processing large strings (may need multiple tests to cover different 
places). large_strings.test has some coverage
* Parquet files with large pages
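
As a rough idea of what each of these tests needs to assert, here is a minimal sketch. run_query and QueryError are hypothetical stand-ins, not part of the Impala test framework, and the 64 MB requirement is made up; the point is that the test passes only when the query fails with the expected "Memory limit exceeded" error, not when it succeeds or fails some other way.

```python
class QueryError(Exception):
    """Hypothetical error type raised when a query fails."""


def run_query(sql, mem_limit_bytes):
    """Hypothetical stand-in for a test client."""
    needed = 64 * 1024 * 1024  # pretend the query needs 64 MB
    if mem_limit_bytes < needed:
        raise QueryError("Memory limit exceeded")
    return "ok"


def expect_oom(sql, mem_limit_bytes):
    """Return True only if the query fails with the expected OOM message."""
    try:
        run_query(sql, mem_limit_bytes)
    except QueryError as e:
        return "Memory limit exceeded" in str(e)
    # The query succeeded, so the expected OOM path was not exercised.
    return False


print(expect_oom("select 1", 1024))
print(expect_oom("select 1", 128 * 1024 * 1024))
```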






[jira] [Reopened] (IMPALA-7527) Expose fetch-from-catalogd cache and latency metrics in profiles

2018-10-01 Thread Vuk Ercegovac (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vuk Ercegovac reopened IMPALA-7527:
---

> Expose fetch-from-catalogd cache and latency metrics in profiles
> 
>
> Key: IMPALA-7527
> URL: https://issues.apache.org/jira/browse/IMPALA-7527
> Project: IMPALA
>  Issue Type: Sub-task
>Affects Versions: Impala 3.1.0
>Reporter: Todd Lipcon
>Assignee: Vuk Ercegovac
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> Since we now have some caching and potential remote calls on the planning 
> path, it's important to be able to understand how that contributes to the 
> performance of planning. This JIRA tracks adding such information to the 
> profile.






[jira] [Reopened] (IMPALA-7622) Add query profile metrics for RPC's used when pulling incremental stats.

2018-10-01 Thread Vuk Ercegovac (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vuk Ercegovac reopened IMPALA-7622:
---

> Add query profile metrics for RPC's used when pulling incremental stats.
> 
>
> Key: IMPALA-7622
> URL: https://issues.apache.org/jira/browse/IMPALA-7622
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Vuk Ercegovac
>Assignee: Vuk Ercegovac
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> When --pull_incremental_statistics is enabled, the frontend will fetch these 
> stats from catalogd. We should record metrics for this, such as the number of 
> partitions fetched, size of received bytes, and elapsed time.






[jira] [Assigned] (IMPALA-7590) Stress test hit inconsistent results with TPCDS-Q18A

2018-10-01 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho reassigned IMPALA-7590:
--

Assignee: Thomas Tauber-Marshall

> Stress test hit inconsistent results with TPCDS-Q18A
> 
>
> Key: IMPALA-7590
> URL: https://issues.apache.org/jira/browse/IMPALA-7590
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Michael Ho
>Assignee: Thomas Tauber-Marshall
>Priority: Blocker
>
> Recent runs of the stress test on a cluster with 135 nodes occasionally 
> produced inconsistent results for TPCDS-Q18a. The scale of TPC-DS is 
> 1.
> {noformat}
> --- result_correct.txt2018-09-10 08:54:30.427603941 -0700
> +++ result_incorrect.txt  2018-09-10 17:39:59.512926323 -0700
> @@ -1,3 +1,4 @@
> +opening 
> /tmp/stress/instance1/data/jenkins/workspace/impala-test-stress-secure-140node/archive/result_hashes/input.txt
>  
> +--++--+---+---++--++--+-+--+
>  | i_item_id| ca_country | ca_state | ca_county | agg1  | agg2   | 
> agg3 | agg4   | agg5 | agg6| agg7 |
>  
> +--++--+---+---++--++--+-+--+
> @@ -13,7 +14,7 @@
>  | AABM || IN   |   | 67.00 | 105.60 | 
> 2232.51  | 74.08  | -1114.55 | 1964.50 | 1.00 |
>  | AABNFAAA || IN   |   | 40.00 | 115.76 | 
> 0.00 | 70.61  | -459.60  | 1933.00 | 3.00 |
>  | AACBBAAA || IN   |   | 32.00 | 37.99  | 
> 0.00 | 8.73   | -448.64  | 1963.00 | 3.00 |
> -| AACC || IN   |   | 56.00 | 2.50   | 
> 0.00 | 0.62   | -62.72   | NULL| 4.00 |
> +| AACC || IN   |   | 56.00 | 2.50   | 
> 0.00 | 0.62   | -62.72   | 38463209| 4.00 |
>  | AACDCAAA || IN   |   | 30.00 | 53.19  | 
> 0.00 | 17.02  | -505.80  | 1990.00 | 6.00 |
>  | AACFDAAA || IN   |   | 58.00 | 113.96 | 
> 0.00 | 19.37  | -2148.90 | 1974.00 | 1.00 |
>  | AACHEAAA || IN   |   | 16.00 | 19.90  | 
> 0.00 | 13.13  | 9.76 | 1960.00 | 3.00 |
> @@ -101,4 +102,4 @@
>  | AAPKBAAA || IN   |   | 2.00  | 65.90  | 
> 0.00 | 58.65  | 60.24| 1954.00 | 3.00 |
>  | AAPO || IN   |   | 92.00 | 125.36 | 
> 0.00 | 94.02  | 1743.40  | 1963.00 | 6.00 |
>  | AAPODAAA || IN   |   | 75.00 | 119.08 | 
> 0.00 | 104.79 | 4501.50  | 1981.00 | 5.00 |
> -+--++--+---+---++--++--+-+--+
> \ No newline at end of file
> ++--++--+---+---++--++--+-+--+
> {noformat}
> The problem is not reproducible by running the query in the Impala shell.
> The query is TPCDS Q18a:
> {noformat}
> with results as
>  (select i_item_id,
> ca_country,
> ca_state,
> ca_county,
> cast(cs_quantity as decimal(12,2)) agg1,
> cast(cs_list_price as decimal(12,2)) agg2,
> cast(cs_coupon_amt as decimal(12,2)) agg3,
> cast(cs_sales_price as decimal(12,2)) agg4,
> cast(cs_net_profit as decimal(12,2)) agg5,
> cast(c_birth_year as decimal(12,2)) agg6,
> cast(cd1.cd_dep_count as decimal(12,2)) agg7
>  from catalog_sales, customer_demographics cd1, customer_demographics cd2, 
> customer, customer_address, date_dim, item
>  where cs_sold_date_sk = d_date_sk and
>cs_item_sk = i_item_sk and
>cs_bill_cdemo_sk = cd1.cd_demo_sk and
>cs_bill_customer_sk = c_customer_sk and
>cd1.cd_gender = 'F' and
>cd1.cd_education_status = 'Unknown' and
>c_current_cdemo_sk = cd2.cd_demo_sk and
>c_current_addr_sk = ca_address_sk and
>c_birth_month in (1, 6, 8, 9, 12, 2) and
>d_year = 1998 and
>ca_state in ('MS', 'IN', 'ND', 'OK', 'NM', 'VA', 'MS')
>  )
>   select  i_item_id, ca_country, ca_state, ca_county, agg1, agg2, agg3, agg4, 
> agg5, agg6, agg7
>  from (
>   select i_item_id, ca_country, ca_state, ca_county, avg(agg1) agg1,
> avg(agg2) agg2, avg(agg3) agg3, avg(agg4) agg4, avg(agg5) agg5, avg(agg6) 
> agg6, avg(agg7) agg7
>   from results
>   group by i_item_id, ca_country, ca_state, ca_county
>   union all
>   select i_item_id, ca_country, ca_state, NULL as county, avg(agg1) agg1, 
> avg(agg2) agg2, avg(agg3) agg3,
> avg(agg4) agg4, 

[jira] [Commented] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread Lars Volker (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634584#comment-16634584
 ] 

Lars Volker commented on IMPALA-7644:
-

You're right, sorry for the confusion.

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: parquet, performance
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Updated] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated IMPALA-7644:

Target Version: Impala 3.1.0  (was: Impala 3.2.0)

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: parquet, performance
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Resolved] (IMPALA-4308) Make the minidumps archived in our Jenkins jobs usable

2018-10-01 Thread Thomas Tauber-Marshall (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-4308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Tauber-Marshall resolved IMPALA-4308.

   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Make the minidumps archived in our Jenkins jobs usable
> --
>
> Key: IMPALA-4308
> URL: https://issues.apache.org/jira/browse/IMPALA-4308
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.8.0
>Reporter: Taras Bobrovytsky
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>  Labels: breakpad, test-infra
> Fix For: Impala 3.1.0
>
>
> The minidumps that are archived in our Jenkins jobs are unusable because we 
> do not save the symbols that are required to extract stack traces. As part of 
> the log archiving process, we should:
> # Extract the necessary symbols and save them into the $IMPALA_HOME/logs 
> directory.
> # Automatically collect the backtraces from the minidumps and save them into 
> $IMPALA_HOME/logs directory in a text file






[jira] [Resolved] (IMPALA-110) Add support for multiple distinct operators in the same query block

2018-10-01 Thread Thomas Tauber-Marshall (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Tauber-Marshall resolved IMPALA-110.
---
Resolution: Fixed

> Add support for multiple distinct operators in the same query block
> ---
>
> Key: IMPALA-110
> URL: https://issues.apache.org/jira/browse/IMPALA-110
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend, Frontend
>Affects Versions: Impala 0.5, Impala 1.4, Impala 2.0, Impala 2.2, Impala 
> 2.3.0
>Reporter: Greg Rahn
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>  Labels: sql-language
> Fix For: Impala 3.1.0
>
>
> Impala only allows a single (DISTINCT columns) expression in each query.
> {color:red}Note:
> If you do not need precise accuracy, you can produce an estimate of the 
> distinct values for a column by specifying NDV(column); a query can contain 
> multiple instances of NDV(column). To make Impala automatically rewrite 
> COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query 
> option.
> {color}
> {code}
> [impala:21000] > select count(distinct i_class_id) from item;
> Query: select count(distinct i_class_id) from item
> Query finished, fetching results ...
> 16
> Returned 1 row(s) in 1.51s
> {code}
> {code}
> [impala:21000] > select count(distinct i_class_id), count(distinct 
> i_brand_id) from item;
> Query: select count(distinct i_class_id), count(distinct i_brand_id) from item
> ERROR: com.cloudera.impala.common.AnalysisException: Analysis exception (in 
> select count(distinct i_class_id), count(distinct i_brand_id) from item)
>   at 
> com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:133)
>   at 
> com.cloudera.impala.service.Frontend.createExecRequest(Frontend.java:221)
>   at 
> com.cloudera.impala.service.JniFrontend.createExecRequest(JniFrontend.java:89)
> Caused by: com.cloudera.impala.common.AnalysisException: all DISTINCT 
> aggregate functions need to have the same set of parameters as COUNT(DISTINCT 
> i_class_id); deviating function: COUNT(DISTINCT i_brand_id)
>   at 
> com.cloudera.impala.analysis.AggregateInfo.createDistinctAggInfo(AggregateInfo.java:196)
>   at 
> com.cloudera.impala.analysis.AggregateInfo.create(AggregateInfo.java:143)
>   at 
> com.cloudera.impala.analysis.SelectStmt.createAggInfo(SelectStmt.java:466)
>   at 
> com.cloudera.impala.analysis.SelectStmt.analyzeAggregation(SelectStmt.java:347)
>   at com.cloudera.impala.analysis.SelectStmt.analyze(SelectStmt.java:155)
>   at 
> com.cloudera.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:130)
>   ... 2 more
> {code}
> Hive supports this:
> {code}
> $ hive -e "select count(distinct i_class_id), count(distinct i_brand_id) from 
> item;"
> Logging initialized using configuration in 
> file:/etc/hive/conf.dist/hive-log4j.properties
> Hive history file=/tmp/grahn/hive_job_log_grahn_201303052234_1625576708.txt
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=
> Starting Job = job_201302081514_0073, Tracking URL = 
> http://impala:50030/jobdetails.jsp?jobid=job_201302081514_0073
> Kill Command = /usr/lib/hadoop/bin/hadoop job  
> -Dmapred.job.tracker=m0525.mtv.cloudera.com:8021 -kill job_201302081514_0073
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 
> 1
> 2013-03-05 22:34:43,255 Stage-1 map = 0%,  reduce = 0%
> 2013-03-05 22:34:49,323 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:50,337 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:51,351 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:52,360 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:53,370 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:54,379 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.81 
> sec
> 2013-03-05 22:34:55,389 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:56,402 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:57,413 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> 2013-03-05 22:34:58,424 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 
> 8.58 sec
> MapReduce Total cumulative CPU time: 8 seconds 580 msec
> Ended Job = job_201302081514_0073
> MapReduce Jobs Launched: 
> Job 0: Map: 1  Reduce: 1   

[jira] [Updated] (IMPALA-6271) Impala daemon should log a message when it's being shut down

2018-10-01 Thread Pranay Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranay Singh updated IMPALA-6271:
-
Fix Version/s: Impala 3.1.0

> Impala daemon should log a message when it's being shut down
> 
>
> Key: IMPALA-6271
> URL: https://issues.apache.org/jira/browse/IMPALA-6271
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Zoram Thanga
>Assignee: Pranay Singh
>Priority: Major
>  Labels: observability, supportability
> Fix For: Impala 3.1.0
>
>
> At present the Impala daemon does not log any message when it is being shut 
> down, usually via SIGTERM from management software or OS shutdown. It would 
> be good to at the very least catch this signal to log a message that we are 
> going down. This will aid in serviceability.






[jira] [Closed] (IMPALA-6271) Impala daemon should log a message when it's being shut down

2018-10-01 Thread Pranay Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranay Singh closed IMPALA-6271.

Resolution: Fixed

> Impala daemon should log a message when it's being shut down
> 
>
> Key: IMPALA-6271
> URL: https://issues.apache.org/jira/browse/IMPALA-6271
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Zoram Thanga
>Assignee: Pranay Singh
>Priority: Major
>  Labels: observability, supportability
> Fix For: Impala 3.1.0
>
>
> At present the Impala daemon does not log any message when it is being shut 
> down, usually via SIGTERM from management software or OS shutdown. It would 
> be good to at the very least catch this signal to log a message that we are 
> going down. This will aid in serviceability.






[jira] [Reopened] (IMPALA-6271) Impala daemon should log a message when it's being shut down

2018-10-01 Thread Pranay Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranay Singh reopened IMPALA-6271:
--

> Impala daemon should log a message when it's being shut down
> 
>
> Key: IMPALA-6271
> URL: https://issues.apache.org/jira/browse/IMPALA-6271
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Zoram Thanga
>Assignee: Pranay Singh
>Priority: Major
>  Labels: observability, supportability
>
> At present the Impala daemon does not log any message when it is being shut 
> down, usually via SIGTERM from management software or OS shutdown. It would 
> be good to at the very least catch this signal to log a message that we are 
> going down. This will aid in serviceability.






[jira] [Commented] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread JIRA


[ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634474#comment-16634474
 ] 

Zoltán Borók-Nagy commented on IMPALA-7644:
---

[~lv] This flag needs to get into the next release (3.1.0), so the target 
version should be 3.1.0, shouldn't it?

The read path won't be available in 3.1.0, so the target version for the read 
path is 3.2.0. Sorry for not being clear.

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: parquet, performance
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Work started] (IMPALA-7520) NPE in SentryProxy

2018-10-01 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7520 started by Fredy Wijaya.

> NPE in SentryProxy
> --
>
> Key: IMPALA-7520
> URL: https://issues.apache.org/jira/browse/IMPALA-7520
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Adam Holley
>Assignee: Fredy Wijaya
>Priority: Major
>
> In SentryProxy.refreshPrivilegesInCache(), the call to 
> allPrincipalPrivileges.get(principal.getName()) is sometimes returning null.
> {noformat}
> java.lang.NullPointerException
> at 
> org.apache.impala.util.SentryProxy$PolicyReader.refreshPrivilegesInCatalog(SentryProxy.java:245)
> at 
> org.apache.impala.util.SentryProxy$PolicyReader.refreshRolePrivileges(SentryProxy.java:197)
> at 
> org.apache.impala.util.SentryProxy$PolicyReader.run(SentryProxy.java:139)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> {noformat}
>  
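
Note on the trace above: it shows the result of
allPrincipalPrivileges.get(principal.getName()) being dereferenced while null.
As a minimal sketch of one defensive pattern, assuming the null simply means
the map holds no entry for that principal (the report does not say why), the
lookup can fall back to an empty set. The class and method names below are
hypothetical and are not taken from SentryProxy, and this is not necessarily
the fix that was committed.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the privilege refresh; not Impala's actual code.
final class NullSafePrivilegeLookup {
    // Returns the privileges recorded for a principal, or an empty set when
    // the map has no entry (Map.get returns null in that case).
    static Set<String> privilegesFor(Map<String, Set<String>> allPrincipalPrivileges,
                                     String principalName) {
        Set<String> privileges = allPrincipalPrivileges.get(principalName);
        return privileges == null ? Collections.<String>emptySet() : privileges;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> all = new HashMap<>();
        all.put("role_a", Collections.singleton("SELECT on db1"));
        // A known principal yields its privileges; an unknown one yields an
        // empty set instead of a null that would later throw.
        System.out.println(privilegesFor(all, "role_a").size());
        System.out.println(privilegesFor(all, "dropped_role").size());
    }
}
```

Whether skipping the principal or treating it as having no privileges is the
right behaviour depends on why the entry is missing, which the report leaves
open.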






[jira] [Commented] (IMPALA-6271) Impala daemon should log a message when it's being shut down

2018-10-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634400#comment-16634400
 ] 

Tim Armstrong commented on IMPALA-6271:
---

[~pranay_singh] please set a fix version

> Impala daemon should log a message when it's being shut down
> 
>
> Key: IMPALA-6271
> URL: https://issues.apache.org/jira/browse/IMPALA-6271
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Zoram Thanga
>Assignee: Pranay Singh
>Priority: Major
>  Labels: observability, supportability
>
> At present the Impala daemon does not log any message when it is being shut 
> down, usually via SIGTERM from management software or OS shutdown. It would 
> be good to at the very least catch this signal to log a message that we are 
> going down. This will aid in serviceability.






[jira] [Closed] (IMPALA-6271) Impala daemon should log a message when it's being shut down

2018-10-01 Thread Pranay Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranay Singh closed IMPALA-6271.


> Impala daemon should log a message when it's being shut down
> 
>
> Key: IMPALA-6271
> URL: https://issues.apache.org/jira/browse/IMPALA-6271
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Zoram Thanga
>Assignee: Pranay Singh
>Priority: Major
>  Labels: observability, supportability
>
> At present the Impala daemon does not log any message when it is being shut 
> down, usually via SIGTERM from management software or OS shutdown. It would 
> be good to at the very least catch this signal to log a message that we are 
> going down. This will aid in serviceability.






[jira] [Resolved] (IMPALA-6271) Impala daemon should log a message when it's being shut down

2018-10-01 Thread Pranay Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranay Singh resolved IMPALA-6271.
--
Resolution: Resolved

> Impala daemon should log a message when it's being shut down
> 
>
> Key: IMPALA-6271
> URL: https://issues.apache.org/jira/browse/IMPALA-6271
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.10.0
>Reporter: Zoram Thanga
>Assignee: Pranay Singh
>Priority: Major
>  Labels: observability, supportability
>
> At present the Impala daemon does not log any message when it is being shut 
> down, usually via SIGTERM from management software or OS shutdown. It would 
> be good to at the very least catch this signal to log a message that we are 
> going down. This will aid in serviceability.






[jira] [Updated] (IMPALA-7642) Optimize UDF jar handling in Catalog

2018-10-01 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-7642:
--
Attachment: test.html

> Optimize UDF jar handling in Catalog
> 
>
> Key: IMPALA-7642
> URL: https://issues.apache.org/jira/browse/IMPALA-7642
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 3.0
>Reporter: Miklos Szurap
>Priority: Major
>
> 1. Optimize UDF jar loading
> During startup and global invalidate metadata calls, for each database the 
> [CatalogServiceCatalog.loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L956]
>  is called, which calls 
> [extractFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FunctionUtils.java#L68]
>  for each function found in HMS, and for each function the related UDF jar 
> file is downloaded from HDFS to the localLibraryPath (file:///tmp). It is not 
> uncommon that the UDFs are not packaged separately, but in everything-in-one 
> big-fat jars, so they can be 10-50 MB in size. Sometimes there are hundreds 
> of functions in a database (usually related to the same project) and all of 
> them point to the same UDF jar. The above method then downloads the same jar 
> hundreds of times, each time "extracting the function" and deleting the local 
> file.
> The suggestion would be to improve this by:
> - creating a local "cache" in CatalogServiceCatalog.loadJavaFunctions() as a 
> HashMap (map of jarUri -> localJarPath)
> - passing this cache to FunctionUtils.extractFunctions, which checks whether 
> the cache already contains the jarUri. If not, it downloads the jar and puts 
> it into the cache (and does everything else needed)
> - moving the FileSystemUtil.deleteIfExists(localJarPath) call from 
> extractFunctions to loadJavaFunctions: in a finally block, iterate over the 
> cache entries (values), delete the local files, and at the end clear the cache.
> 2. Use {{Set}} instead of {{List}} for addedSignatures in 
> [FunctionUtils.extractFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FunctionUtils.java#L73]:
> It just tracks which function signatures were added; for that purpose a Set 
> is fine. 
> {noformat}
> if (!addedSignatures.contains(fn.signatureString()))
> {noformat}
> This would be faster ( {{O( 1 )}} ) with a HashSet (compared to ArrayList's 
> {{O( n )}} for the contains method).
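
The caching idea in item 1 of the description can be sketched roughly as
follows. This is an illustration only: JarCacheSketch, download() and the
String/Path parameter types are assumptions made for the example, the real
loadJavaFunctions() and extractFunctions() take different arguments, and the
download is simulated with a temp file rather than a copy from HDFS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class JarCacheSketch {
    // Counts simulated downloads so the effect of the cache is observable.
    static int downloads = 0;

    // Stand-in for copying a jar from HDFS to the local library path.
    static Path download(String jarUri) {
        try {
            downloads++;
            return Files.createTempFile("udf-", ".jar");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for extractFunctions(): fetches the jar only on a cache miss.
    static Path extractFunctions(String jarUri, Map<String, Path> jarCache) {
        Path localJar = jarCache.computeIfAbsent(jarUri, JarCacheSketch::download);
        // ... the real method would load the function class from localJar ...
        return localJar;
    }

    // Stand-in for loadJavaFunctions(): owns the cache and cleans it up.
    static void loadJavaFunctions(List<String> functionJarUris) {
        Map<String, Path> jarCache = new HashMap<>();  // jarUri -> localJarPath
        try {
            for (String jarUri : functionJarUris) {
                extractFunctions(jarUri, jarCache);
            }
        } finally {
            // Delete every local copy once, after all functions are processed.
            for (Path localJar : jarCache.values()) {
                try {
                    Files.deleteIfExists(localJar);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            jarCache.clear();
        }
    }

    public static void main(String[] args) {
        // Three functions, two of which share a jar: two downloads, not three.
        loadJavaFunctions(Arrays.asList(
            "hdfs:///udfs/project.jar", "hdfs:///udfs/project.jar", "hdfs:///udfs/other.jar"));
        System.out.println("downloads: " + downloads);  // prints "downloads: 2"
    }
}
```

Keeping the cache local to one loadJavaFunctions() call, as the description
proposes, bounds how long the local copies live and avoids any cross-database
invalidation question.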






[jira] [Updated] (IMPALA-7642) Optimize UDF jar handling in Catalog

2018-10-01 Thread Balazs Jeszenszky (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Balazs Jeszenszky updated IMPALA-7642:
--
Attachment: (was: test.html)

> Optimize UDF jar handling in Catalog
> 
>
> Key: IMPALA-7642
> URL: https://issues.apache.org/jira/browse/IMPALA-7642
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 3.0
>Reporter: Miklos Szurap
>Priority: Major
>
> 1. Optimize UDF jar loading
> During startup and global invalidate metadata calls, for each database the 
> [CatalogServiceCatalog.loadJavaFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java#L956]
>  is called, which calls 
> [extractFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FunctionUtils.java#L68]
>  for each function found in HMS, and for each function the related UDF jar 
> file is downloaded from HDFS to the localLibraryPath (file:///tmp). It is not 
> uncommon that the UDFs are not packaged separately, but in everything-in-one 
> big-fat jars, so they can be 10-50 MB in size. Sometimes there are hundreds 
> of functions in a database (usually related to the same project) and all of 
> them point to the same UDF jar. The above method then downloads the same jar 
> hundreds of times, each time "extracting the function" and deleting the local 
> file.
> The suggestion would be to improve this by:
> - creating a local "cache" in CatalogServiceCatalog.loadJavaFunctions() as a 
> HashMap (map of jarUri -> localJarPath)
> - passing this cache to FunctionUtils.extractFunctions, which checks whether 
> the cache already contains the jarUri. If not, it downloads the jar and puts 
> it into the cache (and does everything else needed)
> - moving the FileSystemUtil.deleteIfExists(localJarPath) call from 
> extractFunctions to loadJavaFunctions: in a finally block, iterate over the 
> cache entries (values), delete the local files, and at the end clear the cache.
> 2. Use {{Set}} instead of {{List}} for addedSignatures in 
> [FunctionUtils.extractFunctions()|https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/util/FunctionUtils.java#L73]:
> It just tracks which function signatures were added; for that purpose a Set 
> is fine. 
> {noformat}
> if (!addedSignatures.contains(fn.signatureString()))
> {noformat}
> This would be faster ( {{O( 1 )}} ) with a HashSet (compared to ArrayList's 
> {{O( n )}} for the contains method).
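
Item 2 of the description (tracking added signatures in a Set) can be
illustrated with a small, self-contained sketch. The class and method names
are hypothetical, and plain strings stand in for whatever
fn.signatureString() returns in the real code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class SignatureDedupSketch {
    // Keeps the first occurrence of each signature, in input order.
    static List<String> uniqueSignatures(List<String> signatures) {
        Set<String> addedSignatures = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String signature : signatures) {
            // Set.add returns false when the element is already present, so
            // one hashed call replaces contains() followed by add().
            if (addedSignatures.add(signature)) {
                result.add(signature);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "[f(INT), f(STRING)]"
        System.out.println(uniqueSignatures(Arrays.asList("f(INT)", "f(STRING)", "f(INT)")));
    }
}
```

With an ArrayList the membership check scans the list each time, so
deduplicating n signatures costs O(n^2) comparisons in the worst case; the
HashSet version is expected O(n) overall.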






[jira] [Updated] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated IMPALA-7644:

Component/s: Backend

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: parquet, performance
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Updated] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated IMPALA-7644:

Affects Version/s: Impala 3.1.0

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: parquet, performance
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Updated] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated IMPALA-7644:

Labels: parquet performance  (was: )

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: parquet, performance
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Updated] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread Lars Volker (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated IMPALA-7644:

Target Version: Impala 3.2.0  (was: Impala 3.1.0)

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: parquet, performance
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Commented] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread Lars Volker (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634349#comment-16634349
 ] 

Lars Volker commented on IMPALA-7644:
-

Thanks for the clarification.

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Commented] (IMPALA-7638) Lower default timeout for connection setup

2018-10-01 Thread Lars Volker (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634340#comment-16634340
 ] 

Lars Volker commented on IMPALA-7638:
-

Thanks for the feedback, [~jfs]. Here's the relevant quote by [~sailesh] in the 
review:

{quote}
We would also want the client connection timeout to default to a pretty high 
number since on large clusters, we've seen Kerberos negotiations take up to a 
few minutes.
I would prefer keeping the timeout to 5 minutes. It's not ideal, however, we 
would rather not see queries fail because of timed out negotiations vs. 
optimize for an even worse case of clients hung for 5 minutes (which is 
configurable by a flag if the user chooses to do so). This is the same reason 
we keep the internal connection timeout so high, since we'd rather see progress 
than a failed query due to one timed out connection.
{quote}

I suspect that with KRPC enabled we should not see Kerberos negotiations take 
several minutes, but we should confirm this with some stress tests.

> Lower default timeout for connection setup
> --
>
> Key: IMPALA-7638
> URL: https://issues.apache.org/jira/browse/IMPALA-7638
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Lars Volker
>Priority: Major
> Fix For: Impala 2.11.0
>
>
> IMPALA-5394 added the sasl_connect_tcp_timeout_ms flag with a default timeout 
> of 5 minutes. This seems too long as broken clients will prevent new clients 
> from establishing connections for this time. In addition to increasing the 
> acceptor thread pool size (IMPALA-7565) we should lower this timeout 
> considerably, e.g. to 5 seconds.






[jira] [Comment Edited] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread JIRA


[ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634324#comment-16634324
 ] 

Zoltán Borók-Nagy edited comment on IMPALA-7644 at 10/1/18 5:01 PM:


Sorry, I meant it won't be available in 3.1.0, we plan to deliver it in 3.2.0.

PS: oops, thanks!


was (Author: boroknagyz):
Sorry I meant it wont be available in 3.1.0, we plan to deliver in 3.2.0.

PS: oops, thanks!

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Commented] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread JIRA


[ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634324#comment-16634324
 ] 

Zoltán Borók-Nagy commented on IMPALA-7644:
---

Sorry I meant it wont be available in 3.1.0, we plan to deliver in 3.2.0.

PS: oops, thanks!

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Updated] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-7644:
--
Target Version: Impala 3.1.0
 Fix Version/s: (was: Impala 3.1.0)

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Commented] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread Lars Volker (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634318#comment-16634318
 ] 

Lars Volker commented on IMPALA-7644:
-

3.1 will be the next release. If the read path is done by then, I don't 
think there's a benefit in adding a flag. Instead, we can just implement reads, 
add more tests, and then release 3.1, no?

PS: I think you mixed up Fix Version and Target Version. :)

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Commented] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread JIRA


[ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634315#comment-16634315
 ] 

Zoltán Borók-Nagy commented on IMPALA-7644:
---

[~lv] the read path won't be available until 3.1.0, so we only test it from 
Python in test_parquet_page_index.py.

Until Impala can read the page index, I don't think we can test it very 
thoroughly.

Parquet-MR is about to implement page indexes, so maybe it would be beneficial 
if Impala could write it too, just not by default.

 

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Resolved] (IMPALA-7492) Add support for DATE text parser/formatter

2018-10-01 Thread Attila Jeges (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Attila Jeges resolved IMPALA-7492.
--
Resolution: Implemented

https://github.com/apache/impala/commit/cb49371613909e56debee6275fd54759eb36ad33

> Add support for DATE text parser/formatter
> --
>
> Key: IMPALA-7492
> URL: https://issues.apache.org/jira/browse/IMPALA-7492
> Project: IMPALA
>  Issue Type: Sub-task
>Reporter: Attila Jeges
>Assignee: Attila Jeges
>Priority: Major
>







[jira] [Updated] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy updated IMPALA-7644:
--
Fix Version/s: Impala 3.1.0

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Commented] (IMPALA-7645) Allow configuring default file format via query option

2018-10-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634296#comment-16634296
 ] 

Tim Armstrong commented on IMPALA-7645:
---

Isn't having a default value for the query option sufficient?

> Allow configuring default file format via query option
> --
>
> Key: IMPALA-7645
> URL: https://issues.apache.org/jira/browse/IMPALA-7645
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Tim Armstrong
>Priority: Major
>
> It would be useful to have a query option to allow setting the default file 
> format. This would allow the file format to be overridden globally or 
> per-session. We already have a COMPRESSION_CODEC option. 
> We had some discussion on IMPALA-2210 related to this.
> The current default is hardcoded in the code: 
> https://github.com/apache/impala/blob/64e6719870db5602a6fa85014bc6c264080b9414/fe/src/main/java/org/apache/impala/analysis/TableDef.java#L136
>  
> https://github.com/apache/impala/blob/64e6719870db5602a6fa85014bc6c264080b9414/fe/src/main/java/org/apache/impala/analysis/TableDef.java#L145






[jira] [Commented] (IMPALA-7645) Allow configuring default file format via query option

2018-10-01 Thread Lars Volker (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634291#comment-16634291
 ] 

Lars Volker commented on IMPALA-7645:
-

Should we consider adding a command line flag to switch the default 
daemon-wide? This would still allow us to make the change now, and then switch 
the default during the next compatibility-breaking version change.
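
The precedence being discussed (per-session query option, then a daemon-wide 
flag, then the hardcoded default) can be sketched roughly as follows. This is 
only an illustration: the option name DEFAULT_FILE_FORMAT, the daemon_default 
parameter, and the TEXT fallback are assumptions made for the example, not 
names or values confirmed in this thread.

```python
from typing import Mapping, Optional

# Assumption: stands in for the value currently hardcoded in TableDef.java.
HARDCODED_DEFAULT = "TEXT"
# Illustrative subset only, not the full list of formats Impala supports.
SUPPORTED_FORMATS = {"TEXT", "PARQUET"}


def resolve_default_file_format(
    query_options: Mapping[str, str],
    daemon_default: Optional[str] = None,
) -> str:
    """Pick the format for a CREATE TABLE that has no STORED AS clause.

    Precedence: per-session query option (hypothetical DEFAULT_FILE_FORMAT),
    then the daemon-wide flag value, then the hardcoded default.
    """
    for candidate in (query_options.get("DEFAULT_FILE_FORMAT"), daemon_default):
        if candidate:
            fmt = candidate.upper()
            if fmt not in SUPPORTED_FORMATS:
                raise ValueError(f"unsupported file format: {candidate}")
            return fmt
    return HARDCODED_DEFAULT
```

With this shape, a daemon-wide flag only changes the fallback, so a session 
that sets the query option explicitly keeps its own choice.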

> Allow configuring default file format via query option
> --
>
> Key: IMPALA-7645
> URL: https://issues.apache.org/jira/browse/IMPALA-7645
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Tim Armstrong
>Priority: Major
>
> It would be useful to have a query option to allow setting the default file 
> format. This would allow the file format to be overridden globally or 
> per-session. We already have a COMPRESSION_CODEC option. 
> We had some discussion on IMPALA-2210 related to this.
> The current default is hardcoded in the code: 
> https://github.com/apache/impala/blob/64e6719870db5602a6fa85014bc6c264080b9414/fe/src/main/java/org/apache/impala/analysis/TableDef.java#L136
>  
> https://github.com/apache/impala/blob/64e6719870db5602a6fa85014bc6c264080b9414/fe/src/main/java/org/apache/impala/analysis/TableDef.java#L145






[jira] [Created] (IMPALA-7647) Fix test gap - no test coverage of non-trivial HS2 result sets

2018-10-01 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-7647:
-

 Summary: Fix test gap - no test coverage of non-trivial HS2 result 
sets
 Key: IMPALA-7647
 URL: https://issues.apache.org/jira/browse/IMPALA-7647
 Project: IMPALA
  Issue Type: Sub-task
  Components: Infrastructure
Affects Versions: Impala 3.1.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong


IMPALA-7588 revealed that we didn't have sufficient test coverage for HS2 
result sets.

We need to fix this before making any serious modifications to the HS2 server 
logic.

I'd like to get this coverage by adding a test dimension to run .test files via 
HS2, so we can leverage our existing tests.






[jira] [Updated] (IMPALA-7643) Report the number of currently queued queries in stress test

2018-10-01 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7643:
--
Summary: Report the number of currently queued queries in stress test  
(was: Report the number of currently queued queries)

> Report the number of currently queued queries in stress test
> 
>
> Key: IMPALA-7643
> URL: https://issues.apache.org/jira/browse/IMPALA-7643
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>
> The stress test output when running with admission control is confusing 
> because it reports queued queries as running, so it looks like many queries 
> are running when they are actually piled up in admission control. 
> We should distinguish the two somehow.
> {noformat}
> Done | Running | Mem Lmt Ex | AC Reject | AC Timeout | Time Out | Cancel | Err | Incorrect | Next Qry Mem Lmt | Tot Qry Mem Lmt | Tracked Mem | RSS Mem
>  230 |     518 |          0 |         0 |        124 |        0 |     12 |   0 |         0 |               25 |           11182 |        3061 |    3984
>  243 |     528 |          0 |         0 |        135 |        0 |     12 |   0 |         0 |               26 |           11363 |        4065 |    4964
>  255 |     540 |          0 |         0 |        144 |        0 |     12 |   0 |         0 |               17 |           11644 |        2744 |    5199
>  261 |     550 |          0 |         0 |        148 |        0 |     12 |   0 |         0 |               25 |           11866 |        5038 |    5228
>  266 |     562 |          0 |         0 |        152 |        0 |     12 |   0 |         0 |               15 |           12085 |        4515 |    4127
>  271 |     573 |          0 |         0 |        153 |        0 |     12 |   0 |         0 |               26 |           12318 |        3676 |    4059
> {noformat}
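
One way to make the distinction is to count queued and running queries 
separately and give each its own column. The sketch below is a minimal 
illustration only: the state names and the format_status_row helper are 
hypothetical, and the real stress test may track query state quite 
differently.

```python
from collections import Counter
from typing import Iterable

# Hypothetical state names, chosen for this example only.
QUEUED, RUNNING, DONE = "queued", "running", "done"


def format_status_row(states: Iterable[str]) -> str:
    """Render a header and one row that reports queued queries separately."""
    counts = Counter(states)
    columns = [
        ("Done", counts[DONE]),
        ("Running", counts[RUNNING]),  # admitted and executing
        ("Queued", counts[QUEUED]),    # submitted but held by admission control
    ]
    header = " | ".join(name for name, _ in columns)
    # Right-align each value under its column name.
    values = " | ".join(str(value).rjust(len(name)) for name, value in columns)
    return header + "\n" + values
```

Calling format_status_row with the current state of every submitted query 
would then show at a glance how many are actually executing.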






[jira] [Commented] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread Lars Volker (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634280#comment-16634280
 ] 

Lars Volker commented on IMPALA-7644:
-

What's the target version for this? I think it would be better to get proper 
testing done before the next release instead of adding another temporary flag.

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Created] (IMPALA-7646) SHOW GRANT USER not working on kerberized clusters

2018-10-01 Thread Adam Holley (JIRA)
Adam Holley created IMPALA-7646:
---

 Summary: SHOW GRANT USER not working on kerberized clusters
 Key: IMPALA-7646
 URL: https://issues.apache.org/jira/browse/IMPALA-7646
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 3.1.0
Reporter: Adam Holley


SHOW GRANT USER foo_user;
does not work on kerberized clusters because the requester name does not match 
the user's name.






[jira] [Assigned] (IMPALA-7646) SHOW GRANT USER not working on kerberized clusters

2018-10-01 Thread Adam Holley (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Holley reassigned IMPALA-7646:
---

Assignee: Adam Holley

> SHOW GRANT USER not working on kerberized clusters
> --
>
> Key: IMPALA-7646
> URL: https://issues.apache.org/jira/browse/IMPALA-7646
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Adam Holley
>Assignee: Adam Holley
>Priority: Major
>
> SHOW GRANT USER foo_user;
> does not work on kerberized clusters because the requester name does not 
> match the user's name.






[jira] [Created] (IMPALA-7645) Allow configuring default file format via query option

2018-10-01 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-7645:
-

 Summary: Allow configuring default file format via query option
 Key: IMPALA-7645
 URL: https://issues.apache.org/jira/browse/IMPALA-7645
 Project: IMPALA
  Issue Type: New Feature
  Components: Frontend
Reporter: Tim Armstrong


It would be useful to have a query option to allow setting the default file 
format. This would allow the file format to be overridden globally or 
per-session. We already have a COMPRESSION_CODEC option. 

We had some discussion on IMPALA-2210 related to this.






[jira] [Commented] (IMPALA-7645) Allow configuring default file format via query option

2018-10-01 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634247#comment-16634247
 ] 

Tim Armstrong commented on IMPALA-7645:
---

[~lv]

> Allow configuring default file format via query option
> --
>
> Key: IMPALA-7645
> URL: https://issues.apache.org/jira/browse/IMPALA-7645
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Frontend
>Reporter: Tim Armstrong
>Priority: Major
>
> It would be useful to have a query option to allow setting the default file 
> format. This would allow the file format to be overridden globally or 
> per-session. We already have a COMPRESSION_CODEC option. 
> We had some discussion on IMPALA-2210 related to this.






[jira] [Commented] (IMPALA-6900) Invalidate metadata operation is ignored at a coordinator if catalog is empty

2018-10-01 Thread Michael Brown (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634216#comment-16634216
 ] 

Michael Brown commented on IMPALA-6900:
---

IMPALA-7605 was duped to this. IMPALA-7605 is a P1 because it happens reliably 
in downstream testing setup phases (e.g., stress and performance data setup). 
Making this a P1 as a result of the duplication.

> Invalidate metadata operation is ignored at a coordinator if catalog is empty
> -
>
> Key: IMPALA-6900
> URL: https://issues.apache.org/jira/browse/IMPALA-6900
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Vuk Ercegovac
>Priority: Blocker
>
> The following workflow may cause an impalad that issued an invalidate 
> metadata to wrongly conclude that the operation has taken effect, thus 
> causing subsequent queries to fail due to unresolved references to tables 
> or databases. 
> Steps to reproduce:
>  # Start an impala cluster connecting to an empty HMS (no databases).
>  # Create a database "db" in HMS outside of Impala (e.g. using Hive).
>  # Run INVALIDATE METADATA through Impala.
>  # Run "use db" statement in Impala.
>  
> The while condition in the code snippet below causes the 
> WaitForMinCatalogUpdate function to return prematurely even though INVALIDATE 
> METADATA has not taken effect: 
> {code:java}
> void ImpalaServer::WaitForMinCatalogUpdate(..) {
>   ...
>   VLOG_QUERY << "Waiting for minimum catalog object version: "
>              << min_req_catalog_object_version << " current version: "
>              << min_catalog_object_version;
>   while (catalog_update_info_.min_catalog_object_version <
>              min_req_catalog_object_version &&
>          catalog_update_info_.catalog_service_id == catalog_service_id) {
>     catalog_version_update_cv_.Wait(unique_lock);
>   }
> {code}
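
To make the exit behaviour of that loop easier to reason about, the while 
condition quoted above can be modelled as a pure predicate. This is a 
simplified Python model, not Impala code: the function and parameter names are 
made up for the illustration, and the issue does not state which of the two 
clauses is the one that goes false in the empty-catalog case.

```python
def should_keep_waiting(
    current_min_version: int,
    required_min_version: int,
    current_service_id: str,
    expected_service_id: str,
) -> bool:
    """Model of the quoted while condition.

    The wait continues only while BOTH clauses hold, so the loop exits as
    soon as either the version check or the service ID check is false.
    """
    version_behind = current_min_version < required_min_version
    same_service = current_service_id == expected_service_id
    return version_behind and same_service
```

Because the two clauses are joined with a logical AND, the function can return 
while the version is still behind whenever the service ID comparison fails, 
which is one way a wait of this shape can end before the update has arrived.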






[jira] [Updated] (IMPALA-6900) Invalidate metadata operation is ignored at a coordinator if catalog is empty

2018-10-01 Thread Michael Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Brown updated IMPALA-6900:
--
Priority: Blocker  (was: Major)

> Invalidate metadata operation is ignored at a coordinator if catalog is empty
> -
>
> Key: IMPALA-6900
> URL: https://issues.apache.org/jira/browse/IMPALA-6900
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Vuk Ercegovac
>Priority: Blocker
>
> The following workflow may cause an impalad that issued an invalidate 
> metadata to wrongly conclude that the operation has taken effect, thus 
> causing subsequent queries to fail due to unresolved references to tables 
> or databases. 
> Steps to reproduce:
>  # Start an impala cluster connecting to an empty HMS (no databases).
>  # Create a database "db" in HMS outside of Impala (e.g. using Hive).
>  # Run INVALIDATE METADATA through Impala.
>  # Run "use db" statement in Impala.
>  
> The while condition in the code snippet below causes the 
> WaitForMinCatalogUpdate function to return prematurely even though INVALIDATE 
> METADATA has not taken effect: 
> {code:java}
> void ImpalaServer::WaitForMinCatalogUpdate(..) {
>   ...
>   VLOG_QUERY << "Waiting for minimum catalog object version: "
>              << min_req_catalog_object_version << " current version: "
>              << min_catalog_object_version;
>   while (catalog_update_info_.min_catalog_object_version <
>              min_req_catalog_object_version &&
>          catalog_update_info_.catalog_service_id == catalog_service_id) {
>     catalog_version_update_cv_.Wait(unique_lock);
>   }
> {code}






[jira] [Assigned] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/IMPALA-7644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-7644:
-

Assignee: Zoltán Borók-Nagy

> Hide Parquet page index writing with feature flag
> -
>
> Key: IMPALA-7644
> URL: https://issues.apache.org/jira/browse/IMPALA-7644
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>
> Currently there is no released Impala version that can write the Parquet page 
> index:
> [https://github.com/apache/parquet-format/blob/master/PageIndex.md]
> However, the current Impala master writes the page index since IMPALA-5842, 
> but cannot read it.
> I think we should hide the write path with a feature flag until Impala is 
> able to read it back and has better test coverage on it.






[jira] [Commented] (IMPALA-7605) AnalysisException when first accessing Hive-create table on pristine HMS

2018-10-01 Thread bharath v (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634210#comment-16634210
 ] 

bharath v commented on IMPALA-7605:
---

I think this is a dupe of IMPALA-6900. I could reliably repro the issue by 
increasing {{statestore_update_frequency_ms}} to {{1}}, which widens the race 
window. With the defaults, the race window is much smaller, and that's the 
reason I was not hitting it locally.

> AnalysisException when first accessing Hive-create table on pristine HMS
> 
>
> Key: IMPALA-7605
> URL: https://issues.apache.org/jira/browse/IMPALA-7605
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.1.0
>Reporter: Michael Brown
>Assignee: bharath v
>Priority: Blocker
>  Labels: regression
> Attachments: 3.0-logs.tar.gz, 3.1-logs.tar.gz, metadata-bug.sh
>
>
> This is a corner case encountered when loading test data from Hive on a 
> pristine/new cluster. As we tend to keep bigger clusters around and upgrade 
> them, as opposed to refreshing them, and our data load doesn't hit this 
> either, this was tough to spot.
> The procedure in general is to start with a pristine HMS, create a table in 
> Hive, and use Impala to access the table. Upon the first access, 
> AnalysisException is raised. Subsequent accesses work.
> This is a P1 in the sense that, while a user can try again and succeed, test 
> automation is going to increasingly load data via Hive and then access it in 
> Impala. This case needs to work. The other thing making this a P1 is that 
> it's a behavior change relative to both 2.12 and 3.0 and thus a regression.
> Here's what catalogd.INFO looks like in the successful case (3.0):
> {noformat}
> I0921 10:28:23.697592 47879 CatalogServiceCatalog.java:1102] Invalidating all 
> metadata. Version: 0
> I0921 10:28:23.810739 47879 CatalogServiceCatalog.java:914] Loading native 
> functions for database: default
> I0921 10:28:23.811686 47879 CatalogServiceCatalog.java:930] Loaded native 
> functions for database: default
> I0921 10:28:23.811738 47879 CatalogServiceCatalog.java:941] Loading Java 
> functions for database: default
> I0921 10:28:23.811772 47879 CatalogServiceCatalog.java:952] Loaded Java 
> functions for database: default
> I0921 10:28:23.853292 47879 CatalogServiceCatalog.java:1170] Invalidated all 
> metadata.
> I0921 10:28:23.860013 47879 statestore-subscriber.cc:190] Starting statestore 
> subscriber
> I0921 10:28:23.861377 47879 thrift-server.cc:452] ThriftServer 
> 'StatestoreSubscriber' started on port: 23020
> I0921 10:28:23.861383 47879 statestore-subscriber.cc:217] Registering with 
> statestore
> I0921 10:28:23.861979 47879 statestore-subscriber.cc:174] Subscriber 
> registration ID: 624d03c923c15c12:f9204d9fb14054a9
> I0921 10:28:23.861989 47879 statestore-subscriber.cc:221] statestore 
> registration successful
> I0921 10:28:23.868679 48041 catalog-server.cc:490] Collected update: 
> DATABASE:default, version=2, original size=156
> I0921 10:28:23.870929 48041 catalog-server.cc:490] Collected deletion: 
> DATABASE:_impala_builtins, version=3, original size=140
> I0921 10:28:23.872802 48041 catalog-server.cc:490] Collected update: 
> CATALOG_SERVICE_ID, version=3, original size=49
> I0921 10:28:23.875424 47879 thrift-server.cc:452] ThriftServer 
> 'CatalogService' started on port: 26000
> I0921 10:28:23.875434 47879 catalogd-main.cc:111] CatalogService started on 
> port: 26000
> I0921 10:28:26.776692 48047 catalog-server.cc:245] A catalog update with 3 
> entries is assembled. Catalog version: 3 Last sent catalog version: 0
> I0921 10:28:53.571209 48924 CatalogServiceCatalog.java:1102] Invalidating all 
> metadata. Version: 3
> I0921 10:28:53.608983 48924 CatalogServiceCatalog.java:914] Loading native 
> functions for database: default
> I0921 10:28:53.609027 48924 CatalogServiceCatalog.java:930] Loaded native 
> functions for database: default
> I0921 10:28:53.609058 48924 CatalogServiceCatalog.java:941] Loading Java 
> functions for database: default
> I0921 10:28:53.609087 48924 CatalogServiceCatalog.java:952] Loaded Java 
> functions for database: default
> I0921 10:28:53.614903 48924 CatalogServiceCatalog.java:914] Loading native 
> functions for database: foo1537550878
> I0921 10:28:53.614946 48924 CatalogServiceCatalog.java:930] Loaded native 
> functions for database: foo1537550878
> I0921 10:28:53.614977 48924 CatalogServiceCatalog.java:941] Loading Java 
> functions for database: foo1537550878
> I0921 10:28:53.615005 48924 CatalogServiceCatalog.java:952] Loaded Java 
> functions for database: foo1537550878
> I0921 10:28:53.632726 48924 CatalogServiceCatalog.java:1170] Invalidated all 
> metadata.
> I0921 10:28:54.782857 48041 

[jira] [Created] (IMPALA-7644) Hide Parquet page index writing with feature flag

2018-10-01 Thread JIRA
Zoltán Borók-Nagy created IMPALA-7644:
-

 Summary: Hide Parquet page index writing with feature flag
 Key: IMPALA-7644
 URL: https://issues.apache.org/jira/browse/IMPALA-7644
 Project: IMPALA
  Issue Type: Improvement
Reporter: Zoltán Borók-Nagy


Currently there is no released Impala version that can write the Parquet page 
index:

[https://github.com/apache/parquet-format/blob/master/PageIndex.md]

However, the current Impala master writes the page index since IMPALA-5842, but 
cannot read it.

I think we should hide the write path with a feature flag until Impala is able 
to read it back and has better test coverage on it.
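
As a rough illustration of what hiding the write path behind a flag could look 
like, the sketch below gates the page index step on a boolean that is off by 
default. Everything in it is hypothetical: the class, the write_page_index 
flag name, and the section labels are invented for the example and are not 
taken from Impala's Parquet writer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ParquetWriterSketch:
    # Hypothetical feature flag: off by default, so the page index is only
    # written when explicitly enabled.
    write_page_index: bool = False
    sections: List[str] = field(default_factory=list)

    def finalize(self) -> List[str]:
        """Record which sections would be written, in order."""
        self.sections.append("row_groups")
        if self.write_page_index:
            # The index is written after the row groups and before the footer.
            self.sections.append("page_index")
        self.sections.append("footer")
        return self.sections
```

Keeping the default at False means existing behaviour is unchanged unless the 
flag is set, while tests can still turn it on to exercise the write path.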


