[jira] [Commented] (IMPALA-6671) Metadata operations that modify a table blocks topic updates for other unrelated operations

2019-02-07 Thread Adrian Ng (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763217#comment-16763217
 ] 

Adrian Ng commented on IMPALA-6671:
---

Note that this issue is resolved with local catalog mode. However, since the 
mode is experimental, we have not closed this JIRA yet. We will close this JIRA 
once it is turned on by default. 


> Metadata operations that modify a table blocks topic updates for other 
> unrelated operations
> ---
>
> Key: IMPALA-6671
> URL: https://issues.apache.org/jira/browse/IMPALA-6671
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 2.12.0
>Reporter: Mostafa Mokhtar
>Priority: Critical
>  Labels: catalog-server, performance
>
> Metadata operations that mutate the state of a table, like "compute stats foo" 
> or "alter recover partitions", block topic updates for read-only operations 
> against unrelated tables, such as "describe bar".
> Thread for blocked operation
> {code}
> "Thread-7" prio=10 tid=0x11613000 nid=0x21b3b waiting on condition 
> [0x7f5f2ef52000]
>java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for  <0x7f6f57ff0240> (a 
> java.util.concurrent.locks.ReentrantLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
> at 
> java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
> at 
> java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.addTableToCatalogDeltaHelper(CatalogServiceCatalog.java:639)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.addTableToCatalogDelta(CatalogServiceCatalog.java:611)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.addDatabaseToCatalogDelta(CatalogServiceCatalog.java:567)
> at 
> org.apache.impala.catalog.CatalogServiceCatalog.getCatalogDelta(CatalogServiceCatalog.java:449)
> at 
> org.apache.impala.service.JniCatalog.getCatalogDelta(JniCatalog.java:126)
> {code}
> Thread for blocking operation 
> {code}
> "Thread-130" prio=10 tid=0x113d5800 nid=0x2499d runnable 
> [0x7f5ef80d]
>java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> - locked <0x7f5fffcd9f18> (a java.io.BufferedInputStream)
> at 
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:346)
> at 
> org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:423)
> at 
> org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:405)
> at 
> org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at 
> org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
> at 
> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
> at 
> org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_add_partitions_req(ThriftHiveMetastore.java:1639)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.add_partitions_req(ThriftHiveMetastore.java:1626)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:609)
> at 

[jira] [Commented] (IMPALA-7214) Update Impala docs to reflect coordinator/executor separation and decoupling from DataNodes.

2019-02-07 Thread Alex Rodoni (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763216#comment-16763216
 ] 

Alex Rodoni commented on IMPALA-7214:
-

https://gerrit.cloudera.org/#/c/12400/

> Update Impala docs to reflect coordinator/executor separation and decoupling 
> from DataNodes.
> 
>
> Key: IMPALA-7214
> URL: https://issues.apache.org/jira/browse/IMPALA-7214
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Alex Rodoni
>Priority: Major
>
> The docs tend to conflate DataNodes (an HDFS service) and Impala daemons. I 
> think this stems from the original deployment practice of always colocating 
> Impala daemons with HDFS DataNodes so that HDFS data could always be read 
> from a local DataNode. 
> I'm a bit pedantic so the conflation feels wrong to me regardless, but I 
> think this will become increasingly confusing as alternative deployments 
> without colocated HDFS DataNodes become more common (e.g. running against S3, 
> running with a separate HDFS service).
> E.g. picking an example at random:
> {noformat}
> In Impala 1.4.0 and higher, the LIMIT clause is now optional (rather than
> required) for queries that use the ORDER BY clause. Impala automatically uses
> a temporary disk work area to perform the sort if the sort operation would
> otherwise exceed the Impala memory limit for a particular DataNode.
> {noformat}
> This is wrong because the memory limit is for an Impala daemon, which is the 
> process that does the actual sorting. So here I think it should be "Impala 
> daemon" instead of "DataNode".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7929) Impala query on HBASE table failing with InternalException: Required field*

2019-02-07 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang resolved IMPALA-7929.
---
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

Many thanks to [~joemcdonnell], [~tgar] and [~paul-rogers] for the review and 
commit!

> Impala query on HBASE table failing with InternalException: Required field*
> ---
>
> Key: IMPALA-7929
> URL: https://issues.apache.org/jira/browse/IMPALA-7929
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.2.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
> Fix For: Impala 3.2.0
>
>
> This looks like a corner-case bug at the Impala-HBase boundary.
> To reproduce, create a table in the Hive shell:
> {code}
> create database abc;
> CREATE TABLE abc.test_hbase1 (k STRING, c STRING)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ('hbase.columns.mapping'=':key,cf:c', 'serialization.format'='1')
> TBLPROPERTIES ('hbase.table.name'='test_hbase1',
> 'storage_handler'='org.apache.hadoop.hive.hbase.HBaseStorageHandler');
> {code}
> Then issue this query from the Impala shell:
> {code}
> select * from abc.test_hbase1 where k != "row1"; 
> {code}
> Observe:
> {code}
> Query: select * from abc.test_hbase1 where k != "row1"
>  
> Query submitted at: 2018-12-04 17:02:42 (Coordinator: http://xyz:25000)
> ERROR: InternalException: Required field 'qualifier' was not present! Struct: 
> THBaseFilter(family::key, qualifier:null, op_ordinal:3, filter_constant:row1)
> {code}
> More observations:
> # Replacing {{k != "row1"}} with {{k <> "row1"}} fails the same way. However, 
> replacing it with other operators, such as ">", "<", or "=", works.
> # Replacing {{k != "row1"}} with {{c != "row1"}} succeeds without the error 
> reported above.
> The above example uses a two-column table; a similar table with three columns 
> fails the same way: an inequality predicate on the first column fails, while 
> an inequality predicate on the other columns doesn't.
> The code that issues the error message is in HBase; it seems Impala did not 
> pass the needed info to HBase in this special case. I also wonder whether the 
> bug surfaces because the first column of the table is the key of the HBase 
> table.
> {code}
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnIncrement.java:
>   throw new org.apache.thrift.protocol.TProtocolException("Required field 
> 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnValue.java:
>   throw new org.apache.thrift.protocol.TProtocolException("Required field 
> 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java:
> throw new org.apache.thrift.protocol.TProtocolException("Required 
> field 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java:
> throw new org.apache.thrift.protocol.TProtocolException("Required 
> field 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java:
> throw new org.apache.thrift.protocol.TProtocolException("Required 
> field 'qualifier' was not present! Struct: " + toString());
> {code}
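> A minimal sketch of why a Thrift "required" field rejects the struct when the 
> key column arrives with no qualifier (THBaseFilter here is a hand-written 
> stand-in, not the generated class, and it throws a plain Exception to stay 
> self-contained):
> {code}
> // Thrift-generated validation throws as soon as a required field is null.
> class THBaseFilter {
>   String family;          // ':key' for the row-key column
>   String qualifier;       // declared 'required' in the .thrift definition
>   int opOrdinal;
>   String filterConstant;
> 
>   void validate() throws Exception {
>     // A predicate on the first (key) column maps to ':key' with no
>     // qualifier, so this field arrives as null and validation throws.
>     if (qualifier == null) {
>       throw new Exception(
>           "Required field 'qualifier' was not present! Struct: " + this);
>     }
>   }
> }
> {code}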



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-8102) Impala/HBase recommendations need update

2019-02-07 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-8102:

Issue Type: Task  (was: Documentation)

> Impala/HBase recommendations need update
> 
>
> Key: IMPALA-8102
> URL: https://issues.apache.org/jira/browse/IMPALA-8102
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 3.2.0
>
>
> https://impala.apache.org/docs/build/html/topics/impala_hbase.html hasn't 
> been updated for a while. The recommendations are a bit out of date - 
> generally HBase is not the best format for analytic workloads, yet that page 
> seems to encourage using it.
> E.g.
> {quote}If you have join queries that do aggregation operations on large fact 
> tables and join the results against small dimension tables, consider using 
> Impala for the fact tables and HBase for the dimension tables.{quote}
> Assigning to myself to figure out what the best practice is, but I think we 
> need to include:
> * A statement that Kudu offers significantly better performance for analytical 
> workloads with mutable data
> * A statement that HDFS tables are also preferable unless data is frequently 
> mutated
> * A pointer to the Kudu docs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-8172) Impala Doc: Doc the query option to limit on #rows returned from query

2019-02-07 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8172.
---
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Impala Doc: Doc the query option to limit on #rows returned from query
> --
>
> Key: IMPALA-8172
> URL: https://issues.apache.org/jira/browse/IMPALA-8172
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_32
> Fix For: Impala 3.2.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8109) Impala cannot read the gzip files bigger than 2 GB

2019-02-07 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763187#comment-16763187
 ] 

Tim Armstrong commented on IMPALA-8109:
---

I'm not having any luck reproducing this on master, but I can't find a smoking 
gun for which change might have fixed this.

> Impala cannot read the gzip files bigger than 2 GB
> --
>
> Key: IMPALA-8109
> URL: https://issues.apache.org/jira/browse/IMPALA-8109
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: hakki
>Assignee: Tim Armstrong
>Priority: Major
>
> When querying a partition containing gzip files, the query fails with the 
> error below:
> WARNINGS: Disk I/O error: Error seeking to -2147483648 in file: 
> hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXX.gz: 
> Error(255): Unknown error 255
> Root cause: EOFException: Cannot seek to negative offset
> The hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXX.gz file is 
> a delimited text file bigger than 2 GB (approx. 2.4 GB). The uncompressed 
> size is ~13 GB.
> The impalad version is 2.12.0-cdh5.15.0.
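> The negative offset looks like a classic signed 32-bit overflow (my reading; 
> the thread doesn't confirm the root cause): a byte offset past 2 GiB stored 
> in an int wraps negative, matching the -2147483648 in the error.
> {code}
> public class OffsetOverflow {
>   public static void main(String[] args) {
>     long fileSize = 2_576_980_378L;        // ~2.4 GB, past the 2 GiB line
>     System.out.println((int) fileSize);    // truncates to a negative int
>     System.out.println((int) 2147483648L); // prints -2147483648, as in the error
>   }
> }
> {code}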



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8109) Impala cannot read the gzip files bigger than 2 GB

2019-02-07 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763185#comment-16763185
 ] 

Tim Armstrong commented on IMPALA-8109:
---

Hm, ok, so your file does have multiple blocks rather than one single block.

> Impala cannot read the gzip files bigger than 2 GB
> --
>
> Key: IMPALA-8109
> URL: https://issues.apache.org/jira/browse/IMPALA-8109
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: hakki
>Assignee: Tim Armstrong
>Priority: Major
>
> When querying a partition containing gzip files, the query fails with the 
> error below:
> WARNINGS: Disk I/O error: Error seeking to -2147483648 in file: 
> hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXX.gz: 
> Error(255): Unknown error 255
> Root cause: EOFException: Cannot seek to negative offset
> The hdfs://HADOOP_CLUSTER/user/hive/AAA/BBB/datehour=20180910/XXX.gz file is 
> a delimited text file bigger than 2 GB (approx. 2.4 GB). The uncompressed 
> size is ~13 GB.
> The impalad version is 2.12.0-cdh5.15.0.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8102) Impala/HBase recommendations need update

2019-02-07 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8102.
---
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Impala/HBase recommendations need update
> 
>
> Key: IMPALA-8102
> URL: https://issues.apache.org/jira/browse/IMPALA-8102
> Project: IMPALA
>  Issue Type: Documentation
>  Components: Docs
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 3.2.0
>
>
> https://impala.apache.org/docs/build/html/topics/impala_hbase.html hasn't 
> been updated for a while. The recommendations are a bit out of date - 
> generally HBase is not the best format for analytic workloads yet that page 
> seems to encourage using it.
> E.g.
> {quote}If you have join queries that do aggregation operations on large fact 
> tables and join the results against small dimension tables, consider using 
> Impala for the fact tables and HBase for the dimension tables.{quote}
> Assigning to myself to figure out what the best practice is, but I think we 
> need to include:
> * A statement Kudu offers significantly better performance for analytical 
> workloads with mutable data
> * A statement that HDFS tables are also preferable unless data is frequently 
> mutated
> * A pointer to the Kudu docs



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-7657) Proper codegen for TupleIsNullPredicate, IsNotEmptyPredicate and ValidTupleId

2019-02-07 Thread Andrew Sherman (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman resolved IMPALA-7657.

   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Proper codegen for TupleIsNullPredicate, IsNotEmptyPredicate and ValidTupleId
> -
>
> Key: IMPALA-7657
> URL: https://issues.apache.org/jira/browse/IMPALA-7657
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Andrew Sherman
>Priority: Major
>  Labels: codegen, performance
> Fix For: Impala 3.2.0
>
>
> These utility functions use GetCodegendComputeFnWrapper() to call the 
> interpreted path, but we could instead codegen them into efficient code. We 
> could either use IRBuilder or, if possible, cross-compile the implementation 
> and substitute in constants.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8172) Impala Doc: Doc the query option to limit on #rows returned from query

2019-02-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763178#comment-16763178
 ] 

ASF subversion and git services commented on IMPALA-8172:
-

Commit c10113c6f72ba7a877f9bb67a014537f5d278a44 in impala's branch 
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c10113c ]

IMPALA-8172: [DOCS] Documented the NUM_ROWS_PRODUCED_LIMIT query option

Change-Id: Iaf789ef2a01def3febf34f0d483cf4a9a32aecc2
Reviewed-on: http://gerrit.cloudera.org:8080/12396
Tested-by: Impala Public Jenkins 
Reviewed-by: Bikramjeet Vig 
Reviewed-by: Pooja Nilangekar 
Reviewed-by: Tim Armstrong 


> Impala Doc: Doc the query option to limit on #rows returned from query
> --
>
> Key: IMPALA-8172
> URL: https://issues.apache.org/jira/browse/IMPALA-8172
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_32
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Work started] (IMPALA-8172) Impala Doc: Doc the query option to limit on #rows returned from query

2019-02-07 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-8172 started by Alex Rodoni.
---
> Impala Doc: Doc the query option to limit on #rows returned from query
> --
>
> Key: IMPALA-8172
> URL: https://issues.apache.org/jira/browse/IMPALA-8172
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_32
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-8044) Impala Doc: Document the command to refresh authorization data

2019-02-07 Thread Alex Rodoni (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-8044.
---
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Impala Doc: Document the command to refresh authorization data
> --
>
> Key: IMPALA-8044
> URL: https://issues.apache.org/jira/browse/IMPALA-8044
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_32
> Fix For: Impala 3.2.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8044) Impala Doc: Document the command to refresh authorization data

2019-02-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763160#comment-16763160
 ] 

ASF subversion and git services commented on IMPALA-8044:
-

Commit d7a0690142c5d965389a5a62b64a649d2619ffa2 in impala's branch 
refs/heads/master from Alex Rodoni
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d7a0690 ]

IMPALA-8044: [DOCS] Document REFRESH AUTHORIZATION statement

Change-Id: Ibbb1661affc01c72db8f76bd97d356c802ed9680
Reviewed-on: http://gerrit.cloudera.org:8080/12378
Tested-by: Impala Public Jenkins 
Reviewed-by: Fredy Wijaya 


> Impala Doc: Document the command to refresh authorization data
> --
>
> Key: IMPALA-8044
> URL: https://issues.apache.org/jira/browse/IMPALA-8044
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>  Labels: future_release_doc, in_32
> Fix For: Impala 3.2.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7655) Codegen output for conditional functions (if,isnull, coalesce) is very suboptimal

2019-02-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763158#comment-16763158
 ] 

ASF subversion and git services commented on IMPALA-7655:
-

Commit 7707eb041728456ea5e05ae8acc5ab59c715b98a in impala's branch 
refs/heads/master from Andrew Sherman
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=7707eb0 ]

IMPALA-7657: Codegen IsNotEmptyPredicate and ValidTupleIdExpr.

These two classes evaluate scalar expressions. Previously codegen was
done by calling ScalarExpr::GetCodegendComputeFnWrapper which generates
a static method that calls the scalar expression evaluation method. Make
this more efficient by generating code which is customized using
information available at codegen time.

Add new cross-compiled files null-literal-ir.cc and slot-ref-ir.cc.

IsNotEmptyPredicate works by getting a CollectionVal object from the
single child Expr node, and counting its tuples. At codegen time we know
the type and value of the child node. Generate a call to a node-specific
non-virtual cross-compiled method to get the CollectionVal object from
the child. Then generate code that examines the CollectionVal and
returns an IntVal.

A ValidTupleIdExpr node contains a vector of tuple ids. It works by
probing each row for the tuple ids in the vector to find a non-null
tuple. At codegen time we know the vector of tuple ids. We unroll the
loop through the tuple ids, generating code that evaluates if the tuple
is non-null, and returns the tuple id if/when a non-null tuple is found.

IMPALA-7657 also requests replacing GetCodegendComputeFnWrapper() in
TupleIsNullPredicate. In the current Impala code this method is never
called. This is because TupleIsNullPredicate is always wrapped in an
IfExpr. This is always codegen'd by IfExpr's
GetCodegendComputeFnWrapper() method. There is a separate Jira
IMPALA-7655 to improve codegen of IfExpr.

Minor corrections:
  Correct the link to llvm tutorial in LlvmCodegen.

PERFORMANCE:
  I tested performance on a local mini-cluster. I wrote some
  pathological queries to test the new code. The new codegen'd code is
  very similar in performance. Both ValidTupleIdExpr and
  IsNotEmptyPredicate seem very slightly faster than the old code.
  Overall these changes are not purely for performance but to move away
  from GetCodegendComputeFnWrapper.

TESTING:
  The changed scalar expressions are well exercised by current tests.
  Ran exhaustive end-to-end tests.

Change-Id: Ifb87b9e3b879c278ce8638d97bcb320a7555a6b3
Reviewed-on: http://gerrit.cloudera.org:8080/12068
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
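
A language-neutral sketch of the specialization the commit describes (Java 
standing in for the LLVM IR Impala actually emits; names are illustrative):
{code}
class ValidTupleIdSketch {
  // Interpreted path: loops over tuple ids known only through a runtime vector.
  static int interpreted(int[] tupleIds, boolean[] rowNonNull) {
    for (int id : tupleIds) {
      if (rowNonNull[id]) return id;
    }
    return -1;
  }

  // "Codegen'd" path for the vector {0, 2}, known at codegen time: the loop
  // is unrolled into straight-line checks, the shape of code IRBuilder emits.
  static int specializedFor0And2(boolean[] rowNonNull) {
    if (rowNonNull[0]) return 0;
    if (rowNonNull[2]) return 2;
    return -1;
  }
}
{code}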


> Codegen output for conditional functions (if,isnull, coalesce) is very 
> suboptimal
> -
>
> Key: IMPALA-7655
> URL: https://issues.apache.org/jira/browse/IMPALA-7655
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Paul Rogers
>Priority: Major
>  Labels: codegen, perf, performance
>
> https://gerrit.cloudera.org/#/c/11565/ provided a clue that an aggregation 
> involving an if() function was very slow, 10x slower than the equivalent 
> version using a case:
> {noformat}
> [localhost:21000] default> set num_nodes=1; set mt_dop=1; select count(case 
> when l_orderkey is NULL then 1 else NULL end) from 
> tpch10_parquet.lineitem;summary;
> NUM_NODES set to 1
> MT_DOP set to 1
> Query: select count(case when l_orderkey is NULL then 1 else NULL end) from 
> tpch10_parquet.lineitem
> Query submitted at: 2018-10-04 11:17:31 (Coordinator: 
> http://tarmstrong-box:25000)
> Query progress can be monitored at: 
> http://tarmstrong-box:25000/query_plan?query_id=274b2a6f35cefe31:95a19642
> +----------------------------------------------------------+
> | count(case when l_orderkey is null then 1 else null end) |
> +----------------------------------------------------------+
> | 0                                                        |
> +----------------------------------------------------------+
> Fetched 1 row(s) in 0.51s
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                  |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+
> | 01:AGGREGATE | 1      | 44.03ms  | 44.03ms  | 1      | 1          | 25.00 KB | 10.00 MB      | FINALIZE                |
> | 00:SCAN HDFS | 1      | 411.57ms | 411.57ms | 59.99M | -1         | 16.61 MB | 88.00 MB      | tpch10_parquet.lineitem |
> +--------------+--------+----------+----------+--------+------------+----------+---------------+-------------------------+

[jira] [Commented] (IMPALA-7929) Impala query on HBASE table failing with InternalException: Required field*

2019-02-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763161#comment-16763161
 ] 

ASF subversion and git services commented on IMPALA-7929:
-

Commit 06d078e76bad6944ad487fd432ff6d0aa9174960 in impala's branch 
refs/heads/master from Yongjun Zhang
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=06d078e ]

IMPALA-7929: Allow null qualifier in THBaseFilter

Impala query failed with "InternalException: Required field 'qualifier'
was not present!" on a table created via Hive and mapped to HBase, because
the qualifier of the HBase key column is null in the mapped table, and
Impala required a non-null qualifier. The fix here is to relax this
requirement.

Test:
Added unit test.
Tested in real cluster.

Change-Id: I378c2249604481067b5b1c3a3bbb28c30ad4d751
Reviewed-on: http://gerrit.cloudera.org:8080/12213
Reviewed-by: Joe McDonnell 
Tested-by: Tim Armstrong 
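
In sketch form, the relaxation amounts to treating a null qualifier as "the 
predicate targets the row key" instead of rejecting the struct (hypothetical 
helper, not the actual patched code):
{code}
// A null qualifier now identifies the row-key column rather than a
// column-family:qualifier pair, so the filter can be built without throwing.
static String filterTarget(String family, String qualifier) {
  if (qualifier == null) {
    return family + " (row key)"; // previously: TProtocolException
  }
  return family + ":" + qualifier;
}
{code}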


> Impala query on HBASE table failing with InternalException: Required field*
> ---
>
> Key: IMPALA-7929
> URL: https://issues.apache.org/jira/browse/IMPALA-7929
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.2.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> This looks like a corner-case bug at the Impala-HBase boundary.
> To reproduce, create a table in the Hive shell:
> {code}
> create database abc;
> CREATE TABLE abc.test_hbase1 (k STRING, c STRING)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ('hbase.columns.mapping'=':key,cf:c', 'serialization.format'='1')
> TBLPROPERTIES ('hbase.table.name'='test_hbase1',
> 'storage_handler'='org.apache.hadoop.hive.hbase.HBaseStorageHandler');
> {code}
> Then issue this query from the Impala shell:
> {code}
> select * from abc.test_hbase1 where k != "row1"; 
> {code}
> Observe:
> {code}
> Query: select * from abc.test_hbase1 where k != "row1"
>  
> Query submitted at: 2018-12-04 17:02:42 (Coordinator: http://xyz:25000)
> ERROR: InternalException: Required field 'qualifier' was not present! Struct: 
> THBaseFilter(family::key, qualifier:null, op_ordinal:3, filter_constant:row1)
> {code}
> More observations:
> # Replacing {{k != "row1"}} with {{k <> "row1"}} fails the same way. However, 
> replacing it with other operators, such as ">", "<", or "=", works.
> # Replacing {{k != "row1"}} with {{c != "row1"}} succeeds without the error 
> reported above.
> The above example uses a two-column table; a similar table with three columns 
> fails the same way: an inequality predicate on the first column fails, while 
> an inequality predicate on the other columns doesn't.
> The code that issues the error message is in HBase; it seems Impala did not 
> pass the needed info to HBase in this special case. I also wonder whether the 
> bug surfaces because the first column of the table is the key of the HBase 
> table.
> {code}
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnIncrement.java:
>   throw new org.apache.thrift.protocol.TProtocolException("Required field 
> 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnValue.java:
>   throw new org.apache.thrift.protocol.TProtocolException("Required field 
> 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java:
> throw new org.apache.thrift.protocol.TProtocolException("Required 
> field 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java:
> throw new org.apache.thrift.protocol.TProtocolException("Required 
> field 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java:
> throw new org.apache.thrift.protocol.TProtocolException("Required 
> field 'qualifier' was not present! Struct: " + toString());
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-4555) Don't cancel query for failed ReportExecStatus (done=false) RPC

2019-02-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-4555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763159#comment-16763159
 ] 

ASF subversion and git services commented on IMPALA-4555:
-

Commit b1e4957ba78ef496d21728606889d1eb83ef6b27 in impala's branch 
refs/heads/master from Thomas Tauber-Marshall
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=b1e4957 ]

IMPALA-4555: Make QueryState's status reporting more robust

QueryState periodically collects runtime profiles from all of its
fragment instances and sends them to the coordinator. Previously, each
time this happens, if the rpc fails, QueryState will retry twice after
a configurable timeout and then cancel the fragment instances under
the assumption that the coordinator no longer exists.

We've found in real clusters that this logic is too sensitive to
failed rpcs and can result in fragment instances being cancelled even
in cases where the coordinator is still running.

This patch makes a few improvements to this logic:
- When a report fails to send, instead of retrying the same report
  quickly (after waiting report_status_retry_interval_ms), we wait the
  regular reporting interval (status_report_interval_ms), regenerate
  any stale portions of the report, and then retry.
- A new flag, --status_report_max_retries, is introduced, which
  controls the number of failed reports that are allowed before the
  query is cancelled. --report_status_retry_interval_ms is removed.
- Backoff is used for repeated failed attempts, such that for a period
  between retries of 't', on try 'n' the actual timeout will be t * n.

Testing:
- Added a test which results in a large number of failed intermediate
  status reports but still succeeds.

Change-Id: Ib6007013fc2c9e8eeba11b752ee58fb3038da971
Reviewed-on: http://gerrit.cloudera.org:8080/12049
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
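
A minimal sketch of the reporting loop described above (the flag names come 
from the commit message; the values and helper methods are illustrative):
{code}
class StatusReporter {
  static final long STATUS_REPORT_INTERVAL_MS = 5000; // --status_report_interval_ms
  static final int STATUS_REPORT_MAX_RETRIES = 3;     // --status_report_max_retries

  void reportLoop() throws InterruptedException {
    int failedAttempts = 0;
    while (failedAttempts < STATUS_REPORT_MAX_RETRIES) {
      // Linear backoff: for a period t between retries, try n waits t * n.
      Thread.sleep(STATUS_REPORT_INTERVAL_MS * (failedAttempts + 1));
      if (trySendReport()) {
        failedAttempts = 0;  // success resets the counter
      } else {
        failedAttempts++;    // regenerate stale report parts, then retry
      }
    }
    cancelFragmentInstances(); // coordinator presumed gone
  }

  boolean trySendReport() { return true; } // stands in for the actual RPC
  void cancelFragmentInstances() {}
}
{code}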


> Don't cancel query for failed ReportExecStatus (done=false) RPC
> ---
>
> Key: IMPALA-4555
> URL: https://issues.apache.org/jira/browse/IMPALA-4555
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 2.7.0
>Reporter: Sailesh Mukil
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> We currently try to send the ReportExecStatus RPC up to 3 times if the first 
> 2 times are unsuccessful - due to high network load or a network partition. 
> If all 3 attempts fail, we cancel the fragment instance and hence the query.
> However, we do not need to cancel the fragment instance if sending the report 
> with _done=false_ failed. We can just skip this turn and try again the next 
> time.
> We could probably skip sending the report up to 2 times (if we're unable to 
> send due to high network load and if done=false) before succumbing to the 
> current behavior, which is to cancel the fragment instance. The point is to 
> try at a later time when the network load may be lower rather than try 
> quickly again. The chance that the network load would reduce in 100 ms is 
> less than in 5s.
> Also, we probably do not need to have the retry logic unless we've already 
> skipped twice or if done=true.
> This could help reduce the network load on the coordinator for highly 
> concurrent workloads.
> The only drawback I see now is that the QueryExecSummary might be stale for a 
> while (which it would have anyway because the RPCs would have failed to send)
> P.S: This above proposed solution may need to change if we go ahead with 
> IMPALA-2990.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7657) Proper codegen for TupleIsNullPredicate, IsNotEmptyPredicate and ValidTupleId

2019-02-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763157#comment-16763157
 ] 

ASF subversion and git services commented on IMPALA-7657:
-

Commit 7707eb041728456ea5e05ae8acc5ab59c715b98a in impala's branch 
refs/heads/master from Andrew Sherman
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=7707eb0 ]

IMPALA-7657: Codegen IsNotEmptyPredicate and ValidTupleIdExpr.

These two classes evaluate scalar expressions. Previously codegen was
done by calling ScalarExpr::GetCodegendComputeFnWrapper which generates
a static method that calls the scalar expression evaluation method. Make
this more efficient by generating code which is customized using
information available at codegen time.

Add new cross-compiled files null-literal-ir.cc and slot-ref-ir.cc.

IsNotEmptyPredicate works by getting a CollectionVal object from the
single child Expr node, and counting its tuples. At codegen time we know
the type and value of the child node. Generate a call to a node-specific
non-virtual cross-compiled method to get the CollectionVal object from
the child. Then generate code that examines the CollectionVal and
returns an IntVal.

A ValidTupleIdExpr node contains a vector of tuple ids. It works by
probing each row for the tuple ids in the vector to find a non-null
tuple. At codegen time we know the vector of tuple ids. We unroll the
loop through the tuple ids, generating code that evaluates if the tuple
is non-null, and returns the tuple id if/when a non-null tuple is found.

IMPALA-7657 also requests replacing GetCodegendComputeFnWrapper() in
TupleIsNullPredicate. In the current Impala code this method is never
called. This is because TupleIsNullPredicate is always wrapped in an
IfExpr. This is always codegen'd by IfExpr's
GetCodegendComputeFnWrapper() method. There is a separate Jira
IMPALA-7655 to improve codegen of IfExpr.

Minor corrections:
  Correct the link to llvm tutorial in LlvmCodegen.

PERFORMANCE:
  I tested performance on a local mini-cluster. I wrote some
  pathological queries to test the new code. The new codegen'd code is
  very similar in performance. Both ValidTupleIdExpr and
  IsNotEmptyPredicate seem very slightly faster than the old code.
  Overall these changes are not purely for performance but to move away
  from GetCodegendComputeFnWrapper.

TESTING:
  The changed scalar expressions are well exercised by current tests.
  Ran exhaustive end-to-end tests.

Change-Id: Ifb87b9e3b879c278ce8638d97bcb320a7555a6b3
Reviewed-on: http://gerrit.cloudera.org:8080/12068
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Proper codegen for TupleIsNullPredicate, IsNotEmptyPredicate and ValidTupleId
> -
>
> Key: IMPALA-7657
> URL: https://issues.apache.org/jira/browse/IMPALA-7657
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Andrew Sherman
>Priority: Major
>  Labels: codegen, performance
>
> These utility functions use GetCodegendComputeFnWrapper() to call the 
> interpreted path, but we could instead codegen them into efficient code. We 
> could either use IRBuilder or, if possible, cross-compile the implementation 
> and substitute in constants.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7657) Proper codegen for TupleIsNullPredicate, IsNotEmptyPredicate and ValidTupleId

2019-02-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763156#comment-16763156
 ] 

ASF subversion and git services commented on IMPALA-7657:
-

Commit 7707eb041728456ea5e05ae8acc5ab59c715b98a in impala's branch 
refs/heads/master from Andrew Sherman
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=7707eb0 ]

IMPALA-7657: Codegen IsNotEmptyPredicate and ValidTupleIdExpr.

These two classes evaluate scalar expressions. Previously codegen was
done by calling ScalarExpr::GetCodegendComputeFnWrapper which generates
a static method that calls the scalar expression evaluation method. Make
this more efficient by generating code which is customized using
information available at codegen time.

Add new cross-compiled files null-literal-ir.cc and slot-ref-ir.cc.

IsNotEmptyPredicate works by getting a CollectionVal object from the
single child Expr node, and counting its tuples. At codegen time we know
the type and value of the child node. Generate a call to a node-specific
non-virtual cross-compiled method to get the CollectionVal object from
the child. Then generate code that examines the CollectionVal and
returns an IntVal.

A ValidTupleIdExpr node contains a vector of tuple ids. It works by
probing each row for the tuple ids in the vector to find a non-null
tuple. At codegen time we know the vector of tuple ids. We unroll the
loop through the tuple ids, generating code that evaluates if the tuple
is non-null, and returns the tuple id if/when a non-null tuple is found.

IMPALA-7657 also requests replacing GetCodegendComputeFnWrapper() in
TupleIsNullPredicate. In the current Impala code this method is never
called. This is because TupleIsNullPredicate is always wrapped in an
IfExpr. This is always codegen'd by IfExpr's
GetCodegendComputeFnWrapper() method. There is a separate Jira
IMPALA-7655 to improve codegen of IfExpr.

Minor corrections:
  Correct the link to llvm tutorial in LlvmCodegen.

PERFORMANCE:
  I tested performance on a local mini-cluster. I wrote some
  pathological queries to test the new code. The new codegen'd code is
  very similar in performance. Both ValidTupleIdExpr and
  IsNotEmptyPredicate seem very slightly faster than the old code.
  Overall these changes are not purely for performance but to move away
  from GetCodegendComputeFnWrapper.

TESTING:
  The changed scalar expressions are well exercised by current tests.
  Ran exhaustive end-to-end tests.

Change-Id: Ifb87b9e3b879c278ce8638d97bcb320a7555a6b3
Reviewed-on: http://gerrit.cloudera.org:8080/12068
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Proper codegen for TupleIsNullPredicate, IsNotEmptyPredicate and ValidTupleId
> -
>
> Key: IMPALA-7657
> URL: https://issues.apache.org/jira/browse/IMPALA-7657
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Andrew Sherman
>Priority: Major
>  Labels: codegen, performance
>
> These utility functions use GetCodegendComputeFnWrapper() to call the 
> interpreted path, but we could instead codegen them into efficient code. We 
> could either use IRBuilder or, if possible, cross-compile the implementation 
> and substitute in constants.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8174) test_catalog_restart failed with exception "Detected catalog service ID change"

2019-02-07 Thread Andrew Sherman (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763146#comment-16763146
 ] 

Andrew Sherman commented on IMPALA-8174:


Failing build was 
[https://master-02.jenkins.cloudera.com/job/impala-asf-master-exhaustive-centos6/252/]

 

> test_catalog_restart failed with exception "Detected catalog service ID 
> change"
> ---
>
> Key: IMPALA-8174
> URL: https://issues.apache.org/jira/browse/IMPALA-8174
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Michael Ho
>Assignee: Fredy Wijaya
>Priority: Blocker
>  Labels: broken-build
>
> test_catalog_restart failed with exception "Detected catalog service ID 
> change". Didn't dig too much into the details.
> cc'ing [~fredyw],[~bharathv],[~paul.rogers]
> {noformat}
> Error Message
> ImpalaBeeswaxException: ImpalaBeeswaxException: INNER EXCEPTION: <class 
> 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: CatalogException: Detected 
> catalog service ID change. Aborting updateCatalog()
> Stacktrace
> authorization/test_authorization.py:432: in test_catalog_restart
> self.role_cleanup(unique_role)
> authorization/test_authorization.py:438: in role_cleanup
> self.client.execute("drop role %s" % role_name)
> common/impala_connection.py:174: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:183: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:358: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:352: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:512: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
> E    MESSAGE: CatalogException: Detected catalog service ID change. Aborting 
> updateCatalog()
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-6854) TestExchangeDelays::test_exchange_small_delay()'s implementation doesn't match description

2019-02-07 Thread Joe McDonnell (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell closed IMPALA-6854.
-
Resolution: Won't Fix

This is not a useful change.

> TestExchangeDelays::test_exchange_small_delay()'s implementation doesn't 
> match description
> --
>
> Key: IMPALA-6854
> URL: https://issues.apache.org/jira/browse/IMPALA-6854
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Trivial
>
> test_exchange_small_delay() has this description:
> {code:java}
> def test_exchange_small_delay(self, vector):
>   """Test delays in registering data stream receivers where the first one or 
> two
>   batches will time out before the receiver registers, but subsequent batches 
> will
>   arrive after the receiver registers. Before IMPALA-2987, this scenario 
> resulted in
>   incorrect results.
>   """
> {code}
> However, the logs show that when the first sender times out, the query is 
> cancelled. In my manual tests, the query does not return a result. This test 
> case may be out of date. As it is, it overlaps with what 
> test_exchange_large_delay() is doing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6263) Assert hit during service restart Mutex.cpp:130: apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' failed

2019-02-07 Thread Andrew Sherman (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763140#comment-16763140
 ] 

Andrew Sherman commented on IMPALA-6263:


[~kwho] can you link to a failed build please?

> Assert hit during service restart Mutex.cpp:130: 
> apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' failed
> ---
>
> Key: IMPALA-6263
> URL: https://issues.apache.org/jira/browse/IMPALA-6263
> Project: IMPALA
>  Issue Type: Bug
>  Components: Distributed Exec
>Reporter: Mostafa Mokhtar
>Assignee: Andrew Sherman
>Priority: Major
> Attachments: 061ff302-918f-4a2a-000f0b96-29841f85.dmp
>
>
> On a large secure cluster, when the Impala service is restarted, core files 
> are generated.
> Found in impalad.ERR:
> impalad: src/thrift/concurrency/Mutex.cpp:130: 
> apache::thrift::concurrency::Mutex::impl::~impl(): Assertion `ret == 0' 
> failed.
> Wrote minidump to 
> /var/log/impala-minidumps/impalad/061ff302-918f-4a2a-000f0b96-29841f85.dmp
> The minidump is based off:
> {code}
>  Server version: impalad version 2.11.0-SNAPSHOT RELEASE (build 
> b9ccd44599f43776bce7838014cd99e4c76ddb9a)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-8174) test_catalog_restart failed with exception "Detected catalog service ID change"

2019-02-07 Thread Fredy Wijaya (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya reassigned IMPALA-8174:


Assignee: Fredy Wijaya

> test_catalog_restart failed with exception "Detected catalog service ID 
> change"
> ---
>
> Key: IMPALA-8174
> URL: https://issues.apache.org/jira/browse/IMPALA-8174
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 3.2.0
>Reporter: Michael Ho
>Assignee: Fredy Wijaya
>Priority: Blocker
>  Labels: broken-build
>
> test_catalog_restart failed with exception "Detected catalog service ID 
> change". Didn't dig too much into the details.
> cc'ing [~fredyw],[~bharathv],[~paul.rogers]
> {noformat}
> Error Message
> ImpalaBeeswaxException: ImpalaBeeswaxException: INNER EXCEPTION: <class 
> 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: CatalogException: Detected 
> catalog service ID change. Aborting updateCatalog()
> Stacktrace
> authorization/test_authorization.py:432: in test_catalog_restart
> self.role_cleanup(unique_role)
> authorization/test_authorization.py:438: in role_cleanup
> self.client.execute("drop role %s" % role_name)
> common/impala_connection.py:174: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:183: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:358: in __execute_query
> handle = self.execute_query_async(query_string, user=user)
> beeswax/impala_beeswax.py:352: in execute_query_async
> handle = self.__do_rpc(lambda: self.imp_service.query(query,))
> beeswax/impala_beeswax.py:512: in __do_rpc
> raise ImpalaBeeswaxException(self.__build_error_message(b), b)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> E    INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
> E    MESSAGE: CatalogException: Detected catalog service ID change. Aborting 
> updateCatalog()
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8174) test_catalog_restart failed with exception "Detected catalog service ID change"

2019-02-07 Thread Michael Ho (JIRA)
Michael Ho created IMPALA-8174:
--

 Summary: test_catalog_restart failed with exception "Detected 
catalog service ID change"
 Key: IMPALA-8174
 URL: https://issues.apache.org/jira/browse/IMPALA-8174
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Affects Versions: Impala 3.2.0
Reporter: Michael Ho


test_catalog_restart failed with exception "Detected catalog service ID 
change". Didn't dig too much into the details.

cc'ing [~fredyw],[~bharathv],[~paul.rogers]

{noformat}
Error Message
ImpalaBeeswaxException: ImpalaBeeswaxException: INNER EXCEPTION: <class 
'beeswaxd.ttypes.BeeswaxException'> MESSAGE: CatalogException: Detected 
catalog service ID change. Aborting updateCatalog()
Stacktrace
authorization/test_authorization.py:432: in test_catalog_restart
self.role_cleanup(unique_role)
authorization/test_authorization.py:438: in role_cleanup
self.client.execute("drop role %s" % role_name)
common/impala_connection.py:174: in execute
return self.__beeswax_client.execute(sql_stmt, user=user)
beeswax/impala_beeswax.py:183: in execute
handle = self.__execute_query(query_string.strip(), user=user)
beeswax/impala_beeswax.py:358: in __execute_query
handle = self.execute_query_async(query_string, user=user)
beeswax/impala_beeswax.py:352: in execute_query_async
handle = self.__do_rpc(lambda: self.imp_service.query(query,))
beeswax/impala_beeswax.py:512: in __do_rpc
raise ImpalaBeeswaxException(self.__build_error_message(b), b)
E   ImpalaBeeswaxException: ImpalaBeeswaxException:
E    INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'>
E    MESSAGE: CatalogException: Detected catalog service ID change. Aborting 
updateCatalog()
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8096) Limit on #rows returned from query

2019-02-07 Thread Pooja Nilangekar (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pooja Nilangekar resolved IMPALA-8096.
--
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> Limit on #rows returned from query
> --
>
> Key: IMPALA-8096
> URL: https://issues.apache.org/jira/browse/IMPALA-8096
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: resource-management
> Fix For: Impala 3.2.0
>
>
> Sometimes users accidentally run queries that return a large number of rows, 
> e.g.
> {code}
> SELECT * FROM table
> {code}
> when they really only need to look at a subset of the rows. It would be 
> useful to have a guardrail to fail queries that return more rows than a 
> particular limit. Maybe it would make sense to integrate with IMPALA-4268 so 
> that the query is failed when the buffer fills up, but it may also be useful 
> to have an easier-to-understand option based on #rows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-8171) stress test doesn't work against minicluster

2019-02-07 Thread Michael Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Brown resolved IMPALA-8171.
---
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> stress test doesn't work against minicluster
> 
>
> Key: IMPALA-8171
> URL: https://issues.apache.org/jira/browse/IMPALA-8171
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.2.0
>Reporter: Michael Brown
>Assignee: Michael Brown
>Priority: Critical
> Fix For: Impala 3.2.0
>
>
> {noformat}
>  $ tests/stress/concurrent_select.py --tpch-db tpch_parquet
> Traceback (most recent call last):
>   File "tests/stress/concurrent_select.py", line 2320, in 
> main()
>   File "tests/stress/concurrent_select.py", line 2159, in main
> if impala.find_stopped_impalads():
>   File "/home/mikeb/Impala/tests/comparison/cluster.py", line 557, in 
> find_stopped_impalads
> for idx, pid in enumerate(self.for_each_impalad(lambda i: i.find_pid())):
>   File "/home/mikeb/Impala/tests/comparison/cluster.py", line 611, in 
> for_each_impalad
> results = promise.get(maxint)
>   File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
> raise self._value
> ValueError: invalid literal for int() with base 10: '48601\n48624\n'
> {noformat}
> This is due to IMPALA-7999. The refactor causes additional processes to match 
> the process list.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-8171) stress test doesn't work against minicluster

2019-02-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763105#comment-16763105
 ] 

ASF subversion and git services commented on IMPALA-8171:
-

Commit 2fad3d08dd6d6c4ebc27f56d8a03e637a65fd29c in impala's branch 
refs/heads/master from Michael Brown
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2fad3d0 ]

IMPALA-8171: minicluster: use exec builtin to start Impala daemons

In IMPALA-7779 bin/start-daemon.sh was written to unify the place where
Impala daemons are started. It had a side-effect of keeping itself alive
as a parent process of the actual daemon. This broke the
tests.comparison.cluster abstraction used by the stress test and others
in that the process detection logic was finding false positives.

The old bin/start-*.sh scripts used the exec builtin to call their
daemons. Resurrect this as a simple fix; it also makes the ps list
shorter. Add a system test to catch this in the future.

Change-Id: I34ac8ce964f0c56287cee01360a4ba889f9568d7
Reviewed-on: http://gerrit.cloudera.org:8080/12391
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
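
For context, a minimal sketch of the difference the exec builtin makes (the
daemon path and argument handling are illustrative placeholders, not the
actual bin/start-daemon.sh logic):

{code}
import os
import subprocess
import sys

DAEMON = '/path/to/impalad'  # placeholder; not the real binary location

def start_without_exec(args):
    # The wrapper stays alive as the daemon's parent, so logic that scans
    # the ps list can match two processes where it expects one.
    subprocess.call([DAEMON] + args)

def start_with_exec(args):
    # The wrapper's process image is replaced by the daemon, so only the
    # daemon itself appears in the ps list.
    os.execv(DAEMON, [DAEMON] + args)

if __name__ == '__main__':
    start_with_exec(sys.argv[1:])
{code}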


> stress test doesn't work against minicluster
> 
>
> Key: IMPALA-8171
> URL: https://issues.apache.org/jira/browse/IMPALA-8171
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.2.0
>Reporter: Michael Brown
>Assignee: Michael Brown
>Priority: Critical
>
> {noformat}
>  $ tests/stress/concurrent_select.py --tpch-db tpch_parquet
> Traceback (most recent call last):
>   File "tests/stress/concurrent_select.py", line 2320, in 
> main()
>   File "tests/stress/concurrent_select.py", line 2159, in main
> if impala.find_stopped_impalads():
>   File "/home/mikeb/Impala/tests/comparison/cluster.py", line 557, in 
> find_stopped_impalads
> for idx, pid in enumerate(self.for_each_impalad(lambda i: i.find_pid())):
>   File "/home/mikeb/Impala/tests/comparison/cluster.py", line 611, in 
> for_each_impalad
> results = promise.get(maxint)
>   File "/usr/lib/python2.7/multiprocessing/pool.py", line 567, in get
> raise self._value
> ValueError: invalid literal for int() with base 10: '48601\n48624\n'
> {noformat}
> This is due to IMPALA-7999. The refactor causes additional processes to match 
> when scanning the process list.






[jira] [Commented] (IMPALA-8096) Limit on #rows returned from query

2019-02-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763104#comment-16763104
 ] 

ASF subversion and git services commented on IMPALA-8096:
-

Commit 6601327af6088def7d880940a5712719fe46acb2 in impala's branch 
refs/heads/master from poojanilangekar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=6601327 ]

IMPALA-8096: Add rows produced limit per query

This patch limits the number of rows produced by a query by
tracking it at the PlanRootSink level. When the
NUM_ROWS_PRODUCED_LIMIT is set, it cancels a query when its
execution produces more rows than the specified limit. This limit
only applies when the results are returned to a client, e.g. for a
SELECT query, but not an INSERT query.

Testing:
Added tests to query-resource-limits.test to verify that the rows
produced limit is honored.
Manually tested on various combinations of tables, fileformats
and ROWS_RETURNED_LIMIT values.

Change-Id: I7b22dbe130a368f4be1f3662a559eb9aae7f0c1d
Reviewed-on: http://gerrit.cloudera.org:8080/12328
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
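
As a client-side sketch of the new option (host, port, table name, and the
limit value below are placeholders, not part of the patch):

{code}
import impala.dbapi

conn = impala.dbapi.connect(host='localhost', port=21050)
cur = conn.cursor()
# Cap how many rows queries in this session may return to the client.
cur.execute('SET NUM_ROWS_PRODUCED_LIMIT=100000')
try:
    cur.execute('SELECT * FROM big_table')
    rows = cur.fetchall()
except Exception as e:
    # The query is cancelled once execution produces more rows than the limit.
    print(e)
finally:
    conn.close()
{code}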


> Limit on #rows returned from query
> --
>
> Key: IMPALA-8096
> URL: https://issues.apache.org/jira/browse/IMPALA-8096
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: resource-management
>
> Sometimes users accidentally run queries that return a large number of rows, 
> e.g.
> {code}
> SELECT * FROM table
> {code}
> when they really only need to look at a subset of the rows. It would be 
> useful to have a guardrail that fails queries returning more rows than a 
> particular limit. Maybe it would make sense to integrate with IMPALA-4268 so 
> that the query is failed when the buffer fills up, but it may also be useful 
> to have an easier-to-understand option based on #rows.






[jira] [Commented] (IMPALA-7779) Parquet Scanner can write binary data into profile

2019-02-07 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763106#comment-16763106
 ] 

ASF subversion and git services commented on IMPALA-7779:
-

Commit 2fad3d08dd6d6c4ebc27f56d8a03e637a65fd29c in impala's branch 
refs/heads/master from Michael Brown
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=2fad3d0 ]

IMPALA-8171: minicluster: use exec builtin to start Impala daemons

In IMPALA-7779 bin/start-daemon.sh was written to unify the place where
Impala daemons are started. It had a side-effect of keeping itself alive
as a parent process of the actual daemon. This broke the
tests.comparison.cluster abstraction used by the stress test and others
in that the process detection logic was finding false positives.

The old bin/start-*.sh scripts used the exec builtin to call their
daemons. Resurrect this as a simple fix; it also makes the ps list
shorter. Add a system test to catch this in the future.

Change-Id: I34ac8ce964f0c56287cee01360a4ba889f9568d7
Reviewed-on: http://gerrit.cloudera.org:8080/12391
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 


> Parquet Scanner can write binary data into profile
> --
>
> Key: IMPALA-7779
> URL: https://issues.apache.org/jira/browse/IMPALA-7779
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Lars Volker
>Priority: Major
>  Labels: profile
>
> In 
> [hdfs-parquet-scanner.cc:1224|https://github.com/apache/impala/blob/master/be/src/exec/hdfs-parquet-scanner.cc#L1224]
>  we log an invalid file version string. Whatever 4 bytes that pointer 
> points to will end up in the profile. These can be non-ASCII characters, thus 
> potentially breaking tools that parse the profiles and expect their content 
> to be plain text. We should either remove the bytes from the message, or 
> escape them as hex.






[jira] [Commented] (IMPALA-7265) Cache remote file handles

2019-02-07 Thread Joe McDonnell (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763093#comment-16763093
 ] 

Joe McDonnell commented on IMPALA-7265:
---

[~arodoni_cloudera] Right, this is not an incompatible change.

> Cache remote file handles
> -
>
> Key: IMPALA-7265
> URL: https://issues.apache.org/jira/browse/IMPALA-7265
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
> Fix For: Impala 3.2.0
>
>
> The file handle cache currently does not allow caching remote file handles. 
> This means that clusters that have a lot of remote reads can suffer from 
> overloading the NameNode. Impala should be able to cache remote file handles.
> There are some open questions about remote file handles and whether they 
> behave differently from local file handles. In particular:
>  # Is there any resource constraint on the number of remote file handles 
> open? (e.g. do they maintain a network connection?)
>  # Are there any semantic differences in how remote file handles behave when 
> files are deleted, overwritten, or appended?
>  # Are there any extra failure cases for remote file handles? (i.e. if a 
> machine goes down or a remote file handle is left open for an extended period 
> of time)
> The form of caching will depend on the answers, but at the very least, it 
> should be possible to cache a remote file handle at the level of a query so 
> that a Parquet file with multiple columns can share file handles.
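
To make the last point concrete, a toy sketch of query-scoped handle sharing
(class and method names are illustrative, not Impala's actual implementation):

{code}
class QueryFileHandleCache(object):
    """Shares one handle per file among all column readers of a query."""

    def __init__(self):
        self._handles = {}

    def get(self, path):
        # Readers of different columns in the same Parquet file reuse one
        # handle instead of re-opening the file (and re-hitting the NameNode).
        if path not in self._handles:
            self._handles[path] = open(path, 'rb')
        return self._handles[path]

    def close_all(self):
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()
{code}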






[jira] [Commented] (IMPALA-7265) Cache remote file handles

2019-02-07 Thread Alex Rodoni (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-7265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16763079#comment-16763079
 ] 

Alex Rodoni commented on IMPALA-7265:
-

[~joemcdonnell] Since cache_remote_file_handles is a new flag, this new 
behavior does not have to be categorized as an "incompatible change", right?

> Cache remote file handles
> -
>
> Key: IMPALA-7265
> URL: https://issues.apache.org/jira/browse/IMPALA-7265
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Joe McDonnell
>Assignee: Joe McDonnell
>Priority: Critical
> Fix For: Impala 3.2.0
>
>
> The file handle cache currently does not allow caching remote file handles. 
> This means that clusters that have a lot of remote reads can suffer from 
> overloading the NameNode. Impala should be able to cache remote file handles.
> There are some open questions about remote file handles and whether they 
> behave differently from local file handles. In particular:
>  # Is there any resource constraint on the number of remote file handles 
> open? (e.g. do they maintain a network connection?)
>  # Are there any semantic differences in how remote file handles behave when 
> files are deleted, overwritten, or appended?
>  # Are there any extra failure cases for remote file handles? (i.e. if a 
> machine goes down or a remote file handle is left open for an extended period 
> of time)
> The form of caching will depend on the answers, but at the very least, it 
> should be possible to cache a remote file handle at the level of a query so 
> that a Parquet file with multiple columns can share file handles.






[jira] [Created] (IMPALA-8173) run-workload.py KeyError on 'query_id'

2019-02-07 Thread Thomas Tauber-Marshall (JIRA)
Thomas Tauber-Marshall created IMPALA-8173:
--

 Summary: run-workload.py KeyError on 'query_id'
 Key: IMPALA-8173
 URL: https://issues.apache.org/jira/browse/IMPALA-8173
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 3.2.0
Reporter: Thomas Tauber-Marshall


A recent commit (IMPALA-7694) broke bin/run-workload.py by requiring that an 
ImpalaBeeswaxResult is constructed with a query_id available, which is violated 
in query_exec_functions.py.

We should fix this, and probably also add an automated test that runs 
run-workload.py to prevent regressions like this in the future.

[~lv]






[jira] [Assigned] (IMPALA-8173) run-workload.py KeyError on 'query_id'

2019-02-07 Thread Thomas Tauber-Marshall (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Tauber-Marshall reassigned IMPALA-8173:
--

Assignee: Thomas Tauber-Marshall

> run-workload.py KeyError on 'query_id'
> --
>
> Key: IMPALA-8173
> URL: https://issues.apache.org/jira/browse/IMPALA-8173
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.2.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Thomas Tauber-Marshall
>Priority: Major
>
> A recent commit (IMPALA-7694) broke bin/run-workload.py by requiring that an 
> ImpalaBeeswaxResult is constructed with a query_id available, which is 
> violated in query_exec_functions.py.
> We should fix this, and probably also add an automated test that runs 
> run-workload.py to prevent regressions like this in the future.
> [~lv]






[jira] [Updated] (IMPALA-7929) Impala query on HBASE table failing with InternalException: Required field*

2019-02-07 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated IMPALA-7929:
--
Component/s: Frontend

> Impala query on HBASE table failing with InternalException: Required field*
> ---
>
> Key: IMPALA-7929
> URL: https://issues.apache.org/jira/browse/IMPALA-7929
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 3.2.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> This looks like a corner-case bug at the Impala-HBase boundary.
> The way to reproduce:
> Create a table in the Hive shell:
> {code}
> create database abc;
> CREATE TABLE abc.test_hbase1 (k STRING, c STRING)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ('hbase.columns.mapping'=':key,cf:c', 'serialization.format'='1')
> TBLPROPERTIES ('hbase.table.name'='test_hbase1',
> 'storage_handler'='org.apache.hadoop.hive.hbase.HBaseStorageHandler');
> {code}
> Then issue a query from the Impala shell:
> {code}
> select * from abc.test_hbase1 where k != "row1"; 
> {code}
> Observe:
> {code}
> Query: select * from abc.test_hbase1 where k != "row1"
> Query submitted at: 2018-12-04 17:02:42 (Coordinator: http://xyz:25000)
> ERROR: InternalException: Required field 'qualifier' was not present! Struct: 
> THBaseFilter(family::key, qualifier:null, op_ordinal:3, filter_constant:row1)
> {code}
> More observations:
> # Replacing {{k != "row1"}} with {{k <> "row1"}} fails the same way. However, 
> replacing it with other operators, such as ">", "<", or "=", works.
> # Replacing {{k != "row1"}} with {{c != "row1"}} succeeds without the error 
> reported above.
> The above example uses a two-column table; creating a similar table with 
> three columns fails the same way: adding an inequality predicate on the first 
> column fails, while adding one on the other columns does not.
> The code that issues the error message is in HBase; it seems Impala did not 
> pass the needed info to HBase in this special case. I also wonder whether the 
> first column being the key of the HBase table is what reveals the bug.
> {code}
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnIncrement.java:
>   throw new org.apache.thrift.protocol.TProtocolException("Required field 
> 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnValue.java:
>   throw new org.apache.thrift.protocol.TProtocolException("Required field 
> 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java:
> throw new org.apache.thrift.protocol.TProtocolException("Required 
> field 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java:
> throw new org.apache.thrift.protocol.TProtocolException("Required 
> field 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java:
> throw new org.apache.thrift.protocol.TProtocolException("Required 
> field 'qualifier' was not present! Struct: " + toString());
> {code}






[jira] [Updated] (IMPALA-7929) Impala query on HBASE table failing with InternalException: Required field*

2019-02-07 Thread Yongjun Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongjun Zhang updated IMPALA-7929:
--
Affects Version/s: Impala 3.2.0

> Impala query on HBASE table failing with InternalException: Required field*
> ---
>
> Key: IMPALA-7929
> URL: https://issues.apache.org/jira/browse/IMPALA-7929
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.2.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>
> This looks like a corner-case bug at the Impala-HBase boundary.
> The way to reproduce:
> Create a table in the Hive shell:
> {code}
> create database abc;
> CREATE TABLE abc.test_hbase1 (k STRING, c STRING)
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES ('hbase.columns.mapping'=':key,cf:c', 'serialization.format'='1')
> TBLPROPERTIES ('hbase.table.name'='test_hbase1',
> 'storage_handler'='org.apache.hadoop.hive.hbase.HBaseStorageHandler');
> {code}
> Then issue a query from the Impala shell:
> {code}
> select * from abc.test_hbase1 where k != "row1"; 
> {code}
> Observe:
> {code}
> Query: select * from abc.test_hbase1 where k != "row1"
> Query submitted at: 2018-12-04 17:02:42 (Coordinator: http://xyz:25000)
> ERROR: InternalException: Required field 'qualifier' was not present! Struct: 
> THBaseFilter(family::key, qualifier:null, op_ordinal:3, filter_constant:row1)
> {code}
> More observations:
> # Replacing {{k != "row1"}} with {{k <> "row1"}} fails the same way. However, 
> replacing it with other operators, such as ">", "<", or "=", works.
> # Replacing {{k != "row1"}} with {{c != "row1"}} succeeds without the error 
> reported above.
> The above example uses a two-column table; creating a similar table with 
> three columns fails the same way: adding an inequality predicate on the first 
> column fails, while adding one on the other columns does not.
> The code that issues the error message is in HBase; it seems Impala did not 
> pass the needed info to HBase in this special case. I also wonder whether the 
> first column being the key of the HBase table is what reveals the bug.
> {code}
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnIncrement.java:
>   throw new org.apache.thrift.protocol.TProtocolException("Required field 
> 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/TColumnValue.java:
>   throw new org.apache.thrift.protocol.TProtocolException("Required field 
> 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java:
> throw new org.apache.thrift.protocol.TProtocolException("Required 
> field 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java:
> throw new org.apache.thrift.protocol.TProtocolException("Required 
> field 'qualifier' was not present! Struct: " + toString());
> hbase-thrift/src/main/java/org/apache/hadoop/hbase/thrift2/generated/THBaseService.java:
> throw new org.apache.thrift.protocol.TProtocolException("Required 
> field 'qualifier' was not present! Struct: " + toString());
> {code}






[jira] [Commented] (IMPALA-6032) Configuration knobs to automatically reject and fail queries

2019-02-07 Thread Sahil Takiar (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762940#comment-16762940
 ] 

Sahil Takiar commented on IMPALA-6032:
--

FWIW Hive has a few similar configuration options to do this (not advocating 
that this is the right approach for Impala):
 * hive.strict.checks.orderby.no.limit
 ** Default: false
 ** Description: Rejects queries with an order by, but no limit
 * hive.strict.checks.no.partition.filter
 ** Default: false
 ** Description: Rejects queries that do a full table scan against a 
partitioned table
 * hive.strict.checks.cartesian.product
 ** Default: false
 ** Description: Rejects queries with cross joins
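
As a rough illustration of how such a guardrail surfaces to a client, a sketch
against HiveServer2 (connection details and table names are placeholders, and
the exact error text depends on the Hive version):

{code}
from pyhive import hive

conn = hive.connect(host='localhost', port=10000)
cur = conn.cursor()
# Enable the strict check, then submit a query it is meant to reject.
cur.execute('SET hive.strict.checks.cartesian.product=true')
try:
    cur.execute('SELECT * FROM t1, t2')  # cross join with no join condition
except Exception as e:
    # The query is rejected at compile time instead of being executed.
    print(e)
{code}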

> Configuration knobs to automatically reject and fail queries
> 
>
> Key: IMPALA-6032
> URL: https://issues.apache.org/jira/browse/IMPALA-6032
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Distributed Exec
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: admission-control, resource-management
>
> Umbrella JIRA for Admission control enhancements.
> Query options would be set on a resource pool basis. 






[jira] [Updated] (IMPALA-7504) ParseKerberosPrincipal() should use krb5_parse_name() instead

2019-02-07 Thread Michael Ho (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Ho updated IMPALA-7504:
---
Target Version: Product Backlog  (was: Impala 3.2.0)

> ParseKerberosPrincipal() should use krb5_parse_name() instead
> -
>
> Key: IMPALA-7504
> URL: https://issues.apache.org/jira/browse/IMPALA-7504
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Security
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Michael Ho
>Priority: Minor
>  Labels: ramp-up
>
> [~tlipcon] pointed out during code review that we should be using 
> krb5_parse_name() to parse the principal instead of creating our own.
> bq. I wonder whether we should just be using krb5_parse_name here instead of 
> implementing our own parsing? According to 
> [http://web.mit.edu/kerberos/krb5-1.15/doc/appdev/refs/api/krb5_parse_name.html]
>  there are various escapings, etc, that this function isn't currently 
> supporting.
> We currently do the following to parse the principal:
> {noformat}
>   vector<string> names;
>   split(names, principal, is_any_of("/"));
>   if (names.size() != 2) return Status(TErrorCode::BAD_PRINCIPAL_FORMAT, 
> principal);
>   *service_name = names[0];
>   string remaining_principal = names[1];
>   split(names, remaining_principal, is_any_of("@"));
>   if (names.size() != 2) return Status(TErrorCode::BAD_PRINCIPAL_FORMAT, 
> principal);
> {noformat}
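
A small sketch of why hand-rolled splitting falls short (the principal string
is contrived; krb5 allows escaping special characters inside principal
components):

{code}
# A principal whose first component contains an escaped slash: "imp/ala".
principal = r'imp\/ala/host.example.com@EXAMPLE.COM'

# Naive parsing, like the snippet above, also splits at the escaped slash:
parts = principal.split('/')
print(len(parts))  # 3, so a "size() != 2" check wrongly rejects the principal

# krb5_parse_name() understands the escaping and would yield the components
# ('imp/ala', 'host.example.com') in realm 'EXAMPLE.COM'.
{code}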






[jira] [Resolved] (IMPALA-8169) update some random query generator infra settings

2019-02-07 Thread Michael Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Brown resolved IMPALA-8169.
---
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> update some random query generator infra settings
> -
>
> Key: IMPALA-8169
> URL: https://issues.apache.org/jira/browse/IMPALA-8169
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 3.2.0
>Reporter: Michael Brown
>Assignee: Michael Brown
>Priority: Major
> Fix For: Impala 3.2.0
>
>
> I've been running the random query generator in a downstream environment for 
> a long time and would like to update a few variables and other things that I 
> think would benefit others.






[jira] [Updated] (IMPALA-7284) Automatically determine shutdown grace period based on configuration

2019-02-07 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-7284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7284:
--
Target Version: Product Backlog  (was: Impala 3.2.0)

> Automatically determine shutdown grace period based on configuration
> 
>
> Key: IMPALA-7284
> URL: https://issues.apache.org/jira/browse/IMPALA-7284
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: admission-control, resource-management
>
> Following on from IMPALA-1760, we should improve usability by allowing 
> automatic configuration of the minimum quiesce period to match admission 
> control configs, etc.






[jira] [Updated] (IMPALA-6590) Disable expr rewrites and codegen for VALUES() statements

2019-02-07 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6590:
--
Issue Type: Bug  (was: Improvement)

> Disable expr rewrites and codegen for VALUES() statements
> -
>
> Key: IMPALA-6590
> URL: https://issues.apache.org/jira/browse/IMPALA-6590
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0
>Reporter: Alexander Behm
>Priority: Major
>  Labels: perf, planner, ramp-up, regression
>
> The analysis of statements with big VALUES clauses like INSERT INTO <tbl> 
> VALUES is slow due to expression rewrites like constant folding. The 
> performance of such statements has regressed since the introduction of expr 
> rewrites and constant folding in IMPALA-1788.
> We should skip expr rewrites for VALUES altogether since it mostly provides 
> no benefit but can have a large overhead due to evaluation of expressions in 
> the backend (constant folding). These expressions are ultimately evaluated 
> and materialized in the backend anyway, so there's no point in folding them 
> during analysis.
> Similarly, there is no point in doing codegen for these exprs in the backend 
> union node.
> *Workaround*
> {code}
> SET ENABLE_EXPR_REWRITES=FALSE;
> SET DISABLE_CODEGEN=TRUE;
> {code}






[jira] [Updated] (IMPALA-5063) Enable monitoring of Admission Control queue information

2019-02-07 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-5063:
--
Target Version: Product Backlog  (was: Impala 3.2.0)

> Enable monitoring of Admission Control queue information
> 
>
> Key: IMPALA-5063
> URL: https://issues.apache.org/jira/browse/IMPALA-5063
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.9.0
>Reporter: Miklos Szurap
>Priority: Major
>  Labels: admission-control, resource-management, supportability
>
> It would be nice if we could track the Admission Control / queue information 
> from the StateStore WebUI. 
> The topics page just shows a summary about "impala-request-queue" but nothing 
> on the details of the queues / number of queries / mem usage.
> Besides showing this on the WebUI, it would be nice to have it logged, so 
> there would be some kind of historical view. 
> These would make it possible to track down issues when a query is rejected 
> due to admission control.






[jira] [Updated] (IMPALA-6596) Query failed with OOM on coordinator while remote fragments on other nodes continue to run

2019-02-07 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-6596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6596:
--
Target Version: Product Backlog  (was: Impala 2.13.0, Impala 3.2.0)

> Query failed with OOM on coordinator while remote fragments on other nodes 
> continue to run
> --
>
> Key: IMPALA-6596
> URL: https://issues.apache.org/jira/browse/IMPALA-6596
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.11.0
>Reporter: Mostafa Mokhtar
>Priority: Critical
> Attachments: RowSizeFailureProfile.txt
>
>
> This is somewhat similar to IMPALA-2990.
> Query 
> {code:java}
> set NUM_SCANNER_THREADS=2;
> set MAX_ROW_SIZE=4194304;
> with cte as (
>   select group_concat(cast(fnv_hash(concat(o_comment, 'd')) as string)) as c1,
>          group_concat(cast(fnv_hash(concat(o_comment, 'e')) as string)) as c2
>   from orders where o_orderkey <12 and o_orderdate <"1993-01-01"
>   union all
>   select group_concat(cast(fnv_hash(concat(o_comment, 'd')) as string)) as c1,
>          group_concat(cast(fnv_hash(concat(o_comment, 'e')) as string)) as c2
>   from orders where o_orderkey <12 and o_orderdate <"1993-01-01"),
> cte2 as (select c1, c2, s_suppkey from cte, supplier)
> select count(*) from cte2 t1, cte2 t2
> where t1.s_suppkey = t2.s_suppkey
> group by t1.c1, t1.c2, t2.c1, t2.c2
> having count(*) = 1
> {code}
> Failed on coordinator node which is also an executor with 
> {code:java}
> Status: Row of size 1.82 GB could not be materialized in plan node with 
> id 14. Increase the max_row_size query option (currently 4.00 MB) to process 
> larger rows.
> {code}
> Log on the coordinator has lots of entries with 
> {code:java}
> I0227 19:20:58.057637 62974 impala-server.cc:1196] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 9b439fc1ee1addb7:82d41569
> I0227 19:20:58.152979 63129 impala-server.cc:1196] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 9b439fc1ee1addb7:82d41569
> I0227 19:20:58.714336 63930 impala-server.cc:1196] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 9b439fc1ee1addb7:82d41569
> I0227 19:20:58.718415 63095 impala-server.cc:1196] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 9b439fc1ee1addb7:82d41569
> I0227 19:20:58.757306 63339 impala-server.cc:1196] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 9b439fc1ee1addb7:82d41569
> I0227 19:20:58.762310 63406 impala-server.cc:1196] ReportExecStatus(): Received report for unknown query ID (probably closed or cancelled): 9b439fc1ee1addb7:82d41569
> {code}
> From the memz tab on a different node. 
> h2. Memory Usage
> Memory consumption / limit: *142.87 GB* / *100.00 GB*
> h3. Breakdown
> {code}
> Process: memory limit exceeded. Limit=100.00 GB Total=141.05 GB Peak=144.25 GB
>   Buffer Pool: Free Buffers: Total=0
>   Buffer Pool: Clean Pages: Total=0
>   Buffer Pool: Unused Reservation: Total=-118.00 MB
>   TCMalloc Overhead: Total=1.68 GB
>   RequestPool=root.default: Total=40.10 GB Peak=89.40 GB
>     Query(9b439fc1ee1addb7:82d41569): Reservation=122.00 MB ReservationLimit=80.00 GB OtherMemory=39.98 GB Total=40.10 GB Peak=40.10 GB
>       Unclaimed reservations: Reservation=72.00 MB OtherMemory=0 Total=72.00 MB Peak=226.00 MB
>       Fragment 9b439fc1ee1addb7:82d415690059: Reservation=0 OtherMemory=0 Total=0 Peak=632.88 KB
>         AGGREGATION_NODE (id=29): Total=0 Peak=76.12 KB
>         EXCHANGE_NODE (id=28): Reservation=0 OtherMemory=0 Total=0 Peak=0
>           DataStreamRecvr: Total=0 Peak=0
>         DataStreamSender (dst_id=30): Total=0 Peak=1.75 KB
>         CodeGen: Total=0 Peak=547.00 KB
>       Fragment 9b439fc1ee1addb7:82d415690050: Reservation=0 OtherMemory=0 Total=0 Peak=3.67 GB
>         AGGREGATION_NODE (id=15): Total=0 Peak=76.12 KB
>         HASH_JOIN_NODE (id=14): Reservation=0 OtherMemory=0 Total=0 Peak=1.85 GB
>           Hash Join Builder (join_node_id=14): Total=0 Peak=13.12 KB
>         EXCHANGE_NODE (id=26): Reservation=0 OtherMemory=0 Total=0 Peak=1.82 GB
>           DataStreamRecvr: Total=0 Peak=1.82 GB
>         EXCHANGE_NODE (id=27): Reservation=0 OtherMemory=0 Total=0 Peak=1.82 GB
>           DataStreamRecvr: Total=0 Peak=1.82 GB
>         DataStreamSender (dst_id=28): Total=0 Peak=15.75 KB
>         CodeGen: Total=0 Peak=1.81 MB
>       Fragment 9b439fc1ee1addb7:82d415690023: Reservation=26.00 MB OtherMemory=19.99 GB Total=20.02 GB Peak=20.02 GB
>         Runtime Filter Bank: Reservation=2.00 MB ReservationLimit=2.00 MB OtherMemory=0 Total=2.00 MB Peak=2.00 MB
>         NESTED_LOOP_JOIN_NODE (id=6): Total=3.63 GB Peak=3.63 GB
>           Nested Loop Join Builder: Total=3.63 GB 

[jira] [Assigned] (IMPALA-5043) Flag when Impala daemon is disconnected from statestore

2019-02-07 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-5043:
-

Assignee: Tim Armstrong

> Flag when Impala daemon is disconnected from statestore
> ---
>
> Key: IMPALA-5043
> URL: https://issues.apache.org/jira/browse/IMPALA-5043
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Thomas Scott
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: admission-control, resource-management, supportability
>
> When (for whatever reason) one or more daemons are disconnected from the 
> statestore, the admission control data held on the daemon goes stale. This 
> can lead to the daemon accepting queries when there is no capacity or 
> rejecting queries when there is capacity. 
> For example, a pool somepool has a limit of 10 concurrent queries and is at 
> that limit when a daemon is disconnected from the statestore. Even when other 
> queries in somepool finish and the pool is now empty, the disconnected daemon 
> will report the following when new queries are executed:
> ERROR: Admission for query exceeded timeout 60000ms. Queued reason: number of 
> running queries 10 is over limit 10
> Could we have some warning to say that the admission control data is stale 
> here?






[jira] [Updated] (IMPALA-5043) Admission control error messages don't hint that information is stale when disconnected from statestore

2019-02-07 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-5043:
--
Summary: Admission control error messages don't hint that information is 
stale when disconnected from statestore  (was: Admission control error messages 
don't)

> Admission control error messages don't hint that information is stale when 
> disconnected from statestore
> ---
>
> Key: IMPALA-5043
> URL: https://issues.apache.org/jira/browse/IMPALA-5043
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Thomas Scott
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: admission-control, resource-management, supportability
>
> When (for whatever reason) one or more daemons are disconnected from the 
> statestore, the admission control data held on the daemon goes stale. This 
> can lead to the daemon accepting queries when there is no capacity or 
> rejecting queries when there is capacity. 
> For example, a pool somepool has a limit of 10 concurrent queries and is at 
> that limit when a daemon is disconnected from the statestore. Even when other 
> queries in somepool finish and the pool is now empty, the disconnected daemon 
> will report the following when new queries are executed:
> ERROR: Admission for query exceeded timeout 60000ms. Queued reason: number of 
> running queries 10 is over limit 10
> Could we have some warning to say that the admission control data is stale 
> here?






[jira] [Updated] (IMPALA-5043) Flag when Impala daemon is disconnected from statestore

2019-02-07 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-5043:
--
Issue Type: Bug  (was: Improvement)

> Flag when Impala daemon is disconnected from statestore
> ---
>
> Key: IMPALA-5043
> URL: https://issues.apache.org/jira/browse/IMPALA-5043
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Thomas Scott
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: admission-control, resource-management, supportability
>
> When (for whatever reason) one or more daemons are disconnected from the 
> statestore, the admission control data held on the daemon goes stale. This 
> can lead to the daemon accepting queries when there is no capacity or 
> rejecting queries when there is capacity. 
> For example, a pool somepool has a limit of 10 concurrent queries and is at 
> that limit when a daemon is disconnected from the statestore. Even when other 
> queries in somepool finish and the pool is now empty, the disconnected daemon 
> will report the following when new queries are executed:
> ERROR: Admission for query exceeded timeout 60000ms. Queued reason: number of 
> running queries 10 is over limit 10
> Could we have some warning to say that the admission control data is stale 
> here?






[jira] [Updated] (IMPALA-5043) Admission control error messages don't

2019-02-07 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-5043:
--
Issue Type: Improvement  (was: Bug)

> Admission control error messages don't
> --
>
> Key: IMPALA-5043
> URL: https://issues.apache.org/jira/browse/IMPALA-5043
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Thomas Scott
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: admission-control, resource-management, supportability
>
> When (for whatever reason) one or more daemons are disconnected from the 
> statestore, the admission control data held on the daemon goes stale. This 
> can lead to the daemon accepting queries when there is no capacity or 
> rejecting queries when there is capacity. 
> For example, a pool somepool has a limit of 10 concurrent queries and is at 
> that limit when a daemon is disconnected from the statestore. Even when other 
> queries in somepool finish and the pool is now empty, the disconnected daemon 
> will report the following when new queries are executed:
> ERROR: Admission for query exceeded timeout 60000ms. Queued reason: number of 
> running queries 10 is over limit 10
> Could we have some warning to say that the admission control data is stale 
> here?






[jira] [Updated] (IMPALA-5043) Admission control error messages don't

2019-02-07 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-5043:
--
Summary: Admission control error messages don't  (was: Flag when Impala 
daemon is disconnected from statestore)

> Admission control error messages don't
> --
>
> Key: IMPALA-5043
> URL: https://issues.apache.org/jira/browse/IMPALA-5043
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.6.0
>Reporter: Thomas Scott
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: admission-control, resource-management, supportability
>
> When (for whatever reason) one or more daemons are disconnected from the 
> statestore, the admission control data held on the daemon goes stale. This 
> can lead to the daemon accepting queries when there is no capacity or 
> rejecting queries when there is capacity. 
> For example, a pool somepool has a limit of 10 concurrent queries and is at 
> that limit when a daemon is disconnected from the statestore. Even when other 
> queries in somepool finish and the pool is now empty, the disconnected daemon 
> will report the following when new queries are executed:
> ERROR: Admission for query exceeded timeout 60000ms. Queued reason: number of 
> running queries 10 is over limit 10
> Could we have some warning to say that the admission control data is stale 
> here?






[jira] [Assigned] (IMPALA-5063) Periodically log admission control queue info

2019-02-07 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-5063:
-

Assignee: Bikramjeet Vig

> Periodically log admission control queue info
> -
>
> Key: IMPALA-5063
> URL: https://issues.apache.org/jira/browse/IMPALA-5063
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.9.0
>Reporter: Miklos Szurap
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: admission-control, resource-management, supportability
>
> It would be nice if we could track the Admission Control / queue information 
> from the StateStore WebUI. 
> The topics page just shows a summary about "impala-request-queue" but nothing 
> on the details of the queues / number of queries / mem usage.
> Besides showing this on the WebUI, it would be nice to have it logged, so 
> there would be some kind of historical view. 
> These would make it possible to track down issues when a query is rejected 
> due to admission control.






[jira] [Updated] (IMPALA-5063) Periodically log admission control queue info

2019-02-07 Thread Tim Armstrong (JIRA)


 [ 
https://issues.apache.org/jira/browse/IMPALA-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-5063:
--
Summary: Periodically log admission control queue info  (was: Enable 
monitoring of Admission Control queue information)

> Periodically log admission control queue info
> -
>
> Key: IMPALA-5063
> URL: https://issues.apache.org/jira/browse/IMPALA-5063
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.9.0
>Reporter: Miklos Szurap
>Priority: Major
>  Labels: admission-control, resource-management, supportability
>
> It would be nice if we could track the Admission Control / queue information 
> from the StateStore WebUI. 
> The topics page just shows a summary about "impala-request-queue" but nothing 
> on the details of the queues / number of queries / mem usage.
> Besides showing this on the WebUI, it would be nice to have it logged, so 
> there would be some kind of historical view. 
> These would make it possible to track down issues when a query is rejected 
> due to admission control.






[jira] [Commented] (IMPALA-5063) Periodically log admission control queue info

2019-02-07 Thread Tim Armstrong (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-5063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762881#comment-16762881
 ] 

Tim Armstrong commented on IMPALA-5063:
---

IMPALA-8092 added this to the coordinator web UI. We're missing historical 
logging.

> Periodically log admission control queue info
> -
>
> Key: IMPALA-5063
> URL: https://issues.apache.org/jira/browse/IMPALA-5063
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.9.0
>Reporter: Miklos Szurap
>Priority: Major
>  Labels: admission-control, resource-management, supportability
>
> It would be nice if we could track the Admission Control / queue information 
> from the StateStore WebUI. 
> The topics page just shows a summary about "impala-request-queue" but nothing 
> on the details of the queues / number of queries / mem usage.
> Besides showing this on the WebUI, it would be nice to have it logged, so 
> there would be some kind of historical view. 
> These would make it possible to track down issues when a query is rejected 
> due to admission control.






[jira] [Comment Edited] (IMPALA-8159) TAcceptQueueServer: Caught TException: invalid sasl status

2019-02-07 Thread Jinjie Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762696#comment-16762696
 ] 

Jinjie Zhang edited comment on IMPALA-8159 at 2/7/19 2:02 PM:
--

This only happens when SASL authentication is enabled.

I have found the reason for the invalid sasl status: it is caused by a 
misconfigured Tableau client. While I run the test code, there is a Tableau 
connection to the Impala service that the Tableau server periodically 
refreshes. That client is configured to use the HTTP authentication protocol 
against port 21050. Once the refresh happens, the impalad thrift server 
reports an invalid sasl status and all thrift clients hang.

What I don't understand is why one misconfigured client causes all clients to 
hang for several minutes, which may be a bad design. I checked the impalad 
TAcceptQueueServer source code, which seems to use one thread to handle 
authentication connections.


was (Author: jinjie zhang):
This only happens when authenticating against LDAP.

I have found the reason for the invalid sasl status: it is caused by a 
misconfigured Tableau client. While I run the test code, there is a Tableau 
connection to the Impala service that the Tableau server periodically 
refreshes. That client is configured to use the HTTP authentication protocol 
against port 21050. Once the refresh happens, the impalad thrift server 
reports an invalid sasl status and all thrift clients hang.

What I don't understand is why one misconfigured client causes all clients to 
hang for several minutes, which may be a bad design. I checked the impalad 
TAcceptQueueServer source code, which seems to use one thread to handle 
authentication connections.
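
A rough sketch of the head-of-line blocking described above (function names
and the timeout are illustrative; this is not the actual TAcceptQueueServer
code):

{code}
import socket

def do_sasl_handshake(client_socket):
    client_socket.settimeout(300)  # e.g. a multi-minute negotiation timeout
    client_socket.recv(4)          # read the first SASL status bytes, etc.

def hand_off_to_worker_thread(client_socket):
    pass  # dispatch the authenticated connection to a per-connection worker

def accept_loop(server_socket):
    while True:
        client, _ = server_socket.accept()
        try:
            # Because the blocking handshake runs on the single accept
            # thread, one stalled or misbehaving client delays every
            # connection queued behind it until the handshake times out.
            do_sasl_handshake(client)
        except (socket.error, socket.timeout):
            client.close()
            continue
        hand_off_to_worker_thread(client)
{code}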

> TAcceptQueueServer: Caught TException: invalid sasl status
> --
>
> Key: IMPALA-8159
> URL: https://issues.apache.org/jira/browse/IMPALA-8159
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.11.0
> Environment: python: 3.6.6
> impyla: 0.14.1
> thrift-sasl: https://github.com/yimian/thrift_sasl.git
> impala: 2.11.0 from cdh5.14.2-1.cdh5.14.2.p0.3
> os: CentOS Linux release 7.2.1511 (Core)
>Reporter: Jinjie Zhang
>Priority: Critical
>
> I enabled LDAP user authentication for impalad. I ran the following code to 
> test impalad stability, and found that over ten thousand iterations the 
> following error occurred twice in the impalad.INFO log file.
> test code:
> {code:java}
> import impala.dbapi
> 
> test_conf = {'host': '172.16.24.xx', 'port': 21050, 'user': '', 'password': 'x'}
> 
> def test_impala(conf):
>     # Open and immediately close a connection to exercise authentication.
>     conn = impala.dbapi.connect(**conf)
>     conn.close()
> 
> N = 10000  # "ten thousand times", per the description above
> for i in range(N):
>     test_impala(test_conf){code}
>  
> error info:
> {code:java}
> I0204 15:41:40.600821 168276 thrift-util.cc:123] TAcceptQueueServer: Caught 
> TException: invalid sasl status
> I0204 15:46:40.600451 168276 thrift-util.cc:123] TAcceptQueueServer: Caught 
> TException: EAGAIN (timed out)
> I0204 15:46:40.609858 168276 authentication.cc:268] Trying simple LDAP bind 
> for: uid=,dc=
> I0204 15:46:40.729262 168276 authentication.cc:280] LDAP bind successful
> I0204 15:46:40.729287 168276 authentication.cc:478] Successfully 
> authenticated client user ""{code}
> When this error occurs, the connect request hangs for several minutes, and 
> all subsequent connect requests, including requests from Hue, hang too.
>   






[jira] [Commented] (IMPALA-8159) TAcceptQueueServer: Caught TException: invalid sasl status

2019-02-07 Thread Jinjie Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762696#comment-16762696
 ] 

Jinjie Zhang commented on IMPALA-8159:
--

This only happens when authenticating against LDAP.

I have found the reason for the invalid sasl status: it is caused by a 
misconfigured Tableau client. While I run the test code, there is a Tableau 
connection to the Impala service that the Tableau server periodically 
refreshes. That client is configured to use the HTTP authentication protocol 
against port 21050. Once the refresh happens, the impalad thrift server 
reports an invalid sasl status and all thrift clients hang.

What I don't understand is why one misconfigured client causes all clients to 
hang for several minutes, which may be a bad design. I checked the impalad 
TAcceptQueueServer source code, which seems to use one thread to handle 
authentication connections.

> TAcceptQueueServer: Caught TException: invalid sasl status
> --
>
> Key: IMPALA-8159
> URL: https://issues.apache.org/jira/browse/IMPALA-8159
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.11.0
> Environment: python: 3.6.6
> impyla: 0.14.1
> thrift-sasl: https://github.com/yimian/thrift_sasl.git
> impala: 2.11.0 from cdh5.14.2-1.cdh5.14.2.p0.3
> os: CentOS Linux release 7.2.1511 (Core)
>Reporter: Jinjie Zhang
>Priority: Critical
>
> I enabled LDAP user authentication for impalad. I ran the following code to 
> test impalad stability, and found that over ten thousand iterations the 
> following error occurred twice in the impalad.INFO log file.
> test code:
> {code:java}
> import impala.dbapi
> 
> test_conf = {'host': '172.16.24.xx', 'port': 21050, 'user': '', 'password': 'x'}
> 
> def test_impala(conf):
>     # Open and immediately close a connection to exercise authentication.
>     conn = impala.dbapi.connect(**conf)
>     conn.close()
> 
> N = 10000  # "ten thousand times", per the description above
> for i in range(N):
>     test_impala(test_conf){code}
>  
> error info:
> {code:java}
> I0204 15:41:40.600821 168276 thrift-util.cc:123] TAcceptQueueServer: Caught 
> TException: invalid sasl status
> I0204 15:46:40.600451 168276 thrift-util.cc:123] TAcceptQueueServer: Caught 
> TException: EAGAIN (timed out)
> I0204 15:46:40.609858 168276 authentication.cc:268] Trying simple LDAP bind 
> for: uid=,dc=
> I0204 15:46:40.729262 168276 authentication.cc:280] LDAP bind successful
> I0204 15:46:40.729287 168276 authentication.cc:478] Successfully 
> authenticated client user ""{code}
> When this error occurs, the connect request hangs for several minutes, and 
> all subsequent connect requests, including requests from Hue, hang too.
>   






[jira] [Commented] (IMPALA-8170) Impala Doc: Add the Special Considerations for running SSL/TLS and a proxy

2019-02-07 Thread Alex Moundalexis (JIRA)


[ 
https://issues.apache.org/jira/browse/IMPALA-8170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16762689#comment-16762689
 ] 

Alex Moundalexis commented on IMPALA-8170:
--

Proposed a small addition to the Doc.

> Impala Doc: Add the Special Considerations for running SSL/TLS and a proxy
> --
>
> Key: IMPALA-8170
> URL: https://issues.apache.org/jira/browse/IMPALA-8170
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>



