[jira] [Commented] (IMPALA-7044) int32 overflow in HdfsTableSink::CreateNewTmpFile()

2018-05-21 Thread Lars Volker (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482673#comment-16482673
 ] 

Lars Volker commented on IMPALA-7044:
-

[~tarmstrong] - I agree. We currently support 10922 columns (2GB / (3 * 64KB)) 
because we aim to have space for 3 pages per column.

The [Impala Cookbook mentions on Slide 
11|https://www.slideshare.net/cloudera/the-impala-cookbook-42530186] a limit of 
2k columns and points out that the sizes of metadata updates and column stats 
can also become an issue with very wide tables.

> int32 overflow in HdfsTableSink::CreateNewTmpFile()
> ---
>
> Key: IMPALA-7044
> URL: https://issues.apache.org/jira/browse/IMPALA-7044
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, 
> Impala 2.12.0, Impala 2.13.0
>Reporter: Lars Volker
>Priority: Critical
>  Labels: parquet
> Attachments: ct.sql
>
>
> When writing Parquet files we compute a minimum block size based on the 
> number of columns in the target table in 
> [hdfs-parquet-table-writer.cc:916|https://github.com/apache/impala/blob/master/be/src/exec/hdfs-parquet-table-writer.cc?utf8=%E2%9C%93#L916]:
> {noformat}
> 3 * DEFAULT_DATA_PAGE_SIZE * columns_.size();
> {noformat}
> For tables with a large number of columns (> ~10k), this value will get 
> larger than 2GB. When we pass it to {{hdfsOpenFile()}} in 
> {{HdfsTableSink::CreateNewTmpFile()}} it gets cast to a signed int32 and can 
> overflow.
> This leads to error messages like the following:
> {noformat}
> I0516 16:13:52.755090 24257 status.cc:125] Failed to open HDFS file for 
> writing: 
> hdfs://localhost:20500/test-warehouse/lv.db/a/_impala_insert_staging/3c417cb973b710ab_803e8980/.3c417cb973b710ab-803e8980_411033576_dir/3c417cb973b710ab-803e8980_271567064_data.0.parq
> Error(255): Unknown error 255
> Root cause: RemoteException: Specified block size is less than configured 
> minimum value (dfs.namenode.fs-limits.min-block-size): -1935671296 < 1024
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2417)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2339)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:764)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:451)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> @  0x187b8b3  impala::Status::Status()
> @  0x1fade89  impala::HdfsTableSink::CreateNewTmpFile()
> @  0x1faeee7  impala::HdfsTableSink::InitOutputPartition()
> @  0x1fb1389  impala::HdfsTableSink::GetOutputPartition()
> @  0x1faf34a  impala::HdfsTableSink::Send()
> @  0x1c91bcd  impala::FragmentInstanceState::ExecInternal()
> @  0x1c8efa5  impala::FragmentInstanceState::Exec()
> @  0x1c9e53f  impala::QueryState::ExecFInstance()
> @  0x1c9cdb2  
> _ZZN6impala10QueryState15StartFInstancesEvENKUlvE_clEv
> @  0x1c9f25d  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala10QueryState15StartFInstancesEvEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x1bd6cd4  boost::function0<>::operator()()
> @  0x1ec18f9  impala::Thread::SuperviseThread()
> @  0x1ec9a95  boost::_bi::list5<>::operator()<>()
> @  0x1ec99b9  boost::_bi::bind_t<>::operator()()
> @  0x1ec997c  boost::detail::thread_data<>::run()
> @  0x31a527a  thread_proxy
> @ 0x7f30246a8184  start_thread
> @ 0x7f30243d503d  clone
> {noformat}
> The signature of {{hdfsOpenFile()}} is as follows:
> {noformat}
> hdfsFile hdfsOpenFile(hdfsFS fs, const char* path, int flags, int bufferSize, 
> short replication, tSize blocksize);
> {noformat}
> {{tSize}} is typedef'd to {{int32_t}}.

[jira] [Commented] (IMPALA-7044) int32 overflow in HdfsTableSink::CreateNewTmpFile()

2018-05-21 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482642#comment-16482642
 ] 

Tim Armstrong commented on IMPALA-7044:
---

[~lv] maybe we should just cap the number of columns that we support writing? I 
don't think it makes sense to support an unlimited number of columns for 
inserts - at some point we should start putting backpressure on users before 
they push the system outside of the limits where it will behave well.


[jira] [Created] (IMPALA-7053) Reorganise query options into groups

2018-05-21 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-7053:
-

 Summary: Reorganise query options into groups
 Key: IMPALA-7053
 URL: https://issues.apache.org/jira/browse/IMPALA-7053
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Tim Armstrong
Assignee: Tim Armstrong


We have quite a lot of query options now and we're adding more for things like 
resource limits (e.g. IMPALA-6035). It's getting harder for users to understand 
the organisation and find relevant query options. We should consider grouping 
similar query options.

E.g. for this set of resource limits, we could reorganise in various ways:
* mem_limit -> resources.memory.per_node_limit
* buffer_pool_limit -> resources.memory.buffer_pool.per_node_limit
* thread_reservation_limit  -> resources.threads.per_node_limit
* thread_reservation_aggregate_limit -> resources.threads.aggregate_limit
* exec_time_limit_s -> resources.wallclock.limit_s

We could do the conversion incrementally. It would probably make sense to agree 
on a top-level organisation up-front.
* planner - anything that controls planner decisions like join ordering, etc
* scheduler - anything that controls scheduler decisions (admission control 
could maybe be included here too)
* resources - resource management functionality (limits, etc)
* session - anything related to session management like timeouts
* exec - anything that changes query execution behaviour (e.g. codegen, batch 
sizes, runtime filters, etc)
* Probably a group for anything that changes the semantic behaviour of a query 
(e.g. decimal_v2, appx_count_distinct, strict_mode, abort_on_error).
* A group that controls read and write behaviour of file formats like 
compression, etc



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7044) int32 overflow in HdfsTableSink::CreateNewTmpFile()

2018-05-21 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482775#comment-16482775
 ] 

Tim Armstrong commented on IMPALA-7044:
---

Yeah, so regardless I think we should pick an upper limit and make sure that we 
test up to that limit to be sure it works well as part of fixing this bug.


[jira] [Commented] (IMPALA-7054) "Top-25 tables with highest memory requirements" sorts incorrectly

2018-05-21 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482829#comment-16482829
 ] 

Philip Zeyliger commented on IMPALA-7054:
-

Is this a dupe of:
{code:java}

commit ea4715fd76d6dba0c3777146989c2bf020efabdd
Author: stiga-huang 
Date: Thu May 3 06:44:42 2018 -0700

IMPALA-6966: sort table memory by size in catalogd web UI

This patch fix the sorting order in "Top-K Tables with Highest
Memory Requirements" in which "Estimated memory" column is sorted
as strings.

Values got from the catalog-server are changed from pretty-printed
strings to bytes numbers. So the web UI is able to sort and render
them correctly.

Change-Id: I60dc253f862f5fde6fa96147f114d8765bb31a85
Reviewed-on: http://gerrit.cloudera.org:8080/10292
Reviewed-by: Dimitris Tsirogiannis 
Tested-by: Impala Public Jenkins {code}

> "Top-25 tables with highest memory requirements" sorts incorrectly
> --
>
> Key: IMPALA-7054
> URL: https://issues.apache.org/jira/browse/IMPALA-7054
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.12.0
>Reporter: Todd Lipcon
>Priority: Minor
>
> The table on catalogd:25020/catalog has an "estimated memory" column which 
> sorts based on the stringified value. For example, "2.07 GB" sorts below 
> "23.65 MB".






[jira] [Created] (IMPALA-7055) test_avro_writer failing on upstream Jenkins:

2018-05-21 Thread David Knupp (JIRA)
David Knupp created IMPALA-7055:
---

 Summary: test_avro_writer failing on upstream Jenkins: 
 Key: IMPALA-7055
 URL: https://issues.apache.org/jira/browse/IMPALA-7055
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.0
Reporter: David Knupp


This failure occurred while verifying https://gerrit.cloudera.org/c/10455/, but 
it is not related to that patch. The failing build is 
https://jenkins.impala.io/job/gerrit-verify-dryrun/2511/. 

Test appears to be (from 
[avro-writer.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/avro-writer.test]):
{noformat}
---- QUERY
SET ALLOW_UNSUPPORTED_FORMATS=0;
insert into __avro_write select 1, "b", 2.2;
---- CATCH
Writing to table format AVRO is not supported. Use query option 
ALLOW_UNSUPPORTED_FORMATS
{noformat}

Error output:
{noformat}
01:50:18 ] FAIL 
query_test/test_compressed_formats.py::TestTableWriters::()::test_avro_writer[exec_option:
 {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 
'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
'exec_single_node_rows_threshold': 0} | table_format: text/none]
01:50:18 ] === FAILURES 
===
01:50:18 ]  TestTableWriters.test_avro_writer[exec_option: {'batch_size': 0, 
'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': 
False, 'abort_on_error': 1, 'debug_action': None, 
'exec_single_node_rows_threshold': 0} | table_format: text/none] 
01:50:18 ] [gw9] linux2 -- Python 2.7.12 
/home/ubuntu/Impala/bin/../infra/python/env/bin/python
01:50:18 ] query_test/test_compressed_formats.py:189: in test_avro_writer
01:50:18 ] self.run_test_case('QueryTest/avro-writer', vector)
01:50:18 ] common/impala_test_suite.py:420: in run_test_case
01:50:18 ] assert False, "Expected exception: %s" % expected_str
01:50:18 ] E   AssertionError: Expected exception: Writing to table format AVRO 
is not supported. Use query option ALLOW_UNSUPPORTED_FORMATS
01:50:18 ]  Captured stderr setup 
-
01:50:18 ] -- connecting to: localhost:21000
01:50:18 ] - Captured stderr call 
-
01:50:18 ] -- executing against localhost:21000
01:50:18 ] use functional;
01:50:18 ] 
01:50:18 ] SET batch_size=0;
01:50:18 ] SET num_nodes=0;
01:50:18 ] SET disable_codegen_rows_threshold=5000;
01:50:18 ] SET disable_codegen=False;
01:50:18 ] SET abort_on_error=1;
01:50:18 ] SET exec_single_node_rows_threshold=0;
01:50:18 ] -- executing against localhost:21000
01:50:18 ] drop table if exists __avro_write;
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] SET COMPRESSION_CODEC=NONE;
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] 
01:50:18 ] create table __avro_write (i int, s string, d double)
01:50:18 ] stored as AVRO
01:50:18 ] TBLPROPERTIES ('avro.schema.literal'='{
01:50:18 ]   "name": "my_record",
01:50:18 ]   "type": "record",
01:50:18 ]   "fields": [
01:50:18 ]   {"name":"i", "type":["int", "null"]},
01:50:18 ]   {"name":"s", "type":["string", "null"]},
01:50:18 ]   {"name":"d", "type":["double", "null"]}]}');
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] SET COMPRESSION_CODEC="";
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] SET COMPRESSION_CODEC=NONE;
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] 
01:50:18 ] SET ALLOW_UNSUPPORTED_FORMATS=1;
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] 
01:50:18 ] insert into __avro_write select 0, "a", 1.1;
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] SET COMPRESSION_CODEC="";
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] SET ALLOW_UNSUPPORTED_FORMATS="0";
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] SET COMPRESSION_CODEC=SNAPPY;
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] 
01:50:18 ] SET ALLOW_UNSUPPORTED_FORMATS=1;
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] 
01:50:18 ] insert into __avro_write select 1, "b", 2.2;
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] SET COMPRESSION_CODEC="";
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] SET ALLOW_UNSUPPORTED_FORMATS="0";
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] select * from __avro_write;
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] SET ALLOW_UNSUPPORTED_FORMATS=0;
01:50:18 ] 
01:50:18 ] -- executing against localhost:21000
01:50:18 ] 
01:50:18 ] insert into __avro_write 

[jira] [Commented] (IMPALA-6994) Avoid reloading a table's HMS data for file-only operations

2018-05-21 Thread Pranay Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482917#comment-16482917
 ] 

Pranay Singh commented on IMPALA-6994:
--

I checked the code in updatePartitionsFromHms(), and there looks to be little 
room for optimization here: skipping the CREATE/RECREATE/DROP of a partition 
would cause inconsistency between HMS and the catalog, which would make DML 
statements like INSERT fail when used with the SYNC_DDL option, since that 
option requires the underlying data and metadata changes to be propagated to 
all Impala nodes.

-Pranay

> Avoid reloading a table's HMS data for file-only operations
> ---
>
> Key: IMPALA-6994
> URL: https://issues.apache.org/jira/browse/IMPALA-6994
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.12.0
>Reporter: Balazs Jeszenszky
>Assignee: Pranay Singh
>Priority: Major
>
> Reloading file metadata for HDFS tables (e.g. as a final step in an 'insert') 
> is done via
> https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java#L628
> , which calls
> https://github.com/apache/impala/blob/branch-2.12.0/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L1243
> HdfsTable.load has no option to only load file metadata. HMS metadata will 
> also be reloaded every time, which is an unnecessary overhead (and potential 
> point of failure) when adding files to existing locations.






[jira] [Comment Edited] (IMPALA-6994) Avoid reloading a table's HMS data for file-only operations

2018-05-21 Thread Pranay Singh (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482917#comment-16482917
 ] 

Pranay Singh edited comment on IMPALA-6994 at 5/21/18 7:04 PM:
---

I checked the code in updatePartitionsFromHms(), and there looks to be little 
room for optimization here: skipping the CREATE/RECREATE/DROP of a partition 
would cause inconsistency between HMS and the catalog, which would make DML 
statements like INSERT fail when used with the SYNC_DDL option, since that 
option requires the underlying data and metadata changes to be propagated to 
all Impala nodes.

-Pranay









[jira] [Created] (IMPALA-7056) Changing Text Delimiter Does Not Work

2018-05-21 Thread Alan Jackoway (JIRA)
Alan Jackoway created IMPALA-7056:
-

 Summary: Changing Text Delimiter Does Not Work
 Key: IMPALA-7056
 URL: https://issues.apache.org/jira/browse/IMPALA-7056
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog, Docs
Affects Versions: Impala 2.12.0
Reporter: Alan Jackoway


The wording on 
https://impala.apache.org/docs/build/html/topics/impala_alter_table.html makes 
it seem like you can change the delimiter of text tables after they are created.

I did the following to simulate a table that needed to switch between comma- 
and pipe-delimited data:
{code}
hadoop fs -mkdir /user/alanj
hadoop fs -mkdir /user/alanj/test_delim
echo "A,B|C" > delim.txt
hadoop fs -put delim.txt /user/alanj/test_delim
{code}

Then I created the table in Impala and tried to change the delimiter:
{code:sql}
> create external table default.alanj_test_delim(A string, B string) ROW FORMAT 
> DELIMITED FIELDS TERMINATED BY "," LOCATION '/user/alanj/test_delim';
> select * from default.alanj_test_delim;
Query: select * from default.alanj_test_delim
+---+-+
| a | b   |
+---+-+
| A | B|C |
+---+-+
> alter table default.alanj_test_delim set SERDEPROPERTIES 
> ('serialization.format'='|', 'field.delim'='|');
> select * from default.alanj_test_delim;
+---+-+
| a | b   |
+---+-+
| A | B|C |
+---+-+
> show create table default.alanj_test_delim;
+----------------------------------------------------------------------------------------------------------------------+
| result |
+----------------------------------------------------------------------------------------------------------------------+
| CREATE EXTERNAL TABLE default.alanj_test_delim ( |
|   a STRING, |
|   b STRING |
| ) |
| ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' |
| WITH SERDEPROPERTIES ('field.delim'='|', 'serialization.format'='|') |
| STORED AS TEXTFILE |
| LOCATION 'hdfs://namenode:8020/user/alanj/test_delim' |
| TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='false', 'numFiles'='0', 'numRows'='-1', 'rawDataSize'='-1', 'totalSize'='0') |
+----------------------------------------------------------------------------------------------------------------------+
{code}

So the table shows the right SerDe properties, but Impala doesn't actually use 
them to read the data.

If you then insert data (as the docs suggest), it writes that data with the new 
delimiter:
{code:sql}
> insert into default.alanj_test_delim values('D', 'E,F');
> select * from alanj_test_delim;
+-+-+
| a   | b   |
+-+-+
| A,B | C   |
| D   | E,F |
+-+-+
# hadoop fs -cat 
/user/alanj/test_delim/a54bb0ec14646492-a7388114_1498283208_data.0.
D|E,F
{code}






[jira] [Assigned] (IMPALA-7056) Changing Text Delimiter Does Not Work

2018-05-21 Thread Alex Rodoni (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni reassigned IMPALA-7056:
---

Assignee: Alex Rodoni






[jira] [Updated] (IMPALA-7055) test_avro_writer failing on upstream Jenkins (Expected exception: "Writing to table format AVRO is not supported")

2018-05-21 Thread David Knupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-7055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp updated IMPALA-7055:

Labels: flaky-test  (was: )

> test_avro_writer failing on upstream Jenkins (Expected exception: "Writing to 
> table format AVRO is not supported")
> --
>
> Key: IMPALA-7055
> URL: https://issues.apache.org/jira/browse/IMPALA-7055
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: David Knupp
>Priority: Major
>  Labels: flaky-test
>
> This failure occurred while verifying https://gerrit.cloudera.org/c/10455/, 
> but it is not related to that patch. The failing build is 
> https://jenkins.impala.io/job/gerrit-verify-dryrun/2511/. 
> Test appears to be (from 
> [avro-writer.test|https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/avro-writer.test]):
> {noformat}
>  QUERY
> SET ALLOW_UNSUPPORTED_FORMATS=0;
> insert into __avro_write select 1, "b", 2.2;
>  CATCH
> Writing to table format AVRO is not supported. Use query option 
> ALLOW_UNSUPPORTED_FORMATS
> {noformat}
> Error output:
> {noformat}
> 01:50:18 ] FAIL 
> query_test/test_compressed_formats.py::TestTableWriters::()::test_avro_writer[exec_option:
>  {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none]
> 01:50:18 ] === FAILURES 
> ===
> 01:50:18 ]  TestTableWriters.test_avro_writer[exec_option: {'batch_size': 0, 
> 'num_nodes': 0, 'disable_codegen_rows_threshold': 5000, 'disable_codegen': 
> False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': 0} | table_format: text/none] 
> 01:50:18 ] [gw9] linux2 -- Python 2.7.12 
> /home/ubuntu/Impala/bin/../infra/python/env/bin/python
> 01:50:18 ] query_test/test_compressed_formats.py:189: in test_avro_writer
> 01:50:18 ] self.run_test_case('QueryTest/avro-writer', vector)
> 01:50:18 ] common/impala_test_suite.py:420: in run_test_case
> 01:50:18 ] assert False, "Expected exception: %s" % expected_str
> 01:50:18 ] E   AssertionError: Expected exception: Writing to table format 
> AVRO is not supported. Use query option ALLOW_UNSUPPORTED_FORMATS
> 01:50:18 ]  Captured stderr setup 
> -
> 01:50:18 ] -- connecting to: localhost:21000
> 01:50:18 ] - Captured stderr call 
> -
> 01:50:18 ] -- executing against localhost:21000
> 01:50:18 ] use functional;
> 01:50:18 ] 
> 01:50:18 ] SET batch_size=0;
> 01:50:18 ] SET num_nodes=0;
> 01:50:18 ] SET disable_codegen_rows_threshold=5000;
> 01:50:18 ] SET disable_codegen=False;
> 01:50:18 ] SET abort_on_error=1;
> 01:50:18 ] SET exec_single_node_rows_threshold=0;
> 01:50:18 ] -- executing against localhost:21000
> 01:50:18 ] drop table if exists __avro_write;
> 01:50:18 ] 
> 01:50:18 ] -- executing against localhost:21000
> 01:50:18 ] SET COMPRESSION_CODEC=NONE;
> 01:50:18 ] 
> 01:50:18 ] -- executing against localhost:21000
> 01:50:18 ] 
> 01:50:18 ] create table __avro_write (i int, s string, d double)
> 01:50:18 ] stored as AVRO
> 01:50:18 ] TBLPROPERTIES ('avro.schema.literal'='{
> 01:50:18 ]   "name": "my_record",
> 01:50:18 ]   "type": "record",
> 01:50:18 ]   "fields": [
> 01:50:18 ]   {"name":"i", "type":["int", "null"]},
> 01:50:18 ]   {"name":"s", "type":["string", "null"]},
> 01:50:18 ]   {"name":"d", "type":["double", "null"]}]}');
> 01:50:18 ] 
> 01:50:18 ] -- executing against localhost:21000
> 01:50:18 ] SET COMPRESSION_CODEC="";
> 01:50:18 ] 
> 01:50:18 ] -- executing against localhost:21000
> 01:50:18 ] SET COMPRESSION_CODEC=NONE;
> 01:50:18 ] 
> 01:50:18 ] -- executing against localhost:21000
> 01:50:18 ] 
> 01:50:18 ] SET ALLOW_UNSUPPORTED_FORMATS=1;
> 01:50:18 ] 
> 01:50:18 ] -- executing against localhost:21000
> 01:50:18 ] 
> 01:50:18 ] insert into __avro_write select 0, "a", 1.1;
> 01:50:18 ] 
> 01:50:18 ] -- executing against localhost:21000
> 01:50:18 ] SET COMPRESSION_CODEC="";
> 01:50:18 ] 
> 01:50:18 ] -- executing against localhost:21000
> 01:50:18 ] SET ALLOW_UNSUPPORTED_FORMATS="0";
> 01:50:18 ] 
> 01:50:18 ] -- executing against localhost:21000
> 01:50:18 ] SET COMPRESSION_CODEC=SNAPPY;
> 01:50:18 ] 
> 01:50:18 ] -- executing against localhost:21000
> 01:50:18 ] 
> 01:50:18 ] SET ALLOW_UNSUPPORTED_FORMATS=1;
> 01:50:18 ] 
> 01:50:18 ] -- executing against localhost:21000

[jira] [Commented] (IMPALA-6947) kudu: GetTableLocations RPC timing out with ASAN

2018-05-21 Thread Thomas Tauber-Marshall (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482827#comment-16482827
 ] 

Thomas Tauber-Marshall commented on IMPALA-6947:


https://gerrit.cloudera.org/#/c/10466/

> kudu: GetTableLocations RPC timing out with ASAN
> 
>
> Key: IMPALA-6947
> URL: https://issues.apache.org/jira/browse/IMPALA-6947
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.13.0
>Reporter: Michael Brown
>Assignee: Thomas Tauber-Marshall
>Priority: Critical
>
> {noformat}
> query_test/test_kudu.py:84: in test_kudu_insert
> self.run_test_case('QueryTest/kudu_insert', vector, 
> use_db=unique_database)
> common/impala_test_suite.py:398: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:613: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:160: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:173: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:341: in __execute_query
> self.wait_for_completion(handle)
> beeswax/impala_beeswax.py:361: in wait_for_completion
> raise ImpalaBeeswaxException("Query aborted:" + error_log, None)
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EQuery aborted:Kudu error(s) reported, first error: Timed out: 
> GetTableLocations { table: 'impala::test_kudu_insert_70eff904.kudu_test', 
> partition-key: (HASH (a, b): 2), attempt: 1 } failed: GetTableLocations RPC 
> to 127.0.0.1:7051 timed out after 10.000s (SENT)
> E   
> E   Key already present in Kudu table 
> 'impala::test_kudu_insert_70eff904.kudu_test'. (1 of 3 similar)
> E   Error in Kudu table 'impala::test_kudu_insert_70eff904.kudu_test': Timed 
> out: GetTableLocations { table: 
> 'impala::test_kudu_insert_70eff904.kudu_test', partition-key: (HASH (a, b): 
> 2), attempt: 1 } failed: GetTableLocations RPC to 127.0.0.1:7051 timed out 
> after 10.000s (SENT) (1 of 21 similar)
> {noformat}






[jira] [Created] (IMPALA-7054) "Top-25 tables with highest memory requirements" sorts incorrectly

2018-05-21 Thread Todd Lipcon (JIRA)
Todd Lipcon created IMPALA-7054:
---

 Summary: "Top-25 tables with highest memory requirements" sorts 
incorrectly
 Key: IMPALA-7054
 URL: https://issues.apache.org/jira/browse/IMPALA-7054
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Affects Versions: Impala 2.12.0
Reporter: Todd Lipcon


The table on catalogd:25020/catalog has an "estimated memory" column which 
sorts based on the stringified value. For example, "2.07 GB" sorts below "23.65 
MB".
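A common fix is to sort on the parsed byte count rather than the display string. A minimal Python sketch of the idea (the unit table and function names are illustrative, not Impala's actual code):

```python
# Parse a human-readable size like "2.07 GB" into bytes so that sorting
# compares numeric magnitudes instead of strings.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def size_to_bytes(text):
    value, unit = text.split()
    return float(value) * UNITS[unit]

sizes = ["23.65 MB", "2.07 GB", "512 KB"]
# A plain string sort puts "2.07 GB" before "23.65 MB"; the numeric key
# orders by actual memory requirement.
ordered = sorted(sizes, key=size_to_bytes)
```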






[jira] [Commented] (IMPALA-3149) Bind variable issue in ODBC

2018-05-21 Thread Patrick Szalapski (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483026#comment-16483026
 ] 

Patrick Szalapski commented on IMPALA-3149:
---

I am only having trouble with a ? in a subquery in the WHERE clause, not in the 
SELECT. Separately, I am having trouble with a parameter in the SELECT list 
(e.g. {{select ? as input1name, ... from ...}}).

> Bind variable issue in ODBC
> ---
>
> Key: IMPALA-3149
> URL: https://issues.apache.org/jira/browse/IMPALA-3149
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2, Impala 2.3.0
>Reporter: Jiri Novak
>Assignee: Syed A. Hashmi
>Priority: Minor
>
> For some reason Cloudera Impala does not recognize a bind variable in the 
> HAVING clause.
> If we execute the following simple query using .NET and ADO.NET:
> {code:sql}
> SELECT COUNT(address.address_id) 
> , address.country
> FROM quest_stage.address address 
> GROUP BY address.country 
> HAVING (COUNT(address.address_id) > ?)
> {code}
> It returns the following error
> Error:
> {noformat}
> [Cloudera][ImpalaODBC] (110) Error while executing a query in Impala: [HY000] 
> : AnalysisException: Syntax error in line 5:
> HAVING (COUNT(address.address_id) > ?)
> ^
> Encountered: Unexpected character
> Expected: CASE, CAST, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, TRUE, IDENTIFIER
> {noformat}
> Bind variable works correctly in WHERE clause. Also the query returns correct 
> result if we use a number instead of the bind variable.
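Since a literal number works where the placeholder fails, one client-side workaround (a sketch under the assumption that the value can be strictly validated first; this is not an Impala or driver API) is to inline the validated number into the SQL text:

```python
# Fallback for the HAVING-clause placeholder: validate the parameter as a
# number and substitute it as a literal, since literals are accepted.
def inline_numeric_param(sql_template, value):
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("only numeric parameters may be inlined")
    return sql_template.replace("?", repr(value), 1)

sql = ("SELECT COUNT(address_id), country FROM address "
       "GROUP BY country HAVING (COUNT(address_id) > ?)")
query = inline_numeric_param(sql, 100)
```

The type check matters: inlining is only safe for values that cannot carry SQL text. Binding in the WHERE clause reportedly works, so this is only needed for the clauses the driver mishandles.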






[jira] [Work started] (IMPALA-7016) Statement to allow setting ownership for database

2018-05-21 Thread Fredy Wijaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-7016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7016 started by Fredy Wijaya.

> Statement to allow setting ownership for database
> -
>
> Key: IMPALA-7016
> URL: https://issues.apache.org/jira/browse/IMPALA-7016
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 3.0, Impala 2.13.0
>Reporter: Adam Holley
>Assignee: Fredy Wijaya
>Priority: Major
>
> Create a statement to allow setting the owner of a database:
> {{ALTER DATABASE database_name SET OWNER [USER|ROLE] user_or_role;}}
> examples:
> ALTER DATABASE  SET OWNER USER 
> ALTER DATABASE  SET OWNER ROLE 






[jira] [Work started] (IMPALA-7056) Changing Text Delimiter Does Not Work

2018-05-21 Thread Alex Rodoni (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-7056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-7056 started by Alex Rodoni.
---
> Changing Text Delimiter Does Not Work
> -
>
> Key: IMPALA-7056
> URL: https://issues.apache.org/jira/browse/IMPALA-7056
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Docs
>Affects Versions: Impala 2.12.0
>Reporter: Alan Jackoway
>Assignee: Alex Rodoni
>Priority: Major
>
> The wording on 
> https://impala.apache.org/docs/build/html/topics/impala_alter_table.html 
> makes it seem like you can change the delimiter of text tables after they are 
> created.
> I did the following to simulate a table that needed to switch between comma 
> and pipe delimited:
> {code}
> hadoop fs -mkdir /user/alanj
> hadoop fs -mkdir /user/alanj/test_delim
> echo "A,B|C" > delim.txt
> hadoop fs -put delim.txt /user/alanj/test_delim
> {code}
> Then created in impala and tried to change delimiters:
> {code:sql}
> > create external table default.alanj_test_delim(A string, B string) ROW 
> > FORMAT DELIMITED FIELDS TERMINATED BY "," LOCATION '/user/alanj/test_delim';
> > select * from default.alanj_test_delim;
> Query: select * from default.alanj_test_delim
> +---+-+
> | a | b   |
> +---+-+
> | A | B|C |
> +---+-+
> > alter table default.alanj_test_delim set SERDEPROPERTIES 
> > ('serialization.format'='|', 'field.delim'='|');
> > select * from default.alanj_test_delim;
> +---+-+
> | a | b   |
> +---+-+
> | A | B|C |
> +---+-+
> > show create table default.alanj_test_delim;
> +--+
> | result  
>  |
> +--+
> | CREATE EXTERNAL TABLE default.alanj_test_delim (
>  |
> |   a STRING, 
>  |
> |   b STRING  
>  |
> | )   
>  |
> | ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'   
>  |
> | WITH SERDEPROPERTIES ('field.delim'='|', 'serialization.format'='|')
>  |
> | STORED AS TEXTFILE  
>  |
> | LOCATION 'hdfs://namenode:8020/user/alanj/test_delim'   
>|
> | TBLPROPERTIES ('COLUMN_STATS_ACCURATE'='false', 'numFiles'='0', 
> 'numRows'='-1', 'rawDataSize'='-1', 'totalSize'='0') |
> +--+
> {code}
> So it shows the right serdeproperties, but Impala doesn't actually use them 
> to read the data.
> If you then insert data (as the docs suggest), it writes that data with the 
> new delimiter:
> {code:sql}
> > insert into default.alanj_test_delim values('D', 'E,F');
> > select * from alanj_test_delim;
> +-+-+
> | a   | b   |
> +-+-+
> | A,B | C   |
> | D   | E,F |
> +-+-+
> # hadoop fs -cat 
> /user/alanj/test_delim/a54bb0ec14646492-a7388114_1498283208_data.0.
> D|E,F
> {code}
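The outputs above are consistent with the scanner still splitting on the original comma even though the serdeproperties advertise a pipe. Splitting the file's single line both ways shows the two behaviors (plain Python, just to illustrate):

```python
# The data file contains one line, A,B|C, and the table has two columns.
line = "A,B|C"

# What Impala still does when reading: split on the original comma.
comma_fields = line.split(",", 1)

# What the updated serdeproperties imply it should do: split on the pipe.
pipe_fields = line.split("|", 1)
```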






[jira] [Comment Edited] (IMPALA-3149) Bind variable issue in ODBC

2018-05-21 Thread Patrick Szalapski (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482957#comment-16482957
 ] 

Patrick Szalapski edited comment on IMPALA-3149 at 5/21/18 8:28 PM:


I am also having this issue, but in a subquery or in the SELECT clause.  Any 
thoughts?

 

In my ODBC log I am getting   

{{DIAG [HY092] [Cloudera][ODBC] (10210) Attribute identifier invalid or not 
supported: 1228 (10210)}}
{{DIAG [HY092] [Cloudera][ODBC] (10210) Attribute identifier invalid or not 
supported: 1227 (10210)}}


was (Author: psz):
I am also having this issue, but in a subquery or in the SELECT clause.  Any 
thoughts?

 

In my ODBC log I am getting   

{{DIAG [HY092] [Cloudera][ODBC] (10210) Attribute identifier invalid or not 
supported: 1228 (10210) }}

{{DIAG [HY092] [Cloudera][ODBC] (10210) Attribute identifier invalid or not 
supported: 1227 (10210) }}

> Bind variable issue in ODBC
> ---
>
> Key: IMPALA-3149
> URL: https://issues.apache.org/jira/browse/IMPALA-3149
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2, Impala 2.3.0
>Reporter: Jiri Novak
>Assignee: Syed A. Hashmi
>Priority: Minor
>
> For some reason Cloudera Impala does not recognize a bind variable in the 
> HAVING clause.
> If we execute the following simple query using .NET and ADO.NET:
> {code:sql}
> SELECT COUNT(address.address_id) 
> , address.country
> FROM quest_stage.address address 
> GROUP BY address.country 
> HAVING (COUNT(address.address_id) > ?)
> {code}
> It returns the following error
> Error:
> {noformat}
> [Cloudera][ImpalaODBC] (110) Error while executing a query in Impala: [HY000] 
> : AnalysisException: Syntax error in line 5:
> HAVING (COUNT(address.address_id) > ?)
> ^
> Encountered: Unexpected character
> Expected: CASE, CAST, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, TRUE, IDENTIFIER
> {noformat}
> Bind variable works correctly in WHERE clause. Also the query returns correct 
> result if we use a number instead of the bind variable.






[jira] [Commented] (IMPALA-1480) Slow DDL statements for tables with large number of partitions

2018-05-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-1480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483017#comment-16483017
 ] 

ASF subversion and git services commented on IMPALA-1480:
-

Commit 5c7d3b12e3aa750e7ab88e3ef1092d5218e53cc2 in impala's branch 
refs/heads/master from [~csringhofer]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=5c7d3b1 ]

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.
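Because both properties share the same epoch-seconds format, the staleness check described above could look like this (a sketch; the property names come from this commit, but the helper itself is hypothetical):

```python
# Stats may be stale if the last DDL happened after the last COMPUTE STATS,
# or if stats were never computed at all.
def stats_maybe_stale(tbl_properties):
    last_ddl = int(tbl_properties.get("transient_lastDdlTime", 0))
    last_stats = int(tbl_properties.get("impala.lastComputeStatsTime", 0))
    return last_stats == 0 or last_ddl > last_stats

props = {"transient_lastDdlTime": "1526900000",
         "impala.lastComputeStatsTime": "1526800000"}
```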

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is run only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Reviewed-on: http://gerrit.cloudera.org:8080/10116
Reviewed-by: Lars Volker 
Tested-by: Impala Public Jenkins 


> Slow DDL statements for tables with large number of partitions
> --
>
> Key: IMPALA-1480
> URL: https://issues.apache.org/jira/browse/IMPALA-1480
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Dimitris Tsirogiannis
>Priority: Critical
>  Labels: impala, performance
> Fix For: Impala 2.5.0
>
>
> Impala users sometimes report that DDL statements (e.g. alter table partition 
> set location...) take multiple seconds (>5) for partitioned tables with a 
> large number of partitions. The same operations are significantly faster in 
> Hive (sub-second response time). 
> Use case:
> * 2 node cluster
> * Single table (24 columns, 3 partition keys) with 2500 partitions
> * alter table foo partition (foo_i = i) set location 'hdfs://.' takes 
> approximately 5-6sec (0.2 in HIVE)
> * 1 sec delay in the alter stmt is caused by 
> https://issues.apache.org/jira/browse/HIVE-5524






[jira] [Commented] (IMPALA-5384) Simplify coordinator locking protocol

2018-05-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-5384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483011#comment-16483011
 ] 

ASF subversion and git services commented on IMPALA-5384:
-

Commit 75d19c874f2daf7e42231a257a97c07367660226 in impala's branch 
refs/heads/2.x from [~dhecht]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=75d19c8 ]

IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

This is the final change to clarify and break up the Coordinator's lock.
The state machine for the coordinator is made explicit, distinguishing
between executing state and multiple terminal states. Logic to
transition into a terminal state is centralized in one location and
executes exactly once for each coordinator object.

Derived from a patch for IMPALA-5384 by Marcel Kornacker.

Testing:
- exhaustive functional tests
- stress test on minicluster with memory overcommitment. Verified from
  the logs that this exercises all these paths:
  - successful queries
  - client requested cancellation
  - error from exec FInstances RPC
  - error reported asynchronously via report status RPC
  - eos before backend execution completed
- loop query_test & failure for 12 hours with no dchecks or crashes
  (This had previously reproduced IMPALA-7030 and IMPALA-7033 with
  the previous version of this change).

Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Reviewed-on: http://gerrit.cloudera.org:8080/10440
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins 
Reviewed-on: http://gerrit.cloudera.org:8080/10465
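The exactly-once terminal transition the commit describes can be sketched as a tiny state machine (state names and structure illustrative only, not Impala's actual C++ types):

```python
import threading

# Minimal coordinator state machine: one EXECUTING state, several terminal
# states, and a single transition helper whose cleanup runs exactly once.
class Coordinator:
    EXECUTING, RETURNED_RESULTS, CANCELLED, ERROR = range(4)

    def __init__(self):
        self._lock = threading.Lock()
        self._state = Coordinator.EXECUTING
        self.teardowns = 0  # counts how many times cleanup ran

    def transition_to_terminal(self, new_state):
        with self._lock:
            if self._state != Coordinator.EXECUTING:
                return False        # already terminal: later calls are no-ops
            self._state = new_state
        self.teardowns += 1         # centralized cleanup, runs exactly once
        return True

coord = Coordinator()
first = coord.transition_to_terminal(Coordinator.CANCELLED)
second = coord.transition_to_terminal(Coordinator.ERROR)
```

Holding the lock only for the state flip keeps the cleanup work outside the critical section, which is the concurrency-bottleneck point the issue description raises.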


> Simplify coordinator locking protocol
> -
>
> Key: IMPALA-5384
> URL: https://issues.apache.org/jira/browse/IMPALA-5384
> Project: IMPALA
>  Issue Type: Improvement
>Affects Versions: Impala 2.9.0
>Reporter: Marcel Kornacker
>Assignee: Dan Hecht
>Priority: Major
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> The coordinator has a central lock (lock_) which is used very liberally to 
> synchronize state changes that don't need to be synchronized, creating a 
> concurrency bottleneck.
> Also, the coordinator contains a number of data structures related to INSERT 
> finalization that don't need to be part of and synchronized with the rest of 
> the coordinator state.






[jira] [Comment Edited] (IMPALA-3149) Bind variable issue in ODBC

2018-05-21 Thread Patrick Szalapski (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482957#comment-16482957
 ] 

Patrick Szalapski edited comment on IMPALA-3149 at 5/21/18 8:27 PM:


I am also having this issue, but in a subquery or in the SELECT clause.  Any 
thoughts?

 

In my ODBC log I am getting   

{{DIAG [HY092] [Cloudera][ODBC] (10210) Attribute identifier invalid or not 
supported: 1228 (10210) }}

DIAG [HY092] [Cloudera][ODBC] (10210) Attribute identifier invalid or not 
supported: 1227 (10210) 


was (Author: psz):
I am also having this issue, but in a subquery or in the SELECT clause.  Any 
thoughts?

 

In my ODBC log I am getting   

{{DIAG [HY092] [Cloudera][ODBC] (10210) Attribute identifier invalid or not 
supported: 1228 (10210) }}{{DIAG [HY092] [Cloudera][ODBC] (10210) Attribute 
identifier invalid or not supported: 1227 (10210) }}

> Bind variable issue in ODBC
> ---
>
> Key: IMPALA-3149
> URL: https://issues.apache.org/jira/browse/IMPALA-3149
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2, Impala 2.3.0
>Reporter: Jiri Novak
>Assignee: Syed A. Hashmi
>Priority: Minor
>
> For some reason Cloudera Impala does not recognize a bind variable in the 
> HAVING clause.
> If we execute the following simple query using .NET and ADO.NET:
> {code:sql}
> SELECT COUNT(address.address_id) 
> , address.country
> FROM quest_stage.address address 
> GROUP BY address.country 
> HAVING (COUNT(address.address_id) > ?)
> {code}
> It returns the following error
> Error:
> {noformat}
> [Cloudera][ImpalaODBC] (110) Error while executing a query in Impala: [HY000] 
> : AnalysisException: Syntax error in line 5:
> HAVING (COUNT(address.address_id) > ?)
> ^
> Encountered: Unexpected character
> Expected: CASE, CAST, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, TRUE, IDENTIFIER
> {noformat}
> Bind variable works correctly in WHERE clause. Also the query returns correct 
> result if we use a number instead of the bind variable.






[jira] [Comment Edited] (IMPALA-3149) Bind variable issue in ODBC

2018-05-21 Thread Patrick Szalapski (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16482957#comment-16482957
 ] 

Patrick Szalapski edited comment on IMPALA-3149 at 5/21/18 8:27 PM:


I am also having this issue, but in a subquery or in the SELECT clause.  Any 
thoughts?

 

In my ODBC log I am getting   

{{DIAG [HY092] [Cloudera][ODBC] (10210) Attribute identifier invalid or not 
supported: 1228 (10210) }}{{DIAG [HY092] [Cloudera][ODBC] (10210) Attribute 
identifier invalid or not supported: 1227 (10210) }}


was (Author: psz):
I am also having this issue, but in a subquery or in the SELECT clause.  Any 
thoughts?

> Bind variable issue in ODBC
> ---
>
> Key: IMPALA-3149
> URL: https://issues.apache.org/jira/browse/IMPALA-3149
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2, Impala 2.3.0
>Reporter: Jiri Novak
>Assignee: Syed A. Hashmi
>Priority: Minor
>
> For some reason Cloudera Impala does not recognize a bind variable in the 
> HAVING clause.
> If we execute the following simple query using .NET and ADO.NET:
> {code:sql}
> SELECT COUNT(address.address_id) 
> , address.country
> FROM quest_stage.address address 
> GROUP BY address.country 
> HAVING (COUNT(address.address_id) > ?)
> {code}
> It returns the following error
> Error:
> {noformat}
> [Cloudera][ImpalaODBC] (110) Error while executing a query in Impala: [HY000] 
> : AnalysisException: Syntax error in line 5:
> HAVING (COUNT(address.address_id) > ?)
> ^
> Encountered: Unexpected character
> Expected: CASE, CAST, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, TRUE, IDENTIFIER
> {noformat}
> Bind variable works correctly in WHERE clause. Also the query returns correct 
> result if we use a number instead of the bind variable.






[jira] [Comment Edited] (IMPALA-3149) Bind variable issue in ODBC

2018-05-21 Thread Patrick Szalapski (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-3149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483110#comment-16483110
 ] 

Patrick Szalapski edited comment on IMPALA-3149 at 5/21/18 10:27 PM:
-

I've simplified my query as much as possible.  I get the "unexpected character" 
error on this query over ODBC using Cloudera ODBC driver 2.05.41.1029 for 
Windows.

{{with nextMatchId as (}}
 {{    select min(match_id) as match_id}}
 {{    from user}}
 {{    where user_id = ?)}}
{{select c.*}}
 {{from candidate c}}
 {{join nextMatchId on nextMatchId.match_id=c.match_id}}

??System.Data.Odbc.OdbcException??
 ??  HResult=0x80131937??
 ??  Message=ERROR [HY000] [Cloudera][ImpalaODBC] (110) Error while executing a 
query in Impala: [HY000] : AnalysisException: Syntax error in line 5:??
 ??    where user_id = ???
 ??    ^??
 ??Encountered: Unexpected character??
 ??Expected: CASE, CAST, DEFAULT, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, 
REPLACE, TRUNCATE, TRUE, IDENTIFIER??


was (Author: psz):
I've simplified my query as much as possible.  I get the "unexpected character" 
error on this query over ODBC using Cloudera ODBC driver 2.05.41.1029 for 
Windows.

{{with nextMatchId as (}}
{{    select min(match_id) as match_id}}
{{    from user}}
{{    where user_id = ?}}
{{) }}
{{select c.*}}
{{from candidate c}}
{{join nextMatchId on nextMatchId.match_id=c.match_id}}

??System.Data.Odbc.OdbcException??
??  HResult=0x80131937??
??  Message=ERROR [HY000] [Cloudera][ImpalaODBC] (110) Error while executing a 
query in Impala: [HY000] : AnalysisException: Syntax error in line 5:??
??    where user_id = ???
??    ^??
??Encountered: Unexpected character??
??Expected: CASE, CAST, DEFAULT, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, 
REPLACE, TRUNCATE, TRUE, IDENTIFIER??

> Bind variable issue in ODBC
> ---
>
> Key: IMPALA-3149
> URL: https://issues.apache.org/jira/browse/IMPALA-3149
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2, Impala 2.3.0
>Reporter: Jiri Novak
>Assignee: Syed A. Hashmi
>Priority: Minor
>
> For some reason Cloudera Impala does not recognize a bind variable in the 
> HAVING clause.
> If we execute the following simple query using .NET and ADO.NET:
> {code:sql}
> SELECT COUNT(address.address_id) 
> , address.country
> FROM quest_stage.address address 
> GROUP BY address.country 
> HAVING (COUNT(address.address_id) > ?)
> {code}
> It returns the following error
> Error:
> {noformat}
> [Cloudera][ImpalaODBC] (110) Error while executing a query in Impala: [HY000] 
> : AnalysisException: Syntax error in line 5:
> HAVING (COUNT(address.address_id) > ?)
> ^
> Encountered: Unexpected character
> Expected: CASE, CAST, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, TRUE, IDENTIFIER
> {noformat}
> Bind variable works correctly in WHERE clause. Also the query returns correct 
> result if we use a number instead of the bind variable.






[jira] [Commented] (IMPALA-7033) Impala crashes on exhaustive release tests

2018-05-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483013#comment-16483013
 ] 

ASF subversion and git services commented on IMPALA-7033:
-

Commit 75d19c874f2daf7e42231a257a97c07367660226 in impala's branch 
refs/heads/2.x from [~dhecht]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=75d19c8 ]

IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

This is the final change to clarify and break up the Coordinator's lock.
The state machine for the coordinator is made explicit, distinguishing
between executing state and multiple terminal states. Logic to
transition into a terminal state is centralized in one location and
executes exactly once for each coordinator object.

Derived from a patch for IMPALA-5384 by Marcel Kornacker.

Testing:
- exhaustive functional tests
- stress test on minicluster with memory overcommitment. Verified from
  the logs that this exercises all these paths:
  - successful queries
  - client requested cancellation
  - error from exec FInstances RPC
  - error reported asynchronously via report status RPC
  - eos before backend execution completed
- loop query_test & failure for 12 hours with no dchecks or crashes
  (This had previously reproduced IMPALA-7030 and IMPALA-7033 with
  the previous version of this change).

Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Reviewed-on: http://gerrit.cloudera.org:8080/10440
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins 
Reviewed-on: http://gerrit.cloudera.org:8080/10465


> Impala crashes on exhaustive release tests
> --
>
> Key: IMPALA-7033
> URL: https://issues.apache.org/jira/browse/IMPALA-7033
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Joe McDonnell
>Assignee: Dan Hecht
>Priority: Blocker
>  Labels: broken-build, flaky
> Fix For: Impala 3.1.0
>
>
> Exhaustive release builds have seen crashes related to memory allocation/free:
> {noformat}
> CORE: ./core.1526387352.3540.impalad
> BINARY: ./be/build/latest/service/impalad
> Core was generated by 
> `/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/be/bu'.
> Program terminated with signal 6, Aborted.
> #0 0x003f10e328e5 in raise () from /lib64/libc.so.6
> To enable execution of this file add
> add-auto-load-safe-path 
> /data/jenkins/workspace/impala-asf-master-exhaustive-release/Impala-Toolchain/gcc-4.9.2/lib64/libstdc++.so.6.0.20-gdb.py
> line to your configuration file "/var/lib/jenkins/.gdbinit".
> To completely disable this security protection add
> set auto-load safe-path /
> line to your configuration file "/var/lib/jenkins/.gdbinit".
> For more information about this security protection see the
> "Auto-loading safe path" section in the GDB manual. E.g., run from the shell:
> info "(gdb)Auto-loading safe path"
> #0 0x003f10e328e5 in raise () from /lib64/libc.so.6
> #1 0x003f10e340c5 in abort () from /lib64/libc.so.6
> #2 0x7f7d7ff261a5 in os::abort(bool) () from 
> /opt/toolchain/sun-jdk-64bit-1.8.0.05/jre/lib/amd64/server/libjvm.so
> #3 0x7f7d800b6843 in VMError::report_and_die() () from 
> /opt/toolchain/sun-jdk-64bit-1.8.0.05/jre/lib/amd64/server/libjvm.so
> #4 0x7f7d7ff2b562 in JVM_handle_linux_signal () from 
> /opt/toolchain/sun-jdk-64bit-1.8.0.05/jre/lib/amd64/server/libjvm.so
> #5 0x7f7d7ff224f3 in signalHandler(int, siginfo*, void*) () from 
> /opt/toolchain/sun-jdk-64bit-1.8.0.05/jre/lib/amd64/server/libjvm.so
> #6 
> #7 0x026be93f in tc_newarray ()
> #8 0x00c9d508 in allocate (this=0x11f1cae8, __n=1) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-release/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/ext/new_allocator.h:104
> #9 allocate (this=0x11f1cae8, __n=1) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-release/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/alloc_traits.h:357
> #10 _M_allocate (this=0x11f1cae8, __n=1) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-release/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/stl_vector.h:170
> #11 std::vector::_M_default_append 
> (this=0x11f1cae8, __n=1) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-release/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/vector.tcc:557
> #12 0x00cb035f in _M_default_append (this=) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-release/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/stl_map.h:506
> #13 resize (this=) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-release/Impala-Toolchain/gcc-4.9.2/include/c++/4.9.2/bits/stl_vector.h:676
> #14 

[jira] [Commented] (IMPALA-6131) Track time of last statistics update in metadata

2018-05-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483015#comment-16483015
 ] 

ASF subversion and git services commented on IMPALA-6131:
-

Commit 5c7d3b12e3aa750e7ab88e3ef1092d5218e53cc2 in impala's branch 
refs/heads/master from [~csringhofer]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=5c7d3b1 ]

IMPALA-6131: Track time of last statistics update in metadata

The timestamp of the last COMPUTE STATS operation is saved to
table property "impala.lastComputeStatsTime". The format is
the same as in "transient_lastDdlTime", so the two can be
compared to check if the schema has changed since computing
statistics.
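Since both properties store Unix epoch seconds as strings, a client-side staleness check can be sketched like this (the helper name is hypothetical; the property keys come from this change):

```python
def stats_are_stale(tbl_properties):
    """Return True if the schema changed after the last COMPUTE STATS.

    Both "transient_lastDdlTime" and "impala.lastComputeStatsTime" hold
    Unix epoch seconds stored as strings, so a plain integer comparison
    is enough. Missing stats count as stale.
    """
    last_ddl = int(tbl_properties.get("transient_lastDdlTime", "0"))
    last_stats = int(tbl_properties.get("impala.lastComputeStatsTime", "-1"))
    return last_stats < last_ddl

# Stats computed after the last DDL -> not stale.
print(stats_are_stale({"transient_lastDdlTime": "1526000000",
                       "impala.lastComputeStatsTime": "1526003600"}))  # False
```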

Other changes:
- Handling of "transient_lastDdlTime" is simplified - the old
  logic set it to current time + 1, if the old version was
  >= current time, to ensure that it is always increased by
  DDL operations. This was useful in the past, as IMPALA-387
  used lastDdlTime to check if partition data needs to be
  reloaded, but since IMPALA-1480, Impala does not rely on
  lastDdlTime at all.

- Computing / setting stats on HDFS tables no longer increases
  "transient_lastDdlTime".

- When Kudu tables are (re)loaded, it is checked if their
  HMS representation is up to date, and if it is, then
  IMetaStoreClient.alter_table() is not called. The old
  logic always called alter_table() after loading metadata
  from Kudu. This change was needed to ensure that
  "transient_lastDdlTime" works similarly in HDFS and Kudu
  tables, and should also make (re)loading Kudu tables faster.

Notes:
- Kudu will be able to sync its tables to HMS in the near
  future (see KUDU-2191), so the Kudu metadata handling in
  Impala may need to be redesigned.

Testing:
tests/metadata/test_last_ddl_time_update.py is extended by
- also checking "impala.lastComputeStatsTime"
- testing more SQL statements
- tests for Kudu tables

Note that test_last_ddl_time_update.py is run only in
exhaustive testing.

Change-Id: I59a671ac29d352bd92ce40d5cb6662bb23f146b5
Reviewed-on: http://gerrit.cloudera.org:8080/10116
Reviewed-by: Lars Volker 
Tested-by: Impala Public Jenkins 


> Track time of last statistics update in metadata
> 
>
> Key: IMPALA-6131
> URL: https://issues.apache.org/jira/browse/IMPALA-6131
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Reporter: Lars Volker
>Assignee: Csaba Ringhofer
>Priority: Major
>  Labels: ramp-up
>
> Currently we (ab-)use {{transient_lastDdlTime}} to track the last update time 
> of statistics. Instead we should introduce a separate counter to track the 
> last update. With that we should also remove all occurrences of 
> {{catalog_.updateLastDdlTime()}} from {{CatalogOpExecutor}} and fall back to 
> Hive's default behavior.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-7030) crash in impala::PartitionedAggregationNode::ProcessBatchNoGrouping

2018-05-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483012#comment-16483012
 ] 

ASF subversion and git services commented on IMPALA-7030:
-

Commit 75d19c874f2daf7e42231a257a97c07367660226 in impala's branch 
refs/heads/2.x from [~dhecht]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=75d19c8 ]

IMPALA-5384, part 2: Simplify Coordinator locking and clarify state

This is the final change to clarify and break up the Coordinator's lock.
The state machine for the coordinator is made explicit, distinguishing
between executing state and multiple terminal states. Logic to
transition into a terminal state is centralized in one location and
executes exactly once for each coordinator object.

Derived from a patch for IMPALA-5384 by Marcel Kornacker.

Testing:
- exhaustive functional tests
- stress test on minicluster with memory overcommitment. Verified from
  the logs that this exercises all these paths:
  - successful queries
  - client requested cancellation
  - error from exec FInstances RPC
  - error reported asynchronously via report status RPC
  - eos before backend execution completed
- loop query_test & failure for 12 hours with no dchecks or crashes
  (The previous version of this change had reproduced IMPALA-7030
  and IMPALA-7033.)

Change-Id: I6dc08da1295f1df3c9dce6d35d65d887b2c00a1c
Reviewed-on: http://gerrit.cloudera.org:8080/10440
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins 
Reviewed-on: http://gerrit.cloudera.org:8080/10465


>  crash in impala::PartitionedAggregationNode::ProcessBatchNoGrouping
> 
>
> Key: IMPALA-7030
> URL: https://issues.apache.org/jira/browse/IMPALA-7030
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Michael Brown
>Assignee: Dan Hecht
>Priority: Blocker
> Attachments: crash.dump.gz, gdb.out.gz, hs_err_pid1621.log.gz
>
>
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/2176/
> {noformat}
> #0  0x7fc896430428 in __GI_raise (sig=sig@entry=6) at 
> ../sysdeps/unix/sysv/linux/raise.c:54
> #1  0x7fc89643202a in __GI_abort () at abort.c:89
> #2  0x7fc899379c59 in os::abort(bool) (dump_core=) at 
> /build/openjdk-8-wnL82d/openjdk-8-8u171-b11/src/hotspot/src/os/linux/vm/os_linux.cpp:1509
> #3  0x7fc89952f047 in VMError::report_and_die() 
> (this=this@entry=0x7fc7e90287d0) at 
> /build/openjdk-8-wnL82d/openjdk-8-8u171-b11/src/hotspot/src/share/vm/utilities/vmError.cpp:1060
> #4  0x7fc8993836ef in JVM_handle_linux_signal(int, siginfo_t*, void*, 
> int) (sig=sig@entry=11, info=info@entry=0x7fc7e9028a70, 
> ucVoid=ucVoid@entry=0x7fc7e9028940, 
> abort_if_unrecognized=abort_if_unrecognized@entry=1)
> at 
> /build/openjdk-8-wnL82d/openjdk-8-8u171-b11/src/hotspot/src/os_cpu/linux_x86/vm/os_linux_x86.cpp:541
> #5  0x7fc899376d88 in signalHandler(int, siginfo_t*, void*) (sig=11, 
> info=0x7fc7e9028a70, uc=0x7fc7e9028940) at 
> /build/openjdk-8-wnL82d/openjdk-8-8u171-b11/src/hotspot/src/os/linux/vm/os_linux.cpp:4432
> #6  0x7fc8967d6390 in  () at 
> /lib/x86_64-linux-gnu/libpthread.so.0
> #7  0x7fc8584ca000 in 
> impala::PartitionedAggregationNode::ProcessBatchNoGrouping(impala::RowBatch*) 
> [clone .1] ()
> #8  0x02cd5bcf in 
> impala::PartitionedAggregationNode::Open(impala::RuntimeState*) 
> (this=0x15795200, state=0x14e95d40) at 
> /home/ubuntu/Impala/be/src/exec/partitioned-aggregation-node.cc:314
> #9  0x01c94775 in impala::FragmentInstanceState::Open() 
> (this=0x1cc19e00) at 
> /home/ubuntu/Impala/be/src/runtime/fragment-instance-state.cc:268
> #10 0x01c91faf in impala::FragmentInstanceState::Exec() 
> (this=0x1cc19e00) at 
> /home/ubuntu/Impala/be/src/runtime/fragment-instance-state.cc:81
> #11 0x01ca175b in 
> impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) 
> (this=0x3a1f6000, fis=0x1cc19e00) at 
> /home/ubuntu/Impala/be/src/runtime/query-state.cc:401
> #12 0x01c9ffce in impala::QueryState::::operator()(void) 
> const (__closure=0x7fc7e9029ce8) at 
> /home/ubuntu/Impala/be/src/runtime/query-state.cc:341
> #13 0x01ca2479 in 
> boost::detail::function::void_function_obj_invoker0,
>  void>::invoke(boost::detail::function::function_buffer &) 
> (function_obj_ptr=...)
> at 
> /home/ubuntu/Impala/toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
> #14 0x01bd9e58 in boost::function0::operator()() const 
> (this=0x7fc7e9029ce0) at 
> /home/ubuntu/Impala/toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:767
> 

[jira] [Commented] (IMPALA-6941) Allow loading more text scanner plugins

2018-05-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483010#comment-16483010
 ] 

ASF subversion and git services commented on IMPALA-6941:
-

Commit c3bc72bda89755a7ac3a952df08cdf3d62b7caf9 in impala's branch 
refs/heads/2.x from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=c3bc72b ]

IMPALA-6941: load more text scanner compression plugins

Add extensions for LZ4 and ZSTD (which are supported by Hadoop).
Even without a plugin this results in better behaviour because
we don't try to treat the files with unknown extensions as
uncompressed text.
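The suffix-to-codec mapping this describes can be sketched as follows (the table below is illustrative, not Impala's actual source):

```python
# Map file suffixes to compression codecs. An unknown suffix returns
# None, so the caller can decide how to handle it instead of blindly
# treating the file as uncompressed text.
SUFFIX_TO_CODEC = {
    ".lzo": "LZO",
    ".lz4": "LZ4",
    ".zst": "ZSTD",
    ".gz": "GZIP",
    ".bz2": "BZIP2",
    ".snappy": "SNAPPY",
}

def codec_for(path):
    """Return the codec name for a known suffix, or None if unknown."""
    for suffix, codec in SUFFIX_TO_CODEC.items():
        if path.endswith(suffix):
            return codec
    return None

print(codec_for("part-0.zst"))   # ZSTD
print(codec_for("part-0.txt"))   # None
```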

Also allow loading tables containing files with unsupported
compression types. There was weird behaviour before, where we knew
of the file extension but didn't support querying the table:
the catalog would load the table but the impalad would fail
processing the catalog update. The simplest way to fix it
is to just allow loading the tables.

Similarly, make the "LOAD DATA" operation more permissive -
we can copy files into a directory even if we can't
decompress them.

Switch to always checking the plugin version - running a mismatched
plugin is inherently unsafe.

Testing:
Positive case where LZO is loaded is exercised. Added
coverage for negative case where LZO is disabled.

Fixed test gaps:
* Querying LZO table with LZO plugin not available.
* Interacting with tables with known but unsupported text
  compressions.
* Querying files with unknown compression suffixes (which are
  treated as uncompressed text).

Change-Id: If2a9c4a4a11bed81df706e9e834400bfedfe48e6
Reviewed-on: http://gerrit.cloudera.org:8080/10165
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
Reviewed-on: http://gerrit.cloudera.org:8080/10462


> Allow loading more text scanner plugins
> ---
>
> Key: IMPALA-6941
> URL: https://issues.apache.org/jira/browse/IMPALA-6941
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> It would be nice if Impala supported loading plugins for scanning additional 
> text formats aside from LZO - the current logic is fairly specialized but 
> could easily be extended to load libraries for codecs like LZ4 and ZSTD if 
> available. It's kind of weird that we only support that one format.
> This might help a bit with IMPALA-6941 and IMPALA-3898 since we could test 
> the plugin-loading mechanism without relying on the external Impala-lzo 
> codebase.






[jira] [Commented] (IMPALA-7042) Unmatched quotes in comments confuse shell

2018-05-21 Thread Fredy Wijaya (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483418#comment-16483418
 ] 

Fredy Wijaya commented on IMPALA-7042:
--

For the following queries
{noformat}
> ";
> ";{noformat}
and
{noformat}
> ';
> ';{noformat}
Those queries actually make sense. They mean ";\n" and ';\n' respectively. 
That's why they need to be properly closed. There was a bug with "\n;" that 
would cause an infinite loop and was fixed in: 
[https://gerrit.cloudera.org/c/9195/]

IMPALA-2751 is a real bug.


> Unmatched quotes in comments confuse shell
> --
>
> Key: IMPALA-7042
> URL: https://issues.apache.org/jira/browse/IMPALA-7042
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.12.0
>Reporter: Alan Jackoway
>Priority: Major
>
> This jira is really similar to IMPALA-696 and IMPALA-2803 each of which claim 
> to be fixed, but this is still happening to me on 2.12.0.
> Basically the issue is that this command requires you to close the quote in 
> impala-shell:
> {code:sql}
> > -- Alan's test query
> > select 1 + 1;
> > ';
> {code}
> Then it runs after closing the quote on the third line.
> With a double quote, the behavior is even worse:
> {code:sql}
> > -- Alan"s test query
> > select 1 + 1;
> > ";
> > ";
> {code}
> I haven't found any way to convince impala the quote is closed and run the 
> query with a double quote in the comment. Fortunately unmatched double quotes 
> in comments should be rare. Unmatched single quotes come up in our comments 
> fairly frequently, which is how I found this.






[jira] [Resolved] (IMPALA-7051) Concurrent Maven invocations can break build

2018-05-21 Thread Philip Zeyliger (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Zeyliger resolved IMPALA-7051.
-
Resolution: Fixed

> Concurrent Maven invocations can break build
> 
>
> Key: IMPALA-7051
> URL: https://issues.apache.org/jira/browse/IMPALA-7051
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Philip Zeyliger
>Assignee: Philip Zeyliger
>Priority: Major
>
> Rarely I've seen our build fail when executing two Maven targets 
> simultaneously. Maven isn't really safe for concurrent execution (e.g., 
> ~/.m2/repository has no locking).
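One common workaround is to serialize invocations with an exclusive file lock on a well-known path; a minimal sketch (the lock path and wrapper function are illustrative, not part of the Impala build):

```python
import fcntl
import subprocess

def run_serialized(cmd, lock_path="/tmp/m2-repository.lock"):
    """Hold an exclusive flock for the whole command so two concurrent
    invocations can't mutate the shared repository at the same time."""
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)   # blocks until the lock is free
        try:
            return subprocess.call(cmd)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)

# e.g. run_serialized(["mvn", "-q", "install"])
```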






[jira] [Commented] (IMPALA-3833) Fix invalid data handling in Sequence and RCFile scanners

2018-05-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-3833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483438#comment-16483438
 ] 

ASF subversion and git services commented on IMPALA-3833:
-

Commit 69e88f70f9fffad1086e3e66ebb38be15a2b1c67 in impala's branch 
refs/heads/2.x from [~pranay_singh]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=69e88f7 ]

IMPALA-3833: Fix invalid data handling in Sequence and RCFile scanners

Introduced new error message when scanning a corrupt Sequence or RCFile.
Added new checks to detect buffer overrun while handling Sequence or RCFile.

Testing:
  a) Made changes to the fuzz test for RCFile/Sequence files and ran it in a loop
  with 200 iterations without failure.

  b) Ran exhaustive test on the changes without failure.

Change-Id: Ic9cfc38af3f30c65ada9734eb471dbfa6ecdd74a
Reviewed-on: http://gerrit.cloudera.org:8080/8936
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 


> Fix invalid data handling in Sequence and RCFile scanners
> -
>
> Key: IMPALA-3833
> URL: https://issues.apache.org/jira/browse/IMPALA-3833
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.7.0
>Reporter: Tim Armstrong
>Assignee: Pranay Singh
>Priority: Critical
>  Labels: crash, downgraded
>
> The fuzz testing found multiple crashes in sequence and RCFile scanners. 
> https://gerrit.cloudera.org/#/c/3448/
> I haven't triaged the crashes, but filing this issue to track them.






[jira] [Commented] (IMPALA-6070) Speed up test execution

2018-05-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483439#comment-16483439
 ] 

ASF subversion and git services commented on IMPALA-6070:
-

Commit 9116423a76ca1a11fdd440d20f6fd14700bf9df9 in impala's branch 
refs/heads/2.x from [~philip]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=9116423 ]

IMPALA-6070: Adding ASAN, --tail to test-with-docker.

* Adds -ASAN suites to test-with-docker.
* Adds --tail flag, which starts a tail subprocess. This
  isn't pretty (there's potential for overlap), but it's a dead simple
  way to keep an eye on what's going on.
* Fixes a bug wherein I could call "docker rm " twice
  simultaneously, which would make Docker fail the second call,
  and then fail the related "docker rmi". It's better to serialize,
  and I did that with a simple lock.
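The "simple lock" serialization described above can be sketched like this (the function name and injectable runner are illustrative, not the actual test-with-docker code):

```python
import subprocess
import threading

_docker_lock = threading.Lock()  # serializes container/image removal

def docker_rm(container, runner=subprocess.call):
    """Remove a container under a lock, so a second concurrent 'docker rm'
    can't race the first one and then fail the follow-up 'docker rmi'."""
    with _docker_lock:
        return runner(["docker", "rm", container])
```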

Change-Id: I51451cdf1352fc0f9516d729b9a77700488d993f
Reviewed-on: http://gerrit.cloudera.org:8080/10319
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 


> Speed up test execution
> ---
>
> Key: IMPALA-6070
> URL: https://issues.apache.org/jira/browse/IMPALA-6070
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Philip Zeyliger
>Assignee: Philip Zeyliger
>Priority: Major
> Attachments: screenshot-1.png
>
>
> Our tests (e.g., 
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/buildTimeTrend) tend 
> to take about 4 hours. This can be improved.
> I'm opening this JIRA to track those changes. I'm currently looking at:
> * Parallelizing multiple data-load steps: TPC-DS, TPC-H, and Functional take 
> ~65 minutes when serialized. They take 35 minutes if running in parallel.
> * Parallelizing compute stats: this takes ~10 minutes; probably can be faster.
> The trickier thing is parallelizing fe tests, ee tests, and custom cluster 
> tests. The approach I'm taking is to create a docker container with 
> everything in it (including data load), and then running tests in parallel. 
> This is a bit messier, but I think it has some legs when it comes to using 
> machines with many cores.
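The parallel data-load idea from the comment above (serialized ~65 minutes vs ~35 minutes in parallel) can be sketched with a thread pool; the loader function below is a hypothetical stand-in:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def load_dataset(name):
    """Stand-in for one independent data-load step (TPC-DS, TPC-H, Functional)."""
    time.sleep(0.01)      # pretend to do work
    return name

# Run the three independent steps concurrently instead of serially;
# pool.map preserves the input order of the results.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(load_dataset, ["tpcds", "tpch", "functional"]))
print(results)  # ['tpcds', 'tpch', 'functional']
```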






[jira] [Commented] (IMPALA-7011) Cleanups around PlanRootSink::CloseConsumer()

2018-05-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483442#comment-16483442
 ] 

ASF subversion and git services commented on IMPALA-7011:
-

Commit 482ea3914093064da1f4f176b6c616150100768c in impala's branch 
refs/heads/master from [~dhecht]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=482ea39 ]

IMPALA-7011: Simplify PlanRootSink control logic

1) The eos_ and sender_done_ bits really encode three possible states
   that the sender can be in. Make this explicit using an enum with
   three values.

2) The purpose of CloseConsumer() has changed over time and we can clean
   this up now:

 a) Originally, it looks like it was used to unblock the sender when the
   consumer finishes before eos, but also keep the sink alive long
   enough for the coordinator. This is no longer necessary now that
   control structures are owned by the QueryState whose lifetime is
   controlled by a reference count taken by the coordinator. So, we don't
   need the coordinator to tell the sink it's done calling it and we
   don't need the consumer_done_ state.

 b) Later on, CloseConsumer() was used as a cancellation mechanism.
   We need to keep this around (or use timeouts on the condvars) to kick
   both the consumer and producer on cancellation. But let's make the
   cancellation logic similar to the exec nodes and other sinks by
   driving the cancellation using the RuntimeState's cancellation
   flag. Now that CloseConsumer() is only about cancellation, rename it
   to Cancel() (later we may promote it to DataSink and implement in the
   data stream sender as well).
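The three-state sender enum described in point 1) might look roughly like this (Impala's backend is C++; this is a Python sketch with assumed state names):

```python
from enum import Enum

class SenderState(Enum):
    """Explicit sender states replacing the eos_/sender_done_ bit pair."""
    ROWS_PENDING = 1     # sender is still producing rows
    EOS = 2              # sender reached end-of-stream
    CLOSED_NOT_EOS = 3   # sender closed early, e.g. on cancellation

def sender_finished(state):
    # Both terminal states mean the consumer should stop waiting for rows.
    return state in (SenderState.EOS, SenderState.CLOSED_NOT_EOS)
```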

Testing:
- Exhaustive
- Minicluster concurrent_select.py stress

Change-Id: Ifc75617a253fd43a6122baa4b4dc7aeb1dbe633f
Reviewed-on: http://gerrit.cloudera.org:8080/10449
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins 


> Cleanups around PlanRootSink::CloseConsumer()
> -
>
> Key: IMPALA-7011
> URL: https://issues.apache.org/jira/browse/IMPALA-7011
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Dan Hecht
>Assignee: Dan Hecht
>Priority: Minor
>
> We may not need some CloseConsumer() calls. Also, this is more about 
> cancellation than closing, so I think we should rename it (and perhaps 
> integrate more directly with the normal cancellation mechanisms).






[jira] [Work started] (IMPALA-2751) quote in WITH block's comment breaks shell

2018-05-21 Thread Fredy Wijaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-2751 started by Fredy Wijaya.

> quote in WITH block's comment breaks shell
> --
>
> Key: IMPALA-2751
> URL: https://issues.apache.org/jira/browse/IMPALA-2751
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2
> Environment: CDH5.4.8
>Reporter: Marcell Szabo
>Assignee: Fredy Wijaya
>Priority: Minor
>  Labels: impala-shell, shell, usability
>
> Steps to reproduce:
> $ cat > test.sql
> with a as (
> select 'a'
> -- shouldn't matter
> ) 
> select * from a; 
> $ impala-shell -f test.sql 
> /usr/bin/impala-shell: line 32: warning: setlocale: LC_CTYPE: cannot change 
> locale (UTF-8): No such file or directory
> /usr/bin/impala-shell: line 32: warning: setlocale: LC_CTYPE: cannot change 
> locale (UTF-8): No such file or directory
> Starting Impala Shell without Kerberos authentication
> Connected to host:21000
> Server version: impalad version 2.2.0-cdh5 RELEASE (build 
> 1d0b017e2441dd8950924743d839f14b3995e259)
> Traceback (most recent call last):
>   File "/usr/lib/impala-shell/impala_shell.py", line 1006, in 
> execute_queries_non_interactive_mode(options)
>   File "/usr/lib/impala-shell/impala_shell.py", line 922, in 
> execute_queries_non_interactive_mode
> if shell.onecmd(query) is CmdStatus.ERROR:
>   File "/usr/lib64/python2.6/cmd.py", line 219, in onecmd
> return func(arg)
>   File "/usr/lib/impala-shell/impala_shell.py", line 762, in do_with
> tokens = list(lexer)
>   File "/usr/lib64/python2.6/shlex.py", line 269, in next
> token = self.get_token()
>   File "/usr/lib64/python2.6/shlex.py", line 96, in get_token
> raw = self.read_token()
>   File "/usr/lib64/python2.6/shlex.py", line 172, in read_token
> raise ValueError, "No closing quotation"
> ValueError: No closing quotation
> Also, copy-pasting the query interactively, the line never closes.
> Strangely, the issue only seems to occur in the presence of the WITH block.






[jira] [Assigned] (IMPALA-2751) quote in WITH block's comment breaks shell

2018-05-21 Thread Fredy Wijaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-2751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya reassigned IMPALA-2751:


Assignee: Fredy Wijaya

> quote in WITH block's comment breaks shell
> --
>
> Key: IMPALA-2751
> URL: https://issues.apache.org/jira/browse/IMPALA-2751
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.2
> Environment: CDH5.4.8
>Reporter: Marcell Szabo
>Assignee: Fredy Wijaya
>Priority: Minor
>  Labels: impala-shell, shell, usability
>
> Steps to reproduce:
> $ cat > test.sql
> with a as (
> select 'a'
> -- shouldn't matter
> ) 
> select * from a; 
> $ impala-shell -f test.sql 
> /usr/bin/impala-shell: line 32: warning: setlocale: LC_CTYPE: cannot change 
> locale (UTF-8): No such file or directory
> /usr/bin/impala-shell: line 32: warning: setlocale: LC_CTYPE: cannot change 
> locale (UTF-8): No such file or directory
> Starting Impala Shell without Kerberos authentication
> Connected to host:21000
> Server version: impalad version 2.2.0-cdh5 RELEASE (build 
> 1d0b017e2441dd8950924743d839f14b3995e259)
> Traceback (most recent call last):
>   File "/usr/lib/impala-shell/impala_shell.py", line 1006, in 
> execute_queries_non_interactive_mode(options)
>   File "/usr/lib/impala-shell/impala_shell.py", line 922, in 
> execute_queries_non_interactive_mode
> if shell.onecmd(query) is CmdStatus.ERROR:
>   File "/usr/lib64/python2.6/cmd.py", line 219, in onecmd
> return func(arg)
>   File "/usr/lib/impala-shell/impala_shell.py", line 762, in do_with
> tokens = list(lexer)
>   File "/usr/lib64/python2.6/shlex.py", line 269, in next
> token = self.get_token()
>   File "/usr/lib64/python2.6/shlex.py", line 96, in get_token
> raw = self.read_token()
>   File "/usr/lib64/python2.6/shlex.py", line 172, in read_token
> raise ValueError, "No closing quotation"
> ValueError: No closing quotation
> Also, copy-pasting the query interactively, the line never closes.
> Strangely, the issue only seems to occur in the presence of the WITH block.






[jira] [Work started] (IMPALA-6917) Implement COMMENT ON TABLE

2018-05-21 Thread Fredy Wijaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-6917 started by Fredy Wijaya.

> Implement COMMENT ON TABLE
> --
>
> Key: IMPALA-6917
> URL: https://issues.apache.org/jira/browse/IMPALA-6917
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Fredy Wijaya
>Assignee: Fredy Wijaya
>Priority: Minor
>
> Syntax:
> {noformat}
> COMMENT ON TABLE my_db.my_table IS 'Employee Information';{noformat}






[jira] [Assigned] (IMPALA-6917) Implement COMMENT ON TABLE

2018-05-21 Thread Fredy Wijaya (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fredy Wijaya reassigned IMPALA-6917:


Assignee: Fredy Wijaya

> Implement COMMENT ON TABLE
> --
>
> Key: IMPALA-6917
> URL: https://issues.apache.org/jira/browse/IMPALA-6917
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Fredy Wijaya
>Assignee: Fredy Wijaya
>Priority: Minor
>
> Syntax:
> {noformat}
> COMMENT ON TABLE my_db.my_table IS 'Employee Information';{noformat}






[jira] [Resolved] (IMPALA-7019) Discard block locations and schedule as remote read with erasure coding

2018-05-21 Thread Tianyi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tianyi Wang resolved IMPALA-7019.
-
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Discard block locations and schedule as remote read with erasure coding
> ---
>
> Key: IMPALA-7019
> URL: https://issues.apache.org/jira/browse/IMPALA-7019
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Tianyi Wang
>Assignee: Tianyi Wang
>Priority: Major
> Fix For: Impala 3.1.0
>
>
> Currently Impala schedules erasure-coded scans the same way as regular
> HDFS scans: it tries to schedule the scan on a datanode processing the
> block. This makes little sense with erasure coding, so we should schedule it
> as if the block is remote.
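The described behaviour can be sketched as: discard the replica locations for erasure-coded blocks and fall through to the remote-read path (all field and function names below are hypothetical):

```python
def schedule_scan_range(block, executors):
    """Pick an executor for one scan range.

    For an erasure-coded block no single datanode stores the whole block,
    so host affinity is meaningless: discard the locations and schedule
    it like any remote read (a deterministic round-robin here).
    """
    locations = [] if block.get("erasure_coded") else block.get("locations", [])
    local = [e for e in executors if e in locations]
    if local:
        return local[0]                          # local read
    return executors[block["id"] % len(executors)]  # remote read
```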






[jira] [Commented] (IMPALA-6998) test_bloom_wait_time fails due to late arrival of filters on Isilon

2018-05-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483440#comment-16483440
 ] 

ASF subversion and git services commented on IMPALA-6998:
-

Commit fb876f7e3b3f441760dbad972d56a863401a2437 in impala's branch 
refs/heads/2.x from [~sailesh]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=fb876f7 ]

IMPALA-6998: test_bloom_wait_time fails due to late arrival of filters on Isilon

This test has been failing on Isilon runs, most likely due to timing issues,
which makes this a test issue rather than a product bug.

This patch disables the test for Isilon. We should revisit what tests we run
on non-HDFS filesystems later on, but until then, this should unblock
the build.

Change-Id: I2df6983a65a50b7efdd482124b70f518ee4c3229
Reviewed-on: http://gerrit.cloudera.org:8080/10366
Reviewed-by: Sailesh Mukil 
Tested-by: Impala Public Jenkins 


> test_bloom_wait_time fails due to late arrival of filters on Isilon
> ---
>
> Key: IMPALA-6998
> URL: https://issues.apache.org/jira/browse/IMPALA-6998
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.13.0
>Reporter: Sailesh Mukil
>Priority: Critical
>  Labels: broken-build
>
> This is likely a flaky issue and was seen on an instance of an Isilon run:
> {code:java}
> Error Message
> query_test/test_runtime_filters.py:92: in test_bloom_wait_time assert 
> duration < 60, \ E   AssertionError: Query took too long (118.044356108s, 
> possibly waiting for missing filters?) E   assert 118.04435610771179 < 60
> Stacktrace
> query_test/test_runtime_filters.py:92: in test_bloom_wait_time
> assert duration < 60, \
> E   AssertionError: Query took too long (118.044356108s, possibly waiting for 
> missing filters?)
> E   assert 118.04435610771179 < 60
> Standard Error
> -- executing against localhost:21000
> use functional_parquet;
> SET batch_size=0;
> SET num_nodes=0;
> SET disable_codegen_rows_threshold=0;
> SET disable_codegen=False;
> SET abort_on_error=1;
> SET exec_single_node_rows_threshold=0;
> -- executing against localhost:21000
> SET RUNTIME_FILTER_WAIT_TIME_MS=60;
> -- executing against localhost:21000
> SET RUNTIME_FILTER_MODE=GLOBAL;
> -- executing against localhost:21000
> SET RUNTIME_FILTER_MAX_SIZE=64K;
> -- executing against localhost:21000
> with l as (select * from tpch.lineitem UNION ALL select * from tpch.lineitem)
> select STRAIGHT_JOIN count(*) from (select * from tpch.lineitem a LIMIT 1) a
> join (select * from l LIMIT 50) b on a.l_orderkey = -b.l_orderkey;
> -- executing against localhost:21000
> SET RUNTIME_FILTER_WAIT_TIME_MS="0";
> -- executing against localhost:21000
> SET RUNTIME_FILTER_MODE="GLOBAL";
> -- executing against localhost:21000
> SET RUNTIME_FILTER_MAX_SIZE="16777216";
> {code}
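The committed fix simply disables the test on Isilon. As a hedged sketch of the pattern (the real Impala suite is pytest-based with its own SkipIf helpers; the environment probe and test body below are illustrative stand-ins, not the actual test code):

```python
import os
import unittest

def target_filesystem():
    # Hypothetical probe; the real Impala test suite has its own
    # helpers for detecting the target filesystem under test.
    return os.environ.get("TARGET_FILESYSTEM", "hdfs")

class TestRuntimeFilters(unittest.TestCase):
    # Skip the timing-sensitive assertion on the filesystem where
    # filters are known to arrive late, rather than letting it go red.
    @unittest.skipIf(target_filesystem() == "isilon",
                     "IMPALA-6998: filters can arrive late on Isilon")
    def test_bloom_wait_time(self):
        duration = 1.5  # stand-in for the measured query duration
        self.assertLess(duration, 60,
                        "possibly waiting for missing filters?")
```

On Isilon the test is reported as skipped; everywhere else the duration assertion still runs.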



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)




[jira] [Commented] (IMPALA-7051) Concurrent Maven invocations can break build

2018-05-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483444#comment-16483444
 ] 

ASF subversion and git services commented on IMPALA-7051:
-

Commit 23e11dc72662417059b1b7337d69e78c2ac4ba65 in impala's branch 
refs/heads/master from [~philip]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=23e11dc ]

IMPALA-7051: Serialize Maven invocations.

I've observed some rare cases where Impala fails to build. I believe
it's because two Maven targets (yarn-extras and ext-data-source) are
being executed simultaneously. Maven's handling of ~/.m2/repository,
for example, is known to be not safe.

This patch serializes the Maven builds with the following
dependency graph:
  fe -> yarn-extras -> ext-data-source -> impala-parent
The ordering of yarn-extras -> ext-data-source is arbitrary.

I decided that this artificial dependency was the clearest
way to prevent parallel executions. Having mvn-quiet.sh
take a lock seemed considerably more complex.

Change-Id: Ie24f34f421bc7dcf9140938464d43400da95275e
Reviewed-on: http://gerrit.cloudera.org:8080/10460
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 


> Concurrent Maven invocations can break build
> 
>
> Key: IMPALA-7051
> URL: https://issues.apache.org/jira/browse/IMPALA-7051
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Philip Zeyliger
>Assignee: Philip Zeyliger
>Priority: Major
>
> Rarely I've seen our build fail when executing two Maven targets 
> simultaneously. Maven isn't really safe for concurrent execution (e.g., 
> ~/.m2/repository has no locking).
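The committed fix serializes the builds through an artificial dependency edge in the build graph. The rejected alternative (having mvn-quiet.sh take a lock) could be sketched as follows; this is a minimal illustration under assumed names, not Impala's actual wrapper, and the lock path is made up:

```python
import fcntl
import subprocess

def run_serialized(cmd, lock_path="/tmp/mvn_build.lock"):
    # Hold an exclusive flock for the duration of the command so that
    # concurrent invocations queue up instead of racing on shared state
    # (for Maven, the unguarded ~/.m2/repository).
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            return subprocess.call(cmd)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)

# e.g. run_serialized(["mvn", "-q", "install"]) in place of a bare mvn call
```

The build-graph approach avoids this extra machinery, which is why the commit message calls the lock considerably more complex.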






[jira] [Commented] (IMPALA-7019) Discard block locations and schedule as remote read with erasure coding

2018-05-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483443#comment-16483443
 ] 

ASF subversion and git services commented on IMPALA-7019:
-

Commit 21d92aacbfdbe9780b983acfacd02ced4bb0c132 in impala's branch 
refs/heads/master from [~tianyiwang]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=21d92aa ]

IMPALA-7019: Schedule EC as remote & disable failed tests

This patch schedules HDFS EC files without considering locality. Failed
tests are disabled, and a Jenkins build should succeed with export
ERASURE_CODING=true.

Testing: It passes core tests.

Cherry-picks: not for 2.x.

Change-Id: I138738d3e28e5daa1718c05c04cd9dd146c4ff84
Reviewed-on: http://gerrit.cloudera.org:8080/10413
Reviewed-by: Taras Bobrovytsky 
Tested-by: Impala Public Jenkins 


> Discard block locations and schedule as remote read with erasure coding
> ---
>
> Key: IMPALA-7019
> URL: https://issues.apache.org/jira/browse/IMPALA-7019
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 3.1.0
>Reporter: Tianyi Wang
>Assignee: Tianyi Wang
>Priority: Major
>
> Currently Impala schedules an erasure-coded scan in the same way as a
> regular HDFS scan: it tries to schedule the scan on a datanode hosting the
> block. This makes little sense with erasure coding, so we should schedule it
> as if the block were remote.






[jira] [Commented] (IMPALA-6317) Expose -cmake_only flag to buildall.sh

2018-05-21 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483441#comment-16483441
 ] 

ASF subversion and git services commented on IMPALA-6317:
-

Commit 7485d6082cb9d298e7ba5a829ff15dcc4937d338 in impala's branch 
refs/heads/master from [~dknupp]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=7485d60 ]

IMPALA-6317: Add -cmake_only option to buildall.sh

It's sometimes useful to be able to build a complete Impala dev
environment without necessarily building the Impala binary itself
-- e.g., when one wants to use the internal test framework to run
tests against an instance of Impala running on a remote cluster.

- This patch adds a -cmake_only flag to buildall.sh, which then
  gets propagated to make_impala.sh.

- Added a missing line to the help text re: passing the -ninja
  command line option.

Change-Id: If31a4e29425a6a20059cba2f43b72e4fb908018f
Reviewed-on: http://gerrit.cloudera.org:8080/10455
Reviewed-by: David Knupp 
Tested-by: Impala Public Jenkins 


> Expose -cmake_only flag to buildall.sh
> --
>
> Key: IMPALA-6317
> URL: https://issues.apache.org/jira/browse/IMPALA-6317
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 2.11.0
>Reporter: David Knupp
>Assignee: David Knupp
>Priority: Minor
>
> Impala/bin/make_impala.sh has a {{-cmake_only}} command line option:
> {noformat}
> -cmake_only)
>   CMAKE_ONLY=1
> {noformat}
> Passing this flag means that makefiles only will be generated during the 
> build. However, this flag is not provided in buildall.sh (the caller of 
> make_impala.sh) which effectively renders it useless.
> It turns out that if one has no intention of running the Impala cluster 
> locally (e.g., as when trying to build just enough of the toolchain and dev 
> environment to run the data load scripts for loading data onto a remote 
> cluster) then being able to only generate makefiles is a useful thing.






[jira] [Commented] (IMPALA-7054) "Top-25 tables with highest memory requirements" sorts incorrectly

2018-05-21 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-7054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16483180#comment-16483180
 ] 

Todd Lipcon commented on IMPALA-7054:
-

Ah, it seems so. I was looking at a release rather than the latest trunk build.
Sorry for the noise.

> "Top-25 tables with highest memory requirements" sorts incorrectly
> --
>
> Key: IMPALA-7054
> URL: https://issues.apache.org/jira/browse/IMPALA-7054
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.12.0
>Reporter: Todd Lipcon
>Priority: Minor
>
> The table on catalogd:25020/catalog has an "estimated memory" column which 
> sorts based on the stringified value. For example, "2.07 GB" sorts below 
> "23.65 MB".
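The underlying fix is to sort on the numeric byte count rather than the rendered string. A minimal sketch of such a sort key (the unit table and regex here are illustrative, not catalogd's actual code):

```python
import re

_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

def size_key(text):
    """Parse a human-readable size like '2.07 GB' into bytes so that
    rows sort numerically instead of lexicographically."""
    m = re.fullmatch(r"([\d.]+)\s*([KMGT]?B)", text.strip())
    if not m:
        return 0.0  # unknown formats sort first
    return float(m.group(1)) * _UNITS[m.group(2)]

sizes = ["23.65 MB", "2.07 GB", "512 KB"]
print(sorted(sizes, key=size_key))  # → ['512 KB', '23.65 MB', '2.07 GB']
```

With the string sort, "2.07 GB" lands before "23.65 MB"; with the numeric key it correctly sorts last.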






[jira] [Resolved] (IMPALA-7054) "Top-25 tables with highest memory requirements" sorts incorrectly

2018-05-21 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-7054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved IMPALA-7054.
-
Resolution: Duplicate

> "Top-25 tables with highest memory requirements" sorts incorrectly
> --
>
> Key: IMPALA-7054
> URL: https://issues.apache.org/jira/browse/IMPALA-7054
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.12.0
>Reporter: Todd Lipcon
>Priority: Minor
>
> The table on catalogd:25020/catalog has an "estimated memory" column which 
> sorts based on the stringified value. For example, "2.07 GB" sorts below 
> "23.65 MB".






[jira] [Work started] (IMPALA-6020) REFRESH statement cannot detect HDFS block movement

2018-05-21 Thread Alex Rodoni (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-6020 started by Alex Rodoni.
---
> REFRESH statement cannot detect HDFS block movement
> ---
>
> Key: IMPALA-6020
> URL: https://issues.apache.org/jira/browse/IMPALA-6020
> Project: IMPALA
>  Issue Type: Bug
>  Components: Docs
>Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0
>Reporter: Jim Apple
>Assignee: Alex Rodoni
>Priority: Major
>
> In the release notes, it says
> http://impala.apache.org/docs/build/html/topics/impala_new_features.html
> {quote}The REFRESH statement now updates information about HDFS block 
> locations. Therefore, you can perform a fast and efficient REFRESH after 
> doing an HDFS rebalancing operation instead of the more expensive INVALIDATE 
> METADATA statement.
> {quote}
> However, there is no change on the HDFS or Impala side to support this; there
> may be some misunderstanding. After HDFS load balancing, the user still needs
> to run INVALIDATE METADATA to get the latest block metadata.






[jira] [Updated] (IMPALA-4970) Record identity of largest latency ExecQueryFInstances() RPC per query

2018-05-21 Thread Sailesh Mukil (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailesh Mukil updated IMPALA-4970:
--
Labels: newbie ramp-up  (was: newbie)

> Record identity of largest latency ExecQueryFInstances() RPC per query
> --
>
> Key: IMPALA-4970
> URL: https://issues.apache.org/jira/browse/IMPALA-4970
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 2.9.0
>Reporter: Henry Robinson
>Assignee: Rahul Shivu Mahadev
>Priority: Major
>  Labels: newbie, ramp-up
>
> Although we retain the histogram of fragment instance startup latencies, we 
> don't record the identity of the most expensive instance, or the host it runs 
> on. This would be helpful in diagnosing slow query start-up times.
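Keeping the identity alongside the histogram is only a small amount of extra state. A hedged sketch of the idea (class and field names are invented for illustration, not Impala's actual counters):

```python
class MaxLatencyTracker:
    """Alongside the existing latency histogram, remember which fragment
    instance (and host) had the slowest startup RPC, to aid diagnosing
    slow query start-up."""

    def __init__(self):
        self.max_latency_ms = 0.0
        self.slowest_instance = None
        self.slowest_host = None

    def record(self, instance_id, host, latency_ms):
        # The histogram update would happen here as it does today; the
        # addition is remembering the identity of the maximum observed.
        if latency_ms > self.max_latency_ms:
            self.max_latency_ms = latency_ms
            self.slowest_instance = instance_id
            self.slowest_host = host

tracker = MaxLatencyTracker()
tracker.record("frag-1", "host-a", 12.0)
tracker.record("frag-2", "host-b", 87.5)
print(tracker.slowest_instance, tracker.slowest_host)  # → frag-2 host-b
```

The slowest instance and host could then be emitted into the query profile next to the existing histogram.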






[jira] [Assigned] (IMPALA-4970) Record identity of largest latency ExecQueryFInstances() RPC per query

2018-05-21 Thread Sailesh Mukil (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-4970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailesh Mukil reassigned IMPALA-4970:
-

Assignee: Rahul Shivu Mahadev

> Record identity of largest latency ExecQueryFInstances() RPC per query
> --
>
> Key: IMPALA-4970
> URL: https://issues.apache.org/jira/browse/IMPALA-4970
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Affects Versions: Impala 2.9.0
>Reporter: Henry Robinson
>Assignee: Rahul Shivu Mahadev
>Priority: Major
>  Labels: newbie
>
> Although we retain the histogram of fragment instance startup latencies, we 
> don't record the identity of the most expensive instance, or the host it runs 
> on. This would be helpful in diagnosing slow query start-up times.


