[jira] [Commented] (IMPALA-6729) Provide startup option to disable file and block location cache

2018-05-03 Thread Quanlong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463360#comment-16463360
 ] 

Quanlong Huang commented on IMPALA-6729:


[~tianyiwang], thanks for your accurate summary!

Let me share some experimental results from Impala-2.12.0-rc1. Thanks to 
the new catalogd web UI, I can see the memory consumption of a table's 
metadata. I just uploaded a snapshot taken after two huge tables were 
loaded (see attachments). Both queries encountered OutOfMemoryError. The 
upp_published_prod and upp_generated_prod tables each consume ~10GB of 
memory; caching that much metadata is wasteful. The catalogd finally 
failed to send out the catalog updates. Here is the log:
{code:java}
I0503 19:29:42.871878 32749 TableLoader.java:58] Loading metadata for: 
default.upp_raw_prod
I0503 19:29:42.872372 32748 TableLoadingMgr.java:70] Loading metadata for 
table: default.upp_raw_prod
I0503 19:29:42.872709 32748 TableLoadingMgr.java:72] Remaining items in queue: 
0. Loads in progress: 1
I0503 19:29:43.332053 32749 HdfsTable.java:1276] Fetching partition metadata 
from the Metastore: default.upp_raw_prod
W0503 19:29:43.493882 32749 HiveConf.java:2897] HiveConf of name 
hive.access.conf.url does not exist
I0503 19:29:43.736603 32749 HdfsTable.java:1280] Fetched partition metadata 
from the Metastore: default.upp_raw_prod
I0503 19:29:45.541966 32749 HdfsTable.java:902] Loading file and block metadata 
for 794 paths for table default.upp_raw_prod using a thread pool of size 5
I0503 19:30:20.067162 32749 HdfsTable.java:942] Loaded file and block metadata 
for default.upp_raw_prod
I0503 19:30:20.071476 32749 TableLoader.java:97] Loaded metadata for: 
default.upp_raw_prod
I0503 19:30:22.708765 31655 catalog-server.cc:479] Collected update: 
TABLE:default.upp_raw_prod, version=17937, original size=94545234, compressed 
size=25210956
I0503 19:30:22.713205 31655 catalog-server.cc:479] Collected update: 
CATALOG_SERVICE_ID, version=17937, original size=49, compressed size=52
I0503 19:30:22.804253 31660 catalog-server.cc:243] A catalog update with 2 
entries is assembled. Catalog version: 17937 Last sent catalog version: 17936
I0503 19:30:38.328853 32748 TableLoadingMgr.java:70] Loading metadata for 
table: default.upp_generated_prod
I0503 19:30:38.329080 33260 TableLoader.java:58] Loading metadata for: 
default.upp_generated_prod
I0503 19:30:38.329207 32748 TableLoadingMgr.java:72] Remaining items in queue: 
0. Loads in progress: 1
I0503 19:30:38.586901 33260 HdfsTable.java:1276] Fetching partition metadata 
from the Metastore: default.upp_generated_prod
I0503 19:31:03.355075 33260 HdfsTable.java:1280] Fetched partition metadata 
from the Metastore: default.upp_generated_prod
I0503 19:32:59.218356 33260 HdfsTable.java:902] Loading file and block metadata 
for 104474 paths for table default.upp_generated_prod using a thread pool of 
size 5
I0503 19:43:02.928905 42338 webserver.cc:361] Webserver: error reading: 
Resource temporarily unavailable
I0503 19:43:42.053800 33260 HdfsTable.java:942] Loaded file and block metadata 
for default.upp_generated_prod
I0503 19:43:42.054765 33260 TableLoader.java:97] Loaded metadata for: 
default.upp_generated_prod
I0503 19:43:51.832700 32748 jni-util.cc:230] java.lang.OutOfMemoryError
at 
java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
at 
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at 
org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145)
at 
org.apache.thrift.protocol.TBinaryProtocol.writeBinary(TBinaryProtocol.java:211)
at 
org.apache.impala.thrift.THdfsFileDesc$THdfsFileDescStandardScheme.write(THdfsFileDesc.java:366)
at 
org.apache.impala.thrift.THdfsFileDesc$THdfsFileDescStandardScheme.write(THdfsFileDesc.java:329)
at org.apache.impala.thrift.THdfsFileDesc.write(THdfsFileDesc.java:280)
at 
org.apache.impala.thrift.THdfsPartition$THdfsPartitionStandardScheme.write(THdfsPartition.java:2044)
at 
org.apache.impala.thrift.THdfsPartition$THdfsPartitionStandardScheme.write(THdfsPartition.java:1777)
at 
org.apache.impala.thrift.THdfsPartition.write(THdfsPartition.java:1602)
at 
org.apache.impala.thrift.THdfsTable$THdfsTableStandardScheme.write(THdfsTable.java:1243)
at 
org.apache.impala.thrift.THdfsTable$THdfsTableStandardScheme.write(THdfsTable.java:1071)
at org.apache.impala.thrift.THdfsTable.write(THdfsTable.java:940)
at 
org.apache.impala.thrift.TTable$TTableStandardScheme.write(TTable.java:1628)
at 
org.apache.impala.thrift.TTable$TTableStandardScheme.write(TTable.java:1399)
at 

[jira] [Commented] (IMPALA-6970) DiskMgr::AllocateBuffersForRange crashes on failed DCHECK

2018-05-03 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463231#comment-16463231
 ] 

Tim Armstrong commented on IMPALA-6970:
---

So the failing test is

{noformat} 09:30:36 
query_test/test_cancellation.py::TestCancellationSerial::test_cancel_insert[table_format:
 seq/gzip/block | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
0} | query_type: SELECT | cancel_delay: 3 | action: None | query: compute stats 
lineitem | buffer_pool_limit: 0] FAILED
{noformat}

It might be happening only on the local filesystem build because multiple 
scan ranges are scheduled on the same daemon there.
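
For reference, a minimal sketch (illustrative names, not Impala's actual 
ReservationTracker) of the invariant that the failed DCHECK in the stack 
quoted below enforces: an allocation may never exceed the unused portion of 
the reservation, yet the crash shows 8388608 bytes requested against only 
6291456 unused.

{code:cpp}
#include <cassert>
#include <cstdint>

// Illustrative stand-in for ReservationTracker's accounting.
struct ReservationSketch {
  int64_t reservation = 0;  // Total bytes reserved by this tracker.
  int64_t used = 0;         // Bytes already allocated from the reservation.

  int64_t unused_reservation() const { return reservation - used; }

  void AllocateFrom(int64_t bytes) {
    // The invariant behind reservation-tracker.cc:376; the crash hit it
    // with bytes=8388608 and unused_reservation()=6291456.
    assert(bytes <= unused_reservation());
    used += bytes;
  }
};
{code}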

> DiskMgr::AllocateBuffersForRange crashes on failed DCHECK
> -
>
> Key: IMPALA-6970
> URL: https://issues.apache.org/jira/browse/IMPALA-6970
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.13.0
>Reporter: Sailesh Mukil
>Priority: Blocker
>  Labels: crash
> Attachments: stacks.txt
>
>
> Similar to IMPALA-6587, but the DCHECK failed in a slightly different way. 
> Cannot tell if the root cause is the same as that though without further 
> investigation.
> {code:java}
> FSF0503 09:30:26.715791 30750 reservation-tracker.cc:376] Check failed: bytes 
> <= unused_reservation() (8388608 vs. 6291456) 
> *** Check failure stack trace: ***
> @  0x4277c1d  google::LogMessage::Fail()
> @  0x42794c2  google::LogMessage::SendToLog()
> @  0x42775f7  google::LogMessage::Flush()
> @  0x427abbe  google::LogMessageFatal::~LogMessageFatal()
> @  0x1ef1343  impala::ReservationTracker::AllocateFromLocked()
> @  0x1ef111d  impala::ReservationTracker::AllocateFrom()
> @  0x1ee8c57  
> impala::BufferPool::Client::PrepareToAllocateBuffer()
> @  0x1ee5543  impala::BufferPool::AllocateBuffer()
> @  0x2f50f68  impala::io::DiskIoMgr::AllocateBuffersForRange()
> @  0x1f74762  impala::HdfsScanNodeBase::StartNextScanRange()
> @  0x1f6b052  impala::HdfsScanNode::ScannerThread()
> @  0x1f6a4ea  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x1f6c5cc  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x1bd4748  boost::function0<>::operator()()
> @  0x1ebf349  impala::Thread::SuperviseThread()
> @  0x1ec74e5  boost::_bi::list5<>::operator()<>()
> @  0x1ec7409  boost::_bi::bind_t<>::operator()()
> @  0x1ec73cc  boost::detail::thread_data<>::run()
> @  0x31a1f0a  thread_proxy
> @   0x36d1607851  (unknown)
> @   0x36d12e894d  (unknown)
> {code}
> Git hash of Impala used in job: ba84ad03cb83d7f7aed8524fcfbb0e2cdc9fdd53



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6970) DiskMgr::AllocateBuffersForRange crashes on failed DCHECK

2018-05-03 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463230#comment-16463230
 ] 

Tim Armstrong commented on IMPALA-6970:
---

{noformat}
(gdb) p **scan_range
$29 = (impala::io::ScanRange) {
  <impala::io::RequestRange> = {
    <impala::InternalQueue<impala::io::RequestRange>::Node> = {
      _vptr.Node = 0x61a6ef0 <vtable for impala::io::ScanRange+16>, 
      parent_queue = 0x0, 
      next = 0x0, 
      prev = 0x0
    }, 
    members of impala::io::RequestRange: 
    fs_ = 0xbf86828, 
    file_ = {
      static npos = <optimized out>, 
      _M_dataplus = {
        <std::allocator<char>> = {
          <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
        members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider: 
        _M_p = 0xbef1f18 
"file:/tmp/test-warehouse/tpch.lineitem_seq_gzip/00_0"
      }
    }, 
    offset_ = 33554432, 
    len_ = 9276980, 
    disk_id_ = 1, 
    request_type_ = impala::io::RequestType::READ
  }, 
{noformat}

> DiskMgr::AllocateBuffersForRange crashes on failed DCHECK
> -
>
> Key: IMPALA-6970
> URL: https://issues.apache.org/jira/browse/IMPALA-6970
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.13.0
>Reporter: Sailesh Mukil
>Priority: Blocker
>  Labels: crash
> Attachments: stacks.txt
>
>
> Similar to IMPALA-6587, but the DCHECK failed in a slightly different way. 
> Cannot tell if the root cause is the same as that though without further 
> investigation.
> {code:java}
> FSF0503 09:30:26.715791 30750 reservation-tracker.cc:376] Check failed: bytes 
> <= unused_reservation() (8388608 vs. 6291456) 
> *** Check failure stack trace: ***
> @  0x4277c1d  google::LogMessage::Fail()
> @  0x42794c2  google::LogMessage::SendToLog()
> @  0x42775f7  google::LogMessage::Flush()
> @  0x427abbe  google::LogMessageFatal::~LogMessageFatal()
> @  0x1ef1343  impala::ReservationTracker::AllocateFromLocked()
> @  0x1ef111d  impala::ReservationTracker::AllocateFrom()
> @  0x1ee8c57  
> impala::BufferPool::Client::PrepareToAllocateBuffer()
> @  0x1ee5543  impala::BufferPool::AllocateBuffer()
> @  0x2f50f68  impala::io::DiskIoMgr::AllocateBuffersForRange()
> @  0x1f74762  impala::HdfsScanNodeBase::StartNextScanRange()
> @  0x1f6b052  impala::HdfsScanNode::ScannerThread()
> @  0x1f6a4ea  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x1f6c5cc  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x1bd4748  boost::function0<>::operator()()
> @  0x1ebf349  impala::Thread::SuperviseThread()
> @  0x1ec74e5  boost::_bi::list5<>::operator()<>()
> @  0x1ec7409  boost::_bi::bind_t<>::operator()()
> @  0x1ec73cc  boost::detail::thread_data<>::run()
> @  0x31a1f0a  thread_proxy
> @   0x36d1607851  (unknown)
> @   0x36d12e894d  (unknown)
> {code}
> Git hash of Impala used in job: ba84ad03cb83d7f7aed8524fcfbb0e2cdc9fdd53



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6970) DiskMgr::AllocateBuffersForRange crashes on failed DCHECK

2018-05-03 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463216#comment-16463216
 ] 

Tim Armstrong commented on IMPALA-6970:
---

{noformat}
(gdb) p runtime_state_->query_state_->query_ctx_.client_request.stmt
$24 = {
  static npos = <optimized out>, 
  _M_dataplus = {
    <std::allocator<char>> = {
      <__gnu_cxx::new_allocator<char>> = {<No data fields>}, <No data fields>}, 
    members of std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Alloc_hider: 
    _M_p = 0xaea0e58 "SELECT COUNT(*) FROM lineitem"
  }
}
{noformat}

> DiskMgr::AllocateBuffersForRange crashes on failed DCHECK
> -
>
> Key: IMPALA-6970
> URL: https://issues.apache.org/jira/browse/IMPALA-6970
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.13.0
>Reporter: Sailesh Mukil
>Priority: Blocker
>  Labels: crash
> Attachments: stacks.txt
>
>
> Similar to IMPALA-6587, but the DCHECK failed in a slightly different way. 
> Cannot tell if the root cause is the same as that though without further 
> investigation.
> {code:java}
> FSF0503 09:30:26.715791 30750 reservation-tracker.cc:376] Check failed: bytes 
> <= unused_reservation() (8388608 vs. 6291456) 
> *** Check failure stack trace: ***
> @  0x4277c1d  google::LogMessage::Fail()
> @  0x42794c2  google::LogMessage::SendToLog()
> @  0x42775f7  google::LogMessage::Flush()
> @  0x427abbe  google::LogMessageFatal::~LogMessageFatal()
> @  0x1ef1343  impala::ReservationTracker::AllocateFromLocked()
> @  0x1ef111d  impala::ReservationTracker::AllocateFrom()
> @  0x1ee8c57  
> impala::BufferPool::Client::PrepareToAllocateBuffer()
> @  0x1ee5543  impala::BufferPool::AllocateBuffer()
> @  0x2f50f68  impala::io::DiskIoMgr::AllocateBuffersForRange()
> @  0x1f74762  impala::HdfsScanNodeBase::StartNextScanRange()
> @  0x1f6b052  impala::HdfsScanNode::ScannerThread()
> @  0x1f6a4ea  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x1f6c5cc  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x1bd4748  boost::function0<>::operator()()
> @  0x1ebf349  impala::Thread::SuperviseThread()
> @  0x1ec74e5  boost::_bi::list5<>::operator()<>()
> @  0x1ec7409  boost::_bi::bind_t<>::operator()()
> @  0x1ec73cc  boost::detail::thread_data<>::run()
> @  0x31a1f0a  thread_proxy
> @   0x36d1607851  (unknown)
> @   0x36d12e894d  (unknown)
> {code}
> Git hash of Impala used in job: ba84ad03cb83d7f7aed8524fcfbb0e2cdc9fdd53



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6970) DiskMgr::AllocateBuffersForRange crashes on failed DCHECK

2018-05-03 Thread Tim Armstrong (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6970:
--
Attachment: stacks.txt

> DiskMgr::AllocateBuffersForRange crashes on failed DCHECK
> -
>
> Key: IMPALA-6970
> URL: https://issues.apache.org/jira/browse/IMPALA-6970
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.13.0
>Reporter: Sailesh Mukil
>Priority: Blocker
>  Labels: crash
> Attachments: stacks.txt
>
>
> Similar to IMPALA-6587, but the DCHECK failed in a slightly different way. 
> Cannot tell if the root cause is the same as that though without further 
> investigation.
> {code:java}
> FSF0503 09:30:26.715791 30750 reservation-tracker.cc:376] Check failed: bytes 
> <= unused_reservation() (8388608 vs. 6291456) 
> *** Check failure stack trace: ***
> @  0x4277c1d  google::LogMessage::Fail()
> @  0x42794c2  google::LogMessage::SendToLog()
> @  0x42775f7  google::LogMessage::Flush()
> @  0x427abbe  google::LogMessageFatal::~LogMessageFatal()
> @  0x1ef1343  impala::ReservationTracker::AllocateFromLocked()
> @  0x1ef111d  impala::ReservationTracker::AllocateFrom()
> @  0x1ee8c57  
> impala::BufferPool::Client::PrepareToAllocateBuffer()
> @  0x1ee5543  impala::BufferPool::AllocateBuffer()
> @  0x2f50f68  impala::io::DiskIoMgr::AllocateBuffersForRange()
> @  0x1f74762  impala::HdfsScanNodeBase::StartNextScanRange()
> @  0x1f6b052  impala::HdfsScanNode::ScannerThread()
> @  0x1f6a4ea  
> _ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
> @  0x1f6c5cc  
> _ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
> @  0x1bd4748  boost::function0<>::operator()()
> @  0x1ebf349  impala::Thread::SuperviseThread()
> @  0x1ec74e5  boost::_bi::list5<>::operator()<>()
> @  0x1ec7409  boost::_bi::bind_t<>::operator()()
> @  0x1ec73cc  boost::detail::thread_data<>::run()
> @  0x31a1f0a  thread_proxy
> @   0x36d1607851  (unknown)
> @   0x36d12e894d  (unknown)
> {code}
> Git hash of Impala used in job: ba84ad03cb83d7f7aed8524fcfbb0e2cdc9fdd53



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6865) session-expiry-test failed

2018-05-03 Thread Bikramjeet Vig (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463205#comment-16463205
 ] 

Bikramjeet Vig commented on IMPALA-6865:


From what I see, "TException: Could not bind: Transport endpoint is not 
connected" happens when thrift tries to bind to a port that is already in 
use. IMPALA-5499 deals with Impala trying to bind 2 different services to 
the same port, but in this case the logs show no service other than 
"hiveserver2-frontend" trying to bind that port, nor is any other service 
from the minicluster attempting the same.
 The only explanation I can think of is that the port was still in use, but 
not by any of the services that Impala/the minicluster starts. For that to 
happen, the port must have been picked up by the other process in the short 
window between the call to FindUnusedEphemeralPort() and 
hs2_server_->Start(). That's because FindUnusedEphemeralPort() binds to the 
port to check that it is free before returning it.

I am leaving this open in case this happens again, but for the reasons 
mentioned above it doesn't seem like an Impala issue yet.
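
To illustrate the race window, here is a minimal sketch (an assumed shape, 
not Impala's actual implementation) of FindUnusedEphemeralPort()-style 
probing: the probe socket is closed before the port number is returned, so 
another process can bind the same port before hs2_server_->Start() gets to 
it.

{code:cpp}
// Hypothetical sketch of bind-probe port discovery and its race window.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstring>

int FindUnusedEphemeralPortSketch() {
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  sockaddr_in addr;
  std::memset(&addr, 0, sizeof(addr));
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = 0;  // Let the kernel pick a free ephemeral port.
  if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0) {
    close(fd);
    return -1;
  }
  socklen_t len = sizeof(addr);
  getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len);
  int port = ntohs(addr.sin_port);
  close(fd);    // The port is released here; the race window opens.
  return port;  // The caller binds again later and may lose the race.
}
{code}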

> session-expiry-test failed
> --
>
> Key: IMPALA-6865
> URL: https://issues.apache.org/jira/browse/IMPALA-6865
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: Vuk Ercegovac
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: build-failure
>
> {noformat}
> ...
> I0416 10:33:06.779047 5037 statestore-subscriber.cc:220] statestore 
> registration successful
> I0416 10:33:06.779647 5037 Frontend.java:869] Waiting for first catalog 
> update from the statestore.
> I0416 10:33:06.779865 5037 Frontend.java:874] Local catalog initialized 
> after: 0 ms.
> I0416 10:33:06.791064 5037 impala-server.cc:2115] Initialized 
> coordinator/executor Impala server on 
> ec2-m2-4xlarge-centos-6-4-1cbb.vpc.cloudera.com:59389
> I0416 10:33:06.791720 5037 thrift-server.cc:468] ThriftServer 'backend' 
> started on port: 59389
> I0416 10:33:06.791744 5037 impala-server.cc:2122] Impala InternalService 
> listening on 59389
> E0416 10:33:06.816757 5233 thrift-server.cc:216] ThriftServer 
> 'hiveserver2-frontend' (on port: 54240) exited due to TException: Could not 
> bind: Transport endpoint is not connected
> E0416 10:33:06.818292 5037 thrift-server.cc:205] ThriftServer 
> 'hiveserver2-frontend' (on port: 54240) did not start correctly
> W0416 10:33:06.818378 5037 in-process-servers.cc:76] ThriftServer 
> 'hiveserver2-frontend' (on port: 54240) did not start correctly{noformat}
> Resolving the minidump for more info...



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-6972) Dataload is intermittently failing on 2.x

2018-05-03 Thread Joe McDonnell (JIRA)
Joe McDonnell created IMPALA-6972:
-

 Summary: Dataload is intermittently failing on 2.x
 Key: IMPALA-6972
 URL: https://issues.apache.org/jira/browse/IMPALA-6972
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 2.13.0
Reporter: Joe McDonnell
Assignee: Joe McDonnell


Dataload on IMPALA_MINICLUSTER_PROFILE=2 and the 2.x branch is hitting 
IMPALA-6532, a concurrency issue in Hive that can fail with the following 
stack:
{noformat}
java.lang.Exception: java.lang.NullPointerException
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:489)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:549)
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.initIOContext(HiveContextAwareRecordReader.java:171)
at 
org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.initIOContext(HiveContextAwareRecordReader.java:208)
at 
org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:258)
at 
org.apache.hadoop.mapred.MapTask$TrackedRecordReader.(MapTask.java:169)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:438)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at 
org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:270)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745){noformat}
The Hive issue can be fixed with a backport, but in the meantime this only 
happens during dataload, because dataload runs Hive operations in parallel. 
It is hitting a lot of builds, so temporarily disabling the parallelism 
makes sense.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Created] (IMPALA-6971) Log flooding with unhelpful parsing error message

2018-05-03 Thread Zoram Thanga (JIRA)
Zoram Thanga created IMPALA-6971:


 Summary: Log flooding with unhelpful parsing error message
 Key: IMPALA-6971
 URL: https://issues.apache.org/jira/browse/IMPALA-6971
 Project: IMPALA
  Issue Type: Improvement
Affects Versions: Impala 2.12.0, Impala 3.0
Reporter: Zoram Thanga


Once in a while I've seen Impala flood the log file with messages similar to 
the following:


{noformat}
I0318 08:20:28.566603 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.566643 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.566676 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.566709 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.566742 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.566787 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.566820 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.566856 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.566890 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.566923 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.566959 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.566998 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.567039 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.567070 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.567101 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.567132 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.567173 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.567209 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.567245 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.567276 984564 runtime-state.cc:168] Error from query 
cf41e4ee4f7e1273:bcc41038: Error parsing row: file: 
hdfs://nameservice/user/hive/warehouse/mydb/mytable/part-m-0, before 
offset: 2122317824
I0318 08:20:28.567312 984564 

[jira] [Commented] (IMPALA-6966) Estimated Memory in Catalogd webpage is not sorted correctly

2018-05-03 Thread Quanlong Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463192#comment-16463192
 ] 

Quanlong Huang commented on IMPALA-6966:


[~dtsirogiannis], the patch is ready: [https://gerrit.cloudera.org/#/c/10292/]. 
Could you have a look when you have time?
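
For context, a small illustration (made-up values, not the actual table 
contents) of the bug the patch addresses: comparing the rendered size 
strings lexicographically misorders the list, e.g. "10.2GB" sorts before 
"9.5MB" because '1' < '9'; sorting must use the underlying byte counts 
instead.

{code:cpp}
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

int main() {
  // Lexicographic sort of the human-readable strings: wrong order.
  std::vector<std::string> rendered = {"9.5MB", "10.2GB", "150.3KB"};
  std::sort(rendered.begin(), rendered.end());
  for (const auto& s : rendered) std::cout << s << ' ';
  std::cout << '\n';  // Prints: 10.2GB 150.3KB 9.5MB

  // Sorting the underlying estimates in bytes gives the correct order.
  std::vector<int64_t> bytes = {9961472LL, 10952166605LL, 153907LL};
  std::sort(bytes.begin(), bytes.end());  // 153907, 9961472, 10952166605
  return 0;
}
{code}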

> Estimated Memory in Catalogd webpage is not sorted correctly
> 
>
> Key: IMPALA-6966
> URL: https://issues.apache.org/jira/browse/IMPALA-6966
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: newbie
> Attachments: Screen Shot 2018-05-03 at 9.38.45 PM.png
>
>
> The "Top-N Tables with Highest Memory Requirements" in Catalogd webpage 
> doesn't sort "Estimated Memory" correctly. In fact, it sorts them as strings 
> instead of size. This is confusing.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6227) TestAdmissionControllerStress can be flaky

2018-05-03 Thread Sailesh Mukil (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463188#comment-16463188
 ] 

Sailesh Mukil commented on IMPALA-6227:
---

Hit this again:


{code:java}

custom_cluster.test_admission_controller.TestAdmissionControllerStress.test_mem_limit[num_queries:
 50 | submission_delay_ms: 50 | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 5000, 'disable_codegen': False, 
'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
text/none | round_robin_submission: True] (from pytest)

Failing for the past 1 build (Since Failed#8 )
Took 2 min 30 sec.
Error Message
AssertionError: Timed out waiting 60 seconds for metrics admitted,timed-out 
delta 5 current {'dequeued': 20, 'rejected': 20, 'released': 24, 'admitted': 
30, 'queued': 20, 'timed-out': 0} initial {'dequeued': 14, 'rejected': 20, 
'released': 18, 'admitted': 26, 'queued': 20, 'timed-out': 0} assert 
(1524822944.9910979 - 1524822883.858825) < 60  +  where 1524822944.9910979 = 
time()
Stacktrace
custom_cluster/test_admission_controller.py:943: in test_mem_limit
{'request_pool': self.pool_name, 'mem_limit': query_mem_limit})
custom_cluster/test_admission_controller.py:844: in run_admission_test
['admitted', 'timed-out'], curr_metrics, expected_admitted)
custom_cluster/test_admission_controller.py:547: in wait_for_metric_changes
assert (time() - start_time < STRESS_TIMEOUT),\
E   AssertionError: Timed out waiting 60 seconds for metrics admitted,timed-out 
delta 5 current {'dequeued': 20, 'rejected': 20, 'released': 24, 'admitted': 
30, 'queued': 20, 'timed-out': 0} initial {'dequeued': 14, 'rejected': 20, 
'released': 18, 'admitted': 26, 'queued': 20, 'timed-out': 0}
E   assert (1524822944.9910979 - 1524822883.858825) < 60
E+  where 1524822944.9910979 = time()
Standard Output
Starting State Store logging to 
/data/jenkins/workspace/impala-cdh6.0.x-exhaustive/repos/Impala/logs/custom_cluster_tests/statestored.INFO
Starting Catalog Service logging to 
/data/jenkins/workspace/impala-cdh6.0.x-exhaustive/repos/Impala/logs/custom_cluster_tests/catalogd.INFO
Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-cdh6.0.x-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad.INFO
Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-cdh6.0.x-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad_node1.INFO
Starting Impala Daemon logging to 
/data/jenkins/workspace/impala-cdh6.0.x-exhaustive/repos/Impala/logs/custom_cluster_tests/impalad_node2.INFO
Impala Cluster Running with 3 nodes (3 coordinators, 3 executors).
Standard Error
MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
MainThread: Getting num_known_live_backends from 
ec2-m2-4xlarge-centos-6-4-07fb.vpc.cloudera.com:25000
MainThread: Debug webpage not yet available.
MainThread: Debug webpage not yet available.
MainThread: Waiting for num_known_live_backends=3. Current value: 0
MainThread: Getting num_known_live_backends from 
ec2-m2-4xlarge-centos-6-4-07fb.vpc.cloudera.com:25000
MainThread: Waiting for num_known_live_backends=3. Current value: 0
MainThread: Getting num_known_live_backends from 
ec2-m2-4xlarge-centos-6-4-07fb.vpc.cloudera.com:25000
MainThread: Waiting for num_known_live_backends=3. Current value: 1
MainThread: Getting num_known_live_backends from 
ec2-m2-4xlarge-centos-6-4-07fb.vpc.cloudera.com:25000
MainThread: Waiting for num_known_live_backends=3. Current value: 2
MainThread: Getting num_known_live_backends from 
ec2-m2-4xlarge-centos-6-4-07fb.vpc.cloudera.com:25000
MainThread: num_known_live_backends has reached value: 3
MainThread: Getting num_known_live_backends from 
ec2-m2-4xlarge-centos-6-4-07fb.vpc.cloudera.com:25001
MainThread: num_known_live_backends has reached value: 3
MainThread: Getting num_known_live_backends from 
ec2-m2-4xlarge-centos-6-4-07fb.vpc.cloudera.com:25002
MainThread: num_known_live_backends has reached value: 3
MainThread: Found 3 impalad/1 statestored/1 catalogd process(es)
MainThread: Getting metric: statestore.live-backends from 
ec2-m2-4xlarge-centos-6-4-07fb.vpc.cloudera.com:25010
MainThread: Metric 'statestore.live-backends' has reached desired value: 4
MainThread: Getting num_known_live_backends from 
ec2-m2-4xlarge-centos-6-4-07fb.vpc.cloudera.com:25000
MainThread: num_known_live_backends has reached value: 3
MainThread: Getting num_known_live_backends from 
ec2-m2-4xlarge-centos-6-4-07fb.vpc.cloudera.com:25001
MainThread: num_known_live_backends has reached value: 3
MainThread: Getting num_known_live_backends from 
ec2-m2-4xlarge-centos-6-4-07fb.vpc.cloudera.com:25002
MainThread: num_known_live_backends has reached value: 3
-- connecting to: localhost:21000
MainThread: Starting test case with parameters: num_queries: 50 | 
submission_delay_ms: 50 | exec_option: {'batch_size': 0, 'num_nodes': 0, 
'disable_codegen_rows_threshold': 

[jira] [Assigned] (IMPALA-5872) Implement a SQL test case builder for gathering query diagnostics

2018-05-03 Thread bharath v (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bharath v reassigned IMPALA-5872:
-

Assignee: bharath v

> Implement a SQL test case builder for gathering query diagnostics
> -
>
> Key: IMPALA-5872
> URL: https://issues.apache.org/jira/browse/IMPALA-5872
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: bharath v
>Assignee: bharath v
>Priority: Major
>  Labels: supportability
>
> The idea is to implement a test case builder for collecting enough 
> diagnostics for a given SQL query so that it can be reproduced/recreated on 
> another cluster easily.
> Input: A valid SQL query
> Expected output: A .sql file that can be run using {{impala-shell -f}} and 
> has all the schema/table/view definitions required to run the query on a 
> target cluster.
> An example:
> {noformat}
> EXPORT TESTCASE INTO OUTFILE 'file:///tmp/repro.sql' select * from view
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-3343) Impala-shell compatibility with python 3

2018-05-03 Thread David Knupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Knupp reassigned IMPALA-3343:
---

Assignee: David Knupp  (was: Juliet Hougland)

> Impala-shell compatibility with python 3
> 
>
> Key: IMPALA-3343
> URL: https://issues.apache.org/jira/browse/IMPALA-3343
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 2.5.0
>Reporter: Peter Ebert
>Assignee: David Knupp
>Priority: Minor
>
> Installed Anaconda package and python 3, Impala shell has errors and will not 
> run in the python 3 environment.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6968) TestBlockVerificationGcmDisabled failure in exhaustive

2018-05-03 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463177#comment-16463177
 ] 

Tim Armstrong commented on IMPALA-6968:
---

I know what the bug is - it's a dumb one. The data on disk/in memory is 
encrypted with a randomly generated key, so the bytes are effectively 
random. I'm only changing the first byte to '?', and there's a 1/256 chance 
that the first byte of the encrypted data was already '?', so the test fails 
1 in 256 runs on average.
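
To make the failure mode concrete, a tiny sketch (hypothetical function 
names, not the test's actual code) of the flaky corruption and a 
deterministic alternative:

{code:cpp}
#include <cstdint>

// Flaky: writing a constant is a no-op when the effectively random
// ciphertext already starts with '?' (~1/256 of runs), so verification
// sees unmodified data and the expected error never occurs.
void CorruptFirstByteFlaky(uint8_t* data) { data[0] = '?'; }

// Deterministic: flipping the bits guarantees the byte changes, so
// block verification always detects the corruption.
void CorruptFirstByteReliable(uint8_t* data) { data[0] ^= 0xFF; }
{code}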

> TestBlockVerificationGcmDisabled failure in exhaustive
> --
>
> Key: IMPALA-6968
> URL: https://issues.apache.org/jira/browse/IMPALA-6968
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.13.0
>Reporter: Sailesh Mukil
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: flaky
>
> {code:java}
> /data/jenkins/workspace/impala-cdh5-trunk-exhaustive-release-thrift/repos/Impala/be/src/runtime/tmp-file-mgr-test.cc:550
> Value of: read_status.code()
>   Actual: 0
> Expected: TErrorCode::SCRATCH_READ_VERIFY_FAILED
> Which is: 118
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6968) TestBlockVerificationGcmDisabled failure in exhaustive

2018-05-03 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463167#comment-16463167
 ] 

Tim Armstrong commented on IMPALA-6968:
---

Spoke too soon - I reproduced it locally after a few more minutes.

> TestBlockVerificationGcmDisabled failure in exhaustive
> --
>
> Key: IMPALA-6968
> URL: https://issues.apache.org/jira/browse/IMPALA-6968
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.13.0
>Reporter: Sailesh Mukil
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: flaky
>
> {code:java}
> /data/jenkins/workspace/impala-cdh5-trunk-exhaustive-release-thrift/repos/Impala/be/src/runtime/tmp-file-mgr-test.cc:550
> Value of: read_status.code()
>   Actual: 0
> Expected: TErrorCode::SCRATCH_READ_VERIFY_FAILED
> Which is: 118
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6968) TestBlockVerificationGcmDisabled failure in exhaustive

2018-05-03 Thread Tim Armstrong (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6968:
--
Labels: flaky  (was: )

> TestBlockVerificationGcmDisabled failure in exhaustive
> --
>
> Key: IMPALA-6968
> URL: https://issues.apache.org/jira/browse/IMPALA-6968
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.13.0
>Reporter: Sailesh Mukil
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: flaky
>
> {code:java}
> /data/jenkins/workspace/impala-cdh5-trunk-exhaustive-release-thrift/repos/Impala/be/src/runtime/tmp-file-mgr-test.cc:550
> Value of: read_status.code()
>   Actual: 0
> Expected: TErrorCode::SCRATCH_READ_VERIFY_FAILED
> Which is: 118
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6968) TestBlockVerificationGcmDisabled failure in exhaustive

2018-05-03 Thread Tim Armstrong (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6968:
--
Target Version: Impala 2.13.0, Impala 3.1.0
Issue Type: Bug  (was: Task)

> TestBlockVerificationGcmDisabled failure in exhaustive
> --
>
> Key: IMPALA-6968
> URL: https://issues.apache.org/jira/browse/IMPALA-6968
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.13.0
>Reporter: Sailesh Mukil
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: flaky
>
> {code:java}
> /data/jenkins/workspace/impala-cdh5-trunk-exhaustive-release-thrift/repos/Impala/be/src/runtime/tmp-file-mgr-test.cc:550
> Value of: read_status.code()
>   Actual: 0
> Expected: TErrorCode::SCRATCH_READ_VERIFY_FAILED
> Which is: 118
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-6970) DiskMgr::AllocateBuffersForRange crashes on failed DCHECK

2018-05-03 Thread Sailesh Mukil (JIRA)
Sailesh Mukil created IMPALA-6970:
-

 Summary: DiskMgr::AllocateBuffersForRange crashes on failed DCHECK
 Key: IMPALA-6970
 URL: https://issues.apache.org/jira/browse/IMPALA-6970
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 2.13.0
Reporter: Sailesh Mukil


Similar to IMPALA-6587, but the DCHECK failed in a slightly different way. 
Cannot tell if the root cause is the same as that though without further 
investigation.

{code:java}
FSF0503 09:30:26.715791 30750 reservation-tracker.cc:376] Check failed: bytes 
<= unused_reservation() (8388608 vs. 6291456) 
*** Check failure stack trace: ***
@  0x4277c1d  google::LogMessage::Fail()
@  0x42794c2  google::LogMessage::SendToLog()
@  0x42775f7  google::LogMessage::Flush()
@  0x427abbe  google::LogMessageFatal::~LogMessageFatal()
@  0x1ef1343  impala::ReservationTracker::AllocateFromLocked()
@  0x1ef111d  impala::ReservationTracker::AllocateFrom()
@  0x1ee8c57  impala::BufferPool::Client::PrepareToAllocateBuffer()
@  0x1ee5543  impala::BufferPool::AllocateBuffer()
@  0x2f50f68  impala::io::DiskIoMgr::AllocateBuffersForRange()
@  0x1f74762  impala::HdfsScanNodeBase::StartNextScanRange()
@  0x1f6b052  impala::HdfsScanNode::ScannerThread()
@  0x1f6a4ea  
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
@  0x1f6c5cc  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@  0x1bd4748  boost::function0<>::operator()()
@  0x1ebf349  impala::Thread::SuperviseThread()
@  0x1ec74e5  boost::_bi::list5<>::operator()<>()
@  0x1ec7409  boost::_bi::bind_t<>::operator()()
@  0x1ec73cc  boost::detail::thread_data<>::run()
@  0x31a1f0a  thread_proxy
@   0x36d1607851  (unknown)
@   0x36d12e894d  (unknown)

{code}

Git hash of Impala used in job: ba84ad03cb83d7f7aed8524fcfbb0e2cdc9fdd53




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Commented] (IMPALA-6587) Crash in DiskMgr::AllocateBuffersForRange

2018-05-03 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463144#comment-16463144
 ] 

Tim Armstrong commented on IMPALA-6587:
---

Let's open a new JIRA - no way to tell if the root cause is the same.

> Crash in DiskMgr::AllocateBuffersForRange
> -
>
> Key: IMPALA-6587
> URL: https://issues.apache.org/jira/browse/IMPALA-6587
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: broken-build, crash
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> {noformat}
> F0224 17:43:08.522589 13124 reservation-tracker.cc:376] Check failed: bytes 
> <= unused_reservation() (8192 vs. 0) 
> {noformat}
> {noformat}
> #0  0x003cb32328e5 in raise () from /lib64/libc.so.6
> #1  0x003cb32340c5 in abort () from /lib64/libc.so.6
> #2  0x03c5a244 in google::DumpStackTraceAndExit() ()
> #3  0x03c50cbd in google::LogMessage::Fail() ()
> #4  0x03c52562 in google::LogMessage::SendToLog() ()
> #5  0x03c50697 in google::LogMessage::Flush() ()
> #6  0x03c53c5e in google::LogMessageFatal::~LogMessageFatal() ()
> #7  0x01b7a813 in impala::ReservationTracker::AllocateFromLocked 
> (this=0x1a75d2a98, bytes=8192) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/bufferpool/reservation-tracker.cc:376
> #8  0x01b7a5ed in impala::ReservationTracker::AllocateFrom 
> (this=0x1a75d2a98, bytes=8192) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/bufferpool/reservation-tracker.cc:370
> #9  0x01b72127 in impala::BufferPool::Client::PrepareToAllocateBuffer 
> (this=0x1a75d2a80, len=8192, reserved=true, success=0x0) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:567
> #10 0x01b6ea13 in impala::BufferPool::AllocateBuffer (this=0xa6af380, 
> client=0x14121248, len=8192, handle=0x7fede6224260) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:229
> #11 0x02b894f0 in impala::io::DiskIoMgr::AllocateBuffersForRange 
> (this=0xb06fd40, reader=0x1ecf10300, bp_client=0x14121248, range=0x14711180, 
> max_bytes=8192) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/io/disk-io-mgr.cc:470
> #12 0x01bef7ff in impala::HdfsScanNode::ScannerThread 
> (this=0x14121100, scanner_thread_reservation=8192) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/exec/hdfs-scan-node.cc:393
> #13 0x01beec52 in impala::HdfsScanNode::::operator()(void) 
> const (__closure=0x7fede6224bc8) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/exec/hdfs-scan-node.cc:303
> #14 0x01bf0d75 in 
> boost::detail::function::void_function_obj_invoker0,
>  void>::invoke(boost::detail::function::function_buffer &) 
> (function_obj_ptr=...) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
> #15 0x0183e44a in boost::function0::operator() 
> (this=0x7fede6224bc0) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:767
> #16 0x01b484cf in impala::Thread::SuperviseThread (name=..., 
> category=..., functor=..., parent_thread_info=0x7fede6c25870, 
> thread_started=0x7fede6c24160) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/util/thread.cc:356
> #17 0x01b509a5 in 
> boost::_bi::list5 std::char_traits, std::allocator > >, 
> boost::_bi::value std::allocator > >, boost::_bi::value >, 
> boost::_bi::value, 
> boost::_bi::value >::operator() std::basic_string&, const std::basic_string&, 
> boost::function, const impala::ThreadDebugInfo*, impala::Promise int>*), boost::_bi::list0>(boost::_bi::type, void (*&)(const 
> std::basic_string &, 
> const std::basic_string 
> &, boost::function, const impala::ThreadDebugInfo *, 
> impala::Promise *), boost::_bi::list0 &, int) (this=0x1c9dd2fc0, 
> f=@0x1c9dd2fb8, a=...) at 
> 

[jira] [Created] (IMPALA-6969) Profile doesn't include the reason that a query couldn't be dequeued from admission controller

2018-05-03 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-6969:
-

 Summary: Profile doesn't include the reason that a query couldn't 
be dequeued from admission controller
 Key: IMPALA-6969
 URL: https://issues.apache.org/jira/browse/IMPALA-6969
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Affects Versions: Impala 2.12.0, Impala 3.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong


I noticed this while playing around on a local minicluster with AC enabled.

The admission controller adds the reason for the initial queuing to the 
profile, but does not expose why the query couldn't execute once it got to 
the head of the line. E.g. a query may be queued initially because the 
queue was non-empty, but then be unable to execute at the head of the line 
because of memory.
{noformat}
Request Pool: root.queueA
Admission result: Admitted (queued)
Admission queue details: waited 1130 ms, reason: queue is not empty (size 
4); queued queries are executed first
{noformat}

We should still include the initial reason for queuing, but also include the 
most recent reason for remaining queued once the query got to the head of 
the line. It's probably most useful to keep the profile updated with the 
latest reason at all times (since the details can change while the query is 
at the head of the line).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)





[jira] [Created] (IMPALA-6968) TestBlockVerificationGcmDisabled failure in exhaustive

2018-05-03 Thread Sailesh Mukil (JIRA)
Sailesh Mukil created IMPALA-6968:
-

 Summary: TestBlockVerificationGcmDisabled failure in exhaustive
 Key: IMPALA-6968
 URL: https://issues.apache.org/jira/browse/IMPALA-6968
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Affects Versions: Impala 2.13.0
Reporter: Sailesh Mukil
Assignee: Tim Armstrong


{code:java}
/data/jenkins/workspace/impala-cdh5-trunk-exhaustive-release-thrift/repos/Impala/be/src/runtime/tmp-file-mgr-test.cc:550
Value of: read_status.code()
  Actual: 0
Expected: TErrorCode::SCRATCH_READ_VERIFY_FAILED
Which is: 118
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IMPALA-6877) Allow setting -vmodule from debug page

2018-05-03 Thread Bikramjeet Vig (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463134#comment-16463134
 ] 

Bikramjeet Vig commented on IMPALA-6877:


[~bharathv] 

All the vmodule patterns are stored in a separate list, and each has an 
individual level that always takes priority over the global FLAGS_v value. 
If you change "-v", that only changes the global FLAGS_v, so only code that 
is *not a part* of the vmodule list is affected.

TLDR: anything added to the vmodule list is handled separately from the 
global FLAGS_v. So, if we add a pattern/module to the vmodule list and want 
to keep it in sync with FLAGS_v, we have to reset its value to FLAGS_v 
every time we change FLAGS_v.
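
For reference, a minimal sketch of the glog behavior described here 
(standard glog API; the module name is just an example):

{code:cpp}
#include <glog/logging.h>

void VerbosityDemo() {
  FLAGS_v = 0;  // Global verbosity level.
  // Per-module override: files matching the pattern use level 2,
  // regardless of FLAGS_v.
  google::SetVLOGLevel("admission-controller", 2);
  // Raising the global level later does not touch the override above;
  // keeping the module in sync requires calling SetVLOGLevel again.
  FLAGS_v = 3;
}
{code}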

> Allow setting -vmodule from debug page
> --
>
> Key: IMPALA-6877
> URL: https://issues.apache.org/jira/browse/IMPALA-6877
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: supportability
>
> We allow setting the global verbosity level from the debug page, but we don't 
> allow setting it per module, E.g. -vmodule admission-controller=2



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6587) Crash in DiskMgr::AllocateBuffersForRange

2018-05-03 Thread Sailesh Mukil (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463130#comment-16463130
 ] 

Sailesh Mukil commented on IMPALA-6587:
---

We're still seeing this crash after this patch went in. [~tarmstrong] 
Should I reopen this JIRA, or file a new one? The DCHECK failed in a 
slightly different way:

{code:java}
FSF0503 09:30:26.715791 30750 reservation-tracker.cc:376] Check failed: bytes 
<= unused_reservation() (8388608 vs. 6291456) 
*** Check failure stack trace: ***
@  0x4277c1d  google::LogMessage::Fail()
@  0x42794c2  google::LogMessage::SendToLog()
@  0x42775f7  google::LogMessage::Flush()
@  0x427abbe  google::LogMessageFatal::~LogMessageFatal()
@  0x1ef1343  impala::ReservationTracker::AllocateFromLocked()
@  0x1ef111d  impala::ReservationTracker::AllocateFrom()
@  0x1ee8c57  impala::BufferPool::Client::PrepareToAllocateBuffer()
@  0x1ee5543  impala::BufferPool::AllocateBuffer()
@  0x2f50f68  impala::io::DiskIoMgr::AllocateBuffersForRange()
@  0x1f74762  impala::HdfsScanNodeBase::StartNextScanRange()
@  0x1f6b052  impala::HdfsScanNode::ScannerThread()
@  0x1f6a4ea  
_ZZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS_18ThreadResourcePoolEENKUlvE_clEv
@  0x1f6c5cc  
_ZN5boost6detail8function26void_function_obj_invoker0IZN6impala12HdfsScanNode22ThreadTokenAvailableCbEPNS3_18ThreadResourcePoolEEUlvE_vE6invokeERNS1_15function_bufferE
@  0x1bd4748  boost::function0<>::operator()()
@  0x1ebf349  impala::Thread::SuperviseThread()
@  0x1ec74e5  boost::_bi::list5<>::operator()<>()
@  0x1ec7409  boost::_bi::bind_t<>::operator()()
@  0x1ec73cc  boost::detail::thread_data<>::run()
@  0x31a1f0a  thread_proxy
@   0x36d1607851  (unknown)
@   0x36d12e894d  (unknown)
{code}

Git hash of Impala used in job: ba84ad03cb83d7f7aed8524fcfbb0e2cdc9fdd53

I can provide more details if necessary.

> Crash in DiskMgr::AllocateBuffersForRange
> -
>
> Key: IMPALA-6587
> URL: https://issues.apache.org/jira/browse/IMPALA-6587
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: broken-build, crash
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> {noformat}
> F0224 17:43:08.522589 13124 reservation-tracker.cc:376] Check failed: bytes 
> <= unused_reservation() (8192 vs. 0) 
> {noformat}
> {noformat}
> #0  0x003cb32328e5 in raise () from /lib64/libc.so.6
> #1  0x003cb32340c5 in abort () from /lib64/libc.so.6
> #2  0x03c5a244 in google::DumpStackTraceAndExit() ()
> #3  0x03c50cbd in google::LogMessage::Fail() ()
> #4  0x03c52562 in google::LogMessage::SendToLog() ()
> #5  0x03c50697 in google::LogMessage::Flush() ()
> #6  0x03c53c5e in google::LogMessageFatal::~LogMessageFatal() ()
> #7  0x01b7a813 in impala::ReservationTracker::AllocateFromLocked 
> (this=0x1a75d2a98, bytes=8192) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/bufferpool/reservation-tracker.cc:376
> #8  0x01b7a5ed in impala::ReservationTracker::AllocateFrom 
> (this=0x1a75d2a98, bytes=8192) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/bufferpool/reservation-tracker.cc:370
> #9  0x01b72127 in impala::BufferPool::Client::PrepareToAllocateBuffer 
> (this=0x1a75d2a80, len=8192, reserved=true, success=0x0) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:567
> #10 0x01b6ea13 in impala::BufferPool::AllocateBuffer (this=0xa6af380, 
> client=0x14121248, len=8192, handle=0x7fede6224260) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:229
> #11 0x02b894f0 in impala::io::DiskIoMgr::AllocateBuffersForRange 
> (this=0xb06fd40, reader=0x1ecf10300, bp_client=0x14121248, range=0x14711180, 
> max_bytes=8192) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/io/disk-io-mgr.cc:470
> #12 0x01bef7ff in impala::HdfsScanNode::ScannerThread 
> (this=0x14121100, scanner_thread_reservation=8192) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/exec/hdfs-scan-node.cc:393
> #13 0x01beec52 in impala::HdfsScanNode::::operator()(void) 
> const (__closure=0x7fede6224bc8) at 
> 

[jira] [Commented] (IMPALA-6877) Allow setting -vmodule from debug page

2018-05-03 Thread bharath v (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463121#comment-16463121
 ] 

bharath v commented on IMPALA-6877:
---

bq. The most we can do is keep track externally of which patterns have been 
added and have an option on the debug page to reset them to the global "-v" 
level. But then we will have to do that every time we change the "-v" level.

IMO, that is a reasonable feature to have. But why do we need to do that every 
time -v changes? Isn't that polled directly from FLAGS_v?

> Allow setting -vmodule from debug page
> --
>
> Key: IMPALA-6877
> URL: https://issues.apache.org/jira/browse/IMPALA-6877
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: supportability
>
> We allow setting the global verbosity level from the debug page, but we don't 
> allow setting it per module, e.g. -vmodule admission-controller=2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-5662) Log all information relevant to admission control decision making

2018-05-03 Thread Tim Armstrong (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-5662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463080#comment-16463080
 ] 

Tim Armstrong commented on IMPALA-5662:
---

Ah, I was wrong - the one in the description is logged at VLOG_QUERY == VLOG(1). 
There are some similar messages that are logged at VLOG_RPC == VLOG(2), but 
those seem less important.

> Log all information relevant to admission control decision making
> -
>
> Key: IMPALA-5662
> URL: https://issues.apache.org/jira/browse/IMPALA-5662
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Balazs Jeszenszky
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: admission-control, observability, resource-management, 
> supportability
>
> Currently, when making a decision whether to admit a query or not, the log 
> has the following format:
> {code:java}
> I0705 14:43:04.031771  7388 admission-controller.cc:442] Stats: 
> agg_num_running=1, agg_num_queued=0, agg_mem_reserved=486.74 MB,  
> local_host(local_mem_admitted=0, num_admitted_running=0, num_queued=0, 
> backend_mem_reserved=56.07 MB)
> {code}
> Since it's also possible to queue queries due to one node not being able to 
> reserve the required memory, we should also log the max(backend_mem_reserved) 
> across all nodes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-6961) Impala Doc: Doc --enable_minidumps flag

2018-05-03 Thread Alex Rodoni (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-6961.
---

> Impala Doc: Doc --enable_minidumps flag
> ---
>
> Key: IMPALA-6961
> URL: https://issues.apache.org/jira/browse/IMPALA-6961
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
> Fix For: Impala 3.0, Impala 2.13.0
>
>
> https://gerrit.cloudera.org/#/c/10285/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-6961) Impala Doc: Doc --enable_minidumps flag

2018-05-03 Thread Alex Rodoni (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni resolved IMPALA-6961.
-
   Resolution: Fixed
Fix Version/s: Impala 2.13.0
   Impala 3.0

> Impala Doc: Doc --enable_minidumps flag
> ---
>
> Key: IMPALA-6961
> URL: https://issues.apache.org/jira/browse/IMPALA-6961
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
> Fix For: Impala 3.0, Impala 2.13.0
>
>
> https://gerrit.cloudera.org/#/c/10285/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6961) Impala Doc: Doc --enable_minidumps flag

2018-05-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463033#comment-16463033
 ] 

ASF subversion and git services commented on IMPALA-6961:
-

Commit 1eedafed6ae147829e84c3a08a1bf54d7ab4b1fb in impala's branch 
refs/heads/master from [~arodoni_cloudera]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=1eedafe ]

IMPALA-6961: [DOCS] Doc --enable_minidump flag to disable minidumps

Change-Id: I3412e36272cda0c1502d4643afcdbad01e9548a5
Reviewed-on: http://gerrit.cloudera.org:8080/10285
Reviewed-by: Lars Volker 
Tested-by: Impala Public Jenkins 


> Impala Doc: Doc --enable_minidumps flag
> ---
>
> Key: IMPALA-6961
> URL: https://issues.apache.org/jira/browse/IMPALA-6961
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Docs
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>
> https://gerrit.cloudera.org/#/c/10285/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6877) Allow setting -vmodule from debug page

2018-05-03 Thread Bikramjeet Vig (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16463013#comment-16463013
 ] 

Bikramjeet Vig commented on IMPALA-6877:


It turns out that using "-vmodule" dynamically is more restrictive than "-v", 
as is apparent from this comment:

{noformat}
// Set VLOG(_IS_ON) level for module_pattern to log_level.
// This lets us dynamically control what is normally set by the --vmodule flag.
// Returns the level that previously applied to module_pattern.
// NOTE: To change the log level for VLOG(_IS_ON) sites
//   that have already executed after/during InitGoogleLogging,
//   one needs to supply the exact --vmodule pattern that applied to them.
//   (If no --vmodule pattern applied to them
//   the value of FLAGS_v will continue to control them.)
extern GOOGLE_GLOG_DLL_DECL int SetVLOGLevel(const char* module_pattern,
 int log_level);
{noformat}

This means that unlike the "-v" commandline option, which can be changed 
dynamically and takes effect after the change, the "-vmodule" commandline 
option is only read once, during initialization (the call to 
InitGoogleLogging()); afterwards, any change to this commandline parameter has 
no effect.

Instead, we would need to use SetVLOGLevel() to change the log level of a 
particular "vmodule pattern" defined earlier. 
I looked at the source code and it seems like SetVLOGLevel() also supports 
adding more patterns, but once patterns are added to the list, they cannot be 
removed and will always override the global "-v" level. This means that 
resetting/removing the vmodule patterns is currently not possible. The most we 
can do is keep track externally of which patterns have been added and have an 
option on the debug page to reset them to the global "-v" level. But then we 
will have to do that every time we change the "-v" level.
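
A rough sketch of that workaround (hypothetical helpers, not existing Impala 
code):

{code:java}
#include <map>
#include <string>
#include <glog/logging.h>

// Remember every pattern ever passed to SetVLOGLevel(), since glog offers no
// way to remove a pattern once it is on the list.
static std::map<std::string, int> added_patterns;

void SetModuleLogLevel(const std::string& pattern, int level) {
  google::SetVLOGLevel(pattern.c_str(), level);
  added_patterns[pattern] = level;
}

// "Reset" action for the debug page: force every known pattern back to the
// global level. This must be re-run after each change to FLAGS_v, because
// the per-pattern levels keep overriding it.
void ResetModulesToGlobalLevel() {
  for (auto& p : added_patterns) {
    google::SetVLOGLevel(p.first.c_str(), FLAGS_v);
    p.second = FLAGS_v;
  }
}
{code}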

> Allow setting -vmodule from debug page
> --
>
> Key: IMPALA-6877
> URL: https://issues.apache.org/jira/browse/IMPALA-6877
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Bikramjeet Vig
>Priority: Major
>  Labels: supportability
>
> We allow setting the global verbosity level from the debug page, but we don't 
> allow setting it per module, e.g. -vmodule admission-controller=2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-6959) Update HAProxy configuration sample for Impala

2018-05-03 Thread Alex Rodoni (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni updated IMPALA-6959:

Fix Version/s: Impala 2.13.0

> Update HAProxy configuration sample for Impala
> --
>
> Key: IMPALA-6959
> URL: https://issues.apache.org/jira/browse/IMPALA-6959
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 2.12.0
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
> Fix For: Impala 3.0, Impala 2.13.0
>
>
> Our doc regarding setting up Impala HA using HAProxy:
> [https://www.cloudera.com/documentation/enterprise/latest/topics/impala_proxy.html]
> uses the following timeout values:
> {code:java}
> contimeout 5000
> clitimeout 50000
> srvtimeout 50000
> {code}
> Two issues here:
> 1. contimeout, clitimeout and srvtimeout are old; the new config names should 
> be
> {code:java}
> timeout connect
> timeout client
> timeout server
> {code}
> The outdated config names show that our docs are also outdated.
> 2. The timeout value of 50000, which is 50 seconds, is too low; it will cause 
> the Impala client to time out all the time. The correct value should be at 
> least 1 hour to start with:
> {code:java}
> timeout client 3600s
> timeout server 3600s
> {code}
> We should also mention that these values should depend on how users use the 
> cluster and how long their queries run. It is NOT one config that fits all 
> cases.
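
Putting both fixes together, the modern stanza would look roughly like this 
(values are illustrative starting points, not a recommendation for every 
cluster):

{code:java}
defaults
    mode tcp
    timeout connect 5000ms
    # Must cover the longest query a client is expected to wait on;
    # one hour is only a starting point.
    timeout client 3600s
    timeout server 3600s
{code}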



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Closed] (IMPALA-6959) Update HAProxy configuration sample for Impala

2018-05-03 Thread Alex Rodoni (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni closed IMPALA-6959.
---

> Update HAProxy configuration sample for Impala
> --
>
> Key: IMPALA-6959
> URL: https://issues.apache.org/jira/browse/IMPALA-6959
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 2.12.0
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
> Fix For: Impala 3.0, Impala 2.13.0
>
>
> Our doc regarding setting up Impala HA using HAProxy:
> [https://www.cloudera.com/documentation/enterprise/latest/topics/impala_proxy.html]
> uses the following timeout values:
> {code:java}
> contimeout 5000
> clitimeout 50000
> srvtimeout 50000
> {code}
> Two issues here:
> 1. contimeout, clitimeout and srvtimeout are old; the new config names should 
> be
> {code:java}
> timeout connect
> timeout client
> timeout server
> {code}
> The outdated config names show that our docs are also outdated.
> 2. The timeout value of 50000, which is 50 seconds, is too low; it will cause 
> the Impala client to time out all the time. The correct value should be at 
> least 1 hour to start with:
> {code:java}
> timeout client 3600s
> timeout server 3600s
> {code}
> We should also mention that these values should depend on how users use the 
> cluster and how long their queries run. It is NOT one config that fits all 
> cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (IMPALA-6959) Update HAProxy configuration sample for Impala

2018-05-03 Thread Alex Rodoni (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Rodoni resolved IMPALA-6959.
-
   Resolution: Fixed
Fix Version/s: Impala 3.0

> Update HAProxy configuration sample for Impala
> --
>
> Key: IMPALA-6959
> URL: https://issues.apache.org/jira/browse/IMPALA-6959
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 2.12.0
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
> Fix For: Impala 3.0
>
>
> Our doc regarding setting up Impala HA using HAProxy:
> [https://www.cloudera.com/documentation/enterprise/latest/topics/impala_proxy.html]
> uses the following timeout values:
> {code:java}
> contimeout 5000
> clitimeout 50000
> srvtimeout 50000
> {code}
> Two issues here:
> 1. contimeout, clitimeout and srvtimeout are old; the new config names should 
> be
> {code:java}
> timeout connect
> timeout client
> timeout server
> {code}
> The outdated config names show that our docs are also outdated.
> 2. The timeout value of 50000, which is 50 seconds, is too low; it will cause 
> the Impala client to time out all the time. The correct value should be at 
> least 1 hour to start with:
> {code:java}
> timeout client 3600s
> timeout server 3600s
> {code}
> We should also mention that these values should depend on how users use the 
> cluster and how long their queries run. It is NOT one config that fits all 
> cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-4123) Columnar decoding in Parquet scanner

2018-05-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462997#comment-16462997
 ] 

ASF subversion and git services commented on IMPALA-4123:
-

Commit 51bc004d798d2c7ab7b8b4553d32d26cb7382ad6 in impala's branch 
refs/heads/master from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=51bc004 ]

IMPALA-4123 (prep): Parquet column reader cleanup

Some miscellaneous cleanup to make it easier to understand and
make future changes to the Parquet scanner.

A lot of the refactoring is about more cleanly separating functions
so that they have clearer purpose, e.g.:
* Functions that strictly do decoding, i.e. materialize values, convert
  and validate them. These are changed to operate on single values, not tuples.
* Functions that are used for the non-batched decoding path (i.e. driven
  by CollectionColumnReader or BoolColumnReader).
* Functions that dispatch to a templated implementation based on one or
  more runtime values.

Other misc changes:
* Move large functions out of class bodies.
* Use parquet::Encoding instead of bool to indicate encoding.
* Add some additional DCHECKs.

Testing:
* Ran exhaustive tests
* Ran fuzz test in a loop

Change-Id: Ibc00352df3a0b2d605f872ae7e43db2dc90faab1
Reviewed-on: http://gerrit.cloudera.org:8080/9799
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 


> Columnar decoding in Parquet scanner
> 
>
> Key: IMPALA-4123
> URL: https://issues.apache.org/jira/browse/IMPALA-4123
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.8.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: parquet, perfomance
>
> Apache Parquet has some nice performance improvements to bit-packed decoding 
> in their Parquet scanner (which is derived from Impala's): 
> https://github.com/apache/parquet-cpp/pull/140
> We should do something similar - i.e. switch to more of a batch-oriented 
> approach to decoding rather than value-at-a-time.
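
To make the distinction concrete, here is a simplified sketch (not the actual 
scanner code) of value-at-a-time versus batched decoding of bit-packed 
values, assuming bit_width <= 8 so each value spans at most two bytes:

{code:java}
#include <cstdint>

// Value-at-a-time: the offset math, loads and masking are redone from
// scratch, with call overhead, for every single value.
inline uint32_t DecodeOne(const uint8_t* data, int bit_width, int64_t i) {
  uint64_t bit_offset = static_cast<uint64_t>(i) * bit_width;
  uint64_t byte = bit_offset / 8, shift = bit_offset % 8;
  uint32_t word = data[byte] | (static_cast<uint32_t>(data[byte + 1]) << 8);
  return (word >> shift) & ((1u << bit_width) - 1);
}

void DecodeValueAtATime(const uint8_t* data, int bit_width, int n,
    uint32_t* out) {
  for (int i = 0; i < n; ++i) out[i] = DecodeOne(data, bit_width, i);
}

// Batch-oriented: decode a fixed run of 32 values in one tight loop that the
// compiler can unroll and vectorize; setup cost is amortized over the batch.
void DecodeBatch32(const uint8_t* data, int bit_width, uint32_t* out) {
  const uint32_t mask = (1u << bit_width) - 1;
  uint64_t bit_offset = 0;
  for (int i = 0; i < 32; ++i) {
    uint64_t byte = bit_offset / 8, shift = bit_offset % 8;
    uint32_t word = data[byte] | (static_cast<uint32_t>(data[byte + 1]) << 8);
    out[i] = (word >> shift) & mask;
    bit_offset += bit_width;
  }
}
{code}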



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6959) Update HAProxy configuration sample for Impala

2018-05-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462996#comment-16462996
 ] 

ASF subversion and git services commented on IMPALA-6959:
-

Commit aee045df806c5736a9f49d2324fc1e30db732db2 in impala's branch 
refs/heads/master from [~arodoni_cloudera]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=aee045d ]

IMPALA-6959: [DOCS] Update to HAProxy configuration sample code

- Changed to deprecated timeouts: contimeout, clitimeout, srvtimeout
- Changed the sample timeout values to more realistic values
- Added a note that actual timeout values should depend on
  the user cluster

Change-Id: Idff3aa9bbb58c1953cb7e9394ade01c7833c3a34
Reviewed-on: http://gerrit.cloudera.org:8080/10284
Reviewed-by: Alan Choi 
Reviewed-by: Alex Rodoni 
Tested-by: Impala Public Jenkins 


> Update HAProxy configuration sample for Impala
> --
>
> Key: IMPALA-6959
> URL: https://issues.apache.org/jira/browse/IMPALA-6959
> Project: IMPALA
>  Issue Type: Task
>  Components: Docs
>Affects Versions: Impala 2.12.0
>Reporter: Alex Rodoni
>Assignee: Alex Rodoni
>Priority: Major
>
> Our doc regarding setting up Impala HA using HAProxy:
> [https://www.cloudera.com/documentation/enterprise/latest/topics/impala_proxy.html]
> uses the following timeout values:
> {code:java}
> contimeout 5000
> clitimeout 50000
> srvtimeout 50000
> {code}
> Two issues here:
> 1. contimeout, clitimeout and srvtimeout are old; the new config names should 
> be
> {code:java}
> timeout connect
> timeout client
> timeout server
> {code}
> The outdated config names show that our docs are also outdated.
> 2. The timeout value of 50000, which is 50 seconds, is too low; it will cause 
> the Impala client to time out all the time. The correct value should be at 
> least 1 hour to start with:
> {code:java}
> timeout client 3600s
> timeout server 3600s
> {code}
> We should also mention that these values should depend on how users use the 
> cluster and how long their queries run. It is NOT one config that fits all 
> cases.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6967) GVO should only allow patches that apply cleanly to both master and 2.x

2018-05-03 Thread Philip Zeyliger (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462985#comment-16462985
 ] 

Philip Zeyliger commented on IMPALA-6967:
-

Let's say I have a conflicting patch that needs to go into two branches. I don't 
think it's a good idea to confuse the commit message by saying 
"Cherry-picks: not for 2.x".

> GVO should only allow patches that apply cleanly to both master and 2.x
> ---
>
> Key: IMPALA-6967
> URL: https://issues.apache.org/jira/browse/IMPALA-6967
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Sailesh Mukil
>Priority: Major
>  Labels: jenkins
>
> Following this thread:
> https://lists.apache.org/thread.html/bba3c5a87635ad3c70c40ac120de2ddb41c3d0e2f5db0b29bc0243ff@%3Cdev.impala.apache.org%3E
> It would take load off authors if the GVO could automatically tell if a patch 
> that's being pushed to master would cleanly cherry-pick to 2.x.
> At the beginning of the GVO, we should try to cherry-pick to 2.x and fail if 
> there are conflicts, unless the commit message has the line:
> "Cherry-picks: not for 2.x"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-6967) GVO should only allow patches that apply cleanly to both master and 2.x

2018-05-03 Thread Sailesh Mukil (JIRA)
Sailesh Mukil created IMPALA-6967:
-

 Summary: GVO should only allow patches that apply cleanly to both 
master and 2.x
 Key: IMPALA-6967
 URL: https://issues.apache.org/jira/browse/IMPALA-6967
 Project: IMPALA
  Issue Type: Task
  Components: Infrastructure
Reporter: Sailesh Mukil


Following this thread:
https://lists.apache.org/thread.html/bba3c5a87635ad3c70c40ac120de2ddb41c3d0e2f5db0b29bc0243ff@%3Cdev.impala.apache.org%3E

It would take load off authors if the GVO could automatically tell if a patch 
that's being pushed to master would cleanly cherry-pick to 2.x.

At the beginning of the GVO, we should try to cherry-pick to 2.x and fail if 
there are conflicts, unless the commit message has the line:
"Cherry-picks: not for 2.x"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6967) GVO should only allow patches that apply cleanly to both master and 2.x

2018-05-03 Thread Sailesh Mukil (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462965#comment-16462965
 ] 

Sailesh Mukil commented on IMPALA-6967:
---

CC: [~lv] [~jbapple]

> GVO should only allow patches that apply cleanly to both master and 2.x
> ---
>
> Key: IMPALA-6967
> URL: https://issues.apache.org/jira/browse/IMPALA-6967
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Sailesh Mukil
>Priority: Major
>  Labels: jenkins
>
> Following this thread:
> https://lists.apache.org/thread.html/bba3c5a87635ad3c70c40ac120de2ddb41c3d0e2f5db0b29bc0243ff@%3Cdev.impala.apache.org%3E
> It would take load off authors if the GVO could automatically tell if a patch 
> that's being pushed to master would cleanly cherry-pick to 2.x.
> At the beginning of the GVO, we should try to cherry-pick to 2.x and fail if 
> there are conflicts, unless the commit message has the line:
> "Cherry-picks: not for 2.x"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6931) TestQueryExpiration.test_query_expiration fails on ASAN with unexpected number of expired queries

2018-05-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462767#comment-16462767
 ] 

ASF subversion and git services commented on IMPALA-6931:
-

Commit b69f02ba02a2d9d0cdbe3abc5c662f73e919aee7 in impala's branch 
refs/heads/master from [~vercego]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=b69f02b ]

IMPALA-6931: reduces races in query expiration tests

Recent tests ran into flakiness when testing query expiration.
This change makes two fixes:
1) query state is retrieved earlier; a flaky test skipped
the expected state.
2) bump the timing; a flaky test had queries expire before
it could check them.

Change-Id: I93f4ec450fc7e5a685c135b444e90d37e632831d
Reviewed-on: http://gerrit.cloudera.org:8080/10279
Reviewed-by: Dan Hecht 
Tested-by: Impala Public Jenkins 


> TestQueryExpiration.test_query_expiration fails on ASAN with unexpected 
> number of expired queries
> -
>
> Key: IMPALA-6931
> URL: https://issues.apache.org/jira/browse/IMPALA-6931
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: David Knupp
>Assignee: Vuk Ercegovac
>Priority: Blocker
>
> Stacktrace
> {noformat}
> custom_cluster/test_query_expiration.py:108: in test_query_expiration
> client.QUERY_STATES['EXCEPTION'])
> custom_cluster/test_query_expiration.py:184: in __expect_client_state
> assert expected_state == actual_state
> E   assert 5 == 4
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-1803) Avoid hitting OOM in HdfsTableSink when inserting to Parquet

2018-05-03 Thread Tim Armstrong (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-1803.
---
   Resolution: Fixed
Fix Version/s: Impala 3.0

IMPALA-4899 and IMPALA-5293 should address this.

> Avoid hitting OOM in HdfsTableSink when inserting to Parquet
> 
>
> Key: IMPALA-1803
> URL: https://issues.apache.org/jira/browse/IMPALA-1803
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.0
>Reporter: Ippokratis Pandis
>Priority: Major
>  Labels: resource-management, usability
> Fix For: Impala 3.0
>
> Attachments: hdfstablesink-oom.txt
>
>
> Impala's memory consumption is very high when it writes to Parquet and there 
> is a large number of partitions, primarily because we try to buffer data per 
> partition. That, however, can lead to OOM; see the attached profile. Instead, 
> we can either spill the buffered data to disk or write to Parquet files.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-6507) Consider removing --disable_mem_pools debugging feature

2018-05-03 Thread Tim Armstrong (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-6507:
-

Assignee: Tim Armstrong

> Consider removing --disable_mem_pools debugging feature
> ---
>
> Key: IMPALA-6507
> URL: https://issues.apache.org/jira/browse/IMPALA-6507
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.11.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: resource-management
>
> We should consider removing support for the --disable_mem_pools feature. It 
> was originally somewhat useful for debugging memory-related issues where 
> memory pooling could limit the effectiveness of ASAN's checks. However, we 
> now use ASAN's poisoning in all cases, which provides most of the same 
> coverage by default.
> We don't routinely test --disable_mem_pools, so we should consider removing 
> it to save the burden of maintaining these extra code paths.
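
For context, the poisoning referred to above works roughly like this (a 
generic sketch against the public ASAN interface, not Impala's actual 
MemPool; assumes an ASAN build):

{code:java}
#include <cstddef>
#include <sanitizer/asan_interface.h>

// When a chunk goes back into the pool, poison it: ASAN then reports any
// access through a stale pointer even though the memory was never freed.
void ReturnChunkToPool(void* chunk, size_t len) {
  ASAN_POISON_MEMORY_REGION(chunk, len);
}

// Unpoison on reuse so the new owner can legally touch the memory again.
void* HandOutChunk(void* chunk, size_t len) {
  ASAN_UNPOISON_MEMORY_REGION(chunk, len);
  return chunk;
}
{code}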



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-6920) Multithreaded scans are not guaranteed to get a thread token immediately

2018-05-03 Thread Tim Armstrong (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6920.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0
   Impala 2.13.0

> Multithreaded scans are not guaranteed to get a thread token immediately
> 
>
> Key: IMPALA-6920
> URL: https://issues.apache.org/jira/browse/IMPALA-6920
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: resource-management
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> This bug applies to multithreaded HDFS and Kudu scans.
> So what happens is that we reserve an optional token for the first scanner 
> thread but that can be taken by any other operator in the same fragment. What 
> happens in one fragment in TPC-DS q18a is:
> 1. The hash join grabs an extra token for the join build. I guess it does 
> this early so it gets an optional token before other fragments can grab them.
> 2. The scan node reserves an optional token in Open(). This optional token is 
> already in use by the hash join.
> 3. The scan node tries to start the first scanner thread, but there are no 
> optional tokens available, so it can't start any.
> 4. Eventually the optional token is given up and the scanner thread can start.
> If #4 always happens without the scan making progress, then no deadlock is 
> possible, but if there's any kind of circular dependency, this can deadlock.
> Kudu scans also do not implement the num_scanner_threads query option in the 
> same way as HDFS scans - the IMPALA-2831 changes were not applied to Kudu.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (IMPALA-2376) Scan of array value with 100m elements with reasonable mem limit hits DCHECK.

2018-05-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-2376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462601#comment-16462601
 ] 

ASF subversion and git services commented on IMPALA-2376:
-

Commit 385afe5f45d2a89b2c246e9db20df74dd720e5d2 in impala's branch 
refs/heads/2.x from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=385afe5 ]

IMPALA-6560: fix regression test for IMPALA-2376

The test is modified to increase the size of collections allocated.
num_nodes and mt_dop query options are set to make execution as
deterministic as possible.

I looped the test overnight to try to flush out flakiness.

Adds support for row_regex lines in CATCH sections so that we can
match a larger part of the error message.

Change-Id: I024cb6b57647902b1735defb885cd095fd99738c
Reviewed-on: http://gerrit.cloudera.org:8080/9681
Reviewed-by: Tim Armstrong 
Tested-by: Tim Armstrong 
Reviewed-on: http://gerrit.cloudera.org:8080/10272


> Scan of array value with 100m elements with reasonable mem limit hits DCHECK.
> -
>
> Key: IMPALA-2376
> URL: https://issues.apache.org/jira/browse/IMPALA-2376
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 2.3.0
>Reporter: Alexander Behm
>Assignee: Skye Wanderman-Milne
>Priority: Blocker
>  Labels: crash, nested_types, resource-management
>
> The query below when run without a mem limit needs roughly 2.4g of memory in 
> the scan.
> My expectation is that I get a mem limit exceeded error when running the same 
> query with a mem limit below that 2.4g. However, we hit a DCHECK in the 
> scanner.
> Repro:
> 1. Grab Parquet file from here:
> vd0212.halxg.cloudera.com:/data/1/huge_array_parquet/100m_array.parq
> 2. Copy file to HDFS and use CREATE TABLE LIKE FILE
> 3. The query below runs fine without a mem limit:
> {code}
> select cnt from huge_array_table t, (select count(item) cnt from t.f) v;
> {code}
> 4. Set the mem limit to 1g and run the query again. You will hit this DCHECK:
> {code}
> hdfs-parquet-scanner.cc:1299] Check failed: !parse_status_.ok()
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-6679) Don't claim ideal reservation in scanner until actually processing scan ranges

2018-05-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462604#comment-16462604
 ] 

ASF subversion and git services commented on IMPALA-6679:
-

Commit 83a70a7ae0a1bfc1fc7c2448e73c95ee2d7f7c09 in impala's branch 
refs/heads/2.x from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=83a70a7 ]

IMPALA-6679,IMPALA-6678: reduce scan reservation

This has two related changes.

IMPALA-6679: defer scanner reservation increases

When starting each scan range, check to see how big the initial scan
range is (the full thing for row-based formats, the footer for
Parquet) and determine whether more reservation would be useful.

For Parquet, base the ideal reservation on the actual column layout
of each file. This avoids reserving memory that we won't use for
the actual files that we're scanning. This also avoids the need to
estimate the ideal reservation in the planner.

We also release scanner thread reservations above the minimum as
soon as threads complete, so that resources can be released slightly
earlier.

IMPALA-6678: estimate Parquet column size for reservation
-
This change also reduces reservation computed by the planner in certain
cases by estimating the on-disk size of column data based on stats. It
also reduces the default per-column reservation to 4MB since it appears
that < 8MB columns are generally common in practice and the method for
estimating column size is biased towards over-estimating. There are two
main cases to consider for the performance implications:
* Memory is available to improve query perf - if we underestimate, we
  can increase the reservation so we can do "efficient" 8MB I/Os for
  large columns.
* The ideal reservation is not available - query performance is affected
  because we can't overlap I/O and compute as much and may do smaller
  (probably 4MB I/Os). However, we should avoid pathological behaviour
  like tiny I/Os.

When stats are not available, we just default to reserving 4MB per
column, which typically is more memory than required. When stats are
available, the memory required can be reduced below that when the heuristics
tell us with high confidence that the column data for most or all files
is smaller than 4MB.

The stats-based heuristic could reduce scan performance if both the
conservative heuristics significantly underestimate the column size
and memory is constrained such that we can't increase the scan
reservation at runtime (in which case the memory might be used by
a different operator or scanner thread).

Observability:
Added counters to track when threads were not spawned due to reservation
and to track when reservation increases are requested and denied. These
allow determining if performance may have been affected by memory
availability.

Testing:
Updated test_mem_usage_scaling.py memory requirements and added steps
to regenerate the requirements. Loops test for a while to flush out
flakiness.

Added targeted planner and query tests for reservation calculations and
increases.

Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Reviewed-on: http://gerrit.cloudera.org:8080/9757
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
Reviewed-on: http://gerrit.cloudera.org:8080/10273


> Don't claim ideal reservation in scanner until actually processing scan ranges
> --
>
> Key: IMPALA-6679
> URL: https://issues.apache.org/jira/browse/IMPALA-6679
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> One cause of the memory regression in IMPALA-4835 was that scans, 
> particularly Parquet, claimed memory very aggressively in 
> HdfsScanNode::Open() based on the planner's estimated ideal memory. There 
> were two problems here:
> # In many cases the ideal memory increase was not at all needed, because the 
> total amount of data scanned in the Parquet file was actually less than the 
> minimum reservation. We don't know this for sure until we read the Parquet 
> footer and check the column sizes.
> # HdfsScanNode::Open() may happen a long time before the first 
> HdfsScanNode::GetNext(), particularly if the scan node is on the left side of 
> a broadcast join. The scan node may grab a lot of reservation before other 
> plan nodes have started running, eventually resulting in starving a lot of 
> nodes.
> It would make sense to wait until we are actually processing a scan range to 
> increase a scanner thread's reservation from the minimum. 
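
As a rough illustration of the deferred approach described in the commit 
message above (all names and constants here are hypothetical):

{code:java}
#include <algorithm>
#include <cstdint>

constexpr int64_t kMaxIoBufferSize = 8LL << 20;  // an "efficient" 8MB I/O size

// Called only once the initial scan range (whole file, or Parquet footer and
// column layout) is known - not in Open() - so reservation is never claimed
// for data that will not actually be read.
int64_t IdealRangeReservation(int64_t range_bytes, int64_t min_reservation) {
  // Small ranges gain nothing from more memory than the minimum.
  if (range_bytes <= min_reservation) return min_reservation;
  // Otherwise cap at a few large buffers so I/O and compute can overlap; if
  // the increase is denied at runtime, the scan proceeds at the minimum.
  return std::max(min_reservation,
      std::min(range_bytes, 3 * kMaxIoBufferSize));
}
{code}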

[jira] [Commented] (IMPALA-6564) Queries randomly fail with "CANCELLED" due to a race with IssueInitialRanges()

2018-05-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462596#comment-16462596
 ] 

ASF subversion and git services commented on IMPALA-6564:
-

Commit 9bf324e7e7159ab47b91c7f152162250bc8ff041 in impala's branch 
refs/heads/2.x from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=9bf324e ]

IMPALA-4835: switch I/O buffers to buffer pool

This is the following squashed patches that were reverted.

I will fix the known issues with some follow-on patches.

==
IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation

In preparation for switching the I/O mgr to the buffer pool, this
removes and cleans up a lot of code so that the switchover patch starts
from a cleaner slate.

* Remove the free buffer cache (which will be replaced by buffer pool's
  own caching).
* Make memory limit exceeded error checking synchronous (in anticipation
  of having to propagate buffer pool errors synchronously).
* Simplify error propagation - remove the (ineffectual) code that
  enqueued BufferDescriptors containing error statuses.
* Document locking scheme better in a few places, make it part of the
  function signature when it seemed reasonable.
* Move ReturnBuffer() to ScanRange, because it is intrinsically
  connected with the lifecycle of a scan range.
* Separate external ReturnBuffer() and internal CleanUpBuffer()
  interfaces - previously callers of ReturnBuffer() were fudging
  the num_buffers_in_reader accounting to make the external interface work.
* Eliminate redundant state in ScanRange: 'eosr_returned_' and
  'is_cancelled_'.
* Clarify the logic around calling Close() for the last
  BufferDescriptor.
  -> There appeared to be an implicit assumption that buffers would be
 freed in the order they were returned from the scan range, so that
 the "eos" buffer was returned last. Instead just count the number
 of outstanding buffers to detect the last one.
  -> Touching the is_cancelled_ field without holding a lock was hard to
 reason about - violated locking rules and it was unclear that it
 was race-free.
* Remove DiskIoMgr::Read() to simplify the interface. It is trivial to
  inline at the callsites.

This will probably regress performance somewhat because of the cache
removal, so my plan is to merge it around the same time as switching
the I/O mgr to allocate from the buffer pool. I'm keeping the patches
separate to make reviewing easier.

Testing:
* Ran exhaustive tests
* Ran the disk-io-mgr-stress-test overnight

==
IMPALA-4835: Part 2: Allocate scan range buffers upfront

This change is a step towards reserving memory for buffers from the
buffer pool and constraining per-scanner memory requirements. This
change restructures the DiskIoMgr code so that each ScanRange operates
with a fixed set of buffers that are allocated upfront and recycled as
the I/O mgr works through the ScanRange.

One major change is that ScanRanges get blocked when a buffer is not
available and get unblocked when a client returns a buffer via
ReturnBuffer(). I was able to remove the logic to maintain the
blocked_ranges_ list by instead adding a separate set with all ranges
that are active.

There is also some miscellaneous cleanup included - e.g. reducing the
amount of code devoted to maintaining counters and metrics.

One tricky part of the existing code was that it called
IssueInitialRanges() with empty lists of files and depended on
DiskIoMgr::AddScanRanges() to not check for cancellation in that case.
See IMPALA-6564/IMPALA-6588. I changed the logic to not try to issue
ranges for empty lists of files.

I plan to merge this along with the actual buffer pool switch, but
separated it out to allow review of the DiskIoMgr changes separate from
other aspects of the buffer pool switchover.

Testing:
* Ran core and exhaustive tests.

==
IMPALA-4835: Part 3: switch I/O buffers to buffer pool

This is the final patch to switch the Disk I/O manager to allocate all
buffer from the buffer pool and to reserve the buffers required for
a query upfront.

* The planner reserves enough memory to run a single scanner per
  scan node.
* The multi-threaded scan node must increase reservation before
  spinning up more threads.
* The scanner implementations must be careful to stay within their
  assigned reservation.

The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.

Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal 

[jira] [Commented] (IMPALA-6588) test_compute_stats_tablesample failing with "Cancelled"

2018-05-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462597#comment-16462597
 ] 

ASF subversion and git services commented on IMPALA-6588:
-

Commit 9bf324e7e7159ab47b91c7f152162250bc8ff041 in impala's branch 
refs/heads/2.x from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=9bf324e ]

IMPALA-4835: switch I/O buffers to buffer pool

This is the following squashed patches that were reverted.

I will fix the known issues with some follow-on patches.

==
IMPALA-4835: Part 1: simplify I/O mgr mem mgmt and cancellation

In preparation for switching the I/O mgr to the buffer pool, this
removes and cleans up a lot of code so that the switchover patch starts
from a cleaner slate.

* Remove the free buffer cache (which will be replaced by buffer pool's
  own caching).
* Make memory limit exceeded error checking synchronous (in anticipation
  of having to propagate buffer pool errors synchronously).
* Simplify error propagation - remove the (ineffectual) code that
  enqueued BufferDescriptors containing error statuses.
* Document locking scheme better in a few places, make it part of the
  function signature when it seemed reasonable.
* Move ReturnBuffer() to ScanRange, because it is intrinsically
  connected with the lifecycle of a scan range.
* Separate external ReturnBuffer() and internal CleanUpBuffer()
  interfaces - previously callers of ReturnBuffer() were fudging
  the num_buffers_in_reader accounting to make the external interface work.
* Eliminate redundant state in ScanRange: 'eosr_returned_' and
  'is_cancelled_'.
* Clarify the logic around calling Close() for the last
  BufferDescriptor.
  -> There appeared to be an implicit assumption that buffers would be
 freed in the order they were returned from the scan range, so that
 the "eos" buffer was returned last. Instead just count the number
 of outstanding buffers to detect the last one.
  -> Touching the is_cancelled_ field without holding a lock was hard to
 reason about - violated locking rules and it was unclear that it
 was race-free.
* Remove DiskIoMgr::Read() to simplify the interface. It is trivial to
  inline at the callsites.

This will probably regress performance somewhat because of the cache
removal, so my plan is to merge it around the same time as switching
the I/O mgr to allocate from the buffer pool. I'm keeping the patches
separate to make reviewing easier.

Testing:
* Ran exhaustive tests
* Ran the disk-io-mgr-stress-test overnight

==
IMPALA-4835: Part 2: Allocate scan range buffers upfront

This change is a step towards reserving memory for buffers from the
buffer pool and constraining per-scanner memory requirements. This
change restructures the DiskIoMgr code so that each ScanRange operates
with a fixed set of buffers that are allocated upfront and recycled as
the I/O mgr works through the ScanRange.

One major change is that ScanRanges get blocked when a buffer is not
available and get unblocked when a client returns a buffer via
ReturnBuffer(). I was able to remove the logic to maintain the
blocked_ranges_ list by instead adding a separate set with all ranges
that are active.
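
As a rough illustration of that blocking/recycling scheme (not Impala's
actual code; the class name and buffer handling below are invented), a
scan range with a fixed set of upfront-allocated buffers might look like:

{code:cpp}
// Sketch: a disk thread blocks in GetFreeBuffer() when all buffers are
// handed out and is woken when the client returns one via ReturnBuffer().
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>
#include <vector>

class ScanRangeSketch {
 public:
  ScanRangeSketch(int num_buffers, std::size_t buffer_bytes) {
    for (int i = 0; i < num_buffers; ++i) {
      free_buffers_.push(std::vector<char>(buffer_bytes));
    }
  }

  // Disk thread: blocks until a recycled buffer is available.
  std::vector<char> GetFreeBuffer() {
    std::unique_lock<std::mutex> l(lock_);
    cv_.wait(l, [this] { return !free_buffers_.empty(); });
    std::vector<char> buf = std::move(free_buffers_.front());
    free_buffers_.pop();
    return buf;
  }

  // Client: returns a consumed buffer, unblocking the range.
  void ReturnBuffer(std::vector<char> buf) {
    std::lock_guard<std::mutex> l(lock_);
    free_buffers_.push(std::move(buf));
    cv_.notify_one();
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  std::queue<std::vector<char>> free_buffers_;
};
{code}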

There is also some miscellaneous cleanup included - e.g. reducing the
amount of code devoted to maintaining counters and metrics.

One tricky part of the existing code was that it called
IssueInitialRanges() with empty lists of files and depended on
DiskIoMgr::AddScanRanges() not checking for cancellation in that case.
See IMPALA-6564/IMPALA-6588. I changed the logic to not issue
ranges for empty lists of files.

I plan to merge this along with the actual buffer pool switch, but
separated it out to allow review of the DiskIoMgr changes separate from
other aspects of the buffer pool switchover.

Testing:
* Ran core and exhaustive tests.

==
IMPALA-4835: Part 3: switch I/O buffers to buffer pool

This is the final patch to switch the Disk I/O manager to allocate all
buffers from the buffer pool and to reserve the buffers required for
a query upfront.

* The planner reserves enough memory to run a single scanner per
  scan node.
* The multi-threaded scan node must increase reservation before
  spinning up more threads.
* The scanner implementations must be careful to stay within their
  assigned reservation.
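
As an illustration of the second bullet (names and the atomic-counter
design below are assumptions, not Impala's ReservationTracker code), the
gating check before spawning another scanner thread might be sketched as:

{code:cpp}
// Sketch: a scan node tries to grow its share of reservation by the
// per-scanner minimum before spawning another thread; on failure it
// keeps running with the threads it already has.
#include <atomic>
#include <cstdint>

struct ReservationSketch {
  std::atomic<int64_t> available_bytes{0};

  // Returns true if 'bytes' could be carved out for a new scanner thread.
  bool TryIncrease(int64_t bytes) {
    int64_t cur = available_bytes.load();
    while (cur >= bytes) {
      if (available_bytes.compare_exchange_weak(cur, cur - bytes)) {
        return true;
      }
      // compare_exchange_weak reloaded 'cur'; retry while enough remains.
    }
    return false;
  }
};
{code}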

The row-oriented scanners were most straightforward, since they only
have a single scan range active at a time. A single I/O buffer is
sufficient to scan the whole file but more I/O buffers can improve I/O
throughput.

Parquet is more complex because it issues a scan range per column and
the sizes of the columns on disk are not known during planning. To
deal 

[jira] [Commented] (IMPALA-4835) HDFS scans should operate with a constrained number of I/O buffers

2018-05-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462594#comment-16462594
 ] 

ASF subversion and git services commented on IMPALA-4835:
-

Commit 9bf324e7e7159ab47b91c7f152162250bc8ff041 in impala's branch 
refs/heads/2.x from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=9bf324e ]

IMPALA-4835: switch I/O buffers to buffer pool


[jira] [Commented] (IMPALA-6587) Crash in DiskMgr::AllocateBuffersForRange

2018-05-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462599#comment-16462599
 ] 

ASF subversion and git services commented on IMPALA-6587:
-

Commit 82c43f4f151b56dbae3c74c086a8c401fc5d14bf in impala's branch 
refs/heads/2.x from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=82c43f4 ]

IMPALA-6587: free buffers before ScanRange::Cancel() returns

ScanRange::Cancel() now waits until an in-flight read finishes so
that the disk I/O buffer being processed by the disk thread is
freed when Cancel() returns.

The fix is to set a 'read_in_flight_' flag on the scan range
while the disk thread is doing the read. Cancel() blocks until
read_in_flight_ == false.
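
A minimal sketch of that handshake, assuming a condition variable (this
is illustrative, not the actual patch):

{code:cpp}
// Sketch: Cancel() sets the cancelled flag and then waits for any
// in-flight read to drain before returning, so no disk thread still
// holds an I/O buffer afterwards.
#include <condition_variable>
#include <mutex>

class CancelSketch {
 public:
  // Disk thread: returns false if the range is already cancelled.
  bool BeginRead() {
    std::lock_guard<std::mutex> l(lock_);
    if (cancelled_) return false;
    read_in_flight_ = true;
    return true;
  }

  // Disk thread: read done, buffer handed off or freed.
  void EndRead() {
    std::lock_guard<std::mutex> l(lock_);
    read_in_flight_ = false;
    cv_.notify_all();
  }

  // Client: blocks until read_in_flight_ == false; buffers are then
  // safe to free.
  void Cancel() {
    std::unique_lock<std::mutex> l(lock_);
    cancelled_ = true;
    cv_.wait(l, [this] { return !read_in_flight_; });
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  bool read_in_flight_ = false;
  bool cancelled_ = false;
};
{code}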

The code is refactored to move more logic into ScanRange and
to avoid holding RequestContext::lock_ for longer than necessary.

Testing:
Added query test that reproduces the issue.

Added a unit test and a stress option that reproduces the problem in a
targeted way.

Ran disk-io-mgr-stress test for a few hours. Ran it under TSAN and
inspected output to make sure there were no non-benign data races.

Change-Id: I87182b6bd51b5fb0b923e7e4c8d08a44e7617db2
Reviewed-on: http://gerrit.cloudera.org:8080/9680
Reviewed-by: Tim Armstrong 
Tested-by: Tim Armstrong 
Reviewed-on: http://gerrit.cloudera.org:8080/10271


> Crash in DiskMgr::AllocateBuffersForRange
> -
>
> Key: IMPALA-6587
> URL: https://issues.apache.org/jira/browse/IMPALA-6587
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: broken-build, crash
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> {noformat}
> F0224 17:43:08.522589 13124 reservation-tracker.cc:376] Check failed: bytes 
> <= unused_reservation() (8192 vs. 0) 
> {noformat}
> {noformat}
> #0  0x003cb32328e5 in raise () from /lib64/libc.so.6
> #1  0x003cb32340c5 in abort () from /lib64/libc.so.6
> #2  0x03c5a244 in google::DumpStackTraceAndExit() ()
> #3  0x03c50cbd in google::LogMessage::Fail() ()
> #4  0x03c52562 in google::LogMessage::SendToLog() ()
> #5  0x03c50697 in google::LogMessage::Flush() ()
> #6  0x03c53c5e in google::LogMessageFatal::~LogMessageFatal() ()
> #7  0x01b7a813 in impala::ReservationTracker::AllocateFromLocked 
> (this=0x1a75d2a98, bytes=8192) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/bufferpool/reservation-tracker.cc:376
> #8  0x01b7a5ed in impala::ReservationTracker::AllocateFrom 
> (this=0x1a75d2a98, bytes=8192) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/bufferpool/reservation-tracker.cc:370
> #9  0x01b72127 in impala::BufferPool::Client::PrepareToAllocateBuffer 
> (this=0x1a75d2a80, len=8192, reserved=true, success=0x0) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:567
> #10 0x01b6ea13 in impala::BufferPool::AllocateBuffer (this=0xa6af380, 
> client=0x14121248, len=8192, handle=0x7fede6224260) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/bufferpool/buffer-pool.cc:229
> #11 0x02b894f0 in impala::io::DiskIoMgr::AllocateBuffersForRange 
> (this=0xb06fd40, reader=0x1ecf10300, bp_client=0x14121248, range=0x14711180, 
> max_bytes=8192) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/runtime/io/disk-io-mgr.cc:470
> #12 0x01bef7ff in impala::HdfsScanNode::ScannerThread 
> (this=0x14121100, scanner_thread_reservation=8192) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/exec/hdfs-scan-node.cc:393
> #13 0x01beec52 in impala::HdfsScanNode::<lambda()>::operator()(void) 
> const (__closure=0x7fede6224bc8) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/repos/Impala/be/src/exec/hdfs-scan-node.cc:303
> #14 0x01bf0d75 in 
> boost::detail::function::void_function_obj_invoker0<impala::HdfsScanNode::<lambda()>,
>  void>::invoke(boost::detail::function::function_buffer &) 
> (function_obj_ptr=...) at 
> /data/jenkins/workspace/impala-asf-master-exhaustive-integration/Impala-Toolchain/boost-1.57.0-p3/include/boost/function/function_template.hpp:153
> #15 0x0183e44a in boost::function0<void>::operator() 
> (this=0x7fede6224bc0) at 
> 

[jira] [Commented] (IMPALA-6678) Better estimate of per-column compressed data size for low-NDV columns.

2018-05-03 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462603#comment-16462603
 ] 

ASF subversion and git services commented on IMPALA-6678:
-

Commit 83a70a7ae0a1bfc1fc7c2448e73c95ee2d7f7c09 in impala's branch 
refs/heads/2.x from [~tarmstr...@cloudera.com]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=83a70a7 ]

IMPALA-6679,IMPALA-6678: reduce scan reservation

This has two related changes.

IMPALA-6679: defer scanner reservation increases

When starting each scan range, check to see how big the initial scan
range is (the full thing for row-based formats, the footer for
Parquet) and determine whether more reservation would be useful.

For Parquet, base the ideal reservation on the actual column layout
of each file. This avoids reserving memory that we won't use for
the actual files that we're scanning. It also avoids the need to
estimate the ideal reservation in the planner.
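
As an illustrative sketch of that idea (the function is invented; the
8MB figure comes from the "efficient" I/O size discussed below):

{code:cpp}
// Sketch: sum per-column needs from the footer's actual column sizes,
// capping each column at one maximum-size I/O buffer.
#include <algorithm>
#include <cstdint>
#include <vector>

constexpr int64_t kMaxIoBufferBytes = 8LL << 20;  // 8MB "efficient" I/O

int64_t IdealReservation(const std::vector<int64_t>& col_bytes_on_disk) {
  int64_t total = 0;
  for (int64_t bytes : col_bytes_on_disk) {
    total += std::min(bytes, kMaxIoBufferBytes);
  }
  return total;
}
{code}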

We also release scanner thread reservations above the minimum as
soon as threads complete, so that resources can be released slightly
earlier.

IMPALA-6678: estimate Parquet column size for reservation
---------------------------------------------------------
This change also reduces the reservation computed by the planner in
certain cases by estimating the on-disk size of column data based on
stats. It also reduces the default per-column reservation to 4MB, since
columns smaller than 8MB appear to be common in practice and the method
for estimating column size is biased towards overestimating. There are
two main cases to consider for the performance implications:
* Memory is available to improve query perf - if we underestimate, we
  can increase the reservation so we can do "efficient" 8MB I/Os for
  large columns.
* The ideal reservation is not available - query performance is affected
  because we can't overlap I/O and compute as much and may do smaller
  (probably 4MB) I/Os. However, we should avoid pathological behaviour
  like tiny I/Os.

When stats are not available, we just default to reserving 4MB per
column, which is typically more memory than required. When stats are
available, the memory required can be reduced below that when heuristics
tell us with high confidence that the column data for most or all files
is smaller than 4MB.

The stats-based heuristic could reduce scan performance if the
conservative heuristics significantly underestimate the column size
and, at the same time, memory is constrained such that we can't
increase the scan reservation at runtime (in which case the memory
might be used by a different operator or scanner thread).

Observability:
Added counters to track when threads were not spawned due to reservation
and to track when reservation increases are requested and denied. These
allow determining if performance may have been affected by memory
availability.

Testing:
Updated test_mem_usage_scaling.py memory requirements and added steps
to regenerate the requirements. Looped the test for a while to flush
out flakiness.

Added targeted planner and query tests for reservation calculations and
increases.

Change-Id: Ifc80e05118a9eef72cac8e2308418122e3ee0842
Reviewed-on: http://gerrit.cloudera.org:8080/9757
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins 
Reviewed-on: http://gerrit.cloudera.org:8080/10273


> Better estimate of per-column compressed data size for low-NDV columns.
> ---
>
> Key: IMPALA-6678
> URL: https://issues.apache.org/jira/browse/IMPALA-6678
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Not Applicable
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: resource-management
> Fix For: Impala 2.13.0, Impala 3.1.0
>
>
> In the previous IMPALA-4835 patch, we assumed that the "ideal" memory per 
> Parquet column was 3 * 8MB, except when the total size of the file capped the 
> total amount of memory we might use. This is often an overestimate, 
> particularly for smaller files, files with large numbers of columns, and highly 
> compressible data.
> We could do something smarter for Parquet given file sizes, per-partition row 
> count, and column NDV. We can estimate the row count per file from the 
> per-partition row count and the file sizes, and estimate bytes per value with 
> two methods:
> * For fixed-width types, estimate bytes per value based on the type width. 
> We don't know what the physical Parquet type is necessarily, but it seems 
> reasonable to estimate based on the type declared in the table.
> * log2(ndv) / 8, assuming that dictionary compression or general-purpose 
> compression will kick in.
> 
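
Combining the two methods into code, as a hedged sketch (the combination
and the placement of the 4MB cap are assumptions, not the planner's
exact formula):

{code:cpp}
// Sketch: bytes per value is the smaller of the declared type width and
// the dictionary-coded width log2(ndv)/8; the per-file estimate is then
// capped at the 4MB per-column default mentioned above.
#include <algorithm>
#include <cmath>
#include <cstdint>

int64_t EstimateColumnBytesPerFile(int64_t rows_per_file, double ndv,
                                   int type_width_bytes) {
  double dict_bytes_per_val = std::log2(std::max(ndv, 2.0)) / 8.0;
  double bytes_per_val =
      std::min(static_cast<double>(type_width_bytes), dict_bytes_per_val);
  int64_t estimate = static_cast<int64_t>(rows_per_file * bytes_per_val);
  return std::min<int64_t>(estimate, 4LL << 20);  // 4MB default cap
}
{code}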

[jira] [Updated] (IMPALA-6966) Estimated Memory in Catalogd webpage is not sorted correctly

2018-05-03 Thread Jim Apple (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Apple updated IMPALA-6966:
--
Labels: newbie  (was: )

> Estimated Memory in Catalogd webpage is not sorted correctly
> 
>
> Key: IMPALA-6966
> URL: https://issues.apache.org/jira/browse/IMPALA-6966
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.0, Impala 2.12.0
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>  Labels: newbie
> Attachments: Screen Shot 2018-05-03 at 9.38.45 PM.png
>
>
> The "Top-N Tables with Highest Memory Requirements" in Catalogd webpage 
> doesn't sort "Estimated Memory" correctly. In fact, it sorts them as strings 
> instead of size. This is confusing.
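
One possible direction, sketched here as an assumption rather than the
actual patch: parse the human-readable size into a byte count and sort
numerically.

{code:cpp}
// Sketch: convert "994.18MB"-style strings to bytes so the column can
// be sorted numerically rather than lexicographically.
#include <cctype>
#include <cstdint>
#include <string>

int64_t ParseSize(const std::string& s) {
  std::size_t pos = 0;
  double value = std::stod(s, &pos);  // numeric prefix, e.g. 994.18
  while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) {
    ++pos;
  }
  int64_t mult = 1;
  if (pos < s.size()) {
    switch (std::toupper(static_cast<unsigned char>(s[pos]))) {
      case 'K': mult = 1LL << 10; break;
      case 'M': mult = 1LL << 20; break;
      case 'G': mult = 1LL << 30; break;
      case 'T': mult = 1LL << 40; break;
    }
  }
  return static_cast<int64_t>(value * mult);
}
{code}

With this, ParseSize("10.23GB") correctly exceeds ParseSize("994.18MB"),
whereas string comparison would order them the other way.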






[jira] [Created] (IMPALA-6966) Estimated Memory in Catalogd webpage is not sorted correctly

2018-05-03 Thread Quanlong Huang (JIRA)
Quanlong Huang created IMPALA-6966:
--

 Summary: Estimated Memory in Catalogd webpage is not sorted 
correctly
 Key: IMPALA-6966
 URL: https://issues.apache.org/jira/browse/IMPALA-6966
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 2.12.0, Impala 3.0
Reporter: Quanlong Huang
Assignee: Quanlong Huang
 Attachments: Screen Shot 2018-05-03 at 9.38.45 PM.png

The "Top-N Tables with Highest Memory Requirements" in Catalogd webpage doesn't 
sort "Estimated Memory" correctly. In fact, it sorts them as strings instead of 
size. This is confusing.





[jira] [Commented] (IMPALA-6897) Catalog server should flag tables with large number of small files

2018-05-03 Thread Vuk Ercegovac (JIRA)

[ 
https://issues.apache.org/jira/browse/IMPALA-6897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461966#comment-16461966
 ] 

Vuk Ercegovac commented on IMPALA-6897:
---

Several thoughts on this one (a rough sketch follows the list):
 * "too small": define it in terms of work done per file (e.g., fixed overhead 
per file vs. work on the data)? Some ratio of the two could make a file "too 
small".
 * "too many": perhaps it depends on max parallelism per node * number of nodes 
(with a multiplier)?
 * Both the producer (e.g., insert-select) and the consumer (e.g., select) 
should get warnings, but the producer is making things worse, whereas the 
consumer's performance expectations are merely being managed.
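
A speculative sketch of such a flagging heuristic (both thresholds and
all names below are invented for illustration):

{code:cpp}
#include <cstdint>

struct TableFileStats {
  int64_t num_files;
  int64_t total_bytes;
};

// Flag when the average file is below a fixed per-file overhead
// threshold and the file count exceeds the cluster's useful parallelism.
bool ShouldFlagSmallFiles(const TableFileStats& t, int num_nodes,
                          int max_parallelism_per_node) {
  const int64_t kSmallFileBytes = 16LL << 20;  // assumed 16MB threshold
  const int64_t kMultiplier = 4;               // assumed fan-out multiplier
  if (t.num_files == 0) return false;
  bool too_small = (t.total_bytes / t.num_files) < kSmallFileBytes;
  bool too_many = t.num_files >
                  kMultiplier * int64_t{num_nodes} * max_parallelism_per_node;
  return too_small && too_many;
}
{code}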

> Catalog server should flag tables with large number of small files
> --
>
> Key: IMPALA-6897
> URL: https://issues.apache.org/jira/browse/IMPALA-6897
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Affects Versions: Impala 2.13.0
>Reporter: bharath v
>Priority: Major
>  Labels: ramp-up, supportability
>
> Since Catalog has all the file metadata information available, it should help 
> flag tables with large number of small files. This information can be 
> propagated to the coordinators and should be reflected in the query profiles 
> like how we do for "missing stats".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org