[jira] [Created] (IMPALA-8930) Impala Doc: Document object ownership with Ranger authorization provider

2019-09-06 Thread Alex Rodoni (Jira)
Alex Rodoni created IMPALA-8930:
---

 Summary: Impala Doc: Document object ownership with Ranger 
authorization provider
 Key: IMPALA-8930
 URL: https://issues.apache.org/jira/browse/IMPALA-8930
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Reporter: Alex Rodoni
Assignee: Alex Rodoni






--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-8929) Impala Doc: Document the query option to only set the mem limit on executors

2019-09-06 Thread Alex Rodoni (Jira)
Alex Rodoni created IMPALA-8929:
---

 Summary: Impala Doc: Document the query option to only set the mem 
limit on executors
 Key: IMPALA-8929
 URL: https://issues.apache.org/jira/browse/IMPALA-8929
 Project: IMPALA
  Issue Type: Sub-task
  Components: Docs
Reporter: Alex Rodoni
Assignee: Alex Rodoni








[jira] [Created] (IMPALA-8928) Add query option to only set the mem limit on executors

2019-09-06 Thread Bikramjeet Vig (Jira)
Bikramjeet Vig created IMPALA-8928:
--

 Summary: Add query option to only set the mem limit on executors
 Key: IMPALA-8928
 URL: https://issues.apache.org/jira/browse/IMPALA-8928
 Project: IMPALA
  Issue Type: Improvement
Affects Versions: Product Backlog
Reporter: Bikramjeet Vig
Assignee: Bikramjeet Vig








[jira] [Created] (IMPALA-8927) Improve HTTP auth error message

2019-09-06 Thread Thomas Tauber-Marshall (Jira)
Thomas Tauber-Marshall created IMPALA-8927:
--

 Summary: Improve HTTP auth error message
 Key: IMPALA-8927
 URL: https://issues.apache.org/jira/browse/IMPALA-8927
 Project: IMPALA
  Issue Type: Improvement
Reporter: Thomas Tauber-Marshall
Assignee: Thomas Tauber-Marshall


Currently, when a connection fails to authenticate to the HS2 HTTP server, we 
log an error message that just says "HTTP auth failed." It should be possible 
to include more information in this message to make it clearer why auth failed.

For example, this error is logged when SPNEGO auth is proceeding successfully 
but is not yet complete, which can be confusing to users.
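
One way to picture the improvement is a failure path that carries a reason 
instead of a fixed string. The sketch below is illustrative only and is not 
Impala code (the HS2 HTTP server lives in the C++ backend). The AuthOutcome 
enum, its members, and describe_auth_failure() are hypothetical names chosen 
for this example.

```python
from enum import Enum


class AuthOutcome(Enum):
    """Hypothetical outcomes of one HTTP auth attempt (not Impala's real states)."""
    SUCCESS = "success"
    SPNEGO_INCOMPLETE = "SPNEGO negotiation still in progress, more round trips needed"
    MISSING_HEADER = "no Authorization header in request"
    BAD_CREDENTIALS = "credentials rejected"


def describe_auth_failure(outcome, client):
    """Build a log line that says why auth failed, not just that it failed."""
    if outcome is AuthOutcome.SUCCESS:
        raise ValueError("not a failure")
    return "HTTP auth failed for {}: {}".format(client, outcome.value)


# "client-1" is a placeholder client identifier.
print(describe_auth_failure(AuthOutcome.SPNEGO_INCOMPLETE, "client-1"))
```

With a reason attached, an in-progress SPNEGO handshake can be told apart 
from a real rejection in the log.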





[jira] [Updated] (IMPALA-8902) TestResultSpooling.test_spilling is flaky

2019-09-06 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-8902:
-
Summary: TestResultSpooling.test_spilling is flaky  (was: 
TestResultSpooling,test_spilling is flaky)

> TestResultSpooling.test_spilling is flaky
> -
>
> Key: IMPALA-8902
> URL: https://issues.apache.org/jira/browse/IMPALA-8902
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.4.0
>Reporter: Attila Jeges
>Assignee: Sahil Takiar
>Priority: Critical
> Fix For: Impala 3.4.0
>
>
> Error: 
> {code:java}
> 17:45:10 FAIL 
> query_test/test_result_spooling.py::TestResultSpooling::()::test_spilling[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> parquet/none]
> 17:45:10 === FAILURES 
> ===
> 17:45:10  TestResultSpooling.test_spilling[protocol: beeswax | exec_option: 
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 
> 'exec_single_node_rows_threshold': 0} | table_format: parquet/none] 
> 17:45:10 [gw1] linux2 -- Python 2.7.5 
> /data/jenkins/workspace/impala-cdpd-master-core-asan/repos/Impala/bin/../infra/python/env/bin/python
> 17:45:10 query_test/test_result_spooling.py:104: in test_spilling
> 17:45:10 .format(query, timeout))
> 17:45:10 E   Timeout: Query select * from functional.alltypes order by id 
> limit 1500 did not spill spooled results within the timeout 10
> 17:45:10 - Captured stderr call 
> -
> 17:45:10 SET 
> client_identifier=query_test/test_result_spooling.py::TestResultSpooling::()::test_spilling[protocol:beeswax|exec_option:{'batch_size':0;'num_nodes':0;'disable_codegen_rows_threshold':0;'disable_codegen':False;'abort_on_error':1;'exec_single_node_rows_threshold':0}|table_f;
> 17:45:10 SET min_spillable_buffer_size=8192;
> 17:45:10 SET batch_size=0;
> 17:45:10 SET num_nodes=0;
> 17:45:10 SET disable_codegen_rows_threshold=0;
> 17:45:10 SET disable_codegen=False;
> 17:45:10 SET abort_on_error=1;
> 17:45:10 SET default_spillable_buffer_size=8192;
> 17:45:10 SET max_result_spooling_mem=32768;
> 17:45:10 SET exec_single_node_rows_threshold=0;
> 17:45:10 -- executing against localhost:21000
> 17:45:10 
> 17:45:10 select * from functional.alltypes order by id limit 1500;
> {code}






[jira] [Created] (IMPALA-8926) TestResultSpooling::_test_full_queue is flaky

2019-09-06 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8926:


 Summary: TestResultSpooling::_test_full_queue is flaky
 Key: IMPALA-8926
 URL: https://issues.apache.org/jira/browse/IMPALA-8926
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Sahil Takiar
Assignee: Sahil Takiar


This has happened a few times; the error message is:
{code:java}
query_test/test_result_spooling.py:116: in test_full_queue_large_fetch
    self._test_full_queue(vector, query, fetch_size=num_rows)
query_test/test_result_spooling.py:148: in _test_full_queue
    assert re.search(send_wait_time_regex, self.client.get_runtime_profile(handle)) \
E   assert None is not None
E    +  where None = ('RowBatchSendWaitTime: [1-9]', 'Query (id=e948cdd2bbde9430:082830be):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 0.000ns\n')
E    +    where  = re.search
E    +    and   'Query (id=e948cdd2bbde9430:082830be):\n  DEBUG MODE WARNING: Query profile created while running a DEBUG buil...: 0.000ns\n - WriteIoBytes: 0\n - WriteIoOps: 0 (0)\n - WriteIoWaitTime: 0.000ns\n' = >()
E    +      where > = .get_runtime_profile
E    +        where  = .client
{code}
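
The assertion is easier to follow with the regex check isolated. This is a 
minimal sketch, assuming the profile reported the counter as 0.000ns. The 
relevant part of the profile is elided in the output above, so the two sample 
profile strings here are made up for illustration.

```python
import re

# Regex taken from the assertion output above.
send_wait_time_regex = "RowBatchSendWaitTime: [1-9]"

# Made-up profile fragments; the real profile text is elided in the report.
zero_wait_profile = "   - RowBatchSendWaitTime: 0.000ns\n"
nonzero_wait_profile = "   - RowBatchSendWaitTime: 12.000ms\n"

# [1-9] requires a non-zero leading digit, so a zero wait time never matches
# and re.search returns None, which is what the failed assert reports.
zero_match = re.search(send_wait_time_regex, zero_wait_profile)
nonzero_match = re.search(send_wait_time_regex, nonzero_wait_profile)

print(zero_match is None, nonzero_match is not None)
```

Under that assumption, the test fails whenever the sender never had to wait 
on a full queue before the profile was fetched.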






[jira] [Updated] (IMPALA-8925) Consider replacing ClientRequestState ResultCache with result spooling

2019-09-06 Thread Sahil Takiar (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated IMPALA-8925:
-
Component/s: Clients

> Consider replacing ClientRequestState ResultCache with result spooling
> --
>
> Key: IMPALA-8925
> URL: https://issues.apache.org/jira/browse/IMPALA-8925
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Clients
>Reporter: Sahil Takiar
>Priority: Major
>
> The {{ClientRequestState}} maintains an internal results cache (which is 
> really just a {{QueryResultSet}}) in order to provide support for the 
> {{TFetchOrientation.FETCH_FIRST}} fetch orientation (used by Hue - see 
> [https://github.com/apache/impala/commit/6b769d011d2016a73483f63b311e108d17d9a083]).
> The cache itself has some limitations:
>  * It caches all results in a {{QueryResultSet}} with limited admission 
> control integration
>  * It has a max size; if the size is exceeded, the cache is emptied
>  * It cannot spill to disk
> Result spooling could potentially replace the query result cache and provide 
> a few benefits. It should be able to fit more rows since it can spill to 
> disk. Memory is also better tracked, since spooling integrates with both 
> admitted and reserved memory. Hue currently sets the max result set fetch 
> size to 
> [https://github.com/cloudera/hue/blob/master/apps/impala/src/impala/impala_flags.py#L61].
> It would be good to check how well that value works for Hue users so we can 
> decide whether replacing the current result cache with result spooling makes 
> sense.
> This would require some changes to result spooling as well. Currently it 
> discards rows whenever it reads them from the underlying 
> {{BufferedTupleStream}}. It would need the ability to reset the read cursor, 
> which would require some changes to the {{PlanRootSink}} interface as well.






[jira] [Created] (IMPALA-8925) Consider replacing ClientRequestState ResultCache with result spooling

2019-09-06 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8925:


 Summary: Consider replacing ClientRequestState ResultCache with 
result spooling
 Key: IMPALA-8925
 URL: https://issues.apache.org/jira/browse/IMPALA-8925
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Sahil Takiar


The {{ClientRequestState}} maintains an internal results cache (which is really 
just a {{QueryResultSet}}) in order to provide support for the 
{{TFetchOrientation.FETCH_FIRST}} fetch orientation (used by Hue - see 
[https://github.com/apache/impala/commit/6b769d011d2016a73483f63b311e108d17d9a083]).

The cache itself has some limitations:
 * It caches all results in a {{QueryResultSet}} with limited admission control 
integration
 * It has a max size; if the size is exceeded, the cache is emptied
 * It cannot spill to disk

Result spooling could potentially replace the query result cache and provide a 
few benefits. It should be able to fit more rows since it can spill to disk. 
Memory is also better tracked, since spooling integrates with both admitted and 
reserved memory. Hue currently sets the max result set fetch size to 
[https://github.com/cloudera/hue/blob/master/apps/impala/src/impala/impala_flags.py#L61].
It would be good to check how well that value works for Hue users so we can 
decide whether replacing the current result cache with result spooling makes sense.

This would require some changes to result spooling as well. Currently it 
discards rows whenever it reads them from the underlying 
{{BufferedTupleStream}}. It would need the ability to reset the read cursor, 
which would require some changes to the {{PlanRootSink}} interface as well.
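
The read-cursor change can be sketched in a few lines. This is a simplified 
in-memory model, not the BufferedTupleStream or PlanRootSink API: 
SpooledResults and its methods are hypothetical names, and spilling is left 
out entirely. It contrasts discard-on-read, which is what spooling does today 
according to the description, with keeping rows so that a FETCH_FIRST-style 
restart can rewind.

```python
class SpooledResults:
    """Toy result buffer; keep_rows=False models discard-on-read."""

    def __init__(self, rows, keep_rows):
        self._rows = list(rows)
        self._keep_rows = keep_rows
        self._cursor = 0  # index of the next row to return

    def fetch(self, max_rows):
        batch = self._rows[self._cursor:self._cursor + max_rows]
        if self._keep_rows:
            self._cursor += len(batch)
        else:
            # Discard rows as soon as they are read; the cursor stays at 0.
            del self._rows[:len(batch)]
        return batch

    def reset_read_cursor(self):
        """Rewind to the first row; only possible if rows were kept."""
        if not self._keep_rows:
            raise RuntimeError("rows were discarded on read; cannot rewind")
        self._cursor = 0


kept = SpooledResults(range(5), keep_rows=True)
first = kept.fetch(3)
kept.reset_read_cursor()
again = kept.fetch(3)
print(first == again)
```

The trade-off is that rows kept for a possible rewind continue to occupy 
memory or spill space until the query is closed.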






[jira] [Created] (IMPALA-8924) DCHECK(!closed_) in SpillableRowBatchQueue::IsEmpty

2019-09-06 Thread Sahil Takiar (Jira)
Sahil Takiar created IMPALA-8924:


 Summary: DCHECK(!closed_) in SpillableRowBatchQueue::IsEmpty
 Key: IMPALA-8924
 URL: https://issues.apache.org/jira/browse/IMPALA-8924
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Affects Versions: Impala 3.4.0
Reporter: Sahil Takiar
Assignee: Sahil Takiar


When running exhaustive tests with result spooling enabled, there are several 
impalad crashes with the following stack:
{code:java}
#0  0x7f5e797541f7 in raise () from /lib64/libc.so.6
#1  0x7f5e797558e8 in abort () from /lib64/libc.so.6
#2  0x04cc5834 in google::DumpStackTraceAndExit() ()
#3  0x04cbc28d in google::LogMessage::Fail() ()
#4  0x04cbdb32 in google::LogMessage::SendToLog() ()
#5  0x04cbbc67 in google::LogMessage::Flush() ()
#6  0x04cbf22e in google::LogMessageFatal::~LogMessageFatal() ()
#7  0x029a16cd in impala::SpillableRowBatchQueue::IsEmpty 
(this=0x13d504e0) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/spillable-row-batch-queue.cc:128
#8  0x025f5610 in impala::BufferedPlanRootSink::IsQueueEmpty 
(this=0x13943000) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/buffered-plan-root-sink.h:147
#9  0x025f4e81 in impala::BufferedPlanRootSink::GetNext 
(this=0x13943000, state=0x13d2a1c0, results=0x173c8520, num_results=-1, 
eos=0xd30cde1) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/exec/buffered-plan-root-sink.cc:158
#10 0x0294ef4d in impala::Coordinator::GetNext (this=0xe4ed180, 
results=0x173c8520, max_rows=-1, eos=0xd30cde1) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/runtime/coordinator.cc:683
#11 0x02251043 in impala::ClientRequestState::FetchRowsInternal 
(this=0xd30c800, max_rows=-1, fetched_rows=0x173c8520) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/client-request-state.cc:959
#12 0x022503e7 in impala::ClientRequestState::FetchRows 
(this=0xd30c800, max_rows=-1, fetched_rows=0x173c8520) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/client-request-state.cc:851
#13 0x0226a36d in impala::ImpalaServer::FetchInternal (this=0x12d14800, 
request_state=0xd30c800, start_over=false, fetch_size=-1, 
query_results=0x7f5daf861138) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-beeswax-server.cc:582
#14 0x02264970 in impala::ImpalaServer::fetch (this=0x12d14800, 
query_results=..., query_handle=..., start_over=false, fetch_size=-1) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/service/impala-beeswax-server.cc:188
#15 0x027caf09 in beeswax::BeeswaxServiceProcessor::process_fetch 
(this=0x12d6fc20, seqid=0, iprot=0x119f5780, oprot=0x119f56c0, 
callContext=0xdf92060) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/BeeswaxService.cpp:3398
#16 0x027c94e6 in beeswax::BeeswaxServiceProcessor::dispatchCall 
(this=0x12d6fc20, iprot=0x119f5780, oprot=0x119f56c0, fname=..., seqid=0, 
callContext=0xdf92060) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/BeeswaxService.cpp:3200
#17 0x02796f13 in impala::ImpalaServiceProcessor::dispatchCall 
(this=0x12d6fc20, iprot=0x119f5780, oprot=0x119f56c0, fname=..., seqid=0, 
callContext=0xdf92060) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/generated-sources/gen-cpp/ImpalaService.cpp:1824
#18 0x01b3cee4 in apache::thrift::TDispatchProcessor::process 
(this=0x12d6fc20, in=..., out=..., connectionContext=0xdf92060) at 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/thrift-0.9.3-p7/include/thrift/TDispatchProcessor.h:121
#19 0x01f9bf28 in apache::thrift::server::TAcceptQueueServer::Task::run 
(this=0xdf92000) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/rpc/TAcceptQueueServer.cpp:84
#20 0x01f9166d in impala::ThriftThread::RunRunnable (this=0x116ddfc0, 
runnable=..., promise=0x7f5db0862e90) at 
/data/jenkins/workspace/impala-private-parameterized/repos/Impala/be/src/rpc/thrift-thread.cc:74
#21 0x01f92d93 in boost::_mfi::mf2, 
impala::Promise*>::operator() 
(this=0x121e7800, p=0x116ddfc0, a1=..., a2=0x7f5db0862e90) at 
/data/jenkins/workspace/impala-private-parameterized/Impala-Toolchain/boost-1.57.0-p3/include/boost/bind/mem_fn_template.hpp:280
#22 0x01f92c29 in 
boost::_bi::list3, 
boost::_bi::value >, 
boost::_bi::value*> 
>::operator(), 
impala::Promise*>, boost::_bi::list0> 
(this=0x121e7810, f=..., a=...) at 

[jira] [Commented] (IMPALA-8508) Use Python 3 from toolchain for impala-python

2019-09-06 Thread Tim Armstrong (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-8508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16924412#comment-16924412
 ] 

Tim Armstrong commented on IMPALA-8508:
---

Here's a commit that adds it to the toolchain - 
https://gerrit.cloudera.org/#/c/14161/

> Use Python 3 from toolchain for impala-python
> -
>
> Key: IMPALA-8508
> URL: https://issues.apache.org/jira/browse/IMPALA-8508
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Attachments: 
> 0001-WIP-IMPALA-8508-download-Python-2.7-from-toolchain-i.patch
>
>
> We should standardise on a single python version to use for tests and other 
> infrastructure. Python 2.7 is going EOL soon.
> I started adding it to the toolchain - https://gerrit.cloudera.org/#/c/14161/






[jira] [Resolved] (IMPALA-8922) When startup independent impalad daemon that trys to open transport for localhost:24000

2019-09-06 Thread Tim Armstrong (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-8922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-8922.
---
Resolution: Won't Fix

The packaging and startup scripts are from Cloudera, not Apache Impala.

We generally recommend only deploying one Impala daemon per host anyway. You 
can deploy multiple daemons by manually configuring ports and it works, but 
resource management may not behave exactly as expected without additional 
tuning.

> When startup independent impalad daemon that trys to open transport for 
> localhost:24000
> ---
>
> Key: IMPALA-8922
> URL: https://issues.apache.org/jira/browse/IMPALA-8922
> Project: IMPALA
>  Issue Type: Bug
>Reporter: shaozhipeng
>Priority: Major
>
> I installed impala-server-3.2.0+cdh6.3.0-1279813.el7.x86_64 on a new server 
> node. (The other impala-server, impala-state and impala-catalog are installed 
> on another server node, slave3, and are running healthily.)
>  
> On the new server, the impalad daemon tries to open a transport to 
> localhost:24000 at startup.
>  
> The catalog and state store hosts are configured in the file 
> /etc/default/impala.
>  
> ps -ef|grep impala output:
> /usr/lib/impala/sbin/impalad -log_dir=/sumpay/cdh-impala/logs 
> -catalog_service_host=slave3 -state_store_port=24000 -use_statestore 
> -state_store_host=slave3 -be_port=22000 
> -kudu_master_hosts=slave1:7051,slave2:7051,slave3:7051





[jira] [Created] (IMPALA-8923) Don't need synchronized in HBaseTable.getEstimatedRowStats

2019-09-06 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-8923:
--

 Summary: Don't need synchronized in HBaseTable.getEstimatedRowStats
 Key: IMPALA-8923
 URL: https://issues.apache.org/jira/browse/IMPALA-8923
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Affects Versions: Impala 3.2.0, Impala 3.1.0, Impala 2.12.0, Impala 3.0, 
Impala 2.11.0, Impala 2.10.0, Impala 2.9.0, Impala 2.7.1, Impala 2.8.0, Impala 
2.7.0, Impala 3.3.0
Reporter: Quanlong Huang
Assignee: Quanlong Huang


HBaseTable.getEstimatedRowStats() estimates the number of rows and the row size 
by sampling the HBase table in the target key range. It requires HBase RPCs, so 
it can be slow.

Currently, HBaseTable.getEstimatedRowStats() is marked as synchronized. The 
purpose was to protect the HTable (old HBase API) object in legacy code (before 
commit 
[cf9d248|https://github.com/apache/impala/commit/cf9d2485dd4e6544f6f1f407e2ad0b43eba31874]).
However, since commit 
[cf9d248|https://github.com/apache/impala/commit/cf9d2485dd4e6544f6f1f407e2ad0b43eba31874],
we create an org.apache.hadoop.hbase.client.Table object for each task (see the 
comments and usages of FeHBaseTable.Util.getHBaseTable()), so the "synchronized" 
marker is no longer needed in HBaseTable.getEstimatedRowStats().

Keeping the "synchronized" marker is also harmful. In a high-QPS workload, 
queries on the same table wait to enter this method and spend a lot of time 
waiting (if the method is comparatively slow).

This can be reproduced by manually adding latency (e.g. 100ms) in 
FeHBaseTable.Util.getEstimatedRowStats() and running concurrent queries on the 
same HBase table. In my experiment, removing "synchronized" gave a 40% 
improvement in 95th-percentile query time.
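
The contention effect is easy to reproduce in miniature. The sketch below is 
Python rather than the Java frontend code, and nothing in it is Impala API: 
slow_row_stats() stands in for a method that performs a slow RPC, and 
method_lock plays the role of the "synchronized" marker. It records how many 
callers are inside the method at the same time.

```python
import threading
import time


def max_concurrency(use_lock, num_threads=4, latency_s=0.05):
    """Run num_threads callers and return the peak number inside the method."""
    method_lock = threading.Lock()   # plays the role of "synchronized"
    counter_lock = threading.Lock()  # protects the bookkeeping only
    state = {"inside": 0, "peak": 0}

    def slow_row_stats():
        with counter_lock:
            state["inside"] += 1
            state["peak"] = max(state["peak"], state["inside"])
        time.sleep(latency_s)  # stand-in for the HBase RPCs
        with counter_lock:
            state["inside"] -= 1

    def caller():
        if use_lock:
            with method_lock:
                slow_row_stats()
        else:
            slow_row_stats()

    threads = [threading.Thread(target=caller) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


print("peak callers with lock:", max_concurrency(use_lock=True))
print("peak callers without lock:", max_concurrency(use_lock=False))
```

With the lock, only one caller is ever inside the method, so the others queue 
and the total wait grows with the number of concurrent queries. Without it, 
the simulated RPCs overlap.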






[jira] [Assigned] (IMPALA-8498) Write column index for floating types when NaN is not present

2019-09-06 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-8498:
-

Assignee: Norbert Luksa  (was: Zoltán Borók-Nagy)

> Write column index for floating types when NaN is not present
> -
>
> Key: IMPALA-8498
> URL: https://issues.apache.org/jira/browse/IMPALA-8498
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Norbert Luksa
>Priority: Major
>  Labels: ramp-up
>
> IMPALA-7304 disabled column index writing for floating point columns until 
> PARQUET-1222 is resolved.
> PARQUET-1222 is responsible for defining a total order for floating values, 
> but the problematic values are only the NaNs. Therefore we can write the 
> column index if NaNs are not present in the data. Parquet-MR also does this, 
> following the principles in 
> [https://github.com/apache/parquet-format/blob/75eb7a7b84e6e62bfb09668b6d8d40b12597456e/src/main/thrift/parquet.thrift#L827-L834]
>  
> Impala should follow this behavior, and also when storing zeroes, it should 
> store -0.0 as minimum and +0.0 as maximum.
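
The rule quoted above can be sketched as follows. This is a minimal Python illustration, not Impala's or Parquet-MR's implementation; the function name float_min_max_for_index is hypothetical, and returning None for an empty page is an assumption made only for this sketch.

```python
import math
from typing import Iterable, Optional, Tuple


def float_min_max_for_index(values: Iterable[float]) -> Optional[Tuple[float, float]]:
    """Return (min, max) to store in the column index, or None to skip it.

    None is returned when any value is NaN, because NaN has no place in the
    ordering of the other values. Empty input also returns None (assumption).
    """
    lo = hi = None
    for v in values:
        if math.isnan(v):
            return None  # NaN present: do not write min/max for this page
        if lo is None or v < lo:
            lo = v
        if hi is None or v > hi:
            hi = v
    if lo is None:
        return None
    # -0.0 == 0.0 compares equal, so normalize the sign explicitly:
    # a zero minimum is stored as -0.0 and a zero maximum as +0.0.
    if lo == 0.0:
        lo = -0.0
    if hi == 0.0:
        hi = 0.0
    return lo, hi
```

Normalizing the zeroes this way keeps the stored range valid for readers that treat -0.0 and +0.0 as distinct values.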






[jira] [Assigned] (IMPALA-8498) Write column index for floating types when NaN is not present

2019-09-06 Thread Jira


 [ 
https://issues.apache.org/jira/browse/IMPALA-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy reassigned IMPALA-8498:
-

Assignee: Zoltán Borók-Nagy

> Write column index for floating types when NaN is not present
> -
>
> Key: IMPALA-8498
> URL: https://issues.apache.org/jira/browse/IMPALA-8498
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
>  Labels: ramp-up
>
> IMPALA-7304 disabled column index writing for floating point columns until 
> PARQUET-1222 is resolved.
> PARQUET-1222 is responsible for defining a total order for floating values, 
> but the problematic values are only the NaNs. Therefore we can write the 
> column index if NaNs are not present in the data. Parquet-MR also does this, 
> following the principles in 
> [https://github.com/apache/parquet-format/blob/75eb7a7b84e6e62bfb09668b6d8d40b12597456e/src/main/thrift/parquet.thrift#L827-L834]
>  
> Impala should follow this behavior, and also when storing zeroes, it should 
> store -0.0 as minimum and +0.0 as maximum.






[jira] [Created] (IMPALA-8922) When startup independent impalad daemon that trys to open transport for localhost:24000

2019-09-06 Thread shaozhipeng (Jira)
shaozhipeng created IMPALA-8922:
---

 Summary: When startup independent impalad daemon that trys to open 
transport for localhost:24000
 Key: IMPALA-8922
 URL: https://issues.apache.org/jira/browse/IMPALA-8922
 Project: IMPALA
  Issue Type: Bug
Reporter: shaozhipeng


I installed impala-server-3.2.0+cdh6.3.0-1279813.el7.x86_64 on a new server 
node. (Another impala-server, impala-state and impala-catalog are already 
installed on another server node, slave3, and are running healthily.)

On the new server, the impalad daemon tries to open a transport to 
localhost:24000 at startup.

The catalog and state store hosts are configured in the file /etc/default/impala.

Output of ps -ef|grep impala:

/usr/lib/impala/sbin/impalad -log_dir=/sumpay/cdh-impala/logs 
-catalog_service_host=slave3 -state_store_port=24000 -use_statestore 
-state_store_host=slave3 -be_port=22000 
-kudu_master_hosts=slave1:7051,slave2:7051,slave3:7051
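
To make the mismatch easier to see, the command line above can be split into its flags and the state store endpoint read back. This is only a sketch: parse_flags is a hypothetical helper, not part of Impala, and it assumes every flag has the form -name or -name=value as in the ps output.

```python
# Command line copied from the ps output above, joined onto one logical line.
CMDLINE = (
    "/usr/lib/impala/sbin/impalad -log_dir=/sumpay/cdh-impala/logs "
    "-catalog_service_host=slave3 -state_store_port=24000 -use_statestore "
    "-state_store_host=slave3 -be_port=22000 "
    "-kudu_master_hosts=slave1:7051,slave2:7051,slave3:7051"
)


def parse_flags(cmdline):
    """Map flag names to values; flags without '=' map to True."""
    flags = {}
    for token in cmdline.split()[1:]:  # skip the executable path
        name, sep, value = token.lstrip("-").partition("=")
        flags[name] = value if sep else True
    return flags


if __name__ == "__main__":
    flags = parse_flags(CMDLINE)
    print(flags["state_store_host"] + ":" + flags["state_store_port"])
```

The flags resolve to slave3:24000, while the daemon is reported to be trying localhost:24000.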





