date:20181220

[jira] [Resolved] (IMPALA-7946) SynchronousThreadPool::SynchronousOffer() can return a timeout Status with the wrong time limit

2018-12-20 Thread Joe McDonnell (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joe McDonnell resolved IMPALA-7946.
---
   Resolution: Fixed
Fix Version/s: Impala 3.2.0

> SynchronousThreadPool::SynchronousOffer() can return a timeout Status with 
> the wrong time limit
> ---
>
> Key: IMPALA-7946
> URL: https://issues.apache.org/jira/browse/IMPALA-7946
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.2.0
> Reporter: Joe McDonnell
> Assignee: Joe McDonnell
> Priority: Blocker
> Labels: broken-build, flaky
> Fix For: Impala 3.2.0
>
>
> A recent core build failed on custom_cluster/test_hdfs_timeout.py with this 
> test output:
> {noformat}
> custom_cluster/test_hdfs_timeout.py:82: in test_hdfs_open_timeout
> assert len(re.findall(error_pattern, str(ex))) > 0
> E   assert 0 > 0
> E+  where 0 = len([])
> E+where [] = ('hdfsOpenFile\\(\\) 
> for.*failed to finish before the 5 second timeout', 
> 'ImpalaBeeswaxException:\n Query aborted:hdfsOpenFile() for 
> hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=11/091101.txt 
> failed to finish before the 4 second timeout\n\n')
> E+  where  = re.findall
> E+  and   'ImpalaBeeswaxException:\n Query aborted:hdfsOpenFile() for 
> hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=11/091101.txt 
> failed to finish before the 4 second timeout\n\n' = 
> str(ImpalaBeeswaxException()){noformat}
> When executing SynchronousOffer(), two different operations count towards the
> timeout. The first is submitting the task by calling Offer() with the
> SynchronousWorkItem. The second is waiting for the task to complete by
> calling SynchronousWorkItem::Wait(). If the first step takes any
> measurable time, then SynchronousOffer() reduces the timeout that it passes
> into SynchronousWorkItem::Wait() so that the total timeout is respected. The
> enforcement of the reduced timeout is correct, but it results in an incorrect
> error message (in this case, showing 4 seconds rather than 5).
> SynchronousOffer() should instead pass in the original timeout and the
> current elapsed time. This would allow for correct enforcement with a correct
> error message.
> This failure is flaky: it only occurs when the first step takes measurable time.
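
The mismatch can be reproduced outside the test harness. The Python sketch below reuses the pattern and message strings from the test output above; it assumes the line wraps introduced by the email are not part of the original strings. It shows that a pattern expecting a 5 second timeout finds nothing in a message that reports 4 seconds, and that the same pattern matches once the message carries the original limit.

```python
import re

# Strings copied from the pytest output above; the email line wraps are
# assumed not to be part of the original strings.
error_pattern = r"hdfsOpenFile\(\) for.*failed to finish before the 5 second timeout"
reported = (
    "ImpalaBeeswaxException:\n Query aborted:hdfsOpenFile() for "
    "hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=11/091101.txt "
    "failed to finish before the 4 second timeout\n\n"
)

# The message reports the reduced 4 second limit, so the pattern cannot match.
mismatches = re.findall(error_pattern, reported)

# If the message carried the original limit instead, the same pattern matches.
expected = reported.replace("4 second", "5 second")
matches = re.findall(error_pattern, expected)

print(len(mismatches), len(matches))
```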



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Commented] (IMPALA-7946) SynchronousThreadPool::SynchronousOffer() can return a timeout Status with the wrong time limit

2018-12-20 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726316#comment-16726316
 ] 

ASF subversion and git services commented on IMPALA-7946:
-

Commit 9a52dd67bad7b8eb84fdeb6fb193505af7af931e in impala's branch 
refs/heads/master from Joe McDonnell
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=9a52dd6 ]

IMPALA-7946: Use original timeout in THREAD_POOL_TASK_TIMED_OUT message

When SynchronousThreadPool::SynchronousOffer() times out, it can
sometimes print the wrong timeout in the error message. This happens
because it is enforcing a total timeout across multiple operations.
For example, if there is a total timeout of 5 seconds and the first
step takes 1 second, the remaining step is given a 4 second timeout
to enforce the total timeout. However, this 4 second timeout should
not be expressed in the THREAD_POOL_TASK_TIMED_OUT error message
if the task times out. Instead, SynchronousOffer() should always use
the original timeout, as the internal timeout is unimportant to users.

This changes the code to make the error message always use the
original timeout.

Change-Id: Ib7bc31f58a8d29abfdc24959dc2730a0ae24ec56
Reviewed-on: http://gerrit.cloudera.org:8080/12062
Reviewed-by: Joe McDonnell 
Tested-by: Impala Public Jenkins 
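
The behaviour described in this commit message can be illustrated with a small Python sketch. It is not the Impala C++ code: `synchronous_offer`, `TaskTimedOut`, and the queue-plus-event plumbing are hypothetical stand-ins for SynchronousOffer(), the THREAD_POOL_TASK_TIMED_OUT status, and SynchronousWorkItem. The only point it makes is that the wait step uses the remaining budget while the error message reports the original timeout.

```python
import queue
import threading
import time


class TaskTimedOut(Exception):
    """Hypothetical stand-in for a THREAD_POOL_TASK_TIMED_OUT status."""


def synchronous_offer(work_queue, task, timeout_s):
    """Submit a task and wait for it, enforcing one total timeout.

    Both steps draw from the same budget, but any timeout error reports
    the caller's original timeout_s, never the reduced internal value.
    """
    start = time.monotonic()
    done = threading.Event()

    # Step 1: submitting the task may itself block if the queue is full.
    try:
        work_queue.put((task, done), timeout=timeout_s)
    except queue.Full:
        raise TaskTimedOut(
            f"task failed to finish before the {timeout_s} second timeout")

    # Step 2: wait only for what is left of the total budget.
    remaining = max(0.0, timeout_s - (time.monotonic() - start))
    if not done.wait(remaining):
        # Report the original limit, not `remaining`.
        raise TaskTimedOut(
            f"task failed to finish before the {timeout_s} second timeout")
```

With a queue that no worker drains, a call such as `synchronous_offer(queue.Queue(maxsize=1), some_task, 5)` would raise an error mentioning 5 seconds regardless of how long the submission step took.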



[jira] [Created] (IMPALA-8012) Log a message when --fe_service_threads have been allocated

2018-12-20 Thread Zoram Thanga (JIRA)

Zoram Thanga created IMPALA-8012:


Summary: Log a message when --fe_service_threads have been allocated
Key: IMPALA-8012
URL: https://issues.apache.org/jira/browse/IMPALA-8012
Project: IMPALA
Issue Type: Improvement
Components: Backend, Clients
Reporter: Zoram Thanga


The maximum number of front-end service threads that can be created at any time
to handle client connections is controlled by the "--fe_service_threads"
parameter. When all such threads have been allocated, new connection requests
are queued and can, in theory, wait in the queue indefinitely. Users perceive
this as slow Impala connection setup.

We should log a message when we're near or at --fe_service_threads threads to
make debugging this situation easier.

cc: [~kwho] 
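
As an illustration of the proposal, here is a minimal Python sketch of a counter that logs when the number of busy service threads is near or at the configured limit. It is not Impala code: `ServiceThreadTracker`, its method names, and the 90 percent "near" threshold are all assumptions made for the example.

```python
import logging
import threading

log = logging.getLogger("fe_service_threads_sketch")


class ServiceThreadTracker:
    """Hypothetical tracker; not Impala's actual implementation."""

    def __init__(self, max_threads, near_percent=90):
        self.max_threads = max_threads
        # Integer arithmetic avoids floating-point rounding surprises.
        self.near_threshold = max_threads * near_percent // 100
        self.in_use = 0
        self._lock = threading.Lock()

    def on_connection_accepted(self):
        """Count one more busy thread; return 'ok', 'near', or 'exhausted'."""
        with self._lock:
            self.in_use = min(self.in_use + 1, self.max_threads)
            in_use = self.in_use
        if in_use >= self.max_threads:
            log.warning("All %d --fe_service_threads are in use; "
                        "new connections will queue", self.max_threads)
            return "exhausted"
        if in_use >= self.near_threshold:
            log.warning("%d of %d --fe_service_threads are in use",
                        in_use, self.max_threads)
            return "near"
        return "ok"

    def on_connection_closed(self):
        """Release one busy thread."""
        with self._lock:
            self.in_use = max(self.in_use - 1, 0)
```

A real implementation would probably rate-limit these messages so that a saturated server does not flood its log; the sketch omits that for brevity.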



[jira] [Commented] (IMPALA-8012) Log a message when --fe_service_threads have been allocated

2018-12-20 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726259#comment-16726259
 ] 

Tim Armstrong commented on IMPALA-8012:
---

I think this is the same as IMPALA-4327

> Log a message when --fe_service_threads have been allocated
> ---
>
> Key: IMPALA-8012
> URL: https://issues.apache.org/jira/browse/IMPALA-8012
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend, Clients
> Reporter: Zoram Thanga
> Priority: Major
>
> The maximum number of front-end service threads that can be created at any
> time to handle client connections is controlled by the "--fe_service_threads"
> parameter. When all such threads have been allocated, new connection requests
> are queued and can, in theory, wait in the queue indefinitely. Users perceive
> this as slow Impala connection setup.
> We should log a message when we're near or at --fe_service_threads threads to
> make debugging this situation easier.
> cc: [~kwho] 



[jira] [Commented] (IMPALA-7992) test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs

2018-12-20 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726229#comment-16726229
 ] 

ASF subversion and git services commented on IMPALA-7992:
-

Commit 5bf81cdc2797f986189aec4e78ebff2c2d1ed1b6 in impala's branch 
refs/heads/master from Bharath Vissapragada
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=5bf81cd ]

IMPALA-7992: Revert "Symbolize stacktraces in debug builds."

This reverts commit 40caf7898cae163d4f8c5b7831341bc55c3bcf35.

This commit is causing decimal test failures on certain exhaustive
debug builds deterministically (IMPALA-7992). With the revert I could
confirm the test passes fine. I'm reverting this while we fix the
underlying issue.

test_width_bucket repeatedly dumps the following stack trace, which is likely
causing high CPU usage during symbolization.

I1219 19:26:12.834262  5201 status.cc:128] AnalysisException: Cannot
resolve DECIMAL types of the width_bucket(DECIMAL(14,4), DECIMAL(21,13),
DECIMAL(38,0), INT) function arguments. You need to wrap the arguments
in a CAST.
@  0x1a36d56  impala::Status::Status()
@  0x215ef28  impala::JniUtil::GetJniExceptionMsg()
@  0x1ff0a01  impala::JniCall::Call<>()
@  0x1fedc5d  impala::JniUtil::CallJniMethod<>()
@  0x1fec68c  impala::Frontend::GetExecRequest()
@  0x2016a8a  impala::ImpalaServer::ExecuteInternal()
@  0x201659e  impala::ImpalaServer::Execute()
@  0x208c008  impala::ImpalaServer::query()
@  0x257e937  beeswax::BeeswaxServiceProcessor::process_query()
@  0x257e685  beeswax::BeeswaxServiceProcessor::dispatchCall()
@  0x2553e56  impala::ImpalaServiceProcessor::dispatchCall()
@  0x19e0165  apache::thrift::TDispatchProcessor::process()
@  0x1e28cb4  apache::thrift::server::TAcceptQueueServer::Task::run()
@  0x1e1fcc0  impala::ThriftThread::RunRunnable()
@  0x1e213e6  boost::_mfi::mf2<>::operator()()
@  0x1e2127c  boost::_bi::list3<>::operator()<>()
@  0x1e20fc8  boost::_bi::bind_t<>::operator()()
@  0x1e20edb  boost::detail::function::void_function_obj_invoker0<>::invoke()
@  0x1d448e1  boost::function0<>::operator()()
@  0x21ca604  impala::Thread::SuperviseThread()
@  0x21d2924  boost::_bi::list5<>::operator()<>()
@  0x21d2848  boost::_bi::bind_t<>::operator()()
@  0x21d280b  boost::detail::thread_data<>::run()
@  0x36b8549  thread_proxy
@   0x3ff5207850  (unknown)
@   0x3ff4ee894c  (unknown)

Change-Id: I3646c9cb1c4db9f5295ff7f80d73acca746d296f
Reviewed-on: http://gerrit.cloudera.org:8080/12115
Reviewed-by: Bharath Vissapragada 
Reviewed-by: Tim Armstrong 
Tested-by: Bharath Vissapragada 


> test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs
> 
>
> Key: IMPALA-7992
> URL: https://issues.apache.org/jira/browse/IMPALA-7992
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.2.0
> Reporter: bharath v
> Assignee: Csaba Ringhofer
> Priority: Blocker
> Labels: broken-build
>
> Error Message
> {noformat}
> query_test/test_decimal_fuzz.py:251: in test_decimal_ops 
> self.execute_one_decimal_op() query_test/test_decimal_fuzz.py:247: in 
> execute_one_decimal_op assert self.result_equals(expected_result, result) E 
> assert  >(Decimal('-0.80'), 
> None) E + where  > = 
> .result_equals
> {noformat}
> Stacktrace
> {noformat}
> query_test/test_decimal_fuzz.py:251: in test_decimal_ops 
> self.execute_one_decimal_op() query_test/test_decimal_fuzz.py:247: in 
> execute_one_decimal_op assert self.result_equals(expected_result, result) E 
> assert  >(Decimal('-0.80'), 
> None) E + where  > = 
> .result_equals
> {noformat}
> stderr
> {noformat}
> -- 2018-12-16 00:10:48,905 INFO MainThread: Started query 
> aa4b44ad5b34c3fb:24d18385
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(-879550566.24 as decimal(11,2)) % 
> cast(-100.000 as decimal(28,5));
> -- 2018-12-16 00:10:48,979 INFO MainThread: Started query 
> b24acf22b1607dc6:4f287530
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(17179869.184 as decimal(19,7)) / 
> cast(-87808593158000679814.7939232649738916 as decimal(38,17));
> -- 2018-12-16 00:10:49,054 INFO MainThread: Started query 
> 38435f02022e590a:18f7e97
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(99 as decimal(32,2)) - 
> cast(-519203.671959101313 as decimal(18,12));
> -- 2018-12-16 00:10:49,132 INFO MainThread:

[jira] [Updated] (IMPALA-8011) Allow filtering on virtual column for file name

2018-12-20 Thread Peter Ebert (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Ebert updated IMPALA-8011:

Description: 
An additional performance enhancement would be the capability to filter on file
names using a virtual column. This would be somewhat like the current
optimization of sorting data and skipping files based on Parquet metadata, but
instead you put something in the file name to indicate its contents should be
filtered.

For example, say you were writing first names and then searching for them.
During your writing phase you put the first letter of each first name into your
file name, so if I'm storing Alice, Bob, and Cathy, my file name is "ABC". Then,
when searching for David, a query could filter on whether INPUT__FILE__NAME
contains "D" and skip reading the file.

Another use would be if you had a daily partition and you put the timestamp
into the file name: you could then limit the search to only the last hour even
though your partition is daily. This then gives you the ability to sort by
another column, making searches on both even faster.

 

This requires IMPALA-801

  was:
An additional performance enhancement would be the capability to filter on file 
names using a virtual column.  This would be somewhat like the current 
optimization of sorting data and skipping files based on parquet metadata, but 
instead you put something in the file name to indicate it's contents should be 
filtered.

For example say you were writing first names and then searching for them, 
during your writing phase you put the first letter of the first name into your 
file name, so if I'm storing Alice, Bob, Cathy, my file name is "ABC" then when 
doing a query you could filter based on where INPUT__FILE__NAME contains "D" 
when searching for David and skip reading the file.

One use would be if you had a daily partition, and you put the timestamp into 
the file name, then limit the search to only the last hour even though your 
partition is daily. This then leaves you the ability to sort by another column 
making searches even faster on both.

 

This requires IMPALA-801
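
The first-letter example in this description can be sketched in a few lines of Python. This is only an illustration of the pruning idea, not Impala behaviour: `files_to_scan` is a hypothetical helper, and the file names are the toy names used in the description.

```python
def files_to_scan(file_names, first_name):
    """Keep only files whose name contains the first letter of first_name.

    Assumes the writer encoded the first letter of every stored name
    into the file name, as in the "ABC" example.
    """
    letter = first_name[0].upper()
    return [name for name in file_names if letter in name.upper()]


# A file named "ABC" holds Alice, Bob, and Cathy; "DEF" is a second toy file.
candidates = ["ABC", "DEF"]
print(files_to_scan(candidates, "David"))
print(files_to_scan(["ABC"], "David"))
```

Searching for David skips "ABC" entirely because its name contains no "D", which is the saving the request is after.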


> Allow filtering on virtual column for file name
> ---
>
> Key: IMPALA-8011
> URL: https://issues.apache.org/jira/browse/IMPALA-8011
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Peter Ebert
> Priority: Major
> Labels: built-in-function
>



[jira] [Updated] (IMPALA-8011) Allow filtering on virtual column for file name

2018-12-20 Thread Peter Ebert (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Ebert updated IMPALA-8011:

Description: 
An additional performance enhancement would be the capability to filter on file
names using a virtual column. This would be somewhat like the current
optimization of sorting data and skipping files based on Parquet metadata, but
instead you put something in the file name to indicate its contents should be
filtered.

For example, say you were writing first names and then searching for them.
During your writing phase you put the first letter of each first name into your
file name, so if I'm storing Alice, Bob, and Cathy, my file name is "ABC". Then,
when searching for David, a query could filter on whether INPUT__FILE__NAME
contains "D" and skip reading the file.

One use would be if you had a daily partition and you put the timestamp into
the file name: you could then limit the search to only the last hour even
though your partition is daily. This then leaves you the ability to sort by
another column, making searches on both even faster.

 

This requires IMPALA-801

  was:
An additional performance enhancement would be to be able to filter on file 
names using a virtual column.  It would be somewhat the current optimization of 
sorting data and skipping files based on metadata, but instead you put 
something in the file name to indicate it's contents should be filtered.

For example say you were writing first names and then searching for them, 
during your writing phase you put the first letter of the first name into your 
file name, so if I'm storing Alice, Bob, Cathy, my file name is "ABC" then when 
doing a query you could filter based on where INPUT__FILE__NAME contains "D" 
when searching for David and skip reading the file.

One use would be if you had a daily partition, and you put the timestamp into 
the file name, then limit the search to only the last hour even though your 
partition is daily. This then leaves you the ability to sort by another column 
making searches even faster on both.

 

This requires IMPALA-801
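
The daily-partition use case can be sketched the same way. The naming scheme below (a `YYYYMMDDTHHMM` stamp before the file extension) and the helper `files_in_last_hour` are assumptions made for illustration; the issue does not specify how the timestamp would be encoded.

```python
from datetime import datetime, timedelta


def files_in_last_hour(file_names, now):
    """Keep files whose name-encoded timestamp falls in the hour before now.

    Assumes names look like "20181220T1015.parq" (hypothetical scheme).
    """
    cutoff = now - timedelta(hours=1)
    keep = []
    for name in file_names:
        stamp = datetime.strptime(name.split(".")[0], "%Y%m%dT%H%M")
        if cutoff <= stamp <= now:
            keep.append(name)
    return keep


# All three files live in the same daily partition.
daily_partition = [
    "20181220T0930.parq",
    "20181220T1015.parq",
    "20181220T1059.parq",
]
recent = files_in_last_hour(daily_partition, datetime(2018, 12, 20, 11, 0))
print(recent)
```

Because the time restriction is carried by the file name, the data inside each file remains free to be sorted on a different column.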


> Allow filtering on virtual column for file name
> ---
>
> Key: IMPALA-8011
> URL: https://issues.apache.org/jira/browse/IMPALA-8011
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Peter Ebert
> Priority: Major
> Labels: built-in-function
>



[jira] [Commented] (IMPALA-8011) Allow filtering on virtual column for file name

2018-12-20 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726171#comment-16726171
 ] 

Tim Armstrong commented on IMPALA-8011:
---

[~skye] had a prototype patch to add virtual columns like this a few years ago.
The implementation idea was to treat them similarly to partition key columns and
add them to the "template tuple" in the scanner.
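
A rough way to picture the template-tuple idea in Python: values that are constant for a whole file (partition keys, and under this proposal the file name) are filled in once and then copied into every row produced from that file. This is a simplified model, not Impala's scanner code; `scan_file`, the dictionary representation, and the example path are assumptions.

```python
def scan_file(file_name, partition_values, rows):
    """Yield rows augmented with per-file constants.

    The template is built once per file and copied into each output row,
    so the virtual column costs no per-row computation.
    """
    template = dict(partition_values)
    template["INPUT__FILE__NAME"] = file_name
    for row in rows:
        out = dict(template)   # start from the per-file constants
        out.update(row)        # then add the columns read from the file
        yield out


# Hypothetical file path and rows, for illustration only.
result = list(scan_file(
    "/warehouse/t/year=2009/month=11/091101.txt",
    {"year": 2009, "month": 11},
    [{"id": 1}, {"id": 2}],
))
print(result[0]["INPUT__FILE__NAME"])
```

Since the file name is known before any row is read, a predicate on it could in principle be evaluated once per file, which is what would make skipping whole files possible.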

> Allow filtering on virtual column for file name
> ---
>
> Key: IMPALA-8011
> URL: https://issues.apache.org/jira/browse/IMPALA-8011
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Peter Ebert
> Priority: Major
> Labels: built-in-function
>
> An additional performance enhancement would be the ability to filter on file
> names using a virtual column. It would be somewhat like the current
> optimization of sorting data and skipping files based on metadata, but instead
> you put something in the file name to indicate its contents should be filtered.
> For example, say you were writing first names and then searching for them.
> During your writing phase you put the first letter of each first name into
> your file name, so if I'm storing Alice, Bob, and Cathy, my file name is "ABC".
> Then, when searching for David, a query could filter on whether
> INPUT__FILE__NAME contains "D" and skip reading the file.
> One use would be if you had a daily partition and you put the timestamp into
> the file name: you could then limit the search to only the last hour even
> though your partition is daily. This then leaves you the ability to sort by
> another column, making searches on both even faster.
>  
> This requires IMPALA-801



[jira] [Commented] (IMPALA-7994) Queries hitting memory limit issues in release builds

2018-12-20 Thread Bikramjeet Vig (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726150#comment-16726150
 ] 

Bikramjeet Vig commented on IMPALA-7994:


Keeping this open for a while to make sure it doesn't occur again after the
recent commit.

> Queries hitting memory limit issues in release builds
> -
>
> Key: IMPALA-7994
> URL: https://issues.apache.org/jira/browse/IMPALA-7994
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 3.2.0
> Reporter: bharath v
> Assignee: Bikramjeet Vig
> Priority: Blocker
> Labels: broken-build
>
> This usually causes multiple test failures, especially in tests running
> around the time memory is oversubscribed. The failures I noticed in one of
> the builds are:
> {noformat}
>  query_test.test_queries.TestQueriesTextTables.test_random[protocol: beeswax 
> | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: text/none]  19 sec  1
>  
> query_test.test_runtime_filters.TestRuntimeRowFilters.test_row_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: parquet/none]   19 sec  1
>  query_test.test_queries.TestHdfsQueries.test_file_partitions[protocol: 
> beeswax | exec_option: {'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | 
> table_format: avro/snap/block] 2.2 sec 1
>  
> query_test.test_aggregation.TestDistinctAggregation.test_multiple_distinct[protocol:
>  beeswax | exec_option: {'disable_codegen': False, 'shuffle_distinct_exprs': 
> True} | table_format: seq/gzip/block]   1.8 sec 1
>  query_test.test_queries.TestQueriesTextTables.test_values[protocol: beeswax 
> | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: text/none]   60 ms   1
>  query_test.test_queries.TestHdfsQueries.test_file_partitions[protocol: 
> beeswax | exec_option: {'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | 
> table_format: rc/bzip/block]   7 ms1
>  
> query_test.test_aggregation.TestDistinctAggregation.test_multiple_distinct[protocol:
>  beeswax | exec_option: {'disable_codegen': False, 'shuffle_distinct_exprs': 
> True} | table_format: text/none]60 ms   1
>  query_test.test_queries.TestHdfsQueries.test_file_partitions[protocol: 
> beeswax | exec_option: {'disable_codegen_rows_threshold': 0, 
> 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 
> 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | 
> table_format: rc/bzip/block]7 ms1
>  
> query_test.test_runtime_filters.TestRuntimeRowFilters.test_row_filters[protocol:
>  beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: parquet/none]  76 ms   1
>  query_test.test_queries.TestHdfsQueries.test_file_partitions[protocol: 
> beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 
> 0} | table_format: text/lzo/block]  7 ms1
>  verifiers.test_verify_metrics.TestValidateMetrics.test_metrics_are_zero
> {noformat}
> Following is the mem-tracker dump from one of the failed queries.
> {noformat}
> Stacktrace
> query_test/test_queries.py:182: in test_random
> self.run_test_case('QueryTest/random', vector)
> common/impala_test_suite.py:467: in run_test_case
> result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:688: in __execute_query
> return impalad_client.execute(query, user=user)
> common/impala_connection.py:170: in execute
> return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:182: in execute
> handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:359: in __execute_query
> self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:380: in

[jira] [Closed] (IMPALA-6352) TestTableSample took too long in recent tests

2018-12-20 Thread Bikramjeet Vig (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikramjeet Vig closed IMPALA-6352.
--
Resolution: Cannot Reproduce

> TestTableSample took too long in recent tests
> -
>
> Key: IMPALA-6352
> URL: https://issues.apache.org/jira/browse/IMPALA-6352
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.12.0
>Reporter: Vuk Ercegovac
>Assignee: Bikramjeet Vig
>Priority: Critical
>  Labels: broken-build
> Attachments: impala6352pstacks.tar.gz
>
>
> The TestTableSample test took ~8 hours in recent (12/21) exhaustive RHEL 
> tests, which caused the overall build to be aborted:
> ...
> 01:53:10 [gw2] PASSED 
> query_test/test_tablesample.py::TestTableSample::test_tablesample[repeatable: 
> True | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> seq/gzip/block] 
> 01:53:10 
> query_test/test_tablesample.py::TestTableSample::test_tablesample[repeatable: 
> False | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> seq/gzip/block] 
> 10:03:51 [gw2] PASSED 
> query_test/test_tablesample.py::TestTableSample::test_tablesample[repeatable: 
> False | exec_option: {'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> seq/gzip/block] Build timed out (after 1,440 minutes). Marking the build as 
> aborted.
> 10:03:51 Build was aborted
> ...
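
As a quick check of the "~8 hours" figure, the gap between the two log timestamps quoted above (01:53:10, when the test case started, and 10:03:51, when it was reported as PASSED) can be computed directly. This is only an illustrative sketch: the helper name `elapsed` is made up here, and it assumes both timestamps fall on the same day, which the log excerpt does not state explicitly.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"  # time-of-day format used in the quoted build log


def elapsed(start: str, end: str) -> timedelta:
    """Return end - start for two same-day HH:MM:SS timestamps (assumption)."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Timestamps copied from the log excerpt above.
test_duration = elapsed("01:53:10", "10:03:51")
print(test_duration)
```

Under that assumption the single test case accounts for a little over eight hours of the run.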





[jira] [Commented] (IMPALA-7213) Port ReportExecStatus() RPCs to KRPC

2018-12-20 Thread ASF subversion and git services (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726144#comment-16726144
 ] 

ASF subversion and git services commented on IMPALA-7213:
-

Commit db4bc0844015d03c87f195fd48838fa4c755f902 in impala's branch 
refs/heads/master from Michael Ho
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=db4bc08 ]

IMPALA-7213: Use separate network plane for DataStream and Control services

This change is a follow up for a review comment in
https://gerrit.cloudera.org/#/c/10855/ about
separating the TCP connections of DataStream and Control
services so that control commands don't get blocked behind
large payloads being sent in the DataStream services.

Testing done: exhaustive build

Change-Id: I774f4a0e2cfedc4dba72cde4f5d28898cdbdc236
Reviewed-on: http://gerrit.cloudera.org:8080/12107
Reviewed-by: Michael Ho 
Tested-by: Impala Public Jenkins 


> Port ReportExecStatus() RPCs to KRPC
> 
>
> Key: IMPALA-7213
> URL: https://issues.apache.org/jira/browse/IMPALA-7213
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Affects Versions: Impala 3.0
>Reporter: Michael Ho
>Assignee: Michael Ho
>Priority: Major
>  Labels: impala-scalability-sprint-08-13-2018
>
> This is a sub-task to track the porting of ReportExecStatus() to KRPC. This 
> should help reduce the number of connections to the coordinator.




[jira] [Updated] (IMPALA-8011) Allow filtering on virtual column for file name

2018-12-20 Thread Greg Rahn (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Rahn updated IMPALA-8011:
--
Labels: built-in-function  (was: )

> Allow filtering on virtual column for file name
> ---
>
> Key: IMPALA-8011
> URL: https://issues.apache.org/jira/browse/IMPALA-8011
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Peter Ebert
>Priority: Major
>  Labels: built-in-function
>
> An additional performance enhancement would be to be able to filter on file 
> names using a virtual column. It would be somewhat like the current 
> optimization of sorting data and skipping files based on metadata, but instead 
> you put something in the file name to indicate whether its contents should be 
> filtered.
> For example, say you were writing first names and then searching for them. 
> During your writing phase you put the first letter of each first name into 
> your file name, so if I'm storing Alice, Bob, and Cathy, my file name is "ABC". 
> Then, when searching for David, a query could filter on whether 
> INPUT__FILE__NAME contains "D" and skip reading the file.
> Another use would be if you had a daily partition and you put the timestamp 
> into the file name; you could then limit the search to only the last hour even 
> though your partition is daily. This leaves you free to sort by another 
> column, making searches even faster on both.
>  
> This requires IMPALA-801




[jira] [Created] (IMPALA-8011) Allow filtering on virtual column for file name

2018-12-20 Thread Peter Ebert (JIRA)

Peter Ebert created IMPALA-8011:
---

 Summary: Allow filtering on virtual column for file name
 Key: IMPALA-8011
 URL: https://issues.apache.org/jira/browse/IMPALA-8011
 Project: IMPALA
  Issue Type: Improvement
Reporter: Peter Ebert


An additional performance enhancement would be to be able to filter on file 
names using a virtual column. It would be somewhat like the current 
optimization of sorting data and skipping files based on metadata, but instead 
you put something in the file name to indicate whether its contents should be 
filtered.

For example, say you were writing first names and then searching for them. 
During your writing phase you put the first letter of each first name into your 
file name, so if I'm storing Alice, Bob, and Cathy, my file name is "ABC". Then, 
when searching for David, a query could filter on whether INPUT__FILE__NAME 
contains "D" and skip reading the file.

Another use would be if you had a daily partition and you put the timestamp 
into the file name; you could then limit the search to only the last hour even 
though your partition is daily. This leaves you free to sort by another column, 
making searches even faster on both.

 

This requires IMPALA-801
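
The first-letter example can be sketched outside of Impala to show the pruning idea. This is a hypothetical illustration only, not how Impala would implement the feature: the helper names `may_contain` and `files_to_scan` and the second file name "DEF" are invented here, and the sketch assumes the whole file name consists of the encoded first letters.

```python
def may_contain(file_name: str, first_name: str) -> bool:
    """Return True if a file whose name encodes first letters could hold first_name."""
    return first_name[0].upper() in file_name.upper()


def files_to_scan(file_names, first_name):
    """Keep only the files that cannot be ruled out by their name alone."""
    return [name for name in file_names if may_contain(name, first_name)]


# "ABC" comes from the description above; "DEF" is a made-up second file.
files = ["ABC", "DEF"]
print(files_to_scan(files, "David"))
```

A search for David never opens "ABC", which is the saving the proposal is after; the filter can only rule files out, so matching files still have to be read and filtered row by row.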





[jira] [Commented] (IMPALA-8007) test_slow_subscriber is flaky

2018-12-20 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726000#comment-16726000
 ] 

Tim Armstrong commented on IMPALA-8007:
---

I think Pooja is AFK for a couple of weeks; if it recurs, we should probably 
find someone else to fix it.

> test_slow_subscriber is flaky
> -
>
> Key: IMPALA-8007
> URL: https://issues.apache.org/jira/browse/IMPALA-8007
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.2.0
>Reporter: bharath v
>Assignee: Pooja Nilangekar
>Priority: Major
>  Labels: broken-build, flaky
> Fix For: Impala 3.2.0
>
>
> We have hit both asserts in the test.
> *Exhaustive:*
> {noformat}
> statestore/test_statestore.py:574: in test_slow_subscriber assert 
> (secs_since_heartbeat < float(sleep_time + 1.0)) E   assert 
> 8.8043 < 6.0 E+  where 6.0 = float((5 + 1.0))
> Stacktrace
> statestore/test_statestore.py:574: in test_slow_subscriber
> assert (secs_since_heartbeat < float(sleep_time + 1.0))
> E   assert 8.8043 < 6.0
> E+  where 6.0 = float((5 + 1.0))
> {noformat}
> *ASAN*
> {noformat}
> Error Message
> statestore/test_statestore.py:573: in t assert (secs_since_heartbeat > 
> float(sleep_time - 1.0)) E   assert 4.995 > 5.0 E+  where 5.0 = float((6 
> - 1.0))
> Stacktrace
> statestore/test_statestore.py:573: in test_slow_subscriber
> assert (secs_since_heartbeat > float(sleep_time - 1.0))
> E   assert 4.995 > 5.0
> E+  where 5.0 = float((6 - 1.0))
> {noformat}
> I have only noticed this happen twice (the two instances above) since the 
> patch was committed, so it looks like a race condition.
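
The two asserts quoted above amount to a tolerance window of one second on either side of sleep_time. The sketch below restates that check in isolation; `heartbeat_within_window` and its `tolerance` parameter are names invented for illustration and are not taken from test_statestore.py.

```python
def heartbeat_within_window(secs_since_heartbeat: float,
                            sleep_time: float,
                            tolerance: float = 1.0) -> bool:
    """Combine both asserts: sleep_time - tol < secs_since_heartbeat < sleep_time + tol."""
    lower = float(sleep_time - tolerance)
    upper = float(sleep_time + tolerance)
    return lower < secs_since_heartbeat < upper


# Values reported in the two failures above.
print(heartbeat_within_window(8.8043, 5))  # exhaustive run: above the upper bound
print(heartbeat_within_window(4.995, 6))   # ASAN run: just below the lower bound
```

Both reported values fall outside the window, the ASAN one by only 5 ms, which fits the description of a timing-sensitive test.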




[jira] [Resolved] (IMPALA-376) Built-in functions for parsing JSON

2018-12-20 Thread Quanlong Huang (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-376.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Built-in functions for parsing JSON
> ---
>
> Key: IMPALA-376
> URL: https://issues.apache.org/jira/browse/IMPALA-376
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Backend
>Affects Versions: Product Backlog
> Environment: All supported environments
>Reporter: Zoltan Toth-Czifra
>Assignee: Quanlong Huang
>Priority: Minor
>  Labels: built-in-function
> Fix For: Impala 3.1.0
>
>
> Hi,
> Hive comes with some useful built-in UDFs to process JSON objects.
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
> Namely:
> - get_json_object
> - json_tuple
> To make Impala and Hive tables and queries more interchangeable, I am 
> proposing porting these UDFs to be part of Impala's built-in functions:
> http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_functions.html
> h4. Example
> Consider the following table *raw_log*
> ||action||parameters||
> |search|{"keyword":"hotel"}|
> |visit|{"url":"http://example.com"}|
> ...and the following query:
> {code}
> SELECT get_json_object(parameters, "$.keyword") AS keyword FROM raw_log 
> WHERE action='search';
> {code}
> The query should return the following results:
> ||keyword||
> |hotel|
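
The expected result above can be reproduced with a small stand-in written in Python. This is a sketch for illustration only, not the Hive or Impala implementation: the function below borrows the name get_json_object but supports just top-level paths of the form "$.key", and the rows are the two from the example table.

```python
import json


def get_json_object(json_text: str, path: str):
    """Tiny stand-in that handles only top-level paths of the form '$.key'."""
    if not path.startswith("$."):
        raise ValueError("only '$.key' paths are supported in this sketch")
    try:
        obj = json.loads(json_text)
    except ValueError:
        return None  # malformed JSON yields no value in this sketch
    if not isinstance(obj, dict):
        return None
    return obj.get(path[2:])  # None when the key is absent


# The two rows of the example raw_log table: (action, parameters).
raw_log = [
    ("search", '{"keyword":"hotel"}'),
    ("visit", '{"url":"http://example.com"}'),
]

keywords = [get_json_object(params, "$.keyword")
            for action, params in raw_log
            if action == "search"]
print(keywords)
```

Only the search row passes the filter, so the single extracted keyword is hotel, matching the result table in the description.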




