[jira] [Resolved] (IMPALA-7946) SynchronousThreadPool::SynchronousOffer() can return a timeout Status with the wrong time limit
[ https://issues.apache.org/jira/browse/IMPALA-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joe McDonnell resolved IMPALA-7946.
---
    Resolution: Fixed
    Fix Version/s: Impala 3.2.0

> SynchronousThreadPool::SynchronousOffer() can return a timeout Status with the wrong time limit
> ---
>
>                 Key: IMPALA-7946
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7946
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Blocker
>              Labels: broken-build, flaky
>             Fix For: Impala 3.2.0
>
> A recent core build failed on custom_cluster/test_hdfs_timeout.py with this test output:
> {noformat}
> custom_cluster/test_hdfs_timeout.py:82: in test_hdfs_open_timeout
>     assert len(re.findall(error_pattern, str(ex))) > 0
> E   assert 0 > 0
> E    +  where 0 = len([])
> E    +    where [] = ('hdfsOpenFile\\(\\) for.*failed to finish before the 5 second timeout', 'ImpalaBeeswaxException:\n Query aborted:hdfsOpenFile() for hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=11/091101.txt failed to finish before the 4 second timeout\n\n')
> E    +      where = re.findall
> E    +      and 'ImpalaBeeswaxException:\n Query aborted:hdfsOpenFile() for hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=11/091101.txt failed to finish before the 4 second timeout\n\n' = str(ImpalaBeeswaxException())
> {noformat}
> When executing SynchronousOffer(), two different operations count towards the
> timeout. The first is submitting the task by calling Offer with the
> SynchronousWorkItem. The second is waiting for the task to complete by
> calling SynchronousWorkItem::Wait(). If the first part takes any measurable
> time, then SynchronousOffer() modifies the timeout that it passes into
> SynchronousWorkItem::Wait() so that the total timeout is respected. The
> enforcement of the new timeout is correct, but it results in an incorrect
> error message (in this case, showing 4 seconds rather than 5).
> This should pass in the original timeout and the current elapsed time. This
> would allow for correct enforcement with a correct error message.
> This issue is flaky.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
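The assertion failure quoted in the issue can be reproduced in isolation. The following is a minimal sketch that uses only the pattern and the exception text shown in the test output; the helper name `matches` is an assumption made for illustration and is not part of the test suite.

```python
import re

# Pattern and exception text copied from the quoted test output.
error_pattern = r"hdfsOpenFile\(\) for.*failed to finish before the 5 second timeout"
observed = (
    "ImpalaBeeswaxException:\n Query aborted:hdfsOpenFile() for "
    "hdfs://localhost:20500/test-warehouse/alltypes/year=2009/month=11/091101.txt "
    "failed to finish before the 4 second timeout\n\n"
)


def matches(message: str) -> bool:
    """Hypothetical helper mirroring the test's `len(re.findall(...)) > 0` check."""
    return len(re.findall(error_pattern, message)) > 0


# The message reports the remaining budget (4 s), so the 5 s pattern finds nothing.
print(matches(observed))
# With the original timeout in the message, the same pattern matches.
print(matches(observed.replace("4 second", "5 second")))
```

The enforcement itself is unaffected; only the number printed in the message decides whether the test passes.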
[jira] [Commented] (IMPALA-7946) SynchronousThreadPool::SynchronousOffer() can return a timeout Status with the wrong time limit
[ https://issues.apache.org/jira/browse/IMPALA-7946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726316#comment-16726316 ]

ASF subversion and git services commented on IMPALA-7946:
---

Commit 9a52dd67bad7b8eb84fdeb6fb193505af7af931e in impala's branch refs/heads/master from Joe McDonnell
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=9a52dd6 ]

IMPALA-7946: Use original timeout in THREAD_POOL_TASK_TIMED_OUT message

When SynchronousThreadPool::SynchronousOffer() times out, it can
sometimes print the wrong time out in the error message. This
happens because it is enforcing a total timeout across multiple
operations. For example, if there is a total timeout of 5 seconds
and the first step takes 1 second, the remaining step is given
a 4 second timeout to enforce the total timeout. However,
this 4 second timeout should not be expressed in the
THREAD_POOL_TASK_TIMED_OUT error message if the task times out.
Instead, SynchronousOffer() should always use the original
timeout as the internal time out is unimportant to users.

This changes the code to make the error message always use
the original timeout.

Change-Id: Ib7bc31f58a8d29abfdc24959dc2730a0ae24ec56
Reviewed-on: http://gerrit.cloudera.org:8080/12062
Reviewed-by: Joe McDonnell
Tested-by: Impala Public Jenkins

> SynchronousThreadPool::SynchronousOffer() can return a timeout Status with the wrong time limit
> ---
>
>                 Key: IMPALA-7946
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7946
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Blocker
>              Labels: broken-build, flaky

---
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
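The behaviour described in the commit message (enforce the remaining budget, report the original timeout) can be illustrated with a short sketch. This is not the Impala C++ implementation: `synchronous_offer`, `submit`, `wait` and `clock` are hypothetical names, and the message text is modelled on the error quoted in the issue.

```python
import time


def synchronous_offer(submit, wait, timeout_s, clock=time.monotonic):
    """Run submit(), then wait(remaining); report the ORIGINAL timeout on failure.

    submit: callable that enqueues the task (phase 1).
    wait:   callable taking a timeout in seconds, returning False on timeout (phase 2).
    """
    start = clock()
    submit()
    elapsed = clock() - start
    # Enforce the total budget: phase 2 only gets what phase 1 left over.
    remaining = max(0.0, timeout_s - elapsed)
    if not wait(remaining):
        # The internal 'remaining' value stays out of the user-facing message.
        raise TimeoutError(
            f"task failed to finish before the {timeout_s:g} second timeout"
        )


# Example from the commit message: 5 s total, 1 s spent submitting.
ticks = iter([0.0, 1.0])   # fake clock readings: before and after submit()
seen = []                  # timeouts actually handed to wait()


def fake_wait(timeout):
    seen.append(timeout)
    return False           # simulate the task not finishing in time


try:
    synchronous_offer(lambda: None, fake_wait, 5, clock=lambda: next(ticks))
    message = None
except TimeoutError as e:
    message = str(e)

print(seen, message)
```

Here the wait phase is still limited to 4 seconds, while the raised message names the 5 second limit the caller asked for.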
[jira] [Created] (IMPALA-8012) Log a message when --fe_service_threads have been allocated
Zoram Thanga created IMPALA-8012:
---

             Summary: Log a message when --fe_service_threads have been allocated
                 Key: IMPALA-8012
                 URL: https://issues.apache.org/jira/browse/IMPALA-8012
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend, Clients
            Reporter: Zoram Thanga

The maximum number of front end service threads that can be created at any
time to handle client connections is controlled by the "--fe_service_threads"
parameter. When all such threads have been allocated, new connection requests
get queued, and in theory can spend an indefinite amount of time in the queue.
Users perceive this as slow Impala connection setup time.

We should log a message when we're near or at --fe_service_threads threads to
make debugging this situation easier.

cc: [~kwho]
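The proposal to log when the thread count is near or at the limit could look roughly like the sketch below. It is a stand-alone illustration, not Impala code: the class `ServiceThreadTracker`, its method names, and the `warn_fraction` threshold are all assumptions introduced here.

```python
import logging
import math
import threading

log = logging.getLogger("fe_service_threads_sketch")


class ServiceThreadTracker:
    """Hypothetical counter that warns when in-use threads approach a limit."""

    def __init__(self, max_threads: int, warn_fraction: float = 0.9):
        self._max = max_threads
        # Assumed policy: start warning once this many threads are in use.
        self._warn_at = math.ceil(max_threads * warn_fraction)
        self._in_use = 0
        self._lock = threading.Lock()

    def on_thread_allocated(self) -> str:
        with self._lock:
            self._in_use += 1
            in_use = self._in_use
        if in_use >= self._max:
            log.warning("all %d service threads in use; new connections will queue",
                        self._max)
            return "at_limit"
        if in_use >= self._warn_at:
            log.warning("%d of %d service threads in use", in_use, self._max)
            return "near_limit"
        return "ok"

    def on_thread_released(self) -> None:
        with self._lock:
            self._in_use = max(0, self._in_use - 1)


tracker = ServiceThreadTracker(max_threads=4, warn_fraction=0.75)
states = [tracker.on_thread_allocated() for _ in range(4)]
print(states)
```

Returning the state alongside the log call keeps the sketch easy to check; a real server would more likely rate-limit the warning so a saturated pool does not flood the log.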
[jira] [Commented] (IMPALA-8012) Log a message when --fe_service_threads have been allocated
[ https://issues.apache.org/jira/browse/IMPALA-8012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726259#comment-16726259 ]

Tim Armstrong commented on IMPALA-8012:
---

I think this is the same as IMPALA-4327

> Log a message when --fe_service_threads have been allocated
> ---
>
>                 Key: IMPALA-8012
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8012
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend, Clients
>            Reporter: Zoram Thanga
>            Priority: Major
[jira] [Commented] (IMPALA-7992) test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs
[ https://issues.apache.org/jira/browse/IMPALA-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726229#comment-16726229 ]

ASF subversion and git services commented on IMPALA-7992:
---

Commit 5bf81cdc2797f986189aec4e78ebff2c2d1ed1b6 in impala's branch refs/heads/master from Bharath Vissapragada
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=5bf81cd ]

IMPALA-7992: Revert "Symbolize stacktraces in debug builds."

This reverts commit 40caf7898cae163d4f8c5b7831341bc55c3bcf35.

This commit is causing decimal test failures on certain exhaustive debug
builds deterministically (IMPALA-7992). With the revert I could confirm
the test passes fine. I'm reverting this while we fix the underlying issue.

test_width_bucket repeatedly dumps the following stack trace, which is
likely causing high CPU usage during symbolization.

I1219 19:26:12.834262 5201 status.cc:128] AnalysisException: Cannot resolve DECIMAL types of the width_bucket(DECIMAL(14,4), DECIMAL(21,13), DECIMAL(38,0), INT) function arguments. You need to wrap the arguments in a CAST.
    @ 0x1a36d56 impala::Status::Status()
    @ 0x215ef28 impala::JniUtil::GetJniExceptionMsg()
    @ 0x1ff0a01 impala::JniCall::Call<>()
    @ 0x1fedc5d impala::JniUtil::CallJniMethod<>()
    @ 0x1fec68c impala::Frontend::GetExecRequest()
    @ 0x2016a8a impala::ImpalaServer::ExecuteInternal()
    @ 0x201659e impala::ImpalaServer::Execute()
    @ 0x208c008 impala::ImpalaServer::query()
    @ 0x257e937 beeswax::BeeswaxServiceProcessor::process_query()
    @ 0x257e685 beeswax::BeeswaxServiceProcessor::dispatchCall()
    @ 0x2553e56 impala::ImpalaServiceProcessor::dispatchCall()
    @ 0x19e0165 apache::thrift::TDispatchProcessor::process()
    @ 0x1e28cb4 apache::thrift::server::TAcceptQueueServer::Task::run()
    @ 0x1e1fcc0 impala::ThriftThread::RunRunnable()
    @ 0x1e213e6 boost::_mfi::mf2<>::operator()()
    @ 0x1e2127c boost::_bi::list3<>::operator()<>()
    @ 0x1e20fc8 boost::_bi::bind_t<>::operator()()
    @ 0x1e20edb boost::detail::function::void_function_obj_invoker0<>::invoke()
    @ 0x1d448e1 boost::function0<>::operator()()
    @ 0x21ca604 impala::Thread::SuperviseThread()
    @ 0x21d2924 boost::_bi::list5<>::operator()<>()
    @ 0x21d2848 boost::_bi::bind_t<>::operator()()
    @ 0x21d280b boost::detail::thread_data<>::run()
    @ 0x36b8549 thread_proxy
    @ 0x3ff5207850 (unknown)
    @ 0x3ff4ee894c (unknown)

Change-Id: I3646c9cb1c4db9f5295ff7f80d73acca746d296f
Reviewed-on: http://gerrit.cloudera.org:8080/12115
Reviewed-by: Bharath Vissapragada
Reviewed-by: Tim Armstrong
Tested-by: Bharath Vissapragada

> test_decimal_fuzz.py/test_decimal_ops failing in exhaustive runs
> ---
>
>                 Key: IMPALA-7992
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7992
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: bharath v
>            Assignee: Csaba Ringhofer
>            Priority: Blocker
>              Labels: broken-build
>
> Error Message
> {noformat}
> query_test/test_decimal_fuzz.py:251: in test_decimal_ops
>     self.execute_one_decimal_op()
> query_test/test_decimal_fuzz.py:247: in execute_one_decimal_op
>     assert self.result_equals(expected_result, result)
> E   assert >(Decimal('-0.80'), None)
> E    +  where > = .result_equals
> {noformat}
> Stacktrace
> {noformat}
> query_test/test_decimal_fuzz.py:251: in test_decimal_ops
>     self.execute_one_decimal_op()
> query_test/test_decimal_fuzz.py:247: in execute_one_decimal_op
>     assert self.result_equals(expected_result, result)
> E   assert >(Decimal('-0.80'), None)
> E    +  where > = .result_equals
> {noformat}
> stderr
> {noformat}
> -- 2018-12-16 00:10:48,905 INFO MainThread: Started query aa4b44ad5b34c3fb:24d18385
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(-879550566.24 as decimal(11,2)) % cast(-100.000 as decimal(28,5));
> -- 2018-12-16 00:10:48,979 INFO MainThread: Started query b24acf22b1607dc6:4f287530
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(17179869.184 as decimal(19,7)) / cast(-87808593158000679814.7939232649738916 as decimal(38,17));
> -- 2018-12-16 00:10:49,054 INFO MainThread: Started query 38435f02022e590a:18f7e97
> SET decimal_v2=true;
> -- executing against localhost:21000
> select cast(99 as decimal(32,2)) - cast(-519203.671959101313 as decimal(18,12));
> -- 2018-12-16 00:10:49,132 INFO MainThread:
[jira] [Updated] (IMPALA-8011) Allow filtering on virtual column for file name
[ https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Ebert updated IMPALA-8011:
---
    Description: 
An additional performance enhancement would be the capability to filter on file names using a virtual column. This would be somewhat like the current optimization of sorting data and skipping files based on parquet metadata, but instead you put something in the file name to indicate it's contents should be filtered.

For example say you were writing first names and then searching for them, during your writing phase you put the first letter of the first name into your file name, so if I'm storing Alice, Bob, Cathy, my file name is "ABC" then when doing a query you could filter based on where INPUT__FILE__NAME contains "D" when searching for David and skip reading the file.

Another use would be if you had a daily partition, and you put the timestamp into the file name, then limit the search to only the last hour even though your partition is daily. This then gives you the ability to sort by another column making searches even faster on both.

This requires IMPALA-801

  was:
An additional performance enhancement would be the capability to filter on file names using a virtual column. This would be somewhat like the current optimization of sorting data and skipping files based on parquet metadata, but instead you put something in the file name to indicate it's contents should be filtered.

For example say you were writing first names and then searching for them, during your writing phase you put the first letter of the first name into your file name, so if I'm storing Alice, Bob, Cathy, my file name is "ABC" then when doing a query you could filter based on where INPUT__FILE__NAME contains "D" when searching for David and skip reading the file.

One use would be if you had a daily partition, and you put the timestamp into the file name, then limit the search to only the last hour even though your partition is daily. This then leaves you the ability to sort by another column making searches even faster on both.

This requires IMPALA-801

> Allow filtering on virtual column for file name
> ---
>
>                 Key: IMPALA-8011
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8011
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Peter Ebert
>            Priority: Major
>              Labels: built-in-function
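The first-name example in the description amounts to pruning files by name before reading them. A minimal sketch of that idea follows; it is plain Python rather than Impala SQL, and the function `files_to_scan` and the sample file names "DEF" and "XYZ" are assumptions added for illustration (only "ABC" comes from the description).

```python
def files_to_scan(file_names, first_name):
    """Keep only files whose name contains the first letter of first_name.

    Assumes the writer encoded the first letters of all stored names
    in each file name, as in the "ABC" example.
    """
    letter = first_name[0].upper()
    return [name for name in file_names if letter in name]


files = ["ABC", "DEF", "XYZ"]   # "ABC" holds Alice, Bob, Cathy
print(files_to_scan(files, "David"))
print(files_to_scan(files, "Alice"))
```

A search for David never opens "ABC", which is the skip the description is asking the engine to perform through a predicate on INPUT__FILE__NAME.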
[jira] [Updated] (IMPALA-8011) Allow filtering on virtual column for file name
[ https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Ebert updated IMPALA-8011:
---
    Description: 
An additional performance enhancement would be the capability to filter on file names using a virtual column. This would be somewhat like the current optimization of sorting data and skipping files based on parquet metadata, but instead you put something in the file name to indicate it's contents should be filtered.

For example say you were writing first names and then searching for them, during your writing phase you put the first letter of the first name into your file name, so if I'm storing Alice, Bob, Cathy, my file name is "ABC" then when doing a query you could filter based on where INPUT__FILE__NAME contains "D" when searching for David and skip reading the file.

One use would be if you had a daily partition, and you put the timestamp into the file name, then limit the search to only the last hour even though your partition is daily. This then leaves you the ability to sort by another column making searches even faster on both.

This requires IMPALA-801

  was:
An additional performance enhancement would be to be able to filter on file names using a virtual column. It would be somewhat the current optimization of sorting data and skipping files based on metadata, but instead you put something in the file name to indicate it's contents should be filtered.

For example say you were writing first names and then searching for them, during your writing phase you put the first letter of the first name into your file name, so if I'm storing Alice, Bob, Cathy, my file name is "ABC" then when doing a query you could filter based on where INPUT__FILE__NAME contains "D" when searching for David and skip reading the file.

One use would be if you had a daily partition, and you put the timestamp into the file name, then limit the search to only the last hour even though your partition is daily. This then leaves you the ability to sort by another column making searches even faster on both.

This requires IMPALA-801

> Allow filtering on virtual column for file name
> ---
>
>                 Key: IMPALA-8011
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8011
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Peter Ebert
>            Priority: Major
>              Labels: built-in-function
[jira] [Commented] (IMPALA-8011) Allow filtering on virtual column for file name
[ https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726171#comment-16726171 ]

Tim Armstrong commented on IMPALA-8011:
---

[~skye] had a prototype patch to add virtual columns like this a few years ago; the implementation idea was to treat it similarly to partition key columns and add it to the "template tuple" in the scanner.

> Allow filtering on virtual column for file name
> ---
>
>                 Key: IMPALA-8011
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8011
>             Project: IMPALA
>          Issue Type: Improvement
>            Reporter: Peter Ebert
>            Priority: Major
>              Labels: built-in-function
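The "template tuple" idea in the comment can be pictured as a per-file set of constant column values that is copied into every row produced from that file. The sketch below is only a Python stand-in under that reading; `scan_file` and the dictionary-based rows are assumptions and do not reflect the actual scanner code or the prototype patch.

```python
def scan_file(file_name, partition_keys, rows):
    """Yield rows with per-file constants filled in from a template.

    The template holds values that are identical for every row of one file:
    the partition key columns and, hypothetically, the file name column.
    """
    template = dict(partition_keys)
    template["INPUT__FILE__NAME"] = file_name
    for row in rows:
        out = dict(template)   # start from the per-file constants
        out.update(row)        # then add the values read from the file
        yield out


rows = list(scan_file("091101.txt", {"year": 2009, "month": 11},
                      [{"id": 1}, {"id": 2}]))
matching = [r for r in rows if "0911" in r["INPUT__FILE__NAME"]]
print(rows[0])
print(len(matching))
```

Because the file name is constant per file, a predicate on it could in principle be evaluated once against the template instead of once per row, which is what would make skipping whole files possible.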
[jira] [Commented] (IMPALA-7994) Queries hitting memory limit issues in release builds
[ https://issues.apache.org/jira/browse/IMPALA-7994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726150#comment-16726150 ]

Bikramjeet Vig commented on IMPALA-7994:
---

Keeping this open for a while to make sure it doesn't occur again after the recent commit

> Queries hitting memory limit issues in release builds
> ---
>
>                 Key: IMPALA-7994
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7994
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 3.2.0
>            Reporter: bharath v
>            Assignee: Bikramjeet Vig
>            Priority: Blocker
>              Labels: broken-build
>
> This usually causes multiple test failures, especially the ones running
> around the time memory is oversubscribed. The failures I noticed in one of
> the builds are:
> {noformat}
> query_test.test_queries.TestQueriesTextTables.test_random[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: text/none]    19 sec    1
> query_test.test_runtime_filters.TestRuntimeRowFilters.test_row_filters[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]    19 sec    1
> query_test.test_queries.TestHdfsQueries.test_file_partitions[protocol: beeswax | exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: avro/snap/block]    2.2 sec    1
> query_test.test_aggregation.TestDistinctAggregation.test_multiple_distinct[protocol: beeswax | exec_option: {'disable_codegen': False, 'shuffle_distinct_exprs': True} | table_format: seq/gzip/block]    1.8 sec    1
> query_test.test_queries.TestQueriesTextTables.test_values[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: text/none]    60 ms    1
> query_test.test_queries.TestHdfsQueries.test_file_partitions[protocol: beeswax | exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: rc/bzip/block]    7 ms    1
> query_test.test_aggregation.TestDistinctAggregation.test_multiple_distinct[protocol: beeswax | exec_option: {'disable_codegen': False, 'shuffle_distinct_exprs': True} | table_format: text/none]    60 ms    1
> query_test.test_queries.TestHdfsQueries.test_file_partitions[protocol: beeswax | exec_option: {'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': '100', 'batch_size': 0, 'num_nodes': 0} | table_format: rc/bzip/block]    7 ms    1
> query_test.test_runtime_filters.TestRuntimeRowFilters.test_row_filters[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: parquet/none]    76 ms    1
> query_test.test_queries.TestHdfsQueries.test_file_partitions[protocol: beeswax | exec_option: {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 'abort_on_error': 1, 'debug_action': None, 'exec_single_node_rows_threshold': 0} | table_format: text/lzo/block]    7 ms    1
> verifiers.test_verify_metrics.TestValidateMetrics.test_metrics_are_zero
> {noformat}
> Following is the mem-tracker dump from one of the failed queries.
> {noformat}
> Stacktrace
> query_test/test_queries.py:182: in test_random
>     self.run_test_case('QueryTest/random', vector)
> common/impala_test_suite.py:467: in run_test_case
>     result = self.__execute_query(target_impalad_client, query, user=user)
> common/impala_test_suite.py:688: in __execute_query
>     return impalad_client.execute(query, user=user)
> common/impala_connection.py:170: in execute
>     return self.__beeswax_client.execute(sql_stmt, user=user)
> beeswax/impala_beeswax.py:182: in execute
>     handle = self.__execute_query(query_string.strip(), user=user)
> beeswax/impala_beeswax.py:359: in __execute_query
>     self.wait_for_finished(handle)
> beeswax/impala_beeswax.py:380: in
[jira] [Closed] (IMPALA-6352) TestTableSample took too long in recent tests
[ https://issues.apache.org/jira/browse/IMPALA-6352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bikramjeet Vig closed IMPALA-6352. -- Resolution: Cannot Reproduce > TestTableSample took too long in recent tests > - > > Key: IMPALA-6352 > URL: https://issues.apache.org/jira/browse/IMPALA-6352 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 2.12.0 >Reporter: Vuk Ercegovac >Assignee: Bikramjeet Vig >Priority: Critical > Labels: broken-build > Attachments: impala6352pstacks.tar.gz > > > TestTableSample test took ~8 hours in recent (12/21) exhaustive rhel tests. > That caused the overall test to be aborted: > ... > 01:53:10 [gw2] PASSED > query_test/test_tablesample.py::TestTableSample::test_tablesample[repeatable: > True | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > seq/gzip/block] > 01:53:10 > query_test/test_tablesample.py::TestTableSample::test_tablesample[repeatable: > False | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > seq/gzip/block] > 10:03:51 [gw2] PASSED > query_test/test_tablesample.py::TestTableSample::test_tablesample[repeatable: > False | exec_option: {'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > seq/gzip/block] Build timed out (after 1,440 minutes). Marking the build as > aborted. > 10:03:51 Build was aborted > ... -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (IMPALA-7213) Port ReportExecStatus() RPCs to KRPC
[ https://issues.apache.org/jira/browse/IMPALA-7213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726144#comment-16726144 ] ASF subversion and git services commented on IMPALA-7213: - Commit db4bc0844015d03c87f195fd48838fa4c755f902 in impala's branch refs/heads/master from Michael Ho [ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=db4bc08 ] IMPALA-7213: Use separate network plane for DataStream and Control services This change is a follow up for a review comment in https://gerrit.cloudera.org/#/c/10855/ about separating the TCP connections of DataStream and Control services so that control commands don't get blocked behind large payloads being sent in the DataStream services. Testing done: exhaustive build Change-Id: I774f4a0e2cfedc4dba72cde4f5d28898cdbdc236 Reviewed-on: http://gerrit.cloudera.org:8080/12107 Reviewed-by: Michael Ho Tested-by: Impala Public Jenkins > Port ReportExecStatus() RPCs to KRPC > > > Key: IMPALA-7213 > URL: https://issues.apache.org/jira/browse/IMPALA-7213 > Project: IMPALA > Issue Type: Sub-task > Components: Distributed Exec >Affects Versions: Impala 3.0 >Reporter: Michael Ho >Assignee: Michael Ho >Priority: Major > Labels: impala-scalability-sprint-08-13-2018 > > This is a sub-task to track the porting of ReportExecStatus() to KRPC. This > should help reduce the number of connections to coordinator. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-8011) Allow filtering on virtual column for file name
[ https://issues.apache.org/jira/browse/IMPALA-8011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Rahn updated IMPALA-8011: -- Labels: built-in-function (was: ) > Allow filtering on virtual column for file name > --- > > Key: IMPALA-8011 > URL: https://issues.apache.org/jira/browse/IMPALA-8011 > Project: IMPALA > Issue Type: Improvement >Reporter: Peter Ebert >Priority: Major > Labels: built-in-function > > An additional performance enhancement would be the ability to filter on file > names using a virtual column. It would be somewhat like the current optimization > of sorting data and skipping files based on metadata, but instead you put > something in the file name to indicate whether its contents should be filtered. > For example, say you were writing first names and then searching for them. > During the writing phase you put the first letter of each first name into > the file name, so if I'm storing Alice, Bob, and Cathy, my file name is "ABC". > Then, when querying for David, you could filter on whether INPUT__FILE__NAME > contains "D" and skip reading the file. > One use would be if you had a daily partition and put the timestamp into > the file name; you could then limit the search to only the last hour even though your > partition is daily. This leaves you free to sort by another > column, making searches on both even faster. > > This requires IMPALA-801
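The file-name pruning idea in the description above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Impala implements or would implement it: `can_skip_file` and `files_to_scan` are hypothetical helpers, the paths are made up, and the convention that a file's base name lists the first letters of the names it stores is taken from the Alice/Bob/Cathy example.

```python
import os


def can_skip_file(path, first_name):
    """Return True if the file cannot contain first_name.

    Assumes (hypothetically) that the writer encoded the first letter of
    every stored name in the file's base name, e.g. "ABC" for Alice, Bob,
    Cathy.
    """
    letters = os.path.splitext(os.path.basename(path))[0].upper()
    return first_name[:1].upper() not in letters


def files_to_scan(paths, first_name):
    """Keep only the files whose name says they may contain first_name."""
    return [p for p in paths if not can_skip_file(p, first_name)]


# Hypothetical file layout: one file holding A/B/C names, one holding D/E/F.
paths = ["/warehouse/names/ABC", "/warehouse/names/DEF"]
print(files_to_scan(paths, "David"))
```

In a query engine the same predicate would be evaluated against the virtual file-name column before any file is opened, which is where the saving would come from.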
[jira] [Created] (IMPALA-8011) Allow filtering on virtual column for file name
Peter Ebert created IMPALA-8011: --- Summary: Allow filtering on virtual column for file name Key: IMPALA-8011 URL: https://issues.apache.org/jira/browse/IMPALA-8011 Project: IMPALA Issue Type: Improvement Reporter: Peter Ebert An additional performance enhancement would be the ability to filter on file names using a virtual column. It would be somewhat like the current optimization of sorting data and skipping files based on metadata, but instead you put something in the file name to indicate whether its contents should be filtered. For example, say you were writing first names and then searching for them. During the writing phase you put the first letter of each first name into the file name, so if I'm storing Alice, Bob, and Cathy, my file name is "ABC". Then, when querying for David, you could filter on whether INPUT__FILE__NAME contains "D" and skip reading the file. One use would be if you had a daily partition and put the timestamp into the file name; you could then limit the search to only the last hour even though your partition is daily. This leaves you free to sort by another column, making searches on both even faster. This requires IMPALA-801
[jira] [Commented] (IMPALA-8007) test_slow_subscriber is flaky
[ https://issues.apache.org/jira/browse/IMPALA-8007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16726000#comment-16726000 ] Tim Armstrong commented on IMPALA-8007: --- I think Pooja is AFK for a couple of weeks; if it recurs, we should probably find someone else to fix it. > test_slow_subscriber is flaky > - > > Key: IMPALA-8007 > URL: https://issues.apache.org/jira/browse/IMPALA-8007 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 3.2.0 >Reporter: bharath v >Assignee: Pooja Nilangekar >Priority: Major > Labels: broken-build, flaky > Fix For: Impala 3.2.0 > > > We have hit both asserts in the test. > *Exhaustive:* > {noformat} > statestore/test_statestore.py:574: in test_slow_subscriber assert > (secs_since_heartbeat < float(sleep_time + 1.0)) E assert > 8.8043 < 6.0 E+ where 6.0 = float((5 + 1.0)) > Stacktrace > statestore/test_statestore.py:574: in test_slow_subscriber > assert (secs_since_heartbeat < float(sleep_time + 1.0)) > E assert 8.8043 < 6.0 > E+ where 6.0 = float((5 + 1.0)) > {noformat} > *ASAN* > {noformat} > Error Message > statestore/test_statestore.py:573: in t assert (secs_since_heartbeat > > float(sleep_time - 1.0)) E assert 4.995 > 5.0 E+ where 5.0 = float((6 > - 1.0)) > Stacktrace > statestore/test_statestore.py:573: in test_slow_subscriber > assert (secs_since_heartbeat > float(sleep_time - 1.0)) > E assert 4.995 > 5.0 > E+ where 5.0 = float((6 - 1.0)) > {noformat} > I have only noticed this happen twice (the two instances above) since the patch was > committed. So it looks like a racy bug.
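The two failing asserts quoted above check the same one-second window from opposite sides. A minimal sketch, assuming the test accepts a heartbeat gap strictly within `tolerance` seconds of `sleep_time`; the helper name `within_window` is hypothetical and does not come from test_statestore.py:

```python
def within_window(secs_since_heartbeat, sleep_time, tolerance=1.0):
    """Mirror the two asserts: sleep_time - tol < secs < sleep_time + tol."""
    lower = float(sleep_time - tolerance)
    upper = float(sleep_time + tolerance)
    return lower < secs_since_heartbeat < upper


# Values reported in the two failures quoted above.
print(within_window(8.8043, 5))  # exhaustive run: above the upper bound of 6.0
print(within_window(4.995, 6))   # ASAN run: just below the lower bound of 5.0
```

Both calls fall outside the window, one on each side, which is consistent with a timing-dependent failure rather than a fixed offset.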
[jira] [Resolved] (IMPALA-376) Built-in functions for parsing JSON
[ https://issues.apache.org/jira/browse/IMPALA-376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-376. --- Resolution: Fixed Fix Version/s: Impala 3.1.0 > Built-in functions for parsing JSON > --- > > Key: IMPALA-376 > URL: https://issues.apache.org/jira/browse/IMPALA-376 > Project: IMPALA > Issue Type: New Feature > Components: Backend >Affects Versions: Product Backlog > Environment: All supported environments >Reporter: Zoltan Toth-Czifra >Assignee: Quanlong Huang >Priority: Minor > Labels: built-in-function > Fix For: Impala 3.1.0 > > > Hi, > Hive comes with some useful built-in UDFs to process JSON objects. > https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF > Namely: > - get_json_object > - json_tuple > To make Impala and Hive tables and queries more interchangeable, I am > proposing porting these UDFs to be part of Impala's built-in functions: > http://www.cloudera.com/content/cloudera-content/cloudera-docs/Impala/latest/Installing-and-Using-Impala/ciiu_functions.html > h4. Example > Consider the following table *raw_log* > ||action||parameters|| > |search|{"keyword":"hotel"}| > |visit|{"url":"http://example.com"}| > ...and the following query: > {code} > SELECT get_json_object(parameters, "$.keyword") AS keyword FROM raw_log > WHERE action='search'; > {code} > The query should return the following results: > ||keyword|| > |hotel|
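The behaviour requested in the example above can be mimicked in Python to make the expected result concrete. This is a simplified, hypothetical stand-in rather than the Hive or Impala implementation: it handles only a top-level `$.<key>` path, and returning `None` for malformed JSON or a missing key is an assumption made for the sketch.

```python
import json


def get_json_object(json_text, path):
    """Extract a top-level field addressed as "$.<key>".

    Simplified stand-in: returns None for malformed JSON, unsupported
    paths, or missing keys.
    """
    if not path.startswith("$."):
        return None
    try:
        obj = json.loads(json_text)
    except (TypeError, ValueError):
        return None
    if not isinstance(obj, dict):
        return None
    return obj.get(path[2:])


# The two rows of the raw_log table from the example.
raw_log = [
    ("search", '{"keyword":"hotel"}'),
    ("visit", '{"url":"http://example.com"}'),
]
keywords = [get_json_object(params, "$.keyword")
            for action, params in raw_log if action == "search"]
print(keywords)
```

Only the `search` row passes the filter, so the list holds the single keyword from that row, matching the result table in the issue.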