[jira] [Created] (IMPALA-6431) Disable codegen if query kudu table with PK predicate

2018-01-19 Thread Juan Yu (JIRA)
Juan Yu created IMPALA-6431:
---

 Summary: Disable codegen if query kudu table with PK predicate
 Key: IMPALA-6431
 URL: https://issues.apache.org/jira/browse/IMPALA-6431
 Project: IMPALA
  Issue Type: Improvement
  Components: Frontend
Reporter: Juan Yu


In near real-time use cases, many Kudu queries are filtered by primary key, 
so each query only needs to process a small amount of data. In such 
scenarios, codegen can consume a large share of CPU relative to the query's 
total CPU usage. When running at high concurrency, this can easily saturate 
the CPU and hurt throughput. Disabling codegen would not only improve 
throughput but also reduce CPU usage.

Here are some test results:

Create table test(
...
PRIMARY KEY (pickup_datetime, medallion, hack_license, vendor_id)
)
PARTITION BY HASH (medallion, vendor_id) PARTITIONS 10
stored as KUDU;

Q1 - select count(*) from trip_data_kudu where pickup_datetime='2013-01-09 
20:33:00';

Q2 - select passenger_count, avg(trip_time_in_secs), count(1) from 
trip_data_kudu where pickup_datetime='2013-01-09 20:33:00' group by 1 order by 
2 desc;

Q3 - select * from trip_data_kudu where pickup_datetime='2013-01-09 20:33:00' 
and vendor_id='CMT';

 
||16 concurrency||QPS||CPU usage||
|Q1 (codegen enabled)|42|76%|
|Q1 (codegen disabled)|250|58%|
| | | |
|Q2 (codegen enabled)|7.7|85%|
|Q2 (codegen disabled)|78|65%|
| | | |
|Q3 (codegen enabled)|52|75%|
|Q3 (codegen disabled)|185|60%|

 

Note that since Impala doesn't have per-partition cardinality stats for Kudu 
tables, these queries cannot benefit from the DISABLE_CODEGEN_ROWS_THRESHOLD 
optimization.
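A quick computation over the table above confirms the scale of the win (QPS figures taken directly from the 16-concurrency results):

```python
# QPS at 16 concurrency: (codegen enabled, codegen disabled), from the table above.
results = {"Q1": (42, 250), "Q2": (7.7, 78), "Q3": (52, 185)}
for query in sorted(results):
    enabled_qps, disabled_qps = results[query]
    speedup = disabled_qps / enabled_qps
    # Every query gains at least ~3.5x throughput with codegen disabled,
    # while CPU usage also drops by 15-20 percentage points.
    assert speedup > 3.5
```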

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (IMPALA-6430) Log a detailed error message on failure of MetricVerifier

2018-01-19 Thread Bikramjeet Vig (JIRA)
Bikramjeet Vig created IMPALA-6430:
--

 Summary: Log a detailed error message on failure of MetricVerifier
 Key: IMPALA-6430
 URL: https://issues.apache.org/jira/browse/IMPALA-6430
 Project: IMPALA
  Issue Type: Sub-task
  Components: Backend
Affects Versions: Impala 2.12.0
Reporter: Bikramjeet Vig
 Fix For: Impala 2.12.0


Log the memz, metrics and query debug pages on failure of the 
"wait_for_metric_value()" method in the impala-service python client. This 
would help us understand the state of impalad during that failure.





[jira] [Resolved] (IMPALA-5052) Read and write signed integer logical type metadata in Parquet

2018-01-19 Thread Anuj Phadke (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anuj Phadke resolved IMPALA-5052.
-
   Resolution: Fixed
Fix Version/s: Impala 2.12.0

https://github.com/apache/impala/commit/38461c524f64cc367c483f5f958c64ffd014fcaa

> Read and write signed integer logical type metadata in Parquet
> --
>
> Key: IMPALA-5052
> URL: https://issues.apache.org/jira/browse/IMPALA-5052
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.7.0
>Reporter: Ian Buss
>Assignee: Anuj Phadke
>Priority: Minor
>  Labels: newbie, parquet, ramp-up, usability
> Fix For: Impala 2.12.0
>
>
> Some systems (e.g. Spark) write Parquet files with integral types using 
> logical types. Impala fails to handle these logical types when constructing a 
> table from an existing Parquet file. However, reading data from such files 
> works fine.
> For example, consider a file with the following Parquet schema:
> {code}
> [ec2-user@ip-172-31-61-61 ~]$ parquet-tools schema 
> part-r-0-a409eea5-3d4f-4172-b376-659005f65489.gz.parquet
> message spark_schema {
>   optional int32 id;
>   optional int32 tinyint_col (INT_8);
>   optional int32 smallint_col (INT_16);
>   optional int32 int_col;
>   optional int64 bigint_col;
> }
> {code}
> A CREATE TABLE ... LIKE PARQUET statement fails with something like the 
> following:
> {code}
> ERROR: AnalysisException: Unsupported logical parquet type INT_8 (primitive 
> type is INT32) for field tinyint_col
> {code}
> This functionality is handled by the {{convertLogicalParquetType}} method in 
> the {{com.cloudera.impala.analysis.CreateTableLikeFileStmt}} class, which 
> currently does not handle integer logical types.
> See 
> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#numeric-types
>  for information about the mapping between logical types and encodings.
> We should implement read and write support for this metadata, i.e. allow 
> correct round-tripping of tinyint and smallint types.
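A sketch of the mapping that {{convertLogicalParquetType}} would need (hypothetical helper, not Impala's actual code; the type pairs follow the parquet-format LogicalTypes document linked above):

```python
# (primitive type, logical type annotation) -> SQL type, per the
# parquet-format spec for signed integer logical types.
SIGNED_INT_LOGICAL_TYPES = {
    ("INT32", "INT_8"): "TINYINT",
    ("INT32", "INT_16"): "SMALLINT",
    ("INT32", "INT_32"): "INT",
    ("INT64", "INT_64"): "BIGINT",
}

def convert_logical_parquet_type(primitive, logical=None):
    """Resolve a Parquet column to a SQL type, honoring signed-int annotations."""
    if logical is None:
        return {"INT32": "INT", "INT64": "BIGINT"}.get(primitive)
    return SIGNED_INT_LOGICAL_TYPES.get((primitive, logical))

# The failing case from the error message above now resolves:
assert convert_logical_parquet_type("INT32", "INT_8") == "TINYINT"
assert convert_logical_parquet_type("INT64") == "BIGINT"
```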





[jira] [Resolved] (IMPALA-4993) Add support for dictionary filtering on nested fields

2018-01-19 Thread Vuk Ercegovac (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-4993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vuk Ercegovac resolved IMPALA-4993.
---
   Resolution: Fixed
Fix Version/s: Impala 2.12.0

> Add support for dictionary filtering on nested fields
> -
>
> Key: IMPALA-4993
> URL: https://issues.apache.org/jira/browse/IMPALA-4993
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.9.0
>Reporter: Joe McDonnell
>Assignee: Vuk Ercegovac
>Priority: Major
>  Labels: ramp-up
> Fix For: Impala 2.12.0
>
>
> Parquet dictionary filtering currently supports only non-nested data. It 
> would be useful to be able to filter on nested data. 
> Filtering can only happen if the value of the nested data is required to be 
> non-empty.





[jira] [Resolved] (IMPALA-6422) Compute stats tablesample spends a lot of time in powf()

2018-01-19 Thread Alexander Behm (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Behm resolved IMPALA-6422.

   Resolution: Fixed
Fix Version/s: Impala 2.1.2

commit 1dfdc6704b74c77d63accb69e9197fd203455be0
Author: Alex Behm 
Date:   Thu Jan 18 19:06:30 2018 -0800

IMPALA-6422: Use ldexp() instead of powf() in HLL.

Using ldexp() to compute a floating point power of two is
over 10x faster than powf().

This change is particularly helpful for speeding up
COMPUTE STATS TABLESAMPLE which has many calls to
HllFinalEstimate() where floating point power of two
computations are relevant.

Testing:
- core/hdfs run passed

Change-Id: I517614d3f9cf1cf56b15a173c3cfe76e0f2e0382
Reviewed-on: http://gerrit.cloudera.org:8080/9078
Reviewed-by: Alex Behm 
Tested-by: Impala Public Jenkins


> Compute stats tablesample spends a lot of time in powf()
> 
>
> Key: IMPALA-6422
> URL: https://issues.apache.org/jira/browse/IMPALA-6422
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.11.0
>Reporter: Alexander Behm
>Assignee: Alexander Behm
>Priority: Major
>  Labels: compute-stats, perfomance
> Fix For: Impala 2.1.2
>
>
> [~mmokhtar] did perf profiling for COMPUTE STATS TABLESAMPLE and discovered 
> that a lot of time is spent on finalizing HLL intermediates. Most time is 
> spent in powf().
> Relevant snippet from AggregateFunctions::HllFinalEstimate() in 
> aggregate-functions-ir.cc:
> {code}
>   for (int i = 0; i < num_buckets; ++i) {
> harmonic_mean += powf(2.0f, -buckets[i]);
> if (buckets[i] == 0) ++num_zero_registers;
>   }
> {code}
> Since we're computing a power of 2, using ldexp() should be much more efficient.
> I did a microbenchmark and found that ldexp() is >10x faster than powf() for 
> this scenario.
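The equivalence is easy to check; a Python sketch of the same substitution (mirroring, not reproducing, the C++ in aggregate-functions-ir.cc):

```python
import math

def hll_harmonic_sum(buckets):
    """Python analogue of the loop in HllFinalEstimate(): sums 2^(-bucket)
    per register. math.ldexp(1.0, -b) computes 1.0 * 2**(-b) directly from
    the float's exponent field instead of calling a general-purpose pow()."""
    return sum(math.ldexp(1.0, -b) for b in buckets)

# ldexp yields identical results to pow() for exact powers of two:
assert all(math.ldexp(1.0, -b) == 2.0 ** -b for b in range(64))
```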





[jira] [Created] (IMPALA-6429) Decimal V2 division returns an incorrect result

2018-01-19 Thread Taras Bobrovytsky (JIRA)
Taras Bobrovytsky created IMPALA-6429:
-

 Summary: Decimal V2 division returns an incorrect result
 Key: IMPALA-6429
 URL: https://issues.apache.org/jira/browse/IMPALA-6429
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 2.11.0
Reporter: Taras Bobrovytsky


This bug was found by the decimal fuzzer in one of our exhaustive nightly 
builds.

Query:
{code:java}
select cast(9269574547799442144750864826042582 as decimal(38,2)) / 
cast(-0.2475880078570760549798248447 as decimal(38,38))
{code}
Result:
{code:java}
-25071524537367060442348433289920.875272
{code}
The expected results should be around:
{code:java}
-37439513440208481766456680165938338.924
{code}
I computed the expected result in Python. Even without a calculator, it should 
be obvious that the first digit of the expected result should be higher than 2.
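The expected quotient can be reproduced independently with Python's decimal module (a verification sketch, not Impala's decimal implementation):

```python
from decimal import Decimal, getcontext

getcontext().prec = 40  # enough digits for the full quotient
a = Decimal("9269574547799442144750864826042582")
b = Decimal("-0.2475880078570760549798248447")
q = a / b
# The quotient is about -3.744e34, in line with the expected result above
# and roughly 1000x larger in magnitude than the value Impala returned.
assert Decimal("-3.8e34") < q < Decimal("-3.7e34")
```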





[jira] [Created] (IMPALA-6428) Fix ALTER for tblproperty num_tablet_replicas

2018-01-19 Thread Thomas Tauber-Marshall (JIRA)
Thomas Tauber-Marshall created IMPALA-6428:
--

 Summary: Fix ALTER for tblproperty num_tablet_replicas
 Key: IMPALA-6428
 URL: https://issues.apache.org/jira/browse/IMPALA-6428
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Affects Versions: Impala 2.12.0
Reporter: Thomas Tauber-Marshall


For Kudu tables created through Impala, we store a tblproperty called 
'kudu.num_tablet_replicas', indicating the replication factor for the table.

If this value is set in a create table, it is passed to Kudu and actually 
applied to the table. However, altering this property is allowed but has no 
effect on the underlying table, which could be confusing.

We should either disallow altering this property, or pass the new value 
down to Kudu (if Kudu allows changing the replication factor after table 
creation).





[jira] [Resolved] (IMPALA-6268) KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing

2018-01-19 Thread Sailesh Mukil (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sailesh Mukil resolved IMPALA-6268.
---
   Resolution: Fixed
Fix Version/s: Impala 2.12.0

Fix in:

https://github.com/apache/impala/commit/d8ae8801ae668f6ba4771c5794b80f7c9262cd65

> KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices failing
> --
>
> Key: IMPALA-6268
> URL: https://issues.apache.org/jira/browse/IMPALA-6268
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Affects Versions: Impala 2.11.0
>Reporter: Philip Zeyliger
>Assignee: Sailesh Mukil
>Priority: Critical
>  Labels: broken-build
> Fix For: Impala 2.12.0
>
>
> We're seeing failures of 
> {{KerberosOnAndOff/RpcMgrKerberizedTest.MultipleServices/1 (from 
> KerberosOnAndOff_RpcMgrKerberizedTest)}} in a variety of test configurations. 
> The failures appear to have started with commit {{IMPALA-5053: [SECURITY] 
> Make KRPC work with Kerberos}}.





[jira] [Created] (IMPALA-6427) Planner test expected output drops QUERYOPTIONS sections

2018-01-19 Thread Tim Armstrong (JIRA)
Tim Armstrong created IMPALA-6427:
-

 Summary: Planner test expected output drops QUERYOPTIONS sections
 Key: IMPALA-6427
 URL: https://issues.apache.org/jira/browse/IMPALA-6427
 Project: IMPALA
  Issue Type: Bug
  Components: Infrastructure
Affects Versions: Impala 2.11.0
Reporter: Tim Armstrong
Assignee: Tim Armstrong


The explain output in logs/fe_test/PlannerTest does not include the 
QUERYOPTIONS section, so it's not possible to replace the current expected 
output with the new expected output.





[jira] [Created] (IMPALA-6426) TestRuntimeRowFilters can fail due to differences in rows rejected

2018-01-19 Thread Joe McDonnell (JIRA)
Joe McDonnell created IMPALA-6426:
-

 Summary: TestRuntimeRowFilters can fail due to differences in rows 
rejected
 Key: IMPALA-6426
 URL: https://issues.apache.org/jira/browse/IMPALA-6426
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Affects Versions: Impala 2.12.0
Reporter: Joe McDonnell


Some recent test runs have seen failures in 
query_test/test_runtime_filters.py:TestRuntimeRowFilters. 

The test is looking for:
{code:java}
query_test/test_runtime_filters.py:168: in test_row_filters
    test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS' : str(WAIT_TIME_MS)})
common/impala_test_suite.py:441: in run_test_case
    verify_runtime_profile(test_section['RUNTIME_PROFILE'], result.runtime_profile)
common/test_result_verifier.py:560: in verify_runtime_profile
    actual))
E   AssertionError: Did not find matches for lines in runtime profile:
E   EXPECTED LINES:
E   row_regex: .*Rows rejected: 2.43K .*
{code}
The actual profile does show several locations of "Rows rejected" but not with 
that specific number:

 
{code:java}
E - Rows rejected: 2.33K (2332)
...
E - Rows rejected: 2.45K (2446)
...
E - Rows rejected: 2.15K (2150)
...
E - Rows rejected: 2.15K (2150)
{code}
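One possible way to make the check robust (a sketch only; the actual fix for the test may differ) is to match any count rather than the exact 2.43K:

```python
import re

# The rejected-row count varies from run to run ("2.33K (2332)", "2.15K (2150)",
# ...), so an exact "2.43K" row_regex is inherently flaky. A count-agnostic
# pattern still verifies that rows were rejected:
pattern = re.compile(r".*Rows rejected: \d+(\.\d+)?K? \(\d+\).*")
for line in ["- Rows rejected: 2.33K (2332)",
             "- Rows rejected: 2.45K (2446)",
             "- Rows rejected: 2.15K (2150)"]:
    assert pattern.match(line)
```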
 

 





[jira] [Resolved] (IMPALA-6382) Impalad crashes on SELECT query when spill buffer is set on certain values

2018-01-19 Thread Bikramjeet Vig (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikramjeet Vig resolved IMPALA-6382.

   Resolution: Fixed
Fix Version/s: Impala 2.12.0

[https://github.com/apache/impala/commit/028a83e6543a18dd3b9161226355f1e8d36c4ed7]

 

IMPALA-6382: Cap spillable buffer size and max row size query options

Currently the default and min spillable buffer size and max row size query
options accept any valid int64 value. Since the planner depends on these
values for memory estimations, setting a very large value close to the
int64 limit can overflow the variables representing or relying on these
estimates during different phases of query execution. This patch puts a
reasonable upper limit of 1TB on these query options to prevent such a
situation.

Testing: Added backend query option tests.

Change-Id: I36d3915f7019b13c3eb06f08bfdb38c71ec864f1
Reviewed-on: http://gerrit.cloudera.org:8080/9023
Reviewed-by: Bikramjeet Vig
Tested-by: Impala Public Jenkins

> Impalad crashes on SELECT query when spill buffer is set on certain values
> --
>
> Key: IMPALA-6382
> URL: https://issues.apache.org/jira/browse/IMPALA-6382
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 2.12.0
> Environment: Impala mini cluster
>Reporter: Xinran Tinney
>Assignee: Bikramjeet Vig
>Priority: Critical
>  Labels: crash
> Fix For: Impala 2.12.0
>
> Attachments: Nocrashwithdifferentbuffersize.png, 
> impalacrashesonbigbuffersize.png
>
>
> After starting the minicluster and running bin/impala-shell.sh, execute:
> set MIN_SPILLABLE_BUFFER_SIZE=4611686018427387904;
> set DEFAULT_SPILLABLE_BUFFER_SIZE=4611686018427387904;
> use tpch;
> select distinct l_comment from lineitem;
> Impalad then crashes.
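The repro value is exactly 2^62, which makes the overflow mechanism easy to see (an illustration of the failure mode, not Impala's actual planner arithmetic):

```python
INT64_MAX = 2**63 - 1
buffer_size = 4611686018427387904  # the value from the repro steps: 2**62
assert buffer_size == 2**62
# Summing the reservations for just two such buffers already exceeds int64:
assert 2 * buffer_size > INT64_MAX
# The patch's 1TB cap leaves wide headroom even for thousands of buffers:
ONE_TB = 1 << 40
assert 10_000 * ONE_TB < INT64_MAX
```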





[jira] [Resolved] (IMPALA-4886) Expose per table partition/files/blocks count in web UI

2018-01-19 Thread Dimitris Tsirogiannis (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-4886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dimitris Tsirogiannis resolved IMPALA-4886.
---
   Resolution: Fixed
Fix Version/s: Impala 3.0

Change-Id: I37d407979e6d3b1a444b6b6265900b148facde9e
Reviewed-on: http://gerrit.cloudera.org:8080/8529
Reviewed-by: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/catalog/catalog-server.cc
M be/src/catalog/catalog-server.h
M be/src/catalog/catalog.cc
M be/src/catalog/catalog.h
M common/thrift/CatalogObjects.thrift
M common/thrift/Frontend.thrift
M common/thrift/JniCatalog.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/catalog/Catalog.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
A fe/src/main/java/org/apache/impala/catalog/CatalogUsageMonitor.java
M fe/src/main/java/org/apache/impala/catalog/HBaseTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/common/Metrics.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
A fe/src/main/java/org/apache/impala/util/TopNCache.java
A fe/src/test/java/org/apache/impala/util/TestTopNCache.java
M tests/webserver/test_web_pages.py
M www/catalog.tmpl
A www/table_metrics.tmpl
24 files changed, 1,206 insertions(+), 113 deletions(-)

> Expose per table partition/files/blocks count in web UI
> ---
>
> Key: IMPALA-4886
> URL: https://issues.apache.org/jira/browse/IMPALA-4886
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Catalog
>Affects Versions: Impala 2.8.0
>Reporter: Dimitris Tsirogiannis
>Assignee: Dimitris Tsirogiannis
>Priority: Major
>  Labels: usability
> Fix For: Impala 3.0
>
>






[jira] [Created] (IMPALA-6425) Change Mempool memory allocation size to be <1MB to avoid allocating from CentralFreeList

2018-01-19 Thread Mostafa Mokhtar (JIRA)
Mostafa Mokhtar created IMPALA-6425:
---

 Summary: Change Mempool memory allocation size to be <1MB to avoid 
allocating from CentralFreeList
 Key: IMPALA-6425
 URL: https://issues.apache.org/jira/browse/IMPALA-6425
 Project: IMPALA
  Issue Type: Improvement
Reporter: Mostafa Mokhtar
Assignee: Tim Armstrong


While [~tlipcon] was investigating KRPC contention, he noticed that 
MemPool::Allocate is doing 1MB allocations, which is somewhat of an 
anti-pattern with tcmalloc. 
During the tests MemPool was doing several thousand 1MB allocs per second, 
and each of those has to do a full scan of the tcmalloc span linked list, 
which is very slow and only gets slower. A 1040384-byte allocation, on the 
other hand, is constant time.

It is not clear whether a power-of-2 allocation size would help; it is worth 
experimenting with both 512KB and 1040384 bytes. 
{code}
/// The maximum size of chunk that should be allocated. Allocations larger
/// than this size will get their own individual chunk.
static const int MAX_CHUNK_SIZE = 8192 * 127;
{code}
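The suggested 1040384-byte cap sits just below the 1MB boundary cited in the issue (a quick arithmetic check; that the slow path starts at 1MB is an assumption taken from the report above):

```python
MAX_CHUNK_SIZE = 8192 * 127  # the proposed cap from the snippet above
ONE_MB = 1 << 20
assert MAX_CHUNK_SIZE == 1040384
assert MAX_CHUNK_SIZE < ONE_MB           # stays under the 1MB slow-path boundary
assert ONE_MB - MAX_CHUNK_SIZE == 8192   # exactly one 8KB page short of 1MB
```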

 

{code}

#0  0x02097407 in base::internal::SpinLockDelay(int volatile*, int, 
int) ()
#1  0x020e2049 in SpinLock::SlowLock() ()
#2  0x02124348 in tcmalloc::CentralFreeList::Populate() ()
#3  0x02124458 in tcmalloc::CentralFreeList::FetchFromOneSpansSafe(int, 
void**, void**) ()
#4  0x021244e8 in tcmalloc::CentralFreeList::RemoveRange(void**, 
void**, int) ()
#5  0x02131ee5 in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned 
int, int, void* (*)(unsigned long)) ()
#6  0x00b2879a in impala::MemPool::FindChunk(long, bool) ()
#7  0x00b364f6 in impala::MemPool::Allocate(long) ()
#8  0x00b36674 in impala::FreePool::Allocate(long) ()
#9  0x00b353db in impala::RowBatch::Deserialize(kudu::Slice const&, 
kudu::Slice const&, long, bool, impala::FreePool*) ()
#10 0x00b35795 in impala::RowBatch::RowBatch(impala::RowDescriptor 
const*, impala::RowBatchHeaderPB const&, kudu::Slice const&, kudu::Slice 
const&, impala::FreePool*) ()
#11 0x00b1644f in 
impala::KrpcDataStreamRecvr::SenderQueue::AddBatchWork(long, 
impala::RowBatchHeaderPB const&, kudu::Slice const&, kudu::Slice const&, 
boost::unique_lock*) ()
#12 0x00b19135 in 
impala::KrpcDataStreamRecvr::SenderQueue::AddBatch(impala::TransmitDataRequestPB
 const*, kudu::rpc::RpcContext*) ()
#13 0x00b0ee30 in 
impala::KrpcDataStreamMgr::AddData(impala::TransmitDataRequestPB const*, 
kudu::rpc::RpcContext*) ()
#14 0x01187035 in 
kudu::rpc::GeneratedServiceIf::Handle(kudu::rpc::InboundCall*) ()
#15 0x011bc1cd in impala::ImpalaServicePool::RunThread(long) ()

{code}

 

Also, it appears that the thread above was a victim of the thread below; 
still, allocations under 1MB will make MemPool::Allocate contend less on 
the CentralFreeList lock. 

{code}

#0 0x003173ae5407 in madvise () from /lib64/libc.so.6
#1 0x02131cca in TCMalloc_SystemRelease(void*, unsigned long) ()
#2 0x0212f26a in tcmalloc::PageHeap::DecommitSpan(tcmalloc::Span*) ()
#3 0x0212f505 in tcmalloc::PageHeap::MergeIntoFreeList(tcmalloc::Span*) 
()
#4 0x0212f864 in tcmalloc::PageHeap::Delete(tcmalloc::Span*) ()
#5 0x02123cf7 in tcmalloc::CentralFreeList::ReleaseToSpans(void*) ()
#6 0x02123d9b in tcmalloc::CentralFreeList::ReleaseListToSpans(void*) ()
#7 0x02124067 in tcmalloc::CentralFreeList::InsertRange(void*, void*, 
int) ()
#8 0x021320a4 in 
tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, 
unsigned int, int) ()
#9 0x02132575 in 
tcmalloc::ThreadCache::ListTooLong(tcmalloc::ThreadCache::FreeList*, unsigned 
int) ()
#10 0x00b276e0 in impala::MemPool::FreeAll() ()
#11 0x00b34655 in impala::RowBatch::Reset() ()
#12 0x00fe882f in 
impala::PartitionedAggregationNode::GetRowsStreaming(impala::RuntimeState*, 
impala::RowBatch*) ()
#13 0x00fe9771 in 
impala::PartitionedAggregationNode::GetNext(impala::RuntimeState*, 
impala::RowBatch*, bool*) ()
#14 0x00b78352 in impala::FragmentInstanceState::ExecInternal() ()
#15 0x00b7adc2 in impala::FragmentInstanceState::Exec() ()
#16 0x00b6a0da in 
impala::QueryState::ExecFInstance(impala::FragmentInstanceState*) ()

{code}





[jira] [Resolved] (IMPALA-6368) test_chars.py races with itself when run in parallel

2018-01-19 Thread Tim Armstrong (JIRA)

 [ 
https://issues.apache.org/jira/browse/IMPALA-6368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-6368.
---
   Resolution: Fixed
Fix Version/s: Impala 2.12.0


IMPALA-6368: make test_chars parallel

Previously it had to be executed serially because it modified tables in
the functional database.

This change separates out tests that use temporary tables and runs those
in a unique_database.

Testing:
Ran locally in a loop with parallelism of 4 for a while.

Change-Id: I2f62ede90f619b8cebbb1276bab903e7555d9744
Reviewed-on: http://gerrit.cloudera.org:8080/9022
Reviewed-by: Tim Armstrong 
Tested-by: Impala Public Jenkins

> test_chars.py races with itself when run in parallel
> 
>
> Key: IMPALA-6368
> URL: https://issues.apache.org/jira/browse/IMPALA-6368
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.11.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
> Fix For: Impala 2.12.0
>
>
> There are a handful of test tables like test_char_tmp that can be inserted 
> into from multiple tests in parallel.
> {noformat}
>  TestStringQueries.test_varchar[exec_option: {'batch_size': 0, 'num_nodes': 
> 0, 'disable_codegen_rows_threshold': 0, 'disable_codegen': True, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> text/none] 
> [gw3] linux2 -- Python 2.7.12 
> /home/tarmstrong/Impala/incubator-impala/bin/../infra/python/env/bin/python
> tests/query_test/test_chars.py:82: in test_varchar
> self.run_test_case('QueryTest/chars', vector)
> tests/common/impala_test_suite.py:424: in run_test_case
> self.__verify_results_and_errors(vector, test_section, result, use_db)
> tests/common/impala_test_suite.py:297: in __verify_results_and_errors
> replace_filenames_with_placeholder)
> tests/common/test_result_verifier.py:404: in verify_raw_results
> VERIFIER_MAP[verifier](expected, actual)
> tests/common/test_result_verifier.py:231: in verify_query_result_is_equal
> assert expected_results == actual_results
> E   assert Comparing QueryTestResults (expected vs actual):
> E 'hello' == 'hello'
> E None != 'hello'
> E Number of rows returned (expected vs actual): 1 != 2
> {noformat}





[jira] [Created] (IMPALA-6424) invalidate metadata loads file metadata twice

2018-01-19 Thread Juan Yu (JIRA)
Juan Yu created IMPALA-6424:
---

 Summary: invalidate metadata loads file metadata twice
 Key: IMPALA-6424
 URL: https://issues.apache.org/jira/browse/IMPALA-6424
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Juan Yu


Invalidate metadata loads file metadata twice: it loads everything, then 
follows up with a REFRESH of the table. The refresh seems redundant.

 

I0119 07:46:41.107390 26758 CatalogServiceCatalog.java:1518] Invalidating table 
metadata: s3.catalog_sales
 I0119 07:46:43.002053 26309 catalog-server.cc:331] Publishing update : 
TABLE:s3.catalog_sales@1166
 I0119 07:46:43.002068 26309 catalog-server.cc:331] Publishing update : 
CATALOG:b0f520a5e2ab4056:b7e2e045fa39d625@1166

I0119 07:46:46.696725 26758 TableLoadingMgr.java:70] Loading metadata for 
table: s3.catalog_sales
 I0119 07:46:46.696781 26758 TableLoadingMgr.java:72] Remaining items in queue: 
0. Loads in progress: 1
 I0119 07:46:46.696857 27023 TableLoader.java:58] Loading metadata for: 
s3.catalog_sales
 I0119 07:46:46.713222 27023 HdfsTable.java:1206] Fetching partition metadata 
from the Metastore: s3.catalog_sales
 I0119 07:46:46.905102 27023 HdfsTable.java:1210] Fetched partition metadata 
from the Metastore: s3.catalog_sales
 *I0119 07:46:46.939254 27023 HdfsTable.java:834] Loading file and block 
metadata for 1837 paths for table s3.catalog_sales using a thread pool of size 
20*
 I0119 07:47:00.426975 27023 HdfsTable.java:874] Loaded file and block metadata 
for s3.catalog_sales
 I0119 07:47:00.427062 27023 TableLoader.java:97] Loaded metadata for: 
s3.catalog_sales

I0119 07:47:00.427243 26758 CatalogServiceCatalog.java:1433] Refreshing table 
metadata: s3.catalog_sales
 I0119 07:47:00.441572 26758 HdfsTable.java:1193] Incrementally loading table 
metadata for: s3.catalog_sales
 *I0119 07:47:00.456437 26758 HdfsTable.java:834] Loading file and block 
metadata for 1837 paths for table s3.catalog_sales using a thread pool of size 
20*
 I0119 07:47:14.038097 26758 HdfsTable.java:874] Loaded file and block metadata 
for s3.catalog_sales
 I0119 07:47:14.038132 26758 HdfsTable.java:1203] Incrementally loaded table 
metadata for: s3.catalog_sales
 I0119 07:47:14.038179 26758 CatalogServiceCatalog.java:1456] Refreshed table 
metadata: s3.catalog_sales
 I0119 07:47:14.062625 26309 catalog-server.cc:331] Publishing update : 
TABLE:s3.catalog_sales@1168
 I0119 07:47:14.062645 26309 catalog-server.cc:331] Publishing update : 
CATALOG:b0f520a5e2ab4056:b7e2e045fa39d625@1168
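The timestamps quantify the waste: each file-metadata pass over the 1837 paths takes about 13.5 seconds, so the redundant refresh roughly doubles the load time (timestamps copied from the log above):

```python
from datetime import datetime

def seconds_between(start, end):
    """Elapsed seconds between two log timestamps on the same day."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()

# First (full) load of file/block metadata for the 1837 paths:
first = seconds_between("07:46:46.939254", "07:47:00.426975")
# Second (redundant) load triggered by the follow-up refresh:
second = seconds_between("07:47:00.456437", "07:47:14.038097")
assert 13 < first < 14 and 13 < second < 14
```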


