[jira] [Commented] (IMPALA-7239) Mitigate ParseSmaps() overhead

2018-07-06 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535111#comment-16535111
 ] 

Tim Armstrong commented on IMPALA-7239:
---

It looks like smaps does acquire mmap_sem at various points during the page 
table walk, which affects the performance of virtual memory operations. E.g. if 
I run the attached program mmap.c and cat /proc/$(pgrep mmap)/smaps in another 
terminal, mmap noticeably slows down, even on a system without the smaps bug 
mentioned above. I think if the bug is present then the smaps-producing code is 
probably holding mmap_sem while walking all of the threads.

{code}
# In one terminal.
gcc -std=gnu99 mmap.c -o mmap && ./mmap
# In another terminal.
while cat /proc/$(pgrep mmap)/smaps > /dev/null; do echo yes; done

{code}
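
The attached mmap.c is not reproduced in this thread. As a rough idea of what such a stress program could do, here is a minimal sketch, written in C++ rather than C, of a loop that repeatedly maps and unmaps anonymous memory and reports how long that took. The function name `TimeMmapCycles` and its parameters are hypothetical and are not taken from the attachment.

```cpp
#include <sys/mman.h>

#include <chrono>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the attached mmap.c (the attachment is not shown
// in this thread). Maps and unmaps an anonymous region `iterations` times and
// returns the elapsed wall-clock time in seconds.
double TimeMmapCycles(int iterations, std::size_t bytes) {
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i) {
    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
    // Touch the first byte so that at least one page is actually faulted in.
    static_cast<volatile char*>(p)[0] = 1;
    if (munmap(p, bytes) != 0) throw std::runtime_error("munmap failed");
  }
  std::chrono::duration<double> elapsed =
      std::chrono::steady_clock::now() - start;
  return elapsed.count();
}
```

Calling this in a loop from `main()` and printing the result each time, while the smaps loop above runs in a second terminal, should make any slowdown visible as a jump in the reported time.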


> Mitigate ParseSmaps() overhead
> --
>
> Key: IMPALA-7239
> URL: https://issues.apache.org/jira/browse/IMPALA-7239
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: perf, resource-management
> Attachments: mmap.c
>
>
> I've heard anecdotes of high system time spent in functions related to the 
> smaps parsing. It appears that this can be expensive on systems once the 
> impalad virtual memory gets fragmented and there are 10s of thousands of maps.
> We can try to mitigate by reducing frequency of the parsing or disabling it 
> entirely. I'm not sure if there are cheaper ways to get all of the same 
> metrics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Updated] (IMPALA-6950) Restrict execution of queries that don't include a partition filter or do a cross join

2018-07-06 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6950:
--
Summary: Restrict execution of queries that don't include a partition 
filter or do a cross join  (was: Query option to restrict the execution of 
risky queries.)

> Restrict execution of queries that don't include a partition filter or do a 
> cross join
> --
>
> Key: IMPALA-6950
> URL: https://issues.apache.org/jira/browse/IMPALA-6950
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 2.11.0
>Reporter: Luis E Martinez-Poblete
>Priority: Minor
>
> Synopsis:
> =
> Query option to restrict the execution of risky queries.
> Feature Request:
> 
> Please enhance Impala with a query option to restrict the execution of risky 
> queries.
> For instance, in Hive, the parameter "set hive.mapred.mode=strict" will 
> reject queries based on the following criteria:
> 1) Queries on partitioned tables are not permitted unless they include a 
> partition filter in the WHERE clause.
> 2) Queries doing Cartesian product (cross join).




[jira] [Updated] (IMPALA-6950) Query option to restrict the execution of risky queries.

2018-07-06 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6950:
--
Issue Type: Sub-task  (was: New Feature)
Parent: IMPALA-6032

> Query option to restrict the execution of risky queries.
> 
>
> Key: IMPALA-6950
> URL: https://issues.apache.org/jira/browse/IMPALA-6950
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 2.11.0
>Reporter: Luis E Martinez-Poblete
>Priority: Minor
>
> Synopsis:
> =
> Query option to restrict the execution of risky queries.
> Feature Request:
> 
> Please enhance Impala with a query option to restrict the execution of risky 
> queries.
> For instance, in Hive, the parameter "set hive.mapred.mode=strict" will 
> reject queries based on the following criteria:
> 1) Queries on partitioned tables are not permitted unless they include a 
> partition filter in the WHERE clause.
> 2) Queries doing Cartesian product (cross join).




[jira] [Assigned] (IMPALA-6034) Add query option that limits scanned bytes at runtime

2018-07-06 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-6034:
-

Assignee: Tim Armstrong  (was: Mostafa Mokhtar)

> Add query option that limits scanned bytes at runtime
> -
>
> Key: IMPALA-6034
> URL: https://issues.apache.org/jira/browse/IMPALA-6034
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Reporter: Mostafa Mokhtar
>Assignee: Tim Armstrong
>Priority: Major
>
> Reject queries that would scan a large amount of data before executing them.
> This is a mechanism to protect the cluster from potentially harmful queries.
> MAX_READ_BYTES: [0]




[jira] [Commented] (IMPALA-6950) Restrict execution of queries that don't include a partition filter or do a cross join

2018-07-06 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-6950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535216#comment-16535216
 ] 

Tim Armstrong commented on IMPALA-6950:
---

I think the ask is reasonable, but we should make sure to make the behaviour 
consistent with other Impala options rather than mimicking Hive.





[jira] [Commented] (IMPALA-6033) Add query option that rejects queries against tables with missing statistics

2018-07-06 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535221#comment-16535221
 ] 

Tim Armstrong commented on IMPALA-6033:
---

I think #2 is feasible and relatively straightforward - we'd just check that we 
had a valid row count for every table.

#3 I think is tricky to do in a useful way because if it's too strict, it will 
create many false positives and be disabled. Currently the planner very 
aggressively warns about missing stats, e.g. for nested collections where no 
stats are expected, or if we did compute stats on a subset of columns, or if a 
few partitions are missing stats.

> Add query option that rejects queries against tables with missing statistics
> 
>
> Key: IMPALA-6033
> URL: https://issues.apache.org/jira/browse/IMPALA-6033
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Reporter: Mostafa Mokhtar
>Priority: Major
>  Labels: ramp-up
>
> Add query option that rejects queries against tables missing statistics.
> This is a mechanism to protect the cluster from potentially harmful queries.
> REJECT_STATS_MISSING_QUERIES: [0]




[jira] [Commented] (IMPALA-7239) Mitigate ParseSmaps() overhead

2018-07-06 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535241#comment-16535241
 ] 

Tim Armstrong commented on IMPALA-7239:
---

I wasn't able to figure out another way to get the number of memory maps, which 
is pretty useful for understanding VM fragmentation (you can use /proc/../maps 
or /proc/.../numa_maps but those appear to be equally expensive).





[jira] [Commented] (IMPALA-7239) Mitigate ParseSmaps() overhead

2018-07-06 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16535328#comment-16535328
 ] 

Tim Armstrong commented on IMPALA-7239:
---

/maps still has the [stack:] info that's expensive to compute:
{noformat}
7fc89ac74000-7fc89b474000 rw-p  00:00 0  
[stack:1864690]
{noformat}
Same with /numa_maps:
{noformat}
7fc89ac74000 prefer:0 stack:1864690 anon=3 dirty=3 N0=3 kernelpagesize_kB=4
{noformat}
This is on 3.10.0-327.36.3.el7.x86_64





[jira] [Assigned] (IMPALA-7053) Reorganise query options into groups

2018-07-06 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-7053:
-

Assignee: Bikramjeet Vig  (was: Tim Armstrong)

> Reorganise query options into groups
> 
>
> Key: IMPALA-7053
> URL: https://issues.apache.org/jira/browse/IMPALA-7053
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Tim Armstrong
>Assignee: Bikramjeet Vig
>Priority: Major
>
> We have quite a lot of query options now and we're adding more for things 
> like resource limits (e.g. IMPALA-6035). It's getting harder for users to 
> understand the organisation and find relevant query options. We should 
> consider grouping similar query options.
> E.g. for this set of resource limits, we could reorganise in various ways:
> * mem_limit -> resources.memory.per_node_limit
> * buffer_pool_limit -> resources.memory.buffer_pool.per_node_limit
> * thread_reservation_limit  -> resources.threads.per_node_limit
> * thread_reservation_aggregate_limit -> resources.threads.aggregate_limit
> * exec_time_limit_s -> resources.wallclock.limit_s
> We could do the conversion incrementally. It would probably make sense to 
> agree on a top-level organisation up-front.
> * planner - anything that controls planner decisions like join ordering, etc
> * scheduler - anything that controls scheduler decisions (admission control 
> could maybe be included here too)
> * resources - resource management functionality (limits, etc)
> * session - anything related to session management like timeouts
> * exec - anything that changes query execution behaviour (e.g. codegen, batch 
> sizes, runtime filters, etc)
> * Probably a group for anything that changes the semantic behaviour of a 
> query (e.g. decimal_v2, appx_count_distinct, strict_mode, abort_on_error).
> * A group that controls read and write behaviour of file formats like 
> compression, etc




[jira] [Updated] (IMPALA-4268) buffer more than a batch of rows at coordinator

2018-07-06 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-4268:
--
Summary: buffer more than a batch of rows at coordinator  (was: Allow 
PlanRootSink to buffer more than a batch of rows)

> buffer more than a batch of rows at coordinator
> ---
>
> Key: IMPALA-4268
> URL: https://issues.apache.org/jira/browse/IMPALA-4268
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Affects Versions: Impala 2.8.0
>Reporter: Henry Robinson
>Priority: Major
>  Labels: resource-management
>
> In IMPALA-2905, we are introducing a {{PlanRootSink}} that handles the 
> production of output rows at the root of a plan.
> The implementation in IMPALA-2905 has the plan execute in a separate thread 
> to the consumer, which calls {{GetNext()}} to retrieve the rows. However, the 
> sender thread will block until {{GetNext()}} is called, so that there are no 
> complications about memory usage and ownership due to having several batches 
> in flight at one time.
> However, this also leads to many context switches, as each {{GetNext()}} call 
> yields to the sender to produce the rows. If the sender were to fill a buffer 
> asynchronously, the consumer could pull out of that buffer without taking a 
> context switch in many cases (and the extra buffering might smooth out any 
> performance spikes due to client delays, which currently directly affect plan 
> execution).
> The tricky part is managing the mismatch between the size of the row batches 
> processed in {{Send()}} and the size of the fetch result asked for by the 
> client. The sender materializes output rows in a {{QueryResultSet}} that is 
> owned by the coordinator. That is not, currently, a splittable object - 
> instead it contains the actual RPC response struct that will hit the wire 
> when the RPC completes. An asynchronous sender cannot know the batch size, 
> which may change on every fetch call. So the {{GetNext()}} implementation 
> would need to be able to split out the {{QueryResultSet}} to match the 
> correct fetch size, and handle stitching together other {{QueryResultSets}} - 
> without doing extra copies.




[jira] [Created] (IMPALA-7259) impala-shell is weirdly slow with some large queries

2018-07-06 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-7259:
-

 Summary: impala-shell is weirdly slow with some large queries
 Key: IMPALA-7259
 URL: https://issues.apache.org/jira/browse/IMPALA-7259
 Project: IMPALA
  Issue Type: Bug
  Components: Clients
Affects Versions: Impala 3.1.0
Reporter: Tim Armstrong
 Attachments: wide-parquet-agg.sql

impala-shell is very slow at processing some large queries - it takes over a 
minute to actually submit the query. I've attached an example.




[jira] [Updated] (IMPALA-7278) distinct clause is not working as expected with custom UDFs

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7278:
--
Description: 
Distinct clause when executed with custom UDF returns unexpected results.

Custom UDF Definition:

udf.h file:
{code}
#ifndef IMPALA_UDF_SAMPLE_UDF_H
#define IMPALA_UDF_SAMPLE_UDF_H

#include "udf.h"

using namespace impala_udf;

#ifdef __cplusplus
extern "C"
{
#endif

StringVal udf_clear(FunctionContext* context, StringVal& sInput);
#ifdef __cplusplus
}
#endif
#endif
{code}

udf.cpp:

{code}
#include "clear.h"

StringVal udf_clear(
 FunctionContext* context,
 StringVal& sInput /* String to encrypt */
 )
{
 unsigned char* pReturnData = context->Allocate( 100 );
 memset( pReturnData, NULL, 100);
 memcpy(pReturnData, sInput.ptr, sInput.len );
 StringVal sResult( pReturnData );
 sResult.len = sInput.len;
 context->Free( (uint8_t*)pReturnData );
 return sResult;
}
{code}
CMakeLists.txt:
{code}
project (clear)
 ADD_LIBRARY (clear2.8_RHEL SHARED clear.cpp )
 TARGET_LINK_LIBRARIES (clear2.8_RHEL libImpalaUdf.a )
 SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES SUFFIX ".so")
 SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES PREFIX "")
 INSTALL ( TARGETS clear2.8_RHEL DESTINATION . )
{code}

Query Syntax:
{code}

CREATE TABLE clear (c1 STRING, c2 STRING) row format delimited fields 
terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/clear.csv' OVERWRITE INTO TABLE clear;

Query: describe clear
+------+--------+---------+
| name | type   | comment |
+------+--------+---------+
| c1   | string |         |
| c2   | string |         |
+------+--------+---------+
Fetched 2 row(s) in 0.04s

select * from clear;
+-----+-----+
| c1  | c2  |
+-----+-----+
| 111 | 111 |
| 111 | 111 |
| 22  | 22  |
| 44  | 44  |
| 22  | 22  |
| 333 | 333 |
| 333 | 333 |
+-----+-----+
Fetched 7 row(s) in 0.14s

select distinct udf_clear(c1),c2 from clear;
+-----------------------+-----+
| default.udf_clear(c1) | c2  |
+-----------------------+-----+
| {color:#d04437}*22* {color}| 44 |   <== this should be *44* 
| 22                    | 22  |
| 333                   | 333 |
| 111                   | 111 |
+-----------------------+-----+
Fetched 4 row(s) in 0.24s
{code}
 
Expected result:
{code}
select distinct c1,c2 from clear;
+-----+-----+
| c1  | c2  |
+-----+-----+
| 44  | 44  |
| 22  | 22  |
| 333 | 333 |
| 111 | 111 |
+-----+-----+
Fetched 4 row(s) in 0.25s
 {code}

  was:
Distinct clause when executed with custom UDF returns unexpected results.

Custom UDF Definition:

udf.h file:

==
{code}
#ifndef IMPALA_UDF_SAMPLE_UDF_H
#define IMPALA_UDF_SAMPLE_UDF_H

#include "udf.h"

using namespace impala_udf;

#ifdef __cplusplus
extern "C"
{
#endif

 

StringVal udf_clear(FunctionContext* context, StringVal& sInput);
#ifdef __cplusplus
}
#endif
#endif
{code}

udf.cpp:


{code}
#include "clear.h"

StringVal udf_clear(
 FunctionContext* context,
 StringVal& sInput /* String to encrypt */
 )
{
 unsigned char* pReturnData = context->Allocate( 100 );
 memset( pReturnData, NULL, 100);
 memcpy(pReturnData, sInput.ptr, sInput.len );
 StringVal sResult( pReturnData );
 sResult.len = sInput.len;
 context->Free( (uint8_t*)pReturnData );
 return sResult;
}
{code}
CMakeLists.txt:

===
{code}
project (clear)
 ADD_LIBRARY (clear2.8_RHEL SHARED clear.cpp )
 TARGET_LINK_LIBRARIES (clear2.8_RHEL libImpalaUdf.a )
 SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES SUFFIX ".so")
 SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES PREFIX "")
 INSTALL ( TARGETS clear2.8_RHEL DESTINATION . )

Query Syntax:

CREATE TABLE clear (c1 STRING, c2 STRING) row format delimited fields 
terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/clear.csv' OVERWRITE INTO TABLE clear;

Query: describe clear
+--++-+
| name | type | comment |
+--++-+
| c1 | string | |
| c2 | string | |
+--++-+
Fetched 2 row(s) in 0.04s

select * from clear;
+-+-+
| c1 | c2 |
+-+-+
| 111 | 111 |
| 111 | 111 |
| 22 | 22 |
| 44 | 44 |
| 22 | 22 |
| 333 | 333 |
| 333 | 333 |
+-+-+
Fetched 7 row(s) in 0.14s

select distinct udf_clear(c1),c2 from clear;
+---+-+
| default.udf_clear(c1) | c2 |
+---+-+
| {color:#d04437}*22* {color}| 44 |   <== this should be *44* 
| 22 | 22 |
| 333 | 333 |
| 111 | 111 |
+---+-+
Fetched 4 row(s) in 0.24s
{code}
 
Expected result:
{code}
select distinct c1,c2 from clear;

+-+-+
| c1 | c2 |
+-+-+
| 44 | 44 |
| 22 | 22 |
| 333 | 333 |
| 111 | 111 |
+-+-+
Fetched 4 row(s) in 0.25s
 {code}


> disti

[jira] [Updated] (IMPALA-7278) distinct clause is not working as expected with custom UDFs

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7278:
--
Description: 
Distinct clause when executed with custom UDF returns unexpected results.

Custom UDF Definition:

udf.h file:

==
{code}
#ifndef IMPALA_UDF_SAMPLE_UDF_H
#define IMPALA_UDF_SAMPLE_UDF_H

#include "udf.h"

using namespace impala_udf;

#ifdef __cplusplus
extern "C"
{
#endif

 

StringVal udf_clear(FunctionContext* context, StringVal& sInput);
#ifdef __cplusplus
}
#endif
#endif
{code}

udf.cpp:


{code}
#include "clear.h"

StringVal udf_clear(
 FunctionContext* context,
 StringVal& sInput /* String to encrypt */
 )
{
 unsigned char* pReturnData = context->Allocate( 100 );
 memset( pReturnData, NULL, 100);
 memcpy(pReturnData, sInput.ptr, sInput.len );
 StringVal sResult( pReturnData );
 sResult.len = sInput.len;
 context->Free( (uint8_t*)pReturnData );
 return sResult;
}
{code}
CMakeLists.txt:

===
{code}
project (clear)
 ADD_LIBRARY (clear2.8_RHEL SHARED clear.cpp )
 TARGET_LINK_LIBRARIES (clear2.8_RHEL libImpalaUdf.a )
 SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES SUFFIX ".so")
 SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES PREFIX "")
 INSTALL ( TARGETS clear2.8_RHEL DESTINATION . )

Query Syntax:

CREATE TABLE clear (c1 STRING, c2 STRING) row format delimited fields 
terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/clear.csv' OVERWRITE INTO TABLE clear;

Query: describe clear
+--++-+
| name | type | comment |
+--++-+
| c1 | string | |
| c2 | string | |
+--++-+
Fetched 2 row(s) in 0.04s

select * from clear;
+-+-+
| c1 | c2 |
+-+-+
| 111 | 111 |
| 111 | 111 |
| 22 | 22 |
| 44 | 44 |
| 22 | 22 |
| 333 | 333 |
| 333 | 333 |
+-+-+
Fetched 7 row(s) in 0.14s

select distinct udf_clear(c1),c2 from clear;
+---+-+
| default.udf_clear(c1) | c2 |
+---+-+
| {color:#d04437}*22* {color}| 44 |   <== this should be *44* 
| 22 | 22 |
| 333 | 333 |
| 111 | 111 |
+---+-+
Fetched 4 row(s) in 0.24s
{code}
 
Expected result:
{code}
select distinct c1,c2 from clear;

+-+-+
| c1 | c2 |
+-+-+
| 44 | 44 |
| 22 | 22 |
| 333 | 333 |
| 111 | 111 |
+-+-+
Fetched 4 row(s) in 0.25s
 {code}

  was:
Distinct clause when executed with custom UDF returns unexpected results.

Custom UDF Definition:

udf.h file:

==

#ifndef IMPALA_UDF_SAMPLE_UDF_H
#define IMPALA_UDF_SAMPLE_UDF_H

#include "udf.h"

using namespace impala_udf;

#ifdef __cplusplus
extern "C"
{
#endif

 

StringVal udf_clear(FunctionContext* context, StringVal& sInput);
#ifdef __cplusplus
}
#endif
#endif

udf.cpp:



#include "clear.h"

StringVal udf_clear(
 FunctionContext* context,
 StringVal& sInput /* String to encrypt */
 )
{
 unsigned char* pReturnData = context->Allocate( 100 );
 memset( pReturnData, NULL, 100);
 memcpy(pReturnData, sInput.ptr, sInput.len );
 StringVal sResult( pReturnData );
 sResult.len = sInput.len;
 context->Free( (uint8_t*)pReturnData );
 return sResult;
}

CMakeLists.txt:

===

project (clear)
 ADD_LIBRARY (clear2.8_RHEL SHARED clear.cpp )
 TARGET_LINK_LIBRARIES (clear2.8_RHEL libImpalaUdf.a )
 SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES SUFFIX ".so")
 SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES PREFIX "")
 INSTALL ( TARGETS clear2.8_RHEL DESTINATION . )

Query Syntax:

CREATE TABLE clear (c1 STRING, c2 STRING) row format delimited fields 
terminated by ',' stored as textfile;
LOAD DATA INPATH '/user/clear.csv' OVERWRITE INTO TABLE clear;

Query: describe clear
+--++-+
| name | type | comment |
+--++-+
| c1 | string | |
| c2 | string | |
+--++-+
Fetched 2 row(s) in 0.04s

select * from clear;
+-+-+
| c1 | c2 |
+-+-+
| 111 | 111 |
| 111 | 111 |
| 22 | 22 |
| 44 | 44 |
| 22 | 22 |
| 333 | 333 |
| 333 | 333 |
+-+-+
Fetched 7 row(s) in 0.14s

select distinct udf_clear(c1),c2 from clear;
+---+-+
| default.udf_clear(c1) | c2 |
+---+-+
| {color:#d04437}*22* {color}| 44 |   <== this should be *44* 
| 22 | 22 |
| 333 | 333 |
| 111 | 111 |
+---+-+
Fetched 4 row(s) in 0.24s

 

Expected result:

select distinct c1,c2 from clear;

+-+-+
| c1 | c2 |
+-+-+
| 44 | 44 |
| 22 | 22 |
| 333 | 333 |
| 111 | 111 |
+-+-+
Fetched 4 row(s) in 0.25s

 


> distinct c

[jira] [Commented] (IMPALA-7278) distinct clause is not working as expected with custom UDFs

2018-07-11 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540344#comment-16540344
 ] 

Tim Armstrong commented on IMPALA-7278:
---

There's a bug in your UDF code. You're calling this StringVal constructor: 
https://github.com/apache/impala/blob/master/be/src/udf/udf.h#L604. The 
constructor says "Note: this does not make a copy of ptr so the underlying 
string must exist as long as this StringVal does.".  Your code frees 
pReturnData before returning the string, so the returned StringVal points at 
freed memory.

I'd suggest using the StringVal(FunctionContext* context, int len) constructor: 
https://github.com/apache/impala/blob/master/be/src/udf/udf.h#L618.  That will 
allocate string memory of 'len' that can be safely returned from your UDF. The 
lifetime of that memory is managed by the Impala runtime so your UDF doesn't 
need to free it.
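
A minimal sketch of the suggested fix follows. So that it compiles on its own, it uses simplified stand-ins for `FunctionContext` and `StringVal`; these are assumptions for illustration and not the real definitions in udf.h. In an actual UDF you would include udf.h, drop the stand-ins and keep only `udf_clear`. Taking the input by const reference is this sketch's choice.

```cpp
#include <cstdint>
#include <cstring>
#include <memory>
#include <vector>

// --- Simplified stand-ins for the Impala UDF SDK (NOT the real udf.h) ---
struct FunctionContext {
  // Buffers owned by the "runtime"; they outlive the UDF call.
  std::vector<std::unique_ptr<uint8_t[]>> owned;
};

struct StringVal {
  bool is_null = false;
  int len = 0;
  uint8_t* ptr = nullptr;

  StringVal() = default;
  // Stand-in for StringVal(FunctionContext*, int len): the buffer belongs to
  // the context, so the UDF must not free it.
  StringVal(FunctionContext* ctx, int n) : len(n) {
    ctx->owned.push_back(
        std::unique_ptr<uint8_t[]>(new uint8_t[n > 0 ? n : 1]));
    ptr = ctx->owned.back().get();
  }
  static StringVal null() {
    StringVal v;
    v.is_null = true;
    return v;
  }
};
// --- End of stand-ins ---

// Returns a copy of the input string in memory whose lifetime is managed by
// the runtime, instead of pointing at a buffer that was already freed.
StringVal udf_clear(FunctionContext* context, const StringVal& input) {
  if (input.is_null) return StringVal::null();
  StringVal result(context, input.len);
  if (result.is_null) return result;  // allocation failed
  if (input.len > 0) std::memcpy(result.ptr, input.ptr, input.len);
  return result;
}
```

Compared with the UDF in the description, there is no fixed 100-byte buffer, no memset and no Free() call: the result buffer is sized to the input and is never released by the UDF itself.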

> distinct clause is not working as expected with custom UDFs
> ---
>
> Key: IMPALA-7278
> URL: https://issues.apache.org/jira/browse/IMPALA-7278
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.8.0
>Reporter: shabnam perween
>Priority: Critical
>
> Distinct clause when executed with custom UDF returns unexpected results.
> Custom UDF Definition:
> udf.h file:
> {code}
> #ifndef IMPALA_UDF_SAMPLE_UDF_H
> #define IMPALA_UDF_SAMPLE_UDF_H
> #include "udf.h"
> using namespace impala_udf;
> #ifdef __cplusplus
> extern "C"
> {
> #endif
> StringVal udf_clear(FunctionContext* context, StringVal& sInput);
> #ifdef __cplusplus
> }
> #endif
> #endif
> {code}
> udf.cpp:
> {code}
> #include "clear.h"
> StringVal udf_clear(
>  FunctionContext* context,
>  StringVal& sInput /* String to encrypt */
>  )
> {
>  unsigned char* pReturnData = context->Allocate( 100 );
>  memset( pReturnData, NULL, 100);
>  memcpy(pReturnData, sInput.ptr, sInput.len );
>  StringVal sResult( pReturnData );
>  sResult.len = sInput.len;
>  context->Free( (uint8_t*)pReturnData );
>  return sResult;
> }
> {code}
> CMakeLists.txt:
> {code}
> project (clear)
>  ADD_LIBRARY (clear2.8_RHEL SHARED clear.cpp )
>  TARGET_LINK_LIBRARIES (clear2.8_RHEL libImpalaUdf.a )
>  SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES SUFFIX ".so")
>  SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES PREFIX "")
>  INSTALL ( TARGETS clear2.8_RHEL DESTINATION . )
> Query Syntax:
> CREATE TABLE clear (c1 STRING, c2 STRING) row format delimited fields 
> terminated by ',' stored as textfile;
> LOAD DATA INPATH '/user/clear.csv' OVERWRITE INTO TABLE clear;
> Query: describe clear
> +--++-+
> | name | type | comment |
> +--++-+
> | c1 | string | |
> | c2 | string | |
> +--++-+
> Fetched 2 row(s) in 0.04s
> select * from clear;
> +-+-+
> | c1 | c2 |
> +-+-+
> | 111 | 111 |
> | 111 | 111 |
> | 22 | 22 |
> | 44 | 44 |
> | 22 | 22 |
> | 333 | 333 |
> | 333 | 333 |
> +-+-+
> Fetched 7 row(s) in 0.14s
> select distinct udf_clear(c1),c2 from clear;
> +---+-+
> | default.udf_clear(c1) | c2 |
> +---+-+
> | {color:#d04437}*22* {color}| 44 |   <== this should be *44* 
> | 22 | 22 |
> | 333 | 333 |
> | 111 | 111 |
> +---+-+
> Fetched 4 row(s) in 0.24s
> {code}
>  
> Expected result:
> {code}
> select distinct c1,c2 from clear;
> +-+-+
> | c1 | c2 |
> +-+-+
> | 44 | 44 |
> | 22 | 22 |
> | 333 | 333 |
> | 111 | 111 |
> +-+-+
> Fetched 4 row(s) in 0.25s
>  {code}




[jira] [Resolved] (IMPALA-2703) Automated testing of resource management

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2703.
---
Resolution: Won't Fix

Old JIRA, unclear what the intended scope is.

> Automated testing of resource management
> 
>
> Key: IMPALA-2703
> URL: https://issues.apache.org/jira/browse/IMPALA-2703
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Affects Versions: Impala 2.3.0
>Reporter: Harrison Sheinblatt
>Assignee: Mostafa Mokhtar
>Priority: Major
>  Labels: admission-control, resource-management, test-infra
>
> Umbrella JIRA for automated testing of RM/AC work for 2.6.




[jira] [Assigned] (IMPALA-6032) Configuration knobs to automatically reject and fail queries

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-6032:
-

Assignee: (was: Mostafa Mokhtar)

> Configuration knobs to automatically reject and fail queries
> 
>
> Key: IMPALA-6032
> URL: https://issues.apache.org/jira/browse/IMPALA-6032
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Distributed Exec
>Reporter: Mostafa Mokhtar
>Priority: Major
>  Labels: admission-control, resource-management, supportability
>
> Umbrella JIRA for Admission control enhancements.
> Query options would be set on a resource pool basis. 




[jira] [Resolved] (IMPALA-4149) Enable automatic cluster provisioning and setup for CM clusters

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-4149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4149.
---
Resolution: Won't Do

This is leftover from a previous test plan. We still want to do this kind of 
testing but need to rescope it.

> Enable automatic cluster provisioning and setup for CM clusters
> ---
>
> Key: IMPALA-4149
> URL: https://issues.apache.org/jira/browse/IMPALA-4149
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Affects Versions: Impala 2.6.0
>Reporter: Harrison Sheinblatt
>Assignee: Mostafa Mokhtar
>Priority: Major
>  Labels: test-infra
>
> Enable end-to-end automatic execution of admission control system tests, both 
> for automated regression and self-service for clusters using CM.
> Enhance infrastructure to make it easier to provision and configure 
> distributed cluster configurations for clusters using CM.




[jira] [Resolved] (IMPALA-4148) Basic framework to setup AC, run tests, validate results

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-4148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4148.
---
Resolution: Won't Do

This is leftover from a previous test plan. We still want to do this kind of 
testing but need to rescope it.

> Basic framework to setup AC, run tests, validate results
> 
>
> Key: IMPALA-4148
> URL: https://issues.apache.org/jira/browse/IMPALA-4148
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Affects Versions: Impala 2.6.0
>Reporter: Harrison Sheinblatt
>Assignee: Mostafa Mokhtar
>Priority: Major
>  Labels: admission-control, test, test-infra
>
> Initial framework to run a single data-driven admission control test that 
> sets up different admission control configurations and validates that there 
> were no unexpected query errors and that queue limits were obeyed for 
> different query load profiles.




[jira] [Resolved] (IMPALA-4147) Enable query and metric data retrieval and configuration update through CM

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4147.
---
Resolution: Won't Do

This is leftover from a previous test plan. We still want to do this kind of 
testing but need to rescope it.

> Enable query and metric data retrieval and configuration update through CM
> --
>
> Key: IMPALA-4147
> URL: https://issues.apache.org/jira/browse/IMPALA-4147
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Affects Versions: Impala 2.6.0
>Reporter: Harrison Sheinblatt
>Assignee: Mostafa Mokhtar
>Priority: Major
>  Labels: test-infra
>
> For CM clusters, extend wrapper to allow service configuration update, Impala 
> query log retrieval, and timeseries data retrieval.
> This is to allow validation of load tests and setup of different admission 
> control queue configurations for test.




[jira] [Resolved] (IMPALA-4150) Automatic load test regression for AC part of CI, self-service

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-4150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4150.
---
Resolution: Won't Do

This is leftover from a previous test plan. We still want to do this kind of 
testing but need to rescope it.

> Automatic load test regression for AC part of CI, self-service
> --
>
> Key: IMPALA-4150
> URL: https://issues.apache.org/jira/browse/IMPALA-4150
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Affects Versions: Impala 2.6.0
>Reporter: Harrison Sheinblatt
>Assignee: Mostafa Mokhtar
>Priority: Major
>  Labels: test-infra
>
> Hook all previous tasks together and make them run automatically for CI.
> Additionally, make it possible to run on-demand on a private build.




[jira] [Resolved] (IMPALA-4145) Automated load regression testing for admission control and catalog

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4145.
---
Resolution: Won't Do

This is leftover from a previous test plan. We still want to do this kind of 
testing but need to rescope it.

> Automated load regression testing for admission control and catalog
> ---
>
> Key: IMPALA-4145
> URL: https://issues.apache.org/jira/browse/IMPALA-4145
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Affects Versions: Impala 2.6.0
>Reporter: Harrison Sheinblatt
>Priority: Major
>  Labels: admission-control, resource-management, test, test-infra
>
> Umbrella feature for first phase of load testing to regress basic 
> functionality of Admission Control and Catalog/DML.
> Goals:
> * Create a few basic automated end to end cluster tests for admission control 
> that show resource limits (memory, queue) are obeyed, and bound over 
> admission in lower stress situations.
> * Create a few basic concurrent catalog operation tests in the same framework 
> to validate correctness under load
> * Enable additional tests to be more easily added in the framework created
> * Enable self-service as well as regular regression




[jira] [Resolved] (IMPALA-4151) Enhance framework to test Catalog/DML load test workloads as well

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4151.
---
Resolution: Won't Do

This is leftover from a previous test plan. We still want to do this kind of 
testing but need to rescope it.

> Enhance framework to test Catalog/DML load test workloads as well
> -
>
> Key: IMPALA-4151
> URL: https://issues.apache.org/jira/browse/IMPALA-4151
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Affects Versions: Impala 2.6.0
>Reporter: Harrison Sheinblatt
>Assignee: Mostafa Mokhtar
>Priority: Major
>  Labels: test, test-infra
>
> Either enhance the admission control test script to work for catalog tests or 
> create a separate script runnable through the same framework, and prove out 
> by adding basic catalog tests.
> Main task is to allow sequences of dependent queries (DML) to be executed 
> serially in load generator threads.




[jira] [Resolved] (IMPALA-4152) Automatic load test regression for Catalog/DML concurrency in CI, on-demand

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-4152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-4152.
---
Resolution: Won't Do

This is leftover from a previous test plan. We still want to do this kind of 
testing but need to rescope it.

> Automatic load test regression for Catalog/DML concurrency in CI, on-demand
> ---
>
> Key: IMPALA-4152
> URL: https://issues.apache.org/jira/browse/IMPALA-4152
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Infrastructure
>Affects Versions: Impala 2.6.0
>Reporter: Harrison Sheinblatt
>Assignee: Mostafa Mokhtar
>Priority: Major
>  Labels: test-infra
>
> Add existing catalog tests to CI tests, and create an entry point for 
> on-demand execution of these tests.




[jira] [Updated] (IMPALA-2385) Update test scripts to support repeated runs of end-to-end tests without restarting Impala

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2385:
--
Summary: Update test scripts to support repeated runs of end-to-end tests 
without restarting Impala  (was: Make impala-cdh5.5.x-repeated-runs not restart 
Impala)

> Update test scripts to support repeated runs of end-to-end tests without 
> restarting Impala
> --
>
> Key: IMPALA-2385
> URL: https://issues.apache.org/jira/browse/IMPALA-2385
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.2.4
>Reporter: Harrison Sheinblatt
>Assignee: Mostafa Mokhtar
>Priority: Critical
>  Labels: test-infra
>
> The intent of the job is to find crashes due to resource leaks or other 
> stability issues by running the tests we have (which run in parallel) for an 
> extended period of time.  This testing is required for release of 2.3.
> Currently, the job runs all tests, which include cluster tests that restart 
> Impala, and it runs a loop in the driving bash script that forces a restart 
> each iteration.  These restarts must be removed.
> Further, it needs to be verified that no other test restarts Impala.  Ideally 
> the testing could also verify this automatically, but at least one manual 
> check that there are no restarts must be made.




[jira] [Assigned] (IMPALA-2385) Update test scripts to support repeated runs of end-to-end tests without restarting Impala

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2385:
-

Assignee: (was: Mostafa Mokhtar)

> Update test scripts to support repeated runs of end-to-end tests without 
> restarting Impala
> --
>
> Key: IMPALA-2385
> URL: https://issues.apache.org/jira/browse/IMPALA-2385
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.2.4
>Reporter: Harrison Sheinblatt
>Priority: Critical
>  Labels: test-infra
>
> The intent of the job is to find crashes due to resource leaks or other 
> stability issues by running the tests we have (which run in parallel) for an 
> extended period of time.  This testing is required for release of 2.3.
> Currently, the job runs all tests, which include cluster tests that restart 
> Impala, and it runs a loop in the driving bash script that forces a restart 
> each iteration.  These restarts must be removed.
> Further, it needs to be verified that no other test restarts Impala.  Ideally 
> the testing could also verify this automatically, but at least one manual 
> check that there are no restarts must be made.




[jira] [Updated] (IMPALA-2385) Update test scripts to support repeated runs of end-to-end tests without restarting Impala

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2385:
--
Priority: Major  (was: Critical)

> Update test scripts to support repeated runs of end-to-end tests without 
> restarting Impala
> --
>
> Key: IMPALA-2385
> URL: https://issues.apache.org/jira/browse/IMPALA-2385
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.2.4
>Reporter: Harrison Sheinblatt
>Priority: Major
>  Labels: test-infra
>
> The intent of the job is to find crashes due to resource leaks or other 
> stability issues by running the tests we have (which run in parallel) for an 
> extended period of time. 
> Currently, the job runs all tests, which include cluster tests that restart 
> Impala, and it runs a loop in the driving bash script that forces a restart 
> each iteration.  These restarts must be removed.
> Further, it needs to be verified that no other test restarts Impala.  Ideally 
> the testing could also verify this automatically, but at least one manual 
> check that there are no restarts must be made.




[jira] [Updated] (IMPALA-2385) Update test scripts to support repeated runs of end-to-end tests without restarting Impala

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2385:
--
Description: 
The intent of the job is to find crashes due to resource leaks or other 
stability issues by running the tests we have (which run in parallel) for an 
extended period of time. 

Currently, the job runs all tests, which include cluster tests that restart 
Impala, and it runs a loop in the driving bash script that forces a restart 
each iteration.  These restarts must be removed.

Further, it needs to be verified that no other test restarts Impala.  Ideally 
the testing could also verify this automatically, but at least one manual check 
that there are no restarts must be made.

  was:
The intent of the job is to find crashes due to resource leaks or other 
stability issues by running the tests we have (which run in parallel) for an 
extended period of time.  This testing is required for release of 2.3.

Currently, the job runs all tests, which include cluster tests that restart 
Impala, and it runs a loop in the driving bash script that forces a restart 
each iteration.  These restarts must be removed.

Further, it needs to be verified that no other test restarts Impala.  Ideally 
the testing could also verify this automatically, but at least one manual check 
that there are no restarts must be made.


> Update test scripts to support repeated runs of end-to-end tests without 
> restarting Impala
> --
>
> Key: IMPALA-2385
> URL: https://issues.apache.org/jira/browse/IMPALA-2385
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.2.4
>Reporter: Harrison Sheinblatt
>Priority: Critical
>  Labels: test-infra
>
> The intent of the job is to find crashes due to resource leaks or other 
> stability issues by running the tests we have (which run in parallel) for an 
> extended period of time. 
> Currently, the job runs all tests, which include cluster tests that restart 
> Impala, and it runs a loop in the driving bash script that forces a restart 
> each iteration.  These restarts must be removed.
> Further, it needs to be verified that no other test restarts Impala.  Ideally 
> the testing could also verify this automatically, but at least one manual 
> check that there are no restarts must be made.




[jira] [Updated] (IMPALA-3262) Investigate Codegen Performance

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-3262:
--
Labels: codegen  (was: )

> Investigate Codegen Performance
> ---
>
> Key: IMPALA-3262
> URL: https://issues.apache.org/jira/browse/IMPALA-3262
> Project: IMPALA
>  Issue Type: Task
>  Components: Perf Investigation
>Affects Versions: Impala 2.2.4
>Reporter: Alan Choi
>Assignee: Mostafa Mokhtar
>Priority: Minor
>  Labels: codegen
> Attachments: LLVM CallStack.csv
>
>
> It has been observed that codegen for even relatively simple queries would 
> take ~600ms. This is quite a significant overhead for fast queries. 
> Investigate how to reduce the codegen time.




[jira] [Resolved] (IMPALA-2745) Stop printing query profiles in jenkins output in impala-workload-runner-10node-cdh5

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2745.
---
Resolution: Invalid

Cloudera infra issue

> Stop printing query profiles in jenkins output in 
> impala-workload-runner-10node-cdh5
> 
>
> Key: IMPALA-2745
> URL: https://issues.apache.org/jira/browse/IMPALA-2745
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.2.4
>Reporter: Dimitris Tsirogiannis
>Assignee: Mostafa Mokhtar
>Priority: Minor
>  Labels: test-infra
>
> The impala-workload-runner-10node-cdh5 jenkins job outputs in the log the 
> query profiles of all the queries being executed, making it almost impossible 
> to find errors in the log and diagnose issues.




[jira] [Resolved] (IMPALA-3262) Investigate Codegen Performance

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3262.
---
Resolution: Duplicate

I think this is more-or-less redundant with IMPALA-3262. The bulk of codegen 
time is spent in optimization, and to improve that we need to evaluate 
individual passes for cost vs. benefit.

> Investigate Codegen Performance
> ---
>
> Key: IMPALA-3262
> URL: https://issues.apache.org/jira/browse/IMPALA-3262
> Project: IMPALA
>  Issue Type: Task
>  Components: Perf Investigation
>Affects Versions: Impala 2.2.4
>Reporter: Alan Choi
>Assignee: Mostafa Mokhtar
>Priority: Minor
>  Labels: codegen
> Attachments: LLVM CallStack.csv
>
>
> It has been observed that codegen for even relatively simple queries would 
> take ~600ms. This is quite a significant overhead for fast queries. 
> Investigate how to reduce the codegen time.




[jira] [Assigned] (IMPALA-5632) Investigate use of transparent huge pages for Bloom Filters

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-5632:
-

Assignee: (was: Mostafa Mokhtar)

> Investigate use of transparent huge pages for Bloom Filters
> ---
>
> Key: IMPALA-5632
> URL: https://issues.apache.org/jira/browse/IMPALA-5632
> Project: IMPALA
>  Issue Type: Task
>  Components: Perf Investigation
>Reporter: Jim Apple
>Priority: Major
> Fix For: Impala 2.12.0
>
>
> See {{system-allocator.cc}} for some templates for using {{madvise}}. It 
> appears that TLB misses are causing performance degradation in BF lookups. 
> THPs could reduce TLB pressure when the BFs are more than 2MB.
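
The madvise approach mentioned above can be sketched roughly as follows. This is only an illustration in Python, not the code in system-allocator.cc: the function name alloc_filter_buffer and the 2MB threshold constant are assumptions made for the example. It allocates an anonymous mapping and, where the platform exposes MADV_HUGEPAGE, asks the kernel to back it with transparent huge pages.

```python
import mmap

# Assumed huge page size (2MB on typical x86-64 Linux systems).
HUGE_PAGE_SIZE = 2 * 1024 * 1024


def alloc_filter_buffer(num_bytes):
    """Allocate an anonymous buffer and hint that it should use THP.

    Hypothetical helper for illustration only. The hint is advisory: the
    kernel may ignore it, and only 2MB-aligned parts of the mapping can
    actually be backed by huge pages.
    """
    buf = mmap.mmap(-1, num_bytes)
    # MADV_HUGEPAGE is only defined on platforms that support it.
    advice = getattr(mmap, "MADV_HUGEPAGE", None)
    if advice is not None and num_bytes >= HUGE_PAGE_SIZE:
        try:
            buf.madvise(advice)
        except OSError:
            # THP may be disabled or unsupported; fall back to normal pages.
            pass
    return buf
```

Buffers smaller than one huge page are left alone, since the hint cannot help them.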




[jira] [Resolved] (IMPALA-5632) Investigate use of transparent huge pages for Bloom Filters

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5632.
---
   Resolution: Fixed
Fix Version/s: Impala 2.12.0

IMPALA-5519 uses THP when available

> Investigate use of transparent huge pages for Bloom Filters
> ---
>
> Key: IMPALA-5632
> URL: https://issues.apache.org/jira/browse/IMPALA-5632
> Project: IMPALA
>  Issue Type: Task
>  Components: Perf Investigation
>Reporter: Jim Apple
>Priority: Major
> Fix For: Impala 2.12.0
>
>
> See {{system-allocator.cc}} for some templates for using {{madvise}}. It 
> appears that TLB misses are causing performance degradation in BF lookups. 
> THPs could reduce TLB pressure when the BFs are more than 2MB.




[jira] [Assigned] (IMPALA-340) Improve internal format of strings

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-340:


Assignee: (was: Mostafa Mokhtar)

> Improve internal format of strings
> --
>
> Key: IMPALA-340
> URL: https://issues.apache.org/jira/browse/IMPALA-340
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 1.0
>Reporter: Nong Li
>Priority: Minor
>  Labels: perfomance
>
> We currently store string data outside of a Tuple, with the string slot 
> taking up 16 bytes (4 bytes length, 8 bytes pointer, 4 bytes padding), which 
> is hugely wasteful.
> We need 2 improvements:
> * a more compact string slot: Intel architectures only use 48 bits of a 64-bit 
> address; strings are usually smaller than 64K; if the latter holds, we should 
> pack a string slot into 64 bits total
> * in-line representation of strings: schemas we've seen often use strings as 
> ids (which then also show up as foreign keys and are used heavily in joins), 
> and those are typically smaller than 8 bytes; in that case, we could simply 
> store the actual data in the string slot itself
> See benchmarks/string-benchmark.cc.
> See IMP-148 for more details.
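
The two improvements described above can be illustrated with a small bit-packing sketch. It is written in Python for brevity and is not Impala's actual layout: the field positions (length in the top 16 bits, pointer in the low 48 bits, and a one-byte length followed by up to 7 data bytes for the in-line case) and all function names are assumptions made for the example. A real slot would also need a tag to tell the two encodings apart.

```python
PTR_BITS = 48                      # address bits assumed to be in use
LEN_BITS = 16                      # enough for strings shorter than 64K
PTR_MASK = (1 << PTR_BITS) - 1
MAX_LEN = (1 << LEN_BITS) - 1
INLINE_CAPACITY = 7                # data bytes that fit next to a 1-byte length


def pack_slot(ptr, length):
    """Pack a 48-bit pointer and a 16-bit length into one 64-bit value."""
    if ptr < 0 or ptr & ~PTR_MASK:
        raise ValueError("pointer does not fit in 48 bits")
    if not 0 <= length <= MAX_LEN:
        raise ValueError("length does not fit in 16 bits")
    return (length << PTR_BITS) | ptr


def unpack_slot(slot):
    """Return (ptr, length) from a value produced by pack_slot."""
    return slot & PTR_MASK, slot >> PTR_BITS


def pack_inline(data):
    """Store a short string directly in the 64-bit slot."""
    if len(data) > INLINE_CAPACITY:
        raise ValueError("string too long to store in-line")
    return (len(data) << 56) | int.from_bytes(data, "little")


def unpack_inline(slot):
    """Return the bytes stored by pack_inline."""
    length = slot >> 56
    payload = slot & ((1 << 56) - 1)
    return payload.to_bytes(INLINE_CAPACITY, "little")[:length]
```

With this layout both encodings fit in 8 bytes, half of the 16-byte slot described in the issue.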




[jira] [Assigned] (IMPALA-1255) run-tests.py should give a results summary

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-1255:
-

Assignee: (was: Mostafa Mokhtar)

> run-tests.py should give a results summary
> --
>
> Key: IMPALA-1255
> URL: https://issues.apache.org/jira/browse/IMPALA-1255
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Affects Versions: Impala 1.4
>Reporter: Dan Hecht
>Priority: Minor
>  Labels: test-infra
>
> It would be nice if run-tests.py printed a final summary of 
> passes/failures/xfails/skips etc at the end of a run.  Otherwise, it's too 
> easy to not notice that a test failed, especially with all the spew from the 
> dsession.py plugin and the fact that we always run the test_verify_metrics.py 
> after all runs.  Failures often scroll off the screen by the end of all the 
> output, and you are left with just the test_verify_metrics results.
> Doing this would also make it easier to see if there were failures when not 
> using -x (stop after first failure).
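
A final summary of this kind could look roughly like the sketch below. It assumes run-tests.py drives pytest, so the summary can be emitted from pytest's pytest_terminal_summary hook in a conftest.py; the format_summary helper and the exact output format are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Outcome categories, in the order they are reported.
CATEGORIES = ("passed", "failed", "error", "xfailed", "xpassed", "skipped")


def format_summary(outcomes):
    """Build a one-line summary from an iterable of outcome names."""
    counts = Counter(outcomes)
    parts = [f"{counts[c]} {c}" for c in CATEGORIES if counts[c]]
    return "SUMMARY: " + (", ".join(parts) if parts else "no tests ran")


def pytest_terminal_summary(terminalreporter, exitstatus, config):
    """pytest hook: print the summary and the failed test ids last."""
    stats = terminalreporter.stats
    outcomes = [c for c in CATEGORIES for _ in stats.get(c, [])]
    terminalreporter.write_line(format_summary(outcomes))
    for report in stats.get("failed", []):
        terminalreporter.write_line("FAILED " + report.nodeid)
```

Because the hook runs at the very end of the session, failures are repeated after any other output and do not scroll off the screen.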




[jira] [Updated] (IMPALA-340) Improve internal format of strings

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-340:
-
Labels: perfomance  (was: )

> Improve internal format of strings
> --
>
> Key: IMPALA-340
> URL: https://issues.apache.org/jira/browse/IMPALA-340
> Project: IMPALA
>  Issue Type: Task
>  Components: Backend
>Affects Versions: Impala 1.0
>Reporter: Nong Li
>Priority: Minor
>  Labels: perfomance
>
> We currently store string data outside of a Tuple, with the string slot 
> taking up 16 bytes (4 bytes length, 8 bytes pointer, 4 bytes padding), which 
> is hugely wasteful.
> We need 2 improvements:
> * a more compact string slot: Intel architectures only use 48 bits of a 64-bit 
> address; strings are usually smaller than 64K; if the latter holds, we should 
> pack a string slot into 64 bits total
> * in-line representation of strings: schemas we've seen often use strings as 
> ids (which then also show up as foreign keys and are used heavily in joins), 
> and those are typically smaller than 8 bytes; in that case, we could simply 
> store the actual data in the string slot itself
> See benchmarks/string-benchmark.cc.
> See IMP-148 for more details.




[jira] [Assigned] (IMPALA-6294) Concurrent hung with lots of spilling make slow progress due to blocking in DataStreamRecvr and DataStreamSender

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-6294:
-

Assignee: Michael Ho  (was: Mostafa Mokhtar)

> Concurrent hung with lots of spilling make slow progress due to blocking in 
> DataStreamRecvr and DataStreamSender
> 
>
> Key: IMPALA-6294
> URL: https://issues.apache.org/jira/browse/IMPALA-6294
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.11.0
>Reporter: Mostafa Mokhtar
>Assignee: Michael Ho
>Priority: Critical
> Attachments: IMPALA-6285 TPCDS Q3 slow broadcast, 
> slow_broadcast_q3_reciever.txt, slow_broadcast_q3_sender.txt
>
>
> While running a highly concurrent spilling workload on a large cluster, 
> queries start running slower; even lightweight queries that are not running 
> are affected by this slowdown. 
> {code}
>   EXCHANGE_NODE (id=9):(Total: 3m1s, non-child: 3m1s, % non-child: 100.00%)
>  - ConvertRowBatchTime: 999.990us
>  - PeakMemoryUsage: 0
>  - RowsReturned: 108.00K (108001)
>  - RowsReturnedRate: 593.00 /sec
> DataStreamReceiver:
>   BytesReceived(4s000ms): 254.47 KB, 338.82 KB, 338.82 KB, 852.43 KB, 1.32 MB, 1.33 MB, 1.50 MB, 2.53 MB, 2.99 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.00 MB, 3.16 MB, 3.49 MB, 3.80 MB, 4.15 MB, 4.55 MB, 4.84 MB, 4.99 MB, 5.07 MB, 5.41 MB, 5.75 MB, 5.92 MB, 6.00 MB, 6.00 MB, 6.00 MB, 6.07 MB, 6.28 MB, 6.33 MB, 6.43 MB, 6.67 MB, 6.91 MB, 7.29 MB, 8.03 MB, 9.12 MB, 9.68 MB, 9.90 MB, 9.97 MB, 10.44 MB, 11.25 MB
>- BytesReceived: 11.73 MB (12301692)
>- DeserializeRowBatchTimer: 957.990ms
>- FirstBatchArrivalWaitTime: 0.000ns
>- PeakMemoryUsage: 644.44 KB (659904)
>- SendersBlockedTimer: 0.000ns
>- SendersBlockedTotalTimer(*): 0.000ns
> {code}
> {code}
> DataStreamSender (dst_id=9):(Total: 1s819ms, non-child: 1s819ms, % non-child: 100.00%)
>- BytesSent: 234.64 MB (246033840)
>- NetworkThroughput(*): 139.58 MB/sec
>- OverallThroughput: 128.92 MB/sec
>- PeakMemoryUsage: 33.12 KB (33920)
>- RowsReturned: 108.00K (108001)
>- SerializeBatchTime: 133.998ms
>- TransmitDataRPCTime: 1s680ms
>- UncompressedRowBatchSize: 446.42 MB (468102200)
> {code}
> Timeouts seen in IMPALA-6285 are caused by this issue.
> {code}
> I1206 12:44:14.925405 25274 status.cc:58] RPC recv timed out: Client foo-17.domain.com:22000 timed-out during recv call.
> @   0x957a6a  impala::Status::Status()
> @  0x11dd5fe  impala::DataStreamSender::Channel::DoTransmitDataRpc()
> @  0x11ddcd4  impala::DataStreamSender::Channel::TransmitDataHelper()
> @  0x11de080  impala::DataStreamSender::Channel::TransmitData()
> @  0x11e1004  impala::ThreadPool<>::WorkerThread()
> @   0xd10063  impala::Thread::SuperviseThread()
> @   0xd107a4  boost::detail::thread_data<>::run()
> @  0x128997a  (unknown)
> @ 0x7f68c5bc7e25  start_thread
> @ 0x7f68c58f534d  __clone
> {code}
> Similar behavior was also observed with KRPC enabled (IMPALA-6048).




[jira] [Assigned] (IMPALA-6048) Queries make very slow progress and report WaitForRPC() stuck for too long

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-6048:
-

Assignee: Michael Ho  (was: Mostafa Mokhtar)

> Queries make very slow progress and report  WaitForRPC() stuck for too long
> ---
>
> Key: IMPALA-6048
> URL: https://issues.apache.org/jira/browse/IMPALA-6048
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend, Distributed Exec
>Affects Versions: Impala 2.11.0
>Reporter: Mostafa Mokhtar
>Assignee: Michael Ho
>Priority: Critical
> Attachments: Archive 2.zip
>
>
> When running 32 concurrent queries from TPC-DS, a couple of instances of 
> TPC-DS Q78 took 9 hours to finish and appeared to be hung.
> On an idle cluster the query finished in under 5 minutes; profiles are attached. 
> When the query ran for a long time, fragments reported 16+ hours of network 
> send/receive time.
> The logs show a lot of messages like the one below, including incidents 
> where a node waited too long for an RPC from itself:
> {code}
> W1012 00:47:57.633549 117475 krpc-data-stream-sender.cc:360] XXX: WaitForRPC() stuck for too long address=10.17.234.37:29000 fragment_instace_id_=1e48ef897e797131:2f05789b05eb dest_node_id_=24 sender_id_=81
> {code}




[jira] [Assigned] (IMPALA-5767) Automated perf job which doesn't rely on OS buffer cache

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-5767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-5767:
-

Assignee: (was: Mostafa Mokhtar)

> Automated perf job which doesn't rely on OS buffer cache
> 
>
> Key: IMPALA-5767
> URL: https://issues.apache.org/jira/browse/IMPALA-5767
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Reporter: Matthew Jacobs
>Priority: Major
>  Labels: performance
>
> It seems that there are no automated jobs right now which reliably capture 
> perf measurements on a workload where the data isn't already in OS buffer 
> cache.
> See 
> https://unix.stackexchange.com/questions/87908/how-do-you-empty-the-buffers-and-cache-on-a-linux-system
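
One way a perf job could start from a cold cache is to flush dirty pages and then write to the Linux drop_caches control file before each run, as the linked answer describes. The sketch below is an assumption about how that step might be scripted, not part of any existing Impala job: the function name is hypothetical, the path parameter exists only so the helper can be exercised without root, and writing to the real file requires root privileges.

```python
import os

# Linux control file; writing "3" frees the page cache plus dentries and inodes.
DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def drop_os_caches(path=DROP_CACHES_PATH):
    """Flush dirty pages, then ask the kernel to drop clean caches.

    Hypothetical helper for illustration. Only clean pages can be
    dropped, which is why os.sync() is called first.
    """
    os.sync()
    with open(path, "w") as f:
        f.write("3\n")
```

This only clears the cache on the host where it runs, so on a cluster it would have to be run on every node that holds the data.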




[jira] [Resolved] (IMPALA-3743) Kudu scale testing

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3743.
---
Resolution: Later
  Assignee: (was: Mostafa Mokhtar)

> Kudu scale testing
> --
>
> Key: IMPALA-3743
> URL: https://issues.apache.org/jira/browse/IMPALA-3743
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Kudu_Impala
>Reporter: Matthew Jacobs
>Priority: Critical
>  Labels: kudu, test-infra
>
> TBD: scale testing requirements
> Short term: manual scale testing




[jira] [Commented] (IMPALA-3743) Kudu scale testing

2018-07-11 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540388#comment-16540388
 ] 

Tim Armstrong commented on IMPALA-3743:
---

Unclear what the scope was. More testing is good but no point in keeping 
non-specific JIRAs open.

> Kudu scale testing
> --
>
> Key: IMPALA-3743
> URL: https://issues.apache.org/jira/browse/IMPALA-3743
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Kudu_Impala
>Reporter: Matthew Jacobs
>Priority: Critical
>  Labels: kudu, test-infra
>
> TBD: scale testing requirements
> Short term: manual scale testing




[jira] [Assigned] (IMPALA-4769) Add test for the fix in IMPALA-4765

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-4769:
-

Assignee: (was: Mostafa Mokhtar)

> Add test for the fix in IMPALA-4765
> ---
>
> Key: IMPALA-4769
> URL: https://issues.apache.org/jira/browse/IMPALA-4769
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 2.8.0
>Reporter: Alexander Behm
>Priority: Major
>  Labels: test-infra
>
> The bug in IMPALA-4765 was a concurrency issue that is currently difficult to 
> test for. We should think about how to test the TableLoadingMgr to guard 
> against regressions. A JUnit test seems suitable, but would require 
> restructuring the existing code somewhat to make it amenable to unit testing.




[jira] [Assigned] (IMPALA-2939) Nested Types : Address Runtime & Scoped timer overhead

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2939:
-

Assignee: (was: Mostafa Mokhtar)

> Nested Types : Address Runtime & Scoped timer overhead
> --
>
> Key: IMPALA-2939
> URL: https://issues.apache.org/jira/browse/IMPALA-2939
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.4.0
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: nested_types, performance
> Attachments: Bottom-Up-HotFunctions.csv, Top-Down-HotFunctions.csv, 
> nestedTypesQ1.zip
>
>
> For the following query, about 45% of the time is spent updating timers and 
> the RuntimeProfile and checking query state. Because nested types don't always 
> operate on batches, the overhead of updating counters is amplified. 
> {code}
> select 
> l.l_shipdate, count(*) as wins
> from
> customer.c_orders o,
> o.o_lineitems l
> where
> o_orderdate = '1993-12-12'
> group by l.l_shipdate
> order by wins;
> {code}
> ||Function||Effective Time by Utilization||
> |clock_gettime (librt.so.1)|29.8%|
> |impala::RuntimeProfile::Counter::Add|5.3%|
> |std::map std::less, std::allocator impala::RuntimeProfile::Counter*>>>::operator[]| 4.7%|
> |impala::RuntimeState::CheckQueryState|   3.2%|
> |impala::MonotonicStopWatch::Stop|2.7%|
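
To illustrate why per-row timer updates are so much more expensive than per-batch updates, here is a rough sketch that only counts clock reads. It is not Impala code: CountingClock, process_per_row and process_per_batch are hypothetical names, and the batch size of 1024 is an assumption for the example.

```python
import time


class CountingClock:
    """Wraps time.monotonic_ns and counts how often it is read."""

    def __init__(self):
        self.calls = 0

    def now(self):
        self.calls += 1
        return time.monotonic_ns()


def process_per_row(rows, clock):
    """Start and stop a timer around every row: two clock reads per row."""
    total_ns = 0
    for _ in rows:
        start = clock.now()
        # ... per-row work would go here ...
        total_ns += clock.now() - start
    return total_ns


def process_per_batch(rows, clock, batch_size=1024):
    """Start and stop a timer once per batch: two clock reads per batch."""
    total_ns = 0
    for i in range(0, len(rows), batch_size):
        start = clock.now()
        for _ in rows[i:i + batch_size]:
            pass  # ... per-row work would go here ...
        total_ns += clock.now() - start
    return total_ns
```

For 4096 rows the per-row variant reads the clock 8192 times, while the per-batch variant with batch_size=1024 reads it 8 times, which is the kind of amplification the profile above suggests.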




[jira] [Resolved] (IMPALA-5587) ReleaseResources() should not destroy control structures

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-5587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5587.
---
   Resolution: Fixed
Fix Version/s: Impala 2.13.0

I couldn't find any remaining examples of this pattern.

> ReleaseResources() should not destroy control structures
> 
>
> Key: IMPALA-5587
> URL: https://issues.apache.org/jira/browse/IMPALA-5587
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend
>Reporter: Dan Hecht
>Priority: Major
>  Labels: query-lifecycle
> Fix For: Impala 2.13.0
>
>
> For example:
> {code:title=RuntimeState::ReleaseResources()}
> void RuntimeState::ReleaseResources() {
>   UnregisterReaderContexts();
>   if (filter_bank_ != nullptr) filter_bank_->Close();
>   if (resource_pool_ != nullptr) {
>     exec_env_->thread_mgr()->UnregisterPool(resource_pool_);
>   }
>   block_mgr_.reset(); // Release any block mgr memory, if this is the last reference.
>   codegen_.reset(); // Release any memory associated with codegen.
>   // Release the reservation, which should be unused at the point.
>   if (instance_buffer_reservation_ != nullptr) instance_buffer_reservation_->Close();
>   // 'query_mem_tracker()' must be valid as long as 'instance_mem_tracker_' is so
>   // delete 'instance_mem_tracker_' first.
>   // LogUsage() walks the MemTracker tree top-down when the memory limit is exceeded, so
>   // break the link between 'instance_mem_tracker_' and its parent before
>   // 'instance_mem_tracker_' and its children are destroyed.
>   instance_mem_tracker_->UnregisterFromParent();   // <===
>   if (instance_mem_tracker_->consumption() != 0) {
>     LOG(WARNING) << "Query " << query_id() << " may have leaked memory." << endl
>                  << instance_mem_tracker_->LogUsage();
>   }
>   instance_mem_tracker_.reset(); // <===
>   if (local_query_state_.get() != nullptr) {
>     // if we created this QueryState, we must call ReleaseResources()
>     local_query_state_->ReleaseResources();
>   }
> }
> {code}




[jira] [Assigned] (IMPALA-5802) COMPUTE STATS uses MT_DOP=4 by default

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-5802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-5802:
-

Assignee: (was: Alexander Behm)

> COMPUTE STATS uses MT_DOP=4 by default
> --
>
> Key: IMPALA-5802
> URL: https://issues.apache.org/jira/browse/IMPALA-5802
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Backend, Frontend
>Affects Versions: Impala 2.9.0
>Reporter: Alexander Behm
>Priority: Major
>  Labels: compute-stats
>
> Now that IMPALA-3905 has been completely addressed we should run COMPUTE 
> STATS with MT_DOP=4 by default, regardless of file format. The motivation is 
> consistency and speeding up COMPUTE STATS in most cases.
> This task is a continuation of IMPALA-4572.




[jira] [Assigned] (IMPALA-484) Some functions always return same data type while, according to MSDN, they should return same data type as input data

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-484:


Assignee: (was: Alexander Behm)

> Some functions always return same data type while, according to MSDN, they 
> should return same data type as input data
> -
>
> Key: IMPALA-484
> URL: https://issues.apache.org/jira/browse/IMPALA-484
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.0.1
>Reporter: James Deng
>Priority: Minor
>
> ABS always returns double
> ROUND always returns double
> SIGN always returns float
> FLOOR always returns bigint
> CEILING always returns bigint




[jira] [Assigned] (IMPALA-3264) FrontEnd Plan Serialization time

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-3264:
-

Assignee: (was: Mostafa Mokhtar)

> FrontEnd Plan Serialization time
> 
>
> Key: IMPALA-3264
> URL: https://issues.apache.org/jira/browse/IMPALA-3264
> Project: IMPALA
>  Issue Type: Task
>  Components: Perf Investigation
>Affects Versions: Impala 2.2.4
>Reporter: Alan Choi
>Priority: Minor
>
> It has been observed that the "planning timeline" in FE is only ~500ms, but 
> the "Finished Planning" time in the query profile timeline is ~2sec. Such a 
> big discrepancy could be due to the plan serialization time. Investigate 
> potential issues in plan serialization.




[jira] [Commented] (IMPALA-484) Some functions always return same data type while, according to MSDN, they should return same data type as input data

2018-07-11 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540403#comment-16540403
 ] 

Tim Armstrong commented on IMPALA-484:
--

[~tarasbob] is this fixed by your decimal_v2 work?

> Some functions always return same data type while, according to MSDN, they 
> should return same data type as input data
> -
>
> Key: IMPALA-484
> URL: https://issues.apache.org/jira/browse/IMPALA-484
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 1.0.1
>Reporter: James Deng
>Priority: Minor
>
> ABS always returns double
> ROUND always returns double
> SIGN always returns float
> FLOOR always returns bigint
> CEILING always returns bigint




[jira] [Updated] (IMPALA-558) HS2::FetchResults sets hasMoreRows on first call even when 0 rows are returned

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-558:
-
Labels: query-lifecycle  (was: )

> HS2::FetchResults sets hasMoreRows on first call even when 0 rows are returned
> --
>
> Key: IMPALA-558
> URL: https://issues.apache.org/jira/browse/IMPALA-558
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 1.1
>Reporter: Henry Robinson
>Assignee: Alexander Behm
>Priority: Minor
>  Labels: query-lifecycle
>
> The first call to {{FetchResults}} always sets {{hasMoreRows}} even when 0 
> rows should be returned. The next call correctly sets {{hasMoreRows == 
> False}}. The upshot is there's always an extra round-trip, although 
> correctness isn't affected.
> {code}
> execute_statement_req = TCLIService.TExecuteStatementReq()
> execute_statement_req.sessionHandle = resp.sessionHandle
> execute_statement_req.statement = "SELECT COUNT(*) FROM functional.alltypes WHERE 1 = 2"
> execute_statement_resp = self.hs2_client.ExecuteStatement(execute_statement_req)
> 
> fetch_results_req = TCLIService.TFetchResultsReq()
> fetch_results_req.operationHandle = execute_statement_resp.operationHandle
> fetch_results_req.maxRows = 100
> fetch_results_resp = self.hs2_client.FetchResults(fetch_results_req)
> 
> assert not fetch_results_resp.hasMoreRows # Fails
> {code}
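
The extra round-trip can be seen from the client side with a small stand-in. This is a hedged sketch, not the HS2 Thrift API: FakeHs2Client and FakeFetchResponse are hypothetical, the fake FetchResults takes a plain max_rows argument instead of a TFetchResultsReq, and it merely mimics the behaviour described above.

```python
class FakeFetchResponse:
    def __init__(self, rows, has_more_rows):
        self.rows = rows
        self.hasMoreRows = has_more_rows


class FakeHs2Client:
    """Stand-in that mimics the behaviour reported in this issue."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._first_call = True
        self.round_trips = 0

    def FetchResults(self, max_rows):
        self.round_trips += 1
        batch, self._rows = self._rows[:max_rows], self._rows[max_rows:]
        # Reported behaviour: the first response always claims more rows.
        has_more = True if self._first_call else bool(self._rows)
        self._first_call = False
        return FakeFetchResponse(batch, has_more)


def fetch_all(client, max_rows=100):
    """Typical client loop: keep fetching until hasMoreRows is False."""
    rows = []
    while True:
        resp = client.FetchResults(max_rows)
        rows.extend(resp.rows)
        if not resp.hasMoreRows:
            return rows
```

With an empty result set, fetch_all still needs two calls to FetchResults before it can stop, even though the first response already carried every row there was.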




[jira] [Resolved] (IMPALA-3264) FrontEnd Plan Serialization time

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-3264.
---
Resolution: Cannot Reproduce

[~a...@cloudera.com] closing for now, please reopen if you have a repro query.

> FrontEnd Plan Serialization time
> 
>
> Key: IMPALA-3264
> URL: https://issues.apache.org/jira/browse/IMPALA-3264
> Project: IMPALA
>  Issue Type: Task
>  Components: Perf Investigation
>Affects Versions: Impala 2.2.4
>Reporter: Alan Choi
>Priority: Minor
>
> It has been observed that the "planning timeline" in FE is only ~500ms, but 
> the "Finished Planning" time in the query profile timeline is ~2sec. Such a 
> big discrepancy could be due to the plan serialization time. Investigate 
> potential issues in plan serialization.




[jira] [Assigned] (IMPALA-2138) Get rid of unused columns by upstream operators at points of materialization

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2138:
-

Assignee: (was: Alexander Behm)

> Get rid of unused columns by upstream operators at points of materialization
> 
>
> Key: IMPALA-2138
> URL: https://issues.apache.org/jira/browse/IMPALA-2138
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.4, Impala 2.0, Impala 2.2
>Reporter: Ippokratis Pandis
>Priority: Critical
>  Labels: performance
>
> It would be a significant performance improvement if we could drop columns as 
> soon as we know that no upstream operator will use them. Handling less data 
> makes network and I/O (spilling) transfers more efficient and also improves 
> cache performance. 
> The current row-wise in-memory format does not make it easy to drop such 
> unused columns. However, there are points of materialization where we copy 
> out the tuples and can perform these projections. There are multiple points 
> of materialization, notably:
> * The exchange operator
> * The build side of hash join
> * The probe side of hash join when we have spilling
> * The aggregation
> * Sorts and analytic function evaluation
> In order to do these projections we need to modify the FE so that it knows, 
> at each operator, the minimum set of columns referenced by this operator and 
> all the upstream ones. (That minimum set is easy to calculate during an 
> additional top-down traversal of the plan.) We also need to modify the BE to 
> make the copy-out operation aware of such projections.
> Assigning first to Alex, because of the needed FE changes. Happy to take care 
> of the needed BE changes. Perhaps we could split this issue into 2 sub-tasks, 
> the FE and the BE changes.
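
The top-down traversal mentioned in the description could look roughly like the sketch below. It is an illustration under simplifying assumptions, not planner code: PlanNode and assign_required_columns are hypothetical names, columns are plain strings, and columns produced by an operator (for example aggregate outputs) are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    name: str
    referenced: frozenset              # columns this operator itself reads
    children: list = field(default_factory=list)
    required: frozenset = frozenset()  # filled in by the traversal


def assign_required_columns(node, required_upstream=frozenset()):
    """Top-down pass: a node must keep what it references itself plus
    everything referenced by the operators above (upstream of) it."""
    node.required = node.referenced | required_upstream
    for child in node.children:
        assign_required_columns(child, node.required)
```

For a plan where an aggregate that references l_shipdate sits above an exchange, which sits above a scan whose predicate references l_quantity, the exchange ends up with required == {"l_shipdate"}, so its copy-out could drop l_quantity at that point of materialization.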




[jira] [Updated] (IMPALA-2138) Get rid of unused columns by upstream operators at points of materialization

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-2138:
--
Attachment: 0001-Projection-prototype.patch

> Get rid of unused columns by upstream operators at points of materialization
> 
>
> Key: IMPALA-2138
> URL: https://issues.apache.org/jira/browse/IMPALA-2138
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.4, Impala 2.0, Impala 2.2
>Reporter: Ippokratis Pandis
>Priority: Critical
>  Labels: performance
> Attachments: 0001-Projection-prototype.patch
>
>




[jira] [Assigned] (IMPALA-967) Investigate wide table performance

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-967:


Assignee: (was: Mostafa Mokhtar)

> Investigate wide table performance
> --
>
> Key: IMPALA-967
> URL: https://issues.apache.org/jira/browse/IMPALA-967
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Perf Investigation
>Affects Versions: Impala 1.3
>Reporter: Skye Wanderman-Milne
>Priority: Minor
>  Labels: codegen
>
> Querying wide tables (very roughly 1000+ columns) is very slow. It looks like 
> the time is spent in planning and/or codegen, and that the time increases 
> worse than linearly with the number of columns.




[jira] [Assigned] (IMPALA-5029) Remove un-necessary exchange operators from scan+agg queries when hosts=1

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-5029:
-

Assignee: (was: Alexander Behm)

> Remove un-necessary exchange operators from scan+agg queries when hosts=1
> -
>
> Key: IMPALA-5029
> URL: https://issues.apache.org/jira/browse/IMPALA-5029
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.6.0
>Reporter: Mostafa Mokhtar
>Priority: Minor
>
> Lightweight queries that consist of a small scan followed by an aggregate with 
> hosts=1 can be optimized by doing a single final aggregate as opposed to 
> streaming -> exchange -> final.
> {code}
> select count (distinct n_nationkey) from nation
> {code}
> Plan
> {code}
> +----------------------------------------------------------+
> | Explain String                                           |
> +----------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=36.00MB VCores=2 |
> |                                                          |
> | PLAN-ROOT SINK                                           |
> | |                                                        |
> | 06:AGGREGATE [FINALIZE]                                  |
> | |  output: count:merge(n_nationkey)                      |
> | |  hosts=1 per-host-mem=unavailable                      |
> | |  tuple-ids=2 row-size=8B cardinality=1                 |
> | |                                                        |
> | 05:EXCHANGE [UNPARTITIONED]                              |
> | |  hosts=1 per-host-mem=unavailable                      |
> | |  tuple-ids=2 row-size=8B cardinality=1                 |
> | |                                                        |
> | 02:AGGREGATE                                             |
> | |  output: count(n_nationkey)                            |
> | |  hosts=1 per-host-mem=10.00MB                          |
> | |  tuple-ids=2 row-size=8B cardinality=1                 |
> | |                                                        |
> | 04:AGGREGATE                                             |
> | |  group by: n_nationkey                                 |
> | |  hosts=1 per-host-mem=10.00MB                          |
> | |  tuple-ids=1 row-size=8B cardinality=25                |
> | |                                                        |
> | 03:EXCHANGE [HASH(n_nationkey)]                          |
> | |  hosts=1 per-host-mem=0B                               |
> | |  tuple-ids=1 row-size=8B cardinality=25                |
> | |                                                        |
> | 01:AGGREGATE [STREAMING]                                 |
> | |  group by: n_nationkey                                 |
> | |  hosts=1 per-host-mem=10.00MB                          |
> | |  tuple-ids=1 row-size=8B cardinality=25                |
> | |                                                        |
> | 00:SCAN HDFS [tpch_300_parquet.nation, RANDOM]           |
> |    partitions=1/1 files=1 size=2.19KB                    |
> |    table stats: 25 rows total                            |
> |    column stats: all                                     |
> |    hosts=1 per-host-mem=16.00MB                          |
> |    tuple-ids=0 row-size=8B cardinality=25                |
> +----------------------------------------------------------+
> {code}




[jira] [Resolved] (IMPALA-5029) Remove un-necessary exchange operators from scan+agg queries when hosts=1

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-5029.
---
Resolution: Later

> Remove un-necessary exchange operators from scan+agg queries when hosts=1
> -
>
> Key: IMPALA-5029
> URL: https://issues.apache.org/jira/browse/IMPALA-5029
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.6.0
>Reporter: Mostafa Mokhtar
>Priority: Minor
>




[jira] [Commented] (IMPALA-5029) Remove un-necessary exchange operators from scan+agg queries when hosts=1

2018-07-11 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540413#comment-16540413
 ] 

Tim Armstrong commented on IMPALA-5029:
---

This could improve latency in some very specific edge cases but probably isn't 
worth tracking unless we find an example where this provides a big gain.

> Remove un-necessary exchange operators from scan+agg queries when hosts=1
> -
>
> Key: IMPALA-5029
> URL: https://issues.apache.org/jira/browse/IMPALA-5029
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.6.0
>Reporter: Mostafa Mokhtar
>Priority: Minor
>




[jira] [Assigned] (IMPALA-5727) Join Order Optimization time increases non-linearly with the number of tables

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-5727:
-

Assignee: (was: Alexander Behm)

> Join Order Optimization time increases non-linearly with the number of tables
> -
>
> Key: IMPALA-5727
> URL: https://issues.apache.org/jira/browse/IMPALA-5727
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Alan Choi
>Priority: Major
>  Labels: performance, planner
>
> The planning time (believed to be in join order optimization) increases 
> non-linearly with increasing number of tables in the join. By increasing the 
> number of tables in the join from 5 to 10, the planning time increases from 
> 200ms to 700+ms.
> For queries on small data, a planning time of 700+ms is significant.




[jira] [Commented] (IMPALA-5727) Join Order Optimization time increases non-linearly with the number of tables

2018-07-11 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540423#comment-16540423
 ] 

Tim Armstrong commented on IMPALA-5727:
---

[~alanchoi] see Alex's previous comment. What's the goal here? Is it to reduce 
planning time for queries on small data? 

Maybe the problem is misstated - we can't make the join order optimisation 
linear but we could disable it or do a lightweight version if the amount of 
data is very small.
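
A rough sketch of what "disable it or do a lightweight version" might mean. Everything here is an assumption for illustration: choose_join_order, the 10,000-row threshold and the placeholder heuristic on the slow path are hypothetical and do not describe Impala's planner.

```python
def choose_join_order(tables, small_data_rows=10_000):
    """Pick a join order for a list of (name, estimated_row_count) pairs,
    given in the order they were written in the query."""
    total_rows = sum(rows for _, rows in tables)
    if total_rows <= small_data_rows:
        # Lightweight path: keep the query's own order, no search at all.
        return [name for name, _ in tables]
    # Stand-in for the full optimisation: largest table leftmost,
    # remaining tables in ascending size order.
    ordered = sorted(tables, key=lambda t: t[1])
    largest = ordered.pop()
    return [largest[0]] + [name for name, _ in ordered]
```

The point is only the early exit: when the estimated input is tiny, the time saved by a better join order is unlikely to pay back the planning time spent finding it.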

> Join Order Optimization time increases non-linearly with the number of tables
> -
>
> Key: IMPALA-5727
> URL: https://issues.apache.org/jira/browse/IMPALA-5727
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Alan Choi
>Assignee: Alexander Behm
>Priority: Major
>  Labels: performance, planner
>
> The planning time (believed to be in join order optimization) increases 
> non-linearly with increasing number of tables in the join. By increasing the 
> number of tables in the join from 5 to 10, the planning time increases from 
> 200ms to 700+ms.
> For queries on small data, a planning time of 700+ms is significant.




[jira] [Assigned] (IMPALA-2603) Incorrect results and plan for inline view referencing several collection types correlated with different ancestor blocks

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2603:
-

Assignee: (was: Alexander Behm)

> Incorrect results and plan for inline view referencing several collection 
> types correlated with different ancestor blocks
> -
>
> Key: IMPALA-2603
> URL: https://issues.apache.org/jira/browse/IMPALA-2603
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.3.0
>Reporter: Taras Bobrovytsky
>Priority: Critical
>  Labels: correctness, crash, downgraded, nested_types, 
> query_generator
>
> *Problem*
> Queries with multiple nested inline views that have correlated references to 
> nested collections (relative table references), can return incorrect results 
> in RELEASE and hit a DCHECK in debug under the following condition:
> * There is an inline view that references multiple nested collections which 
> come from different ancestor blocks at different levels of nesting.
> * In the example below, Impala fails to generate a correct plan for the "a" 
> inline view because the references "t5" and "t6" reference different ancestor 
> query blocks at different nesting levels.
> Query:
> {code}
> SELECT
>   1
> FROM 
>   customer t1
>   INNER JOIN (
> SELECT 
>   1
> FROM
>   t1.c_orders t2
>   INNER JOIN (
> SELECT
>   1
> FROM 
>   t2.o_lineitems t5 
>   INNER JOIN t1.c_orders t6
>) as a
>   ) as b;
> {code}
> Wrong Query Plan:
> {code}
> +------------------------------------------------------------------------------------+
> | Explain String                                                                     |
> +------------------------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=176.00MB VCores=1                          |
> | WARNING: The following tables are missing relevant table and/or column statistics. |
> | tpch_nested_parquet.customer                                                       |
> |                                                                                    |
> | 05:EXCHANGE [UNPARTITIONED]                                                        |
> | |                                                                                  |
> | 01:SUBPLAN                                                                         |
> | |                                                                                  |
> | |--04:NESTED LOOP JOIN [CROSS JOIN]                                                |
> | |  |                                                                               |
> | |  |--02:SINGULAR ROW SRC                                                          |
> | |  |                                                                               |
> | |  03:UNNEST [t1.c_orders t2]                                                      |
> | |                                                                                  |
> | 00:SCAN HDFS [tpch_nested_parquet.customer t1]                                     |
> |    partitions=1/1 files=4 size=554.13MB                                            |
> +------------------------------------------------------------------------------------+
> {code}
> Stack Trace:
> {code}
> #0  0x7f5c10cf5cc9 in __GI_raise (sig=sig@entry=6) at 
> ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x7f5c10cf90d8 in __GI_abort () at abort.c:89
> #2  0x02144d09 in google::DumpStackTraceAndExit () at 
> src/utilities.cc:147
> #3  0x0213ddbd in google::LogMessage::Fail () at src/logging.cc:1315
> #4  0x0213fc45 in google::LogMessage::SendToLog (this=0x7f5b9ef08e00) 
> at src/logging.cc:1269
> #5  0x0213d913 in google::LogMessage::Flush 
> (this=this@entry=0x7f5b9ef08e00) at src/logging.cc:1138
> #6  0x0214059e in google::LogMessageFatal::~LogMessageFatal 
> (this=0x7f5b9ef08e00, __in_chrg=) at src/logging.cc:1836
> #7  0x01586657 in impala::Coordinator::ValidateCollectionSlots 
> (this=0xc49ca00, batch=0xc3b3e00) at 
> /home/dev/Impala/be/src/runtime/coordinator.cc:911
> #8  0x0158638d in impala::Coordinator::GetNext (this=0xc49ca00, 
> batch=0x7dd3bd0, state=0xd934400) at 
> /home/dev/Impala/be/src/runtime/coordinator.cc:890
> #9  0x013710c3 in 
> impala::ImpalaServer::QueryExecState::FetchNextBatch (this=0x7dd2000) at 
> /home/dev/Impala/be/src/service/query-exec-state.cc:877
> #10 0x0136f169 in 
> impala::ImpalaServer::QueryExecState::FetchRowsInternal (this=0x7dd2000, 
> max_rows=1024, fetched_ro

[jira] [Commented] (IMPALA-2913) Impalad fails to load tables in test runs

2018-07-11 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540424#comment-16540424
 ] 

Tim Armstrong commented on IMPALA-2913:
---

[~vukercegovac] any idea what's going on here? Looks like it got dropped.

> Impalad fails to load tables in test runs
> -
>
> Key: IMPALA-2913
> URL: https://issues.apache.org/jira/browse/IMPALA-2913
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Henry Robinson
>Assignee: Vuk Ercegovac
>Priority: Major
> Fix For: Impala 2.5.0
>
> Attachments: catalog.stacks.out, gdb.txt.zip, impalad.stacks.out
>
>
> Build: http://sandbox.jenkins.cloudera.com/job/impala-master-cdh5-trunk/1714/
> {code:title=impalad}
> I0130 10:12:40.874454 24955 Frontend.java:864] analyze query drop database if 
> exists `test_parquet_list_encodings_793898492` cascade
> I0130 10:13:41.530505 24944 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:13:41.532280 24944 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer_demographics
> I0130 10:13:43.693908 24948 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:13:43.697772 24948 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.household_demographics, tpcds_parquet.customer
> I0130 10:13:51.970826 24928 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:13:51.971350 24928 Frontend.java:808] Requesting prioritized load of 
> table(s): functional_avro_snap.alltypestiny
> I0130 10:14:10.715591 24925 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:14:10.718114 24925 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer, tpcds_parquet.promotion
> I0130 10:15:41.542430 24944 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:15:41.544220 24944 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer_demographics
> I0130 10:15:43.707329 24948 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:15:43.711132 24948 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.household_demographics, tpcds_parquet.customer
> I0130 10:15:51.980530 24928 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:15:51.981075 24928 Frontend.java:808] Requesting prioritized load of 
> table(s): functional_avro_snap.alltypestiny
> I0130 10:16:10.728305 24925 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:16:10.730826 24925 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer, tpcds_parquet.promotion
> I0130 10:17:41.554452 24944 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:17:41.556248 24944 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer_demographics
> I0130 10:17:43.720675 24948 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:17:43.724488 24948 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.household_demographics, tpcds_parquet.customer
> I0130 10:17:51.990448 24928 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:17:51.990978 24928 Frontend.java:808] Requesting prioritized load of 
> table(s): functional_avro_snap.alltypestiny
> I0130 10:18:10.740888 24925 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:18:10.743422 24925 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer, tpcds_parquet.promotion
> I0130 10:19:41.566427 24944 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:19:41.568217 24944 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer_demographics
> I0130 10:19:43.733989 24948 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:19:43.737788 24948 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.household_demographics, tpcds_parquet.customer
> I0130 10:19:52.59 24928 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:19:52

[jira] [Assigned] (IMPALA-2913) Impalad fails to load tables in test runs

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2913:
-

Assignee: Vuk Ercegovac  (was: Dimitris Tsirogiannis)

> Impalad fails to load tables in test runs
> -
>
> Key: IMPALA-2913
> URL: https://issues.apache.org/jira/browse/IMPALA-2913
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Henry Robinson
>Assignee: Vuk Ercegovac
>Priority: Major
> Fix For: Impala 2.5.0
>
> Attachments: catalog.stacks.out, gdb.txt.zip, impalad.stacks.out
>
>
> Build: http://sandbox.jenkins.cloudera.com/job/impala-master-cdh5-trunk/1714/
> {code:title=impalad}
> I0130 10:12:40.874454 24955 Frontend.java:864] analyze query drop database if 
> exists `test_parquet_list_encodings_793898492` cascade
> I0130 10:13:41.530505 24944 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:13:41.532280 24944 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer_demographics
> I0130 10:13:43.693908 24948 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:13:43.697772 24948 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.household_demographics, tpcds_parquet.customer
> I0130 10:13:51.970826 24928 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:13:51.971350 24928 Frontend.java:808] Requesting prioritized load of 
> table(s): functional_avro_snap.alltypestiny
> I0130 10:14:10.715591 24925 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:14:10.718114 24925 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer, tpcds_parquet.promotion
> I0130 10:15:41.542430 24944 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:15:41.544220 24944 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer_demographics
> I0130 10:15:43.707329 24948 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:15:43.711132 24948 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.household_demographics, tpcds_parquet.customer
> I0130 10:15:51.980530 24928 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:15:51.981075 24928 Frontend.java:808] Requesting prioritized load of 
> table(s): functional_avro_snap.alltypestiny
> I0130 10:16:10.728305 24925 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:16:10.730826 24925 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer, tpcds_parquet.promotion
> I0130 10:17:41.554452 24944 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:17:41.556248 24944 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer_demographics
> I0130 10:17:43.720675 24948 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:17:43.724488 24948 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.household_demographics, tpcds_parquet.customer
> I0130 10:17:51.990448 24928 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:17:51.990978 24928 Frontend.java:808] Requesting prioritized load of 
> table(s): functional_avro_snap.alltypestiny
> I0130 10:18:10.740888 24925 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:18:10.743422 24925 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer, tpcds_parquet.promotion
> I0130 10:19:41.566427 24944 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:19:41.568217 24944 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer_demographics
> I0130 10:19:43.733989 24948 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:19:43.737788 24948 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.household_demographics, tpcds_parquet.customer
> I0130 10:19:52.59 24928 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:19:52.000630 24928 Frontend.java:808] Requesting prioritized load of 
>

[jira] [Assigned] (IMPALA-4373) Wrong results with correlated WHERE-clause subquery inside a NULL-checking conditional function.

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-4373:
-

Assignee: (was: Alexander Behm)

> Wrong results with correlated WHERE-clause subquery inside a NULL-checking 
> conditional function.
> 
>
> Key: IMPALA-4373
> URL: https://issues.apache.org/jira/browse/IMPALA-4373
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.5.0, Impala 2.4.0, Impala 2.6.0, Impala 2.7.0, 
> Impala 2.8.0, Impala 2.9.0
>Reporter: Alexander Behm
>Priority: Critical
>  Labels: correctness
>
> Impala may generate an incorrect plan for queries that have a correlated 
> scalar subquery as a parameter to a NULL-checking conditional function like 
> ISNULL().
> Example query and incorrect plan:
> {code}
> select t1.int_col
> from functional.alltypessmall as t1
> where t1.int_col >= isnull
> (
>(
> SELECT 
>  MAX(t2.bigint_col)
> FROM 
>  functional.alltypestiny AS t2 
> WHERE 
>  t1.id = t2.id + 1
> ),
>0  
> )
> Fetched 0 row(s) in 1.09s
> Single-node plan:
> +-----------------------------------------------------------------------+
> | Explain String                                                        |
> +-----------------------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=0B VCores=0                   |
> |                                                                       |
> | PLAN-ROOT SINK                                                        |
> | |                                                                     |
> | 03:HASH JOIN [LEFT SEMI JOIN]                                         |
> | |  hash predicates: t1.id = t2.id + 1                                 |
> | |  other join predicates: t1.int_col >= isnull(max(t2.bigint_col), 0) |
> | |  runtime filters: RF000 <- t2.id + 1                                |
> | |                                                                     |
> | |--02:AGGREGATE [FINALIZE]                                            |
> | |  |  output: max(t2.bigint_col)                                      |
> | |  |  group by: t2.id                                                 |
> | |  |                                                                  |
> | |  01:SCAN HDFS [functional.alltypestiny t2]                          |
> | |     partitions=4/4 files=4 size=460B                                |
> | |                                                                     |
> | 00:SCAN HDFS [functional.alltypessmall t1]                            |
> |    partitions=4/4 files=4 size=6.32KB                                 |
> |    runtime filters: RF000 -> t1.id                                    |
> +-----------------------------------------------------------------------+
> {code}
> The query returns an empty result set but should instead return all rows from 
> t1 because all invocations of the subquery return NULL, and all rows from t1 
> satisfy "t1.int_col >= 0".
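
The expected semantics can be checked outside Impala. The sketch below is not from the issue: it assumes SQLite (through Python's sqlite3 module) as a stand-in engine, uses small hypothetical tables t1 and t2 in place of functional.alltypessmall and functional.alltypestiny, and uses COALESCE where the report uses ISNULL(). The data is chosen so that no t1 row has a match in t2, so every subquery invocation yields NULL. The second query is a hand-written approximation of the join-based plan shown above, not output from Impala's planner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, int_col INTEGER);
    CREATE TABLE t2 (id INTEGER, bigint_col INTEGER);
    -- Hypothetical data: t2.id + 1 is 1 or 2, which never equals a t1.id.
    INSERT INTO t1 VALUES (10, 0), (11, 1), (12, 2);
    INSERT INTO t2 VALUES (0, 0), (1, 10);
""")

# Reference semantics: the scalar subquery yields NULL for every t1 row,
# COALESCE turns that into 0, and every row satisfies int_col >= 0.
expected = conn.execute("""
    SELECT t1.int_col
    FROM t1
    WHERE t1.int_col >= COALESCE(
        (SELECT MAX(t2.bigint_col) FROM t2 WHERE t1.id = t2.id + 1), 0)
    ORDER BY t1.int_col
""").fetchall()

# Approximation of the incorrect plan: the join runs first, so t1 rows
# without a match are dropped before the NULL check is ever evaluated.
join_based = conn.execute("""
    SELECT t1.int_col
    FROM t1
    JOIN (SELECT id + 1 AS k, MAX(bigint_col) AS m FROM t2 GROUP BY id) v
      ON t1.id = v.k
    WHERE t1.int_col >= COALESCE(v.m, 0)
    ORDER BY t1.int_col
""").fetchall()

print(expected)    # rows kept when the NULL check sees the empty subquery result
print(join_based)  # rows kept when unmatched t1 rows are dropped by the join
```

With this data the first query keeps every t1 row and the second keeps none, which mirrors the discrepancy described in the report.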



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Assigned] (IMPALA-4040) Performance regression introduced by "IMPALA-3828 Join inversion"

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-4040:
-

Assignee: (was: Alexander Behm)

> Performance regression introduced by  "IMPALA-3828 Join inversion"
> --
>
> Key: IMPALA-4040
> URL: https://issues.apache.org/jira/browse/IMPALA-4040
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.7.0
>Reporter: Mostafa Mokhtar
>Priority: Major
>  Labels: regression
>
> IMPALA-3828 improved several TPC-DS queries but introduced a regression in 
> queries 10 and 17. 
> The regression in TPC-DS Q10 occurs because the runtime filters in the old 
> plan had a cascading effect that mimicked a bushy execution plan.
> Unfortunately, the planner does not account for filtering by runtime filters, 
> and one of the scan nodes is moved to a location where runtime filters no 
> longer have this cascading effect. 
> For TPC-DS Q10 before IMPALA-3828, RF001 created from customer_address was 
> pushed to customer, creating RF002, which is very selective and eventually 
> gets pushed to web_sales, catalog_sales and store_sales.
> After IMPALA-3828, RF001 still gets pushed to the customer table, but in the 
> new plan the customer scan node is the leftmost node in the subplan. As a 
> result, no runtime filters from customer get pushed to the *_sales fact 
> tables, which dominate the cost of the plan.
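
The cascading effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the table sizes, key distributions, the "selected_county" flag and the helper names build_filter and apply_filter are all hypothetical, and a runtime filter is modelled as an exact set of join keys rather than as whatever structure Impala actually uses.

```python
# Hypothetical toy tables; sizes and key distributions are made up.
customer_address = [{"ca_address_sk": i, "selected_county": i < 5} for i in range(100)]
customer = [{"c_customer_sk": i, "c_current_addr_sk": i % 100} for i in range(1000)]
store_sales = [{"ss_customer_sk": i % 1000} for i in range(10000)]


def build_filter(rows, key):
    """Model a runtime filter as the exact set of join keys on the build side."""
    return {row[key] for row in rows}


def apply_filter(rows, key, runtime_filter):
    """Keep only the scanned rows whose join key passes the filter."""
    return [row for row in rows if row[key] in runtime_filter]


# Old plan: RF001 (from customer_address) prunes customer, and RF002 (from the
# pruned customer rows) in turn prunes the fact table scan.
selected_addresses = [row for row in customer_address if row["selected_county"]]
rf001 = build_filter(selected_addresses, "ca_address_sk")
filtered_customer = apply_filter(customer, "c_current_addr_sk", rf001)
rf002 = build_filter(filtered_customer, "c_customer_sk")
sales_rows_old_plan = len(apply_filter(store_sales, "ss_customer_sk", rf002))

# New plan: customer is the leftmost (probe-side) input, so no filter is built
# from it and the fact table scan returns every row.
sales_rows_new_plan = len(store_sales)

print(len(filtered_customer), sales_rows_old_plan, sales_rows_new_plan)
```

In this model the fact table scan shrinks only when a filter built from the already filtered customer rows reaches it, which is the effect the new plan loses.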
> TPC-DS Q10
> {code}
> select  
>   cd_gender,
>   cd_marital_status,
>   cd_education_status,
>   count(*) cnt1,
>   cd_purchase_estimate,
>   count(*) cnt2,
>   cd_credit_rating,
>   count(*) cnt3,
>   cd_dep_count,
>   count(*) cnt4,
>   cd_dep_employed_count,
>   count(*) cnt5,
>   cd_dep_college_count,
>   count(*) cnt6
>  from
>   customer c,customer_address ca,customer_demographics,
>   (select ss_customer_sk
>   from store_sales,date_dim
>   where ss_sold_date_sk = d_date_sk and
> d_year = 2002 and
> d_moy between 2 and 2+3) ss,
> (select ws_bill_customer_sk
> from web_sales,date_dim
> where ws_sold_date_sk = d_date_sk and
>   d_year = 2002 and
>   d_moy between 2 and 2+3) ws,
>   (select cs_ship_customer_sk
> from catalog_sales,date_dim
> where cs_sold_date_sk = d_date_sk and
>   d_year = 2002 and
>   d_moy between 2 and 2+3) cs
>  where
>   c.c_current_addr_sk = ca.ca_address_sk and
>   ca_county in ('McKenzie County','Adams County','Grant County',
>     'Saguache County','Waseca County') and
>   cd_demo_sk = c.c_current_cdemo_sk 
>   and c_customer_sk = ss_customer_sk 
> and c_customer_sk = ws_bill_customer_sk 
>   and c_customer_sk = cs_ship_customer_sk 
>   
>   
>  group by cd_gender,
>   cd_marital_status,
>   cd_education_status,
>   cd_purchase_estimate,
>   cd_credit_rating,
>   cd_dep_count,
>   cd_dep_employed_count,
>   cd_dep_college_count
>  order by cd_gender,
>   cd_marital_status,
>   cd_education_status,
>   cd_purchase_estimate,
>   cd_credit_rating,
>   cd_dep_count,
>   cd_dep_employed_count,
>   cd_dep_college_count
> limit 100
> {code}
> Plan before change
> {code}
> 31:MERGING-EXCHANGE [UNPARTITIONED]
> |  order by: cd_gender ASC, cd_marital_status ASC, cd_education_status ASC, 
> cd_purchase_estimate ASC, cd_credit_rating ASC, cd_dep_count ASC, 
> cd_dep_employed_count ASC, cd_dep_college_count ASC
> |  limit: 100
> |  hosts=15 per-host-mem=unavailable
> |  tuple-ids=13 row-size=107B cardinality=100
> |
> 18:TOP-N [LIMIT=100]
> |  order by: cd_gender ASC, cd_marital_status ASC, cd_education_status ASC, 
> cd_purchase_estimate ASC, cd_credit_rating ASC, cd_dep_count ASC, 
> cd_dep_employed_count ASC, cd_dep_college_count ASC
> |  hosts=15 per-host-mem=10.41KB
> |  tuple-ids=13 row-size=107B cardinality=100
> |
> 30:AGGREGATE [FINALIZE]
> |  output: count:merge(*)
> |  group by: cd_gender, cd_marital_status, cd_education_status, 
> cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_count, 
> cd_dep_college_count
> |  hosts=15 per-host-mem=10.00MB
> |  tuple-ids=12 row-size=107B cardinality=2406
> |
> 29:EXCHANGE 
> [HASH(cd_gender,cd_marital_status,cd_education_status,cd_purchase_estimate,cd_credit_rating,cd_dep_count,cd_dep_employed_count,cd_dep_college_count)]
> |  hosts=15 per-host-mem=0B
> |  tuple-ids=12 row-size=107B cardinality=2406
> |
> 17:AGGREGATE [STREAMING]
> |  output: count(*)
> |  group by: cd_gender, cd_marital_status, cd_education_status, 
> cd_purchase_estimate, cd_credit_rating, cd_dep_count, cd_dep_employed_coun

[jira] [Resolved] (IMPALA-2953) code coverage reports not being generated for frontend

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-2953.
---
Resolution: Cannot Reproduce

> code coverage reports not being generated for frontend
> --
>
> Key: IMPALA-2953
> URL: https://issues.apache.org/jira/browse/IMPALA-2953
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.5.0
>Reporter: Michael Brown
>Assignee: Alexander Behm
>Priority: Minor
>
> The impala-master-code-coverage-cdh5 job is reporting backend code coverage, 
> but not frontend.




[jira] [Assigned] (IMPALA-3601) Why not to cache complex expression results , calculation is repeated a plurality of times

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-3601:
-

Assignee: (was: Alexander Behm)

> Why not to cache complex expression results  , calculation is repeated a 
> plurality of times
> ---
>
> Key: IMPALA-3601
> URL: https://issues.apache.org/jira/browse/IMPALA-3601
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: fishing
>Priority: Minor
> Attachments: image-2016-05-24-10-33-05-594.png
>
>
> select
>   UNIX_TIMESTAMP( CAST(REQUEST_MIN AS STRING), 'MMddHHmm' ) * 1000 AS REQUEST_TS,
>   REQUEST_MIN,
>   MEMBERID,
>   CATE3_ID,
>   CASE WHEN ROW_NUM > 1 THEN 1 ELSE 0 END AS IS_REPURCHASE
> FROM (
>   SELECT REQUEST_MIN, MEMBERID, CATE3_ID, max(ROW_NUM) AS ROW_NUM
>   FROM (
>     SELECT
>       REQUEST_MIN,
>       MEMBERID,
>       CASE WHEN C.CATE3_ID IS NULL THEN - 1 ELSE C.CATE3_ID END AS CATE3_ID,
>       ROW_NUMBER () OVER (
>         PARTITION BY REQUEST_MIN, MEMBERID, C.CATE3_ID ORDER BY REQUEST_MIN
>       ) AS ROW_NUM
>     FROM (
>       SELECT ORDERID, PRODUCTID
>       FROM CP_ORDERDETAIL
>       WHERE CREATEDATE >= '2016-05-04 00:00:00'
>         AND CREATEDATE < '2016-05-10 12:30:00'
>     ) OD
>     INNER JOIN (
>       SELECT
>         CAST( FROM_UNIXTIME( UNIX_TIMESTAMP(CREATEDATE), 'MMddHHmm' ) AS BIGINT ) AS REQUEST_MIN,
>         ID,
>         MEMBERID
>       FROM CP_ORDERS
>       WHERE CREATEDATE >= '2016-05-04 00:00:00'
>         AND CREATEDATE < '2016-05-10 12:30:00'
>         AND YN = 1
>     ) O ON OD.ORDERID = O.ID
>     LEFT JOIN [ SHUFFLE ] CP_SKU S ON OD.PRODUCTID = S.SKU_ID
>     LEFT JOIN [ SHUFFLE ] CP_PRODUCT P ON S.PRODUCT_ID = P.PRODUCT_ID
>     LEFT JOIN CATES C ON P.CATEGORY_ID = C.CATE3_ID
>   ) XX
>   WHERE REQUEST_MIN >= 201605101130
>   GROUP BY REQUEST_MIN, MEMBERID, CATE3_ID
> ) xx 
> !image-2016-05-24-10-33-05-594.png|thumbnail!




[jira] [Assigned] (IMPALA-2791) Avoid unnecessary two-phased aggregation.

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2791:
-

Assignee: (was: Alexander Behm)

> Avoid unnecessary two-phased aggregation.
> -
>
> Key: IMPALA-2791
> URL: https://issues.apache.org/jira/browse/IMPALA-2791
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.2, Impala 2.3.0
>Reporter: Alexander Behm
>Priority: Minor
>  Labels: performance, planner
>
> We perform a two-phased aggregation for evaluating distinct aggregate 
> expressions like count(distinct). However, if the distinct aggregate 
> expression is not referenced in enclosing query blocks, then the two-phased 
> aggregation is unnecessary and should be skipped.
> Example:
> {code}
>  explain select x from
>   (select count(int_col) x, count(distinct bigint_col) y from 
> functional.alltypes) v;
> +-----------------------------------------------------------+
> | Explain String                                            |
> +-----------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=170.00MB VCores=2 |
> |                                                           |
> | 06:AGGREGATE [FINALIZE]                                   |
> | |  output: count:merge(bigint_col), count:merge(int_col)  |
> | |                                                         |
> | 05:EXCHANGE [UNPARTITIONED]                               |
> | |                                                         |
> | 02:AGGREGATE                                              |
> | |  output: count(bigint_col), count:merge(int_col)        |
> | |                                                         |
> | 04:AGGREGATE                                              |
> | |  output: count:merge(int_col)                           |
> | |  group by: bigint_col                                   |
> | |                                                         |
> | 03:EXCHANGE [HASH(bigint_col)]                            |
> | |                                                         |
> | 01:AGGREGATE                                              |
> | |  output: count(int_col)                                 |
> | |  group by: bigint_col                                   |
> | |                                                         |
> | 00:SCAN HDFS [functional.alltypes]                        |
> |    partitions=24/24 files=24 size=478.45KB                |
> +-----------------------------------------------------------+
> {code}
> In the query above, it is unnecessary to compute the "count(distinct 
> bigint_col)" aggregate expression, so a single-phased aggregation would be 
> sufficient.
> One way to fix this issue would be to defer creation of the AggregateInfo to 
> the planning phase where the materialization of aggregate expressions is 
> known. Currently, we create the AggregateInfo during analysis. Retroactively 
> "fixing" an AggregateInfo during planning to remover the two phases seems 
> complicated.
> This limitation inhibits other optimizations such as IMPALA-2499.
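
As a quick check of the claim that the distinct aggregate is dead work in this query shape, the sketch below runs the same two statements against SQLite through Python's sqlite3 module. The engine, the table contents and the variable names are assumptions for illustration; it says nothing about how Impala plans the query, only that the outer result does not depend on the distinct aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE alltypes (int_col INTEGER, bigint_col INTEGER)")
# Hypothetical data: 100 non-NULL rows plus one all-NULL row.
rows = [(i % 10, (i % 10) * 10) for i in range(100)] + [(None, None)]
conn.executemany("INSERT INTO alltypes VALUES (?, ?)", rows)

# Query from the issue: y (the distinct aggregate) is never referenced.
with_distinct = conn.execute("""
    SELECT x FROM
      (SELECT count(int_col) x, count(DISTINCT bigint_col) y FROM alltypes) v
""").fetchall()

# The same result computed without the distinct aggregate.
without_distinct = conn.execute(
    "SELECT count(int_col) FROM alltypes"
).fetchall()

print(with_distinct, without_distinct)
```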




[jira] [Assigned] (IMPALA-2796) Suboptimal join ordering due to greedy candidate selection.

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2796:
-

Assignee: (was: Alexander Behm)

> Suboptimal join ordering due to greedy candidate selection.
> ---
>
> Key: IMPALA-2796
> URL: https://issues.apache.org/jira/browse/IMPALA-2796
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.2, Impala 2.3.0
>Reporter: Alexander Behm
>Priority: Minor
>  Labels: performance, planning
>
> Consider the join order that we generate for TPCH-Q8 after fixing IMPALA-976:
> {code}
> hash join *region* on n1.n_regionkey = r_regionkey
> hash join nation n1 on c_nationkey = n1.n_nationkey
> hash join customer o_custkey = c_custkey
> hash join nation n2 s_nationkey = n2.n_nationkey
> hash join supplier on l_suppkey = s_suppkey
> hash join *order* l_orderkey = o_orderkey
> hash join *part* l_partkey = p_partkey
> lineitem
> {code}
> In the pseudo-plan above, the tables in bold have selective predicates 
> applied to the scans.
> Contrast the plan above with the following optimal join order:
> {code}
> hash join nation n2 s_nationkey = n2.n_nationkey
> hash join supplier on l_suppkey = s_suppkey
> hash join *region* on n1.n_regionkey = r_regionkey
> hash join nation n1 on c_nationkey = n1.n_nationkey
> hash join customer o_custkey = c_custkey
> hash join *order* l_orderkey = o_orderkey
> hash join *part* l_partkey = p_partkey
> lineitem
> {code}
> This plan is better because the number of intermediate results is reduced by 
> executing the join on region first. The difference between the two plans is 
> that the following two series of joins are "swapped".
> This series of joins leads up to a selective join with region, and should 
> come before the block with supplier and n2:
> {code}
> hash join *region* on n1.n_regionkey = r_regionkey
> hash join nation n1 on c_nationkey = n1.n_nationkey
> hash join customer o_custkey = c_custkey
> {code}
> This series of joins is not selective and should come last:
> {code}
> hash join nation n2 s_nationkey = n2.n_nationkey
> hash join supplier on l_suppkey = s_suppkey
> {code}
> Our current join-ordering algorithm is not able to produce the optimal plan 
> because it greedily adds one join at a time. After it has constructed the 
> partial plan that joins lineitem,part,order, the algorithm considers customer 
> and supplier as candidates for the next join (only these tables are 
> candidates due to the applicable join predicates). Since the resulting join 
> cardinality for customer and supplier is estimated to be equal, we 
> "arbitrarily" pick one and continue.
> We should improve our join ordering algorithm to generate the optimal join 
> order for TPCH-Q8.
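
The effect of greedy selection with an arbitrary tie-break can be reproduced in a small model. The sketch below is not Impala's planner: the starting cardinality, the DIVISOR and REQUIRES tables, the cost function (sum of intermediate result sizes) and the tie_break list are all hypothetical, chosen so that customer and supplier tie exactly as described above.

```python
from itertools import permutations

START_ROWS = 1000  # hypothetical size of lineitem joined with part and order

# Hypothetical model: joining a table divides the row count by this factor.
# Only region is selective here.
DIVISOR = {"customer": 1, "nation_n1": 1, "region": 5, "supplier": 1, "nation_n2": 1}

# A table can only be joined once the table carrying its join key is present.
REQUIRES = {"customer": None, "nation_n1": "customer", "region": "nation_n1",
            "supplier": None, "nation_n2": "supplier"}


def plan_cost(order):
    """Sum of intermediate result sizes for a left-deep join order."""
    rows, total = START_ROWS, 0
    for table in order:
        rows //= DIVISOR[table]
        total += rows
    return total


def is_valid(order):
    """True if every table is joined after the table it depends on."""
    seen = set()
    for table in order:
        if REQUIRES[table] is not None and REQUIRES[table] not in seen:
            return False
        seen.add(table)
    return True


def greedy_order(tie_break):
    """Add one join at a time, always picking the smallest estimated result.

    Ties are broken by position in tie_break, standing in for an arbitrary pick.
    """
    order, rows = [], START_ROWS
    while len(order) < len(DIVISOR):
        candidates = [t for t in DIVISOR
                      if t not in order
                      and (REQUIRES[t] is None or REQUIRES[t] in order)]
        best = min(candidates,
                   key=lambda t: (rows // DIVISOR[t], tie_break.index(t)))
        order.append(best)
        rows //= DIVISOR[best]
    return tuple(order)


greedy = greedy_order(["supplier", "nation_n2", "customer", "nation_n1", "region"])
optimal = min((p for p in permutations(DIVISOR) if is_valid(p)), key=plan_cost)

print(greedy, plan_cost(greedy))
print(optimal, plan_cost(optimal))
```

In this model the greedy search never sees that customer leads to the selective region join, because it only compares the cardinality of the very next join. Exhaustive enumeration is used here only because the example is tiny.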




[jira] [Assigned] (IMPALA-1374) Improve Join Order Planning

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-1374:
-

Assignee: (was: Alexander Behm)

> Improve Join Order Planning
> ---
>
> Key: IMPALA-1374
> URL: https://issues.apache.org/jira/browse/IMPALA-1374
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 1.3.2
>Reporter: Ryan Bosshart
>Priority: Minor
>  Labels: performance, planner
> Attachments: consolidatedqueries_fast, consolidatedqueries_slow
>
>
> The join order is determined entirely by total size (#rows * column width). 
> This makes sense in general. However, when the fact table size (after 
> partition pruning) is close to that of the dim table, it can be the wrong 
> choice because the join key from the fact table is duplicated many times, 
> which makes the hash chain very long.
> On an almost identical query (similar join condition, tables, and number of 
> results), this caused a query time of ~10 seconds for one query and ~3 
> minutes for the other (first row fetched, queries attached).
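
To make the hash-chain point concrete, here is a minimal sketch that builds a chained hash table on two inputs of equal row count. The helper names, the row counts and the key distributions are hypothetical, and a Python dict of lists only stands in for a real join hash table.

```python
from collections import defaultdict


def build_hash_table(join_keys):
    """Build side of a hash join: each key maps to the chain of matching rows."""
    table = defaultdict(list)
    for row_id, key in enumerate(join_keys):
        table[key].append(row_id)
    return table


def longest_chain(table):
    """Length of the longest chain, i.e. the worst case walked per probe."""
    return max(len(chain) for chain in table.values())


# Hypothetical inputs with the same number of rows.
dim_keys = list(range(1000))               # dimension table: unique join keys
fact_keys = [i % 10 for i in range(1000)]  # fact table: each key repeated 100 times

build_on_dim = build_hash_table(dim_keys)
build_on_fact = build_hash_table(fact_keys)

print(longest_chain(build_on_dim), longest_chain(build_on_fact))
```

Both inputs have the same total size, so a size-only heuristic cannot tell them apart, yet building on the fact side produces chains a hundred times longer in this example.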




[jira] [Assigned] (IMPALA-3266) Investigate Catalog performance

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-3266:
-

Assignee: (was: Mostafa Mokhtar)

> Investigate Catalog performance
> ---
>
> Key: IMPALA-3266
> URL: https://issues.apache.org/jira/browse/IMPALA-3266
> Project: IMPALA
>  Issue Type: Task
>  Components: Perf Investigation
>Affects Versions: Impala 2.2.4
>Reporter: Alan Choi
>Priority: Minor
>
> A large catalog takes a long time to load from the Hive Metastore and to 
> distribute through the statestore. Create an experiment to quantify how the 
> number of tables, partitions, files, and blocks affects the timing as well as 
> resource usage.




[jira] [Assigned] (IMPALA-2910) create nested types perf microbenchmarks

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2910:
-

Assignee: (was: Mostafa Mokhtar)

> create nested types perf microbenchmarks
> 
>
> Key: IMPALA-2910
> URL: https://issues.apache.org/jira/browse/IMPALA-2910
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.3.0
>Reporter: Silvius Rus
>Priority: Minor
>
> Please extend the perf microbenchmarks to cover performance specific to 
> queries on nested data.




[jira] [Assigned] (IMPALA-660) Make rand() more non-deterministic

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-660:


Assignee: (was: Alexander Behm)

> Make rand() more non-deterministic
> --
>
> Key: IMPALA-660
> URL: https://issues.apache.org/jira/browse/IMPALA-660
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 1.2.1
>Reporter: John Russell
>Priority: Minor
>  Labels: correctness, downgraded
>
> We document that rand() returns unpredictable values, unless preceded by a 
> call to rand(seed). My expectation from other DBMSes like MySQL and Oracle, 
> or by inferring from POSIX behavior, is that once the random number generator 
> is seeded, there would be an infinite stream of random values that would 
> stretch across multiple queries. However, in practice, Impala rand() resets 
> to the same sequence after each query:
> [localhost:21000] > select rand()*34000 from store limit cast (rand()*34000 
> as int);
> Query: select rand()*34000 from store limit cast (rand()*34000 as int)
> +-------------------+
> | rand() * 34000.0  |
> +-------------------+
> | 16.03013650329324 |
> | 20046.04365399389 |
> | 15068.46292087271 |
> | 2513.48005631635  |
> | 15713.13279760682 |
> | 22709.15090977641 |
> | 28398.60192704881 |
> | 5477.171718830788 |
> | 16371.07374722654 |
> | 18740.27370882233 |
> | 16465.31711354168 |
> | 13467.14456540865 |
> +-------------------+
> Returned 12 row(s) in 0.24s
> [localhost:21000] > select rand()*34000 from store limit cast (rand()*34000 
> as int);
> Query: select rand()*34000 from store limit cast (rand()*34000 as int)
> +-------------------+
> | rand() * 34000.0  |
> +-------------------+
> | 16.03013650329324 |
> | 20046.04365399389 |
> | 15068.46292087271 |
> | 2513.48005631635  |
> | 15713.13279760682 |
> | 22709.15090977641 |
> | 28398.60192704881 |
> | 5477.171718830788 |
> | 16371.07374722654 |
> | 18740.27370882233 |
> | 16465.31711354168 |
> | 13467.14456540865 |
> +-------------------+
> Returned 12 row(s) in 0.22s
> And if rand() is called multiple times in the same query, it gives the same 
> value each time:
> [localhost:21000] > select rand(), rand(), rand() from store;
> Query: select rand(), rand(), rand() from store
> +-----------------------+-----------------------+-----------------------+
> | rand()                | rand()                | rand()                |
> +-----------------------+-----------------------+-----------------------+
> | 0.0004714746030380365 | 0.0004714746030380365 | 0.0004714746030380365 |
> | 0.5895895192351144    | 0.5895895192351144    | 0.5895895192351144    |
> | 0.4431900859080209    | 0.4431900859080209    | 0.4431900859080209    |
> | 0.0739258840093044    | 0.0739258840093044    | 0.0739258840093044    |
> | 0.4621509646354946    | 0.4621509646354946    | 0.4621509646354946    |
> | 0.6679162032287178    | 0.6679162032287178    | 0.6679162032287178    |
> | 0.8352529978543767    | 0.8352529978543767    | 0.8352529978543767    |
> | 0.1610932858479644    | 0.1610932858479644    | 0.1610932858479644    |
> | 0.4815021690360746    | 0.4815021690360746    | 0.4815021690360746    |
> | 0.5511845208477156    | 0.5511845208477156    | 0.5511845208477156    |
> | 0.4842740327512259    | 0.4842740327512259    | 0.4842740327512259    |
> | 0.3960924872179015    | 0.3960924872179015    | 0.3960924872179015    |
> +-----------------------+-----------------------+-----------------------+
> Returned 12 row(s) in 0.23s
> What I was expecting to happen was:
> select rand(12345);
> select rand() from t1 limit 100;
> ... 100 random values ...
> select rand() from t1 limit 100;
> ... 100 different random values ...
> select rand(), rand(), rand();
> ... 3 different random values ...
> select rand(12345);
> -- Then the sequence of rand() queries as above would give the same results 
> as before.
> Otherwise, calling rand(seed) in a standalone query is kind of a no-op, it 
> has no effect on subsequent queries:
> [localhost:21000] > select rand(12345);
> Query: select rand(12345)
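> +--------------------+
> | rand(12345)        |
> +--------------------+
> | 0.4827902789613187 |
> +--------------------+
> Returned 1 row(s) in 0.11s
> [localhost:21000] > select rand(), rand(), rand();
> Query: select rand(), rand(), rand()
> +-----------------------+-----------------------+-----------------------+
> | rand()                | rand()                | rand()                |
> +-----------------------+-----------------------+-----------------------+
> | 0.0004714746030380365 | 0.0004714746030380365 | 0.0004714746030380365 |
> +-----------------------+-----------------------+-----------------------+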
> Returned 1 row(s) in 0.11s
> [localhost:21000] > select rand(23456);
> Query: select
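
For illustration, the behavior the description asks for could be sketched as below. This is only a sketch under assumptions: {{Session}}, {{seed()}}, {{rand()}} and {{query()}} are hypothetical names, not Impala APIs, and Python's {{random.Random}} stands in for whatever generator the engine uses. The idea is that one generator lives for the whole session, so seeding restarts the stream and every later call advances it, whether the calls are in the same query or in different ones.

```python
import random


class Session:
    """Hypothetical session that owns one generator for its whole lifetime."""

    def __init__(self):
        # Unseeded: values are unpredictable until seed() is called.
        self._rng = random.Random()

    def seed(self, value):
        # Plays the role of `select rand(seed)`: restart the stream.
        self._rng.seed(value)

    def rand(self):
        # Each call advances the shared stream by one value.
        return self._rng.random()

    def query(self, n):
        # Stand-in for a query that evaluates rand() n times.
        return [self.rand() for _ in range(n)]
```

With this shape, two consecutive calls to {{query()}} return different values, and calling {{seed()}} with the same value again replays the same sequence, which matches the "What I was expecting to happen" list in the description.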

[jira] [Assigned] (IMPALA-3120) Extend planner cost model to handle bucketed tables

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-3120:
-

Assignee: (was: Alexander Behm)

> Extend planner cost model to handle bucketed tables
> ---
>
> Key: IMPALA-3120
> URL: https://issues.apache.org/jira/browse/IMPALA-3120
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Frontend
>Affects Versions: Impala 2.6.0
>Reporter: Mostafa Mokhtar
>Priority: Minor
>
> The planner needs to code joins and aggregations for bucketed tables 
> differently; some costing will be needed to achieve efficient join ordering, 
> etc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org

[jira] [Assigned] (IMPALA-3685) Planner produces incorrect join cardinality estimation when inequality predicate is used on dimension table

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-3685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-3685:
-

Assignee: (was: Alexander Behm)

> Planner produces incorrect join cardinality estimation when inequality 
> predicate is used on dimension table
> ---
>
> Key: IMPALA-3685
> URL: https://issues.apache.org/jira/browse/IMPALA-3685
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: planner
> Attachments: query14.sql.2.out copy
>
>
> It appears that when an inequality predicate is applied in a fact-to-dimension 
> join, the filter selectivity on the dimension table is not reflected in the 
> join cardinality estimate. 
> This issue was first found in TPC-DS Q14 (attached). 
> Query
> {code}
> explain select
> ss_quantity quantity
> from
> store_sales,
> date_dim
> where
> ss_sold_date_sk = d_date_sk
> and d_year  < 2000
> {code}
> Plan
> {code}
> +----------------------------------------------------------+
> | Explain String                                           |
> +----------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=88.06MB VCores=2 |
> |                                                          |
> | 04:EXCHANGE [UNPARTITIONED]                              |
> | |  hosts=8 per-host-mem=unavailable                      |
> | |  tuple-ids=0,1 row-size=16B cardinality=2879987999     |
> | |                                                        |
> | 02:HASH JOIN [INNER JOIN, BROADCAST]                     |
> | |  hash predicates: ss_sold_date_sk = d_date_sk          |
> | |  runtime filters: RF000 <- d_date_sk                   |
> | |  hosts=8 per-host-mem=62.78KB                          |
> | |  tuple-ids=0,1 row-size=16B cardinality=2879987999     |
> | |                                                        |
> | |--03:EXCHANGE [BROADCAST]                               |
> | |  |  hosts=1 per-host-mem=0B                            |
> | |  |  tuple-ids=1 row-size=8B cardinality=7305           |
> | |  |                                                     |
> | |  01:SCAN HDFS [tpcds_1000_parquet.date_dim, RANDOM]    |
> | |     partitions=1/1 files=1 size=2.17MB                 |
> | |     predicates: d_year < 2000                          |
> | |     table stats: 73049 rows total                      |
> | |     column stats: all                                  |
> | |     hosts=1 per-host-mem=32.00MB                       |
> | |     tuple-ids=1 row-size=8B cardinality=7305           |
> | |                                                        |
> | 00:SCAN HDFS [tpcds_1000_parquet.store_sales, RANDOM]    |
> |    partitions=1824/1824 files=1824 size=189.24GB         |
> |    runtime filters: RF000 -> ss_sold_date_sk             |
> |    table stats: 2879987999 rows total                    |
> |    column stats: all                                     |
> |    hosts=8 per-host-mem=88.00MB                          |
> |    tuple-ids=0 row-size=8B cardinality=2879987999        |
> +----------------------------------------------------------+
> {code}
> When an equality predicate is used, the selectivity of the filter on the 
> dimension table is reflected in the join cardinality estimate:
> {code}
> select 
> ss_quantity
> from
> store_sales,
> date_dim
> where
> ss_sold_date_sk = d_date_sk
> and d_year = 1999;
> {code}
> {code}
> +----------------------------------------------------------+
> | Explain String                                           |
> +----------------------------------------------------------+
> | Estimated Per-Host Requirements: Memory=88.00MB VCores=2 |
> |                                                          |
> | 04:EXCHANGE [UNPARTITIONED]                              |
> | |  hosts=8 per-host-mem=unavailable                      |
> | |  tuple-ids=0,1 row-size=16B cardinality=588944915      |
> | |                                                        |
> | 02:HASH JOIN [INNER JOIN, BROADCAST]                     |
> | |  hash predicates: ss_sold_date_sk = d_date_sk          |
> | |  runtime filters: RF000 <- d_date_sk                   |
> | |  hosts=8 per-host-mem=3.21KB                           |
> | |  tuple-ids=0,1 row-size=16B cardinality=588944915      |
> | |                                                        |
> | |--03:EXCHANGE [BROADCAST]                               |
> | |  |  hosts=1 per-host-mem=0B                            |
> | |  |  tuple-ids=1 row-size=8B cardinality=373            |
> | |  |                                                     |
> | |

[jira] [Assigned] (IMPALA-2753) Investigate performance gains for adding random prefix to file name

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-2753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong reassigned IMPALA-2753:
-

Assignee: (was: Mostafa Mokhtar)

> Investigate performance gains for adding random prefix to file name
> ---
>
> Key: IMPALA-2753
> URL: https://issues.apache.org/jira/browse/IMPALA-2753
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Perf Investigation
>Affects Versions: Impala 2.5.0
>Reporter: Mostafa Mokhtar
>Priority: Minor
>  Labels: s3
>
> One thing I noticed, which is not directly related to Impala, is that the 
> file naming convention HDFS produces is exactly the pattern S3 recommends 
> against. If we adjust the naming convention we can one-up Hive when running 
> on S3. 
> {code}
> examplebucket/2013-26-05-15-00-00/cust1234234/photo1.jpg
> examplebucket/2013-26-05-15-00-00/cust3857422/photo2.jpg
> examplebucket/2013-26-05-15-00-00/cust8474937/photo2.jpg
> examplebucket/2013-26-05-15-00-00/cust1248473/photo3.jpg
> ...
> examplebucket/2013-26-05-15-00-01/cust1248473/photo4.jpg
> examplebucket/2013-26-05-15-00-01/cust1248473/photo5.jpg
> examplebucket/2013-26-05-15-00-01/cust1248473/photo6.jpg
> examplebucket/2013-26-05-15-00-01/cust1248473/photo7.jpg
> ...
> {code}
> The sequence pattern in the key names introduces a performance problem. To 
> understand the issue, let’s look at how Amazon S3 stores key names.
> Amazon S3 maintains an index of object key names in each AWS region. Object 
> keys are stored lexicographically across multiple partitions in the index. 
> That is, Amazon S3 stores key names in alphabetical order. The key name 
> dictates which partition the key is stored in. Using a sequential prefix, 
> such as timestamp or an alphabetical sequence, increases the likelihood that 
> Amazon S3 will target a specific partition for a large number of your keys, 
> overwhelming the I/O capacity of the partition. If you introduce some 
> randomness in your key name prefixes, the key names, and therefore the I/O 
> load, will be distributed across more than one partition.
> If you anticipate that your workload will consistently exceed 100 requests 
> per second, you should avoid sequential key names. If you must use sequential 
> numbers or date and time patterns in key names, add a random prefix to the 
> key name. The randomness of the prefix more evenly distributes key names 
> across multiple index partitions. Examples of introducing randomness are 
> provided later in this topic.
> http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html
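
As a rough illustration of the prefixing idea in the quoted AWS guidance, a sketch might look like the following. Assumptions: {{prefixed_key}} and {{prefix_len}} are made-up names, and a hash-derived prefix is used in place of a truly random one so that the full key can be recomputed from the original name. This is not what Impala or HDFS does today.

```python
import hashlib


def prefixed_key(key, prefix_len=4):
    """Prepend a short hash-derived prefix to an object key.

    Sequential names (timestamps, counters) then start with
    different characters, so they no longer sort next to each other.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{key}"


# Example: keys that share a timestamp prefix get unrelated leading characters.
for name in ("2013-26-05-15-00-00/cust1234234/photo1.jpg",
             "2013-26-05-15-00-00/cust3857422/photo2.jpg"):
    print(prefixed_key(name))
```

The trade-off is that listing objects by date prefix no longer works directly, so any reader that relies on lexicographic listing would need to know the prefix scheme.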




[jira] [Commented] (IMPALA-2913) Impalad fails to load tables in test runs

2018-07-11 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540566#comment-16540566
 ] 

Tim Armstrong commented on IMPALA-2913:
---

Might be a dupe of IMPALA-5996?

> Impalad fails to load tables in test runs
> -
>
> Key: IMPALA-2913
> URL: https://issues.apache.org/jira/browse/IMPALA-2913
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog, Frontend
>Affects Versions: Impala 2.5.0
>Reporter: Henry Robinson
>Assignee: Vuk Ercegovac
>Priority: Major
> Fix For: Impala 2.5.0
>
> Attachments: catalog.stacks.out, gdb.txt.zip, impalad.stacks.out
>
>
> Build: http://sandbox.jenkins.cloudera.com/job/impala-master-cdh5-trunk/1714/
> {code:title=impalad}
> I0130 10:12:40.874454 24955 Frontend.java:864] analyze query drop database if 
> exists `test_parquet_list_encodings_793898492` cascade
> I0130 10:13:41.530505 24944 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:13:41.532280 24944 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer_demographics
> I0130 10:13:43.693908 24948 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:13:43.697772 24948 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.household_demographics, tpcds_parquet.customer
> I0130 10:13:51.970826 24928 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:13:51.971350 24928 Frontend.java:808] Requesting prioritized load of 
> table(s): functional_avro_snap.alltypestiny
> I0130 10:14:10.715591 24925 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:14:10.718114 24925 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer, tpcds_parquet.promotion
> I0130 10:15:41.542430 24944 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:15:41.544220 24944 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer_demographics
> I0130 10:15:43.707329 24948 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:15:43.711132 24948 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.household_demographics, tpcds_parquet.customer
> I0130 10:15:51.980530 24928 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:15:51.981075 24928 Frontend.java:808] Requesting prioritized load of 
> table(s): functional_avro_snap.alltypestiny
> I0130 10:16:10.728305 24925 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:16:10.730826 24925 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer, tpcds_parquet.promotion
> I0130 10:17:41.554452 24944 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:17:41.556248 24944 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer_demographics
> I0130 10:17:43.720675 24948 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:17:43.724488 24948 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.household_demographics, tpcds_parquet.customer
> I0130 10:17:51.990448 24928 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:17:51.990978 24928 Frontend.java:808] Requesting prioritized load of 
> table(s): functional_avro_snap.alltypestiny
> I0130 10:18:10.740888 24925 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:18:10.743422 24925 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer, tpcds_parquet.promotion
> I0130 10:19:41.566427 24944 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:19:41.568217 24944 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.customer_demographics
> I0130 10:19:43.733989 24948 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:19:43.737788 24948 Frontend.java:808] Requesting prioritized load of 
> table(s): tpcds_parquet.household_demographics, tpcds_parquet.customer
> I0130 10:19:52.59 24928 Frontend.java:883] Missing tables were not 
> received in 12ms. Load request will be retried.
> I0130 10:19:52.000630 24928 Frontend.java:808] Requestin

[jira] [Commented] (IMPALA-7280) test_tpcds_partitioned_insert fails when file formats are specified

2018-07-11 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540711#comment-16540711
 ] 

Tim Armstrong commented on IMPALA-7280:
---

[~tlipcon] it's possible that this test has not been running for a long time 
because of IMPALA-3947

> test_tpcds_partitioned_insert fails when file formats are specified
> ---
>
> Key: IMPALA-7280
> URL: https://issues.apache.org/jira/browse/IMPALA-7280
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Todd Lipcon
>Priority: Minor
>
> It seems this test doesn't usually run by default, but when I ran 
> {{run-tests.py -k 'tpcds' --table_formats=parquet/none}} it ran and failed. 
> The issue seems to be that it uses DROP TABLE IF EXISTS as its first query 
> and expects no response. However IMPALA-5903 changed this such that it now 
> returns a result string indicating whether any table was dropped.




[jira] [Commented] (IMPALA-7272) impalad crash when Fatigue test

2018-07-11 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540738#comment-16540738
 ] 

Tim Armstrong commented on IMPALA-7272:
---

[~zzjj] do you know what query was running at the time of the crash?

> impalad   crash when Fatigue test
> -
>
> Key: IMPALA-7272
> URL: https://issues.apache.org/jira/browse/IMPALA-7272
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.11.0, Impala 2.12.0
> Environment: apache  branch  
> [329979d6fb0caa0dc449d7e0aa75460c30e868f0]
> centos 6.5
>Reporter: yyzzjj
>Priority: Critical
>  Labels: crash
> Attachments: e4386102-833c-40bb-4eec10b2-827c76be.dmp, 
> impalad_node0.ERROR, impalad_node0.WARNING
>
>
> (gdb) bt
> #0 0x003269832635 in raise () from /lib64/libc.so.6
> #1 0x003269833e15 in abort () from /lib64/libc.so.6
> #2 0x04010f64 in google::DumpStackTraceAndExit() ()
> #3 0x040079dd in google::LogMessage::Fail() ()
> #4 0x04009282 in google::LogMessage::SendToLog() ()
> #5 0x040073b7 in google::LogMessage::Flush() ()
> #6 0x0400a97e in google::LogMessageFatal::~LogMessageFatal() ()
> #7 0x01a2dfab in impala::MemPool::CheckIntegrity (this=0x5916e1f8, 
> check_current_chunk_empty=true)
>  at /export/ldb/online/kudu_rpc_branch/be/src/runtime/mem-pool.cc:258
> #8 0x01a2cf56 in impala::MemPool::FindChunk (this=0x5916e1f8, 
> min_size=10, check_limits=true) at 
> /export/ldb/online/kudu_rpc_branch/be/src/runtime/mem-pool.cc:158
> #9 0x01a3dd1b in impala::MemPool::Allocate (alignment=8, 
> size=10, this=0x5916e1f8) at 
> /export/ldb/online/kudu_rpc_branch/be/src/runtime/mem-pool.h:273
> #10 impala::MemPool::TryAllocate (this=0x5916e1f8, size=10) at 
> /export/ldb/online/kudu_rpc_branch/be/src/runtime/mem-pool.h:109
> #11 0x01caefb8 in impala::StringBuffer::GrowBuffer 
> (this=0x7f90d9489c28, new_size=10) at 
> /export/ldb/online/kudu_rpc_branch/be/src/runtime/string-buffer.h:85
> #12 0x01caee18 in impala::StringBuffer::Append (this=0x7f90d9489c28, 
> str=0x7f92cda6e039 "1104700843don...@jd.com业务运营部\230\340\246͒\177", 
> str_len=10)
>  at /export/ldb/online/kudu_rpc_branch/be/src/runtime/string-buffer.h:53
> #13 0x01cac864 in impala::StringMinMaxFilter::CopyToBuffer 
> (this=0x7f90d9489c00, buffer=0x7f90d9489c28, value=0x7f90d9489c08, len=10)
>  at /export/ldb/online/kudu_rpc_branch/be/src/util/min-max-filter.cc:304
> #14 0x01cac2a9 in impala::StringMinMaxFilter::MaterializeValues 
> (this=0x7f90d9489c00) at 
> /export/ldb/online/kudu_rpc_branch/be/src/util/min-max-filter.cc:229
> #15 0x02b9641a in impala::FilterContext::MaterializeValues 
> (this=0x61cc0b70) at 
> /export/ldb/online/kudu_rpc_branch/be/src/exec/filter-context.cc:97
> #16 0x7f93fdb9440e in ?? ()
> #17 0x7f90a97f5400 in ?? ()
> #18 0x2acd2bba01a2e0f7 in ?? ()
> #19 0x5916e140 in ?? ()
> #20 0x7f930c34d740 in ?? ()
> #21 0x7f90a97f5220 in ?? ()
> #22 0x66aa77bb66aa77bb in ?? ()
> #23 0x61cc0b70 in ?? ()
> #24 0x61cc0b70 in ?? ()
> #25 0x61cc0b98 in ?? ()
> #26 0x61cc0b70 in ?? ()
> #27 0x7f90a97f5300 in ?? ()
> #28 0x01ab84ed in 
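> impala::RuntimeFilterBank::AllocateScratchMinMaxFilter (this=<error reading variable: Cannot access memory at address 0xff4f>, 
>  filter_id=<error reading variable: Cannot access memory at address 0xff4b>, 
>  type=<error reading variable: Cannot access memory at address 0xff3f>) at 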
> /export/ldb/online/kudu_rpc_branch/be/src/runtime/runtime-filter-bank.cc:250
> Backtrace stopped: previous frame inner to this frame (corrupt stack?)




[jira] [Updated] (IMPALA-7272) impalad crash when Fatigue test

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7272:
--
Labels: crash  (was: )

> impalad   crash when Fatigue test
> -
>
> Key: IMPALA-7272
> URL: https://issues.apache.org/jira/browse/IMPALA-7272
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.11.0, Impala 2.12.0
> Environment: apache  branch  
> [329979d6fb0caa0dc449d7e0aa75460c30e868f0]
> centos 6.5
>Reporter: yyzzjj
>Priority: Critical
>  Labels: crash
> Attachments: e4386102-833c-40bb-4eec10b2-827c76be.dmp, 
> impalad_node0.ERROR, impalad_node0.WARNING
>
>




[jira] [Updated] (IMPALA-7272) impalad crash when Fatigue test

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7272:
--
Target Version: Impala 3.1.0
  Priority: Blocker  (was: Critical)

> impalad   crash when Fatigue test
> -
>
> Key: IMPALA-7272
> URL: https://issues.apache.org/jira/browse/IMPALA-7272
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.11.0, Impala 2.12.0
> Environment: apache  branch  
> [329979d6fb0caa0dc449d7e0aa75460c30e868f0]
> centos 6.5
>Reporter: yyzzjj
>Priority: Blocker
>  Labels: crash
> Attachments: e4386102-833c-40bb-4eec10b2-827c76be.dmp, 
> impalad_node0.ERROR, impalad_node0.WARNING
>
>




[jira] [Resolved] (IMPALA-7239) Mitigate ParseSmaps() overhead

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7239.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

> Mitigate ParseSmaps() overhead
> --
>
> Key: IMPALA-7239
> URL: https://issues.apache.org/jira/browse/IMPALA-7239
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Critical
>  Labels: perf, resource-management
> Fix For: Impala 3.1.0
>
> Attachments: mmap.c
>
>
> I've heard anecdotes of high system time spent in functions related to the 
> smaps parsing. It appears that this can be expensive on systems once the 
> impalad virtual memory gets fragmented and there are tens of thousands of 
> mappings.
> We can try to mitigate by reducing frequency of the parsing or disabling it 
> entirely. I'm not sure if there are cheaper ways to get all of the same 
> metrics.




[jira] [Resolved] (IMPALA-7095) Improve scanner thread counters in HDFS and Kudu scans

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7095.
---
   Resolution: Fixed
Fix Version/s: Impala 3.1.0

[~sakinapelli] this might be relevant to your interests.


> Improve scanner thread counters in HDFS and Kudu scans
> --
>
> Key: IMPALA-7095
> URL: https://issues.apache.org/jira/browse/IMPALA-7095
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.0
>Reporter: Tim Armstrong
>Assignee: Tim Armstrong
>Priority: Major
>  Labels: observability
> Fix For: Impala 3.1.0
>
>
> There are a few deficiencies here:
> * We don't track the peak number of scanner threads. Consumers of the profile 
> often confuse NumScannerThreadsStarted with the peak.
> * Kudu scans are missing some metrics, e.g. AverageScannerThreadConcurrency. 
> We should make sure that Kudu and HDFS are consistent.
> We should clean this up, and maybe refactor the code so that less logic is 
> duplicated
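
To make the distinction in the first bullet concrete, here is a small sketch of a counter that tracks both the number of threads started and the peak concurrency. It is an assumption-laden illustration: {{ScannerThreadCounter}} and its method names are hypothetical and are not Impala's profile counter classes.

```python
import threading


class ScannerThreadCounter:
    """Hypothetical counter separating 'started' from 'peak concurrent'."""

    def __init__(self):
        self._lock = threading.Lock()
        self.started = 0  # total threads ever started
        self.active = 0   # threads running right now
        self.peak = 0     # highest value `active` has reached

    def thread_started(self):
        with self._lock:
            self.started += 1
            self.active += 1
            self.peak = max(self.peak, self.active)

    def thread_finished(self):
        with self._lock:
            self.active -= 1
```

If threads start and finish in waves, {{started}} keeps growing while {{peak}} stays at the largest number that ever ran at once, which is why the two should not be read interchangeably.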




[jira] [Resolved] (IMPALA-7255) Incorrect result rounding midpoint negative numbers to negative precision

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7255.
---
Resolution: Duplicate

> Incorrect result rounding midpoint negative numbers to negative precision
> -
>
> Key: IMPALA-7255
> URL: https://issues.apache.org/jira/browse/IMPALA-7255
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Ian Cook
>Priority: Major
>
> Based on the Impala docs and on the behavior of other SQL engines, it's 
> evident that:
> {{round(-15, -1)}} should return {{-20}}
>  {{round(-150, -2)}} should return {{-200}}
> and so on, because negative values at the midpoint between the upper and 
> lower rounded values are supposed to round to the value farther from zero. 
> However, in Impala:
> {{round(-15, -1)}} returns {{-10}}
>  {{round(-150, -2)}} returns {{-100}}
> and so on. This issue affects cases where both arguments are negative and the 
> number being rounded is at the midpoint. I believe this issue affects cases 
> where the first argument has any of the integer data types or the {{FLOAT}} 
> or {{DOUBLE}} type, but not when it has a {{DECIMAL}} type. This issue seems 
> to occur regardless of whether the numbers being rounded are specified as 
> literals, column references, or expressions.
> To reproduce this, execute queries like:
> {{SELECT round(-15, -1);}}
>  {{SELECT round(-150, -2);}}
>  {{SELECT round(cast(-150 AS BIGINT), -2);}}
>  {{SELECT round(cast(-150 AS DOUBLE), -2);}}
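
The rounding rule described above (ties go to the value farther from zero) can be sketched with Python's {{decimal}} module, whose {{ROUND_HALF_UP}} mode rounds ties away from zero. This is only an illustration of the expected semantics, not Impala's implementation; {{round_half_away}} is a made-up name.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_away(value, digits):
    """Round to `digits` decimal places, ties going away from zero.

    Negative `digits` rounds to tens, hundreds, and so on.
    """
    # digits=-1 gives a quantum of 1E+1 (tens), digits=2 gives 0.01.
    quantum = Decimal(1).scaleb(-digits)
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


print(int(round_half_away(-15, -1)))
print(int(round_half_away(-150, -2)))
```

Under this rule the midpoint cases from the description come out as -20 and -200, which is what the report says Impala should return.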




[jira] [Created] (IMPALA-7283) Consider backing off before retrying RPC

2018-07-11 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-7283:
-

 Summary: Consider backing off before retrying RPC
 Key: IMPALA-7283
 URL: https://issues.apache.org/jira/browse/IMPALA-7283
 Project: IMPALA
  Issue Type: Improvement
  Components: Distributed Exec
Reporter: Tim Armstrong


[~stiga-huang] suggested here 
https://gerrit.cloudera.org/#/c/10744/11/be/src/service/client-request-state.cc@628
{quote}
What about sleep several seconds before the next retry like this?

 for (int i = 0; i < 3; ++i, sleep(3))

Usually it will increase success rate if there're network issues or the target 
server is stuck temporarily.
{quote}

This seems worth considering but I don't have the knowledge to really evaluate 
it
cc [~kwho] [~sailesh]
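
To make the idea easier to discuss, here is a minimal sketch of retrying with a delay between attempts. It is not the code under review: {{call_with_backoff}}, {{rpc}}, {{attempts}} and {{base_delay}} are hypothetical names, and the exponential delay is a variation on the fixed 3 second sleep in the quoted suggestion. One detail of the quoted loop: because {{sleep(3)}} sits in the increment expression, it also runs after the last failed attempt, so the caller waits an extra 3 seconds before giving up unless the body exits the loop early. The sketch below only sleeps when another attempt will follow.

```python
import time


def call_with_backoff(rpc, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call rpc(); on failure wait base_delay * 2**i, then try again.

    Re-raises the last error once all attempts have failed.
    `sleep` is injectable so the delay can be faked in tests.
    """
    last_error = None
    for i in range(attempts):
        try:
            return rpc()
        except Exception as e:  # a real caller would catch a narrower type
            last_error = e
            if i < attempts - 1:
                # Only wait when another attempt will follow.
                sleep(base_delay * 2 ** i)
    raise last_error
```

Whether exponential or fixed delays are appropriate here depends on how long the target server is typically stuck, which is the part that needs input from people who know the RPC layer.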




[jira] [Created] (IMPALA-7284) Automatically determine min quiesce period based on configuration

2018-07-11 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-7284:
-

 Summary: Automatically determine min quiesce period based on 
configuration
 Key: IMPALA-7284
 URL: https://issues.apache.org/jira/browse/IMPALA-7284
 Project: IMPALA
  Issue Type: Improvement
  Components: Distributed Exec
Reporter: Tim Armstrong


Following on from IMPALA-1760, we should improve usability by allowing 
automatic configuration of the minimum quiesce period to match admission 
control configs, etc.




[jira] [Commented] (IMPALA-7283) Consider backing off before retrying RPC

2018-07-11 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540920#comment-16540920
 ] 

Tim Armstrong commented on IMPALA-7283:
---

Does KRPC have some kind of better retry logic?


> Consider backing off before retrying RPC
> 
>
> Key: IMPALA-7283
> URL: https://issues.apache.org/jira/browse/IMPALA-7283
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Distributed Exec
>Reporter: Tim Armstrong
>Priority: Major
>
> [~stiga-huang] suggested here 
> https://gerrit.cloudera.org/#/c/10744/11/be/src/service/client-request-state.cc@628
> {quote}
> What about sleep several seconds before the next retry like this?
>  for (int i = 0; i < 3; ++i, sleep(3))
> Usually it will increase success rate if there're network issues or the 
> target server is stuck temporarily.
> {quote}
> This seems worth considering but I don't have the knowledge to really 
> evaluate it
> cc [~kwho] [~sailesh]




[jira] [Updated] (IMPALA-6034) Add query option that limits scanned bytes at runtime

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-6034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-6034:
--
Target Version: Impala 3.1.0

> Add query option that limits scanned bytes at runtime
> -
>
> Key: IMPALA-6034
> URL: https://issues.apache.org/jira/browse/IMPALA-6034
> Project: IMPALA
>  Issue Type: Sub-task
>  Components: Distributed Exec
>Reporter: Mostafa Mokhtar
>Assignee: Tim Armstrong
>Priority: Major
>
> Reject queries that scan large amounts of data before executing the query.
> This is a mechanism to protect the cluster from potentially harmful queries.
> MAX_READ_BYTES: [0]




[jira] [Updated] (IMPALA-7243) width_bucket() returns an incorrect result

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7243:
--
Target Version: Impala 3.1.0

> width_bucket() returns an incorrect result
> --
>
> Key: IMPALA-7243
> URL: https://issues.apache.org/jira/browse/IMPALA-7243
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Taras Bobrovytsky
>Assignee: Anuj Phadke
>Priority: Critical
>
> The following query returns an incorrect result:
> {code:java}
> select width_bucket(cast(9 as decimal(10,7)), cast(-6 as decimal(11,6)), 
> cast(10 as decimal(7,5)), 249895273);{code}
> Result:
> {code:java}
> 1{code}
> Since 9 is slightly less than the upper bound, which is 10, the result should 
> be the number of buckets.




[jira] [Updated] (IMPALA-7242) Dcheck fails in width_bucket() function

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7242:
--
Target Version: Impala 3.1.0

> Dcheck fails in width_bucket() function
> ---
>
> Key: IMPALA-7242
> URL: https://issues.apache.org/jira/browse/IMPALA-7242
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Taras Bobrovytsky
>Assignee: Anuj Phadke
>Priority: Critical
>
> The following query hits a DCHECK:
> {code:java}
> select width_bucket(cast(-0.10 as decimal(37,30)), cast(-0.36028797018963968 
> as decimal(25,25)), cast(9151517.4969773200562764155787276999832 as 
> decimal(38,31)), 1328180220){code}
> Failed check:
> {code:java}
> math-functions-ir.cc:566] Check failed: overflow == false (1 vs. 0){code}




[jira] [Updated] (IMPALA-7256) Aggregator mem usage isn't reflected in summary

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-7256:
--
Target Version: Impala 3.1.0
Labels: observability supportability  (was: )
  Priority: Blocker  (was: Critical)

> Aggregator mem usage isn't reflected in summary
> ---
>
> Key: IMPALA-7256
> URL: https://issues.apache.org/jira/browse/IMPALA-7256
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Thomas Tauber-Marshall
>Priority: Blocker
>  Labels: observability, supportability
>
> Since IMPALA-110 (part 2) went in, which refactored 
> 'PartitionedAggregationNode' and introduced 'Aggregator', the memory used by 
> Aggregator is not reflected in the exec summary, where it should be listed 
> under the corresponding AggregationNode/StreamingAggregationNode. This is 
> because Aggregator's MemTracker is initialized with 
> RuntimeState::instance_mem_tracker as its parent, rather than 
> ExecNode::mem_tracker.




[jira] [Commented] (IMPALA-7256) Aggregator mem usage isn't reflected in summary

2018-07-11 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540932#comment-16540932
 ] 

Tim Armstrong commented on IMPALA-7256:
---

Upgraded to blocker since we really depend on this information to debug memory 
usage problems and this makes it a lot harder to debug some things.

> Aggregator mem usage isn't reflected in summary
> ---
>
> Key: IMPALA-7256
> URL: https://issues.apache.org/jira/browse/IMPALA-7256
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 3.1.0
>Reporter: Thomas Tauber-Marshall
>Assignee: Thomas Tauber-Marshall
>Priority: Blocker
>  Labels: observability, supportability
>
> Since IMPALA-110 (part 2) went in, which refactored 
> 'PartitionedAggregationNode' and introduced 'Aggregator', the memory used by 
> Aggregator is not reflected in the exec summary, where it should be listed 
> under the corresponding AggregationNode/StreamingAggregationNode. This is 
> because Aggregator's MemTracker is initialized with 
> RuntimeState::instance_mem_tracker as its parent, rather than 
> ExecNode::mem_tracker.




[jira] [Commented] (IMPALA-7261) Implement Memory pool on NVDIMM-P

2018-07-11 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540942#comment-16540942
 ] 

Tim Armstrong commented on IMPALA-7261:
---

No, there hasn't been. What are the perf characteristics of NVDIMM?

One non-invasive approach to using NVDIMM as extended memory for query 
processing is to integrate it into the BufferPool. The BufferPool currently 
moves data between memory and temporary files on storage, but it could be 
extended to use persistent RAM as another tier of storage.

> Implement Memory pool on NVDIMM-P
> -
>
> Key: IMPALA-7261
> URL: https://issues.apache.org/jira/browse/IMPALA-7261
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend
>Reporter: Zhongyue Nah
>Priority: Major
>  Labels: features
>
> Implement Impala Memory Pool using PMDK to reduce memory footprint.
>  
> References:
> http://pmem.io/pmdk/
> http://pmem.io/2017/04/03/cloudera-kudu-pmem-enabled-block-cache.html




[jira] [Commented] (IMPALA-5821) Add literal suffixes for numeric types

2018-07-11 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-5821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540947#comment-16540947
 ] 

Tim Armstrong commented on IMPALA-5821:
---

[~grahn] any chance you remember the content of the discussion you had with 
Alex?

> Add literal suffixes for numeric types
> --
>
> Key: IMPALA-5821
> URL: https://issues.apache.org/jira/browse/IMPALA-5821
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Affects Versions: Impala 2.8.0
>Reporter: Matthew Jacobs
>Priority: Minor
>  Labels: supportability, usability
>
> In this plan, it wasn't clear that the constant in the predicate was being 
> evaluated to a double. Then the lhs required an implicit cast, and the 
> predicate couldn't be pushed to Kudu:
> {code}
> [localhost:21000] > explain select * from functional_kudu.alltypestiny where 
> bigint_col < 1000 / 100;
> Query: explain select * from functional_kudu.alltypestiny where bigint_col < 
> 1000 / 100
> +---------------------------------------------+
> | Explain String                              |
> +---------------------------------------------+
> | Per-Host Resource Reservation: Memory=0B    |
> | Per-Host Resource Estimates: Memory=10.00MB |
> | Codegen disabled by planner                 |
> |                                             |
> | PLAN-ROOT SINK                              |
> | |                                           |
> | 00:SCAN KUDU [functional_kudu.alltypestiny] |
> |    predicates: bigint_col < 10              |
> +---------------------------------------------+
> {code}
> We should make this clearer by printing the constant explicitly as a double, e.g. 
> {code}
> predicates: bigint_col < 10.0
> {code}




[jira] [Commented] (IMPALA-558) HS2::FetchResults sets hasMoreRows on first call even when 0 rows are returned

2018-07-11 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16540977#comment-16540977
 ] 

Tim Armstrong commented on IMPALA-558:
--

I think the issue for the 0-row case may have been fixed by the PlanRootSink 
refactoring, but there's a related issue where hasMoreRows is never set to 
false on the last non-empty batch returned.

I don't think we can always solve this without delaying return of rows 
currently processed, because we don't always know if more rows are coming. 

However, the current behaviour is a bit wonky. The problem here is that 
ClientRequestState::eos() doesn't become true until PlanRootSink::FlushFinal() 
is called, which can't be called until PlanRootSink::Send() hands off the last 
batch to the client.
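
The trade-off described above can be illustrated with a small Python sketch. It is not Impala's PlanRootSink or ClientRequestState code; `batches_with_has_more` is a hypothetical generator that reports an accurate "has more rows" flag by looking one batch ahead, which is exactly the delay mentioned in the comment.

```python
def batches_with_has_more(batches):
    """Yield (batch, has_more) pairs from an iterable of non-empty row batches.

    Knowing has_more for a batch requires pulling the next batch first, so each
    batch is handed to the caller one step later than it was produced.
    Hypothetical helper for illustration only.
    """
    it = iter(batches)
    try:
        current = next(it)
    except StopIteration:
        return  # No batches at all: nothing to yield.
    for upcoming in it:
        # A following batch exists, so the current one is not the last.
        yield current, True
        current = upcoming
    # The producer is exhausted: this is the last non-empty batch.
    yield current, False
```

The flag on the final batch is only correct because the producer was asked for one more batch before the current one was returned.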

> HS2::FetchResults sets hasMoreRows on first call even when 0 rows are returned
> --
>
> Key: IMPALA-558
> URL: https://issues.apache.org/jira/browse/IMPALA-558
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 1.1
>Reporter: Henry Robinson
>Assignee: Alexander Behm
>Priority: Minor
>  Labels: query-lifecycle
>
> The first call to {{FetchResults}} always sets {{hasMoreRows}} even when 0 
> rows should be returned. The next call correctly sets {{hasMoreRows == 
> False}}. The upshot is there's always an extra round-trip, although 
> correctness isn't affected.
> {code}
> execute_statement_req = TCLIService.TExecuteStatementReq()
> execute_statement_req.sessionHandle = resp.sessionHandle
> execute_statement_req.statement = "SELECT COUNT(*) FROM 
> functional.alltypes WHERE 1 = 2"
> execute_statement_resp = 
> self.hs2_client.ExecuteStatement(execute_statement_req)
> 
> fetch_results_req = TCLIService.TFetchResultsReq()
> fetch_results_req.operationHandle = execute_statement_resp.operationHandle
> fetch_results_req.maxRows = 100
> fetch_results_resp = self.hs2_client.FetchResults(fetch_results_req)
> 
> assert not fetch_results_resp.hasMoreRows # Fails
> {code}




[jira] [Updated] (IMPALA-558) HS2::FetchResults sets hasMoreRows in many cases where no more rows are to be returned

2018-07-11 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong updated IMPALA-558:
-
Summary: HS2::FetchResults sets hasMoreRows in many cases where no more 
rows are to be returned  (was: HS2::FetchResults sets hasMoreRows on first call 
even when 0 rows are returned)

> HS2::FetchResults sets hasMoreRows in many cases where no more rows are to be 
> returned
> --
>
> Key: IMPALA-558
> URL: https://issues.apache.org/jira/browse/IMPALA-558
> Project: IMPALA
>  Issue Type: Bug
>  Components: Clients
>Affects Versions: Impala 1.1
>Reporter: Henry Robinson
>Assignee: Alexander Behm
>Priority: Minor
>  Labels: query-lifecycle




[jira] [Resolved] (IMPALA-7278) distinct clause is not working as expected with custom UDFs

2018-07-12 Thread Tim Armstrong (JIRA)



 [ 
https://issues.apache.org/jira/browse/IMPALA-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Armstrong resolved IMPALA-7278.
---
Resolution: Not A Bug

> distinct clause is not working as expected with custom UDFs
> ---
>
> Key: IMPALA-7278
> URL: https://issues.apache.org/jira/browse/IMPALA-7278
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.8.0
>Reporter: shabnam perween
>Priority: Critical
>
> Distinct clause when executed with custom UDF returns unexpected results.
> Custom UDF Definition:
> udf.h file:
> {code}
> #ifndef IMPALA_UDF_SAMPLE_UDF_H
> #define IMPALA_UDF_SAMPLE_UDF_H
> #include "udf.h"
> using namespace impala_udf;
> #ifdef __cplusplus
> extern "C"
> {
> #endif
> StringVal udf_clear(FunctionContext* context, StringVal& sInput);
> #ifdef __cplusplus
> }
> #endif
> #endif
> {code}
> udf.cpp:
> {code}
> #include "clear.h"
> StringVal udf_clear(
>  FunctionContext* context,
>  StringVal& sInput /* String to encrypt */
>  )
> {
>  unsigned char* pReturnData = context->Allocate( 100 );
>  memset( pReturnData, NULL, 100);
>  memcpy(pReturnData, sInput.ptr, sInput.len );
>  StringVal sResult( pReturnData );
>  sResult.len = sInput.len;
>  context->Free( (uint8_t*)pReturnData );
>  return sResult;
> }
> {code}
> CMakeLists.txt:
> {code}
> project (clear)
>  ADD_LIBRARY (clear2.8_RHEL SHARED clear.cpp )
>  TARGET_LINK_LIBRARIES (clear2.8_RHEL libImpalaUdf.a )
>  SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES SUFFIX ".so")
>  SET_TARGET_PROPERTIES (clear2.8_RHEL PROPERTIES PREFIX "")
>  INSTALL ( TARGETS clear2.8_RHEL DESTINATION . )
> Query Syntax:
> CREATE TABLE clear (c1 STRING, c2 STRING) row format delimited fields 
> terminated by ',' stored as textfile;
> LOAD DATA INPATH '/user/clear.csv' OVERWRITE INTO TABLE clear;
> Query: describe clear
> +------+--------+---------+
> | name | type   | comment |
> +------+--------+---------+
> | c1   | string |         |
> | c2   | string |         |
> +------+--------+---------+
> Fetched 2 row(s) in 0.04s
> select * from clear;
> +-----+-----+
> | c1  | c2  |
> +-----+-----+
> | 111 | 111 |
> | 111 | 111 |
> | 22  | 22  |
> | 44  | 44  |
> | 22  | 22  |
> | 333 | 333 |
> | 333 | 333 |
> +-----+-----+
> Fetched 7 row(s) in 0.14s
> select distinct udf_clear(c1),c2 from clear;
> +-----------------------+-----+
> | default.udf_clear(c1) | c2  |
> +-----------------------+-----+
> | {color:#d04437}*22* {color}| 44 |   <== this should be *44* 
> | 22                    | 22  |
> | 333                   | 333 |
> | 111                   | 111 |
> +-----------------------+-----+
> Fetched 4 row(s) in 0.24s
> {code}
>  
> Expected result:
> {code}
> select distinct c1,c2 from clear;
> +-----+-----+
> | c1  | c2  |
> +-----+-----+
> | 44  | 44  |
> | 22  | 22  |
> | 333 | 333 |
> | 111 | 111 |
> +-----+-----+
> Fetched 4 row(s) in 0.25s
>  {code}




[jira] [Commented] (IMPALA-7278) distinct clause is not working as expected with custom UDFs

2018-07-12 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541808#comment-16541808
 ] 

Tim Armstrong commented on IMPALA-7278:
---

[~shabnam] that UDF code is buggy regardless of which release you're running 
against. You may have gotten lucky for some reason in 2.3, but the code was 
still buggy. So yes, you need to fix the UDFs; Impala's runtime can't determine 
that you didn't *actually* mean to free the memory that your code freed.

> distinct clause is not working as expected with custom UDFs
> ---
>
> Key: IMPALA-7278
> URL: https://issues.apache.org/jira/browse/IMPALA-7278
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Affects Versions: Impala 2.8.0
>Reporter: shabnam perween
>Priority: Critical




[jira] [Commented] (IMPALA-6810) query_test::test_runtime_filters.py::test_row_filters fails when run against an external cluster

2018-07-12 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541934#comment-16541934
 ] 

Tim Armstrong commented on IMPALA-6810:
---

Option 3 makes the most sense to me: the name of the pool is not important in 
this test, since it's not intended to test the admission control configuration.

> query_test::test_runtime_filters.py::test_row_filters fails when run against 
> an external cluster
> 
>
> Key: IMPALA-6810
> URL: https://issues.apache.org/jira/browse/IMPALA-6810
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.12.0
>Reporter: David Knupp
>Assignee: Michael Brown
>Priority: Critical
>  Labels: admission-control, resource-management
>
> Presumably this test has been passing when run against the local 
> mini-cluster. When run against an external cluster, however, the test fails 
> with an AssertionError because the exception string is different than 
> expected.
> The expected string is:
> _ImpalaBeeswaxException: INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: Rejected query from pool 
> {color:red}*default-pool*{color}: minimum memory reservation is greater than 
> memory available to the query for buffer reservations. Increase the 
> buffer_pool_limit to 290.00 MB. See the query profile for more information 
> about the per-node memory requirements._
> The actual string is:
> _ImpalaBeeswaxException: INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: Rejected query from pool 
> {color:red}*root.jenkins*{color}: minimum memory reservation is greater than 
> memory available to the query for buffer reservations. Increase the 
> buffer_pool_limit to 290.00 MB. See the query profile for more information 
> about the per-node memory requirements._
> {noformat}
> Stacktrace
> query_test/test_runtime_filters.py:168: in test_row_filters
> test_file_vars={'$RUNTIME_FILTER_WAIT_TIME_MS' : str(WAIT_TIME_MS)})
> common/impala_test_suite.py:401: in run_test_case
> self.__verify_exceptions(test_section['CATCH'], str(e), use_db)
> common/impala_test_suite.py:279: in __verify_exceptions
> (expected_str, actual_str)
> E   AssertionError: Unexpected exception string. Expected: 
> ImpalaBeeswaxException: INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: Rejected query from pool 
> default-pool: minimum memory reservation is greater than memory available to 
> the query for buffer reservations. Increase the buffer_pool_limit to 290.00 
> MB. See the query profile for more information about the per-node memory 
> requirements.
> E   Not found in actual: ImpalaBeeswaxException: INNER EXCEPTION: <class 'beeswaxd.ttypes.BeeswaxException'> MESSAGE: Rejected query from pool 
> root.jenkins: minimum memory reservation is greater than memory available to 
> the query for buffer reservations. Increase the buffer_pool_limit to 290.00 
> MB. See the query profile for more information about the per-node memory 
> requirements.
> {noformat}




[jira] [Commented] (IMPALA-6810) query_test::test_runtime_filters.py::test_row_filters fails when run against an external cluster

2018-07-12 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541935#comment-16541935
 ] 

Tim Armstrong commented on IMPALA-6810:
---

The other two options seem OK too, so I wouldn't be opposed, but IMO it would 
be good to do the simplest thing.

> query_test::test_runtime_filters.py::test_row_filters fails when run against 
> an external cluster
> 
>
> Key: IMPALA-6810
> URL: https://issues.apache.org/jira/browse/IMPALA-6810
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.12.0
>Reporter: David Knupp
>Assignee: Michael Brown
>Priority: Critical
>  Labels: admission-control, resource-management




[jira] [Commented] (IMPALA-6810) query_test::test_runtime_filters.py::test_row_filters fails when run against an external cluster

2018-07-12 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541966#comment-16541966
 ] 

Tim Armstrong commented on IMPALA-6810:
---

Yeah that should be safe




> query_test::test_runtime_filters.py::test_row_filters fails when run against 
> an external cluster
> 
>
> Key: IMPALA-6810
> URL: https://issues.apache.org/jira/browse/IMPALA-6810
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Affects Versions: Impala 2.12.0
>Reporter: David Knupp
>Assignee: Michael Brown
>Priority: Critical
>  Labels: admission-control, resource-management




[jira] [Created] (IMPALA-7290) Move impala-shell to use HS2

2018-07-12 Thread Tim Armstrong (JIRA)

Tim Armstrong created IMPALA-7290:
-

 Summary: Move impala-shell to use HS2
 Key: IMPALA-7290
 URL: https://issues.apache.org/jira/browse/IMPALA-7290
 Project: IMPALA
  Issue Type: Improvement
  Components: Clients
Reporter: Tim Armstrong


Most clients have moved to the HS2 interface. impala-shell is one of the 
laggards. We should switch impala-shell to use the newer and more standard 
interface.




[jira] [Commented] (IMPALA-7287) Speed up printing of tab delimited output in impala-shell

2018-07-12 Thread Tim Armstrong (JIRA)



[ 
https://issues.apache.org/jira/browse/IMPALA-7287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541980#comment-16541980
 ] 

Tim Armstrong commented on IMPALA-7287:
---

[~csringhofer] we eventually want to move impala-shell off beeswax to HS2, since 
beeswax is a legacy interface. I'd advocate against optimising for beeswax at the 
moment since that code will be thrown away eventually. We didn't have a JIRA for 
that work, so I created one: IMPALA-7290.

> Speed up printing of tab delimited output in impala-shell
> -
>
> Key: IMPALA-7287
> URL: https://issues.apache.org/jira/browse/IMPALA-7287
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Csaba Ringhofer
>Assignee: Nghia Le
>Priority: Minor
>
> Beeswax returns result rows as strings where the columns are separated by tab 
> characters. ImpalaClient.fetch() splits the rows and printing functions join 
> them again. If the output_delimiter is \t, and the row doesn't contain any 
> special characters, then the result is exactly the same string that was 
> fetched. As \t is the default delimiter, I think that it would be useful to 
> create a "fast path" for this special case.
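
The "fast path" proposed in the description above could look roughly like the sketch below. This is not impala-shell's actual code: format_row, format_row_general and SPECIAL_CHARS are hypothetical names, and the set of characters that forces quoting is an assumption made for illustration. The general path splits the tab-separated row returned by beeswax and re-joins it with the requested delimiter; the fast path returns the fetched string untouched when the delimiter is \t and nothing in the row needs quoting.

```python
import csv
import io

# Assumption: characters that would force the general path to quote a field.
SPECIAL_CHARS = ('"', '\n', '\r')


def format_row_general(raw_row, output_delimiter):
    """General path: split the fetched row on tabs, then re-join with quoting."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=output_delimiter, lineterminator='')
    writer.writerow(raw_row.split('\t'))
    return buf.getvalue()


def format_row(raw_row, output_delimiter='\t'):
    """Return the row formatted for output, skipping split/join when possible."""
    # Fast path: with the default delimiter and no special characters, the
    # output is assumed to be identical to the string that was fetched.
    if output_delimiter == '\t' and not any(c in raw_row for c in SPECIAL_CHARS):
        return raw_row
    return format_row_general(raw_row, output_delimiter)
```

For example, format_row('a\tb\tc') returns the input string itself without splitting it, while format_row('a\tb', ',') still goes through the general path.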



