[Impala-ASF-CR](2.x) IMPALA-6883: [DOCS] Refactor impala authorization doc

2019-03-06 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12688 )

Change subject: IMPALA-6883: [DOCS] Refactor impala_authorization doc
..


Patch Set 1:

Conflicts:
docs/topics/impala_grant.xml

I try to resolve the conflicts as reasonable as possible. Feel like we need to 
go through all the docs about fine-grained privileges when we release 2.13.


--
To view, visit http://gerrit.cloudera.org:8080/12688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df72adb25dcdcbc286934b048645f47d876b33d
Gerrit-Change-Number: 12688
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Mar 2019 07:39:29 +
Gerrit-HasComments: No


[Impala-ASF-CR](2.x) IMPALA-6883: [DOCS] Refactor impala authorization doc

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12688 )

Change subject: IMPALA-6883: [DOCS] Refactor impala_authorization doc
..


Patch Set 1: Verified+1

Build Successful

https://jenkins.impala.io/job/gerrit-docs-auto-test/249/ : Doc tests passed.


--
To view, visit http://gerrit.cloudera.org:8080/12688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df72adb25dcdcbc286934b048645f47d876b33d
Gerrit-Change-Number: 12688
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 07 Mar 2019 07:29:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7917 (Part 3): Decouple Sentry from Impala

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12684 )

Change subject: IMPALA-7917 (Part 3): Decouple Sentry from Impala
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2383/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12684
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a5d3e0a3e86ac2b0329b56247357fca93229dd0
Gerrit-Change-Number: 12684
Gerrit-PatchSet: 4
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Thu, 07 Mar 2019 07:32:34 +
Gerrit-HasComments: No


[Impala-ASF-CR](2.x) IMPALA-6883: [DOCS] Refactor impala authorization doc

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12688 )

Change subject: IMPALA-6883: [DOCS] Refactor impala_authorization doc
..


Patch Set 1:

Build Started https://jenkins.impala.io/job/gerrit-docs-auto-test/249/

Testing docs change - this change appears to modify docs/ and no code. This is 
experimental - please report any issues to tarmstr...@cloudera.com or on this 
JIRA: IMPALA-7317


--
To view, visit http://gerrit.cloudera.org:8080/12688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I3df72adb25dcdcbc286934b048645f47d876b33d
Gerrit-Change-Number: 12688
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 07 Mar 2019 07:23:53 +
Gerrit-HasComments: No


[Impala-ASF-CR](2.x) IMPALA-6883: [DOCS] Refactor impala authorization doc

2019-03-06 Thread Quanlong Huang (Code Review)
Hello Alex Rodoni, Impala Public Jenkins,

I'd like you to do a code review. Please visit

http://gerrit.cloudera.org:8080/12688

to review the following change.


Change subject: IMPALA-6883: [DOCS] Refactor impala_authorization doc
..

IMPALA-6883: [DOCS] Refactor impala_authorization doc

Change-Id: I3df72adb25dcdcbc286934b048645f47d876b33d
Reviewed-on: http://gerrit.cloudera.org:8080/10786
Reviewed-by: Alex Rodoni 
Tested-by: Impala Public Jenkins 
---
M docs/impala_keydefs.ditamap
M docs/shared/impala_common.xml
M docs/topics/impala_authorization.xml
M docs/topics/impala_grant.xml
4 files changed, 565 insertions(+), 614 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/88/12688/1
--
To view, visit http://gerrit.cloudera.org:8080/12688
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3df72adb25dcdcbc286934b048645f47d876b33d
Gerrit-Change-Number: 12688
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Alex Rodoni 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-7917 (Part 3): Decouple Sentry from Impala

2019-03-06 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12684 )

Change subject: IMPALA-7917 (Part 3): Decouple Sentry from Impala
..


Patch Set 4:

This should be the last part of Sentry decoupling.


-- 
To view, visit http://gerrit.cloudera.org:8080/12684
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a5d3e0a3e86ac2b0329b56247357fca93229dd0
Gerrit-Change-Number: 12684
Gerrit-PatchSet: 4
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Thu, 07 Mar 2019 06:48:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7917 (Part 3): Decouple Sentry from Impala

2019-03-06 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12684


Change subject: IMPALA-7917 (Part 3): Decouple Sentry from Impala
..

IMPALA-7917 (Part 3): Decouple Sentry from Impala

The third part of this patch introduces an interface called
AuthorizationManager to perform authorization management related
functions, such as granting/revoking a privilege, etc. Some existing
authorization management methods have been refactored to reduce the
need for if/else conditions to perform certain actions. This patch
moves code related to Sentry authorization management code to
SentryAuthorizationManager.

This patch does not implement AuthorizationManager for Ranger.

This patch has no functionality change.

Testing:
- Ran all FE tests
- Ran all E2E authorization tests

Change-Id: I1a5d3e0a3e86ac2b0329b56247357fca93229dd0
---
M fe/src/main/java/org/apache/impala/authorization/AuthorizationFactory.java
A fe/src/main/java/org/apache/impala/authorization/AuthorizationManager.java
M fe/src/main/java/org/apache/impala/authorization/NoneAuthorizationFactory.java
M 
fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationFactory.java
M 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationFactory.java
A 
fe/src/main/java/org/apache/impala/authorization/sentry/SentryAuthorizationManager.java
M fe/src/main/java/org/apache/impala/authorization/sentry/SentryProxy.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/JniCatalog.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/testutil/PlannerTestCaseLoader.java
13 files changed, 864 insertions(+), 395 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/12684/4
--
To view, visit http://gerrit.cloudera.org:8080/12684
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1a5d3e0a3e86ac2b0329b56247357fca93229dd0
Gerrit-Change-Number: 12684
Gerrit-PatchSet: 4
Gerrit-Owner: Fredy Wijaya 


[Impala-ASF-CR] IMPALA-7917 (Part 2): Decouple Sentry from Impala

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12542 )

Change subject: IMPALA-7917 (Part 2): Decouple Sentry from Impala
..


Patch Set 13:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3888/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12542
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I051d06451fb14ba2ffb3f7e0d4aae37dee55ab86
Gerrit-Change-Number: 12542
Gerrit-PatchSet: 13
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: radford nguyen 
Gerrit-Comment-Date: Thu, 07 Mar 2019 06:41:02 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7917 (Part 2): Decouple Sentry from Impala

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12542 )

Change subject: IMPALA-7917 (Part 2): Decouple Sentry from Impala
..


Patch Set 13: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12542
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I051d06451fb14ba2ffb3f7e0d4aae37dee55ab86
Gerrit-Change-Number: 12542
Gerrit-PatchSet: 13
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: radford nguyen 
Gerrit-Comment-Date: Thu, 07 Mar 2019 06:41:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7917 (Part 2): Decouple Sentry from Impala

2019-03-06 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12542 )

Change subject: IMPALA-7917 (Part 2): Decouple Sentry from Impala
..


Patch Set 12:

> Patch Set 12: Verified-1
>
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3887/

Looks like thrift-server-test crashed:

:25:55 ]  56/103 Test  #56: thrift-server-test ...***Exception: 
Other  1.11 sec
06:25:55 ] Turning perftools heap leak checking off
06:25:55 ] Loading random data
06:25:55 ] Initializing database 'f2b2-8af6-9c5c-7874/krb5kdc/principal' for 
realm 'KRBTEST.COM',
06:25:55 ] master key name 'K/m...@krbtest.com'
06:25:55 ] Mar 07 03:02:53 ip-172-31-45-54 krb5kdc[80473](info): setting up 
network...
06:25:55 ] krb5kdc: Address already in use - Cannot bind server socket on 
0.0.0.0.51293
06:25:55 ] krb5kdc: setsockopt(9,IPV6_V6ONLY,1) worked
06:25:55 ] Mar 07 03:02:53 ip-172-31-45-54 krb5kdc[80473](info): listening on 
fd 9: udp ::.51293 (pktinfo)
06:25:55 ] Mar 07 03:02:53 ip-172-31-45-54 krb5kdc[80473](info): listening on 
fd 10: udp 172.31.45.54.51293
06:25:55 ] Mar 07 03:02:53 ip-172-31-45-54 krb5kdc[80473](info): set up 2 
sockets
06:25:55 ] Mar 07 03:02:53 ip-172-31-45-54 krb5kdc[80473](info): commencing 
operation
06:25:55 ] krb5kdc: starting...
06:25:55 ] *** Check failure stack trace: ***
06:25:55 ] @  0x484217c  google::LogMessage::Fail()
06:25:55 ] @  0x4843a21  google::LogMessage::SendToLog()
06:25:55 ] @  0x4841b56  google::LogMessage::Flush()
06:25:55 ] @  0x484511d  google::LogMessageFatal::~LogMessageFatal()
06:25:55 ] @  0x1a510dd  main
06:25:55 ] @ 0x7f70726a482f  __libc_start_main
06:25:55 ] @  0x1a44a68  _start
06:25:55 ] Wrote minidump to 
/home/ubuntu/Impala/logs/be_tests/minidumps/thrift-server-test/d5f8d324-a674-4331-52673c89-d7e11863.dmp
06:25:55 ] Wrote minidump to 
/home/ubuntu/Impala/logs/be_tests/minidumps/thrift-server-test/d5f8d324-a674-4331-52673c89-d7e11863.dmp

I don't think this change has anything to do with it. Restarting the GVO.


--
To view, visit http://gerrit.cloudera.org:8080/12542
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I051d06451fb14ba2ffb3f7e0d4aae37dee55ab86
Gerrit-Change-Number: 12542
Gerrit-PatchSet: 12
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: radford nguyen 
Gerrit-Comment-Date: Thu, 07 Mar 2019 06:40:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7917 (Part 2): Decouple Sentry from Impala

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12542 )

Change subject: IMPALA-7917 (Part 2): Decouple Sentry from Impala
..


Patch Set 12: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/3887/


--
To view, visit http://gerrit.cloudera.org:8080/12542
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I051d06451fb14ba2ffb3f7e0d4aae37dee55ab86
Gerrit-Change-Number: 12542
Gerrit-PatchSet: 12
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: radford nguyen 
Gerrit-Comment-Date: Thu, 07 Mar 2019 06:25:56 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7988: addendum: tolerate docker stop failure

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12640 )

Change subject: IMPALA-7988: addendum: tolerate docker stop failure
..


Patch Set 4: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/12640
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I211c9d29a9a30f9eebfff2973b4fb422d963f132
Gerrit-Change-Number: 12640
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 07 Mar 2019 05:58:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7988: addendum: tolerate docker stop failure

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12640 )

Change subject: IMPALA-7988: addendum: tolerate docker stop failure
..

IMPALA-7988: addendum: tolerate docker stop failure

"docker stop" may fail if the container has not started.

This bug prevented using the dockerised minicluster.

Change-Id: I211c9d29a9a30f9eebfff2973b4fb422d963f132
Reviewed-on: http://gerrit.cloudera.org:8080/12640
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M bin/start-impala-cluster.py
1 file changed, 2 insertions(+), 1 deletion(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/12640
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I211c9d29a9a30f9eebfff2973b4fb422d963f132
Gerrit-Change-Number: 12640
Gerrit-PatchSet: 5
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7917 (Part 2): Decouple Sentry from Impala

2019-03-06 Thread radford nguyen (Code Review)
radford nguyen has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12542 )

Change subject: IMPALA-7917 (Part 2): Decouple Sentry from Impala
..


Patch Set 12:

(1 comment)

hi @fredy, just adding a teeny little comment

http://gerrit.cloudera.org:8080/#/c/12542/2/fe/src/main/java/org/apache/impala/analysis/CopyTestCaseStmt.java
File fe/src/main/java/org/apache/impala/analysis/CopyTestCaseStmt.java:

http://gerrit.cloudera.org:8080/#/c/12542/2/fe/src/main/java/org/apache/impala/analysis/CopyTestCaseStmt.java@121
PS2, Line 121: analyzer.registerPrivReq(builder ->
> I like the last one. Will update.
I also prefer the last one because it makes it easy to see which methods are 
called on `builder` at-a-glance.  With the one-liner, method calls on other 
objects (e.g. arguments) produce noise



--
To view, visit http://gerrit.cloudera.org:8080/12542
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I051d06451fb14ba2ffb3f7e0d4aae37dee55ab86
Gerrit-Change-Number: 12542
Gerrit-PatchSet: 12
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: radford nguyen 
Gerrit-Comment-Date: Thu, 07 Mar 2019 03:48:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR](2.x) Ignore IMPALA-6883 and IMPALA-7236

2019-03-06 Thread Quanlong Huang (Code Review)
Quanlong Huang has abandoned this change. ( 
http://gerrit.cloudera.org:8080/12678 )

Change subject: Ignore IMPALA-6883 and IMPALA-7236
..


Abandoned

Should backport IMPALA-6883 to 2.x
--
To view, visit http://gerrit.cloudera.org:8080/12678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: abandon
Gerrit-Change-Id: I22c5e379085fe894fc3e308919542ae8855bd73e
Gerrit-Change-Number: 12678
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 


[Impala-ASF-CR](2.x) Ignore IMPALA-6883 and IMPALA-7236

2019-03-06 Thread Quanlong Huang (Code Review)
Quanlong Huang has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12678 )

Change subject: Ignore IMPALA-6883 and IMPALA-7236
..


Patch Set 1:

> Patch Set 1:
>
> > Patch Set 1:
> >
> > Fredy, could you help to double check that IMPALA-6883 should be skipped 
> > for 2.x? Thanks!
>
> Do we plan to backport the fine-grained permissions into 2.x? If yes, we 
> should backport the doc as well.

Oh, sure! Haven't gone through this in details. I thought it's for 3.x. Will 
upload a new patch.


--
To view, visit http://gerrit.cloudera.org:8080/12678
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I22c5e379085fe894fc3e308919542ae8855bd73e
Gerrit-Change-Number: 12678
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Thu, 07 Mar 2019 03:12:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6326: part 2: remove fetch thread in stress test

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12681 )

Change subject: IMPALA-6326: part 2: remove fetch thread in stress test
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2382/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9afd74e1408823a9e5c0f2628ec9f8aafdcec69
Gerrit-Change-Number: 12681
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 07 Mar 2019 02:38:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6326: part 2: remove fetch thread in stress test

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12681 )

Change subject: IMPALA-6326: part 2: remove fetch thread in stress test
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2381/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9afd74e1408823a9e5c0f2628ec9f8aafdcec69
Gerrit-Change-Number: 12681
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 07 Mar 2019 02:18:29 +
Gerrit-HasComments: No


[Impala-ASF-CR] [stress] pull out QueryRunner

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12578 )

Change subject: [stress] pull out QueryRunner
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2380/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12578
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I69c907a65147d082211837cbbaba7225aa4b67cb
Gerrit-Change-Number: 12578
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 07 Mar 2019 02:18:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12591 )

Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates
..


Patch Set 12:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2379/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353
Gerrit-Change-Number: 12591
Gerrit-PatchSet: 12
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 07 Mar 2019 02:17:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12481 )

Change subject: IMPALA-7368: Add initial support for DATE type
..


Patch Set 9:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2378/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12481
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
Gerrit-Change-Number: 12481
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 07 Mar 2019 02:14:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] [stress] factor out MemBroker into its own file.

2019-03-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12687


Change subject: [stress] factor out MemBroker into its own file.
..

[stress] factor out MemBroker into its own file.

This is a cut-and-paste with the appropriate manual updates to
import statements.

The only motivation is to make concurrent_select.py less intimidating.

Testing:
Ran a short local stress test.

Change-Id: I475d2e166bbf940e89135ea04be7b3003c37141b
---
M tests/stress/concurrent_select.py
A tests/stress/mem_broker.py
2 files changed, 124 insertions(+), 96 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/12687/1
--
To view, visit http://gerrit.cloudera.org:8080/12687
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I475d2e166bbf940e89135ea04be7b3003c37141b
Gerrit-Change-Number: 12687
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7988: addendum: tolerate docker stop failure

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12640 )

Change subject: IMPALA-7988: addendum: tolerate docker stop failure
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3886/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12640
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I211c9d29a9a30f9eebfff2973b4fb422d963f132
Gerrit-Change-Number: 12640
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 07 Mar 2019 01:36:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7917 (Part 2): Decouple Sentry from Impala

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12542 )

Change subject: IMPALA-7917 (Part 2): Decouple Sentry from Impala
..


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3887/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12542
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I051d06451fb14ba2ffb3f7e0d4aae37dee55ab86
Gerrit-Change-Number: 12542
Gerrit-PatchSet: 12
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Thu, 07 Mar 2019 01:36:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12591 )

Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates
..


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/12591/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3179
PS12, Line 3179: "Table parameters must contain catalog version 
before adding it to partition parameters");
line too long (102 > 90)



--
To view, visit http://gerrit.cloudera.org:8080/12591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353
Gerrit-Change-Number: 12591
Gerrit-PatchSet: 12
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 
Gerrit-Comment-Date: Thu, 07 Mar 2019 01:34:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type

2019-03-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12481 )

Change subject: IMPALA-7368: Add initial support for DATE type
..


Patch Set 9:

(2 comments)

Do you mean that, with these code changes, you can scan HBase date tables? Or 
that we should have negative tests for HBase?

I think we should have test coverage but I'm open to splitting that out into a 
separate patch if that helps this get in faster.

http://gerrit.cloudera.org:8080/#/c/12481/6/fe/src/main/java/org/apache/impala/analysis/DateLiteral.java
File fe/src/main/java/org/apache/impala/analysis/DateLiteral.java:

http://gerrit.cloudera.org:8080/#/c/12481/6/fe/src/main/java/org/apache/impala/analysis/DateLiteral.java@127
PS6, Line 127: return daysSinceEpoch_ - other.daysSinceEpoch_;
> Done. At the very beginning 'daysSinceEpoch_' was not a member and later I
Makes sense!


http://gerrit.cloudera.org:8080/#/c/12481/6/testdata/workloads/functional-query/queries/QueryTest/date.test
File testdata/workloads/functional-query/queries/QueryTest/date.test:

http://gerrit.cloudera.org:8080/#/c/12481/6/testdata/workloads/functional-query/queries/QueryTest/date.test@36
PS6, Line 36: where '2017-11-28' in (date_col)
> To be honest, I didn't give this too much thought. I just copied the timest
Ah ok, that makes sense. I don't think there are likely to be bugs here but 
it's good to be sure.



--
To view, visit http://gerrit.cloudera.org:8080/12481
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
Gerrit-Change-Number: 12481
Gerrit-PatchSet: 9
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 07 Mar 2019 00:55:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-6326: part 2: remove fetch thread in stress test

2019-03-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/12681 )

Change subject: IMPALA-6326: part 2: remove fetch thread in stress test
..

IMPALA-6326: part 2: remove fetch thread in stress test

Ensure cursor is only accessed from a single thread. The means reworking
the code so that we check the time limit between fetch calls.

Use EXEC_TIME_LIMIT_S as an alternative to the previous multi-threaded
cancellation logic - it allows queries to be cancelled even when the
client is blocked or slow. This is implemented with the concept of
a CancelMechanism that determines *how* a query should be cancelled.
Query timeouts (where we want to cancel queries that run longer
than expected) are implemented using both cancel mechanisms, in
case the client is stuck in fetch or similar. Expected cancellations
are implemented with a random mechanism so that both code paths get
covered.

Testing:
Ran a cluster stress test.

Ran a couple of single-node stress tests with TPC-H and random queries.

Change-Id: If9afd74e1408823a9e5c0f2628ec9f8aafdcec69
---
M tests/stress/concurrent_select.py
M tests/stress/query_runner.py
2 files changed, 127 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/12681/3
--
To view, visit http://gerrit.cloudera.org:8080/12681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9afd74e1408823a9e5c0f2628ec9f8aafdcec69
Gerrit-Change-Number: 12681
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] IMPALA-6326: part 2: remove fetch thread in stress test

2019-03-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/12681 )

Change subject: IMPALA-6326: part 2: remove fetch thread in stress test
..

IMPALA-6326: part 2: remove fetch thread in stress test

Ensure cursor is only accessed from a single thread. The means reworking
the code so that we check the time limit between fetch calls.

Use EXEC_TIME_LIMIT_S as an alternative to the previous multi-threaded
cancellation logic - it allows queries to be cancelled even when the
client is blocked or slow. This is implemented with the concept of
a CancelMechanism that determines *how* a query should be cancelled.
Query timeouts (where we want to cancel queries that run longer
than expected) are implemented using both cancel mechanisms, in
case the client is stuck in fetch or similar. Expected cancellations
are implemented with a random mechanism so that both code paths get
covered.

Testing:
Ran a cluster stress test.

Ran a couple of single-node stress tests with TPC-H and random queries.

Change-Id: If9afd74e1408823a9e5c0f2628ec9f8aafdcec69
---
M tests/stress/concurrent_select.py
M tests/stress/query_runner.py
2 files changed, 129 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/81/12681/2
--
To view, visit http://gerrit.cloudera.org:8080/12681
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9afd74e1408823a9e5c0f2628ec9f8aafdcec69
Gerrit-Change-Number: 12681
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 


[Impala-ASF-CR] [stress] pull out QueryRunner

2019-03-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12578 )

Change subject: [stress] pull out QueryRunner
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12578/2/tests/stress/query_runner.py
File tests/stress/query_runner.py:

http://gerrit.cloudera.org:8080/#/c/12578/2/tests/stress/query_runner.py@79
PS2, Line 79: # TODO: other classes reach in and touch _metrics
> I fixed this, can remove the TODO
Done



--
To view, visit http://gerrit.cloudera.org:8080/12578
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I69c907a65147d082211837cbbaba7225aa4b67cb
Gerrit-Change-Number: 12578
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 07 Mar 2019 00:23:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [stress] pull out QueryRunner

2019-03-06 Thread Tim Armstrong (Code Review)
Hello Thomas Marshall, David Knupp, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12578

to look at the new patch set (#3).

Change subject: [stress] pull out QueryRunner
..

[stress] pull out QueryRunner

Further refactoring to reduce the size of concurrent_select.py.

Pull out the QueryRunner class and miscellaneous utility functions.
Improve encapsulation of _metrics by adding accessor functions.

Testing:
Ran locally with various arguments including DML and random queries.
Made it sure did some binary search by deleting parts of runtime info.

Change-Id: I69c907a65147d082211837cbbaba7225aa4b67cb
---
M tests/stress/concurrent_select.py
A tests/stress/query_runner.py
A tests/stress/util.py
3 files changed, 538 insertions(+), 469 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/78/12578/3
--
To view, visit http://gerrit.cloudera.org:8080/12578
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I69c907a65147d082211837cbbaba7225aa4b67cb
Gerrit-Change-Number: 12578
Gerrit-PatchSet: 3
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates

2019-03-06 Thread Vihang Karajgaonkar (Code Review)
Vihang Karajgaonkar has uploaded a new patch set (#12). ( 
http://gerrit.cloudera.org:8080/12591 )

Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates
..

IMPALA-7972 Detect self-events to avoid unnecessary invalidates

This patch adds support to detect self-generated events from catalog.
This is used to avoid unnecessary invalidates to the tables from such
self-events. Currently, alter_table, alter_partition, add_partition and
drop_partition event types can invalidate the table metadata.

Originally, we planned to have a global version number support from
metastore (see HIVE-21115). But since that is still not complete, we
rely on a combination of other identifiers to determine if a event is
self-generated or not. These self-event identifiers consists of values
from the table/partition parameters. A catalog service uuid
and the catalog version number. The uuid is generated for each
catalogservice when it comes up and it adds it to the table/partition
parameters with the key "impala.CatalogServiceId". The catalog version
number is added with the key "impala.CatalogVersion".

When catalog executes a DDL operation it appends the current catalog
version to the list of version numbers for the in-flight events for the
table. Events processor clears this version when the corresponding
version number identified by serviceId is received in the event. This is
needed since it is possible that a external non-Impala system which
generates the event presents the same serviceId and version number later
on. The algorithm to detect a self-event is as below.

1. Add the service id and expected catalog version to table/partition
parameters when executing the DDL operation. When the HMS operation is
successful, add the version number to the list of version for in-flight
events at table level.
2. When the event is received, the first time you see the combination of
serviceId and version number, event processor clears the version number
from table's list and determines the event as self-generated (and hence
ignored)
3. If the event data presents a unknown serviceId or if the version
number is not present in the list of in-flight versions, event is not a
self-event and needs to be processed.

In order to limit the total memory footprint, only 10 version numbers
are stored at the table. Since the event processor is expected to poll
every few seconds this should be a reasonable bound which satisfies most
use-cases. Otherwise, event processor may wrongly process a self-event
to invalidate the table. In such a case, its a performance penalty not a
correctness issue.

In case of drop_partition event, the partition object is not available
in the event. Hence we cannot determine if its a self-event. In such
cases currently we always issue a invalidate command. This is a known
limitation and will be improved in IMPALA-7973

Patch adds new tests to trigger alter table/partition DDLs from impala
and makes sure that the table is not invalidated.

Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M 
fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M 
fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
6 files changed, 1,102 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12591/12
--
To view, visit http://gerrit.cloudera.org:8080/12591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353
Gerrit-Change-Number: 12591
Gerrit-PatchSet: 12
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Bharath Krishna 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Vihang Karajgaonkar 


[Impala-ASF-CR] IMPALA-8147: part 2: Remove make impala.sh

2019-03-06 Thread Joe McDonnell (Code Review)
Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12556 )

Change subject: IMPALA-8147: part 2: Remove make_impala.sh
..


Patch Set 1: Code-Review+2

Good to see this cleaned up!


--
To view, visit http://gerrit.cloudera.org:8080/12556
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8688ef9fe651475907cb319eb0b336fd6bef8dfe
Gerrit-Change-Number: 12556
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 07 Mar 2019 00:09:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC

2019-03-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12168 )

Change subject: IMPALA-6503: Support reading complex types from ORC
..


Patch Set 22: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/12168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Gerrit-Change-Number: 12168
Gerrit-PatchSet: 22
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 06 Mar 2019 23:56:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

2019-03-06 Thread Andrew Sherman (Code Review)
Andrew Sherman has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12680 )

Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance 
regression.
..


Patch Set 1:

Thanks Csaba for the review


--
To view, visit http://gerrit.cloudera.org:8080/12680
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 06 Mar 2019 21:41:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type

2019-03-06 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#9). ( 
http://gerrit.cloudera.org:8080/12481 )

Change subject: IMPALA-7368: Add initial support for DATE type
..

IMPALA-7368: Add initial support for DATE type

DATE values describe a particular year/month/day in the form
-MM-dd. For example: DATE '2019-02-15'. DATE values do not have a
time of day component. The range of values supported for the DATE type
is -01-01 to -12-31.

This initial DATE type support covers TEXT fileformat only.
'DateValue' is used as the internal type to represent DATE values.

The changes are as follows:
- Support for DATE literal syntax.
- Explicit casting between DATE and other types:
- from STRING to DATE. The string value must be formatted as
  -MM-dd.
- from DATE to STRING. The resulting string value is formatted as
  -MM-dd.
- from TIMESTAMP to DATE. The source timestamp's time of day
  component is ignored.
- from DATE to TIMESTAMP. The target timestamp's time of day
  component is set to 00:00:00.
- Implicit casting between DATE and other types:
- from STRING to DATE if the source string value is used in a
  context where a DATE value is expected.
- from DATE to TIMESTAMP if the source date value is used in a
  context where a TIMESTAMP value is expected.
- Since both STRING -> DATE and STRING -> TIMESTAMP implicit
  conversions are possible, the function resolution logic was changed
  to select the right version of overloaded functions:
 - If both implicit conversions are applicable equally well,
   STRING -> TIMESTAMP is preferred for backward compatibility
   reasons. E.g: year('2019-02-15') must resolve to
   year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is
   not implemented yet, so this is not an issue at the moment but
   it will be in the future.
 - If one implicit conversion is a better fit over the other, we
   must choose the better one. E.g:
   if(false, '2011-01-01', DATE '1499-02-02') must resolve to
   if(BOOLEAN, DATE, DATE) instead of
   if(BOOLEAN, TIMESTAMP, TIMESTAMP).
- Codegen infrastructure changes for expression evaluation.
- 'IS [NOT] NULL' and '[NOT] IN' predicates.
- Common comparison operators (including the 'BETWEEN' operator).
- Infrastructure changes for built-in functions.
- Some built-in functions: conditional, aggregate, analytical and
  math functions.
- C++ UDF/UDA support.
- Support partitioning and grouping by DATE.
- Beeswax, HiveServer2 support.

These items are tightly coupled and it makes sense to implement them
in one change-set.

Testing:
- A new partitioned TEXT table 'functional.date_tbl' was introduced
  for DATE-related tests.
- BE and FE tests were extended to cover DATE type.
- E2E tests:
- since DATE type is supported for TEXT fileformat only, most DATE
  tests were implemented separately in
  tests/query_test/test_date_queries.py.

Note, that this change-set is not a complete DATE type implementation,
but it lays the foundation for future work:
- Add date support to the random query generator.
- Implement a complete set of built-in functions.
- Add Parquet support.
- Add HBase and Kudu support.
- Optionally support Avro and ORC.
For further details, see IMPALA-6169.

Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
---
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/codegen-anyval.h
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/exec/aggregator.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/text-converter.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/anyval-util.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/case-expr.cc
M be/src/exprs/case-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/cast-functions.h
M be/src/exprs/conditional-functions-ir.cc
M be/src/exprs/conditional-functions.h
M be/src/exprs/expr-test.cc
M be/src/exprs/expr-value.h
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/hive-udf-call.h
M be/src/exprs/in-predicate-ir.cc
M be/src/exprs/in-predicate.h
M be/src/exprs/is-null-predicate-ir.cc
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/exprs/math-functions-ir.cc
M be/src/exprs/math-functions.h
M be/src/exprs/null-literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/operators-ir.cc
M be/src/exprs/operators.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr-ir.cc
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/exprs/slot-ref.cc
M be/src/exprs/slot-ref.h
M be/src/exprs/timestamp-functions.h
M be/src/exprs/utility-functions-ir.cc
M 

[Impala-ASF-CR] IMPALA-8255: [DOCS] Document the DEFAULT FILE FORMAT query option

2019-03-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12670 )

Change subject: IMPALA-8255: [DOCS] Document the DEFAULT_FILE_FORMAT query 
option
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12670/2/docs/topics/impala_default_file_format.xml
File docs/topics/impala_default_file_format.xml:

http://gerrit.cloudera.org:8080/#/c/12670/2/docs/topics/impala_default_file_format.xml@46
PS2, Line 46: The SET DEFAULT_FILE_FORMAT statement will 
not return an
:   error when the option was set to an unsupported value. The 
error will
:   return at the next CREATE TABLE 
statement.
> i think this is only true when running it via Impala shell. With HS2, we wi
Yeah that's correct, the validation happens server-side and impala-shell 
doesn't send the option to the server until it actually needs to execute a 
query on the server. I think Hue may actually have similar behaviour.

So it fails on the next query that impala-shell sends to the server (even if 
it's e.g. just a select statement).



--
To view, visit http://gerrit.cloudera.org:8080/12670
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5dc0e5f66ed59ff516ace2e67b74ade29492186e
Gerrit-Change-Number: 12670
Gerrit-PatchSet: 2
Gerrit-Owner: Alex Rodoni 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 21:30:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Prototype for a remote read byte cache.

2019-03-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12683 )

Change subject: Prototype for a remote read byte cache.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG@92
PS1, Line 92: useful. The buffer pool code currently provides buffers that are 
pinned
Yup, I think if we convince ourselves that this is useful without cache 
encryption and it is initially only deployed in pure remote read scenarios, 
then this definitely isn't a blocking issue. This is potentially so impactful I 
wouldn't want the perfect to be the enemy of the good. Maybe leaning on mmap() 
lets us get to a good-enough solution faster and we can later revisit.

If we're doing mixed local/remote, the lack of observability for I/O contention 
may be more of an issue. It's similar to some extent to our reliance on the 
buffer cache, but there is a little observability there by virtue of it going 
through the I/O manager.

> I could definitely imagine a world where Impala has a default cache size of 
> ~4GB that's always devoted to cache. The rest of Impala "managed" (as in, 
> within memlimit) memory gets used by the cache if it's available, and evicted 
> as necessary. The cache here is the "memory user of last resort".
I probably sound like a broken record but the buffer pool already has a lot of 
the logic to account for this - "clean pages" are basically this case, where 
they don't count against memory limits because they can be evicted cheaply as 
needed. I imagine the policies would change a bit compared to the spill-to-disk 
case.

Actually one interesting thing (which weighs against us trying to be too clever 
with caching inside Impala) is that caching things in the buffer pool was often 
less effective than the O/S buffer cache, so it was a win to cap the number of 
clean pages cached. See http://gerrit.cloudera.org:8080/7653. It seems like we 
may be able to do better than the buffer cache if we understand the I/O 
patterns and priorities, but it's definitely not a given!



--
To view, visit http://gerrit.cloudera.org:8080/12683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5
Gerrit-Change-Number: 12683
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 21:15:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8271: Refactor the use of Thrift enums for query options

2019-03-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12682 )

Change subject: IMPALA-8271: Refactor the use of Thrift enums for query options
..


Patch Set 2:

I don't have time to fully review this but a quick skim looks good - thanks for 
making this part of the code more sane!


--
To view, visit http://gerrit.cloudera.org:8080/12682
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6d747aae2c689765be72e117ce030ce4e3ce4641
Gerrit-Change-Number: 12682
Gerrit-PatchSet: 2
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 21:02:06 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8271: Refactor the use of Thrift enums for query options

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12682 )

Change subject: IMPALA-8271: Refactor the use of Thrift enums for query options
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2377/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12682
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6d747aae2c689765be72e117ce030ce4e3ce4641
Gerrit-Change-Number: 12682
Gerrit-PatchSet: 2
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 20:37:22 +
Gerrit-HasComments: No


[Impala-ASF-CR] Prototype for a remote read byte cache.

2019-03-06 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12683 )

Change subject: Prototype for a remote read byte cache.
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG@63
PS1, Line 63:   you'd actually want to integrate this with the general scratch 
space
> Currently we don't have global limits in TmpFileMgr, just per-query limits
It seems like spilling is going to be pretty sequential, so spinning cares less 
about SSD than caching does. That's another angle to reconcile.

Anyway--I agree that this is tractable. We don't want to run the user out of 
disk space, and evicting cache is a very reasonable thing to do.


http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG@92
PS1, Line 92: useful. The buffer pool code currently provides buffers that are 
pinned
> I think having the pages pinned by default is the right thing for this case
Yep. I think the underlying question here is how much we want to "manage" this. 
In the traditional "local" case, we don't manage the OS buffer cache, and it 
definitely helps us out.

I could definitely imagine a world where Impala has a default cache size of 
~4GB that's always devoted to cache. The rest of Impala "managed" (as in, 
within memlimit) memory gets used by the cache if it's available, and evicted 
as necessary. The cache here is the "memory user of last resort".

Your point on observability is well-taken. Remote reads incur a write-to-cache 
cost that's kind of unexpected. Fortunately, we presume the disk is fairly idle 
given that the reads are happening over the network, but it'll be an important 
thing to surface.



--
To view, visit http://gerrit.cloudera.org:8080/12683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5
Gerrit-Change-Number: 12683
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 20:14:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8271: Refactor the use of Thrift enums for query options

2019-03-06 Thread Fredy Wijaya (Code Review)
Fredy Wijaya has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12682


Change subject: IMPALA-8271: Refactor the use of Thrift enums for query options
..

IMPALA-8271: Refactor the use of Thrift enums for query options

This patch refactors the use of Thrift enums with GetThriftEnum helper
function that can automatically validate and convert the query option
value into the corresponding Thrift enum value. The validation error
message has also been improved to list all possible valid query option
values.

Testing:
- Added missing test cases in both BE and E2E
- Ran query-options-test.cc
- Ran metadata/test_set.py
- Ran query_test/test_nested_types.py
- Ran query_test/test_scanners.py

Change-Id: I6d747aae2c689765be72e117ce030ce4e3ce4641
---
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-ambiguous-list-modern.test
M 
testdata/workloads/functional-query/queries/QueryTest/parquet-resolution-by-name.test
M testdata/workloads/functional-query/queries/QueryTest/set.test
5 files changed, 85 insertions(+), 109 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/12682/2
--
To view, visit http://gerrit.cloudera.org:8080/12682
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6d747aae2c689765be72e117ce030ce4e3ce4641
Gerrit-Change-Number: 12682
Gerrit-PatchSet: 2
Gerrit-Owner: Fredy Wijaya 


[Impala-ASF-CR] IMPALA-7917 (Part 2): Decouple Sentry from Impala

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12542 )

Change subject: IMPALA-7917 (Part 2): Decouple Sentry from Impala
..


Patch Set 12: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12542
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I051d06451fb14ba2ffb3f7e0d4aae37dee55ab86
Gerrit-Change-Number: 12542
Gerrit-PatchSet: 12
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Wed, 06 Mar 2019 19:54:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7917 (Part 2): Decouple Sentry from Impala

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12542 )

Change subject: IMPALA-7917 (Part 2): Decouple Sentry from Impala
..


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3885/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12542
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I051d06451fb14ba2ffb3f7e0d4aae37dee55ab86
Gerrit-Change-Number: 12542
Gerrit-PatchSet: 12
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Wed, 06 Mar 2019 19:54:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7917 (Part 2): Decouple Sentry from Impala

2019-03-06 Thread Paul Rogers (Code Review)
Paul Rogers has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12542 )

Change subject: IMPALA-7917 (Part 2): Decouple Sentry from Impala
..


Patch Set 11: Code-Review+2

Very nicely done!


--
To view, visit http://gerrit.cloudera.org:8080/12542
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I051d06451fb14ba2ffb3f7e0d4aae37dee55ab86
Gerrit-Change-Number: 12542
Gerrit-PatchSet: 11
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Austin Nobis 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Comment-Date: Wed, 06 Mar 2019 19:44:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] Prototype for a remote read byte cache.

2019-03-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12683 )

Change subject: Prototype for a remote read byte cache.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG@91
PS1, Line 91: manager code, and especially its ability to do encryption, would 
be very
I think the other big advantage is that it provides observability - it actually 
tracks what I/O it does, how much is in memory, etc. Whereas with mmap() it's 
opaque.



--
To view, visit http://gerrit.cloudera.org:8080/12683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5
Gerrit-Change-Number: 12683
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 19:41:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Prototype for a remote read byte cache.

2019-03-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12683 )

Change subject: Prototype for a remote read byte cache.
..


Patch Set 1:

(6 comments)

Left a few comments mostly about high-level things

http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG@42
PS1, Line 42: I've not made the cache coalesce overlapping reads. I suspect that
The main reason I can think of that we might issue reads in smaller increments 
is memory scarcity - the query can't allocate large enough buffers for its 
scans. We try to avoid this so I think the impact is likely to be limited.


http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG@63
PS1, Line 63:   you'd actually want to integrate this with the general scratch 
space
Currently we don't have global limits in TmpFileMgr, just per-query limits 
(per-FileGroup, really). So currently they won't fight each other within 
Impala's accounting, but could collectively exhaust disk space on the host.

I don't think it would be too hard to add some collective limits on spilled 
data vs cached data.


http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG@92
PS1, Line 92: useful. The buffer pool code currently provides buffers that are 
pinned
I think having the pages pinned by default is the right thing for this case 
since they're not useful unless you write something into them, and you can't 
write anything into them if they're not pinned.

Potentially the worse pitfall there is that the Page (i.e. spill-enabled) 
interface for the buffer pool has some gaps in functionality because it's only 
been built out to handle the spilling plan node use case. Specifically:
* It's not thread-safe to invoke Page operations with the same Client from 
different threads concurrently.
* It is eager about flushing unpinned pages to disk, which makes sense for the 
exec node because they only unpin things when under memory pressure. However 
here it would mean you pay the encryption and I/O cost too soon. We could 
either address this by changing the algorithm or by having the I/O cache manage 
the cache and keeping "hot" pages pinned.


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache.cc
File be/src/util/cache.cc:

http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache.cc@56
PS1, Line 56:   fd_ = mkstemp(fname.get());
I don't know if this is better or worse than the approach in 
TmpFileMgr::InitCustom() for setting up scratch but I think we want to align 
them so we only have one potential failure mode for setting up temporary space.


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache.cc@298
PS1, Line 298:   // Not sure if this is quite the right way to combine things
Yup it is


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache.cc@300
PS1, Line 300:   uint64_t h = HashUtil::Hash(filename.c_str(), 
filename.length(), 0);
It's probably best to use FastHash64(). HashUtil::Hash() calls out to CRC which 
is a very fast but non-robust hash. FastHash is a bit less fast but much more 
robust.



--
To view, visit http://gerrit.cloudera.org:8080/12683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5
Gerrit-Change-Number: 12683
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 19:39:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Prototype for a remote read byte cache.

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12683 )

Change subject: Prototype for a remote read byte cache.
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2376/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5
Gerrit-Change-Number: 12683
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 06 Mar 2019 19:30:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-3816, IMPALA-4065: Remove the indirection to TupleRowComparator::Compare()

2019-03-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10680 )

Change subject: IMPALA-3816, IMPALA-4065: Remove the indirection to 
TupleRowComparator::Compare()
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/10680/3/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/10680/3/be/src/codegen/llvm-codegen.cc@915
PS3, Line 915: llvm::Function* 
LlvmCodeGen::ReplaceCallSitesRecursively(llvm::Function* caller,
> As discussed offline, this is an interesting idea. On the other hand, this
This patch is too big for me to adopt at the moment but it would be nice to 
move it along at some point.

I think doing something like this makes sense to solve two problems that we 
don't currently handle in codegen:
1. recursive functions (like in Sorter) where we need to fix up the specialised 
function to call itself (directly or indirectly)
2. code that we want to specialise where the function we need to replace is 
called from non-Impala code (like with std::push_heap and std::pop_heap in Top 
N node)

A nice side-effect would be to avoid having to sprinkle IR_ALWAYS_INLINE in as 
many places. So I like having a function that achieves this. Agree that it 
needs a little thought about how to avoid silent regressions (either cloning 
too many functions or not enough - e.g. if a virtual function call was 
introduced by a code change, preventing this algorithm from following the 
callchain). We could potentially return the number of functions cloned and have 
DCHECKs in callers on that.

I also think, per your comment, that we don't need to find the SCC to solve 
this. If we're planning to specialise a set of functions F, with entry point 
function e, then we need to find F', the set of functions reachable from e that 
call a function in F directly or indirectly. Once you have that, you clone all 
those functions and do a pass over them to replace all calls to the original 
versions with the specialised versions.

This algorithm does the replacement more incrementally, doing the replacement a 
SCC at a time. I don't see a real gain from doing it this way (cache locality, 
maybe?).

I think you can indeed find F' with plain DFS. DFS will find you the subgraph 
reachable from e, and you need to augment it to keep track of the set of nodes 
that call one of the specialised functions directly/indirectly. That seems 
straightforward - you add a node to the set every time you follow an edge that 
leads to a node in the visit.


http://gerrit.cloudera.org:8080/#/c/10680/3/be/src/codegen/llvm-codegen.cc@981
PS3, Line 981:   } else if 
(llvm::isa(&*dfs_stack.top().iter)) {
We generally want to avoid Invoke instructions in codegen'd code - those are 
cases where clang things the callee may throw and exception.


http://gerrit.cloudera.org:8080/#/c/10680/3/be/src/exec/topn-node.cc
File be/src/exec/topn-node.cc:

http://gerrit.cloudera.org:8080/#/c/10680/3/be/src/exec/topn-node.cc@123
PS3, Line 123: codegen->GetFunction(IRFunction::COMPARE_INTERPRETED, 
false), tuple_compare_fn);
This is weird because it doesn't remove the branch in Compare() on 
codegend_compare_fn_ == NULL.



--
To view, visit http://gerrit.cloudera.org:8080/10680
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If4657ac09daf20408797856d94521d417d8cf171
Gerrit-Change-Number: 10680
Gerrit-PatchSet: 3
Gerrit-Owner: Tianyi Wang 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tianyi Wang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 19:16:44 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] Prototype for a remote read byte cache.

2019-03-06 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12683 )

Change subject: Prototype for a remote read byte cache.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG@28
PS1, Line 28: The code in this commit builds a cache with the simplest policy I 
could
Note that Kudu has an LRU cache that could be used pretty directly.



--
To view, visit http://gerrit.cloudera.org:8080/12683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5
Gerrit-Change-Number: 12683
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 06 Mar 2019 19:12:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2375/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 06 Mar 2019 18:59:01 +
Gerrit-HasComments: No


[Impala-ASF-CR] Prototype for a remote read byte cache.

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12683 )

Change subject: Prototype for a remote read byte cache.
..


Patch Set 1:

(28 comments)

http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/runtime/io/disk-io-mgr.cc@59
PS1, Line 59: DEFINE_int32(cache_pct, 50, "Percentage (0-100) of free space on 
cache_dirs volumes' to use.");
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/runtime/io/hdfs-file-reader.h
File be/src/runtime/io/hdfs-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/runtime/io/hdfs-file-reader.h@47
PS1, Line 47:   Status ReadFromPosInternal(MultiCache* cache, hdfsFile 
hdfs_file, int64_t position_in_file,
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/runtime/io/hdfs-file-reader.cc
File be/src/runtime/io/hdfs-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/runtime/io/hdfs-file-reader.cc@182
PS1, Line 182:   RETURN_IF_ERROR(ReadFromPosInternal(hdfs_file, 
position_in_file, is_borrowed_fh, buffer, chunk_size, bytes_read));
line too long (116 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/runtime/io/request-context.h
File be/src/runtime/io/request-context.h:

http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/runtime/io/request-context.h@161
PS1, Line 161:   int remote_read_cache_miss_count() const { return 
remote_read_cache_miss_count_.Load(); }
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/runtime/io/request-context.h@162
PS1, Line 162:   int64_t remote_read_cache_hit_bytes() const { return 
remote_read_cache_hit_bytes_.Load(); }
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/runtime/io/request-context.h@163
PS1, Line 163:   int64_t remote_read_cache_miss_bytes() const { return 
remote_read_cache_miss_bytes_.Load(); }
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/scheduling/scheduler.cc@699
PS1, Line 699: // 2. This is an HDFS file split (S3/ADLS are treated as 
HDFS for this purpose, or seem to be) (TODO: verify)
line too long (113 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache-test.cc
File be/src/util/cache-test.cc:

http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache-test.cc@80
PS1, Line 80: ASSERT_OK(cache.Store(f1, mtime1 + (1 + i)*1000, offset1, 
small_string_ptr, 5, ));
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache-test.cc@104
PS1, Line 104: ASSERT_OK(cache.Store(f, mtime1 + (1 + i)*1000, offset1, 
small_string_ptr, 5, ));
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache.h
File be/src/util/cache.h:

http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache.h@45
PS1, Line 45: // CacheMeta(len1, 
itr)Data(len1)CacheMeta(len2,ptr)Data(len2)CacheMeta(free length, "free")
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache.h@48
PS1, Line 48: //   ^ current_offset with "current_offset" pointing 
to the next chunk to be cleared away.
line too long (102 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache.h@88
PS1, Line 88:   // use int (rather than int64) because we're inheriting that 
from the HDFS APIs. In practice,
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache.h@90
PS1, Line 90:   Status Lookup(string& filename, int64_t mtime, int64_t offset, 
int bytes_to_read, int* bytes_read, uint8_t* buffer);
line too long (118 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache.h@91
PS1, Line 91:   Status Store(string& filename, int64_t mtime, int64_t 
file_offset, const uint8_t* buffer, int64_t store_len, bool* stored);
line too long (125 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache.h@105
PS1, Line 105:   void SetMeta(int64_t offset, int64_t valid_length, int64_t 
length, CacheMetaIterator it) {
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache.h@148
PS1, Line 148: // Combines multi-caches and uses different ones according to 
the hash of (filename, mtime, offset).
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache.h@154
PS1, Line 154:   Status Lookup(string& filename, int64_t mtime, int64_t offset, 
int bytes_to_read, int* bytes_read, uint8_t* buffer);
line too long (118 > 90)


http://gerrit.cloudera.org:8080/#/c/12683/1/be/src/util/cache.h@155
PS1, Line 155:   Status Store(string& filename, int64_t mtime, int64_t 
file_offset, const uint8_t* buffer, int64_t store_len, bool* stored);
line too long (125 > 

[Impala-ASF-CR] Prototype for a remote read byte cache.

2019-03-06 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12683 )

Change subject: Prototype for a remote read byte cache.
..


Patch Set 1:

I'm using Gerrit as a way to share this. It's certainly not ready for commit, 
but it may be worthy of discussion.


--
To view, visit http://gerrit.cloudera.org:8080/12683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5
Gerrit-Change-Number: 12683
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 06 Mar 2019 18:56:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] Prototype for a remote read byte cache.

2019-03-06 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12683


Change subject: Prototype for a remote read byte cache.
..

Prototype for a remote read byte cache.

This code (which is far from ready for prime time) attempts to get to a
place where a byte cache for remote (either HDFS or S3 or ADLS) reads
can be evaluated. This commit message tries to write down what I've
figured out so far...

Hive LLAP found that it was useful to build such a cache.  (See
hive.git:llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelCache.java
and neighbors.) In their case, they implemented an LRUF policy. They are also
caching the uncompressed ORC byte streams, which give them fairly consistent
chunk sizes (because the compression buffers are typically all the same size).
Alluxio also found it useful to build a cache, and it plugs in as a Hadoop
filesystem. Apache Ignite and Quobole Rubix are of interest. And, of course,
there's memcached, Redis, ehcache, and so on and on and on...

Hive LLAP found that SSDs work best for these caches, since they're
seek heavy. I think this implies that they'd be configured differently
than scratch dirs, since, in a machine with both SSDs and disk, you may
want to keep the cache off the disk. (Or you may want a three-tier policy..)

The code in this commit builds a cache with the simplest policy I could
implement (FIFO) and backs the cache with a large, memory-mapped file. There is
implicitly a two-level cache here: the OS page cache is handling which pages
are in memory, and the cache here is handling eviction from the cache
altogether. The metadata for the cache is stored in a map in memory,
though the linked list management of free space is directly in the buffers.
(I currently regret that, but I don't think it affects the evaluation goal;
a linked list without pointer arithmetic would have been more pleasant
to deal with.)

I hooked up the code through ReadFromPosInternal() which was sort of
easy.  Technically cached files don't need an hdfsFileOpen(), but I've
not optimized that.

I've not made the cache coalesce overlapping reads. I suspect that
in practice it may not be necessary, because how we read tables is pretty
consistent, but that's something that needs to be checked.

I've not made the cache do "compaction" because the FIFO policy means
that there's no fragmentation.

When breaking down the function of the cache, a few things have to happen:
* Impala has to read through the cache or store things in the cache.
  I've done this via wrapping ReadFromPosInternal()

* Impala has to specify the "value" of certain data; e.g., Parquet footers
  or index pages are more valuable than data pages of really large
  fact tables.

  I've not dealt with this.

* The cache has to be configured.

  I've exposed a pair of flags for directories and how much of
  the relevant free space to use. If this sort of thing is long-lived,
  you'd actually want to integrate this with the general scratch space
  management code in Impala: a spilling query could use disk space
  that's being used by a cache. (Though even spilling queries should
  have limits; the cache may be more valuable than the currently
  spilling query.)

* The cache has to expose metrics.

  I exposed hit/miss counters (in both count and bytes) on the query,
  but nothing global.

* The cache needs to have a notion of locality, so that files don't
  end up cached N times for an N-node cluster.

  I've taken advantage of the recent changes to make remote reads pin themselves
  to certain hosts (to make the remote file handle cache work). I don't know
  whether that currently leaves us with 3 or 1 copies of most files, but
  it's enough for evaluation.

* The cache needs to have decent threading, since we read from many threads.

  I've set it up so that you get multiple caches, one per dir, and,
  furthermore, each cache is split into 8. This roughly models
  having a pool of 8 threads reading from SSDs. I've not explored
  this tunable space.

I used neither the temporary file manager code nor the buffer pool code.
This is out of expedience/ignorance. I suspect the temporary file
manager code, and especially its ability to do encryption, would be very
useful. The buffer pool code currently provides buffers that are pinned
by default, and I didn't want to deal with pinning and unpinning. A more
complicated implementation would find much more re-use potential here.

I looked into using 2MB Transparent Huge Pages but found that mmap
doesn't do both "file-backed" and "THP." There's some performance
exploration to be done here, but I've not looked into it.

The cache doesn't survive restarts. This seems ok and certainly makes
state management easier, since we don't have to worry about consistency
for the metadata state in the face of a crash. The underlying file is
deleted at creation time. Users may end up getting confused 

[Impala-ASF-CR] IMPALA-7988: addendum: tolerate docker stop failure

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12640 )

Change subject: IMPALA-7988: addendum: tolerate docker stop failure
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12640
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I211c9d29a9a30f9eebfff2973b4fb422d963f132
Gerrit-Change-Number: 12640
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 18:44:33 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

2019-03-06 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12680 )

Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance 
regression.
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12680/1/be/src/util/rle-encoding.h
File be/src/util/rle-encoding.h:

http://gerrit.cloudera.org:8080/#/c/12680/1/be/src/util/rle-encoding.h@64
PS1, Line 64:  For 1 bit-width values, that point is 8 values.  They require 2 
bytes
: /// for both the repeated encoding or the literal encoding.  This 
value can always
: /// be computed based on the bit-width.
Maybe it could be mentioned that the optimal can be 16/24 in some cases, but we 
did not implement it because we are unsure about its benefits.



--
To view, visit http://gerrit.cloudera.org:8080/12680
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 06 Mar 2019 18:39:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] [stress] pull out QueryRunner

2019-03-06 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12578 )

Change subject: [stress] pull out QueryRunner
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12578/2/tests/stress/query_runner.py
File tests/stress/query_runner.py:

http://gerrit.cloudera.org:8080/#/c/12578/2/tests/stress/query_runner.py@79
PS2, Line 79: # TODO: other classes reach in and touch _metrics
I fixed this, can remove the TODO


http://gerrit.cloudera.org:8080/#/c/12578/2/tests/stress/query_runner.py@279
PS2, Line 279:   def _hash_result(self, cursor, timeout_unix_time, query):
This function is hideously long. I moved it untouched and will clean up in a 
follow-on patch.



--
To view, visit http://gerrit.cloudera.org:8080/12578
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I69c907a65147d082211837cbbaba7225aa4b67cb
Gerrit-Change-Number: 12578
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Thomas Marshall 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 18:44:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7988: addendum: tolerate docker stop failure

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12640 )

Change subject: IMPALA-7988: addendum: tolerate docker stop failure
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/3883/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/12640
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I211c9d29a9a30f9eebfff2973b4fb422d963f132
Gerrit-Change-Number: 12640
Gerrit-PatchSet: 4
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 18:44:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

2019-03-06 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
..


Patch Set 2:

(1 comment)

Yep, good catch. Thanks.

http://gerrit.cloudera.org:8080/#/c/12660/2/be/src/exec/hbase-table-scanner.cc
File be/src/exec/hbase-table-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12660/2/be/src/exec/hbase-table-scanner.cc@562
PS2, Line 562:   RETURN_ERROR_IF_EXC(env);
> I think GetArrayLength does not throw an exception, so this check is probab
Done



--
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 06 Mar 2019 18:13:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

2019-03-06 Thread Philip Zeyliger (Code Review)
Hello Sahil Takiar, Todd Lipcon, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12660

to look at the new patch set (#3).

Change subject: IMPALA-8250: Clean up JNI warnings.
..

IMPALA-8250: Clean up JNI warnings.

Using LIBHDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to
(a) checking for exceptions and (b) leaking local references.

Checking for exceptions required sprinkling RETURN_ERROR_IF_EXC
left and right. I chose not to expand the JniCall infrastructure
to handle this more generally at the moment.

The leaky local references are a bit harder. In the logs, they show up
as "WARNING: JNI local refs: 2597, exceeds capacity: 35" or similar. A
few of these errors seem to be not in our code.  The ones that I've
found in our code stemmed from HBaseTableScanner::GetRowKey(): this
method uses local references and wasn't returning them. Using a
JniLocalFrame seems to have taken care of the warnings.

I have added code to skip test_large_strings when JNI checking is
enabled. This test takes forever (presumably because JNI is checking
bounds on strings very aggressively), and times out. The time out also
causes some metric-related checks to fail (since a query is still in
flight).

Debugging this required customizing my JDK to give stack traces
when these warnings occurred. The following diff facilitated
this.

  diff -r 76a9c9cf14f1 src/share/vm/prims/jniCheck.cpp
  --- a/src/share/vm/prims/jniCheck.cpp Tue Jan 15 10:43:31 2019 +
  +++ b/src/share/vm/prims/jniCheck.cpp Wed Feb 27 11:57:13 2019 -0800
  @@ -143,11 +143,30 @@
   static const char * fatal_instance_field_mismatch = "Field type (instance) 
mismatch in JNI get/set field operations";
   static const char * fatal_non_string = "JNI string operation received a 
non-string";

  +// thisone: whether to print every time, or maybe, depending on future
  +// how many future stacks we want printed (totally racy); helps catch
  +// missing exception handling if there's a way to tickle that code
  +// reliably.
  +static inline void dump_native_stack(JavaThread* thr, bool thisone, int 
future) {
  +  static int fut_stacks = 0; // racy!
  +  if (fut_stacks > 0) {
  +thisone = true;
  +fut_stacks--;
  +  }
  +  if (future > 0) fut_stacks = future;
  +  if (thisone) {
  +frame fr = os::current_frame();
  +char buf[6000];
  +tty->print_cr("Thread: %s %d", thr->get_thread_name(), 
thr->osthread()->thread_id());
  +print_native_stack(tty, fr, thr, buf, sizeof(buf));
  +  }
  +}

   // When in VM state:
   static void ReportJNIWarning(JavaThread* thr, const char *msg) {
 tty->print_cr("WARNING in native method: %s", msg);
 thr->print_stack();
  +  dump_native_stack(thr, true, 0);
   }

   // When in NATIVE state:
  @@ -199,11 +218,14 @@
 tty->print_cr("WARNING in native method: JNI call made without 
checking exceptions when required to from %s",
   thr->get_pending_jni_exception_check());
 thr->print_stack();
  +  dump_native_stack(thr, true, 10);
   )
   thr->clear_pending_jni_exception_check(); // Just complain once
 }
   }

  +
  +
   /**
* Add to the planned number of handles. I.e. plus current live & warning 
threshold
*/
  @@ -254,9 +276,12 @@
 tty->print_cr("WARNING: JNI local refs: %zu, exceeds capacity: %zu",
 live_handles, planned_capacity);
 thr->print_stack();
  +  dump_native_stack(thr, true, 0);
   )
   // Complain just the once, reset to current + warn threshold
   add_planned_handle_capacity(handles, 0);
  +  } else {
  +dump_native_stack(thr, false, 0);
 }
   }

Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
---
M be/src/catalog/catalog.cc
M be/src/exec/hbase-table-scanner.cc
M be/src/exprs/hive-udf-call.cc
M be/src/runtime/hbase-table-factory.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M tests/query_test/test_insert.py
7 files changed, 55 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/12660/3
--
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Todd Lipcon 


[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12481 )

Change subject: IMPALA-7368: Add initial support for DATE type
..


Patch Set 8:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/2374/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/12481
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
Gerrit-Change-Number: 12481
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 17:23:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12481 )

Change subject: IMPALA-7368: Add initial support for DATE type
..


Patch Set 7:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/2373/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/12481
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
Gerrit-Change-Number: 12481
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 17:20:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12680 )

Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance 
regression.
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/2372/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/12680
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Comment-Date: Wed, 06 Mar 2019 17:02:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type

2019-03-06 Thread Attila Jeges (Code Review)
Attila Jeges has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12481 )

Change subject: IMPALA-7368: Add initial support for DATE type
..


Patch Set 8:

I've also realized that besides TEXT, DATE should work for HBASE as well. Do 
you think, I should add HBASE tests to this patch-set, or it should be a 
separate change?


--
To view, visit http://gerrit.cloudera.org:8080/12481
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
Gerrit-Change-Number: 12481
Gerrit-PatchSet: 8
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 16:42:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type

2019-03-06 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/12481 )

Change subject: IMPALA-7368: Add initial support for DATE type
..

IMPALA-7368: Add initial support for DATE type

DATE values describe a particular year/month/day in the form
-MM-dd. For example: DATE '2019-02-15'. DATE values do not have a
time of day component. The range of values supported for the DATE type
is -01-01 to -12-31.

This initial DATE type support covers TEXT fileformat only.
'DateValue' is used as the internal type to represent DATE values.

The changes are as follows:
- Support for DATE literal syntax.
- Explicit casting between DATE and other types:
- from STRING to DATE. The string value must be formatted as
  -MM-dd.
- from DATE to STRING. The resulting string value is formatted as
  -MM-dd.
- from TIMESTAMP to DATE. The source timestamp's time of day
  component is ignored.
- from DATE to TIMESTAMP. The target timestamp's time of day
  component is set to 00:00:00.
- Implicit casting between DATE and other types:
- from STRING to DATE if the source string value is used in a
  context where a DATE value is expected.
- from DATE to TIMESTAMP if the source date value is used in a
  context where a TIMESTAMP value is expected.
- Since both STRING -> DATE and STRING -> TIMESTAMP implicit
  conversions are possible, the function resolution logic was changed
  to select the right version of overloaded functions:
 - If both implicit conversions are applicable equally well,
   STRING -> TIMESTAMP is preferred for backward compatibility
   reasons. E.g: year('2019-02-15') must resolve to
   year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is
   not implemented yet, so this is not an issue at the moment but
   it will be in the future.
 - If one implicit conversion is a better fit over the other, we
   must choose the better one. E.g:
   if(false, '2011-01-01', DATE '1499-02-02') must resolve to
   if(BOOLEAN, DATE, DATE) instead of
   if(BOOLEAN, TIMESTAMP, TIMESTAMP).
- Codegen infrastructure changes for expression evaluation.
- 'IS [NOT] NULL' and '[NOT] IN' predicates.
- Common comparison operators (including the 'BETWEEN' operator).
- Infrastructure changes for built-in functions.
- Some built-in functions: conditional, aggregate, analytical and
  math functions.
- C++ UDF/UDA support.
- Support partitioning and grouping by DATE.
- Beeswax, HiveServer2 support.

These items are tightly coupled and it makes sense to implement them
in one change-set.

Testing:
- A new partitioned TEXT table 'functional.date_tbl' was introduced
  for DATE-related tests.
- BE and FE tests were extended to cover DATE type.
- E2E tests:
- since DATE type is supported for TEXT fileformat only, most DATE
  tests were implemented separately in
  tests/query_test/test_date_queries.py.

Note, that this change-set is not a complete DATE type implementation,
but it lays the foundation for future work:
- Add date support to the random query generator.
- Implement a complete set of built-in functions.
- Add Parquet support.
- Add HBase and Kudu support.
- Optionally support Avro and ORC.
For further details, see IMPALA-6169.

Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
---
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/codegen-anyval.h
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/exec/aggregator.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/text-converter.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/anyval-util.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/case-expr.cc
M be/src/exprs/case-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/cast-functions.h
M be/src/exprs/conditional-functions-ir.cc
M be/src/exprs/conditional-functions.h
M be/src/exprs/expr-test.cc
M be/src/exprs/expr-value.h
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/hive-udf-call.h
M be/src/exprs/in-predicate-ir.cc
M be/src/exprs/in-predicate.h
M be/src/exprs/is-null-predicate-ir.cc
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/exprs/math-functions-ir.cc
M be/src/exprs/math-functions.h
M be/src/exprs/null-literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/operators-ir.cc
M be/src/exprs/operators.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr-ir.cc
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/exprs/slot-ref.cc
M be/src/exprs/slot-ref.h
M be/src/exprs/timestamp-functions.h
M be/src/exprs/utility-functions-ir.cc
M 

[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12481 )

Change subject: IMPALA-7368: Add initial support for DATE type
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12481/7/tests/query_test/test_date_queries.py
File tests/query_test/test_date_queries.py:

http://gerrit.cloudera.org:8080/#/c/12481/7/tests/query_test/test_date_queries.py@20
PS7, Line 20: import pytest
flake8: F401 'pytest' imported but unused



--
To view, visit http://gerrit.cloudera.org:8080/12481
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
Gerrit-Change-Number: 12481
Gerrit-PatchSet: 7
Gerrit-Owner: Attila Jeges 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 16:38:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7368: Add initial support for DATE type

2019-03-06 Thread Attila Jeges (Code Review)
Attila Jeges has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/12481 )

Change subject: IMPALA-7368: Add initial support for DATE type
..

IMPALA-7368: Add initial support for DATE type

DATE values describe a particular year/month/day in the form
-MM-dd. For example: DATE '2019-02-15'. DATE values do not have a
time of day component. The range of values supported for the DATE type
is -01-01 to -12-31.

This initial DATE type support covers TEXT fileformat only.
'DateValue' is used as the internal type to represent DATE values.

The changes are as follows:
- Support for DATE literal syntax.
- Explicit casting between DATE and other types:
- from STRING to DATE. The string value must be formatted as
  -MM-dd.
- from DATE to STRING. The resulting string value is formatted as
  -MM-dd.
- from TIMESTAMP to DATE. The source timestamp's time of day
  component is ignored.
- from DATE to TIMESTAMP. The target timestamp's time of day
  component is set to 00:00:00.
- Implicit casting between DATE and other types:
- from STRING to DATE if the source string value is used in a
  context where a DATE value is expected.
- from DATE to TIMESTAMP if the source date value is used in a
  context where a TIMESTAMP value is expected.
- Since both STRING -> DATE and STRING -> TIMESTAMP implicit
  conversions are possible, the function resolution logic was changed
  to select the right version of overloaded functions:
 - If both implicit conversions are applicable equally well,
   STRING -> TIMESTAMP is preferred for backward compatibility
   reasons. E.g: year('2019-02-15') must resolve to
   year(TIMESTAMP) instead of year(DATE). Note, that year(DATE) is
   not implemented yet, so this is not an issue at the moment but
   it will be in the future.
 - If one implicit conversion is a better fit over the other, we
   must choose the better one. E.g:
   if(false, '2011-01-01', DATE '1499-02-02') must resolve to
   if(BOOLEAN, DATE, DATE) instead of
   if(BOOLEAN, TIMESTAMP, TIMESTAMP).
- Codegen infrastructure changes for expression evaluation.
- 'IS [NOT] NULL' and '[NOT] IN' predicates.
- Common comparison operators (including the 'BETWEEN' operator).
- Infrastructure changes for built-in functions.
- Some built-in functions: conditional, aggregate, analytical and
  math functions.
- C++ UDF/UDA support.
- Support partitioning and grouping by DATE.
- Beeswax, HiveServer2 support.

These items are tightly coupled and it makes sense to implement them
in one change-set.

Testing:
- A new partitioned TEXT table 'functional.date_tbl' was introduced
  for DATE-related tests.
- BE and FE tests were extended to cover DATE type.
- E2E tests:
- since DATE type is supported for TEXT fileformat only, most DATE
  tests were implemented separately in
  tests/query_test/test_date_queries.py.

Note, that this change-set is not a complete DATE type implementation,
but it lays the foundation for future work:
- Add date support to the random query generator.
- Implement a complete set of built-in functions.
- Add Parquet support.
- Add HBase and Kudu support.
- Optionally support Avro and ORC.
For further details, see IMPALA-6169.

Change-Id: Iea8155ef09557e0afa2f8b2d0b2dc9d0896dc30f
---
M be/src/codegen/codegen-anyval.cc
M be/src/codegen/codegen-anyval.h
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/llvm-codegen.cc
M be/src/exec/aggregator.cc
M be/src/exec/data-source-scan-node.cc
M be/src/exec/hash-table.cc
M be/src/exec/hdfs-scanner-ir.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/text-converter.cc
M be/src/exec/text-converter.inline.h
M be/src/exprs/agg-fn-evaluator.cc
M be/src/exprs/aggregate-functions-ir.cc
M be/src/exprs/aggregate-functions.h
M be/src/exprs/anyval-util.cc
M be/src/exprs/anyval-util.h
M be/src/exprs/case-expr.cc
M be/src/exprs/case-expr.h
M be/src/exprs/cast-functions-ir.cc
M be/src/exprs/cast-functions.h
M be/src/exprs/conditional-functions-ir.cc
M be/src/exprs/conditional-functions.h
M be/src/exprs/expr-test.cc
M be/src/exprs/expr-value.h
M be/src/exprs/hive-udf-call.cc
M be/src/exprs/hive-udf-call.h
M be/src/exprs/in-predicate-ir.cc
M be/src/exprs/in-predicate.h
M be/src/exprs/is-null-predicate-ir.cc
M be/src/exprs/literal.cc
M be/src/exprs/literal.h
M be/src/exprs/math-functions-ir.cc
M be/src/exprs/math-functions.h
M be/src/exprs/null-literal.cc
M be/src/exprs/null-literal.h
M be/src/exprs/operators-ir.cc
M be/src/exprs/operators.h
M be/src/exprs/scalar-expr-evaluator.cc
M be/src/exprs/scalar-expr-evaluator.h
M be/src/exprs/scalar-expr-ir.cc
M be/src/exprs/scalar-expr.cc
M be/src/exprs/scalar-expr.h
M be/src/exprs/scalar-fn-call.cc
M be/src/exprs/scalar-fn-call.h
M be/src/exprs/slot-ref.cc
M be/src/exprs/slot-ref.h
M be/src/exprs/timestamp-functions.h
M be/src/exprs/utility-functions-ir.cc
M 

[Impala-ASF-CR] IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

2019-03-06 Thread Andrew Sherman (Code Review)
Andrew Sherman has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12680


Change subject: IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance 
regression.
..

IMPALA-8279: Revert IMPALA-6658 to avoid ETL performance regression.

IMPALA-6658 changed RleEncoder to have the ability to use run lengths
other than 8. It seemed that a slightly more complex RleEncoder could
save a small amount of disk space by using the longer run lengths, in
particular for bit width of 1. We now see a performance regression on a
simple ETL query.  Overall it seems that the costs of IMPALA-6658 exceed
the benefits. This change removes IMPALA-6658.

The strategy for this was that the change to rle-encoding.h, which
contains the code, was undone using 'git revert'. I removed the test
changes in rle-test.cc that rely on different encoding lengths. This
allows us to keep some useful new tests that were written as part of
IMPALA-6658

TESTING:

Ran all end-to-end tests.

Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
---
M be/src/exec/parquet/parquet-bool-decoder-test.cc
M be/src/util/rle-encoding.h
M be/src/util/rle-test.cc
3 files changed, 139 insertions(+), 383 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/12680/1
--
To view, visit http://gerrit.cloudera.org:8080/12680
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If6bcbaf564fbbe6dc83ba3afc100b4e5ccc7af40
Gerrit-Change-Number: 12680
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-6503: Support reading complex types from ORC

2019-03-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12168 )

Change subject: IMPALA-6503: Support reading complex types from ORC
..


Patch Set 22: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/12168
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I244dc9d2b3e425393f90e45632cb8cdbea6cf790
Gerrit-Change-Number: 12168
Gerrit-PatchSet: 22
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 06 Mar 2019 10:14:10 +
Gerrit-HasComments: No