[jira] [Commented] (SENTRY-2109) Fix the logic of identifying HMS out of Sync and handle gaps and out-of-sequence notifications.

2018-02-01 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16349521#comment-16349521
 ] 

Vadim Spector commented on SENTRY-2109:
---

I think it's useful to have as many reviews as possible, so if [~akolb] wants 
to review it, it may be worth reopening it.

[~spena], that is correct: for now this logic is on hold, so there is no rush 
to commit it. Since it is almost complete (unless the remaining reviewers find 
something wrong), it can serve as a legitimate plan B if the HMS fix turns out 
to have serious issues.

> Fix the logic of identifying HMS out of Sync and handle gaps and 
> out-of-sequence notifications.
> ---
>
> Key: SENTRY-2109
> URL: https://issues.apache.org/jira/browse/SENTRY-2109
> Project: Sentry
>  Issue Type: Bug
>  Components: Sentry
>Affects Versions: 2.1.0
>Reporter: kalyan kumar kalvagadda
>Assignee: kalyan kumar kalvagadda
>Priority: Major
> Fix For: 2.1.0
>
> Attachments: SENTRY-2109.001.patch, SENTRY-2109.002.patch, 
> SENTRY-2109.003.patch, SENTRY-2109.004.patch, SENTRY-2109.005.patch, 
> SENTRY-2109.006.patch, SENTRY-2109.007.patch, SENTRY-2109.008.patch, 
> SENTRY-2109.009.patch, SENTRY-2109.010.patch, SENTRY-2109.010.patch, 
> SENTRY-2109.011.patch, SENTRY-2109.012.patch, SENTRY-2109.012.patch, 
> SENTRY-2109.012.patch, Screenshot_HMS_NOTIFICATION_LOG.png
>
>
> Currently, HMSFollower proactively checks whether Sentry is out of sync with 
> HMS and initiates a full snapshot if needed.
> The current logic produces false positives if there are gaps in the event 
> IDs of the notification log sequence.
> This JIRA aims to make that logic robust.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (SENTRY-2110) send HDFS full updates in incremental chunks, to overcome Thrift 2Gb message size limit

2017-12-24 Thread Vadim Spector (JIRA)
Vadim Spector created SENTRY-2110:
-

 Summary: send HDFS full updates in incremental chunks, to overcome 
Thrift 2Gb message size limit
 Key: SENTRY-2110
 URL: https://issues.apache.org/jira/browse/SENTRY-2110
 Project: Sentry
  Issue Type: Improvement
Reporter: Vadim Spector


Thrift messages are limited to 2 GB. Sending a full update for millions of 
partitions from Sentry to the HDFS plugin can eventually exceed this message 
size limit. For example, for 15 million partitions, 2 GB translates to only 
about 143 bytes per partition record, which is not much.
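The 143-byte figure follows if "2 GB" is read as 2 GiB (2^31 bytes). A quick sanity check in Java; the 500-byte average record size and the 256 MiB chunk size used for the chunk estimate are illustrative assumptions, not measured Sentry values:

```java
// Back-of-the-envelope check of the per-record budget quoted above.
// Assumes the limit is 2 GiB (2^31 bytes), roughly the ceiling of a signed
// 32-bit length in Java-based Thrift transports.
public class ThriftBudget {
    static final long LIMIT_BYTES = 1L << 31; // 2,147,483,648 bytes

    /** Average bytes available per partition record if everything goes in one message. */
    static long bytesPerRecord(long partitions) {
        return LIMIT_BYTES / partitions;
    }

    /** Minimum number of chunks needed for an assumed average record size. */
    static long chunksNeeded(long partitions, long avgRecordBytes, long chunkLimitBytes) {
        long total = partitions * avgRecordBytes;
        return (total + chunkLimitBytes - 1) / chunkLimitBytes; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(bytesPerRecord(15_000_000L));                  // 143
        System.out.println(chunksNeeded(15_000_000L, 500L, 256L << 20));  // 28
    }
}
```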

A full update can be split into several pieces. This will require:

1. Adding some fields to the Thrift message schema, such as the sequence 
number of the incremental full update message and the total number of 
incremental full update messages (e.g. message #2 of 10 total).

2. Logic on the Sentry side (SentryPlugin) to split a full update into chunks 
and manage sending them, watching for acknowledgements from the HDFS plugin 
and resending chunks if needed.

3. Logic in the HDFS plugin to assemble incremental chunks into a full update, 
watching the chunk numbers and asking SentryPlugin to resend a chunk if needed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (SENTRY-2109) Fix the logic of identifying HMS out of Sync

2017-12-23 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16302609#comment-16302609
 ] 

Vadim Spector commented on SENTRY-2109:
---

[~kkalyan], could you please explain how this patch relates to SENTRY-2106, 
since both are trying to address the same issue? That will make the code 
review easier.

In my opinion, it would be easier, and safer, to commit SENTRY-2106 first, so 
that you could provide your patch on top of it. SENTRY-2106 seems to be pretty 
straightforward (once some concerns have been addressed), focusing on one 
specific condition and one specific way of avoiding a full update. If you 
provide your patch on top of it, it will be easier to review. Otherwise, since 
both patches overlap in code (both modify exactly the same method for exactly 
the same purpose), it may be confusing to analyze this patch on its own while 
working out how the two patches will interact after the merge.

> Fix the logic of identifying HMS out of Sync
> 
>
> Key: SENTRY-2109
> URL: https://issues.apache.org/jira/browse/SENTRY-2109
> Project: Sentry
>  Issue Type: Bug
>  Components: Sentry
>Affects Versions: 2.1.0
>Reporter: kalyan kumar kalvagadda
>Assignee: kalyan kumar kalvagadda
> Fix For: 2.1.0
>
> Attachments: SENTRY-2109.001.patch, 
> Screenshot_HMS_NOTIFICATION_LOG.png
>
>
> Currently, HMSFollower proactively checks whether Sentry is out of sync with 
> HMS and initiates a full snapshot if needed.
> The current logic produces false positives if there are gaps in the event 
> IDs of the notification log sequence.
> This JIRA aims to make that logic robust.





[jira] [Commented] (SENTRY-2109) Fix the logic of identifying HMS out of Sync

2017-12-22 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-2109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16301723#comment-16301723
 ] 

Vadim Spector commented on SENTRY-2109:
---

[~kkalyan], please also add detection and logging of non-sequential event IDs 
in the reply in HMSFollower.processNotification(). It is important to see 
whether there are gaps within a single fetch.
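One possible shape for that check, written as a standalone helper rather than as a change to HMSFollower.processNotification() (whose code is not shown here). `EventIdGapDetector` and `findAnomalies` are hypothetical names, and the sketch assumes the fetched IDs are compared against the last processed event ID:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Sentry's HMSFollower code.
public class EventIdGapDetector {

    /** Describes gaps and out-of-sequence IDs in one fetch, relative to the last processed ID. */
    static List<String> findAnomalies(long lastProcessedId, List<Long> fetchedIds) {
        List<String> anomalies = new ArrayList<>();
        long prev = lastProcessedId; // highest event ID seen so far
        for (long id : fetchedIds) {
            if (id <= prev) {
                // Duplicate or older ID: report it, keep comparing against the highest ID.
                anomalies.add("out-of-sequence: " + id + " after " + prev);
                continue;
            }
            if (id > prev + 1) {
                anomalies.add("gap: " + (prev + 1) + ".." + (id - 1));
            }
            prev = id;
        }
        return anomalies;
    }

    public static void main(String[] args) {
        // A real follower would write these to its log instead of stdout.
        findAnomalies(10, List.of(11L, 12L, 15L, 14L, 16L)).forEach(System.out::println);
        // gap: 13..14
        // out-of-sequence: 14 after 15
    }
}
```

Per the issue description, a gap on its own should not be taken as proof that Sentry is out of sync; a helper like this only surfaces gaps so they can be logged and analyzed.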

> Fix the logic of identifying HMS out of Sync
> 
>
> Key: SENTRY-2109
> URL: https://issues.apache.org/jira/browse/SENTRY-2109
> Project: Sentry
>  Issue Type: Bug
>  Components: Sentry
>Affects Versions: 2.1.0
>Reporter: kalyan kumar kalvagadda
>Assignee: kalyan kumar kalvagadda
> Fix For: 2.1.0
>
> Attachments: Screenshot_HMS_NOTIFICATION_LOG.png
>
>
> Currently, HMSFollower proactively checks whether Sentry is out of sync with 
> HMS and initiates a full snapshot if needed.
> The current logic produces false positives if there are gaps in the event 
> IDs of the notification log sequence.
> This JIRA aims to make that logic robust.





[jira] [Updated] (SENTRY-2019) Improve logging in SentryPlugin

2017-12-20 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2019:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Improve logging in SentryPlugin
> ---
>
> Key: SENTRY-2019
> URL: https://issues.apache.org/jira/browse/SENTRY-2019
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2019.01.patch
>
>
> For supportability, we need detailed logging of the data flowing into and 
> out of SentryPlugin.





[jira] [Updated] (SENTRY-2019) Improve logging in SentryPlugin

2017-12-20 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2019:
--
Summary: Improve logging in SentryPlugin  (was: Improve logging on 
SentryPlugin)

> Improve logging in SentryPlugin
> ---
>
> Key: SENTRY-2019
> URL: https://issues.apache.org/jira/browse/SENTRY-2019
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2019.01.patch
>
>
> For supportability, we need detailed logging of the data flowing into and 
> out of SentryPlugin.





[jira] [Commented] (SENTRY-2019) Improve logging in SentryPlugin

2017-12-20 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16298992#comment-16298992
 ] 

Vadim Spector commented on SENTRY-2019:
---

Patch committed

> Improve logging in SentryPlugin
> ---
>
> Key: SENTRY-2019
> URL: https://issues.apache.org/jira/browse/SENTRY-2019
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2019.01.patch
>
>
> For supportability, we need detailed logging of the data flowing into and 
> out of SentryPlugin.





[jira] [Comment Edited] (SENTRY-2019) Improve logging on SentryPlugin

2017-12-18 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16295908#comment-16295908
 ] 

Vadim Spector edited comment on SENTRY-2019 at 12/19/17 12:27 AM:
--

Straightforward addition of TRACE logs with the complete data for all inputs 
and outputs of the public API methods of the SentryPlugin class.


was (Author: vspec...@gmail.com):
Straightforward addition of TRACE logs with the complete data for all inputs 
and outputs of public API methods.

> Improve logging on SentryPlugin
> ---
>
> Key: SENTRY-2019
> URL: https://issues.apache.org/jira/browse/SENTRY-2019
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2019.01.patch
>
>
> For supportability, we need detailed logging of the data flowing into and 
> out of SentryPlugin.





[jira] [Updated] (SENTRY-2019) Improve logging on SentryPlugin

2017-12-18 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2019:
--
Status: Patch Available  (was: Open)

Straightforward addition of TRACE logs with the complete data for all inputs 
and outputs of public API methods.
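A minimal sketch of the pattern, using java.util.logging (where FINEST roughly corresponds to TRACE) so that it runs without dependencies. Sentry code typically logs through SLF4J, where `isTraceEnabled()` would play the role of `isLoggable(Level.FINEST)`. The `traced` wrapper and the `handlePathUpdates` method name are hypothetical:

```java
import java.util.List;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch: java.util.logging stands in for SLF4J; FINEST is used as the TRACE level.
public class TracedCall {
    private static final Logger LOG = Logger.getLogger(TracedCall.class.getName());

    /** Runs body on input, logging the complete input and output at FINEST level. */
    static <I, O> O traced(String method, I input, Function<I, O> body) {
        boolean trace = LOG.isLoggable(Level.FINEST); // skip all logging work when disabled
        if (trace) {
            LOG.log(Level.FINEST, "{0} input: {1}", new Object[] {method, input});
        }
        O output = body.apply(input);
        if (trace) {
            LOG.log(Level.FINEST, "{0} output: {1}", new Object[] {method, output});
        }
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical public API method: returns how many path updates it received.
        Integer count = traced("handlePathUpdates", List.of("/a", "/b"), List::size);
        System.out.println(count); // 2
    }
}
```

Checking the level once per call keeps the cost near zero in production, where TRACE is normally off, while still allowing complete payloads to be dumped when support needs them.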

> Improve logging on SentryPlugin
> ---
>
> Key: SENTRY-2019
> URL: https://issues.apache.org/jira/browse/SENTRY-2019
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2019.01.patch
>
>
> For supportability, we need detailed logging of the data flowing into and 
> out of SentryPlugin.





[jira] [Updated] (SENTRY-2019) Improve logging on SentryPlugin

2017-12-18 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2019:
--
Attachment: SENTRY-2019.01.patch

> Improve logging on SentryPlugin
> ---
>
> Key: SENTRY-2019
> URL: https://issues.apache.org/jira/browse/SENTRY-2019
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2019.01.patch
>
>
> For supportability, we need detailed logging of the data flowing into and 
> out of SentryPlugin.





[jira] [Updated] (SENTRY-2102) Switching to HttpServer2 for WebUI access

2017-12-15 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2102:
--
Description: 
There is an HTTP server implementation that is intended to be the standard 
for all Hadoop components implementing Web UIs: 

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/HttpServer2.java

It comes with built-in support for SSL and SPNEGO (Kerberos from browsers), 
plus authorization, all configurable.

It also comes with preconfigured servlets, such as Stack, LogLevel, Metrics, 
JMX, and Configuration, all for free.

Sentry code still uses its own SentryWebServer.java implementation. Some of 
Sentry's servlets are already provided by HttpServer2 (possibly even in 
improved versions).

In addition, HttpServer2's security features satisfy the security demands of 
commercial deployments, unlike Sentry's own Web UI, which security-conscious 
customers may, for good reasons, be reluctant to activate.

Switching to HttpServer2 would be a major improvement to the Sentry product 
(and would leave less code to support).

  was:
There is an HTTP server implementation that is intended to be the standard 
for all Hadoop components implementing Web UIs: 

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/HttpServer2.java

It comes with built-in support for SSL and SPNEGO (Kerberos from browsers), 
plus authorization, all configurable.

It also comes with preconfigured servlets, such as Stack, LogLevel, Metrics, 
JMX, and Configuration, all for free.

Sentry code still uses its own SentryWebServer.java implementation. Some of 
Sentry's servlets are already provided by HttpServer2 (possibly even in 
improved versions).

In addition, HttpServer2's security features satisfy the security demands of 
commercial deployments, unlike our own Web UI, which security-conscious 
customers may, for good reasons, be reluctant to activate.

Switching to HttpServer2 would be a major improvement to the Sentry product 
(and would leave less code to support).


> Switching to HttpServer2 for WebUI access
> -
>
> Key: SENTRY-2102
> URL: https://issues.apache.org/jira/browse/SENTRY-2102
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>
> There is an HTTP server implementation that is intended to be the standard 
> for all Hadoop components implementing Web UIs: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/HttpServer2.java
> It comes with built-in support for SSL and SPNEGO (Kerberos from browsers), 
> plus authorization, all configurable.
> It also comes with preconfigured servlets, such as Stack, LogLevel, Metrics, 
> JMX, and Configuration, all for free.
> Sentry code still uses its own SentryWebServer.java implementation. Some of 
> Sentry's servlets are already provided by HttpServer2 (possibly even in 
> improved versions).
> In addition, HttpServer2's security features satisfy the security demands of 
> commercial deployments, unlike Sentry's own Web UI, which security-conscious 
> customers may, for good reasons, be reluctant to activate.
> Switching to HttpServer2 would be a major improvement to the Sentry product 
> (and would leave less code to support).





[jira] [Updated] (SENTRY-2102) Switching to HttpServer2 for WebUI access

2017-12-15 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2102:
--
Description: 
There is an HTTP server implementation that is intended to be the standard 
for all Hadoop components implementing Web UIs: 

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/HttpServer2.java

It comes with built-in support for SSL and SPNEGO (Kerberos from browsers), 
plus authorization, all configurable.

It also comes with preconfigured servlets, such as Stack, LogLevel, Metrics, 
JMX, and Configuration, all for free.

Sentry code still uses its own SentryWebServer.java implementation. Some of 
Sentry's servlets are already provided by HttpServer2 (possibly even in 
improved versions).

In addition, HttpServer2's security features satisfy the security demands of 
commercial deployments, unlike our own Web UI, which security-conscious 
customers may, for good reasons, be reluctant to activate.

Switching to HttpServer2 would be a major improvement to the Sentry product 
(and would leave less code to support).

  was:
There is an HTTP server implementation that is intended to be the standard 
for all Hadoop components implementing Web UIs: 

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/HttpServer2.java

It comes with built-in support for SSL and SPNEGO (Kerberos from browsers), 
plus authorization, all configurable.

It also comes with preconfigured servlets, such as Stack, LogLevel, Metrics, 
JMX, and Configuration, all for free.

Sentry code still uses its own SentryWebServer.java implementation. Some of 
Sentry's servlets are already provided by HttpServer2 (possibly even in 
improved versions).

In addition, HttpServer2's security features satisfy the security demands of 
commercial deployments, unlike our own Web UI, which security-conscious 
customers may, for good reasons, be reluctant to activate.

Switching to HttpServer2 would be a major improvement to the Sentry product 
(and would leave less code to support).


> Switching to HttpServer2 for WebUI access
> -
>
> Key: SENTRY-2102
> URL: https://issues.apache.org/jira/browse/SENTRY-2102
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>
> There is an HTTP server implementation that is intended to be the standard 
> for all Hadoop components implementing Web UIs: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/HttpServer2.java
> It comes with built-in support for SSL and SPNEGO (Kerberos from browsers), 
> plus authorization, all configurable.
> It also comes with preconfigured servlets, such as Stack, LogLevel, Metrics, 
> JMX, and Configuration, all for free.
> Sentry code still uses its own SentryWebServer.java implementation. Some of 
> Sentry's servlets are already provided by HttpServer2 (possibly even in 
> improved versions).
> In addition, HttpServer2's security features satisfy the security demands of 
> commercial deployments, unlike our own Web UI, which security-conscious 
> customers may, for good reasons, be reluctant to activate.
> Switching to HttpServer2 would be a major improvement to the Sentry product 
> (and would leave less code to support).





[jira] [Created] (SENTRY-2102) Switching to HttpServer2 for WebUI access

2017-12-15 Thread Vadim Spector (JIRA)
Vadim Spector created SENTRY-2102:
-

 Summary: Switching to HttpServer2 for WebUI access
 Key: SENTRY-2102
 URL: https://issues.apache.org/jira/browse/SENTRY-2102
 Project: Sentry
  Issue Type: Improvement
Reporter: Vadim Spector


There is an HTTP server implementation that is intended to be the standard 
for all Hadoop components implementing Web UIs: 

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/http/HttpServer2.java

It comes with built-in support for SSL and SPNEGO (Kerberos from browsers), 
plus authorization, all configurable.

It also comes with preconfigured servlets, such as Stack, LogLevel, Metrics, 
JMX, and Configuration, all for free.

Sentry code still uses its own SentryWebServer.java implementation. Some of 
Sentry's servlets are already provided by HttpServer2 (possibly even in 
improved versions).

In addition, HttpServer2's security features satisfy the security demands of 
commercial deployments, unlike our own Web UI, which security-conscious 
customers may, for good reasons, be reluctant to activate.

Switching to HttpServer2 would be a major improvement to the Sentry product 
(and would leave less code to support).





[jira] [Updated] (SENTRY-1993) StringIndexOutOfBoundsException in HMSPathsDumper.java

2017-11-20 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1993:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> StringIndexOutOfBoundsException in HMSPathsDumper.java
> --
>
> Key: SENTRY-1993
> URL: https://issues.apache.org/jira/browse/SENTRY-1993
> Project: Sentry
>  Issue Type: Bug
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: SENTRY-1993.01.patch
>
>
> The following line in HMSPathsDumper.java causes a 
> StringIndexOutOfBoundsException:
> {code}
> if (tChildPathElement.charAt(0) == DupDetector.REPLACEMENT_STRING_PREFIX) {
> {code}
> This happens only when a path element is the empty string "", which occurs 
> when someone mistakenly specifies an HDFS path with a double "/" in the path 
> section, like hdfs://server//element1//element2, instead of 
> hdfs://server/element1/element2. In principle, such paths are invalid, but 
> this code should be made resilient to them anyway.
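A sketch of the defensive check in Java. The real value of `DupDetector.REPLACEMENT_STRING_PREFIX` is not given in the issue, so `REPLACEMENT_PREFIX` below is a hypothetical placeholder, assumed to be a `char` because it is compared with `charAt(0)`:

```java
// Sketch: REPLACEMENT_PREFIX is a hypothetical stand-in for DupDetector.REPLACEMENT_STRING_PREFIX.
public class PathElementCheck {
    static final char REPLACEMENT_PREFIX = '\u0001'; // placeholder value, not Sentry's

    /** Safe form of tChildPathElement.charAt(0) == PREFIX: false for null or "". */
    static boolean startsWithReplacementPrefix(String pathElement) {
        return pathElement != null
            && !pathElement.isEmpty()
            && pathElement.charAt(0) == REPLACEMENT_PREFIX;
    }

    public static void main(String[] args) {
        // A doubled "/" yields an empty element when the path is split on "/".
        for (String element : "element1//element2".split("/")) {
            System.out.println("[" + element + "] " + startsWithReplacementPrefix(element));
        }
        // [element1] false
        // [] false
        // [element2] false
    }
}
```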





[jira] [Commented] (SENTRY-2047) isTableEmptyCore method in SentryStore has references to MAuthzPathsMapping when it should be generic

2017-11-17 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257225#comment-16257225
 ] 

Vadim Spector commented on SENTRY-2047:
---

Patch applied

> isTableEmptyCore method in SentryStore has references to MAuthzPathsMapping 
> when it should be generic
> -
>
> Key: SENTRY-2047
> URL: https://issues.apache.org/jira/browse/SENTRY-2047
> Project: Sentry
>  Issue Type: Bug
>  Components: Sentry
>Affects Versions: 2.0.0
>Reporter: Arjun Mishra
>Assignee: Arjun Mishra
>  Labels: newbie
> Attachments: SENTRY-2047.01.patch, SENTRY-2047.02.patch
>
>
> The current isTableEmptyCore implementation is shown below:
> {noformat}
>   private boolean isTableEmptyCore(PersistenceManager pm, Class clazz) {
>     Query query = pm.newQuery(clazz);
>     query.addExtension(LOAD_RESULTS_AT_COMMIT, "false");
>     // setRange is implemented efficiently for MySQL, Postgresql (using the LIMIT SQL keyword)
>     // and Oracle (using the ROWNUM keyword), with the query only finding the objects required
>     // by the user directly in the datastore. For other RDBMS the query will retrieve all
>     // objects up to the "to" record, and will not pass any unnecessary objects that are before
>     // the "from" record.
>     query.setRange(0, 1);
>     return ((List<MAuthzPathsMapping>) query.execute()).isEmpty();
>   }
> {noformat}
> We seem to be casting query.execute() to a List<MAuthzPathsMapping> when 
> there is no need for it.
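For illustration, a cast to a wildcard collection is enough to call `isEmpty()`, so the helper need not name any model class. The sketch assumes JDO's `Query.execute()`, which returns `Object`; `GenericEmptyCheck` and `isEmptyResult` are hypothetical names, and plain lists stand in for query results so the example runs on its own:

```java
import java.util.Collection;
import java.util.List;

// Sketch: plain lists stand in for the Object returned by JDO's Query.execute().
public class GenericEmptyCheck {

    /** Generic emptiness test for a query result; no model class is named in the cast. */
    static boolean isEmptyResult(Object queryResult) {
        return ((Collection<?>) queryResult).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isEmptyResult(List.of()));      // true
        System.out.println(isEmptyResult(List.of("row"))); // false
    }
}
```

In isTableEmptyCore itself, this would amount to `return ((Collection<?>) query.execute()).isEmpty();`.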





[jira] [Commented] (SENTRY-2032) Leading Slashes need to removed when creating HMS path entries

2017-11-17 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16257221#comment-16257221
 ] 

Vadim Spector commented on SENTRY-2032:
---

Patch applied

> Leading Slashes need to removed when creating HMS path entries
> --
>
> Key: SENTRY-2032
> URL: https://issues.apache.org/jira/browse/SENTRY-2032
> Project: Sentry
>  Issue Type: Bug
>  Components: Sentry
>Affects Versions: 2.0.0
>Reporter: Arjun Mishra
>Assignee: Arjun Mishra
> Attachments: SENTRY-2032.01.patch, SENTRY-2032.02.patch, 
> SENTRY-2032.03.patch, SENTRY-2032.04.patch
>
>
> When retrieving a full path image update, we split each path by "/" and 
> create HMS path entries. However, a leading "/" causes issues, because after 
> splitting, the value at index 0 is empty. This affects the creation of HMS 
> path entries.
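A minimal sketch of one way to avoid the empty element: split and skip empty parts, which also covers the doubled-slash case described in SENTRY-1993. `PathSplitter` and `toElements` are hypothetical names, not the code of the committed patch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch, not Sentry's actual implementation.
public class PathSplitter {

    /** Splits a path on "/" and drops empty elements (leading "/" or doubled "//"). */
    static List<String> toElements(String path) {
        List<String> elements = new ArrayList<>();
        for (String part : path.split("/")) {
            if (!part.isEmpty()) {
                elements.add(part);
            }
        }
        return elements;
    }

    public static void main(String[] args) {
        String path = "/user/hive/warehouse/db1";
        System.out.println(Arrays.toString(path.split("/"))); // [, user, hive, warehouse, db1]
        System.out.println(toElements(path));                 // [user, hive, warehouse, db1]
    }
}
```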





[jira] [Updated] (SENTRY-2031) Add trigger mechanism for Sentry to pull full path snapshot from HMS

2017-11-06 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2031:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add trigger mechanism for Sentry to pull full path snapshot from HMS
> 
>
> Key: SENTRY-2031
> URL: https://issues.apache.org/jira/browse/SENTRY-2031
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2031.01.patch, SENTRY-2031.03.patch
>
>






[jira] [Updated] (SENTRY-2031) Add trigger mechanism for Sentry to pull full path snapshot from HMS

2017-11-05 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2031:
--
Attachment: SENTRY-2031.03.patch

> Add trigger mechanism for Sentry to pull full path snapshot from HMS
> 
>
> Key: SENTRY-2031
> URL: https://issues.apache.org/jira/browse/SENTRY-2031
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2031.01.patch, SENTRY-2031.03.patch
>
>






[jira] [Updated] (SENTRY-2031) Add trigger mechanism for Sentry to pull full path snapshot from HMS

2017-11-04 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2031:
--
Status: Patch Available  (was: Open)

> Add trigger mechanism for Sentry to pull full path snapshot from HMS
> 
>
> Key: SENTRY-2031
> URL: https://issues.apache.org/jira/browse/SENTRY-2031
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2031.01.patch
>
>






[jira] [Updated] (SENTRY-2031) Add trigger mechanism for Sentry to pull full path snapshot from HMS

2017-11-04 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2031:
--
Attachment: SENTRY-2031.01.patch

> Add trigger mechanism for Sentry to pull full path snapshot from HMS
> 
>
> Key: SENTRY-2031
> URL: https://issues.apache.org/jira/browse/SENTRY-2031
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2031.01.patch
>
>






[jira] [Updated] (SENTRY-1712) Add trigger mechanism for Sentry to push full path snapshot to Name Node

2017-11-03 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1712:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Add trigger mechanism for Sentry to push full path snapshot to Name Node
> 
>
> Key: SENTRY-1712
> URL: https://issues.apache.org/jira/browse/SENTRY-1712
> Project: Sentry
>  Issue Type: Bug
>  Components: Sentry
>Affects Versions: 2.0.0
>Reporter: Na Li
>Assignee: Vadim Spector
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: SENTRY-1712.01.patch, SENTRY-1712.02.patch
>
>
> Right now, there is no way to manually force Sentry to get a full snapshot. 
> We need to add this capability; it gives us more power to handle error 
> conditions and enables testing.
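The issue does not say how the trigger should work. One common shape, sketched here purely as an assumption, is a one-shot flag that an administrative entry point sets and the periodic sync loop consumes; `FullSnapshotTrigger` and its methods are hypothetical:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a manual full-snapshot trigger; not Sentry's API.
public class FullSnapshotTrigger {
    private final AtomicBoolean requested = new AtomicBoolean(false);

    /** Called from the admin/trigger entry point (servlet, RPC, etc., not shown). */
    public void requestFullSnapshot() {
        requested.set(true);
    }

    /** Called once per sync cycle; returns true at most once per pending request. */
    public boolean consumeRequest() {
        return requested.compareAndSet(true, false);
    }

    public static void main(String[] args) {
        FullSnapshotTrigger trigger = new FullSnapshotTrigger();
        trigger.requestFullSnapshot();
        for (int cycle = 1; cycle <= 3; cycle++) {
            boolean full = trigger.consumeRequest();
            System.out.println("cycle " + cycle + ": " + (full ? "full snapshot" : "incremental"));
        }
        // cycle 1: full snapshot
        // cycle 2: incremental
        // cycle 3: incremental
    }
}
```

Using `compareAndSet` means several requests arriving between two cycles collapse into a single full snapshot.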





[jira] [Updated] (SENTRY-1712) Add trigger mechanism for Sentry to push full path snapshot to Name Node

2017-11-03 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1712:
--
Attachment: SENTRY-1712.02.patch

> Add trigger mechanism for Sentry to push full path snapshot to Name Node
> 
>
> Key: SENTRY-1712
> URL: https://issues.apache.org/jira/browse/SENTRY-1712
> Project: Sentry
>  Issue Type: Bug
>  Components: Sentry
>Affects Versions: 2.0.0
>Reporter: Na Li
>Assignee: Vadim Spector
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: SENTRY-1712.01.patch, SENTRY-1712.02.patch
>
>
> Right now, there is no way to manually force Sentry to get a full snapshot. 
> We need to add this capability; it gives us more power to handle error 
> conditions and enables testing.





[jira] [Updated] (SENTRY-1712) Add trigger mechanism for Sentry to push full path snapshot to Name Node

2017-11-02 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1712:
--
Status: Patch Available  (was: Open)

> Add trigger mechanism for Sentry to push full path snapshot to Name Node
> 
>
> Key: SENTRY-1712
> URL: https://issues.apache.org/jira/browse/SENTRY-1712
> Project: Sentry
>  Issue Type: Bug
>  Components: Sentry
>Affects Versions: 2.0.0
>Reporter: Na Li
>Assignee: Vadim Spector
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: SENTRY-1712.01.patch
>
>
> Right now, there is no way to manually force Sentry to get a full snapshot. 
> We need to add this capability; it gives us more power to handle error 
> conditions and enables testing.





[jira] [Updated] (SENTRY-1712) Add trigger mechanism for Sentry to push full path snapshot to Name Node

2017-11-02 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1712:
--
Attachment: SENTRY-1712.01.patch

> Add trigger mechanism for Sentry to push full path snapshot to Name Node
> 
>
> Key: SENTRY-1712
> URL: https://issues.apache.org/jira/browse/SENTRY-1712
> Project: Sentry
>  Issue Type: Bug
>  Components: Sentry
>Affects Versions: 2.0.0
>Reporter: Na Li
>Assignee: Vadim Spector
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: SENTRY-1712.01.patch
>
>
> Right now, there is no way to manually force Sentry to get a full snapshot. 
> We need to add this capability; it gives us more power to handle error 
> conditions and enables testing.





[jira] [Updated] (SENTRY-1712) Add trigger mechanism for Sentry to push full path snapshot to Name Node

2017-11-02 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1712:
--
Summary: Add trigger mechanism for Sentry to push full path snapshot to 
Name Node  (was: Add trigger mechanism for Sentry to push full snapshot to Name 
Node)

> Add trigger mechanism for Sentry to push full path snapshot to Name Node
> 
>
> Key: SENTRY-1712
> URL: https://issues.apache.org/jira/browse/SENTRY-1712
> Project: Sentry
>  Issue Type: Bug
>  Components: Sentry
>Affects Versions: 2.0.0
>Reporter: Na Li
>Assignee: Vadim Spector
>Priority: Major
> Fix For: 2.0.0
>
>
> Right now, there is no way to manually force Sentry to get a full snapshot. 
> We need to add this capability; it gives us more power to handle error 
> conditions and enables testing.





[jira] [Updated] (SENTRY-2031) Add trigger mechanism for Sentry to pull full path snapshot from HMS

2017-11-02 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2031:
--
Summary: Add trigger mechanism for Sentry to pull full path snapshot from 
HMS  (was: Add trigger mechanism for Sentry to pull full snapshot from HMS)

> Add trigger mechanism for Sentry to pull full path snapshot from HMS
> 
>
> Key: SENTRY-2031
> URL: https://issues.apache.org/jira/browse/SENTRY-2031
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>Priority: Major
>






[jira] [Updated] (SENTRY-1712) Add trigger mechanism for Sentry to push full snapshot to Name Node

2017-11-02 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1712:
--
Summary: Add trigger mechanism for Sentry to push full snapshot to Name 
Node  (was: Add capability to force Sentry to send full snapshot to HDFS)

> Add trigger mechanism for Sentry to push full snapshot to Name Node
> ---
>
> Key: SENTRY-1712
> URL: https://issues.apache.org/jira/browse/SENTRY-1712
> Project: Sentry
>  Issue Type: Bug
>  Components: Sentry
>Affects Versions: 2.0.0
>Reporter: Na Li
>Assignee: Vadim Spector
>Priority: Major
> Fix For: 2.0.0
>
>
> Right now, there is no way to manually force Sentry to get a full snapshot.
> We need to add this capability. That gives us more power to handle error
> conditions and enable testing.





[jira] [Created] (SENTRY-2031) Add trigger mechanism for Sentry to pull full snapshot from HMS

2017-11-02 Thread Vadim Spector (JIRA)
Vadim Spector created SENTRY-2031:
-

 Summary: Add trigger mechanism for Sentry to pull full snapshot 
from HMS
 Key: SENTRY-2031
 URL: https://issues.apache.org/jira/browse/SENTRY-2031
 Project: Sentry
  Issue Type: Improvement
Reporter: Vadim Spector
Assignee: Vadim Spector
Priority: Major








[jira] [Updated] (SENTRY-1712) Add capability to force Sentry to send full snapshot to HDFS

2017-11-02 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1712:
--
Priority: Major  (was: Minor)

> Add capability to force Sentry to send full snapshot to HDFS
> 
>
> Key: SENTRY-1712
> URL: https://issues.apache.org/jira/browse/SENTRY-1712
> Project: Sentry
>  Issue Type: Bug
>  Components: Sentry
>Affects Versions: 2.0.0
>Reporter: Na Li
>Assignee: Vadim Spector
>Priority: Major
> Fix For: 2.0.0
>
>
> Right now, there is no way to manually force Sentry to get a full snapshot.
> We need to add this capability. That gives us more power to handle error
> conditions and enable testing.





[jira] [Updated] (SENTRY-2027) Create mechanism of delivering commands via WebUI

2017-11-02 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2027:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Create mechanism of delivering commands via WebUI
> -
>
> Key: SENTRY-2027
> URL: https://issues.apache.org/jira/browse/SENTRY-2027
> Project: Sentry
>  Issue Type: New Feature
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>Priority: Major
> Attachments: SENTRY-2027.03.patch
>
>
> Need to support triggering full updates from HMS to Sentry and from Sentry to 
> NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
> notifications to intended Sentry components plus some flexible mechanism for 
> components to receive those notifications. Suggested mechanism is 
> publish-subscribe, which is very flexible, and once implemented, allows 
> adding new functionality with virtually no coding effort.
> Web form presents Topic and Message text fields, Submit button, and text area 
> for reporting information / errors back from the server. Message field can be 
> empty.
> For example, topic "hms-sync" can trigger full update from HMS, and topic 
> "nn-sync" can trigger full update to NameNode.
> Securing WebUI will be addressed in a separate JIRA. To mitigate security 
> concerns, forced sync functionality, as well as the publish-subscribe web 
> servlet will be disabled by default, and can be activated by reconfiguration.
> The implementation is intended to be most basic.
> After this mechanism is implemented, will proceed with two more JIRAs to 
> implement full updates for HMS and NN.





[jira] [Updated] (SENTRY-2027) Create mechanism of delivering commands via WebUI

2017-11-02 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2027:
--
Attachment: (was: SENTRY-2027.04.patch)

> Create mechanism of delivering commands via WebUI
> -
>
> Key: SENTRY-2027
> URL: https://issues.apache.org/jira/browse/SENTRY-2027
> Project: Sentry
>  Issue Type: New Feature
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>Priority: Major
> Attachments: SENTRY-2027.03.patch
>
>
> Need to support triggering full updates from HMS to Sentry and from Sentry to 
> NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
> notifications to intended Sentry components plus some flexible mechanism for 
> components to receive those notifications. Suggested mechanism is 
> publish-subscribe, which is very flexible, and once implemented, allows 
> adding new functionality with virtually no coding effort.
> Web form presents Topic and Message text fields, Submit button, and text area 
> for reporting information / errors back from the server. Message field can be 
> empty.
> For example, topic "hms-sync" can trigger full update from HMS, and topic 
> "nn-sync" can trigger full update to NameNode.
> Securing WebUI will be addressed in a separate JIRA. To mitigate security 
> concerns, forced sync functionality, as well as the publish-subscribe web 
> servlet will be disabled by default, and can be activated by reconfiguration.
> The implementation is intended to be most basic.
> After this mechanism is implemented, will proceed with two more JIRAs to 
> implement full updates for HMS and NN.





[jira] [Updated] (SENTRY-2027) Create mechanism of delivering commands via WebUI

2017-11-02 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2027:
--
Attachment: SENTRY-2027.04.patch

> Create mechanism of delivering commands via WebUI
> -
>
> Key: SENTRY-2027
> URL: https://issues.apache.org/jira/browse/SENTRY-2027
> Project: Sentry
>  Issue Type: New Feature
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>Priority: Major
> Attachments: SENTRY-2027.03.patch, SENTRY-2027.04.patch
>
>
> Need to support triggering full updates from HMS to Sentry and from Sentry to 
> NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
> notifications to intended Sentry components plus some flexible mechanism for 
> components to receive those notifications. Suggested mechanism is 
> publish-subscribe, which is very flexible, and once implemented, allows 
> adding new functionality with virtually no coding effort.
> Web form presents Topic and Message text fields, Submit button, and text area 
> for reporting information / errors back from the server. Message field can be 
> empty.
> For example, topic "hms-sync" can trigger full update from HMS, and topic 
> "nn-sync" can trigger full update to NameNode.
> Securing WebUI will be addressed in a separate JIRA. To mitigate security 
> concerns, forced sync functionality, as well as the publish-subscribe web 
> servlet will be disabled by default, and can be activated by reconfiguration.
> The implementation is intended to be most basic.
> After this mechanism is implemented, will proceed with two more JIRAs to 
> implement full updates for HMS and NN.





[jira] [Commented] (SENTRY-1712) Add capability to force Sentry to send full snapshot to HDFS

2017-11-02 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16236396#comment-16236396
 ] 

Vadim Spector commented on SENTRY-1712:
---

Committed SENTRY-2027: Create mechanism of delivering commands via WebUI. Now 
can proceed with the implementation of SENTRY-1712.
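
With the command mechanism in place, an "nn-sync" request for SENTRY-1712 could plausibly be handled as a one-shot flag that the next update cycle consumes. This is a hypothetical sketch, not the actual SENTRY-1712 patch: FullSnapshotTrigger, UpdateCycle, request() and consume() are illustrative names. It shows why an atomic get-and-clear is a reasonable choice: several requests arriving before the next cycle collapse into a single full snapshot, and two cycles can never both act on the same request.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical one-shot trigger: a web command (e.g. topic "nn-sync") sets it,
// and the update loop consumes it the next time it builds an update.
class FullSnapshotTrigger {
  private final AtomicBoolean requested = new AtomicBoolean(false);

  // Called from the command subscriber; repeated requests before the next
  // cycle collapse into a single pending request.
  void request() {
    requested.set(true);
  }

  // Called by the update loop; atomically reads and clears the flag, so it
  // returns true at most once per pending request.
  boolean consume() {
    return requested.getAndSet(false);
  }
}

// Hypothetical update loop step showing where the trigger would be checked.
class UpdateCycle {
  private final FullSnapshotTrigger trigger;

  UpdateCycle(FullSnapshotTrigger trigger) {
    this.trigger = trigger;
  }

  String nextUpdateKind() {
    return trigger.consume() ? "FULL" : "INCREMENTAL";
  }
}
```

Under this sketch the subscriber for the command topic only calls request(); all of the expensive work stays on the normal update path.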

> Add capability to force Sentry to send full snapshot to HDFS
> 
>
> Key: SENTRY-1712
> URL: https://issues.apache.org/jira/browse/SENTRY-1712
> Project: Sentry
>  Issue Type: Bug
>  Components: Sentry
>Affects Versions: 2.0.0
>Reporter: Na Li
>Assignee: Vadim Spector
>Priority: Minor
> Fix For: 2.0.0
>
>
> Right now, there is no way to manually force Sentry to get a full snapshot.
> We need to add this capability. That gives us more power to handle error
> conditions and enable testing.





[jira] [Updated] (SENTRY-2027) Create mechanism of delivering commands via WebUI

2017-10-31 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2027:
--
Status: Patch Available  (was: Open)

> Create mechanism of delivering commands via WebUI
> -
>
> Key: SENTRY-2027
> URL: https://issues.apache.org/jira/browse/SENTRY-2027
> Project: Sentry
>  Issue Type: New Feature
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2027.03.patch
>
>
> Need to support triggering full updates from HMS to Sentry and from Sentry to 
> NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
> notifications to intended Sentry components plus some flexible mechanism for 
> components to receive those notifications. Suggested mechanism is 
> publish-subscribe, which is very flexible, and once implemented, allows 
> adding new functionality with virtually no coding effort.
> Web form presents Topic and Message text fields, Submit button, and text area 
> for reporting information / errors back from the server. Message field can be 
> empty.
> For example, topic "hms-sync" can trigger full update from HMS, and topic 
> "nn-sync" can trigger full update to NameNode.
> Securing WebUI will be addressed in a separate JIRA. To mitigate security 
> concerns, forced sync functionality, as well as the publish-subscribe web 
> servlet will be disabled by default, and can be activated by reconfiguration.
> The implementation is intended to be most basic.
> After this mechanism is implemented, will proceed with two more JIRAs to 
> implement full updates for HMS and NN.





[jira] [Updated] (SENTRY-2027) Create mechanism of delivering commands via WebUI

2017-10-31 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2027:
--
Status: Open  (was: Patch Available)

> Create mechanism of delivering commands via WebUI
> -
>
> Key: SENTRY-2027
> URL: https://issues.apache.org/jira/browse/SENTRY-2027
> Project: Sentry
>  Issue Type: New Feature
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2027.03.patch
>
>
> Need to support triggering full updates from HMS to Sentry and from Sentry to 
> NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
> notifications to intended Sentry components plus some flexible mechanism for 
> components to receive those notifications. Suggested mechanism is 
> publish-subscribe, which is very flexible, and once implemented, allows 
> adding new functionality with virtually no coding effort.
> Web form presents Topic and Message text fields, Submit button, and text area 
> for reporting information / errors back from the server. Message field can be 
> empty.
> For example, topic "hms-sync" can trigger full update from HMS, and topic 
> "nn-sync" can trigger full update to NameNode.
> Securing WebUI will be addressed in a separate JIRA. To mitigate security 
> concerns, forced sync functionality, as well as the publish-subscribe web 
> servlet will be disabled by default, and can be activated by reconfiguration.
> The implementation is intended to be most basic.
> After this mechanism is implemented, will proceed with two more JIRAs to 
> implement full updates for HMS and NN.





[jira] [Updated] (SENTRY-2027) Create mechanism of delivering commands via WebUI

2017-10-30 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2027:
--
Status: Patch Available  (was: Open)

> Create mechanism of delivering commands via WebUI
> -
>
> Key: SENTRY-2027
> URL: https://issues.apache.org/jira/browse/SENTRY-2027
> Project: Sentry
>  Issue Type: New Feature
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2027.03.patch
>
>
> Need to support triggering full updates from HMS to Sentry and from Sentry to 
> NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
> notifications to intended Sentry components plus some flexible mechanism for 
> components to receive those notifications. Suggested mechanism is 
> publish-subscribe, which is very flexible, and once implemented, allows 
> adding new functionality with virtually no coding effort.
> Web form presents Topic and Message text fields, Submit button, and text area 
> for reporting information / errors back from the server. Message field can be 
> empty.
> For example, topic "hms-sync" can trigger full update from HMS, and topic 
> "nn-sync" can trigger full update to NameNode.
> Securing WebUI will be addressed in a separate JIRA. To mitigate security 
> concerns, forced sync functionality, as well as the publish-subscribe web 
> servlet will be disabled by default, and can be activated by reconfiguration.
> The implementation is intended to be most basic.
> After this mechanism is implemented, will proceed with two more JIRAs to 
> implement full updates for HMS and NN.





[jira] [Updated] (SENTRY-2027) Create mechanism of delivering commands via WebUI

2017-10-30 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2027:
--
Attachment: (was: SENTRY-2027.01.patch)

> Create mechanism of delivering commands via WebUI
> -
>
> Key: SENTRY-2027
> URL: https://issues.apache.org/jira/browse/SENTRY-2027
> Project: Sentry
>  Issue Type: New Feature
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2027.03.patch
>
>
> Need to support triggering full updates from HMS to Sentry and from Sentry to 
> NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
> notifications to intended Sentry components plus some flexible mechanism for 
> components to receive those notifications. Suggested mechanism is 
> publish-subscribe, which is very flexible, and once implemented, allows 
> adding new functionality with virtually no coding effort.
> Web form presents Topic and Message text fields, Submit button, and text area 
> for reporting information / errors back from the server. Message field can be 
> empty.
> For example, topic "hms-sync" can trigger full update from HMS, and topic 
> "nn-sync" can trigger full update to NameNode.
> Securing WebUI will be addressed in a separate JIRA. To mitigate security 
> concerns, forced sync functionality, as well as the publish-subscribe web 
> servlet will be disabled by default, and can be activated by reconfiguration.
> The implementation is intended to be most basic.
> After this mechanism is implemented, will proceed with two more JIRAs to 
> implement full updates for HMS and NN.





[jira] [Updated] (SENTRY-2027) Create mechanism of delivering commands via WebUI

2017-10-30 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2027:
--
Attachment: SENTRY-2027.03.patch

> Create mechanism of delivering commands via WebUI
> -
>
> Key: SENTRY-2027
> URL: https://issues.apache.org/jira/browse/SENTRY-2027
> Project: Sentry
>  Issue Type: New Feature
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2027.03.patch
>
>
> Need to support triggering full updates from HMS to Sentry and from Sentry to 
> NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
> notifications to intended Sentry components plus some flexible mechanism for 
> components to receive those notifications. Suggested mechanism is 
> publish-subscribe, which is very flexible, and once implemented, allows 
> adding new functionality with virtually no coding effort.
> Web form presents Topic and Message text fields, Submit button, and text area 
> for reporting information / errors back from the server. Message field can be 
> empty.
> For example, topic "hms-sync" can trigger full update from HMS, and topic 
> "nn-sync" can trigger full update to NameNode.
> Securing WebUI will be addressed in a separate JIRA. To mitigate security 
> concerns, forced sync functionality, as well as the publish-subscribe web 
> servlet will be disabled by default, and can be activated by reconfiguration.
> The implementation is intended to be most basic.
> After this mechanism is implemented, will proceed with two more JIRAs to 
> implement full updates for HMS and NN.





[jira] [Updated] (SENTRY-2027) Create mechanism of delivering commands via WebUI

2017-10-30 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2027:
--
Attachment: SENTRY-2027.01.patch

> Create mechanism of delivering commands via WebUI
> -
>
> Key: SENTRY-2027
> URL: https://issues.apache.org/jira/browse/SENTRY-2027
> Project: Sentry
>  Issue Type: New Feature
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2027.01.patch
>
>
> Need to support triggering full updates from HMS to Sentry and from Sentry to 
> NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
> notifications to intended Sentry components plus some flexible mechanism for 
> components to receive those notifications. Suggested mechanism is 
> publish-subscribe, which is very flexible, and once implemented, allows 
> adding new functionality with virtually no coding effort.
> Web form presents Topic and Message text fields, Submit button, and text area 
> for reporting information / errors back from the server. Message field can be 
> empty.
> For example, topic "hms-sync" can trigger full update from HMS, and topic 
> "nn-sync" can trigger full update to NameNode.
> Securing WebUI will be addressed in a separate JIRA. To mitigate security 
> concerns, forced sync functionality, as well as the publish-subscribe web 
> servlet will be disabled by default, and can be activated by reconfiguration.
> The implementation is intended to be most basic.
> After this mechanism is implemented, will proceed with two more JIRAs to 
> implement full updates for HMS and NN.





[jira] [Updated] (SENTRY-2027) Create mechanism of delivering commands via WebUI

2017-10-30 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2027:
--
Description: 
Need to support triggering full updates from HMS to Sentry and from Sentry to 
NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
notifications to intended Sentry components plus some flexible mechanism for 
components to receive those notifications. Suggested mechanism is 
publish-subscribe, which is very flexible, and once implemented, allows adding 
new functionality with virtually no coding effort.

Web form presents Topic and Message text fields, Submit button, and text area 
for reporting information / errors back from the server. Message field can be 
empty.

For example, topic "hms-sync" can trigger full update from HMS, and topic 
"nn-sync" can trigger full update to NameNode.

Securing WebUI will be addressed in a separate JIRA. To mitigate security 
concerns, forced sync functionality, as well as the publish-subscribe web 
servlet will be disabled by default, and can be activated by reconfiguration.

The implementation is intended to be most basic.

After this mechanism is implemented, will proceed with two more JIRAs to 
implement full update triggers for HMS and NN.

  was:
Need to support triggering full updates from HMS to Sentry and from Sentry to 
NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
notifications to intended Sentry components plus some flexible mechanism for 
components to receive those notifications. Suggested mechanism is 
publish-subscribe, which is very flexible, and once implemented, allows adding 
new functionality with virtually no coding effort.

Web form presents Topic and Message text fields, Submit button, and text area 
for reporting information / errors back from the server. Message field can be 
empty.

For example, topic "hms-sync" can trigger full update from HMS, and topic 
"nn-sync" can trigger full update to NameNode.

Securing WebUI will be addressed in a separate JIRA. To mitigate security 
concerns, forced sync functionality, as well as the publish-subscribe web 
servlet will be disabled by default, and can be activated by reconfiguration.

The implementation is intended to be most basic.


> Create mechanism of delivering commands via WebUI
> -
>
> Key: SENTRY-2027
> URL: https://issues.apache.org/jira/browse/SENTRY-2027
> Project: Sentry
>  Issue Type: New Feature
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>
> Need to support triggering full updates from HMS to Sentry and from Sentry to 
> NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
> notifications to intended Sentry components plus some flexible mechanism for 
> components to receive those notifications. Suggested mechanism is 
> publish-subscribe, which is very flexible, and once implemented, allows 
> adding new functionality with virtually no coding effort.
> Web form presents Topic and Message text fields, Submit button, and text area 
> for reporting information / errors back from the server. Message field can be 
> empty.
> For example, topic "hms-sync" can trigger full update from HMS, and topic 
> "nn-sync" can trigger full update to NameNode.
> Securing WebUI will be addressed in a separate JIRA. To mitigate security 
> concerns, forced sync functionality, as well as the publish-subscribe web 
> servlet will be disabled by default, and can be activated by reconfiguration.
> The implementation is intended to be most basic.
> After this mechanism is implemented, will proceed with two more JIRAs to 
> implement full update triggers for HMS and NN.





[jira] [Updated] (SENTRY-2027) Create mechanism of delivering commands via WebUI

2017-10-30 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2027:
--
Description: 
Need to support triggering full updates from HMS to Sentry and from Sentry to 
NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
notifications to intended Sentry components plus some flexible mechanism for 
components to receive those notifications. Suggested mechanism is 
publish-subscribe, which is very flexible, and once implemented, allows adding 
new functionality with virtually no coding effort.

Web form presents Topic and Message text fields, Submit button, and text area 
for reporting information / errors back from the server. Message field can be 
empty.

For example, topic "hms-sync" can trigger full update from HMS, and topic 
"nn-sync" can trigger full update to NameNode.

Securing WebUI will be addressed in a separate JIRA. To mitigate security 
concerns, forced sync functionality, as well as the publish-subscribe web 
servlet will be disabled by default, and can be activated by reconfiguration.

The implementation is intended to be most basic.

After this mechanism is implemented, will proceed with two more JIRAs to 
implement full updates for HMS and NN.

  was:
Need to support triggering full updates from HMS to Sentry and from Sentry to 
NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
notifications to intended Sentry components plus some flexible mechanism for 
components to receive those notifications. Suggested mechanism is 
publish-subscribe, which is very flexible, and once implemented, allows adding 
new functionality with virtually no coding effort.

Web form presents Topic and Message text fields, Submit button, and text area 
for reporting information / errors back from the server. Message field can be 
empty.

For example, topic "hms-sync" can trigger full update from HMS, and topic 
"nn-sync" can trigger full update to NameNode.

Securing WebUI will be addressed in a separate JIRA. To mitigate security 
concerns, forced sync functionality, as well as the publish-subscribe web 
servlet will be disabled by default, and can be activated by reconfiguration.

The implementation is intended to be most basic.

After this mechanism is implemented, will proceed with two more JIRAs to 
implement full update triggers for HMS and NN.


> Create mechanism of delivering commands via WebUI
> -
>
> Key: SENTRY-2027
> URL: https://issues.apache.org/jira/browse/SENTRY-2027
> Project: Sentry
>  Issue Type: New Feature
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>
> Need to support triggering full updates from HMS to Sentry and from Sentry to 
> NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
> notifications to intended Sentry components plus some flexible mechanism for 
> components to receive those notifications. Suggested mechanism is 
> publish-subscribe, which is very flexible, and once implemented, allows 
> adding new functionality with virtually no coding effort.
> Web form presents Topic and Message text fields, Submit button, and text area 
> for reporting information / errors back from the server. Message field can be 
> empty.
> For example, topic "hms-sync" can trigger full update from HMS, and topic 
> "nn-sync" can trigger full update to NameNode.
> Securing WebUI will be addressed in a separate JIRA. To mitigate security 
> concerns, forced sync functionality, as well as the publish-subscribe web 
> servlet will be disabled by default, and can be activated by reconfiguration.
> The implementation is intended to be most basic.
> After this mechanism is implemented, will proceed with two more JIRAs to 
> implement full updates for HMS and NN.





[jira] [Created] (SENTRY-2027) Create mechanism of delivering commands via WebUI

2017-10-30 Thread Vadim Spector (JIRA)
Vadim Spector created SENTRY-2027:
-

 Summary: Create mechanism of delivering commands via WebUI
 Key: SENTRY-2027
 URL: https://issues.apache.org/jira/browse/SENTRY-2027
 Project: Sentry
  Issue Type: New Feature
Reporter: Vadim Spector
Assignee: Vadim Spector


Need to support triggering full updates from HMS to Sentry and from Sentry to 
NameNode. WebUI is natural choice. Need dedicated servlet to pass simple 
notifications to intended Sentry components plus some flexible mechanism for 
components to receive those notifications. Suggested mechanism is 
publish-subscribe, which is very flexible, and once implemented, allows adding 
new functionality with virtually no coding effort.
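
The publish-subscribe mechanism proposed above can be sketched in Java as a small topic registry. This is a minimal illustration under stated assumptions, not the committed SENTRY-2027 code: TopicBus, Subscriber and onMessage are hypothetical names. A component subscribes to a topic string, and publishing delivers the message to every subscriber of that topic; the return value lets the web form report when nobody is listening.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical callback implemented by any component that wants commands.
interface Subscriber {
  void onMessage(String topic, String message);
}

// Hypothetical minimal topic-based publish-subscribe registry.
class TopicBus {
  // Topic -> subscribers. Thread-safe collections, since components may
  // subscribe at startup while web requests publish concurrently.
  private final Map<String, List<Subscriber>> subscribers = new ConcurrentHashMap<>();

  void subscribe(String topic, Subscriber subscriber) {
    subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(subscriber);
  }

  // Delivers the message to every subscriber of the topic and returns how
  // many were notified (0 means nobody listens to that topic).
  int publish(String topic, String message) {
    List<Subscriber> targets = subscribers.getOrDefault(topic, Collections.emptyList());
    for (Subscriber s : targets) {
      s.onMessage(topic, message);
    }
    return targets.size();
  }
}
```

In this sketch, supporting a new command means registering one more subscriber, which is the "virtually no coding effort" property the description is after.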

Web form presents Topic and Message text fields, Submit button, and text area 
for reporting information / errors back from the server. Message field can be 
empty.

For example, topic "hms-sync" can trigger full update from HMS, and topic 
"nn-sync" can trigger full update to NameNode.

Securing WebUI will be addressed in a separate JIRA. To mitigate security 
concerns, forced sync functionality, as well as the publish-subscribe web 
servlet will be disabled by default, and can be activated by reconfiguration.
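
The web form's behavior (required Topic, optional Message, status text reported back, everything off unless configured) could be modeled roughly as below. This is a sketch only: CommandHandler and its enabled constructor flag are hypothetical, the flag stands in for whichever configuration property the real servlet reads, it keeps its own topic-to-handler map instead of a full subscriber registry so the example is self-contained, and the Servlet API plumbing is omitted.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical stand-in for the command servlet's request handling.
class CommandHandler {
  private final boolean enabled;  // assumed to come from configuration; off by default
  private final Map<String, Consumer<String>> handlers = new ConcurrentHashMap<>();

  CommandHandler(boolean enabled) {
    this.enabled = enabled;
  }

  void register(String topic, Consumer<String> handler) {
    handlers.put(topic, handler);
  }

  // Returns the text shown in the form's report area.
  String submit(String topic, String message) {
    if (!enabled) {
      return "ERROR: command delivery is disabled by configuration";
    }
    if (topic == null || topic.trim().isEmpty()) {
      return "ERROR: topic is required";
    }
    String key = topic.trim();
    Consumer<String> handler = handlers.get(key);
    if (handler == null) {
      return "ERROR: unknown topic '" + key + "'";
    }
    // An empty message is allowed; a missing one is treated as empty.
    handler.accept(message == null ? "" : message);
    return "OK: delivered to '" + key + "'";
  }
}
```

Rejecting every request when the flag is off keeps the unsecured endpoint inert until someone deliberately enables it, which matches the stated mitigation.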

The implementation is intended to be most basic.





[jira] [Resolved] (SENTRY-1891) SentryPlugin triggers full update due to concurrency bug

2017-10-24 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector resolved SENTRY-1891.
---
Resolution: Won't Fix

No longer applicable; issue does not exist in Sentry HA code

> SentryPlugin triggers full update due to concurrency bug
> 
>
> Key: SENTRY-1891
> URL: https://issues.apache.org/jira/browse/SENTRY-1891
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>
> Under high load, the Sentry server can trigger a full update to the NameNode
> for no good reason, due to a bug in concurrency handling. A full update is
> always done at initialization time, and for a large number of permissions
> and/or HDFS paths it takes a long time and significantly (if temporarily)
> increases heap size. During normal operations it is performed only if the
> Sentry server decides that it has gaps in the sequence numbers of the partial
> updates, so it restores the entire snapshot. A full update during normal
> operations is a highly disruptive event, leading to a huge increase in heap
> size, system slowdown, and even crashes. Below is an explanation of the
> problem:
> 
> *SentryPlugin.java* has multiple permission update methods, such as
> _onDropSentryRole()_, _onDropSentryPrivilege()_,
> _onAlterSentryRoleDeleteGroups()_, etc. They can, of course, be called
> concurrently, and they all delegate work to the
> _permsUpdater.handleUpdateNotification(update)_ call, which, in turn,
> serializes those requests by turning them into Runnable tasks and
> submitting them to a single-threaded thread pool. Each job ends up
> appending an update to the updates list. So far so good: all updates are
> serialized.
> However, each of those methods first creates a _PermissionsUpdate_ object with
> an auto-incremented sequence number, and only then passes this object to
> _permsUpdater.handleUpdateNotification(update)_. Here is a typical code snippet:
> {quote}PermissionsUpdate update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
> // what if another permission update thread preempts this one right here
> // and starts and finishes the whole method ???
> update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName, privilege.getAction().toUpperCase());
> // ... or here ???
> permsUpdater.handleUpdateNotification(update);
> LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");{quote}
> The problem is that assigning the sequence number to the _PermissionsUpdate_
> object and appending this object to the updates list (by calling
> _permsUpdater.handleUpdateNotification(update)_) must be an atomic pair of
> operations, and they are not. In a multi-threaded environment, sooner or later
> updates get appended to the updates list with out-of-order sequence numbers.
> The problem is not that updates may be appended to the updates list in a
> different order than they arrived; we never guaranteed any particular order
> for concurrent requests coming from different clients. The problem is that
> they are appended to the list with their sequence numbers out of order, e.g.
> #101, #100 instead of #100, #101. The _handleUpdateNotification()_ method then
> has the following line:
> {quote}final boolean editNotMissed = lastSeenSeqNum.incrementAndGet() == update.getSeqNum();{quote}
> Suppose lastSeenSeqNum was 99, and the next appended update is #101. Then, 
> obviously, 99+1 != 101, so editNotMissed is set to false. That triggers the 
> full update logic a few lines later:
> {quote}if (imageRetreiver != null) \{
>   try \{
>     toUpdate = imageRetreiver.retrieveFullImage(update.getSeqNum());
>   \} catch (Exception e) \{
>     LOGGER.warn("failed to retrieve full image: ", e);
>   \}
>   updateable = updateable.updateFull(toUpdate);
> \}{quote}
> Since imageRetreiver != null for the permissions updater, the full update 
> path is taken, and all hell breaks loose.
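To make the failure mode concrete, here is a minimal, self-contained Java sketch of the gap check in isolation. GapCheckDemo, isGap() and countGaps() are hypothetical names; only the editNotMissed expression is taken from the quoted code, and the real updater may do more on a mismatch than this sketch does. It shows that with lastSeenSeqNum = 99, appending #101 before #100 flags both updates as gaps, even though no update was actually lost.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the gap check only. The editNotMissed expression
// mirrors the quoted handleUpdateNotification() line; everything else is
// illustrative and does not reproduce what Sentry does after a mismatch.
final class GapCheckDemo {

  private final AtomicLong lastSeenSeqNum;

  GapCheckDemo(long lastSeen) {
    this.lastSeenSeqNum = new AtomicLong(lastSeen);
  }

  /** True if this update is treated as "edit missed", i.e. full-image path. */
  boolean isGap(long updateSeqNum) {
    final boolean editNotMissed = lastSeenSeqNum.incrementAndGet() == updateSeqNum;
    return !editNotMissed;
  }

  /** Feeds updates in the order they were appended and counts apparent gaps. */
  static int countGaps(long lastSeen, long... appendedSeqNums) {
    GapCheckDemo checker = new GapCheckDemo(lastSeen);
    int gaps = 0;
    for (long seqNum : appendedSeqNums) {
      if (checker.isGap(seqNum)) {
        gaps++;
      }
    }
    return gaps;
  }

  public static void main(String[] args) {
    // Same two updates, nothing lost; only the append order differs.
    System.out.println(countGaps(99, 100, 101)); // 0
    System.out.println(countGaps(99, 101, 100)); // 2
  }
}
```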
> It can be fixed in _SentryPlugin.java_ for all permission update methods as 
> follows:
> {quote}PermissionsUpdate update;
> synchronized (permsUpdateLock) \{
>   // now sequence number increment and adding update to the list are performed atomically
>   update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
>   update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName, privilege.getAction().toUpperCase());
>   permsUpdater.handleUpdateNotification(update);
> \}
> LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");
> {quote}
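A hedged sketch of the lock-based fix in isolation, not the actual SentryPlugin patch. SequencedUpdaterDemo is hypothetical; its lock and counter mirror permsUpdateLock and permSeqNum from the snippet, and a single-thread ExecutorService stands in for the updater's thread pool. Because the increment and the submission happen under the same lock, and a single-thread executor runs tasks in submission order, the appended sequence numbers come out as 1, 2, 3, ... no matter how many threads call in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the proposed fix, not the SentryPlugin code itself.
final class SequencedUpdaterDemo {
  private final Object permsUpdateLock = new Object();
  private final AtomicLong permSeqNum = new AtomicLong(0);
  // One worker thread: tasks run one at a time, in submission order.
  private final ExecutorService updaterPool = Executors.newSingleThreadExecutor();
  // Mutated only by the single updater thread.
  private final List<Long> appended = new ArrayList<>();

  /** Called concurrently by many "permission update" threads. */
  void onPermissionUpdate() {
    synchronized (permsUpdateLock) {
      // Increment and hand-off form one atomic step with respect to other callers.
      final long seqNum = permSeqNum.incrementAndGet();
      // Stands in for handleUpdateNotification(update): it only enqueues work.
      updaterPool.execute(() -> appended.add(seqNum));
    }
  }

  /** Runs nThreads callers with perThread updates each; returns the appended order. */
  static List<Long> run(int nThreads, int perThread) {
    SequencedUpdaterDemo demo = new SequencedUpdaterDemo();
    List<Thread> callers = new ArrayList<>();
    for (int t = 0; t < nThreads; t++) {
      Thread caller = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          demo.onPermissionUpdate();
        }
      });
      callers.add(caller);
      caller.start();
    }
    try {
      for (Thread caller : callers) {
        caller.join();
      }
      // Queued after every update task, so it sees the final list; Future.get()
      // also makes the worker thread's writes visible to this thread.
      Callable<List<Long>> snapshot = () -> new ArrayList<>(demo.appended);
      return demo.updaterPool.submit(snapshot).get();
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      demo.updaterPool.shutdown();
    }
  }

  /** True if the list is exactly 1, 2, 3, ... with no gaps or reordering. */
  static boolean isContiguous(List<Long> seqNums) {
    for (int i = 0; i < seqNums.size(); i++) {
      if (seqNums.get(i).longValue() != i + 1L) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Long> order = run(8, 1_000);
    System.out.println(order.size() + " updates, contiguous=" + isContiguous(order));
  }
}
```

The critical section covers only the increment and the enqueue, so it stays short: submitting to the executor does not wait for the task to run.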
> _handleUpdateNotification()_ is always fast because it only submits the real 
> work to the thread pool; it does not block, so introducing the additional 
> synchronization should not affect 

[jira] [Updated] (SENTRY-2014) Incorrect handling of HDFS paths with multiple slashes

2017-10-24 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2014:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Incorrect handling of HDFS paths with multiple slashes
> --
>
> Key: SENTRY-2014
> URL: https://issues.apache.org/jira/browse/SENTRY-2014
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2014.01.patch, SENTRY-2014.02.patch
>
>
> To the best of my knowledge, there are at least three places in the code 
> where HDFS paths may not be parsed correctly:
> a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
> path portion of URI into one slash. This method is used when getting paths 
> data from HMS store. HDFS paths with duplicate slashes are perfectly legal 
> and the specs refer to UNIX guidelines saying that multiple slashes should be 
> treated as single slashes. If we keep multiple slashes in the path, such a 
> path may be incorrectly split into path entries with some entries being 
> empty, ultimately resulting in hard-to-troubleshoot ACL problems in the 
> field. We should not assume that the URIs fed into parsePath() have already 
> been normalized. It's easier to fix the code.
> b) NotificationProcessor.splitPath() is using "/" regex instead of the 
> correct "/+" one. While the inputs to this class _may_ be controlled by 
> Sentry software, which _may_ normalize paths properly, it is better not to 
> make such assumptions and just fix the code.
> c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from 
> database with path.split("/") instead of path.split("/+").
> This may result in hard-to-troubleshoot HDFS sync failures.
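A small illustration of the difference between the "/" and "/+" regexes, and of collapsing duplicate slashes in the path portion of a URI, using only java.lang.String and java.net.URI. PathSplitDemo and its methods are hypothetical helpers, not the code in PathsUpdate, NotificationProcessor or SentryStore, and hdfs://nn:8020 is a made-up NameNode address.

```java
import java.net.URI;
import java.util.Arrays;

// Hypothetical helpers for illustration only; they are not the Sentry
// implementations of parsePath(), splitPath() or retrieveFullPathsImageCore().
final class PathSplitDemo {

  /** Splits on a single "/": each duplicate slash yields an empty entry. */
  static String[] splitNaive(String path) {
    return path.split("/");
  }

  /** Drops leading slashes, then splits on one-or-more slashes ("/+"). */
  static String[] splitPath(String path) {
    String trimmed = path.replaceFirst("^/+", "");
    return trimmed.isEmpty() ? new String[0] : trimmed.split("/+");
  }

  /** Collapses runs of slashes in the path portion of a URI into one slash. */
  static String normalizePath(String uri) {
    // URI.create keeps "//" inside the path as-is, so collapse it explicitly.
    String path = URI.create(uri).getPath();
    return path.replaceAll("/+", "/");
  }

  public static void main(String[] args) {
    String path = "/user//hive///warehouse";
    System.out.println(Arrays.toString(splitNaive(path))); // [, user, , hive, , , warehouse]
    System.out.println(Arrays.toString(splitPath(path)));  // [user, hive, warehouse]
    System.out.println(normalizePath("hdfs://nn:8020/user//hive///warehouse")); // /user/hive/warehouse
  }
}
```

The empty entries produced by the naive split are exactly what can turn into bogus path elements downstream; splitting on "/+" (or normalizing first) removes them.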



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SENTRY-2014) Incorrect handling of HDFS paths with multiple slashes

2017-10-24 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2014:
--
Attachment: SENTRY-2014.02.patch

> Incorrect handling of HDFS paths with multiple slashes
> --
>
> Key: SENTRY-2014
> URL: https://issues.apache.org/jira/browse/SENTRY-2014
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2014.01.patch, SENTRY-2014.02.patch
>
>
> To the best of my knowledge, there are at least three places in the code 
> where HDFS paths may not be parsed correctly:
> a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
> path portion of URI into one slash. This method is used when getting paths 
> data from HMS store. HDFS paths with duplicate slashes are perfectly legal 
> and the specs refer to UNIX guidelines saying that multiple slashes should be 
> treated as single slashes. If we keep multiple slashes in the path, such a 
> path may be incorrectly split into path entries with some entries being 
> empty, ultimately resulting in hard-to-troubleshoot ACL problems in the 
> field. We should not assume that the URIs fed into parsePath() have already 
> been normalized. It's easier to fix the code.
> b) NotificationProcessor.splitPath() is using "/" regex instead of the 
> correct "/+" one. While the inputs to this class _may_ be controlled by 
> Sentry software, which _may_ normalize paths properly, it is better not to 
> make such assumptions and just fix the code.
> c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from 
> database with path.split("/") instead of path.split("/+").
> This may result in hard-to-troubleshoot HDFS sync failures.





[jira] [Created] (SENTRY-2019) Improve logging on SentryPlugin

2017-10-23 Thread Vadim Spector (JIRA)
Vadim Spector created SENTRY-2019:
-

 Summary: Improve logging on SentryPlugin
 Key: SENTRY-2019
 URL: https://issues.apache.org/jira/browse/SENTRY-2019
 Project: Sentry
  Issue Type: Improvement
Reporter: Vadim Spector
Assignee: Vadim Spector


For supportability, we need detailed logging of the data flowing into and out of SentryPlugin.





[jira] [Commented] (SENTRY-2014) Incorrect handling of HDFS paths with multiple slashes

2017-10-21 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214084#comment-16214084
 ] 

Vadim Spector commented on SENTRY-2014:
---

The one failed Kafka unit test appears to be unrelated.

> Incorrect handling of HDFS paths with multiple slashes
> --
>
> Key: SENTRY-2014
> URL: https://issues.apache.org/jira/browse/SENTRY-2014
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2014.01.patch
>
>
> To the best of my knowledge, there are at least three places in the code 
> where HDFS paths may not be parsed correctly:
> a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
> path portion of URI into one slash. This method is used when getting paths 
> data from HMS store. HDFS paths with duplicate slashes are perfectly legal 
> and the specs refer to UNIX guidelines saying that multiple slashes should be 
> treated as single slashes. If we keep multiple slashes in the path, such a 
> path may be incorrectly split into path entries with some entries being 
> empty, ultimately resulting in hard-to-troubleshoot ACL problems in the 
> field. We should not assume that the URIs fed into parsePath() have already 
> been normalized. It's easier to fix the code.
> b) NotificationProcessor.splitPath() is using "/" regex instead of the 
> correct "/+" one. While the inputs to this class _may_ be controlled by 
> Sentry software, which _may_ normalize paths properly, it is better not to 
> make such assumptions and just fix the code.
> c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from 
> database with path.split("/") instead of path.split("/+").
> This may result in hard-to-troubleshoot HDFS sync failures.





[jira] [Updated] (SENTRY-2014) Incorrect handling of HDFS paths with multiple slashes

2017-10-20 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2014:
--
Description: 
To the best of my knowledge, there are at least three places in the code where 
HDFS paths may not be parsed correctly:

a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
path portion of URI into one slash. This method is used when getting paths data 
from HMS store. HDFS paths with duplicate slashes are perfectly legal and the 
specs refer to UNIX guidelines saying that multiple slashes should be treated 
as single slashes. If we keep multiple slashes in the path, such a path may be 
incorrectly split into path entries with some entries being empty, ultimately 
resulting in hard-to-troubleshoot ACL problems in the field. We should not 
assume that the URIs fed into parsePath() have already been normalized. It's 
easier to fix the code.

b) NotificationProcessor.splitPath() is using "/" regex instead of the correct 
"/+" one. While the inputs to this class _may_ be controlled by Sentry 
software, which _may_ normalize paths properly, it is better not to make such 
assumptions and just fix the code.

c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from 
database with path.split("/") instead of path.split("/+").

This may result in hard-to-troubleshoot HDFS sync failures.

  was:
To the best of my knowledge, there are at least three places in the code where 
HDFS paths may not be parsed correctly:

a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
path portion of URI into one slash. This method is used when getting paths data 
from HMS store. HDFS paths with duplicate slashes are perfectly legal and the 
specs refer to UNIX guidelines saying that multiple slashes should be treated 
as single slashes. If we keep multiple slashes in the path, such a path may be 
incorrectly split into path entries with some entries being empty, ultimately 
resulting in hard-to-troubleshoot ACL problems in the field. We should not 
assume that the URIs fed into parsePath() have already been normalized. It's 
easier to fix the code.

b) NotificationProcessor.splitPath() is using "/" regex instead of the correct 
"/+" one. While the inputs to this class _may_ be controlled by Sentry 
software, which _may_ normalize paths properly, it is better not to make such 
assumptions and just fix the code.

c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from 
database with path.split("/") instead of path.split("/+").

This may result in HDFS sync failures.


> Incorrect handling of HDFS paths with multiple slashes
> --
>
> Key: SENTRY-2014
> URL: https://issues.apache.org/jira/browse/SENTRY-2014
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2014.01.patch
>
>
> To the best of my knowledge, there are at least three places in the code 
> where HDFS paths may not be parsed correctly:
> a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
> path portion of URI into one slash. This method is used when getting paths 
> data from HMS store. HDFS paths with duplicate slashes are perfectly legal 
> and the specs refer to UNIX guidelines saying that multiple slashes should be 
> treated as single slashes. If we keep multiple slashes in the path, such a 
> path may be incorrectly split into path entries with some entries being 
> empty, ultimately resulting in hard-to-troubleshoot ACL problems in the 
> field. We should not assume that the URIs fed into parsePath() have already 
> been normalized. It's easier to fix the code.
> b) NotificationProcessor.splitPath() is using "/" regex instead of the 
> correct "/+" one. While the inputs to this class _may_ be controlled by 
> Sentry software, which _may_ normalize paths properly, it is better not to 
> make such assumptions and just fix the code.
> c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from 
> database with path.split("/") instead of path.split("/+").
> This may result in hard-to-troubleshoot HDFS sync failures.





[jira] [Updated] (SENTRY-2014) Incorrect handling of HDFS paths with multiple slashes

2017-10-20 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2014:
--
Description: 
To the best of my knowledge, there are at least three places in the code where 
HDFS paths may not be parsed correctly:

a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
path portion of URI into one slash. This method is used when getting paths data 
from HMS store. HDFS paths with duplicate slashes are perfectly legal and the 
specs refer to UNIX guidelines saying that multiple slashes should be treated 
as single slashes. If we keep multiple slashes in the path, such a path may be 
incorrectly split into path entries with some entries being empty, ultimately 
resulting in hard-to-troubleshoot ACL problems in the field. We should not 
assume that the URIs fed into parsePath() have already been normalized. It's 
easier to fix the code.

b) NotificationProcessor.splitPath() is using "/" regex instead of the correct 
"/+" one. While the inputs to this class _may_ be controlled by Sentry 
software, which _may_ normalize paths properly, it is better not to make such 
assumptions and just fix the code.

c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from 
database with path.split("/") instead of path.split("/+").

This may result in HDFS sync failures.

  was:
There are at least three places in the code where HDFS paths may not be parsed 
correctly:

a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
path portion of URI into one slash. This method is used when getting paths data 
from HMS store. HDFS paths with duplicate slashes are perfectly legal and the 
specs refer to UNIX guidelines saying that multiple slashes should be treated 
as single slashes. If we keep multiple slashes in the path, such a path may be 
incorrectly split into path entries with some entries being empty, ultimately 
resulting in hard-to-troubleshoot ACL problems in the field. We should not 
assume that the URIs fed into parsePath() have already been normalized. It's 
easier to fix the code.

b) NotificationProcessor.splitPath() is using "/" regex instead of the correct 
"/+" one. While the inputs to this class _may_ be controlled by Sentry 
software, which _may_ normalize paths properly, it is better not to make such 
assumptions and just fix the code.

c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from 
database with path.split("/") instead of path.split("/+").

This may result in HDFS sync failures.


> Incorrect handling of HDFS paths with multiple slashes
> --
>
> Key: SENTRY-2014
> URL: https://issues.apache.org/jira/browse/SENTRY-2014
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2014.01.patch
>
>
> To the best of my knowledge, there are at least three places in the code 
> where HDFS paths may not be parsed correctly:
> a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
> path portion of URI into one slash. This method is used when getting paths 
> data from HMS store. HDFS paths with duplicate slashes are perfectly legal 
> and the specs refer to UNIX guidelines saying that multiple slashes should be 
> treated as single slashes. If we keep multiple slashes in the path, such a 
> path may be incorrectly split into path entries with some entries being 
> empty, ultimately resulting in hard-to-troubleshoot ACL problems in the 
> field. We should not assume that the URIs fed into parsePath() have already 
> been normalized. It's easier to fix the code.
> b) NotificationProcessor.splitPath() is using "/" regex instead of the 
> correct "/+" one. While the inputs to this class _may_ be controlled by 
> Sentry software, which _may_ normalize paths properly, it is better not to 
> make such assumptions and just fix the code.
> c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from 
> database with path.split("/") instead of path.split("/+").
> This may result in HDFS sync failures.





[jira] [Updated] (SENTRY-2014) Incorrect handling of HDFS paths with multiple slashes

2017-10-20 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2014:
--
Status: Patch Available  (was: Open)

> Incorrect handling of HDFS paths with multiple slashes
> --
>
> Key: SENTRY-2014
> URL: https://issues.apache.org/jira/browse/SENTRY-2014
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2014.01.patch
>
>
> There are at least three places in the code where HDFS paths may not be 
> parsed correctly:
> a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
> path portion of URI into one slash. This method is used when getting paths 
> data from HMS store. HDFS paths with duplicate slashes are perfectly legal 
> and the specs refer to UNIX guidelines saying that multiple slashes should be 
> treated as single slashes. If we keep multiple slashes in the path, such a 
> path may be incorrectly split into path entries with some entries being 
> empty, ultimately resulting in hard-to-troubleshoot ACL problems in the 
> field. We should not assume that the URIs fed into parsePath() have already 
> been normalized. It's easier to fix the code.
> b) NotificationProcessor.splitPath() is using "/" regex instead of the 
> correct "/+" one. While the inputs to this class _may_ be controlled by 
> Sentry software, which _may_ normalize paths properly, it is better not to 
> make such assumptions and just fix the code.
> c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from 
> database with path.split("/") instead of path.split("/+").
> This may result in HDFS sync failures.





[jira] [Updated] (SENTRY-2014) Incorrect handling of HDFS paths with multiple slashes

2017-10-20 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2014:
--
Attachment: SENTRY-2014.01.patch

> Incorrect handling of HDFS paths with multiple slashes
> --
>
> Key: SENTRY-2014
> URL: https://issues.apache.org/jira/browse/SENTRY-2014
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-2014.01.patch
>
>
> There are at least three places in the code where HDFS paths may not be 
> parsed correctly:
> a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
> path portion of URI into one slash. This method is used when getting paths 
> data from HMS store. HDFS paths with duplicate slashes are perfectly legal 
> and the specs refer to UNIX guidelines saying that multiple slashes should be 
> treated as single slashes. If we keep multiple slashes in the path, such a 
> path may be incorrectly split into path entries with some entries being 
> empty, ultimately resulting in hard-to-troubleshoot ACL problems in the 
> field. We should not assume that the URIs fed into parsePath() have already 
> been normalized. It's easier to fix the code.
> b) NotificationProcessor.splitPath() is using "/" regex instead of the 
> correct "/+" one. While the inputs to this class _may_ be controlled by 
> Sentry software, which _may_ normalize paths properly, it is better not to 
> make such assumptions and just fix the code.
> c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from 
> database with path.split("/") instead of path.split("/+").
> This may result in HDFS sync failures.





[jira] [Updated] (SENTRY-2014) Incorrect handling of HDFS paths with multiple slashes

2017-10-20 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2014:
--
Attachment: (was: SENTRY-2014.01.patch)

> Incorrect handling of HDFS paths with multiple slashes
> --
>
> Key: SENTRY-2014
> URL: https://issues.apache.org/jira/browse/SENTRY-2014
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>
> There are at least three places in the code where HDFS paths may not be 
> parsed correctly:
> a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
> path portion of URI into one slash. This method is used when getting paths 
> data from HMS store. HDFS paths with duplicate slashes are perfectly legal 
> and the specs refer to UNIX guidelines saying that multiple slashes should be 
> treated as single slashes. If we keep multiple slashes in the path, such a 
> path may be incorrectly split into path entries with some entries being 
> empty, ultimately resulting in hard-to-troubleshoot ACL problems in the 
> field. We should not assume that the URIs fed into parsePath() have already 
> been normalized. It's easier to fix the code.
> b) NotificationProcessor.splitPath() is using "/" regex instead of the 
> correct "/+" one. While the inputs to this class _may_ be controlled by 
> Sentry software, which _may_ normalize paths properly, it is better not to 
> make such assumptions and just fix the code.
> c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from 
> database with path.split("/") instead of path.split("/+").
> This may result in HDFS sync failures.





[jira] [Updated] (SENTRY-2014) Incorrect handling of HDFS paths with multiple slashes

2017-10-20 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2014:
--
Description: 
There are at least three places in the code where HDFS paths may not be parsed 
correctly:

a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
path portion of URI into one slash. This method is used when getting paths data 
from HMS store. HDFS paths with duplicate slashes are perfectly legal and the 
specs refer to UNIX guidelines saying that multiple slashes should be treated 
as single slashes. If we keep multiple slashes in the path, such a path may be 
incorrectly split into path entries with some entries being empty, ultimately 
resulting in hard-to-troubleshoot ACL problems in the field. We should not 
assume that the URIs fed into parsePath() have already been normalized. It's 
easier to fix the code.

b) NotificationProcessor.splitPath() is using "/" regex instead of the correct 
"/+" one. While the inputs to this class _may_ be controlled by Sentry 
software, which _may_ normalize paths properly, it is better not to make such 
assumptions and just fix the code.

c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from 
database with path.split("/") instead of path.split("/+").

This may result in HDFS sync failures.

  was:
There are at least three places in the code where HDFS paths may not be parsed 
correctly:

a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
path portion of URI into one slash. This method is used when getting paths data 
from HMS store. HDFS paths with duplicate slashes are perfectly legal and the 
specs refer to UNIX guidelines saying that multiple slashes should be treated 
as single slashes. If we keep multiple slashes in the path, such a path may be 
incorrectly split into path entries with some entries being empty, ultimately 
resulting in hard-to-troubleshoot ACL problems in the field. We should not 
assume that the URIs fed into parsePath() have already been normalized. It's 
easier to fix the code.

b) NotificationProcessor.splitPath() is using "/" regex instead of the correct 
"/+" one. While the inputs to this class _may_ be controlled by Sentry 
software, which _may_ normalize paths properly, it is better not to make such 
assumptions and just fix the code.

c) SentryStore


> Incorrect handling of HDFS paths with multiple slashes
> --
>
> Key: SENTRY-2014
> URL: https://issues.apache.org/jira/browse/SENTRY-2014
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>
> There are at least three places in the code where HDFS paths may not be 
> parsed correctly:
> a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
> path portion of URI into one slash. This method is used when getting paths 
> data from HMS store. HDFS paths with duplicate slashes are perfectly legal 
> and the specs refer to UNIX guidelines saying that multiple slashes should be 
> treated as single slashes. If we keep multiple slashes in the path, such a 
> path may be incorrectly split into path entries with some entries being 
> empty, ultimately resulting in hard-to-troubleshoot ACL problems in the 
> field. We should not assume that the URIs fed into parsePath() have already 
> been normalized. It's easier to fix the code.
> b) NotificationProcessor.splitPath() is using "/" regex instead of the 
> correct "/+" one. While the inputs to this class _may_ be controlled by 
> Sentry software, which _may_ normalize paths properly, it is better not to 
> make such assumptions and just fix the code.
> c) SentryStore.retrieveFullPathsImageCore() splits paths retrieved from 
> database with path.split("/") instead of path.split("/+").
> This may result in HDFS sync failures.





[jira] [Updated] (SENTRY-2014) Incorrect handling of HDFS paths with multiple slashes

2017-10-20 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2014:
--
Description: 
There are at least three places in the code where HDFS paths may not be parsed 
correctly:

a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
path portion of URI into one slash. This method is used when getting paths data 
from HMS store. HDFS paths with duplicate slashes are perfectly legal and the 
specs refer to UNIX guidelines saying that multiple slashes should be treated 
as single slashes. If we keep multiple slashes in the path, such a path may be 
incorrectly split into path entries with some entries being empty, ultimately 
resulting in hard-to-troubleshoot ACL problems in the field. We should not 
assume that the URIs fed into parsePath() have already been normalized. It's 
easier to fix the code.

b) NotificationProcessor.splitPath() is using "/" regex instead of the correct 
"/+" one. While the inputs to this class _may_ be controlled by Sentry 
software, which _may_ normalize paths properly, it is better not to make such 
assumptions and just fix the code.

c) SentryStore

  was:
There are at least three places in the code where HDFS paths may not be parsed 
correctly:

a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
path portion of URI into one slash. This method is used when getting paths data 
from HMS store. HDFS paths with duplicate slashes are perfectly legal and the 
specs refer to UNIX guidelines saying that multiple slashes should be treated 
as single slashes. If we keep multiple slashes in the path, such a path may be 
incorrectly split into path entries with some entries being empty, ultimately 
resulting in hard-to-troubleshoot ACL problems in the field. We should not 
assume that the URIs fed into parsePath() have already been normalized. It's 
easier to fix the code.

b) NotificationProcessor.splitPath() is using "/" regex instead of the correct 
"/+" one. While the inputs to this class _may_ be controlled by Sentry 
software, which _may_ normalize paths properly, it is better not to make such 
assumptions and just fix the code.


> Incorrect handling of HDFS paths with multiple slashes
> --
>
> Key: SENTRY-2014
> URL: https://issues.apache.org/jira/browse/SENTRY-2014
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>
> There are at least three places in the code where HDFS paths may not be 
> parsed correctly:
> a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
> path portion of URI into one slash. This method is used when getting paths 
> data from HMS store. HDFS paths with duplicate slashes are perfectly legal 
> and the specs refer to UNIX guidelines saying that multiple slashes should be 
> treated as single slashes. If we keep multiple slashes in the path, such a 
> path may be incorrectly split into path entries with some entries being 
> empty, ultimately resulting in hard-to-troubleshoot ACL problems in the 
> field. We should not assume that the URIs fed into parsePath() have already 
> been normalized. It's easier to fix the code.
> b) NotificationProcessor.splitPath() is using "/" regex instead of the 
> correct "/+" one. While the inputs to this class _may_ be controlled by 
> Sentry software, which _may_ normalize paths properly, it is better not to 
> make such assumptions and just fix the code.
> c) SentryStore





[jira] [Updated] (SENTRY-2014) Incorrect handling of HDFS paths with multiple slashes

2017-10-20 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-2014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-2014:
--
Description: 
There are at least three places in the code where HDFS paths may not be parsed 
correctly:

a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
path portion of URI into one slash. This method is used when getting paths data 
from HMS store. HDFS paths with duplicate slashes are perfectly legal and the 
specs refer to UNIX guidelines saying that multiple slashes should be treated 
as single slashes. If we keep multiple slashes in the path, such a path may be 
incorrectly split into path entries with some entries being empty, ultimately 
resulting in hard-to-troubleshoot ACL problems in the field. We should not 
assume that the URIs fed into parsePath() have already been normalized. It's 
easier to fix the code.

b) NotificationProcessor.splitPath() is using "/" regex instead of the correct 
"/+" one. While the inputs to this class _may_ be controlled by Sentry 
software, which _may_ normalize paths properly, it is better not to make such 
assumptions and just fix the code.

  was:
There are at least two places in the code where HDFS paths may not be parsed 
correctly:

a) PathsUpdate.parsePath() does not collapse duplicate slashes in the 
path portion of URI into one slash. This method is used when getting paths data 
from HMS store. HDFS paths with duplicate slashes are perfectly legal and the 
specs refer to UNIX guidelines saying that multiple slashes should be treated 
as single slashes. If we keep multiple slashes in the path, such a path may be 
incorrectly split into path entries with some entries being empty, ultimately 
resulting in hard-to-troubleshoot ACL problems in the field. We should not 
assume that the URIs fed into parsePath() have already been normalized. It's 
easier to fix the code.

b) NotificationProcessor.splitPath() uses the "/" regex instead of the correct 
"/+" one. While the inputs to this class _may_ be controlled by Sentry 
software, which _may_ normalize paths properly, it is better not to make such 
assumptions and just fix the code.


> Incorrect handling of HDFS paths with multiple slashes
> --
>
> Key: SENTRY-2014
> URL: https://issues.apache.org/jira/browse/SENTRY-2014
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>
> There are at least three places in the code where HDFS paths may not be 
> parsed correctly:
> a) PathsUpdate.parsePath() does not collapse duplicate slashes in the path 
> portion of a URI into a single slash. This method is used when getting paths 
> data from the HMS store. HDFS paths with duplicate slashes are perfectly 
> legal, and the specs refer to UNIX guidelines saying that multiple slashes 
> should be treated as a single slash. If we keep multiple slashes in the path, 
> such a path may be incorrectly split into path entries, some of them empty, 
> ultimately resulting in hard-to-troubleshoot ACL problems in the field. We 
> should not assume that the URIs fed into parsePath() have already been 
> normalized; it is easier to fix the code.
> b) NotificationProcessor.splitPath() uses the "/" regex instead of the 
> correct "/+" one. While the inputs to this class _may_ be controlled by 
> Sentry software, which _may_ normalize paths properly, it is better not to 
> make such assumptions and just fix the code.





[jira] [Created] (SENTRY-2014) Incorrect handling of HDFS paths with multiple slashes

2017-10-20 Thread Vadim Spector (JIRA)
Vadim Spector created SENTRY-2014:
-

 Summary: Incorrect handling of HDFS paths with multiple slashes
 Key: SENTRY-2014
 URL: https://issues.apache.org/jira/browse/SENTRY-2014
 Project: Sentry
  Issue Type: Bug
Reporter: Vadim Spector
Assignee: Vadim Spector


There are at least two places in the code where HDFS paths may not be parsed 
correctly:

a) PathsUpdate.parsePath() does not collapse duplicate slashes in the path 
portion of a URI into a single slash. This method is used when getting paths 
data from the HMS store. HDFS paths with duplicate slashes are perfectly legal, 
and the specs refer to UNIX guidelines saying that multiple slashes should be 
treated as a single slash. If we keep multiple slashes in the path, such a path 
may be incorrectly split into path entries, some of them empty, ultimately 
resulting in hard-to-troubleshoot ACL problems in the field. We should not 
assume that the URIs fed into parsePath() have already been normalized; it is 
easier to fix the code.

b) NotificationProcessor.splitPath() uses the "/" regex instead of the correct 
"/+" one. While the inputs to this class _may_ be controlled by Sentry 
software, which _may_ normalize paths properly, it is better not to make such 
assumptions and just fix the code.
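A minimal Java sketch of the behavior described in a) and b), relying only on java.net.URI and String.split. HdfsPathSplitter and splitPath are hypothetical names used for illustration; this is not Sentry's actual PathsUpdate or NotificationProcessor code. It contrasts a naive split on "/" with a split on "/+" after stripping leading slashes:

```java
import java.net.URI;
import java.util.Arrays;

// Hypothetical helper for illustration only, not Sentry code.
class HdfsPathSplitter {

  // Splits the path portion of a URI into elements, treating any run of
  // slashes as a single separator (the "/+" regex rather than "/").
  static String[] splitPath(String uriString) {
    String path = URI.create(uriString).getPath();  // e.g. "//element1//element2"
    // Strip leading slashes first: String.split keeps a leading empty string
    // when the input starts with a separator match.
    String trimmed = path.replaceFirst("^/+", "");
    if (trimmed.isEmpty()) {
      return new String[0];  // root path "/" has no elements
    }
    return trimmed.split("/+");  // trailing empty strings are dropped by split
  }

  public static void main(String[] args) {
    String uri = "hdfs://server//element1//element2";
    // Naive split on "/" keeps an empty entry for every extra slash.
    System.out.println(Arrays.toString(URI.create(uri).getPath().split("/")));
    // prints "[, , element1, , element2]"
    System.out.println(Arrays.toString(splitPath(uri)));
    // prints "[element1, element2]"
  }
}
```

Normalizing once at parse time means downstream consumers never see empty path entries, regardless of how callers format their URIs.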





[jira] [Updated] (SENTRY-1993) StringIndexOutOfBoundsException in HMSPathsDumper.java

2017-10-20 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1993:
--
Attachment: (was: SENTRY-1993.02.patch)

> StringIndexOutOfBoundsException in HMSPathsDumper.java
> --
>
> Key: SENTRY-1993
> URL: https://issues.apache.org/jira/browse/SENTRY-1993
> Project: Sentry
>  Issue Type: Bug
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: SENTRY-1993.01.patch
>
>
> The following line in HMSPathsDumper.java is causing 
> StringIndexOutOfBoundsException:
> {code}
> if (tChildPathElement.charAt(0) == DupDetector.REPLACEMENT_STRING_PREFIX) {
> {code}
> It only happens when a path element is "", i.e. when someone mistakenly 
> specifies an HDFS path with two "/" in the path section, like 
> hdfs://server//element1//element2, instead of 
> hdfs://server/element1/element2. In principle, such paths are invalid, but 
> this code should be made resistant to them anyway.
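One way to make the quoted check resistant to empty path elements, sketched in Java under stated assumptions: PathElementGuard and hasReplacementPrefix are hypothetical names, and the constant below is a placeholder because the real value of DupDetector.REPLACEMENT_STRING_PREFIX is not shown in this issue. This is not necessarily what the attached patch does.

```java
// Hypothetical sketch of an empty-safe version of the quoted check.
class PathElementGuard {
  // Placeholder value; the real DupDetector.REPLACEMENT_STRING_PREFIX may differ.
  static final char REPLACEMENT_STRING_PREFIX = '\u0001';

  static boolean hasReplacementPrefix(String pathElement) {
    // An element is "" when the path contained "//"; calling charAt(0) on it
    // throws StringIndexOutOfBoundsException, so test for emptiness first.
    return !pathElement.isEmpty()
        && pathElement.charAt(0) == REPLACEMENT_STRING_PREFIX;
  }

  public static void main(String[] args) {
    System.out.println(hasReplacementPrefix(""));          // false, no exception
    System.out.println(hasReplacementPrefix("element1"));  // false
  }
}
```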





[jira] [Commented] (SENTRY-1993) StringIndexOutOfBoundsException in HMSPathsDumper.java

2017-10-20 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16213249#comment-16213249
 ] 

Vadim Spector commented on SENTRY-1993:
---

I provided a patch with two extra lines of code in the test class to directly 
test Misha's fix. Unless there are any objections, I'll proceed with the 
commit. It will require reverting the previous SENTRY-1993 commit.

> StringIndexOutOfBoundsException in HMSPathsDumper.java
> --
>
> Key: SENTRY-1993
> URL: https://issues.apache.org/jira/browse/SENTRY-1993
> Project: Sentry
>  Issue Type: Bug
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: SENTRY-1993.01.patch, SENTRY-1993.02.patch
>
>
> The following line in HMSPathsDumper.java is causing 
> StringIndexOutOfBoundsException:
> {code}
> if (tChildPathElement.charAt(0) == DupDetector.REPLACEMENT_STRING_PREFIX) {
> {code}
> It only happens when a path element is "", i.e. when someone mistakenly 
> specifies an HDFS path with two "/" in the path section, like 
> hdfs://server//element1//element2, instead of 
> hdfs://server/element1/element2. In principle, such paths are invalid, but 
> this code should be made resistant to them anyway.





[jira] [Updated] (SENTRY-1993) StringIndexOutOfBoundsException in HMSPathsDumper.java

2017-10-20 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1993:
--
Attachment: SENTRY-1993.02.patch

> StringIndexOutOfBoundsException in HMSPathsDumper.java
> --
>
> Key: SENTRY-1993
> URL: https://issues.apache.org/jira/browse/SENTRY-1993
> Project: Sentry
>  Issue Type: Bug
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: SENTRY-1993.01.patch, SENTRY-1993.02.patch
>
>
> The following line in HMSPathsDumper.java is causing 
> StringIndexOutOfBoundsException:
> {code}
> if (tChildPathElement.charAt(0) == DupDetector.REPLACEMENT_STRING_PREFIX) {
> {code}
> It only happens when a path element is "", i.e. when someone mistakenly 
> specifies an HDFS path with two "/" in the path section, like 
> hdfs://server//element1//element2, instead of 
> hdfs://server/element1/element2. In principle, such paths are invalid, but 
> this code should be made resistant to them anyway.





[jira] [Commented] (SENTRY-1993) StringIndexOutOfBoundsException in HMSPathsDumper.java

2017-10-18 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16210218#comment-16210218
 ] 

Vadim Spector commented on SENTRY-1993:
---

[~spena], just to clarify, my response "no I do not see how they can be related 
to the patch" was to your question "Are the tests related to the patch?", which 
I presume was about the failing tests that [~mi...@cloudera.com] commented on. 
Then I revisited this discussion and realized that [~mi...@cloudera.com] also 
said he added some tests for his fix, so to be clear: my response was *not* 
about those; I was sure that Misha's added test case exercised the fix.

As to the commit, you are right, of course. I committed after all three of us 
approved the code review, and only then did I run into [~mi...@cloudera.com]'s 
discovery that the tests did not seem to be affected by his fix. I then 
reviewed the code, found that the added test indeed did not exercise the fixed 
code, worked out the right way to test it, and published my findings.

I opened SENTRY-2008 to address HDFS path canonicalization, so the underlying 
problem will be addressed there, and that work will affect this test and 
potentially others. As to this patch, we can either a) leave it as-is, or b) 
fix the test case as I described, after which I will re-submit the patch. I 
think (a) is OK, because it is a really trivial patch and we have SENTRY-2008, 
so things won't fall through the cracks, but I do not mind (b) either.

Preferences?
[~akolb]



> StringIndexOutOfBoundsException in HMSPathsDumper.java
> --
>
> Key: SENTRY-1993
> URL: https://issues.apache.org/jira/browse/SENTRY-1993
> Project: Sentry
>  Issue Type: Bug
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: SENTRY-1993.01.patch
>
>
> The following line in HMSPathsDumper.java is causing 
> StringIndexOutOfBoundsException:
> {code}
> if (tChildPathElement.charAt(0) == DupDetector.REPLACEMENT_STRING_PREFIX) {
> {code}
> It only happens when a path element is "", i.e. when someone mistakenly 
> specifies an HDFS path with two "/" in the path section, like 
> hdfs://server//element1//element2, instead of 
> hdfs://server/element1/element2. In principle, such paths are invalid, but 
> this code should be made resistant to them anyway.





[jira] [Comment Edited] (SENTRY-1993) StringIndexOutOfBoundsException in HMSPathsDumper.java

2017-10-18 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209896#comment-16209896
 ] 

Vadim Spector edited comment on SENTRY-1993 at 10/18/17 7:51 PM:
-

[~mi...@cloudera.com], there is a way.

For some weird reason, TestHMSPathsFullDump.java uses the unofficial method 
addPathsToAuthzObject(String, List<String>), where each String in the List is 
a path. It also happens to automatically "fix" double slashes in those paths, 
so it is no wonder your test passes even without your fix. "Testing" APIs that 
are never called outside of tests is bogus and needs to be fixed, but that's a 
different story...

In reality, in the Sentry HMS plugin, where the full dump is actually 
generated, the following method is called instead: 
addPathsToAuthzObject(String, List<List<String>>), where each element in the 
outer List is a path parsed into a List<String>. And it is parsed incorrectly, 
so each additional slash results in an additional empty path entry.

So, the right way to make the test fail would be to pass your 
"after_double_slash" entry via the official API as (notice the empty "" element)
{code}
hmsPaths.addPathsToAuthzObject("db1.tbl11", Arrays.asList(Arrays.asList("user", 
"hive", "warehouse", "db1", "tbl11", "part_duplicate2", "", 
"after_double_slash")));
{code}
Then the test fails with StringIndexOutOfBoundsException as expected. If you 
introduce your fix, then the test still fails, because your assert statement
{code}
Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "after_double_slash"}, false));
{code}
should be replaced with (notice "" path entry):
{code}
Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "", "after_double_slash"}, false));
{code}

I verified it locally, so rest assured your patch works. The reason I'm not 
sure we should "fix" the test code this way is that it is weird, and HDFS path 
handling has to be fixed anyway. Once we fix it, your test would need to be 
fixed as well (perhaps HMSPaths should ignore empty "" entries in the parsed 
List<String>). So, I suggest keeping your patch as-is.

[~spena] [~akolb]
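The mismatch between the two code paths can be illustrated with a small, hypothetical Java sketch (PathElementLists, parseNaive and dropEmpty are illustrative names, not HMSPaths methods). It shows how a naive split on "/" yields exactly the element list with the empty "" entry used in the test above, and how ignoring empty entries, as suggested, would recover the clean list:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only, not HMSPaths code.
class PathElementLists {

  // Naive parsing: split on a single "/" and drop only the entry that precedes
  // the leading slash, so each duplicate slash leaves a "" element behind.
  static List<String> parseNaive(String path) {
    List<String> elements = new ArrayList<>(Arrays.asList(path.split("/")));
    if (!elements.isEmpty() && elements.get(0).isEmpty()) {
      elements.remove(0);
    }
    return elements;
  }

  // Returns a copy of the element list without empty "" entries.
  static List<String> dropEmpty(List<String> elements) {
    List<String> result = new ArrayList<>();
    for (String e : elements) {
      if (!e.isEmpty()) {
        result.add(e);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    String path = "/user/hive/warehouse/db1/tbl11/part_duplicate2//after_double_slash";
    List<String> naive = parseNaive(path);
    System.out.println(naive);
    // prints "[user, hive, warehouse, db1, tbl11, part_duplicate2, , after_double_slash]"
    System.out.println(dropEmpty(naive));
    // prints "[user, hive, warehouse, db1, tbl11, part_duplicate2, after_double_slash]"
  }
}
```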


was (Author: vspec...@gmail.com):
[~mi...@cloudera.com], there is a way.

For some weird reason, TestHMSPathsFullDump.java uses the unofficial method 
addPathsToAuthzObject(String, List<String>), where each String in the List is 
a path. It also happens to automatically "fix" double slashes in those paths, 
so it is no wonder your test passes even without your fix. "Testing" APIs that 
are never called outside of tests is bogus and needs to be fixed, but that's a 
different story...

In reality, in the Sentry HMS plugin, where the full dump is actually 
generated, the following method is called instead: 
addPathsToAuthzObject(String, List<List<String>>), where each element in the 
outer List is a path parsed into a List<String>. And it is parsed incorrectly, 
so each additional slash results in an additional empty path entry.

So, the right way to make the test fail would be to pass your 
"after_double_slash" entry via the official API as
{code}
hmsPaths.addPathsToAuthzObject("db1.tbl11", Arrays.asList(Arrays.asList("user", 
"hive", "warehouse", "db1", "tbl11", "part_duplicate2", "", 
"after_double_slash")));
{code}
Then the test fails with StringIndexOutOfBoundsException as expected. If you 
introduce your fix, then the test still fails, because your assert statement
{code}
Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "after_double_slash"}, false));
{code}
should be replaced with (notice "" path entry):
{code}
Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "", "after_double_slash"}, false));
{code}

I verified it locally, so rest assured your patch works. The reason I'm not 
sure we should "fix" the test code this way is that it is weird, and HDFS path 
handling has to be fixed anyway. Once we fix it, your test would need to be 
fixed as well (perhaps HMSPaths should ignore empty "" entries in the parsed 
List<String>). So, I suggest keeping your patch as-is.

[~spena] [~akolb]

> StringIndexOutOfBoundsException in HMSPathsDumper.java
> --
>
> Key: SENTRY-1993
> URL: https://issues.apache.org/jira/browse/SENTRY-1993
> Project: Sentry
>  Issue Type: Bug
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: SENTRY-1993.01.patch
>
>
> The following line in HMSPathsDumper.java is causing 
> StringIndexOutOfBoundsException:
> {code}
> if (tChildPathElement.charAt(0) == 

[jira] [Comment Edited] (SENTRY-1993) StringIndexOutOfBoundsException in HMSPathsDumper.java

2017-10-18 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209896#comment-16209896
 ] 

Vadim Spector edited comment on SENTRY-1993 at 10/18/17 7:50 PM:
-

[~mi...@cloudera.com], there is a way.

For some weird reason, TestHMSPathsFullDump.java uses the unofficial method 
addPathsToAuthzObject(String, List<String>), where each String in the List is 
a path. It also happens to automatically "fix" double slashes in those paths, 
so it is no wonder your test passes even without your fix. "Testing" APIs that 
are never called outside of tests is bogus and needs to be fixed, but that's a 
different story...

In reality, in the Sentry HMS plugin, where the full dump is actually 
generated, the following method is called instead: 
addPathsToAuthzObject(String, List<List<String>>), where each element in the 
outer List is a path parsed into a List<String>. And it is parsed incorrectly, 
so each additional slash results in an additional empty path entry.

So, the right way to make the test fail would be to pass your 
"after_double_slash" entry via the official API as
{code}
hmsPaths.addPathsToAuthzObject("db1.tbl11", Arrays.asList(Arrays.asList("user", 
"hive", "warehouse", "db1", "tbl11", "part_duplicate2", "", 
"after_double_slash")));
{code}
Then the test fails with StringIndexOutOfBoundsException as expected. If you 
introduce your fix, then the test still fails, because your assert statement
{code}
Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "after_double_slash"}, false));
{code}
should be replaced with (notice "" path entry):
{code}
Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "", "after_double_slash"}, false));
{code}

I verified it locally, so rest assured your patch works. The reason I'm not 
sure we should "fix" the test code this way is that it is weird, and HDFS path 
handling has to be fixed anyway. Once we fix it, your test would need to be 
fixed as well (perhaps HMSPaths should ignore empty "" entries in the parsed 
List<String>). So, I suggest keeping your patch as-is.

[~spena] [~akolb]


was (Author: vspec...@gmail.com):
[~mi...@cloudera.com], there is a way.

For some weird reason, TestHMSPathsFullDump.java uses the unofficial method 
addPathsToAuthzObject(String, List<String>), where each String in the List is 
a path. It also happens to automatically "fix" double slashes in those paths, 
so it is no wonder your test passes even without your fix. "Testing" APIs that 
are never called outside of tests is bogus and needs to be fixed, but that's a 
different story...

In reality, in the Sentry HMS plugin, where the full dump is actually 
generated, the following method is called instead: 
addPathsToAuthzObject(String, List<List<String>>), where each element in the 
outer List is a path parsed into a List<String>. And it is parsed incorrectly, 
so each additional slash results in an additional empty path entry.

So, the right way to make the test fail would be to pass your 
"after_double_slash" entry via the official API as
{code}
hmsPaths.addPathsToAuthzObject("db1.tbl11", Arrays.asList(Arrays.asList("user", 
"hive", "warehouse", "db1", "tbl11", "part_duplicate2", "", 
"after_double_slash")));
{code}
Then the test fails with StringIndexOutOfBoundsException as expected. If you 
introduce your fix, then the test still fails, because your assert statement
{code}
Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "after_double_slash"}, false));
{code}
should be replaced with (notice "" path entry):
{code}
Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "", "after_double_slash"}, false));
{code}

I verified it locally, so rest assured your patch works. The reason I'm not 
sure we should "fix" the test code this way is that it is weird, and HDFS path 
handling has to be fixed anyway. Once we fix it, your test would need to be 
fixed as well (perhaps HMSPaths should ignore empty "" entries in the parsed 
List<String>). So, I suggest keeping your patch as-is.

> StringIndexOutOfBoundsException in HMSPathsDumper.java
> --
>
> Key: SENTRY-1993
> URL: https://issues.apache.org/jira/browse/SENTRY-1993
> Project: Sentry
>  Issue Type: Bug
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: SENTRY-1993.01.patch
>
>
> The following line in HMSPathsDumper.java is causing 
> StringIndexOutOfBoundsException:
> {code}
> if (tChildPathElement.charAt(0) == DupDetector.REPLACEMENT_STRING_PREFIX) {
> {code}
> It only happens 

[jira] [Comment Edited] (SENTRY-1993) StringIndexOutOfBoundsException in HMSPathsDumper.java

2017-10-18 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209896#comment-16209896
 ] 

Vadim Spector edited comment on SENTRY-1993 at 10/18/17 7:48 PM:
-

[~mi...@cloudera.com], there is a way.

For some weird reason, TestHMSPathsFullDump.java uses the unofficial method 
addPathsToAuthzObject(String, List<String>), where each String in the List is 
a path. It also happens to automatically "fix" double slashes in those paths, 
so it is no wonder your test passes even without your fix. "Testing" APIs that 
are never called outside of tests is bogus and needs to be fixed, but that's a 
different story...

In reality, in the Sentry HMS plugin, where the full dump is actually 
generated, the following method is called instead: 
addPathsToAuthzObject(String, List<List<String>>), where each element in the 
outer List is a path parsed into a List<String>. And it is parsed incorrectly, 
so each additional slash results in an additional empty path entry.

So, the right way to make the test fail would be to pass your 
"after_double_slash" entry via the official API as
{code}
hmsPaths.addPathsToAuthzObject("db1.tbl11", Arrays.asList(Arrays.asList("user", 
"hive", "warehouse", "db1", "tbl11", "part_duplicate2", "", 
"after_double_slash")));
{code}
Then the test fails with StringIndexOutOfBoundsException as expected. If you 
introduce your fix, then the test still fails, because your assert statement
{code}
Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "after_double_slash"}, false));
{code}
should be replaced with (notice "" path entry):
{code}
Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "", "after_double_slash"}, false));
{code}

I verified it locally, so rest assured your patch works. The reason I'm not 
sure we should "fix" the test code this way is that it is weird, and HDFS path 
handling has to be fixed anyway. Once we fix it, your test would need to be 
fixed as well (perhaps HMSPaths should ignore empty "" entries in the parsed 
List<String>). So, I suggest keeping your patch as-is.


was (Author: vspec...@gmail.com):
[~mi...@cloudera.com], there is a way.

For some weird reason, TestHMSPathsFullDump.java uses the unofficial method 
_addPathsToAuthzObject(String, List<String>), where each String in the List is 
a path. It also happens to automatically "fix" double slashes in those paths, 
so it is no wonder your test passes even without your fix. "Testing" APIs that 
are never called outside of tests is bogus and needs to be fixed, but that's a 
different story...

In reality, in the Sentry HMS plugin, where the full dump is actually 
generated, the following method is called instead: 
addPathsToAuthzObject(String, List<List<String>>), where each element in the 
outer List is a path parsed into a List<String>. And it is parsed incorrectly, 
so each additional slash results in an additional empty path entry.

So, the right way to make the test fail would be to pass your 
"after_double_slash" entry via the official API as

_hmsPaths.addPathsToAuthzObject("db1.tbl11", 
Arrays.asList(Arrays.asList("user", "hive", "warehouse", "db1", "tbl11", 
"part_duplicate2", "", "after_double_slash")));_

Then the test fails with StringIndexOutOfBoundsException as expected. If you 
introduce your fix, then the test still fails, because your assert statement

_Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "after_double_slash"}, false));_

should be replaced with (notice "" path entry):

_Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "", "after_double_slash"}, false));_

I verified it locally, so rest assured your patch works. The reason I'm not 
sure we should "fix" the test code this way is that it is weird, and HDFS path 
handling has to be fixed anyway. Once we fix it, your test would need to be 
fixed as well (perhaps HMSPaths should ignore empty "" entries in the parsed 
List<String>). So, I suggest keeping your patch as-is.

> StringIndexOutOfBoundsException in HMSPathsDumper.java
> --
>
> Key: SENTRY-1993
> URL: https://issues.apache.org/jira/browse/SENTRY-1993
> Project: Sentry
>  Issue Type: Bug
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: SENTRY-1993.01.patch
>
>
> The following line in HMSPathsDumper.java is causing 
> StringIndexOutOfBoundsException:
> {code}
> if (tChildPathElement.charAt(0) == DupDetector.REPLACEMENT_STRING_PREFIX) {
> {code}
> It only happens when a path element is "", i.e. when someone mistakenly 

[jira] [Comment Edited] (SENTRY-1993) StringIndexOutOfBoundsException in HMSPathsDumper.java

2017-10-18 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209896#comment-16209896
 ] 

Vadim Spector edited comment on SENTRY-1993 at 10/18/17 7:46 PM:
-

[~mi...@cloudera.com], there is a way.

For some weird reason, TestHMSPathsFullDump.java uses the unofficial method 
_addPathsToAuthzObject(String, List<String>), where each String in the List is 
a path. It also happens to automatically "fix" double slashes in those paths, 
so it is no wonder your test passes even without your fix. "Testing" APIs that 
are never called outside of tests is bogus and needs to be fixed, but that's a 
different story...

In reality, in the Sentry HMS plugin, where the full dump is actually 
generated, the following method is called instead: 
addPathsToAuthzObject(String, List<List<String>>), where each element in the 
outer List is a path parsed into a List<String>. And it is parsed incorrectly, 
so each additional slash results in an additional empty path entry.

So, the right way to make the test fail would be to pass your 
"after_double_slash" entry via the official API as

_hmsPaths.addPathsToAuthzObject("db1.tbl11", 
Arrays.asList(Arrays.asList("user", "hive", "warehouse", "db1", "tbl11", 
"part_duplicate2", "", "after_double_slash")));_

Then the test fails with StringIndexOutOfBoundsException as expected. If you 
introduce your fix, then the test still fails, because your assert statement

_Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "after_double_slash"}, false));_

should be replaced with (notice "" path entry):

_Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "", "after_double_slash"}, false));_

I verified it locally, so rest assured your patch works. The reason I'm not 
sure we should "fix" the test code this way is that it is weird, and HDFS path 
handling has to be fixed anyway. Once we fix it, your test would need to be 
fixed as well (perhaps HMSPaths should ignore empty "" entries in the parsed 
List<String>). So, I suggest keeping your patch as-is.


was (Author: vspec...@gmail.com):
[~mi...@cloudera.com], there is a way.

For some weird reason, TestHMSPathsFullDump.java uses the unofficial method 
_addPathsToAuthzObject(String, List<String>), where each String in the List is 
a path. It also happens to automatically "fix" double slashes in those paths, 
so it is no wonder your test passes even without your fix. "Testing" APIs that 
are never called outside of tests is bogus and needs to be fixed, but that's a 
different story...

In reality, in the Sentry HMS plugin, where the full dump is actually 
generated, the following method is called instead: 
addPathsToAuthzObject(String, List<List<String>>), where each element in the 
outer List is a path parsed into a List<String>. And it is parsed incorrectly, 
so each additional slash results in an additional empty path entry.

So, the right way to make the test fail would be to pass your 
"after_double_slash" entry via the official API as

_hmsPaths.addPathsToAuthzObject("db1.tbl11", 
Arrays.asList(Arrays.asList("user", "hive", "warehouse", "db1", "tbl11", 
"part_duplicate2", "", "after_double_slash")));_

Then the test fails with StringIndexOutOfBoundsException as expected. If you 
introduce your fix, then the test still fails, because your assert statement

_Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "after_double_slash"}, false));_

should be replaced with (notice "" path entry):

_Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "", "after_double_slash"}, false));_

I verified it locally, so rest assured your patch works. The reason I'm not 
sure we should "fix" the test code this way is that it is weird, and HDFS path 
handling has to be fixed anyway. Once we fix it, your test would need to be 
fixed as well (perhaps HMSPaths should ignore empty "" entries in the parsed 
List<String>). So, I suggest keeping your patch as-is.

> StringIndexOutOfBoundsException in HMSPathsDumper.java
> --
>
> Key: SENTRY-1993
> URL: https://issues.apache.org/jira/browse/SENTRY-1993
> Project: Sentry
>  Issue Type: Bug
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: SENTRY-1993.01.patch
>
>
> The following line in HMSPathsDumper.java is causing 
> StringIndexOutOfBoundsException:
> {code}
> if (tChildPathElement.charAt(0) == DupDetector.REPLACEMENT_STRING_PREFIX) {
> {code}
> It only happens when a path element is "", i.e. when someone mistakenly 
> specifies an HDFS path with 

[jira] [Comment Edited] (SENTRY-1993) StringIndexOutOfBoundsException in HMSPathsDumper.java

2017-10-18 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209896#comment-16209896
 ] 

Vadim Spector edited comment on SENTRY-1993 at 10/18/17 7:45 PM:
-

[~mi...@cloudera.com], there is a way.

For some weird reason, TestHMSPathsFullDump.java uses the unofficial method 
_addPathsToAuthzObject(String, List<String>), where each String in the List is 
a path. It also happens to automatically "fix" double slashes in those paths, 
so it is no wonder your test passes even without your fix. "Testing" APIs that 
are never called outside of tests is bogus and needs to be fixed, but that's a 
different story...

In reality, in the Sentry HMS plugin, where the full dump is actually 
generated, the following method is called instead: 
addPathsToAuthzObject(String, List<List<String>>), where each element in the 
outer List is a path parsed into a List<String>. And it is parsed incorrectly, 
so each additional slash results in an additional empty path entry.

So, the right way to make the test fail would be to pass your 
"after_double_slash" entry via the official API as

_hmsPaths.addPathsToAuthzObject("db1.tbl11", 
Arrays.asList(Arrays.asList("user", "hive", "warehouse", "db1", "tbl11", 
"part_duplicate2", "", "after_double_slash")));_

Then the test fails with StringIndexOutOfBoundsException as expected. If you 
introduce your fix, then the test still fails, because your assert statement

_Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "after_double_slash"}, false));_

should be replaced with (notice "" path entry):

_Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "", "after_double_slash"}, false));_

I verified it locally, so rest assured your patch works. The reason I'm not 
sure we should "fix" the test code this way is that it is weird, and HDFS path 
handling has to be fixed anyway. Once we fix it, your test would need to be 
fixed as well (perhaps HMSPaths should ignore empty "" entries in the parsed 
List<String>). So, I suggest keeping your patch as-is.


was (Author: vspec...@gmail.com):
[~mi...@cloudera.com], there is a way.

For some weird reason, TestHMSPathsFullDump.java uses the unofficial method 
_addPathsToAuthzObject(String, List<String>), where each String in the List is 
a path. It also happens to automatically "fix" double slashes in those paths, 
so it is no wonder your test passes even without your fix. "Testing" APIs that 
are never called outside of tests is bogus and needs to be fixed, but that's a 
different story...

In reality, in the Sentry HMS plugin, where the full dump is actually 
generated, the following method is called instead: 
addPathsToAuthzObject(String, List<List<String>>), where each element in the 
outer List is a path parsed into a List<String>. And it is parsed incorrectly, 
so each additional slash results in an additional empty path entry.

So, the right way to make the test fail would be to pass your 
"after_double_slash" entry via the official API as

_hmsPaths.addPathsToAuthzObject("db1.tbl11", 
Arrays.asList(Arrays.asList("user", "hive", "warehouse", "db1", "tbl11", 
"part_duplicate2", "", "after_double_slash")));_

Then the test fails with StringIndexOutOfBoundsException as expected. If you 
introduce your fix, then the test still fails, because your assert statement

_Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "after_double_slash"}, false));_

should be replaced with the following (note the "" path entry):

_Assert.assertEquals(new HashSet(Arrays.asList("db1.tbl11")), 
hmsPaths2.findAuthzObject(new String[]{"user", "hive", "warehouse", "db1", 
"tbl11", "part_duplicate2", "", "after_double_slash"}, false));_

I verified it locally, so rest assured your patch works. The reason I'm not
sure we should "fix" the test code this way is that the result is weird, and
HDFS path handling itself has to be fixed. Once we fix it, your test would need
to be fixed as well (perhaps HMSPaths should ignore empty "" entries in the
List). So I suggest keeping your patch as-is.

> StringIndexOutOfBoundsException in HMSPathsDumper.java
> --
>
> Key: SENTRY-1993
> URL: https://issues.apache.org/jira/browse/SENTRY-1993
> Project: Sentry
>  Issue Type: Bug
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: SENTRY-1993.01.patch
>
>
> The following line in HMSPathsDumper.java is causing 
> StringIndexOutOfBoundsException:
> {code}
> if (tChildPathElement.charAt(0) == DupDetector.REPLACEMENT_STRING_PREFIX) {
> {code}
> It only happens when a path element is "", when someone mistakenly specifies 
> hdfs path with 

[jira] [Commented] (SENTRY-1993) StringIndexOutOfBoundsException in HMSPathsDumper.java

2017-10-18 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-1993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209686#comment-16209686
 ] 

Vadim Spector commented on SENTRY-1993:
---

[~spena], no, I do not see how they could be related to the patch.

> StringIndexOutOfBoundsException in HMSPathsDumper.java
> --
>
> Key: SENTRY-1993
> URL: https://issues.apache.org/jira/browse/SENTRY-1993
> Project: Sentry
>  Issue Type: Bug
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: SENTRY-1993.01.patch
>
>
> The following line in HMSPathsDumper.java is causing 
> StringIndexOutOfBoundsException:
> {code}
> if (tChildPathElement.charAt(0) == DupDetector.REPLACEMENT_STRING_PREFIX) {
> {code}
> It only happens when a path element is "", i.e. when someone mistakenly
> specifies an HDFS path with two "/" in the path section, like
> hdfs://server//element1//element2, instead of
> hdfs://server/element1/element2. In principle, such paths are invalid, but
> this code should be made resistant to them anyway.
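One way to make the quoted check resistant to "" elements is to test for emptiness before calling `charAt(0)`. This is a minimal sketch, not the contents of SENTRY-1993.01.patch; `PathElementGuard` is a hypothetical name, and the prefix character is a hypothetical stand-in because the real value of `DupDetector.REPLACEMENT_STRING_PREFIX` is not shown in the issue.

```java
class PathElementGuard {
  // Hypothetical stand-in for DupDetector.REPLACEMENT_STRING_PREFIX;
  // the real value is not given in the issue.
  static final char REPLACEMENT_STRING_PREFIX = '~';

  // charAt(0) on "" throws StringIndexOutOfBoundsException, so check for
  // emptiness first; && short-circuits, so charAt is never reached for "".
  static boolean startsWithReplacementPrefix(String element) {
    return !element.isEmpty() && element.charAt(0) == REPLACEMENT_STRING_PREFIX;
  }

  public static void main(String[] args) {
    System.out.println(startsWithReplacementPrefix(""));          // false, no exception
    System.out.println(startsWithReplacementPrefix("element1"));  // false
    System.out.println(startsWithReplacementPrefix(REPLACEMENT_STRING_PREFIX + "dup")); // true
  }
}
```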



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SENTRY-2008) Sentry needs to consistently handle HDFS paths, canonicalizing them when needed

2017-10-17 Thread Vadim Spector (JIRA)
Vadim Spector created SENTRY-2008:
-

 Summary: Sentry needs to consistently handle HDFS paths, 
canonicalizing them when needed
 Key: SENTRY-2008
 URL: https://issues.apache.org/jira/browse/SENTRY-2008
 Project: Sentry
  Issue Type: Bug
Reporter: Vadim Spector


Sentry HDFS sync depends on a specific representation of HDFS paths. However,
in the UNIX (and HDFS) world there is more than one way to represent a path, so
plain string comparison does not work. One very common case is duplicate
slashes in a path, which should be treated as a single slash. We need to trace
how HDFS paths are handled and where canonicalization is needed.
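A minimal sketch of one possible canonicalization step, assuming that collapsing repeated slashes and dropping a trailing slash is all that is required (a full solution may also need to handle "." and ".." segments). `PathCanonicalizer` is a hypothetical name, not existing Sentry code; it leaves the scheme and authority (e.g. `hdfs://server`) untouched and only rewrites the path part.

```java
class PathCanonicalizer {
  // Hypothetical helper: collapses "//" runs in the path part of a URI-like
  // string and removes a trailing "/" (except for the root path "/").
  static String canonicalize(String path) {
    String prefix = "";
    String rest = path;
    int schemeSep = path.indexOf("://");
    if (schemeSep >= 0) {
      int pathStart = path.indexOf('/', schemeSep + 3);
      if (pathStart < 0) {
        return path; // no path part, e.g. "hdfs://server"
      }
      prefix = path.substring(0, pathStart); // scheme + authority
      rest = path.substring(pathStart);
    }
    String collapsed = rest.replaceAll("/{2,}", "/");
    if (collapsed.length() > 1 && collapsed.endsWith("/")) {
      collapsed = collapsed.substring(0, collapsed.length() - 1);
    }
    return prefix + collapsed;
  }

  public static void main(String[] args) {
    System.out.println(canonicalize("hdfs://server//element1//element2")); // hdfs://server/element1/element2
    System.out.println(canonicalize("/user//hive/warehouse/"));            // /user/hive/warehouse
  }
}
```

Canonicalizing once, where paths enter Sentry, would let later code keep using plain string comparison.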





[jira] [Resolved] (SENTRY-1976) Improve logging in MetastoreCacheInitializer.java

2017-10-09 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector resolved SENTRY-1976.
---
Resolution: Won't Fix

> Improve logging in MetastoreCacheInitializer.java
> -
>
> Key: SENTRY-1976
> URL: https://issues.apache.org/jira/browse/SENTRY-1976
> Project: Sentry
>  Issue Type: Improvement
>  Components: Sentry
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>
> MetastoreCacheInitializer has unsatisfactory logging. In case of HDFS sync 
> problems we cannot affirmatively say whether missing mappings between 
> Sentry-managed objects and HDFS paths were ever passed from HMS.





[jira] [Created] (SENTRY-1976) Improve logging in MetastoreCacheInitializer.java

2017-10-09 Thread Vadim Spector (JIRA)
Vadim Spector created SENTRY-1976:
-

 Summary: Improve logging in MetastoreCacheInitializer.java
 Key: SENTRY-1976
 URL: https://issues.apache.org/jira/browse/SENTRY-1976
 Project: Sentry
  Issue Type: Improvement
Reporter: Vadim Spector
Assignee: Vadim Spector


MetastoreCacheInitializer has unsatisfactory logging. In case of HDFS sync 
problems we cannot affirmatively say whether missing mappings between 
Sentry-managed objects and HDFS paths were ever passed from HMS.





[jira] [Updated] (SENTRY-1976) Improve logging in MetastoreCacheInitializer.java

2017-10-09 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1976:
--
Component/s: Sentry

> Improve logging in MetastoreCacheInitializer.java
> -
>
> Key: SENTRY-1976
> URL: https://issues.apache.org/jira/browse/SENTRY-1976
> Project: Sentry
>  Issue Type: Improvement
>  Components: Sentry
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>
> MetastoreCacheInitializer has unsatisfactory logging. In case of HDFS sync 
> problems we cannot affirmatively say whether missing mappings between 
> Sentry-managed objects and HDFS paths were ever passed from HMS.





[jira] [Updated] (SENTRY-1966) Improve logging of HMS sync data (paths and permissions) flowing from Sentry to NameNode

2017-10-03 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1966:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Improve logging of HMS sync data (paths and permissions) flowing from Sentry 
> to NameNode
> 
>
> Key: SENTRY-1966
> URL: https://issues.apache.org/jira/browse/SENTRY-1966
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-1966.01.patch, SENTRY-1966.02.patch, 
> SENTRY-1966.03.patch
>
>
> Logging of HDFS sync data pulled by NN from Sentry is virtually non-existent, 
> which makes troubleshooting of HDFS sync issues highly complicated or 
> virtually impossible.
> The scope of this JIRA is limited to improving logging on NN side.





[jira] [Updated] (SENTRY-1966) Improve logging of HMS sync data (paths and permissions) flowing from Sentry to NameNode

2017-10-02 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1966:
--
Attachment: SENTRY-1966.03.patch

> Improve logging of HMS sync data (paths and permissions) flowing from Sentry 
> to NameNode
> 
>
> Key: SENTRY-1966
> URL: https://issues.apache.org/jira/browse/SENTRY-1966
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-1966.01.patch, SENTRY-1966.02.patch, 
> SENTRY-1966.03.patch
>
>
> Logging of HDFS sync data pulled by NN from Sentry is virtually non-existent, 
> which makes troubleshooting of HDFS sync issues highly complicated or 
> virtually impossible.
> The scope of this JIRA is limited to improving logging on NN side.





[jira] [Updated] (SENTRY-1966) Improve logging of HMS sync data (paths and permissions) flowing from Sentry to NameNode

2017-10-01 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1966:
--
Attachment: SENTRY-1966.02.patch

> Improve logging of HMS sync data (paths and permissions) flowing from Sentry 
> to NameNode
> 
>
> Key: SENTRY-1966
> URL: https://issues.apache.org/jira/browse/SENTRY-1966
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-1966.01.patch, SENTRY-1966.02.patch
>
>
> Logging of HDFS sync data pulled by NN from Sentry is virtually non-existent, 
> which makes troubleshooting of HDFS sync issues highly complicated or 
> virtually impossible.
> The scope of this JIRA is limited to improving logging on NN side.





[jira] [Updated] (SENTRY-1966) Improve logging of HMS sync data (paths and permissions) flowing from Sentry to NameNode

2017-09-29 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1966:
--
Description: 
Logging of HDFS sync data pulled by NN from Sentry is virtually non-existent, 
which makes troubleshooting of HDFS sync issues highly complicated or virtually 
impossible.

The scope of this JIRA is limited to improving logging on NN side.

  was:
Logging of HDFS sync data pulled by NN from Sentry is virtually non-existent, 
which makes troubleshooting of HDFS sync issues highly complicated or virtually 
impossible.

Teh scope of this JIRA is limited to improving logging on NN side.


> Improve logging of HMS sync data (paths and permissions) flowing from Sentry 
> to NameNode
> 
>
> Key: SENTRY-1966
> URL: https://issues.apache.org/jira/browse/SENTRY-1966
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-1966.01.patch
>
>
> Logging of HDFS sync data pulled by NN from Sentry is virtually non-existent, 
> which makes troubleshooting of HDFS sync issues highly complicated or 
> virtually impossible.
> The scope of this JIRA is limited to improving logging on NN side.





[jira] [Updated] (SENTRY-1966) Improve logging of HMS sync data (paths and permissions) flowing from Sentry to NameNode

2017-09-29 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1966:
--
Status: Patch Available  (was: Open)

> Improve logging of HMS sync data (paths and permissions) flowing from Sentry 
> to NameNode
> 
>
> Key: SENTRY-1966
> URL: https://issues.apache.org/jira/browse/SENTRY-1966
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-1966.01.patch
>
>
> Logging of HDFS sync data pulled by NN from Sentry is virtually non-existent, 
> which makes troubleshooting of HDFS sync issues highly complicated or 
> virtually impossible.
> The scope of this JIRA is limited to improving logging on NN side.





[jira] [Updated] (SENTRY-1966) Improve logging of HMS sync data (paths and permissions) flowing from Sentry to NameNode

2017-09-29 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1966:
--
Attachment: SENTRY-1966.01.patch

> Improve logging of HMS sync data (paths and permissions) flowing from Sentry 
> to NameNode
> 
>
> Key: SENTRY-1966
> URL: https://issues.apache.org/jira/browse/SENTRY-1966
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-1966.01.patch
>
>
> Logging of HDFS sync data pulled by NN from Sentry is virtually non-existent, 
> which makes troubleshooting of HDFS sync issues highly complicated or 
> virtually impossible.
> The scope of this JIRA is limited to improving logging on NN side.





[jira] [Updated] (SENTRY-1966) Improve logging of HMS sync data (paths and permissions) flowing from Sentry to NameNode

2017-09-29 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1966:
--
Attachment: (was: SENTRY-1966.01.patch)

> Improve logging of HMS sync data (paths and permissions) flowing from Sentry 
> to NameNode
> 
>
> Key: SENTRY-1966
> URL: https://issues.apache.org/jira/browse/SENTRY-1966
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>
> Logging of HDFS sync data pulled by NN from Sentry is virtually non-existent, 
> which makes troubleshooting of HDFS sync issues highly complicated or 
> virtually impossible.
> The scope of this JIRA is limited to improving logging on NN side.





[jira] [Updated] (SENTRY-1966) Improve logging of HMS sync data (paths and permissions) flowing from Sentry to NameNode

2017-09-29 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1966:
--
Status: Open  (was: Patch Available)

> Improve logging of HMS sync data (paths and permissions) flowing from Sentry 
> to NameNode
> 
>
> Key: SENTRY-1966
> URL: https://issues.apache.org/jira/browse/SENTRY-1966
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-1966.01.patch
>
>
> Logging of HDFS sync data pulled by NN from Sentry is virtually non-existent, 
> which makes troubleshooting of HDFS sync issues highly complicated or 
> virtually impossible.
> The scope of this JIRA is limited to improving logging on NN side.





[jira] [Updated] (SENTRY-1966) Improve logging of HMS sync data (paths and permissions) flowing from Sentry to NameNode

2017-09-29 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1966:
--
Status: Patch Available  (was: Open)

> Improve logging of HMS sync data (paths and permissions) flowing from Sentry 
> to NameNode
> 
>
> Key: SENTRY-1966
> URL: https://issues.apache.org/jira/browse/SENTRY-1966
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-1966.01.patch
>
>
> Logging of HDFS sync data pulled by NN from Sentry is virtually non-existent, 
> which makes troubleshooting of HDFS sync issues highly complicated or 
> virtually impossible.
> The scope of this JIRA is limited to improving logging on NN side.





[jira] [Updated] (SENTRY-1966) Improve logging of HMS sync data (paths and permissions) flowing from Sentry to NameNode

2017-09-29 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1966:
--
Attachment: SENTRY-1966.01.patch

> Improve logging of HMS sync data (paths and permissions) flowing from Sentry 
> to NameNode
> 
>
> Key: SENTRY-1966
> URL: https://issues.apache.org/jira/browse/SENTRY-1966
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-1966.01.patch
>
>
> Logging of HDFS sync data pulled by NN from Sentry is virtually non-existent, 
> which makes troubleshooting of HDFS sync issues highly complicated or 
> virtually impossible.
> The scope of this JIRA is limited to improving logging on NN side.





[jira] [Updated] (SENTRY-1966) Improve logging of HMS sync data (paths and permissions) flowing from Sentry to NameNode

2017-09-29 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1966:
--
Description: 
Logging of HDFS sync data pulled by NN from Sentry is virtually non-existent, 
which makes troubleshooting of HDFS sync issues highly complicated or virtually 
impossible.

The scope of this JIRA is limited to improving logging on NN side.

  was:Logging of HDFS sync data pulled by NN from Sentry is virtually 
non-existent, which makes troubleshooting of HDFS sync issues highly 
complicated or virtually impossible.


> Improve logging of HMS sync data (paths and permissions) flowing from Sentry 
> to NameNode
> 
>
> Key: SENTRY-1966
> URL: https://issues.apache.org/jira/browse/SENTRY-1966
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>
> Logging of HDFS sync data pulled by NN from Sentry is virtually non-existent, 
> which makes troubleshooting of HDFS sync issues highly complicated or 
> virtually impossible.
> The scope of this JIRA is limited to improving logging on NN side.





[jira] [Commented] (SENTRY-1712) Add capability to force Sentry to send full snapshot to HDFS

2017-09-01 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151112#comment-16151112
 ] 

Vadim Spector commented on SENTRY-1712:
---

Resuming work on it.

> Add capability to force Sentry to send full snapshot to HDFS
> 
>
> Key: SENTRY-1712
> URL: https://issues.apache.org/jira/browse/SENTRY-1712
> Project: Sentry
>  Issue Type: Bug
>  Components: Sentry
>Affects Versions: 2.0.0
>Reporter: Na Li
>Assignee: Vadim Spector
>Priority: Minor
> Fix For: 2.0.0
>
>
> Right now, there is no way to manually force Sentry to get a full snapshot.
> We need to add this capability; it would give us more power to handle error
> conditions and would enable testing.





[jira] [Assigned] (SENTRY-1667) Switching to Jetty v9 library

2017-09-01 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector reassigned SENTRY-1667:
-

Assignee: (was: Vadim Spector)

> Switching to Jetty v9 library
> -
>
> Key: SENTRY-1667
> URL: https://issues.apache.org/jira/browse/SENTRY-1667
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
> Attachments: SENTRY-1667.001.patch
>
>
> Switching the version of the Jetty library from 8.1.19.v20160209 to
> 9.3.8.v20160314





[jira] [Commented] (SENTRY-1667) Switching to Jetty v9 library

2017-09-01 Thread Vadim Spector (JIRA)

[ 
https://issues.apache.org/jira/browse/SENTRY-1667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16151098#comment-16151098
 ] 

Vadim Spector commented on SENTRY-1667:
---

[~spena], no, it was a short involvement just to enable the build with Jetty
v9; I'm no longer on it.

> Switching to Jetty v9 library
> -
>
> Key: SENTRY-1667
> URL: https://issues.apache.org/jira/browse/SENTRY-1667
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
>Assignee: Vadim Spector
> Attachments: SENTRY-1667.001.patch
>
>
> Switching the version of the Jetty library from 8.1.19.v20160209 to
> 9.3.8.v20160314





[jira] [Assigned] (SENTRY-1891) SentryPlugin triggers full update due to concurrency bug

2017-08-18 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector reassigned SENTRY-1891:
-

Assignee: Vadim Spector

> SentryPlugin triggers full update due to concurrency bug
> 
>
> Key: SENTRY-1891
> URL: https://issues.apache.org/jira/browse/SENTRY-1891
> Project: Sentry
>  Issue Type: Bug
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>
> Under high load, the Sentry server can trigger a full update to the NameNode
> for no good reason, due to a bug in concurrency handling. A full update is
> always done at initialization time; for a large number of permissions and/or
> HDFS paths it takes a long time and temporarily but significantly increases
> heap size. During normal operations it is performed only if the Sentry
> server decides that there are gaps in the partial updates' sequence numbers,
> in which case it restores the entire snapshot. A full update during normal
> operations is a highly disruptive event, leading to a huge increase in heap
> size, system slowdown, and even crashes. Below is an explanation of the
> problem:
> 
> *SentryPlugin.java* has multiple permission update methods, such as
> _onDropSentryRole()_, _onDropSentryPrivilege()_,
> _onAlterSentryRoleDeleteGroups()_, etc., which can of course be called
> concurrently. They all delegate work to the
> _permsUpdater.handleUpdateNotification(update)_ call, which in turn
> serializes those requests by turning them into Runnable tasks and
> submitting them to a single-threaded thread pool. Each job ends up
> appending an update to the updates list. So far so good: all updates are
> serialized.
> However, each of those methods first creates a _PermissionsUpdate_ object
> with an auto-incremented sequence number, and only then passes this object
> to _permsUpdater.handleUpdateNotification(update)_. Here is a typical code
> snippet:
> {code}
> PermissionsUpdate update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
> // what if another permission update thread preempts this one right here
> // and starts and finishes the whole method ???
> update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName, privilege.getAction().toUpperCase());
> // ... or here ???
> permsUpdater.handleUpdateNotification(update);
> LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");
> {code}
> The problem is that assigning a sequence number to the _PermissionsUpdate_
> object and appending this object to the updates list (by calling
> _permsUpdater.handleUpdateNotification(update)_) must be an atomic pair of
> operations, and they are not. In a multi-threaded environment, sooner or
> later we end up appending updates to the updates list with out-of-order
> sequence numbers.
> The problem is not that the updates may be appended to the updates list out
> of their arrival order; we never guaranteed any particular order for
> concurrent requests coming from different clients. The problem is that they
> are appended to the list with their sequence numbers out of order, e.g.
> #101, #100 instead of #100, #101. The handleUpdateNotification() method then
> contains the following line:
> {code}
> final boolean editNotMissed = lastSeenSeqNum.incrementAndGet() == update.getSeqNum();
> {code}
> Suppose lastSeenSeqNum was 99 and the next appended update is #101. Then,
> obviously, 99+1 != 101, so editNotMissed is set to false. This triggers the
> full update logic a few lines later:
> {code}
> if (imageRetreiver != null) {
>   try {
>     toUpdate = imageRetreiver.retrieveFullImage(update.getSeqNum());
>   } catch (Exception e) {
>     LOGGER.warn("failed to retrieve full image: ", e);
>   }
>   updateable = updateable.updateFull(toUpdate);
> }
> {code}
> Since imageRetreiver != null for the permissions updater, all hell breaks
> loose.
> It can be fixed in _SentryPlugin.java_ for all permission update methods as
> follows:
> {code}
> PermissionsUpdate update;
> synchronized (permsUpdateLock) {
>   // now sequence number increment and adding update to the list are performed atomically
>   update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
>   update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName, privilege.getAction().toUpperCase());
>   permsUpdater.handleUpdateNotification(update);
> }
> LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");
> {code}
> _handleUpdateNotification()_ is always fast because it submits the real work
> to a thread pool; it does not block, so introducing the additional
> synchronization should not affect concurrency.
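The race and the proposed lock can be demonstrated with a small, self-contained model. This is a hypothetical simplification, not Sentry code: `SeqNumOrdering`, `racyUpdate`, `safeUpdate` and `append` are invented names, an `AtomicLong` plays the role of `permSeqNum`, a single-threaded executor plays the role of the update forwarder's thread pool, and the "gap" counter stands in for the point where a full image would be retrieved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, heavily simplified model of SentryPlugin + the update forwarder.
class SeqNumOrdering {
  private final AtomicLong seqNum = new AtomicLong(); // role of permSeqNum
  private final Object updateLock = new Object();     // role of permsUpdateLock
  // Single worker thread: tasks run one at a time, in submission order.
  private final ExecutorService updater = Executors.newSingleThreadExecutor();
  private long lastSeen;             // touched only by the updater thread
  private volatile int gapsDetected; // written only by the updater thread

  // Runs on the updater thread; mirrors the editNotMissed check.
  private void append(long s) {
    boolean editNotMissed = ++lastSeen == s;
    if (!editNotMissed) {
      gapsDetected++; // in Sentry, this is where a full image would be fetched
      lastSeen = s;   // resynchronize and carry on
    }
  }

  // Racy: another thread can take a higher number and submit it first.
  void racyUpdate() {
    long s = seqNum.incrementAndGet();
    updater.submit(() -> append(s));
  }

  // Fixed: numbering and submission happen as one atomic step.
  void safeUpdate() {
    synchronized (updateLock) {
      long s = seqNum.incrementAndGet();
      updater.submit(() -> append(s));
    }
  }

  int run(boolean safe, int threads, int perThread) throws InterruptedException {
    List<Thread> workers = new ArrayList<>();
    for (int t = 0; t < threads; t++) {
      Thread w = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          if (safe) {
            safeUpdate();
          } else {
            racyUpdate();
          }
        }
      });
      workers.add(w);
      w.start();
    }
    for (Thread w : workers) {
      w.join();
    }
    updater.shutdown();
    updater.awaitTermination(30, TimeUnit.SECONDS);
    return gapsDetected;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("racy: " + new SeqNumOrdering().run(false, 8, 10_000) + " gaps");
    System.out.println("safe: " + new SeqNumOrdering().run(true, 8, 10_000) + " gaps");
  }
}
```

The racy count varies from run to run and may occasionally be zero; the locked variant should always report zero, because the single-threaded executor runs tasks in the order they were submitted, and under the lock that is also the order in which numbers were assigned.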




[jira] [Updated] (SENTRY-1891) SentryPlugin triggers full update due to concurrency bug

2017-08-18 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1891:
--
Description: 
Under high load, the Sentry server can trigger a full update to the NameNode for
no good reason, due to a bug in concurrency handling. A full update is always
done at initialization time; for a large number of permissions and/or HDFS
paths it takes a long time and temporarily but significantly increases heap
size. During normal operations it is performed only if the Sentry server
decides that there are gaps in the partial updates' sequence numbers, in which
case it restores the entire snapshot. A full update during normal operations is
a highly disruptive event, leading to a huge increase in heap size, system
slowdown, and even crashes. Below is an explanation of the problem:

*SentryPlugin.java* has multiple permission update methods, such as
_onDropSentryRole()_, _onDropSentryPrivilege()_,
_onAlterSentryRoleDeleteGroups()_, etc., which can of course be called
concurrently. They all delegate work to the
_permsUpdater.handleUpdateNotification(update)_ call, which in turn serializes
those requests by turning them into Runnable tasks and submitting them to a
single-threaded thread pool. Each job ends up appending an update to the
updates list. So far so good: all updates are serialized.

However, each of those methods first creates a _PermissionsUpdate_ object with
an auto-incremented sequence number, and only then passes this object to
_permsUpdater.handleUpdateNotification(update)_. Here is a typical code snippet:

{code}
PermissionsUpdate update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
// what if another permission update thread preempts this one right here
// and starts and finishes the whole method ???
update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName, privilege.getAction().toUpperCase());
// ... or here ???
permsUpdater.handleUpdateNotification(update);
LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");
{code}

The problem is that assigning a sequence number to the _PermissionsUpdate_
object and appending this object to the updates list (by calling
_permsUpdater.handleUpdateNotification(update)_) must be an atomic pair of
operations, and they are not. In a multi-threaded environment, sooner or later
we end up appending updates to the updates list with out-of-order sequence
numbers.

The problem is not that the updates may be appended to the updates list out of
their arrival order; we never guaranteed any particular order for concurrent
requests coming from different clients. The problem is that they are appended
to the list with their sequence numbers out of order, e.g. #101, #100 instead
of #100, #101. The handleUpdateNotification() method then contains the
following line:

{code}
final boolean editNotMissed = lastSeenSeqNum.incrementAndGet() == update.getSeqNum();
{code}

Suppose lastSeenSeqNum was 99 and the next appended update is #101. Then,
obviously, 99+1 != 101, so editNotMissed is set to false. This triggers the
full update logic a few lines later:

{code}
if (imageRetreiver != null) {
  try {
    toUpdate = imageRetreiver.retrieveFullImage(update.getSeqNum());
  } catch (Exception e) {
    LOGGER.warn("failed to retrieve full image: ", e);
  }
  updateable = updateable.updateFull(toUpdate);
}
{code}

Since imageRetreiver != null for the permissions updater, all hell breaks loose.

It can be fixed in _SentryPlugin.java_ for all permission update methods as
follows:

{code}
PermissionsUpdate update;
synchronized (permsUpdateLock) {
  // now sequence number increment and adding update to the list are performed atomically
  update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
  update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName, privilege.getAction().toUpperCase());
  permsUpdater.handleUpdateNotification(update);
}
LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");
{code}

_handleUpdateNotification()_ is always fast because it submits the real work
to a thread pool; it does not block, so introducing the additional
synchronization should not affect concurrency.

  was:
Under high load, the Sentry server can trigger a full update to the NameNode for
no good reason, due to a bug in concurrency handling. A full update is always
done at initialization time; for a large number of permissions and/or HDFS
paths it takes a long time and temporarily but significantly increases heap
size. During normal operations it is performed only if the Sentry server
decides that there are gaps in the partial updates' sequence numbers, in which
case it restores the entire snapshot. A full update during normal operations is
a highly disruptive event, leading 

[jira] [Updated] (SENTRY-1891) SentryPlugin triggers full update due to concurrency bug

2017-08-18 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1891:
--
Description: 
Sentry server can trigger full update to NameNode for no good reason, in high 
load cases, due to the bug in concurrency handling. Full update is always done 
at the initialization time and for large number of permissions and/or hdfs 
paths it takes lots of time and significantly increases (temporarily) heap 
size. It is performed during normal operations only if Sentry server decides 
that it has gaps in the partial updates' sequence numbers, so it restores the 
entire snapshot. Full update during normal operations is a highly disruptive 
event, leading to huge increasw in heap size, system slowdown, and even crash. 
Below is the explanation of the problem:



*SentryPlugin.java* has multiple permission update methods, such as 
_onDropSentryRole()_, _onDropSentryPrivilege()_, 
_onAlterSentryRoleDeleteGroups()_, etc. They, of course, can be called 
concurrently. And they all delegate work to 
_permsUpdater.handleUpdateNotification(update)_ call, which, in turn, 
serializes those requests by turning them into the Runnable tasks and 
submitting them into a single-threaded thread pool. Each job ends up appending 
an update to the updates list. So far so good, all updates are serialized.

However, each of those methods first creates _PermissionsUpdate_ object with 
auto-incremented sequence number, and only then passes this object to 
_permsUpdate.handleUpdateNotification(update)_. Here is typical code snippet:

{quote}PermissionsUpdate update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
// what if another permission update thread preempts this one right here
// and starts and finishes the whole method ???
update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName,
    privilege.getAction().toUpperCase());
// ... or here ???
permsUpdater.handleUpdateNotification(update);
LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");{quote}

The problem is that assigning the sequence number to the _PermissionsUpdate_ 
object and appending this object to the updates list (by calling 
_permsUpdater.handleUpdateNotification(update)_) must be an atomic pair of 
operations, and they are not. In a multi-threaded environment, sooner or later 
updates end up appended to the updates list with out-of-order sequence numbers.
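The interleaving described above can be reproduced deterministically with a small, self-contained Java sketch. It is not Sentry code: `permSeqNum` mirrors the counter in SentryPlugin, while the single-threaded executor and the `updates` list are hypothetical stand-ins for the updater's thread pool and updates list. Two latches force thread A to be "preempted" after taking sequence number 100 but before handing its update over, so thread B's #101 is appended first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the race; none of these names are Sentry classes.
class RaceDemo {
  static List<Long> runInterleaved() throws InterruptedException {
    AtomicLong permSeqNum = new AtomicLong(99); // next number handed out is 100
    ExecutorService updater = Executors.newSingleThreadExecutor(); // serializes appends
    List<Long> updates = Collections.synchronizedList(new ArrayList<>());
    CountDownLatch aHasSeqNum = new CountDownLatch(1);
    CountDownLatch bHandedOver = new CountDownLatch(1);

    Thread a = new Thread(() -> {
      long seq = permSeqNum.incrementAndGet(); // A takes #100 ...
      aHasSeqNum.countDown();
      try {
        bHandedOver.await(); // ... and is "preempted" before handing it over
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      updater.execute(() -> updates.add(seq)); // #100 is queued second
    });
    Thread b = new Thread(() -> {
      try {
        aHasSeqNum.await(); // start only after A holds #100
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      long seq = permSeqNum.incrementAndGet(); // B takes #101
      updater.execute(() -> updates.add(seq)); // #101 is queued first
      bHandedOver.countDown();
    });

    a.start();
    b.start();
    a.join();
    b.join();
    updater.shutdown();
    updater.awaitTermination(5, TimeUnit.SECONDS);
    return new ArrayList<>(updates);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(runInterleaved()); // prints [101, 100]
  }
}
```

Every update is still serialized by the executor. The problem is purely that the queue order no longer matches the sequence-number order.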

The problem is not that updates may be appended to the updates list in a 
different order than they arrived; we never guaranteed any particular order for 
concurrent requests coming from different clients. The problem is that they are 
appended with their sequence numbers out of order, e.g. #101, #100 instead of 
#100, #101. The handleUpdateNotification() method then performs the following check:

{quote}final boolean editNotMissed = lastSeenSeqNum.incrementAndGet() == update.getSeqNum();{quote}

Suppose lastSeenSeqNum was 99 and the next appended update is #101. Then 
99+1 != 101, so editNotMissed is set to false, which triggers the full update 
logic a few lines later:

{quote}if (imageRetreiver != null) \{
  try \{
    toUpdate = imageRetreiver.retrieveFullImage(update.getSeqNum());
  \} catch (Exception e) \{
    LOGGER.warn("failed to retrieve full image: ", e);
  \}
  updateable = updateable.updateFull(toUpdate);
\}{quote}

Since imageRetreiver != null for the permissions updater, a full image is retrieved and applied, and all hell breaks loose.
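To see why this order alone is enough to reach the full-update branch, the quoted `editNotMissed` check can be replayed in isolation. The sketch below is a hypothetical model (class and method names are made up). It reproduces only the comparison and deliberately does not model what the real updater does to `lastSeenSeqNum` after a full image is applied.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the check quoted from handleUpdateNotification();
// it does not model any state reset after a full image is retrieved.
class GapCheckDemo {
  private final AtomicLong lastSeenSeqNum;

  GapCheckDemo(long lastSeen) {
    this.lastSeenSeqNum = new AtomicLong(lastSeen);
  }

  // Returns true when the quoted check would fall through to the full-update branch.
  boolean needsFullUpdate(long updateSeqNum) {
    final boolean editNotMissed = lastSeenSeqNum.incrementAndGet() == updateSeqNum;
    return !editNotMissed;
  }

  // Feeds the given sequence numbers through the check, in arrival order.
  static List<Boolean> replay(long lastSeen, long... seqNums) {
    GapCheckDemo tracker = new GapCheckDemo(lastSeen);
    List<Boolean> result = new ArrayList<>();
    for (long seq : seqNums) {
      result.add(tracker.needsFullUpdate(seq));
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(replay(99, 100, 101)); // prints [false, false]
    System.out.println(replay(99, 101, 100)); // prints [true, true]
  }
}
```

In this model, the in-order arrival #100, #101 never takes the full-update branch. The swapped arrival #101, #100 takes it even though no update was actually lost.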

It can be fixed in _SentryPlugin.java_ for all permission update methods as 
follows:

{quote}PermissionsUpdate update;
synchronized (permsUpdateLock) \{
  // sequence number increment and adding the update to the list are now performed atomically
  update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
  update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName,
      privilege.getAction().toUpperCase());
  permsUpdater.handleUpdateNotification(update);
\}
LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");
{quote}

_handleUpdateNotification()_ is always fast, since it only submits the real work 
to the thread pool and does not block, so the additional synchronization should 
not noticeably reduce concurrency.
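The proposed fix can be sketched the same way. The names are hypothetical stand-ins, not Sentry's classes. Under `permsUpdateLock`, taking the next sequence number and handing the update to the single-threaded executor happen as one step. As a result, the executor's FIFO queue receives updates in sequence-number order no matter how many threads call in.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of the fix; names mirror the snippet above but are not Sentry code.
class LockedSubmitDemo {
  private final Object permsUpdateLock = new Object();
  private final AtomicLong permSeqNum = new AtomicLong(0);
  private final ExecutorService updater = Executors.newSingleThreadExecutor();
  private final List<Long> updates = Collections.synchronizedList(new ArrayList<>());

  void onPermissionChange() {
    synchronized (permsUpdateLock) {
      // Taking the number and enqueueing the task is now one atomic step,
      // so the executor's FIFO queue sees updates in sequence-number order.
      final long seq = permSeqNum.incrementAndGet();
      updater.execute(() -> updates.add(seq)); // only enqueues; does not block
    }
  }

  static List<Long> run(int threads, int callsPerThread) throws InterruptedException {
    LockedSubmitDemo demo = new LockedSubmitDemo();
    List<Thread> workers = new ArrayList<>();
    for (int i = 0; i < threads; i++) {
      Thread t = new Thread(() -> {
        for (int j = 0; j < callsPerThread; j++) {
          demo.onPermissionChange();
        }
      });
      workers.add(t);
      t.start();
    }
    for (Thread t : workers) {
      t.join();
    }
    demo.updater.shutdown();
    demo.updater.awaitTermination(10, TimeUnit.SECONDS);
    return new ArrayList<>(demo.updates);
  }

  // True if the list is exactly 1, 2, 3, ... with no gaps or swaps.
  static boolean isSequential(List<Long> seqNums) {
    for (int i = 0; i < seqNums.size(); i++) {
      if (seqNums.get(i) != i + 1) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    List<Long> updates = run(8, 1000);
    System.out.println(updates.size() + " updates, in order: " + isSequential(updates));
  }
}
```

The lock is held only for an increment and an enqueue. This matches the argument that the extra synchronization is cheap, and the actual appends still run on the executor thread.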

  was:
Sentry server can trigger a full update to the NameNode for no good reason under 
high load, due to a bug in concurrency handling. A full update is always done at 
initialization time, and for a large number of permissions and/or HDFS paths it 
takes a long time and significantly (if temporarily) increases heap size. During 
normal operations it is performed only if the Sentry server decides that there 
are gaps in the partial updates' sequence numbers, in which case it restores the 
entire snapshot. A full update during normal operations is a highly disruptive 
event, leading

[jira] [Updated] (SENTRY-1891) SentryPlugin triggers full update due to concurrency bug

2017-08-18 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1891:
--
Description: 
Sentry server can trigger a full update to the NameNode for no good reason under 
high load, due to a bug in concurrency handling. A full update is always done at 
initialization time, and for a large number of permissions and/or HDFS paths it 
takes a long time and significantly (if temporarily) increases heap size. During 
normal operations it is performed only if the Sentry server decides that there 
are gaps in the partial updates' sequence numbers, in which case it restores the 
entire snapshot. A full update during normal operations is a highly disruptive 
event, leading to a huge increase in heap size, system slowdown, and even crashes. 
Below is an explanation of the problem:



*SentryPlugin.java* has multiple permission update methods, such as 
_onDropSentryRole()_, _onDropSentryPrivilege()_, 
_onAlterSentryRoleDeleteGroups()_, etc. They can, of course, be called 
concurrently, and they all delegate work to the 
_permsUpdater.handleUpdateNotification(update)_ call, which in turn 
serializes those requests by wrapping them in Runnable tasks and 
submitting them to a single-threaded thread pool. Each task ends up appending 
an update to the updates list. So far so good: all updates are serialized.

However, each of those methods first creates a _PermissionsUpdate_ object with 
an auto-incremented sequence number, and only then passes this object to 
_permsUpdater.handleUpdateNotification(update)_. Here is a typical code snippet:

{quote}PermissionsUpdate update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
// what if another permission update thread preempts this one right here
// and starts and finishes the whole method ???
update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName,
    privilege.getAction().toUpperCase());
// ... or here ???
permsUpdater.handleUpdateNotification(update);
LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");{quote}

The problem is that assigning the sequence number to the _PermissionsUpdate_ 
object and appending this object to the updates list (by calling 
_permsUpdater.handleUpdateNotification(update)_) must be an atomic pair of 
operations, and they are not. In a multi-threaded environment, sooner or later 
updates end up appended to the updates list with out-of-order sequence numbers.

The problem is not that updates may be appended to the updates list in a 
different order than they arrived; we never guaranteed any particular order for 
concurrent requests coming from different clients. The problem is that they are 
appended with their sequence numbers out of order, e.g. #101, #100 instead of 
#100, #101. The handleUpdateNotification() method then performs the following check:

{quote}final boolean editNotMissed = lastSeenSeqNum.incrementAndGet() == update.getSeqNum();{quote}

Suppose lastSeenSeqNum was 99 and the next appended update is #101. Then 
99+1 != 101, so editNotMissed is set to false, which triggers the full update 
logic a few lines later:

{quote}if (imageRetreiver != null) {
  try {
    toUpdate = imageRetreiver.retrieveFullImage(update.getSeqNum());
  } catch (Exception e) {
    LOGGER.warn("failed to retrieve full image: ", e);
  }
  updateable = updateable.updateFull(toUpdate);
}{quote}

Since imageRetreiver != null for the permissions updater, a full image is retrieved and applied, and all hell breaks loose.

It can be fixed in _SentryPlugin.java_ for all permission update methods as 
follows:

{quote}PermissionsUpdate update;
synchronized (permsUpdateLock) {
  // sequence number increment and adding the update to the list are now performed atomically
  update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
  update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName,
      privilege.getAction().toUpperCase());
  permsUpdater.handleUpdateNotification(update);
}
LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");
{quote}

_handleUpdateNotification()_ is always fast, since it only submits the real work 
to the thread pool and does not block, so the additional synchronization should 
not noticeably reduce concurrency.

  was:
Sentry server can trigger a full update to the NameNode for no good reason under 
high load, due to a bug in concurrency handling. A full update is always done at 
initialization time, and for a large number of permissions and/or HDFS paths it 
takes a long time and significantly (if temporarily) increases heap size. During 
normal operations it is performed only if the Sentry server decides that there 
are gaps in the partial updates' sequence numbers, in which case it restores the 
entire snapshot. A full update during normal operations is a highly disruptive 
event, leading to a huge

[jira] [Updated] (SENTRY-1891) SentryPlugin triggers full update due to concurrency bug

2017-08-18 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1891:
--
Description: 
Sentry server can trigger a full update to the NameNode for no good reason under 
high load, due to a bug in concurrency handling. A full update is always done at 
initialization time, and for a large number of permissions and/or HDFS paths it 
takes a long time and significantly (if temporarily) increases heap size. During 
normal operations it is performed only if the Sentry server decides that there 
are gaps in the partial updates' sequence numbers, in which case it restores the 
entire snapshot. A full update during normal operations is a highly disruptive 
event, leading to a huge increase in heap size, system slowdown, and even crashes. 
Below is an explanation of the problem:



*SentryPlugin.java* has multiple permission update methods, such as 
_onDropSentryRole()_, _onDropSentryPrivilege()_, 
_onAlterSentryRoleDeleteGroups()_, etc. They can, of course, be called 
concurrently, and they all delegate work to the 
_permsUpdater.handleUpdateNotification(update)_ call, which in turn 
serializes those requests by wrapping them in Runnable tasks and 
submitting them to a single-threaded thread pool. Each task ends up appending 
an update to the updates list. So far so good: all updates are serialized.

However, each of those methods first creates a _PermissionsUpdate_ object with 
an auto-incremented sequence number, and only then passes this object to 
_permsUpdater.handleUpdateNotification(update)_. Here is a typical code snippet:

{quote}PermissionsUpdate update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
// what if another permission update thread preempts this one right here
// and starts and finishes the whole method ???
update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName,
    privilege.getAction().toUpperCase());
// ... or here ???
permsUpdater.handleUpdateNotification(update);
LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");{quote}

The problem is that assigning the sequence number to the _PermissionsUpdate_ 
object and appending this object to the updates list (by calling 
_permsUpdater.handleUpdateNotification(update)_) must be an atomic pair of 
operations, and they are not. In a multi-threaded environment, sooner or later 
updates end up appended to the updates list with out-of-order sequence numbers.

The problem is not that updates may be appended to the updates list in a 
different order than they arrived; we never guaranteed any particular order for 
concurrent requests coming from different clients. The problem is that they are 
appended with their sequence numbers out of order, e.g. #101, #100 instead of 
#100, #101. The handleUpdateNotification() method then performs the following check:

{quote}final boolean editNotMissed = lastSeenSeqNum.incrementAndGet() == update.getSeqNum();{quote}

Suppose lastSeenSeqNum was 99 and the next appended update is #101. Then 
99+1 != 101, so editNotMissed is set to false, which triggers the full update 
logic a few lines later:

bq. if (imageRetreiver != null) {
bq.   try {
bq.     toUpdate = imageRetreiver.retrieveFullImage(update.getSeqNum());
bq.   } catch (Exception e) {
bq.     LOGGER.warn("failed to retrieve full image: ", e);
bq.   }
bq.   updateable = updateable.updateFull(toUpdate);
bq. }

Since imageRetreiver != null for the permissions updater, a full image is retrieved and applied, and all hell breaks loose.

It can be fixed in _SentryPlugin.java_ for all permission update methods as 
follows:

{quote}PermissionsUpdate update;
synchronized (permsUpdateLock) {
  // sequence number increment and adding the update to the list are now performed atomically
  update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
  update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName,
      privilege.getAction().toUpperCase());
  permsUpdater.handleUpdateNotification(update);
}
LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");
{quote}

_handleUpdateNotification()_ is always fast, since it only submits the real work 
to the thread pool and does not block, so the additional synchronization should 
not noticeably reduce concurrency.

  was:
Sentry server can trigger a full update to the NameNode for no good reason under 
high load, due to a bug in concurrency handling. A full update is always done at 
initialization time, and for a large number of permissions and/or HDFS paths it 
takes a long time and significantly (if temporarily) increases heap size. During 
normal operations it is performed only if the Sentry server decides that there 
are gaps in the partial updates' sequence numbers, in which case it restores the 
entire snapshot. A full update during normal operations is a highly disruptive 
event,

[jira] [Created] (SENTRY-1891) SentryPlugin triggers full update due to concurrency bug

2017-08-18 Thread Vadim Spector (JIRA)
Vadim Spector created SENTRY-1891:
-

 Summary: SentryPlugin triggers full update due to concurrency bug
 Key: SENTRY-1891
 URL: https://issues.apache.org/jira/browse/SENTRY-1891
 Project: Sentry
  Issue Type: Bug
Reporter: Vadim Spector


Sentry server can trigger a full update to the NameNode for no good reason under 
high load, due to a bug in concurrency handling. A full update is always done at 
initialization time, and for a large number of permissions and/or HDFS paths it 
takes a long time and significantly (if temporarily) increases heap size. During 
normal operations it is performed only if the Sentry server decides that there 
are gaps in the partial updates' sequence numbers, in which case it restores the 
entire snapshot. A full update during normal operations is a highly disruptive 
event, leading to a huge increase in heap size, system slowdown, and even crashes. 
Below is an explanation of the problem:



*SentryPlugin.java* has multiple permission update methods, such as 
_onDropSentryRole()_, _onDropSentryPrivilege()_, 
_onAlterSentryRoleDeleteGroups()_, etc. They can, of course, be called 
concurrently, and they all delegate work to the 
_permsUpdater.handleUpdateNotification(update)_ call, which in turn 
serializes those requests by wrapping them in Runnable tasks and 
submitting them to a single-threaded thread pool. Each task ends up appending 
an update to the updates list. So far so good: all updates are serialized.

However, each of those methods first creates a _PermissionsUpdate_ object with 
an auto-incremented sequence number, and only then passes this object to 
_permsUpdater.handleUpdateNotification(update)_. Here is a typical code snippet:

{quote}PermissionsUpdate update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
// what if another permission update thread preempts this one right here
// and starts and finishes the whole method ???
update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName,
    privilege.getAction().toUpperCase());
// ... or here ???
permsUpdater.handleUpdateNotification(update);
LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");{quote}

The problem is that assigning the sequence number to the _PermissionsUpdate_ 
object and appending this object to the updates list (by calling 
_permsUpdater.handleUpdateNotification(update)_) must be an atomic pair of 
operations, and they are not. In a multi-threaded environment, sooner or later 
updates end up appended to the updates list with out-of-order sequence numbers.

The problem is not that updates may be appended to the updates list in a 
different order than they arrived; we never guaranteed any particular order for 
concurrent requests coming from different clients. The problem is that they are 
appended with their sequence numbers out of order, e.g. #101, #100 instead of 
#100, #101. The handleUpdateNotification() method then performs the following check:

{quote}final boolean editNotMissed = lastSeenSeqNum.incrementAndGet() == update.getSeqNum();{quote}

Suppose lastSeenSeqNum was 99 and the next appended update is #101. Then 
99+1 != 101, so editNotMissed is set to false, which triggers the full update 
logic a few lines later:

{quote}if (imageRetreiver != null) {
  try {
    toUpdate = imageRetreiver.retrieveFullImage(update.getSeqNum());
  } catch (Exception e) {
    LOGGER.warn("failed to retrieve full image: ", e);
  }
  updateable = updateable.updateFull(toUpdate);
}
{quote}

Since imageRetreiver != null for the permissions updater, a full image is retrieved and applied, and all hell breaks loose.

It can be fixed in _SentryPlugin.java_ for all permission update methods as 
follows:

{quote}PermissionsUpdate update;
synchronized (permsUpdateLock) {
  // sequence number increment and adding the update to the list are now performed atomically
  update = new PermissionsUpdate(permSeqNum.incrementAndGet(), false);
  update.addPrivilegeUpdate(authzObj).putToAddPrivileges(roleName,
      privilege.getAction().toUpperCase());
  permsUpdater.handleUpdateNotification(update);
}
LOGGER.debug("Authz Perm preUpdate [" + update.getSeqNum() + "]..");
{quote}

_handleUpdateNotification()_ is always fast, since it only submits the real work 
to the thread pool and does not block, so the additional synchronization should 
not noticeably reduce concurrency.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SENTRY-1887) Report Sentry server failover events on Sentry clients' side

2017-08-16 Thread Vadim Spector (JIRA)
Vadim Spector created SENTRY-1887:
-

 Summary: Report Sentry server failover events on Sentry clients' side
 Key: SENTRY-1887
 URL: https://issues.apache.org/jira/browse/SENTRY-1887
 Project: Sentry
  Issue Type: Improvement
Reporter: Vadim Spector


Based on the to-be-introduced Sentry ping() API (SENTRY-1866), report the 
unavailability of individual Sentry server instances in the Sentry client logs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (SENTRY-1884) Establishing at least INFO logging levels for exceptional events

2017-08-15 Thread Vadim Spector (JIRA)
Vadim Spector created SENTRY-1884:
-

 Summary: Establishing at least INFO logging levels for exceptional events
 Key: SENTRY-1884
 URL: https://issues.apache.org/jira/browse/SENTRY-1884
 Project: Sentry
  Issue Type: Improvement
Reporter: Vadim Spector


The right logging level is critical for troubleshooting. The rule of thumb is 
that all events that happen once at initialization, or only a few times when 
recovering from exceptional "should-never-happen-in-theory" conditions such as 
full image synchronization between Sentry and NameNode, should be logged at 
least at INFO level, and possibly at WARNING level.

This is a broad JIRA requiring code review across multiple Sentry classes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (SENTRY-1866) Add ping Thrift APIs for Sentry services

2017-08-14 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector reassigned SENTRY-1866:
-

Assignee: Vadim Spector

> Add ping Thrift APIs for Sentry services
> 
>
> Key: SENTRY-1866
> URL: https://issues.apache.org/jira/browse/SENTRY-1866
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>Assignee: Vadim Spector
>
> Motivation: I can think of several benefits, but the immediate ones are:
> a) logging Sentry server unavailability on the client side. With multiple 
> active connections to a Sentry server, logging at INFO level every failed RPC 
> call to the same server that went down (such failures are currently logged at 
> DEBUG level) can be far too verbose and redundant. It can also be misleading, 
> because there is no necessary link between when a connection was established 
> and when an attempt to use it failed, so we may report failures of old, stale 
> connections. Periodic pinging, in contrast, allows implementing simple logic 
> that reports a server becoming unavailable, and then available again, only 
> once per occurrence.
> b) enabling optimization of connection pooling. A failed ping RPC call most 
> likely indicates server unavailability (crash, restart, ...), so the server 
> can be temporarily marked as unavailable, and no new connection attempts are 
> made to it within some configurable time interval (say, 1 sec). This can be a 
> significant performance improvement in high call volume scenarios.
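The once-per-occurrence reporting described in a) can be sketched as a small state machine fed by periodic ping results. This is an illustration under assumptions. The ping API did not exist yet when this was filed, so `AvailabilityReporter`, `onPingResult`, and the log messages are hypothetical, and a plain `Consumer<String>` stands in for the client's logger.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: emits one log line per availability transition
// of a single Sentry server, instead of one line per failed RPC call.
class AvailabilityReporter {
  private final String server;
  private final Consumer<String> log;
  private boolean available = true; // assumption: treated as up until a ping fails

  AvailabilityReporter(String server, Consumer<String> log) {
    this.server = server;
    this.log = log;
  }

  // Called by a periodic pinger; synchronized in case pings run on several threads.
  synchronized void onPingResult(boolean pingSucceeded) {
    if (pingSucceeded == available) {
      return; // no transition, nothing to report
    }
    available = pingSucceeded;
    log.accept(available
        ? "Sentry server " + server + " is available again"
        : "Sentry server " + server + " became unavailable");
  }

  public static void main(String[] args) {
    List<String> lines = new ArrayList<>();
    AvailabilityReporter reporter = new AvailabilityReporter("sentry-host-1", lines::add);
    boolean[] pings = {true, false, false, false, true, true};
    for (boolean ok : pings) {
      reporter.onPingResult(ok);
    }
    lines.forEach(System.out::println); // two lines: became unavailable, available again
  }
}
```

Three consecutive failed pings produce a single "became unavailable" line, which is the de-duplication the description asks for.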



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SENTRY-1866) Add ping Thrift APIs for Sentry services

2017-07-25 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1866:
--
Description: 
Motivation: I can think of several benefits, but the immediate ones are:

a) logging Sentry server unavailability on the client side. With multiple active 
connections to a Sentry server, logging at INFO level every failed RPC call to 
the same server that went down (such failures are currently logged at DEBUG 
level) can be far too verbose and redundant. It can also be misleading, because 
there is no necessary link between when a connection was established and when 
an attempt to use it failed, so we may report failures of old, stale 
connections. Periodic pinging, in contrast, allows implementing simple logic 
that reports a server becoming unavailable, and then available again, only once 
per occurrence.

b) enabling optimization of connection pooling. A failed ping RPC call most 
likely indicates server unavailability (crash, restart, ...), so the server can 
be temporarily marked as unavailable, and no new connection attempts are made to 
it within some configurable time interval (say, 1 sec). This can be a 
significant performance improvement in high call volume scenarios.
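Likewise, the back-off from b) can be sketched as a per-server "unavailable until" timestamp. All names here are hypothetical. The 1-second window comes from the description, and the clock is injected as a `LongSupplier` so the behavior can be checked without sleeping.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch: after a failed ping, skip new connection attempts
// to that server until a configurable back-off window has passed.
class UnavailableServerTracker {
  private final long backoffMillis;
  private final LongSupplier clockMillis; // injected so tests need not sleep
  private final Map<String, Long> unavailableUntil = new ConcurrentHashMap<>();

  UnavailableServerTracker(long backoffMillis, LongSupplier clockMillis) {
    this.backoffMillis = backoffMillis;
    this.clockMillis = clockMillis;
  }

  // Called when a ping (or connection attempt) to the server fails.
  void markUnavailable(String server) {
    unavailableUntil.put(server, clockMillis.getAsLong() + backoffMillis);
  }

  // Called by the pool before opening a new connection to the server.
  boolean shouldAttemptConnection(String server) {
    Long until = unavailableUntil.get(server);
    if (until == null) {
      return true; // never marked, or the mark was already cleared
    }
    if (clockMillis.getAsLong() >= until) {
      unavailableUntil.remove(server, until); // window expired; forget the mark
      return true;
    }
    return false;
  }

  public static void main(String[] args) {
    long[] now = {0};
    UnavailableServerTracker tracker = new UnavailableServerTracker(1000, () -> now[0]);
    tracker.markUnavailable("sentry-host-1");
    System.out.println(tracker.shouldAttemptConnection("sentry-host-1")); // false
    now[0] = 1000;
    System.out.println(tracker.shouldAttemptConnection("sentry-host-1")); // true
    System.out.println(tracker.shouldAttemptConnection("sentry-host-2")); // true
  }
}
```

Outside a test, a monotonic source such as `() -> System.nanoTime() / 1_000_000` is a safer clock than wall-clock time, because it cannot jump backwards.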

  was:
Motivation: I can think of several benefits, but the immediate ones are:

a) logging Sentry server unavailability on the client side. With multiple active 
connections to a Sentry server, logging at INFO level every failed RPC call to 
the same server that went down (such failures are currently logged at DEBUG 
level) can be far too verbose and redundant. It can also be misleading, because 
there is no necessary link between when a connection was established and when 
an attempt to use it failed, so we may report failures of old, stale 
connections. Periodic pinging, in contrast, allows implementing simple logic 
that reports a server becoming unavailable, and then available again, only once 
per occurrence.

b) enabling optimization of connection pooling. A failed ping RPC call most 
likely indicates server unavailability (crash, restart, ...), so the server can 
be temporarily marked as unavailable, and no new connection attempts are made to 
it within some configurable time interval (say, 1 sec). This can be a 
significant performance improvement in high call volume scenarios.


> Add ping Thrift APIs for Sentry services
> 
>
> Key: SENTRY-1866
> URL: https://issues.apache.org/jira/browse/SENTRY-1866
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>
> Motivation: I can think of several benefits, but the immediate ones are:
> a) logging Sentry server unavailability on the client side. With multiple 
> active connections to a Sentry server, logging at INFO level every failed RPC 
> call to the same server that went down (such failures are currently logged at 
> DEBUG level) can be far too verbose and redundant. It can also be misleading, 
> because there is no necessary link between when a connection was established 
> and when an attempt to use it failed, so we may report failures of old, stale 
> connections. Periodic pinging, in contrast, allows implementing simple logic 
> that reports a server becoming unavailable, and then available again, only 
> once per occurrence.
> b) enabling optimization of connection pooling. A failed ping RPC call most 
> likely indicates server unavailability (crash, restart, ...), so the server 
> can be temporarily marked as unavailable, and no new connection attempts are 
> made to it within some configurable time interval (say, 1 sec). This can be a 
> significant performance improvement in high call volume scenarios.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SENTRY-1866) Add ping Thrift APIs for Sentry services

2017-07-25 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1866:
--
Description: 
Motivation: I can think of several benefits, but the immediate ones are:

a) logging Sentry server unavailability on the client side. With multiple active 
connections to a Sentry server, logging at INFO level every failed RPC call to 
the same server that went down (such failures are currently logged at DEBUG 
level) can be far too verbose and redundant. It can also be misleading, because 
there is no necessary link between when a connection was established and when 
an attempt to use it failed, so we may report failures of old, stale 
connections. Periodic pinging, in contrast, allows implementing simple logic 
that reports a server becoming unavailable, and then available again, only once 
per occurrence.

b) enabling optimization of connection pooling. A failed ping RPC call most 
likely indicates server unavailability (crash, restart, ...), so the server can 
be temporarily marked as unavailable, and no new connection attempts are made to 
it within some configurable time interval (say, 1 sec). This can be a 
significant performance improvement in high call volume scenarios.

  was:
Motivation: I can think of several, but the immediate ones are:

a) logging Sentry server unavailability on the client side. With multiple active 
connections to a Sentry server, logging at INFO level every failed RPC call to 
the same server that went down (such failures are currently logged at DEBUG 
level) can be far too verbose and redundant. It can also be misleading, because 
there is no necessary link between when a connection was established and when 
an attempt to use it failed, so we may report failures of old, stale 
connections. Periodic pinging, in contrast, allows implementing simple logic 
that reports a server becoming unavailable, and then available again, only once 
per occurrence.

b) enabling optimization of connection pooling. A failed ping RPC call most 
likely indicates server unavailability (crash, restart, ...), so the server can 
be temporarily marked as unavailable, and no new connection attempts are made to 
it within some configurable time interval (say, 1 sec). This can be a 
significant performance improvement in high call volume scenarios.


> Add ping Thrift APIs for Sentry services
> 
>
> Key: SENTRY-1866
> URL: https://issues.apache.org/jira/browse/SENTRY-1866
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>
> Motivation: I can think of several benefits, but the immediate ones are:
> a) logging Sentry server unavailability on the client side. With multiple 
> active connections to a Sentry server, logging at INFO level every failed RPC 
> call to the same server that went down (such failures are currently logged at 
> DEBUG level) can be far too verbose and redundant. It can also be misleading, 
> because there is no necessary link between when a connection was established 
> and when an attempt to use it failed, so we may report failures of old, stale 
> connections. Periodic pinging, in contrast, allows implementing simple logic 
> that reports a server becoming unavailable, and then available again, only 
> once per occurrence.
> b) enabling optimization of connection pooling. A failed ping RPC call most 
> likely indicates server unavailability (crash, restart, ...), so the server 
> can be temporarily marked as unavailable, and no new connection attempts are 
> made to it within some configurable time interval (say, 1 sec). This can be a 
> significant performance improvement in high call volume scenarios.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SENTRY-1866) Add ping Thrift APIs for Sentry services

2017-07-25 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1866:
--
Description: 
Motivation: I can think of several, but the immediate ones are:

a) logging Sentry server unavailability on the client side. With multiple active 
connections to a Sentry server, logging at INFO level every failed RPC call to 
the same server that went down (such failures are currently logged at DEBUG 
level) can be far too verbose and redundant. It can also be misleading, because 
there is no necessary link between when a connection was established and when 
an attempt to use it failed, so we may report failures of old, stale 
connections. Periodic pinging, in contrast, allows implementing simple logic 
that reports a server becoming unavailable, and then available again, only once 
per occurrence.

b) enabling optimization of connection pooling. A failed ping RPC call most 
likely indicates server unavailability (crash, restart, ...), so the server can 
be temporarily marked as unavailable, and no new connection attempts are made to 
it within some configurable time interval (say, 1 sec). This can be a 
significant performance improvement in high call volume scenarios.

  was:
Motivation: I can think of several, but the immediate ones are:

a) logging Sentry server unavailability on the client side. With multiple active 
connections to a Sentry server, logging at INFO level every failed RPC call to 
the same server that went down (such failures are currently logged at DEBUG 
level) can be far too verbose and redundant. It can also be misleading, because 
there is no necessary link between when a connection was established and when 
an attempt to use it failed, so we may report failures of old, stale 
connections.

b) enabling optimization of connection pooling. A failed ping RPC call most 
likely indicates server unavailability (crash, restart, ...), so the server can 
be temporarily marked as unavailable, and no new connection attempts are made to 
it within some configurable time interval (say, 1 sec).


> Add ping Thrift APIs for Sentry services
> 
>
> Key: SENTRY-1866
> URL: https://issues.apache.org/jira/browse/SENTRY-1866
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>
> Motivation: I can think of several, but the immediate ones are:
> a) logging Sentry server unavailability on the client side. With multiple 
> active connections to a Sentry server, logging at INFO level every failed RPC 
> call to the same server that went down (such failures are currently logged at 
> DEBUG level) can be far too verbose and redundant. It can also be misleading, 
> because there is no necessary link between when a connection was established 
> and when an attempt to use it failed, so we may report failures of old, stale 
> connections. Periodic pinging, in contrast, allows implementing simple logic 
> that reports a server becoming unavailable, and then available again, only 
> once per occurrence.
> b) enabling optimization of connection pooling. A failed ping RPC call most 
> likely indicates server unavailability (crash, restart, ...), so the server 
> can be temporarily marked as unavailable, and no new connection attempts are 
> made to it within some configurable time interval (say, 1 sec). This can be a 
> significant performance improvement in high call volume scenarios.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (SENTRY-1866) Add ping Thrift APIs for Sentry services

2017-07-25 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1866:
--
Description: 
Motivation: can think of several, but the immediate ones are:

a) logging Sentry server unavailability on the client side. With multiple active 
connections to a Sentry server, logging at INFO level each failed RPC call 
(currently logged at DEBUG level) to the same Sentry server that went down can 
be far too verbose and redundant. It can also be misleading: there is no 
necessary link between when a connection was established and when an attempt to 
use it failed, so we may end up reporting failures of old, stale connections.

b) enabling connection pooling optimizations. A ping RPC call would most likely 
fail because the server is unavailable (crash, restart, ...), so the server can 
be temporarily marked as unavailable and no new connection attempts are made to 
it within some configurable time interval (say, 1 sec).
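One way the cooldown in (b) could look, as a sketch under assumptions: ServerCooldown, markUnavailable() and mayConnect() are hypothetical names, and the current time is passed in explicitly so the behavior is easy to test. A pool would consult mayConnect() before opening a new connection to a server.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: after a failed ping, skip a server for cooldownMillis. */
class ServerCooldown {
  private final long cooldownMillis;
  // Server address -> time (ms) until which no new connections should be attempted.
  private final Map<String, Long> unavailableUntil = new ConcurrentHashMap<>();

  ServerCooldown(long cooldownMillis) {
    if (cooldownMillis < 0) throw new IllegalArgumentException("cooldownMillis < 0");
    this.cooldownMillis = cooldownMillis;
  }

  /** Called when a ping (or connection attempt) to the server fails. */
  void markUnavailable(String server, long nowMillis) {
    unavailableUntil.put(server, nowMillis + cooldownMillis);
  }

  /** Returns true if the pool may try to open a new connection to this server. */
  boolean mayConnect(String server, long nowMillis) {
    Long until = unavailableUntil.get(server);
    if (until == null) return true;
    if (nowMillis >= until) {
      // Cooldown expired; remove the mark only if nobody re-marked it meanwhile.
      unavailableUntil.remove(server, until);
      return true;
    }
    return false;
  }
}
```

With a 1000 ms cooldown, a server marked at t=5000 is skipped until t=6000, while other servers in the pool remain usable.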

  was:
Motivation: can think of several, but the immediate ones are:

a) logging Sentry server unavailability on the client side. With multiple active 
connections to a Sentry server, logging at INFO level each failed RPC call 
(currently logged at DEBUG level) to the same Sentry server that went down can 
be far too verbose and redundant. It can also be misleading: there is no 
necessary link between when a connection was established and when an attempt to 
use it failed, so we may end up reporting failures of old connections.

b) enabling connection pooling optimizations. A ping RPC call would most likely 
fail because the server is unavailable (crash, restart, ...), so the server can 
be temporarily marked as unavailable and no new connection attempts are made to 
it within some configurable time interval (say, 1 sec).


> Add ping Thrift APIs for Sentry services
> 
>
> Key: SENTRY-1866
> URL: https://issues.apache.org/jira/browse/SENTRY-1866
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>
> Motivation: can think of several, but the immediate ones are:
> a) logging Sentry server unavailability on the client side. With multiple 
> active connections to a Sentry server, logging at INFO level each failed RPC 
> call (currently logged at DEBUG level) to the same Sentry server that went 
> down can be far too verbose and redundant. It can also be misleading: there is 
> no necessary link between when a connection was established and when an 
> attempt to use it failed, so we may end up reporting failures of old, stale 
> connections.
> b) enabling connection pooling optimizations. A ping RPC call would most 
> likely fail because the server is unavailable (crash, restart, ...), so the 
> server can be temporarily marked as unavailable and no new connection 
> attempts are made to it within some configurable time interval (say, 1 sec).





[jira] [Updated] (SENTRY-1866) Add ping Thrift APIs for Sentry services

2017-07-25 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1866:
--
Description: 
Motivation: can think of several, but the immediate ones are:

a) logging Sentry server unavailability on the client side. With multiple active 
connections to a Sentry server, logging at INFO level each failed RPC call 
(currently logged at DEBUG level) to the same Sentry server that went down can 
be far too verbose and redundant. It can also be misleading: there is no 
necessary link between when a connection was established and when an attempt to 
use it failed, so we may end up reporting failures of old connections.

b) enabling connection pooling optimizations. A ping RPC call would most likely 
fail because the server is unavailable (crash, restart, ...), so the server can 
be temporarily marked as unavailable and no new connection attempts are made to 
it within some configurable time interval (say, 1 sec).

  was:
Motivation: can think of several, but the immediate ones are:

a) logging Sentry server unavailability on the client side. With multiple active 
connections to a Sentry server, logging each failed RPC call (currently logged 
at DEBUG level) to the same Sentry server that went down can be redundant and 
excessive. It can also be misleading: there is no necessary link between when a 
connection was established and when an attempt to use it failed, so we may end 
up reporting failures of old connections.

b) enabling connection pooling optimizations. A ping RPC call would most likely 
fail because the server is unavailable (crash, restart, ...), so the server can 
be temporarily marked as unavailable and no new connection attempts are made to 
it within some configurable time interval (say, 1 sec).


> Add ping Thrift APIs for Sentry services
> 
>
> Key: SENTRY-1866
> URL: https://issues.apache.org/jira/browse/SENTRY-1866
> Project: Sentry
>  Issue Type: Improvement
>Reporter: Vadim Spector
>
> Motivation: can think of several, but the immediate ones are:
> a) logging Sentry server unavailability on the client side. With multiple 
> active connections to a Sentry server, logging at INFO level each failed RPC 
> call (currently logged at DEBUG level) to the same Sentry server that went 
> down can be far too verbose and redundant. It can also be misleading: there is 
> no necessary link between when a connection was established and when an 
> attempt to use it failed, so we may end up reporting failures of old 
> connections.
> b) enabling connection pooling optimizations. A ping RPC call would most 
> likely fail because the server is unavailable (crash, restart, ...), so the 
> server can be temporarily marked as unavailable and no new connection 
> attempts are made to it within some configurable time interval (say, 1 sec).





[jira] [Updated] (SENTRY-1866) Add ping Thrift APIs for Sentry services

2017-07-25 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1866:
--
Description: 
Motivation: can think of several, but the immediate ones are:

a) logging Sentry server unavailability on the client side. With multiple active 
connections to a Sentry server, logging each failed RPC call (currently logged 
at DEBUG level) to the same Sentry server that went down can be redundant and 
excessive. It can also be misleading: there is no necessary link between when a 
connection was established and when an attempt to use it failed, so we may end 
up reporting failures of old connections.

b) enabling connection pooling optimizations. A ping RPC call would most likely 
fail because the server is unavailable (crash, restart, ...), so the server can 
be temporarily marked as unavailable and no new connection attempts are made to 
it within some configurable time interval (say, 1 sec).

  was:
Motivation: can think of several, but the immediate ones are:

a) logging Sentry server unavailability on the client side. With multiple active 
connections to a Sentry server, logging each failed RPC call (currently logged 
at DEBUG level) to the same Sentry server that went down can be redundant and 
excessive. It can also be misleading: there is no necessary link between when a 
connection was established and when an attempt to use it failed, so we may end 
up reporting failures of old connections.

Sentry HA-specific: when the Sentry client fails over from one Sentry server to 
another, it does not print a message saying so. Such a client should print a 
simple, clear INFO-level message whenever it fails over from one Sentry server 
to another.
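A sketch of the requested failover message, assuming the client can learn which server address each successful connection went to. FailoverLogger and onConnected() are hypothetical; nothing here reflects the actual Sentry client API.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.logging.Logger;

/** Hypothetical sketch: log once at INFO whenever the client switches Sentry servers. */
class FailoverLogger {
  private static final Logger LOG = Logger.getLogger(FailoverLogger.class.getName());

  // Address of the server the client is currently talking to (null before first connect).
  private final AtomicReference<String> currentServer = new AtomicReference<>();

  /** Call after every successful connection; returns the logged message or null. */
  String onConnected(String server) {
    String previous = currentServer.getAndSet(server);
    if (previous == null || previous.equals(server)) {
      return null; // first connection, or still on the same server: nothing to report
    }
    String msg = "Sentry client failed over from " + previous + " to " + server;
    LOG.info(msg);
    return msg;
  }
}
```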

Design considerations:

"Sentry client" stands for a specific class instance capable of connecting to a 
specific Sentry server instance from some app (usually another Hadoop service). 
In an HA scenario, the Sentry client relies on connection pooling (the 
SentryTransportPool class) to select one of several available, configured 
Sentry server instances. Whenever a connection fails, the Sentry client simply 
asks SentryTransportPool to a) invalidate this specific connection and b) 
provide another connection instead. There is no monitoring of Sentry server 
liveness per se. Each Sentry client finds out about a failure independently, 
and only when it tries to use the connection. Thus there may be no particular 
correlation between the time a connection failure is discovered and the time 
the Sentry server actually became unavailable. For example, a client can 
discover that an old connection has failed long after the Sentry server 
crashed and was restarted (possibly more than once!).
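The invalidate-and-retry behavior described above can be sketched as follows. ConnectionPool is a hypothetical stand-in, not the real SentryTransportPool interface; its method names (borrow, giveBack, invalidate) and the maxAttempts parameter are assumptions for illustration. The failure is only noticed inside the catch block, when the connection is actually used, which is why the moment of discovery need not correlate with the moment the server went down.

```java
/** Hypothetical stand-in for a pool such as SentryTransportPool; NOT its real API. */
interface ConnectionPool<C> {
  C borrow() throws Exception;   // pick a connection to one of the configured servers
  void giveBack(C conn);         // return a connection that worked
  void invalidate(C conn);       // discard a connection that failed
}

/** Hypothetical sketch of the client-side "invalidate, then get another" loop. */
class RetryingClient<C> {
  /** One RPC performed over a connection. */
  @FunctionalInterface
  interface Call<T, R> {
    R apply(T conn) throws Exception;
  }

  private final ConnectionPool<C> pool;
  private final int maxAttempts;

  RetryingClient(ConnectionPool<C> pool, int maxAttempts) {
    if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
    this.pool = pool;
    this.maxAttempts = maxAttempts;
  }

  <R> R invoke(Call<C, R> call) throws Exception {
    Exception last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      C conn = pool.borrow();
      R result;
      try {
        result = call.apply(conn);
      } catch (Exception e) {
        // The failure is discovered only now, when the connection is used,
        // which may be long after the server actually went away.
        pool.invalidate(conn);
        last = e;
        continue;
      }
      pool.giveBack(conn);
      return result;
    }
    throw last; // every attempt failed; surface the most recent error
  }
}
```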

Intuitively, one would like to have a single log message per Sentry server 
crash/shutdown; but for the reasons above, it seems difficult, if not 
impossible, to group connections by the Sentry server instance they were 
opened against. It may therefore be hard to tell whether multiple connection 
failures are caused by "the same" Sentry server instance going down, and hence 
difficult to report exactly one connection failure per Sentry server 
shutdown/crash event.

Yet the desire to have visibility into such events in the field is 
understandable. At the same time, if we simply log every connection failure, 
such logging can be massive: there may be many concurrent connections to 
Sentry server(s) from the same app. Such logging would be less than useful.

The solution therefore has to rely on some less-than-perfect rules by which the 
number of connection failure logs can be contained. The alternative of 
introducing periodic pinging of the Sentry server and only logging ping 
failures would be possible as well (and it would be ideal if the Sentry server 
responded to pings with a server id initialized to the server start timestamp, 
which would solve the problem entirely), but it requires more radical changes.
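To illustrate why a server id in the ping response would resolve the grouping problem, here is a sketch assuming a hypothetical ping RPC that returns the server start timestamp as its id. ServerInstanceTracker, onPingResponse() and isStale() are made-up names; Sentry does not necessarily expose any of this. A connection that remembers the id seen when it was opened can then be recognized as belonging to an instance that no longer exists.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch: if a ping RPC returned a server id (e.g. the server's
 * start timestamp), the client could tell a restarted instance from the one it knew.
 */
class ServerInstanceTracker {
  enum Event { FIRST_SEEN, SAME_INSTANCE, RESTARTED }

  // Server address -> id of the instance last seen answering pings.
  private final Map<String, Long> lastSeenId = new ConcurrentHashMap<>();

  /** Classifies a ping response relative to the previously seen instance. */
  Event onPingResponse(String server, long serverId) {
    Long previous = lastSeenId.put(server, serverId);
    if (previous == null) return Event.FIRST_SEEN;
    return previous == serverId ? Event.SAME_INSTANCE : Event.RESTARTED;
  }

  /** A connection is stale if it was opened against an instance that has since restarted. */
  boolean isStale(String server, long idAtConnectTime) {
    Long current = lastSeenId.get(server);
    return current != null && current != idAtConnectTime;
  }
}
```

Failures on stale connections could then be logged at DEBUG level only, leaving a single INFO message per actual restart.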

The simplest solution seems to be as follows: since recovery of a failed Sentry 
server is likely to take some time, we do not need to be too clever; it may be 
enough to report connection failures to a given Sentry instance no more often 
than once every N (configurable) seconds. Once a connection failure to Sentry 
server instance X has been reported, another one won't be reported until N 
seconds have elapsed. This keeps the number of connection failure messages at 
bay. Such logs may still be confusing if a client attempts to use an old 
connection to an old server instance after some idle period, long after the 
problem has been fixed, but this is arguably still better than nothing.
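A minimal sketch of the once-every-N-seconds rule, assuming the caller supplies the server address and the current time. RateLimitedFailureLogger is a hypothetical class; the compareAndSet step ensures that when many connections to the same server fail concurrently, only one thread logs within an interval.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Logger;

/** Hypothetical sketch: report a connection failure to a given server at most once per interval. */
class RateLimitedFailureLogger {
  private static final Logger LOG = Logger.getLogger(RateLimitedFailureLogger.class.getName());
  private static final long NEVER = Long.MIN_VALUE; // sentinel: nothing reported yet

  private final long intervalMillis;
  // Server address -> time (ms) of the last reported failure.
  private final Map<String, AtomicLong> lastReported = new ConcurrentHashMap<>();

  RateLimitedFailureLogger(long intervalMillis) {
    this.intervalMillis = intervalMillis;
  }

  /** Returns true if this failure was logged, false if it was suppressed. */
  boolean onConnectionFailure(String server, Exception cause, long nowMillis) {
    AtomicLong last = lastReported.computeIfAbsent(server, s -> new AtomicLong(NEVER));
    long prev = last.get();
    // Check the sentinel first so that nowMillis - NEVER never overflows.
    if (prev != NEVER && nowMillis - prev < intervalMillis) return false;
    // Only the thread that wins the CAS logs; concurrent failures are suppressed.
    if (!last.compareAndSet(prev, nowMillis)) return false;
    LOG.info("Connection to Sentry server " + server + " failed: " + cause);
    return true;
  }
}
```

The limit is tracked per server address, so an outage of one instance does not hide failures reported against another.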

Alternative suggestions are welcome.


> Add ping Thrift APIs for Sentry services
> 
>
> Key: SENTRY-1866
> URL: https://issues.apache.org/jira/browse/SENTRY-1866
>

[jira] [Updated] (SENTRY-1866) Add ping Thrift APIs for Sentry services

2017-07-25 Thread Vadim Spector (JIRA)

 [ 
https://issues.apache.org/jira/browse/SENTRY-1866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vadim Spector updated SENTRY-1866:
--
Description: 
Motivation: can think of several, but the immediate ones are:

a) logging Sentry server unavailability on the client side. With multiple active 
connections to a Sentry server, logging each failed RPC call (currently logged 
at DEBUG level) to the same Sentry server that went down can be redundant and 
excessive. It can also be misleading: there is no necessary link between when a 
connection was established and when an attempt to use it failed, so we may end 
up reporting failures of old connections.

Sentry HA-specific: when the Sentry client fails over from one Sentry server to 
another, it does not print a message saying so. Such a client should print a 
simple, clear INFO-level message whenever it fails over from one Sentry server 
to another.

Design considerations:

"Sentry client" stands for a specific class instance capable of connecting to a 
specific Sentry server instance from some app (usually another Hadoop service). 
In an HA scenario, the Sentry client relies on connection pooling (the 
SentryTransportPool class) to select one of several available, configured 
Sentry server instances. Whenever a connection fails, the Sentry client simply 
asks SentryTransportPool to a) invalidate this specific connection and b) 
provide another connection instead. There is no monitoring of Sentry server 
liveness per se. Each Sentry client finds out about a failure independently, 
and only when it tries to use the connection. Thus there may be no particular 
correlation between the time a connection failure is discovered and the time 
the Sentry server actually became unavailable. For example, a client can 
discover that an old connection has failed long after the Sentry server 
crashed and was restarted (possibly more than once!).

Intuitively, one would like to have a single log message per Sentry server 
crash/shutdown; but for the reasons above, it seems difficult, if not 
impossible, to group connections by the Sentry server instance they were 
opened against. It may therefore be hard to tell whether multiple connection 
failures are caused by "the same" Sentry server instance going down, and hence 
difficult to report exactly one connection failure per Sentry server 
shutdown/crash event.

Yet the desire to have visibility into such events in the field is 
understandable. At the same time, if we simply log every connection failure, 
such logging can be massive: there may be many concurrent connections to 
Sentry server(s) from the same app. Such logging would be less than useful.

The solution therefore has to rely on some less-than-perfect rules by which the 
number of connection failure logs can be contained. The alternative of 
introducing periodic pinging of the Sentry server and only logging ping 
failures would be possible as well (and it would be ideal if the Sentry server 
responded to pings with a server id initialized to the server start timestamp, 
which would solve the problem entirely), but it requires more radical changes.

The simplest solution seems to be as follows: since recovery of a failed Sentry 
server is likely to take some time, we do not need to be too clever; it may be 
enough to report connection failures to a given Sentry instance no more often 
than once every N (configurable) seconds. Once a connection failure to Sentry 
server instance X has been reported, another one won't be reported until N 
seconds have elapsed. This keeps the number of connection failure messages at 
bay. Such logs may still be confusing if a client attempts to use an old 
connection to an old server instance after some idle period, long after the 
problem has been fixed, but this is arguably still better than nothing.

Alternative suggestions are welcome.

  was:
Motivations: can think of several, but the immediate one is:

Sentry HA-specific: when the Sentry client fails over from one Sentry server to 
another, it does not print a message saying so. Such a client should print a 
simple, clear INFO-level message whenever it fails over from one Sentry server 
to another.

Design considerations:

"Sentry client" stands for a specific class instance capable of connecting to a 
specific Sentry server instance from some app (usually another Hadoop service). 
In an HA scenario, the Sentry client relies on connection pooling (the 
SentryTransportPool class) to select one of several available, configured 
Sentry server instances. Whenever a connection fails, the Sentry client simply 
asks SentryTransportPool to a) invalidate this specific connection and b) 
provide another connection instead. There is no monitoring of Sentry server 
liveness per se. Each Sentry client 
finds out about a failure independently and only at the time of trying to 
