[jira] [Comment Edited] (YARN-3663) Federation State and Policy Store (DBMS implementation)

Carlo Curino (JIRA) Thu, 13 Apr 2017 16:18:06 -0700

    [ 
https://issues.apache.org/jira/browse/YARN-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968218#comment-15968218
 ]


Carlo Curino edited comment on YARN-3663 at 4/13/17 11:17 PM:
--------------------------------------------------------------

Thanks [~giovanni.fumarola] for the updated patch.

Conf:
 # You set 
{{YarnConfiguration.DEFAULT_FEDERATION_STATESTORE_SQL_MAXCONNECTIONS}} to 1. 
Isn't this too small? What values you commonly use? I would put a conservative 
but not overly tight value, otherwise users are forced to learn/modify more 
params.
 # We should  omit {{yarn.federation.state-store.sql.max-connections} from 
yarn-default.xml like you do for all other params in this patch.
 
In {{SQLFederationStateStore}}
  # For most fields except {{CALL_SP_GET_ALL_POLICY_CONFIGURATIONS}} you simply 
use the plural to differentiate the getter of one or all items. Be consistent 
(e.g., remove the ALL here, or add it everywhere else)
  # Minor: userNameDB --> userName, passwordDB --> password
  # When you throw the exceptions (e.g., subcluster registration), it might be 
nice in the message to include the sub-clusterID / ip or any other info one can 
use to debug.
  # Can you comment on why we are using: {{FederationStateStoreErrorCode}}? 
They don't seem to be connected to SQL error codes, and they are not used 
anywhere else (we normally use named exception, which are easier to 
understand/track).
  # at line 277-278: formatting
  # We should try to remove redundance, e.g., you have lots of things that look 
like this:
  {code}
      try {
      FederationMembershipStateStoreInputValidator
          .validateGetSubClusterInfoRequest(subClusterRequest);
    } catch (FederationStateStoreInvalidInputException e) {
      FederationStateStoreUtils.logAndThrowInvalidInputException(LOG,
          "Unable to obtain the infomation about the SubCluster. "
              + e.getMessage());
    }
  {code}
  They could be factored out to 
{{FederationMembershipStateStoreInputValidator.validate(subClusterRequest)}} 
where the type of input param is used to differentiate the method, and the 
logAndThrowInvalidInputException is done on that side. Same goes for 
{{checkSubClusterInfo}}.
  # Similarly to the above we should try to factor out the very repetitive code 
to create connection/statements, set params, run, and throw. I don't have a 
specific advise on this, but the code is mostly copy and paste, which we should 
avoid.
  # Move the {{fromStringToSubClusterState}} to the SubclusterState class (and 
call it fromString(). 
  # Why {{getSubCluster}} and {{getSubClusters}} use different mechanics for 
return values? (registered params vs ResultSet)? Might be worth to be 
consistent (probably using ResultSet).
  # Line 540: Is this behavior (overwrite existing) consistent with general 
YARN? (I think so, but want to check) 
  # Some of the log are a bit vague {{LOG.debug("Got the information about the 
specified application }} say spefically what info where gotten
  # if you use {{LOG.debug}} consider to prefix it with a check if we are in 
debug mode (save time/objects creations for the String that are then not used).
  # You have several {{      if (cstmt.getInt(2) != 1) }} ROWCOUNT checks. This 
mix the no tuple where changed to multiple tuple where changed. Distinguishing 
the two cases, might help debug (do we have duplicates in DB, or the entry was 
not found). (Not mandatory, just somethign to consider)
  # {{setPolicyConfiguration}} You are doing {{cstmt.setBytes(3, new 
byte[policyConf.getParams().remaining()]);}} which adds an empty byte[] instead 
of what is coming in input.
  # {{getCurrentVersion}} and {{loadVersion}} throw a NotSupportedException or 
something of the sort, a silent return null is easy to confuse people. (I know 
the full version will be in V2, let's just have a clear breakage if someone try 
to use this methods).
  
  
  (.... to be continued ...)

  (.... continued...)
  In {{FederationStateStoreInputValidator}}:
  # Please consider to rename the validate methods (ok to have separate JIRA 
for this). 
  
  In {{FederationStateStoreUtils}}
  # You log at info and debug level inconsistently for the {{set*}} you added. 
I would suggest debug for all.
  
  In {{HSQLDBFederationStateStore}}
  # the empty constructor with super() is redundant, just omit the constructor 
alltogether.
  # I think the code would be more  readable if all the schema was in a 
well-formatted hsqldb-federation-schema.sql (or broken down like the SQLServer 
one) and the code was reading from file the statements and executing them. 
  # The use of {{SELECT *}} is kind of dangerous, because it hides field 
renames/moves and other schema evolution issues, that might lead to hard to 
debug, I would use explicit named fields always
  # shouldn't {{SP_SUBCLUSTERHEARTBEAT}} update the {{lastHeartBeat}} field? 
More concerning, why do the tests miss this? If I am correct, you should fix 
this and more importantly add coverage for this in the tests.
  # Please comment on line 136 {{HAVING COUNT(*) = 0 }}, what are you doing 
there?
  # in {{init}} I don't like the silent catching of exception... why do we have 
it? Since this is only used for testing I rather have a loud failure than a 
silent moving on. 

  
  Store Procedures:
  # Can you create only one or two files called sqlserver-federation-schema.sql 
and sqlserver-federation-storeproc.sql? Makes the reading easier, and git will 
do well handling row-level conflicts in most cases.
  # {{applicationHomeSubcluster}} switch from locale to UTC time consistently 
with other timestamps.
  # {{policies}} the {{params}} should be much bigger than 512. This might 
contain complex info about how each queue/user should be mapped across multiple 
sub-clusters. I would argue for a fairly large amount of space (we should have 
a small number of policies anyway). This is particularly important if/when we 
will use things like Hekaton (since schema changes are majorly painful).
  # {{sp_deleteExpiredApplicationsHomeSubCluster}} the -10 should not be 
hard-coded. How was that chosen? What is that expressed in? I would argue this 
should be an input param, and the value should come up from somewhere in yarn 
conf.
  # {{sp_deregisterSubCluster}} why is STATE configurable? Does it takes 
different values? 
  # {{sp_registerSubCluster}} is {{@capability VARCHAR(6000)}} big enough? Same 
question for the 256 char for the addresses.
  # (NOT TO FIX NOW) I don't love the fact that if two subcluster are 
mis-configured to clash on subclusterID we silently override each other 
entries... I know this behavior is needed to support HA, but we should (in V2) 
improve and detect issues like this. 
  # {{sp_getApplicationsHomeSubCluster, sp_getPoliciesConfigurations, 
sp_getSubClusters}} use of {{SELECT *}} should be avoided.
  # {{sp_getPolicyConfiguration, sp_setPolicyConfiguration}} size of {{param}} 
should be increased.
  # {{sp_getSubCluster}} why using the outparams instead of ResultSet?
 
  
TOP LEVEL CONCERNS:
 # What is our schema-evolution story? Should we introduce views?
 # A concern are SQL injections, for example what happens if in 
{{SP_DELETEAPPLICATIONHOMESUBCLUSTER}} someone sends in an applicationId = 
"(SELECT * FROM applications)" ? Do we remove all apps?  (maybe bad example, 
but you see what I mean, possibly this is the 
FederationStateStoreInputValidator responsibility, let's double check it is 
done properly).  
 # In v2 it would be nice to have a MySQL version as well, consider abstracting 
some of the SQL stuff, are we architecturally ready to have multiple SQL 
dialects for store proc etc?
 # Do we have enough test coverage? E.g., the HSQLDB issues on updating 
heartbeat is concerning. Please double check the tests cover enough of the 
possible issues (you saw some in production, replicate them in the tests).




was (Author: curino):
Thanks [~giovanni.fumarola] for the updated patch.

Conf:
 # You set 
{{YarnConfiguration.DEFAULT_FEDERATION_STATESTORE_SQL_MAXCONNECTIONS}} to 1. 
Isn't this too small? What values you commonly use? I would put a conservative 
but not overly tight value, otherwise users are forced to learn/modify more 
params.
 # We should  omit {{yarn.federation.state-store.sql.max-connections} from 
yarn-default.xml like you do for all other params in this patch.
 
In {{SQLFederationStateStore}}
  # For most fields except {{CALL_SP_GET_ALL_POLICY_CONFIGURATIONS}} you simply 
use the plural to differentiate the getter of one or all items. Be consistent 
(e.g., remove the ALL here, or add it everywhere else)
  # Minor: userNameDB --> userName, passwordDB --> password
  # When you throw the exceptions (e.g., subcluster registration), it might be 
nice in the message to include the sub-clusterID / ip or any other info one can 
use to debug.
  # Can you comment on why we are using: {{FederationStateStoreErrorCode}}? 
They don't seem to be connected to SQL error codes, and they are not used 
anywhere else (we normally use named exception, which are easier to 
understand/track).
  # at line 277-278: formatting
  # We should try to remove redundance, e.g., you have lots of things that look 
like this:
  {code}
      try {
      FederationMembershipStateStoreInputValidator
          .validateGetSubClusterInfoRequest(subClusterRequest);
    } catch (FederationStateStoreInvalidInputException e) {
      FederationStateStoreUtils.logAndThrowInvalidInputException(LOG,
          "Unable to obtain the infomation about the SubCluster. "
              + e.getMessage());
    }
  {code}
  They could be factored out to 
{{FederationMembershipStateStoreInputValidator.validate(subClusterRequest)}} 
where the type of input param is used to differentiate the method, and the 
logAndThrowInvalidInputException is done on that side. Same goes for 
{{checkSubClusterInfo}}.
  # Similarly to the above we should try to factor out the very repetitive code 
to create connection/statements, set params, run, and throw. I don't have a 
specific advise on this, but the code is mostly copy and paste, which we should 
avoid.
  # Move the {{fromStringToSubClusterState}} to the SubclusterState class (and 
call it fromString(). 
  # Why {{getSubCluster}} and {{getSubClusters}} use different mechanics for 
return values? (registered params vs ResultSet)? Might be worth to be 
consistent (probably using ResultSet).
  # Line 540: Is this behavior (overwrite existing) consistent with general 
YARN? (I think so, but want to check) 
  # Some of the log are a bit vague {{LOG.debug("Got the information about the 
specified application }} say spefically what info where gotten
  # if you use {{LOG.debug}} consider to prefix it with a check if we are in 
debug mode (save time/objects creations for the String that are then not used).
  # You have several {{      if (cstmt.getInt(2) != 1) }} ROWCOUNT checks. This 
mix the no tuple where changed to multiple tuple where changed. Distinguishing 
the two cases, might help debug (do we have duplicates in DB, or the entry was 
not found). (Not mandatory, just somethign to consider)
  # {{setPolicyConfiguration}} You are doing {{cstmt.setBytes(3, new 
byte[policyConf.getParams().remaining()]);}} which adds an empty byte[] instead 
of what is coming in input.
  # {{getCurrentVersion}} and {{loadVersion}} throw a NotSupportedException or 
something of the sort, a silent return null is easy to confuse people. (I know 
the full version will be in V2, let's just have a clear breakage if someone try 
to use this methods).
  
  
  (.... to be continued ...)

  (.... continued...)
  In {{FederationStateStoreInputValidator}}:
  # Please consider to rename the validate methods (ok to have separate JIRA 
for this). 
  
  In {{FederationStateStoreUtils}}
  # You log at info and debug level inconsistently for the {{set*}} you added. 
I would suggest debug for all.
  
  In {{HSQLDBFederationStateStore}}
  # the empty constructor with super() is redundant, just omit the constructor 
alltogether.
  # I think the code would be more  readable if all the schema was in a 
well-formatted hsqldb-federation-schema.sql (or broken down like the SQLServer 
one) and the code was reading from file the statements and executing them. 
  # The use of {{SELECT *}} is kind of dangerous, because it hides field 
renames/moves and other schema evolution issues, that might lead to hard to 
debug, I would use explicit named fields always
  # I am not sure this is kosher {{SELECT applicationId_IN, homeSubCluster_IN}} 
line 133. You are not actually returning the homeSubCluster that is in the DB, 
just what you sent in input if the app is there. I believe you want to return 
the fields applicationId an homeSubCluster and not the *_IN params.
  # shouldn't {{SP_SUBCLUSTERHEARTBEAT}} update the {{lastHeartBeat}} field? 
More concerning, why do the tests miss this? If I am correct, you should fix 
this and more importantly add coverage for this in the tests.
  # Please comment on line 136 {{HAVING COUNT(*) = 0 }}, what are you doing 
there?
  # in {{init}} I don't like the silent catching of exception... why do we have 
it? Since this is only used for testing I rather have a loud failure than a 
silent moving on. 

  
  Store Procedures:
  # Can you create only one or two files called sqlserver-federation-schema.sql 
and sqlserver-federation-storeproc.sql? Makes the reading easier, and git will 
do well handling row-level conflicts in most cases.
  # {{applicationHomeSubcluster}} switch from locale to UTC time consistently 
with other timestamps.
  # {{policies}} the {{params}} should be much bigger than 512. This might 
contain complex info about how each queue/user should be mapped across multiple 
sub-clusters. I would argue for a fairly large amount of space (we should have 
a small number of policies anyway). This is particularly important if/when we 
will use things like Hekaton (since schema changes are majorly painful).
  # {{sp_deleteExpiredApplicationsHomeSubCluster}} the -10 should not be 
hard-coded. How was that chosen? What is that expressed in? I would argue this 
should be an input param, and the value should come up from somewhere in yarn 
conf.
  # {{sp_deregisterSubCluster}} why is STATE configurable? Does it takes 
different values? 
  # {{sp_registerSubCluster}} is {{@capability VARCHAR(6000)}} big enough? Same 
question for the 256 char for the addresses.
  # (NOT TO FIX NOW) I don't love the fact that if two subcluster are 
mis-configured to clash on subclusterID we silently override each other 
entries... I know this behavior is needed to support HA, but we should (in V2) 
improve and detect issues like this. 
  # {{sp_getApplicationsHomeSubCluster, sp_getPoliciesConfigurations, 
sp_getSubClusters}} use of {{SELECT *}} should be avoided.
  # {{sp_getPolicyConfiguration, sp_setPolicyConfiguration}} size of {{param}} 
should be increased.
  # {{sp_getSubCluster}} why using the outparams instead of ResultSet?
 
  
TOP LEVEL CONCERNS:
 # What is our schema-evolution story? Should we introduce views?
 # A concern are SQL injections, for example what happens if in 
{{SP_DELETEAPPLICATIONHOMESUBCLUSTER}} someone sends in an applicationId = 
"(SELECT * FROM applications)" ? Do we remove all apps?  (maybe bad example, 
but you see what I mean, possibly this is the 
FederationStateStoreInputValidator responsibility, let's double check it is 
done properly).  
 # In v2 it would be nice to have a MySQL version as well, consider abstracting 
some of the SQL stuff, are we architecturally ready to have multiple SQL 
dialects for store proc etc?
 # Do we have enough test coverage? E.g., the HSQLDB issues on updating 
heartbeat is concerning. Please double check the tests cover enough of the 
possible issues (you saw some in production, replicate them in the tests).



> Federation State and Policy Store (DBMS implementation)
> -------------------------------------------------------
>
>                 Key: YARN-3663
>                 URL: https://issues.apache.org/jira/browse/YARN-3663
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager, resourcemanager
>    Affects Versions: YARN-2915
>            Reporter: Giovanni Matteo Fumarola
>            Assignee: Giovanni Matteo Fumarola
>         Attachments: YARN-3663-YARN-2915.v1.patch, 
> YARN-3663-YARN-2915.v2.patch
>
>
> This JIRA tracks a SQL-based implementation of the Federation State and 
> Policy Store, which implements YARN-3662 APIs.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Comment Edited] (YARN-3663) Federation State and Policy Store (DBMS implementation)

Reply via email to