[
https://issues.apache.org/jira/browse/YARN-3663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15968218#comment-15968218
]
Carlo Curino edited comment on YARN-3663 at 4/13/17 11:17 PM:
--------------------------------------------------------------
Thanks [~giovanni.fumarola] for the updated patch.
Conf:
# You set
{{YarnConfiguration.DEFAULT_FEDERATION_STATESTORE_SQL_MAXCONNECTIONS}} to 1.
Isn't this too small? What values you commonly use? I would put a conservative
but not overly tight value, otherwise users are forced to learn/modify more
params.
# We should omit {{yarn.federation.state-store.sql.max-connections} from
yarn-default.xml like you do for all other params in this patch.
In {{SQLFederationStateStore}}
# For most fields except {{CALL_SP_GET_ALL_POLICY_CONFIGURATIONS}} you simply
use the plural to differentiate the getter of one or all items. Be consistent
(e.g., remove the ALL here, or add it everywhere else)
# Minor: userNameDB --> userName, passwordDB --> password
# When you throw the exceptions (e.g., subcluster registration), it might be
nice in the message to include the sub-clusterID / ip or any other info one can
use to debug.
# Can you comment on why we are using: {{FederationStateStoreErrorCode}}?
They don't seem to be connected to SQL error codes, and they are not used
anywhere else (we normally use named exception, which are easier to
understand/track).
# at line 277-278: formatting
# We should try to remove redundance, e.g., you have lots of things that look
like this:
{code}
try {
FederationMembershipStateStoreInputValidator
.validateGetSubClusterInfoRequest(subClusterRequest);
} catch (FederationStateStoreInvalidInputException e) {
FederationStateStoreUtils.logAndThrowInvalidInputException(LOG,
"Unable to obtain the infomation about the SubCluster. "
+ e.getMessage());
}
{code}
They could be factored out to
{{FederationMembershipStateStoreInputValidator.validate(subClusterRequest)}}
where the type of input param is used to differentiate the method, and the
logAndThrowInvalidInputException is done on that side. Same goes for
{{checkSubClusterInfo}}.
# Similarly to the above we should try to factor out the very repetitive code
to create connection/statements, set params, run, and throw. I don't have a
specific advise on this, but the code is mostly copy and paste, which we should
avoid.
# Move the {{fromStringToSubClusterState}} to the SubclusterState class (and
call it fromString().
# Why {{getSubCluster}} and {{getSubClusters}} use different mechanics for
return values? (registered params vs ResultSet)? Might be worth to be
consistent (probably using ResultSet).
# Line 540: Is this behavior (overwrite existing) consistent with general
YARN? (I think so, but want to check)
# Some of the log are a bit vague {{LOG.debug("Got the information about the
specified application }} say spefically what info where gotten
# if you use {{LOG.debug}} consider to prefix it with a check if we are in
debug mode (save time/objects creations for the String that are then not used).
# You have several {{ if (cstmt.getInt(2) != 1) }} ROWCOUNT checks. This
mix the no tuple where changed to multiple tuple where changed. Distinguishing
the two cases, might help debug (do we have duplicates in DB, or the entry was
not found). (Not mandatory, just somethign to consider)
# {{setPolicyConfiguration}} You are doing {{cstmt.setBytes(3, new
byte[policyConf.getParams().remaining()]);}} which adds an empty byte[] instead
of what is coming in input.
# {{getCurrentVersion}} and {{loadVersion}} throw a NotSupportedException or
something of the sort, a silent return null is easy to confuse people. (I know
the full version will be in V2, let's just have a clear breakage if someone try
to use this methods).
(.... to be continued ...)
(.... continued...)
In {{FederationStateStoreInputValidator}}:
# Please consider to rename the validate methods (ok to have separate JIRA
for this).
In {{FederationStateStoreUtils}}
# You log at info and debug level inconsistently for the {{set*}} you added.
I would suggest debug for all.
In {{HSQLDBFederationStateStore}}
# the empty constructor with super() is redundant, just omit the constructor
alltogether.
# I think the code would be more readable if all the schema was in a
well-formatted hsqldb-federation-schema.sql (or broken down like the SQLServer
one) and the code was reading from file the statements and executing them.
# The use of {{SELECT *}} is kind of dangerous, because it hides field
renames/moves and other schema evolution issues, that might lead to hard to
debug, I would use explicit named fields always
# shouldn't {{SP_SUBCLUSTERHEARTBEAT}} update the {{lastHeartBeat}} field?
More concerning, why do the tests miss this? If I am correct, you should fix
this and more importantly add coverage for this in the tests.
# Please comment on line 136 {{HAVING COUNT(*) = 0 }}, what are you doing
there?
# in {{init}} I don't like the silent catching of exception... why do we have
it? Since this is only used for testing I rather have a loud failure than a
silent moving on.
Store Procedures:
# Can you create only one or two files called sqlserver-federation-schema.sql
and sqlserver-federation-storeproc.sql? Makes the reading easier, and git will
do well handling row-level conflicts in most cases.
# {{applicationHomeSubcluster}} switch from locale to UTC time consistently
with other timestamps.
# {{policies}} the {{params}} should be much bigger than 512. This might
contain complex info about how each queue/user should be mapped across multiple
sub-clusters. I would argue for a fairly large amount of space (we should have
a small number of policies anyway). This is particularly important if/when we
will use things like Hekaton (since schema changes are majorly painful).
# {{sp_deleteExpiredApplicationsHomeSubCluster}} the -10 should not be
hard-coded. How was that chosen? What is that expressed in? I would argue this
should be an input param, and the value should come up from somewhere in yarn
conf.
# {{sp_deregisterSubCluster}} why is STATE configurable? Does it takes
different values?
# {{sp_registerSubCluster}} is {{@capability VARCHAR(6000)}} big enough? Same
question for the 256 char for the addresses.
# (NOT TO FIX NOW) I don't love the fact that if two subcluster are
mis-configured to clash on subclusterID we silently override each other
entries... I know this behavior is needed to support HA, but we should (in V2)
improve and detect issues like this.
# {{sp_getApplicationsHomeSubCluster, sp_getPoliciesConfigurations,
sp_getSubClusters}} use of {{SELECT *}} should be avoided.
# {{sp_getPolicyConfiguration, sp_setPolicyConfiguration}} size of {{param}}
should be increased.
# {{sp_getSubCluster}} why using the outparams instead of ResultSet?
TOP LEVEL CONCERNS:
# What is our schema-evolution story? Should we introduce views?
# A concern are SQL injections, for example what happens if in
{{SP_DELETEAPPLICATIONHOMESUBCLUSTER}} someone sends in an applicationId =
"(SELECT * FROM applications)" ? Do we remove all apps? (maybe bad example,
but you see what I mean, possibly this is the
FederationStateStoreInputValidator responsibility, let's double check it is
done properly).
# In v2 it would be nice to have a MySQL version as well, consider abstracting
some of the SQL stuff, are we architecturally ready to have multiple SQL
dialects for store proc etc?
# Do we have enough test coverage? E.g., the HSQLDB issues on updating
heartbeat is concerning. Please double check the tests cover enough of the
possible issues (you saw some in production, replicate them in the tests).
was (Author: curino):
Thanks [~giovanni.fumarola] for the updated patch.
Conf:
# You set
{{YarnConfiguration.DEFAULT_FEDERATION_STATESTORE_SQL_MAXCONNECTIONS}} to 1.
Isn't this too small? What values you commonly use? I would put a conservative
but not overly tight value, otherwise users are forced to learn/modify more
params.
# We should omit {{yarn.federation.state-store.sql.max-connections} from
yarn-default.xml like you do for all other params in this patch.
In {{SQLFederationStateStore}}
# For most fields except {{CALL_SP_GET_ALL_POLICY_CONFIGURATIONS}} you simply
use the plural to differentiate the getter of one or all items. Be consistent
(e.g., remove the ALL here, or add it everywhere else)
# Minor: userNameDB --> userName, passwordDB --> password
# When you throw the exceptions (e.g., subcluster registration), it might be
nice in the message to include the sub-clusterID / ip or any other info one can
use to debug.
# Can you comment on why we are using: {{FederationStateStoreErrorCode}}?
They don't seem to be connected to SQL error codes, and they are not used
anywhere else (we normally use named exception, which are easier to
understand/track).
# at line 277-278: formatting
# We should try to remove redundance, e.g., you have lots of things that look
like this:
{code}
try {
FederationMembershipStateStoreInputValidator
.validateGetSubClusterInfoRequest(subClusterRequest);
} catch (FederationStateStoreInvalidInputException e) {
FederationStateStoreUtils.logAndThrowInvalidInputException(LOG,
"Unable to obtain the infomation about the SubCluster. "
+ e.getMessage());
}
{code}
They could be factored out to
{{FederationMembershipStateStoreInputValidator.validate(subClusterRequest)}}
where the type of input param is used to differentiate the method, and the
logAndThrowInvalidInputException is done on that side. Same goes for
{{checkSubClusterInfo}}.
# Similarly to the above we should try to factor out the very repetitive code
to create connection/statements, set params, run, and throw. I don't have a
specific advise on this, but the code is mostly copy and paste, which we should
avoid.
# Move the {{fromStringToSubClusterState}} to the SubclusterState class (and
call it fromString().
# Why {{getSubCluster}} and {{getSubClusters}} use different mechanics for
return values? (registered params vs ResultSet)? Might be worth to be
consistent (probably using ResultSet).
# Line 540: Is this behavior (overwrite existing) consistent with general
YARN? (I think so, but want to check)
# Some of the log are a bit vague {{LOG.debug("Got the information about the
specified application }} say spefically what info where gotten
# if you use {{LOG.debug}} consider to prefix it with a check if we are in
debug mode (save time/objects creations for the String that are then not used).
# You have several {{ if (cstmt.getInt(2) != 1) }} ROWCOUNT checks. This
mix the no tuple where changed to multiple tuple where changed. Distinguishing
the two cases, might help debug (do we have duplicates in DB, or the entry was
not found). (Not mandatory, just somethign to consider)
# {{setPolicyConfiguration}} You are doing {{cstmt.setBytes(3, new
byte[policyConf.getParams().remaining()]);}} which adds an empty byte[] instead
of what is coming in input.
# {{getCurrentVersion}} and {{loadVersion}} throw a NotSupportedException or
something of the sort, a silent return null is easy to confuse people. (I know
the full version will be in V2, let's just have a clear breakage if someone try
to use this methods).
(.... to be continued ...)
(.... continued...)
In {{FederationStateStoreInputValidator}}:
# Please consider to rename the validate methods (ok to have separate JIRA
for this).
In {{FederationStateStoreUtils}}
# You log at info and debug level inconsistently for the {{set*}} you added.
I would suggest debug for all.
In {{HSQLDBFederationStateStore}}
# the empty constructor with super() is redundant, just omit the constructor
alltogether.
# I think the code would be more readable if all the schema was in a
well-formatted hsqldb-federation-schema.sql (or broken down like the SQLServer
one) and the code was reading from file the statements and executing them.
# The use of {{SELECT *}} is kind of dangerous, because it hides field
renames/moves and other schema evolution issues, that might lead to hard to
debug, I would use explicit named fields always
# I am not sure this is kosher {{SELECT applicationId_IN, homeSubCluster_IN}}
line 133. You are not actually returning the homeSubCluster that is in the DB,
just what you sent in input if the app is there. I believe you want to return
the fields applicationId an homeSubCluster and not the *_IN params.
# shouldn't {{SP_SUBCLUSTERHEARTBEAT}} update the {{lastHeartBeat}} field?
More concerning, why do the tests miss this? If I am correct, you should fix
this and more importantly add coverage for this in the tests.
# Please comment on line 136 {{HAVING COUNT(*) = 0 }}, what are you doing
there?
# in {{init}} I don't like the silent catching of exception... why do we have
it? Since this is only used for testing I rather have a loud failure than a
silent moving on.
Store Procedures:
# Can you create only one or two files called sqlserver-federation-schema.sql
and sqlserver-federation-storeproc.sql? Makes the reading easier, and git will
do well handling row-level conflicts in most cases.
# {{applicationHomeSubcluster}} switch from locale to UTC time consistently
with other timestamps.
# {{policies}} the {{params}} should be much bigger than 512. This might
contain complex info about how each queue/user should be mapped across multiple
sub-clusters. I would argue for a fairly large amount of space (we should have
a small number of policies anyway). This is particularly important if/when we
will use things like Hekaton (since schema changes are majorly painful).
# {{sp_deleteExpiredApplicationsHomeSubCluster}} the -10 should not be
hard-coded. How was that chosen? What is that expressed in? I would argue this
should be an input param, and the value should come up from somewhere in yarn
conf.
# {{sp_deregisterSubCluster}} why is STATE configurable? Does it takes
different values?
# {{sp_registerSubCluster}} is {{@capability VARCHAR(6000)}} big enough? Same
question for the 256 char for the addresses.
# (NOT TO FIX NOW) I don't love the fact that if two subcluster are
mis-configured to clash on subclusterID we silently override each other
entries... I know this behavior is needed to support HA, but we should (in V2)
improve and detect issues like this.
# {{sp_getApplicationsHomeSubCluster, sp_getPoliciesConfigurations,
sp_getSubClusters}} use of {{SELECT *}} should be avoided.
# {{sp_getPolicyConfiguration, sp_setPolicyConfiguration}} size of {{param}}
should be increased.
# {{sp_getSubCluster}} why using the outparams instead of ResultSet?
TOP LEVEL CONCERNS:
# What is our schema-evolution story? Should we introduce views?
# A concern are SQL injections, for example what happens if in
{{SP_DELETEAPPLICATIONHOMESUBCLUSTER}} someone sends in an applicationId =
"(SELECT * FROM applications)" ? Do we remove all apps? (maybe bad example,
but you see what I mean, possibly this is the
FederationStateStoreInputValidator responsibility, let's double check it is
done properly).
# In v2 it would be nice to have a MySQL version as well, consider abstracting
some of the SQL stuff, are we architecturally ready to have multiple SQL
dialects for store proc etc?
# Do we have enough test coverage? E.g., the HSQLDB issues on updating
heartbeat is concerning. Please double check the tests cover enough of the
possible issues (you saw some in production, replicate them in the tests).
> Federation State and Policy Store (DBMS implementation)
> -------------------------------------------------------
>
> Key: YARN-3663
> URL: https://issues.apache.org/jira/browse/YARN-3663
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager, resourcemanager
> Affects Versions: YARN-2915
> Reporter: Giovanni Matteo Fumarola
> Assignee: Giovanni Matteo Fumarola
> Attachments: YARN-3663-YARN-2915.v1.patch,
> YARN-3663-YARN-2915.v2.patch
>
>
> This JIRA tracks a SQL-based implementation of the Federation State and
> Policy Store, which implements YARN-3662 APIs.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]