[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14550688#comment-14550688
]
Junping Du commented on YARN-3411:
----------------------------------
bq. I was thinking it is not necessary since the entity information would come
in a more streaming fashion, one update at a time anyways. If say, one column
is written and other is not, the callee can retry again, hbase put will simply
over-write existing value.
It sounds reasonable. Let's keep it so far and check if necessary to change for
some corner cases (like we could update event and metrics at the same time for
some job's final counter) in future.
Thanks for addressing my previous comments. Latest patch looks much closer!
Some additional comments (besides Zhijie's comments above) on latest (006)
patch:
In TimelineCollectorManager.java,
{code}
+ @Override
+ protected void serviceStop() throws Exception {
+ super.serviceStop();
+ if (writer != null) {
+ writer.close();
+ }
+ }
{code}
We should stop running collectors before stopping the shared writer. Also, put
super.serviceStop(); to the last one to stop as conforming the common practice.
In EntityColumnFamilyDetails.java,
{code}
+ * @param rowKey
+ * @param entityTable
+ * @param inputValue
+ * @throws IOException
+ */
+ public void store(byte[] rowKey, BufferedMutator entityTable, String key,
+ String inputValue) throws IOException {
{code}
Lacking a param of key in javadoc.
In HBaseTimelineWriterImpl.java,
For write() which is synchronized semantics so far (we haven't implemented
async yet), we put each kind of entity to the table by calling
entityTable.mutate(...) which cache these entities locally and flush it later
under some conditions. Do we need to call entityTable.flush() for semantics of
strict synchronized writing? If not, at least, we should flush it at
serviceStop() as close() directly could lose some cached data.
{code}
@Override
protected void serviceStop() throws Exception {
super.serviceStop();
if (entityTable != null) {
LOG.info("closing entity table");
entityTable.close();
}
if (conn != null) {
LOG.info("closing the hbase Connection");
conn.close();
}
}
{code}
Also, as comments above, put super.serviceStop() as the last one to stop.
In Range.java,
{code}
[email protected]
[email protected]
+public class Range {
{code}
For class marked as private, we don't necessary to put InterfaceStability
annotation there.
In TimelineWriterUtils.java,
{code}
+ for (byte[] comp : components) {
+ finalSize += comp.length;
+ }
{code}
Null check for comp.
{code}
+ System.arraycopy(components[i], 0, buf, offset, components[i].length);
+ offset += components[i].length;
{code}
Null check for components[i].
{code}
+ * @param source
+ * @param separator
+ * @return byte[][] after splitting the input source
+ */
+ public static byte[][] split(byte[] source, byte[] separator, int limit) {
{code}
Missing param of limit in javadoc.
It sounds we didn't do any null check for separator in splitRanges(), but we do
null check in join(). It should keep consistent at least in one class.
{code}
+ public static long getValueAsLong(final byte[] key,
+ final Map<byte[], byte[]> taskValues) throws IOException {
+ byte[] value = taskValues.get(key);
+ if (value != null) {
+ Number val = (Number) GenericObjectMapper.read(value);
+ return val.longValue();
+ } else {
+ return 0L;
+ }
+ }
{code}
Shall we use Long instead of long? Or we cannot diff value with null and real 0.
> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Vrushali C
> Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch,
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch,
> YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt,
> YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt,
> YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native
> HBase schema for the write path. Such a schema does not exclude using
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in
> terms of performance, scalability, usability, etc. and make a call.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)