Junping Du commented on YARN-3411:

bq. I was thinking it is not necessary since the entity information would come 
in a more streaming fashion, one update at a time anyways. If say, one column 
is written and other is not, the callee can retry again, hbase put will simply 
over-write existing value.
That sounds reasonable. Let's keep it as is for now and revisit if some corner 
case requires a change (for example, updating an event and metrics at the same 
time for a job's final counters) in the future.

Thanks for addressing my previous comments. The latest patch looks much closer! 
Some additional comments (besides Zhijie's comments above) on the latest patch (006):
In TimelineCollectorManager.java,
+  @Override
+  protected void serviceStop() throws Exception {
+    super.serviceStop();
+    if (writer != null) {
+      writer.close();
+    }
+  }
We should stop the running collectors before stopping the shared writer. Also, 
call super.serviceStop() last, conforming to common practice.
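To illustrate the suggested ordering, here is a minimal, self-contained sketch 
(the names Stoppable and serviceStop's parameters are hypothetical stand-ins; the 
real TimelineCollectorManager holds a map of collectors and a shared writer):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the suggested stop ordering: collectors first, then the shared
// writer, then the parent service. Stoppable is a hypothetical stand-in for
// the real collector/writer/service types.
public class StopOrderSketch {
  static final List<String> stopOrder = new ArrayList<>();

  interface Stoppable {
    void stop();
  }

  public static void serviceStop(List<Stoppable> collectors, Stoppable writer,
      Stoppable superService) {
    // 1. Stop the running collectors first so nothing writes anymore.
    for (Stoppable c : collectors) {
      c.stop();
    }
    // 2. Then close the shared writer the collectors wrote through.
    if (writer != null) {
      writer.stop();
    }
    // 3. Call super.serviceStop() last, per common practice.
    superService.stop();
  }

  public static void main(String[] args) {
    List<Stoppable> collectors = new ArrayList<>();
    collectors.add(() -> stopOrder.add("collector"));
    serviceStop(collectors, () -> stopOrder.add("writer"),
        () -> stopOrder.add("super"));
    System.out.println(stopOrder); // prints [collector, writer, super]
  }
}
```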

In EntityColumnFamilyDetails.java,
+   * @param rowKey
+   * @param entityTable
+   * @param inputValue
+   * @throws IOException
+   */
+  public void store(byte[] rowKey, BufferedMutator entityTable, String key,
+      String inputValue) throws IOException {
The javadoc is missing the @param tag for key.

In HBaseTimelineWriterImpl.java,
write() has synchronous semantics so far (we haven't implemented async yet), but 
we put each entity into the table by calling entityTable.mutate(...), which 
caches the mutations locally and flushes them later under certain conditions. 
Do we need to call entityTable.flush() for strictly synchronous write semantics? 
If not, we should at least flush in serviceStop(), since calling close() 
directly could lose some cached data.

  protected void serviceStop() throws Exception {
    if (entityTable != null) {
      LOG.info("closing entity table");
      ...
    }
    if (conn != null) {
      LOG.info("closing the hbase Connection");
      ...
    }
Also, as commented above, call super.serviceStop() last.
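A minimal sketch of the suggested flush-before-close ordering (FakeMutator and 
FakeConnection are hypothetical stand-ins for HBase's BufferedMutator and 
Connection, so the sketch is self-contained):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: flush locally buffered mutations before closing the table, close
// the connection afterwards, and call the parent's serviceStop() last.
public class WriterStopSketch {
  static final List<String> events = new ArrayList<>();

  static class FakeMutator {
    void flush() { events.add("flush"); }      // would push cached mutations
    void close() { events.add("closeTable"); }
  }

  static class FakeConnection {
    void close() { events.add("closeConn"); }
  }

  static class ParentService {
    protected void serviceStop() throws Exception { events.add("super"); }
  }

  static class Writer extends ParentService {
    FakeMutator entityTable = new FakeMutator();
    FakeConnection conn = new FakeConnection();

    @Override
    protected void serviceStop() throws Exception {
      if (entityTable != null) {
        entityTable.flush(); // flush first so cached data is not lost
        entityTable.close();
      }
      if (conn != null) {
        conn.close();
      }
      super.serviceStop();   // parent stop goes last
    }
  }

  public static void main(String[] args) throws Exception {
    new Writer().serviceStop();
    System.out.println(events); // prints [flush, closeTable, closeConn, super]
  }
}
```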

In Range.java,
+public class Range {
For a class marked as private, it isn't necessary to put an InterfaceStability 
annotation there.

In TimelineWriterUtils.java,

+    for (byte[] comp : components) {
+      finalSize += comp.length;
+    }
Add a null check for comp.

+      System.arraycopy(components[i], 0, buf, offset, components[i].length);
+      offset += components[i].length;
Add a null check for components[i].

+   * @param source
+   * @param separator
+   * @return byte[][] after splitting the input source
+   */
+  public static byte[][] split(byte[] source, byte[] separator, int limit) {
The javadoc is missing the @param tag for limit.

It seems we don't do any null check for separator in splitRanges(), but we do a 
null check in join(). We should at least be consistent within one class.
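A null-safe join() could look like the sketch below (a simplified, 
self-contained variant of the patch's TimelineWriterUtils.join, treating null 
components and a null separator as empty rather than throwing NPE):

```java
// Sketch of a null-safe join(): null components are treated as empty, and a
// null separator contributes nothing. Simplified from the patch for illustration.
public class TimelineWriterUtilsSketch {
  public static byte[] join(byte[] separator, byte[][] components) {
    int sepLen = (separator == null) ? 0 : separator.length;
    int finalSize = 0;
    for (byte[] comp : components) {
      if (comp != null) {              // suggested null check on each component
        finalSize += comp.length;
      }
    }
    if (components.length > 0) {
      finalSize += sepLen * (components.length - 1);
    }
    byte[] buf = new byte[finalSize];
    int offset = 0;
    for (int i = 0; i < components.length; i++) {
      if (components[i] != null) {     // suggested null check before arraycopy
        System.arraycopy(components[i], 0, buf, offset, components[i].length);
        offset += components[i].length;
      }
      if (separator != null && i < components.length - 1) {
        System.arraycopy(separator, 0, buf, offset, sepLen);
        offset += sepLen;
      }
    }
    return buf;
  }
}
```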

+  public static long getValueAsLong(final byte[] key,
+      final Map<byte[], byte[]> taskValues) throws IOException {
+    byte[] value = taskValues.get(key);
+    if (value != null) {
+      Number val = (Number) GenericObjectMapper.read(value);
+      return val.longValue();
+    } else {
+      return 0L;
+    }
+  }
Shall we use Long instead of long? Otherwise we cannot distinguish a missing 
value (null) from a real 0.
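The boxed-return variant could look like this sketch (GenericObjectMapper is 
replaced by a plain Long.parseLong so the example is self-contained; the 
byte[]-keyed map is simplified to String keys):

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Sketch: returning the boxed Long lets callers distinguish "key absent"
// (null) from a stored value of 0.
public class ValueReadSketch {
  public static Long getValueAsLong(final String key,
      final Map<String, byte[]> taskValues) {
    byte[] value = taskValues.get(key);
    if (value == null) {
      return null; // key missing: no value, rather than a fake 0
    }
    // Stand-in for GenericObjectMapper.read(value): decode a decimal string.
    return Long.parseLong(new String(value, StandardCharsets.UTF_8));
  }
}
```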

> [Storage implementation] explore the native HBase write schema for storage
> --------------------------------------------------------------------------
>                 Key: YARN-3411
>                 URL: https://issues.apache.org/jira/browse/YARN-3411
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Vrushali C
>            Priority: Critical
>         Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
> YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
> YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
> YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
> YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
> YARN-3411.poc.txt
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.

This message was sent by Atlassian JIRA
