Sangjin Lee commented on YARN-3134:

Sorry [~gtCarrera9] for the late comments. I took a quick look at the patch, 
and I have yet to delve deep into the SQL- and schema-related parts of the 
code. But I have some quick comments on other aspects:

- I'm curious, is there a strong reason to use the TimelineCollectorManager to 
obtain the writer? This would introduce a bi-directional (instance) 
dependency between the TimelineCollector and the TimelineCollectorManager, and 
it could be problematic. For example, the current timeline service performance 
benchmark tool uses TimelineCollector directly without creating a manager. Can 
we avoid this dependency?

- l.87: I wish you could use ThreadLocal directly, but I do get that you'd 
need to get hold of all the connections at the end when you stop the service.
- Please replace all StringBuffers with StringBuilders. StringBuffers should 
not be used as a rule as they do unnecessary synchronization.
- l.175: getConnection() is not thread safe because it uses an unsynchronized 
HashMap. Even though different threads operate on different keys, concurrent 
modification of a HashMap is still unsafe. You need to use ConcurrentHashMap 
or another thread-safe solution for this.
- l.198: I'm not quite sure why initializeData() should be called every time 
the service comes up. Shouldn't we do this only once at the very beginning when 
the tables do not exist? Also, the method name initializeData() is a bit 
misleading. I think initializeTables() would be a better name for this.
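
To illustrate the l.87 and l.175 points, here is a rough sketch of what I have 
in mind. It is not taken from the patch: the class name PerThreadConnections, 
the Supplier-based factory, and the size()/closeAll() methods are all 
hypothetical, and it is generic over AutoCloseable so that it does not assume 
anything about how the patch actually creates its connections. It keeps one 
connection per thread in a ConcurrentHashMap so that they can all be closed 
when the service stops.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical helper (not in the patch): tracks one connection per thread. */
final class PerThreadConnections<C extends AutoCloseable> {
  // ConcurrentHashMap tolerates concurrent inserts from different threads.
  // A plain HashMap can be corrupted by concurrent puts even on distinct keys.
  private final Map<Thread, C> connections = new ConcurrentHashMap<>();
  private final Supplier<C> factory;

  PerThreadConnections(Supplier<C> factory) {
    this.factory = factory;
  }

  /** Returns the calling thread's connection, creating it on first use. */
  C getConnection() {
    return connections.computeIfAbsent(Thread.currentThread(), t -> factory.get());
  }

  /** Number of connections currently tracked. */
  int size() {
    return connections.size();
  }

  /** Closes every tracked connection; intended to be called on service stop. */
  void closeAll() {
    IllegalStateException failure = null;
    for (C connection : connections.values()) {
      try {
        connection.close();
      } catch (Exception e) {
        // Keep closing the remaining connections and report all failures at the end.
        if (failure == null) {
          failure = new IllegalStateException("failed to close a connection", e);
        } else {
          failure.addSuppressed(e);
        }
      }
    }
    connections.clear();
    if (failure != null) {
      throw failure;
    }
  }
}
```

One caveat with this sketch: keying the map by Thread keeps a reference to 
every thread that ever asked for a connection until closeAll() is called, so it 
only makes sense if the set of writer threads is bounded.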

> [Storage implementation] Exploiting the option of using Phoenix to access 
> HBase backend
> ---------------------------------------------------------------------------------------
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Li Lu
>         Attachments: YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, 
> YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134DataSchema.pdf
> Quote the introduction on Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a 
> client-embedded JDBC driver targeting low latency queries over HBase data. 
> Apache Phoenix takes your SQL query, compiles it into a series of HBase 
> scans, and orchestrates the running of those scans to produce regular JDBC 
> result sets. The table metadata is stored in an HBase table and versioned, 
> such that snapshot queries over prior versions will automatically use the 
> correct schema. Direct use of the HBase API, along with coprocessors and 
> custom filters, results in performance on the order of milliseconds for small 
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify our implementation of reading/writing data from/to HBase, 
> and make it easy to build indexes and compose complex queries.

