[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369786#comment-15369786 ]

Hudson commented on YARN-3134:
--

SUCCESS: Integrated in Hadoop-trunk-Commit #10074 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10074/])
YARN-3134. Implemented Phoenix timeline writer to access HBase backend. (sjlee: rev 41fb5c738117ab65a2f152a13de8c85476acdc58)
* hadoop-yarn-project/hadoop-yarn/dev-support/findbugs-exclude.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollector.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/storage/PhoenixTimelineWriterImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/main/java/org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorManager.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/pom.xml
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/TestTimelineWriterImpl.java
* hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice/src/test/java/org/apache/hadoop/yarn/server/timelineservice/storage/TestPhoenixTimelineWriterImpl.java

> [Storage implementation] Exploiting the option of using Phoenix to access
> HBase backend
> ---
>
>                 Key: YARN-3134
>                 URL: https://issues.apache.org/jira/browse/YARN-3134
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Zhijie Shen
>            Assignee: Li Lu
>             Fix For: YARN-2928
>
>         Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf,
> YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch,
> YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch,
> YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch,
> YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch,
> YARN-3134-YARN-2928.005.patch, YARN-3134-YARN-2928.006.patch,
> YARN-3134-YARN-2928.007.patch, YARN-3134DataSchema.pdf,
> hadoop-zshen-nodemanager-d-128-95-184-84.dhcp4.washington.edu.out
>
> Quote the introduction on the Phoenix web page:
> {code}
> Apache Phoenix is a relational database layer over HBase delivered as a
> client-embedded JDBC driver targeting low latency queries over HBase data.
> Apache Phoenix takes your SQL query, compiles it into a series of HBase
> scans, and orchestrates the running of those scans to produce regular JDBC
> result sets. The table metadata is stored in an HBase table and versioned,
> such that snapshot queries over prior versions will automatically use the
> correct schema. Direct use of the HBase API, along with coprocessors and
> custom filters, results in performance on the order of milliseconds for small
> queries, or seconds for tens of millions of rows.
> {code}
> It may simplify our implementation's reads/writes from/to HBase, and lets us
> easily build indexes and compose complex queries.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14571606#comment-14571606 ]

Vrushali C commented on YARN-3134:
--

After evaluating both backend storage implementations, YARN-3134 (Phoenix-based HBase schema) and YARN-3411 (hybrid HBase schema: vanilla HBase tables in the direct write path and Phoenix-based tables for reporting), in terms of performance, scalability, usability, and maintenance, the conclusion is to use vanilla HBase tables in the direct write path. Attached to YARN-2928 is a write-up that describes how we ended up choosing the approach of writing to vanilla HBase tables (YARN-3411) in the direct write path.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538466#comment-14538466 ]

Vinod Kumar Vavilapalli commented on YARN-3134:
--

Tx folks, this is great progress!
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534291#comment-14534291 ]

Junping Du commented on YARN-3134:
--

bq. For now, I'm removing the connection cache to make the first step right. I'll change the description of YARN-3595 for the connection cache.

+1. The plan sounds reasonable.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533656#comment-14533656 ]

Zhijie Shen commented on YARN-3134:
--

Noticed pom.xml is using Phoenix 4.3.0. I retried with this version; the problem still happened.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533670#comment-14533670 ]

Li Lu commented on YARN-3134:
--

Sure, I'll make those cleanup changes.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533671#comment-14533671 ]

Li Lu commented on YARN-3134:
--

Looking into this. Seems like we need some more fine-tuning of the synchronization.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533600#comment-14533600 ]

Junping Du commented on YARN-3134:
--

bq. We've also got some code cleanup work to do, but I put them in a priority lower than getting the performance evaluation done for now.

I disagree. Actually, it usually takes a reviewer more time to identify these code style issues than it would take to simply fix them. Please respect that effort unless you have a different opinion on these comments.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533393#comment-14533393 ]

Li Lu commented on YARN-3134:
--

I just opened YARN-3595 to track all connection-cache-related discussions. I wrote a summary of the background of this problem. Please feel free to add more.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533481#comment-14533481 ]

Hadoop QA commented on YARN-3134:
--

| (/) *{color:green}+1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 14m 51s | Pre-patch YARN-2928 compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | install | 1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 34s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-server-timelineservice. |
| | | 25m 54s | |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12731281/YARN-3134-YARN-2928.006.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / d4a2362 |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7782/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7782/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7782/console |

This message was automatically generated.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533883#comment-14533883 ]

Li Lu commented on YARN-3134:
--

Looked into the concurrency bug. The problem is caused by concurrent operations on Connections racing with the Guava cache's removalListener calls: on cache eviction, active connections may be mistakenly closed. I believe a concurrent algorithm to resolve this is possible, but it is not trivial. For now, I'm removing the connection cache to make the first step right. I'll change the description of YARN-3595 to cover the connection cache. In this JIRA, I'm focusing on code cleanups.
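The hazard described above can be shown with a minimal, self-contained sketch. All names here are hypothetical, and a plain access-ordered LinkedHashMap stands in for the Guava cache: the point is only that an eviction hook which closes connections can close one a caller still holds.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for a Phoenix/JDBC connection (hypothetical, for illustration only).
class FakeConnection {
    private boolean open = true;
    void close() { open = false; }
    boolean isOpen() { return open; }
}

public class EvictionRaceSketch {
    public static void main(String[] args) {
        // Tiny LRU cache of capacity 1; the eviction hook closes the evicted
        // connection, mimicking the cache's removalListener.
        Map<String, FakeConnection> cache =
            new LinkedHashMap<String, FakeConnection>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, FakeConnection> e) {
                    if (size() > 1) {
                        e.getValue().close(); // eviction closes the connection
                        return true;
                    }
                    return false;
                }
            };
        FakeConnection held = new FakeConnection();
        cache.put("user-A", held);                  // a writer grabs this connection
        cache.put("user-B", new FakeConnection());  // eviction closes user-A's entry
        // The writer still holds a reference, but it is now unusable:
        System.out.println("held connection still open? " + held.isOpen()); // prints false
    }
}
```

A correct fix needs the eviction hook to know whether the connection is still in use (e.g. reference counting), which is why the comment calls the concurrent solution nontrivial and removes the cache for now.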
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533422#comment-14533422 ]

Hadoop QA commented on YARN-3134:
--

| (x) *{color:red}-1 overall{color}* |

|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch | 15m 7s | Pre-patch YARN-2928 compilation is healthy. |
| {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. |
| {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. |
| {color:green}+1{color} | install | 1m 33s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. |
| {color:green}+1{color} | findbugs | 0m 34s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests | 15m 20s | Tests failed in hadoop-yarn-server-timelineservice. |
| | | 41m 39s | |

|| Reason || Tests ||
| Timed out tests | org.apache.hadoop.yarn.server.timelineservice.storage.TestPhoenixTimelineWriterImpl |

|| Subsystem || Report/Notes ||
| Patch URL | http://issues.apache.org/jira/secure/attachment/12731261/YARN-3134-YARN-2928.006.patch |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / d4a2362 |
| hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build//artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt |
| Test Results | https://builds.apache.org/job/PreCommit-YARN-Build//testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | https://builds.apache.org/job/PreCommit-YARN-Build//console |

This message was automatically generated.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532948#comment-14532948 ] Junping Du commented on YARN-3134: -- Thanks [~gtCarrera9] for updating the patch! This looks much closer. In FileSystemTimelineWriterImpl.java, {code} - out = new PrintWriter(new BufferedWriter(new FileWriter(fileName, true))); + out = new PrintWriter(new BufferedWriter(new OutputStreamWriter( + new FileOutputStream(fileName),UTF-8))); {code} This is not necessary given YARN-3562 is already get in. Can you remove it? In PhoenixTimelineWriterImpl.java, For createTables(), {code} ... + cluster, user, flow_name, flow_version, flow_run DESC, app_id, + type, entity_id)); {code} For putting DESC in key of flow_run, I remember from the other doc in YARN-3411, Vrushali said we should put user as the first key to make sure timeline entities from a particular user can distributed less regions. I agree with that, so I think we should put DESC to user instead of flow_run as we don't really care the sequence of flow_run. [~vrushalic], can you also comment here? {code} + time UNSIGNED_LONG {code} Two spaces here, omit one. In method of storeEntityVariableLengthFields(), {code} if (entity.getConfigs() != null) { appendColumnsSQL(sqlColumns, new ColumnFamilyInfo( CONFIG_COLUMN_FAMILY, entity.getConfigs().keySet())); numPlaceholders += entity.getConfigs().keySet().size(); } {code} We should put the String as type to ColumnFamilyInfo, the same case as getInfo(), getIsRelatedToEntities() and getRelatesToEntities(), etc. Also, we are duplicated calling entity.getConfigs().keySet() twice which should be avoid. In addition, sqlColumns better rename to ColumnDefs which define primary key columns? 
{code} private static K StringBuilder appendColumnsSQL( StringBuilder colNames, ColumnFamilyInfoK cfInfo) { return appendColumnsSQL(colNames, cfInfo, VARCHAR); } {code} Looks like we are adding column families here rather than columns. May be we should rename it to appendColumnFamiliesToSQL? For setStringsForColumnFamily() and setBytesForColumnFamily(), it looks like most code there is duplicated, can we consolidate into one method? {code} @Private @VisibleForTesting void dropTable(String tableName) throws Exception { {code} Document this is only for test. [Storage implementation] Exploiting the option of using Phoenix to access HBase backend --- Key: YARN-3134 URL: https://issues.apache.org/jira/browse/YARN-3134 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Zhijie Shen Assignee: Li Lu Labels: BB2015-05-TBR Attachments: SettingupPhoenixstorageforatimelinev2end-to-endtest.pdf, YARN-3134-040915_poc.patch, YARN-3134-041015_poc.patch, YARN-3134-041415_poc.patch, YARN-3134-042115.patch, YARN-3134-042715.patch, YARN-3134-YARN-2928.001.patch, YARN-3134-YARN-2928.002.patch, YARN-3134-YARN-2928.003.patch, YARN-3134-YARN-2928.004.patch, YARN-3134-YARN-2928.005.patch, YARN-3134DataSchema.pdf Quote the introduction on Phoenix web page: {code} Apache Phoenix is a relational database layer over HBase delivered as a client-embedded JDBC driver targeting low latency queries over HBase data. Apache Phoenix takes your SQL query, compiles it into a series of HBase scans, and orchestrates the running of those scans to produce regular JDBC result sets. The table metadata is stored in an HBase table and versioned, such that snapshot queries over prior versions will automatically use the correct schema. Direct use of the HBase API, along with coprocessors and custom filters, results in performance on the order of milliseconds for small queries, or seconds for tens of millions of rows. 
{code} It may simplify our implementation's reads and writes from/to HBase, and we can easily build indexes and compose complex queries. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
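The row-key discussion above (put user first so one user's entities land in fewer regions; drop DESC from flow_run) can be sketched as Phoenix DDL. This is a hypothetical illustration, not the actual YARN-3134 schema; the table and column names are assumptions using the underscore style discussed later in the thread.

```java
// Hypothetical sketch of the primary-key ordering discussed above: user_id
// leads the row key, and no DESC modifier is applied to flow_run. Builds the
// DDL string only; it is not executed against a real Phoenix cluster here.
public class EntityTableDdl {
    static String createEntityTableSql(String tableName) {
        return "CREATE TABLE IF NOT EXISTS " + tableName + " ("
            + "user_id VARCHAR NOT NULL, "
            + "cluster_id VARCHAR NOT NULL, "
            + "flow_name VARCHAR NOT NULL, "
            + "flow_version VARCHAR NOT NULL, "
            + "flow_run UNSIGNED_LONG NOT NULL, "
            + "app_id VARCHAR NOT NULL, "
            + "entity_type VARCHAR NOT NULL, "
            + "entity_id VARCHAR NOT NULL, "
            + "CONSTRAINT pk PRIMARY KEY (user_id, cluster_id, flow_name, "
            + "flow_version, flow_run, app_id, entity_type, entity_id))";
    }

    public static void main(String[] args) {
        System.out.println(createEntityTableSql("TIMELINE_ENTITY"));
    }
}
```

Leading with user_id means all rows for one user are contiguous in the key space, which is the region-locality argument made above.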
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530885#comment-14530885 ] Junping Du commented on YARN-3134: -- bq. I have not made the change on the Phoenix connection string since, according to our previous discussion, we're planning to address this after we've decided which implementation to pursue in the future. It should be fine to keep the hard-coded JDBC connection address here. At the least, we should update the name and make it private instead of default (package-level) visibility, as no place outside of the class needs to use it for now. Some other comments against the latest patch: {code} + /** Default Phoenix JDBC driver name */ + static final String DRIVER_CLASS_NAME = "org.apache.phoenix.jdbc.PhoenixDriver"; + /** Default Phoenix timeline entity table name */ + static final String ENTITY_TABLE_NAME = "TIMELINE_ENTITY"; ... {code} Are we going to share these constants with HBaseTimelineWriterImpl? If not, we should mark them as private. In fact, quickly checking the patch in YARN-3411, I would expect we could reuse some of these constants, but that's not something we need to worry about now. {code} + private static final String[] PHOENIX_STORAGE_PK_LIST + = {"cluster", "user", "flow", "version", "run", "appid", "type", "entityid"}; + private static final String[] TIMELINE_EVENT_EXTRA_PK_LIST = + {"timestamp", "eventid"}; + private static final String[] TIMELINE_METRIC_EXTRA_PK_LIST = + {"metricid"}; {code} For key names, we should stay consistent with other places, i.e. replacing with cluster_id, user_id, flow_name, flow_version, etc., or maintaining this in the future will be a significant headache. In addition, for the key naming convention, it sounds like we don't follow any convention for now, e.g. lower/upper CamelCase or something else. 
From the examples of the Phoenix project (http://phoenix.apache.org/views.html or http://phoenix.apache.org/multi-tenancy.html), they use underscores as word separators. I would suggest conforming to the same style here. {code} +try { + conn.commit(); +} catch (SQLException se) { + LOG.error("Failed to close the phoenix connection! " + + se.getLocalizedMessage()); {code} The message is incorrect, as the exception does not happen in closing the connection, but in commit. Last but not least, it looks like we have many helper methods (appendColumnsSQL, setStringsForColumnFamily, setStringsForPrimaryKey, etc.) for constructing the SQL statements. However, these helper methods have no comments and lack unit tests to verify their correctness in all cases. 
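The underscore-separator convention suggested above could also be applied mechanically when mapping Java-style names to column names. A minimal hypothetical helper (not part of any patch in this thread):

```java
// Sketch: convert camelCase identifiers to the underscore-separated style
// used in the Phoenix examples cited above, e.g. "flowVersion" -> "flow_version".
public class ColumnNameStyle {
    static String toUnderscoreStyle(String name) {
        // Insert an underscore before each upper-case letter that follows a
        // lower-case letter or digit, then lower-case the whole name.
        return name.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase();
    }
}
```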
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14531544#comment-14531544 ] Hadoop QA commented on YARN-3134: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 15m 14s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | install | 1m 43s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 39s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 35s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 26m 39s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12730977/YARN-3134-YARN-2928.005.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 557a395 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7741/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7741/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7741/console | This message was automatically generated. 
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531398#comment-14531398 ] Zhijie Shen commented on YARN-3134: --- Some more comments: 1. Maybe it's better to commit the batch per entity. Otherwise, if one entity has some I/O error, all entities in this write call will fail? {code} 186 storeMetrics(entity, currContext, conn); 187 } 188 ps.executeBatch(); 189 conn.commit(); {code} 2. stmt doesn't need to be closed explicitly, but conn still does, right? {code} 277 conn.commit(); 278 } catch (SQLException se) { 279 LOG.error("Failed in init data " + se.getLocalizedMessage()); 280 throw se; 281 } {code} 3. Many of PhoenixTimelineWriterImpl's private functions can be static. bq. except for two points I'm okay if we want to defer it as the future stabilization work. 
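The per-entity commit suggested in point 1 above can be sketched with stand-in types. Conn and the entity strings here are hypothetical stubs, not the real Phoenix writer classes; the point is only the control flow: committing inside the loop isolates one entity's failure from the rest.

```java
import java.sql.SQLException;
import java.util.List;

public class PerEntityCommit {
    // Stand-in for a JDBC connection; only commit() matters for this sketch.
    interface Conn { void commit() throws SQLException; }

    // Returns how many entities were stored successfully.
    static int writeEntities(List<String> entities, Conn conn) {
        int stored = 0;
        for (String entity : entities) {
            try {
                // storeEntity(entity, conn); // real per-entity work would go here
                conn.commit();               // commit per entity, not once per batch
                stored++;
            } catch (SQLException se) {
                // One failed entity no longer fails the whole write call.
            }
        }
        return stored;
    }
}
```

With a single commit after the loop (as in the quoted patch lines), one SQLException would abort all entities in the call; here the other entities still land.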
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14531964#comment-14531964 ] Zhijie Shen commented on YARN-3134: --- bq. I'm relying on the underlying LoadingCache to do the close, which is now consistent with other connection's logic. I see, but do we need to create the table on a separate thread? At that moment the service is not started yet. Some more questions about the schema: {{parent VARCHAR, queue VARCHAR}} seem not to be necessary columns, as we store those in the info section. And we may want to rename some more columns: type -> entity_type; singledata -> single_data; time -> timestamp; and the table metric_singledata -> metric_single_data. Thoughts? 
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14529739#comment-14529739 ] Hadoop QA commented on YARN-3134: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 54s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | install | 1m 35s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 34s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 26m 15s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12730660/YARN-3134-YARN-2928.004.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 557a395 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7719/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7719/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7719/console | This message was automatically generated. 
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529377#comment-14529377 ] Li Lu commented on YARN-3134: - Thanks [~zjshen]! I'm addressing your comments. Meanwhile, there are some points that I'd like to discuss: bq. How do we choose the size and the expiry time? Oh, thanks for the reminder! Currently I just set some placeholders for frequent accesses (10-second clean-up time) and medium concurrency (16 threads). We may want to tune this case-by-case in the future. bq. If we use try with resources, do we still need to close stmt? Shall we close them in finally block? I believe this is why we need try-with-resources statements. See https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html bq. So why name and version should be combined and put it the same cell, but not be separated? I've also noticed the HBase implementation and the Phoenix implementation use those two fields in different ways (the HBase implementation is not using the version field, if I understand correctly). How do we want to use the version field? I was trying to use it as a part of the id to uniquely locate one flow run. bq. Does phoenix support numeric/decimal? Not sure if we should store the numbers in these types? Phoenix supports concrete number types such as INTEGER, BIGINT, SMALLINT, FLOAT, DOUBLE, etc., but nothing directly maps to java.lang.Number. 
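The try-with-resources behavior discussed above (the resource is closed when the block exits, even on an exception, and nothing is swallowed) can be demonstrated with a tiny stand-in resource; TrackedStatement below is a stub for this sketch, not a JDBC class.

```java
// Minimal demonstration of try-with-resources: close() runs automatically
// on both the normal and the exceptional exit path, and the exception still
// propagates out of the try block (there is no catch inside it).
public class TryWithResourcesDemo {
    static class TrackedStatement implements AutoCloseable {
        boolean closed = false;
        void execute(boolean fail) throws Exception {
            if (fail) throw new Exception("boom");
        }
        @Override public void close() { closed = true; }
    }

    static TrackedStatement run(boolean fail) {
        TrackedStatement stmt = new TrackedStatement();
        try (TrackedStatement s = stmt) {
            s.execute(fail);
        } catch (Exception e) {
            // The exception reaches this catch; close() has already run.
        }
        return stmt;
    }
}
```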
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14529167#comment-14529167 ] Zhijie Shen commented on YARN-3134: --- Li, thanks for updating the patch. Here are some comments about it. 1. How do we choose the size and the expiry time? {code} 113 connectionCache = CacheBuilder.newBuilder().maximumSize(16) 114 .expireAfterAccess(10, TimeUnit.SECONDS).removalListener( {code} 2. If we use try-with-resources, do we still need to close stmt? Shall we close them in a finally block? {code} 235 try (Statement stmt = conn.createStatement()) { {code} {code} 272 stmt.close(); 273 conn.commit(); 274 conn.close(); {code} 3. Seems to be a trivial method wrapper: {code} 292 private <K> StringBuilder appendVarcharColumnsSQL( 293 StringBuilder colNames, ColumnFamilyInfo<K> cfInfo) { 294 return appendColumnsSQL(colNames, cfInfo, VARCHAR); 295 } {code} 4. So why should name and version be combined and put in the same cell, rather than separated? {code} 345 ps.setString(idx++, 346 context.getFlowName() + STORAGE_SEPARATOR + context.getFlowVersion()); {code} 5. Seems not to be necessary: {code} 356 if (entity.getConfigs() == null && 357 entity.getInfo() == null && 358 entity.getIsRelatedToEntities() == null && 359 entity.getRelatesToEntities() == null) { 360 return; 361 } {code} 6. Should info be varbinary? {code} 245 + INFO_COLUMN_FAMILY + PHOENIX_COL_FAMILY_PLACE_HOLDER + VARCHAR, {code} 7. Should config be varchar? {code} 366 appendColumnsSQL(sqlColumns, new ColumnFamilyInfo( 367 CONFIG_COLUMN_FAMILY, entity.getConfigs().keySet()), VARBINARY); {code} 8. Does phoenix support numeric/decimal? Not sure if we should store the numbers in these types? {code} 268 + singledata VARBINARY {code} 9. In storeMetrics, assuming we only deal with the single-value case for now, I think it's better to check whether the metric is single-valued first. Another question here is whether we want to ignore the associated timestamp of the single value. 
Or we should add one more column to store the timestamp of this value. 10. W.r.t. the number of connections and threads: is it better to have the same number of connection threads as the number of app collectors, with the requests of one app routed to the same thread? This is because I remember somewhere we mentioned that we want isolation between apps. Otherwise, an app with more timeline data will occupy more writing capacity to the backend. /cc [~sjlee0] 11. In TestTimelineWriterImpl, can we cover the case where the entity has a non-String info value? 12. In TestPhoenixTimelineWriterImpl, can we verify that each cell is storing the right data? 
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14527835#comment-14527835 ] Hadoop QA commented on YARN-3134: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 51s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | install | 1m 33s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 34s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 26m 0s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12730332/YARN-3134-YARN-2928.003.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / 557a395 | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7698/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7698/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7698/console | This message was automatically generated. 
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527632#comment-14527632 ] Li Lu commented on YARN-3134: - And, one more thing: I'm closing all PreparedStatements implicitly in the try-with-resources statements. This statement will not swallow any exceptions (since there's no catch after it) but will guarantee the resource is released after the block's execution, even if there are exceptions. 
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14523303#comment-14523303 ] Junping Du commented on YARN-3134: -- Thanks [~gtCarrera9] for updating the patch! Some comments below: {code} + /** Connection string to the deployed Phoenix cluster */ + static final String CONN_STRING = "jdbc:phoenix:localhost:2181:/hbase"; {code} Do we need to make the port number configurable, given that the port could be occupied by other applications in a production environment? Also, CONN_STRING -> CONN_ADDR? {code} + connectionCache = CacheBuilder.newBuilder().maximumSize(16) + .expireAfterAccess(10, TimeUnit.SECONDS).removalListener( + removalListener) + .build(new CacheLoader<Thread, Connection>() { + @Override public Connection load(Thread key) throws Exception { + Connection conn = null; + try { + Class.forName(DRIVER_CLASS_NAME); + conn = DriverManager.getConnection(CONN_STRING); + conn.setAutoCommit(false); + } catch (SQLException se) { + LOG.error("Failed to connect to phoenix server! " + + se.getLocalizedMessage()); + } catch (ClassNotFoundException e) { + LOG.error("Class not found! " + e.getLocalizedMessage()); + } + return conn; + } + } + ); {code} Indentation issue here. Also, we shouldn't swallow fatal exceptions. Only logging the error is not enough; we need to throw the exception back. For the write method: - we should close the PreparedStatement in a finally block whenever we no longer need it, or it could prevent the JDBC connection from being physically closed later. - An IOException should be thrown when catching a SQLException; as commented above, only logging the error is not enough. For the tryInitTable method: rename it to createTable()? Also throw the exception out whenever a SQLException is caught. Two similar issues for the storeEntityVariableLengthFields/storeMetrics/storeEvents methods: 1) the PreparedStatement doesn't get closed, 2) exceptions get swallowed. Also, I see we are constructing SQL statements with complicated parameters. 
Do we need to log the SQL statements (at least at debug level)? In some cases, SQL executes successfully but is not what we expected, so I think it could be helpful for troubleshooting. 
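The "don't swallow fatal exceptions" point above can be sketched without Guava: a plain per-thread map whose loader lets the failure surface to the caller instead of caching a null connection. Everything here is a stand-in sketch (loadConnection() is hypothetical, returning a String token rather than a real JDBC Connection); the real code would do Class.forName(...) plus DriverManager.getConnection(...).

```java
import java.util.concurrent.ConcurrentHashMap;

// Sketch: cache one "connection" per thread id. computeIfAbsent propagates
// any exception thrown by the loader and records nothing in the map, so a
// failed load is visible to the caller rather than silently becoming null.
public class ConnectionCacheSketch {
    private final ConcurrentHashMap<Long, String> cache = new ConcurrentHashMap<>();

    String loadConnection(long threadId) {
        // Stand-in for driver loading + DriverManager.getConnection(...).
        if (threadId < 0) {
            throw new IllegalStateException("Failed to connect to phoenix server!");
        }
        return "conn-" + threadId;
    }

    String getConnection(long threadId) {
        return cache.computeIfAbsent(threadId, this::loadConnection);
    }
}
```

Guava's LoadingCache behaves similarly when the CacheLoader throws: the exception is propagated (wrapped) to the get() caller, which is exactly the behavior the review asks for instead of logging and returning null.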
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522678#comment-14522678 ] Hadoop QA commented on YARN-3134: - \\ \\ | (/) *{color:green}+1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 48s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | install | 1m 37s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 37s | The patch built with eclipse:eclipse. | | {color:green}+1{color} | findbugs | 0m 34s | The patch does not introduce any new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 22s | Tests passed in hadoop-yarn-server-timelineservice. | | | | 26m 8s | | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729627/YARN-3134-YARN-2928.002.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | YARN-2928 / b689f5d | | hadoop-yarn-server-timelineservice test log | https://builds.apache.org/job/PreCommit-YARN-Build/7562/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt | | Test Results | https://builds.apache.org/job/PreCommit-YARN-Build/7562/testReport/ | | Java | 1.7.0_55 | | uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7562/console | This message was automatically generated. 
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520520#comment-14520520 ] Hadoop QA commented on YARN-3134: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12729341/YARN-3134-YARN-2928.runJenkins.001.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 4c1af15 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7546/console | This message was automatically generated. 
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520599#comment-14520599 ] Hadoop QA commented on YARN-3134: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | pre-patch | 14m 47s | Pre-patch YARN-2928 compilation is healthy. | | {color:green}+1{color} | @author | 0m 0s | The patch does not contain any @author tags. | | {color:green}+1{color} | tests included | 0m 0s | The patch appears to include 2 new or modified test files. | | {color:green}+1{color} | whitespace | 0m 0s | The patch has no lines that end in whitespace. | | {color:red}-1{color} | javac | 7m 58s | The applied patch generated 8 additional warning messages. | | {color:green}+1{color} | javadoc | 9m 41s | There were no new javadoc warning messages. | | {color:green}+1{color} | release audit | 0m 22s | The applied patch does not increase the total number of release audit warnings. | | {color:red}-1{color} | checkstyle | 4m 5s | The applied patch generated 2 additional checkstyle issues. | | {color:green}+1{color} | install | 1m 39s | mvn install still works. | | {color:green}+1{color} | eclipse:eclipse | 0m 38s | The patch built with eclipse:eclipse. | | {color:red}-1{color} | findbugs | 0m 41s | The patch appears to introduce 10 new Findbugs (version 2.0.3) warnings. | | {color:green}+1{color} | yarn tests | 0m 23s | Tests passed in hadoop-yarn-server-timelineservice. 
| | | | 40m 18s | | \\ \\ || Reason || Tests || | FindBugs | module:hadoop-yarn-server-timelineservice | | | Found reliance on default encoding in org.apache.hadoop.yarn.server.timelineservice.storage.FileSystemTimelineWriterImpl.write(String, String, String, String, long, String, TimelineEntity, TimelineWriteResponse): new java.io.FileWriter(String, boolean) At FileSystemTimelineWriterImpl.java:[line 86] | | | org.apache.hadoop.yarn.server.timelineservice.storage.PhoenixTimelineWriterImpl.tryInitTable() may fail to clean up java.sql.Statement on checked exception; obligation to clean up resource created at PhoenixTimelineWriterImpl.java:[line 227] is not discharged | | | PhoenixTimelineWriterImpl.executeQuery(String) may fail to close Statement At PhoenixTimelineWriterImpl.java:[line 492] | | | A prepared statement is generated from a nonconstant String in PhoenixTimelineWriterImpl.storeEntityVariableLengthFields(TimelineEntity, TimelineCollectorContext, Connection) At PhoenixTimelineWriterImpl.java:[line 389] | | | A prepared statement is generated from a nonconstant String in PhoenixTimelineWriterImpl.storeEvents(TimelineEntity, TimelineCollectorContext, Connection) At PhoenixTimelineWriterImpl.java:[line 476] | | | A prepared statement is generated from a nonconstant String in PhoenixTimelineWriterImpl.storeMetrics(TimelineEntity, TimelineCollectorContext, Connection) At PhoenixTimelineWriterImpl.java:[line 433] | | | A prepared statement is generated from a nonconstant String in PhoenixTimelineWriterImpl.write(String, String, String, String, long, String, TimelineEntities) At PhoenixTimelineWriterImpl.java:[line 167] | | | PhoenixTimelineWriterImpl.setBytesForColumnFamily(PreparedStatement, Map, int) makes inefficient use of keySet iterator instead of entrySet
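Several of the findbugs warnings above flag prepared statements built from nonconstant Strings. The usual remedy is to keep the SQL template a compile-time constant and bind only values through `?` placeholders; dynamic identifiers (e.g. Phoenix dynamic columns) must still be concatenated, but can at least be validated first. A hedged sketch — the table name, column names, and the whitelist check are invented for illustration, not taken from the patch:

```java
/**
 * Sketch: keep the SQL template a compile-time constant and bind values
 * via placeholders. Table and column names are illustrative only.
 */
public class SqlTemplates {
  // Constant template: no values are concatenated into the SQL itself.
  static final String UPSERT_ENTITY =
      "UPSERT INTO timeline_entity (cluster, user_id, flow, run, entity_id) "
          + "VALUES (?, ?, ?, ?, ?)";

  /** Whitelist-style check for identifiers that must be inlined. */
  static boolean isSafeIdentifier(String name) {
    return name.matches("[A-Za-z_][A-Za-z0-9_]*");
  }

  public static void main(String[] args) {
    // Five bind positions, one per column.
    System.out.println(UPSERT_ENTITY.chars().filter(c -> c == '?').count());
    System.out.println(isSafeIdentifier("metric_value"));      // a valid identifier
    System.out.println(isSafeIdentifier("x; DROP TABLE t"));   // rejected
  }
}
```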
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518480#comment-14518480 ] Sangjin Lee commented on YARN-3134: --- {{conn.close()}} is called again on l.222 in the finally clause. I think we can remove the call on l.216 and rely on the call in the finally clause. 
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518383#comment-14518383 ] Sangjin Lee commented on YARN-3134: --- Thanks for the patch [~gtCarrera9]! Some quick comments on the patch: I'm a little confused as to whether Connection.close() should be called on every write. I see write() does not call Connection.close() but tryInitTable() calls it. Which is the expected pattern? - l.216: conn.close() is called twice - l.318: Are the flow name and the flow version concatenated without any separators? Wouldn't that be confusing? - l.479: is stmt.close() supposed to be called? For that matter, is it better to wrap the statement creation in executeQuery() and dropTable() with the auto-close try?
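The "auto-close try" Sangjin mentions is Java 7 try-with-resources, which closes the resource even when the body throws, so no explicit stmt.close() is needed. A self-contained sketch with a stand-in resource (the real code would use a java.sql.Statement; `TrackedStatement` and the simulated error are invented so the example runs without a database):

```java
/**
 * Sketch of try-with-resources: the resource is closed automatically,
 * even on an exception. TrackedStatement is a stand-in for Statement.
 */
public class AutoCloseSketch {
  static class TrackedStatement implements AutoCloseable {
    static boolean lastClosed = false;
    void execute(boolean fail) {
      if (fail) throw new RuntimeException("simulated SQL error");
    }
    @Override public void close() { lastClosed = true; }
  }

  static void run(boolean fail) {
    // stmt.close() is invoked when this block exits, normally or not.
    try (TrackedStatement stmt = new TrackedStatement()) {
      stmt.execute(fail);
    }
  }

  public static void main(String[] args) {
    run(false);
    System.out.println("closed: " + TrackedStatement.lastClosed);
    TrackedStatement.lastClosed = false;
    try { run(true); } catch (RuntimeException expected) { }
    System.out.println("closed after exception: " + TrackedStatement.lastClosed);
  }
}
```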
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14518458#comment-14518458 ] Li Lu commented on YARN-3134: - Hi [~sjlee0], thanks for the review! We do not need to call Connection.close() after every write, since connections are cached by the loading cache and will be closed upon removal. However, I treat tryInitTable as a special case because there is potentially less reuse after the table creation. Not sure if this makes sense, so I can change it once more. A quick question about this comment: bq. l.216: conn.close() is called twice I'm a little bit confused here because I was trying to close the connection after a commit (for potentially pending data). Could you please give me a hint on why it's closed twice? Thanks!
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516107#comment-14516107 ] Hadoop QA commented on YARN-3134: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12728671/YARN-3134-042715.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / feb68cb | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7516/console | This message was automatically generated. 
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514388#comment-14514388 ] Junping Du commented on YARN-3134: -- I would like to raise an important issue about reusing JDBC connections in PhoenixTimelineWriterImpl: it sounds like we only release/close these JDBC connections when the writer gets stopped. Given that the writer's lifecycle is the same as TimelineCollectorManager's (in the current design, which could change per the discussions above), which is almost the same as the RM's or NM's, this means we don't close/release any JDBC connections during the whole lifecycle of the NM/RM. That doesn't sound right, as JDBC connections are a pretty expensive and very limited resource (in the traditional DB case); Phoenix could be better since the client only serves the local node, but it could still be expensive with a large number of applications, especially for RMTimelineCollectorManager. In addition, caching the connections per thread also sounds problematic: these threads come from the collectors, and we cache them in a HashMap that could live forever, which could prevent the GC of these collectors even after they should be removed when their applications finish.
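One way to address the lifecycle concern above is to key cached connections by application rather than by thread, and close them explicitly when the application's collector is removed. The sketch below is a hedged illustration, not the patch's design: `CloseableConn` is a stand-in so it runs without Phoenix, and all names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch: per-application connection cache whose entries are closed and
 * dropped when the collector for that application is removed, instead of
 * living for the whole lifetime of the RM/NM.
 */
public class ConnCache {
  static class CloseableConn implements AutoCloseable {
    boolean open = true;
    @Override public void close() { open = false; }
  }

  private final Map<String, CloseableConn> byApp = new ConcurrentHashMap<>();

  CloseableConn get(String appId) {
    // computeIfAbsent is atomic: one connection per app, even under races.
    return byApp.computeIfAbsent(appId, id -> new CloseableConn());
  }

  /** Called when the application's collector is removed. */
  void release(String appId) {
    CloseableConn conn = byApp.remove(appId);
    if (conn != null) conn.close();
  }

  int size() { return byApp.size(); }

  public static void main(String[] args) {
    ConnCache cache = new ConnCache();
    CloseableConn c = cache.get("app_1");
    cache.release("app_1"); // collector removed: connection closed and freed
    System.out.println("open=" + c.open + " cached=" + cache.size());
  }
}
```

Because the map shrinks as applications finish, the cache no longer pins collector threads or connections for the daemon's lifetime.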
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511897#comment-14511897 ] Li Lu commented on YARN-3134: - Hi [~sjlee0], thanks for the review! I'll address your comments soon, but one thing to discuss: the original point of keeping writers inside managers, instead of collectors, is to reuse the storage layer connection. Having static writers in collectors may be too restrictive, while having a writer for each collector may accidentally introduce too many heavyweight storage layer connections. I think there might be some miscommunication in the collector design: I thought collectors would only hold the collection logic, and would always need some sort of manager to wrap other logic such as the web server and writers.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511976#comment-14511976 ] Sangjin Lee commented on YARN-3134: --- Actually my comment was not so much about collectors having their own writers. It might not be the best idea as you mentioned. The main thing I'm concerned about is the TimelineCollector's dependency on TimelineCollectorManager *just to get the writer*. The real need of the TimelineCollector is the writer, and true to the dependency injection principle, it is the writer that should be injected, either through the constructor or through the setter. Is that feasible?
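The injection Sangjin describes can be sketched as follows. The `Writer` and `Collector` types here are simplified stand-ins for TimelineWriter and TimelineCollector, not the actual YARN classes:

```java
/**
 * Sketch of injecting the writer into the collector directly, removing the
 * collector -> manager back-dependency. Simplified stand-in types.
 */
public class InjectionSketch {
  interface Writer { String write(String entity); }

  static class Collector {
    private final Writer writer; // injected; no manager reference needed
    Collector(Writer writer) { this.writer = writer; }
    String putEntity(String entity) { return writer.write(entity); }
  }

  public static void main(String[] args) {
    // A benchmark or test can now supply its own writer without a manager.
    Collector c = new Collector(e -> "wrote:" + e);
    System.out.println(c.putEntity("entity1"));
  }
}
```

With the writer as the only dependency, the performance benchmark tool mentioned earlier can drive a Collector directly, and the manager is just one of several possible suppliers of the writer.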
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14512080#comment-14512080 ] Zhijie Shen commented on YARN-3134: --- I'd like to raise another important issue. According to the schema, config/info/metrics values are written as strings. However, the current data model assumes that they are objects. Therefore they can be Integer, Long, or even nested structures that can be passed through the jackson ObjectMapper. If we write them as strings, how can we read them back and convert them into the corresponding objects? Shall we write them as byte[] instead? On the other hand, I'm not sure we can narrow config/info/metrics values down to String only, as we previously allowed users to put Integer/Long/Float/Double and so on directly into the entity instance. For the metrics, I even think the value shouldn't be a String but usually a decimal object; otherwise, I'm not sure how we should aggregate string values. bq. I'm concerned about is the TimelineCollector's dependency on TimelineCollectorManager just to get the writer. To Sangjin's question, I suggest we let the collector manager set the collector's writer: the Collector doesn't need to have the manager inside, but has a setWriter for the manager to call. Some other comments about the patch details: 1. Should we make it configurable? {code} static final String CONN_STRING = "jdbc:phoenix:localhost:2181:/hbase"; {code} 2. putEntities is invoked by multiple threads; however, HashMap is not thread safe. {code} private HashMap<Thread, Connection> connectionMap = null; {code} 3. When stopping the writer, should we wait until the current outstanding writes are finished? {code} @Override protected void serviceStop() throws Exception { // Close all Phoenix connections for (Connection conn : connectionMap.values()) { try { conn.close(); {code} 4. Shall we set auto commit to false, because we commit the batch? Or is that not necessary for Phoenix? {code} Connection conn = getConnection(Thread.currentThread()); {code} 5. So in the backend, we don't put the flow name and version in different columns? {code} + flow VARCHAR NOT NULL, run UNSIGNED_LONG NOT NULL, {code} {code} ps.setString(idx++, context.getFlowName() + context.getFlowVersion()); {code} 6. The conn seems not to be closed and removed after finishing the writes. It may hold a lot of unnecessary resources after running for a long time.
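On the byte[] question above: one option is to serialize arbitrary info/config values to bytes and back, so Integer/Long/nested values round-trip with their types intact instead of collapsing to strings. The sketch below uses plain Java serialization purely for illustration (the timeline service discussion mentions a Jackson ObjectMapper; the `ValueCodec` helper and its method names are hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

/** Sketch: round-trip arbitrary Serializable values through byte[]. */
public class ValueCodec {
  static byte[] toBytes(Object value) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
      oos.writeObject(value);
    }
    return bos.toByteArray();
  }

  static Object fromBytes(byte[] bytes)
      throws IOException, ClassNotFoundException {
    try (ObjectInputStream ois =
        new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return ois.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    // A Long comes back as a Long, not as its String form,
    // so numeric aggregation on metric values remains possible.
    Object back = fromBytes(toBytes(42L));
    System.out.println(back.getClass().getSimpleName() + " " + back);
  }
}
```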
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14511444#comment-14511444 ] Sangjin Lee commented on YARN-3134: --- Sorry [~gtCarrera9] for the late comments. I took a quick look at the patch, and I have yet to delve deep into the SQL- and schema-related parts of the code. But I have some quick comments on other aspects: (TimelineCollector.java) - I'm curious, is there a strong reason to use the TimelineCollectorManager to obtain the writer? This would introduce a bi-directional (instance) dependency between the TimelineCollector and the TimelineCollectorManager, and it could be problematic. For example, the current timeline service performance benchmark tool uses TimelineCollector directly without creating a manager. Can we avoid this dependency? (PhoenixTimelineWriterImpl.java) - l.87: I wish you could use ThreadLocal directly, but I do get that you'd need to get all the connections at the end when you stop it - Please replace all StringBuffers with StringBuilders. StringBuffers should not be used as a rule, as they do unnecessary synchronization. - l.175: getConnection() is not thread safe with an unsynchronized HashMap. Even though different threads would operate on different keys, it doesn't mean it will be thread safe with HashMap. You need to use ConcurrentHashMap for this, or another thread-safe concurrent solution. - l.198: I'm not quite sure why initializeData() should be called every time the service comes up. Shouldn't we do this only once at the very beginning, when the tables do not exist? Also, the method name initializeData() is a bit misleading. I think initializeTables() is the right name for this? 
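The ThreadLocal wish and the shutdown requirement in the comment above can be combined: a ThreadLocal hands each thread its own connection, while a concurrent registry tracks every connection created so a serviceStop() analogue can still close them all. A hedged sketch with a stand-in `Conn` type (so it runs without a database); all names are hypothetical:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch: ThreadLocal per-thread connections plus a concurrent registry so
 * every connection can still be closed at shutdown. Conn is a stand-in.
 */
public class ThreadLocalConns {
  static class Conn implements AutoCloseable {
    boolean open = true;
    @Override public void close() { open = false; }
  }

  // Thread-safe registry of every connection ever handed out.
  private final Set<Conn> all = ConcurrentHashMap.newKeySet();

  private final ThreadLocal<Conn> perThread = ThreadLocal.withInitial(() -> {
    Conn c = new Conn();
    all.add(c);
    return c;
  });

  /** Each calling thread gets (and keeps) its own connection. */
  Conn get() { return perThread.get(); }

  /** serviceStop() analogue: close everything we handed out. */
  void closeAll() {
    for (Conn c : all) c.close();
    all.clear();
  }

  int created() { return all.size(); }
}
```

This sidesteps the unsynchronized-HashMap race entirely, because the only shared structure is the concurrent set.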
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507914#comment-14507914 ] Vrushali C commented on YARN-3134: -- Hi [~gtCarrera9] Thanks for the patch, I had some questions: - I don't see the isRelatedTo and relatesTo entities being written in this patch. - For the metrics time series, I see that the metric values are being written as a ";"-separated list of values in a string; is that right? But I could not figure out where the timestamps associated with each metric value are stored. Storing metric values as strings would, I think, make numerical queries harder, e.g. how many entities had GC MILLIS that were more than 25% of the CPU MILLIS.
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507986#comment-14507986 ] Li Lu commented on YARN-3134: - Hi [~vrushalic], in the current version I have not implemented isRelatedTo and relatesTo. I can certainly add this section if it's required for the performance benchmark. My current plan is to use the metrics precision table for aggregations, and to use just the aggregated data for Phoenix SQL queries. I'm open-minded on both points, though.
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508005#comment-14508005 ] Vrushali C commented on YARN-3134: -- Hi [~gtCarrera9] Thanks! bq. in the current version I've not implemented isRelatedTo and relatesTo. I can certainly add this section if it's required for the performance benchmark. Yes, I think for the PoC we should write everything that the TimelineEntity class has to the backend store. bq. My current plan is to use the metrics precision table for aggregations, and just use the aggregated data for Phoenix SQL queries. Okay, I see. (For my understanding:) how would the query for, say, map-task-level metrics work? There won't be any aggregation at that level, no? I am also wondering how this metrics time-series information would be queried. Could you please explain how the timestamps are stored?
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508025#comment-14508025 ] Li Lu commented on YARN-3134: - Hi [~vrushalic], sure, will add isRelatedTo and relatesTo since YARN-3431 is close to being finished. For the metrics, my thought is that we may need some time-based aggregations, like taking the average (or max) of a few time-series data points and storing them in an aggregated table. The precision table for now serves as the raw data table. The user can query the aggregation table(s) for data points per hour, per day, and so on. Timestamp information is split into two parts: the time epoch information, marked by the startTime and endTime of the metric object, and the actual time for each point in the time series. Epoch start and end times are used as primary keys in the Phoenix storage for better indexing, and the detailed time for each point is stored with the time series. We can certainly discuss this design, though...
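The per-hour roll-up being proposed can be sketched in plain Java. This is illustrative only: the class and method names, and the hour granularity, are assumptions, not taken from the patch; the real aggregation would read from the precision table and write to an aggregation table via Phoenix.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.TimeUnit;

// Illustrative only: roll raw time-series points (timestampMillis -> value)
// up to per-hour averages, the kind of aggregate the precision table would
// feed into an aggregation table.
class HourlyAggregator {
  static Map<Long, Double> hourlyAverages(Map<Long, Double> rawSeries) {
    // hourStart -> {sum, count}
    Map<Long, double[]> buckets = new TreeMap<>();
    for (Map.Entry<Long, Double> point : rawSeries.entrySet()) {
      long hourStart =
          point.getKey() - point.getKey() % TimeUnit.HOURS.toMillis(1);
      double[] acc = buckets.computeIfAbsent(hourStart, h -> new double[2]);
      acc[0] += point.getValue();
      acc[1] += 1;
    }
    Map<Long, Double> averages = new TreeMap<>();
    for (Map.Entry<Long, double[]> bucket : buckets.entrySet()) {
      averages.put(bucket.getKey(),
          bucket.getValue()[0] / bucket.getValue()[1]);
    }
    return averages;
  }
}
```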
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505903#comment-14505903 ] Hadoop QA commented on YARN-3134: - \\ \\ | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:red}-1{color} | patch | 0m 0s | The patch command could not apply the patch during dryrun. | \\ \\ || Subsystem || Report/Notes || | Patch URL | http://issues.apache.org/jira/secure/attachment/12727009/YARN-3134-042115.patch | | Optional Tests | javadoc javac unit findbugs checkstyle | | git revision | trunk / 2c14690 | | Console output | https://builds.apache.org/job/PreCommit-YARN-Build/7435//console | This message was automatically generated.
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503800#comment-14503800 ] Zhijie Shen commented on YARN-3134: --- [~vrushalic], so entity relationship will also be included in the POC implementation, right?
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503427#comment-14503427 ] Vrushali C commented on YARN-3134: -- Hi [~zjshen] Actually, [~gtCarrera9], [~sjlee0] and I were just discussing this last week. I was thinking of storing relatesTo and isRelatedTo as a separator-joined (comma or some other separator) string that contains the list of entities as a single value, with the column key being isRelatedTo or relatesTo. That way, a reader that needs these values can look at the isRelatedTo or relatesTo columns.
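The separator-joined encoding being discussed might look like the following sketch. The class name, separator choice, and helper names are assumptions for illustration; a real implementation would also need to escape entity ids that can contain the separator character.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of encoding an isRelatedTo/relatesTo entity set as a
// single separator-joined cell value, and decoding it back on the read path.
class RelationEncoding {
  private static final String SEPARATOR = ",";

  static String encode(Set<String> entityIds) {
    return String.join(SEPARATOR, entityIds);
  }

  static Set<String> decode(String cell) {
    if (cell == null || cell.isEmpty()) {
      return new LinkedHashSet<>();
    }
    // LinkedHashSet preserves the stored order while deduplicating ids.
    return new LinkedHashSet<>(Arrays.asList(cell.split(SEPARATOR)));
  }
}
```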
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14503414#comment-14503414 ] Zhijie Shen commented on YARN-3134: --- I noticed that neither the Phoenix writer nor the HBase writer (YARN-3411) implements writing the entity relationship. However, as we may need more thought and discussion to sort out writing entity relationships, let's keep the current implementation focused on individual entity details: info, configs, events and metrics, and have a separate JIRA for storing entity relationships later. Does that sound good? /cc [~vrushalic]
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499326#comment-14499326 ] Li Lu commented on YARN-3134: - Hi [~djp] and [~zjshen], thanks a lot for the review! I'll fix them pretty soon and upload a new patch. For now, I'm focusing on correctness, readability, and exception handling. Does that plan sound good to you? Thanks!
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499264#comment-14499264 ] Zhijie Shen commented on YARN-3134: --- Some thoughts about the backend POC, not limited to the Phoenix writer but applying to the HBase writer too. 1. At the current stage, I suggest we focus on logic correctness and performance tuning. We may have multiple iterations between improving and benchmarking. 2. At the beginning we may not implement storing everything in a timeline entity (such as relationships), but we should at least make sure that what the Phoenix writer and the HBase writer have implemented is identical in terms of the data stored. 3. It would be good to have rich test suites like TimelineStoreTestUtils to ensure the robustness of the writers. Moreover, since that is black-box testing, we can use them to check whether the Phoenix writer and HBase writer behave the same. /cc [~vrushalic] For the Phoenix implementation only: I used the Phoenix writer in a real deployment, and I could see the implementation is not thread safe. A ConcurrentModificationException is thrown upon committing the statements.
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499398#comment-14499398 ] Li Lu commented on YARN-3134: - Hi [~zjshen], could you please provide some more information to reproduce the failures? The exception stack trace would also be helpful. I'm trying to set up a deployment, but would like to make sure we're seeing consistent problems. Thanks!
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14499764#comment-14499764 ] Junping Du commented on YARN-3134: -- bq. 1. At the current stage, I suggest we focus on logic correctness and performance tuning. We may have multiple iterations between improving and doing benchmark +1. We should get some performance data, which will help us better understand the direction and priorities. bq. For now, I'm focusing on correctness, readability, and exception handling. Does that plan sound good to you? Sounds like a good plan. Thanks [~gtCarrera9].
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14498924#comment-14498924 ] Junping Du commented on YARN-3134: -- Thanks [~gtCarrera9] for delivering a patch here! Just starting to look at the patch. Some initial comments so far: {code} String sql = "CREATE TABLE IF NOT EXISTS " + ENTITY_TABLE_NAME +
    "(cluster VARCHAR NOT NULL, user VARCHAR NOT NULL, " +
    "flow VARCHAR NOT NULL, run UNSIGNED_LONG NOT NULL, " +
    "appid VARCHAR NOT NULL, type VARCHAR NOT NULL, " +
    "entityid VARCHAR NOT NULL, " +
    "creationtime UNSIGNED_LONG, modifiedtime UNSIGNED_LONG, ...
stmt.executeUpdate(sql);
stmt.close();
conn.commit(); {code} Putting raw SQL statements together in this way sounds like a bit of a headache to me, as it will be difficult to debug and maintain in the future. Given that we could have more tables in the pipeline, we may want to refactor this in some way to be more maintainable. BTW, I don't think HBase supports atomic operations across multiple tables. Here we create 3 tables but issue only one commit, which means that if creating the 2nd table fails, the 1st table will still be created and committed successfully and won't be rolled back. Such partial success after commit doesn't sound like good practice to me. An additional problem is that we don't close the connection here, but we need to. {code} private class TimelineEntityCtxt { {code} TimelineEntityCtxt should be TimelineEntityContext; better not to abbreviate words in names (except very obvious ones, like conf for configuration). It also looks exactly the same as TimelineCollectorContext.java. Can we reuse that class instead of creating a new, duplicated one? {code} private <K, V> int setStringsForCf( {code} What does Cf mean? As I mentioned above, don't drop characters from a word in a method name; it breaks the code's readability. {code} private int setStringsForPk(PreparedStatement ps, String clusterId, String userId, {code} setStringsForPk should be setStringsForPrimaryKeys. {code} ResultSet executeQuery(String sql) {
  ResultSet rs = null;
  try {
    Statement stmt = conn.createStatement();
    rs = stmt.executeQuery(sql);
  } catch (SQLException se) {
    LOG.error("SQL exception! " + se.getLocalizedMessage());
  }
  return rs;
} {code} Does getLocalizedMessage() contain enough info (at least the SQL statement executed)? If not, I would prefer that we add the raw SQL statement to the error message when the exception is thrown. {code} // Execute and close
psConfigInfo.execute();
psConfigInfo.close(); {code} In many places like this we forget to put closable resources in a finally block. We should close them even when an exception is thrown. More comments may come later.
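The "close in a finally block" point above can be demonstrated with Java's try-with-resources, which guarantees close() runs even when the body throws. The FakeStatement type below is a toy stand-in for a JDBC PreparedStatement so the example is self-contained; real code would declare the actual Statement/Connection in the try header.

```java
// Toy AutoCloseable standing in for a JDBC PreparedStatement, used to show
// that try-with-resources closes the resource even when execute() throws.
class ClosableDemo {
  static class FakeStatement implements AutoCloseable {
    boolean closed = false;

    void execute(boolean fail) {
      if (fail) {
        throw new RuntimeException("simulated SQL failure");
      }
    }

    @Override
    public void close() {
      closed = true;
    }
  }

  // Runs one statement; returns whether it was closed despite the failure.
  static boolean closedAfterFailure() {
    FakeStatement stmt = new FakeStatement();
    try (FakeStatement s = stmt) {
      s.execute(true); // throws
    } catch (RuntimeException e) {
      // By the time we get here, s.close() has already run.
    }
    return stmt.closed;
  }
}
```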
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495180#comment-14495180 ] Li Lu commented on YARN-3134: - Hi [~vrushalic], thanks for the note! Yes, events and metrics are missing in the previous POC patch (actually, I should have called them work-in-progress patches). I'm currently working on them and am mostly done. About appending metrics and events: it would be very nice if we could distinguish the creation calls from the append and update calls. For the last two we could have a fast path that does not touch many fields and only works on the delta part of the entity. We can pretty much simulate that with our current writer API by setting unchanged TimelineEntity fields to null or empty (like events/info/metrics). However, in that case we need to provide some wrapper methods in our client to allow users to easily generate the patch, hopefully in user-friendly ways (such as appendMetric(myEntity, myContext, newMetric) or updateInfo(myEntity, myContext, infoKey, infoValue), just some quick examples...).
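The "delta entity" idea above can be sketched as a client-side helper that builds an entity carrying only the changed metric, leaving every other field empty so the writer can take a fast path. The MiniEntity type and the appendMetric signature are illustrative stand-ins, not the real TimelineEntity API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the delta-update wrapper idea: build a patch object
// that populates only the changed field, leaving everything else empty.
class DeltaClient {
  static class MiniEntity {
    String id;
    Map<String, Long> metrics = new HashMap<>();
    Map<String, String> info = new HashMap<>();
  }

  // Wrapper a client library might expose so users don't hand-build deltas.
  static MiniEntity appendMetric(String entityId, String metric, long value) {
    MiniEntity delta = new MiniEntity();
    delta.id = entityId;
    delta.metrics.put(metric, value); // only the changed field is populated
    return delta;                     // info/events stay empty: fast path
  }
}
```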
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14495519#comment-14495519 ] Hadoop QA commented on YARN-3134: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12725447/YARN-3134-041415_poc.patch against trunk revision fddd552. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7340//console This message is automatically generated.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14495133#comment-14495133 ] Vrushali C commented on YARN-3134:
Thanks [~gtCarrera9]!
bq. About posting metrics, I was thinking if it's possible to allow users just send the delta to storage, and we can use some information in the timeline entity to infer if the entity itself is already in the entity table? If that's possible then we can have some shortcut (not touching entity table) for faster metrics updating, which may generate the majority of our storage traffic.
Yes, we do need a way to update a single metric for an entity (regardless of other implementation aspects such as which table is used, or whether it's native HBase or Phoenix). We had one as part of the initial proposal for the TimelineWriter interface in YARN-3031, but per the review suggestions there we decided to add it later. I think we do need a writer interface for writing/updating a single metric.
Also, [~gtCarrera9], I think you haven't yet included the entity events in this patch? I recollect a discussion that we should add only some lifecycle events rather than all of them, but in any case the TimelineWriter implementation does need to write events to the backend. I think we may want an API that writes a single event to the backend as well.
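The single-metric update path discussed above could look roughly like the following sketch. Everything here (the class name, method signatures, and the in-memory store standing in for the backend) is hypothetical and only illustrates the shape of such an API; it is not the actual TimelineWriter interface from YARN-3031.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a single-metric update path. Class and method
// names are invented; this is not the real TimelineWriter interface.
class SingleMetricWriterSketch {
  // entityId -> (metricName -> (timestamp -> value)), standing in for the
  // backend table so the sketch is self-contained.
  private final Map<String, Map<String, TreeMap<Long, Long>>> store =
      new ConcurrentHashMap<>();

  // Update one metric of one entity without rewriting the whole entity.
  void writeMetric(String entityId, String metric, long ts, long value) {
    store.computeIfAbsent(entityId, k -> new ConcurrentHashMap<>())
         .computeIfAbsent(metric, k -> new TreeMap<>())
         .put(ts, value);
  }

  // Read back the most recent value of a metric, or null if absent.
  Long latest(String entityId, String metric) {
    Map<String, TreeMap<Long, Long>> metrics = store.get(entityId);
    if (metrics == null || !metrics.containsKey(metric)) {
      return null;
    }
    return metrics.get(metric).lastEntry().getValue();
  }

  public static void main(String[] args) {
    SingleMetricWriterSketch w = new SingleMetricWriterSketch();
    w.writeMetric("app_1", "MAP_SLOT_MILLIS", 100L, 5L);
    w.writeMetric("app_1", "MAP_SLOT_MILLIS", 200L, 9L);
    System.out.println(w.latest("app_1", "MAP_SLOT_MILLIS"));
  }
}
```

The point of the shape is that a metric update names only (entity, metric, timestamp, value), so the writer never has to touch the rest of the entity row.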
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493156#comment-14493156 ] Zhijie Shen commented on YARN-3134:
Li, thanks for uploading the POC patch. Here are some of my thoughts:
1. One entity may need multiple SQL statements to complete one entity write. Do we need to use a transaction? If not (or if Phoenix doesn't support it), how do we handle the case where the first SQL statement completes but the second doesn't? There will be only partial data. On the other hand, if we do use a transaction, will it significantly degrade write throughput?
2. In YARN-3448, Jonathan suggested a performance improvement for the LevelDB implementation: sequential writes may be quicker than random writes, so we can reorder the records to persist to make them as sequential as possible. In this case, is it better to write the entities one by one (including config and info), assuming the records are in sequence by PK?
3. To answer the offline question of writing multiple metrics, my thought is: a) sync write: no matter whether the user wraps a single metric or multiple metrics, we synchronously write it to the backend. b) async write: the server can respond to the client immediately, but it buffers the entity in a queue; later on, we merge these metrics together (not limited to metrics, but even whole entities) and asynchronously write them to the backend.
4. Do we have a simple deployment, meaning that by default the backend will start an HBase on the local FS and have the Phoenix lib installed? This relates to the question of what the default backend of the timeline service is. If it were a single-node HBase on the local FS, we should make sure it is automatically deployed with default configs.
Thoughts?
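Option 3(b) above, buffering entities and merging their metric updates before an asynchronous flush, could be sketched roughly as follows. The class and method names are invented for illustration and are not the collector's actual code path.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of option 3(b): buffer updates per entity and merge
// metrics before an asynchronous flush. Names are hypothetical.
class AsyncMergeBuffer {
  // entityId -> (metricName -> latest buffered value)
  private final Map<String, Map<String, Long>> buffer = new HashMap<>();

  // Accept an update immediately; a later update to the same metric
  // overwrites the earlier one, so each flush writes a metric at most once.
  synchronized void enqueue(String entityId, String metric, long value) {
    buffer.computeIfAbsent(entityId, k -> new HashMap<>()).put(metric, value);
  }

  // Drain the merged updates. A real implementation would issue the
  // backend writes here instead of returning the map.
  synchronized Map<String, Map<String, Long>> flush() {
    Map<String, Map<String, Long>> merged = new HashMap<>(buffer);
    buffer.clear();
    return merged;
  }
}
```

The client gets an immediate response from enqueue(), while a background thread calling flush() periodically performs the (fewer, larger) backend writes; the trade-off is that metric values between flushes are lost on a crash.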
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493215#comment-14493215 ] Zhijie Shen commented on YARN-3134:
Another issue: it seems we don't need to build against Phoenix 4.3, as the timeline service writer implementation uses the generic JDBC interface. We may not even need to include it in the Hadoop distribution, though that contradicts the aforementioned simple deployment. The benefit, however, is that users can choose to deploy with Phoenix 2.x - HBase 0.94.x, Phoenix 3.x - HBase 0.94.x, or Phoenix 4.x - HBase 0.98.1+.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14493235#comment-14493235 ] Li Lu commented on YARN-3134:
Hi [~vrushalic] and [~zjshen]! Thanks for the comments!
About [~vrushalic]'s questions: in this POC patch I'm not adding metrics info yet, but that's my next step. I'm storing configs in the entity table under a separate column family, CONFIG_COLUMN_FAMILY. Each config item C(k, v) for a PK is stored at column CONFIG_COLUMN_FAMILY.k, row PK, with value v.
bq. One entity may need multiple sql sentences to complete one entity write. Do we need to use transaction?
That's a very good question that I'm not sure about the answer to right now. Currently we write one entity (with a PK) with two writes, one with only the static columns (C_s) and the other with only the dynamic columns (C_d). HBase guarantees row-level atomicity for each write, so I assume the result after the two calls will be (PK, C_s), (PK, C_d), or (PK, C_s, C_d). The last one is the best case, of course.
bq. In this case, is it better to write the entity one-by-one (including config, info), assuming the records are in sequence by PK?
I think you're right. I will look into this improvement.
About deployment: for end users we can either ship a predefined version of Phoenix+HBase for simpler deployment, or we can allow users to specify the classpath for the Phoenix JDBC driver and choose a version of Phoenix+HBase in a customized way. The latter unavoidably introduces some deployment difficulties, but offers more freedom. For now, I think our short-term focus is to wrap miniclusters to allow UTs to pass in our branch (to be prepared for a branch merge).
About posting metrics, I was thinking whether it's possible to allow users to just send the delta to storage, and whether we can use some information in the timeline entity to infer if the entity itself is already in the entity table. If that's possible, then we can have a shortcut (not touching the entity table) for faster metric updates, which may generate the majority of our storage traffic.
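The dynamic-column write described above can be illustrated by composing the UPSERT statement. The table name, the "entity_id" PK column, and the "c." prefix are all invented for this sketch (the real schema lives in PhoenixTimelineWriterImpl); only the general shape of Phoenix's dynamic-column syntax is being shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: composing a Phoenix UPSERT that writes config items as dynamic
// columns. Table/column names are hypothetical, not the real schema.
class DynamicColumnSql {
  static String upsertConfigs(String table, Map<String, String> configs) {
    StringJoiner cols = new StringJoiner(", ");
    StringJoiner vals = new StringJoiner(", ");
    cols.add("entity_id"); // static PK column from the CREATE TABLE
    vals.add("?");
    for (String key : configs.keySet()) {
      // Phoenix dynamic columns are declared inline with a type.
      cols.add("\"c." + key + "\" VARCHAR");
      vals.add("?");
    }
    return "UPSERT INTO " + table + " (" + cols + ") VALUES (" + vals + ")";
  }

  public static void main(String[] args) {
    Map<String, String> cfg = new LinkedHashMap<>();
    cfg.put("pig.job.submitted.timestamp", "1340218739371");
    System.out.println(upsertConfigs("ENTITY", cfg));
  }
}
```

Each config key becomes its own dynamic column, which is why a second statement (with different dynamic columns) is needed for the other column family, leading to the two-write atomicity question discussed above.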
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14491173#comment-14491173 ] Vrushali C commented on YARN-3134:
Hi [~gtCarrera9],
Thanks for the patch! I took a quick look and had two questions:
1) Are you also writing the entity metrics to the backend? I see the primary key, creation/modification times, and configs being written, but could not see entity.getMetrics() anywhere.
2) How are the config column names being written? Are all the columns joined by a separator? If I could see an example that would be great. For example, say a config value of pig.job.submitted.timestamp has a string value of 1340218739371: how does it show up in the column family for configs?
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490804#comment-14490804 ] Hadoop QA commented on YARN-3134:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724738/YARN-3134-041015_poc.patch against trunk revision 2c17da4.
{color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7310//console
This message is automatically generated.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14490791#comment-14490791 ] Hadoop QA commented on YARN-3134:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12724737/YARN-3134-041015_poc.patch against trunk revision 2c17da4.
{color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-YARN-Build/7309//console
This message is automatically generated.
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349365#comment-14349365 ] Vrushali C commented on YARN-3134:
Hi [~swagle]
bq. Do the responses to these API calls return any timeseries data?: _GetFlowByAppId_ and _GetAppDetails_
No, not for these two. These are specific to that Hadoop application id, which is unique.
bq. The set of access patterns do not cover query directly by a metricName. Is there a use case for this? (Note: General use case for driving graphs)
In hRaven, we usually fetch everything for a given flow and time range and allow filtering/searching in the UI when querying for a particular metric.
bq. Do you use the hbase native timestamp for querying? This is an obvious optimization for timeseries data.
No, we don't use that one at all. We store the submit time of a flow as the run id in the row key (as well as in columns). The row key for job history is
{code} cluster ! user ! flow name ! run id ! app id {code}
where run id is the submit time of the flow. It is stored as an inverted long, which helps maintain the sorting such that the most recent flow runs are stored first for that flow. When querying for a time series or a time range, having this inverted long in the row key helps to set the start and stop rows for the scan so that it's time bound. E.g.: https://github.com/twitter/hraven/blob/master/hraven-core/src/main/java/com/twitter/hraven/datasource/JobHistoryService.java#L277
bq. however how do you handle out of band data
I'm sorry, I didn't get what out-of-band data is?
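The inverted-long trick described above can be sketched in a few lines; the class name is invented, but the arithmetic matches hRaven's description (Long.MAX_VALUE minus the submit time).

```java
// Sketch of hRaven's inverted-timestamp trick: storing
// Long.MAX_VALUE - submitTime in the row key makes the most recent runs
// sort first under HBase's ascending byte order. Class name is invented.
class InvertedTimestamp {
  static long invert(long ts) {
    return Long.MAX_VALUE - ts;
  }

  public static void main(String[] args) {
    long older = 1340218739371L;
    long newer = 1340218999999L;
    // The newer submit time maps to the smaller inverted value, so an
    // ascending scan returns the most recent run first.
    System.out.println(invert(newer) < invert(older)); // prints "true"
  }
}
```

Because the transform is its own inverse, the original submit time is recovered by applying invert() to the stored value again when reading a row back.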
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349302#comment-14349302 ] Siddharth Wagle commented on YARN-3134:
Thanks [~vrushalic]
*Questions*:
- Do the responses to these API calls return any timeseries data?: _GetFlowByAppId_ and _GetAppDetails_
- The set of access patterns do not cover query directly by a metricName. Is there a use case for this? (Note: General use case for driving graphs)
- Do you use the hbase native timestamp for querying? This is an obvious optimization for timeseries data, however how do you handle out of band data?
[jira] [Commented] (YARN-3134) [Storage implementation] Exploiting the option of using Phoenix to access HBase backend
[ https://issues.apache.org/jira/browse/YARN-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347858#comment-14347858 ] Vrushali C commented on YARN-3134:
There is a draft of some flow (and user and queue) based queries to be supported, put up on JIRA YARN-3050, that could help us with the schema design: https://issues.apache.org/jira/secure/attachment/12695071/Flow%20based%20queries.docx
Sharing the schema of some of the HBase tables in hRaven (detailed schema at https://github.com/twitter/hraven/blob/master/bin/create_schema.rb):
{code}
create 'job_history', {NAME => 'i', COMPRESSION => 'LZO'}
create 'job_history_task', {NAME => 'i', COMPRESSION => 'LZO'}

# job_history (indexed) by jobId table contains 1 column family:
# i: job-level information specifically the rowkey into the
create 'job_history-by_jobId', {NAME => 'i', COMPRESSION => 'LZO'}

# job_history_app_version - stores all version numbers seen for a single app ID
# i: info -- version information
create 'job_history_app_version', {NAME => 'i', COMPRESSION => 'LZO'}

# job_history_agg_daily - stores daily aggregated job info
# the s column family has a TTL of 30 days, it's used as a scratch col family
# it stores the run ids that are seen for that day
# we assume that a flow will not run for more than 30 days, hence it's fine to expire that data
create 'job_history_agg_daily', {NAME => 'i', COMPRESSION => 'LZO', BLOOMFILTER => 'ROWCOL'},
  {NAME => 's', VERSIONS => 1, COMPRESSION => 'LZO', BLOCKCACHE => false, TTL => '2592000'}

# job_history_agg_weekly - stores weekly aggregated job info
# the s column family has a TTL of 30 days
# it stores the run ids that are seen for that week
# we assume that a flow will not run for more than 30 days, hence it's fine to expire that data
create 'job_history_agg_weekly', {NAME => 'i', COMPRESSION => 'LZO', BLOOMFILTER => 'ROWCOL'},
  {NAME => 's', VERSIONS => 1, COMPRESSION => 'LZO', BLOCKCACHE => false, TTL => '2592000'}
{code}
job_history is the main
table. Its row key is cluster!user!application!timestamp!jobID, where cluster, user, and application are stored as strings, and timestamp and jobID are stored as longs.
cluster - unique cluster name (e.g. "cluster1@dc1")
user - user running the application ("edgar")
application - application ID (aka flow name) derived from the job configuration: uses the "batch.desc" property if set, otherwise parses a consistent ID from "mapreduce.job.name"
timestamp - inverted (Long.MAX_VALUE - value) value of the submission time. Storing the value as an inverted timestamp ensures the latest jobs are stored first for that cluster!user!app, which enables faster retrieval of the more recent jobs for the flow.
jobID - stored as the Job Tracker/Resource Manager start time (long), concatenated with the job sequence number: job_201306271100_0001 - [1372352073732L][1L]
How the columns are named in hRaven:
- Each key in the job history file becomes the column name. For example, finishedMaps would be stored as
{code} column=i:finished_maps, timestamp=1425515902000, value=\x00\x00\x00\x00\x00\x00\x00\x05 {code}
In the output above, the timestamp is the HBase cell timestamp.
- We store the configuration information with a column name prefix of c!
{code} column=i:c!yarn.sharedcache.manager.client.thread-count, timestamp=1425515902000, value=50 {code}
- Each counter is stored with a prefix of g!, gr!, or gm!
{code}
For reducer counters, there is a prefix of gr!
column=i:gr!org.apache.hadoop.mapreduce.TaskCounter!SPILLED_RECORDS, timestamp=1425515902000, value=\x00\x00\x00\x00\x00\x00\x00\x02
For mapper counters, there is a prefix of gm!
column=i:gm!org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter!BYTES_READ, timestamp=1425515902000, value=\x00\x00\x00\x00\x00\x00\x00\x02
{code}
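The job_history row key layout described above can be sketched as a small helper. The class name and the all-string encoding are simplifications for illustration: hRaven serializes the timestamp and job ID as raw longs in the byte-encoded key, not as text.

```java
// Sketch of composing the hRaven job_history row key described above
// (cluster!user!application!invertedTimestamp!jobId). The helper name and
// the all-string encoding are simplifications: hRaven serializes the
// timestamp and job ID as raw longs, not text.
class JobHistoryRowKey {
  static final String SEP = "!";

  static String rowKey(String cluster, String user, String app,
                       long submitTime, String jobId) {
    long inverted = Long.MAX_VALUE - submitTime; // newest runs sort first
    return String.join(SEP, cluster, user, app,
        Long.toString(inverted), jobId);
  }

  public static void main(String[] args) {
    System.out.println(rowKey("cluster1@dc1", "edgar", "myflow",
        1372352073732L, "job_201306271100_0001"));
  }
}
```

Note that the raw-long encoding in real hRaven is what makes the inverted-timestamp ordering hold for keys of any magnitude; decimal strings of different lengths would not sort the same way.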