subject:"\[jira\] \[Commented\] \(YARN\-3411\) \[Storage implementation\] explore the native HBase write schema for storage"


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554787#comment-14554787
 ] 

Junping Du commented on YARN-3411:
--

I see. Thanks for sharing this background. I agree that it shouldn't too much 
different from the performance prospective, so we can decide it later. Prefer 
to open a separated JIRA to discuss on this as a follow up as we either need to 
change for HBase side or Phoenix side. What do you think?

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554786#comment-14554786
 ] 

Junping Du commented on YARN-3411:
--

I see. Thanks for sharing this background. I agree that it shouldn't too much 
different from the performance prospective, so we can decide it later. Prefer 
to open a separated JIRA to discuss on this as a follow up as we either need to 
change for HBase side or Phoenix side. What do you think?

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554794#comment-14554794
]

Junping Du commented on YARN-3411:
--

bq. While I was discussing the patches with Joep Rottinghuis, we thought it
might be better to call it something more generic so that going forward if we
need to add more columns in other column families, we can reuse some of this
structure.
Ok. I didn't aware of this discussion before. We can leave a TODO later either
to make it more generic or sounds more precisely.

About toLowerCase() which is a pretty trivial comment, feel free to address it
or not.

[Storage implementation] explore the native HBase write schema for storage
--

Key: YARN-3411
URL: https://issues.apache.org/jira/browse/YARN-3411
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch,
YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch,
YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt,
YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt,
YARN-3411.poc.7.txt, YARN-3411.poc.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554803#comment-14554803
 ] 

Junping Du commented on YARN-3411:
--

Sounds like a plan. I think we need a separated JIRA to follow up with clean.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554868#comment-14554868
 ] 

Vrushali C commented on YARN-3411:
--

Sounds good, done 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554820#comment-14554820
 ] 

Sangjin Lee commented on YARN-3411:
---

[~vrushalic], how about making YARN-3696 a little broader to address these 
clean-up items, such as removing the extra constructor and other minor review 
feedback?


 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554855#comment-14554855
]

Junping Du commented on YARN-3411:
--

Thanks [~jrottinghuis] for comments!
bq. I think it is reasonable that two implementations can differ in their
backing schema as long as they both can write the data and retrieve the data
with the same key information. Phoenix may need to add somethings to the rowkey
in order to work properly, it may have to add some things, and ditto for the
raw HBase implementation, some additional secondary lookups may be needed etc.
That is part of the performance comparison to see.
I would agree with this in overall: in long term, each implementation should
get optimized to the best which could have different strategies on schema
design. But in short term performance test plan, the ideal case is two schema
of implementations should get as much closed as it can so the result can get
rid of noises caused by schema different. However, I agree that the current
schema different shouldn't hint too much performance different. It should be
fine if we can tolerant these noises.

bq. Junping Du with respect to adding the flow version in the key, I think the
problem with that is that you now require the caller to know what the version
is in order to query back. I don't think that is a natural requirement. I know
that I ran the ComputeUniqueUsers flow on the cluster, so I have user cluster
and flowname, but I don't need to know the version to just query the last few
runs right? If you do have the version (for reducer estimation and you want the
last runs of the same flow back) then it should be possible to query by flow
and by version, but I don't think it should be mandatory. Therefore I don't
think that flow version must perse be a rowkey in all implementations.
Thanks for sharing the use case here. The case for query against cluster and
flowname is pretty solid. However, the query against specific flow version also
reasonable as user may have interest to understand the differences between
different versions of flow on execution time, resource consumption, task
failures/exceptions, etc? In addition, I have a quick question here: does
making flow_version a key means it is mandatory in query? Just like we have
flow_run as key, but we still treat it optional in query (or search). Isn't it?

bq. I think we'll find that with certain schema choices some things will be
more performant while others will be somewhat slower.
That's true. We should keep schema design discussion open even after the patch
here get in.

I would give existing patch a +1 with:
1. open a JIRA for discussion on decision of flow version to key or column,
then either Phoenix or HBase implementation could update with conclusion we
made.
2. open a cleanup JIRA with addressing minor issues mentioned above.

[Storage implementation] explore the native HBase write schema for storage
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554142#comment-14554142
 ] 

Junping Du commented on YARN-3411:
--

Hi [~vrushalic], I had the same concern with [~zjshen]'s previous comments that 
we are putting flow_version in column instead of keys which is different with 
current schema for Phoenix patch. Putting a field as key or column could have 
different performance hint for writing as only row key involve indexing but 
column is not (we don't have secondary index for now). To make our performance 
test/comparison as apple-to-apple, may be we should keep consistent between the 
two schema?

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554959#comment-14554959
 ] 

Vrushali C commented on YARN-3411:
--

Thanks [~djp], filed YARN-3699  for the flow version discussion. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554968#comment-14554968
 ] 

Sangjin Lee commented on YARN-3411:
---

Any other review feedback?

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555056#comment-14555056
 ] 

Sangjin Lee commented on YARN-3411:
---

Just committed the patch. Thanks much [~vrushalic] for working on the patch, 
and everyone for very helpful reviews!

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554063#comment-14554063
 ] 

Junping Du commented on YARN-3411:
--

Sure. The plan sounds good to me. Just make sure we keep our eyes on it. Add a 
TODO to the patch could be better. :)

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554076#comment-14554076
]

Junping Du commented on YARN-3411:
--

Automation is a good point. However, the other point is about wrong operation.
The significant ops like creation/drop the table could be better to left user
to handle and decisions. Just like HDFS, user need to do format manually before
starting the namenode for the first time. It could be better to follow the same
practice with providing commondline for admin user to format (initialize)
table. Thoughts?

[Storage implementation] explore the native HBase write schema for storage
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554521#comment-14554521
 ] 

Vrushali C commented on YARN-3411:
--

Hi [~djp] 
I understand your point. The schema between the two approaches is quite 
different, not just from the row key perspective. 

[~gtCarrera9] I am wondering why is flow version part of the row key? Does it 
need to be or it can be a column in Phoenix?

thanks
Vrushali


 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554268#comment-14554268
 ] 

Junping Du commented on YARN-3411:
--

Beside the major comment above, some minor comments over the latest (007) patch 
- some of them are NITs (optional to address here or a follow up JIRA):
{code}
  public HBaseTimelineWriterImpl(Configuration conf) throws IOException {
super(conf.get(yarn.application.id,
HBaseTimelineWriterImpl.class.getName()));
  }
{code}
I don't quite understand this. Do we try to add a new configuration 
yarn.application.id here? It doesn't sounds right and 
HBaseTimelineWriterImpl.class.getName() doesn't sounds like a default value for 
this configuration. Am I missing anything?

{code}
+enum EntityColumnDetails {
+  ID(EntityColumnFamily.INFO, id),
+  TYPE(EntityColumnFamily.INFO, type),
+  CREATED_TIME(EntityColumnFamily.INFO, created_time),
+  MODIFIED_TIME(EntityColumnFamily.INFO, modified_time),
+  FLOW_VERSION(EntityColumnFamily.INFO, flow_version),
+  PREFIX_IS_RELATED_TO(EntityColumnFamily.INFO, r),
+  PREFIX_RELATES_TO(EntityColumnFamily.INFO, s),
+  PREFIX_EVENTS(EntityColumnFamily.INFO, e);
{code}
Given all columns here belongs to INFO C_F, we can omit the parameter of 
EntityColumnFamily.INFO. Also, rename EntityColumnDetails to 
EntityInfoColumnFamilyDetails could sound more clear.

{code}
+  private EntityColumnDetails(EntityColumnFamily columnFamily, 
+  String value) {
+this.columnFamily = columnFamily;
+this.value = value;
+this.inBytes = Bytes.toBytes(this.value.toLowerCase());
+  }
{code}
This is a private constructor and all callers are under control to make sure 
value is already lower case. So toLowerCase() is not necessary (the same for 
EntityColumnFamily).

Other looks fine.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14555133#comment-14555133
 ] 

Junping Du commented on YARN-3411:
--

Excellent work! Thanks [~vrushalic], [~sjlee0] and all for the contribution!

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554877#comment-14554877
 ] 

Junping Du commented on YARN-3411:
--

Address 2nd one in YARN-3696 also sounds good.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554532#comment-14554532
]

Vrushali C commented on YARN-3411:
--

HI [~djp]
Thanks, appreciate your review very much!
bq. Given all columns here belongs to INFO C_F, we can omit the parameter of
EntityColumnFamily.INFO. Also, rename EntityColumnDetails to
EntityInfoColumnFamilyDetails could sound more clear.

I actually had it named as EntityInfoColumnDetails in the previous patches.
While I was discussing the patches with [~jrottinghuis], we thought it might be
better to call it something more generic so that going forward if we need to
add more columns in other column families, we can reuse some of this structure.
Also, [~jrottinghuis] had some really good ideas on how to make things very
generic across tables but in the interest of time, this was not done in this
patch.

bq. This is a private constructor and all callers are under control to make
sure value is already lower case. So toLowerCase() is not necessary (the same
for EntityColumnFamily).
Yes, the values being initialized right now are lower case, but going forward,
if some adds in an uppercase value, or a sentence case, we may not realize it.
While storing in hbase, we need to careful about the column names, since we
need to query for that exact name, hence the explicit lower casing in the
constructor.

bq. Do we try to add a new configuration yarn.application.id here? It doesn't
sounds right and HBaseTimelineWriterImpl.class.getName() doesn't sounds like a
default value for this configuration. Am I missing anything?

So, I didn't have a good string to pass in, so I picked a value. But I am open
to initializing it to something more appropriate. Is there a general
recommendation around this... I can change it to that.

thanks
Vrushali

[Storage implementation] explore the native HBase write schema for storage
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554544#comment-14554544
 ] 

Vrushali C commented on YARN-3411:
--

Yes, I in fact filed a jira https://issues.apache.org/jira/browse/YARN-3696

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554586#comment-14554586
]

Sangjin Lee commented on YARN-3411:
---

[~djp], I understand your concern on this. That said, as Vrushali mentioned,
the schemas are sufficiently different between phoenix and hbase so that I
doubt that the presence of the flow version in the row key will make a
significant difference. Also, in the specific test we will run, the flow
version will be trivial (1).

For background, I chimed in on a few JIRAs that the flow version does not need
to be part of the row key (only the flow id and the run id should). I cannot
find those comments easily, but IMO the flow version does not constitute a
primary key. Rather, it is an attribute of the flow, and can be stored off the
primary key.

[Storage implementation] explore the native HBase write schema for storage
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-21 Thread Joep Rottinghuis (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554595#comment-14554595
]

Joep Rottinghuis commented on YARN-3411:

I think it is reasonable that two implementations can differ in their backing
schema as long as they both can write the data and retrieve the data with the
same key information. Phoenix may need to add somethings to the rowkey in order
to work properly, it may have to add some things, and ditto for the raw HBase
implementation, some additional secondary lookups may be needed etc. That is
part of the performance comparison to see.

[~djp] with respect to adding the flow version in the key, I think the problem
with that is that you now require the caller to know what the version is in
order to query back. I don't think that is a natural requirement. I know that I
ran the ComputeUniqueUsers flow on the cluster, so I have user cluster and
flowname, but I don't need to know the version to just query the last few runs
right? If you do have the version (for reducer estimation and you want the last
runs of the same flow back) then it should be possible to query by flow _and_
by version, but I don't think it should be mandatory.
Therefore I don't think that flow version must perse be a rowkey in all
implementations.

I think we'll find that with certain schema choices some things will be more
performant while others will be somewhat slower. It will be a mater of finding
those schema choices that will give good enough write performance to handle
scale and give good read performance for the most common use cases, while
maintaining reasonable performance for other queries.

[Storage implementation] explore the native HBase write schema for storage
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554599#comment-14554599
 ] 

Sangjin Lee commented on YARN-3411:
---

This constructor ({{HBaseTimelineWriterImpl(Configuration)}}) will not be used 
by non-test code as it is instantiated via the no-arg constructor. I think we 
should simply remove this. I'm fine with cleaning this up after the performance 
test.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-21 Thread Joep Rottinghuis (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554606#comment-14554606
 ] 

Joep Rottinghuis commented on YARN-3411:


With respect to discussion on auto-creating tables I strongly agree with 
Junping that schema creation should be a separate and discrete step.
I do think that this should be as simple to do as possible, but auto-creating 
tables and simply creating them on the fly if they don't exist might seem like 
a good thing to do to avoid initial friction, but will lead to very surprising 
results if any user ever has a problem with the classpath setup and/or 
configurations rolled to a cluster.
We operate a dozen or so production, ad-hoc and test clusters, some specific 
with only HBase, others without. The odds of passing a wrong config, or getting 
a config mixed up is significant. With auto-creation one could simply connect 
to the wrong HBase instance and then data would start flowing to the wrong 
cluster. I think it'd be better to see explicit failures in that case.

The error message should probably be crystal clear and read something like: ATS 
(class so-and-so) is trying to write to HBase cluster (so-and-so) and is 
missing required table (so-and-so).

In an earlier comment I suggested to have a config key to have a prefix for all 
table names so that people can easily switch all tables from one schema (test, 
experimentation) to another (staging, prod).

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-21 Thread Li Lu (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14554690#comment-14554690
]

Li Lu commented on YARN-3411:
-

Sorry about the delayed reply. Firstly, I thought flowVersion was a part of the
ID of a timeline entity. If this is wrong, then we may probably change the
Phoenix implementation rather than fix it in HBase. I can make this quick fix.
Secondly, I'm not sure the impact of this on the performance evaluation planned
tomorrow. In the test I think we mainly care about the raw writing performance
to both systems. If we're sending the same amount of data and the flowVersion
fields are all the same, then I expect the impact of this schema difference
will not be the dominant factor.

[Storage implementation] explore the native HBase write schema for storage
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-20 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553186#comment-14553186
 ] 

Li Lu commented on YARN-3411:
-

Hi [~vrushalic], sure, don't worry about the test code clean up for now. I'll 
try it locally. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-20 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553302#comment-14553302
 ] 

Hadoop QA commented on YARN-3411:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m  4s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 45s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 44s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 15s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 41s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 38s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 14s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  37m 30s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12734244/YARN-3411-YARN-2928.007.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 463e070 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/8033/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/8033/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/8033/console |


This message was automatically generated.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-20 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553369#comment-14553369
 ] 

Li Lu commented on YARN-3411:
-

I looked at the latest patch and think it's in a good shape for performance 
benchmarks. I've also ran it with a local single node hbase cluster, and it 
worked fine with our performance benchmark application as well as the PI sample 
application. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-20 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553370#comment-14553370
 ] 

Sangjin Lee commented on YARN-3411:
---

The latest patch LGTM. I'm fine with having a follow-up JIRA to address 
Junping's comment (and other minor issues if any). Once everyone chimes in and 
gives it +1, I'd be happy to commit this patch.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411-YARN-2928.007.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-20 Thread Vrushali C (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553352#comment-14553352
]

Vrushali C commented on YARN-3411:
--

bq. Or we cannot diff value with null and real 0.

Hi Junping,
So, currently, you are right, we can't differentiate between nulls and real 0.
In hRaven, we use 0 in case of nulls for long or int values. But for things
like timestamps, we need stricter checks. After the performance test, I will
file a jira to ensure we handle this more carefully and return null (Long
object) in case it's actually null. Hope that is fine.

thanks
Vrushali

[Storage implementation] explore the native HBase write schema for storage
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-20 Thread Vrushali C (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14553353#comment-14553353
]

Vrushali C commented on YARN-3411:
--

bq. Or we cannot diff value with null and real 0.

thanks
Vrushali

[Storage implementation] explore the native HBase write schema for storage
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-19 Thread Zhijie Shen (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549848#comment-14549848
]

Zhijie Shen commented on YARN-3411:
---

[~vrushalic], thanks for working on the patch. Some comment from my side:

1. I saw in HBase implementation flow version is not included as part of row
key. This is a bit different from primary key design of Phoenix implementation.
Would you mind elaborating your rationale a bit?

2. Shall we make the constants in TimelineEntitySchemaConstants follow Hadoop
convention? We can keep them in this class now. Once we decide to move on with
HBase impl, we should move (some of) them into YarnConfiguration as API.

3. I saw the classes are marked \@Public, but they're the backend classes and
not accessible by the user directly. In fact, you can leave these classes not
annotated.

4. According to TimelineSchemaCreator, we need to run command line to create
the table when we setup the backend, right? Can we include creating the table
into the lifecycle of HBaseTimelineWriterImpl?

[Storage implementation] explore the native HBase write schema for storage
--

Key: YARN-3411
URL: https://issues.apache.org/jira/browse/YARN-3411
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch,
YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch,
YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt,
YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt,
YARN-3411.poc.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-19 Thread Zhijie Shen (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551018#comment-14551018
]

Zhijie Shen commented on YARN-3411:
---

bq. A flow can be uniquely identified with the flow name and run id (and of
course cluster and user id).

I think in Phoenix impl, we have treated version as part of identifier of a
unique flow.

bq. Hmm, so schema creation happens more or less once in the lifetime of the
hbase cluster like during cluster setup (or perhaps if we decide to drop and
recreate it, which is rare in production). I believe writers will come to life
and cease to exist with each yarn application lifecycle but cluster is more or
less eternal, so adding to this step to the lifecycle of a Writer Impl object
seems somewhat out of place to me.

Fair point. And this is another place different from Phoenix impl, which
creates table if they don't exist. My perspective is more about automation, and
it's better to leave fewer steps for users to setup the service. Perhaps we can
find somewhere else to invoke the table initialization once if the service is
setup for YARN cluster, and HBase/Phoenix is used as the backend.

[Storage implementation] explore the native HBase write schema for storage
--

Key: YARN-3411
URL: https://issues.apache.org/jira/browse/YARN-3411
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch,
YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch,
YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt,
YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt,
YARN-3411.poc.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-19 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550686#comment-14550686
 ] 

Sangjin Lee commented on YARN-3411:
---

Regarding the public/private annotations, I often got beat it to the head that 
in principle the private annotation is opt-in; i.e. if there is no visibility 
annotation it's implicitly assumed to be up for use. I've gotten review 
comments that said we should mark them explicitly as private even if they are 
clearly YARN-internal classes. That's just my experience on this.

What is the official recommendation on this? cc [~vinodkv], [~kasha]

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-19 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550688#comment-14550688
 ] 

Junping Du commented on YARN-3411:
--

bq. I was thinking it is not necessary since the entity information would come 
in a more streaming fashion, one update at a time anyways. If say, one column 
is written and other is not, the callee can retry again, hbase put will simply 
over-write existing value.
It sounds reasonable. Let's keep it so far and check if necessary to change for 
some corner cases (like we could update event and metrics at the same time for 
some job's final counter) in future.

Thanks for addressing my previous comments. Latest patch looks much closer! 
Some additional comments (besides Zhijie's comments above) on latest (006) 
patch:
In TimelineCollectorManager.java,
{code}
+  @Override
+  protected void serviceStop() throws Exception {
+super.serviceStop();
+if (writer != null) {
+  writer.close();
+}
+  }
{code}
We should stop running collectors before stopping the shared writer. Also, put 
super.serviceStop(); to the last one to stop as conforming the common practice.

In EntityColumnFamilyDetails.java,
{code}
+   * @param rowKey
+   * @param entityTable
+   * @param inputValue
+   * @throws IOException
+   */
+  public void store(byte[] rowKey, BufferedMutator entityTable, String key,
+  String inputValue) throws IOException {
{code}
Lacking a param of key in javadoc.



In HBaseTimelineWriterImpl.java,
For write() which is synchronized semantics so far (we haven't implemented 
async yet), we put each kind of entity to the table by calling 
entityTable.mutate(...) which cache these entities locally and flush it later 
under some conditions. Do we need to call entityTable.flush() for semantics of 
strict synchronized writing? If not, at least, we should flush it at 
serviceStop() as close() directly could lose some cached data.

{code}
  @Override
  protected void serviceStop() throws Exception {
super.serviceStop();
if (entityTable != null) {
  LOG.info(closing entity table);
  entityTable.close();
}
if (conn != null) {
  LOG.info(closing the hbase Connection);
  conn.close();
}
  }
{code}
Also, as comments above, put super.serviceStop() as the last one to stop.

In Range.java,
{code}
+@InterfaceAudience.Private
+@InterfaceStability.Unstable
+public class Range {
{code}
For class marked as private, we don't necessary to put InterfaceStability 
annotation there. 

In TimelineWriterUtils.java,

{code}
+for (byte[] comp : components) {
+  finalSize += comp.length;
+}
{code}
Null check for comp.

{code}
+  System.arraycopy(components[i], 0, buf, offset, components[i].length);
+  offset += components[i].length;
{code}
Null check for components[i].

{code}
+   * @param source
+   * @param separator
+   * @return byte[][] after splitting the input source
+   */
+  public static byte[][] split(byte[] source, byte[] separator, int limit) {
{code}
Missing param of limit in javadoc.

It sounds we didn't do any null check for separator in splitRanges(), but we do 
null check in join(). It should keep consistent at least in one class.

{code}
+  public static long getValueAsLong(final byte[] key,
+  final Mapbyte[], byte[] taskValues) throws IOException {
+byte[] value = taskValues.get(key);
+if (value != null) {
+  Number val = (Number) GenericObjectMapper.read(value);
+  return val.longValue();
+} else {
+  return 0L;
+}
+  }
{code}
Shall we use Long instead of long? Or we cannot diff value with null and real 0.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-19 Thread Karthik Kambatla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550714#comment-14550714
 ] 

Karthik Kambatla commented on YARN-3411:


My recommendation regarding visibility annotations is to always specify them 
irrespective of whether they are Private or Public. My rationale - at the time 
of implementing something, it is good to actively think about intended usage. 

That said, our compat guidelines explicitly say that classes not annotated are 
implicitly Private. 


 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-19 Thread Li Lu (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551238#comment-14551238
]

Li Lu commented on YARN-3411:
-

Hi [~vrushalic], thanks for working on this! I just got back and looked at the
latest (v6) patch. In general I think it's quite close. I just have a few quick
comments.

# I notice you made {{EntityColumnFamilyDetails}} and
{{EntityInfoColumnDetails}} to be public. Even though we're adding @Private
tags to them, we may still want to use the strictest access modifier possible.
If they're fine to be visible to package only, maybe we can use default, rather
than public for them? (I tried with my local version and seems like they worked
fine, but not sure if I missed anything. )
# I share the same concern as [~zjshen] on schema creation. Having a separate
step to set up the HBase data schema appears to add some workload to Hadoop
deployments. So, although it's totally fine to have this step for our
performance benchmark, it will be great to keep this in mind when we
consolidate the storage part.
# Maybe it's helpful to move the entity creation steps of
TestHBaseTimelineWriterImpl to TestTimelineWriterImpl? In this way we can
improve test coverage for both writer implementations after this work.

[Storage implementation] explore the native HBase write schema for storage
--

Key: YARN-3411
URL: https://issues.apache.org/jira/browse/YARN-3411
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch,
YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch,
YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt,
YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt,
YARN-3411.poc.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-19 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14551251#comment-14551251
 ] 

Vrushali C commented on YARN-3411:
--

Thanks [~gtCarrera9]! Will make the access modifier change and look into the 
TestTimelineWriterImpl point. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-19 Thread Vrushali C (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550660#comment-14550660
]

Vrushali C commented on YARN-3411:
--

Hi [~zjshen]

Thanks for the review!
bq I saw in HBase implementation flow version is not included as part of row
key. This is a bit different from primary key design of Phoenix implementation.
Would you mind elaborating your rationale a bit?

Yes, I think the flow version need not be part of the primary key. A flow can
be uniquely identified with the flow name and run id (and of course cluster and
user id). Given a run id, we can determine the version. For production jobs,
the version does not change, so we would be repeating it across runs. I haven’t
looked into the Phoenix schema to understand why it is needed on the Phoenix
side. cc [~gtCarrera9]

bq Shall we make the constants in TimelineEntitySchemaConstants follow Hadoop
convention? We can keep them in this class now. Once we decide to move on with
HBase impl, we should move (some of) them into YarnConfiguration as API.

Yes, I did not add them to YarnConfiguration as API since I figured it may be
cleaner to keep this code contained within timelineservice.storage to help
remove it if needed. But will rename them as per Hadoop convention.

bq. In fact, you can leave these classes not annotated.
I see, I had added the annotations for these classes after some of the review
suggestions, I think from @sjlee.

bq. According to TimelineSchemaCreator, we need to run command line to create
the table when we setup the backend, right? Can we include creating the table
into the lifecycle of HBaseTimelineWriterImpl?

Hmm, so schema creation happens more or less once in the lifetime of the hbase
cluster like during cluster setup (or perhaps if we decide to drop and recreate
it, which is rare in production). I believe writers will come to life and cease
to exist with each yarn application lifecycle but cluster is more or less
eternal, so adding to this step to the lifecycle of a Writer Impl object seems
somewhat out of place to me.

Appreciate the review!
thanks
Vrushali

[Storage implementation] explore the native HBase write schema for storage
--

Key: YARN-3411
URL: https://issues.apache.org/jira/browse/YARN-3411
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch,
YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch,
YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt,
YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt,
YARN-3411.poc.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-19 Thread Karthik Kambatla (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550717#comment-14550717
 ] 

Karthik Kambatla commented on YARN-3411:


By classes I meant outer classes. Inner classes and other members inherit the 
annotation of the outer class. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-19 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550733#comment-14550733
 ] 

Junping Du commented on YARN-3411:
--

bq.  Regarding the public/private annotations, I often got beat it to the head 
that in principle the private annotation is opt-in; i.e. if there is no 
visibility annotation it's implicitly assumed to be up for use. I've gotten 
review comments that said we should mark them explicitly as private even if 
they are clearly YARN-internal classes. That's just my experience on this.
In my understanding, it depends on if this class is within api package that 
under hadoop-yarn-api or hadoop-yarn-common modules. If so, we may need to 
explicitly mark it as Private if we don't like to share it outside of hadoop 
projects. For other classes (like case here for TimelineWriterUtils which is in 
server side), we don't have to mark any annotation. [~vinodkv], can you confirm 
this?

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-19 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14550858#comment-14550858
 ] 

Vrushali C commented on YARN-3411:
--

Thanks [~djp] ! I will make these changes.. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547959#comment-14547959
 ] 

Hadoop QA commented on YARN-3411:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 23s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   8m  0s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |  10m  1s | The applied patch generated  2  
additional warning messages. |
| {color:green}+1{color} | release audit |   0m 24s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 15s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 42s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 38s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   1m 16s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  38m 26s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733531/YARN-3411-YARN-2928.004.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 463e070 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/7966/artifact/patchprocess/diffJavadocWarnings.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7966/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7966/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7966/console |


This message was automatically generated.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14547730#comment-14547730
 ] 

Hadoop QA commented on YARN-3411:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m  5s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:red}-1{color} | javac |   3m 18s | The patch appears to cause the 
build to fail. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733480/YARN-3411-YARN-2928.003.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 463e070 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7965/console |


This message was automatically generated.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-18 Thread Junping Du (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548335#comment-14548335
]

Junping Du commented on YARN-3411:
--

bq. No, we will never drop the last value. MIN_VERSIONS and TTL are set such
that last value is always retained. I am setting the MAX_VERSIONS for now to
200, but we can revisit this when we determine how exactly the timeseries data
is going to be handled. And of course it can be made configurable.
I meant the earliest (oldest) value not the latest. Agree that we can revisit
the value later for other cases that I mentioned above, but just want to double
check we don't have other options, i.e. making time series data as different
rows or columns rather than different timestamps/versions here.

bq. Wondering why we would aggregate data in one timeseries for one metric
over time?
That's because the interested interval (present to enduser) is not always the
same interval for gathering timeline metrics data. Let's saying we received
container metrics data from NodeManager every second, but the aggregated data
user interested is per minutes, then we need to aggregate 60 seconds data for
one single metrics. Make sense?

Thanks for updating the patch. Just quickly check latest patch, a few comments
so far:
1. Sounds like we don't leverage single row transaction of HBase feature, as we
are updating different column families (events, configurations, metrics, etc.)
separately. Do we need to make sure data in each row get updated consistently?

2. We shouldn't swallow exception in updating data to HBase, just log.error()
may not be enough.

3. We need to check null in writing TimelineEntity to HBase, as TimelineEntity
could include null events/configurations/metrics, that could make foreach later
throw NPE exception.

Comments with more details could come later.

[Storage implementation] explore the native HBase write schema for storage
--

Key: YARN-3411
URL: https://issues.apache.org/jira/browse/YARN-3411
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch,
YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt,
YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt,
YARN-3411.poc.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-18 Thread Vrushali C (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548379#comment-14548379
]

Vrushali C commented on YARN-3411:
--

Hi [~djp]

Thanks for the initial quick feedback! Some responses below:

bq. Do we need to make sure data in each row get updated consistently?
I was thinking it is not necessary since the entity information would come in a
more streaming fashion, one update at a time anyways. If say, one column is
written and other is not, the callee can retry again, hbase put will simply
over-write existing value.

bq. We shouldn't swallow exception in updating data to HBase, just log.error()
may not be enough.
Okay, let me look through and modify that.

bq. We need to check null in writing TimelineEntity to HBase, as TimelineEntity
could include null events/configurations/metrics, that could make foreach later
throw NPE exception
I have added some null checks, I will go over the code again and update it to
ensure I have null checks for entity class members like configurations, metrics
etc.

[Storage implementation] explore the native HBase write schema for storage
--

Key: YARN-3411
URL: https://issues.apache.org/jira/browse/YARN-3411
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch,
YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt,
YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt,
YARN-3411.poc.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-18 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14548383#comment-14548383
 ] 

Vrushali C commented on YARN-3411:
--


The patch has an overall -1 due to a couple of javadoc warnings. Will fix those 
in the next patch. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549725#comment-14549725
 ] 

Hadoop QA commented on YARN-3411:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 15s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 58s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 53s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 16s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 39s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 38s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 15s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  38m  2s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733693/YARN-3411-YARN-2928.006.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 463e070 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7990/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7990/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf903.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7990/console |


This message was automatically generated.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-18 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549692#comment-14549692
 ] 

Sangjin Lee commented on YARN-3411:
---

It seems like the line that adds the hbase configuration was removed from 
HBaseTimelineWriterImpl. Is that intentional? How would it be able to use and 
load hbase configuration then? Come to think of it, I think we may need to add 
both hbase-site.xml and hbase-default.xml?

{code}
conf.addResource(hbase-default.xml);
conf.addResource(hbase-site.xml);
{code}

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-18 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549776#comment-14549776
 ] 

Sangjin Lee commented on YARN-3411:
---

Or, better:

{code}
Configuration hbaseConf = HBaseConfiguration.create(conf);
{code}


 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411-YARN-2928.006.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549601#comment-14549601
 ] 

Hadoop QA commented on YARN-3411:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m  2s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 43s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |   9m 46s | The applied patch generated  3  
additional warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 19s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 41s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 38s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:green}+1{color} | yarn tests |   1m 14s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  37m 30s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733677/YARN-3411-YARN-2928.005.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 463e070 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/7986/artifact/patchprocess/diffJavadocWarnings.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7986/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7986/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf909.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7986/console |


This message was automatically generated.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-18 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549631#comment-14549631
 ] 

Vrushali C commented on YARN-3411:
--


Updating the code now to fix the javadoc warning and [~sjlee0]'s review 
suggestion. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-18 Thread Sangjin Lee (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549563#comment-14549563
]

Sangjin Lee commented on YARN-3411:
---

Thanks for updating the patch [~vrushalic]! A couple of quick comments:

(TimelineCollectorManager.java)
- It's a good catch! While we're at it, could we also add an override for
serviceStart() where it calls writer.start()?

(HBaseTimelineWriterImpl.java)
- l.71: We need to remove this constructor since the no-arg constructor is the
real constructor. Keeping this would be error-prone. For example, the
addResource(hbase-site.xml) call is not being done via the no-arg constructor
route. We should move any operations that require the configuration to the
serviceInit() method.

[Storage implementation] explore the native HBase write schema for storage
--

Key: YARN-3411
URL: https://issues.apache.org/jira/browse/YARN-3411
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch,
YARN-3411-YARN-2928.005.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt,
YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt,
YARN-3411.poc.7.txt, YARN-3411.poc.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14549664#comment-14549664
 ] 

Hadoop QA commented on YARN-3411:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  14m 57s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 43s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |   9m 40s | The applied patch generated  3  
additional warning messages. |
| {color:green}+1{color} | release audit |   0m 22s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 15s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 35s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 38s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 37s | The patch does not introduce 
any new Findbugs (version 3.0.0) warnings. |
| {color:red}-1{color} | yarn tests |   7m 49s | Tests failed in 
hadoop-yarn-server-timelineservice. |
| | |  43m 43s | |
\\
\\
|| Reason || Tests ||
| Failed unit tests | 
hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl |
|   | hadoop.yarn.server.timelineservice.storage.TestPhoenixTimelineWriterImpl |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12733677/YARN-3411-YARN-2928.005.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 463e070 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/7989/artifact/patchprocess/diffJavadocWarnings.txt
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7989/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7989/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7989/console |


This message was automatically generated.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411-YARN-2928.003.patch, YARN-3411-YARN-2928.004.patch, 
 YARN-3411-YARN-2928.005.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-15 Thread Li Lu (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545857#comment-14545857
]

Li Lu commented on YARN-3411:
-

Hi [~djp]! I totally agree with your concerns. However, for now the Phoenix
storage system still doesn't have the support to timeline series data. This
said, maybe we can firstly focus on the storage of single data, and then
address the problems for timeline series in a separate JIRA? I'd appreciate
your suggestions on this. Thanks!

[Storage implementation] explore the native HBase write schema for storage
--

Key: YARN-3411
URL: https://issues.apache.org/jira/browse/YARN-3411
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt,
YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt,
YARN-3411.poc.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-15 Thread Junping Du (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14545788#comment-14545788
]

Junping Du commented on YARN-3411:
--

Hi [~vrushalic] and all, there is one question I had for a long time:
Looks like we are updating time series metrics data as versions of metrics cell
instead of inserting of a row (with time serve as a column). Do we have an
estimation for performance cost for aggregation metrics data over cell versions
vs. rows? The original usecase for cell versions coming from BigTable is to
store web pages which could only be limited number of versions (too stale
versions could be dropped). In our case, the metrics data could be updated much
more frequently and we cannot drop the earliest data. I see we are setting max
version to be 200, in
cf3.setMaxVersions(EntityTableDetails.ENTITY_TABLE_METRICS_MAX_VERSIONS_DEFAULT);.
It sounds like it may not be enough for cases, like: long running services,
container reuse, etc., this even sounds challenge for normal container (assume
appCollector doesn't cache metrics data locally, the interval default is: 1
sec).
Thoughts?

[Storage implementation] explore the native HBase write schema for storage
--

Key: YARN-3411
URL: https://issues.apache.org/jira/browse/YARN-3411
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt,
YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt,
YARN-3411.poc.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-15 Thread Vrushali C (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546166#comment-14546166
]

Vrushali C commented on YARN-3411:
--

Hi [~djp]
Thanks for the review!

bq. In our case, the metrics data could be updated much more frequently and we
cannot drop the earliest data
No, we will never drop the last value. MIN_VERSIONS and TTL are set such that
last value is always retained. I am setting the MAX_VERSIONS for now to 200,
but we can revisit this when we determine how exactly the timeseries data is
going to be handled. And of course it can be made configurable.

bq. It sounds like it may not be enough for cases,
With hbase 1.x , we can set per cell level TTLs so we could go more or less
depending on the application/ requirement.

bq. Do we have an estimation for performance cost for aggregation metrics data
over cell versions vs. rows?
Not sure if I understand aggregation for metrics timeseries data. Wondering why
we would aggregate data in one timeseries for one metric over time? But we can
discuss more about timeseries later I think, based on [~gtCarrera9]'s reply
below.

thanks!
Vrushali

[Storage implementation] explore the native HBase write schema for storage
--

Key: YARN-3411
URL: https://issues.apache.org/jira/browse/YARN-3411
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt,
YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt,
YARN-3411.poc.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-15 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14546176#comment-14546176
 ] 

Vrushali C commented on YARN-3411:
--

Hi [~jrottinghuis]

Thanks for the review! My response below. 

bq. On class range, the start and end are final right?
yes

bq. Wouln't it be better to do this in somewhat more of a streaming manner?
Agreed, modifying the code accordingly right now. Also we discussed today about 
checking out what the hbase batching/flushing policy and accordingly 
considering emitting puts as we see them.

bq. More general question: for TimelineWriter and its implementations, is there 
an expectation set around concurrency? 
Good point, I have filed YARN-3650 to discuss that

bq. it will be super-usefull to be able to have one single config key to prefix 
all tables.
Agreed. I have filed YARN-3649 to work on that

bq. getRowKeyPrefix should cleanse input arguments to strip 
EntityTableDetails.ROW_KEY_SEPARATOR_BYTES
yes, updating the code now.

bq. Overall, we should fill in the javadoc, describe inputs, explain what we 
expect etc.
Yes, will update the documentation as much as I can along the lines of what we 
discussed today.

bq. EntityColumnDetails probably needs a public static method that will take 
bytes, iterates over the enum and returns an enum from those bytes.
Yes, I will add that in later if/when we decide to go forward with this method.

thanks
Vrushali


 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-14 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544190#comment-14544190
 ] 

Sangjin Lee commented on YARN-3411:
---

Vrushali and I spoke offline on the findbugs issues and others. Here is some 
more feedback on the existing patch.

I would try to limit the public surface area as good hygiene (making classes 
non-public by default unless/until necessary). It applies to 
EntityColumnDetails, EntityTableDetails, and Range.

(HBaseTimelineWriterImpl.java)
- l.142: let's declare value inside the loop as well
- l.200: Object value = String value?
- l.228: let's declare key inside the loop
- l.267: let's declare id inside the outer for loop
- l.272: let's declare key inside the inner for loop

(Range.java)
- l.21-22: both can/should be final

(TimelineWriterUtils.java)
- let's add Private and Unstable annotations


 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, 
 YARN-3411.poc.4.txt, YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, 
 YARN-3411.poc.7.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544612#comment-14544612
 ] 

Hadoop QA commented on YARN-3411:
-

\\
\\
| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 13s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 45s | There were no new javac warning 
messages. |
| {color:green}+1{color} | javadoc |   9m 44s | There were no new javadoc 
warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 14s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:green}+1{color} | findbugs |   0m 37s | The patch does not introduce 
any new Findbugs (version 2.0.3) warnings. |
| {color:green}+1{color} | yarn tests |   1m 14s | Tests passed in 
hadoop-yarn-server-timelineservice. |
| | |  37m 32s | |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12732954/YARN-3411-YARN-2928.002.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / 463e070 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7942/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7942/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf907.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7942/console |


This message was automatically generated.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-14 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544659#comment-14544659
 ] 

Sangjin Lee commented on YARN-3411:
---

The latest patch looks pretty good. I think we're almost there. I went over it 
one more time bit more closely, and have mostly minor comments. Some are 
performance-related so you might want to consider them. Hopefully it won't be 
too difficult to address these...

(EntityColumnDetails.java)
- let's add Private and Unstable to this

(EntityTableDetails.java)
- let's add Private and Unstable to this
- l.27: this also should be non-public?

(HBaseTimelineWriterImpl.java)
- l.88: (nit) you must be sick of me saying it, but let's declare row inside 
the for loop, and also use a more standard declaration (byte[] row = ...)
- l.165: it might be a minor optimization, but we might want to reuse bytes for 
te.getType() and te.getId() for other calls that need them (e.g. getInfoPut)
- l.234: we should get the bytes for the key *once* outside the inner for loop 
and reuse it
- l.274: we should get the bytes for the id *once* outside the inner for loop 
and reuse it

(Range.java)
- let's add Private and Unstable to this

(TimelineSchemaCreator.java)
- l.137: I think we should use LOG.error() as we have the logger
- l.144-147: I'm unsure why we're using the log4j logger when we're using the 
Apache Commons logger; is this for HBase debug logging?

(TimelineWriterUtil.java)
- l.53: do we need to handle the case where a component byte array is empty?
- l.53: it's not critical but it could be good to handle the case where 
components.length = 1 (just return it)
- l.181,197: what are taskValues?

(TestHBaseTimelineWriterImpl.java)
- l.56-60: the javadoc should be before the class name
- l.56: (nit) remove PoC
- l.62: (nit) it should be util (not upper case)
- l.141: (nit) it should call init() (best practice)
- l.147: (nit) should be long, not Long
- l.212: (nit) it should call stop() (best practice)

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-14 Thread Vrushali C (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544756#comment-14544756
]

Vrushali C commented on YARN-3411:
--

Thanks [~sjlee0] and [~jrottinghuis] for the feedback! I will make as many
changes as possible for now. For some of Joep's questions/comments I might spin
off a jira or two to work on those later. Also, I will respond to some of the
points brought up after thinking over it for a while.

[Storage implementation] explore the native HBase write schema for storage
--

Key: YARN-3411
URL: https://issues.apache.org/jira/browse/YARN-3411
Project: Hadoop YARN
Issue Type: Sub-task
Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
Attachments: ATSv2BackendHBaseSchemaproposal.pdf,
YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch,
YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt,
YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt,
YARN-3411.poc.txt

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-14 Thread Joep Rottinghuis (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14544692#comment-14544692
 ] 

Joep Rottinghuis commented on YARN-3411:


Ok, admitted that I have not yet read Sangjin's comments from just minutes ago.
I also need to spend some more time to fully read and comprehend the key 
structure.

Here is an initial set of comments that I have, please take them for that they 
are worth.
I'll have to spend more time to dig into more details.

Overall, we should fill in the javadoc, describe inputs, explain what we expect 
etc.

EntityColumnDetails probably needs a public static method

public static EntityColumnDetails createEntityColumnDetails(byte[] bytes)
that will take bytes, iterates over the enum and returns an enum from those 
bytes.
You'll need to decide if you want a default, or throw an exception if bytes 
does not map to any enum.



EntityTableDetails.java

getRowKeyPrefix should cleanse input arguments to strip 
EntityTableDetails.ROW_KEY_SEPARATOR_BYTES
(at least the string representative of that, may be good to split that 
definition into two).
The reverse method that takes a rowkey and splits into components should 
probably not reverse this.
For example, if we use a ! for separator and translate that to underscore, do 
we translate all underscores back to bangs?
In hRaven we did not, but stripping the separators is essential to avoid 
parsing errors during read.

Instead of just using DEFAULT_ENTITY_TABLE_NAME or overriding the tablename 
itself,
it will be super-usefull to be able to have one single config key to prefix all 
tables.
This way you can easily run a staging, a test, a production and whatever setup 
in the same
HBase instance / w/o having to override every single table in the config.
One could simply overwrite the default prefix and you're off and running.
For prefix I'm thinking tst prod exp etc. Once can then still override 
one tablename if needed,
but managing one whole setup will be easier.

For column prefixes, are you simply using a one character prefix, and the first 
character is always a prefix?
Or do we need to do cleansing on the columns written to ensure that nobody else 
will write with this
column prefix?
Also, if there is a prefix, why not make this an enum?


More general question: for TimelineWriter and its implementations, is there an 
expectation set around
concurrency? Is any synchronization expected / needed to ensure visibility when 
calls happen from different threads?
How about entities, are they expected to be immutable once passed to the write 
method?
Similarly for the constructor, we're assuming that the configuration object 
will not be modified while we're constructing a TimelineWriter?

For the write method in HBaseTimelineWriterImpl, we're collecting all the puts 
first, then we're creating a big list of entity puts, and then we're writing 
them out.
For large entities (with large configurations) we end up creating two (if not 
three I'd have to read more carefully) copies of the entire structure.
Wouln't it be better to do this in somewhat more of a streaming manner?
For example, first create info puts, then stream those out to the entityTable, 
then drop the infoPuts Put,
then move on to eventPut, configPut etc.

Even method like getConfigPut could conceivable by constructed in such a way 
that we don't have to create one massive copy of the config in a single Put.
Perhaps we can pass in the table and create smaller puts, perhaps even one per 
config key and stream those out.
Ditto when iterating over the timeseries.keySet() in getMetricsPut.


For serviceStop we should probably handle things a littlebit better than just 
calling close.
Are we expecting to just close the HBase connection, still keep writing and 
just let the write exceptions shoot up to the top of the writing process?
Should we occasionally ( as we're iterating) check the stop method and just 
bail out in a more elegant way?

On class range, the start and end are final right?

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411-YARN-2928.001.patch, YARN-3411-YARN-2928.002.patch, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-14 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14543353#comment-14543353
 ] 

Hadoop QA commented on YARN-3411:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | pre-patch |  15m 36s | Pre-patch YARN-2928 compilation 
is healthy. |
| {color:green}+1{color} | @author |   0m  0s | The patch does not contain any 
@author tags. |
| {color:green}+1{color} | tests included |   0m  0s | The patch appears to 
include 1 new or modified test files. |
| {color:green}+1{color} | javac |   7m 44s | There were no new javac warning 
messages. |
| {color:red}-1{color} | javadoc |   9m 40s | The applied patch generated  11  
additional warning messages. |
| {color:green}+1{color} | release audit |   0m 23s | The applied patch does 
not increase the total number of release audit warnings. |
| {color:green}+1{color} | checkstyle |   0m 14s | There were no new checkstyle 
issues. |
| {color:green}+1{color} | whitespace |   0m  0s | The patch has no lines that 
end in whitespace. |
| {color:green}+1{color} | install |   1m 38s | mvn install still works. |
| {color:green}+1{color} | eclipse:eclipse |   0m 40s | The patch built with 
eclipse:eclipse. |
| {color:red}-1{color} | findbugs |   0m 42s | The patch appears to introduce 8 
new Findbugs (version 2.0.3) warnings. |
| {color:red}-1{color} | yarn tests |   7m 50s | Tests failed in 
hadoop-yarn-server-timelineservice. |
| | |  44m 34s | |
\\
\\
|| Reason || Tests ||
| FindBugs | module:hadoop-yarn-server-timelineservice |
|  |  
org.apache.hadoop.yarn.server.timelineservice.storage.EntityColumnDetails.getInBytes()
 may expose internal representation by returning EntityColumnDetails.inBytes  
At EntityColumnDetails.java:by returning EntityColumnDetails.inBytes  At 
EntityColumnDetails.java:[line 48] |
|  |  
org.apache.hadoop.yarn.server.timelineservice.storage.EntityTableDetails.COLUMN_FAMILY_CONFIG_BYTES
 should be package protected  At EntityTableDetails.java: At 
EntityTableDetails.java:[line 48] |
|  |  
org.apache.hadoop.yarn.server.timelineservice.storage.EntityTableDetails.COLUMN_FAMILY_INFO_BYTES
 should be package protected  At EntityTableDetails.java: At 
EntityTableDetails.java:[line 34] |
|  |  
org.apache.hadoop.yarn.server.timelineservice.storage.EntityTableDetails.COLUMN_FAMILY_METRICS_BYTES
 should be package protected  At EntityTableDetails.java: At 
EntityTableDetails.java:[line 41] |
|  |  
org.apache.hadoop.yarn.server.timelineservice.storage.EntityTableDetails.COLUMN_PREFIX_EVENTS_BYTES
 should be package protected  At EntityTableDetails.java: At 
EntityTableDetails.java:[line 55] |
|  |  
org.apache.hadoop.yarn.server.timelineservice.storage.EntityTableDetails.COLUMN_PREFIX_IS_RELATED_TO_BYTES
 should be package protected  At EntityTableDetails.java: At 
EntityTableDetails.java:[line 69] |
|  |  
org.apache.hadoop.yarn.server.timelineservice.storage.EntityTableDetails.COLUMN_PREFIX_RELATES_TO_BYTES
 should be package protected  At EntityTableDetails.java: At 
EntityTableDetails.java:[line 62] |
|  |  
org.apache.hadoop.yarn.server.timelineservice.storage.EntityTableDetails.ROW_KEY_SEPARATOR_BYTES
 should be package protected  At EntityTableDetails.java: At 
EntityTableDetails.java:[line 79] |
| Failed unit tests | 
hadoop.yarn.server.timelineservice.storage.TestHBaseTimelineWriterImpl |
|   | hadoop.yarn.server.timelineservice.storage.TestPhoenixTimelineWriterImpl |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12732791/YARN-3411-YARN-2928.001.patch
 |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | YARN-2928 / b059dd4 |
| javadoc | 
https://builds.apache.org/job/PreCommit-YARN-Build/7932/artifact/patchprocess/diffJavadocWarnings.txt
 |
| Findbugs warnings | 
https://builds.apache.org/job/PreCommit-YARN-Build/7932/artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-timelineservice.html
 |
| hadoop-yarn-server-timelineservice test log | 
https://builds.apache.org/job/PreCommit-YARN-Build/7932/artifact/patchprocess/testrun_hadoop-yarn-server-timelineservice.txt
 |
| Test Results | 
https://builds.apache.org/job/PreCommit-YARN-Build/7932/testReport/ |
| Java | 1.7.0_55 |
| uname | Linux asf901.gq1.ygridcore.net 3.13.0-36-lowlatency #63-Ubuntu SMP 
PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7932/console |


This message was automatically generated.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-13 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14541730#comment-14541730
 ] 

Junping Du commented on YARN-3411:
--

Sure. Cancel the patch until we have new version. Thx!

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-13 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542919#comment-14542919
 ] 

Hadoop QA commented on YARN-3411:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | patch |   0m  1s | The patch file was not named 
according to hadoop's naming conventions. Please see 
https://wiki.apache.org/hadoop/HowToContribute for instructions. |
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12732709/YARN-3411.poc.7.txt |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 281d47a |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7927/console |


This message was automatically generated.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-13 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542950#comment-14542950
 ] 

Sangjin Lee commented on YARN-3411:
---

[~vrushalic], could you ensure the patch applies cleanly? You can try it by 
checking out YARN-2928 first and then {{git apply --check}}.

Also, you'll need to rename the patch to {{YARN-3411-YARN-2928.001.patch}}. 
Otherwise, jenkins will try it on the trunk, and it won't work.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-13 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14542980#comment-14542980
 ] 

Sangjin Lee commented on YARN-3411:
---

Here is some quick initial feedback:

(HBaseTimelineWriterImpl.java)
- l.73: as a rule, we should call super.serviceInit(conf)
- l.98-99: lines longer than 80 characters
- l.203-204: nit: again I would prefer putting the declarations inside the for 
loop (the same for other loops)
- l.227: since we're accessing both keys and values, it would be slightly more 
efficient to use entrySet() rather than keySet()
- l.238: use LOG.error(message, throwable)
- l.286: as a rule, we should call super.serviceStop()

(Range.java)
- indentation didn't come out right



 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.7.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-12 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539425#comment-14539425
 ] 

Hadoop QA commented on YARN-3411:
-

\\
\\
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | patch |   0m  1s | The patch file was not named 
according to hadoop's naming conventions. Please see 
https://wiki.apache.org/hadoop/HowToContribute for instructions. |
| {color:red}-1{color} | patch |   0m  0s | The patch command could not apply 
the patch during dryrun. |
\\
\\
|| Subsystem || Report/Notes ||
| Patch URL | 
http://issues.apache.org/jira/secure/attachment/12732170/YARN-3411.poc.6.txt |
| Optional Tests | javadoc javac unit findbugs checkstyle |
| git revision | trunk / 987abc9 |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/7879/console |


This message was automatically generated.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-12 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540383#comment-14540383
 ] 

Vrushali C commented on YARN-3411:
--

Sounds good, thanks [~djp]! But, would like to request you to wait before 
reviewing again, I am planning to upload a new patch later today. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-12 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540297#comment-14540297
 ] 

Vrushali C commented on YARN-3411:
--

Ah, I haven't added that patch, thanks [~gtCarrera9] let me give that a try.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-12 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540211#comment-14540211
 ] 

Li Lu commented on YARN-3411:
-

Hi [~vrushalic], sorry to hear the trouble. Have you tried to apply the latest 
patch in YARN-3529? In that JIRA I'm trying to make the Phoenix writer works 
with the snapshot version of Phoenix, which lives happily with HBase 1.0.1. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-12 Thread Vrushali C (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540303#comment-14540303
]

Vrushali C commented on YARN-3411:
--

bq. Also comments on latest (v5) patch
Thanks [~djp] for the review but the latest patch is v6. So some of the code
lines are no longer there but I will make the other changes like class name
rename and making the table name consistent with Phoenix writer table names and
the code changes for creating the connection etc.

I am working on updating the patch with your review suggestions and the patch
in YARN-3529 which will help me build.

[Storage implementation] explore the native HBase write schema for storage
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540373#comment-14540373
 ] 

Junping Du commented on YARN-3411:
--

Sure. I start this rounds of review start several days ago, but cannot get 
finished mostly until today. Sorry for missing the new version of patch, will 
check it quite soon.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540220#comment-14540220
 ] 

Junping Du commented on YARN-3411:
--

Also comments on latest (v5) patch:
{code}
+public class CreateSchema {
{code}
Can we rename it to a more concrete name, something like: TimelineSchemaCreator?

{code}
+  private static int createTimelineEntityTable() {
+try {
+  Configuration config = HBaseConfiguration.create();
+  // add the hbase configuration details from classpath
+  config.addResource(hbase-site.xml);
+  Connection conn = ConnectionFactory.createConnection(config);
+  Admin admin = conn.getAdmin();
...
{code}
All of these code should be reused by create other tables. May be we should 
move it out of createTimelineEntityTable() and make it as static part of Class?

{code}
+  if (admin.tableExists(table)) {
+// do not disable / delete existing table
+// similar to the approach taken by map-reduce jobs when
+// output directory exists
+LOG.error(Table  + table.getNameAsString() +  already exists.);
+return 1;
+  }
{code}
We would like to throw exception here so user can get notified the failed 
reason immediately?

{code}
+  // TTL is 30 days, need to make it configurable perhaps
+  cf3.setTimeToLive(2592000);
{code}
We shouldn't have a hard code value here. At least, add a TODO in comment to 
fix it later.

In HBaseTimelineWriterImpl.java,
{code}
+// TODO right now using a default table name
+// change later to use a config driven table name
+entityTableName = TableName
+.valueOf(EntityTableDetails.DEFAULT_ENTITY_TABLE_NAME);
{code}
Shall we consistent with table name of Pheonix writer if haven't make it 
configurable? Or we intent to do so for some reasons?

{code}
+  if (entityPuts.size()  0) {
+LOG.info(Storing  + entityPuts.size() +  to 
++ this.entityTableName.getNameAsString());
+entityTable.put(entityPuts);
+  } else {
+LOG.warn(empty entity object?);
+  }
{code}
The first log should be DEBUG level and wrap with if block of 
LOG.isDebugEnabled() which help performance.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14539552#comment-14539552
 ] 

Junping Du commented on YARN-3411:
--

bq.   I get Some Enforcer rules have failed errors. I am working on resolving 
those. Any more eyes on this build error would help! 
Hi [~vrushalic], I meet the same problem and I thought it's my JDK version 
issue before but actually not. Will also work on resolving this.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540035#comment-14540035
 ] 

Junping Du commented on YARN-3411:
--

Sorry. I mean I met a similar problem before which is caused by JDK version. 
Here it sounds like phoenix 4.3.0 depends on hbase version of 0.98 and we are 
depends on 1.0.1?

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14540078#comment-14540078
 ] 

Junping Du commented on YARN-3411:
--

PHOENIX-1757 should fix this problem, but the target release is 4.4.0 which 
sounds like still in voting process (start in May.2).
Can we downgrade HBase version to 0.98 here for short term POC and upgrade to 
1.0.1 together with PHOENIX 4.4 later?

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.5.txt, YARN-3411.poc.6.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-07 Thread Junping Du (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533604#comment-14533604
 ] 

Junping Du commented on YARN-3411:
--

Thank you, [~vrushalic]! :)

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-07 Thread Junping Du (JIRA)

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14532992#comment-14532992
]

Junping Du commented on YARN-3411:
--

Thanks [~vrushalic] for updating the patch. I will review and comment based on
your updated patch in plan.
bq. Sure, I can change it to java but I think, as such there should not be any
problem running ruby code with hbase, it does not add to dependencies. In fact,
the HBase Shell, which is a command line interpreter for HBase is written in
Ruby.
I see. Thanks for sharing this prospective. However, accepting code with a new
language into hadoop community is not only mvn dependency issue but also a
maintenance headache according to issues like: setting up new code convention
for new language, etc. which could take more time to reach into a consensus.
Given there is only very limited Ruby code in current patch, shouldn't be very
painful to replace it with Java. Isn't it?

[Storage implementation] explore the native HBase write schema for storage
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-07 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14533033#comment-14533033
 ] 

Vrushali C commented on YARN-3411:
--

bq. Given there is only very limited Ruby code in current patch, shouldn't be 
very painful to replace it with Java. Isn't it?

Okay, let me rework it to java. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
  Labels: BB2015-05-TBR
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-05-06 Thread Vrushali C (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14530990#comment-14530990
 ] 

Vrushali C commented on YARN-3411:
--

update: I will generate another patch, thanks [~gtCarrera9] for checking the 
patch and observing that flowVersion is not being stored. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
  Labels: BB2015-05-TBR
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-30 Thread Sangjin Lee (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522569#comment-14522569
 ] 

Sangjin Lee commented on YARN-3411:
---

+1 on 1.0.1.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-30 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522556#comment-14522556
 ] 

Li Lu commented on YARN-3411:
-

Awesome, thanks! 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.4.txt, 
 YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-30 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14522539#comment-14522539
 ] 

Li Lu commented on YARN-3411:
-

Hi [~vrushalic], one quick thing: do you have a rough estimation on how hard it 
will be to bump the version of hbase to 1.0.1? Phoenix 4.4.0-SNAPSHOT may rely 
on 1.0.1 now. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.3.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-29 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520115#comment-14520115
 ] 

Li Lu commented on YARN-3411:
-

Thanks [~stack] for the quick info! Yes let's go with HBase 1. We can figure 
out a solution for Phoenix later. On the worst case, we can rely on the 
snapshot version of Phoenix, which already works with HBase 1. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-29 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14519858#comment-14519858
 ] 

stack commented on YARN-3411:
-

bq. So there are some major changes between hbase 0.98 and hbase like the 
client facing APIs (HTableInterface, etc) have been deprecated and replaced 
with new interfaces.

It would be a pity if you fellas were stuck on the 0.98 APIs. Phoenix is 
shaping up to do an RC that will work w/ hbase 1.x.

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-29 Thread Li Lu (JIRA)


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14520291#comment-14520291
 ] 

Li Lu commented on YARN-3411:
-

Hi [~vrushalic] [~zjshen], just a quick thing to confirm that we want to use 
byte arrays for config and info fields in both of our storage. I'll convert the 
type for config and info in the Phoenix implementation to VARBINARY to be 
consistent with this design. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage


[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517334#comment-14517334
 ] 

Vrushali C commented on YARN-3411:
--

Hi [~djp]

Thanks for the feedback! 
bq. Do each put in flow table sounds a little expensive especially when putting 
activity is very frequently. May be we should do some batch mode here?
Yes I think we do need batching, but I think that would be at the 
CollectorManager level. This writer would always do a complete entity write as 
a synchronous operation to the backend by design. 

 [Storage implementation] explore the native HBase write schema for storage
 --

 Key: YARN-3411
 URL: https://issues.apache.org/jira/browse/YARN-3411
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Vrushali C
Priority: Critical
 Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
 YARN-3411.poc.2.txt, YARN-3411.poc.txt


 There is work that's in progress to implement the storage based on a Phoenix 
 schema (YARN-3134).
 In parallel, we would like to explore an implementation based on a native 
 HBase schema for the write path. Such a schema does not exclude using 
 Phoenix, especially for reads and offline queries.
 Once we have basic implementations of both options, we could evaluate them in 
 terms of performance, scalability, usability, etc. and make a call.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517323#comment-14517323
]

Vrushali C commented on YARN-3411:
--

Hi Sangjin,

Thank you for the comments!
bq. I’m sure you're aware, but please don't forget to add the license to the
new files later.
Ah, did I miss adding the license, let me ensure I add the licenses in the next
patch.

bq. l.77: So do we need to depend on hbase-server? That's bit unexpected?
Let me try to work this out again, I had added it in as I was working through
compilation and runtime errors, but will look into this again to confirm/modify.

bq. l.7: I think we're now moving away from the acronym ats. Perhaps we
should simply use timeline.entity?
Yes, will correct the literal value in the constant.

bq. (HBaseTimelineWriterImpl.java)
l.40: Just curious, does this mean a single writer instance has only on HBase
client connection? We should be able to have multiple connections? What is your
thought on this?
So, I have been thinking about this too. We would right now have one instance
per NodeManager in this version, right? So that is already a lot of connections
for cluster with thousands of nodes. I am wondering if we may not even want
those many. Later on, when we move to having a timeline service writer per
application, even then, say an app that does many writes would simply flood the
writes to the region servers if it had a pool of connections at it’s disposal.
Perhaps having one per app would help to rate limit it somewhat? I am still
thinking out loud about this, haven’t decided which is better or worse.

bq. Also, the initialization operations inside the constructor should probably
belong in serviceInit() or serviceStart()?
Okay, will move it.

[Storage implementation] explore the native HBase write schema for storage
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

[
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14517331#comment-14517331
]

Vrushali C commented on YARN-3411:
--

Hi [~zjshen]

For metrics, shall we be more generalized to support all kinds of numeric
value, boolean and so on?
Hmm. So, hbase stores values in bytes. Reading back a value written as a Long
using Bytes.toInteger does not work. Hence, there needs to be a way to know
what the data type is going to be. If metrics will be numbers and booleans,
could we say we will always write them as longs? Boolean values can be
represented as 0 and 1 for false and true and all numbers can be written out as
longs, do you think would that be fine? I have to check what [~gtCarrera9]'s
patch writes them as.

For the config, I went with the background of hadoop configs where in
everything is like a string in the xml, even a number or a ip address etc.
thanks
Vrushali

[Storage implementation] explore the native HBase write schema for storage
--

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage