[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-27 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515163#comment-14515163
 ] 

Zhijie Shen commented on YARN-3411:
---

bq. Can we make both implementations use hbase-client from 0.98 irrespective of what the server uses?

I guess it's not a client problem but a server problem. To use Phoenix, the HBase daemons need to start with the Phoenix server lib installed. That said, we won't be able to have an HBase 1.0 cluster with Phoenix 4.3 installed.

And given that Vrushali's comment "Yes, since HBase 1.0 is both on-wire and on-disk compatible with HBase 0.98.x, I believe we should be able to use the 0.98 client to write to an HBase 1.0 cluster" is right, the client should probably be fine: the Phoenix 4.3 client uses the HBase 0.98.x client, so it can talk to an HBase 1.0 server.

> [Storage implementation] explore the native HBase write schema for storage
> --
>
> Key: YARN-3411
> URL: https://issues.apache.org/jira/browse/YARN-3411
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: timelineserver
>Reporter: Sangjin Lee
>Assignee: Vrushali C
>Priority: Critical
> Attachments: ATSv2BackendHBaseSchemaproposal.pdf, 
> YARN-3411.poc.2.txt, YARN-3411.poc.txt
>
>
> There is work that's in progress to implement the storage based on a Phoenix 
> schema (YARN-3134).
> In parallel, we would like to explore an implementation based on a native 
> HBase schema for the write path. Such a schema does not exclude using 
> Phoenix, especially for reads and offline queries.
> Once we have basic implementations of both options, we could evaluate them in 
> terms of performance, scalability, usability, etc. and make a call.





[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-27 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515135#comment-14515135
 ] 

Vrushali C commented on YARN-3411:
--


Hi [~vinodkv]

bq. Can we make both implementations use hbase-client from 0.98 irrespective of what the server uses?

Yes, since HBase 1.0 is both on-wire and on-disk compatible with HBase 0.98.x, I believe we should be able to use the 0.98 client to write to an HBase 1.0 cluster. But that means we would still be using the 0.98 APIs in the timeline writer and would need code changes later to move to the 1.0 client. (My current patch uses the new 1.0 APIs.)

Using the 0.98.x client also means we won't be able to take advantage of the 1.0 features that would really be useful in ATSv2.

thanks
Vrushali



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515075#comment-14515075
 ] 

Li Lu commented on YARN-3411:
-

One alternative is to try the 4.4.0 snapshot version of Phoenix in our benchmark, which is not yet released but (probably) usable. I'll double-check with the Phoenix/HBase folks to see how hard this is.



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-27 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515072#comment-14515072
 ] 

Vinod Kumar Vavilapalli commented on YARN-3411:
---

I think the bigger question is this: if the Phoenix-based storage impl needs HBase 0.98 and the native HBase impl needs 1.0, how can they both be used by the yarn-timeline-service module / reside in the same JVM? Can we make both implementations use hbase-client from 0.98 irrespective of what the server uses?



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-27 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514942#comment-14514942
 ] 

Vrushali C commented on YARN-3411:
--


Hi Li

I see. Hmm. So there are some major changes between HBase 0.98 and HBase 1.0: the client-facing APIs (HTableInterface, etc.) have been deprecated and replaced with new interfaces (connection management has moved to the new ConnectionFactory class, a table is now referred to only by TableName rather than String or byte[], and so on).

This means we would need several code changes and upgrade steps to move from 0.98 to 1.0 in the future.
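For reference, a minimal sketch of the API difference (assuming the HBase client jars on the classpath; the table name, family and qualifier here are placeholders, not the actual schema):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ClientApiSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // 1.0-style client: connections come from ConnectionFactory, tables are addressed by TableName.
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("timeline.entity"))) {
      Put put = new Put(Bytes.toBytes("rowkey"));
      put.addColumn(Bytes.toBytes("i"), Bytes.toBytes("qualifier"), Bytes.toBytes("value"));
      table.put(put);
    }

    // 0.98-style client (deprecated in 1.0): HTable built straight from the Configuration and a String name.
    HTable oldTable = new HTable(conf, "timeline.entity");
    Put oldPut = new Put(Bytes.toBytes("rowkey"));
    oldPut.add(Bytes.toBytes("i"), Bytes.toBytes("qualifier"), Bytes.toBytes("value"));
    oldTable.put(oldPut);
    oldTable.close();
  }
}
{code}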

Also, I would like to mention that HBase 1.0 comes with a whole bunch of improvements, performance fixes (improved WAL pipeline using the disruptor, multi-WAL, more off-heap usage, etc.) and bug fixes, some of which I think would be very beneficial for ATS v2. For instance:
- Per-cell TTLs can be set (see the sketch after this list).
- Better support for the HBase Cell interface internally in the read and write paths, for better performance and flexibility.
- Coprocessor endpoint calls can now be made against a region server, which would be very helpful for aggregations.
- A Dockerfile to easily build and run HBase from source (which would be helpful during deployment and setup for users).
- A region can now be hosted by multiple region servers in read-only mode. One replica of the region is the primary, accepting writes, and the other replicas share the same data files. Read requests can be served by any replica of the region, with backup RPCs, giving high availability with timeline consistency guarantees. This should help us significantly on the reader side.
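To make two of those concrete, here is a hedged sketch (assuming an already-open Table handle named {{table}}; the family, qualifier and values are placeholders) of a per-cell TTL on a Put and a timeline-consistent read against region replicas:

{code}
// Per-mutation TTL: the cells written by this Put expire after 30 days (value is in milliseconds).
Put put = new Put(Bytes.toBytes("rowkey"));
put.addColumn(Bytes.toBytes("m"), Bytes.toBytes("some_metric"), Bytes.toBytes(42L));
put.setTTL(30L * 24 * 60 * 60 * 1000);
table.put(put);

// Timeline-consistent read: any region replica may serve the Get, with backup RPCs to secondaries.
Get get = new Get(Bytes.toBytes("rowkey"));
get.setConsistency(Consistency.TIMELINE);
Result result = table.get(get);
boolean possiblyStale = result.isStale();  // true if a secondary replica answered
{code}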

I think for the writer performance testing we can have Phoenix on 0.98 and this native approach on 1.0, but that means we need two HBase clusters, one on 0.98 and one on 1.0. What do you say?







[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514872#comment-14514872
 ] 

Sangjin Lee commented on YARN-3411:
---

[~vinodkv], that's what I understood as well. The remaining concern is that we need to pick versions carefully, lest HBase (or any other library in this situation) be forced onto an uncertified/incompatible version of Hadoop. But it is true that that problem may exist no matter how we structure the code/projects.



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514591#comment-14514591
 ] 

Li Lu commented on YARN-3411:
-

Hi [~vrushalic], one more quick question about the version numbers. The current Phoenix release only works with 0.98. Right now we're waiting for Phoenix 4.4 to support HBase 1.0, but that may take a while. So for now, will there be any significant problem if we use HBase 0.98 as the standard version? I know unit tests may not run with 0.98 on trunk, but once the main logic works that should not block the performance benchmark, I guess? Thanks!



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-27 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514554#comment-14514554
 ] 

Vinod Kumar Vavilapalli commented on YARN-3411:
---

There is no single YARN artifact. hbase-client may depend on yarn-client. But 
yarn-timeline-service may depend on hbase-client. There is no cause for 
concern. yarn-timeline-service should depend on the last stable hbase-client 
that we will test with and support.



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-27 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14514466#comment-14514466
 ] 

Sangjin Lee commented on YARN-3411:
---

bq. Thinking about it in more detail, HBase relies on Hadoop/HDFS only (not YARN), so it should be fine for YARN to rely on an HBase component, especially since it downloads the jar rather than building from source?

OK, I'm clearer about this now. "hbase-client" does depend on YARN indirectly, as it depends on hadoop-mapreduce-client-core. But since timelineservice is high enough in the YARN project dependency hierarchy, they don't form a cycle. I think this specific situation might be OK, so we can move forward. As a to-do item, though, I think we want to think about the code structure for adding library dependencies that themselves depend on Hadoop, as things like versions can become issues. Are there any precedents?

bq. Do we have a solid case for non-numeric metrics so far? The boolean case should be fine, as we can represent true and false with 1 and 0.

I think we should stick with numbers (perhaps java.lang.Number as the base class for the types). I'm not even sure how boolean "aggregation" would work, so we may just decide not to support it.



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-27 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14513989#comment-14513989
 ] 

Junping Du commented on YARN-3411:
--

Just 2 cents on the comments above:
bq. By timelineservice depending on HBase, we now have a circular dependency between hadoop and HBase. Evidently it builds (which is a bit surprising), but it creates an interesting situation. I'm wondering how we should handle this. I suppose the same issue exists with Phoenix.
Thinking about it in more detail, HBase relies on Hadoop/HDFS only (not YARN), so it should be fine for YARN to rely on an HBase component, especially since it downloads the jar rather than building from source. From the perspective of the upstream/downstream project relationship, I do agree it brings extra complexity between the Hadoop and HBase projects in syncing releases. However, I think we should expect this and have already decided to take that pain by moving to HBase, haven't we? Though I don't remember seeing public discussion of this concern before.

bq. For metrics, shall we be more general and support all kinds of numeric values, booleans and so on?
Do we have a solid case for non-numeric metrics so far? The boolean case should be fine, as we can represent true and false with 1 and 0.



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-24 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512098#comment-14512098
 ] 

Zhijie Shen commented on YARN-3411:
---

[~vrushalic], I've commented on 
[YARN-3134|https://issues.apache.org/jira/browse/YARN-3134?focusedCommentId=14512080&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14512080]
 about the question of config/info/metric value.

I scanned through the HBase implementation. It seems the config value is treated as a string, info is stored directly as bytes, and metrics are treated as Long. I kind of agree on config/info, and I think it should be fine if config is assumed to be a string (maybe we need to adjust the data model), but let's see the community's opinion. For metrics, shall we be more general and support all kinds of numeric values, booleans and so on?
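Just to illustrate the distinction (a sketch only, not the patch code; the values are made up), the three kinds would serialize along these lines with the HBase Bytes utility:

{code}
// Config treated as a String, info stored directly as bytes, metric treated as a Long.
byte[] configValue = Bytes.toBytes("4096");                  // e.g. a config param value
byte[] infoValue   = Bytes.toBytes("opaque info payload");   // in practice, whatever bytes the caller provides
byte[] metricValue = Bytes.toBytes(1234L);                   // numeric metric

// Reading back:
String config = Bytes.toString(configValue);
long metric = Bytes.toLong(metricValue);
{code}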



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-24 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511470#comment-14511470
 ] 

Sangjin Lee commented on YARN-3411:
---

Sorry [~vrushalic] for my late comments. I took a quick pass on the POC patch, 
and as with the Phoenix one I haven't fully delved into the schema-related 
code, but here are some initial quick comments.

- I'm sure you're aware, but please don't forget to add the license to the new 
files later.

(pom.xml)
- l.77: So do we need to depend on hbase-server? That's a bit unexpected.
- Come to think of it, this is a bit interesting. By timelineservice depending on HBase, we now have a circular dependency between hadoop and HBase. Evidently it builds (which is a bit surprising), but it creates an interesting situation. I'm wondering how we should handle this. I suppose the same issue exists with Phoenix.

(EntityTableDetails.java)
- l.7: I think we're now moving away from the acronym "ats". Perhaps we should 
simply use "timeline.entity"?

(HBaseTimelineWriterImpl.java)
- l.40: Just curious, does this mean a single writer instance has only one HBase client connection? We should be able to have multiple connections? What are your thoughts on this?
- Also, the initialization operations inside the constructor should probably belong in serviceInit() or serviceStart()? (A minimal sketch follows below.)
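For illustration, a minimal sketch (the class body is an assumption for discussion, not the actual patch) of moving the connection setup into the service lifecycle:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.service.AbstractService;

public class HBaseTimelineWriterImpl extends AbstractService {

  private Connection connection;

  public HBaseTimelineWriterImpl() {
    super(HBaseTimelineWriterImpl.class.getName());
    // no HBase I/O in the constructor; it only names the service
  }

  @Override
  protected void serviceInit(Configuration conf) throws Exception {
    super.serviceInit(conf);
    // create the client connection once the (merged) configuration is available
    connection = ConnectionFactory.createConnection(HBaseConfiguration.create(conf));
  }

  @Override
  protected void serviceStop() throws Exception {
    if (connection != null) {
      connection.close();  // release the client connection on shutdown
    }
    super.serviceStop();
  }
}
{code}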



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-24 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511334#comment-14511334
 ] 

Vrushali C commented on YARN-3411:
--

Hi [~gtCarrera9] and [~djp] 

Thanks for the comments, I will reply to these shortly but wanted to quickly 
respond about 
bq. Can we turn this into java code? As we haven't add any ruby code before, it 
could bring extra complexity/dependency on ruby.

Sure, I can change it to Java, but I don't think there should be any problem running Ruby code with HBase as such; it does not add any dependencies. In fact, the HBase shell, which is the command-line interpreter for HBase, is written in Ruby.



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14511323#comment-14511323
 ] 

Junping Du commented on YARN-3411:
--

Hi [~vrushalic], thanks for updating the patch! Just doing a quick pass over the latest patch, it looks like we have Ruby code to create the schema:
{code}
+create 'ats.entity',
+  {NAME => 'i', COMPRESSION => 'LZO', BLOOMFILTER => 'ROWCOL'},
+  {NAME => 'm', VERSIONS => 2147483647, MIN_VERSIONS => 1, COMPRESSION => 'LZO', BLOCKCACHE => false, TTL => '2592000'},
+  {NAME => 'c', COMPRESSION => 'LZO', BLOCKCACHE => false, BLOOMFILTER => 'ROWCOL'}
{code}
Can we turn this into Java code? Since we haven't added any Ruby code before, it could bring extra complexity / a dependency on Ruby.
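If we do, a rough Java equivalent of the shell snippet above using the 1.0 Admin API could look like this (a sketch only; error handling and pre-existence checks omitted):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.regionserver.BloomType;

public class CreateEntityTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      HTableDescriptor entityTable = new HTableDescriptor(TableName.valueOf("ats.entity"));

      entityTable.addFamily(new HColumnDescriptor("i")
          .setCompressionType(Compression.Algorithm.LZO)
          .setBloomFilterType(BloomType.ROWCOL));

      entityTable.addFamily(new HColumnDescriptor("m")
          .setMaxVersions(Integer.MAX_VALUE)
          .setMinVersions(1)
          .setCompressionType(Compression.Algorithm.LZO)
          .setBlockCacheEnabled(false)
          .setTimeToLive(2592000));  // seconds, i.e. 30 days, matching TTL => '2592000'

      entityTable.addFamily(new HColumnDescriptor("c")
          .setCompressionType(Compression.Algorithm.LZO)
          .setBlockCacheEnabled(false)
          .setBloomFilterType(BloomType.ROWCOL));

      admin.createTable(entityTable);
    }
  }
}
{code}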

BTW, I hit a problem building locally with the patch applied. The reason seems to be that my local build uses JDK 1.8, while the HBase version we are using here (1.0.0) depends on jdk.tools 1.7. I assume we will move to Java 8 in the short term (maybe in the 2.8 release cycle? need to confirm later). If so, we may need a solution for this problem.



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-23 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509961#comment-14509961
 ] 

Li Lu commented on YARN-3411:
-

Oh, and one thing to add: in the added pom file, maybe we can centralize the HBase version (the Phoenix patch also has this problem)? This would make version management slightly easier. Maybe we can address this together with the Phoenix one in YARN-3529?



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-23 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509951#comment-14509951
 ] 

Li Lu commented on YARN-3411:
-

Hi [~vrushalic], thanks for the patch! I'm OK with the major part of this patch for now. Here I'm listing some questions that we can discuss.
# About null checks: so far we do not have a fixed standard on whether and where we need to do null checks. I noticed you assumed info, config, event, and other similar fields are not null. Maybe we'd like to explicitly decide when all of those fields can be null or empty.
# Maybe we'd like to change TimelineWriterUtils to the default (package-private) access modifier? I think package visibility would be sufficient.
# One thing I'd like to open for discussion is how to store and process metrics. Currently, in the HBase patch, startTime and endTime are not used. In the Phoenix patch, I store time series as flattened, non-queryable strings. I think this part also needs some input from the time-based aggregation work.
# Another thing I'd like to discuss here is whether and how we'd like to set up a separate "fast path" for metric-only updates. On the storage layer, I'd strongly +1 a separate fast path so that we only touch the (frequently updated) metrics table (see the sketch after this list). Any proposals, everyone?
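To make the fast-path idea concrete, a hedged sketch of a metric-only write that touches just the metrics column family ('m' in the POC's ats.entity schema); the row key, qualifier and value here are made up:

{code}
// Metric-only update: a single Put against the 'm' family, never touching info/config.
// Assumes an already-open Table handle named entityTable and a precomputed rowKey.
long ts = System.currentTimeMillis();
Put metricPut = new Put(rowKey);
metricPut.addColumn(Bytes.toBytes("m"), Bytes.toBytes("some_metric"), ts, Bytes.toBytes(1234L));
entityTable.put(metricPut);
{code}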




[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-22 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14507504#comment-14507504
 ] 

Junping Du commented on YARN-3411:
--

Thanks [~vrushalic] for the reply!
bq. But I will be uploading a refined patch + some more changes like metric writing soon.
+1. The plan sounds good to me.



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-16 Thread Vrushali C (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498366#comment-14498366
 ] 

Vrushali C commented on YARN-3411:
--

Thanks [~djp]!
bq. Just quickly went through the POC patch, which is good but only has the EntityTable so far. Do we have a plan to split the other tables into other JIRAs?

Yes, we can have JIRAs for the other tables as we add in those functionalities. Right now the PoC is focused only on entity writes, hence this patch has only the stuff related to that table.

bq. One quick comment on the POC patch: we will reuse many operations here, like split() or join(), in other classes, so it is better to create a utility class that puts the common methods in one place to share.
Absolutely agreed; I am refining the patch. In hRaven we have a bunch of such utility classes. I was trying to see how many I could put in, since it's not confirmed that this would be the way to go, and I did not want to mix in too much code. But I will be uploading a refined patch plus some more changes like metric writing soon.



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-16 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498278#comment-14498278
 ] 

Junping Du commented on YARN-3411:
--

I just quickly went through the POC patch, which is good but only has the EntityTable so far. Do we have a plan to split the other tables into other JIRAs? I would support that, because mid-size patches (not too large, not too small) make the development/review iteration move faster.
One quick comment on the POC patch: we will reuse many operations here, like split() or join(), in other classes, so it is better to create a utility class that puts the common methods in one place to share.



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-04-15 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496538#comment-14496538
 ] 

Junping Du commented on YARN-3411:
--

Thanks [~vrushalic] for delivering the proposal and POC patch; excellent job!
Some quick comments from walking through the proposal:
bq. Entity Table - primary key components - putting the UserID first helps to distribute writes across the regions in the hbase cluster. Pros: avoids single region hotspotting. Cons: connections would be open to several region servers during writes from the per-node ATS.
It looks like we are trying to get rid of region server hotspotting issues. I agree that this design could help. However, it is still possible that a specific user submits many more applications than anyone else; in that case, the region hotspot issue will still appear, won't it? I think the more general way to solve this problem is to salt the keys with a prefix. Thoughts?
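For reference, a minimal sketch of what prefix salting could look like (purely an assumption for discussion; the bucket count is arbitrary, and the reader would need to fan out one scan per bucket and merge the results):

{code}
// Prepend a small hash-derived bucket byte so one heavy user's writes still spread across regions.
static final int NUM_SALT_BUCKETS = 16;

static byte[] saltedRowKey(byte[] unsaltedKey) {
  int bucket = (java.util.Arrays.hashCode(unsaltedKey) & 0x7fffffff) % NUM_SALT_BUCKETS;
  byte[] salted = new byte[unsaltedKey.length + 1];
  salted[0] = (byte) bucket;
  System.arraycopy(unsaltedKey, 0, salted, 1, unsaltedKey.length);
  return salted;
}
{code}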

bq. Entity Table - column families - config needs to be stored as key-value, not as a blob, to enable efficient key-based querying on config param names. Storing it in a separate column family helps avoid scanning over config while reading metrics, and vice versa.
+1. This leverages the strength of a columnar database. We should avoid storing any default values for keys. However, this sounds challenging if the TimelineClient only has a Configuration object.

bq. Entity Table - metrics are written with an hbase cell timestamp set to the top of the minute or the top of the 5-minute interval or whatever is decided. This helps with timeseries storage and retrieval in case of querying at the entity level.
Can we also let the TimelineCollector do some aggregation of metrics over a similar time interval, rather than sending every metric to HBase/Phoenix as it is received? This may help relieve some pressure on the backend.
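As a tiny illustration of the interval-aligned cell timestamp quoted above (a sketch; the 5-minute interval is just the example value from the proposal):

{code}
// Round the metric's event time down to the top of the 5-minute interval before
// using it as the HBase cell timestamp, so writes within an interval share one time bucket.
long intervalMillis = 5 * 60 * 1000L;
long eventTime = System.currentTimeMillis();
long cellTimestamp = eventTime - (eventTime % intervalMillis);
{code}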

bq. Flow by application id table
I still think we should figure out some way to store application attempt info. The typical use case here is: for some reason (e.g. a bug or a hardware capability issue), some flow's or application's AM could consistently fail more often than those of other flows/applications. Keeping this info can help us track down such issues, can't it?

bq. flow summary daily table (aggregation table managed by Phoenix) - could be triggered via coprocessor with each put in the flow table, or a cron run once per day to aggregate for yesterday (with catch-up functionality in case of backlog etc.)
Triggering on each put to the flow table sounds a little expensive, especially when put activity is very frequent. Maybe we should do some batching here? In addition, I think we can leverage the per-node TimelineCollector to do some first-level aggregation, which can help relieve the workload on the backend.



[jira] [Commented] (YARN-3411) [Storage implementation] explore the native HBase write schema for storage

2015-03-27 Thread Li Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14384531#comment-14384531
 ] 

Li Lu commented on YARN-3411:
-

Hi [~vrushalic], thanks for working on this! It would be good for us to have both HBase and Phoenix storage implementations for comparison. Just keeping a record here that I think we can do the evaluation before we move into implementing the aggregations. That way we may save duplicated effort in designing and implementing aggregation.


