[jira] [Commented] (HADOOP-12949) Add HTrace to the s3a connector

2017-11-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16260508#comment-16260508
 ] 

Steve Loughran commented on HADOOP-12949:
-

Revisiting this

* yes, it would be good. 
* let's not worry about UA headers initially; leave that for a later iteration.
* more important: linking across jobs in long-lived processes, e.g. Spark, Hive 
LLAP. We want those tools to create a context, propagate it with their 
queries, and have the store clients pick it up.
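
As a rough illustration of that last point, here is a minimal sketch of a context that an engine sets around a query and that a store client picks up on the same thread. The class {{QueryTraceContext}} and its methods are hypothetical names made up for this example; this is not the HTrace API, and nothing like it exists in s3a today.

```java
/**
 * Hypothetical thread-local trace context (an assumption for illustration,
 * not part of HTrace or Hadoop). The engine enters a context per query;
 * a store client reads the current ID to tag its operations.
 */
final class QueryTraceContext implements AutoCloseable {
    private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Context that was active before this one, restored on close().
    private final String previous;

    private QueryTraceContext(String id) {
        this.previous = CURRENT.get();
        CURRENT.set(id);
    }

    /** Called by the engine (e.g. per query) to make an ID current. */
    static QueryTraceContext enter(String id) {
        return new QueryTraceContext(id);
    }

    /** Called by a store client; returns null when no context is active. */
    static String current() {
        return CURRENT.get();
    }

    @Override
    public void close() {
        if (previous == null) {
            CURRENT.remove();
        } else {
            CURRENT.set(previous);
        }
    }

    public static void main(String[] args) {
        try (QueryTraceContext ctx = QueryTraceContext.enter("query-1")) {
            // A store client would read this when issuing a request.
            System.out.println("store client sees: " + QueryTraceContext.current());
        }
        System.out.println("after query: " + QueryTraceContext.current());
    }
}
```

A thread-local only covers work done on the calling thread. In a long-lived process with worker pools, or across processes, the ID would have to be passed along explicitly with the query, which is the harder part of this work.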

Making this a sub-task of the S3A phase IV work, targeting Hadoop 3.1. 

Patches welcome!

> Add HTrace to the s3a connector
> ---
>
> Key: HADOOP-12949
> URL: https://issues.apache.org/jira/browse/HADOOP-12949
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Madhawa Gunasekara
>Assignee: Madhawa Gunasekara
>
> Hi All, 
> s3, GCS, WASB, and other cloud blob stores are becoming increasingly 
> important in Hadoop. But we don't have distributed tracing for these yet. It 
> would be interesting to add distributed tracing here. It would enable 
> collecting really interesting data like probability distributions of PUT and 
> GET requests to s3 and their impact on MR jobs, etc.
> I would like to implement this feature. Please shed some light on this.
> Thanks,
> Madhawa



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-12949) Add HTrace to the s3a connector

2016-07-20 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385826#comment-15385826
 ] 

Steve Loughran commented on HADOOP-12949:
-

Moved back to a dependency of S3A phase III from phase II; I'm not expecting 
this for Hadoop 2.8.

Colin, regarding S3 and UA headers: yes, Amazon can use the UA headers when 
dealing with problems. But that's for support issues, not performance (except 
in the more general "why am I being throttled" case).







[jira] [Commented] (HADOOP-12949) Add HTrace to the s3a connector

2016-06-20 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340829#comment-15340829
 ] 

Colin Patrick McCabe commented on HADOOP-12949:
---

Yeah, we certainly could use the UA header for this.  That assumes that 
Amazon's s3 implementation will start looking for this (which maybe they 
will?).  In the short term, the big win will be just connecting up the job 
being run with the operations being done at the s3a level.







[jira] [Commented] (HADOOP-12949) Add HTrace to the s3a connector

2016-06-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15338785#comment-15338785
 ] 

Steve Loughran commented on HADOOP-12949:
-

+ we'll want to have the htrace context ID go all the way down to s3 by way of 
the HADOOP-13122 UA header. That lets your storage infra provider know which 
queries are causing problems and, if this goes via a proxy capable of reading 
the HTTP requests, lets them sample and correlate with network load.
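
A minimal sketch of what attaching the ID to the UA header could look like, assuming the context ID is available as a plain string. The class {{TraceUserAgent}}, the {{trace/<id>}} token and the character whitelist are all assumptions for illustration; they are not what HADOOP-13122 or s3a defines.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Hadoop or the AWS SDK) that appends a
 * trace context ID to a User-Agent value as an extra product token.
 */
final class TraceUserAgent {
    // Assumed policy: accept only a conservative character set so the ID
    // cannot carry spaces, separators or line breaks into the header.
    private static final Pattern SAFE_ID =
        Pattern.compile("[A-Za-z0-9._-]{1,64}");

    private TraceUserAgent() {
    }

    /**
     * Returns the base User-Agent with " trace/<id>" appended, or the base
     * value unchanged when the ID is missing or fails the whitelist.
     */
    static String withTraceContext(String baseUserAgent, String traceId) {
        if (traceId == null || !SAFE_ID.matcher(traceId).matches()) {
            return baseUserAgent;
        }
        return baseUserAgent + " trace/" + traceId;
    }

    public static void main(String[] args) {
        // Both the base value and the ID are made-up sample inputs.
        System.out.println(withTraceContext("Hadoop/3.1.0", "4f2a9c01"));
    }
}
```

Dropping an unsafe ID rather than failing the request keeps tracing from ever breaking I/O; whether that is the right trade-off is open to discussion.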







[jira] [Commented] (HADOOP-12949) Add HTrace to the s3a connector

2016-06-13 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327116#comment-15327116
 ] 

Steve Loughran commented on HADOOP-12949:
-

Marking as a dependency of s3a phase III







[jira] [Commented] (HADOOP-12949) Add HTrace to the s3a connector

2016-03-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15206107#comment-15206107
 ] 

Steve Loughran commented on HADOOP-12949:
-

There's actually some metrics collection in openstack swift; look under 
{{org.apache.hadoop.fs.swift.util.DurationStats}}. The stats log primarily to 
stdout and list min, max, (moving) arithmetic mean and stddev, by HTTP verb.

# It's pretty low cost to do this; even when htrace sampling is inactive, the 
stats for an FS can be collected.
# The stats showed that rackspace UK throttles delete requests; the more files 
in a directory I was cleaning up on teardown, the longer it took, and the time 
grew exponentially rather than linearly.
# I didn't hook the code up to the normal hadoop metrics; it's something I'd add 
as an option now, because it does become something you need to monitor now that 
we are shifting to longer-lived applications.
# I'd add more on causes of operations, specifically: open(), seek(), duration 
of close(), delete(); things where the fact that object stores are generally 
O(files*data) means they don't work as expected. Finding that mismatch of 
expectations matters.

More and more object stores are coming in. While s3 is the main one, it'd be 
good to have the core stuff store neutral. The classes from hadoop-openstack 
can be moved if that helps; the per-verb stuff is useful at the deep levels, 
while htrace monitoring can track cost of specific actions.
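
To make the per-verb idea concrete, here is a small store-neutral sketch that tracks count, min, max, mean and stddev per HTTP verb. It is not the hadoop-openstack {{DurationStats}} code: the class {{VerbDurationStats}} is a hypothetical name, and it keeps a running mean over all samples (Welford's method) rather than a moving one.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical per-verb duration statistics, written for illustration only.
 * Not the hadoop-openstack implementation.
 */
final class VerbDurationStats {

    /** Running statistics for one verb, updated with Welford's method. */
    static final class Stats {
        long count;
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        double mean;
        double m2; // sum of squared deviations from the running mean

        void add(long millis) {
            count++;
            min = Math.min(min, millis);
            max = Math.max(max, millis);
            double delta = millis - mean;
            mean += delta / count;
            m2 += delta * (millis - mean);
        }

        /** Sample standard deviation; 0 with fewer than two samples. */
        double stddev() {
            return count < 2 ? 0.0 : Math.sqrt(m2 / (count - 1));
        }
    }

    private final Map<String, Stats> byVerb = new TreeMap<>();

    /** Records one operation's duration under its HTTP verb. */
    synchronized void record(String verb, long millis) {
        byVerb.computeIfAbsent(verb, v -> new Stats()).add(millis);
    }

    /** Returns the stats for a verb, or null if none were recorded. */
    synchronized Stats get(String verb) {
        return byVerb.get(verb);
    }

    public static void main(String[] args) {
        VerbDurationStats stats = new VerbDurationStats();
        // Made-up sample durations in milliseconds.
        stats.record("DELETE", 10);
        stats.record("DELETE", 20);
        stats.record("DELETE", 30);
        Stats d = stats.get("DELETE");
        System.out.println("DELETE count=" + d.count + " min=" + d.min
            + " max=" + d.max + " mean=" + d.mean + " stddev=" + d.stddev());
    }
}
```

Updating a handful of fields per request is cheap, which is why collection can stay on even when tracing is not sampling. Hooking the values into the normal hadoop metrics would be a separate step.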








[jira] [Commented] (HADOOP-12949) Add HTrace to the s3a connector

2016-03-21 Thread Colin Patrick McCabe (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-12949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15204750#comment-15204750
 ] 

Colin Patrick McCabe commented on HADOOP-12949:
---

Hi [~madhawa], great idea!  I think the first thing to do is to read a bit 
about how to set up HTrace.  See 
http://blog.cloudera.com/blog/2015/12/new-in-cloudera-labs-apache-htrace-incubating/
If you can get a working setup for HTrace-on-HDFS, it will help for adding 
tracing to other projects such as the s3a connector.



