[jira] [Created] (HADOOP-10809) hadoop-azure: page blob support

2014-07-09 Thread Mike Liddell (JIRA)
Mike Liddell created HADOOP-10809:
-

 Summary: hadoop-azure: page blob support
 Key: HADOOP-10809
 URL: https://issues.apache.org/jira/browse/HADOOP-10809
 Project: Hadoop Common
  Issue Type: Improvement
  Components: tools
Reporter: Mike Liddell


Azure Blob Storage provides two flavors: block-blobs and page-blobs.
Block-blobs are the general-purpose kind that support convenient APIs and are
the basis for the Azure Filesystem for Hadoop (see HADOOP-9629).

Page-blobs are more difficult to use but provide a different feature set.  Most
importantly, page-blobs can cope with an effectively infinite number of small
accesses, whereas block-blobs can only tolerate 50K appends before relatively
manual rewriting of the data is necessary.  The simplest analogy is that
page-blobs are like a normal filesystem (e.g., FAT) and the API is like a
low-level device driver.

See http://msdn.microsoft.com/en-us/library/azure/ee691964.aspx for some
introductory material.

The primary driving scenario for page-blob support is HBase transaction log
files, which require an access pattern of many small writes.  Additional
scenarios can also be supported.

Configuration:
The Hadoop Filesystem abstraction needs a mechanism so that file-create can
determine whether to create a block-blob or a page-blob.  To permit scenarios
where application code doesn't know about the details of Azure Storage, we
would like the configuration to be aspect-style, i.e., configured by the
administrator and transparent to the application.  The current solution is to
use Hadoop configuration to declare a list of page-blob folders -- the Azure
Filesystem for Hadoop will create files in these folders using the page-blob
flavor.  The configuration key is fs.azure.page.blob.dir, and its description
can be found in AzureNativeFileSystemStore.java.
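A minimal sketch of how a client might use this configuration; the folder,
container, account, and file names below are examples chosen for illustration,
not values taken from the patch:

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PageBlobConfigSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical folder list; files created under these paths would use the page-blob flavor.
    conf.set("fs.azure.page.blob.dir", "/hbase/WALs");

    // Hypothetical container/account, using the wasb scheme from HADOOP-9629.
    FileSystem fs = FileSystem.get(URI.create("wasb://mycontainer@myaccount"), conf);

    // The HBase-WAL-style access pattern: many small writes, each made durable.
    FSDataOutputStream out = fs.create(new Path("/hbase/WALs/wal-00001"));
    out.write("log record".getBytes("UTF-8"));
    out.hsync();   // durable small appends are what page blobs are intended to serve
    out.close();
  }
}
{code}

Note that the application code above is ordinary FileSystem usage; only the
configuration decides which blob flavor backs the file.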

Code changes:
- refactoring of the basic Azure Filesystem code to use a general BlobWrapper
with specialized BlockBlobWrapper and PageBlobWrapper implementations
- introduction of PageBlob support (read, write, etc.)
- miscellaneous changes such as umask handling and implementations of
createNonRecursive() and flush/hflush/hsync
- new unit tests.

Credit for the primary patch: Dexter Bradshaw, Mostafa Elhemali, Eric Hanson, 
Mike Liddell.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-10809) hadoop-azure: page blob support

2014-07-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10809:
--

Description: 
Azure Blob Storage provides two flavors: block-blobs and page-blobs.
Block-blobs are the general-purpose kind that support convenient APIs and are
the basis for the Azure Filesystem for Hadoop (see HADOOP-9629).

Page-blobs use the same namespace as block-blobs but provide a different
low-level feature set.  Most importantly, page-blobs can cope with an
effectively infinite number of small accesses, whereas block-blobs can only
tolerate 50K appends before relatively manual rewriting of the data is
necessary.  The simplest analogy is that page-blobs are like a normal
filesystem (e.g., FAT) and the API is like a low-level device driver.

See http://msdn.microsoft.com/en-us/library/azure/ee691964.aspx for some
introductory material.

The primary driving scenario for page-blob support is HBase transaction log
files, which require an access pattern of many small writes.  Additional
scenarios can also be supported.

Configuration:
The Hadoop Filesystem abstraction needs a mechanism so that file-create can
determine whether to create a block-blob or a page-blob.  To permit scenarios
where application code doesn't know about the details of Azure Storage, we
would like the configuration to be aspect-style, i.e., configured by the
administrator and transparent to the application.  The current solution is to
use Hadoop configuration to declare a list of page-blob folders -- the Azure
Filesystem for Hadoop will create files in these folders using the page-blob
flavor.  The configuration key is fs.azure.page.blob.dir, and its description
can be found in AzureNativeFileSystemStore.java.

Code changes:
- refactoring of the basic Azure Filesystem code to use a general BlobWrapper
with specialized BlockBlobWrapper and PageBlobWrapper implementations
- introduction of PageBlob support (read, write, etc.)
- miscellaneous changes such as umask handling and implementations of
createNonRecursive() and flush/hflush/hsync
- new unit tests.

Credit for the primary patch: Dexter Bradshaw, Mostafa Elhemali, Eric Hanson,
Mike Liddell.



 hadoop-azure: page blob support
 ---

 Key: HADOOP-10809
 URL: https://issues.apache.org/jira/browse/HADOOP-10809
 Project: Hadoop Common
  Issue Type: Improvement
  Components: tools
Reporter: Mike Liddell


[jira] [Updated] (HADOOP-10809) hadoop-azure: page blob support

2014-07-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10809:
--

Description: 
Azure Blob Storage provides two flavors: block-blobs and page-blobs.
Block-blobs are the general-purpose kind that support convenient APIs and are
the basis for the Azure Filesystem for Hadoop (see HADOOP-9629).

Page-blobs use the same namespace as block-blobs but provide a different
low-level feature set.  Most importantly, page-blobs can cope with an
effectively infinite number of small accesses, whereas block-blobs can only
tolerate 50K appends before relatively manual rewriting of the data is
necessary.  A simple analogy is that page-blobs are like a regular disk and the
basic API is like a low-level device driver.

See http://msdn.microsoft.com/en-us/library/azure/ee691964.aspx for some
introductory material.

The primary driving scenario for page-blob support is HBase transaction log
files, which require an access pattern of many small writes.  Additional
scenarios can also be supported.

Configuration:
The Hadoop Filesystem abstraction needs a mechanism so that file-create can
determine whether to create a block-blob or a page-blob.  To permit scenarios
where application code doesn't know about the details of Azure Storage, we
would like the configuration to be aspect-style, i.e., configured by the
administrator and transparent to the application.  The current solution is to
use Hadoop configuration to declare a list of page-blob folders -- the Azure
Filesystem for Hadoop will create files in these folders using the page-blob
flavor.  The configuration key is fs.azure.page.blob.dir, and its description
can be found in AzureNativeFileSystemStore.java.

Code changes:
- refactoring of the basic Azure Filesystem code to use a general BlobWrapper
with specialized BlockBlobWrapper and PageBlobWrapper implementations
- introduction of PageBlob support (read, write, etc.)
- miscellaneous changes such as umask handling and implementations of
createNonRecursive() and flush/hflush/hsync
- new unit tests.

Credit for the primary patch: Dexter Bradshaw, Mostafa Elhemali, Eric Hanson,
Mike Liddell.



 hadoop-azure: page blob support
 ---

 Key: HADOOP-10809
 URL: https://issues.apache.org/jira/browse/HADOOP-10809
 Project: Hadoop Common
  Issue Type: Improvement
  Components: tools
Reporter: Mike Liddell


[jira] [Updated] (HADOOP-10809) hadoop-azure: page blob support

2014-07-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10809:
--

Attachment: HADOOP-10809.1.patch

 hadoop-azure: page blob support
 ---

 Key: HADOOP-10809
 URL: https://issues.apache.org/jira/browse/HADOOP-10809
 Project: Hadoop Common
  Issue Type: Improvement
  Components: tools
Reporter: Mike Liddell
 Attachments: HADOOP-10809.1.patch




[jira] [Updated] (HADOOP-10809) hadoop-azure: page blob support

2014-07-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10809:
--

Status: Patch Available  (was: In Progress)

 hadoop-azure: page blob support
 ---

 Key: HADOOP-10809
 URL: https://issues.apache.org/jira/browse/HADOOP-10809
 Project: Hadoop Common
  Issue Type: Improvement
  Components: tools
Reporter: Mike Liddell
Assignee: Mike Liddell
 Attachments: HADOOP-10809.1.patch




[jira] [Work started] (HADOOP-10809) hadoop-azure: page blob support

2014-07-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-10809 started by Mike Liddell.

 hadoop-azure: page blob support
 ---

 Key: HADOOP-10809
 URL: https://issues.apache.org/jira/browse/HADOOP-10809
 Project: Hadoop Common
  Issue Type: Improvement
  Components: tools
Reporter: Mike Liddell
Assignee: Mike Liddell
 Attachments: HADOOP-10809.1.patch




[jira] [Assigned] (HADOOP-10809) hadoop-azure: page blob support

2014-07-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell reassigned HADOOP-10809:
-

Assignee: Mike Liddell

 hadoop-azure: page blob support
 ---

 Key: HADOOP-10809
 URL: https://issues.apache.org/jira/browse/HADOOP-10809
 Project: Hadoop Common
  Issue Type: Improvement
  Components: tools
Reporter: Mike Liddell
Assignee: Mike Liddell
 Attachments: HADOOP-10809.1.patch




[jira] [Updated] (HADOOP-10728) Metrics system for Windows Azure Storage Filesystem

2014-06-24 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10728:
--

Attachment: HADOOP-10728.3.patch

New patch addressing:
- license header
- simplification of pom.xml (use default behavior for src/test/resources)

 Metrics system for Windows Azure Storage Filesystem
 ---

 Key: HADOOP-10728
 URL: https://issues.apache.org/jira/browse/HADOOP-10728
 Project: Hadoop Common
  Issue Type: New Feature
  Components: tools
Reporter: Mike Liddell
Assignee: Mike Liddell
 Attachments: HADOOP-10728.2.patch, HADOOP-10728.3.patch




[jira] [Updated] (HADOOP-10728) Metrics system for Windows Azure Storage Filesystem

2014-06-23 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10728:
--

Attachment: HADOOP-10728.2.patch

New patch: HADOOP-10728.2.patch

- moved the finalizer from BandwidthGaugeUpdater to AzureNativeFileSystem.  (It
would logically be better on the former, but those instances are still attached
to GC roots when a filesystem instance gets GCed.  This was the root cause of
the testFinalizerThreadShutdown failure.)
- revised testFinalizerThreadShutdown to accurately track thread counts.
- fixed pom.xml to include the metrics configuration file for testing
  (we now include * from src/test/resources)
- Apache headers added to all files
- javadoc issues fixed.
- findbugs issues fixed.
- minor tweak to README.txt
- minor tweak to .gitignore

 Metrics system for Windows Azure Storage Filesystem
 ---

 Key: HADOOP-10728
 URL: https://issues.apache.org/jira/browse/HADOOP-10728
 Project: Hadoop Common
  Issue Type: New Feature
  Components: tools
Reporter: Mike Liddell
Assignee: Mike Liddell
 Attachments: HADOOP-10728.2.patch




[jira] [Work started] (HADOOP-10728) Metrics system for Windows Azure Storage Filesystem

2014-06-23 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HADOOP-10728 started by Mike Liddell.

 Metrics system for Windows Azure Storage Filesystem
 ---

 Key: HADOOP-10728
 URL: https://issues.apache.org/jira/browse/HADOOP-10728
 Project: Hadoop Common
  Issue Type: New Feature
  Components: tools
Reporter: Mike Liddell
Assignee: Mike Liddell
 Attachments: HADOOP-10728.2.patch




[jira] [Updated] (HADOOP-10728) Metrics system for Windows Azure Storage Filesystem

2014-06-23 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10728:
--

Status: Patch Available  (was: In Progress)

 Metrics system for Windows Azure Storage Filesystem
 ---

 Key: HADOOP-10728
 URL: https://issues.apache.org/jira/browse/HADOOP-10728
 Project: Hadoop Common
  Issue Type: New Feature
  Components: tools
Reporter: Mike Liddell
Assignee: Mike Liddell
 Attachments: HADOOP-10728.2.patch




[jira] [Commented] (HADOOP-9559) When metrics system is restarted MBean names get incorrectly flagged as dupes

2014-06-19 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038195#comment-14038195
 ] 

Mike Liddell commented on HADOOP-9559:
--

[~vicaya] I think DefaultMetricsSystem#sourceName is not new code (and it is
used by MetricsSystemImpl), so no fix.

A new patch will be added with small amendments: just adding @VisibleForTesting
to MetricsSourceAdapter#getMBeanName() and MetricsSystemImpl#getSourceAdapter.

 When metrics system is restarted MBean names get incorrectly flagged as dupes
 -

 Key: HADOOP-9559
 URL: https://issues.apache.org/jira/browse/HADOOP-9559
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Mostafa Elhemali
 Attachments: HADOOP-9559.2.txt, HADOOP-9559.patch


 In the Metrics2 system, every source gets registered as an MBean name, which
 gets put into a unique name pool in the singleton DefaultMetricsSystem
 object. The problem is that when the metrics system is shut down (which
 unregisters the MBeans), this unique name pool is left as is, so if the
 metrics system is started again, every attempt to register the same MBean
 names fails (the exception is eaten and a warning is logged).
 I think the fix here is to remove the name from the unique name pool when an
 MBean is unregistered, since it is OK at that point to add it again.
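 A minimal sketch of the proposed behavior, using a stand-in name pool rather
 than the actual DefaultMetricsSystem code; the class and method names here are
 illustrative only:

{code}
import java.util.HashSet;
import java.util.Set;

// Illustrative stand-in for a "unique name pool": registration rejects duplicates,
// and unregistration removes the name so it can be registered again after a restart.
public class UniqueNamePoolSketch {
  private final Set<String> names = new HashSet<String>();

  public synchronized String register(String name) {
    if (!names.add(name)) {
      throw new IllegalStateException("Metrics source " + name + " already exists!");
    }
    return name;
  }

  // The proposed fix: forget the name on unregister instead of leaving the pool as is.
  public synchronized void unregister(String name) {
    names.remove(name);
  }
}
{code}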





[jira] [Updated] (HADOOP-9559) When metrics system is restarted MBean names get incorrectly flagged as dupes

2014-06-19 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9559:
-

Attachment: HADOOP-9559.2.txt

 When metrics system is restarted MBean names get incorrectly flagged as dupes
 -

 Key: HADOOP-9559
 URL: https://issues.apache.org/jira/browse/HADOOP-9559
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Mostafa Elhemali
 Attachments: HADOOP-9559.2.txt, HADOOP-9559.patch




[jira] [Updated] (HADOOP-9559) When metrics system is restarted MBean names get incorrectly flagged as dupes

2014-06-19 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9559:
-

Attachment: HADOOP-9559.2.patch

 When metrics system is restarted MBean names get incorrectly flagged as dupes
 -

 Key: HADOOP-9559
 URL: https://issues.apache.org/jira/browse/HADOOP-9559
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Mostafa Elhemali
 Attachments: HADOOP-9559.2.patch, HADOOP-9559.patch




[jira] [Updated] (HADOOP-9559) When metrics system is restarted MBean names get incorrectly flagged as dupes

2014-06-19 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9559:
-

Attachment: (was: HADOOP-9559.2.txt)

 When metrics system is restarted MBean names get incorrectly flagged as dupes
 -

 Key: HADOOP-9559
 URL: https://issues.apache.org/jira/browse/HADOOP-9559
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Mostafa Elhemali
 Attachments: HADOOP-9559.2.patch, HADOOP-9559.patch




[jira] [Created] (HADOOP-10728) Metrics system for Windows Azure Storage Filesystem

2014-06-19 Thread Mike Liddell (JIRA)
Mike Liddell created HADOOP-10728:
-

 Summary: Metrics system for Windows Azure Storage Filesystem
 Key: HADOOP-10728
 URL: https://issues.apache.org/jira/browse/HADOOP-10728
 Project: Hadoop Common
  Issue Type: New Feature
  Components: tools
Reporter: Mike Liddell
Assignee: Mike Liddell


Add a metrics2 source for the Windows Azure Filesystem driver that was 
introduced with HADOOP-9629.

AzureFileSystemInstrumentation is the new MetricsSource.  

AzureNativeFilesystemStore and NativeAzureFilesystem have been modified to
record statistics, and machinery has been added for the accumulation of
'rolling average' statistics.

Primary new code appears in the org.apache.hadoop.fs.azure.metrics namespace.
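As a rough, hedged illustration of what a metrics2 source involves (the class,
record, and metric names below are placeholders; the real
AzureFileSystemInstrumentation tracks many more metrics, including the rolling
averages mentioned above):

{code}
import java.util.concurrent.atomic.AtomicLong;
import org.apache.hadoop.metrics2.MetricsCollector;
import org.apache.hadoop.metrics2.MetricsSource;
import org.apache.hadoop.metrics2.lib.DefaultMetricsSystem;
import org.apache.hadoop.metrics2.lib.Interns;

// Hypothetical metrics2 source, registered with the default metrics system so sinks can poll it.
public class AzureMetricsSketch implements MetricsSource {
  private final AtomicLong bytesUploaded = new AtomicLong();

  public void bytesUploaded(long delta) {
    bytesUploaded.addAndGet(delta);   // would be called from the store/filesystem code
  }

  @Override
  public void getMetrics(MetricsCollector collector, boolean all) {
    collector.addRecord("AzureFileSystemSketch")
        .addGauge(Interns.info("bytesUploaded", "Total bytes uploaded"), bytesUploaded.get());
  }

  public static AzureMetricsSketch create() {
    return DefaultMetricsSystem.instance()
        .register("AzureFileSystemSketch", "Sketch of Azure FS metrics", new AzureMetricsSketch());
  }
}
{code}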

h2. Credits and history
Credit for this work goes to the early team: [~minwei], [~davidlao], 
[~lengningliu] and [~stojanovic] as well as multiple people who have taken over 
this work since then (hope I don't forget anyone): [~dexterb], Johannes Klein, 
[~ivanmi], Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
[~chuanliu].






[jira] [Updated] (HADOOP-10728) Metrics system for Windows Azure Storage Filesystem

2014-06-19 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10728:
--

Attachment: HADOOP-10728.1.patch

 Metrics system for Windows Azure Storage Filesystem
 ---

 Key: HADOOP-10728
 URL: https://issues.apache.org/jira/browse/HADOOP-10728
 Project: Hadoop Common
  Issue Type: New Feature
  Components: tools
Reporter: Mike Liddell
Assignee: Mike Liddell
 Attachments: HADOOP-10728.1.patch




[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629.trunk.4.patch

New patch: HADOOP-9629.trunk.4.patch
 - removed comments re: Thread.currentThread.interrupt()
   (see ReviewBoard for the discussion)



 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mike Liddell
 Attachments: HADOOP-9629 - Azure Filesystem - Information for 
 developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
 developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
 HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
 HADOOP-9629.trunk.3.patch, HADOOP-9629.trunk.4.patch


 h2. Description
 This JIRA covers adding a new file system implementation for accessing
 Windows Azure Storage - Blob from within Hadoop, such as using blobs as input
 to MR jobs or configuring MR jobs to put their output directly into blob
 storage.
 h2. High level design
 At a high level, the code here extends the FileSystem class to provide an
 implementation for accessing blob storage; the scheme wasb is used for
 accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI
 scheme {code}wasb[s]://container@account/path/to/file{code} to address
 individual blobs. We use the standard Azure Java SDK
 (com.microsoft.windowsazure) to do most of the work. In order to map a
 hierarchical file system over the flat name-value pair nature of blob
 storage, we create a specially tagged blob named path/to/dir whenever we
 create a directory called path/to/dir; files under that are then stored as
 normal blobs path/to/dir/file. We have many metrics implemented for it using
 the Metrics2 interface. Tests are implemented mostly using a mock
 implementation of the Azure SDK functionality, with an option to test
 against real blob storage if configured (instructions are provided in
 README.txt).
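 A brief, hedged sketch of how this mapping looks from the client side; the
 container, account, and paths below are placeholders, not taken from the patch:

{code}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WasbMappingSketch {
  public static void main(String[] args) throws Exception {
    // Placeholder container/account; wasbs would be used for HTTPS.
    FileSystem fs = FileSystem.get(URI.create("wasb://mycontainer@myaccount"), new Configuration());

    // mkdirs() is what creates the specially tagged "directory" blob (path/to/dir).
    fs.mkdirs(new Path("/path/to/dir"));

    // Files under the directory are stored as ordinary blobs (path/to/dir/file).
    fs.create(new Path("/path/to/dir/file")).close();

    // Listing walks the flat blob namespace but presents it hierarchically.
    for (FileStatus status : fs.listStatus(new Path("/path/to/dir"))) {
      System.out.println(status.getPath());
    }
  }
}
{code}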
 h2. Credits and history
 This has been ongoing work for a while, and the early version of this work 
 can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
 we'll post the patch here for Hadoop trunk first, then post a patch for 
 branch-1 as well for backporting the functionality if accepted. Credit for 
 this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
 [~stojanovic] as well as multiple people who have taken over this work since 
 then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
 Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
 [~chuanliu].
 h2. Test
 Besides unit tests, we have used WASB as the default file system in our
 service product. (HDFS is also used, but not as the default file system.)
 Various customer and test workloads have been run against clusters with
 such configurations for quite some time. The current version reflects the
 version of the code tested and used in our production environment.





[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-09 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025857#comment-14025857
 ] 

Mike Liddell commented on HADOOP-9629:
--

Thanks Chris! I have applied your patch - looks good.

 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mike Liddell
 Attachments: HADOOP-9629 - Azure Filesystem - Information for 
 developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
 developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
 HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
 HADOOP-9629.trunk.3.patch, HADOOP-9629.trunk.4.patch, 
 HADOOP-9629.trunk.5.patch




[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629.trunk.3.patch

 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mike Liddell
 Attachments: HADOOP-9629 - Azure Filesystem - Information for 
 developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
 developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
 HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
 HADOOP-9629.trunk.3.patch




[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14021497#comment-14021497
 ] 

Mike Liddell commented on HADOOP-9629:
--

The annotations and suggested usages sound good.
The only changes that I suggest are:
- AzureException: Public + Evolving
- WasbFsck: Public + Evolving.

Sound good?
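For reference, a hedged sketch of what that classification would look like in
code; the superclass and constructor here are assumptions for illustration, not
the committed source:

{code}
import java.io.IOException;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Illustrative only: marks the exception as a public, still-evolving API surface.
@InterfaceAudience.Public
@InterfaceStability.Evolving
public class AzureException extends IOException {
  public AzureException(String message) {
    super(message);
  }
}
{code}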

 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mike Liddell
 Attachments: HADOOP-9629 - Azure Filesystem - Information for 
 developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
 developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
 HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
 HADOOP-9629.trunk.3.patch




[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: (was: HADOOP-9629.trunk.3.patch)

 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mike Liddell
 Attachments: HADOOP-9629 - Azure Filesystem - Information for 
 developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
 developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
 HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch




[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629.trunk.3.patch

 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mike Liddell
 Attachments: HADOOP-9629 - Azure Filesystem - Information for 
 developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
 developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
 HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
 HADOOP-9629.trunk.3.patch




[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14021499#comment-14021499
 ] 

Mike Liddell commented on HADOOP-9629:
--

new patch: HADOOP-9629.trunk.4.patch
 - addresses code-review comments from [~cnauroth], see 
https://reviews.apache.org/r/22096/
 - adds InterfaceAudience and InterfaceStability annotations to the main 
classes.
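
For readers unfamiliar with these annotations, the sketch below shows the general 
style; the class name and the chosen audience/stability levels are illustrative 
assumptions, not copied from the patch:

{code}
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;

// Hypothetical class, shown only to illustrate how the annotations mark the
// public API surface; the patch applies them to the main hadoop-azure classes.
@InterfaceAudience.Public
@InterfaceStability.Stable
public class ExampleAnnotatedClass {
}
{code}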

 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mike Liddell
 Attachments: HADOOP-9629 - Azure Filesystem - Information for 
 developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
 developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
 HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
 HADOOP-9629.trunk.3.patch


 h2. Description
 This JIRA incorporates adding a new file system implementation for accessing 
 Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
 to MR jobs or configuring MR jobs to put their output directly into blob 
 storage.
 h2. High level design
 At a high level, the code here extends the FileSystem class to provide an 
 implementation for accessing blob storage; the scheme wasb is used for 
 accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
 scheme: {code}wasb[s]://container@account/path/to/file{code} to address 
 individual blobs. We use the standard Azure Java SDK 
 (com.microsoft.windowsazure) to do most of the work. In order to map a 
 hierarchical file system over the flat name-value pair nature of blob 
 storage, we create a specially tagged blob named path/to/dir whenever we 
 create a directory called path/to/dir, then files under that are stored as 
 normal blobs path/to/dir/file. We have many metrics implemented for it using 
 the Metrics2 interface. Tests are implemented mostly using a mock 
 implementation for the Azure SDK functionality, with an option to test 
 against real blob storage if configured (instructions provided in README.txt).
 h2. Credits and history
 This has been ongoing work for a while, and the early version of this work 
 can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
 we'll post the patch here for Hadoop trunk first, then post a patch for 
 branch-1 as well for backporting the functionality if accepted. Credit for 
 this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
 [~stojanovic] as well as multiple people who have taken over this work since 
 then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
 Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
 [~chuanliu].
 h2. Test
 Besides unit tests, we have used WASB as the default file system in our 
 service product. (HDFS is also used, but not as the default file system.) 
 Various customer and test workloads have been run against clusters with such 
 configurations for quite some time. The current version reflects the version 
 of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-06-08 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14021506#comment-14021506
 ] 

Mike Liddell commented on HADOOP-9629:
--

The previous comment gave the wrong patch-file name: the new patch is 
HADOOP-9629.trunk.3.patch

 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mike Liddell
 Attachments: HADOOP-9629 - Azure Filesystem - Information for 
 developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
 developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
 HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch, 
 HADOOP-9629.trunk.3.patch


 h2. Description
 This JIRA incorporates adding a new file system implementation for accessing 
 Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
 to MR jobs or configuring MR jobs to put their output directly into blob 
 storage.
 h2. High level design
 At a high level, the code here extends the FileSystem class to provide an 
 implementation for accessing blob storage; the scheme wasb is used for 
 accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
 scheme: {code}wasb[s]://container@account/path/to/file{code} to address 
 individual blobs. We use the standard Azure Java SDK 
 (com.microsoft.windowsazure) to do most of the work. In order to map a 
 hierarchical file system over the flat name-value pair nature of blob 
 storage, we create a specially tagged blob named path/to/dir whenever we 
 create a directory called path/to/dir, then files under that are stored as 
 normal blobs path/to/dir/file. We have many metrics implemented for it using 
 the Metrics2 interface. Tests are implemented mostly using a mock 
 implementation for the Azure SDK functionality, with an option to test 
 against real blob storage if configured (instructions provided in README.txt).
 h2. Credits and history
 This has been ongoing work for a while, and the early version of this work 
 can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
 we'll post the patch here for Hadoop trunk first, then post a patch for 
 branch-1 as well for backporting the functionality if accepted. Credit for 
 this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
 [~stojanovic] as well as multiple people who have taken over this work since 
 then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
 Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
 [~chuanliu].
 h2. Test
 Besides unit tests, we have used WASB as the default file system in our 
 service product. (HDFS is also used, but not as the default file system.) 
 Various customer and test workloads have been run against clusters with such 
 configurations for quite some time. The current version reflects the version 
 of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13999338#comment-13999338
 ] 

Mike Liddell commented on HADOOP-9629:
--

A revised approach is now being used so that the Azure driver is handled the 
same way as the OpenStack driver:
 - The Azure FileSystem driver is now a separate project, 
hadoop-tools/hadoop-azure

As part of moving to a separate project area, the following have also been done:
- findbugs
- checkstyle
- code cleanup based on the above and also based on Apache formatting rules
- removal of the metrics code for now (it will come back later as a dedicated patch)

Namespace altered from org.apache.hadoop.fs.azurenative -> 
org.apache.hadoop.fs.azure

The new patch is HADOOP-9629.trunk.1.patch


 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mostafa Elhemali
 Attachments: HADOOP-9629.2.patch, HADOOP-9629.3.patch, 
 HADOOP-9629.patch, HADOOP-9629.trunk.1.patch


 h2. Description
 This JIRA incorporates adding a new file system implementation for accessing 
 Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
 to MR jobs or configuring MR jobs to put their output directly into blob 
 storage.
 h2. High level design
 At a high level, the code here extends the FileSystem class to provide an 
 implementation for accessing blob storage; the scheme wasb is used for 
 accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
 scheme: {code}wasb[s]://container@account/path/to/file{code} to address 
 individual blobs. We use the standard Azure Java SDK 
 (com.microsoft.windowsazure) to do most of the work. In order to map a 
 hierarchical file system over the flat name-value pair nature of blob 
 storage, we create a specially tagged blob named path/to/dir whenever we 
 create a directory called path/to/dir, then files under that are stored as 
 normal blobs path/to/dir/file. We have many metrics implemented for it using 
 the Metrics2 interface. Tests are implemented mostly using a mock 
 implementation for the Azure SDK functionality, with an option to test 
 against real blob storage if configured (instructions provided in README.txt).
 h2. Credits and history
 This has been ongoing work for a while, and the early version of this work 
 can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
 we'll post the patch here for Hadoop trunk first, then post a patch for 
 branch-1 as well for backporting the functionality if accepted. Credit for 
 this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
 [~stojanovic] as well as multiple people who have taken over this work since 
 then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
 Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
 [~chuanliu].
 h2. Test
 Besides unit tests, we have used WASB as the default file system in our 
 service product. (HDFS is also used, but not as the default file system.) 
 Various customer and test workloads have been run against clusters with such 
 configurations for quite some time. The current version reflects the version 
 of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell reassigned HADOOP-9629:


Assignee: Mike Liddell  (was: Mostafa Elhemali)

 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mike Liddell
 Attachments: HADOOP-9629.2.patch, HADOOP-9629.3.patch, 
 HADOOP-9629.patch, HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch


 h2. Description
 This JIRA incorporates adding a new file system implementation for accessing 
 Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
 to MR jobs or configuring MR jobs to put their output directly into blob 
 storage.
 h2. High level design
 At a high level, the code here extends the FileSystem class to provide an 
 implementation for accessing blob storage; the scheme wasb is used for 
 accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
 scheme: {code}wasb[s]://container@account/path/to/file{code} to address 
 individual blobs. We use the standard Azure Java SDK 
 (com.microsoft.windowsazure) to do most of the work. In order to map a 
 hierarchical file system over the flat name-value pair nature of blob 
 storage, we create a specially tagged blob named path/to/dir whenever we 
 create a directory called path/to/dir, then files under that are stored as 
 normal blobs path/to/dir/file. We have many metrics implemented for it using 
 the Metrics2 interface. Tests are implemented mostly using a mock 
 implementation for the Azure SDK functionality, with an option to test 
 against real blob storage if configured (instructions provided in README.txt).
 h2. Credits and history
 This has been ongoing work for a while, and the early version of this work 
 can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
 we'll post the patch here for Hadoop trunk first, then post a patch for 
 branch-1 as well for backporting the functionality if accepted. Credit for 
 this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
 [~stojanovic] as well as multiple people who have taken over this work since 
 then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
 Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
 [~chuanliu].
 h2. Test
 Besides unit tests, we have used WASB as the default file system in our 
 service product. (HDFS is also used, but not as the default file system.) 
 Various customer and test workloads have been run against clusters with such 
 configurations for quite some time. The current version reflects the version 
 of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629.trunk.2.patch

New patch:
- added apache headers to XML files
- fixed the suppression of m2e warning (in pom.xml)

 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mostafa Elhemali
 Attachments: HADOOP-9629.2.patch, HADOOP-9629.3.patch, 
 HADOOP-9629.patch, HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch


 h2. Description
 This JIRA incorporates adding a new file system implementation for accessing 
 Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
 to MR jobs or configuring MR jobs to put their output directly into blob 
 storage.
 h2. High level design
 At a high level, the code here extends the FileSystem class to provide an 
 implementation for accessing blob storage; the scheme wasb is used for 
 accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
 scheme: {code}wasb[s]://container@account/path/to/file{code} to address 
 individual blobs. We use the standard Azure Java SDK 
 (com.microsoft.windowsazure) to do most of the work. In order to map a 
 hierarchical file system over the flat name-value pair nature of blob 
 storage, we create a specially tagged blob named path/to/dir whenever we 
 create a directory called path/to/dir, then files under that are stored as 
 normal blobs path/to/dir/file. We have many metrics implemented for it using 
 the Metrics2 interface. Tests are implemented mostly using a mock 
 implementation for the Azure SDK functionality, with an option to test 
 against real blob storage if configured (instructions provided in README.txt).
 h2. Credits and history
 This has been ongoing work for a while, and the early version of this work 
 can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
 we'll post the patch here for Hadoop trunk first, then post a patch for 
 branch-1 as well for backporting the functionality if accepted. Credit for 
 this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
 [~stojanovic] as well as multiple people who have taken over this work since 
 then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
 Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
 [~chuanliu].
 h2. Test
 Besides unit tests, we have used WASB as the default file system in our 
 service product. (HDFS is also used, but not as the default file system.) 
 Various customer and test workloads have been run against clusters with such 
 configurations for quite some time. The current version reflects the version 
 of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629 - Azure Filesystem - Information for developers.pdf

 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mike Liddell
 Attachments: HADOOP-9629 - Azure Filesystem - Information for 
 developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
 developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
 HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch


 h2. Description
 This JIRA incorporates adding a new file system implementation for accessing 
 Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
 to MR jobs or configuring MR jobs to put their output directly into blob 
 storage.
 h2. High level design
 At a high level, the code here extends the FileSystem class to provide an 
 implementation for accessing blob storage; the scheme wasb is used for 
 accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
 scheme: {code}wasb[s]://container@account/path/to/file{code} to address 
 individual blobs. We use the standard Azure Java SDK 
 (com.microsoft.windowsazure) to do most of the work. In order to map a 
 hierarchical file system over the flat name-value pair nature of blob 
 storage, we create a specially tagged blob named path/to/dir whenever we 
 create a directory called path/to/dir, then files under that are stored as 
 normal blobs path/to/dir/file. We have many metrics implemented for it using 
 the Metrics2 interface. Tests are implemented mostly using a mock 
 implementation for the Azure SDK functionality, with an option to test 
 against real blob storage if configured (instructions provided in README.txt).
 h2. Credits and history
 This has been ongoing work for a while, and the early version of this work 
 can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
 we'll post the patch here for Hadoop trunk first, then post a patch for 
 branch-1 as well for backporting the functionality if accepted. Credit for 
 this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
 [~stojanovic] as well as multiple people who have taken over this work since 
 then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
 Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
 [~chuanliu].
 h2. Test
 Besides unit tests, we have used WASB as the default file system in our 
 service product. (HDFS is also used, but not as the default file system.) 
 Various customer and test workloads have been run against clusters with such 
 configurations for quite some time. The current version reflects the version 
 of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629.trunk.1.patch

 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mostafa Elhemali
 Attachments: HADOOP-9629.2.patch, HADOOP-9629.3.patch, 
 HADOOP-9629.patch, HADOOP-9629.trunk.1.patch


 h2. Description
 This JIRA incorporates adding a new file system implementation for accessing 
 Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
 to MR jobs or configuring MR jobs to put their output directly into blob 
 storage.
 h2. High level design
 At a high level, the code here extends the FileSystem class to provide an 
 implementation for accessing blob storage; the scheme wasb is used for 
 accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
 scheme: {code}wasb[s]://container@account/path/to/file{code} to address 
 individual blobs. We use the standard Azure Java SDK 
 (com.microsoft.windowsazure) to do most of the work. In order to map a 
 hierarchical file system over the flat name-value pair nature of blob 
 storage, we create a specially tagged blob named path/to/dir whenever we 
 create a directory called path/to/dir, then files under that are stored as 
 normal blobs path/to/dir/file. We have many metrics implemented for it using 
 the Metrics2 interface. Tests are implemented mostly using a mock 
 implementation for the Azure SDK functionality, with an option to test 
 against real blob storage if configured (instructions provided in README.txt).
 h2. Credits and history
 This has been ongoing work for a while, and the early version of this work 
 can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
 we'll post the patch here for Hadoop trunk first, then post a patch for 
 branch-1 as well for backporting the functionality if accepted. Credit for 
 this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
 [~stojanovic] as well as multiple people who have taken over this work since 
 then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
 Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
 [~chuanliu].
 h2. Test
 Besides unit tests, we have used WASB as the default file system in our 
 service product. (HDFS is also used, but not as the default file system.) 
 Various customer and test workloads have been run against clusters with such 
 configurations for quite some time. The current version reflects the version 
 of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9629:
-

Attachment: HADOOP-9629 - Azure Filesystem - Information for developers.docx

 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mike Liddell
 Attachments: HADOOP-9629 - Azure Filesystem - Information for 
 developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
 developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
 HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch


 h2. Description
 This JIRA incorporates adding a new file system implementation for accessing 
 Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
 to MR jobs or configuring MR jobs to put their output directly into blob 
 storage.
 h2. High level design
 At a high level, the code here extends the FileSystem class to provide an 
 implementation for accessing blob storage; the scheme wasb is used for 
 accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
 scheme: {code}wasb[s]://container@account/path/to/file{code} to address 
 individual blobs. We use the standard Azure Java SDK 
 (com.microsoft.windowsazure) to do most of the work. In order to map a 
 hierarchical file system over the flat name-value pair nature of blob 
 storage, we create a specially tagged blob named path/to/dir whenever we 
 create a directory called path/to/dir, then files under that are stored as 
 normal blobs path/to/dir/file. We have many metrics implemented for it using 
 the Metrics2 interface. Tests are implemented mostly using a mock 
 implementation for the Azure SDK functionality, with an option to test 
 against real blob storage if configured (instructions provided in README.txt).
 h2. Credits and history
 This has been ongoing work for a while, and the early version of this work 
 can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
 we'll post the patch here for Hadoop trunk first, then post a patch for 
 branch-1 as well for backporting the functionality if accepted. Credit for 
 this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
 [~stojanovic] as well as multiple people who have taken over this work since 
 then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
 Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
 [~chuanliu].
 h2. Test
 Besides unit tests, we have used WASB as the default file system in our 
 service product. (HDFS is also used, but not as the default file system.) 
 Various customer and test workloads have been run against clusters with such 
 configurations for quite some time. The current version reflects the version 
 of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HADOOP-9629) Support Windows Azure Storage - Blob as a file system in Hadoop

2014-05-16 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14000380#comment-14000380
 ] 

Mike Liddell commented on HADOOP-9629:
--

Added a document with information for developers / code-reviewers.

 Support Windows Azure Storage - Blob as a file system in Hadoop
 ---

 Key: HADOOP-9629
 URL: https://issues.apache.org/jira/browse/HADOOP-9629
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mostafa Elhemali
Assignee: Mike Liddell
 Attachments: HADOOP-9629 - Azure Filesystem - Information for 
 developers.docx, HADOOP-9629 - Azure Filesystem - Information for 
 developers.pdf, HADOOP-9629.2.patch, HADOOP-9629.3.patch, HADOOP-9629.patch, 
 HADOOP-9629.trunk.1.patch, HADOOP-9629.trunk.2.patch


 h2. Description
 This JIRA incorporates adding a new file system implementation for accessing 
 Windows Azure Storage - Blob from within Hadoop, such as using blobs as input 
 to MR jobs or configuring MR jobs to put their output directly into blob 
 storage.
 h2. High level design
 At a high level, the code here extends the FileSystem class to provide an 
 implementation for accessing blob storage; the scheme wasb is used for 
 accessing it over HTTP, and wasbs for accessing over HTTPS. We use the URI 
 scheme: {code}wasb[s]://container@account/path/to/file{code} to address 
 individual blobs. We use the standard Azure Java SDK 
 (com.microsoft.windowsazure) to do most of the work. In order to map a 
 hierarchical file system over the flat name-value pair nature of blob 
 storage, we create a specially tagged blob named path/to/dir whenever we 
 create a directory called path/to/dir, then files under that are stored as 
 normal blobs path/to/dir/file. We have many metrics implemented for it using 
 the Metrics2 interface. Tests are implemented mostly using a mock 
 implementation for the Azure SDK functionality, with an option to test 
 against real blob storage if configured (instructions provided in README.txt).
 h2. Credits and history
 This has been ongoing work for a while, and the early version of this work 
 can be seen in HADOOP-8079. This JIRA is a significant revision of that and 
 we'll post the patch here for Hadoop trunk first, then post a patch for 
 branch-1 as well for backporting the functionality if accepted. Credit for 
 this work goes to the early team: [~minwei], [~davidlao], [~lengningliu] and 
 [~stojanovic] as well as multiple people who have taken over this work since 
 then (hope I don't forget anyone): [~dexterb], Johannes Klein, [~ivanmi], 
 Michael Rys, [~mostafae], [~brian_swan], [~mikelid], [~xifang], and 
 [~chuanliu].
 h2. Test
 Besides unit tests, we have used WASB as the default file system in our 
 service product. (HDFS is also used, but not as the default file system.) 
 Various customer and test workloads have been run against clusters with such 
 configurations for quite some time. The current version reflects the version 
 of the code tested and used in our production environment.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HADOOP-10124) Option to shuffle splits of equal size

2013-11-22 Thread Mike Liddell (JIRA)
Mike Liddell created HADOOP-10124:
-

 Summary: Option to shuffle splits of equal size
 Key: HADOOP-10124
 URL: https://issues.apache.org/jira/browse/HADOOP-10124
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mike Liddell


Mapreduce split calculation has the following base logic (via JobClient and the 
major InputFormat implementations):
- enumerate input files in natural (aka linear) order.
- create one split for each 'block-size' of each input. Apart from 
rack-awareness, combining and so on, the input file order remains in its 
natural order.
- sort the splits by size using a stable sort based on split size.

When data from multiple storage services is used in a single hadoop job, we 
get better I/O utilization if the list of splits round-robins or 
random-accesses across the services. 
The particular scenario arises in Azure HDInsight where jobs can easily read 
from many storage accounts and each storage account has hard limits on 
throughput.  Concurrent access to the accounts is substantially better than 
sequential access.

Two common scenarios can cause a non-ideal access pattern:
 1. many/all input files are the same size
 2. files have different sizes, but many/all input files have size > blocksize.
 In the second scenario, each file will have one or more splits with size 
exactly equal to the block size, so it basically degenerates to the first 
scenario.

There are various ways to solve the problem but the simplest is to alter the 
mapreduce JobClient to sort splits by size _and_ randomize the order of splits 
with equal size. This keeps the old behavior effectively unchanged while also 
fixing both common problematic scenarios.

Some rare scenarios will still suffer bad access patterns. For example, if two 
storage accounts are used and the files from one storage account are all 
smaller than those from the other, then problems can arise. Addressing these 
scenarios would be further work, perhaps by completely randomizing the split 
order. These problematic scenarios are considered rare and do not require 
immediate attention.

If further algorithms for split ordering are necessary, the implementation in 
JobClient will change to be interface-based (e.g. an interface splitOrderer) 
with various standard implementations.  At this time only two implementations 
are needed, so a simple Boolean flag and if/then logic is used.
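
The ordering idea can be sketched independently of the real JobClient/InputFormat 
plumbing: keep the existing stable sort by size, then shuffle each run of 
equal-sized splits. In the sketch below the SplitInfo class, the method names and 
the sample data are illustrative stand-ins, not code from the attached patch:

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

public class EqualSizeSplitShuffle {

  /** Stand-in for a real input split: just a source name and a length. */
  static final class SplitInfo {
    final String source;
    final long length;
    SplitInfo(String source, long length) { this.source = source; this.length = length; }
    @Override public String toString() { return source + ":" + length; }
  }

  /**
   * Sort splits by length, largest first (mirroring the existing stable sort),
   * then randomize the order within each run of equal-length splits.
   */
  static List<SplitInfo> orderSplits(List<SplitInfo> splits, Random rng) {
    List<SplitInfo> sorted = new ArrayList<>(splits);
    sorted.sort(Comparator.comparingLong((SplitInfo s) -> s.length).reversed());
    int runStart = 0;
    for (int i = 1; i <= sorted.size(); i++) {
      boolean runEnds = i == sorted.size() || sorted.get(i).length != sorted.get(runStart).length;
      if (runEnds) {
        // Shuffle one run of equal-sized splits so reads spread across sources.
        Collections.shuffle(sorted.subList(runStart, i), rng);
        runStart = i;
      }
    }
    return sorted;
  }

  public static void main(String[] args) {
    List<SplitInfo> splits = List.of(
        new SplitInfo("accountA/file1", 128), new SplitInfo("accountA/file2", 128),
        new SplitInfo("accountB/file3", 128), new SplitInfo("accountB/file4", 64),
        new SplitInfo("accountA/file5", 64));
    System.out.println(orderSplits(splits, new Random()));
  }
}
{code}

Because the randomization is confined to ties in the sort, the overall 
largest-first ordering produced by the existing code is preserved.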




--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HADOOP-10124) Option to shuffle splits of equal size

2013-11-22 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-10124:
--

Attachment: HADOOP-10124.1.patch

 Option to shuffle splits of equal size
 --

 Key: HADOOP-10124
 URL: https://issues.apache.org/jira/browse/HADOOP-10124
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mike Liddell
 Attachments: HADOOP-10124.1.patch


 Mapreduce split calculation has the following base logic (via JobClient and 
 the major InputFormat implementations):
 - enumerate input files in natural (aka linear) order.
 - create one split for each 'block-size' of each input. Apart from 
 rack-awareness, combining and so on, the input file order remains in its 
 natural order.
 - sort the splits by size using a stable sort based on split size.
 When data from multiple storage services is used in a single hadoop job, we 
 get better I/O utilization if the list of splits round-robins or 
 random-accesses across the services. 
 The particular scenario arises in Azure HDInsight where jobs can easily read 
 from many storage accounts and each storage account has hard limits on 
 throughput.  Concurrent access to the accounts is substantially better than 
 sequential access.
 Two common scenarios can cause a non-ideal access pattern:
  1. many/all input files are the same size
  2. files have different sizes, but many/all input files have size > blocksize.
  In the second scenario, each file will have one or more splits with size 
 exactly equal to the block size, so it basically degenerates to the first scenario.
 There are various ways to solve the problem but the simplest is to alter the 
 mapreduce JobClient to sort splits by size _and_ randomize the order of 
 splits with equal size. This keeps the old behavior effectively unchanged 
 while also fixing both common problematic scenarios.
 Some rare scenarios will still suffer bad access patterns. For example, if 
 two storage accounts are used and the files from one storage account are all 
 smaller than those from the other, then problems can arise. Addressing these 
 scenarios would be further work, perhaps by completely randomizing the split 
 order. These problematic scenarios are considered rare and do not require 
 immediate attention.
 If further algorithms for split ordering are necessary, the implementation in 
 JobClient will change to be interface-based (e.g. an interface splitOrderer) 
 with various standard implementations.  At this time only two implementations 
 are needed, so a simple Boolean flag and if/then logic is used.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HADOOP-10124) Option to shuffle splits of equal size

2013-11-22 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13830371#comment-13830371
 ] 

Mike Liddell commented on HADOOP-10124:
---

Patch added.
A new flag governs use of the new logic: mapred.submit.shuffle.equalsized.splits 
(default: false). 
If the flag is true, JobClient will shuffle the splits that share a common size.
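
For illustration only (assuming the flag is read from the job configuration as a 
boolean, as described above), enabling it from client code might look like the 
sketch below; it could presumably also be set in mapred-site.xml or via -D on 
the command line:

{code}
import org.apache.hadoop.conf.Configuration;

public class EnableEqualSizeSplitShuffle {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Opt in to shuffling of equal-sized splits; the default is false.
    conf.setBoolean("mapred.submit.shuffle.equalsized.splits", true);
    // ... configure and submit the job with this Configuration as usual ...
  }
}
{code}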

 Option to shuffle splits of equal size
 --

 Key: HADOOP-10124
 URL: https://issues.apache.org/jira/browse/HADOOP-10124
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Mike Liddell
 Attachments: HADOOP-10124.1.patch


 Mapreduce split calculation has the following base logic (via JobClient and 
 the major InputFormat implementations):
 - enumerate input files in natural (aka linear) order.
 - create one split for each 'block-size' of each input. Apart from 
 rack-awareness, combining and so on, the input file order remains in its 
 natural order.
 - sort the splits by size using a stable sort based on split size.
 When data from multiple storage services is used in a single hadoop job, we 
 get better I/O utilization if the list of splits round-robins or 
 random-accesses across the services. 
 The particular scenario arises in Azure HDInsight where jobs can easily read 
 from many storage accounts and each storage account has hard limits on 
 throughput.  Concurrent access to the accounts is substantially better than 
 sequential access.
 Two common scenarios can cause a non-ideal access pattern:
  1. many/all input files are the same size
  2. files have different sizes, but many/all input files have size > blocksize.
  In the second scenario, each file will have one or more splits with size 
 exactly equal to the block size, so it basically degenerates to the first scenario.
 There are various ways to solve the problem but the simplest is to alter the 
 mapreduce JobClient to sort splits by size _and_ randomize the order of 
 splits with equal size. This keeps the old behavior effectively unchanged 
 while also fixing both common problematic scenarios.
 Some rare scenarios will still suffer bad access patterns. For example, if 
 two storage accounts are used and the files from one storage account are all 
 smaller than those from the other, then problems can arise. Addressing these 
 scenarios would be further work, perhaps by completely randomizing the split 
 order. These problematic scenarios are considered rare and do not require 
 immediate attention.
 If further algorithms for split ordering are necessary, the implementation in 
 JobClient will change to be interface-based (e.g. an interface splitOrderer) 
 with various standard implementations.  At this time only two implementations 
 are needed, so a simple Boolean flag and if/then logic is used.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-03-14 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-9371:
-

Attachment: HADOOP-9361.2.patch

Added HADOOP-9361.2.patch with minor edits.   
 - additional assumptions
 - changed detail for fs.delete(/)

This patch was created via svn diff and is not a delta over the original patch.

Please let me know if the patch format is incorrect.

 Define Semantics of FileSystem and FileContext more rigorously
 --

 Key: HADOOP-9371
 URL: https://issues.apache.org/jira/browse/HADOOP-9371
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 1.2.0, 3.0.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-9361.2.patch, HADOOP-9361.patch, 
 HadoopFilesystemContract.pdf

   Original Estimate: 48h
  Remaining Estimate: 48h

 The semantics of {{FileSystem}} and {{FileContext}} are not completely 
 defined in terms of 
 # core expectations of a filesystem
 # consistency requirements.
 # concurrency requirements.
 # minimum scale limits
 Furthermore, methods are not defined strictly enough in terms of their 
 outcomes and failure modes.
 The requirements and method semantics should be defined more strictly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-9371) Define Semantics of FileSystem and FileContext more rigorously

2013-03-13 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13601901#comment-13601901
 ] 

Mike Liddell commented on HADOOP-9371:
--

A few items for consideration:

Possible additions to 'implicit assumption': 
 - paths are represented as Unicode strings
 - equality/comparison of paths is based on binary content. This implies 
case-sensitivity and no locale-specific comparison rules (see the sketch below).
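
As a small illustration of what binary, case-sensitive comparison would mean for 
Hadoop's Path class (the concrete paths here are made up for the example):

{code}
import org.apache.hadoop.fs.Path;

public class PathComparisonExample {
  public static void main(String[] args) {
    Path upper = new Path("/data/Logs");
    Path lower = new Path("/data/logs");
    // Comparison based on raw path content: case matters and no
    // locale-specific collation is applied.
    System.out.println(upper.equals(lower));     // false
    System.out.println(upper.compareTo(lower));  // non-zero
  }
}
{code}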

The data added to a file during a write or append MAY be visible while 
the write operation is in progress.
- Allowing read(s) during write seems to break the subsequent rule that 
readers always see consistent data.

 Deleting the root path, /, MUST fail iff recursive==false.
- If the root path is empty, it seems reasonable for delete(/,false) to 
succeed but to have no effect.

 After a file is created, all ls operations on the file and parent directory 
 MUST not find the file
- copy-paste error: this should read "after a file is deleted ..."

 Security: if a caller has the rights to list a directory, it has the rights 
 to list directories all the way up the tree.
- This point raises lots of interesting questions and requirements for 
individual methods.  A section on security assumptions/rules would be great.




 Define Semantics of FileSystem and FileContext more rigorously
 --

 Key: HADOOP-9371
 URL: https://issues.apache.org/jira/browse/HADOOP-9371
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 1.2.0, 3.0.0
Reporter: Steve Loughran
Assignee: Steve Loughran
 Attachments: HADOOP-9361.patch, HadoopFilesystemContract.pdf

   Original Estimate: 48h
  Remaining Estimate: 48h

 The semantics of {{FileSystem}} and {{FileContext}} are not completely 
 defined in terms of 
 # core expectations of a filesystem
 # consistency requirements.
 # concurrency requirements.
 # minimum scale limits
 Furthermore, methods are not defined strictly enough in terms of their 
 outcomes and failure modes.
 The requirements and method semantics should be defined more strictly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8562) Enhancements to Hadoop for Windows Server and Windows Azure development and runtime environments

2013-02-27 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13588848#comment-13588848
 ] 

Mike Liddell commented on HADOOP-8562:
--

+1 non-binding

 Enhancements to Hadoop for Windows Server and Windows Azure development and 
 runtime environments
 

 Key: HADOOP-8562
 URL: https://issues.apache.org/jira/browse/HADOOP-8562
 Project: Hadoop Common
  Issue Type: New Feature
Affects Versions: 3.0.0
Reporter: Bikas Saha
Assignee: Bikas Saha
 Attachments: branch-trunk-win.patch, branch-trunk-win.patch, 
 branch-trunk-win.patch, branch-trunk-win.patch, branch-trunk-win.patch, 
 branch-trunk-win.patch, branch-trunk-win.patch, branch-trunk-win.patch, 
 test-untar.tar, test-untar.tgz


 This JIRA tracks the work that needs to be done on trunk to enable Hadoop to 
 run on Windows Server and Azure environments. This incorporates porting 
 relevant work from the similar effort on branch 1 tracked via HADOOP-8079.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8902) Enable Gridmix v1 & v2 benchmarks on Windows platform

2012-10-17 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-8902:
-

Attachment: HADOOP-8902.branch-1-win.contribscripts.patch

 Enable Gridmix v1 & v2 benchmarks on Windows platform
 -

 Key: HADOOP-8902
 URL: https://issues.apache.org/jira/browse/HADOOP-8902
 Project: Hadoop Common
  Issue Type: Bug
  Components: benchmarks
Affects Versions: 1-win
Reporter: Mike Liddell
 Attachments: HADOOP-8902.branch-1-win.contribscripts.patch, 
 HADOOP-8902.patch


 Gridmix v1 and v2 benchmarks do not run on Windows as they require a bash 
 shell.  These scripts have been ported to Windows cmd-scripts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HADOOP-8902) Enable Gridmix v1 & v2 benchmarks on Windows platform

2012-10-17 Thread Mike Liddell (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13478374#comment-13478374
 ] 

Mike Liddell commented on HADOOP-8902:
--

patch updated: HADOOP-8902.branch-1-win.contribscripts.patch

 Enable Gridmix v1 & v2 benchmarks on Windows platform
 -

 Key: HADOOP-8902
 URL: https://issues.apache.org/jira/browse/HADOOP-8902
 Project: Hadoop Common
  Issue Type: Bug
  Components: benchmarks
Affects Versions: 1-win
Reporter: Mike Liddell
 Attachments: HADOOP-8902.branch-1-win.contribscripts.patch, 
 HADOOP-8902.patch


 Gridmix v1 and v2 benchmarks do not run on Windows as they require a bash 
 shell.  These scripts have been ported to Windows cmd-scripts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8902) Enable Gridmix v1 & v2 benchmarks on Windows platform

2012-10-09 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-8902:
-

Status: Open  (was: Patch Available)

 Enable Gridmix v1 & v2 benchmarks on Windows platform
 -

 Key: HADOOP-8902
 URL: https://issues.apache.org/jira/browse/HADOOP-8902
 Project: Hadoop Common
  Issue Type: Bug
  Components: benchmarks
Affects Versions: 1-win
Reporter: Mike Liddell
 Attachments: HADOOP-8902.patch


 Gridmix v1 and v2 benchmarks do not run on Windows as they require a bash 
 shell.  These scripts have been ported to Windows cmd-scripts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-8902) Enable Gridmix v1 & v2 benchmarks on Windows platform

2012-10-08 Thread Mike Liddell (JIRA)
Mike Liddell created HADOOP-8902:


 Summary: Enable Gridmix v1 & v2 benchmarks on Windows platform
 Key: HADOOP-8902
 URL: https://issues.apache.org/jira/browse/HADOOP-8902
 Project: Hadoop Common
  Issue Type: Bug
  Components: benchmarks
Affects Versions: 1-win
Reporter: Mike Liddell


Gridmix v1 and v2 benchmarks do not run on Windows as they require a bash shell.  
These scripts have been ported to Windows cmd-scripts.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8902) Enable Gridmix v1 & v2 benchmarks on Windows platform

2012-10-08 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-8902:
-

Attachment: HADOOP-8902.patch

 Enable Gridmix v1 & v2 benchmarks on Windows platform
 -

 Key: HADOOP-8902
 URL: https://issues.apache.org/jira/browse/HADOOP-8902
 Project: Hadoop Common
  Issue Type: Bug
  Components: benchmarks
Affects Versions: 1-win
Reporter: Mike Liddell
 Attachments: HADOOP-8902.patch


 Gridmix v1 and v2 benchmarks do not run on Windows as they require a bash 
 shell.  These scripts have been ported to Windows cmd-scripts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HADOOP-8902) Enable Gridmix v1 & v2 benchmarks on Windows platform

2012-10-08 Thread Mike Liddell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Liddell updated HADOOP-8902:
-

Status: Patch Available  (was: Open)

 Enable Gridmix v1 & v2 benchmarks on Windows platform
 -

 Key: HADOOP-8902
 URL: https://issues.apache.org/jira/browse/HADOOP-8902
 Project: Hadoop Common
  Issue Type: Bug
  Components: benchmarks
Affects Versions: 1-win
Reporter: Mike Liddell
 Attachments: HADOOP-8902.patch


 Gridmix v1 and v2 benchmarks do not run on Windows as they require a bash 
 shell.  These scripts have been ported to Windows cmd-scripts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira