[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache

2016-10-05 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15550451#comment-15550451
 ] 

Fenghua Hu commented on HDFS-10690:
---

[~xyao], thank you for the great help!

> Optimize insertion/removal of replica in ShortCircuitCache
> --
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Fix For: 2.8.0
>
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, HDFS-10690.007.patch, HDFS-10690.008.patch, 
> ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Currently in ShortCircuitCache, two TreeMap objects are used to track the 
> cached replicas.
> private final TreeMap<Long, ShortCircuitReplica> evictable = new TreeMap<>();
> private final TreeMap<Long, ShortCircuitReplica> evictableMmapped = new TreeMap<>();
> TreeMap uses a red-black tree to keep its entries sorted. This isn't an issue 
> when using traditional HDDs, but with high-performance SSD/PCIe flash the cost 
> of inserting/removing an entry becomes considerable.
> To mitigate it, we designed a new list-based structure for replica tracking.
> The list is a doubly linked FIFO. The FIFO is time-ordered, so insertion is a 
> very low-cost operation. On the other hand, a list is not lookup-friendly. To 
> address this issue, we introduce two references into the ShortCircuitReplica 
> object.
> ShortCircuitReplica next = null;
> ShortCircuitReplica prev = null;
> In this way, no lookup is needed when removing a replica from the list. We 
> only need to modify its predecessor's and successor's references in the list.
> Our tests showed performance improvements of 15-50% when using PCIe flash 
> as the storage medium.
> The original patch is against 2.6.4; I am now porting it to Hadoop trunk, and 
> the patch will be posted soon.
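
The list-based tracking described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the attached patches: the class names `Replica` and `EvictionList` and the methods `addLast`, `remove`, `pollFirst` and `names` are hypothetical, and only the `next`/`prev` fields come from the description.

```java
import java.util.ArrayList;
import java.util.List;

final class EvictionListSketch {
    /** Hypothetical stand-in for ShortCircuitReplica; it carries its own links. */
    static final class Replica {
        final String name;
        Replica next = null;
        Replica prev = null;
        Replica(String name) { this.name = name; }
    }

    /** Doubly linked FIFO: oldest entry at the head, newest at the tail. */
    static final class EvictionList {
        private Replica head;
        private Replica tail;
        private int size;

        /** O(1): a newly evictable replica is always the newest, so append it. */
        void addLast(Replica r) {
            r.prev = tail;
            r.next = null;
            if (tail == null) { head = r; } else { tail.next = r; }
            tail = r;
            size++;
        }

        /** O(1): unlink via the replica's own references; r must be in this list. */
        void remove(Replica r) {
            if (r.prev == null) { head = r.next; } else { r.prev.next = r.next; }
            if (r.next == null) { tail = r.prev; } else { r.next.prev = r.prev; }
            r.prev = null;
            r.next = null;
            size--;
        }

        /** Removes and returns the oldest replica, or null if the list is empty. */
        Replica pollFirst() {
            Replica oldest = head;
            if (oldest != null) { remove(oldest); }
            return oldest;
        }

        int size() { return size; }

        /** Names from oldest to newest, for inspection only. */
        List<String> names() {
            List<String> out = new ArrayList<>();
            for (Replica r = head; r != null; r = r.next) { out.add(r.name); }
            return out;
        }
    }

    public static void main(String[] args) {
        EvictionList list = new EvictionList();
        Replica a = new Replica("a"), b = new Replica("b"), c = new Replica("c");
        list.addLast(a);
        list.addLast(b);
        list.addLast(c);
        list.remove(b);                             // no search: b knows its neighbours
        System.out.println(list.names());           // prints [a, c]
        System.out.println(list.pollFirst().name);  // prints a
    }
}
```

Compared with a sorted map, insertion and removal touch only a constant number of references, at the cost of two extra fields per replica and the requirement that a replica is in at most one such list at a time.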



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-10-02 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15541287#comment-15541287
 ] 

Fenghua Hu commented on HDFS-10690:
---

Failed case has nothing to do with the patch, and they also passed on my test 
bed. [~xyao]

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, HDFS-10690.007.patch, HDFS-10690.008.patch, 
> ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>






[jira] [Comment Edited] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-10-02 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15541287#comment-15541287
 ] 

Fenghua Hu edited comment on HDFS-10690 at 10/3/16 1:27 AM:


The failed cases have nothing to do with the patch; they actually passed on my 
test bed. [~xyao]


was (Author: fenghua_hu):
Failed cases have nothing to do with the patch, and they also passed on my test 
bed. [~xyao]

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, HDFS-10690.007.patch, HDFS-10690.008.patch, 
> ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>






[jira] [Comment Edited] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-10-02 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15541287#comment-15541287
 ] 

Fenghua Hu edited comment on HDFS-10690 at 10/3/16 1:26 AM:


Failed cases have nothing to do with the patch, and they also passed on my test 
bed. [~xyao]


was (Author: fenghua_hu):
Failed case has nothing to do with the patch, and they also passed on my test 
bed. [~xyao]

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, HDFS-10690.007.patch, HDFS-10690.008.patch, 
> ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>






[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-10-02 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Attachment: HDFS-10690.008.patch

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, HDFS-10690.007.patch, HDFS-10690.008.patch, 
> ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>






[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-10-02 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Attachment: (was: HDFS-10690.008.patch)

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, HDFS-10690.007.patch, HDFS-10690.008.patch, 
> ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>






[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-10-01 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Attachment: HDFS-10690.008.patch

Fixed unit test issue.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, HDFS-10690.007.patch, HDFS-10690.008.patch, 
> ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>






[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-30 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15537580#comment-15537580
 ] 

Fenghua Hu commented on HDFS-10690:
---

Hi [~xyao], I should have run a clean compilation; sorry for the inconvenience. 
I will update the patch soon.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, HDFS-10690.007.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>






[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-30 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Attachment: HDFS-10690.007.patch

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, HDFS-10690.007.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>






[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-29 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15533068#comment-15533068
 ] 

Fenghua Hu commented on HDFS-10690:
---

[~stack] thanks for reviewing the patch!

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>






[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-28 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15531057#comment-15531057
 ] 

Fenghua Hu commented on HDFS-10690:
---

[~xyao], thanks for the help!

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>






[jira] [Updated] (HDFS-10804) Use separate lock for ReplicaMap

2016-09-28 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10804:
--
Attachment: (was: HDFS-10804-003.patch)

> Use separate lock for ReplicaMap
> 
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Attachments: HDFS-10804-001.patch, HDFS-10804-002.patch
>
>
> In the current implementation, ReplicaMap takes an external lock for 
> synchronization.
> In the constructor FsDatasetImpl#FsDatasetImpl(), the object used for 
> synchronization is the same lock object used by the FsDatasetImpl routines, 
> and in the private method FsDatasetImpl#addVolume(), the same lock is used for 
> synchronization as well.
> {code}
> ReplicaMap tempVolumeMap = new ReplicaMap(datasetLock);
> {code}
> We can potentially eliminate the heavyweight lock for synchronizing 
> ReplicaMap instances. If it's not necessary, this could reduce lock 
> contention on the datasetLock object and improve performance. 
> Could you please give me some suggestions? Thanks a lot!
> Fenghua
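
The difference between sharing the dataset lock and giving the map its own lock can be illustrated with a small sketch. Everything here is an assumption made for illustration: `SimpleReplicaMap` is a hypothetical, heavily simplified stand-in and not the real ReplicaMap API, and `completesWhileHeld` is a hypothetical helper that checks whether an operation can finish while another thread holds a given lock.

```java
import java.util.HashMap;
import java.util.Map;

final class ReplicaMapLockSketch {
    /** Hypothetical simplified map; the real ReplicaMap has a different API. */
    static final class SimpleReplicaMap {
        private final Object lock;
        private final Map<Long, String> replicas = new HashMap<>();

        /** Shares the caller's lock, in the style of new ReplicaMap(datasetLock). */
        SimpleReplicaMap(Object externalLock) { this.lock = externalLock; }

        /** Uses a private lock that no other code can contend on. */
        SimpleReplicaMap() { this(new Object()); }

        void add(long blockId, String replica) {
            synchronized (lock) { replicas.put(blockId, replica); }
        }

        String get(long blockId) {
            synchronized (lock) { return replicas.get(blockId); }
        }
    }

    /** Runs op on another thread while this thread holds heldLock; true if op finished meanwhile. */
    static boolean completesWhileHeld(Object heldLock, Runnable op) {
        Thread worker = new Thread(op);
        boolean finished;
        try {
            synchronized (heldLock) {
                worker.start();
                worker.join(1000);          // wait up to one second with the lock still held
                finished = !worker.isAlive();
            }
            worker.join();                  // lock released: a blocked worker can now finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return finished;
    }

    public static void main(String[] args) {
        Object datasetLock = new Object();
        SimpleReplicaMap shared = new SimpleReplicaMap(datasetLock);
        SimpleReplicaMap separate = new SimpleReplicaMap();
        // The shared-lock map has to wait for datasetLock; the separate-lock map does not.
        System.out.println(completesWhileHeld(datasetLock, () -> shared.add(1L, "r1")));
        System.out.println(completesWhileHeld(datasetLock, () -> separate.add(1L, "r1")));
    }
}
```

Whether such a separation is safe in practice depends on whether callers rely on the shared lock to keep the map consistent with other dataset state, which is the question this issue raises.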






[jira] [Updated] (HDFS-10804) Use separate lock for ReplicaMap

2016-09-28 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10804:
--
Attachment: HDFS-10804-003.patch

Re-attach to trigger Jenkins build.

> Use separate lock for ReplicaMap
> 
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Attachments: HDFS-10804-001.patch, HDFS-10804-002.patch, 
> HDFS-10804-003.patch
>
>






[jira] [Commented] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-28 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528731#comment-15528731
 ] 

Fenghua Hu commented on HDFS-9668:
--

Hi [~jingcheng...@intel.com], thanks for the great fix. I think this fix should 
work. 

My thought is that "this" (i.e. the FsDatasetImpl object) is not a dedicated 
lock and could be used by other callers, so a dedicated lock would be safer and 
cleaner. HDFS-10804 could address this issue.

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-13.patch, 
> HDFS-9668-14.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, HDFS-9668-4.patch, 
> HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, HDFS-9668-8.patch, 
> HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution time of some operations in FsDatasetImpl during 
> the test. The results are as follows.
> !execution_time.png!
> The finalizeBlock, addBlock and createRbw operations on HDD take a very long 
> time under heavy load.
> This means that one slow finalizeBlock, addBlock or createRbw operation on 
> slow storage can block all other such operations in the same DataNode, 
> especially with HBase when many WAL/flusher/compactor threads are configured.
> We need a finer grained lock mechanism in a new FsDatasetImpl 

[jira] [Commented] (HDFS-10804) Use separate lock for ReplicaMap

2016-09-28 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528661#comment-15528661
 ] 

Fenghua Hu commented on HDFS-10804:
---

Thanks [~arpitagarwal] for updating the description to reflect the recent 
changes in HDFS-10828. I have uploaded a new version of the patch; could you 
please review it?

Note that I haven't changed the other call sites yet. If this approach is 
feasible, many places will need to change. For example, we could safely remove 
datasetLock.acquire() from the following code:

  public FsVolumeImpl getVolume(final ExtendedBlock b) {
    try (AutoCloseableLock lock = datasetLock.acquire()) {
      final ReplicaInfo r =
          volumeMap.get(b.getBlockPoolId(), b.getLocalBlock());
      return r != null ? (FsVolumeImpl) r.getVolume() : null;
    }
  }

Similar functions include getStoredBlock() and others that take the lock only 
for exclusive access to ReplicaMap.
Once we agree to use a separate lock for ReplicaMap, I'll modify them 
accordingly. This is expected to improve performance.
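
To make the proposal concrete, here is a minimal sketch of a map that guards 
itself with its own lock, so that a pure lookup such as getVolume() no longer 
needs the dataset-wide lock. All class and method names below 
(SeparateMapLockSketch, ReplicaMapSketch, DatasetSketch) are hypothetical 
stand-ins and are not taken from any HDFS-10804 patch; plain ReentrantLock is 
used in place of Hadoop's AutoCloseableLock to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

/** Sketch only: hypothetical names, not the HDFS-10804 patch. */
final class SeparateMapLockSketch {

  /** Stand-in for ReplicaMap: guards its entries with a lock of its own. */
  static final class ReplicaMapSketch {
    private final ReentrantLock mapLock = new ReentrantLock();
    private final Map<Long, String> volumeByBlockId = new HashMap<>();

    void add(long blockId, String volume) {
      mapLock.lock();
      try {
        volumeByBlockId.put(blockId, volume);
      } finally {
        mapLock.unlock();
      }
    }

    String get(long blockId) {
      mapLock.lock();
      try {
        return volumeByBlockId.get(blockId);
      } finally {
        mapLock.unlock();
      }
    }
  }

  /** Stand-in for FsDatasetImpl. */
  static final class DatasetSketch {
    final ReentrantLock datasetLock = new ReentrantLock();
    final ReplicaMapSketch volumeMap = new ReplicaMapSketch();

    /** Pure map lookup: relies on the map's own lock, not datasetLock. */
    String getVolume(long blockId) {
      return volumeMap.get(blockId);
    }

    /** Multi-step update that still takes the coarse lock. */
    void createReplica(long blockId, String volume) {
      datasetLock.lock();
      try {
        // Slow file-system work would happen here in a real DataNode.
        volumeMap.add(blockId, volume);
      } finally {
        datasetLock.unlock();
      }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    DatasetSketch dataset = new DatasetSketch();
    dataset.createReplica(42L, "/data/disk1");

    dataset.datasetLock.lock();          // simulate a long-running writer
    try {
      String[] result = new String[1];
      Thread reader = new Thread(() -> result[0] = dataset.getVolume(42L));
      reader.start();
      reader.join(1000);                 // reader does not wait for datasetLock
      System.out.println(result[0]);     // /data/disk1
    } finally {
      dataset.datasetLock.unlock();
    }
  }
}
```

Dropping the dataset lock like this is only safe for callers that need a 
single map operation. Any code path that must keep several map operations, or 
map state plus other dataset state, consistent would still need the coarser 
lock.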

> Use separate lock for ReplicaMap
> 
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Attachments: HDFS-10804-001.patch, HDFS-10804-002.patch, 
> HDFS-10804-003.patch
>
>
> In the current implementation, ReplicaMap takes an external lock for 
> synchronization.
> In the constructor FsDatasetImpl#FsDatasetImpl(), the lock passed in is the 
> same lock object used by the FsDatasetImpl routines, 
> and in the private method FsDatasetImpl#addVolume(), the same lock is used 
> for synchronization as well.
> {code}
> ReplicaMap tempVolumeMap = new ReplicaMap(datasetLock);
> {code}
> We can potentially eliminate the heavyweight lock for synchronizing 
> ReplicaMap instances. If it's not necessary, this could reduce lock 
> contention on the datasetLock object and improve performance. 
> Could you please give me some suggestions? Thanks a lot!
> Fenghua



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-10804) Use separate lock for ReplicaMap

2016-09-28 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10804:
--
Attachment: HDFS-10804-003.patch

> Use separate lock for ReplicaMap
> 
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Attachments: HDFS-10804-001.patch, HDFS-10804-002.patch, 
> HDFS-10804-003.patch
>
>



[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-28 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528584#comment-15528584
 ] 

Fenghua Hu commented on HDFS-10690:
---

This fix doesn't change any interface, and I have run the unit tests for 
ShortCircuitCache.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Currently in ShortCircuitCache, two TreeMap objects are used to track the 
> cached replicas.
> private final TreeMap evictable = new TreeMap<>();
> private final TreeMap evictableMmapped = new 
> TreeMap<>();
> TreeMap uses a red-black tree for sorting. This isn't an issue with 
> traditional HDDs, but with high-performance SSD/PCIe flash, the cost of 
> inserting/removing an entry becomes considerable.
> To mitigate this, we designed a new list-based structure for replica 
> tracking.
> The list is a doubly linked FIFO. A FIFO is time-ordered, so insertion is a 
> very low-cost operation. On the other hand, a list is not lookup-friendly. To 
> address this, we add two references to the ShortCircuitReplica 
> object.
> ShortCircuitReplica next = null;
> ShortCircuitReplica prev = null;
> With these references, no lookup is needed when removing a replica from the 
> list. We only need to update its predecessor's and successor's references.
> Our tests showed a 15-50% performance improvement when using PCIe flash 
> as the storage medium.
> The original patch is against 2.6.4; I am now porting it to Hadoop trunk, 
> and the patch will be posted soon.
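
The list described above can be sketched as an intrusive doubly linked FIFO, 
in which each element carries its own prev/next references so that removal 
needs no search. This is only an illustration of the idea. The names 
EvictionListSketch, Replica and EvictableList are hypothetical, and the code is 
not taken from the attached patches.

```java
/** Sketch only: hypothetical names, not the ShortCircuitCache patch. */
final class EvictionListSketch {

  /** Stand-in for ShortCircuitReplica; carries its own list links. */
  static final class Replica {
    final long blockId;
    Replica prev;
    Replica next;
    boolean linked;   // true while the replica is on the list

    Replica(long blockId) {
      this.blockId = blockId;
    }
  }

  /** FIFO of evictable replicas: append at the tail, evict from the head. */
  static final class EvictableList {
    private Replica head;
    private Replica tail;
    private int size;

    /** O(1): a newly evictable replica is always the youngest entry. */
    void addLast(Replica r) {
      if (r.linked) {
        throw new IllegalStateException("replica is already on a list");
      }
      r.prev = tail;
      r.next = null;
      if (tail == null) {
        head = r;
      } else {
        tail.next = r;
      }
      tail = r;
      r.linked = true;
      size++;
    }

    /** O(1): no search, only the neighbours' references are rewired. */
    void remove(Replica r) {
      if (!r.linked) {
        return;
      }
      if (r.prev == null) {
        head = r.next;
      } else {
        r.prev.next = r.next;
      }
      if (r.next == null) {
        tail = r.prev;
      } else {
        r.next.prev = r.prev;
      }
      r.prev = null;
      r.next = null;
      r.linked = false;
      size--;
    }

    /** Removes and returns the oldest replica, or null if the list is empty. */
    Replica pollFirst() {
      Replica oldest = head;
      if (oldest != null) {
        remove(oldest);
      }
      return oldest;
    }

    int size() {
      return size;
    }
  }

  public static void main(String[] args) {
    EvictableList list = new EvictableList();
    Replica a = new Replica(1);
    Replica b = new Replica(2);
    Replica c = new Replica(3);
    list.addLast(a);
    list.addLast(b);
    list.addLast(c);
    list.remove(b);                                // unlink from the middle
    System.out.println(list.pollFirst().blockId);  // 1
    System.out.println(list.pollFirst().blockId);  // 3
    System.out.println(list.size());               // 0
  }
}
```

The sketch assumes a replica is on at most one list at a time and that the 
caller holds whatever lock protects the cache. With a TreeMap both insertion 
and removal are O(log n); here both are constant time, at the price of two 
extra references per replica.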



[jira] [Commented] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-09-28 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528580#comment-15528580
 ] 

Fenghua Hu commented on HDFS-9668:
--

Hi [~jingcheng...@intel.com], regarding the ReplicaMap lock, I opened a new 
JIRA (https://issues.apache.org/jira/browse/HDFS-10804) to address this issue. 
Or could you submit HDFS-9668 after it? What do you think?

> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-10.patch, 
> HDFS-9668-11.patch, HDFS-9668-12.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> HDFS-9668-4.patch, HDFS-9668-5.patch, HDFS-9668-6.patch, HDFS-9668-7.patch, 
> HDFS-9668-8.patch, HDFS-9668-9.patch, execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution time of some operations in FsDatasetImpl during 
> the test. The results are as follows.
> !execution_time.png!
> The finalizeBlock, addBlock and createRbw operations on HDD take a very long 
> time under heavy load.
> This means that one slow finalizeBlock, addBlock or createRbw operation on 
> slow storage can block all other such operations in the same DataNode, 
> especially with HBase when many WAL/flusher/compactor threads are configured.
> We need a finer-grained lock mechanism in a new FsDatasetImpl implementation, 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement 

[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-27 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Attachment: HDFS-10690.006.patch

Re-submitted patch v6 to trigger the Jenkins build.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>



[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-27 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Attachment: (was: HDFS-10690.006.patch)

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>



[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-27 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15528033#comment-15528033
 ] 

Fenghua Hu commented on HDFS-10690:
---

[~xyao], it looks like patch v6 hasn't been built and verified by Jenkins. Is 
there anything else I should do?

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>



[jira] [Updated] (HDFS-10804) Use separate lock for ReplicaMap

2016-09-26 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10804:
--
Summary: Use separate lock for ReplicaMap  (was: Use finer-granularity lock 
for ReplicaMap)

> Use separate lock for ReplicaMap
> 
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Attachments: HDFS-10804-001.patch, HDFS-10804-002.patch
>
>
> In the current implementation, ReplicaMap takes an external object as the 
> lock for synchronization.
> In the constructor FsDatasetImpl#FsDatasetImpl(), the object used for 
> synchronization is "this", i.e. FsDatasetImpl: 
> volumeMap = new ReplicaMap(this);
> and in the private method FsDatasetImpl#addVolume(), "this" is used for 
> synchronization as well.
> ReplicaMap tempVolumeMap = new ReplicaMap(this);
> I am not sure whether we really need to use the whole FsDatasetImpl object 
> for ReplicaMap's synchronization. If it's not necessary, using a separate 
> lock could reduce contention on the FsDatasetImpl object and improve 
> performance. 
> Could you please give me some suggestions? Thanks a lot!
> Fenghua
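
The effect of passing "this" as the external lock can be illustrated with a 
small sketch. The classes below (ExternalMutexSketch, MapWithExternalMutex, 
Owner) are hypothetical stand-ins for ReplicaMap and FsDatasetImpl, not the 
real Hadoop classes. The sketch only shows that a map constructed with the 
owner as its mutex shares the owner's monitor, while a map constructed with a 
dedicated object does not.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: hypothetical names, not the real ReplicaMap/FsDatasetImpl. */
final class ExternalMutexSketch {

  /** Stand-in for ReplicaMap: synchronizes on whatever object it is given. */
  static final class MapWithExternalMutex {
    private final Object mutex;
    private final Map<Long, String> entries = new HashMap<>();

    MapWithExternalMutex(Object mutex) {
      if (mutex == null) {
        throw new IllegalArgumentException("mutex must not be null");
      }
      this.mutex = mutex;
    }

    void put(long key, String value) {
      synchronized (mutex) {
        entries.put(key, value);
      }
    }

    String get(long key) {
      synchronized (mutex) {
        return entries.get(key);
      }
    }

    /** True if the calling thread already owns the monitor this map uses. */
    boolean callerHoldsMutex() {
      return Thread.holdsLock(mutex);
    }
  }

  /** Stand-in for FsDatasetImpl with synchronized methods. */
  static final class Owner {
    // Same pattern as "new ReplicaMap(this)": the map shares the owner's monitor.
    final MapWithExternalMutex sharedLockMap = new MapWithExternalMutex(this);
    // Alternative: the map gets a dedicated mutex object.
    final MapWithExternalMutex ownLockMap = new MapWithExternalMutex(new Object());

    synchronized boolean sharedMapUsesOwnerMonitor() {
      return sharedLockMap.callerHoldsMutex();
    }

    synchronized boolean ownMapUsesOwnerMonitor() {
      return ownLockMap.callerHoldsMutex();
    }
  }

  public static void main(String[] args) {
    Owner owner = new Owner();
    System.out.println(owner.sharedMapUsesOwnerMonitor());  // true
    System.out.println(owner.ownMapUsesOwnerMonitor());     // false
  }
}
```

In the first case every map access competes with every synchronized method of 
the owner; in the second, map accesses only compete with each other. Whether 
the second form is safe depends on which operations must stay atomic with the 
rest of the owner's state.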



[jira] [Commented] (HDFS-10828) Fix usage of FsDatasetImpl object lock in ReplicaMap

2016-09-26 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524786#comment-15524786
 ] 

Fenghua Hu commented on HDFS-10828:
---

[~arpitagarwal], 

Actually, I opened another JIRA, 
https://issues.apache.org/jira/browse/HDFS-10804, for the ReplicaMap lock 
issue, but after seeing your fix, I thought we could consider them together in 
HDFS-10828. 

I agree with you that it makes sense to keep the existing behavior in this fix, 
and we will use HDFS-10804 to track the ReplicaMap issue. Thanks.

> Fix usage of FsDatasetImpl object lock in ReplicaMap
> 
>
> Key: HDFS-10828
> URL: https://issues.apache.org/jira/browse/HDFS-10828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
>Priority: Blocker
> Attachments: HDFS-10828.01.patch, HDFS-10828.02.patch, 
> HDFS-10828.03.patch
>
>
> HDFS-10682 replaced the FsDatasetImpl object lock with a separate reentrant 
> lock but missed updating an instance where ReplicaMap still uses the 
> FsDatasetImpl object lock.



[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-26 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524750#comment-15524750
 ] 

Fenghua Hu commented on HDFS-10690:
---

Removed the unnecessary "if (eldestKey == null)" statement and updated the 
patch.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>



[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-26 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Attachment: HDFS-10690.006.patch

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> HDFS-10690.006.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>



[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-26 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15524699#comment-15524699
 ] 

Fenghua Hu commented on HDFS-10690:
---

Thanks [~xyao] for your suggestion. How about we just remove 
"if (eldestKey == null) { break; }" for bullet 2? If so, we won't need an 
extra helper function for bullet 1.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>



[jira] [Commented] (HDFS-10828) Fix usage of FsDatasetImpl object lock in ReplicaMap

2016-09-26 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15523110#comment-15523110
 ] 

Fenghua Hu commented on HDFS-10828:
---

[~arpitagarwal],

Currently datasetLock is a big lock that replaces the original 
"synchronized" use of the FsDatasetImpl object. For volumeMap, I am wondering 
whether we could use a separate lock so that we don't need to contend for 
datasetLock with other threads, which should improve performance. What do you 
think?

-volumeMap = new ReplicaMap(this);
+volumeMap = new ReplicaMap(datasetLock);   <--
...
-ReplicaMap tempVolumeMap = new ReplicaMap(this);
+ReplicaMap tempVolumeMap = new ReplicaMap(datasetLock);   <--
 fsVolume.getVolumeMap(tempVolumeMap, ramDiskReplicaTracker);
> Fix usage of FsDatasetImpl object lock in ReplicaMap
> 
>
> Key: HDFS-10828
> URL: https://issues.apache.org/jira/browse/HDFS-10828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.8.0
>Reporter: Arpit Agarwal
>Assignee: Arpit Agarwal
> Attachments: HDFS-10828.01.patch, HDFS-10828.02.patch, 
> HDFS-10828.03.patch
>
>



[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-25 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Attachment: HDFS-10690.005.patch

Fixed a few coding style issues.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, HDFS-10690.005.patch, 
> ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Currently in ShortCircuitCache, two TreeMap objects are used to track the 
> cached replicas:
> private final TreeMap evictable = new TreeMap<>();
> private final TreeMap evictableMmapped = new TreeMap<>();
> TreeMap uses a red-black tree to keep entries sorted. This isn't an issue 
> with traditional HDDs, but with high-performance SSD/PCIe flash, the cost of 
> inserting/removing an entry becomes considerable.
> To mitigate this, we designed a new list-based structure for replica tracking.
> The list is a doubly linked FIFO. Because the FIFO is time-ordered, insertion 
> is a very low-cost operation. On the other hand, a list is not 
> lookup-friendly. To address this, we introduce two references into the 
> ShortCircuitReplica object:
> ShortCircuitReplica next = null;
> ShortCircuitReplica prev = null;
> This way, no lookup is needed when removing a replica from the list; we 
> only need to update its predecessor's and successor's references.
> Our tests showed a 15-50% performance improvement when using PCIe flash 
> as the storage medium.
> The original patch is against 2.6.4; I am now porting it to Hadoop trunk, and 
> a patch will be posted soon.
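
The list described above can be sketched roughly as follows. This is a minimal illustration, not the attached patch: Replica is a hypothetical stand-in for ShortCircuitReplica (only the next/prev fields come from the description), and EvictionList with its method names is assumed for the example. Removal touches only the neighbours of the node, so it needs no lookup.

```java
// Hypothetical stand-in for ShortCircuitReplica; only next/prev come from the description.
final class Replica {
  final String name;
  Replica next = null;
  Replica prev = null;

  Replica(String name) {
    this.name = name;
  }
}

// Hypothetical intrusive doubly linked FIFO; not the class used in the patch.
final class EvictionList {
  private Replica head; // oldest entry, evicted first
  private Replica tail; // newest entry
  private int size;

  /** Appends at the tail in O(1); FIFO order is insertion (time) order. */
  void addLast(Replica r) {
    r.prev = tail;
    r.next = null;
    if (tail == null) {
      head = r;
    } else {
      tail.next = r;
    }
    tail = r;
    size++;
  }

  /** Unlinks r in O(1). The caller must ensure r is currently in this list. */
  void remove(Replica r) {
    if (r.prev == null) {
      head = r.next;
    } else {
      r.prev.next = r.next;
    }
    if (r.next == null) {
      tail = r.prev;
    } else {
      r.next.prev = r.prev;
    }
    r.prev = null;
    r.next = null;
    size--;
  }

  /** Removes and returns the oldest entry, or null if the list is empty. */
  Replica pollFirst() {
    Replica r = head;
    if (r != null) {
      remove(r);
    }
    return r;
  }

  int size() {
    return size;
  }
}
```

The trade-off in this sketch is that each replica can sit in at most one such list at a time, because the links live inside the object itself.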






[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-25 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522107#comment-15522107
 ] 

Fenghua Hu commented on HDFS-10690:
---

The existing unit tests already cover all the necessary cases, so no new test 
case needs to be introduced.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>



[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-25 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Attachment: HDFS-10690.004.patch

Fixed two minor issues.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, HDFS-10690.004.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>



[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-25 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15521943#comment-15521943
 ] 

Fenghua Hu commented on HDFS-10690:
---

Hi [~xyao], based on your suggestion, I have updated the patch, which fixes a 
few unit test issues. Could you please review it? Thanks.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>



[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-25 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Attachment: HDFS-10690.003.patch

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> HDFS-10690.003.patch, ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>



[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-09-12 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15484232#comment-15484232
 ] 

Fenghua Hu commented on HDFS-10690:
---

[~xyao],

I just got an environment to test your patch. Below is the result (QPS):
1. LruList:   122K  122K  123K
2. LinkedMap: 118K  119K  117K

In general, LinkedMap's throughput is 3%-5% lower than LruList's. I think 
that's reasonable, because LinkedMap needs to hash and, in the worst case, 
needs more comparisons. 

I still prefer the LruList implementation for the minor performance 
improvement, but considering that the difference is not big, I am open to 
LinkedMap. What do you think?

Again, thank you for your review and great suggestions! 
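
For comparison, a map-backed alternative might look like the sketch below. The LinkedMap patch itself is not shown in this thread, so this uses java.util.LinkedHashMap as an assumed stand-in; the class name LinkedMapEviction, the Long key, and the String value are all hypothetical. It keeps insertion order like the list does, but every put and remove goes through a hash of the key, which is the extra work mentioned above.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; java.util.LinkedHashMap stands in for the LinkedMap in the patch.
final class LinkedMapEviction {
  // Insertion-ordered: iteration starts at the oldest entry.
  private final LinkedHashMap<Long, String> evictable = new LinkedHashMap<>();

  /** Adds a replica under its (assumed unique) eviction time; hashes the key. */
  void put(long evictableTimeNs, String replica) {
    evictable.put(evictableTimeNs, replica);
  }

  /** Removes by key via hash lookup; returns the replica or null if absent. */
  String remove(long evictableTimeNs) {
    return evictable.remove(evictableTimeNs);
  }

  /** Removes and returns the oldest replica, or null if the map is empty. */
  String evictEldest() {
    Iterator<Map.Entry<Long, String>> it = evictable.entrySet().iterator();
    if (!it.hasNext()) {
      return null;
    }
    String replica = it.next().getValue();
    it.remove();
    return replica;
  }

  int size() {
    return evictable.size();
  }
}
```

The appeal of this variant is that it reuses a standard collection instead of hand-written link manipulation, at the cost of hashing on each operation.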


> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>



[jira] [Commented] (HDFS-10804) Use finer-granularity lock for ReplicaMap

2016-08-30 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15450435#comment-15450435
 ] 

Fenghua Hu commented on HDFS-10804:
---

[~vagarychen], thanks for the suggestion.
My intention is to avoid synchronizing on FsDatasetImpl, for performance 
reasons, so a private lock object is introduced. But I am not quite sure 
whether we can do this. What do you think?


> Use finer-granularity lock for ReplicaMap
> -
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Attachments: HDFS-10804-001.patch, HDFS-10804-002.patch
>
>
> In the current implementation, ReplicaMap takes an external object as the 
> lock for synchronization.
> In the constructor FsDatasetImpl#FsDatasetImpl(), the object used for 
> synchronization is "this", i.e. the FsDatasetImpl instance: 
> volumeMap = new ReplicaMap(this);
> and in the private method FsDatasetImpl#addVolume(), "this" is used for 
> synchronization as well:
> ReplicaMap tempVolumeMap = new ReplicaMap(this);
> I am not sure we really need an object as big as FsDatasetImpl for 
> ReplicaMap's synchronization. If it's not necessary, using a smaller lock 
> could reduce lock contention on the FsDatasetImpl object and improve 
> performance. 
> Could you please give me some suggestions? Thanks a lot!
> Fenghua
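
The proposal can be sketched as below. This is a simplified illustration under stated assumptions, not the attached patch: ReplicaMapSketch, its Long/String map, and the use of IllegalArgumentException (in place of Hadoop's own exception type) are all hypothetical. The default constructor creates a private lock object, while the one-argument constructor keeps the existing behaviour of synchronizing on a caller-supplied object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for ReplicaMap; not the real Hadoop class.
final class ReplicaMapSketch {
  private final Object mutex;
  private final Map<Long, String> map = new HashMap<>();

  /** Default: synchronize on a private object that nobody else can lock. */
  ReplicaMapSketch() {
    this(new Object());
  }

  /** Existing style: synchronize on an object supplied by the caller. */
  ReplicaMapSketch(Object mutex) {
    if (mutex == null) {
      throw new IllegalArgumentException("Object to synchronize on cannot be null");
    }
    this.mutex = mutex;
  }

  /** Stores a replica; returns the previous value for the block id, if any. */
  String add(long blockId, String replica) {
    synchronized (mutex) {
      return map.put(blockId, replica);
    }
  }

  String get(long blockId) {
    synchronized (mutex) {
      return map.get(blockId);
    }
  }

  int size() {
    synchronized (mutex) {
      return map.size();
    }
  }
}
```

One caveat with a private lock: callers that previously relied on holding the shared lock across several map operations would no longer get that atomicity, so whether the change is safe depends on how the map is used.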






[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object

2016-08-29 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15446506#comment-15446506
 ] 

Fenghua Hu commented on HDFS-10682:
---

[~vagarychen] Actually, I am not sure why ReplicaMap takes an object as big as 
FsDatasetImpl as its synchronization lock. I think we could use a smaller 
object, so I have created a new JIRA to change it. Could you please review and 
comment?
https://issues.apache.org/jira/browse/HDFS-10804

Thanks.


> Replace FsDatasetImpl object lock with a separate lock object
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Fix For: 2.8.0
>
> Attachments: HDFS-10682-branch-2.001.patch, 
> HDFS-10682-branch-2.002.patch, HDFS-10682-branch-2.003.patch, 
> HDFS-10682-branch-2.004.patch, HDFS-10682-branch-2.005.patch, 
> HDFS-10682-branch-2.006.patch, HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch, HDFS-10682.007.patch, HDFS-10682.008.patch, 
> HDFS-10682.009.patch, HDFS-10682.010.patch
>
>
> This Jira proposes to replace the FsDatasetImpl object lock with a separate 
> lock object. Doing so will make it easier to measure lock statistics like 
> lock held time and warn about potential lock contention due to slow disk 
> operations.
> Right now we can use org.apache.hadoop.util.AutoCloseableLock. In the future 
> we can also consider replacing the lock with a read-write lock.
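
As a rough illustration of why a dedicated lock object makes held-time statistics easier, here is a minimal sketch. TimedLock is a hypothetical class written for this example; it is not org.apache.hadoop.util.AutoCloseableLock and makes no claim about that class's API. It wraps a ReentrantLock, works with try-with-resources, and records how long the lock was held.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example class; not Hadoop's AutoCloseableLock.
final class TimedLock implements AutoCloseable {
  private final ReentrantLock lock = new ReentrantLock();
  // The fields below are only read or written while holding the lock.
  private long acquiredAtNanos;
  private long maxHeldNanos;
  private long releaseCount;

  /** Acquires the lock and starts timing on the outermost acquisition. */
  TimedLock acquire() {
    lock.lock();
    if (lock.getHoldCount() == 1) {
      acquiredAtNanos = System.nanoTime();
    }
    return this;
  }

  /** Releases the lock; records the held time on the outermost release. */
  @Override
  public void close() {
    if (lock.getHoldCount() == 1) {
      long held = System.nanoTime() - acquiredAtNanos;
      maxHeldNanos = Math.max(maxHeldNanos, held);
      releaseCount++;
    }
    lock.unlock();
  }

  boolean isHeldByCurrentThread() {
    return lock.isHeldByCurrentThread();
  }

  long releaseCount() {
    lock.lock();
    try {
      return releaseCount;
    } finally {
      lock.unlock();
    }
  }

  long maxHeldNanos() {
    lock.lock();
    try {
      return maxHeldNanos;
    } finally {
      lock.unlock();
    }
  }
}
```

A caller would write try (TimedLock l = datasetLock.acquire()) { ... } where datasetLock is an assumed TimedLock field; a warning threshold on the held time could then be checked in close().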






[jira] [Commented] (HDFS-10804) Use finer-granularity lock for ReplicaMap

2016-08-29 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15445531#comment-15445531
 ] 

Fenghua Hu commented on HDFS-10804:
---

Failed case:
testWhileOpenRenameParentToNonexistentDir(org.apache.hadoop.hdfs.TestRenameWhileOpen)  Time elapsed: 13.117 sec  <<< ERROR!
java.net.BindException: Problem binding to [localhost:38626] java.net.BindException: Address already in use;

Looks like it has nothing to do with the patch.


> Use finer-granularity lock for ReplicaMap
> -
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-10804-001.patch, HDFS-10804-002.patch
>
>



[jira] [Updated] (HDFS-10804) Use finer-granularity lock for ReplicaMap

2016-08-29 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10804:
--
Attachment: HDFS-10804-002.patch

> Use finer-granularity lock for ReplicaMap
> -
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-10804-001.patch, HDFS-10804-002.patch
>
>



[jira] [Updated] (HDFS-10804) Use finer-granularity lock for ReplicaMap

2016-08-29 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10804:
--
Attachment: HDFS-10804-001.patch

> Use finer-granularity lock for ReplicaMap
> -
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-10804-001.patch
>
>



[jira] [Updated] (HDFS-10804) Use finer-granularity lock for ReplicaMap

2016-08-29 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10804:
--
Attachment: (was: HDFS-10804-003.patch)

> Use finer-granularity lock for ReplicaMap
> -
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-10804-001.patch
>
>



[jira] [Updated] (HDFS-10804) Use finer-granularity lock for ReplicaMap

2016-08-29 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10804:
--
Attachment: (was: HDFS-10804-002.patch)

> Use finer-granularity lock for ReplicaMap
> -
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-10804-001.patch
>
>



[jira] [Updated] (HDFS-10804) Use finer-granularity lock for ReplicaMap

2016-08-29 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10804:
--
Attachment: HDFS-10804-003.patch

> Use finer-granularity lock for ReplicaMap
> -
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-10804-002.patch, HDFS-10804-003.patch
>
>



[jira] [Updated] (HDFS-10804) Use finer-granularity lock for ReplicaMap

2016-08-29 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10804:
--
Attachment: HDFS-10804-002.patch

> Use finer-granularity lock for ReplicaMap
> -
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Fix For: 3.0.0-beta1
>
> Attachments: HDFS-10804-002.patch
>
>



[jira] [Updated] (HDFS-10804) Use finer-granularity lock for ReplicaMap

2016-08-29 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10804:
--
Fix Version/s: 3.0.0-beta1
   Status: Patch Available  (was: Open)

> Use finer-granularity lock for ReplicaMap
> -
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Fix For: 3.0.0-beta1
>
>



[jira] [Updated] (HDFS-10804) Use finer-granularity lock for ReplicaMap

2016-08-29 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10804:
--
Status: Open  (was: Patch Available)

> Use finer-granularity lock for ReplicaMap
> -
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
>



[jira] [Updated] (HDFS-10804) Use finer-granularity lock for ReplicaMap

2016-08-29 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10804:
--
Fix Version/s: 3.0.0-beta1
 Release Note: Add a private object for synchronization as default. 
This should be able to improve performance.
Affects Version/s: 3.0.0-beta1
   Status: Patch Available  (was: Open)

> Use finer-granularity lock for ReplicaMap
> -
>
> Key: HDFS-10804
> URL: https://issues.apache.org/jira/browse/HDFS-10804
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.0.0-beta1
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>Priority: Minor
> Fix For: 3.0.0-beta1
>
>



[jira] [Comment Edited] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object

2016-08-28 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444774#comment-15444774
 ] 

Fenghua Hu edited comment on HDFS-10682 at 8/29/16 4:25 AM:


In FsDatasetImpl#FsDatasetImpl() and FsDatasetImpl#addVolume():
volumeMap = new ReplicaMap(this);
and 
ReplicaMap tempVolumeMap = new ReplicaMap(this);

"this" is used as synchronization object:

  ReplicaMap(Object mutex) {
    if (mutex == null) {
      throw new HadoopIllegalArgumentException(
          "Object to synchronize on cannot be null");
    }
    this.mutex = mutex;
  }

ReplicaMap uses synchronized(mutex) {...} for synchronization. Do we need to 
change it accordingly?
[~vagarychen] [~arpitagarwal]





> Replace FsDatasetImpl object lock with a separate lock object
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Fix For: 2.8.0
>
> Attachments: HDFS-10682-branch-2.001.patch, 
> HDFS-10682-branch-2.002.patch, HDFS-10682-branch-2.003.patch, 
> HDFS-10682-branch-2.004.patch, HDFS-10682-branch-2.005.patch, 
> HDFS-10682-branch-2.006.patch, HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch, HDFS-10682.007.patch, HDFS-10682.008.patch, 
> HDFS-10682.009.patch, HDFS-10682.010.patch
>
>
> This Jira proposes to replace the FsDatasetImpl object lock with a separate 
> lock object. Doing so will make it easier to measure lock statistics like 
> lock held time and warn about potential lock contention due to slow disk 
> operations.
> Right now we can use org.apache.hadoop.util.AutoCloseableLock. In the future 
> we can also consider replacing the lock with a read-write lock.
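
The description above proposes a separate lock object that can report 
statistics such as lock held time. The following is only a minimal sketch of 
that idea. InstrumentedLock and its methods are hypothetical names chosen for 
illustration; this is not Hadoop's org.apache.hadoop.util.AutoCloseableLock, 
whose API is not shown in this thread.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical lock wrapper for illustration; not a Hadoop class. */
class InstrumentedLock implements AutoCloseable {
  private final ReentrantLock lock = new ReentrantLock();
  private long acquiredAtNanos;        // written only while the lock is held
  private volatile long maxHeldNanos;  // longest hold time observed so far

  /** Acquires the lock and returns this, so it can be used in try-with-resources. */
  InstrumentedLock acquire() {
    lock.lock();
    if (lock.getHoldCount() == 1) {
      // Start timing only on the outermost acquisition.
      acquiredAtNanos = System.nanoTime();
    }
    return this;
  }

  /** Releases the lock; records the hold time when the outermost hold ends. */
  @Override
  public void close() {
    if (lock.getHoldCount() == 1) {
      long held = System.nanoTime() - acquiredAtNanos;
      if (held > maxHeldNanos) {
        maxHeldNanos = held;
      }
    }
    lock.unlock();
  }

  boolean isHeldByCurrentThread() {
    return lock.isHeldByCurrentThread();
  }

  long maxHeldNanos() {
    return maxHeldNanos;
  }
}
```

A caller would write try (InstrumentedLock l = datasetLock.acquire()) { ... } 
around each critical section. Code that still uses synchronized(mutex) on the 
FsDatasetImpl instance, as ReplicaMap does, takes the object's monitor rather 
than this lock, so the two would no longer exclude each other. That is the 
concern behind the question in the comment above.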






[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object

2016-08-28 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15444774#comment-15444774
 ] 

Fenghua Hu commented on HDFS-10682:
---

In FsDatasetImpl#FsDatasetImpl() and FsDatasetImpl#addVolume():
volumeMap = new ReplicaMap(this);
and 
ReplicaMap tempVolumeMap = new ReplicaMap(this);

"this" is used as synchronization object:

  ReplicaMap(Object mutex) {
    if (mutex == null) {
      throw new HadoopIllegalArgumentException(
          "Object to synchronize on cannot be null");
    }
    this.mutex = mutex;
  }

ReplicaMap uses synchronized(mutex) {...} for synchronization. Do we need to 
change it accordingly?
[~vagarychen] [~arpitagarwal]





[jira] [Created] (HDFS-10804) Use finer-granularity lock for ReplicaMap

2016-08-26 Thread Fenghua Hu (JIRA)
Fenghua Hu created HDFS-10804:
-

 Summary: Use finer-granularity lock for ReplicaMap
 Key: HDFS-10804
 URL: https://issues.apache.org/jira/browse/HDFS-10804
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs
Reporter: Fenghua Hu
Assignee: Fenghua Hu
Priority: Minor


In the current implementation, ReplicaMap takes an external object as the lock 
for synchronization.

In the constructor FsDatasetImpl#FsDatasetImpl(), the object used for 
synchronization is "this", i.e. the FsDatasetImpl instance: 
volumeMap = new ReplicaMap(this);

and in the private method FsDatasetImpl#addVolume(), "this" is used for 
synchronization as well:
ReplicaMap tempVolumeMap = new ReplicaMap(this);

I am not sure whether we really need the whole FsDatasetImpl object as the 
lock for ReplicaMap's synchronization. If it's not necessary, using a separate 
lock object could reduce lock contention on the FsDatasetImpl object and 
improve performance. 

Could you please give me some suggestions? Thanks a lot!

Fenghua
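
The proposal in this message can be pictured with a small sketch. It assumes a 
simplified stand-in class, ReplicaMapSketch, with a String value in place of 
the real replica type; all names other than the constructor's error message 
are hypothetical. The no-argument constructor synchronizes on a private 
object, while the one-argument constructor keeps the existing behaviour of 
locking on a caller-supplied object such as the dataset instance.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for ReplicaMap; illustrative only. */
class ReplicaMapSketch {
  private final Object mutex;
  private final Map<Long, String> replicas = new HashMap<>();

  /** Proposed default: lock on a private object no other class can reach. */
  ReplicaMapSketch() {
    this(new Object());
  }

  /** Existing style: the caller supplies the lock, e.g. the dataset itself. */
  ReplicaMapSketch(Object mutex) {
    if (mutex == null) {
      throw new IllegalArgumentException(
          "Object to synchronize on cannot be null");
    }
    this.mutex = mutex;
  }

  /** Adds a replica and returns the previous value for the block, if any. */
  String add(long blockId, String replica) {
    synchronized (mutex) {
      return replicas.put(blockId, replica);
    }
  }

  String get(long blockId) {
    synchronized (mutex) {
      return replicas.get(blockId);
    }
  }

  int size() {
    synchronized (mutex) {
      return replicas.size();
    }
  }
}
```

A private lock only protects the map's own operations. If callers rely on 
holding the FsDatasetImpl lock to make several map operations atomic together, 
they would need to keep passing the shared lock.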






[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-23 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15434077#comment-15434077
 ] 

Fenghua Hu commented on HDFS-10690:
---

I got you. I'll run YCSB.

Regarding the patch, what do you mean by "I don't think we need the extra 
indexOf()."? I think the LinkedMap patch needs revision, am I right?

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch, 
> ShortCircuitCache_LinkedMap.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> Currently in ShortCircuitCache, two TreeMap objects are used to track the 
> cached replicas:
> private final TreeMap evictable = new TreeMap<>();
> private final TreeMap evictableMmapped = new TreeMap<>();
> TreeMap uses a red-black tree for sorting. This isn't an issue with 
> traditional HDDs, but with high-performance SSD/PCIe flash, the cost of 
> inserting or removing an entry becomes considerable.
> To mitigate it, we designed a new list-based structure for replica tracking.
> The list is a doubly linked FIFO. A FIFO is time-ordered, so insertion is a 
> very low-cost operation. On the other hand, a list is not lookup-friendly. To 
> address this, we introduce two references into the ShortCircuitReplica 
> object:
> ShortCircuitReplica next = null;
> ShortCircuitReplica prev = null;
> In this way, no lookup is needed when removing a replica from the list. We 
> only need to modify its predecessor's and successor's references in the list.
> Our tests showed a 15-50% performance improvement when using PCIe flash 
> as the storage medium.
> The original patch is against 2.6.4; I am now porting it to Hadoop trunk, and 
> the patch will be posted soon.
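
For comparison, the TreeMap-based tracking that the quoted description wants 
to replace can be sketched roughly as follows. This is an illustration under 
assumptions, not the ShortCircuitCache code: the class name, the use of an 
insertion timestamp as the key, and the String value type are all 
hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of time-ordered tracking with a TreeMap. */
class TreeMapEvictableSketch {
  // Assumed layout: key = time the replica became evictable, value = replica.
  private final TreeMap<Long, String> evictable = new TreeMap<>();

  /** O(log n): the red-black tree is searched and may be rebalanced. */
  void insert(long evictableTimeNs, String replica) {
    evictable.put(evictableTimeNs, replica);
  }

  /** O(log n): the entry is located by key before it is unlinked. */
  String removeOnAccess(long evictableTimeNs) {
    return evictable.remove(evictableTimeNs);
  }

  /** Evicts the entry with the smallest key, i.e. the eldest one. */
  String evictEldest() {
    Map.Entry<Long, String> eldest = evictable.pollFirstEntry();
    return eldest == null ? null : eldest.getValue();
  }
}
```

Every insert and every removal on access walks the tree, which is the cost 
the description says becomes visible once the storage itself is fast.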






[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-23 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433918#comment-15433918
 ] 

Fenghua Hu commented on HDFS-10690:
---

[~xyao], thanks for your comments. I'll write a micro benchmark to compare 
LinkedMap with TreeMap and get back to you.




[jira] [Comment Edited] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-23 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433142#comment-15433142
 ] 

Fenghua Hu edited comment on HDFS-10690 at 8/23/16 4:43 PM:


I would like to explain the solution here.

Currently, a TreeMap tracks all the ShortCircuitReplica entries in insertion 
order. An entry can be removed from the TreeMap in two cases. The first is 
when the entry is accessed again; note that this entry can be anywhere in the 
TreeMap. The second is when the entry is evicted because of the TreeMap size 
limit; in this case, only the eldest entry is removed.

Removal is costly in the first case because the ShortCircuitReplica must be 
looked up, which is an O(log n) operation in a TreeMap. To improve this, we 
designed a new data structure, LruList, which eliminates the costly lookup 
entirely. 
 +-------------+        +-------------+        +-------------+
 |  Replica 1  |--Next->|  Replica 2  |--Next->|  Replica 3  |
 |             |<-Prev--|             |<-Prev--|             |
 +-------------+        +-------------+        +-------------+


We introduced two references in the ShortCircuitReplica object: next points 
to the elder ShortCircuitReplica and prev points to the younger one. All the 
replicas are doubly linked in order of insertion time. The youngest is always 
at the head of the linked list, and the eldest is always at the tail. 
Removing an entry between the head and the tail doesn't need any lookup, 
because the replica knows its position in the linked list through next and 
prev. Removal is therefore simple: update the predecessor's and the 
successor's next and prev references. The operation is always O(1). 

For insertion, the youngest entry is always added at the head, so the 
operation is also O(1).
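
The list described above can be sketched as follows. This is a minimal 
illustration, not the patch itself: ReplicaNode and LruListSketch are 
hypothetical names, and ReplicaNode stands in for ShortCircuitReplica with 
only the next and prev fields that the explanation mentions.

```java
/** Hypothetical stand-in for ShortCircuitReplica carrying its own links. */
class ReplicaNode {
  final String name;
  ReplicaNode next; // points to the elder entry
  ReplicaNode prev; // points to the younger entry

  ReplicaNode(String name) {
    this.name = name;
  }
}

/** Hypothetical intrusive doubly linked list ordered by insertion time. */
class LruListSketch {
  private ReplicaNode head; // youngest entry
  private ReplicaNode tail; // eldest entry
  private int size;

  /** O(1): the newest entry always becomes the head. */
  void addFirst(ReplicaNode r) {
    r.prev = null;
    r.next = head;
    if (head != null) {
      head.prev = r;
    } else {
      tail = r; // list was empty
    }
    head = r;
    size++;
  }

  /** O(1): no lookup, the node already knows its neighbours.
   *  Assumes r is currently linked into this list. */
  void remove(ReplicaNode r) {
    if (r.prev != null) {
      r.prev.next = r.next;
    } else {
      head = r.next; // r was the youngest
    }
    if (r.next != null) {
      r.next.prev = r.prev;
    } else {
      tail = r.prev; // r was the eldest
    }
    r.next = null;
    r.prev = null;
    size--;
  }

  /** O(1): eviction always takes the tail, or returns null when empty. */
  ReplicaNode evictEldest() {
    ReplicaNode eldest = tail;
    if (eldest != null) {
      remove(eldest);
    }
    return eldest;
  }

  int size() {
    return size;
  }
}
```

Because the links live in the replica object itself, the cache can unlink a 
replica it already holds a reference to without searching for it.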

Existing classes, including LinkedHashMap and LinkedMap, can't provide O(1) 
operations for insertion, lookup, and removal.

Here is a brief test result:

Run GET queries with 64 YCSB processes for 30 minutes and record the QPS of 
each process.
Total QPS:
w/o patch: 95K
w/ patch: 135K

The performance gain is (135 - 95) / 95 = 42%.

Suggestions and comments are very welcome.



[jira] [Comment Edited] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-23 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433142#comment-15433142
 ] 

Fenghua Hu edited comment on HDFS-10690 at 8/23/16 4:41 PM:



[jira] [Comment Edited] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-23 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433142#comment-15433142
 ] 

Fenghua Hu edited comment on HDFS-10690 at 8/23/16 4:38 PM:



[jira] [Comment Edited] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-23 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433142#comment-15433142
 ] 

Fenghua Hu edited comment on HDFS-10690 at 8/23/16 4:32 PM:



[jira] [Comment Edited] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-23 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433142#comment-15433142
 ] 

Fenghua Hu edited comment on HDFS-10690 at 8/23/16 4:30 PM:



[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-23 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15433142#comment-15433142
 ] 

Fenghua Hu commented on HDFS-10690:
---

I would like to explain the solution here.

Currently, a TreeMap is used to track all the ShortCircuitReplica entries in 
order of insertion. An entry can be removed from the TreeMap in two cases. The 
first is when the entry is accessed again; note that this entry can be anywhere 
in the TreeMap. The second is when an entry is evicted because of the TreeMap 
size limit; in this case, only the eldest entry is removed.

Removal is costly in the first case because the ShortCircuitReplica has to be 
looked up first, which is an O(n) operation in TreeMap. To improve this, we 
designed a new data structure, LruList, which entirely eliminates the costly 
lookup.
      +-----------+      +-----------+
      | Replica 1 |      | Replica 2 |
 ...->| Next -----|----->| Next -----|--->...
 ...<-|----- Prev |<-----|----- Prev |<---...
      |           |      |           |
      +-----------+      +-----------+

We introduced two references in each ShortCircuitReplica object: next points 
to the elder ShortCircuitReplica and prev points to the younger one. All the 
replicas are doubly linked in order of insertion time. The youngest is always 
at the head of the linked list, and the eldest is always at the tail. Removing 
an entry between the head and the tail doesn't need any lookup, because the 
replica knows its position in the linked list through next and prev. Removal 
is therefore simple: update the next reference of its predecessor and the prev 
reference of its successor. The operation is always O(1).

For insertion, the youngest entry is always added to the head, so the 
operation is also O(1).
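
The unlink-by-reference idea described above can be sketched roughly as follows. 
This is only a minimal illustration, not the code from the attached patches: 
Replica is a hypothetical stand-in for ShortCircuitReplica, and the names 
LruListSketch, addYoungest, remove and pollEldest are assumptions made for this 
example.

```java
/** Hypothetical stand-in for ShortCircuitReplica: the links live in the element itself. */
final class Replica {
    final String name;
    Replica next; // points to the elder neighbour
    Replica prev; // points to the younger neighbour

    Replica(String name) {
        this.name = name;
    }
}

/** Sketch of an intrusive doubly linked list: youngest at the head, eldest at the tail. */
final class LruListSketch {
    private Replica head; // youngest entry
    private Replica tail; // eldest entry
    private int size;

    /** O(1): link the new entry in front of the current head. */
    void addYoungest(Replica r) {
        r.prev = null;
        r.next = head;
        if (head != null) {
            head.prev = r;
        } else {
            tail = r; // the list was empty, so r is also the eldest
        }
        head = r;
        size++;
    }

    /** O(1): unlink r using only its own references; r must currently be in this list. */
    void remove(Replica r) {
        if (r.prev != null) {
            r.prev.next = r.next;
        } else {
            head = r.next; // r was the youngest
        }
        if (r.next != null) {
            r.next.prev = r.prev;
        } else {
            tail = r.prev; // r was the eldest
        }
        r.next = null;
        r.prev = null;
        size--;
    }

    /** O(1): detach and return the eldest entry, or null if the list is empty. */
    Replica pollEldest() {
        Replica eldest = tail;
        if (eldest != null) {
            remove(eldest);
        }
        return eldest;
    }

    int size() {
        return size;
    }
}
```

Neither remove nor pollEldest walks the list, which is the point of keeping the 
links inside the replica. The sketch is not thread-safe on its own; it assumes 
the caller already holds whatever lock protects the cache.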

Existing classes, including LinkedHashMap and LinkedMap, can't provide O(1) 
operations for insertion/lookup/removal.

Here is a brief test result:

Run GET queries with 64 YCSB processes for 30 minutes and record the QPS for 
each process.
Total QPS:
w/o patch: 95K
w/ patch: 135K

The performance gain is (135 - 95) / 95 = 42%.

Suggestions and comments are very welcome.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-19 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427691#comment-15427691
 ] 

Fenghua Hu commented on HDFS-9668:
--

[~jingcheng...@intel.com],
We reviewed your lock patch but didn't find any apparent issues, except that 
some external references to the FsDatasetImpl object need to be modified 
accordingly. You wrote, "I realize that the patch is not implemented properly." 
Could you please elaborate on your concern so that we can think it over again? 
Thanks.


> Optimize the locking in FsDatasetImpl
> -
>
> Key: HDFS-9668
> URL: https://issues.apache.org/jira/browse/HDFS-9668
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HDFS-9668-1.patch, HDFS-9668-2.patch, HDFS-9668-3.patch, 
> execution_time.png
>
>
> During the HBase test on a tiered storage of HDFS (WAL is stored in 
> SSD/RAMDISK, and all other files are stored in HDD), we observe many 
> long-time BLOCKED threads on FsDatasetImpl in DataNode. The following is part 
> of the jstack result:
> {noformat}
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48521 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779272_40852]" - Thread 
> t@93336
>java.lang.Thread.State: BLOCKED
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:)
>   - waiting to lock <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl) owned by 
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" t@93335
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
>   
> "DataXceiver for client DFSClient_NONMAPREDUCE_-1626037897_1 at 
> /192.168.50.16:48520 [Receiving block 
> BP-1042877462-192.168.50.13-1446173170517:blk_1073779271_40851]" - Thread 
> t@93335
>java.lang.Thread.State: RUNNABLE
>   at java.io.UnixFileSystem.createFileExclusively(Native Method)
>   at java.io.File.createNewFile(File.java:1012)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DatanodeUtil.createTmpFile(DatanodeUtil.java:66)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice.createRbwFile(BlockPoolSlice.java:271)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeImpl.createRbwFile(FsVolumeImpl.java:286)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1140)
>   - locked <18324c9> (a 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl)
>   at 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:113)
>   at 
> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:183)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:615)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:137)
>   at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:74)
>   at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:235)
>   at java.lang.Thread.run(Thread.java:745)
>Locked ownable synchronizers:
>   - None
> {noformat}
> We measured the execution time of some operations in FsDatasetImpl during the 
> test. The result is as follows.
> !execution_time.png!
> The finalizeBlock, addBlock and createRbw operations on HDD take a really 
> long time under heavy load.
> This means that one slow finalizeBlock, addBlock or createRbw operation on 
> slow storage can block all the other such operations in the same DataNode, 
> especially in HBase when many wal/flusher/compactor are configured.
> We need a finer-grained lock mechanism in a new FsDatasetImpl implementation, 
> and users can choose the implementation by configuring 
> "dfs.datanode.fsdataset.factory" in DataNode.
> We can implement the lock at either the storage level or the block level.
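
As a rough illustration of the storage-level option mentioned in the 
description above, the sketch below keeps one lock per volume, so that a slow 
operation on one volume does not block operations on another. It is not taken 
from any of the attached patches; the class PerVolumeLockSketch, its methods 
and the volume IDs used as keys are all assumptions made for this example.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical sketch: one lock per storage volume instead of one dataset-wide lock. */
final class PerVolumeLockSketch {
    // Volume ID (an assumed string key) to the lock guarding that volume.
    private final ConcurrentHashMap<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    /** Returns the lock for the given volume, creating it on first use. */
    ReentrantLock lockFor(String volumeId) {
        return locks.computeIfAbsent(volumeId, id -> new ReentrantLock());
    }

    /** Runs the action while holding only the lock of the given volume. */
    <T> T withVolumeLock(String volumeId, Supplier<T> action) {
        ReentrantLock lock = lockFor(volumeId);
        lock.lock();
        try {
            return action.get();
        } finally {
            lock.unlock();
        }
    }
}
```

With this shape, a createRbw-style operation on an HDD volume would hold only 
that volume's lock, leaving SSD/RAMDISK volumes free. State shared across 
volumes would still need its own protection, which this sketch leaves out.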




[jira] [Commented] (HDFS-9668) Optimize the locking in FsDatasetImpl

2016-08-18 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-9668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15427542#comment-15427542
 ] 

Fenghua Hu commented on HDFS-9668:
--

[~jingcheng...@intel.com], thanks for the great work. 
I noticed that you removed the finer-granularity lock and the read/write lock 
that were in earlier patches. I think they are still very important even if we 
move IO out of the lock. What do you think?




[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-16 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15423714#comment-15423714
 ] 

Fenghua Hu commented on HDFS-10690:
---

[~xyao],

Thanks for your support!

I would like to clarify that removal can happen in two cases. The first 
case is removing the eldest entry, just as you mentioned. The other is when 
an entry in the map is accessed/activated again: it is promoted from the cache 
to memory, and we need to remove it. I think the latter is more common. In this 
case, it's an O(n) operation. Sorry for the confusion.






[jira] [Comment Edited] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-14 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420579#comment-15420579
 ] 

Fenghua Hu edited comment on HDFS-10690 at 8/15/16 3:25 AM:


[~xyao], I have a question about the patch:

According to 
https://commons.apache.org/proper/commons-collections/jacoco/org.apache.commons.collections4.map/LinkedMap.java.html

public K get(final int index) {
return getEntry(index).getKey();
}

public V remove(final int index) {
return remove(get(index));
}

Note that the parameter is an index.
In the patch,
...
+  ShortCircuitReplica replica = (ShortCircuitReplica)evictableMmapped.get
+  (eldestKey);
...
+ShortCircuitReplica removed = (ShortCircuitReplica)map.remove
+(evictableTimeNs);

Here eldestKey and evictableTimeNs are actually keys, not indexes, so they 
should be changed, respectively, to
evictableMmapped.getValue(map.indexOf(eldestKey));
and
map.remove(map.indexOf(evictableTimeNs));

But indexOf() is an O(n) operation, so LinkedMap's remove performance won't 
compete with TreeMap's.

Correct me if I am wrong. Thanks.
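
Whether the index-based overload is actually selected depends on the static 
type of the argument. The sketch below does not use LinkedMap itself; it is a 
hypothetical class with the same two overload shapes, get(int) and 
get(Object), to show how Java resolves such calls. It assumes that eldestKey 
and evictableTimeNs are long or Long values, which the quoted patch lines do 
not show.

```java
/** Hypothetical class mirroring the two overload shapes: get(int index) and get(Object key). */
final class IndexVersusKeySketch {

    /** Same parameter shape as an index-based accessor. */
    static String get(int index) {
        return "by-index";
    }

    /** Same parameter shape as the key-based accessor. */
    static String get(Object key) {
        return "by-key";
    }

    public static void main(String[] args) {
        long primitiveKey = 42L;  // assumed key type: long
        Long boxedKey = 42L;      // assumed key type: Long
        int index = 42;

        // A long cannot be narrowed to int implicitly, so it is boxed to Long
        // and bound to the Object overload.
        System.out.println(get(primitiveKey)); // by-key
        // A Long is already an Object, so the Object overload applies directly.
        System.out.println(get(boxedKey));     // by-key
        // Only an int argument binds to the index overload.
        System.out.println(get(index));        // by-index
    }
}
```

Under that assumption the calls in the patch would bind to the key-based 
methods, and the indexOf() detour would not be needed. If the keys were int 
values instead, the index overload would be picked and the concern above would 
apply.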






[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-14 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15420579#comment-15420579
 ] 

Fenghua Hu commented on HDFS-10690:
---

[~xyao], I have a question about the patch:

According to 
https://commons.apache.org/proper/commons-collections/jacoco/org.apache.commons.collections4.map/LinkedMap.java.html

public K get(final int index) {
return getEntry(index).getKey();
}

public V remove(final int index) {
return remove(get(index));
}

Note that the parameter is an index.
In the patch,
...
+  ShortCircuitReplica replica = (ShortCircuitReplica)evictableMmapped.get
+  (eldestKey);
...
+ShortCircuitReplica removed = (ShortCircuitReplica)map.remove
+(evictableTimeNs);

Here eldestKey and evictableTimeNs are actually keys, not indexes, so they 
should be changed, respectively, to
evictableMmapped.getValue(map.indexOf(eldestKey));
and
map.remove(map.indexOf(evictableTimeNs));

But indexOf() is an O(n) operation, so LinkedMap's remove performance won't 
compete with TreeMap's.

Correct me if I am wrong. Thanks.





[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-10 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15416326#comment-15416326
 ] 

Fenghua Hu commented on HDFS-10690:
---

Xiaoyu,
[~xyao] Thanks for the suggestion. I'll test the patch once I get the test 
environment ready.





[jira] [Comment Edited] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-09 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407012#comment-15407012
 ] 

Fenghua Hu edited comment on HDFS-10690 at 8/10/16 1:30 AM:


Xiaoyu,

[~xyao] I tried to replace TreeMap with LinkedHashMap, but found that 
LinkedHashMap lacks a "ceilingEntry" function or a similar alternative, which 
is key to implementing an LRU-based replacement algorithm. LinkedHashMap also 
can't provide getYoungest, getEldest or similar functions. That is to say, if 
we want to use LinkedHashMap, we actually need to rewrite it. Any comments? 
Thanks.
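
To make the difference concrete, here is a small sketch of the lookups in 
question, using only java.util classes. The class and method names 
(EldestLookupSketch, eldestOfTreeMap, atOrAfter, eldestOfLinkedHashMap) and the 
Long-timestamp-to-String maps are assumptions for illustration; they are not 
the types used in ShortCircuitCache.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper comparing ordered lookups in TreeMap and LinkedHashMap. */
final class EldestLookupSketch {

    /** TreeMap keeps keys sorted, so the smallest timestamp is the eldest entry. */
    static Map.Entry<Long, String> eldestOfTreeMap(TreeMap<Long, String> evictable) {
        return evictable.firstEntry();
    }

    /** TreeMap can also find the first entry whose key is at or after a given time. */
    static Map.Entry<Long, String> atOrAfter(TreeMap<Long, String> evictable, long timeNs) {
        return evictable.ceilingEntry(timeNs);
    }

    /**
     * LinkedHashMap has no ceilingEntry or firstEntry-style method in Java 8, but its
     * iteration order is insertion order by default, so the first iterated entry
     * is the one inserted earliest.
     */
    static <K, V> Map.Entry<K, V> eldestOfLinkedHashMap(LinkedHashMap<K, V> map) {
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        return it.hasNext() ? it.next() : null;
    }
}
```

The LinkedHashMap variant only reaches the eldest entry cheaply; it offers no 
key-range search comparable to ceilingEntry and no direct accessor for the 
youngest entry.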

Finally I found the correct email for you :-)





was (Author: fenghua_hu):
Xiaoyu,

[~xiaoyuyao] I tried to replace TreeMap with LinkedHashMap, but found that 
LinkedHashMap lacks a "ceilingEntry" function or a similar alternative, which 
is key to implementing an LRU-based replacement algorithm. LinkedHashMap also 
can't provide getYoungest, getEldest or similar functions. That is to say, if 
we want to use LinkedHashMap, we actually need to rewrite it. Any comments? 
Thanks.







[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-08-03 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15407012#comment-15407012
 ] 

Fenghua Hu commented on HDFS-10690:
---

Xiaoyu,

[~xiaoyuyao] I tried to replace TreeMap with LinkedHashMap, but found that 
LinkedHashMap lacks a "ceilingEntry" function or a similar alternative, which 
is key to implementing an LRU-based replacement algorithm. LinkedHashMap also 
can't provide getYoungest, getEldest or similar functions. That is to say, if 
we want to use LinkedHashMap, we actually need to rewrite it. Any comments? 
Thanks.







[jira] [Commented] (HDFS-10682) Replace FsDatasetImpl object lock with a separate lock object

2016-08-01 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15403399#comment-15403399
 ] 

Fenghua Hu commented on HDFS-10682:
---

Arpit/Liang,
It looks like there is already a JIRA 
(https://issues.apache.org/jira/browse/HDFS-9668) addressing the big lock 
issue. Maybe we should relate them?



> Replace FsDatasetImpl object lock with a separate lock object
> -
>
> Key: HDFS-10682
> URL: https://issues.apache.org/jira/browse/HDFS-10682
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Reporter: Chen Liang
>Assignee: Chen Liang
> Attachments: HDFS-10682.001.patch, HDFS-10682.002.patch, 
> HDFS-10682.003.patch, HDFS-10682.004.patch, HDFS-10682.005.patch, 
> HDFS-10682.006.patch
>
>
> This Jira proposes to replace the FsDatasetImpl object lock with a separate 
> lock object. Doing so will make it easier to measure lock statistics like 
> lock held time and warn about potential lock contention due to slow disk 
> operations.
> In the future we can also consider replacing the lock with a read-write lock.
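
A minimal sketch of the idea in the description above, assuming a dedicated lock object that records how long it was held. The class name TimedLock, its fields, and the warning threshold are hypothetical and are not taken from the HDFS-10682 patches; only java.util.concurrent.locks.ReentrantLock is a real API here.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch: a separate lock object that tracks lock held time. */
class TimedLock {
    private final ReentrantLock lock = new ReentrantLock();
    private final long warnThresholdNanos;
    // The fields below are only touched while 'lock' is held.
    private long acquiredAtNanos;
    private long maxHeldNanos;
    private long slowHoldCount;

    TimedLock(long warnThresholdNanos) {
        this.warnThresholdNanos = warnThresholdNanos;
    }

    void lock() {
        lock.lock();
        // Start the clock only on the outermost acquisition by this thread.
        if (lock.getHoldCount() == 1) {
            acquiredAtNanos = System.nanoTime();
        }
    }

    void unlock() {
        // Stop the clock when the outermost hold is about to be released.
        if (lock.getHoldCount() == 1) {
            long heldNanos = System.nanoTime() - acquiredAtNanos;
            maxHeldNanos = Math.max(maxHeldNanos, heldNanos);
            if (heldNanos > warnThresholdNanos) {
                slowHoldCount++;
                System.err.println("Lock held for " + heldNanos + " ns");
            }
        }
        lock.unlock();
    }

    long getMaxHeldNanos() {
        lock.lock();
        try {
            return maxHeldNanos;
        } finally {
            lock.unlock();
        }
    }

    long getSlowHoldCount() {
        lock.lock();
        try {
            return slowHoldCount;
        } finally {
            lock.unlock();
        }
    }
}
```

With synchronized methods on the dataset object itself there is no single place to hook such measurements, which is the motivation given in the description. A read-write lock could later replace the ReentrantLock behind the same wrapper.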






[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-07-29 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400336#comment-15400336
 ] 

Fenghua Hu commented on HDFS-10690:
---

Xiaoyu, thanks for the reply. Regarding point 2, it looks like I didn't
explain the design very well; sorry for the confusion. To clarify: this design
doesn't need a lookup in the linked list, because each ShortCircuitReplica
object holds two references (next and prev). To remove a ShortCircuitReplica
object from the list, we directly access its references and unlink it. That's
why it can improve performance.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h



[jira] [Comment Edited] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-07-29 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398939#comment-15398939
 ] 

Fenghua Hu edited comment on HDFS-10690 at 7/29/16 8:28 AM:


Performance test result against hadoop-2.6.4:
Test configuration:
* 1 name node + 1 data node
* Datanode: 1 PCIe SSD, 1 SATA HDD, OS page cache: on, read ahead for PCIe SSD: 0
* Hbase 1.1.2
* Table: key: 10 bytes, value: 1KB, total table size: 320GB, region count: 30, 
storage policy: ALL_SSD, blocksize: 8K
* YCSB 1.1.2, 4 clients, 16 processes / client, in total 64 processes

Test steps:
 Run GET queries with 64 YCSB processes for 30 minutes and record the QPS of
each process.
Total QPS:
w/o patch: 95K
w/ patch: 135K

The performance gain is (135 - 95) / 95 = 42%.
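
The gain figure above is the relative increase of the patched QPS over the baseline. A small sketch of that calculation, assuming the totals are 95K and 135K QPS as reported (the class and method names are hypothetical):

```java
/** Hypothetical helper for the relative throughput gain quoted above. */
class QpsGain {
    /** Returns (patched - baseline) / baseline as a fraction. */
    static double relativeGain(double baselineQps, double patchedQps) {
        return (patchedQps - baselineQps) / baselineQps;
    }

    /** Same value as a whole percentage, rounded to the nearest integer. */
    static long relativeGainPercent(double baselineQps, double patchedQps) {
        return Math.round(relativeGain(baselineQps, patchedQps) * 100.0);
    }
}
```

For the totals in this test, relativeGainPercent(95_000, 135_000) yields 42.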











was (Author: fenghua_hu):
Performance test result against hadoop-2.6.4:
Test configuration:
* 1 name node + 1 data node
* Datanode: 1 PCIe SSD, 1 SATA HDD, OS page cache: on, read ahead for PCIe SSD: 0
* Hbase 1.1.2
* Table: key: 10 bytes, value: 1KB, total table size: 320GB, region count: 30, 
storage policy: ALL_SSD, blocksize: 8K
* YCSB 1.1.2, 4 clients, 16 processes / client, in total 64 processes

Test steps:
 Run GET queries with 64 YCSB processes for 30 minutes, record the QPS for each 
processes.
Total QPS:
w/o patch: 95K
w/ patch: 135K

The performance gain is (135 - 95) / 95 = 42%.










> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h



[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-07-29 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398939#comment-15398939
 ] 

Fenghua Hu commented on HDFS-10690:
---

Performance test result against hadoop-2.6.4:
Test configuration:
* 1 name node + 1 data node
* Datanode: 1 PCIe SSD, 1 SATA HDD, OS page cache: on, read ahead for PCIe SSD: 0
* Hbase 1.1.2
* Table: key: 10 bytes, value: 1KB, total table size: 320GB, region count: 30, 
storage policy: ALL_SSD, blocksize: 8K
* YCSB 1.1.2, 4 clients, 16 processes / client, in total 64 processes

Test steps:
 Run GET queries with 64 YCSB processes for 30 minutes and record the QPS of
each process.
Total QPS:
w/o patch: 95K
w/ patch: 135K

The performance gain is (135 - 95) / 95 = 42%.










> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h



[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-07-28 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398762#comment-15398762
 ] 

Fenghua Hu commented on HDFS-10690:
---

Patch updated.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h



[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-07-28 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Attachment: HDFS-10690.002.patch

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch, HDFS-10690.002.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h



[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-07-28 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15398429#comment-15398429
 ] 

Fenghua Hu commented on HDFS-10690:
---

Xiaoyu, thanks for the suggestion. LinkedHashMap is another good choice for
performance. I considered it before but didn't implement it, so I have no
performance data for it. LinkedHashMap needs to do two things: 1. compute the
key's hash and insert into the hash map, and 2. insert into a linked list.
Hence, in theory, it does more work than the LruList in this patch. Yes,
LinkedHashMap does make the code cleaner, but its performance could be
compromised. Please correct me if I am wrong.
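
For comparison, a minimal sketch of what the LinkedHashMap alternative could look like. The class name LinkedMapEvictable and its methods are hypothetical, and this has not been benchmarked. Each insert goes through both the hash table and the internal linked list, which are the two steps described above.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: evictable replicas kept in insertion order. */
class LinkedMapEvictable<K, V> {
    // Default LinkedHashMap ordering is insertion order (oldest first).
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>();

    /** Hashes the key and appends the entry to the internal linked list. */
    void insert(K key, V replica) {
        map.put(key, replica);
    }

    /** Removes by key: a hash lookup followed by an unlink. */
    V remove(K key) {
        return map.remove(key);
    }

    /** Removes and returns the oldest entry's value, or null if empty. */
    V evictOldest() {
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        if (!it.hasNext()) {
            return null;
        }
        V oldest = it.next().getValue();
        it.remove();
        return oldest;
    }

    int size() {
        return map.size();
    }
}
```

The trade-off is the one raised in the comment: the map version needs no extra fields in the replica object, but it pays for a hash computation on every insert and remove.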

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h



[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-07-28 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Attachment: HDFS-10690.001.patch

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h



[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-07-28 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Attachment: (was: lrulist.patch)

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: HDFS-10690.001.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h



[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-07-28 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Affects Version/s: (was: 2.6.4)
   3.0.0-alpha2
 Target Version/s: 3.0.0-beta1  (was: 3.0.0-alpha2)
   Status: Patch Available  (was: Open)

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 3.0.0-alpha2
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: lrulist.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h



[jira] [Commented] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-07-28 Thread Fenghua Hu (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15397637#comment-15397637
 ] 

Fenghua Hu commented on HDFS-10690:
---

Some comments about the patch:
1. LruList.java implements a doubly linked list to track the cached replicainfo
objects.
2. Two references are added to the ShortCircuitReplica object to eliminate the
lookup in the list.

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.6.4
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: lrulist.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h



[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-07-28 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Attachment: lrulist.patch

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.6.4
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
> Attachments: lrulist.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h



[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-07-28 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Status: Open  (was: Patch Available)

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.6.4
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>   Original Estimate: 336h
>  Remaining Estimate: 336h



[jira] [Updated] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-07-28 Thread Fenghua Hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fenghua Hu updated HDFS-10690:
--
Affects Version/s: 2.6.4
 Target Version/s: 3.0.0-alpha2
 Tags: ShortCircuitCache
   Status: Patch Available  (was: Open)

> Optimize insertion/removal of replica in ShortCircuitCache.java
> ---
>
> Key: HDFS-10690
> URL: https://issues.apache.org/jira/browse/HDFS-10690
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs-client
>Affects Versions: 2.6.4
>Reporter: Fenghua Hu
>Assignee: Fenghua Hu
>   Original Estimate: 336h
>  Remaining Estimate: 336h



[jira] [Created] (HDFS-10690) Optimize insertion/removal of replica in ShortCircuitCache.java

2016-07-26 Thread Fenghua Hu (JIRA)
Fenghua Hu created HDFS-10690:
-

 Summary: Optimize insertion/removal of replica in 
ShortCircuitCache.java
 Key: HDFS-10690
 URL: https://issues.apache.org/jira/browse/HDFS-10690
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs-client
Reporter: Fenghua Hu


Currently in ShortCircuitCache, two TreeMap objects are used to track the 
cached replicas.

private final TreeMap<Long, ShortCircuitReplica> evictable = new TreeMap<>();
private final TreeMap<Long, ShortCircuitReplica> evictableMmapped = new TreeMap<>();

TreeMap employs a red-black tree for sorting. This isn't an issue when using
traditional HDDs, but with high-performance SSD/PCIe flash the cost of
inserting/removing an entry becomes considerable.

To mitigate it, we designed a new list-based structure for replica tracking.

The list is a doubly linked FIFO. Because the FIFO is time-ordered, insertion
is a very low-cost operation. On the other hand, a list is not lookup-friendly.
To address this issue, we introduce two references into the
ShortCircuitReplica object.

ShortCircuitReplica next = null;
ShortCircuitReplica prev = null;

In this way, no lookup is needed when removing a replica from the list. We
only need to modify its predecessor's and successor's references in the list.
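
The removal described above can be sketched as follows. This is an illustration only, not the code from the patch: Replica is a stand-in for ShortCircuitReplica, and ReplicaFifo is a hypothetical name for the list. Only the next/prev fields come from the description.

```java
/** Stand-in for ShortCircuitReplica: the element carries its own links. */
class Replica {
    final long id;
    Replica next = null;
    Replica prev = null;

    Replica(long id) {
        this.id = id;
    }
}

/** Hypothetical doubly linked FIFO with O(1) insertion and removal. */
class ReplicaFifo {
    private Replica head; // oldest element
    private Replica tail; // newest element
    private int size;

    /** Appends at the tail; no ordering work is needed. */
    void addLast(Replica r) {
        r.prev = tail;
        r.next = null;
        if (tail == null) {
            head = r;
        } else {
            tail.next = r;
        }
        tail = r;
        size++;
    }

    /**
     * Unlinks r without searching the list.
     * Assumes r is currently a member of this list.
     */
    void remove(Replica r) {
        if (r.prev == null) {
            head = r.next;
        } else {
            r.prev.next = r.next;
        }
        if (r.next == null) {
            tail = r.prev;
        } else {
            r.next.prev = r.prev;
        }
        r.prev = null;
        r.next = null;
        size--;
    }

    /** Removes and returns the oldest element, or null if the list is empty. */
    Replica pollFirst() {
        Replica r = head;
        if (r != null) {
            remove(r);
        }
        return r;
    }

    int size() {
        return size;
    }
}
```

Both addLast and remove touch a constant number of references, whereas a TreeMap insert or remove walks and possibly rebalances a red-black tree in O(log n).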

Our tests showed a 15-50% performance improvement when using PCIe flash as
storage media.

The original patch is against 2.6.4; I am now porting it to Hadoop trunk, and
the patch will be posted soon.


