[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-09-09 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737818#comment-14737818
 ] 

Zhe Zhang commented on HDFS-8383:
-

While we explore the ideas of refactoring {{DFSOutputStream}} and 
{{BlockGroupDataStreamer}}, here are some thoughts:
# A simplified version of a streamer's {{run()}} loop (w.r.t error handling):
{code}
while (true) {
  findError();      // F
  bumpStamp();      // B
  updatePipeline(); // U
}
{code}
# I think the mission here is to handle all combinations of interleaved 
{{F-B-U}} sequences from multiple streamers. For example:
{code}
F1-B1-F3-U1-B3-U3
F1-F3-B3-U3-B1-U1
{code}
# Regardless of the final solution I think we should write a test to emulate 
the above interleaved combinations.
# This has been discussed above and in HDFS-8704: since we are never replacing 
a DN, do we still need {{B}} and {{U}}? IIUC we still need them to identify 
stale internal blocks (DN fails and comes back). But we can probably combine 
them into a single RPC call. After getting a new {{genStamp}} we can probably 
just set it on the block in NN?
# If we move {{B}} and {{U}} up (to {{OutputStream}} or {{BGDataStreamer}}), 
the logic is much simpler. We just need to make sure to correctly handle the 
arrival of new {{F}} messages while the upper level processes {{B+U}}. E.g., 
does it make sense to bump the stamp multiple times?
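The test suggested in point #3 could start by enumerating the interleavings to
emulate. A minimal sketch (class and method names are made up for illustration,
not taken from the patch) that generates every ordering of two streamers'
{{F-B-U}} sequences while preserving each streamer's internal F-before-B-before-U
order:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical test helper: enumerates all interleavings of two streamers'
 *  F-B-U sequences, preserving each streamer's internal order. */
class FbuInterleavings {

  private static void interleave(List<String> a, int i, List<String> b, int j,
                                 StringBuilder prefix, List<String> out) {
    if (i == a.size() && j == b.size()) {
      out.add(prefix.toString());
      return;
    }
    if (i < a.size()) {                       // take next event from streamer a
      int len = prefix.length();
      prefix.append(len == 0 ? "" : "-").append(a.get(i));
      interleave(a, i + 1, b, j, prefix, out);
      prefix.setLength(len);                  // backtrack
    }
    if (j < b.size()) {                       // take next event from streamer b
      int len = prefix.length();
      prefix.append(len == 0 ? "" : "-").append(b.get(j));
      interleave(a, i, b, j + 1, prefix, out);
      prefix.setLength(len);                  // backtrack
    }
  }

  /** All interleavings of streamer 1's and streamer 3's F-B-U sequences. */
  static List<String> all() {
    List<String> out = new ArrayList<>();
    interleave(List.of("F1", "B1", "U1"), 0,
               List.of("F3", "B3", "U3"), 0, new StringBuilder(), out);
    return out;   // C(6,3) = 20 orderings
  }
}
```

A test could then drive the failure-injection framework once per generated
sequence, rather than hand-picking the two examples above.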

> Tolerate multiple failures in DFSStripedOutputStream
> 
>
> Key: HDFS-8383
> URL: https://issues.apache.org/jira/browse/HDFS-8383
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Tsz Wo Nicholas Sze
>Assignee: Walter Su
> Attachments: HDFS-8383.00.patch, HDFS-8383.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-09-09 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14737962#comment-14737962
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8383:
---

> I think the mission here is to handle all combinations of interleaved F-B-U 
> sequences from multiple streamers. ...

When there is a second failure, we should abort the first failure handling and 
then start a new failure handling.  So there is no interleaving.



[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-09-09 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738209#comment-14738209
 ] 

Zhe Zhang commented on HDFS-8383:
-

[~szetszwo] Thanks for sharing the thought. 

{code}
Streamer_1 fails => Streamer_1 returns from updateBlockForPipeline => 
Streamer_1 calling updatePipeline => Streamer_3 fails
{code}

At this point, how can we safely abort Streamer_1's ongoing {{updatePipeline}} 
RPC call?



[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-09-06 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732318#comment-14732318
 ] 

Walter Su commented on HDFS-8383:
-

*1. What's the difference between datanodeError and externalError?*

They are both error states of a streamer.
datanodeError is set inside, by the streamer itself. externalError is set 
outside, by DFSOutputStream.
We provide one node for each internal block and have no node replacement, so 
if a node is marked as error, the streamer is dead.
externalError is an error signal from outside; it means another streamer has a 
datanodeError and is probably dead already. In this case, all the remaining 
healthy streamers receive externalError from DFSOutputStream and prepare to 
start a recovery.

*2. What's the difference between {{failed}} and datanodeError?*

Mostly no difference; {{failed}} can be removed. Some unexpected errors, like 
an NPE, are not datanodeError but should still count as {{failed}}; in that 
case the streamer will close. So {{failed}} == error && streamerClosed.

*3. How does a recovery begin?*

The failed streamer, which has datanodeError, will be dead; it will not 
trigger recovery itself. When a streamer fails, it saves lastException. When 
DFSOutputStream writes to this streamer, it first calls {{checkClosed()}} to 
check whether the streamer is healthy by inspecting lastException. When 
DFSOutputStream finds out the streamer has failed, it notifies the other 
streamers by setting externalError, and those streamers begin recovery.
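The detection and notification path in #3 could be sketched roughly as below.
This is an illustration only: the class, fields, and method names here are
simplified stand-ins, not the actual DataStreamer/DFSOutputStream API.

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical sketch of #3: a failed streamer records its exception, the
 *  output stream discovers it on the next write (checkClosed-style), and
 *  then sets externalError on the healthy peers to start recovery. */
class FailureDetectionSketch {

  static class Streamer {
    volatile IOException lastException;  // saved by the streamer on failure
    volatile boolean externalError;      // set from outside to start recovery
  }

  /** Mirrors checkClosed(): surface the saved exception to the writer. */
  static void checkHealthy(Streamer s) throws IOException {
    if (s.lastException != null) {
      throw s.lastException;
    }
  }

  /** Called when a write discovers a dead streamer: every other healthy
   *  streamer gets an external error and begins recovery. */
  static void notifyPeers(List<Streamer> all, Streamer failed) {
    for (Streamer s : all) {
      if (s != failed && s.lastException == null) {
        s.externalError = true;
      }
    }
  }
}
```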

*4. How does a recovery begin if DFSOutputStream doesn't write to the failed 
streamer?*

Suppose DFSOutputStream has just finished writing to streamer#3, and 
streamer#5 has already failed. DFSOutputStream happens to suspend (possible if 
the client calls write(b) slowly) and never touches streamer#5 again, so it 
doesn't know streamer#5 failed and no recovery starts. When it calls 
{{close()}} it will check streamer#5 one last time and trigger recovery.

*5. What if a second streamer failed during recovery?*

The first recovery will succeed. The second failed streamer will have 
datanodeError and be dead. A second recovery will begin once the conditions of 
#3/#4 are met.

*6. How does a second recovery begin if the first recovery (1001-->1002) is 
unfinished?*
The second recovery will be scheduled. It should bump the GS to 1003, because 
the second failed streamer may already have finished bumping the GS to 1002. 
The second recovery should wait for (or force) the first one to finish.

*7. How does a third recovery begin if the first recovery (1001-->1002) is 
unfinished?*
The third recovery is merged with the second one; it is only scheduled once.
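The scheduling rules in #5-#7 can be condensed into a small state machine. A
minimal sketch, assuming a single coordinator thread drives recoveries; the
class and method names are invented for illustration, not from the patch:

```java
/** Hypothetical sketch of the recovery scheduling in #5-#7: later failures
 *  during an ongoing recovery merge into a single pending recovery, and the
 *  follow-up recovery bumps the GS past any bump a failed streamer may have
 *  already finished. */
class RecoveryScheduler {
  private boolean running = false;  // a recovery is in progress
  private boolean pending = false;  // at most one queued recovery (merged, #7)
  private long targetGS;            // GS the current/next recovery bumps to

  RecoveryScheduler(long currentGS) {
    this.targetGS = currentGS;
  }

  /** Called on a streamer failure. */
  synchronized void onFailure() {
    if (!running) {
      running = true;
      targetGS++;        // e.g. 1001 -> 1002
    } else {
      pending = true;    // second/third failures merge into one pending run
    }
  }

  /** Called when the current recovery finishes; the pending recovery must
   *  bump past any GS already reached (#6), e.g. 1002 -> 1003. */
  synchronized void onRecoveryDone() {
    running = false;
    if (pending) {
      pending = false;
      running = true;
      targetGS++;
    }
  }

  synchronized long getTargetGS() { return targetGS; }

  synchronized boolean isRecovering() { return running; }
}
```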

Have I answered your questions, Jing?

==

*follow-on:*

1. remove {{failed}}.

2. The Coordinator should periodically search for failed streamers and start 
recovery automatically. It shouldn't depend on DFSOutputStream.

3. We fake {{DataStreamer#block}} and {{DataStreamer#bytesCurBlock}} if the 
streamer failed, but {{dataQueue}} is lost. (DFSOutputStream is async with the 
streamer, so part of {{dataQueue}} belongs to the old block and part of it 
belongs to the new block when DFSOutputStream begins writing the next block.) 
So it's hard to restart a failed streamer when moving on to the next 
blockGroup. We have 2 options:
3a. Replace the failed streamer with a new one; we would have to cache the 
new-block part of {{dataQueue}}.
3b. Restart the failed streamer.
HDFS-8704 tries to restart the failed streamer. It disables {{checkClosed()}} 
and treats the failed streamer as a normal one, so {{dataQueue}} is not lost 
and we can simplify {{getBlockGroup}}.

4. Block recovery when a block group is ending. This is nothing like 
BlockConstructionStage.PIPELINE_CLOSE_RECOVERY. The fastest streamer has ended 
the previous block and sent a request to the NN to get a new block group, 
while some streamers are still planning to bump the GS for the old blocks. 
There is no way to bump an ended/finalized block. I have no clue how to solve 
this. My first plan is to disable block recovery in this situation.



[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-09-06 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732332#comment-14732332
 ] 

Walter Su commented on HDFS-8383:
-

bq. When only one streamer fails, do we need to do anything? I think we can 
just ignore the failed streamer unless more than 3 streamers are found failed. 
The offline decode work will be started by some datanode later.
Maintaining the correctness of UC.replicas is required by lease recovery.

bq. I think it’s not right to set the failed status of streamer in outputstream 
due to the asynchronization.
So I make it a follow-on.

bq. Not very clear about the error handling. For example, streamer_i fails to 
write a packet of block_j, but it succeeds to write block_j+1, could you give 
some detailed description about this situation?
Are you talking about different block groups? We haven't solved restarting a 
streamer for a single failure yet. This jira doesn't cover two failures from 
two block groups. It should not be a problem once the single-failure case is 
solved, except for follow-on #4 of my last comment. 



[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-09-06 Thread Li Bo (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732320#comment-14732320
 ] 

Li Bo commented on HDFS-8383:
-

Thanks [~walter.k.su] for the work! I have just read the code and found some 
points to discuss:
1)  When only one streamer fails, do we need to do anything? I think we can 
just ignore the failed streamer unless more than 3 streamers are found failed. 
The offline decode work will be started by some datanode later.
2)  I think it’s not right to set the failed status of streamer in 
outputstream due to the asynchronization. I have given some reasons in 
HDFS-8704. The outputstream doesn’t need to care about the status of each 
streamer if just one or two streamers fail. This will not complicate the logic 
of outputstreamer. 
3)  Not very clear about the error handling. For example, streamer_i fails 
to write a packet of block_j, but it succeeds to write block_j+1, could you 
give some detailed description about this situation? 




[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-09-04 Thread Jing Zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731630#comment-14731630
 ] 

Jing Zhao commented on HDFS-8383:
-

Thanks for working on this, Walter!

So could you please elaborate more about how the current patch handles multiple 
failures? It will be helpful if you can describe what failure scenarios can be 
tolerated and how they are handled. For example, can we handle the scenario 
where a streamer cannot successfully create a new block outputstream (and bump 
the GS) during the recovery? Quickly checking the patch I did not see where a 
new recovery is scheduled in this case.



[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-09-03 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730042#comment-14730042
 ] 

Zhe Zhang commented on HDFS-8383:
-

Continuing the review on the patch itself:
# Reading the current single-failure handling logic again, I think the 
{{BlockRecoveryTrigger}} should work. We are making the recovery transactional 
by setting the streamer's {{externalError}} before {{updateBlockForPipeline}} 
and resetting it after {{updatePipeline}}. I think it's the right approach at 
this stage.
# Why not increment {{numScheduled}} if it's already positive?
{code}
+if (numScheduled == 0) {
+  numScheduled++;
+}
{code}
# The error handling logic is quite complex now. We should use this chance to 
add more explanation. Below is my draft. [~walter.k.su] If it looks OK to you, 
could you help add to the patch?
{code}
  class Coordinator {
/**
 * The next internal block to write to, allocated by the fastest streamer
 * (earliest to finish writing the current internal block) by calling
 * {@link StripedDataStreamer#locateFollowingBlock}.
 */
private final MultipleBlockingQueue followingBlocks;

/**
 * Records the number of bytes actually written to the most recent internal
 * block. Used to calculate the size of the entire block group.
 */
private final MultipleBlockingQueue endBlocks;

/**
 * The following 2 queues are used to handle stream failures.
 *
 * When stream_i fails, the OutputStream notifies all other healthy
 * streamers by setting an external error on each of them, which triggers
 * {@link DataStreamer#processDatanodeError}. The first streamer reaching
 * the external error will call {@link DataStreamer#updateBlockForPipeline}
 * to get a new block with bumped generation stamp, and populate
 * {@link newBlocks} for other streamers. This first streamer will also
 * call {@link DataStreamer#updatePipeline} to update the NameNode state
 * for the block.
 */
private final MultipleBlockingQueue newBlocks;
private final MultipleBlockingQueue updateBlocks;
{code}
# Naming suggestions:
{code}
BlockRecoveryTrigger -> PipelineRecoveryManager or PipelineRecoveryCoordinator 
(I don't have a strong opinion but we can also consider moving the class under 
Coordinator).
trigger() -> addRecoveryWork()
isRecovering() -> isUnderRecovery()
{code}
# The patch at HDFS-8704 removes the {{setFailed}} API, we need to coordinate 
the 2 efforts. Pinging [~libo-intel] for comments.
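As a side note on the per-streamer queues described in the draft Javadoc above
(followingBlocks/newBlocks/updateBlocks): the idea is that the streamer which
made the NN call publishes the result once for every peer. A rough sketch of
that shape, simplified to non-blocking offer/poll (the real coordinator
blocks), with invented names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch of a MultipleBlockingQueue-style structure: one queue
 *  per streamer, so one streamer's NN result can be handed to every peer. */
class MultiQueue<T> {
  private final List<BlockingQueue<T>> queues;

  MultiQueue(int numStreamers) {
    queues = new ArrayList<>(numStreamers);
    for (int i = 0; i < numStreamers; i++) {
      queues.add(new LinkedBlockingQueue<>());
    }
  }

  /** The streamer that did the NN call offers the result to every peer. */
  void offerToAll(T item) {
    for (BlockingQueue<T> q : queues) {
      q.offer(item);
    }
  }

  /** Each streamer polls its own queue (null if nothing published yet). */
  T pollFor(int streamerIndex) {
    return queues.get(streamerIndex).poll();
  }
}
```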

Follow-ons:
# {{waitLastRecoveryToFinish}} can be improved. The current logic waits for the 
slowest streamer to get out of {{externalError}} state.
# {{externalError}} is actually quite awkward in {{DataStreamer}} -- it's a 
null concept for non-EC {{DataStreamer}}.



[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-09-03 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728959#comment-14728959
 ] 

Walter Su commented on HDFS-8383:
-

There was a debate long ago about parallel write vs. pipeline write; parallel 
write didn't look quite compelling. If HDFS supported parallel write, 
implementing DFSStripedOutputStream would be quite easy. 
DFSStripedOutputStream/StripedDataStreamer is very much like a parallel write: 
if you changed DFSStripedOutputStream.writeChunk(..) you could do parallel 
writes for non-EC files easily. We have done the heavy lifting 
(synchronization), but we don't want to change much of the existing pipeline 
code.

bq. Right now when a DN (e.g. DN_0) fails, we handle other streams (DN_1~DN_5) 
as if each of them has a failed DN. We trigger processDatanodeError to close 
the stream and open again with the same DN. This overhead isn't really 
necessary. IIUC all we want to do is to bump the GenerationStamp for internal 
blocks 1~5. Can we do it by sending a packet (or piggybacking with a data 
packet) to DN?
I think it's incompatible and changes the protocol of the pipeline mechanism, 
so there is nothing I can do for the single-failure case. I do suggest 
interrupting the ongoing recovery for multiple failures, to reduce the number 
of stream opens/closes. I have added a TODO.

bq. By doing the above we can also simplify the error handling logic. All we 
need is an AtomicInteger groupGS in DFSStripedOutputStream recording the 
current GS. Each failed streamer should increment groupGS. Each streamer can 
compare groupGS with its current GS before sending the next packet.
Without #2 improvement, this is just about passive vs active.

bq. Regardless of this change, the write error handling logic is already very 
complex IMO. Maybe we can consider moving locateFollowingBlock to OutputStream 
level so the streamer's task is capped within a single block. For non-EC files 
this refactor will also facilitate HDFS-8955.
OutputStream and streamer have different roles to play. I think 
{{locateFollowingBlock}} belongs to the streamer. Actually it should belong to 
a single {{BlockGroupDataStreamer}} that communicates with the NN to 
allocate/update blocks, while {{StripedDataStreamer}} only streams the block 
to the DN. But I think it's OK not to separate them; just let the fastest 
streamer take the job.





[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-09-02 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728246#comment-14728246
 ] 

Zhe Zhang commented on HDFS-8383:
-

Thanks Walter for creating the patch. Below is a list of comments, some on the 
overall write fault tolerance design, and others on this patch.

# {{DataStreamer#ErrorState#externalError}} looks like a key concept. 
[~szetszwo]: Does it mean "error from peer streamers"? We should take this 
chance to add a Javadoc.
# Right now when a DN (e.g. DN_0) fails, we handle other streams (DN_1~DN_5) as 
if each of them has a failed DN. We trigger {{processDatanodeError}} to close 
the stream and open again with the same DN. This overhead isn't really 
necessary. IIUC all we want to do is to bump the {{GenerationStamp}} for 
internal blocks 1~5. Can we do it by sending a packet (or piggybacking with a 
data packet) to DN?
# By doing the above we can also simplify the error handling logic. All we need 
is an {{AtomicInteger groupGS}} in {{DFSStripedOutputStream}} recording the 
current GS. Each failed streamer should increment {{groupGS}}. Each streamer 
can compare {{groupGS}} with its current GS before sending the next packet.
# Regardless of this change, the write error handling logic is already very 
complex IMO. Maybe we can consider moving {{locateFollowingBlock}} to 
OutputStream level so the streamer's task is capped within a single block. For 
non-EC files this refactor will also facilitate HDFS-8955.
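The {{AtomicInteger groupGS}} idea in #3 can be sketched in a few lines. This
is only an illustration of the proposal, under the assumption that a shared
counter is bumped on failure and compared before each packet; the class and
method names here are made up:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of #3: a group-wide GS counter that failed streamers
 *  bump and healthy streamers compare against before sending a packet. */
class GroupGenStamp {
  private final AtomicInteger groupGS;

  GroupGenStamp(int initialGS) {
    groupGS = new AtomicInteger(initialGS);
  }

  /** A failed streamer bumps the group-wide GS once. */
  int bumpOnFailure() {
    return groupGS.incrementAndGet();
  }

  /** Before sending the next packet, a streamer checks whether its own GS
   *  is stale; if so, it must catch up before streaming more data. */
  boolean needsCatchUp(int streamerGS) {
    return streamerGS < groupGS.get();
  }

  int current() {
    return groupGS.get();
  }
}
```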

Nits on the patch
# Is {{BlockRecoveryTrigger}} a singleton? If so do we need the synchronization?
# {{private Integer numScheduled}} looks like it's a boolean?



[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-09-01 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14726648#comment-14726648
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8383:
---

Sound good.  Thanks a lot for working on this!



[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-09-01 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14725115#comment-14725115
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8383:
---

Hi Walter, are you working on this?  Any progress?



[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-09-01 Thread Walter Su (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14725545#comment-14725545
 ] 

Walter Su commented on HDFS-8383:
-

I'm still working on this. Will upload a patch soon, in one or two days.



[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-07-15 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628830#comment-14628830
 ] 

Zhe Zhang commented on HDFS-8383:
-

[~szetszwo] Should we move this to HDFS-8031? Thanks.



[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-07-15 Thread Tsz Wo Nicholas Sze (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628843#comment-14628843
 ] 

Tsz Wo Nicholas Sze commented on HDFS-8383:
---

Let's keep it in HDFS-7285.  I am currently working on it.  I will create more 
JIRAs to handle different failure cases as [~walter.k.su] mentioned earlier.



[jira] [Commented] (HDFS-8383) Tolerate multiple failures in DFSStripedOutputStream

2015-07-15 Thread Zhe Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-8383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628844#comment-14628844
 ] 

Zhe Zhang commented on HDFS-8383:
-

Sure! Thanks for the work.
