[jira] Commented: (HDFS-1222) NameNode fail stop in spite of multiple metadata directories

2011-02-14 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12994534#comment-12994534
 ] 

Konstantin Boudnik commented on HDFS-1222:
--

Hasn't this been addressed by recent work in HDFS-903 and related issues? If so,
let's close this as a duplicate.

> NameNode fail stop in spite of multiple metadata directories
> 
>
> Key: HDFS-1222
> URL: https://issues.apache.org/jira/browse/HDFS-1222
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.20.1
>Reporter: Thanh Do
>
> Despite the ability to configure multiple name directories
> (to store fsimage) and edits directories, the NameNode fail-stops
> in most cases when it hits an exception while accessing these directories.
>  
> The NameNode fail-stops if an exception happens when loading fsimage,
> reading fstime, loading the edits log, writing fsimage.ckpt ..., even though
> good replicas are still available. The NameNode could instead have tried
> the other replicas and marked the faulty one.
> This bug was found by our Failure Testing Service framework:
> http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-98.html
> For questions, please email us: Thanh Do (than...@cs.wisc.edu) and 
> Haryadi Gunawi (hary...@eecs.berkeley.edu)
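
The behavior the report asks for (try the other replicas and mark the faulty one) can be sketched as below. This is a minimal illustration, not HDFS code: the class ReplicaFailover, the Loader interface, and the Result type are hypothetical names introduced only for this example, and the "image" is a plain string standing in for real metadata.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not part of HDFS: load metadata from the first
// directory that works and remember the directories that failed.
final class ReplicaFailover {

    /** Stand-in for "read fsimage/edits from one directory" (assumed interface). */
    public interface Loader {
        String load(String dir) throws IOException;
    }

    /** What was loaded, where it came from, and which directories failed on the way. */
    public static final class Result {
        public final String image;
        public final String sourceDir;
        public final List<String> faultyDirs;

        Result(String image, String sourceDir, List<String> faultyDirs) {
            this.image = image;
            this.sourceDir = sourceDir;
            this.faultyDirs = faultyDirs;
        }
    }

    /**
     * Tries each directory in configured order. A directory that throws is
     * marked faulty and skipped; the call fails only if every directory fails.
     */
    public static Result loadFromAnyReplica(List<String> dirs, Loader loader)
            throws IOException {
        List<String> faulty = new ArrayList<>();
        IOException last = null;
        for (String dir : dirs) {
            try {
                String image = loader.load(dir);
                return new Result(image, dir, faulty);
            } catch (IOException e) {
                faulty.add(dir); // mark the faulty replica instead of stopping
                last = e;
            }
        }
        throw new IOException("no usable metadata directory among " + dirs, last);
    }
}
```

Returning the list of faulty directories, rather than only logging them, keeps the failure visible to the caller, which speaks to the concern raised in this thread that a silently tolerated failure could make admins think everything is fine.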

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HDFS-1222) NameNode fail stop in spite of multiple metadata directories

2010-09-06 Thread dhruba borthakur (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906669#action_12906669
 ] 

dhruba borthakur commented on HDFS-1222:


I think this bug says that if the NN is configured with multiple fs.name.dir 
entries and one of the configured directories holds a bad fsimage/edits while 
the fsimage/edits in the other directories are not corrupted, the NN still 
fails to load the image.

This behavior is somewhat by design. On the other hand, I think there is still 
a bug in this "feature". Suppose there are two directories in fs.name.dir, say 
d1 and d2. The edits file in d2 is corrupted but is the same size as the edits 
file in d1. Now suppose d1 is listed first in the fs.name.dir configuration 
parameter. In this case, the NN will try reading the fsimage/edits from d1 and 
will succeed with the load. 
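
To make the d1/d2 scenario concrete: two edits files of equal length cannot be told apart by size, but a content checksum does distinguish them. The sketch below is only an illustration under that assumption and is not how the NN compares replicas; the class EditsReplicaCheck and its methods are hypothetical, and byte arrays stand in for the files' contents.

```java
import java.util.zip.CRC32;

// Hypothetical sketch, not HDFS code: compare two replicas of an edits file
// by size and by CRC32 of their contents.
final class EditsReplicaCheck {

    /** CRC32 over the whole byte array. */
    static long checksum(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** True when both replicas have the same length (what a size check sees). */
    static boolean sameSize(byte[] a, byte[] b) {
        return a.length == b.length;
    }

    /** True when both replicas have the same CRC32 checksum. */
    static boolean sameChecksum(byte[] a, byte[] b) {
        return checksum(a) == checksum(b);
    }
}
```

With d1 = {1, 2, 3, 4} and a corrupted d2 = {1, 2, 9, 4}, sameSize reports the replicas as equal while sameChecksum reports them as different.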




[jira] Commented: (HDFS-1222) NameNode fail stop in spite of multiple metadata directories

2010-06-23 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12882005#action_12882005
 ] 

Konstantin Shvachko commented on HDFS-1222:
---

Ok, got it. Then it works as designed. 
We really don't want it to keep running and make admins think that everything 
is all right. Have I said that in another issue already?
What do people think: should we change it?




[jira] Commented: (HDFS-1222) NameNode fail stop in spite of multiple metadata directories

2010-06-22 Thread Thanh Do (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881527#action_12881527
 ] 

Thanh Do commented on HDFS-1222:


Konstantin, this is the namenode startup workload. 
When the namenode gets an exception, it fails rather than tolerating it, 
i.e. it does not retry with another image if there is one.
(This may be due to a design choice that has already been made.)





[jira] Commented: (HDFS-1222) NameNode fail stop in spite of multiple metadata directories

2010-06-21 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881034#action_12881034
 ] 

Konstantin Shvachko commented on HDFS-1222:
---

I did not understand the failure scenario. Could you please elaborate?
When the name-node gets an exception while loading image from one of the 
directories, does it fail or not?
Is it only the startup exceptions you are testing here?




[jira] Commented: (HDFS-1222) NameNode fail stop in spite of multiple metadata directories

2010-06-20 Thread Thanh Do (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880649#action_12880649
 ] 

Thanh Do commented on HDFS-1222:


Triggering rare cases is the goal of our project. 
We have read some papers saying that rare failures do happen,
and when they happen, the system does not behave as expected.
Thus, our view is that we should expect the unexpected.




[jira] Commented: (HDFS-1222) NameNode fail stop in spite of multiple metadata directories

2010-06-17 Thread Allen Wittenauer (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879853#action_12879853
 ] 

Allen Wittenauer commented on HDFS-1222:


I have yet to see this condition in real-world practice, including when losing 
an NFS partition.  You need to give a better description of how you were able 
to make the namenode fail.  Otherwise this is doomed to be a WONTFIX or some 
other 'non-reproducible' closure.
