[jira] [Updated] (HDFS-2077) 1073: address checkpoint upload when one of the storage dirs is failed

2011-06-20 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2077:
--

Attachment: hdfs-2077.txt

Updated patch to fix the above dumb bug.

> 1073: address checkpoint upload when one of the storage dirs is failed
> --
>
> Key: HDFS-2077
> URL: https://issues.apache.org/jira/browse/HDFS-2077
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2077.txt, hdfs-2077.txt
>
>
> This JIRA addresses the following case:
> - NN is running with 2 storage dirs
> - 1 of the dirs fails
> - 2NN makes a checkpoint
> Currently, if GetImageServlet fails to open _any_ of the local files to 
> receive a checkpoint, it will fail the entire checkpoint upload process. 
> Instead, it should continue to receive checkpoints in the non-failed 
> directories.
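The intended behavior can be sketched as follows. This is a hypothetical illustration, not the actual GetImageServlet code: the class and method names are invented, and real storage-directory handling in the NN is more involved. The key point is that a per-directory open failure is caught and skipped, and the upload fails only if no directory accepts the image.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: write an uploaded checkpoint image to every storage
// dir, skipping dirs that fail to open, and fail the upload only if no dir
// accepted the image.
public class CheckpointSink {
    public static List<File> saveToDirs(byte[] image, List<File> storageDirs)
            throws IOException {
        List<File> succeeded = new ArrayList<>();
        for (File dir : storageDirs) {
            File target = new File(dir, "fsimage.ckpt");
            try (FileOutputStream out = new FileOutputStream(target)) {
                out.write(image);   // receive the image into this dir
                succeeded.add(dir);
            } catch (IOException e) {
                // A single failed dir should not abort the whole upload.
                System.err.println("Skipping failed storage dir " + dir + ": " + e);
            }
        }
        if (succeeded.isEmpty()) {
            throw new IOException("No storage directory accepted the checkpoint");
        }
        return succeeded;
    }
}
```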

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2080) Speed up DFS read path

2011-06-20 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2080:
--

Attachment: hdfs-2080.txt

Updated patch to fix a couple of the issues mentioned above:
- fix a couple of tests which used {{new Socket}} directly instead of the 
{{SocketFactory}} -- thus they didn't have associated Channels and BlockReader 
failed
- fix BlockReader to handle EOF correctly (fixes TestClientBlockVerification)
- fix TestSeekBug to use readFully where necessary


The append-related bug still exists, but this patch should be useful enough for 
some people to play around with if interested.

> Speed up DFS read path
> --
>
> Key: HDFS-2080
> URL: https://issues.apache.org/jira/browse/HDFS-2080
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: hdfs-2080.txt, hdfs-2080.txt
>
>
> I've developed a series of patches that speed up the HDFS read path by a 
> factor of about 2.5x (~300M/sec to ~800M/sec for localhost reading from 
> buffer cache) and will also make it easier for advanced users (e.g. 
> HBase) to skip a buffer copy. 
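The "skip a buffer copy" idea can be sketched roughly as below. This is an illustrative reading of the approach, not the HDFS-2080 API: the class name and constructor are invented. Instead of the client reading into an internal byte[] and then copying into the caller's array, an advanced caller supplies its own ByteBuffer and the data lands there directly.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Hypothetical sketch: let the caller pass its own ByteBuffer so the read
// fills it directly, saving the internal-array-then-arraycopy step.
public class DirectReader {
    private final ReadableByteChannel channel;

    public DirectReader(ReadableByteChannel channel) {
        this.channel = channel;
    }

    // One copy fewer than read-into-internal-array-then-System.arraycopy.
    public int read(ByteBuffer callerBuf) throws IOException {
        return channel.read(callerBuf);
    }
}
```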

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2093) 1073: Handle case where an entirely empty log is left during NN crash

2011-06-20 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2093:
--

Attachment: hdfs-2093.txt

Updated patch addresses the case where the above happens and there is only one 
storage dir.

I took the conservative route and consider the single empty segment corrupt, 
since it's a very rare failure, more likely to occur due to drive corruption 
than a well-timed crash.

> 1073: Handle case where an entirely empty log is left during NN crash
> -
>
> Key: HDFS-2093
> URL: https://issues.apache.org/jira/browse/HDFS-2093
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2093.txt, hdfs-2093.txt
>
>
> In fault-testing the HDFS-1073 branch, I saw the following situation:
> - NN has two storage directories, but one is in failed state
> - NN starts to roll edits logs to edits_inprogress_5160285
> - NN then crashes
> - on restart, it detects the truncated log, but since it has 0 txns, it 
> finalizes it to the nonsense log name edits_5160285-5160284.
> - It then starts logs again at edits_inprogress_5160285.
> - After this point, no checkpoints or future NN startups succeed since there 
> are two logs starting with the same txid

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2084) Sometimes backup node/secondary name node stops with exception

2011-06-20 Thread Vitalii Tymchyshyn (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052360#comment-13052360
 ] 

Vitalii Tymchyshyn commented on HDFS-2084:
--

I don't know how it happens, but it happens often for me. I have a copy of the 
namenode state that reproduces this problem on start, which I can share.
In general, it would be great to have an option to perform a checkpoint that 
skips invalid records of any kind. As it stands, any such record makes the 
namenode unusable on restart, whereas with such an option you would at most 
lose a bit of information.
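The suggested option can be sketched as below. This is purely illustrative, not the actual FSEditLog replay code: the class, method, and flag names are invented. A "skip invalid records" flag would turn a fatal replay error into a logged skip, trading a little lost history for a namenode that can start.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: replay edit records, optionally skipping records
// that fail to apply instead of aborting the whole replay.
public class EditReplayer {
    public static <T> int replay(List<T> records, Consumer<T> apply,
                                 boolean skipInvalid) {
        int skipped = 0;
        for (T record : records) {
            try {
                apply.accept(record);
            } catch (RuntimeException e) {
                if (!skipInvalid) {
                    throw e;    // current behavior: replay aborts
                }
                skipped++;      // proposed behavior: log and continue
                System.err.println("Skipping invalid record: " + e);
            }
        }
        return skipped;
    }
}
```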

> Sometimes backup node/secondary name node stops with exception
> --
>
> Key: HDFS-2084
> URL: https://issues.apache.org/jira/browse/HDFS-2084
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0
> Environment: FreeBSD
>Reporter: Vitalii Tymchyshyn
> Attachments: patch.diff
>
>
> 2011-06-17 11:43:23,096 ERROR 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer: Throwable Exception in 
> doCheckpoint: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1765)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1753)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:708)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:411)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:378)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1209)
> at 
> org.apache.hadoop.hdfs.server.namenode.BackupStorage.loadCheckpoint(BackupStorage.java:158)
> at 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer.doCheckpoint(Checkpointer.java:243)
> at 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer.run(Checkpointer.java:141)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2093) 1073: Handle case where an entirely empty log is left during NN crash

2011-06-20 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2093:
--

Attachment: hdfs-2093.txt

Attached patch considers such logs as corrupt at startup time. Thus in the 
situation above, where the only log we have is this corrupted one, it will 
refuse to let the NN start, with a nice message explaining that the logs 
starting at this txid are corrupt with no txns. The operator can then 
double-check whether a different storage drive which possibly went missing 
might have better logs, etc, before starting NN.
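The corruption check being described can be sketched as follows. This is an illustrative reconstruction, not the branch's actual code (the class name and parsing details are invented): a finalized segment name edits_START-END where END < START can only come from finalizing an empty in-progress log, so it is treated as corrupt rather than loaded.

```java
// Hypothetical sketch: detect a finalized edits segment whose name encodes
// zero transactions (end txid < start txid), as in edits_5160285-5160284.
public class SegmentCheck {
    public static boolean isCorruptEmptySegment(String name) {
        if (!name.startsWith("edits_") || name.contains("inprogress")) {
            return false;
        }
        String[] range = name.substring("edits_".length()).split("-");
        if (range.length != 2) {
            return false;
        }
        long start = Long.parseLong(range[0]);
        long end = Long.parseLong(range[1]);
        return end < start;   // nonsense range: zero transactions
    }
}
```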

> 1073: Handle case where an entirely empty log is left during NN crash
> -
>
> Key: HDFS-2093
> URL: https://issues.apache.org/jira/browse/HDFS-2093
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2093.txt
>
>
> In fault-testing the HDFS-1073 branch, I saw the following situation:
> - NN has two storage directories, but one is in failed state
> - NN starts to roll edits logs to edits_inprogress_5160285
> - NN then crashes
> - on restart, it detects the truncated log, but since it has 0 txns, it 
> finalizes it to the nonsense log name edits_5160285-5160284.
> - It then starts logs again at edits_inprogress_5160285.
> - After this point, no checkpoints or future NN startups succeed since there 
> are two logs starting with the same txid

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2093) 1073: Handle case where an entirely empty log is left during NN crash

2011-06-20 Thread Todd Lipcon (JIRA)
1073: Handle case where an entirely empty log is left during NN crash
-

 Key: HDFS-2093
 URL: https://issues.apache.org/jira/browse/HDFS-2093
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Todd Lipcon
Assignee: Todd Lipcon




--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2093) 1073: Handle case where an entirely empty log is left during NN crash

2011-06-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052347#comment-13052347
 ] 

Todd Lipcon commented on HDFS-2093:
---

The really good news here is that the robustness of this design even in the 
presence of bugs proved itself - I removed the nonsense log file and the NN 
started with no corruption, 2NN checkpointed happily, no data lost.

> 1073: Handle case where an entirely empty log is left during NN crash
> -
>
> Key: HDFS-2093
> URL: https://issues.apache.org/jira/browse/HDFS-2093
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
>
> In fault-testing the HDFS-1073 branch, I saw the following situation:
> - NN has two storage directories, but one is in failed state
> - NN starts to roll edits logs to edits_inprogress_5160285
> - NN then crashes
> - on restart, it detects the truncated log, but since it has 0 txns, it 
> finalizes it to the nonsense log name edits_5160285-5160284.
> - It then starts logs again at edits_inprogress_5160285.
> - After this point, no checkpoints or future NN startups succeed since there 
> are two logs starting with the same txid

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2093) 1073: Handle case where an entirely empty log is left during NN crash

2011-06-20 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2093:
--

  Component/s: name-node
  Description: 
In fault-testing the HDFS-1073 branch, I saw the following situation:
- NN has two storage directories, but one is in failed state
- NN starts to roll edits logs to edits_inprogress_5160285
- NN then crashes
- on restart, it detects the truncated log, but since it has 0 txns, it 
finalizes it to the nonsense log name edits_5160285-5160284.
- It then starts logs again at edits_inprogress_5160285.
- After this point, no checkpoints or future NN startups succeed since there 
are two logs starting with the same txid
Affects Version/s: Edit log branch (HDFS-1073)
Fix Version/s: Edit log branch (HDFS-1073)

> 1073: Handle case where an entirely empty log is left during NN crash
> -
>
> Key: HDFS-2093
> URL: https://issues.apache.org/jira/browse/HDFS-2093
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
>
> In fault-testing the HDFS-1073 branch, I saw the following situation:
> - NN has two storage directories, but one is in failed state
> - NN starts to roll edits logs to edits_inprogress_5160285
> - NN then crashes
> - on restart, it detects the truncated log, but since it has 0 txns, it 
> finalizes it to the nonsense log name edits_5160285-5160284.
> - It then starts logs again at edits_inprogress_5160285.
> - After this point, no checkpoints or future NN startups succeed since there 
> are two logs starting with the same txid

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2094) Add metrics for write pipeline failures

2011-06-20 Thread Bharath Mundlapudi (JIRA)
Add metrics for write pipeline failures
---

 Key: HDFS-2094
 URL: https://issues.apache.org/jira/browse/HDFS-2094
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0


The write pipeline can fail for various reasons: RPC connection issues, disk 
problems, etc. I am proposing to add metrics to detect write pipeline issues. 
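What such metrics might look like can be sketched as below. The metric names and failure causes here are invented for illustration and are not from the patch: one counter per pipeline failure cause, bumped wherever the write pipeline handles that error.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: per-cause failure counters for the write pipeline.
public class PipelineMetrics {
    private final ConcurrentMap<String, AtomicLong> failures =
            new ConcurrentHashMap<>();

    // Called from the pipeline's error-handling path with a cause label.
    public void incrFailure(String cause) {
        failures.computeIfAbsent(cause, c -> new AtomicLong()).incrementAndGet();
    }

    public long getFailures(String cause) {
        AtomicLong c = failures.get(cause);
        return c == null ? 0 : c.get();
    }
}
```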

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2092) Remove configuration object reference in DFSClient

2011-06-20 Thread Bharath Mundlapudi (JIRA)
Remove configuration object reference in DFSClient
--

 Key: HDFS-2092
 URL: https://issues.apache.org/jira/browse/HDFS-2092
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Affects Versions: 0.23.0
Reporter: Bharath Mundlapudi
Assignee: Bharath Mundlapudi
 Fix For: 0.23.0


At present, DFSClient stores a reference to its Configuration object. Since 
these configuration objects can be quite large, they can bloat processes that 
hold multiple DFSClient objects, such as the TaskTracker. This is an attempt 
to remove the reference to the conf object from DFSClient. 
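The refactoring idea can be sketched as below. This is illustrative only: the class name, keys, and defaults are invented, and a Map stands in for the real Configuration. The client copies the handful of values it needs into a small immutable holder at construction time, so the large conf object can be garbage-collected.

```java
import java.util.Map;

// Hypothetical sketch: snapshot just the needed config values instead of
// holding a reference to the whole configuration object.
public class ClientConf {
    final int ioBufferSize;
    final long blockSize;
    final int maxRetries;

    ClientConf(Map<String, String> conf) {
        // Copy the needed keys; the caller can then drop the big conf.
        ioBufferSize = Integer.parseInt(
            conf.getOrDefault("io.file.buffer.size", "4096"));
        blockSize = Long.parseLong(
            conf.getOrDefault("dfs.block.size", "67108864"));
        maxRetries = Integer.parseInt(
            conf.getOrDefault("dfs.client.max.retries", "3"));
    }
}
```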

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HDFS-2091) Hadoop does not scale as expected

2011-06-20 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HDFS-2091.
---

Resolution: Invalid

Hi Alberto. This is the bug tracker rather than a place for questions. You 
might try the mapreduce-user mailing list.

> Hadoop does not scale as expected
> -
>
> Key: HDFS-2091
> URL: https://issues.apache.org/jira/browse/HDFS-2091
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: Linux, 8 nodes.
>Reporter: Alberto Andreotti
>   Original Estimate: 504h
>  Remaining Estimate: 504h
>
> The more nodes I add to this application, the slower it goes. This is the 
> app's map,
>  public void map(IntWritable linearPos, FloatWritable heat, Context context
> ) throws IOException, InterruptedException {
>int myLinearPos = linearPos.get();
>//Distribute my value to the previous and the next
>linearPos.set(myLinearPos - 1);
>context.write(linearPos, heat);
>linearPos.set(myLinearPos + 1);
>context.write(linearPos, heat);
>//Distribute my value to the cells above and below
>linearPos.set(myLinearPos - MatrixData.Length());
>context.write(linearPos, heat);
>linearPos.set(myLinearPos + MatrixData.Length());
>context.write(linearPos, heat);
> }//end map
> and this is the reduce,
> public void reduce(IntWritable linearPos, Iterable<FloatWritable> fwValues,
>  Context context) throws IOException, 
> InterruptedException {
>//Handle first and last "cold" boundaries
>if(linearPos.get()<0 || linearPos.get()>MatrixData.LinearSize()){
>   return;
>}
>if(linearPos.get()==MatrixData.HeatSourceLinearPos()){
>   context.write(linearPos, new 
> FloatWritable(MatrixData.HeatSourceTemperature()));
>   return;
>}
>float result = 0.0f;
>//Add all the values
>for(FloatWritable heat : fwValues) {
>   result += heat.get();
>}
>   context.write(linearPos, new FloatWritable(result/4) );
> }
> For example, with 6 nodes I get a running time of 15 minutes, and with 4 nodes 
> I get a running time of 8 minutes!
> This is how I generated the input,
>  public static void main(String[] args) throws IOException {
>  //Write file in the local dir
>  String uri = "/home/beto/mySeq";
>  Configuration conf = new Configuration();
>  FileSystem fs = FileSystem.get(URI.create(uri), conf);
>  Path path = new Path(uri);
>  IntWritable key = new IntWritable();
>  FloatWritable value = new FloatWritable(0.0f);
>  SequenceFile.Writer writer = null;
>  try {
>writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), 
> value.getClass());
>  int step = MatrixData.LinearSize()/10;
>  int limit = step;
>  for (int i = 0; i <= MatrixData.LinearSize(); i++) {
> key.set(i);
> if(i>limit){
>  System.out.println("*");
>  limit +=step;
> }
>   if(i==MatrixData.HeatSourceLinearPos()) {
> writer.append(key, new 
> FloatWritable(MatrixData.HeatSourceTemperature()));
> continue;
>   }
> writer.append(key, value);
>   }
> } finally {
>   IOUtils.closeStream(writer);
> }
>   }
> I'm basically solving a heat transfer problem in a square section. Pretty 
> simple. The input data is stored as (key, value) pairs, read in this 
> way, processed, and written again in the same format.
> Any thoughts?
> Alberto.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2091) Hadoop does not scale as expected

2011-06-20 Thread Alberto Andreotti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto Andreotti updated HDFS-2091:


Description: 
The more nodes I add to this application, the slower it goes. This is the app's 
map,

 public void map(IntWritable linearPos, FloatWritable heat, Context context
) throws IOException, InterruptedException {

   int myLinearPos = linearPos.get();
   //Distribute my value to the previous and the next
   linearPos.set(myLinearPos - 1);
   context.write(linearPos, heat);
   linearPos.set(myLinearPos + 1);
   context.write(linearPos, heat);
   //Distribute my value to the cells above and below
   linearPos.set(myLinearPos - MatrixData.Length());
   context.write(linearPos, heat);
   linearPos.set(myLinearPos + MatrixData.Length());
   context.write(linearPos, heat);
}//end map

and this is the reduce,

public void reduce(IntWritable linearPos, Iterable<FloatWritable> fwValues,
 Context context) throws IOException, InterruptedException {

   //Handle first and last "cold" boundaries
   if(linearPos.get()<0 || linearPos.get()>MatrixData.LinearSize()){
  return;
   }

   if(linearPos.get()==MatrixData.HeatSourceLinearPos()){
  context.write(linearPos, new 
FloatWritable(MatrixData.HeatSourceTemperature()));
  return;
   }

   float result = 0.0f;
   //Add all the values
   for(FloatWritable heat : fwValues) {
  result += heat.get();
   }

  context.write(linearPos, new FloatWritable(result/4) );
}

For example, with 6 nodes I get a running time of 15 minutes, and with 4 nodes I 
get a running time of 8 minutes!
This is how I generated the input,

 public static void main(String[] args) throws IOException {
 //Write file in the local dir
 String uri = "/home/beto/mySeq";

 Configuration conf = new Configuration();
 FileSystem fs = FileSystem.get(URI.create(uri), conf);
 Path path = new Path(uri);

 IntWritable key = new IntWritable();
 FloatWritable value = new FloatWritable(0.0f);

 SequenceFile.Writer writer = null;
 try {
   writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), 
value.getClass());

 int step = MatrixData.LinearSize()/10;
 int limit = step;
 for (int i = 0; i <= MatrixData.LinearSize(); i++) {
key.set(i);
if(i>limit){
 System.out.println("*");
 limit +=step;
}
  if(i==MatrixData.HeatSourceLinearPos()) {
writer.append(key, new 
FloatWritable(MatrixData.HeatSourceTemperature()));
continue;
  }

writer.append(key, value);

  }
} finally {
  IOUtils.closeStream(writer);
}
  }


I'm basically solving a heat transfer problem in a square section. Pretty 
simple. The input data is stored as (key, value) pairs, read in this 
way, processed, and written again in the same format.
Any thoughts?

Alberto.


  was:
The more nodes I add to this application, the slower it goes. This is the app's 
map,

 public void map(IntWritable linearPos, FloatWritable heat, Context context
) throws IOException, InterruptedException {
   int myLinearPos = linearPos.get();

   //Distribute my value to the previous and the next
   linearPos.set(myLinearPos - 1);
   context.write(linearPos, heat);
   linearPos.set(myLinearPos + 1);
   context.write(linearPos, heat);

   //Distribute my value to the cells above and below
   linearPos.set(myLinearPos - MatrixData.Length());
   context.write(linearPos, heat);
   linearPos.set(myLinearPos + MatrixData.Length());
   context.write(linearPos, heat);

}//end map

and this is the reduce,

public void reduce(IntWritable linearPos, Iterable<FloatWritable> fwValues,
 Context context) throws IOException, InterruptedException {

   //Handle first and last "cold" boundaries
   if(linearPos.get()<0 || linearPos.get()>MatrixData.LinearSize()){
  return;
   }

   if(linearPos.get()==MatrixData.HeatSourceLinearPos()){
  context.write(linearPos, new 
FloatWritable(MatrixData.HeatSourceTemperature()));
  return;
   }

   float result = 0.0f;
   //Add all the values
   for(FloatWritable heat : fwValues) {
  result += heat.get();
   }

  context.write(linearPos, new FloatWritable(result/4) );
}

For example, with 6 nodes I get a running time of 15minutes, and with 4 nodes I 
get a running time of 8minutes!.
This is how I generated the input,

 public static void main(String[] args) throws IOException {
 //Write file in the local dir
 String uri = "/home/beto/mySeq";

 Configuration conf = new Configuration();
 FileSystem fs = FileSystem.get(URI.create(uri), conf);
 Path path = new Pa

[jira] [Created] (HDFS-2091) Hadoop does not scale as expected

2011-06-20 Thread Alberto Andreotti (JIRA)
Hadoop does not scale as expected
-

 Key: HDFS-2091
 URL: https://issues.apache.org/jira/browse/HDFS-2091
 Project: Hadoop HDFS
  Issue Type: Bug
 Environment: Linux, 8 nodes.
Reporter: Alberto Andreotti


The more nodes I add to this application, the slower it goes. This is the app's 
map,

 public void map(IntWritable linearPos, FloatWritable heat, Context context
) throws IOException, InterruptedException {
   int myLinearPos = linearPos.get();

   //Distribute my value to the previous and the next
   linearPos.set(myLinearPos - 1);
   context.write(linearPos, heat);
   linearPos.set(myLinearPos + 1);
   context.write(linearPos, heat);

   //Distribute my value to the cells above and below
   linearPos.set(myLinearPos - MatrixData.Length());
   context.write(linearPos, heat);
   linearPos.set(myLinearPos + MatrixData.Length());
   context.write(linearPos, heat);

}//end map

and this is the reduce,

public void reduce(IntWritable linearPos, Iterable<FloatWritable> fwValues,
 Context context) throws IOException, InterruptedException {

   //Handle first and last "cold" boundaries
   if(linearPos.get()<0 || linearPos.get()>MatrixData.LinearSize()){
  return;
   }

   if(linearPos.get()==MatrixData.HeatSourceLinearPos()){
  context.write(linearPos, new 
FloatWritable(MatrixData.HeatSourceTemperature()));
  return;
   }

   float result = 0.0f;
   //Add all the values
   for(FloatWritable heat : fwValues) {
  result += heat.get();
   }

  context.write(linearPos, new FloatWritable(result/4) );
}

For example, with 6 nodes I get a running time of 15 minutes, and with 4 nodes I 
get a running time of 8 minutes!
This is how I generated the input,

 public static void main(String[] args) throws IOException {
 //Write file in the local dir
 String uri = "/home/beto/mySeq";

 Configuration conf = new Configuration();
 FileSystem fs = FileSystem.get(URI.create(uri), conf);
 Path path = new Path(uri);

 IntWritable key = new IntWritable();
 FloatWritable value = new FloatWritable(0.0f);

 SequenceFile.Writer writer = null;
 try {
   writer = SequenceFile.createWriter(fs, conf, path, key.getClass(), 
value.getClass());

 int step = MatrixData.LinearSize()/10;
 int limit = step;
 for (int i = 0; i <= MatrixData.LinearSize(); i++) {
key.set(i);
if(i>limit){
 System.out.println("*");
 limit +=step;
}
  if(i==MatrixData.HeatSourceLinearPos()) {
writer.append(key, new 
FloatWritable(MatrixData.HeatSourceTemperature()));
continue;
  }

writer.append(key, value);

  }
} finally {
  IOUtils.closeStream(writer);
}
  }


I'm basically solving a heat transfer problem in a square section. Pretty 
simple. The input data is stored as (key, value) pairs, read in this 
way, processed, and written again in the same format.
Any thoughts?

Alberto.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1568) Improve DataXceiver error logging

2011-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052339#comment-13052339
 ] 

Hadoop QA commented on HDFS-1568:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483041/HDFS-1568-6.patch
  against trunk revision 1137675.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/803//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/803//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/803//console

This message is automatically generated.

> Improve DataXceiver error logging
> -
>
> Key: HDFS-1568
> URL: https://issues.apache.org/jira/browse/HDFS-1568
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Joey Echeverria
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-1568-1.patch, HDFS-1568-3.patch, HDFS-1568-4.patch, 
> HDFS-1568-5.patch, HDFS-1568-6.patch, HDFS-1568-output-changes.patch
>
>
> In supporting customers we often see things like SocketTimeoutExceptions or 
> EOFExceptions coming from DataXceiver, but the logging isn't very good. For 
> example, if we get an IOE while setting up a connection to the downstream 
> mirror in writeBlock, the IP of the downstream mirror isn't logged on the DN 
> side.
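The kind of improvement being asked for can be sketched as follows. This is illustrative, not the patch itself: the class and method names are invented. When the connection to the downstream mirror fails, the exception is rethrown with the mirror's address in the message, so the DN log identifies which node failed.

```java
import java.io.IOException;

// Hypothetical sketch: wrap a downstream-mirror connection failure with
// the mirror's address so the log says which node could not be reached.
public class MirrorConnector {
    public static void connect(String mirrorAddr, Runnable dial)
            throws IOException {
        try {
            dial.run();   // stand-in for the actual socket setup
        } catch (RuntimeException e) {
            throw new IOException(
                "Failed to connect to downstream mirror " + mirrorAddr, e);
        }
    }
}
```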

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2090) BackupNode fails when log is streamed due checksum error

2011-06-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052322#comment-13052322
 ] 

André Oriani commented on HDFS-2090:


According to my investigation, with help from Ivan Kelly of Yahoo, the commit 
below introduced the bug:


{panel:borderStyle=solid}
Commit 27b956fa62ce9b467ab7dd287dd6dcd5ab6a0cb3
Author: Hairong Kuang
Date:   Mon Apr 11 17:15:27 2011 +

HDFS-1630. Support fsedits checksum. Contrbuted by Hairong Kuang.


git-svn-id:
https://svn.apache.org/repos/asf/hadoop/hdfs/trunk@109113113f79535-47bb-0310-9956-ffa450edef68
{panel}

PS: This is a GitHub commit.

> BackupNode fails when log is streamed  due checksum error
> -
>
> Key: HDFS-2090
> URL: https://issues.apache.org/jira/browse/HDFS-2090
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: André Oriani
>
> *Reproduction steps:*
> 1) An HDFS cluster is up and running
> 2) A backupnode is up, running, and registered to the namenode
> 3) Do a write operation like copying a file to the FS.
> *Expected Result:* No exception is thrown
> *Actual Result:* An exception is thrown due to a checksum error in the 
> streamed log:
> {panel:title=log| borderStyle=solid}
> 11/06/15 17:52:22 INFO ipc.Server: IPC Server handler 1 on 50100, call 
> journal(NamenodeRegistration(localhost:8020, role=NameNode), 101, 164, 
> [B@3951f910), rpc version=1, client version=5, methodsFingerPrint=302283637 
> from 192.168.1.102:56780: error: java.io.IOException: Error replaying edit 
> log at offset 13
> Recent opcode offsets: 1
> java.io.IOException: Error replaying edit log at offset 13
> Recent opcode offsets: 1
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:514)
>   at 
> org.apache.hadoop.hdfs.server.namenode.BackupImage.journal(BackupImage.java:242)
>   at 
> org.apache.hadoop.hdfs.server.namenode.BackupNode.journal(BackupNode.java:251)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1496)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1492)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1490)
> Caused by: org.apache.hadoop.fs.ChecksumException: Transaction 1 is corrupt. 
> Calculated checksum is -2116249809 but read checksum 0
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.validateChecksum(FSEditLogLoader.java:546)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:490)
>   ... 13 more
> {panel}
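The failing validation can be sketched roughly as below. This is an illustrative reconstruction, not the FSEditLogLoader code: the class name and method signature are invented. Each streamed transaction carries a checksum, and the reader recomputes it over the transaction bytes; a stored checksum of 0, as in the log above, is what you would expect if the writer never filled it in.

```java
import java.util.zip.CRC32;
import java.util.zip.Checksum;

// Hypothetical sketch: recompute a transaction's CRC and compare it with
// the checksum that was read from the stream.
public class TxnChecksum {
    public static void validate(long txid, byte[] body, int storedChecksum) {
        Checksum crc = new CRC32();
        crc.update(body, 0, body.length);
        int calculated = (int) crc.getValue();
        if (calculated != storedChecksum) {
            throw new IllegalStateException("Transaction " + txid
                + " is corrupt. Calculated checksum is " + calculated
                + " but read checksum " + storedChecksum);
        }
    }
}
```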

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-2090) BackupNode fails when log is streamed due checksum error

2011-06-20 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HDFS-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052321#comment-13052321
 ] 

André Oriani commented on HDFS-2090:


According to my investigation and the help of Ivan Kelly from Yahoo, the commit 
below has introduced the bug:


{panel:borderStyle=solid}
Commit 27b956fa62ce9b467ab7dd287dd6dcd5ab6a0cb3
Author: Hairong Kuang
Date:   Mon Apr 11 17:15:27 2011 +

HDFS-1630. Support fsedits checksum. Contrbuted by Hairong Kuang.


git-svn-id:
https://svn.apache.org/repos/asf/hadoop/hdfs/trunk@109113113f79535-47bb-0310-9956-ffa450edef68
{panel}

PS: This is a github commit.

> BackupNode fails when log is streamed  due checksum error
> -
>
> Key: HDFS-2090
> URL: https://issues.apache.org/jira/browse/HDFS-2090
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: André Oriani
>
> *Reproductions steps:*
> 1) An HDFS cluster is up and running
> 2) A backupnode is up, running, and registered to the namenode
> 3) Do a write operation like copying a file to the FS.
> *Expected Result:* No exception is thrown
> *Actual Result:* An exception is thrown due to a checksum error in the 
> streamed log:
> {panel:title=log| borderStyle=solid}
> 11/06/15 17:52:22 INFO ipc.Server: IPC Server handler 1 on 50100, call 
> journal(NamenodeRegistration(localhost:8020, role=NameNode), 101, 164, 
> [B@3951f910), rpc version=1, client version=5, methodsFingerPrint=302283637 
> from 192.168.1.102:56780: error: java.io.IOException: Error replaying edit 
> log at offset 13
> Recent opcode offsets: 1
> java.io.IOException: Error replaying edit log at offset 13
> Recent opcode offsets: 1
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:514)
>   at 
> org.apache.hadoop.hdfs.server.namenode.BackupImage.journal(BackupImage.java:242)
>   at 
> org.apache.hadoop.hdfs.server.namenode.BackupNode.journal(BackupNode.java:251)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at 
> org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:422)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1496)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1492)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1490)
> Caused by: org.apache.hadoop.fs.ChecksumException: Transaction 1 is corrupt. 
> Calculated checksum is -2116249809 but read checksum 0
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.validateChecksum(FSEditLogLoader.java:546)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:490)
>   ... 13 more
> {panel}





[jira] [Created] (HDFS-2090) BackupNode fails when log is streamed due checksum error

2011-06-20 Thread JIRA
BackupNode fails when log is streamed  due checksum error
-

 Key: HDFS-2090
 URL: https://issues.apache.org/jira/browse/HDFS-2090
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.23.0
Reporter: André Oriani


*Reproduction steps:*

1) An HDFS cluster is up and running
2) A backupnode is up, running, and registered to the namenode
3) Do a write operation like copying a file to the FS.


*Expected Result:* No exception is thrown
*Actual Result:* An exception is thrown due to a checksum error in the streamed log:


{panel:title=log| borderStyle=solid}
11/06/15 17:52:22 INFO ipc.Server: IPC Server handler 1 on 50100, call 
journal(NamenodeRegistration(localhost:8020, role=NameNode), 101, 164, 
[B@3951f910), rpc version=1, client version=5, methodsFingerPrint=302283637 
from 192.168.1.102:56780: error: java.io.IOException: Error replaying edit log 
at offset 13
Recent opcode offsets: 1
java.io.IOException: Error replaying edit log at offset 13
Recent opcode offsets: 1
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:514)
at 
org.apache.hadoop.hdfs.server.namenode.BackupImage.journal(BackupImage.java:242)
at 
org.apache.hadoop.hdfs.server.namenode.BackupNode.journal(BackupNode.java:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:422)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1496)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1492)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1131)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1490)
Caused by: org.apache.hadoop.fs.ChecksumException: Transaction 1 is corrupt. 
Calculated checksum is -2116249809 but read checksum 0
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.validateChecksum(FSEditLogLoader.java:546)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:490)
... 13 more
{panel}





[jira] [Commented] (HDFS-2086) If the include hosts list contains host name, after restarting namenode, datanodes registrant is denied

2011-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052314#comment-13052314
 ] 

Hadoop QA commented on HDFS-2086:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483221/HDFS-2086.patch
  against trunk revision 1137675.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 4 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.hdfs.server.namenode.TestStartup

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/802//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/802//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/802//console

This message is automatically generated.

> If the include hosts list contains host name, after restarting namenode, 
> datanodes registrant is denied 
> 
>
> Key: HDFS-2086
> URL: https://issues.apache.org/jira/browse/HDFS-2086
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
> Fix For: 0.23.0
>
> Attachments: HDFS-2086.patch
>
>
> As the title describes:  if the include hosts list contains host names, then 
> after the namenode restarts, datanode registration is denied by the 
> namenode.  This is because after the namenode is restarted, a still-alive 
> datanode will try to register itself with the namenode and identifies itself 
> by its *IP address*.  However, the namenode only allows hosts in its include 
> list to register, and those entries are all hostnames, so the namenode 
> denies the datanode registration.





[jira] [Updated] (HDFS-2087) Add methods to DataTransferProtocol interface

2011-06-20 Thread Tsz Wo (Nicholas), SZE (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HDFS-2087:
-

Attachment: h2087_20110620.patch

h2087_20110620.patch: added {{readBlock(..)}} only for illustrating the idea.

> Add methods to DataTransferProtocol interface
> -
>
> Key: HDFS-2087
> URL: https://issues.apache.org/jira/browse/HDFS-2087
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: data-node, hdfs client
>Reporter: Tsz Wo (Nicholas), SZE
>Assignee: Tsz Wo (Nicholas), SZE
> Attachments: h2087_20110620.patch
>
>
> The {{DataTransferProtocol}} interface is currently empty.  The {{Sender}} 
> and {{Receiver}} define similar methods individually.





[jira] [Commented] (HDFS-2077) 1073: address checkpoint upload when one of the storage dirs is failed

2011-06-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052305#comment-13052305
 ] 

Todd Lipcon commented on HDFS-2077:
---

Just found a good bug in this while doing some fault testing: in 
reportErrorOnFile, it will sometimes mis-ascribe an error if one namenode 
directory is a prefix of the other. E.g., if the storage dirs are /data/name and 
/data/name2, it will ascribe an error in /data/name2/... to /data/name.
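The mis-ascription comes from treating "is under this directory" as a plain string-prefix test. An illustrative sketch (not the actual NNStorage code, and assuming paths normalized to '/' separators):

```java
// Illustrative sketch of the prefix mis-ascription and a boundary-aware
// fix; not the actual NNStorage/reportErrorOnFile code.
public class PrefixBug {
    // Buggy: "/data/name2/..." startsWith "/data/name", so the error is
    // charged to the wrong directory.
    static boolean buggyContains(String dir, String file) {
        return file.startsWith(dir);
    }

    // Fixed: also require a '/' boundary (or exact match) after the prefix.
    static boolean contains(String dir, String file) {
        return file.startsWith(dir)
            && (file.length() == dir.length()
                || file.charAt(dir.length()) == '/');
    }

    public static void main(String[] args) {
        String f = "/data/name2/current/edits";
        System.out.println(buggyContains("/data/name", f)); // wrong dir blamed
        System.out.println(contains("/data/name", f));
        System.out.println(contains("/data/name2", f));
    }
}
```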

> 1073: address checkpoint upload when one of the storage dirs is failed
> --
>
> Key: HDFS-2077
> URL: https://issues.apache.org/jira/browse/HDFS-2077
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2077.txt
>
>
> This JIRA addresses the following case:
> - NN is running with 2 storage dirs
> - 1 of the dirs fails
> - 2NN makes a checkpoint
> Currently, if GetImageServlet fails to open _any_ of the local files to 
> receive a checkpoint, it will fail the entire checkpoint upload process. 
> Instead, it should continue to receive checkpoints in the non-failed 
> directories.





[jira] [Commented] (HDFS-2082) SecondaryNameNode web interface doesn't show the right info

2011-06-20 Thread Aaron T. Myers (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052303#comment-13052303
 ] 

Aaron T. Myers commented on HDFS-2082:
--

I'm pretty confident this test failure was spurious. The test just passed 
locally on my box.

Curiously, I've never seen {{TestSetTimes}} fail though.

> SecondaryNameNode web interface doesn't show the right info
> ---
>
> Key: HDFS-2082
> URL: https://issues.apache.org/jira/browse/HDFS-2082
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: 0.23.0
>
> Attachments: hdfs-2082.0.patch, hdfs-2082.1.patch, hdfs-2082.2.patch, 
> hdfs-2082.3.patch
>
>
> HADOOP-3741 introduced some useful info to the 2NN web UI. This broke when 
> security was added.





[jira] [Commented] (HDFS-2082) SecondaryNameNode web interface doesn't show the right info

2011-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052299#comment-13052299
 ] 

Hadoop QA commented on HDFS-2082:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483014/hdfs-2082.3.patch
  against trunk revision 1137675.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 2 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these core unit tests:
  org.apache.hadoop.hdfs.TestSetTimes

+1 contrib tests.  The patch passed contrib unit tests.

+1 system test framework.  The patch passed system test framework compile.

Test results: 
https://builds.apache.org/job/PreCommit-HDFS-Build/801//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HDFS-Build/801//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/801//console

This message is automatically generated.

> SecondaryNameNode web interface doesn't show the right info
> ---
>
> Key: HDFS-2082
> URL: https://issues.apache.org/jira/browse/HDFS-2082
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.22.0, 0.23.0
>Reporter: Aaron T. Myers
>Assignee: Aaron T. Myers
> Fix For: 0.23.0
>
> Attachments: hdfs-2082.0.patch, hdfs-2082.1.patch, hdfs-2082.2.patch, 
> hdfs-2082.3.patch
>
>
> HADOOP-3741 introduced some useful info to the 2NN web UI. This broke when 
> security was added.





[jira] [Commented] (HDFS-2086) If the include hosts list contains host name, after restarting namenode, datanodes registrant is denied

2011-06-20 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052297#comment-13052297
 ] 

Jitendra Nath Pandey commented on HDFS-2086:


1. inHostsList and inExcludeHostsList do the same thing on two different 
lists. Both could use a single method that also takes the list as an argument.
2. Do we really need to look into hostsList for both node.getName and 
iaddr.getHostName? I understand node.getName may actually return the ip:port, 
but for the IP, iaddr.getHostAddress is more reliable. Caveat with the latter 
approach: can we assume ipAddr and node (DatanodeID) will always refer to the 
same host?

Minor: fix the indentation in checkIncludeListForDead.
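Point 1 above could look like the following sketch — a single matcher taking the list as an argument. All names are illustrative, not the patch's actual API; the flag preserves the differing empty-list semantics (an empty include list admits everyone, an empty exclude list excludes no one):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical consolidation of inHostsList / inExcludeHostsList into
// one method that takes the list as an argument.
public class HostListCheck {
    /** True if any of the node's identities (ip, hostname, ...) is in
     *  the list; emptyMatchesAll handles the empty-list semantics. */
    static boolean inList(Set<String> list, boolean emptyMatchesAll,
                          String... identities) {
        if (list.isEmpty()) return emptyMatchesAll;
        for (String id : identities) {
            if (id != null && list.contains(id)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> include = new HashSet<>(Arrays.asList("dn1.example.com"));
        Set<String> exclude = new HashSet<>();
        // hostname matches even though the node also registered by IP
        System.out.println(inList(include, true, "192.168.1.102", "dn1.example.com"));
        // empty exclude list excludes nobody
        System.out.println(inList(exclude, false, "192.168.1.102"));
    }
}
```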

> If the include hosts list contains host name, after restarting namenode, 
> datanodes registrant is denied 
> 
>
> Key: HDFS-2086
> URL: https://issues.apache.org/jira/browse/HDFS-2086
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
> Fix For: 0.23.0
>
> Attachments: HDFS-2086.patch
>
>
> As the title describes:  if the include hosts list contains host names, then 
> after the namenode restarts, datanode registration is denied by the 
> namenode.  This is because after the namenode is restarted, a still-alive 
> datanode will try to register itself with the namenode and identifies itself 
> by its *IP address*.  However, the namenode only allows hosts in its include 
> list to register, and those entries are all hostnames, so the namenode 
> denies the datanode registration.





[jira] [Updated] (HDFS-2086) If the include hosts list contains host name, after restarting namenode, datanodes registrant is denied

2011-06-20 Thread Jitendra Nath Pandey (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jitendra Nath Pandey updated HDFS-2086:
---

Status: Patch Available  (was: Open)

> If the include hosts list contains host name, after restarting namenode, 
> datanodes registrant is denied 
> 
>
> Key: HDFS-2086
> URL: https://issues.apache.org/jira/browse/HDFS-2086
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
> Fix For: 0.23.0
>
> Attachments: HDFS-2086.patch
>
>
> As the title describes:  if the include hosts list contains host names, then 
> after the namenode restarts, datanode registration is denied by the 
> namenode.  This is because after the namenode is restarted, a still-alive 
> datanode will try to register itself with the namenode and identifies itself 
> by its *IP address*.  However, the namenode only allows hosts in its include 
> list to register, and those entries are all hostnames, so the namenode 
> denies the datanode registration.





[jira] [Created] (HDFS-2089) new hadoop-config.sh doesn't manage classpath for HADOOP_CONF_DIR correctly

2011-06-20 Thread Todd Lipcon (JIRA)
new hadoop-config.sh doesn't manage classpath for HADOOP_CONF_DIR correctly
---

 Key: HDFS-2089
 URL: https://issues.apache.org/jira/browse/HDFS-2089
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Todd Lipcon
 Fix For: 0.23.0


Since the introduction of the RPM packages, hadoop-config.sh incorrectly puts 
$HADOOP_HDFS_HOME/conf on the classpath regardless of whether HADOOP_CONF_DIR 
is already defined in the environment.





[jira] [Updated] (HDFS-2086) If the include hosts list contains host name, after restarting namenode, datanodes registrant is denied

2011-06-20 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-2086:
---

Attachment: HDFS-2086.patch

> If the include hosts list contains host name, after restarting namenode, 
> datanodes registrant is denied 
> 
>
> Key: HDFS-2086
> URL: https://issues.apache.org/jira/browse/HDFS-2086
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
> Fix For: 0.23.0
>
> Attachments: HDFS-2086.patch
>
>
> As the title describes:  if the include hosts list contains host names, then 
> after the namenode restarts, datanode registration is denied by the 
> namenode.  This is because after the namenode is restarted, a still-alive 
> datanode will try to register itself with the namenode and identifies itself 
> by its *IP address*.  However, the namenode only allows hosts in its include 
> list to register, and those entries are all hostnames, so the namenode 
> denies the datanode registration.





[jira] [Commented] (HDFS-2088) Move edits log archiving logic into FSEditLog/JournalManager

2011-06-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052265#comment-13052265
 ] 

Todd Lipcon commented on HDFS-2088:
---

btw, the above patch sequences after the following: HDFS-2074, HDFS-2085, 
HDFS-2026, HDFS-2077, HDFS-2078.

> Move edits log archiving logic into FSEditLog/JournalManager
> 
>
> Key: HDFS-2088
> URL: https://issues.apache.org/jira/browse/HDFS-2088
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2088.txt
>
>
> Currently the logic to archive edits logs is File-specific which presents 
> some issues for Ivan's work. Since it relies on inspecting storage 
> directories using NNStorage.inspectStorageDirs, it also misses directories 
> that the image layer considers "failed" which results in edits logs piling up 
> in these kinds of directories. This JIRA is similar to HDFS-2018 but only 
> deals with archival for now.





[jira] [Updated] (HDFS-2088) Move edits log archiving logic into FSEditLog/JournalManager

2011-06-20 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2088:
--

Attachment: hdfs-2088.txt

Patch does the following:
- once StorageArchivalManager determines the minimum txid that needs to be 
retained, it simply passes it along to FSEditLog.archiveLogsOlderThan.
- FSEditLog now propagates this through to all of the journal managers
- refactors some code in FSImageTransactionalStorageInspector into a static 
method {{matchEditLogs}} so that FileJournalManager can share it. This will 
eventually move into FileJournalManager itself like Ivan did in HDFS-2018, once 
the load-time stuff gets split up.
- adds a functional test to show that edits logs keep getting archived in an 
edits directory even if it's considered "failed" as an image directory
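The flow described above can be sketched as follows: the archival policy computes the minimum txid to retain and every journal manager independently drops finalized segments that end before it. Class and method names are illustrative, not the branch's exact API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of propagating a retention threshold through a
// JournalManager interface; names are illustrative.
public class LogArchivalSketch {
    static class LogSegment {
        final long startTxId, endTxId;
        LogSegment(long s, long e) { startTxId = s; endTxId = e; }
    }

    interface JournalManager {
        void purgeLogsOlderThan(long minTxIdToKeep);
    }

    static class FileJournalManagerSketch implements JournalManager {
        final List<LogSegment> segments = new ArrayList<>();
        public void purgeLogsOlderThan(long minTxIdToKeep) {
            // drop finalized segments that end before the threshold
            segments.removeIf(s -> s.endTxId < minTxIdToKeep);
        }
    }

    public static void main(String[] args) {
        FileJournalManagerSketch jm = new FileJournalManagerSketch();
        jm.segments.add(new LogSegment(1, 100));
        jm.segments.add(new LogSegment(101, 200));
        jm.segments.add(new LogSegment(201, 300));
        // FSEditLog would pass the same threshold to every manager,
        // including ones whose directory is "failed" for images.
        jm.purgeLogsOlderThan(150);
        System.out.println(jm.segments.size());
    }
}
```

Because each manager applies the threshold itself, an edits directory keeps getting purged even when the image layer considers it failed.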

> Move edits log archiving logic into FSEditLog/JournalManager
> 
>
> Key: HDFS-2088
> URL: https://issues.apache.org/jira/browse/HDFS-2088
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2088.txt
>
>
> Currently the logic to archive edits logs is File-specific which presents 
> some issues for Ivan's work. Since it relies on inspecting storage 
> directories using NNStorage.inspectStorageDirs, it also misses directories 
> that the image layer considers "failed" which results in edits logs piling up 
> in these kinds of directories. This JIRA is similar to HDFS-2018 but only 
> deals with archival for now.





[jira] [Updated] (HDFS-2088) Move edits log archiving logic into FSEditLog/JournalManager

2011-06-20 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2088:
--

  Component/s: name-node
  Description: Currently the logic to archive edits logs is 
File-specific which presents some issues for Ivan's work. Since it relies on 
inspecting storage directories using NNStorage.inspectStorageDirs, it also 
misses directories that the image layer considers "failed" which results in 
edits logs piling up in these kinds of directories. This JIRA is similar to 
HDFS-2018 but only deals with archival for now.
Affects Version/s: Edit log branch (HDFS-1073)
Fix Version/s: Edit log branch (HDFS-1073)

> Move edits log archiving logic into FSEditLog/JournalManager
> 
>
> Key: HDFS-2088
> URL: https://issues.apache.org/jira/browse/HDFS-2088
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
>
> Currently the logic to archive edits logs is File-specific which presents 
> some issues for Ivan's work. Since it relies on inspecting storage 
> directories using NNStorage.inspectStorageDirs, it also misses directories 
> that the image layer considers "failed" which results in edits logs piling up 
> in these kinds of directories. This JIRA is similar to HDFS-2018 but only 
> deals with archival for now.





[jira] [Commented] (HDFS-2018) Move all journal stream management code into one place

2011-06-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052262#comment-13052262
 ] 

Todd Lipcon commented on HDFS-2018:
---

Some comments on this patch:
- general idea seems right...
- I think some of it overlaps with HDFS-2085 and HDFS-2074, which are awaiting 
review. Can you take a look at those?
- for getEditLogManifest I think you need to support the case where different 
journal managers have different sets of logs, but we need to be able to 
transfer all of them. I.e., imagine the case with two edits directories where 
one fails, comes back, and then the other fails. In that case you need to 
interleave copying txns from both of them when transferring edits to the 2NN.
- I just opened HDFS-2088 and about to put a patch up there in a few minutes. 
That deals with the archiving logic and makes some similar changes (eg 
refactoring some stuff out of FSImageTransactionalStorageInspector into 
FileJournalManager)

Let me see if I can merge some of your work into my branch -- sorry that I'm a 
few patches ahead of what's been committed.
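The interleaving case above can be sketched as stitching a contiguous chain of segments out of what every journal manager has. Types and names are illustrative, not the actual getEditLogManifest API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: two edits dirs that failed at different times
// each hold only part of the txn history; the manifest stitches both.
public class ManifestSketch {
    static class Segment {
        final long start, end;
        Segment(long s, long e) { start = s; end = e; }
        public String toString() { return "[" + start + "," + end + "]"; }
    }

    /** Pick, from all managers' segments, a contiguous chain from fromTxId. */
    static List<Segment> manifest(List<Segment> all, long fromTxId) {
        all.sort(Comparator.comparingLong((Segment s) -> s.start));
        List<Segment> out = new ArrayList<>();
        long next = fromTxId;
        for (Segment s : all) {
            if (s.start == next) { out.add(s); next = s.end + 1; }
        }
        return out;
    }

    public static void main(String[] args) {
        // dir A missed txns 101-200 while failed; dir B missed 201-300
        List<Segment> all = new ArrayList<>(Arrays.asList(
            new Segment(1, 100), new Segment(201, 300),   // dir A
            new Segment(1, 100), new Segment(101, 200))); // dir B
        System.out.println(manifest(all, 1));
    }
}
```

Neither directory alone covers 1-300, but the merged chain does — which is why the 2NN transfer has to draw from both.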

> Move all journal stream management code into one place
> --
>
> Key: HDFS-2018
> URL: https://issues.apache.org/jira/browse/HDFS-2018
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff
>
>
> Currently in the HDFS-1073 branch, the code for creating output streams is in 
> FileJournalManager and the code for input streams is in the inspectors. This 
> change does a number of things.
>   - Input and Output streams are now created by the JournalManager.
>   - FSImageStorageInspectors now deals with URIs when referring to edit logs
>   - Recovery of inprogress logs is performed by counting the number of 
> transactions instead of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.





[jira] [Created] (HDFS-2088) Move edits log archiving logic into FSEditLog/JournalManager

2011-06-20 Thread Todd Lipcon (JIRA)
Move edits log archiving logic into FSEditLog/JournalManager


 Key: HDFS-2088
 URL: https://issues.apache.org/jira/browse/HDFS-2088
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Todd Lipcon
Assignee: Todd Lipcon








[jira] [Commented] (HDFS-2083) Adopt JMXJsonServlet into HDFS in order to query statistics

2011-06-20 Thread Suresh Srinivas (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052258#comment-13052258
 ] 

Suresh Srinivas commented on HDFS-2083:
---

# Minor: make this a static string: 
"/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo"
# Why do you need to create a new String in readOutput()? {{out.append(new 
String(buffer, 0, len));}}
# Every time you need a property, you query the mbean. Can you do it only once 
and hold on to the response?
# The NamenodeMXBeanHelper javadoc needs to be updated (it still talks about 
JMX; also look at other references that talk about JMX access).
# The stream from URLConnection needs to be closed.
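Points 2, 3 and 5 together could look like the sketch below: append chars without an intermediate String, fetch the /jmx response once and cache it, and close the connection's stream in a finally block. Only the query path is taken from the review; the rest is assumed, not the patch's actual code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical sketch of the review points; not the actual patch code.
public class JmxFetchSketch {
    static final String NAMENODE_INFO_QUERY =
        "/jmx?qry=Hadoop:service=NameNode,name=NameNodeInfo";

    private String cachedResponse; // fetched once, reused per property lookup

    String readOutput(URL url) throws IOException {
        if (cachedResponse != null) return cachedResponse;
        URLConnection conn = url.openConnection();
        InputStream in = conn.getInputStream();
        try {
            StringBuilder out = new StringBuilder();
            BufferedReader r =
                new BufferedReader(new InputStreamReader(in, "UTF-8"));
            char[] buffer = new char[4096];
            int len;
            while ((len = r.read(buffer)) > 0) {
                out.append(buffer, 0, len); // no per-chunk String allocation
            }
            cachedResponse = out.toString();
            return cachedResponse;
        } finally {
            in.close(); // review point 5
        }
    }

    public static void main(String[] args) {
        System.out.println(NAMENODE_INFO_QUERY);
    }
}
```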


> Adopt JMXJsonServlet into HDFS in order to query statistics
> ---
>
> Key: HDFS-2083
> URL: https://issues.apache.org/jira/browse/HDFS-2083
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
> Fix For: 0.23.0
>
> Attachments: HDFS-2083.patch
>
>
> HADOOP-7144 added JMXJsonServlet into Common.  It gives the capability to 
> query statistics and metrics exposed via JMX to be queried through HTTP.  We 
> adopt this into HDFS.  This provides the alternative solution to HDFS-1874.





[jira] [Commented] (HDFS-2084) Sometimes backup node/secondary name node stops with exception

2011-06-20 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052248#comment-13052248
 ] 

Konstantin Shvachko commented on HDFS-2084:
---

Looks like the checkpoint contains a record which tries to set time on a 
non-existing file. This should not happen. So the question is how did it 
happen? If it's a bug we should fix the cause. I don't see how, but if it's a 
legal scenario, then we can suppress NPE as you suggest.

> Sometimes backup node/secondary name node stops with exception
> --
>
> Key: HDFS-2084
> URL: https://issues.apache.org/jira/browse/HDFS-2084
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0
> Environment: FreeBSD
>Reporter: Vitalii Tymchyshyn
> Attachments: patch.diff
>
>
> 2011-06-17 11:43:23,096 ERROR 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer: Throwable Exception in 
> doCheckpoint: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1765)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1753)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:708)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:411)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:378)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1209)
> at 
> org.apache.hadoop.hdfs.server.namenode.BackupStorage.loadCheckpoint(BackupStorage.java:158)
> at 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer.doCheckpoint(Checkpointer.java:243)
> at 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer.run(Checkpointer.java:141)





[jira] [Created] (HDFS-2087) Add methods to DataTransferProtocol interface

2011-06-20 Thread Tsz Wo (Nicholas), SZE (JIRA)
Add methods to DataTransferProtocol interface
-

 Key: HDFS-2087
 URL: https://issues.apache.org/jira/browse/HDFS-2087
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: data-node, hdfs client
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE


The {{DataTransferProtocol}} interface is currently empty.  The {{Sender}} and 
{{Receiver}} define similar methods individually.





[jira] [Commented] (HDFS-2086) If the include hosts list contains host name, after restarting namenode, datanodes registrant is denied

2011-06-20 Thread Tanping Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052244#comment-13052244
 ] 

Tanping Wang commented on HDFS-2086:


There is a second part to this problem. When the namenode checks datanode 
status by calling FSNameSystem#getDatanodeListForReport, it goes over its 
include hosts list and tries to determine whether all the hosts in that list 
are registered. If not, the host is added to the dead list. In our case, as 
mentioned earlier, after the namenode restarted the datanode registered itself 
with its *IP address*, but the include list still contains its *host name*. So 
the hostname is not recognized by the namenode and is added to the dead list 
when the namenode reports datanode status.
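The mismatch can be sketched as follows: the include list holds a hostname while the re-registered datanode reports ip:port, so a name-only lookup wrongly lands the node on the dead list, and comparing against both identities avoids that. All names here are illustrative, not the actual FSNamesystem code:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of matching a datanode against the include list
// by both its registered name and its resolved hostname.
public class DeadListSketch {
    static boolean knownToIncludeList(Set<String> includeHosts,
                                      String registeredName, // may be "ip:port"
                                      String resolvedHostName) {
        String host = registeredName.split(":")[0];
        return includeHosts.contains(host)
            || (resolvedHostName != null
                && includeHosts.contains(resolvedHostName));
    }

    public static void main(String[] args) {
        Set<String> include = new HashSet<>();
        include.add("dn1.example.com");
        // after the NN restart the datanode re-registers as ip:port
        System.out.println(knownToIncludeList(include,
            "192.168.1.102:50010", null));              // name-only lookup fails
        System.out.println(knownToIncludeList(include,
            "192.168.1.102:50010", "dn1.example.com")); // resolved hostname matches
    }
}
```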

> If the include hosts list contains host name, after restarting namenode, 
> datanodes registrant is denied 
> 
>
> Key: HDFS-2086
> URL: https://issues.apache.org/jira/browse/HDFS-2086
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
> Fix For: 0.23.0
>
>
> As the title describes:  if the include host list contains host names, then 
> after restarting the namenode, datanode registration is denied.  This is 
> because after the namenode is restarted, the still-alive datanodes try to 
> register themselves with the namenode, identifying themselves by *IP 
> address*.  However, the namenode only allows hosts in its include list to 
> register, and those entries are all hostnames, so the namenode denies the 
> datanode registration.





[jira] [Updated] (HDFS-2086) If the include hosts list contains host name, after restarting namenode, datanodes registrant is denied

2011-06-20 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-2086:
---

Description: 
As the title describes:  if the include host list contains host names, then 
after restarting the namenode, datanode registration is denied.  This is 
because after the namenode is restarted, the still-alive datanodes try to 
register themselves with the namenode, identifying themselves by *IP address*.  
However, the namenode only allows hosts in its include list to register, and 
those entries are all hostnames, so the namenode denies the datanode 
registration.


  was:
As the tile describes the problem:  if the include host list contains host 
name, after restarting namenodes, the datanodes registrant is denied by 
namenodes.  This is because after namenode is restarted, the still alive data 
node will try to register itself with the namenode and it identifies itself 
with its *IP address*.  However, namenode only allows all the hosts in its 
hosts list to registrant and all of them are hostnames. So namenode would deny 
the datanode registration.



> If the include hosts list contains host name, after restarting namenode, 
> datanodes registrant is denied 
> 
>
> Key: HDFS-2086
> URL: https://issues.apache.org/jira/browse/HDFS-2086
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
> Fix For: 0.23.0
>
>
> As the title describes:  if the include host list contains host names, then 
> after restarting the namenode, datanode registration is denied.  This is 
> because after the namenode is restarted, the still-alive datanodes try to 
> register themselves with the namenode, identifying themselves by *IP 
> address*.  However, the namenode only allows hosts in its include list to 
> register, and those entries are all hostnames, so the namenode denies the 
> datanode registration.





[jira] [Updated] (HDFS-2086) If the include hosts list contains host name, after restarting namenode, datanodes registrant is denied

2011-06-20 Thread Tanping Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tanping Wang updated HDFS-2086:
---

  Component/s: name-node
Affects Version/s: 0.23.0
Fix Version/s: 0.23.0
 Assignee: Tanping Wang

> If the include hosts list contains host name, after restarting namenode, 
> datanodes registrant is denied 
> 
>
> Key: HDFS-2086
> URL: https://issues.apache.org/jira/browse/HDFS-2086
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Tanping Wang
>Assignee: Tanping Wang
> Fix For: 0.23.0
>
>
> As the title describes:  if the include host list contains host names, then 
> after restarting the namenode, datanode registration is denied.  This is 
> because after the namenode is restarted, the still-alive datanodes try to 
> register themselves with the namenode, identifying themselves by *IP 
> address*.  However, the namenode only allows hosts in its include list to 
> register, and those entries are all hostnames, so the namenode denies the 
> datanode registration.





[jira] [Created] (HDFS-2086) If the include hosts list contains host name, after restarting namenode, datanodes registrant is denied

2011-06-20 Thread Tanping Wang (JIRA)
If the include hosts list contains host name, after restarting namenode, 
datanodes registrant is denied 


 Key: HDFS-2086
 URL: https://issues.apache.org/jira/browse/HDFS-2086
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Tanping Wang


As the title describes:  if the include host list contains host names, then 
after restarting the namenode, datanode registration is denied.  This is 
because after the namenode is restarted, the still-alive datanodes try to 
register themselves with the namenode, identifying themselves by *IP address*.  
However, the namenode only allows hosts in its include list to register, and 
those entries are all hostnames, so the namenode denies the datanode 
registration.






[jira] [Commented] (HDFS-2018) Move all journal stream management code into one place

2011-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052183#comment-13052183
 ] 

Hadoop QA commented on HDFS-2018:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483194/HDFS-2018.diff
  against trunk revision 1137675.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 20 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/800//console

This message is automatically generated.

> Move all journal stream management code into one place
> --
>
> Key: HDFS-2018
> URL: https://issues.apache.org/jira/browse/HDFS-2018
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff
>
>
> Currently in the HDFS-1073 branch, the code for creating output streams is in 
> FileJournalManager and the code for input streams is in the inspectors. This 
> change does a number of things.
>   - Input and Output streams are now created by the JournalManager.
>   - FSImageStorageInspectors now deals with URIs when referring to edit logs
>   - Recovery of inprogress logs is performed by counting the number of 
> transactions instead of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.





[jira] [Commented] (HDFS-2018) Move all journal stream management code into one place

2011-06-20 Thread Ivan Kelly (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052180#comment-13052180
 ] 

Ivan Kelly commented on HDFS-2018:
--

Forgot to mention: the cleanup I plan to do is to remove LoadPlan, as it's not 
needed anymore. Also testing, as this needs a good few tests to verify the 
functionality.

> Move all journal stream management code into one place
> --
>
> Key: HDFS-2018
> URL: https://issues.apache.org/jira/browse/HDFS-2018
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff
>
>
> Currently in the HDFS-1073 branch, the code for creating output streams is in 
> FileJournalManager and the code for input streams is in the inspectors. This 
> change does a number of things.
>   - Input and Output streams are now created by the JournalManager.
>   - FSImageStorageInspectors now deals with URIs when referring to edit logs
>   - Recovery of inprogress logs is performed by counting the number of 
> transactions instead of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.





[jira] [Updated] (HDFS-2018) Move all journal stream management code into one place

2011-06-20 Thread Ivan Kelly (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Kelly updated HDFS-2018:
-

Attachment: HDFS-2018.diff

Another rough patch, which I'll clean up tomorrow. In this patch, multiple 
journal streams can be used during loading. PreTransaction stuff is confined 
solely to FSImage; JournalManagers never know about it. Also, file loading and 
image loading are now completely separate.
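The stream ownership described here can be pictured as an interface along 
these lines (a Python sketch; the real code is Java, and the method names are 
hypothetical rather than the actual HDFS-2018 API):

```python
from abc import ABC, abstractmethod

class JournalManager(ABC):
    """Single owner of edit-log streams: callers never touch files or URIs
    directly, so the storage inspectors need no stream-creation logic."""

    @abstractmethod
    def open_output_stream(self, first_txid):
        """Begin a new in-progress log segment starting at first_txid."""

    @abstractmethod
    def open_input_stream(self, first_txid):
        """Open the finalized (or recovered) segment starting at first_txid."""

    @abstractmethod
    def recover_unfinalized(self):
        """Finalize in-progress segments by counting their transactions."""
```

Because callers only talk to the manager, recovery and stream creation can 
each be implemented in one place per storage type.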

> Move all journal stream management code into one place
> --
>
> Key: HDFS-2018
> URL: https://issues.apache.org/jira/browse/HDFS-2018
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ivan Kelly
>Assignee: Ivan Kelly
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: HDFS-2018.diff, HDFS-2018.diff, HDFS-2018.diff
>
>
> Currently in the HDFS-1073 branch, the code for creating output streams is in 
> FileJournalManager and the code for input streams is in the inspectors. This 
> change does a number of things.
>   - Input and Output streams are now created by the JournalManager.
>   - FSImageStorageInspectors now deals with URIs when referring to edit logs
>   - Recovery of inprogress logs is performed by counting the number of 
> transactions instead of looking at the length of the file.
> The patch for this applies on top of the HDFS-1073 branch + HDFS-2003 patch.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-20 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052166#comment-13052166
 ] 

Konstantin Shvachko commented on HDFS-941:
--

Answers to some issues raised here:

Stack> RM says whats in a release and no one else.

We can still talk about the technical merits of the implementation, can't we?

Todd> nrFiles <= nrNodes means full locality, right?

No. In DFSIO there is no locality, since the files that DFSIO reads/writes are 
not the input of the MR job; only their names are. The reason here is to make 
sure the job completes in one wave of mappers, and to minimize contention on 
the drives between tasks.

I was trying to avoid making this issue yet another discussion about DFSIO, 
because the objective here is to verify that the patch does not introduce a 
regression in performance for sequential IO. If the benchmark I proposed 
doesn't work for you guys, you can propose a different one.

Dhruba, Todd, Nicholas> TestDFSIO exhibits very high variance, and its results 
are dependent on mapreduce's scheduling.

DFSIO does not depend on MR scheduling. It depends on the OS memory cache. 
Cluster nodes these days run with 16 or 32 GB of RAM, so a 10GB file can be 
almost entirely cached by the OS. When you repeatedly run DFSIO you are not 
measuring cold IO, but RAM access and communication. The high variation is 
explained by the fact that some data is cached and some is not.
For example, DFSIO -write is usually very stable, with std.dev < 1, because it 
deals with cold writes.
For DFSIO -read you need to choose a file size larger than your RAM. With 
sequential reads the OS cache works as an LRU, so if your file is larger than 
RAM, the cache will "forget" blocks from the head of the file by the time you 
get to reading the tail. And when you start reading the file again, the cache 
will release the oldest pages, which correspond to higher offsets in the file, 
so it is again a cold read.
I had to go to 100GB files, which brought std.dev to < 2, and the variation in 
throughput was around 3%.
Alternatively you can clear the Linux cache on all DataNodes.
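The LRU behavior described above can be illustrated with a toy page-cache 
simulation (illustrative only, not DFSIO code): re-reading a file that fits in 
the cache hits every page on the second pass, while a file larger than the 
cache hits none.

```python
from collections import OrderedDict

def second_pass_hits(file_pages, cache_pages):
    """Simulate an LRU page cache over two sequential reads of a file;
    return the number of cache hits seen on the second pass."""
    cache = OrderedDict()
    hits = 0
    for pass_no in range(2):
        for page in range(file_pages):
            if page in cache:
                cache.move_to_end(page)       # refresh LRU position
                if pass_no == 1:
                    hits += 1
            else:
                cache[page] = True
                if len(cache) > cache_pages:  # evict least recently used
                    cache.popitem(last=False)
    return hits

print(second_pass_hits(file_pages=8, cache_pages=16))   # file fits: 8 hits
print(second_pass_hits(file_pages=32, cache_pages=16))  # file > cache: 0 hits
```

Sequential re-reads of an oversized file always land on pages the LRU has 
already evicted, which is exactly the "cold read" effect above.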
 
Nicholas> it is hard to explain what do the "Throughput" and "Average IO rate" 
really mean.

[This 
post|http://old.nabble.com/Re%3A-TestDFSIO-delivers-bad-values-of-%22throughput%22-and-%22average-IO-rate%22-p21322404.html]
 has the definitions.
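Going by those definitions (throughput is the aggregate bytes over aggregate 
time, while average IO rate is the mean of the per-task rates), a small sketch 
shows why the two numbers can diverge sharply when task speeds vary:

```python
def dfsio_metrics(tasks):
    """tasks: list of (bytes_processed, seconds) per map task.
    Returns (throughput, average_io_rate) in MB/sec, following the usual
    TestDFSIO definitions: aggregate ratio vs. mean of per-task ratios."""
    mb = [b / 2**20 for b, _ in tasks]
    secs = [t for _, t in tasks]
    throughput = sum(mb) / sum(secs)
    rates = [m / s for m, s in zip(mb, secs)]
    average_io_rate = sum(rates) / len(rates)
    return throughput, average_io_rate

# Two 100MB tasks, one fast and one slow:
tp, avg = dfsio_metrics([(100 * 2**20, 1.0), (100 * 2**20, 10.0)])
print(round(tp, 2), round(avg, 2))  # 18.18 55.0
```

A single slow task drags the aggregate throughput far below the per-task 
average, which is one reason the two reported numbers are hard to interpret in 
isolation.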

Nicholas, I agree with you that the results you are posting don't make sense. 
The point, though, is not to throw out the benchmark, but to find the 
conditions under which it reliably measures what you need.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Fix For: 0.22.0
>
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, 
> HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, 
> HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.





[jira] [Commented] (HDFS-941) Datanode xceiver protocol should allow reuse of a connection

2011-06-20 Thread Konstantin Shvachko (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052160#comment-13052160
 ] 

Konstantin Shvachko commented on HDFS-941:
--

I ran some tests myself over the weekend. The results are good: I am getting 
throughput around 75-78 MB/sec on reads, with small (< 2) std. deviation in 
both cases.
So I am now +1 on this patch.

> Datanode xceiver protocol should allow reuse of a connection
> 
>
> Key: HDFS-941
> URL: https://issues.apache.org/jira/browse/HDFS-941
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node, hdfs client
>Affects Versions: 0.22.0
>Reporter: Todd Lipcon
>Assignee: bc Wong
> Fix For: 0.22.0
>
> Attachments: 941.22.txt, 941.22.txt, 941.22.v2.txt, 941.22.v3.txt, 
> HDFS-941-1.patch, HDFS-941-2.patch, HDFS-941-3.patch, HDFS-941-3.patch, 
> HDFS-941-4.patch, HDFS-941-5.patch, HDFS-941-6.22.patch, HDFS-941-6.patch, 
> HDFS-941-6.patch, HDFS-941-6.patch, fix-close-delta.txt, hdfs-941.txt, 
> hdfs-941.txt, hdfs-941.txt, hdfs-941.txt, hdfs941-1.png
>
>
> Right now each connection into the datanode xceiver only processes one 
> operation.
> In the case that an operation leaves the stream in a well-defined state (eg a 
> client reads to the end of a block successfully) the same connection could be 
> reused for a second operation. This should improve random read performance 
> significantly.





[jira] [Updated] (HDFS-2085) 1073: finalize inprogress edit logs at startup

2011-06-20 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2085:
--

Attachment: hdfs-2085.txt

> 1073: finalize inprogress edit logs at startup
> --
>
> Key: HDFS-2085
> URL: https://issues.apache.org/jira/browse/HDFS-2085
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
> Attachments: hdfs-2085.txt
>
>
> With HDFS-2074, the NameNode can read through any "in-progress" logs it finds 
> during startup to determine how many transactions they have. It can then 
> re-name the file from its inprogress name to its finalized name. For example, 
> if it finds a file edits_10_inprogress with 3 transactions, it can rename it 
> to edits_10-12 at startup. This means that other parts of the system like 
> edits-log-transfer don't need to worry about in-progress logs.





[jira] [Updated] (HDFS-2085) 1073: finalize inprogress edit logs at startup

2011-06-20 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2085:
--

Description: With HDFS-2074, the NameNode can read through any 
"in-progress" logs it finds during startup to determine how many transactions 
they have. It can then re-name the file from its inprogress name to its 
finalized name. For example, if it finds a file edits_10_inprogress with 3 
transactions, it can rename it to edits_10-12 at startup. This means that other 
parts of the system like edits-log-transfer don't need to worry about 
in-progress logs.
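The renaming step can be sketched as follows (Python for illustration; the 
real implementation is Java inside the NameNode, and count_transactions stands 
in for the HDFS-2074 transaction-counting logic):

```python
import os
import re

def finalize_inprogress_logs(storage_dir, count_transactions):
    """Rename each edits_<start>_inprogress file to edits_<start>-<end>,
    where end = start + number of transactions - 1."""
    for name in os.listdir(storage_dir):
        m = re.fullmatch(r"edits_(\d+)_inprogress", name)
        if not m:
            continue
        start = int(m.group(1))
        path = os.path.join(storage_dir, name)
        end = start + count_transactions(path) - 1
        os.rename(path, os.path.join(storage_dir, f"edits_{start}-{end}"))
```

A log at edits_10_inprogress holding 3 transactions becomes edits_10-12, 
matching the example above.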

> 1073: finalize inprogress edit logs at startup
> --
>
> Key: HDFS-2085
> URL: https://issues.apache.org/jira/browse/HDFS-2085
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
>
> With HDFS-2074, the NameNode can read through any "in-progress" logs it finds 
> during startup to determine how many transactions they have. It can then 
> re-name the file from its inprogress name to its finalized name. For example, 
> if it finds a file edits_10_inprogress with 3 transactions, it can rename it 
> to edits_10-12 at startup. This means that other parts of the system like 
> edits-log-transfer don't need to worry about in-progress logs.





[jira] [Updated] (HDFS-2085) 1073: finalize inprogress edit logs at startup

2011-06-20 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HDFS-2085:
--

  Component/s: name-node
Affects Version/s: Edit log branch (HDFS-1073)
Fix Version/s: Edit log branch (HDFS-1073)

> 1073: finalize inprogress edit logs at startup
> --
>
> Key: HDFS-2085
> URL: https://issues.apache.org/jira/browse/HDFS-2085
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: name-node
>Affects Versions: Edit log branch (HDFS-1073)
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: Edit log branch (HDFS-1073)
>
>
> With HDFS-2074, the NameNode can read through any "in-progress" logs it finds 
> during startup to determine how many transactions they have. It can then 
> re-name the file from its inprogress name to its finalized name. For example, 
> if it finds a file edits_10_inprogress with 3 transactions, it can rename it 
> to edits_10-12 at startup. This means that other parts of the system like 
> edits-log-transfer don't need to worry about in-progress logs.





[jira] [Created] (HDFS-2085) 1073: finalize inprogress edit logs at startup

2011-06-20 Thread Todd Lipcon (JIRA)
1073: finalize inprogress edit logs at startup
--

 Key: HDFS-2085
 URL: https://issues.apache.org/jira/browse/HDFS-2085
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Todd Lipcon
Assignee: Todd Lipcon








[jira] [Commented] (HDFS-2080) Speed up DFS read path

2011-06-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052119#comment-13052119
 ] 

Todd Lipcon commented on HDFS-2080:
---

bq. Did you compare the performance of "software" version with zlib?

zlib's implementation, iirc, is the straightforward byte-by-byte algorithm, 
whereas the "software" implementation here is the "slicing-by-8" algorithm, 
which generally performs much better. I didn't do a rigorous comparison, though 
I think I did notice a speedup when I switched from zlib to this implementation.
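For reference, the byte-by-byte table-driven CRC32C that a zlib-style 
implementation uses looks like the sketch below; slicing-by-8 instead consumes 
eight bytes per iteration using eight precomputed tables, which is where the 
speedup comes from:

```python
POLY = 0x82F63B78  # reflected CRC-32C (Castagnoli) polynomial

# Precompute the 256-entry lookup table, one step per input byte.
table = []
for i in range(256):
    crc = i
    for _ in range(8):
        crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    table.append(crc)

def crc32c(data, crc=0):
    """Byte-at-a-time CRC32C over `data`, continuing from `crc`."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = table[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

print(hex(crc32c(b"123456789")))  # 0xe3069283, the standard check value
```

The per-byte table lookup and shift form a serial dependency chain, which is 
what the wider-slice and hardware-instruction variants break up.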

bq. Although it's not in the patch, I am sure you have played with it. Is there 
anything you found useful in making this work?

I did some hacking here: 
https://github.com/toddlipcon/cpp-dfsclient/blob/master/test_readblock.cc
See the read_packet() function and the crc32cHardware64_3parallel(...) code. 
This code does run faster than the "naive" non-pipelined implementation, though 
I didn't do a rigorous benchmark here either.

I figure it would be best to post the patch above before going all-out on 
optimization.


A few other notes on the patch:
- a few unit tests are failing because of bugs in the tests (eg not creating a 
socket with an associated Channel, or assuming read() will always return the 
requested size)
- the use of native byte buffers could cause a leak - we need some kind of 
pooling/buffer reuse here to avoid the native memory leak


Sadly this project is "for fun" for me at the moment, so I probably won't be 
able to circle back for a little while. I will try to post a patch tonight 
that addresses some of the above bugs, though.

> Speed up DFS read path
> --
>
> Key: HDFS-2080
> URL: https://issues.apache.org/jira/browse/HDFS-2080
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: hdfs-2080.txt
>
>
> I've developed a series of patches that speeds up the HDFS read path by a 
> factor of about 2.5x (~300M/sec to ~800M/sec for localhost reading from 
> buffer cache) and also will make it easier to allow for advanced users (eg 
> hbase) to skip a buffer copy. 





[jira] [Commented] (HDFS-2080) Speed up DFS read path

2011-06-20 Thread Kihwal Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052090#comment-13052090
 ] 

Kihwal Lee commented on HDFS-2080:
--

This is awesome! I will review the patch carefully, but I have a couple of 
questions for now.

* Did you compare the performance of the "software" version with zlib's? Just 
to make sure we fall back to the better one. If zlib's crc32 doesn't perform 
significantly better, using what we have will be simpler for supporting 
different polynomials.

* I did a bit of experimenting with filling up the pipeline. When there is no 
data dependency, I get 1.17 cycles/Qword. By dividing the buffer into three 
chunks, I get about 1.6 - 1.7 cycles/Qword. This is before combining results 
and processing the remainder. I didn't tweak it too much, so it might be 
possible to make it a bit better. Although it's not in the patch, I am sure 
you have played with it. Is there anything you found useful in making this 
work?


> Speed up DFS read path
> --
>
> Key: HDFS-2080
> URL: https://issues.apache.org/jira/browse/HDFS-2080
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs client
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: hdfs-2080.txt
>
>
> I've developed a series of patches that speeds up the HDFS read path by a 
> factor of about 2.5x (~300M/sec to ~800M/sec for localhost reading from 
> buffer cache) and also will make it easier to allow for advanced users (eg 
> hbase) to skip a buffer copy. 





[jira] [Commented] (HDFS-1765) Block Replication should respect under-replication block priority

2011-06-20 Thread Haryadi Gunawi (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052086#comment-13052086
 ] 

Haryadi Gunawi commented on HDFS-1765:
--

I agree with Hairong. Recently I've been playing around with this and found 
the same problem, as shown in the attachment (underReplicatedQueue.pdf).

At a high level, if the round-robin iterator is in queue-2 (the queue with 
priority=2), then the UR blocks in queue-0 must wait until the iterator wraps 
around to queue-0 again. So, in the worst case, if queue-2 is long (as 
depicted in the graph), the UR blocks in queue-0 will take a very long time to 
be served!

The setup of the figure:
I have 20 nodes.  Each node holds 3000 blocks. I fail 4 nodes.
q-0: UR blocks with 1 replica
q-2: UR blocks with 2 replicas
pq: pending queue
(I stopped the experiment in the middle, because the pattern is obvious)

More details on why the round-robin iterator does not work:

It is true that the round-robin iterator goes through queue-0 first,
but the replication monitor runs this logic:
- choose a block B to be replicated
- pick a source node S that still has B 
- BUT if S has already been chosen to replicate other blocks 
  (i.e. S's replication stream count already exceeds maxrepstream (2)),
  then increment the iterator (and thus this block B in queue-0
  will not be served until the round-robin iterator wraps around).
  And if the other queues (e.g. q1 and q2) are very long, then queue-0
  might be starved for a long time.
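That wait can be put in rough numbers with a toy model (illustrative only; the 
names are invented and the maxrepstream bookkeeping is abstracted away, so 
this is not the actual ReplicationMonitor code):

```python
def steps_until_served(q_lengths, iter_queue, iter_offset):
    """q_lengths[i] = number of blocks queued at priority i (0 = highest).
    A single persistent iterator walks the queues in priority order and
    wraps around. Return how many already-queued blocks get processed
    before a block newly appended to queue 0 (and skipped once because
    its source node was saturated) finally gets its turn."""
    ahead = q_lengths[iter_queue] - iter_offset   # rest of the current queue
    ahead += sum(q_lengths[iter_queue + 1:])      # queues after it
    if iter_queue != 0:
        ahead += q_lengths[0]                     # existing queue-0 blocks
    return ahead

# Iterator sitting 10 blocks into a 3000-block queue-2:
print(steps_until_served([5, 100, 3000], iter_queue=2, iter_offset=10))  # 2995
```

With the iterator deep inside a long queue-2, a freshly queued 1-replica block 
has thousands of lower-priority blocks ahead of it, which is the starvation 
visible in the attached graph.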



> Block Replication should respect under-replication block priority
> -
>
> Key: HDFS-1765
> URL: https://issues.apache.org/jira/browse/HDFS-1765
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.23.0
>
>
> Currently under-replicated blocks are assigned different priorities depending 
> on how many replicas a block has. However the replication monitor works on 
> blocks in a round-robin fashion. So the newly added high priority blocks 
> won't get replicated until all low-priority blocks are done. One example is 
> that on decommissioning datanode WebUI we often observe that "blocks with 
> only decommissioning replicas" do not get scheduled to replicate before other 
> blocks, so risking data availability if the node is shutdown for repair 
> before decommission completes.





[jira] [Updated] (HDFS-1765) Block Replication should respect under-replication block priority

2011-06-20 Thread Haryadi Gunawi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haryadi Gunawi updated HDFS-1765:
-

Attachment: underReplicatedQueue.pdf

> Block Replication should respect under-replication block priority
> -
>
> Key: HDFS-1765
> URL: https://issues.apache.org/jira/browse/HDFS-1765
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: name-node
>Affects Versions: 0.23.0
>Reporter: Hairong Kuang
>Assignee: Hairong Kuang
> Fix For: 0.23.0
>
> Attachments: underReplicatedQueue.pdf
>
>
> Currently under-replicated blocks are assigned different priorities depending 
> on how many replicas a block has. However the replication monitor works on 
> blocks in a round-robin fashion. So the newly added high priority blocks 
> won't get replicated until all low-priority blocks are done. One example is 
> that on decommissioning datanode WebUI we often observe that "blocks with 
> only decommissioning replicas" do not get scheduled to replicate before other 
> blocks, so risking data availability if the node is shutdown for repair 
> before decommission completes.





[jira] [Commented] (HDFS-1568) Improve DataXceiver error logging

2011-06-20 Thread Joey Echeverria (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052079#comment-13052079
 ] 

Joey Echeverria commented on HDFS-1568:
---

Thanks.

> Improve DataXceiver error logging
> -
>
> Key: HDFS-1568
> URL: https://issues.apache.org/jira/browse/HDFS-1568
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Joey Echeverria
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-1568-1.patch, HDFS-1568-3.patch, HDFS-1568-4.patch, 
> HDFS-1568-5.patch, HDFS-1568-6.patch, HDFS-1568-output-changes.patch
>
>
> In supporting customers we often see things like SocketTimeoutExceptions or 
> EOFExceptions coming from DataXceiver, but the logging isn't very good. For 
> example, if we get an IOE while setting up a connection to the downstream 
> mirror in writeBlock, the IP of the downstream mirror isn't logged on the DN 
> side.





[jira] [Commented] (HDFS-2084) Sometimes backup node/secondary name node stops with exception

2011-06-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052048#comment-13052048
 ] 

Hadoop QA commented on HDFS-2084:
-

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12483166/patch.diff
  against trunk revision 1137675.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.  Please justify why no new tests are needed for this patch.  Also 
please list what manual steps were performed to verify this patch.

-1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HDFS-Build/799//console

This message is automatically generated.

> Sometimes backup node/secondary name node stops with exception
> --
>
> Key: HDFS-2084
> URL: https://issues.apache.org/jira/browse/HDFS-2084
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0
> Environment: FreeBSD
>Reporter: Vitalii Tymchyshyn
> Attachments: patch.diff
>
>
> 2011-06-17 11:43:23,096 ERROR 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer: Throwable Exception in 
> doCheckpoint: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1765)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1753)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:708)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:411)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:378)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1209)
> at 
> org.apache.hadoop.hdfs.server.namenode.BackupStorage.loadCheckpoint(BackupStorage.java:158)
> at 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer.doCheckpoint(Checkpointer.java:243)
> at 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer.run(Checkpointer.java:141)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-1568) Improve DataXceiver error logging

2011-06-20 Thread Tsz Wo (Nicholas), SZE (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052043#comment-13052043
 ] 

Tsz Wo (Nicholas), SZE commented on HDFS-1568:
--

The previous build failed unexpectedly.  I have started a new build.

> Improve DataXceiver error logging
> -
>
> Key: HDFS-1568
> URL: https://issues.apache.org/jira/browse/HDFS-1568
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: data-node
>Affects Versions: 0.23.0
>Reporter: Todd Lipcon
>Assignee: Joey Echeverria
>Priority: Minor
>  Labels: newbie
> Attachments: HDFS-1568-1.patch, HDFS-1568-3.patch, HDFS-1568-4.patch, 
> HDFS-1568-5.patch, HDFS-1568-6.patch, HDFS-1568-output-changes.patch
>
>
> In supporting customers we often see things like SocketTimeoutExceptions or 
> EOFExceptions coming from DataXceiver, but the logging isn't very good. For 
> example, if we get an IOE while setting up a connection to the downstream 
> mirror in writeBlock, the IP of the downstream mirror isn't logged on the DN 
> side.
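The kind of improvement described above can be illustrated by wrapping the connect-phase exception so the downstream mirror's address survives into the log. This is a hedged sketch only, not the committed HDFS-1568 patch; `annotate` and the surrounding names are hypothetical stand-ins for the real DataNode code:

```java
import java.io.IOException;
import java.net.InetSocketAddress;

// Illustrative sketch: wrap a connect-phase failure so the downstream
// mirror's address appears in the exception message. annotate() is a
// hypothetical helper, not actual DataXceiver code.
public class MirrorErrorLogging {

    static IOException annotate(IOException cause, InetSocketAddress mirror) {
        // Keep the original exception as the cause so the stack trace survives.
        return new IOException("Failed to connect to downstream mirror "
                + mirror + ": " + cause.getMessage(), cause);
    }

    public static void main(String[] args) {
        InetSocketAddress mirror = new InetSocketAddress("10.0.0.2", 50010);
        IOException wrapped =
                annotate(new IOException("connect timed out"), mirror);
        // The message now identifies which mirror failed.
        System.out.println(wrapped.getMessage());
    }
}
```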

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HDFS-420) Fuse-dfs should cache fs handles

2011-06-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052042#comment-13052042
 ] 

Hudson commented on HDFS-420:
-

Integrated in Hadoop-Hdfs-trunk-Commit #751 (See 
[https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/751/])
HDFS-420. Fuse-dfs should cache fs handles. Contributed by Brian Bockelman 
and Eli Collins

eli : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1137675
Files : 
* /hadoop/common/trunk/hdfs/src/contrib/build-contrib.xml
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_impls_unlink.c
* /hadoop/common/trunk/hdfs/CHANGES.txt
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_impls_getattr.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_impls_release.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_impls_utimens.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_options.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_stat_struct.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_dfs_wrapper.sh
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_impls_rename.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_impls_mkdir.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_impls_statfs.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_impls_rmdir.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/build.xml
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_users.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_init.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_impls_access.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/configure.ac
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_impls_truncate.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_connect.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_impls_readdir.c
* /hadoop/common/trunk/hdfs/src/contrib/build.xml
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_impls_open.c
* /hadoop/common/trunk/hdfs/src/c++/libhdfs/hdfs.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_connect.h
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_dfs.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_impls_chmod.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_impls_chown.c
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_context_handle.h
* /hadoop/common/trunk/hdfs/src/contrib/fuse-dfs/src/fuse_dfs.h


> Fuse-dfs should cache fs handles
> 
>
> Key: HDFS-420
> URL: https://issues.apache.org/jira/browse/HDFS-420
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: contrib/fuse-dfs
>Affects Versions: 0.20.2
> Environment: Fedora core 10, x86_64, 2.6.27.7-134.fc10.x86_64 #1 SMP 
> (AMD 64), gcc 4.3.2, java 1.6.0 (IcedTea6 1.4 (fedora-7.b12.fc10-x86_64) 
> Runtime Environment (build 1.6.0_0-b12) OpenJDK 64-Bit Server VM (build 
> 10.0-b19, mixed mode)
>Reporter: Dima Brodsky
>Assignee: Brian Bockelman
> Fix For: 0.23.0
>
> Attachments: fuse_dfs_020_memleaks.patch, 
> fuse_dfs_020_memleaks_v3.patch, fuse_dfs_020_memleaks_v8.patch, 
> hdfs-420-1.patch, hdfs-420-2.patch, hdfs-420-3.patch
>
>
> Fuse-dfs should cache fs handles on a per-user basis. This significantly 
> increases performance, and has the side effect of fixing the current code 
> which leaks fs handles.
> The original bug description follows:
> I run the following test:
> 1.  Run hadoop DFS in single node mode
> 2.  start up fuse_dfs
> 3.  copy my source tree, about 250 megs, into the DFS
>  cp -av * /mnt/hdfs/
> in /var/log/messages I keep seeing:
> Dec 22 09:02:08 bodum fuse_dfs: ERROR: hdfs trying to utime 
> /bar/backend-trunk2/src/machinery/hadoop/output/2008/11/19 to 
> 1229385138/1229963739
> and then eventually
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: cou

[jira] [Updated] (HDFS-2084) Sometimes backup node/secondary name node stops with exception

2011-06-20 Thread Vitalii Tymchyshyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Tymchyshyn updated HDFS-2084:
-

Attachment: patch.diff

This is a patch to skip such entries. Note that it is against my "own copy" of 
the 0.21 release tag, so the revisions are from my SVN repository.
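The idea described here is to tolerate setTimes records whose target no longer resolves during edit-log replay, rather than dereferencing a null inode and crashing the checkpointer. A simplified Java sketch of such a guard follows; the class, the in-memory map, and the method names are illustrative only, not the actual 0.21 `FSDirectory` code:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for edit-log replay: skip setTimes records whose
// target path no longer exists instead of hitting a NullPointerException.
public class SkipMissingSetTimes {

    // Toy namespace: path -> {mtime, atime}. Stands in for the inode tree.
    private final Map<String, long[]> namespace = new HashMap<>();

    void create(String path) {
        namespace.put(path, new long[] {0L, 0L});
    }

    /** Returns true if the record was applied, false if it was skipped. */
    boolean replaySetTimes(String path, long mtime, long atime) {
        long[] inode = namespace.get(path);   // illustrative inode lookup
        if (inode == null) {
            // Previously this dereference crashed doCheckpoint();
            // the workaround is to log and skip the stale record.
            System.err.println("Skipping setTimes for missing path " + path);
            return false;
        }
        inode[0] = mtime;
        inode[1] = atime;
        return true;
    }

    public static void main(String[] args) {
        SkipMissingSetTimes replay = new SkipMissingSetTimes();
        replay.create("/a/file1");
        System.out.println(replay.replaySetTimes("/a/file1", 100L, 200L));
        System.out.println(replay.replaySetTimes("/a/deleted", 100L, 200L));
    }
}
```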

> Sometimes backup node/secondary name node stops with exception
> --
>
> Key: HDFS-2084
> URL: https://issues.apache.org/jira/browse/HDFS-2084
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0
> Environment: FreeBSD
>Reporter: Vitalii Tymchyshyn
> Attachments: patch.diff
>
>
> 2011-06-17 11:43:23,096 ERROR 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer: Throwable Exception in 
> doCheckpoint: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1765)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1753)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:708)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:411)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:378)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1209)
> at 
> org.apache.hadoop.hdfs.server.namenode.BackupStorage.loadCheckpoint(BackupStorage.java:158)
> at 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer.doCheckpoint(Checkpointer.java:243)
> at 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer.run(Checkpointer.java:141)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2084) Sometimes backup node/secondary name node stops with exception

2011-06-20 Thread Vitalii Tymchyshyn (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitalii Tymchyshyn updated HDFS-2084:
-

  Tags: workaround
Status: Patch Available  (was: Open)

This is my patch to skip such problematic entries.

> Sometimes backup node/secondary name node stops with exception
> --
>
> Key: HDFS-2084
> URL: https://issues.apache.org/jira/browse/HDFS-2084
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: name-node
>Affects Versions: 0.21.0
> Environment: FreeBSD
>Reporter: Vitalii Tymchyshyn
>
> 2011-06-17 11:43:23,096 ERROR 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer: Throwable Exception in 
> doCheckpoint: 
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1765)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1753)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:708)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:411)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:378)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1209)
> at 
> org.apache.hadoop.hdfs.server.namenode.BackupStorage.loadCheckpoint(BackupStorage.java:158)
> at 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer.doCheckpoint(Checkpointer.java:243)
> at 
> org.apache.hadoop.hdfs.server.namenode.Checkpointer.run(Checkpointer.java:141)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2084) Sometimes backup node/secondary name node stops with exception

2011-06-20 Thread Vitalii Tymchyshyn (JIRA)
Sometimes backup node/secondary name node stops with exception
--

 Key: HDFS-2084
 URL: https://issues.apache.org/jira/browse/HDFS-2084
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: 0.21.0
 Environment: FreeBSD
Reporter: Vitalii Tymchyshyn


2011-06-17 11:43:23,096 ERROR 
org.apache.hadoop.hdfs.server.namenode.Checkpointer: Throwable Exception in 
doCheckpoint: 
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1765)
at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedSetTimes(FSDirectory.java:1753)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:708)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:411)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:378)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1209)
at 
org.apache.hadoop.hdfs.server.namenode.BackupStorage.loadCheckpoint(BackupStorage.java:158)
at 
org.apache.hadoop.hdfs.server.namenode.Checkpointer.doCheckpoint(Checkpointer.java:243)
at 
org.apache.hadoop.hdfs.server.namenode.Checkpointer.run(Checkpointer.java:141)


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-420) Fuse-dfs should cache fs handles

2011-06-20 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins updated HDFS-420:
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

I've committed this. Thanks Brian and Todd.

> Fuse-dfs should cache fs handles
> 
>
> Key: HDFS-420
> URL: https://issues.apache.org/jira/browse/HDFS-420
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: contrib/fuse-dfs
>Affects Versions: 0.20.2
> Environment: Fedora core 10, x86_64, 2.6.27.7-134.fc10.x86_64 #1 SMP 
> (AMD 64), gcc 4.3.2, java 1.6.0 (IcedTea6 1.4 (fedora-7.b12.fc10-x86_64) 
> Runtime Environment (build 1.6.0_0-b12) OpenJDK 64-Bit Server VM (build 
> 10.0-b19, mixed mode)
>Reporter: Dima Brodsky
>Assignee: Brian Bockelman
> Fix For: 0.23.0
>
> Attachments: fuse_dfs_020_memleaks.patch, 
> fuse_dfs_020_memleaks_v3.patch, fuse_dfs_020_memleaks_v8.patch, 
> hdfs-420-1.patch, hdfs-420-2.patch, hdfs-420-3.patch
>
>
> Fuse-dfs should cache fs handles on a per-user basis. This significantly 
> increases performance, and has the side effect of fixing the current code 
> which leaks fs handles.
> The original bug description follows:
> I run the following test:
> 1.  Run hadoop DFS in single node mode
> 2.  start up fuse_dfs
> 3.  copy my source tree, about 250 megs, into the DFS
>  cp -av * /mnt/hdfs/
> in /var/log/messages I keep seeing:
> Dec 22 09:02:08 bodum fuse_dfs: ERROR: hdfs trying to utime 
> /bar/backend-trunk2/src/machinery/hadoop/output/2008/11/19 to 
> 1229385138/1229963739
> and then eventually
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1333
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1209
> Dec 22 09:03:49 bodum fuse_dfs: ERROR: could not connect to dfs 
> fuse_dfs.c:1037
> and the file system hangs.  Hadoop is still running and I don't see any 
> errors in its logs.  I have to unmount the DFS and restart fuse_dfs and then 
> everything is fine again.  At some point I see the following messages in the 
> /var/log/messages:
> ERROR: dfs problem - could not close file_handle(139677114350528) for 
> /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8339-93825052368848-1229278807.log
>  fuse_dfs.c:1464
> Dec 22 09:04:49 bodum fuse_dfs: ERROR: dfs problem - could not close 
> file_handle(139676770220176) for 
> /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8140-93825025883216-1229278759.log
>  fuse_dfs.c:1464
> Dec 22 09:05:13 bodum fuse_dfs: ERROR: dfs problem - could not close 
> file_handle(139677114812832) for 
> /bar/backend-trunk2/src/machinery/hadoop/input/2008/12/14/actionrecordlog-8138-93825070138960-1229251587.log
>  fuse_dfs.c:1464
> Is this a known issue?  Am I just flooding the system too much?  All of this 
> is being performed on a single dual-core machine.
> Thanks!
> ttyl
> Dima
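The per-user caching this issue introduces can be sketched in Java, even though fuse-dfs itself is written in C against libhdfs: each user maps to at most one cached filesystem handle, so repeated operations reuse a connection instead of opening (and leaking) a new one per call. The class and field names below are illustrative only:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Java analogue of fuse-dfs per-user handle caching. Each user gets one
// cached handle; repeated lookups reuse it instead of opening -- and
// leaking -- a new connection per filesystem operation.
public class PerUserHandleCache {

    static final class FsHandle {
        final String user;
        FsHandle(String user) {
            // Stands in for an expensive hdfsConnectAsUser()-style call.
            this.user = user;
        }
    }

    private final ConcurrentMap<String, FsHandle> cache =
            new ConcurrentHashMap<>();

    FsHandle handleFor(String user) {
        // computeIfAbsent creates the handle at most once per user,
        // even under concurrent access.
        return cache.computeIfAbsent(user, FsHandle::new);
    }

    public static void main(String[] args) {
        PerUserHandleCache cache = new PerUserHandleCache();
        FsHandle first = cache.handleFor("alice");
        // Same user, same handle: no second connection is made.
        System.out.println(first == cache.handleFor("alice"));
    }
}
```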

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HDFS-2034) length in getBlockRange becomes -ve when reading only from currently being written blk

2011-06-20 Thread John George (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John George updated HDFS-2034:
--

Status: Patch Available  (was: Open)

> length in getBlockRange becomes -ve when reading only from currently being 
> written blk
> --
>
> Key: HDFS-2034
> URL: https://issues.apache.org/jira/browse/HDFS-2034
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: John George
>Assignee: John George
>Priority: Minor
> Attachments: HDFS-2034-1.patch, HDFS-2034-1.patch, HDFS-2034-2.patch, 
> HDFS-2034-3.patch, HDFS-2034-4.patch, HDFS-2034.patch
>
>
> This came up during HDFS-1907. Posting an example that Todd posted in 
> HDFS-1907 that brought out this issue.
> {quote}
> Here's an example sequence to describe what I mean:
> 1. open file, write one and a half blocks
> 2. call hflush
> 3. another reader asks for the first byte of the second block
> {quote}
> In this case, since the offset is greater than the completed block length, the 
> math in getBlockRange() of DFSInputStream.java will set "length" to a 
> negative value.
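The arithmetic behind the bug can be reproduced in isolation. This is a sketch of the idea, not the actual getBlockRange() code: when the read offset lands past the completed-block length (i.e. entirely within the block under construction), a naive "completed length minus offset" goes negative, and the fix is to clamp it at zero:

```java
// Illustrative arithmetic for the negative-length case described above.
public class BlockRangeMath {

    /** Naive length of the completed-block portion of a read. */
    static long naiveCompletedLength(long completedLen, long offset, long len) {
        // Negative whenever offset > completedLen.
        return Math.min(completedLen - offset, len);
    }

    /** Clamped version: read nothing from completed blocks in that case. */
    static long safeCompletedLength(long completedLen, long offset, long len) {
        return Math.max(0L, Math.min(completedLen - offset, len));
    }

    public static void main(String[] args) {
        long completed = 100L;   // bytes in fully completed blocks
        long offset = 110L;      // read starts inside the in-progress block
        System.out.println(naiveCompletedLength(completed, offset, 5L)); // -10
        System.out.println(safeCompletedLength(completed, offset, 5L));  // 0
    }
}
```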

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira