where to find the log info

2011-07-28 Thread Daniel,Wu
Hi everyone,

I am new to Hadoop and want to do some debugging/logging. I'd like to check what the value
is for each mapper execution. If I add the code below (the System.out.println line), where can I
find the log output? If I can't do it this way, how should I do it?

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      System.out.println(value.toString());   // this goes to the task's stdout log
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

Re: where to find the log info

2011-07-28 Thread Harsh J
Task logs are written to userlogs directory on the TT nodes. You can
view task logs on the JobTracker/TaskTracker web UI for each task at:

http://machine:50030/taskdetails.jsp?jobid=JOBID&tipid=TASKID

All of syslogs, stdout and stderr logs are available in the links to
logs off that page.
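
As an aside: anything printed with System.out goes to the task's stdout log, while messages
written through Hadoop's commons-logging Log go to the syslog file in the same per-attempt
directory. A minimal sketch of the word-count mapper above with such a logger added (the class
name and log message are illustrative, not code from this thread):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
  private static final Log LOG = LogFactory.getLog(TokenizerMapper.class);
  private static final IntWritable one = new IntWritable(1);
  private final Text word = new Text();

  @Override
  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // Goes to the task attempt's syslog file (viewable from the web UI links above).
    LOG.info("map input value: " + value.toString());
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one);
    }
  }
}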

2011/7/28 Daniel,Wu hadoop...@163.com:
 Hi everyone,

 I am new to it, and want to do some debug/log. I'd like to check what the 
 value is for each mapper execution. If I add the following code in bold, 
 where can I find the log info? If I can't do it in this way, how should I do?

     public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      System.out.println(value.toString);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }



-- 
Harsh J


Re: File System Counters.

2011-07-28 Thread Harsh J
Raj,

There is no overlap. Data read from HDFS FileSystem instances go to
HDFS_BYTES_READ, and data read from Local FileSystem instances go to
FILE_BYTES_READ. These are two different FileSystems, and have no
overlap at all.

On Thu, Jul 28, 2011 at 5:56 AM, R V cattiv...@yahoo.com wrote:
 Hello

 I don't know if the question has been answered. I  am trying to understand 
 the overlap between FILE_BYTES_READ and HDFS_BYTES_READ. What are the various 
 components that provide value to this counter? For example when I see 
 FILE_BYTES_READ for a specific task ( Map or Reduce ) , is it purely due to 
 the spill during sort phase? If a HDFS read happens on a non local node, does 
 the counter increase on the node where the data block resides? What happens 
 when the data is local? does the counter increase for both HDFS_BYTES_READ 
 and FILE_BYTES_READ? From the values I am seeing, this looks to be the case 
 but I am not sure.

 I am not very fluent in Java , and hence I don't fully understand the source 
 . :-(

 Raj



-- 
Harsh J


RE: where to find the log info

2011-07-28 Thread Devaraj K
Daniel, you can find those stdout statements in the {LOG
Directory}/userlogs/{task attempt id}/stdout file.

In the same way, you can find stderr statements in {LOG Directory}/userlogs/{task
attempt id}/stderr and log statements in {LOG Directory}/userlogs/{task
attempt id}/syslog.

Devaraj K 

-Original Message-
From: Daniel,Wu [mailto:hadoop...@163.com] 
Sent: Thursday, July 28, 2011 11:47 AM
To: common-user@hadoop.apache.org
Subject: where to find the log info

Hi everyone,

I am new to it, and want to do some debug/log. I'd like to check what the
value is for each mapper execution. If I add the following code in bold,
where can I find the log info? If I can't do it in this way, how should I
do?

 public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
  StringTokenizer itr = new StringTokenizer(value.toString());
  System.out.println(value.toString);
  while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
  }
}
  }



Re: Replication and failure

2011-07-28 Thread Harsh J
Mohit,

I believe Tom's book (Hadoop: The Definitive Guide) covers this
precisely well. Perhaps others too.

Replication is a best-effort sort of thing. If 2 nodes are all that is
available, then two replicas are written and one is left to the
replica monitor service to replicate later as possible (leading to an
underreplicated write for the moment). The scenario (with default
configs) would only fail if you have 0 DataNodes 'available' to write
to.

Or are you asking about what happens when a DN fails during a write operation?

On Thu, Jul 28, 2011 at 5:08 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
 Just trying to understand what happens if there are 3 nodes with
 replication set to 3 and one node fails. Does it fail the writes too?

 If there is a link that I can look at will be great. I tried searching
 but didn't see any definitive answer.

 Thanks,
 Mohit




-- 
Harsh J


Reader/Writer problem in HDFS

2011-07-28 Thread Meghana
Hi,

We have a job where the map tasks are given the path to an output folder.
Each map task writes a single file to that folder. There is no reduce phase.
There is another thread, which constantly looks for new files in the output
folder. If found, it persists the contents to index, and deletes the file.

We use this code in the map task:
OutputStream oStream = null;
try {
    oStream = fileSystem.create(path);
    IOUtils.write("xyz", oStream);
} finally {
    IOUtils.closeQuietly(oStream);
}

The problem: Sometimes the reader thread sees and tries to read a file which
is not yet fully written to HDFS (or the checksum is not written yet, etc.),
and throws an error. Is it possible to write an HDFS file in such a way that
it won't be visible until it is fully written?

We use Hadoop 0.20.203.

Thanks,

Meghana


Why hadoop 0.20.203 unit test failed

2011-07-28 Thread Yu Li
Hi all,

I'm trying to compile and unit test hadoop 0.20.203, but met with almost
the same problem as a previous discussion on the mailing list (
http://mail-archives.apache.org/mod_mbox/hadoop-general/201105.mbox/%3CBANLkTim68H=8ngbfzmsvrqob9pmy7fv...@mail.gmail.com%3E).
Even after setting umask to 022, I still have 11 test cases failing, as listed
below.

Test org.apache.hadoop.mapred.TestJobTrackerRestart FAILED
Test org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker FAILED
Test org.apache.hadoop.mapred.TestJobTrackerSafeMode FAILED
Test org.apache.hadoop.filecache.TestMRWithDistributedCache FAILED
Test org.apache.hadoop.filecache.TestTrackerDistributedCacheManager FAILED
Test org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript FAILED
Test org.apache.hadoop.mapred.TestRecoveryManager FAILED
Test org.apache.hadoop.mapred.TestTaskTrackerLocalization FAILED
Test org.apache.hadoop.mapred.lib.TestCombineFileInputFormat FAILED
Test org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl FAILED
Test org.apache.hadoop.tools.rumen.TestRumenJobTraces FAILED
Test org.apache.hadoop.hdfsproxy.TestHdfsProxy FAILED

The jdk version in my testing environment is Sun jdk 1.6u19, and the ant
version is 1.8.2

Anybody knows what causes these testcase failure? Any comments/suggestion
would be highly appreciated.

-- 
Best Regards,
Li Yu


Hadoop output contains __temporary

2011-07-28 Thread 刘鎏
Hi, all

In my recent work with hadoop, I find that the output dir contains
both _SUCCESS and _temporary. The next job then fails because
its input path contains _temporary. How does this happen? And how can I avoid
it?

Thanks for your replies.


liuliu
--


RE: Reader/Writer problem in HDFS

2011-07-28 Thread Laxman
One approach can be to use a .tmp extension while writing. Once the write
is completed, rename back to the original file name. The reader also has to filter
out the .tmp files.

This will ensure the reader will not pick up partial files.

We have a similar scenario where the above-mentioned approach resolved the issue.
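
For what it's worth, here is a minimal sketch of that approach from the writer's side (the
helper name and the payload parameter are placeholders, not code from this thread); the reader
then simply skips anything ending in .tmp:

import java.io.IOException;
import java.io.OutputStream;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TmpThenRename {
  // Write the data under a temporary name, then rename it into place so the
  // reader only ever sees files that are fully written (rename is a metadata
  // operation, done after the data and checksums are on disk).
  public static void writeThenPublish(FileSystem fileSystem, Path path, String payload)
      throws IOException {
    Path tmp = new Path(path.getParent(), path.getName() + ".tmp");
    OutputStream oStream = null;
    try {
      oStream = fileSystem.create(tmp);
      IOUtils.write(payload, oStream);
    } finally {
      IOUtils.closeQuietly(oStream);
    }
    if (!fileSystem.rename(tmp, path)) {
      throw new IOException("Could not rename " + tmp + " to " + path);
    }
  }
}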

-Original Message-
From: Meghana [mailto:meghana.mara...@germinait.com] 
Sent: Thursday, July 28, 2011 1:38 PM
To: common-user; hdfs-u...@hadoop.apache.org
Subject: Reader/Writer problem in HDFS

Hi,

We have a job where the map tasks are given the path to an output folder.
Each map task writes a single file to that folder. There is no reduce phase.
There is another thread, which constantly looks for new files in the output
folder. If found, it persists the contents to index, and deletes the file.

We use this code in the map task:
try {
OutputStream oStream = fileSystem.create(path);
IOUtils.write(xyz, oStream);
} finally {
IOUtils.closeQuietly(oStream);
}

The problem: Some times the reader thread sees  tries to read a file which
is not yet fully written to HDFS (or the checksum is not written yet, etc),
and throws an error. Is it possible to write an HDFS file in such a way that
it won't be visible until it is fully written?

We use Hadoop 0.20.203.

Thanks,

Meghana



Re: Reader/Writer problem in HDFS

2011-07-28 Thread Meghana
Thanks Laxman! That would definitely help things. :)

Is there a better FileSystem (or other) method call to create a file in one go
(i.e. atomically, I guess?), without having to call create() and then write to
the stream?

..meghana


On 28 July 2011 16:12, Laxman lakshman...@huawei.com wrote:

 One approach can be use some .tmp extension while writing. Once the write
 is completed rename back to original file name. Also, reducer has to filter
 out .tmp files.

 This will ensure reducer will not pickup the partial files.

 We do have the similar scenario where the a/m approach resolved the issue.

 -Original Message-
 From: Meghana [mailto:meghana.mara...@germinait.com]
 Sent: Thursday, July 28, 2011 1:38 PM
 To: common-user; hdfs-u...@hadoop.apache.org
 Subject: Reader/Writer problem in HDFS

 Hi,

 We have a job where the map tasks are given the path to an output folder.
 Each map task writes a single file to that folder. There is no reduce
 phase.
 There is another thread, which constantly looks for new files in the output
 folder. If found, it persists the contents to index, and deletes the file.

 We use this code in the map task:
 try {
OutputStream oStream = fileSystem.create(path);
IOUtils.write(xyz, oStream);
 } finally {
IOUtils.closeQuietly(oStream);
 }

 The problem: Some times the reader thread sees  tries to read a file which
 is not yet fully written to HDFS (or the checksum is not written yet, etc),
 and throws an error. Is it possible to write an HDFS file in such a way
 that
 it won't be visible until it is fully written?

 We use Hadoop 0.20.203.

 Thanks,

 Meghana




Fwd: HBase Mapreduce cannot find Map class

2011-07-28 Thread air
-- Forwarded message --
From: air cnwe...@gmail.com
Date: 2011/7/28
Subject: HBase Mapreduce cannot find Map class
To: CDH Users cdh-u...@cloudera.org


import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.lib.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;


public class LoadToHBase extends Configured implements Tool {
    public static class XMap<K, V> extends MapReduceBase implements
            Mapper<LongWritable, Text, K, V> {
        private JobConf conf;

        @Override
        public void configure(JobConf conf) {
            this.conf = conf;
            try {
                this.table = new HTable(new HBaseConfiguration(conf),
                        "observations");
            } catch (IOException e) {
                throw new RuntimeException("Failed HTable construction", e);
            }
        }

        @Override
        public void close() throws IOException {
            super.close();
            table.close();
        }

        private HTable table;

        public void map(LongWritable key, Text value, OutputCollector<K, V>
                output, Reporter reporter) throws IOException {
            String[] valuelist = value.toString().split("\t");
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            Date addtime = null; // user registration time
            Date ds = null;
            Long delta_days = null;
            String uid = valuelist[0];
            try {
                addtime = sdf.parse(valuelist[1]);
            } catch (ParseException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }

            String ds_str = conf.get("load.hbase.ds", null);
            if (ds_str != null) {
                try {
                    ds = sdf.parse(ds_str);
                } catch (ParseException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            } else {
                ds_str = "2011-07-28";
            }

            if (addtime != null && ds != null) {
                delta_days = (ds.getTime() - addtime.getTime()) / (24 * 60 *
                        60 * 1000);
            }

            if (delta_days != null) {
                byte[] rowKey = uid.getBytes();
                Put p = new Put(rowKey);
                p.add("content".getBytes(), "attr1".getBytes(),
                        delta_days.toString().getBytes());
                table.put(p);
            }
        }
    }

    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        // TODO Auto-generated method stub
        int exitCode = ToolRunner.run(new HBaseConfiguration(), new
                LoadToHBase(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        // TODO Auto-generated method stub
        JobConf conf = new JobConf(getClass());
        TableMapReduceUtil.addDependencyJars(conf);
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        conf.setJobName("LoadToHBase");
        conf.setJarByClass(getClass());
        conf.setMapperClass(XMap.class);
        conf.setNumReduceTasks(0);
        conf.setOutputFormat(NullOutputFormat.class);
        JobClient.runJob(conf);
        return 0;
    }

}

execute it using hbase LoadToHBase /user/hive/warehouse/datamining.db/xxx/
and it says:

..
11/07/28 17:20:29 INFO mapred.JobClient: Task Id :
attempt_201107261532_2625_m_04_1, Status : FAILED
java.lang.RuntimeException: Error in configuring object
at
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at 

Error in 9000 and 9001 port in hadoop-0.20.2

2011-07-28 Thread Doan Ninh
Hi,

I run Hadoop on 4 Ubuntu 11.04 machines on VirtualBox.
On the master node (192.168.1.101), I configure fs.default.name = hdfs://
127.0.0.1:9000. Then I configure everything on the 3 other nodes.
When I start the cluster by entering $HADOOP_HOME/bin/start-all.sh on the
master node, everything is OK, but the slaves can't connect to the master on
ports 9000 and 9001.
I manually telnet to 192.168.1.101 on 9000 and 9001, and the result is
connection refused.
Then, on the master node, I telnet to localhost, 127.0.0.1:9000. The
result is connected.
But, on the master node, I telnet to 192.168.1.101:9000 => Connection
Refused.

Can somebody help me?


Re: Error in 9000 and 9001 port in hadoop-0.20.2

2011-07-28 Thread madhu phatak
I had issues using IP addresses in XML files. You can try to use host names in
place of the IP addresses.

On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh uitnetw...@gmail.com wrote:

 Hi,

 I run Hadoop in 4 Ubuntu 11.04 on VirtualBox.
 On the master node (192.168.1.101), I configure fs.default.name = hdfs://
 127.0.0.1:9000. Then i configure everything on 3 other node
 When i start the cluster by entering $HADOOP_HOME/bin/start-all.sh on the
 master node
 Everything is ok, but the slave can't connect to the master on 9000, 9001
 port.
 I manually telnet to 192.168.1.101 in 9000, 9001. And the result is
 connection refused
 Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000. The
 result is connected.
 But, on the master node, i telnet to 192.168.1.101:9000 = Connection
 Refused

 Can somebody help me?



Re: Error in 9000 and 9001 port in hadoop-0.20.2

2011-07-28 Thread Doan Ninh
The first time, I used *hadoop-cluster-1* for 192.168.1.101;
that is the hostname of the master node.
But the same error occurs.
How can I fix it?

On Thu, Jul 28, 2011 at 7:07 PM, madhu phatak phatak@gmail.com wrote:

 I had issue using IP address in XML files . You can try to use host names
 in
 the place of IP address .

 On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh uitnetw...@gmail.com wrote:

  Hi,
 
  I run Hadoop in 4 Ubuntu 11.04 on VirtualBox.
  On the master node (192.168.1.101), I configure fs.default.name =
 hdfs://
  127.0.0.1:9000. Then i configure everything on 3 other node
  When i start the cluster by entering $HADOOP_HOME/bin/start-all.sh on
 the
  master node
  Everything is ok, but the slave can't connect to the master on 9000, 9001
  port.
  I manually telnet to 192.168.1.101 in 9000, 9001. And the result is
  connection refused
  Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000. The
  result is connected.
  But, on the master node, i telnet to 192.168.1.101:9000 = Connection
  Refused
 
  Can somebody help me?
 



Re: Hadoop Question

2011-07-28 Thread Joey Echeverria
How about having the slave write to a temp file first, then move it to the file
the master is monitoring after the slave closes it?

-Joey



On Jul 27, 2011, at 22:51, Nitin Khandelwal nitin.khandel...@germinait.com 
wrote:

 Hi All,
 
 How can I determine if a file is being written to (by any thread) in HDFS. I
 have a continuous process on the master node, which is tracking a particular
 folder in HDFS for files to process. On the slave nodes, I am creating files
 in the same folder using the following code :
 
 At the slave node:
 
 import org.apache.commons.io.IOUtils;
 import org.apache.hadoop.fs.FileSystem;
 import java.io.OutputStream;
 
 OutputStream oStream = fileSystem.create(path);
 IOUtils.write(Some String, oStream);
 IOUtils.closeQuietly(oStream);
 
 
 At the master node,
 I am getting the earliest modified file in the folder. At times when I try
 reading the file, I get nothing in the file, mostly because the slave might
 be still finishing writing to the file. Is there any way, to somehow tell
 the master, that the slave is still writing to the file and to check the
 file sometime later for actual content.
 
 Thanks,
 -- 
 
 
 Nitin Khandelwal


next gen map reduce

2011-07-28 Thread real great..
In which Hadoop version is next gen introduced?

-- 
Regards,
R.V.


Re: Error in 9000 and 9001 port in hadoop-0.20.2

2011-07-28 Thread Nitin Khandelwal
Please change your *fs.default.name* to hdfs://192.168.1.101:9000
Thanks,
Nitin

On 28 July 2011 17:46, Doan Ninh uitnetw...@gmail.com wrote:

 In the first time, i use *hadoop-cluster-1* for 192.168.1.101.
 That is the hostname of the master node.
 But, the same error occurs
 How can i fix it?

 On Thu, Jul 28, 2011 at 7:07 PM, madhu phatak phatak@gmail.com
 wrote:

  I had issue using IP address in XML files . You can try to use host names
  in
  the place of IP address .
 
  On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh uitnetw...@gmail.com wrote:
 
   Hi,
  
   I run Hadoop in 4 Ubuntu 11.04 on VirtualBox.
   On the master node (192.168.1.101), I configure fs.default.name =
  hdfs://
   127.0.0.1:9000. Then i configure everything on 3 other node
   When i start the cluster by entering $HADOOP_HOME/bin/start-all.sh on
  the
   master node
   Everything is ok, but the slave can't connect to the master on 9000,
 9001
   port.
   I manually telnet to 192.168.1.101 in 9000, 9001. And the result is
   connection refused
   Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000. The
   result is connected.
   But, on the master node, i telnet to 192.168.1.101:9000 = Connection
   Refused
  
   Can somebody help me?
  
 




-- 


Nitin Khandelwal


Re: next gen map reduce

2011-07-28 Thread Thomas Graves
Its currently still on the MR279 branch -
http://svn.apache.org/viewvc/hadoop/common/branches/MR-279/.  It is planned
to be merged to trunk soon.

Tom

On 7/28/11 7:31 AM, real great.. greatness.hardn...@gmail.com wrote:

 In which Hadoop version is next gen introduced?



Re: Error in 9000 and 9001 port in hadoop-0.20.2

2011-07-28 Thread Doan Ninh
I changed fs.default.name to hdfs://192.168.1.101:9000, but I get the same error
as before.
I need help.

On Thu, Jul 28, 2011 at 7:45 PM, Nitin Khandelwal 
nitin.khandel...@germinait.com wrote:

 Plz change ur* fs.default.name* to hdfs://192.168.1.101:9000
 Thanks,
 Nitin

 On 28 July 2011 17:46, Doan Ninh uitnetw...@gmail.com wrote:

  In the first time, i use *hadoop-cluster-1* for 192.168.1.101.
  That is the hostname of the master node.
  But, the same error occurs
  How can i fix it?
 
  On Thu, Jul 28, 2011 at 7:07 PM, madhu phatak phatak@gmail.com
  wrote:
 
   I had issue using IP address in XML files . You can try to use host
 names
   in
   the place of IP address .
  
   On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh uitnetw...@gmail.com
 wrote:
  
Hi,
   
I run Hadoop in 4 Ubuntu 11.04 on VirtualBox.
On the master node (192.168.1.101), I configure fs.default.name =
   hdfs://
127.0.0.1:9000. Then i configure everything on 3 other node
When i start the cluster by entering $HADOOP_HOME/bin/start-all.sh
 on
   the
master node
Everything is ok, but the slave can't connect to the master on 9000,
  9001
port.
I manually telnet to 192.168.1.101 in 9000, 9001. And the result is
connection refused
Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000.
 The
result is connected.
But, on the master node, i telnet to 192.168.1.101:9000 =
 Connection
Refused
   
Can somebody help me?
   
  
 



 --


 Nitin Khandelwal



/tmp/hadoop-oracle/dfs/name is in an inconsistent state

2011-07-28 Thread Daniel,Wu
When I started hadoop, the namenode failed to start up because of the following
error. The strange thing is that it says /tmp/hadoop-oracle/dfs/name is
inconsistent, but I don't think I have configured it as
/tmp/hadoop-oracle/dfs/name. Where should I check for the related configuration?
  2011-07-28 21:07:35,383 ERROR 
org.apache.hadoop.hdfs.server.namenode.NameNode: 
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory 
/tmp/hadoop-oracle/dfs/name is in an inconsistent state: storage directory does 
not exist or is not accessible.



Re: /tmp/hadoop-oracle/dfs/name is in an inconsistent state

2011-07-28 Thread Uma Maheswara Rao G 72686
Hi,

Before starting, you need to format the namenode:
./hdfs namenode -format

Then these directories will be created.

The respective configuration is 'dfs.namenode.name.dir'.

The default configuration exists in hdfs-default.xml.
If you want to configure your own directory path, you can add the above
property in the hdfs-site.xml file.

Regards,
Uma Mahesh

- Original Message -
From: Daniel,Wu hadoop...@163.com
Date: Thursday, July 28, 2011 6:51 pm
Subject: /tmp/hadoop-oracle/dfs/name is in an inconsistent state
To: common-user@hadoop.apache.org

 When I started hadoop, the namenode failed to startup because of 
 the following error. The strange thing is that it says/tmp/hadoop-
 oracle/dfs/name isinconsistent, but I don't think I have 
 configured it as /tmp/hadoop-oracle/dfs/name. Where should I check 
 for the related configuration?
  2011-07-28 21:07:35,383 ERROR 
 org.apache.hadoop.hdfs.server.namenode.NameNode: 
 org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory 
 /tmp/hadoop-oracle/dfs/name is in an inconsistent state: storage directory 
 does not exist or is not accessible.
 
 


Re: next gen map reduce

2011-07-28 Thread Robert Evans
It has not been introduced yet, if you are referring to MRv2. It is targeted
to go into the 0.23 release of Hadoop, but is currently on the MR-279 branch,
which should hopefully be merged to trunk in about a week.

--Bobby

On 7/28/11 7:31 AM, real great.. greatness.hardn...@gmail.com wrote:

In which Hadoop version is next gen introduced?

--
Regards,
R.V.



Re: Hadoop-streaming using binary executable c program

2011-07-28 Thread Robert Evans
I am not completely sure what you are getting at. It looks like the output of
your C program is the following (and this is just a guess). NOTE: \t stands for the tab
character, and in streaming it is used to separate the key from the value; \n
stands for the newline character and is used to separate individual records.
RNA-1\tSTRUCTURE-1\n
RNA-2\tSTRUCTURE-2\n
RNA-3\tSTRUCTURE-3\n
...


And you want the output to look like
RNA-1RNA-2RNA-3\tSTRUCTURE-1STRUCTURE-2STRUCTURE-3\n

You could use a reduce to do this, but the issue here is with the shuffle in 
between the maps and the reduces.  The Shuffle will group by the key to send to 
the reducers and then sort by the key.  So in reality your map output looks 
something like

FROM MAP 1:
RNA-1\tSTRUCTURE-1\n
RNA-2\tSTRUCTURE-2\n

FROM MAP 2:
RNA-3\tSTRUCTURE-3\n
RNA-4\tSTRUCTURE-4\n

FROM MAP 3:
RNA-5\tSTRUCTURE-5\n
RNA-6\tSTRUCTURE-6\n

If you send it to a single reducer (the only way to get a single file), then the
input to the reducer will be sorted alphabetically by the RNA, and the order of
the input will be lost. You can work around this by giving each line a unique
number that is in the order you want it to be output, but doing this would
require you to write some code. I would suggest that you instead splice the map
outputs together with a small script after all the maps have completed; see the sketch below.
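
If the splicing has to happen from Java rather than a shell script, one option (a sketch,
assuming the part-file naming order part-00000, part-00001, ... is the order you want; on HDFS
the parts are typically listed in name order) is FileUtil.copyMerge, which concatenates every
file in a directory into one output file:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeMapOutputs {
  public static void main(String[] args) throws IOException {
    // args[0]: job output directory holding the part-* files
    // args[1]: path of the single merged file to produce
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Last argument is an optional string appended after each file (none here).
    FileUtil.copyMerge(fs, new Path(args[0]), fs, new Path(args[1]), false, conf, null);
  }
}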

--
Bobby

On 7/27/11 2:55 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote:



Hi Bobby,

I just want to ask you if there is a way of using a reducer or something like
concatenation to glue together my outputs from the mapper and output
them as a single file and segment of the predicted RNA 2D structure?

FYI: I have used a reducer NONE before:

HADOOP_HOME$ bin/hadoop jar
/data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper
./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file
/data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input
/user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output
/user/yehdego/RF-out -reducer NONE -verbose

and a sample of my output using the mapper of two different slave nodes looks 
like this :

AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGC
and
[...(((...))).].
  (-13.46)

GGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUCU
.(((.((......)..  (-11.00)

and I want to concatenate and output them as a single predicted RNA sequence
structure:

AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGCGGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUCU

[...(((...))).]..(((.((......)..


Regards,

Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehd...@miners.utep.edu

 From: dtyehd...@miners.utep.edu
 To: common-user@hadoop.apache.org
 Subject: RE: Hadoop-streaming using binary executable c program
 Date: Tue, 26 Jul 2011 16:23:10 +


 Good afternoon Bobby,

 Thanks so much, now its working excellent. And the speed is also reasonable. 
 Once again thanks u.

 Regards,

 Daniel T. Yehdego
 Computational Science Program
 University of Texas at El Paso, UTEP
 dtyehd...@miners.utep.edu

  From: ev...@yahoo-inc.com
  To: common-user@hadoop.apache.org
  Date: Mon, 25 Jul 2011 14:47:34 -0700
  Subject: Re: Hadoop-streaming using binary executable c program
 
  This is likely to be slow and it is not ideal.  The ideal would be to 
  modify pknotsRG to be able to read from stdin, but that may not be possible.
 
  The shell script would probably look something like the following
 
  #!/bin/sh
  rm -f temp.txt;
  while read line
  do
    echo $line >> temp.txt;
  done
  exec pknotsRG temp.txt;
 
  Place it in a file say hadoopPknotsRG  Then you probably want to run
 
  chmod +x hadoopPknotsRG
 
  After that you want to test it with
 
  hadoop fs -cat 
  /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 
  | ./hadoopPknotsRG
 
  If that works then you can try it with Hadoop streaming
 
  HADOOP_HOME$ bin/hadoop jar 
  /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper 
  ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file 
  /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input 
  /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output 
  /user/yehdego/RF-out -reducer NONE -verbose
 
  --Bobby
 
  On 7/25/11 3:37 PM, Daniel Yehdego dtyehd...@miners.utep.edu wrote:
 
 
 
  Good afternoon Bobby,
 
  Thanks, you gave me a great help in finding out what the problem was. After 
  I put the command line you suggested me, I found out that there was a 
  segmentation error.
  The binary executable program pknotsRG only reads a file with a sequence in 
  it. This means, there should be a shell script, as you have said, 

RE: Error in 9000 and 9001 port in hadoop-0.20.2

2011-07-28 Thread Laxman
Start the namenode [with fs.default.name set to hdfs://192.168.1.101:9000] and
check your netstat report [netstat -nlp] to see which port and IP it is
binding to. Ideally, 9000 should be bound to 192.168.1.101. If yes, configure
the same IP on the slaves as well. Otherwise, we may need to revisit your
configs.

To use the hostname, you should have the hostname-to-IP mapping in the /etc/hosts file
on the master as well as the slaves.

-Original Message-
From: Doan Ninh [mailto:uitnetw...@gmail.com] 
Sent: Thursday, July 28, 2011 6:45 PM
To: common-user@hadoop.apache.org
Subject: Re: Error in 9000 and 9001 port in hadoop-0.20.2

I changed fs.default.name to hdfs://192.168.1.101:9000. But, the same error
as before.
I need a help

On Thu, Jul 28, 2011 at 7:45 PM, Nitin Khandelwal 
nitin.khandel...@germinait.com wrote:

 Plz change ur* fs.default.name* to hdfs://192.168.1.101:9000
 Thanks,
 Nitin

 On 28 July 2011 17:46, Doan Ninh uitnetw...@gmail.com wrote:

  In the first time, i use *hadoop-cluster-1* for 192.168.1.101.
  That is the hostname of the master node.
  But, the same error occurs
  How can i fix it?
 
  On Thu, Jul 28, 2011 at 7:07 PM, madhu phatak phatak@gmail.com
  wrote:
 
   I had issue using IP address in XML files . You can try to use host
 names
   in
   the place of IP address .
  
   On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh uitnetw...@gmail.com
 wrote:
  
Hi,
   
I run Hadoop in 4 Ubuntu 11.04 on VirtualBox.
On the master node (192.168.1.101), I configure fs.default.name =
   hdfs://
127.0.0.1:9000. Then i configure everything on 3 other node
When i start the cluster by entering $HADOOP_HOME/bin/start-all.sh
 on
   the
master node
Everything is ok, but the slave can't connect to the master on 9000,
  9001
port.
I manually telnet to 192.168.1.101 in 9000, 9001. And the result is
connection refused
Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000.
 The
result is connected.
But, on the master node, i telnet to 192.168.1.101:9000 =
 Connection
Refused
   
Can somebody help me?
   
  
 



 --


 Nitin Khandelwal




Class loading problem

2011-07-28 Thread Kumar, Ranjan
I have a class to define the data I am reading from a MySQL database. According to
online tutorials I created a class called MyRecord that implements
Writable and DBWritable. While running it with Hadoop I get a
NoSuchMethodException: dataTest$MyRecord.<init>()

I am using 0.21.0

Thanks for your help
Ranjan




Re: Class loading problem

2011-07-28 Thread John Armstrong
On Thu, 28 Jul 2011 10:05:57 -0400, Kumar, Ranjan
ranjan.kum...@morganstanleysmithbarney.com wrote:
 I have a class to define data I am reading from a MySQL database.
 According to online tutorials I created a class called MyRecord and
 extended it from Writable, DBWritable. While running it with hadoop I
get a
 NoSuchMethodException: dataTest$MyRecord.init()

Hadoop needs a noargs constructor to build the object, which it then fills
in by using readFields().  Many classes come with a default noargs
constructor, which basically defers to the noargs constructor from Object,
or another ancestor class.

HOWEVER

If you defined another constructor that takes arguments, you've implicitly
removed the default noargs constructor on your class.  You need to define
one explicitly, which Hadoop can use to build your objects.

hth
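
A minimal sketch of such a record class with an explicit no-args constructor (the field names,
column order, and DBWritable package are assumptions, not the poster's actual code); note that
if MyRecord is a nested class it must also be declared static, or reflection cannot
instantiate it either:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class MyRecord implements Writable, DBWritable {
  private long id;
  private String name;

  // The explicit no-args constructor Hadoop needs; required as soon as any
  // other constructor is declared.
  public MyRecord() {
  }

  public MyRecord(long id, String name) {
    this.id = id;
    this.name = name;
  }

  public void write(DataOutput out) throws IOException {
    out.writeLong(id);
    out.writeUTF(name);
  }

  public void readFields(DataInput in) throws IOException {
    id = in.readLong();
    name = in.readUTF();
  }

  public void write(PreparedStatement stmt) throws SQLException {
    stmt.setLong(1, id);
    stmt.setString(2, name);
  }

  public void readFields(ResultSet rs) throws SQLException {
    id = rs.getLong(1);
    name = rs.getString(2);
  }
}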


Re: HBase Mapreduce cannot find Map class

2011-07-28 Thread Stack
See 
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description
for some help.
St.Ack

On Thu, Jul 28, 2011 at 4:04 AM, air cnwe...@gmail.com wrote:
 -- Forwarded message --
 From: air cnwe...@gmail.com
 Date: 2011/7/28
 Subject: HBase Mapreduce cannot find Map class
 To: CDH Users cdh-u...@cloudera.org


 import java.io.IOException;
 import java.text.ParseException;
 import java.text.SimpleDateFormat;
 import java.util.Date;

 import org.apache.hadoop.conf.Configured;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.hbase.HBaseConfiguration;
 import org.apache.hadoop.hbase.client.HTable;
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapred.JobClient;
 import org.apache.hadoop.mapred.JobConf;
 import org.apache.hadoop.mapred.MapReduceBase;
 import org.apache.hadoop.mapred.Mapper;
 import org.apache.hadoop.mapred.OutputCollector;
 import org.apache.hadoop.mapred.Reporter;
 import org.apache.hadoop.mapred.FileInputFormat;
 import org.apache.hadoop.mapred.lib.NullOutputFormat;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;


 public class LoadToHBase extends Configured implements Tool{
    public static class XMapK, V extends MapReduceBase implements
 MapperLongWritable, Text, K, V{
        private JobConf conf;

        @Override
        public void configure(JobConf conf){
            this.conf = conf;
            try{
                this.table = new HTable(new HBaseConfiguration(conf),
 observations);
            }catch(IOException e){
                throw new RuntimeException(Failed HTable construction, e);
            }
        }

        @Override
        public void close() throws IOException{
            super.close();
            table.close();
        }

        private HTable table;
        public void map(LongWritable key, Text value, OutputCollector
 output, Reporter reporter) throws IOException{
            String[] valuelist = value.toString().split(\t);
            SimpleDateFormat sdf = new  SimpleDateFormat(-MM-dd
 HH:mm:ss);
            Date addtime = null; // 用户注册时间
            Date ds = null;
            Long delta_days = null;
            String uid = valuelist[0];
            try {
                addtime = sdf.parse(valuelist[1]);
            } catch (ParseException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }

            String ds_str = conf.get(load.hbase.ds, null);
            if (ds_str != null){
                try {
                    ds = sdf.parse(ds_str);
                } catch (ParseException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }else{
                ds_str = 2011-07-28;
            }

            if (addtime != null  ds != null){
                delta_days = (ds.getTime() - addtime.getTime()) / (24 * 60 *
 60 * 1000);
            }

            if (delta_days != null){
                byte[] rowKey = uid.getBytes();
                Put p = new Put(rowKey);
                p.add(content.getBytes(), attr1.getBytes(),
 delta_days.toString().getBytes());
                table.put(p);
            }
        }
    }
    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        // TODO Auto-generated method stub
        int exitCode = ToolRunner.run(new HBaseConfiguration(), new
 LoadToHBase(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        // TODO Auto-generated method stub
        JobConf conf = new JobConf(getClass());
        TableMapReduceUtil.addDependencyJars(conf);
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        conf.setJobName(LoadToHBase);
        conf.setJarByClass(getClass());
        conf.setMapperClass(XMap.class);
        conf.setNumReduceTasks(0);
        conf.setOutputFormat(NullOutputFormat.class);
        JobClient.runJob(conf);
        return 0;
    }

 }

 execute it using hbase LoadToHBase /user/hive/warehouse/datamining.db/xxx/
 and it says:

 ..
 11/07/28 17:20:29 INFO mapred.JobClient: Task Id :
 attempt_201107261532_2625_m_04_1, Status : FAILED
 java.lang.RuntimeException: Error in configuring object
        at
 org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
        at
 org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
        at
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
    

Re: Replication and failure

2011-07-28 Thread Mohit Anchlia
On Thu, Jul 28, 2011 at 12:17 AM, Harsh J ha...@cloudera.com wrote:
 Mohit,

 I believe Tom's book (Hadoop: The Definitive Guide) covers this
 precisely well. Perhaps others too.

 Replication is a best-effort sort of thing. If 2 nodes are all that is
 available, then two replicas are written and one is left to the
 replica monitor service to replicate later as possible (leading to an
 underreplicated write for the moment). The scenario (with default
 configs) would only fail if you have 0 DataNodes 'available' to write
 to.

Thanks Harsh. I think you answered my question. I thought that a
replication factor of 3 is a must, and that for that you really need at least 4
nodes so that if one of the nodes dies it can still write to 3 nodes. I
am assuming writes to replica nodes are always synchronous and not
eventually consistent.

 Or are you asking about what happens when a DN fails during a write operation?

I am assuming there will be some errors in this case.


 On Thu, Jul 28, 2011 at 5:08 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
 Just trying to understand what happens if there are 3 nodes with
 replication set to 3 and one node fails. Does it fail the writes too?

 If there is a link that I can look at will be great. I tried searching
 but didn't see any definitive answer.

 Thanks,
 Mohit




 --
 Harsh J



Unit testing strategy for map/reduce methods

2011-07-28 Thread W.P. McNeill
I've been playing with unit testing strategies for my Hadoop work. A
discussion of techniques and a link to example code here:
http://cornercases.wordpress.com/2011/07/28/unit-testing-mapreduce-with-overridden-write-methods/
.


Exporting From Hive

2011-07-28 Thread Bale, Michael
Hi,

I was wondering if anyone could help me?

Does anyone know if it is possible to include the column headers in an
output from a Hive Query? I've had a look through the internet but can't
seem to find an answer.

If not, is it possible to export the result from a describe table query? If
so I could then run that at the same time and join up at a future date.

Thanks for your help

-- 
*Mike Bale*
Graduate Insight Analyst
*Cable and Wireless Communications*
Tel: +44 (0)20 7315 4437
www.cwc.com



Re: cygwin not connecting to Hadoop server

2011-07-28 Thread Uma Maheswara Rao G 72686
Hi A Df,

see inline at ::

- Original Message -
From: A Df abbey_dragonfor...@yahoo.com
Date: Wednesday, July 27, 2011 10:31 pm
Subject: Re: cygwin not connecting to Hadoop server
To: common-user@hadoop.apache.org common-user@hadoop.apache.org

 See inline at **. More questions and many Thanks :D
 
 
 
 
 
 From: Uma Maheswara Rao G 72686 mahesw...@huawei.com
 To: common-user@hadoop.apache.org; A Df 
 abbey_dragonfor...@yahoo.comCc: common-user@hadoop.apache.org 
 common-user@hadoop.apache.org
 Sent: Wednesday, 27 July 2011, 17:31
 Subject: Re: cygwin not connecting to Hadoop server
 
 
 Hi A Df,
 
 Did you format the NameNode first?
 
 ** I had formatted it already but then I had reinstalled Java and 
 upgraded the plugins in cygwin so I reformatted it again. :D yes it 
 worked!! I am not sure all the steps that got it to finally work 

:: Great :-)

 but I will have to document it to prevent this headache in the 
 future. Although I typed ssh localhost too , so question is, do I 
 need to type ssh localhost each time I need to run hadoop?? Also, 

:: Actually ssh is an authentication service.
To save the authentication keys, you can run the commands below, which will provide
authentication. No need to give the password every time.

ssh-keygen -t rsa -P ""
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys

then execute
/etc/init.d/sshd restart

To connect to remote machines:
cat /root/.ssh/id_rsa.pub | ssh root@remoteIP 'cat >>
/root/.ssh/authorized_keys'

then on the remote machine execute
/etc/init.d/sshd restart

 since I need to work with Eclipse maybe you can have a look at my 
 post about the plugin cause I can get the patch to work. The 
 subject is Re: Cygwin not working with Hadoop and Eclipse Plugin. 
 I plan to read up on how to write programs for Hadoop. I am using 
 the tutorial at Yahoo but if you know of any really good about 
 coding with Hadoop or just about understanding Hadoop then please 
 let me know.
The Hadoop Definitive Guide will be a great book for understanding
Hadoop. Some sample programs are also available in it.
To check the Hadoop internals:
http://www.google.co.in/url?sa=tsource=webcd=8ved=0CEMQFjAHurl=http%3A%2F%2Findia.paxcel.net%3A6060%2FLargeDataMatters%2Fwp-content%2Fuploads%2F2010%2F09%2FHDFS1.pdfrct=jq=hadoop%20internals%20%2B%20part%201ei=CqAxTtD8C4fprQfYq6DMCwusg=AFQjCNGYMQbAeGP0cYGl4OJHseRsfEMGvQcad=rja


 
 Can you check the NN logs whether NN is started or not?
 ** I checked and the previous runs had some logs missing but now 
 the last one have all 5 logs and I got two conf files in xml. I 
 also copied out the other output files which I plan to examine. 
 Where do I specify the output extension format that I want for my 
 output file? I was hoping for an txt file it shows the output in a 
 file with no extension even though I can read it in Notepad++. I 
 also got to view the web interface at:
     NameNode - http://localhost:50070/
     JobTracker - http://localhost:50030/
 
 ** See below for the working version, finally!! Thanks
 CMD
 Williams@TWilliams-LTPC ~/hadoop-0.20.2
 $ bin/hadoop jar hadoop-0.20.2-examples.jar grep input
 11/07/27 17:42:20 INFO mapred.FileInputFormat: Total in
 
 11/07/27 17:42:20 INFO mapred.JobClient: Running job: j
 11/07/27 17:42:21 INFO mapred.JobClient:  map 0% reduce
 11/07/27 17:42:33 INFO mapred.JobClient:  map 15% reduc
 11/07/27 17:42:36 INFO mapred.JobClient:  map 23% reduc
 11/07/27 17:42:39 INFO mapred.JobClient:  map 38% reduc
 11/07/27 17:42:42 INFO mapred.JobClient:  map 38% reduc
 11/07/27 17:42:45 INFO mapred.JobClient:  map 53% reduc
 11/07/27 17:42:48 INFO mapred.JobClient:  map 69% reduc
 11/07/27 17:42:51 INFO mapred.JobClient:  map 76% reduc
 11/07/27 17:42:54 INFO mapred.JobClient:  map 92% reduc
 11/07/27 17:42:57 INFO mapred.JobClient:  map 100% redu
 11/07/27 17:43:06 INFO mapred.JobClient:  map 100% redu
 11/07/27 17:43:09 INFO mapred.JobClient: Job complete:
 11/07/27 17:43:09 INFO mapred.JobClient: Counters: 18
 11/07/27 17:43:09 INFO mapred.JobClient:   Job Counters
 11/07/27 17:43:09 INFO mapred.JobClient: Launched r
 11/07/27 17:43:09 INFO mapred.JobClient: Launched m
 11/07/27 17:43:09 INFO mapred.JobClient: Data-local
 11/07/27 17:43:09 INFO mapred.JobClient:   FileSystemCo
 11/07/27 17:43:09 INFO mapred.JobClient: FILE_BYTES
 11/07/27 17:43:09 INFO mapred.JobClient: HDFS_BYTES
 11/07/27 17:43:09 INFO mapred.JobClient: FILE_BYTES
 11/07/27 17:43:09 INFO mapred.JobClient: HDFS_BYTES
 11/07/27 17:43:09 INFO mapred.JobClient:   Map-Reduce F
 11/07/27 17:43:09 INFO mapred.JobClient: Reduce inp
 11/07/27 17:43:09 INFO mapred.JobClient: Combine ou
 11/07/27 17:43:09 INFO mapred.JobClient: Map input
 11/07/27 17:43:09 INFO mapred.JobClient: Reduce shu
 11/07/27 17:43:09 INFO mapred.JobClient: Reduce out
 11/07/27 17:43:09 INFO mapred.JobClient: Spilled Re
 11/07/27 17:43:09 INFO mapred.JobClient: Map output

Re: OSX starting hadoop error

2011-07-28 Thread Bryan Keller
I am also seeing this error upon startup. I am guessing you are using OS X 
Lion? It started happening to me after I upgraded to 10.7. Hadoop seems to 
function properly despite this error showing up, though it is annoying.


On Jul 27, 2011, at 12:37 PM, Ben Cuthbert wrote:

 All
 When starting hadoop on OSX I am getting this error. is there a fix for it
 
 java[22373:1c03] Unable to load realm info from SCDynamicStore



Re: OSX starting hadoop error

2011-07-28 Thread Bryan Keller
FYI, I logged a bug for this:
https://issues.apache.org/jira/browse/HADOOP-7489

On Jul 28, 2011, at 11:36 AM, Bryan Keller wrote:

 I am also seeing this error upon startup. I am guessing you are using OS X 
 Lion? It started happening to me after I upgraded to 10.7. Hadoop seems to 
 function properly despite this error showing up, though it is annoying.
 
 
 On Jul 27, 2011, at 12:37 PM, Ben Cuthbert wrote:
 
 All
 When starting hadoop on OSX I am getting this error. is there a fix for it
 
 java[22373:1c03] Unable to load realm info from SCDynamicStore
 



Re: Exporting From Hive

2011-07-28 Thread Ayon Sinha
This is for CLI
  
Use this:
set hive.cli.print.header=true;

Instead of doing this at the prompt every time, you can change your hive start
command to:
hive -hiveconf hive.cli.print.header=true

But be careful with this setting, as quite a few commands stop working with an NPE
when it is on. I think describe doesn't work.
 
-Ayon
See My Photos on Flickr
Also check out my Blog for answers to commonly asked questions.




From: Bale, Michael michael.b...@cwc.com
To: common-user@hadoop.apache.org
Sent: Thursday, July 28, 2011 8:54 AM
Subject: Exporting From Hive

Hi,

I was wondering if anyone could help me?

Does anyone know if it is possible to include the column headers in an
output from a Hive Query? I've had a look through the internet but can't
seem to find an answer.

If not, is it possible to export the result from a describe table query? If
so I could then run that at the same tie and join up at a future date.

Thanks for your help

-- 
*Mike Bale*
Graduate Insight Analyst
*Cable and Wireless Communications*
Tel: +44 (0)20 7315 4437
www.cwc.com


Re: File System Counters.

2011-07-28 Thread R V
Harsh
 
If this is the case, I don't understand something. If I see FILE_BYTES_READ to
be non-zero for a map, the only thing I can assume is that it came
from a spill during the sort phase.
 
I have a 10 node cluster, and I ran TeraSort with size 100,000 Bytes ( 1000 
records). 
 
My io.sort.mb is 300 and io.sort.factor is 10. My mapred.child.java.opts is set 
to -Xmx512m.
 
When I run this, given that everything fits into memory, I expected
that there would be no FILE_BYTES_READ on the map side and no FILE_BYTES_WRITTEN
on the reduce side. But I find that my
FILE_BYTES_READ on the map side is 188,604 (HDFS_BYTES_READ is 149,686) and
inexplicably SPILLED_RECORDS is 1000 for both map and reduce.
 
So my questions have become two.
1. Why is my spill count 1000, given that io.sort.factor and io.sort.mb are 10
and 300 MB and I have 512MB for each task?
2.  Where are the numbers for FILE_BYTES_READ/WRITTEN coming from?
 
TIA
 
Raj
From: Harsh J ha...@cloudera.com
To: common-user@hadoop.apache.org; R V cattiv...@yahoo.com
Sent: Thursday, July 28, 2011 12:03 AM
Subject: Re: File System Counters.

Raj,

There is no overlap. Data read from HDFS FileSystem instances go to
HDFS_BYTES_READ, and data read from Local FileSystem instances go to
FILE_BYTES_READ. These are two different FileSystems, and have no
overlap at all.

On Thu, Jul 28, 2011 at 5:56 AM, R V cattiv...@yahoo.com wrote:
 Hello

 I don't know if the question has been answered. I  am trying to understand 
 the overlap between FILE_BYTES_READ and HDFS_BYTES_READ. What are the various 
 components that provide value to this counter? For example when I see 
 FILE_BYTES_READ for a specific task ( Map or Reduce ) , is it purely due to 
 the spill during sort phase? If a HDFS read happens on a non local node, does 
 the counter increase on the node where the data block resides? What happens 
 when the data is local? does the counter increase for both HDFS_BYTES_READ 
 and FILE_BYTES_READ? From the values I am seeing, this looks to be the case 
 but I am not sure.

 I am not very fluent in Java , and hence I don't fully understand the source 
 . :-(

 Raj



-- 
Harsh J


RE: next gen map reduce

2011-07-28 Thread Aaron Baff
Does this mean 0.22.0 has reached stable and will be released as the stable 
version soon?

--Aaron

-Original Message-
From: Robert Evans [mailto:ev...@yahoo-inc.com]
Sent: Thursday, July 28, 2011 6:39 AM
To: common-user@hadoop.apache.org
Subject: Re: next gen map reduce

It has not been introduced yet.  If you are referring to MRV2.  It is targeted 
to go into the 0.23 release of Hadoop, but is currently on the MR-279 branch.  
Which should hopefully be merged to trunk in abut a week.

--Bobby

On 7/28/11 7:31 AM, real great.. greatness.hardn...@gmail.com wrote:

In which Hadoop version is next gen introduced?

--
Regards,
R.V.



Re: Hadoop Question

2011-07-28 Thread George Datskos

Nitin,

On 2011/07/28 14:51, Nitin Khandelwal wrote:

How can I determine if a file is being written to (by any thread) in HDFS.
That information is exposed by the NameNode http servlet.  You can
obtain it with the fsck tool (hadoop fsck /path/to/dir -openforwrite) or you can
do an http get:

http://namenode:port/fsck?path=/your/path&openforwrite=1


George
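
For completeness, a small sketch of the http-get variant (the host, port, and path are
placeholders; the default NameNode http port is 50070):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class OpenForWriteCheck {
  public static void main(String[] args) throws Exception {
    URL fsck = new URL("http://namenode:50070/fsck?path=/your/path&openforwrite=1");
    BufferedReader in = new BufferedReader(new InputStreamReader(fsck.openStream()));
    String line;
    while ((line = in.readLine()) != null) {
      // Files still being written are reported as OPENFORWRITE in the fsck output.
      System.out.println(line);
    }
    in.close();
  }
}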




TestDFSIO error: libhdfs.so.1 does not exist

2011-07-28 Thread Yang Xiaoliang
Hi all,

I am benchmarking a Hadoop cluster with the hadoop-*-test.jar TestDFSIO,

but the following error is returned:
File /usr/hadoop-0.20.2/libhdfs/libhdfs.so.1 does not exist.

How to solve this problem?

Thanks!