RE: Hadoop 2.6.0: "FileSystem file:/// is not a distributed file system"

2014-12-12 Thread Brahma Reddy Battula
Hi Dong, HADOOP_CONF_DIR might be referring to the default.. you can export HADOOP_CONF_DIR to point at the directory where the following configuration files are present.. Thanks & Regards Brahma Reddy Battula From: Dan Dong [dongda...@gmail.com] Sent: Saturday, December 13, 2014 3:43 AM To: u
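For concreteness, a minimal sketch of that suggestion, assuming the configuration files live under /etc/hadoop/conf (the path is illustrative, not from the thread):

    # Point Hadoop at the directory containing core-site.xml and hdfs-site.xml
    export HADOOP_CONF_DIR=/etc/hadoop/conf
    hadoop dfsadmin -report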

Hadoop 2.6.0: "FileSystem file:/// is not a distributed file system"

2014-12-12 Thread Dan Dong
Hi, I installed Hadoop 2.6.0 on my cluster with 2 nodes, and I got the following error when I run: $hadoop dfsadmin -report FileSystem file:/// is not a distributed file system What does this mean? I have set it in core-site.xml already: fs.defaultFS hdfs://master-node:9000 and in hdfs-site.xml:
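For reference, the fs.defaultFS setting quoted above would normally carry the XML wrapping that the digest stripped; a reconstruction in core-site.xml, using the host and port from the message:

    <!-- core-site.xml -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://master-node:9000</value>
    </property>

If this property is missing or unread (e.g. the wrong HADOOP_CONF_DIR, as suggested in the reply above), the client falls back to the local file:/// filesystem, which produces exactly the reported error.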

Re: Hadoop Installation on Multihomed Networks

2014-12-12 Thread Fei Hu
I solved the problem by changing the hosts file as follows: 10.10.0.10 10.5.0.10 yngcr10nc01 Thanks, Fei > On Nov 11, 2014, at 11:58 AM, daemeon reiydelle wrote: > > You may want to consider configuring host names that embed the subnet in the > host name itself (e.g. foo50, foo40, for foo vi
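A heavily hedged sketch of what such a multihomed hosts file could look like, assuming one line per interface and subnet-tagged aliases as daemeon suggests (addresses and aliases below are illustrative, not Fei's actual file):

    # /etc/hosts -- one entry per interface; aliases embed the subnet
    10.10.0.10  yngcr10nc01-10  yngcr10nc01
    10.5.0.10   yngcr10nc01-5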

Re: Split files into 80% and 20% for building model and prediction

2014-12-12 Thread Wilm Schumacher
Hi, from a machine learning perspective I would recommend this approach, too ... if there is no other information available that splits the data set. It depends on the data you are processing. And I would split the data persistently, e.g. not using the train data directly, but writing it into a fil

Re: What happens to data nodes when name node has failed for long time?

2014-12-12 Thread Rich Haase
The remaining cluster services will continue to run, so that when the namenode (or other failed process) is restored the cluster will resume healthy operation. This is part of Hadoop's ability to handle network partition events. Rich Haase | Sr. Software Engineer | Pandora m 303.887.1146 |

Re: Split files into 80% and 20% for building model and prediction

2014-12-12 Thread Andre Kelpe
Try Cascading multitool: http://docs.cascading.org/multitool/2.6/ - André On Fri, Dec 12, 2014 at 10:30 AM, unmesha sreeveni wrote: > I am trying to divide my HDFS file into 2 parts/files > 80% and 20% for classification algorithm(80% for modelling and 20% for > prediction) > Please provide sug

Re: Split files into 80% and 20% for building model and prediction

2014-12-12 Thread Chris Mawata
How about doing something along the lines of bucketing: pick a field that is unique for each record, and if the hash of the field mod 10 is 7 or less (eight of the ten possible remainders, i.e. 80%) it goes in one bin, otherwise into the other one. Cheers Chris On Dec 12, 2014 1:32 AM, "unmesha sreeveni" wrote: > I am trying to divide my HDFS file into
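A minimal sketch of that bucketing idea as a Hadoop mapper, assuming text input whose first tab-separated column is the unique field; the class name and the "train"/"test" output names are illustrative, not from the thread:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    // Hypothetical sketch of Chris's hash-bucketing suggestion.
    public class HashSplitMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

      private MultipleOutputs<NullWritable, Text> out;

      @Override
      protected void setup(Context context) {
        out = new MultipleOutputs<NullWritable, Text>(context);
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // Assumption: the unique field is the first tab-separated column.
        String uniqueField = value.toString().split("\t", 2)[0];
        int bucket = uniqueField.hashCode() % 10;
        if (bucket < 0) bucket += 10;   // hashCode() may be negative
        // Remainders 0..7 (8 of 10) -> ~80% bin; 8..9 -> ~20% bin.
        out.write(NullWritable.get(), value, bucket < 8 ? "train" : "test");
      }

      @Override
      protected void cleanup(Context context)
          throws IOException, InterruptedException {
        out.close();
      }
    }

In the driver, LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class) avoids creating empty default part files. Because the split is keyed on a stable hash rather than a random draw, re-running the job reproduces the same 80/20 assignment.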

RE: Split files into 80% and 20% for building model and prediction

2014-12-12 Thread Mikael Sitruk
Hi Unmesha, With the random approach you don't need to write an MR job for counting. Mikael.s -Original Message- From: "Hitarth" Sent: 12/12/2014 15:20 To: "user@hadoop.apache.org" Subject: Re: Split files into 80% and 20% for building model and prediction Hi Unmesha, If you us

Re: Split files into 80% and 20% for building model and prediction

2014-12-12 Thread Hitarth
Hi Unmesha, If you use the approach suggested by Mikael of taking a random 80% of the data for training and the rest for testing, then you will have a good distribution to generate your predictive model. Thanks, Hitarth > On Dec 12, 2014, at 6:00 AM, unmesha sreeveni wrote: > > Hi Mikael > So you w

MetaData information from DB2 to HDFS

2014-12-12 Thread Saravanan Nagarajan
Hi Team, In my project we need to get metadata information (i.e. column name, datatype, etc.) from an RDBMS using Sqoop. Based on the link below, I came to know that we can get metadata using the Java API. Is there any way to get metadata information from the command line? http://stackoverflow.com/questio
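One command-line avenue worth trying is sqoop eval against DB2's catalog views; the connection details below are placeholders, and SYSCAT.COLUMNS is DB2's standard column-metadata view:

    # Placeholders: substitute your own host, port, database, user and table.
    sqoop eval \
      --connect jdbc:db2://db2host:50000/MYDB \
      --username dbuser -P \
      --query "SELECT COLNAME, TYPENAME, LENGTH FROM SYSCAT.COLUMNS WHERE TABNAME = 'MYTABLE'"

sqoop eval simply runs the SQL and prints the result set, so it is handy for inspecting metadata without doing a full import.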

Re: Split files into 80% and 20% for building model and prediction

2014-12-12 Thread unmesha sreeveni
Hi Mikael, So you won't write an MR job for counting the number of records in that file to find the 80% and 20%? On Fri, Dec 12, 2014 at 3:54 PM, Mikael Sitruk wrote: > > I would use a different approach. For each row in the mapper I would have > invoked random.Next() then if the number generated by ra

What happens to data nodes when name node has failed for long time?

2014-12-12 Thread Chandrashekhar Kotekar
Hi, What happens if the name node has crashed for more than one hour but the secondary name node, all the data nodes, the job tracker, and the task trackers are running fine? Do those daemon services also automatically shut down after some time? Or do those services keep running, hoping for the namenode to come back? Regards

RE: Split files into 80% and 20% for building model and prediction

2014-12-12 Thread Mikael Sitruk
I would use a different approach. For each row in the mapper I would invoke random.Next(); if the number generated by random is below 0.8, the row would go to the key for training, otherwise to the key for the test set. Mikael.s -Original Message- From: "Susheel Kumar Gadalay" Sent:
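A minimal sketch of that random split as a Hadoop mapper, mirroring the hash-split sketch earlier in the digest; Java's Random.nextDouble() stands in for the pseudo-code random.Next(), and the class and output names are illustrative:

    import java.io.IOException;
    import java.util.Random;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    // Hypothetical sketch of Mikael's random-split suggestion.
    public class RandomSplitMapper
        extends Mapper<LongWritable, Text, NullWritable, Text> {

      private MultipleOutputs<NullWritable, Text> out;
      private final Random random = new Random();

      @Override
      protected void setup(Context context) {
        out = new MultipleOutputs<NullWritable, Text>(context);
      }

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // Below 0.8 -> training set (~80%), otherwise -> test set (~20%).
        out.write(NullWritable.get(), value,
            random.nextDouble() < 0.8 ? "train" : "test");
      }

      @Override
      protected void cleanup(Context context)
          throws IOException, InterruptedException {
        out.close();
      }
    }

Note the split is only 80/20 in expectation; for an exact cut, the line-counting approach in the next message applies.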

Re: Split files into 80% and 20% for building model and prediction

2014-12-12 Thread Susheel Kumar Gadalay
Simple solution.. Copy the HDFS file to local and use OS commands to count the number of lines (cat file1 | wc -l) and cut it based on line number. On 12/12/14, unmesha sreeveni wrote: > I am trying to divide my HDFS file into 2 parts/files > 80% and 20% for classification algorithm(80% for modelling a
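A small sketch of that recipe, assuming a plain-text file (paths are illustrative):

    # Pull the file out of HDFS, count lines, then cut at the 80% mark.
    hdfs dfs -get /user/me/data.txt .
    total=$(wc -l < data.txt)
    train=$(( total * 80 / 100 ))
    head -n "$train" data.txt > train.txt
    tail -n +"$(( train + 1 ))" data.txt > test.txt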

Re: adding node(s) to Hadoop cluster

2014-12-12 Thread Rainer Toebbicke
On 12 Dec 2014, at 03:13, Vinod Kumar Vavilapalli wrote: > Auth to local mappings > - nn/nn-h...@cluster.com -> hdfs > - dn/.*@cluster.com -> hdfs > > The combination of the above lets you block any user other than hdfs > from faking like a datanode. > > Purposes > - _HOST: Let you d
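For reference, mappings like the ones quoted are normally expressed through hadoop.security.auth_to_local in core-site.xml; a sketch assuming the cluster.com realm from the thread (the rules here are illustrative, not Vinod's exact configuration):

    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>
        RULE:[2:$1@$0](nn@CLUSTER.COM)s/.*/hdfs/
        RULE:[2:$1@$0](dn@CLUSTER.COM)s/.*/hdfs/
        DEFAULT
      </value>
    </property>

Each RULE:[2:$1@$0] rewrites a two-component principal such as nn/host@CLUSTER.COM into nn@CLUSTER.COM before matching, so only the named service principals get mapped to the hdfs user.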

Split files into 80% and 20% for building model and prediction

2014-12-12 Thread unmesha sreeveni
I am trying to divide my HDFS file into 2 parts/files, 80% and 20%, for a classification algorithm (80% for modelling and 20% for prediction). Please provide suggestions for the same. To take 80% and 20% into 2 separate files we need to know the exact number of records in the data set. And it is only known if