Re: Data cleansing in modern data architecture

2014-08-10 Thread Adaryl Bob Wakefield, MBA
It’s a lot of theory right now, so let me give you the full background and see if we can refine the answer further. I’ve had a lot of clients with data warehouses that just weren’t functional for various reasons. I’m researching Hadoop to figure out a way to totally eliminate traditional

How to restrict ephemeral ports used by Yarn App Master

2014-08-10 Thread Susheel Kumar Gadalay
Hi, I have a question. How do I selectively open a port range for the Hadoop YARN App Master on a cluster? I have seen the JIRA issue at http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-issues/201204.mbox/%3c74835698.75.1335357881103.javamail.tom...@hel.zones.apache.org%3E fixed in version
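The JIRA linked above is about letting the App Master bind only within a configured range, so a firewall only has to open that range. In Hadoop 2.x, I believe the MapReduce AM reads this range from yarn.app.mapreduce.am.job.client.port-range (please verify the name in the mapred-default.xml shipped with your version). The Python below is only a sketch of the idea with hypothetical helper names: expand a range spec such as "50000-50050,50100-50200" and bind to the first free port in it.

```python
import socket


def parse_port_range(spec: str) -> list[int]:
    """Expand a spec like "50000-50050,50100-50200" into a sorted list of ports.

    Hypothetical helper; the comma/dash syntax is assumed, not taken from Hadoop code.
    """
    ports: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo_s, hi_s = part.split("-", 1)
            lo, hi = int(lo_s), int(hi_s)
            if lo > hi:
                raise ValueError(f"bad range: {part}")
            ports.update(range(lo, hi + 1))
        else:
            ports.add(int(part))
    return sorted(ports)


def bind_in_range(spec: str, host: str = "0.0.0.0") -> socket.socket:
    """Return a listening socket bound to the first free port in the allowed range."""
    for port in parse_port_range(spec):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()  # port already in use; try the next one
            continue
        sock.listen()
        return sock
    raise OSError(f"no free port in range {spec!r}")
```

For example, bind_in_range("50000-50050") never falls back to an arbitrary ephemeral port, so only 50000-50050 needs to be open in the firewall.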

Re: Data cleansing in modern data architecture

2014-08-10 Thread Sriram Ramachandrasekaran
OK. If you think the noise level in the data is going to be that low, creating the view is probably costly and meaningless. HDFS is append-only, so there's no point writing the transactions as HDFS files and trying to perform analytics on top of them directly. Instead, you could go with

How to run Job tracker/Task tracker in 2.2.0 and later

2014-08-10 Thread Susheel Kumar Gadalay
Hi, I am using Hadoop version 2.2.0. I cannot find start-mapred.sh in the sbin directory. How do I start the JobTracker and TaskTracker? I tried the version 1 way by executing them directly, but I am getting these errors. [hadoop@ip-10-147-128-12 ~]$ sbin/hadoop-daemon.sh start jobtracker starting
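Hadoop 2.x has no JobTracker or TaskTracker daemons: cluster-wide scheduling moved to the YARN ResourceManager, per-node execution to the NodeManager, and each MapReduce job gets its own MRAppMaster. The daemons are started with sbin/yarn-daemon.sh (or sbin/start-yarn.sh for the whole cluster), and the job history server with sbin/mr-jobhistory-daemon.sh start historyserver. The Python below is a hypothetical helper that just encodes this mapping for the command in the question.

```python
# Hypothetical helper: map Hadoop 1.x (MRv1) daemon start/stop commands to their
# Hadoop 2.x (YARN) counterparts. There is no JobTracker or TaskTracker in 2.x.
MRV1_TO_YARN = {
    "jobtracker": ("sbin/yarn-daemon.sh", "resourcemanager"),
    "tasktracker": ("sbin/yarn-daemon.sh", "nodemanager"),
}


def translate(command: str) -> str:
    """Rewrite 'sbin/hadoop-daemon.sh <action> jobtracker|tasktracker' commands."""
    parts = command.split()
    if len(parts) != 3 or not parts[0].endswith("hadoop-daemon.sh"):
        return command  # not an MRv1 daemon command; leave it untouched
    _, action, daemon = parts
    if daemon not in MRV1_TO_YARN:
        return command  # e.g. namenode/datanode are still started this way
    script, yarn_daemon = MRV1_TO_YARN[daemon]
    return f"{script} {action} {yarn_daemon}"
```

So translate("sbin/hadoop-daemon.sh start jobtracker") gives "sbin/yarn-daemon.sh start resourcemanager", while HDFS daemon commands pass through unchanged.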

Re: Data cleansing in modern data architecture

2014-08-10 Thread Bertrand Dechoux
Well, keeping bad data has its uses too. I assume you know about temporal databases. Back to your use case: if you only need to remove a few records from HDFS files, the easiest approach might be to do it during reading. It is a view, of course, but that doesn't mean you need to write it back to HDFS. All your
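Removing a few records during reading, without rewriting anything in HDFS, can be sketched in Python. Assumptions: the records are comma-delimited text, bad records are identified by an ID held in a separate (hypothetical) exclusion set, and a plain iterable of lines stands in for an HDFS input stream. The raw file stays intact; only the reader sees the filtered view.

```python
from typing import Iterable, Iterator


def read_records(lines: Iterable[str], sep: str = ",") -> Iterator[list[str]]:
    """Split delimited text lines into fields (stand-in for reading an HDFS file)."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            yield line.split(sep)


def cleaned_view(records: Iterable[list[str]], bad_ids: set[str],
                 id_col: int = 0) -> Iterator[list[str]]:
    """Skip known-bad records at read time; the stored data is never modified."""
    for rec in records:
        if rec[id_col] not in bad_ids:
            yield rec
```

With raw = ["1,alice,10\n", "2,bob,-999\n", "3,carol,7\n"], list(cleaned_view(read_records(raw), {"2"})) yields only the records for IDs 1 and 3. Because both functions are generators, the filter costs one set lookup per record and needs no extra storage.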

100% CPU consumption by Resource Manager process

2014-08-10 Thread Krishna Kishore Bonagiri
Hi, my YARN ResourceManager is consuming 100% CPU while I am running an application that has been running for about 10 hours and requests as many as 27000 containers. The CPU consumption was very low at the start of my application and gradually rose to over 100%. Is this a known issue or

Yarn, MRv1, MRv2 lots of newbie doubts and questions

2014-08-10 Thread Sebastiano Di Paola
Hi all, I'm a new Hadoop user, and I started with Hadoop 2.4.1 as my first installation. Now I'm struggling with mapred, mapreduce, YARN, MRv1 and MRv2. I tried to read the documentation, but I couldn't find a clear answer; sometimes it seems that the documentation assumes that you know
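Part of the answer: in Hadoop 2.x, "MRv2" is the MapReduce framework running on YARN, and which runtime a job uses is controlled by the mapreduce.framework.name property in mapred-site.xml (typically "local" or "yarn"; the shipped default is "local"). As a minimal sketch, the Python below reads that property from the text of a mapred-site.xml file; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def framework_name(mapred_site_xml: str) -> str:
    """Return mapreduce.framework.name from mapred-site.xml text.

    If the property is absent, fall back to "local" (Hadoop's default), meaning
    jobs run in a single local JVM; "yarn" means jobs go to the ResourceManager.
    """
    root = ET.fromstring(mapred_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() == "mapreduce.framework.name":
            return prop.findtext("value", "").strip()
    return "local"
```

If this returns "local" on a cluster, jobs are not being submitted to YARN at all, which is a common source of the confusion described above.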

Re: Data cleansing in modern data architecture

2014-08-10 Thread Adaryl Bob Wakefield, MBA
I quickly went over the Wikipedia page for temporal databases. It just sounds like a slowly changing dimension. Data being valid at different points in time isn’t weird to me. Most WHERE clauses against a data warehouse are going to use the date dimension. What we’re talking about is data that was
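The slowly-changing-dimension comparison can be made concrete with a type 2 dimension: each version of a member carries a validity interval, and a point-in-time query picks the version whose interval contains the date. A minimal Python sketch, assuming hypothetical field names valid_from/valid_to and half-open intervals:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension member (slowly changing dimension, type 2)."""
    key: str
    attr: str
    valid_from: date
    valid_to: Optional[date]  # None means this is the current version


def as_of(rows: list[DimRow], key: str, when: date) -> Optional[DimRow]:
    """Return the version of `key` valid on `when`, using [valid_from, valid_to)."""
    for row in rows:
        if row.key == key and row.valid_from <= when and (
            row.valid_to is None or when < row.valid_to
        ):
            return row
    return None
```

With versions ("c1", "Kansas", 2012-01-01 to 2014-03-01) and ("c1", "Texas", 2014-03-01 onward), an as-of query for 2013-06-01 returns the Kansas row and one for 2014-08-10 returns the Texas row.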

Multiple datanodes on single machine !!

2014-08-10 Thread Sindhu Hosamane
Hello, I have set up multiple datanodes on a single machine following the instructions in http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201009.mbox/%3ca3ef3f6af24e204b812d1d24ccc8d71a03688...@mse16be2.mse16.exchange.ms%3E So I see 2 datanodes up and running when I run the jps command.
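Running several DataNodes on one host works only if every instance has its own ports and its own storage directory. The Python below is a hypothetical helper that generates those per-instance overrides; it assumes Hadoop 2.x property names and default ports (dfs.datanode.address 50010, dfs.datanode.http.address 50075, dfs.datanode.ipc.address 50020, dfs.datanode.data.dir). Older releases, which the linked 2010 instructions target, use dfs.data.dir instead of dfs.datanode.data.dir.

```python
# Hypothetical helper: per-instance hdfs-site.xml overrides for several DataNodes
# on one host. Assumes Hadoop 2.x property names and default ports.
BASE_PORTS = {
    "dfs.datanode.address": 50010,
    "dfs.datanode.http.address": 50075,
    "dfs.datanode.ipc.address": 50020,
}
PORT_STEP = 100  # a step of 100 keeps the shifted ports of different instances apart


def datanode_overrides(instance: int, base_dir: str = "/tmp/hadoop") -> dict[str, str]:
    """Shift every DataNode port by instance * PORT_STEP; give each its own data dir."""
    conf = {
        name: f"0.0.0.0:{port + instance * PORT_STEP}"
        for name, port in BASE_PORTS.items()
    }
    conf["dfs.datanode.data.dir"] = f"{base_dir}/dn{instance}"
    return conf
```

Instance 0 keeps the defaults; instance 1 gets 50110, 50175 and 50120. Each instance also needs separate log and PID directories (HADOOP_LOG_DIR, HADOOP_PID_DIR) so the daemons do not overwrite each other's files.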

Re: Multiple datanodes on single machine !!

2014-08-10 Thread hadoop hive
How much memory does it have, and how many map and reduce tasks have you configured, with how much heap size? On Aug 11, 2014 11:17 AM, Sindhu Hosamane sindh...@gmail.com wrote: Hello, I have set up multiple datanodes on a single machine following the instructions in