Re: Pseudo-distributed mode

2014-08-13 Thread Sergey Murylev
as a single Java process and utilizes only one CPU core even if there are many more? On Tue, Aug 12, 2014 at 4:32 PM, Sergey Murylev sergeymury...@gmail.com wrote: Yes :) Pseudo-distributed mode is such a configuration where we have some Hadoop environment

Re: Pseudo-distributed mode

2014-08-12 Thread Sergey Murylev
Yes :) Pseudo-distributed mode is a configuration where we have a Hadoop environment on a single computer. On 12/08/14 18:25, sindhu hosamane wrote: Can setting up 2 datanodes on the same machine be considered pseudo-distributed mode Hadoop? Thanks, Sindhu
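As an illustration (not from the thread), a pseudo-distributed setup typically just points the default filesystem at a NameNode on localhost and drops replication to 1. Property names and the port are assumptions for Hadoop 2.x; older releases use `fs.default.name` instead of `fs.defaultFS`:

```xml
<!-- core-site.xml: use a local NameNode instead of the local filesystem -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>

<!-- hdfs-site.xml: only one datanode, so replicate each block once -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```

All daemons (NameNode, DataNode, ResourceManager, NodeManager) then run as separate JVM processes on the one machine.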

Re: High performance Count Distinct - NO Error

2014-08-06 Thread Sergey Murylev
Why do you think that the default implementation of COUNT DISTINCT is slow? As far as I understand, the most common way to find the number of distinct elements is to sort them and then scan the sorted items consecutively, excluding duplicated elements. The asymptotic complexity of this algorithm is O(n log n); I think that
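The sort-and-scan approach described above can be sketched in a few lines; this is a generic illustration, not Hive/Hadoop internals:

```java
import java.util.Arrays;

public class CountDistinct {
    // Sort-based distinct count: O(n log n) for the sort, then a single
    // linear scan that counts positions where the value changes.
    static int countDistinct(int[] values) {
        if (values.length == 0) return 0;
        int[] sorted = values.clone();
        Arrays.sort(sorted);
        int distinct = 1; // the first element is always "new"
        for (int i = 1; i < sorted.length; i++) {
            if (sorted[i] != sorted[i - 1]) distinct++;
        }
        return distinct;
    }

    public static void main(String[] args) {
        System.out.println(CountDistinct.countDistinct(new int[] {3, 1, 2, 3, 1})); // prints 3
    }
}
```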

Re: Are mapper classes re-instantiated for each record?

2014-05-05 Thread Sergey Murylev
Hi Jeremy, According to the official documentation http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/mapreduce/Mapper.html the setup and cleanup calls are performed once for each InputSplit. In this case your variant 2 is more correct. But actually a single mapper can be used for processing multiple
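The lifecycle described above can be illustrated with a plain-Java sketch (an assumption modelled on Hadoop's `Mapper.run()`, not actual Hadoop code): setup once per task, one map call per record, cleanup once per task.

```java
import java.util.ArrayList;
import java.util.List;

public class MapperLifecycle {
    // Records the order of lifecycle calls for one map task
    // (one task per InputSplit): setup, map per record, cleanup.
    static List<String> runTask(List<String> records) {
        List<String> calls = new ArrayList<>();
        calls.add("setup");           // once, before any records
        for (String r : records) {
            calls.add("map:" + r);    // once per input record
        }
        calls.add("cleanup");         // once, after all records
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(MapperLifecycle.runTask(List.of("a", "b")));
        // → [setup, map:a, map:b, cleanup]
    }
}
```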

Re: CDH4 administration through one account

2014-04-28 Thread Sergey Murylev
Hi Raj, Should 'john1' be included in the 'sudoers' file? Hadoop doesn't use root privileges. But it has some built-in users and groups, like hdfs, mapred, etc. I think you should add your admin user at least to the hdfs and mapred groups. -- Thanks, Sergey On 28/04/14 23:40, Raj Hadoop wrote: Hi,
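A minimal sketch of that group change, assuming a Linux host, the user name 'john1' from the thread, and the CDH default group names (these commands must be run as root and may differ by distribution):

```shell
# Append the admin user to Hadoop's built-in groups.
usermod -a -G hdfs john1
usermod -a -G mapred john1

# Verify the new group memberships.
id john1
```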

Re: Sqoop import/export tool fails with PriviledgedActionException

2014-04-24 Thread Sergey Murylev
Hi Kuchekar, I do have the mentioned jar (avro-mapred-1.5.3.jar) in the mentioned location. Not sure what I am missing. Make sure that you can read this file as the same user you use to run Sqoop. According to the logs, you run Sqoop as root. I'm not sure that root has such privileges. You can try to

Re: UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE)

2014-04-12 Thread Sergey Murylev
Hi, 1 - I don't understand why this is happening. I should have been able to copy data. Why can't I copy data between HDFS instances? I think you should check permissions not only for the root (/); you need to make sure that the whole path is accessible to root. You can do this using the following command:
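The original message is truncated before the command itself. As an assumption, checking each directory along the path with the HDFS shell would look something like this (the paths are illustrative, and a running cluster with the `hdfs` CLI on PATH is required):

```shell
# Check permissions on every directory along the path, not just "/".
# -d lists the directory entry itself rather than its contents.
hdfs dfs -ls -d /
hdfs dfs -ls -d /user
hdfs dfs -ls -d /user/root
```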

Re: HDFS Installation

2014-04-12 Thread Sergey Murylev
Hi Ekta, You can look at the following instructions: single node cluster http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ multi node cluster http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ Actually I recommend using

Re: How to keep data consistency?

2014-02-19 Thread Sergey Murylev
Hi Edward, You can't achieve data consistency with your cluster configuration. To do this you need at least 3 datanodes and replication enabled at level 3 (the dfs.replication property in hdfs-site.xml). On 19/02/14 13:02, EdwardKing wrote: Hadoop 2.2.0, two computers, one is master, another is
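For illustration, the replication setting mentioned above would be configured like this (a sketch, assuming the standard hdfs-site.xml location; the value only takes full effect once at least 3 datanodes are available):

```xml
<!-- hdfs-site.xml: keep 3 copies of each block across the datanodes -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```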