Hadoop Realtime Queries

2014-07-31 Thread Natarajan, Prabakaran 1. (NSN - IN/Bangalore)
Hi, I want to run real-time queries on HDFS data. I tried Hadoop/YARN/Hive, Shark on Spark, Tez, etc., but I still couldn't get sub-second performance on the large data set I have. I understand Hadoop is not meant for this, but I still want to get as close as possible. 1) How can we

RE: Hadoop Realtime Queries

2014-07-31 Thread Kumar, Deepak8
Hi, As far as I know, real-time queries are only possible using HBase/Cloudera Search. Hive is a batch process; it is not real time. So instead of tuning different parameters, maybe you could look at a different architecture design that lets you use HBase. Regards, Deepak From:

Re: Hadoop Realtime Queries

2014-07-31 Thread Bertrand Dechoux
It all depends on the context and on what is really meant by real time. Impala (and other competing alternatives) are not listed among the tools you have tried. Maybe you should not focus only on batch frameworks for providing real-time access? The results are not surprising. Bertrand Dechoux On

When all datanodes go down and namenode is still up

2014-07-31 Thread Satyam Singh
Hello Users, When I brought all datanodes down while the namenode was still up, write failures occurred, which is fine. But after a few minutes I brought all the datanodes back up one by one, and I still observe write failures; it looks like the datanodes are not available. This state resolves only when I restart

How to check what is the log directory for container logs

2014-07-31 Thread Krishna Kishore Bonagiri
Hi, Is there a way to check the log directory for container logs in my currently running YARN instance from the command line, i.e. using the yarn or hadoop command? Thanks, Kishore

RE: Hadoop Realtime Queries

2014-07-31 Thread Natarajan, Prabakaran 1. (NSN - IN/Bangalore)
Hi, Thank you all for the replies. I want quick responses for SQL queries. Thanks and Regards Prabakaran.N From: ext Bertrand Dechoux [mailto:decho...@gmail.com] Sent: Thursday, July 31, 2014 1:28 PM To: user@hadoop.apache.org Subject: Re: Hadoop Realtime Queries It all depends on the context

Hadoop and Hive Performance Tuning

2014-07-31 Thread Natarajan, Prabakaran 1. (NSN - IN/Bangalore)
Hi, I am running Hive queries on structured RCFile data. Can you please let me know the key performance parameters I have to tune for better query performance (for Hadoop 2.3/YARN and Hive 0.13)? Thanks and Regards Prabakaran.N aka NP nsn, Bangalore When I is replaced by We - even Illness

Re: How to check what is the log directory for container logs

2014-07-31 Thread Haiyang Fu
1. Change to the NodeManager log dir given by this yarn-site.xml property:
    <property>
      <name>yarn.nodemanager.log-dirs</name>
      <value>/path/to/hdfs/nodemanager_log/</value>
      <description>the directories used by Nodemanagers as log directories</description>
    </property>
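Following the suggestion above, the configured directory can also be read straight out of yarn-site.xml from a script instead of by eye. A minimal sketch, assuming you feed it the text of the yarn-site.xml your NodeManager actually uses (its location, e.g. under $HADOOP_CONF_DIR, depends on the install and is an assumption here). It does not expand ${...} variables, and if the property is absent it returns an empty list, in which case the effective value comes from yarn-default.xml. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def get_property(xml_text, name):
    """Return the <value> of the named <property> in a Hadoop *-site.xml, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            return prop.findtext("value", default="").strip()
    return None


def nodemanager_log_dirs(xml_text):
    """yarn.nodemanager.log-dirs may hold a comma-separated list of directories."""
    value = get_property(xml_text, "yarn.nodemanager.log-dirs")
    if value is None:
        return []  # not set here: the default from yarn-default.xml applies
    return [d.strip() for d in value.split(",") if d.strip()]


# Sample mirroring the property quoted in the reply (path is the reply's placeholder).
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/path/to/hdfs/nodemanager_log/</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(nodemanager_log_dirs(SAMPLE))
```

For a real file, pass open(path).read() as xml_text.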

Re: Performance on singlenode and multinode hadoop

2014-07-31 Thread Sindhu Hosamane
Hello, I am running my experiment on a server with 2 processors (4 cores each), i.e. 8 cores in total. What would be the ideal values for mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to get maximum performance? Your help is very much

Re: Performance on singlenode and multinode hadoop

2014-07-31 Thread Nitin Pawar
What kind of jobs will your tasks be doing? Are they CPU-intensive or only memory-intensive? On Thu, Jul 31, 2014 at 6:28 PM, Sindhu Hosamane sindh...@gmail.com wrote: Hello , If i am running my experiment on a server with 2 processors (4 cores each ) . To say it has 2 processors and 8

Re: Performance on singlenode and multinode hadoop

2014-07-31 Thread Sindhu Hosamane
I am not quite sure about the answer to this. I am running Cascalog queries on files whose sizes are in the MB range. On 31 Jul 2014, at 15:11, Nitin Pawar nitinpawar...@gmail.com wrote: what kind of jobs your tasks will be doing? are they CPU intensive or only memory intensive ? On

Ideal number of mappers and reducers to increase performance

2014-07-31 Thread Sindhu Hosamane
Hello friends, I am running my experiment on a server with 2 processors (4 cores each), i.e. 8 cores in total. What would be the ideal values for mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum to get maximum performance? I am running

RE: Hadoop Realtime Queries

2014-07-31 Thread Natarajan, Prabakaran 1. (NSN - IN/Bangalore)
Hi Nitin, I want queries to return within a second. The Hive table data size is 50 TB (Snappy-compressed RCFile). Thanks and Regards Prabakaran.N From: ext Nitin Pawar [mailto:nitinpawar...@gmail.com] Sent: Thursday, July 31, 2014
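A back-of-the-envelope calculation shows why a full scan of a 50 TB table cannot finish in one second. The sketch assumes roughly 100 MB/s of sequential read per disk and 12 data disks per node; both figures are illustrative assumptions, not numbers from the thread, and the estimate ignores decompression, network and scheduling overhead.

```python
def ceil_div(a, b):
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-a // b)


def disks_for_full_scan(data_bytes, target_seconds, disk_bytes_per_s):
    # Number of disks that must stream in parallel to read every byte in time.
    return ceil_div(data_bytes, target_seconds * disk_bytes_per_s)


TB = 10**12
MB = 10**6

data = 50 * TB               # table size reported in the thread (Snappy RCFile)
disk_rate = 100 * MB         # ASSUMPTION: ~100 MB/s sequential read per disk
disks = disks_for_full_scan(data, 1, disk_rate)
nodes = ceil_div(disks, 12)  # ASSUMPTION: 12 data disks per node

print(disks, nodes)  # 500000 41667
```

The required hardware scales linearly with the bytes a query must read, which is why the replies point toward designs that avoid full scans (key-based access, pruning) rather than toward tuning a scan engine.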

Re: Hadoop Realtime Queries

2014-07-31 Thread Nitin Pawar
Before you read the entire answer, I would advise you to wait for the Hive experts to answer. Then you are looking at the wrong system. Hive is more batch-oriented; it gets to a near-real-time scenario with the ORC/Parquet file formats together with Tez and Stinger. You may want to design your system in a way

Juggling or swapping out the standby NameNode in a QJM / HA configuration

2014-07-31 Thread Colin Kincaid Williams
Hello, I'm trying to swap out a standby NameNode in a QJM / HA configuration. I believe the steps to achieve this would be something similar to: Use the bootstrap standby command to prep the replacement standby, or rsync if the command fails. Somehow update the datanodes, so they push the

Unit tests in Hadoop 2.2.0 (and later)

2014-07-31 Thread Rajat Jain
Hello, What is the unit test status of the Apache 2.2.0 branch? I get a lot of test failures when running those tests. I run them using mvn clean package (which runs all the tests). Am I doing something wrong? Is there a documented way of running the tests? Are there many failures in the 2.2.0

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

2014-07-31 Thread Jing Zhao
Hi Colin, I guess currently we may have to restart almost all the daemons/services in order to swap out a standby NameNode (SBN): 1. The current active NameNode (ANN) needs to know the new SBN since in the current implementation the SBN tries to send rollEditLog RPC request to ANN

Re: Hadoop Realtime Queries

2014-07-31 Thread Alex Kamil
NP, we use HBase+Phoenix for real-time SQL queries in prod: http://phoenix.apache.org/. By real time I mean milliseconds for small queries, or seconds for hundreds of millions of rows. The speed mostly depends on how many nodes/HBase RegionServers are in the cluster. HBase is great for
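Part of why key-based queries on HBase can return in milliseconds is that HBase keeps rows sorted by row key, so a lookup by key, or by key prefix, can binary-search to the right place instead of scanning everything. The toy in-memory model below illustrates only that idea; it is not the HBase or Phoenix API, and the "host#timestamp" row-key layout is a hypothetical example.

```python
import bisect


class ToySortedTable:
    """Toy model of rows kept sorted by row key (illustration only, not HBase)."""

    def __init__(self, rows):
        # rows: dict mapping row key (str) -> value
        self._keys = sorted(rows)
        self._values = dict(rows)

    def get(self, key):
        # Point lookup: O(log n) binary search rather than a full scan.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[key]
        return None

    def scan(self, start, stop):
        # Range scan over [start, stop): touches only keys inside the range.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


table = ToySortedTable({"host02#1": "c", "host01#2": "b", "host01#1": "a"})
print(table.get("host01#2"))
# Prefix scan: "$" sorts right after "#", so "host01$" is just past every "host01#..." key.
print(table.scan("host01#", "host01$"))
```

Choosing a stop key just past the prefix mirrors a common idiom for prefix scans over sorted keys; the practical lesson is that the row-key design decides which queries stay fast.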

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

2014-07-31 Thread Colin Kincaid Williams
Hi Jing, Thanks for the response. I will try this out and file an Apache JIRA. Best, Colin Williams On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao j...@hortonworks.com wrote: Hi Colin, I guess currently we may have to restart almost all the daemons/services in order to swap out a

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

2014-07-31 Thread Bryan Beaudreault
We've done this a number of times without issue. Here's the general flow: 1) Shut down the namenode and zkfc on the SNN 2) Stop zkfc on the ANN (the ANN will remain active because there is no other zkfc instance running to fail over to) 3) Run hdfs zkfc -formatZK on the ANN 4) Start zkfc on the ANN (will sync up with

Gridmix on Hadoop2

2014-07-31 Thread Brian Husted
Is Gridmix supported on Hadoop 2 with YARN? I am getting an ArithmeticException (division by zero) when I submit jobs through Gridmix. Any help is appreciated.

Re: Setting Up First Hadoop / Yarn Cluster

2014-07-31 Thread Alexander Pivovarov
Probably a permission issue. On Thu, Jul 31, 2014 at 11:32 AM, Houston King houston.k...@gmail.com wrote: Hey Everyone, I'm a noob working to setup my first 13 node Hadoop 2.4.0 cluster, and I've run into some problems that I'm having a heck of a time debugging. I've been following the

Re: Ideal number of mappers and reducers to increase performance

2014-07-31 Thread Harsh J
You can perhaps start with a generic 4+4 configuration (which matches your cores), and tune your way upwards or downwards from there based on your results. On Thu, Jul 31, 2014 at 8:35 PM, Sindhu Hosamane sindh...@gmail.com wrote: Hello friends , If i am running my experiment on a server with
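Harsh's 4+4 starting point can be written down as the two TaskTracker (MRv1) properties named in the question. A minimal sketch, assuming an even split with any odd core going to map slots; that split rule is an illustrative assumption, not a Hadoop rule, and the script only renders the XML for mapred-site.xml. As the reply says, tune up or down from there based on measured results.

```python
import xml.etree.ElementTree as ET


def split_slots(cores):
    # Even split as a starting point (4+4 for 8 cores); an odd core goes to maps.
    reduce_slots = cores // 2
    map_slots = cores - reduce_slots
    return map_slots, reduce_slots


def slots_config(cores):
    # Render the two MRv1 slot properties as a mapred-site.xml <configuration>.
    map_slots, reduce_slots = split_slots(cores)
    conf = ET.Element("configuration")
    for name, value in (
        ("mapred.tasktracker.map.tasks.maximum", map_slots),
        ("mapred.tasktracker.reduce.tasks.maximum", reduce_slots),
    ):
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(conf, encoding="unicode")


print(split_slots(2 * 4))  # 2 processors x 4 cores
print(slots_config(2 * 4))
```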

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

2014-07-31 Thread Colin Kincaid Williams
On step 3) (run hdfs zkfc -formatZK): in my test environment I get a warning and then an error. WARNING: Before proceeding, ensure that all HDFS services and failover controllers are stopped! The complete output: sudo hdfs zkfc -formatZK 2014-07-31 17:43:07,952 INFO [main] tools.DFSZKFailoverController

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

2014-07-31 Thread Colin Kincaid Williams
Another error after stopping the zkfc. Do I have to take the cluster down to format ZK?
[root@rhel1 conf]# sudo service hadoop-hdfs-zkfc stop
Stopping Hadoop zkfc: [ OK ]
stopping zkfc
[root@rhel1 conf]# sudo -u hdfs zkfc -formatZK
sudo: zkfc: command not

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

2014-07-31 Thread Colin Kincaid Williams
I tried a third time and it just worked? sudo hdfs zkfc -formatZK 2014-07-31 18:07:51,595 INFO [main] tools.DFSZKFailoverController (DFSZKFailoverController.java:init(140)) - Failover controller configured for NameNode NameNode at rhel1.local/10.120.5.203:8020 2014-07-31 18:07:51,791 INFO

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

2014-07-31 Thread Colin Kincaid Williams
However, continuing with the process, my QJM eventually errored out and my active NameNode went down. 2014-07-31 20:59:33,944 WARN [Logger channel to rhel6.local/10.120.5.247:8485] client.QuorumJournalManager (IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485 failed to write

Re: Juggling or swapping out the standby NameNode in a QJM / HA configuration

2014-07-31 Thread Bryan Beaudreault
This shouldn't have affected the journalnodes at all -- they are mostly unaware of the zkfc and active/standby state. Did you do something else that may have impacted the journalnodes? (i.e. shut down 1 or more of them, or something else) For your previous 2 emails, reporting errors/warns when
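Bryan's question about whether any JournalNodes were shut down matters because, with QJM, an edit only counts as written once a strict majority of JournalNodes acknowledges it. If the active NameNode cannot reach that majority, the write fails and the NameNode typically aborts, which would match the "failed to write" log above. A minimal sketch of the majority arithmetic; the function names are hypothetical, and the tolerance formula follows the (N - 1) / 2 rule given in the HDFS QJM HA guide.

```python
def quorum_size(journal_nodes):
    # Strict majority of JournalNodes needed to acknowledge each edit.
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes):
    # JournalNodes that can be down while writes still reach a majority.
    return (journal_nodes - 1) // 2


def edit_is_committed(acks, journal_nodes):
    return acks >= quorum_size(journal_nodes)


for n in (3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 5 3 2

# With 3 JournalNodes, losing 2 leaves only 1 acknowledgement: no quorum.
print(edit_is_committed(1, 3))  # False
```

So with three JournalNodes, losing one is survivable, but losing two (or having two fail to respond in time) would be enough to take the active NameNode down.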