Hi
I want to perform realtime queries on HDFS data. I tried Hadoop/YARN/Hive,
Shark on Spark, Tez, etc.,
but still I couldn't get sub-second performance on the large data that I have.
I understand Hadoop is not meant for this, but I still want to achieve as much as
possible.
1) How can we
Hi,
As far as I know, real-time queries are only possible using HBase or Cloudera
Search. Hive is a batch process; it is not real time. So instead of
tuning different parameters, maybe you could look at a different architecture
design so that you could use HBase.
Regards,
Deepak
From:
It all depends on the context and what is really meant by realtime. Impala
(and other concurrent alternatives) are not listed among the tools you have
tried.
Maybe you should not focus only on batch frameworks for providing
realtime access? The results are not surprising.
Bertrand Dechoux
On
Hello Users,
When I took all the datanodes down while the namenode was still up, write
failures occurred, which is fine.
But after a few minutes I brought the datanodes back up one by one,
and I still observe write failures; it looks like the datanodes are not
available.
This state resolves only when I restart
Hi,
Is there a way to check which directory is used for container logs in
my currently running instance of YARN from the command line, I mean using
the yarn command or hadoop command or so?
Thanks,
Kishore
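One way to check this from the command line might be to read the value straight out of yarn-site.xml. A minimal sketch, assuming the usual HADOOP_CONF_DIR layout (the default path and the grep/sed parsing are my assumptions, not an official yarn subcommand):

```shell
# Sketch: pull yarn.nodemanager.log-dirs out of yarn-site.xml.
# The /etc/hadoop/conf fallback is an assumption; adjust for your install.
CONF="${HADOOP_CONF_DIR:-/etc/hadoop/conf}"
# Print the <value> element on the line following the property name.
grep -A1 'yarn.nodemanager.log-dirs' "$CONF/yarn-site.xml" \
  | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
```

If the property is absent, an empty result does not mean logging is off: as far as I recall, YARN then falls back to the default from yarn-default.xml (${yarn.log.dir}/userlogs).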
Hi,
Thank you all for the reply.
I want quick responses for SQL queries.
Thanks and Regards
Prabakaran.N
From: ext Bertrand Dechoux [mailto:decho...@gmail.com]
Sent: Thursday, July 31, 2014 1:28 PM
To: user@hadoop.apache.org
Subject: Re: Hadoop Realtime Queries
It all depends on the context
Hi
I am using Hive queries on structured RC files.
Can you please let me know the key performance parameters that I have to tune
for better query performance (for Hadoop 2.3 / YARN and Hive 0.13)?
Thanks and Regards
Prabakaran.N aka NP
nsn, Bangalore
When I is replaced by We - even Illness
1. Change the NodeManager log dir according to yarn-site.xml:
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/path/to/hdfs/nodemanager_log/</value>
  <description>the directories used by NodeManagers as log
directories</description>
</property>
Hello ,
If I am running my experiment on a server with 2 processors (4 cores each),
that is to say it has 2 processors and 8 cores,
what would be the ideal values for mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum to get maximum performance?
Your help is very much
What kind of jobs will your tasks be doing?
Are they CPU intensive or only memory intensive?
On Thu, Jul 31, 2014 at 6:28 PM, Sindhu Hosamane sindh...@gmail.com wrote:
Hello ,
If I am running my experiment on a server with 2 processors (4 cores each),
to say it has 2 processors and 8
I am not quite sure about the answer to this.
I am running Cascalog queries which run on files that are in the MB range.
On 31 Jul 2014, at 15:11, Nitin Pawar nitinpawar...@gmail.com wrote:
What kind of jobs will your tasks be doing?
Are they CPU intensive or only memory intensive?
On
Hello friends,
If I am running my experiment on a server with 2 processors (4 cores each),
that is to say it has 2 processors and 8 cores,
what would be the ideal values for mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum to get maximum performance?
I am running
Hi Nitin,
I want queries to return within a second.
Hive table data size is 50 TB, Snappy RC file.
Thanks and Regards
Prabakaran.N aka NP
nsn, Bangalore
When I is replaced by We - even Illness becomes Wellness
From: ext Nitin Pawar [mailto:nitinpawar...@gmail.com]
Sent: Thursday, July 31, 2014
Before you read the entire answer, I will advise you to wait for Hive
experts to answer.
You are looking at the wrong system, then.
Hive is more batch oriented, and brings a near-real-time scenario with
ORC/Parquet file formats along with Tez and Stinger.
You may want to design your system in a way
Hello,
I'm trying to swap out a standby NameNode in a QJM / HA configuration. I
believe the steps to achieve this would be something similar to:
Use the bootstrapStandby command to prep the replacement standby, or rsync
if the command fails.
Somehow update the datanodes, so they push the
Hello,
What is the unit test status on the 2.2.0 branch of Apache? I get a lot of
test failures while running those tests. I run it using mvn clean package
(which runs all the tests). Am I doing something wrong? Is there a
documented way of running the tests? Are there many failures in the 2.2.0
Hi Colin,
I guess currently we may have to restart almost all the
daemons/services in order to swap out a standby NameNode (SBN):
1. The current active NameNode (ANN) needs to know the new SBN since in the
current implementation the SBN tries to send rollEditLog RPC request to ANN
NP,
we use HBase + Phoenix for real-time SQL queries in prod:
http://phoenix.apache.org/
By real time I mean milliseconds for small queries, or seconds for
hundreds of millions of rows. The speed mostly depends on how many
nodes / HBase regionservers are in the cluster. HBase is great for
Hi Jing,
Thanks for the response. I will try this out, and file an Apache jira.
Best,
Colin Williams
On Thu, Jul 31, 2014 at 11:19 AM, Jing Zhao j...@hortonworks.com wrote:
Hi Colin,
I guess currently we may have to restart almost all the
daemons/services in order to swap out a
We've done this a number of times without issue. Here's the general flow:
1) Shutdown namenode and zkfc on SNN
2) Stop zkfc on ANN (ANN will remain active because there is no other zkfc
instance running to fail over to)
3) Run hdfs zkfc -formatZK on ANN
4) Start zkfc on ANN (will sync up with
Is Gridmix supported on Hadoop 2 with YARN? I am getting an
ArithmeticException / zero error when I submit jobs through Gridmix. Any
help is appreciated.
Probably permission issue.
On Thu, Jul 31, 2014 at 11:32 AM, Houston King houston.k...@gmail.com
wrote:
Hey Everyone,
I'm a noob working to set up my first 13-node Hadoop 2.4.0 cluster, and
I've run into some problems that I'm having a heck of a time debugging.
I've been following the
You can perhaps start with a generic 4+4 configuration (which matches
your cores), and tune your way upwards or downwards from there based
on your results.
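The generic 4+4 starting point above would look something like this in mapred-site.xml (a sketch; the values are just the suggested baseline to tune from, not measured optimums):

```xml
<!-- Baseline only: one task slot per core (4 map + 4 reduce on 8 cores). -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>4</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```

Whether to weight the split toward map or reduce slots depends on the job mix, which is why the earlier question about CPU-intensive versus memory-intensive tasks matters.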
On Thu, Jul 31, 2014 at 8:35 PM, Sindhu Hosamane sindh...@gmail.com wrote:
Hello friends,
If I am running my experiment on a server with
On step 3) (Run hdfs zkfc -formatZK), in my test environment I get a warning then
an error
WARNING: Before proceeding, ensure that all HDFS services and
failover controllers are stopped!
the complete output:
sudo hdfs zkfc -formatZK
2014-07-31 17:43:07,952 INFO [main] tools.DFSZKFailoverController
Another error after stopping the zkfc. Do I have to take the cluster down
to format ZK?
[root@rhel1 conf]# sudo service hadoop-hdfs-zkfc stop
Stopping Hadoop zkfc: [ OK ]
stopping zkfc
[root@rhel1 conf]# sudo -u hdfs zkfc -formatZK
sudo: zkfc: command not
I tried a third time and it just worked?
sudo hdfs zkfc -formatZK
2014-07-31 18:07:51,595 INFO [main] tools.DFSZKFailoverController
(DFSZKFailoverController.java:init(140)) - Failover controller configured
for NameNode NameNode at rhel1.local/10.120.5.203:8020
2014-07-31 18:07:51,791 INFO
However, continuing with the process, my QJM eventually errored out and my
Active NameNode went down.
2014-07-31 20:59:33,944 WARN [Logger channel to rhel6.local/
10.120.5.247:8485] client.QuorumJournalManager
(IPCLoggerChannel.java:call(357)) - Remote journal 10.120.5.247:8485 failed
to write
This shouldn't have affected the journalnodes at all -- they are mostly
unaware of the zkfc and active/standby state. Did you do something else
that may have impacted the journalnodes? (e.g. shut down one or more of them,
or something else)
For your previous 2 emails, reporting errors/warns when