On Mon, Apr 29, 2013 at 10:54 PM, Asaf Mesika asaf.mes...@gmail.com wrote:
I think for Phoenix to truly succeed, it needs HBase to break the JVM
heap barrier of 12G that I saw mentioned in a couple of posts, since lots of
analytics queries utilize memory, and its memory is shared with
Thanks for getting back, Ted. I totally understand other priorities and
will wait for some feedback. I am adding some more info to this post to
allow better diagnosis of the performance issue.
I hit my region servers with a lot of GET requests (~20K per second per
regionserver) using asynchbase in my test
Phoenix will succeed if HBase succeeds. Phoenix just makes it easier to
drive HBase to its maximum capability. IMHO, if HBase is to make
further gains in the OLAP space, scans need to be faster and new, more
compressed columnar-store type block formats need to be developed.
Running inside
You are making use of batch Gets? get(List&lt;Get&gt;)
-Anoop-
On Tue, Apr 30, 2013 at 11:40 AM, Viral Bajaria viral.baja...@gmail.comwrote:
Thanks for getting back, Ted. I totally understand other priorities and
will wait for some feedback. I am adding some more info to this post to
allow better
I am using asynchbase, which does not have the notion of batch gets. It
allows you to batch at the rowkey level in a single get request.
-Viral
On Mon, Apr 29, 2013 at 11:29 PM, Anoop John anoop.hb...@gmail.com wrote:
You are making use of batch Gets? get(List&lt;Get&gt;)
-Anoop-
Nope.. the system is clean, only CDH4 on it. And I can't find
hbase-default.xml on the system.
However, I solved this issue by downloading
http://hbase_master:60010/conf, renaming it to hbase-default.xml and
adding that to the classpath
So maybe a bug in CDH4.
On Mon, Apr 29, 2013 at 11:36 PM,
If you can make use of the batch API, i.e. get(List&lt;Get&gt;), you can reduce
the handlers (and the number of RPC calls too).. One batch will use one handler.
I am using asynchbase which does not have the notion of batch gets
I have not checked with asynchbase. Just mentioning it as a pointer..
-Anoop-
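For reference, a minimal sketch of that batch-get path with the standard
synchronous 0.94 client (the table name and row keys are made-up placeholders):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchGetExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable"); // hypothetical table name
        try {
          List<Get> gets = new ArrayList<Get>();
          for (String key : new String[] { "row1", "row2", "row3" }) {
            gets.add(new Get(Bytes.toBytes(key)));
          }
          // The whole batch travels in one RPC per region server touched,
          // and each server-side batch occupies a single handler.
          Result[] results = table.get(gets);
          for (Result r : results) {
            System.out.println(r);
          }
        } finally {
          table.close();
        }
      }
    }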
On Tue,
I don't wish to be rude, but you are presenting odd claims as fact, as
mentioned in a couple of posts. That makes it difficult to have a serious
conversation. I encourage you to test your hypotheses and let us know if in
fact there is a JVM heap barrier (and where it may be).
On Monday, April 29, 2013,
Looked closely into the async API and there is no way to batch GETs to
reduce the # of RPC calls and thus handlers. Will play around tomorrow with
the handlers again and see if I can find anything interesting.
On Tue, Apr 30, 2013 at 12:03 AM, Anoop John anoop.hb...@gmail.com wrote:
If you can
Hi Loïc,
How many datanodes do you have in your cluster? Your replication factor is
set to 3, so I think you should have at least 3 datanodes.
Is one of those nodes down? Some blocks are missing; maybe they are on a
system which is down now? Bringing it back up might restore those blocks.
Hi Yves,
Your host file looks good.
Don't even try the shell until you get the UI displayed correctly and the
server logs saying that initialization is done.
So what do you see in the logs when you try with this new hosts file?
JM
2013/4/28 Asaf Mesika asaf.mes...@gmail.com
Asaf,
The heap barrier is something of a legend :) You can ask 10 different
HBase committers what they think the max heap is and get 10 different
answers. This is my take on heap sizes from the many clusters I have dealt
with:
8GB - Standard heap size, and tends to run fine without any
Hi Jean-Marc,
Thanks.
I have one datanode in my cluster.
The node isn't down.
How can I restore those blocks?
Loïc TALON
mail.lta...@teads.tv http://teads.tv/
Video Ads Solutions
2013/4/30 Jean-Marc Spaggiari jean-m...@spaggiari.org
Hi Loïc,
How many datanodes do you have on your
Bonjour Loïc,
I don't think you can restore those blocks. If you have only one datanode
and it doesn't have the missing blocks, there is nowhere for Hadoop to get
those blocks back. So unfortunately I don't think you can restore them.
Also, this is more Hadoop than HBase related. You might want
Geez that's a bad article.
Never salt.
And yes there's a difference between using a salt and using the first 2-4 bytes
from your MD5 hash.
(Hint: salts are random. Your hash isn't.)
Sorry to be-itch, but it's a bad idea and it shouldn't be propagated.
On Apr 29, 2013, at 10:17 AM, Shahab
The replication.html reference appears to contain a reference to a bug
(2611) which was solved two years ago :)
On Wed, Mar 6, 2013 at 12:15 AM, Damien Hardy dha...@viadeoteam.com wrote:
IMO the easiest would be HBase export. For long-term offline backup (for
disaster recovery). It can even be
Well, those are *some* words :) Anyway, can you explain in a bit more detail
why you feel so strongly about this design/approach? The salting here is
not the only option mentioned, and static hashing can be used as well. Plus,
even in the case of salting, wouldn't the distributed scan take care of it?
Now I post my configurations:
I use a 3-node cluster with all the nodes running Hadoop, ZooKeeper and
HBase. The HBase master, a ZooKeeper daemon and the Hadoop namenode run on the
same host. An HBase regionserver, a ZooKeeper daemon and a Hadoop datanode run
on the other 2 nodes. I called one of the
1. Change the schema
If I understand correctly, in this scenario I lose the ordering (changeDate
desc). Moreover, in my case I could have 100k rows per objectId, meaning I
would have to iterate a long list, but I understand the logic.
If I only look for 24 hours before the original column
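For what it's worth, the usual way to keep newest-first ordering in the key
itself is the reverse-timestamp trick; a minimal sketch, assuming a
hypothetical objectId + changeDate key layout rather than the poster's actual
schema:

    import org.apache.hadoop.hbase.util.Bytes;

    public class ReverseTimestampKey {
      // Store Long.MAX_VALUE - changeDate in the key so newer rows sort
      // first and a plain forward scan returns rows in changeDate desc order.
      public static byte[] rowKey(String objectId, long changeDateMillis) {
        long reversed = Long.MAX_VALUE - changeDateMillis;
        return Bytes.add(Bytes.toBytes(objectId), Bytes.toBytes(reversed));
      }
    }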
Yes, I see, but this is quite expensive as the table is huge
-----Original Message-----
From: Jean-Marc Spaggiari [mailto:jean-m...@spaggiari.org]
Sent: Monday, April 29, 2013 20:04
To: user@hbase.apache.org; ri...@laposte.net
Subject: Re: Read access pattern
HBASE-4811 is what you should be
I solved the last problem:
I modified the file /etc/hostname and replaced the default hostname,
debian01, with namenode, jobtracker, or datanode, the hostnames I
used in the hbase conf files. Now I start hbase from the master with
bin/start-hbase.sh and the regionservers, instead of trying to connect with
I solved my problem with zookeeper. I don't know how, maybe it was a spell
xD
I did it this way: on a slave I removed the hbase directory, and I copied
the directory of the pseudo-distributed hbase (which works). Then I copied all
the configurations from the virtual machine which ran as master in
Hi John,
Thanks for sharing that. Might help other people who are facing the same
issues.
JM
2013/4/30 John Foxinhead john.foxinh...@gmail.com
Now I post my configurations:
I use a 3-node cluster with all the nodes running Hadoop, ZooKeeper and
HBase. The HBase master, a ZooKeeper daemon and
bq. The downside that I see is the bucket_number that we have to
maintain both at the time of reading/writing and update in case of
cluster restructuring.
I agree that this maintenance can be painful. However, Phoenix
(https://github.com/forcedotcom/phoenix) now supports salting,
automating
Sure.
By definition, the salt number is a random seed that is not associated with the
underlying record.
A simple example is a round-robin counter (mod the counter by 10, yielding
[0..9]).
So you get a record, prepend your salt, and write it out to HBase. The salt
will push the data out to
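A minimal sketch of that round-robin salt, using the 10 buckets from the
example above (everything else is illustrative):

    import java.util.concurrent.atomic.AtomicLong;

    public class SaltedKeyBuilder {
      private static final int BUCKETS = 10;
      private final AtomicLong counter = new AtomicLong();

      // Prepend a one-byte salt in [0..9], chosen round-robin and therefore
      // independent of the record itself -- which is what makes it a salt
      // rather than a hash.
      public byte[] salt(byte[] rowKey) {
        byte bucket = (byte) (counter.getAndIncrement() % BUCKETS);
        byte[] salted = new byte[rowKey.length + 1];
        salted[0] = bucket;
        System.arraycopy(rowKey, 0, salted, 1, rowKey.length);
        return salted;
      }
    }

The flip side, discussed further down-thread, is that every read then has to
fan out across all 10 buckets.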
Rules of thumb for starting off safely and for easing support issues are
really good to have, but there are no hard barriers or singular approaches:
use Java 7 + G1GC, disable HBase blockcache in lieu of OS blockcache, run
multiple regionservers per host. It is going to depend on how the cluster
Hi,
I have an HBase cluster with a table that has a composite key. I map this
table to a Hive external table, which I use to insert/select data into/from
this table:
CREATE EXTERNAL TABLE event(key
struct&lt;name:string,dateCreated:string,uid:string&gt;, {more columns here})
ROW FORMAT DELIMITED
Here it is:
select * from event where key.name='Signup' and key.dateCreated='2013-03-06
16:39:55.353' and key.uid='7af4c330-5988-4255-9250-924ce5864e3bf';
From: kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
Sent: Tuesday, April 30, 2013 11:25 PM
To: u...@hive.apache.org
Cc:
Can you show your query that is taking 700 seconds?
On Tue, Apr 30, 2013 at 12:48 PM, Rupinder Singh rsi...@care.com wrote:
Hi,
I have an HBase cluster with a table that has a composite key. I map
this table to a Hive external table, which I use to insert/select data
Multiple RSs per host gets you around the WAL bottleneck as well. But
it's operationally less than ideal. Do you usually recommend this
approach, Andy? I've shied away from it mostly.
On Apr 30, 2013, at 10:38 AM, Andrew Purtell apurt...@apache.org wrote:
Rules of thumb for starting off safely
You wouldn't do that if colocating MR. It is one way to soak up extra RAM
on a large RAM box, although I'm not sure I would recommend it (I have no
personal experience trying it, yet). For more on this where people are
actively considering it, see
https://issues.apache.org/jira/browse/BIGTOP-732
Running more than one RS on a host is an option for soaking up extra RAM,
since that is what we are discussing, but I can't recommend it because I
have no experience with that approach. I think I do want to experiment with
it, but not on a box with less than something like 16 or 24 cores.
On
Hmmm
I don't recommend HBase in situations where you are not running an M/R
framework. Sorry, as much as I love HBase, IMHO there are probably better
solutions for a standalone NoSQL database. (YMMV depending on your use case.)
The strength of HBase is that it's part of the Hadoop ecosystem.
Rupinder,
Hive supports filter pushdown[1], which means that the predicates in the
where clause are pushed down to the storage-handler level, where they either
get handled by the storage handler or are delegated back to Hive if the
handler cannot handle them. As of now, the HBaseStorageHandler only supports
Hi,
We have a simple HBase schema:
row key = subscriber id.
Column family A = counters - all kinds of aggregations.
Event records have a UUID; in some scenarios we might get duplicate
events. We should not count the duplicates.
A possible solution was to keep event ids as qualifiers in another
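A minimal sketch along those lines, using checkAndPut so the counter only gets
bumped the first time a UUID is seen (the family and qualifier names here are
assumptions, and the check costs an extra round trip per event):

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DedupCounter {
      // Returns true if the event was counted, false if it was a duplicate.
      public static boolean countOnce(HTable table, String subscriberId,
          String eventUuid) throws java.io.IOException {
        byte[] row = Bytes.toBytes(subscriberId);
        byte[] dedupFamily = Bytes.toBytes("E");   // event-id family (assumed)
        byte[] counterFamily = Bytes.toBytes("A"); // counter family from the schema
        byte[] eventQual = Bytes.toBytes(eventUuid);

        Put marker = new Put(row);
        marker.add(dedupFamily, eventQual, new byte[0]);

        // checkAndPut with a null expected value succeeds only when the
        // qualifier does not exist yet, so a duplicate UUID fails the check.
        boolean firstTime =
            table.checkAndPut(row, dedupFamily, eventQual, null, marker);
        if (firstTime) {
          table.incrementColumnValue(row, counterFamily,
              Bytes.toBytes("eventCount"), 1L);
        }
        return firstTime;
      }
    }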
Hi Jean-Marc. Thanks for the tip. However, for the moment at least,
I'm going to be abandoning my forays into HBase; I received direction
to focus on Hive instead.
Again, thank you. Should I need help in the near future, I'll be sure to
send the mailing list an enquiry.
On Tue, Apr 30, 2013
Swarnim,
Thanks. So this means custom map-reduce is the viable option when working with
HBase tables having composite keys, since it allows setting the start and stop
keys. The Hive+HBase combination is out.
Regards
Rupinder
From: kulkarni.swar...@gmail.com [mailto:kulkarni.swar...@gmail.com]
That depends on how dynamic your data is. If it is pretty static, you can
also consider using something like Create Table As Select (CTAS) to create
a snapshot of your data to HDFS and then run queries on top of that data.
So your query might become something like:
create table my_table as
Have you had a look at Phoenix (https://github.com/forcedotcom/phoenix)? It'll
use all of the parts of your row key and, depending on how much data you're
returning to the client, will query over 10 million rows in seconds.
James
@JamesPlusPlus
http://phoenix-hbase.blogspot.com
On Apr 30,
I have been attempting to speed up my HBase map-reduce scans for a while now. I
have tried just about everything without much luck. I'm running out of ideas
and was hoping for some suggestions. This is HBase 0.94.2 and Hadoop 2.0.0
(CDH4.2.1).
The table I'm scanning:
20 mil rows
Hundreds of
From http://hbase.apache.org/book.html#mapreduce.example :
scan.setCaching(500); // 1 is the default in Scan, which will be bad for MapReduce jobs
scan.setCacheBlocks(false); // don't set to true for MR jobs
I guess you have used the above setting.
0.94.x releases are compatible. Have
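For completeness, a minimal sketch of wiring those two settings into a
full-table-scan job via TableMapReduceUtil (0.94-era API; the table name and
the no-op mapper are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;

    public class ScanJob {
      static class MyMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable key, Result value, Context context) {
          // process one row here
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "scan-mytable");
        job.setJarByClass(ScanJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);       // fetch 500 rows per RPC instead of 1
        scan.setCacheBlocks(false); // a full scan would just churn the block cache

        TableMapReduceUtil.initTableMapperJob("mytable", scan, MyMapper.class,
            NullWritable.class, NullWritable.class, job);
        job.setNumReduceTasks(0);   // map-only scan
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }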
Yes, I have tried various settings for setCaching() and I have
setCacheBlocks(false)
On Apr 30, 2013, at 9:17 PM, Ted Yu yuzhih...@gmail.com wrote:
From http://hbase.apache.org/book.html#mapreduce.example :
scan.setCaching(500); // 1 is the default in Scan, which will
be bad for
Have you tried enabling short circuit read?
Thanks
On Apr 30, 2013, at 9:31 PM, Bryan Keller brya...@gmail.com wrote:
Yes, I have tried various settings for setCaching() and I have
setCacheBlocks(false)
On Apr 30, 2013, at 9:17 PM, Ted Yu yuzhih...@gmail.com wrote:
From
Yes, I have it enabled (forgot to mention that).
On Apr 30, 2013, at 9:56 PM, Ted Yu yuzhih...@gmail.com wrote:
Have you tried enabling short circuit read?
Thanks
On Apr 30, 2013, at 9:31 PM, Bryan Keller brya...@gmail.com wrote:
Yes, I have tried various settings for setCaching() and
Your average row is 35k, so scanner caching would not make a huge difference,
although I would have expected some improvement from setting it to 10 or 50,
since you have a wide 10GbE pipe.
I assume your table is split sufficiently to touch all RegionServers... Do you
see the same load/IO on all
I do not want to be rude or anything... but how often do we need to have this
discussion?
When you salt your rowkeys with, say, 10 salt values, then for each read you
need to fork off 10 read requests, each of which touches only 1/10th of the
table (which works nicely with HBase's prefix scans).
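A minimal sketch of that fan-out, assuming one-byte salts and 10 buckets as
above (sequential here for brevity; in practice you'd fire the per-bucket
scans in parallel):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class SaltedScan {
      private static final int BUCKETS = 10;

      // Collect all rows whose unsalted key starts with 'prefix' by running
      // one prefix scan per salt bucket. Each scan touches only ~1/10th of
      // the table; a caller wanting global key order must merge the results.
      public static List<Result> scanAllBuckets(HTable table, byte[] prefix)
          throws java.io.IOException {
        List<Result> results = new ArrayList<Result>();
        for (int bucket = 0; bucket < BUCKETS; bucket++) {
          byte[] start = new byte[prefix.length + 1];
          start[0] = (byte) bucket;
          System.arraycopy(prefix, 0, start, 1, prefix.length);
          byte[] stop = start.clone();
          stop[stop.length - 1]++; // simplification: assumes last byte != 0xFF

          Scan scan = new Scan(start, stop);
          ResultScanner scanner = table.getScanner(scan);
          try {
            for (Result r : scanner) {
              results.add(r);
            }
          } finally {
            scanner.close();
          }
        }
        return results;
      }
    }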
Same here.
HBase is generally good at honing in on a small (maybe 10-100M rows) contiguous
subset of an essentially unlimited dataset.
If all you ever do is scan _everything_ and then throw it away, a
straight scan (using Impala for example) or direct M/R on file(s) in HDFS is
far