Re: Development work focused on HFile v2

2012-11-03 Thread Jonathan Gray
https://issues.apache.org/jira/browse/HBASE-3857 There is some design doc there. Also, I know Mikhail Bautin has done some presentations on HFile v2, as have some other folks from FB. Some digging on google might turn up some decks/videos from HUGS, etc. On Sat, Nov 3, 2012 at 3:38 PM, Doug

Re: User meetup on 10/29?

2012-09-14 Thread Jonathan Gray
+1 for 10/29 user meetup! On Thu, Sep 13, 2012 at 10:44 AM, Stack st...@duboce.net wrote: The folks at wizecommerce have kindly offered to host a meetup down in San Mateo on the evening of 10/29. Are you all up for a user meetup at the end of October after Hadoop World? If so, I'll stick it

RE: integrating hadoop and Hbase with eclipse

2011-10-19 Thread Jonathan Gray
Not sure what kind of integration you're talking about, but if you just want to create a project with the HBase source then just grab an SVN checkout of an HBase repo and just do: mvn eclipse:eclipse This creates all the necessary project files. Then just add new project from existing source.

RE: Correct use of HTablePool

2011-09-28 Thread Jonathan Gray
Yes, you can use the Result once you give back the HTable reference. Result is self contained. -Original Message- From: Joel Halbert [mailto:j...@su3analytics.com] Sent: Wednesday, September 28, 2011 6:27 AM To: user@hbase.apache.org Subject: Re: Correct use of HTablePool Sure,

RE: Error recovery for block... failed because recovery from primary datanode failed 6 times

2011-02-13 Thread Jonathan Gray
The DFS errors are after the server aborts. What is in the log before the server abort? Doesn't seem to show any reason here which is unusual. Anything in the master? Did it time out this RS? You're running with replication = 1? -Original Message- From: Bradford Stephens

RE: Parent/child relation - go vertical, horizontal, or many tables?

2011-02-11 Thread Jonathan Gray
Just to chime in with my usual take on this (seems like the tall vs. wide discussion happens every few weeks...) For get all children of a parent, doing a get() on the wide table vs. doing a scan() on the tall table (as long as you set scanner caching appropriately) will be almost identical.

RE: How long can I have a table open?

2011-02-07 Thread Jonathan Gray
There is not really a limit on that. Underneath, the client will deal with following regions around if they move, a new master if there is a failover, etc... The only thing that cannot be left open indefinitely is a scanner (they have a server-side lease and expire if left idle). JG

RE: Master node Question

2011-02-07 Thread Jonathan Gray
There is only one active HBase master at any given time, but there can be any number of backup masters. The failover is automated and coordinated via ZooKeeper. Regionservers and clients use ZooKeeper to determine who is the current active master. You can run with as many as you want. On

RE: Amazon EC2

2011-02-07 Thread Jonathan Gray
There are others who have had far more experience than I have with HBase + EC2, so will let them chime in. But I personally recommend against this direction if you expect to have a consistent cluster size and/or a significant amount of load. EC2 is great at quickly scaling up/down, but is

RE: How long can I have a table open?

2011-02-07 Thread Jonathan Gray
@hbase.apache.org Subject: RE: How long can I have a table open? Thanks, I have been having some real stability problems with my cluster and I'm trying to narrow down the possible problems. -Pete -Original Message- From: Jonathan Gray [mailto:jg...@fb.com] Sent: Monday, February 07

RE: Master node Question

2011-02-07 Thread Jonathan Gray
of the three in the slaves as well I take it then execute your start commands from one master only example MasterA. On 2/7/11 1:29 PM, Jonathan Gray jg...@fb.com wrote: There is only one active HBase master at any given time, but there can be any number of backup masters. The failover

RE: doing a scan that will return random columns in a table's family

2011-02-03 Thread Jonathan Gray
Result is just the client-side class which wraps whatever the server returns. The ability to do this query is not really about whether Result has the methods to get at this data, but rather whether Scan supports this type of query (it does). Scan.addFamily(family) will make it so that every

RE: Region Servers Crashing during Random Reads

2011-02-03 Thread Jonathan Gray
How much heap are you running on your RegionServers? 6GB of total RAM is on the low end. For high throughput applications, I would recommend at least 6-8GB of heap (so 8+ GB of RAM). -Original Message- From: charan kumar [mailto:charan.ku...@gmail.com] Sent: Thursday, February 03,

RE: Scan (Start Row, End Row) vs Scan (Row)

2011-01-20 Thread Jonathan Gray
The best way to do this is as Friso describes, using the existing stopRow parameter in Scan. There is another way to do it with startRow + a filter. There is a PrefixFilter which could be used here. Looking at the code, it seems as though the PrefixFilter does an early out and stops the scan
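One common way to derive a stopRow for a prefix scan is to take the prefix and increment its last byte, so that the range [prefix, stopRow) covers exactly the keys sharing that prefix. The sketch below is a minimal illustration under that assumption; it is not necessarily the approach Friso described (his message is not quoted here), and `PrefixStopRow` and `stopRowForPrefix` are hypothetical names, not HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PrefixStopRow {
    // Returns the smallest key that sorts after every key starting with prefix.
    // An empty array is returned when no such key exists (empty or all-0xFF prefix);
    // this sketch treats an empty stop row as "no upper bound".
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        // Trailing 0xFF bytes cannot be incremented, so drop them first.
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // unsigned increment of the last remaining byte
        return stop;
    }

    public static void main(String[] args) {
        byte[] prefix = "user123".getBytes(StandardCharsets.US_ASCII);
        byte[] stop = stopRowForPrefix(prefix);
        System.out.println(new String(stop, StandardCharsets.US_ASCII));
    }
}
```

The prefix itself would serve as the start row and the returned value as the exclusive stop row.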

RE: Scan (Start Row, End Row) vs Scan (Row)

2011-01-20 Thread Jonathan Gray
. Thanks -Pete -Original Message- From: Jonathan Gray [mailto:jg...@fb.com] Sent: Thursday, January 20, 2011 8:09 AM To: user@hbase.apache.org Subject: RE: Scan (Start Row, End Row) vs Scan (Row) The best way to do this is as Friso describes, using the existing stopRow parameter

RE: stargate 20.6 with hbase 20.2

2011-01-20 Thread Jonathan Gray
It's strongly recommended that you upgrade to HBase 0.20.6 (at least) if not HBase 0.90.0. There are several critical bug fixes in the releases between 0.20.2 and 0.20.6 besides stargate. -Original Message- From: mike anderson [mailto:saidthero...@gmail.com] Sent: Thursday, January

RE: stargate 20.6 with hbase 20.2

2011-01-20 Thread Jonathan Gray
the bullet and just update to 0.90.0. Cheers, Mike On Thu, Jan 20, 2011 at 1:07 PM, Jonathan Gray jg...@fb.com wrote: It's strongly recommended that you upgrade to HBase 0.20.6 (at least) if not HBase 0.90.0. There are several critical bug fixes in the releases between 0.20.2

RE: hbase 0.20.6 - HBaseClusterTestCase, DU, cygwin, IntelliJ - arg!

2011-01-20 Thread Jonathan Gray
suite? I really hope not.' Thanks, Mark -Original Message- From: Jonathan Gray [mailto:jg...@fb.com] Sent: Tuesday, January 18, 2011 7:56 PM To: user@hbase.apache.org Subject: RE: hbase 0.20.6 - HBaseClusterTestCase, DU, cygwin, IntelliJ - arg! Hey Mark. Sorry to hear about

RE: performance regression after hbase restart

2011-01-20 Thread Jonathan Gray
In HBase 0.90.0 there is a new retain assignment configuration parameter that makes it so your cluster keeps the same region assignment between full cluster restarts. It is ON by default. JG -Original Message- From: Tao Xie [mailto:xietao.mail...@gmail.com] Sent: Thursday, January

RE: Scan with Filter

2011-01-18 Thread Jonathan Gray
The API shows one row per next() call but the number of rows fetched per RPC can be configured much higher with Scan.setCaching(). Filters are basically just server-side predicates that will dictate which rows/columns/values will be returned to the client. This does not relate to the number
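To make the effect of scanner caching concrete, the following sketch counts fetch round trips under an idealized model in which each RPC returns exactly `caching` rows until the scan is exhausted. `ScanRpcEstimate` and `fetchRpcs` are hypothetical names, and real scans add some extra calls (opening and closing the scanner, for instance) that this model ignores.

```java
class ScanRpcEstimate {
    // Idealized number of fetch round trips: rows are shipped in batches of `caching`.
    static long fetchRpcs(long rows, int caching) {
        if (caching <= 0) {
            throw new IllegalArgumentException("caching must be positive");
        }
        return (rows + caching - 1) / caching; // ceiling division
    }

    public static void main(String[] args) {
        long rows = 1_000_000L;
        System.out.println("caching=1:    " + fetchRpcs(rows, 1) + " round trips");
        System.out.println("caching=1000: " + fetchRpcs(rows, 1000) + " round trips");
    }
}
```

The client still sees one row per next() call either way; only the number of trips to the server changes.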

RE: hbase 0.20.6 - HBaseClusterTestCase, DU, cygwin, IntelliJ - arg!

2011-01-18 Thread Jonathan Gray
Hey Mark. Sorry to hear about your troubles. There is a new testing facility that has replaced HBaseClusterTestCase. Check out HBaseTestingUtility. It's JUnit4 based. One sample usage of it is the test TestFromClientSide. This new one includes support for multiple DataNodes, RegionServers,

RE: Cluster Wide Pauses

2011-01-14 Thread Jonathan Gray
These are a different kind of pause (those caused by blockingStoreFiles). This is HBase stepping in and actually blocking updates to a region because compactions have not been able to keep up with the write load. It could manifest itself in the same way but this is different than shorter

RE: Recommended Node Size Limits

2011-01-14 Thread Jonathan Gray
One of the most important factors to look at is how the number of regions relates to how much heap is available for your RegionServers, and then how that will impact your expected MemStore flush sizes. More than total number of regions, this is about the number of actively written to regions.
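A rough way to reason about this relationship is sketched below. It assumes a simple model that is not spelled out in the post: some fraction of the heap is reserved for all MemStores combined, that budget is shared evenly among the actively written regions, and a region flushes at whichever is smaller, its share or the configured flush size. All names and the example numbers (8 GB heap, 0.4 fraction, 64 MB flush size) are illustrative assumptions.

```java
class MemStoreFlushEstimate {
    // Rough expected flush size per region under the even-sharing assumption above.
    static long expectedFlushBytes(long heapBytes, double globalMemStoreFraction,
                                   int activeRegions, long configuredFlushBytes) {
        if (activeRegions <= 0) {
            throw new IllegalArgumentException("activeRegions must be positive");
        }
        long globalBudget = (long) (heapBytes * globalMemStoreFraction);
        long sharePerRegion = globalBudget / activeRegions;
        return Math.min(configuredFlushBytes, sharePerRegion);
    }

    public static void main(String[] args) {
        long heap = 8L * 1024 * 1024 * 1024;  // assumed 8 GB heap
        long flushSize = 64L * 1024 * 1024;   // assumed 64 MB configured flush size
        for (int regions : new int[] {10, 100, 1000}) {
            long bytes = expectedFlushBytes(heap, 0.4, regions, flushSize);
            System.out.println(regions + " active regions -> ~" + (bytes / (1024 * 1024)) + " MB per flush");
        }
    }
}
```

Under this model, more actively written regions on the same heap means smaller flushes, which in turn means more, smaller store files to compact.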

RE: HTable.put(List<Put> puts) perform batch insert?

2011-01-10 Thread Jonathan Gray
BatchUpdate is the old, deprecated version of Put. You are using the best APIs. -Original Message- From: Weishung Chung [mailto:weish...@gmail.com] Sent: Monday, January 10, 2011 10:10 AM To: user@hbase.apache.org Subject: Re: HTable.put(List<Put> puts) perform batch insert? Thank

RE: No minor compactions on a table built only on bulk loads

2011-01-10 Thread Jonathan Gray
It's not really a bug. I think the assumption is that if you are at the level of doing your own bulk loads, you should also manage when you want to compact and split. I know in cases where I've done this, I would usually know at certain points I would want to trigger major compactions. At

RE: perplexing HBase bug: looking for where to learn how to debug

2011-01-06 Thread Jonathan Gray
The first step to debugging HBase is usually going through the Master and RegionServer logs. Sometimes it can be more art than science but a majority of our debugging is done with log analysis. If you can find specific offending regions, you can parse through the logs looking for mentions of

RE: Works

2010-12-22 Thread Jonathan Gray
So there was an existing hbase directory, right? I thought you had said that was not the case. If you were attempting to upgrade with existing data, it could be an incompatibility between 0.20.6 and 0.89. Might be fixed already not sure. -Original Message- From: Pete Haidinyak

RE: Jean-Daniel: RE: some data replication support in hbase?

2010-12-21 Thread Jonathan Gray
Seems like hooking into replication would be a good approach. There's also a JIRA open about a changes API. https://issues.apache.org/jira/browse/HBASE-3247 Or you could use Coprocessors which are committed in 0.92 / trunk. The pre/post hooks can be used as a per-operation trigger mechanism.

RE: I give up, help please

2010-12-21 Thread Jonathan Gray
You have existing data? Try clearing out your hbase directory in hdfs. Looks like some weird problem reading the hbase.version file out of HDFS. -Original Message- From: Pete Haidinyak [mailto:javam...@cox.net] Sent: Tuesday, December 21, 2010 3:32 PM To: HBase Group Subject: I

RE: question on indexes in RDBMS vs. noSQL self created indexes...(disk space wise)

2010-12-21 Thread Jonathan Gray
1. It's a column based sparse table so null's take up no space(ie. More room when we need to duplicate) Correct. Nulls take up no space. 2. Indexes take up space in an RDBMS already and are essentially duplication in your old RDBMS anyways Secondary indexes in an RDBMS use

RE: I give up, help please

2010-12-21 Thread Jonathan Gray
? Thanks -Pete On Tue, 21 Dec 2010 17:24:36 -0800, Jonathan Gray jg...@fb.com wrote: You have existing data? Try clearing out your hbase directory in hdfs. Looks like some weird problem reading the hbase.version file out of HDFS. -Original Message- From: Pete Haidinyak

RE: partitioning and map/reduce hbase hashcodes

2010-12-19 Thread Jonathan Gray
HBase doesn't hashcode anything. It does strict lexicographical ordering of the row keys themselves. So yes, keys with similar prefixes may be in the same partition / next to each other. Rather than using a hashcode modulo some number, we use the META table to determine which partition
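The difference between hash partitioning and a sorted lookup can be sketched with a plain sorted map keyed by region start key. This is only an in-memory stand-in for the idea of consulting a sorted catalog; it is not how the META table is actually read, and `RegionLookup`, the start keys, and the region names are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.TreeMap;

class RegionLookup {
    // Unsigned byte-by-byte comparison; on a shared prefix the shorter key sorts first.
    static final Comparator<byte[]> LEXICOGRAPHIC = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    };

    private final TreeMap<byte[], String> regionsByStartKey = new TreeMap<>(LEXICOGRAPHIC);

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    void addRegion(String startKey, String regionName) {
        regionsByStartKey.put(bytes(startKey), regionName);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String regionFor(String rowKey) {
        return regionsByStartKey.floorEntry(bytes(rowKey)).getValue();
    }

    // Three made-up regions; the first has an empty start key so every row has an owner.
    static RegionLookup example() {
        RegionLookup lookup = new RegionLookup();
        lookup.addRegion("", "region-1");
        lookup.addRegion("g", "region-2");
        lookup.addRegion("p", "region-3");
        return lookup;
    }

    public static void main(String[] args) {
        RegionLookup lookup = example();
        for (String row : new String[] {"apple", "grape", "user123", "user124"}) {
            System.out.println(row + " -> " + lookup.regionFor(row));
        }
    }
}
```

Keys with a shared prefix such as user123 and user124 resolve to the same region here, which is the locality the post describes.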

RE: Results from a Map/Reduce

2010-12-17 Thread Jonathan Gray
Hey Peter, That System.exit line is nothing important, just the main thread waiting for the tasks to finish before closing. You're interested in having the MR job return a single result? To do that, you would need to roll-up the processing done in each of your Map tasks into a single Reduce

RE: question about multi-transaction queries

2010-12-17 Thread Jonathan Gray
All of my experience doing something like this was with straight Java. There are MultiGet and MultiPut capabilities in the Java client that will help you out significantly. I played with Jython and HBase a couple years ago and back then the performance was horrible. I never looked back but I

RE: question about multi-transaction queries

2010-12-17 Thread Jonathan Gray
it mean though we would incur Java startup cost? Or do you propose we write some sort of java server that has the JVM running and is able to get multi-get queries? Thanks. -Jack On Fri, Dec 17, 2010 at 11:15 AM, Jonathan Gray jg...@fb.com wrote: All of my experience doing something

RE: Results from a Map/Reduce

2010-12-17 Thread Jonathan Gray
much on coprocessors, can you point me to some examples of their use? Thanks -Pete -Original Message- From: Jonathan Gray [mailto:jg...@fb.com] Sent: Friday, December 17, 2010 11:13 AM To: user@hbase.apache.org Subject: RE: Results from a Map/Reduce Hey Peter

RE: Results from a Map/Reduce

2010-12-17 Thread Jonathan Gray
available to send back as a web page. This seems like such a basic operation that I am hoping there are 'Best Practices' or examples on how to accomplish this. I would also like a pony too. :-) Thanks -Pete -Original Message- From: Jonathan Gray [mailto:jg...@fb.com] Sent

RE: Cluster Size/Node Density

2010-12-17 Thread Jonathan Gray
You absolutely need to do some testing and benchmarking. This sounds like the kind of application that will require lots of tuning to get right. It also sounds like the kind of thing HDFS is typically not very good at. There is an increasing amount of activity in this area (optimizing HDFS

RE: Results from a Map/Reduce

2010-12-17 Thread Jonathan Gray
it takes a couple of minutes to do a scan that brings back several million rows. My boss wants the query to be in the 'less than five second' range. Thanks -Pete -Original Message- From: Jonathan Gray [mailto:jg...@fb.com] Sent: Friday, December 17, 2010 1:19 PM To: user

RE: question about multi-transaction queries

2010-12-17 Thread Jonathan Gray
possible -Jack On Dec 17, 2010, at 1:32 PM, Jonathan Gray jg...@fb.com wrote: I'm not sure I understand. Are you trying to build a client? Or you want something that behaves like the mysql client? -Original Message- From: Jack Levin [mailto:magn...@gmail.com] Sent

RE: [RFC] Deployment layout and server configurations

2010-12-16 Thread Jonathan Gray
Hey Imran, This looks reasonable but it's hard to say without knowing what the read/write workload is like. You say all searches are done using Solr... will that also be hosted on these servers? One thing. It looks like you have two servers for ZK? ZK should always be run in odd numbers
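The usual reasoning behind odd-sized ZooKeeper ensembles is majority arithmetic: the ensemble stays available only while more than half of its servers are up. The sketch below works through that arithmetic; `QuorumMath` and its methods are hypothetical names used purely for illustration.

```java
class QuorumMath {
    // Smallest number of servers that forms a strict majority.
    static int majority(int servers) {
        return servers / 2 + 1;
    }

    // How many servers can fail while a majority is still reachable.
    static int toleratedFailures(int servers) {
        return servers - majority(servers);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.println(n + " servers: majority=" + majority(n)
                    + ", tolerated failures=" + toleratedFailures(n));
        }
    }
}
```

Two servers tolerate no failures, the same as one, and four tolerate one failure, the same as three, so the even-numbered server adds cost without adding fault tolerance.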

RE: Modifying existing table entries

2010-12-14 Thread Jonathan Gray
Hey Adam, Do you need to scan all of the entries in order to know which ones you need to change the expiration of? Or do you have that information as an input? As for why you can't insert an older version, it is because HBase sorts all columns in descending version order regardless of
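The effect of descending version order can be shown with an ordinary sorted map keyed by timestamp. This is a toy model of a single cell's versions, not HBase code; `VersionOrder` and `latestAfterWrites` are hypothetical names.

```java
import java.util.Comparator;
import java.util.TreeMap;

class VersionOrder {
    // Applies writes in the given order and returns the value a reader sees first,
    // with versions kept sorted by timestamp, newest first.
    static String latestAfterWrites(long... timestamps) {
        TreeMap<Long, String> versions =
                new TreeMap<Long, String>(Comparator.<Long>reverseOrder());
        for (long ts : timestamps) {
            versions.put(ts, "v@" + ts);
        }
        return versions.firstEntry().getValue();
    }

    public static void main(String[] args) {
        // The write with timestamp 100 arrives after the write with timestamp 200.
        System.out.println(latestAfterWrites(200L, 100L));
    }
}
```

In this model a write carrying an older timestamp is stored, but it sorts behind the newer version, so a read of the latest version does not return it.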

RE: Modifying existing table entries

2010-12-14 Thread Jonathan Gray
entries On 12/14/10 12:57 AM, Jonathan Gray wrote: Hey Adam, Do you need to scan all of the entries in order to know which ones you need to change the expiration of? Or do you have that information as an input? I don't have to scan everything, but I also can't pinpoint all the entries

RE: Composite key, scan on partial key

2010-12-14 Thread Jonathan Gray
There might be a little confusion. Specifying start/stop rows vs. scanning all rows with a filter... yes, clearly the start/stop is far more efficient. What Ryan is talking about is specifying the start row and then using a filter to determine when you're done with the rows you want. In this

RE: Confusion on the role of regionserver

2010-12-14 Thread Jonathan Gray
The 5 RS will be connecting to all 10 DNs. However, when writing to HDFS the first replica always goes to the local node. Because of this, the 5 DNs that are hosting the 5 RS could potentially have more data than the other 5 DNs. In almost all installations I've been a part of the #RS == #DN

RE: Recommended setup for a small cluster

2010-12-14 Thread Jonathan Gray
That sounds right. One node would have NN, HMaster, and ZK. Others would have DN and RS. You could put the SNN on any of the slave nodes I suppose. -Original Message- From: Nanheng Wu [mailto:nanhen...@gmail.com] Sent: Tuesday, December 14, 2010 10:12 PM To: user@hbase.apache.org

RE: HBase stability

2010-12-13 Thread Jonathan Gray
HBase is not designed or well tested for production or stability on 2 nodes. It will work on 2 nodes, but do not expect good performance or stability. What is the hardware configuration and daemon setup on this cluster of 2 nodes? How many cores, spindles, RAM, heap sizes etc... And you have

RE: Hive HBase integration scan failing

2010-12-10 Thread Jonathan Gray
Hey, Need some more info. Can you paste logs from the MR tasks that fail? What's going on in the cluster while the MR job is running (cpu, io-wait, memory, etc)? And what is the setup of your cluster... how many nodes, specs of nodes (cores, memory, RS heap), and then how many concurrent map

RE: Is my data losed?

2010-12-09 Thread Jonathan Gray
Jiajun, Hard to say whether you've lost data or not. Something looks wrong with HDFS. What versions of HBase and HDFS are you running? What's going on in the logs of the DataNodes and the NameNode when this is happening? What about the dfs web ui? Try running Hadoop fsck to see what's up

[ANNOUNCE] HBase Hackathon: Coprocessor Edition, December 13th @ Facebook

2010-12-02 Thread Jonathan Gray
What? HBase Hackathon: Coprocessor Edition When? December 13, 2010 @ 11AM Where? Facebook, Palo Alto Sign up here: http://www.meetup.com/hackathon/calendar/15597555/ Lunch, dinner, and beers will be provided. From meetup announcement... With HBase 0.90 near release, it's time to shift

RE: Scalability on multi-core machines

2010-11-30 Thread Jonathan Gray
The recommended setup if you want to put RS and ZK on the same node, is to ensure ZK has its own dedicated disk. -Original Message- From: Michael Segel [mailto:michael_se...@hotmail.com] Sent: Tuesday, November 30, 2010 5:35 AM To: user@hbase.apache.org Subject: RE: Scalability on

RE: Schema design, one-to-many question

2010-11-29 Thread Jonathan Gray
Hey Bryan, All of these approaches could work and seem sane. My preference these days would be the wide-table approach (#2, 3, 4) rather than the tall table. Previously #1 was more efficient but in 0.90 and beyond the same optimizations exist for both tall and wide tables. For #2, I would

RE: question about meta data query intensity

2010-11-23 Thread Jonathan Gray
are storing 1 Petabyte of data of images into hbase). -Jack On Tue, Nov 23, 2010 at 9:50 AM, Jonathan Gray jg...@fb.com wrote: It is possible that it could be a bottleneck but usually is not.  Generally production HBase installations have long-lived clients, so the client-side caching

RE: question about meta data query intensity

2010-11-23 Thread Jonathan Gray
long tail hits that will be uncached, which may stress out meta region, that being said, is it possible create affinity and nail meta region into a beefy server or set of beefy servers? -Jack On Tue, Nov 23, 2010 at 10:58 AM, Jonathan Gray jg...@fb.com wrote: Are you going to have long

RE: Newbie question

2010-11-15 Thread Jonathan Gray
://www.brianfrankcooper.net/pubs/ycsb-v4.pdf hari On Mon, Nov 15, 2010 at 4:20 AM, Jonathan Gray jg...@facebook.com wrote: HBase is well-suited for a high-write workload. Hari, I'm not sure what would be different in a database like Cassandra with respect to updates

RE: Regionserver not shutting down automatically

2010-11-08 Thread Jonathan Gray
Hari, When you issue a shutdown to the master process, it performs a full cluster shutdown. You don't have to issue regionserver stops from the shell, the Master takes care of it over RPC. You can stop an individual regionserver (bin/hbase-daemon.sh stop regionserver) but if you're doing a

RE: Hbase insertion process cause to region server down.

2010-11-08 Thread Jonathan Gray
NSRE is normal, this happens when regions move around and your client needs to update the location. That seems like an awful lot of mappers/reducers on a 5 server / dual core setup... You have only 2 cores per server but you have a DataNode, RegionServer, and 4 map tasks and 3 reduce tasks?

RE: XML Files Design Question

2010-11-08 Thread Jonathan Gray
I'd recommend HBase over HDFS with file sizes in that range. It will be faster and far more scalable while inheriting the same durability guarantees you get from HDFS. -Original Message- From: Barney Frank [mailto:barneyfran...@gmail.com] Sent: Monday, November 08, 2010 11:04 AM To:

RE: Where do you get your hardware?

2010-11-05 Thread Jonathan Gray
Just avoid the dell hard drives, they are a super-rip off. Which btw means you'll have to avoid dells, because the _only_ way to get the dell disk trays which are required is to buy dell hard drives (3-4x markup btw). +1 on crazy markup. But there actually are some online retailers out

RE: Unexpected shell behavior -- changing one column family attribute resets the others to the default

2010-11-05 Thread Jonathan Gray
This is fixed in trunk. There was a bug that was resetting other options to defaults. -Original Message- From: Buttler, David [mailto:buttl...@llnl.gov] Sent: Friday, November 05, 2010 3:57 PM To: user@hbase.apache.org Subject: Unexpected shell behavior -- changing one column

RE: HBase as a versioned key/value store

2010-11-03 Thread Jonathan Gray
Hi Wojciech, HBase can easily be used as a versioned key/value store. I'd say that's one of the easiest ways to use it. To help you get more throughput, you'll have to provide more details. What version are you running, what kind of hardware / configuration, and what does your client look

RE: HBase as a versioned key/value store

2010-11-03 Thread Jonathan Gray
[mailto:wlangiew...@gmail.com] Sent: Wednesday, November 03, 2010 7:15 AM To: user@hbase.apache.org Subject: Re: HBase as a versioned key/value store Hello, 2010/11/3 Jonathan Gray jg...@facebook.com Hi Wojciech, HBase can easily be used as a versioned key/value store. I'd say that's one

RE: Sanity date time check when a region server joins the cluster

2010-10-31 Thread Jonathan Gray
wouldn't mind tackling this problem. How much of a skew do we want to allow between the RS and the rest of the cluster? ~Jeff On 10/28/2010 12:08 PM, Jonathan Gray wrote: I was discussing this exact issue this morning. Ran into a problem where master was timing out

RE: Time-series schema

2010-10-29 Thread Jonathan Gray
There is no such atomicity provided by HBase. Recent TableIndexed may help, but I have not personally tried it. Uhm actually there is. :-) Like I said in the other post, when you insert the rows, you can fetch the local time on the node and use it when you insert the row as the

RE: Time-series schema

2010-10-29 Thread Jonathan Gray
a TableIndexed fits, would not an RDBMS be a better choice? Sean On Fri, Oct 29, 2010 at 7:01 PM, Jonathan Gray jg...@facebook.com wrote: There is no such atomicity provided by HBase. Recent TableIndexed may help, but I have not personally tried it. Uhm actually

RE: Sanity date time check when a region server joins the cluster

2010-10-28 Thread Jonathan Gray
I was discussing this exact issue this morning. Ran into a problem where master was timing out a region in transition because the RS was 5 minutes behind the master. I like the idea of the RS sending its timestamp on startup and if it is outside a certain threshold, the master throws it a
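The startup check proposed here amounts to comparing two clocks against a tolerance. The sketch below shows only that comparison; `ClockSkewCheck`, `withinSkew`, and the 30-second threshold are assumptions made for illustration, not the check HBase actually implements.

```java
class ClockSkewCheck {
    // True when the region server's reported time is close enough to the master's.
    static boolean withinSkew(long masterMillis, long regionServerMillis, long maxSkewMillis) {
        return Math.abs(masterMillis - regionServerMillis) <= maxSkewMillis;
    }

    public static void main(String[] args) {
        long masterNow = System.currentTimeMillis();
        long fiveMinutesBehind = masterNow - 5L * 60 * 1000; // the situation described in the post
        long maxSkew = 30_000L;                              // assumed 30 second tolerance
        System.out.println("accepted: " + withinSkew(masterNow, fiveMinutesBehind, maxSkew));
    }
}
```

The check is symmetric, so a region server running ahead of the master would be rejected just like one running behind.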

RE: Contributing to hbase but test with less hardware

2010-10-26 Thread Jonathan Gray
One option is to use EC2 to spin up a cluster for a short period of time and test on it, but that brings along its own set of complications. What kind of things are you hoping to contribute? I would say the best way to do things if you don't have large clusters to test on is write lots of good

RE: best way to clear inconsistencies?

2010-10-24 Thread Jonathan Gray
You may have had some duplicate assignment issues, so there were some regions being double counted. The latest version of HBCK has some fixup stuff and I'm working on adding more repair functionality to it. Should get into 0.90/trunk this week. If you're on an 0.89 release, you might be able

RE: large store file split

2010-10-22 Thread Jonathan Gray
Hey Jack, Seems like you're getting a lot of strange ZooKeeper behavior. How many nodes are you running with in your quorum? Do you have any weird networking issues? Check out the ZK server logs as well and see if there's anything suspicious going on in there. Also, if you enable ZK debug

RE: The hfile.block.cache.size = 0 performance is better than default(0.2) in random read? Is it possible?

2010-10-21 Thread Jonathan Gray
By using the block cache, read blocks are referenced within the block cache data structures and referenced for a longer amount of time than if not put into the block cache. This will definitely add additional stress to the GC. If you expect a very low hit ratio, it can be advantageous to not

RE: HBase random access in HDFS and block indices

2010-10-18 Thread Jonathan Gray
Hi William. Answers inline. -Original Message- From: William Kang [mailto:weliam.cl...@gmail.com] Sent: Monday, October 18, 2010 7:48 PM To: hbase-user Subject: HBase random access in HDFS and block indices Hi, Recently I have spent some efforts to try to understand the

RE: HBase random access in HDFS and block indices

2010-10-18 Thread Jonathan Gray
HFiles are generally 256MB and default block size is 64K, so that's 4000 blocks (1/16th what you said). That would have a more reasonable block index of 200K. But the block index is kept in-memory so you only read it once when the file is first opened. So even if you do lower the block size
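The arithmetic in this reply can be checked with a few lines. The file and block sizes come from the post; the figure of roughly 50 bytes per index entry is only what 200K divided by 4000 blocks implies, not a documented constant, and `BlockIndexEstimate` is a hypothetical name.

```java
class BlockIndexEstimate {
    // Number of blocks needed to hold a file, rounding up.
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Approximate in-memory index size for a given per-entry cost.
    static long indexBytes(long blocks, long bytesPerEntry) {
        return blocks * bytesPerEntry;
    }

    public static void main(String[] args) {
        long fileBytes = 256L * 1024 * 1024; // 256 MB HFile, from the post
        long blockBytes = 64L * 1024;        // 64 KB block size, from the post
        long blocks = blockCount(fileBytes, blockBytes);
        long index = indexBytes(blocks, 50L); // ~50 bytes per entry is an inferred assumption
        System.out.println(blocks + " blocks, index of about " + (index / 1024) + " KB");
    }
}
```

256 MB divided by 64 KB is exactly 4096 blocks, which the post rounds to 4000. Halving the block size doubles the block count and, under this model, the index size.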

RE: Cornercase issue when deleting columns with cells on timestamp 0

2010-10-07 Thread Jonathan Gray
Definitely file a new JIRA and put the test case up on it. This is probably an independent issue from most of the other TS/delete issues. You guys are good at finding these ;) Keep it up! JG From: Evert Arckens [mailto:ev...@outerthought.org] Sent: Thursday, October 07, 2010 2:13 AM To:

RE: stopping namenode and regionservers

2010-10-06 Thread Jonathan Gray
Currently HBase cannot ride over an HDFS restart. Might be feasible in the future but not currently planned. Some of the NameNode HA solutions might indirectly address this. Why is it that you need to restart your namenode? -Original Message- From: Jack Levin

RE: stopping namenode and regionservers

2010-10-06 Thread Jonathan Gray
of namenode for any reason. Generally, I guess I should be stopping regionservers before namenode restart, at least I won't generate unflushed data. -Jack On Tue, Oct 5, 2010 at 11:25 PM, Jonathan Gray jg...@facebook.com wrote: Currently HBase cannot ride over an HDFS restart. Might

RE: Paid OSS task for performing manual major compactions

2010-10-05 Thread Jonathan Gray
HBASE-917 looks relevant too. -Original Message- From: Andrew Purtell [mailto:apurt...@apache.org] Sent: Tuesday, October 05, 2010 11:50 AM To: user@hbase.apache.org Subject: Re: Paid OSS task for performing manual major compactions From: Daniel Einspanjer Mozilla recently

RE: How do I setup authentication/permissions for an hbase database?

2010-10-04 Thread Jonathan Gray
The layer hiding your cluster is a firewall. You would permit only an explicit set of IP addresses permission to access HBase. Your client(s) would be coming from a given set of servers and you would know the IPs of those servers. Exceptions would be added to your firewall to allow those IPs

RE: How do you increase the max cell size in Hbase? - more info on the error seen

2010-10-03 Thread Jonathan Gray
Sorry, forgot to respond to this earlier. The configuration parameter you need to change is 'hbase.client.keyvalue.maxsize' Setting it to 0 will remove the limit. Let me know if this does not help. -Original Message- From: Taylor, Ronald C [mailto:ronald.tay...@pnl.gov] Sent:

RE: Client`s cache invalidation

2010-10-02 Thread Jonathan Gray
to be the only one mentioned. Am I missing something ? Thanks, Naresh. On 10/01/2010 09:40 PM, Jonathan Gray wrote: Yes. RegionServers will throw a NotServingRegionException. This, in turn, will cause the client to grab the location from META again. -Original Message- From

RE: Client`s cache invalidation

2010-10-01 Thread Jonathan Gray
Yes. RegionServers will throw a NotServingRegionException. This, in turn, will cause the client to grab the location from META again. -Original Message- From: Naresh Rapolu [mailto:nrap...@purdue.edu] Sent: Friday, October 01, 2010 5:35 PM To: user@hbase.apache.org Subject:

HBase User Group NYC - October 11 - Night before Hadoop World

2010-09-27 Thread Jonathan Gray
Hello HBasers, A bit of a late announcement to the list, but there is a HUG meetup in NYC on October 11, the night before Hadoop World. More information here: http://www.meetup.com/hbaseusergroup/calendar/14606174/ The meetup is being hosted by StumbleUpon at their NYC offices. Snacks and

RE: hbase doesn't delete data older than TTL in old regions

2010-09-15 Thread Jonathan Gray
This sounds reasonable. We are tracking min/max timestamps in storefiles too, so it's possible that we could expire some files of a region as well, even if the region was not completely expired. Jinsong, mind filing a jira? JG -Original Message- From: Jinsong Hu

RE: ycsb test on hbase

2010-09-10 Thread Jonathan Gray
there is some work going on to do concurrent priority compaction (Jonathan Gray has been working on it) but I haven't seen anything yet in hbase and don't know the time line. My personal opinion is that we should integrate the patch into trunk and use it until the more advanced compactions

RE: How to delete a range of table ?

2010-09-10 Thread Jonathan Gray
Lots of reasons. Given the way deletes currently work, it would be extremely expensive to process multi-row deletes. At this point there are already people questioning if we should have row/family deletes because they are expensive to process. If we move towards a new delete mechanism or

RE: Problem with bulk incremental loads..

2010-09-10 Thread Jonathan Gray
I ran into something like this as well but we were in a rush to get the import done so didn't look into it. I forgot about it so didn't follow up. We ended up ensuring regions would not be split during the job (configuring the split size way up) and reran the MR job. JG -Original

RE: Problem with bulk incremental loads..

2010-09-10 Thread Jonathan Gray
. And in this case, the system goes to the mode of repeatedly splitting this Hfile... Shall I report a bug and follow up on it? Vidhya On 9/10/10 1:42 PM, Jonathan Gray jg...@facebook.com wrote: I ran into something like this as well but we were in a rush to get the import done so didn't

RE: concerns surrounding using timestamp with Put

2010-09-08 Thread Jonathan Gray
Hi Doug, Out of order insertion of timestamps is supported in 0.89/0.90/trunk but not fully supported in the 0.20.x series. Primarily, you can see some weird stuff using Gets in 0.20 if you do out of order timestamp insertion. Scans are mostly okay. JG -Original Message- From:

RE: Limits on HBase

2010-09-07 Thread Jonathan Gray
no matter the size of the heap. When/if HBase RPC can send large objects in smaller chunks, this will be less of an issue. Best regards, - Andy Why is this email five sentences or less? http://five.sentenc.es/ --- On Mon, 9/6/10, Jonathan Gray jg...@facebook.com wrote

RE: Limits on HBase

2010-09-06 Thread Jonathan Gray
I'm not sure what you mean by optimized cell size or whether you're just asking about practical limits? HBase is generally used with cells in the range of tens of bytes to hundreds of kilobytes. However, I have used it with cells that are several megabytes, up to about 50MB. Up at that

RE: question about RegionManager

2010-09-06 Thread Jonathan Gray
) { break; } } return nRegions; } 2010/9/7 Jonathan Gray jg...@facebook.com That code does actually exist in the latest 0.89 release. It was a protection put in place to guard against a weird behavior that we had seen during load balancing. As Ryan suggests

RE: Please help me overcome HBase's weaknesses

2010-09-04 Thread Jonathan Gray
But your boss seems rather to be criticizing the fact that our system is made of components. In software engineering, this is usually considered a strength. As to 'roles', one of the Bigtable authors argues that a cluster of master and slaves makes for simpler systems [1]. I

RE: how many regions a regionserver can support

2010-09-01 Thread Jonathan Gray
of the total number of regions that can be supported and I don't run into this IO issue. Can anybody show us an actual example of the HBase data size and cluster size? Jimmy. -- From: Jonathan Gray jg...@facebook.com Sent: Friday, August 27

RE: Slow Inserts on EC2 Cluster

2010-09-01 Thread Jonathan Gray
Been doing lots of importing recently. There are two easy ways to get big performance boosts. The first is HFileOutputFormat. It works into existing tables now. Consistently see 10X+ performance this way versus API. If you must use the API, pre-create a bunch of regions for your table. You
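As a rough illustration of the pre-creation advice above, the split points for a new table can be computed up front. This sketch assumes row keys are spread uniformly over a fixed-width binary key space, which may not hold for real data; `even_split_keys` is a hypothetical helper, not part of any HBase API.

```python
from typing import List


def even_split_keys(num_regions: int, key_bytes: int = 4) -> List[bytes]:
    """Return num_regions - 1 split keys dividing a fixed-width,
    big-endian key space into roughly equal ranges.

    Assumes uniformly distributed keys (an assumption of this sketch).
    """
    if num_regions < 1:
        raise ValueError("num_regions must be at least 1")
    space = 256 ** key_bytes
    return [
        (i * space // num_regions).to_bytes(key_bytes, "big")
        for i in range(1, num_regions)
    ]


# Four regions over a one-byte key prefix need three split points.
splits = even_split_keys(4, key_bytes=1)
print([s.hex() for s in splits])  # ['40', '80', 'c0']
```

The resulting keys would then be passed to whatever table-creation call the client version in use provides for pre-splitting.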

RE: how many regions a regionserver can support

2010-08-27 Thread Jonathan Gray
There is no fixed limit, it has much more to do with the read/write load than the actual dataset size. HBase is usually fine having very densely packed RegionServers, if much of the data is rarely accessed. If you have extremely high numbers of regions per server and you are writing to all of

RE: Initial region loads in hbase..

2010-08-27 Thread Jonathan Gray
Vidhya, Could you post a snippet of an RS log during this time? You should be able to see what's happening between when the OPEN message gets there and the OPEN completes. Like Stack said, it's probably that it's single-threaded in the version you're using and with all the file opening, your

RE: Best way to get multiple non-sequential rows

2010-08-25 Thread Jonathan Gray
Yes, something like: List<Result> multiGet(List<Get> gets, int maxThreads) In general, you should assume that HTable instances are not thread-safe. Behind the scenes, HTables are sharing TCP connections to RS, but from client POV you should have one HTable per thread per table. -Original
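The "one handle per thread" pattern described above can be sketched in a language-neutral way as follows. `FakeTable` is a hypothetical stand-in for a table handle that is not thread-safe, and `multi_get` and `open_table` are names made up for this example; none of this is the HBase client API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeTable:
    """Hypothetical stand-in for a table handle that is not thread-safe."""

    def __init__(self, data):
        self._data = data

    def get(self, row):
        return self._data.get(row)


def multi_get(open_table, rows, max_threads):
    """Fetch rows in parallel, using one table handle per worker thread.

    Results are returned in the same order as `rows`.
    """
    local = threading.local()

    def fetch(row):
        # Lazily open a handle the first time each worker thread runs,
        # so no handle is ever shared between threads.
        if not hasattr(local, "table"):
            local.table = open_table()
        return local.table.get(row)

    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(fetch, rows))


data = {"row1": "a", "row7": "b", "row42": "c"}
results = multi_get(lambda: FakeTable(data), ["row42", "row1", "missing"], max_threads=2)
print(results)  # ['c', 'a', None]
```

Thread-local storage keeps the number of open handles bounded by `max_threads` while still letting the caller treat the whole batch as a single call.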

RE: HBase: project ideas

2010-08-19 Thread Jonathan Gray
Himanshu, Seems like you might have an interest in using Coprocessors to do stuff like low-latency aggregates. This is a big area of interest for some of us but not a lot of concerted effort in this direction yet. There is plenty to do here for a research project. Check out:

RE: Secondary Index versus Full Table Scan

2010-08-04 Thread Jonathan Gray
Also, seek/reseek hooks in the filters will allow skipping of blocks. For some queries (returning a high % of total data) this won't matter, but for sparser filters that want to jump it can be significant. These are being worked on by an intern here and should have some patches up in a

RE: Zero-copy reads

2010-07-27 Thread Jonathan Gray
Can you provide more links to comments in jira mentioning loss of zero copy reads? Basically what this is referring to are changes made in the 0.20 release of HBase related to the block-based HFile format, the KeyValue data pointer, and other stuff like the Result client return type and the

RE: Flaky tableExists()

2010-07-20 Thread Jonathan Gray
scanning, it would not necessarily change the implementation of these client (to RS) calls. Hard to say what would make this flaky besides some of the older bugs around lots of META StoreFiles. What version of HBase are you running? JG -Original Message- From: Jonathan Gray [mailto:jg
