Unique ID per URL

2012-05-17 Thread Amit Sela
Hi all, One of our HBase tables uses the URL as its row key. I read that it is recommended to build the URL key as: reversed domain + URL ID (using a unique ID per URL). I understand the reversed-domain part, but could anyone elaborate on the unique ID per URL, maybe give an example? Thanks.

Timing problems with getScanner()

2012-05-17 Thread Belussi
Hi, I have a really strange timing problem with the getScanner() method. I have a cluster with 3 nodes. If I want to read 10 rows (using the Java API, 50 bytes per row), ~50% of reads take 10 ms and the other ~50% take more than 1000 ms. I didn't find any dependencies between those results and hbase

Re: hbase security

2012-05-17 Thread Eugene Koontz
On 5/15/12 2:24 AM, Harsh J wrote: HBase 0.92 has table-level security (among other goodies). Check out this slide on what all it includes: http://www.slideshare.net/ghelmling/new-hbase-features-coprocessors-and-security There was also a good blog post earlier on how to set it up, but am

Consider individual RSs performance when writing records with random keys?

2012-05-17 Thread Alex Baranau
Hi, 1. Not sure if you've seen the HBaseWD (https://github.com/sematext/HBaseWD) project. It implements the salted-key (prefix) approach for writing monotonically increasing row keys/time-series data. Simplified, the idea is to add a random prefix to the row key so that writes end up on different
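The prefix idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the HBaseWD API: the class `SaltedKeys`, its method names, and the one-byte bucket prefix are all hypothetical choices made for this example. It shows the trade-off that comes with salting: writes are spread across buckets, but a range read has to issue one scan per bucket and merge the results.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration only; not part of HBase or HBaseWD.
final class SaltedKeys {
    /** Prepends a one-byte bucket id to the original row key. */
    static byte[] withPrefix(int bucket, byte[] key) {
        byte[] out = new byte[key.length + 1];
        out[0] = (byte) bucket;
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    /** Picks a random bucket in [0, buckets) so sequential keys spread out. */
    static byte[] saltForWrite(byte[] key, int buckets) {
        if (buckets < 1 || buckets > 256) {
            throw new IllegalArgumentException("buckets must be in 1..256");
        }
        return withPrefix(ThreadLocalRandom.current().nextInt(buckets), key);
    }

    /** A range read needs one scan per bucket; these are the start keys. */
    static byte[][] scanStartKeys(byte[] startKey, int buckets) {
        byte[][] keys = new byte[buckets][];
        for (int b = 0; b < buckets; b++) {
            keys[b] = withPrefix(b, startKey);
        }
        return keys;
    }

    /** Strips the prefix to recover the original row key. */
    static byte[] originalKey(byte[] salted) {
        return Arrays.copyOfRange(salted, 1, salted.length);
    }
}
```

Because the prefix is random, a point lookup by original key also has to probe every bucket; deriving the bucket from a hash of the key is a common alternative when point reads matter.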

Re: Unique ID per URL

2012-05-17 Thread sagar naik
We use md5(url). That gives us a good distribution. -Sagar On Thu, May 17, 2012 at 1:02 AM, Amit Sela am...@infolinks.com wrote: Hi all, One of our HBase tables holds URL as a row key. I read that it is recommended to hold the URL key as: reversed domain + URL ID (using unique id per url.)
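A minimal sketch of how the two pieces mentioned in this thread could fit together: the reversed domain as a prefix and md5(url) as the per-URL ID. Combining them this way is an assumption about the recommendation the original poster read; the reply above only says md5(url) is used. The class `UrlRowKey`, its method names, and the `|` separator are hypothetical.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration only; not an HBase API.
final class UrlRowKey {
    /** Reverses the host labels: "www.example.com" becomes "com.example.www". */
    static String reverseDomain(String host) {
        String[] labels = host.split("\\.");
        StringBuilder sb = new StringBuilder();
        for (int i = labels.length - 1; i >= 0; i--) {
            sb.append(labels[i]);
            if (i > 0) {
                sb.append('.');
            }
        }
        return sb.toString();
    }

    /** Hex-encoded MD5 of the string's UTF-8 bytes (always 32 characters). */
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Row key = reversed domain + separator + md5 of the full URL. */
    static String rowKey(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        return reverseDomain(host) + "|" + md5Hex(url);
    }
}
```

With this layout, rows for one domain sort together (useful for per-domain scans), while the fixed-length hash keeps keys short and unique per URL. Using md5(url) alone, as in the reply above, gives an even distribution but gives up the per-domain locality.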

Re: EndPoint Coprocessor could be deadlocked?

2012-05-17 Thread Michael Segel
You should not let just any user run coprocessors on the server. That's madness. Best regards, - Andy Fei Ding, I'm a little confused. Are you trying to solve the problem of querying data efficiently from a table, or are you trying to find an example of where and when to use

Re: Schedule major compaction programmatically

2012-05-17 Thread Jimmy Xiang
I am thinking of adding a function to check whether a table or region is in compaction (major or minor). I filed HBASE-6033. It won't show the status of a specific compaction request. Will this help? Thanks, Jimmy On Thu, May 17, 2012 at 11:11 AM, Chen Song chen.song...@gmail.com wrote: I would like to

Re: hbase security

2012-05-17 Thread Andrew Purtell
On 5/15/12 2:24 AM, Harsh J wrote: P.s. If you're making it to HBaseCon, you may not wanna miss http://www.hbasecon.com/sessions/hbase-security-for-the-enterprise/ which also includes a tutorial (from Andrew). Given the time constraints on the material I have to present and QA, what I'm doing

Re: Schedule major compaction programmatically

2012-05-17 Thread Chen Song
Thanks Jimmy. Meanwhile, is there a workaround for this? How does compact/major_compact issued from the hbase shell handle this under the hood? Does it eventually call the HBaseAdmin API or a synchronous HRegion API? Thanks Chen On Thu, May 17, 2012 at 2:24 PM, Jimmy Xiang jxi...@cloudera.com

Re: Schedule major compaction programmatically

2012-05-17 Thread Jimmy Xiang
It is an async call to the region server to request a compaction. Once the request is accepted, the call returns. There is no sync call here. The request is queued and processed by a pool of threads. Currently, there is a metric to show the queue size. But it doesn't tell how many are for
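The behaviour described above (the request is accepted, the call returns, a thread pool drains the queue, and only the queue size is visible) can be modelled with a toy example in plain Java. This is not HBase code: `CompactionQueueModel` and its methods are hypothetical names used only to illustrate why a caller cannot tell from the queue size alone whether its own request has finished.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Toy model of an asynchronous request queue; not the region server's implementation.
final class CompactionQueueModel {
    private final ThreadPoolExecutor pool;

    CompactionQueueModel(int threads) {
        // Fixed-size pool with an unbounded queue: requests are always accepted.
        pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    /** Returns as soon as the request is queued; it does not wait for the work. */
    void requestCompaction(Runnable work) {
        pool.execute(work);
    }

    /** Accepted requests that no worker has picked up yet (the only metric exposed). */
    int queueSize() {
        return pool.getQueue().size();
    }

    /** Stops accepting requests and waits up to the given time for queued work to finish. */
    boolean shutdownAndWait(long millis) {
        pool.shutdown();
        try {
            return pool.awaitTermination(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In this model a queue size of zero only means nothing is waiting; a request may still be running on a worker thread, which is one reason polling the queue size is a coarse workaround at best.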

Re: Schedule major compaction programmatically

2012-05-17 Thread Chen Song
Can you direct me to the API call to get the queue size metrics? On Thu, May 17, 2012 at 2:58 PM, Jimmy Xiang jxi...@cloudera.com wrote: It is an async call to the region server to request a compaction. Once the request is accepted, the call returned. There is no sync call here. The

Re: Schedule major compaction programmatically

2012-05-17 Thread Jimmy Xiang
HRegionServer.java: this.metrics.compactionQueueSize.set(compactSplitThread.getCompactionQueueSize()); On Thu, May 17, 2012 at 12:00 PM, Chen Song chen.song...@gmail.com wrote: Can you direct me to the API call to get the queue size metrics? On Thu, May 17, 2012 at 2:58 PM,

Re: hbase security

2012-05-17 Thread Stack
On Thu, May 17, 2012 at 7:19 AM, Eugene Koontz ekoo...@hiro-tan.org wrote: http://web.archive.org/web/20101031022526/http://hbaseblog.com/2010/10/11/secure-hbase-access-controls/ http://web.archive.org/web/20100817034022/http://hbaseblog.com/2010/07/21/up-and-running-with-secure-hadoop/

Re: Schedule major compaction programmatically

2012-05-17 Thread Chen Song
Sorry for another dumb question. As I am querying such information in client code, how do I get an HRegionServer from an HRegionInfo or HServerAddress? I found a way to get an HRegionInterface, shown below. HConnection.getHRegionConnection(HServerAddress) But the getMetrics method is not exposed on

Re: hbase security

2012-05-17 Thread Gary Helmling
I could repost the up and running with secure hadoop one. But it's kind of out of date at this point. I remember, back when the site was still up, getting some comments on it about things that had already changed in the 0.20.20X releases. I can take a look and see how bad it is. On Thu, May

Re: hbase security

2012-05-17 Thread Eugene Koontz
On 5/17/12 1:22 PM, Stack wrote: On Thu, May 17, 2012 at 7:19 AM, Eugene Koontz ekoo...@hiro-tan.org wrote: http://web.archive.org/web/20101031022526/http://hbaseblog.com/2010/10/11/secure-hbase-access-controls/

Re: client timeouts after upgrading to 0.92

2012-05-17 Thread Jean-Daniel Cryans
This means that the servers aren't responding to the clients within 60 seconds. I believe this is new since 0.90, so it could be that you were used to having long-running requests. If not, check what's going on with those servers at the address given in the exception message. J-D On Thu, May 17, 2012

hbase data

2012-05-17 Thread Rita
Hello, Currently, using hbase to store sensor data -- basically large time series data hitting close to 2 billion rows for a type of sensor. I was wondering how hbase differs from HDF (http://www.hdfgroup.org/HDF5/) file format. Most of my operations are scanning a range and getting its values

Trailer 'header' is wrong; does the trailer size match content

2012-05-17 Thread Something Something
Hello, I keep getting this message while running the 'completebulkload' process. I tried the following solutions that I came across while Googling for this error: 1) setReduceSpeculativeExecution(true) 2) Made sure that none of the tasks are failing. 3) The HFileOutput job runs

Re: Trailer 'header' is wrong; does the trailer size match content

2012-05-17 Thread Ted Yu
Can you post the complete message? What HBase version are you using? On Thu, May 17, 2012 at 4:48 PM, Something Something mailinglist...@gmail.com wrote: Hello, I keep getting this message while running the 'completebulkload' process. I tried the following solutions that I came across

Re: Trailer 'header' is wrong; does the trailer size match content

2012-05-17 Thread Something Something
HBase Version: hbase-0.90.4-cdh3u3 Hadoop Version: hadoop-0.20.2-cdh3u2 12/05/17 16:37:47 ERROR mapreduce.LoadIncrementalHFiles: IOException during splitting java.util.concurrent.ExecutionException: java.io.IOException: Trailer 'header' is wrong; does the trailer size match content?

Re: EndPoint Coprocessor could be deadlocked?

2012-05-17 Thread fding hbase
Hi Michel, On Fri, May 18, 2012 at 1:39 AM, Michael Segel michael_se...@hotmail.com wrote: You should not let just any user run coprocessors on the server. That's madness. Best regards, - Andy Fei Ding, I'm a little confused. Are you trying to solve the problem of querying data