hbase-server required on CP for running MR jobs?

2014-11-05 Thread Tim Robertson
Hey folks, I'm upgrading an application from CDH4.3 to CDH5.2, so jumping from 0.94 to 0.98, and wanted to ask for confirmation on the dependencies now that HBase has split into hbase-client, hbase-server, etc. If I am submitting MR jobs (to YARN) that use things like TableMapReduceUtil it seems
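
For context, the kind of code that raises the question - a minimal sketch of a table-scanning MR driver built on TableMapReduceUtil (0.98-era API; the table name and the row-counting mapper are made up for illustration):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class ScanJob {

      // Hypothetical mapper: just counts the rows it sees.
      static class RowCountMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx)
            throws IOException, InterruptedException {
          ctx.getCounter("scan", "rows").increment(1);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "scan-example");
        job.setJarByClass(ScanJob.class);
        Scan scan = new Scan();
        scan.setCaching(500);        // fewer round trips per mapper
        scan.setCacheBlocks(false);  // don't churn the block cache from MR
        // This call is what drags in hbase-server: the mapreduce
        // package ships inside that module in 0.98.
        TableMapReduceUtil.initTableMapperJob("mytable", scan, RowCountMapper.class,
            NullWritable.class, NullWritable.class, job);
        job.setOutputFormatClass(NullOutputFormat.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }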

Re: hbase-server required on CP for running MR jobs?

2014-11-05 Thread Ted Yu
Your observation is right. There isn't an hbase-mapreduce module yet, so for now you need to include the hbase-server module. Cheers On Nov 5, 2014, at 1:27 AM, Tim Robertson timrobertson...@gmail.com wrote: Hey folks, I'm upgrading an application from CDH4.3 to CDH5.2 so jumping from 0.94 to

Re: hbase-server required on CP for running MR jobs?

2014-11-05 Thread Tim Robertson
Thanks for confirming, Ted. I'll use hbase-server and then exclude most of the transitive dependencies and the hadoop-core (MR1) artifacts. I can't find a JIRA for this so will write one up. Cheers, Tim On Wed, Nov 5, 2014 at 10:59 AM, Ted Yu yuzhih...@gmail.com wrote: Your observation is
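
A sketch of what those exclusions can look like in a pom.xml (the version string and the exact exclusion list are assumptions; trim to whatever your job actually pulls in):

    <dependency>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-server</artifactId>
      <version>0.98.6-cdh5.2.0</version>
      <exclusions>
        <!-- MR1 artifact, not needed when submitting to YARN -->
        <exclusion>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-core</artifactId>
        </exclusion>
        <!-- web UI dependency the job itself never uses -->
        <exclusion>
          <groupId>org.mortbay.jetty</groupId>
          <artifactId>jetty</artifactId>
        </exclusion>
      </exclusions>
    </dependency>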

Re: hbase-server required on CP for running MR jobs?

2014-11-05 Thread Ted Yu
See this JIRA: https://issues.apache.org/jira/browse/HBASE-11549 Cheers On Nov 5, 2014, at 4:58 AM, Tim Robertson timrobertson...@gmail.com wrote: Thanks for confirming Ted I'll use hbase-server and then exclude most of the transient dependencies, and the hadoop-core (MR1 stuff). I can't

Re: No FileSystem for scheme: file

2014-11-05 Thread Sean Busbey
How are you submitting the job? How are your cluster configuration files deployed (i.e. are you using CM)? On Wed, Nov 5, 2014 at 8:50 AM, Tim Robertson timrobertson...@gmail.com wrote: Hi all, I'm seeing the following java.io.IOException: No FileSystem for scheme: file at

Re: No FileSystem for scheme: file

2014-11-05 Thread Tim Robertson
Hi Sean, We are using CM, and Hue, Hive, etc. all work, but for some reason I can't get the CP correct for this job, which I submit using: java -cp :$HADOOP_HOME/hdfs/hadoop-hdfs-2.5.0-cdh5.2.0.jar:./:target/classes:target/cube-0.17-SNAPSHOT-jar-with-dependencies.jar

org.apache.hadoop.hbase.client.HTable.exists changes my Get-object

2014-11-05 Thread Gerke Ephorus
Hi all, we are in the process of upgrading HBase from 0.92.x to 0.98.6.x and ran into this code in the hbase-client:

    /**
     * {@inheritDoc}
     */
    @Override
    public boolean exists(final Get get) throws IOException {
      get.setCheckExistenceOnly(true);
      Result r = get(get);
      assert
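
One defensive workaround while the semantics get clarified - a hypothetical helper that undoes the flag exists() flips, so the same Get can still be used for a real read afterwards (only exists(Get) and setCheckExistenceOnly are taken from the client code above):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTableInterface;

    public final class ExistsWorkaround {
      // 'table' is an already-opened HTableInterface.
      static boolean existsThenRead(HTableInterface table, byte[] row) throws IOException {
        Get get = new Get(row);
        boolean present = table.exists(get); // side effect: checkExistenceOnly = true
        get.setCheckExistenceOnly(false);    // undo the mutation before reusing the Get
        return present && !table.get(get).isEmpty();
      }
    }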

Re: No FileSystem for scheme: file

2014-11-05 Thread Stack
Hey Tim: Add hadoop-common? It has the 'file:///' implementation (look for LocalFileSystem). See if that works. Hope all is well, St.Ack On Wed, Nov 5, 2014 at 7:45 AM, Tim Robertson timrobertson...@gmail.com wrote: Hi Sean, We are using CM, and Hue, Hive etc all work, but for some reason

Re: No FileSystem for scheme: file

2014-11-05 Thread Sean Busbey
The error sounds like you do not have your HDFS configs in the classpath. Generally, you should be submitting the job via the 'hadoop jar' command (and your main class should implement Tool). This will take care of setting the correct classpath for both the Hadoop-related jars and
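
For reference, the shape Sean is describing - a minimal driver implementing Tool, so that 'hadoop jar' plus ToolRunner inject the cluster configuration (the class and job names are hypothetical):

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class CubeDriver extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
        // getConf() already carries the cluster's *-site.xml settings,
        // because ToolRunner parsed the generic options for us.
        Job job = Job.getInstance(getConf(), "cube-build");
        job.setJarByClass(CubeDriver.class);
        // ... configure mapper/reducer/input/output here ...
        return job.waitForCompletion(true) ? 0 : 1;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new CubeDriver(), args));
      }
    }

Launched with something like 'hadoop jar target/cube-0.17-SNAPSHOT-jar-with-dependencies.jar CubeDriver', the hadoop script then puts the conf directory and the Hadoop jars on the classpath for you.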

Re: No FileSystem for scheme: file

2014-11-05 Thread Tim Robertson
Thanks St.Ack, Sean. I'll change the submission process first thing tomorrow - hadoop-common is on the CP (in the fat jar), and it did work before I started ripping out the MR1 stuff. [Things are good, St.Ack - thanks. Hope you're also well] On Wed, Nov 5, 2014 at 4:59 PM, Sean Busbey

Re: No FileSystem for scheme: file

2014-11-05 Thread Walter King
We ran into this issue. This post: http://stackoverflow.com/questions/17265002/hadoop-no-filesystem-for-scheme-file was helpful. Different JARs (hadoop-common for LocalFileSystem, hadoop-hdfs for DistributedFileSystem) each contain a different file called org.apache.hadoop.fs.FileSystem in
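
Two fixes come out of that post: merge the META-INF/services entries when building the fat jar (the Maven Shade plugin's ServicesResourceTransformer), or pin the implementations in configuration so it no longer matters which service file survived the merge. A sketch of the latter, assuming the standard fs.SCHEME.impl keys:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.LocalFileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public final class FsImplFix {
      // Call before the first FileSystem.get(): pins scheme -> implementation
      // regardless of which META-INF/services file won the fat-jar merge.
      public static Configuration pin(Configuration conf) {
        conf.set("fs.file.impl", LocalFileSystem.class.getName());
        conf.set("fs.hdfs.impl", DistributedFileSystem.class.getName());
        return conf;
      }
      private FsImplFix() {}
    }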

Namespace vs table

2014-11-05 Thread Kenneth Chan
Hi, What's the difference between a namespace and a table? What considerations determine whether I should use a new namespace vs. a new table? Thanks Kenneth

Re: Namespace vs table

2014-11-05 Thread Ted Yu
Have you read http://hbase.apache.org/book.html#namespace ? Every table lives in a namespace - user tables are under the 'default' namespace if you don't create your own. Cheers On Wed, Nov 5, 2014 at 3:34 PM, Kenneth Chan ckh...@gmail.com wrote: Hi, What's the difference between
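
A quick illustration in the shell (names are made up; namespaces require 0.96 or later):

    hbase> create_namespace 'my_ns'
    hbase> create 'my_ns:my_table', 'f1'   # table inside the namespace
    hbase> create 'other_table', 'f1'      # lands in the 'default' namespace
    hbase> list_namespace_tables 'my_ns'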

HMaster keeps dying when using with hadoop

2014-11-05 Thread Sivasubramaniam, Latha
Hi, I am trying to set up HBase in cluster mode using the Getting Started guide. In the pseudo-distributed setup, if I use the regular file system, HMaster works fine, but when I use an hdfs URI in hbase-site.xml, the HMaster crashes. I am able to list the files in HDFS using the same URI. The
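
For comparison, the usual shape of that setting in hbase-site.xml - host and port are placeholders, and they must match fs.defaultFS in core-site.xml, which is the classic thing to get wrong here:

    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://namenode.example.com:8020/hbase</value>
    </property>
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>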

Re: HMaster keeps dying when using with hadoop

2014-11-05 Thread Ted Yu
bq. Failed verification of hbase:meta Can you give the full stack trace? What release of HBase are you running? Pastebin'ing your config file would help determine the cause. Cheers On Wed, Nov 5, 2014 at 3:54 PM, Sivasubramaniam, Latha latha.sivasubraman...@aspect.com wrote: Hi, I am

Range or bulk delete

2014-11-05 Thread Kenneth Chan
Hi, What's the recommended/efficient way to delete a large number of rows based on a filter/query? Thanks Kenneth

Re: Namespace vs table

2014-11-05 Thread Kenneth Chan
Thanks for the pointer!

Re: Range or bulk delete

2014-11-05 Thread Ted Yu
Please take a look at hbase-examples/src/main/java/org/apache/hadoop/hbase/coprocessor/example/BulkDeleteEndpoint.java Cheers On Wed, Nov 5, 2014 at 4:15 PM, Kenneth Chan ckh...@gmail.com wrote: Hi, What's recommended/efficient way to delete large number of rows based on filter/query?
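
If the coprocessor route is too heavy, the client-side equivalent is a filtered scan feeding batched deletes - a sketch (the filter, batch size, and caching are placeholders to tune):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public final class ScanDelete {
      static void deleteByScan(HTableInterface table, Scan scan) throws IOException {
        scan.setCaching(1000);          // rows fetched per RPC
        // scan.setFilter(...);         // restrict the scan to the rows to delete
        List<Delete> batch = new ArrayList<Delete>();
        ResultScanner scanner = table.getScanner(scan);
        try {
          for (Result r : scanner) {
            batch.add(new Delete(r.getRow()));
            if (batch.size() >= 1000) { // flush in chunks to bound client memory
              table.delete(batch);      // successful deletes are removed from the list
              batch.clear();
            }
          }
          if (!batch.isEmpty()) table.delete(batch);
        } finally {
          scanner.close();
        }
      }
    }

Every row still makes a round trip through the normal write path, which is exactly the cost the BulkDeleteEndpoint avoids by deleting server-side.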

Re: Increasing write throughput..

2014-11-05 Thread Gautam
Thanks Anoop, Ted for the replies. This helped me understand HBase's write path a lot more. After going through the literature and your comments on what triggers memstore flushes, I did the following: - Added 4 nodes (all 8+4 = 12 RSs have 48000M heap each) - changed
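
For anyone following along, the hbase-site.xml knobs usually involved in this kind of flush tuning (0.94-era property names; the values shown are the defaults, not recommendations):

    <property>
      <name>hbase.hregion.memstore.flush.size</name>
      <value>134217728</value> <!-- 128 MB: per-region flush threshold -->
    </property>
    <property>
      <name>hbase.regionserver.global.memstore.upperLimit</name>
      <value>0.4</value> <!-- fraction of heap all memstores may occupy -->
    </property>
    <property>
      <name>hbase.hstore.blockingStoreFiles</name>
      <value>7</value> <!-- writes block when a store exceeds this many files -->
    </property>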

Re: Range or bulk delete

2014-11-05 Thread Kenneth Chan
Thanks. A related question: is disabling and dropping a table efficient compared with deleting a large number of rows (assuming the number of rows to be deleted is the same as the number of rows in the table to be disabled and dropped)? Thanks! On Wed, Nov 5, 2014 at 4:19 PM, Ted Yu

Re: org.apache.hadoop.hbase.client.HTable.exists changes my Get-object

2014-11-05 Thread Stack
The cited exists has been around a while: https://issues.apache.org/jira/browse/HBASE-1544 You used to use this one?

    /**
     * @deprecated As of hbase 0.20.0, replaced by {@link #exists(Get)}
     */
    public boolean exists(final byte [] row) throws IOException {

They do the same thing essentially.

Re: Range or bulk delete

2014-11-05 Thread Ted Yu
Please take a look at HBASE-8963. Cheers On Wed, Nov 5, 2014 at 5:14 PM, Kenneth Chan ckh...@gmail.com wrote: Thanks. Related question is that is disabling and dropping a table efficient? compared with deleting large number of rows (assuming number of rows to be deleted is the same as
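
For the drop path itself, it is just two shell commands (the table must be disabled first, or drop refuses):

    hbase> disable 'my_table'
    hbase> drop 'my_table'

Dropping removes the table's files wholesale, whereas row deletes write per-row tombstones that occupy space until a major compaction - which is why dropping and recreating is generally the cheaper option when everything goes.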

Re: UI tool

2014-11-05 Thread jeevi tesh
I wanted to use Hue but they said we need to be on Thrift. I'm just using HBase 0.96.2 and Hadoop 2.2, so can I use Hue? On Thu, Nov 6, 2014 at 10:17 AM, Dima Spivak dspi...@cloudera.com wrote: +user@, bcc: dev@ Check out the Hue project at http://gethue.com/ . All the best, Dima On

Hbase Unusable after auto split to 1024 regions

2014-11-05 Thread Pere Kyle
Hello, Recently our cluster, which had been running fine for 2 weeks, split to 1024 regions at 1GB per region; after this split the cluster is unusable. Using the performance benchmark I was getting a little better than 100 w/s, whereas before it was 5000 w/s. There are 15 nodes of m2.2xlarge

Re: UI tool

2014-11-05 Thread Dima Spivak
Yep, you just need to set up an HBase Thrift gateway that Hue can connect to (lots of tutorials online for that). Cheers, Dima On Wed, Nov 5, 2014 at 9:13 PM, jeevi tesh jeevitesh...@gmail.com wrote: I wanted to use Hue but they said we need to be on Thrift.. I'm just using Hbase.0.96.2
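
The gateway is started with the stock scripts; the hue.ini pointer below follows the Hue docs of that era, so double-check the key name against your Hue version:

    # on some node of the cluster:
    bin/hbase thrift start            # foreground, default port 9090
    bin/hbase-daemon.sh start thrift  # or as a daemon

    # then point Hue at it in hue.ini:
    # [hbase]
    # hbase_clusters = (Cluster|thrift-host:9090)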

Re: Hbase Unusable after auto split to 1024 regions

2014-11-05 Thread Ted Yu
IncreasingToUpperBoundRegionSplitPolicy is the default split policy. You can read the javadoc of this class to see how it works. Cheers On Wed, Nov 5, 2014 at 9:39 PM, Ted Yu yuzhih...@gmail.com wrote: Can you provide a bit more information (such as HBase release) ? If you pastebin one of

Re: Hbase Unusable after auto split to 1024 regions

2014-11-05 Thread Pere Kyle
Here is a paste from one of the region servers: http://pastebin.com/H3BaHdPq I am running HBase on EMR, version 0.94.18, revision e17f91a1f107923d2defc7f18dbca59983f0a69f Thanks, Pere On Nov 5, 2014, at 9:39 PM, Ted Yu yuzhih...@gmail.com wrote: Can you provide a bit more information (such as

Re: Hbase Unusable after auto split to 1024 regions

2014-11-05 Thread Pere Kyle
Ted, Thanks so much for that information. I now see why this split so often, but what I am not sure of is how to fix it without blowing away the cluster. Add more heap? Another symptom I have noticed is that load on the master instance's hbase daemon has been pretty high (load average 4.0,

Re: Hbase Unusable after auto split to 1024 regions

2014-11-05 Thread Ted Yu
You can use ConstantSizeRegionSplitPolicy. Split policy can be specified per table. See the following example in create.rb: hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'} In 0.94.18 there isn't online merge, so you have to use another method to merge the small
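
Setting the policy per table from the Java client, assuming your 0.94 build exposes HTableDescriptor.SPLIT_POLICY (table and family names are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.regionserver.ConstantSizeRegionSplitPolicy;

    public class CreateWithSplitPolicy {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTableDescriptor htd = new HTableDescriptor("t1");
        htd.addFamily(new HColumnDescriptor("f1"));
        // Pin this table to purely size-based splits
        // (pair with a sensible hbase.hregion.max.filesize).
        htd.setValue(HTableDescriptor.SPLIT_POLICY,
            ConstantSizeRegionSplitPolicy.class.getName());
        HBaseAdmin admin = new HBaseAdmin(conf);
        admin.createTable(htd);
        admin.close();
      }
    }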

RE: is there a HBase 0.98 hdfs directory structure introduction?

2014-11-05 Thread Liu, Ming (HPIT-GADSC)
Thanks Ted for the short but very useful reply! ^_^ It is clear now. -Original Message- From: Ted Yu [mailto:yuzhih...@gmail.com] Sent: Monday, November 03, 2014 11:30 AM To: user@hbase.apache.org Subject: Re: is there a HBase 0.98 hdfs directory structure introduction? In 0.98, you

Re: Hbase Unusable after auto split to 1024 regions

2014-11-05 Thread Pere Kyle
Watching a region server in action closely, it seems that the memstores are being flushed at around 2MB on the regions. This would seem to indicate that there is not enough heap for the memstores and I am hitting the upper bound of the global memstore limit (default). Would this be a fair assumption? Should I look
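
A back-of-envelope check of that assumption, using the 15 servers and 1024 regions from earlier in the thread plus the 0.94 default global memstore upper limit of 0.4 (the 1 GB heap is a guess to plug in):

    1024 regions / 15 servers   ~ 68 regions per server
    0.4 x 1024 MB heap          ~ 410 MB of global memstore budget
    410 MB / 68 regions         ~ 6 MB per region, if writes spread evenly

Once the global limit is hit, the regionserver force-flushes whatever is dirty, so individual flushes of ~2MB are consistent with heap starvation rather than with the per-region 128MB flush threshold ever being reached.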

Re: org.apache.hadoop.hbase.client.HTable.exists changes my Get-object

2014-11-05 Thread Gerke Ephorus
Hi St.Ack, thanks for the quick reply. Three things (hope I counted right):
* we used the API that was available in 0.92.x, which already allowed specifying more than just the row key. We specifically check for existence of a cell via a ROWCOL bloom filter.
* Is the *get* in 0.98.x just as fast