Hi Jean-Daniel,

> I don't know much about how your job works (is it multithreaded, is it a mapreduce job, etc) and it would be nice if you tell us more about it, so I'm going to assume you are inserting it in a single thread.

My job is a MapReduce job. I have a 4-node cluster. I don't have any problems with my map jobs; I only have problems with my reduce jobs.

> If you have a single thread inserting into a 1 machine HBase cluster, then the data is stored once. If you have 4 machines, and you set the replication to 3 which is the default, then 2GB becomes 6GB and it's all inserted sequentially. I would expect a slow down.

I haven't configured replication, but the import should still take about 4 minutes, because I have 4 machines.

> Now, 40 minutes VS 4 minutes is an order of magnitude slower and it doesn't seem right. Have you looked into where it's slow? Can you investigate more and give us some other data points?

I have installed HBase 0.90.1. Please find my HBase configuration attached.

Regards
Musa


4 clusters? You mean 4 machines?

I don't know much about how your job works (is it multithreaded, is it a mapreduce job, etc) and it would be nice if you tell us more about it, so I'm going to assume you are inserting it in a single thread.

If you have a single thread inserting into a 1 machine HBase cluster, then the data is stored once. If you have 4 machines, and you set the replication to 3 which is the default, then 2GB becomes 6GB and it's all inserted sequentially. I would expect a slow down.

Now, 40 minutes VS 4 minutes is an order of magnitude slower and it doesn't seem right. Have you looked into where it's slow? Can you investigate more and give us some other data points?

J-D

On Mon, Feb 28, 2011 at 6:00 AM, Cavus,M.,Fa. Post Direkt <M.Cavus@...> wrote:
> Hi,
>
> I've a simple job. It imports 2 GB of data in 4 minutes to hbase with
> hadoop and not cluster.
>
> If I configure full distributed mode, it imports 2 GB of data in 40
> minutes to my 4 clusters.
>
> Did anyone have same problems?
>
> Regards
>
> Musa

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
 * Copyright 2009 The Apache Software Foundation
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
-->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master.local:54310/hbase</value>
    <description>The directory shared by region servers.</description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>slave1.local,slave2.local,slave3.local</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>master.local:60000</value>
    <description>The host and port that the HBase master runs at.
    A value of 'local' runs the master and a regionserver in a single process.
    </description>
  </property>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>20</value>
  </property>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>134217728</value>
    <description>Default: 268435456</description>
  </property>
  <property>
    <name>hbase.hstore.compactionThreshold</name>
    <value>2</value>
    <description>Default: 3</description>
  </property>
  <property>
    <name>hbase.hregion.memstore.flush.size</name>
    <value>33554432</value>
    <description>Default: 67108864</description>
  </property>
</configuration>
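
On the replication point raised in the thread: the replication factor is an HDFS setting, not an HBase one, so it does not appear in the hbase-site.xml above. The fragment below is only a sketch of where it would normally live, assuming a standard hdfs-site.xml on the cluster; it is not part of the attached configuration. When `dfs.replication` is left unset, HDFS falls back to its default of 3, which is the case J-D describes (2 GB written becomes roughly 6 GB stored).

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml (illustrative fragment, not taken from the attachment) -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- 3 is the HDFS default; every block HBase writes is then stored on 3 datanodes -->
    <value>3</value>
  </property>
</configuration>
```

Lowering this value would reduce the write volume, at the cost of redundancy. Whether replication actually accounts for the 40-minute run is not established in the thread.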
