Dear Cristofer,

Thanks so much for the reminder!

Best regards,
Bing

On Thu, Aug 30, 2012 at 12:32 AM, Cristofer Weber <[email protected]> wrote:

> There are also a lot of conversions of the same values to their byte array
> representation, e.g., your NeighborStructure constants. You should do this
> conversion only once to save time, since you are doing it inside 3 nested
> loops. I'm not sure how much this will improve things, but you should try
> it as well.
>
> Best regards,
> Cristofer
>
> -----Original Message-----
> From: Bing Li [mailto:[email protected]]
> Sent: Wednesday, August 29, 2012 13:07
> To: [email protected]
> Cc: [email protected]
> Subject: Re: HBase Is So Slow To Save Data?
>
> I see. Thanks so much!
>
> Bing
>
>
> On Wed, Aug 29, 2012 at 11:59 PM, N Keywal <[email protected]> wrote:
>
> > It's not useful here: if you have a memory issue, it's while you are using
> > the list, not after you have finished with it and set it to null.
> > You need to monitor the memory consumption of the JVM, on both the client
> > and the server.
> > Google these keywords; there are many examples on the web.
> > Google ArrayList initialization as well.
> >
> > Note as well that what matters is not the size of the structure on disk
> > but the size of the "List<Put> puts = new ArrayList<Put>();" before the
> > table put.
> >
> > On Wed, Aug 29, 2012 at 5:42 PM, Bing Li <[email protected]> wrote:
> >
> > > Dear N Keywal,
> > >
> > > Thanks so much for your reply!
> > >
> > > The total amount of data is about 110M. The available memory (2G) is
> > > enough.
> > >
> > > In Java, I just set the collection to null so that it can be garbage
> > > collected. Do you think that is fine?
> > >
> > > Best regards,
> > > Bing
> > >
> > >
> > > On Wed, Aug 29, 2012 at 11:22 PM, N Keywal <[email protected]> wrote:
> > >
> > >> Hi Bing,
> > >>
> > >> You should expect HBase to be slower in the generic case:
> > >> 1) it writes much more data (see the HBase data model), with extra
> > >> column qualifiers, timestamps & so on.
> > >> 2) the data is written multiple times: once in the write-ahead log,
> > >> once per replica on the datanodes, & so on.
> > >> 3) there are inter-process & inter-machine calls on the
> > >> critical path.
> > >>
> > >> This is the cost of the atomicity, reliability and scalability features.
> > >> With these features in mind, HBase is reasonably fast at saving data
> > >> on a cluster.
> > >>
> > >> In your specific case (without points 2 & 3 above), the
> > >> performance seems to be very bad.
> > >>
> > >> You should first look at:
> > >> - how much time is spent in the put vs. preparing the list
> > >> - do you have garbage collection going on? Even swap?
> > >> - what's the size of your final array vs. the available memory?
> > >>
> > >> Cheers,
> > >>
> > >> N.
> > >>
> > >>
> > >>
> > >> On Wed, Aug 29, 2012 at 4:08 PM, Bing Li <[email protected]> wrote:
> > >>
> > >>> Dear all,
> > >>>
> > >>> By the way, my HBase is in pseudo-distributed mode. Thanks!
> > >>>
> > >>> Best regards,
> > >>> Bing
> > >>>
> > >>> On Wed, Aug 29, 2012 at 10:04 PM, Bing Li <[email protected]> wrote:
> > >>>
> > >>> > Dear all,
> > >>> >
> > >>> > In my experience, HBase is very slow to save data. Am I right?
> > >>> >
> > >>> > For example, today I needed to save the data in a HashMap to HBase.
> > >>> > It took more than three hours. However, saving the same HashMap to a
> > >>> > file in text format through a redirected System.out took only 4.5
> > >>> > seconds!
> > >>> >
> > >>> > Why is HBase so slow? Is it indexing?
> > >>> >
> > >>> > My code to save data in HBase is as follows. I think the code
> > >>> > must be correct.
> > >>> >
> > >>> > ......
> > >>> > public synchronized void AddVirtualOutgoingHHNeighbors(
> > >>> >         ConcurrentHashMap<String, ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap,
> > >>> >         int timingScale)
> > >>> > {
> > >>> >     List<Put> puts = new ArrayList<Put>();
> > >>> >
> > >>> >     String hhNeighborRowKey;
> > >>> >     Put hubKeyPut;
> > >>> >     Put groupKeyPut;
> > >>> >     Put topGroupKeyPut;
> > >>> >     Put timingScalePut;
> > >>> >     Put nodeKeyPut;
> > >>> >     Put hubNeighborTypePut;
> > >>> >
> > >>> >     for (Map.Entry<String, ConcurrentHashMap<String, Set<String>>> sourceHubGroupNeighborEntry
> > >>> >             : hhOutNeighborMap.entrySet())
> > >>> >     {
> > >>> >         for (Map.Entry<String, Set<String>> groupNeighborEntry
> > >>> >                 : sourceHubGroupNeighborEntry.getValue().entrySet())
> > >>> >         {
> > >>> >             for (String neighborKey : groupNeighborEntry.getValue())
> > >>> >             {
> > >>> >                 hhNeighborRowKey = NeighborStructure.HUB_HUB_NEIGHBOR_ROW
> > >>> >                         + Tools.GetAHash(sourceHubGroupNeighborEntry.getKey()
> > >>> >                                 + groupNeighborEntry.getKey() + timingScale + neighborKey);
> > >>> >
> > >>> >                 hubKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> > >>> >                 hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> > >>> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN),
> > >>> >                         Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
> > >>> >                 puts.add(hubKeyPut);
> > >>> >
> > >>> >                 groupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> > >>> >                 groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> > >>> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN),
> > >>> >                         Bytes.toBytes(groupNeighborEntry.getKey()));
> > >>> >                 puts.add(groupKeyPut);
> > >>> >
> > >>> >                 topGroupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> > >>> >                 topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> > >>> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN),
> > >>> >                         Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey())));
> > >>> >                 puts.add(topGroupKeyPut);
> > >>> >
> > >>> >                 timingScalePut = new Put(Bytes.toBytes(hhNeighborRowKey));
> > >>> >                 timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> > >>> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN),
> > >>> >                         Bytes.toBytes(timingScale));
> > >>> >                 puts.add(timingScalePut);
> > >>> >
> > >>> >                 nodeKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> > >>> >                 nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> > >>> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN),
> > >>> >                         Bytes.toBytes(neighborKey));
> > >>> >                 puts.add(nodeKeyPut);
> > >>> >
> > >>> >                 hubNeighborTypePut = new Put(Bytes.toBytes(hhNeighborRowKey));
> > >>> >                 hubNeighborTypePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> > >>> >                         Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN),
> > >>> >                         Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR));
> > >>> >                 puts.add(hubNeighborTypePut);
> > >>> >             }
> > >>> >         }
> > >>> >     }
> > >>> >
> > >>> >     try
> > >>> >     {
> > >>> >         this.neighborTable.put(puts);
> > >>> >     }
> > >>> >     catch (IOException e)
> > >>> >     {
> > >>> >         e.printStackTrace();
> > >>> >     }
> > >>> > }
> > >>> > ......
> > >>> >
> > >>> > Thanks so much!
> > >>> >
> > >>> > Best regards,
> > >>> > Bing
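Cristofer's advice, together with N Keywal's pointer to ArrayList initialization, can be illustrated with a minimal JDK-only sketch. It is not the thread's code and does not call the HBase client: `NeighborRowBuilder`, `RowMutation`, `rowKey` and the column names below are hypothetical stand-ins, because the real `NeighborStructure` values, `Tools.GetAHash`, `GroupRegistry` and `SocialRole` are not shown (the top-group and neighbor-type columns are left out for that reason). The sketch encodes the constant family and qualifier names once, encodes each hub and group key once per outer iteration, and presizes the list. As an additional change not proposed in the thread, it keeps all of a row's columns in one mutation instead of one `Put` per column. The UTF-8 and big-endian encodings are meant to match what HBase's `Bytes.toBytes(String)` and `Bytes.toBytes(int)` produce.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** JDK-only sketch: builds one mutation per row; RowMutation stands in for an HBase Put. */
final class NeighborRowBuilder {
    // Hypothetical names: the real NeighborStructure values are not shown in the thread.
    // Encoded once, when the class is loaded, instead of inside the three nested loops.
    static final byte[] FAMILY = utf8("hh_neighbor");
    static final byte[] HUB_KEY = utf8("hub_key");
    static final byte[] GROUP_KEY = utf8("group_key");
    static final byte[] TIMING_SCALE = utf8("timing_scale");
    static final byte[] NODE_KEY = utf8("node_key");

    private NeighborRowBuilder() {}

    /** One row key plus {family, qualifier, value} cells, like one Put with several columns. */
    static final class RowMutation {
        final byte[] row;
        final List<byte[][]> cells = new ArrayList<>(4);

        RowMutation(byte[] row) {
            this.row = row;
        }

        RowMutation add(byte[] family, byte[] qualifier, byte[] value) {
            cells.add(new byte[][] {family, qualifier, value});
            return this;
        }
    }

    // UTF-8, the encoding HBase's Bytes.toBytes(String) uses.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical placeholder for the thread's HUB_HUB_NEIGHBOR_ROW + Tools.GetAHash(...) key.
    static String rowKey(String hub, String group, int timingScale, String neighbor) {
        return hub + "|" + group + "|" + timingScale + "|" + neighbor;
    }

    // The wildcard lets callers pass the original ConcurrentHashMap<String, ConcurrentHashMap<...>>.
    static List<RowMutation> build(Map<String, ? extends Map<String, Set<String>>> hhOutNeighborMap,
                                   int timingScale) {
        // Count the rows first so the list is allocated once at its final size.
        int rowCount = 0;
        for (Map<String, Set<String>> groups : hhOutNeighborMap.values()) {
            for (Set<String> neighbors : groups.values()) {
                rowCount += neighbors.size();
            }
        }
        List<RowMutation> mutations = new ArrayList<>(rowCount);

        // Loop-invariant value, big-endian like Bytes.toBytes(int).
        byte[] timingScaleBytes = ByteBuffer.allocate(4).putInt(timingScale).array();

        for (Map.Entry<String, ? extends Map<String, Set<String>>> hubEntry : hhOutNeighborMap.entrySet()) {
            byte[] hubBytes = utf8(hubEntry.getKey()); // once per hub, not once per neighbor
            for (Map.Entry<String, Set<String>> groupEntry : hubEntry.getValue().entrySet()) {
                byte[] groupBytes = utf8(groupEntry.getKey()); // once per group
                for (String neighbor : groupEntry.getValue()) {
                    String key = rowKey(hubEntry.getKey(), groupEntry.getKey(), timingScale, neighbor);
                    // All columns of the row go into a single mutation.
                    mutations.add(new RowMutation(utf8(key))
                            .add(FAMILY, HUB_KEY, hubBytes)
                            .add(FAMILY, GROUP_KEY, groupBytes)
                            .add(FAMILY, TIMING_SCALE, timingScaleBytes)
                            .add(FAMILY, NODE_KEY, utf8(neighbor)));
                }
            }
        }
        return mutations;
    }
}
```

With the HBase client of that period, each `RowMutation` would presumably map to one `Put` built with several `add(family, qualifier, value)` calls (later clients call this `addColumn`); it is worth checking against the client version actually in use.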

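N Keywal's first diagnostic (how much time goes into the put versus preparing the list) and his point about the size of the `List<Put>` before the put can be combined in a second sketch. Writing in fixed-size batches is a technique swapped in here, not one proposed in the thread: the hypothetical `BatchingBuffer` hands its pending list to a callback every `batchSize` items, so the list never holds more than `batchSize` entries, and it records the time spent inside that callback. The callback is an assumption standing in for something like `neighborTable.put(batch)`; total elapsed time minus callback time approximates the preparation cost. The HBase client has its own client-side write buffer (for example `HTable.setAutoFlush(false)` in clients of that period, `BufferedMutator` in later ones) that may serve the same purpose; check the documentation for the version in use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper: buffers items and passes them to a sink in batches of at most
 * batchSize, timing the sink calls. The sink stands in for e.g. table.put(List<Put>).
 */
final class BatchingBuffer<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private List<T> pending;
    private long sinkNanos; // time spent inside the sink, i.e. the "put" phase
    private int batchesSent;

    BatchingBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        this.pending = new ArrayList<>(batchSize); // presized, never grows past batchSize
    }

    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        long start = System.nanoTime();
        sink.accept(pending);
        sinkNanos += System.nanoTime() - start;
        batchesSent++;
        pending = new ArrayList<>(batchSize); // start a fresh list; the sink may keep the old one
    }

    @Override
    public void close() {
        flush(); // send whatever is left over
    }

    long sinkMillis() {
        return sinkNanos / 1_000_000;
    }

    int batchesSent() {
        return batchesSent;
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        long start = System.nanoTime();
        BatchingBuffer<String> buffer = new BatchingBuffer<>(1000, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 2500; i++) {
            buffer.add("row-" + i); // the "prepare" work happens in this loop
        }
        buffer.close();
        long totalMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("batch sizes " + batchSizes + ", total " + totalMillis
                + " ms, of which in sink " + buffer.sinkMillis() + " ms");
    }
}
```

Running `main` prints the batch sizes `[1000, 1000, 500]` followed by timings that depend on the machine. The batch size is the tuning knob: larger batches mean fewer round trips, but a larger list held in memory at any one time.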