There are also a lot of conversions of the same values to their byte array representation, e.g., your NeighborStructure constants. You should do each of these conversions only once to save time, since you are currently doing them inside three nested loops. I'm not sure how much this will improve things, but it is worth trying as well.
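To illustrate what I mean, here is a rough sketch, not your actual code. It is plain Java with no HBase dependency, so `toBytes` below is only a stand-in for `Bytes.toBytes(String)` (which encodes the string as UTF-8). The constant values and the `buildCells` helper are made up for the example; in your code the byte arrays would be built from the real NeighborStructure constants and passed to `Put.add(...)`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch only: converts constant strings to byte[] once, instead of on every loop iteration.
final class NeighborBytes {
    // Hypothetical values; the real ones are defined in NeighborStructure.
    static final String HUB_HUB_NEIGHBOR_FAMILY = "hh_neighbor_family";
    static final String HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN = "hub_key";

    // Converted a single time, when the class is loaded.
    static final byte[] FAMILY = toBytes(HUB_HUB_NEIGHBOR_FAMILY);
    static final byte[] HUB_KEY_COLUMN = toBytes(HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN);

    private NeighborBytes() {}

    // Stand-in for Bytes.toBytes(String): UTF-8 encoding of the string.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: builds {family, qualifier, value} triples.
    // Only the per-row value is converted inside the loop; the family and
    // qualifier arrays are the shared, precomputed ones.
    static List<byte[][]> buildCells(List<String> hubKeys) {
        // Presizing the list avoids repeated internal array growth.
        List<byte[][]> cells = new ArrayList<>(hubKeys.size());
        for (String hubKey : hubKeys) {
            cells.add(new byte[][] { FAMILY, HUB_KEY_COLUMN, toBytes(hubKey) });
        }
        return cells;
    }
}
```

In your method, the equivalent change would be to replace each `Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY)` and similar call inside the loops with a reference to a `static final byte[]` field computed once.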
Best regards,
Cristofer

-----Original Message-----
From: Bing Li [mailto:[email protected]]
Sent: Wednesday, August 29, 2012 13:07
To: [email protected]
Cc: [email protected]
Subject: Re: HBase Is So Slow To Save Data?

I see. Thanks so much!

Bing

On Wed, Aug 29, 2012 at 11:59 PM, N Keywal <[email protected]> wrote:

> It's not useful here: if you have a memory issue, it's when you're using
> the list, not when you have finished with it and set it to null.
> You need to monitor the memory consumption of the JVM, both the client
> & the server.
> Google around these keywords, there are many examples on the web.
> Google as well ArrayList initialization.
>
> Note as well that what matters is not the memory size of the
> structure on disk but the size of the "List<Put> puts = new
> ArrayList<Put>();" before the table put.
>
> On Wed, Aug 29, 2012 at 5:42 PM, Bing Li <[email protected]> wrote:
>
> > Dear N Keywal,
> >
> > Thanks so much for your reply!
> >
> > The total amount of data is about 110M. The available memory is
> > enough, 2G.
> >
> > In Java, I just set a collection to NULL to collect garbage. Do you
> > think it is fine?
> >
> > Best regards,
> > Bing
> >
> >
> > On Wed, Aug 29, 2012 at 11:22 PM, N Keywal <[email protected]> wrote:
> >
> >> Hi Bing,
> >>
> >> You should expect HBase to be slower in the generic case:
> >> 1) it writes much more data (see hbase data model), with extra
> >> columns qualifiers, timestamps & so on.
> >> 2) the data is written multiple times: once in the write-ahead-log,
> >> once per replica on datanode & so on again.
> >> 3) there are inter process calls & inter machine calls on the
> >> critical path.
> >>
> >> This is the cost of the atomicity, reliability and scalability features.
> >> With these features in mind, HBase is reasonably fast to save data
> >> on a cluster.
> >>
> >> On your specific case (without the points 2 & 3 above), the
> >> performance seems to be very bad.
> >>
> >> You should first look at:
> >> - how much is spent in the put vs. preparing the list
> >> - do you have garbage collection going on? even swap?
> >> - what's the size of your final Array vs. the available memory?
> >>
> >> Cheers,
> >>
> >> N.
> >>
> >>
> >>
> >> On Wed, Aug 29, 2012 at 4:08 PM, Bing Li <[email protected]> wrote:
> >>
> >>> Dear all,
> >>>
> >>> By the way, my HBase is in the pseudo-distributed mode. Thanks!
> >>>
> >>> Best regards,
> >>> Bing
> >>>
> >>> On Wed, Aug 29, 2012 at 10:04 PM, Bing Li <[email protected]> wrote:
> >>>
> >>> > Dear all,
> >>> >
> >>> > According to my experiences, it is very slow for HBase to save data?
> >>> > Am I right?
> >>> >
> >>> > For example, today I need to save data in a HashMap to HBase. It
> >>> > took about more than three hours. However when saving the same
> >>> > HashMap in a file in the text format with the redirected
> >>> > System.out, it took only 4.5 seconds!
> >>> >
> >>> > Why is HBase so slow? It is indexing?
> >>> >
> >>> > My code to save data in HBase is as follows. I think the code
> >>> > must be correct.
> >>> >
> >>> > ......
> >>> > public synchronized void AddVirtualOutgoingHHNeighbors(
> >>> >         ConcurrentHashMap<String, ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap,
> >>> >         int timingScale)
> >>> > {
> >>> >     List<Put> puts = new ArrayList<Put>();
> >>> >
> >>> >     String hhNeighborRowKey;
> >>> >     Put hubKeyPut;
> >>> >     Put groupKeyPut;
> >>> >     Put topGroupKeyPut;
> >>> >     Put timingScalePut;
> >>> >     Put nodeKeyPut;
> >>> >     Put hubNeighborTypePut;
> >>> >
> >>> >     for (Map.Entry<String, ConcurrentHashMap<String, Set<String>>> sourceHubGroupNeighborEntry : hhOutNeighborMap.entrySet())
> >>> >     {
> >>> >         for (Map.Entry<String, Set<String>> groupNeighborEntry : sourceHubGroupNeighborEntry.getValue().entrySet())
> >>> >         {
> >>> >             for (String neighborKey : groupNeighborEntry.getValue())
> >>> >             {
> >>> >                 hhNeighborRowKey = NeighborStructure.HUB_HUB_NEIGHBOR_ROW
> >>> >                     + Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() + groupNeighborEntry.getKey() + timingScale + neighborKey);
> >>> >
> >>> >                 hubKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                 hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                     Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN),
> >>> >                     Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
> >>> >                 puts.add(hubKeyPut);
> >>> >
> >>> >                 groupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                 groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                     Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN),
> >>> >                     Bytes.toBytes(groupNeighborEntry.getKey()));
> >>> >                 puts.add(groupKeyPut);
> >>> >
> >>> >                 topGroupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                 topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                     Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN),
> >>> >                     Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey())));
> >>> >                 puts.add(topGroupKeyPut);
> >>> >
> >>> >                 timingScalePut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                 timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                     Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN),
> >>> >                     Bytes.toBytes(timingScale));
> >>> >                 puts.add(timingScalePut);
> >>> >
> >>> >                 nodeKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                 nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                     Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN),
> >>> >                     Bytes.toBytes(neighborKey));
> >>> >                 puts.add(nodeKeyPut);
> >>> >
> >>> >                 hubNeighborTypePut = new Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >                 hubNeighborTypePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >                     Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN),
> >>> >                     Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR));
> >>> >                 puts.add(hubNeighborTypePut);
> >>> >             }
> >>> >         }
> >>> >     }
> >>> >
> >>> >     try
> >>> >     {
> >>> >         this.neighborTable.put(puts);
> >>> >     }
> >>> >     catch (IOException e)
> >>> >     {
> >>> >         e.printStackTrace();
> >>> >     }
> >>> > }
> >>> > ......
> >>> >
> >>> > Thanks so much!
> >>> >
> >>> > Best regards,
> >>> > Bing
