Yes, Nick. I would like to compare the performance of bulk loading through coprocessors from a single MR job with online Puts against running 6 MR jobs that generate HFiles. The tables are pre-split in both versions. For the coprocessor version, my strategy is to buffer the Puts for the other tables inside the coprocessors and flush them once every FLUSH_LIMIT Puts. I am currently working on making these buffers thread-safe, since the coprocessor functions can be called from multiple RS threads. The region server handler count is currently set to the default of 10. These nodes have two dual-core CPUs, so 4 hardware threads. Of course, the Hadoop and HDFS daemons also run on the RS nodes. Do you think I would have problems with this setup under heavy load? Also, does anybody know how the request flows through the system when a coprocessor on one RS makes a Put call with a row key that falls on another RS? I.e., do the region servers communicate directly with each other?
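A minimal sketch of the buffering strategy described above, assuming each coprocessor instance keeps one shared buffer per secondary table. The class name `FlushingBuffer`, its methods, and the flush callback are hypothetical. In a real 0.92/0.94-era RegionObserver the callback would presumably write the batch to the secondary table, for example through an `HTableInterface` obtained from the coprocessor environment; that part is left out so the sketch runs without HBase on the classpath. Only the swap of the full buffer happens under the lock; the potentially slow remote write runs outside it, so other handler threads are not blocked while a batch is in flight.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical thread-safe buffer: collects items (e.g. Puts destined for a
 * secondary table) and hands them to a flush callback once flushLimit items
 * have accumulated. Several RegionServer handler threads may call add() at once.
 */
public class FlushingBuffer<T> {
    private final int flushLimit;
    private final Consumer<List<T>> flusher; // assumption: writes one batch remotely
    private List<T> buffer;

    public FlushingBuffer(int flushLimit, Consumer<List<T>> flusher) {
        if (flushLimit <= 0) {
            throw new IllegalArgumentException("flushLimit must be positive");
        }
        this.flushLimit = flushLimit;
        this.flusher = flusher;
        this.buffer = new ArrayList<>(flushLimit);
    }

    /** Adds one item; when the limit is reached the full batch is swapped out and flushed. */
    public void add(T item) {
        List<T> toFlush = null;
        synchronized (this) {
            buffer.add(item);
            if (buffer.size() >= flushLimit) {
                toFlush = buffer;                      // take ownership of the full batch
                buffer = new ArrayList<>(flushLimit);  // other threads continue on a fresh list
            }
        }
        if (toFlush != null) {
            // The remote write happens without holding the lock. If it throws,
            // this batch is lost unless the caller retries it.
            flusher.accept(toFlush);
        }
    }

    /** Flushes whatever is left, e.g. when the coprocessor is stopped. */
    public void flushRemaining() {
        List<T> toFlush;
        synchronized (this) {
            if (buffer.isEmpty()) {
                return;
            }
            toFlush = buffer;
            buffer = new ArrayList<>(flushLimit);
        }
        flusher.accept(toFlush);
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> flushedSizes = Collections.synchronizedList(new ArrayList<>());
        FlushingBuffer<String> buf = new FlushingBuffer<>(100, batch -> flushedSizes.add(batch.size()));

        // Simulate 10 handler threads (the default handler count), each adding 1000 items.
        Thread[] threads = new Thread[10];
        for (int t = 0; t < threads.length; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    buf.add("row-" + id + "-" + i);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        buf.flushRemaining();

        int total = 0;
        for (int s : flushedSizes) {
            total += s;
        }
        System.out.println("batches=" + flushedSizes.size() + " items=" + total);
    }
}
```

Because batches are flushed outside the lock, batches from different threads may reach the secondary table in a different order than they were filled; that only matters if the same cell can be written more than once. Anything still buffered when the region closes has to be flushed explicitly, or it is lost.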
Thank you,
Sever

On Fri, Jul 6, 2012 at 10:16 PM, Nick Dimiduk <[email protected]> wrote:
> Sever,
>
> I presume you're loading your data via online Puts from the MR job (as
> opposed to generating HFiles). What are you hoping to gain from a
> coprocessor implementation vs. the 6 MR jobs? Have you pre-split your
> tables? Can the RegionServer(s) handle all the concurrent mappers?
>
> -n
>
> On Mon, Jul 2, 2012 at 11:58 AM, Sever Fundatureanu <[email protected]> wrote:
>
> > I agree that increasing the timeout is not the best option. I will work
> > both on balancing the load better and maybe on loading in increments as
> > you suggested. However, for now I want a quick fix for the problem.
> >
> > Just to check that I understand this correctly: a ZooKeeper node redirects
> > my client to a region server node, and then my client talks directly to
> > this region server. So the timeout happens on the client while it is
> > talking to the RS, right? It expects some kind of confirmation and times
> > out. If this is the case, how can I increase this timeout? In the
> > documentation I only found "zookeeper.session.timeout", which is the
> > timeout between ZooKeeper and HBase.
> >
> > Thanks,
> > Sever
> >
> > On Mon, Jul 2, 2012 at 8:19 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> >
> > > Hi Sever,
> > >
> > > It seems one of the nodes in your cluster is overwhelmed by the load
> > > you are giving it.
> > >
> > > So IMO, you have two options here:
> > > First, you can try to reduce the load. I mean, split the bulk into
> > > multiple smaller bulks and load them one by one to give your cluster
> > > time to dispatch them correctly.
> > > Second, you can increase the timeout from 60s to 120s. But you might
> > > face the same issue with 120s, so I really recommend the first option.
> > >
> > > JM
> > >
> > > 2012/7/2, Sever Fundatureanu <[email protected]>:
> > > > Can someone please help me with this?
> > > >
> > > > Thanks,
> > > > Sever
> > > >
> > > > On Tue, Jun 26, 2012 at 8:14 PM, Sever Fundatureanu <[email protected]> wrote:
> > > >
> > > > > My keys are built of four 8-byte IDs. I am currently doing the load
> > > > > with MR, but I get a timeout during the LoadIncrementalHFiles call:
> > > > >
> > > > > 12/06/24 21:29:01 ERROR mapreduce.LoadIncrementalHFiles: Encountered unrecoverable error from region server
> > > > > org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
> > > > > Sun Jun 24 21:29:01 CEST 2012, org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3@4699ecf9, java.net.SocketTimeoutException: Call to das3002.cm.cluster/10.141.0.79:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.141.0.254:43240 remote=das3002.cm.cluster/10.141.0.79:60020]
> > > > >
> > > > >     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1345)
> > > > >     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryAtomicRegionLoad(LoadIncrementalHFiles.java:487)
> > > > >     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$1.call(LoadIncrementalHFiles.java:275)
> > > > >     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$1.call(LoadIncrementalHFiles.java:273)
> > > > >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > > > >     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > > > >     at java.lang.Thread.run(Thread.java:662)
> > > > > 12/06/24 21:30:52 ERROR mapreduce.LoadIncrementalHFiles: Encountered unrecoverable error from region server
> > > > >
> > > > > Is there a way to increase the timeout period?
> > > > >
> > > > > Thank you,
> > > > >
> > > > > On Tue, Jun 26, 2012 at 7:05 PM, Andrew Purtell <[email protected]> wrote:
> > > > >
> > > > > > On Tue, Jun 26, 2012 at 9:56 AM, Sever Fundatureanu <[email protected]> wrote:
> > > > > > > I have to bulk load 6 tables which contain the same information,
> > > > > > > but in a different order to cover all possible access patterns.
> > > > > > > Would it be a good idea to do only one load and use coprocessors
> > > > > > > to populate the other tables, instead of doing the traditional MR
> > > > > > > bulk load, which would require 6 separate jobs?
> > > > > >
> > > > > > Without knowing more than you've said, it seems better to use MR to
> > > > > > build all input.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > >    - Andy
> > > > > >
> > > > > > Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
> > > > >
> > > > > --
> > > > > Sever Fundatureanu
> > > > >
> > > > > Vrije Universiteit Amsterdam
> > > > > E-mail: [email protected]
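The six tables discussed in the quoted thread hold the same four 8-byte IDs in different orders. A minimal sketch of how such permuted 32-byte row keys could be built, and how each one maps to a pre-split region, follows. The six orderings and the split points are hypothetical (the thread does not say which orderings are used), all six tables share the same made-up split points for brevity, and the lookup only mimics the rule that a row belongs to the region whose start key is the greatest one not exceeding it under unsigned lexicographic byte order; it is not HBase's own client code. The IDs are written big-endian, the same layout HBase's `Bytes.toBytes(long)` produces, which keeps byte order equal to numeric order only for non-negative IDs.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: permuted 32-byte row keys and a pre-split region lookup. */
public class PermutedKeys {
    // Hypothetical: one ordering of the four ID positions per table (six tables).
    static final int[][] ORDERINGS = {
        {0, 1, 2, 3}, {0, 2, 1, 3}, {1, 0, 2, 3},
        {1, 2, 0, 3}, {2, 0, 1, 3}, {2, 1, 0, 3}
    };

    /** Concatenates the IDs big-endian in the given order (4 IDs x 8 bytes = 32 bytes). */
    static byte[] rowKey(long[] ids, int[] order) {
        ByteBuffer buf = ByteBuffer.allocate(order.length * Long.BYTES); // big-endian by default
        for (int idx : order) {
            buf.putLong(ids[idx]);
        }
        return buf.array();
    }

    /** Unsigned lexicographic byte comparison, the order HBase uses for row keys. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Returns the region whose start key is the greatest one <= row; the first region starts at the empty key. */
    static String regionFor(TreeMap<byte[], String> regionsByStartKey, byte[] row) {
        Map.Entry<byte[], String> entry = regionsByStartKey.floorEntry(row);
        return entry == null ? null : entry.getValue();
    }

    /** Hypothetical split point on the leading 8-byte ID. */
    static byte[] splitPoint(long leadingId) {
        return ByteBuffer.allocate(Long.BYTES).putLong(leadingId).array();
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> regions = new TreeMap<>(PermutedKeys::compareUnsigned);
        regions.put(new byte[0], "region-0");
        regions.put(splitPoint(1000L), "region-1");
        regions.put(splitPoint(2000L), "region-2");

        long[] ids = {1500L, 42L, 2500L, 7L}; // made-up sample record
        for (int t = 0; t < ORDERINGS.length; t++) {
            byte[] key = rowKey(ids, ORDERINGS[t]);
            System.out.println("table " + t + ": " + key.length + "-byte key -> " + regionFor(regions, key));
        }
    }
}
```

With the sample IDs, keys that start with 42 land in region-0, keys that start with 1500 in region-1, and keys that start with 2500 in region-2, so one logical record is written to different regions (and possibly different region servers) in each table.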
