Sever,

You and Jesse Yates should talk. See http://jyates.github.com/2012/07/09/consistent-enough-secondary-indexes.html

 - Andy

On Jul 14, 2012, at 5:24 AM, Sever Fundatureanu <[email protected]> wrote:

> My intention is to implement a Secondary Index as suggested here:
> http://wiki.apache.org/hadoop/Hbase/SecondaryIndexing.
> It is advised there to add secondary index edits to a "shared work queue",
> and that "The shared queue would be a thread or threadpool that picks up
> these secondary table edit jobs and applies them using a normal Put
> operation to the secondary table". Is this shared queue some kind of
> mechanism external to all Region servers? Or a queue shared only between
> the threads local to one RS?
>
> Thanks,
> Sever
>
> On Mon, Jul 9, 2012 at 10:16 AM, Nick Dimiduk <[email protected]> wrote:
>
>> On Sat, Jul 7, 2012 at 2:58 AM, Sever Fundatureanu <[email protected]> wrote:
>>
>>> Also, does anybody know what the flow in the system is when a coprocessor
>>> on one RS makes a Put call with a row key that falls on another RS?
>>> I.e. do the Region servers communicate directly with each other?
>>>
>>
>> In this case, the coprocessor in your RS is acting like any other HBase
>> client. Puts will write from the coproc to the target RS like a normal
>> write. That is, of course, assuming I understand your implementation.
>>
>> -n
>>
>>> On Fri, Jul 6, 2012 at 10:16 PM, Nick Dimiduk <[email protected]> wrote:
>>>
>>>> Sever,
>>>>
>>>> I presume you're loading your data via online Puts via the MR job (as
>>>> opposed to generating HFiles). What are you hoping to gain from a
>>>> coprocessor implementation vs the 6 MR jobs? Have you pre-split your
>>>> tables? Can the RegionServer(s) handle all the concurrent mappers?
>>>>
>>>> -n
>>>>
>>>> On Mon, Jul 2, 2012 at 11:58 AM, Sever Fundatureanu <[email protected]> wrote:
>>>>
>>>>> I agree that increasing the timeout is not the best option; I will work
>>>>> both on better balancing the load and maybe doing it in increments like
>>>>> you suggested. However, for now I want a quick fix to the problem.
>>>>>
>>>>> Just to see if I understand this right: a ZooKeeper node redirects my
>>>>> client to a region server node and then my client talks directly to this
>>>>> region server; now the timeout happens on the client while talking to the
>>>>> RS, right? It expects some kind of confirmation and it times out. If this
>>>>> is the case, how can I increase this timeout? I only found
>>>>> "zookeeper.session.timeout" in the documentation, which is the timeout
>>>>> between ZooKeeper and HBase.
>>>>>
>>>>> Thanks,
>>>>> Sever
>>>>>
>>>>> On Mon, Jul 2, 2012 at 8:19 PM, Jean-Marc Spaggiari <[email protected]> wrote:
>>>>>
>>>>>> Hi Sever,
>>>>>>
>>>>>> It seems one of the nodes in your cluster is overwhelmed by the load
>>>>>> you are giving it.
>>>>>>
>>>>>> So IMO, you have two options here:
>>>>>> First, you can try to reduce the load. I mean, split the bulk into
>>>>>> multiple smaller bulks and load them one by one to give your cluster
>>>>>> time to dispatch it correctly.
>>>>>> Second, you can increase the timeout from 60s to 120s. But you might
>>>>>> face the same issue with 120s, so I really recommend the first option.
>>>>>>
>>>>>> JM
>>>>>>
>>>>>> 2012/7/2, Sever Fundatureanu <[email protected]>:
>>>>>>> Can someone please help me with this?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Sever
>>>>>>>
>>>>>>> On Tue, Jun 26, 2012 at 8:14 PM, Sever Fundatureanu <[email protected]> wrote:
>>>>>>>
>>>>>>>> My keys are built of 4 8-byte Ids.
>>>>>>>> I am currently doing the load with MR but I get a timeout when doing
>>>>>>>> the loadIncrementalFiles call:
>>>>>>>>
>>>>>>>> 12/06/24 21:29:01 ERROR mapreduce.LoadIncrementalHFiles: Encountered unrecoverable error from region server
>>>>>>>> org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after attempts=10, exceptions:
>>>>>>>> Sun Jun 24 21:29:01 CEST 2012, org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$3@4699ecf9, java.net.SocketTimeoutException: Call to das3002.cm.cluster/10.141.0.79:60020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/10.141.0.254:43240 remote=das3002.cm.cluster/10.141.0.79:60020]
>>>>>>>>
>>>>>>>>     at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1345)
>>>>>>>>     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.tryAtomicRegionLoad(LoadIncrementalHFiles.java:487)
>>>>>>>>     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$1.call(LoadIncrementalHFiles.java:275)
>>>>>>>>     at org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles$1.call(LoadIncrementalHFiles.java:273)
>>>>>>>>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>>>>>>>>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>>>>>>>     at java.lang.Thread.run(Thread.java:662)
>>>>>>>> 12/06/24 21:30:52 ERROR mapreduce.LoadIncrementalHFiles: Encountered unrecoverable error from region server
>>>>>>>>
>>>>>>>> Is there a way in which I can increase the timeout period?
>>>>>>>>
>>>>>>>> Thank you,
>>>>>>>>
>>>>>>>> On Tue, Jun 26, 2012 at 7:05 PM, Andrew Purtell <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> On Tue, Jun 26, 2012 at 9:56 AM, Sever Fundatureanu <[email protected]> wrote:
>>>>>>>>>> I have to bulkload 6 tables which contain the same information but
>>>>>>>>>> with a different order to cover all possible access patterns. Would
>>>>>>>>>> it be a good idea to do only one load and use co-processors to
>>>>>>>>>> populate the other tables, instead of doing the traditional MR
>>>>>>>>>> bulkload which would require 6 separate jobs?
>>>>>>>>>
>>>>>>>>> Without knowing more than you've said, it seems better to use MR to
>>>>>>>>> build all input.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> - Andy
>>>>>>>>>
>>>>>>>>> Problems worthy of attack prove their worth by hitting back. - Piet
>>>>>>>>> Hein (via Tom White)
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sever Fundatureanu
>>>>>>>>
>>>>>>>> Vrije Universiteit Amsterdam
>>>>>>>> E-mail: [email protected]
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Sever Fundatureanu
>>>>>>>
>>>>>>> Vrije Universiteit Amsterdam
>>>>>>> E-mail: [email protected]
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Sever Fundatureanu
>>>>>
>>>>> Vrije Universiteit Amsterdam
>>>>> E-mail: [email protected]
>>>>>
>>>>
>>>
>>> --
>>> Sever Fundatureanu
>>>
>>> Vrije Universiteit Amsterdam
>>> E-mail: [email protected]
>>>
>>
>
> --
> Sever Fundatureanu
>
> Vrije Universiteit Amsterdam
> E-mail: [email protected]
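
Sever's question about the wiki's "shared work queue" can be made concrete with a small sketch of one possible reading: a queue local to a single region server, implemented as a fixed thread pool that receives index edits from the write path and applies each one through a writer. IndexWorkQueue, IndexEdit and IndexTableWriter are hypothetical names, not HBase classes; in a real coprocessor the writer would issue a normal Put to the index table. This does not settle which design the wiki intended.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a work queue local to one region server that applies
// secondary index edits asynchronously. Not HBase API.
public class IndexWorkQueue {

    // One pending index edit: the index row to write and the primary row it points to.
    static final class IndexEdit {
        final String indexRow;
        final String primaryRow;

        IndexEdit(String indexRow, String primaryRow) {
            this.indexRow = indexRow;
            this.primaryRow = primaryRow;
        }
    }

    // Stand-in for "apply the edit with a normal Put to the secondary table".
    interface IndexTableWriter {
        void put(IndexEdit edit);
    }

    private final ExecutorService pool;
    private final IndexTableWriter writer;

    IndexWorkQueue(int threads, IndexTableWriter writer) {
        // Fixed pool backed by an unbounded in-memory queue; edits still queued
        // when the process dies are lost, so index entries can be missed after a crash.
        this.pool = Executors.newFixedThreadPool(threads);
        this.writer = writer;
    }

    // Called from the write path (for example a post-Put hook); does not wait
    // for the index write to happen.
    void enqueue(IndexEdit edit) {
        pool.execute(() -> writer.put(edit));
    }

    // Stop accepting edits and wait for the queued ones to be applied.
    boolean shutdown() {
        pool.shutdown();
        try {
            return pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> written = Collections.synchronizedList(new ArrayList<>());
        IndexWorkQueue queue = new IndexWorkQueue(2, e -> written.add(e.indexRow + " -> " + e.primaryRow));
        queue.enqueue(new IndexEdit("alice", "row1"));
        queue.enqueue(new IndexEdit("bob", "row2"));
        queue.shutdown();
        System.out.println(written.size() + " index edits applied");
    }
}
```

Keeping the queue in-process avoids any extra coordination service, at the cost of durability: the primary write succeeds even if the index write never happens.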
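
Nick's answer, that a coprocessor writing to another region server behaves like any other client, rests on how a client finds the region for a row. The sketch below is a simplified model, not HBase code: regions are contiguous key ranges identified by their start keys, so the target region is the one with the greatest start key not exceeding the row key. In HBase releases of this era the real client resolves this through the catalog tables (-ROOT- and .META.) and caches the result; ZooKeeper only tells it where the catalog lives. The server names rs1 to rs3 are made up.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of client-side region lookup. Keys are ASCII strings here;
// HBase itself compares row keys as unsigned byte arrays.
public class RegionLookup {
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    // The hosting region is the one with the greatest start key <= rowKey.
    String serverFor(String rowKey) {
        Map.Entry<String, String> region = serverByStartKey.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers row " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionLookup meta = new RegionLookup();
        meta.addRegion("", "rs1"); // the first region has an empty start key
        meta.addRegion("g", "rs2");
        meta.addRegion("p", "rs3");
        // A coprocessor on rs1 writing row "hello" would send its Put to rs2,
        // just as an external client would.
        System.out.println(meta.serverFor("hello"));
    }
}
```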
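
Nick's question about pre-splitting matters because a new table starts as a single region, so an initial load all lands on one server. If the first 8-byte id in Sever's keys is spread roughly uniformly over non-negative longs (an assumption the thread does not confirm), split points could be spaced evenly as in this sketch. In HBase releases of this era the resulting byte[][] can be passed to HBaseAdmin.createTable(HTableDescriptor, byte[][] splitKeys).

```java
import java.nio.ByteBuffer;

// Evenly spaced split keys over the leading 8-byte id, assuming ids are
// non-negative longs spread roughly uniformly over [0, Long.MAX_VALUE].
public class SplitKeys {

    // `regions` regions need regions - 1 split keys.
    static byte[][] evenSplits(int regions) {
        if (regions < 2) {
            throw new IllegalArgumentException("need at least 2 regions");
        }
        long step = Long.MAX_VALUE / regions;
        byte[][] splits = new byte[regions - 1][];
        for (int i = 1; i < regions; i++) {
            // 8-byte big-endian prefix; for non-negative values, unsigned
            // byte-wise comparison matches numeric order.
            splits[i - 1] = ByteBuffer.allocate(Long.BYTES).putLong(step * i).array();
        }
        return splits;
    }

    public static void main(String[] args) {
        for (byte[] split : evenSplits(4)) {
            System.out.println(ByteBuffer.wrap(split).getLong());
        }
    }
}
```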
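
Jean-Marc's first option, splitting one large bulk load into smaller ones loaded one after another, can be sketched as below. The sketch only handles the batching: loadBatch is a hypothetical callback standing in for whatever actually loads a batch (for instance, one bulk load per output directory), and the input names and the pause length are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Split a large load into smaller batches and hand them to a loader one at a time.
public class IncrementalLoad {

    // Consecutive batches of at most batchSize items; the last may be shorter.
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            result.add(new ArrayList<>(items.subList(start, end)));
        }
        return result;
    }

    // Load batches strictly in sequence, pausing between them so the cluster
    // is not hit with everything at once. loadBatch is a hypothetical hook.
    static <T> void loadSequentially(List<T> items, int batchSize, long pauseMillis,
                                     Consumer<List<T>> loadBatch) throws InterruptedException {
        List<List<T>> all = batches(items, batchSize);
        for (int i = 0; i < all.size(); i++) {
            loadBatch.accept(all.get(i));
            if (i < all.size() - 1) {
                Thread.sleep(pauseMillis);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> inputs = Arrays.asList("part-0", "part-1", "part-2", "part-3", "part-4");
        loadSequentially(inputs, 2, 1000, batch -> System.out.println("loading " + batch));
    }
}
```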
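
Sever's keys are four 8-byte ids, and the six tables hold the same data in different orders. The sketch below shows one way such a 32-byte key could be encoded; the orderings are illustrative, since the thread does not say which six were used. Writing each id big-endian keeps numeric order under HBase's unsigned byte-wise row comparison, but only for non-negative ids: negative values would sort after all positive ones. A single MR pass could emit one such key per table for every input record, which is one way to follow Andy's advice to build all input with MR.

```java
import java.nio.ByteBuffer;

// Build a 32-byte row key from four 8-byte ids, laid out in the order given by
// `order` (a permutation of 0..3). Each table would use its own order.
public class CompositeKey {

    static byte[] rowKey(long[] ids, int[] order) {
        if (ids.length != 4 || order.length != 4) {
            throw new IllegalArgumentException("expected 4 ids and a 4-element order");
        }
        boolean[] seen = new boolean[4];
        for (int position : order) {
            if (position < 0 || position > 3 || seen[position]) {
                throw new IllegalArgumentException("order must be a permutation of 0..3");
            }
            seen[position] = true;
        }
        ByteBuffer buf = ByteBuffer.allocate(4 * Long.BYTES); // big-endian by default
        for (int position : order) {
            buf.putLong(ids[position]);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        long[] ids = {11L, 22L, 33L, 44L};
        // Two illustrative orderings, e.g. for two of the six tables.
        byte[] natural = rowKey(ids, new int[] {0, 1, 2, 3});
        byte[] thirdFirst = rowKey(ids, new int[] {2, 0, 1, 3});
        System.out.println(natural.length + " bytes, leading ids "
                + ByteBuffer.wrap(natural).getLong() + " and " + ByteBuffer.wrap(thirdFirst).getLong());
    }
}
```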
