Hi Ted,

Yes, I am aware that this isn't a good setup. I am working on understanding coprocessors before I move on to a bigger setup.

I am talking about the presentation at https://hbase.s3.amazonaws.com/hbase/HBase-CP-HUG10.pdf by Andrew Purtell, slide 5: <quote>For long running jobs the client must periodically poll status to keep it alive; jobs without interest will be cancelled</quote>
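That status-polling API looks like it is part of the in-progress HBASE-3607 work, so for now the only client-side knob I can see is the RPC timeout. If I read Stack's reply below correctly, the 60-second cutoff is the client RPC timeout, while the lease period is read on the server side, so constructing a lease in the client does nothing. Here is a minimal sketch of what I plan to try; the key name "hbase.rpc.timeout" (default 60000 ms) is my assumption from the 0.92-era code, so please correct me if it is the wrong knob:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;

public class TimeoutSetup {
    public static void main(String[] args) {
        // HBaseConfiguration.create() layers hbase-default.xml and hbase-site.xml
        // on top of the Hadoop defaults; new Configuration(true) does not load them.
        Configuration conf = HBaseConfiguration.create();

        // Assumption: "hbase.rpc.timeout" is the client-side RPC timeout key
        // (default 60000 ms). Raise it past the expected aggregation time so the
        // call is not cut off after 60 seconds.
        conf.setLong("hbase.rpc.timeout", 40 * 60 * 1000L); // 40 minutes

        AggregationClient aClient = new AggregationClient(conf);
        // ... run the aggregation with aClient as before ...
    }
}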
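And taking the hint that Leases is a server-side construct, this is the cleaned-up client I intend to run, with the client-side Leases object removed and HBaseConfiguration used instead of a bare Hadoop Configuration. The table and family names come from the logs below; the qualifier value is a stand-in for my actual column:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.coprocessor.AggregationClient;
import org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter;
import org.apache.hadoop.hbase.coprocessor.ColumnInterpreter;
import org.apache.hadoop.hbase.util.Bytes;

public class AvgClient {
    // "Test" and "data" are taken from the region server log below;
    // "q" is a placeholder for my real qualifier.
    private static final byte[] TEST_TABLE = Bytes.toBytes("Test");
    private static final byte[] TEST_FAMILY = Bytes.toBytes("data");
    private static final byte[] TEST_QUALIFIER = Bytes.toBytes("q");

    public static void main(String[] args) throws Throwable {
        Configuration conf = HBaseConfiguration.create();
        // Same assumption as above about the client RPC timeout key.
        conf.setLong("hbase.rpc.timeout", 40 * 60 * 1000L);

        AggregationClient aClient = new AggregationClient(conf);

        Scan scan = new Scan();
        scan.addColumn(TEST_FAMILY, TEST_QUALIFIER);
        final ColumnInterpreter<Long, Long> ci = new LongColumnInterpreter();

        // No client-side Leases: leases are managed by the region server, so
        // creating one here had no effect in my earlier code.
        long startTime = System.currentTimeMillis();
        double avg = aClient.avg(TEST_TABLE, ci, scan);
        long endTime = System.currentTimeMillis();

        System.out.println("Avg is: " + avg);
        System.out.println("Time taken is: " + (endTime - startTime) + " ms");
    }
}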
On Mon, Sep 26, 2011 at 2:08 PM, Ted Yu <[email protected]> wrote:
> Looks like my response to user@ got bounced.
>
> Himanshu is continuing the work on HBASE-3607.
>
>>> Currently all the regions are on a single region server.
> I don't think the above is a good setup.
>
>>> It was mentioned to poll the job status periodically to avoid timeouts.
> Can you tell us more about the presentation? Normally you can include the
> author in the cc list of this email chain.
>
> Cheers
>
> On Mon, Sep 26, 2011 at 12:44 AM, Mayuresh <[email protected]> wrote:
>>
>> Yes, the computation takes longer than 60 secs. Will setting the lease
>> timeout in hbase-site.xml help?
>>
>> In one of the coprocessor presentations, it was mentioned to poll the
>> job status periodically to avoid timeouts. Any idea how to do that?
>>
>> On Sep 23, 2011 9:59 PM, "Stack" <[email protected]> wrote:
>> > Is it doing a long GC? Is the aggregation taking longer than the
>> > client 60 second timeout (you are setting a lease below, but I don't
>> > think that will have an effect since it's server-side)?
>> >
>> > St.Ack
>> >
>> > On Fri, Sep 23, 2011 at 2:52 AM, Mayuresh <[email protected]> wrote:
>> >> Hi,
>> >>
>> >> I am using the AggregationClient to perform an aggregate calculation on a
>> >> 10 M row HBase table which is spread across around 27 regions.
>> >> Currently all the regions are on a single region server.
>> >>
>> >> Here is briefly what the code is doing:
>> >>
>> >> Leases leases = new Leases(40 * 60 * 1000, 5 * 1000);
>> >> leases.start();
>> >> long startTime = System.currentTimeMillis();
>> >> System.out.println("Start time: " + startTime);
>> >> Configuration conf = new Configuration(true);
>> >> AggregationClient aClient = new AggregationClient(conf);
>> >>
>> >> Scan scan = new Scan();
>> >> scan.addColumn(TEST_FAMILY, TEST_QUALIFIER);
>> >> final ColumnInterpreter<Long, Long> ci = new LongColumnInterpreter();
>> >>
>> >> double avg = aClient.avg(TEST_TABLE, ci, scan);
>> >> leases.close();
>> >> long endTime = System.currentTimeMillis();
>> >> System.out.println("End time: " + endTime);
>> >> System.out.println("Avg is: " + avg);
>> >> System.out.println("Time taken is: " + (endTime - startTime));
>> >>
>> >> However, it is failing after some time with the following log messages
>> >> in the region server log:
>> >>
>> >> 2011-09-23 14:21:25,649 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>> >> 2011-09-23 14:21:25,650 DEBUG org.apache.hadoop.hbase.io.hfile.HFileReaderV2: On close of file 15761056997240226 evicted 344 block(s)
>> >> 2011-09-23 14:21:25,650 DEBUG org.apache.hadoop.hbase.regionserver.Store: closed data
>> >> 2011-09-23 14:21:25,650 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>> >> 2011-09-23 14:21:25,650 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Closed region Test,serv3-1315575650141,1316682342728.b3fe66994b0f2fb0643c88f6897474a1.
>> >> 2011-09-23 14:21:25,651 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call execCoprocessor([B@6b6ab732, getAvg(org.apache.hadoop.hbase.client.coprocessor.LongColumnInterpreter@2b216ab6, {"timeRange":[0,9223372036854775807],"batch":-1,"startRow":"serv1-1315571539741","stopRow":"serv5-1315575834916","totalColumns":0,"cacheBlocks":true,"families":{"data":[]},"maxVersions":1,"caching":-1})) from 137.72.240.177:47563: output error
>> >> 2011-09-23 14:21:25,651 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020 caught: java.nio.channels.ClosedChannelException
>> >>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>> >>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1501)
>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:876)
>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:955)
>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer$Call.sendResponseIfReady(HBaseServer.java:390)
>> >>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1240)
>> >>
>> >> And the following in the zookeeper log:
>> >>
>> >> 2011-09-23 14:19:50,978 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x1329577cad70004 with negotiated timeout 180000 for client /137.72.240.177:45431
>> >> 2011-09-23 14:21:11,028 WARN org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException: Unable to read additional data from client sessionid 0x1329577cad70004, likely client has closed socket
>> >> 2011-09-23 14:21:11,029 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /137.72.240.177:45431 which had sessionid 0x1329577cad70004
>> >> 2011-09-23 14:24:42,000 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x1329577cad70004, timeout of 180000ms exceeded
>> >> 2011-09-23 14:24:42,001 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x1329577cad70004
>> >>
>> >> Any ideas on how to resolve this?
>> >>
>> >> Thanks in advance.
>> >>
>> >> Regards,
>> >> Mayuresh

--
-Mayuresh
