That was going to be my suggestion as well, except the ZooKeeper property is maxClientCnxns.
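For reference, a minimal sketch of what the change might look like in zoo.cfg. The value 100 is just Jim's suggestion from the message below, and the client port is the one that shows up in your logs; treat both as assumptions to adjust for your setup rather than recommended settings.

```properties
# zoo.cfg (fragment, illustrative only)
# Port the clients connect to; 2181 matches the connection lines in the logs below.
clientPort=2181
# Per-client-IP cap on concurrent connections to this ZooKeeper server.
# 100 is the ballpark figure suggested in this thread, not a tuned value.
maxClientCnxns=100
```

As far as I know the limit is applied per client IP address, so in pseudo-distributed mode everything on the laptop counts against the same cap. ZooKeeper reads zoo.cfg at startup, so it needs a restart to pick up the change.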
Cheers,
Adam

On Aug 16, 2012 7:22 AM, "Jim Klucar" wrote:
> Just shooting from the hip here.
>
> Zookeeper maxclientcxns in zoo.cfg should be increased from the default to
> something like 100. Check the zookeeper log file to see if it is shutting
> down connections.
>
> Check what your max open files setting is for your OS with 'ulimit
> -n' and increase it if necessary.
>
> On Aug 16, 2012, at 4:00 AM, Arjumand Bonhomme wrote:
>
> Hello,
>
> I'm fairly new to both Accumulo and Hadoop, so I think my problem may be
> due to poor configuration on my part, but I'm running out of ideas.
>
> I'm running this on a mac laptop, with hadoop (hadoop-0.20.2 from cdh3u4)
> in pseudo-distributed mode.
> zookeeper version zookeeper-3.3.5 from cdh3u4
> I'm using the 1.4.1 release of accumulo with a configuration copied from
> "conf/examples/512MB/standalone"
>
> I've got a Map task that is using an accumulo table as the input.
> I'm fetching all rows, but just a single column family, that has hundreds
> or even thousands of different column qualifiers.
> The table has a SummingCombiner installed for the given column family.
>
> The task runs fine at first, but after ~9-15K records (I print the record
> count to the console every 1K records), it hangs and the following messages
> are printed to the console where I'm running the job:
> 12/08/16 02:57:08 INFO zookeeper.ClientCnxn: Unable to read additional
> data from server sessionid 0x1392cc35b460d1c, likely server has closed
> socket, closing socket connection and attempting reconnect
> 12/08/16 02:57:08 INFO zookeeper.ClientCnxn: Opening socket connection to
> server localhost/fe80:0:0:0:0:0:0:1%1:2181
> 12/08/16 02:57:08 INFO zookeeper.ClientCnxn: Socket connection established
> to localhost/fe80:0:0:0:0:0:0:1%1:2181, initiating session
> 12/08/16 02:57:08 INFO zookeeper.ClientCnxn: Unable to reconnect to
> ZooKeeper service, session 0x1392cc35b460d1c has expired, closing socket
> connection
> 12/08/16 02:57:08 INFO zookeeper.ClientCnxn: EventThread shut down
> 12/08/16 02:57:10 INFO zookeeper.ZooKeeper: Initiating client connection,
> connectString=localhost sessionTimeout=30000
> watcher=org.apache.accumulo.core.zookeeper.ZooSession$AccumuloWatcher@32f5c51c
> 12/08/16 02:57:10 INFO zookeeper.ClientCnxn: Opening socket connection to
> server localhost/0:0:0:0:0:0:0:1:2181
> 12/08/16 02:57:10 INFO zookeeper.ClientCnxn: Socket connection established
> to localhost/0:0:0:0:0:0:0:1:2181, initiating session
> 12/08/16 02:57:10 INFO zookeeper.ClientCnxn: Session establishment
> complete on server localhost/0:0:0:0:0:0:0:1:2181, sessionid =
> 0x1392cc35b460d25, negotiated timeout = 30000
> 12/08/16 02:57:11 INFO mapred.LocalJobRunner:
> 12/08/16 02:57:14 INFO mapred.LocalJobRunner:
> 12/08/16 02:57:17 INFO mapred.LocalJobRunner:
>
> Sometimes the messages contain a stack trace like the one below:
> 12/08/16 01:57:40 WARN zookeeper.ClientCnxn: Session 0x1392cc35b460b40 for
> server localhost/fe80:0:0:0:0:0:0:1%1:2181, unexpected error, closing
> socket connection and attempting reconnect
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcher.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
> at sun.nio.ch.IOUtil.read(IOUtil.java:166)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
> at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:856)
> at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1154)
> 12/08/16 01:57:40 INFO zookeeper.ClientCnxn: Opening socket connection to
> server localhost/127.0.0.1:2181
> 12/08/16 01:57:40 INFO zookeeper.ClientCnxn: Socket connection established
> to localhost/127.0.0.1:2181, initiating session
> 12/08/16 01:57:40 INFO zookeeper.ClientCnxn: Unable to reconnect to
> ZooKeeper service, session 0x1392cc35b460b40 has expired, closing socket
> connection
> 12/08/16 01:57:40 INFO zookeeper.ClientCnxn: EventThread shut down
> 12/08/16 01:57:41 INFO zookeeper.ZooKeeper: Initiating client connection,
> connectString=localhost sessionTimeout=30000
> watcher=org.apache.accumulo.core.zookeeper.ZooSession$AccumuloWatcher@684a26e8
> 12/08/16 01:57:41 INFO zookeeper.ClientCnxn: Opening socket connection to
> server localhost/fe80:0:0:0:0:0:0:1%1:2181
> 12/08/16 01:57:41 INFO zookeeper.ClientCnxn: Socket connection established
> to localhost/fe80:0:0:0:0:0:0:1%1:2181, initiating session
> 12/08/16 01:57:41 INFO zookeeper.ClientCnxn: Session establishment
> complete on server localhost/fe80:0:0:0:0:0:0:1%1:2181, sessionid =
> 0x1392cc35b460b46, negotiated timeout = 30000
>
> I've poked through the logs in accumulo, and I've noticed that when it
> hangs, the following is written to the "logger_HOSTNAME.debug.log" file:
> 16 03:29:46,332 [logger.LogService] DEBUG: event null None Disconnected
> 16 03:29:47,248 [zookeeper.ZooSession] DEBUG: Session expired, state of
> current session : Expired
> 16 03:29:47,248 [logger.LogService] DEBUG: event null None Expired
> 16 03:29:47,249 [logger.LogService] WARN : Logger lost zookeeper
> registration at null
> 16 03:29:47,452 [logger.LogService] INFO : Logger shutting down
> 16 03:29:47,453 [logger.LogWriter] INFO : Shutting down
>
> I've noticed that if I make the map task print out the record count more
> frequently (i.e. every 10 records), it seems to be able to get through more
> records than when I only print every 1K records. My assumption was that
> this had something to do with more time being spent in the map task, and
> not fetching data from accumulo. There was at least one occasion where I
> printed to the console for every record, and in that situation it managed
> to process 47K records, although I have been unable to repeat that behavior.
>
> I've also noticed that if I stop and start accumulo, the map-reduce job
> will pick up where it left off, but seems to fail more quickly.
>
> Could someone make some suggestions as to what my problem might be? It
> would be greatly appreciated. If you need any additional information from
> me, just let me know. I'd paste my config files, driver setup, and example
> data into this post, but I think it's probably long enough already.
>
> Thanks in advance,
> -Arjumand
