The link (which I may be misreading) is http://groups.google.com/group/hector-users/browse_thread/thread/8d7004b6f85a0f2e
It's only started happening today and happened on 2 occassions (8:43 and 10:21) performing the same function (querying a column family). It seems to be trying to access a connection on one of the servers The client accesses the first node: 2011-08-02 08:43:06,541 ERROR [me.prettyprint.cassandra.connection.HThriftClient] - Could not flush transport (to be expected if the pool is shutting down) in close for client: CassandraClient<cassandradevrk1:9393-33> org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe ... 2011-08-02 08:43:06,544 WARN [me.prettyprint.cassandra.connection.HConnectionManager] - Could not fullfill request on this host CassandraClient<cassandradevrk1:9393-33> ... 2011-08-02 08:43:06,543 ERROR [me.prettyprint.cassandra.connection.HConnectionManager] - MARK HOST AS DOWN TRIGGERED for host cassandradevrk1(10.130.202.34):9393 2011-08-02 08:43:06,543 ERROR [me.prettyprint.cassandra.connection.HConnectionManager] - Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393}; IsActive?: true; Active: 1; Blocked: 0; Idle: 15; NumBeforeExhausted: 49 2011-08-02 08:43:06,543 ERROR [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393} 2011-08-02 08:43:06,544 ERROR [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393} 2011-08-02 08:43:06,544 INFO [me.prettyprint.cassandra.connection.CassandraHostRetryService] - Host detected as down was added to retry queue: cassandradevrk1(10.130.202.34):9393 2011-08-02 08:43:06,544 WARN [me.prettyprint.cassandra.connection.HConnectionManager] - Could not fullfill request on this host CassandraClient<cassandradevrk1:9393-33> 2011-08-02 08:43:06,544 WARN [me.prettyprint.cassandra.connection.HConnectionManager] - Exception: me.prettyprint.hector.api.exceptions.HectorTransportException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset Then it appears to try the second node and fails: 2011-08-02 08:43:06,556 INFO [me.prettyprint.cassandra.connection.HConnectionManager] - Client CassandraClient<cassandradevrk1:9393-33> released to inactive or dead pool. Closing. 2011-08-02 08:43:06,557 ERROR [me.prettyprint.cassandra.connection.HThriftClient] - Could not flush transport (to be expected if the pool is shutting down) in close for client: CassandraClient<cassandradevrk2:9393-49> org.apache.thrift.transport.TTransportException: java.net.SocketException: Broken pipe 2011-08-02 08:43:06,558 ERROR [me.prettyprint.cassandra.connection.HConnectionManager] - MARK HOST AS DOWN TRIGGERED for host cassandradevrk2(10.130.202.35):9393 2011-08-02 08:43:06,559 ERROR [me.prettyprint.cassandra.connection.HConnectionManager] - Pool state on shutdown: <ConcurrentCassandraClientPoolByHost>:{cassandradevrk2(10.130.202.35):9393}; IsActive?: true; Active: 1; Blocked: 0; Idle: 15; NumBeforeExhausted: 49 2011-08-02 08:43:06,559 ERROR [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown triggered on <ConcurrentCassandraClientPoolByHost>:{cassandradevrk2(10.130.202.35):9393} 2011-08-02 08:43:06,559 ERROR [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown complete on <ConcurrentCassandraClientPoolByHost>:{cassandradevrk2(10.130.202.35):9393} 2011-08-02 08:43:06,559 INFO [me.prettyprint.cassandra.connection.CassandraHostRetryService] - Host detected as down was added to retry queue: cassandradevrk2(10.130.202.35):9393 2011-08-02 08:43:06,560 WARN [me.prettyprint.cassandra.connection.HConnectionManager] - Could not fullfill request on this host CassandraClient<cassandradevrk2:9393-49> 2011-08-02 08:43:06,560 WARN [me.prettyprint.cassandra.connection.HConnectionManager] - Exception: me.prettyprint.hector.api.exceptions.HectorTransportException: org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset The process is the same at 10:21. *Are the exceptions related to any external events (e.g. node restarts, network issues...)?* Not that I'm aware, unless there are firewall timeouts between the application and the node servers. Let me find out. The cassandra log files have no errors reported. *What versions of Hector and Cassandra are you running?* Cassandra 0.8.1, Hector 0.8.0-1 On Tue, Aug 2, 2011 at 10:37 AM, Jim Ancona <j...@anconafamily.com> wrote: > On Tue, Aug 2, 2011 at 4:36 PM, Anthony Ikeda > <anthony.ikeda....@gmail.com> wrote: > > I'm not sure if this is a problem with Hector or with Cassandra. > > We seem to be seeing broken pipe issues with our connections on the > client > > side (Exception below). A bit of googling finds possibly a problem with > the > > amount of data we are trying to store, although I'm certain our datasets > are > > not all that large. > > I'm not sure what you're referring to here. Large requests could lead > to timeouts, but that's not what you're seeing here. Could you link to > the page you're referencing? > > > A nodetool ring command doesn't seem to present any downed nodes: > > Address DC Rack Status State Load > Owns > > Token > > > > 153951716904446304929228999025275230571 > > 10.130.202.34 datacenter1 rack1 Up Normal 470.74 KB > > 79.19% 118538200848404459763384037192174096102 > > 10.130.202.35 datacenter1 rack1 Up Normal 483.63 KB > > 20.81% 153951716904446304929228999025275230571 > > > > There are no errors in the cassandra server logs. > > > > Are there any particular timeouts on connections that we need to be aware > > of? Or perhaps configure on the Cassandra nodes? Is this purely and issue > > with the Hector API configuration? > > There is a server side timeout (rpc_timeout_in_ms in cassandra.yaml) > and a Hector client-side timeout > (CassandraHostConfigurator.cassandraThriftSocketTimeout). But again, > the "Broken pipe" error is not a timeout, it indicates that something > happened to the underlying network socket. For example you will see > those when a server node is restarted. > > Some questions that might help troubleshoot this: > How often are these occurring? > Does this affect both nodes in the cluster or just one? > Are the exceptions related to any external events (e.g. node restarts, > network issues...)? > What versions of Hector and Cassandra are you running? > > Keep in mind that failures like this will normally be retried by > Hector, resulting in no loss of data. For that reason, I think that > exception is logged as a warning in the newest Hector versions. > > We've seen something similar, but more catastrophic because it affects > connectivity to the entire cluster, not just a single node. See this > post for more details: http://goo.gl/hrgkw So far we haven't > identified the cause. > > Jim > > > Anthony > > > > 2011-08-02 08:43:06,541 ERROR > > [me.prettyprint.cassandra.connection.HThriftClient] - Could not flush > > transport (to be expected if the pool is shutting down) in close for > client: > > CassandraClient<cassandradevrk1:9393-33> > > org.apache.thrift.transport.TTransportException: > java.net.SocketException: > > Broken pipe > > at > > > org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:147) > > at > > > org.apache.thrift.transport.TFramedTransport.flush(TFramedTransport.java:156) > > at > > > me.prettyprint.cassandra.connection.HThriftClient.close(HThriftClient.java:85) > > at > > > me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:232) > > at > > > me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131) > > at > > > me.prettyprint.cassandra.service.KeyspaceServiceImpl.getSlice(KeyspaceServiceImpl.java:289) > > at > > > me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:53) > > at > > > me.prettyprint.cassandra.model.thrift.ThriftSliceQuery$1.doInKeyspace(ThriftSliceQuery.java:49) > > at > > > me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20) > > at > > > me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85) > > at > > > me.prettyprint.cassandra.model.thrift.ThriftSliceQuery.execute(ThriftSliceQuery.java:48) > > at > > > com.wsgc.services.registry.persistenceservice.impl.cassandra.strategy.read.StandardFindRegistryPersistenceStrategy.findRegistryByProfileId(StandardFindRegistryPersistenceStrategy.java:237) > > at > > > com.wsgc.services.registry.persistenceservice.impl.cassandra.strategy.read.StandardFindRegistryPersistenceStrategy.execute(StandardFindRegistryPersistenceStrategy.java:277) > > at > > > com.wsgc.services.registry.registryservice.impl.service.StandardRegistryService.getRegistriesByProfileId(StandardRegistryService.java:327) > > at > > > com.wsgc.services.registry.webapp.impl.RegistryServicesController.getRegistriesByProfileId(RegistryServicesController.java:247) > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at > > > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > > at > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > > at java.lang.reflect.Method.invoke(Method.java:597) > > at > > > org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:175) > > at > > > org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:421) > > at > > > org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:409) > > at > > > org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:774) > > at > > > org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719) > > at > > > org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:644) > > at > > > org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:549) > > at javax.servlet.http.HttpServlet.service(HttpServlet.java:617) > > at javax.servlet.http.HttpServlet.service(HttpServlet.java:717) > > at > > > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290) > > at > > > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > > at > > > org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:77) > > at > > > org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76) > > at > > > org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) > > at > > > org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) > > at > > > org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) > > at > > > org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) > > at > > > org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:563) > > at > > > org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) > > at > > > org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) > > at > > > org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) > > at > > > org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) > > at > org.apache.jk.server.JkCoyoteHandler.invoke(JkCoyoteHandler.java:190) > > at > org.apache.jk.common.HandlerRequest.invoke(HandlerRequest.java:291) > > at > org.apache.jk.common.ChannelSocket.invoke(ChannelSocket.java:774) > > at > > > org.apache.jk.common.ChannelSocket.processConnection(ChannelSocket.java:703) > > at > > > org.apache.jk.common.ChannelSocket$SocketConnection.runIt(ChannelSocket.java:896) > > at > > > org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690) > > at java.lang.Thread.run(Thread.java:662) > > Caused by: java.net.SocketException: Broken pipe > > at java.net.SocketOutputStream.socketWrite0(Native Method) > > at > java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92) > > at java.net.SocketOutputStream.write(SocketOutputStream.java:136) > > at > > > org.apache.thrift.transport.TIOStreamTransport.write(TIOStreamTransport.java:145) > > ... 47 more > > 2011-08-02 08:43:06,543 ERROR > > [me.prettyprint.cassandra.connection.HConnectionManager] - MARK HOST AS > DOWN > > TRIGGERED for host cassandradevrk1(10.130.202.34):9393 > > 2011-08-02 08:43:06,543 ERROR > > [me.prettyprint.cassandra.connection.HConnectionManager] - Pool state on > > shutdown: > > > <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393}; > > IsActive?: true; Active: 1; Blocked: 0; Idle: 15; NumBeforeExhausted: 49 > > 2011-08-02 08:43:06,543 ERROR > > [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown > > triggered on > > > <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393} > > 2011-08-02 08:43:06,544 ERROR > > [me.prettyprint.cassandra.connection.ConcurrentHClientPool] - Shutdown > > complete on > > > <ConcurrentCassandraClientPoolByHost>:{cassandradevrk1(10.130.202.34):9393} > > 2011-08-02 08:43:06,544 INFO > > [me.prettyprint.cassandra.connection.CassandraHostRetryService] - Host > > detected as down was added to retry queue: > > cassandradevrk1(10.130.202.34):9393 > > 2011-08-02 08:43:06,544 WARN > > [me.prettyprint.cassandra.connection.HConnectionManager] - Could not > > fullfill request on this host CassandraClient<cassandradevrk1:9393-33> > > >