Also, one last thing: one of our web servers using the OrientDB client got into a strange state after we restarted the OrientDB server. It started spewing the following into the logs until the log file was over 100GB:
[error] c.o.o.c.r.OStorageRemote - Cannot open database url=<hostname>:2424/<database>
java.io.IOException: Channel is closed
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:192) ~[orientdb-enterprise-2.1.1.jar:2.1.1]
    at com.orientechnologies.orient.enterprise.channel.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:171) ~[orientdb-enterprise-2.1.1.jar:2.1.1]
    at com.orientechnologies.orient.client.remote.OStorageRemote.beginResponse(OStorageRemote.java:2136) [orientdb-client-2.1.1.jar:2.1.1]
    at com.orientechnologies.orient.client.remote.OStorageRemote.openRemoteDatabase(OStorageRemote.java:1864) [orientdb-client-2.1.1.jar:2.1.1]
    at com.orientechnologies.orient.client.remote.OStorageRemote.handleException(OStorageRemote.java:1806) [orientdb-client-2.1.1.jar:2.1.1]
[error] c.o.o.c.r.OStorageRemote - Cannot open database url=<hostname>:2424/<database>
java.io.IOException: Channel is closed

This literally filled the entire log file. We have two other web server nodes, but they didn't experience this behavior, which suggests it's some sort of edge case. Have you seen this before?

On Friday, January 15, 2016 at 11:33:54 AM UTC-5, nightrise wrote:
>
> Hi Luigi,
>
> I'll see about updating to 2.1.9. I guess I should also set it up in
> distributed mode going forward, since it seems one node is not reliable
> enough.
>
> I should also note another issue: on the client side of things, these
> errors keep popping up until I restart the client application.
>
> [warn] c.o.o.c.r.OStorageRemote - Caught I/O errors from Not connected (local socket=?), trying to reconnect (error: java.io.IOException: Channel is closed)
> [warn] c.o.o.c.r.OStorageRemote - Connection re-acquired transparently after 1ms and 1 retries to server '<hostname>:2424/<database>': no errors will be thrown at application level
>
> (I replaced the actual hostname and database name above.)
>
> It seems like the client gets into a state where it keeps having to reacquire
> the connection. I'm using the graph factory and connection pool you guys
> provide, by the way.
>
> On Friday, January 15, 2016 at 3:53:18 AM UTC-5, Luigi Dell'Aquila wrote:
>>
>> Hi,
>>
>> two things here:
>> - first of all, I suggest you update to 2.1.9; we have fixed a lot of
>> issues since 2.1.1, so the problem is probably already solved
>> - if it happens again, please send us a thread dump (
>> http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstack.html);
>> it will help us find the root cause of the problem
>>
>> Thanks
>>
>> Luigi
>>
>>
>> 2016-01-14 21:24 GMT+01:00 nightrise <[email protected]>:
>>
>>> Hey guys,
>>>
>>> Today we introduced OrientDB at scale to our production environment. It
>>> lives on a beefy machine with high memory and is correctly configured to
>>> make use of it.
>>>
>>> Unfortunately, it fell over in the middle of the night and stopped
>>> responding. I had to kill the process and restart it -- after which things
>>> went back to normal. I should note that at the time, the load was pretty
>>> low, and CPU utilization was around 1% or so.
>>>
>>> I thought perhaps it was a fluke, but 12 hours later, sure enough, it
>>> crashed again. This time, CPU usage seemed to spike to about 12%, as
>>> did network output. Restarting it once more fixed the problem. Same thing
>>> again 15 minutes later -- lock up, had to restart it.
>>>
>>> I've tried perusing the logs to see if anything unusual pops up in there
>>> -- but I'm not finding anything.
>>> I'm using version 2.1.1 of OrientDB in standalone mode (not distributed).
>>>
>>> Are there any suggestions on why this might be happening and how I might
>>> be able to diagnose the root issue? Are there tools that I could use?
>>>
>>> I should note that I've stress tested OrientDB in the past with the
>>> queries that are in use.
>>>
>>> Any help would be appreciated!
>>>
>>> --
>>>
>>> ---
>>> You received this message because you are subscribed to the Google
>>> Groups "OrientDB" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
