Yes. Thanks to Luca, this issue has been fixed in the latest 1.7.5 hotfix. I was testing it all day today: several nodes on the same machine, several nodes on different machines. It works great, as expected. Everything gets replicated! Many thanks to Luca for fixing it in such a short time! Chris, thank you for taking the time to test it on Windows. And your screencasts are great! So if anyone has any issues with replication, first try the latest 1.7.5 hotfix.
-Galina

On Thursday, July 10, 2014 9:50:05 AM UTC-7, Lvc@ wrote:
>
> Hi,
> I've just closed this issue as the last one for 1.7.5 before the release.
>
> Lvc@
>
> On 10 July 2014 17:54, Chris Wilper <[email protected]> wrote:
>
>> Hi Galina,
>>
>> I finally got back to trying this in Windows and saw the exact same error
>> (the stack trace followed by "error on reading distributed request:
>> deploy_db"). Then on a whim I searched the issues for Windows and came up
>> with this:
>>
>> https://github.com/orientechnologies/orientdb/issues/2347
>>
>> So I tried setting orientdb_home as suggested (using forward slashes),
>> and the final message "error on reading distributed request" no longer
>> occurs, and things continue as expected after that. I also noticed that
>> #2347 was just closed today, so it looks like the orientdb_home workaround
>> will no longer be necessary with 1.7.5.
>>
>> Note however that I still saw the stack trace. In fact, the same stack
>> trace occurs when running in Windows, Mac, and Linux, and creating a class
>> in a distributed configuration. On the surface, it doesn't appear to have a
>> negative consequence. I've reported it as a separate issue with a
>> screencast demo here:
>>
>> https://github.com/orientechnologies/orientdb/issues/2560
>>
>> - Chris
>>
>> On Mon, Jul 7, 2014 at 7:04 PM, galina manashirova <[email protected]> wrote:
>>
>>> Chris,
>>> Thank you so much for the screencast - great stuff. It helped a lot!
>>> I followed the same steps, but on a Windows machine.
>>> At the point when I created the database People, node1 threw an exception
>>> about node2 (see below).
>>> The database was created only on node1; node2 has only one JSON file.
>>> Is that the same issue you were able to fix by shutting down VMware?
>>> I don't think I have VMware running anywhere.
>>> Does anyone know if there is another workaround for this problem?
>>> Using version 1.7.4.
>>>
>>> 2014-07-07 15:54:01:645 INFO Sent updated cluster configuration to the remote client 127.0.0.1:50895 [OClientConnectionManager]
>>> Exception in thread "hz._hzInstance_1_orientdb.cached.thread-1" java.lang.NullPointerException
>>>     at com.orientechnologies.orient.server.OClientConnection.getRemoteAddress(OClientConnection.java:68)
>>>     at com.orientechnologies.orient.server.OClientConnectionManager.pushDistribCfg2Clients(OClientConnectionManager.java:257)
>>>     at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.entryUpdated(OHazelcastPlugin.java:575)
>>>     at com.hazelcast.map.MapService.dispatchEvent(MapService.java:906)
>>>     at com.hazelcast.map.MapService.dispatchEvent(MapService.java:70)
>>>     at com.hazelcast.spi.impl.EventServiceImpl$EventPacketProcessor.process(EventServiceImpl.java:509)
>>>     at com.hazelcast.spi.impl.EventServiceImpl$RemoteEventPacketProcessor.run(EventServiceImpl.java:535)
>>>     at com.hazelcast.util.executor.StripedExecutor$Worker.run(StripedExecutor.java:142)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>>     at java.lang.Thread.run(Thread.java:662)
>>>     at com.hazelcast.util.executor.PoolExecutorThreadFactory$ManagedThread.run(PoolExecutorThreadFactory.java:59)
>>> [node1]<-[node2] error on reading distributed request: deploy_db
>>>
>>> Thanks.
>>> -galina
>>>
>>> On Friday, July 4, 2014 12:28:40 AM UTC-7, Chris Wilper wrote:
>>>
>>>> Update:
>>>>
>>>> Ok, I haven't determined why I saw the odd behavior in Windows, but I
>>>> *have* been able to successfully set up multiple nodes w/replication on
>>>> OSX. After looking more carefully at the console output, I noticed on the
>>>> Mac that orient was binding to an unfamiliar IP address.
>>>> It turns out it was trying to connect via a virtual software network
>>>> device (VMware), and I believe this explains why I saw the odd behavior;
>>>> after I shut down VMware, I was successful.
>>>>
>>>> Here is a screencast showing how I got it working with two nodes:
>>>> http://screencast.com/t/IiC5SIlUAk
>>>>
>>>> I basically created two empty nodes, then connected and created a
>>>> database and class, and added a record. It shows that the database was
>>>> definitely created on both nodes (the database directory), and that if one
>>>> node goes down, the other still provides access to the replicated record.
>>>>
>>>> One thing I realized in this process was that the first node you start
>>>> on a given network device seems to have special status. I guess it is the
>>>> one responsible for communicating which nodes it knows are available
>>>> (including itself). So if you start node1, node2, and node3 all on the
>>>> same host in that order, you can shut down nodes 2 and 3 just fine, but if
>>>> you instead keep those running and try to shut down node1, you can't
>>>> subsequently connect. However, if you restart any node, it will take over
>>>> the role that node1 had and you can then connect to the cluster again. At
>>>> least that's the behavior I think I'm observing. Does that sound right to
>>>> anybody familiar with this? Any way to get around it?
>>>>
>>>> Thanks,
>>>> Chris
>>>>
>>>> On Thu, Jul 3, 2014 at 7:59 PM, galina manashirova <[email protected]> wrote:
>>>>
>>>>> Another test of replication:
>>>>>
>>>>> 1. Started node1
>>>>> 2. Started node2
>>>>> The log file tells me that they are talking to each other.
>>>>> I logged in to the database (from the console) on node1 and created a new class:
>>>>>
>>>>> CREATE CLASS CUSTOMER EXTENDS V
>>>>>
>>>>> Nothing happened on node2.
>>>>> Since it is master-to-master replication, shouldn't it replicate right away?
>>>>> I killed node1, then restarted node1 and only after that I could see
>>>>> my new CUSTOMER class on the console of node2.
>>>>> So, replication happens only if one of the nodes is going down?
>>>>>
>>>>> Is this expected behavior?
>>>>>
>>>>> -Galina
>>>>>
>>>>> On Thursday, July 3, 2014 2:30:51 PM UTC-7, Chris Wilper wrote:
>>>>>
>>>>>> Another data point:
>>>>>>
>>>>>> I just tried configuring replication with two nodes on the same host
>>>>>> with a fresh install of 1.7.4 on Windows and OSX, and I was also not
>>>>>> successful. But I saw different problems than you did.
>>>>>>
>>>>>> Steps I followed:
>>>>>> 1) Unpack the official distribution in two separate directories on
>>>>>>    the same host, one for node1 and one for node2
>>>>>> 2) Start node1 immediately by going into bin and running the
>>>>>>    dserver script
>>>>>> 3) Modify node2's config/hazelcast.xml file, changing the port
>>>>>>    element's value from 2434 to 2435
>>>>>> 4) Start node2
>>>>>>
>>>>>> After this, from the console output I could see that both nodes
>>>>>> recognized that they were part of the cluster and could see the other one.
>>>>>>
>>>>>> But then I ran console.sh:
>>>>>>
>>>>>> orientdb> connect remote:localhost/GratefulDeadConcerts admin admin
>>>>>>
>>>>>> On Windows:
>>>>>> -------------------
>>>>>>
>>>>>> It successfully connected, then showed me the DISTRIBUTED
>>>>>> CONFIGURATION, which looked correct. Then I ran a simple query (SELECT
>>>>>> COUNT(*) FROM V) successfully. Next, I tried stopping node2 to simulate
>>>>>> node failure. Queries still worked fine. Then I restarted node2, and
>>>>>> queries still worked as expected. Next, I tried stopping node1 and
>>>>>> suddenly queries from the console failed with messages about not being
>>>>>> able to connect. Then I exited and restarted the console. Same problem.
>>>>>> Finally, I decided to stop the other node, restart both nodes, and
>>>>>> restart the console.
>>>>>> Immediately upon attempting to connect, I got the following:
>>>>>>
>>>>>> Connecting to database [remote:localhost/GratefulDeadConcerts] with user 'admin'...
>>>>>> Error: com.orientechnologies.orient.core.exception.OConfigurationException: Database 'GratefulDeadConcerts' is not configured on server (home=C:\Users\user\Downloads\cluster\node1/databases/)
>>>>>>
>>>>>> Next I looked in the databases\GratefulDeadConcerts\ directory and
>>>>>> saw there was a single file in there, distributed-config.json, but no
>>>>>> data files. For either node. Uh oh...
>>>>>>
>>>>>> On OS X:
>>>>>> --------------
>>>>>>
>>>>>> It successfully connected, then said:
>>>>>> DISTRIBUTED CONFIGURATION: none (OrientDB is running in standalone mode)
>>>>>>
>>>>>> ...even though the nodes seem to think they're running in distributed
>>>>>> mode.
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Can anyone else reproduce these behaviors with a fresh 1.7.4 install?
>>>>>>
>>>>>> Thanks,
>>>>>> Chris
>>>>>>
>>>>>> On Thu, Jul 3, 2014 at 2:05 PM, galina manashirova <[email protected]> wrote:
>>>>>>
>>>>>>> Can anybody please help me with this, or at least come up with a
>>>>>>> better tutorial on replication?
>>>>>>>
>>>>>>> -Galina
>>>>>>>
>>>>>>> On Wednesday, July 2, 2014 12:44:22 PM UTC-7, galina manashirova wrote:
>>>>>>>>
>>>>>>>> Started from scratch:
>>>>>>>> 1. Downloaded version 1.7.4
>>>>>>>> 2. Started server node1 in distributed mode (dserver)
>>>>>>>> 3. Copied the node1 directory as node2
>>>>>>>> 4. Changed nodeName in orientdb-dserver-config.xml on both nodes,
>>>>>>>>    giving them different names
>>>>>>>> 5. Started node2
>>>>>>>> Both nodes see each other.
>>>>>>>> I see in the console for one node:
>>>>>>>>
>>>>>>>> Members [2] {
>>>>>>>>     Member [10.32.10.72]:2434 this
>>>>>>>>     Member [10.32.10.72]:2435
>>>>>>>> }
>>>>>>>>
>>>>>>>> And on the console of the other node:
>>>>>>>>
>>>>>>>> Members [2] {
>>>>>>>>     Member [10.32.10.72]:2434
>>>>>>>>     Member [10.32.10.72]:2435 this
>>>>>>>> }
>>>>>>>>
>>>>>>>> They are definitely talking to each other. Except one of the nodes
>>>>>>>> gave me an error:
>>>>>>>>
>>>>>>>> 2014-07-02 12:12:56:234 WARN [node2]->[[node1]] requesting deploy of database 'GratefulDeadConcerts' on local server... [OHazelcastPlugin]
>>>>>>>> 2014-07-02 12:32:56:266 WARN [node2] timeout (1200001ms) on waiting for synchronous responses from nodes=[node1] responsesSoFar=[] request=id=0 from=node2 task=deploy_db [OHazelcastDistributedDatabase]
>>>>>>>> Exception in thread "main" com.orientechnologies.orient.server.distributed.ODistributedException: Error on sending distributed request against database 'GratefulDeadConcerts' to nodes [node1]
>>>>>>>>     at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedDatabase.send2Nodes(OHazelcastDistributedDatabase.java:194)
>>>>>>>>     at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.sendRequest(OHazelcastPlugin.java:364)
>>>>>>>>     at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.installDatabase(OHazelcastPlugin.java:813)
>>>>>>>>     at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.installNewDatabases(OHazelcastPlugin.java:767)
>>>>>>>>     at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.startup(OHazelcastPlugin.java:191)
>>>>>>>>     at com.orientechnologies.orient.server.OServer.registerPlugins(OServer.java:720)
>>>>>>>>     at com.orientechnologies.orient.server.OServer.activate(OServer.java:241)
>>>>>>>>     at com.orientechnologies.orient.server.OServerMain.main(OServerMain.java:32)
>>>>>>>> Caused by: com.orientechnologies.orient.server.distributed.ODistributedException: No response received from any of nodes [node1] for request id=0 from=node2 task=deploy_db
>>>>>>>>     at com.orientechnologies.orient.server.distributed.ODistributedResponseManager.getFinalResponse(ODistributedResponseManager.java:395)
>>>>>>>>     at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedDatabase.waitForResponse(OHazelcastDistributedDatabase.java:422)
>>>>>>>>     at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedDatabase.send2Nodes(OHazelcastDistributedDatabase.java:191)
>>>>>>>>     ... 7 more
>>>>>>>>
>>>>>>>> Even though right above that I see a log message saying that the
>>>>>>>> GratefulDeadConcerts distributed configuration sees 2 nodes:
>>>>>>>>
>>>>>>>> 2014-07-02 12:12:56:216 INFO updated distributed configuration for database: GratefulDeadConcerts:
>>>>>>>> ----------
>>>>>>>> {
>>>>>>>>   "version":2,
>>>>>>>>   "autoDeploy":true,
>>>>>>>>   "hotAlignment":false,
>>>>>>>>   "readQuorum":1,
>>>>>>>>   "writeQuorum":2,
>>>>>>>>   "failureAvailableNodesLessQuorum":false,
>>>>>>>>   "readYourWrites":true,
>>>>>>>>   "clusters":{
>>>>>>>>     "internal":null,
>>>>>>>>     "index":null,
>>>>>>>>     "*":{
>>>>>>>>       "servers":["<NEW_NODE>","node1","node2"]
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>> }
>>>>>>>>
>>>>>>>> When I try to add or remove something on one node in that
>>>>>>>> database, nothing happens on the other one.
>>>>>>>> Nothing gets replicated at the database level.
>>>>>>>> Can someone please tell me what I am doing wrong?
>>>>>>>> I am not trying anything fancy with replication. This is just a
>>>>>>>> basic replication task.
>>>>>>>> I tried replication in some earlier version (I don't remember now
>>>>>>>> which one) and it worked. Now I can't make it work.
>>>>>>>> We are trying to adopt OrientDB for one of our company's products,
>>>>>>>> and if replication is not going to work we will have to look for
>>>>>>>> something else.
>>>>>>>> Please let me know if I am doing something wrong.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>> -galina
>>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> ---
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "OrientDB" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to [email protected].
>>>>>>> For more options, visit https://groups.google.com/d/optout.
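
For anyone still on 1.7.4 on Windows who wants to try the orientdb_home workaround mentioned in this thread (forward slashes instead of backslashes), here is a minimal Python sketch of the path conversion only. The node1 path is the one from the error message quoted above; the helper name `to_forward_slashes` is hypothetical and not part of OrientDB, and the script just prints the converted value rather than configuring anything.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path: str) -> str:
    """Return a Windows path rewritten with forward slashes.

    Hypothetical helper for illustration; it is not an OrientDB API.
    """
    # PureWindowsPath parses Windows-style paths on any operating system,
    # and as_posix() renders the same path using '/' as the separator.
    return PureWindowsPath(windows_path).as_posix()


if __name__ == "__main__":
    # Example install directory taken from the error message in this thread.
    print(to_forward_slashes(r"C:\Users\user\Downloads\cluster\node1"))
```

The printed value is the form I would try for orientdb_home, following the suggestion in issue #2347. With 1.7.5 the workaround should no longer be needed.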
