Re: [orientdb] Re: Replication doesn't work even for demo db - GratefulDeadConcerts - version 1.7.4

galina manashirova Thu, 10 Jul 2014 15:26:39 -0700

Yes.
Thanks to Luca, this issue has been fixed in the latest 1.7.5 hot fix.
I was testing it all day today: several nodes on the same machine, several 
nodes on different machines. It works great, as expected. Everything gets 
replicated!
Many thanks to Luca for fixing it in such a short time!
Chris, thank you for taking the time to test it on Windows. And your 
screencasts are great!
So, if anyone has any issues with replication, first try the 
latest 1.7.5 hot fix.


-Galina






On Thursday, July 10, 2014 9:50:05 AM UTC-7, Lvc@ wrote:
>
> Hi,
> I've just closed this issue as last for 1.7.5 before the release.
>
> Lvc@
>
>
>
> On 10 July 2014 17:54, Chris Wilper <[email protected]> wrote:
>
>> Hi Galina,
>>
>> I finally got back to trying this in Windows and saw the exact same error 
>> (the stack trace followed by "error on reading distributed request: 
>> deploy_db"). Then on a whim I searched the issues for Windows and came up 
>> with this: 
>>
>> https://github.com/orientechnologies/orientdb/issues/2347
>>
>> So I tried setting orientdb_home as suggested (using forward slashes), 
>> and the final message "error on reading distributed request" no longer 
>> occurs, and things continue as expected after that. I also noticed that 
>> #2347 was just closed today, so it looks like the orientdb_home workaround 
>> will no longer be necessary with 1.7.5.
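For anyone still on 1.7.4 and applying the workaround by hand, the backslash-to-forward-slash conversion can be sketched in Python. This only rewrites the path string. The install directory is borrowed from the Windows error message quoted in this thread, the helper name is made up for illustration, and how the value is handed to the server should follow issue #2347 rather than this sketch.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path: str) -> str:
    """Return the same Windows path written with forward slashes."""
    # PureWindowsPath parses Windows syntax on any OS; as_posix() only
    # swaps the separators and keeps the drive letter.
    return PureWindowsPath(windows_path).as_posix()


# Illustrative install directory, taken from the error message in this thread.
home = to_forward_slashes(r"C:\Users\user\Downloads\cluster\node1")
print(home)
```

The result is the value to use for orientdb_home in place of the backslash form.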
>>
>> Note however that I still saw the stack trace. In fact, the same stack 
>> trace occurs when running in Windows, Mac, and Linux, and creating a class 
>> in a distributed configuration. On the surface, it doesn't appear to have a 
>> negative consequence. I've reported it as a separate issue with a 
>> screencast demo here:
>>
>> https://github.com/orientechnologies/orientdb/issues/2560
>>
>> - Chris
>>
>>
>>
>> On Mon, Jul 7, 2014 at 7:04 PM, galina manashirova <[email protected]> wrote:
>>
>>> Chris,
>>> Thank you so much for the screencast. Great stuff, it helped a lot!
>>> I followed the same steps, but on a Windows machine.
>>> At the point when I created the database People, node1 threw an exception 
>>> about node2 (see below).
>>> The database was created only on node1; node2 has only one JSON file.
>>> Is that the same issue you were able to fix by shutting down VMWare?
>>> I don't think I have VMWare running anywhere.
>>> Does anyone know if there is another workaround for this problem? 
>>> Using version 1.7.4.
>>>
>>> 2014-07-07 15:54:01:645 INFO Sent updated cluster configuration to the remote client 127.0.0.1:50895 [OClientConnectionManager]
>>> Exception in thread "hz._hzInstance_1_orientdb.cached.thread-1" java.lang.NullPointerException
>>>         at com.orientechnologies.orient.server.OClientConnection.getRemoteAddress(OClientConnection.java:68)
>>>         at com.orientechnologies.orient.server.OClientConnectionManager.pushDistribCfg2Clients(OClientConnectionManager.java:257)
>>>         at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.entryUpdated(OHazelcastPlugin.java:575)
>>>         at com.hazelcast.map.MapService.dispatchEvent(MapService.java:906)
>>>         at com.hazelcast.map.MapService.dispatchEvent(MapService.java:70)
>>>         at com.hazelcast.spi.impl.EventServiceImpl$EventPacketProcessor.process(EventServiceImpl.java:509)
>>>         at com.hazelcast.spi.impl.EventServiceImpl$RemoteEventPacketProcessor.run(EventServiceImpl.java:535)
>>>         at com.hazelcast.util.executor.StripedExecutor$Worker.run(StripedExecutor.java:142)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>>         at java.lang.Thread.run(Thread.java:662)
>>>         at com.hazelcast.util.executor.PoolExecutorThreadFactory$ManagedThread.run(PoolExecutorThreadFactory.java:59)
>>> [node1]<-[node2] error on reading distributed request: deploy_db
>>>
>>> Thanks.
>>> -galina
>>>
>>>
>>>
>>> On Friday, July 4, 2014 12:28:40 AM UTC-7, Chris Wilper wrote:
>>>
>>>> Update:
>>>>
>>>> Ok, I haven't determined why I saw the odd behavior in Windows, but I 
>>>> *have* been able to successfully set up multiple nodes w/replication on 
>>>> OSX. After looking more carefully at the console output, I noticed on the 
>>>> Mac that orient was binding to an unfamiliar IP address. It turns out it 
>>>> was trying to connect via a virtual software network device (VMWare), and 
>>>> I believe this explains why I saw the odd behavior; after I shut down 
>>>> vmware, I was successful.
>>>>
>>>> Here is a screencast showing how I got it working with two nodes: 
>>>> http://screencast.com/t/IiC5SIlUAk
>>>>
>>>> I basically created two empty nodes, then connected and created a 
>>>> database and class, and added a record. It shows that the database was 
>>>> definitely created on both nodes (the database directory), and that if one 
>>>> node goes down, the other still provides access to the replicated record.
>>>>
>>>> One thing I realized in this process is that the first node you start 
>>>> on a given network device seems to have special status. I guess 
>>>> it is the one responsible for communicating which nodes it knows are 
>>>> available (including itself). So if you start node1, node2, and node3 all 
>>>> on the same host in that order, you can shut down nodes 2 and 3 just fine, 
>>>> but if you instead keep those running and try to shut down node1, you 
>>>> can't subsequently connect. However, if you restart any node, it will take 
>>>> over the role that node1 had and you can then connect to the cluster again. 
>>>> At least that's the behavior I think I'm observing. Does that sound right 
>>>> to anybody familiar with this? Any way to get around it?
>>>>
>>>> Thanks,
>>>> Chris
>>>>
>>>>
>>>> On Thu, Jul 3, 2014 at 7:59 PM, galina manashirova <
>>>> [email protected]> wrote:
>>>>
>>>>> Another test of replication :
>>>>>
>>>>> 1. Started node1
>>>>> 2. Started node2
>>>>> The log file tells me that they are talking to each other.
>>>>> I logged in to the database (from the console) on node1 and created a 
>>>>> new class:
>>>>>
>>>>> CREATE CLASS CUSTOMER EXTENDS V
>>>>> Nothing happened on node2.
>>>>> Since it is master-to-master replication, shouldn't it replicate right 
>>>>> away?
>>>>> I killed node1, then restarted node1, and only after that could I see 
>>>>> my new CUSTOMER class on the console of node2.
>>>>> So, replication happens only if one of the nodes goes down?
>>>>>
>>>>> Is this expected behavior?
>>>>>
>>>>> -Galina
>>>>>  
>>>>>
>>>>> On Thursday, July 3, 2014 2:30:51 PM UTC-7, Chris Wilper wrote:
>>>>>
>>>>>> Another data point:
>>>>>>
>>>>>> I just tried configuring replication with two nodes on the same host 
>>>>>> with a fresh install of 1.7.4 on Windows and OSX, and I was also not 
>>>>>> successful. But I saw different problems than you did.
>>>>>>
>>>>>> Steps I followed:
>>>>>>   1) Unpack the official distribution in two separate directories on 
>>>>>> the same host, one for node1 and one for node2
>>>>>>   2) Start node1 immediately by going into bin and running the 
>>>>>> dserver script
>>>>>>   3) Modify node2's config/hazelcast.xml file, changing the port 
>>>>>> element's value from 2434 to 2435
>>>>>>   4) Start node2
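Step 3 above can also be scripted. What follows is a minimal sketch, not the official procedure: the XML fragment is a hypothetical stand-in for node2's config/hazelcast.xml (the real file has more elements), and the helper name is made up. It edits the text with a regular expression so the rest of the file stays byte-for-byte the same.

```python
import re


def set_member_port(hazelcast_xml: str, old_port: int, new_port: int) -> str:
    """Rewrite the text of the <port> element from old_port to new_port."""
    # Match <port ...>old_port</port>, keeping any attributes untouched.
    pattern = re.compile(r"(<port\b[^>]*>)\s*%d\s*(</port>)" % old_port)
    updated, count = pattern.subn(r"\g<1>%d\g<2>" % new_port, hazelcast_xml)
    if count != 1:
        raise ValueError(
            "expected exactly one <port> element with value %d, found %d"
            % (old_port, count)
        )
    return updated


# Hypothetical fragment standing in for node2's config/hazelcast.xml.
sample = """<hazelcast>
  <network>
    <port auto-increment="true">2434</port>
  </network>
</hazelcast>"""

node2_xml = set_member_port(sample, 2434, 2435)
print(node2_xml)
```

The function refuses to guess when the expected port element is missing or appears more than once, which seems safer than silently writing a config that leaves both nodes on 2434.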
>>>>>>
>>>>>> After this, from the console output I could see that both nodes 
>>>>>> recognized that they were part of the cluster and could see the other 
>>>>>> one.
>>>>>>
>>>>>> But then I ran console.sh:
>>>>>>
>>>>>> orientdb> connect remote:localhost/GratefulDeadConcerts admin admin
>>>>>>
>>>>>> On Windows:
>>>>>> -------------------
>>>>>>  
>>>>>> It successfully connected, then showed me the DISTRIBUTED 
>>>>>> CONFIGURATION, which looked correct. Then I ran a simple query (SELECT 
>>>>>> COUNT(*) FROM V) successfully. Next, I tried stopping node2 to simulate 
>>>>>> node failure. Queries still worked fine. Then I restarted node2, and 
>>>>>> queries still worked as expected. Next, I tried stopping node1 and 
>>>>>> suddenly queries from the console failed with messages about not being 
>>>>>> able to connect. Then I exited and restarted the console. Same problem. 
>>>>>> Finally, I decided to stop the other node, restart both nodes, and 
>>>>>> restart the console. Immediately upon attempting to connect, I got the 
>>>>>> following:
>>>>>>
>>>>>> Connecting to database [remote:localhost/GratefulDeadConcerts] with user 'admin'...
>>>>>> Error: com.orientechnologies.orient.core.exception.OConfigurationException: Database 'GratefulDeadConcerts' is not configured on server (home=C:\Users\user\Downloads\cluster\node1/databases/)
>>>>>>
>>>>>> Next I looked in the databases\GratefulDeadConcerts\ directory and 
>>>>>> saw there was a single file in there, distributed-config.json, but no 
>>>>>> data files, on either node. Uh oh...
>>>>>>
>>>>>> On OS X:
>>>>>> --------------
>>>>>>
>>>>>> It successfully connected, then said:
>>>>>> DISTRIBUTED CONFIGURATION: none (OrientDB is running in standalone 
>>>>>> mode)
>>>>>>
>>>>>> ...even though the nodes seem to think they're running in distributed 
>>>>>> mode.
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Can anyone else reproduce these behaviors with a fresh 1.7.4 install?
>>>>>>
>>>>>> Thanks,
>>>>>> Chris
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Jul 3, 2014 at 2:05 PM, galina manashirova <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Can anybody please help me with this, or at least come up with a 
>>>>>>> better tutorial on replication?
>>>>>>>
>>>>>>> -Galina
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wednesday, July 2, 2014 12:44:22 PM UTC-7, galina manashirova 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Started from scratch:
>>>>>>>> 1. Downloaded version 1.7.4
>>>>>>>> 2. Started server node1 in distributed mode (dserver)
>>>>>>>> 3. Copied node1 directory as node2 
>>>>>>>> 4. changed nodeName in orientdb-dserver-config.xml on both nodes 
>>>>>>>> giving different names.
>>>>>>>> 5. Started node2
>>>>>>>>     Both nodes see each other. I see in the console for one node:
>>>>>>>>
>>>>>>>>     Members [2] {
>>>>>>>>         Member [10.32.10.72]:2434 this
>>>>>>>>         Member [10.32.10.72]:2435
>>>>>>>>     }
>>>>>>>>
>>>>>>>>     And on the console of another node:
>>>>>>>>
>>>>>>>>     Members [2] {
>>>>>>>>         Member [10.32.10.72]:2434
>>>>>>>>         Member [10.32.10.72]:2435 this
>>>>>>>>     }
>>>>>>>>
>>>>>>>> They are definitely talking to each other, except that one of the 
>>>>>>>> nodes gave me an error:
>>>>>>>>
>>>>>>>> 2014-07-02 12:12:56:234 WARN [node2]->[[node1]] requesting deploy of database 'GratefulDeadConcerts' on local server... [OHazelcastPlugin]
>>>>>>>> 2014-07-02 12:32:56:266 WARN [node2] timeout (1200001ms) on waiting for synchronous responses from nodes=[node1] responsesSoFar=[] request=id=0 from=node2 task=deploy_db [OHazelcastDistributedDatabase]
>>>>>>>> Exception in thread "main" com.orientechnologies.orient.server.distributed.ODistributedException: Error on sending distributed request against database 'GratefulDeadConcerts' to nodes [node1]
>>>>>>>>         at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedDatabase.send2Nodes(OHazelcastDistributedDatabase.java:194)
>>>>>>>>         at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.sendRequest(OHazelcastPlugin.java:364)
>>>>>>>>         at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.installDatabase(OHazelcastPlugin.java:813)
>>>>>>>>         at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.installNewDatabases(OHazelcastPlugin.java:767)
>>>>>>>>         at com.orientechnologies.orient.server.hazelcast.OHazelcastPlugin.startup(OHazelcastPlugin.java:191)
>>>>>>>>         at com.orientechnologies.orient.server.OServer.registerPlugins(OServer.java:720)
>>>>>>>>         at com.orientechnologies.orient.server.OServer.activate(OServer.java:241)
>>>>>>>>         at com.orientechnologies.orient.server.OServerMain.main(OServerMain.java:32)
>>>>>>>> Caused by: com.orientechnologies.orient.server.distributed.ODistributedException: No response received from any of nodes [node1] for request id=0 from=node2 task=deploy_db
>>>>>>>>         at com.orientechnologies.orient.server.distributed.ODistributedResponseManager.getFinalResponse(ODistributedResponseManager.java:395)
>>>>>>>>         at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedDatabase.waitForResponse(OHazelcastDistributedDatabase.java:422)
>>>>>>>>         at com.orientechnologies.orient.server.hazelcast.OHazelcastDistributedDatabase.send2Nodes(OHazelcastDistributedDatabase.java:191)
>>>>>>>>         ... 7 more
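The two WARN lines are about 20 minutes apart, which lines up with the reported timeout of 1200001ms: node2 appears to have waited out the whole timeout without a single reply from node1 (responsesSoFar=[]). A small sketch that checks this from the two timestamps; the helper name is made up for illustration.

```python
from datetime import datetime, timedelta

# Log timestamps in this thread use ':' before the milliseconds.
LOG_FORMAT = "%Y-%m-%d %H:%M:%S:%f"


def elapsed_ms(start: str, end: str) -> int:
    """Whole milliseconds between two log timestamps."""
    delta = datetime.strptime(end, LOG_FORMAT) - datetime.strptime(start, LOG_FORMAT)
    return delta // timedelta(milliseconds=1)


# Timestamps of the "requesting deploy" and "timeout" lines above.
waited = elapsed_ms("2014-07-02 12:12:56:234", "2014-07-02 12:32:56:266")
print(waited)
```

The difference comes out a few tens of milliseconds above the configured timeout, which is what one would expect if the warning is logged right after the wait expires.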
>>>>>>>>
>>>>>>>>
>>>>>>>> This is even though, right above that, I see a log message saying that 
>>>>>>>> the distributed configuration for GratefulDeadConcerts sees 2 nodes:
>>>>>>>>
>>>>>>>> 2014-07-02 12:12:56:216 INFO updated distributed configuration for 
>>>>>>>> database: GratefulDeadConcerts:
>>>>>>>> ----------
>>>>>>>> {
>>>>>>>>   "version":2,
>>>>>>>>   "autoDeploy":true,
>>>>>>>>   "hotAlignment":false,
>>>>>>>>   "readQuorum":1,
>>>>>>>>   "writeQuorum":2,
>>>>>>>>   "failureAvailableNodesLessQuorum":false,
>>>>>>>>   "readYourWrites":true,
>>>>>>>>   "clusters":{
>>>>>>>>     "internal":null,
>>>>>>>>     "index":null,
>>>>>>>>     "*":{
>>>>>>>>       "servers":["<NEW_NODE>","node1","node2"]
>>>>>>>>     }
>>>>>>>>   }
>>>>>>>> }
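To make the logged configuration easier to reason about, here is a rough Python sketch that reads it and checks whether the write quorum could be met with a given set of running nodes. It rests on two assumptions that are not spelled out in this thread: that writeQuorum is the number of servers that must acknowledge a write, and that "<NEW_NODE>" is only a placeholder for servers that join later. The helper names are made up.

```python
import json

# Distributed configuration as logged above.
CONFIG = """{
  "version": 2,
  "autoDeploy": true,
  "hotAlignment": false,
  "readQuorum": 1,
  "writeQuorum": 2,
  "failureAvailableNodesLessQuorum": false,
  "readYourWrites": true,
  "clusters": {
    "internal": null,
    "index": null,
    "*": {"servers": ["<NEW_NODE>", "node1", "node2"]}
  }
}"""

PLACEHOLDER = "<NEW_NODE>"  # assumption: not a real server name


def named_servers(config: dict) -> list:
    """Servers listed for the default ("*") cluster, minus the placeholder."""
    return [s for s in config["clusters"]["*"]["servers"] if s != PLACEHOLDER]


def write_quorum_reachable(config: dict, online: set) -> bool:
    """Assumption: a write needs writeQuorum responses from online servers."""
    responders = [s for s in named_servers(config) if s in online]
    return len(responders) >= config["writeQuorum"]


config = json.loads(CONFIG)
print(named_servers(config))
print(write_quorum_reachable(config, {"node1", "node2"}))
print(write_quorum_reachable(config, {"node2"}))
```

Under those assumptions, a two-node cluster with writeQuorum 2 has no slack: both nodes must answer for a write to count, so a single unresponsive node is enough to block it.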
>>>>>>>> When I try to add or remove something on one node in that 
>>>>>>>> database, nothing happens on the other one. 
>>>>>>>> Nothing gets replicated at the database level.
>>>>>>>> Can someone please tell me what I am doing wrong? 
>>>>>>>> I am not trying anything fancy with replication. This is just a 
>>>>>>>> basic replication task.
>>>>>>>> I tried replication in some earlier versions (I don't remember now 
>>>>>>>> which ones) and it worked. Now I can't make it work.
>>>>>>>> We are trying to adopt OrientDB for one of our company's 
>>>>>>>> products, and if replication is not going to work we will have to look 
>>>>>>>> for something else.
>>>>>>>> Please let me know if I am doing something wrong.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>> -galina
>>>>>>>>     
>>>>>>>>
>>>>>>>  -- 
>>>>>>>
>>>>>>> --- 
>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>> Groups "OrientDB" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>> send an email to [email protected].
>>>>>>>
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>
>>>>>>

