Hi,

I have a bunch of questions regarding performance on large-scale 1-Wire networks. Hopefully some of you out there have experience with this.
I'm developing home automation software for my personal use, but when it's mature enough it will be released as Open Source. When I began experimenting with Owfs with just one DS18S20, everything worked great. However, as I kept adding more DS18S20s the network got more and more unresponsive, and my single-threaded web application finally stopped answering requests. I had to go back to the drawing board and redesign how everything was supposed to work.

To clarify my setup: the application (web based) is built on Java 8 and uses jowfsclient (https://github.com/pakerfeldt/jowfsclient) to communicate with Owserver running on the same computer. I'm aiming to run all this on a Raspberry Pi model B or later, and so far it's been working great.

As I wrote above, the more devices I added, the more the read time stacked up, until it was unbearably slow. A possible way to get closer to O(1) performance instead of O(n) when adding devices was the "Skip ROM" command. Pseudo-code of the part that interfaces with Owserver looked like this:

    setup:
        owDevices = owfs.listDirectoryAll("/uncached")

    while (true):
        for each (device in owDevices)
            owfs.read(device)
        sleep(4000)  // Read whole bus every 4 secs.

With more than 3-4 DS18S20s we could no longer keep the 4 sec. budget. :-)

Now the same application after the "Skip ROM" modification:

    setup:
        owDevices = owfs.listDirectoryAll("/uncached")

    while (true):
        if (any owDevices is temperature device)
            owfs.write("/simultaneous/temperature", "1")
        if (any owDevices is voltage device)
            owfs.write("/simultaneous/voltage", "1")
        sleep(1000)
        for each (device in owDevices)
            owfs.read(device)
        sleep(4000)  // Read whole bus every 4 secs.

To start a conversion on ALL temperature sensors on the bus at the same time, we write a "1" to the global "/simultaneous/temperature". We do the same for voltage devices (like the DS2450 A/D converter) by writing to "/simultaneous/voltage".
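For concreteness, one pass of the modified loop could be sketched in Java roughly as follows. This is only a sketch under assumptions: `Bus`, `Kind`, `pollOnce` and `conversionWaitMs` are hypothetical names made up for illustration, and `Bus` is a stand-in for the Owserver connection, not the real jowfsclient API. Whether the paths handed to `read` should be cached or uncached ones is left to the caller, since that is exactly what is unclear here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SimultaneousPoller {

    /** Hypothetical stand-in for an Owserver connection; not the jowfsclient API. */
    public interface Bus {
        String read(String path);
        void write(String path, String value);
    }

    /** Hypothetical device classification, used only in this sketch. */
    public enum Kind { TEMPERATURE, VOLTAGE, OTHER }

    /**
     * One pass over the bus: trigger the simultaneous conversions that apply,
     * wait for them, then read every device in turn.
     * The map keys are the paths to read; the values say what kind of device each is.
     */
    public static Map<String, String> pollOnce(Bus bus, Map<String, Kind> devices,
                                               long conversionWaitMs) {
        Map<String, String> samples = new LinkedHashMap<>();

        // Only write the trigger if at least one device of that kind is present.
        if (devices.containsValue(Kind.TEMPERATURE)) {
            bus.write("/simultaneous/temperature", "1");
        }
        if (devices.containsValue(Kind.VOLTAGE)) {
            bus.write("/simultaneous/voltage", "1");
        }

        // The wait whose necessity is in question; pass 0 to skip it.
        if (conversionWaitMs > 0) {
            try {
                Thread.sleep(conversionWaitMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag set
                return samples;                     // give up on this pass
            }
        }

        for (String path : devices.keySet()) {
            samples.put(path, bus.read(path));
        }
        return samples;
    }
}
```

Calling `pollOnce` once with `conversionWaitMs` set to 1000 and once with 0 makes it easy to compare the total read time with and without the extra wait.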
We only do this if devices of that kind are present on the bus. I have read in a mailing list post that we should then wait 1000 ms to let the conversion take place. My first question: is this sleep(1000) really needed? Doesn't Owserver itself block on a following read until the conversion is finished?

Unfortunately I can't seem to get any positive effect from this; a full read takes as long as it did without the simultaneous code. In my tests there have only been DS18S20 and DS2408 devices on the bus, the cables are just a couple of meters long, and all devices are powered (this is a requirement for simultaneous to work). I'm running the latest Owfs version, 3.1p0. Any more suggestions from the audience?

Note that I do all my reading from "/uncached". Is that possible even with the simultaneous code, or do I have to read from the cached directories?

This is from a logfile of my application. Each DS18S20 takes about 700 ms per conversion:

    Recorded new value '23.75' at time '2016-01-02T21:20:17.582' for device with id '7F0560010800' on Owserver at 192.168.10.110:4304 with id "1410803762".
    Recorded new value '1,1,1,1,0,0,0,0' at time '2016-01-02T21:20:17.627' for device with id '4D4D13000000' on Owserver at 192.168.10.110:4304 with id "1410803762".
    Recorded new value '22.6875' at time '2016-01-02T21:20:18.345' for device with id '52A9B5010800' on Owserver at 192.168.10.110:4304 with id "1410803762".
    Recorded new value '24' at time '2016-01-02T21:20:19.062' for device with id 'AC1A60010800' on Owserver at 192.168.10.110:4304 with id "1410803762".
    Recorded new value '20.8125' at time '2016-01-02T21:20:19.779' for device with id 'C9E69E010800' on Owserver at 192.168.10.110:4304 with id "1410803762".
    Recorded new value '1,1,1,1,0,0,0,0' at time '2016-01-02T21:20:19.824' for device with id '3E4D13000000' on Owserver at 192.168.10.110:4304 with id "1410803762".
    Recorded new value '22.75' at time '2016-01-02T21:20:20.541' for device with id '0C92B5010800' on Owserver at 192.168.10.110:4304 with id "1410803762".
    Recorded new value '23.125' at time '2016-01-02T21:20:21.259' for device with id '158DB5010800' on Owserver at 192.168.10.110:4304 with id "1410803762".
    Recorded new value '23.125' at time '2016-01-02T21:20:21.976' for device with id '969D98010800' on Owserver at 192.168.10.110:4304 with id "1410803762".
    Recorded new value '21.625' at time '2016-01-02T21:20:22.693' for device with id 'B74C8A010800' on Owserver at 192.168.10.110:4304 with id "1410803762".
    Reading all current device values took 6833ms.

I stopped my application and mounted the Owfs filesystem. Repeatedly looking at /simultaneous/temperature gave the following (the file has no trailing newline, so each value is printed directly in front of the next prompt):

    trycoon@dixie:/mnt/1wire/simultaneous$ cat temperature
    0trycoon@dixie:/mnt/1wire/simultaneous$ cat temperature
    0trycoon@dixie:/mnt/1wire/simultaneous$ cat temperature
    1trycoon@dixie:/mnt/1wire/simultaneous$ cat temperature
    1trycoon@dixie:/mnt/1wire/simultaneous$ cat temperature
    0trycoon@dixie:/mnt/1wire/simultaneous$ cat temperature
    0trycoon@dixie:/mnt/1wire/simultaneous$ cat temperature
    0trycoon@dixie:/mnt/1wire/simultaneous$ cat temperature
    1trycoon@dixie:/mnt/1wire/simultaneous$ cat temperature
    0trycoon@dixie:/mnt/1wire/simultaneous$ cat temperature

Is it normal that the value jumps back and forth between 0 and 1? If my application had been running at the same time and starting new conversions I would expect it, but it was stopped. Could something else have been interfering? Or is this a write-only file, so that reading from it only gives you rubbish?

I set up Owserver to use only fake (mocked) devices and ran the same test. Now the timing was much better, too good even:

    Recorded new value '18.9064' at time '2016-01-02T20:20:15.242' for device with id '54F81BE8E78D' on Owserver at 192.168.10.159:4304 with id "1410809215".
    Recorded new value '59.1111' at time '2016-01-02T20:20:15.282' for device with id '67C6697351FF' on Owserver at 192.168.10.159:4304 with id "1410809215".
    Recorded new value '5.34394' at time '2016-01-02T20:20:15.322' for device with id '3158A35A255D' on Owserver at 192.168.10.159:4304 with id "1410809215".
    Recorded new value '1,1,1,1,1,0,1,1' at time '2016-01-02T20:20:15.362' for device with id '213DDC8770E9' on Owserver at 192.168.10.159:4304 with id "1410809215".
    Recorded new value '5.60222' at time '2016-01-02T20:20:15.402' for device with id 'F2FBE3467CC2' on Owserver at 192.168.10.159:4304 with id "1410809215".
    Recorded new value '92.522' at time '2016-01-02T20:20:15.442' for device with id 'C99A66320DB7' on Owserver at 192.168.10.159:4304 with id "1410809215".
    Recorded new value '46.905' at time '2016-01-02T20:20:15.482' for device with id '4AEC29CDBAAB' on Owserver at 192.168.10.159:4304 with id "1410809215".
    Recorded new value '25.6969' at time '2016-01-02T20:20:15.522' for device with id '765A2E63339F' on Owserver at 192.168.10.159:4304 with id "1410809215".
    Recorded new value '1,1,0,0,0,0,1,1' at time '2016-01-02T20:20:15.562' for device with id '54110E827441' on Owserver at 192.168.10.159:4304 with id "1410809215".
    Reading all current device values took 1362ms.

Without my 1000 ms delay, a full bus reading took only 362 ms. I got about the same result without the /simultaneous/temperature code. So my question is: do fake Owfs devices emulate the timing characteristics of real devices (a conversion time of about 700 ms for a DS18S20), and is the effect of /simultaneous/temperature correctly emulated?

Another problem I'm struggling with is that listing devices returns 0 devices when concurrent requests to Owserver are in progress.
To overcome my slow single-threaded application I introduced some more threads, so now I basically have four of them:

- The presence thread polls owfs.listDirectoryAll("/uncached") every 30 seconds and stores all found devices in a list, let's call it owDevices. (It has its own dedicated persistent connection to Owserver.)
- The sampler thread iterates through owDevices every 4 seconds, reads the current value of every device and stores the readings in a hashmap, let's call it owDevicesSamples. (It has its own dedicated persistent connection to Owserver.)
- The writer thread is used when the application wants to change the value of a device, like a pin on a DS2408. It is seldom used, mostly on user intervention or on a timer. (It has its own dedicated persistent connection to Owserver.)
- The API thread handles REST requests and either reads current values from the owDevicesSamples hashmap or triggers the writer thread to set a value.

This setup makes the application very responsive from a client's perspective, since reading from a hashmap only takes a millisecond or so. All the heavy lifting is done by the first two threads.

As I understand it, Owserver should be able to handle multiple concurrent requests by queuing them and using internal locks as protection. However, when I run the presence thread and the sampler thread at the same time, I often get zero found devices as the response from owfs.listDirectoryAll("/uncached"). If I disable the sampler thread, I get "10 found devices..." from the presence thread. I have checked my code and stepped through it in the debugger and it looks fine, so it seems to be a lower-level problem, maybe in the jowfsclient library or in Owserver.

Am I correct in believing that Owserver can handle multiple concurrent commands, like a bus scan and device readings? That is, even though the hardware does not support it, Owserver will serialize the requests and execute them in turn?
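The thread layout described above can be sketched roughly like this in Java 8. It is a simplified illustration, not my actual code: `SampleCache`, `Bus`, `refreshDevices`, `sampleAll`, `latest` and `start` are hypothetical names, `Bus` is a stand-in for one dedicated Owserver connection rather than the real jowfsclient API, and the writer thread is left out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SampleCache {

    /** Hypothetical stand-in for one dedicated Owserver connection. */
    public interface Bus {
        List<String> listDevices();
        String read(String path);
    }

    // Written by the presence thread, read by the sampler thread.
    private volatile List<String> owDevices = Collections.emptyList();

    // Written by the sampler thread, read by the API thread.
    private final Map<String, String> owDevicesSamples = new ConcurrentHashMap<>();

    /** Presence thread body: publish a fresh immutable snapshot of the device list. */
    public void refreshDevices(Bus presenceBus) {
        owDevices = Collections.unmodifiableList(new ArrayList<>(presenceBus.listDevices()));
    }

    /** Sampler thread body: read every known device and store its latest value. */
    public void sampleAll(Bus samplerBus) {
        for (String device : owDevices) {
            String value = samplerBus.read(device);
            if (value != null) { // ConcurrentHashMap does not accept null values
                owDevicesSamples.put(device, value);
            }
        }
    }

    /** API thread: answer from memory without touching the bus. */
    public String latest(String device) {
        return owDevicesSamples.get(device);
    }

    /** Schedule the presence scan every 30 s and the sampling pass every 4 s. */
    public ScheduledExecutorService start(Bus presenceBus, Bus samplerBus) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(2);
        pool.scheduleWithFixedDelay(() -> refreshDevices(presenceBus), 0, 30, TimeUnit.SECONDS);
        pool.scheduleWithFixedDelay(() -> sampleAll(samplerBus), 1, 4, TimeUnit.SECONDS);
        return pool;
    }
}
```

In this sketch the device list is swapped as a whole through a volatile reference, so the sampler always iterates over a complete snapshot. Note that a scheduled task that throws an exception is not run again by the executor, so real code would need to catch and log errors inside each task.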
For those of you who strive for more resilient 1-Wire networks: how do you detect device resets or device power failures? I set up a DS2408 by writing "1" to "/<device id>/strobe". If the device is then power cycled, I'm back to square one without even knowing it. Can this be detected somehow, without having to write to "/strobe" before every write I do?

Thanks for any answers!

// Henrik Östman

_______________________________________________
Owfs-developers mailing list
Owfs-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/owfs-developers