Mick,

I don't know anything about the Python API or how it returns errors. I'm pretty sure owfs performs retries under the hood, but I assume these are invisible to the user. I'm doing nothing more than checking for error returns from OW_get() in the C API. It sounds like you're already checking this.
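
For reference, the per-attempt logging described further down the thread (five attempts, a WARNING for each failure, an ERROR and a skipped update after the last) could look roughly like this in Python. This is only a sketch: read_with_retries and read_fn are made-up names, and read_fn stands in for whatever call actually reads the 1-wire path. Nothing here is taken from the owfs Python API.

```python
import logging

log = logging.getLogger("onewire")


def read_with_retries(read_fn, path, attempts=5):
    """Call read_fn(path) up to `attempts` times.

    Every failed attempt is logged as a WARNING. If all attempts fail,
    an ERROR is logged and None is returned so the caller can skip the
    update. read_fn is a hypothetical stand-in for the real read call.
    """
    for attempt in range(1, attempts + 1):
        try:
            return read_fn(path)
        except Exception as exc:  # assumption: the real exception type is unknown
            log.warning("read of %s failed (attempt %d/%d): %s",
                        path, attempt, attempts, exc)
    log.error("giving up on %s after %d attempts; skipping update",
              path, attempts)
    return None
```

The point of logging inside the loop is that intermittent failures show up even when a later attempt succeeds, which a single outer try/except would hide.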

Based on Jan's feedback and what I'm seeing from others, it doesn't look like this is a common problem. I'm not sure what I'm going to do to track it down further.

Paul W Panish
Mobile: (603) 343-8901

> On Apr 29, 2015, at 18:34, Mick Sulley <m...@sulley.info> wrote:
>
> Hi Paul,
>
> I use Python to drive my system. At startup and every hour I walk the
> 1-wire directory looking for devices; at all other times I just use read().
> This is all in a try: except: and exceptions are written to a log file,
> but that gets rotated. I have looked in the current logs and there are
> no exceptions, but obviously there could have been some in the past.
>
> Is there a better way to look for errors? I assume that at a lower
> level owfs will try and retry on error. Is there an easy way to access
> this? I have tried looking at /statistics/errors/ but all the values in
> there seem to stay at zero.
>
> Cheers
> Mick
>
>> On 29/04/15 12:59, Paul W Panish wrote:
>> Mick,
>>
>> Thanks for the response. The mechanism I've implemented attempts 5 reads
>> as quickly as the API responds. As each access fails I log the error
>> message returned by the API as a WARNING and retry. After 5 attempts I
>> log an ERROR and skip the update. Unfortunately I mistakenly deleted my
>> log file so I don't have the exact text, but the failure is a 'file not
>> found' indication.
>>
>> I've increased the owfs update interval to 5 seconds to match my desired
>> system update interval. I can't wait indefinitely for a success as my
>> queues will start to back up, and eventually overflow. An occasional
>> error, or even a short string of errors and skipped updates, isn't a
>> problem, but a systematic error isn't acceptable for a couple of
>> reasons. First, if I miss an over-temperature indication I won't change
>> state to increase circulator speed, or enable heat dumping.
>> Second, and this is of more concern, if the bus corruption causes a
>> write to fail I may assume I've entered a state (once again circulator
>> speed or heat dumping) when I have not. Even if this doesn't occur, in
>> a boiler system the heat inputs are high enough that over-temperature
>> and pressure can occur very quickly, the result being a blown safety
>> valve with the accompanying mess.
>>
>> Since the low temperature operation has never shown an error or even a
>> warning, it's clear the alternative may be to move away from the devices
>> causing problems when heated.
>>
>> What I'm wondering is whether the DS18B20s have an inherent
>> vulnerability at high temperatures, and whether a system implementation
>> has to assume a high failure rate under these conditions. Even using
>> redundancy this would limit the range of applications where you'd want
>> to use these devices.
>>
>> It would be interesting to know if you're seeing the same type of error
>> returns. It probably just means increasing your level of logging so that
>> each failed access is indicated. It took me a while to find this since I
>> was originally only logging failures after all attempts had failed. As a
>> result I saw system failures only after weeks of operation. Once I
>> started logging the intermediate warnings it became clear what the
>> problem was.
>>
>> Paul
>>
>> Mick Sulley wrote:
>>> Hi Paul,
>>>
>>> I assume all references to temperatures in your mail are degrees F. If
>>> so, I am surprised that it causes a problem. I use 27 DS1820's on my
>>> system which includes 7 measuring solar panels, these can and have gone
>>> to over 120 degrees C and I have not experienced the problems that you
>>> have.
>>>
>>> How are you detecting the errors? I poll as fast as I can, which is
>>> about 15 seconds or so. I log when I get a good read from each device,
>>> so error detection is really "no good read for > 45 seconds".
>>> I have a couple of sensors that I suspect are faulty and fail from time
>>> to time but the rest are fine and nothing seems to be temperature related.
>>>
>>> If you have some other way to log errors I would be happy to try to
>>> incorporate that into my system to gather more info.
>>>
>>> Cheers
>>> Mick
>>>
>>>> On 29/04/15 01:04, Paul W Panish wrote:
>>>> I’m wondering if anyone has information on an issue I’ve been having
>>>> with DS18B20 temperature sensors.
>>>>
>>>> For some time I’ve been developing a wood fired boiler/heating/DHW
>>>> system controller
>>>> (https://sourceforge.net/projects/bctl/?source=directory) using the
>>>> owcapi for all sensing and I/O functionality. My 1-wire network is
>>>> limited in length and low in device weight. I have two DS18B20
>>>> temperature sensors and three Hobbyboards DS2408 based PIO boards on a
>>>> roughly 50 foot linear topology bus using CAT5e cabling and standard
>>>> RJ45 connectors for daisy-chaining bus segments and device attachment.
>>>> The drops to each device are 1 meter or less. I’m providing power and
>>>> ground through the CAT5e cabling.
>>>>
>>>> My problem is that there seems to be a strong temperature dependency for
>>>> bus read/write errors caused by the DS18B20 sensors. I’ve replaced the
>>>> sensors a few times with devices purchased at different times and from
>>>> different vendors to rule out random bad devices.
>>>>
>>>> I’m using a polling loop to read the DS18B20’s and PIO inputs at 5
>>>> second intervals with a conversion resolution of 10 bits
>>>> (temperature10). When the system is cold (<140 degrees F) it can go
>>>> forever (months) with no errors indicated in any device access. However,
>>>> when I fire the boiler I start seeing access errors (file not found) as
>>>> the boiler temperature rises above roughly 150 degrees. The error rate
>>>> increases as temperatures rise to a maximum level of about 185 degrees
>>>> at which point they are quite severe.
>>>>
>>>> The errors are not just on access to the temperature sensors (which are
>>>> hot), but also on access to the DS2408 devices (which remain at room
>>>> temperature), though much less frequently. From this I’m deducing that
>>>> bus timing is changing for the temperature sensors in such a manner that
>>>> they are corrupting access to other devices. I don’t have a scope so I’m
>>>> unable to check for slew rates, noise, or reflection problems, however
>>>> none of these should be affected by device heating (well maybe slew rate…)
>>>>
>>>> I’ve implemented a redundant read mechanism (in addition to any
>>>> redundancy owfs implements), which has made the system usable, but over
>>>> the long term this is a risky solution. I can tolerate the read errors
>>>> assuming I get an occasional success, however if a write to a PIO output
>>>> is dropped the results could be messy.
>>>>
>>>> One solution would be to switch to thermocouple sensors for the high
>>>> temperature components, using the MAX31850 devices, which I’ll do in the
>>>> absence of any other remedy. However, the temperatures I’m dealing with
>>>> are all well within the specified limits of the DS18B20 family of
>>>> devices, so I’m wondering if anyone has had similar experience and could
>>>> shed some light on the situation.
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>> One dashboard for servers and applications across Physical-Virtual-Cloud
>>>> Widest out-of-the-box monitoring support with 50+ applications
>>>> Performance metrics, stats and reports that give you Actionable Insights
>>>> Deep dive visibility with transaction tracing using APM Insight.
>>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>>>> _______________________________________________
>>>> Owfs-developers mailing list
>>>> Owfs-developers@lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/owfs-developers
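
As a footnote to the thread: Mick's approach of logging each good read and treating "no good read for > 45 seconds" as a failure can be sketched as a small tracker. This is an illustration only. StaleReadTracker and its method names are hypothetical, the 45 second limit is the figure from Mick's mail, and the injectable clock exists purely to make the sketch easy to test.

```python
import time


class StaleReadTracker:
    """Remember when each device last gave a good read and report stale ones."""

    def __init__(self, max_age=45.0, clock=time.monotonic):
        self.max_age = max_age    # seconds without a good read before flagging
        self.clock = clock        # injectable time source (assumption, for testing)
        self.last_good = {}       # device id -> time of last good read

    def record_good(self, device):
        """Call after every successful read of `device`."""
        self.last_good[device] = self.clock()

    def stale_devices(self):
        """Return the sorted ids of devices whose last good read is too old.

        Devices that have never been recorded are not tracked at all, so
        they should be recorded once when they are first discovered.
        """
        now = self.clock()
        return sorted(dev for dev, seen in self.last_good.items()
                      if now - seen > self.max_age)
```

Calling stale_devices() once per polling pass and logging the result would give a persistent record of dropouts that does not depend on exceptions surviving log rotation.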