RE: IoT OS
>Say all objects are connected peer to peer with wifi, and some of them are
>connected to the internet through a gsm network or through wifi to a box.
>These objects are moving in space and, for various reasons, connections are
>dynamic and can be severely impaired or lost.
>
>They have incoming local streams of data (e.g. HD video, accelerometer, GPS,
>other wifi and gsm signals, etc).
>
>I would like to abstract the CPU layer, storage layer, and internet connection
>so that the results of one of my objects are saved in real time
>if this object dies, and so that if one of the objects giving internet access to
>the group loses its connection, the redundancy allows the group of objects not
>to lose its internet connection.
>
>Can I consider these as different load balancing layers? Do you recommend
>implementing this at the kernel layer or at an API layer?
>Can I see that as a lightweight cluster?
>
>I think the API is more flexible, especially if I have a heterogeneous (by
>CPU, OS) set of connected objects. However, working at the kernel level allows
>existing programs not to be rewritten.
>What are your thoughts?

===

OK, I think I understand your question now. This isn't the right list for it, though I'm not sure where the right place to go would be -- it's not FreeBSD-specific, in any case. There are academic research groups looking into this type of problem; for instance, in the area of sensor networks (ACM Transactions on Sensor Networks covers some of these areas). There may be USENET groups which cover this area.

To cover your three areas, which I think require somewhat different solutions --

(a) CPU layer. I don't really recommend trying to abstract this. You could use a virtual machine to hide the underlying architecture, and checkpoint state periodically, but this is likely to slow down execution too much to be useful. If the issue is that a service may become unavailable, I'd recommend a middleware layer which can detect this and recover by starting a new instance of the service. Middleware layers like ZeroMQ, and clustering software, may be a useful starting point. This does mean that stateful connections (like reading a video stream) won't recover cleanly, though; the client would need to reconnect to attach to the new instance of the service. If you really need that, it's going to be hard.

(b) Storage layer. Look into highly-available clustered storage solutions. If you can use key-value or some other simplified storage model, do it. There are clustered file systems but probably none freely available that would work on the scale you envision and give decent performance. There are more alternatives if you're flexible about the format in which you're storing data (e.g. replicated object stores).

(c) Networking layer; or internet. If you can drop & re-establish a connection, or if every node has its own IP address (IPv6), this should be pretty straightforward; software could detect loss of connection and change the routing used to go through a different system. If not, you'll be a bit limited since mirroring TCP state between nodes would be too slow. This is a case where the existing operating system kernels are likely to do most of what you need; you simply need to add a layer to detect routing problems and select a new internet gateway appropriately.

I'd avoid implementing any clustering within the kernel, in part because if you have a wide variety of objects you may not want the same kernel on all of them, and in part because debugging & recovery are much harder. You're unlikely to want to run most existing software on such a system anyway (especially if the objects have relatively weak processors); you're better off writing to a set of clustering APIs for storage and state, at least. For networking, as mentioned, you can likely use the existing TCP stack & just add controls to redirect traffic as needed.
-- Anton
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
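The gateway-selection layer suggested under (c) in the reply above could look roughly like the following. This is only a minimal sketch under stated assumptions: the `struct gateway` record, the `pick_gateway` function, and the idea of ranking uplinks by a numeric cost are all hypothetical and do not come from the thread; how probes are sent and how the route is actually changed is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-gateway record kept by each node (not from the thread). */
struct gateway {
    const char *name;       /* peer that currently offers an uplink */
    uint64_t last_seen_ms;  /* time of the last successful probe */
    unsigned cost;          /* lower is preferred, e.g. wifi before gsm */
};

/*
 * Return the index of the cheapest gateway whose last successful probe is
 * no older than timeout_ms, or -1 if every gateway looks dead.
 */
int pick_gateway(const struct gateway *gw, size_t n,
                 uint64_t now_ms, uint64_t timeout_ms)
{
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        /* A probe stamped in the future is treated as age 0. */
        uint64_t age = now_ms >= gw[i].last_seen_ms
            ? now_ms - gw[i].last_seen_ms : 0;

        if (age > timeout_ms)
            continue;               /* probe too old: treat as down */
        if (best < 0 || gw[i].cost < gw[best].cost)
            best = (int)i;
    }
    return best;
}
```

A node would call `pick_gateway` after each probe round and update its default route only when the returned index changes, which keeps the existing TCP stack untouched, as the reply recommends.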
RE: FreeBsd MCA Panic Crash !!
>We've switched to FreeBSD recently to accommodate large video storage, as we are
>running a video streaming website.
>So the job of the FreeBSD server is to transcode the uploaded videos using ffmpeg and
>serve them to users via the nginx webserver,
>but so far our experience with it has not been very good. It crashes every 2-3 days
>and we're unable to track down the problem.
>The server specs are pretty high:
>
> Supermicro X5690 (12 cores, 24 threads - 2u) 96GB RAM 12x3TB RAID-10
> (HBA-LSI9211)
[...]
>CPU 3 BANK 5
>MCA: Internal Timer error
>STATUS be800400 MCGSTATUS 4

Are those the only MCA errors you're seeing? The reason I ask is that there's an erratum in the X5600 series which can cause an "internal timer error" MCA to be logged after another uncorrectable MCA occurs.

If not, these do point at a hardware problem *or* an erratum, though software can also trigger this in some cases (for instance, reading from malfunctioning or non-existent hardware). If your BIOS can be updated, that's a good first step as it will generally update the CPU microcode and add workarounds for many known issues. Replacing the CPU and/or voltage regulator is more drastic, but if the problem is hardware, it's likely in one of those components.

Anton

Supermicro X5690 (12 cores, 24 threads - 2u) 96GB RAM 12x3TB RAID-10 (HBA-LSI9211)

Here is a screenshot of a recent crash:

http://prntscr.com/9er3pk

One thing worth mentioning is that there's no load on the server before it goes down, and free RAM is usually around 12GB.

We've tried the following solutions so far:

- Updated FreeBSD OS
- Replaced 800W PS with 900W
- We've reduced CMOS from MAX(26x) to 18x as suggested in this post
  http://unix.stackexchange.com/questions/60574/determining-cause-of-linux-kernel-panic

The solution we've not tried so far is:

- Disable mca using (hw.mca.enabled: 0), as we're getting MCA panics.

Here is the crash dump:

[root@cw001 /var/crash]# mcelog --no-dmi --ascii --file core.txt.1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 3 BANK 5
MISC 0 ADDR 802bf6a69
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS be800400 MCGSTATUS 4
MCGCAP 1c09 APICID 3 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 BANK 5
MISC 0 ADDR 802bf6a69
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS be800400 MCGSTATUS 4
MCGCAP 1c09 APICID 2 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 3 BANK 5
MISC 0 ADDR 802bf6a69
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS be800400 MCGSTATUS 4
MCGCAP 1c09 APICID 3 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 BANK 5
MISC 0 ADDR 802bf6a69
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS be800400 MCGSTATUS 4
MCGCAP 1c09 APICID 2 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44

---

I showed those hardware errors to the vendor from whom we purchased the Supermicro servers. This is what he has to say:

---

Why do you not made one test environment with CentOS or one other Linux that you know to use, and see if you have same errors ??? if not than you know that the errors come from OS not from hardware. ( CentOS, RedHead….work diferend like FreeBSD – work direct on hardware if you don’t have the right kernel settings can the server crashed. CentOS , RedHead…. don’t work direct on hardware and distribute the resource load better and you have better control and you can better debug one situation)

---

Now we're stuck and unable to determine whether the issue is with FreeBSD or with the hardware. We're thinking of disabling mca in loader.conf, but people advise against it.

If you guys can help us, it'd be very kind.
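The flag lines that mcelog prints in the dump above ("Uncorrected error", "Error enabled", "Processor context corrupt", and so on) are read from the top bits of the bank's 64-bit status register. The sketch below shows that mapping using the architectural bit positions from the Intel SDM. Note the assumption: the archived text shows the status only as `be800400`, so the full value `0xbe00000000800400` used in the usage note is a guess at what the original log contained, and `struct mc_flags` and `mc_decode` are illustrative names, not part of mcelog or FreeBSD.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural flag bits of IA32_MCi_STATUS (Intel SDM, machine-check chapter). */
#define MC_STATUS_VAL    (1ULL << 63)  /* register contents are valid */
#define MC_STATUS_OVER   (1ULL << 62)  /* an earlier error was overwritten */
#define MC_STATUS_UC     (1ULL << 61)  /* uncorrected error */
#define MC_STATUS_EN     (1ULL << 60)  /* error reporting enabled */
#define MC_STATUS_MISCV  (1ULL << 59)  /* MCi_MISC register valid */
#define MC_STATUS_ADDRV  (1ULL << 58)  /* MCi_ADDR register valid */
#define MC_STATUS_PCC    (1ULL << 57)  /* processor context corrupt */

/* Simple MCA error code for "internal timer error" (low 16 bits). */
#define MCA_CODE_INTERNAL_TIMER 0x0400

/* Illustrative container for the decoded fields (hypothetical name). */
struct mc_flags {
    bool valid, overflow, uncorrected, enabled, miscv, addrv, pcc;
    uint16_t mca_code;  /* bits 15:0, the architectural error code */
};

struct mc_flags mc_decode(uint64_t status)
{
    struct mc_flags f;

    f.valid       = (status & MC_STATUS_VAL) != 0;
    f.overflow    = (status & MC_STATUS_OVER) != 0;
    f.uncorrected = (status & MC_STATUS_UC) != 0;
    f.enabled     = (status & MC_STATUS_EN) != 0;
    f.miscv       = (status & MC_STATUS_MISCV) != 0;
    f.addrv       = (status & MC_STATUS_ADDRV) != 0;
    f.pcc         = (status & MC_STATUS_PCC) != 0;
    f.mca_code    = (uint16_t)(status & 0xffff);
    return f;
}
```

With the assumed value `0xbe00000000800400ULL`, `mc_decode` reports uncorrected, enabled, MISC valid, ADDR valid and PCC set, overflow clear, and an error code equal to `MCA_CODE_INTERNAL_TIMER`, which is consistent with the flag lines in the dump.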
RE: 11-CURRENT r275641 panic: Unrecoverable machine check exception
>I certainly could be wrong - but how to know for sure the cause of the panic?
>
>MCA: CPU 0 UNCOR PCC OVER DCACHE L2 DRD error
>MCA: Address 0xbd8d4cc0
>MCA: Misc 0x30e386

The root cause may be hard to determine, but the immediate cause was helpfully decoded by the kernel. (Though I don't know whether all of the model-specific fields were decoded.)

UNCOR = uncorrected error
PCC = processor context corrupted (can't safely continue to execute, thus the panic)
OVER = error overflow (hmmm, multiple errors occurred)
DCACHE L2 DRD = data being read from L2 data cache

The miscellaneous register indicates that 0xbd8d4cc0 is a physical address.

So this looks like a processor failure. If it is repeatable, though, it may indicate either failed hardware or some problem in configuring the processor (though I'm not sure how that could lead to a cache error).

Anton
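The "DCACHE L2 DRD" part of the kernel message discussed above comes from the low 16 bits of the status register, which for cache-hierarchy errors follow the architectural layout `000F 0001 RRRR TTLL` (request type, transaction type, cache level). The sketch below decodes that layout. It is an illustration only: `decode_cache_error` is a hypothetical helper, and the code value `0x0136` used in the usage note is an assumed example that happens to decode to the same text, since the thread does not show the raw status value.

```c
#include <stdint.h>
#include <stdio.h>

/* Field encodings for compound cache-hierarchy error codes (Intel SDM). */
static const char *const tt_names[4] = { "I", "D", "G", "?" };       /* TT */
static const char *const ll_names[4] = { "L0", "L1", "L2", "LG" };   /* LL */
static const char *const rrrr_names[9] = {                           /* RRRR */
    "ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"
};

/*
 * If code is a cache-hierarchy error, write a description such as
 * "DCACHE L2 DRD error" into buf and return 0; otherwise return -1.
 */
int decode_cache_error(uint16_t code, char *buf, size_t len)
{
    /* Bits 15:8 must be 0000 0001 once the F (filtering) bit 12 is ignored. */
    if ((code & 0xef00) != 0x0100)
        return -1;

    unsigned ll = code & 0x3;           /* cache level */
    unsigned tt = (code >> 2) & 0x3;    /* instruction, data or generic */
    unsigned rrrr = (code >> 4) & 0xf;  /* request type */
    const char *req = rrrr < 9 ? rrrr_names[rrrr] : "?";

    snprintf(buf, len, "%sCACHE %s %s error",
             tt_names[tt], ll_names[ll], req);
    return 0;
}
```

Calling `decode_cache_error(0x0136, buf, sizeof(buf))` fills `buf` with `DCACHE L2 DRD error`, that is, a data read (DRD) from the level-2 (L2) data cache (D).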
Minor bug in SCSI definition
Coverity found an issue in this area which I tracked down to the incorrect definition patched below. The SID_QUAL macro is (((inq_data)->device & 0xE0) >> 5) which extracts the peripheral qualifier. Per SCSI-2 (draft 10L) table 46, the vendor-specific values are 1XXb.

This probably affects almost nobody, but it will clear up a couple of Coverity warnings.

Anton

Index: sys/cam/scsi/scsi_all.h
===
--- sys/cam/scsi/scsi_all.h (revision 274352)
+++ sys/cam/scsi/scsi_all.h (working copy)
@@ -1817,7 +1817,7 @@
  * reserved for this peripheral
  * qualifier.
  */
-#define SID_QUAL_IS_VENDOR_UNIQUE(inq_data) ((SID_QUAL(inq_data) & 0x08) != 0)
+#define SID_QUAL_IS_VENDOR_UNIQUE(inq_data) ((SID_QUAL(inq_data) & 0x04) != 0)
 	u_int8_t dev_qual2;
 #define SID_QUAL2 0x7F
 #define SID_LU_CONG 0x40
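Why the old mask could never match can be checked by brute force. The sketch below is a stand-alone illustration, not the kernel header: `struct inq` is a hypothetical one-field stand-in for the inquiry data, and the `VENDOR_OLD` / `VENDOR_NEW` names are made up to compare the two masks side by side. Since `SID_QUAL` yields a 3-bit value (0 to 7), bit 3 (`0x08`) is always clear, whereas `0x04` tests the top bit of the qualifier, i.e. the 1XXb values.

```c
#include <stdint.h>

/* Hypothetical stand-in for the inquiry data; only the first byte matters. */
struct inq {
    uint8_t device;
};

/* Peripheral qualifier: top three bits of the first inquiry byte. */
#define SID_QUAL(inq_data)   (((inq_data)->device & 0xE0) >> 5)

/* The mask before and after the patch, under illustrative names. */
#define VENDOR_OLD(inq_data) ((SID_QUAL(inq_data) & 0x08) != 0)
#define VENDOR_NEW(inq_data) ((SID_QUAL(inq_data) & 0x04) != 0)

/* Count how many of the 256 possible first bytes each test accepts. */
unsigned count_vendor_old(void)
{
    unsigned hits = 0;

    for (unsigned b = 0; b < 256; b++) {
        struct inq d = { (uint8_t)b };
        if (VENDOR_OLD(&d))
            hits++;
    }
    return hits;
}

unsigned count_vendor_new(void)
{
    unsigned hits = 0;

    for (unsigned b = 0; b < 256; b++) {
        struct inq d = { (uint8_t)b };
        if (VENDOR_NEW(&d))
            hits++;
    }
    return hits;
}
```

`count_vendor_old()` returns 0 because the old test is dead code for every input, while `count_vendor_new()` returns 128, one hit for each byte with bit 7 set.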
RE: shells/bash port, add a knob which symlinks to /bin/bash ?
>If you want interoperability just use /usr/bin/env bash as a shebang.

That doesn't work for this use case -- the user shell coming from LDAP -- but I agree that the port shouldn't be modifying /usr/bin. It's easy enough to add the symlink manually after installing the port if you're in this situation, or there may be a way to configure the LDAP module to map /bin/bash to /usr/local/bin/bash (I haven't looked to see what is supported here).

Anton
RE: [CURRENT]: weird memory/linker problem?
DOT => DOD
444F54 => 444F44

That's a single-bit flip. Bad memory, perhaps?

Anton

-----Original Message-----
From: owner-freebsd-curr...@freebsd.org [mailto:owner-freebsd-curr...@freebsd.org] On Behalf Of O. Hartmann
Sent: Tuesday, July 01, 2014 8:08 AM
To: Dimitry Andric
Cc: Adrian Chadd; FreeBSD CURRENT
Subject: Re: [CURRENT]: weird memory/linker problem?

On Mon, 23 Jun 2014 17:22:25 +0200, Dimitry Andric d...@freebsd.org wrote:

> On 23 Jun 2014, at 16:31, O. Hartmann ohart...@zedat.fu-berlin.de wrote:
>
>> On Sun, 22 Jun 2014 10:10:04 -0700, Adrian Chadd adr...@freebsd.org wrote:
>>
>>> When they segfault, where do they segfault?
>>
>> ... GIMP, LaTeX work, nothing special, but a bit memory consuming regarding GIMP)
>> I tried updating the ports tree and surprisingly the tree is left over in an
>> unclean condition while /usr/bin/svn segfaults (on console: pid 18013 (svn),
>> uid 0: exited on signal 11 (core dumped)).
>>
>> Using /usr/local/bin/svn, which is from the devel/subversion port, performs
>> well, while FreeBSD 11's svn contribution dies as described. It did not hours ago!
>
> I think what Adrian meant was: can you run svn (or another crashing program)
> in gdb, and post a backtrace? Or maybe run ktrace, and see where it dies?
>
> Alternatively, put a core dump and the executable (with debug info) in a
> tarball, and upload it somewhere, so somebody else can analyze it.
>
> -Dimitry

It's me again, with the same weird story.

After a couple of days of silence, the mysterious entity in my computer is back. This time it is again a weird compiler failure message (trying to buildworld):

[...]
c++ -O2 -pipe -O3 -O3 c++ -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/tools/clang/include -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support -I. -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/../../lib/clang/include -DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS -fno-strict-aliasing -DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-freebsd11.0\" -DLLVM_HOST_TRIPLE=\"x86_64-unknown-freebsd11.0\" -DDEFAULT_SYSROOT=\"\" -Qunused-arguments -I/usr/obj/usr/src/tmp/legacy/usr/include -std=c++11 -fno-exceptions -fno-rtti -Wno-c++11-extensions -c /usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support/Host.cpp -o Host.o
--- GraphWriter.o ---
In file included from /usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support/GraphWriter.cpp:14:
/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include/llvm/Support/GraphWriter.h:269:10: error: use of undeclared identifier 'DOD'; did you mean 'DOT'?
      O << DOD::EscapeString(Label);
           ^~~
           DOT
/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include/llvm/Support/GraphWriter.h:35:11: note: 'DOT' declared here
namespace DOT {  // Private functions...
          ^
1 error generated.
*** [GraphWriter.o] Error code 1

Well, in the past I saw many of those messages, especially labels of routines not found in shared objects/libraries, or even funny misspelled messages like the one shown above. I cannot reproduce them after a reboot, but as long as the system keeps running after the error has occurred, it is sticky. So in order to compile the OS successfully, I reboot.

Does anyone have an idea what this could be? Since it affects only one machine at the moment (the other CoreDuo has been retired in the meantime), it feels a bit like a miscompilation on a certain type of CPU.

Thanks for your patience,

Oliver
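The single-bit-flip observation at the top of this message (DOT versus DOD, 0x54 versus 0x44 in the last byte) can be checked mechanically by XOR-ing the two identifiers and counting the set bits. A minimal sketch follows; `bit_distance` is a hypothetical helper name, not something from the thread.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the number of bit positions in which two equal-length strings
 * differ, or -1 if their lengths differ.
 */
int bit_distance(const char *a, const char *b)
{
    size_t n = strlen(a);
    int bits = 0;

    if (strlen(b) != n)
        return -1;

    for (size_t i = 0; i < n; i++) {
        /* XOR leaves a 1 exactly where the two bytes disagree. */
        unsigned char x = (unsigned char)a[i] ^ (unsigned char)b[i];

        while (x != 0) {
            bits += x & 1;
            x >>= 1;
        }
    }
    return bits;
}
```

For "DOT" and "DOD" the only differing byte gives 0x54 XOR 0x44 = 0x10, so the distance is 1. A distance of exactly one bit is what makes faulty memory a more plausible suspect than a compiler or source problem.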
RE: PostgreSQL performance on FreeBSD
Thanks for this.

The cpu_search problem you reference came up here at Isilon as well. Here's a patch which should get clang to do the right thing (inlining 3 specialized copies of cpu_search); I haven't checked to make sure it doesn't hurt gcc, though.

Anton

Index: sched_ule.c
===
--- sched_ule.c (revision 268043)
+++ sched_ule.c (working copy)
@@ -622,11 +622,11 @@
 	for ((cpu) = 0; (cpu) <= mp_maxid; (cpu)++) \
 		if (CPU_ISSET(cpu, mask))
 
-static __inline int cpu_search(const struct cpu_group *cg, struct cpu_search *low,
+static __always_inline int cpu_search(const struct cpu_group *cg, struct cpu_search *low,
     struct cpu_search *high, const int match);
-int cpu_search_lowest(const struct cpu_group *cg, struct cpu_search *low);
-int cpu_search_highest(const struct cpu_group *cg, struct cpu_search *high);
-int cpu_search_both(const struct cpu_group *cg, struct cpu_search *low,
+int __noinline cpu_search_lowest(const struct cpu_group *cg, struct cpu_search *low);
+int __noinline cpu_search_highest(const struct cpu_group *cg, struct cpu_search *high);
+int __noinline cpu_search_both(const struct cpu_group *cg, struct cpu_search *low,
     struct cpu_search *high);
 
 /*
@@ -640,7 +640,7 @@
  * match argument. It is reduced to the minimum set for each case. It is
  * also recursive to the depth of the tree.
  */
-static __inline int
+static __always_inline int
 cpu_search(const struct cpu_group *cg, struct cpu_search *low,
     struct cpu_search *high, const int match)
 {

-----Original Message-----
From: owner-freebsd-curr...@freebsd.org [mailto:owner-freebsd-curr...@freebsd.org] On Behalf Of Konstantin Belousov
Sent: Friday, June 27, 2014 7:56 AM
To: performa...@freebsd.org
Cc: curr...@freebsd.org
Subject: PostgreSQL performance on FreeBSD

Hi,
I did some measurements and hacks to see about the performance and scalability of PostgreSQL 9.3 on FreeBSD, sponsored by The FreeBSD Foundation.

The results are described in https://kib.kiev.ua/kib/pgsql_perf.pdf.
The uncommitted patches, referenced in the article, are available as
https://kib.kiev.ua/kib/pig1.patch.txt
https://kib.kiev.ua/kib/patch-2
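The idea behind the cpu_search patch above is that a generic routine taking a constant `match` argument gets force-inlined into a few non-inlined wrappers, so the compiler can fold the constant and emit one specialized copy per wrapper. The sketch below shows the same pattern on a toy problem. It is not the scheduler code: `search`, `struct range`, and the `MATCH_*` flags are hypothetical, and the attributes are written in plain GCC/Clang syntax on the assumption that the FreeBSD macros `__always_inline` and `__noinline` expand to the equivalent.

```c
#include <limits.h>
#include <stddef.h>

#define MATCH_LOW   0x1
#define MATCH_HIGH  0x2

struct range {
    int low;
    int high;
};

/*
 * Generic search. Because it is always inlined and every caller passes a
 * compile-time constant for match, the untaken branches can be removed
 * from each inlined copy.
 */
static inline __attribute__((always_inline)) void
search(const int *v, size_t n, struct range *r, const int match)
{
    for (size_t i = 0; i < n; i++) {
        if ((match & MATCH_LOW) && v[i] < r->low)
            r->low = v[i];
        if ((match & MATCH_HIGH) && v[i] > r->high)
            r->high = v[i];
    }
}

/* Non-inlined wrappers: each holds exactly one specialized copy. */
__attribute__((noinline)) int search_lowest(const int *v, size_t n)
{
    struct range r = { INT_MAX, INT_MIN };

    search(v, n, &r, MATCH_LOW);
    return r.low;
}

__attribute__((noinline)) int search_highest(const int *v, size_t n)
{
    struct range r = { INT_MAX, INT_MIN };

    search(v, n, &r, MATCH_HIGH);
    return r.high;
}

__attribute__((noinline)) struct range search_both(const int *v, size_t n)
{
    struct range r = { INT_MAX, INT_MIN };

    search(v, n, &r, MATCH_LOW | MATCH_HIGH);
    return r;
}
```

Whether the compiler actually drops the dead branches depends on the optimization level; the attributes only guarantee where the inlining boundary sits, which is the part the patch controls.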
A tweak to HWPMC hooks to improve code generation
The HWPMC hooks are never invoked except when using the soft PMC feature for performance monitoring. This trivial patch hints as much to the compiler, which then moves some fairly lengthy code sequences out of the locking primitives (in particular), reducing their runtime footprint.

This patch was reviewed by Attilio Rao.

Anton

pmckern.diff
Description: pmckern.diff
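The attached diff is not reproduced in the archive, so the following is only a guess at the general technique the message describes: marking a rarely-taken hook check as unlikely so the compiler can place the hook call away from the hot path. Everything here is hypothetical (`PREDICT_FALSE`, `profiling_hook`, `lock_acquire`, `count_hook`); FreeBSD provides a similar `__predict_false` macro, and the sketch uses the GCC/Clang builtin `__builtin_expect` directly.

```c
#include <stddef.h>

/* Tell the compiler the condition is expected to be false (GCC/Clang). */
#define PREDICT_FALSE(x) __builtin_expect(!!(x), 0)

typedef void (*hook_fn)(int event);     /* hypothetical hook signature */

static hook_fn profiling_hook = NULL;   /* NULL unless profiling is on */
int hook_calls = 0;                     /* bookkeeping for the example */

/* Example hook that just counts how often it was invoked. */
void count_hook(int event)
{
    (void)event;
    hook_calls++;
}

void set_hook(hook_fn fn)
{
    profiling_hook = fn;
}

/*
 * Hypothetical lock primitive: the hook check is hinted as unlikely, so
 * the call sequence can be laid out off the straight-line fast path.
 */
int lock_acquire(int *lock)
{
    if (PREDICT_FALSE(profiling_hook != NULL))
        profiling_hook(1);
    *lock = 1;
    return *lock;
}
```

The hint changes code layout only, not behavior: with no hook installed `lock_acquire` never calls out, and after `set_hook(count_hook)` each call increments `hook_calls` once.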