RE: IoT OS

2016-01-22 Thread Rang, Anton
>Say all objects are connected peer to peer with wifi, some of them are 
>connected to internet through gsm network or wifi to a box.
>These objects are moving in space, and for some reason, connections are 
>dynamic and can be severely impaired or lost.
>
>They have incoming local streams of data (eg HD videos, accelerometer, GPS, 
>other wifi and gsm signals, etc).
>
>I would like to abstract the CPU layer, storage layer, and internet connection 
>so that the results of one of my objects are saved in real time
>if this object dies, and so that if one of the objects giving internet access to 
>the group loses its connection, the redundancy keeps the group of objects from 
>losing its internet connection.
>
>Can I consider these as different load balancing layers ? Do you recommend to 
>implement this at the kernel layer or at an API layer ?
>Can I see that as a lightweight cluster ?
>
>I think the API is more flexible, especially if I have a heterogeneous (by 
>CPU, OS) set of connected objects. However, working at the kernel level allows 
>existing programs not to be rewritten.
>What are your thoughts ?

===

OK, I think I understand your question now.

This isn't the right list for it, though I'm not sure where the right place to 
go would be -- it's not FreeBSD-specific, in any case. There are academic 
research groups looking into this type of problem; for instance, in the area of 
sensor networks (ACM Transactions on Sensor Networks covers some of these 
areas). There may be USENET groups which cover this area.

To cover your three areas, which I think require somewhat different solutions --

(a) CPU layer.  I don't really recommend trying to abstract this.  You could 
use a virtual machine to hide the underlying architecture, and checkpoint state 
periodically, but this is likely to slow down execution too much to be useful.  
If the issue is that a service may become unavailable, I'd recommend a middleware 
layer which can detect this and recover by starting a new instance of the 
service. Middleware layers like ZeroMQ, and clustering software, may be a 
useful starting point.  This does mean that stateful connections (like reading 
a video stream) won't recover cleanly, though; the client would need to 
reconnect to attach to the new instance of the service.  If you really need 
that, it's going to be hard.

(b) Storage layer.  Look into highly-available clustered storage solutions.  If 
you can use key-value or some other simplified storage model, do it.  There are 
clustered file systems but probably none freely available that would work on 
the scale you envision and give decent performance.  There are more 
alternatives if you're flexible about the format in which you're storing data 
(e.g. replicated object stores).

(c) Networking layer; or internet. If you can drop & re-establish a connection, 
or if every node has its own IP address (IPv6), this should be pretty 
straightforward; software could detect loss of connection and change the 
routing used to go through a different system. If not, you'll be a bit limited 
since mirroring TCP state between nodes would be too slow. This is a case where 
the existing operating system kernels are likely to do most of what you need; 
you simply need to add a layer to detect routing problems and select a new 
internet gateway appropriately.
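
As a rough illustration of the "select a new internet gateway" step, the sketch below picks the best gateway that is currently reachable. It is a minimal sketch under stated assumptions: the `struct gateway` type, the `metric` field, and `select_gateway()` are hypothetical names, and the health probe that fills in `reachable` (and the code that actually updates the routing table) are left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one candidate internet gateway. */
struct gateway {
	const char *name;   /* label only, e.g. the peer's hostname */
	bool reachable;     /* result of the most recent health probe */
	unsigned metric;    /* lower is preferred (e.g. wifi before gsm) */
};

/*
 * Return the index of the reachable gateway with the lowest metric,
 * or -1 if no gateway is currently reachable.
 */
int
select_gateway(const struct gateway *gws, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!gws[i].reachable)
			continue;
		if (best < 0 || gws[i].metric < gws[(size_t)best].metric)
			best = (int)i;
	}
	return (best);
}
```

A monitoring daemon could call this after each probe round and change the default route only when the returned index differs from the gateway in use.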

I'd avoid implementing any clustering within the kernel, in part because if you 
have a wide variety of objects you may not want the same kernel on all of them, 
and in part because debugging & recovery is much harder. You're unlikely to 
want to run most existing software on such a system anyway (especially if they 
have relatively weak processors); you're better off writing to a set of 
clustering APIs for storage and state, at least. For networking, as mentioned, 
you can likely use the existing TCP stack & just add controls to redirect 
traffic as needed.

-- Anton

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


RE: FreeBsd MCA Panic Crash !!

2016-01-04 Thread Rang, Anton
>We've switched to FreeBSD recently to accommodate large video storage as we are 
>running a video streaming website.
>So the job of the FreeBSD is to transcode the uploaded videos using ffmpeg and 
>serve them to users via nginx webserver
>but so far our experience is not very good with it. It crashes every 2-3 days 
>and we're unable to track down the problem.
>The server specs are pretty high :
>
> Supermicro X5690 (12 cores, 24 threads - 2u) 96GB RAM 12x3TB RAID-10 
> (HBA-LSI9211)

[...]

>CPU 3 BANK 5
>MCA: Internal Timer error
>STATUS be800400 MCGSTATUS 4

Are those the only MCA errors you're seeing? The reason I ask is that there's 
an erratum in the X5600 series which can cause an "internal timer error" MCA to 
be logged after another uncorrectable MCA occurs.

If not, these do point at a hardware problem *or* an erratum, though software can 
also trigger this in some cases (for instance, reading from malfunctioning or 
non-existent hardware). If your BIOS can be updated, that's a good first step 
as it will generally update the CPU microcode and add workarounds for many 
known issues. Replacing the CPU and/or voltage regulator is more drastic, but 
if the problem is hardware, it's likely in one of those components.
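
One way to check whether anything other than an internal timer error was logged is to compare the MCA error code of each record. The sketch below assumes the architectural layout, where the low 16 bits of the status register hold the error code and 0x0400 means "internal timer error"; the function names are hypothetical, and the status values would come from whatever log is available.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 16 bits of IA32_MCi_STATUS hold the architectural MCA error code. */
#define MCA_ERROR_CODE_MASK   0xFFFFu
/* Error code 0000 0100 0000 0000b is "internal timer error". */
#define MCA_INTERNAL_TIMER    0x0400u

static uint16_t
mca_error_code(uint64_t status)
{
	return ((uint16_t)(status & MCA_ERROR_CODE_MASK));
}

/*
 * True if every logged status is an internal timer error, i.e. no other
 * kind of MCA was recorded alongside it.
 */
bool
only_internal_timer_errors(const uint64_t *status, int n)
{
	for (int i = 0; i < n; i++)
		if (mca_error_code(status[i]) != MCA_INTERNAL_TIMER)
			return (false);
	return (true);
}
```

Only the low 16 bits are examined, so the check works on the STATUS value exactly as printed in the quoted log.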

Anton





Here is the screenshot of recent crash :

http://prntscr.com/9er3pk

One thing worth mentioning: before it goes down there is no load on the server, 
and free RAM is usually around 12GB.  We've tried the following solutions 
so far:


- Updated FreeBSD OS
- Replaced 800W PS with 900W
- We've reduced CMOS from MAX(26x) to 18x as suggested in this post 
http://unix.stackexchange.com/questions/60574/determining-cause-of-linux-kernel-panic

The solution we've not performed so far is :

- Disable mca using (hw.mca.enabled: 0) - As we're getting MCA panics.

Here is the crash dump :

[root@cw001 /var/crash]# mcelog --no-dmi --ascii --file core.txt.1
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 3 BANK 5
MISC 0 ADDR 802bf6a69
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS be800400 MCGSTATUS 4
MCGCAP 1c09 APICID 3 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 BANK 5
MISC 0 ADDR 802bf6a69
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS be800400 MCGSTATUS 4
MCGCAP 1c09 APICID 2 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 3 BANK 5
MISC 0 ADDR 802bf6a69
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS be800400 MCGSTATUS 4
MCGCAP 1c09 APICID 3 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
CPU 2 BANK 5
MISC 0 ADDR 802bf6a69
MCG status:MCIP
MCi status:
Uncorrected error
Error enabled
MCi_MISC register valid
MCi_ADDR register valid
Processor context corrupt
MCA: Internal Timer error
STATUS be800400 MCGSTATUS 4
MCGCAP 1c09 APICID 2 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44

---

I showed those hardware errors to the vendor from whom we purchased the 
Supermicro servers. This is what he had to say:

---
Why do you not made one test environment with CentOS or one other Linux that 
you know to use, and see if you have same errors ??? if not than you know that 
the errors come from OS not from hardware. ( CentOS, RedHead….work diferend 
like FreeBSD – work direct on hardware if you don’t have the right kernel 
settings can the server crashed. CentOS , RedHead…. don’t work direct on 
hardware and distribute the resource load better and you have better control 
and you can better debug one situation)
---

Now we're stuck and unable to determine whether the issue is with FreeBSD or 
the hardware. We're thinking of disabling MCA in loader.conf, but people advise 
against it. If you can help us, it would be very kind.




RE: 11-CURRENT r275641 panic: Unrecoverable machine check exception

2014-12-15 Thread Rang, Anton
> I certainly could be wrong - but how to know for sure the cause of the panic?
>
> MCA: CPU 0 UNCOR PCC OVER DCACHE L2 DRD error
> MCA: Address 0xbd8d4cc0
> MCA: Misc 0x30e386

The root cause may be hard to determine, but the immediate cause was 
helpfully decoded by the kernel. (Though I don't know whether all of the 
model-specific fields were decoded.)

UNCOR = uncorrected error
PCC = processor context corrupted (can't safely continue to execute, thus the 
panic)
OVER = error overflow (hmmm, multiple errors occurred)
DCACHE L2 DRD = data being read from L2 data cache

The miscellaneous register indicates that 0xbd8d4cc0 is a physical address.

So this looks like a processor failure. If it is repeatable, though, it may 
indicate either failed hardware or some problem in configuring the processor 
(though I'm not sure how that could lead to a cache error).
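
The flag decoding above can be sketched in a few lines. This is an illustrative sketch, not the kernel's own code: the bit positions are the architectural ones for IA32_MCi_STATUS (OVER is bit 62, UC is bit 61, PCC is bit 57), while `mca_flags()` and the macro names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

#define MC_STATUS_OVER  (1ULL << 62) /* another error arrived while one was logged */
#define MC_STATUS_UC    (1ULL << 61) /* error was not corrected */
#define MC_STATUS_PCC   (1ULL << 57) /* processor context corrupt */

/*
 * Format the three flags discussed above into buf, in the order the
 * kernel message lists them. Returns buf.
 */
char *
mca_flags(uint64_t status, char *buf, size_t len)
{
	snprintf(buf, len, "%s%s%s",
	    (status & MC_STATUS_UC) ? "UNCOR" : "COR",
	    (status & MC_STATUS_PCC) ? " PCC" : "",
	    (status & MC_STATUS_OVER) ? " OVER" : "");
	return (buf);
}
```

The raw status value for this panic is not shown in the message, so any value passed in would have to be read from the machine check bank or a crash dump.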

Anton


Minor bug in SCSI definition

2014-11-12 Thread Rang, Anton
Coverity found an issue in this area which I tracked down to the incorrect 
definition patched below.

The SID_QUAL macro is (((inq_data)->device & 0xE0) >> 5) which extracts the 
peripheral qualifier.
Per SCSI-2 (draft 10L) table 46, the vendor-specific values are 1XXb.

This probably affects almost nobody, but it will clear up a couple of Coverity 
warnings.
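
To see why the old test can never be true, here is a small self-contained sketch. `struct inq` is a hypothetical stand-in for the first byte of the real inquiry data, and the `_OLD`/`_NEW` suffixes are added only so both definitions can be compared side by side.

```c
#include <stdint.h>

/* Minimal stand-in for the first byte of struct scsi_inquiry_data. */
struct inq {
	uint8_t device;  /* bits 7..5: peripheral qualifier, bits 4..0: type */
};

#define SID_QUAL(inq_data)  (((inq_data)->device & 0xE0) >> 5)

/* Before the patch: SID_QUAL() is at most 7, so bit 3 is never set. */
#define SID_QUAL_IS_VENDOR_UNIQUE_OLD(inq_data) ((SID_QUAL(inq_data) & 0x08) != 0)
/* After the patch: bit 2 set matches the qualifiers 1XXb. */
#define SID_QUAL_IS_VENDOR_UNIQUE_NEW(inq_data) ((SID_QUAL(inq_data) & 0x04) != 0)

/* Count how many of the 8 possible qualifiers each macro flags. */
void
count_vendor_unique(int *old_hits, int *new_hits)
{
	*old_hits = *new_hits = 0;
	for (unsigned q = 0; q < 8; q++) {
		struct inq d = { .device = (uint8_t)(q << 5) };

		if (SID_QUAL_IS_VENDOR_UNIQUE_OLD(&d))
			(*old_hits)++;
		if (SID_QUAL_IS_VENDOR_UNIQUE_NEW(&d))
			(*new_hits)++;
	}
}
```

With the old mask no qualifier is ever reported as vendor-unique; with the new mask exactly the four values 100b through 111b are.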

Anton

Index: sys/cam/scsi/scsi_all.h
===
--- sys/cam/scsi/scsi_all.h (revision 274352)
+++ sys/cam/scsi/scsi_all.h  (working copy)
@@ -1817,7 +1817,7 @@
                      * reserved for this peripheral
                      * qualifier.
                      */
-#define SID_QUAL_IS_VENDOR_UNIQUE(inq_data) ((SID_QUAL(inq_data) & 0x08) != 0)
+#define SID_QUAL_IS_VENDOR_UNIQUE(inq_data) ((SID_QUAL(inq_data) & 0x04) != 0)
     u_int8_t dev_qual2;
 #define SID_QUAL2       0x7F
 #define SID_LU_CONG     0x40



RE: shells/bash port, add a knob which symlinks to /bin/bash ?

2014-09-12 Thread Rang, Anton
> If you want interoperability just use /usr/bin/env bash as a shebang.

That doesn't work for this use case -- the user shell coming from LDAP -- but I 
agree that the port shouldn't be modifying /usr/bin.

It's easy enough to add the symlink manually after installing the port if 
you're in this situation, or there may be a way to configure the LDAP module to 
map /bin/bash to /usr/local/bin/bash (I haven't looked to see what is supported 
here).

Anton



RE: [CURRENT]: weird memory/linker problem?

2014-07-01 Thread Rang, Anton
DOT => DOD

444F54 => 444F44

That's a single-bit flip.  Bad memory, perhaps?
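
The claim is easy to verify: 'T' is 0x54 and 'D' is 0x44, so the two bytes differ only in bit 4 (0x10). A minimal sketch, with `bits_differing()` as a hypothetical helper name:

```c
#include <stdint.h>

/* Number of bit positions in which two bytes differ. */
int
bits_differing(uint8_t a, uint8_t b)
{
	uint8_t x = a ^ b;   /* set bits mark the positions that differ */
	int n = 0;

	while (x != 0) {
		n += x & 1;
		x >>= 1;
	}
	return (n);
}
```

Calling `bits_differing('T', 'D')` yields 1, which is the signature of a single flipped bit rather than of a typo or a miscompilation.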

Anton

-----Original Message-----
From: owner-freebsd-curr...@freebsd.org 
[mailto:owner-freebsd-curr...@freebsd.org] On Behalf Of O. Hartmann
Sent: Tuesday, July 01, 2014 8:08 AM
To: Dimitry Andric
Cc: Adrian Chadd; FreeBSD CURRENT
Subject: Re: [CURRENT]: weird memory/linker problem?

On Mon, 23 Jun 2014 17:22:25 +0200
Dimitry Andric d...@freebsd.org wrote:

> On 23 Jun 2014, at 16:31, O. Hartmann ohart...@zedat.fu-berlin.de wrote:
> > On Sun, 22 Jun 2014 10:10:04 -0700
> > Adrian Chadd adr...@freebsd.org wrote:
> >> When they segfault, where do they segfault?
> ...
> > GIMP, LaTeX work, nothing special, but a bit memory consuming 
> > regarding GIMP) I tried updating the ports tree and surprisingly the 
> > tree is left over in an unclean condition while /usr/bin/svn segfaults 
> > (on console: pid 18013 (svn), uid 0: exited on signal 11 (core dumped)).
> > 
> > Using /usr/local/bin/svn, which is from the devel/subversion port, 
> > performs well, while FreeBSD 11's svn contribution dies as described. It 
> > did not hours ago!
> 
> I think what Adrian meant was: can you run svn (or another crashing
> program) in gdb, and post a backtrace?  Or maybe run ktrace, and see 
> where it dies?
> 
> Alternatively, put a core dump and the executable (with debug info) in 
> a tarball, and upload it somewhere, so somebody else can analyze it.
> 
> -Dimitry

It's me again, with the same weird story.

After a couple of days silence, the mysterious entity in my computer is back. 
This time it is again a weird compiler message of failure (trying to 
buildworld):

[...]
c++  -O2 -pipe -O3 -O3 
c++ -I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include
-I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/tools/clang/include
-I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support -I.
-I/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/../../lib/clang/include
-DLLVM_ON_UNIX -DLLVM_ON_FREEBSD -D__STDC_LIMIT_MACROS -D__STDC_CONSTANT_MACROS 
-fno-strict-aliasing -DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-freebsd11.0\"
-DLLVM_HOST_TRIPLE=\"x86_64-unknown-freebsd11.0\" -DDEFAULT_SYSROOT=\"\"
-Qunused-arguments -I/usr/obj/usr/src/tmp/legacy/usr/include -std=c++11 
-fno-exceptions -fno-rtti -Wno-c++11-extensions -c 
/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support/Host.cpp -o 
Host.o
--- GraphWriter.o ---
In file included from /usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/lib/Support/GraphWriter.cpp:14:
/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include/llvm/Support/GraphWriter.h:269:10: error: use of undeclared identifier 'DOD'; did you mean 'DOT'?
    O << DOD::EscapeString(Label);
         ^~~
         DOT
/usr/src/lib/clang/libllvmsupport/../../../contrib/llvm/include/llvm/Support/GraphWriter.h:35:11: note: 'DOT' declared here
namespace DOT {  // Private functions...
          ^
1 error generated.
*** [GraphWriter.o] Error code 1


Well, in the past I saw many of those messages, especially not found labels of 
routines in shared objects/libraries or even those funny misspelled messages 
shown above.

I cannot reproduce them after a reboot, but once the error has occurred it 
persists for as long as the system keeps running. So in order to compile the OS 
successfully, I reboot.

Does anyone have an idea what this could be? Since it affects at the moment 
only one machine (the other CoreDuo has been retired in the meanwhile), it 
feels a bit like a miscompilation on a certain type of CPU.

Thanks for your patience,

Oliver


RE: PostgreSQL performance on FreeBSD

2014-06-30 Thread Rang, Anton
Thanks for this.

The cpu_search problem you reference came up here at Isilon as well.  Here's a 
patch which should get clang to do the right thing (inlining 3 specialized 
copies of cpu_search); I haven't checked to make sure it doesn't hurt gcc, 
though.
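
The idea behind the patch can be shown with a simplified, non-recursive sketch: a worker that takes a constant selector is forced inline into three wrappers, so each wrapper becomes a specialized copy with the unused branches removed. Everything here is hypothetical illustration (the `load_search*` names, the flat array of loads); it uses the GCC/clang attributes directly rather than FreeBSD's `__always_inline` and `__noinline` macros.

```c
#include <limits.h>
#include <stddef.h>

#define SEARCH_LOWEST   0x1
#define SEARCH_HIGHEST  0x2
#define SEARCH_BOTH     (SEARCH_LOWEST | SEARCH_HIGHEST)

/*
 * Generic worker. Because it is forced inline and every caller passes a
 * constant 'match', the compiler can drop the branches that cannot be taken.
 */
static inline __attribute__((always_inline)) void
load_search(const int *load, int n, int *low, int *high, const int match)
{
	for (int i = 0; i < n; i++) {
		if ((match & SEARCH_LOWEST) && load[i] < *low)
			*low = load[i];
		if ((match & SEARCH_HIGHEST) && load[i] > *high)
			*high = load[i];
	}
}

/* Three out-of-line entry points, each a specialized copy of the worker. */
__attribute__((noinline)) int
load_search_lowest(const int *load, int n)
{
	int low = INT_MAX;

	load_search(load, n, &low, NULL, SEARCH_LOWEST);
	return (low);
}

__attribute__((noinline)) int
load_search_highest(const int *load, int n)
{
	int high = INT_MIN;

	load_search(load, n, NULL, &high, SEARCH_HIGHEST);
	return (high);
}

__attribute__((noinline)) void
load_search_both(const int *load, int n, int *low, int *high)
{
	*low = INT_MAX;
	*high = INT_MIN;
	load_search(load, n, low, high, SEARCH_BOTH);
}
```

Marking the wrappers noinline keeps the compiler from merging the three copies back into their callers, which is the behaviour the patch asks for.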

Anton

Index: sched_ule.c
===
--- sched_ule.c (revision 268043)
+++ sched_ule.c (working copy)
@@ -622,11 +622,11 @@
     for ((cpu) = 0; (cpu) <= mp_maxid; (cpu)++) \
         if (CPU_ISSET(cpu, mask))
 
-static __inline int cpu_search(const struct cpu_group *cg, struct cpu_search *low,
+static __always_inline int cpu_search(const struct cpu_group *cg, struct cpu_search *low,
     struct cpu_search *high, const int match);
-int cpu_search_lowest(const struct cpu_group *cg, struct cpu_search *low);
-int cpu_search_highest(const struct cpu_group *cg, struct cpu_search *high);
-int cpu_search_both(const struct cpu_group *cg, struct cpu_search *low,
+int __noinline cpu_search_lowest(const struct cpu_group *cg, struct cpu_search *low);
+int __noinline cpu_search_highest(const struct cpu_group *cg, struct cpu_search *high);
+int __noinline cpu_search_both(const struct cpu_group *cg, struct cpu_search *low,
     struct cpu_search *high);
 
 /*
@@ -640,7 +640,7 @@
  * match argument.  It is reduced to the minimum set for each case.  It is
  * also recursive to the depth of the tree.
  */
-static __inline int
+static __always_inline int
 cpu_search(const struct cpu_group *cg, struct cpu_search *low,
 struct cpu_search *high, const int match)
 {

-----Original Message-----
From: owner-freebsd-curr...@freebsd.org 
[mailto:owner-freebsd-curr...@freebsd.org] On Behalf Of Konstantin Belousov
Sent: Friday, June 27, 2014 7:56 AM
To: performa...@freebsd.org
Cc: curr...@freebsd.org
Subject: PostgreSQL performance on FreeBSD

Hi,
I did some measurements and hacks to see about the performance and scalability 
of PostgreSQL 9.3 on FreeBSD, sponsored by The FreeBSD Foundation.

The results are described in https://kib.kiev.ua/kib/pgsql_perf.pdf.
The uncommitted patches, referenced in the article, are available as 
https://kib.kiev.ua/kib/pig1.patch.txt
https://kib.kiev.ua/kib/patch-2


A tweak to HWPMC hooks to improve code generation

2013-12-23 Thread Rang, Anton
The HWPMC hooks are never invoked except when using the soft PMC feature for 
performance monitoring. This trivial patch hints as much to the compiler, which 
then moves some fairly lengthy code sequences out of the locking primitives (in 
particular), reducing their runtime footprint.
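
The attached patch is not reproduced in this message, so the following is only a sketch of the general technique, assuming the hint is a branch-prediction annotation in the style of FreeBSD's `__predict_false()`. The hook pointer, `lock_fast_path()`, and the counters are hypothetical names for illustration.

```c
#include <stddef.h>

/* Same idea as FreeBSD's __predict_false(): tell the compiler exp is rarely true. */
#define PREDICT_FALSE(exp)  __builtin_expect((exp), 0)

/* Hypothetical hook pointer: NULL unless a monitoring module installs one. */
static void (*trace_hook)(int event) = NULL;

static int fast_path_calls;      /* how often the fast path ran */
static int hook_events;          /* incremented by the sample hook */

/* Sample hook used only to demonstrate the rarely taken branch. */
static void
sample_hook(int event)
{
	(void)event;
	hook_events++;
}

void
lock_fast_path(void)
{
	/* Rarely taken: the compiler may move this call out of the hot path. */
	if (PREDICT_FALSE(trace_hook != NULL))
		trace_hook(1);
	fast_path_calls++;
}
```

The hint does not change behaviour. It only lets the compiler lay out the hook call away from the common path, so the code executed when no hook is installed stays short.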

This patch was reviewed by Attilio Rao.

Anton



pmckern.diff
Description: pmckern.diff