Re: [smartos-discuss] NVME 1.3

2018-08-20 Thread Robert Mustacchi
On 7/27/18 8:51 , Jan Paul wrote:
> 
> 
>> On 27 Jul 2018, at 17:29, Robert Mustacchi  wrote:
>>
>> On 7/27/18 7:38 , Jan Paul wrote:
>>> Given the amazing progress Robert has made on making the new Kaby Lake
>>> machines work with SmartOS (huge kudos!),
>>> I'd like to raise a question related to the hardware needs of us "small scale" hobbyists.
>>>
>>> NUCs still seem to be the most efficient small lab boxes for playing with
>>> SmartOS at home, and they are mostly NVMe-only machines.
>>> I've found that getting NVMe 1.2 M.2 SSDs is almost impossible, as most of
>>> the NVMe drives I've been able to get are 1.3 ones.
>>>
>>> Is there any plan to update the illumos nvme driver to support the new
>>> version?
>>
>> Yes, that's on the TODO list.
> Perfect!
>>
>>> So far I've worked around it by setting strict-version=0 in nvme.conf, and
>>> the system works, but I wanted to bring this question up so
>>> we can get some feedback on it.
>>
>> That's good to know. It should usually work, and we may want to relax
>> that strict version check so that it's based only on the major version.
>> Which NVMe 1.3 parts are you using?
>>
> Samsung 970 EVO 250GB
> 
> It seems to be pretty happy with the workaround (about 10 hours of
> runtime on SmartOS so far).

FYI, a fix for this went back today:
https://github.com/joyent/illumos-joyent/commit/1eb19b4a7770efe8736592808ccffef5e3c16bb8.

Robert


---
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
Modify Your Subscription: https://www.listbox.com/member/?member_id=25769125
Powered by Listbox: https://www.listbox.com


[smartos-discuss] ACPI Testing Request

2018-08-16 Thread Robert Mustacchi
Hi All,

There have been a number of issues with boot hangs on some of the more
recent Kaby Lake processors and a few Skylake SKUs. We were able to root
cause this to a deadlock in the core ACPICA code. For the full details
see https://github.com/joyent/smartos-live/issues/727 and
https://smartos.org/bugview/OS-7093.

Since ACPI changes can be a bit gritty, I would like to ask for a bit
more help testing this across a variety of platforms, in particular
desktop platforms. I've put together a series of test images that have
the newer ACPI and also log substantially more ACPI-related
information to the console in case something goes wrong (particularly on
debug bits).

If you could test this and ensure that you can boot and reboot OK, I
would greatly appreciate it. I have both debug and non-debug media. If
you'd like to build this yourself, the changes to illumos-joyent that
we've made are available at
https://github.com/rmustacc/illumos-gate/tree/acpi-dev-smartos. I'd also
like to thank Mike Gerdts, who wrote a number of tools for updating the
ACPI tree in illumos, which have made this effort substantially easier.
Thanks to those tools, we're now able to better track how we handle changes
and revisions to ACPI. That's available at
https://github.com/joyent/acpica/tree/joyent/20180629-wip.

Please note that these images are based on a platform from last week.

non-debug raw platform:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/non-debug/platform-20180807T230146Z.tgz

non-debug ISO vga:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/non-debug/acpi-nd-vga.iso

non-debug ISO ttya:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/non-debug/acpi-nd-ttya.iso

non-debug ISO ttyb:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/non-debug/acpi-nd-ttyb.iso

non-debug USB vga:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/non-debug/acpi-nd-vga.usb.bz2

non-debug USB ttya:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/non-debug/acpi-nd-ttya.usb.bz2

non-debug USB ttyb:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/non-debug/acpi-nd-ttyb.usb.bz2


debug raw platform:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/debug/platform-20180807T223604Z.tgz

debug ISO vga:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/debug/acpi-debug-vga.iso

debug ISO ttya:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/debug/acpi-debug-ttya.iso

debug ISO ttyb:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/debug/acpi-debug-ttyb.iso

debug USB vga:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/debug/acpi-debug-vga.usb.bz2

debug USB ttya:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/debug/acpi-debug-ttya.usb.bz2

debug USB ttyb:
https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808/debug/acpi-debug-ttyb.usb.bz2
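The ISO and USB links above follow a regular naming pattern. As a convenience, here is a small shell sketch that builds the URL for a given build type, console, and media format. The acpi_image_url helper is hypothetical and its pattern is inferred from the links listed; the timestamped raw platform tarballs are not covered.

```shell
#!/bin/sh
# Hypothetical helper: build the download URL for one of the ACPI preview
# images. The naming pattern is inferred from the links in this message.
#   $1: build type   - "non-debug" or "debug"
#   $2: console      - "vga", "ttya" or "ttyb"
#   $3: media format - "iso" or "usb"
BASE=https://us-east.manta.joyent.com/rmustacc/public/preview/acpi-201808

acpi_image_url() {
    case $1 in
        non-debug) tag=nd ;;
        debug) tag=debug ;;
        *) echo "unknown build type: $1" >&2; return 1 ;;
    esac
    case $2 in
        vga|ttya|ttyb) ;;
        *) echo "unknown console: $2" >&2; return 1 ;;
    esac
    case $3 in
        iso) suffix=iso ;;
        usb) suffix=usb.bz2 ;;   # USB images are bzip2-compressed
        *) echo "unknown format: $3" >&2; return 1 ;;
    esac
    echo "$BASE/$1/acpi-$tag-$2.$suffix"
}
```

For example, curl -O "$(acpi_image_url non-debug ttya usb)" followed by bunzip2 acpi-nd-ttya.usb.bz2 would fetch and unpack the non-debug USB image for a ttya serial console (curl and bunzip2 are assumed to be installed).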

Again, thank you in advance for giving this a shot. If you test this,
whether it works or (especially) if it does not, can you please reply
and let me know which motherboard, processor, and BIOS revision you're
using?
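A minimal sketch for gathering the requested details on a SmartOS/illumos host, assuming the standard smbios and psrinfo utilities are present. The section helper and its fallback message are illustrative, not an existing tool:

```shell
#!/bin/sh
# Print a labelled section, running the given command if it exists.
#   $1:  section title
#   $2+: command and arguments to run
section() {
    printf '== %s ==\n' "$1"
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "($1 not found)"
    fi
}

section "BIOS" smbios -t SMB_TYPE_BIOS
section "Motherboard" smbios -t SMB_TYPE_BASEBOARD
section "Processor" psrinfo -vp
```

The output can be pasted directly into a reply to this thread.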

Thanks,
Robert


[smartos-discuss] Subject: L1 Terminal Fault (CVE-2018-3615, CVE-2018-3620, CVE-2018-3646)

2018-08-14 Thread Robert Mustacchi
Hi All,

Several vulnerabilities, collectively called L1 Terminal Fault (L1TF),
have been announced: CVE-2018-3615, CVE-2018-3620, and
CVE-2018-3646. I wanted to call attention to the fact that this is a
problem for SmartOS users who are running multi-tenant, untrusted
workloads. The full Joyent security advisory is available at:
https://help.joyent.com/hc/en-us/articles/360007955414-Security-Advisory-Intel-L1-Terminal-Fault-Vulnerabilities-CVE-2018-3615-CVE-2018-3620-CVE-2018-3646-.

We'll have updated platform media available with fixes for this out
shortly. The changes have just been integrated and you can find the
reviews at https://cr.joyent.us/#/c/4679/ and
https://cr.joyent.us/#/c/4680/.

If you have any questions, please reach out.

Thanks,
Robert


Re: [smartos-discuss] based on OS-5492 - add more Advanced-Format drives (wiki.illumos.org 29.07.2018)

2018-08-02 Thread Robert Mustacchi
On 8/1/18 12:18 , Daniel Plominski wrote:
> Hi Robert,
> 
> Before the patch, the SmartOS setup had only ever used an ashift of 9,
> which is definitely wrong, since the Samsung 850 PRO uses native 8K blocks.

We did this previously because those devices were actually causing
ZFS errors during scrubs that seemed to be firmware-related. If a larger
ashift doesn't actually cause reliability issues with the drive, then
that's a different story. What logical and physical sector sizes is the
device actually advertising?
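For reference, ashift is the base-2 logarithm of the sector size ZFS aligns its allocations to, so the native 8K blocks mentioned above correspond to the requested ashift of 13, while 512-byte sectors give 9. A small illustrative helper (hypothetical, not part of SmartOS) that converts an advertised sector size into an ashift:

```shell
#!/bin/sh
# Convert a sector size in bytes to the matching ZFS ashift (log2 of the size).
# Only positive powers of two are accepted.
ashift_for() {
    size=$1
    shift_bits=0
    if [ "$size" -le 0 ] || [ $(( size & (size - 1) )) -ne 0 ]; then
        echo "not a power of two: $size" >&2
        return 1
    fi
    # Count how many times the size can be halved before reaching 1.
    while [ "$size" -gt 1 ]; do
        size=$(( size >> 1 ))
        shift_bits=$(( shift_bits + 1 ))
    done
    echo "$shift_bits"
}
```

For example, ashift_for 8192 prints 13. It should be fed the physical sector size the drive actually reports, which is exactly the open question in this thread.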

Robert


Re: [smartos-discuss] based on OS-5492 - add more Advanced-Format drives (wiki.illumos.org 29.07.2018)

2018-08-01 Thread Robert Mustacchi
On 8/1/18 11:18 , Daniel Plominski wrote:
> Hi,
> 
> We have several Samsung 850 PRO (MZ7WD480) SSDs in use, and we need a
> zpool ashift of 13. Maybe adding the sd.conf entry to the current
> joyent/smartos-live repository makes sense:
> 
> https://github.com/ass-a2s/smartos-live/commit/386d25877b44d8c41057aefc933256dd7cc58c7f

Are there correctness problems? What sector sizes is the drive advertising?

Robert


Re: AW: [smartos-discuss] Latest Smartos version cannot boot Win7 x64 with KVM

2018-07-27 Thread Robert Mustacchi
Hi Gernot,

I believe that some part of the eager FPU work and the common FPU API for
hypervisors is responsible for this regression. Apologies, I haven't had
the time to hunt that down yet; I'm seeing if I can track down a Win 7
image.

Robert

On 7/27/18 1:02 , Gernot Straßer wrote:
> I would really appreciate a word from Joyent engineers on whether this issue
> is being worked on or whether there is a workaround yet…
> 
> thanks
> 
> From: Gernot Straßer [mailto:gernot.stras...@freenet.de]
> Sent: Tuesday, 17 July 2018 07:56
> To: smartos-discuss@lists.smartos.org
> Subject: [smartos-discuss] Latest Smartos version cannot boot Win7 x64 with
> KVM
> 
> See also https://github.com/joyent/smartos-live/issues/792
> 
> Regards
> 
> Gernot


Re: [smartos-discuss] NVME 1.3

2018-07-27 Thread Robert Mustacchi
On 7/27/18 7:38 , Jan Paul wrote:
> Given the amazing progress Robert has made on making the new Kaby Lake
> machines work with SmartOS (huge kudos!),
> I'd like to raise a question related to the hardware needs of us "small scale" hobbyists.
> 
> NUCs still seem to be the most efficient small lab boxes for playing with
> SmartOS at home, and they are mostly NVMe-only machines.
> I've found that getting NVMe 1.2 M.2 SSDs is almost impossible, as most of
> the NVMe drives I've been able to get are 1.3 ones.
> 
> Is there any plan to update the illumos nvme driver to support the new
> version?

Yes, that's on the TODO list.

> So far I've worked around it by setting strict-version=0 in nvme.conf, and the
> system works, but I wanted to bring this question up so
> we can get some feedback on it.

That's good to know. It should usually work, and we may want to relax
that strict version check so that it's based only on the major version.
Which NVMe 1.3 parts are you using?
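The difference between a strict version check and one based only on the major version can be illustrated with a small sketch. This is hypothetical shell, not the illumos nvme driver's code: the function names and the assumed newest-supported version of 1.2 (per this thread) are illustrative. It decodes an NVMe Version (VS) register value, where the major version sits in bits 31:16, the minor in bits 15:8, and the tertiary in bits 7:0:

```shell
#!/bin/sh
# Hypothetical sketch: compare an NVMe VS register value against the newest
# version a driver knows about, strictly and by major version only.
SUPPORTED_MAJOR=1   # assumed newest supported version: 1.2
SUPPORTED_MINOR=2

# Print the VS value as major.minor.tertiary.
nvme_version() {
    vs=$(( $1 ))
    printf '%d.%d.%d\n' $(( (vs >> 16) & 0xffff )) $(( (vs >> 8) & 0xff )) $(( vs & 0xff ))
}

# Strict check: reject anything newer than SUPPORTED_MAJOR.SUPPORTED_MINOR.
strict_ok() {
    vs=$(( $1 ))
    major=$(( (vs >> 16) & 0xffff ))
    minor=$(( (vs >> 8) & 0xff ))
    [ "$major" -lt "$SUPPORTED_MAJOR" ] ||
        { [ "$major" -eq "$SUPPORTED_MAJOR" ] && [ "$minor" -le "$SUPPORTED_MINOR" ]; }
}

# Relaxed check: only the major version must not be newer.
major_only_ok() {
    vs=$(( $1 ))
    [ $(( (vs >> 16) & 0xffff )) -le "$SUPPORTED_MAJOR" ]
}
```

With these definitions, a 1.3 controller (VS value 0x00010300) fails strict_ok but passes major_only_ok, which mirrors what the strict-version=0 workaround allows.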

Thanks,
Robert


Re: [smartos-discuss] after SmartOS Clean Re-install (20180711T060947Z) issue with Intel 10 Gigabit X710-DA2 SFP+ Dual Port Network Card

2018-07-17 Thread Robert Mustacchi
Hi Daniel,

It looks like your dump device may not be large enough to hold the
entire dump. Is it possible to increase the size of the dump device? My
guess is that something has gone wrong as a result of the integration of
TSO support for i40e.
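On SmartOS the dump device is the zones/dump zvol shown in the boot log below, so making it larger amounts to raising that zvol's volsize. A minimal sketch, assuming that zvol name and a size chosen by the operator; the run and grow_dump helpers are hypothetical, and the script only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Sketch: grow the SmartOS dump zvol so a full crash dump fits.
# Assumes the dump device is zones/dump; pick the new size yourself
# (for example from the estimate printed by dumpadm -e).
DUMP_ZVOL=zones/dump

# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

grow_dump() {
    run zfs set volsize="$1" "$DUMP_ZVOL"
    # Re-register the zvol as the dump device so the new size is picked up.
    run dumpadm -d "/dev/zvol/dsk/$DUMP_ZVOL"
}
```

For example, DRY_RUN=0 grow_dump 32G would apply the change. After the next panic, savecore -vf /var/crash/volatile/vmdump.0 (as the log suggests) decompresses the saved dump for analysis.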

Robert

On 7/17/18 12:00 , Daniel Plominski wrote:
> Hi,
> 
> After a SmartOS reinstall and upgrade to a newer platform image, the
> network card stops working after a while.
> The network card carries our VLANs. VLANs work in an LX zone, but
> no longer in a KVM zone.
> 
> 1.  SmartOS Clean Re-install
>  Restore Datasets Settings
>  Restore usbkey/config etc.
>  Restore ZFS Datasets
>  Import Zones to /etc/zones/index
>  Run / Start LX & KVM Zones
> 
> Platform image upgraded from SunOS assg10 5.11 joyent_20180509T053210Z i86pc i386 i86pc
> to SunOS assg10 5.11 joyent_20180717T123432Z i86pc i386 i86pc
> https://github.com/ass-a2s/illumos-joyent/tree/ass-release-20180717
> https://datasets.ass.de/public/SmartOS/20180717T123432Z/smartos-20180717T123432Z-USB.img.bz2
> 
> Rolling back to joyent_20180509T053210Z now shows the same error
> after a while.
> 
> I did not see any abnormalities under
> https://github.com/joyent/illumos-joyent/tree/master/usr/src/uts/common/io/i40e
> 
> After shutting down all VMs and rebooting SmartOS:
> 
> 2018-07-16T05:18:37.496685+00:00 assg10 genunix: [ID 936769 kern.info]
> mpt_sas2 is /pci@7a,0/pci8086,2f01@0/pci1000,30e0@0/iport@v0
> 2018-07-16T05:18:37.496688+00:00 assg10 genunix: [ID 408114 kern.info]
> /pci@7a,0/pci8086,2f01@0/pci1000,30e0@0/iport@v0 (mpt_sas2) online
> 2018-07-16T05:18:37.496703+00:00 assg10 genunix: [ID 454863 kern.info]
> dump on /dev/zvol/dsk/zones/dump size 10465 MB
> 2018-07-16T05:18:37.496706+00:00 assg10 genunix: [ID 127566 kern.info]
> device pciclass,03@0(display#0) keeps up device sd@0,0(disk#0), but
> the former is not power managed
> 2018-07-16T05:18:37.496709+00:00 assg10 mac: [ID 469746 kern.info]
> NOTICE: aggr1000 registered
> 2018-07-16T05:18:37.496712+00:00 assg10 mac: [ID 435574 kern.info]
> NOTICE: igb1 link up, 1000 Mbps, full duplex
> 2018-07-16T05:18:37.496715+00:00 assg10 mac: [ID 435574 kern.info]
> NOTICE: aggr1000 link up, 1000 Mbps, full duplex
> 2018-07-16T05:18:37.496718+00:00 assg10 mac: [ID 435574 kern.info]
> NOTICE: igb2 link up, 1000 Mbps, full duplex
> 2018-07-16T05:18:37.496721+00:00 assg10 mac: [ID 435574 kern.info]
> NOTICE: igb3 link up, 1000 Mbps, full duplex
> 2018-07-16T05:18:37.496724+00:00 assg10 mac: [ID 435574 kern.info]
> NOTICE: igb0 link up, 1000 Mbps, full duplex
> 2018-07-16T05:18:37.496727+00:00 assg10 genunix: [ID 390243 kern.info]
> Creating /etc/devices/devid_cache
> 2018-07-16T05:18:37.496730+00:00 assg10 genunix: [ID 390243 kern.info]
> Creating /etc/devices/pci_unitaddr_persistent
> 2018-07-16T05:18:37.497026+00:00 assg10 savecore: [ID 570001 auth.error]
> reboot after panic: assertion failed: tcb != NULL, file:
> ../../common/io/i40e/i40e_transceiver.c, line: 2074
> 2018-07-16T05:18:33+00:00 assg10 savecore: [ID 676874 auth.error] Saving
> compressed system crash dump in /var/crash/volatile/vmdump.0
> 2018-07-16T05:18:40.860505+00:00 assg10 unix: [ID 504448 kern.info]
> NOTICE: Fastboot: Couldn't open /platform/i86pc/amd64/boot_archive
> 2018-07-16T05:18:45.870340+00:00 assg10 pseudo: [ID 129642 kern.info]
> pseudo-device: devinfo0
> 2018-07-16T05:18:45.870392+00:00 assg10 genunix: [ID 936769 kern.info]
> devinfo0 is /pseudo/devinfo@0
> 2018-07-16T05:18:50.549203+00:00 assg10 genunix: [ID 390243 kern.info]
> Creating /etc/devices/devname_cache
> 2018-07-16T05:20:04+00:00 assg10 savecore: [ID 320429 auth.error]
> Decompress the crash dump with #012'savecore -vf
> /var/crash/volatile/vmdump.0'
> 2018-07-16T05:20:04.857876+00:00 assg10 rootnex: [ID 349649 kern.info]
> xsvc0 at root: space 0 offset 0
> 2018-07-16T05:20:04.857902+00:00 assg10 genunix: [ID 936769 kern.info]
> xsvc0 is /xsvc@0,0
> 2018-07-16T05:20:06.914513+00:00 assg10 fmd: [ID 377184 daemon.error]
> SUNW-MSG-ID: FMD-8000-2K, TYPE: Defect, VER: 1, SEVERITY:
> Minor#012EVENT-TIME: Mon Jul 16 05:20:06 UTC 2018#012PLATFORM:
> Super-Server, CSN: 9000135765, HOSTNAME:
> assg10.assdomain.intern#012SOURCE: fmd-self-diagnosis, REV:
>#012EVENT-ID: 901bac51-20d6-c3ba-d2c3-df70f5af044f#012DESC: An
> illumos Fault Manager component has experienced an error that required
> the module to be disabled.  Refer to http://illumos.org/msg/FMD-8000-2K
>  for more information.#012AUTO-RESPONSE: The module has been disabled.
> Events destined for the module will be saved for manual
> diagnosis.#012IMPACT: Automated diagnosis and response for subsequent
> events associated with this module will not occur.#012REC-ACTION: Use
> fmdump -v -u  to locate the module.  Use fmadm reset 
> to reset the module.
> [root@assg10 ~]#
> [root@assg10 /var/crash/volatile]# ls -all
> total 20467016
> drwx--   2 root root   5 Juli 16 05:20 .
> drwxr-xr-x   3 root root   3 Juli 15 23:03 ..
> -rw-r--r--   1 root root 

[smartos-discuss] platform flag day: optional gcc6 in -extra

2018-07-17 Thread Robert Mustacchi
Hi all,

If you don't build the platform, then you can skip this message. With
the integration of 'OS-7042 illumos-extra should support building
optional, extra gcc versions', if you update illumos-extra, then you must
also update smartos-live and rerun ./configure in each workspace. In other
words, the following steps should be taken:

$ gmake clobber
$ gmake update
$ ./configure
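Since ./configure has to be rerun in every workspace, the steps above can be wrapped in a loop. This is a hypothetical helper, not part of smartos-live; it takes the workspace paths as arguments and, by default, only prints the commands it would run (set DRY_RUN=0 to execute them):

```shell
#!/bin/sh
# Print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Apply the flag-day steps (clobber, update, configure) to each workspace.
flag_day() {
    for ws in "$@"; do
        echo "== $ws"
        (
            cd "$ws" || exit 1
            run gmake clobber &&
            run gmake update &&
            run ./configure
        ) || return 1
    done
}
```

For example, DRY_RUN=0 flag_day ~/ws/smartos-live ~/ws/smartos-live-debug (hypothetical paths) would update both workspaces, stopping at the first failure.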

If you'd like to build the optional compilers, you should specify:

$ gmake BUILD_EXTRA_GCC=yes live

Note that this gcc6 is not currently being used. We will be
introducing more compilers and warnings more aggressively, along with
bootstraps to ease the process.

If you have any questions, please reach out.

Thanks,
Robert


[smartos-discuss] Heads Up: OS-6992 Want hypervisor API for FPU management

2018-06-12 Thread Robert Mustacchi
Hi,

If you don't build the platform, you can ignore this message.

With the integration of OS-6992 Want hypervisor API for FPU management,
if you update the kvm repo, you will need to make sure that you update
illumos-joyent as well. This is primarily part of cleaning up
and laying the groundwork to be able to run both bhyve and kvm at the
same time, à la Apple's Hypervisor framework. If you have any questions or
issues, please let me know.

Thanks,
Robert


[smartos-discuss] Flag day: OS-6947 ucode shouldn't need install step

2018-06-11 Thread Robert Mustacchi
Hi,

If you don't build the platform image, then you can ignore this e-mail.

I put back OS-6947 ucode shouldn't need install step. With this, you will
need to make sure that you update both smartos-live and illumos-joyent
in tandem. If you have any questions or issues, please let me know. My
apologies to those of you who had build issues due to the initial
integration of OS-6944.

Thanks,
Robert


Re: [smartos-discuss] failure to boot on Intel Core i7 8700 Hexa-Core Processor (Coffee Lake) and Gigabyte Z370XP SLI motherboard

2018-02-10 Thread Robert Mustacchi
On 2/8/18 18:34 , de...@hyltown.com wrote:
> I have what appears to be roughly the same situation as described by Robert
> Fisher (ASUS H110 board and a 6th gen i5). Booting with -v, I end up with:
> 
> root on /ramdisk:a fstype ufs
> 
> Neither F1+A nor Shift+Pause breaks out of this. Following suggestions from
> Robert's thread, I've attempted "-B disable-xhci=true" and "-B
> disable-acpi=true" but saw no difference. I have also disabled all C-states,
> power management, UEFI, SpeedStep, and similar features I could
> find, but saw no difference.
> 
> rmustacc said he hadn't seen reports of Coffee Lake at all, working or not.
> 
> ricco386 suggested looking through this seemingly related issue:
> https://github.com/joyent/smartos-live/issues/727
> 
> Though I didn't understand everything going on in that thread, and knowing 
> that I am likely running a different version than what is depicted there, I 
> did follow steps from this point:
> https://github.com/joyent/smartos-live/issues/727#issuecomment-342868065
> 
> Here is what I saw at the point of the hang, which definitely differs from 
> what was shown in that thread:
> https://postimg.org/image/4g4kovbdx/
> 
> 
> So ...
> 
> I've been running SmartOS on Supermicro hardware for several years at small 
> customer sites, using a mix of native zones and KVM to achieve a more-or-less 
> all-in-one server solution. I set them up and automate what management I can, 
> and don't revisit them until/unless there are problems. It basically just works,
> so I never end up getting very deep into troubleshooting. Because of this,
> I'm unfamiliar with kernel debugging and the like, so I don't have much to
> share about my problem beyond what is listed above.
> 
> BUT the Gigabyte/i7 box is one I just built in hopes of playing around, so
> it's currently available to bang on if someone cares to hold my hand
> through it. Perhaps it can be used to find or work around issues related
> to Coffee Lake. Any help would be appreciated.

Is there a serial header that we can use for kmdb on that system? It may
be useful to try the module auto-load/breakpoint facilities and/or
disable booting the other CPUs to help debug this.

Robert


Re: [smartos-discuss] "man -s2 stat" is WRONG!

2018-01-18 Thread Robert Mustacchi
On 1/17/18 18:40 , Jesus Cea wrote:
> what would be the right approach to request a man page update?. I guess
> this should be pushed thru Illumos, but I don't know the details.

A bug report to update the manual page to document the defined
higher-precision values is sufficient. I'll file that and take care of
updating the manual page. It's worth noting that the actual resolution
will depend on the file system and the hardware clock.

Robert


Re: [smartos-discuss] DHCP server in a zone?

2017-12-20 Thread Robert Mustacchi
On 12/20/17 3:32 , Matthias Teege wrote:
> On Tue, Dec 19, 2017 at 07:57:27AM -0800, Robert Mustacchi wrote:
> 
> Hello!
> 
>> On 12/19/17 7:55 , Matthias Teege wrote:
> 
>>> Do I have to "tune" the zone or the root zone to handle DHCP?
>>
>> Did you change any of the vmadm anti-spoofing properties on the zone?
>> By default, a zone is prevented from acting as a DHCP server.
> 
> I've found the documentation. Setting '"allow_dhcp_spoofing": true'
> solved the problem. Maybe "dhcp_server" is the better option.

Generally, the "dhcp_server" option is the better one. It's the main
setting we use on the zones in Triton that serve DHCP. I say "better"
mostly because I prefer to enable the minimal feature set needed here.
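Turning on the dhcp_server NIC property for an existing zone goes through vmadm update with an update_nics payload keyed by the interface's MAC address. A small sketch that builds that payload; the helper name and the example MAC address are hypothetical:

```shell
#!/bin/sh
# Build a vmadm update payload that sets dhcp_server on one NIC.
#   $1: MAC address of the zone's interface
dhcp_server_payload() {
    mac=$1
    # Very loose sanity check: only hex digits and colons are allowed.
    case $mac in
        ''|*[!0-9a-fA-F:]*) echo "bad MAC: $mac" >&2; return 1 ;;
    esac
    printf '{"update_nics": [{"mac": "%s", "dhcp_server": true}]}\n' "$mac"
}
```

It can then be piped to vmadm, for example dhcp_server_payload 90:b8:d0:00:00:01 | vmadm update <uuid>, using the MAC address that vmadm get <uuid> reports for the interface.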

Robert


Re: [smartos-discuss] DHCP server in a zone?

2017-12-19 Thread Robert Mustacchi
On 12/19/17 7:55 , Matthias Teege wrote:
> Hello!
> 
> I've installed SmartOS, created a zone, and installed isc-dhcpd
> from the packages. I can see the DHCPDISCOVER and a DHCPOFFER in
> the logs, but I don't see a reply packet on the network interfaces.
> The client gets no address. I've also tried an lx-branded zone
> with the same result.
> 
> Do I have to "tune" the zone or the root zone to handle DHCP?

Did you change any of the vmadm anti-spoofing properties on the zone?
By default, a zone is prevented from acting as a DHCP server.

Robert


Re: [smartos-discuss] No resolv.conf in ubuntu-certified-17.10

2017-11-28 Thread Robert Mustacchi
On 11/21/17 13:07 , Daniel Kontsek wrote:
> Hello,
> 
> We’ve got a bug report about the ubuntu-certified-17.10 image from
> images.joyent.com not setting /etc/resolv.conf from the resolvers property
> in vmadm. The ubuntu-certified-16.04 image and earlier ubuntu-certified
> images set /etc/resolv.conf correctly.
> 
> At first, I thought this had something to do with cloud-init. But after
> playing around, I noticed that the resolvconf package is missing or was
> removed, as there is a symlink to a non-existent file in /run
> (/etc/resolv.conf -> ../run/resolvconf/resolv.conf). Installing resolvconf
> solves the problem.
> However, there is also the systemd-resolved service, which maintains 
> /run/systemd/resolve/resolv.conf from the first boot. So changing the symlink 
> to point to /run/systemd/resolve/resolv.conf seems to be the best solution.

Hi Daniel,

Thanks for reporting this. I'll forward this on to some of the folks
working on images, and hopefully we'll hear something back.

Thanks,
Robert
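Daniel's suggested fix, pointing /etc/resolv.conf at the file that systemd-resolved maintains, can be sketched as follows. The fix_resolv_link helper is hypothetical; it takes the root of the image or zone filesystem as its argument, so it can be tried against a scratch copy first (use / inside the zone itself):

```shell
#!/bin/sh
# Repoint etc/resolv.conf under the given root at systemd-resolved's file.
#   $1: filesystem root (defaults to /)
fix_resolv_link() {
    root=${1:-/}
    # -f replaces the existing (dangling) resolvconf symlink.
    ln -sf /run/systemd/resolve/resolv.conf "$root/etc/resolv.conf"
}
```

Inside an affected 17.10 zone, running fix_resolv_link / would apply the change; whether systemd-resolved then reflects the vmadm resolvers is the part still to be confirmed by the image maintainers.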



Re: [smartos-discuss] SmartOS functionality question

2017-11-09 Thread Robert Mustacchi
On 11/9/17 14:39 , Lonnie Cumberland wrote:
> Hi All,
> 
> Well, the weekend is finally approaching and I am hoping to get a bit more
> done on my SmartOS projects.
> 
> With that in mind, I was wondering something about the way that SmartOS can
> start/stop zones with VMs.
> 
> I am just wondering if you can suspend a VM and save the current state of
> the system instead of having to shutdown? Then later you can just restart
> the VM from the saved state.
> 
> VirtualBox has this type of "pause" function as do a number of other
> hypervisors and I was just wondering about SmartOS in this regard.

Hi Lonnie,

We do not have such functionality at this time.

Robert


Re: [smartos-discuss] Poor network performance on 10GbE over VLAN

2017-11-09 Thread Robert Mustacchi
Hi Denis,

The fundamental issue right now is that in certain VLAN configurations
the system is not taking advantage of hardware polling as it should be.
As a result, the system falls back to packet-rate high watermarks that
are a bit low, which ends up capping performance at roughly what you're
seeing for a single thread.

As you might imagine, we're acutely aware of this problem and are in the
process of implementing RFD 97
(https://github.com/joyent/rfd/tree/master/rfd/0097) to address it,
which is already seeing promising results from our current experiments.

Robert

On 11/9/17 19:35 , Denis Cheong wrote:
> I am adding 10GbE to my existing SmartOS server but am experiencing unusual 
> and severe performance issues that I’m at a loss to explain.
> 
> Over the default untagged 10GbE link, I can get >9Gbit/sec consistently under 
> all configurations.
> As soon as I test over a VLAN, transfer rates plummet to a very inconsistent 
> 3-4Gbit/sec RX, and <1Gbit/sec TX.
> 
> Does anybody have any ideas what might be going on here?
> 
> Performance over default VLAN ID (SmartOS is running iperf3 -s; nb with 
> SmartOS as client and other host as server, performance is identical):
> 
> Connecting to host 192.168.245.14, port 5201
>   local 192.168.245.21 port 56809 connected to 192.168.245.14 port 5201
>   Interval   Transfer Bandwidth
>   0.00-1.00   sec  1.12 GBytes  9.58 Gbits/sec
>   1.00-2.00   sec  1.12 GBytes  9.62 Gbits/sec
>   2.00-3.00   sec  1.12 GBytes  9.62 Gbits/sec
>   3.00-4.00   sec  1.12 GBytes  9.60 Gbits/sec
>   4.00-5.00   sec  1.12 GBytes  9.59 Gbits/sec
>   5.00-6.00   sec  1.12 GBytes  9.61 Gbits/sec
>   6.00-7.00   sec  1.12 GBytes  9.59 Gbits/sec
>   7.00-8.00   sec  1.10 GBytes  9.47 Gbits/sec
>   8.00-9.00   sec  1.12 GBytes  9.60 Gbits/sec
>   9.00-10.00  sec  1.12 GBytes  9.63 Gbits/sec
>   - - - - - - - - - - - - - - - - - - - - - - - -
>   Interval   Transfer Bandwidth
> 0.00-10.00  sec  11.2 GBytes  9.59 Gbits/sec  sender
> 0.00-10.00  sec  11.2 GBytes  9.59 Gbits/sec  receiver
> 
> Performance over the same link, but over VLAN 300 (SmartOS is running iperf3 
> -s; note wild variation from 2 - 5Gbit/sec):
> 
> Connecting to host 192.168.245.134, port 5201
>   local 192.168.245.133 port 56786 connected to 192.168.245.134 port 5201
>   Interval   Transfer Bandwidth
>   0.00-1.00   sec   523 MBytes  4.39 Gbits/sec
>   1.00-2.00   sec   481 MBytes  4.04 Gbits/sec
>   2.00-3.00   sec   608 MBytes  5.10 Gbits/sec
>   3.00-4.00   sec   560 MBytes  4.70 Gbits/sec
>   4.00-5.00   sec   242 MBytes  2.03 Gbits/sec
>   5.00-6.00   sec   592 MBytes  4.96 Gbits/sec
>   6.00-7.00   sec   553 MBytes  4.64 Gbits/sec
>   7.00-8.00   sec   253 MBytes  2.12 Gbits/sec
>   8.00-9.00   sec   569 MBytes  4.77 Gbits/sec
>   9.00-10.00  sec   507 MBytes  4.25 Gbits/sec
>   - - - - - - - - - - - - - - - - - - - - - - - -
>   Interval   Transfer Bandwidth
> 0.00-10.00  sec  4.77 GBytes  4.10 Gbits/sec  sender
> 0.00-10.00  sec  4.77 GBytes  4.10 Gbits/sec  receiver
> 
> Performance over the same link, VLAN 300, SmartOS as client, server on other 
> host (note significantly worse performance on transmit):
> 
> Connecting to host 192.168.245.133, port 5201
>   local 192.168.245.134 port 35851 connected to 192.168.245.133 port 5201
>   Interval   Transfer Bandwidth
>   0.00-1.00   sec   104 MBytes   875 Mbits/sec
>   1.00-2.00   sec  46.3 MBytes   389 Mbits/sec
>   2.00-3.00   sec   130 MBytes  1.09 Gbits/sec
>   3.00-4.00   sec  76.0 MBytes   638 Mbits/sec
>   4.00-5.00   sec  97.0 MBytes   814 Mbits/sec
>   5.00-6.00   sec  17.4 MBytes   146 Mbits/sec
>   6.00-7.00   sec  67.6 MBytes   567 Mbits/sec
>   7.00-8.00   sec  92.4 MBytes   775 Mbits/sec
>   8.00-9.00   sec  79.7 MBytes   669 Mbits/sec
>   9.00-10.00  sec  73.3 MBytes   615 Mbits/sec
>   - - - - - - - - - - - - - - - - - - - - - - - -
>   Interval   Transfer Bandwidth
> 0.00-10.00  sec   785 MBytes   658 Mbits/sec  sender
> 0.00-10.00  sec   784 MBytes   658 Mbits/sec  receiver
> 

Re: [smartos-discuss] New hardware

2017-10-26 Thread Robert Mustacchi
On 10/26/17 6:51 , Len Weincier wrote:
> Hi All
> 
> We are looking to get a bunch of new compute nodes and looking at 2 things 
> 
> - the latest Intel Scalable CPUs
> - all NVMe based storage
> 
> Are there any issues with the above that anyone knows of ?

NVMe hotplug is not currently supported.

Robert


Re: [smartos-discuss] NIC not coming ‘UP’ in newly created lx-zone

2017-09-25 Thread Robert Mustacchi
On 9/24/17 0:49 , Gareth Howell wrote:
> Hi all
> I have an odd problem with a new Ubuntu lx-zone: the default nic won’t come
> UP.
> 
> The server is running joyent_20161110T013148Z, and this zone was built
> using
> 
> 7b5981c4-1889-11e7-b4c5-3f3bdfc9b88b  ubuntu-16.04  20170403  linux  lx-dataset  2017-04-03
> 
> I have created several similar zones using this image and a common
> json template where I just change the name of the zone and the ip
> address. The json can be seen at https://pastebin.com/3eRnqAdW
> 
> The zone gets created OK, but when I zlogin, ifconfig shows only the
> loopback interface.
> 
> ifconfig -a shows
> 
> https://pastebin.com/Brcd7sB1
> 
> As you can see, eth0 has an IP address and is showing traffic.
> 
> ifconfig up eth0 returns "eth0: Host name lookup failure”
> 
> 
> Any ideas on how to debug this?

What do you see when you run the native ifconfig -a, which can be found
under /native?

Thanks,
Robert




Re: [smartos-discuss] mr_sas issues with Dell PERC H730 Mini

2017-09-10 Thread Robert Mustacchi
On 9/10/17 10:26 , Michael Loftis wrote:
> Yeah I reported similar or same issue with the H330 here
> https://www.illumos.org/issues/8391
> 
> I haven't had much time to pursue this personally.

David,

Does this match some of the issues with the mr_sas driver that you had
seen previously? Do you know if these are the same symptoms?

Robert

> On Fri, Sep 8, 2017 at 13:56 Ian Collins  wrote:
> 
>> On 09/ 8/17 11:25 AM, Ian Collins wrote:
>>> I've hit this bug again on a new Dell box with an H330 mini.  The box
>>> was running fine under high load for a couple of weeks, now simply
>>> importing the pool triggers the timeouts and failed resets..
>>>
>>> The controller is running firmware version 25.5.2.0001.
>>>
>>> I'm running 20170805T013701Z on this box.
>>>
>>> Any clues?
>>>
>>
>> Thrashing each disk in turn with format surface analysis does not
>> trigger the bug, but importing the pool read only does...
>>
>> Could there be two bugs at play here? A firmware bug causing timeouts
>> and a driver bug failing to reset the controller?
>>
>> Ian.
>>





Re: [smartos-discuss] Spurious bge_mii_access logging from joyent_20170831T155808Z ?

2017-09-08 Thread Robert Mustacchi
On 9/8/17 5:38 , Chris Ridd wrote:
> Hi,
> 
> I just rebooted my hp microserver (gen 8 - two bge nics) into
> 20170831T155808Z, and noticed a bunch of notice level logging on
> startup. Is it new, or important?
> 
> Only bge0 is connected to anything.

Hi Chris,

It looks like we're hitting an instance of the BGE_REPORT macro, which
seems to be enabled regardless of whether this is a debug or non-debug
build. Do you know what you were on previously? I don't think anything
changed in the build related to this, but maybe something is changing at
runtime that's causing us to see it for the first time?

From what's in the log, I don't think any of it is notable. Can you
confirm that things are still working for you?

Thanks,
Robert

> 2017-09-08T12:30:00.482053+00:00 64-51-06-d8-07-f8 pcplusmp: [ID 805372
> kern.info] pcplusmp: pciex14e4,165f (bge) instance 0 irq 0x1b vector
> 0x60 ioapic 0xff intin 0xff is bound to cpu 0
> 2017-09-08T12:30:00.482068+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x24208000 --
> MI_COMMS_START set for 410 us; 0x2820->0x82a3c00
> 2017-09-08T12:30:00.482074+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2820 --
> MI_COMMS_START set for 80 us; 0x24380c00->0x4380c00
> 2017-09-08T12:30:00.482079+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2820 --
> MI_COMMS_START set after transaction; 0x8203100->0x2435010b
> 2017-09-08T12:30:00.482083+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set for 40 us; 0x2435010b->0x435010b
> 2017-09-08T12:30:00.482088+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set after transaction; 0x8217949->0x24380400
> 2017-09-08T12:30:00.482092+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set for 40 us; 0x24380400->0x4380400
> 2017-09-08T12:30:00.482096+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set after transaction; 0x8217949->0x24290300
> 2017-09-08T12:30:00.482100+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set for 50 us; 0x24290300->0x4290300
> 2017-09-08T12:30:00.482116+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set for 60 us; 0x24380c00->0x4380c00
> 2017-09-08T12:30:00.482120+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set for 50 us; 0x24374022->0x4374022
> 2017-09-08T12:30:00.482125+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set after transaction; 0x8217949->0x243501ff
> 2017-09-08T12:30:00.482129+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set for 50 us; 0x243501ff->0x43501ff
> 2017-09-08T12:30:00.482133+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set after transaction; 0x8217949->0x242d0007
> 2017-09-08T12:30:00.482137+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set for 120 us; 0x242d0007->0x42e003c
> 2017-09-08T12:30:00.482141+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set for 50 us; 0x242e0006->0x42e0006
> 2017-09-08T12:30:00.482150+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set after transaction; 0x8217949->0x282e
> 2017-09-08T12:30:00.482156+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set for 50 us; 0x282e->0x82e0006
> 2017-09-08T12:30:00.482160+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set after transaction; 0x8217949->0x24201300
> 2017-09-08T12:30:00.482164+00:00 64-51-06-d8-07-f8 bge: [ID 801725
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set for 40 us; 0x24201300->0x4201300
> 2017-09-08T12:30:00.482168+00:00 64-51-06-d8-07-f8 mac: [ID 469746
> kern.info] NOTICE: bge0 registered
> 2017-09-08T12:30:00.482172+00:00 64-51-06-d8-07-f8 pcieb: [ID 586369
> kern.info] PCIE-device: pci103c,2133@0, bge0
> 2017-09-08T12:30:00.482176+00:00 64-51-06-d8-07-f8 npe: [ID 236367
> kern.info] PCI Express-device: pci103c,2133@0, bge0
> 2017-09-08T12:30:00.482180+00:00 64-51-06-d8-07-f8 genunix: [ID 936769
> kern.info] bge0 is /pci@0,0/pci8086,1c18@1c,4/pci103c,2133@0
> 2017-09-08T12:30:00.482193+00:00 
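To see at a glance what these BGE_REPORT notices contain, here is a small Python sketch. It is not part of SmartOS, and `flatten` and `summarize_mii_notices` are hypothetical helpers. It undoes the archive's two-line wrapping, tallies the `bge_mii_access` notices by command word, and collects the MI_COMMS_START wait times the driver reported. It assumes the notice text keeps exactly the shape shown in the log above.

```python
import re
from collections import Counter

# One BGE_REPORT notice from bge_mii_access, after the two-line wrapping has
# been undone. Group 1 is the MII command word; group 2 is the reported wait
# in microseconds (absent for "after transaction" notices).
NOTICE_RE = re.compile(
    r"bge_mii_access: cmd (0x[0-9a-f]+) -- "
    r"MI_COMMS_START set (?:for (\d+) us|after transaction)"
)


def flatten(log_text):
    """Join wrapped log lines into one string, dropping '>' quote markers."""
    lines = (line.lstrip("> ") for line in log_text.splitlines())
    return " ".join(" ".join(lines).split())


def summarize_mii_notices(log_text):
    """Return (notices per cmd, list of waits in us, 'after transaction' count)."""
    per_cmd = Counter()
    waits_us = []
    after_txn = 0
    for m in NOTICE_RE.finditer(flatten(log_text)):
        per_cmd[m.group(1)] += 1
        if m.group(2) is None:
            after_txn += 1
        else:
            waits_us.append(int(m.group(2)))
    return per_cmd, waits_us, after_txn


# A few lines copied from the log above.
SAMPLE = """\
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x24208000 --
> MI_COMMS_START set for 410 us; 0x2820->0x82a3c00
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2820 --
> MI_COMMS_START set after transaction; 0x8203100->0x2435010b
> kern.info] NOTICE: bge0: bge_mii_access: cmd 0x2821 --
> MI_COMMS_START set for 40 us; 0x2435010b->0x435010b
"""

if __name__ == "__main__":
    counts, waits, after = summarize_mii_notices(SAMPLE)
    print(dict(counts), "max wait:", max(waits), "us;", after, "after-transaction")
```

Pasting the whole quoted log into `summarize_mii_notices` gives the same kind of summary for every notice in the boot.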

Re: [smartos-discuss] Intel X520-SR2 10GbE NIC and MTU

2017-08-18 Thread Robert Mustacchi
On 8/18/17 16:38 , John Croix wrote:
> Hi Robert,
> 
> Per instructions, I took out every single reference to “ixgbe0” from my 
> /usbkey/config file. I then rebooted the system. Next, I issued the nictagadm 
> command:
> 
> - SmartOS (build: 20170720T001051Z)
> [root@smartos ~]# cd /usbkey
> [root@smartos /usbkey]# grep ixgbe0 config
> [root@smartos /usbkey]# nictagadm add -p mtu=9000 ixgbe0 0:1b:21:bc:51:7a
> MTU changes will not take effect until next reboot
> [root@smartos /usbkey]# grep ixgbe0 config
> ixgbe0_nic=00:1b:21:bc:51:7a
> ixgbe0_mtu=9000
> [root@smartos /usbkey]# reboot
> 
> Upon rebooting, I had the very same issue, with the same error being 
> displayed in the log file about the MTU. Once I commented out the MTU setting 
> in /usbkey/config, I was able to boot the system normally, but at the default 
> MTU of 1500.

That's a bit surprising. I guess we never name our nic tags with the
same name as the device. Would you mind just making up some random name
that doesn't match the device for the tag? If that fails, then can you
share the failed service log?

> Could this have anything to do with the fact that this is an OEM X520 instead 
> of Intel-branded? It doesn’t matter for Linux, but could it make a difference 
> for SmartOS? Here’s the output from lspci on Linux for this device:
> 
> jcroix@ubuntu-server:~$ lspci -nn -vvv | grep Ethernet
> 00:19.0 Ethernet controller [0200]: Intel Corporation 82579V Gigabit Network 
> Connection [8086:1503] (rev 06)
> 02:00.0 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit 
> SFI/SFP+ Network Connection [8086:10fb] (rev 01)
> Subsystem: Intel Corporation Ethernet Server Adapter X520-1 [8086:000a]
> 02:00.1 Ethernet controller [0200]: Intel Corporation 82599ES 10-Gigabit 
> SFI/SFP+ Network Connection [8086:10fb] (rev 01)
> Subsystem: Intel Corporation Ethernet Server Adapter X520-1 [8086:000a]
> 
> Notice that the subsystem identifier is [8086:000a], which is NOT the same 
> subsystem ID for Intel-branded boards. Dell uses one subsystem ID, Small Tree 
> another, etc. It’s still an Intel X520, and it runs perfectly as such under 
> Linux using the Intel driver. And, of course, it works fine under SmartOS at 
> a MTU of 1500. Thought I’d mention it, “just in case”.

That shouldn't make a difference, though thanks for mentioning it. From
my memory we don't use the subdevice ids in the driver at all.

Robert




Re: [smartos-discuss] Contributing to smartos-live

2017-08-18 Thread Robert Mustacchi
On 8/18/17 8:37 , Daniel Kontsek wrote:
> Thank you very much Cody for pointing me in the right direction.
> 
> I already went ahead and followed the Gerrit instructions and created a CR: 
> https://cr.joyent.us/#/c/2353/  I did this to 
> learn the process of working with Gerrit (not that the bug/CR itself isn’t 
> important - it is). The workflow is different compared to the GitHub/GitLab 
> (forks/branches) style. I’ve accidentally created and abandoned another CR 
> when pushing an update to gerrit - sorry for that. Then I followed the Gerrit 
> documentation to update the CR and I’ve just noticed the "We have one major 
> exception to the standard Gerrit workflow:…” message in the wiki. I’ve used 
> the “amend + Change-Id” method to add another patchset and again: I’m sorry 
> for that (I didn’t notice it the first time).
> 
> Should I abandon the Change 2353 and create a new one?

Hi Daniel,

There should be no need. It's fine. The only reason we're not using the
Change-Id method is because of the final commit message. Even if it's
used in the interim it shouldn't cause any problems.

Thanks,
Robert




Re: [smartos-discuss] Intel X520-SR2 10GbE NIC and MTU

2017-08-18 Thread Robert Mustacchi
On 8/17/17 8:41 , John Croix wrote:
> I’ve tried to debug a little further into why the Intel X520 MTU can’t be set 
> by comparing 2 log files (/var/svc/log/network-physical:default.log) from 
> separate boots as I enabled and disabled my two 10GbE NICs. I have a log file 
> where my Myricom 10GbE adapter MTU is set to 9000 and where the X520 fails to 
> set the MTU. The Myricom has the “feature” that the MTU can only be set once 
> at the start of a boot. Here are the two relevant excerpts from the log 
> files. All lines prior in both log files are more or less identical 
> (differences being MAC addresses and NIC tag names).
> 
> X520:
> + link=ixgbe0
> + [[ -z ixgbe0 ]]
> + /usr/sbin/dladm show-linkprop -c -o value -p mtu ixgbe0
> + curmtu=1500
> + [[ 0 -eq 0 ]]
> + [[ 1500 -eq 9000 ]]
> + /usr/sbin/dladm set-linkprop -p mtu=9000 ixgbe0
> /usr/sbin/dladm: warning: cannot set link property 'mtu' on 'ixgbe0': link 
> busy
> + echo 'Failed to set mtu to 9000 for link ixgbe0'
> Failed to set mtu to 9000 for link ixgbe0
> + exit 95
> 
> Myricom:
> + link=myri10ge0
> + [[ -z myri10ge0 ]]
> + /usr/sbin/dladm show-linkprop -c -o value -p mtu myri10ge0
> + curmtu=1500
> + [[ 0 -eq 0 ]]
> + [[ 1500 -eq 9000 ]]
> + /usr/sbin/dladm set-linkprop -p mtu=9000 myri10ge0
> + [[ true == true ]]
> 
> Unfortunately I don’t know enough about Illumos internals to go much further. 
> Any suggestions for further experiments?

Hi John,

I'd probably start by taking a step back. I think that the problem here
is likely the way that you're using the config file. What I'd do first
is delete everything related to the ixgbe instance that you have in the
config file. Once that comes up, I would then use nictagadm to create a
nictag with an MTU of 9k. I would then reboot and verify that the mtu is 9k.

We generally don't try to bring up the physical interface in these
config files with the exception of the admin network, which doesn't
support a larger mtu. Most of the GZ device tuning in SmartOS (which is
different from normal illumos due to the live image) is driven through
setting the nic tags up.

You're seeing EBUSY from dladm set-linkprop for the same reason it was
seen with the myri10ge devices: lldpd has already started using the
interface, and the driver doesn't support changing the MTU once the link
is in use.
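The failure path in John's trace can be restated as a short Python sketch. This is not the real /lib/svc/method/net-physical, which is a shell script; it only mirrors the dladm calls visible in the trace, and `ensure_mtu` and its injectable `runner` are hypothetical names so the logic can be exercised without dladm. It shows why a busy link is fatal: set-linkprop fails, and the method exits with 95 ($SMF_EXIT_ERR_FATAL), which is what left network/physical in maintenance.

```python
import subprocess

# Exit status seen at the end of John's trace ("exit 95"); in illumos SMF
# this value is $SMF_EXIT_ERR_FATAL, which is what svcs -x reported.
SMF_EXIT_ERR_FATAL = 95
DLADM = "/usr/sbin/dladm"


def run(argv):
    """Run a command and return (exit status, stripped stdout)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout.strip()


def ensure_mtu(link, want, runner=run):
    """Restate the MTU step from the net-physical trace: read the current
    MTU and only call set-linkprop when it differs from the wanted value."""
    rc, cur = runner([DLADM, "show-linkprop", "-c", "-o", "value", "-p", "mtu", link])
    if rc != 0:
        # Assumption: the trace never shows this branch; treat it as fatal.
        return SMF_EXIT_ERR_FATAL
    if int(cur) == want:
        return 0
    rc, _ = runner([DLADM, "set-linkprop", "-p", f"mtu={want}", link])
    if rc != 0:
        # With lldpd already holding the link, "link busy" surfaces here.
        print(f"Failed to set mtu to {want} for link {link}")
        return SMF_EXIT_ERR_FATAL
    return 0


if __name__ == "__main__":
    # A fake runner that behaves like the ixgbe0 case: show works, set is busy.
    def busy(argv):
        return (0, "1500") if "show-linkprop" in argv else (1, "")

    print(ensure_mtu("ixgbe0", 9000, runner=busy))  # 95
```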

Robert


>> On Aug 15, 2017, at 3:06 PM, John Croix wrote:
>>
>> In my quest for better connectivity between my Mac and my iSCSI volume on my 
>> SmartOS server, I’ve picked up an Intel X520-SR2 10GbE NIC and put it into 
>> my server. The system recognized the card without an issue, and I was able 
>> to use nictagadm to add the NIC to the system. The problem is that it won’t 
>> accept a MTU parameter of 9000. I had to comment the MTU line out of 
>> /usbkey/config in order to get the server to fully boot.
>>
>> Here are the relevant lines in the /usbkey/config file:
>> ixgbe0_nic=0:1b:21:bc:51:7a
>> ixgbe0_ip=192.168.2.2
>> ixgbe0_netmask=255.255.255.0
>> ixgbe0_mtu=9000
>>
>> From “svcs -x”, I see this:
>> svc:/network/physical:default (physical network interfaces)
>>  State: maintenance since August 15, 2017 at 07:52:33 PM UTC
>> Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
>>    See: http://illumos.org/msg/SMF-8000-KS
>>    See: ifconfig(1M)
>>    See: /var/svc/log/network-physical:default.log
>> Impact: 22 dependent services are not running.  (Use -v for list.)
>>
>> Here’s the log file content, with the error at the end:
>> [ Aug 15 19:52:31 Executing start method ("/lib/svc/method/net-physical"). ]
>> [ Aug 15 19:52:31 Timeout override by svc.startd.  Using infinite timeout. ]
>> + smf_configure_ip
>> + /sbin/zonename -t
>> + [ global = global -o shared = exclusive ]
>> + return 0
>> + LD_LIBRARY_PATH=/lib
>> + export LD_LIBRARY_PATH
>> + ADMIN_DHCP_TIMEOUT=300
>> + ActiveAggrLinks=''
>> + typeset -A ActiveAggrLinks
>> + smf_netstrategy
>> + smf_is_nonglobalzone
>> + [ global != global ]
>> + return 1
>> + /sbin/netstrategy
>> + set -- ufs none none
>> + [ 0 -eq 0 ]
>> + [ ufs = nfs ]
>> + _INIT_NET_STRATEGY=none
>> + export _INIT_NET_STRATEGY
>> + typeset -A plumbedifs
>> + smf_is_globalzone
>> + [ global = global ]
>> + return 0
>> + /usr/sbin/dladm init-phys
>> + log_if_state before
>> + echo '== debug start: before =='
>> == debug start: before ==
>> + /usr/sbin/dladm show-phys
>> LINK       MEDIA     STATE    SPEED  DUPLEX   DEVICE
>> myri10ge0  Ethernet  unknown  1      full     myri10ge0
>> e1000g0    Ethernet  unknown  0      half     e1000g0
>> e1000g1    Ethernet  unknown  0      half     e1000g1
>> ixgbe0     Ethernet  down     0      unknown  ixgbe0
>> ixgbe1     Ethernet  down     0      unknown  ixgbe1
>> + /sbin/ifconfig -a
>> lo0: 

Re: [smartos-discuss] vmadm create times out (but zone eventually starts) on DL360g6

2017-08-18 Thread Robert Mustacchi
On 8/17/17 18:31 , Rob Seastrom wrote:
> 
> Hi folks,
> 
> I'm scratching my head over SmartOS on a DL360g6 which I've been trying to 
> piece together for deployment in a remote datacenter (DR and DNS service), so 
> it's smaller / less capable than the machines that I usually run.
> 
> Some time ago I tried running SmartOS on these machines with an HP P410i RAID 
> controller.  The disk performance at the time was generally unsatisfactory, 
> but there's a new driver for that controller family effective late last year 
> ( https://smartos.org/bugview/OS-5564 ) so I figured I'd give it a try.
> 
> I'm running 20170608T172228Z on all the other machines around here so I 
> figured I'd give it a go on this one as well.  Booted fine from the USB, 
> zpool was a ZFS mirror of two drives on the HP controller configured as 
> singles.  Everything seemed to come up OK, disk performance was adequate but 
> nothing to write home about...  but creating a VM timed out waiting for the 
> VM to become ready...  vmadm list showed the VM in provisioning status...  
> and it eventually went to "running".
> 
> Odd.  Well, maybe 6g was not enough ram for it to be truly happy.  I upped 
> the memory to 24g.  No dice same deal.
> 
> OK, must be the disk subsystem right?  Picked up some HP H220s (SAS2308 aka 
> 9207s) and reflashed to IT mode.  System boots but throws an odd error in the 
> middle of booting:  "warning: KCS error: ff" - Google tells me little except 
> that maybe bmc is weird about the card.  Disabled the built-in SmartArray 
> just in case.

That's an error we've seen on some BMCs. In general, I wouldn't worry
about it.

> Disk array faster for copy across the network (getting 110-112 MByte/sec on 
> gigabit ethernet vs. 75-80 before).  zpool scrub operates as expected -  
> "37.6G scanned out of 102G at 134M/s, 0h8m to go".  And still, vmadm create 
> runs out the clock:
> 
> [root@00-23-7d-e8-af-38 /zones/rs]# time vmadm create -f test29.json 
> first of 1 error: timed out waiting for zone to transition to running
> 
> real    0m57.031s
> user    0m2.322s
> sys     0m1.847s
> [root@00-23-7d-e8-af-38 /zones/rs]# 

In this case, there are a few different things I'd check before we even
worry about the disks. First, there's the zone_bh log in /var/log.
That'll have some information about all the transitions that the brand
went through. I'd first use that to see if the zone transitioned to
running or not. Let's figure that out, as that'll tell us where to look
next.

Robert




Re: [smartos-discuss] 10GbE tuning parameters

2017-08-14 Thread Robert Mustacchi
On 8/13/17 20:24 , John Croix wrote:
> Just wanted to check in to see if anybody had any recommendations for tuning 
> parameters for 10GbE performance or iSCSI.
> 
> I have 2 Myricom 10GbE adapters direct connected to one another. On the 
> SmartOS side, I have an iSCSI ZFS volume set up. On the Mac side, I’m using 
> GlobalSAN to attach to the SmartOS volume. Jumbo frames (MTU=9000) are 
> enabled on both sides. I’ve benchmarked my performance using netperf, and I’m 
> seeing the following (executed from the Mac side, netperf server running on 
> SmartOS):
> 
> # netperf -H 192.168.2.2 -t TCP_STREAM -C -c -l 60  -- -s 512K -S 512K
> MIGRATED TCP STREAM TEST from (null) (0.0.0.0) port 0 AF_INET to (null) () 
> port 0 AF_INET
> Recv   Send    Send                          Utilization       Service Demand
> Socket Socket  Message  Elapsed              Send     Recv     Send    Recv
> Size   Size    Size     Time     Throughput  local    remote   local   remote
> bytes  bytes   bytes    secs.    10^6bits/s  % O      % ?      us/KB   us/KB
> 
> 524744 524288  524288   60.00    5512.56     6.35     10.42    1.132   -0.310
> 
> I’ve also created a very large file of 0’s, and am using “dd” to copy them 
> over. Here’s what I’m seeing when running that on the Mac:
> 
> # time dd if=junk.zero of=/Volumes/remote/junk.zero bs=1048576
> 43158+1 records in
> 43158+1 records out
> 45254967296 bytes transferred in 92.342684 secs (490076369 bytes/sec)
> 
> real  1m32.382s
> user  0m0.041s
> sys   0m30.624s
> 
> I’ve followed a few tuning guides on the Mac, which actually brought the 
> numbers up to the levels I’m showing here. I’m now looking for things on the 
> SmartOS side that can help. BTW, I did try the suggestions here 
> (https://community.emc.com/docs/DOC-39156), but changing those properties 
> didn’t seem to make a difference to any of my numbers.
> 
> According to my Mac, I have good throughput to the Myricom card itself (a 
> value of 1280 corresponds to 10Gb/sec), so I don’t think that there’s an 
> issue between the Mac and my ethernet card. The card is in a Mercury Helios 
> external cage, connected via Thunderbolt 2 (20Gbs top speed). The card is a 
> Myricom 10G-PCIE2-8B2-2S NIC.
> 
> # sysctl net.myri10ge | grep dma
> net.myri10ge.en13.dma_read_bw_MBs: 1436
> net.myri10ge.en13.dma_write_bw_MBs: 1456
> net.myri10ge.en13.dma_read_write_bw_MBs: 2610
> net.myri10ge.en12.dma_read_bw_MBs: 1436
> net.myri10ge.en12.dma_write_bw_MBs: 1456
> net.myri10ge.en12.dma_read_write_bw_MBs: 2610
> 
> Finally, the SmartOS system itself is a SuperMicro X8DTE-F running 2 Xeon 
> L5630s (16 cores total) with 96GB of ECC memory and three 3TB hard drives, 
> in a 3-way mirror, that the iSCSI volume is on. Synchronization is disabled:
> 
> [root@smartos ~]# zfs get sync zones/zpool/iscsi-1
> NAME PROPERTY  VALUE SOURCE
> zones/zpool/iscsi-1  sync  disabled  local
> 
> Sorry for the long post, but trying to supply any pertinent information 
> without people having to ask for it. Any help in boosting these numbers would 
> be appreciated.

In terms of investigating this, I have a couple of different questions.
In general, I'd first focus on understanding the upper bound. When
you're doing the streaming TCP tests, are those going to a VNIC, to an
interface that's been plumbed up in the GZ, or something else? In
general, we haven't seen much need for tuning on Intel or other vendors'
cards to drive 10 GbE performance. That said, VLANs, among some other
factors, can ultimately limit performance.

I'm not sure if this is that helpful, but hopefully it gives you some
places to start looking. If you're instead focused on iSCSI, I'd start
by using DTrace to characterize the latency of operations by op type, so
we can get a better understanding of the overall system performance.
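To frame the upper-bound question, the figures from John's message can be put on one scale. This is a minimal Python sketch, assuming decimal units throughout and a nominal 10,000 Mbit/s line rate; the constant names are illustrative. By this arithmetic the TCP stream test reached roughly 55% of line rate, and the dd copy over iSCSI roughly 71% of what netperf measured.

```python
# Figures copied from John's message. Assumption: dd's bytes/sec and
# netperf's "10^6bits/s" column both use decimal (SI) units.
DD_BYTES = 45254967296
DD_SECONDS = 92.342684
NETPERF_MBIT = 5512.56
LINE_RATE_MBIT = 10_000  # nominal 10GbE data rate, ignoring framing overhead


def mbit_per_s(nbytes, seconds):
    """Convert bytes moved over an interval to 10^6 bits per second."""
    return nbytes * 8 / seconds / 1e6


def compare():
    dd_mbit = mbit_per_s(DD_BYTES, DD_SECONDS)
    return {
        "dd_mbit": dd_mbit,  # ~3920.6
        "netperf_pct_of_line": 100 * NETPERF_MBIT / LINE_RATE_MBIT,  # ~55.1
        "dd_pct_of_netperf": 100 * dd_mbit / NETPERF_MBIT,  # ~71.1
    }


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.1f}")
```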

Robert




Re: [smartos-discuss] Myricom 10Gb ethernet card & jumbo frames

2017-08-07 Thread Robert Mustacchi
On 8/7/17 21:34 , John Croix wrote:
> I just downloaded the latest, greatest USB image and tried jumbo frames with 
> it. The Myricom card is now working again. Probably should have done that 
> first. It looks like the powerdown command turns off the computer again, too 
> (that had also stopped working). I’m now seeing a significant speed boost 
> when dealing with large files as a result.
> 
> Sorry for the false alarm. And you were right - all I had to do was play with 
> the config entry. I just seem to remember something different that I had to 
> do back when you were working on this commit. Either my memory was faulty or 
> that was an experiment prior to your commit to the release. Again, sorry for 
> the false alarm.

Just to confirm, the config entry here was creating a nic tag over the
device with a 9k mtu, right?

Thanks,
Robert




Re: [smartos-discuss] Myricom 10Gb ethernet card & jumbo frames

2017-08-07 Thread Robert Mustacchi
On 8/7/17 12:08 , John Croix wrote:
> Hi Robert,
> 
> A few years ago, you helped me with this very same problem :). Basically, 
> when the card is initialized, the MTU is set and cannot be reset. That meant 
> that you had to create some type of workaround due to the order in which 
> services were started on SmartOS. I wish I could find the e-mail you sent me, 
> but I don’t seem to have it anymore. I set a parameter in a file on the USB 
> key, and you looked at that parameter to set the MTU when you initialized the 
> card.
> 
> Apparently that process doesn’t work anymore. When I upgraded my SmartOS 
> distribution a few months ago, a bunch of the services didn’t come up. When I 
> checked into it, I found out that the Myricom card wasn’t coming up because 
> of the 9000 parameter. I eliminated the MTU parameter, letting it default 
> back to 1500, and the system booted up just fine.
> 
> When I get back home, I can retry the 9000 value to see what the exact error 
> message is. The stupid thing that I did, though, was to not make a note of 
> where that parameter was set. I thought that I had it in my notes, so when I 
> removed it, I didn’t bother writing anything down about what file I removed 
> it from or what the parameter name was. Now I’m not sure what I need to 
> enable to get the error back again. I don’t suppose that you might still have 
> a copy of that e-mail you sent me back in 2014 with the workaround that you 
> integrated into the release, do you?

Hi John,

What I recall implementing wasn't a workaround; I just added support for
setting the MTU through dladm set-linkprop rather than through the
driver.conf property. At least, this is what I recall:
https://github.com/joyent/illumos-joyent/commit/5f23582.

Presuming that the MTU is properly noted in the nic tag, this should
work. So it might be helpful to make sure that we have the nic tag's MTU
properly recorded and then figure out what's going on.
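One way to check what MTU a nic tag is actually recorded with in /usbkey/config is a small Python sketch like this. `nic_tags` is a hypothetical helper, not a SmartOS tool. It assumes the `<tag>_nic=` / `<tag>_mtu=` key naming seen in this thread (for example `ixgbe0_nic` and `ixgbe0_mtu`), one unquoted key=value per line, and `#` comment lines. nictagadm remains the supported way to manage tags.

```python
def nic_tags(config_text):
    """Group '<tag>_nic' and '<tag>_mtu' entries from a /usbkey/config-style
    file into {tag: {"nic": mac, "mtu": int or None}}.

    Assumptions: one key=value per line, '#' starts a comment line, and
    values are unquoted, as in the snippets quoted in this thread.
    """
    values = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()

    tags = {}
    for key, value in values.items():
        if key.endswith("_nic"):
            tag = key[: -len("_nic")]
            mtu = values.get(tag + "_mtu")
            tags[tag] = {"nic": value, "mtu": int(mtu) if mtu else None}
    return tags


if __name__ == "__main__":
    # The nic line from John's session, with the MTU line commented out.
    example = "ixgbe0_nic=00:1b:21:bc:51:7a\n#ixgbe0_mtu=9000\n"
    print(nic_tags(example))  # {'ixgbe0': {'nic': '00:1b:21:bc:51:7a', 'mtu': None}}
```

A tag that reports `None` here has no MTU recorded, so it would come up at the default of 1500.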

Robert




Re: [smartos-discuss] Myricom 10Gb ethernet card & jumbo frames

2017-08-07 Thread Robert Mustacchi
On 8/7/17 3:00 , Alex Kritikos wrote:
> I have a similar problem with solarflare nics where max MTU seems to be 1500 
> which seems to cause problems with triton overlay networks. 
Hi Alex,

From the driver, it appears that the max MTU is 9202 bytes. At least,
based on
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/io/sfxge/sfxge_gld_v3.c#1055.
Can you provide more detail about what's going wrong?

Thanks,
Robert




Re: [smartos-discuss] Myricom 10Gb ethernet card & jumbo frames

2017-08-07 Thread Robert Mustacchi
On 8/6/17 17:25 , John Croix wrote:
> A few months ago I ran into a problem booting SmartOS with my Myricom card 
> set up for jumbo frames (MTU of 9000). It was working for a long time but 
> stopped. I was wondering if anybody has a solution to the problem. My 
> short-term fix was to set the MTU to 1500 and move on, but I then forgot to 
> follow up with a bug report :). I’ve since gotten back to it because my 
> transmission rate on 10Gb optical is only about 3.4Gb with a 1500 MTU.

Can you provide more details about what specifically wasn't working?
Unfortunately, there isn't enough information here for us to be able to
suggest what to do next.

Thanks,
Robert




Re: [smartos-discuss] random host crashes during high proc count & load / lx zone

2017-08-05 Thread Robert Mustacchi
Hi Daniel,

I suspect that you're trying to say that you're seeing a large number of
crashes that seem to be related to the change below?

Do you have any crash dumps in /var/crash/volatile that you can share so
we can help debug this?

Thanks,
Robert

On 8/5/17 0:59 , Daniel Plominski wrote:
> https://github.com/illumos/illumos-gate/commit/b81db1e8f4fb4ce1e3bf7f8053643f62803cf4fe
> 
> https://us-east.manta.joyent.com/Joyent_Dev/public/builds/smartos/release-20170803-20170803T064301Z/smartos//changelog.txt
> 
> Kind regards,
> 
> *DANIEL PLOMINSKI*
> Head of IT
> Phone 09265 808-151  |  Mobile 0151 58026316  |  d...@ass.de
> PGP Key: http://pgp.ass.de/2B4EB20A.key
> 
> ASS-Einrichtungssysteme GmbH
> ASS-Adam-Stegner-Straße 19  |  D-96342 Stockheim
> Managing Directors: Matthias Stegner, Michael Stegner, Stefan Weiß
> Amtsgericht Coburg (Coburg Local Court) HRB 3395  |  VAT ID: DE218715721





Re: [smartos-discuss] Disk disappeared

2017-07-31 Thread Robert Mustacchi
On 7/31/17 8:07 , Nigel Magnay wrote:
> I have an el-cheapo dell desktop I'm using for the HN of a test Triton
> cluster.
> 
> It has 1x SSD and 1x HDD.
> 
> Recently, when booting, it was freezing with an error of
> pci@0,0 SYNCHRONIZE CACHE command failed
> 
> I have suspected a disk failure. However: if I remove the SSD, boot into
> Ubuntu using a USB key, I can not only see the HDD, but (nearly) import the
> pool into ZFS (it complains of a custom attribute so it can't).
> 
> Booting into the latest SmartOS image, using 'format' and the disk doesn't
> even appear.
> 
> So is there something else I can try?

I might start by checking fmadm faulty (see fmadm(1M) for more info) and
taking a look at the error log from fmdump(1M), since it's possible that
the fault management architecture (FMA) concluded something was wrong
with the device.

Robert




Re: [smartos-discuss] KVM live migration debug

2017-07-28 Thread Robert Mustacchi
On 7/21/17 8:33 , Ján Poctavek wrote:
> Hi,
> 
> as advised, I've changed the guest OS to SmartOS. I was able to migrate
> the OS to the second qemu. Before the migration, I've started debugger
> using "mdb -K". Now I have two VMs:
> 
> 1. The source VM running mdb.
> 2. The destination VM also running mdb without a problem.
> 
> When I resume the OS by exiting mdb in the source VM, everything is
> running fine.
> When I do the same in the second VM, the processes start to crash with
> SEGV.
> 
> The contents of the two VMs are expected to be absolutely identical...
> but apparently they are not. Anyway, with running mdb on both VMs, it
> seems as an ideal debug setup to me - just compare and find the
> difference. I believe it should be easy to look on the memory.
> 
> Unfortunately, even after 2 days spent reading various mdb manuals and
> handbooks, I don't know how to actually do it.
> 
> Can you please help me with that?

When this occurs, do you get a kernel crash dump?
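On comparing the two guests: if you can obtain raw images of guest memory from both VMs with the same layout (how to extract them is left open here), a page-by-page diff is one way to narrow down where the migrated state diverges. This is a minimal Python sketch, not an mdb feature; differing_pages is a hypothetical helper and the 4 KiB page size is an assumption matching the x86 base page size. Expect some noise, since pages the debugger itself touches will differ too.

```python
import sys

PAGE_SIZE = 4096  # assumed comparison granularity (x86 base page size)

def differing_pages(a: bytes, b: bytes, page_size: int = PAGE_SIZE) -> list[int]:
    """Return the byte offset of every page that differs between two raw
    memory images. If one image is longer, its extra pages count as different."""
    offsets = []
    for off in range(0, max(len(a), len(b)), page_size):
        if a[off:off + page_size] != b[off:off + page_size]:
            offsets.append(off)
    return offsets

# Hypothetical usage: python diffmem.py source.img destination.img
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        for off in differing_pages(f1.read(), f2.read()):
            print(f"page at offset {off:#x} differs")
```

Reading whole images into memory keeps the sketch short; for large guests you would want to compare the files in page-sized chunks instead.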

Thanks,
Robert




Re: [smartos-discuss] random host crashes during high proc count & load / lx zone

2017-07-19 Thread Robert Mustacchi
On 7/19/17 10:55 , Alex Kritikos wrote:
> Hi Robert,
> 
> The output is (truncated to only that NIC)
> 
> LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
> sfxge0   mtu       rw    1500   1500     1500
> sfxge1   mtu       rw    1500   1500     1500
> 
> 
> I am afraid I don’t have a special relationship with them, but let's first
> figure out what is going on.

OK, from looking at the driver source, I think this is a bug of sorts:
the driver doesn't actually report what its MTU range is, which would
explain why 1500 is the only possible value listed. The relevant source is
http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/io/sfxge/sfxge_gld_v3.c#1152
That's unfortunate, and we should get it fixed.

What happens if you run dladm set-linkprop -t -p mtu=9000 sfxge0?
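To see why that attempt may be refused, here is a rough Python sketch that reads the POSSIBLE column of dladm show-linkprop output and checks a candidate MTU against it. It assumes dladm prints a settable range as low-high (for example 576-9202) and a bare value when no range is reported; parse_possible and mtu_allowed are hypothetical helpers, not part of dladm.

```python
import re

def parse_possible(possible: str) -> list[tuple[int, int]]:
    """Turn a POSSIBLE field such as "1500" or "576-9202" into a list of
    inclusive (low, high) ranges. Assumes comma-separated entries."""
    ranges = []
    for part in possible.split(","):
        part = part.strip()
        m = re.fullmatch(r"([0-9]+)-([0-9]+)", part)
        if m:
            ranges.append((int(m.group(1)), int(m.group(2))))
        elif re.fullmatch(r"[0-9]+", part):
            ranges.append((int(part), int(part)))
    return ranges

def mtu_allowed(output: str, link: str, mtu: int) -> bool:
    """Return True if `mtu` falls within the POSSIBLE range that
    show-linkprop reports for `link`."""
    for line in output.splitlines():
        fields = line.split()
        # Assumed column order: LINK PROPERTY PERM VALUE DEFAULT POSSIBLE
        if len(fields) >= 6 and fields[0] == link and fields[1] == "mtu":
            return any(lo <= mtu <= hi for lo, hi in parse_possible(fields[5]))
    raise ValueError(f"no mtu row for {link!r}")

SAMPLE = """\
LINK     PROPERTY  PERM  VALUE  DEFAULT  POSSIBLE
sfxge0   mtu       rw    1500   1500     1500
"""

print(mtu_allowed(SAMPLE, "sfxge0", 9000))  # False: only 1500 is listed
```

If the driver reported a full range, the same check against a POSSIBLE value like 576-9202 would accept 9000.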

Robert




Re: [smartos-discuss] random host crashes during high proc count & load / lx zone

2017-07-19 Thread Robert Mustacchi
On 7/19/17 5:03 , Alex Kritikos wrote:
> Hello list,
> 
> I am running the latest release of SmartOS / Triton and I have a Solarflare
> NIC. While this is now correctly detected it seems to allow a max MTU of
> 1500. Looking at the latest Solarflare Solaris drivers it appears that the
> latest version allows a max MTU of 9000. I am trying to setup an overlay
> network for triton over that NIC so I am currently stuck.

Hi Alex,

What does dladm show-linkprop -p mtu show for that device? From my read
of the driver source code, the maximum MTU should be 9202 for the
Solarflare cards.

> Is there any plan to update the Solarflare NIC driver? Is it possible to
> do this myself somehow?

In this case, I think we should first better understand what's going on.
More generally, though, we're not in a position to update the driver, as it
was written by Solarflare and, at least at Joyent, we don't have the
documentation or specifications to do the work. If you have a
relationship with Solarflare, it may be worth talking to them about this,
though we're happy to help them (and other vendors) however we can.

Still, depending on what this turns out to be, we may be able to make
changes without that information.

Robert




Re: [smartos-discuss] Multi-processors and Vultr ... I'd love to get this sorted.

2017-07-14 Thread Robert Mustacchi
On 7/14/17 13:25 , David Preece wrote:
> Ahhh, it hasn't magically gone away. Vultr did something to their VM's about 
> ... a year back, something like that. Anyway, it stopped SmartOS from being 
> able to use more than one core. Booting from 20170706 (ISO) it says:
> 
> "NOTICE: System detected 2 cpus, but only 1 cpu(s) were enabled during boot.
> NOTICE: Use "boot-ncpus" parameter to enable more CPU(s). See eeprom(1M)."
> 
> dmesg says: https://pastebin.com/wWAtQsry
> 
> prtpicl says: https://pastebin.com/ruV0W6ZF
> 
> kstat cpu_info: https://pastebin.com/yKKLVkFi
> 
> kstat pg_hw_perf: https://pastebin.com/i2wjmJfn
> 
> psrinfo: "0   on-line   since 07/14/2017 19:37:53"
> 
> The error is produced from: 
> https://github.com/illumos/illumos-gate/blob/2428aad8462660fad2b105777063fea6f4192308/usr/src/uts/i86pc/os/mp_startup.c#L1586
> 
> eeprom boot-ncpus=2 ... does nothing

Hi David,

I have an idea of what this may be. Is it possible to get a dump of the
ACPI tables (see acpidump(1M) for more information) from the guest? That
would let me check a working theory.

Robert




Re: [smartos-discuss] Checking in on Ryzen

2017-07-13 Thread Robert Mustacchi
On 7/13/17 4:30 , Jorge Schrauwen wrote:
> Hi Robert,
> 
> Out of interest, by 'SmartOS runs well' do you mean all of SmartOS or
> everything except QEMU/KVM?

At the moment, that work is still in progress, so we may be able to
include KVM as well, but I can't say yet.

Robert




Re: [smartos-discuss] Checking in on Ryzen

2017-07-12 Thread Robert Mustacchi
On 7/12/17 20:28 , Patrick O'Sullivan via smartos-discuss wrote:
> I had seen some earlier issues with running SmartOS on AMD Ryzen. Has
> anyone gotten to what one might call success, or do others perhaps have
> plans for the server platform (Epyc)?

Hi Patrick,

Making sure that SmartOS runs well on the EPYC platform is definitely
something that we care about, and it's something we're working on in
the background.

Robert




Re: [smartos-discuss] KVM live migration debug

2017-07-12 Thread Robert Mustacchi
On 7/1/17 13:35 , Ján Poctavek wrote:
> Hi,
> 
> I'm trying to get a KVM/qemu live migration working on SmartOS. My
> starting point was the same problem as in this post:
> https://www.listbox.com/member/archive/184463/2012/04/sort/time_rev/page/2/entry/24:101/20120417112635:B4169A4C-88A1-11E1-9C88-F96B3BAD9C1B/
> 
> 
> I have dtraced the EINVALs and I have identified two problems -
> unimplemented ioctls: KVM_GET_IRQCHIP and KVM_GET_CLOCK.
> 
> The first one can be (at least temporarily) solved by adding
> "-no-kvm-irqchip" to qemu flags.
> 
> With the second one, I have implemented ioctl calls for KVM_GET_CLOCK
> and KVM_SET_CLOCK in the KVM kernel module.
> 
> After this, I am able to do migration without qemu complaining. More
> importantly, I am able to successfully migrate the VM in GRUB prompt
> (using "migrate" qemu command).
> 
> But when migrating linux (booted into the initrd target for simplicity),
> it panics after pressing "enter" in the console:
> 
> [   28.337953] double fault:  [#1] SMP
> [   28.337953] Modules linked in: ext4 mbcache jbd2 sd_mod crc_t10dif
> sr_mod cdrom crct10dif_generic crct10dif_common ata_generic pata_acpi
> ata_piix serio_raw libata floppy
> [   28.337953] CPU: 0 PID: 195 Comm: sh Not tainted
> 3.10.0-514.16.1.el7.x86_64 #1
> [   28.337953] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> [   28.337953] task: 88001f46 ti: 88001f74c000 task.ti:
> 88001f74c000
> [   28.337953] RIP: 0010:[] []
> do_page_fault+0xb/0x90
> [   28.337953] RSP: 0008:7ffc606b9000  EFLAGS: 00010097
> [   28.337953] RAX: 8168e8ec RBX: 0001 RCX:
> 8168e8ec
> [   28.337953] RDX: cdc0 RSI:  RDI:
> 7ffc606b9018
> [   28.337953] RBP: 7ffc606b9008 R08: 000a R09:
> 7f7a161bf740
> [   28.337953] R10: 0008 R11: 0246 R12:
> 
> [   28.337953] R13:  R14: 0002 R15:
> 7ffc606ba860
> [   28.337953] FS:  7f7a161bf740() GS:88001fc0()
> knlGS:
> [   28.337953] CS:  0010 DS:  ES:  CR0: 8005003b
> [   28.337953] CR2: 7ffc606b8ff8 CR3: 1f71a000 CR4:
> 06f0
> [   28.337953] DR0:  DR1:  DR2:
> 
> [   28.337953] DR3:  DR6: 0ff0 DR7:
> 0400
> [   28.337953] Stack:
> [   28.337953]   7ffc606b90f8 8168eb88
> 7ffc606ba860
> [   28.337953]  0002  
> 7ffc606b90f8
> [   28.337953]  7ffc606b9108 0246 0008
> 7f7a161bf740
> [   28.337953] Call Trace:
> [   28.337953] Code: 89 de 4c 89 ef e8 7c ca fe ff e9 5c fd ff ff 31 c0
> e9 01 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5
> 41 55 <41> 54 49 89 f4 53 48 89 fb 48 83 ec 08 0f 20 d0 0f 1f 40 00 0f
> [   28.337953] RIP  [] do_page_fault+0xb/0x90
> [   28.337953]  RSP <7ffc606b9000>
> [   28.337953] ---[ end trace b556ad308185dda4 ]---
> [   28.337953] Kernel panic - not syncing: Fatal exception
> 
> Can somebody give me a hint how can I debug this?

Hi Ján,

Unfortunately, I'm not very familiar with the internals of Linux. I'd
suggest testing the migration with a SmartOS guest instead, only
because it should be easier for us to get a crash dump, look at it,
and debug.

I suspect that some part of the migrated state isn't being correctly
saved and/or restored, especially given that we never really focused on
migration during the original bring-up.

Sorry I don't have a more actionable next step for you.

Robert




Re: [smartos-discuss] vmadm and overlays

2017-06-15 Thread Robert Mustacchi
On 6/8/17 13:29 , Ján Poctavek wrote:
> Hi Robert,
> 
> thank you for the clarification. Now, with all bits together, it seems
> quite logical. Some additional comments in-line:
> 
> 
> On 8. 6. 2017 2:40, Robert Mustacchi wrote:
>> We never really figured out what a good interface for this would be for
>> normal SmartOS, so you've mostly found all the bits in Triton that
>> automate it there and make more sense there. I'll try and explain some
>> of the things you saw, but if you have ideas on what might make sense,
>> that'd be useful.
> 
> The hardcoded location of overlay_rules.json that has to reside on the
> non-permanent storage is a bit weird. One has to create a service early
> in the boot list (definitely before starting zones) just to put the file
> in place. Maybe if there was some alternative (permanent) location that
> can be looked up if not found in the first location. It also could be
> loaded into /run at boot e.g. from /opt/custom/overlays.
> 
> Or, alternatively, dladm create-overlay could have some "persist" flag
> that can add new overlay into the overlay_rules.json

Yeah, if we could do some things over, we might instead have done a
better job of dealing with /etc/path_to_inst and added the persistence
that nictags represent as permanent dladm devices. I think realistically
we'd want to look at ways of providing that persistence. That file was
really only intended to bootstrap the nictagadm rules at boot time. We
should probably look at better ways to handle this for the general
SmartOS case, potentially allowing the rules to live in the config file.

Robert




Re: [smartos-discuss] vmadm and overlays

2017-06-07 Thread Robert Mustacchi
On 6/7/17 14:59 , Ján Poctavek wrote:
> Okay, seems I've figured it out. But it was not an easy one. The
> documentation about this is nonexistent.

Hi Ján,

We never really figured out what a good interface for this would be for
normal SmartOS, so what you've found are mostly the bits that Triton uses
to automate this, which make more sense in that context. I'll try to
explain some of the things you saw, but if you have ideas on what might
make sense, that'd be useful.

> When using overlays with VMs, the overlays are created automatically and
> they don't need to be created by dladm command.

This was the design center for Triton, where we're not using direct
tunnels; instead we effectively have an overlay per customer, so they're
much more dynamic.

> - note that overlays as nic_tag need to be referenced by name and a
> (random) number after slash

The number in this case was designed to be a VXLAN (or other
encapsulation protocol) identifier.
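A small Python sketch of that naming convention, assuming the tag is exactly <overlay name>/<number> and that, for VXLAN, the number has to fit in the protocol's 24-bit network identifier. parse_overlay_nic_tag and the example name myoverlay are hypothetical, not part of vmadm or nictagadm.

```python
import re

VNI_MAX = 2**24 - 1  # VXLAN network identifiers are 24 bits (RFC 7348)

def parse_overlay_nic_tag(tag: str) -> tuple[str, int]:
    """Split an overlay nic_tag of the form "<overlay>/<id>" and check that
    the id is a valid 24-bit VXLAN network identifier."""
    m = re.fullmatch(r"([^/]+)/([0-9]+)", tag)
    if m is None:
        raise ValueError(f"expected <overlay>/<id>, got {tag!r}")
    vni = int(m.group(2))
    if vni > VNI_MAX:
        raise ValueError(f"id {vni} does not fit in a 24-bit VNI")
    return m.group(1), vni

print(parse_overlay_nic_tag("myoverlay/1234"))  # ('myoverlay', 1234)
```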

> - and also note that "-s direct" overlay can be created only between two
> (and no more) SmartOS servers

Right, it's really designed for point-to-point connections. There's an
open RFE about making sure you can use multicast groups, but beyond
that, folks will need to look at other search plugins for overlays to
plug into other directory services.

Hopefully that helps clarify things a bit; sorry this was rather painful.

Robert




Re: [smartos-discuss] Get global zone ID inside non global zone

2017-06-05 Thread Robert Mustacchi
On 6/5/17 13:08 , Peter Toth wrote:
> Hi,
> 
> Is there a way to get the UUID of the compute node from a zone?

Take a look at mdata-get and the committed metadata keys here:

https://eng.joyent.com/mdata/datadict.html

Robert




Re: [smartos-discuss] Linux VM set additional route

2017-06-05 Thread Robert Mustacchi
On 6/4/17 13:28 , the outsider wrote:
> I have a VM that has two nics. 
> 
>  
> 
> Nic 1 is primary and has the default gateway for most routes
> 
> But nic2 needs to be gateway for a special subnet that is not reachable via
> nic 1.
> 
>  
> 
> I tried to set the route via the standard linux way, but that is blocked. 
> 
> route add -net 192.168.132.0/24 gw 192.168.141.1
> 
> SIOCADDRT: Operation not supported
> 
>  
> 
> But how can I get this route to be statically added to nic1 ? 

You should be able to use vmadm to deal with this. It looks like the
routes property is updatable via the set_routes update key. See the
update section of vmadm(1M) for more information.
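For the route in question, the update might look something like this Python sketch, which builds the JSON payload to pipe into vmadm update. It assumes set_routes takes an object mapping each destination to a gateway address, like the routes property in vmadm(1M); set_routes_payload is a hypothetical helper and only accepts literal IP gateways. Whether a running guest picks up the change right away may depend on the brand and guest tooling.

```python
import ipaddress
import json

def set_routes_payload(routes: dict[str, str]) -> str:
    """Build a vmadm update payload that maps each destination (a subnet
    or host address) to a gateway IP, validating both first."""
    for dest, gw in routes.items():
        ipaddress.ip_network(dest)  # raises ValueError for a bad destination
        ipaddress.ip_address(gw)    # raises ValueError for a bad gateway
    return json.dumps({"set_routes": routes})

payload = set_routes_payload({"192.168.132.0/24": "192.168.141.1"})
print(payload)
# In the global zone, something like:
#   echo '<payload>' | vmadm update <vm-uuid>
```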

Robert




Re: [smartos-discuss] https://www.phoronix-test-suite.com/

2017-05-25 Thread Robert Mustacchi
On 5/25/17 15:23 , apg wrote:
> Has anyone run the Phoronix Test Suite on smartos? I've tried it in an 
> lx-zone:
>  "image_uuid": "68a837fe-1b9b-11e7-a66d-ab7961786c42"
> 
> and in a joyent smartos zone:
>  "image_uuid": "f6acf198-2037-11e7-8863-8fdd4ce58b6a",
> 
> - SmartOS (build: 20170413T062226Z)
> 
> One of the few tests that would run on solaris is the pts/apache test:
> 
> ./phoronix-test-suite benchmark pts/apache
> 
> I've tried it on two generations of servers, and compared the results to
> similar hardware running esx with a full blown vm with centos 6 installed. The
> results for smartos are awful. An average: 1749.53 Requests Per Second for
> the smartos zone -vs- 23425.61 Requests Per Second for the full vm (centos 6)
> on an esx host. The performance of the lx-zone was very similar to the
> smartos zone.
> 
> On the pts/apache test, the host on esx ran the test in 3 minutes,
> approximately; the smartos zone, and the lx-zone were closer to 30 minutes.

In general, I'd start with a basic USE method analysis of the system
and check how much CPU, etc. you've allocated versus what the systems
in question are trying to use.
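As a rough illustration of the USE method (for every resource, check its Utilization, Saturation, and Errors), the checklist might be expressed like this Python sketch. The Resource fields, the 70% threshold, and the sample numbers are illustrative assumptions; in practice the values would come from tools such as prstat, mpstat, vmstat, and iostat. A zone's cpu_cap is a percentage of a single CPU (100 means one full CPU), so CPU utilization is worth measuring against the cap rather than the whole machine.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    utilization: float  # fraction of capacity in use, 0.0 to 1.0
    saturation: float   # queued work, e.g. threads waiting to run; 0 means none
    errors: int         # errors seen during the sample interval

def cap_utilization(cpu_used_pct: float, cpu_cap: float) -> float:
    """CPU utilization relative to a zone's cap, where both values are
    percentages of one CPU (cap 100 = one full CPU)."""
    return cpu_used_pct / cpu_cap

def use_check(resources: list[Resource], util_threshold: float = 0.7) -> list[str]:
    """Report every Utilization, Saturation, or Errors symptom found.
    util_threshold is an arbitrary, illustrative cut-off."""
    findings = []
    for r in resources:
        if r.utilization >= util_threshold:
            findings.append(f"{r.name}: high utilization ({r.utilization:.0%})")
        if r.saturation > 0:
            findings.append(f"{r.name}: saturated ({r.saturation:g})")
        if r.errors > 0:
            findings.append(f"{r.name}: {r.errors} errors")
    return findings

# Hypothetical numbers: a zone using 190% of a 200% cap, with threads queued.
cpu = Resource("cpu", cap_utilization(190, 200), 3.0, 0)
disk = Resource("disk", 0.2, 0.0, 0)
for finding in use_check([cpu, disk]):
    print(finding)
```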

Robert




Re: [smartos-discuss] stalled / hanging connections with tagged interfaces

2017-05-22 Thread Robert Mustacchi
On 5/19/17 16:58 , apg wrote:
> On Fri, May 19, 2017 at 11:59:56AM -0700, apg wrote:
>> Hello,
>>
>> I've installed SmartOS 20170413T062226Z on a blade (PowerEdgeM610). Vlan
>> tagged interfaces come up just fine, can ping, ssh, etc, but they cannot
>> transfer any kind of data to it. rsync, scp, curl, transfers to the smartos
>> host all hang, choke, puke, die.
>>
>> Untagged vlan traffic works just fine.
>>
>> Broadcom nics. prtdiag:
>> 1   in use   PCI Exp. Gen 2   MEZZ1_FAB_C , Broadcom Limited NetXtreme II BCM5709S Gigabit Ethernet (bnx)
>> 3   in use   PCI Exp. Gen 2   MEZZ2_FAB_B , Broadcom Limited NetXtreme II BCM5709S Gigabit Ethernet (bnx)
>>
>> Has anyone had similar experience?
> 
> I installed centos 6 on the same blade, used the same settings on the switch,
> same (similar) commands on nics, ran the same commands, it worked just fine.
> 
> here's the net parts of /usbkey/config:
> admin_nic=00:10:18:59:24:78
> admin_ip=10.10.0.238
> admin_netmask=255.255.254.0
> admin_network=
> admin_gateway=10.10.1.254
> 
> v803pvt_nic=00:23:AE:FD:A3:F8
> v803pvt0_ip=10.3.1.211
> v803pvt0_netmask=255.255.254.0
> v803pvt0_vlan_id=803
> 
> Driver issue?

I suspect that this may be https://www.illumos.org/issues/4175.

Robert





Re: [smartos-discuss] zone stuck in "down" state after shutdown: unable to unmount lofs filesystem

2017-05-18 Thread Robert Mustacchi
On 5/18/17 0:36 , Matthew Parsons wrote:
> zone1 is a simple (native) fileserver, serving up a zfs filesystem (let's
> call it "TheFiles") mounted via lofs.
> 
> zone2 is an LX container for running backup software, accessing the same
> filesystem via lofs in read-only mode.
> 
> (It's quite possible that last part is very inadvisable, known to be a Bad
> Idea, etc. This was a quick proof-of-concept, was going to switch the
> backup server to access a snapshot...)
> 
> Anyway, attempting to stop zone1 just halted and never timed out. State is
> now "down".
>  dmesg shows zoneadmd repeatedly failing:
> 
> "unable to unmount '/zones/UUDI/root/mnt/TheFiles', retrying in 2 seconds...
> ..."unmount of '/zones/UUDI/root/mnt/TheFiles' failed
> ...unable to unmount file systems in zone
> ...unable to destroy zone
> 
> Note that while the process on the backup zone appeared to hiccup a bit,
> it's resumed just fine.
> 
> So 1: Is this somewhat expected behavior, undefined, or a surprise?
> 2: Any recommendations on attempts to get zone back to a bootable "stopped"
> state? Would prefer to not have to reboot host.

When you tried to destroy zone1, did zone2 still have the lofs mount active?

Robert





Re: [smartos-discuss] (Possibly OT) Joyent pkgsrc

2017-05-18 Thread Robert Mustacchi
On 5/17/17 23:01 , Matthew Parsons wrote:
> (Usual disclaimers, but I've scoured pkgsrc.joyent.com for info, so
> possibly could use some fixing)
> 
> Is this the best place to report issues/requests with your packages?

While Filip is already digging into the specifics of this case, the best
place to file bugs against the packages, or requests for missing ones, is
usually https://github.com/joyent/pkgsrc/issues.

Thank you for taking the time to dig into this stuff.

Robert




Re: [smartos-discuss] Server reboot related to persistent lwp-related bug?

2017-05-15 Thread Robert Mustacchi
On 5/14/17 3:43 , Adam Richmond-Gordon wrote:
> That said, I’m sure somebody at Joyent would still like to see the crash dump.

I'd agree that it's worth filing a bug on GitHub with the stack trace.
We'll get someone to take a look and determine whether this is a
known issue. Someone will likely follow up with a way to transfer
the dump for us to investigate.

Thanks,
Robert





Re: [smartos-discuss] USB to RS232 Serial adapters

2017-05-12 Thread Robert Mustacchi
On 5/10/17 12:44 , Jeff Goeke-Smith wrote:
> On Wed, May 10, 2017 at 2:07 PM Robert Mustacchi <r...@joyent.com> wrote:
> 
>> Great, thanks. That helps. The next thing to figure out is whether or
>> not we're reaching the ftdi driver or not when we're failing to connect
>> these other ports. To do that, could you make sure that the USB device
>> is unplugged and then run:
>>
>> dtrace -n 'fbt::usbser_attach:return{ trace(arg1); stack(); }'
>>
>> After which, plug in the device.
>>
>>
> Unplugged device.
> Executed the dtrace line as stated, redirected to file.
> Attached device, waited for the device to settle (takes about 20 seconds.)
>  The device has blinken lights on the front for each port.  Did its normal
> pattern of visually blinking the first 9 ports once, and then doing a
> triple blink of the last 7 ports.  For comparison, when attaching to a
> windows box, it blinks the lights in order, rapidly, over about 1 second
> total.
> Ended dtrace.
> 
> Wrote this email and attached the dtrace.  Hoping it makes it through the
> list engine. Added Robert as a direct recipient just to make sure.

Hi Jeff,

Everything made it through just fine. I put together another script to
gather more information. Can you use the same procedure? The script is
available here:

https://us-east.manta.joyent.com/rmustacc/public/tmp/uftdi.d

If you save it to /var/tmp/uftdi.d, you'd run this as dtrace -s
/var/tmp/uftdi.d -o /path/to/output/file.

It may be worth filing a bug on github.com/joyent/smartos-live/issues,
so we can better track this outside of e-mail.

Thanks,
Robert





Re: [smartos-discuss] USB to RS232 Serial adapters

2017-05-10 Thread Robert Mustacchi
On 5/9/17 11:06 , Jeff Goeke-Smith wrote:
>>> Maybe that helps?  Suggestions?
>>
>> Hi Jeff,
>>
>> Based on some of the information you've provided, we could probably help
>> debug what's going on here, depending on your level of interest and work
>> to understanding where something's going wrong.
>>
>> Presuming that you'll have this set up for a little while, we can try
>> and figure out where exactly we're failing to connect some of these
>> devices. Is that something that you'd be interested in? I can't promise
>> it'll be the fastest path to getting you righted, but if you're
>> interested, and are willing to accept that it may take a bit of time
>> between having things to ask, I'd be happy to provide you with some next
>> steps in terms of what to look at.
>>
>> Robert
>>
> 
> 
> Hi Robert,
> 
> Yes, I still have this setup and would be interested in debugging it. Slow
> and steady wins the bug race?

Great, thanks. That helps. The next thing to figure out is whether
we're reaching the ftdi driver at all when we fail to connect these
other ports. To do that, could you make sure that the USB device
is unplugged and then run:

dtrace -n 'fbt::usbser_attach:return{ trace(arg1); stack(); }'

After which, plug in the device.

Thanks,
Robert




Re: [smartos-discuss] USB to RS232 Serial adapters

2017-05-09 Thread Robert Mustacchi
On 5/9/17 5:31 , cristian pancià wrote:
> This is something I'm interested too,i mean the way to debug and trying to
> work it out ,any docs and info to similar probs would be very informative
> thanks a lot

Unfortunately, I don't have any good documentation on debugging this.
The best starting place is the data from mdb -ke '::prtusb' and then
from there, I usually put together some DTrace scripts to try and figure
out what exactly is going on.

Robert


Re: [smartos-discuss] USB to RS232 Serial adapters

2017-05-08 Thread Robert Mustacchi
On 5/5/17 12:51 , Jeff Goeke-Smith wrote:
> On Fri, May 5, 2017 at 12:49 PM Jason King 
> wrote:
> 
>> Might as well shoot off an email with whatever diagnostic messages you
>> have and the version of SmartOS you’re running.  While no guarantees
>> someone will be able to help, it can’t hurt either.
>>
> 
> 
> As suggested, here's what I'm seeing.
> 
> SmartOS version:
> [root@headnode (us-elns-workshop) /kernel/drv]# uname -a
> SunOS headnode 5.11 joyent_20170413T062134Z i86pc i386 i86pc
> 
> dmesg during the usb attach:
> 2017-05-04T21:14:40.717522+00:00 headnode usba: [ID 912658 kern.info] USB
> 2.0 device (usb1a40,101) operating at hi speed (USB 2.x) on USB 2.0
> external hub: hub@2, hubd3 at bus address 6
> 2017-05-04T21:14:40.717587+00:00 headnode usba: [ID 349649 kern.info] USB
> 2.0 Hub [MTT]
> 2017-05-04T21:14:40.717598+00:00 headnode genunix: [ID 936769 kern.info]
> hubd3 is /pci@0,0/pci1028,4fe@1a/hub@1/hub@2
> 2017-05-04T21:14:40.717610+00:00 headnode genunix: [ID 408114 kern.info]
> /pci@0,0/pci1028,4fe@1a/hub@1/hub@2 (hubd3) online
> 2017-05-04T21:14:41.853146+00:00 headnode usba: [ID 912658 kern.info] USB
> 2.0 device (usb403,6001) operating at full speed (USB 1.x) on USB 2.0
> external hub: device@1, usbftdi0 at bus address 7
> 2017-05-04T21:14:41.853183+00:00 headnode usba: [ID 349649 kern.info] FTDI
> FT232R USB UART ST202314
> 2017-05-04T21:14:41.853195+00:00 headnode genunix: [ID 936769 kern.info]
> usbftdi0 is /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/device@1
> 2017-05-04T21:14:41.853209+00:00 headnode genunix: [ID 408114 kern.info]
> /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/device@1 (usbftdi0) online
> 2017-05-04T21:14:41.856926+00:00 headnode usba: [ID 912658 kern.info] USB
> 2.0 device (usb403,6001) operating at full speed (USB 1.x) on USB 2.0
> external hub: device@2, usbftdi1 at bus address 8
> 2017-05-04T21:14:41.856957+00:00 headnode usba: [ID 349649 kern.info] FTDI
> FT232R USB UART ST203313
> 2017-05-04T21:14:41.856967+00:00 headnode genunix: [ID 936769 kern.info]
> usbftdi1 is /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/device@2
> 2017-05-04T21:14:41.856982+00:00 headnode genunix: [ID 408114 kern.info]
> /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/device@2 (usbftdi1) online
> 2017-05-04T21:14:41.983547+00:00 headnode usba: [ID 912658 kern.info] USB
> 2.0 device (usb1a40,201) operating at hi speed (USB 2.x) on USB 2.0
> external hub: hub@3, hubd4 at bus address 9
> 2017-05-04T21:14:41.983585+00:00 headnode usba: [ID 349649 kern.info] USB
> 2.0 Hub [MTT]
> 2017-05-04T21:14:41.983593+00:00 headnode genunix: [ID 936769 kern.info]
> hubd4 is /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/hub@3
> 2017-05-04T21:14:41.983604+00:00 headnode genunix: [ID 408114 kern.info]
> /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/hub@3 (hubd4) online
> 2017-05-04T21:14:42.109639+00:00 headnode usba: [ID 912658 kern.info] USB
> 2.0 device (usb1a40,201) operating at hi speed (USB 2.x) on USB 2.0
> external hub: hub@4, hubd5 at bus address 10
> 2017-05-04T21:14:42.109670+00:00 headnode usba: [ID 349649 kern.info] USB
> 2.0 Hub [MTT]
> 2017-05-04T21:14:42.109679+00:00 headnode genunix: [ID 936769 kern.info]
> hubd5 is /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/hub@4
> 2017-05-04T21:14:42.109693+00:00 headnode genunix: [ID 408114 kern.info]
> /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/hub@4 (hubd5) online
> 2017-05-04T21:14:44.040369+00:00 headnode usba: [ID 912658 kern.info] USB
> 2.0 device (usb403,6001) operating at full speed (USB 1.x) on USB 2.0
> external hub: device@1, usbftdi2 at bus address 11
> 2017-05-04T21:14:44.040426+00:00 headnode usba: [ID 349649 kern.info] FTDI
> FT232R USB UART ST203316
> 2017-05-04T21:14:44.040437+00:00 headnode genunix: [ID 936769 kern.info]
> usbftdi2 is /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/hub@3/device@1
> 2017-05-04T21:14:44.040456+00:00 headnode genunix: [ID 408114 kern.info]
> /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/hub@3/device@1 (usbftdi2) online
> 2017-05-04T21:14:44.044848+00:00 headnode usba: [ID 912658 kern.info] USB
> 2.0 device (usb403,6001) operating at full speed (USB 1.x) on USB 2.0
> external hub: device@2, usbftdi3 at bus address 12
> 2017-05-04T21:14:44.044878+00:00 headnode usba: [ID 349649 kern.info] FTDI
> FT232R USB UART ST203315
> 2017-05-04T21:14:44.044888+00:00 headnode genunix: [ID 936769 kern.info]
> usbftdi3 is /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/hub@3/device@2
> 2017-05-04T21:14:44.044904+00:00 headnode genunix: [ID 408114 kern.info]
> /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/hub@3/device@2 (usbftdi3) online
> 2017-05-04T21:14:47.907735+00:00 headnode usba: [ID 691482 kern.warning]
> WARNING: /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/hub@4 (hubd5): Connecting
> device on port 1 failed
> 2017-05-04T21:14:51.747792+00:00 headnode usba: [ID 691482 kern.warning]
> WARNING: /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/hub@4 (hubd5): Connecting
> device on port 2 failed
> 2017-05-04T21:14:55.587838+00:00 headnode usba: [ID 691482 kern.warning]
> WARNING: /pci@0,0/pci1028,4fe@1a/hub@1/hub@2/hub@4 

Re: [smartos-discuss] SmartOS PXE Boot with LACP Trunk

2017-04-26 Thread Robert Mustacchi
On 4/26/17 0:34 , Tamás Gérczei wrote:
> Hello,
> 
> I solved the same problem by using iPXE as the PXE implementation instead of
> the one shipped in the ROM.

In general, most switches have an LACP fallback option for exactly this
purpose: while the host is booting and not yet speaking LACP, the port
falls back to a plain static link. I'm not sure of the exact option
name, but I suspect that the switches you're using have a similar option.

Robert

> On 2017-04-26 09:07, Joven Sabanal wrote:
>> Hi all,
>>
>> I set up a PXE server to boot SmartOS. I have servers on which link
>> aggregation is enabled, and static LACP is enabled on the switch side. But
>> when a server reboots and needs to load the PXE file, it fails with an error:
>>
>> [screenshot not preserved]
>>
>> Server Model :
>> Network Switch Model : HP 2920-48G Switch (J9728A)
>>
>> This happens when 2 links are active. Whenever I disable or unplug 1 link,
>> loading the PXE file succeeds and SmartOS boots OK.
>>
>>
>> Any advice is much appreciated.
>> Thanks in advance.
>>
>>
>> Regards,
>>
>>
>> Joven D.
>>
>>
> 



---
smartos-discuss
Archives: https://www.listbox.com/member/archive/184463/=now
RSS Feed: https://www.listbox.com/member/archive/rss/184463/25769125-55cfbc00
Modify Your Subscription: 
https://www.listbox.com/member/?member_id=25769125_secret=25769125-7688e9fb
Powered by Listbox: http://www.listbox.com


Re: [smartos-discuss] build: 00000000T000000Z

2017-04-07 Thread Robert Mustacchi
On 4/6/17 11:43 , Youzhong Yang wrote:
> Hi,
> 
> I noticed something strange today:
> 
> # ssh host uname -a
> SunOS batfs9920 5.11 joyent_20170406T161321Z i86pc i386 i86pc
> 
> # ssh host
> - SmartOS (build: 00000000T000000Z)
> 
> By the way, I built the image today after merging latest stuff from
> illumos-joyent, smartos-live etc.


Hi Youzhong,

Thanks for reporting this. This is a cascading failure; its root cause
should be addressed by OS-6041, which I just pushed:
https://github.com/joyent/illumos-joyent/commit/4dc16d22d09463c940a8c3315c7a1f9bf74eba88.

The long and short of it is that a regression caused us to stop
building the native-man that we use while customizing the live image.
That step failed, but the failure went unnoticed by the broader
live-image build because some older scripts didn't check for it. We're
working on shoring that up in parallel, but this should at least address
things in the interim.
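A cheap guard against this class of silent failure is a sanity check on the build stamp after booting a freshly built image. A minimal sketch, assuming the stamp format shown in the `uname -a` output above (where `uname -v` prints the `joyent_...` version field); `valid_buildstamp` is a hypothetical helper and not part of the smartos-live tooling:

```sh
#!/bin/sh
# valid_buildstamp: hypothetical post-build check, not part of smartos-live.
# Succeeds only for a stamp shaped like YYYYMMDDTHHMMSSZ (optionally with the
# "joyent_" prefix that uname -v prints) and rejects the all-zero stamp
# from this report.
valid_buildstamp() {
    stamp=${1#joyent_}
    case $stamp in
        00000000T000000Z)
            return 1 ;;
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]T[0-9][0-9][0-9][0-9][0-9][0-9]Z)
            return 0 ;;
        *)
            return 1 ;;
    esac
}
```

Usage: `valid_buildstamp "$(uname -v)" || echo "suspicious build stamp: $(uname -v)" >&2`.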

Robert





Re: [smartos-discuss] inotify support in LX zones?

2017-04-03 Thread Robert Mustacchi
On 4/3/17 14:40 , Jason Lawrence wrote:
> Are there any filesystem limitations? For example, should it work fine
> with lofs mounts?

I'm not personally sure. Someone else on the list might know. I'd
suggest testing what you want to do.

Robert

> On Mon, Apr 3, 2017, at 04:33 PM, Robert Mustacchi wrote:
>> On 4/3/17 13:57 , Jason Lawrence wrote:
>>> From various discussions I've found, it seems like inotify support is
>>> available on SmartOS. Is this also true within LX zones (ie, Ubuntu
>>> 16.04)?
>>
>> Hi Jason,
>>
>> inotify support was implemented for lx, so yes, it should be present
>> there.
>>
>> Robert
>>
> 
> 




Re: [smartos-discuss] sysinfo script modification question

2017-04-03 Thread Robert Mustacchi
On 4/3/17 0:22 , 강경원 wrote:
> Hello.
> 
> We are testing SDC with same SMBIOS uuid servers.

We recommend that you talk to your hardware vendor and have them provide
tooling to fix the servers' UUIDs. If multiple servers share the same
UUID, the vendor has not properly implemented the SMBIOS spec (though
it's far from the first time we've heard of this).

> So we tried to modify the image's sysinfo script to test, and after modifying
> sysinfo, the fake uuid can be created successfully and the node can be set up.
>
> But when we try to reboot the node, the error message below is shown and
> rebooting does not work.
> 
> The only thing that we can do is ipmi power reset.
> 
> How can we avoid the errors?
> 
>   svc.startd: Killing user processes.
> 
> WARNING: Error writing ufs log state
> WARNING: ufs log for /usr changed state to Error
> WARNING: Please umount(1M) /usr and run fsck(1M)

Given how little information we have to work with, I'd suggest reviewing
the procedure you used to build and modify the live image, in particular
how you replaced sysinfo with your custom version. Without knowing what
you did or didn't do, or how you did it, it's hard to suggest actionable
next steps.

Robert




Re: [smartos-discuss] inotify support in LX zones?

2017-04-03 Thread Robert Mustacchi
On 4/3/17 13:57 , Jason Lawrence wrote:
> From various discussions I've found, it seems like inotify support is
> available on SmartOS. Is this also true within LX zones (ie, Ubuntu
> 16.04)?

Hi Jason,

inotify support was implemented for lx, so yes, it should be present there.

Robert





Re: [smartos-discuss] virtio-rng

2017-03-23 Thread Robert Mustacchi
On 3/23/17 9:23 , Michele Codutti via smartos-discuss wrote:
> Hi all. Recently I noticed that the Tomcat web server had long startup times
> when it runs inside a KVM Linux machine.
> It seems that the problem is that /dev/random produces
> entropy very slowly.
> I have found two solutions to this problem:
> 1. Configure Tomcat to use /dev/urandom
> 2. Use the virtio-rng paravirtual device (if it is implemented in the KVM
> port in SmartOS).
>
> The first solution is quick and dirty.
> The second seems more robust but I need to configure a KVM machine with that
> paravirtual device.
> I have not found any documentation about that topic in the (SmartOS) wiki.
> Googling was not useful.
> Can someone give me at least some directions?

There's no support for the virtio-rng paravirtualized device today.
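Without virtio-rng, the first option from the question is the practical one. A minimal sketch, assuming a standard Tomcat layout inside the Linux guest where `catalina.sh` sources `bin/setenv.sh` at startup; `java.security.egd` is a standard JVM system property, but how the JDK treats `/dev/./urandom` varies by version, and `add_urandom_opt` is a hypothetical helper:

```sh
#!/bin/sh
# Append the urandom option to Tomcat's setenv.sh exactly once (idempotent).
# $1 is the Tomcat install directory (e.g. the value of CATALINA_HOME).
add_urandom_opt() {
    setenv="$1/bin/setenv.sh"
    opt='-Djava.security.egd=file:/dev/./urandom'
    mkdir -p "$1/bin" || return 1
    if [ -f "$setenv" ] && grep -qF -- "$opt" "$setenv"; then
        return 0    # already configured
    fi
    # Single quotes keep $CATALINA_OPTS literal so it expands at Tomcat startup.
    printf 'CATALINA_OPTS="$CATALINA_OPTS %s"\n' "$opt" >> "$setenv"
}
```

Usage: `add_urandom_opt "$CATALINA_HOME"`, then restart Tomcat.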

Robert





Re: [smartos-discuss] Issue with host service varpd

2017-03-22 Thread Robert Mustacchi
On 3/22/17 9:36 , Mark Creamer wrote:
> Robert, the primary VM having an issue is a database server (MySQL). The
> service will only go from disabled to offline. I think there's a dependency
> issue but don't know where to start or if this is related to the varpd
> issue in the GZ. svcs on the host returns these services offline:
> offline        12:14:52 svc:/milestone/network:default
> offline        12:14:52 svc:/milestone/single-user:default
> offline        12:14:52 svc:/system/filesystem/local:default
> offline        12:14:53 svc:/system/sysidtool:net
> offline        12:14:53 svc:/network/initial:default
> offline        12:14:53 svc:/system/sysidtool:system
> offline        12:14:53 svc:/milestone/sysconfig:default
> offline        12:14:53 svc:/network/service:default
> offline        12:14:53 svc:/network/dns/client:default
> offline        12:14:53 svc:/milestone/name-services:default
> offline        12:14:53 svc:/network/inetd:default
> offline        12:14:53 svc:/system/system-log:default
> offline        12:14:53 svc:/system/utmp:default
> offline        12:14:53 svc:/system/cron:default
> offline        12:14:53 svc:/milestone/multi-user:default
> offline        12:14:53 svc:/system/console-login:default
> offline        12:14:54 svc:/milestone/multi-user-server:default
> offline        12:14:54 svc:/network/ssh:default
> offline        12:14:54 svc:/network/shares/group:default
> offline        12:14:54 svc:/network/shares/group:zfs
> offline        12:14:54 svc:/system/sac:default
> offline        12:14:54 svc:/network/netmask:default
> offline        12:14:54 svc:/smartdc/mdata:execute
> offline        12:14:54 svc:/zabbix/agent:default
> offline*       12:16:07 svc:/network/routing-setup:default
> offline        12:33:24 svc:/pkgsrc/mysql:default

So, the first thing I'd do here is look at the service you care about
with svcs -xv and see why it's offline. In general, varpd should not be
a dependent service of anything else, meaning that even if varpd is
having problems, everything else should work. While we should definitely
figure out why it's not working, it's not clear that it's related to the
issue you're currently seeing.

As for varpd itself, there are a few things that could be going on here.
The overlay module might not be loaded, or something could have gone
wrong with devfsadm that prevents the /dev/overlay symlink from being
created. Here are a few things to look at:

mdb -ke 'overlay_thdl_list::whatis'

This roughly checks whether the overlay module is present and loaded.
Can you also run:

ls -l /devices/pseudo/overlay@0:overlay

That's the file that the /dev/overlay symlink will point to.

Finally, is the devfsadmd process running? You could figure that out by
running something like pargs $(pgrep devfsadm).
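The file-level checks above can be bundled so they are easy to rerun after each `svcadm clear`. A minimal sketch, assuming it runs in the global zone; `check_overlay` and `check_devfsadm` are hypothetical helpers, and the `mdb -ke` check is left out because it needs kernel access:

```sh
#!/bin/sh
# check_overlay: verify that the overlay device link exists and resolves.
# The path is a parameter (default /dev/overlay) so it can be tried elsewhere.
# Returns 1 if the link is missing, 2 if it is dangling, 0 if it resolves.
check_overlay() {
    link=${1:-/dev/overlay}
    if [ ! -L "$link" ] && [ ! -e "$link" ]; then
        echo "missing: $link" >&2
        return 1
    fi
    if [ ! -e "$link" ]; then
        # -e follows symlinks, so a link that exists but fails -e is dangling.
        echo "dangling link: $link -> $(readlink "$link")" >&2
        return 2
    fi
    ls -lL "$link"    # show the node the link points at
}

# check_devfsadm: show the arguments of any running devfsadm processes.
check_devfsadm() {
    if pids=$(pgrep devfsadm); then
        for p in $pids; do pargs "$p"; done
    else
        echo "devfsadm is not running" >&2
        return 1
    fi
}
```

Usage: `check_overlay; check_devfsadm; svcs -xv varpd`.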

Robert

> On Wed, Mar 22, 2017 at 12:20 PM, Mark Creamer <white...@gmail.com> wrote:
> 
>> First I entered the dtrace command in one window.
>>
>> Then opened another window and with the service in maintenance, did svcadm
>> clear varpd, then svcadm disable varpd, then svcadm enable varpd.
>>
>> Nothing new in the log except the notation about clear. Same error as
>> originally. I did also check, and /dev/overlay does not exist but it does
>> on my other server. Can that be recreated or copied over if that's the
>> issue? It looks like it's a symlink to another file so I don't know.
>>
>> On Wed, Mar 22, 2017 at 12:13 PM, Robert Mustacchi <r...@joyent.com> wrote:
>>
>>> On 3/22/17 9:11 , Mark Creamer wrote:
>>>> Robert, I did that but nothing happens. I don't have any dtrace experience
>>>> so I'm not sure what to expect. Should I have seen any output in the dtrace
>>>> command window? Thank you
>>>
>>> Can you relate the exact steps you took? But yes, you should have seen
>>> something in the DTrace command window. Did you see additional entries
>>> in the varpd service log?
>>>
>>> Robert
>>>
>>>> On Wed, Mar 22, 2017 at 11:53 AM, Robert Mustacchi <r...@joyent.com> wrote:
>>>>
>>>>> On 3/22/17 8:25 , Mark Creamer wrote:
>>>>>> I have a host with a service in maintenance after a reboot, and several
>>>>>> services on critical VMs will not start. The service in maintenance is
>>>>>> varpd. Following is the log. I can't find anything on Google to help with
>>>>>> the error "varpd: failed to open a libvarpd handle: No such file or
>>>>>> directory". I appreciate any suggestions. If it might be just a matter of
>>>>>> reinstalling something or copying a file over from a w

Re: [smartos-discuss] Issue with host service varpd

2017-03-22 Thread Robert Mustacchi
On 3/22/17 9:11 , Mark Creamer wrote:
> Robert, I did that but nothing happens. I don't have any dtrace experience
> so I'm not sure what to expect. Should I have seen any output in the dtrace
> command window? Thank you

Can you relate the exact steps you took? But yes, you should have seen
something in the DTrace command window. Did you see additional entries
in the varpd service log?

Robert

> On Wed, Mar 22, 2017 at 11:53 AM, Robert Mustacchi <r...@joyent.com> wrote:
> 
>> On 3/22/17 8:25 , Mark Creamer wrote:
>>> I have a host with a service in maintenance after a reboot, and several
>>> services on critical VMs will not start. The service in maintenance is
>>> varpd. Following is the log. I can't find anything on Google to help with
>>> the error "varpd: failed to open a libvarpd handle: No such file or
>>> directory". I appreciate any suggestions. If it might be just a matter of
>>> reinstalling something or copying a file over from a working host, I just
>>> need to know what to try.
>>> Thanks
>>
>> Hi Mark,
>>
>> Sorry to hear that you're having trouble. While varpd being in
>> maintenance is something we should understand, it should not be blocking
>> VMs from starting up unless this is Triton and not standalone SmartOS.
>> Probably worth understanding why they're not starting up.
>>
>>> [root@00-25-90-e0-dd-2c ~]# cat /var/svc/log/network-varpd\:default.log
>>> [ May  9 19:03:16 Executing start method ("/lib/svc/method/svc-varpd"). ]
>>> [ May  9 19:03:17 Method "start" exited with status 0. ]
>>> [ Aug 25 23:40:33 Executing start method ("/lib/svc/method/svc-varpd"). ]
>>> [ Aug 25 23:40:35 Method "start" exited with status 0. ]
>>> [ Mar 19 17:39:35 Executing start method ("/lib/svc/method/svc-varpd"). ]
>>> [ Mar 19 17:39:36 Method "start" exited with status 0. ]
>>> [ Jun 30 03:46:10 Executing start method ("/lib/svc/method/svc-varpd"). ]
>>> [ Jun 30 03:46:11 Method "start" exited with status 0. ]
>>> [ Aug 28 23:39:18 Executing start method ("/lib/svc/method/svc-varpd"). ]
>>> [ Aug 28 23:39:20 Method "start" exited with status 0. ]
>>> [ Mar 22 12:38:48 Executing start method ("/lib/svc/method/svc-varpd"). ]
>>> [ Mar 22 12:39:49 Method or service exit timed out.  Killing contract 44. ]
>>> [ Mar 22 14:28:26 Leaving maintenance because disable requested. ]
>>> [ Mar 22 14:28:26 Disabled. ]
>>> [ Mar 22 14:28:48 Enabled. ]
>>> [ Mar 22 14:28:48 Executing start method ("/lib/svc/method/svc-varpd"). ]
>>> varpd: failed to open a libvarpd handle: No such file or directory
>>> [ Mar 22 14:28:48 Method "start" exited with status 95. ]
>>> [ Mar 22 14:30:31 Leaving maintenance because clear requested. ]
>>> [ Mar 22 14:30:31 Enabled. ]
>>> [ Mar 22 14:30:32 Executing start method ("/lib/svc/method/svc-varpd"). ]
>>> varpd: failed to open a libvarpd handle: No such file or directory
>>> [ Mar 22 14:30:32 Method "start" exited with status 95. ]
>>> [ Mar 22 14:56:28 Executing start method ("/lib/svc/method/svc-varpd"). ]
>>> [ Mar 22 14:57:29 Method or service exit timed out.  Killing contract 40. ]
>>> [root@00-25-90-e0-dd-2c ~]#
>>>
>>
>> This means that we're dying relatively early in the library
>> initialization -- before we can even open up a library handle to allow
>> the library to log more. I suspect this means that it's failing to open
>> the /dev/overlay file.
>>
>> I'd recommend confirming that with something like the following:
>>
>> dtrace -qn 'syscall::open:entry/execname == "varpd"/{ self->p = arg0; }' \
>>     -n 'syscall::open:return/self->p/{ printf("%s: %d %d\n",
>>     copyinstr(self->p), arg1, errno); self->p = NULL; }'
>>
>> And then in another window restart / clear varpd.
>>
>> Robert
>>
> 
> 
> 





Re: [smartos-discuss] Issue with host service varpd

2017-03-22 Thread Robert Mustacchi
On 3/22/17 8:25 , Mark Creamer wrote:
> I have a host with a service in maintenance after a reboot, and several
> services on critical VMs will not start. The service in maintenance is
> varpd. Following is the log. I can't find anything on Google to help with
> the error "varpd: failed to open a libvarpd handle: No such file or
> directory". I appreciate any suggestions. If it might be just a matter of
> reinstalling something or copying a file over from a working host, I just
> need to know what to try.
> Thanks

Hi Mark,

Sorry to hear that you're having trouble. While varpd being in
maintenance is something we should understand, it should not be blocking
VMs from starting up unless this is Triton and not standalone SmartOS.
It's probably worth understanding why they're not starting up.

> [root@00-25-90-e0-dd-2c ~]# cat /var/svc/log/network-varpd\:default.log
> [ May  9 19:03:16 Executing start method ("/lib/svc/method/svc-varpd"). ]
> [ May  9 19:03:17 Method "start" exited with status 0. ]
> [ Aug 25 23:40:33 Executing start method ("/lib/svc/method/svc-varpd"). ]
> [ Aug 25 23:40:35 Method "start" exited with status 0. ]
> [ Mar 19 17:39:35 Executing start method ("/lib/svc/method/svc-varpd"). ]
> [ Mar 19 17:39:36 Method "start" exited with status 0. ]
> [ Jun 30 03:46:10 Executing start method ("/lib/svc/method/svc-varpd"). ]
> [ Jun 30 03:46:11 Method "start" exited with status 0. ]
> [ Aug 28 23:39:18 Executing start method ("/lib/svc/method/svc-varpd"). ]
> [ Aug 28 23:39:20 Method "start" exited with status 0. ]
> [ Mar 22 12:38:48 Executing start method ("/lib/svc/method/svc-varpd"). ]
> [ Mar 22 12:39:49 Method or service exit timed out.  Killing contract 44. ]
> [ Mar 22 14:28:26 Leaving maintenance because disable requested. ]
> [ Mar 22 14:28:26 Disabled. ]
> [ Mar 22 14:28:48 Enabled. ]
> [ Mar 22 14:28:48 Executing start method ("/lib/svc/method/svc-varpd"). ]
> varpd: failed to open a libvarpd handle: No such file or directory
> [ Mar 22 14:28:48 Method "start" exited with status 95. ]
> [ Mar 22 14:30:31 Leaving maintenance because clear requested. ]
> [ Mar 22 14:30:31 Enabled. ]
> [ Mar 22 14:30:32 Executing start method ("/lib/svc/method/svc-varpd"). ]
> varpd: failed to open a libvarpd handle: No such file or directory
> [ Mar 22 14:30:32 Method "start" exited with status 95. ]
> [ Mar 22 14:56:28 Executing start method ("/lib/svc/method/svc-varpd"). ]
> [ Mar 22 14:57:29 Method or service exit timed out.  Killing contract 40. ]
> [root@00-25-90-e0-dd-2c ~]#
> 

This means that we're dying relatively early in the library
initialization -- before we can even open up a library handle to allow
the library to log more. I suspect this means that it's failing to open
the /dev/overlay file.

I'd recommend confirming that with something like the following:

dtrace -qn 'syscall::open:entry/execname == "varpd"/{ self->p = arg0; }' \
    -n 'syscall::open:return/self->p/{ printf("%s: %d %d\n",
    copyinstr(self->p), arg1, errno); self->p = NULL; }'


And then in another window restart / clear varpd.

Robert





Re: [developer] Re: [smartos-discuss] BBR Congestion Control algorithm

2017-03-16 Thread Robert Mustacchi
Hi Bhavyan,

On 3/16/17 11:34 , Bhavyan Bharatharajan wrote:
> Hi,
> 
> ZebiOS (OS based on Illumos for Tegile Appliances) recently implemented a
> framework to support multiple TCP congestion control algorithms along with
> support for CUBIC algorithm. The framework implementation is heavily
> influenced by the corresponding framework in FreeBSD. All the testing
> including performance was done with ZebiOS but I verified the changes with
> OmniOS after importing the patch and it seems to work with no issues.
> 
> The changes are available at following branch
> https://github.com/Tegile-Dev/illumos-gate/tree/tcp-cc-branch
> 
> and the corresponding changes are visible through the following link
> https://github.com/Tegile-Dev/illumos-gate/commit/73f8332f063c73d053fa45b96d602ebc7ac57295
> 
> I just wanted to find out if there is interest in incorporating these
> changes into illumos; if so, I can send out a formal review request and follow
> the process to integrate these changes into illumos-gate. Please advise.

Thanks for reaching out.

I'd recommend that you sync up with Sebastien Roy at Delphix, who has
also implemented something similar; then we can combine the existing
work and get the full set of features integrated.

Thanks,
Robert

> On Wed, Feb 22, 2017 at 7:03 AM, Schweiss, Chip  wrote:
> 
>> I recently did some testing of the BBR congestion control algorithm. We
>> currently license Aspera to speed up downloads for our users; it is a
>> ridiculously expensive solution (6 figure $ for only 3 years). Aspera
>> uses UDP and its own congestion control algorithms to better utilize the
>> network. We will be dropping Aspera and implementing BBR on our download
>> servers when our Aspera license is up.
>>
>> BBR is a game changer and any system without it will be considered a
>> non-solution for anything that is serving data on the web in the future.
>> BBR is BSD licensed so it will make it into production Linux and Windows
>> servers pretty easily. If it's not on the radar of anyone working on
>> Illumos in the near future it should be. Without it, Illumos will lose its
>> competitiveness.
>>
>> Here are some tests I did between St. Louis, Missouri, EC2 in Virginia and
>> EC2 in Sydney, Australia. Keep in mind the system tested with has a 10Gb/s
>> connection to the internet and additional downloads going on at around 2 Gb/s
>> total.
>>
>> * From EC2 in Virginia to St. Louis, kernel 4.4 with CUBIC algorithm, 1 GB
>> file
>>
>> scp   - 190 Mb/s
>> rsync - 150 Mb/s
>> ascp  - 518 Mb/s (Aspera client)
>>
>> After update to kernel 4.9 and enable BBR:
>>
>> scp   - 390 Mb/s
>> rsync - 436 Mb/s
>> ascp  - 400 Mb/s
>>
>> * From EC2 in Sydney to St. Louis, kernel 4.4 with CUBIC algorithm, 1 GB
>> file
>>
>> scp   - 64  Mb/s
>> rsync - 68  Mb/s
>> ascp  - 220 Mb/s
>>
>> After update to 4.10-rc1 and enable BBR
>>
>> scp   - 76  Mb/s
>> rsync - 76  Mb/s
>> ascp  - 220 Mb/s
>>
>> 3 parallel streams
>> scp   - 188 Mb/s
>> ascp  - 192 Mb/s
>>
>> 5 parallel streams
>> scp   - 312 Mb/s
>> ascp  - 188 Mb/s
>>
>> 8 parallel streams
>> scp   - 302 Mb/s
>> ascp  - 192 Mb/s
>>
>> -Chip
>>
>>
>>
>>
>>
>>
>> On Thu, Dec 22, 2016 at 9:58 AM, Dan McDonald  wrote:
>>
>>>
>>>> On Dec 22, 2016, at 8:05 AM, G B via smartos-discuss <smartos-discuss@lists.smartos.org> wrote:
>>>>
>>>> From what I've read, Google's BBR congestion control algorithm will be
>>>> in the Linux 4.9 kernel and FreeBSD has it slated for later (possibly 11
>>>> Stable or 12 Current).  Will this be making it into illumos and what would
>>>> be the timeline?
>>>
>>> The optimal approach to this is to get TCP to accept pluggable congestion
>>> algorithms.  This was in-progress at Snoracle prior to the
>>> barn-door-closing of 2010.  After that, putting in new ones SHOULD be
>>> straightforward.
>>>
>>> People who could work on this are likely swamped with other things
>>> currently.  I imagine pluggable algorithms would be a months-long project,
>>> especially given the testing involved (you REALLY don't want to break TCP,
>>> or worse, have it become a bad congestion citizen).  After that, a specific
>>> replacement would be fewer months than the original setup.
>>>
>>> Sorry I can't be of more immediate assistance,
>>> Dan
>>>
>>
>>
> 
> 




Re: [smartos-discuss] smartos intel nic issue question.

2017-03-13 Thread Robert Mustacchi
On 3/13/17 1:02 , 강경원 wrote:
> Hello.
> 
> We have 10G nic issue with smartos.
> 
> It's Intel nic but we can't communicate with this card (with or without lacp).
> 
> How can we diagnose issue and determine it's compatible or not?

fmadm is telling you that this piece of hardware has logged a large
number of errors and is likely broken and in need of replacement. To see
the individual error reports behind that diagnosis, run fmdump -eV. With
the device throwing errors like this, it's not terribly surprising that
you can't communicate through it, and the ereports should show exactly
what is going wrong.
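Before reading the full `fmdump -eV` output, a per-class count can show which error types dominate. A minimal sketch, assuming the class is the last whitespace-separated field on each line of the one-line-per-event `fmdump -e` listing; `summarize_ereports` is a hypothetical helper:

```sh
#!/bin/sh
# Count ereports by class from `fmdump -e` output on stdin, most frequent
# first. Lines whose last field does not start with "ereport." (such as the
# TIME/CLASS header) are ignored.
summarize_ereports() {
    awk '$NF ~ /^ereport\./ { n[$NF]++ } END { for (c in n) print n[c], c }' |
        sort -rn
}
```

Usage: `fmdump -e | summarize_ereports`.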

> * fmadm faulty result and detailed card info
> 
>   * fmadm faulty result
> ---   -- -
> TIMEEVENT-ID  MSG-ID SEVERITY
> ---   -- -
> Mar 07 09:02:48 029c35c2-84ac-ccd0-cc82-dbb19312c111  PCIEX-8000-MH  Major
> 
> Host: headnode
> Platform: TBD   Chassis_id  : 123456789
> Product_sn  :
> 
> Fault class : fault.io.pciex.device-interr-unaf
> Affects : dev:pci@6c,0/pci8086,2f06@2,2/pci8086,3@0
>faulted but still in service
> FRU : "MB" 
> (hc://:product-id=TBD:server-id=headnode:chassis-id=123456789/motherboard=0)
>faulty
> 
> Description : Too many recovered errors have been detected, which indicates a
   >problem with the specified PCIEX device. This may degrade into an
>unrecoverable fault.
>Refer to http://illumos.org/msg/PCIEX-8000-MH for more
>information.
> 
> Response: One or more device instances may be disabled
> 
> Impact  : Loss of services provided by the device instances associated with
>this fault
> 
> Action  : Schedule a repair procedure to replace the affected device.  Use
>fmadm faulty to identify the device or contact Sun for support.
> 
> ---   -- -
> TIMEEVENT-ID  MSG-ID SEVERITY
> ---   -- -
> Mar 07 06:12:21 53f0fdad-94e7-448a-aee7-e652d7823734  SUNOS-8000-J0  Major
> 
> Host: headnode
> Platform: TBD   Chassis_id  : 123456789
> Product_sn  :
> 
> Fault class : fault.sunos.eft.unexpected_telemetry max 12%
>defect.sunos.eft.unexpected_telemetry max 12%
> Affects : dev:pci@6c,0/pci8086,2f06@2,2/pci8086,3@0
>mod:///mod-name=ixgbe/mod-id=164
>dev:pci@6c,0/pci8086,2f06@2,2/pci8086,3@0,1
>faulted but still in service
> FRU : "MB" 
> (hc://:product-id=TBD:server-id=headnode:chassis-id=123456789/motherboard=0) max 12%
>faulty
> 
> Description : The diagnosis engine encountered telemetry from the listed
>devices for which it was unable to perform a diagnosis -
>Refer to http://illumos.org/msg/SUNOS-8000-J0 for more
>information.  Refer to http://illumos.org/msg/SUNOS-8000-J0 for
>more information.
> 
> Response: Error reports have been logged for examination by Sun.
> 
> Impact  : Automated diagnosis and response for these events will not 
> occur.
> 
> Action  : Ensure that the latest Solaris Kernel and Predictive Self-Healing
>(PSH) patches are installed.
> 
>   * prtconf result
> dev_path=/pci@6c,0/pci8086,2f06@2,2/pci8086,3@0:ixgbe0 
> 
> dev_path=/pci@6c,0/pci8086,2f06@2,2/pci8086,3@0,1:ixgbe1 
> 
> 
>   * prtconf detailed info for 
> "dev_path=/pci@6c,0/pci8086,2f06@2,2/pci8086,3@0:ixgbe0 
> "
> 
>  name='vendor-name' type=string items=1
>  value='Intel Corporation'
>  name='device-name' type=string items=1
>  value='82599ES 10-Gigabit SFI/SFP+ Network 
> Connection'
>  name='subsystem-name' type=string items=1
> value='Ethernet Server Adapter X520-2'

If you look at our hardware support pages, you'll see that the X520 is
well supported. It's probably one of the most commonly used NICs on
SmartOS systems.

Robert





[smartos-discuss] Heads Up: OS-4903 move mdb_v8 to illumos-extra

2017-03-07 Thread Robert Mustacchi
If you don't build SmartOS, then you can ignore this message.

With the integration of OS-4903 move mdb_v8 to illumos-extra, you'll
want to make sure that illumos-extra and illumos-joyent are both in
sync. Otherwise, you may end up having a build without mdb_v8. To update
a workspace you can simply run the following:

gmake clobber
gmake update
gmake live
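If you script this, it is worth making each step succeed before the next one runs, so a failed `clobber` or `update` does not fall through to `gmake live`. A minimal sketch; the `MAKE` override and the `rebuild_workspace` name are hypothetical conveniences, defaulting to `gmake`:

```sh
#!/bin/sh
# Run clobber, update and live in order, stopping at the first failure.
rebuild_workspace() {
    make_cmd=${MAKE:-gmake}
    for target in clobber update live; do
        echo "==> $make_cmd $target"
        "$make_cmd" "$target" || {
            echo "'$make_cmd $target' failed; stopping" >&2
            return 1
        }
    done
}
```

Usage: run `rebuild_workspace` from the top of a smartos-live workspace.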

Let me know if you have any questions.

Thanks,
Robert




Re: [smartos-discuss] nvme ssd question

2017-02-16 Thread Robert Mustacchi
On 2/16/17 4:06 , 강경원 wrote:
> I updated the platform image with "20170202T033902Z" and booted again.
> 
> But the result was like below. Maybe it's nvme 1.2

Yes, the PM963 is specified to be an NVMe 1.2 device per the official
datasheet:
http://www.samsung.com/semiconductor/global/file/insight/2016/08/Samsung_PM963-1.pdf.

We'll follow up with our local Samsung Microelectronics folks to see if
we can get a sample so we can get the support for this cleaned up.
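
The warning in the transcript below ("no support for version > 1.1") comes from a version check in the nvme driver. A minimal sketch of that kind of check, assuming the version is taken from the controller's VS register (major version in bits 31:16, minor in bits 15:08, tertiary in bits 07:00) and that strict-version=0 turns a refusal into a warning, which is what the transcript suggests. parse_vs and check_version are hypothetical names; this is not the illumos driver code.

```python
# Hypothetical sketch of an NVMe version gate; not the illumos nvme driver.


def parse_vs(vs):
    """Split a 32-bit NVMe VS register value into (major, minor, tertiary)."""
    major = (vs >> 16) & 0xFFFF  # MJR, bits 31:16
    minor = (vs >> 8) & 0xFF     # MNR, bits 15:08
    tertiary = vs & 0xFF         # TER, bits 07:00
    return major, minor, tertiary


def check_version(vs, supported=(1, 1), strict=True):
    """Return (attach, warn) for a controller reporting VS value vs.

    strict=False models the strict-version=0 workaround: the driver still
    warns about the newer version but attaches anyway (assumed behavior).
    """
    major, minor, _ = parse_vs(vs)
    if (major, minor) <= supported:
        return True, False
    return (not strict), True


if __name__ == "__main__":
    pm963 = 0x00010200  # an NVMe 1.2 controller such as the PM963
    print(parse_vs(pm963))                     # (1, 2, 0)
    print(check_version(pm963))                # (False, True)
    print(check_version(pm963, strict=False))  # (True, True)
```

This matches the sequence in the transcript: with the default setting the 1.2 device is refused, and with strict-version=0 it still logs the warning but shows up in diskinfo.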

Robert

> [root@manta-bmt-jb01 ~]# uname -a
> SunOS manta-bmt-jb01 5.11 joyent_20170202T033902Z i86pc i386 i86pc
> 
> [root@manta-bmt-jb1 ~]# diskinfo
> TYPE    DISK    VID      PID              SIZE          RMV SSD
> 
> [root@manta-bmt-jb01 ~]#
> 
> [root@manta-bmt-jb01 ~]# modinfo|grep nvme
> [root@manta-bmt-jb01 ~]#
> 
> [root@manta-bmt-jb01 /kernel/drv]# update_drv -vf nvme
> nvme.conf updated in the kernel.
> 2017-02-16T11:40:58.012544+00:00 manta-bmt-jb01 nvme: [ID 767541 
> kern.warning] 
> WARNING: nvme0: no support for version > 1.1
> 
> And when I change nvme.conf and do update_drv
> 
> [root@manta-bmt-jb01 /kernel/drv]# diskinfo
> 
> TYPE    DISK    VID      PID              SIZE          RMV SSD
> 
> [root@manta-bmt-jb01 /kernel/drv]# sed -i '' -e '/#strict-version=0;/{s:.*:strict-version=0;:;}' /kernel/drv/nvme.conf
> 
> [root@manta-bmt-jb01 /kernel/drv]# update_drv -vf nvme
> nvme.conf updated in the kernel.
> 2017-02-16T11:40:58.012544+00:00 manta-bmt-jb01 nvme: [ID 767541 
> kern.warning] 
> WARNING: nvme0: no support for version > 1.1
> 
> [root@manta-bmt-jb01 /kernel/drv]# diskinfo
> TYPE    DISK    VID      PID              SIZE          RMV SSD
> -       c1t1d0  SAMSUNG  MZQLW1T9HMJP-3   1788.50 GiB   no  yes
> 
> - *Original Message* -
> 
> *Sender* : Robert Mustacchi <r...@joyent.com>
> 
> *Date* : 2017-02-16 13:16 (GMT+9)
> 
> *Title* : Re: [smartos-discuss] nvme ssd question
> 
> On 2/15/17 19:37 , 강경원 wrote:
>> Thank you for the detailed explanation.
>>
>> So NVMe 1.1 driver is already merged or when will be released?
> 
> We sync with illumos every day. NVMe 1.1 has been in SmartOS since
> approximately October 2016.
> 
> Robert
> 
>> - *Original Message* -
>>
>> *Sender* : Robert Mustacchi <r...@joyent.com>
>>
>> *Date* : 2017-02-15 00:36 (GMT+9)
>>
>> *Title* : Re: [smartos-discuss] nvme ssd question
>>
>> On 2/14/17 1:52 , 강경원 wrote:
>>> Hello.
>>>
>>> We are testing the NVMe SSD with samsung 963 model.
>>>
>>> Through uncommenting the nvme.conf's version line and updating the devices,
>>> we could configure the zpool and compute node with sdc.
>>>
>>> 1. nvme 1.1 support plan question
>>>
>>> I saw the sdc only supports nvme 1.0.
>>>
>>> Do you have the support plan for nvme 1.1 or nvme 1.2 ?
>>
>> We should have NVMe 1.1 support as of the merge of
>> https://www.illumos.org/issues/7382. We'll work on NVMe 1.2 when we
>> track down some devices that support it.
>>
>>> 2. nvme ssd test question
>>>
>>> When we pull out the nvme SSD, the zpool status command hung. Is it normal 
>>> case?
>>
>> If you're pulling the device, that suggests that you have a U.2 form
>> factor device, is that correct? If so, we don't have NVMe hotplug
>> support yet, but it is something we're looking at for Purley based
>> platforms.
>>
>> Robert
>>
>> 강경원(Kang, Kyungwon)   Marcus Kang
>>   RHCA/PMP/ITIL Master/OCP
>> Title: Associate Principal
>> Cloud Technology Group (Cloud)   M.P: 82-10-8998-2092
>>   kyungwon.k...@samsung.com
>>
> 
> 
> 강경원(Kang, Kyungwon)   Marcus Kang
>   RHCA/PMP/ITIL Master/OCP
> Title: Associate Principal
> Cloud Technology Group (Cloud)   M.P: 82-10-8998-2092
>   kyungwon.k...@samsung.com
> 




Re: [smartos-discuss] nvme ssd question

2017-02-15 Thread Robert Mustacchi
On 2/15/17 19:37 , 강경원 wrote:
> Thank you for the detailed explanation.
> 
> So NVMe 1.1 driver is already merged or when will be released?

We sync with illumos every day. NVMe 1.1 has been in SmartOS since
approximately October 2016.

Robert

> - *Original Message* -
> 
> *Sender* : Robert Mustacchi <r...@joyent.com>
> 
> *Date* : 2017-02-15 00:36 (GMT+9)
> 
> *Title* : Re: [smartos-discuss] nvme ssd question
> 
> On 2/14/17 1:52 , 강경원 wrote:
>> Hello.
>>
>> We are testing the NVMe SSD with samsung 963 model.
>>
>> Through uncommenting the nvme.conf's version line and updating the devices,
>> we could configure the zpool and compute node with sdc.
>>
>> 1. nvme 1.1 support plan question
>>
>> I saw the sdc only supports nvme 1.0.
>>
>> Do you have the support plan for nvme 1.1 or nvme 1.2 ?
> 
> We should have NVMe 1.1 support as of the merge of
> https://www.illumos.org/issues/7382. We'll work on NVMe 1.2 when we
> track down some devices that support it.
> 
>> 2. nvme ssd test question
>>
>> When we pull out the nvme SSD, the zpool status command hung. Is it normal 
>> case?
> 
> If you're pulling the device, that suggests that you have a U.2 form
> factor device, is that correct? If so, we don't have NVMe hotplug
> support yet, but it is something we're looking at for Purley based
> platforms.
> 
> Robert
> 
> 강경원(Kang, Kyungwon)   Marcus Kang
>   RHCA/PMP/ITIL Master/OCP
> Title: Associate Principal
> Cloud Technology Group (Cloud)   M.P: 82-10-8998-2092
>   kyungwon.k...@samsung.com
> 




Re: [smartos-discuss] Is there a way to pass a physical disk to a VM?

2017-02-14 Thread Robert Mustacchi
On 2/14/17 19:05 , Miguel C wrote:
> Sorry If I missed something trivial in the doc/man pages, Is this possible?
> 
> Say the vm is a windows guest, so If I wanted to add a raw device to the
> guest instead of defining a size of a virtual disk, how could I do that?

No, there isn't a way to do that.

Robert




Re: [smartos-discuss] nvme ssd question

2017-02-14 Thread Robert Mustacchi
On 2/14/17 1:52 , 강경원 wrote:
> Hello.
> 
> We are testing the NVMe SSD with samsung 963 model.
> 
> Through uncommenting the nvme.conf's version line and updating the devices,
> we could configure the zpool and compute node with sdc.
> 
> 1. nvme 1.1 support plan question
> 
> I saw the sdc only supports nvme 1.0.
> 
> Do you have the support plan for nvme 1.1 or nvme 1.2 ?

We should have NVMe 1.1 support as of the merge of
https://www.illumos.org/issues/7382. We'll work on NVMe 1.2 when we
track down some devices that support it.

> 2. nvme ssd test question
> 
> When we pull out the nvme SSD, the zpool status command hung. Is it normal 
> case?

If you're pulling the device, that suggests that you have a U.2 form
factor device, is that correct? If so, we don't have NVMe hotplug
support yet, but it is something we're looking at for Purley based
platforms.

Robert




Re: [smartos-discuss] vioblk driver

2017-02-13 Thread Robert Mustacchi
On 2/13/17 12:56 , Youzhong Yang wrote:
> It seems it's not that simple. I will leave it for later troubleshooting as
> I don't have cycles to debug it now.

Do you have a write up somewhere about what you encountered and what
went wrong?

Robert

> On Wed, Feb 1, 2017 at 8:33 PM, Robert Mustacchi <r...@joyent.com> wrote:
> 
>> On 1/30/17 16:45 , Youzhong Yang wrote:
>>> Thanks for helping out.
>>>
>>> What I need is to set logical block size to something larger than 512,
>>> e.g. 4096. It doesn't work.
>>>
>>> I guess the bug is here:
>>>
>>> http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/io/vioblk/vioblk.c#919
>>>
>>> sc_capacity is in 512B, needs to be converted to logical blocks. I will
>>> test the patch and report back.
>>
>> I agree, that definitely appears to be wrong.
>>
>> Robert
>>
> 
> 




Re: [smartos-discuss] mdata-get got stuck in epoll_wait

2017-02-13 Thread Robert Mustacchi
On 2/13/17 1:12 , Stefan wrote:
> Am 09.02.2017 21:23, schrieb Josh Wilsdon:
>> I also have a prototype of a
>> fix at: https://cr.joyent.us/#/c/1483/
> 
> produces a white page "Loading Gerrit Code Review ..." and a link "New
> User Guide" here.  JS console says
> "https://cr.joyent.us/gerrit_ui/undefined.cache.js 404 not found".  Is
> there any chance to get the patch itself by HTTP, FTP or git?

Here's an HTTP link to the patch file from the UI. Sorry that it ended
up erroring out on you:

https://cr.joyent.us/changes/1483/revisions/57c424e145e378f68aab66002f5363bc02e6c198/patch?zip

Robert




Re: [smartos-discuss] mdata-get got stuck in epoll_wait

2017-02-07 Thread Robert Mustacchi
On 2/7/17 11:15 , Stefan wrote:
> Dear List,
> 
> since 20161222T003450Z* mdata-get got stuck when asking for
> the first metadata key while booting our rescue image.  The problem
> occurred for the first time on a 1 cpu/512 MB VM.  According to strace
> the mdata-get hung in an infinite loop performing epoll_waits with
> a 2 second timeout and returned 0 all the time.
> 
> After re-linking the mdata tools statically we now get "POLLERR"
> printed on the console every two seconds for varying total times
> (up to about 20 seconds) but eventually the boot process finishes.
> 
> Furthermore at the time when I booted the VM from its data
> partition, using the
> 
>centos-6 20150811 linux 2015-08-11T13:37:40Z
> 
> image, multiple lines
> 
>plat_recv timeout
> 
> showed up on the console.  Any ideas what the cause of this strange
> behavior may be?

Stefan,

Can you clarify whether this is a KVM image or an lx image that you're
using for your guest?

Thanks,
Robert

> Kind Regards,
> Stefan
> 
> *  bisected:
> 20170202T040152Z bad
> 20170105T024017Z bad
> 20161222T003450Z bad
> 20161208T003707Z good
> 20161110T013148Z good
> 20160906T181054Z good
> 20160428T170316Z good
> 
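
The bisection above places the regression between 20161208T003707Z (last good) and 20161222T003450Z (first bad). When there are more platform images to test, a binary search finds the first bad one in logarithmic rather than linear time. A minimal sketch, assuming you supply your own is_bad() test for a given image (here it just replays the results reported above); first_bad and RESULTS are hypothetical names.

```python
# Results reported in the thread above, keyed by platform image stamp.
RESULTS = {
    "20160428T170316Z": "good",
    "20160906T181054Z": "good",
    "20161110T013148Z": "good",
    "20161208T003707Z": "good",
    "20161222T003450Z": "bad",
    "20170105T024017Z": "bad",
    "20170202T040152Z": "bad",
}


def first_bad(stamps, is_bad):
    """Binary-search a sorted list of stamps for the first failing build.

    Assumes every build after the first bad one is also bad, which is the
    usual precondition for bisection. Returns None if nothing fails.
    """
    lo, hi = 0, len(stamps)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(stamps[mid]):
            hi = mid
        else:
            lo = mid + 1
    return stamps[lo] if lo < len(stamps) else None


if __name__ == "__main__":
    # Stamps are fixed-width YYYYMMDDTHHMMSSZ strings, so sorting them as
    # strings also sorts them chronologically.
    stamps = sorted(RESULTS)
    print(first_bad(stamps, lambda s: RESULTS[s] == "bad"))  # 20161222T003450Z
```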




Re: [smartos-discuss] pkgadd(1M)

2017-02-05 Thread Robert Mustacchi
On 2/5/17 6:07 , a b wrote:
> I would like to build SmartOS-live so that it has SVR4 packaging
> (pkgadd(1M) and friends) back in /usr/sbin and /usr/bin.
> 
> SmartOS purposely removes these from the RAMDisk image, and so
> far I have been unsuccessful in finding where they are removed and
> excluded.
> 
> What do I need to modify to get those back in?

The list of files included in the build from illumos is based on the
'manifest' file in the root of the illumos-joyent repository. Adding the
files you need to it should allow you to proceed.

> The other option would be to bootstrap SVR4 packaging in /opt/local,
> or some other location in /opt. To that end, I have hunted down and
> patched every single location in usr/src/cmd/svr4pkg which
> referenced /usr, and still pkgadd(1M) complains thus:
> 
> # pkgadd -a /var/spool/pkg/SUNWpkgcmdsr/reloc/var/sadm/install/admin/default SUNWpkgcmdsr
> ## Waiting for up to <300> seconds for package administration commands to 
> become available (another user is administering packages)
> pkgadd: ERROR: ERROR: Unable to acquire package administration lock for this 
> system; try again later
> pkgadd: ERROR: Unable to lock this zone for administration
> 
> 1 package was not processed!
> 
> running truss(1) on this I get that it's trying to run /usr/bin/pkgadm(1M):
> 
> 6196:   access("/usr/bin/pkgadm", X_OK) Err#2 ENOENT
> 6196:   fstat64(2, 0x08045220)  = 0
> ## Waiting for up to <300> seconds for package administration commands to 
> become available (another user is administering packages)6196:
> write(2, " # #   W a i t i n g   f".., 131)  = 131
> 
> And so, thinking about this further, since /usr seems to be
> deeply ingrained into SVR4 packaging, rather than hack my way
> around this, this would be a non-existent problem if I only knew
> where to look to not have SVR4 packaging excluded from the
> RAMDisk when doing `gmake world`.
> 
> While option 1 with RAMDisk would be ideal, I'm not opposed to
> option 2 (/opt inside of a zone).

I suspect that option 2 will provide you a better long-term experience:
fork the SVR4 packaging tools from illumos and modify them to work in an
arbitrary prefix. This is especially true because you'll likely need to
come up with something to handle all the existing packages that might
assume /usr is writable. This also gets you out of having to build a
custom release.

Robert




Re: [smartos-discuss] vioblk driver

2017-02-01 Thread Robert Mustacchi
On 1/30/17 16:45 , Youzhong Yang wrote:
> Thanks for helping out.
> 
> What I need is to set logical block size to something larger than 512, e.g.
> 4096. It doesn't work.
> 
> I guess the bug is here:
> 
> http://src.illumos.org/source/xref/illumos-gate/usr/src/uts/common/io/vioblk/vioblk.c#919
> 
> sc_capacity is in 512B, needs to be converted to logical blocks. I will
> test the patch and report back.

I agree, that definitely appears to be wrong.
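
Per the virtio specification, the virtio-blk capacity field is always expressed in 512-byte sectors, even when the device advertises a larger blk_size, so reporting capacity as a count of logical blocks requires a conversion. A minimal sketch of that arithmetic; logical_block_count is a hypothetical name, and this is not the actual vioblk.c patch.

```python
# Hypothetical sketch of the unit conversion discussed above; not vioblk.c.
VIRTIO_BLK_SECTOR = 512  # virtio-blk always reports capacity in 512-byte sectors


def logical_block_count(capacity_sectors, blk_size):
    """Convert a virtio-blk capacity (512-byte sectors) into logical blocks."""
    if blk_size < VIRTIO_BLK_SECTOR or blk_size % VIRTIO_BLK_SECTOR != 0:
        raise ValueError("blk_size must be a multiple of 512")
    return capacity_sectors * VIRTIO_BLK_SECTOR // blk_size


if __name__ == "__main__":
    sectors = 2 * 1024 * 1024  # a 1 GiB disk, in 512-byte sectors
    print(logical_block_count(sectors, 512))   # 2097152
    print(logical_block_count(sectors, 4096))  # 262144
```

Without the conversion, a 4096-byte logical block size would make the disk appear eight times larger than it is.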

Robert




Re: [smartos-discuss] Weird msr instructions

2017-02-01 Thread Robert Mustacchi
On 1/30/17 23:17 , Micky wrote:
> Does anyone know what kind of MSR instructions are these:
> 
> unhandled rdmsr: 0xc
> unhandled wrmsr: 0x50cc47 data 90
> 
> Because every time this happens, the KVM VM crashes without any trace or
> activity in kvmstat.

You'll want to go through the Intel manuals to determine what exactly
that MSR is for. Unfortunately I'm not sure off hand.

Robert





Re: [smartos-discuss] USB3/XHCI troubleshooting

2017-01-23 Thread Robert Mustacchi
On 1/23/17 10:12 , Robert Mustacchi wrote:
> On 1/19/17 8:27 , Jason Lawrence wrote:
>> On Fri, Jan 6, 2017, at 04:55 PM, Robert Mustacchi wrote:
>>> On 1/5/17 18:43 , Jason Lawrence wrote:
>>>> On Mon, Jan 2, 2017, at 06:14 PM, Robert Mustacchi wrote:
>>>>> On 1/2/17 14:16 , Jason Lawrence wrote:
>>>>>> I've upgraded to the 20161222T003450Z build and have been having issues
>>>>>> with getting USB 3.0 drives recognized. The drives work fine on USB 2.0
>>>>>> ports.
>>>>>
>>>>> Hi Jason,
>>>>>
>>>>> Thanks for the report. Sorry you're seeing some trouble with this. The
>>>>> key from the log is the bit:
>>>>>
>>>>> "WARNING: /pci@0,0/pci15d9,811@14 (xhci0): Connecting device on port 12
>>>>> failed"
>>>>>
>>>>> This indicates that something in the connection process failed. While
>>>>> this is a bit of a complicated state machine, I have a D script that
>>>>> should help us figure out what's going on. Here's the D script.
>>>>>
>>>>> https://us-east.manta.joyent.com/rmustacc/public/tmp/connect.d
>>>>>
>>>>> The most useful thing to do would be to do the following:
>>>>>
>>>>> * Make sure drive is unplugged
>>>>> * run the D script as: dtrace -qs connect.d -o /var/tmp/connect.out
>>>>> * When you see the message in the system log about the connection on
>>>>> port 12 failed, hit ctrl+c.
>>>>>
>>>>> If you can do this for each drive independently, that'd be appreciated.
>>>>> If you can put each one of those files somewhere that'll help us figure
>>>>> out what the next step is.
>>>>>
>>>>> Thanks,
>>>>> Robert
>>>>
>>>> Here's one of the drives. I forgot to run the script before plugging in
>>>> the second drive and unplugging/replugging didn't create any output. I
>>>> can grab it the next time I'm able to reboot if it's needed.
>>>>
>>>> Syslog entries:
>>>>
>>>> 2017-01-06T02:28:27.152429+00:00 smarty usba: [ID 691482 kern.warning]
>>>> WARNING: /pci@0,0/pci15d9,811@14 (xhci0): Connecting device on port 12
>>>> failed
>>>> 2017-01-06T02:29:28.350770+00:00 smarty xhci: [ID 902155 kern.info]
>>>> NOTICE: xhci0: xhci stop endpoint command (2)/slot (1) in wrong state:
>>>> 19
>>>> 2017-01-06T02:29:28.350785+00:00 smarty xhci: [ID 617155 kern.info]
>>>> NOTICE: xhci0: endpoint is in state 3#012
>>>>
>>>> Full dtrace output here: http://pastebin.com/raw/rw6UDx2Q
>>>
>>> Thanks for the update Jason. Unfortunately it appears this is failing in
>>> a different way than I really expected. Mainly it's getting to the point
>>> that it seems like scsa2usb is attaching and starting to use the drive,
>>> so the initial script that I provided isn't going to be the most
>>> helpful. I'll try to put together something to further iterate on this.
>>>
>>> Thanks for all your help so far.
>>>
>>> Robert
>>
>> As a sanity check, I've been using both of these drives heavily for
>> backups on USB 2.0 ports with no obvious errors. Monitoring SMART stats
>> don't indicate there's a hardware issue that could impact these tests.
>>
>> Please let me know if I can provide any other information to help
>> troubleshoot the new XHCI implementation.
>>
> 
> Hi Jason,
> 
> Sorry for the delay in getting back to you. I think the next thing to
> figure out is if scsa2usb is failing to attach. I suspect we're getting
> far enough that it is. You can determine that by running:
> 
> dtrace -n 'fbt::scsa2usb_attach:return{ trace(arg1); stack(); }'
> 
> Would you mind opening a bug for this on
> github.com/joyent/smartos-live/issues? I think that might make this a
> bit easier to track.

Can you also include which specific drives are failing to attach?

Robert




Re: [smartos-discuss] USB3/XHCI troubleshooting

2017-01-23 Thread Robert Mustacchi
On 1/19/17 8:27 , Jason Lawrence wrote:
> On Fri, Jan 6, 2017, at 04:55 PM, Robert Mustacchi wrote:
>> On 1/5/17 18:43 , Jason Lawrence wrote:
>>> On Mon, Jan 2, 2017, at 06:14 PM, Robert Mustacchi wrote:
>>>> On 1/2/17 14:16 , Jason Lawrence wrote:
>>>>> I've upgraded to the 20161222T003450Z build and have been having issues
>>>>> with getting USB 3.0 drives recognized. The drives work fine on USB 2.0
>>>>> ports.
>>>>
>>>> Hi Jason,
>>>>
>>>> Thanks for the report. Sorry you're seeing some trouble with this. The
>>>> key from the log is the bit:
>>>>
>>>> "WARNING: /pci@0,0/pci15d9,811@14 (xhci0): Connecting device on port 12
>>>> failed"
>>>>
>>>> This indicates that something in the connection process failed. While
>>>> this is a bit of a complicated state machine, I have a D script that
>>>> should help us figure out what's going on. Here's the D script.
>>>>
>>>> https://us-east.manta.joyent.com/rmustacc/public/tmp/connect.d
>>>>
>>>> The most useful thing to do would be to do the following:
>>>>
>>>> * Make sure drive is unplugged
>>>> * run the D script as: dtrace -qs connect.d -o /var/tmp/connect.out
>>>> * When you see the message in the system log about the connection on
>>>> port 12 failed, hit ctrl+c.
>>>>
>>>> If you can do this for each drive independently, that'd be appreciated.
>>>> If you can put each one of those files somewhere that'll help us figure
>>>> out what the next step is.
>>>>
>>>> Thanks,
>>>> Robert
>>>
>>> Here's one of the drives. I forgot to run the script before plugging in
>>> the second drive and unplugging/replugging didn't create any output. I
>>> can grab it the next time I'm able to reboot if it's needed.
>>>
>>> Syslog entries:
>>>
>>> 2017-01-06T02:28:27.152429+00:00 smarty usba: [ID 691482 kern.warning]
>>> WARNING: /pci@0,0/pci15d9,811@14 (xhci0): Connecting device on port 12
>>> failed
>>> 2017-01-06T02:29:28.350770+00:00 smarty xhci: [ID 902155 kern.info]
>>> NOTICE: xhci0: xhci stop endpoint command (2)/slot (1) in wrong state:
>>> 19
>>> 2017-01-06T02:29:28.350785+00:00 smarty xhci: [ID 617155 kern.info]
>>> NOTICE: xhci0: endpoint is in state 3#012
>>>
>>> Full dtrace output here: http://pastebin.com/raw/rw6UDx2Q
>>
>> Thanks for the update Jason. Unfortunately it appears this is failing in
>> a different way than I really expected. Mainly it's getting to the point
>> that it seems like scsa2usb is attaching and starting to use the drive,
>> so the initial script that I provided isn't going to be the most
>> helpful. I'll try to put together something to further iterate on this.
>>
>> Thanks for all your help so far.
>>
>> Robert
> 
> As a sanity check, I've been using both of these drives heavily for
> backups on USB 2.0 ports with no obvious errors. Monitoring SMART stats
> don't indicate there's a hardware issue that could impact these tests.
> 
> Please let me know if I can provide any other information to help
> troubleshoot the new XHCI implementation.
> 

Hi Jason,

Sorry for the delay in getting back to you. I think the next thing to
figure out is if scsa2usb is failing to attach. I suspect we're getting
far enough that it is. You can determine that by running:

dtrace -n 'fbt::scsa2usb_attach:return{ trace(arg1); stack(); }'
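
In an fbt return probe, arg1 holds the function's return value, and attach(9E) entry points return DDI_SUCCESS (0) or DDI_FAILURE (-1). Assuming the 32-bit int return value may be widened into the 64-bit argument without sign extension, a failure could be printed either as -1 or as 4294967295. A minimal sketch for interpreting the traced value under that assumption; decode_attach_return is a hypothetical helper.

```python
# Values from <sys/sunddi.h>.
DDI_SUCCESS = 0
DDI_FAILURE = -1

NAMES = {DDI_SUCCESS: "DDI_SUCCESS", DDI_FAILURE: "DDI_FAILURE"}


def decode_attach_return(raw):
    """Interpret the number printed by trace(arg1) for an attach(9E) return.

    Only the low 32 bits are meaningful for an int return value, so they
    are reinterpreted as a signed 32-bit integer before lookup.
    """
    value = raw & 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32
    return NAMES.get(value, "unexpected (%d)" % value)


if __name__ == "__main__":
    for raw in (0, 4294967295, -1):
        print(raw, decode_attach_return(raw))
```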

Would you mind opening a bug for this on
github.com/joyent/smartos-live/issues? I think that might make this a
bit easier to track.

Thanks,
Robert





Re: AW: [smartos-discuss] smartos ssd disk question

2017-01-23 Thread Robert Mustacchi
On 1/23/17 9:36 , Gernot Straßer wrote:
> Most (if not all) so-called enterprise-class SSDs claim to be power-safe
> (being equipped with supercaps to power the drive until the DRAM write cache is
> emptied).
> In case of a power failure, no system will be able to send a synchronize
> command to the drive, so what sense would the supercap make if that were a
> requirement?
> Does anybody have a suggestion on how to test that (besides pulling the power
> cable)?

Hi Gernot,

I think you're looking at this from the wrong perspective. For example,
ZFS will not treat a write as stable until the device has completed a
synchronize cache command. For some devices, the synchronize cache
command may be required to get outstanding writes into a state where
they will be protected by the supercap. Obviously, this is going to
vary from drive to drive. If it's totally fine for these Toshibas,
great. If someone wanted to make a change to illumos saying that
synchronize cache was unnecessary on those devices, then I'd want the
manufacturer to explicitly say so.

Robert

> -----Original Message-----
> From: Robert Mustacchi [mailto:r...@joyent.com] 
> Sent: Monday, January 23, 2017 18:30
> To: smartos-discuss@lists.smartos.org
> Subject: Re: [smartos-discuss] smartos ssd disk question
> 
> On 1/23/17 9:20 , Youzhong Yang wrote:
>> it is power safe and we've tested it here.
>>
>> https://toshiba.semicon-storage.com/us/product/storage-products/enterprise-ssd/px02smb-px02smfxxx.html?sug=1
> 
> Sure, it does say it's power safe. Are you sure that means you don't need to 
> issue synchronize cache commands to the device? For some devices, you still 
> need to issue synchronize cache commands even if they're power safe. If it 
> works, great. Hopefully that just means synchronize cache commands are a 
> no-op.
> 
> Robert
> 
>> On Mon, Jan 23, 2017 at 12:01 PM, Robert Mustacchi <r...@joyent.com> wrote:
>>
>>> On 1/23/17 6:29 , Youzhong Yang wrote:
>>>> Add something like this to /kernel/drv/sd.conf:
>>>>
>>>> "TOSHIBA PX02SMF020  ", "cache-nonvolatile:true",
>>>>
>>>> I don't think the sd.conf comes with smartos image has it, so you 
>>>> need to build your own image.
>>>
>>> In general, you should _never_ set this value. You have basically 
>>> told the system that this device is power safe and never requires a 
>>> synchronize cache command. This is not true for most devices and a 
>>> poorly timed panic will result in data loss on the one device whose 
>>> purpose is to protect its data: the slog.
>>>
>>> Note, when I generally talk about an SSD being power safe, that does 
>>> not mean that this can be set to true. The devices generally only 
>>> guarantee that data is safe after a synchronize cache command.
>>>
>>> I don't have as much experience with these Toshiba drives, so it may 
>>> be that their datasheet tells you something else in this case.
>>>
>>> Robert
>>>
>>
>>
> 
> 




Re: [smartos-discuss] smartos ssd disk question

2017-01-23 Thread Robert Mustacchi
On 1/23/17 9:20 , Youzhong Yang wrote:
> it is power safe and we've tested it here.
> 
> https://toshiba.semicon-storage.com/us/product/storage-products/enterprise-ssd/px02smb-px02smfxxx.html?sug=1

Sure, it does say it's power safe. Are you sure that means you don't
need to issue synchronize cache commands to the device? For some
devices, you still need to issue synchronize cache commands even if
they're power safe. If it works, great. Hopefully that just means
synchronize cache commands are a no-op.

Robert

> On Mon, Jan 23, 2017 at 12:01 PM, Robert Mustacchi <r...@joyent.com> wrote:
> 
>> On 1/23/17 6:29 , Youzhong Yang wrote:
>>> Add something like this to /kernel/drv/sd.conf:
>>>
>>> "TOSHIBA PX02SMF020  ", "cache-nonvolatile:true",
>>>
>>> I don't think the sd.conf comes with smartos image has it, so you need to
>>> build your own image.
>>
>> In general, you should _never_ set this value. You have basically told
>> the system that this device is power safe and never requires a
>> synchronize cache command. This is not true for most devices and a
>> poorly timed panic will result in data loss on the one device whose
>> purpose is to protect its data: the slog.
>>
>> Note, when I generally talk about an SSD being power safe, that does not
>> mean that this can be set to true. The devices generally only guarantee
>> that data is safe after a synchronize cache command.
>>
>> I don't have as much experience with these Toshiba drives, so it may be
>> that their datasheet tells you something else in this case.
>>
>> Robert
>>
> 
> 




Re: [smartos-discuss] smartos ssd disk question

2017-01-23 Thread Robert Mustacchi
On 1/22/17 16:38 , 강경원 wrote:
> Hello,
> 
> We are testing with "TOSHIBAPX02SMF020" SSD for zfs log disk.
> 
> Although we added this to the smartos, the disk write io was not increased.
> 
> Have you ever experienced this kind of issue with Toshiba?
> 
> Any recommendation for selecting ssd?

We generally recommend looking at write-intensive, low-capacity drives.
The number one requirement is that the drive is power safe. The second
requirement is latency.

But please remember, when performing any kind of I/O benchmark, that
you should be doing _active_ benchmarking. That means you don't just
run a test, look at a number, and call it a day. You need to analyze
the system and understand what is limiting the numbers that you're
getting.

Also, it's worth remembering that a slog only helps with synchronous
writes. It's very easy to write a benchmark for which a slog will not
help at all.
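
To make sure a benchmark actually exercises the slog, its writes have to be synchronous. A minimal sketch that times the same workload with and without O_DSYNC, assuming the scratch file lives on a ZFS dataset with the default sync=standard (with sync=disabled nothing reaches the ZIL; with sync=always everything does). timed_writes is a hypothetical name, and the numbers it produces are only a starting point for active benchmarking, not a benchmark in themselves.

```python
import os
import tempfile
import time


def timed_writes(count, size, sync, directory=None):
    """Write count blocks of size bytes to a scratch file; return (bytes, secs).

    With sync=True the file is opened O_DSYNC, so each write returns only
    once the data is on stable storage; on ZFS these are the writes that go
    through the ZIL, and hence the slog. Plain buffered writes do not.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync:
        flags |= os.O_DSYNC
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    buf = b"\0" * size
    written = 0
    fd = os.open(path, flags)
    try:
        start = time.perf_counter()
        for _ in range(count):
            written += os.write(fd, buf)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return written, elapsed
```

Pointing directory at a dataset on the pool under test, comparing the two modes, and watching the log device with zpool iostat -v while it runs is one way to confirm that the slog is being used at all.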

Robert





Re: [smartos-discuss] smartos ssd disk question

2017-01-23 Thread Robert Mustacchi
On 1/23/17 6:29 , Youzhong Yang wrote:
> Add something like this to /kernel/drv/sd.conf:
> 
> "TOSHIBA PX02SMF020  ", "cache-nonvolatile:true",
> 
> I don't think the sd.conf comes with smartos image has it, so you need to
> build your own image.

In general, you should _never_ set this value. You have basically told
the system that this device is power safe and never requires a
synchronize cache command. This is not true for most devices and a
poorly timed panic will result in data loss on the one device whose
purpose is to protect its data: the slog.

Note that when I talk about an SSD being power safe, that does not
mean that this can be set to true. These devices generally only guarantee
that data is safe after a synchronize cache command.

I don't have as much experience with these Toshiba drives, so it may be
that their datasheet tells you something else in this case.

Robert





Re: [smartos-discuss] SmartOS 20170119T014200Z fails to deliver 32 bit libficl-sys.so

2017-01-23 Thread Robert Mustacchi
On 1/22/17 15:52 , Attila Fülöp wrote:
> All,
> 
> There seems to be an omission in the latest SmartOS image.
> 
> After updating to platform_20170119T014200Z, one of my zones didn't come
> up. It turns out that I had disabled lazy loading in that zone, and svc.startd
> failed to start due to the missing /usr/lib/libficl-sys.so.4.1.0:
> 
> ld.so.1: svc.startd: fatal: libficl-sys.so.4.1.0: open failed: No such
> file or directory
> ld.so.1: svc.startd: fatal: relocation error: file
> /lib/svc/bin/svc.startd: symbol be_get_boot_args: referenced symbol not
> found
> 
> Indeed there is a 64 bit /usr/lib/amd64/libficl-sys.so.4.1.0 but no 32 bit
> /usr/lib/libficl-sys.so.4.1.0. This looks like an accidental omission to
> me. Can anybody confirm this?

Hi Attila,

Thank you for pointing this out. It was an accidental omission. I've
gone and filed 'OS-5912: missing 32-bit libficl-sys.so.4.1.0'
(https://smartos.org/bugview/OS-5912) and should get that fixed in the
next day or so. Sorry for the trouble.

Robert




Re: [smartos-discuss] Which intel 1GE NIC to choose?

2017-01-21 Thread Robert Mustacchi
On 1/21/17 2:12 , Jussi Sallinen wrote:
> Hi,
> 
> Any pros/cons regarding Intel I210 vs. I350?
> I'd need a NIC that can do Jumbo Frames and works Ok with SmartOS, guess
> both of these would do?

Both of these would work just fine; they both support jumbo frames.

Robert





Re: [smartos-discuss] lofiadm / zpool create inside a zone

2017-01-09 Thread Robert Mustacchi
On 1/8/17 23:40 , Matthias Goetzke wrote:
> Thanks for the feedback. The link was supposed to link to 
> http://constantin.glez.de/blog/2012/02/introducing-sparse-encrypted-zfs-pools.
>  (Pasting it in from pocket hid the getpocket link)
> 
> I tried updating the VM with a number of different privileges, but with no 
> success. sys_config, for example, fails, stating invalid privilege:
> 
> vmadm update f62ecc2d-825f-4df9-b5e1-e95207831d52 limit_priv=default,sys_config
> Command failed: On line 1 of /tmp/zonecfg.58411.tmp:
> f62ecc2d-825f-4df9-b5e1-e95207831d52: invalid privilege
> 
> sys_admin works but doesn’t have any effect.
> 
> http://docs.oracle.com/cd/E19044-01/sol.containers/817-1592/6mhahup91/index.html
> mentions that sys_config etc. are not allowed in zones in Solaris. Sadly, I 
> don't know how to see which privilege is being denied. If somebody could 
> tell me that, I could proceed without guessing.

If you're asking which privilege is missing in the context of the zone
when running the command, then you'll want to use the ppriv -D option.
See http://illumos.org/man/1/ppriv for more information.

If at the end of the day you can't create this in the zone, you can
always create it in the GZ and delegate a dataset from it.

Robert






Re: [smartos-discuss] Help with a crash dump (mpt_sas - maybe?)

2017-01-06 Thread Robert Mustacchi
On 1/6/17 4:28 , Adam Richmond-Gordon wrote:
> Hello,
> 
> I have opened a GitHub issue for this;
> https://github.com/joyent/illumos-joyent/issues/135 
> <https://github.com/joyent/illumos-joyent/issues/135>

Hi Adam,

Thanks for following up there. If you can contact me privately, I'll
follow up and grab the dump so we can get some eyes on it at Joyent.

Thanks,
Robert

>> On 5 Jan 2017, at 17:16, Adam Richmond-Gordon <soc...@themisanthrope.co.uk> 
>> wrote:
>>
>> Thanks Robert - I’ll get it uploaded and create an issue.
>>
>> The dump device size is set to 256GB (equal to RAM) - I’ve been gradually 
>> increasing it after every dumpless crash. I’d been considering connecting a 
>> USB disk and pointing dumpadm at that.
>>
>>> On 5 Jan 2017, at 16:52, Robert Mustacchi <r...@joyent.com 
>>> <mailto:r...@joyent.com>> wrote:
>>>
>>> On 1/5/17 6:55 , Adam Richmond-Gordon wrote:
>>>> Afternoon!
>>>>
>>>> I’ve been trying to diagnose a box that’s been crashing after anywhere 
>>>> between 3 and 30 days uptime. For a little while, I’d suspected the issue 
>>>> might be storage related, as the crashes never created a dump, but sending 
>>>> an NMI always did.
>>>>
>>>> Today, the box actually created a dump, and on first investigation, it 
>>>> appears that the mpt_sas driver may be involved. Maybe. The last message 
>>>> in the buffer points to the PCI address that the SAS controller occupies, 
>>>> but a lot of the messages before that appear to be related to KVM.
>>>>
>>>> If anyone has the time, I’d really appreciate some pointers on narrowing 
>>>> this down. I’ve not raised an issue on GitHub yet, because this could 
>>>> easily be a hardware-related issue.
>>>
>>> The stack that you have shows that something went wrong when exiting a
>>> mutex: we tried to unlock a lock that we don't own. If you
>>> can upload the dump somewhere that folks can dig into it, that'd be
>>> helpful. If you need help doing so or need a location to put it, please
>>> let me know. I'd also create a GitHub issue for it.
>>>
>>> For the crashes not creating a dump, I'd double check your dump device size.
>>>
>>> Robert
>>>
>>>> Through some poking around at the end of last year, I have also noticed 
>>>> that the onboard SAS controller (LSI/Avago 3008) isn’t running the 
>>>> firmware we specified to the VAR - not sure if this is likely to make a 
>>>> difference, but it’s easily flashed. It is currently running the IR 
>>>> firmware on an older phase, when it should really be running the recent 
>>>> IT firmware.
>>>>
>>>> Here are the potentially interesting bits from the dump;
>>>>
>>>>> ::status
>>>> debugging crash dump vmcore.0 (64-bit) from bri-triw-001
>>>> operating system: 5.11 joyent_20161208T003358Z (i86pc)
>>>> image uuid: (not set)
>>>> panic message:
>>>> mutex_exit: not owner, lp=fea460ba2020 owner=fea3b90aa460 thread=fea3f40b1be0
>>>> dump content: kernel pages only
>>>>
>>>>> $C
>>>> fe426a808fe8 vpanic()
>>>> fe426a809008 mutex_panic+0x58(fb94dc45, fea460ba2020)
>>>> fe426a809038 mutex_vector_exit+0x40(fea460ba2020)
>>>> fe426a809070 gfn_to_memslot_unaliased+0x6f()
>>>> fe426a809090 gfn_to_hva+0x27()
>>>> fe426a8090c0 kvm_read_guest_page+0x29()
>>>> fe426a809110 kvm_read_guest+0x43()
>>>> fe426a809190 paging64_walk_addr+0xef()
>>>> fe426a809230 paging64_gva_to_gpa+0x43()
>>>> fe426a809260 kvm_mmu_gva_to_gpa_read+0x45()
>>>> fe426a8092b0 emulator_read_emulated+0x7c()
>>>> fe426a809350 x86_emulate_insn+0x1af()
>>>> fe426a809390 emulate_instruction+0x1e9()
>>>> fe426a8093c0 kvm_mmu_page_fault+0x60()
>>>> fe426a8093f0 handle_ept_violation+0x111()
>>>> fe426a809430 vmx_handle_exit+0x16a()
>>>> fe426a809460 vcpu_enter_guest+0x3ea()
>>>> fe426a8094a0 __vcpu_run+0x8b()
>>>> fe426a8094e0 kvm_arch_vcpu_ioctl_run+0x112()
>>>> fe426a809cc0 kvm_ioctl+0x466()
>>>> fe426a809d00 cdev_ioctl+0x39(340068, 2000ae80, 0, 202003, fea3b7d370d0, fe426a809ea8)
>>
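The "mutex_exit: not owner" panic in the dump above means the kernel found that the thread releasing the mutex was not the thread recorded as its owner. As a user-space analogy only (this is not the illumos kernel code, and the function name is hypothetical), Python's threading.RLock performs a similar ownership check and raises RuntimeError when a thread that does not hold it calls release(); the kernel treats the same mistake as fatal and panics through mutex_panic, as the stack shows.

```python
import threading


def release_from_non_owner():
    """Acquire an RLock in this thread, then try to release it from another.

    Returns the message of the RuntimeError raised by the non-owner release,
    or None if no error was raised.
    """
    lock = threading.RLock()
    lock.acquire()  # the calling thread now owns the lock
    outcome = {}

    def intruder():
        try:
            lock.release()  # this thread never acquired the lock
        except RuntimeError as exc:  # RLock checks ownership on release
            outcome["error"] = str(exc)

    worker = threading.Thread(target=intruder)
    worker.start()
    worker.join()
    lock.release()  # the real owner releases it without complaint
    return outcome.get("error")


print(release_from_non_owner())
```

A plain threading.Lock would not catch this mistake, since it has no notion of an owner.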

Re: [smartos-discuss] USB3/XHCI troubleshooting

2017-01-06 Thread Robert Mustacchi
On 1/5/17 18:43 , Jason Lawrence wrote:
> On Mon, Jan 2, 2017, at 06:14 PM, Robert Mustacchi wrote:
>> On 1/2/17 14:16 , Jason Lawrence wrote:
>>> I've upgraded to the 20161222T003450Z build and have been having issues
>>> with getting USB 3.0 drives recognized. The drives work fine on USB 2.0
>>> ports.
>>
>> Hi Jason,
>>
>> Thanks for the report. Sorry you're seeing some trouble with this. The
>> key from the log is the bit:
>>
>> "WARNING: /pci@0,0/pci15d9,811@14 (xhci0): Connecting device on port 12
>> failed"
>>
>> This indicates that something in the connection process failed. While
>> this is a bit of a complicated state machine, I have a D script that
>> should help us figure out what's going on. Here's the D script.
>>
>> https://us-east.manta.joyent.com/rmustacc/public/tmp/connect.d
>>
>> The most useful thing to do would be to do the following:
>>
>> * Make sure drive is unplugged
>> * run the D script as: dtrace -qs connect.d -o /var/tmp/connect.out
>> * When you see the message in the system log about the connection on
>> port 12 failed, hit ctrl+c.
>>
>> If you can do this for each drive independently, that'd be appreciated.
>> If you can put each one of those files somewhere that'll help us figure
>> out what the next step is.
>>
>> Thanks,
>> Robert
> 
> Here's one of the drives. I forgot to run the script before plugging in
> the second drive, and unplugging/replugging it didn't create any output. I
> can grab it the next time I'm able to reboot if it's needed.
> 
> Syslog entries:
> 
> 2017-01-06T02:28:27.152429+00:00 smarty usba: [ID 691482 kern.warning] WARNING: /pci@0,0/pci15d9,811@14 (xhci0): Connecting device on port 12 failed
> 2017-01-06T02:29:28.350770+00:00 smarty xhci: [ID 902155 kern.info] NOTICE: xhci0: xhci stop endpoint command (2)/slot (1) in wrong state: 19
> 2017-01-06T02:29:28.350785+00:00 smarty xhci: [ID 617155 kern.info] NOTICE: xhci0: endpoint is in state 3#012
> 
> Full dtrace output here: http://pastebin.com/raw/rw6UDx2Q

Thanks for the update, Jason. Unfortunately, it appears this is failing in
a different way than I expected. It's getting far enough that scsa2usb
seems to attach and start using the drive, so the initial script that I
provided isn't going to be the most helpful. I'll try to put together
something to iterate further on this.

Thanks for all your help so far.

Robert
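The two xhci NOTICE lines in the syslog excerpt can be read against the xHCI specification, assuming (an interpretation, not something stated in the thread) that the number after "wrong state:" is the command's completion code and that "endpoint is in state 3" reports the EP State field of the endpoint context. Under that reading, the Stop Endpoint command completed with Context State Error because the endpoint was already Stopped, which appears consistent with the specification. The decoder below is a sketch: the table and function names are hypothetical and the tables cover only a handful of values.

```python
# Partial tables assumed from the xHCI specification; not exhaustive.
EP_STATES = {0: "Disabled", 1: "Running", 2: "Halted", 3: "Stopped", 4: "Error"}
COMPLETION_CODES = {
    1: "Success",
    4: "USB Transaction Error",
    5: "TRB Error",
    6: "Stall Error",
    19: "Context State Error",
}


def decode(completion_code, ep_state):
    """Render a completion code and endpoint-context state as text."""
    cc = COMPLETION_CODES.get(completion_code, f"code {completion_code} (not in this table)")
    state = EP_STATES.get(ep_state, f"state {ep_state} (reserved or unknown)")
    return f"completion: {cc}; endpoint state: {state}"


# The two NOTICE lines from the log: "wrong state: 19" and "state 3".
print(decode(19, 3))  # completion: Context State Error; endpoint state: Stopped
```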




Re: [smartos-discuss] lofiadm / zpool create inside a zone

2017-01-06 Thread Robert Mustacchi
On 1/6/17 0:54 , Matthias Goetzke wrote:
> 
> Does anyone here know whether it's possible to create a zpool on top of 
> lofiadm / mkfile inside a SmartOS and/or LX zone?
> 
> Basically I am looking at something similar to this: 
> https://getpocket.com/a/read/141369096 . I get the first steps working, but a 
> normal zone root user does not have permission to create a new pool (even if 
> it is just a local one). Can I grant that permission? If so, what security risks 
> would this expose? 
> mkdir szpools
> cd szpools
>  mkfile 1g szpool_1 szpool_2 
> 
> lofiadm -c aes-256-cbc -a szpools/szpool_1
>   Enter passphrase: 
>   Re-enter passphrase: 
>   /dev/lofi/1
> 
> lofiadm -c aes-256-cbc -a szpools/szpool_2
>   Enter passphrase: 
>   Re-enter passphrase: 
>   /dev/lofi/2
> 
> zpool create szpool raid-z2 /dev/lofi/1 /dev/lofi/2

Unfortunately your pocket link doesn't seem to work without using pocket
itself, so I'm not sure what the article is.

In general, the security issues are around the fact that ZFS trusts the
disks and doesn't handle validly checksummed but bogus on-disk data
well. As such, you could likely manipulate these virtual disks to panic
the system, or probably worse. The question I would ask is: do you trust
the contents of the zone as much as you trust the GZ?

I think you should be able to give the zone the appropriate privileges.
You'll need to grant the SYS_CONFIG privilege at least. You'll want to
review http://illumos.org/man/5/privileges for more information on what
else that grants.

Robert




Re: [smartos-discuss] fs-local is too rigorous

2017-01-06 Thread Robert Mustacchi
On 1/6/17 0:28 , Gernot Straßer wrote:
> This is regarding ./proto/lib/svc/method/fs-local
> 
> I was playing around with ZFS crypto when I found that the script will fail if
> there is an encrypted dataset waiting for passphrase input.
> 
> On line 90 it does a zfs mount -va and fails hard if there is an error,
> which causes all dependent services to fail. This includes sshd, which
> renders the system inaccessible.
> 
> My proposal would be to skip checking for mount errors, but there might be
> reasons I am not aware of.

Hi Gernot,

I think there's a tension here that is hard to resolve well today.
Mainly, how do we know whether the dataset that we couldn't mount is
required for SmartOS to function and operate or not?

We may want to consider how that interacts with sshd, but then there are
issues with host keys changing, etc. if you can't actually mount those
datasets.

I think we're going to have to rethink a lot of this when ZFS crypto
actually lands, but there are a lot of open questions about what the
interfaces should be and how this should work with encryption. If
those datasets aren't required for SmartOS to function, that's one
thing, but if they are, or if they correspond to a VM dataset that the
zones service wants to start, there are more questions.

At this point, I think we probably need to more generally rethink this
before we just make it drive on.

Robert





Re: [smartos-discuss] Creating a persistent bridge

2017-01-03 Thread Robert Mustacchi
On 1/3/17 4:49 , G B via smartos-discuss wrote:
> To create a persistent bridge would I still have to create an xml file and 
> import it with svccfg in the global zone?
> For example, to have the following persist across reboots:

Yes, if you actually need 802.1D semantics, then that's your best bet.

Robert





Re: [smartos-discuss] USB3/XHCI troubleshooting

2017-01-02 Thread Robert Mustacchi
On 1/2/17 14:16 , Jason Lawrence wrote:
> I've upgraded to the 20161222T003450Z build and have been having issues
> with getting USB 3.0 drives recognized. The drives work fine on USB 2.0
> ports.

Hi Jason,

Thanks for the report. Sorry you're seeing some trouble with this. The
key from the log is the bit:

"WARNING: /pci@0,0/pci15d9,811@14 (xhci0): Connecting device on port 12
failed"

This indicates that something in the connection process failed. While
this is a bit of a complicated state machine, I have a D script that
should help us figure out what's going on. Here's the D script.

https://us-east.manta.joyent.com/rmustacc/public/tmp/connect.d

The most useful thing to do would be to do the following:

* Make sure drive is unplugged
* run the D script as: dtrace -qs connect.d -o /var/tmp/connect.out
* When you see the message in the system log about the connection on
port 12 failed, hit ctrl+c.

If you can do this for each drive independently, that'd be appreciated.
If you can put each one of those files somewhere that'll help us figure
out what the next step is.

Thanks,
Robert




Re: [smartos-discuss] RE: [smartos-discuss] How to identify a PCIe SSD in SmartOS?

2016-12-30 Thread Robert Mustacchi
On 12/30/16 1:30 , 张俊钦 wrote:
> Thanks for your reply.
> 
> I may need to explain more about my question.
> We know how to identify disk is SSD or not.  : )
> What we want to know is the interface of the SSD is PCIe or SAS, or SATA.

We need to go back and make sure that diskinfo knows when a device is
NVMe (which generally indicates PCIe). It doesn't at the moment. I've
filed https://smartos.org/bugview/OS-5880 to track that.

In the interim, if you need to know, you can use prtconf -v and look at
the blkdev instances that are children of the nvme driver. They'll have
the device links as properties.

Robert
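As a rough sketch of the prtconf approach in the reply above, the Python function below scans the device tree printed by prtconf -D (which adds each node's driver name) and reports blkdev instances whose parent node is bound to the nvme driver. The line layout it expects (indentation reflecting nesting, and a ", instance #N (driver name: X)" suffix) is an assumption based on typical prtconf output, the function name is hypothetical, and the sample tree is made up for illustration. Mapping those instances to cXtYdZ names still requires the properties shown by prtconf -v.

```python
import re

# Assumed line shape of `prtconf -D` output, e.g.
#     pci8086,370a, instance #0 (driver name: nvme)
LINE = re.compile(
    r"^(?P<indent> *)(?P<name>.+?)"
    r"(?:, instance #(?P<inst>\d+))?"
    r"(?: \(driver name: (?P<driver>[\w.-]+)\))?\s*$"
)


def nvme_blkdev_instances(prtconf_text):
    """Return blkdev instance numbers whose parent node uses the nvme driver."""
    found = []
    stack = []  # (indent, driver) for the current chain of ancestors
    for line in prtconf_text.splitlines():
        if not line.strip():
            continue
        m = LINE.match(line)
        if m is None:
            continue
        indent = len(m.group("indent"))
        driver = m.group("driver")
        # Drop siblings and their descendants; what remains are ancestors.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        parent_driver = stack[-1][1] if stack else None
        if driver == "blkdev" and parent_driver == "nvme" and m.group("inst"):
            found.append(int(m.group("inst")))
        stack.append((indent, driver))
    return found


# Made-up sample: one NVMe-backed blkdev and one virtio-backed blkdev.
SAMPLE = """\
i86pc (driver name: rootnex)
    pci, instance #0 (driver name: npe)
        pci8086,370a, instance #0 (driver name: nvme)
            blkdev, instance #1 (driver name: blkdev)
        pci15d9,811, instance #0 (driver name: xhci)
    pci1af4,2, instance #0 (driver name: vioblk)
        blkdev, instance #2 (driver name: blkdev)
"""

print(nvme_blkdev_instances(SAMPLE))  # [1]
```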

> From: Gjermund Gusland Thorsen [mailto:gjermundpri...@gmail.com]
> Sent: December 30, 2016 15:52
> To: smartos-discuss@lists.smartos.org
> Subject: Re: [smartos-discuss] How to identify a PCIe SSD in SmartOS?
> 
> The last column of the diskinfo output you just showed says it: SSD: yes
> 
> G
> 
> On 30 Dec, 2016, at 8:00, 张俊钦 wrote:
> 
> Hi,
> 
> I have a question about PCIe SSDs.
> As you can see below, we have one SSD in each server. We added a PCIe SSD to 
> server1 and a non-PCIe SSD to server2.
> 
> [root@server1 ~]# diskinfo
> TYPE     DISK    VID      PID                 SIZE         RMV  SSD
>          c0t1d0  INTEL    SSDPEDMW012T4       1117.81 GiB  no   yes
> ATA      c2d0    -        -                   465.76 GiB   no   no
> 
> [root@server2 ~]# diskinfo
> TYPE     DISK    VID      PID                 SIZE         RMV  SSD
> UNKNOWN  c1t0d0  ATA      ST3000DM001-1ER166  2794.52 GiB  no   no
> UNKNOWN  c1t1d0  INTEL    SSDSC2BA100G3       93.16 GiB    no   yes
> 
> But the question is: how do we identify the SSD's interface type through SmartOS?
> We checked the format and prtconf commands but found nothing useful.
> 
> Could you help with this?
> Thanks.
> 
> B.R.
> Junqin Zhang
> 




Re: [smartos-discuss] USB 3.0

2016-12-29 Thread Robert Mustacchi
On 12/29/16 8:29 , Robert Mustacchi wrote:
> On 12/29/16 8:25 , Ante Vojvodic wrote:
>> Hi Robert,
>>
>> Sorry I forgot to include BIOS details in previous mail. BIOS revision is
>> 1.0a (Build date: 01/29/2016) and I think that is the only BIOS revision
>> for this mbo.
> 
> Thanks so much, I'll start digging in and reaching out to SMCI so we can
> figure out what's going on. I appreciate all your help so far. Though
> because we're in the holidays it may end up being a bit slower to get
> there than I'd like.

I heard back from SuperMicro about this. They say that this BIOS option
is basically a workaround for Windows 7 and shouldn't be enabled by
default. Their only real comment was that systems that aren't running
Windows 7 shouldn't enable it, and unfortunately there doesn't seem to be
a good way for us to detect this at run time.

Sorry I don't have better news on this one. Thanks for all your help in
digging through this, Ante.

Robert




Re: [smartos-discuss] USB 3.0

2016-12-29 Thread Robert Mustacchi
On 12/29/16 8:25 , Ante Vojvodic wrote:
> Hi Robert,
> 
> Sorry I forgot to include BIOS details in previous mail. BIOS revision is
> 1.0a (Build date: 01/29/2016) and I think that is the only BIOS revision
> for this mbo.

Thanks so much, I'll start digging in and reaching out to SMCI so we can
figure out what's going on. I appreciate all your help so far. Because
of the holidays, though, it may take a bit longer to get there than
I'd like.

Robert

> On Thu, Dec 29, 2016 at 4:00 PM, Robert Mustacchi <r...@joyent.com> wrote:
> 
>> On 12/29/16 3:38 , Ante Vojvodic wrote:
>>> Hi Robert,
>>>
>>> I had some progress with debugger. Here are steps for similar situations in
>>> the future.
>>>
>>> Start kernel with debugger
>>> Once we hit kmdb enter: 'moddebug/W 0xf001' and then just keep going
>>> with ':c' until "apix" module is loaded.
>>> Now we need to disable panic on nmi and enable kmdb on nmi event. Enter
>>> following command in debugger: 'apic_panic_on_nmi/w0' and
>>> 'apic_kmdb_on_nmi/w1'.
>>> And keep going with ':c' until kernel hangs.
>>>
>>> I got hang after "Releasing drv/kb8042", and now when I issue nmi i get
>>> debugger prompt instead of system panic.
>>>
>>> Here is output from debugger:
>>> https://us-east.manta.joyent.com/avojvodic/public/kmdb_kb8042.txt
>>
>> Hi Ante,
>>
>> Thanks for putting that together, that helps confirm what operations
>> we're doing when we hang. I'll dig from here, but if you could also let
>> me know what the SMCI bios revision is, that'd help.
>>
>> Thanks,
>> Robert
>>
>>> On Thu, Dec 29, 2016 at 2:11 AM, Robert Mustacchi <r...@joyent.com> wrote:
>>>
>>>> Hi Ante,
>>>>
>>>> I have nothing fast I can suggest. So I'm going to reach out to SMCI,
>>>> but could you perhaps send me your current BIOS revision so I can
>>>> include that information when I reach out to them?
>>>>
>>>> Thanks,
>>>> Robert
>>>>
>>>
>>>
>>
>>
> 
> 




Re: [smartos-discuss] USB 3.0

2016-12-29 Thread Robert Mustacchi
On 12/29/16 3:38 , Ante Vojvodic wrote:
> Hi Robert,
> 
> I made some progress with the debugger. Here are the steps, for similar
> situations in the future.
> 
> Start the kernel with the debugger.
> Once we hit kmdb, enter 'moddebug/W 0xf001' and then just keep going
> with ':c' until the "apix" module is loaded.
> Now we need to disable panic on NMI and enable kmdb on NMI. Enter the
> following commands in the debugger: 'apic_panic_on_nmi/w0' and
> 'apic_kmdb_on_nmi/w1'.
> Then keep going with ':c' until the kernel hangs.
> 
> I got a hang after "Releasing drv/kb8042", and now when I issue an NMI I get
> the debugger prompt instead of a system panic.
> 
> Here is output from debugger:
> https://us-east.manta.joyent.com/avojvodic/public/kmdb_kb8042.txt

Hi Ante,

Thanks for putting that together, that helps confirm what operations
we're doing when we hang. I'll dig from here, but if you could also let
me know what the SMCI bios revision is, that'd help.

Thanks,
Robert

> On Thu, Dec 29, 2016 at 2:11 AM, Robert Mustacchi <r...@joyent.com> wrote:
> 
>> Hi Ante,
>>
>> I have nothing fast I can suggest. So I'm going to reach out to SMCI,
>> but could you perhaps send me your current BIOS revision so I can
>> include that information when I reach out to them?
>>
>> Thanks,
>> Robert
>>
> 
> 




Re: [smartos-discuss] USB 3.0

2016-12-28 Thread Robert Mustacchi
Hi Ante,

I don't have a quick fix to suggest, so I'm going to reach out to SMCI.
Could you send me your current BIOS revision so I can include that
information when I reach out to them?

Thanks,
Robert

On 12/28/16 8:50 , Ante Vojvodic wrote:
> Hi Robert,
> 
> I tried to send break, but kernel is stuck. Can't get to debugger.
> SOL output:
> load 'drv/kb8042' id 97 loaded @ 0xf7c5f000/0xc0001078 size 10360/3464
> installing kb8042, module id 97.
> ~B [send break]
> ~B [send break]
> ~B [send break]
> 
> 
> Do you have any other idea to try?
> 
> ---
> Ante
> 
> 
> On Wed, Dec 28, 2016 at 4:34 PM, Robert Mustacchi <r...@joyent.com> wrote:
> 
>> Hi Ante,
>>
>> Sorry, I hadn't realized you had booted with -kd. Normally when you
>> inject an nmi that works just fine and it traps into kmdb. Hmm. May be
>> worth just trying to inject a normal break (if on the ipmi serial you
>> can usually use ~b to get into it). If that doesn't work, let me know
>> and I'll try and put together some more involved instructions on
>> breaking and get some time to dig further into the stack trace to see if
>> I might be able to manually put together what I/O port we're writing to
>> so I can follow up with SMCI.
>>
>> Robert
>>
> 
> 




Re: [smartos-discuss] USB 3.0

2016-12-28 Thread Robert Mustacchi
On 12/28/16 8:25 , Ante Vojvodic wrote:
> Hi Robert,
> 
> Problem is as I wrote in my previous mail that when I issue NMI (chassis
> power diag), Instead of entering kmdb (SmartoS is booted with kmdb) I get
> panic with same stack trace as in previous mail and system gets rebooted.

Hi Ante,

Sorry, I hadn't realized you had booted with -kd. Normally, injecting an
NMI works just fine and traps into kmdb. Hmm. It may be worth trying to
inject a normal break instead (on the IPMI serial console you can usually
use ~B to get into it). If that doesn't work, let me know and I'll put
together some more involved instructions for breaking in, and find some
time to dig further into the stack trace to see if I can manually work
out which I/O port we're writing to, so I can follow up with SMCI.

Robert

> On Wed, Dec 28, 2016 at 3:32 PM, Robert Mustacchi <r...@joyent.com> wrote:
> 
>> Hi Ante,
>>
>> The best thing to do here is to select the kmdb boot option from the
>> bios menu. This will then let you end up in the debugger when you take
>> the nmi. When you're at the first prompt, you'll want to use ':c' to
>> continue.
>>
>> At the nmi, the most useful things to get started would be the following
>> commands:
>>
>> '$C'
>> '::stacks -m xhci'
>> '::stacks -m usba'
>> '::prtusb'
>>
>> Once we get that, I'll see if we can dig further into what SMCI is doing
>> with that BIOS option as well and reach out to them.
>>
> 
> 




Re: [smartos-discuss] USB 3.0

2016-12-28 Thread Robert Mustacchi
On 12/28/16 3:10 , Ante Vojvodic wrote:
> Hi Robert,
> 
> Here is the output from kmdb after sending an NMI. I can't get a core dump; is this
> too early in boot, before the dump device is configured?
> And is there an option I'm not aware of to disable the reboot and get kmdb?
> How do I set apic_panic_on_nmi=0 and apic_kmdb_on_nmi=1 early in boot?

Hi Ante,

The best thing to do here is to select the kmdb boot option from the
boot menu. This will let you end up in the debugger when you take
the NMI. When you're at the first prompt, you'll want to use ':c' to
continue.

At the nmi, the most useful things to get started would be the following
commands:

'$C'
'::stacks -m xhci'
'::stacks -m usba'
'::prtusb'

Once we get that, I'll see if we can dig further into what SMCI is doing
with that BIOS option as well and reach out to them.

Thanks,
Robert

> installing kb8042, module id 97.
> 
> panic[cpu0]/thread=fbc31de0: NMI received
> 
> Warning - stack not written to the dump buffer
> fbc73d80 f792970f ()
> fbc73db0 unix:av_dispatch_nmivect+34 ()
> fbc73dc0 unix:nmiint+152 ()
> fbc73f00 unix:ddi_io_get8+f ()
> fbc73f90 kb8042:kb8042_send_and_wait+b5 ()
> fbc73fd0 kb8042:kb8042_send_to_keyboard+b6 ()
> fbc74000 kb8042:kb8042_init+54 ()
> fbc74040 kb8042:kb8042_attach+205 ()
> fbc740b0 genunix:devi_attach+92 ()
> fbc740f0 genunix:attach_node+a7 ()
> fbc74140 genunix:i_ndi_config_node+7d ()
> fbc74170 genunix:i_ddi_attachchild+48 ()
> fbc741b0 genunix:devi_attach_node+5e ()
> fbc742b0 genunix:devi_config_one+294 ()
> fbc74330 genunix:ndi_busop_bus_config+c1 ()
> fbc743c0 i8042:i8042_bus_config+a0 ()
> fbc74430 genunix:ndi_devi_config_one+186 ()
> fbc74510 genunix:resolve_pathname+155 ()
> fbc74540 genunix:ddi_pathname_to_dev_t+19 ()
> fbc74570 consconfig_dacf:consconfig_load_drivers+a3 ()
> fbc74590 consconfig_dacf:dynamic_console_config+a5 ()
> fbc745a0 consconfig:consconfig+9 ()
> fbc745f0 unix:stubs_common_code+59 ()
> fbc74630 genunix:main+1e1 ()
> fbc74640 unix:_locore_start+90 ()
> 
> skipping system dump - no dump device configured
> 
> 
> 
> Next, I tried booting the system with "disable-kb8042=true" in the kernel
> args. After the system booted OK, I tried to manually load the kb8042 driver
> in order to get a core dump, but:
> 
> $ modload /kernel/drv/amd64/kb8042
> can't load module: No such device or address
> 
> Any further suggestions?
> 
> On Wed, Dec 28, 2016 at 3:42 AM, Robert Mustacchi <r...@joyent.com> wrote:
> 
>>
>> Do you have ipmi setup on that system? While we have had success on
>> other SMCI X11 boards, that failure is somewhat newer. If you do have
>> IPMI, would it be possible to boot to kmdb and then after it hangs,
>> inject an nmi? If it is, there are a couple of additional diagnostics
>> that'd be worth collecting so we can root cause what's going on.
>>
>> Thanks,
>> Robert
>>
>>
> 
> 




Re: [smartos-discuss] USB 3.0

2016-12-27 Thread Robert Mustacchi
On 12/27/16 10:08 , Ante Vojvodic wrote:
> Just a heads-up for anyone upgrading to the new platform image with USB 3.0 support.
> On one of my systems (Supermicro X11SSL-CF), after upgrading to the newer platform
> with USB 3 support, the system wouldn't boot anymore.
> After running with the kernel debugger on, I saw that the kernel would just hang
> on: init 'drv/kb8042'.
> The solution was to set the BIOS option "Install Windows 7 USB Support" to
> "Disabled".

Do you have IPMI set up on that system? While we have had success on
other SMCI X11 boards, that failure is somewhat new. If you do have
IPMI, would it be possible to boot into kmdb and then, after it hangs,
inject an NMI? If so, there are a couple of additional diagnostics
worth collecting so we can root-cause what's going on.

Thanks,
Robert




Re: [smartos-discuss] Machines with Xeon D-1521 randomly locking up

2016-12-26 Thread Robert Mustacchi
On 12/24/16 13:50 , Micky wrote:
> I have two machines with Xeon D-1521 processors randomly locking up (with no
> load).
> 
> The Supermicro Java iKVM doesn't respond to any input.
> 
> Unfortunately, I currently don't have any means to send an NMI or a
> chassis diag to force a crash dump.
> 
> Has anyone else noticed the same crashes on Xeon D?

We haven't had any other reports of behavior like this. If you can't rig
up IPMI (though it should use the same IP as the iKVM), you may want to
look at setting snooping with kmdb or similar. It's hard to say whether
or not that'll work in this case or if it'll catch what's going on.

Robert




Re: [smartos-discuss] vyos and opnsense performance under smartos

2016-12-19 Thread Robert Mustacchi
On 12/14/16 19:14 , Greg Treantos wrote:
> I wanted to test OPNsense and VyOS under SmartOS. They both installed fine,
> but the performance is terrible: throughput is 78 Mb/s using iperf3 -b 0 from
> a host on the LAN side to a host on the WAN side. Both are on 1 Gb networks.
> If I run iperf from inside VyOS out to the LAN, I see 500 Mb/s, and from VyOS
> to the WAN about 450 Mb/s. However, from the LAN to VyOS I see 78 Mb/s, and
> from the WAN to VyOS I see 60 Mb/s. So the issue is with inbound traffic. CPU
> utilization on the VM is around 12% during the test.
> 
> You can see the VM config here: http://pastebin.com/nMfLDyRW . The system
> hosting SmartOS is a Partaker mini computer with an Intel i5 processor and
> Intel NICs. Bare-metal VyOS gets approximately 925 Mb/s, and OPNsense (a
> pfSense fork) sees 725 Mb/s.
> 
> Can anyone help me troubleshoot the issue?

It may be useful to go through this and look at the general resource
usage of the system following the USE method. For example, have you
allocated all the CPU, etc., that the guest needs? Perhaps also look at
kvmstat to start getting a sense of what the system is doing.

Robert




