Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-05-11 Thread Larry Chen
Hi Daniel,

On 04/12/2018 08:20 PM, Daniel Sobe wrote:
> Hi Larry,
>
> this is, in a nutshell, what I do to create a LXC container as "ordinary 
> user":
>
> * Install the LXC packages from the distribution
> * run the command "lxc-create -n test1 -t download"
> ** first run might prompt you to generate a ~/.config/lxc/default.conf to 
> define UID mappings
> ** in a corporate environment it might be tricky to set the http_proxy (and 
> maybe even https_proxy) environment variables correctly
> ** once the list of images is shown, select for instance "debian" "jessie" 
> "amd64"
> * the container downloads to ~/.local/share/lxc/
> * adapt the "config" file in that directory to add the shared ocfs2 mount 
> like in my example below
> * if you're lucky, then "lxc-start -d -n test1" already works, which you can 
> confirm by "lxc-ls --fancy", and attach to the container with "lxc-attach -n 
> test1"
> ** if you want to finally enable networking, most distributions arrange a 
> dedicated bridge (lxcbr0) which you can configure similar to my example below
> ** in my case I had to install cgroup related tools and reboot to have all 
> cgroups available, and to allow use of lxcbr0 bridge in /etc/lxc/lxc-usernet
>
> Now if you access the mount-shared OCFS2 file system from within several 
> containers, the bug will (hopefully) trigger on your side as well. 
> Unfortunately, I don't know the conditions under which this occurs.
>
> Regards,
>
> Daniel
>
>
> -----Original Message-----
> From: Larry Chen [mailto:lc...@suse.com]
> Sent: Thursday, 12 April 2018 11:20
> To: Daniel Sobe 
> Subject: Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels
>
> Hi Daniel,
>
> Quite an interesting issue.
>
> I'm not familiar with lxc tools, so it may take some time to reproduce it.
>
> Do you have a script to build up your lxc environment?
> Because I want to make sure that my environment is quite the same as yours.
>
> Thanks,
> Larry
>
>
> On 04/12/2018 03:45 PM, Daniel Sobe wrote:
>> Hi Larry,
>>
>> Not sure if it helps, but the issue did not occur with Debian 8 and kernel 
>> 3.16, though that is a long stretch of history. Unfortunately, the only 
>> machine on which I could try to bisect does not run any kernel < 4.16 
>> without other issues ☹
>>
>> Regards,
>>
>> Daniel
>>
>>
>> -----Original Message-----
>> From: Larry Chen [mailto:lc...@suse.com]
>> Sent: Thursday, 12 April 2018 05:17
>> To: Daniel Sobe ; ocfs2-devel@oss.oracle.com
>> Subject: Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels
>>
>> Hi Daniel,
>>
>> Thanks for your report.
>> I'll try to reproduce this bug as you did.
>>
>> I'm afraid there may be some bugs in the interaction between cgroups and ocfs2.
>>
>> Thanks
>> Larry
>>
>>
>> On 04/11/2018 08:24 PM, Daniel Sobe wrote:
>>> Hi Larry,
>>>
>>> Below is an example config file like the one I use for LXC containers. I 
>>> followed the instructions (https://wiki.debian.org/LXC), downloaded a 
>>> Debian 8 container as an unprivileged user, and adapted the config file. 
>>> Several of those containers run on one host and share the OCFS2 directory, 
>>> as you can see in the "lxc.mount.entry" line.
>>>
>>> Meanwhile I'm testing whether the problem can be reproduced with shared 
>>> mounts in one namespace, as you suggested. So far without success; I will 
>>> report once anything happens.
>>>
>>> Regards,
>>>
>>> Daniel
>>>
>>> 
>>>
>>> # Distribution configuration
>>> lxc.include = /usr/share/lxc/config/debian.common.conf
>>> lxc.include = /usr/share/lxc/config/debian.userns.conf
>>> lxc.arch = x86_64
>>>
>>> # Container specific configuration
>>> lxc.id_map = u 0 624288 65536
>>> lxc.id_map = g 0 624288 65536
>>>
>>> lxc.utsname = c
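
The container setup steps quoted at the top of this message can be sketched as a short script. This is a minimal sketch, not Daniel's actual procedure: the container name "test1" and the debian/jessie/amd64 image come from the quoted steps, while the `run` helper, the `DRY_RUN` switch and the non-interactive template options after `--` are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute them.
NAME="${NAME:-test1}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create the container from the download template. The options after "--"
# go to the template and select the image without the interactive list.
run lxc-create -n "$NAME" -t download -- -d debian -r jessie -a amd64

# Start it in the background, list the containers, then attach a shell.
run lxc-start -d -n "$NAME"
run lxc-ls --fancy
run lxc-attach -n "$NAME"
```

The shared OCFS2 mount still has to be added to the container's config file before the start step, as the quoted steps describe.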

Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-04-13 Thread Daniel Sobe
Hi Changwei,

I will try to reproduce the crash once again, once I properly understand how 
to create the reports you are asking for.

I installed the "crash" tool for Debian Stretch now (7.1.7-1). I have looked at 
the man page, and tried to start the tool, but I have to admit that I did not 
succeed.

Can you give me a pointer as to how the tool is used, and what type of 
information you are asking for (are these global symbols, and do they point to 
a primitive data type or are they structs, are they single symbols or are they 
lists/arrays/...)?

Regards,

Daniel
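
As a possible starting point for the question above: in the kernel sources, ocfs2_lock_res, dlm_lock_resource and dlm_ctxt are structure types rather than global symbols, and crash can print their layout or dump an instance at a given address. The following is a minimal sketch under stated assumptions: the debug kernel package matching the running kernel is installed at the usual Debian path, crash can locate the module debug data on its own, and `LOCKRES_ADDR` is a hypothetical placeholder for an address you have found yourself.

```shell
#!/bin/sh
# Sketch only. Assumptions: crash and the matching debug kernel package
# (e.g. linux-image-$(uname -r)-dbg on Debian) are installed.
VMLINUX="/usr/lib/debug/boot/vmlinux-$(uname -r)"
CMDS="${TMPDIR:-/tmp}/ocfs2-crash-cmds.txt"

# Commands for crash: load module debug data, then print the struct layouts.
cat > "$CMDS" <<'EOF'
mod -s ocfs2
mod -s ocfs2_dlm
struct ocfs2_lock_res
struct dlm_lock_resource
struct dlm_ctxt
EOF

# With an address, "struct <type> <address>" dumps that instance's fields.
# LOCKRES_ADDR is a hypothetical variable, not something crash defines.
if [ -n "$LOCKRES_ADDR" ]; then
    echo "struct ocfs2_lock_res $LOCKRES_ADDR" >> "$CMDS"
fi
echo "exit" >> "$CMDS"

# Without a dump file argument, crash inspects the running kernel (needs root).
if command -v crash >/dev/null 2>&1 && [ -r "$VMLINUX" ]; then
    crash -i "$CMDS" "$VMLINUX"
else
    echo "crash or $VMLINUX not found; commands saved in $CMDS"
fi
```

If `mod -s` cannot find the module debug data by itself, it accepts the path to the object file as a further argument.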


-----Original Message-----
From: Changwei Ge [mailto:ge.chang...@h3c.com] 
Sent: Friday, 13 April 2018 03:54
To: Daniel Sobe ; ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

Hi Daniel,

It's not easy to analyze your problem unless you can provide the 
*ocfs2_lock_res*, *dlm_lock_resource* and *dlm_ctxt* structures through the 
_crash_ tool.

Thanks,
Changwei

On 2018/4/11 17:46, Daniel Sobe wrote:
> Hi,
> 
> having used OCFS2 successfully for a while using Debian 8 with its default 
> kernel "3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1 (2018-01-08)", I'm now 
> facing issues trying to accomplish the same with newer kernels and Debian 9. 
> Below are the problems that occur, they seem to be the same although the 
> kernel is different.
> 
> One trace is from the stock kernel of Debian 9 (at that time), the other is 
> from a very fresh kernel (4.16-rc6). In the latter case, the OOM killer was 
> triggered "shortly" before the bug appeared - it may be related. The call 
> trace is appended below.
> 
> In both cases, only one machine was active. The cluster is configured for 2 
> machines, but the cluster is not even configured yet on the 2nd system. Only 
> one OCFS2 file system was mounted, and the mount was shared to several 
> namespaces (using LXC). Although the mount was R/W, the users/containers only 
> read from this file system.
> 
> Please let me know what I can do to get rid of this issue. I can provide more 
> information about my use case if required.
> 
> I already posted to ocfs2-users, only then I saw that it is now recommended 
> to post bugs on ocfs2-devel.
> 
> Regards,
> 
> Daniel
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707568] [ cut here 
> ]
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707600] kernel BUG at 
> /build/linux-YDazDa/linux-4.9.82/fs/ocfs2/dlmglue.c:825!
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707635] invalid opcode:  [#1] SMP
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707654] Modules linked in: appletalk 
> ax25 ipx p8023 p8022 psnap veth ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2 
> ocfs2_nodemanager configfs ocfs2_stackglue quota_tree nls_ut
> 
> f8 cifs sha256_ssse3 cmac md4 des_generic arc4 dns_resolver fscache 
> iptable_filter bridge stp llc bonding fuse intel_rapl sb_edac edac_core 
> x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_suppor
> 
> t kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
> nls_ascii nls_cp437 vfat fat intel_cstate intel_uncore intel_rapl_perf 
> efi_pstore efivars pcspkr mgag200 ttm sg drm_kms_helper lpc_ich mfd
> 
> _core drm i2c_algo_bit hpwdt hpilo ioatdma evdev dca shpchp wmi ipmi_si 
> acpi_power_meter ipmi_msghandler pcc_cpufreq button drbd lru_cache libcrc32c 
> efivarfs ip_tables x_tables autofs4 uas usb_storage
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708060]  ext4 crc16 jbd2 
> crc32c_generic fscrypto ecb mbcache dm_mod sd_mod crc32c_intel aesni_intel 
> aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd xhci_pci uhci_hcd e
> 
> hci_pci xhci_hcd ehci_hcd i2c_i801 i2c_smbus i40e tg3 hpsa usbcore ptp 
> scsi_transport_sas usb_common pps_core libphy scsi_mod [last unloaded: 
> configfs]
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708231] CPU: 24 PID: 64700 Comm: perl 
> Not tainted 4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708268] Hardware name: HP ProLiant 
> DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708304] task: 990fda6ef100 
> task.stack: b62f36464000
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708331] RIP: 
> 0010:[]  [] 
> __ocfs2_cluster_unlock.isra.36+0x9d/0xb0 [ocfs2]
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708422] RSP: 0018:b62f36467b38  
> EFLAGS: 00010046
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708447] RAX: 0292 RBX: 
> 990fda6c5618 RCX: 0001
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708479] RDX:  RSI: 
> 0001 RDI: 990fda6c5694
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708510

Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-04-12 Thread Changwei Ge
Hi Daniel,

It's not easy to analyze your problem unless you can provide the 
*ocfs2_lock_res*, *dlm_lock_resource* and *dlm_ctxt* structures through the 
_crash_ tool.

Thanks,
Changwei

On 2018/4/11 17:46, Daniel Sobe wrote:
> Hi,
> 
> having used OCFS2 successfully for a while using Debian 8 with its default 
> kernel “3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1 (2018-01-08)”, I’m now 
> facing issues trying to accomplish the same with newer kernels and Debian 9. 
> Below are the problems that occur, they seem to be the same although the 
> kernel is different.
> 
> One trace is from the stock kernel of Debian 9 (at that time), the other is 
> from a very fresh kernel (4.16-rc6). In the latter case, the OOM killer was 
> triggered “shortly” before the bug appeared – it may be related. The call 
> trace is appended below.
> 
> In both cases, only one machine was active. The cluster is configured for 2 
> machines, but the cluster is not even configured yet on the 2nd system. Only 
> one OCFS2 file system was mounted, and the mount was shared to several 
> namespaces (using LXC). Although the mount was R/W, the users/containers only 
> read from this file system.
> 
> Please let me know what I can do to get rid of this issue. I can provide more 
> information about my use case if required.
> 
> I already posted to ocfs2-users, only then I saw that it is now recommended 
> to post bugs on ocfs2-devel.
> 
> Regards,
> 
> Daniel
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707568] [ cut here 
> ]
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707600] kernel BUG at 
> /build/linux-YDazDa/linux-4.9.82/fs/ocfs2/dlmglue.c:825!
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707635] invalid opcode:  [#1] SMP
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707654] Modules linked in: appletalk 
> ax25 ipx p8023 p8022 psnap veth ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2 
> ocfs2_nodemanager configfs ocfs2_stackglue quota_tree nls_ut
> 
> f8 cifs sha256_ssse3 cmac md4 des_generic arc4 dns_resolver fscache 
> iptable_filter bridge stp llc bonding fuse intel_rapl sb_edac edac_core 
> x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt iTCO_vendor_suppor
> 
> t kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel 
> nls_ascii nls_cp437 vfat fat intel_cstate intel_uncore intel_rapl_perf 
> efi_pstore efivars pcspkr mgag200 ttm sg drm_kms_helper lpc_ich mfd
> 
> _core drm i2c_algo_bit hpwdt hpilo ioatdma evdev dca shpchp wmi ipmi_si 
> acpi_power_meter ipmi_msghandler pcc_cpufreq button drbd lru_cache libcrc32c 
> efivarfs ip_tables x_tables autofs4 uas usb_storage
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708060]  ext4 crc16 jbd2 
> crc32c_generic fscrypto ecb mbcache dm_mod sd_mod crc32c_intel aesni_intel 
> aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd xhci_pci uhci_hcd e
> 
> hci_pci xhci_hcd ehci_hcd i2c_i801 i2c_smbus i40e tg3 hpsa usbcore ptp 
> scsi_transport_sas usb_common pps_core libphy scsi_mod [last unloaded: 
> configfs]
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708231] CPU: 24 PID: 64700 Comm: perl 
> Not tainted 4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708268] Hardware name: HP ProLiant 
> DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708304] task: 990fda6ef100 
> task.stack: b62f36464000
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708331] RIP: 
> 0010:[]  [] 
> __ocfs2_cluster_unlock.isra.36+0x9d/0xb0 [ocfs2]
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708422] RSP: 0018:b62f36467b38  
> EFLAGS: 00010046
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708447] RAX: 0292 RBX: 
> 990fda6c5618 RCX: 0001
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708479] RDX:  RSI: 
> 0001 RDI: 990fda6c5694
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708510] RBP: 0003 R08: 
> 0101 R09: 
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708541] R10: 0038 R11: 
> 007c R12: 990fda6c5694
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708572] R13: 991bb0f76000 R14: 
>  R15: c0ba5080
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708604] FS:  () 
> GS:991bbea8(0063) knlGS:f7462700
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708640] CS:  0010 DS: 002b ES: 002b 
> CR0: 80050033
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708666] CR2: ff60 CR3: 
> 00341a7b6000 CR4: 00360670
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708697] DR0:  DR1: 
>  DR2: 
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708728] DR3:  DR6: 
> fffe0ff0 DR7: 0400
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708759] Stack:
> 
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708771]  c0b12b45 
>  99101a537300 

Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-04-12 Thread Daniel Sobe
Hi Larry,

Not sure if it helps, but the issue did not occur with Debian 8 and kernel 
3.16, though that is a long stretch of history. Unfortunately, the only machine 
on which I could try to bisect does not run any kernel < 4.16 without other 
issues ☹

Regards,

Daniel


-----Original Message-----
From: Larry Chen [mailto:lc...@suse.com] 
Sent: Thursday, 12 April 2018 05:17
To: Daniel Sobe ; ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

Hi Daniel,

Thanks for your report.
I'll try to reproduce this bug as you did.

I'm afraid there may be some bugs in the interaction between cgroups and ocfs2.

Thanks
Larry


On 04/11/2018 08:24 PM, Daniel Sobe wrote:
> Hi Larry,
>
> Below is an example config file like the one I use for LXC containers. I 
> followed the instructions (https://wiki.debian.org/LXC), downloaded a 
> Debian 8 container as an unprivileged user, and adapted the config file. 
> Several of those containers run on one host and share the OCFS2 directory, 
> as you can see in the "lxc.mount.entry" line.
>
> Meanwhile I'm testing whether the problem can be reproduced with shared 
> mounts in one namespace, as you suggested. So far without success; I will 
> report once anything happens.
>
> Regards,
>
> Daniel
>
> 
>
> # Distribution configuration
> lxc.include = /usr/share/lxc/config/debian.common.conf
> lxc.include = /usr/share/lxc/config/debian.userns.conf
> lxc.arch = x86_64
>
> # Container specific configuration
> lxc.id_map = u 0 624288 65536
> lxc.id_map = g 0 624288 65536
>
> lxc.utsname = container1
> lxc.rootfs = /storage/uvirtuals/unpriv/container1/rootfs
>
> lxc.network.type = veth
> lxc.network.flags = up
> lxc.network.link = bridge1
> lxc.network.name = eth0
> lxc.network.veth.pair = aabbccddeeff
> lxc.network.ipv4 = XX.XX.XX.XX/YY
> lxc.network.ipv4.gateway = ZZ.ZZ.ZZ.ZZ
>
> lxc.cgroup.cpuset.cpus = 63-86
>
> lxc.mount.entry = /storage/ocfs2/swswnone bind 0 0
>
> lxc.cgroup.memory.limit_in_bytes   = 240G
> lxc.cgroup.memory.memsw.limit_in_bytes = 240G
>
> lxc.include = /usr/share/lxc/config/common.conf.d/00-lxcfs.conf
>
> ----
>
>
>
>
> -----Original Message-----
> From: Larry Chen [mailto:lc...@suse.com]
> Sent: Wednesday, 11 April 2018 13:31
> To: Daniel Sobe ; ocfs2-devel@oss.oracle.com
> Subject: Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels
>
>
>
> On 04/11/2018 07:17 PM, Daniel Sobe wrote:
>> Hi Larry,
>>
>> this is what I was doing. The 2nd node, while being "declared" in the 
>> cluster.conf, does not exist yet, and thus everything was happening on one 
>> node only.
>>
>> I do not know in detail how LXC does the mount sharing, but I assume it 
>> simply calls "mount --bind /original/mount/point /new/mount/point" in a 
>> separate namespace (or, somehow unshares the mount from the original 
>> namespace afterwards).
> I thought there was a way to share a directory between the host and a docker 
> container, like
>      docker run -v /host/directory:/container/directory -other -options 
> image_name command_to_run
> That's different from yours.
>
> How did you set up your lxc or container?
>
> If you could show me the procedure, I'll try to reproduce it.
>
> And by the way, if you get rid of lxc and just mount ocfs2 on several 
> different mount points on the local host, will the problem recur?
>
> Regards,
> Larry
>> Regards,
>>
>> Daniel
>>


___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel

Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-04-11 Thread Larry Chen
Hi Daniel,

Thanks for your report.
I'll try to reproduce this bug as you did.

I'm afraid there may be some bugs in the interaction between cgroups and ocfs2.

Thanks
Larry


On 04/11/2018 08:24 PM, Daniel Sobe wrote:
> Hi Larry,
>
> Below is an example config file like the one I use for LXC containers. I 
> followed the instructions (https://wiki.debian.org/LXC), downloaded a 
> Debian 8 container as an unprivileged user, and adapted the config file. 
> Several of those containers run on one host and share the OCFS2 directory, 
> as you can see in the "lxc.mount.entry" line.
>
> Meanwhile I'm testing whether the problem can be reproduced with shared 
> mounts in one namespace, as you suggested. So far without success; I will 
> report once anything happens.
>
> Regards,
>
> Daniel
>
> 
>
> # Distribution configuration
> lxc.include = /usr/share/lxc/config/debian.common.conf
> lxc.include = /usr/share/lxc/config/debian.userns.conf
> lxc.arch = x86_64
>
> # Container specific configuration
> lxc.id_map = u 0 624288 65536
> lxc.id_map = g 0 624288 65536
>
> lxc.utsname = container1
> lxc.rootfs = /storage/uvirtuals/unpriv/container1/rootfs
>
> lxc.network.type = veth
> lxc.network.flags = up
> lxc.network.link = bridge1
> lxc.network.name = eth0
> lxc.network.veth.pair = aabbccddeeff
> lxc.network.ipv4 = XX.XX.XX.XX/YY
> lxc.network.ipv4.gateway = ZZ.ZZ.ZZ.ZZ
>
> lxc.cgroup.cpuset.cpus = 63-86
>
> lxc.mount.entry = /storage/ocfs2/swswnone bind 0 0
>
> lxc.cgroup.memory.limit_in_bytes   = 240G
> lxc.cgroup.memory.memsw.limit_in_bytes = 240G
>
> lxc.include = /usr/share/lxc/config/common.conf.d/00-lxcfs.conf
>
> ----
>
>
>
>
> -----Original Message-----
> From: Larry Chen [mailto:lc...@suse.com]
> Sent: Wednesday, 11 April 2018 13:31
> To: Daniel Sobe ; ocfs2-devel@oss.oracle.com
> Subject: Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels
>
>
>
> On 04/11/2018 07:17 PM, Daniel Sobe wrote:
>> Hi Larry,
>>
>> this is what I was doing. The 2nd node, while being "declared" in the 
>> cluster.conf, does not exist yet, and thus everything was happening on one 
>> node only.
>>
>> I do not know in detail how LXC does the mount sharing, but I assume it 
>> simply calls "mount --bind /original/mount/point /new/mount/point" in a 
>> separate namespace (or, somehow unshares the mount from the original 
>> namespace afterwards).
> I thought there was a way to share a directory between the host and a docker 
> container, like
>      docker run -v /host/directory:/container/directory -other -options 
> image_name command_to_run
> That's different from yours.
>
> How did you set up your lxc or container?
>
> If you could show me the procedure, I'll try to reproduce it.
>
> And by the way, if you get rid of lxc and just mount ocfs2 on several 
> different mount points on the local host, will the problem recur?
>
> Regards,
> Larry
>> Regards,
>>
>> Daniel
>>



Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-04-11 Thread Daniel Sobe
Hi Larry,

Below is an example config file like the one I use for LXC containers. I 
followed the instructions (https://wiki.debian.org/LXC), downloaded a 
Debian 8 container as an unprivileged user, and adapted the config file. 
Several of those containers run on one host and share the OCFS2 directory, 
as you can see in the "lxc.mount.entry" line.

Meanwhile I'm testing whether the problem can be reproduced with shared mounts 
in one namespace, as you suggested. So far without success; I will report once 
anything happens.

Regards,

Daniel



# Distribution configuration
lxc.include = /usr/share/lxc/config/debian.common.conf
lxc.include = /usr/share/lxc/config/debian.userns.conf
lxc.arch = x86_64

# Container specific configuration
lxc.id_map = u 0 624288 65536
lxc.id_map = g 0 624288 65536

lxc.utsname = container1
lxc.rootfs = /storage/uvirtuals/unpriv/container1/rootfs

lxc.network.type = veth
lxc.network.flags = up
lxc.network.link = bridge1
lxc.network.name = eth0
lxc.network.veth.pair = aabbccddeeff
lxc.network.ipv4 = XX.XX.XX.XX/YY
lxc.network.ipv4.gateway = ZZ.ZZ.ZZ.ZZ

lxc.cgroup.cpuset.cpus = 63-86

lxc.mount.entry = /storage/ocfs2/swswnone bind 0 0

lxc.cgroup.memory.limit_in_bytes   = 240G
lxc.cgroup.memory.memsw.limit_in_bytes = 240G

lxc.include = /usr/share/lxc/config/common.conf.d/00-lxcfs.conf
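
For readers unfamiliar with "lxc.mount.entry": its value uses the same field order as an fstab line. The sketch below builds such a bind-mount entry with a small helper. The helper name `mount_entry` and both paths are hypothetical and are not the ones from the config above.

```shell
#!/bin/sh
# Sketch only: both paths are hypothetical examples.
# Field order as in fstab: source, target (relative to the container
# rootfs), filesystem type, mount options, dump, pass.
mount_entry() {
    printf 'lxc.mount.entry = %s %s none bind 0 0\n' "$1" "$2"
}

# Print an entry that bind-mounts a host directory into the container.
mount_entry /storage/ocfs2/shared mnt/shared

# To append it to a container config (the config path is an assumption):
# mount_entry /storage/ocfs2/shared mnt/shared >> ~/.local/share/lxc/test1/config
```

Giving the target without a leading slash makes LXC resolve it relative to the container's root file system.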






-----Original Message-----
From: Larry Chen [mailto:lc...@suse.com] 
Sent: Wednesday, 11 April 2018 13:31
To: Daniel Sobe ; ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels



On 04/11/2018 07:17 PM, Daniel Sobe wrote:
> Hi Larry,
>
> this is what I was doing. The 2nd node, while being "declared" in the 
> cluster.conf, does not exist yet, and thus everything was happening on one 
> node only.
>
> I do not know in detail how LXC does the mount sharing, but I assume it 
> simply calls "mount --bind /original/mount/point /new/mount/point" in a 
> separate namespace (or, somehow unshares the mount from the original 
> namespace afterwards).
I thought there was a way to share a directory between the host and a docker 
container, like
    docker run -v /host/directory:/container/directory -other -options 
image_name command_to_run
That's different from yours.

How did you set up your lxc or container?

If you could show me the procedure, I'll try to reproduce it.

And by the way, if you get rid of lxc and just mount ocfs2 on several 
different mount points on the local host, will the problem recur?

Regards,
Larry
> Regards,
>
> Daniel
>


Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-04-11 Thread Larry Chen


On 04/11/2018 07:17 PM, Daniel Sobe wrote:
> Hi Larry,
>
> this is what I was doing. The 2nd node, while being "declared" in the 
> cluster.conf, does not exist yet, and thus everything was happening on one 
> node only.
>
> I do not know in detail how LXC does the mount sharing, but I assume it 
> simply calls "mount --bind /original/mount/point /new/mount/point" in a 
> separate namespace (or, somehow unshares the mount from the original 
> namespace afterwards).
I thought there was a way to share a directory between the host and a docker 
container, like
    docker run -v /host/directory:/container/directory -other -options 
image_name command_to_run
That's different from yours.

How did you set up your lxc or container?

If you could show me the procedure, I'll try to reproduce it.

And by the way, if you get rid of lxc and just mount ocfs2 on several 
different mount points on the local host, will the problem recur?

Regards,
Larry
> Regards,
>
> Daniel
>
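
The test suggested in this message, mounting OCFS2 at several mount points on the local host without LXC, might look like the following. This is a sketch under assumptions: the device name and the mount point paths are hypothetical, and the `run` helper with its `DRY_RUN` switch is added for illustration so that nothing is mounted by accident.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 and run as
# root to execute them. DEV and the /mnt paths are hypothetical.
DEV="${DEV:-/dev/sdX1}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run mkdir -p /mnt/ocfs2-a /mnt/ocfs2-b /mnt/ocfs2-c

# Mount the same OCFS2 device at two mount points.
run mount -t ocfs2 "$DEV" /mnt/ocfs2-a
run mount -t ocfs2 "$DEV" /mnt/ocfs2-b

# Add a third view of the first mount via a bind mount.
run mount --bind /mnt/ocfs2-a /mnt/ocfs2-c
```

Reading the same files through all three paths at once would then approximate the container workload, minus the separate namespaces.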


Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-04-11 Thread Daniel Sobe
Hi Larry,

this is what I was doing. The 2nd node, while being "declared" in the 
cluster.conf, does not exist yet, and thus everything was happening on one node 
only.

I do not know in detail how LXC does the mount sharing, but I assume it simply 
calls "mount --bind /original/mount/point /new/mount/point" in a separate 
namespace (or, somehow unshares the mount from the original namespace 
afterwards).

Regards,

Daniel
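
The assumption above, a bind mount performed inside a separate mount namespace, can be tried by hand with unshare from util-linux. This is a minimal sketch and not a description of what LXC actually does internally: the two paths are the placeholders from the message, and the `run` helper with its `DRY_RUN` switch is an illustrative addition.

```shell
#!/bin/sh
# Sketch only: prints the command by default; set DRY_RUN=0 and run as
# root to execute it. SRC and DST are placeholder paths.
SRC="${SRC:-/original/mount/point}"
DST="${DST:-/new/mount/point}"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# unshare --mount starts the shell in a new mount namespace, so the bind
# mount made there is not visible from the original namespace.
run unshare --mount sh -c 'mount --bind "$0" "$1" && ls "$1"' "$SRC" "$DST"
```

Starting several such shells in parallel, each reading from its own bind mount, would mimic several containers sharing the one OCFS2 mount.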

-----Original Message-----
From: Larry Chen [mailto:lc...@suse.com] 
Sent: Wednesday, 11 April 2018 12:43
To: Daniel Sobe ; ocfs2-devel@oss.oracle.com
Subject: Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

Hi Daniel,
If you execute mkfs and mount that fs on only one node, and then share the 
mount to several namespaces, will the issue recur?

And could you please show us how you shared the mount to other namespaces?

Thanks
Larry

On 04/11/2018 05:45 PM, Daniel Sobe wrote:
>
> Hi,
>
> having used OCFS2 successfully for a while using Debian 8 with its 
> default kernel "3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1 
> (2018-01-08)", I'm now facing issues trying to accomplish the same 
> with newer kernels and Debian 9. Below are the problems that occur, 
> they seem to be the same although the kernel is different.
>
> One trace is from the stock kernel of Debian 9 (at that time), the 
> other is from a very fresh kernel (4.16-rc6). In the latter case, the 
> OOM killer was triggered "shortly" before the bug appeared - it may be 
> related. The call trace is appended below.
>
> In both cases, only one machine was active. The cluster is configured 
> for 2 machines, but the cluster is not even configured yet on the 2nd 
> system. Only one OCFS2 file system was mounted, and the mount was shared 
> to several namespaces (using LXC). Although the mount was R/W, the 
> users/containers only read from this file system.
>
> Please let me know what I can do to get rid of this issue. I can 
> provide more information about my use case if required.
>
> I already posted to ocfs2-users, only then I saw that it is now 
> recommended to post bugs on ocfs2-devel.
>
> Regards,
>
> Daniel
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707568] [ cut here
> ]
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707600] kernel BUG at 
> /build/linux-YDazDa/linux-4.9.82/fs/ocfs2/dlmglue.c:825!
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707635] invalid opcode:  
> [#1] SMP
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707654] Modules linked in: 
> appletalk ax25 ipx p8023 p8022 psnap veth ocfs2_dlmfs ocfs2_stack_o2cb 
> ocfs2_dlm ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue quota_tree 
> nls_ut
>
> f8 cifs sha256_ssse3 cmac md4 des_generic arc4 dns_resolver fscache 
> iptable_filter bridge stp llc bonding fuse intel_rapl sb_edac 
> edac_core x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt 
> iTCO_vendor_suppor
>
> t kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul 
> ghash_clmulni_intel nls_ascii nls_cp437 vfat fat intel_cstate 
> intel_uncore intel_rapl_perf efi_pstore efivars pcspkr mgag200 ttm sg 
> drm_kms_helper lpc_ich mfd
>
> _core drm i2c_algo_bit hpwdt hpilo ioatdma evdev dca shpchp wmi 
> ipmi_si acpi_power_meter ipmi_msghandler pcc_cpufreq button drbd 
> lru_cache libcrc32c efivarfs ip_tables x_tables autofs4 uas 
> usb_storage
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708060]  ext4 crc16 jbd2 
> crc32c_generic fscrypto ecb mbcache dm_mod sd_mod crc32c_intel 
> aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd 
> xhci_pci uhci_hcd e
>
> hci_pci xhci_hcd ehci_hcd i2c_i801 i2c_smbus i40e tg3 hpsa usbcore ptp 
> scsi_transport_sas usb_common pps_core libphy scsi_mod [last unloaded:
> configfs]
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708231] CPU: 24 PID: 64700
> Comm: perl Not tainted 4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708268] Hardware name: HP 
> ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708304] task: 990fda6ef100
> task.stack: b62f36464000
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708331] RIP: 
> 0010:[]  []
> __ocfs2_cluster_unlock.isra.36+0x9d/0xb0 [ocfs2]
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708422] RSP: 
> 0018:b62f36467b38  EFLAGS: 00010046
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708447] RAX: 0292
> RBX: 990fda6c5618 RCX: 0001
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708479] RDX: 
> RSI: 0001 RDI: 990fda6c5694
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708510] RBP: 0003
> R08: 0101 R09: 000

Re: [Ocfs2-devel] OCFS2 BUG with 2 different kernels

2018-04-11 Thread Larry Chen
Hi Daniel,
If you execute mkfs and mount that fs on only one node,
and then share the mount to several namespaces, will the
issue recur?

And could you please show us how you shared the mount to
other namespaces?

Thanks
Larry

On 04/11/2018 05:45 PM, Daniel Sobe wrote:
>
> Hi,
>
> having used OCFS2 successfully for a while using Debian 8 with its 
> default kernel “3.16.0-5-amd64 #1 SMP Debian 3.16.51-3+deb8u1 
> (2018-01-08)”, I’m now facing issues trying to accomplish the same 
> with newer kernels and Debian 9. Below are the problems that occur, 
> they seem to be the same although the kernel is different.
>
> One trace is from the stock kernel of Debian 9 (at that time), the 
> other is from a very fresh kernel (4.16-rc6). In the latter case, the 
> OOM killer was triggered “shortly” before the bug appeared – it may be 
> related. The call trace is appended below.
>
> In both cases, only one machine was active. The cluster is configured 
> for 2 machines, but the cluster is not even configured yet on the 2nd 
> system. Only one OCFS2 file system was mounted, and the mount was shared 
> to several namespaces (using LXC). Although the mount was R/W, the 
> users/containers only read from this file system.
>
> Please let me know what I can do to get rid of this issue. I can 
> provide more information about my use case if required.
>
> I already posted to ocfs2-users, only then I saw that it is now 
> recommended to post bugs on ocfs2-devel.
>
> Regards,
>
> Daniel
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707568] [ cut here 
> ]
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707600] kernel BUG at 
> /build/linux-YDazDa/linux-4.9.82/fs/ocfs2/dlmglue.c:825!
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707635] invalid opcode:  
> [#1] SMP
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.707654] Modules linked in: 
> appletalk ax25 ipx p8023 p8022 psnap veth ocfs2_dlmfs ocfs2_stack_o2cb 
> ocfs2_dlm ocfs2 ocfs2_nodemanager configfs ocfs2_stackglue quota_tree 
> nls_ut
>
> f8 cifs sha256_ssse3 cmac md4 des_generic arc4 dns_resolver fscache 
> iptable_filter bridge stp llc bonding fuse intel_rapl sb_edac 
> edac_core x86_pkg_temp_thermal intel_powerclamp coretemp iTCO_wdt 
> iTCO_vendor_suppor
>
> t kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul 
> ghash_clmulni_intel nls_ascii nls_cp437 vfat fat intel_cstate 
> intel_uncore intel_rapl_perf efi_pstore efivars pcspkr mgag200 ttm sg 
> drm_kms_helper lpc_ich mfd
>
> _core drm i2c_algo_bit hpwdt hpilo ioatdma evdev dca shpchp wmi 
> ipmi_si acpi_power_meter ipmi_msghandler pcc_cpufreq button drbd 
> lru_cache libcrc32c efivarfs ip_tables x_tables autofs4 uas usb_storage
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708060]  ext4 crc16 jbd2 
> crc32c_generic fscrypto ecb mbcache dm_mod sd_mod crc32c_intel 
> aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd 
> xhci_pci uhci_hcd e
>
> hci_pci xhci_hcd ehci_hcd i2c_i801 i2c_smbus i40e tg3 hpsa usbcore ptp 
> scsi_transport_sas usb_common pps_core libphy scsi_mod [last unloaded: 
> configfs]
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708231] CPU: 24 PID: 64700 
> Comm: perl Not tainted 4.9.0-6-amd64 #1 Debian 4.9.82-1+deb9u3
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708268] Hardware name: HP 
> ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 01/22/2018
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708304] task: 990fda6ef100 
> task.stack: b62f36464000
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708331] RIP: 
> 0010:[]  [] 
> __ocfs2_cluster_unlock.isra.36+0x9d/0xb0 [ocfs2]
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708422] RSP: 
> 0018:b62f36467b38  EFLAGS: 00010046
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708447] RAX: 0292 
> RBX: 990fda6c5618 RCX: 0001
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708479] RDX:  
> RSI: 0001 RDI: 990fda6c5694
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708510] RBP: 0003 
> R08: 0101 R09: 
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708541] R10: 0038 
> R11: 007c R12: 990fda6c5694
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708572] R13: 991bb0f76000 
> R14:  R15: c0ba5080
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708604] FS: 
> () GS:991bbea8(0063) knlGS:f7462700
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708640] CS:  0010 DS: 002b ES: 
> 002b CR0: 80050033
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708666] CR2: ff60 
> CR3: 00341a7b6000 CR4: 00360670
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708697] DR0:  
> DR1:  DR2: 
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708728] DR3:  
> DR6: fffe0ff0 DR7: 0400
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708759] Stack:
>
> Mar 22 19:26:55 drs1s005 kernel: [ 7545.708771]  f