I NEED YOUR RESPONSE AS SOON AS POSSIBLE

2018-04-17 Thread Brenda Wilson


Hello, I am Sgt. Brenda Wilson, originally from Lake Jackson, Texas, in the
United States. I personally carried out a special search and came across
your information. I am currently writing you this message from the US
military base in Kabul, Afghanistan. I have a secure business proposal
for you.


[PATCH v2] module: Fix display of wrong module .text address

2018-04-17 Thread Thomas Richter
Fixes: ef0010a30935 ("vsprintf: don't use 'restricted_pointer()' when
not restricting"), which broke the /sys/module/*/sections/.text file.

Reading file /proc/modules shows the correct address:
[root@s35lp76 ~]# cat /proc/modules | egrep '^qeth_l2'
qeth_l2 94208 1 - Live 0x03ff80401000

and reading file /sys/module/qeth_l2/sections/.text
[root@s35lp76 ~]# cat /sys/module/qeth_l2/sections/.text
0x18ea8363
displays a random address.

This breaks the perf tool which uses this address on s390
to calculate start of .text section in memory.

Fix this by printing the correct (unhashed) address.
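
For context, a minimal sketch (not part of the patch) of the difference
between the two format specifiers, assuming standard printk semantics: %pK
honours kptr_restrict and may hash or zero the pointer, while %px always
prints the raw value.

#include <linux/printk.h>

/* Sketch only: illustrates %pK vs %px printk semantics; this helper is
 * hypothetical and not part of the patch. */
static void show_section_addr(void *addr)
{
	pr_info("restricted: %pK\n", addr);	/* may be hashed or zeroed, per kptr_restrict */
	pr_info("raw:        %px\n", addr);	/* always prints the real address */
}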

Thanks to Jessica Yu for helping on this.

Suggested-by: Linus Torvalds 
Signed-off-by: Thomas Richter 
Cc: Jessica Yu 
Cc: sta...@vger.kernel.org
---
 kernel/module.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/module.c b/kernel/module.c
index a6e43a5806a1..40b42000bd80 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -1472,7 +1472,8 @@ static ssize_t module_sect_show(struct module_attribute *mattr,
 {
struct module_sect_attr *sattr =
container_of(mattr, struct module_sect_attr, mattr);
-   return sprintf(buf, "0x%pK\n", (void *)sattr->address);
+   return sprintf(buf, "0x%px\n", kptr_restrict < 2 ?
+  (void *)sattr->address : 0);
 }
 
 static void free_sect_attrs(struct module_sect_attrs *sect_attrs)
-- 
2.14.3



Re: [RFC 2/6] dmaengine: xilinx_dma: Pass AXI4-Stream control words to netdev dma client

2018-04-17 Thread Peter Ujfalusi


On 2018-04-17 18:42, Vinod Koul wrote:
> On Tue, Apr 17, 2018 at 04:46:43PM +0300, Peter Ujfalusi wrote:
> 
>> @@ -709,6 +709,11 @@ struct dma_filter {
>>   *  be called after period_len bytes have been transferred.
>>   * @device_prep_interleaved_dma: Transfer expression in a generic way.
>>   * @device_prep_dma_imm_data: DMA's 8 byte immediate data to the dst address
>> + * @device_attach_metadata: Some DMA engines can send and receive side band
>> + *  information, commands or parameters which is not transferred within the
>> + *  data stream itself. In such case clients can set the metadata to the
>> + *  given descriptor and it is going to be sent to the peripheral, or in
>> + *  case of DEV_TO_MEM the provided buffer will receive the metadata.
>>   * @device_config: Pushes a new configuration to a channel, return 0 or an
>>   *  error code
>>   * @device_pause: Pauses any transfer happening on a channel. Returns
>> @@ -796,6 +801,9 @@ struct dma_device {
>>  struct dma_chan *chan, dma_addr_t dst, u64 data,
>>  unsigned long flags);
>>  
>> +int (*device_attach_metadata)(struct dma_async_tx_descriptor *desc,
>> +  void *data, size_t len);
> 
> while i am okay with the concept, I would not want to go again the custom
> pointer route, this is a no-go for me.
> 
> Instead lets add the vendor data, define that explicitly. We can use struct,
> tokens or something else to define these. But lets try to stay away from
> opaque objects please :-)

The DMA does not interpret the metadata; it is information that can only be
understood by the client driver and the remote peripheral. It is just a
chunk of data (parameters, timestamps, keys, etc.) that needs to travel
along with the payload.

The content is not relevant to the DMA itself.

- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


Re: [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-17 Thread Oleksandr Andrushchenko

On 04/17/2018 11:57 PM, Dongwon Kim wrote:

On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:

On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:

Yeah, I definitely agree on the idea of expanding the use case to the
general domain where dmabuf sharing is used. However, what you are
targeting with the proposed changes is identical to the core design of
hyper_dmabuf.

On top of these basic functionalities, hyper_dmabuf has driver-level
inter-domain communication, which is needed for dma-buf remote tracking
(no fence forwarding though), event triggering and event handling, extra
metadata exchange, and a hyper_dmabuf_id that represents grefs
(grefs are shared implicitly at the driver level).

This really isn't a positive design aspect of hyperdmabuf imo. The core
code in xen-zcopy (ignoring the ioctl side, which will be cleaned up) is
very simple & clean.

If there's a clear need later on we can extend that. But for now xen-zcopy
seems to cover the basic use-case needs, so gets the job done.


Also, it is designed with a frontend (common core framework) + backend
(hypervisor-specific comm and memory sharing) structure for portability.
We just can't limit this feature to Xen, because we want to use the same
uapis not only for Xen but also for other applicable hypervisors, like ACRN.

See the discussion around udmabuf and the needs for kvm. I think trying to
make an ioctl/uapi that works for multiple hypervisors is misguided - it
likely won't work.

On top of that the 2nd hypervisor you're aiming to support is ACRN. That's
not even upstream yet, nor have I seen any patches proposing to land linux
support for ACRN. Since it's not upstream, it doesn't really matter for
upstream consideration. I'm doubting that ACRN will use the same grant
references as xen, so the same uapi won't work on ACRN as on Xen anyway.

Yeah, ACRN doesn't have a grant table; only Xen supports it. But that is why
hyper_dmabuf has been architected around the concept of a backend.
If you look at the structure of the backend, you will find that
it is just a set of standard function calls, as shown here:

struct hyper_dmabuf_bknd_ops {
 /* backend initialization routine (optional) */
 int (*init)(void);

 /* backend cleanup routine (optional) */
 int (*cleanup)(void);

 /* retrieving id of current virtual machine */
 int (*get_vm_id)(void);

 /* get pages shared via hypervisor-specific method */
 int (*share_pages)(struct page **pages, int vm_id,
int nents, void **refs_info);

 /* make shared pages unshared via hypervisor specific method */
 int (*unshare_pages)(void **refs_info, int nents);

 /* map remotely shared pages on importer's side via
  * hypervisor-specific method
  */
 struct page ** (*map_shared_pages)(unsigned long ref, int vm_id,
int nents, void **refs_info);

 /* unmap and free shared pages on importer's side via
  * hypervisor-specific method
  */
 int (*unmap_shared_pages)(void **refs_info, int nents);

 /* initialize communication environment */
 int (*init_comm_env)(void);

 void (*destroy_comm)(void);

 /* upstream ch setup (receiving and responding) */
 int (*init_rx_ch)(int vm_id);

 /* downstream ch setup (transmitting and parsing responses) */
 int (*init_tx_ch)(int vm_id);

 int (*send_req)(int vm_id, struct hyper_dmabuf_req *req, int wait);
};

All of these can be mapped to any hypervisor-specific implementation.
We designed a backend implementation for Xen using grant-table, Xen events
and ring-buffer communication. For ACRN, we have another backend using Virt-IO
for both memory sharing and communication.

We tried to define this backend structure to be general enough (and it can
even be modified or extended to support more cases) so that it can fit other
hypervisors. The only requirements/expectations on the hypervisor are
page-level memory sharing and inter-domain communication, which I think are
standard features of a modern hypervisor.
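
(For illustration only, a minimal sketch of what such a non-Xen backend might
look like, with stub callbacks; the names below are hypothetical and not taken
from the actual hyper_dmabuf or ACRN code.)

/* Sketch only: hypothetical skeleton backend wiring stub callbacks into
 * the ops table quoted above; illustrative, not real code. */
static int my_bknd_get_vm_id(void)
{
	return 0;	/* e.g. ask the hypervisor for the local VM id */
}

static int my_bknd_share_pages(struct page **pages, int vm_id,
			       int nents, void **refs_info)
{
	return 0;	/* e.g. publish the pages via a hypervisor-specific channel */
}

static struct hyper_dmabuf_bknd_ops my_bknd_ops = {
	.get_vm_id   = my_bknd_get_vm_id,
	.share_pages = my_bknd_share_pages,
	/* remaining callbacks omitted in this sketch */
};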

And please review the common UAPIs that hyper_dmabuf and xen-zcopy support.
They are very general. One takes an FD (dmabuf) and gets it shared. The other
generates a dmabuf from a global handle (a secure handle hiding the gref
behind it). On top of this, hyper_dmabuf has "unshare" and "query", which are
also useful in any case.

So I don't know why we wouldn't want to try to make these standard for most
hypervisors instead of limiting them to a certain hypervisor like Xen. A
frontend-backend structure is optimal for this, I think.


So I am wondering whether we can start with this hyper_dmabuf, then modify it
for your use case if needed, and polish and fix any glitches if we want to
use this for all general dma-buf use cases.

Imo xen-zcopy is a much more reasonable starting point for upstream, which
can then

4.14.34: kernel stack regs has bad 'bp' value

2018-04-17 Thread Daniel J Blueman
When running stress-ng on 4.14.34 mainline on x86, I ran into a
"kernel stack regs has bad 'bp' value" warning [1].

Let me know if any more information/debug is useful.

Thanks,
  Daniel

-- [1]

WARNING: kernel stack regs at 880638ad76b8 in
stress-ng-af-al:32670 has bad 'bp' value 737a756eb765e87a
unwind stack type:0 next_sp: (null) mask:0x6 graph_idx:0
88105ef47af8: 88105ef47b88 (0x88105ef47b88)
88105ef47b00: 810a5a22 (__save_stack_trace+0x82/0x100)
88105ef47b08:  ...
88105ef47b10: 880638ad (0x880638ad)
88105ef47b18: 880638ad8000 (0x880638ad8000)
88105ef47b20:  ...
88105ef47b28: 0006 (0x6)
88105ef47b30: 881004cd9f40 (0x881004cd9f40)
88105ef47b38: 0101 (0x101)
88105ef47b40:  ...
88105ef47b48: 88105ef47af8 (0x88105ef47af8)
88105ef47b50: a0e201e3 (._mainloop+0x8c/0x4ca [salsa20_x86_64])
88105ef47b58: 880638ad76b8 (0x880638ad76b8)
88105ef47b60: b5d8152ac2b05d00 (0xb5d8152ac2b05d00)
88105ef47b68: 0100 (0x100)
88105ef47b70: 8810513d4a00 (0x8810513d4a00)
88105ef47b78: 8810513d4b00 (0x8810513d4b00)
88105ef47b80: 816b2fb3 (file_free_rcu+0x53/0x70)
88105ef47b88: 88105ef47b98 (0x88105ef47b98)
88105ef47b90: 810a5abb (save_stack_trace+0x1b/0x20)
88105ef47b98: 88105ef47dc8 (0x88105ef47dc8)
88105ef47ba0: 81648e53 (save_stack+0x43/0xd0)
88105ef47ba8: 004b (0x4b)
88105ef47bb0: 88105ef47bc0 (0x88105ef47bc0)
88105ef47bb8:  (0x)
88105ef47bc0: 810a5abb (save_stack_trace+0x1b/0x20)
88105ef47bc8: 81648e53 (save_stack+0x43/0xd0)
88105ef47bd0: 81649762 (kasan_slab_free+0x72/0xc0)
88105ef47bd8: 8164473c (kmem_cache_free+0x7c/0x1f0)
88105ef47be0: 816b2fb3 (file_free_rcu+0x53/0x70)
88105ef47be8: 812b8cbd (rcu_process_callbacks+0x39d/0xde0)
88105ef47bf0: 82c00184 (__do_softirq+0x184/0x5b7)
88105ef47bf8: 811721e8 (irq_exit+0x1e8/0x220)
88105ef47c00: 82a03ca8 (smp_apic_timer_interrupt+0xd8/0x2f0)
88105ef47c08: 82a0213e (apic_timer_interrupt+0x8e/0xa0)
88105ef47c10: a0e201e3 (._mainloop+0x8c/0x4ca [salsa20_x86_64])
88105ef47c18: 8332be80 (inat_primary_table+0x17efc0/0x1d0d97)
88105ef47c20: 812ef0a0 (posix_cpu_timers_exit_group+0x50/0x50)
88105ef47c28: 0008 (0x8)
88105ef47c30: 88105ef47c88 (0x88105ef47c88)
88105ef47c38: 811f9feb (__update_load_avg_se.isra.30+0x3cb/0x550)
88105ef47c40: 811f9feb (__update_load_avg_se.isra.30+0x3cb/0x550)
88105ef47c48: 88100a5b9f90 (0x88100a5b9f90)
88105ef47c50: 03b5 (0x03b5)
88105ef47c58: 88100a5b9f01 (0x88100a5b9f01)
88105ef47c60: 0007 (0x7)
88105ef47c68: 0005 (0x5)
88105ef47c70: 88105ef6bbf8 (0x88105ef6bbf8)
88105ef47c78:  ...
88105ef47c80: 0007 (0x7)
88105ef47c88: 88105ef47d28 (0x88105ef47d28)
88105ef47c90: 811fd2f6 (cpu_load_update+0x1b6/0x3a0)
88105ef47c98: 00015ef6bc30 (0x15ef6bc30)
88105ef47ca0: 88100a5b9e00 (0x88100a5b9e00)
88105ef47ca8: 88105ef6bc30 (0x88105ef6bc30)
88105ef47cb0: 2432 (0x2432)
88105ef47cb8: 0001 (0x1)
88105ef47cc0: dc00 (0xdc00)
88105ef47cc8: 88105ef47d00 (0x88105ef47d00)
88105ef47cd0: 8120eaef (0x8120eaef)
88105ef47cd8: 0007 (0x7)
88105ef47ce0: 88105ef6bc30 (0x88105ef6bc30)
88105ef47ce8: 88105ef6bbc0 (0x88105ef6bbc0)
88105ef47cf0: 88105ef47cf0 (0x88105ef47cf0)
88105ef47cf8: 88105ef47cf0 (0x88105ef47cf0)
88105ef47d00: 88105ef6bbc0 (0x88105ef6bbc0)
88105ef47d08: 0001002fda17 (0x1002fda17)
88105ef47d10: 0007 (0x7)
88105ef47d18: 83382628 (__per_cpu_offset+0x268/0x1)
88105ef47d20: 04cd9f40 (0x4cd9f40)
88105ef47d28: 88105ef67b80 (0x88105ef67b80)
88105ef47d30: 8354b600 (rcu_sched_state+0xa00/0x413e0)
88105ef47d38: 88105ef6ca78 (0x88105ef6ca78)
88105ef47d40: 88105ef6ca40 (0x88105ef6ca40)
88105ef47d48: 8354ac00 (rcu_bh_varname+0x60/0x60)
88105ef47d50: 8354ac00 (rcu_bh_varname+0x60/0x60)
88105ef47d58: 88105ef47d90 (0x88105ef47d90)
88105ef47d60: 812b313d (rcu_accelerate_cbs+0x7d/0xd0)
88105ef47d68: 88105ef663e0 (0x88105ef663e0)
88105ef47d70: 88105ef6ca40 (0x88105ef6ca40)
88105ef47d78: 88105ef6ca78 (0x88105ef6ca78)
88105ef47d80: 8354b600 (rcu_sched_state+0xa00/0x413e0

Re: [PATCH] powerpc: Allow selection of CONFIG_LD_DEAD_CODE_DATA_ELIMINATION

2018-04-17 Thread Christophe LEROY



Le 17/04/2018 à 19:10, Mathieu Malaterre a écrit :

On Tue, Apr 17, 2018 at 6:49 PM, Christophe LEROY
 wrote:



Le 17/04/2018 à 18:45, Mathieu Malaterre a écrit :


On Tue, Apr 17, 2018 at 12:49 PM, Christophe Leroy
 wrote:


This option does dead code and data elimination with the linker by
compiling with -ffunction-sections -fdata-sections and linking with
--gc-sections.

By selecting this option on mpc885_ads_defconfig,
vmlinux LOAD segment size gets reduced by 10%

Program Header before the patch:
  LOAD off0x0001 vaddr 0xc000 paddr 0x align 2**16
   filesz 0x0036eda4 memsz 0x0038de04 flags rwx

Program Header after the patch:
  LOAD off0x0001 vaddr 0xc000 paddr 0x align 2**16
   filesz 0x00316da4 memsz 0x00334268 flags rwx

Signed-off-by: Christophe Leroy 
---
   arch/powerpc/Kconfig | 8 
   1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 8fe4353be5e3..e1fac49cf465 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -888,6 +888,14 @@ config PPC_MEM_KEYS

If unsure, say y.

+config PPC_UNUSED_ELIMINATION
+   bool "Eliminate unused functions and data from vmlinux"
+   default n
+   select LD_DEAD_CODE_DATA_ELIMINATION
+   help
+ Select this to do dead code and data elimination with the linker
+ by compiling with -ffunction-sections -fdata-sections and linking
+ with --gc-sections.
   endmenu



Just for reference, I cannot boot my Mac Mini G4 anymore (yaboot). The
messages I can see (prom_init) are:



Which version of GCC do you use ?


$ powerpc-linux-gnu-gcc --version
powerpc-linux-gnu-gcc (Debian 6.3.0-18) 6.3.0 20170516
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

this is simply coming from:

$ apt-cache policy crossbuild-essential-powerpc
crossbuild-essential-powerpc:
   Installed: 12.3
   Candidate: 12.3
   Version table:
  *** 12.3 500
 500 http://ftp.fr.debian.org/debian stretch/main amd64 Packages
 500 http://ftp.fr.debian.org/debian stretch/main i386 Packages
 100 /var/lib/dpkg/status



Can you provide the generated System.map with and without that option active
?


$ du -sh g4/System.map.*
1.7M g4/System.map.with
1.8M g4/System.map.without


Below is the list of objects removed with the option selected. I can't
see anything suspect at first glance.
Do you use one of the kernel's defconfigs? Otherwise, can you provide
your .config?
Can you also provide a copy of the messages you see (prom_init ...)
when the boot is OK?

Maybe you can also send me the two vmlinux objects.

Thanks
Christophe

account_steal_time
adbhid_exit
adb_reset_bus
add_range
add_range_with_merge
aes_fini
af_unix_exit
agp_exit
agp_find_client_by_pid
agp_find_mem_by_key
agp_find_private
agp_free_memory_wrap
agpioc_protect_wrap
agpioc_release_wrap
agp_uninorth_cleanup
__alloc_reserved_percpu
all_stat_sessions
all_stat_sessions_mutex
apple_driver_exit
arch_cpu_idle_dead
arch_setup_msi_irq
arch_teardown_msi_irq
arch_tlb_gather_mmu
asymmetric_key_cleanup
asymmetric_key_hex_to_key_id
ata_exit
ata_tf_to_lba
ata_tf_to_lba48
attribute_container_add_class_device_adapter
attribute_container_trigger
backlight_class_exit
bdi_lock
bhrb_table
biovec_create_pool
blk_stat_enable_accounting
boot_mapsize
bpf_map_meta_equal
bvec_free
bvec_nr_vecs
calc_load_fold_active
can_request_irq
capacity_margin
cap_inode_getsecurity
cap_mmap_file
cfq_exit
cgroup_is_threaded
cgroup_is_thread_root
cgroup_migrate_add_src
cgroup_migrate_vet_dst
cgroup_on_dfl
cgroup_sk_update_lock
cgroupstats_build
cgroup_task_count
cgroup_transfer_tasks
change_protection
clean_sort_range
clear_ftrace_function
clear_zone_contiguous
__clockevents_update_freq
clockevents_update_freq
clocksource_mark_unstable
clocksource_touch_watchdog
clone_property.isra.2
cmp_range
cn_fini
cn_queue_free_dev
collect_mounts
compaction_restarting
copy_fpr_from_user
copy_fpr_to_user
copy_mount_string
copy_msg
cpu_check_up_prepare
cpufreq_boost_trigger_state
cpufreq_gov_performance_exit
cpu_hotplug_state
cpu_in_idle
cpu_report_state
cpu_set_state_online
cpu_temp
crashk_low_res
crash_wake_offline
create_prof_cpu_mask
crypto_algapi_exit
crypto_exit_proc
crypto_null_mod_fini
crypto_wq_exit
css_rightmost_descendant
css_set_lock
cubictcp_unregister
__current_kernel_time
d_absolute_path
dbg_release_bp_slot
dbg_reserve_bp_slot
deadline_exit
deadline_exit
debug_guardpage_ops
default_restore_msi_irqs
default_teardown_msi_irqs
del_named_trigger
dereference_module_function_descriptor
__dev_pm_qos_flags
dev_pm_qos_read_value
devtree_lock
die_will_crash
disable_cpufreq
dma_buf_deinit
dma_common_contiguous_remap
dma_common_pages_remap
__dma_get_required_mask
dma_pfn_limit_to_zone
do_execveat
do_fork
__domain_nr
do_msg_redirect_map
do_pipe_fl

Re: [RFC 2/6] dmaengine: xilinx_dma: Pass AXI4-Stream control words to netdev dma client

2018-04-17 Thread Peter Ujfalusi

On 2018-04-17 18:54, Lars-Peter Clausen wrote:
> On 04/17/2018 04:53 PM, Peter Ujfalusi wrote:
>> On 2018-04-17 16:58, Lars-Peter Clausen wrote:
> There are two options.
>
> Either you extend the generic interfaces so it can cover your usecase in a
> generic way. E.g. the ability to attach meta data to transfer.

 Fwiw I have this patch as part of a bigger work to achieve similar results:
>>>
>>> That's good stuff. Is this in a public tree somewhere?
>>
>> Not atm. I cannot send the user of the new API and I did not want to
>> send something like this out of the blue w/o context.
>>
>> But as it is a generic patch, I can send it as well. The only thing is
>> the need for the memcpy, so I might end up with
>> ptr = get_metadata_ptr(desc, &size); /* size: in RX the valid size */
>>
>> and set_metadata_size(); /* in TX to tell how much the client placed */
>>
>> Or something like that; the attach_metadata() as it is works just fine,
>> but high throughput might not like the memcpy.
>>
> 
> In the most abstracted way I'd say metadata and data are two different data
> streams that are correlated and send/received at the same time.

In my case the metadata is sideband information or parameters for/from
the remote end, like timestamps, algorithm parameters, keys, etc.

It is tied to the data payload, but it is not part of it.

But the API should be generic enough to cover other use cases where
clients need to provide additional information.
For me, the metadata is part of the descriptor we give to and receive back
from the DMA; others might have a sideband channel to send it.

For metadata handling we could have:

struct dma_desc_metadata_ops {
	/* To give a buffer for the DMA with the metadata, as it was in my
	 * original patch
	 */
	int (*desc_attach_metadata)(struct dma_async_tx_descriptor *desc,
				    void *data, size_t len);

	void *(*desc_get_metadata_ptr)(struct dma_async_tx_descriptor *desc,
				       size_t *payload_len, size_t *max_len);
	int (*desc_set_payload_len)(struct dma_async_tx_descriptor *desc,
				    size_t payload_len);
};

Probably a simple flag variable to indicate which of the two modes is
supported:
1. Client-provided metadata buffer handling
Clients provide the buffer via desc_attach_metadata(); the DMA driver
will do whatever it needs to do (copy it in place, send it differently,
use it as parameters).
In RX, the received metadata is going to be placed in the provided buffer.
2. Ability to give the metadata pointer to the client to work on it.
In TX, clients can use desc_get_metadata_ptr() to get the pointer, the
current payload size and the maximum size of the metadata, and can work
directly on the buffer to place the data. Then desc_set_payload_len() is
used to let the DMA know how much data was actually placed there.
In RX, desc_get_metadata_ptr() will give the client the pointer and the
payload size so it can process that information correctly.

The DMA driver can implement either or both, but clients must only use
either 1 or 2 to work with the metadata. A client-side sketch follows below.
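
A minimal sketch of how a client might use the two modes, assuming
hypothetical dmaengine wrappers around the callbacks above (the
dmaengine_desc_* wrapper names are illustrative, not an existing API):

/* Sketch only: client-side use of the two proposed modes; the
 * dmaengine_desc_* wrapper names are hypothetical. */

/* Mode 1: client-provided metadata buffer */
static int client_tx_mode1(struct dma_async_tx_descriptor *desc,
			   void *md, size_t md_len)
{
	/* the DMA driver copies/sends md along with the payload */
	return dmaengine_desc_attach_metadata(desc, md, md_len);
}

/* Mode 2: work directly on the descriptor's metadata area */
static int client_tx_mode2(struct dma_async_tx_descriptor *desc)
{
	size_t payload_len, max_len;
	void *ptr;

	ptr = dmaengine_desc_get_metadata_ptr(desc, &payload_len, &max_len);
	if (!ptr)
		return -ENOMEM;

	/* place up to max_len bytes of metadata directly in ptr here ... */

	/* tell the DMA driver how much metadata was actually written */
	return dmaengine_desc_set_payload_len(desc, 16);
}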


> Think multi-planar transfer, like for audio when the right and left channel
> are in separate buffers and not interleaved. Or video with different
> color/luminance components in separate buffers. This is something that is at
> the moment not covered by the dmaengine API either.

Hrm, true, but that is hardly the metadata use case. It is more like a
different DMA transfer type.

> Or you can implement a interface that is specific to your DMA controller 
> and
> any client using this interface knows it is talking to your DMA 
> controller.

 Hrm, so we can have DMA driver specific calls? The reason why TI's keystone 2
 navigator DMA support was rejected was that it was introducing NAV-specific
 calls for clients to configure features not yet supported by the framework.
>>>
>>> In my opinion it is OK, somebody else might have different ideas. I mean it
>>> is not nice, but it is better than the alternative of overloading the
>>> generic API with driver specific semantics or introducing some kind of IOCTL
>>> catch all callback.
>>
>> True, but the generic API can be extended as well to cover new grounds,
>> features. Like this metadata thing.
>>
>>> If there is tight coupling between the DMA core and client and there is no
>>> intention of using a generic client the best solution might even be to no
>>> use DMAengine at all.
>>
>> This is how the knav stuff ended up. Well it is only used by networking
>> atm, so it is 'fine' to have custom API, but it is not portable.
> 
> I totally agree generic APIs are better, but not everybody has the resources
> to rewrite the whole framework just because they want to do this tiny thing
> that isn't covered by the framework yet. In that case it is better to go
> with a custom API (that might evolve into a generic API), rather than
> overloading the generic API and putting a strain on 

Re: [PATCH] usb: always build usb/common/ targets; fixes extcon-axp288 build error

2018-04-17 Thread Randy Dunlap
On 04/17/18 02:01, Hans de Goede wrote:
> Hi,
> 
> On 17-04-18 07:14, Randy Dunlap wrote:
>> From: Randy Dunlap 
>>
>> The extcon-axp288 driver selects USB_ROLE_SWITCH, but the USB
>> Makefile does not currently build drivers/usb/common/ (where
>> USB_ROLE_SWITCH code is) unless USB_COMMON is set, so modify
>> the USB Makefile to always descend into drivers/usb/common/
>> to build its configured targets.
>>
>> Fixes these build errors:
>>
>> ERROR: "usb_role_switch_get" [drivers/extcon/extcon-axp288.ko] undefined!
>> ERROR: "usb_role_switch_set_role" [drivers/extcon/extcon-axp288.ko] 
>> undefined!
>> ERROR: "usb_role_switch_get_role" [drivers/extcon/extcon-axp288.ko] 
>> undefined!
>> ERROR: "usb_role_switch_put" [drivers/extcon/extcon-axp288.ko] undefined!
>>
>> An alternative patch would be to select USB_COMMON in the EXTCON_AXP288
>> driver Kconfig entry, but this would build more code in
>> drivers/usb/common/ than is necessary.
> 
> Ah, that variant of fixing this got posted yesterday and I acked that,
> but I agree that this version is better.

That was my first patch version, but I didn't like it.

However, I missed that patch. If I had seen it, I wouldn't have posted
this patch.


> Greg, what is your take on this fix?
> 
> Chanwoo Choi, please wait with merging the fix from yesterday until
> we've a decision which fix to use.
> 
> Regards,
> 
> Hans
> 
> 
> 
>>
>> Reported-by: Fengguang Wu 
>> Signed-off-by: Randy Dunlap 
>> Cc: MyungJoo Ham 
>> Cc: Chanwoo Choi 
>> Cc: Hans de Goede 
>> Cc: Greg Kroah-Hartman 
>> Cc: Andy Shevchenko 
>> Cc: Heikki Krogerus 
>> Cc: linux-...@vger.kernel.org
>> ---
>>   drivers/usb/Makefile |    2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> --- lnx-417-rc1.orig/drivers/usb/Makefile
>> +++ lnx-417-rc1/drivers/usb/Makefile
>> @@ -60,7 +60,7 @@ obj-$(CONFIG_USB_CHIPIDEA)    += chipidea/
>>   obj-$(CONFIG_USB_RENESAS_USBHS)    += renesas_usbhs/
>>   obj-$(CONFIG_USB_GADGET)    += gadget/
>>   -obj-$(CONFIG_USB_COMMON)    += common/
>> +obj-y    += common/
>>     obj-$(CONFIG_USBIP_CORE)    += usbip/
>>  
>>
> 


-- 
~Randy


[PATCH 2/2] perf/core: fix bad use of igrab in kernel/event/core.c

2018-04-17 Thread Song Liu
As Miklos reported and suggested:

  This pattern repeats two times in trace_uprobe.c and in
  kernel/events/core.c as well:

  ret = kern_path(filename, LOOKUP_FOLLOW, &path);
  if (ret)
  goto fail_address_parse;

  inode = igrab(d_inode(path.dentry));
  path_put(&path);

  And it's wrong.  You can only hold a reference to the inode if you
  have an active ref to the superblock as well (which is normally
  through path.mnt) or holding s_umount.

  This way unmounting the containing filesystem while the tracepoint is
  active will give you the "VFS: Busy inodes after unmount..." message
  and a crash when the inode is finally put.

  Solution: store path instead of inode.

This patch fixes the issue in kernel/event/core.c.
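
(For illustration, a minimal sketch of the corrected pattern described above,
using standard VFS helpers; the function name is made up.)

/* Sketch only: hold a struct path (mnt + dentry) instead of a bare
 * inode reference, so the superblock stays pinned; illustrative. */
static int resolve_target(const char *filename, struct path *out)
{
	int ret = kern_path(filename, LOOKUP_FOLLOW, out);

	if (ret)
		return ret;

	/* use d_inode(out->dentry) when the inode is needed, and drop the
	 * reference with path_put(out) when done */
	return 0;
}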

NOTE: Based on my understanding, perf_addr_filter only supports intel_pt.
However, my test system doesn't support address filtering (or I made a
mistake?). Therefore, I have NOT tested this patch.

Could someone please help test it?

Fixes: 375637bc5249 ("perf/core: Introduce address range filtering")
Cc: Alexander Shishkin 
Cc: Ingo Molnar 
Cc: Peter Zijlstra (Intel) 
Reported-by: Miklos Szeredi 
Signed-off-by: Song Liu 
---
 arch/x86/events/intel/pt.c |  4 ++--
 include/linux/perf_event.h |  2 +-
 kernel/events/core.c   | 21 +
 3 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 3b99394..8d016ce 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1194,7 +1194,7 @@ static int pt_event_addr_filters_validate(struct list_head *filters)
filter->action == PERF_ADDR_FILTER_ACTION_START)
return -EOPNOTSUPP;
 
-   if (!filter->inode) {
+   if (!filter->path.dentry) {
if (!valid_kernel_ip(filter->offset))
return -EINVAL;
 
@@ -1221,7 +1221,7 @@ static void pt_event_addr_filters_sync(struct perf_event *event)
return;
 
list_for_each_entry(filter, &head->list, entry) {
-   if (filter->inode && !offs[range]) {
+   if (filter->path.dentry && !offs[range]) {
msr_a = msr_b = 0;
} else {
/* apply the offset */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e71e99e..88922d8 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -467,7 +467,7 @@ enum perf_addr_filter_action_t {
  */
 struct perf_addr_filter {
struct list_headentry;
-   struct inode*inode;
+   struct path path;
unsigned long   offset;
unsigned long   size;
enum perf_addr_filter_action_t  action;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index d7af828..7d711ed 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6668,7 +6668,7 @@ static void perf_event_addr_filters_exec(struct perf_event *event, void *data)
 
raw_spin_lock_irqsave(&ifh->lock, flags);
list_for_each_entry(filter, &ifh->list, entry) {
-   if (filter->inode) {
+   if (filter->path.dentry) {
event->addr_filters_offs[count] = 0;
restart++;
}
@@ -7333,7 +7333,7 @@ static bool perf_addr_filter_match(struct perf_addr_filter *filter,
 struct file *file, unsigned long offset,
 unsigned long size)
 {
-   if (filter->inode != file_inode(file))
+   if (d_inode(filter->path.dentry) != file_inode(file))
return false;
 
if (filter->offset > offset + size)
@@ -8674,8 +8674,7 @@ static void free_filters_list(struct list_head *filters)
struct perf_addr_filter *filter, *iter;
 
list_for_each_entry_safe(filter, iter, filters, entry) {
-   if (filter->inode)
-   iput(filter->inode);
+   path_put(&filter->path);
list_del(&filter->entry);
kfree(filter);
}
@@ -8772,7 +8771,7 @@ static void perf_event_addr_filters_apply(struct perf_event *event)
 * Adjust base offset if the filter is associated to a binary
 * that needs to be mapped:
 */
-   if (filter->inode)
+   if (filter->path.dentry)
event->addr_filters_offs[count] =
perf_addr_filter_apply(filter, mm);
 
@@ -8846,7 +8845,6 @@ perf_event_parse_addr_filter(struct perf_event *event, char *fstr,
 {
struct perf_addr_filter *filter = NULL;
char *start, *orig, *filename = NULL;
-   struct path path;
substring_t args[MAX_OPT_ARGS];
int state = IF_STATE_ACTION, token;
unsigned int kernel = 0;
@@ -8959,19 +8957,18 @@ perf_event_parse_addr

[PATCH 1/2] tracing: fix bad use of igrab in trace_uprobe.c

2018-04-17 Thread Song Liu
As Miklos reported and suggested:

  This pattern repeats two times in trace_uprobe.c and in
  kernel/events/core.c as well:

  ret = kern_path(filename, LOOKUP_FOLLOW, &path);
  if (ret)
  goto fail_address_parse;

  inode = igrab(d_inode(path.dentry));
  path_put(&path);

  And it's wrong.  You can only hold a reference to the inode if you
  have an active ref to the superblock as well (which is normally
  through path.mnt) or holding s_umount.

  This way unmounting the containing filesystem while the tracepoint is
  active will give you the "VFS: Busy inodes after unmount..." message
  and a crash when the inode is finally put.

  Solution: store path instead of inode.

This patch fixes two instances in trace_uprobe.c.

Fixes: f3f096cfedf8 ("tracing: Provide trace events interface for uprobes")
Fixes: 33ea4b24277b ("perf/core: Implement the 'perf_uprobe' PMU")
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: Howard McLauchlan 
Cc: Josef Bacik 
Cc: Srikar Dronamraju 
Reported-by: Miklos Szeredi 
Signed-off-by: Song Liu 
---
 kernel/trace/trace_uprobe.c | 42 ++
 1 file changed, 14 insertions(+), 28 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 0d450b4..80dfcdf 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -55,7 +55,7 @@ struct trace_uprobe {
struct list_headlist;
struct trace_uprobe_filter  filter;
struct uprobe_consumer  consumer;
-   struct inode*inode;
+   struct path path;
char*filename;
unsigned long   offset;
unsigned long   nhit;
@@ -289,7 +289,7 @@ static void free_trace_uprobe(struct trace_uprobe *tu)
for (i = 0; i < tu->tp.nr_args; i++)
traceprobe_free_probe_arg(&tu->tp.args[i]);
 
-   iput(tu->inode);
+   path_put(&tu->path);
kfree(tu->tp.call.class->system);
kfree(tu->tp.call.name);
kfree(tu->filename);
@@ -363,7 +363,6 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
 static int create_trace_uprobe(int argc, char **argv)
 {
struct trace_uprobe *tu;
-   struct inode *inode;
char *arg, *event, *group, *filename;
char buf[MAX_EVENT_NAME_LEN];
struct path path;
@@ -371,7 +370,6 @@ static int create_trace_uprobe(int argc, char **argv)
bool is_delete, is_return;
int i, ret;
 
-   inode = NULL;
ret = 0;
is_delete = false;
is_return = false;
@@ -448,14 +446,6 @@ static int create_trace_uprobe(int argc, char **argv)
if (ret)
goto fail_address_parse;
 
-   inode = igrab(d_inode(path.dentry));
-   path_put(&path);
-
-   if (!inode || !S_ISREG(inode->i_mode)) {
-   ret = -EINVAL;
-   goto fail_address_parse;
-   }
-
ret = kstrtoul(arg, 0, &offset);
if (ret)
goto fail_address_parse;
@@ -490,7 +480,8 @@ static int create_trace_uprobe(int argc, char **argv)
goto fail_address_parse;
}
tu->offset = offset;
-   tu->inode = inode;
+   tu->path.mnt = path.mnt;
+   tu->path.dentry = path.dentry;
tu->filename = kstrdup(filename, GFP_KERNEL);
 
if (!tu->filename) {
@@ -558,7 +549,7 @@ static int create_trace_uprobe(int argc, char **argv)
return ret;
 
 fail_address_parse:
-   iput(inode);
+   path_put(&path);
 
pr_info("Failed to parse address or file.\n");
 
@@ -937,7 +928,8 @@ probe_event_enable(struct trace_uprobe *tu, struct trace_event_file *file,
goto err_flags;
 
tu->consumer.filter = filter;
-   ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
+   ret = uprobe_register(d_inode(tu->path.dentry), tu->offset,
+ &tu->consumer);
if (ret)
goto err_buffer;
 
@@ -981,7 +973,7 @@ probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file)
 
WARN_ON(!uprobe_filter_is_empty(&tu->filter));
 
-   uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
+   uprobe_unregister(d_inode(tu->path.dentry), tu->offset, &tu->consumer);
tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;
 
uprobe_buffer_disable();
@@ -1056,7 +1048,8 @@ static int uprobe_perf_close(struct trace_uprobe *tu, struct perf_event *event)
write_unlock(&tu->filter.rwlock);
 
if (!done)
-   return uprobe_apply(tu->inode, tu->offset, &tu->consumer, false);
+   return uprobe_apply(d_inode(tu->path.dentry), tu->offset,
+   &tu->consumer, false);
 
return 0;
 }
@@ -1088,7 +1081,8 @@ static int uprobe_perf_open(struct trace_uprobe *tu, struct perf_event *event)
 
err = 0;
if (!done

RE: [PATCH 2/6 v2] iommu: of: make of_pci_map_rid() available for other devices too

2018-04-17 Thread Nipun Gupta


> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: Tuesday, April 17, 2018 10:23 PM
> To: Nipun Gupta ; robh...@kernel.org;
> frowand.l...@gmail.com
> Cc: will.dea...@arm.com; mark.rutl...@arm.com; catalin.mari...@arm.com;
> h...@lst.de; gre...@linuxfoundation.org; j...@8bytes.org;
> m.szyprow...@samsung.com; shawn...@kernel.org; bhelg...@google.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linuxppc-
> d...@lists.ozlabs.org; linux-...@vger.kernel.org; Bharat Bhushan
> ; stuyo...@gmail.com; Laurentiu Tudor
> ; Leo Li 
> Subject: Re: [PATCH 2/6 v2] iommu: of: make of_pci_map_rid() available for
> other devices too
> 
> On 17/04/18 11:21, Nipun Gupta wrote:
> > iommu-map property is also used by devices with fsl-mc. This patch
> > moves the of_pci_map_rid to generic location, so that it can be used
> > by other busses too.
> >
> > Signed-off-by: Nipun Gupta 
> > ---
> >   drivers/iommu/of_iommu.c | 106
> > +--
> 
> Doesn't this break "msi-parent" parsing for !CONFIG_OF_IOMMU? I guess you
> don't want fsl-mc to have to depend on PCI, but this looks like a step in the
> wrong direction.

Thanks for pointing out.
Agree, this will break "msi-parent" parsing for !CONFIG_OF_IOMMU case.

> 
> I'm not entirely sure where of_map_rid() fits best, but from a quick look
> around the least-worst option might be drivers/of/of_address.c, unless Rob
> and Frank have a better idea of where generic DT-based ID translation
> routines could live?
> 
> >   drivers/of/irq.c |   6 +--
> >   drivers/pci/of.c | 101 
> > 
> >   include/linux/of_iommu.h |  11 +
> >   include/linux/of_pci.h   |  10 -
> >   5 files changed, 117 insertions(+), 117 deletions(-)
> >

[...]

> >   struct of_pci_iommu_alias_info {
> > struct device *dev;
> > struct device_node *np;
> > @@ -149,9 +249,9 @@ static int of_pci_iommu_init(struct pci_dev *pdev, u16
> alias, void *data)
> > struct of_phandle_args iommu_spec = { .args_count = 1 };
> > int err;
> >
> > -   err = of_pci_map_rid(info->np, alias, "iommu-map",
> > -"iommu-map-mask", &iommu_spec.np,
> > -iommu_spec.args);
> > +   err = of_map_rid(info->np, alias, "iommu-map",
> > +"iommu-map-mask", &iommu_spec.np,
> > +iommu_spec.args);
> 
> Super-nit: Apparently I missed rewrapping this to 2 lines in d87beb749281,
> but if it's being touched again, that would be nice ;)

Sure.. I'll take care of this in the next version :)

Regards,
Nipun


Re: [PATCH] locking/rwsem: Synchronize task state & waiter->task of readers

2018-04-17 Thread Benjamin Herrenschmidt
On Tue, 2018-04-10 at 13:22 -0400, Waiman Long wrote:
> It was observed occasionally in PowerPC systems that there was reader
> who had not been woken up but that its waiter->task had been cleared.
> 
> One probable cause of this missed wakeup may be the fact that the
> waiter->task and the task state have not been properly synchronized as
> the lock release-acquire pair of different locks in the wakeup code path
> does not provide a full memory barrier guarantee. So smp_store_mb()
> is now used to set waiter->task to NULL to provide a proper memory
> barrier for synchronization.
> 
> Signed-off-by: Waiman Long 

That looks right... nothing in either lock or unlock will prevent a
store going past a load.

> ---
>  kernel/locking/rwsem-xadd.c | 17 +
>  1 file changed, 17 insertions(+)
> 
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index e795908..b3c588c 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -209,6 +209,23 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
>   smp_store_release(&waiter->task, NULL);
>   }
>  
> + /*
> +  * To avoid missed wakeup of reader, we need to make sure
> +  * that task state and waiter->task are properly synchronized.
> +  *
> +  * wakeup sleep
> +  * -- -
> +  * __rwsem_mark_wake:   rwsem_down_read_failed*:
> +  *   [S] waiter->task [S] set_current_state(state)
> +  *   MB   MB
> +  * try_to_wake_up:
> +  *   [L] state[L] waiter->task
> +  *
> +  * For the wakeup path, the original lock release-acquire pair
> +  * does not provide enough guarantee of proper synchronization.
> +  */
> + smp_mb();
> +
>   adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
>   if (list_empty(&sem->wait_list)) {
>   /* hit end of list above */


Re: [PATCH net-next v4 0/3] kernel: add support to collect hardware logs in crash recovery kernel

2018-04-17 Thread Dave Young
Hi Rahul,
On 04/17/18 at 01:14pm, Rahul Lakkireddy wrote:
> On production servers running variety of workloads over time, kernel
> panic can happen sporadically after days or even months. It is
> important to collect as much debug logs as possible to root cause
> and fix the problem, that may not be easy to reproduce. Snapshot of
> underlying hardware/firmware state (like register dump, firmware
> logs, adapter memory, etc.), at the time of kernel panic will be very
> helpful while debugging the culprit device driver.
> 
> This series of patches add new generic framework that enable device
> drivers to collect device specific snapshot of the hardware/firmware
> state of the underlying device in the crash recovery kernel. In crash
> recovery kernel, the collected logs are added as elf notes to
> /proc/vmcore, which is copied by user space scripts for post-analysis.
> 
> The sequence of actions done by device drivers to append their device
> specific hardware/firmware logs to /proc/vmcore are as follows:
> 
> 1. During probe (before hardware is initialized), device drivers
> register to the vmcore module (via vmcore_add_device_dump()), with
> callback function, along with buffer size and log name needed for
> firmware/hardware log collection.

I assumed the elf notes info should be prepared during the kexec_[file_]load
phase. But I did not read the old comments, so I am not sure whether this has
been discussed or not.

If this is done in the 2nd kernel, one question is that the driver can be
loaded later than vmcore init. How do we guarantee the function works if
vmcore is read before the driver is loaded?

Also, it is possible that the kdump initramfs does not contain the driver
module.

Am I missing something?
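
(For reference, a minimal sketch of the registration step described in point 1
of the quoted cover letter; the driver name is hypothetical and the vmcoredd
structure layout is an assumption based on that description, so it may differ
from the actual patches.)

/* Sketch only: a hypothetical driver registering a device dump callback
 * with the proposed vmcore API. */
static int mydrv_collect_dump(struct vmcoredd_data *data, void *buf)
{
	/* copy firmware/hardware state into buf (up to data->size bytes) */
	return 0;
}

static struct vmcoredd_data mydrv_dump_data = {
	.dump_name = "mydrv_dump",		/* appears in the elf note name */
	.size      = 2 * 1024 * 1024,		/* buffer allocated by vmcore */
	.vmcoredd_callback = mydrv_collect_dump,
};

static int mydrv_register_dump(void)
{
	/* only in the crash recovery kernel, before hardware is initialized */
	if (!is_kdump_kernel())
		return 0;
	return vmcore_add_device_dump(&mydrv_dump_data);
}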

> 
> 2. vmcore module allocates the buffer with requested size. It adds
> an elf note and invokes the device driver's registered callback
> function.
> 
> 3. Device driver collects all hardware/firmware logs into the buffer
> and returns control back to vmcore module.
> 
> The device specific hardware/firmware logs can be seen as elf notes:
> 
> # readelf -n /proc/vmcore
> 
> Displaying notes found at file offset 0x1000 with length 0x04003288:
>   Owner Data size Description
>   VMCOREDD_cxgb4_:02:00.4 0x02000fd8  Unknown note type: (0x0700)
>   VMCOREDD_cxgb4_:04:00.4 0x02000fd8  Unknown note type: (0x0700)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   CORE 0x0150 NT_PRSTATUS (prstatus structure)
>   VMCOREINFO   0x074f Unknown note type: (0x)
> 
> Patch 1 adds API to vmcore module to allow drivers to register callback
> to collect the device specific hardware/firmware logs.  The logs will
> be added to /proc/vmcore as elf notes.
> 
> Patch 2 updates read and mmap logic to append device specific hardware/
> firmware logs as elf notes.
> 
> Patch 3 shows a cxgb4 driver example using the API to collect
> hardware/firmware logs in crash recovery kernel, before hardware is
> initialized.
> 
> Thanks,
> Rahul
> 
> RFC v1: https://lkml.org/lkml/2018/3/2/542
> RFC v2: https://lkml.org/lkml/2018/3/16/326
> 
> ---
> v4:
> - Made __vmcore_add_device_dump() static.
> - Moved compile check to define vmcore_add_device_dump() to
>   crash_dump.h to fix compilation when vmcore.c is not compiled in.
> - Convert ---help--- to help in Kconfig as indicated by checkpatch.
> - Rebased to tip.
> 
> v3:
> - Dropped sysfs crashdd module.
> - Exported dumps as elf notes. Suggested by Eric Biederman
>   .  Added as patch 2 in this version.
> - Added CONFIG_PROC_VMCORE_DEVICE_DUMP to allow configuring device
>   dump support.
> - Moved logic related to adding dumps from crashdd to vmcore module.
> - Rename all crashdd* to vmcoredd*.
> - Updated comments.
> 
> v2:
> - Added ABI Documentation for crashdd.
> - Directly use octal permission instead of macro.
> 
> Changes since rfc v2:
> - Moved exporting crashdd from procfs to sysfs. Suggested by
>   Stephen Hemminger 
> - Moved code from fs/proc/crashdd.c to fs/crashdd/ directory.
> - Replaced all proc API with sysfs API and updated comments.
> - Calling driver callback before creating the binary file under
>   crashdd sysfs.
> - Changed binary dump file permission from S_IRUSR to S_IRUGO.
> - Changed module name from CRASH_DRIVER_DUMP to CRASH_DEVICE_DUMP.
> 
> rfc v2:
> - Collecting logs in 2nd kernel instead of during kernel panic.
>   Suggested by Eric Biederman .
> - Added new crashdd module that exports /proc/crashdd/ containing
>   driver's registered hard

Re: [PATCH v3 2/2] iommu/amd: Add basic debugfs infrastructure for AMD IOMMU

2018-04-17 Thread Yang, Shunyong
Hi, Gary and Sohil,

On Tue, 2018-04-17 at 13:38 -0400, Hook, Gary wrote:
> On 4/13/2018 8:08 PM, Mehta, Sohil wrote:
> > 
> > On Fri, 2018-04-06 at 08:17 -0500, Gary R Hook wrote:
> > > 
> > >   
> > > +
> > > +void amd_iommu_debugfs_setup(struct amd_iommu *iommu)
> > > +{
> > > + char name[MAX_NAME_LEN + 1];
> > > + struct dentry *d_top;
> > > +
> > > + if (!debugfs_initialized())
> > Probably not needed.
> Right.

When is this check needed?
IMO, this function is meant to check that debugfs is ready before we
want to use debugfs. I just want to understand when we should use
debugfs_initialized().

Thanks.
Shunyong.

> 
> > 
> > 
> > > 
> > > + return;
> > > +
> > > + mutex_lock(&amd_iommu_debugfs_lock);
> > > + if (!amd_iommu_debugfs) {
> > > + d_top = iommu_debugfs_setup();
> > > + if (d_top)
> > > + amd_iommu_debugfs =
> > > debugfs_create_dir("amd", d_top);
> > > + }
> > > + mutex_unlock(&amd_iommu_debugfs_lock);
> > 
> > You can do the above only once if you iterate over the IOMMUs here
> >   instead of doing it in amd_iommu_init.
> I'm not sure it matters, given the finite number of IOMMUs in a system,
> and the fact that this work is done exactly once. However, removal of a
> lock is a fine thing, so I'll move this around.
> 
> > 
> > 
> > > 
> > > + if (amd_iommu_debugfs) {
> > > + snprintf(name, MAX_NAME_LEN, "iommu%02d", iommu->index);
> > > + iommu->debugfs = debugfs_create_dir(name, amd_iommu_debugfs);
> > > + if (!iommu->debugfs) {
> > > + debugfs_remove_recursive(amd_iommu_debugfs);
> > > + amd_iommu_debugfs = NULL;
> > > + }
> > > + }
> > > +}
> > -Sohil
> > 
> ___
> iommu mailing list
> io...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [RFC PATCH spi] spi: pxa2xx: pxa2xx_spi_transfer_one() can be static

2018-04-17 Thread Jarkko Nikula

On 04/17/18 22:53, kbuild test robot wrote:


Fixes: d5898e19c0d7 ("spi: pxa2xx: Use core message processing loop")
Signed-off-by: Fengguang Wu 
---
  spi-pxa2xx.c |6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/spi/spi-pxa2xx.c b/drivers/spi/spi-pxa2xx.c
index c852ea5..40f1346 100644
--- a/drivers/spi/spi-pxa2xx.c
+++ b/drivers/spi/spi-pxa2xx.c
@@ -911,9 +911,9 @@ static bool pxa2xx_spi_can_dma(struct spi_controller *master,
   xfer->len >= chip->dma_burst_size;
  }
  
-int pxa2xx_spi_transfer_one(struct spi_controller *master,
-   struct spi_device *spi,
-   struct spi_transfer *transfer)
+static int pxa2xx_spi_transfer_one(struct spi_controller *master,
+  struct spi_device *spi,
+  struct spi_transfer *transfer)


Thanks Fengguang. I don't understand how I managed to drop "static" 
while doing manual s/pump_transfers/pxa2xx_spi_transfer_one/ :-)


Reviewed-by: Jarkko Nikula 


Re: [PATCH v2 0/3] perf/buildid-cache: Add --list and --purge-all options

2018-04-17 Thread Masami Hiramatsu
On Tue, 17 Apr 2018 09:43:43 +0530
Ravi Bangoria  wrote:

> First patch is a trivial error message fix. Second and third
> adds new options --list and --purge-all to 'buildid-cache'
> subcommand.
> 
> v2 changes:
>  - [PATCH v2 2/3] Display optput of 'perf buildid-cache -l' same as
>'perf buildid-list'.
>  - [PATCH v2 2/3] Other minor changes as suggested by Jiri.
> 
> v1 can be found at:
>   https://lkml.org/lkml/2018/4/9/295

All patches in this series looks good to me.

Acked-by: Masami Hiramatsu 

Thanks,

> 
> Ravi Bangoria (3):
>   tools/parse-options: Add '\n' at the end of error messages
>   perf/buildid-cache: Support --list option
>   perf/buildid-cache: Support --purge-all option
> 
>  tools/lib/subcmd/parse-options.c|  6 +-
>  tools/perf/Documentation/perf-buildid-cache.txt |  7 ++-
>  tools/perf/builtin-buildid-cache.c  | 77 
> -
>  3 files changed, 83 insertions(+), 7 deletions(-)
> 
> -- 
> 2.14.3
> 


-- 
Masami Hiramatsu 


Re: [PATCH 3/3] perf/buildid-cache: Support --purge-all option

2018-04-17 Thread Masami Hiramatsu
On Mon, 16 Apr 2018 12:30:17 +0200
Jiri Olsa  wrote:

> On Mon, Apr 16, 2018 at 03:10:40PM +0530, Ravi Bangoria wrote:
> > Hi Masami,
> > 
> > On 04/16/2018 02:57 PM, Masami Hiramatsu wrote:
> > > On Mon,  9 Apr 2018 16:36:33 +0530
> > > Ravi Bangoria  wrote:
> > >
> > >> User can remove files from cache using --remove/--purge options
> > >> but both needs list of files as an argument. It's not convenient
> > >> when you want to flush out entire cache. Add an option to purge
> > >> all files from cache.
> > >>
> > >> Ex,
> > >>   # perf buildid-cache -l
> > >> /tmp/a.out (8a86ef73e44067bca52cc3f6cd3e5446c783391c)
> > >> /tmp/a.out.1 (ebe71fdcf4b366518cc154d570a33cd461a51c36)
> > >>   # perf buildid-cache -P -v
> > >> Removing /tmp/a.out (8a86ef73e44067bca52cc3f6cd3e5446c783391c): Ok
> > >> Removing /tmp/a.out.1 (ebe71fdcf4b366518cc154d570a33cd461a51c36): Ok
> > >> Purged all: Ok
> > > Hmm, purging all caches can be done by
> > >
> > > $ rm -rf ~/.debug
> > >
> > > Is there any difference?
> > 
> > No logical difference if you know it's ~/.debug where it goes. :)
> > I also used to do rm -rf earlier.
> > 
> > This option is for perf users. But I'm fine if it's not really needed.
> > Will drop it.
> 
> I'd keep it.. as you said it could be configured at some other dir

Sounds reasonable. :)

Thanks,

> 
> jirka


-- 
Masami Hiramatsu 


Re: [PATCH v6 07/11] ARM: sun9i: smp: Rename clusters's power-off

2018-04-17 Thread Mylène Josserand
Hello,

On Tue, 17 Apr 2018 11:21:02 +0300
Sergei Shtylyov  wrote:

> Hello!
> 
> On 4/17/2018 12:50 AM, Mylène Josserand wrote:
> 
> > To prepare the support for sun8i-a83t, rename the variable name  
> 
> s/variable/macro/ maybe? Also "rename the ... name" sounds tautological...

Thank you for the correction.

Best regards,

Mylène

-- 
Mylène Josserand, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
http://bootlin.com

> 
> > that handles the power-off of clusters because it is different from
> > sun9i-a80 to sun8i-a83t.
> > 
> > The power off register for clusters are different from a80 and a83t.
> > 
> > Signed-off-by: Mylène Josserand 
> > Acked-by: Maxime Ripard 
> > Reviewed-by: Chen-Yu Tsai 
> > ---
> >   arch/arm/mach-sunxi/mc_smp.c | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/arm/mach-sunxi/mc_smp.c b/arch/arm/mach-sunxi/mc_smp.c
> > index 727968d6a3e5..03f021d0c73e 100644
> > --- a/arch/arm/mach-sunxi/mc_smp.c
> > +++ b/arch/arm/mach-sunxi/mc_smp.c
> > @@ -60,7 +60,7 @@
> >   #define PRCM_CPU_PO_RST_CTRL_CORE(n)  BIT(n)
> >   #define PRCM_CPU_PO_RST_CTRL_CORE_ALL 0xf
> >   #define PRCM_PWROFF_GATING_REG(c) (0x100 + 0x4 * (c))
> > -#define PRCM_PWROFF_GATING_REG_CLUSTER BIT(4)
> > +#define PRCM_PWROFF_GATING_REG_CLUSTER_SUN9I   BIT(4)
> >   #define PRCM_PWROFF_GATING_REG_CORE(n)BIT(n)
> >   #define PRCM_PWR_SWITCH_REG(c, cpu)   (0x140 + 0x10 * (c) + 0x4 * 
> > (cpu))
> >   #define PRCM_CPU_SOFT_ENTRY_REG   0x164
> > @@ -255,7 +255,7 @@ static int sunxi_cluster_powerup(unsigned int cluster)
> >   
> > /* clear cluster power gate */
> > reg = readl(prcm_base + PRCM_PWROFF_GATING_REG(cluster));
> > -   reg &= ~PRCM_PWROFF_GATING_REG_CLUSTER;
> > +   reg &= ~PRCM_PWROFF_GATING_REG_CLUSTER_SUN9I;
> > writel(reg, prcm_base + PRCM_PWROFF_GATING_REG(cluster));
> > udelay(20);
> >   
> > @@ -452,7 +452,7 @@ static int sunxi_cluster_powerdown(unsigned int cluster)
> > /* gate cluster power */
> > pr_debug("%s: gate cluster power\n", __func__);
> > reg = readl(prcm_base + PRCM_PWROFF_GATING_REG(cluster));
> > -   reg |= PRCM_PWROFF_GATING_REG_CLUSTER;
> > +   reg |= PRCM_PWROFF_GATING_REG_CLUSTER_SUN9I;
> > writel(reg, prcm_base + PRCM_PWROFF_GATING_REG(cluster));
> > udelay(20);
> > 
> 
> MBR, Sergei



Re: [PATCH v6 00/11] Sunxi: Add SMP support on A83T

2018-04-17 Thread Mylène Josserand
Hello Ondrej,

On Tue, 17 Apr 2018 04:15:00 +0200
Ondřej Jirman  wrote:

> Hello Mylène,
> 
> Please also add this:
> 
> diff --git a/arch/arm/mach-sunxi/Kconfig b/arch/arm/mach-sunxi/Kconfig
> index ce53ceaf4cc5..d9c8ecf88ec6 100644
> --- a/arch/arm/mach-sunxi/Kconfig
> +++ b/arch/arm/mach-sunxi/Kconfig
> @@ -51,7 +51,7 @@ config MACH_SUN9I
>  config ARCH_SUNXI_MC_SMP
> bool
> depends on SMP
> -   default MACH_SUN9I
> + default MACH_SUN9I || MACH_SUN8I
> select ARM_CCI400_PORT_CTRL
> select ARM_CPU_SUSPEND
> 
> Because otherwise when I'm building kernel just for sun8i and I don't have 
> sun9i
> enabled, this new SMP code for A83T (which is sun8i) will not be built.
> 

True, I forgot to add this, thanks!

Best regards,

Mylène

-- 
Mylène Josserand, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
http://bootlin.com


> thank you,
>   Ondrej
> 
> On Mon, Apr 16, 2018 at 11:50:21PM +0200, Mylène Josserand wrote:
> > Hello everyone,
> > 
> > This is a V6 of my series that adds SMP support for Allwinner sun8i-a83t.
> > Based on sunxi's tree, sunxi/for-next branch.
> > Depends on a patch from Doug Berger that allows to include the "cpu-type"
> > header on assembly files:
> > 6c7dd080ba4b ("ARM: Allow this header to be included by assembly files")
> > 
> > This new series refactors the shmobile code to use the function introduced
> > in this series: "secure_cntvoff_init".
> > Geert Uytterhoeven and Simon Horman, could you review and test this series
> > on Renesas boards? Thank you very much!
> > 
> > If you have any remarks/questions, let me know.
> > Thank you in advance,
> > Mylène
> > 
> > Changes since v5:
> > - Remove my patch 01 and use the patch of Doug Berger to be able to
> > include the cpu-type header on assembly files.
> > - Rename smp_init_cntvoff function into secure_cntvoff_init according
> > to Marc Zyngier's review.
> > - According to Chen-Yu and Maxime's reviews, remove the patch that was
> > moving structures. Instead of using an index to retrieve which
> > architecture we are having, use a global variable.
> > - Merge the 2 patches that move assembly code from C to assembly file.
> > - Use a sun8i field instead of sun9i to know on which architecture we
> > are using because many modifications/additions of the code are for
> > sun8i-a83t.
> > - Rework the patch "add is_sun8i field" to add only this field in this
> > patch. The part of the patch that was starting to handle the differences
> > between sun8i-a83t and sun9i-a80 is merged in the patch that adds the
> > support of sun8i-a83t.
> > - Add a new patch that refactor the shmobile code to use the new 
> > function
> > secure_cntvoff_init introduced in this series.
> > 
> > Changes since v4:
> > - Rebased my series according to new Chen-Yu series:
> >"ARM: sunxi: Clean and improvements for multi-cluster SMP"
> >https://lkml.org/lkml/2018/3/8/886
> > - Updated my series according to Marc Zyngier's reviews to add CNTVOFF
> > initialization's function into ARM's common part. Thanks to that, other
> > platforms such as Renesa can use this function.
> > - For boot CPU, create a new machine to handle the CNTVOFF 
> > initialization
> > using "init_early" callback.
> > Changes since v3:
> > - Take into account Maxime's reviews:
> > - split the first patch into 4 new patches: add sun9i device tree
> > parsing, rename some variables, add a83t support and finally,
> > add hotplug support.
> > - Move the code of previous patch 07 (to disable CPU0 disabling)
> > into hotplug support patch (see patch 04)
> > - Remove the patch that added PRCM register because it is already
> > available. Because of that, update the device tree parsing to use
> > "sun8i-a83t-r-ccu".
> > - Use a variable to know which SoC we currently have
> > - Take into account Chen-Yu's reviews: create two iounmap functions
> > to release the resources of the device tree parsing.
> > - Take into account Marc's review: Update the code to initialize CNTVOFF
> > register. As there is already assembly code in the driver, I decided
> > to create an assembly file not to mix assembly and C code.
> > For that, I create 3 new patches: move the current assembly code that
> > handles the cluster cache enabling into a file, move the cpu_resume 
> > entry
> > in this file and finally, add a new assembly entry to initialize the 
> > timer
> > offset for boot CPU and secondary CPUs.
> > 
> > Changes since v2:
> > - Rebased my modifications according to new Chen Yu's patch series
> > that adds SMP support for sun9i-a80 (without MCPM).
> > - Split the device-tree patches into 3 patches for CPUCFG, R_CPUCFG
> > and PRCM registers for more visibility.
> > - The hotplug of CPU0 is currently not working (even after trying what
> > Allwinner's c

Re: cpu stopper threads and load balancing leads to deadlock

2018-04-17 Thread Mike Galbraith
On Tue, 2018-04-17 at 15:21 +0100, Matt Fleming wrote:
> Hi guys,
> 
> We've seen a bug in one of our SLE kernels where the cpu stopper
> thread ("migration/15") is entering idle balance. This then triggers
> active load balance.
> 
> At the same time, a task on another CPU triggers a page fault and NUMA
> balancing kicks in to try and migrate the task closer to the NUMA node
> for that page (we're inside stop_two_cpus()). This faulting task is
> spinning in try_to_wake_up() (inside smp_cond_load_acquire(&p->on_cpu,
> !VAL)), waiting for "migration/15" to context switch.
> 
> Unfortunately, because "migration/15" is doing active load balance
> it's spinning waiting for the NUMA-page-faulting CPU's stopper lock,
> which is already held (since it's inside stop_two_cpus()).
> 
> Deadlock ensues.
> 
> This seems like a situation that should be prohibited, but I cannot
> find any code to prevent it. Is it OK for stopper threads to load
> balance? Is there something that should prevent this situation from
> happening?

I don't see anything to stop the deadlock either; I would exclude the stop
class from playing idle balancer entirely, though I suppose you could
check for the caller being stop class in need_active_balance().  I don't
think any RT class playing idle balancer is particularly wonderful.
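
A rough sketch of that check, assuming it would live in need_active_balance()
in kernel/sched/fair.c (untested, and the elided lines stand for the existing
imbalance tests):

        /* kernel/sched/fair.c -- sketch only, not a tested fix */
        static int need_active_balance(struct lb_env *env)
        {
                struct sched_domain *sd = env->sd;

                /*
                 * Never let the cpu stopper (stop class) act as an active
                 * balancer: it can end up spinning on another CPU's stopper
                 * lock while that CPU spins waiting for the stopper to
                 * context switch.
                 */
                if (current->sched_class == &stop_sched_class)
                        return 0;

                /* ... existing asym packing / imbalance checks elided ... */

                return unlikely(sd->nr_balance_failed > sd->cache_nice_tries + 2);
        }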

-Mike


Re: [PATCH v2] usb: chipidea: Hook into mux framework to toggle usb switch

2018-04-17 Thread yossim

On 2018-04-17 17:11, Peter Rosin wrote:

On 2018-04-17 15:52, Yossi Mansharoff wrote:

On the db410c 96boards platform we have a TC7USB40MU on the board
to mux the D+/D- lines coming from the controller between a micro
usb "device" port and a USB hub for "host" roles[1]. During a
role switch, we need to toggle this mux to forward the D+/D-
lines to either the port or the hub. Add the necessary code to do
the role switch in chipidea core via the generic mux framework.
Board configurations like on db410c are expected to change roles
via the sysfs API described in
Documentation/ABI/testing/sysfs-platform-chipidea-usb2.
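
For readers unfamiliar with the generic mux framework, the toggling described
above looks roughly like the sketch below; the "usb_switch" control name and
the function name are hypothetical and not taken from the actual patch (a real
driver would also look the mux up once at probe time rather than per switch):

        #include <linux/device.h>
        #include <linux/err.h>
        #include <linux/mux/consumer.h>

        /* sketch: flip a TC7USB40MU-style D+/D- switch on a role change */
        static int ci_role_switch_mux(struct device *dev, bool host)
        {
                struct mux_control *mux;
                int ret;

                mux = devm_mux_control_get(dev, "usb_switch"); /* hypothetical name */
                if (IS_ERR(mux))
                        return PTR_ERR(mux);

                /* state 0: D+/D- to the micro-USB "device" port,
                 * state 1: D+/D- to the on-board hub for the "host" role */
                ret = mux_control_select(mux, host ? 1 : 0);
                if (ret)
                        return ret;

                /* the mux keeps its last state after deselect (no idle state set) */
                mux_control_deselect(mux);
                return 0;
        }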


Ok, so this is v2. Please describe what is different from v1.
I have told you before that this information helps.

[1] 
https://github.com/96boards/documentation/raw/master/ConsumerEdition/DragonBoard-410c/HardwareDocs/Schematics_DragonBoard.pdf


This link returns 404 for me.

Cheers,
Peter



Hi,
This patch was split apart from the original patch into two patches:
one for chipidea and the other for bindings.
This patch has no other changes to the code.

I will update the link.

thanks
Yossi


Re: [PATCH v6 09/11] ARM: sun8i: smp: Add support for A83T

2018-04-17 Thread Mylène Josserand
Hello Maxime,

On Tue, 17 Apr 2018 13:20:38 +0200
Maxime Ripard  wrote:

> On Mon, Apr 16, 2018 at 11:50:30PM +0200, Mylène Josserand wrote:
> > @@ -535,8 +599,12 @@ static int sunxi_mc_smp_cpu_kill(unsigned int l_cpu)
> > return !ret;
> >  }
> >  
> > -static bool sunxi_mc_smp_cpu_can_disable(unsigned int __unused)
> > +static bool sunxi_mc_smp_cpu_can_disable(unsigned int cpu)
> >  {
> > +   /* CPU0 hotplug not handled for sun8i-a83t */
> > +   if (is_sun8i)
> > +   if (cpu == 0)
> > +   return false;
> > return true;  
> 
> I think Chen-Yu told you how to implement the hotplug in the previous
> iteration. Did you have time to test it?

Not yet, I will have a look this evening.

Best regards,

-- 
Mylène Josserand, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
http://bootlin.com


[PATCH RESENT v4] dell_rbu: make firmware payload memory uncachable

2018-04-17 Thread Takashi Iwai
From: Stuart Hayes 

The dell_rbu driver takes firmware update payloads and puts them in memory so
the system BIOS can find them after a reboot.  This sometimes fails (though
rarely), because the memory containing the payload is in the CPU cache but
never gets written back to main memory before the system is rebooted (CPU
cache contents are lost on reboot).

With this patch, the payload memory will be changed to uncachable to ensure
that the payload is actually in main memory before the system is rebooted.

Signed-off-by: Stuart Hayes 
Reviewed-by: Takashi Iwai 
Signed-off-by: Takashi Iwai 
---
v2 Added include, removed extra parentheses
v3 Corrected formatting and include line
v4 Moved set_memory_uc() outside the while loop so that the memory is
   definitely allocated before it is set to uncachable

Andrew, could you pick up this orphan one?  Thanks!

diff --git a/drivers/firmware/dell_rbu.c b/drivers/firmware/dell_rbu.c
index 2f452f1f7c8a..53f27a6e2d76 100644
--- a/drivers/firmware/dell_rbu.c
+++ b/drivers/firmware/dell_rbu.c
@@ -45,6 +45,7 @@
 #include 
 #include 
 #include 
+#include 
 
 MODULE_AUTHOR("Abhay Salunke ");
 MODULE_DESCRIPTION("Driver for updating BIOS image on DELL systems");
@@ -181,6 +182,11 @@ static int create_packet(void *data, size_t length)
packet_data_temp_buf = NULL;
}
}
+   /*
+* set to uncachable or it may never get written back before reboot
+*/
+   set_memory_uc((unsigned long)packet_data_temp_buf, 1 << ordernum);
+
spin_lock(&rbu_data.lock);
 
newpacket->data = packet_data_temp_buf;
@@ -349,6 +355,8 @@ static void packet_empty_list(void)
 * to make sure there are no stale RBU packets left in memory
 */
memset(newpacket->data, 0, rbu_data.packetsize);
+   set_memory_wb((unsigned long)newpacket->data,
+   1 << newpacket->ordernum);
free_pages((unsigned long) newpacket->data,
newpacket->ordernum);
kfree(newpacket);



Re: [PATCH] gpu: drm: i915: Change return type to vm_fault_t

2018-04-17 Thread Jani Nikula
On Tue, 17 Apr 2018, Souptick Joarder  wrote:
> On 17-Apr-2018 9:45 PM, "Matthew Wilcox"  wrote:
>>
>> On Tue, Apr 17, 2018 at 09:14:32PM +0530, Souptick Joarder wrote:
>> > Not exactly. The plan for these patches is to introduce the new vm_fault_t
>> > type in vm_operations_struct fault handlers. It's now available in 4.17-rc1.
>> > We will push all the required drivers/filesystem changes through the
>> > different maintainers to Linus' tree. Once everything is converted to the
>> > vm_fault_t type, changing it from a signed to an unsigned int causes GCC to
>> > warn about an assignment from an incompatible type -- int foo(void) is
>> > incompatible with unsigned int foo(void).
>> >
>> > Please refer to 1c8f422059ae ("mm: change return type to vm_fault_t") in
>> > 4.17-rc1.
>>
>> I think this patch would be clearer if you did
>>
>> -   int ret;
>> +   int err;
>> +   vm_fault_t ret;
>>
>> Then it would be clearer to the maintainer that you're splitting apart the
>> VM_FAULT and errno codes.
>>
>> Sorry for not catching this during initial review.
>
> Ok, I will make required changes and send v2. Sorry, even I missed this :)

I'm afraid Daniel is closer to the truth. My bad, sorry for the noise.

BR,
Jani.



>>
>> > On Tue, Apr 17, 2018 at 8:59 PM, Jani Nikula
>> >  wrote:
>> > > On Tue, 17 Apr 2018, Souptick Joarder  wrote:
>> > >> Use new return type vm_fault_t for fault handler. For
>> > >> now, this is just documenting that the function returns
>> > >> a VM_FAULT value rather than an errno. Once all instances
>> > >> are converted, vm_fault_t will become a distinct type.
>> > >>
>> > >> Reference id -> 1c8f422059ae ("mm: change return type to
>> > >> vm_fault_t")
>> > >>
>> > >> Signed-off-by: Souptick Joarder 
>> > >> ---
>> > >>  drivers/gpu/drm/i915/i915_drv.h |  3 ++-
>> > >>  drivers/gpu/drm/i915/i915_gem.c | 15 ---
>> > >>  2 files changed, 10 insertions(+), 8 deletions(-)
>> > >>
>> > >> diff --git a/drivers/gpu/drm/i915/i915_drv.h
> b/drivers/gpu/drm/i915/i915_drv.h
>> > >> index a42deeb..95b0d50 100644
>> > >> --- a/drivers/gpu/drm/i915/i915_drv.h
>> > >> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> > >> @@ -51,6 +51,7 @@
>> > >>  #include 
>> > >>  #include 
>> > >>  #include 
>> > >> +#include 
>> > >>
>> > >>  #include "i915_params.h"
>> > >>  #include "i915_reg.h"
>> > >> @@ -3363,7 +3364,7 @@ int i915_gem_wait_for_idle(struct
> drm_i915_private *dev_priv,
>> > >>  unsigned int flags);
>> > >>  int __must_check i915_gem_suspend(struct drm_i915_private
> *dev_priv);
>> > >>  void i915_gem_resume(struct drm_i915_private *dev_priv);
>> > >> -int i915_gem_fault(struct vm_fault *vmf);
>> > >> +vm_fault_t i915_gem_fault(struct vm_fault *vmf);
>> > >>  int i915_gem_object_wait(struct drm_i915_gem_object *obj,
>> > >>unsigned int flags,
>> > >>long timeout,
>> > >> diff --git a/drivers/gpu/drm/i915/i915_gem.c
> b/drivers/gpu/drm/i915/i915_gem.c
>> > >> index dd89abd..bdac690 100644
>> > >> --- a/drivers/gpu/drm/i915/i915_gem.c
>> > >> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> > >> @@ -1882,7 +1882,7 @@ int i915_gem_mmap_gtt_version(void)
>> > >>   * The current feature set supported by i915_gem_fault() and thus
> GTT mmaps
>> > >>   * is exposed via I915_PARAM_MMAP_GTT_VERSION (see
> i915_gem_mmap_gtt_version).
>> > >>   */
>> > >> -int i915_gem_fault(struct vm_fault *vmf)
>> > >> +vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>> > >>  {
>> > >>  #define MIN_CHUNK_PAGES ((1 << 20) >> PAGE_SHIFT) /* 1 MiB */
>> > >>   struct vm_area_struct *area = vmf->vma;
>> > >> @@ -1895,6 +1895,7 @@ int i915_gem_fault(struct vm_fault *vmf)
>> > >>   pgoff_t page_offset;
>> > >>   unsigned int flags;
>> > >>   int ret;
>> > >> + vm_fault_t retval;
>> > >
>> > > What's the point of changing the name? An unnecessary change.
>> > >
>> > > BR,
>> > > Jani.
>> > >
>> > >>
>> > >>   /* We don't use vmf->pgoff since that has the fake offset */
>> > >>   page_offset = (vmf->address - area->vm_start) >> PAGE_SHIFT;
>> > >> @@ -2000,7 +2001,7 @@ int i915_gem_fault(struct vm_fault *vmf)
>> > >>* and so needs to be reported.
>> > >>*/
>> > >>   if (!i915_terminally_wedged(&dev_priv->gpu_error)) {
>> > >> - ret = VM_FAULT_SIGBUS;
>> > >> + retval = VM_FAULT_SIGBUS;
>> > >>   break;
>> > >>   }
>> > >>   case -EAGAIN:
>> > >> @@ -2017,21 +2018,21 @@ int i915_gem_fault(struct vm_fault *vmf)
>> > >>* EBUSY is ok: this just means that another thread
>> > >>* already did the job.
>> > >>*/
>> > >> - ret = VM_FAULT_NOPAGE;
>> > >> + retval = VM_FAULT_NOPAGE;
>> > >>   break;
>> > >>   case -ENOMEM:
>> > >> - ret = VM_FAULT_OOM;
>> > >> + retval = VM_FAULT_OOM;
>> > >>   brea

Re: [PATCH v6 6/7] remoteproc/davinci: use the reset framework

2018-04-17 Thread Sekhar Nori
On Tuesday 17 April 2018 11:00 PM, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski 
> 
> Switch to using the reset framework instead of handcoded reset routines
> we used so far.
> 
> Signed-off-by: Bartosz Golaszewski 

Looks good!

Reviewed-by: Sekhar Nori 

This depends on DaVinci gaining common clock / reset framework support
though. I am guessing remoteproc maintainers will have to be reminded
when it's safe to apply.
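
For context, the consumer side of the reset framework that this moves to looks
roughly like the following sketch (device-managed lookup assumed; this is not
the actual driver code):

        #include <linux/device.h>
        #include <linux/err.h>
        #include <linux/reset.h>

        /* sketch: grab the DSP reset line at probe time and hold the DSP in reset */
        static int dsp_reset_init(struct device *dev, struct reset_control **out)
        {
                struct reset_control *rst;

                rst = devm_reset_control_get_exclusive(dev, NULL);
                if (IS_ERR(rst))
                        return PTR_ERR(rst);

                reset_control_assert(rst);      /* keep the DSP held in reset */
                *out = rst;
                return 0;
        }

        /* later, when the firmware is loaded: reset_control_deassert(rst); */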

Thanks,
Sekhar


[PATCH 2/2] qxl: keep separate release_bo pointer

2018-04-17 Thread Gerd Hoffmann
qxl expects that list_first_entry(release->bos) returns the first
element qxl added to the list.  ttm_eu_reserve_buffers() may reorder
the list though.

Add a release_bo field to struct qxl_release and use that instead.

Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/qxl/qxl_drv.h |  1 +
 drivers/gpu/drm/qxl/qxl_cmd.c |  6 ++
 drivers/gpu/drm/qxl/qxl_release.c | 12 ++--
 3 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index 00a1a66b05..864b456080 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -167,6 +167,7 @@ struct qxl_release {
 
int id;
int type;
+   struct qxl_bo *release_bo;
uint32_t release_offset;
uint32_t surface_release_id;
struct ww_acquire_ctx ticket;
diff --git a/drivers/gpu/drm/qxl/qxl_cmd.c b/drivers/gpu/drm/qxl/qxl_cmd.c
index c0fb52c6d4..01665b98c5 100644
--- a/drivers/gpu/drm/qxl/qxl_cmd.c
+++ b/drivers/gpu/drm/qxl/qxl_cmd.c
@@ -179,10 +179,9 @@ qxl_push_command_ring_release(struct qxl_device *qdev, 
struct qxl_release *relea
  uint32_t type, bool interruptible)
 {
struct qxl_command cmd;
-   struct qxl_bo_list *entry = list_first_entry(&release->bos, struct 
qxl_bo_list, tv.head);
 
cmd.type = type;
-   cmd.data = qxl_bo_physical_address(qdev, to_qxl_bo(entry->tv.bo), 
release->release_offset);
+   cmd.data = qxl_bo_physical_address(qdev, release->release_bo, 
release->release_offset);
 
return qxl_ring_push(qdev->command_ring, &cmd, interruptible);
 }
@@ -192,10 +191,9 @@ qxl_push_cursor_ring_release(struct qxl_device *qdev, 
struct qxl_release *releas
 uint32_t type, bool interruptible)
 {
struct qxl_command cmd;
-   struct qxl_bo_list *entry = list_first_entry(&release->bos, struct 
qxl_bo_list, tv.head);
 
cmd.type = type;
-   cmd.data = qxl_bo_physical_address(qdev, to_qxl_bo(entry->tv.bo), 
release->release_offset);
+   cmd.data = qxl_bo_physical_address(qdev, release->release_bo, 
release->release_offset);
 
return qxl_ring_push(qdev->cursor_ring, &cmd, interruptible);
 }
diff --git a/drivers/gpu/drm/qxl/qxl_release.c 
b/drivers/gpu/drm/qxl/qxl_release.c
index a0b4244d28..7cb2145772 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -173,6 +173,7 @@ qxl_release_free_list(struct qxl_release *release)
list_del(&entry->tv.head);
kfree(entry);
}
+   release->release_bo = NULL;
 }
 
 void
@@ -296,7 +297,6 @@ int qxl_alloc_surface_release_reserved(struct qxl_device 
*qdev,
 {
if (surface_cmd_type == QXL_SURFACE_CMD_DESTROY && create_rel) {
int idr_ret;
-   struct qxl_bo_list *entry = list_first_entry(&create_rel->bos, 
struct qxl_bo_list, tv.head);
struct qxl_bo *bo;
union qxl_release_info *info;
 
@@ -304,8 +304,9 @@ int qxl_alloc_surface_release_reserved(struct qxl_device 
*qdev,
idr_ret = qxl_release_alloc(qdev, QXL_RELEASE_SURFACE_CMD, 
release);
if (idr_ret < 0)
return idr_ret;
-   bo = to_qxl_bo(entry->tv.bo);
+   bo = create_rel->release_bo;
 
+   (*release)->release_bo = bo;
(*release)->release_offset = create_rel->release_offset + 64;
 
qxl_release_list_add(*release, bo);
@@ -365,6 +366,7 @@ int qxl_alloc_release_reserved(struct qxl_device *qdev, 
unsigned long size,
 
bo = qxl_bo_ref(qdev->current_release_bo[cur_idx]);
 
+   (*release)->release_bo = bo;
(*release)->release_offset = qdev->current_release_bo_offset[cur_idx] * 
release_size_per_bo[cur_idx];
qdev->current_release_bo_offset[cur_idx]++;
 
@@ -408,8 +410,7 @@ union qxl_release_info *qxl_release_map(struct qxl_device 
*qdev,
 {
void *ptr;
union qxl_release_info *info;
-   struct qxl_bo_list *entry = list_first_entry(&release->bos, struct 
qxl_bo_list, tv.head);
-   struct qxl_bo *bo = to_qxl_bo(entry->tv.bo);
+   struct qxl_bo *bo = release->release_bo;
 
ptr = qxl_bo_kmap_atomic_page(qdev, bo, release->release_offset & 
PAGE_MASK);
if (!ptr)
@@ -422,8 +423,7 @@ void qxl_release_unmap(struct qxl_device *qdev,
   struct qxl_release *release,
   union qxl_release_info *info)
 {
-   struct qxl_bo_list *entry = list_first_entry(&release->bos, struct 
qxl_bo_list, tv.head);
-   struct qxl_bo *bo = to_qxl_bo(entry->tv.bo);
+   struct qxl_bo *bo = release->release_bo;
void *ptr;
 
ptr = ((void *)info) - (release->release_offset & ~PAGE_MASK);
-- 
2.9.3



[PATCH 1/2] qxl: fix qxl_release_{map,unmap}

2018-04-17 Thread Gerd Hoffmann
s/PAGE_SIZE/PAGE_MASK/

Luckily release_offset is never larger than PAGE_SIZE, so the bug has no
bad side effects and managed to stay unnoticed for years that way ...
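
To spell out the difference the one-character fix hinges on (illustrative only,
with 4K pages):

        /* PAGE_SIZE == 0x1000, PAGE_MASK == ~0xfffUL on a 4K-page kernel */
        unsigned long page_start  = release_offset & PAGE_MASK;   /* page-aligned base   */
        unsigned long page_offset = release_offset & ~PAGE_MASK;  /* offset in that page */

        /* release_offset & PAGE_SIZE only isolates bit 12 (0 or 0x1000); it happened
         * to evaluate to 0 here because release_offset never exceeded PAGE_SIZE */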

Signed-off-by: Gerd Hoffmann 
---
 drivers/gpu/drm/qxl/qxl_ioctl.c   | 4 ++--
 drivers/gpu/drm/qxl/qxl_release.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/qxl/qxl_ioctl.c b/drivers/gpu/drm/qxl/qxl_ioctl.c
index e238a1a2ec..6cc9f3367f 100644
--- a/drivers/gpu/drm/qxl/qxl_ioctl.c
+++ b/drivers/gpu/drm/qxl/qxl_ioctl.c
@@ -182,9 +182,9 @@ static int qxl_process_single_command(struct qxl_device 
*qdev,
goto out_free_reloc;
 
/* TODO copy slow path code from i915 */
-   fb_cmd = qxl_bo_kmap_atomic_page(qdev, cmd_bo, (release->release_offset 
& PAGE_SIZE));
+   fb_cmd = qxl_bo_kmap_atomic_page(qdev, cmd_bo, (release->release_offset 
& PAGE_MASK));
unwritten = __copy_from_user_inatomic_nocache
-   (fb_cmd + sizeof(union qxl_release_info) + 
(release->release_offset & ~PAGE_SIZE),
+   (fb_cmd + sizeof(union qxl_release_info) + 
(release->release_offset & ~PAGE_MASK),
 u64_to_user_ptr(cmd->command), cmd->command_size);
 
{
diff --git a/drivers/gpu/drm/qxl/qxl_release.c 
b/drivers/gpu/drm/qxl/qxl_release.c
index 5d84a66fed..a0b4244d28 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -411,10 +411,10 @@ union qxl_release_info *qxl_release_map(struct qxl_device 
*qdev,
struct qxl_bo_list *entry = list_first_entry(&release->bos, struct 
qxl_bo_list, tv.head);
struct qxl_bo *bo = to_qxl_bo(entry->tv.bo);
 
-   ptr = qxl_bo_kmap_atomic_page(qdev, bo, release->release_offset & 
PAGE_SIZE);
+   ptr = qxl_bo_kmap_atomic_page(qdev, bo, release->release_offset & 
PAGE_MASK);
if (!ptr)
return NULL;
-   info = ptr + (release->release_offset & ~PAGE_SIZE);
+   info = ptr + (release->release_offset & ~PAGE_MASK);
return info;
 }
 
@@ -426,7 +426,7 @@ void qxl_release_unmap(struct qxl_device *qdev,
struct qxl_bo *bo = to_qxl_bo(entry->tv.bo);
void *ptr;
 
-   ptr = ((void *)info) - (release->release_offset & ~PAGE_SIZE);
+   ptr = ((void *)info) - (release->release_offset & ~PAGE_MASK);
qxl_bo_kunmap_atomic_page(qdev, bo, ptr);
 }
 
-- 
2.9.3



Re: [PATCH 5/5] dmaengine: sprd: Add 'device_config' and 'device_prep_slave_sg' interfaces

2018-04-17 Thread Baolin Wang
On 17 April 2018 at 18:45, Lars-Peter Clausen  wrote:
> On 04/10/2018 09:46 AM, Baolin Wang wrote:
> [...]
>> +static int sprd_dma_slave_config(struct dma_chan *chan,
>> +  struct dma_slave_config *config)
>> +{
>> + struct sprd_dma_chn *schan = to_sprd_dma_chan(chan);
>> + struct sprd_dma_config *slave_cfg =
>> + container_of(config, struct sprd_dma_config, config);
>> +
>
> Please do not overload the standard API with custom semantics. This makes the
> driver incompatible with the API and negates the whole idea of having a common
> API. E.g. this will crash when somebody passes a normal dma_slave_config
> struct to this function.
>
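
To illustrate the hazard being pointed out, here is a sketch of a perfectly
legal client of the generic API (not code from the driver; the FIFO address is
just an example):

        #include <linux/dmaengine.h>

        static int client_setup(struct dma_chan *chan, dma_addr_t fifo_phys)
        {
                struct dma_slave_config cfg = {
                        .direction = DMA_MEM_TO_DEV,
                        .dst_addr  = fifo_phys,
                };

                /*
                 * Inside the driver above this becomes
                 *      container_of(&cfg, struct sprd_dma_config, config)
                 * i.e. a pointer into this function's stack frame, which was
                 * never a struct sprd_dma_config -- any vendor-specific field
                 * read through it is garbage.
                 */
                return dmaengine_slave_config(chan, &cfg);
        }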

Yes, we have discussed with Vinod how to use 'dma_slave_config' to
reach our requirements. Thanks for your comments.

-- 
Baolin.wang
Best Regards


Re: [PATCH v6 5/7] remoteproc/davinci: use octal permissions for module_param()

2018-04-17 Thread Sekhar Nori
On Tuesday 17 April 2018 11:00 PM, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski 
> 
> Checkpatch recommends using octal perms instead of S_IRUGO.
> 
> Signed-off-by: Bartosz Golaszewski 

Reviewed-by: Sekhar Nori 

Thanks,
Sekhar


Re: [PATCH v6 4/7] remoteproc/davinci: prepare and unprepare the clock where needed

2018-04-17 Thread Sekhar Nori
On Tuesday 17 April 2018 11:00 PM, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski 
> 
> We're currently switching the platform to using the common clock
> framework. We need to explicitly prepare and unprepare the rproc
> clock.
> 
> Signed-off-by: Bartosz Golaszewski 
> Acked-by: Suman Anna 
> Reviewed-by: David Lechner 

Reviewed-by: Sekhar Nori 

This should be safe to apply to v4.17-rc1 as well (for inclusion in v4.18).

Bartosz, I noticed that CONFIG_REMOTEPROC and the DA8XX driver are not
enabled in davinci_all_defconfig. Can you please send a patch enabling
them too?

Thanks,
Sekhar


Re: [PATCH 4.15 00/53] 4.15.18-stable review

2018-04-17 Thread Naresh Kamboju
On 17 April 2018 at 21:28, Greg Kroah-Hartman
 wrote:
> 
> NOTE, this is the last expected 4.15.y release.  After this one, the
> tree is end-of-life.  Please move to 4.16.y at this point in time.
> 
>
> This is the start of the stable review cycle for the 4.15.18 release.
> There are 53 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Apr 19 15:57:06 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.15.18-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.15.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

kselftest: the BPF tests test_xdp_meta.sh and test_xdp_redirect.sh were being
skipped with "Could not run test without the ip {xdp,xdpgeneric} support",
which got added into iproute2 4.11; they are now being run and are reported as
failing on stable-rc-4.15.18-rc1 and also on the linux-mainline 4.17 kernel.

We have an open bug to investigate this failure.
LKFT: mainline: BPF: test_xdp_redirect.sh and test_xdp_meta.sh skipped -
Could not run test without the ip xdpgeneric support
https://bugs.linaro.org/show_bug.cgi?id=3630

Summary


kernel: 4.15.18-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.15.y
git commit: ae8929dc5a7d09feab66e3131a04a1ed88d8d284
git describe: v4.15.17-54-gae8929dc5a7d
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.15-oe/build/v4.15.17-54-gae8929dc5a7d

No Regressions (compared to build v4.15.17)



Boards, architectures and test suites:
-

dragonboard-410c - arm64
* boot - fail: 2, pass: 20
* kselftest - skip: 20, fail: 2, pass: 43
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 1, pass: 21
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 14
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 134, pass: 1016
* ltp-timers-tests - pass: 13

hi6220-hikey - arm64
* boot - pass: 20
* kselftest - skip: 17, fail: 2, pass: 46
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 1, pass: 21
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 4, pass: 10
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 135, pass: 1015
* ltp-timers-tests - pass: 13

juno-r2 - arm64
* boot - pass: 20
* kselftest - skip: 18, fail: 2, pass: 45
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 4, pass: 10
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 134, pass: 1016
* ltp-timers-tests - pass: 13

qemu_x86_64
* boot - pass: 22
* kselftest - skip: 23, fail: 2, pass: 55
* kselftest-vsyscall-mode-native - skip: 23, fail: 2, pass: 55
* kselftest-vsyscall-mode-none - skip: 23, fail: 2, pass: 55
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 1, pass: 13
* ltp-securebits-test

Re: [patch v2] mm, oom: fix concurrent munlock and oom reaper unmap

2018-04-17 Thread David Rientjes
On Wed, 18 Apr 2018, Tetsuo Handa wrote:

> > Commit 97b1255cb27c is referencing MMF_OOM_SKIP already being set by 
> > exit_mmap().  The only thing this patch changes is where that is done: 
> > before or after free_pgtables().  We can certainly move it to before 
> > free_pgtables() at the risk of subsequent (and eventually unnecessary) oom 
> > kills.  It's not exactly the point of this patch.
> > 
> > I have thousands of real-world examples where additional processes were 
> > oom killed while the original victim was in free_pgtables().  That's why 
> > we've moved the MMF_OOM_SKIP to after free_pgtables().
> 
> "we have moved"? No, not yet. Your patch is about to move it.
> 

I'm referring to our own kernel; we have thousands of real-world examples
where additional processes have been oom killed while the original victim
is in free_pgtables().  It actually happens about 10-15% of the time in
automated testing where you create a 128MB memcg, fork a canary, and then 
fork a >128MB memory hog.  10-15% of the time both processes get oom 
killed: the memory hog first (higher rss), the canary second.  The pgtable 
stat is unchanged between oom kills.

> My question is: is it guaranteed that 
> munlock_vma_pages_all()/unmap_vmas()/free_pgtables()
> by exit_mmap() are never blocked for memory allocation. Note that exit_mmap() 
> tries to unmap
> all pages while the OOM reaper tries to unmap only safe pages. If there is 
> possibility that
> munlock_vma_pages_all()/unmap_vmas()/free_pgtables() by exit_mmap() are 
> blocked for memory
> allocation, your patch will introduce an OOM livelock.
> 

If munlock_vma_pages_all(), unmap_vmas(), or free_pgtables() require 
memory to make forward progress, then we have bigger problems :)

I just ran a query of real-world oom kill logs that I have.  In 33,773,705 
oom kills, I have no evidence of a thread failing to exit after reaching 
exit_mmap().

You may recall from my support of your patch to emit the stack trace when 
the oom reaper fails, in https://marc.info/?l=linux-mm&m=152157881518627, 
that I have logs of 28,222,058 occurrences of the oom reaper where it 
successfully frees memory and the victim exits.

If you'd like to pursue the possibility that exit_mmap() blocks before 
freeing memory that we have somehow been lucky to miss in 33 million 
occurrences, I'd appreciate the test case.


Re: [PATCH 4.16 00/68] 4.16.3-stable review

2018-04-17 Thread Naresh Kamboju
On 17 April 2018 at 21:27, Greg Kroah-Hartman
 wrote:
> This is the start of the stable review cycle for the 4.16.3 release.
> There are 68 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Thu Apr 19 15:57:33 UTC 2018.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.16.3-rc1.gz
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-4.16.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h

Results from Linaro’s test farm.
No regressions on arm64, arm and x86_64.

kselftest: the BPF tests test_xdp_meta.sh and test_xdp_redirect.sh were being
skipped with "Could not run test without the ip {xdp,xdpgeneric} support",
which got added into iproute2 4.11; they are now being run and are reported as
failing on stable-rc-4.16.3-rc1 and also on linux-mainline kernels 4.16 to 4.17.

We have an open bug to investigate this failure.
LKFT: mainline: BPF: test_xdp_redirect.sh and test_xdp_meta.sh skipped -
Could not run test without the ip xdpgeneric support
https://bugs.linaro.org/show_bug.cgi?id=3630

Summary


kernel: 4.16.3-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.16.y
git commit: 8485fd17d21b5d9a07b276f1c35c90a5ed529a36
git describe: v4.16.2-69-g8485fd17d21b
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-4.16-oe/build/v4.16.2-69-g8485fd17d21b

No regressions (compared to build v4.16.2)


Boards, architectures and test suites:
-

dragonboard-410c - arm64
* boot - fail: 1, pass: 19
* kselftest - skip: 20, fail: 2, pass: 43
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 1, pass: 21
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - pass: 14
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 133, pass: 1017
* ltp-timers-tests - pass: 13

hi6220-hikey - arm64
* boot - pass: 20
* kselftest - skip: 17, fail: 2, pass: 46
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - skip: 1, pass: 21
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 4, pass: 10
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 134, pass: 1016
* ltp-timers-tests - pass: 13

juno-r2 - arm64
* boot - pass: 20
* kselftest - skip: 18, fail: 2, pass: 45
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 4, pass: 10
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 133, pass: 1017
* ltp-timers-tests - pass: 13

qemu_x86_64
* boot - pass: 22
* kselftest - skip: 23, fail: 2, pass: 55
* kselftest-vsyscall-mode-native - skip: 23, fail: 2, pass: 55
* kselftest-vsyscall-mode-none - skip: 23, fail: 2, pass: 55
* libhugetlbfs - skip: 1, pass: 90
* ltp-cap_bounds-tests - pass: 2
* ltp-containers-tests - skip: 17, pass: 64
* ltp-fcntl-locktests-tests - pass: 2
* ltp-filecaps-tests - pass: 2
* ltp-fs-tests - skip: 6, pass: 57
* ltp-fs_bind-tests - pass: 2
* ltp-fs_perms_simple-tests - pass: 19
* ltp-fsx-tests - pass: 2
* ltp-hugetlb-tests - pass: 22
* ltp-io-tests - pass: 3
* ltp-ipc-tests - pass: 9
* ltp-math-tests - pass: 11
* ltp-nptl-tests - pass: 2
* ltp-pty-tests - pass: 4
* ltp-sched-tests - skip: 1, pass: 13
* ltp-securebits-tests - pass: 4
* ltp-syscalls-tests - skip: 147, pass: 1003
* ltp-timers-tests - pass: 13

x15 - arm
* boot - pass: 20
* kselftest - skip: 21, fail: 3, pass: 38
* libhugetlbfs - skip: 1, pass: 87
* ltp-cap_bounds-tests - pass: 2
*

RE: [PATCH 2/6 v2] iommu: of: make of_pci_map_rid() available for other devices too

2018-04-17 Thread Bharat Bhushan


> -Original Message-
> From: Robin Murphy [mailto:robin.mur...@arm.com]
> Sent: Tuesday, April 17, 2018 10:23 PM
> To: Nipun Gupta ; robh...@kernel.org;
> frowand.l...@gmail.com
> Cc: will.dea...@arm.com; mark.rutl...@arm.com; catalin.mari...@arm.com;
> h...@lst.de; gre...@linuxfoundation.org; j...@8bytes.org;
> m.szyprow...@samsung.com; shawn...@kernel.org; bhelg...@google.com;
> io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org;
> devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; linuxppc-
> d...@lists.ozlabs.org; linux-...@vger.kernel.org; Bharat Bhushan
> ; stuyo...@gmail.com; Laurentiu Tudor
> ; Leo Li 
> Subject: Re: [PATCH 2/6 v2] iommu: of: make of_pci_map_rid() available for
> other devices too
> 
> On 17/04/18 11:21, Nipun Gupta wrote:
> > iommu-map property is also used by devices with fsl-mc. This patch
> > moves the of_pci_map_rid to generic location, so that it can be used
> > by other busses too.

Nipun, please clarify that only the function name is changed and the rest of
the body is the same.

> >
> > Signed-off-by: Nipun Gupta 
> > ---
> >   drivers/iommu/of_iommu.c | 106
> > +--
> 
> Doesn't this break "msi-parent" parsing for !CONFIG_OF_IOMMU?

Yes, this will be a problem with MSI 

> I guess you
> don't want fsl-mc to have to depend on PCI, but this looks like a step in the
> wrong direction.
> 
> I'm not entirely sure where of_map_rid() fits best, but from a quick look 
> around
> the least-worst option might be drivers/of/of_address.c, unless Rob and Frank
> have a better idea of where generic DT-based ID translation routines could 
> live?

drivers/of/address.c may be the proper place to move it to until someone has a better idea.

Thanks
-Bharat

> 
> >   drivers/of/irq.c |   6 +--
> >   drivers/pci/of.c | 101 
> > 
> >   include/linux/of_iommu.h |  11 +
> >   include/linux/of_pci.h   |  10 -
> >   5 files changed, 117 insertions(+), 117 deletions(-)
> >
> > diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c index
> > 5c36a8b..4e7712f 100644
> > --- a/drivers/iommu/of_iommu.c
> > +++ b/drivers/iommu/of_iommu.c
> > @@ -138,6 +138,106 @@ static int of_iommu_xlate(struct device *dev,
> > return ops->of_xlate(dev, iommu_spec);
> >   }
> >
> > +/**
> > + * of_map_rid - Translate a requester ID through a downstream mapping.
> > + * @np: root complex device node.
> > + * @rid: device requester ID to map.
> > + * @map_name: property name of the map to use.
> > + * @map_mask_name: optional property name of the mask to use.
> > + * @target: optional pointer to a target device node.
> > + * @id_out: optional pointer to receive the translated ID.
> > + *
> > + * Given a device requester ID, look up the appropriate
> > +implementation-defined
> > + * platform ID and/or the target device which receives transactions
> > +on that
> > + * ID, as per the "iommu-map" and "msi-map" bindings. Either of
> > +@target or
> > + * @id_out may be NULL if only the other is required. If @target
> > +points to
> > + * a non-NULL device node pointer, only entries targeting that node
> > +will be
> > + * matched; if it points to a NULL value, it will receive the device
> > +node of
> > + * the first matching target phandle, with a reference held.
> > + *
> > + * Return: 0 on success or a standard error code on failure.
> > + */
> > +int of_map_rid(struct device_node *np, u32 rid,
> > +  const char *map_name, const char *map_mask_name,
> > +  struct device_node **target, u32 *id_out) {
> > +   u32 map_mask, masked_rid;
> > +   int map_len;
> > +   const __be32 *map = NULL;
> > +
> > +   if (!np || !map_name || (!target && !id_out))
> > +   return -EINVAL;
> > +
> > +   map = of_get_property(np, map_name, &map_len);
> > +   if (!map) {
> > +   if (target)
> > +   return -ENODEV;
> > +   /* Otherwise, no map implies no translation */
> > +   *id_out = rid;
> > +   return 0;
> > +   }
> > +
> > +   if (!map_len || map_len % (4 * sizeof(*map))) {
> > +   pr_err("%pOF: Error: Bad %s length: %d\n", np,
> > +   map_name, map_len);
> > +   return -EINVAL;
> > +   }
> > +
> > +   /* The default is to select all bits. */
> > +   map_mask = 0xffffffff;
> > +
> > +   /*
> > +* Can be overridden by "{iommu,msi}-map-mask" property.
> > +*/
> > +   if (map_mask_name)
> > +   of_property_read_u32(np, map_mask_name, &map_mask);
> > +
> > +   masked_rid = map_mask & rid;
> > +   for ( ; map_len > 0; map_len -= 4 * sizeof(*map), map += 4) {
> > +   struct device_node *phandle_node;
> > +   u32 rid_base = be32_to_cpup(map + 0);
> > +   u32 phandle = be32_to_cpup(map + 1);
> > +   u32 out_base = be32_to_cpup(map + 2);
> > +   u32 rid_len = be32_to_cpup(map + 3);
> > +
> > +   if (rid_base & ~map_mask) {
> > + 

Re: [PATCH v6 3/7] remoteproc/davinci: add the missing retval check for clk_enable()

2018-04-17 Thread Sekhar Nori
On Tuesday 17 April 2018 11:00 PM, Bartosz Golaszewski wrote:
> From: Bartosz Golaszewski 
> 
> The davinci platform is being switched to using the common clock
> framework, where clk_enable() can fail. Add the return value check.
> 
> Signed-off-by: Bartosz Golaszewski 
> Acked-by: Suman Anna 
> Reviewed-by: David Lechner 

Reviewed-by: Sekhar Nori 

This should be safe to apply to v4.17-rc1 and should be queued by Ohad /
Bjorn.

Thanks,
Sekhar


Re: [patch v2] mm, oom: fix concurrent munlock and oom reaper unmap

2018-04-17 Thread Tetsuo Handa
David Rientjes wrote:
> On Wed, 18 Apr 2018, Tetsuo Handa wrote:
> > > Fix this by reusing MMF_UNSTABLE to specify that an mm should not be
> > > reaped.  This prevents the concurrent munlock_vma_pages_range() and
> > > unmap_page_range().  The oom reaper will simply not operate on an mm that
> > > has the bit set and leave the unmapping to exit_mmap().
> > 
> > This change assumes that 
> > munlock_vma_pages_all()/unmap_vmas()/free_pgtables()
> > are never blocked for memory allocation. Is that guaranteed? For example,
> > i_mmap_lock_write() from unmap_single_vma() from unmap_vmas() is never 
> > blocked
> > for memory allocation? Commit 97b1255cb27c551d ("mm,oom_reaper: check for
> > MMF_OOM_SKIP before complaining") was waiting for i_mmap_lock_write() from
> > unlink_file_vma() from free_pgtables(). Is it really guaranteed that 
> > somebody
> > else who is holding that lock is never waiting for memory allocation?
> > 
> 
> Commit 97b1255cb27c is referencing MMF_OOM_SKIP already being set by 
> exit_mmap().  The only thing this patch changes is where that is done: 
> before or after free_pgtables().  We can certainly move it to before 
> free_pgtables() at the risk of subsequent (and eventually unnecessary) oom 
> kills.  It's not exactly the point of this patch.
> 
> I have thousands of real-world examples where additional processes were 
> oom killed while the original victim was in free_pgtables().  That's why 
> we've moved the MMF_OOM_SKIP to after free_pgtables().

"we have moved"? No, not yet. Your patch is about to move it.

My question is: is it guaranteed that
munlock_vma_pages_all()/unmap_vmas()/free_pgtables() by exit_mmap() are never
blocked for memory allocation?  Note that exit_mmap() tries to unmap all pages
while the OOM reaper tries to unmap only safe pages.  If there is a possibility
that munlock_vma_pages_all()/unmap_vmas()/free_pgtables() by exit_mmap() are
blocked for memory allocation, your patch will introduce an OOM livelock.

> I'm not sure how 
> likely your scenario is in the real world, but if it poses a problem then 
> I believe it should be fixed by eventually deferring previous victims as a 
> change to oom_evaluate_task(), not exit_mmap().  If you'd like me to fix 
> that, please send along your test case that triggers it and I will send a 
> patch.
> 


Re: [PATCH 0/3] intel-iommu: fix mapping PSI missing for iommu_map()

2018-04-17 Thread Peter Xu
On Wed, Apr 18, 2018 at 12:41:27PM +0800, Peter Xu wrote:
> (PSI stands for: Page Selective Invalidations)
> 
> Intel IOMMU has a caching mode to ease emulation of the device.
> When that bit is set, we need to send PSIs even for newly mapped
> pages.  However, the current driver does not fully obey the rule.  E.g.,
> the iommu_map() API will only do the mapping but never sends the PSIs.
> That can be problematic for emulated IOMMU devices since
> they'll never be able to build up the shadow page tables without
> such information.  This patchset tries to fix the problem.
> 
> Patch 1 is a tracing enhancement that helped me to triage the problem.
> It might even be useful in the future.
> 
> Patch 2 generalized a helper to notify the MAP PSIs.
> 
> Patch 3 fixes the real problem by making sure every domain mapping
> will trigger the MAP PSI notifications.
> 
> Without the patchset, nested device assignment (assign one device
> firstly to L1 guest, then to L2 guest) won't work for QEMU.  After
> applying the patchset, it works.
> 
> Please review.  Thanks.
> 
> Peter Xu (3):
>   intel-iommu: add some traces for PSIs
>   intel-iommu: generalize __mapping_notify_one()
>   intel-iommu: fix iotlb psi missing for mappings
> 
>  drivers/iommu/dmar.c|  3 ++
>  drivers/iommu/intel-iommu.c | 68 
> -
>  2 files changed, 52 insertions(+), 19 deletions(-)

Adding CC to Alexander Witte and Jintack.

-- 
Peter Xu


Re: [PATCH net-next] net: introduce a new tracepoint for tcp_rcv_space_adjust

2018-04-17 Thread Yafang Shao
On Wed, Apr 18, 2018 at 7:44 AM, Alexei Starovoitov
 wrote:
> On Mon, Apr 16, 2018 at 08:43:31AM -0700, Eric Dumazet wrote:
>>
>>
>> On 04/16/2018 08:33 AM, Yafang Shao wrote:
>> > tcp_rcv_space_adjust is called every time data is copied to user space,
>> > so introducing a tcp tracepoint for it could show us when the packet is
>> > copied to user space.
>> > This could help us figure out whether there's latency in the user process.
>> >
>> > When a tcp packet arrives, tcp_rcv_established() will be called, and with
>> > the existing tracepoint tcp_probe we could get the time when this packet
>> > arrives.
>> > Then this packet will be copied to user space, and tcp_rcv_space_adjust will
>> > be called, and with this newly introduced tracepoint we could get the time
>> > when this packet is copied to user space.
>> >
>> > arrival time : user process time => latency caused by user
>> > tcp_probe      tcp_rcv_space_adjust
>> >
>> > Hence in the printk message, sk is printed as a key to connect these two
>> > tracepoints.
>> >
>>
>> socket pointer is not a key.
>>
>> TCP sockets can be reused pretty fast after free.
>>
>> I suggest you go for the cookie instead; it is a unique 64-bit identifier.
>> ( sock_gen_cookie() for details )
>
> I think it would be even better if the stack would do this sock_gen_cookie()
> on its own, in some way that the user cannot infer the order.
> In many cases we wanted to use the socket cookie, but since it's not inited
> by default it's kinda useless.
> Turning this tracepoint on just to get the cookie would be an ugly workaround.
>

Could we init it in sk_alloc() ?
Then in other code paths, for example sock_getsockopt or tracepoints,
we only read the value through a new inline function named
sock_read_cookie().
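
A rough sketch of what such a helper could look like (sock_read_cookie() is
the name proposed above and does not exist in the kernel; it assumes
sk_cookie is assigned exactly once in sk_alloc()):

        /* include/net/sock.h -- sketch only */
        static inline u64 sock_read_cookie(struct sock *sk)
        {
                /* sk->sk_cookie would be filled in once during sk_alloc() */
                return atomic64_read(&sk->sk_cookie);
        }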


Thanks
Yafang


[PATCH 1/3] intel-iommu: add some traces for PSIs

2018-04-17 Thread Peter Xu
It is helpful for debugging and triaging missing PSI notifications.

Signed-off-by: Peter Xu 
---
 drivers/iommu/dmar.c| 3 +++
 drivers/iommu/intel-iommu.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 9a7ffd13c7f0..62ae26c3f7b7 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1325,6 +1325,9 @@ void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, 
u64 addr,
struct qi_desc desc;
int ih = 0;
 
+   pr_debug("%s: iommu %d did %u addr 0x%llx order %u type %llx\n",
+__func__, iommu->seq_id, did, addr, size_order, type);
+
if (cap_write_drain(iommu->cap))
dw = 1;
 
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 582fd01cb7d1..a64da83e867c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1396,6 +1396,9 @@ static void __iommu_flush_iotlb(struct intel_iommu 
*iommu, u16 did,
u64 val = 0, val_iva = 0;
unsigned long flag;
 
+   pr_debug("%s: iommu %d did %u addr 0x%llx order %u type %llx\n",
+__func__, iommu->seq_id, did, addr, size_order, type);
+
switch (type) {
case DMA_TLB_GLOBAL_FLUSH:
/* global flush doesn't need set IVA_REG */
-- 
2.14.3



[PATCH 3/3] intel-iommu: fix iotlb psi missing for mappings

2018-04-17 Thread Peter Xu
When caching mode is enabled for IOMMU, we should send explicit IOTLB
PSIs even for newly created mappings.  However these events are missing
for all intel_iommu_map() callers, e.g., iommu_map().  One direct user
is the vfio-pci driver.

To make sure we always send the PSIs when necessary, this patch first
introduces a domain_mapping() helper for page mappings, then fixes the
problem by generalizing the explicit map IOTLB PSI logic into that new
helper. With that, we let iommu_domain_identity_map() use the simplified
version to avoid sending the notifications, while for all the rest of the
cases we always send the notifications.

For the VM case, we send the PSIs to all the backend IOMMUs for the domain.

This patch allows the nested device assignment to work with QEMU (assign
device firstly to L1 guest, then assign it again to L2 guest).

Signed-off-by: Peter Xu 
---
 drivers/iommu/intel-iommu.c | 43 ++-
 1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index bf111e60857c..eb0f0911342f 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2353,18 +2353,47 @@ static int __domain_mapping(struct dmar_domain *domain, 
unsigned long iov_pfn,
return 0;
 }
 
+static int domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn,
+ struct scatterlist *sg, unsigned long phys_pfn,
+ unsigned long nr_pages, int prot)
+{
+   int ret;
+   struct intel_iommu *iommu;
+
+   /* Do the real mapping first */
+   ret = __domain_mapping(domain, iov_pfn, sg, phys_pfn, nr_pages, prot);
+   if (ret)
+   return ret;
+
+   /* Notify about the new mapping */
+   if (domain_type_is_vm(domain)) {
+  /* VM typed domains can have more than one IOMMUs */
+  int iommu_id;
+  for_each_domain_iommu(iommu_id, domain) {
+  iommu = g_iommus[iommu_id];
+  __mapping_notify_one(iommu, domain, iov_pfn, nr_pages);
+  }
+   } else {
+  /* General domains only have one IOMMU */
+  iommu = domain_get_iommu(domain);
+  __mapping_notify_one(iommu, domain, iov_pfn, nr_pages);
+   }
+
+   return 0;
+}
+
 static inline int domain_sg_mapping(struct dmar_domain *domain, unsigned long 
iov_pfn,
struct scatterlist *sg, unsigned long 
nr_pages,
int prot)
 {
-   return __domain_mapping(domain, iov_pfn, sg, 0, nr_pages, prot);
+   return domain_mapping(domain, iov_pfn, sg, 0, nr_pages, prot);
 }
 
 static inline int domain_pfn_mapping(struct dmar_domain *domain, unsigned long 
iov_pfn,
 unsigned long phys_pfn, unsigned long 
nr_pages,
 int prot)
 {
-   return __domain_mapping(domain, iov_pfn, NULL, phys_pfn, nr_pages, 
prot);
+   return domain_mapping(domain, iov_pfn, NULL, phys_pfn, nr_pages, prot);
 }
 
 static void domain_context_clear_one(struct intel_iommu *iommu, u8 bus, u8 
devfn)
@@ -2669,9 +2698,9 @@ static int iommu_domain_identity_map(struct dmar_domain 
*domain,
 */
dma_pte_clear_range(domain, first_vpfn, last_vpfn);
 
-   return domain_pfn_mapping(domain, first_vpfn, first_vpfn,
- last_vpfn - first_vpfn + 1,
- DMA_PTE_READ|DMA_PTE_WRITE);
+   return __domain_mapping(domain, first_vpfn, NULL,
+   first_vpfn, last_vpfn - first_vpfn + 1,
+   DMA_PTE_READ|DMA_PTE_WRITE);
 }
 
 static int domain_prepare_identity_map(struct device *dev,
@@ -3638,8 +3667,6 @@ static dma_addr_t __intel_map_single(struct device *dev, 
phys_addr_t paddr,
if (ret)
goto error;
 
-   __mapping_notify_one(iommu, domain, mm_to_dma_pfn(iova_pfn), size);
-
start_paddr = (phys_addr_t)iova_pfn << PAGE_SHIFT;
start_paddr += paddr & ~PAGE_MASK;
return start_paddr;
@@ -3857,8 +3884,6 @@ static int intel_map_sg(struct device *dev, struct 
scatterlist *sglist, int nele
return 0;
}
 
-   __mapping_notify_one(iommu, domain, start_vpfn, size);
-
return nelems;
 }
 
-- 
2.14.3



[PATCH 2/3] intel-iommu: generalize __mapping_notify_one()

2018-04-17 Thread Peter Xu
Generalize this new helper to notify one newly created mapping on one
single IOMMU.  We can further leverage this helper in the next patch.

Signed-off-by: Peter Xu 
---
 drivers/iommu/intel-iommu.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a64da83e867c..bf111e60857c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1607,6 +1607,18 @@ static void iommu_flush_iotlb_psi(struct intel_iommu 
*iommu,
iommu_flush_dev_iotlb(domain, addr, mask);
 }
 
+/* Notification for newly created mappings */
+static inline void __mapping_notify_one(struct intel_iommu *iommu,
+   struct dmar_domain *domain,
+   unsigned long pfn, unsigned int pages)
+{
+   /* It's a non-present to present mapping. Only flush if caching mode */
+   if (cap_caching_mode(iommu->cap))
+   iommu_flush_iotlb_psi(iommu, domain, pfn, pages, 0, 1);
+   else
+   iommu_flush_write_buffer(iommu);
+}
+
 static void iommu_flush_iova(struct iova_domain *iovad)
 {
struct dmar_domain *domain;
@@ -3626,13 +3638,7 @@ static dma_addr_t __intel_map_single(struct device *dev, 
phys_addr_t paddr,
if (ret)
goto error;
 
-   /* it's a non-present to present mapping. Only flush if caching mode */
-   if (cap_caching_mode(iommu->cap))
-   iommu_flush_iotlb_psi(iommu, domain,
- mm_to_dma_pfn(iova_pfn),
- size, 0, 1);
-   else
-   iommu_flush_write_buffer(iommu);
+   __mapping_notify_one(iommu, domain, mm_to_dma_pfn(iova_pfn), size);
 
start_paddr = (phys_addr_t)iova_pfn << PAGE_SHIFT;
start_paddr += paddr & ~PAGE_MASK;
@@ -3851,11 +3857,7 @@ static int intel_map_sg(struct device *dev, struct 
scatterlist *sglist, int nele
return 0;
}
 
-   /* it's a non-present to present mapping. Only flush if caching mode */
-   if (cap_caching_mode(iommu->cap))
-   iommu_flush_iotlb_psi(iommu, domain, start_vpfn, size, 0, 1);
-   else
-   iommu_flush_write_buffer(iommu);
+   __mapping_notify_one(iommu, domain, start_vpfn, size);
 
return nelems;
 }
-- 
2.14.3



[PATCH 0/3] intel-iommu: fix mapping PSI missing for iommu_map()

2018-04-17 Thread Peter Xu
(PSI stands for: Page Selective Invalidations)

Intel IOMMU has a caching mode to ease emulation of the device.
When that bit is set, we need to send PSIs even for newly mapped
pages.  However, the current driver does not fully obey the rule.  E.g.,
the iommu_map() API will only do the mapping but never sends the PSIs.
That can be problematic for emulated IOMMU devices since
they'll never be able to build up the shadow page tables without
such information.  This patchset tries to fix the problem.

Patch 1 is a tracing enhancement that helped me to triage the problem.
It might even be useful in the future.

Patch 2 generalized a helper to notify the MAP PSIs.

Patch 3 fixes the real problem by making sure every domain mapping
will trigger the MAP PSI notifications.

Without the patchset, nested device assignment (assign one device
firstly to L1 guest, then to L2 guest) won't work for QEMU.  After
applying the patchset, it works.

Please review.  Thanks.

Peter Xu (3):
  intel-iommu: add some traces for PSIs
  intel-iommu: generalize __mapping_notify_one()
  intel-iommu: fix iotlb psi missing for mappings

 drivers/iommu/dmar.c|  3 ++
 drivers/iommu/intel-iommu.c | 68 -
 2 files changed, 52 insertions(+), 19 deletions(-)

-- 
2.14.3



Re: [PATCH/RFC] crypto: Add platform dependencies for CRYPTO_DEV_CCREE

2018-04-17 Thread Gilad Ben-Yossef
Hi Geert,

On Tue, Apr 17, 2018 at 9:14 PM, Geert Uytterhoeven
 wrote:
> The ARM TrustZone CryptoCell is found on ARM SoCs only.  Hence make it
> depend on ARM or ARM64, unless compile-testing.
>

Actually, it is not. Despite what the name suggests, CryptoCell is
designed by Arm but is not in fact limited to Arm cores. I think the only
requirement is the ability to provide an AMBA bus interface. Kudos to our
marketing department for making that so clear and so on... :-)

There are in fact licensees of this IP which use it with other
(Linux-running) architectures, perhaps thanks to the fact that the design
originated in an outside company (Discretix) which was bought by Arm.

Therefore, NAK on this specific patch. However, if there is a build issue
on a non-Arm architecture I am of course very interested to hear about it.

Many thanks,
Gilad





-- 
Gilad Ben-Yossef
Chief Coffee Drinker

"If you take a class in large-scale robotics, can you end up in a
situation where the homework eats your dog?"
 -- Jean-Baptiste Queru


Re: [PATCH] mm:memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create

2018-04-17 Thread David Rientjes
On Tue, 17 Apr 2018, Matthew Wilcox wrote:

> Not arguing against this patch.  But how many places do we want to use
> GFP_NOWAIT without __GFP_NOWARN?  Not many, and the few which do do this
> seem like they simply haven't added it yet.  Maybe this would be a good idea?
> 
> -#define GFP_NOWAIT  (__GFP_KSWAPD_RECLAIM)
> +#define GFP_NOWAIT  (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)
> 

I don't think that's a good idea; slab allocators, for example, use
GFP_NOWAIT during init, followed up with a BUG_ON() if it fails.  With an
implicit __GFP_NOWARN we wouldn't be able to see the state of memory when
it crashes (likely memory that wasn't freed to the allocator).  I think
whether the allocation failure should trigger a warning is up to the
caller.
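
That explicit opt-in is roughly what the patch under discussion does at its
single call site (sketch, not the full hunk):

        /* __memcg_schedule_kmem_cache_create(): failure here is harmless, the
         * kmem cache will simply be created on a later allocation attempt */
        cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
        if (!cw)
                return;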


Re: [PATCH] f2fs: set deadline to drop expired inmem pages

2018-04-17 Thread Jaegeuk Kim
On 04/17, Chao Yu wrote:
> On 2018/4/17 14:44, Chao Yu wrote:
> > On 2018/4/17 4:16, Jaegeuk Kim wrote:
> >> On 04/13, Chao Yu wrote:
> >>> On 2018/4/13 12:05, Jaegeuk Kim wrote:
>  On 04/13, Chao Yu wrote:
> > On 2018/4/13 9:04, Jaegeuk Kim wrote:
> >> On 04/10, Chao Yu wrote:
> >>> Hi Jaegeuk,
> >>>
> >>> On 2018/4/8 16:13, Chao Yu wrote:
>  f2fs doesn't allow abuse on atomic write class interface, so except
>  limiting in-mem pages' total memory usage capacity, we need to limit
>  start-commit time as well, otherwise we may run into infinite loop
>  during foreground GC because target blocks in victim segment are
>  belong to atomic opened file for long time.
> 
>  Now, we will check the condition with f2fs_balance_fs_bg in
>  background threads, once if user doesn't commit data exceeding 30
>  seconds, we will drop all cached data, so I expect it can keep our
>  system running safely to prevent Dos attack.
> >>>
> >>> Is it worth to add this patch to avoid abuse on atomic write 
> >>> interface by user?
> >>
> >> Hmm, hope to see a real problem first in this case.
> >
> > I think this can be a more critical security leak instead of a 
> > potential issue
> > which we can wait for someone reporting that can be too late.
> >
> > For example, user can simply write a huge file whose data spread in all 
> > f2fs
> > segments, once user open that file as atomic, foreground GC will suffer
> > deadloop, causing denying any further service of f2fs.
> 
>  How can you guarantee it won't happen within 30sec? If you want to avoid 
>  that,
> >>>
> >>> Now the value is smaller than generic hang task threshold in order to 
> >>> avoid
> >>> foreground GC helding gc_mutex too long, we can tune that parameter?
> >>>
>  you have to take a look at foreground gc.
> >>>
> >>> What do you mean? let GC moves blocks of atomic write opened file?
> >>
> >> I thought that we first need to detect when foreground GC is stuck by such 
> >> the
> >> huge number of atomic writes. Then, we need to do something like dropping 
> >> all
> >> the atomic writes.
> > 
> > Yup, that will be reasonable. :)
> 
> If we drop all atomic writes, then for those atomic writers who behave
> normally, it will cause them to lose all cached data without any hint such
> as an error return value.
> So should we just:
> 
> - drop expired inmem pages.
> - or set FI_DROP_ATOMIC flag, return -EIO during atomic_commit, and reset the 
> flag.

Like FI_ATOMIC_REVOKE_REQUEST in atomic_commit?

> 
> Thanks,
> 
> > 
> > Thanks,
> > 
> >>
> >>>
> >>> Thanks,
> >>>
> 
> >
> > Thanks,
> >
> >>
> >>> Thanks,
> >>
> >> .
> >>
> 
>  .
> 
> >>
> >> .
> >>
> > 
> > 
> > .
> > 


Re: [patch v2] mm, oom: fix concurrent munlock and oom reaper unmap

2018-04-17 Thread David Rientjes
On Wed, 18 Apr 2018, Tetsuo Handa wrote:

> > Fix this by reusing MMF_UNSTABLE to specify that an mm should not be
> > reaped.  This prevents the concurrent munlock_vma_pages_range() and
> > unmap_page_range().  The oom reaper will simply not operate on an mm that
> > has the bit set and leave the unmapping to exit_mmap().
> 
> This change assumes that munlock_vma_pages_all()/unmap_vmas()/free_pgtables()
> are never blocked for memory allocation. Is that guaranteed? For example,
> i_mmap_lock_write() from unmap_single_vma() from unmap_vmas() is never blocked
> for memory allocation? Commit 97b1255cb27c551d ("mm,oom_reaper: check for
> MMF_OOM_SKIP before complaining") was waiting for i_mmap_lock_write() from
> unlink_file_vma() from free_pgtables(). Is it really guaranteed that somebody
> else who is holding that lock is never waiting for memory allocation?
> 

Commit 97b1255cb27c is referencing MMF_OOM_SKIP already being set by 
exit_mmap().  The only thing this patch changes is where that is done: 
before or after free_pgtables().  We can certainly move it to before 
free_pgtables() at the risk of subsequent (and eventually unnecessary) oom 
kills.  It's not exactly the point of this patch.

I have thousands of real-world examples where additional processes were 
oom killed while the original victim was in free_pgtables().  That's why 
we've moved the MMF_OOM_SKIP to after free_pgtables().  I'm not sure how 
likely your scenario is in the real world, but if it poses a problem then 
I believe it should be fixed by eventually deferring previous victims as a 
change to oom_evaluate_task(), not exit_mmap().  If you'd like me to fix 
that, please send along your test case that triggers it and I will send a 
patch.


Re: [patch v2] mm, oom: fix concurrent munlock and oom reaper unmap

2018-04-17 Thread Tetsuo Handa
David Rientjes wrote:
> Fix this by reusing MMF_UNSTABLE to specify that an mm should not be
> reaped.  This prevents the concurrent munlock_vma_pages_range() and
> unmap_page_range().  The oom reaper will simply not operate on an mm that
> has the bit set and leave the unmapping to exit_mmap().

This change assumes that munlock_vma_pages_all()/unmap_vmas()/free_pgtables()
are never blocked for memory allocation. Is that guaranteed? For example,
i_mmap_lock_write() from unmap_single_vma() from unmap_vmas() is never blocked
for memory allocation? Commit 97b1255cb27c551d ("mm,oom_reaper: check for
MMF_OOM_SKIP before complaining") was waiting for i_mmap_lock_write() from
unlink_file_vma() from free_pgtables(). Is it really guaranteed that somebody
else who is holding that lock is never waiting for memory allocation?


Re: [PATCH v5] ARM: dts: tpc: Device tree description of the iMX6Q TPC board

2018-04-17 Thread Shawn Guo
On Tue, Apr 10, 2018 at 10:32:09PM +0200, Lukasz Majewski wrote:
> This commit adds device tree description of Kieback & Peter GmbH
> iMX6Q TPC board.
> 
> Signed-off-by: Lukasz Majewski 
> Reviewed-by: Fabio Estevam 
> 
> ---
> Changes for v5:
> - Use 'interpolation-steps' to fill the brightness level table
> - Remove not needed status = "okay" properties
> - Replace goodix_ts -> touchscreen
> - Replace sgtl5000 -> audio-codec
> 
> This patch depends on PWM backlight intepolation work:
> https://patchwork.kernel.org/patch/10330775/
> https://patchwork.kernel.org/patch/10330777/
> 
> Changes for v4:
> - Remove not needed bus-format-override = "rgb565";
>   property
> 
> Changes for v3:
> - Provide proper compatible for RTC DTS node
> - Remove wrong comment style for beeper
> - Add proper SPDX entries for DTS files
> - Add vendor prefix to a separate patch
> 
> Changes for v2:
> - SPDX license identifiers used
> - Separate regulators
> - Proper beeper driver
> - Use of the lcd panel (with compatible) instead of timings provided in
>   device tree
> - Add IRQ types (like IRQ_TYPE_EDGE_FALLING) and GPIO active levels (like
>   GPIO_ACTIVE_HIGH)
> - Remove not needed nodes
> - Make W=1 dtbs compilation with no errors
> ---
>  arch/arm/boot/dts/Makefile |   1 +
>  arch/arm/boot/dts/imx6q-kp-tpc.dts |  22 ++
>  arch/arm/boot/dts/imx6q-kp.dtsi| 432 
> +
>  3 files changed, 455 insertions(+)
>  create mode 100644 arch/arm/boot/dts/imx6q-kp-tpc.dts
>  create mode 100644 arch/arm/boot/dts/imx6q-kp.dtsi
> 
> diff --git a/arch/arm/boot/dts/Makefile b/arch/arm/boot/dts/Makefile
> index ade7a38543dc..c148c4cf28f2 100644
> --- a/arch/arm/boot/dts/Makefile
> +++ b/arch/arm/boot/dts/Makefile
> @@ -459,6 +459,7 @@ dtb-$(CONFIG_SOC_IMX6Q) += \
>   imx6q-icore-ofcap10.dtb \
>   imx6q-icore-ofcap12.dtb \
>   imx6q-icore-rqs.dtb \
> + imx6q-kp-tpc.dtb \
>   imx6q-marsboard.dtb \
>   imx6q-mccmon6.dtb \
>   imx6q-nitrogen6x.dtb \
> diff --git a/arch/arm/boot/dts/imx6q-kp-tpc.dts 
> b/arch/arm/boot/dts/imx6q-kp-tpc.dts
> new file mode 100644
> index ..c3dea0ec54ac
> --- /dev/null
> +++ b/arch/arm/boot/dts/imx6q-kp-tpc.dts
> @@ -0,0 +1,22 @@
> +//SPDX-License-Identifier: (GPL-2.0+ OR MIT)

There should be a space after //.

> +/*
> + * Copyright 2018
> + * Lukasz Majewski, DENX Software Engineering, lu...@denx.de
> + */
> +
> +/dts-v1/;
> +
> +#include "imx6q-kp.dtsi"
> +
> +/ {
> + model = "Freescale i.MX6 Qwuad K+P TPC Board";
> + compatible = "kiebackpeter,imx6q-tpc", "fsl,imx6q";
> +
> + memory@1000 {
> + reg = <0x1000 0x4000>;
> + };
> +};
> +
> +&ipu1_di0_disp0 {
> + remote-endpoint = <&lcd_display_in>;
> +};
> diff --git a/arch/arm/boot/dts/imx6q-kp.dtsi b/arch/arm/boot/dts/imx6q-kp.dtsi
> new file mode 100644
> index ..174d044cf8bf
> --- /dev/null
> +++ b/arch/arm/boot/dts/imx6q-kp.dtsi
> @@ -0,0 +1,432 @@
> +//SPDX-License-Identifier: (GPL-2.0+ OR MIT)

Ditto

> +/*
> + * Copyright 2018
> + * Lukasz Majewski, DENX Software Engineering, lu...@denx.de
> + */
> +
> +/dts-v1/;
> +
> +#include "imx6q.dtsi"
> +
> +#include 
> +#include 
> +#include 
> +
> +/ {
> + backlight_lcd: backlight-lcd {
> + compatible = "pwm-backlight";
> + pwms = <&pwm1 0 500>;
> + brightness-levels = <0 255>;
> + num-interpolated-steps = <255>;
> + default-brightness-level = <250>;
> + };
> +
> + beeper {
> + compatible = "pwm-beeper";
> + pwms = <&pwm2 0 50>;
> + };
> +
> + lcd_display: disp0 {

s/disp0/display

> + compatible = "fsl,imx-parallel-display";
> + #address-cells = <1>;
> + #size-cells = <0>;
> + interface-pix-fmt = "rgb24";
> + pinctrl-names = "default";
> + pinctrl-0 = <&pinctrl_ipu1>;
> +
> + port@0 {
> + reg = <0>;
> +
> + lcd_display_in: endpoint {
> + remote-endpoint = <&ipu1_di0_disp0>;
> + };
> + };
> +
> + port@1 {
> + reg = <1>;
> +
> + lcd_display_out: endpoint {
> + remote-endpoint = <&lcd_panel_in>;
> + };
> + };
> + };
> +
> + lcd_panel: lcd-panel {
> + compatible = "auo,g070vvn01";
> + backlight = <&backlight_lcd>;
> + power-supply = <®_display>;
> +
> + port {
> + lcd_panel_in: endpoint {
> + remote-endpoint = <&lcd_display_out>;
> + };
> + };
> + };
> +
> + leds {
> + compatible = "gpio-leds";
> +
> + green {
> + label = "led1";
> + gpios = <&gpio3 16 GPIO_ACTIVE_HIGH>;
> +   

Re: [PATCH] regulator: Fix return type of of_map_mode()

2018-04-17 Thread Doug Anderson
Hi,

On Tue, Apr 17, 2018 at 10:48 AM, Javier Martinez Canillas
 wrote:
>>> Let's fix the return type of all of the current of_map_mode()
>>> functions.  While we're at it, we'll remove one pointless "inline".
>>
>> Ah, I see...  the thing here is that the mode is always an unsigned int
>> since it's a bitmask - this goes out beying the use in of_map_mode() and
>> into all the other APIs.  We only actually use 4 bits currently so I
>> think there's no problem switching to int but it seems we should
>> probably do that consistently throughout the API so that things don't
>> get missed later on.
>
> Maybe another option could be to add a REGULATOR_MODE_INVALID with
> value 0x0, and fix the drivers that are returning -EINVAL to return
> that instead?
>
> In of_get_regulation_constraints() we could check for that and
> propagate -EINVAL.

I like this idea.  Posted at
.  Note that there's no
actual error to propagate since of_get_regulation_constraints() just
prints the error and continues on its merry way.

-Doug


linux-next: build failure after merge of the arm-current tree

2018-04-17 Thread Stephen Rothwell
Hi Russell,

After merging the arm-current tree, today's linux-next build
(lots of configs) failed like this:

/bin/sh: 1: arithmetic expression: expecting primary: " "
(lots of these)

Caused by commit

  fe680ca02c1e ("ARM: replace unnecessary perl with sed and the shell $(( )) 
operator")

(pointed out by Michael Ellerman)

Our /bin/sh is dash not bash ...

-- 
Cheers,
Stephen Rothwell


pgpUk4rkqtoMu.pgp
Description: OpenPGP digital signature


[PATCH v2] regulator: Don't return or expect -errno from of_map_mode()

2018-04-17 Thread Douglas Anderson
In of_get_regulation_constraints() we were taking the result of
of_map_mode() (an unsigned int) and assigning it to an int.  We were
then checking whether this value was -EINVAL.  Some implementers of
of_map_mode() were returning -EINVAL (even though the return type of
their function needed to be unsigned int) because they needed to
signal an error back to of_get_regulation_constraints().

In general in the regulator framework the mode is always referred to
as an unsigned int.  While we could fix this to be a signed int (the
highest value we store in there right now is 0x8), it's actually
pretty clean to just define the regulator mode 0x0 (the lack of any
bits set) as an invalid mode.  Let's do that.

Suggested-by: Javier Martinez Canillas 
Fixes: 5e5e3a42c653 ("regulator: of: Add support for parsing initial and 
suspend modes")
Signed-off-by: Douglas Anderson 
---

Changes in v2:
- Use Javier's suggestion of defining 0x0 as invalid

 drivers/regulator/cpcap-regulator.c |  2 +-
 drivers/regulator/of_regulator.c| 15 +--
 drivers/regulator/twl-regulator.c   |  2 +-
 include/linux/regulator/consumer.h  |  1 +
 4 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/regulator/cpcap-regulator.c 
b/drivers/regulator/cpcap-regulator.c
index f541b80f1b54..bd910fe123d9 100644
--- a/drivers/regulator/cpcap-regulator.c
+++ b/drivers/regulator/cpcap-regulator.c
@@ -222,7 +222,7 @@ static unsigned int cpcap_map_mode(unsigned int mode)
case CPCAP_BIT_AUDIO_LOW_PWR:
return REGULATOR_MODE_STANDBY;
default:
-   return -EINVAL;
+   return REGULATOR_MODE_INVALID;
}
 }
 
diff --git a/drivers/regulator/of_regulator.c b/drivers/regulator/of_regulator.c
index f47264fa1940..22c02b7a338b 100644
--- a/drivers/regulator/of_regulator.c
+++ b/drivers/regulator/of_regulator.c
@@ -124,11 +124,12 @@ static void of_get_regulation_constraints(struct 
device_node *np,
 
if (!of_property_read_u32(np, "regulator-initial-mode", &pval)) {
if (desc && desc->of_map_mode) {
-   ret = desc->of_map_mode(pval);
-   if (ret == -EINVAL)
+   unsigned int mode = desc->of_map_mode(pval);
+
+   if (mode == REGULATOR_MODE_INVALID)
pr_err("%s: invalid mode %u\n", np->name, pval);
else
-   constraints->initial_mode = ret;
+   constraints->initial_mode = mode;
} else {
pr_warn("%s: mapping for mode %d not defined\n",
np->name, pval);
@@ -163,12 +164,14 @@ static void of_get_regulation_constraints(struct 
device_node *np,
if (!of_property_read_u32(suspend_np, "regulator-mode",
  &pval)) {
if (desc && desc->of_map_mode) {
-   ret = desc->of_map_mode(pval);
-   if (ret == -EINVAL)
+   unsigned int mode = desc->of_map_mode(pval);
+
+   mode = desc->of_map_mode(pval);
+   if (mode == REGULATOR_MODE_INVALID)
pr_err("%s: invalid mode %u\n",
   np->name, pval);
else
-   suspend_state->mode = ret;
+   suspend_state->mode = mode;
} else {
pr_warn("%s: mapping for mode %d not defined\n",
np->name, pval);
diff --git a/drivers/regulator/twl-regulator.c 
b/drivers/regulator/twl-regulator.c
index a4456db5849d..884c7505ed91 100644
--- a/drivers/regulator/twl-regulator.c
+++ b/drivers/regulator/twl-regulator.c
@@ -274,7 +274,7 @@ static inline unsigned int twl4030reg_map_mode(unsigned int 
mode)
case RES_STATE_SLEEP:
return REGULATOR_MODE_STANDBY;
default:
-   return -EINVAL;
+   return REGULATOR_MODE_INVALID;
}
 }
 
diff --git a/include/linux/regulator/consumer.h 
b/include/linux/regulator/consumer.h
index df176d7c2b87..25602afd4844 100644
--- a/include/linux/regulator/consumer.h
+++ b/include/linux/regulator/consumer.h
@@ -80,6 +80,7 @@ struct regmap;
  * These modes can be OR'ed together to make up a mask of valid register modes.
  */
 
+#define REGULATOR_MODE_INVALID 0x0
 #define REGULATOR_MODE_FAST0x1
 #define REGULATOR_MODE_NORMAL  0x2
 #define REGULATOR_MODE_IDLE0x4
-- 
2.17.0.484.g0c8726318c-goog



[PATCH] prctl: fix compat handling for prctl

2018-04-17 Thread Li Bin
The auxv member of the prctl_mm_map structure, which is shared with
userspace, is a pointer type, but kernels built with COMPAT support didn't
handle it. This patch fixes the compat handling for the prctl syscall.

Signed-off-by: Li Bin 
---
 kernel/sys.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/kernel/sys.c b/kernel/sys.c
index ad69218..03b9731 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -1968,6 +1968,25 @@ static int validate_prctl_map(struct prctl_mm_map 
*prctl_map)
return error;
 }
 
+#ifdef CONFIG_COMPAT
+struct compat_prctl_mm_map {
+   __u64   start_code; /* code section bounds */
+   __u64   end_code;
+   __u64   start_data; /* data section bounds */
+   __u64   end_data;
+   __u64   start_brk;  /* heap for brk() syscall */
+   __u64   brk;
+   __u64   start_stack;/* stack starts at */
+   __u64   arg_start;  /* command line arguments bounds */
+   __u64   arg_end;
+   __u64   env_start;  /* environment variables bounds */
+   __u64   env_end;
+   compat_uptr_t   auxv;   /* auxiliary vector */
+   __u32   auxv_size;  /* vector size */
+   __u32   exe_fd; /* /proc/$pid/exe link file */
+};
+#endif
+
 #ifdef CONFIG_CHECKPOINT_RESTORE
 static int prctl_set_mm_map(int opt, const void __user *addr, unsigned long 
data_size)
 {
@@ -1986,6 +2005,28 @@ static int prctl_set_mm_map(int opt, const void __user 
*addr, unsigned long data
if (data_size != sizeof(prctl_map))
return -EINVAL;
 
+#ifdef CONFIG_COMPAT
+   if (is_compat_task()) {
+   struct compat_prctl_mm_map prctl_map32;
+   if (copy_from_user(&prctl_map32, addr, sizeof(prctl_map32)))
+   return -EFAULT;
+
+   prctl_map.start_code = prctl_map32.start_code;
+   prctl_map.end_code = prctl_map32.end_code;
+   prctl_map.start_data = prctl_map32.start_data;
+   prctl_map.end_data = prctl_map32.end_data;
+   prctl_map.start_brk = prctl_map32.start_brk;
+   prctl_map.brk = prctl_map32.brk;
+   prctl_map.start_stack = prctl_map32.start_stack;
+   prctl_map.arg_start = prctl_map32.arg_start;
+   prctl_map.arg_end = prctl_map32.arg_end;
+   prctl_map.env_start = prctl_map32.env_start;
+   prctl_map.env_end = prctl_map32.env_end;
+   prctl_map.auxv = compat_ptr(prctl_map32.auxv);
+   prctl_map.auxv_size = prctl_map32.auxv_size;
+   prctl_map.exe_fd = prctl_map32.exe_fd;
+   } else
+#endif
if (copy_from_user(&prctl_map, addr, sizeof(prctl_map)))
return -EFAULT;
 
-- 
1.7.12.4



Re: [PATCH] KVM: X86: Allow userspace to define the microcode version

2018-04-17 Thread Wanpeng Li
2018-04-18 4:24 GMT+08:00 Eduardo Habkost :
> On Tue, Apr 17, 2018 at 06:40:58PM +0800, Wanpeng Li wrote:
>> Cc Eduardo,
>> 2018-02-26 20:41 GMT+08:00 Paolo Bonzini :
>> > On 26/02/2018 13:22, Borislav Petkov wrote:
>> >> On Mon, Feb 26, 2018 at 01:18:07PM +0100, Paolo Bonzini wrote:
>>  In this context, "host-initiated" write means written by KVM userspace
>>  with ioctl(KVM_SET_MSR).  It generally happens only on VM startup, reset
>>  or live migration.
>> >>>
>> >>> To be clear, the target of the write is still the vCPU's emulated MSR.
>> >>
>> >> So how am I to imagine this as a user:
>> >>
>> >> qemu-system-x86_64 --microcode-revision=0xdeadbeef...
>> >
>> > More like "-cpu foo,ucode_rev=0xdeadbeef".  But in practice what would
>> > happen is one of the following:
>> >
>> > 1) "-cpu host" sets ucode_rev to the same value of the host, everyone
>> > else leaves it to zero as is now.
>>
>> Hi Paolo,
>>
>> Do you mean the host admin to get the ucode_rev from the host and set
>> to -cpu host, ucode_rev=xx or qemu get the ucode_rev directly by
>> rdmsr?
>
> QEMU setting ucode_rev automatically using the host value when
> using "-cpu host" (with no need for explicit ucode_rev option)
> makes sense to me.

QEMU can't read the host value with rdmsr on MSR_IA32_UCODE_REV directly,
since rdmsr will #GP when executed outside ring 0. Any ideas?

Regards,
Wanpeng Li
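
For reference, one userspace way to obtain the value is to go through the
msr driver (CONFIG_X86_MSR, /dev/cpu/*/msr), which performs the privileged
rdmsr on the caller's behalf; parsing the "microcode" field of /proc/cpuinfo
is an even simpler option.  The sketch below only illustrates that approach,
it is not necessarily what QEMU would do:

/* Sketch: read the host microcode revision via /dev/cpu/0/msr.
 * Needs the msr driver loaded ("modprobe msr") and root privileges.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_IA32_UCODE_REV	0x0000008b	/* IA32_BIOS_SIGN_ID */

int main(void)
{
	uint64_t val;
	int fd = open("/dev/cpu/0/msr", O_RDONLY);

	if (fd < 0) {
		perror("open /dev/cpu/0/msr");
		return 1;
	}
	/* The msr driver does the rdmsr in the kernel; the file offset selects the MSR. */
	if (pread(fd, &val, sizeof(val), MSR_IA32_UCODE_REV) != sizeof(val)) {
		perror("pread MSR_IA32_UCODE_REV");
		close(fd);
		return 1;
	}
	close(fd);
	/* The microcode revision lives in bits 63:32. */
	printf("host microcode revision: 0x%x\n", (unsigned int)(val >> 32));
	return 0;
}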


Re: [PATCH 0/4] ARM: imx: use device properties for at24 eeprom

2018-04-17 Thread Shawn Guo
On Wed, Apr 04, 2018 at 03:16:23PM +0200, Bartosz Golaszewski wrote:
> We want to work towards phasing out the at24_platform_data structure.
> There are few users and its contents can be represented using generic
> device properties. Using device properties only will allow us to
> significantly simplify the at24 configuration code.
> 
> This series converts all users of at24_platform_data to using generic
> device properties or - as is the case with PATCH 1/4 - drops the
> platform data entirely since it's not needed.
> 
> Bartosz Golaszewski (4):
>   ARM: imx: vpr200: drop at24_platform_data
>   ARM: imx: pcm043: use device properties for at24 eeprom
>   ARM: imx: pca100: use device properties for at24 eeprom
>   ARM: imx: pcm037: use device properties for at24 eeprom

Applied all, thanks.


[PATCH 14/24] x86/speculation: Move firmware_restrict_branch_speculation_*() from C to CPP

2018-04-17 Thread Youquan Song
From: Ingo Molnar 

(cherry picked from commit d72f4e29e6d84b7ec02ae93088aa459ac70e733b)

firmware_restrict_branch_speculation_*() recently started using
preempt_enable()/disable(), but those are relatively high level
primitives and cause build failures on some 32-bit builds.

Since we want to keep  low level, convert
them to macros to avoid header hell...

Cc: David Woodhouse 
Cc: Thomas Gleixner 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: arjan.van.de@intel.com
Cc: b...@alien8.de
Cc: dave.han...@intel.com
Cc: jmatt...@google.com
Cc: karah...@amazon.de
Cc: k...@vger.kernel.org
Cc: pbonz...@redhat.com
Cc: rkrc...@redhat.com
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar 
Signed-off-by: Youquan Song  [v4.4 backport]
---
 arch/x86/include/asm/nospec-branch.h | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/nospec-branch.h 
b/arch/x86/include/asm/nospec-branch.h
index 27582aa..4675f65 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -214,20 +214,22 @@ static inline void 
indirect_branch_prediction_barrier(void)
 /*
  * With retpoline, we must use IBRS to restrict branch prediction
  * before calling into firmware.
+ *
+ * (Implemented as CPP macros due to header hell.)
  */
-static inline void firmware_restrict_branch_speculation_start(void)
-{
-   preempt_disable();
-   alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,
- X86_FEATURE_USE_IBRS_FW);
-}
+#define firmware_restrict_branch_speculation_start()   \
+do {   \
+   preempt_disable();  \
+   alternative_msr_write(MSR_IA32_SPEC_CTRL, SPEC_CTRL_IBRS,   \
+ X86_FEATURE_USE_IBRS_FW); \
+} while (0)
 
-static inline void firmware_restrict_branch_speculation_end(void)
-{
-   alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,
- X86_FEATURE_USE_IBRS_FW);
-   preempt_enable();
-}
+#define firmware_restrict_branch_speculation_end() \
+do {   \
+   alternative_msr_write(MSR_IA32_SPEC_CTRL, 0,\
+ X86_FEATURE_USE_IBRS_FW); \
+   preempt_enable();   \
+} while (0)
 
 #endif /* __ASSEMBLY__ */
 
-- 
1.8.3.1
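
For reference, a sketch of how a call site is expected to use these macros
(a hypothetical wrapper; the x86 EFI runtime wrappers do the equivalent in
their setup/teardown paths).  Since preemption is disabled across the
region, the guarded call must not sleep:

#include <asm/nospec-branch.h>

/* Hypothetical example: guard an arbitrary firmware entry point.  The
 * start macro disables preemption and, on CPUs with X86_FEATURE_USE_IBRS_FW,
 * sets SPEC_CTRL.IBRS for the duration of the call; the end macro undoes both.
 */
static unsigned long guarded_firmware_call(unsigned long (*fw_entry)(void))
{
	unsigned long status;

	firmware_restrict_branch_speculation_start();
	status = fw_entry();
	firmware_restrict_branch_speculation_end();

	return status;
}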



Re: [PATCH 00/20] staging: lustre: convert to rhashtable

2018-04-17 Thread NeilBrown
On Tue, Apr 17 2018, James Simmons wrote:

>> libcfs in lustre has a resizeable hashtable.
>> Linux already has a resizeable hashtable, rhashtable, which is better
>> is most metrics. See https://lwn.net/Articles/751374/ in a few days
>> for an introduction to rhashtable.
>
> Thanks for starting this work. I was thinking about cleaning up the libcfs
> hash but your port to rhashtables is way better. How did you gather
> metrics to see that rhashtable was better than libcfs hash?

Code inspection and reputation.  It is hard to beat inlined lockless
code for lookups.  And rhashtable is heavily used in the network
subsystem and they are very focused on latency there.  I'm not sure that
insertion is as fast as it can be (I have some thoughts on that) but I'm
sure lookup will be better.
I haven't done any performance testing myself, only correctness.

>  
>> This series converts lustre to use rhashtable.  This affects several
>> different tables, and each is different is various ways.
>> 
>> There are two outstanding issues.  One is that a bug in rhashtable
>> means that we cannot enable auto-shrinking in one of the tables.  That
>> is documented as appropriate and should be fixed soon.
>> 
>> The other is that rhashtable has an atomic_t which counts the elements
>> in a hash table.  At least one table in lustre went to some trouble to
>> avoid any table-wide atomics, so that could lead to a regression.
>> I'm hoping that rhashtable can be enhanced with the option of a
>> per-cpu counter, or similar.
>> 
>
> This doesn't sound quite ready to land just yet. This will have to do some
> soak testing and a larger scope of tests to make sure no new regressions
> happen. Believe me I did work to make lustre work better on tickless 
> systems, which I'm preparing for the linux client, and small changes could 
> break things in interesting ways. I will port the rhashtable change to the 
> Intel development branch and get people more familiar with the hash code
> to look at it.

Whether it is "ready" or not probably depends on perspective and
priorities.  As I see it, getting lustre cleaned up and out of staging
is a fairly high priority, and it will require a lot of code change.
It is inevitable that regressions will slip in (some already have) and
it is important to keep testing (the test suite is of great benefit, but
is only part of the story of course).  But to test the code, it needs to
land.  Testing the code in Intel's devel branch and then porting it
across doesn't really prove much.  For testing to be meaningful, it
needs to be tested in a branch that up-to-date with mainline and on
track to be merged into mainline.

I have no particular desire to rush this in, but I don't see any
particular benefit in delaying it either.

I guess I see staging as implicitly a 'devel' branch.  You seem to be
treating it a bit like a 'stable' branch - is that right?

As mentioned, I think there is room for improvement in rhashtable.
- making the atomic counter optional
- replacing the separate spinlocks with bit-locks in the hash-chain head
  so that the lock and chain are in the same cache line
- for the common case of inserting in an empty chain, a single atomic
  cmpxchg() should be sufficient
I think we have a better chance of being heard if we have "skin in the
game" and have upstream code that would use this.

Thanks,
NeilBrown


signature.asc
Description: PGP signature


Re: [RFC] virtio: Use DMA MAP API for devices without an IOMMU

2018-04-17 Thread Anshuman Khandual
On 04/15/2018 05:41 PM, Christoph Hellwig wrote:
> On Fri, Apr 06, 2018 at 06:37:18PM +1000, Benjamin Herrenschmidt wrote:
 implemented as DMA API which the virtio core understands. There is no
 need for an IOMMU to be involved for the device representation in this
 case IMHO.
>>>
>>> This whole virtio translation issue is a mess.  I think we need to
>>> switch it to the dma API, and then quirk the legacy case to always
>>> use the direct mapping inside the dma API.
>>
>> Fine with using a dma API always on the Linux side, but we do want to
>> special case virtio still at the arch and qemu side to have a "direct
>> mapping" mode. Not sure how (special flags on PCI devices) to avoid
>> actually going through an emulated IOMMU on the qemu side, because that
>> slows things down, esp. with vhost.
>>
>> IE, we can't I think just treat it the same as a physical device.
> 
> We should have treated it like a physical device from the start, but
> that device has unfortunately sailed.
> 
> But yes, we'll need a per-device quirk that says 'don't attach an
> iommu'.

How about doing it on a per-platform basis, as suggested in this RFC, through
an arch-specific callback? All the virtio devices on a given platform would
require and exercise this option (to make use of the bounce buffer mechanism
for secure guests, for example). So the flag is basically a platform-specific
one, not a device-specific one.
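
For reference, a rough sketch of what such a platform hook could look like.
The names below are invented for illustration, and this is not the actual
vring_use_dma_api() logic:

#include <linux/virtio.h>
#include <linux/virtio_config.h>

/* Weak default: no platform override, keep today's behaviour.  A platform
 * that needs bounce buffers for every virtio device (e.g. a secure guest)
 * would provide a non-weak version returning true.
 */
bool __weak arch_virtio_force_dma_api(struct virtio_device *vdev)
{
	return false;
}

/* Roughly where the core decides today (see vring_use_dma_api()). */
static bool vring_should_use_dma_api(struct virtio_device *vdev)
{
	/* The device claims to sit behind a real IOMMU. */
	if (virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
		return true;

	/* Otherwise the platform may still force the DMA API. */
	return arch_virtio_force_dma_api(vdev);
}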



Re: [PATCH] mm:memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create

2018-04-17 Thread Matthew Wilcox
On Wed, Apr 18, 2018 at 11:29:12AM +0900, Minchan Kim wrote:
> If there is heavy memory pressure, page allocation with __GFP_NOWAIT
> fails easily even though it's an order-0 request.
> I got the warning below 9 times during a normal boot.
> 
> [   17.072747] c0 0  : page allocation failure: order:0, 
> mode:0x220(GFP_NOWAIT|__GFP_NOTRACK)
> 
> Let's not scare the user.
>  
> - cw = kmalloc(sizeof(*cw), GFP_NOWAIT);
> + cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
>   if (!cw)

Not arguing against this patch.  But how many places do we want to use
GFP_NOWAIT without __GFP_NOWARN?  Not many, and the few which do do this
seem like they simply haven't added it yet.  Maybe this would be a good idea?

-#define GFP_NOWAIT  (__GFP_KSWAPD_RECLAIM)
+#define GFP_NOWAIT  (__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)


Re: [PATCH] mm:memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create

2018-04-17 Thread David Rientjes
On Wed, 18 Apr 2018, Minchan Kim wrote:

> If there is heavy memory pressure, page allocation with __GFP_NOWAIT
> fails easily even though it's an order-0 request.
> I got the warning below 9 times during a normal boot.
> 
> [   17.072747] c0 0  : page allocation failure: order:0, 
> mode:0x220(GFP_NOWAIT|__GFP_NOTRACK)
> < snip >
> [   17.072789] c0 0  Call trace:
> [   17.072803] c0 0  [] dump_backtrace+0x0/0x4
> [   17.072813] c0 0  [] dump_stack+0xa4/0xc0
> [   17.072822] c0 0  [] warn_alloc+0xd4/0x15c
> [   17.072829] c0 0  [] 
> __alloc_pages_nodemask+0xf88/0x10fc
> [   17.072838] c0 0  [] alloc_slab_page+0x40/0x18c
> [   17.072843] c0 0  [] new_slab+0x2b8/0x2e0
> [   17.072849] c0 0  [] ___slab_alloc+0x25c/0x464
> [   17.072858] c0 0  [] __kmalloc+0x394/0x498
> [   17.072865] c0 0  [] memcg_kmem_get_cache+0x114/0x2b8
> [   17.072870] c0 0  [] kmem_cache_alloc+0x98/0x3e8
> [   17.072878] c0 0  [] mmap_region+0x3bc/0x8c0
> [   17.072884] c0 0  [] do_mmap+0x40c/0x43c
> [   17.072890] c0 0  [] vm_mmap_pgoff+0x15c/0x1e4
> [   17.072898] c0 0  [] sys_mmap+0xb0/0xc8
> [   17.072904] c0 0  [] el0_svc_naked+0x24/0x28
> [   17.072908] c0 0  Mem-Info:
> [   17.072920] c0 0  active_anon:17124 inactive_anon:193 isolated_anon:0
> [   17.072920] c0 0   active_file:7898 inactive_file:712955 
> isolated_file:55
> [   17.072920] c0 0   unevictable:0 dirty:27 writeback:18 unstable:0
> [   17.072920] c0 0   slab_reclaimable:12250 slab_unreclaimable:23334
> [   17.072920] c0 0   mapped:19310 shmem:212 pagetables:816 bounce:0
> [   17.072920] c0 0   free:36561 free_pcp:1205 free_cma:35615
> [   17.072933] c0 0  Node 0 active_anon:68496kB inactive_anon:772kB 
> active_file:31592kB inactive_file:2851820kB unevictable:0kB 
> isolated(anon):0kB isolated(file):220kB mapped:77240kB dirty:108kB 
> writeback:72kB shmem:848kB writeback_tmp:0kB unstable:0kB all_unreclaimable? 
> no
> [   17.072945] c0 0  DMA free:142188kB min:3056kB low:3820kB high:4584kB 
> active_anon:10052kB inactive_anon:12kB active_file:312kB 
> inactive_file:1412620kB unevictable:0kB writepending:0kB present:1781412kB 
> managed:1604728kB mlocked:0kB slab_reclaimable:3592kB 
> slab_unreclaimable:876kB kernel_stack:400kB pagetables:52kB bounce:0kB 
> free_pcp:1436kB local_pcp:124kB free_cma:142492kB
> [   17.072949] c0 0  lowmem_reserve[]: 0 1842 1842
> [   17.072966] c0 0  Normal free:4056kB min:4172kB low:5212kB high:6252kB 
> active_anon:58376kB inactive_anon:760kB active_file:31348kB 
> inactive_file:1439040kB unevictable:0kB writepending:180kB present:2000636kB 
> managed:1923688kB mlocked:0kB slab_reclaimable:45408kB 
> slab_unreclaimable:92460kB kernel_stack:9680kB pagetables:3212kB bounce:0kB 
> free_pcp:3392kB local_pcp:688kB free_cma:0kB
> [   17.072971] c0 0  lowmem_reserve[]: 0 0 0
> [   17.072982] c0 0  DMA: 0*4kB 0*8kB 1*16kB (C) 0*32kB 0*64kB 0*128kB 
> 1*256kB (C) 1*512kB (C) 0*1024kB 1*2048kB (C) 34*4096kB (C) = 142096kB
> [   17.073024] c0 0  Normal: 228*4kB (UMEH) 172*8kB (UMH) 23*16kB (UH) 
> 24*32kB (H) 5*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 
> = 3872kB
> [   17.073069] c0 0  721350 total pagecache pages
> [   17.073073] c0 0  0 pages in swap cache
> [   17.073078] c0 0  Swap cache stats: add 0, delete 0, find 0/0
> [   17.073081] c0 0  Free swap  = 0kB
> [   17.073085] c0 0  Total swap = 0kB
> [   17.073089] c0 0  945512 pages RAM
> [   17.073093] c0 0  0 pages HighMem/MovableOnly
> [   17.073097] c0 0  63408 pages reserved
> [   17.073100] c0 0  51200 pages cma reserved
> 
> Let's not scare the user.
> 
> Cc: Johannes Weiner 
> Cc: Michal Hocko 
> Cc: Vladimir Davydov 
> Signed-off-by: Minchan Kim 

Acked-by: David Rientjes 


Re: [PATCH v4] ARM: dts: imx6q-icore-ofcap12: Switch LVDS timings from panel-simple

2018-04-17 Thread Shawn Guo
On Mon, Mar 26, 2018 at 01:35:53PM +0530, Jagan Teki wrote:
> Switch to use koe_tx31d200vm0baa LVDS timings from
> panel-simple instead hard coding the same in dts.
> 
> Signed-off-by: Jagan Teki 

Applied both, thanks.


[patch v2] mm, oom: fix concurrent munlock and oom reaper unmap

2018-04-17 Thread David Rientjes
Since exit_mmap() is done without the protection of mm->mmap_sem, it is
possible for the oom reaper to concurrently operate on an mm until
MMF_OOM_SKIP is set.

This allows munlock_vma_pages_all() to concurrently run while the oom
reaper is operating on a vma.  Since munlock_vma_pages_range() depends on
clearing VM_LOCKED from vm_flags before actually doing the munlock to
determine if any other vmas are locking the same memory, the check for
VM_LOCKED in the oom reaper is racy.

This is especially noticeable on architectures such as powerpc where
clearing a huge pmd requires serialize_against_pte_lookup().  If the pmd
is zapped by the oom reaper during follow_page_mask() after the check for
pmd_none() is bypassed, this ends up dereferencing a NULL ptl.

Fix this by reusing MMF_UNSTABLE to specify that an mm should not be
reaped.  This prevents the concurrent munlock_vma_pages_range() and
unmap_page_range().  The oom reaper will simply not operate on an mm that
has the bit set and leave the unmapping to exit_mmap().

Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run 
concurrently")
Cc: sta...@vger.kernel.org [4.14+]
Signed-off-by: David Rientjes 
---
 v2:
  - oom reaper only sets MMF_OOM_SKIP if MMF_UNSTABLE was never set (either
by itself or by exit_mmap()), per Tetsuo
  - s/kick_all_cpus_sync/serialize_against_pte_lookup/ in changelog as more
isolated way of forcing cpus as non-idle on power

 mm/mmap.c | 38 --
 mm/oom_kill.c | 28 +---
 2 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/mm/mmap.c b/mm/mmap.c
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -3015,6 +3015,25 @@ void exit_mmap(struct mm_struct *mm)
/* mm's last user has gone, and its about to be pulled down */
mmu_notifier_release(mm);
 
+   if (unlikely(mm_is_oom_victim(mm))) {
+   /*
+* Wait for oom_reap_task() to stop working on this mm.  Because
+* MMF_UNSTABLE is already set before calling down_read(),
+* oom_reap_task() will not run on this mm after up_write().
+* oom_reap_task() also depends on a stable VM_LOCKED flag to
+* indicate it should not unmap during munlock_vma_pages_all().
+*
+* mm_is_oom_victim() cannot be set from under us because
+* victim->mm is already set to NULL under task_lock before
+* calling mmput() and victim->signal->oom_mm is set by the oom
+* killer only if victim->mm is non-NULL while holding
+* task_lock().
+*/
+   set_bit(MMF_UNSTABLE, &mm->flags);
+   down_write(&mm->mmap_sem);
+   up_write(&mm->mmap_sem);
+   }
+
if (mm->locked_vm) {
vma = mm->mmap;
while (vma) {
@@ -3036,26 +3055,9 @@ void exit_mmap(struct mm_struct *mm)
/* update_hiwater_rss(mm) here? but nobody should be looking */
/* Use -1 here to ensure all VMAs in the mm are unmapped */
unmap_vmas(&tlb, vma, 0, -1);
-
-   if (unlikely(mm_is_oom_victim(mm))) {
-   /*
-* Wait for oom_reap_task() to stop working on this
-* mm. Because MMF_OOM_SKIP is already set before
-* calling down_read(), oom_reap_task() will not run
-* on this "mm" post up_write().
-*
-* mm_is_oom_victim() cannot be set from under us
-* either because victim->mm is already set to NULL
-* under task_lock before calling mmput and oom_mm is
-* set not NULL by the OOM killer only if victim->mm
-* is found not NULL while holding the task_lock.
-*/
-   set_bit(MMF_OOM_SKIP, &mm->flags);
-   down_write(&mm->mmap_sem);
-   up_write(&mm->mmap_sem);
-   }
free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, USER_PGTABLES_CEILING);
tlb_finish_mmu(&tlb, 0, -1);
+   set_bit(MMF_OOM_SKIP, &mm->flags);
 
/*
 * Walk the list again, actually closing and freeing it,
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -521,12 +521,17 @@ static bool __oom_reap_task_mm(struct task_struct *tsk, 
struct mm_struct *mm)
}
 
/*
-* MMF_OOM_SKIP is set by exit_mmap when the OOM reaper can't
-* work on the mm anymore. The check for MMF_OOM_SKIP must run
+* Tell all users of get_user/copy_from_user etc... that the content
+* is no longer stable. No barriers really needed because unmapping
+* should imply barriers already and the reader would hit a page fault
+* if it stumbled over reaped memory.
+*
+* MMF_UNSTABLE is also set by exit_mmap when the OOM reaper shouldn't
+* work on the mm anymore. The check for MMF_OOM_UNSTABLE mu

Re: [patch] mm, oom: fix concurrent munlock and oom reaper unmap

2018-04-17 Thread David Rientjes
On Wed, 18 Apr 2018, Tetsuo Handa wrote:

> > Since exit_mmap() is done without the protection of mm->mmap_sem, it is
> > possible for the oom reaper to concurrently operate on an mm until
> > MMF_OOM_SKIP is set.
> > 
> > This allows munlock_vma_pages_all() to concurrently run while the oom
> > reaper is operating on a vma.  Since munlock_vma_pages_range() depends on
> > clearing VM_LOCKED from vm_flags before actually doing the munlock to
> > determine if any other vmas are locking the same memory, the check for
> > VM_LOCKED in the oom reaper is racy.
> > 
> > This is especially noticeable on architectures such as powerpc where
> > clearing a huge pmd requires kick_all_cpus_sync().  If the pmd is zapped
> > by the oom reaper during follow_page_mask() after the check for pmd_none()
> > is bypassed, this ends up dereferencing a NULL ptl.
> 
> I don't know whether the explanation above is correct.
> Did you actually see a crash caused by this race?
> 

Yes, it's trivially reproducible on power by simply mlocking a ton of 
memory and triggering oom kill.

> > Fix this by reusing MMF_UNSTABLE to specify that an mm should not be
> > reaped.  This prevents the concurrent munlock_vma_pages_range() and
> > unmap_page_range().  The oom reaper will simply not operate on an mm that
> > has the bit set and leave the unmapping to exit_mmap().
> 
> But this patch is setting MMF_OOM_SKIP without reaping any memory as soon as
> MMF_UNSTABLE is set, which is the situation described in 212925802454:
> 

Oh, you're referring to __oom_reap_task_mm() returning true because of 
MMF_UNSTABLE and then setting MMF_OOM_SKIP itself?  Yes, that is dumb.  We 
could change __oom_reap_task_mm() to only set MMF_OOM_SKIP if MMF_UNSTABLE 
hasn't been set.  I'll send a v2, which I needed to do anyway to do 
s/kick_all_cpus_sync/serialize_against_pte_lookup/ in the changelog (power 
only does it for the needed cpus).


Re: [PATCH 3/6] staging: lustre: remove include/linux/libcfs/linux/linux-cpu.h

2018-04-17 Thread NeilBrown
On Mon, Apr 16 2018, James Simmons wrote:

>> This include file contains definitions used when CONFIG_SMP
>> is in effect.  Other includes contain corresponding definitions
>> for when it isn't.
>> This can be hard to follow, so move the definitions to the one place.
>> 
>> As HAVE_LIBCFS_CPT is defined precisely when CONFIG_SMP, we discard
>> that macro and just use CONFIG_SMP when needed.
>
> Nak. The lustre SMP code is broken and badly needs to be reworked. I have it
> ready and can push it. I was waiting to see if I had to rebase it after
> the rc1 stuff, but since there is a push to get everything out there I will
> push it.
>

Great - thanks for posting those.  I might wait until they land in
Greg's tree, then see if there is anything else I want to add.

Thanks,
NeilBrown


signature.asc
Description: PGP signature


Re: [PATCH] PCI: Add PCIe to pcie_print_link_status() messages

2018-04-17 Thread Jakub Kicinski
On Fri, 13 Apr 2018 11:16:38 -0700, Jakub Kicinski wrote:
> Currently the pcie_print_link_status() will print PCIe bandwidth
> and link width information but does not mention it is pertaining
> to the PCIe.  Since this and related functions are used exclusively
> by networking drivers today users may get confused into thinking
> that it's the NIC bandwidth that is being talked about.  Insert a
> "PCIe" into the messages.
> 
> Signed-off-by: Jakub Kicinski 

Hi Bjorn!

Could this small change still make it into 4.17 or are you planning to
apply it in 4.18 cycle?  IMHO the message clarification may be worth
considering for 4.17..

>  drivers/pci/pci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index aa86e904f93c..73a0a4993f6a 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5273,11 +5273,11 @@ void pcie_print_link_status(struct pci_dev *dev)
>   bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
>  
>   if (bw_avail >= bw_cap)
> - pci_info(dev, "%u.%03u Gb/s available bandwidth (%s x%d 
> link)\n",
> + pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d 
> link)\n",
>bw_cap / 1000, bw_cap % 1000,
>PCIE_SPEED2STR(speed_cap), width_cap);
>   else
> - pci_info(dev, "%u.%03u Gb/s available bandwidth, limited by %s 
> x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
> + pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited 
> by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
>bw_avail / 1000, bw_avail % 1000,
>PCIE_SPEED2STR(speed), width,
>limiting_dev ? pci_name(limiting_dev) : "",



Re: [PATCH v2] clk: imx: Set CLK_SET_RATE_GATE for gate and divider clocks

2018-04-17 Thread Shawn Guo
On Wed, Apr 11, 2018 at 05:03:29PM +0300, Abel Vesa wrote:
> From: Shawn Guo 
> 
> Add flag CLK_SET_RATE_GATE for i.MX gate and divider clocks on which the
> client drivers usually make clk_set_rate() call, so that the call will fail
> when clock is still on instead of standing the risk of running into glitch
> issue. Rate cannot be changed when the clock is enabled due to the glitchy
> multiplexers.
> 
> Signed-off-by: Shawn Guo 
> [initial patch from imx internal repo]
> Signed-off-by: Abel Vesa 
> [carried over from 3.14 and also applied the flag to newer functions]
> ---
> 
> Changes since v1:
>  - changed ownership as per initial patch

IIRC, the patch was created on the vendor kernel a long time ago to work
around a specific glitchy multiplexer issue seen on a particular SoC.
I'm not sure it's good for the upstream kernel today.

Shawn


Re: [PATCH 2/6] staging: lustre: remove libcfs/linux/libcfs.h

2018-04-17 Thread NeilBrown
On Mon, Apr 16 2018, James Simmons wrote:

>> This include file is only included in one place,
>> and only contains a list of other include directives.
>> So just move all those to the place where this file
>> is included, and discard the file.
>> 
>> One include directive uses a local name ("linux-cpu.h"), so
>> that needs to be given a proper path.
>> 
>> Probably many of these should be remove from here, and moved to
>> just the files that need them.
>
> Nak. Dumping all the extra headers from linux/libcfs.h into libcfs.h is
> the wrong approach. Having the one header, libcfs.h, be the only header
> in all lustre files is the wrong approach. I have been looking to
> unroll that mess. I have a patch that I need to polish up that I can
> submit.

I think we both have the same goal - maybe just different paths to get
there.  If you have something nearly ready to submit, I'm happy to wait
for it, then proceed on top of it.

Thanks,
NeilBrown


signature.asc
Description: PGP signature


[PATCH] mm:memcg: add __GFP_NOWARN in __memcg_schedule_kmem_cache_create

2018-04-17 Thread Minchan Kim
If there is heavy memory pressure, page allocation with __GFP_NOWAIT
fails easily even though it's an order-0 request.
I got the warning below 9 times during a normal boot.

[   17.072747] c0 0  : page allocation failure: order:0, 
mode:0x220(GFP_NOWAIT|__GFP_NOTRACK)
< snip >
[   17.072789] c0 0  Call trace:
[   17.072803] c0 0  [] dump_backtrace+0x0/0x4
[   17.072813] c0 0  [] dump_stack+0xa4/0xc0
[   17.072822] c0 0  [] warn_alloc+0xd4/0x15c
[   17.072829] c0 0  [] 
__alloc_pages_nodemask+0xf88/0x10fc
[   17.072838] c0 0  [] alloc_slab_page+0x40/0x18c
[   17.072843] c0 0  [] new_slab+0x2b8/0x2e0
[   17.072849] c0 0  [] ___slab_alloc+0x25c/0x464
[   17.072858] c0 0  [] __kmalloc+0x394/0x498
[   17.072865] c0 0  [] memcg_kmem_get_cache+0x114/0x2b8
[   17.072870] c0 0  [] kmem_cache_alloc+0x98/0x3e8
[   17.072878] c0 0  [] mmap_region+0x3bc/0x8c0
[   17.072884] c0 0  [] do_mmap+0x40c/0x43c
[   17.072890] c0 0  [] vm_mmap_pgoff+0x15c/0x1e4
[   17.072898] c0 0  [] sys_mmap+0xb0/0xc8
[   17.072904] c0 0  [] el0_svc_naked+0x24/0x28
[   17.072908] c0 0  Mem-Info:
[   17.072920] c0 0  active_anon:17124 inactive_anon:193 isolated_anon:0
[   17.072920] c0 0   active_file:7898 inactive_file:712955 isolated_file:55
[   17.072920] c0 0   unevictable:0 dirty:27 writeback:18 unstable:0
[   17.072920] c0 0   slab_reclaimable:12250 slab_unreclaimable:23334
[   17.072920] c0 0   mapped:19310 shmem:212 pagetables:816 bounce:0
[   17.072920] c0 0   free:36561 free_pcp:1205 free_cma:35615
[   17.072933] c0 0  Node 0 active_anon:68496kB inactive_anon:772kB 
active_file:31592kB inactive_file:2851820kB unevictable:0kB isolated(anon):0kB 
isolated(file):220kB mapped:77240kB dirty:108kB writeback:72kB shmem:848kB 
writeback_tmp:0kB unstable:0kB all_unreclaimable? no
[   17.072945] c0 0  DMA free:142188kB min:3056kB low:3820kB high:4584kB 
active_anon:10052kB inactive_anon:12kB active_file:312kB 
inactive_file:1412620kB unevictable:0kB writepending:0kB present:1781412kB 
managed:1604728kB mlocked:0kB slab_reclaimable:3592kB slab_unreclaimable:876kB 
kernel_stack:400kB pagetables:52kB bounce:0kB free_pcp:1436kB local_pcp:124kB 
free_cma:142492kB
[   17.072949] c0 0  lowmem_reserve[]: 0 1842 1842
[   17.072966] c0 0  Normal free:4056kB min:4172kB low:5212kB high:6252kB 
active_anon:58376kB inactive_anon:760kB active_file:31348kB 
inactive_file:1439040kB unevictable:0kB writepending:180kB present:2000636kB 
managed:1923688kB mlocked:0kB slab_reclaimable:45408kB 
slab_unreclaimable:92460kB kernel_stack:9680kB pagetables:3212kB bounce:0kB 
free_pcp:3392kB local_pcp:688kB free_cma:0kB
[   17.072971] c0 0  lowmem_reserve[]: 0 0 0
[   17.072982] c0 0  DMA: 0*4kB 0*8kB 1*16kB (C) 0*32kB 0*64kB 0*128kB 
1*256kB (C) 1*512kB (C) 0*1024kB 1*2048kB (C) 34*4096kB (C) = 142096kB
[   17.073024] c0 0  Normal: 228*4kB (UMEH) 172*8kB (UMH) 23*16kB (UH) 
24*32kB (H) 5*64kB (H) 1*128kB (H) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 
3872kB
[   17.073069] c0 0  721350 total pagecache pages
[   17.073073] c0 0  0 pages in swap cache
[   17.073078] c0 0  Swap cache stats: add 0, delete 0, find 0/0
[   17.073081] c0 0  Free swap  = 0kB
[   17.073085] c0 0  Total swap = 0kB
[   17.073089] c0 0  945512 pages RAM
[   17.073093] c0 0  0 pages HighMem/MovableOnly
[   17.073097] c0 0  63408 pages reserved
[   17.073100] c0 0  51200 pages cma reserved

Let's not scare the user.

Cc: Johannes Weiner 
Cc: Michal Hocko 
Cc: Vladimir Davydov 
Signed-off-by: Minchan Kim 
---
 mm/memcontrol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 448db08d97a0..671d07e73a3b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2200,7 +2200,7 @@ static void __memcg_schedule_kmem_cache_create(struct 
mem_cgroup *memcg,
 {
struct memcg_kmem_cache_create_work *cw;
 
-   cw = kmalloc(sizeof(*cw), GFP_NOWAIT);
+   cw = kmalloc(sizeof(*cw), GFP_NOWAIT | __GFP_NOWARN);
if (!cw)
return;
 
-- 
2.17.0.484.g0c8726318c-goog



Re: [lustre-devel] [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

2018-04-17 Thread NeilBrown
On Mon, Apr 16 2018, James Simmons wrote:

>> James,
>> 
>> If I understand correctly, you're saying you want to be able to build 
>> without debug support...?  I'm not convinced that building a client without 
>> debug support is interesting or useful.  In fact, I think it would be 
>> harmful, and we shouldn't open up the possibility - this is switchable debug 
>> with very low overhead when not actually "on".  It would be really awful to 
>> get a problem on a running system and discover there's no debug support - 
>> that you can't even enable debug without a reinstall.
>> 
>> If I've understood you correctly, then I would want to see proof of a 
>> significant performance cost when debug is built but *off* before agreeing 
>> to even exposing this option.  (I know it's a choice they'd have to make, 
>> but if it's not really useful with a side order of potentially harmful, we 
>> shouldn't even give people the choice.)
>
> I'm not saying add the option today but this is more for the long game.
> While the Intel lustre developers deeply love lustre's debugging 
> infrastructure I see a future where something better will come along to
> replace it. When that day comes we will have a period where both
> debugging infrastructurs will exist and some deployers of lustre will
> want to turn off the old debugging infrastructure and just use the new.
> That is what I have in mind. A switch to flip between options.

My position on this is that lustre's debugging infrastructure (in
mainline) *will* be changed to use something that the rest of the kernel
can and does use.  Quite possibly that "something" will first be
enhanced so that it is as powerful and useful as what lustre has.
I suspect this will partly be pr_debug(), partly WARN_ON(), partly trace
points.  But I'm not very familiar with tracepoints or with lustre
debugging yet so this is far from certain.
pr_debug() and tracepoints can be compiled out, but only kernel-wide.
There is no reason for lustre to be special there.  WARN_ON() and
BUG_ON() cannot be compiled out, but BUG_ON() must only be used when
proceeding is unarguably worse than crashing the machine.  In recent
years a lot of BUG_ON()s have been removed or changed to warnings.  We
need to maintain that attitude.
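
For reference, a sketch of what the pr_debug() half of that looks like (the
function and message here are made up; this is not a proposed conversion):

#include <linux/printk.h>

static void osc_note_reconnect_failure(const char *obd_name, int rc)
{
	/* Compiled out unless DEBUG or CONFIG_DYNAMIC_DEBUG is enabled;
	 * with dynamic debug each call site is switched on individually
	 * at runtime.
	 */
	pr_debug("%s: reconnect failed: rc = %d\n", obd_name, rc);
}

With CONFIG_DYNAMIC_DEBUG the site is then enabled at runtime with
something like:

  echo 'func osc_note_reconnect_failure +p' > /sys/kernel/debug/dynamic_debug/control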

I don't like the idea of having two parallel debugging infrastructures that
you can choose between - it encourages confusion and brings no benefits.

Thanks,
NeilBrown


signature.asc
Description: PGP signature


linux-next: Tree for Apr 18

2018-04-17 Thread Stephen Rothwell
Hi all,

Changes since 20180417:

Removed tree: efi-lock-down (temporarily at maintainer's request)

The netfilter tree lost its build failure.

The sound-asoc tree lost its build failure but gained another, so I have
used the version from next-20180416.

Non-merge commits (relative to Linus' tree): 797
 915 files changed, 24509 insertions(+), 14978 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 257 trees (counting Linus' and 44 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (a27fc14219f2 Merge branch 'parisc-4.17-3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux)
Merging fixes/master (147a89bc71e7 Merge tag 'kconfig-v4.17' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging kbuild-current/fixes (28913ee8191a netfilter: nf_nat_snmp_basic: add 
correct dependency to Makefile)
Merging arc-current/for-curr (661e50bc8532 Linux 4.16-rc4)
Merging arm-current/fixes (fe680ca02c1e ARM: replace unnecessary perl with sed 
and the shell $(( )) operator)
Merging arm64-fixes/for-next/fixes (800cb2e553d4 arm64: kasan: avoid 
pfn_to_nid() before page array is initialized)
Merging m68k-current/for-linus (ecd685580c8f m68k/mac: Remove bogus "FIXME" 
comment)
Merging powerpc-fixes/fixes (9dfbf78e4114 powerpc/64s: Default l1d_size to 64K 
in RFI fallback flush)
Merging sparc/master (17dec0a94915 Merge branch 'userns-linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (9c438d7a3a52 KEYS: DNS: limit the length of option strings)
Merging bpf/master (3a38bb98d9ab bpf/tracing: fix a deadlock in 
perf_event_detach_bpf_prog)
Merging ipsec/master (b48c05ab5d32 xfrm: Fix warning in xfrm6_tunnel_net_exit.)
Merging netfilter/master (765cca91b895 netfilter: conntrack: include kmemleak.h 
for kmemleak_not_leak())
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (77e30e10ee28 iwlwifi: mvm: query regdb for wmm 
rule if needed)
Merging mac80211/master (b5dbc28762fd Merge tag 'kbuild-fixes-v4.16-3' of 
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild)
Merging rdma-fixes/for-rc (60cc43fc8884 Linux 4.17-rc1)
Merging sound-current/for-linus (af52f9982e41 ALSA: hda - New VIA controller 
suppor no-snoop path)
Merging pci-current/for-linus (60cc43fc8884 Linux 4.17-rc1)
Merging driver-core.current/driver-core-linus (60cc43fc8884 Linux 4.17-rc1)
Merging tty.current/tty-linus (60cc43fc8884 Linux 4.17-rc1)
Merging usb.current/usb-linus (60cc43fc8884 Linux 4.17-rc1)
Merging usb-gadget-fixes/fixes (c6ba5084ce0d usb: gadget: udc: renesas_usb3: 
add binging for r8a77965)
Merging usb-serial-fixes/usb-linus (470b5d6f0cf4 USB: serial: ftdi_sio: use 
jtag quirk for Arrow USB Blaster)
Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: 
fix ulpi-node lookup)
Merging phy/fixes (60cc43fc8884 Linux 4.17-rc1)
Merging staging.current/staging-linus (edf5c17d866e staging: irda: remove 
remaining remants of irda code removal)
Merging char-misc.current/char-misc-linus (60cc43fc8884 Linux 4

Re: [PATCH 1/6] staging: lustre: move stack-check macros to libcfs_debug.h

2018-04-17 Thread NeilBrown
On Mon, Apr 16 2018, James Simmons wrote:

>> CDEBUG_STACK() and CHECK_STACK() are macros to help with
>> debugging, so move them from
>>drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>> to
>>drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>> 
>> This seems a more fitting location, and is a step towards
>> removing linux/libcfs.h and simplifying the include file structure.
>
> Nak. Currently the lustre client always enables debugging but that
> shouldn't be the case. What we do need is the ability to turn off the 
> crazy debugging stuff. In the development branch of lustre it is
> done with CDEBUG_ENABLED. We need something like that in Kconfig
> much like we have CONFIG_LUSTRE_DEBUG_EXPENSIVE_CHECK. Since we like
> to be able to turn that off this should be moved to just after
> LIBCFS_DEBUG_MSG_DATA_DECL. Then from CHECK_STACK down to CWARN()
> it can be build out. When CDEBUG_ENABLED is disabled CDEBUG_LIMIT
> would be empty.

So why, exactly, is this an argument to justify a NAK?
Are you just saying  that the code I moved into libcfs_debug.h should be
moved to somewhere a bit later in the file?
That can easily be done when it is needed.  It isn't needed now so why
insist on it?

Each patch should do one thing and make clear forward progress.  This
patch gets rid of an unnecessary file and brings related code together.
I think that qualifies.

Thanks,
NeilBrown


>  
>> Signed-off-by: NeilBrown 
>> ---
>>  .../lustre/include/linux/libcfs/libcfs_debug.h |   32 
>> 
>>  .../lustre/include/linux/libcfs/linux/libcfs.h |   31 
>> ---
>>  2 files changed, 32 insertions(+), 31 deletions(-)
>> 
>> diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h 
>> b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>> index 9290a19429e7..0dc7b91efe7c 100644
>> --- a/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>> +++ b/drivers/staging/lustre/include/linux/libcfs/libcfs_debug.h
>> @@ -62,6 +62,38 @@ int libcfs_debug_str2mask(int *mask, const char *str, int 
>> is_subsys);
>>  extern unsigned int libcfs_catastrophe;
>>  extern unsigned int libcfs_panic_on_lbug;
>>  
>> +/* Enable debug-checks on stack size - except on x86_64 */
>> +#if !defined(__x86_64__)
>> +# ifdef __ia64__
>> +#  define CDEBUG_STACK() (THREAD_SIZE -  \
>> +  ((unsigned long)__builtin_dwarf_cfa() &   \
>> +   (THREAD_SIZE - 1)))
>> +# else
>> +#  define CDEBUG_STACK() (THREAD_SIZE -  \
>> +  ((unsigned long)__builtin_frame_address(0) &  \
>> +   (THREAD_SIZE - 1)))
>> +# endif /* __ia64__ */
>> +
>> +#define __CHECK_STACK(msgdata, mask, cdls)\
>> +do {\
>> +if (unlikely(CDEBUG_STACK() > libcfs_stack)) {\
>> +LIBCFS_DEBUG_MSG_DATA_INIT(msgdata, D_WARNING, NULL);   \
>> +libcfs_stack = CDEBUG_STACK();\
>> +libcfs_debug_msg(msgdata,  \
>> + "maximum lustre stack %lu\n",\
>> + CDEBUG_STACK());  \
>> +(msgdata)->msg_mask = mask;  \
>> +(msgdata)->msg_cdls = cdls;  \
>> +dump_stack();  \
>> +  /*panic("LBUG");*/\
>> +}  \
>> +} while (0)
>> +#define CFS_CHECK_STACK(msgdata, mask, cdls)  __CHECK_STACK(msgdata, mask, 
>> cdls)
>> +#else /* __x86_64__ */
>> +#define CFS_CHECK_STACK(msgdata, mask, cdls) do {} while (0)
>> +#define CDEBUG_STACK() (0L)
>> +#endif /* __x86_64__ */
>> +
>>  #ifndef DEBUG_SUBSYSTEM
>>  # define DEBUG_SUBSYSTEM S_UNDEFINED
>>  #endif
>> diff --git a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h 
>> b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>> index 07d3cb2217d1..83aec9c7698f 100644
>> --- a/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>> +++ b/drivers/staging/lustre/include/linux/libcfs/linux/libcfs.h
>> @@ -80,35 +80,4 @@
>>  #include 
>>  #include "linux-cpu.h"
>>  
>> -#if !defined(__x86_64__)
>> -# ifdef __ia64__
>> -#  define CDEBUG_STACK() (THREAD_SIZE -  \
>> -  ((unsigned long)__builtin_dwarf_cfa() &   \
>> -   (THREAD_SIZE - 1)))
>> -# else
>> -#  define CDEBUG_STACK() (THREAD_SIZE -  \
>> -  ((unsigned long)__builtin_frame_address(0) &  \
>> -   (THREAD_SIZE - 1)))
>> -# endif /* __ia64__ */
>> -
>> -#define __CHECK_STACK(msgdata, mask, cdls)\
>> -do { 

Re: [PATCH] IB/uverbs: Add missing braces in anonymous union initializers

2018-04-17 Thread Jason Gunthorpe
On Mon, Apr 09, 2018 at 04:52:47PM +0200, Geert Uytterhoeven wrote:
> With gcc-4.1.2:
> 
> drivers/infiniband/core/uverbs_std_types_flow_action.c:366: error: 
> unknown field ‘ptr’ specified in initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:367: error: 
> unknown field ‘type’ specified in initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:367: warning: 
> missing braces around initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:367: warning: 
> (near initialization for 
> ‘uverbs_flow_action_esp_keymat[0]..’)
> drivers/infiniband/core/uverbs_std_types_flow_action.c:368: error: 
> unknown field ‘min_len’ specified in initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:368: warning: 
> excess elements in union initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:368: warning: 
> (near initialization for ‘uverbs_flow_action_esp_keymat[0].’)
> drivers/infiniband/core/uverbs_std_types_flow_action.c:368: error: 
> unknown field ‘len’ specified in initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:368: warning: 
> excess elements in union initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:368: warning: 
> (near initialization for ‘uverbs_flow_action_esp_keymat[0].’)
> drivers/infiniband/core/uverbs_std_types_flow_action.c:369: error: 
> unknown field ‘flags’ specified in initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:369: warning: 
> excess elements in union initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:369: warning: 
> (near initialization for ‘uverbs_flow_action_esp_keymat[0].’)
> drivers/infiniband/core/uverbs_std_types_flow_action.c:376: error: 
> unknown field ‘ptr’ specified in initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:377: error: 
> unknown field ‘type’ specified in initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:377: warning: 
> missing braces around initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:377: warning: 
> (near initialization for 
> ‘uverbs_flow_action_esp_replay[0]..’)
> drivers/infiniband/core/uverbs_std_types_flow_action.c:379: error: 
> unknown field ‘len’ specified in initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:379: warning: 
> excess elements in union initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:379: warning: 
> (near initialization for ‘uverbs_flow_action_esp_replay[0].’)
> drivers/infiniband/core/uverbs_std_types_flow_action.c:383: error: 
> unknown field ‘ptr’ specified in initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:384: error: 
> unknown field ‘type’ specified in initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:385: error: 
> unknown field ‘min_len’ specified in initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:385: warning: 
> excess elements in union initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:385: warning: 
> (near initialization for ‘uverbs_flow_action_esp_replay[1].’)
> drivers/infiniband/core/uverbs_std_types_flow_action.c:385: error: 
> unknown field ‘len’ specified in initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:385: warning: 
> excess elements in union initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:385: warning: 
> (near initialization for ‘uverbs_flow_action_esp_replay[1].’)
> drivers/infiniband/core/uverbs_std_types_flow_action.c:386: error: 
> unknown field ‘flags’ specified in initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:386: warning: 
> excess elements in union initializer
> drivers/infiniband/core/uverbs_std_types_flow_action.c:386: warning: 
> (near initialization for ‘uverbs_flow_action_esp_replay[1].’)
> 
> Add the missing braces to fix this.
> 
> Signed-off-by: Geert Uytterhoeven 
> ---
> Presumably also needed for Andrew's gcc-4.4.4.
> 
> Unfortunately I don't know how to fix the remaining warnings:
> 
> drivers/infiniband/core/uverbs_std_types_flow_action.c:391: warning: 
> initialization from incompatible pointer type
> drivers/infiniband/core/uverbs_std_types_flow_action.c:408: warning: 
> initialization from incompatible pointer type
> drivers/infiniband/core/uverbs_std_types_flow_action.c:423: warning: 
> initialization from incompatible pointer type
> drivers/infiniband/core/uverbs_std_types_flow_action.c:430: warning: 
> initialization from incompatible pointer type
> drivers/infiniband/core/uverbs_std_types_cq.c:149: warning: 
> initialization from incompatible pointer type
> drivers/infiniband/core/uverbs_std_types_cq.c:194: warning: 
> initialization from incompatible pointer type
> drivers/infiniband/core/uverbs_std_types_cq.c:202: warning: 
> initialization from inco

[PATCH v4 1/2] perf: riscv: preliminary RISC-V support

2018-04-17 Thread Alan Kao
This patch provides a basic PMU, riscv_base_pmu, which supports two
general hardware events, instructions and cycles.  Furthermore, this
PMU serves as a reference implementation to ease future portings.

riscv_base_pmu should be able to run on any RISC-V machine that
conforms to the Priv-Spec.  Note that the latest qemu model hasn't
fully supported the proper behavior of Priv-Spec 1.10 yet, but a
workaround should be easy with very small fixes.  Please check
https://github.com/riscv/riscv-qemu/pull/115 for future updates.

Cc: Nick Hu 
Cc: Greentime Hu 
Signed-off-by: Alan Kao 
---
 arch/riscv/Kconfig  |  13 +
 arch/riscv/include/asm/perf_event.h |  79 -
 arch/riscv/kernel/Makefile  |   1 +
 arch/riscv/kernel/perf_event.c  | 482 
 4 files changed, 571 insertions(+), 4 deletions(-)
 create mode 100644 arch/riscv/kernel/perf_event.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index c22ebe08e902..90d9c8e50377 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -203,6 +203,19 @@ config RISCV_ISA_C
 config RISCV_ISA_A
def_bool y
 
+menu "supported PMU type"
+   depends on PERF_EVENTS
+
+config RISCV_BASE_PMU
+   bool "Base Performance Monitoring Unit"
+   def_bool y
+   help
+ A base PMU that serves as a reference implementation with a limited
+ set of perf features.  It can run on any RISC-V machine, so it serves
+ as the fallback, but this option can also be disabled to reduce kernel size.
+
+endmenu
+
 endmenu
 
 menu "Kernel type"
diff --git a/arch/riscv/include/asm/perf_event.h 
b/arch/riscv/include/asm/perf_event.h
index e13d2ff29e83..0e638a0c3feb 100644
--- a/arch/riscv/include/asm/perf_event.h
+++ b/arch/riscv/include/asm/perf_event.h
@@ -1,13 +1,84 @@
+/* SPDX-License-Identifier: GPL-2.0 */
 /*
  * Copyright (C) 2018 SiFive
+ * Copyright (C) 2018 Andes Technology Corporation
  *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public Licence
- * as published by the Free Software Foundation; either version
- * 2 of the Licence, or (at your option) any later version.
  */
 
 #ifndef _ASM_RISCV_PERF_EVENT_H
 #define _ASM_RISCV_PERF_EVENT_H
 
+#include 
+#include 
+
+#define RISCV_BASE_COUNTERS2
+
+/*
+ * The RISCV_MAX_COUNTERS parameter should be specified.
+ */
+
+#ifdef CONFIG_RISCV_BASE_PMU
+#define RISCV_MAX_COUNTERS 2
+#endif
+
+#ifndef RISCV_MAX_COUNTERS
+#error "Please provide a valid RISCV_MAX_COUNTERS for the PMU."
+#endif
+
+/*
+ * These are the indexes of bits in the counteren register *minus* 1,
+ * except for cycle.  It would be coherent if they could be mapped
+ * directly to the counteren bit definitions, but there is a *time*
+ * register at counteren[1].  Per-cpu structure is a scarce resource here.
+ *
+ * According to the spec, an implementation can support counters up to
+ * mhpmcounter31, but many high-end processors have at most 6 general
+ * PMCs, so we only give definitions up to MHPMCOUNTER8 here.
+ */
+#define RISCV_PMU_CYCLE0
+#define RISCV_PMU_INSTRET  1
+#define RISCV_PMU_MHPMCOUNTER3 2
+#define RISCV_PMU_MHPMCOUNTER4 3
+#define RISCV_PMU_MHPMCOUNTER5 4
+#define RISCV_PMU_MHPMCOUNTER6 5
+#define RISCV_PMU_MHPMCOUNTER7 6
+#define RISCV_PMU_MHPMCOUNTER8 7
+
+#define RISCV_OP_UNSUPP(-EOPNOTSUPP)
+
+struct cpu_hw_events {
+   /* # currently enabled events*/
+   int n_events;
+   /* currently enabled events */
+   struct perf_event   *events[RISCV_MAX_COUNTERS];
+   /* vendor-defined PMU data */
+   void*platform;
+};
+
+struct riscv_pmu {
+   struct pmu  *pmu;
+
+   /* generic hw/cache events table */
+   const int   *hw_events;
+   const int   (*cache_events)[PERF_COUNT_HW_CACHE_MAX]
+  [PERF_COUNT_HW_CACHE_OP_MAX]
+  [PERF_COUNT_HW_CACHE_RESULT_MAX];
+   /* method used to map hw/cache events */
+   int (*map_hw_event)(u64 config);
+   int (*map_cache_event)(u64 config);
+
+   /* max generic hw events in map */
+   int max_events;
+   /* number total counters, 2(base) + x(general) */
+   int num_counters;
+   /* the width of the counter */
+   int counter_width;
+
+   /* vendor-defined PMU features */
+   void*platform;
+
+   irqreturn_t (*handle_irq)(int irq_num, void *dev);
+   int irq;
+};
+
 #endif /* _ASM_RISCV_PERF_EVENT_H */
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index ffa439d4a364..f50d19816757 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -39,5 +39,6 @@ obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
 obj-$(CONFIG_FUNCTION_TRACER)  += mcount.o
 obj-$(CONFIG_DYNAMIC_FTRACE)   += mcount-dyn.o
 obj-$(CONFIG_FU

[PATCH v4 0/2] perf: riscv: Preliminary Perf Event Support on RISC-V

2018-04-17 Thread Alan Kao
This implements the baseline PMU for RISC-V platforms.

To ease future PMU portings, a guide is also written, containing
perf concepts, arch porting practices and some hints.

Changes in v4:
 - Fix several compilation errors.  Sorry for that.
 - Raise a warning in the write_counter body.

Changes in v3:
 - Fix typos in the document.
 - Change the initialization routine from statically assigning PMU to
   device-tree-based methods, and set default to the PMU proposed in
   this patch.

Changes in v2:
 - Fix the bug reported by Alex, which was caused by not sufficient
   initialization.  Check https://lkml.org/lkml/2018/3/31/251 for the
   discussion.

Alan Kao (2):
  perf: riscv: preliminary RISC-V support
  perf: riscv: Add Document for Future Porting Guide

 Documentation/riscv/pmu.txt | 249 ++
 arch/riscv/Kconfig  |  13 +
 arch/riscv/include/asm/perf_event.h |  79 -
 arch/riscv/kernel/Makefile  |   1 +
 arch/riscv/kernel/perf_event.c  | 482 
 5 files changed, 820 insertions(+), 4 deletions(-)
 create mode 100644 Documentation/riscv/pmu.txt
 create mode 100644 arch/riscv/kernel/perf_event.c

-- 
2.17.0



[PATCH v4 2/2] perf: riscv: Add Document for Future Porting Guide

2018-04-17 Thread Alan Kao
Reviewed-by: Alex Solomatnikov 
Cc: Nick Hu 
Cc: Greentime Hu 
Signed-off-by: Alan Kao 
---
 Documentation/riscv/pmu.txt | 249 
 1 file changed, 249 insertions(+)
 create mode 100644 Documentation/riscv/pmu.txt

diff --git a/Documentation/riscv/pmu.txt b/Documentation/riscv/pmu.txt
new file mode 100644
index ..b29f03a6d82f
--- /dev/null
+++ b/Documentation/riscv/pmu.txt
@@ -0,0 +1,249 @@
+Supporting PMUs on RISC-V platforms
+==
+Alan Kao , Mar 2018
+
+Introduction
+
+
+As of this writing, perf_event-related features mentioned in The RISC-V ISA
+Privileged Version 1.10 are as follows:
+(please check the manual for more details)
+
+* [m|s]counteren
+* mcycle[h], cycle[h]
+* minstret[h], instret[h]
+* mhpeventx, mhpcounterx[h]
+
+With only this function set, porting perf would require a lot of work, due to
+the lack of the following general architectural performance monitoring 
features:
+
+* Enabling/Disabling counters
+  Counters are just free-running all the time in our case.
+* Interrupt caused by counter overflow
+  No such feature in the spec.
+* Interrupt indicator
+  It is not possible to have many interrupt ports for all counters, so an
+  interrupt indicator is required for software to tell which counter has
+  just overflowed.
+* Writing to counters
+  There will be an SBI to support this since the kernel cannot modify the
+  counters [1].  Alternatively, some vendors are considering implementing a
+  hardware extension for M-S-U model machines to write counters directly.
+
+This document aims to provide developers with a quick guide on supporting
+their PMUs in the kernel.  The following sections briefly explain perf's
+mechanism and todos.
+
+You may check previous discussions here [1][2].  Also, it might be helpful
+to check the appendix for related kernel structures.
+
+
+1. Initialization
+-
+
+*riscv_pmu* is a global pointer of type *struct riscv_pmu*, which contains
+various methods according to perf's internal convention and PMU-specific
+parameters.  One should declare such instance to represent the PMU.  By 
default,
+*riscv_pmu* points to a constant structure *riscv_base_pmu*, which has very
+basic support to a baseline QEMU model.
+
+Then he/she can either assign the instance's pointer to *riscv_pmu* so that
+the minimal and already-implemented logic can be leveraged, or invent his/her
+own *riscv_init_platform_pmu* implementation.
+
+In other words, existing sources of *riscv_base_pmu* merely provide a
+reference implementation.  Developers can flexibly decide how many parts they
+can leverage, and in the most extreme case, they can customize every function
+according to their needs.
+
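+For example, a vendor PMU could be declared and hooked up roughly as
+follows.  This is only an illustrative sketch: the vendor_* names are
+made up, and only fields already present in *struct riscv_pmu* are used.
+
+    static const struct riscv_pmu vendor_pmu = {
+            .map_hw_event    = vendor_map_hw_event,
+            .map_cache_event = vendor_map_cache_event,
+            .hw_events       = vendor_hw_event_map,
+            .max_events      = ARRAY_SIZE(vendor_hw_event_map),
+            .num_counters    = RISCV_BASE_COUNTERS + 6,
+            .counter_width   = 63,
+    };
+
+    int __init riscv_init_platform_pmu(void)
+    {
+            /* take over from riscv_base_pmu */
+            riscv_pmu = &vendor_pmu;
+            return 0;
+    }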
+
+2. Event Initialization
+---
+
+When a user launches a perf command to monitor some events, it is first
+interpreted by the userspace perf tool into multiple *perf_event_open*
+system calls, and then each of them calls to the body of *event_init*
+member function that was assigned in the previous step.  In *riscv_base_pmu*'s
+case, it is *riscv_event_init*.
+
+The main purpose of this function is to translate the event provided by the
+user into a bitmap, so that HW-related control registers or counters can be
+manipulated directly.  The translation is based on the mappings and methods
+provided in
+*riscv_pmu*.
+
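+A minimal *map_hw_event* may simply look the generic event id up in the
+*hw_events* table, for instance (sketch only, assuming the *struct
+riscv_pmu* fields above):
+
+    static int vendor_map_hw_event(u64 config)
+    {
+            if (config >= riscv_pmu->max_events)
+                    return -EINVAL;
+
+            return riscv_pmu->hw_events[config];
+    }
+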
+Note that some features can be done in this stage as well:
+
+(1) interrupt setting, which is stated in the next section;
+(2) privilege level setting (user space only, kernel space only, both);
+(3) destructor setting.  Normally it is sufficient to apply 
*riscv_destroy_event*;
+(4) tweaks for non-sampling events, which will be utilized by functions such as
+*perf_adjust_period*, usually something like the follows:
+
+if (!is_sampling_event(event)) {
+hwc->sample_period = x86_pmu.max_period;
+hwc->last_period = hwc->sample_period;
+local64_set(&hwc->period_left, hwc->sample_period);
+}
+
+In the case of *riscv_base_pmu*, only (3) is provided for now.
+
+
+3. Interrupt
+
+
+3.1. Interrupt Initialization
+
+This often occurs at the beginning of the *event_init* method. In common
+practice, this should be a code segment like
+
+int x86_reserve_hardware(void)
+{
+int err = 0;
+
+if (!atomic_inc_not_zero(&pmc_refcount)) {
+mutex_lock(&pmc_reserve_mutex);
+if (atomic_read(&pmc_refcount) == 0) {
+if (!reserve_pmc_hardware())
+err = -EBUSY;
+else
+reserve_ds_buffers();
+}
+if (!err)
+atomic_inc(&pmc_refcount);
+mutex_unlock(&pmc_reserve_mutex);
+}
+
+return err;
+}
+
+And the magic is in *reserve_pmc_hardware*, which usually does atomic
+operations to make implemented IRQ accessible from some global functio

[Patch v2] staging fbtft: Fixed lines exceeding columns limit

2018-04-17 Thread Renato Soma
Fix checkpatch.pl warnings of lines exceeding 80 columns.
Break lines in order to reduce instruction lengths to less than 80 columns.

Signed-off-by: Renato Soma 
---
Changes in v2:
- Break lines respecting function parameters alignment.

 drivers/staging/fbtft/fbtft-bus.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/fbtft/fbtft-bus.c 
b/drivers/staging/fbtft/fbtft-bus.c
index a263bce2..871b307 100644
--- a/drivers/staging/fbtft/fbtft-bus.c
+++ b/drivers/staging/fbtft/fbtft-bus.c
@@ -22,10 +22,13 @@ void func(struct fbtft_par *par, int len, ...)  
  \
if (unlikely(par->debug & DEBUG_WRITE_REGISTER)) {\
va_start(args, len);  \
for (i = 0; i < len; i++) {   \
-   buf[i] = modifier((data_type)va_arg(args, unsigned 
int)); \
+   buf[i] = modifier((data_type)va_arg(args, \
+   unsigned int));   \
} \
va_end(args); \
-   fbtft_par_dbg_hex(DEBUG_WRITE_REGISTER, par, par->info->device, 
buffer_type, buf, len, "%s: ", __func__); \
+   fbtft_par_dbg_hex(DEBUG_WRITE_REGISTER, par,  \
+ par->info->device, buffer_type, buf, len,   \
+ "%s: ", __func__);  \
} \
  \
va_start(args, len);  \
@@ -37,7 +40,8 @@ void func(struct fbtft_par *par, int len, ...)
\
} \
  \
*buf = modifier((data_type)va_arg(args, unsigned int));   \
-   ret = fbtft_write_buf_dc(par, par->buf, sizeof(data_type) + offset, 0); 
\
+   ret = fbtft_write_buf_dc(par, par->buf, sizeof(data_type) + offset,   \
+0);  \
if (ret < 0)  \
goto out; \
len--;\
@@ -48,7 +52,8 @@ void func(struct fbtft_par *par, int len, ...)
\
if (len) {\
i = len;  \
while (i--)   \
-   *buf++ = modifier((data_type)va_arg(args, unsigned 
int)); \
+   *buf++ = modifier((data_type)va_arg(args, \
+   unsigned int));   \
fbtft_write_buf_dc(par, par->buf, \
   len * (sizeof(data_type) + offset), 1);\
} \
-- 
2.7.4



Re: [PATCH ghak80 V1] audit: add syscall information to FEATURE_CHANGE records

2018-04-17 Thread Paul Moore
On Tue, Apr 17, 2018 at 6:10 PM, Steve Grubb  wrote:
> On Tuesday, April 17, 2018 6:06:24 PM EDT Paul Moore wrote:
>> On Wed, Apr 11, 2018 at 8:46 AM, Richard Guy Briggs  wrote:
>> > Tie syscall information to FEATURE_CHANGE calls since it is a result of
>> > user action.
>> >
>> > See: https://github.com/linux-audit/audit-kernel/issues/80
>> >
>> > Signed-off-by: Richard Guy Briggs 
>> > ---
>> >
>> >  kernel/audit.c | 5 ++---
>> >  1 file changed, 2 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/kernel/audit.c b/kernel/audit.c
>> > index 8da24ef..23f125b 100644
>> > --- a/kernel/audit.c
>> > +++ b/kernel/audit.c
>> > @@ -1103,10 +1103,9 @@ static void audit_log_feature_change(int which,
>> > u32 old_feature, u32 new_feature>
>> >  {
>> >
>> > struct audit_buffer *ab;
>> >
>> > -   if (audit_enabled == AUDIT_OFF)
>> > +   if (!audit_enabled)
>>
>> Sooo, this is an unrelated style change, why?  Looking at the rest of
>> kernel/audit.c we seem to use a mix of "(!x)" and "(x == 0/CONST)" so
>> why are you adding noise to this patch?
>>
>> > return;
>> >
>> > -
>> > -   ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_FEATURE_CHANGE);
>> > +   ab = audit_log_start(current->audit_context, GFP_KERNEL,
>> > AUDIT_FEATURE_CHANGE);
>> This is the important part, and the Right Thing To Do.
>
> This is an unexpected change. I have asked questions on the github issue
> tracker but have not gotten a satisfactory answer. Please do not merge this
> until there's agreement on this.

It shouldn't be surprising; we've been talking about connecting
records for some time now, in different contexts and both on and off
list.  Not only does it help pave the way for the audit container ID
work, it just makes sense.  I've seen your questions in this
particular GitHub issue and I think Richard has answered them
satisfactorily.

Once Richard removes the style change, or provides a good enough
reason for why it should stay in this patch, I plan on merging this
into audit/next.

-- 
paul moore
www.paul-moore.com


Re: [PATCH 1/3] infiniband: i40iw: Replace GFP_ATOMIC with GFP_KERNEL in i40iw_add_mqh_4

2018-04-17 Thread Jason Gunthorpe
On Wed, Apr 11, 2018 at 03:32:25PM +0800, Jia-Ju Bai wrote:
> i40iw_add_mqh_4() is never called in atomic context, because it 
> calls rtnl_lock() that can sleep.
> 
> Despite never getting called from atomic context,
> i40iw_add_mqh_4() calls kzalloc() with GFP_ATOMIC,
> which does not sleep for allocation.
> GFP_ATOMIC is not necessary and can be replaced with GFP_KERNEL,
> which can sleep and improve the possibility of successful allocation.
> 
> This is found by a static analysis tool named DCNS written by myself.
> And I also manually check it.
> 
> Signed-off-by: Jia-Ju Bai 
> Acked-by: Shiraz Saleem 
> ---
>  drivers/infiniband/hw/i40iw/i40iw_cm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Applied all three patches in this series to for-next, thanks

Jason


Re: [PATCH v6 3/4] MIPS: vmlinuz: Use generic ashldi3

2018-04-17 Thread Masahiro Yamada
2018-04-18 8:09 GMT+09:00 James Hogan :
> On Wed, Apr 11, 2018 at 08:50:18AM +0100, Matt Redfearn wrote:
>> diff --git a/arch/mips/boot/compressed/Makefile 
>> b/arch/mips/boot/compressed/Makefile
>> index adce180f3ee4..e03f522c33ac 100644
>> --- a/arch/mips/boot/compressed/Makefile
>> +++ b/arch/mips/boot/compressed/Makefile
>> @@ -46,9 +46,12 @@ $(obj)/uart-ath79.c: 
>> $(srctree)/arch/mips/ath79/early_printk.c
>>
>>  vmlinuzobjs-$(CONFIG_KERNEL_XZ) += $(obj)/ashldi3.o $(obj)/bswapsi.o
>>
>> -extra-y += ashldi3.c bswapsi.c
>> -$(obj)/ashldi3.o $(obj)/bswapsi.o: KBUILD_CFLAGS += 
>> -I$(srctree)/arch/mips/lib
>> -$(obj)/ashldi3.c $(obj)/bswapsi.c: $(obj)/%.c: $(srctree)/arch/mips/lib/%.c
>> +extra-y += ashldi3.c
>> +$(obj)/ashldi3.c: $(obj)/%.c: $(srctree)/lib/%.c
>> + $(call cmd,shipped)
>> +
>> +extra-y += bswapsi.c
>> +$(obj)/bswapsi.c: $(obj)/%.c: $(srctree)/arch/mips/lib/%.c
>>   $(call cmd,shipped)
>
> ci20_defconfig:
>
> arch/mips/boot/compressed/ashldi3.c:4:10: fatal error: libgcc.h: No such file 
> or directory
>  #include "libgcc.h"
>^~
>
> It looks like it had already copied ashldi3.c from arch/mips/lib/ when
> building an older commit, and it hasn't been regenerated from lib/ since
> the Makefile changed, so its still using the old version.
>
> I think it should be using FORCE and if_changed like this:
>
> diff --git a/arch/mips/boot/compressed/Makefile 
> b/arch/mips/boot/compressed/Makefile
> index e03f522c33ac..abe77add8789 100644
> --- a/arch/mips/boot/compressed/Makefile
> +++ b/arch/mips/boot/compressed/Makefile
> @@ -47,12 +47,12 @@ $(obj)/uart-ath79.c: 
> $(srctree)/arch/mips/ath79/early_printk.c
>  vmlinuzobjs-$(CONFIG_KERNEL_XZ) += $(obj)/ashldi3.o $(obj)/bswapsi.o
>
>  extra-y += ashldi3.c
> -$(obj)/ashldi3.c: $(obj)/%.c: $(srctree)/lib/%.c
> -   $(call cmd,shipped)
> +$(obj)/ashldi3.c: $(obj)/%.c: $(srctree)/lib/%.c FORCE
> +   $(call if_changed,shipped)
>
>  extra-y += bswapsi.c
> -$(obj)/bswapsi.c: $(obj)/%.c: $(srctree)/arch/mips/lib/%.c
> -   $(call cmd,shipped)
> +$(obj)/bswapsi.c: $(obj)/%.c: $(srctree)/arch/mips/lib/%.c FORCE
> +   $(call if_changed,shipped)
>
>  targets := $(notdir $(vmlinuzobjs-y))
>
> That resolves the build failures when checking out old -> new without
> cleaning, since the .ashldi3.c.cmd is missing so it gets rebuilt.
>
> It should also resolve issues if the path it copies from is updated in
> future since the .ashldi3.c.cmd will get updated.
>
> If you checkout new -> old without cleaning, the now removed
> arch/mips/lib/ashldi3.c will get added which will trigger regeneration,
> so it won't error.
>
> However if you do new -> old -> new then the .ashldi3.cmd file isn't
> updated while at old, so you get the same error as above. I'm not sure
> there's much we can practically do about that, aside perhaps avoiding
> the issue in future by somehow auto-deleting stale .*.cmd files.
>
> Cc'ing kbuild folk in case they have any bright ideas.



I do not have any idea better than if_changed.




> At least the straightforward old->new upgrade will work with the above
> fixup though. If you're okay with it I'm happy to apply as a fixup.
>






-- 
Best Regards
Masahiro Yamada


Re: [PATCH ghak46 V1] audit: normalize MAC_STATUS record

2018-04-17 Thread Paul Moore
On Tue, Apr 17, 2018 at 6:09 PM, Richard Guy Briggs  wrote:
> On 2018-04-17 17:59, Paul Moore wrote:
>> On Wed, Apr 11, 2018 at 5:08 PM, Paul Moore  wrote:
>> > On Mon, Apr 9, 2018 at 7:34 PM, Richard Guy Briggs  wrote:
>> >> There were two formats of the audit MAC_STATUS record, one of which was 
>> >> more
>> >> standard than the other.  One listed enforcing status changes and the
>> >> other listed enabled status changes with a non-standard label.  In
>> >> addition, the record was missing information about which LSM was
>> >> responsible and the operation's completion status.  While this record is
>> >> only issued on success, the parser expects the res= field to be present.
>> >>
>> >> old enforcing/permissive:
>> >> type=MAC_STATUS msg=audit(1523312831.378:24514): enforcing=0 
>> >> old_enforcing=1 auid=0 ses=1
>> >> old enable/disable:
>> >> type=MAC_STATUS msg=audit(1523312831.378:24514): selinux=0 auid=0 ses=1
>> >>
>> >> List both sets of status and old values and add the lsm= field and the
>> >> res= field.
>> >>
>> >> Here is the new format:
>> >> type=MAC_STATUS msg=audit(1523293828.657:891): enforcing=0 
>> >> old_enforcing=1 auid=0 ses=1 enabled=1 old-enabled=1 lsm=selinux res=1
>> >>
>> >> This record already accompanied a SYSCALL record.
>> >>
>> >> See: https://github.com/linux-audit/audit-kernel/issues/46
>> >> Signed-off-by: Richard Guy Briggs 
>> >> ---
>> >>  security/selinux/selinuxfs.c | 11 +++
>> >>  1 file changed, 7 insertions(+), 4 deletions(-)
>> >>
>> >> diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
>> >> index 00eed84..00b21b2 100644
>> >> --- a/security/selinux/selinuxfs.c
>> >> +++ b/security/selinux/selinuxfs.c
>> >> @@ -145,10 +145,11 @@ static ssize_t sel_write_enforce(struct file *file, 
>> >> const char __user *buf,
>> >> if (length)
>> >> goto out;
>> >> audit_log(current->audit_context, GFP_KERNEL, 
>> >> AUDIT_MAC_STATUS,
>> >> -   "enforcing=%d old_enforcing=%d auid=%u ses=%u",
>> >> +   "enforcing=%d old_enforcing=%d auid=%u ses=%u"
>> >> +   " enabled=%d old-enabled=%d lsm=selinux res=1",
>> >> new_value, selinux_enforcing,
>> >> from_kuid(&init_user_ns, 
>> >> audit_get_loginuid(current)),
>> >> -   audit_get_sessionid(current));
>> >> +   audit_get_sessionid(current), selinux_enabled, 
>> >> selinux_enabled);
>> >
>> > This looks fine.
>> >
>> >> selinux_enforcing = new_value;
>> >> if (selinux_enforcing)
>> >> avc_ss_reset(0);
>> >> @@ -272,9 +273,11 @@ static ssize_t sel_write_disable(struct file *file, 
>> >> const char __user *buf,
>> >> if (length)
>> >> goto out;
>> >> audit_log(current->audit_context, GFP_KERNEL, 
>> >> AUDIT_MAC_STATUS,
>> >> -   "selinux=0 auid=%u ses=%u",
>> >> +   "enforcing=%d old_enforcing=%d auid=%u ses=%u"
>> >> +   " enabled=%d old-enabled=%d lsm=selinux res=1",
>> >> +   selinux_enforcing, selinux_enforcing,
>> >> from_kuid(&init_user_ns, 
>> >> audit_get_loginuid(current)),
>> >> -   audit_get_sessionid(current));
>> >> +   audit_get_sessionid(current), 0, 1);
>> >
>> > It needs to be said again that I'm opposed to changes like this:
>> > inserting new fields, removing fields, or otherwise changing the
>> > format in ways that aren't strictly the addition of new fields to the
>> > end of a record is a Bad Thing.  However, there are exceptions (there
>> > are *always* exceptions), and this seems like a reasonable change that
>> > shouldn't negatively affect anyone.
>> >
>> > I'll merge this once the merge window comes to a close (we are going
>> > to need to base selinux/next on v4.17-rc1).
>>
>> Merged into selinux/next, although I should mention that there were
>> some actual code changes because of the SELinux state consolidation
>> patches that went into v4.17.  The changes were small but please take
>> a look and make sure everything still looks okay to you.
>
> Ok, that was a bit disruptive, but looks ok to me.

Yes, it was a pretty big change, but it sets the stage for a few
things we are trying to do with SELinux.

Regardless, thanks for giving the merge a quick look.

-- 
paul moore
www.paul-moore.com


Re: [PATCH 2/3] mm: add find_alloc_contig_pages() interface

2018-04-17 Thread Mike Kravetz
On 04/17/2018 05:10 AM, kbuild test robot wrote:
> All errors (new ones prefixed by >>):
> 
>In file included from include/linux/slab.h:15:0,
> from include/linux/crypto.h:24,
> from arch/x86/kernel/asm-offsets.c:9:
>>> include/linux/gfp.h:580:15: error: unknown type name 'page'
> static inline page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
>   ^~~~
>include/linux/gfp.h:585:13: warning: 'free_contig_pages' defined but not 
> used [-Wunused-function]
> static void free_contig_pages(struct page *page, unsigned long nr_pages)
> ^

Build issues fixed in updated patch below,

>From 28efabb5625c079573a821c8c5cfc19cc73a86bd Mon Sep 17 00:00:00 2001
From: Mike Kravetz 
Date: Mon, 16 Apr 2018 18:41:36 -0700
Subject: [PATCH 2/3] mm: add find_alloc_contig_pages() interface

find_alloc_contig_pages() is a new interface that attempts to locate
and allocate a contiguous range of pages.  It is provided as a more
convenient interface than alloc_contig_range(), which is currently
used by CMA and gigantic huge pages.

When attempting to allocate a range of pages, migration is employed
if possible.  There is no guarantee that the routine will succeed.
So, the user must be prepared for failure and have a fallback plan.
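
For illustration only, a caller could use the interface roughly like
this (the error handling, gfp flags and order/nid values are just an
example, not part of this patch):

	struct page *page;

	page = find_alloc_contig_pages(order, GFP_KERNEL | __GFP_THISNODE,
				       nid, NULL);
	if (!page)
		return -ENOMEM;		/* no contiguous range found, fall back */

	/* ... use the 1 << order contiguous pages ... */

	free_contig_pages(page, 1UL << order);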

Signed-off-by: Mike Kravetz 
---
 include/linux/gfp.h | 12 
 mm/page_alloc.c | 89 +++--
 2 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index 86a0d06463ab..7d1ea4e659dc 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -573,6 +573,18 @@ static inline bool pm_suspended_storage(void)
 extern int alloc_contig_range(unsigned long start, unsigned long end,
  unsigned migratetype, gfp_t gfp_mask);
 extern void free_contig_range(unsigned long pfn, unsigned long nr_pages);
+extern struct page *find_alloc_contig_pages(unsigned int order, gfp_t gfp,
+   int nid, nodemask_t *nodemask);
+extern void free_contig_pages(struct page *page, unsigned long nr_pages);
+#else
+static inline struct page *find_alloc_contig_pages(unsigned int order,
+   gfp_t gfp, int nid, nodemask_t *nodemask)
+{
+   return NULL;
+}
+static inline void free_contig_pages(struct page *page, unsigned long nr_pages)
+{
+}
 #endif
 
 #ifdef CONFIG_CMA
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0fd5e8e2456e..81070fe55c44 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -67,6 +67,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -2010,9 +2011,13 @@ static __always_inline struct page 
*__rmqueue_cma_fallback(struct zone *zone,
 {
return __rmqueue_smallest(zone, order, MIGRATE_CMA);
 }
+#define contig_alloc_migratetype_ok(migratetype) \
+   ((migratetype) == MIGRATE_CMA || (migratetype) == MIGRATE_MOVABLE)
 #else
 static inline struct page *__rmqueue_cma_fallback(struct zone *zone,
unsigned int order) { return NULL; }
+#define contig_alloc_migratetype_ok(migratetype) \
+   ((migratetype) == MIGRATE_MOVABLE)
 #endif
 
 /*
@@ -7822,6 +7827,9 @@ int alloc_contig_range(unsigned long start, unsigned long 
end,
};
INIT_LIST_HEAD(&cc.migratepages);
 
+   if (!contig_alloc_migratetype_ok(migratetype))
+   return -EINVAL;
+
/*
 * What we do here is we mark all pageblocks in range as
 * MIGRATE_ISOLATE.  Because pageblock and max order pages may
@@ -7912,8 +7920,9 @@ int alloc_contig_range(unsigned long start, unsigned long 
end,
 
/* Make sure the range is really isolated. */
if (test_pages_isolated(outer_start, end, false)) {
-   pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
-   __func__, outer_start, end);
+   if (!(migratetype == MIGRATE_MOVABLE)) /* only print for CMA */
+   pr_info_ratelimited("%s: [%lx, %lx) PFNs busy\n",
+   __func__, outer_start, end);
ret = -EBUSY;
goto done;
}
@@ -7949,6 +7958,82 @@ void free_contig_range(unsigned long pfn, unsigned long 
nr_pages)
}
WARN(count != 0, "%ld pages are still in use!\n", count);
 }
+
+static bool contig_pfn_range_valid(struct zone *z, unsigned long start_pfn,
+   unsigned long nr_pages)
+{
+   unsigned long i, end_pfn = start_pfn + nr_pages;
+   struct page *page;
+
+   for (i = start_pfn; i < end_pfn; i++) {
+   if (!pfn_valid(i))
+   return false;
+
+   page = pfn_to_page(i);
+
+   if (page_zone(page) != z)
+   return false;
+
+   }
+
+   return true;
+}
+
+/**
+ * find_alloc_contig_pages() -- attempt to find and alloca

Re: [PATCH v2] infiniband: mlx5: fix build errors when INFINIBAND_USER_ACCESS=m

2018-04-17 Thread Jason Gunthorpe
On Mon, Apr 16, 2018 at 06:51:50PM -0700, Randy Dunlap wrote:
> From: Randy Dunlap 
> 
> Fix build errors when INFINIBAND_USER_ACCESS=m and MLX5_INFINIBAND=y.
> The build error occurs when the mlx5 driver code attempts to use
> USER_ACCESS interfaces, which are built as a loadable module.
> 
> Fixes these build errors:
> 
> drivers/infiniband/hw/mlx5/main.o: In function `populate_specs_root':
> ../drivers/infiniband/hw/mlx5/main.c:4982: undefined reference to 
> `uverbs_default_get_objects'
> ../drivers/infiniband/hw/mlx5/main.c:4994: undefined reference to 
> `uverbs_alloc_spec_tree'
> drivers/infiniband/hw/mlx5/main.o: In function `depopulate_specs_root':
> ../drivers/infiniband/hw/mlx5/main.c:5001: undefined reference to 
> `uverbs_free_spec_tree'
> 
> Build-tested with multiple config combinations.
> 
> Reported-by: kbuild test robot 
> Signed-off-by: Randy Dunlap 
> Cc: Matan Barak 
> Cc: Jason Gunthorpe 
> Cc: Leon Romanovsky 
> Cc: Doug Ledford 
> Cc: linux-r...@vger.kernel.org
> Cc: sta...@vger.kernel.org # reported against 4.16
> ---
> v2: allow building mxl5 even when INFINIBAND_USER_ACCESS is disabled.
> 
>  drivers/infiniband/hw/mlx5/Kconfig |1 +
>  1 file changed, 1 insertion(+)

I added a

Fixes: 8c84660bb437 ("IB/mlx5: Initialize the parsing tree root without the 
help of uverbs")

Applied to for-rc thanks

Jason


Re: [PATCH v5 4/4] zram: introduce zram memory tracking

2018-04-17 Thread Minchan Kim
Hi Andrew,

On Tue, Apr 17, 2018 at 02:59:21PM -0700, Andrew Morton wrote:
> On Mon, 16 Apr 2018 18:09:46 +0900 Minchan Kim  wrote:
> 
> > zRam as swap is useful for small memory device. However, swap means
> > those pages on zram are mostly cold pages due to VM's LRU algorithm.
> > Especially, once init data for application are touched for launching,
> > they tend to be not accessed any more and finally swapped out.
> > zRAM can store such cold pages as compressed form but it's pointless
> > to keep in memory. Better idea is app developers free them directly
> > rather than remaining them on heap.
> > 
> > This patch tell us last access time of each block of zram via
> > "cat /sys/kernel/debug/zram/zram0/block_state".
> > 
> > The output is as follows,
> >   30075.033841 .wh
> >   300  75.033841 .wh
> >   301  63.806904 s..
> >   302  63.806919 ..h
> > First column is zram's block index and the 3rd one represents the symbols
> > (s: same page w: written page to backing store h: huge page) of the
> > block state. Second column is the time (with usec resolution) at which the
> > block was last accessed. So the above example means the 300th block was
> > accessed at 75.033841 seconds, and since it was huge it was written to the
> > backing store.
> > 
> > Admin can leverage this information to catch cold or incompressible pages
> > of a process with *pagemap* once part of the heap is swapped out.
> 
> A few things..
> 
> - Terms like "Admin can" and "Admin could" are worrisome.  How do we
>   know that admins *will* use this?  How do we know that we aren't
>   adding a bunch of stuff which nobody will find to be (sufficiently)
>   useful?  For example, is there some userspace tool to which you are
>   contributing which will be updated to use this feature?

Actually, I used this feature two years ago to find memory hoggers,
although the feature was a very quick prototype. It was very useful
for reducing memory cost in the embedded space.

The reason I am trying to upstream the feature is that I need it
again. :)

Yub, I have a userspace tool to use the feature although it was
not compatible with this new version. It should be updated for the
new format. I will find a time to submit the tool.
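
For reference, a minimal sketch of how such a tool could parse the
block_state output (the debugfs path is from this patch; the reporting
is just an example):

    #include <stdio.h>

    int main(void)
    {
            unsigned long index;
            double access_time;
            char state[4];
            FILE *f = fopen("/sys/kernel/debug/zram/zram0/block_state", "r");

            if (!f)
                    return 1;

            /* each line is "<index> <time> <s|.><w|.><h|.>" */
            while (fscanf(f, "%lu %lf %3s", &index, &access_time, state) == 3) {
                    if (state[1] == 'w')    /* written to backing store */
                            printf("block %lu written back, last access %f\n",
                                   index, access_time);
            }
            fclose(f);
            return 0;
    }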

> 
> - block_state's second column is in microseconds since some
>   undocumented time.  But how is userspace to know how much time has
>   elapsed since the access?  ie, "current time".

It's a sched_clock so it should be elapsed time since the system boot.
I should have written it explicitly.
I will fix it.

> 
> - Is the sched_clock() return value suitable for exporting to
>   userspace?  Is it monotonic?  Is it consistent across CPUs, across
>   CPU hotadd/remove, across suspend/resume, etc?  Does it run all the
>   way up to 2^64 on all CPU types, or will some processors wrap it at
>   (say) 32 bits?  etcetera.  Documentation/timers/timekeeping.txt
>   points out that suspend/resume can mess it up and that the counter
>   can drift between cpus.

Good point!

I just referenced it from ftrace because I thought the goal is similar:
"no need to be exact unless the drift is frequent but wanted to be fast"

AFAIK, ftrace/printk is an active user of the function, so if the problem
happens frequently, it might be serious. :)


Re: [RFC v2] virtio: support packed ring

2018-04-17 Thread Tiwei Bie
On Tue, Apr 17, 2018 at 06:54:51PM +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 17, 2018 at 10:56:26PM +0800, Tiwei Bie wrote:
> > On Tue, Apr 17, 2018 at 05:04:59PM +0300, Michael S. Tsirkin wrote:
> > > On Tue, Apr 17, 2018 at 08:47:16PM +0800, Tiwei Bie wrote:
> > > > On Tue, Apr 17, 2018 at 03:17:41PM +0300, Michael S. Tsirkin wrote:
> > > > > On Tue, Apr 17, 2018 at 10:51:33AM +0800, Tiwei Bie wrote:
> > > > > > On Tue, Apr 17, 2018 at 10:11:58AM +0800, Jason Wang wrote:
> > > > > > > On 2018年04月13日 15:15, Tiwei Bie wrote:
> > > > > > > > On Fri, Apr 13, 2018 at 12:30:24PM +0800, Jason Wang wrote:
> > > > > > > > > On 2018年04月01日 22:12, Tiwei Bie wrote:
> > > > > > [...]
> > > > > > > > > > +static int detach_buf_packed(struct vring_virtqueue *vq, 
> > > > > > > > > > unsigned int head,
> > > > > > > > > > + void **ctx)
> > > > > > > > > > +{
> > > > > > > > > > +   struct vring_packed_desc *desc;
> > > > > > > > > > +   unsigned int i, j;
> > > > > > > > > > +
> > > > > > > > > > +   /* Clear data ptr. */
> > > > > > > > > > +   vq->desc_state[head].data = NULL;
> > > > > > > > > > +
> > > > > > > > > > +   i = head;
> > > > > > > > > > +
> > > > > > > > > > +   for (j = 0; j < vq->desc_state[head].num; j++) {
> > > > > > > > > > +   desc = &vq->vring_packed.desc[i];
> > > > > > > > > > +   vring_unmap_one_packed(vq, desc);
> > > > > > > > > > +   desc->flags = 0x0;
> > > > > > > > > Looks like this is unnecessary.
> > > > > > > > It's safer to zero it. If we don't zero it, after we
> > > > > > > > call virtqueue_detach_unused_buf_packed() which calls
> > > > > > > > this function, the desc is still available to the
> > > > > > > > device.
> > > > > > > 
> > > > > > > Well detach_unused_buf_packed() should be called after device is 
> > > > > > > stopped,
> > > > > > > otherwise even if you try to clear, there will still be a window 
> > > > > > > that device
> > > > > > > may use it.
> > > > > > 
> > > > > > This is not about whether the device has been stopped or
> > > > > > not. We don't have other places to re-initialize the ring
> > > > > > descriptors and wrap_counter. So they need to be set to
> > > > > > the correct values when doing detach_unused_buf.
> > > > > > 
> > > > > > Best regards,
> > > > > > Tiwei Bie
> > > > > 
> > > > > find vqs is the time to do it.
> > > > 
> > > > The .find_vqs() will call .setup_vq() which will eventually
> > > > call vring_create_virtqueue(). It's a different case. Here
> > > > we're talking about re-initializing the descs and updating
> > > > the wrap counter when detaching the unused descs (In this
> > > > case, split ring just needs to decrease vring.avail->idx).
> > > > 
> > > > Best regards,
> > > > Tiwei Bie
> > > 
> > > There's no requirement that  virtqueue_detach_unused_buf re-initializes
> > > the descs. It happens on cleanup path just before drivers delete the
> > > vqs.
> > 
> > Cool, I wasn't aware of it. I saw split ring decrease
> > vring.avail->idx after detaching an unused desc, so I
> > thought detaching unused desc also needs to make sure
> > that the ring state will be updated correspondingly.
> 
> 
> Hmm. You are right. Seems to be our console driver being out of spec.
> Will have to look at how to fix that :(
> 
> It was done here:
> 
> Commit b3258ff1d6086bd2b9eeb556844a868ad7d49bc8
> Author: Amit Shah 
> Date:   Wed Mar 16 19:12:10 2011 +0530
> 
> virtio: Decrement avail idx on buffer detach
> 
> When detaching a buffer from a vq, the avail.idx value should be
> decremented as well.
> 
> This was noticed by hot-unplugging a virtio console port and then
> plugging in a new one on the same number (re-using the vqs which were
> just 'disowned').  qemu reported
> 
>'Guest moved used index from 0 to 256'
> 
> when any IO was attempted on the new port.
> 
> CC: sta...@kernel.org
> Reported-by: juzhang 
> Signed-off-by: Amit Shah 
> Signed-off-by: Rusty Russell 
> 
> The spec is quite explicit though:
>   A driver MUST NOT decrement the available idx on a live virtqueue (ie. 
> there is no way to “unexpose”
>   buffers).
> 

Hmm.. Got it. Thanks!

Best regards,
Tiwei Bie


> 
> 
> 
> 
> > If there is no such requirement, do you think it's OK
> > to remove below two lines:
> > 
> > vq->avail_idx_shadow--;
> > vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
> > 
> > from virtqueue_detach_unused_buf(), and we could have
> > one generic function to handle both rings:
> > 
> > void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
> > {
> > struct vring_virtqueue *vq = to_vvq(_vq);
> > unsigned int num, i;
> > void *buf;
> > 
> > START_USE(vq);
> > 
> > num = vq->packed ? vq->vring_packed.num : vq->vring.num;
> > 
> > for (i = 0; i < num; i++) {
> > if (!vq->desc_state[i].data)
> > continue;
> > /* detach_buf clears data, so 

[PATCH] spi: cadence: Add usleep_range() for cdns_spi_fill_tx_fifo()

2018-04-17 Thread sxauwsk
If the xspi controller is busy, sending bytes may fail; once
something goes wrong, the SPI controller no longer works at all.

My tests found this situation in both the read and write paths,
so when the TX FIFO is full, add a one-byte delay before sending data.

Signed-off-by: sxauwsk 
---
 drivers/spi/spi-cadence.c |8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/spi/spi-cadence.c b/drivers/spi/spi-cadence.c
index 5c9516a..4a00163 100644
--- a/drivers/spi/spi-cadence.c
+++ b/drivers/spi/spi-cadence.c
@@ -313,6 +313,14 @@ static void cdns_spi_fill_tx_fifo(struct cdns_spi *xspi)

while ((trans_cnt < CDNS_SPI_FIFO_DEPTH) &&
   (xspi->tx_bytes > 0)) {
+
+   /* When the xspi controller is busy, sending bytes may fail and
+* the controller may stop working, so add a one-byte delay.
+*/
+   if (cdns_spi_read(xspi, CDNS_SPI_ISR) &
+   CDNS_SPI_IXR_TXFULL)
+   usleep_range(10, 20);
+
if (xspi->txbuf)
cdns_spi_write(xspi, CDNS_SPI_TXD, *xspi->txbuf++);
else
--
1.7.9.5



[PATCH v2 7/7] ocxl: Document new OCXL IOCTLs

2018-04-17 Thread Alastair D'Silva
From: Alastair D'Silva 

Signed-off-by: Alastair D'Silva 
---
 Documentation/accelerators/ocxl.rst | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/Documentation/accelerators/ocxl.rst 
b/Documentation/accelerators/ocxl.rst
index 7904adcc07fd..3b8d3b99795c 100644
--- a/Documentation/accelerators/ocxl.rst
+++ b/Documentation/accelerators/ocxl.rst
@@ -157,6 +157,17 @@ OCXL_IOCTL_GET_METADATA:
  Obtains configuration information from the card, such as the size of
   MMIO areas, the AFU version, and the PASID for the current context.
 
+OCXL_IOCTL_ENABLE_P9_WAIT:
+
+  Allows the AFU to wake a userspace thread executing 'wait'. Returns
+  information to userspace to allow it to configure the AFU. Note that
+  this is only available on Power 9.
+
+OCXL_IOCTL_GET_FEATURES:
+
+  Reports which of the CPU features that affect OpenCAPI are usable from
+  userspace.
+
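+  A rough userspace sketch tying the two new IOCTLs together (assuming
+  afu_fd is an open descriptor on the AFU device)::
+
+    struct ocxl_ioctl_features features = { { 0 } };
+    struct ocxl_ioctl_p9_wait wait_args = { 0 };
+
+    if (ioctl(afu_fd, OCXL_IOCTL_GET_FEATURES, &features))
+            err(1, "OCXL_IOCTL_GET_FEATURES");
+
+    if (features.flags[0] & OCXL_IOCTL_FEATURES_FLAGS0_P9_WAIT) {
+            if (ioctl(afu_fd, OCXL_IOCTL_ENABLE_P9_WAIT, &wait_args))
+                    err(1, "OCXL_IOCTL_ENABLE_P9_WAIT");
+            /*
+             * Hand wait_args.thread_id to the AFU so it can target
+             * as_notify at this thread, which can then sleep with the
+             * 'wait' instruction until notified.
+             */
+    }
+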
 mmap
 
 
-- 
2.14.3



[PATCH v2 2/7] powerpc: Use TIDR CPU feature to control TIDR allocation

2018-04-17 Thread Alastair D'Silva
From: Alastair D'Silva 

Switch the use of TIDR to be based on its CPU feature, rather than
assuming it is available based on the architecture.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/kernel/process.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 1237f13fed51..3b00da47699b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1154,7 +1154,7 @@ static inline void restore_sprs(struct thread_struct 
*old_thread,
mtspr(SPRN_TAR, new_thread->tar);
}
 
-   if (cpu_has_feature(CPU_FTR_ARCH_300) &&
+   if (cpu_has_feature(CPU_FTR_P9_TIDR) &&
old_thread->tidr != new_thread->tidr)
mtspr(SPRN_TIDR, new_thread->tidr);
 #endif
@@ -1570,7 +1570,7 @@ void clear_thread_tidr(struct task_struct *t)
if (!t->thread.tidr)
return;
 
-   if (!cpu_has_feature(CPU_FTR_ARCH_300)) {
+   if (!cpu_has_feature(CPU_FTR_P9_TIDR)) {
WARN_ON_ONCE(1);
return;
}
@@ -1593,7 +1593,7 @@ int set_thread_tidr(struct task_struct *t)
 {
int rc;
 
-   if (!cpu_has_feature(CPU_FTR_ARCH_300))
+   if (!cpu_has_feature(CPU_FTR_P9_TIDR))
return -EINVAL;
 
if (t != current)
-- 
2.14.3



[PATCH v2 5/7] ocxl: Expose the thread_id needed for wait on p9

2018-04-17 Thread Alastair D'Silva
From: Alastair D'Silva 

In order to successfully issue as_notify, an AFU needs to know the TID
to notify, which in turn means that this information should be
available in userspace so it can be communicated to the AFU.

Signed-off-by: Alastair D'Silva 
---
 drivers/misc/ocxl/context.c   |  5 +++-
 drivers/misc/ocxl/file.c  | 53 +++
 drivers/misc/ocxl/link.c  | 36 ++
 drivers/misc/ocxl/ocxl_internal.h |  1 +
 include/misc/ocxl.h   |  9 +++
 include/uapi/misc/ocxl.h  | 10 
 6 files changed, 113 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/ocxl/context.c b/drivers/misc/ocxl/context.c
index 909e8807824a..95f74623113e 100644
--- a/drivers/misc/ocxl/context.c
+++ b/drivers/misc/ocxl/context.c
@@ -34,6 +34,8 @@ int ocxl_context_init(struct ocxl_context *ctx, struct 
ocxl_afu *afu,
mutex_init(&ctx->xsl_error_lock);
mutex_init(&ctx->irq_lock);
idr_init(&ctx->irq_idr);
+   ctx->tidr = 0;
+
/*
 * Keep a reference on the AFU to make sure it's valid for the
 * duration of the life of the context
@@ -65,6 +67,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
 {
int rc;
 
+   // Locks both status & tidr
mutex_lock(&ctx->status_mutex);
if (ctx->status != OPENED) {
rc = -EIO;
@@ -72,7 +75,7 @@ int ocxl_context_attach(struct ocxl_context *ctx, u64 amr)
}
 
rc = ocxl_link_add_pe(ctx->afu->fn->link, ctx->pasid,
-   current->mm->context.id, 0, amr, current->mm,
+   current->mm->context.id, ctx->tidr, amr, current->mm,
xsl_fault_error, ctx);
if (rc)
goto out;
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index 038509e5d031..eb409a469f21 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -5,6 +5,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include "ocxl_internal.h"
 
 
@@ -123,11 +125,55 @@ static long afu_ioctl_get_metadata(struct ocxl_context 
*ctx,
return 0;
 }
 
+#ifdef CONFIG_PPC64
+static long afu_ioctl_enable_p9_wait(struct ocxl_context *ctx,
+   struct ocxl_ioctl_p9_wait __user *uarg)
+{
+   struct ocxl_ioctl_p9_wait arg;
+
+   memset(&arg, 0, sizeof(arg));
+
+   if (cpu_has_feature(CPU_FTR_P9_TIDR)) {
+   enum ocxl_context_status status;
+
+   // Locks both status & tidr
+   mutex_lock(&ctx->status_mutex);
+   if (!ctx->tidr) {
+   if (set_thread_tidr(current))
+   return -ENOENT;
+
+   ctx->tidr = current->thread.tidr;
+   }
+
+   status = ctx->status;
+   mutex_unlock(&ctx->status_mutex);
+
+   if (status == ATTACHED) {
+   int rc;
+   struct link *link = ctx->afu->fn->link;
+
+   rc = ocxl_link_update_pe(link, ctx->pasid, ctx->tidr);
+   if (rc)
+   return rc;
+   }
+
+   arg.thread_id = ctx->tidr;
+   } else
+   return -ENOENT;
+
+   if (copy_to_user(uarg, &arg, sizeof(arg)))
+   return -EFAULT;
+
+   return 0;
+}
+#endif
+
 #define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" :
\
x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" :   \
x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" : \
x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" : \
x == OCXL_IOCTL_GET_METADATA ? "GET_METADATA" : \
+   x == OCXL_IOCTL_ENABLE_P9_WAIT ? "ENABLE_P9_WAIT" : 
\
"UNKNOWN")
 
 static long afu_ioctl(struct file *file, unsigned int cmd,
@@ -186,6 +232,13 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
(struct ocxl_ioctl_metadata __user *) args);
break;
 
+#ifdef CONFIG_PPC64
+   case OCXL_IOCTL_ENABLE_P9_WAIT:
+   rc = afu_ioctl_enable_p9_wait(ctx,
+   (struct ocxl_ioctl_p9_wait __user *) args);
+   break;
+#endif
+
default:
rc = -EINVAL;
}
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index 656e8610eec2..88876ae8f330 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -544,6 +544,42 @@ int ocxl_link_add_pe(void *link_handle, int pasid, u32 
pidr, u32 tidr,
 }
 EXPORT_SYMBOL_GPL(ocxl_link_add_pe);
 
+int ocxl_link_update_pe(void *link_handle, int pasid, __u16 tid)
+{
+   struct link *link = (struct link *) link_handle;
+   struct spa *spa = link->spa;
+   struct ocxl_process_element *pe;
+   int pe_handle, rc;
+
+   if (pasid > SPA

[PATCH v2 6/7] ocxl: Add an IOCTL so userspace knows what CPU features are available

2018-04-17 Thread Alastair D'Silva
From: Alastair D'Silva 

In order for a userspace AFU driver to call the Power9-specific
OCXL_IOCTL_ENABLE_P9_WAIT, it needs to verify that it can actually
make that call.

Signed-off-by: Alastair D'Silva 
---
 Documentation/accelerators/ocxl.rst |  1 -
 drivers/misc/ocxl/file.c| 25 +
 include/uapi/misc/ocxl.h|  4 
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/Documentation/accelerators/ocxl.rst 
b/Documentation/accelerators/ocxl.rst
index ddcc58d01cfb..7904adcc07fd 100644
--- a/Documentation/accelerators/ocxl.rst
+++ b/Documentation/accelerators/ocxl.rst
@@ -157,7 +157,6 @@ OCXL_IOCTL_GET_METADATA:
  Obtains configuration information from the card, such as the size of
   MMIO areas, the AFU version, and the PASID for the current context.
 
-
 mmap
 
 
diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
index eb409a469f21..33ae46ce0a8a 100644
--- a/drivers/misc/ocxl/file.c
+++ b/drivers/misc/ocxl/file.c
@@ -168,12 +168,32 @@ static long afu_ioctl_enable_p9_wait(struct ocxl_context 
*ctx,
 }
 #endif
 
+
+static long afu_ioctl_get_features(struct ocxl_context *ctx,
+   struct ocxl_ioctl_features __user *uarg)
+{
+   struct ocxl_ioctl_features arg;
+
+   memset(&arg, 0, sizeof(arg));
+
+#ifdef CONFIG_PPC64
+   if (cpu_has_feature(CPU_FTR_P9_TIDR))
+   arg.flags[0] |= OCXL_IOCTL_FEATURES_FLAGS0_P9_WAIT;
+#endif
+
+   if (copy_to_user(uarg, &arg, sizeof(arg)))
+   return -EFAULT;
+
+   return 0;
+}
+
 #define CMD_STR(x) (x == OCXL_IOCTL_ATTACH ? "ATTACH" :
\
x == OCXL_IOCTL_IRQ_ALLOC ? "IRQ_ALLOC" :   \
x == OCXL_IOCTL_IRQ_FREE ? "IRQ_FREE" : \
x == OCXL_IOCTL_IRQ_SET_FD ? "IRQ_SET_FD" : \
x == OCXL_IOCTL_GET_METADATA ? "GET_METADATA" : \
x == OCXL_IOCTL_ENABLE_P9_WAIT ? "ENABLE_P9_WAIT" : 
\
+   x == OCXL_IOCTL_GET_FEATURES ? "GET_FEATURES" : \
"UNKNOWN")
 
 static long afu_ioctl(struct file *file, unsigned int cmd,
@@ -239,6 +259,11 @@ static long afu_ioctl(struct file *file, unsigned int cmd,
break;
 #endif
 
+   case OCXL_IOCTL_GET_FEATURES:
+   rc = afu_ioctl_get_features(ctx,
+   (struct ocxl_ioctl_features __user *) args);
+   break;
+
default:
rc = -EINVAL;
}
diff --git a/include/uapi/misc/ocxl.h b/include/uapi/misc/ocxl.h
index 8d2748e69c84..bb80f294b429 100644
--- a/include/uapi/misc/ocxl.h
+++ b/include/uapi/misc/ocxl.h
@@ -55,6 +55,9 @@ struct ocxl_ioctl_p9_wait {
__u64 reserved3[3];
 };
 
+#define OCXL_IOCTL_FEATURES_FLAGS0_P9_WAIT 0x01
+struct ocxl_ioctl_features {
+   __u64 flags[4];
 };
 
 struct ocxl_ioctl_irq_fd {
@@ -72,5 +75,6 @@ struct ocxl_ioctl_irq_fd {
 #define OCXL_IOCTL_IRQ_SET_FD  _IOW(OCXL_MAGIC, 0x13, struct ocxl_ioctl_irq_fd)
 #define OCXL_IOCTL_GET_METADATA _IOR(OCXL_MAGIC, 0x14, struct 
ocxl_ioctl_metadata)
 #define OCXL_IOCTL_ENABLE_P9_WAIT  _IOR(OCXL_MAGIC, 0x15, struct 
ocxl_ioctl_p9_wait)
+#define OCXL_IOCTL_GET_FEATURES _IOR(OCXL_MAGIC, 0x16, struct 
ocxl_ioctl_features)
 
 #endif /* _UAPI_MISC_OCXL_H */
-- 
2.14.3



[PATCH v2 0/7] ocxl: Implement Power9 as_notify/wait for OpenCAPI

2018-04-17 Thread Alastair D'Silva
From: Alastair D'Silva 

The Power 9 as_notify/wait feature provides a lower latency way to
signal a thread that work is complete. This series enables the use of
this feature from OpenCAPI adapters, as well as addressing a potential
starvation issue when allocating thread IDs.

Changelog:
v2:
  Rename get_platform IOCTL to get_features
  Move stray edit from patch 1 to patch 3

Alastair D'Silva (7):
  powerpc: Add TIDR CPU feature for Power9
  powerpc: Use TIDR CPU feature to control TIDR allocation
  powerpc: use task_pid_nr() for TID allocation
  ocxl: Rename pnv_ocxl_spa_remove_pe to clarify its action
  ocxl: Expose the thread_id needed for wait on p9
  ocxl: Add an IOCTL so userspace knows what CPU features are available
  ocxl: Document new OCXL IOCTLs

 Documentation/accelerators/ocxl.rst   |  10 
 arch/powerpc/include/asm/cputable.h   |   3 +-
 arch/powerpc/include/asm/pnv-ocxl.h   |   2 +-
 arch/powerpc/include/asm/switch_to.h  |   1 -
 arch/powerpc/kernel/dt_cpu_ftrs.c |   1 +
 arch/powerpc/kernel/process.c | 101 +-
 arch/powerpc/platforms/powernv/ocxl.c |   4 +-
 drivers/misc/ocxl/context.c   |   5 +-
 drivers/misc/ocxl/file.c  |  78 ++
 drivers/misc/ocxl/link.c  |  38 -
 drivers/misc/ocxl/ocxl_internal.h |   1 +
 include/misc/ocxl.h   |   9 +++
 include/uapi/misc/ocxl.h  |  14 +
 13 files changed, 162 insertions(+), 105 deletions(-)

-- 
2.14.3



[PATCH v2 4/7] ocxl: Rename pnv_ocxl_spa_remove_pe to clarify its action

2018-04-17 Thread Alastair D'Silva
From: Alastair D'Silva 

The function removes the process element from the NPU cache.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/pnv-ocxl.h   | 2 +-
 arch/powerpc/platforms/powernv/ocxl.c | 4 ++--
 drivers/misc/ocxl/link.c  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/pnv-ocxl.h 
b/arch/powerpc/include/asm/pnv-ocxl.h
index f6945d3bc971..208b5503f4ed 100644
--- a/arch/powerpc/include/asm/pnv-ocxl.h
+++ b/arch/powerpc/include/asm/pnv-ocxl.h
@@ -28,7 +28,7 @@ extern int pnv_ocxl_map_xsl_regs(struct pci_dev *dev, void 
__iomem **dsisr,
 extern int pnv_ocxl_spa_setup(struct pci_dev *dev, void *spa_mem, int PE_mask,
void **platform_data);
 extern void pnv_ocxl_spa_release(void *platform_data);
-extern int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle);
+extern int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int 
pe_handle);
 
 extern int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr);
 extern void pnv_ocxl_free_xive_irq(u32 irq);
diff --git a/arch/powerpc/platforms/powernv/ocxl.c 
b/arch/powerpc/platforms/powernv/ocxl.c
index fa9b53af3c7b..8c65aacda9c8 100644
--- a/arch/powerpc/platforms/powernv/ocxl.c
+++ b/arch/powerpc/platforms/powernv/ocxl.c
@@ -475,7 +475,7 @@ void pnv_ocxl_spa_release(void *platform_data)
 }
 EXPORT_SYMBOL_GPL(pnv_ocxl_spa_release);
 
-int pnv_ocxl_spa_remove_pe(void *platform_data, int pe_handle)
+int pnv_ocxl_spa_remove_pe_from_cache(void *platform_data, int pe_handle)
 {
struct spa_data *data = (struct spa_data *) platform_data;
int rc;
@@ -483,7 +483,7 @@ int pnv_ocxl_spa_remove_pe(void *platform_data, int 
pe_handle)
rc = opal_npu_spa_clear_cache(data->phb_opal_id, data->bdfn, pe_handle);
return rc;
 }
-EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe);
+EXPORT_SYMBOL_GPL(pnv_ocxl_spa_remove_pe_from_cache);
 
 int pnv_ocxl_alloc_xive_irq(u32 *irq, u64 *trigger_addr)
 {
diff --git a/drivers/misc/ocxl/link.c b/drivers/misc/ocxl/link.c
index f30790582dc0..656e8610eec2 100644
--- a/drivers/misc/ocxl/link.c
+++ b/drivers/misc/ocxl/link.c
@@ -599,7 +599,7 @@ int ocxl_link_remove_pe(void *link_handle, int pasid)
 * On powerpc, the entry needs to be cleared from the context
 * cache of the NPU.
 */
-   rc = pnv_ocxl_spa_remove_pe(link->platform_data, pe_handle);
+   rc = pnv_ocxl_spa_remove_pe_from_cache(link->platform_data, pe_handle);
WARN_ON(rc);
 
pe_data = radix_tree_delete(&spa->pe_tree, pe_handle);
-- 
2.14.3



[PATCH v2 3/7] powerpc: use task_pid_nr() for TID allocation

2018-04-17 Thread Alastair D'Silva
From: Alastair D'Silva 

The current implementation of TID allocation, using a global IDR, may
result in an errant process starving the system of available TIDs.
Instead, use task_pid_nr(), as mentioned by the original author. The
scenario described which prevented its use is not applicable, as
set_thread_tidr can only be called after the task struct has been
populated.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/switch_to.h |  1 -
 arch/powerpc/kernel/process.c| 97 +---
 2 files changed, 1 insertion(+), 97 deletions(-)

diff --git a/arch/powerpc/include/asm/switch_to.h 
b/arch/powerpc/include/asm/switch_to.h
index be8c9fa23983..5b03d8a82409 100644
--- a/arch/powerpc/include/asm/switch_to.h
+++ b/arch/powerpc/include/asm/switch_to.h
@@ -94,6 +94,5 @@ static inline void clear_task_ebb(struct task_struct *t)
 extern int set_thread_uses_vas(void);
 
 extern int set_thread_tidr(struct task_struct *t);
-extern void clear_thread_tidr(struct task_struct *t);
 
 #endif /* _ASM_POWERPC_SWITCH_TO_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 3b00da47699b..87f047fd2762 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1496,103 +1496,12 @@ int set_thread_uses_vas(void)
 }
 
 #ifdef CONFIG_PPC64
-static DEFINE_SPINLOCK(vas_thread_id_lock);
-static DEFINE_IDA(vas_thread_ida);
-
-/*
- * We need to assign a unique thread id to each thread in a process.
- *
- * This thread id, referred to as TIDR, and separate from the Linux's tgid,
- * is intended to be used to direct an ASB_Notify from the hardware to the
- * thread, when a suitable event occurs in the system.
- *
- * One such event is a "paste" instruction in the context of Fast Thread
- * Wakeup (aka Core-to-core wake up in the Virtual Accelerator Switchboard
- * (VAS) in POWER9.
- *
- * To get a unique TIDR per process we could simply reuse task_pid_nr() but
- * the problem is that task_pid_nr() is not yet available copy_thread() is
- * called. Fixing that would require changing more intrusive arch-neutral
- * code in code path in copy_process()?.
- *
- * Further, to assign unique TIDRs within each process, we need an atomic
- * field (or an IDR) in task_struct, which again intrudes into the arch-
- * neutral code. So try to assign globally unique TIDRs for now.
- *
- * NOTE: TIDR 0 indicates that the thread does not need a TIDR value.
- *  For now, only threads that expect to be notified by the VAS
- *  hardware need a TIDR value and we assign values > 0 for those.
- */
-#define MAX_THREAD_CONTEXT ((1 << 16) - 1)
-static int assign_thread_tidr(void)
-{
-   int index;
-   int err;
-   unsigned long flags;
-
-again:
-   if (!ida_pre_get(&vas_thread_ida, GFP_KERNEL))
-   return -ENOMEM;
-
-   spin_lock_irqsave(&vas_thread_id_lock, flags);
-   err = ida_get_new_above(&vas_thread_ida, 1, &index);
-   spin_unlock_irqrestore(&vas_thread_id_lock, flags);
-
-   if (err == -EAGAIN)
-   goto again;
-   else if (err)
-   return err;
-
-   if (index > MAX_THREAD_CONTEXT) {
-   spin_lock_irqsave(&vas_thread_id_lock, flags);
-   ida_remove(&vas_thread_ida, index);
-   spin_unlock_irqrestore(&vas_thread_id_lock, flags);
-   return -ENOMEM;
-   }
-
-   return index;
-}
-
-static void free_thread_tidr(int id)
-{
-   unsigned long flags;
-
-   spin_lock_irqsave(&vas_thread_id_lock, flags);
-   ida_remove(&vas_thread_ida, id);
-   spin_unlock_irqrestore(&vas_thread_id_lock, flags);
-}
-
-/*
- * Clear any TIDR value assigned to this thread.
- */
-void clear_thread_tidr(struct task_struct *t)
-{
-   if (!t->thread.tidr)
-   return;
-
-   if (!cpu_has_feature(CPU_FTR_P9_TIDR)) {
-   WARN_ON_ONCE(1);
-   return;
-   }
-
-   mtspr(SPRN_TIDR, 0);
-   free_thread_tidr(t->thread.tidr);
-   t->thread.tidr = 0;
-}
-
-void arch_release_task_struct(struct task_struct *t)
-{
-   clear_thread_tidr(t);
-}
-
 /*
  * Assign a unique TIDR (thread id) for task @t and set it in the thread
  * structure. For now, we only support setting TIDR for 'current' task.
  */
 int set_thread_tidr(struct task_struct *t)
 {
-   int rc;
-
if (!cpu_has_feature(CPU_FTR_P9_TIDR))
return -EINVAL;
 
@@ -1602,11 +1511,7 @@ int set_thread_tidr(struct task_struct *t)
if (t->thread.tidr)
return 0;
 
-   rc = assign_thread_tidr();
-   if (rc < 0)
-   return rc;
-
-   t->thread.tidr = rc;
+   t->thread.tidr = (u16)task_pid_nr(t);
mtspr(SPRN_TIDR, t->thread.tidr);
 
return 0;
-- 
2.14.3



[PATCH v2 1/7] powerpc: Add TIDR CPU feature for Power9

2018-04-17 Thread Alastair D'Silva
From: Alastair D'Silva 

This patch adds a CPU feature bit to show whether the CPU has
the TIDR register available, enabling as_notify/wait in userspace.

Signed-off-by: Alastair D'Silva 
---
 arch/powerpc/include/asm/cputable.h | 3 ++-
 arch/powerpc/kernel/dt_cpu_ftrs.c   | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/cputable.h 
b/arch/powerpc/include/asm/cputable.h
index 4e332f3531c5..54c4cbbe57b4 100644
--- a/arch/powerpc/include/asm/cputable.h
+++ b/arch/powerpc/include/asm/cputable.h
@@ -215,6 +215,7 @@ static inline void cpu_feature_keys_init(void) { }
 #define CPU_FTR_P9_TM_HV_ASSIST
LONG_ASM_CONST(0x1000)
 #define CPU_FTR_P9_TM_XER_SO_BUG   LONG_ASM_CONST(0x2000)
 #define CPU_FTR_P9_TLBIE_BUG   LONG_ASM_CONST(0x4000)
+#define CPU_FTR_P9_TIDR
LONG_ASM_CONST(0x8000)
 
 #ifndef __ASSEMBLY__
 
@@ -462,7 +463,7 @@ static inline void cpu_feature_keys_init(void) { }
CPU_FTR_CFAR | CPU_FTR_HVMODE | CPU_FTR_VMX_COPY | \
CPU_FTR_DBELL | CPU_FTR_HAS_PPR | CPU_FTR_ARCH_207S | \
CPU_FTR_TM_COMP | CPU_FTR_ARCH_300 | CPU_FTR_PKEY | \
-   CPU_FTR_P9_TLBIE_BUG)
+   CPU_FTR_P9_TLBIE_BUG | CPU_FTR_P9_TIDR)
 #define CPU_FTRS_POWER9_DD1 ((CPU_FTRS_POWER9 | CPU_FTR_POWER9_DD1) & \
 (~CPU_FTR_SAO))
 #define CPU_FTRS_POWER9_DD2_0 CPU_FTRS_POWER9
diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
b/arch/powerpc/kernel/dt_cpu_ftrs.c
index 11a3a4fed3fb..10f8b7f55637 100644
--- a/arch/powerpc/kernel/dt_cpu_ftrs.c
+++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
@@ -722,6 +722,7 @@ static __init void cpufeatures_cpu_quirks(void)
if ((version & 0x) == 0x004e) {
cur_cpu_spec->cpu_features &= ~(CPU_FTR_DAWR);
cur_cpu_spec->cpu_features |= CPU_FTR_P9_TLBIE_BUG;
+   cur_cpu_spec->cpu_features |= CPU_FTR_P9_TIDR;
}
 }
 
-- 
2.14.3



INFO: task hung in fsnotify_mark_destroy_workfn

2018-04-17 Thread syzbot

Hello,

syzbot hit the following crash on upstream commit
a27fc14219f2e3c4a46ba9177b04d9b52c875532 (Mon Apr 16 21:07:39 2018 +)
Merge branch 'parisc-4.17-3' of  
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=e38306788a2e7102a3b6


syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=5126465372815360
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5956756370882560
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-5914490758943236750

compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+e38306788a2e7102a...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

binder: undelivered TRANSACTION_ERROR: 29201
binder: 23363:23363 got reply transaction with no transaction stack
binder: undelivered TRANSACTION_ERROR: 29201
binder: 23363:23363 transaction failed 29201/-71, size 0-0 line 2763
binder: undelivered TRANSACTION_ERROR: 29201
INFO: task kworker/u4:4:853 blocked for more than 120 seconds.
binder: undelivered TRANSACTION_ERROR: 29201
  Not tainted 4.17.0-rc1+ #6
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u4:4D11512   853  2 0x8000
Workqueue: events_unbound fsnotify_mark_destroy_workfn
binder: undelivered TRANSACTION_ERROR: 29201
Call Trace:
 context_switch kernel/sched/core.c:2848 [inline]
 __schedule+0x801/0x1e30 kernel/sched/core.c:3490
binder: undelivered TRANSACTION_ERROR: 29201
binder: undelivered TRANSACTION_ERROR: 29201
 schedule+0xef/0x430 kernel/sched/core.c:3549
binder: undelivered TRANSACTION_ERROR: 29201
binder: undelivered TRANSACTION_ERROR: 29201
binder: undelivered TRANSACTION_ERROR: 29201
binder: undelivered TRANSACTION_ERROR: 29201
 schedule_timeout+0x1b5/0x240 kernel/time/timer.c:1777
binder: undelivered TRANSACTION_ERROR: 29201
 do_wait_for_common kernel/sched/completion.c:83 [inline]
 __wait_for_common kernel/sched/completion.c:104 [inline]
 wait_for_common kernel/sched/completion.c:115 [inline]
 wait_for_completion+0x3e7/0x870 kernel/sched/completion.c:136
binder: undelivered TRANSACTION_ERROR: 29201
 __synchronize_srcu+0x189/0x240 kernel/rcu/srcutree.c:924
binder: undelivered TRANSACTION_ERROR: 29201
binder: undelivered TRANSACTION_ERROR: 29201
 synchronize_srcu+0x408/0x54f kernel/rcu/srcutree.c:1002
binder: undelivered TRANSACTION_ERROR: 29201
binder: undelivered TRANSACTION_ERROR: 29201
 fsnotify_mark_destroy_workfn+0x1aa/0x530 fs/notify/mark.c:759
binder: undelivered TRANSACTION_ERROR: 29201
binder: undelivered TRANSACTION_ERROR: 29201
binder: undelivered TRANSACTION_ERROR: 29201
binder: undelivered TRANSACTION_ERROR: 29201
 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
binder: undelivered TRANSACTION_ERROR: 29201
binder: undelivered TRANSACTION_ERROR: 29201
binder: 23369:23369 got reply transaction with no transaction stack
binder: 23369:23369 transaction failed 29201/-71, size 0-0 line 2763
binder: 23366:23366 got reply transaction with no transaction stack
binder: 23366:23366 transaction failed 29201/-71, size 0-0 line 2763
 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
binder: undelivered TRANSACTION_ERROR: 29201
binder: undelivered TRANSACTION_ERROR: 29201
binder: 23379:23379 got reply transaction with no transaction stack
binder: 23379:23379 transaction failed 29201/-71, size 0-0 line 2763
binder: undelivered TRANSACTION_ERROR: 29201
 kthread+0x345/0x410 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
binder: undelivered TRANSACTION_ERROR: 29201

Showing all locks held in the system:
2 locks held by kworker/u4:4/853:
 #0: 9bb0899e ((wq_completion)"events_unbound"){+.+.}, at:  
__write_once_size include/linux/compiler.h:215 [inline]
 #0: 9bb0899e ((wq_completion)"events_unbound"){+.+.}, at:  
arch_atomic64_set arch/x86/include/asm/atomic64_64.h:34 [inline]
 #0: 9bb0899e ((wq_completion)"events_unbound"){+.+.}, at:  
atomic64_set include/asm-generic/atomic-instrumented.h:40 [inline]
 #0: 9bb0899e ((wq_completion)"events_unbound"){+.+.}, at:  
atomic_long_set include/asm-generic/atomic-long.h:57 [inline]
 #0: 9bb0899e ((wq_completion)"events_unbound"){+.+.}, at:  
set_work_data kernel/workqueue.c:617 [inline]
 #0: 9bb0899e ((wq_completion)"events_unbound"){+.+.}, at:  
set_work_pool_and_clear_pending kernel/workqueue.c:644 [inline]
 #0: 9bb0899e ((wq_completion)"events_unbound"){+.+.}, at:  
process_one_work+0xaef/0x1b50 kernel/workqueue.c:2116
 #1: b1269673 ((reaper_work).work){+.+.}, at:  
process_one_work+0xb46/0x1b50 kernel/workqueue.c:2120

2 locks held by khungtaskd/890:
 #0: d567328c (rcu_read_lock){}, at:  
check_hung_uninterruptible_tasks kernel/hung_task.c:175 [in

KASAN: use-after-free Read in iput

2018-04-17 Thread syzbot

Hello,

syzbot hit the following crash on upstream commit
a27fc14219f2e3c4a46ba9177b04d9b52c875532 (Mon Apr 16 21:07:39 2018 +)
Merge branch 'parisc-4.17-3' of  
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
syzbot dashboard link:  
https://syzkaller.appspot.com/bug?extid=c6f45f6af72a45097be2


So far this crash happened 2 times on upstream.
C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5638195895074816
syzkaller reproducer:  
https://syzkaller.appspot.com/x/repro.syz?id=5154296156913664
Raw console output:  
https://syzkaller.appspot.com/x/log.txt?id=5743820952043520
Kernel config:  
https://syzkaller.appspot.com/x/.config?id=-5914490758943236750

compiler: gcc (GCC) 8.0.1 20180413 (experimental)

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+c6f45f6af72a45097...@syzkaller.appspotmail.com
It will help syzbot understand when the bug is fixed. See footer for  
details.

If you forward the report, please keep this part and the footer.

VFS: Busy inodes after unmount of fuse. Self-destruct in 5 seconds.  Have a  
nice day...

==
BUG: KASAN: use-after-free in iput_final fs/inode.c:1489 [inline]
BUG: KASAN: use-after-free in iput+0xa05/0xa80 fs/inode.c:1545
Read of size 8 at addr 8801d9752770 by task syzkaller802106/4482

CPU: 1 PID: 4482 Comm: syzkaller802106 Not tainted 4.17.0-rc1+ #6
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1b9/0x294 lib/dump_stack.c:113
 print_address_description+0x6c/0x20b mm/kasan/report.c:256
 kasan_report_error mm/kasan/report.c:354 [inline]
 kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
 iput_final fs/inode.c:1489 [inline]
 iput+0xa05/0xa80 fs/inode.c:1545
 free_trace_uprobe+0xe2/0x1f0 kernel/trace/trace_uprobe.c:292
 destroy_local_trace_uprobe+0x63/0x7e kernel/trace/trace_uprobe.c:1395
 perf_uprobe_destroy+0xf2/0x130 kernel/trace/trace_event_perf.c:342
 _free_event+0x3ff/0x1430 kernel/events/core.c:4445
 put_event+0x48/0x60 kernel/events/core.c:4531
 perf_event_release_kernel+0x8bd/0xf90 kernel/events/core.c:4637
 perf_release+0x37/0x50 kernel/events/core.c:4647
 __fput+0x34d/0x890 fs/file_table.c:209
 fput+0x15/0x20 fs/file_table.c:243
 task_work_run+0x1e4/0x290 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x1aee/0x2730 kernel/exit.c:865
 do_group_exit+0x16f/0x430 kernel/exit.c:968
 get_signal+0x886/0x1960 kernel/signal.c:2469
 do_signal+0x98/0x2040 arch/x86/kernel/signal.c:810
 exit_to_usermode_loop+0x28a/0x310 arch/x86/entry/common.c:162
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x6ac/0x800 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4457b9
RSP: 002b:7f98cec58da8 EFLAGS: 0246 ORIG_RAX: 00ca
RAX: fe00 RBX:  RCX: 004457b9
RDX:  RSI:  RDI: 006dac3c
RBP: 006dac3c R08:  R09: 
R10:  R11: 0246 R12: 006dac38
R13: 0030656c69662f2e R14: 7f98cec599c0 R15: 0001

Allocated by task 4481:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
 kmem_cache_alloc_trace+0x152/0x780 mm/slab.c:3620
 kmalloc include/linux/slab.h:512 [inline]
 kzalloc include/linux/slab.h:701 [inline]
 alloc_super fs/super.c:186 [inline]
 sget_userns+0x1c7/0xf20 fs/super.c:503
 sget+0x10b/0x150 fs/super.c:558
 mount_nodev+0x33/0x110 fs/super.c:1206
 fuse_mount+0x2c/0x40 fs/fuse/inode.c:1192
 mount_fs+0xae/0x328 fs/super.c:1268
 vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
 vfs_kern_mount fs/namespace.c:1027 [inline]
 do_new_mount fs/namespace.c:2517 [inline]
 do_mount+0x564/0x3070 fs/namespace.c:2847
 ksys_mount+0x12d/0x140 fs/namespace.c:3063
 __do_sys_mount fs/namespace.c:3077 [inline]
 __se_sys_mount fs/namespace.c:3074 [inline]
 __x64_sys_mount+0xbe/0x150 fs/namespace.c:3074
 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 25:
 save_stack+0x43/0xd0 mm/kasan/kasan.c:448
 set_track mm/kasan/kasan.c:460 [inline]
 __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
 __cache_free mm/slab.c:3498 [inline]
 kfree+0xd9/0x260 mm/slab.c:3813
 destroy_super_work+0x40/0x50 fs/super.c:149
 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
 kthread+0x345/0x410 kernel/kthread.c:238
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412

The buggy address belongs to the object at 8801d97

Re: [PATCH 03/11] fs: add frozen sb state helpers

2018-04-17 Thread Luis R. Rodriguez
On Thu, Dec 21, 2017 at 12:03:29PM +0100, Jan Kara wrote:
> Hello,
> 
> I think I owe you a reply here... Sorry that it took so long.

Took me just as long :)

> On Fri 01-12-17 22:13:27, Luis R. Rodriguez wrote:
> > 
> > I'll note that it's still not perfectly clear whether the semantics behind
> > freeze_bdev() fully match what I described above. That still needs to be
> > vetted. For instance, does thaw_bdev() keep a superblock frozen if an
> > ioctl-initiated freeze had occurred before? If so then great. Otherwise
> > I think we'll need to distinguish the ioctl interface. Worst possible case
> > is that bdev semantics and in-kernel semantics differ somehow, then that
> > will really create a holy fucking mess.
> 
> I believe nobody really thought about mixing those two interfaces to fs
> freezing and so the behavior is basically defined by the implementation.
> That is:
> 
> freeze_bdev() on sb frozen by ioctl_fsfreeze() -> EBUSY

See also my note below on your *future* freeze_super() implementation.

> freeze_bdev() on sb frozen by freeze_bdev() -> success
> ioctl_fsfreeze() on sb frozen by freeze_bdev() -> EBUSY
> ioctl_fsfreeze() on sb frozen by ioctl_fsfreeze() -> EBUSY
> 
> thaw_bdev() on sb frozen by ioctl_fsfreeze() -> EINVAL

Phew, so this is what we want for the in-kernel freezing so we're good
and *can* combine these then.

> ioctl_fsthaw() on sb frozen by freeze_bdev() -> success
> 
> What I propose is the following API:
> 
> freeze_super_excl()
>   - freezes superblock, returns EBUSY if the superblock is already frozen
> (either by another freeze_super_excl() or by freeze_super())
> freeze_super()
>   - this function will make sure superblock is frozen when the function
> returns with success. 

That's straight forward.

> It can be nested with other freeze_super() or
> freeze_super_excl() calls 

This is where it can get hairy. More below.

> (this second part is different from how
> freeze_bdev() behaves currently but AFAICT this behavior is actually
> what all current users of freeze_bdev() really want - just make sure
> fs cannot be written to)

If we can agree to this, then sure. However there are two types of
possible nested calls to consider, one where the sb was already frozen
by an IOCTL, and the other where it was initiated by either another
freeze_super_excl() or another freeze_super() call which is currently
being processed. For the first type, it's easy to say the device is
already frozen and, as such, return success. If the freezing is ongoing,
we may want to wait or not wait, and this will depend on our current
use cases for freeze_bdev().

As you noted above, freeze_bdev() currently returns EBUSY if the sb
was already frozen by ioctl_fsfreeze(). It may be a welcome
enhancement to correct the semantics first to address the first case,
but keep the EBUSY return for the second case. A follow-up patch could
then add a completion mechanism and let callers decide whether or not to wait.
*Iff* the caller did not opt in to waiting, we keep the EBUSY return.

Seem reasonable?
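
To make the combined semantics concrete, below is a small, purely
illustrative user-space model of the behaviour discussed above (not
kernel code; the struct, its fields and the model_*() names are all
hypothetical). An exclusive freeze (the ioctl / freeze_super_excl()
case) refuses to nest, while the in-kernel freeze nests via a counter
and only guarantees the fs is frozen when it returns:

#include <errno.h>
#include <pthread.h>
#include <stdio.h>

struct sb_model {
	pthread_mutex_t lock;
	int frozen_count;	/* nested in-kernel freezes (freeze_super()) */
	int frozen_excl;	/* exclusive freeze (freeze_super_excl()/ioctl) */
};

/* Models freeze_super_excl(): fails if anyone already holds a freeze. */
static int model_freeze_excl(struct sb_model *sb)
{
	int ret = 0;

	pthread_mutex_lock(&sb->lock);
	if (sb->frozen_excl || sb->frozen_count)
		ret = -EBUSY;
	else
		sb->frozen_excl = 1;
	pthread_mutex_unlock(&sb->lock);
	return ret;
}

/* Models the proposed freeze_super(): nests; fs is frozen on return.
 * (The real implementation would do the actual freezing work only for
 * the first reference; that part is elided here.) */
static int model_freeze_nested(struct sb_model *sb)
{
	pthread_mutex_lock(&sb->lock);
	sb->frozen_count++;
	pthread_mutex_unlock(&sb->lock);
	return 0;
}

/* Models the matching thaw: the fs stays frozen until the last nested
 * reference is dropped and no exclusive freeze is held. */
static int model_thaw_nested(struct sb_model *sb)
{
	int ret = 0;

	pthread_mutex_lock(&sb->lock);
	if (sb->frozen_count == 0)
		ret = -EINVAL;
	else
		sb->frozen_count--;
	pthread_mutex_unlock(&sb->lock);
	return ret;
}

int main(void)
{
	struct sb_model sb = { .lock = PTHREAD_MUTEX_INITIALIZER };

	printf("excl:   %d\n", model_freeze_excl(&sb));		/* 0 */
	printf("nested: %d\n", model_freeze_nested(&sb));	/* 0, nests */
	printf("excl:   %d\n", model_freeze_excl(&sb));		/* -EBUSY */
	printf("thaw:   %d\n", model_thaw_nested(&sb));		/* 0, fs still frozen (excl held) */
	return 0;
}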

I'll address the rest of the mail later.

  Luis


Re: [patch] mm, oom: fix concurrent munlock and oom reaper unmap

2018-04-17 Thread Tetsuo Handa
David Rientjes wrote:
> Since exit_mmap() is done without the protection of mm->mmap_sem, it is
> possible for the oom reaper to concurrently operate on an mm until
> MMF_OOM_SKIP is set.
> 
> This allows munlock_vma_pages_all() to concurrently run while the oom
> reaper is operating on a vma.  Since munlock_vma_pages_range() depends on
> clearing VM_LOCKED from vm_flags before actually doing the munlock to
> determine if any other vmas are locking the same memory, the check for
> VM_LOCKED in the oom reaper is racy.
> 
> This is especially noticeable on architectures such as powerpc where
> clearing a huge pmd requires kick_all_cpus_sync().  If the pmd is zapped
> by the oom reaper during follow_page_mask() after the check for pmd_none()
> is bypassed, this ends up dereferencing a NULL ptl.

I don't know whether the explanation above is correct.
Did you actually see a crash caused by this race?

> 
> Fix this by reusing MMF_UNSTABLE to specify that an mm should not be
> reaped.  This prevents the concurrent munlock_vma_pages_range() and
> unmap_page_range().  The oom reaper will simply not operate on an mm that
> has the bit set and leave the unmapping to exit_mmap().

But this patch is setting MMF_OOM_SKIP without reaping any memory as soon as
MMF_UNSTABLE is set, which is the situation described in 212925802454:

At the same time if the OOM reaper doesn't wait at all for the memory of
the current OOM candidate to be freed by exit_mmap->unmap_vmas, it would
generate a spurious OOM kill.

If exit_mmap() does not wait for any pages and __oom_reap_task_mm() cannot
handle mlock()ed pages, isn't it better to revert 212925802454, as I mentioned
at https://patchwork.kernel.org/patch/10095661/ and let the OOM reaper reclaim
as much as possible before setting MMF_OOM_SKIP?


Re: [PATCH 4/4] ALSA: usb: add UAC3 BADD profiles support

2018-04-17 Thread Ruslan Bilovol
On Sat, Apr 14, 2018 at 8:55 PM, Jorge Sanjuan
 wrote:
>
>
> On 2018-04-13 23:24, Ruslan Bilovol wrote:
>>
>> Recently released USB Audio Class 3.0 specification
>> contains BADD (Basic Audio Device Definition) document
>> which describes pre-defined UAC3 configurations.
>>
>> BADD support is mandatory for UAC3 devices, it should be
>> implemented as a separate USB device configuration.
>> As per BADD document, class-specific descriptors
>> shall not be included in the Device’s Configuration
>> descriptor ("inferred"), but host can guess them
>> from BADD profile number, number of endpoints and
>> their max packed sizes.
>
>
> Right. I would have thought that, since BADD is a subset of UAC3, it may be
> simpler to fill the Class Specific descriptors buffer and leave the UAC3 path
> intact as it would result in the same behavior (for UAC3 and BADD configs)
> without the need to add that much code to the mixer, which is already quite
> big.
>
> In the patch I proposed [1], the Class Specific buffer is filled once with
> the BADD descriptors, which are already UAC3 compliant, so the driver would
> handle the rest in the same way it would do with an UAC3 configuration.

That looked like a good idea to me as well when I first saw patch [1].
However, after thinking about it a bit more, I realized that in mixer.c we
just need to initialize from one to three Feature Units for any BADD profile.
The Feature Units in question are simple and can only carry Volume/Mute controls.
We also don't need to do anything with the Mixer Unit (which exists in the BAIOF
topology), since it doesn't have any controls.
Most of the code there just detects all possible combinations of
channels and topologies (BAIF, BAOF, BAIOF, BAIF+BAOF) for the BADD profiles,
and adds some meaningful names for the Feature Unit controls, which
are missing from the descriptors.
The only change in mixer.c I'm unhappy with is having a separate
build_feature_ctl_badd() function that is nothing more than a simplified
version of build_feature_ctl(); but I already have an idea of how to
reuse the original one for the BADD case.

The changes to stream.c are very simple and straightforward: almost all
values are common/predefined for all BADD profiles, except the channel
numbers and sample size.

So, as a bottom line, direct changes to the mixer/stream code seem easier
and more understandable in this particular case than generating all of the
needed class-specific descriptors. And we get support for all BADD profiles
in this quite short patch.

Thanks,
Ruslan

>
> I will keep an eye on this as I'd need to do some work based on this
> instead.
>
> [1] https://www.spinics.net/lists/alsa-devel/msg71617.html
>
> Thanks,
>
> Jorge
>
>>
>> This patch adds support of all BADD profiles from the spec
>>
>> Signed-off-by: Ruslan Bilovol 
>> ---
>>  sound/usb/card.c   |  14 +++
>>  sound/usb/clock.c  |   9 +-
>>  sound/usb/mixer.c  | 313
>> +++--
>>  sound/usb/mixer_maps.c |  65 ++
>>  sound/usb/stream.c |  83 +++--
>>  sound/usb/usbaudio.h   |   2 +
>>  6 files changed, 466 insertions(+), 20 deletions(-)
>>
>> diff --git a/sound/usb/card.c b/sound/usb/card.c
>> index 4d866bd..47ebc50 100644
>> --- a/sound/usb/card.c
>> +++ b/sound/usb/card.c
>> @@ -307,6 +307,20 @@ static int snd_usb_create_streams(struct
>> snd_usb_audio *chip, int ctrlif)
>> return -EINVAL;
>> }
>>
>> +   if (protocol == UAC_VERSION_3) {
>> +   int badd = assoc->bFunctionSubClass;
>> +
>> +   if (badd != UAC3_FUNCTION_SUBCLASS_FULL_ADC_3_0 &&
>> +   (badd < UAC3_FUNCTION_SUBCLASS_GENERIC_IO ||
>> +badd > UAC3_FUNCTION_SUBCLASS_SPEAKERPHONE))
>> {
>> +   dev_err(&dev->dev,
>> +   "Unsupported UAC3 BADD
>> profile\n");
>> +   return -EINVAL;
>> +   }
>> +
>> +   chip->badd_profile = badd;
>> +   }
>> +
>> for (i = 0; i < assoc->bInterfaceCount; i++) {
>> int intf = assoc->bFirstInterface + i;
>>
>> diff --git a/sound/usb/clock.c b/sound/usb/clock.c
>> index 0b030d8..17673f3 100644
>> --- a/sound/usb/clock.c
>> +++ b/sound/usb/clock.c
>> @@ -587,8 +587,15 @@ int snd_usb_init_sample_rate(struct snd_usb_audio
>> *chip, int iface,
>> default:
>> return set_sample_rate_v1(chip, iface, alts, fmt, rate);
>>
>> -   case UAC_VERSION_2:
>> case UAC_VERSION_3:
>> +   if (chip->badd_profile >=
>> UAC3_FUNCTION_SUBCLASS_GENERIC_IO) {
>> +   if (rate != UAC3_BADD_SAMPLING_RATE)
>> +   return -ENXIO;
>> +   else
>> +   return 0;
>> +   }
>> +   /* fall through */
>> +   case UAC_VERSION_2:
>> ret

Re: [PATCH -mm 10/21] mm, THP, swap: Support to count THP swapin and its fallback

2018-04-17 Thread Huang, Ying
Randy Dunlap  writes:

> On 04/16/18 19:02, Huang, Ying wrote:
>> From: Huang Ying 
>> 
>> 2 new /proc/vmstat fields are added, "thp_swapin" and
>> "thp_swapin_fallback" to count swapin a THP from swap device as a
>> whole and fallback to normal page swapin.
>> 
>> Signed-off-by: "Huang, Ying" 
>> Cc: "Kirill A. Shutemov" 
>> Cc: Andrea Arcangeli 
>> Cc: Michal Hocko 
>> Cc: Johannes Weiner 
>> Cc: Shaohua Li 
>> Cc: Hugh Dickins 
>> Cc: Minchan Kim 
>> Cc: Rik van Riel 
>> Cc: Dave Hansen 
>> Cc: Naoya Horiguchi 
>> Cc: Zi Yan 
>> ---
>>  include/linux/vm_event_item.h |  2 ++
>>  mm/huge_memory.c  |  4 +++-
>>  mm/page_io.c  | 15 ---
>>  mm/vmstat.c   |  2 ++
>>  4 files changed, 19 insertions(+), 4 deletions(-)
>> 
>
> Hi,
> Please also update Documentation/vm/transhuge.rst.

Thanks for reminding!  Will do that.

Best Regards,
Huang, Ying


Re: [PATCH -mm 06/21] mm, THP, swap: Support PMD swap mapping when splitting huge PMD

2018-04-17 Thread Huang, Ying
Randy Dunlap  writes:

> On 04/16/18 19:02, Huang, Ying wrote:
>> From: Huang Ying 
>> 
>> A huge PMD need to be split when zap a part of the PMD mapping etc.
>> If the PMD mapping is a swap mapping, we need to split it too.  This
>> patch implemented the support for this.  This is similar as splitting
>> the PMD page mapping, except we need to decrease the PMD swap mapping
>> count for the huge swap cluster too.  If the PMD swap mapping count
>> becomes 0, the huge swap cluster will be split.
>> 
>> Notice: is_huge_zero_pmd() and pmd_page() doesn't work well with swap
>> PMD, so pmd_present() check is called before them.
>
> FWIW, I would prefer to see that comment in the source code, not just
> in the commit description.

Sure.  I will add comment in source code too.

Best Regards,
Huang, Ying

>> 
>> Signed-off-by: "Huang, Ying" 
>> Cc: "Kirill A. Shutemov" 
>> Cc: Andrea Arcangeli 
>> Cc: Michal Hocko 
>> Cc: Johannes Weiner 
>> Cc: Shaohua Li 
>> Cc: Hugh Dickins 
>> Cc: Minchan Kim 
>> Cc: Rik van Riel 
>> Cc: Dave Hansen 
>> Cc: Naoya Horiguchi 
>> Cc: Zi Yan 
>> ---
>>  include/linux/swap.h |  6 +
>>  mm/huge_memory.c | 54 
>>  mm/swapfile.c| 28 +++
>>  3 files changed, 83 insertions(+), 5 deletions(-)


[Patch v3 2/6] cifs: Allocate validate negotiation request through kmalloc

2018-04-17 Thread Long Li
From: Long Li 

The data buffer allocated on the stack can't be DMA'ed, and hence can't be sent
through RDMA via SMB Direct.

Fix this by allocating the request on the heap in smb3_validate_negotiate.
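
For reference, a minimal sketch of the general pattern being applied
(illustrative only; hypothetical_rdma_send() is a placeholder, not a
real kernel API): buffers handed to a DMA-capable transport should come
from kmalloc() rather than the stack, since on-stack memory (notably
with CONFIG_VMAP_STACK) cannot reliably be DMA-mapped.

	struct validate_negotiate_info_req *pneg_inbuf;
	int rc;

	/* Slab memory can be DMA-mapped; an on-stack struct cannot. */
	pneg_inbuf = kmalloc(sizeof(*pneg_inbuf), GFP_KERNEL);
	if (!pneg_inbuf)
		return -ENOMEM;

	/* ... fill in *pneg_inbuf ... */
	rc = hypothetical_rdma_send(pneg_inbuf, sizeof(*pneg_inbuf));

	kfree(pneg_inbuf);
	return rc;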

Changes in v2:
Removed duplicated code on freeing buffers on function exit.
(Thanks to Parav Pandit )
Fixed typo in the patch title.

Changes in v3:
Added "Fixes" to the patch.
Changed sizeof() to use *pointer in place of struct.

Fixes: ff1c038addc4 ("Check SMB3 dialects against downgrade attacks")
Signed-off-by: Long Li 
Cc: sta...@vger.kernel.org
---
 fs/cifs/smb2pdu.c | 59 ++-
 1 file changed, 32 insertions(+), 27 deletions(-)

diff --git a/fs/cifs/smb2pdu.c b/fs/cifs/smb2pdu.c
index 0f044c4..5582a02 100644
--- a/fs/cifs/smb2pdu.c
+++ b/fs/cifs/smb2pdu.c
@@ -729,8 +729,8 @@ SMB2_negotiate(const unsigned int xid, struct cifs_ses *ses)
 
 int smb3_validate_negotiate(const unsigned int xid, struct cifs_tcon *tcon)
 {
-   int rc = 0;
-   struct validate_negotiate_info_req vneg_inbuf;
+   int ret, rc = -EIO;
+   struct validate_negotiate_info_req *pneg_inbuf;
struct validate_negotiate_info_rsp *pneg_rsp = NULL;
u32 rsplen;
u32 inbuflen; /* max of 4 dialects */
@@ -741,6 +741,9 @@ int smb3_validate_negotiate(const unsigned int xid, struct 
cifs_tcon *tcon)
if (tcon->ses->server->rdma)
return 0;
 #endif
+   pneg_inbuf = kmalloc(sizeof(*pneg_inbuf), GFP_KERNEL);
+   if (!pneg_inbuf)
+   return -ENOMEM;
 
/* In SMB3.11 preauth integrity supersedes validate negotiate */
if (tcon->ses->server->dialect == SMB311_PROT_ID)
@@ -764,63 +767,63 @@ int smb3_validate_negotiate(const unsigned int xid, 
struct cifs_tcon *tcon)
if (tcon->ses->session_flags & SMB2_SESSION_FLAG_IS_NULL)
cifs_dbg(VFS, "Unexpected null user (anonymous) auth flag sent 
by server\n");
 
-   vneg_inbuf.Capabilities =
+   pneg_inbuf->Capabilities =
cpu_to_le32(tcon->ses->server->vals->req_capabilities);
-   memcpy(vneg_inbuf.Guid, tcon->ses->server->client_guid,
+   memcpy(pneg_inbuf->Guid, tcon->ses->server->client_guid,
SMB2_CLIENT_GUID_SIZE);
 
if (tcon->ses->sign)
-   vneg_inbuf.SecurityMode =
+   pneg_inbuf->SecurityMode =
cpu_to_le16(SMB2_NEGOTIATE_SIGNING_REQUIRED);
else if (global_secflags & CIFSSEC_MAY_SIGN)
-   vneg_inbuf.SecurityMode =
+   pneg_inbuf->SecurityMode =
cpu_to_le16(SMB2_NEGOTIATE_SIGNING_ENABLED);
else
-   vneg_inbuf.SecurityMode = 0;
+   pneg_inbuf->SecurityMode = 0;
 
 
if (strcmp(tcon->ses->server->vals->version_string,
SMB3ANY_VERSION_STRING) == 0) {
-   vneg_inbuf.Dialects[0] = cpu_to_le16(SMB30_PROT_ID);
-   vneg_inbuf.Dialects[1] = cpu_to_le16(SMB302_PROT_ID);
-   vneg_inbuf.DialectCount = cpu_to_le16(2);
+   pneg_inbuf->Dialects[0] = cpu_to_le16(SMB30_PROT_ID);
+   pneg_inbuf->Dialects[1] = cpu_to_le16(SMB302_PROT_ID);
+   pneg_inbuf->DialectCount = cpu_to_le16(2);
/* structure is big enough for 3 dialects, sending only 2 */
inbuflen = sizeof(struct validate_negotiate_info_req) - 2;
} else if (strcmp(tcon->ses->server->vals->version_string,
SMBDEFAULT_VERSION_STRING) == 0) {
-   vneg_inbuf.Dialects[0] = cpu_to_le16(SMB21_PROT_ID);
-   vneg_inbuf.Dialects[1] = cpu_to_le16(SMB30_PROT_ID);
-   vneg_inbuf.Dialects[2] = cpu_to_le16(SMB302_PROT_ID);
-   vneg_inbuf.DialectCount = cpu_to_le16(3);
+   pneg_inbuf->Dialects[0] = cpu_to_le16(SMB21_PROT_ID);
+   pneg_inbuf->Dialects[1] = cpu_to_le16(SMB30_PROT_ID);
+   pneg_inbuf->Dialects[2] = cpu_to_le16(SMB302_PROT_ID);
+   pneg_inbuf->DialectCount = cpu_to_le16(3);
/* structure is big enough for 3 dialects */
inbuflen = sizeof(struct validate_negotiate_info_req);
} else {
/* otherwise specific dialect was requested */
-   vneg_inbuf.Dialects[0] =
+   pneg_inbuf->Dialects[0] =
cpu_to_le16(tcon->ses->server->vals->protocol_id);
-   vneg_inbuf.DialectCount = cpu_to_le16(1);
+   pneg_inbuf->DialectCount = cpu_to_le16(1);
/* structure is big enough for 3 dialects, sending only 1 */
inbuflen = sizeof(struct validate_negotiate_info_req) - 4;
}
 
-   rc = SMB2_ioctl(xid, tcon, NO_FILE_ID, NO_FILE_ID,
+   ret = SMB2_ioctl(xid, tcon, NO_FILE_ID, NO_FILE_ID,
FSCTL_VALIDATE_NEGOTIATE_INFO, true /* is_fsctl */,
-   (char *)&vneg_inbuf, sizeof(struct validate
