Re: [PATCH v2 4/4] serial: samsung: Fix UART status handling in DMA mode

2015-09-11 Thread Krzysztof Kozlowski
On 11.09.2015 15:32, Robert Baldyga wrote:
> On 09/11/2015 08:07 AM, Krzysztof Kozlowski wrote:
>> On 10.09.2015 22:41, Robert Baldyga wrote:
>>> This patch fixes UART status handling in DMA mode.
>>
>> I don't see any changes here. You did not respond to my comment neither.
>>
>> Code looks itself good... except a locking issue but I don't know what's
>> the cause. It may be not related to the patchset and maybe just not all
>> of issues are fixed yet. Anyway I'll describe it in 3/4.
>>
>> Best regards,
>> Krzysztof
>>
>>> For this purpose we
>>> use s3c24xx_serial_rx_drain_fifo() instead of uart_rx_drain_fifo(), which
>>> does the same thing plus checks for special conditions (such as 'break').
>>>
>>> Thanks to this we have, for example, Magic SysRq handling, which was
>>> missing in DMA mode so far. Since we can use UART in DMA mode as serial
>>> console, this is a quite important improvement.
>>>
>>> This change additionally simplifies RX handling code, as we no longer
>>> need uart_rx_drain_fifo() function, so we can remove it.
>>>
>>> Reported-by: Marek Szyprowski 
>>> Signed-off-by: Robert Baldyga 
>>> ---
>>>  drivers/tty/serial/samsung.c | 30 +++---
>>>  1 file changed, 3 insertions(+), 27 deletions(-)
>>>
>>> diff --git a/drivers/tty/serial/samsung.c b/drivers/tty/serial/samsung.c
>>> index 1d7dd86..d72cd73 100644
>>> --- a/drivers/tty/serial/samsung.c
>>> +++ b/drivers/tty/serial/samsung.c
>>> @@ -385,32 +385,6 @@ static void s3c24xx_uart_copy_rx_to_tty(struct 
>>> s3c24xx_uart_port *ourport,
>>> }
>>>  }
>>>  
>>> -static int s3c24xx_serial_rx_fifocnt(struct s3c24xx_uart_port *ourport,
>>> -unsigned long ufstat);
>>> -
>>> -static void uart_rx_drain_fifo(struct s3c24xx_uart_port *ourport)
>>> -{
>>> -   struct uart_port *port = &ourport->port;
>>> -   struct tty_port *tty = &port->state->port;
>>> -   unsigned int ch, ufstat;
>>> -   unsigned int count;
>>> -
>>> -   ufstat = rd_regl(port, S3C2410_UFSTAT);
>>> -   count = s3c24xx_serial_rx_fifocnt(ourport, ufstat);
>>> -
>>> -   if (!count)
>>> -   return;
>>> -
>>> -   while (count-- > 0) {
>>> -   ch = rd_regb(port, S3C2410_URXH);
>>> -
>>> -   ourport->port.icount.rx++;
>>> -   tty_insert_flip_char(tty, ch, TTY_NORMAL);
>>> -   }
>>> -
>>> -   tty_flip_buffer_push(tty);
>>> -}
>>> -
>>>  static void s3c24xx_serial_stop_rx(struct uart_port *port)
>>>  {
>>> struct s3c24xx_uart_port *ourport = to_ourport(port);
>>> @@ -573,6 +547,8 @@ static void enable_rx_pio(struct s3c24xx_uart_port 
>>> *ourport)
>>> ourport->rx_mode = S3C24XX_RX_PIO;
>>>  }
>>>  
>>> +static void s3c24xx_serial_rx_drain_fifo(struct s3c24xx_uart_port 
>>> *ourport);
>>> +
>>>  static irqreturn_t s3c24xx_serial_rx_chars_dma(void *dev_id)
>>>  {
>>> unsigned int utrstat, ufstat, received;
>>> @@ -606,7 +582,7 @@ static irqreturn_t s3c24xx_serial_rx_chars_dma(void 
>>> *dev_id)
>>> enable_rx_pio(ourport);
>>> }
>>>  
>>> -   uart_rx_drain_fifo(ourport);
>>> +   s3c24xx_serial_rx_drain_fifo(ourport);
> 
> The essence of change is here. We use another method for draining FIFO.
> Instead of just putting them into tty buffer, we additionally check for
> special conditions, and that's the improvement.

Hm? I was referring to my comment - I did not see any changes around
"fixes" in commit message.

Best regards,
Krzysztof
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
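
For readers of the archive, the difference between the two drain strategies discussed above can be sketched in plain user-space C. This is only an illustration under stated assumptions: struct fake_fifo, FLAG_BREAK, drain_plain() and drain_checked() are hypothetical names and not part of the samsung driver, which reads the UART registers and feeds the tty layer instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-character status bit, standing in for a UART status register. */
#define FLAG_BREAK 0x1u

struct fake_entry {
	unsigned char ch;
	unsigned int status;
};

struct fake_fifo {
	const struct fake_entry *entries;
	size_t count;
};

struct drain_result {
	size_t normal;	/* characters handed on as ordinary data */
	size_t breaks;	/* break conditions that were noticed */
};

/* Plain drain: every FIFO entry is treated as normal data. */
static struct drain_result drain_plain(const struct fake_fifo *fifo)
{
	struct drain_result res = { 0, 0 };

	assert(fifo != NULL);
	res.normal = fifo->count;
	return res;
}

/* Checked drain: look at the status of each entry before handing it on. */
static struct drain_result drain_checked(const struct fake_fifo *fifo)
{
	struct drain_result res = { 0, 0 };
	size_t i;

	assert(fifo != NULL);
	for (i = 0; i < fifo->count; i++) {
		if (fifo->entries[i].status & FLAG_BREAK)
			res.breaks++;	/* e.g. the starting point for SysRq handling */
		else
			res.normal++;
	}
	return res;
}
```

With a FIFO holding 'a', a break and 'b', the plain drain reports three normal characters, while the checked drain reports two characters and one break.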


Re: [PATCH] ARM: exynos_defconfig: Disable simplefb support

2015-09-11 Thread Javier Martinez Canillas
Hello Krzysztof,

On 09/11/2015 07:01 AM, Krzysztof Kozlowski wrote:
> On 10.09.2015 22:42, Javier Martinez Canillas wrote:
>> The simplefb driver allows the kernel to render on a pre-allocated
>> buffer that's been initialized by firmware before the kernel boots.
>>
>> This option was enabled to have display working on the Exynos5250
>> Snow Chromebook by commit da9d0fbf5e9a ("ARM: exynos: defconfig
>> update") since proper DRM/KMS support did not exist at that time.
>>
>> But now that the Exynos DRM driver has support for this hardware,
>> there is no need to have simplefb enabled. In fact, if a user has
>> a u-boot that injects the simplefb dev node to the FDT before pass
>> it to the kernel, display won't be properly initialized and only a
>> blank screen will be shown since there isn't a proper handoff from
>> the simplefb driver to the Exynos DRM driver.
>>
>> Signed-off-by: Javier Martinez Canillas 
>>
>> ---
>>
>>  arch/arm/configs/exynos_defconfig | 1 -
>>  1 file changed, 1 deletion(-)
> 
> Seems logical. None of the boards use simple-framebuffer compatible
> anyway. I understand that on Snow simplefb was needed along with change
> in Uboot like this one:
> https://chromium.googlesource.com/chromiumos/third_party/u-boot/+/refs/changes/58/49358/2
>

Exactly, but you won't see the dev node with the "simple-framebuffer"
compatible string in the DTS, since it is the bootloader that adds this
device node to the FDT before passing it to the kernel.

The bootloader shouldn't mangle the FDT (with the exception of the
memory and chosen/bootargs nodes) but simplefb is just a hack to
re-use the display HW initialization made by the bootloader.

> and now none of Exynos boards use simplefb anymore?
>

Yes, there are no other Exynos boards using simplefb besides Snow
that I'm aware of but since Exynos DRM is working well on this board
from v4.0, there is no need for it anymore.

In fact, as explained in the commit message, it could do more harm
than good: users that are still booting with a u-boot that adds the
simplefb device node only get a blank screen, because the simplefb
driver is probed and creates a console, and later the Exynos DRM driver
probes and re-initializes the HW, creating its own console.

I got several reports from users saying that mainline stopped booting
for them, when in fact only the display was not working. Disabling
simplefb makes the display work again, so maybe this is even -rc
material and should go to stable # v4.0+

> Best regards,
> Krzysztof
> 
> 
 
Best regards,
-- 
Javier Martinez Canillas
Open Source Group
Samsung Research America


[PATCH 3/7] f2fs: verify file type early in f2fs_fallocate

2015-09-11 Thread Chao Yu
This patch moves the file type check to the start of f2fs_fallocate as a
cleanup, and thereby also adds the check that was missing for
expand_inode_data.

Signed-off-by: Chao Yu 
---
 fs/f2fs/file.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index ac97f78..9e03622 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -775,9 +775,6 @@ static int punch_hole(struct inode *inode, loff_t offset, 
loff_t len)
loff_t off_start, off_end;
int ret = 0;
 
-   if (!S_ISREG(inode->i_mode))
-   return -EOPNOTSUPP;
-
if (f2fs_has_inline_data(inode)) {
ret = f2fs_convert_inline_inode(inode);
if (ret)
@@ -918,9 +915,6 @@ static int f2fs_collapse_range(struct inode *inode, loff_t 
offset, loff_t len)
loff_t new_size;
int ret;
 
-   if (!S_ISREG(inode->i_mode))
-   return -EINVAL;
-
if (offset + len >= i_size_read(inode))
return -EINVAL;
 
@@ -969,9 +963,6 @@ static int f2fs_zero_range(struct inode *inode, loff_t 
offset, loff_t len,
loff_t off_start, off_end;
int ret = 0;
 
-   if (!S_ISREG(inode->i_mode))
-   return -EINVAL;
-
ret = inode_newsize_ok(inode, (len + offset));
if (ret)
return ret;
@@ -1078,9 +1069,6 @@ static int f2fs_insert_range(struct inode *inode, loff_t 
offset, loff_t len)
loff_t new_size;
int ret;
 
-   if (!S_ISREG(inode->i_mode))
-   return -EINVAL;
-
new_size = i_size_read(inode) + len;
if (new_size > inode->i_sb->s_maxbytes)
return -EFBIG;
@@ -1238,6 +1226,10 @@ static long f2fs_fallocate(struct file *file, int mode,
struct inode *inode = file_inode(file);
long ret = 0;
 
+   /* f2fs only support ->fallocate for regular file */
+   if (!S_ISREG(inode->i_mode))
+   return -EINVAL;
+
if (f2fs_encrypted_inode(inode) &&
(mode & (FALLOC_FL_COLLAPSE_RANGE | FALLOC_FL_INSERT_RANGE)))
return -EOPNOTSUPP;
-- 
2.4.2
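
As a rough illustration of the idea in this patch, one check at the entry point covers every mode, including one that previously had no check of its own. The sketch below is user-space C under stated assumptions: struct fake_inode, enum fake_mode, fake_do_mode() and fake_fallocate() are hypothetical names and not f2fs code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the fallocate modes handled by the filesystem. */
enum fake_mode {
	FAKE_PUNCH_HOLE,
	FAKE_COLLAPSE_RANGE,
	FAKE_ZERO_RANGE,
	FAKE_INSERT_RANGE,
	FAKE_EXPAND,
};

struct fake_inode {
	bool is_regular;	/* plays the role of S_ISREG(inode->i_mode) */
};

/* Per-mode helpers no longer validate the file type themselves. */
static int fake_do_mode(enum fake_mode mode)
{
	switch (mode) {
	case FAKE_PUNCH_HOLE:
	case FAKE_COLLAPSE_RANGE:
	case FAKE_ZERO_RANGE:
	case FAKE_INSERT_RANGE:
	case FAKE_EXPAND:
		return 0;	/* the real work is omitted in this sketch */
	}
	return -EOPNOTSUPP;
}

/* Single entry point: reject non-regular files once, before dispatching. */
static int fake_fallocate(const struct fake_inode *inode, enum fake_mode mode)
{
	assert(inode != NULL);
	if (!inode->is_regular)
		return -EINVAL;
	return fake_do_mode(mode);
}
```

Calling fake_fallocate() on a non-regular inode returns -EINVAL for every mode, which is the uniform behaviour the patch aims for.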




[PATCH 4/7] f2fs: readahead cp payload pages when mount

2015-09-11 Thread Chao Yu
Read ahead the contiguous payload pages in the checkpoint area for
better performance.

Signed-off-by: Chao Yu 
---
 fs/f2fs/checkpoint.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index c5a38e3..7c1b297 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -676,6 +676,9 @@ int get_valid_checkpoint(struct f2fs_sb_info *sbi)
if (cur_page == cp2)
cp_blk_no += 1 << le32_to_cpu(fsb->log_blocks_per_seg);
 
+   if (cp_blks > 2)
+   ra_meta_pages(sbi, cp_blk_no + 1, cp_blks - 1, META_CP);
+
for (i = 1; i < cp_blks; i++) {
void *sit_bitmap_ptr;
unsigned char *ckpt = (unsigned char *)sbi->ckpt;
-- 
2.4.2




Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix unsafe list usage

2015-09-11 Thread Nicholas A. Bellinger
On Wed, 2015-09-09 at 15:03 -0700, Nicholas A. Bellinger wrote:
> On Wed, 2015-09-09 at 19:59 +0530, Chaitra Basappa wrote:
> > From: Sreekanth Reddy [mailto:sreekanth.re...@avagotech.com]
> > Sent: Tuesday, September 08, 2015 5:26 PM
> > To: Nicholas A. Bellinger
> > Cc: linux-scsi; linux-kernel; James Bottomley; Calvin Owens; Christoph
> > Hellwig; MPT-FusionLinux.pdl; kernel-team; Nicholas Bellinger; Chaitra
> > Basappa
> > Subject: Re: [PATCH-v2 1/2] mpt3sas: Refcount sas_device objects and fix
> > unsafe list usage
> > 
> > On Sun, Aug 30, 2015 at 1:24 PM, Nicholas A. Bellinger 
> > wrote:
> > > From: Nicholas Bellinger 
> > >
> > > These objects can be referenced concurrently throughout the driver, we
> > > need a way to make sure threads can't delete them out from under each
> > > other. This patch adds the refcount, and refactors the code to use it.
> > >
> > > Additionally, we cannot iterate over the sas_device_list without
> > > holding the lock, or we risk corrupting random memory if items are
> > > added or deleted as we iterate. This patch refactors
> > > _scsih_probe_sas() to use the sas_device_list in a safe way.
> > >
> > > This patch is a port of Calvin's PATCH-v4 for mpt2sas code, atop
> > > mpt3sas changes in scsi.git/for-next.
> > >
> > > Cc: Calvin Owens 
> > > Cc: Christoph Hellwig 
> > > Cc: Sreekanth Reddy 
> > > Cc: MPT-FusionLinux.pdl 
> > > Signed-off-by: Nicholas Bellinger 
> > > ---
> > >  drivers/scsi/mpt3sas/mpt3sas_base.h  |  25 +-
> > >  drivers/scsi/mpt3sas/mpt3sas_scsih.c | 479
> > > +--
> > >  drivers/scsi/mpt3sas/mpt3sas_transport.c |  18 +-
> > >  3 files changed, 364 insertions(+), 158 deletions(-)
> > >
> > > @@ -2763,7 +2874,7 @@ _scsih_block_io_device(struct MPT3SAS_ADAPTER *ioc,
> > > u16 handle)
> > > struct scsi_device *sdev;
> > > struct _sas_device *sas_device;
> > >
> > 
> > [Sreekanth] Here sas_device_lock spin lock needs to be acquired before
> > calling
> >   __mpt3sas_get_sdev_by_addr() function.
> > 
> > [Chaitra] Here, calling "mpt3sas_get_sdev_by_handle()" instead of
> > "__mpt3sas_get_sdev_by_handle()" fixes an "invalid page access"
> > type of kernel panic.
> > 
> > > -   sas_device = _scsih_sas_device_find_by_handle(ioc, handle);
> > > +   sas_device = __mpt3sas_get_sdev_by_handle(ioc, handle);
> > > if (!sas_device)
> > > return;
> > >
> 
> Whoops, missed this comment in _scsih_block_io_device() from Sreekanth's
> earlier reply.
> 
> Here's the updated incremental patch atop target-pending/for-next-merge
> to use the protected callers for both cases.
> 
> Please review + ACK ASAP.

The mpt3sas -v2 series + v4.3-rc0 breakage incremental patch here made
it into linux-next-09102015, and at this point I don't see a scenario
where keeping around the broken list_head dereferences makes sense.

So that said, I'd like to send a target-pending/for-next-merge PULL
request out to Linus in the next 48 hours.

Any objections from Avago folks..?

Thank you,

--nab
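
For anyone skimming the thread, the locking point raised by Sreekanth and Chaitra can be sketched as follows. This is a user-space illustration under stated assumptions, not the mpt3sas code: struct fake_lock, struct fake_dev, lookup_locked(), lookup() and put_dev() are hypothetical names. lookup_locked() plays the role of the double-underscore helper that expects the caller to hold the lock, and lookup() plays the role of the protected wrapper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy lock that only records whether it is held, so misuse trips an assert. */
struct fake_lock {
	bool held;
};

static void fake_lock_acquire(struct fake_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void fake_lock_release(struct fake_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct fake_dev {
	unsigned int handle;
	int refcount;
};

struct fake_adapter {
	struct fake_lock lock;
	struct fake_dev *devs;
	size_t ndevs;
};

/* Caller must hold the lock. Returns the device with an extra reference, or NULL. */
static struct fake_dev *lookup_locked(struct fake_adapter *a, unsigned int handle)
{
	size_t i;

	assert(a->lock.held);
	for (i = 0; i < a->ndevs; i++) {
		if (a->devs[i].handle == handle) {
			a->devs[i].refcount++;
			return &a->devs[i];
		}
	}
	return NULL;
}

/* Protected wrapper: takes the lock around the lookup. */
static struct fake_dev *lookup(struct fake_adapter *a, unsigned int handle)
{
	struct fake_dev *d;

	fake_lock_acquire(&a->lock);
	d = lookup_locked(a, handle);
	fake_lock_release(&a->lock);
	return d;
}

/* Drop the reference taken by a successful lookup. */
static void put_dev(struct fake_dev *d)
{
	assert(d->refcount > 0);
	d->refcount--;
}
```

A caller that does not already hold the lock uses lookup() and pairs it with put_dev(); calling lookup_locked() without the lock trips the assertion, which is the toy equivalent of the unsafe list walk discussed above.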



Re: [PATCH 2/3] net: irda: pxaficp_ir: convert to readl and writel

2015-09-11 Thread Robert Jarzmik
Petr Cvek  writes:

>> Should have been posted to linux arm kernel mailing list, unless my mailer
>> failed ...
>> 
> Searching for:
>
>   "ARM: pxa: add resources to pxaficp_ir"
>
> did not found anything, same was for "ficp" in the 
> linux-arm-kernel/netdev/linux-kernel
> mailing list archive.
Ah ok, I'll resend it then.

>>> BTW This patch required update of my kernel repo. It seems that my:
>>>
>>> magician.c patches + ficp patch + new dma engine
>>>
>>> does not work for me at all. Kernel throws some panic about interrupts and 
>>> then
>>> it ends in an infinite stack dumping loop. Fault occurs before rootfs is
>>> mounted, so probably around MMC init (with removed SD card it fails normally
>>> with no rootfs found error).
>> Could you send me (privately) the stack you're getting please. This is 
>> something
>> I'd like to catch up early in the -rc releases.
>
> Well this will be problem as I cannot save anything to an SD card after and
> during the failure.  Only viable interfaces would be earlycon on an infraport
> or high speed camera on LCD :-).
Ah just as on my mioa701. I ended up soldering a JTAG cable :)

> But I was able to revert this commit:
>
>   6464b71409511939efce1ae4fb4ec6e3483b11b2 mmc: pxamci: switch over to dmaengine use
>
> and after that I am able to boot.
Okay. I'll try to reproduce this failure then. If I fail, well, before using the
JTAG cable, I used another trick: I was taking a movie of the LCD with a
smartphone, and it worked. It was a horrible thing to decipher ... Let's hope
I'll be lucky on one of my platforms.

>> Now with your stack, could you also give me the upstream commit id of the 
>> tip of
>> the tree you're using (before your patches) please ?
>
> It is probably irelevant now, but for complete information:
>
> Discovered on my working repo: mainline 
> b8889c4fc6ba03e289cec6a4d692f6f080a55e53
> Still present on fresh downloaded: linux-next
> 22dc312d56ba077db27a9798b340e7d161f1df05
Ok, thanks.

>> And it is true I have not tested the rootfs special case, where drivers are 
>> not
>> yet initialized (and more specifically gpio and interrupt chip). Your 
>> backtrace
>> should tell me if you fall into this category of issues ... but I digress, 
>> this
>> has no link with pxaficp.
>
> Should I start new thread? (same bug can be present in the FICP too)
Yes, this pxamci bothers me, it deserves a thread.

> I will try to configure an initrd rootfs this should create more ways to save
> kernel log.
Great idea.

>
> Anyway after mmc dma revert I was still not able to start FICP. There is an 
> error:
>
>   Unable to handle kernel paging request at virtual address 32e4
>
> from pxa_irda_startup() and it seems it is caused by register definitions. 
> For example:
>
>   writel_relaxed((val), (irda)->stuart_base + (off));
>
> is called by
>
>   stuart_writel(si, 0, STIER);
>
> but STIER is not just an offset, but full register address:
>   
>   __REG(0x4074)
>
> So the definition should be changed, unless there is another patch I did not
> received (in that case, send me full patchset again please) :-).
Agreed, this is a bug in this patch. With this fix, is the pxaficp working or do
you need a bit more time to experiment ?

Cheers.

-- 
Robert
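
To make the register definition issue above concrete, here is a small user-space sketch. It assumes a toy register window; struct fake_port, fake_writel(), fake_readl() and all FAKE_* values are hypothetical and are not the PXA register map. The point is only that an accessor which adds its argument to a mapped base needs an offset, and a full register address lands far outside the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FAKE_WINDOW_SIZE 0x40u		/* size of the mapped window (hypothetical) */
#define FAKE_PHYS_BASE   0x1000u	/* physical base of the block (hypothetical) */
#define FAKE_IER_OFFSET  0x04u		/* register offset inside the window (hypothetical) */
#define FAKE_IER_ABS     (FAKE_PHYS_BASE + FAKE_IER_OFFSET)	/* full-address form */

struct fake_port {
	uint32_t window[FAKE_WINDOW_SIZE / 4];	/* stands in for the ioremap()ed area */
};

/* An offset is usable only if it is word aligned and inside the window. */
static bool fake_offset_ok(uint32_t off)
{
	return (off % 4 == 0) && (off < FAKE_WINDOW_SIZE);
}

/* Accessors in the style of base + offset helpers. */
static void fake_writel(struct fake_port *p, uint32_t val, uint32_t off)
{
	assert(fake_offset_ok(off));
	p->window[off / 4] = val;
}

static uint32_t fake_readl(const struct fake_port *p, uint32_t off)
{
	assert(fake_offset_ok(off));
	return p->window[off / 4];
}
```

Passing FAKE_IER_OFFSET works, while FAKE_IER_ABS fails the range check. In the kernel there is no such check, so the same mistake shows up as a paging request fault like the one reported.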


[PATCH 1/7] f2fs: reorganize f2fs_map_blocks

2015-09-11 Thread Chao Yu
In this patch, we reorganize f2fs_map_blocks to make the block mapping
flow clearer, using the following structure:

	/* check status of mapping */

	if (unmapped) {
		/* blkaddr == NULL_ADDR || blkaddr == NEW_ADDR */

		if (create) {
			/* write path, handle dio write case here */
			alloc_and_map;
		} else {
			/*
			 * handle read cases from all call paths:
			 * 1. generic read;
			 * 2. dio read;
			 * 3. fiemap;
			 * 4. bmap
			 */
		}
	}

	/* map buffer_header */

Besides, this patch correctly handles a previously missed case for dio
write: if __allocate_data_blocks fails, f2fs_map_blocks did not allocate
blocks for the preallocated (NEW_ADDR) blocks and instead returned an
unmapped buffer head, which made the dio write fail.

Signed-off-by: Chao Yu 
---
 fs/f2fs/data.c | 84 ++
 1 file changed, 44 insertions(+), 40 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a82abe9..a737ca5 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -595,40 +595,36 @@ static int f2fs_map_blocks(struct inode *inode, struct 
f2fs_map_blocks *map,
err = 0;
goto unlock_out;
}
-   if (dn.data_blkaddr == NEW_ADDR) {
-   if (flag == F2FS_GET_BLOCK_BMAP) {
-   err = -ENOENT;
-   goto put_out;
-   } else if (flag == F2FS_GET_BLOCK_READ ||
-   flag == F2FS_GET_BLOCK_DIO) {
-   goto put_out;
+
+   if (dn.data_blkaddr == NEW_ADDR || dn.data_blkaddr == NULL_ADDR) {
+   if (create) {
+   err = __allocate_data_block(&dn);
+   if (err)
+   goto put_out;
+   allocated = true;
+   map->m_flags = F2FS_MAP_NEW;
+   } else {
+   if (flag != F2FS_GET_BLOCK_FIEMAP ||
+   dn.data_blkaddr != NEW_ADDR) {
+   if (flag == F2FS_GET_BLOCK_BMAP)
+   err = -ENOENT;
+   goto put_out;
+   }
+
+   /*
+* preallocated unwritten block should be mapped
+* for fiemap.
+*/
+   if (dn.data_blkaddr == NEW_ADDR)
+   map->m_flags = F2FS_MAP_UNWRITTEN;
}
-   /*
-* if it is in fiemap call path (flag = F2FS_GET_BLOCK_FIEMAP),
-* mark it as mapped and unwritten block.
-*/
}
 
-   if (dn.data_blkaddr != NULL_ADDR) {
-   map->m_flags = F2FS_MAP_MAPPED;
-   map->m_pblk = dn.data_blkaddr;
-   if (dn.data_blkaddr == NEW_ADDR)
-   map->m_flags |= F2FS_MAP_UNWRITTEN;
-   } else if (create) {
-   err = __allocate_data_block(&dn);
-   if (err)
-   goto put_out;
-   allocated = true;
-   map->m_flags = F2FS_MAP_NEW | F2FS_MAP_MAPPED;
-   map->m_pblk = dn.data_blkaddr;
-   } else {
-   if (flag == F2FS_GET_BLOCK_BMAP)
-   err = -ENOENT;
-   goto put_out;
-   }
+   map->m_flags |= F2FS_MAP_MAPPED;
+   map->m_pblk = dn.data_blkaddr;
+   map->m_len = 1;
 
end_offset = ADDRS_PER_PAGE(dn.node_page, F2FS_I(inode));
-   map->m_len = 1;
dn.ofs_in_node++;
pgofs++;
 
@@ -647,23 +643,31 @@ get_next:
goto unlock_out;
}
 
-   if (dn.data_blkaddr == NEW_ADDR &&
-   flag != F2FS_GET_BLOCK_FIEMAP)
-   goto put_out;
-
end_offset = ADDRS_PER_PAGE(dn.node_page, F2FS_I(inode));
}
 
if (maxblocks > map->m_len) {
block_t blkaddr = datablock_addr(dn.node_page, dn.ofs_in_node);
-   if (blkaddr == NULL_ADDR && create) {
-   err = __allocate_data_block(&dn);
-   if (err)
-   goto sync_out;
-   allocated = true;
-   map->m_flags |= F2FS_MAP_NEW;
-   blkaddr = dn.data_blkaddr;
+
+   if (blkaddr == NEW_ADDR || blkaddr == NULL_ADDR) {
+   if (create) {
+   err = __allocate_data_block(&dn);
+   if (err)
+   goto sync_out;
+   allocated = true;
+   map->m_flags |= F2FS_MAP_NEW;

Re: [PATCH v2 1/1] ARM: dts: sun7i: Enable axp209 driver on olinuxino lime2

2015-09-11 Thread Olliver Schinagl
Hey chen,

On September 11, 2015 4:57:03 AM CEST, Chen-Yu Tsai  wrote:
>On Fri, Sep 11, 2015 at 2:13 AM, Maxime Ripard
> wrote:
>> Hi Oliver,
>>
>> On Wed, Sep 09, 2015 at 03:26:44PM +0200, Olliver Schinagl wrote:
>>> The Olimex OLinuXino Lime2 uses the same AXP209 as was recently
>>> introduced this driver for its power regulation.
>>>
>>> Signed-off-by: Olliver Schinagl 
>>> ---
>>>  arch/arm/boot/dts/sun7i-a20-olinuxino-lime2.dts | 87
>+
>>>  1 file changed, 31 insertions(+), 56 deletions(-)
>>>
>>> diff --git a/arch/arm/boot/dts/sun7i-a20-olinuxino-lime2.dts
>b/arch/arm/boot/dts/sun7i-a20-olinuxino-lime2.dts
>>> index d5c796c..dd90a1d 100644
>>> --- a/arch/arm/boot/dts/sun7i-a20-olinuxino-lime2.dts
>>> +++ b/arch/arm/boot/dts/sun7i-a20-olinuxino-lime2.dts
>>> @@ -71,14 +71,6 @@
>>>   default-state = "on";
>>>   };
>>>   };
>>> -
>>> - reg_axp_ipsout: axp_ipsout {
>>> - compatible = "regulator-fixed";
>>> - regulator-name = "axp-ipsout";
>>> - regulator-min-microvolt = <500>;
>>> - regulator-max-microvolt = <500>;
>>> - regulator-always-on;
>>> - };
>>
>> Why are you removing that regulator?
>
>This is really just a placeholder, rather than an actual regulator.
>
>From the bindings:
>
>- -supply: a phandle to the regulator supply node. May be
>omitted if
>inputs are unregulated, such as using the IPSOUT output
>  from the PMIC.
>
>
>>>  };
>>>
>>>   {
>>> @@ -86,6 +78,10 @@
>>>   status = "okay";
>>>  };
>>>
>>> + {
>>> + cpu-supply = <_dcdc2>;
>>> +};
>>> +
>>>   {
>>>   status = "okay";
>>>  };
>>> @@ -112,57 +108,9 @@
>>>   status = "okay";
>>>
>>>   axp209: pmic@34 {
>>> - compatible = "x-powers,axp209";
>>>   reg = <0x34>;
>>>   interrupt-parent = <_intc>;
>>>   interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
>>> -
>>> - interrupt-controller;
>>> - #interrupt-cells = <1>;
>>> -
>>> - acin-supply = <&reg_axp_ipsout>;
>>> - vin2-supply = <&reg_axp_ipsout>;
>>> - vin3-supply = <&reg_axp_ipsout>;
>>> - ldo24in-supply = <&reg_axp_ipsout>;
>>> - ldo3in-supply = <&reg_axp_ipsout>;
>>
>> And these supplies?
>>
>>> - regulators {
>>> - vdd_rtc: ldo1 {
>>> - regulator-min-microvolt = <130>;
>>> - regulator-max-microvolt = <130>;
>>> - regulator-always-on;
>>> - };
>>> -
>>> - avcc: ldo2 {
>>> - regulator-min-microvolt = <180>;
>>> - regulator-max-microvolt = <330>;
>>> - regulator-always-on;
>>> - };
>>> -
>>> - vcc_csi0: ldo3 {
>>> - regulator-min-microvolt = <70>;
>>> - regulator-max-microvolt = <350>;
>>> - regulator-always-on;
>>> - };
>>> -
>>> - vcc_csi1: ldo4 {
>>> - regulator-min-microvolt = <125>;
>>> - regulator-max-microvolt = <330>;
>>> - regulator-always-on;
>>> - };
>>> -
>>> - vdd_cpu: dcdc2 {
>>> - regulator-min-microvolt = <70>;
>>> - regulator-max-microvolt = <2275000>;
>>> - regulator-always-on;
>>> - };
>>> -
>>> - vdd_int: dcdc3 {
>>> - regulator-min-microvolt = <70>;
>>> - regulator-max-microvolt = <350>;
>>> - regulator-always-on;
>>> - };
>>> - };
>>>   };
>>>  };
>>>
>>> @@ -243,6 +191,33 @@
>>>   status = "okay";
>>>  };
>>>
>>> +#include "axp209.dtsi"
>>> +
>>> +_dcdc2 {
>>> + regulator-always-on;
>>> + regulator-min-microvolt = <100>;
>>> + regulator-max-microvolt = <145>;
>>
>> This is outside of the operating voltages of the CPU.
>>
>>> + regulator-name = "vdd-cpu";
>>> +};
>>> +
>>> +_dcdc3 {
>>> + regulator-always-on;
>>> + regulator-min-microvolt = <100>;
>>> + regulator-max-microvolt = <140>;
>>> + regulator-name = "vdd-int-dll";
>>> +};
>>> +
>>> +_ldo1 {
>>> + regulator-name = "vdd-rtc";
>>> +};
>>> +
>>> +_ldo2 {
>>> + regulator-always-on;
>>> + regulator-min-microvolt = <300>;
>>> + regulator-max-microvolt = <300>;
>>> + regulator-name = "avcc";
>>
>> You're changing the boundaries, why?
>
>The old boundaries seem to be from a very 

[PATCH 2/7] f2fs: do in batches truncation in truncate_hole

2015-09-11 Thread Chao Yu
truncate_data_blocks_range can truncate blocks in batches, so that the
updates to dnode page content, dnode page status, extent cache and block
count are all done together.

But previously, truncate_hole() always truncated one block of a dnode
page at a time by invoking truncate_data_blocks_range(,1), which made
things slow.

This patch changes truncate_hole() to truncate all target blocks of one
direct node in a single truncate_data_blocks_range call, which makes
the punch hole operation in ->fallocate more efficient.

Signed-off-by: Chao Yu 
---
 fs/f2fs/file.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index b2fab9e..ac97f78 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -738,23 +738,33 @@ static int fill_zero(struct inode *inode, pgoff_t index,
 
 int truncate_hole(struct inode *inode, pgoff_t pg_start, pgoff_t pg_end)
 {
-   pgoff_t index;
+   pgoff_t count, index = pg_start;
int err;
 
-   for (index = pg_start; index < pg_end; index++) {
+   while (index < pg_end) {
struct dnode_of_data dn;
+   pgoff_t end_offset;
 
set_new_dnode(&dn, inode, NULL, NULL, 0);
err = get_dnode_of_data(&dn, index, LOOKUP_NODE);
if (err) {
-   if (err == -ENOENT)
+   if (err == -ENOENT) {
+   index = PGOFS_OF_NEXT_DNODE(index,
+   F2FS_I(inode));
continue;
+   }
return err;
}
 
-   if (dn.data_blkaddr != NULL_ADDR)
-   truncate_data_blocks_range(&dn, 1);
+   end_offset = ADDRS_PER_PAGE(dn.node_page, F2FS_I(inode));
+   count = min(end_offset - dn.ofs_in_node, pg_end - index);
+
+   f2fs_bug_on(F2FS_I_SB(inode), count == 0 || count > end_offset);
+
+   truncate_data_blocks_range(&dn, count);
f2fs_put_dnode(&dn);
+
+   index += count;
}
return 0;
 }
-- 
2.4.2
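
The batch size computation in this patch can be tried out in isolation. The sketch below is user-space C under a simplifying assumption that every node holds the same number of block addresses and that every node exists; fake_pgoff_t, batch_count() and count_batches() are hypothetical names and not f2fs functions.

```c
#include <assert.h>

typedef unsigned long fake_pgoff_t;

/*
 * Number of blocks to truncate in one call: bounded by what is left in the
 * current node and by what is left in the requested range.
 */
static fake_pgoff_t batch_count(fake_pgoff_t end_offset, fake_pgoff_t ofs_in_node,
				fake_pgoff_t pg_end, fake_pgoff_t index)
{
	fake_pgoff_t in_node, in_range;

	assert(ofs_in_node < end_offset);
	assert(index < pg_end);

	in_node = end_offset - ofs_in_node;
	in_range = pg_end - index;
	return in_node < in_range ? in_node : in_range;
}

/* Walk [pg_start, pg_end) and count how many batched calls are needed. */
static unsigned int count_batches(fake_pgoff_t pg_start, fake_pgoff_t pg_end,
				  fake_pgoff_t addrs_per_node)
{
	unsigned int batches = 0;
	fake_pgoff_t index = pg_start;

	while (index < pg_end) {
		/* Assumption: uniform nodes, so the in-node offset is a modulo. */
		fake_pgoff_t ofs = index % addrs_per_node;

		index += batch_count(addrs_per_node, ofs, pg_end, index);
		batches++;
	}
	return batches;
}
```

For a range of pages 5 to 25 with 10 addresses per node, this takes three batched calls (5, 10 and 5 blocks) where the one-block-at-a-time loop would take twenty.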




[PATCH 5/7] f2fs: enhance multithread dio write performance

2015-09-11 Thread Chao Yu
When dio writes run concurrently, performance is low because Thread A's
allocation of multiple contiguous blocks can be broken up by Thread B.
There are two cases:
 - In Thread B, we may change the current segment to a new segment for LFS
   allocation if we dio write at the beginning of the file.
 - In Thread B, we may allocate blocks in the middle of Thread A's
   allocation, which makes the blocks allocated in Thread A
   discontinuous.

This patch takes the writepages mutex lock to make block allocation in dio
write atomic and avoid the above issues.

Test environment:
ubuntu os with linux kernel 4.2+, intel i7-3770, 16g memory,
32g kingston sd card.

fio --name seqw --ioengine=sync --invalidate=1 --rw=write --directory=/mnt/f2fs 
--filesize=256m --size=16m --bs=2m --direct=1
--numjobs=10

before:
  WRITE: io=163840KB, aggrb=3145KB/s, minb=314KB/s, maxb=411KB/s, 
mint=39836msec, maxt=52083msec

patched:
  WRITE: io=163840KB, aggrb=10033KB/s, minb=1003KB/s, maxb=1124KB/s, 
mint=14565msec, maxt=16329msec

Signed-off-by: Chao Yu 
---
 fs/f2fs/data.c | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a737ca5..a0a5849 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1536,7 +1536,9 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, struct 
iov_iter *iter,
struct file *file = iocb->ki_filp;
struct address_space *mapping = file->f_mapping;
struct inode *inode = mapping->host;
+   struct f2fs_sb_info *sbi = F2FS_I_SB(inode);
size_t count = iov_iter_count(iter);
+   int rw = iov_iter_rw(iter);
int err;
 
/* we don't need to use inline_data strictly */
@@ -1555,12 +1557,17 @@ static ssize_t f2fs_direct_IO(struct kiocb *iocb, 
struct iov_iter *iter,
 
trace_f2fs_direct_IO_enter(inode, offset, count, iov_iter_rw(iter));
 
-   if (iov_iter_rw(iter) == WRITE)
+   if (rw == WRITE) {
+   mutex_lock(&sbi->writepages);
__allocate_data_blocks(inode, offset, count);
+   }
 
err = blockdev_direct_IO(iocb, inode, iter, offset, get_data_block_dio);
-   if (err < 0 && iov_iter_rw(iter) == WRITE)
-   f2fs_write_failed(mapping, offset + count);
+   if (rw == WRITE) {
+   mutex_unlock(&sbi->writepages);
+   if (err < 0)
+   f2fs_write_failed(mapping, offset + count);
+   }
 
trace_f2fs_direct_IO_exit(inode, offset, count, iov_iter_rw(iter), err);
 
-- 
2.4.2
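
The shape of the write path described in this patch can be sketched in user-space C. Everything here is hypothetical (struct fake_mutex, struct fake_sb, fake_direct_write()); the sketch also assumes, as the pre-patch code does, that only a negative return value from the I/O step means failure, since a successful direct write returns the number of bytes written.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy mutex that only records whether it is held. */
struct fake_mutex {
	bool held;
};

struct fake_sb {
	struct fake_mutex writepages;
	int failed_cleanups;	/* how often the failure path ran */
};

static void fake_mutex_lock(struct fake_mutex *m)
{
	assert(!m->held);
	m->held = true;
}

static void fake_mutex_unlock(struct fake_mutex *m)
{
	assert(m->held);
	m->held = false;
}

/*
 * Write path: allocation and I/O happen under one lock so another writer
 * cannot interleave its allocations. io_result stands in for the value
 * returned by the direct I/O step (bytes written, or a negative error).
 */
static long fake_direct_write(struct fake_sb *sbi, long io_result)
{
	fake_mutex_lock(&sbi->writepages);
	/* block allocation and the I/O itself are omitted in this sketch */
	fake_mutex_unlock(&sbi->writepages);

	if (io_result < 0)
		sbi->failed_cleanups++;	/* stands in for the write-failed cleanup */
	return io_result;
}
```

A successful write of 4096 bytes leaves failed_cleanups untouched, while a negative result runs the cleanup once; in both cases the lock is released before returning.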




Re: [PATCH v2 4/4] serial: samsung: Fix UART status handling in DMA mode

2015-09-11 Thread Krzysztof Kozlowski
On 10.09.2015 22:41, Robert Baldyga wrote:
> This patch fixes UART status handling in DMA mode.

I don't see any changes here. You did not respond to my comment either.

The code itself looks good... except for a locking issue, but I don't know
what's the cause. It may not be related to the patchset and maybe just not
all of the issues are fixed yet. Anyway I'll describe it in 3/4.

Best regards,
Krzysztof

> For this purpose we
> use s3c24xx_serial_rx_drain_fifo() instead of uart_rx_drain_fifo(), which
> does the same thing plus checks for special conditions (such as 'break').
> 
> Thanks to this we have, for example, Magic SysRq handling, which was
> missing in DMA mode so far. Since we can use UART in DMA mode as serial
> console, this is a quite important improvement.
> 
> This change additionally simplifies RX handling code, as we no longer
> need uart_rx_drain_fifo() function, so we can remove it.
> 
> Reported-by: Marek Szyprowski 
> Signed-off-by: Robert Baldyga 
> ---
>  drivers/tty/serial/samsung.c | 30 +++---
>  1 file changed, 3 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/tty/serial/samsung.c b/drivers/tty/serial/samsung.c
> index 1d7dd86..d72cd73 100644
> --- a/drivers/tty/serial/samsung.c
> +++ b/drivers/tty/serial/samsung.c
> @@ -385,32 +385,6 @@ static void s3c24xx_uart_copy_rx_to_tty(struct 
> s3c24xx_uart_port *ourport,
>   }
>  }
>  
> -static int s3c24xx_serial_rx_fifocnt(struct s3c24xx_uart_port *ourport,
> -  unsigned long ufstat);
> -
> -static void uart_rx_drain_fifo(struct s3c24xx_uart_port *ourport)
> -{
> - struct uart_port *port = &ourport->port;
> - struct tty_port *tty = &port->state->port;
> - unsigned int ch, ufstat;
> - unsigned int count;
> -
> - ufstat = rd_regl(port, S3C2410_UFSTAT);
> - count = s3c24xx_serial_rx_fifocnt(ourport, ufstat);
> -
> - if (!count)
> - return;
> -
> - while (count-- > 0) {
> - ch = rd_regb(port, S3C2410_URXH);
> -
> - ourport->port.icount.rx++;
> - tty_insert_flip_char(tty, ch, TTY_NORMAL);
> - }
> -
> - tty_flip_buffer_push(tty);
> -}
> -
>  static void s3c24xx_serial_stop_rx(struct uart_port *port)
>  {
>   struct s3c24xx_uart_port *ourport = to_ourport(port);
> @@ -573,6 +547,8 @@ static void enable_rx_pio(struct s3c24xx_uart_port 
> *ourport)
>   ourport->rx_mode = S3C24XX_RX_PIO;
>  }
>  
> +static void s3c24xx_serial_rx_drain_fifo(struct s3c24xx_uart_port *ourport);
> +
>  static irqreturn_t s3c24xx_serial_rx_chars_dma(void *dev_id)
>  {
>   unsigned int utrstat, ufstat, received;
> @@ -606,7 +582,7 @@ static irqreturn_t s3c24xx_serial_rx_chars_dma(void 
> *dev_id)
>   enable_rx_pio(ourport);
>   }
>  
> - uart_rx_drain_fifo(ourport);
> + s3c24xx_serial_rx_drain_fifo(ourport);
>  
>   if (tty) {
>   tty_flip_buffer_push(t);
> 



Re: [PATCH v2 4/4] serial: samsung: Fix UART status handling in DMA mode

2015-09-11 Thread Robert Baldyga
On 09/11/2015 08:34 AM, Krzysztof Kozlowski wrote:
> On 11.09.2015 15:32, Robert Baldyga wrote:
>> On 09/11/2015 08:07 AM, Krzysztof Kozlowski wrote:
>>> On 10.09.2015 22:41, Robert Baldyga wrote:
>>>> This patch fixes UART status handling in DMA mode.
>>>
>>> I don't see any changes here. You did not respond to my comment either.
>>>
>>> The code itself looks good... except for a locking issue, but I don't know
>>> what the cause is. It may not be related to the patchset; maybe just not
>>> all of the issues are fixed yet. Anyway, I'll describe it in 3/4.
>>>
>>> Best regards,
>>> Krzysztof
>>>
>>>> For this purpose we
>>>> use s3c24xx_serial_rx_drain_fifo() instead of uart_rx_drain_fifo(), which
>>>> does the same thing plus checks for special conditions (such as 'break').
>>>>
>>>> Thanks to this we have, for example, Magic SysRq handling, which was
>>>> missing in DMA mode so far. Since we can use UART in DMA mode as serial
>>>> console, this is a quite important improvement.
>>>>
>>>> This change additionally simplifies RX handling code, as we no longer
>>>> need uart_rx_drain_fifo() function, so we can remove it.
>>>>
>>>> Reported-by: Marek Szyprowski 
>>>> Signed-off-by: Robert Baldyga 
>>>> ---
>>>>  drivers/tty/serial/samsung.c | 30 +++---
>>>>  1 file changed, 3 insertions(+), 27 deletions(-)
>>>>
>>>> diff --git a/drivers/tty/serial/samsung.c b/drivers/tty/serial/samsung.c
>>>> index 1d7dd86..d72cd73 100644
>>>> --- a/drivers/tty/serial/samsung.c
>>>> +++ b/drivers/tty/serial/samsung.c
>>>> @@ -385,32 +385,6 @@ static void s3c24xx_uart_copy_rx_to_tty(struct 
>>>> s3c24xx_uart_port *ourport,
>>>>      }
>>>>  }
>>>>
>>>> -static int s3c24xx_serial_rx_fifocnt(struct s3c24xx_uart_port *ourport,
>>>> -                                     unsigned long ufstat);
>>>> -
>>>> -static void uart_rx_drain_fifo(struct s3c24xx_uart_port *ourport)
>>>> -{
>>>> -    struct uart_port *port = >port;
>>>> -    struct tty_port *tty = >state->port;
>>>> -    unsigned int ch, ufstat;
>>>> -    unsigned int count;
>>>> -
>>>> -    ufstat = rd_regl(port, S3C2410_UFSTAT);
>>>> -    count = s3c24xx_serial_rx_fifocnt(ourport, ufstat);
>>>> -
>>>> -    if (!count)
>>>> -        return;
>>>> -
>>>> -    while (count-- > 0) {
>>>> -        ch = rd_regb(port, S3C2410_URXH);
>>>> -
>>>> -        ourport->port.icount.rx++;
>>>> -        tty_insert_flip_char(tty, ch, TTY_NORMAL);
>>>> -    }
>>>> -
>>>> -    tty_flip_buffer_push(tty);
>>>> -}
>>>> -
>>>>  static void s3c24xx_serial_stop_rx(struct uart_port *port)
>>>>  {
>>>>      struct s3c24xx_uart_port *ourport = to_ourport(port);
>>>> @@ -573,6 +547,8 @@ static void enable_rx_pio(struct s3c24xx_uart_port 
>>>> *ourport)
>>>>      ourport->rx_mode = S3C24XX_RX_PIO;
>>>>  }
>>>>
>>>> +static void s3c24xx_serial_rx_drain_fifo(struct s3c24xx_uart_port 
>>>> *ourport);
>>>> +
>>>>  static irqreturn_t s3c24xx_serial_rx_chars_dma(void *dev_id)
>>>>  {
>>>>      unsigned int utrstat, ufstat, received;
>>>> @@ -606,7 +582,7 @@ static irqreturn_t s3c24xx_serial_rx_chars_dma(void 
>>>> *dev_id)
>>>>          enable_rx_pio(ourport);
>>>>      }
>>>>
>>
>> The essence of the change is here. We use another method for draining the
>> FIFO. Instead of just putting the characters into the tty buffer, we
>> additionally check for special conditions, and that's the improvement.
> 
> Hm? I was referring to my comment - I did not see any changes around
> "fixes" in commit message.

Ohh, I see :p I will describe it better ;)

Thanks,
Robert


[PATCH 7/7] f2fs: fix overflow of size calculation

2015-09-11 Thread Chao Yu
There is a potential overflow when calculating the size of an object: when
we left-shift index by PAGE_CACHE_SHIFT bits and the type of index is only
32 bits wide, as on 32-bit architectures, the left shift overflows,
e.g.:

pgoff_t index =  0x;
loff_t size = index << PAGE_CACHE_SHIFT;
size: 0xF000

So we should cast index to a 64-bit type to avoid this issue.
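
For illustration only, the truncation can be reproduced in user space. This is
a rough sketch, not part of the patch: uint32_t stands in for a 32-bit
pgoff_t, int64_t for loff_t, a shift of 12 (4 KiB pages) is assumed, and the
names DEMO_PAGE_SHIFT, size_without_cast() and size_with_cast() are made up
for the example.

```c
#include <stdint.h>

/* Assumption for this demo: 4 KiB pages, so the shift is 12. */
#define DEMO_PAGE_SHIFT 12

/* Shift performed in 32 bits: the high bits are lost before widening. */
static inline int64_t size_without_cast(uint32_t index)
{
	return (int64_t)(uint32_t)(index << DEMO_PAGE_SHIFT);
}

/* Widen first, then shift: the full byte offset is preserved. */
static inline int64_t size_with_cast(uint32_t index)
{
	return (int64_t)index << DEMO_PAGE_SHIFT;
}
```

With an index of 0x00100000, the first helper wraps around to 0 while the
second yields 4 GiB, which is the difference the casts in this patch are
meant to close.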

Signed-off-by: Chao Yu 
---
 fs/f2fs/data.c | 11 ++-
 fs/f2fs/debug.c| 12 ++--
 fs/f2fs/f2fs.h |  2 +-
 fs/f2fs/file.c | 18 ++
 fs/f2fs/recovery.c |  2 +-
 5 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index a0a5849..5b0513d 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -447,9 +447,9 @@ repeat:
lock_page(page);
}
 got_it:
-   if (new_i_size &&
-   i_size_read(inode) < ((index + 1) << PAGE_CACHE_SHIFT)) {
-   i_size_write(inode, ((index + 1) << PAGE_CACHE_SHIFT));
+   if (new_i_size && i_size_read(inode) <
+   ((loff_t)(index + 1) << PAGE_CACHE_SHIFT)) {
+   i_size_write(inode, ((loff_t)(index + 1) << PAGE_CACHE_SHIFT));
/* Only the directory inode sets new_i_size */
set_inode_flag(F2FS_I(inode), FI_UPDATE_DIR);
}
@@ -489,8 +489,9 @@ alloc:
/* update i_size */
fofs = start_bidx_of_node(ofs_of_node(dn->node_page), fi) +
dn->ofs_in_node;
-   if (i_size_read(dn->inode) < ((fofs + 1) << PAGE_CACHE_SHIFT))
-   i_size_write(dn->inode, ((fofs + 1) << PAGE_CACHE_SHIFT));
+   if (i_size_read(dn->inode) < ((loff_t)(fofs + 1) << PAGE_CACHE_SHIFT))
+   i_size_write(dn->inode,
+   ((loff_t)(fofs + 1) << PAGE_CACHE_SHIFT));
 
/* direct IO doesn't use extent cache to maximize the performance */
f2fs_drop_largest_extent(dn->inode, fofs);
diff --git a/fs/f2fs/debug.c b/fs/f2fs/debug.c
index d013d84..ebfcc40 100644
--- a/fs/f2fs/debug.c
+++ b/fs/f2fs/debug.c
@@ -198,9 +198,9 @@ get_cache:
 
si->page_mem = 0;
npages = NODE_MAPPING(sbi)->nrpages;
-   si->page_mem += npages << PAGE_CACHE_SHIFT;
+   si->page_mem += (unsigned long long)npages << PAGE_CACHE_SHIFT;
npages = META_MAPPING(sbi)->nrpages;
-   si->page_mem += npages << PAGE_CACHE_SHIFT;
+   si->page_mem += (unsigned long long)npages << PAGE_CACHE_SHIFT;
 }
 
 static int stat_show(struct seq_file *s, void *v)
@@ -333,13 +333,13 @@ static int stat_show(struct seq_file *s, void *v)
 
/* memory footprint */
update_mem_info(si->sbi);
-   seq_printf(s, "\nMemory: %u KB\n",
+   seq_printf(s, "\nMemory: %llu KB\n",
(si->base_mem + si->cache_mem + si->page_mem) >> 10);
-   seq_printf(s, "  - static: %u KB\n",
+   seq_printf(s, "  - static: %llu KB\n",
si->base_mem >> 10);
-   seq_printf(s, "  - cached: %u KB\n",
+   seq_printf(s, "  - cached: %llu KB\n",
si->cache_mem >> 10);
-   seq_printf(s, "  - paged : %u KB\n",
+   seq_printf(s, "  - paged : %llu KB\n",
si->page_mem >> 10);
}
mutex_unlock(_stat_mutex);
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index f1a90ff..79c38ad 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1844,7 +1844,7 @@ struct f2fs_stat_info {
unsigned int segment_count[2];
unsigned int block_count[2];
unsigned int inplace_count;
-   unsigned base_mem, cache_mem, page_mem;
+   unsigned long long base_mem, cache_mem, page_mem;
 };
 
 static inline struct f2fs_stat_info *F2FS_STAT(struct f2fs_sb_info *sbi)
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 9e03622..180b838 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -74,7 +74,8 @@ static int f2fs_vm_page_mkwrite(struct vm_area_struct *vma,
goto mapped;
 
/* page is wholly or partially inside EOF */
-   if (((page->index + 1) << PAGE_CACHE_SHIFT) > i_size_read(inode)) {
+   if (((loff_t)(page->index + 1) << PAGE_CACHE_SHIFT) >
+   i_size_read(inode)) {
unsigned offset;
offset = i_size_read(inode) & ~PAGE_CACHE_MASK;
zero_user_segment(page, offset, PAGE_CACHE_SIZE);
@@ -343,7 +344,7 @@ static loff_t f2fs_seek_block(struct file *file, loff_t 
offset, int whence)
 
dirty = __get_first_dirty_index(inode->i_mapping, pgofs, whence);
 
-   for (; data_ofs < isize; data_ofs = pgofs << PAGE_CACHE_SHIFT) {
+   for (; data_ofs < isize; data_ofs = (loff_t)pgofs << PAGE_CACHE_SHIFT) {
set_new_dnode(, inode, NULL, NULL, 0);
err = 

[PATCH 6/7] f2fs: fix incorrect searching position when shrinking extent cache

2015-09-11 Thread Chao Yu
When shrinking extent cache, we have two steps in the flow:
1) shrink objects which are unreferenced by inodes;
2) shrink objects from LRU list of extent cache.

In step 1, if we haven't shrunk enough objects, we try step 2. Before
that, however, we did not update the search position, which may point
to the last inode index in the global extent tree, so traversing all
inodes' extent trees fails to shrink any objects.

This patch fixes that by resetting the search position to the beginning
of the global extent tree.

Signed-off-by: Chao Yu 
---
 fs/f2fs/extent_cache.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/fs/f2fs/extent_cache.c b/fs/f2fs/extent_cache.c
index 1cd6c6c..a8b9aa2 100644
--- a/fs/f2fs/extent_cache.c
+++ b/fs/f2fs/extent_cache.c
@@ -652,6 +652,11 @@ unsigned int f2fs_shrink_extent_tree(struct f2fs_sb_info 
*sbi, int nr_shrink)
}
spin_unlock(>extent_lock);
 
+   /*
+* reset ino for searching victims from beginning of global extent tree.
+*/
+   ino = F2FS_ROOT_INO(sbi);
+
while ((found = radix_tree_gang_lookup(root,
(void **)treevec, ino, EXT_TREE_VEC_SIZE))) {
unsigned i;
-- 
2.4.2
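
As a rough user-space illustration of why the start position matters (not the
f2fs code): the sketch below replaces the radix tree with a sorted array, and
demo_gang_lookup() and demo_count_victims() are hypothetical helpers that
only mimic a gang lookup starting from a given id.

```c
#include <stddef.h>

/* Stand-in for a radix-tree gang lookup: copy up to max ids >= start. */
static inline size_t demo_gang_lookup(const unsigned *ids, size_t n,
				      unsigned start, unsigned *out,
				      size_t max)
{
	size_t found = 0;

	for (size_t i = 0; i < n && found < max; i++)
		if (ids[i] >= start)
			out[found++] = ids[i];
	return found;
}

/* Walk the sorted id set from 'start' and count the entries visited. */
static inline size_t demo_count_victims(const unsigned *ids, size_t n,
					unsigned start)
{
	unsigned vec[4];
	size_t total = 0, found;

	while ((found = demo_gang_lookup(ids, n, start, vec, 4))) {
		total += found;
		start = vec[found - 1] + 1;	/* continue after the last hit */
	}
	return total;
}
```

If the walk starts just past the last id, no entry is visited at all; if the
start is reset to the smallest id, every entry is visited. That is the effect
the added reset line is expected to have on the second loop.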




Loan

2015-09-11 Thread YesGrowth Loans®



 Good day,

   I am Mrs. Rose Butler, executive agent of a well-recognized, legitimate 
lending company known as YesGrowth Loans. Do you have bad credit, or 
do you need money to pay your bills? Our interest rate is 3%.

  Fill in the form below if you are interested.

 Full name:
 Gender:
 Amount needed:
 Duration:

 You can contact us at Tel: +447045734550, yesgrowth1...@gmail.com

  Kind regards,
 Mrs. Rose Butler
_
This message is intended only for recipients who are authorized to receive it. 
It contains confidential and/ or legally privileged information belonging to PT 
SEMEN INDONESIA (Persero) Tbk, therefore the authorized recipients shall 
protect this confidential information disclosed pursuant to provisions of Semen 
Indonesia's policy. If you are not a valid recipient of this message, please 
delete it from your system and/ or destroy all of the tangible material 
produced from the information herein together with all copies or reproductions 
thereof and notify the sender immediately. Please also be notified that any 
disclosure, copying, distribution or taking any action based on the contents of 
this message is strictly prohibited and may be unlawful


Re: [PATCH v2] ARM: rockchip: add reboot notifier

2015-09-11 Thread Caesar Wang


On 2015-09-11 11:40, Andy Yan wrote:

Hi Eddie:

On 2015-09-11 10:01, Eddie Cai wrote:

Hi Andy

2015-09-10 19:04 GMT+08:00 Andy Yan :

The Rockchip platform has a protocol for passing the kernel reboot
mode to the bootloader through some special registers when the system
reboots. This lets the bootloader take different actions according
to the kernel reboot mode. For example, the command
"reboot loader" reboots the board into rockusb mode, which is
a very convenient way to put the board into download mode.
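
A possible reading of the tag/type layout described in the loader.h comment
quoted further down ("high 24 bits is tag, low 8 bits is type") is sketched
here. How the patch itself composes the value is not visible in the quoted
text, so the helpers demo_make_flag(), demo_flag_is_valid() and
demo_flag_type() are hypothetical and only show the bit layout.

```c
#include <stdint.h>

/* From the quoted loader.h: high 24 bits are the tag, low 8 bits the type. */
#define SYS_LOADER_REBOOT_FLAG 0x5242C300u

/* Hypothetical helper: combine the tag with a boot type in the low byte. */
static inline uint32_t demo_make_flag(uint32_t boot_type)
{
	return SYS_LOADER_REBOOT_FLAG | (boot_type & 0xFFu);
}

/* Hypothetical helper: a value is a reboot flag if its tag bits match. */
static inline int demo_flag_is_valid(uint32_t flag)
{
	return (flag & 0xFFFFFF00u) == SYS_LOADER_REBOOT_FLAG;
}

/* Hypothetical helper: extract the boot type from the low 8 bits. */
static inline uint32_t demo_flag_type(uint32_t flag)
{
	return flag & 0xFFu;
}
```

Under this reading, a boot type of 1 (BOOT_LOADER in the quoted enum) would be
stored as 0x5242C301, and a bootloader could reject any register value whose
upper 24 bits do not carry the tag.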

Signed-off-by: Andy Yan 
---

Changes in v2:
   - check cpu dt node
   - remove an unnecessary of_put_node in function rockchip_get_pmu_regmap
   - fix an alignment issue
   - use reboot_notifier instead of restart_handler

  arch/arm/mach-rockchip/Makefile |   2 +-
  arch/arm/mach-rockchip/loader.h |  22 
  arch/arm/mach-rockchip/reboot.c | 111 


  3 files changed, 134 insertions(+), 1 deletion(-)
  create mode 100644 arch/arm/mach-rockchip/loader.h
  create mode 100644 arch/arm/mach-rockchip/reboot.c

diff --git a/arch/arm/mach-rockchip/Makefile 
b/arch/arm/mach-rockchip/Makefile

index 5c3a9b2..cd291e3 100644
--- a/arch/arm/mach-rockchip/Makefile
+++ b/arch/arm/mach-rockchip/Makefile
@@ -1,5 +1,5 @@
  CFLAGS_platsmp.o := -march=armv7-a

-obj-$(CONFIG_ARCH_ROCKCHIP) += rockchip.o
+obj-$(CONFIG_ARCH_ROCKCHIP) += rockchip.o reboot.o
  obj-$(CONFIG_PM_SLEEP) += pm.o sleep.o
  obj-$(CONFIG_SMP) += headsmp.o platsmp.o
diff --git a/arch/arm/mach-rockchip/loader.h 
b/arch/arm/mach-rockchip/loader.h

new file mode 100644
index 000..bf51baa
--- /dev/null
+++ b/arch/arm/mach-rockchip/loader.h
@@ -0,0 +1,22 @@
+#ifndef __MACH_ROCKCHIP_LOADER_H
+#define __MACH_ROCKCHIP_LOADER_H
+
+/*high 24 bits is tag, low 8 bits is type*/
+#define SYS_LOADER_REBOOT_FLAG   0x5242C300
+
+enum {
+   BOOT_NORMAL = 0, /* normal boot */
+   BOOT_LOADER, /* enter loader rockusb mode */
+   BOOT_MASKROM,/* enter maskrom rockusb mode (not support 
now) */

+   BOOT_RECOVER,/* enter recover */
+   BOOT_NORECOVER,  /* do not enter recover */
+   BOOT_SECONDOS,   /* boot second OS (not support now)*/
+   BOOT_WIPEDATA,   /* enter recover and wipe data. */
+   BOOT_WIPEALL,/* enter recover and wipe all data. */
+   BOOT_CHECKIMG,   /* check firmware img with backup part*/
+   BOOT_FASTBOOT,   /* enter fast boot mode */
+   BOOT_SECUREBOOT_DISABLE,
+   BOOT_CHARGING,   /* enter charge mode */
+   BOOT_MAX /* MAX VALID BOOT TYPE.*/
+};
+#endif
diff --git a/arch/arm/mach-rockchip/reboot.c 
b/arch/arm/mach-rockchip/reboot.c

new file mode 100644
index 000..c29f031e
--- /dev/null
+++ b/arch/arm/mach-rockchip/reboot.c
@@ -0,0 +1,111 @@
+/*
+ * Copyright (c) 2015, Fuzhou Rockchip Electronics Co., Ltd
+ *
+ * This program is free software; you can redistribute it and/or 
modify
+ * it under the terms of the GNU General Public License as 
published by

+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "loader.h"
+
+#define RK3188_PMU_SYS_REG0 0x40
+#define RK3288_PMU_SYS_REG0 0x94

I think RK3066 and RK3368 are also supported in the upstream kernel, and the
protocol is the same, so it would be better to support them here.

   RK3066 is supported; please see the function
   rockchip_get_reboot_flag_regmap below.
   It uses the same register offset as RK3188.

   The RK3368-related code is under arm64; I don't know how to handle it.


Maybe we can put the driver into drivers/soc/rockchip/?


Heiko, do you have any suggestions?

+
+struct regmap *regmap;
+int flag_reg;
+
+static int rockchip_get_pmu_regmap(void)
+{
+   struct device_node *node;
+
+   node = of_find_node_by_path("/cpus");
+   if (!node)
+   return -ENODEV;
+
+   regmap = syscon_regmap_lookup_by_phandle(node, "rockchip,pmu");
+   of_node_put(node);
+   if (!IS_ERR(regmap))
+   return 0;
+
+   regmap = 
syscon_regmap_lookup_by_compatible("rockchip,rk3066-pmu");

+   if (!IS_ERR(regmap))
+   return 0;
+
+   return -ENODEV;
+}
+
+static int rockchip_get_reboot_flag_regmap(void)
+{
+   int ret = rockchip_get_pmu_regmap();
+
+   if (ret < 0)
+   return ret;
+
+   if (of_machine_is_compatible("rockchip,rk3288")) {
+   flag_reg = RK3288_PMU_SYS_REG0;
+   return 0;
+   } else if (of_machine_is_compatible("rockchip,rk3066a") ||
+ of_machine_is_compatible("rockchip,rk3066b") ||
+ of_machine_is_compatible("rockchip,rk3188")) {
+   flag_reg = RK3188_PMU_SYS_REG0;
+   return 0;
+   }
+
+   return -ENODEV;
+}
+
+static void rockchip_get_reboot_flag(const char *cmd, u32 *flag)
+{
+   *flag = 

[PATCH 1/2] dma: Add Freescale qDMA engine driver support

2015-09-11 Thread Yuan Yao
Add Freescale Queue Direct Memory Access(qDMA) controller support.
This module can be found on LS-1 and LS-2 SoCs.

This adds legacy mode support for qDMA.

Signed-off-by: Yuan Yao 
---
 Documentation/devicetree/bindings/dma/fsl-qdma.txt |  43 ++
 MAINTAINERS|   7 +
 drivers/dma/Kconfig|  10 +
 drivers/dma/Makefile   |   1 +
 drivers/dma/fsl-qdma.c | 521 +
 5 files changed, 582 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/dma/fsl-qdma.txt
 create mode 100644 drivers/dma/fsl-qdma.c

diff --git a/Documentation/devicetree/bindings/dma/fsl-qdma.txt 
b/Documentation/devicetree/bindings/dma/fsl-qdma.txt
new file mode 100644
index 000..cdae71c
--- /dev/null
+++ b/Documentation/devicetree/bindings/dma/fsl-qdma.txt
@@ -0,0 +1,43 @@
+* Freescale queue Direct Memory Access Controller(qDMA) Controller
+
+  The qDMA controller transfers blocks of data between one source and one or 
more
+destinations. The blocks of data transferred can be represented in memory as 
contiguous
+or non-contiguous using scatter/gather table(s). Channel virtualization is 
supported
+through enqueuing of DMA jobs to, or dequeuing DMA jobs from, different work
+queues.
+  Legacy mode is primarily included for software requiring the earlier
+QorIQ DMA programming model. This mode provides a simple programming
+model not utilizing the datapath architecture. In legacy mode, DMA
+operations are directly configured through a set of architectural
+registers per channel.
+
+* qDMA Controller
+Required properties:
+- compatible :
+   - "fsl,ls-qdma" for qDMA used similar to that on LS SoC
+- reg : Specifies base physical address(s) and size of the qDMA registers.
+   The region is qDMA control register's address and size.
+- interrupts : A list of interrupt-specifiers, one for each entry in
+   interrupt-names.
+- interrupt-names : Should contain:
+   "qdma-tx" - the  interrupt
+   "qdma-err" - the error interrupt
+- channels : Number of channels supported by the controller
+
+Optional properties:
+- big-endian: If present registers and hardware scatter/gather descriptors
+   of the qDMA are implemented in big endian mode, otherwise in little
+   mode.
+
+
+Examples:
+
+   qdma: qdma@839 {
+   compatible = "fsl,ls-qdma";
+   reg = <0x0 0x838 0x0 0x2>;
+   interrupts = ,
+   ;
+   interrupt-names = "qdma-tx", "qdma-err";
+   big-endian;
+   channels = <1>;
+   };
diff --git a/MAINTAINERS b/MAINTAINERS
index 5772ccf..a4d1b52 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4357,6 +4357,13 @@ L:   linuxppc-...@lists.ozlabs.org
 S: Maintained
 F: drivers/dma/fsldma.*
 
+FREESCALE qDMA DRIVER
+M: Yuan Yao 
+L: linux-arm-ker...@lists.infradead.org
+S: Maintained
+F: Documentation/devicetree/bindings/dma/fsl-qdma.txt
+F: drivers/dma/fsl-qdma.c
+
 FREESCALE I2C CPM DRIVER
 M: Jochen Friedrich 
 L: linuxppc-...@lists.ozlabs.org
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index b458475..e29e985 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -193,6 +193,16 @@ config FSL_EDMA
  multiplexing capability for DMA request sources(slot).
  This module can be found on Freescale Vybrid and LS-1 SoCs.
 
+config FSL_QDMA
+   tristate "Freescale qDMA engine support"
+   select DMA_ENGINE
+   select DMA_VIRTUAL_CHANNELS
+   help
+ Support the Freescale qDMA engine with command queue and legacy mode.
+ Channel virtualization is supported through enqueuing of DMA jobs to,
+ or dequeuing DMA jobs from, different work queues.
+ This module can be found on Freescale LS SoCs.
+
 config FSL_RAID
 tristate "Freescale RAID engine Support"
 depends on FSL_SOC && !ASYNC_TX_ENABLE_CHANNEL_SWITCH
diff --git a/drivers/dma/Makefile b/drivers/dma/Makefile
index 7711a71..8de7526 100644
--- a/drivers/dma/Makefile
+++ b/drivers/dma/Makefile
@@ -29,6 +29,7 @@ obj-$(CONFIG_DW_DMAC_CORE) += dw/
 obj-$(CONFIG_EP93XX_DMA) += ep93xx_dma.o
 obj-$(CONFIG_FSL_DMA) += fsldma.o
 obj-$(CONFIG_FSL_EDMA) += fsl-edma.o
+obj-$(CONFIG_FSL_QDMA) += fsl-qdma.o
 obj-$(CONFIG_FSL_RAID) += fsl_raid.o
 obj-$(CONFIG_HSU_DMA) += hsu/
 obj-$(CONFIG_IMG_MDC_DMA) += img-mdc-dma.o
diff --git a/drivers/dma/fsl-qdma.c b/drivers/dma/fsl-qdma.c
new file mode 100644
index 000..846cdba
--- /dev/null
+++ b/drivers/dma/fsl-qdma.c
@@ -0,0 +1,521 @@
+/*
+ * drivers/dma/fsl-qdma.c
+ *
+ * Copyright 2014-2015 Freescale Semiconductor, Inc.
+ *
+ * Driver for the Freescale qDMA engine with legacy mode.
+ * This module can be found on Freescale LS SoCs.
+ *
+ * This program is free software; you can 

[PATCH 2/2] dma: dts: Add Freescale qDMA engine driver support

2015-09-11 Thread Yuan Yao
Add Freescale Queue Direct Memory Access(qDMA) controller support.
This module can be found on LS-1 and LS-2 SoCs.

This adds legacy mode support for qDMA.

Signed-off-by: Yuan Yao 
---
 arch/arm/boot/dts/ls1021a.dtsi | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/boot/dts/ls1021a.dtsi b/arch/arm/boot/dts/ls1021a.dtsi
index 973a496..fa82dae 100644
--- a/arch/arm/boot/dts/ls1021a.dtsi
+++ b/arch/arm/boot/dts/ls1021a.dtsi
@@ -388,6 +388,16 @@
 <_clk 1>;
};
 
+   qdma: qdma@8389000 {
+ compatible = "fsl,ls-qdma";
+ reg = <0x0 0x8389000 0x0 0x2>;
+ interrupts = ,
+;
+ interrupt-names = "qdma-tx", "qdma-err";
+ channels = <1>;
+ big-endian;
+   };
+
mdio0: mdio@2d24000 {
compatible = "gianfar";
device_type = "mdio";
-- 
2.1.0.27.g96db324



Re: [PATCH 5/5] virtgpu: mark as a render gpu

2015-09-11 Thread Dave Airlie
On 11 September 2015 at 01:04, Emil Velikov  wrote:
> On 10 September 2015 at 15:52, Gerd Hoffmann  wrote:
>>   Hi,
>>
>>> > Dave?  Looking at the ioctls they are all fine for render nodes, there
>>> > isn't anything modesetting related in the device-specific ioctls.
>>> >
>>> > Correct?
>>> >
>>> Unless I've overdone the coffee this time - modesetting is done via
>>> the card# node, while render via either card# or renderD#.
>>
>> Exactly, that's why anything modesetting-related must be disabled for
>> renderD#.  Looking at the virtio-gpu device-specific ioctls I don't
>> think there is anything doing modesetting (which we would have to leave
>> out), so we can apply DRM_RENDER_ALLOW everywhere I think.  Or maybe
>> there is a global switch to flip DRM_RENDER_ALLOW for the whole list ...
>>
> IMHO the idea of having a 'global' switch sounds quite good, yet there
> isn't one atm :-( It will be quite useful as we get more render only
> devices.
> DRIVER_RENDER doesn't do that unfortunately (which I think was the
> original assumption), it only instructs drm core to create the
> renderD# device/node.

doh, yes we need to add DRM_RENDER_ALLOW to the ioctls, can you do that?

Dave.


Re: [PATCH v2 4/4] serial: samsung: Fix UART status handling in DMA mode

2015-09-11 Thread Robert Baldyga
On 09/11/2015 08:07 AM, Krzysztof Kozlowski wrote:
> On 10.09.2015 22:41, Robert Baldyga wrote:
>> This patch fixes UART status handling in DMA mode.
> 
> I don't see any changes here. You did not respond to my comment either.
> 
> The code itself looks good... except for a locking issue, but I don't know
> what the cause is. It may not be related to the patchset; maybe just not
> all of the issues are fixed yet. Anyway, I'll describe it in 3/4.
> 
> Best regards,
> Krzysztof
> 
>> For this purpose we
>> use s3c24xx_serial_rx_drain_fifo() instead of uart_rx_drain_fifo(), which
>> does the same thing plus checks for special conditions (such as 'break').
>>
>> Thanks to this we have, for example, Magic SysRq handling, which was
>> missing in DMA mode so far. Since we can use UART in DMA mode as serial
>> console, this is a quite important improvement.
>>
>> This change additionally simplifies RX handling code, as we no longer
>> need uart_rx_drain_fifo() function, so we can remove it.
>>
>> Reported-by: Marek Szyprowski 
>> Signed-off-by: Robert Baldyga 
>> ---
>>  drivers/tty/serial/samsung.c | 30 +++---
>>  1 file changed, 3 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/tty/serial/samsung.c b/drivers/tty/serial/samsung.c
>> index 1d7dd86..d72cd73 100644
>> --- a/drivers/tty/serial/samsung.c
>> +++ b/drivers/tty/serial/samsung.c
>> @@ -385,32 +385,6 @@ static void s3c24xx_uart_copy_rx_to_tty(struct 
>> s3c24xx_uart_port *ourport,
>>  }
>>  }
>>  
>> -static int s3c24xx_serial_rx_fifocnt(struct s3c24xx_uart_port *ourport,
>> - unsigned long ufstat);
>> -
>> -static void uart_rx_drain_fifo(struct s3c24xx_uart_port *ourport)
>> -{
>> -struct uart_port *port = >port;
>> -struct tty_port *tty = >state->port;
>> -unsigned int ch, ufstat;
>> -unsigned int count;
>> -
>> -ufstat = rd_regl(port, S3C2410_UFSTAT);
>> -count = s3c24xx_serial_rx_fifocnt(ourport, ufstat);
>> -
>> -if (!count)
>> -return;
>> -
>> -while (count-- > 0) {
>> -ch = rd_regb(port, S3C2410_URXH);
>> -
>> -ourport->port.icount.rx++;
>> -tty_insert_flip_char(tty, ch, TTY_NORMAL);
>> -}
>> -
>> -tty_flip_buffer_push(tty);
>> -}
>> -
>>  static void s3c24xx_serial_stop_rx(struct uart_port *port)
>>  {
>>  struct s3c24xx_uart_port *ourport = to_ourport(port);
>> @@ -573,6 +547,8 @@ static void enable_rx_pio(struct s3c24xx_uart_port 
>> *ourport)
>>  ourport->rx_mode = S3C24XX_RX_PIO;
>>  }
>>  
>> +static void s3c24xx_serial_rx_drain_fifo(struct s3c24xx_uart_port *ourport);
>> +
>>  static irqreturn_t s3c24xx_serial_rx_chars_dma(void *dev_id)
>>  {
>>  unsigned int utrstat, ufstat, received;
>> @@ -606,7 +582,7 @@ static irqreturn_t s3c24xx_serial_rx_chars_dma(void 
>> *dev_id)
>>  enable_rx_pio(ourport);
>>  }
>>  

The essence of the change is here. We use another method for draining the
FIFO. Instead of just putting the characters into the tty buffer, we
additionally check for special conditions, and that's the improvement.

>> -uart_rx_drain_fifo(ourport);
>> +s3c24xx_serial_rx_drain_fifo(ourport);
>>  
>>  if (tty) {
>>  tty_flip_buffer_push(t);
>>
> 
> 



RE: [PATCH] PM / devfreq: Fix incorrect type issue.

2015-09-11 Thread Xiaolong Ye
> 
> 
> > time_in_state in struct devfreq is defined as unsigned long, so
> > devm_kzalloc should use sizeof(unsigned long) as argument instead of
> > sizeof(unsigned int), otherwise it will cause unexpected result in
> > 64bit system.
> >
> > Signed-off-by: Xiaolong Ye 
> > Signed-off-by: Kevin Liu 
> 
> Thanks!
> 
> Signed-off-by: MyungJoo Ham 
> 
> 
> Which SoC are you working with?

I am working on the MARVELL PXA1928 SoC platform (with 4 ARM CA53 cores), and
we use the devfreq framework to implement our DDR frequency change design.
I found this issue when adapting our driver to a 64-bit system.

> Are you going to upstream your 64bit devfreq driver soon?

Currently, we don’t have a plan to upstream our ddr devfreq driver.

> 
> 
> Cheers,
> MyungJoo
> 
> > ---
> >  drivers/devfreq/devfreq.c |2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> > index ca1b362..ac9845a 100644
> > --- a/drivers/devfreq/devfreq.c
> > +++ b/drivers/devfreq/devfreq.c
> > @@ -482,7 +482,7 @@ struct devfreq *devfreq_add_device(struct device
> *dev,
> > devfreq->profile->max_state *
> > devfreq->profile->max_state,
> > GFP_KERNEL);
> > -   devfreq->time_in_state = devm_kzalloc(dev, sizeof(unsigned int) *
> > +   devfreq->time_in_state = devm_kzalloc(dev, sizeof(unsigned long) *
> > devfreq->profile->max_state,
> > GFP_KERNEL);
> > devfreq->last_stat_updated = jiffies;
> > --
> > 1.7.9.5
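
One way to keep this kind of mismatch from recurring, sketched here in user
space and not taken from the patch, is to size the allocation from the
pointed-to type instead of spelling the type out. In this sketch calloc()
stands in for devm_kzalloc(), and struct demo_stats and demo_alloc_stats() are
hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the relevant part of struct devfreq. */
struct demo_stats {
	unsigned long *time_in_state;
	unsigned int max_state;
};

/* Size the array from the pointed-to type so it follows the field's type. */
static inline int demo_alloc_stats(struct demo_stats *s, unsigned int max_state)
{
	s->time_in_state = calloc(max_state, sizeof(*s->time_in_state));
	if (!s->time_in_state)
		return -1;
	s->max_state = max_state;
	return 0;
}
```

On an LP64 system sizeof(unsigned int) is 4 and sizeof(unsigned long) is 8, so
an array sized with the former is half as large as the code indexing it
expects; sizeof(*s->time_in_state) always matches the element type.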


[PATCH v2 02/16] Staging: speakup: devsynth.c: Remove explicit NULL comparison

2015-09-11 Thread Shraddha Barke
Remove explicit NULL comparison and write it in its simpler form.
Replacement done with coccinelle:

@replace_rule@
expression e;
@@

-e == NULL
+ !e

Signed-off-by: Shraddha Barke 
---
Change in v2-
 No change

 drivers/staging/speakup/devsynth.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/speakup/devsynth.c 
b/drivers/staging/speakup/devsynth.c
index 71c728a..d1ffdf4 100644
--- a/drivers/staging/speakup/devsynth.c
+++ b/drivers/staging/speakup/devsynth.c
@@ -22,7 +22,7 @@ static ssize_t speakup_file_write(struct file *fp, const char 
__user *buffer,
unsigned long flags;
u_char buf[256];
 
-   if (synth == NULL)
+   if (!synth)
return -ENODEV;
while (count > 0) {
bytes = min(count, sizeof(buf));
@@ -45,7 +45,7 @@ static ssize_t speakup_file_read(struct file *fp, char __user 
*buf,
 
 static int speakup_file_open(struct inode *ip, struct file *fp)
 {
-   if (synth == NULL)
+   if (!synth)
return -ENODEV;
if (xchg(_opened, 1))
return -EBUSY;
-- 
2.1.4
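
For readers unfamiliar with the transformation, a small stand-alone sketch
(not part of the series) shows that the two spellings test the same thing.
The function names demo_is_missing_explicit() and demo_is_missing_short() are
made up for the example.

```c
#include <stddef.h>

/* Explicit form, as written before the cleanup. */
static inline int demo_is_missing_explicit(const void *p)
{
	return p == NULL;
}

/* Shorter form produced by the semantic patch. */
static inline int demo_is_missing_short(const void *p)
{
	return !p;
}
```

Both functions return 1 for a null pointer and 0 otherwise, so the rewrite is
purely stylistic.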



[PATCH v2 04/16] Staging: speakup: kobjects.c: Remove explicit NULL comparison

2015-09-11 Thread Shraddha Barke
Remove explicit NULL comparison and write it in its simpler form.
Replacement done with coccinelle:

@replace_rule@
expression e;
@@

-e == NULL
+ !e

Signed-off-by: Shraddha Barke 
---
Change in v2-
 No change

 drivers/staging/speakup/kobjects.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/speakup/kobjects.c 
b/drivers/staging/speakup/kobjects.c
index 958add4..fdfeb42 100644
--- a/drivers/staging/speakup/kobjects.c
+++ b/drivers/staging/speakup/kobjects.c
@@ -368,7 +368,7 @@ static ssize_t synth_show(struct kobject *kobj, struct 
kobj_attribute *attr,
 {
int rv;
 
-   if (synth == NULL)
+   if (!synth)
rv = sprintf(buf, "%s\n", "none");
else
rv = sprintf(buf, "%s\n", synth->name);
@@ -459,14 +459,14 @@ static ssize_t punc_show(struct kobject *kobj, struct 
kobj_attribute *attr,
unsigned long flags;
 
p_header = spk_var_header_by_name(attr->attr.name);
-   if (p_header == NULL) {
+   if (!p_header) {
pr_warn("p_header is null, attr->attr.name is %s\n",
attr->attr.name);
return -EINVAL;
}
 
var = spk_get_punc_var(p_header->var_id);
-   if (var == NULL) {
+   if (!var) {
pr_warn("var is null, p_header->var_id is %i\n",
p_header->var_id);
return -EINVAL;
@@ -501,14 +501,14 @@ static ssize_t punc_store(struct kobject *kobj, struct 
kobj_attribute *attr,
return -EINVAL;
 
p_header = spk_var_header_by_name(attr->attr.name);
-   if (p_header == NULL) {
+   if (!p_header) {
pr_warn("p_header is null, attr->attr.name is %s\n",
attr->attr.name);
return -EINVAL;
}
 
var = spk_get_punc_var(p_header->var_id);
-   if (var == NULL) {
+   if (!var) {
pr_warn("var is null, p_header->var_id is %i\n",
p_header->var_id);
return -EINVAL;
@@ -546,7 +546,7 @@ ssize_t spk_var_show(struct kobject *kobj, struct 
kobj_attribute *attr,
unsigned long flags;
 
param = spk_var_header_by_name(attr->attr.name);
-   if (param == NULL)
+   if (!param)
return -EINVAL;
 
spin_lock_irqsave(_info.spinlock, flags);
@@ -622,9 +622,9 @@ ssize_t spk_var_store(struct kobject *kobj, struct 
kobj_attribute *attr,
unsigned long flags;
 
param = spk_var_header_by_name(attr->attr.name);
-   if (param == NULL)
+   if (!param)
return -EINVAL;
-   if (param->data == NULL)
+   if (!param->data)
return 0;
ret = 0;
cp = (char *)buf;
-- 
2.1.4



[PATCH v2 03/16] Staging: speakup: serialio.c: Remove explicit NULL comparison

2015-09-11 Thread Shraddha Barke
Remove explicit NULL comparison and write it in its simpler form.
Replacement done with coccinelle:

@replace_rule@
expression e;
@@

-e == NULL
+ !e

Signed-off-by: Shraddha Barke 
---
Change in v2-
 No change

 drivers/staging/speakup/serialio.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/speakup/serialio.c 
b/drivers/staging/speakup/serialio.c
index 66ac999..3b5835b 100644
--- a/drivers/staging/speakup/serialio.c
+++ b/drivers/staging/speakup/serialio.c
@@ -101,7 +101,7 @@ static void start_serial_interrupt(int irq)
 {
int rv;
 
-   if (synth->read_buff_add == NULL)
+   if (!synth->read_buff_add)
return;
 
rv = request_irq(irq, synth_readbuf_handler, IRQF_SHARED,
@@ -127,7 +127,7 @@ void spk_stop_serial_interrupt(void)
if (speakup_info.port_tts == 0)
return;
 
-   if (synth->read_buff_add == NULL)
+   if (!synth->read_buff_add)
return;
 
/* Turn off interrupts */
-- 
2.1.4
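
[Archive note] A minimal userspace sketch of why the Coccinelle rewrite above preserves behaviour: for a pointer, `!e` is true exactly when `e == NULL`. The function names are made up for illustration and are not part of speakup.

```c
#include <stddef.h>

/* Explicit form, as in the lines the patch removes. */
static int demo_is_missing_explicit(const void *p)
{
	return p == NULL;
}

/* Short form, as in the lines the patch adds. */
static int demo_is_missing_short(const void *p)
{
	return !p;
}

/*
 * The v2 changelog of patch 01/16 also covers "!= NULL"; its short form
 * is the bare pointer used as a truth value.
 */
static int demo_is_present_short(const void *p)
{
	return p ? 1 : 0;
}
```

Both "missing" helpers return the same value for any pointer, which is why the change is purely stylistic.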



[PATCH v2 01/16] Staging: speakup: varhandlers.c: Remove explicit NULL comparison

2015-09-11 Thread Shraddha Barke
Remove explicit NULL comparison and write it in its simpler form.
Replacement done with coccinelle:

@replace_rule@
expression e;
@@

-e == NULL
+ !e

Signed-off-by: Shraddha Barke 
---
Change in v2-
 Considering cases with != NULL also

 drivers/staging/speakup/varhandlers.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/staging/speakup/varhandlers.c 
b/drivers/staging/speakup/varhandlers.c
index 75bf40c..b2afec6 100644
--- a/drivers/staging/speakup/varhandlers.c
+++ b/drivers/staging/speakup/varhandlers.c
@@ -90,7 +90,7 @@ void speakup_register_var(struct var_t *var)
struct st_var_header *p_header;
 
BUG_ON(!var || var->var_id < 0 || var->var_id >= MAXVARS);
-   if (var_ptrs[0] == NULL) {
+   if (!var_ptrs[0]) {
for (i = 0; i < MAXVARS; i++) {
p_header = &var_headers[i];
var_ptrs[p_header->var_id] = p_header;
@@ -130,7 +130,7 @@ struct st_var_header *spk_get_var_header(enum var_id_t 
var_id)
if (var_id < 0 || var_id >= MAXVARS)
return NULL;
p_header = var_ptrs[var_id];
-   if (p_header->data == NULL)
+   if (!p_header->data)
return NULL;
return p_header;
 }
@@ -163,7 +163,7 @@ struct punc_var_t *spk_get_punc_var(enum var_id_t var_id)
struct punc_var_t *where;
 
where = punc_vars;
-   while ((where->var_id != -1) && (rv == NULL)) {
+   while ((where->var_id != -1) && (!rv)) {
if (where->var_id == var_id)
rv = where;
else
@@ -183,7 +183,7 @@ int spk_set_num_var(int input, struct st_var_header *var, 
int how)
char *cp;
struct var_t *var_data = var->data;
 
-   if (var_data == NULL)
+   if (!var_data)
return -ENODATA;
 
if (how == E_NEW_DEFAULT) {
@@ -221,9 +221,9 @@ int spk_set_num_var(int input, struct st_var_header *var, 
int how)
if (var_data->u.n.multiplier != 0)
val *= var_data->u.n.multiplier;
val += var_data->u.n.offset;
-   if (var->var_id < FIRST_SYNTH_VAR || synth == NULL)
+   if (var->var_id < FIRST_SYNTH_VAR || !synth)
return ret;
-   if (synth->synth_adjust != NULL) {
+   if (synth->synth_adjust) {
int status = synth->synth_adjust(var);
 
return (status != 0) ? status : ret;
@@ -247,7 +247,7 @@ int spk_set_string_var(const char *page, struct 
st_var_header *var, int len)
 {
struct var_t *var_data = var->data;
 
-   if (var_data == NULL)
+   if (!var_data)
return -ENODATA;
if (len > MAXVARLEN)
return -E2BIG;
@@ -288,7 +288,7 @@ int spk_set_mask_bits(const char *input, const int which, 
const int how)
if (*cp < SPACE)
break;
if (mask < PUNC) {
-   if (!(spk_chartab[*cp]&PUNC))
+   if (!(spk_chartab[*cp] & PUNC))
break;
} else if (spk_chartab[*cp]&B_NUM)
break;
@@ -313,7 +313,7 @@ char *spk_strlwr(char *s)
 {
char *p;
 
-   if (s == NULL)
+   if (!s)
return NULL;
 
for (p = s; *p; p++)
-- 
2.1.4



Re: [PATCH v2 3/4] serial: samsung: introduce s3c24xx_serial_rx_drain_fifo() function

2015-09-11 Thread Krzysztof Kozlowski
On 10.09.2015 22:41, Robert Baldyga wrote:
> This patch introduces s3c24xx_serial_rx_drain_fifo() which reads data
> from RX FIFO and writes it to tty buffer. It also checks for special
> conditions (such as 'break') and handles it. This function has been
> separated from s3c24xx_serial_rx_chars_pio() as it contains code which
> can be used also in DMA mode.

Much better, thanks! Now it is also easier to spot the difference (see
below).

> 
> Signed-off-by: Robert Baldyga 
> ---
>  drivers/tty/serial/samsung.c | 23 +--
>  1 file changed, 13 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/tty/serial/samsung.c b/drivers/tty/serial/samsung.c
> index dc4be54..1d7dd86 100644
> --- a/drivers/tty/serial/samsung.c
> +++ b/drivers/tty/serial/samsung.c
> @@ -621,16 +621,12 @@ finish:
>   return IRQ_HANDLED;
>  }
>  
> -static irqreturn_t s3c24xx_serial_rx_chars_pio(void *dev_id)
> +static void s3c24xx_serial_rx_drain_fifo(struct s3c24xx_uart_port *ourport)
>  {
> - struct s3c24xx_uart_port *ourport = dev_id;
>   struct uart_port *port = &ourport->port;
>   unsigned int ufcon, ch, flag, ufstat, uerstat;
> - unsigned long flags;
>   int max_count = port->fifosize;
>  
> - spin_lock_irqsave(&port->lock, flags);
> -
>   while (max_count-- > 0) {
>   ufcon = rd_regl(port, S3C2410_UFCON);
>   ufstat = rd_regl(port, S3C2410_UFSTAT);
> @@ -654,9 +650,7 @@ static irqreturn_t s3c24xx_serial_rx_chars_pio(void 
> *dev_id)
>   ufcon |= S3C2410_UFCON_RESETRX;
>   wr_regl(port, S3C2410_UFCON, ufcon);
>   rx_enabled(port) = 1;
> - spin_unlock_irqrestore(&port->lock,
> - flags);
> - goto out;
> + return;
>   }
>   continue;
>   }
> @@ -702,10 +696,19 @@ static irqreturn_t s3c24xx_serial_rx_chars_pio(void 
> *dev_id)
>ch, flag);
>   }
>  
> - spin_unlock_irqrestore(&port->lock, flags);
>   tty_flip_buffer_push(&port->state->port);

Here is a difference - previously this was outside of spinlock. I think
moving it inside spin lock is okay, just the interrupts won't be
disabled before unlock and queue_work() from tty_flip_buffer_push().

However after testing this patchset (entire) on:
next-20150910 + my dt-for-next branch (dma for serial) + this patchset
you can see quite complicated lockdep warning:

[3.568657] =
[3.575079] [ INFO: possible irq lock inversion dependency detected ]
[3.581506] 4.2.0-next-20150910-9-g65fd5a9cff54 #218 Not tainted
[3.587838] -
[3.594263] swapper/0/0 just changed the state of lock:
[3.599470]  (_lock_key){..-...}, at: []
s3c24xx_serial_tx_dma_complete+0x8c/0xfc
[3.608237] but this lock took another, SOFTIRQ-unsafe lock in the past:
[3.614919]  (&(>lock)->rlock){+.+...}
[3.614919]
[3.614919] and interrupts could create inverse lock ordering between
them.
[3.614919]
[3.625076]
[3.625076] other info that might help us debug this:
[3.631586]  Possible interrupt unsafe locking scenario:
[3.631586]
[3.638356]CPU0CPU1
[3.642870]
[3.647382]   lock(&(>lock)->rlock);
[3.651376]local_irq_disable();
[3.657278]lock(_lock_key);
[3.663267]lock(&(>lock)->rlock);
[3.669777]   
[3.672381] lock(_lock_key);


Config: exynos, disabled MMC_CLKGATE, enabled usual testing stuff
Board: Trats2

Didn't you notice it?


Additionally the SysRq "Show backtrace of all active CPUs" on this
linux-next (without additional patches, pure next) has significant delay
(like 5 seconds) and a:
[  169.221223] s3c-i2c 138d.i2c: timeout waiting for bus idle

That's weird. But as I said this occurs on pure next as well.

Dmesgs and config attached.

Best regards,
Krzysztof


> +}
> +
> +static irqreturn_t s3c24xx_serial_rx_chars_pio(void *dev_id)
> +{
> + struct s3c24xx_uart_port *ourport = dev_id;
> + struct uart_port *port = &ourport->port;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&port->lock, flags);
> + s3c24xx_serial_rx_drain_fifo(ourport);
> + spin_unlock_irqrestore(&port->lock, flags);
>  
> -out:
>   return IRQ_HANDLED;
>  }
>  
> 

-sh-4.1# [  164.160601] sysrq: SysRq : Show backtrace of all active CPUs
[  164.164815] Sending NMI to all CPUs:
[  169.221223] s3c-i2c 138d.i2c: timeout waiting for bus idle
[  174.175570] NMI backtrace for cpu 1
[  174.177607] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 

[PATCH RFC v3 2/6] perf: Use extended syscall error reporting

2015-09-11 Thread Alexander Shishkin
This patch makes use of the extended syscall error reporting
infrastructure to relay error messages that result from perf_event_open()
attribute validation. On top of the default error report bits, it also
transfers the name of the attribute field that triggered the error.

Signed-off-by: Alexander Shishkin 
---
 include/linux/perf_event.h | 14 ++
 kernel/events/core.c   | 17 -
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2027809433..eb63074012 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -16,6 +16,20 @@
 
 #include 
 
+#include 
+
+struct perf_ext_err_site {
+   struct ext_err_site site;
+   const char  *attr_field;
+};
+
+#define perf_err(__c, __a, __m)                                \
+   ({ /* make sure it's a real field before stringifying it */ \
+   struct perf_event_attr __x; (void)__x.__a;  \
+   ext_err(perf, __c, __m, \
+   .attr_field = __stringify(__a));\
+   })
+
 /*
  * Kernel-internal data types and definitions:
  */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ae16867670..a714f0602b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9,6 +9,8 @@
  * For licensing details see kernel-base/COPYING
  */
 
+#define EXTERR_MODNAME "perf"
+
 #include 
 #include 
 #include 
@@ -44,11 +46,24 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "internal.h"
 
 #include 
 
+static char *perf_exterr_format(void *site)
+{
+   struct perf_ext_err_site *psite = site;
+   char *output;
+
+   output = kasprintf(GFP_KERNEL, "\t\"attr_field\": \"%s\"\n",
+  psite->attr_field);
+   return output;
+}
+
+DECLARE_EXTERR_DOMAIN(perf, perf_exterr_format);
+
 static struct workqueue_struct *perf_wq;
 
 typedef int (*remote_function_f)(void *);
@@ -8352,7 +8367,7 @@ err_group_fd:
fdput(group);
 err_fd:
put_unused_fd(event_fd);
-   return err;
+   return ext_err_errno(err);
 }
 
 /**
-- 
2.5.1
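
[Archive note] The perf_err() macro in this patch checks that its argument names a real perf_event_attr field before turning it into a string. Below is a minimal userspace sketch of that idea, not the patch's code: it uses sizeof() on a member access instead of the GNU statement expression in the patch, and `struct demo_attr` and the DEMO_* macros are hypothetical stand-ins.

```c
/* Hypothetical stand-in for struct perf_event_attr; only two fields. */
struct demo_attr {
	unsigned int sample_type;
	unsigned long long config;
};

/* Two-level stringify so that macro arguments are expanded first. */
#define DEMO_STRINGIFY_1(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY_1(x)

/*
 * Evaluates to the field name as a string literal. The sizeof() operand
 * is never executed, but it is still type-checked, so a misspelled field
 * name becomes a compile error instead of a bogus string.
 */
#define DEMO_FIELD_NAME(field) \
	((void)sizeof(((struct demo_attr *)0)->field), DEMO_STRINGIFY(field))
```

With these definitions, DEMO_FIELD_NAME(sample_type) yields the string "sample_type", while DEMO_FIELD_NAME(no_such_field) fails to compile.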



[PATCH RFC v3 0/6] Introduce extended syscall error reporting

2015-09-11 Thread Alexander Shishkin
Hi Ingo, Peter and everybody,

This is another stab at the error reporting problem. I've been sitting
on this code for a couple of weeks now for no good reason, so I
figured I'd just put it out there and see where we go next.

This time around, the error reporting itself is a separate
"infrastructure", which is mostly a header file infested with macros
and some code to set it all up and deliver to userspace. The latter is
now done with a prctl() like Ingo suggested. This is the first
patch. Then, it gets integrated into perf core and one example error
return is annotated with a message. The rest of the patchset adds
support to perf tooling, which includes its own JSON parser (I wasn't
aware that Andi was bringing one in with one of his pull requests at
the moment of writing it and I still like it better not only because
of the NIH symptoms) and extends perf_evsel__open_strerror() to fetch
these error messages from the kernel. I didn't include all the
instrumentation that I did in the previous versions of the patchset to
keep the noise level down.

Alexander Shishkin (6):
  exterr: Introduce extended syscall error reporting
  perf: Use extended syscall error reporting
  perf/x86: Annotate a BTS error with extended error reporting
  perf tools: Add a simple JSON parser
  perf tools: Add userspace counterpart for extended error reporting
  perf tools: Use extended syscall error reporting

 arch/x86/kernel/cpu/perf_event.c |   5 +-
 include/linux/exterr.h   |  99 
 include/linux/perf_event.h   |  14 +++
 include/linux/sched.h|   1 +
 include/uapi/linux/prctl.h   |   5 +
 kernel/events/core.c |  17 ++-
 kernel/sys.c |   6 +
 lib/Makefile |   2 +
 lib/exterr.c | 157 
 tools/include/tools/json.h   |  40 +++
 tools/lib/util/json.c| 250 +++
 tools/perf/util/Build|   6 +
 tools/perf/util/evsel.c  |  12 +-
 tools/perf/util/exterr.c |  79 +
 tools/perf/util/exterr.h |  21 
 15 files changed, 711 insertions(+), 3 deletions(-)
 create mode 100644 include/linux/exterr.h
 create mode 100644 lib/exterr.c
 create mode 100644 tools/include/tools/json.h
 create mode 100644 tools/lib/util/json.c
 create mode 100644 tools/perf/util/exterr.c
 create mode 100644 tools/perf/util/exterr.h

-- 
2.5.1



Re: Regression: Disk corruption with dm-crypt and kernels >= 4.0

2015-09-11 Thread Mike Snitzer
Hi,

Could you please try the following patch (against any of the kernels you
saw the corruption with, be it 4.0, 4.1 or 4.2) to see if the regression
you reported goes away?  Thanks, Mike

From: Mike Snitzer 
Date: Wed, 9 Sep 2015 21:34:51 -0400
Subject: [PATCH] dm crypt: constrain crypt device's max_segment_size to
 PAGE_SIZE

Unfortunate constraint that is required to avoid the potential for
exceeding underlying device's max_segments limits -- due to
crypt_alloc_buffer() possibly allocating pages for the encryption bio
that are not as physically contiguous as the original bio.

Suggested-by: Jeff Moyer 
Signed-off-by: Mike Snitzer 
---
 drivers/md/dm-crypt.c |   17 +++--
 1 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 76f1d6e..f717762 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -973,7 +973,8 @@ static void crypt_free_buffer_pages(struct crypt_config 
*cc, struct bio *clone);
 
 /*
  * Generate a new unfragmented bio with the given size
- * This should never violate the device limitations
+ * This should never violate the device limitations (but only because
+ * max_segment_size is being constrained to PAGE_SIZE).
  *
  * This function may be called concurrently. If we allocate from the mempool
  * concurrently, there is a possibility of deadlock. For example, if we have
@@ -2057,9 +2058,20 @@ static int crypt_iterate_devices(struct dm_target *ti,
return fn(ti, cc->dev, cc->start, ti->len, data);
 }
 
+static void crypt_io_hints(struct dm_target *ti, struct queue_limits *limits)
+{
+   /*
+* Unfortunate constraint that is required to avoid the potential
+* for exceeding underlying device's max_segments limits -- due to
+* crypt_alloc_buffer() possibly allocating pages for the encryption
+* bio that are not as physically contiguous as the original bio.
+*/
+   limits->max_segment_size = PAGE_SIZE;
+}
+
 static struct target_type crypt_target = {
.name   = "crypt",
-   .version = {1, 14, 0},
+   .version = {1, 14, 1},
.module = THIS_MODULE,
.ctr= crypt_ctr,
.dtr= crypt_dtr,
@@ -2071,6 +2083,7 @@ static struct target_type crypt_target = {
.message = crypt_message,
.merge  = crypt_merge,
.iterate_devices = crypt_iterate_devices,
+   .io_hints = crypt_io_hints,
 };
 
 static int __init dm_crypt_init(void)
-- 
1.7.4.4
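
[Archive note] A simplified model of the problem the commit message describes. It is not block-layer code: the helper name, the use of page frame numbers and the 4096-byte page size are assumptions made for illustration. It counts how many segments a buffer of whole pages needs when physically adjacent pages may be merged up to max_segment_size.

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096UL	/* assumed page size */

/*
 * Count segments for a buffer made of whole pages. pfn[i] is the physical
 * page frame number of page i. Physically adjacent pages are merged into
 * one segment as long as the segment stays within max_segment_size.
 */
static size_t demo_count_segments(const unsigned long *pfn, size_t npages,
				  unsigned long max_segment_size)
{
	size_t segments = 0;
	unsigned long seg_len = 0;
	size_t i;

	for (i = 0; i < npages; i++) {
		int contiguous = i > 0 && pfn[i] == pfn[i - 1] + 1;

		if (contiguous && seg_len + DEMO_PAGE_SIZE <= max_segment_size) {
			seg_len += DEMO_PAGE_SIZE;	/* extend current segment */
		} else {
			segments++;			/* start a new segment */
			seg_len = DEMO_PAGE_SIZE;
		}
	}
	return segments;
}
```

In this model, four contiguous pages with a 64 KiB segment limit count as one segment, while four scattered pages (as a freshly allocated encryption buffer might be) count as four. With the limit set to one page, both layouts count as four, so the segment count is already at its worst case when the original bio is built and the clone cannot exceed it.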



Re: [PATCH] cpufreq: pass policy to ->get() driver callback

2015-09-11 Thread Viresh Kumar
On 10-09-15, 23:36, Rafael J. Wysocki wrote:
> BTW, I wonder how much of the stuff in cpufreq.h can be moved to a local 
> header
> under drivers/cpufreq/.  It looks like the majority of it is not used by
> anybody else.

Okay, will check that out and do some cleanup.

> ---
> From: Rafael J. Wysocki 
> Subject: cpufreq: acpi-cpufreq: Use cpufreq_cpu_get_raw() in ->get()
> 
> cpufreq_cpu_get() called by get_cur_freq_on_cpu() is overkill,
> because the ->get() callback is always invoked in a context in
> which all of the conditions checked by cpufreq_cpu_get() are
> guaranteed to be satisfied.
> 
> Use cpufreq_cpu_get_raw() instead of it and drop the
> corresponding cpufreq_cpu_put() from get_cur_freq_on_cpu().
> 
> Signed-off-by: Rafael J. Wysocki 
> ---
>  drivers/cpufreq/acpi-cpufreq.c |3 +--
>  drivers/cpufreq/cpufreq.c  |2 +-
>  include/linux/cpufreq.h|5 +
>  3 files changed, 7 insertions(+), 3 deletions(-)
> 
> Index: linux-pm/drivers/cpufreq/acpi-cpufreq.c
> ===
> --- linux-pm.orig/drivers/cpufreq/acpi-cpufreq.c
> +++ linux-pm/drivers/cpufreq/acpi-cpufreq.c
> @@ -375,12 +375,11 @@ static unsigned int get_cur_freq_on_cpu(
>  
>   pr_debug("get_cur_freq_on_cpu (%d)\n", cpu);
>  
> - policy = cpufreq_cpu_get(cpu);
> + policy = cpufreq_cpu_get_raw(cpu);
>   if (unlikely(!policy))
>   return 0;
>  
>   data = policy->driver_data;
> - cpufreq_cpu_put(policy);
>   if (unlikely(!data || !data->freq_table))
>   return 0;
>  
> Index: linux-pm/drivers/cpufreq/cpufreq.c
> ===
> --- linux-pm.orig/drivers/cpufreq/cpufreq.c
> +++ linux-pm/drivers/cpufreq/cpufreq.c
> @@ -238,13 +238,13 @@ int cpufreq_generic_init(struct cpufreq_
>  }
>  EXPORT_SYMBOL_GPL(cpufreq_generic_init);
>  
> -/* Only for cpufreq core internal use */
>  struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu)
>  {
>   struct cpufreq_policy *policy = per_cpu(cpufreq_cpu_data, cpu);
>  
>   return policy && cpumask_test_cpu(cpu, policy->cpus) ? policy : NULL;
>  }
> +EXPORT_SYMBOL_GPL(cpufreq_cpu_get_raw);
>  
>  unsigned int cpufreq_generic_get(unsigned int cpu)
>  {
> Index: linux-pm/include/linux/cpufreq.h
> ===
> --- linux-pm.orig/include/linux/cpufreq.h
> +++ linux-pm/include/linux/cpufreq.h
> @@ -129,9 +129,14 @@ struct cpufreq_policy {
>  #define CPUFREQ_SHARED_TYPE_ANY   (3) /* Freq can be set from any 
> dependent CPU*/
>  
>  #ifdef CONFIG_CPU_FREQ
> +struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu);
>  struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu);
>  void cpufreq_cpu_put(struct cpufreq_policy *policy);
>  #else
> +static inline struct cpufreq_policy *cpufreq_cpu_get_raw(unsigned int cpu)
> +{
> + return NULL;
> +}
>  static inline struct cpufreq_policy *cpufreq_cpu_get(unsigned int cpu)
>  {
>   return NULL;

Acked-by: Viresh Kumar 

-- 
viresh
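
[Archive note] A minimal userspace sketch of the difference between a counted lookup and a raw lookup, which is the distinction the quoted patch relies on. `struct demo_policy` and all demo_* functions are hypothetical and only mirror the shape of cpufreq_cpu_get(), cpufreq_cpu_put() and cpufreq_cpu_get_raw(); they are not the cpufreq implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct cpufreq_policy. */
struct demo_policy {
	int refcount;		/* references held by callers */
	unsigned int cur_khz;	/* current frequency */
};

/* Counted lookup: takes a reference that the caller must drop again. */
static struct demo_policy *demo_policy_get(struct demo_policy *p)
{
	if (p)
		p->refcount++;
	return p;
}

static void demo_policy_put(struct demo_policy *p)
{
	p->refcount--;
}

/* Raw lookup: no reference is taken, so there is nothing to put. */
static struct demo_policy *demo_policy_get_raw(struct demo_policy *p)
{
	return p;
}

/* Shape of the code before the patch: get, read, put. */
static unsigned int demo_freq_counted(struct demo_policy *slot)
{
	struct demo_policy *p = demo_policy_get(slot);
	unsigned int khz;

	if (!p)
		return 0;
	khz = p->cur_khz;
	demo_policy_put(p);
	return khz;
}

/*
 * Shape after the patch: raw lookup. This is only safe when the calling
 * context already guarantees that the policy stays alive.
 */
static unsigned int demo_freq_raw(struct demo_policy *slot)
{
	struct demo_policy *p = demo_policy_get_raw(slot);

	return p ? p->cur_khz : 0;
}
```

Both variants return the same frequency and leave the reference count unchanged; the raw one simply skips the increment and decrement.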


Re: [LINUX RFC v2 1/4] spi: add support of two chip selects & data stripe

2015-09-11 Thread Harini Katakam
Hi Mark,

On Fri, Sep 11, 2015 at 6:06 PM, Mark Brown  wrote:
> On Fri, Sep 04, 2015 at 12:02:21PM +, Ranjit Abhimanyu Waghmode wrote:
>
> Please fix your mail client to word wrap within paragraphs and to quote
> text without reflowing it - your messages are very hard to read.
>
>> > > + /* Controller may support more than one chip.
>> > > +  * This flag will enable that feature.
>> > > +  */
>> > > +#define SPI_MASTER_BOTH_CS   BIT(8)  /* enable both
>> > chips */
>
>> Now we can consider following use cases:
>
>> Suppose we need to send the same data to multiple slaves of same kind:
>> Here the application need not to do individual slave access for writing, 
>> instead it can send data to all the devices in one go.
>
> That's a *very* specific application which will only work for write only
> devices - I'd be surprised if such systems actually had distinct chip
> select lines at the CPU level.
>

Agreed that it is very specific, but here are a few ways it is used
when communicating with two flash devices in a parallel configuration:
- Write enable is sent to both devices using a single operation.
- Writing to any configuration register in the flash is done in one go.
- Some applications want to mirror important data to both devices.
Even with reading, asserting multiple chip selects combined with stripe
will mean:
- Two status bytes, one from each device, will be obtained in one operation.
- Similarly, data that was written using stripe is read back and combined.

Such systems could still maintain separate chip selects to perform
individual operations such as reading flash ID, debugging failures or
locking specific sectors.

Regards,
Harini
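
[Archive note] A minimal sketch of what "stripe" could look like for two flash devices, assuming byte-level interleaving (even-indexed bytes to the first device, odd-indexed bytes to the second). The thread does not state the actual granularity used by the hardware, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Split src into two per-device buffers: even-indexed bytes go to dev0,
 * odd-indexed bytes go to dev1. dev0 needs (len + 1) / 2 bytes of room,
 * dev1 needs len / 2.
 */
static void demo_stripe_split(const uint8_t *src, size_t len,
			      uint8_t *dev0, uint8_t *dev1)
{
	size_t i;

	for (i = 0; i < len; i++) {
		if (i % 2 == 0)
			dev0[i / 2] = src[i];
		else
			dev1[i / 2] = src[i];
	}
}

/* Inverse operation: interleave what was read back from both devices. */
static void demo_stripe_combine(const uint8_t *dev0, const uint8_t *dev1,
				size_t len, uint8_t *dst)
{
	size_t i;

	for (i = 0; i < len; i++)
		dst[i] = (i % 2 == 0) ? dev0[i / 2] : dev1[i / 2];
}
```

Splitting a buffer and combining the two halves again returns the original data, which is the read-back case described in the last bullet above.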


Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters

2015-09-11 Thread Mark Rutland
> >> Considering that the EFI support is just for Dom0, and Dom0 (at
> >> the time) had to be PV anyway, it was the more natural solution to
> >> expose the interface via hypercalls, the more that this allows better
> >> control over what is and primarily what is not being exposed to
> >> Dom0. With the wrapper approach we'd be back to the same
> >> problem (discussed elsewhere) of which EFI version to surface: The
> >> host one would impose potentially missing extensions, while the
> >> most recent hypervisor known one might imply hiding valuable
> >> information from Dom0. Plus there are incompatible changes like
> >> the altered meaning of EFI_MEMORY_WP in 2.5.
> > 
> > I'm not sure I follow how hypercalls solve any impedance mismatch here;
> > you're still expecting Dom0 to call up to Xen in order to perform calls,
> > and all I suggested was a different location for those hypercalls.
> > 
> > If Xen is happy to make such calls blindly, why does it matter if the
> > hypercall was in the kernel binary or an external shim?
> 
> Because there could be new entries in SystemTable->RuntimeServices
> (expected and blindly but validly called by the kernel). Even worse
> (because likely harder to deal with) would be new fields in other
> structures.

Any of these could cause Xen to blow up, while Xen could always provide
a known-safe (but potentially sub-optimal) view to the kernel by
default.

> > Incompatible changes are a spec problem regardless of how this is
> > handled.
> 
> Not necessarily - we don't expose the memory map (we'd have to
> if we were to mimic EFI for Dom0), and hence the mentioned issue
> doesn't exist in our model.

We have to expose _some_ memory map, so I don't follow this point.

Mark.


Re: [PATCH 4/6] sched/fair: Name utilization related data and functions consistently

2015-09-11 Thread Dietmar Eggemann
On 04/09/15 10:08, Vincent Guittot wrote:
> On 14 August 2015 at 18:23, Morten Rasmussen  wrote:
>> From: Dietmar Eggemann 
>>
>> Use the advent of the per-entity load tracking rewrite to streamline the
>> naming of utilization related data and functions by using
>> {prefix_}util{_suffix} consistently. Moreover call both signals
>> ({se,cfs}.avg.util_avg) utilization.
> 
> I don't have a strong opinion about the naming of this variable but I
> remember a discussion about this topic:
> https://lkml.org/lkml/2014/9/11/474 : "Call the pure running number
> 'utilization' and this scaled with capacity 'usage' "
> 
> The utilization has been shortened to util with the rewrite of the pelt,
> so the current use of usage in get_cpu_usage still follows this rule.

But since we now do the capacity scaling in __update_load_avg()

util_sum += t * scale_freq/SCHED_CAP_SCALE * arch_scale_freq_capacity()

util_avg = util_sum / LOAD_AVG_MAX;

we could either name everything 'util' or everything 'usage' (including
the utilization sum and avg in struct sched_avg).

> 
> So why do you want to change that now ?
> Furthermore, cfs.avg.util_avg is a load  whereas sgs->group_util is a
> capacity. Both don't use the same unit and same range which can be
> confusing when you read the code

[...]
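
[Archive note] A minimal sketch of the scaling described in this message, assuming that both scale factors are fixed-point values where 1024 means 100%. The function names, the parameter names scale_freq and scale_cpu, and the LOAD_AVG_MAX value of 47742 are assumptions made for illustration; this is not the scheduler's __update_load_avg().

```c
#include <stdint.h>

#define DEMO_CAPACITY_SHIFT 10
#define DEMO_CAPACITY_SCALE (1UL << DEMO_CAPACITY_SHIFT)	/* 1024 == 100% */
#define DEMO_LOAD_AVG_MAX   47742	/* assumed saturation value of the sum */

/* Accumulate running time t, scaled by frequency and CPU capacity. */
static uint64_t demo_util_sum_add(uint64_t util_sum, uint64_t t,
				  unsigned long scale_freq,
				  unsigned long scale_cpu)
{
	/* First scale the time delta by the current frequency ratio. */
	uint64_t scaled = (t * scale_freq) >> DEMO_CAPACITY_SHIFT;

	/* Then weight it by the CPU's capacity. */
	return util_sum + scaled * scale_cpu;
}

/* Turn the accumulated sum into an average in the range [0, 1024]. */
static unsigned long demo_util_avg(uint64_t util_sum)
{
	return (unsigned long)(util_sum / DEMO_LOAD_AVG_MAX);
}
```

In this model a CPU that is always running at full frequency and full capacity ends up with an average of 1024, and halving either factor halves the result. The average therefore already lives in the capacity range, which is the point made above about naming everything either 'util' or 'usage'.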



Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-11 Thread bsegall
Morten Rasmussen  writes:

> On Fri, Sep 11, 2015 at 08:28:25AM +0800, Yuyang Du wrote:
>> On Thu, Sep 10, 2015 at 12:07:27PM +0200, Peter Zijlstra wrote:
>> > > > Still don't understand why it's a unit problem. IMHO LOAD/UTIL and
>> > > > CAPACITY have no unit.
>> > > 
>> > > To be more accurate, probably, LOAD can be thought of as having unit,
>> > > but UTIL has no unit.
>> > 
>> > But I'm thinking that is wrong; it should have one, esp. if we go scale
>> > the thing. Giving it the same fixed point unit as load simplifies the
>> > code.
>> 
>> I think we probably are saying the same thing with different terms. Anyway,
>> let me reiterate what I said and make it a little more formalized.
>> 
>> UTIL has no unit because it is a pure ratio, the cpu_running%, which is in
>> the range of [0, 100%]. We increase the resolution by multiplying by a
>> number (say 1024), because we don't want to lose much to integer rounding,
>> and then the range becomes [0, 1024].
>
> Fully agree, and with frequency invariance we basically scale running
> time to take into account that the cpu might be running slower than it
> is capable of at the highest frequency. With cpu invariance we also scale
> by any difference there might be in max frequency and/or cpu
> micro-architecture so utilization becomes comparable between cpus. One
> can also see it as slowing down or speeding up time depending on the
> current compute capacity of the cpu relative to the max capacity.
>
>> CAPACITY is also a ratio of ACTUAL_PERF/MAX_PERF, from (0, 1]. Even LOAD
>> is the same, a ratio of NICE_X/NICE_0, from [15/1024=0.015, 
>> 88761/1024=86.68],
>> as it only has relativity meaning (i.e., when comparing to each other).
>
> Fully agree. Though 'LOAD' is a somewhat overloaded term in the
> scheduler. Just to be clear, you refer to load.weight, load_avg is the
> multiplication of load.weight and the task runnable time ratio.
>
>> I said it has unit, it is in the sense that it looks like currency (for 
>> instance,
>> Yuan), you can use to buy CPU fair share. But it is just how you look at it 
>> and
>> there are certainly many other ways.
>> 
>> So, I still propose to generalize all these with the following patch, in the
>> belief that this makes it simple and clear, and error-reducing. 
>> 
>> --
>> 
>> Subject: [PATCH] sched/fair: Generalize the load/util averages resolution
>>  definition
>> 
>> An integer metric needs certain resolution to allow how much detail we
>> can look into (not losing detail by integer rounding), which also
>> determines the range of the metrics.
>> 
>> For instance, to increase the resolution of [0, 1] (two levels), one
>> can multiply 1024 and get [0, 1024] (1025 levels).
>> 
>> In sched/fair, a few metrics depend on the resolution: load/load_avg,
>> util_avg, and capacity (frequency adjustment). In order to reduce the
>> risks of making mistakes relating to resolution/range, we therefore
>> generalize the resolution by defining a basic resolution constant
>> number, and then formalize all metrics to depend on the basic
>> resolution. The basic resolution is 1024 or (1 << 10). Further, one
>> can recursively apply another basic resolution to increase the final
>> resolution (e.g., 1048576=1<<20).
>> 
>> Signed-off-by: Yuyang Du 
>> ---
>>  include/linux/sched.h |  2 +-
>>  kernel/sched/sched.h  | 12 +++-
>>  2 files changed, 8 insertions(+), 6 deletions(-)
>> 
>> diff --git a/include/linux/sched.h b/include/linux/sched.h
>> index 119823d..55a7b93 100644
>> --- a/include/linux/sched.h
>> +++ b/include/linux/sched.h
>> @@ -912,7 +912,7 @@ enum cpu_idle_type {
>>  /*
>>   * Increase resolution of cpu_capacity calculations
>>   */
>> -#define SCHED_CAPACITY_SHIFT10
>> +#define SCHED_CAPACITY_SHIFTSCHED_RESOLUTION_SHIFT
>>  #define SCHED_CAPACITY_SCALE(1L << SCHED_CAPACITY_SHIFT)
>>  
>>  /*
>> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
>> index 68cda11..d27cdd8 100644
>> --- a/kernel/sched/sched.h
>> +++ b/kernel/sched/sched.h
>> @@ -40,6 +40,9 @@ static inline void update_cpu_load_active(struct rq 
>> *this_rq) { }
>>   */
>>  #define NS_TO_JIFFIES(TIME) ((unsigned long)(TIME) / (NSEC_PER_SEC / HZ))
>>  
>> +# define SCHED_RESOLUTION_SHIFT 10
>> +# define SCHED_RESOLUTION_SCALE (1L << SCHED_RESOLUTION_SHIFT)
>> +
>>  /*
>>   * Increase resolution of nice-level calculations for 64-bit architectures.
>>   * The extra resolution improves shares distribution and load balancing of
>> @@ -53,16 +56,15 @@ static inline void update_cpu_load_active(struct rq 
>> *this_rq) { }
>>   * increased costs.
>>   */
>>  #if 0 /* BITS_PER_LONG > 32 -- currently broken: it increases power usage 
>> under light load  */
>> -# define SCHED_LOAD_RESOLUTION  10
>> -# define scale_load(w)  ((w) << SCHED_LOAD_RESOLUTION)
>> -# define scale_load_down(w) ((w) >> SCHED_LOAD_RESOLUTION)
>> +# define SCHED_LOAD_SHIFT   
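
[Archive note] The quoted patch is cut off here in the archive. A minimal sketch of the "basic resolution" idea from its description, with hypothetical DEMO_* names rather than the SCHED_* macros from the patch:

```c
#define DEMO_RESOLUTION_SHIFT 10
#define DEMO_RESOLUTION_SCALE (1L << DEMO_RESOLUTION_SHIFT)	/* 1024 */

/* Map a ratio num/den in [0, 1] onto the integer range [0, 1024]. */
static long demo_ratio_to_fixed(long num, long den)
{
	return (num * DEMO_RESOLUTION_SCALE) / den;
}

/* Apply the basic resolution a second time to gain another 10 bits. */
static long demo_scale_up(long v)
{
	return v << DEMO_RESOLUTION_SHIFT;
}

/* Drop the extra 10 bits again. */
static long demo_scale_down(long v)
{
	return v >> DEMO_RESOLUTION_SHIFT;
}
```

A ratio of 1/2 becomes 512 and 1/3 becomes 341 (the remainder is lost to integer rounding); applying the scale twice to 1 gives 1 << 20.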

Re: [PATCH 0/2] nohz_full: Offload task_tick to remote housekeeping cpus for nohz_full cpus

2015-09-11 Thread Vatika Harlalka
> 1. If LB_BIAS is false for nohz_full CPUs. This will help us figure out if
> rq->cpu_load
> is read for them.

lb_bias feature is not disabled for full dynticks. rq->cpu_load[] is
never used for them.
nohz_full cpus verify the condition on_null_domain(rq)

> 2. When a cpu reports scheduling stats for all cpus such as
> sum_exec_runtime,
> does it consider cpu_isolated_map (most likely it will) ? If it does, we
> need to be
> able to remotely update those statistics before reporting them. i.e. call
> update_curr(rq) on behalf of the nohz_full cpus.

Yeah, cpu stats are reported and updated for all cpus.

The next step would be to offload these to other cpus for nohz_full cpus
running a single task?

Thanks
Vatika

>
>
>
> --
> This is Preeti, signing off.


Re: [Patch V1] x86, mce: CPU synchronization for broadcast MCE's is surprised by offline CPUs

2015-09-11 Thread Raj, Ashok
Hi Boris



On Fri, Sep 11, 2015 at 10:46:36AM +0200, Borislav Petkov wrote:
> 
> One more buffer for MCEs? Why?
> 
> We did add the mce_gen_pool thing exactly for logging stuff in atomic
> context. From looking at the code, we probably could get rid of that
> "struct mce_log mcelog" thing too and use only the gen_pool for logging
> MCEs.

I think using gen_pool should be fine. Also seems the notify moved to the 
mce_gen_pool_process(), so there should be no notifier calls from the offline
cpu.

Let me take a respin and resend. I should have noticed.. apologies. 

> 
> We can then get rid of that MCE_LOG_LEN arbitrary 32 records and use
> a nice 2-paged buffer which can be enlarged transparently later, if
> needed.


Re: [PATCH v2 2/5] seccomp: make underlying bpf ref counted as well

2015-09-11 Thread Tycho Andersen
On Fri, Sep 11, 2015 at 06:03:59PM +0200, Daniel Borkmann wrote:
> On 09/11/2015 04:44 PM, Tycho Andersen wrote:
> >On Fri, Sep 11, 2015 at 03:02:36PM +0200, Daniel Borkmann wrote:
> >>On 09/11/2015 02:20 AM, Tycho Andersen wrote:
> >>>In the next patch, we're going to add a way to access the underlying
> >>>filters via bpf fds. This means that we need to ref-count both the
> >>>struct seccomp_filter objects and the struct bpf_prog objects separately,
> >>>in case a process dies but a filter is still referred to by another
> >>>process.
> >>>
> >>>Additionally, we mark classic converted seccomp filters as seccomp eBPF
> >>>programs, since they are a subset of what is supported in seccomp eBPF.
> >>>
> >>>Signed-off-by: Tycho Andersen 
> >>>CC: Kees Cook 
> >>>CC: Will Drewry 
> >>>CC: Oleg Nesterov 
> >>>CC: Andy Lutomirski 
> >>>CC: Pavel Emelyanov 
> >>>CC: Serge E. Hallyn 
> >>>CC: Alexei Starovoitov 
> >>>CC: Daniel Borkmann 
> >>>---
> >>>  kernel/seccomp.c | 4 +++-
> >>>  1 file changed, 3 insertions(+), 1 deletion(-)
> >>>
> >>>diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> >>>index 245df6b..afaeddf 100644
> >>>--- a/kernel/seccomp.c
> >>>+++ b/kernel/seccomp.c
> >>>@@ -378,6 +378,8 @@ static struct seccomp_filter 
> >>>*seccomp_prepare_filter(struct sock_fprog *fprog)
> >>>   }
> >>>
> >>>   atomic_set(&sfilter->usage, 1);
> >>>+  atomic_set(&sfilter->prog->aux->refcnt, 1);
> >>>+  sfilter->prog->type = BPF_PROG_TYPE_SECCOMP;
> >>
> >>So, if you do this, then this breaks the assumption of eBPF JITs
> >>that, currently, all classic converted BPF programs always have a
> >>prog->type of BPF_PROG_TYPE_UNSPEC (see: bpf_prog_was_classic()).
> >>
> >>Currently, JITs make use of this information to determine whether
> >>A and X mappings for such programs should or should not be cleared
> >>in the prologue (s390 currently).
> >>
> >>In the seccomp_prepare_filter() stage, we're already past that, so
> >>it will not cause an issue, but we certainly would need to be very
> >>careful in future, if bpf_prog_was_classic() is then used at a later
> >>stage when we already have a generated bpf_prog somewhere, as then
> >>this assumption will break.
> >
> >The only reason we need to do this is to allow BPF_DUMP_PROG to work,
> >since we were restricting it to only allow dumping of seccomp
> >programs, since those don't have maps. Instead, perhaps we could allow
> >dumping of BPF_PROG_TYPE_SECCOMP and BPF_PROG_TYPE_UNSPEC?
> 
> There are possibilities that BPF_PROG_TYPE_UNSPEC is calling helpers
> already today, at least in networking case, not seccomp. So, since
> you want to export [classic -> eBPF] only for seccomp, put fds on them
> and dump these via bpf(2), you could allow that (with a big comment
> stating why it's safe), but mid-term we really need to sanitize all
> this stuff properly as this is needed for other types, too.

Sorry, just to be clear, you're suggesting that the patch is ok modulo
a comment describing the jit issues?

Tycho


[PATCH] of: Check for overlap in reserved memory regions

2015-09-11 Thread Mitchel Humpherys
Any overlap in the reserved memory regions (those specified in the
reserved-memory DT node) is a bug.  These bugs might go undetected as
long as the contested region isn't used simultaneously by multiple
software agents, which makes such bugs hard to debug.  Fix this by
printing a scary warning during boot if overlap is detected.

Signed-off-by: Mitchel Humpherys 
---
 drivers/of/of_reserved_mem.c | 45 +++-
 1 file changed, 44 insertions(+), 1 deletion(-)
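
For readers following along outside the kernel tree, the sort-then-scan idea
described above can be sketched in plain userspace C. This is only an
illustration under stated assumptions: struct region and count_overlaps() are
hypothetical names, qsort() stands in for the kernel's sort(), addresses are
assumed to be 64-bit, and the sketch sorts in place and does not skip regions
whose base is still zero. The comparator compares instead of subtracting,
because the difference of two 64-bit addresses may not fit in the int return
value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct reserved_mem. */
struct region {
	const char *name;
	uint64_t base;
	uint64_t size;
};

/* Order by base address; compare rather than subtract to avoid truncation. */
static int region_cmp(const void *a, const void *b)
{
	const struct region *ra = a, *rb = b;

	if (ra->base < rb->base)
		return -1;
	if (ra->base > rb->base)
		return 1;
	return 0;
}

/* Sort regions by base address and count neighbours that overlap. */
static int count_overlaps(struct region *r, size_t n)
{
	int overlaps = 0;
	size_t i;

	if (n < 2)
		return 0;

	qsort(r, n, sizeof(r[0]), region_cmp);
	for (i = 0; i + 1 < n; i++) {
		/* Once sorted, a region overlaps its successor when it ends past it. */
		if (r[i].base + r[i].size > r[i + 1].base)
			overlaps++;
	}
	return overlaps;
}
```

Sorting first means only adjacent pairs have to be compared, so the check stays
O(n log n) instead of comparing every pair of regions.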

diff --git a/drivers/of/of_reserved_mem.c b/drivers/of/of_reserved_mem.c
index 726ebe792813..5246d346cee0 100644
--- a/drivers/of/of_reserved_mem.c
+++ b/drivers/of/of_reserved_mem.c
@@ -1,7 +1,7 @@
 /*
  * Device tree based initialization code for reserved memory.
  *
- * Copyright (c) 2013, The Linux Foundation. All Rights Reserved.
+ * Copyright (c) 2013, 2015 The Linux Foundation. All Rights Reserved.
  * Copyright (c) 2013,2014 Samsung Electronics Co., Ltd.
  * http://www.samsung.com
  * Author: Marek Szyprowski 
@@ -20,9 +20,11 @@
 #include 
 #include 
 #include 
+#include 
 
 #define MAX_RESERVED_REGIONS   16
 static struct reserved_mem reserved_mem[MAX_RESERVED_REGIONS];
+static struct reserved_mem sorted_reserved_mem[MAX_RESERVED_REGIONS] 
__initdata;
 static int reserved_mem_count;
 
 #if defined(CONFIG_HAVE_MEMBLOCK)
@@ -197,12 +199,53 @@ static int __init __reserved_mem_init_node(struct 
reserved_mem *rmem)
return -ENOENT;
 }
 
+static int __init __rmem_cmp(const void *a, const void *b)
+{
+   const struct reserved_mem *ra = a, *rb = b;
+
+   return ra->base - rb->base;
+}
+
+static void __init __rmem_check_for_overlap(void)
+{
+   int i;
+
+   if (reserved_mem_count < 2)
+   return;
+
+   memcpy(sorted_reserved_mem, reserved_mem, sizeof(sorted_reserved_mem));
+   sort(sorted_reserved_mem, reserved_mem_count,
+sizeof(sorted_reserved_mem[0]), __rmem_cmp, NULL);
+   for (i = 0; i < reserved_mem_count - 1; i++) {
+   struct reserved_mem *this, *next;
+
+   this = &sorted_reserved_mem[i];
+   next = &sorted_reserved_mem[i + 1];
+   if (!(this->base && next->base))
+   continue;
+   if (this->base + this->size > next->base) {
+   phys_addr_t this_end, next_end;
+
+   this_end = this->base + this->size;
+   next_end = next->base + next->size;
+   WARN(1, "Reserved mem: OVERLAP DETECTED!\n");
+   pr_err("%s (%pa--%pa) overlaps with %s (%pa--%pa)\n",
+  this->name, &this->base, &this_end,
+  next->name, &next->base, &next_end);
+   }
+   }
+}
+
 /**
  * fdt_init_reserved_mem - allocate and init all saved reserved memory regions
  */
 void __init fdt_init_reserved_mem(void)
 {
int i;
+
+   /* check for overlapping reserved regions */
+   __rmem_check_for_overlap();
+
for (i = 0; i < reserved_mem_count; i++) {
struct reserved_mem *rmem = &reserved_mem[i];
unsigned long node = rmem->fdt_node;
-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [PATCH 03/11] ARM: DT: STiH407: Add SPI 3 wire and 4 wire pinctrl configs

2015-09-11 Thread Lee Jones
On Fri, 11 Sep 2015, Peter Griffin wrote:

> This patch adds the spi pinctrl configurations for all SPI
> controllers, and also the alternate muxings which
> can be used depending on board design.
> 
> Signed-off-by: Christophe Kerello 
> Signed-off-by: Peter Griffin 
> ---
>  arch/arm/boot/dts/stih407-pinctrl.dtsi | 239 
> -
>  1 file changed, 235 insertions(+), 4 deletions(-)

Acked-by: Lee Jones 

> diff --git a/arch/arm/boot/dts/stih407-pinctrl.dtsi 
> b/arch/arm/boot/dts/stih407-pinctrl.dtsi
> index ce219a1..bb3b0c7 100644
> --- a/arch/arm/boot/dts/stih407-pinctrl.dtsi
> +++ b/arch/arm/boot/dts/stih407-pinctrl.dtsi
> @@ -262,6 +262,57 @@
>   };
>   };
>   };
> +
> + spi10 {
> + pinctrl_spi10_default: spi10-4w-alt1-0 {
> + st,pins {
> + mtsr = < 6 ALT1 OUT>;
> + mrst = < 7 ALT1 IN>;
> + scl = < 5 ALT1 OUT>;
> + };
> + };
> +
> + pinctrl_spi10_3w_alt1_0: spi10-3w-alt1-0 {
> + st,pins {
> + mtsr = < 6 ALT1 BIDIR_PU>;
> + scl = < 5 ALT1 OUT>;
> + };
> + };
> + };
> +
> + spi11 {
> + pinctrl_spi11_default: spi11-4w-alt2-0 {
> + st,pins {
> + mtsr = < 1 ALT2 OUT>;
> + mrst = < 0 ALT2 IN>;
> + scl = < 2 ALT2 OUT>;
> + };
> + };
> +
> + pinctrl_spi11_3w_alt2_0: spi11-3w-alt2-0 {
> + st,pins {
> + mtsr = < 1 ALT2 BIDIR_PU>;
> + scl = < 2 ALT2 OUT>;
> + };
> + };
> + };
> +
> + spi12 {
> + pinctrl_spi12_default: spi12-4w-alt2-0 {
> + st,pins {
> + mtsr = < 6 ALT2 OUT>;
> + mrst = < 4 ALT2 IN>;
> + scl = < 7 ALT2 OUT>;
> + };
> + };
> +
> + pinctrl_spi12_3w_alt2_0: spi12-3w-alt2-0 {
> + st,pins {
> + mtsr = < 6 ALT2 BIDIR_PU>;
> + scl = < 7 ALT2 OUT>;
> + };
> + };
> + };
>   };
>  
>   pin-controller-front0 {
> @@ -451,11 +502,159 @@
>   };
>  
>   spi0 {
> - pinctrl_spi0_default: spi0-default {
> + pinctrl_spi0_default: spi0-4w-alt2-0 {
> + st,pins {
> + mtsr = < 6 ALT2 OUT>;
> + mrst = < 7 ALT2 IN>;
> + scl = < 5 ALT2 OUT>;
> + };
> + };
> +
> + pinctrl_spi0_3w_alt2_0: spi0-3w-alt2-0 {
>   st,pins {
> - mtsr = < 6 ALT2 BIDIR>;
> - mrst = < 7 ALT2 BIDIR>;
> - scl = < 5 ALT2 BIDIR>;
> + mtsr = < 6 ALT2 BIDIR_PU>;
> + scl = < 5 ALT2 OUT>;
> + };
> + };
> +
> + pinctrl_spi0_4w_alt1_0: spi0-4w-alt1-0 {
> + st,pins {
> + mtsr = < 7 ALT1 OUT>;
> + mrst = < 5 ALT1 IN>;
> + scl = < 6 ALT1 OUT>;
> + };
> + };
> +
> + pinctrl_spi0_3w_alt1_0: 

Re: [PATCH v2 7/9] ARM: STi: DT: STiH407: Add FDMA driver dt nodes.

2015-09-11 Thread Peter Griffin
Hi Lee,

On Fri, 11 Sep 2015, Lee Jones wrote:

> On Fri, 11 Sep 2015, Peter Griffin wrote:
> > On Fri, 11 Sep 2015, Lee Jones wrote:
> > > On Fri, 11 Sep 2015, Peter Griffin wrote:
> > > 
> > > > These nodes are required to get the fdma driver working
> > > > on STiH407 based silicon.
> > > > 
> > > > Signed-off-by: Peter Griffin 
> > > > ---
> > > >  arch/arm/boot/dts/stih407-family.dtsi | 51 
> > > > +++
> > > >  1 file changed, 51 insertions(+)
> > > > 
> > > > diff --git a/arch/arm/boot/dts/stih407-family.dtsi 
> > > > b/arch/arm/boot/dts/stih407-family.dtsi
> > > > index 838b812..da07474b 100644
> > > > --- a/arch/arm/boot/dts/stih407-family.dtsi
> > > > +++ b/arch/arm/boot/dts/stih407-family.dtsi
> > > > @@ -565,5 +565,56 @@
> > > >   <_port2 
> > > > PHY_TYPE_USB3>;
> > > > };
> > > > };
> > > > +
> > > > +   fdma0: fdma0-audio@8e2 {
> > > 
> > > I'm not familiar with the FDMA driver, so can't comment knowledgeably,
> > > but the  part of @ should only describe the
> > > type of hardware.  I believe in this case it should just be
> > > dma@08e2.  Also notice the leading zero in the address, which I
> > > believe mitigates possible confusion.  Then you be more specific with
> > > the label, so something like 'fdma-audio' seems appropriate here.
> > 
> > Ok, can change to that format in v3.
> > 
> > > 
> > > > +   compatible = "st,stih407-fdma-mpe31";
> > > > +   reg = <0x8e2 0x2>;
> > > 
> > > I personally find padding up to 32bits helpful in the addresses.
> > 
> > None of the stih407-family nodes I can see have this padding, including
> > the ones merged by you.
> 
> Neither of these two facts means it's correct.

I thought it was a 'personal' thing. If it is mandated by the spec, then that
is different.

> 
> I'm happy to write a patch to correct them all.

Are you sure you're actually correcting anything? Where does it say you
should have a leading zero?

> 
> Bear in mind that this isn't a hard and fast rule.  Both work and are
> legal.  I just think the padding is more consistent.

Surely adding a patch with how the nodes are currently formatted is more
consistent than adding a patch with padding?

regards,

Peter.



[RFC PATCH] vfio/pci: Use kernel VPD access functions

2015-09-11 Thread Alex Williamson
The PCI VPD capability operates on a set of window registers in PCI
config space.  Writing to the address register triggers either a read
or write, depending on the setting of the PCI_VPD_ADDR_F bit within
the address register.  The data register provides either the source
for writes or the target for reads.

This model is susceptible to being broken by concurrent access, for
which the kernel has adopted a set of access functions to serialize
these registers.  Additionally, commits like 932c435caba8 ("PCI: Add
dev_flags bit to access VPD through function 0") and 7aa6ca4d39ed
("PCI: Add VPD function 0 quirk for Intel Ethernet devices") indicate
that VPD registers can be shared between functions on multifunction
devices creating dependencies between otherwise independent devices.

Fortunately it's quite easy to emulate the VPD registers, simply
storing copies of the address and data registers in memory and
triggering a VPD read or write on writes to the address register.
This allows vfio users to avoid seeing spurious register changes from
accesses on other devices and enables the use of shared quirks in the
host kernel.  We can theoretically still race with access through
sysfs, but the window of opportunity is much smaller.

Signed-off-by: Alex Williamson 
---

RFC - Is this something we should do?  Should we consider providing
similar emulation through PCI sysfs to allow lspci to also make use
of the vpd interfaces?
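
To make the window-register model above concrete, here is a small userspace
sketch of the emulation idea. It is not the vfio code: struct vpd_emul,
vpd_addr_write() and the 64-byte backing array are hypothetical, and the array
stands in for whatever pci_read_vpd()/pci_write_vpd() would reach on a real
device. Only the flag value 0x8000 mirrors PCI_VPD_ADDR_F.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VPD_ADDR_F 0x8000u	/* flag bit in the 16-bit address register */

/* Hypothetical emulated state: shadow registers plus a fake VPD store. */
struct vpd_emul {
	uint16_t addr;		/* shadow of the VPD address register */
	uint32_t data;		/* shadow of the VPD data register */
	uint8_t store[64];	/* stands in for the device's VPD contents */
};

/* Emulate a write to the address register; returns 0 on success. */
static int vpd_addr_write(struct vpd_emul *v, uint16_t val)
{
	uint16_t off = val & (uint16_t)~VPD_ADDR_F;

	v->addr = val;
	if (off > sizeof(v->store) - 4u)
		return -1;	/* flag left untouched, so the caller times out */

	if (val & VPD_ADDR_F)
		memcpy(&v->store[off], &v->data, 4);	/* flag set: write */
	else
		memcpy(&v->data, &v->store[off], 4);	/* flag clear: read */

	v->addr ^= VPD_ADDR_F;	/* toggle the flag to signal completion */
	return 0;
}
```

Because the shadow registers live in per-device memory, an access made through
another function or another path cannot change what this user sees between
writing the address and reading the data.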

 drivers/vfio/pci/vfio_pci_config.c |   70 +++-
 1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/pci/vfio_pci_config.c 
b/drivers/vfio/pci/vfio_pci_config.c
index ff75ca3..a8657ef 100644
--- a/drivers/vfio/pci/vfio_pci_config.c
+++ b/drivers/vfio/pci/vfio_pci_config.c
@@ -671,6 +671,73 @@ static int __init init_pci_cap_pm_perm(struct perm_bits 
*perm)
return 0;
 }
 
+static int vfio_vpd_config_write(struct vfio_pci_device *vdev, int pos,
+int count, struct perm_bits *perm,
+int offset, __le32 val)
+{
+   struct pci_dev *pdev = vdev->pdev;
+   __le16 *paddr = (__le16 *)(vdev->vconfig + pos - offset + PCI_VPD_ADDR);
+   __le32 *pdata = (__le32 *)(vdev->vconfig + pos - offset + PCI_VPD_DATA);
+   u16 addr;
+   u32 data;
+
+   /*
+* Write through to emulation.  If the write includes the upper byte
+* of PCI_VPD_ADDR, then the PCI_VPD_ADDR_F bit is written and we
+* have work to do.
+*/
+   count = vfio_default_config_write(vdev, pos, count, perm, offset, val);
+   if (count < 0 || offset > PCI_VPD_ADDR + 1 ||
+   offset + count <= PCI_VPD_ADDR + 1)
+   return count;
+
+   addr = le16_to_cpu(*paddr);
+
+   if (addr & PCI_VPD_ADDR_F) {
+   data = le32_to_cpu(*pdata);
+   if (pci_write_vpd(pdev, addr & ~PCI_VPD_ADDR_F, 4, &data) != 4)
+   return count;
+   } else {
+   if (pci_read_vpd(pdev, addr, 4, &data) != 4)
+   return count;
+   *pdata = cpu_to_le32(data);
+   }
+
+   /*
+* Toggle PCI_VPD_ADDR_F in the emulated PCI_VPD_ADDR register to
+* signal completion.  If an error occurs above, we assume that not
+* toggling this bit will induce a driver timeout.
+*/
+   addr ^= PCI_VPD_ADDR_F;
+   *paddr = cpu_to_le16(addr);
+
+   return count;
+}
+
+/* Permissions for Vital Product Data capability */
+static int __init init_pci_cap_vpd_perm(struct perm_bits *perm)
+{
+   if (alloc_perm_bits(perm, pci_cap_length[PCI_CAP_ID_VPD]))
+   return -ENOMEM;
+
+   perm->writefn = vfio_vpd_config_write;
+
+   /*
+* We always virtualize the next field so we can remove
+* capabilities from the chain if we want to.
+*/
+   p_setb(perm, PCI_CAP_LIST_NEXT, (u8)ALL_VIRT, NO_WRITE);
+
+   /*
+* Both the address and data registers are virtualized to
+* enable access through the pci_vpd_read/write functions
+*/
+   p_setw(perm, PCI_VPD_ADDR, (u16)ALL_VIRT, (u16)ALL_WRITE);
+   p_setd(perm, PCI_VPD_DATA, ALL_VIRT, ALL_WRITE);
+
+   return 0;
+}
+
 /* Permissions for PCI-X capability */
 static int __init init_pci_cap_pcix_perm(struct perm_bits *perm)
 {
@@ -790,6 +857,7 @@ void vfio_pci_uninit_perm_bits(void)
free_perm_bits(&cap_perms[PCI_CAP_ID_BASIC]);
 
free_perm_bits(&cap_perms[PCI_CAP_ID_PM]);
+   free_perm_bits(&cap_perms[PCI_CAP_ID_VPD]);
free_perm_bits(&cap_perms[PCI_CAP_ID_PCIX]);
free_perm_bits(&cap_perms[PCI_CAP_ID_EXP]);
free_perm_bits(&cap_perms[PCI_CAP_ID_AF]);
@@ -807,7 +875,7 @@ int __init vfio_pci_init_perm_bits(void)
 
/* Capabilities */
ret |= init_pci_cap_pm_perm(&cap_perms[PCI_CAP_ID_PM]);
-   cap_perms[PCI_CAP_ID_VPD].writefn = vfio_raw_config_write;
+   ret |= 

RE: Chipidea ULPI driver

2015-09-11 Thread Subbaraya Sundeep Bhatta
Hi,

> -Original Message-
> From: Subbaraya Sundeep Bhatta
> Sent: Friday, September 11, 2015 5:04 PM
> To: 'Peter Chen'; Punnaiah Choudary Kalluri
> Cc: ba...@ti.com; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org;
> Greg Kroah-Hartman (gre...@linuxfoundation.org); kis...@ti.com
> Subject: RE: Chipidea ULPI driver
>
> Hi,
>
> > -Original Message-
> > From: Peter Chen [mailto:peter.c...@freescale.com]
> > Sent: Friday, September 11, 2015 6:52 AM
> > To: Punnaiah Choudary Kalluri
> > Cc: ba...@ti.com; Subbaraya Sundeep Bhatta; linux-...@vger.kernel.org;
> > linux- ker...@vger.kernel.org; Greg Kroah-Hartman
> > (gre...@linuxfoundation.org); kis...@ti.com
> > Subject: Re: Chipidea ULPI driver
> >
> > On Thu, Sep 10, 2015 at 02:57:49PM +, Punnaiah Choudary Kalluri
> wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Felipe Balbi [mailto:ba...@ti.com]
> > > > Sent: Thursday, September 10, 2015 8:14 PM
> > > > To: Subbaraya Sundeep Bhatta
> > > > Cc: Peter Chen; ba...@ti.com; linux-...@vger.kernel.org; linux-
> > > > ker...@vger.kernel.org; Greg Kroah-Hartman
> > > > (gre...@linuxfoundation.org); kis...@ti.com; Punnaiah Choudary
> > > > Kalluri
> > > > Subject: Re: Chipidea ULPI driver
> > > >
> > > > (break your lines at 80-characters)
> > > >
> > > > On Thu, Sep 10, 2015 at 12:44:58PM +, Subbaraya Sundeep Bhatta
> > wrote:
> > > > > Hi Peter,
> > > > >
> > > > > We are using NOP transceiver driver for USB3320 ULPI PHY with
> > > > > ChipIdea controller.
> > > > >
> > > > > Recently we found that one of the boards (zedboard) requires PHY
> > > > > register access to set VBUS.
> > > > >
> > > > > Note that our local driver we had before migrating to ChipIdea
> > > > > driver calls otg_ulpi_create with flags  ULPI_OTG_DRVVBUS |
> > > > > ULPI_OTG_DRVVBUS_EXT so that VBUS is enabled at initialization.
> > > > >
> > > > > Can you please let me know how to do this with ChipIdea case? I
> > > > > see the following solutions:
> > > > >
> > > > > 1. Write ULPI driver for USB3320 similar to tusb1210.
> > > >
> > > > this
> > >
> > > How about extending the phy-ulpi driver to use it as platform driver?
> > > So that boards that are using the ulpi compatible phy and driving
> > > vbus from the phy can use this driver.
> > >
> >
> > Yes, you can improve phy-ulpi driver, and it can not depend on NOP
> > transceiver driver.
>
> AFAIK phy-ulpi.c is just exporting functions and not registering to
> platform bus since it is not connected to SOC bus. I don't think we can
> create platform driver for this. I have read TUSB1210 data sheet and it
> is similar to USB3320 with no additional SOC bus connection and has only
> ULPI interface.

Sorry, I mean external PHY here.

> So it should register to ULPI bus which is in kernel recently. I will
> make changes to chipidea similar to dwc3(adding ulpi.c) and write driver
> similar to tusb1210.c. Is that okay?
>
> Thanks,
> Sundeep
>
> >
> > --
> >
> > Best Regards,
> > Peter Chen





Re: [PATCH 0/7] devcg: device cgroup extension for rdma resource

2015-09-11 Thread Tejun Heo
Hello, Parav.

On Fri, Sep 11, 2015 at 09:56:31PM +0530, Parav Pandit wrote:
> Resource run away by application can lead to (a) kernel and (b) other
> applications left out with no resources situation.

Yeap, that this controller would be able to prevent to a reasonable
extent.

> Both the problems are the target of this patch set by accounting via cgroup.
> 
> Performance contention can be resolved with higher level user space,
> which will tune it.

If individual applications are gonna be allowed to do that, what's to
prevent them from jacking up their limits?  So, I assume you're
thinking of a central authority overseeing distribution and enforcing
the policy through cgroups?

> Threshold and fail counters are on the way in follow on patch.

If you're planning on following what the existing memcg did in this
area, it's unlikely to go well.  Would you mind sharing what you have
in mind in the long term?  Where do you see this going?

Thanks.

-- 
tejun


Re: [PATCH v2 4/5] seccomp: add a way to access filters via bpf fds

2015-09-11 Thread Tycho Andersen
On Fri, Sep 11, 2015 at 09:20:55AM -0700, Andy Lutomirski wrote:
> On Sep 10, 2015 5:22 PM, "Tycho Andersen"  
> wrote:
> >
> > This patch adds a way for a process that is "real root" to access the
> > seccomp filters of another process. The process first does a
> > PTRACE_SECCOMP_GET_FILTER_FD to get an fd with that process' seccomp filter
> > attached, and then iterates on this with PTRACE_SECCOMP_NEXT_FILTER using
> > bpf(BPF_PROG_DUMP) to dump the actual program at each step.
> >
> 
> > +
> > +   fd = bpf_new_fd(filter->prog, O_RDONLY);
> > +   if (fd > 0)
> > +   atomic_inc(&filter->prog->aux->refcnt);
> 
> Why isn't this folded into bpf_new_fd?

No reason it can't be as far as I can see. I'll make the change for
the next version.

> > +
> > +   return fd;
> > +}
> > +
> > +long seccomp_next_filter(struct task_struct *child, u32 fd)
> > +{
> > +   struct seccomp_filter *cur;
> > +   struct bpf_prog *prog;
> > +   long ret = -ESRCH;
> > +
> > +   if (!capable(CAP_SYS_ADMIN))
> > +   return -EACCES;
> > +
> > +   if (child->seccomp.mode != SECCOMP_MODE_FILTER)
> > +   return -EINVAL;
> > +
> > +   prog = bpf_prog_get(fd);
> > +   if (IS_ERR(prog)) {
> > +   ret = PTR_ERR(prog);
> > +   goto out;
> > +   }
> > +
> > +   for (cur = child->seccomp.filter; cur; cur = cur->prev) {
> > +   if (cur->prog == prog) {
> > +   if (!cur->prev)
> > +   ret = -ENOENT;
> > +   else
> > +   ret = bpf_prog_set(fd, cur->prev->prog);
> 
> This lets you take an fd pointing to one prog and point it elsewhere.
> I'm not sure that's a good idea.

That's how the interface was designed (calling ptrace(NEXT_FILTER, fd) and
then doing bpf(DUMP, fd)). I suppose we could have NEXT_FILTER return
a new fd instead if that seems better to you.

Tycho


Re: [PATCH] Documentation: Remove misleading examples of the barriers in wake_*()

2015-09-11 Thread Boqun Feng
Hi Oleg,

On Thu, Sep 10, 2015 at 07:55:57PM +0200, Oleg Nesterov wrote:
> On 09/10, Boqun Feng wrote:
> >
> > On Wed, Sep 09, 2015 at 12:28:22PM -0700, Paul E. McKenney wrote:
> > > My feeling is
> > > that we should avoid saying too much about the internals of wait_event()
> > > and wake_up().
> 
> I feel the same. I simply can't understand what we are trying to
> document ;)
> 

What I think we should document here is what memory ordering guarantee
we can rely on with these sleep/wakeup primitives, or what kind of
barriers these primitives imply. Because the structure of the
memory-barriers.txt here is:

 (*) Implicit kernel memory barriers.

 - Locking functions.
 - Interrupt disabling functions.
   ->- Sleep and wake-up functions.<-
 - Miscellaneous functions.

> For example,
> 
> > A STORE-LOAD barrier is implied after setting task state by wait-related 
> > functions:
> >
> > prepare_to_wait();
> > prepare_to_wait_exclusive();
> > prepare_to_wait_event();
> 
> I won't argue, but to me this looks misleading too.
> 
> Yes, prepare_to_wait()->set_current_state() implies mb() and thus
> a STORE-LOAD barrier.
> 
> But this has nothing to do with the explanation above. We do not
> need this barrier to avoid the race with wake_up(). Again, again,
> we can safely rely on wq->lock and acquire/release semantics.
> 

Yes, you are right. prepare_to_wait*() should not be put here. What should
be put here is set_current_state(), whose STORE-LOAD barrier pairs with
the STORE-LOAD barrier of wake_up_process().

> This barrier is only needed if you do, say,
> 
>   CONDITION = 1;
> 
>   if (waitqueue_active(wq))
>   wake_up(wq);
> 
> And note that the code above is wrong without another mb() after
> CONDITION = 1.
> 

Understood, I admit I didn't realize this before.

To summarize, we have three kinds of data related to sleep/wakeup:

*   CONDITIONs: global data used to indicate events

*   task states

*   wait queues(may not be used, if users use set_current_state() +
schedule() to sleep and wake_up_process() to wake up)

IIUC, races on wait queues are almost entirely avoided because of wq->lock,
and if a wait queue is used, races on task states are avoided because
states are read and written with wq->lock held in the sleep/wakeup
functions. So only in two cases do we need STORE-LOAD barriers to avoid the
race:

1.  no wait queue used(e.g. rcu_boost_kthread), we need STORE-LOAD
to order accesses to task states and CONDITIONs, so we have
barriers in wake_up_process() and set_current_state().

2.  wait queue accessed without a wq->lock held(e.g. your example),
we need STORE-LOAD to order accesses to wait queues and CONDITIONs

Since case #1 still exists in kernel, we'd better keep this section in
memory-barriers.txt, however, I'm not sure whether we should mention
case #2 in this section.

Here is a modified version, without mentioning case #2:


SLEEP AND WAKE-UP FUNCTIONS
---------------------------

Sleeping and waking on an event flagged in global data can be viewed as an
interaction between two pieces of data: the task state of the task waiting for
the event and the global data used to indicate the event.  To make sure that
these appear to happen in the right order, the primitives to begin the process
of going to sleep, and the primitives to initiate a wake up imply certain
barriers.

If a wait queue is used, all accesses to task states are protected by the lock
of the wait queue, so the race on task states is avoided. However, if no wait
queue is used, we need some memory ordering guarantee to avoid the race between
sleepers/wakees and wakers.

The memory ordering requirement here can be expressed by two STORE-LOAD
barriers (barriers which guarantee that a STORE preceding them can never be
reordered after a LOAD following them). One STORE-LOAD barrier is needed on the
sleeper/wakee side, before reading a variable used to indicate the event and
after setting the state of the current task. Another STORE-LOAD barrier is
needed on the waker side, before reading the state of the wakee task and after
setting a variable used to indicate the event. These two barriers can pair with
each other to avoid race conditions between sleepers/wakees and wakers:

sleeper/wakee on CPU 1                  waker on CPU 2

{ wakee->state = TASK_RUNNING, event_indicated = 0 }
STORE current->state=TASK_INTERRUPTIBLE
<STORE-LOAD barrier>
c = LOAD event_indicated
                                        STORE event_indicated=1
                                        <STORE-LOAD barrier>
                                        s = LOAD wakee->state

assert(!(c==0 && s == TASK_RUNNING));

A STORE-LOAD barrier is implied after setting task state in set_current_state()
and before reading task state in wake_up_process()
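
The pairing can be sketched in userspace with C11 atomics. This is only an
illustration under assumptions: atomic_thread_fence(memory_order_seq_cst)
stands in for the full barrier implied by set_current_state() and
wake_up_process(), and sleeper_should_sleep() and waker_signal() are
hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1 };

static atomic_int task_state = TASK_RUNNING;	/* stands in for wakee->state */
static atomic_int event_indicated = 0;		/* the global CONDITION */

/* Sleeper side: returns true if it would go on to call schedule(). */
static bool sleeper_should_sleep(void)
{
	atomic_store_explicit(&task_state, TASK_INTERRUPTIBLE,
			      memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* STORE-LOAD barrier */
	return atomic_load_explicit(&event_indicated,
				    memory_order_relaxed) == 0;
}

/* Waker side: returns true if it saw a sleeping task and woke it. */
static bool waker_signal(void)
{
	atomic_store_explicit(&event_indicated, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* STORE-LOAD barrier */
	if (atomic_load_explicit(&task_state,
				 memory_order_relaxed) == TASK_RUNNING)
		return false;	/* nobody to wake */
	atomic_store_explicit(&task_state, TASK_RUNNING, memory_order_relaxed);
	return true;
}
```

With both fences in place, the outcome where the sleeper misses the event and
the waker also sees TASK_RUNNING is ruled out, which is the assert() in the
diagram above. Dropping either fence allows the store to be reordered after the
load on that side, and the wakeup can be lost.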

Make sure to call set_current_state() before reading the global data used to 

You usually reach three thousand companies at most; where are the other hundred-thousand-plus customers worldwide?

2015-09-11 Thread iSayor
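Hello:
Are you still using the Ali platform to find foreign-trade customers?
   Still using trade shows to promote your company and products?
 You are out of date!!!
 Foreign-trade customers are hard to find these days. Are you also looking for good channels beyond trade shows and B2B?
 Your industry has over a hundred thousand customers worldwide, yet you usually reach three thousand at most. Would you like to reach the rest as well?
 Add QQ 2652697913 for a demonstration of how to find customers proactively. The sooner you use it, the sooner you benefit; nearly ten thousand companies are already using it ahead of you!!
 Recommended by the Guangdong Provincial Chamber of Commerce, the number one brand for proactive customer development; nearly ten thousand companies are already benefiting. You may choose not to use it, but you should at least know about it.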



[PATCH v6 3/6] locking/pvqspinlock, x86: Optimize PV unlock code path

2015-09-11 Thread Waiman Long
The unlock function in queued spinlocks was optimized for better
performance on bare metal systems at the expense of virtualized guests.

For x86-64 systems, the unlock call needs to go through a
PV_CALLEE_SAVE_REGS_THUNK() which saves and restores 8 64-bit
registers before calling the real __pv_queued_spin_unlock()
function. The thunk code may also be in a separate cacheline from
__pv_queued_spin_unlock().

This patch optimizes the PV unlock code path by:
 1) Moving the unlock slowpath code from the fastpath into a separate
    __pv_queued_spin_unlock_slowpath() function to make the fastpath
    as simple as possible.
 2) For x86-64, hand-coding an assembly function to combine the register
    saving thunk code with the fastpath code. Only registers that
    are used in the fastpath will be saved and restored. If the
    fastpath fails, the slowpath function will be called via another
    PV_CALLEE_SAVE_REGS_THUNK(). For 32-bit, it falls back to the C
    __pv_queued_spin_unlock() code as the thunk saves and restores
    only one 32-bit register.

With a microbenchmark of 5M lock-unlock loop, the table below shows
the execution times before and after the patch with different number
of threads in a VM running on a 32-core Westmere-EX box with x86-64
4.2-rc1 based kernels:

  Threads    Before patch    After patch    % Change
  -------    ------------    -----------    --------
     1         134.1 ms        119.3 ms      -11%
     2        1286   ms        953   ms      -26%
     3        3715   ms       3480   ms      -6.3%
     4        4092   ms       3764   ms      -8.0%

Signed-off-by: Waiman Long 
---
 arch/x86/include/asm/qspinlock_paravirt.h |   59 +
 kernel/locking/qspinlock_paravirt.h   |   43 +
 2 files changed, 86 insertions(+), 16 deletions(-)
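
The fastpath/slowpath split described above can be sketched in portable C11.
This is an illustration only, not the patch's code: pv_unlock(),
unlock_slowpath() and the slowpath_calls counter are hypothetical, LOCKED_VAL
plays the role of _Q_LOCKED_VAL, and the slow path here merely clears the lock
byte, whereas the real one has more work to do first.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define LOCKED_VAL 1u	/* plays the role of _Q_LOCKED_VAL */

static unsigned int slowpath_calls;	/* instrumentation for the sketch only */

/* Hypothetical slow path: only clears the lock byte in this sketch. */
static void unlock_slowpath(_Atomic uint8_t *locked, uint8_t lockval)
{
	(void)lockval;
	slowpath_calls++;
	atomic_store_explicit(locked, 0, memory_order_release);
}

/* Fast path: one compare-and-swap, falling back only when it fails. */
static void pv_unlock(_Atomic uint8_t *locked)
{
	uint8_t expected = LOCKED_VAL;

	if (atomic_compare_exchange_strong_explicit(locked, &expected, 0,
						    memory_order_release,
						    memory_order_relaxed))
		return;
	/* On failure, expected holds the value actually found in *locked. */
	unlock_slowpath(locked, expected);
}
```

Keeping the common case down to a single compare-and-swap and a return is what
allows the hand-written x86-64 version to save and restore only the registers
that the fastpath touches.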

diff --git a/arch/x86/include/asm/qspinlock_paravirt.h 
b/arch/x86/include/asm/qspinlock_paravirt.h
index b002e71..3001972 100644
--- a/arch/x86/include/asm/qspinlock_paravirt.h
+++ b/arch/x86/include/asm/qspinlock_paravirt.h
@@ -1,6 +1,65 @@
 #ifndef __ASM_QSPINLOCK_PARAVIRT_H
 #define __ASM_QSPINLOCK_PARAVIRT_H
 
+/*
+ * For x86-64, PV_CALLEE_SAVE_REGS_THUNK() saves and restores 8 64-bit
+ * registers. For i386, however, only 1 32-bit register needs to be saved
+ * and restored. So an optimized version of __pv_queued_spin_unlock() is
+ * hand-coded for 64-bit, but it isn't worthwhile to do it for 32-bit.
+ */
+#ifdef CONFIG_64BIT
+
+PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock_slowpath);
+#define __pv_queued_spin_unlock    __pv_queued_spin_unlock
+#define PV_UNLOCK                  "__raw_callee_save___pv_queued_spin_unlock"
+#define PV_UNLOCK_SLOWPATH         "__raw_callee_save___pv_queued_spin_unlock_slowpath"
+
+/*
+ * Optimized assembly version of __raw_callee_save___pv_queued_spin_unlock
+ * which combines the registers saving trunk and the body of the following
+ * C code:
+ *
+ * void __pv_queued_spin_unlock(struct qspinlock *lock)
+ * {
+ * struct __qspinlock *l = (void *)lock;
+ * u8 lockval = cmpxchg(&l->locked, _Q_LOCKED_VAL, 0);
+ *
+ * if (likely(lockval == _Q_LOCKED_VAL))
+ * return;
+ * pv_queued_spin_unlock_slowpath(lock, lockval);
+ * }
+ *
+ * For x86-64,
+ *   rdi = lock(first argument)
+ *   rsi = lockval (second argument)
+ *   rdx = internal variable (set to 0)
+ */
+asm(".pushsection .text;"
+".globl " PV_UNLOCK ";"
+".align 4,0x90;"
+PV_UNLOCK ": "
+"push  %rdx;"
+"mov   $0x1,%eax;"
+"xor   %edx,%edx;"
+"lock cmpxchg %dl,(%rdi);"
+"cmp   $0x1,%al;"
+"jne   .slowpath;"
+"pop   %rdx;"
+"ret;"
+".slowpath: "
+"push   %rsi;"
+"movzbl %al,%esi;"
+"call " PV_UNLOCK_SLOWPATH ";"
+"pop    %rsi;"
+"pop    %rdx;"
+"ret;"
+".size " PV_UNLOCK ", .-" PV_UNLOCK ";"
+".popsection");
+
+#else /* CONFIG_64BIT */
+
+extern void __pv_queued_spin_unlock(struct qspinlock *lock);
 PV_CALLEE_SAVE_REGS_THUNK(__pv_queued_spin_unlock);
 
+#endif /* CONFIG_64BIT */
 #endif
diff --git a/kernel/locking/qspinlock_paravirt.h 
b/kernel/locking/qspinlock_paravirt.h
index f0450ff..4bd323d 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -308,23 +308,14 @@ static void pv_wait_head(struct qspinlock *lock, struct 
mcs_spinlock *node)
 }
 
 /*
- * PV version of the unlock function to be used in stead of
- * queued_spin_unlock().
+ * PV versions of the unlock fastpath and slowpath functions to be used
+ * instead of queued_spin_unlock().
  */
-__visible void __pv_queued_spin_unlock(struct qspinlock *lock)
+__visible void
+__pv_queued_spin_unlock_slowpath(struct qspinlock *lock, u8 locked)
 {
struct __qspinlock *l = (void *)lock;
struct pv_node *node;
-   u8 locked;
-
-   /*
-* We must not unlock if SLOW, because in that case we must first
-* unhash. Otherwise it 

[PATCH v6 6/6] locking/pvqspinlock: Queue node adaptive spinning

2015-09-11 Thread Waiman Long
In an overcommitted guest where some vCPUs have to be halted to make
forward progress in other areas, it is highly likely that a vCPU later
in the spinlock queue will be spinning while the ones earlier in the
queue would have been halted. The spinning in the later vCPUs is then
just a waste of precious CPU cycles because they are not going to
get the lock soon as the earlier ones have to be woken up and take
their turn to get the lock.

This patch implements an adaptive spinning mechanism where the vCPU
will call pv_wait() if the following conditions are true:

 1) the vCPU has not been halted before;
 2) the previous vCPU is not running.
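
As a rough user-space illustration of the two conditions above (a sketch only, not the kernel code): the names demo_node, demo_state and demo_wait_early() are made up for this example, and the halted_before flag is an assumed stand-in for whatever state the patch tracks. The 0xff check interval mirrors PV_PREV_CHECK_MASK from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the vCPU states used by the patch. */
enum demo_state { DEMO_RUNNING = 0, DEMO_HALTED };

struct demo_node {
	enum demo_state state;	/* state of the vCPU owning this queue node */
	bool halted_before;	/* has this vCPU already been halted once? */
};

#define DEMO_PREV_CHECK_MASK 0xff	/* inspect prev once every 256 loops */

/*
 * Return true when the spinning vCPU should stop spinning and wait:
 * it has not been halted before and its predecessor is not running.
 * The predecessor is only inspected when (loop & mask) == 0.
 */
static bool demo_wait_early(const struct demo_node *node,
			    const struct demo_node *prev, int loop)
{
	assert(node && prev);

	if (node->halted_before)
		return false;
	if ((loop & DEMO_PREV_CHECK_MASK) != 0)
		return false;
	return prev->state != DEMO_RUNNING;
}
```

Checking the predecessor only at a fixed interval keeps the extra cache traffic on the previous node low while the vCPU is spinning.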

Linux kernel builds were run in KVM guest on an 8-socket, 4
cores/socket Westmere-EX system and a 4-socket, 8 cores/socket
Haswell-EX system. Both systems are configured to have 32 physical
CPUs. The kernel build times before and after the patch were:

                      Westmere              Haswell
  Patch          32 vCPUs   48 vCPUs   32 vCPUs   48 vCPUs
  -----          --------   --------   --------   --------
  Before patch    3m02.3s    5m00.2s    1m43.7s    3m03.5s
  After patch     3m03.0s    4m37.5s    1m43.0s    2m47.2s

For 32 vCPUs, this patch doesn't cause any noticeable change in
performance. For 48 vCPUs (over-committed), there is about 8%
performance improvement.
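
For reference, the improvement figure can be recomputed from the build times in the table. The helper below is only a worked-arithmetic sketch; demo_improvement_pct() is a name invented here and is not part of the patch.

```c
/* Percentage reduction in build time, given both times in seconds. */
static double demo_improvement_pct(double before_s, double after_s)
{
	return (before_s - after_s) / before_s * 100.0;
}
```

For the 48-vCPU runs, 5m00.2s to 4m37.5s (300.2s to 277.5s) works out to about 7.6% on Westmere, and 3m03.5s to 2m47.2s (183.5s to 167.2s) to about 8.9% on Haswell, which is consistent with the "about 8%" quoted above.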

Signed-off-by: Waiman Long 
---
 kernel/locking/qspinlock.c  |5 ++-
 kernel/locking/qspinlock_paravirt.h |   52 +-
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 1be1aab..319e823 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -247,7 +247,8 @@ static __always_inline void set_locked(struct qspinlock *lock)
  */
 
 static __always_inline void __pv_init_node(struct mcs_spinlock *node) { }
-static __always_inline void __pv_wait_node(struct mcs_spinlock *node) { }
+static __always_inline void __pv_wait_node(struct mcs_spinlock *node,
+  struct mcs_spinlock *prev) { }
 static __always_inline bool __pv_wait_head_and_lock(struct qspinlock *lock,
struct mcs_spinlock *node,
u32 tail)
@@ -398,7 +399,7 @@ queue:
prev = decode_tail(old);
WRITE_ONCE(prev->next, node);
 
-   pv_wait_node(node);
+   pv_wait_node(node, prev);
arch_mcs_spin_lock_contended(&node->locked);
}
 
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 9fd49a2..57ed38b 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -23,6 +23,22 @@
 #define _Q_SLOW_VAL  (3U << _Q_LOCKED_OFFSET)
 
 /*
+ * Queue Node Adaptive Spinning
+ *
+ * A queue node vCPU will stop spinning if the following conditions are true:
+ * 1) vCPU in the previous node is not running
+ * 2) current vCPU has not been halted before
+ *
+ * The one lock stealing attempt allowed at slowpath entry mitigates the
+ * slight slowdown for non-overcommitted guest with this aggressive wait-early
+ * mechanism.
+ *
+ * The status of the previous node will be checked at fixed interval
+ * controlled by PV_PREV_CHECK_MASK.
+ */
+#define PV_PREV_CHECK_MASK 0xff
+
+/*
  * Queue node uses: vcpu_running & vcpu_halted.
  * Queue head uses: vcpu_running & vcpu_hashed.
  */
@@ -71,6 +87,7 @@ enum pv_qlock_stat {
pvstat_wait_head,
pvstat_wait_node,
pvstat_wait_again,
+   pvstat_wait_early,
pvstat_kick_wait,
pvstat_kick_unlock,
pvstat_spurious,
@@ -90,6 +107,7 @@ static const char * const stat_fsnames[pvstat_num] = {
[pvstat_wait_head]   = "wait_head_count",
[pvstat_wait_node]   = "wait_node_count",
[pvstat_wait_again]  = "wait_again_count",
+   [pvstat_wait_early]  = "wait_early_count",
[pvstat_kick_wait]   = "kick_wait_count",
[pvstat_kick_unlock] = "kick_unlock_count",
[pvstat_spurious]= "spurious_wakeup",
@@ -328,6 +346,20 @@ static struct pv_node *pv_unhash(struct qspinlock *lock)
 }
 
 /*
+ * Return true when it is time to check the previous node and that node is
+ * not in a running state.
+ */
+static inline bool
+pv_wait_early(struct pv_node *node, struct pv_node *prev, int loop)
+{
+
+   if ((loop & PV_PREV_CHECK_MASK) != 0)
+   return false;
+
+   return READ_ONCE(prev->state) != vcpu_running;
+}
+
+/*
  * Initialize the PV part of the mcs_spinlock node.
  */
 static void pv_init_node(struct mcs_spinlock *node)
@@ -345,16 +377,25 @@ static void pv_init_node(struct mcs_spinlock *node)
  * pv_kick_node() is used to set _Q_SLOW_VAL and fill in hash table on its
  * behalf.
  */
-static void pv_wait_node(struct mcs_spinlock *node)
+static void pv_wait_node(struct mcs_spinlock 

[PATCH v6 4/6] locking/pvqspinlock: Collect slowpath lock statistics

2015-09-11 Thread Waiman Long
This patch enables the accumulation of kicking and waiting related
PV qspinlock statistics when the new QUEUED_LOCK_STAT configuration
option is selected. It also enables the collection of kicking and
wakeup latencies which have a heavy dependency on the CPUs being used.

The measured latencies for different CPUs are:

CPU             Wakeup      Kicking
---             ------      -------
Haswell-EX      63.6us       7.4us
Westmere-EX     67.6us       9.3us

The measured latencies varied a bit from run-to-run. The wakeup
latency is much higher than the kicking latency.

A sample of statistics counts after system bootup (with vCPU
overcommit) was:

hash_hops_count=9001
kick_latencies=138047878
kick_unlock_count=9001
kick_wait_count=9000
spurious_wakeup=3
wait_again_count=2
wait_head_count=10
wait_node_count=8994
wake_latencies=713195944
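
The patch's own comment defines the averages as kick_latencies/kick_unlock_count, wake_latencies/kick_wait_count and hash_hops_count/kick_unlock_count, with latencies summed in ns. A small helper (demo_avg() is a name made up for this sketch, not something the patch provides) shows how the sample above could be reduced to averages:

```c
#include <stdint.h>

/* Integer average; returns 0 when no events were counted. */
static uint64_t demo_avg(uint64_t sum, uint64_t count)
{
	return count ? sum / count : 0;
}
```

With the sample counts above this gives about 15336 ns (roughly 15.3us) per kick, about 79243 ns (roughly 79.2us) per wakeup, and one hop per hash.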

Signed-off-by: Waiman Long 
---
 arch/x86/Kconfig|7 ++
 kernel/locking/qspinlock_paravirt.h |  171 ++-
 2 files changed, 173 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index f37010f..d08828f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -719,6 +719,13 @@ config PARAVIRT_SPINLOCKS
 
  If you are unsure how to answer this question, answer Y.
 
+config QUEUED_LOCK_STAT
+   bool "Paravirt queued lock statistics"
+   depends on PARAVIRT && DEBUG_FS && QUEUED_SPINLOCKS
+   ---help---
+ Enable the collection of statistical data on the behavior of
+ paravirtualized queued spinlocks and report them on debugfs.
+
 source "arch/x86/xen/Kconfig"
 
 config KVM_GUEST
diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index 4bd323d..2d71768 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -41,6 +41,147 @@ struct pv_node {
 };
 
 /*
+ * PV qspinlock statistics
+ */
+enum pv_qlock_stat {
+   pvstat_wait_head,
+   pvstat_wait_node,
+   pvstat_wait_again,
+   pvstat_kick_wait,
+   pvstat_kick_unlock,
+   pvstat_spurious,
+   pvstat_hops,
+   pvstat_num  /* Total number of statistics counts */
+};
+
+#ifdef CONFIG_QUEUED_LOCK_STAT
+/*
+ * Collect pvqspinlock statistics
+ */
+#include 
+#include 
+
+static const char * const stat_fsnames[pvstat_num] = {
+   [pvstat_wait_head]   = "wait_head_count",
+   [pvstat_wait_node]   = "wait_node_count",
+   [pvstat_wait_again]  = "wait_again_count",
+   [pvstat_kick_wait]   = "kick_wait_count",
+   [pvstat_kick_unlock] = "kick_unlock_count",
+   [pvstat_spurious]= "spurious_wakeup",
+   [pvstat_hops]= "hash_hops_count",
+};
+
+static atomic_t pvstats[pvstat_num];
+
+/*
+ * pv_kick_latencies = sum of all pv_kick latencies in ns
+ * pv_wake_latencies = sum of all wakeup latencies in ns
+ *
+ * Avg kick latency   = pv_kick_latencies/kick_unlock_count
+ * Avg wake latency   = pv_wake_latencies/kick_wait_count
+ * Avg # of hops/hash = hash_hops_count/kick_unlock_count
+ */
+static atomic64_t pv_kick_latencies, pv_wake_latencies;
+static DEFINE_PER_CPU(u64, pv_kick_time);
+
+/*
+ * Reset all the statistics counts if set
+ */
+static bool reset_cnts __read_mostly;
+
+/*
+ * Initialize debugfs for the PV qspinlock statistics
+ */
+static int __init pv_qspinlock_debugfs(void)
+{
+   struct dentry *d_pvqlock = debugfs_create_dir("pv-qspinlock", NULL);
+   int i;
+
+   if (!d_pvqlock)
+   pr_warn("Could not create 'pv-qspinlock' debugfs directory\n");
+
+   for (i = 0; i < pvstat_num; i++)
+   debugfs_create_u32(stat_fsnames[i], 0444, d_pvqlock,
+ (u32 *)&pvstats[i]);
+   debugfs_create_u64("kick_latencies", 0444, d_pvqlock,
+  (u64 *)&pv_kick_latencies);
+   debugfs_create_u64("wake_latencies", 0444, d_pvqlock,
+  (u64 *)&pv_wake_latencies);
+   debugfs_create_bool("reset_cnts", 0644, d_pvqlock, (u32 *)&reset_cnts);
+   return 0;
+}
+fs_initcall(pv_qspinlock_debugfs);
+
+/*
+ * Reset all the counts
+ */
+static noinline void pvstat_reset(void)
+{
+   int i;
+
+   for (i = 0; i < pvstat_num; i++)
+   atomic_set(&pvstats[i], 0);
+   atomic64_set(&pv_kick_latencies, 0);
+   atomic64_set(&pv_wake_latencies, 0);
+   reset_cnts = 0;
+}
+
+/*
+ * Increment the PV qspinlock statistics counts
+ */
+static inline void pvstat_inc(enum pv_qlock_stat stat)
+{
+   atomic_inc(&pvstats[stat]);
+   if (unlikely(reset_cnts))
+   pvstat_reset();
+}
+
+/*
+ * PV hash hop count
+ */
+static inline void pvstat_hop(int hopcnt)
+{
+   atomic_add(hopcnt, &pvstats[pvstat_hops]);
+}
+
+/*
+ * Replacement function for pv_kick()
+ */
+static inline void __pv_kick(int cpu)
+{
+   u64 start = sched_clock();
+
+   *per_cpu_ptr(&pv_kick_time, cpu) = start;
+   pv_kick(cpu);
+   

[PATCH v6 1/6] locking/qspinlock: relaxes cmpxchg & xchg ops in native code

2015-09-11 Thread Waiman Long
This patch replaces the cmpxchg() and xchg() calls in the native
qspinlock code with more relaxed versions of those calls to enable
other architectures to adopt queued spinlocks with less performance
overhead.
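
To illustrate the idea outside the kernel, here is a user-space analogue using C11 atomics. This is only a sketch: demo_lock, demo_trylock() and demo_unlock() are invented names, and the C11 memory orders are used as a stand-in for the kernel's _acquire/_release variants rather than as their exact equivalent.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_lock {
	atomic_uint val;
};

#define DEMO_LOCKED 1u

/*
 * Acquire ordering on a successful compare-and-swap is enough to keep
 * the critical section from being reordered before the lock is taken.
 */
static bool demo_trylock(struct demo_lock *lock)
{
	unsigned int expected = 0;

	return atomic_compare_exchange_strong_explicit(&lock->val, &expected,
						       DEMO_LOCKED,
						       memory_order_acquire,
						       memory_order_relaxed);
}

/*
 * Release ordering makes the critical section visible before the
 * lock word is cleared.
 */
static void demo_unlock(struct demo_lock *lock)
{
	atomic_fetch_sub_explicit(&lock->val, DEMO_LOCKED,
				  memory_order_release);
}
```

On x86 the relaxed and fully ordered forms generally compile to the same locked instruction, so the benefit is expected mainly on weakly ordered architectures.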

Signed-off-by: Waiman Long 
---
 arch/x86/include/asm/qspinlock.h |2 +-
 include/asm-generic/qspinlock.h  |6 +++---
 kernel/locking/qspinlock.c   |   21 +
 3 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/qspinlock.h b/arch/x86/include/asm/qspinlock.h
index 9d51fae..053e70d 100644
--- a/arch/x86/include/asm/qspinlock.h
+++ b/arch/x86/include/asm/qspinlock.h
@@ -46,7 +46,7 @@ static inline bool virt_queued_spin_lock(struct qspinlock *lock)
if (!static_cpu_has(X86_FEATURE_HYPERVISOR))
return false;
 
-   while (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) != 0)
+   while (atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL) != 0)
cpu_relax();
 
return true;
diff --git a/include/asm-generic/qspinlock.h b/include/asm-generic/qspinlock.h
index 83bfb87..efbd1fd 100644
--- a/include/asm-generic/qspinlock.h
+++ b/include/asm-generic/qspinlock.h
@@ -62,7 +62,7 @@ static __always_inline int queued_spin_is_contended(struct qspinlock *lock)
 static __always_inline int queued_spin_trylock(struct qspinlock *lock)
 {
if (!atomic_read(&lock->val) &&
-  (atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL) == 0))
+  (atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL) == 0))
return 1;
return 0;
 }
@@ -77,7 +77,7 @@ static __always_inline void queued_spin_lock(struct qspinlock *lock)
 {
u32 val;
 
-   val = atomic_cmpxchg(&lock->val, 0, _Q_LOCKED_VAL);
+   val = atomic_cmpxchg_acquire(&lock->val, 0, _Q_LOCKED_VAL);
if (likely(val == 0))
return;
queued_spin_lock_slowpath(lock, val);
@@ -93,7 +93,7 @@ static __always_inline void queued_spin_unlock(struct qspinlock *lock)
/*
 * smp_mb__before_atomic() in order to guarantee release semantics
 */
-   smp_mb__before_atomic_dec();
+   smp_mb__before_atomic();
atomic_sub(_Q_LOCKED_VAL, &lock->val);
 }
 #endif
diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 337c881..28a15c7 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -176,7 +176,12 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
 {
struct __qspinlock *l = (void *)lock;
 
-   return (u32)xchg(&l->tail, tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
+   /*
+* Use release semantics to make sure that the MCS node is properly
+* initialized before changing the tail code.
+*/
+   return (u32)xchg_release(&l->tail,
+tail >> _Q_TAIL_OFFSET) << _Q_TAIL_OFFSET;
 }
 
 #else /* _Q_PENDING_BITS == 8 */
@@ -208,7 +213,11 @@ static __always_inline u32 xchg_tail(struct qspinlock *lock, u32 tail)
 
for (;;) {
new = (val & _Q_LOCKED_PENDING_MASK) | tail;
-   old = atomic_cmpxchg(&lock->val, val, new);
+   /*
+* Use release semantics to make sure that the MCS node is
+* properly initialized before changing the tail code.
+*/
+   old = atomic_cmpxchg_release(&lock->val, val, new);
if (old == val)
break;
 
@@ -319,7 +328,7 @@ void queued_spin_lock_slowpath(struct qspinlock *lock, u32 val)
if (val == new)
new |= _Q_PENDING_VAL;
 
-   old = atomic_cmpxchg(&lock->val, val, new);
+   old = atomic_cmpxchg_acquire(&lock->val, val, new);
if (old == val)
break;
 
@@ -426,7 +435,11 @@ queue:
set_locked(lock);
break;
}
-   old = atomic_cmpxchg(&lock->val, val, _Q_LOCKED_VAL);
+   /*
+* The smp_load_acquire() call above has provided the necessary
+* acquire semantics required for locking.
+*/
+   old = atomic_cmpxchg_relaxed(&lock->val, val, _Q_LOCKED_VAL);
if (old == val)
goto release;   /* No contention */
 
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH RFC v3 4/6] perf tools: Add a simple JSON parser

2015-09-11 Thread Alexander Shishkin
In order to process the kernel's extended syscall error reports, we need a
JSON parser. I wrote this one myself; it should be simple, straightforward,
and extensible when/if somebody needs more features from it.

Signed-off-by: Alexander Shishkin 
---
 tools/include/tools/json.h |  40 
 tools/lib/util/json.c  | 250 +
 tools/perf/util/Build  |   5 +
 3 files changed, 295 insertions(+)
 create mode 100644 tools/include/tools/json.h
 create mode 100644 tools/lib/util/json.c

diff --git a/tools/include/tools/json.h b/tools/include/tools/json.h
new file mode 100644
index 00..a1684574c2
--- /dev/null
+++ b/tools/include/tools/json.h
@@ -0,0 +1,40 @@
+/*
+ * A simple JSON parser
+ * Copyright (c) 2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#ifndef _TOOLS_JSON_H
+#define _TOOLS_JSON_H
+
+#define JSON_DEPTH 8
+
+struct json_member {
+   const char  *key;
+   char*value;
+   unsigned intsize;
+};
+
+struct json_parser {
+   const char  *buffer;
+   const char  *end;
+   const char  *cursor;
+   unsigned intstate;
+   unsigned intlevel;
+   unsigned char   stack[JSON_DEPTH];
+   struct json_member  *schema;
+   unsigned intschema_strict;
+   int key;
+};
+
+int parse_json(struct json_parser *p);
+
+#endif /* _TOOLS_JSON_H */
diff --git a/tools/lib/util/json.c b/tools/lib/util/json.c
new file mode 100644
index 00..029c244d0e
--- /dev/null
+++ b/tools/lib/util/json.c
@@ -0,0 +1,250 @@
+/*
+ * A simple JSON parser
+ * Copyright (c) 2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+
+#include 
+#include 
+#include 
+
+#include "tools/json.h"
+
+enum {
+   JS_VALUE = 0,
+   JS_SEPARATOR,
+   JS_KEY,
+   JS_END,
+};
+
+enum {
+   OBJECT = 0,
+   ARRAY,
+   LITERAL,
+};
+
+static int got_key(struct json_parser *p, unsigned int back)
+{
+   const char *key = p->cursor - back;
+   int i;
+
+   for (i = 0; p->schema[i].key; i++) {
+   struct json_member *m = &p->schema[i];
+
+   if (!strncmp(key, m->key, strlen(m->key))) {
+   p->key = i;
+   return 0;
+   }
+   }
+
+   p->key = -1;
+   return p->schema_strict ? -EINVAL : 0;
+}
+
+static int got_value(struct json_parser *p, unsigned int back)
+{
+   unsigned int len = back;
+
+   if (p->key < 0)
+   return p->schema_strict ? -EINVAL : 0;
+
+   if (len > p->schema[p->key].size)
+   len = p->schema[p->key].size;
+
+   memcpy(p->schema[p->key].value, p->cursor - back, len);
+   p->schema[p->key].value[len] = 0;
+   p->key = -1;
+   return 0;
+}
+
+static int consume_string(struct json_parser *p)
+{
+   int ret = 0;
+
+   for (p->cursor++; p->cursor < p->end;
+p->cursor++, ret++) {
+   switch(*p->cursor) {
+   case '"':
+   goto done;
+   case '\\':
+   p->cursor++;
+   ret++;
+   default:
+   continue;
+   }
+   }
+
+   return -EINVAL;
+
+done:
+   if (p->state == JS_KEY)
+   ret = got_key(p, ret);
+   else if (p->state == JS_VALUE)
+   ret = got_value(p, ret);
+
+   return ret;
+}
+
+static int consume_number(struct json_parser *p)
+{
+   unsigned int hex = 0;
+   int ret = 0;
+
+   for (; p->cursor < p->end;
+p->cursor++, ret++) {
+   switch (*p->cursor) {
+   case '.': /* float */
+   if (!ret)
+   return -EINVAL;
+   continue;
+   case 'x': /* hex */
+   if (ret != 1 || *(p->cursor - 1) != '0')
+   return -EINVAL;
+   hex = 1;
+   

[PATCH 0/1] rcu_sync: Cleanup the CONFIG_PROVE_RCU checks

2015-09-11 Thread Oleg Nesterov
On 09/10, Paul E. McKenney wrote:
>
> On Thu, Sep 10, 2015 at 03:59:42PM +0200, Oleg Nesterov wrote:
> > On 09/09, Paul E. McKenney wrote:
> > >
> > > This is obsolete, but its replacement is the same patch.
> >
> > fbe3b97183f84155d81e506b1aa7d2ce986f7a36 in linux-rcu.git#experimental
> > I guess?
> >
> > > Oleg, Davidlohr, am I missing something on how percpu_rwsem or
> > > locktorture work?
> >
> > No, I think the patch is fine. Thanks for doing this! I was going to
> > send something like this change too. And in fact I am still thinking
> > about another test which plays with rcu_sync only, but probably we
> > need some cleanups first (and we need them anyway). I'll try to do
> > this a bit later.
>
> I would welcome an rcu_sync-specific torture patch!

I want it much more than you ;) I have already warned you, I'll send
more rcu_sync patches. The current code is actually a very early draft
which was written during the discussion with Peter long ago. I sent
it unchanged because a) it was already reviewed and b) I tested it a
bit in the past.

We can greatly simplify this code and at the same time make it more
useful. Actually I already have the patches. The 1st one removes
rcu_sync->cb_state and gp_ops->sync(). This makes the state machine
almost self-obvious and allows other improvements. See the resulting
(pseudo) code at the end.

But again, I'll try very much to write the test before I send the patch.


Until then, let me send this trivial cleanup. The CONFIG_PROVE_RCU
code looks trivial but is, imo, really annoying. And it is not complete,
so let's at least document this. Plus rcu_lockdep_assert() looks more
consistent.


> > > +void torture_percpu_rwsem_init(void)
> > > +{
> > > + BUG_ON(percpu_init_rwsem(_rwsem));
> > > +}
> > > +
> >
> > Aha, we don't really need this... I mean we can use the static initialiser
> > which can also be used by uprobes and cgroups. I'll try to send the patch
> > tomorrow.
>
> Very good, please do!

Hmm. I am a liar. I won't send this patch, at least not today.

The change I had in mind is very simple,

#define DECLARE_PERCPU_RWSEM(sem)   \
static DEFINE_PER_CPU(unsigned int, sem##_counters);\
struct percpu_rw_semaphore sem = {  \
.fast_read_ctr = &sem##_counters,   \
... \
}

and yes, uprobes and cgroups can use it.

But somehow I missed that we can't use it to define a _static_ sem,

static DECLARE_PERCPU_RWSEM(sem);

obviously won't work. And damn, I am shy to admit that I spent several
hours trying to invent something but failed. Perhaps we can add 2 helpers,
DECLARE_PERCPU_RWSEM_GLOBAL() and DECLARE_PERCPU_RWSEM_STATIC().
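
One possible shape for such a pair of helpers, sketched in plain user-space C (this is not from the thread: demo_rwsem and the DEMO_* macros are invented names, and an ordinary static counter stands in for the per-CPU variable). The storage class is passed as a macro argument so that it lands on the semaphore itself instead of on the first declaration the macro emits:

```c
struct demo_rwsem {
	unsigned int *fast_read_ctr;
};

/*
 * Common body: the counter is always static, while the storage class
 * of the semaphore is chosen by the caller (empty means global).
 */
#define DEMO_DECLARE_RWSEM_COMMON(sem, storage)			\
	static unsigned int sem##_counters;			\
	storage struct demo_rwsem sem = {			\
		.fast_read_ctr = &sem##_counters,		\
	}

#define DEMO_DECLARE_RWSEM_GLOBAL(sem)	DEMO_DECLARE_RWSEM_COMMON(sem, )
#define DEMO_DECLARE_RWSEM_STATIC(sem)	DEMO_DECLARE_RWSEM_COMMON(sem, static)

DEMO_DECLARE_RWSEM_GLOBAL(demo_global_sem);
DEMO_DECLARE_RWSEM_STATIC(demo_static_sem);
```

Writing "static DEMO_DECLARE_RWSEM_GLOBAL(x);" would still fail to compile, for the reason described above: the extra static would attach to the counter declaration, which is already static.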

Oleg.

---
static const struct {
void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
void (*wait)(void); // TODO: remove this
#ifdef CONFIG_PROVE_RCU
int  (*held)(void);
#endif
} gp_ops[] = {
...
};

// COMMENT to explain these states
enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };

#define rss_lock  gp_wait.lock

// !!1
// XXX code must be removed when we split rcu_sync_enter() into start + wait
// !!!

static void rcu_sync_func(struct rcu_head *rcu)
{
struct rcu_sync *rsp = container_of(rcu, struct rcu_sync, cb_head);
unsigned long flags;

BUG_ON(rsp->gp_state == GP_IDLE);
BUG_ON(rsp->gp_state == GP_PASSED);

spin_lock_irqsave(&rsp->rss_lock, flags);
if (rsp->gp_count) {
/*
 * COMMENT.
 */
rsp->gp_state = GP_PASSED;
wake_up_locked(&rsp->gp_wait);
} else if (rsp->gp_state == GP_REPLAY) {
/*
 * A new rcu_sync_exit() has happened; requeue the callback
 * to catch a later GP.
 */
rsp->gp_state = GP_EXIT;
gp_ops[rsp->gp_type].call(&rsp->cb_head, rcu_sync_func);
} else {
/*
 * We're at least a GP after rcu_sync_exit(); everybody will now
 * have observed the write side critical section. Let 'em rip!.
 */
BUG_ON(rsp->gp_state == GP_ENTER);  // XXX
rsp->gp_state = GP_IDLE;
}
spin_unlock_irqrestore(&rsp->rss_lock, flags);
}

static void rcu_sync_call(struct rcu_sync *rsp)
{
// TODO:
// This is called by might_sleep() code outside of ->rss_lock,
// we can avoid ->call() in some cases (say rcu_blocking_is_gp())
gp_ops[rsp->gp_type].call(&rsp->cb_head, rcu_sync_func);
}

void rcu_sync_enter(struct 

[PATCH RFC v3 1/6] exterr: Introduce extended syscall error reporting

2015-09-11 Thread Alexander Shishkin
It has been pointed out several times that certain system calls' error
reporting leaves a lot to be desired [1], [2]. Such system calls would
take complex parameter structures as their input and return -EINVAL if
one or more parameters are invalid or in conflict leaving it up to the
user to figure out exactly what is wrong with their request. One such
syscall is perf_event_open() with its attribute structure containing
40+ parameters and tens of parameter validation checks.

This patch introduces a fairly simple infrastructure that allows call
sites to annotate their error codes with arbitrary strings, which the
userspace can fetch using a prctl() along with the module name that
produced the error message, file name, line number and optionally any
amount of additional information in JSON format. This way, we can
provide both human-readable and machine-parsable information to user and
leave room for domain-specific extensions, such as the field in the
parameter structure that caused the error.

Each error "site" is referred to by its index, which is folded into an
integer error value within the range of [-EXT_ERRNO, -MAX_ERRNO], where
EXT_ERRNO is chosen to be below any known error codes, but still leaving
enough room to enumerate error sites. This way, all the traditional macros
will still handle these as error codes and we'd only have to convert them
to their original values right before returning to userspace. At that
point we'd also store a pointer to the error descriptor in the task_struct,
so that a subsequent prctl() call can retrieve it.
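
A minimal sketch of how an error-site index could be folded into that range, for illustration only. The exact encoding is an assumption, DEMO_EXT_ERRNO is a made-up value (the real EXT_ERRNO is defined by the patch), and the demo_* helpers are hypothetical names; 4095 is the kernel's usual MAX_ERRNO.

```c
#include <stdbool.h>

#define DEMO_MAX_ERRNO 4095	/* same value as the kernel's MAX_ERRNO */
#define DEMO_EXT_ERRNO 1024	/* assumption: chosen above ordinary errnos */

/* Fold an error-site index into a negative error value. */
static long demo_site_to_err(unsigned int site_idx)
{
	return -(long)(DEMO_EXT_ERRNO + site_idx);
}

/* True if err lies between -MAX_ERRNO and -EXT_ERRNO inclusive. */
static bool demo_is_ext_err(long err)
{
	return err <= -DEMO_EXT_ERRNO && err >= -DEMO_MAX_ERRNO;
}

/* Recover the site index from an extended error value. */
static unsigned int demo_err_to_site(long err)
{
	return (unsigned int)(-err - DEMO_EXT_ERRNO);
}
```

Because the encoded value stays within the range that IS_ERR()-style checks already treat as an error, existing callers keep working, and only the final return path needs to translate the index back into the site's original error code.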

[1] http://marc.info/?l=linux-kernel&m=141470811013082
[2] http://marc.info/?l=linux-kernel&m=144049385530680

Signed-off-by: Alexander Shishkin 
---
 include/linux/exterr.h |  99 
 include/linux/sched.h  |   1 +
 include/uapi/linux/prctl.h |   5 ++
 kernel/sys.c   |   6 ++
 lib/Makefile   |   2 +
 lib/exterr.c   | 157 +
 6 files changed, 270 insertions(+)
 create mode 100644 include/linux/exterr.h
 create mode 100644 lib/exterr.c

diff --git a/include/linux/exterr.h b/include/linux/exterr.h
new file mode 100644
index 00..1f412fe9ac
--- /dev/null
+++ b/include/linux/exterr.h
@@ -0,0 +1,99 @@
+/*
+ * Extended syscall error reporting
+ */
+#ifndef _LINUX_EXTERR_H
+#define _LINUX_EXTERR_H
+
+#include 
+
+/*
+ * Extended error reporting: annotate an error code with a string
+ * and a module name to help users diagnose problems with their
+ * attributes and other syscall parameters.
+ */
+
+/*
+ * This is the basic error descriptor structure that is statically
+ * allocated for every annotated error (error site).
+ *
+ * Subsystems that wish to extend this structure should embed it
+ * and provide a callback for formatting the additional fields.
+ */
+struct ext_err_site {
+   const char  *message;
+   const char  *owner;
+   const char  *file;
+   const int   line;
+   const int   code;
+};
+
+/*
+ * Error domain descriptor (compile/link time)
+ */
+struct ext_err_domain_desc {
+   const char  *name;
+   const size_t  err_site_size;
+   const void  *start, *end;
+   char*(*format)(void *site);
+   int first;
+   int last;
+};
+
+extern struct ext_err_domain_desc __attribute__((weak)) __start___ext_err_domain_desc[];
+extern struct ext_err_domain_desc __attribute__((weak)) __stop___ext_err_domain_desc[];
+
+#define DECLARE_EXTERR_DOMAIN(__name, __format) \
+   extern const struct __name ## _ext_err_site __attribute__((weak)) __start_ ## __name ## _ext_err[]; \
+   extern const struct __name ## _ext_err_site __attribute__((weak)) __stop_ ## __name ## _ext_err[]; \
+   const struct ext_err_domain_desc __used \
+   __attribute__ ((__section__("__ext_err_domain_desc")))  \
+   __name ## _ext_err_domain_desc = {  \
+   .name   = __stringify(__name),  \
+   .err_site_size  = sizeof(struct __name ## _ext_err_site), \
+   .start  = __start_ ## __name ## _ext_err,   \
+   .end= __stop_ ## __name ## _ext_err,\
+   .format = __format, \
+   };  \
+
+#define __ext_err(__domain, __c, __m, __domain__fields ...) ({ \
+   static struct __domain ## _ext_err_site \
+   __attribute__ ((__section__(__stringify(__domain) "_ext_err"))) \
+   __err_site = {  \
+   .site = {   \
+   .message= 

[PATCH RFC v3 6/6] perf tools: Use extended syscall error reporting

2015-09-11 Thread Alexander Shishkin
If the kernel has an extended error report for us, use it instead of
trying to guess what might have gone wrong.

Signed-off-by: Alexander Shishkin 
---
 tools/perf/util/evsel.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index c53f79123b..1804781072 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -27,6 +27,7 @@
 #include "debug.h"
 #include "trace-event.h"
 #include "stat.h"
+#include "exterr.h"
 
 static struct {
bool sample_id_all;
@@ -2266,7 +2267,16 @@ bool perf_evsel__fallback(struct perf_evsel *evsel, int err,
 int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target,
  int err, char *msg, size_t size)
 {
-   char sbuf[STRERR_BUFSIZE];
+   char sbuf[BUFSIZ];
+   int ret;
+
+   ret = exterr__strerror(msg, size);
+   /*
+* If kernel gave an extended error description, don't try to be any
+* more helpful here.
+*/
+   if (ret > 0)
+   return ret;
 
switch (err) {
case EPERM:
-- 
2.5.1



[PATCH RFC v3 5/6] perf tools: Add userspace counterpart for extended error reporting

2015-09-11 Thread Alexander Shishkin
Add a wrapper for fetching, parsing and pretty-printing kernel's extended
syscall error reports in a manner that can be useful for communicating
errors to the user.

Signed-off-by: Alexander Shishkin 
---
 tools/perf/util/Build|  1 +
 tools/perf/util/exterr.c | 79 
 tools/perf/util/exterr.h | 21 +
 3 files changed, 101 insertions(+)
 create mode 100644 tools/perf/util/exterr.c
 create mode 100644 tools/perf/util/exterr.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index af5acb9a02..2ccfb3e0e3 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -75,6 +75,7 @@ libperf-y += stat-shadow.o
 libperf-y += record.o
 libperf-y += srcline.o
 libperf-y += data.o
+libperf-y += exterr.o
 libperf-$(CONFIG_X86) += tsc.o
 libperf-$(CONFIG_AUXTRACE) += tsc.o
 libperf-y += cloexec.o
diff --git a/tools/perf/util/exterr.c b/tools/perf/util/exterr.c
new file mode 100644
index 00..3091009688
--- /dev/null
+++ b/tools/perf/util/exterr.c
@@ -0,0 +1,79 @@
+/*
+ * exterr.c: Extended syscall error reporting support
+ * Copyright (c) 2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include 
+#include 
+
+#include "tools/json.h"
+#include "util/util.h"
+#include "util/exterr.h"
+
+static char message[BUFSIZ], attr_field[BUFSIZ], line[8], code[8];
+
+#define JSON_FIELD(name)   \
+   { .key = __stringify(name), .value = (name), .size = sizeof(name), }
+
+static struct json_member exterr_schema[] = {
+   JSON_FIELD(line),
+   JSON_FIELD(code),
+   JSON_FIELD(message),
+   JSON_FIELD(attr_field),
+   { .key = NULL },
+};
+
+ssize_t exterr__strerror(char *msg, size_t size)
+{
+   char sbuf[BUFSIZ];
+   size_t len;
+   int ret;
+
+   ret = prctl(PR_GET_ERR_DESC, sbuf, sizeof(sbuf), 0, 0);
+   if (ret > 0) {
+   struct json_parser p = {
+   .buffer = sbuf,
+   .end= sbuf + strlen(sbuf),
+   .schema = exterr_schema,
+   .schema_strict = 0,
+   };
+
+   ret = parse_json(&p);
+   if (!ret) {
+   int orig_err;
+
+   orig_err = atoi(code);
+   ret = scnprintf(msg, size, "Syscall returned %d, because %s.\n",
+   orig_err, message);
+   len   = ret;
+   msg  += ret;
+   size -= ret;
+
+   if (attr_field[0]) {
+   /*
+* there can also be a lookup table with more
+* helpful messages based on this field
+*/
+   ret = scnprintf(msg, size, "Offending attribute field: \"%s\"\n",
+   attr_field);
+   len  += ret;
+   msg  += ret;
+   size -= ret;
+   }
+
+   ret = len;
+   }
+   }
+
+   return ret;
+}
diff --git a/tools/perf/util/exterr.h b/tools/perf/util/exterr.h
new file mode 100644
index 00..fce70e4f31
--- /dev/null
+++ b/tools/perf/util/exterr.h
@@ -0,0 +1,21 @@
+/*
+ * exterr.h: Extended syscall error reporting support
+ * Copyright (c) 2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef __PERF_EXTERR_H
+#define __PERF_EXTERR_H 1
+
+ssize_t exterr__strerror(char *msg, size_t size);
+
+#endif /* __PERF_EXTERR_H */
-- 
2.5.1



Re: [PATCH man-pages v2] capabilities.7, prctl.2: Document ambient capabilities

2015-09-11 Thread Andy Lutomirski
On Fri, Sep 11, 2015 at 1:28 AM, Michael Kerrisk (man-pages)
 wrote:
> Hi Andy,
>
> Now that this has hit mainline, would you be willing to refresh this
> man-pages patch?

Absolutely.  I'll try to get to it over the next week or so.  I need
to refresh my util-linux patch, too.

--Andy


Re: [PATCH 04/13] perf env: Introduce read_cpu_topology_map() method

2015-09-11 Thread Arnaldo Carvalho de Melo
Em Sat, Sep 12, 2015 at 01:14:02AM +0900, Namhyung Kim escreveu:
> > You inverted it, no?

> > So, could you please check if the below patch can have your Acked-by?
> > Namhyung?
 
> Looks good to me.
 
> Acked-by: Namhyung Kim 

Thanks, added it to the patch, after lunch I should have another patch
for another bug introduced in the same patch, i.e. if one CPU is
offlined, we simply refuse to collect the topology information.

To fix it, I think, we need to insert "(offline)" where one expects to
find the thread and core siblings info.

- Arnaldo


Re: Re: [PATCH v2 1/5] perf probe: Split add_perf_probe_events()

2015-09-11 Thread Namhyung Kim
On Thu, Sep 10, 2015 at 08:10:16AM +, 平松雅巳 / HIRAMATU,MASAMI wrote:
> Hi Namhyung,
> 
> From: Namhyung Kim [mailto:namhy...@gmail.com] On Behalf Of Namhyung Kim
> >
> >Hi Masami,
> >
> >On Thu, Sep 10, 2015 at 05:00:07AM +, 平松雅巳 / HIRAMATU,MASAMI wrote:
> >> >From: Namhyung Kim [mailto:namhy...@gmail.com] On Behalf Of Namhyung Kim
> >> >The del_perf_probe_events() uses strfilter, but I think it can be
> >> >problematic if other instances or users are using similar events at
> >> >the same time.
> >>
> >> Yeah, since perf probe doesn't lock the ftrace, there should be a
> >> timing bug, but it can be fixed easily by ignoring -ENOENT. :)
> >
> >By ignoring -ENOENT?  Are you saying that there's a race between two
> >deleters?  Yes, of course, but I think that the bug will hit an adder
> >and a deleter especially if automatic probing is used (by eBPF and/or
> >SDT recording).
> 
> So, I don't think we need the automatic event removing. Instead, I'd like to
> suggest to keep it on the list.

But why?  Do you want to reuse the probes for the next record session?

I think if something is generated automatically, it should be removed
automatically..

Thanks,
Namhyung


Re: v2 of seccomp filter c/r patches

2015-09-11 Thread Andy Lutomirski
On Fri, Sep 11, 2015 at 9:30 AM, Andy Lutomirski  wrote:
> On Sep 10, 2015 5:22 PM, "Tycho Andersen" wrote:
>>
>> Hi all,
>>
>> Here is v2 of the seccomp filter c/r set. The patch notes have individual
>> changes from the last series, but there are two points not noted:
>>
>> * The series still does not allow us to correctly restore state for programs
>>   that will use SECCOMP_FILTER_FLAG_TSYNC in the future. Given that we want to
>>   keep seccomp_filter's identity, I think something along the lines of another
>>   seccomp command like SECCOMP_INHERIT_PARENT is needed (although I'm not sure
>>   if this can even be done yet). In addition, we'll need a kcmp command for
>>   figuring out if filters are the same, although this too needs to compare
>>   seccomp_filter objects, so it's a little screwy. Any thoughts on how to do
>>   this nicely are welcome.
>
> Let's add a concept of a seccompfd.
>
> For background of what I want to add: I want to be able to create a
> seccomp monitor.  A seccomp monitor will be, logically, a pair of a
> struct file that represents the monitor and a seccomp_filter that is
> controlled by the monitor.  Depending on flags, whoever holds the
> monitor fd could change the active filter, intercept syscalls, and
> issue syscalls on behalf of a process that is trapped in an
> intercepted syscall.
>
> Seccomp filters would nest properly.
>
> The interface would probably be (extremely pseudocoded):
>
> monitor_fd, filter_fd = seccomp(CREATE_MONITOR, flags, ...);
>
> Then, later:
>
> seccomp(ATTACH_TO_FILTER, filter_fd);  /* now filtered */
>
> read(monitor_fd, buf, size); /* returns an intercepted syscall */
> write(monitor_fd, buf, size); /* issues a syscall or releases the
> trapped task */
>
> This can't be implemented on x86 without either going insane or
> finishing the massive set of pending cleanups to the x86 entry code.
> I favor the latter.
>
> We could, however, add part of it right now: we could have a way to
> create a filterfd, we could add kcmp support for it, and we could add
> the ATTACH_TO_FILTER thing.  I think that would solve your problem.
>
> One major open question: does a filter_fd know what its parent is and,
> if so, will it just refuse to attach if the caller's parent is wrong?
> Or will a filter_fd attach anywhere.
>

Let me add one more thought:

Currently, struct seccomp_filter encodes a strict tree hierarchy: it
knows what its parent is.  This only matters as an implementation
detail and because TSYNC checks for seccomp_filter equality.

We could change this without user-visible effects.  We could say that,
for TSYNC purposes, two filter states match if they contain exactly
the same layers in the same order where a layer does *not* encode a
concept of parent.  We could then say that attaching a classic bpf
filter creates a brand new layer that is not equal to any other layer
that's been created.

This has no effect whatsoever.  The difference would be that we could
declare that attaching the same ebpf program twice creates the *same*
layer so that, if you fork and both children attach the same ebpf
program, then they match for TSYNC purposes.  Similarly, attaching the
same hypothetical filterfd would create the same layer.
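
To make the proposed rule concrete, here is a minimal userspace sketch.
It is not kernel code: struct layer, struct filter_state and
filter_states_match() are names made up for illustration, and a "layer"
is modelled as nothing more than an object whose address is its
identity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: a layer carries no parent pointer; its address is its identity. */
struct layer {
	const char *name;	/* for illustration only, not used for comparison */
};

/* Hypothetical: the filters attached to a task, oldest layer first. */
struct filter_state {
	const struct layer *const *layers;
	size_t nr_layers;
};

/*
 * Two states match for TSYNC purposes if they contain exactly the same
 * layers in the same order.
 */
static bool filter_states_match(const struct filter_state *a,
				const struct filter_state *b)
{
	size_t i;

	if (a->nr_layers != b->nr_layers)
		return false;

	for (i = 0; i < a->nr_layers; i++)
		if (a->layers[i] != b->layers[i])
			return false;

	return true;
}
```

Under this model, two children that each attach the same ebpf program
end up pointing at the same layer object and match, while two classic
bpf attachments always produce distinct layer objects and never match,
even if the instructions are identical.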

Thoughts?

--Andy


Re: [PATCH v2] arm/xen: Enable user access to the kernel before issuing a privcmd call

2015-09-11 Thread Russell King - ARM Linux
On Fri, Sep 11, 2015 at 05:25:59PM +0100, Julien Grall wrote:
> + /*
> +  * Privcmd calls are issued by the userspace. We need to allow the
> +  * kernel to access the userspace memory before issuing the hypercall.
> +  */
> + uaccess_enable r4
> +
> + /* r4 is loaded now as we use it as scratch register before */
>   ldr r4, [sp, #4]

As I mentioned in one of my previous mails, "ip" should be safe to use
here - it's a caller-corrupted register, just like r0-r3 and lr.  So,
you could do:

ldr r4, [sp, #4]
+   uaccess_enable ip

which fractionally tightens the window.

However, there's nothing actually wrong with your version - there's no
way we could've got this far with sp pointing at userspace.

I'm happy with either version, so:

Acked-by: Russell King 

How do you want to handle the patch?  I already have some other uaccess
fixes queued up to send to Linus before the merge window closes.

>   __HVC(XEN_IMM)
> +
> + /*
> +  * Disable userspace access from kernel. This is fine to do it
> +  * unconditionally as no set_fs(KERNEL_DS)/set_fs(get_ds()) is
> +  * called before.
> +  */
> + uaccess_disable r4
> +
>   ldm sp!, {r4}
>   ret lr
>  ENDPROC(privcmd_call);
> -- 
> 2.1.4
> 

-- 
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [PATCH 0/1] rcu_sync: Cleanup the CONFIG_PROVE_RCU checks

2015-09-11 Thread Paul E. McKenney
On Fri, Sep 11, 2015 at 05:59:01PM +0200, Oleg Nesterov wrote:
> On 09/10, Paul E. McKenney wrote:
> >
> > On Thu, Sep 10, 2015 at 03:59:42PM +0200, Oleg Nesterov wrote:
> > > On 09/09, Paul E. McKenney wrote:
> > > >
> > > > This is obsolete, but its replacement is the same patch.
> > >
> > > fbe3b97183f84155d81e506b1aa7d2ce986f7a36 in linux-rcu.git#experimental
> > > I guess?
> > >
> > > > Oleg, Davidlohr, am I missing something on how percpu_rwsem or
> > > > locktorture work?
> > >
> > > No, I think the patch is fine. Thanks for doing this! I was going to
> > > send something like this change too. And in fact I am still thinking
> > > about another test which plays with rcu_sync only, but probably we
> > > need some cleanups first (and we need them anyway). I'll try to do
> > > this a bit later.
> >
> > I would welcome an rcu_sync-specific torture patch!
> 
> I want it much more than you ;) I have already warned you, I'll send
> more rcu_sync patches. The current code is actually a very early draft
> which was written during the discussion with Peter long ago. I sent
> it unchanged because a) it was already reviewed and b) I tested it a
> bit in the past.
> 
> We can greatly simplify this code and at the same time make it more
> useful. Actually I already have the patches. The 1st one removes
> rcu_sync->cb_state and gp_ops->sync(). This makes the state machine
> almost self-obvious and allows other improvements. See the resulting
> (pseudo) code at the end.
> 
> But again, I'll try very much to write the test before I send the patch.

That sounds very good!  ;-)

> Until then, let me send this trivial cleanup. The CONFIG_PROVE_RCU
> code looks trivial but imo really annoying. And it is not complete,
> so let's document this at least. Plus rcu_lockdep_assert() looks more
> consistent.
> 
> 
> > > > +void torture_percpu_rwsem_init(void)
> > > > +{
> > > > +   BUG_ON(percpu_init_rwsem(_rwsem));
> > > > +}
> > > > +
> > >
> > > Aha, we don't really need this... I mean we can use the static initialiser
> > > which can also be used by uprobes and cgroups. I'll try to send the patch
> > > tomorrow.
> >
> > Very good, please do!
> 
> Hmm. I am a liar. I won't send this patch, at least not today.
> 
> The change I had in mind is very simple,
> 
>   #define DECLARE_PERCPU_RWSEM(sem)   \
>   static DEFINE_PER_CPU(unsigned int, sem##_counters);\
>   struct percpu_rw_semaphore sem = {  \
>   .fast_read_ctr = &sem##_counters,   \
>   ... \
>   }
>   
> and yes, uprobes and cgroups can use it.
> 
> But somehow I missed that we can't use it to define a _static_ sem,
> 
>   static DECLARE_PERCPU_RWSEM(sem);
> 
> obviously won't work. And damn, I am shy to admit that I spent several
> hours trying to invent something but failed. Perhaps we can add 2 helpers,
> DECLARE_PERCPU_RWSEM_GLOBAL() and DECLARE_PERCPU_RWSEM_STATIC().

That is indeed what we do for SRCU for the same reason, DEFINE_SRCU()
and DEFINE_STATIC_SRCU(), but with a common __DEFINE_SRCU() doing the
actual work.
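
For reference, a userspace sketch of how that pattern could carry over
to the macro quoted above. The macro names __DEFINE_PERCPU_RWSEM(),
DEFINE_PERCPU_RWSEM() and DEFINE_STATIC_PERCPU_RWSEM() are assumptions
for illustration, and the struct is a reduced stand-in with a plain
counter in place of the per-CPU one:

```c
/* Reduced stand-in for the kernel structure; illustration only. */
struct percpu_rw_semaphore {
	unsigned int *fast_read_ctr;
};

/*
 * The common helper takes the storage class as an argument, so that
 * "static" lands on the semaphore definition itself and not on the
 * counter declaration that precedes it.
 */
#define __DEFINE_PERCPU_RWSEM(name, is_static)				\
	static unsigned int name##_counters;				\
	is_static struct percpu_rw_semaphore name = {			\
		.fast_read_ctr = &name##_counters,			\
	}

#define DEFINE_PERCPU_RWSEM(name)					\
	__DEFINE_PERCPU_RWSEM(name, /* not static */)
#define DEFINE_STATIC_PERCPU_RWSEM(name)				\
	__DEFINE_PERCPU_RWSEM(name, static)

/* One global and one file-local semaphore. */
DEFINE_PERCPU_RWSEM(global_sem);
DEFINE_STATIC_PERCPU_RWSEM(local_sem);
```

Prefixing the single-macro version with "static" fails because the
keyword binds only to the first declaration in the expansion, which is
already static, and the semaphore itself stays global.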

Thanx, Paul

> Oleg.
> 
> ---
> static const struct {
>   void (*call)(struct rcu_head *, void (*)(struct rcu_head *));
>   void (*wait)(void); // TODO: remove this
> #ifdef CONFIG_PROVE_RCU
>   int  (*held)(void);
> #endif
> } gp_ops[] = {
>   ...
> };
> 
> // COMMENT to explain these states
> enum { GP_IDLE = 0, GP_ENTER, GP_PASSED, GP_EXIT, GP_REPLAY };
> 
> #define rss_lock gp_wait.lock
> 
> // !!1
> // XXX code must be removed when we split rcu_sync_enter() into start + wait
> // !!!
> 
> static void rcu_sync_func(struct rcu_head *rcu)
> {
>   struct rcu_sync *rsp = container_of(rcu, struct rcu_sync, cb_head);
>   unsigned long flags;
> 
>   BUG_ON(rsp->gp_state == GP_IDLE);
>   BUG_ON(rsp->gp_state == GP_PASSED);
> 
>   spin_lock_irqsave(&rsp->rss_lock, flags);
>   if (rsp->gp_count) {
>   /*
>* COMMENT.
>*/
>   rsp->gp_state = GP_PASSED;
>   wake_up_locked(&rsp->gp_wait);
>   } else if (rsp->gp_state == GP_REPLAY) {
>   /*
>* A new rcu_sync_exit() has happened; requeue the callback
>* to catch a later GP.
>*/
>   rsp->gp_state = GP_EXIT;
>   gp_ops[rsp->gp_type].call(&rsp->cb_head, rcu_sync_func);
>   } else {
>   /*
>* We're at least a GP after rcu_sync_exit(); everybody will now
>* have observed the write side critical section. Let 

[PATCH 04/11] ARM: DT: STiH407: Add serial3 pinctrl configuration

2015-09-11 Thread Peter Griffin
Add missing serial 3 pinctrl config. This can be used
on b2206 HVK, where it defaults to PIO31[3] & PIO31[4],
alternate 1.

Signed-off-by: Erwan Le Ray 
Signed-off-by: Fabrice Gasnier 
Acked-by: Carmelo Amoroso 
Acked-by: Patrice Chotard 
Signed-off-by: Patrice Chotard 
Signed-off-by: Peter Griffin 
---
 arch/arm/boot/dts/stih407-pinctrl.dtsi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/arm/boot/dts/stih407-pinctrl.dtsi b/arch/arm/boot/dts/stih407-pinctrl.dtsi
index bb3b0c7..6c81f35 100644
--- a/arch/arm/boot/dts/stih407-pinctrl.dtsi
+++ b/arch/arm/boot/dts/stih407-pinctrl.dtsi
@@ -809,6 +809,15 @@
};
};
};
+
+   serial3 {
+   pinctrl_serial3: serial3-0 {
+   st,pins {
+   tx = < 3 ALT1 OUT>;
+   rx = < 4 ALT1 IN>;
+   };
+   };
+   };
};
 
pin-controller-flash {
-- 
1.9.1



[PATCH 02/11] ARM: STi: DT: STiH407: Add i2c3 alternate pin configs

2015-09-11 Thread Peter Griffin
i2c3 controller can use several sets of pins depending
on board design. This patch adds the missing alternate
pinconfigs.

Signed-off-by: Seraphin Bonnaffe 
Signed-off-by: Peter Griffin 
---
 arch/arm/boot/dts/stih407-pinctrl.dtsi | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/stih407-pinctrl.dtsi b/arch/arm/boot/dts/stih407-pinctrl.dtsi
index d86ccc8..ce219a1 100644
--- a/arch/arm/boot/dts/stih407-pinctrl.dtsi
+++ b/arch/arm/boot/dts/stih407-pinctrl.dtsi
@@ -430,12 +430,24 @@
};
 
i2c3 {
-   pinctrl_i2c3_default: i2c3-default {
+   pinctrl_i2c3_default: i2c3-alt1-0 {
st,pins {
sda = < 6 ALT1 BIDIR>;
scl = < 5 ALT1 BIDIR>;
};
};
+   pinctrl_i2c3_alt1_1: i2c3-alt1-1 {
+   st,pins {
+   sda = < 7 ALT1 BIDIR>;
+   scl = < 6 ALT1 BIDIR>;
+   };
+   };
+   pinctrl_i2c3_alt3_0: i2c3-alt3-0 {
+   st,pins {
+   sda = < 6 ALT3 BIDIR>;
+   scl = < 5 ALT3 BIDIR>;
+   };
+   };
};
 
spi0 {
-- 
1.9.1



[PATCH 03/11] ARM: DT: STiH407: Add SPI 3 wire and 4 wire pinctrl configs

2015-09-11 Thread Peter Griffin
This patch adds the spi pinctrl configurations for all SPI
controllers, and also the alternate muxings which
can be used depending on board design.

Signed-off-by: Christophe Kerello 
Signed-off-by: Peter Griffin 
---
 arch/arm/boot/dts/stih407-pinctrl.dtsi | 239 -
 1 file changed, 235 insertions(+), 4 deletions(-)

diff --git a/arch/arm/boot/dts/stih407-pinctrl.dtsi b/arch/arm/boot/dts/stih407-pinctrl.dtsi
index ce219a1..bb3b0c7 100644
--- a/arch/arm/boot/dts/stih407-pinctrl.dtsi
+++ b/arch/arm/boot/dts/stih407-pinctrl.dtsi
@@ -262,6 +262,57 @@
};
};
};
+
+   spi10 {
+   pinctrl_spi10_default: spi10-4w-alt1-0 {
+   st,pins {
+   mtsr = < 6 ALT1 OUT>;
+   mrst = < 7 ALT1 IN>;
+   scl = < 5 ALT1 OUT>;
+   };
+   };
+
+   pinctrl_spi10_3w_alt1_0: spi10-3w-alt1-0 {
+   st,pins {
+   mtsr = < 6 ALT1 BIDIR_PU>;
+   scl = < 5 ALT1 OUT>;
+   };
+   };
+   };
+
+   spi11 {
+   pinctrl_spi11_default: spi11-4w-alt2-0 {
+   st,pins {
+   mtsr = < 1 ALT2 OUT>;
+   mrst = < 0 ALT2 IN>;
+   scl = < 2 ALT2 OUT>;
+   };
+   };
+
+   pinctrl_spi11_3w_alt2_0: spi11-3w-alt2-0 {
+   st,pins {
+   mtsr = < 1 ALT2 BIDIR_PU>;
+   scl = < 2 ALT2 OUT>;
+   };
+   };
+   };
+
+   spi12 {
+   pinctrl_spi12_default: spi12-4w-alt2-0 {
+   st,pins {
+   mtsr = < 6 ALT2 OUT>;
+   mrst = < 4 ALT2 IN>;
+   scl = < 7 ALT2 OUT>;
+   };
+   };
+
+   pinctrl_spi12_3w_alt2_0: spi12-3w-alt2-0 {
+   st,pins {
+   mtsr = < 6 ALT2 BIDIR_PU>;
+   scl = < 7 ALT2 OUT>;
+   };
+   };
+   };
};
 
pin-controller-front0 {
@@ -451,11 +502,159 @@
};
 
spi0 {
-   pinctrl_spi0_default: spi0-default {
+   pinctrl_spi0_default: spi0-4w-alt2-0 {
+   st,pins {
+   mtsr = < 6 ALT2 OUT>;
+   mrst = < 7 ALT2 IN>;
+   scl = < 5 ALT2 OUT>;
+   };
+   };
+
+   pinctrl_spi0_3w_alt2_0: spi0-3w-alt2-0 {
st,pins {
-   mtsr = < 6 ALT2 BIDIR>;
-   mrst = < 7 ALT2 BIDIR>;
-   scl = < 5 ALT2 BIDIR>;
+   mtsr = < 6 ALT2 BIDIR_PU>;
+   scl = < 5 ALT2 OUT>;
+   };
+   };
+
+   pinctrl_spi0_4w_alt1_0: spi0-4w-alt1-0 {
+   st,pins {
+   mtsr = < 7 ALT1 OUT>;
+   mrst = < 5 ALT1 IN>;
+   scl = < 6 ALT1 OUT>;
+   };
+   };
+
+   pinctrl_spi0_3w_alt1_0: spi0-3w-alt1-0 {
+   st,pins {
+   mtsr = < 7 ALT1 BIDIR_PU>;
+ 

Re: [PATCH 5/6] sched/fair: Get rid of scaling utilization by capacity_orig

2015-09-11 Thread Morten Rasmussen
On Wed, Sep 09, 2015 at 12:13:10PM +0100, Morten Rasmussen wrote:
> On Wed, Sep 09, 2015 at 11:43:05AM +0200, Peter Zijlstra wrote:
> > Sadly that makes the code worse; I get 14 mul instructions where
> > previously I had 11.
> > 
> > What happens is that GCC gets confused and cannot constant propagate the
> > new variables, so what used to be shifts now end up being actual
> > multiplications.
> > 
> > With this, I get back to 11. Can you see what happens on ARM where you
> > have both functions defined to non constants?
> 
> We repeated the experiment on arm and arm64 but still with functions
> defined to constant to compare with your results. The mul instruction
> count seems to be somewhat compiler version dependent, but consistently
> show no effect of the patch:
> 
> arm       before  after
> gcc4.9    12      12
> gcc4.8    10      10
> 
> arm64     before  after
> gcc4.9    11      11
> 
> I will get numbers with the arch-functions implemented as well and do
> hackbench runs to see what happens in terms of performance.

I have done some runs with the proposed fixes added:

1. PeterZ's util_sum shift fix (change util_sum).
2. Morten's scaling of weight instead of time (reduce bit loss).
3. PeterZ's unconditional calls to arch*() functions (compiler opt).

To be clear: 2 includes 1, and 3 includes 1 and 2.

Runs were done with the default (#define) implementation of the
arch-functions and with arch specific implementation for ARM.

I realized that just looking for 'mul' instructions in
update_blocked_averages() is probably not a fair comparison on ARM as it
turned out that it has quite a few multiply-accumulate instructions. So
I have included the total count including those too.


Test platforms:

ARM TC2 (A7x3 only)
perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 200
#mul: grep -e mul (in update_blocked_averages())
#mul_all: grep -e mul -e mla -e mls -e mia (in update_blocked_averages())
gcc: 4.9.3

Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
perf stat --null --repeat 10 -- perf bench sched messaging -g 50 -l 15000
#mul: grep -e mul (in update_blocked_averages())
gcc: 4.9.2


Results:

perf numbers are average of three (x10) runs. Raw data is available
further down.

ARM TC2          #mul             #mul_all         perf bench
arch*()          default  arm     default  arm     default  arm

1 shift_fix      10       16      22       36      13.401   13.288
2 scaled_weight  12       14      30       32      13.282   13.238
3 unconditional  12       14      26       32      13.296   13.427

Intel E5-2690    #mul             #mul_all         perf bench
arch*()          default          default          default

1 shift_fix      13                                14.786
2 scaled_weight  18                                15.078
3 unconditional  14                                15.195


Overall it appears that fewer 'mul' instructions don't necessarily
mean a better perf bench score. For ARM, 2 seems the best choice overall,
while 1 is better for Intel. If we want to try to avoid the bit loss by
scaling weight instead of time, 2 is best for both. However, all that
said, looking at the raw numbers there is a significant difference
between runs of perf --repeat, so we can't really draw any strong
conclusions. It all appears to be in the noise.
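
As a cross-check of how the summary table was derived, the sketch below
recomputes two of the ARM TC2 table entries from the three raw elapsed
times listed under "Raw numbers". The helper mean() and the array names
are made up for this example; only the sample values come from the
measurements.

```c
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += v[i];

	return sum / (double)n;
}

/* ARM TC2, shift_fix, default arch*(): elapsed seconds of the three runs. */
static const double tc2_shift_fix_default[] = {
	13.384416727, 13.431014702, 13.387434890,
};

/* ARM TC2, scaled_weight, default arch*(): elapsed seconds of the three runs. */
static const double tc2_scaled_weight_default[] = {
	13.295649553, 13.271634654, 13.280081329,
};
```

Calling mean() on the two arrays gives values that round to the 13.401
and 13.282 shown in the "perf bench default" column.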

I suggest that I spin a v2 of this series and go with scaled_weight to
reduce bit loss. Any objections?

While at it, should I include Yuyang's patch redefining the SCALE/SHIFT
mess?


Raw numbers:

ARM TC2

shift_fix   default_arch
gcc4.9.3
#mul 10
#mul+mla+mls+mia 22
13.384416727 seconds time elapsed ( +-  0.17% )
13.431014702 seconds time elapsed ( +-  0.18% )
13.387434890 seconds time elapsed ( +-  0.15% )

shift_fix   arm_arch
gcc4.9.3
#mul 16
#mul+mla+mls+mia 36
13.271044081 seconds time elapsed ( +-  0.11% )
13.310189123 seconds time elapsed ( +-  0.19% )
13.283594740 seconds time elapsed ( +-  0.12% )

scaled_weight   default_arch
gcc4.9.3
#mul 12
#mul+mla+mls+mia 30
13.295649553 seconds time elapsed ( +-  0.20% )
13.271634654 seconds time elapsed ( +-  0.19% )
13.280081329 seconds time elapsed ( +-  0.14% )

scaled_weight   arm_arch
gcc4.9.3
#mul 14
#mul+mla+mls+mia 32
13.230659223 seconds time elapsed ( +-  0.15% )
13.76527 seconds time elapsed ( +-  0.15% )
13.260275081 seconds time elapsed ( +-  0.21% )

unconditional   default_arch
gcc4.9.3
#mul 12
#mul+mla+mls+mia 26
13.274904460 seconds time elapsed ( +-  0.13% )
13.307853511 seconds time elapsed ( +-  0.15% )
13.304084844 seconds time elapsed ( +-  0.22% )

unconditional   arm_arch
gcc4.9.3
#mul 14
#mul+mla+mls+mia 32
13.432878577 seconds time elapsed ( +-  0.13% )
13.417950552 seconds time elapsed ( +-  0.12% )
13.431682719 seconds time elapsed ( +-  0.18% )


Intel

shift_fix   default_arch
gcc4.9.2
#mul 13
14.905815416 seconds time elapsed ( +-  0.61% )
14.83694 seconds time elapsed ( +-  0.84% )
14.639739309 seconds time elapsed ( +-  0.76% )

scaled_weight   default_arch
gcc4.9.2
#mul 18

[GIT PULL] Ceph changes for 4.3-rc1

2015-09-11 Thread Sage Weil
Hi Linus,

Please pull the following Ceph updates from

  git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client.git for-linus

There are a few fixes for snapshot behavior with CephFS and support for 
the new keepalive protocol from Zheng, a libceph fix that affects both RBD 
and CephFS, a few bug fixes and cleanups for RBD from Ilya, and several 
small fixes and cleanups from Jianpeng and others.

Thanks!
sage



Benoît Canet (1):
  libceph: Avoid holding the zero page on ceph_msgr_slab_init errors

Brad Hubbard (1):
  ceph: remove redundant test of head->safe and silence static analysis warnings

Ilya Dryomov (4):
  libceph: rename con_work() to ceph_con_workfn()
  rbd: fix double free on rbd_dev->header_name
  rbd: plug rbd_dev->header.object_prefix memory leak
  libceph: check data_len in ->alloc_msg()

Jianpeng Ma (3):
  ceph: remove the useless judgement
  ceph: no need to get parent inode in ceph_open
  ceph: cleanup use of ceph_msg_get

Nicholas Krause (1):
  libceph: remove the unused macro AES_KEY_SIZE

Yan, Zheng (7):
  ceph: EIO all operations after forced umount
  ceph: invalidate dirty pages after forced umount
  ceph: fix queuing inode to mdsdir's snaprealm
  libceph: set 'exists' flag for newly up osd
  libceph: use keepalive2 to verify the mon session is alive
  ceph: get inode size for each append write
  ceph: improve readahead for file holes

 drivers/block/rbd.c|  6 ++--
 fs/ceph/addr.c |  6 ++--
 fs/ceph/caps.c |  8 +
 fs/ceph/file.c | 14 
 fs/ceph/mds_client.c   | 59 ++
 fs/ceph/mds_client.h   |  1 +
 fs/ceph/snap.c |  7 
 fs/ceph/super.c|  1 +
 include/linux/ceph/libceph.h   |  2 ++
 include/linux/ceph/messenger.h |  4 +++
 include/linux/ceph/msgr.h  |  4 ++-
 net/ceph/ceph_common.c |  1 +
 net/ceph/crypto.c  |  4 ---
 net/ceph/messenger.c   | 82 +++---
 net/ceph/mon_client.c  | 37 ++-
 net/ceph/osd_client.c  | 51 ++
 net/ceph/osdmap.c  |  2 +-
 17 files changed, 191 insertions(+), 98 deletions(-)

Re: [PATCH v2 2/5] seccomp: make underlying bpf ref counted as well

2015-09-11 Thread Daniel Borkmann

On 09/11/2015 07:33 PM, Tycho Andersen wrote:

On Fri, Sep 11, 2015 at 06:03:59PM +0200, Daniel Borkmann wrote:

On 09/11/2015 04:44 PM, Tycho Andersen wrote:

On Fri, Sep 11, 2015 at 03:02:36PM +0200, Daniel Borkmann wrote:

On 09/11/2015 02:20 AM, Tycho Andersen wrote:

In the next patch, we're going to add a way to access the underlying
filters via bpf fds. This means that we need to ref-count both the
struct seccomp_filter objects and the struct bpf_prog objects separately,
in case a process dies but a filter is still referred to by another
process.

Additionally, we mark classic converted seccomp filters as seccomp eBPF
programs, since they are a subset of what is supported in seccomp eBPF.

Signed-off-by: Tycho Andersen 
CC: Kees Cook 
CC: Will Drewry 
CC: Oleg Nesterov 
CC: Andy Lutomirski 
CC: Pavel Emelyanov 
CC: Serge E. Hallyn 
CC: Alexei Starovoitov 
CC: Daniel Borkmann 
---
  kernel/seccomp.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index 245df6b..afaeddf 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -378,6 +378,8 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
}

atomic_set(&sfilter->usage, 1);
+   atomic_set(&sfilter->prog->aux->refcnt, 1);
+   sfilter->prog->type = BPF_PROG_TYPE_SECCOMP;


So, if you do this, then this breaks the assumption of eBPF JITs
that, currently, all classic converted BPF programs always have a
prog->type of BPF_PROG_TYPE_UNSPEC (see: bpf_prog_was_classic()).

Currently, JITs make use of this information to determine whether
A and X mappings for such programs should or should not be cleared
in the prologue (s390 currently).

In the seccomp_prepare_filter() stage, we're already past that, so
it will not cause an issue, but we certainly would need to be very
careful in future, if bpf_prog_was_classic() is then used at a later
stage when we already have a generated bpf_prog somewhere, as then
this assumption will break.
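
A reduced userspace model of that assumption, for illustration only.
The enum and struct below are cut-down stand-ins for the kernel
definitions, and BPF_PROG_TYPE_SECCOMP is the type proposed by this
series; the helper mirrors the check as described above, where a
classic-converted program is recognised purely by its type.

```c
#include <stdbool.h>

/* Cut-down stand-ins for the kernel definitions; illustration only. */
enum bpf_prog_type {
	BPF_PROG_TYPE_UNSPEC,
	BPF_PROG_TYPE_SECCOMP,	/* proposed by this series */
};

struct bpf_prog {
	enum bpf_prog_type type;
};

/* Classic-converted programs are identified only by their program type. */
static bool bpf_prog_was_classic(const struct bpf_prog *prog)
{
	return prog->type == BPF_PROG_TYPE_UNSPEC;
}
```

Once seccomp_prepare_filter() rewrites the type, the same program stops
being reported as classic, so any caller that runs after that point
sees a different answer than the JIT did.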


The only reason we need to do this is to allow BPF_DUMP_PROG to work,
since we were restricting it to only allow dumping of seccomp
programs, since those don't have maps. Instead, perhaps we could allow
dumping of BPF_PROG_TYPE_SECCOMP and BPF_PROG_TYPE_UNSPEC?


There are possibilities that BPF_PROG_TYPE_UNSPEC is calling helpers
already today, at least in networking case, not seccomp. So, since
you want to export [classic -> eBPF] only for seccomp, put fds on them
and dump these via bpf(2), you could allow that (with a big comment
stating why it's safe), but mid-term we really need to sanitize all
this stuff properly as this is needed for other types, too.


Sorry, just to be clear, you're suggesting that the patch is ok modulo
a comment describing the jit issues?


I think due to the given insns restrictions on classic seccomp, this
could work for "most cases" (see below) for the time being until pointer
sanitation is resolved and that seccomp-only restriction from the dump
could be removed, BUT there's one more stone in the road which you still
need to take care of with this whole 'giving classic seccomp-BPF -> eBPF
transforms an fd, dumping and restoring that via bpf(2)' approach:

If you have JIT enabled on ARM32, and add a classic seccomp-BPF filter,
and dump that via your bpf(2) interface based on the current patches, what
you'll get is not eBPF opcodes but classic (!) BPF opcodes as ARM32 classic
JIT supports compilation of seccomp, since commit 24e737c1ebac ("ARM: net:
add JIT support for loads from struct seccomp_data.").

So in that case, bpf_prepare_filter() will not call into bpf_migrate_filter()
as there's simply no need for it, because the classic code could already
be JITed there. I guess other archs where JIT support for eBPF is not yet
within near sight might sooner or later support this insn for their classic
JITs, too ...


Re: randconfig build error with next-20150908, in drivers/md/dm-mpath.c

2015-09-11 Thread Mike Snitzer
On Wed, Sep 09 2015 at 12:04pm -0400,
Christoph Hellwig <h...@lst.de> wrote:

> Does this fix the issue for you?  My Kconfig-fu isn't the best,
> but the idea behind this is that dm-mpath will depend on SCSI
> if SCSI_DH is set.  If SCSI_DH is not set it will use the stubs
> and not care about SCSI.
> 
> diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig
> index b597273..e9ea681 100644
> --- a/drivers/md/Kconfig
> +++ b/drivers/md/Kconfig
> @@ -393,7 +393,7 @@ config DM_MULTIPATH
>   # of SCSI_DH if the latter isn't defined but if
>   # it is, DM_MULTIPATH must depend on it.  We get a build
>   # error if SCSI_DH=m and DM_MULTIPATH=y
> - depends on SCSI_DH || !SCSI_DH
> + depends on !SCSI_DH || SCSI
>   ---help---
> Allow volume managers to support multipath hardware.
>  

I verified (with next-20150911) that without your patch I saw the
reported problem using the provided randconfig (that had
CONFIG_SCSI_DH=y and CONFIG_DM_MULTIPATH=y):

drivers/built-in.o: In function `activate_path':
/root/snitm/git/linux/drivers/md/dm-mpath.c:1225: undefined reference to `scsi_dh_activate'
drivers/built-in.o: In function `parse_path':
/root/snitm/git/linux/drivers/md/dm-mpath.c:581: undefined reference to `scsi_dh_attached_handler_name'
/root/snitm/git/linux/drivers/md/dm-mpath.c:600: undefined reference to `scsi_dh_attach'
/root/snitm/git/linux/drivers/md/dm-mpath.c:615: undefined reference to `scsi_dh_set_params'
make: *** [vmlinux] Error 1

But with your patch the build completes successfully.
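
For anyone following along, the effect of the changed "depends on" line
can be modelled with Kconfig's tristate arithmetic (n=0, m=1, y=2,
where "!" is 2 minus the value and "||" is the maximum). The sketch
below is only a model of the two expressions; the function names are
made up, and it assumes the failing randconfig had SCSI=m, which is not
stated above.

```c
/* Kconfig tristate values. */
enum tristate { N = 0, M = 1, Y = 2 };

/* Kconfig "!": 2 minus the operand. */
static enum tristate tri_not(enum tristate a)
{
	return (enum tristate)(2 - a);
}

/* Kconfig "||": the larger of the two operands. */
static enum tristate tri_or(enum tristate a, enum tristate b)
{
	return a > b ? a : b;
}

/* Old rule: depends on SCSI_DH || !SCSI_DH */
static enum tristate old_limit(enum tristate scsi_dh)
{
	return tri_or(scsi_dh, tri_not(scsi_dh));
}

/* New rule: depends on !SCSI_DH || SCSI */
static enum tristate new_limit(enum tristate scsi_dh, enum tristate scsi)
{
	return tri_or(tri_not(scsi_dh), scsi);
}
```

With SCSI_DH=y the old expression evaluates to y regardless of SCSI, so
DM_MULTIPATH=y is allowed even when the SCSI core is a module. The new
expression evaluates to m in that case, which caps DM_MULTIPATH at m,
and still evaluates to y when SCSI_DH is off.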

James, please feel free to pull in Christoph's patch and add my:

Tested-by: Mike Snitzer <snit...@redhat.com>


[PATCH v6 2/6] locking/pvqspinlock: Unconditional PV kick with _Q_SLOW_VAL

2015-09-11 Thread Waiman Long
If _Q_SLOW_VAL has been set, the vCPU state must have been vcpu_hashed.
The extra check at the end of __pv_queued_spin_unlock() is unnecessary
and so is removed.

Signed-off-by: Waiman Long 
Reviewed-by: Davidlohr Bueso 
---
 kernel/locking/qspinlock_paravirt.h |6 +-
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/qspinlock_paravirt.h b/kernel/locking/qspinlock_paravirt.h
index c8e6e9a..f0450ff 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -267,7 +267,6 @@ static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
}
 
if (!lp) { /* ONCE */
-   WRITE_ONCE(pn->state, vcpu_hashed);
lp = pv_hash(lock, pn);
 
/*
@@ -275,11 +274,9 @@ static void pv_wait_head(struct qspinlock *lock, struct mcs_spinlock *node)
 * when we observe _Q_SLOW_VAL in __pv_queued_spin_unlock()
 * we'll be sure to be able to observe our hash entry.
 *
-*   [S] pn->state
 *   [S]  [Rmw] l->locked == _Q_SLOW_VAL
 *   MB   RMB
 * [RmW] l->locked = _Q_SLOW_VAL  [L] 
-*[L] pn->state
 *
 * Matches the smp_rmb() in __pv_queued_spin_unlock().
 */
@@ -364,8 +361,7 @@ __visible void __pv_queued_spin_unlock(struct qspinlock *lock)
 * vCPU is harmless other than the additional latency in completing
 * the unlock.
 */
-   if (READ_ONCE(node->state) == vcpu_hashed)
-   pv_kick(node->cpu);
+   pv_kick(node->cpu);
 }
 /*
  * Include the architecture specific callee-save thunk of the
-- 
1.7.1



[PATCH v6 0/6] locking/qspinlock: Enhance pvqspinlock performance

2015-09-11 Thread Waiman Long
v5->v6:
 - Added a new patch 1 to relax the cmpxchg and xchg operations in
   the native code path to reduce performance overhead on non-x86
   architectures.
 - Updated the unconditional PV kick patch as suggested by PeterZ.
 - Added a new patch to allow one lock stealing attempt at slowpath
   entry point to reduce performance penalty due to lock waiter
   preemption.
 - Removed the pending bit and kick-ahead patches as they didn't show
   any noticeable performance improvement on top of the lock stealing
   patch.
 - Simplified the adaptive spinning patch as the lock stealing patch
   allows more aggressive pv_wait() without much performance penalty
   in non-overcommitted VMs.

v4->v5:
 - Rebased the patch to the latest tip tree.
 - Corrected the comments and commit log for patch 1.
 - Removed the v4 patch 5 as PV kick deferment is no longer needed with
   the new tip tree.
 - Simplified the adaptive spinning patch (patch 6) & improve its
   performance a bit further.
 - Re-ran the benchmark test with the new patch.

v3->v4:
 - Patch 1: add comment about possible racing condition in PV unlock.
 - Patch 2: simplified the pv_pending_lock() function as suggested by
   Davidlohr.
 - Move PV unlock optimization patch forward to patch 4 & rerun
   performance test.

v2->v3:
 - Moved deferred kicking enablement patch forward & move back
   the kick-ahead patch to make the effect of kick-ahead more visible.
 - Reworked patch 6 to make it more readable.
 - Reverted back to use state as a tri-state variable instead of
   adding an additional bistate variable.
 - Added performance data for different values of PV_KICK_AHEAD_MAX.
 - Add a new patch to optimize PV unlock code path performance.

v1->v2:
 - Take out the queued unfair lock patches
 - Add a patch to simplify the PV unlock code
 - Move pending bit and statistics collection patches to the front
 - Keep vCPU kicking in pv_kick_node(), but defer it to unlock time
   when appropriate.
 - Change the wait-early patch to use adaptive spinning to better
   balance the difference effect on normal and over-committed guests.
 - Add patch-to-patch performance changes in the patch commit logs.

This patchset tries to improve the performance of both regular and
over-committed VM guests. The adaptive spinning patch was inspired
by the "Do Virtual Machines Really Scale?" blog from Sanidhya Kashyap.

Patch 1 relaxes the memory order restriction of atomic operations by
using less restrictive _acquire and _release variants of cmpxchg()
and xchg(). This will reduce performance overhead when ported to other
non-x86 architectures.

Patch 2 simplifies the unlock code by removing the unnecessary
state check.

Patch 3 optimizes the PV unlock code path performance for x86-64
architecture.

Patch 4 allows the collection of various count data that are useful
to see what is happening in the system. They do add a bit of overhead
when enabled slowing performance a tiny bit.

Patch 5 allows one lock stealing attempt at slowpath entry. This causes
a pretty big performance improvement for over-committed VM guests.

Patch 6 enables adaptive spinning in the queue nodes. This patch
leads to further performance improvement in over-committed guest,
though it is not as big as the previous patch.

Waiman Long (6):
  locking/qspinlock: relaxes cmpxchg & xchg ops in native code
  locking/pvqspinlock: Unconditional PV kick with _Q_SLOW_VAL
  locking/pvqspinlock, x86: Optimize PV unlock code path
  locking/pvqspinlock: Collect slowpath lock statistics
  locking/pvqspinlock: Allow 1 lock stealing attempt
  locking/pvqspinlock: Queue node adaptive spinning

 arch/x86/Kconfig  |7 +
 arch/x86/include/asm/qspinlock.h  |2 +-
 arch/x86/include/asm/qspinlock_paravirt.h |   59 +
 include/asm-generic/qspinlock.h   |6 +-
 kernel/locking/qspinlock.c|   45 +++--
 kernel/locking/qspinlock_paravirt.h   |  378 +
 6 files changed, 431 insertions(+), 66 deletions(-)



[PATCH v6 5/6] locking/pvqspinlock: Allow 1 lock stealing attempt

2015-09-11 Thread Waiman Long
This patch allows one attempt for the lock waiter to steal the lock
when entering the PV slowpath.  This helps to reduce the performance
penalty caused by lock waiter preemption while not having much of
the downsides of a real unfair lock.
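
As a rough userspace illustration of the "one unfair trylock" idea described above, the sketch below uses C11 atomics in place of the kernel's READ_ONCE() and cmpxchg(). The names toy_lock, toy_trylock_unfair() and toy_unlock() are hypothetical and not part of the patch, and the cpu_relax() pause from the patch is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the locked byte of a qspinlock. */
struct toy_lock {
	atomic_uchar locked;	/* 0 = free, 1 = held */
};

/*
 * Make one opportunistic attempt to take the lock, ignoring any queue
 * of waiters.  Returns true if the caller now owns the lock.
 */
static bool toy_trylock_unfair(struct toy_lock *lock)
{
	unsigned char expected = 0;

	/* Cheap read first: skip the atomic RMW if the lock is held. */
	if (atomic_load_explicit(&lock->locked, memory_order_relaxed))
		return false;

	/* Single compare-and-swap; failure means somebody else won. */
	return atomic_compare_exchange_strong_explicit(&lock->locked,
			&expected, 1,
			memory_order_acquire, memory_order_relaxed);
}

/* Release the lock so that the next trylock can succeed. */
static void toy_unlock(struct toy_lock *lock)
{
	atomic_store_explicit(&lock->locked, 0, memory_order_release);
}
```

Because only a single attempt is made before the caller falls back to queueing, the unfairness is bounded, which is the trade-off the commit message argues for.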

Linux kernel builds were run in KVM guest on an 8-socket, 4
cores/socket Westmere-EX system and a 4-socket, 8 cores/socket
Haswell-EX system. Both systems are configured to have 32 physical
CPUs. The kernel build times before and after the patch were:

                    Westmere                Haswell
  Patch         32 vCPUs    48 vCPUs    32 vCPUs    48 vCPUs
  -----         --------    --------    --------    --------
  Before patch   3m15.6s    10m56.1s     1m44.1s     5m29.1s
  After patch    3m02.3s     5m00.2s     1m43.7s     3m03.5s

For the overcommitted case (48 vCPUs), this patch is able to reduce
kernel build time by more than 54% for Westmere and 44% for Haswell.
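
The quoted reductions can be re-derived from the 48 vCPU build times in the table. The helpers below are only an illustrative check; to_seconds() and reduction_pct() are made-up names, not anything from the patch.

```c
/* Convert a wall-clock time given as minutes and seconds to seconds. */
static double to_seconds(int minutes, double seconds)
{
	return minutes * 60.0 + seconds;
}

/* Percentage reduction when going from 'before' to 'after'. */
static double reduction_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

With the table's numbers, reduction_pct(to_seconds(10, 56.1), to_seconds(5, 0.2)) gives roughly 54.2 for Westmere and reduction_pct(to_seconds(5, 29.1), to_seconds(3, 3.5)) gives roughly 44.2 for Haswell.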

Signed-off-by: Waiman Long 
---
 kernel/locking/qspinlock.c  |   19 +++---
 kernel/locking/qspinlock_paravirt.h |  116 ---
 2 files changed, 102 insertions(+), 33 deletions(-)

diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c
index 28a15c7..1be1aab 100644
--- a/kernel/locking/qspinlock.c
+++ b/kernel/locking/qspinlock.c
@@ -248,17 +248,15 @@ static __always_inline void set_locked(struct qspinlock 
*lock)
 
 static __always_inline void __pv_init_node(struct mcs_spinlock *node) { }
 static __always_inline void __pv_wait_node(struct mcs_spinlock *node) { }
-static __always_inline void __pv_kick_node(struct qspinlock *lock,
-  struct mcs_spinlock *node) { }
-static __always_inline void __pv_wait_head(struct qspinlock *lock,
-  struct mcs_spinlock *node) { }
-
+static __always_inline bool __pv_wait_head_and_lock(struct qspinlock *lock,
+   struct mcs_spinlock *node,
+   u32 tail)
+   { return false; }
 #define pv_enabled()   false
 
 #define pv_init_node   __pv_init_node
 #define pv_wait_node   __pv_wait_node
-#define pv_kick_node   __pv_kick_node
-#define pv_wait_head   __pv_wait_head
+#define pv_wait_head_and_lock  __pv_wait_head_and_lock
 
 #ifdef CONFIG_PARAVIRT_SPINLOCKS
 #define queued_spin_lock_slowpath  native_queued_spin_lock_slowpath
@@ -416,7 +414,8 @@ queue:
 * does not imply a full barrier.
 *
 */
-   pv_wait_head(lock, node);
+   if (pv_wait_head_and_lock(lock, node, tail))
+   goto release;
 	while ((val = smp_load_acquire(&lock->val.counter)) & _Q_LOCKED_PENDING_MASK)
cpu_relax();
 
@@ -453,7 +452,6 @@ queue:
cpu_relax();
 
 	arch_mcs_spin_unlock_contended(&next->locked);
-   pv_kick_node(lock, next);
 
 release:
/*
@@ -474,8 +472,7 @@ EXPORT_SYMBOL(queued_spin_lock_slowpath);
 
 #undef pv_init_node
 #undef pv_wait_node
-#undef pv_kick_node
-#undef pv_wait_head
+#undef pv_wait_head_and_lock
 
 #undef  queued_spin_lock_slowpath
 #define queued_spin_lock_slowpath  __pv_queued_spin_lock_slowpath
diff --git a/kernel/locking/qspinlock_paravirt.h 
b/kernel/locking/qspinlock_paravirt.h
index 2d71768..9fd49a2 100644
--- a/kernel/locking/qspinlock_paravirt.h
+++ b/kernel/locking/qspinlock_paravirt.h
@@ -41,6 +41,30 @@ struct pv_node {
 };
 
 /*
+ * Allow one unfair trylock when entering the PV slowpath to reduce the
+ * performance impact of lock waiter preemption (either explicitly via
+ * pv_wait or implicitly via PLE).
+ *
+ * A little bit of unfairness here can improve performance without many
+ * of the downsides of a real unfair lock.
+ */
+#define queued_spin_trylock(l) pv_queued_spin_trylock_unfair(l)
+static inline bool pv_queued_spin_trylock_unfair(struct qspinlock *lock)
+{
+   struct __qspinlock *l = (void *)lock;
+
+   if (READ_ONCE(l->locked))
+   return 0;
+   /*
+* Wait a bit here to ensure that an actively spinning vCPU has a fair
+* chance of getting the lock.
+*/
+   cpu_relax();
+
+	return cmpxchg(&l->locked, 0, _Q_LOCKED_VAL) == 0;
+}
+
+/*
  * PV qspinlock statistics
  */
 enum pv_qlock_stat {
@@ -51,6 +75,7 @@ enum pv_qlock_stat {
pvstat_kick_unlock,
pvstat_spurious,
pvstat_hops,
+   pvstat_utrylock,
pvstat_num  /* Total number of statistics counts */
 };
 
@@ -69,6 +94,7 @@ static const char * const stat_fsnames[pvstat_num] = {
[pvstat_kick_unlock] = "kick_unlock_count",
[pvstat_spurious]= "spurious_wakeup",
[pvstat_hops]= "hash_hops_count",
+   [pvstat_utrylock]= "utrylock_count",
 };
 
 static atomic_t pvstats[pvstat_num];
@@ -145,6 +171,20 @@ static inline void pvstat_hop(int hopcnt)
 }
 
 /*

Re: [PATCH v2 7/9] ARM: STi: DT: STiH407: Add FDMA driver dt nodes.

2015-09-11 Thread Lee Jones
On Fri, 11 Sep 2015, Peter Griffin wrote:

> These nodes are required to get the fdma driver working
> on STiH407 based silicon.
> 
> Signed-off-by: Peter Griffin 
> ---
>  arch/arm/boot/dts/stih407-family.dtsi | 51 
> +++
>  1 file changed, 51 insertions(+)
> 
> diff --git a/arch/arm/boot/dts/stih407-family.dtsi 
> b/arch/arm/boot/dts/stih407-family.dtsi
> index 838b812..da07474b 100644
> --- a/arch/arm/boot/dts/stih407-family.dtsi
> +++ b/arch/arm/boot/dts/stih407-family.dtsi
> @@ -565,5 +565,56 @@
> <_port2 PHY_TYPE_USB3>;
>   };
>   };
> +
> + fdma0: fdma0-audio@8e2 {

I'm not familiar with the FDMA driver, so can't comment knowledgeably,
but the <name> part of <name>@<address> should only describe the
type of hardware.  I believe in this case it should just be
dma@08e2.  Also notice the leading zero in the address, which I
believe mitigates possible confusion.  Then you can be more specific with
the label, so something like 'fdma-audio' seems appropriate here.

> + compatible = "st,stih407-fdma-mpe31";
> + reg = <0x8e2 0x2>;

I personally find padding up to 32bits helpful in the addresses.

> + interrupts = ;
> + dma-channels = <16>;
> + #dma-cells = <3>;
> + st,fdma-id = <0>;

We usually shy away from ID properties.  What is it required for in
this case?

> + clocks = <_s_c0_flexgen CLK_FDMA>,
> +  <_s_c0_flexgen CLK_EXT2F_A9>,
> +  <_s_c0_flexgen CLK_EXT2F_A9>,
> +  <_s_c0_flexgen CLK_EXT2F_A9>;
> + clock-names = "fdma_slim",
> +   "fdma_hi",
> +   "fdma_low",
> +   "fdma_ic";
> + };
> +
> + fdma1: fdma1-app@8e4 {
> + compatible = "st,stih407-fdma-mpe31";
> + reg = <0x8e4 0x2>;
> + interrupts = ;
> + dma-channels = <16>;
> + #dma-cells = <3>;
> + st,fdma-id = <1>;
> + clocks = <_s_c0_flexgen CLK_FDMA>,
> +  <_s_c0_flexgen CLK_TX_ICN_DMU>,
> +  <_s_c0_flexgen CLK_TX_ICN_DMU>,
> +  <_s_c0_flexgen CLK_EXT2F_A9>;
> + clock-names = "fdma_slim",
> +   "fdma_hi",
> +   "fdma_low",
> +   "fdma_ic";
> + };
> +
> + fdma2: fdma2-free_running@8e6 {
> + compatible = "st,stih407-fdma-mpe31";
> + reg = <0x8e6 0x2>;
> + interrupts = ;
> + dma-channels = <16>;
> + #dma-cells = <3>;
> + st,fdma-id = <2>;
> + clocks = <_s_c0_flexgen CLK_FDMA>,
> +  <_s_c0_flexgen CLK_EXT2F_A9>,
> +  <_s_c0_flexgen CLK_TX_ICN_DISP_0>,
> +  <_s_c0_flexgen CLK_EXT2F_A9>;
> + clock-names = "fdma_slim",
> +   "fdma_hi",
> +   "fdma_low",
> +   "fdma_ic";
> + };
>   };
>  };

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog


Re: [GIT PULL] Please pull NFS client changes

2015-09-11 Thread Christoph Hellwig
On Mon, Sep 07, 2015 at 11:01:36PM -0700, Christoph Hellwig wrote:
> On Tue, Sep 08, 2015 at 11:59:00AM +1000, Stephen Rothwell wrote:
> > This contains about 12 commits new since Sept 1 and the last 6 are only
> > appearing in linux-next today (though I did not do Friday and Monday's
> > linux-next).  Not judging, just noting.
> 
> And one of these recent commits causes a regression for block layouts
> in xfstests generic/075.  Still need to check which one.

"NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file"

is the culprit, posted to the list for the first time and committed on
Aug 31.


Re: [PATCH] efi/libstub/fdt: Standardize the names of EFI stub parameters

2015-09-11 Thread Mark Rutland
> It feels like this discussion is going in circles.
> 
> When we discussed this six months ago, we already concluded that,
> since UEFI is the only specified way that the presence of ACPI is
> advertised on an ARM system, we need to emulate UEFI to some extent.

My understanding from the last time I was present at such a discussion
was that the emulation was complete from the kernel's PoV (i.e. no
special case handling was required). 

Evidently I misunderstood.

One of the main points of rationale for requiring EFI was that we'd have
a well-defined system state as per the EFI (and ACPI) standards. We'd
know we had the EFI memory map, services, etc (with the memory map and
configuration tables being the most important elements). We didn't want
to have to try to reconcile a DT memory map and ACPI, for instance.

That is somewhat (though admittedly not entirely) broken if we require a
set of rules to be applied beyond what the standards mandate.  That is
broken if we require a non-standard set of rules to be applied, and
limits what we can do in the !Xen case.

> So we need the EFI system table to expose the UEFI configuration table
> that carries the ACPI root pointer.
> 
> Since ACPI support also relies on the UEFI memory map (I think?), we
> need that as well.
> 
> These two items are exactly what we pass via the UEFI DT properties,
> so we should indeed promote the current de-facto binding to a proper
> binding, and renaming the properties makes sense in that context.

I agree that we need to sort these out.

> I agree that this should also include a description of the expected
> state of the firmware, i.e., that ExitBootServices() has been called,
> and that the memory map has been populated with virtual addresses, which
> have been installed using SetVirtualAddressMap() if they differ from
> the physical addresses. (The current implementation on the kernel side
> is perfectly capable of dealing with a 1:1 mapping).
> 
> Beyond that, there is no point in pretending to be a full UEFI
> implementation, imo. Boot services are not required, nor are runtime
> services (only the current EFI init code on arm needs to be modified
> to deal with a NULL runtime services pointer)

I'm not keen on this because it involves applying Xen-specific caveats
atop of what the UEFI spec says (e.g. as runtime services might be NULL
under Xen). As the spec and Xen evolve, those caveats shift, and that's
going to be fragile for all users regardless of Xen.

If Xen needs special-casing, then I'd rather that Xen were detected
first and provided us with what it knows is safe for us to use, rather
than we tip-toe around until we're sure Xen isn't present and/or doesn't
need additional constraints met.

Thanks,
Mark.


Re: [PATCH] x86: Wire up 32-bit direct socket calls

2015-09-11 Thread Andy Lutomirski
On Fri, Sep 11, 2015 at 3:14 AM, Arnd Bergmann  wrote:
> On Friday 11 September 2015 11:54:50 Geert Uytterhoeven wrote:
>> To make sure I don't miss any (it seems I missed recvmmsg and sendmmsg for
>> the socketcall case, sigh), this is the list of ipc syscalls to implement?
>>
>> sys_msgget
>> sys_msgctl
>> sys_msgrcv
>> sys_msgsnd
>> sys_semget
>> sys_semctl
>> sys_semtimedop
>> sys_shmget
>> sys_shmctl
>> sys_shmat
>> sys_shmdt
>>
>> sys_semop() seems to be unneeded because it can be implemented using
>> sys_semtimedop()?
>>
>
> Yes, that list looks right. IPC also includes a set of six sys_mq_*
> calls, but I believe that everyone already has those as they are not
> covered by sys_ipc.
>
> For y2038 compatibility, we will likely add a new variant of
> semtimedop that takes a 64-bit timespec. While the argument passed
> there is a relative time that will never need to be longer than 68
> years, we need to accommodate user space that defines timespec
> in a sane way, and converting the argument in libc would be awkward.
>

I missed sys_ipc entirely.

Ingo, Thomas, want to just wire those up, too?  I can send a patch
next week, but it'll be as trivial as the socket one.

--Andy
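
The "68 years" figure in Arnd's quoted remark is the range of a signed 32-bit seconds counter. A small check of that arithmetic, assuming an average Gregorian year of 365.2425 days (SECONDS_PER_YEAR and max_years_in_32bit_time() are names invented for this sketch):

```c
#include <stdint.h>

/* Seconds in an average Gregorian year (365.2425 days). */
#define SECONDS_PER_YEAR (365.2425 * 24 * 60 * 60)

/* Largest span, in years, that a signed 32-bit second count can hold. */
static double max_years_in_32bit_time(void)
{
	return (double)INT32_MAX / SECONDS_PER_YEAR;
}
```

The result is a little over 68 years, which is why a relative timeout fits in 32 bits even though absolute timestamps do not after 2038.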


Re: [PATCH v2] arm/xen: Enable user access to the kernel before issuing a privcmd call

2015-09-11 Thread Stefano Stabellini
On Fri, 11 Sep 2015, Russell King - ARM Linux wrote:
> On Fri, Sep 11, 2015 at 06:36:05PM +0100, Julien Grall wrote:
> > On 11/09/15 18:32, Julien Grall wrote:
> > > On 11/09/15 18:00, Russell King - ARM Linux wrote:
> > >> On Fri, Sep 11, 2015 at 05:25:59PM +0100, Julien Grall wrote:
> > >>> +   /*
> > >>> +* Privcmd calls are issued by the userspace. We need to allow 
> > >>> the
> > >>> +* kernel to access the userspace memory before issuing the 
> > >>> hypercall.
> > >>> +*/
> > >>> +   uaccess_enable r4
> > >>> +
> > >>> +   /* r4 is loaded now as we use it as scratch register before */
> > >>> ldr r4, [sp, #4]
> > >>
> > >> As I mentioned in one of my previous mails, "ip" should be safe to use
> > >> here - it's a caller-corrupted register, just like r0-r3 and lr.  So,
> > >> you could do:
> > >>
> > >>  ldr r4, [sp, #4]
> > >> +uaccess_enable ip
> > > 
> > > The register ip (aka r12) is used to store the hypercall number. So we
> > > can't reuse it as scratch register.
> > > 
> > > The easiest one is r4.
> > > 
> > >>
> > >> which fractionally tightens the window.
> > >>
> > >> However, there's nothing actually wrong with your version - there's no
> > >> way we could've got this far with sp pointing at userspace.
> > >>
> > >> I'm happy with either version, so:
> > >>
> > >> Acked-by: Russell King 
> > >>
> > >> How do you want to handle the patch?  I already have some other uaccess
> > >> fixes queued up to send to Linus before the merge window closes.
> > 
> > Forgot to answer this bit. I was thinking of asking Stefano to carry
> > the patch in xentip, although it won't go in until rc1.
> > 
> > I don't mind if it's going earlier in Linux/master.
> 
> Thanks, I've applied your patch as-is now.

That's fine by me, the patch looks good.

Thanks,

Stefano


[PATCH] staging/dgap: Use strpbrk() instead of dgap_sindex()

2015-09-11 Thread Alexander Kuleshov
<linux/string.h> provides the strpbrk() function, which does the
same thing as dgap_sindex(). Let's use the already defined
function instead of a custom one.

Signed-off-by: Alexander Kuleshov 
---
 drivers/staging/dgap/dgap.c | 24 +---
 1 file changed, 1 insertion(+), 23 deletions(-)

diff --git a/drivers/staging/dgap/dgap.c b/drivers/staging/dgap/dgap.c
index 9112dd2..ee0f022 100644
--- a/drivers/staging/dgap/dgap.c
+++ b/drivers/staging/dgap/dgap.c
@@ -287,28 +287,6 @@ static struct toklist dgap_tlist[] = {
{ 0,NULL }
 };
 
-
-/*
- * dgap_sindex: much like index(), but it looks for a match of any character in
- * the group, and returns that position.
- */
-static char *dgap_sindex(char *string, char *group)
-{
-   char *ptr;
-
-   if (!string || !group)
-   return NULL;
-
-   for (; *string; string++) {
-   for (ptr = group; *ptr; ptr++) {
-   if (*ptr == *string)
-   return string;
-   }
-   }
-
-   return NULL;
-}
-
 /*
  * get a word from the input stream, also keep track of current line number.
  * words are separated by whitespace.
@@ -317,7 +295,7 @@ static char *dgap_getword(char **in)
 {
char *ret_ptr = *in;
 
-   char *ptr = dgap_sindex(*in, " \t\n");
+   char *ptr = strpbrk(*in, " \t\n");
 
/* If no word found, return null */
if (!ptr)
-- 
2.5.0
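
For readers who want to convince themselves of the equivalence claimed in the commit message, here is a small userspace sketch. It copies the removed helper's logic under the hypothetical name sindex_reference() and compares it with the C library's strpbrk(); same_result() is likewise just an illustrative helper.

```c
#include <stddef.h>
#include <string.h>

/* Copy of the removed helper's logic, kept only for comparison. */
static char *sindex_reference(char *string, char *group)
{
	char *ptr;

	if (!string || !group)
		return NULL;

	for (; *string; string++) {
		for (ptr = group; *ptr; ptr++) {
			if (*ptr == *string)
				return string;
		}
	}

	return NULL;
}

/* Returns 1 when both functions find the same position in 'input'. */
static int same_result(char *input, char *group)
{
	return sindex_reference(input, group) == strpbrk(input, group);
}
```

One difference worth keeping in mind: the removed helper returned NULL when given NULL arguments, whereas strpbrk() expects valid strings, so the replacement assumes callers never pass NULL.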



[PATCH 1/1] rcu_sync: Cleanup the CONFIG_PROVE_RCU checks

2015-09-11 Thread Oleg Nesterov
1. Rename __rcu_sync_is_idle() to rcu_sync_lockdep_assert() and
   change it to use rcu_lockdep_assert().

2. Change rcu_sync_is_idle() to return rsp->gp_state == GP_IDLE
   unconditonally, this way we can remove the same check from
   rcu_sync_lockdep_assert() and clearly isolate the debugging
   code.

Note: rcu_sync_enter()->wait_event(gp_state == GP_PASSED) needs
another CONFIG_PROVE_RCU check, the same we do in ->sync(); but
this needs some simple preparations in the core RCU code to avoid
the code duplication.

Signed-off-by: Oleg Nesterov 
---
 include/linux/rcu_sync.h |7 +++
 kernel/rcu/sync.c|6 +++---
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/include/linux/rcu_sync.h b/include/linux/rcu_sync.h
index 8069d64..a63a33e 100644
--- a/include/linux/rcu_sync.h
+++ b/include/linux/rcu_sync.h
@@ -40,7 +40,7 @@ struct rcu_sync {
enum rcu_sync_type  gp_type;
 };
 
-extern bool __rcu_sync_is_idle(struct rcu_sync *);
+extern void rcu_sync_lockdep_assert(struct rcu_sync *);
 
 /**
  * rcu_sync_is_idle() - Are readers permitted to use their fastpaths?
@@ -53,10 +53,9 @@ extern bool __rcu_sync_is_idle(struct rcu_sync *);
 static inline bool rcu_sync_is_idle(struct rcu_sync *rsp)
 {
 #ifdef CONFIG_PROVE_RCU
-   return __rcu_sync_is_idle(rsp);
-#else
-   return !rsp->gp_state; /* GP_IDLE */
+   rcu_sync_lockdep_assert(rsp);
 #endif
+   return !rsp->gp_state; /* GP_IDLE */
 }
 
 extern void rcu_sync_init(struct rcu_sync *, enum rcu_sync_type);
diff --git a/kernel/rcu/sync.c b/kernel/rcu/sync.c
index 1c73c57..a8cf199 100644
--- a/kernel/rcu/sync.c
+++ b/kernel/rcu/sync.c
@@ -63,10 +63,10 @@ enum { CB_IDLE = 0, CB_PENDING, CB_REPLAY };
 #define	rss_lock	gp_wait.lock
 
 #ifdef CONFIG_PROVE_RCU
-bool __rcu_sync_is_idle(struct rcu_sync *rsp)
+void rcu_sync_lockdep_assert(struct rcu_sync *rsp)
 {
-   WARN_ON(!gp_ops[rsp->gp_type].held());
-   return rsp->gp_state == GP_IDLE;
+   rcu_lockdep_assert(gp_ops[rsp->gp_type].held(),
+  "suspicious rcu_sync_is_idle() usage");
 }
 #endif
 
-- 
1.5.5.1




Re: [PATCH] arm/xen: Enable user access to the kernel before issuing a privcmd call

2015-09-11 Thread Julien Grall
Hi Ian,

On 11/09/15 15:29, Ian Campbell wrote:
>> After the commit a5e090acbf545c0a3b04080f8a488b17ec41fe02 "ARM:
>> software-based priviledged-no-access support", the kernel can't access
> 
> "privileged"

That was a typo in the commit title of the patch. So I won't fix this one.

All the others will be fixed on the next version.

Regards,

-- 
Julien Grall


Re: [RESEND PATCH v4 0/8] i2c: Relax mandatory I2C ID table passing

2015-09-11 Thread Lee Jones
On Fri, 11 Sep 2015, Kieran Bingham wrote:
> Hi Wolfram,
> 
> I have picked this patchset [0] up from Lee to rebase it, with an aim to
> get this series moving again.
> 
> This resend fixes up my SoB's as highlighted by Lee
> 
> A couple of minor issues were resolved in the rebase. As it stood, Javier
> proposed [1] to merge this series, and use a follow up series to make sure
> that all I2C drivers are using a MODULE_DEVICE_TABLE(of,...)
> 
> I have prepared a Coccinelle patch to work through the bulk of the changes
> required for the conversion, which will assist the transition process.
> 
> Once this patch set is accepted, I will commence converting the other
> drivers, and submitting with a per subsystem breakdown or similar to
> reduce traffic.
> 
> [0] https://lkml.org/lkml/2014/8/28/283
> [1] https://lkml.org/lkml/2014/9/12/496

I appreciate that my SoB is on every patch, but this set still looks
good to me, so for extra clarification:

Acked-by: Lee Jones 

[...]

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog


Re: [PATCH v2 7/9] ARM: STi: DT: STiH407: Add FDMA driver dt nodes.

2015-09-11 Thread Peter Griffin
Hi Lee,

On Fri, 11 Sep 2015, Lee Jones wrote:

> On Fri, 11 Sep 2015, Peter Griffin wrote:
> 
> > These nodes are required to get the fdma driver working
> > on STiH407 based silicon.
> > 
> > Signed-off-by: Peter Griffin 
> > ---
> >  arch/arm/boot/dts/stih407-family.dtsi | 51 
> > +++
> >  1 file changed, 51 insertions(+)
> > 
> > diff --git a/arch/arm/boot/dts/stih407-family.dtsi 
> > b/arch/arm/boot/dts/stih407-family.dtsi
> > index 838b812..da07474b 100644
> > --- a/arch/arm/boot/dts/stih407-family.dtsi
> > +++ b/arch/arm/boot/dts/stih407-family.dtsi
> > @@ -565,5 +565,56 @@
> >   <_port2 PHY_TYPE_USB3>;
> > };
> > };
> > +
> > +   fdma0: fdma0-audio@8e2 {
> 
> I'm not familiar with the FDMA driver, so can't comment knowledgeably,
> but the <name> part of <name>@<address> should only describe the
> type of hardware.  I believe in this case it should just be
> dma@08e2.  Also notice the leading zero in the address, which I
> believe mitigates possible confusion.  Then you can be more specific with
> the label, so something like 'fdma-audio' seems appropriate here.

Ok, can change to that format in v3.

> 
> > +   compatible = "st,stih407-fdma-mpe31";
> > +   reg = <0x8e2 0x2>;
> 
> I personally find padding up to 32bits helpful in the addresses.

None of the stih407-family nodes I can see have this padding, including
the ones merged by you.

> 
> > +   interrupts = ;
> > +   dma-channels = <16>;
> > +   #dma-cells = <3>;
> > +   st,fdma-id = <0>;
> 
> We usually shy away from ID properties.  What is it required for in
> this case?

Yes Rob did already mention that over here, see my reply at the bottom
http://www.spinics.net/lists/devicetree/msg92529.html.

However I can't think of any other useful properties we could add
to derive this information. The fdma controller number is used
by the driver to generate a unique firmware filename.

regards,

Peter.


[PATCH 09/11] ARM: DT: STiH407: Add pinconfig for IRB UHF and IRB TX

2015-09-11 Thread Peter Griffin
This patch adds the pinconfig for IRB TX and IRB UHF.

Signed-off-by: M'boumba Cedric Madianga 
Acked-by: Patrice Chotard 
Signed-off-by: Patrice Chotard 
Signed-off-by: Peter Griffin 
---
 arch/arm/boot/dts/stih407-pinctrl.dtsi | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/arm/boot/dts/stih407-pinctrl.dtsi 
b/arch/arm/boot/dts/stih407-pinctrl.dtsi
index 3cd7e2a..473f2ea 100644
--- a/arch/arm/boot/dts/stih407-pinctrl.dtsi
+++ b/arch/arm/boot/dts/stih407-pinctrl.dtsi
@@ -121,6 +121,24 @@
ir = < 0 ALT2 IN>;
};
};
+
+   pinctrl_uhf: uhf0 {
+   st,pins {
+   ir = < 1 ALT2 IN>;
+   };
+   };
+
+   pinctrl_tx: tx0 {
+   st,pins {
+   tx = < 2 ALT2 OUT>;
+   };
+   };
+
+   pinctrl_tx_od: tx_od0 {
+   st,pins {
+   tx_od = < 3 ALT2 OUT>;
+   };
+   };
};
 
/* SBC_ASC0 - UART10 */
-- 
1.9.1



Re: [GIT PULL] Please pull NFS client changes

2015-09-11 Thread Trond Myklebust
On Fri, Sep 11, 2015 at 12:27 PM, Christoph Hellwig  wrote:
> On Mon, Sep 07, 2015 at 11:01:36PM -0700, Christoph Hellwig wrote:
>> On Tue, Sep 08, 2015 at 11:59:00AM +1000, Stephen Rothwell wrote:
>> > This contains about 12 commits new since Sept 1 and the last 6 are only
>> > appearing in linux-next today (though I did not do Friday and Monday's
>> > linux-next).  Not judging, just noting.
>>
>> And one of these recent commits causes a regression for block layouts
>> in xfstests generic/075.  Still need to check which one.
>
> "NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file"
>
> is the culprit, posted to the list for the first time and committed on
> Aug 31.

That looks like it is tickling a server protocol bug.

The minimum length is just that; a minimum. If the server wants the
layout to be block aligned, then it is supposed to adjust the returned
offset + length values so that they cover the range described by the
offset+minimum length (see table 13 on
https://tools.ietf.org/html/rfc5661#page-540).
The server is only supposed to fail the LAYOUTGET request if it is
completely unable to meet those requirements. Furthermore, it is
supposed to return either NFS4ERR_BADLAYOUT or NFS4ERR_LAYOUTTRYLATER
(depending on what the value of the minimum layout was); as far as I
can see, the current code is returning NFS4ERR_LAYOUTUNAVAILABLE.

Cheers,
  Trond
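
The adjustment described above, where a server that wants block-aligned layouts widens the returned range so that it still covers offset + minimum length, might look roughly like the following sketch. struct byte_range and align_to_blocks() are hypothetical names for illustration only and are not taken from any NFS server; overflow handling is left out and block_size is assumed to be non-zero.

```c
#include <stdint.h>

/* A byte range expressed as a starting offset and a length. */
struct byte_range {
	uint64_t offset;
	uint64_t length;
};

/*
 * Expand [offset, offset + min_length) outwards to block boundaries:
 * the start is rounded down and the end is rounded up, so the result
 * always covers the requested range.
 */
static struct byte_range align_to_blocks(uint64_t offset,
					 uint64_t min_length,
					 uint64_t block_size)
{
	struct byte_range r;
	uint64_t start = offset - offset % block_size;
	uint64_t end = offset + min_length;

	if (end % block_size)
		end += block_size - end % block_size;

	r.offset = start;
	r.length = end - start;
	return r;
}
```

For example, a request at offset 5000 with a minimum length of 100 and 4096-byte blocks would be widened to offset 4096, length 4096.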


Re: v2 of seccomp filter c/r patches

2015-09-11 Thread Tycho Andersen
On Fri, Sep 11, 2015 at 10:00:22AM -0700, Andy Lutomirski wrote:
> On Fri, Sep 11, 2015 at 9:30 AM, Andy Lutomirski  wrote:
> > On Sep 10, 2015 5:22 PM, "Tycho Andersen"  
> > wrote:
> >>
> >> Hi all,
> >>
> >> Here is v2 of the seccomp filter c/r set. The patch notes have individual
> >> changes from the last series, but there are two points not noted:
> >>
> >> * The series still does not allow us to correctly restore state for 
> >> programs
> >>   that will use SECCOMP_FILTER_FLAG_TSYNC in the future. Given that we 
> >> want to
> >>   keep seccomp_filter's identity, I think something along the lines of 
> >> another
> >>   seccomp command like SECCOMP_INHERIT_PARENT is needed (although I'm not 
> >> sure
> >>   if this can even be done yet). In addition, we'll need a kcmp command for
> >>   figuring out if filters are the same, although this too needs to compare
> >>   seccomp_filter objects, so it's a little screwy. Any thoughts on how to 
> >> do
> >>   this nicely are welcome.
> >
> > Let's add a concept of a seccompfd.
> >
> > For background of what I want to add: I want to be able to create a
> > seccomp monitor.  A seccomp monitor will be, logically, a pair of a
> > struct file that represents the monitor and a seccomp_filter that is
> > controlled by the monitor.  Depending on flags, whoever holds the
> > monitor fd could change the active filter, intercept syscalls, and
> > issue syscalls on behalf of a process that is trapped in an
> > intercepted syscall.
> >
> > Seccomp filters would nest properly.
> >
> > The interface would probably be (extremely pseudocoded):
> >
> > monitor_fd, filter_fd = seccomp(CREATE_MONITOR, flags, ...);
> >
> > Then, later:
> >
> > seccomp(ATTACH_TO_FILTER, filter_fd);  /* now filtered */
> >
> > read(monitor_fd, buf, size); /* returns an intercepted syscall */
> > write(monitor_fd, buf, size); /* issues a syscall or releases the
> > trapped task */
> >
> > This can't be implemented on x86 without either going insane or
> > finishing the massive set of pending cleanups to the x86 entry code.
> > I favor the latter.
> >
> > We could, however, add part of it right now: we could have a way to
> > create a filterfd, we could add kcmp support for it, and we could add
> > the ATTACH_TO_FILTER thing.  I think that would solve your problem.
> >
> > One major open question: does a filter_fd know what its parent is and,
> > if so, will it just refuse to attach if the caller's parent is wrong?
> > Or will a filter_fd attach anywhere.
> >
> 
> Let me add one more thought:
> 
> Currently, struct seccomp_filter encodes a strict tree hierarchy: it
> knows what its parent is.  This only matters as an implementation
> detail and because TSYNC checks for seccomp_filter equality.
> 
> We could change this without user-visible effects.  We could say that,
> for TSYNC purposes, two filter states match if they contain exactly
> the same layers in the same order where a layer does *not* encode a
> concept of parent.  We could then say that attaching a classic bpf
> filter creates a brand new layer that is not equal to any other layer
> that's been created.
> 
> This has no effect whatsoever.  The difference would be that we could
> declare that attaching the same ebpf program twice creates the *same*
> layer so that, if you fork and both children attach the same ebpf
> program, then they match for TSYNC purposes.

Would you keep struct seccomp_filter identity here (meaning that you'd
reach over and grab the seccomp_filter from a sibling thread if it
existed)? Would it only work for the last filter attached to siblings,
or for all the filters? This does make my life easier, but I like the
idea of just using seccompfd directly below as it seems somewhat
easier (for me at least) to understand.

> Similarly, attaching the
> same hypothetical filterfd would create the same layer.
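
To make sure I'm reading the layer idea correctly, here is a rough
userspace model of it. This is only a sketch, not kernel code: struct
layer_stack, attach_layer() and stacks_match() are names made up for
illustration, and a "layer" is reduced to an opaque id with no notion of
a parent.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_LAYERS 8

/* Hypothetical model: a task's filter state is an ordered list of layer ids. */
struct layer_stack {
	unsigned long ids[MAX_LAYERS];
	size_t depth;
};

/* Push a layer onto the stack; returns false if the stack is full. */
static bool attach_layer(struct layer_stack *s, unsigned long id)
{
	if (s->depth >= MAX_LAYERS)
		return false;
	s->ids[s->depth++] = id;
	return true;
}

/* Two states match iff they hold exactly the same layers in the same order. */
static bool stacks_match(const struct layer_stack *a,
			 const struct layer_stack *b)
{
	size_t i;

	if (a->depth != b->depth)
		return false;
	for (i = 0; i < a->depth; i++)
		if (a->ids[i] != b->ids[i])
			return false;
	return true;
}
```

In this model, attaching a classic bpf filter would hand out a fresh id
every time, while attaching the same ebpf program (or the same filterfd)
would reuse one id, so two forked children that attach it independently
still compare equal.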

If we change the api of my current set to have the ptrace commands
iterate over seccomp fds, it looks something like:

seccompfd = ptrace(GET_FILTER_FD, pid);
while (ptrace(NEXT_FD, pid, seccompfd) == 0) {
	if (seccomp(CHECK_INHERITED, seccompfd))
		break;

	bpffd = seccomp(GET_BPF_FD, seccompfd);
	err = bpf(BPF_PROG_DUMP, bpffd, );
	/* save the bpf prog */
}

then restore can look like:

while (have_noninherited_filters()) {
	filter = load_filter();
	bpffd = bpf(BPF_PROG_LOAD, filter);
	seccompfd = seccomp(SECCOMP_FD_CREATE, bpffd);

	filters[n_filters++] = seccompfd;
}

/* fork any children as necessary and do the rest of the restore */

for (i = 0; i < n_filters; i++) {
	seccomp(SECCOMP_FD_INSTALL, filters[i]);
}

then the only question is how to implement the CHECK_INHERITED command
on dump.

If we support the above API, we don't need to think about the concept
of layers at all, or do any extra work on filter install to preserve
struct seccomp_filter identity, it just comes 

Re: [PATCH v2] arm/xen: Enable user access to the kernel before issuing a privcmd call

2015-09-11 Thread Julien Grall
On 11/09/15 18:00, Russell King - ARM Linux wrote:
> On Fri, Sep 11, 2015 at 05:25:59PM +0100, Julien Grall wrote:
>> +/*
>> + * Privcmd calls are issued by the userspace. We need to allow the
>> + * kernel to access the userspace memory before issuing the hypercall.
>> + */
>> +uaccess_enable r4
>> +
>> +/* r4 is loaded now as we use it as scratch register before */
>>  ldr r4, [sp, #4]
> 
> As I mentioned in one of my previous mails, "ip" should be safe to use
> here - it's a caller-corrupted register, just like r0-r3 and lr.  So,
> you could do:
> 
>   ldr r4, [sp, #4]
> + uaccess_enable ip

The register ip (aka r12) is used to store the hypercall number. So we
can't reuse it as scratch register.

The easiest one is r4.

> 
> which fractionally tightens the window.
> 
> However, there's nothing actually wrong with your version - there's no
> way we could've got this far with sp pointing at userspace.
> 
> I'm happy with either version, so:
> 
> Acked-by: Russell King 
> 
> How do you want to handle the patch?  I already have some other uaccess
> fixes queued up to send to Linus before the merge window closes.
> 
>>  __HVC(XEN_IMM)
>> +
>> +/*
>> + * Disable userspace access from kernel. This is fine to do it
>> + * unconditionally as no set_fs(KERNEL_DS)/set_fs(get_ds()) is
>> + * called before.
>> + */
>> +uaccess_disable r4
>> +
>>  ldm sp!, {r4}
>>  ret lr
>>  ENDPROC(privcmd_call);
>> -- 
>> 2.1.4
>>
> 


-- 
Julien Grall


Re: [PATCH v2 7/9] ARM: STi: DT: STiH407: Add FDMA driver dt nodes.

2015-09-11 Thread Lee Jones
On Fri, 11 Sep 2015, Peter Griffin wrote:
> On Fri, 11 Sep 2015, Lee Jones wrote:
> > On Fri, 11 Sep 2015, Peter Griffin wrote:
> > 
> > > These nodes are required to get the fdma driver working
> > > on STiH407 based silicon.
> > > 
> > > Signed-off-by: Peter Griffin 
> > > ---
> > >  arch/arm/boot/dts/stih407-family.dtsi | 51 
> > > +++
> > >  1 file changed, 51 insertions(+)
> > > 
> > > diff --git a/arch/arm/boot/dts/stih407-family.dtsi 
> > > b/arch/arm/boot/dts/stih407-family.dtsi
> > > index 838b812..da07474b 100644
> > > --- a/arch/arm/boot/dts/stih407-family.dtsi
> > > +++ b/arch/arm/boot/dts/stih407-family.dtsi
> > > @@ -565,5 +565,56 @@
> > > <_port2 PHY_TYPE_USB3>;
> > >   };
> > >   };
> > > +
> > > + fdma0: fdma0-audio@8e2 {
> > 
> > I'm not familiar with the FDMA driver, so can't comment knowledgeably,
> > but the  part of @ should only describe the
> > type of hardware.  I believe in this case it should just be
> > dma@08e2.  Also notice the leading zero in the address, which I
> > believe mitigates possible confusion.  Then you be more specific with
> > the label, so something like 'fdma-audio' seems appropriate here.
> 
> Ok, can change to that format in v3.
> 
> > 
> > > + compatible = "st,stih407-fdma-mpe31";
> > > + reg = <0x8e2 0x2>;
> > 
> > I personally find padding up to 32bits helpful in the addresses.
> 
> None of the stih407-family nodes I can see have this padding, including
> the ones merged by you.

Neither of these two facts means it's correct.

I'm happy to write a patch to correct them all.

Bear in mind that this isn't a hard and fast rule.  Both work and are
legal.  I just think the padding is more consistent.

> > > + interrupts = ;
> > > + dma-channels = <16>;
> > > + #dma-cells = <3>;
> > > + st,fdma-id = <0>;
> > 
> > We usually shy away from ID properties.  What is it required for in
> > this case?
> 
> Yes Rob did already mention that over here, see my reply at the bottom
> http://www.spinics.net/lists/devicetree/msg92529.html.
> 
> However I can't think of any other useful properties we could add
> to derive this information. The fdma controller number is used
> by the driver to generate a unique firmware filename.

Who chooses the naming scheme of the firmware binary?

Is there any reason they can't be:

  fdma_STiH407_audio.elf
  fdma_STiH407_app.elf
  fdma_STiH407_free_running.elf

Then you can have a different compatible for each.
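
Roughly what I have in mind, as a plain C sketch rather than driver
code. The three compatible strings below are invented for illustration
(nothing like them exists in the binding yet), and fdma_fw_name() is a
made-up helper; only the firmware names are the ones suggested above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a per-instance compatible to its firmware. */
struct fdma_fw_match {
	const char *compatible;
	const char *firmware;
};

static const struct fdma_fw_match fdma_fw_table[] = {
	{ "st,stih407-fdma-audio",        "fdma_STiH407_audio.elf" },
	{ "st,stih407-fdma-app",          "fdma_STiH407_app.elf" },
	{ "st,stih407-fdma-free-running", "fdma_STiH407_free_running.elf" },
};

/* Return the firmware name for a compatible string, or NULL if unknown. */
static const char *fdma_fw_name(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(fdma_fw_table) / sizeof(fdma_fw_table[0]); i++)
		if (strcmp(fdma_fw_table[i].compatible, compatible) == 0)
			return fdma_fw_table[i].firmware;
	return NULL;
}
```

In a real driver the same table would more likely live in the
of_device_id match table, with the firmware name carried in .data, which
removes the need for an ID property altogether.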

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog


Re: [PATCH 11/11] ARM: STi: STiH407: Add spi default pinctrl groups.

2015-09-11 Thread Lee Jones
On Fri, 11 Sep 2015, Peter Griffin wrote:

> Now we have default pinconfig groups for each SPI
> controller ensure it is used by the SPI controller
> node.
> 
> Signed-off-by: Peter Griffin 
> ---
>  arch/arm/boot/dts/stih407-family.dtsi | 14 ++
>  1 file changed, 14 insertions(+)

Acked-by: Lee Jones 
 
> diff --git a/arch/arm/boot/dts/stih407-family.dtsi 
> b/arch/arm/boot/dts/stih407-family.dtsi
> index 838b812..94a2fec 100644
> --- a/arch/arm/boot/dts/stih407-family.dtsi
> +++ b/arch/arm/boot/dts/stih407-family.dtsi
> @@ -381,6 +381,8 @@
>   interrupts = ;
>   clocks = <_s_c0_flexgen CLK_EXT2F_A9>;
>   clock-names = "ssc";
> + pinctrl-names = "default";
> + pinctrl-0 = <_spi1_default>;
>  
>   status = "disabled";
>   };
> @@ -391,6 +393,8 @@
>   interrupts = ;
>   clocks = <_s_c0_flexgen CLK_EXT2F_A9>;
>   clock-names = "ssc";
> + pinctrl-names = "default";
> + pinctrl-0 = <_spi2_default>;
>  
>   status = "disabled";
>   };
> @@ -401,6 +405,8 @@
>   interrupts = ;
>   clocks = <_s_c0_flexgen CLK_EXT2F_A9>;
>   clock-names = "ssc";
> + pinctrl-names = "default";
> + pinctrl-0 = <_spi3_default>;
>  
>   status = "disabled";
>   };
> @@ -411,6 +417,8 @@
>   interrupts = ;
>   clocks = <_s_c0_flexgen CLK_EXT2F_A9>;
>   clock-names = "ssc";
> + pinctrl-names = "default";
> + pinctrl-0 = <_spi4_default>;
>  
>   status = "disabled";
>   };
> @@ -422,6 +430,8 @@
>   interrupts = ;
>   clocks = <_sysin>;
>   clock-names = "ssc";
> + pinctrl-names = "default";
> + pinctrl-0 = <_spi10_default>;
>  
>   status = "disabled";
>   };
> @@ -432,6 +442,8 @@
>   interrupts = ;
>   clocks = <_sysin>;
>   clock-names = "ssc";
> + pinctrl-names = "default";
> + pinctrl-0 = <_spi11_default>;
>  
>   status = "disabled";
>   };
> @@ -442,6 +454,8 @@
>   interrupts = ;
>   clocks = <_sysin>;
>   clock-names = "ssc";
> + pinctrl-names = "default";
> + pinctrl-0 = <_spi12_default>;
>  
>   status = "disabled";
>   };

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog


Re: [PATCH 02/11] ARM: STi: DT: STiH407: Add i2c3 alternate pin configs

2015-09-11 Thread Lee Jones
On Fri, 11 Sep 2015, Peter Griffin wrote:

> i2c3 controller can use several sets of pins depending
> on board design. This patch adds the missing alternate
> pinconfigs.
> 
> Signed-off-by: Seraphin Bonnaffe 
> Signed-off-by: Peter Griffin 
> ---
>  arch/arm/boot/dts/stih407-pinctrl.dtsi | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)

Acked-by: Lee Jones 

> diff --git a/arch/arm/boot/dts/stih407-pinctrl.dtsi 
> b/arch/arm/boot/dts/stih407-pinctrl.dtsi
> index d86ccc8..ce219a1 100644
> --- a/arch/arm/boot/dts/stih407-pinctrl.dtsi
> +++ b/arch/arm/boot/dts/stih407-pinctrl.dtsi
> @@ -430,12 +430,24 @@
>   };
>  
>   i2c3 {
> - pinctrl_i2c3_default: i2c3-default {
> + pinctrl_i2c3_default: i2c3-alt1-0 {
>   st,pins {
>   sda = < 6 ALT1 BIDIR>;
>   scl = < 5 ALT1 BIDIR>;
>   };
>   };
> + pinctrl_i2c3_alt1_1: i2c3-alt1-1 {
> + st,pins {
> + sda = < 7 ALT1 BIDIR>;
> + scl = < 6 ALT1 BIDIR>;
> + };
> + };
> + pinctrl_i2c3_alt3_0: i2c3-alt3-0 {
> + st,pins {
> + sda = < 6 ALT3 BIDIR>;
> + scl = < 5 ALT3 BIDIR>;
> + };
> + };
>   };
>  
>   spi0 {

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog


Re: [PATCH 01/11] ARM: STi: DT: STiH407: Add a cec0 pin definition

2015-09-11 Thread Lee Jones
On Fri, 11 Sep 2015, Peter Griffin wrote:

> This pin setup provides the correct configuration in order to
> interact with the CEC HW.
> 
> Signed-off-by: Erwan Le Ray 
> Signed-off-by: Nicolas Vanhaelewyn 
> Acked-by: Patrice Chotard 
> Signed-off-by: Patrice Chotard 

Duplicate.

> Signed-off-by: Peter Griffin 
> ---
>  arch/arm/boot/dts/stih407-pinctrl.dtsi | 8 
>  1 file changed, 8 insertions(+)

For the patch: Acked-by: Lee Jones 

> diff --git a/arch/arm/boot/dts/stih407-pinctrl.dtsi 
> b/arch/arm/boot/dts/stih407-pinctrl.dtsi
> index 0a754f2..d86ccc8 100644
> --- a/arch/arm/boot/dts/stih407-pinctrl.dtsi
> +++ b/arch/arm/boot/dts/stih407-pinctrl.dtsi
> @@ -107,6 +107,14 @@
>   st,retime-pin-mask = <0x3f>;
>   };
>  
> + cec0 {
> + pinctrl_cec0_default: cec0-default {
> + st,pins {
> + hdmi_cec = < 4 ALT1 BIDIR>;
> + };
> + };
> + };
> +
>   rc {
>   pinctrl_ir: ir0 {
>   st,pins {

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog


[RFC][4.1.5-rt5 PATCH] ARM: smp: __cpu_disable: fix sleeping function called from invalid context

2015-09-11 Thread Grygorii Strashko
When running with the RT-kernel (4.1.5-rt5) on TI OMAP dra7-evm and trying
to do Suspend to RAM, the following backtrace occurs:

 Disabling non-boot CPUs ...
 PM: noirq suspend of devices complete after 7.295 msecs
 Disabling non-boot CPUs ...
 BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:917
 in_atomic(): 1, irqs_disabled(): 128, pid: 18, name: migration/1
 INFO: lockdep is turned off.
 irq event stamp: 122
 hardirqs last  enabled at (121): [] _raw_spin_unlock_irqrestore+0x88/0x90
 hardirqs last disabled at (122): [] _raw_spin_lock_irq+0x28/0x5c
 softirqs last  enabled at (0): [] copy_process.part.52+0x410/0x19d8
 softirqs last disabled at (0): [<  (null)>]   (null)
 Preemption disabled at:[<  (null)>]   (null)
 CPU: 1 PID: 18 Comm: migration/1 Tainted: GW   4.1.4-rt3-01046-g96ac8da #204
 Hardware name: Generic DRA74X (Flattened Device Tree)
 [] (unwind_backtrace) from [] (show_stack+0x20/0x24)
 [] (show_stack) from [] (dump_stack+0x88/0xdc)
 [] (dump_stack) from [] (___might_sleep+0x198/0x2a8)
 [] (___might_sleep) from [] (rt_spin_lock+0x30/0x70)
 [] (rt_spin_lock) from [] (find_lock_task_mm+0x9c/0x174)
 [] (find_lock_task_mm) from [] (clear_tasks_mm_cpumask+0xb4/0x1ac)
 [] (clear_tasks_mm_cpumask) from [] (__cpu_disable+0x98/0xbc)
 [] (__cpu_disable) from [] (take_cpu_down+0x1c/0x50)
 [] (take_cpu_down) from [] (multi_cpu_stop+0x11c/0x158)
 [] (multi_cpu_stop) from [] (cpu_stopper_thread+0xc4/0x184)
 [] (cpu_stopper_thread) from [] (smpboot_thread_fn+0x18c/0x324)
 [] (smpboot_thread_fn) from [] (kthread+0xe8/0x104)
 [] (kthread) from [] (ret_from_fork+0x14/0x3c)
 CPU1: shutdown
 PM: Calling sched_clock_suspend+0x0/0x40
 PM: Calling timekeeping_suspend+0x0/0x2e0
 PM: Calling irq_gc_suspend+0x0/0x68
 PM: Calling fw_suspend+0x0/0x2c
 PM: Calling cpu_pm_suspend+0x0/0x28

Also, the system sometimes gets stuck right after displaying "Disabling
non-boot CPUs ...". The root cause of the above backtrace is task_lock(),
which takes a sleeping lock on -RT.

To fix the issue, move the clear_tasks_mm_cpumask() call from __cpu_disable()
to __cpu_die(), which is called on the thread that is asking for a target
CPU to be shut down. In addition, this change restores CPU hotplug
functionality on TI OMAP dra7-evm, and CPU1 can be unplugged/plugged many times.

Signed-off-by: Grygorii Strashko 
---

RFC: I'm not sure how safe this change is and would appreciate any comments.
Most arches call clear_tasks_mm_cpumask() from __cpu_disable(), but *powerpc*
calls it from the CPU_DEAD notifier. This patch follows powerpc's approach in
general.

This issue was first reported in:
 http://www.spinics.net/lists/linux-rt-users/msg13752.html

 arch/arm/kernel/smp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 48185a7..7ee0912 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -230,8 +230,6 @@ int __cpu_disable(void)
flush_cache_louis();
local_flush_tlb_all();
 
-   clear_tasks_mm_cpumask(cpu);
-
return 0;
 }
 
@@ -247,6 +245,9 @@ void __cpu_die(unsigned int cpu)
pr_err("CPU%u: cpu didn't die\n", cpu);
return;
}
+
+   clear_tasks_mm_cpumask(cpu);
+
pr_notice("CPU%u: shutdown\n", cpu);
 
/*
-- 
2.5.1



Re: [PATCH] sched,numa: limit amount of virtual memory scanned in task_numa_work

2015-09-11 Thread Rik van Riel
On 09/11/2015 11:05 AM, Mel Gorman wrote:
> On Fri, Sep 11, 2015 at 09:00:27AM -0400, Rik van Riel wrote:
>> Currently task_numa_work scans up to numa_balancing_scan_size_mb worth
>> of memory per invocation, but only counts memory areas that have at
>> least one PTE that is still present and not marked for numa hint faulting.
>>
>> It will skip over arbitrarily large amounts of memory that are either
>> unused, full of swap ptes, or full of PTEs that were already marked
>> for NUMA hint faults but have not been faulted on yet.
>>
> 
> This was deliberate and intended to cover a case whereby a process sparsely
> using the address space would quickly skip over the sparse portions and
> reach the active portions. Obviously you've found that this is not always
> a great idea.

Skipping over non-present pages is fine, since the scan
rate is keyed off the RSS.

However, skipping over pages that are already marked
PROT_NONE / PTE_NUMA results in unmapping pages at a much
accelerated rate (sometimes using >90% of the CPU of the
task), because the pages that are already PROT_NONE / NUMA
_are_ counted as part of the RSS.

>> @@ -2240,18 +2242,22 @@ void task_numa_work(struct callback_head *work)
>>  start = max(start, vma->vm_start);
>>  end = ALIGN(start + (pages << PAGE_SHIFT), HPAGE_SIZE);
>>  end = min(end, vma->vm_end);
>> -nr_pte_updates += change_prot_numa(vma, start, end);
>> +nr_pte_updates = change_prot_numa(vma, start, end);
>>  
> 
> Are you *sure* about this particular change?
> 
> The intent is that sparse space be skipped until the first updated PTE
> is found and then scan sysctl_numa_balancing_scan_size pages after that.
> With this change, if we find a single PTE in the middle of a sparse space
> then we stop updating pages in the nr_pte_updates check below. You get
> protected from a lot of scanning by the virtpages check but it does not
> seem this fix is necessary.  It has an odd side-effect whereby we possibly
> scan more with this patch in some cases.

True, it is possible that this patch would lead to more scanning
than before, if a process has present PTEs interleaved with areas
that are either sparsely populated, or already marked PROT_NONE.

However, was your intention to not quickly skip over empty areas
that come right after one single present PTE, but only over empty
areas at the beginning of a scan area?

If so, I don't understand the logic behind that, and would like
to know more :)

>>  /*
>> - * Scan sysctl_numa_balancing_scan_size but ensure that
>> - * at least one PTE is updated so that unused virtual
>> - * address space is quickly skipped.
>> + * Try to scan sysctl_numa_balancing_size worth of
>> + * hpages that have at least one present PTE that
>> + * is not already pte-numa. If the VMA contains
>> + * areas that are unused or already full of prot_numa
>> + * PTEs, scan up to virtpages, to skip through those
>> + * areas faster.
>>   */
>>  if (nr_pte_updates)
>>  pages -= (end - start) >> PAGE_SHIFT;
>> +virtpages -= (end - start) >> PAGE_SHIFT;
>>  
> 
> It's a pity there will potentially be a lot of useless dead scanning on
> those processes but caching start addresses is both outside the scope of
> this patch and has its own problems.

The problem has been observed when processes already have a lot of
pages marked PROT_NONE by change_prot_numa(), and change_prot_numa()
returns zero because no PTEs were changed.

In that case, the amount of useless dead scanning should be a whole
lot less with this patch, than without.

I do not quite understand how this patch makes it worse, though.
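
To spell out the accounting I have in mind, here is a small userspace
model of one scan pass. It is a sketch under assumptions, not the kernel
loop: struct chunk and scan_chunks() are made-up names, each chunk stands
for one [start, end) range handed to change_prot_numa(), and I am assuming
the loop stops as soon as either budget is used up.

```c
#include <stddef.h>

/* Hypothetical model of one scanned range; not the kernel data structure. */
struct chunk {
	long npages;	/* size of the range in pages */
	long updated;	/* PTEs a change_prot_numa()-like call would update */
};

/*
 * Walk the chunks until either budget runs out.
 * Returns the number of chunks visited.
 */
static size_t scan_chunks(const struct chunk *c, size_t n,
			  long pages, long virtpages)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pages <= 0 || virtpages <= 0)
			break;
		/* Only ranges with updated PTEs consume the "pages" budget. */
		if (c[i].updated)
			pages -= c[i].npages;
		/* Every range consumes the "virtpages" budget. */
		virtpages -= c[i].npages;
	}
	return i;
}
```

With ten 512-page ranges that are all already PROT_NONE (updated == 0),
a pages budget of 1024 and a virtpages budget of 2048, the walk stops
after four ranges. With an effectively unlimited virtpages budget, which
is how I would describe the old behaviour, it walks all ten.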

-- 
All rights reversed


Re: [PATCH v2 9/9] ARM: multi_v7_defconfig: Enable STi FDMA driver

2015-09-11 Thread Lee Jones
On Fri, 11 Sep 2015, Peter Griffin wrote:

> This DMA controller is found on all STi chipsets.
> 
> Signed-off-by: Peter Griffin 
> ---
>  arch/arm/configs/multi_v7_defconfig | 1 +
>  1 file changed, 1 insertion(+)

Acked-by: Lee Jones 

> diff --git a/arch/arm/configs/multi_v7_defconfig 
> b/arch/arm/configs/multi_v7_defconfig
> index 5fd8df6..ce3c8c1 100644
> --- a/arch/arm/configs/multi_v7_defconfig
> +++ b/arch/arm/configs/multi_v7_defconfig
> @@ -587,6 +587,7 @@ CONFIG_IMX_DMA=y
>  CONFIG_MXS_DMA=y
>  CONFIG_DMA_OMAP=y
>  CONFIG_XILINX_VDMA=y
> +CONFIG_ST_FDMA=y
>  CONFIG_STAGING=y
>  CONFIG_SENSORS_ISL29018=y
>  CONFIG_SENSORS_ISL29028=y

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 8/9] MAINTAINERS: Add FDMA driver files to STi section.

2015-09-11 Thread Lee Jones
On Fri, 11 Sep 2015, Peter Griffin wrote:

> This patch adds the FDMA driver files to the STi
> section of the maintainers file.
> 
> Signed-off-by: Peter Griffin 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)

Acked-by: Lee Jones 

> diff --git a/MAINTAINERS b/MAINTAINERS
> index b60e2b2..b3cdd5b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1504,6 +1504,7 @@ S:  Maintained
>  F:   arch/arm/mach-sti/
>  F:   arch/arm/boot/dts/sti*
>  F:   drivers/clocksource/arm_global_timer.c
> +F:   drivers/dma/st_fdma*
>  F:   drivers/i2c/busses/i2c-st.c
>  F:   drivers/media/rc/st_rc.c
>  F:   drivers/mmc/host/sdhci-st.c

-- 
Lee Jones
Linaro STMicroelectronics Landing Team Lead
Linaro.org │ Open source software for ARM SoCs
Follow Linaro: Facebook | Twitter | Blog


Re: [PATCH] DocBook: ignore .proc files

2015-09-11 Thread Jonathan Corbet
On Mon, 31 Aug 2015 11:16:08 -0700
Brian Norris  wrote:

> These are generated as part of 'make htmldocs'. If we don't ignore them,
> then most of our generated subdirectories get treated as "untracked" by
> git.

Makes sense.  Applied to the docs tree, thanks.

jon


[PATCH 05/11] ARM: DT: STiH407: Add SPI FSM (NOR Flash) Controller pin config

2015-09-11 Thread Peter Griffin
This patch adds the pin configuration for the NOR flash controller.

Signed-off-by: Patrice Chotard 
Signed-off-by: Christophe Kerello 
Signed-off-by: Peter Griffin 
---
 arch/arm/boot/dts/stih407-pinctrl.dtsi | 13 +
 1 file changed, 13 insertions(+)

diff --git a/arch/arm/boot/dts/stih407-pinctrl.dtsi 
b/arch/arm/boot/dts/stih407-pinctrl.dtsi
index 6c81f35..d0f5fdd 100644
--- a/arch/arm/boot/dts/stih407-pinctrl.dtsi
+++ b/arch/arm/boot/dts/stih407-pinctrl.dtsi
@@ -872,6 +872,19 @@
};
};
};
+
+   fsm {
+   pinctrl_fsm: fsm {
+   st,pins {
+   spi-fsm-clk = < 1 ALT1 OUT>;
+   spi-fsm-cs = < 0 ALT1 OUT>;
+   spi-fsm-mosi = < 2 ALT1 OUT>;
+   spi-fsm-miso = < 3 ALT1 IN>;
+   spi-fsm-hol = < 5 ALT1 OUT>;
+   spi-fsm-wp = < 4 ALT1 OUT>;
+   };
+   };
+   };
};
};
 };
-- 
1.9.1



[PATCH 01/11] ARM: STi: DT: STiH407: Add a cec0 pin definition

2015-09-11 Thread Peter Griffin
This pin setup provides the correct configuration in order to
interact with the CEC HW.

Signed-off-by: Erwan Le Ray 
Signed-off-by: Nicolas Vanhaelewyn 
Acked-by: Patrice Chotard 
Signed-off-by: Patrice Chotard 
Signed-off-by: Peter Griffin 
---
 arch/arm/boot/dts/stih407-pinctrl.dtsi | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/arm/boot/dts/stih407-pinctrl.dtsi 
b/arch/arm/boot/dts/stih407-pinctrl.dtsi
index 0a754f2..d86ccc8 100644
--- a/arch/arm/boot/dts/stih407-pinctrl.dtsi
+++ b/arch/arm/boot/dts/stih407-pinctrl.dtsi
@@ -107,6 +107,14 @@
st,retime-pin-mask = <0x3f>;
};
 
+   cec0 {
+   pinctrl_cec0_default: cec0-default {
+   st,pins {
+   hdmi_cec = < 4 ALT1 BIDIR>;
+   };
+   };
+   };
+
rc {
pinctrl_ir: ir0 {
st,pins {
-- 
1.9.1



[PATCH 07/11] ARM: DT: STiH407: Add systrace pin configuration

2015-09-11 Thread Peter Griffin
This patch adds the pin config for systrace for
STiH407 family silicon.

Signed-off-by: Patrice Chotard 
Signed-off-by: Fabrice Gasnier 
Signed-off-by: Peter Griffin 
---
 arch/arm/boot/dts/stih407-pinctrl.dtsi | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/arm/boot/dts/stih407-pinctrl.dtsi 
b/arch/arm/boot/dts/stih407-pinctrl.dtsi
index cde776b..798d901 100644
--- a/arch/arm/boot/dts/stih407-pinctrl.dtsi
+++ b/arch/arm/boot/dts/stih407-pinctrl.dtsi
@@ -658,6 +658,18 @@
};
};
};
+
+   systrace {
+   pinctrl_systrace_default: systrace-default {
+   st,pins {
+   trc_data0 = < 3 ALT5 OUT>;
+   trc_data1 = < 4 ALT5 OUT>;
+   trc_data2 = < 5 ALT5 OUT>;
+   trc_data3 = < 6 ALT5 OUT>;
+   trc_clk   = < 7 ALT5 OUT>;
+   };
+   };
+   };
};
 
pin-controller-front1 {
-- 
1.9.1



Re: [RFC 3/5] powerpc: atomic: implement atomic{,64}_{add,sub}_return_* variants

2015-09-11 Thread Paul E. McKenney
On Fri, Sep 11, 2015 at 01:45:07PM +0100, Will Deacon wrote:
> [left the context in the hope that we can make some progress]
> 
> On Wed, Sep 02, 2015 at 10:59:06AM +0100, Will Deacon wrote:
> > On Tue, Sep 01, 2015 at 10:45:40PM +0100, Paul E. McKenney wrote:
> > > On Tue, Sep 01, 2015 at 08:00:27PM +0100, Will Deacon wrote:
> > > > On Fri, Aug 28, 2015 at 04:39:21PM +0100, Peter Zijlstra wrote:
> > > > > Yes, the difference between RCpc and RCsc is in the meaning of
> > > > > RELEASE + ACQUIRE. With RCsc that implies a full memory barrier,
> > > > > with RCpc it does not.
> > > > We've discussed this before, but for the sake of completeness, I don't
> > > > think we're fully RCsc either because we don't order the actual RELEASE
> > > > operation against a subsequent ACQUIRE operation:
> > > > 
> > > > P0
> > > > smp_store_release(, 1);
> > > > foo = smp_load_acquire();
> > > > 
> > > > P1
> > > > smp_store_release(, 1);
> > > > bar = smp_load_acquire();
> > > > 
> > > > We allow foo == bar == 0, which is prohibited by SC.
> > > 
> > > I certainly hope that no one expects foo == bar == 0 to be prohibited!!!
> > 
> > I just thought it was worth making this point, because it is prohibited
> > in SC and I don't want people to think that our RELEASE/ACQUIRE operations
> > are SC (even though they happen to be on arm64).
> > 
> > > On the other hand, in this case, foo == bar == 1 will be prohibited:
> > > 
> > > P0
> > > foo = smp_load_acquire();
> > > smp_store_release(, 1);
> > > 
> > > P1
> > > bar = smp_load_acquire();
> > > smp_store_release(, 1);
> > 
> > Agreed.
> > 
> > > > However, we *do* enforce ordering on any prior or subsequent accesses
> > > > for the code snippet above (the release and acquire combine to give a
> > > > full barrier), which makes these primitives well suited to things like
> > > > message passing.
> > > 
> > > If I understand your example correctly, neither x86 nor Power implement
> > > a full barrier in this case.  For example:
> > > 
> > >   P0
> > >   WRITE_ONCE(a, 1);
> > >   smp_store_release(b, 1);
> > >   r1 = smp_load_acquire(c);
> > >   r2 = READ_ONCE(d);
> > > 
> > >   P1
> > >   WRITE_ONCE(d, 1);
> > >   smp_mb();
> > >   r3 = READ_ONCE(a);
> > > 
> > > Both x86 and Power can reorder P0 as follows:
> > > 
> > >   P0
> > >   r1 = smp_load_acquire(c);
> > >   r2 = READ_ONCE(d);
> > >   WRITE_ONCE(a, 1);
> > >   smp_store_release(b, 1);
> > > 
> > > Which clearly shows that the non-SC outcome r2 == 0 && r3 == 0 is allowed.
> > > 
> > > Or am I missing your point here?
> > 
> > I think this example is slightly different. Having the RELEASE/ACQUIRE
> > operations being reordered with respect to each other is one thing, but
> > I thought we were heading in a direction where they combined to give a
> > full barrier with respect to other accesses. In that case, the reordering
> > above would be forbidden.
> > 
> > Peter -- if the above reordering can happen on x86, then moving away
> > from RCpc is going to be less popular than I hoped...
> 
> Peter, any thoughts? I'm not au fait with the x86 memory model, but what
> Paul's saying is worrying.

The herd tool has an x86 mode, which will allow you to double-check
my scenario.  This tool is described in "Herding Cats: Modelling,
Simulation, Testing, and Data-mining for Weak Memory" by Alglave,
Maranget, and Tautschnig.  The herd tool is available at this git
repository: https://github.com/herd/herdtools.
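
Separately from herd, the claim that SC prohibits foo == bar == 0 in
the first litmus test quoted above can be checked by brute force. The
sketch below only models sequential consistency, so it says nothing
about what x86 or Power actually permit. The variable names x and y are
placeholders of mine for the two shared locations, and run_order() and
enumerate_sc() are made-up helpers.

```c
#include <stdbool.h>

/*
 * Sequentially consistent model of the store-buffering test:
 *   P0: x = 1; foo = y;      P1: y = 1; bar = x;
 */
struct sb_result {
	bool seen[2][2];	/* seen[foo][bar] */
};

/* Run one interleaving; bit i of order picks P1 (1) or P0 (0) for step i. */
static bool run_order(unsigned int order, int *foo, int *bar)
{
	int x = 0, y = 0, pc0 = 0, pc1 = 0, step;

	for (step = 0; step < 4; step++) {
		if (order & (1u << step)) {
			if (pc1 == 2)
				return false;	/* P1 has only two operations */
			if (pc1++ == 0)
				y = 1;
			else
				*bar = x;
		} else {
			if (pc0 == 2)
				return false;	/* P0 has only two operations */
			if (pc0++ == 0)
				x = 1;
			else
				*foo = y;
		}
	}
	return true;
}

/* Enumerate every interleaving that preserves each CPU's program order. */
static struct sb_result enumerate_sc(void)
{
	struct sb_result r = { { { false, false }, { false, false } } };
	unsigned int order;
	int foo, bar;

	for (order = 0; order < 16; order++) {
		foo = bar = 0;
		if (run_order(order, &foo, &bar))
			r.seen[foo][bar] = true;
	}
	return r;
}
```

Of the sixteen bit patterns, only the six with two steps per CPU are
valid interleavings. Every one of them performs at least one store before
the first load, so the (0, 0) entry is never set, while the other three
outcomes all occur.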

Thanx, Paul



Re: RCU explosion on ARM Integrator

2015-09-11 Thread Paul E. McKenney
On Fri, Sep 11, 2015 at 01:24:56PM +0200, Linus Walleij wrote:
> Hi RCU folks,
> 
> this happened to me when running the iozone throughput benchmark
> on the ARM Integrator, I wonder if I should take this platform for a ride on
> the RCU torture test or similar? Looks a bit unstable :/

You got a pagefault in rcu_check_callbacks().  Congratulations, that -is-
an accomplishment!  ;-)

I haven't seen anything like this recently.

Is this reproducible?  If so, and if it was stable on some previous
release, a bisection would be helpful.

Otherwise, it looks like this blew up just after returning from a
function call.  If you could map back to the source code, let me know what
version you are running, send me a disassembly of rcu_check_callbacks(),
and supply a .config, I can take a look and see if I can provide any
additional information.  Or, for that matter, a fix.

On the other hand, if this is a new port, things to be suspicious of
include correct masking of interrupts, consistent reporting of the number
of CPUs, and of course memory mapping.

Thanx, Paul

> Yours,
> Linus Walleij
> 
> root@integrator:/ iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test
> Iozone: Performance Test of File I/O
> Version $Revision: 3.430 $
> Compiled for 32 bit mode.
> Build: linux-arm
> 
> Contributors:William Norcott, Don Capps, Isom Crawford, Kirby Collins
>  Al Slater, Scott Rhine, Mike Wisner, Ken Goss
>  Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
>  Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
>  Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
>  Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
>  Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
>  Vangel Bojaxhi, Ben England, Vikentsi Lapa.
> 
> Run began: Thu Jan  1 01:16:39 1970
> 
> Auto Mode
> Cross over of record size disabled.
> File size set to 20480 kB
> O_DIRECT feature enabled
> Command line used: iozone -az -i0 -i1 -i2 -s 20m -I -f /mnt/foo.test
> Output is in kBytes/sec
> Time Resolution = 0.16 seconds.
> Processor cache size set to 1024 kBytes.
> Processor cache line size set to 32 bytes.
> File stride size set to 17 * record size.
>                                                         random   random     bkwd   record   stride
>       kB   reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
>    20480        4       54       56       57       57       57       24
>    20480        8       56       57       58       58       58       34
>    20480       16       56       58       59       59
> Unable to handle kernel paging request at virtual address 807b7cac
> pgd = c6404000
> [807b7cac] *pgd=
> Internal error: Oops: 5 [#1] PREEMPT ARM
> Modules linked in:
> CPU: 0 PID: 110 Comm: iozone Not tainted 4.2.0-11142-gb0a1ea51bda4-dirty #3
> Hardware name: ARM Integrator/AP (Device Tree)
> task: c6b45540 ti: c642 task.ti: c642
> PC is at rcu_check_callbacks+0x318/0x850
> LR is at rcu_check_callbacks+0x310/0x850
> pc : []lr : []psr: 6093
> sp : c64218c0  ip : c07b8038  fp : c07b8920
> r10: 807b7ca8  r9 : 0001  r8 : c07b79f8
> r7 : c07b2110  r6 : c07b2118  r5 : c07b7ca8  r4 : c07b8004
> r3 : c07b8038  r2 : c07b8038  r1 : c07b8004  r0 : 
> Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
> Control: 0005317f  Table: 06404000  DAC: 0051
> Process iozone (pid: 110, stack limit = 0xc6420190)
> Stack: (0xc64218c0 to 0xc6422000)
> 18c0:        
> 18e0:        
> 1900:        
> 1920:        
> 1940:        
> 1960:        
> 1980:        
> 19a0:        
> 19c0:        
> 19e0:        
> 1a00:        
> 1a20:        
> 1a40:        
> 1a60:     

Re: [PATCH 5/5] perf tools: Enhance parsing events tracepoint error output

2015-09-11 Thread Raphaël Beamonte
2015-09-11 12:16 GMT-04:00 Jiri Olsa :
> On Sat, Sep 12, 2015 at 01:09:31AM +0900, Namhyung Kim wrote:

>> has a problem - if tracefs is mounted under debugfs, the access mode
>> of debugfs also matters, so in this case I had to change it for both
>> debugfs and tracefs..
>
>
> hum, I wonder whether the error message needs to be that smart..
>
> jirka

Hmm... If tracefs is mounted under debugfs, wouldn't remounting
debugfs do the trick, as it was done before?
If so, why couldn't we just compare the two mount paths with a basic
string comparison (strncmp) to check whether the tracefs path starts
with the debugfs path, and in that case offer to remount debugfs, else
offer to remount tracefs?
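
To make the idea concrete, here is a rough sketch of such a prefix check.
It is not taken from the perf sources: path_is_under() and remount_hint()
are hypothetical names, and the mount points in the usage note are only
the conventional ones. One detail a bare strncmp would miss is the path
boundary, so the sketch also checks the character after the prefix.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true if 'path' is 'prefix' itself or
 * lies below it.  A bare strncmp() is not enough, because
 * "/sys/kernel/debugfoo" must not match "/sys/kernel/debug"; the
 * character following the prefix has to be '/' or the end of string.
 */
static bool path_is_under(const char *path, const char *prefix)
{
	size_t len = strlen(prefix);

	/* Ignore trailing slashes on the prefix ("/debug/" == "/debug"). */
	while (len > 1 && prefix[len - 1] == '/')
		len--;

	if (strncmp(path, prefix, len) != 0)
		return false;

	/* The root prefix "/" contains every absolute path. */
	if (len == 1 && prefix[0] == '/')
		return true;

	return path[len] == '\0' || path[len] == '/';
}

/*
 * Hypothetical helper: name the filesystem the error hint should
 * suggest remounting, given the two mount points.
 */
static const char *remount_hint(const char *tracefs, const char *debugfs)
{
	return path_is_under(tracefs, debugfs) ? "debugfs" : "tracefs";
}
```

With tracefs at /sys/kernel/debug/tracing and debugfs at
/sys/kernel/debug, remount_hint() would pick "debugfs"; with tracefs
mounted on its own at /sys/kernel/tracing it would pick "tracefs".
Whether the hint should mention both filesystems in the nested case, as
Namhyung's report suggests, is left open here.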

