date:20140228

Re: [PATCH 1/2] i2c: Add message transfer tracepoints for I2C

2014-02-28 Thread David Howells

Wolfram Sang  wrote:

> > > And for the buffer: %*phN is difficult to read IMO. What about %*ph? Or
> > > %*phD at least?
> > 
> > My problem with that is that it increases the length of the output by 50%
> > and there's a hard limit on how much output we may produce.
> 
> Is it PAGE_SIZE? How is this handled when the buffer is so big that the
> limit will be reached anyhow? Note that it is really uncommon to
> transfer kilobytes in one go via i2c. Usually, big transfers are split
> up into smaller fragments, say 128-256 byte. So, for readability, I'd
> still favour %*ph.

I was thinking that it limited the size of the output field to 64 bytes.  But
reading vsprintf.c again, it appears that it's actually the size of the binary
blob being converted to hex that is limited to 64 bytes.

I would prefer shorter lines (128 bytes of hex still isn't entirely readable,
even with separators interpolated), but I'll add this.
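
For comparison, here is a userspace imitation (not the kernel's vsprintf.c code) of how the formats under discussion render a buffer: %*phN with no separator, %*ph with a space and %*phD with a dash, in lowercase hex, with the input capped at 64 bytes as described above. It also shows where the 50% figure comes from: n bytes take 2n characters without separators and 3n - 1 with them. The function name hex_dump() is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Userspace imitation of the %*ph family (not the kernel's vsprintf.c):
 * sep = ' ' mimics %*ph, '-' mimics %*phD, '\0' mimics %*phN.
 * At most 64 input bytes are printed, in lowercase hex.
 * Returns the number of characters written, excluding the NUL.
 */
size_t hex_dump(char *out, size_t outsz, const unsigned char *buf,
		size_t len, char sep)
{
	static const char hex[] = "0123456789abcdef";
	size_t i, pos = 0;

	if (outsz == 0)
		return 0;
	if (len > 64)
		len = 64;	/* the input cap discussed above */
	for (i = 0; i < len; i++) {
		size_t need = (sep && i) ? 3 : 2;

		if (pos + need >= outsz)
			break;	/* stop rather than overflow `out` */
		if (sep && i)
			out[pos++] = sep;
		out[pos++] = hex[buf[i] >> 4];
		out[pos++] = hex[buf[i] & 0xf];
	}
	out[pos] = '\0';
	return pos;
}
```

For a full 64-byte dump that is 128 characters without separators versus 191 with them.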

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 2/2] i2c: Add message transfer tracepoints for SMBUS

2014-02-28 Thread David Howells

Wolfram Sang  wrote:

> > > Can we have something like this for 'flags'?
> > 
> > There's a __print_flags() which should work.  One thing I'm concerned about
> > there is how do we handle more flags being added - does that count as an ABI
> > break if the printed format changes?
> 
> Not sure, I mean, this is debug output, no?

Apparently, the raw messages are ABI, but the text dump (which is what we're
talking about) is not.  So doing this should be okay.
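
A rough userspace imitation of how __print_flags() renders a value may help with the ABI question: bits with a known name are printed by name, joined by the delimiter, and any leftover bits are printed in hex. The table is an assumption for illustration only, using the I2C_CLIENT_PEC (0x04) and I2C_CLIENT_TEN (0x10) values from <linux/i2c.h>; the real tracepoint may use a different table, and format_flags() is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct flag_name {
	unsigned long mask;
	const char *name;
};

/* Assumed example table: I2C_CLIENT_PEC and I2C_CLIENT_TEN. */
static const struct flag_name smbus_flags[] = {
	{ 0x04, "PEC" },
	{ 0x10, "TEN" },
};

/* Append s to the NUL-terminated string in out, never past outsz. */
static void append(char *out, size_t outsz, const char *s)
{
	size_t used = strlen(out);

	if (used + 1 < outsz)
		snprintf(out + used, outsz - used, "%s", s);
}

/*
 * Render flags roughly the way __print_flags() does: known names
 * joined by delim, then any leftover bits as 0x<hex>.
 */
void format_flags(char *out, size_t outsz, unsigned long flags,
		  const char *delim, const struct flag_name *tbl, size_t n)
{
	char hex[32];
	int first = 1;
	size_t i;

	out[0] = '\0';
	for (i = 0; i < n && flags; i++) {
		if ((flags & tbl[i].mask) != tbl[i].mask)
			continue;
		if (!first)
			append(out, outsz, delim);
		append(out, outsz, tbl[i].name);
		flags &= ~tbl[i].mask;
		first = 0;
	}
	if (flags) {		/* bits with no name in the table */
		if (!first)
			append(out, outsz, delim);
		snprintf(hex, sizeof(hex), "0x%lx", flags);
		append(out, outsz, hex);
	}
}
```

With this table 0x24 prints as "PEC|0x20". If a name for 0x20 were added later, that text would change while the raw value recorded in the trace buffer would not.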

David

[PATCH] irq: Export symbol no_action()

2014-02-28 Thread Alexander Shiyan

This allows drivers compiled as modules to use the dummy IRQ handler
no_action(). For example, the dummy handler could be used by drivers
that use ARM FIQ interrupts.

Signed-off-by: Alexander Shiyan 
---
 kernel/irq/handle.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/irq/handle.c b/kernel/irq/handle.c
index bfec453..e8ddcbf 100644
--- a/kernel/irq/handle.c
+++ b/kernel/irq/handle.c
@@ -41,6 +41,7 @@ irqreturn_t no_action(int cpl, void *dev_id)
 {
return IRQ_NONE;
 }
+EXPORT_SYMBOL(no_action);
 
 static void warn_no_thread(unsigned int irq, struct irqaction *action)
 {
-- 
1.8.3.2


Re: [PATCH 1/1] mm: implement ->map_pages for shmem/tmpfs

2014-02-28 Thread Ning Qu

Btw, should we first check whether the page returned by radix_tree_deref_slot()
is NULL?

diff --git a/mm/filemap.c b/mm/filemap.c
index 1bc12a9..c129ee5 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1745,6 +1745,8 @@ void filemap_map_pages(struct vm_area_struct *vma, struct vm_fault *vmf)
break;
 repeat:
page = radix_tree_deref_slot(slot);
+   if (unlikely(!page))
+   continue;
if (radix_tree_exception(page)) {
if (radix_tree_deref_retry(page))


Best wishes,
-- 
Ning Qu (曲宁) | Software Engineer | qun...@google.com | +1-408-418-6066


On Fri, Feb 28, 2014 at 5:20 PM, Hugh Dickins  wrote:
> On Fri, 28 Feb 2014, Ning Qu wrote:
>
>> In shmem/tmpfs, we also use the generic filemap_map_pages,
>> seems the additional checking is not worth a separate version
>> of map_pages for it.
>>
>> Signed-off-by: Ning Qu 
>> ---
>>  mm/shmem.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 1f18c9d..2ea4e89 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -2783,6 +2783,7 @@ static const struct super_operations shmem_ops = {
>>
>>  static const struct vm_operations_struct shmem_vm_ops = {
>>   .fault  = shmem_fault,
>> + .map_pages  = filemap_map_pages,
>>  #ifdef CONFIG_NUMA
>>   .set_policy = shmem_set_policy,
>>   .get_policy = shmem_get_policy,
>> --
>
> (There's no need for a 0/1, all the info should go into the one patch.)
>
> I expect this will prove to be a very sensible and adequate patch,
> thank you: it probably wouldn't be worth more effort to give shmem
> anything special of its own, and filemap_map_pages() is already
> (almost) coping with exceptional entries.
>
> But I can't Ack it until I've tested it some more, won't be able to
> do so until Sunday; and even then some doubt, since this and Kirill's
> are built upon mmotm/next, which after a while gives me spinlock
> lockups under load these days, yet to be investigated.
>
> "almost" above because, Kirill, even without Ning's extension to
> shmem, your filemap_map_pages() soon crashes on an exceptional entry:
>
> Don't try to dereference an exceptional entry.
>
> Signed-off-by: Hugh Dickins 
>
> --- mmotm+kirill/mm/filemap.c   2014-02-28 15:17:50.984019060 -0800
> +++ linux/mm/filemap.c  2014-02-28 16:38:04.976633308 -0800
> @@ -2084,7 +2084,7 @@ repeat:
> if (radix_tree_deref_retry(page))
> break;
> else
> -   goto next;
> +   continue;
> }
>
> if (!page_cache_get_speculative(page))
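
The two fixes in this thread concern what a slot walk must skip: Ning's check skips a slot that reads back NULL, and Hugh's change skips an exceptional entry with `continue` instead of going on to treat it as a page, while a retry entry still ends the walk. The sketch below models only that classification over a plain array. The tag values, struct toy_page and map_slots() are hypothetical, modelled loosely on the radix tree's low-bit tagging; there is no RCU, locking or page refcounting here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag bits, loosely modelled on the 2014 radix tree. */
#define TOY_ENTRY_RETRY		0x1UL
#define TOY_ENTRY_EXCEPTIONAL	0x2UL

struct toy_page {
	int mapped;
};

/*
 * Walk a snapshot of slot values and "map" only real pages.
 * Returns how many pages were mapped; stops early on a retry entry.
 */
int map_slots(void *const *slots, size_t n)
{
	int mapped = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		void *entry = slots[i];
		uintptr_t bits = (uintptr_t)entry;

		if (!entry)
			continue;	/* slot emptied under us: nothing to map */
		if (bits & (TOY_ENTRY_RETRY | TOY_ENTRY_EXCEPTIONAL)) {
			if (bits & TOY_ENTRY_RETRY)
				break;	/* tree changed: caller must restart */
			continue;	/* exceptional entry: never dereference */
		}
		((struct toy_page *)entry)->mapped = 1;
		mapped++;
	}
	return mapped;
}
```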

Re: [PATCH 0/1] mm, shmem: map few pages around fault address if they are in page cache

2014-02-28 Thread Ning Qu

Yes, the simple test does verify that the page fault counts are correct with
the patch. My previous results came from the command lines below, which also
show some performance improvement with this change in tmpfs.

sequential access
/usr/bin/time -a ./iozone -B -s 8g -i 0 -i 1

random access
/usr/bin/time -a ./iozone -B -s 8g -i 0 -i 2

Best wishes,
-- 
Ning Qu


On Fri, Feb 28, 2014 at 10:10 PM, Ning Qu  wrote:
> Yes, I am using the iozone -i 0 -i 1. Let me try the most simple test
> as you mentioned.
> Best wishes,
> --
> Ning Qu
>
>
> On Fri, Feb 28, 2014 at 5:41 PM, Andrew Morton
>  wrote:
>> On Fri, 28 Feb 2014 16:35:16 -0800 Ning Qu  wrote:
>>
>>> Sorry about my fault about the experiments, here is the real one.
>>>
>>> Btw, apparently, there are still some questions about the results and
>>> I will sync with Kirill about his test command line.
>>>
>>> Below is just some simple experiment numbers from this patch, let me know if
>>> you would like more:
>>>
>>> Tested on Xeon machine with 64GiB of RAM, using the current default fault
>>> order 4.
>>>
>>> Sequential access 8GiB file
>>>                 Baseline     with-patch
>>> 1 thread
>>> minor fault     8,389,052    4,456,530
>>> time, seconds   9.55         8.31
>>
>> The numbers still seem wrong.  I'd expect to see almost exactly 2M minor
>> faults with this test.
>>
>> Looky:
>>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>> #include <sys/mman.h>
>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>>
>> #define G (1024 * 1024 * 1024)
>>
>> int main(int argc, char *argv[])
>> {
>> char *p;
>> int fd;
>> unsigned long idx;
>> int sum = 0;
>>
>> fd = open("foo", O_RDONLY);
>> if (fd < 0) {
>> perror("open");
>> exit(1);
>> }
>> p = mmap(NULL, 1 * G, PROT_READ, MAP_PRIVATE, fd, 0);
>> if (p == MAP_FAILED) {
>> perror("mmap");
>> exit(1);
>> }
>>
>> for (idx = 0; idx < 1 * G; idx += 4096)
>> sum += p[idx];
>> printf("%d\n", sum);
>> exit(0);
>> }
>>
>> z:/home/akpm> /usr/bin/time ./a.out
>> 0
>> 0.05user 0.33system 0:00.38elapsed 99%CPU (0avgtext+0avgdata 4195856maxresident)k
>> 0inputs+0outputs (0major+262264minor)pagefaults 0swaps
>>
>> z:/home/akpm> dc
>> 16o
>> 262264 4 * p
>> 1001E0
>>
>> That's close!
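
Andrew's dc session multiplies the measured fault count by 4 (KiB per page) and prints it in hex: 262264 * 4 = 1049056 = 0x1001E0, just over 0x100000 KiB (1 GiB), so the 1 GiB mapping took roughly one fault per 4 KiB page (262144 pages, plus 120 extra faults). The helper below does the same back-of-the-envelope arithmetic. Its name is hypothetical, and it assumes 4 KiB pages and that every fault maps a full aligned block of 2^order pages, which real runs will not hit exactly. It gives the "almost exactly 2M" baseline for 8 GiB, and 131072 for an ideal fault order of 4.

```c
/*
 * Idealized minor-fault count for touching every page of a mapping of
 * `bytes` once, when each fault maps an aligned block of
 * 2^fault_around_order pages (order 0 = one page per fault).
 */
unsigned long long expected_minor_faults(unsigned long long bytes,
					 unsigned long long page_size,
					 unsigned int fault_around_order)
{
	unsigned long long pages = (bytes + page_size - 1) / page_size;
	unsigned long long per_fault = 1ULL << fault_around_order;

	return (pages + per_fault - 1) / per_fault;
}
```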

Re: [PATCH 0/1] mm, shmem: map few pages around fault address if they are in page cache

2014-02-28 Thread Ning Qu

Yes, I am using the iozone -i 0 -i 1. Let me try the most simple test
as you mentioned.
Best wishes,
-- 
Ning Qu


On Fri, Feb 28, 2014 at 5:41 PM, Andrew Morton
 wrote:
> On Fri, 28 Feb 2014 16:35:16 -0800 Ning Qu  wrote:
>
>> Sorry about my fault about the experiments, here is the real one.
>>
>> Btw, apparently, there are still some questions about the results and
>> I will sync with Kirill about his test command line.
>>
>> Below is just some simple experiment numbers from this patch, let me know if
>> you would like more:
>>
>> Tested on Xeon machine with 64GiB of RAM, using the current default fault
>> order 4.
>>
>> Sequential access 8GiB file
>>                 Baseline     with-patch
>> 1 thread
>> minor fault     8,389,052    4,456,530
>> time, seconds   9.55         8.31
>
> The numbers still seem wrong.  I'd expect to see almost exactly 2M minor
> faults with this test.
>
> Looky:
>
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <sys/mman.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> #define G (1024 * 1024 * 1024)
>
> int main(int argc, char *argv[])
> {
> char *p;
> int fd;
> unsigned long idx;
> int sum = 0;
>
> fd = open("foo", O_RDONLY);
> if (fd < 0) {
> perror("open");
> exit(1);
> }
> p = mmap(NULL, 1 * G, PROT_READ, MAP_PRIVATE, fd, 0);
> if (p == MAP_FAILED) {
> perror("mmap");
> exit(1);
> }
>
> for (idx = 0; idx < 1 * G; idx += 4096)
> sum += p[idx];
> printf("%d\n", sum);
> exit(0);
> }
>
> z:/home/akpm> /usr/bin/time ./a.out
> 0
> 0.05user 0.33system 0:00.38elapsed 99%CPU (0avgtext+0avgdata 4195856maxresident)k
> 0inputs+0outputs (0major+262264minor)pagefaults 0swaps
>
> z:/home/akpm> dc
> 16o
> 262264 4 * p
> 1001E0
>
> That's close!

[PATCH 2/3] tracing: evaluate len expression only once in __dynamic_array macro

2014-02-28 Thread Filipe Brandenburger

Use a temporary variable to store the result of the len expression, so
that it is evaluated only once inside ftrace_get_offsets_, which matters
when the evaluation is expensive.

Signed-off-by: Filipe Brandenburger 
---
 include/trace/ftrace.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 82e8d89..86a056a 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -373,10 +373,11 @@ ftrace_define_fields_##call(struct ftrace_event_call *event_call) \
 
 #undef __dynamic_array
 #define __dynamic_array(type, item, len)   \
+   __item_length = (len) * sizeof(type);   \
__data_offsets->item = __data_size +\
   offsetof(typeof(*entry), __data);\
-   __data_offsets->item |= ((len) * sizeof(type)) << 16;   \
-   __data_size += (len) * sizeof(type);
+   __data_offsets->item |= __item_length << 16;\
+   __data_size += __item_length;
 
 #undef __string
 #define __string(item, src) __dynamic_array(char, item, \
@@ -388,6 +389,7 @@ static inline notrace int ftrace_get_offsets_##call( \
struct ftrace_data_offsets_##call *__data_offsets, proto)   \
 {  \
int __data_size = 0;\
+   int __maybe_unused __item_length;   \
struct ftrace_raw_##call __maybe_unused *entry; \
\
tstruct;\
-- 
1.9.0.279.gdc9e3eb
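
A reduced sketch of the double evaluation: the old body expands len twice, so a function call passed as len runs twice. The macros and expensive_len() are hypothetical stand-ins, and the offset here omits the offsetof(typeof(*entry), __data) term. Unlike the patch, which declares __item_length once at function scope in ftrace_get_offsets_##call, this sketch scopes the temporary inside a do/while block.

```c
#include <stddef.h>
#include <stdint.h>

/* Counts how often the "len" expression is evaluated. */
static int len_evals;

/* Hypothetical stand-in for an expensive length computation. */
static size_t expensive_len(void)
{
	len_evals++;
	return 5;
}

/* Old shape: `len` is expanded twice. */
#define DYN_ARRAY_OLD(type, len, loc, data_size)			\
	do {								\
		(loc) = (uint32_t)(data_size);				\
		(loc) |= (uint32_t)(((len) * sizeof(type)) << 16);	\
		(data_size) += (len) * sizeof(type);			\
	} while (0)

/* New shape: `len` is evaluated once into a temporary. */
#define DYN_ARRAY_NEW(type, len, loc, data_size)			\
	do {								\
		size_t dyn_len_ = (len) * sizeof(type);			\
		(loc) = (uint32_t)(data_size);				\
		(loc) |= (uint32_t)(dyn_len_ << 16);			\
		(data_size) += dyn_len_;				\
	} while (0)
```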


[PATCH 1/3] tracing: correctly expand len expressions from __dynamic_array macro

2014-02-28 Thread Filipe Brandenburger

This fixes the expansion of the len argument in the __dynamic_array macro.
The code from commit 7d536cb3f did not parenthesize len when computing
the value stored in the high 16 bits, so a compound expression was not
fully evaluated before being multiplied by the size of the type.

This went unnoticed because the length stored in the high 16 bits of the
offset (the value that was broken here) is only used by
filter_pred_strloc, which only acts on strings, for which the size of the
type is 1.

Signed-off-by: Filipe Brandenburger 
---
 include/trace/ftrace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 1a8b28d..82e8d89 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -375,7 +375,7 @@ ftrace_define_fields_##call(struct ftrace_event_call *event_call)   \
 #define __dynamic_array(type, item, len)   \
__data_offsets->item = __data_size +\
   offsetof(typeof(*entry), __data);\
-   __data_offsets->item |= (len * sizeof(type)) << 16; \
+   __data_offsets->item |= ((len) * sizeof(type)) << 16;   \
__data_size += (len) * sizeof(type);
 
 #undef __string
-- 
1.9.0.279.gdc9e3eb
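
The bug is ordinary macro-argument precedence. In this reduced sketch (the macro names are hypothetical), passing 2 + 1 as len to the unparenthesized form expands to 2 + 1 * sizeof(type). The two forms only agree when sizeof(type) is 1, which is why string-only users such as filter_pred_strloc never noticed.

```c
#include <stddef.h>

/* Shape before the fix: `len` is not parenthesized. */
#define ARRAY_BYTES_UNPARENTHESIZED(type, len)	(len * sizeof(type))

/* Shape after the fix: the whole `len` expression is evaluated first. */
#define ARRAY_BYTES(type, len)			((len) * sizeof(type))

/* ARRAY_BYTES_UNPARENTHESIZED(int, 2 + 1) becomes (2 + 1 * sizeof(int)). */
```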


Re: [PATCH] Staging: comedi: add timeouts to while loops in s626.c

2014-02-28 Thread Chase Southwood

>On Friday, February 28, 2014 11:26 AM, Ian Abbott  wrote:

>>On 2014-02-28 07:35, Chase Southwood wrote:
>> Smatch located a handful of while loops testing readl calls in s626.c.
>> Since these while loops depend on readl succeeding, it's safer to make
>> sure they time out eventually.
>>
>> Signed-off-by: Chase Southwood 
>> ---
>> Ian and/or Hartley, I'd love your comments on this.  It seems to me that
>> we want these kinds of while loops properly timed out, but I want to make
>> sure I'm doing everything properly.  First off, s626_debi_transfer() says
>> directly that it is called from within critical sections, so I assume
>> that means that the new comedi_timeout() function is no good here, and
>> s626_send_dac() looked equally suspicious, so I opted for iterative
>> timeouts.  Is this correct?  Also, for these timeouts, I used a very
>> conservative 1 iterations, would it be better to decrease that?
>
>Well 1 iterations is an improvement on infinity!  If the hardware is 
>working, you'd expect it to go round a lot fewer iterations than that, 
>but if the hardware is broken all bets are off, especially if it is 
>generating interrupts.
>

Great, thanks!  I suppose I'll leave that number there then.

>
>> Also, do my error strings appear acceptable?
>
>Mostly.  There's a typo in one of the strings that says "TLS" instead of 
>"TSL".
>

*Sigh* I promise I can type sometimes :P I'll get this corrected.

>
>> And finally, are timeouts here even necessary or helpful, or are there
>> any better ways to do it?
>
>In the case of s626_send_dac(), it doesn't seem to be used in any 
>critical sections, so it could make use of Hartley's comedi_timeout().
>
>Some of the timeout errors could be propagated, especially for 
>s626_send_dac() which is only reachable from very few paths.
>

Awesome, I'll swap all of my timeouts out for comedi_timeout() in
s626_send_dac().

As for propagating the timeout errors, could you please clarify that a bit
further?  Both of the functions I add timeouts to in this patch return void,
so in their current state they cannot return any error values.  Would you
like them (or at least s626_send_dac()) to return an error on timeout and 0
on success instead, or am I misunderstanding what you mean by propagate here?
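
One way to read the suggestion, sketched in userspace: bound the loop by an iteration count (which never sleeps, so it also suits the critical-section case) and have the helper return an error that its caller can pass further up, instead of returning void. read_reg_fn, poll_until_clear(), send_word_sketch(), the 0x1 busy bit and the 10000-iteration budget are all hypothetical; this is not comedi_timeout().

```c
#include <errno.h>

/* Hypothetical register accessor standing in for readl() on the board. */
typedef unsigned int (*read_reg_fn)(void *ctx);

/*
 * Spin until (status & mask) == 0, but give up after max_iters reads.
 * Returns 0 once the bit clears, or -ETIMEDOUT.
 */
static int poll_until_clear(read_reg_fn read_reg, void *ctx,
			    unsigned int mask, unsigned int max_iters)
{
	unsigned int i;

	for (i = 0; i < max_iters; i++)
		if (!(read_reg(ctx) & mask))
			return 0;
	return -ETIMEDOUT;
}

/* Hypothetical iteration budget; pick whatever the hardware warrants. */
#define SKETCH_MAX_ITERS 10000u

/* Returning int (not void) lets the caller see -ETIMEDOUT. */
static int send_word_sketch(read_reg_fn read_reg, void *ctx)
{
	int ret = poll_until_clear(read_reg, ctx, 0x1, SKETCH_MAX_ITERS);

	if (ret)
		return ret;
	/* ... the actual register write would go here ... */
	return 0;
}

/* Test double: reports "busy" for the first *ctx reads, then "idle". */
static unsigned int fake_busy_then_idle(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return 0;
	(*remaining)--;
	return 0x1;
}
```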

>
>There are other infinite loops involving calls to the s626_mc_test() 
>function, but those could be dealt with by other patches.
>

Yeah, I saw those... I'll whip up a patch for them; I just wanted to verify
that everything looks good here before I started on that.  I'll have that
right out!

Thanks,
Chase
>
>-- 
>-=( Ian Abbott @ MEV Ltd.    E-mail:         )=-
>-=( Tel: +44 (0)161 477 1898   FAX: +44 (0)161 718 3587         )=- 

[PATCH] Documentation/trace/postprocess/trace-pagealloc-postprocess.pl: fix the traceevent regex

2014-02-28 Thread vinayakm . list

From: Vinayak Menon 

The script fails when the irq, preempt and lockdep fields (field 3 in the
example below) are printed in the trace output.

Example entry:
worker/0:0-4  [000] ...1  1155.972338: mm_page_alloc:

Signed-off-by: Vinayak Menon 
---
 .../postprocess/trace-pagealloc-postprocess.pl |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/Documentation/trace/postprocess/trace-pagealloc-postprocess.pl b/Documentation/trace/postprocess/trace-pagealloc-postprocess.pl
index 0a120aa..94efc7f 100644
--- a/Documentation/trace/postprocess/trace-pagealloc-postprocess.pl
+++ b/Documentation/trace/postprocess/trace-pagealloc-postprocess.pl
@@ -84,7 +84,7 @@ my $regex_fragdetails;
 
 # Static regex used. Specified like this for readability and for use with /o
 #  (process_pid) (cpus  )   ( time  )   (tpoint  ) (details)
-my $regex_traceevent = '\s*([a-zA-Z0-9-]*)\s*(\[[0-9]*\])\s*([0-9.]*):\s*([a-zA-Z_]*):\s*(.*)';
+my $regex_traceevent = '\s*([a-zA-Z0-9-]*)\s*(\[[0-9]*\])(\s*[dX.][Nnp.][Hhs.][0-9a-fA-F.]*|)\s*([0-9.]*):\s*([a-zA-Z_]*):\s*(.*)';
 my $regex_statname = '[-0-9]*\s\((.*)\).*';
 my $regex_statppid = '[-0-9]*\s\(.*\)\s[A-Za-z]\s([0-9]*).*';
 
@@ -195,7 +195,7 @@ EVENT_PROCESS:
while ($traceevent = ) {
if ($traceevent =~ /$regex_traceevent/o) {
$process_pid = $1;
-   $tracepoint = $4;
+   $tracepoint = $5;
 
if ($opt_read_procstat || $opt_prepend_parent) {
$process_pid =~ /(.*)-([0-9]*)$/;
@@ -215,7 +215,7 @@ EVENT_PROCESS:
 
# Unnecessary in this script. Uncomment if required
# $cpus = $2;
-   # $timestamp = $3;
+   # $timestamp = $4;
} else {
next;
}
@@ -236,7 +236,7 @@ EVENT_PROCESS:
} elsif ($tracepoint eq "mm_page_alloc_extfrag") {
 
# Extract the details of the event now
-   $details = $5;
+   $details = $6;
 
my ($page, $pfn);
my ($alloc_order, $fallback_order, $pageblock_order);
-- 
1.7.6
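
Because the new group sits between the CPU and timestamp groups, every later capture shifts by one, hence $4 becoming $5 and so on. The C sketch below checks this against the example entry from the commit message using POSIX <regex.h>. It is a transliteration, not the script itself: Perl's \s becomes [[:space:]], and the "(...|)" empty alternative becomes an optional "(...)?" group, so an absent flags field leaves group 3 unmatched rather than empty. trace_field() is a hypothetical helper.

```c
#include <regex.h>
#include <string.h>

/*
 * POSIX ERE transliteration of the patched $regex_traceevent.
 * Groups: 1 process-pid, 2 [cpu], 3 flags, 4 timestamp,
 * 5 tracepoint, 6 details.
 */
static const char traceevent_ere[] =
	"[[:space:]]*([a-zA-Z0-9-]*)[[:space:]]*(\\[[0-9]*])"
	"([[:space:]]*[dX.][Nnp.][Hhs.][0-9a-fA-F.]*)?"
	"[[:space:]]*([0-9.]*):[[:space:]]*([a-zA-Z_]*):[[:space:]]*(.*)";

/*
 * Copy capture group `group` (0-6) of `line` into `out`; an unmatched
 * group yields "". Returns 0 on a match, -1 otherwise.
 */
int trace_field(const char *line, int group, char *out, size_t outsz)
{
	regex_t re;
	regmatch_t m[7];
	size_t len;
	int rc;

	if (group < 0 || group > 6 || outsz == 0)
		return -1;
	if (regcomp(&re, traceevent_ere, REG_EXTENDED) != 0)
		return -1;
	rc = regexec(&re, line, 7, m, 0);
	regfree(&re);
	if (rc != 0)
		return -1;

	out[0] = '\0';
	if (m[group].rm_so >= 0) {
		len = (size_t)(m[group].rm_eo - m[group].rm_so);
		if (len >= outsz)
			len = outsz - 1;	/* truncate to fit */
		memcpy(out, line + m[group].rm_so, len);
		out[len] = '\0';
	}
	return 0;
}
```

On the example entry, group 3 is " ...1", group 4 is the timestamp and group 5 is "mm_page_alloc", matching the new $4/$5 assignments in the script.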


[PATCH 0/3] tracing: fix macro expansion and refactor some of dynamic_array support

2014-02-28 Thread Filipe Brandenburger

Hi Steven Rostedt, Li Zefan,

This fixes an issue with macro expansion introduced in commit 7d536cb3f
(tracing/events: record the size of dynamic arrays).

I split it into 3 patches: the first fixes a bug, the second changes the code to
evaluate the expression only once, and the third refactors a u32 holding two
pieces of data in its lower/higher 16 bits into a struct to make the code cleaner.

I split them this way since I expect the first two to be more straightforward
while the third one might generate some discussion. I'd be happy to squash them
into a single one if you'd prefer that.

Cheers,
Filipe


Filipe Brandenburger (3):
  tracing: correctly expand len expressions from __dynamic_array macro
  tracing: evaluate len expression only once in __dynamic_array macro
  tracing: introduce a trace_data_offset struct to store array size

 include/linux/ftrace_event.h   |  5 +
 include/trace/ftrace.h | 26 ++
 kernel/trace/trace_events_filter.c | 13 ++---
 3 files changed, 29 insertions(+), 15 deletions(-)

-- 
1.9.0.279.gdc9e3eb


[PATCH 3/3] tracing: introduce a trace_data_offset struct to store array size

2014-02-28 Thread Filipe Brandenburger

Commit 7d536cb3f stores the length of the array in the high 16 bits of
the offset field. Using a struct with two separate 16-bit fields makes
this cleaner.

Tested: booted a kernel with this change, set a 'filename ~ "/usr/bin/pst*"'
regex filter on events/sched/sched_process_exec/filter, enabled tracing,
and checked that running pstree logs the trace event as expected.

Signed-off-by: Filipe Brandenburger 
---
 include/linux/ftrace_event.h   |  5 +
 include/trace/ftrace.h | 28 ++--
 kernel/trace/trace_events_filter.c | 13 ++---
 3 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
index 4e4cc28..67e4122 100644
--- a/include/linux/ftrace_event.h
+++ b/include/linux/ftrace_event.h
@@ -123,6 +123,11 @@ struct trace_event {
struct trace_event_functions*funcs;
 };
 
+struct trace_array_offset {
+   u16 offset;
+   u16 length;
+};
+
 extern int register_ftrace_event(struct trace_event *event);
 extern int unregister_ftrace_event(struct trace_event *event);
 
diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
index 86a056a..eac4d0a 100644
--- a/include/trace/ftrace.h
+++ b/include/trace/ftrace.h
@@ -48,7 +48,8 @@
 #define __array(type, item, len)   typeitem[len];
 
 #undef __dynamic_array
-#define __dynamic_array(type, item, len) u32 __data_loc_##item;
+#define __dynamic_array(type, item, len)   \
+   struct trace_array_offset __data_loc_##item;
 
 #undef __string
 #define __string(item, src) __dynamic_array(char, item, -1)
@@ -103,14 +104,14 @@
  * Include the following:
  *
  * struct ftrace_data_offsets_ {
- * u32 ;
- * u32 ;
+ * struct trace_array_offset   ;
+ * struct trace_array_offset   ;
  * [...]
  * };
  *
- * The __dynamic_array() macro will create each u32 , this is
- * to keep the offset of each array from the beginning of the event.
- * The size of an array is also encoded, in the higher 16 bits of .
+ * The __dynamic_array() macro will create each trace_array_offset , this
+ * is to keep the offset and length of each array from the beginning of the
+ * event.
  */
 
 #undef __field
@@ -123,7 +124,8 @@
 #define __array(type, item, len)
 
 #undef __dynamic_array
-#define __dynamic_array(type, item, len)   u32 item;
+#define __dynamic_array(type, item, len)   \
+   struct trace_array_offset item;
 
 #undef __string
 #define __string(item, src) __dynamic_array(char, item, -1)
@@ -195,7 +197,7 @@
 
 #undef __get_dynamic_array
 #define __get_dynamic_array(field) \
-   ((void *)__entry + (__entry->__data_loc_##field & 0x))
+   ((void *)__entry + __entry->__data_loc_##field.offset)
 
 #undef __get_str
 #define __get_str(field) (char *)__get_dynamic_array(field)
@@ -373,11 +375,10 @@ ftrace_define_fields_##call(struct ftrace_event_call *event_call) \
 
 #undef __dynamic_array
 #define __dynamic_array(type, item, len)   \
-   __item_length = (len) * sizeof(type);   \
-   __data_offsets->item = __data_size +\
+   __data_offsets->item.offset = __data_size + \
   offsetof(typeof(*entry), __data);\
-   __data_offsets->item |= __item_length << 16;\
-   __data_size += __item_length;
+   __data_offsets->item.length = (len) * sizeof(type); \
+   __data_size += __data_offsets->item.length;
 
 #undef __string
 #define __string(item, src) __dynamic_array(char, item, \
@@ -389,7 +390,6 @@ static inline notrace int ftrace_get_offsets_##call( \
struct ftrace_data_offsets_##call *__data_offsets, proto)   \
 {  \
int __data_size = 0;\
-   int __maybe_unused __item_length;   \
struct ftrace_raw_##call __maybe_unused *entry; \
\
tstruct;\
@@ -658,7 +658,7 @@ __attribute__((section("_ftrace_events"))) *__event_##call = _##call
 
 #undef __get_dynamic_array
 #define __get_dynamic_array(field) \
-   ((void *)__entry + (__entry->__data_loc_##field & 0x))
+   ((void *)__entry + __entry->__data_loc_##field.offset)
 
 #undef __get_str
 #define __get_str(field) (char *)__get_dynamic_array(field)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 8a86319..805fc0d 100644
--- a/kernel/trace/trace_events_filter.c
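
To compare the two encodings side by side, here is a userspace sketch that uses uint16_t in place of the kernel's u16. The pack/unpack helpers are hypothetical names that mirror the pre-patch scheme (offset in the low 16 bits, length in the high 16 bits); both forms carry the same two values, and the struct simply names them.

```c
#include <stdint.h>

/* Layout proposed in PATCH 3/3. */
struct trace_array_offset {
	uint16_t offset;
	uint16_t length;
};

/* Pre-patch encoding: one u32, offset low, length high. */
static inline uint32_t data_loc_pack(uint16_t offset, uint16_t length)
{
	return (uint32_t)offset | ((uint32_t)length << 16);
}

static inline uint16_t data_loc_offset(uint32_t loc)
{
	return (uint16_t)(loc & 0xffff);
}

static inline uint16_t data_loc_length(uint32_t loc)
{
	return (uint16_t)(loc >> 16);
}
```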

Re: [PATCH 1/2 v2] Staging: comedi: fix lines that are over 80 characters

2014-02-28 Thread Chase Southwood

>On Friday, February 28, 2014 4:31 PM, Greg KH  wrote:

>>On Fri, Feb 28, 2014 at 03:15:45AM -0600, Chase Southwood wrote:
>>
>> This patch introduces a simple helper function, outl_1564_timer(), to
>> allow several lines which violate the character limit to be shortened.
>> A handful of other lines that are too long are appropriately split as
>> well.
>> 
>> Cc: Dan Carpenter 
>> Signed-off-by: Chase Southwood 
>> ---
>> 2: introduced outl_1564_timer() at the suggestion of Dan.
>>  .../comedi/drivers/addi-data/hwdrv_apci1564.c      | 83 +-
>>  1 file changed, 49 insertions(+), 34 deletions(-)
>
>The Subject: doesn't match the patch content :(

Greg,
You're right, and that's totally my bad!  In all honesty, I sent this as v2 of
my original cleanup patch (without even changing the subject :( ), which was a
mistake, because it's pretty much a different patch entirely (although with
the same ultimate goal).  It needs further changes (based on Dan's comments)
anyway, so when I send in the next version, the subject will be changed
appropriately.  Sorry for my oversight; it won't happen again.

Thanks,
Chase


[PATCH 1/2] drm/panel: use gpiod interface for enable GPIO

2014-02-28 Thread Alexandre Courbot

Use the new GPIO descriptor interface to handle the panel's enable GPIO.
This considerably simplifies the code.
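
The simplification comes from the descriptor carrying the active-low flag: callers pass a logical value and gpiod_set_value() translates it to the physical level, so the GPIO_ACTIVE_LOW branches disappear. The toy below models that translation, plus the probe-time rule in the patch that -ENOENT means "no enable GPIO". struct toy_gpio_desc and both functions are hypothetical; they are not the gpiolib API.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Toy GPIO descriptor: the active-low property travels with the
 * descriptor, so callers only ever pass logical values.
 */
struct toy_gpio_desc {
	bool active_low;
	int raw_level;		/* physical line level: 0 or 1 */
};

/* Logical 1 means "asserted"; the descriptor picks the wire level. */
static void toy_gpiod_set_value(struct toy_gpio_desc *desc, int value)
{
	int logical = value ? 1 : 0;

	desc->raw_level = desc->active_low ? !logical : logical;
}

/*
 * Probe-time rule from the patch: a lookup error of -ENOENT means the
 * optional enable GPIO is simply absent; any other error is fatal.
 */
static int toy_classify_lookup(int err, bool *have_gpio)
{
	*have_gpio = (err == 0);
	if (err == 0 || err == -ENOENT)
		return 0;
	return err;
}
```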

Signed-off-by: Alexandre Courbot 
---
 drivers/gpu/drm/panel/panel-simple.c | 69 ++--
 1 file changed, 18 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/panel/panel-simple.c b/drivers/gpu/drm/panel/panel-simple.c
index 94cbf06..d1cabfa 100644
--- a/drivers/gpu/drm/panel/panel-simple.c
+++ b/drivers/gpu/drm/panel/panel-simple.c
@@ -22,9 +22,8 @@
  */
 
 #include 
-#include 
+#include 
 #include 
-#include 
 #include 
 #include 
 #include 
@@ -44,9 +43,6 @@ struct panel_desc {
} size;
 };
 
-/* TODO: convert to gpiod_*() API once it's been merged */
-#define GPIO_ACTIVE_LOW(1 << 0)
-
 struct panel_simple {
struct drm_panel base;
bool enabled;
@@ -57,8 +53,7 @@ struct panel_simple {
struct regulator *supply;
struct i2c_adapter *ddc;
 
-   unsigned long enable_gpio_flags;
-   int enable_gpio;
+   struct gpio_desc *enable_gpio;
 };
 
 static inline struct panel_simple *to_panel_simple(struct drm_panel *panel)
@@ -110,12 +105,8 @@ static int panel_simple_disable(struct drm_panel *panel)
backlight_update_status(p->backlight);
}
 
-   if (gpio_is_valid(p->enable_gpio)) {
-   if (p->enable_gpio_flags & GPIO_ACTIVE_LOW)
-   gpio_set_value(p->enable_gpio, 1);
-   else
-   gpio_set_value(p->enable_gpio, 0);
-   }
+   if (p->enable_gpio)
+   gpiod_set_value(p->enable_gpio, 0);
 
regulator_disable(p->supply);
p->enabled = false;
@@ -137,12 +128,8 @@ static int panel_simple_enable(struct drm_panel *panel)
return err;
}
 
-   if (gpio_is_valid(p->enable_gpio)) {
-   if (p->enable_gpio_flags & GPIO_ACTIVE_LOW)
-   gpio_set_value(p->enable_gpio, 0);
-   else
-   gpio_set_value(p->enable_gpio, 1);
-   }
+   if (p->enable_gpio)
+   gpiod_set_value(p->enable_gpio, 1);
 
if (p->backlight) {
p->backlight->props.power = FB_BLANK_UNBLANK;
@@ -185,7 +172,6 @@ static int panel_simple_probe(struct device *dev, const struct panel_desc *desc)
 {
struct device_node *backlight, *ddc;
struct panel_simple *panel;
-   enum of_gpio_flags flags;
int err;
 
panel = devm_kzalloc(dev, sizeof(*panel), GFP_KERNEL);
@@ -199,30 +185,19 @@ static int panel_simple_probe(struct device *dev, const struct panel_desc *desc)
if (IS_ERR(panel->supply))
return PTR_ERR(panel->supply);
 
-   panel->enable_gpio = of_get_named_gpio_flags(dev->of_node,
-"enable-gpios", 0,
-);
-   if (gpio_is_valid(panel->enable_gpio)) {
-   unsigned int value;
-
-   if (flags & OF_GPIO_ACTIVE_LOW)
-   panel->enable_gpio_flags |= GPIO_ACTIVE_LOW;
-
-   err = gpio_request(panel->enable_gpio, "enable");
+   panel->enable_gpio = devm_gpiod_get(dev, "enable");
+   if (!IS_ERR(panel->enable_gpio)) {
+   err = gpiod_direction_output(panel->enable_gpio, 0);
if (err < 0) {
-   dev_err(dev, "failed to request GPIO#%u: %d\n",
-   panel->enable_gpio, err);
+   dev_err(dev, "failed to setup enable GPIO: %d\n", err);
return err;
}
-
-   value = (panel->enable_gpio_flags & GPIO_ACTIVE_LOW) != 0;
-
-   err = gpio_direction_output(panel->enable_gpio, value);
-   if (err < 0) {
-   dev_err(dev, "failed to setup GPIO%u: %d\n",
-   panel->enable_gpio, err);
-   goto free_gpio;
-   }
+   } else if (PTR_ERR(panel->enable_gpio) == -ENOENT) {
+   panel->enable_gpio = NULL;
+   } else {
+   err = PTR_ERR(panel->enable_gpio);
+   dev_err(dev, "failed to request enable GPIO: %d\n", err);
+   return err;
}
 
backlight = of_parse_phandle(dev->of_node, "backlight", 0);
@@ -230,10 +205,8 @@ static int panel_simple_probe(struct device *dev, const struct panel_desc *desc)
panel->backlight = of_find_backlight_by_node(backlight);
of_node_put(backlight);
 
-   if (!panel->backlight) {
-   err = -EPROBE_DEFER;
-   goto free_gpio;
-   }
+   if (!panel->backlight)
+   return -EPROBE_DEFER;
}
 
ddc = of_parse_phandle(dev->of_node, "ddc-i2c-bus", 0);
@@ -265,9 +238,6 @@ free_ddc:
 free_backlight:
if (panel->backlight)

[PATCH 2/2] drm/panel: remove redundant regulator_disable()

2014-02-28 Thread Alexandre Courbot

regulator_disable() is already performed by panel_simple_disable(),
which is called by panel_simple_remove().

Signed-off-by: Alexandre Courbot 
---
 drivers/gpu/drm/panel/panel-simple.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/panel/panel-simple.c b/drivers/gpu/drm/panel/panel-simple.c
index d1cabfa..35d1518 100644
--- a/drivers/gpu/drm/panel/panel-simple.c
+++ b/drivers/gpu/drm/panel/panel-simple.c
@@ -257,8 +257,6 @@ static int panel_simple_remove(struct device *dev)
if (panel->backlight)
put_device(>backlight->dev);
 
-   regulator_disable(panel->supply);
-
return 0;
 }
 
-- 
1.9.0
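
A toy model of why the extra call had to go: regulator enables and disables are reference-counted, so once panel_simple_disable() has dropped the count, a second regulator_disable() in remove() would be unbalanced. Every type and function here is hypothetical; the -EIO return is only the toy's way of flagging an unbalanced disable, and the toy assumes disable only acts on a panel that is currently enabled, as the driver's enabled flag suggests.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy reference-counted supply; not the kernel regulator API. */
struct toy_regulator {
	int use_count;
};

static int toy_regulator_enable(struct toy_regulator *r)
{
	r->use_count++;
	return 0;
}

/* Refuses to go below zero: the toy's "unbalanced disable". */
static int toy_regulator_disable(struct toy_regulator *r)
{
	if (r->use_count <= 0)
		return -EIO;
	r->use_count--;
	return 0;
}

struct toy_panel {
	struct toy_regulator *supply;
	bool enabled;
};

/* Same shape as panel_simple_disable(): acts only when enabled. */
static int toy_panel_disable(struct toy_panel *p)
{
	if (!p->enabled)
		return 0;
	toy_regulator_disable(p->supply);
	p->enabled = false;
	return 0;
}

/* remove() after PATCH 2/2: rely on disable() alone. */
static int toy_panel_remove(struct toy_panel *p)
{
	return toy_panel_disable(p);
}
```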


[RFC][PATCH] audit: Simplify by assuming the callers socket buffer is large enough

2014-02-28 Thread Eric W. Biederman


Modify audit_send_reply to directly use a non-blocking send and
to return an error on failure (if anyone cares).

Modify audit_list_rules_send to use audit_send_reply and give up
if we can not send a packet.

Merge audit_list_rules into audit_list_rules_send, as the code
is now simple enough not to justify a separate function.

Kill audit_send_list and audit_send_reply_thread, because using
a separate thread for replies is not needed when sending
packets synchronously.

Signed-off-by: "Eric W. Biederman" 
---

I haven't properly tested and made certain this doesn't break userspace,
but this is much simpler than what audit is currently doing.

I really don't understand why we are using kernel threads to allow us to
exceed the receiving socket's configured buffer limits.  That just seems
insane.  If that is really what we want, we should be able to force
the receiving buffer limits up in audit_send_reply.
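
To illustrate the behaviour the simplified code relies on, the sketch below sends datagrams with MSG_DONTWAIT and reports the first refusal instead of waiting for buffer space. It uses an AF_UNIX socketpair() as a stand-in for the audit netlink socket (an assumption; netlink has its own queueing rules), and send_replies_nonblocking() is a hypothetical helper, not part of kernel/audit.c.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: queue up to `max` datagrams of `len` bytes on fd
 * without ever blocking, and stop at the first refusal instead of
 * waiting for the receiver to drain its buffer.
 * Returns the number queued; *err is 0 or the errno that stopped us.
 */
int send_replies_nonblocking(int fd, size_t len, int max, int *err)
{
	char buf[256];
	int sent = 0;

	if (len > sizeof(buf))
		len = sizeof(buf);
	memset(buf, 'a', len);
	*err = 0;
	while (sent < max) {
		if (send(fd, buf, len, MSG_DONTWAIT) < 0) {
			*err = errno;	/* typically EAGAIN: buffer full */
			break;
		}
		sent++;
	}
	return sent;
}
```

Nobody reads from the peer in this test, so the loop stops after a bounded number of sends with EAGAIN: the point at which the simplified audit code would give up rather than hand the reply to a thread.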

 include/linux/audit.h |2 +-
 kernel/audit.c|   75 -
 kernel/audit.h|   14 ++---
 kernel/auditfilter.c  |   64 ++
 4 files changed, 31 insertions(+), 124 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index ec1464df4c60..cd2f5112822a 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -464,7 +464,7 @@ extern int audit_filter_user(int type);
 extern int audit_filter_type(int type);
 extern int audit_rule_change(int type, __u32 portid, int seq,
void *data, size_t datasz);
-extern int audit_list_rules_send(struct sk_buff *request_skb, int seq);
+extern void audit_list_rules_send(struct sk_buff *request_skb, int seq);
 
 extern u32 audit_enabled;
 #else /* CONFIG_AUDIT */
diff --git a/kernel/audit.c b/kernel/audit.c
index 32086bff5564..201808fc86aa 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -180,12 +180,6 @@ struct audit_buffer {
gfp_t gfp_mask;
 };
 
-struct audit_reply {
-   __u32 portid;
-   struct net *net;
-   struct sk_buff *skb;
-};
-
 static void audit_set_portid(struct audit_buffer *ab, __u32 portid)
 {
if (ab) {
@@ -496,26 +490,6 @@ static int kauditd_thread(void *dummy)
return 0;
 }
 
-int audit_send_list(void *_dest)
-{
-   struct audit_netlink_list *dest = _dest;
-   struct sk_buff *skb;
-   struct net *net = dest->net;
-   struct audit_net *aunet = net_generic(net, audit_net_id);
-
-   /* wait for parent to finish and send an ACK */
-   mutex_lock(&audit_cmd_mutex);
-   mutex_unlock(&audit_cmd_mutex);
-
-   while ((skb = __skb_dequeue(&dest->q)) != NULL)
-   netlink_unicast(aunet->nlsk, skb, dest->portid, 0);
-
-   put_net(net);
-   kfree(dest);
-
-   return 0;
-}
-
 struct sk_buff *audit_make_reply(__u32 portid, int seq, int type, int done,
 int multi, const void *payload, int size)
 {
@@ -541,25 +515,9 @@ out_kfree_skb:
return NULL;
 }
 
-static int audit_send_reply_thread(void *arg)
-{
-   struct audit_reply *reply = (struct audit_reply *)arg;
-   struct net *net = reply->net;
-   struct audit_net *aunet = net_generic(net, audit_net_id);
-
-   mutex_lock(&audit_cmd_mutex);
-   mutex_unlock(&audit_cmd_mutex);
-
-   /* Ignore failure. It'll only happen if the sender goes away,
-  because our timeout is set to infinite. */
-   netlink_unicast(aunet->nlsk , reply->skb, reply->portid, 0);
-   put_net(net);
-   kfree(reply);
-   return 0;
-}
 /**
  * audit_send_reply - send an audit reply message via netlink
- * @portid: netlink port to which to send reply
+ * @request_skb: The request skb (used to calculate where to reply)
  * @seq: sequence number
  * @type: audit message type
  * @done: done (last) flag
@@ -570,33 +528,24 @@ static int audit_send_reply_thread(void *arg)
  * Allocates an skb, builds the netlink message, and sends it to the port id.
  * No failure notifications.
  */
-static void audit_send_reply(struct sk_buff *request_skb, int seq, int type, int done,
-int multi, const void *payload, int size)
+int audit_send_reply(struct sk_buff *request_skb, int seq, int type, int done,
+int multi, const void *payload, int size)
 {
u32 portid = NETLINK_CB(request_skb).portid;
struct net *net = sock_net(NETLINK_CB(request_skb).sk);
+   struct audit_net *aunet = net_generic(net, audit_net_id);
struct sk_buff *skb;
-   struct task_struct *tsk;
-   struct audit_reply *reply = kmalloc(sizeof(struct audit_reply),
-   GFP_KERNEL);
-
-   if (!reply)
-   return;
 
skb = audit_make_reply(portid, seq, type, done, multi, payload, size);
if (!skb)
-   goto out;
-
-   reply->net = get_net(net);
-   reply->portid = portid;
-   reply->skb = skb;
+   return -ENOMEM;
 
-

[PATCH] audit: Send replies in the proper network namespace.

2014-02-28 Thread Eric W. Biederman


In perverse cases of file descriptor passing, the current network
namespace of a process and the network namespace of a socket used by
that process may differ.  Therefore use the network namespace of the
appropriate socket to ensure replies always go to the appropriate
socket.

Signed-off-by: "Eric W. Biederman" 
---

This is an incremental change on top of my previous patch to guarantee
that replies always happen in the appropriate network namespace.

 include/linux/audit.h |3 ++-
 kernel/audit.c|   21 ++---
 kernel/auditfilter.c  |7 +--
 3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/include/linux/audit.h b/include/linux/audit.h
index aa865a9a4c4f..ec1464df4c60 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -43,6 +43,7 @@ struct mq_attr;
 struct mqstat;
 struct audit_watch;
 struct audit_tree;
+struct sk_buff;
 
 struct audit_krule {
int vers_ops;
@@ -463,7 +464,7 @@ extern int audit_filter_user(int type);
 extern int audit_filter_type(int type);
 extern int audit_rule_change(int type, __u32 portid, int seq,
void *data, size_t datasz);
-extern int audit_list_rules_send(__u32 portid, int seq);
+extern int audit_list_rules_send(struct sk_buff *request_skb, int seq);
 
 extern u32 audit_enabled;
 #else /* CONFIG_AUDIT */
diff --git a/kernel/audit.c b/kernel/audit.c
index 1e5756f16f6f..32086bff5564 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -570,9 +570,11 @@ static int audit_send_reply_thread(void *arg)
  * Allocates an skb, builds the netlink message, and sends it to the port id.
  * No failure notifications.
  */
-static void audit_send_reply(__u32 portid, int seq, int type, int done,
+static void audit_send_reply(struct sk_buff *request_skb, int seq, int type, int done,
 int multi, const void *payload, int size)
 {
+   u32 portid = NETLINK_CB(request_skb).portid;
+   struct net *net = sock_net(NETLINK_CB(request_skb).sk);
struct sk_buff *skb;
struct task_struct *tsk;
struct audit_reply *reply = kmalloc(sizeof(struct audit_reply),
@@ -585,7 +587,7 @@ static void audit_send_reply(__u32 portid, int seq, int type, int done,
if (!skb)
goto out;
 
-   reply->net = get_net(current->nsproxy->net_ns);
+   reply->net = get_net(net);
reply->portid = portid;
reply->skb = skb;
 
@@ -675,8 +677,7 @@ static int audit_get_feature(struct sk_buff *skb)
 
seq = nlmsg_hdr(skb)->nlmsg_seq;
 
-   audit_send_reply(NETLINK_CB(skb).portid, seq, AUDIT_GET, 0, 0,
-&af, sizeof(af));
+   audit_send_reply(skb, seq, AUDIT_GET, 0, 0, &af, sizeof(af));
 
return 0;
 }
@@ -796,8 +797,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
s.backlog   = skb_queue_len(&audit_skb_queue);
s.version   = AUDIT_VERSION_LATEST;
s.backlog_wait_time = audit_backlog_wait_time;
-   audit_send_reply(NETLINK_CB(skb).portid, seq, AUDIT_GET, 0, 0,
-&s, sizeof(s));
+   audit_send_reply(skb, seq, AUDIT_GET, 0, 0, &s, sizeof(s));
break;
}
case AUDIT_SET: {
@@ -907,7 +907,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
   seq, data, nlmsg_len(nlh));
break;
case AUDIT_LIST_RULES:
-   err = audit_list_rules_send(NETLINK_CB(skb).portid, seq);
+   err = audit_list_rules_send(skb, seq);
break;
case AUDIT_TRIM:
audit_trim_trees();
@@ -972,8 +972,8 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
memcpy(sig_data->ctx, ctx, len);
security_release_secctx(ctx, len);
}
-   audit_send_reply(NETLINK_CB(skb).portid, seq, AUDIT_SIGNAL_INFO,
-   0, 0, sig_data, sizeof(*sig_data) + len);
+   audit_send_reply(skb, seq, AUDIT_SIGNAL_INFO, 0, 0,
+sig_data, sizeof(*sig_data) + len);
kfree(sig_data);
break;
case AUDIT_TTY_GET: {
@@ -985,8 +985,7 @@ static int audit_receive_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
s.log_passwd = tsk->signal->audit_tty_log_passwd;
spin_unlock(&tsk->sighand->siglock);
 
-   audit_send_reply(NETLINK_CB(skb).portid, seq,
-AUDIT_TTY_GET, 0, 0, &s, sizeof(s));
+   audit_send_reply(skb, seq, AUDIT_TTY_GET, 0, 0, &s, sizeof(s));
break;
}
case AUDIT_TTY_SET: {
diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
index a5e3d73d73e4..e8d1c7c515d7 100644
--- a/kernel/auditfilter.c
+++ b/kernel/auditfilter.c
@@ -30,6

Re: [PATCH] audit: Use struct net not pid_t to remember the network namespace to reply in

2014-02-28 Thread Eric W. Biederman

Richard Guy Briggs  writes:

> On 14/02/28, Eric W. Biederman wrote:
>> While reading through 3.14-rc1 I found a pretty significant mishandling
>> of network namespaces in the recent audit changes.
>> 
>> In struct audit_netlink_list and audit_reply add a reference to the
>> network namespace of the caller and remove the userspace pid of the
>> caller.  This cleanly remembers the caller's network namespace, and
>> removes a huge class of races and nasty failure modes that can occur
>> when attempting to look up the caller's network namespace again from a pid_t
>> (including the caller's network namespace changing, pid wraparound, and
>> the pid simply not being present).
>
> Ok, so I see that avoiding pid_t in struct audit_reply and struct
> audit_netlink_list is necessary.  Why not switch to struct pid?
>
> How does this patch solve a caller's network namespace changing?

This solves both the caller's network namespace changing and the caller going
away entirely (a much more serious concern), because we capture the
network namespace at the time of the request, while the caller is in the
kernel.  I would have simply captured the socket we want to reply on, but
there did not appear to be a good way to do that.

Reading through it again, capturing current->nsproxy->net_ns is strictly
wrong.  We should be capturing sock_net(NETLINK_CB(skb).sk), the
network namespace of the requesting socket.  That handles even weird
cases of passing file descriptors between processes in different network
namespaces.  (An incremental patch to change the code to select the
network namespace of the requesting socket will follow in a moment.)

Still, what my patch implements today at least means we won't oops the
kernel if the audit process exits early and causes get_net_ns_by_pid
to return NULL.
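The underlying pattern is to pin the namespace with a reference while the requester is still in the kernel, instead of re-resolving it later from a pid_t. A userspace sketch of that pattern, assuming a plain (non-atomic) counter stands in for the kernel's get_net()/put_net() reference counting; all type and function names here are hypothetical:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted namespace object (stand-in for struct net). */
struct ns {
    int refcount;
    int id;
};

static struct ns *ns_get(struct ns *n)
{
    n->refcount++;
    return n;
}

/* Drop one reference; returns 1 if this freed the object. */
static int ns_put(struct ns *n)
{
    if (--n->refcount == 0) {
        free(n);
        return 1;
    }
    return 0;
}

/* Hypothetical deferred reply: holds the namespace captured at request time. */
struct reply {
    struct ns *net;
};

static void reply_init(struct reply *r, struct ns *requester_ns)
{
    /* Capture now, while we still know who asked. */
    r->net = ns_get(requester_ns);
}
```

Even if the requester drops its own reference before the deferred reply runs, the reply still points at a live object; the last put frees it. The kernel-side equivalent is the get_net() at request time and put_net() after sending seen in the audit patches in this thread.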

This whole code path is so crazy because what we really should be doing
is sending the packets in nonblocking mode and just dropping packets
if the receiving socket does not have enough socket buffer space.

Eric

Re: [PATCH] lockdep: increase static allocations

2014-02-28 Thread Mike Galbraith

On Fri, 2014-02-28 at 14:32 -0500, Sasha Levin wrote: 
> On 01/08/2014 02:21 PM, Sasha Levin wrote:
> > Fuzzing a recent kernel with a large configuration hits the static
> > allocation limits and disables lockdep.
> >
> > This patch doubles the limits.
> >
> > Signed-off-by: Sasha Levin 
> > ---
> >   kernel/locking/lockdep_internals.h | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/kernel/locking/lockdep_internals.h b/kernel/locking/lockdep_internals.h
> > index 4f560cf..51c4b24 100644
> > --- a/kernel/locking/lockdep_internals.h
> > +++ b/kernel/locking/lockdep_internals.h
> > @@ -54,9 +54,9 @@ enum {
> >* table (if it's not there yet), and we check it for lock order
> >* conflicts and deadlocks.
> >*/
> > -#define MAX_LOCKDEP_ENTRIES16384UL
> > +#define MAX_LOCKDEP_ENTRIES32768UL
> >
> > -#define MAX_LOCKDEP_CHAINS_BITS15
> > +#define MAX_LOCKDEP_CHAINS_BITS16
> >   #define MAX_LOCKDEP_CHAINS(1UL << MAX_LOCKDEP_CHAINS_BITS)
> >
> >   #define MAX_LOCKDEP_CHAIN_HLOCKS (MAX_LOCKDEP_CHAINS*5)
> > @@ -65,7 +65,7 @@ enum {
> >* Stack-trace: tightly packed array of stack backtrace
> >* addresses. Protected by the hash_lock.
> >*/
> > -#define MAX_STACK_TRACE_ENTRIES262144UL
> > +#define MAX_STACK_TRACE_ENTRIES524288UL
> >
> >   extern struct list_head all_lock_classes;
> >   extern struct lock_chain lock_chains[];
> >
> 
> 
> Can someone pick the patch up please? PeterZ even (seemed to) acked it.

I have to do the (exact) same to rt trees, else lockdep routinely gets
in a snit, takes its cool toys and goes home.

-Mike
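For reference, the patch doubles each limit. Raising MAX_LOCKDEP_CHAINS_BITS from 15 to 16 doubles MAX_LOCKDEP_CHAINS, and MAX_LOCKDEP_CHAIN_HLOCKS follows because it is defined as MAX_LOCKDEP_CHAINS*5. A small helper reproducing that arithmetic (the struct and function names are hypothetical; only the formulas come from the diff):

```c
#include <assert.h>

/* Derived chain limits, following the macros in lockdep_internals.h. */
struct lockdep_chain_limits {
    unsigned long chains;       /* MAX_LOCKDEP_CHAINS = 1UL << bits */
    unsigned long chain_hlocks; /* MAX_LOCKDEP_CHAIN_HLOCKS = chains * 5 */
};

static struct lockdep_chain_limits chain_limits(unsigned int chains_bits)
{
    struct lockdep_chain_limits l;

    l.chains = 1UL << chains_bits;
    l.chain_hlocks = l.chains * 5;
    return l;
}
```

With 15 bits this gives 32768 chains and 163840 chain hlocks; with 16 bits, 65536 and 327680.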


Re: [PATCHv2 2/3] ASoC: io: New signature for snd_soc_codec_set_cache_io()

2014-02-28 Thread Mark Brown

On Fri, Feb 28, 2014 at 05:04:26PM +0800, Xiubo Li wrote:
> Now that all users have been converted to regmap, config.reg_bits
> and config.val_bits can be set by each user through the regmap core API,
> so these two params are redundant here.

Actually, I think the way to fix the issue with CODECs doing I/O in
probe is to provide a way for drivers to specify a regmap when
registering the CODEC (rather than during probe) and then move the
initialisation of the regmap before the probe function is called.  That
would make set_cache_io() go away entirely.


Re: [PATCHv2 1/3] ASoC: codec: Simplify ASoC probe code.

2014-02-28 Thread Mark Brown

On Fri, Feb 28, 2014 at 05:04:25PM +0800, Xiubo Li wrote:

> "Just removing the set_cache_io() call will not work for all 
> drivers. There are some MFD child devices which use regmap from the parent 
> device. So dev_get_regmap() will return NULL for those."

This is the sort of thing that I was referring to when talking about
doing the non-boring drivers separately.  As well as the warnings Lars
mentioned there's a bisection issue here:

> - codec->control_data = da7213->regmap;
> - ret = snd_soc_codec_set_cache_io(codec, 8, 8, SND_SOC_REGMAP);
> - if (ret < 0) {
> - dev_err(codec->dev, "Failed to set cache I/O: %d\n", ret);
> - return ret;
> - }
> -
>   /* Default to using ALC auto offset calibration mode. */
>   snd_soc_update_bits(codec, DA7213_ALC_CTRL1,
>   DA7213_ALC_CALIB_MODE_MAN, 0);

Unless the core sets up the I/O before calling probe() the above is
going to mean that the snd_soc_update_bits() call fails since the I/O
operations won't have been set up.  There is a default call to set a
regmap up but it's only done after the probe.


Re: [PATCHv2 2/3] ASoC: io: New signature for snd_soc_codec_set_cache_io()

2014-02-28 Thread Mark Brown

On Fri, Feb 28, 2014 at 05:04:26PM +0800, Xiubo Li wrote:
> Now that all users have been converted to regmap, config.reg_bits
> and config.val_bits can be set by each user through the regmap core API,
> so these two params are redundant here.

This looks good.



Re: [PATCH] spi: core: make zero length transfer valid again

2014-02-28 Thread Mark Brown

On Fri, Feb 28, 2014 at 11:03:16PM +0900, Atsushi Nemoto wrote:
> Zero-length transfers became invalid with the
> "spi: core: Validate length of the transfers in message" commit,
> but they should be valid to support odd devices, for example one that
> requires a long delay between chipselect and the first transfer.

Applied, thanks.



Re: [PATCH] Staging:tidspbridge: Fixing coding style

2014-02-28 Thread Greg Kroah-Hartman

On Fri, Feb 28, 2014 at 12:30:04AM -0800, Masood Mehmood wrote:
> 
> On Fri, Feb 28, 2014 at 07:01:56PM -0800, Greg Kroah-Hartman wrote:
> > On Fri, Feb 28, 2014 at 06:15:52PM -0800, Masood Mehmood wrote:
> > > 
> > 
> > > 
> > > Fixing some basic coding style issues.
> > 
> > Which issues did you fix?  Please be more specific.  Did you fix them
> > for the whole driver, or just a specific file?
> 
> - Unnecessary line break and space.
> - and some * adjusted to the data name
> - Removed braces for single statement if conditions.

Great, can you put that in the patch itself?

As it's small, they all can be in the same patch, but normally we only
want one patch per "type" of change.  For this case, if it was lots of
changes, you would break it up into different patches.

> I just realized other files of the same driver also need some style fixes.
> I'll send another patch with the rest of the files included.

Watch out, now you might want to send multiple patches, based on the
above "one thing per patch" rule.

thanks,

greg k-h

Re: [alsa-devel] [PATCH 2/3] ASoC: core: Set the default I/O up try regmap.

2014-02-28 Thread Mark Brown

On Fri, Feb 28, 2014 at 09:15:37AM +0100, Lars-Peter Clausen wrote:

> Yes, I think that's almost all of them. si476x is missing, but I
> think that one is currently broken, as it doesn't call
> snd_soc_codec_set_cache_io() at all.

Probably, yeah - there were other problems with that driver that make me
question if it ever worked properly IIRC.  There is a default call to
set cache I/O already but it relies on dev_get_regmap().

> As to how to handle those, I think there was a plan to add the
> possibility to assign a regmap to a device, so that dev_get_regmap()
> returns the regmap struct that should be used, even though the
> device itself did not allocate the regmap. But I can't find the
> details. Mark may know more about this.

That's not for this and is likely to create confusion - that's for
handling early init with syscon type devices, allowing the regmap to be
created with no device and then have the device attached later.  I'd
need to look through and see what happens if two devices share a regmap,
perhaps it'd actually be OK, but we can always just allow the regmap to
be overridden at the ASoC level.


Re: [RFC V1] drivers/base/regmap: Implementation for regmap_multi_reg_write

2014-02-28 Thread Mark Brown

On Fri, Feb 28, 2014 at 03:58:34PM +, Opensource [Anthony Olech] wrote:

> The algorithm for splitting up into smaller _multi_reg_writes is easy enough,
> so if the calling device driver created a set of (reg,val) pairs for a multi reg
> write operation then surely the intention is for the individual pieces to be
> handled as multi reg writes.

Right, that's what should eventually happen - what I'm saying is it's OK
to defer the hard parts for later if they're not needed right now (in
much the same way that the API was added without an actual multi write
implementation).

> > > + for (i = 0, n = 0, switched = false, base = regs; i < num_regs;
> > > + i++, n++) {
> > Don't put all this stuff in the for (), just put the iteration in the for ().

> all those variables are a fundamental part of the loop, but I will change it.

It's still ugly and hard to read, look at the line wrapping...  the
normal thing is to have preconditions that aren't part of the actual
iteration process immediately before the loop statement.
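The style point can be sketched like this: setup that is not part of the iteration goes before the loop, and the for () header carries only the loop index. The fixed-size chunking policy and every name below are hypothetical; the RFC's actual splitting logic is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (register, value) pair; not the regmap API's own type. */
struct reg_val {
    unsigned int reg;
    unsigned int val;
};

/* Hypothetical callback that performs one multi-register write. */
typedef int (*multi_write_fn)(const struct reg_val *regs, size_t n, void *ctx);

/* Split regs[0..num_regs) into chunks of at most max_chunk pairs. */
static int write_in_chunks(const struct reg_val *regs, size_t num_regs,
                           size_t max_chunk, multi_write_fn do_write, void *ctx)
{
    size_t i, n;
    int ret;

    if (max_chunk == 0)
        return -1;

    /* Only the iteration lives in the for () header. */
    for (i = 0; i < num_regs; i += n) {
        n = num_regs - i;
        if (n > max_chunk)
            n = max_chunk;

        ret = do_write(&regs[i], n, ctx);
        if (ret)
            return ret;
    }
    return 0;
}

/* Example callback: record the size of each chunk it is handed. */
struct chunk_log {
    size_t sizes[16];
    size_t count;
};

static int log_chunk(const struct reg_val *regs, size_t n, void *ctx)
{
    struct chunk_log *log = ctx;

    (void)regs;
    if (log->count >= sizeof(log->sizes) / sizeof(log->sizes[0]))
        return -1;
    log->sizes[log->count++] = n;
    return 0;
}
```

Five pairs with max_chunk = 2 produce chunks of 2, 2 and 1.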


[PATCH RT 05/14] rcutree/rcu_bh_qs: disable irq while calling rcu_preempt_qs()

2014-02-28 Thread Steven Rostedt

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Tiejun Chen 

Any caller of rcu_preempt_qs() must disable irqs in
order to protect the assignment to ->rcu_read_unlock_special. In
the RT case, rcu_bh_qs(), as a wrapper of rcu_preempt_qs(), is called
in some scenarios where irqs are enabled, as in this path:
do_single_softirq()
|
+ local_irq_enable();
+ handle_softirq()
||
|+ rcu_bh_qs()
||
|+ rcu_preempt_qs()
|
+ local_irq_disable()

So disable irqs directly inside rcu_bh_qs() to fix this; otherwise
the kernel may sometimes freeze, as has been observed. This approach
is also safe for any future callers of rcu_bh_qs() elsewhere.

Cc: stable...@vger.kernel.org
Signed-off-by: Tiejun Chen 
Signed-off-by: Bin Jiang 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Steven Rostedt 
---
 kernel/rcutree.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 75743cb..5439cee 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -187,7 +187,12 @@ static void rcu_preempt_qs(int cpu);
 
 void rcu_bh_qs(int cpu)
 {
+   unsigned long flags;
+
+   /* Callers to this function, rcu_preempt_qs(), must disable irqs. */
+   local_irq_save(flags);
rcu_preempt_qs(cpu);
+   local_irq_restore(flags);
 }
 #else
 void rcu_bh_qs(int cpu)
-- 
1.8.5.3



[PATCH RT 01/14] rcu: Dont activate RCU core on NO_HZ_FULL CPUs

2014-02-28 Thread Steven Rostedt

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: "Paul E. McKenney" 

Whenever a CPU receives a scheduling-clock interrupt, RCU checks to see
if the RCU core needs anything from this CPU.  If so, RCU raises
RCU_SOFTIRQ to carry out any needed processing.

This approach has worked well historically, but it is undesirable on
NO_HZ_FULL CPUs.  Such CPUs are expected to spend almost all of their time
in userspace, so that scheduling-clock interrupts can be disabled while
there is only one runnable task on the CPU in question.  Unfortunately,
raising any softirq has the potential to wake up ksoftirqd, which would
provide the second runnable task on that CPU, preventing disabling of
scheduling-clock interrupts.

What is needed instead is for RCU to leave NO_HZ_FULL CPUs alone,
relying on the grace-period kthreads' quiescent-state forcing to
do any needed RCU work on behalf of those CPUs.

This commit therefore refrains from raising RCU_SOFTIRQ on any
NO_HZ_FULL CPUs during any grace periods that have been in effect
for less than one second.  The one-second limit handles the case
where an inappropriate workload is running on a NO_HZ_FULL CPU
that features lots of scheduling-clock interrupts, but no idle
or userspace time.
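The one-second test in rcu_nohz_full_cpu() uses ULONG_CMP_LT() so that it stays correct when jiffies wraps. A standalone sketch: the macro body mirrors, to our knowledge, the kernel's definition in include/linux/rcupdate.h, while gp_younger_than_1s() and its explicit jiffies/hz parameters are hypothetical:

```c
#include <assert.h>
#include <limits.h>

/* Wrap-safe "a is before b"; mirrors ULONG_CMP_LT() from rcupdate.h. */
#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))

/*
 * Hypothetical model of the age check in rcu_nohz_full_cpu():
 * nonzero if fewer than `hz` ticks have elapsed since `gp_start`.
 */
static int gp_younger_than_1s(unsigned long jiffies, unsigned long gp_start,
                              unsigned long hz)
{
    return ULONG_CMP_LT(jiffies, gp_start + hz);
}
```

A naive `jiffies < gp_start + hz` gets the wrapped case wrong: with gp_start = ULONG_MAX - 10 and only 5 ticks elapsed, gp_start + hz has wrapped to a small number, so the naive test would report the grace period as old.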

Cc: stable...@vger.kernel.org
Reported-by: Mike Galbraith 
Signed-off-by: Paul E. McKenney 
Tested-by: Mike Galbraith 
Tested-by: Frederic Weisbecker 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Steven Rostedt 
---
 kernel/rcutree.c|  4 
 kernel/rcutree.h|  1 +
 kernel/rcutree_plugin.h | 20 
 3 files changed, 25 insertions(+)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 55915b1..75743cb 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -2749,6 +2749,10 @@ static int __rcu_pending(struct rcu_state *rsp, struct rcu_data *rdp)
/* Check for CPU stalls, if enabled. */
check_cpu_stall(rsp, rdp);
 
+   /* Is this CPU a NO_HZ_FULL CPU that should ignore RCU? */
+   if (rcu_nohz_full_cpu(rsp))
+   return 0;
+
/* Is the RCU core waiting for a quiescent state from this CPU? */
if (rcu_scheduler_fully_active &&
rdp->qs_pending && !rdp->passed_quiesce) {
diff --git a/kernel/rcutree.h b/kernel/rcutree.h
index 8491f47..7e0b397 100644
--- a/kernel/rcutree.h
+++ b/kernel/rcutree.h
@@ -541,6 +541,7 @@ static void rcu_boot_init_nocb_percpu_data(struct rcu_data *rdp);
 static void rcu_spawn_nocb_kthreads(struct rcu_state *rsp);
 static void rcu_kick_nohz_cpu(int cpu);
 static bool init_nocb_callback_list(struct rcu_data *rdp);
+static bool rcu_nohz_full_cpu(struct rcu_state *rsp);
 
 #endif /* #ifndef RCU_TREE_NONCORE */
 
diff --git a/kernel/rcutree_plugin.h b/kernel/rcutree_plugin.h
index dc0c4b2..481a124 100644
--- a/kernel/rcutree_plugin.h
+++ b/kernel/rcutree_plugin.h
@@ -2356,3 +2356,23 @@ static void rcu_kick_nohz_cpu(int cpu)
smp_send_reschedule(cpu);
 #endif /* #ifdef CONFIG_NO_HZ_FULL */
 }
+
+/*
+ * Is this CPU a NO_HZ_FULL CPU that should ignore RCU so that the
+ * grace-period kthread will do force_quiescent_state() processing?
+ * The idea is to avoid waking up RCU core processing on such a
+ * CPU unless the grace period has extended for too long.
+ *
+ * This code relies on the fact that all NO_HZ_FULL CPUs are also
+ * CONFIG_RCU_NOCB_CPUs.
+ */
+static bool rcu_nohz_full_cpu(struct rcu_state *rsp)
+{
+#ifdef CONFIG_NO_HZ_FULL
+   if (tick_nohz_full_cpu(smp_processor_id()) &&
+   (!rcu_gp_in_progress(rsp) ||
+ULONG_CMP_LT(jiffies, ACCESS_ONCE(rsp->gp_start) + HZ)))
+   return 1;
+#endif /* #ifdef CONFIG_NO_HZ_FULL */
+   return 0;
+}
-- 
1.8.5.3



[PATCH RT 02/14] timers: do not raise softirq unconditionally

2014-02-28 Thread Steven Rostedt

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Thomas Gleixner 

Mike,

On Thu, 7 Nov 2013, Mike Galbraith wrote:

> On Thu, 2013-11-07 at 04:26 +0100, Mike Galbraith wrote:
> > On Wed, 2013-11-06 at 18:49 +0100, Thomas Gleixner wrote:
>
> > > I bet you are trying to work around some of the side effects of the
> > > occasional tick which is still necessary despite full nohz, right?
> >
> > Nope, I wanted to check out cost of nohz_full for rt, and found that it
> > doesn't work at all instead, looked, and found that the sole running
> > task has just awakened ksoftirqd when it wants to shut the tick down, so
> > that shutdown never happens.
>
> Like so in virgin 3.10-rt.  Box is x3550 M3 booted nowatchdog
> rcu_nocbs=1-3 nohz_full=1-3, and CPUs 1-3 are completely isolated via
> cpusets as well.

well, that very same problem is in mainline if you add "threadirqs" to
the command line. But we can be smart about this. The untested patch
below should address that issue. If that works on mainline we can
adapt it for RT (needs a trylock(&base->lock) there).

Though it's not a full solution. It needs some thought versus the
softirq code of timers. Assume we have only one timer queued 1000
ticks into the future. So this change will cause the timer softirq not
to be called until that timer expires and then the timer softirq is
going to do 1000 loops until it catches up with jiffies. That's
anything but pretty ...
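The 1000-tick concern can be made concrete with a toy model of the catch-up loop: when the timer softirq finally runs, it walks the wheel one jiffy at a time until it reaches the current jiffies value. The helpers below are hypothetical; after_eq() uses the same signed-difference trick as the kernel's time_after_eq().

```c
#include <assert.h>
#include <limits.h>

/* Wrap-safe "a is at or after b", in the style of time_after_eq(). */
static int after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/*
 * Toy model: process one tick per iteration, starting at the base's
 * next unprocessed tick, until it has caught up with `jiffies`.
 * Returns the number of iterations performed.
 */
static unsigned long catch_up_iterations(unsigned long timer_jiffies,
                                         unsigned long jiffies)
{
    unsigned long loops = 0;

    while (after_eq(jiffies, timer_jiffies)) {
        /* ...expire any timers due at timer_jiffies here... */
        timer_jiffies++;
        loops++;
    }
    return loops;
}
```

If the softirq was last run at tick 0 and is next raised at tick 1000, the model walks ticks 1 through 1000: 1000 iterations in one go.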

What worries me more is this one:

  pert-5229  [003] d..h1..   684.482618: softirq_raise: vec=9 [action=RCU]

The CPU has no callbacks as you shoved them over to cpu 0, so why is
the RCU softirq raised?

Thanks,

tglx
--
Message-id: 
|CONFIG_NO_HZ_FULL + CONFIG_PREEMPT_RT_FULL = nogo
Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Steven Rostedt 
---
 include/linux/hrtimer.h |  3 +--
 kernel/hrtimer.c| 31 +++
 kernel/timer.c  | 28 +---
 3 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 79a7a35..bdbf77db 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -461,9 +461,8 @@ extern int schedule_hrtimeout_range_clock(ktime_t *expires,
unsigned long delta, const enum hrtimer_mode mode, int clock);
 extern int schedule_hrtimeout(ktime_t *expires, const enum hrtimer_mode mode);
 
-/* Soft interrupt function to run the hrtimer queues: */
+/* Called from the periodic timer tick */
 extern void hrtimer_run_queues(void);
-extern void hrtimer_run_pending(void);
 
 /* Bootup initialization: */
 extern void __init hrtimers_init(void);
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index a63cfaf..a7e90b2 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1691,30 +1691,6 @@ static void run_hrtimer_softirq(struct softirq_action *h)
 }
 
 /*
- * Called from timer softirq every jiffy, expire hrtimers:
- *
- * For HRT its the fall back code to run the softirq in the timer
- * softirq context in case the hrtimer initialization failed or has
- * not been done yet.
- */
-void hrtimer_run_pending(void)
-{
-   if (hrtimer_hres_active())
-   return;
-
-   /*
-* This _is_ ugly: We have to check in the softirq context,
-* whether we can switch to highres and / or nohz mode. The
-* clocksource switch happens in the timer interrupt with
-* xtime_lock held. Notification from there only sets the
-* check bit in the tick_oneshot code, otherwise we might
-* deadlock vs. xtime_lock.
-*/
-   if (tick_check_oneshot_change(!hrtimer_is_hres_enabled()))
-   hrtimer_switch_to_hres();
-}
-
-/*
  * Called from hardirq context every jiffy
  */
 void hrtimer_run_queues(void)
@@ -1727,6 +1703,13 @@ void hrtimer_run_queues(void)
if (hrtimer_hres_active())
return;
 
+   /*
+* Check whether we can switch to highres mode.
+*/
+   if (tick_check_oneshot_change(!hrtimer_is_hres_enabled())
+   && hrtimer_switch_to_hres())
+   return;
+
for (index = 0; index < HRTIMER_MAX_CLOCK_BASES; index++) {
base = &cpu_base->clock_base[index];
if (!timerqueue_getnext(&base->active))
diff --git a/kernel/timer.c b/kernel/timer.c
index 48652cc..3d7313c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1443,8 +1443,6 @@ static void run_timer_softirq(struct softirq_action *h)
irq_work_run();
 #endif
 
-   hrtimer_run_pending();
-
if (time_after_eq(jiffies, base->timer_jiffies))
__run_timers(base);
 }
@@ -1454,8 +1452,32 @@ static void run_timer_softirq(struct softirq_action *h)
  */
 void run_local_timers(void)
 {
+   struct tvec_base *base = __this_cpu_read(tvec_bases);
+
hrtimer_run_queues();
-   raise_softirq(TIMER_SOFTIRQ);
+

[PATCH RT 07/14] rt: Make cpu_chill() use hrtimer instead of msleep()

2014-02-28 Thread Steven Rostedt

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Steven Rostedt 

Ulrich Obergfell pointed out that cpu_chill() calls msleep(), which is woken
up by the ksoftirqd running the TIMER softirq. But as cpu_chill() is
called from softirq context, it may block ksoftirqd from running, in
which case it may never wake up the msleep(), causing a deadlock.

I checked the vmcore, and irq/74-qla2xxx is stuck in the msleep() call,
running on CPU 8. The one ksoftirqd that is stuck happens to be the one that
runs on CPU 8, and it is blocked on a lock held by irq/74-qla2xxx. As that
ksoftirqd is the one that will wake up irq/74-qla2xxx, and it happens to be
blocked on a lock that irq/74-qla2xxx holds, we have our deadlock.

The solution is not to convert the cpu_chill() back to a cpu_relax() as that
will re-create a possible live lock that the cpu_chill() fixed earlier, and may
also leave this bug open on other softirqs. The fix is to remove the
dependency on ksoftirqd from cpu_chill(). That is, instead of calling
msleep() that requires ksoftirqd to wake it up, use the
hrtimer_nanosleep() code that does the wakeup from hard irq context.

|Looks to be the lock of the block softirq. I don't have the core dump
|anymore, but from what I could tell the ksoftirqd was blocked on the
|block softirq lock, where the block softirq handler did a msleep
|(called by the qla2xxx interrupt handler).
|
|Looking at trigger_softirq() in block/blk-softirq.c, it can do a
|smp_callfunction() to another cpu to run the block softirq. If that
|happens to be the cpu where the qla2xxx irq handler is doing the block
|softirq and is in the middle of a msleep(), I believe the ksoftirqd will
|try to run the softirq. If it does that, then BOOM, it's deadlocked
|because the ksoftirqd will never run the timer softirq either.

|I should have also stated that it was only one lock that was involved.
|But the lock owner was doing a msleep() that requires a wakeup by
|ksoftirqd to continue. If ksoftirqd happens to be blocked on a lock
|held by the msleep() caller, then you have your deadlock.
|
|It's best not to have any softirqs going to sleep requiring another
|softirq to wake it up. Note, if we ever require a timer softirq to do a
|cpu_chill() it will most definitely hit this deadlock.
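Userspace has no ksoftirqd, but the shape of the new cpu_chill() can still be shown: a 1 ms relative sleep on CLOCK_MONOTONIC, mirroring the HRTIMER_MODE_REL and CLOCK_MONOTONIC arguments in the patch. This is a sketch using POSIX clock_nanosleep(), not the kernel's hrtimer_nanosleep(), and the function name is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/*
 * Sleep ~1 ms with a relative CLOCK_MONOTONIC timeout and return the
 * measured sleep in nanoseconds (or -1 on error).
 */
static long chill_ns(void)
{
    struct timespec tu = { .tv_sec = 0, .tv_nsec = 1000000L }; /* 1 ms */
    struct timespec start, end;
    int err;

    clock_gettime(CLOCK_MONOTONIC, &start);
    /* clock_nanosleep() returns the error number; on EINTR, tu holds the remainder. */
    while ((err = clock_nanosleep(CLOCK_MONOTONIC, 0, &tu, &tu)) == EINTR)
        ;
    clock_gettime(CLOCK_MONOTONIC, &end);
    if (err != 0)
        return -1;

    return (long)(end.tv_sec - start.tv_sec) * 1000000000L +
           (end.tv_nsec - start.tv_nsec);
}
```

Because both the sleep and the measurement use CLOCK_MONOTONIC, the measured duration is never shorter than the requested 1 ms.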

Cc: stable...@vger.kernel.org
Found-by: Ulrich Obergfell 
Signed-off-by: Steven Rostedt 
[bigeasy: add the 4 | chapters from email]
Signed-off-by: Sebastian Andrzej Siewior 
---
 include/linux/delay.h |  2 +-
 kernel/hrtimer.c  | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/linux/delay.h b/include/linux/delay.h
index e23a7c0..37caab3 100644
--- a/include/linux/delay.h
+++ b/include/linux/delay.h
@@ -53,7 +53,7 @@ static inline void ssleep(unsigned int seconds)
 }
 
 #ifdef CONFIG_PREEMPT_RT_FULL
-# define cpu_chill()   msleep(1)
+extern void cpu_chill(void);
 #else
 # define cpu_chill()   cpu_relax()
 #endif
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index a7e90b2..b569c6d 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1887,6 +1887,21 @@ SYSCALL_DEFINE2(nanosleep, struct timespec __user *, rqtp,
return hrtimer_nanosleep(&tu, rmtp, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
 }
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+/*
+ * Sleep for 1 ms in hope whoever holds what we want will let it go.
+ */
+void cpu_chill(void)
+{
+   struct timespec tu = {
+   .tv_nsec = NSEC_PER_MSEC,
+   };
+
+   hrtimer_nanosleep(&tu, NULL, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
+}
+EXPORT_SYMBOL(cpu_chill);
+#endif
+
 /*
  * Functions related to boot-time initialization:
  */
-- 
1.8.5.3



[PATCH RT 04/14] timer/rt: Always raise the softirq if there's irq_work to be done

2014-02-28 Thread Steven Rostedt

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Steven Rostedt 

It was previously discovered that some systems would hang on bootup
with a previous version of 3.12-rt. This was due to RCU using irq_work,
which RT defers to a softirq. But if there are no active timers, the
softirq is not raised, the RCU work does not get done, and the system
hangs. The fix was to check whether there were no active timers but
irq_work to be done, and if so, raise the softirq.

But this fix was not 100% correct. It left out the case where there are
active timers that have not expired yet. In that case the softirq would
not get raised even if there was irq work to be done.

If there is irq_work to be done, then we must raise the timer softirq
regardless of whether there are active timers or whether they have
expired. The softirq can handle those cases. But we can never ignore
irq_work.

As only PREEMPT_RT_FULL requires irq_work to be done in the softirq, we
can pull the check out of the active_timers condition and make the code
a bit cleaner by keeping the irq_work check separate, inside the other
#ifdef CONFIG_PREEMPT_RT_FULL block. If there is irq_work to be done,
there's no need to check the active timers or whether they have expired:
just raise the timer softirq and be done with it. Otherwise, we can do
the timer checks just as we do on non-RT.
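
The decision that run_local_timers() makes on PREEMPT_RT_FULL after this change can be summarized as a small pure function. This is a hypothetical model for illustration only. The function names are made up, and the inputs stand in for irq_work_needs_cpu(), the base-lock trylock, base->active_timers, base->next_timer and jiffies. The locking and the actual raise_softirq(TIMER_SOFTIRQ) call are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Wrap-safe "a is at or before b" for unsigned tick counters, written in
 * the same way as the kernel's time_before_eq(a, b) macro. */
static bool demo_time_before_eq(unsigned long a, unsigned long b)
{
	return (long)(b - a) >= 0;
}

/*
 * Hypothetical model of the RT path through run_local_timers() after this
 * patch. Returns true when TIMER_SOFTIRQ should be raised.
 */
static bool demo_rt_should_raise(bool irq_work_pending, bool got_base_lock,
				 unsigned long active_timers,
				 unsigned long next_timer, unsigned long now)
{
	/* On RT, irq work runs from the timer softirq, so it always wins. */
	if (irq_work_pending)
		return true;
	/* Could not take the base lock: let the softirq sort it out. */
	if (!got_base_lock)
		return true;
	/* From here on, the same timer checks as without the irq_work case. */
	if (!active_timers)
		return false;
	/* Raise only if the next pending timer has expired. */
	return demo_time_before_eq(next_timer, now);
}
```

The earlier fix missed the case where irq work is pending while an active timer has not expired yet, for example demo_rt_should_raise(true, true, 1, 100, 50). The model above returns true for it.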

Cc: stable...@vger.kernel.org
Signed-off-by: Steven Rostedt 
Signed-off-by: Sebastian Andrzej Siewior 
---
 kernel/timer.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 25a38c4..f63a793 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1461,18 +1461,20 @@ void run_local_timers(void)
 * the timer softirq.
 */
 #ifdef CONFIG_PREEMPT_RT_FULL
+   /* On RT, irq work runs from softirq */
+   if (irq_work_needs_cpu()) {
+   raise_softirq(TIMER_SOFTIRQ);
+   return;
+   }
+
if (!spin_do_trylock(&base->lock)) {
raise_softirq(TIMER_SOFTIRQ);
return;
}
 #endif
-   if (!base->active_timers) {
-#ifdef CONFIG_PREEMPT_RT_FULL
-   /* On RT, irq work runs from softirq */
-   if (!irq_work_needs_cpu())
-#endif
-   goto out;
-   }
+
+   if (!base->active_timers)
+   goto out;
 
/* Check whether the next pending timer has expired */
if (time_before_eq(base->next_timer, jiffies))
-- 
1.8.5.3



[PATCH RT 06/14] Revert "x86: Disable IST stacks for debug/int 3/stack fault for PREEMPT_RT"

2014-02-28 Thread Steven Rostedt

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Sebastian Andrzej Siewior 

Where do I start? Let me explain what is going on here. The code
sequence
| pushf
| pop    %edx
| or     $0x1,%dh
| push   %edx
| mov    $0xe0,%eax
| popf
| sysenter

triggers the bug. On a 64-bit kernel we see the double fault (with both
32-bit and 64-bit userland); on a 32-bit kernel there is no problem. The
reporter said that the double fault does not happen on a 64-bit kernel
with 64-bit userland; this is because in that case the VDSO uses the
"syscall" interface instead of "sysenter".
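
The or $0x1,%dh step in the sequence above is what sets TF. %dh is bits 8..15 of %edx, and TF is bit 8 of EFLAGS (0x100). The following C sketch models just that register arithmetic. The function and macro names are hypothetical, and the sketch does not attempt to model popf, sysenter or the resulting exception.

```c
#include <assert.h>
#include <stdint.h>

/* TF (trap flag) is bit 8 of EFLAGS; while it is set, the CPU raises a
 * debug exception (#DB) after each instruction. */
#define DEMO_EFLAGS_TF (UINT32_C(1) << 8)   /* 0x100 */

/* Model of "or $0x1,%dh": %dh is bits 8..15 of %edx, so OR-ing an 8-bit
 * immediate into %dh ORs (imm << 8) into the 32-bit register. */
static uint32_t demo_or_dh(uint32_t edx, uint8_t imm)
{
	uint8_t dh = (uint8_t)((edx >> 8) & 0xffu);

	dh |= imm;
	return (edx & ~UINT32_C(0x0000ff00)) | ((uint32_t)dh << 8);
}

/* True if the flags value that popf would load has TF set. */
static int demo_tf_set(uint32_t eflags)
{
	return (eflags & DEMO_EFLAGS_TF) != 0;
}
```

Consider a typical user-mode flags value of 0x202 (IF plus the always-one bit 1). The popf in the sequence then loads 0x302, so TF is set and the CPU single-steps straight into sysenter.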

The bug: "popf" loads the flags with the TF bit set, which enables
single-stepping and leads to a debug exception. On 64-bit we normally
have a dedicated IST stack for the debug exception, but due to patch [0]
we use the kernel stack instead. On 64-bit, the sysenter instruction
enters the kernel with a NULL stack pointer. In the code sequence above,
the debug exception (TF flag) is taken right after sysenter has executed
and set the stack pointer to NULL, so we fault (it seems that the debug
exception saves some bytes on the stack).

To fix the double fault I'm going to drop patch [0]. It is completely
pointless: in do_debug() and do_stack_segment() we disable preemption,
which means the task can't leave the CPU, so it does not matter whether
we run on the IST stack or on the kernel stack.
There is a patch [1] which drops the preempt_disable() call for a 32-bit
kernel but not for 64-bit, so there should be no regression.
And [1] seems valid even for this code sequence: we enter the debug
exception on a 256-byte per-CPU stack and migrate to the kernel stack
before calling do_debug().

[0] x86-disable-debug-stack.patch
[1] fix-rt-int3-x86_32-3.2-rt.patch

Cc: stable...@vger.kernel.org
Reported-by: Brian Silverman 
Cc: Andi Kleen 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Steven Rostedt 
---
 arch/x86/include/asm/page_64_types.h | 21 ++---
 arch/x86/kernel/cpu/common.c |  2 --
 arch/x86/kernel/dumpstack_64.c   |  4 
 3 files changed, 6 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h
index 3cedb22..6c896fb 100644
--- a/arch/x86/include/asm/page_64_types.h
+++ b/arch/x86/include/asm/page_64_types.h
@@ -14,21 +14,12 @@
 #define IRQ_STACK_ORDER 2
 #define IRQ_STACK_SIZE (PAGE_SIZE << IRQ_STACK_ORDER)
 
-#ifdef CONFIG_PREEMPT_RT_FULL
-# define STACKFAULT_STACK 0
-# define DOUBLEFAULT_STACK 1
-# define NMI_STACK 2
-# define DEBUG_STACK 0
-# define MCE_STACK 3
-# define N_EXCEPTION_STACKS 3  /* hw limit: 7 */
-#else
-# define STACKFAULT_STACK 1
-# define DOUBLEFAULT_STACK 2
-# define NMI_STACK 3
-# define DEBUG_STACK 4
-# define MCE_STACK 5
-# define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
-#endif
+#define STACKFAULT_STACK 1
+#define DOUBLEFAULT_STACK 2
+#define NMI_STACK 3
+#define DEBUG_STACK 4
+#define MCE_STACK 5
+#define N_EXCEPTION_STACKS 5  /* hw limit: 7 */
 
 #define PUD_PAGE_SIZE  (_AC(1, UL) << PUD_SHIFT)
 #define PUD_PAGE_MASK  (~(PUD_PAGE_SIZE-1))
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index d39993b..deeb48d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1109,9 +1109,7 @@ DEFINE_PER_CPU(struct task_struct *, fpu_owner_task);
  */
 static const unsigned int exception_stack_sizes[N_EXCEPTION_STACKS] = {
  [0 ... N_EXCEPTION_STACKS - 1]= EXCEPTION_STKSZ,
-#if DEBUG_STACK > 0
  [DEBUG_STACK - 1] = DEBUG_STKSZ
-#endif
 };
 
 static DEFINE_PER_CPU_PAGE_ALIGNED(char, exception_stacks
diff --git a/arch/x86/kernel/dumpstack_64.c b/arch/x86/kernel/dumpstack_64.c
index 52b4bcd..addb207 100644
--- a/arch/x86/kernel/dumpstack_64.c
+++ b/arch/x86/kernel/dumpstack_64.c
@@ -21,14 +21,10 @@
(N_EXCEPTION_STACKS + DEBUG_STKSZ/EXCEPTION_STKSZ - 2)
 
 static char x86_stack_ids[][8] = {
-#if DEBUG_STACK > 0
[ DEBUG_STACK-1 ]   = "#DB",
-#endif
[ NMI_STACK-1   ]   = "NMI",
[ DOUBLEFAULT_STACK-1   ]   = "#DF",
-#if STACKFAULT_STACK > 0
[ STACKFAULT_STACK-1]   = "#SS",
-#endif
[ MCE_STACK-1   ]   = "#MC",
 #if DEBUG_STKSZ > EXCEPTION_STKSZ
[ N_EXCEPTION_STACKS ...
-- 
1.8.5.3



Re: [PATCH 14/28] Remove MACH_SMDKC210

2014-02-28 Thread Mark Brown

On Fri, Feb 28, 2014 at 10:43:09PM +0100, Paul Bolle wrote:

> That commit is fine with me, of course. I now see no reason to continue
> my, rather slowly progressing, search for the problem that you wanted to
> get properly fixed. I suppose another commit already fixed it.

No, but it's someone from Samsung (Sachin works with the Samsung landing
team at Linaro) not caring about those drivers on these boards any more
and mentioning the DT conversion which is rather different to someone
doing mechanical cleanup with no mention of where the symbols went.

It should be fairly obvious that if symbols are being removed due to
DT conversion of the platforms, then the default should be that the
drivers are converted to DT and appropriate DT entries are added to
the board DTS files. More generally, the important thing is that some
understanding is shown of why the symbols vanished and why the
suggested mechanical fix is OK.


[PATCH RT 08/14] kernel/hrtimer: be non-freezeable in cpu_chill()

2014-02-28 Thread Steven Rostedt

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Sebastian Andrzej Siewior 

Since we replaced msleep() with an hrtimer-based sleep, I now and then (rarely) see this:

| [] Waiting for /dev to be fully populated...
| =
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -
| 1 lock held by udevd/229:
|  #0:  (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)

For now I see no better way than to disable the freezer for the duration of the sleep.
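
The flag handling in this patch follows a save, set and conditionally clear pattern. PF_NOFREEZE is forced on for the sleep and cleared afterwards only if the task did not already have it. Below is a stand-alone sketch of that pattern. A plain unsigned int stands in for current->flags, the bit value is illustrative, and a hypothetical callback takes the place of hrtimer_nanosleep().

```c
#include <assert.h>

/* Illustrative stand-in for PF_NOFREEZE; the value is arbitrary here. */
#define DEMO_PF_NOFREEZE 0x8000u

/* Records the flags seen while the "sleep" runs, for inspection. */
static unsigned int demo_flags_during_sleep;

static void demo_sleep(const unsigned int *flags)
{
	demo_flags_during_sleep = *flags;
}

/* Same save/set/conditionally-clear pattern as the patched cpu_chill():
 * set the bit for the duration of the sleep, and clear it afterwards only
 * if the caller did not already have it set. */
static void demo_chill_nofreeze(unsigned int *flags,
				void (*sleep_fn)(const unsigned int *))
{
	unsigned int freeze_flag = *flags & DEMO_PF_NOFREEZE;

	*flags |= DEMO_PF_NOFREEZE;
	sleep_fn(flags);
	if (!freeze_flag)
		*flags &= ~DEMO_PF_NOFREEZE;
}
```

Restoring only when the bit was clear on entry keeps the helper from un-marking a task that was already non-freezable for some other reason.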

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Steven Rostedt 
---
 kernel/hrtimer.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index b569c6d..eb4c8831c 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1896,8 +1896,12 @@ void cpu_chill(void)
struct timespec tu = {
.tv_nsec = NSEC_PER_MSEC,
};
+   unsigned int freeze_flag = current->flags & PF_NOFREEZE;
 
+   current->flags |= PF_NOFREEZE;
hrtimer_nanosleep(&tu, NULL, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
+   if (!freeze_flag)
+   current->flags &= ~PF_NOFREEZE;
 }
 EXPORT_SYMBOL(cpu_chill);
 #endif
-- 
1.8.5.3



[PATCH RT 13/14] rcu: Eliminate softirq processing from rcutree

2014-02-28 Thread Steven Rostedt

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: "Paul E. McKenney" 

Running RCU out of softirq is a problem for some workloads that would
like to manage RCU core processing independently of other softirq work,
for example, setting kthread priority.  This commit therefore moves the
RCU core work from softirq to a per-CPU/per-flavor SCHED_OTHER kthread
named rcuc.  The SCHED_OTHER approach avoids the scalability problems
that appeared with the earlier attempt to move RCU core processing
from softirq to kthreads.  That said, kernels built with RCU_BOOST=y
will run the rcuc kthreads at the RCU-boosting priority.
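
In this patch, the handoff between invoke_rcu_core() and the rcuc kthread comes down to two things. One is a has-work flag. The other is a wake-up rule: a kthread that is yielding is woken only when the caller is the idle task. The single-threaded C model below is a hypothetical simplification for illustration. It has no per-CPU data, no interrupt disabling, no spin counting and no real scheduling. The demo_* names and states only loosely mirror rcu_cpu_has_work, rcu_cpu_kthread_status and rcu_wake_cond().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states, loosely modelled on RCU_KTHREAD_*. */
enum demo_status { DEMO_RUNNING, DEMO_WAITING, DEMO_OFFCPU, DEMO_YIELDING };

/* One CPU's worth of state; the real code keeps these in per-CPU variables. */
struct demo_rcuc {
	bool has_work;            /* like rcu_cpu_has_work */
	enum demo_status status;  /* like rcu_cpu_kthread_status */
	int wakeups;              /* how often the kthread was woken */
	int batches;              /* how often callbacks were processed */
};

/* Same rule as rcu_wake_cond(): a yielding kthread is woken only from idle. */
static bool demo_should_wake(enum demo_status status, bool caller_is_idle)
{
	return status != DEMO_YIELDING || caller_is_idle;
}

/* Like invoke_rcu_core(): flag the work, then wake the kthread if allowed. */
static void demo_invoke_core(struct demo_rcuc *c, bool caller_is_idle)
{
	c->has_work = true;
	if (demo_should_wake(c->status, caller_is_idle))
		c->wakeups++;
}

/* One pass of the kthread loop: take and clear the flag, then process. */
static void demo_kthread_pass(struct demo_rcuc *c)
{
	bool work = c->has_work;

	c->status = DEMO_RUNNING;
	c->has_work = false;
	if (work)
		c->batches++;   /* stands in for rcu_process_callbacks() */
	c->status = DEMO_WAITING;
}
```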

Cc: stable...@vger.kernel.org
Reported-by: Thomas Gleixner 
Tested-by: Mike Galbraith 
Signed-off-by: Paul E. McKenney 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Steven Rostedt 
---
 kernel/rcutree.c| 113 +++-
 kernel/rcutree.h|   3 +-
 kernel/rcutree_plugin.h | 134 +---
 3 files changed, 113 insertions(+), 137 deletions(-)

diff --git a/kernel/rcutree.c b/kernel/rcutree.c
index 5439cee..f3bc6eb 100644
--- a/kernel/rcutree.c
+++ b/kernel/rcutree.c
@@ -53,6 +53,11 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include "time/tick-internal.h"
 
 #include "rcutree.h"
 #include 
@@ -128,8 +133,6 @@ EXPORT_SYMBOL_GPL(rcu_scheduler_active);
  */
 static int rcu_scheduler_fully_active __read_mostly;
 
-#ifdef CONFIG_RCU_BOOST
-
 /*
  * Control variables for per-CPU and per-rcu_node kthreads.  These
  * handle all flavors of RCU.
@@ -139,8 +142,6 @@ DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_status);
 DEFINE_PER_CPU(unsigned int, rcu_cpu_kthread_loops);
 DEFINE_PER_CPU(char, rcu_cpu_has_work);
 
-#endif /* #ifdef CONFIG_RCU_BOOST */
-
 static void rcu_boost_kthread_setaffinity(struct rcu_node *rnp, int outgoingcpu);
 static void invoke_rcu_core(void);
 static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp);
@@ -2312,16 +2313,14 @@ __rcu_process_callbacks(struct rcu_state *rsp)
 /*
  * Do RCU core processing for the current CPU.
  */
-static void rcu_process_callbacks(struct softirq_action *unused)
+static void rcu_process_callbacks(void)
 {
struct rcu_state *rsp;
 
if (cpu_is_offline(smp_processor_id()))
return;
-   trace_rcu_utilization("Start RCU core");
for_each_rcu_flavor(rsp)
__rcu_process_callbacks(rsp);
-   trace_rcu_utilization("End RCU core");
 }
 
 /*
@@ -2335,18 +2334,105 @@ static void invoke_rcu_callbacks(struct rcu_state *rsp, struct rcu_data *rdp)
 {
if (unlikely(!ACCESS_ONCE(rcu_scheduler_fully_active)))
return;
-   if (likely(!rsp->boost)) {
-   rcu_do_batch(rsp, rdp);
+   rcu_do_batch(rsp, rdp);
+}
+
+static void rcu_wake_cond(struct task_struct *t, int status)
+{
+   /*
+* If the thread is yielding, only wake it when this
+* is invoked from idle
+*/
+   if (t && (status != RCU_KTHREAD_YIELDING || is_idle_task(current)))
+   wake_up_process(t);
+}
+
+/*
+ * Wake up this CPU's rcuc kthread to do RCU core processing.
+ */
+static void invoke_rcu_core(void)
+{
+   unsigned long flags;
+   struct task_struct *t;
+
+   if (!cpu_online(smp_processor_id()))
return;
+   local_irq_save(flags);
+   __this_cpu_write(rcu_cpu_has_work, 1);
+   t = __this_cpu_read(rcu_cpu_kthread_task);
+   if (t != NULL && current != t)
+   rcu_wake_cond(t, __this_cpu_read(rcu_cpu_kthread_status));
+   local_irq_restore(flags);
+}
+
+static void rcu_cpu_kthread_park(unsigned int cpu)
+{
+   per_cpu(rcu_cpu_kthread_status, cpu) = RCU_KTHREAD_OFFCPU;
+}
+
+static int rcu_cpu_kthread_should_run(unsigned int cpu)
+{
+   return __this_cpu_read(rcu_cpu_has_work);
+}
+
+/*
+ * Per-CPU kernel thread that invokes RCU callbacks.  This replaces the
+ * RCU softirq used in flavors and configurations of RCU that do not
+ * support RCU priority boosting.
+ */
+static void rcu_cpu_kthread(unsigned int cpu)
+{
+   unsigned int *statusp = &__get_cpu_var(rcu_cpu_kthread_status);
+   char work, *workp = &__get_cpu_var(rcu_cpu_has_work);
+   int spincnt;
+
+   for (spincnt = 0; spincnt < 10; spincnt++) {
+   trace_rcu_utilization("Start CPU kthread@rcu_wait");
+   local_bh_disable();
+   *statusp = RCU_KTHREAD_RUNNING;
+   this_cpu_inc(rcu_cpu_kthread_loops);
+   local_irq_disable();
+   work = *workp;
+   *workp = 0;
+   local_irq_enable();
+   if (work)
+   rcu_process_callbacks();
+   local_bh_enable();
+   if (*workp == 0) {
+   trace_rcu_utilization("End CPU kthread@rcu_wait");
+

[PATCH RT 09/14] irq_work: allow certain work in hard irq context

2014-02-28 Thread Steven Rostedt

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Sebastian Andrzej Siewior 

irq_work is processed in softirq context on -RT because we want to avoid
long latencies which might arise from processing lots of perf events.
The noHZ-full mode requires its callback to be called from real hardirq
context (commit 76c24fb ("nohz: New APIs to re-evaluate the tick on full
dynticks CPUs")). If it is called from a thread context we might get
wrong results for checks like "is_idle_task(current)".
This patch introduces a second list (hirq_work_list), which holds only
work items marked with IRQ_WORK_HARD_IRQ and is processed when
irq_work_run() is invoked from hardirq context.
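
The list selection can be modelled in a few lines. This is a hypothetical, single-threaded sketch. The demo_* names are made up, plain linked lists stand in for the lock-free per-CPU llist_head lists, and a boolean stands in for in_irq(). Only the flag value 8 is taken from the patch (IRQ_WORK_HARD_IRQ).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_IRQ_WORK_HARD_IRQ 8UL   /* same bit value as IRQ_WORK_HARD_IRQ */

struct demo_work {
	unsigned long flags;
	struct demo_work *next;
};

/* One CPU's pair of lists. */
struct demo_lists {
	struct demo_work *soft;   /* like irq_work_list, run from softirq on RT */
	struct demo_work *hard;   /* like hirq_work_list, run from hardirq */
};

/* Like the RT branch of irq_work_queue(): pick the list by flag. */
static void demo_queue(struct demo_lists *l, struct demo_work *w)
{
	struct demo_work **head =
		(w->flags & DEMO_IRQ_WORK_HARD_IRQ) ? &l->hard : &l->soft;

	w->next = *head;
	*head = w;
}

/* Like the list choice in __irq_work_run(): a hardirq caller only drains
 * the hard list, anyone else only the soft list. Returns items run. */
static int demo_run(struct demo_lists *l, bool in_hardirq)
{
	struct demo_work **head = in_hardirq ? &l->hard : &l->soft;
	int n = 0;

	for (struct demo_work *w = *head; w != NULL; w = w->next)
		n++;    /* a real implementation would call the work's func here */
	*head = NULL;
	return n;
}
```

In the patch, the nohz_full kick work is the item marked this way, so on RT it still runs from real hardirq context.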

This patch also removes arch_irq_work_raise() from sparc and powerpc, as
is already done for x86. At least for powerpc it is somewhat superfluous,
because it is called from the timer interrupt, which should invoke
update_process_times().

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Steven Rostedt 
---
 arch/powerpc/kernel/time.c |  2 +-
 arch/sparc/kernel/pcr.c|  2 ++
 include/linux/irq_work.h   |  1 +
 kernel/irq_work.c  | 22 +++---
 kernel/time/tick-sched.c   |  1 +
 kernel/timer.c |  2 +-
 6 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 5fc29ad..7cc55b2 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -423,7 +423,7 @@ unsigned long profile_pc(struct pt_regs *regs)
 EXPORT_SYMBOL(profile_pc);
 #endif
 
-#ifdef CONFIG_IRQ_WORK
+#if defined(CONFIG_IRQ_WORK) && !defined(CONFIG_PREEMPT_RT_FULL)
 
 /*
  * 64-bit uses a byte in the PACA, 32-bit uses a per-cpu variable...
diff --git a/arch/sparc/kernel/pcr.c b/arch/sparc/kernel/pcr.c
index 269af58..dbb51a6 100644
--- a/arch/sparc/kernel/pcr.c
+++ b/arch/sparc/kernel/pcr.c
@@ -43,10 +43,12 @@ void __irq_entry deferred_pcr_work_irq(int irq, struct pt_regs *regs)
set_irq_regs(old_regs);
 }
 
+#ifndef CONFIG_PREEMPT_RT_FULL
 void arch_irq_work_raise(void)
 {
set_softint(1 << PIL_DEFERRED_PCR_WORK);
 }
+#endif
 
 const struct pcr_ops *pcr_ops;
 EXPORT_SYMBOL_GPL(pcr_ops);
diff --git a/include/linux/irq_work.h b/include/linux/irq_work.h
index 6601702..60c19ee 100644
--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -16,6 +16,7 @@
 #define IRQ_WORK_BUSY  2UL
 #define IRQ_WORK_FLAGS 3UL
 #define IRQ_WORK_LAZY  4UL /* Doesn't want IPI, wait for tick */
+#define IRQ_WORK_HARD_IRQ  8UL /* Run hard IRQ context, even on RT */
 
 struct irq_work {
unsigned long flags;
diff --git a/kernel/irq_work.c b/kernel/irq_work.c
index f6e4377..35d21f9 100644
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -20,6 +20,9 @@
 
 
 static DEFINE_PER_CPU(struct llist_head, irq_work_list);
+#ifdef CONFIG_PREEMPT_RT_FULL
+static DEFINE_PER_CPU(struct llist_head, hirq_work_list);
+#endif
 static DEFINE_PER_CPU(int, irq_work_raised);
 
 /*
@@ -48,7 +51,11 @@ static bool irq_work_claim(struct irq_work *work)
return true;
 }
 
+#ifdef CONFIG_PREEMPT_RT_FULL
+void arch_irq_work_raise(void)
+#else
 void __weak arch_irq_work_raise(void)
+#endif
 {
/*
 * Lame architectures will get the timer tick callback
@@ -70,8 +77,12 @@ void irq_work_queue(struct irq_work *work)
/* Queue the entry and raise the IPI if needed. */
preempt_disable();
 
-   llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
-
+#ifdef CONFIG_PREEMPT_RT_FULL
+   if (work->flags & IRQ_WORK_HARD_IRQ)
+   llist_add(&work->llnode, &__get_cpu_var(hirq_work_list));
+   else
+#endif
+   llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
/*
 * If the work is not "lazy" or the tick is stopped, raise the irq
 * work interrupt (if supported by the arch), otherwise, just wait
@@ -115,7 +126,12 @@ static void __irq_work_run(void)
__this_cpu_write(irq_work_raised, 0);
barrier();
 
-   this_list = &__get_cpu_var(irq_work_list);
+#ifdef CONFIG_PREEMPT_RT_FULL
+   if (in_irq())
+   this_list = &__get_cpu_var(hirq_work_list);
+   else
+#endif
+   this_list = &__get_cpu_var(irq_work_list);
if (llist_empty(this_list))
return;
 
diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
index 4e657e5..aa1e4b2 100644
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -214,6 +214,7 @@ static void nohz_full_kick_work_func(struct irq_work *work)
 
 static DEFINE_PER_CPU(struct irq_work, nohz_full_kick_work) = {
.func = nohz_full_kick_work_func,
+   .flags = IRQ_WORK_HARD_IRQ,
 };
 
 /*
diff --git a/kernel/timer.c b/kernel/timer.c
index f63a793..76846a1 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1425,7 +1425,7 @@ void update_process_times(int user_tick)

[PATCH] lib: radix: return correct error code on insertion failure

2014-02-28 Thread Sasha Levin

We never check the return value of __radix_tree_create() on insertion,
which causes us to return -EEXIST on all failures, even when the
failure is, for example, running out of memory.

This would trigger errors in various code that assumed that -EEXIST is
a critical failure, as opposed to a "regular" error. For example, it
would trigger a VM_BUG_ON in mm's swap handling:

[  469.636769] kernel BUG at mm/swap_state.c:113!
[  469.636769] invalid opcode:  [#1] PREEMPT SMP DEBUG_PAGEALLOC
[  469.638313] Dumping ftrace buffer:
[  469.638526](ftrace buffer empty)
[  469.640016] Modules linked in:
[  469.640110] CPU: 54 PID: 4598 Comm: kswapd6 Tainted: GW 3.14.0-rc4-next-20140228-sasha-00012-g6bbcf46-dirty #29
[  469.640110] task: 8802850d3000 ti: 8802850cc000 task.ti: 8802850cc000
[  469.640110] RIP: 0010:[]  [] __add_to_swap_cache+0x132/0x170
[  469.640110] RSP: :8802850cd7a8  EFLAGS: 00010246
[  469.640110] RAX: 8001 RBX: ea000a02ca00 RCX: 
[  469.640110] RDX: 0001 RSI: 0001 RDI: 
[  469.640110] RBP: 8802850cd7c8 R08:  R09: 
[  469.640110] R10: 0001 R11: 0001 R12: 868c2e18
[  469.640110] R13: 868c2e30 R14: ffef R15: 
[  469.640110] FS:  () GS:88028680() knlGS:
[  469.640110] CS:  0010 DS:  ES:  CR0: 8005003b
[  469.640110] CR2: 029c23b0 CR3: 824ca000 CR4: 06e0
[  469.640110] Stack:
[  469.640110]  ea000a02ca00 0204c037 8802850cd9c8 ea000a02ca00
[  469.640110]  8802850cd7f8 81296cac 8802850cd9c8 8802850cd9c8
[  469.640110]  ea000a02ca00 0204c037 8802850cd828 81296d90
[  469.640110] Call Trace:
[  469.640110]  [] add_to_swap_cache+0x2c/0x60
[  469.640110]  [] add_to_swap+0xb0/0xe0
[  469.640110]  [] shrink_page_list+0x411/0x7c0
[  469.640110]  [] shrink_inactive_list+0x31c/0x570
[  469.640110]  [] ? shrink_active_list+0x30b/0x320
[  469.640110]  [] shrink_lruvec+0x124/0x300
[  469.640110]  [] shrink_zone+0x8e/0x1d0
[  469.640110]  [] kswapd_shrink_zone+0xf1/0x1b0
[  469.640110]  [] balance_pgdat+0x363/0x540
[  469.640110]  [] kswapd+0x2b3/0x310
[  469.640110]  [] ? ftrace_raw_event_mm_vmscan_writepage+0x180/0x180
[  469.640110]  [] kthread+0x105/0x110
[  469.640110]  [] ? __lock_release+0x1e2/0x200
[  469.640110]  [] ? set_kthreadd_affinity+0x30/0x30
[  469.640110]  [] ret_from_fork+0x7c/0xb0
[  469.640110]  [] ? set_kthreadd_affinity+0x30/0x30
[  469.640110] Code: 00 00 be 0a 00 00 00 e8 0d ae fd ff 48 ff 05 b6 33 d2 06 4c 89 ef e8 1e f6 0f 03 eb 2c 4c 89 ef e8 14 f6 0f 03 41 83 fe ef 75 04 <0f> 0b eb fe 48 c7 43 30 00 00 00 00 f0 80 63 02 fe 48 89 df e8
[  469.640110] RIP  [] __add_to_swap_cache+0x132/0x170
[  469.640110]  RSP 
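
The fix is purely about ordering: report the error from tree creation before looking at the slot. Below is a minimal user-space sketch of that ordering. A hypothetical demo_create() stands in for __radix_tree_create(). It either fails with the given error code or hands back a pointer to a slot.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for __radix_tree_create(): fails with fail_with
 * (e.g. -ENOMEM) if it is nonzero, otherwise returns the slot via slotp. */
static int demo_create(void **storage, int fail_with, void ***slotp)
{
	if (fail_with)
		return fail_with;
	*slotp = storage;
	return 0;
}

/* Insert with the patched ordering: propagate a creation error first,
 * and only then report -EEXIST for an occupied slot. */
static int demo_insert(void **storage, int fail_with, void *item)
{
	void **slot = NULL;
	int error = demo_create(storage, fail_with, &slot);

	if (error)
		return error;
	if (*slot != NULL)
		return -EEXIST;
	*slot = item;
	return 0;
}
```

With this ordering, a caller such as __add_to_swap_cache() sees -ENOMEM for an allocation failure. It no longer sees an -EEXIST that it treats as a bug.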

Signed-off-by: Sasha Levin 
---
 lib/radix-tree.c |2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index f5ea7c9..9599aa7 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -444,6 +444,8 @@ int radix_tree_insert(struct radix_tree_root *root,
BUG_ON(radix_tree_is_indirect_ptr(item));
 
error = __radix_tree_create(root, index, &node, &slot);
+   if (error)
+   return error;
if (*slot != NULL)
return -EEXIST;
rcu_assign_pointer(*slot, item);
-- 
1.7.2.5


[PATCH RT 11/14] net: ip_send_unicast_reply: add missing local serialization

2014-02-28 Thread Steven Rostedt

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Nicholas Mc Guire 

In response to the oops in ip_output.c:ip_send_unicast_reply under high
network load with CONFIG_PREEMPT_RT_FULL=y, reported by Sami Pietikainen,
this patch adds local serialization in ip_send_unicast_reply.

from ip_output.c:
/*
 *  Generic function to send a packet as reply to another packet.
 *  Used to send some TCP resets/acks so far.
 *
 *  Use a fake percpu inet socket to avoid false sharing and contention.
 */
static DEFINE_PER_CPU(struct inet_sock, unicast_sock) = {
...

This was added in commit be9f4a44 in linux-stable. The log of that
commit, which introduced the per-CPU unicast_sock, states:

commit be9f4a44e7d41cee50ddb5f038fc2391cbbb4046
Author: Eric Dumazet 
Date:   Thu Jul 19 07:34:03 2012 +

ipv4: tcp: remove per net tcp_sock

tcp_v4_send_reset() and tcp_v4_send_ack() use a single socket
per network namespace.

This leads to bad behavior on multiqueue NICS, because many cpus
contend for the socket lock and once socket lock is acquired, extra
false sharing on various socket fields slow down the operations.

To better resist to attacks, we use a percpu socket. Each cpu can
run without contention, using appropriate memory (local node)


The per-CPU socket here thus assumes exclusive, serialized use per CPU,
so the use of get_cpu_light() introduced in
net-use-cpu-light-in-ip-send-unicast-reply.patch, which dropped the
preempt_disable() in favor of a migrate_disable(), is probably wrong: it
only ensures referential consistency, not serialization. To avoid a
preempt_disable() here, a local lock is needed.

Therapy:
 * add local lock:
 * and re-introduce local serialization:
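
Here is a rough user-space analog of the problem and the fix. One shared scratch structure stands in for a CPU's unicast_sock, and a pthread mutex stands in for the RT local lock (unicast_lock). With the mutex, two tasks on the same "CPU" can no longer interleave inside the critical section. This is an assumption-laden sketch, not the kernel's locallock API, and the demo_* names are hypothetical. Pinning alone (what migrate_disable() gives) is not modelled, because by itself it does not stop another task on the same CPU from entering.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for one CPU's unicast_sock scratch state. */
struct demo_scratch {
	int tos;          /* per-reply field, like inet->tos */
	long sent;        /* replies "sent" through the scratch state */
	long corrupted;   /* times tos changed under us mid-reply */
};

static struct demo_scratch scratch;
/* Plays the role of unicast_lock; not the kernel's local lock API. */
static pthread_mutex_t scratch_lock = PTHREAD_MUTEX_INITIALIZER;

static void demo_send_reply(int tos)
{
	pthread_mutex_lock(&scratch_lock);      /* cf. get_locked_var() */
	scratch.tos = tos;
	/* ... build and push the reply using the scratch state ... */
	if (scratch.tos != tos)
		scratch.corrupted++;
	scratch.sent++;
	pthread_mutex_unlock(&scratch_lock);    /* cf. put_locked_var() */
}

static void *demo_worker(void *arg)
{
	int tos = *(const int *)arg;

	for (int i = 0; i < 10000; i++)
		demo_send_reply(tos);
	return NULL;
}

/* Two concurrent "tasks" sharing one CPU's scratch state; 0 on success. */
static int demo_run_two(void)
{
	pthread_t a, b;
	int tos_a = 1, tos_b = 2;

	if (pthread_create(&a, NULL, demo_worker, &tos_a) != 0)
		return -1;
	if (pthread_create(&b, NULL, demo_worker, &tos_b) != 0) {
		pthread_join(a, NULL);
		return -1;
	}
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}
```

Without the mutex, the two workers would race on the same fields (a data race). That is the kind of interleaving the patch closes off.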

Tested on x86 with high network load using the testcase from Sami Pietikainen
  while : ; do wget -O - ftp://LOCAL_SERVER/empty_file > /dev/null 2>&1; done

Link: http://www.spinics.net/lists/linux-rt-users/msg11007.html
Cc: stable...@vger.kernel.org
Signed-off-by: Nicholas Mc Guire 
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Steven Rostedt 
---
 net/ipv4/ip_output.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 2d69b1b..ab674e8 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -79,6 +79,7 @@
 #include 
 #include 
 #include 
+#include <linux/locallock.h>
 
 int sysctl_ip_default_ttl __read_mostly = IPDEFTTL;
 EXPORT_SYMBOL(sysctl_ip_default_ttl);
@@ -1471,6 +1472,9 @@ static DEFINE_PER_CPU(struct inet_sock, unicast_sock) = {
.uc_ttl = -1,
 };
 
+/* serialize concurrent calls on the same CPU to ip_send_unicast_reply */
+static DEFINE_LOCAL_IRQ_LOCK(unicast_lock);
+
 void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
   __be32 saddr, const struct ip_reply_arg *arg,
   unsigned int len)
@@ -1508,8 +1512,7 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
if (IS_ERR(rt))
return;
 
-   get_cpu_light();
-   inet = &__get_cpu_var(unicast_sock);
+   inet = &get_locked_var(unicast_lock, unicast_sock);
 
inet->tos = arg->tos;
sk = &inet->sk;
@@ -1533,7 +1536,7 @@ void ip_send_unicast_reply(struct net *net, struct sk_buff *skb, __be32 daddr,
ip_push_pending_frames(sk, &fl4);
}
 
-   put_cpu_light();
+   put_locked_var(unicast_lock, unicast_sock);
 
ip_rt_put(rt);
 }
-- 
1.8.5.3



[PATCH RT 12/14] leds: trigger: disable CPU trigger on -RT

2014-02-28 Thread Steven Rostedt

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Sebastian Andrzej Siewior 

as it triggers:
|CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141
|[] (unwind_backtrace+0x0/0xf8) from [] (show_stack+0x1c/0x20)
|[] (show_stack+0x1c/0x20) from [] (dump_stack+0x20/0x2c)
|[] (dump_stack+0x20/0x2c) from [] (__might_sleep+0x13c/0x170)
|[] (__might_sleep+0x13c/0x170) from [] (__rt_spin_lock+0x28/0x38)
|[] (__rt_spin_lock+0x28/0x38) from [] (rt_read_lock+0x68/0x7c)
|[] (rt_read_lock+0x68/0x7c) from [] (led_trigger_event+0x2c/0x5c)
|[] (led_trigger_event+0x2c/0x5c) from [] (ledtrig_cpu+0x54/0x5c)
|[] (ledtrig_cpu+0x54/0x5c) from [] (arch_cpu_idle_exit+0x18/0x1c)
|[] (arch_cpu_idle_exit+0x18/0x1c) from [] (cpu_startup_entry+0xa8/0x234)
|[] (cpu_startup_entry+0xa8/0x234) from [] (rest_init+0xb8/0xe0)
|[] (rest_init+0xb8/0xe0) from [] (start_kernel+0x2c4/0x380)

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Steven Rostedt 
---
 drivers/leds/trigger/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/leds/trigger/Kconfig b/drivers/leds/trigger/Kconfig
index 49794b4..3d7245d 100644
--- a/drivers/leds/trigger/Kconfig
+++ b/drivers/leds/trigger/Kconfig
@@ -61,7 +61,7 @@ config LEDS_TRIGGER_BACKLIGHT
 
 config LEDS_TRIGGER_CPU
bool "LED CPU Trigger"
-   depends on LEDS_TRIGGERS
+   depends on LEDS_TRIGGERS && !PREEMPT_RT_BASE
help
  This allows LEDs to be controlled by active CPUs. This shows
  the active CPUs across an array of LEDs so you can see which
-- 
1.8.5.3



[PATCH RT 00/14] Linux 3.10.32-rt31-rc2

2014-02-28 Thread Steven Rostedt


Dear RT Folks,

This is the RT stable review cycle of patch 3.10.32-rt31-rc2.

Please scream at me if I messed something up. Please test the patches too.

The -rc release will be uploaded to kernel.org and will be deleted when
the final release is out. This is just a review release (or release candidate).

The pre-releases will not be pushed to the git repository, only the
final release is.

If all goes well, this patch will be converted to the next main release
on 3/3/2014.

Enjoy,

-- Steve


To build 3.10.32-rt31-rc2 directly, the following patches should be applied:

  http://www.kernel.org/pub/linux/kernel/v3.x/linux-3.10.tar.xz

  http://www.kernel.org/pub/linux/kernel/v3.x/patch-3.10.32.xz

  http://www.kernel.org/pub/linux/kernel/projects/rt/3.10/patch-3.10.32-rt31-rc2.patch.xz

You can also build from 3.10.32-rt30 by applying the incremental patch:

http://www.kernel.org/pub/linux/kernel/projects/rt/3.10/incr/patch-3.10.32-rt30-rt31-rc2.patch.xz


Changes from 3.10.32-rt30:

---


Nicholas Mc Guire (1):
  net: ip_send_unicast_reply: add missing local serialization

Paul E. McKenney (2):
  rcu: Don't activate RCU core on NO_HZ_FULL CPUs
  rcu: Eliminate softirq processing from rcutree

Sebastian Andrzej Siewior (5):
  Revert "x86: Disable IST stacks for debug/int 3/stack fault for PREEMPT_RT"
  kernel/hrtimer: be non-freezeable in cpu_chill()
  irq_work: allow certain work in hard irq context
  arm/unwind: use a raw_spin_lock
  leds: trigger: disable CPU trigger on -RT

Steven Rostedt (3):
  timer: Raise softirq if there's irq_work
  timer/rt: Always raise the softirq if there's irq_work to be done
  rt: Make cpu_chill() use hrtimer instead of msleep()

Steven Rostedt (Red Hat) (1):
  Linux 3.10.32-rt31-rc2

Thomas Gleixner (1):
  timers: do not raise softirq unconditionally

Tiejun Chen (1):
  rcutree/rcu_bh_qs: disable irq while calling rcu_preempt_qs()


 arch/arm/kernel/unwind.c             |  14 ++--
 arch/powerpc/kernel/time.c           |   2 +-
 arch/sparc/kernel/pcr.c              |   2 +
 arch/x86/include/asm/page_64_types.h |  21 ++---
 arch/x86/kernel/cpu/common.c         |   2 -
 arch/x86/kernel/dumpstack_64.c       |   4 -
 drivers/leds/trigger/Kconfig         |   2 +-
 include/linux/delay.h                |   2 +-
 include/linux/hrtimer.h              |   3 +-
 include/linux/irq_work.h             |   1 +
 kernel/hrtimer.c                     |  50 ++--
 kernel/irq_work.c                    |  22 -
 kernel/rcutree.c                     | 122 +++
 kernel/rcutree.h                     |   4 +-
 kernel/rcutree_plugin.h              | 154 ---
 kernel/time/tick-sched.c             |   1 +
 kernel/timer.c                       |  37 -
 localversion-rt                      |   2 +-
 net/ipv4/ip_output.c                 |   9 +-
 19 files changed, 249 insertions(+), 205 deletions(-)

[PATCH RT 14/14] Linux 3.10.32-rt31-rc2

2014-02-28 Thread Steven Rostedt

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: "Steven Rostedt (Red Hat)" 

---
 localversion-rt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/localversion-rt b/localversion-rt
index b72862e..c6273a4 100644
--- a/localversion-rt
+++ b/localversion-rt
@@ -1 +1 @@
--rt30
+-rt31-rc2
-- 
1.8.5.3



[PATCH RT 10/14] arm/unwind: use a raw_spin_lock

2014-02-28 Thread Steven Rostedt

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Sebastian Andrzej Siewior 

Mostly unwind is done with irqs enabled; however, SLUB may call it with
irqs disabled while creating a new SLUB cache.

I had a system freeze while loading a module which called
kmem_cache_create() on init. That means SLUB's __slab_alloc() disabled
interrupts and then

->new_slab_objects()
 ->new_slab()
  ->setup_object()
   ->setup_object_debug()
    ->init_tracking()
     ->set_track()
      ->save_stack_trace()
       ->save_stack_trace_tsk()
        ->walk_stackframe()
         ->unwind_frame()
          ->unwind_find_idx()
           =>spin_lock_irqsave(&unwind_lock);

Cc: stable...@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior 
Signed-off-by: Steven Rostedt 
---
 arch/arm/kernel/unwind.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm/kernel/unwind.c b/arch/arm/kernel/unwind.c
index 00df012..bbafc67 100644
--- a/arch/arm/kernel/unwind.c
+++ b/arch/arm/kernel/unwind.c
@@ -87,7 +87,7 @@ extern const struct unwind_idx __start_unwind_idx[];
 static const struct unwind_idx *__origin_unwind_idx;
 extern const struct unwind_idx __stop_unwind_idx[];
 
-static DEFINE_SPINLOCK(unwind_lock);
+static DEFINE_RAW_SPINLOCK(unwind_lock);
 static LIST_HEAD(unwind_tables);
 
 /* Convert a prel31 symbol to an absolute address */
@@ -195,7 +195,7 @@ static const struct unwind_idx *unwind_find_idx(unsigned long addr)
/* module unwind tables */
struct unwind_table *table;
 
-   spin_lock_irqsave(&unwind_lock, flags);
+   raw_spin_lock_irqsave(&unwind_lock, flags);
list_for_each_entry(table, &unwind_tables, list) {
if (addr >= table->begin_addr &&
addr < table->end_addr) {
@@ -207,7 +207,7 @@ static const struct unwind_idx *unwind_find_idx(unsigned long addr)
break;
}
}
-   spin_unlock_irqrestore(&unwind_lock, flags);
+   raw_spin_unlock_irqrestore(&unwind_lock, flags);
}
 
pr_debug("%s: idx = %p\n", __func__, idx);
@@ -469,9 +469,9 @@ struct unwind_table *unwind_table_add(unsigned long start, unsigned long size,
tab->begin_addr = text_addr;
tab->end_addr = text_addr + text_size;
 
-   spin_lock_irqsave(&unwind_lock, flags);
+   raw_spin_lock_irqsave(&unwind_lock, flags);
list_add_tail(&tab->list, &unwind_tables);
-   spin_unlock_irqrestore(&unwind_lock, flags);
+   raw_spin_unlock_irqrestore(&unwind_lock, flags);
 
return tab;
 }
@@ -483,9 +483,9 @@ void unwind_table_del(struct unwind_table *tab)
if (!tab)
return;
 
-   spin_lock_irqsave(&unwind_lock, flags);
+   raw_spin_lock_irqsave(&unwind_lock, flags);
list_del(&tab->list);
-   spin_unlock_irqrestore(&unwind_lock, flags);
+   raw_spin_unlock_irqrestore(&unwind_lock, flags);
 
kfree(tab);
 }
-- 
1.8.5.3



[PATCH RT 03/14] timer: Raise softirq if there's irq_work

2014-02-28 Thread Steven Rostedt

3.10.32-rt31-rc2 stable review patch.
If anyone has any objections, please let me know.

--

From: Steven Rostedt 

[ Talking with Sebastian on IRC, it seems that doing the irq_work_run()
  from the interrupt in -rt is a bad thing. Here we simply raise the
  softirq if there's irq work to do. This too boots on my i7 ]

After trying hard to figure out why my i7 box was locking up with the
new active_timers code, which does not run the timer softirq if there
are no active timers, I took an extra look at the softirq handler and
noticed that it doesn't just run timer softirqs, it also runs irq work.

This was the bug that was locking up the system. It wasn't missing a
timer, it was missing irq work. By always doing the irq work callbacks,
the system boots fine. The missing irq work callback was the RCU's
sp_wakeup() function.

No need to check for defined(CONFIG_IRQ_WORK). When that's not set the
"irq_work_needs_cpu()" is a static inline that returns false.

Cc: stable...@vger.kernel.org
Signed-off-by: Steven Rostedt 
Signed-off-by: Sebastian Andrzej Siewior 
---
 kernel/timer.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 3d7313c..25a38c4 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1466,8 +1466,13 @@ void run_local_timers(void)
return;
}
 #endif
-   if (!base->active_timers)
-   goto out;
+   if (!base->active_timers) {
+#ifdef CONFIG_PREEMPT_RT_FULL
+   /* On RT, irq work runs from softirq */
+   if (!irq_work_needs_cpu())
+#endif
+   goto out;
+   }
 
/* Check whether the next pending timer has expired */
if (time_before_eq(base->next_timer, jiffies))
-- 
1.8.5.3
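For readers following the logic, the decision run_local_timers() makes after this patch can be modelled in plain userspace C. This is a simplified sketch, not kernel code: `struct timer_base_model`, `time_before_eq_model()` and `should_raise_timer_softirq()` are hypothetical names, `irq_work_pending` stands in for irq_work_needs_cpu(), and the #ifdef'd early-return path visible just above the hunk is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-CPU timer base. */
struct timer_base_model {
	unsigned long active_timers; /* number of queued timers */
	unsigned long next_timer;    /* expiry (in jiffies) of the earliest timer */
};

/* Same wrap-safe comparison as the kernel's time_before_eq(a, b). */
static bool time_before_eq_model(unsigned long a, unsigned long b)
{
	return (long)(b - a) >= 0;
}

/*
 * Returns true when the modelled run_local_timers() would raise
 * TIMER_SOFTIRQ. 'preempt_rt_full' mirrors CONFIG_PREEMPT_RT_FULL and
 * 'irq_work_pending' mirrors irq_work_needs_cpu().
 */
static bool should_raise_timer_softirq(const struct timer_base_model *base,
				       unsigned long jiffies,
				       bool preempt_rt_full,
				       bool irq_work_pending)
{
	if (!base->active_timers) {
		/* On RT, irq work runs from the softirq, so keep going. */
		if (!(preempt_rt_full && irq_work_pending))
			return false;
	}
	/* Check whether the next pending timer has expired. */
	return time_before_eq_model(base->next_timer, jiffies);
}
```

In this model, as in the patched code, an RT CPU with pending irq work but no active timers still only raises the softirq once next_timer has passed; judging by its title, the later patch in this series, "timer/rt: Always raise the softirq if there's irq_work to be done", addresses that case.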



Re: [PATCH] Staging:tidspbridge: Fixing coding style

2014-02-28 Thread Masood Mehmood


On Fri, Feb 28, 2014 at 07:01:56PM -0800, Greg Kroah-Hartman wrote:
> On Fri, Feb 28, 2014 at 06:15:52PM -0800, Masood Mehmood wrote:
> > 
> 
> > 
> > Fixing some basic coding style issues.
> 
> Which issues did you fix?  Please be more specific.  Did you fix them
> for the whole driver, or just a specific file?

- Removed unnecessary line breaks and spaces.
- Attached some '*' to the variable name.
- Removed braces around single-statement if bodies.

I just realized that other files of the same driver also need some style fixes.
I'll send another patch with the rest of the files included.

> 
> And what's with the odd multiple attachments?
Sorry, new to mutt.

> 
> thanks,
> 
> greg k-h

Thanks,
Masood


Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread Steven Rostedt

On Fri, 28 Feb 2014 18:34:00 -0500 (EST)
Vince Weaver  wrote:

> > I was poking fun at you on IRC for this exact reason:
> > 
> >  poor Vince, I keep sending him new patches. "No, don't test this 
> > patch, now test this one. Oh wait, try this one instead"
> > * peterz sees Vince thinking: "stop... sending.. me.. damn... patches... 
> > already... !!!11!"
> >  or at least, "Let me finish this test before I cancel it again 
> > for another damn patch"
> >  then he's probably doing "I'm not going to run any tests now, 
> > until I wait a while to see if there's a new patch to test"

I hope you were not offended by this. It was actually a testament to
the work you've given us.

> 
> Well while it might appear that I spend all of my days finding perf_event 
> bugs, I actually am a college professor so I do occasionally have to run 
> off to teach a class, meet with students, or write papers/grants for other 
> academics to reject.

But perf_event bug finder is a much more prestigious title than
"college professor" ;-)

> 
> It's nice others can reproduce the issue now, it would have saved me a lot 
> of trouble, although now in theory I have a much better handle of how to 
> use/abuse ftrace so I guess it was worth it.

I always enjoy finding bugs in other people's code. That's usually the
best way to learn what their code does.

> 
> Once the fix gets into git I'm sure the relentless perf_fuzzer will let us 
> know if there are any other issues left.  I do look forward to the day 
> when I can leave it running overnight and have a clean syslog the next 
> morning.

BTW, is the perf_fuzzer code posted somewhere? It sounds like it can be
really useful for us to do our own testing too.

Thanks,

-- Steve

Re: [PATCH 0/3] media/drx39xyj: fix DJH_DEBUG path null pointer dereferences, and compile errors.

2014-02-28 Thread Shuah Khan


On 02/28/2014 05:13 PM, Devin Heitmueller wrote:

Seems kind of strange that I wasn't on the CC for this, since I was the
original author of all that code (in fact, DJH are my initials).

Mauro, did you strip off my authorship when you pulled the patches from
my tree?

The patches themselves look sane, and I will send a formal Acked-by once
I can get in front of a real computer.

Devin


Thanks for the ack. I will include you on the cc for future patches. I 
am working in Mauro's experimental git and probably that explains why 
get_maintainer.pl just showed linux-media and Mauro.


-- Shuah



On Feb 28, 2014 4:23 PM, "Shuah Khan" <shuah...@samsung.com> wrote:

This patch series fixes null pointer dereference boot failure as well as
compile errors.

Shuah Khan (3):
   media/drx39xyj: fix pr_dbg undefined compile errors when DJH_DEBUG is
 defined
   media/drx39xyj: remove return that prevents DJH_DEBUG code to run
   media/drx39xyj: fix boot failure due to null pointer dereference

  drivers/media/dvb-frontends/drx39xyj/drxj.c | 31 ++---
  1 file changed, 19 insertions(+), 12 deletions(-)

--
1.8.3.2





--
Shuah Khan
Senior Linux Kernel Developer - Open Source Group
Samsung Research America(Silicon Valley)
shuah...@samsung.com | (970) 672-0658

Re: [PATCH v3 net-next 1/1] bpf32->bpf64 mapper and bpf64 interpreter

2014-02-28 Thread Daniel Borkmann


On 02/28/2014 09:53 PM, Alexei Starovoitov wrote:

On Fri, Feb 28, 2014 at 4:45 AM, Daniel Borkmann  wrote:

...

Did you also test that seccomp-BPF still works out?


yes. Have a prototype, but it needs a bit more cleanup.


Here's [1] a user-space code snippet for prctl(). libseccomp [2]
wraps around that and makes usage easier.

  [1] http://outflux.net/teach-seccomp/
  [2] http://lwn.net/Articles/491308/

...

We should keep naming consistent (so either extended BPF or BPF64),
so maybe bpf_ext_enable? I'd rather presume {bpf,sk_filter}*_ext


agree. we need consistent naming for both (old and new).
I'll try an all-out rename of bpf_*() into sk_filter64_*() and sk_filter_ext_*()
to see which one looks better.
I'm kinda leaning towards sk_filter64, since it's easier to quickly spot
the difference and it's more descriptive.


Just saw your second email regarding sk_filter_ext() et al, yep, I agree.


Since the immediate value in 'struct bpf_insn' is 32 bit, for 64-bit
comparisons you'd still need to load two immediate values, right?


There is no insn that uses a 64-bit immediate, since 64-bit immediates
are extremely rare; grepping x86-64 asm code for movabsq returns very few hits.
llvm or gcc can easily construct any constant with a combination of mov,
shifts and ors.
bpf64 comparisons are all 64-bit right now. So far I haven't seen a need for
32-bit comparisons: since old bpf is all unsigned, mapping 32->64 of
Jxx is painless.
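Both claims can be checked with a little userspace C: a 64-bit constant can be rebuilt from 32-bit pieces with a mov, a shift and an or, and zero-extending unsigned 32-bit operands to 64 bits does not change the outcome of a '>' comparison. This is a sketch of the arithmetic only, not of the bpf64 instruction encoding; the helper names are hypothetical, and it assumes the low half is zero-extended before the or (an immediate that is sign-extended, as on x86-64, would need extra care when its bit 31 is set).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Build a 64-bit constant from two 32-bit immediates:
 * mov the high half, shift left by 32, then or in the low half.
 */
static uint64_t build_imm64(uint32_t hi, uint32_t lo)
{
	uint64_t reg = hi;   /* mov reg, hi */
	reg <<= 32;          /* lsh reg, 32 */
	reg |= (uint64_t)lo; /* or  reg, lo (zero-extended) */
	return reg;
}

/* Old BPF "jgt" is an unsigned 32-bit comparison. */
static bool jgt32(uint32_t a, uint32_t b)
{
	return a > b;
}

/* The same jump on 64-bit registers holding zero-extended values. */
static bool jgt64_zext(uint32_t a, uint32_t b)
{
	return (uint64_t)a > (uint64_t)b;
}
```

A signed 32-bit comparison would not survive this mapping unchanged, which is why sign-extension (or separate signed jumps) only matters once signed comparisons are in play.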


Hm, fair enough; I was just thinking of comparisons of IPv6 addresses
when we do socket filtering. On the other hand, old and new insns are
both 64 bits wide and can then be used through the same API.


Notice I added signed comparisons, since real-life programs cannot do
without them.
I also kept the spirit of old bpf, which has only > and >= (no < and <=).
That made the llvm/gcc backends a bit harder to do, since no real CPU has
such limitations.


Hehe, I proposed them once, but for low level BPF it was just easier to
adjust jt/jf offsets differently to achieve the same.
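The jt/jf trick works because a < k is exactly !(a >= k): a "less than" jump can be emitted as a "greater or equal" jump with the true and false offsets swapped, and likewise a <= k as a swapped "greater than". A minimal model in C, assuming a hypothetical `struct cjump` that mirrors the k/jt/jf fields of a classic BPF conditional jump; this is not the kernel's interpreter.

```c
#include <assert.h>
#include <stdint.h>

enum cond_op { OP_JGT, OP_JGE };

/* Hypothetical model of a classic-BPF conditional jump: compare A with k,
 * then skip jt instructions if true, jf instructions if false. */
struct cjump {
	enum cond_op op;
	uint32_t k;
	uint8_t jt;
	uint8_t jf;
};

/* Returns the number of instructions to skip after this jump. */
static unsigned int cjump_skip(const struct cjump *insn, uint32_t a)
{
	int taken = (insn->op == OP_JGT) ? (a > insn->k) : (a >= insn->k);

	return taken ? insn->jt : insn->jf;
}

/* "if (A < k) skip lt_off else skip ge_off" using only JGE:
 * swapping jt and jf turns "A >= k" into "A < k". */
static struct cjump emit_jlt(uint32_t k, uint8_t lt_off, uint8_t ge_off)
{
	struct cjump insn = { OP_JGE, k, ge_off, lt_off };

	return insn;
}

/* "A <= k" is "!(A > k)", i.e. JGT with swapped targets. */
static struct cjump emit_jle(uint32_t k, uint8_t le_off, uint8_t gt_off)
{
	struct cjump insn = { OP_JGT, k, gt_off, le_off };

	return insn;
}
```

The cost of this approach lands on the code generator rather than the instruction set, which is the trade-off discussed next.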


I'm still contemplating adding < and <= (both signed and unsigned), which is an
interesting trade-off: number of instructions vs. complexity of the compiler.


After all your changes, we will still have the bpf_jit_enable knob
intact, right?


Yes.
In this diff the workflow is the following:

old filter comes through sk_attach_filter() or sk_unattached_filter_create()
if (bpf64) {
    convert to new
    sk_chk_filter() - check old bpf
    use new interpreter
} else {
    sk_chk_filter() - check old bpf
    if (bpf_jit_enable)
        use old jit
    else
        use old interpreter
}
Soon I'll add a bpf64 JIT and will reuse the same bpf_jit_enable knob for it,
then add a new/old in-band demux into sk_attach_filter(),
so that the workflow will become:

a filter comes through sk_attach_filter() or sk_unattached_filter_create()
if (new filter prog) {
    sk_chk_filter64() - check new bpf
    if (bpf_jit_enable)
        use new jit
    else
        use new interpreter
} else {
    if (bpf64_enable) {
        convert to new
        sk_chk_filter() - check old bpf
        if (bpf_jit_enable)
            use new jit
        else
            use new interpreter
    } else {
        sk_chk_filter()
        if (bpf_jit_enable)
            use old jit
        else
            use old interpreter
    }
}
Eventually bpf64_enable can be made the default,
the last 'else' can be retired, and 'bpf64_enable' removed along with the
old interpreter.

The bpf_jit_enable knob will stay for the foreseeable future.
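The planned attach-time decision can be written as a small C function that makes the four resulting execution paths explicit. This is only a model of the pseudocode above: `enum filter_path`, `choose_filter_path()` and its boolean parameters are hypothetical names, and the checker calls (sk_chk_filter()/sk_chk_filter64()) and the conversion step are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for where an attached filter ends up running. */
enum filter_path {
	PATH_NEW_JIT,
	PATH_NEW_INTERP,
	PATH_OLD_JIT,
	PATH_OLD_INTERP,
};

/*
 * Model of the proposed workflow:
 *  - a native bpf64 program always runs on the new engine;
 *  - an old program is converted to bpf64 when bpf64_enable is set;
 *  - otherwise it stays on the old JIT/interpreter.
 * bpf_jit_enable picks JIT vs interpreter on whichever engine is used.
 */
static enum filter_path choose_filter_path(bool new_filter_prog,
					   bool bpf64_enable,
					   bool bpf_jit_enable)
{
	if (new_filter_prog || bpf64_enable)
		return bpf_jit_enable ? PATH_NEW_JIT : PATH_NEW_INTERP;
	return bpf_jit_enable ? PATH_OLD_JIT : PATH_OLD_INTERP;
}
```

Written this way, retiring the old engine amounts to dropping the last return, which matches the plan of eventually making bpf64_enable the default while bpf_jit_enable keeps its meaning.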


Okay, cool. I think it's reasonable to keep this knob around anyway.
E.g. for seccomp some people might argue that speed is important, while other,
more security-related distros might not want to rely on the JIT and would
therefore trade away performance.

...

Why would that need to be exported as a symbol?


The performance numbers I mentioned are from bpf_bench, which is built
as a kernel module, so I used it from there for debugging,
and also to see what execution traces I get while running the trinity bpf fuzzer :)


I would actually like to avoid having this pr_info* inside the kernel.
Couldn't this be done e.g. through systemtap script that could e.g. be
placed under tools/net/ or inside the documentation file?


like the idea!
Will drop it from the diff and eventually will move it to tools/net.


Sounds great!

Thanks,

Daniel

Re: [PATCH 1/2] i2c: Add driver for Cadence I2C controller

2014-02-28 Thread Sören Brinkmann

On Fri, 2014-02-28 at 04:00PM -0800, Soren Brinkmann wrote:
> Add a driver for the Cadence I2C controller. This controller is for
> example found in Xilinx Zynq.
> 
> Signed-off-by: Soren Brinkmann 
> ---
>  .../devicetree/bindings/i2c/i2c-cadence.txt |  21 +
>  MAINTAINERS                                 |   1 +
>  drivers/i2c/busses/Kconfig                  |   7 +
>  drivers/i2c/busses/Makefile                 |   1 +
>  drivers/i2c/busses/i2c-cadence.c            | 901 +
>  5 files changed, 931 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/i2c/i2c-cadence.txt
>  create mode 100644 drivers/i2c/busses/i2c-cadence.c
> 
[...]
> diff --git a/drivers/i2c/busses/Kconfig b/drivers/i2c/busses/Kconfig
> index f5ed03164d86..7806c1654068 100644
> --- a/drivers/i2c/busses/Kconfig
> +++ b/drivers/i2c/busses/Kconfig
> @@ -375,6 +375,13 @@ config I2C_BLACKFIN_TWI_CLK_KHZ
>   help
> The unit of the TWI clock is kHz.
>  
> +config I2C_CADENCE
> + tristate "Cadence I2C Controller"
> + depends on ARCH_ZYNQ
This can be reduced to a dependency on COMMON_CLK only, I think.

Sören



Re: [PATCH] Staging:tidspbridge: Fixing coding style

2014-02-28 Thread Greg Kroah-Hartman

On Fri, Feb 28, 2014 at 06:15:52PM -0800, Masood Mehmood wrote:
> 

> 
> Fixing some basic coding style issues.

Which issues did you fix?  Please be more specific.  Did you fix them
for the whole driver, or just a specific file?

And what's with the odd multiple attachments?

thanks,

greg k-h

Re: [PATCH v3 5/5] pci: Add support for creating a generic host_bridge from device tree

2014-02-28 Thread Liviu Dudau

On Fri, Feb 28, 2014 at 06:07:05PM -0800, Tanmay Inamdar wrote:
> Earlier email did not deliver to mailing lists because of plain text
> setting problem on my side. Apologies for spamming. Sending it again.
> 
> Hello Liviu,
> 

Hello Tanmay,

> While porting X-Gene PCIe driver to v2 series, following problems were 
> observed.

Thanks for trying it out.

> 
> 1. In 'of_create_pci_host_bridge' function, bus_range is defined
> locally. So, while walking over list of resources in bridge->windows
> later, during X-Gene controller related setup, garbage values are
> found in the resource. Please allocate it dynamically.

Bah, sorry for that. Will fix.
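The first problem is the classic bug of linking a stack-allocated object into a list that outlives the function. A minimal userspace sketch of the fix pattern, with hypothetical names (`struct res_model`, `add_bus_range()`, `free_windows()`) standing in for the kernel's struct resource and the bridge->windows list: allocate the entry on the heap so it stays valid after the function returns. A kernel version would use a kernel allocator such as kzalloc() rather than calloc().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a struct resource linked into bridge->windows. */
struct res_model {
	unsigned long start;
	unsigned long end;
	struct res_model *next;
};

/*
 * Buggy shape (do not do this): a local "struct res_model bus_range;"
 * added to the list dangles as soon as the function returns.
 * Fixed shape: allocate it so it lives as long as the list does.
 */
static int add_bus_range(struct res_model **windows,
			 unsigned long start, unsigned long end)
{
	struct res_model *bus_range = calloc(1, sizeof(*bus_range));

	if (!bus_range)
		return -ENOMEM;
	bus_range->start = start;
	bus_range->end = end;
	bus_range->next = *windows; /* push onto the list */
	*windows = bus_range;
	return 0;
}

/* Free every entry once the list owner is torn down. */
static void free_windows(struct res_model *windows)
{
	while (windows) {
		struct res_model *next = windows->next;

		free(windows);
		windows = next;
	}
}
```

The matching teardown matters as much as the allocation: whoever owns the list has to release the entries when the bridge goes away.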

> 
> 2. 'domain_nr' problem is partially solved. There are still some
> places where functions are getting invalid domain_nr.  For example,
> 'pci_alloc_child_bus' tries to get domain_nr when bridge is not
> assigned to bus. You may want to look for all the places where
> pci_domain_nr is used. Please see below dump -->
> 
> pci 0001:00:00.0: scanning [bus 00-00] behind bridge, pass 1
> [ cut here ]
> WARNING: CPU: 0 PID: 1 at
> /home/tinamdar/work/open-source/linux/fs/sysfs/dir.c:52
> sysfs_warn_dup+0x80/0xc0()
> sysfs: cannot create duplicate filename '/class/pci_bus/:01'
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc4+ #37
> Call trace:
> [] dump_backtrace+0x0/0x140
> [] show_stack+0x14/0x20
> [] dump_stack+0x78/0xc4
> [] warn_slowpath_common+0x88/0xc0
> [] warn_slowpath_fmt+0x50/0x60
> [] sysfs_warn_dup+0x80/0xc0
> [] sysfs_do_create_link_sd.isra.2+0xf8/0x100
> [] sysfs_create_link+0x20/0x40
> [] device_add+0x41c/0x520
> [] device_register+0x1c/0x40
> [] pci_add_new_bus+0x284/0x380
> [] pci_scan_bridge+0x4e0/0x540
> [] pci_scan_child_bus+0xb4/0x140
> [] pci_rescan_bus+0x14/0x40
> [] xgene_pcie_probe_bridge+0x688/0x750
> [] platform_drv_probe+0x24/0x60
> [] really_probe+0xf4/0x220
> [] __driver_attach+0xa4/0xc0
> [] bus_for_each_dev+0x58/0xa0
> [] driver_attach+0x20/0x40
> [] bus_add_driver+0x150/0x220
> [] driver_register+0x60/0x120
> [] __platform_driver_register+0x60/0x80
> [] xgene_pcie_driver_init+0x18/0x20
> [] do_one_initcall+0xe4/0x160

Is your xgene_pcie_driver_init being called from some subsys_initcall?
If so, remove it and let the generic DT parsing code match your driver. The
bridge should've been associated with the root bus by the time the child busses
are scanned and allocated, unless I'm missing something obvious.

Also, can you share your version of the driver with me?

Best regards,
Liviu

> [] kernel_init_freeable+0x138/0x1d8
> [] kernel_init+0x10/0xe0
> ---[ end trace 53db1c3a7fbdeb88 ]---
> [ cut here ]
> WARNING: CPU: 0 PID: 1 at
> /home/tinamdar/work/open-source/linux/drivers/pci/probe.c:711
> pci_add_new_bus+0x36c/0x380()
> 
> Thanks,
> Tanmay

[PATCH V5 1/2] x86: IOSF: add dummy functions for loadable modules

2014-02-28 Thread David E. Box

From: "David E. Box" 

Some loadable modules only need IOSF access on the platforms where it exists.
Provide dummy functions to allow these modules to compile and load on the
platforms where it doesn't exist.

Signed-off-by: David E. Box 
---
 arch/x86/include/asm/iosf_mbi.h |   33 +
 arch/x86/kernel/iosf_mbi.c      |    6 ++
 2 files changed, 39 insertions(+)

diff --git a/arch/x86/include/asm/iosf_mbi.h b/arch/x86/include/asm/iosf_mbi.h
index 8e71c79..9fc5402 100644
--- a/arch/x86/include/asm/iosf_mbi.h
+++ b/arch/x86/include/asm/iosf_mbi.h
@@ -5,6 +5,8 @@
 #ifndef IOSF_MBI_SYMS_H
 #define IOSF_MBI_SYMS_H
 
+#ifdef CONFIG_IOSF_MBI
+
 #define MBI_MCR_OFFSET 0xD0
 #define MBI_MDR_OFFSET 0xD4
 #define MBI_MCRX_OFFSET0xD8
@@ -50,6 +52,8 @@
 #define BT_MBI_PCIE_READ   0x00
 #define BT_MBI_PCIE_WRITE  0x01
 
+bool iosf_mbi_available(void);
+
 /**
  * iosf_mbi_read() - MailBox Interface read command
  * @port:  port indicating subunit being accessed
@@ -87,4 +91,33 @@ int iosf_mbi_write(u8 port, u8 opcode, u32 offset, u32 mdr);
  */
 int iosf_mbi_modify(u8 port, u8 opcode, u32 offset, u32 mdr, u32 mask);
 
+#else /* CONFIG_IOSF_MBI is not enabled */
+static inline
+bool iosf_mbi_available(void)
+{
+   return false;
+}
+
+static inline
+int iosf_mbi_read(u8 port, u8 opcode, u32 offset, u32 *mdr)
+{
+   WARN(1, "IOSF_MBI driver not available");
+   return -EPERM;
+}
+
+static inline
+int iosf_mbi_write(u8 port, u8 opcode, u32 offset, u32 mdr)
+{
+   WARN(1, "IOSF_MBI driver not available");
+   return -EPERM;
+}
+
+static inline
+int iosf_mbi_modify(u8 port, u8 opcode, u32 offset, u32 mdr, u32 mask)
+{
+   WARN(1, "IOSF_MBI driver not available");
+   return -EPERM;
+}
+#endif /* CONFIG_IOSF_MBI */
+
 #endif /* IOSF_MBI_SYMS_H */
diff --git a/arch/x86/kernel/iosf_mbi.c b/arch/x86/kernel/iosf_mbi.c
index c3aae66..d3803c6 100644
--- a/arch/x86/kernel/iosf_mbi.c
+++ b/arch/x86/kernel/iosf_mbi.c
@@ -177,6 +177,12 @@ int iosf_mbi_modify(u8 port, u8 opcode, u32 offset, u32 mdr, u32 mask)
 }
 EXPORT_SYMBOL(iosf_mbi_modify);
 
+bool iosf_mbi_available(void)
+{
+   return mbi_pdev;
+}
+EXPORT_SYMBOL(iosf_mbi_available);
+
 static int iosf_mbi_probe(struct pci_dev *pdev,
  const struct pci_device_id *unused)
 {
-- 
1.7.10.4
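The pattern in this patch (real prototypes when the feature is configured, static inline stubs otherwise) can be tried outside the kernel. A minimal sketch, assuming a hypothetical `CONFIG_FOO_MBI` macro and hypothetical `foo_mbi_*()` and `driver_read_reg()` functions in place of CONFIG_IOSF_MBI and iosf_mbi_*(); the stub reports misuse with fprintf() because WARN() only exists in the kernel, and the port/opcode values are made up.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* CONFIG_FOO_MBI is deliberately left undefined to exercise the stubs. */
#ifdef CONFIG_FOO_MBI
bool foo_mbi_available(void);
int foo_mbi_read(uint8_t port, uint8_t opcode, uint32_t offset, uint32_t *mdr);
#else
/* Callers are expected to probe with this first, so no warning here. */
static inline bool foo_mbi_available(void)
{
	return false;
}

/* Reaching this means the caller skipped foo_mbi_available(). */
static inline int foo_mbi_read(uint8_t port, uint8_t opcode,
			       uint32_t offset, uint32_t *mdr)
{
	(void)port;
	(void)opcode;
	(void)offset;
	(void)mdr;
	fprintf(stderr, "foo_mbi_read: MBI support not built in\n");
	return -EPERM;
}
#endif

/* A driver that only needs the mailbox on some platforms. */
static int driver_read_reg(uint32_t offset, uint32_t *val)
{
	if (!foo_mbi_available())
		return -ENODEV; /* bail out gracefully on other platforms */
	return foo_mbi_read(0x04, 0x10, offset, val); /* made-up port/opcode */
}
```

As in the patch, the availability check stays silent because callers are meant to probe with it, while the accessors complain because reaching them means the probe was skipped.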


[PATCH V5 2/2] x86: IOSF: Change IOSF_MBI Kconfig to default y

2014-02-28 Thread David E. Box

From: "David E. Box" 

Make the IOSF Mailbox driver built in as it provides core functionality needed
for new Intel SOC platforms to access the device registers on the SOC.

Signed-off-by: David E. Box 
---
 arch/x86/Kconfig |    7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index d3b1f8b..e34b252 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2385,12 +2385,9 @@ config X86_DMA_REMAP
depends on STA2X11
 
 config IOSF_MBI
-   bool
+   bool "Intel IOSF Mailbox Driver support"
+   default y
depends on PCI
-   ---help---
- To be selected by modules requiring access to the Intel OnChip System
- Fabric (IOSF) Sideband MailBox Interface (MBI). For MBI platforms
- enumerable by PCI.
 
 source "net/Kconfig"
 
-- 
1.7.10.4


[PATCH V5 0/2] x86: IOSF: Add loadable module support

2014-02-28 Thread David E. Box

From: "David E. Box" 

This patch series adds missing functionality that mostly affected loadable
modules.

The first patch adds dummy functions to allow drivers not completely
dependent on the IOSF MBI driver to compile on systems that don't have it.

The second makes the MBI driver built in by default.

Changes from V4:

- Put back Kconfig prompt for IOSF_MBI.

Changes from V3:

- Code is agreed to be small enough to not warrant forcing for
  non-EXPERT only

Changes from V2:

- Remove non-Linux-style externs from iosf_mbi.h

Changes from V1:

- Force default y for non-EXPERT to allow easier custom configuration
  as suggested by hpa 
- Add WARN() to dummy functions, other than iosf_mbi_available(), to
  signal incorrect use as suggested by Alan 
  Also return EPERM in these functions
- Splits single patch into two patch series

David E. Box (2):
  x86: IOSF: add dummy functions for loadable modules
  x86: IOSF: Change IOSF_MBI Kconfig to default y

 arch/x86/Kconfig                |    7 ++-
 arch/x86/include/asm/iosf_mbi.h |   33 +
 arch/x86/kernel/iosf_mbi.c      |    6 ++
 3 files changed, 41 insertions(+), 5 deletions(-)

-- 
1.7.10.4


RE: [f2fs-dev] [PATCH] f2fs: fix dirty page accounting when redirty

2014-02-28 Thread Chao Yu

Hi,

> -Original Message-
> From: Dave Chinner [mailto:da...@fromorbit.com]
> Sent: Saturday, March 01, 2014 8:27 AM
> To: Chao Yu
> Cc: ???; linux-fsde...@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-f2fs-de...@lists.sourceforge.net
> Subject: Re: [f2fs-dev] [PATCH] f2fs: fix dirty page accounting when redirty
> 
> On Fri, Feb 28, 2014 at 10:12:05AM +0800, Chao Yu wrote:
> > We should de-account dirty counters for page when redirty in ->writepage().
> >
> > Wu Fengguang described in 'commit 971767caf632190f77a40b4011c19948232eed75':
> > "writeback: fix dirtied pages accounting on redirty
> > De-account the accumulative dirty counters on page redirty.
> >
> > Page redirties (very common in ext4) will introduce mismatch between
> > counters (a) and (b)
> >
> > a) NR_DIRTIED, BDI_DIRTIED, tsk->nr_dirtied
> > b) NR_WRITTEN, BDI_WRITTEN
> >
> > This will introduce systematic errors in balanced_rate and result in
> > dirty page position errors (ie. the dirty pages are no longer balanced
> > around the global/bdi setpoints)."
> >
> > Signed-off-by: Chao Yu 
> > ---
> >  fs/f2fs/checkpoint.c |1 +
> >  fs/f2fs/data.c   |1 +
> >  fs/f2fs/node.c   |1 +
> >  3 files changed, 3 insertions(+)
> >
> > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> > index c8516ee..f069249 100644
> > --- a/fs/f2fs/checkpoint.c
> > +++ b/fs/f2fs/checkpoint.c
> > @@ -178,6 +178,7 @@ no_write:
> >  redirty_out:
> > dec_page_count(sbi, F2FS_DIRTY_META);
> > wbc->pages_skipped++;
> > +   account_page_redirty(page);
> > set_page_dirty(page);
> > return AOP_WRITEPAGE_ACTIVATE;
> 
> redirty_page_for_writepage()?

set_page_dirty() in the a_ops of f2fs not only calls __set_page_dirty_nobuffers(),
but also sets some private data of the page and prints trace info.
So it seems we cannot easily replace it with redirty_page_for_writepage().

Thanks.

> 
> Cheers,
> 
> Dave.
> --
> Dave Chinner
> da...@fromorbit.com
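Why the redirty path needs account_page_redirty() can be seen with two counters. In this simplified userspace model (hypothetical names; the NR_DIRTIED/BDI_DIRTIED/tsk->nr_dirtied and NR_WRITTEN/BDI_WRITTEN pairs from the quoted commit are collapsed into one `dirtied`/`written` pair), marking a page dirty bumps `dirtied`, completing writeback bumps `written`, and a ->writepage() that gives up and redirties the page must undo its earlier bump before re-dirtying, or `dirtied` drifts ahead of `written`.

```c
#include <assert.h>
#include <stdbool.h>

/* Collapsed stand-in for the dirtied/written counter pairs. */
struct dirty_counters {
	long dirtied;
	long written;
};

static void model_set_page_dirty(struct dirty_counters *c)
{
	c->dirtied++;
}

static void model_account_page_redirty(struct dirty_counters *c)
{
	c->dirtied--;
}

/*
 * One ->writepage() attempt: either the page is written, or it is
 * redirtied. 'deaccount' selects the patched (true) or unpatched
 * (false) redirty path.
 */
static void model_writepage(struct dirty_counters *c, bool can_write,
			    bool deaccount)
{
	if (can_write) {
		c->written++;
		return;
	}
	if (deaccount)
		model_account_page_redirty(c);
	model_set_page_dirty(c);
}
```

With de-accounting, a page that is redirtied any number of times and then written leaves the two counters equal; without it, each redirty adds one spurious "dirtied" event.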


Re: [PATCH 2/2] ARM: zynq: dt: Add I2C nodes to Zynq device tree

2014-02-28 Thread Sören Brinkmann

On Fri, 2014-02-28 at 04:00PM -0800, Soren Brinkmann wrote:
> Signed-off-by: Soren Brinkmann 
> ---
>  arch/arm/boot/dts/zynq-7000.dtsi | 22 
>  arch/arm/boot/dts/zynq-zc702.dts | 76
>  arch/arm/boot/dts/zynq-zc706.dts | 68 +++
>  3 files changed, 166 insertions(+)
> 
[...]
> diff --git a/arch/arm/boot/dts/zynq-zc702.dts b/arch/arm/boot/dts/zynq-zc702.dts
> index c913f77a21eb..91a3a560f03b 100644
> --- a/arch/arm/boot/dts/zynq-zc702.dts
> +++ b/arch/arm/boot/dts/zynq-zc702.dts
[...]
> +
> + i2c@7 {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + reg = <7>;
> + hwmon@52 {
> + compatible = "pmbus,ucd9248";
> + reg = <52>;
> + };
> + hwmon@53 {
> + compatible = "pmbus,ucd9248";
> + reg = <53>;
> + };
> + hwmon@54 {
> + compatible = "pmbus,ucd9248";

All these occurrences of 'pmbus' should rather be replaced with 'ti'.
I wasn't aware of how this DT node to I2C probing works - and still
haven't fully understood it - but the vendor prefix is apparently a
don't care, and we could simply use the correct one here.
I have no idea how I came up with the description using 'pmbus' (we
have carried this in our vendor tree for quite a while already).
Is there any document describing how a DT node is matched to an
appropriate driver? These I2C drivers usually don't have the common
'struct of_device_id'. Hence my preferred way of simply grepping through
the kernel for the compatible string does not work.
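As far as I understand it, in kernels of this era the I2C core builds an I2C client's name (modalias) from the node's "compatible" string with everything up to the first comma dropped (of_modalias_node()), and i2c_device_match() falls back to comparing that name against the driver's i2c_device_id table when there is no of_device_id match. That would explain why the vendor prefix looks like a don't care. The sketch below only models that idea; the helpers, struct and table are hypothetical stand-ins, not kernel code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct i2c_device_id. */
struct id_entry {
	const char *name;
};

/* Derive a modalias from a DT "compatible" string by dropping the
 * "vendor," prefix, in the spirit of of_modalias_node(). */
static void modalias_from_compatible(const char *compatible,
				     char *out, size_t len)
{
	const char *comma = strchr(compatible, ',');
	const char *name = comma ? comma + 1 : compatible;

	strncpy(out, name, len - 1);
	out[len - 1] = '\0';
}

/* Compare the modalias with a NULL-terminated id table, roughly like
 * the id_table fallback used when no of_device_id matches. */
static bool id_table_matches(const struct id_entry *table, const char *alias)
{
	for (; table->name != NULL; table++)
		if (strcmp(table->name, alias) == 0)
			return true;
	return false;
}

/* Hypothetical id table of a driver that has no of_device_id table. */
static const struct id_entry ucd_ids[] = {
	{ "ucd9248" },
	{ NULL },
};
```

With this model, "pmbus,ucd9248" and "ti,ucd9248" both reduce to "ucd9248" and bind the same way, which is also why grepping for the full compatible string finds nothing.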

[...]
> diff --git a/arch/arm/boot/dts/zynq-zc706.dts b/arch/arm/boot/dts/zynq-zc706.dts
> index 88f62c50382e..dab30f80d5b0 100644
> --- a/arch/arm/boot/dts/zynq-zc706.dts
> +++ b/arch/arm/boot/dts/zynq-zc706.dts
[...]
> + i2c@7 {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + reg = <7>;
> + ucd90120@65 {
> + compatible = "pmbus,ucd90120";
same here

Sören


[PATCH 03/17] lustre/llite: Do not send parent dir fid in getattr by fid

2014-02-28 Thread Oleg Drokin

Sending the parent dir fid in a getattr by fid is pointless, as the
parent might have long changed and we have no control over it, and
it's irrelevant anyway, since we already have the child fid.

Signed-off-by: Oleg Drokin 
Reviewed-on: http://review.whamcloud.com/7910
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3240
Reviewed-by: Dmitry Eremin 
Reviewed-by: wangdi 
Reviewed-by: Andreas Dilger 
---
 drivers/staging/lustre/lustre/llite/dir.c  | 2 +-
 drivers/staging/lustre/lustre/llite/file.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/dir.c b/drivers/staging/lustre/lustre/llite/dir.c
index fd0dd20e..7fbc18e 100644
--- a/drivers/staging/lustre/lustre/llite/dir.c
+++ b/drivers/staging/lustre/lustre/llite/dir.c
@@ -362,7 +362,7 @@ struct page *ll_get_dir_page(struct inode *dir, __u64 hash,
struct ptlrpc_request *request;
struct md_op_data *op_data;
 
-   op_data = ll_prep_md_op_data(NULL, dir, NULL, NULL, 0, 0,
+   op_data = ll_prep_md_op_data(NULL, dir, dir, NULL, 0, 0,
LUSTRE_OPC_ANY, NULL);
if (IS_ERR(op_data))
return (void *)op_data;
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index c9ee574..36c54aa 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -2891,7 +2891,7 @@ int __ll_inode_revalidate_it(struct dentry *dentry, struct lookup_intent *it,
oit.it_op = IT_LOOKUP;
 
/* Call getattr by fid, so do not provide name at all. */
-   op_data = ll_prep_md_op_data(NULL, dentry->d_parent->d_inode,
+   op_data = ll_prep_md_op_data(NULL, dentry->d_inode,
 dentry->d_inode, NULL, 0, 0,
 LUSTRE_OPC_ANY, NULL);
if (IS_ERR(op_data))
-- 
1.8.5.3

[PATCH 06/17] lustre/clio: honor O_NOATIME

2014-02-28 Thread Oleg Drokin

From: "John L. Hammond" 

Add a ci_noatime bit to struct cl_io. In ll_io_init() set this bit if
O_NOATIME is set in f_flags. Ensure that this bit is propagated down
to lower layers. In osc_io_read_start() don't update atime if this bit
is set. Add sanity test 39n to check that passing O_NOATIME to open()
is honored.
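The decision described above can be sketched in plain C. The flag names and values are hypothetical stand-ins (not the kernel's O_NOATIME, S_NOATIME or MNT_* constants), the superblock checks from file_is_noatime() are omitted, and struct io_desc only mimics the new ci_noatime bit of struct cl_io:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values for illustration only. */
enum { F_NOATIME = 1 << 0 };                    /* open(2) flag */
enum { I_NOATIME = 1 << 0 };                    /* inode flag   */
enum { M_NOATIME = 1 << 0, M_READONLY = 1 << 1,
       M_NODIRATIME = 1 << 2 };                 /* mount flags  */

struct io_desc {
	unsigned ci_noatime:1;  /* mirrors the new cl_io bit */
};

/* Same order of checks as file_is_noatime() in the patch, minus the
 * superblock (MS_*) tests. */
static bool noatime(unsigned f_flags, unsigned i_flags,
		    unsigned mnt_flags, bool is_dir)
{
	if (f_flags & F_NOATIME)
		return true;
	if (i_flags & I_NOATIME)
		return true;
	if (mnt_flags & (M_NOATIME | M_READONLY))
		return true;
	if ((mnt_flags & M_NODIRATIME) && is_dir)
		return true;
	return false;
}

/* The top-level io decides once; each sub-io copies the bit, as
 * lov_io_sub_init() does with ci_noatime. */
static void sub_io_init(struct io_desc *sub, const struct io_desc *top)
{
	sub->ci_noatime = top->ci_noatime;
}
```

Computing the bit once in ll_io_init() and copying it down means the lower layer (osc_io_read_start() here) only has to test a single bit.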

Signed-off-by: John L. Hammond 
Reviewed-on: http://review.whamcloud.com/7442
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3832
Reviewed-by: Jinshan Xiong 
Reviewed-by: Lai Siyao 
Tested-by: Maloo 
Reviewed-by: Oleg Drokin 
Signed-off-by: Oleg Drokin 
---
 drivers/staging/lustre/lustre/include/cl_object.h |  6 -
 drivers/staging/lustre/lustre/llite/file.c| 29 +++
 drivers/staging/lustre/lustre/lov/lov_io.c|  1 +
 drivers/staging/lustre/lustre/osc/osc_io.c| 14 ---
 4 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/cl_object.h b/drivers/staging/lustre/lustre/include/cl_object.h
index 4d692dc..76e1b68 100644
--- a/drivers/staging/lustre/lustre/include/cl_object.h
+++ b/drivers/staging/lustre/lustre/include/cl_object.h
@@ -2392,7 +2392,11 @@ struct cl_io {
/**
 * file is released, restore has to to be triggered by vvp layer
 */
-ci_restore_needed:1;
+ci_restore_needed:1,
+   /**
+* O_NOATIME
+*/
+ci_noatime:1;
/**
 * Number of pages owned by this IO. For invariant checking.
 */
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 36c54aa..362f5ec 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -1035,6 +1035,33 @@ int ll_glimpse_ioctl(struct ll_sb_info *sbi, struct lov_stripe_md *lsm,
return rc;
 }
 
+static bool file_is_noatime(const struct file *file)
+{
+   const struct vfsmount *mnt = file->f_path.mnt;
+   const struct inode *inode = file->f_path.dentry->d_inode;
+
+   /* Adapted from file_accessed() and touch_atime().*/
+   if (file->f_flags & O_NOATIME)
+   return true;
+
+   if (inode->i_flags & S_NOATIME)
+   return true;
+
+   if (IS_NOATIME(inode))
+   return true;
+
+   if (mnt->mnt_flags & (MNT_NOATIME | MNT_READONLY))
+   return true;
+
+   if ((mnt->mnt_flags & MNT_NODIRATIME) && S_ISDIR(inode->i_mode))
+   return true;
+
+   if ((inode->i_sb->s_flags & MS_NODIRATIME) && S_ISDIR(inode->i_mode))
+   return true;
+
+   return false;
+}
+
 void ll_io_init(struct cl_io *io, const struct file *file, int write)
 {
struct inode *inode = file->f_dentry->d_inode;
@@ -1054,6 +1081,8 @@ void ll_io_init(struct cl_io *io, const struct file *file, int write)
} else if (file->f_flags & O_APPEND) {
io->ci_lockreq = CILR_MANDATORY;
}
+
+   io->ci_noatime = file_is_noatime(file);
 }
 
 static ssize_t
diff --git a/drivers/staging/lustre/lustre/lov/lov_io.c b/drivers/staging/lustre/lustre/lov/lov_io.c
index 5a6ab70..65133ea 100644
--- a/drivers/staging/lustre/lustre/lov/lov_io.c
+++ b/drivers/staging/lustre/lustre/lov/lov_io.c
@@ -194,6 +194,7 @@ static int lov_io_sub_init(const struct lu_env *env, struct lov_io *lio,
sub_io->ci_lockreq = io->ci_lockreq;
sub_io->ci_type= io->ci_type;
sub_io->ci_no_srvlock = io->ci_no_srvlock;
+   sub_io->ci_noatime = io->ci_noatime;
 
lov_sub_enter(sub);
result = cl_io_sub_init(sub->sub_env, sub_io,
diff --git a/drivers/staging/lustre/lustre/osc/osc_io.c b/drivers/staging/lustre/lustre/osc/osc_io.c
index 777ae24..5f3c545 100644
--- a/drivers/staging/lustre/lustre/osc/osc_io.c
+++ b/drivers/staging/lustre/lustre/osc/osc_io.c
@@ -512,19 +512,15 @@ static int osc_io_read_start(const struct lu_env *env,
struct osc_io*oio   = cl2osc_io(env, slice);
struct cl_object *obj   = slice->cis_obj;
struct cl_attr   *attr  = &osc_env_info(env)->oti_attr;
-   int   result = 0;
+   int rc = 0;
 
-   if (oio->oi_lockless == 0) {
+   if (oio->oi_lockless == 0 && !slice->cis_io->ci_noatime) {
cl_object_attr_lock(obj);
-   result = cl_object_attr_get(env, obj, attr);
-   if (result == 0) {
-   attr->cat_atime = LTIME_S(CURRENT_TIME);
-   result = cl_object_attr_set(env, obj, attr,
-   CAT_ATIME);
-   }
+   attr->cat_atime = LTIME_S(CURRENT_TIME);
+   rc = cl_object_attr_set(env, obj, attr, CAT_ATIME);
cl_object_attr_unlock(obj);
}
-   return result;
+   return rc;
 }
 
 static int osc_io_write_start(const struct

[PATCH 04/17] lustre/mdc: comments on LOOKUP and PERM lock

2014-02-28 Thread Oleg Drokin

From: wang di 

Add more comments for MDS_INODELOCK_PERM and
MDS_INODELOCK_LOOKUP

Signed-off-by: wang di 
Reviewed-on: http://review.whamcloud.com/7937
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3240
Reviewed-by: Andreas Dilger 
Reviewed-by: Oleg Drokin 
Signed-off-by: Oleg Drokin 
---
 .../lustre/lustre/include/lustre/lustre_idl.h  | 24 --
 drivers/staging/lustre/lustre/mdc/mdc_locks.c  |  3 +++
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 4183a35..4c70c06 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2116,12 +2116,24 @@ extern void lustre_swab_generic_32s (__u32 *val);
 #define DISP_OPEN_LEASE  0x0400
 
 /* INODE LOCK PARTS */
-#define MDS_INODELOCK_LOOKUP 0x01   /* dentry, mode, owner, group */
-#define MDS_INODELOCK_UPDATE 0x02   /* size, links, timestamps */
-#define MDS_INODELOCK_OPEN   0x04   /* For opened files */
-#define MDS_INODELOCK_LAYOUT 0x08   /* for layout */
-#define MDS_INODELOCK_PERM   0x10   /* for permission */
-#define MDS_INODELOCK_XATTR  0x20   /* extended attributes */
+#define MDS_INODELOCK_LOOKUP 0x01  /* For namespace, dentry etc, and also
+* was used to protect permission (mode,
+* owner, group etc) before 2.4. */
+#define MDS_INODELOCK_UPDATE 0x02  /* size, links, timestamps */
+#define MDS_INODELOCK_OPEN   0x04  /* For opened files */
+#define MDS_INODELOCK_LAYOUT 0x08  /* for layout */
+
+/* The PERM bit is added int 2.4, and it is used to protect permission(mode,
+ * owner, group, acl etc), so to separate the permission from LOOKUP lock.
+ * Because for remote directories(in DNE), these locks will be granted by
+ * different MDTs(different ldlm namespace).
+ *
+ * For local directory, MDT will always grant UPDATE_LOCK|PERM_LOCK together.
+ * For Remote directory, the master MDT, where the remote directory is, will
+ * grant UPDATE_LOCK|PERM_LOCK, and the remote MDT, where the name entry is,
+ * will grant LOOKUP_LOCK. */
+#define MDS_INODELOCK_PERM   0x10
+#define MDS_INODELOCK_XATTR  0x20  /* extended attributes */
 
 #define MDS_INODELOCK_MAXSHIFT 5
 /* This FULL lock is useful to take on unlink sort of operations */
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 1336d47..d9017a5 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -1072,6 +1072,9 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
 * locks, there's no easy way to match all of them here,
 * so an extra RPC would be performed to fetch all
 * of those bits at once for now. */
+   /* For new MDTs(> 2.4), UPDATE|PERM should be enough,
+* but for old MDTs (< 2.4), permission is covered
+* by LOOKUP lock, so it needs to match all bits here.*/
policy.l_inodebits.bits = MDS_INODELOCK_UPDATE |
  MDS_INODELOCK_LOOKUP |
  MDS_INODELOCK_PERM;
-- 
1.8.5.3

[PATCH 02/17] lustre/mdc: Check for all attributes validity in revalidate

2014-02-28 Thread Oleg Drokin

GETATTR needs to return attributes protected by different bits, so
we need to ensure that we have locks with all of those bits, not
just the UPDATE bit.
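A simplified model of the matching rule this relies on: a single cached inodebits lock is treated as satisfying the request only if it covers every requested bit. The bit values are those shown in lustre_idl.h elsewhere in this series; lock_covers() is a hypothetical helper, not the ldlm implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in lustre_idl.h in this series. */
#define MDS_INODELOCK_LOOKUP 0x01
#define MDS_INODELOCK_UPDATE 0x02
#define MDS_INODELOCK_PERM   0x10

/* One cached lock satisfies the request only if it holds all of the
 * requested bits. */
static bool lock_covers(uint64_t lock_bits, uint64_t wanted)
{
	return (lock_bits & wanted) == wanted;
}

/* The bit set the patch now asks for on IT_GETATTR. */
static const uint64_t getattr_bits = MDS_INODELOCK_UPDATE |
				     MDS_INODELOCK_LOOKUP |
				     MDS_INODELOCK_PERM;
```

Under this model an UPDATE-only lock no longer satisfies GETATTR, and when the bits are split across two locks neither one matches on its own, which is the extra-RPC case described in the new comment.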

Signed-off-by: Alexey Lyashkov 
Signed-off-by: Oleg Drokin 
Reviewed-on: http://review.whamcloud.com/6460
Xyratex-bug-id: MRP-1052
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3240
Reviewed-by: Dmitry Eremin 
Reviewed-by: wangdi 
Reviewed-by: Andreas Dilger 
---
 drivers/staging/lustre/lustre/mdc/mdc_locks.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 288a41e..1336d47 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -1061,7 +1061,20 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
fid_build_reg_res_name(fid, &res_id);
switch (it->it_op) {
case IT_GETATTR:
-   policy.l_inodebits.bits = MDS_INODELOCK_UPDATE;
+   /* File attributes are held under multiple bits:
+* nlink is under lookup lock, size and times are
+* under UPDATE lock and recently we've also got
+* a separate permissions lock for owner/group/acl that
+* were protected by lookup lock before.
+* Getattr must provide all of that information,
+* so we need to ensure we have all of those locks.
+* Unfortunately, if the bits are split across multiple
+* locks, there's no easy way to match all of them here,
+* so an extra RPC would be performed to fetch all
+* of those bits at once for now. */
+   policy.l_inodebits.bits = MDS_INODELOCK_UPDATE |
+ MDS_INODELOCK_LOOKUP |
+ MDS_INODELOCK_PERM;
break;
case IT_LAYOUT:
policy.l_inodebits.bits = MDS_INODELOCK_LAYOUT;
@@ -1070,6 +1083,7 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
policy.l_inodebits.bits = MDS_INODELOCK_LOOKUP;
break;
}
+
mode = ldlm_lock_match(exp->exp_obd->obd_namespace,
   LDLM_FL_BLOCK_GRANTED, &res_id,
   LDLM_IBITS, &policy,
-- 
1.8.5.3

[PATCH 05/17] lustre/mdc: use ibits_known mask for lock match

2014-02-28 Thread Oleg Drokin

From: Alexey Lyashkov 

Before revalidating a lock on the client, mask the lock bits against
the lock bits supported by the server (ibits_known), so newer clients
will find valid locks given by older server versions.
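The effect of the mask can be sketched as follows. match_with_known() is a hypothetical helper, not the mdc_lock_match() code, and the "old server" ibits_known value in the usage below is an assumption chosen for illustration (a server that predates the PERM bit):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in lustre_idl.h in this series. */
#define MDS_INODELOCK_LOOKUP 0x01
#define MDS_INODELOCK_UPDATE 0x02
#define MDS_INODELOCK_PERM   0x10

/* Drop the bits the server never advertised (ocd_ibits_known), then
 * require the cached lock to cover what is left. */
static bool match_with_known(uint64_t lock_bits, uint64_t wanted,
			     uint64_t ibits_known)
{
	wanted &= ibits_known;
	return (lock_bits & wanted) == wanted;
}
```

Without the mask, a client asking for UPDATE|LOOKUP|PERM could never match a lock granted by a server that does not know about PERM; with it, the PERM bit is simply not required from such a server.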

Signed-off-by: Patrick Farrell 
Signed-off-by: Alexey Lyashkov 
Reviewed-on: http://review.whamcloud.com/8636
Xyratex-bug-id: MRP-1583
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4405
Reviewed-by: Andreas Dilger 
Reviewed-by: Oleg Drokin 
Signed-off-by: Oleg Drokin 
---
 drivers/staging/lustre/lustre/include/lustre_export.h | 8 
 drivers/staging/lustre/lustre/mdc/mdc_locks.c | 8 +---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_export.h b/drivers/staging/lustre/lustre/include/lustre_export.h
index 2feb38b..82a230b 100644
--- a/drivers/staging/lustre/lustre/include/lustre_export.h
+++ b/drivers/staging/lustre/lustre/include/lustre_export.h
@@ -380,6 +380,14 @@ static inline bool imp_connect_lvb_type(struct obd_import *imp)
return false;
 }
 
+static inline __u64 exp_connect_ibits(struct obd_export *exp)
+{
+   struct obd_connect_data *ocd;
+
+   ocd = &exp->exp_connect_data;
+   return ocd->ocd_ibits_known;
+}
+
 extern struct obd_export *class_conn2export(struct lustre_handle *conn);
 extern struct obd_device *class_conn2obd(struct lustre_handle *conn);
 
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index d9017a5..6ef9e28 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -160,6 +160,8 @@ ldlm_mode_t mdc_lock_match(struct obd_export *exp, __u64 flags,
ldlm_mode_t rc;
 
fid_build_reg_res_name(fid, &res_id);
+   /* LU-4405: Clear bits not supported by server */
+   policy->l_inodebits.bits &= exp_connect_ibits(exp);
rc = ldlm_lock_match(class_exp2obd(exp)->obd_namespace, flags,
 &res_id, type, policy, mode, lockh, 0);
return rc;
@@ -1087,10 +1089,10 @@ int mdc_revalidate_lock(struct obd_export *exp, struct lookup_intent *it,
break;
}
 
-   mode = ldlm_lock_match(exp->exp_obd->obd_namespace,
-  LDLM_FL_BLOCK_GRANTED, &res_id,
+   mode = mdc_lock_match(exp, LDLM_FL_BLOCK_GRANTED, fid,
   LDLM_IBITS, &policy,
-  LCK_CR|LCK_CW|LCK_PR|LCK_PW, &lockh, 0);
+ LCK_CR | LCK_CW | LCK_PR | LCK_PW,
+ &lockh);
}
 
if (mode) {
-- 
1.8.5.3

[PATCH 07/17] lustre/mdc: fix bad ERR_PTR usage in mdc_locks.c

2014-02-28 Thread Oleg Drokin

From: "John L. Hammond" 

In mdc_intent_open_pack() return an ERR_PTR() rather than NULL when
ldlm_prep_enqueue_req() fails. In mdc_intent_getattr_async() check the
return value of mdc_intent_getattr_pack() using IS_ERR(). Clean up the
includes in mdc_locks.c.
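For readers unfamiliar with the convention this patch restores: an error-returning pointer function encodes a negative errno in the pointer itself, and callers test it with IS_ERR() rather than comparing with NULL. The sketch below re-creates the idea of the kernel's <linux/err.h> helpers in user space; pack() and use() are hypothetical stand-ins for a request-packing helper and its caller, not the mdc functions:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* User-space re-creation of the <linux/err.h> idea: error codes
 * -1..-MAX_ERRNO live in the topmost page of the address space. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct request { int id; };

/* Hypothetical packer: on failure it returns the errno inside the
 * pointer instead of NULL, so the caller learns *why* it failed. */
static struct request *pack(int prep_rc)
{
	struct request *req = malloc(sizeof(*req));

	if (req == NULL)
		return ERR_PTR(-ENOMEM);
	if (prep_rc < 0) {
		free(req);
		return ERR_PTR(prep_rc);
	}
	req->id = 1;
	return req;
}

/* Hypothetical caller using the IS_ERR()/PTR_ERR() pattern. */
static int use(int prep_rc)
{
	struct request *req = pack(prep_rc);

	if (IS_ERR(req))
		return (int)PTR_ERR(req);
	free(req);
	return 0;
}
```

Mixing the two conventions is what the patch fixes: a caller that only checks `!req` never sees an ERR_PTR value, and a caller that only checks IS_ERR() treats NULL as a valid request.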

Signed-off-by: John L. Hammond 
Reviewed-on: http://review.whamcloud.com/7886
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4078
Reviewed-by: Andreas Dilger 
Reviewed-by: Nathaniel Clark 
Signed-off-by: Oleg Drokin 
---
 drivers/staging/lustre/lustre/mdc/mdc_locks.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 6ef9e28..6110943 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -37,15 +37,15 @@
 #define DEBUG_SUBSYSTEM S_MDC
 
 # include 
-# include 
-# include 
 
-#include 
+#include 
+#include 
 #include 
 #include 
-/* fid_res_name_eq() */
-#include 
-#include 
+#include  /* fid_res_name_eq() */
+#include 
+#include 
+#include 
 #include "mdc_internal.h"
 
 struct mdc_getattr_args {
@@ -336,9 +336,9 @@ static struct ptlrpc_request *mdc_intent_open_pack(struct obd_export *exp,
 max(lmmsize, obddev->u.cli.cl_default_mds_easize));
 
rc = ldlm_prep_enqueue_req(exp, req, &cancels, count);
-   if (rc) {
+   if (rc < 0) {
ptlrpc_request_free(req);
-   return NULL;
+   return ERR_PTR(rc);
}
 
spin_lock(&req->rq_lock);
@@ -1281,8 +1281,8 @@ int mdc_intent_getattr_async(struct obd_export *exp,
 
fid_build_reg_res_name(&op_data->op_fid1, &res_id);
req = mdc_intent_getattr_pack(exp, it, op_data);
-   if (!req)
-   return -ENOMEM;
+   if (IS_ERR(req))
+   return PTR_ERR(req);
 
rc = mdc_enter_request(&obddev->u.cli);
if (rc != 0) {
-- 
1.8.5.3

[PATCH 10/17] lustre/ldlm: set l_lvb_type coherent when layout is returned

2014-02-28 Thread Oleg Drokin

From: Bruno Faccini 

If the layout has been packed into the server reply without being
requested, the lock's l_lvb_type must be set accordingly.

Signed-off-by: Bruno Faccini 
Reviewed-on: http://review.whamcloud.com/8270
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4194
Reviewed-by: Jinshan Xiong 
Reviewed-by: Johann Lombardi 
Signed-off-by: Oleg Drokin 
---
 drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c | 1 +
 drivers/staging/lustre/lustre/mdc/mdc_locks.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
index 3ed020e..d87048d 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c
@@ -228,6 +228,7 @@ static void ldlm_handle_cp_callback(struct ptlrpc_request *req,
 
lock_res_and_lock(lock);
LASSERT(lock->l_lvb_data == NULL);
+   lock->l_lvb_type = LVB_T_LAYOUT;
lock->l_lvb_data = lvb_data;
lock->l_lvb_len = lvb_len;
unlock_res_and_lock(lock);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_locks.c b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
index 81adc2b..b0d0e2a 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_locks.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_locks.c
@@ -753,6 +753,7 @@ static int mdc_finish_enqueue(struct obd_export *exp,
/* install lvb_data */
lock_res_and_lock(lock);
if (lock->l_lvb_data == NULL) {
+   lock->l_lvb_type = LVB_T_LAYOUT;
lock->l_lvb_data = lmm;
lock->l_lvb_len = lvb_len;
lmm = NULL;
-- 
1.8.5.3

[PATCH 01/17] staging/lustre/llite: fix open lock matching in ll_md_blocking_ast()

2014-02-28 Thread Oleg Drokin

From: "John L. Hammond" 

In ll_md_blocking_ast() match open locks before all others, ensuring
that MDS_INODELOCK_OPEN is not cleared from bits by another open lock
with a different mode. Change the int flags parameter of
ll_md_real_close() to fmode_t fmode. Clean up various style issues in
both functions.

Signed-off-by: John L. Hammond 
Reviewed-on: http://review.whamcloud.com/8718
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4429
Reviewed-by: Niu Yawei 
Reviewed-by: Jinshan Xiong 
Reviewed-by: Oleg Drokin 
Signed-off-by: Oleg Drokin 
---
 drivers/staging/lustre/lustre/llite/file.c | 19 +++---
 .../staging/lustre/lustre/llite/llite_internal.h   |  2 +-
 drivers/staging/lustre/lustre/llite/namei.c| 78 --
 3 files changed, 54 insertions(+), 45 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 4c28f39..c9ee574 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -205,7 +205,7 @@ out:
return rc;
 }
 
-int ll_md_real_close(struct inode *inode, int flags)
+int ll_md_real_close(struct inode *inode, fmode_t fmode)
 {
struct ll_inode_info *lli = ll_i2info(inode);
struct obd_client_handle **och_p;
@@ -213,30 +213,33 @@ int ll_md_real_close(struct inode *inode, int flags)
__u64 *och_usecount;
int rc = 0;
 
-   if (flags & FMODE_WRITE) {
+   if (fmode & FMODE_WRITE) {
och_p = &lli->lli_mds_write_och;
och_usecount = &lli->lli_open_fd_write_count;
-   } else if (flags & FMODE_EXEC) {
+   } else if (fmode & FMODE_EXEC) {
och_p = &lli->lli_mds_exec_och;
och_usecount = &lli->lli_open_fd_exec_count;
} else {
-   LASSERT(flags & FMODE_READ);
+   LASSERT(fmode & FMODE_READ);
och_p = &lli->lli_mds_read_och;
och_usecount = &lli->lli_open_fd_read_count;
}
 
mutex_lock(&lli->lli_och_mutex);
-   if (*och_usecount) { /* There are still users of this handle, so
-   skip freeing it. */
+   if (*och_usecount > 0) {
+   /* There are still users of this handle, so skip
+* freeing it. */
mutex_unlock(&lli->lli_och_mutex);
return 0;
}
+
och=*och_p;
*och_p = NULL;
mutex_unlock(&lli->lli_och_mutex);
 
-   if (och) { /* There might be a race and somebody have freed this och
- already */
+   if (och != NULL) {
+   /* There might be a race and this handle may already
+  be closed. */
rc = ll_close_inode_openhandle(ll_i2sbi(inode)->ll_md_exp,
   inode, och, NULL);
}
diff --git a/drivers/staging/lustre/lustre/llite/llite_internal.h b/drivers/staging/lustre/lustre/llite/llite_internal.h
index e27efd1..47c5142 100644
--- a/drivers/staging/lustre/lustre/llite/llite_internal.h
+++ b/drivers/staging/lustre/lustre/llite/llite_internal.h
@@ -775,7 +775,7 @@ int ll_local_open(struct file *file,
 int ll_release_openhandle(struct dentry *, struct lookup_intent *);
 int ll_md_close(struct obd_export *md_exp, struct inode *inode,
struct file *file);
-int ll_md_real_close(struct inode *inode, int flags);
+int ll_md_real_close(struct inode *inode, fmode_t fmode);
 void ll_ioepoch_close(struct inode *inode, struct md_op_data *op_data,
  struct obd_client_handle **och, unsigned long flags);
 void ll_done_writing_attr(struct inode *inode, struct md_op_data *op_data);
diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index 93c3744..86ff708 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -195,101 +195,107 @@ static void ll_invalidate_negative_children(struct inode *dir)
 int ll_md_blocking_ast(struct ldlm_lock *lock, struct ldlm_lock_desc *desc,
   void *data, int flag)
 {
-   int rc;
struct lustre_handle lockh;
+   int rc;
 
switch (flag) {
case LDLM_CB_BLOCKING:
ldlm_lock2handle(lock, &lockh);
rc = ldlm_cli_cancel(&lockh, LCF_ASYNC);
if (rc < 0) {
-   CDEBUG(D_INODE, "ldlm_cli_cancel: %d\n", rc);
+   CDEBUG(D_INODE, "ldlm_cli_cancel: rc = %d\n", rc);
return rc;
}
break;
case LDLM_CB_CANCELING: {
struct inode *inode = ll_inode_from_resource_lock(lock);
-   struct ll_inode_info *lli;
__u64 bits = lock->l_policy_data.l_inodebits.bits;
-   struct lu_fid *fid;
-   ldlm_mode_t mode = lock->l_req_mode;
 
/* Inode is set to lock->l_resource->lr_lvb_inode
 * for

[PATCH 09/17] lustre/llite: simplify dentry revalidate

2014-02-28 Thread Oleg Drokin

From: Lai Siyao 

Lustre client dentry validity is protected by an LDLM lock, so any
time a dentry is found it is valid and there is no need to revalidate
it with the MDS; even if we did, there is a race in which it may be
invalidated right after revalidation finishes.

Signed-off-by: Lai Siyao 
Reviewed-on: http://review.whamcloud.com/7475
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3544
Reviewed-by: Peng Tao 
Reviewed-by: Bob Glossman 
Reviewed-by: John L. Hammond 
Reviewed-by: Oleg Drokin 
Signed-off-by: Oleg Drokin 
---
 .../lustre/lustre/include/lustre/lustre_idl.h  |   2 +-
 drivers/staging/lustre/lustre/llite/dcache.c   | 290 ++---
 drivers/staging/lustre/lustre/llite/file.c |   8 +-
 .../staging/lustre/lustre/llite/llite_internal.h   |   4 +-
 drivers/staging/lustre/lustre/lmv/lmv_intent.c |   1 -
 drivers/staging/lustre/lustre/lmv/lmv_obd.c|   1 -
 drivers/staging/lustre/lustre/mdc/mdc_locks.c  |  52 ++--
 7 files changed, 45 insertions(+), 313 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index a55eebf..5f5b0ba 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -2112,7 +2112,7 @@ extern void lustre_swab_generic_32s (__u32 *val);
 #define DISP_LOOKUP_POS  0x0008
 #define DISP_OPEN_CREATE 0x0010
 #define DISP_OPEN_OPEN   0x0020
-#define DISP_ENQ_COMPLETE 0x0040
+#define DISP_ENQ_COMPLETE 0x0040 /* obsolete and unused */
 #define DISP_ENQ_OPEN_REF 0x0080
 #define DISP_ENQ_CREATE_REF  0x0100
 #define DISP_OPEN_LOCK   0x0200
diff --git a/drivers/staging/lustre/lustre/llite/dcache.c b/drivers/staging/lustre/lustre/llite/dcache.c
index 3907c87..f971a54 100644
--- a/drivers/staging/lustre/lustre/llite/dcache.c
+++ b/drivers/staging/lustre/lustre/llite/dcache.c
@@ -241,9 +241,6 @@ void ll_intent_release(struct lookup_intent *it)
 ptlrpc_req_finished(it->d.lustre.it_data); /* ll_file_open */
if (it_disposition(it, DISP_ENQ_CREATE_REF)) /* create rec */
ptlrpc_req_finished(it->d.lustre.it_data);
-   if (it_disposition(it, DISP_ENQ_COMPLETE)) /* saved req from revalidate
-   * to lookup */
-   ptlrpc_req_finished(it->d.lustre.it_data);
 
it->d.lustre.it_disposition = 0;
it->d.lustre.it_data = NULL;
@@ -328,262 +325,32 @@ void ll_frob_intent(struct lookup_intent **itp, struct lookup_intent *deft)
 
 }
 
-int ll_revalidate_it(struct dentry *de, int lookup_flags,
-struct lookup_intent *it)
+static int ll_revalidate_dentry(struct dentry *dentry,
+   unsigned int lookup_flags)
 {
-   struct md_op_data *op_data;
-   struct ptlrpc_request *req = NULL;
-   struct lookup_intent lookup_it = { .it_op = IT_LOOKUP };
-   struct obd_export *exp;
-   struct inode *parent = de->d_parent->d_inode;
-   int rc;
-
-   CDEBUG(D_VFSTRACE, "VFS Op:name=%s,intent=%s\n", de->d_name.name,
-  LL_IT2STR(it));
-
-   LASSERT(de != de->d_sb->s_root);
-
-   if (de->d_inode == NULL) {
-   __u64 ibits;
-
-   /* We can only use negative dentries if this is stat or lookup,
-  for opens and stuff we do need to query server. */
-   /* If there is IT_CREAT in intent op set, then we must throw
-  away this negative dentry and actually do the request to
-  kernel to create whatever needs to be created (if possible)*/
-   if (it && (it->it_op & IT_CREAT))
-   return 0;
+   struct inode *dir = dentry->d_parent->d_inode;
 
-   if (d_lustre_invalid(de))
-   return 0;
-
-   ibits = MDS_INODELOCK_UPDATE;
-   rc = ll_have_md_lock(parent, &ibits, LCK_MINMODE);
-   GOTO(out_sa, rc);
-   }
-
-   /* Never execute intents for mount points.
-* Attributes will be fixed up in ll_inode_revalidate_it */
-   if (d_mountpoint(de))
-   GOTO(out_sa, rc = 1);
-
-   exp = ll_i2mdexp(de->d_inode);
-
-   OBD_FAIL_TIMEOUT(OBD_FAIL_MDC_REVALIDATE_PAUSE, 5);
-   ll_frob_intent(&it, &lookup_it);
-   LASSERT(it);
+   /*
+* if open is set, talk to MDS to make sure file is created if
+* necessary, because we can't do this in ->open() later since that's
+* called on an inode. return 0 here to let lookup to handle this.
+*/
+   if ((lookup_flags & (LOOKUP_OPEN | LOOKUP_CREATE)) ==
+   (LOOKUP_OPEN | LOOKUP_CREATE))
+   return 0;
 
-   if (it->it_op == IT_LOOKUP && !d_lustre_invalid(de))
+   if (lookup_flags & (LOOKUP_PARENT | LOOKUP_OPEN | LOOKUP_CREATE))
return 1;
 
-   if

[PATCH 12/17] lustre/ptlrpc: skip rpcs that fail ptl_send_rpc

2014-02-28 Thread Oleg Drokin

From: Peng Tao 

ptl_send_rpc() does not handle -ENOMEM in some
situations. When ptl_send_rpc() fails, we need to
set an error and skip further processing; otherwise
we trigger an LBUG.
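The effect of the added `continue` can be modelled with a small loop. struct req, its fields and check_set() are hypothetical simplifications of ptlrpc_check_set(), not Lustre code; the only point is that a request whose send failed is flagged and then skipped, instead of falling through to the per-request processing that follows:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request: only the state the sketch needs. */
struct req {
	bool send_fails;   /* stands in for ptl_send_rpc() returning < 0 */
	bool net_err;      /* mirrors rq_net_err */
	bool timer_armed;  /* stands in for the work done after a send */
};

/* Walk the set; on send failure flag the request and move on to the
 * next one, like the "continue" the patch adds. Returns how many
 * requests went through the normal path. */
static int check_set(struct req *reqs, int n)
{
	int sent = 0;

	for (int i = 0; i < n; i++) {
		if (reqs[i].send_fails) {
			reqs[i].net_err = true;
			continue;
		}
		reqs[i].timer_armed = true;
		sent++;
	}
	return sent;
}
```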

Signed-off-by: Keith Mannthey 
Signed-off-by: Peng Tao 
Reviewed-on: http://review.whamcloud.com/7411
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3698
Reviewed-by: Mike Pershin 
Reviewed-by: Oleg Drokin 
Signed-off-by: Oleg Drokin 
---
 drivers/staging/lustre/lustre/ptlrpc/client.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index b6d831a..98041e8 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -1692,6 +1692,7 @@ int ptlrpc_check_set(const struct lu_env *env, struct ptlrpc_request_set *set)
spin_lock(&req->rq_lock);
req->rq_net_err = 1;
spin_unlock(&req->rq_lock);
+   continue;
}
/* need to reset the timeout */
force_timer_recalc = 1;
-- 
1.8.5.3

[PATCH 11/17] lustre/ptlrpc: rq_commit_cb is called for twice

2014-02-28 Thread Oleg Drokin

From: Liang Zhen 

If a ptlrpc_request is already on imp::imp_replay_list, then when it
is replayed and replied to, after_reply() calls req::rq_commit_cb for
the request, and ptlrpc_free_committed() later calls it again.
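A toy model of the double invocation and of the list_empty() guard in the diff. struct req, after_reply() and free_committed() here are hypothetical simplifications of the Lustre functions, tracking only whether the request is queued for replay and how often the commit callback fired:

```c
#include <assert.h>
#include <stdbool.h>

struct req {
	bool on_replay_list;  /* stands in for !list_empty(&req->rq_replay_list) */
	int commit_cb_calls;  /* counts rq_commit_cb invocations */
};

static void commit_cb(struct req *req)
{
	req->commit_cb_calls++;
}

/* Simplified after_reply(): fire the callback here only if the request
 * is not already queued for replay (the guard added by the patch). */
static void after_reply(struct req *req)
{
	if (!req->on_replay_list)
		commit_cb(req);
}

/* Simplified ptlrpc_free_committed(): committed requests that are on
 * the replay list get their callback here, then leave the list. */
static void free_committed(struct req *req)
{
	if (req->on_replay_list) {
		commit_cb(req);
		req->on_replay_list = false;
	}
}
```

Without the guard in after_reply(), a replayed request would pass through both paths and its callback counter would reach 2.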

Signed-off-by: Liang Zhen 
Reviewed-on: http://review.whamcloud.com/8815
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-3618
Reviewed-by: Alex Zhuravlev 
Reviewed-by: Bobi Jam 
Reviewed-by: Oleg Drokin 
Signed-off-by: Oleg Drokin 
---
 drivers/staging/lustre/lustre/ptlrpc/client.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index a32b722..b6d831a 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -1313,7 +1313,11 @@ static int after_reply(struct ptlrpc_request *req)
/** version recovery */
ptlrpc_save_versions(req);
ptlrpc_retain_replayable_request(req, imp);
-   } else if (req->rq_commit_cb != NULL) {
+   } else if (req->rq_commit_cb != NULL &&
+  list_empty(&req->rq_replay_list)) {
+   /* NB: don't call rq_commit_cb if it's already on
+* rq_replay_list, ptlrpc_free_committed() will call
+* it later, see LU-3618 for details */
spin_unlock(&imp->imp_lock);
req->rq_commit_cb(req);
spin_lock(&imp->imp_lock);
-- 
1.8.5.3

[PATCH 08/17] lustre/recovery: free open/close request promptly

2014-02-28 Thread Oleg Drokin

From: Hongchao Zhang 

- For a non-create open or a committed open, the open request
  should be freed along with the close request as soon as the
  close is done, regardless of whether the transno of the open/close
  is greater than the last committed transno known by the client.

- Move committed open requests into a separate dedicated list;
  this avoids scanning a huge replay list on receiving each
  reply (when there are many open files).

Signed-off-by: Niu Yawei 
Signed-off-by: Hongchao Zhang 
Reviewed-on: http://review.whamcloud.com/6665
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2613
Reviewed-by: Alex Zhuravlev 
Reviewed-by: Oleg Drokin 
Signed-off-by: Oleg Drokin 
---
 .../lustre/lustre/include/lustre/lustre_idl.h  |  6 +-
 .../staging/lustre/lustre/include/lustre_export.h  |  9 +++
 .../staging/lustre/lustre/include/lustre_import.h  | 11 +++
 drivers/staging/lustre/lustre/include/lustre_net.h |  2 +
 drivers/staging/lustre/lustre/include/obd.h|  5 +-
 drivers/staging/lustre/lustre/include/obd_class.h  |  4 +-
 drivers/staging/lustre/lustre/llite/file.c |  2 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c|  3 +-
 drivers/staging/lustre/lustre/lmv/lmv_obd.c|  4 +-
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |  2 +-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c  |  2 +-
 drivers/staging/lustre/lustre/mdc/mdc_reint.c  |  1 +
 drivers/staging/lustre/lustre/mdc/mdc_request.c| 27 +++-
 drivers/staging/lustre/lustre/obdclass/genops.c|  2 +
 .../lustre/lustre/obdclass/lprocfs_status.c|  1 +
 drivers/staging/lustre/lustre/ptlrpc/client.c  | 78 +-
 drivers/staging/lustre/lustre/ptlrpc/import.c  | 33 ++---
 drivers/staging/lustre/lustre/ptlrpc/recover.c | 57 +---
 18 files changed, 198 insertions(+), 51 deletions(-)
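A rough model of the second point, assuming (as a simplification) that a committed open must be retained until its close while other committed requests can be dropped: once committed, opens move off the replay list, so later scans visit only uncommitted requests. All names here (`sk_req`, `sk_import`, `sk_scan_replay`) are hypothetical; this is not the code the patch adds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified request: only what the scan looks at. */
struct sk_req {
	unsigned long long transno;
	bool is_open;          /* opens are kept for replay until closed */
	struct sk_req *next;
};

struct sk_import {
	struct sk_req *replay;         /* scanned on every reply */
	struct sk_req *committed_open; /* parked here, not rescanned */
	size_t scanned;                /* entries visited by the last scan */
};

/* One pass over the replay list: committed opens move to the dedicated
 * list, other committed requests are unlinked (a real client would free
 * them), and uncommitted ones stay where they are. */
static void sk_scan_replay(struct sk_import *imp,
			   unsigned long long last_committed)
{
	struct sk_req **pp;

	assert(imp != NULL);
	imp->scanned = 0;
	pp = &imp->replay;
	while (*pp != NULL) {
		struct sk_req *r = *pp;

		imp->scanned++;
		if (r->transno > last_committed) {
			pp = &r->next;          /* still uncommitted: keep */
			continue;
		}
		*pp = r->next;                  /* unlink from replay list */
		if (r->is_open) {
			r->next = imp->committed_open;
			imp->committed_open = r;
		}
	}
}
```

With many open files, committed opens accumulate on the parked list instead of being revisited on every reply; in the test, the second scan visits one entry instead of three.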

diff --git a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
index 4c70c06..a55eebf 100644
--- a/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
+++ b/drivers/staging/lustre/lustre/include/lustre/lustre_idl.h
@@ -1305,6 +1305,7 @@ extern void lustre_swab_ptlrpc_body(struct ptlrpc_body *pb);
 #define OBD_CONNECT_SHORTIO 0x2ULL/* short io */
 #define OBD_CONNECT_PINGLESS   0x4ULL/* pings not required */
 #define OBD_CONNECT_FLOCK_DEAD 0x8ULL/* flock deadlock detection */
+#define OBD_CONNECT_DISP_STRIPE 0x10ULL/*create stripe disposition*/
 
 /* XXX README XXX:
  * Please DO NOT add flag values here before first ensuring that this same
@@ -1344,7 +1345,9 @@ extern void lustre_swab_ptlrpc_body(struct ptlrpc_body *pb);
OBD_CONNECT_LIGHTWEIGHT | OBD_CONNECT_UMASK | \
OBD_CONNECT_LVB_TYPE | OBD_CONNECT_LAYOUTLOCK |\
OBD_CONNECT_PINGLESS | OBD_CONNECT_MAX_EASIZE |\
-   OBD_CONNECT_FLOCK_DEAD)
+   OBD_CONNECT_FLOCK_DEAD | \
+   OBD_CONNECT_DISP_STRIPE)
+
 #define OST_CONNECT_SUPPORTED  (OBD_CONNECT_SRVLOCK | OBD_CONNECT_GRANT | \
OBD_CONNECT_REQPORTAL | OBD_CONNECT_VERSION | \
OBD_CONNECT_TRUNCLOCK | OBD_CONNECT_INDEX | \
@@ -2114,6 +2117,7 @@ extern void lustre_swab_generic_32s (__u32 *val);
 #define DISP_ENQ_CREATE_REF  0x0100
 #define DISP_OPEN_LOCK   0x0200
 #define DISP_OPEN_LEASE  0x0400
+#define DISP_OPEN_STRIPE 0x0800
 
 /* INODE LOCK PARTS */
 #define MDS_INODELOCK_LOOKUP 0x01  /* For namespace, dentry etc, and also
diff --git a/drivers/staging/lustre/lustre/include/lustre_export.h b/drivers/staging/lustre/lustre/include/lustre_export.h
index 82a230b..6f7f48c 100644
--- a/drivers/staging/lustre/lustre/include/lustre_export.h
+++ b/drivers/staging/lustre/lustre/include/lustre_export.h
@@ -388,6 +388,15 @@ static inline __u64 exp_connect_ibits(struct obd_export *exp)
return ocd->ocd_ibits_known;
 }
 
+static inline bool imp_connect_disp_stripe(struct obd_import *imp)
+{
+   struct obd_connect_data *ocd;
+
+   LASSERT(imp != NULL);
+   ocd = &imp->imp_connect_data;
+   return ocd->ocd_connect_flags & OBD_CONNECT_DISP_STRIPE;
+}
+
 extern struct obd_export *class_conn2export(struct lustre_handle *conn);
 extern struct obd_device *class_conn2obd(struct lustre_handle *conn);
 
diff --git a/drivers/staging/lustre/lustre/include/lustre_import.h b/drivers/staging/lustre/lustre/include/lustre_import.h
index 67259eb..e9833ae 100644
--- a/drivers/staging/lustre/lustre/include/lustre_import.h
+++ b/drivers/staging/lustre/lustre/include/lustre_import.h
@@ -180,6 +180,17 @@ struct obd_import {
struct list_headimp_delayed_list;
/** @} */
 
+   /**
+* List of requests that are retained for committed open replay. Once
+

[PATCH 14/17] lustre/ptlrpc: re-enqueue ptlrpcd worker

2014-02-28 Thread Oleg Drokin

From: Liang Zhen 

osc_extent_wait can get stuck in a scenario like this:

1) thread-1 held an active extent
2) thread-2 called flush cache and marked this extent as "urgent"
   and "sync_wait"
3) thread-3 wants to write to the same extent; osc_extent_find will
   get a "conflict" because this extent is "sync_wait", so it starts
   to wait...
4) cl_writeback_work has been scheduled by thread-4 to write some
   other extents; it has sent RPCs, but they have not returned yet.
5) thread-1 finished its work and called osc_extent_release()->
   osc_io_unplug_async()->ptlrpcd_queue_work(), but found that
   cl_writeback_work was still running, so the request was ignored (-EBUSY)
6) thread-3 is stuck because nobody will wake it up.

This patch allows ptlrpcd_work to be rescheduled, so it will no longer
miss requests.

Signed-off-by: Liang Zhen 
Reviewed-on: http://review.whamcloud.com/8922
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4509
Reviewed-by: Jinshan Xiong 
Reviewed-by: Bobi Jam 
Reviewed-by: Oleg Drokin 
Signed-off-by: Oleg Drokin 
---
 drivers/staging/lustre/lustre/ptlrpc/client.c | 64 +--
 1 file changed, 40 insertions(+), 24 deletions(-)
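The re-enqueue logic that the diff below adds to work_interpreter() can be exercised single-threaded with this sketch. It is a model, not the Lustre code: the types and function names are hypothetical, and sk_queue_work() assumes the new ptlrpcd_queue_work() bumps rq_refcount and only enqueues on the 1 to 2 transition (its new body is not shown here).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical work item. refcount == 1: idle (owner's reference only);
 * refcount >= 2: queued or running, extra counts mean "run again". */
struct sk_work {
	atomic_int refcount;
	bool queued;   /* stands in for "added to a ptlrpcd set" */
	int runs;      /* number of callback executions */
};

static void sk_work_init(struct sk_work *w)
{
	atomic_init(&w->refcount, 1);
	w->queued = false;
	w->runs = 0;
}

/* Assumed queueing rule: only the caller that moves 1 -> 2 enqueues;
 * anyone else just leaves an extra count for the running instance. */
static void sk_queue_work(struct sk_work *w)
{
	if (atomic_fetch_add(&w->refcount, 1) == 1)
		w->queued = true;
}

/* ptlrpcd picks the item up and runs the callback. */
static void sk_work_begin(struct sk_work *w)
{
	assert(w->queued);
	w->queued = false;
	w->runs++;
}

/* Mirrors work_interpreter(): drop one count; if more remain than the
 * owner's, collapse them to 2 and re-enqueue instead of losing them. */
static void sk_work_end(struct sk_work *w)
{
	if (atomic_fetch_sub(&w->refcount, 1) - 1 > 1) {
		atomic_store(&w->refcount, 2);
		w->queued = true;
	}
}
```

In the test, the second sk_queue_work() call arrives while the callback is running; under the old code that call corresponds to the -EBUSY return in step 5, and the callback would have run only once.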

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index 7b97c64..4c9e006 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -48,6 +48,7 @@
 #include "ptlrpc_internal.h"
 
 static int ptlrpc_send_new_req(struct ptlrpc_request *req);
+static int ptlrpcd_check_work(struct ptlrpc_request *req);
 
 /**
  * Initialize passed in client structure \a cl.
@@ -1784,6 +1785,10 @@ interpret:
 
ptlrpc_req_interpret(env, req, req->rq_status);
 
+   if (ptlrpcd_check_work(req)) {
+   atomic_dec(>set_remaining);
+   continue;
+   }
ptlrpc_rqphase_move(req, RQ_PHASE_COMPLETE);
 
CDEBUG(req->rq_reqmsg != NULL ? D_RPCTRACE : 0,
@@ -2957,22 +2962,50 @@ EXPORT_SYMBOL(ptlrpc_sample_next_xid);
  *have delay before it really runs by ptlrpcd thread.
  */
 struct ptlrpc_work_async_args {
-   __u64   magic;
int   (*cb)(const struct lu_env *, void *);
void   *cbdata;
 };
 
-#define PTLRPC_WORK_MAGIC 0x6655436b676f4f44ULL /* magic code */
+static void ptlrpcd_add_work_req(struct ptlrpc_request *req)
+{
+   /* re-initialize the req */
+   req->rq_timeout = obd_timeout;
+   req->rq_sent= cfs_time_current_sec();
+   req->rq_deadline= req->rq_sent + req->rq_timeout;
+   req->rq_reply_deadline  = req->rq_deadline;
+   req->rq_phase   = RQ_PHASE_INTERPRET;
+   req->rq_next_phase  = RQ_PHASE_COMPLETE;
+   req->rq_xid = ptlrpc_next_xid();
+   req->rq_import_generation = req->rq_import->imp_generation;
+
+   ptlrpcd_add_req(req, PDL_POLICY_ROUND, -1);
+}
 
 static int work_interpreter(const struct lu_env *env,
struct ptlrpc_request *req, void *data, int rc)
 {
struct ptlrpc_work_async_args *arg = data;
 
-   LASSERT(arg->magic == PTLRPC_WORK_MAGIC);
+   LASSERT(ptlrpcd_check_work(req));
LASSERT(arg->cb != NULL);
 
-   return arg->cb(env, arg->cbdata);
+   rc = arg->cb(env, arg->cbdata);
+
+   list_del_init(&req->rq_set_chain);
+   req->rq_set = NULL;
+
+   if (atomic_dec_return(&req->rq_refcount) > 1) {
+   atomic_set(&req->rq_refcount, 2);
+   ptlrpcd_add_work_req(req);
+   }
+   return rc;
+}
+
+static int worker_format;
+
+static int ptlrpcd_check_work(struct ptlrpc_request *req)
+{
+   return req->rq_pill.rc_fmt == (void *)&worker_format;
 }
 
 /**
@@ -3005,6 +3038,7 @@ void *ptlrpcd_alloc_work(struct obd_import *imp,
req->rq_receiving_reply = 0;
req->rq_must_unlink = 0;
req->rq_no_delay = req->rq_no_resend = 1;
+   req->rq_pill.rc_fmt = (void *)&worker_format;
 
spin_lock_init(&req->rq_lock);
INIT_LIST_HEAD(&req->rq_list);
@@ -3018,7 +3052,6 @@ void *ptlrpcd_alloc_work(struct obd_import *imp,
 
CLASSERT(sizeof(*args) <= sizeof(req->rq_async_args));
args = ptlrpc_req_async_args(req);
-   args->magic  = PTLRPC_WORK_MAGIC;
args->cb = cb;
args->cbdata = cbdata;
 
@@ -3048,25 +3081,8 @@ int ptlrpcd_queue_work(void *handler)
 * req as opaque data. - Jinshan
 */
LASSERT(atomic_read(&req->rq_refcount) > 0);
-   if (atomic_read(&req->rq_refcount) > 1)
-   return -EBUSY;
-
-   if (atomic_inc_return(&req->rq_refcount) > 2) { /* race */
-   atomic_dec(&req->rq_refcount);
-   return -EBUSY;
-   }
-
-   /* re-initialize the req */
-   req->rq_timeout = obd_timeout;
-   req->rq_sent   = cfs_time_current_sec();
-   req->rq_deadline   = req->rq_sent + req->rq_timeout;
-   req->rq_reply_deadline = req->rq_deadline;
-   req->rq_phase =

[PATCH 16/17] lustre/quota: improper assert in osc_quota_chkdq()

2014-02-28 Thread Oleg Drokin

From: Niu Yawei 

In osc_quota_chkdq(), we should never try to access the oqi found
in the hash, since it could already have been freed by osc_quota_setdq().

Signed-off-by: Niu Yawei 
Reviewed-on: http://review.whamcloud.com/8460
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4336
Reviewed-by: Johann Lombardi 
Reviewed-by: Fan Yong 
Signed-off-by: Oleg Drokin 
---
 drivers/staging/lustre/lustre/osc/osc_quota.c | 7 ++-
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/lustre/lustre/osc/osc_quota.c b/drivers/staging/lustre/lustre/osc/osc_quota.c
index 6045a78..f395ae4 100644
--- a/drivers/staging/lustre/lustre/osc/osc_quota.c
+++ b/drivers/staging/lustre/lustre/osc/osc_quota.c
@@ -51,11 +51,8 @@ int osc_quota_chkdq(struct client_obd *cli, const unsigned int qid[])
 
oqi = cfs_hash_lookup(cli->cl_quota_hash[type], &qid[type]);
if (oqi) {
-   obd_uid id = oqi->oqi_id;
-
-   LASSERTF(id == qid[type],
-"The ids don't match %u != %u\n",
-id, qid[type]);
+   /* do not try to access oqi here, it could have been
+* freed by osc_quota_setdq() */
 
/* the slot is busy, the user is about to run out of
 * quota space on this OST */
-- 
1.8.5.3


[PATCH 13/17] lustre/ptlrpc: fix 'data race condition' issues

2014-02-28 Thread Oleg Drokin

From: Sebastien Buisson 

Fix 'data race condition' defects found by Coverity version
6.5.0:
Data race condition (MISSING_LOCK)
Accessing variable without holding lock. Elsewhere,
this variable is accessed with lock held.

Signed-off-by: Sebastien Buisson 
Reviewed-on: http://review.whamcloud.com/6575
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-2744
Reviewed-by: Andreas Dilger 
Signed-off-by: Oleg Drokin 
---
 drivers/staging/lustre/lustre/ptlrpc/client.c | 6 ++
 drivers/staging/lustre/lustre/ptlrpc/niobuf.c | 4 
 2 files changed, 10 insertions(+)

diff --git a/drivers/staging/lustre/lustre/ptlrpc/client.c b/drivers/staging/lustre/lustre/ptlrpc/client.c
index 98041e8..7b97c64 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/client.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/client.c
@@ -1190,7 +1190,9 @@ static int after_reply(struct ptlrpc_request *req)
 * will roundup it */
req->rq_replen   = req->rq_nob_received;
req->rq_nob_received = 0;
+   spin_lock(&req->rq_lock);
req->rq_resend   = 1;
+   spin_unlock(&req->rq_lock);
return 0;
}
 
@@ -1412,7 +1414,9 @@ static int ptlrpc_send_new_req(struct ptlrpc_request *req)
req->rq_status = rc;
return 1;
} else {
+   spin_lock(&req->rq_lock);
req->rq_wait_ctx = 1;
+   spin_unlock(&req->rq_lock);
return 0;
}
}
@@ -1427,7 +1431,9 @@ static int ptlrpc_send_new_req(struct ptlrpc_request *req)
rc = ptl_send_rpc(req, 0);
if (rc) {
DEBUG_REQ(D_HA, req, "send failed (%d); expect timeout", rc);
+   spin_lock(&req->rq_lock);
req->rq_net_err = 1;
+   spin_unlock(&req->rq_lock);
return rc;
}
return 0;
diff --git a/drivers/staging/lustre/lustre/ptlrpc/niobuf.c b/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
index 1e94597..a47a8d8 100644
--- a/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
+++ b/drivers/staging/lustre/lustre/ptlrpc/niobuf.c
@@ -511,7 +511,9 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
CDEBUG(D_HA, "muting rpc for failed imp obd %s\n",
   request->rq_import->imp_obd->obd_name);
/* this prevents us from waiting in ptlrpc_queue_wait */
+   spin_lock(&request->rq_lock);
request->rq_err = 1;
+   spin_unlock(&request->rq_lock);
request->rq_status = -ENODEV;
return -ENODEV;
}
@@ -553,7 +555,9 @@ int ptl_send_rpc(struct ptlrpc_request *request, int noreply)
if (rc) {
/* this prevents us from looping in
 * ptlrpc_queue_wait */
+   spin_lock(&request->rq_lock);
request->rq_err = 1;
+   spin_unlock(&request->rq_lock);
request->rq_status = rc;
GOTO(cleanup_bulk, rc);
}
-- 
1.8.5.3
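The hunks above wrap single-flag assignments in rq_lock. One plausible reason, assuming these rq_* flags are one-bit bitfields sharing a storage word, is that a bitfield store is a read-modify-write of the whole word, so an unlocked store can undo a concurrent locked store to a neighbouring bit. A minimal pthread sketch of the locking discipline, with hypothetical names (`sk_request`, `sk_set_err`, `sk_set_resend`):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical request with one-bit flags that share a storage word. */
struct sk_request {
	pthread_mutex_t lock;           /* plays the role of rq_lock */
	unsigned int resend:1, err:1, net_err:1, wait_ctx:1;
};

/* Every writer of any flag takes the lock, so the read-modify-write of
 * the shared word cannot interleave with another flag update. */
static void sk_set_err(struct sk_request *req)
{
	pthread_mutex_lock(&req->lock);
	req->err = 1;
	pthread_mutex_unlock(&req->lock);
}

static void sk_set_resend(struct sk_request *req)
{
	pthread_mutex_lock(&req->lock);
	req->resend = 1;
	pthread_mutex_unlock(&req->lock);
}

static void *sk_err_thread(void *arg)
{
	sk_set_err(arg);
	return NULL;
}

/* Sets err from a second thread while this thread sets resend; both
 * bits must survive. */
static bool sk_both_flags_survive(void)
{
	struct sk_request req = { .lock = PTHREAD_MUTEX_INITIALIZER };
	pthread_t t;

	if (pthread_create(&t, NULL, sk_err_thread, &req) != 0)
		return false;
	sk_set_resend(&req);
	pthread_join(t, NULL);
	assert(req.net_err == 0 && req.wait_ctx == 0);
	return req.err == 1 && req.resend == 1;
}
```

A single run cannot demonstrate the lost update, which is timing dependent; the sketch only shows the discipline the patch enforces.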


[PATCH 15/17] lustre/osc: Don't flush active extents.

2014-02-28 Thread Oleg Drokin

From: Ann Koehler 

The extent is active, so we need to abort and let the caller
re-dirty the page. If we continued here, and we were the one
that made the extent active, we could deadlock waiting for the
page writeback to clear; it never would, because the extent is
active and won't be written out.

Signed-off-by: Ann Koehler 
Reviewed-on: http://review.whamcloud.com/8278
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4253
Reviewed-by: Jinshan Xiong 
Reviewed-by: Alex Zhuravlev 
Reviewed-by: Alexey Lyashkov 
Reviewed-by: Oleg Drokin 
Signed-off-by: Oleg Drokin 
---
 drivers/staging/lustre/lustre/osc/osc_cache.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/staging/lustre/lustre/osc/osc_cache.c b/drivers/staging/lustre/lustre/osc/osc_cache.c
index b92a02e..af25c19 100644
--- a/drivers/staging/lustre/lustre/osc/osc_cache.c
+++ b/drivers/staging/lustre/lustre/osc/osc_cache.c
@@ -2394,6 +2394,12 @@ int osc_flush_async_page(const struct lu_env *env, struct cl_io *io,
 * really sending the RPC. */
case OES_TRUNC:
/* race with truncate, page will be redirtied */
+   case OES_ACTIVE:
+   /* The extent is active so we need to abort and let the caller
+* re-dirty the page. If we continued on here, and we were the
+* one making the extent active, we could deadlock waiting for
+* the page writeback to clear but it won't because the extent
+* is active and won't be written out. */
GOTO(out, rc = -EAGAIN);
default:
break;
-- 
1.8.5.3
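The new `case OES_ACTIVE:` has no statements of its own before the GOTO, so OES_TRUNC and OES_ACTIVE now share the -EAGAIN exit. A reduced sketch of that control flow, using a hypothetical enum (not the real osc extent states or values) and a plain return in place of GOTO():

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical subset of extent states, for illustration only. */
enum sk_extent_state {
	SK_OES_INV,
	SK_OES_ACTIVE,
	SK_OES_CACHE,
	SK_OES_TRUNC,
};

/* Returns -EAGAIN when the caller must re-dirty the page instead of
 * waiting for writeback, 0 when flushing may proceed. */
static int sk_flush_check(enum sk_extent_state state)
{
	assert(state <= SK_OES_TRUNC);
	switch (state) {
	case SK_OES_TRUNC:
		/* race with truncate, page will be re-dirtied */
	case SK_OES_ACTIVE:
		/* the extent is still active and will not be written out,
		 * so waiting for writeback could deadlock */
		return -EAGAIN;
	default:
		break;
	}
	return 0;
}
```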


[PATCH 17/17] lustre/libcfs: warn if all HTs in a core are gone

2014-02-28 Thread Oleg Drokin

The libcfs CPU partition code can't support CPU hotplug, but it is
safe to plug in a new CPU or to enable/disable hyper-threading. There
is a potential risk only when a CPU is unplugged, because that may
break the CPU affinity of Lustre threads.

Currently libcfs prints a warning for every CPU notification. This
patch changes that behavior so that a warning is printed only when all
HTs in a CPU core have been lost, which may have broken the affinity
of Lustre threads.

Signed-off-by: Liang Zhen 
Reviewed-on: http://review.whamcloud.com/8770
Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-4454
Reviewed-by: Bobi Jam 
Reviewed-by: Andreas Dilger 
Signed-off-by: Oleg Drokin 
---
 .../staging/lustre/lustre/libcfs/linux/linux-cpu.c| 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/staging/lustre/lustre/libcfs/linux/linux-cpu.c b/drivers/staging/lustre/lustre/libcfs/linux/linux-cpu.c
index 58bb256..77b1ef6 100644
--- a/drivers/staging/lustre/lustre/libcfs/linux/linux-cpu.c
+++ b/drivers/staging/lustre/lustre/libcfs/linux/linux-cpu.c
@@ -952,6 +952,7 @@ static int
 cfs_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
 {
unsigned int  cpu = (unsigned long)hcpu;
+   bool warn;
 
switch (action) {
case CPU_DEAD:
@@ -962,9 +963,21 @@ cfs_cpu_notify(struct notifier_block *self, unsigned long action, void *hcpu)
cpt_data.cpt_version++;
spin_unlock(&cpt_data.cpt_lock);
default:
-   CWARN("Lustre: can't support CPU hotplug well now, "
- "performance and stability could be impacted"
- "[CPU %u notify: %lx]\n", cpu, action);
+   if (action != CPU_DEAD && action != CPU_DEAD_FROZEN) {
+   CDEBUG(D_INFO, "CPU changed [cpu %u action %lx]\n",
+  cpu, action);
+   break;
+   }
+
+   down(&cpt_data.cpt_mutex);
+   /* if all HTs in a core are offline, it may break affinity */
+   cfs_cpu_ht_siblings(cpu, cpt_data.cpt_cpumask);
+   warn = any_online_cpu(*cpt_data.cpt_cpumask) >= nr_cpu_ids;
+   up(&cpt_data.cpt_mutex);
+   CDEBUG(warn ? D_WARNING : D_INFO,
+  "Lustre: can't support CPU plug-out well now, "
+  "performance and stability could be impacted "
+  "[CPU %u action: %lx]\n", cpu, action);
}
 
return NOTIFY_OK;
-- 
1.8.5.3
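The new CPU_DEAD path warns only when the dead CPU's hyper-thread siblings are all offline, i.e. when any_online_cpu() over the sibling mask finds nothing and returns nr_cpu_ids or more. The same test with plain 64-bit masks, as a sketch: the mask layout and helper names are hypothetical, and the kernel code uses cpumask helpers rather than integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUs 0..63 as bits. siblings is the mask of hyper-threads sharing a
 * core with the dead CPU (the dead CPU included); online is the set of
 * online CPUs before the event. */
static bool sk_should_warn(unsigned int dead_cpu, uint64_t siblings,
			   uint64_t online)
{
	uint64_t still_online = online & ~(1ULL << dead_cpu);

	assert(dead_cpu < 64);
	assert(siblings & (1ULL << dead_cpu));
	/* Warn only if no sibling of the dead CPU remains online. */
	return (siblings & still_online) == 0;
}
```

For a core made of CPUs 0 and 1 (mask 0x3), losing CPU 1 while CPU 0 is online stays quiet; losing it after CPU 0 is already gone warns.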


[PATCH 00/17] Lustre stability patches

2014-02-28 Thread Oleg Drokin

This series of patches fixes most of the issues I hit while running
the Lustre regression test suite. All observed crashes are gone too.

Please consider for inclusion.

Alexey Lyashkov (1):
  lustre/mdc: use ibits_known mask for lock match

Ann Koehler (1):
  lustre/osc: Don't flush active extents.

Bruno Faccini (1):
  lustre/ldlm: set l_lvb_type coherent when layout is returned

Hongchao Zhang (1):
  lustre/recovery: free open/close request promptly

John L. Hammond (3):
  staging/lustre/llite: fix open lock matching in ll_md_blocking_ast()
  lustre/clio: honor O_NOATIME
  lustre/mdc: fix bad ERR_PTR usage in mdc_locks.c

Lai Siyao (1):
  lustre/llite: simplify dentry revalidate

Liang Zhen (2):
  lustre/ptlrpc: rq_commit_cb is called for twice
  lustre/ptlrpc: re-enqueue ptlrpcd worker

Niu Yawei (1):
  lustre/quota: improper assert in osc_quota_chkdq()

Oleg Drokin (3):
  lustre/mdc: Check for all attributes validity in revalidate
  lustre/llite: Do not send parent dir fid in getattr by fid
  lustre/libcfs: warn if all HTs in a core are gone

Peng Tao (1):
  lustre/ptlrpc: skip rpcs that fail ptl_send_rpc

Sebastien Buisson (1):
  lustre/ptlrpc: fix 'data race condition' issues

wang di (1):
  lustre/mdc: comments on LOOKUP and PERM lock

 drivers/staging/lustre/lustre/include/cl_object.h  |   6 +-
 .../lustre/lustre/include/lustre/lustre_idl.h  |  32 ++-
 .../staging/lustre/lustre/include/lustre_export.h  |  17 ++
 .../staging/lustre/lustre/include/lustre_import.h  |  11 +
 drivers/staging/lustre/lustre/include/lustre_net.h |   2 +
 drivers/staging/lustre/lustre/include/obd.h|   5 +-
 drivers/staging/lustre/lustre/include/obd_class.h  |   4 +-
 drivers/staging/lustre/lustre/ldlm/ldlm_lockd.c|   1 +
 .../staging/lustre/lustre/libcfs/linux/linux-cpu.c |  19 +-
 drivers/staging/lustre/lustre/llite/dcache.c   | 290 ++---
 drivers/staging/lustre/lustre/llite/dir.c  |   2 +-
 drivers/staging/lustre/lustre/llite/file.c |  60 +++--
 .../staging/lustre/lustre/llite/llite_internal.h   |   6 +-
 drivers/staging/lustre/lustre/llite/llite_lib.c|   3 +-
 drivers/staging/lustre/lustre/llite/namei.c|  78 +++---
 drivers/staging/lustre/lustre/lmv/lmv_intent.c |   1 -
 drivers/staging/lustre/lustre/lmv/lmv_obd.c|   5 +-
 drivers/staging/lustre/lustre/lov/lov_io.c |   1 +
 drivers/staging/lustre/lustre/mdc/mdc_internal.h   |   2 +-
 drivers/staging/lustre/lustre/mdc/mdc_locks.c  | 102 
 drivers/staging/lustre/lustre/mdc/mdc_reint.c  |   1 +
 drivers/staging/lustre/lustre/mdc/mdc_request.c|  27 +-
 drivers/staging/lustre/lustre/obdclass/genops.c|   2 +
 .../lustre/lustre/obdclass/lprocfs_status.c|   1 +
 drivers/staging/lustre/lustre/osc/osc_cache.c  |   6 +
 drivers/staging/lustre/lustre/osc/osc_io.c |  14 +-
 drivers/staging/lustre/lustre/osc/osc_quota.c  |   7 +-
 drivers/staging/lustre/lustre/ptlrpc/client.c  | 155 ---
 drivers/staging/lustre/lustre/ptlrpc/import.c  |  33 ++-
 drivers/staging/lustre/lustre/ptlrpc/niobuf.c  |   4 +
 drivers/staging/lustre/lustre/ptlrpc/recover.c |  57 +++-
 31 files changed, 480 insertions(+), 474 deletions(-)

-- 
1.8.5.3


[PATCH] Staging:tidspbridge: Fixing coding style

2014-02-28 Thread Masood Mehmood



Fixing some basic coding style issues.

Signed-off-by: Masood Mehmood 
---
 drivers/staging/tidspbridge/rmgr/node.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/tidspbridge/rmgr/node.c b/drivers/staging/tidspbridge/rmgr/node.c
index 87dfa92..76f885f 100644
--- a/drivers/staging/tidspbridge/rmgr/node.c
+++ b/drivers/staging/tidspbridge/rmgr/node.c
@@ -246,7 +246,7 @@ static void fill_stream_def(struct node_object *hnode,
struct node_strmdef *pstrm_def,
struct dsp_strmattr *pattrs);
 static void free_stream(struct node_mgr *hnode_mgr, struct stream_chnl stream);
-static int get_fxn_address(struct node_object *hnode, u32 * fxn_addr,
+static int get_fxn_address(struct node_object *hnode, u32 *fxn_addr,
  u32 phase);
 static int get_node_props(struct dcd_manager *hdcd_mgr,
 struct node_object *hnode,
@@ -406,7 +406,7 @@ int node_allocate(struct proc_object *hprocessor,
 
/* check for page aligned Heap size */
if (((attr_in->heap_size) & (PG_SIZE4K - 1))) {
-   pr_err("%s: node heap size not aligned to 4K, size = 0x%x \n",
+   pr_err("%s: node heap size not aligned to 4K, size = 0x%x",
   __func__, attr_in->heap_size);
status = -EINVAL;
} else {
@@ -425,7 +425,7 @@ int node_allocate(struct proc_object *hprocessor,
task_arg_obj.dsp_heap_res_addr),
 pr_ctxt);
if (status) {
-   pr_err("%s: Failed to reserve memory for heap: 0x%x\n",
+   pr_err("%s: Failed to reserve memory for heap: 0x%x",
   __func__, status);
goto func_cont;
}
@@ -703,9 +703,9 @@ DBAPI node_alloc_msg_buf(struct node_object *hnode, u32 usize,
pattr = _dfltbufattrs; /* set defaults */
 
status = proc_get_processor_id(pnode->processor, &proc_id);
-   if (proc_id != DSP_UNIT) {
+   if (proc_id != DSP_UNIT)
goto func_end;
-   }
+
/*  If segment ID includes MEM_SETVIRTUALSEGID then pbuffer is a
 *  virt  address, so set this info in this node's translator
 *  object for  future ref. If MEM_GETVIRTUALSEGID then retrieve
@@ -886,11 +886,10 @@ int node_connect(struct node_object *node1, u32 stream1,
if (pattrs && pattrs->strm_mode != STRMMODE_PROCCOPY)
return -EPERM;  /* illegal stream mode */
 
-   if (node1_type != NODE_GPP) {
+   if (node1_type != NODE_GPP)
hnode_mgr = node1->node_mgr;
-   } else {
+   else
hnode_mgr = node2->node_mgr;
-   }
 
/* Enter critical section */
mutex_lock(&hnode_mgr->node_mgr_lock);
@@ -1576,7 +1575,7 @@ func_end:
  *  Purpose:
  *  Frees the message buffer.
  */
-int node_free_msg_buf(struct node_object *hnode, u8 * pbuffer,
+int node_free_msg_buf(struct node_object *hnode, u8 *pbuffer,
 struct dsp_bufferattr *pattr)
 {
struct node_object *pnode = (struct node_object *)hnode;
@@ -2322,7 +2321,8 @@ int node_terminate(struct node_object *hnode, int *pstatus)
if (!hdeh_mgr)
goto func_cont;
 
-   bridge_deh_notify(hdeh_mgr, DSP_SYSERROR, DSP_EXCEPTIONABORT);
+   bridge_deh_notify(hdeh_mgr, DSP_SYSERROR,
+   DSP_EXCEPTIONABORT);
}
}
 func_cont:
@@ -2640,8 +2640,7 @@ static void free_stream(struct node_mgr *hnode_mgr, struct stream_chnl stream)
  *  Purpose:
  *  Retrieves the address for create, execute or delete phase for a node.
  */
-static int get_fxn_address(struct node_object *hnode, u32 * fxn_addr,
- u32 phase)
+static int get_fxn_address(struct node_object *hnode, u32 *fxn_addr, u32 phase)
 {
char *pstr_fxn_name = NULL;
struct node_mgr *hnode_mgr = hnode->node_mgr;
-- 
1.8.1.4

Re: [PATCH v3 5/5] pci: Add support for creating a generic host_bridge from device tree

2014-02-28 Thread Tanmay Inamdar

My earlier email was not delivered to the mailing lists because of a
plain-text setting problem on my side. Apologies for the spam; sending it again.

Hello Liviu,

While porting the X-Gene PCIe driver to the v2 series, I observed the following problems.

1. In the 'of_create_pci_host_bridge' function, bus_range is a local
variable. So when the list of resources in bridge->windows is walked
later, during X-Gene controller setup, the resource contains garbage
values. Please allocate it dynamically.

2. The 'domain_nr' problem is only partially solved. There are still some
places where functions get an invalid domain_nr. For example,
'pci_alloc_child_bus' tries to get domain_nr while the bridge is not yet
assigned to the bus. You may want to check all the places where
pci_domain_nr is used. Please see the dump below:

pci 0001:00:00.0: scanning [bus 00-00] behind bridge, pass 1
[ cut here ]
WARNING: CPU: 0 PID: 1 at
/home/tinamdar/work/open-source/linux/fs/sysfs/dir.c:52
sysfs_warn_dup+0x80/0xc0()
sysfs: cannot create duplicate filename '/class/pci_bus/:01'
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.14.0-rc4+ #37
Call trace:
[] dump_backtrace+0x0/0x140
[] show_stack+0x14/0x20
[] dump_stack+0x78/0xc4
[] warn_slowpath_common+0x88/0xc0
[] warn_slowpath_fmt+0x50/0x60
[] sysfs_warn_dup+0x80/0xc0
[] sysfs_do_create_link_sd.isra.2+0xf8/0x100
[] sysfs_create_link+0x20/0x40
[] device_add+0x41c/0x520
[] device_register+0x1c/0x40
[] pci_add_new_bus+0x284/0x380
[] pci_scan_bridge+0x4e0/0x540
[] pci_scan_child_bus+0xb4/0x140
[] pci_rescan_bus+0x14/0x40
[] xgene_pcie_probe_bridge+0x688/0x750
[] platform_drv_probe+0x24/0x60
[] really_probe+0xf4/0x220
[] __driver_attach+0xa4/0xc0
[] bus_for_each_dev+0x58/0xa0
[] driver_attach+0x20/0x40
[] bus_add_driver+0x150/0x220
[] driver_register+0x60/0x120
[] __platform_driver_register+0x60/0x80
[] xgene_pcie_driver_init+0x18/0x20
[] do_one_initcall+0xe4/0x160
[] kernel_init_freeable+0x138/0x1d8
[] kernel_init+0x10/0xe0
---[ end trace 53db1c3a7fbdeb88 ]---
[ cut here ]
WARNING: CPU: 0 PID: 1 at
/home/tinamdar/work/open-source/linux/drivers/pci/probe.c:711
pci_add_new_bus+0x36c/0x380()
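
Point 1 is the usual lifetime problem of linking a stack object into a structure that outlives the function. A user-space sketch of the suggested fix, with hypothetical types standing in for struct resource and the bridge->windows list: the range is heap-allocated, so it stays valid after the creating function returns.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct resource and the windows list. */
struct sk_resource {
	unsigned long start, end;
	struct sk_resource *next;
};

struct sk_bridge {
	struct sk_resource *windows;
};

/* Adds a bus-range window. Linking the address of a local variable here
 * would leave a dangling pointer once this function returns, which would
 * explain garbage seen when the list is walked later; allocating avoids
 * that. */
static int sk_add_bus_range(struct sk_bridge *bridge,
			    unsigned long first, unsigned long last)
{
	struct sk_resource *res;

	assert(first <= last);
	res = malloc(sizeof(*res));
	if (res == NULL)
		return -1;
	res->start = first;
	res->end = last;
	res->next = bridge->windows;
	bridge->windows = res;
	return 0;
}

static void sk_free_windows(struct sk_bridge *bridge)
{
	while (bridge->windows != NULL) {
		struct sk_resource *r = bridge->windows;

		bridge->windows = r->next;
		free(r);
	}
}
```

In the kernel the allocation would be kzalloc() or a devm_ allocation tied to the host device; which fits depends on how long the bridge lives.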

Thanks,
Tanmay

On Fri, Feb 28, 2014 at 6:01 PM, Tanmay Inamdar  wrote:
> On Fri, Feb 28, 2014 at 5:08 AM, Liviu Dudau  wrote:
>>
>> Several platforms use a rather generic version of parsing
>> the device tree to find the host bridge ranges. Move the common code
>> into the generic PCI code and use it to create a pci_host_bridge
>> structure that can be used by arch code.
>>
>> Based on early attempts by Andrew Murray to unify the code.
>> Used powerpc and microblaze PCI code as starting point.
>>
>> Signed-off-by: Liviu Dudau 
>>
>> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
>> index

Re: Final: Add 32 bit VDSO time function support

2014-02-28 Thread Andy Lutomirski

On Thu, Feb 27, 2014 at 11:22 PM, Stefani Seibold  wrote:
> Am Mittwoch, den 26.02.2014, 16:55 -0800 schrieb Andy Lutomirski:
>>
>> Once I patch it to work, your 32-bit code is considerably faster than
>> the 64-bit case.  It's enough faster that I suspect a bug.  Dumping
>> the in-memory vdso shows some rather suspicious nops before the rdtsc
>> instruction.  I suspect that you've forgotten to run the 32-bit vdso
>> through the alternatives code.  This is a nasty bug: it will appear to
>> work, but you'll see non-monotonic times on some SMP systems.
>>
>
> I didn't know this. My basic test case is a KVM guest, which defaults to 1 CPU.
> Thanks for discovering the issue.

This leads to a potentially interesting question: is rdtsc_barrier()
actually necessary on UP?  IIRC the point is that, if an
rdtsc_barrier(); rdtsc in one thread is "before" (in the sense of
being synchronized by some memory operation) an rdtsc_barrier(); rdtsc
in another thread, then the first rdtsc needs to return an earlier or
equal time to the second one.

I assume that no UP CPU is silly enough to execute two rdtsc
instructions out of order relative to each other in the absence of
barriers.  So this is a nonissue on UP.

On the other hand, suppose that some code does:

volatile long x = *(something that's not in cache)
clock_gettime

I can imagine a modern CPU speculating far enough ahead that the rdtsc
happens *before* the cache miss.  This won't cause visible
non-monotonicity as far as I can see, but it might annoy people who
try to benchmark their code.
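
For reference, a fenced TSC read of the kind rdtsc_barrier() provides looks roughly like this in user space. This is a sketch under assumptions: it targets x86-64 with GCC or Clang (`_mm_lfence()` and `__rdtsc()` from <x86intrin.h>), falls back to clock_gettime() elsewhere, and hard-codes lfence. In kernels of this period rdtsc_barrier() is built from alternatives that start out as NOPs and are patched to mfence or lfence depending on the CPU, which would fit the unpatched nops described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__)
#include <x86intrin.h>

/* Fence, then read the TSC. Intel documents LFENCE before RDTSC as
 * keeping the read from executing before earlier instructions complete;
 * some AMD CPUs needed MFENCE instead. */
static uint64_t sk_read_tsc_ordered(void)
{
	_mm_lfence();
	return __rdtsc();
}
#else
/* Portable stand-in so the sketch still builds on other architectures. */
static uint64_t sk_read_tsc_ordered(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif
```

This only illustrates the instruction ordering; it does not settle whether the fence can be dropped on UP, which is the open question here.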

Note: actually making this change might be a bit tricky.  I don't know
if the alternatives code is smart enough.

--Andy

Re: [PATCH 0/1] mm, shmem: map few pages around fault address if they are in page cache

2014-02-28 Thread Andrew Morton

On Fri, 28 Feb 2014 16:35:16 -0800 Ning Qu  wrote:

> Sorry for the mistake in my earlier experiments; here are the real numbers.
> 
> Btw, apparently there are still some questions about the results, and
> I will sync with Kirill about his test command line.
> 
> Below are some simple experiment numbers for this patch; let me know if
> you would like more:
> 
> Tested on a Xeon machine with 64GiB of RAM, using the current default fault
> order 4.
> 
> Sequential access 8GiB file
>                      Baseline    with-patch
> 1 thread
> minor fault         8,389,052     4,456,530
> time, seconds            9.55          8.31

The numbers still seem wrong.  I'd expect to see almost exactly 2M minor
faults with this test.

Looky:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

#define G (1024 * 1024 * 1024)

int main(int argc, char *argv[])
{
	char *p;
	int fd;
	unsigned long idx;
	int sum = 0;

	fd = open("foo", O_RDONLY);
	if (fd < 0) {
		perror("open");
		exit(1);
	}
	p = mmap(NULL, 1 * G, PROT_READ, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}

	for (idx = 0; idx < 1 * G; idx += 4096)
		sum += p[idx];
	printf("%d\n", sum);
	exit(0);
}

z:/home/akpm> /usr/bin/time ./a.out
0
0.05user 0.33system 0:00.38elapsed 99%CPU (0avgtext+0avgdata 4195856maxresident)k
0inputs+0outputs (0major+262264minor)pagefaults 0swaps

z:/home/akpm> dc
16o
262264 4 * p
1001E0

That's close!
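The dc step converts the fault count to KiB (16o switches output to hex): 262264 * 4 = 1049056 = 0x1001E0 KiB, just over 1 GiB, against 262,144 pages in the 1 GiB mapping. For the 8 GiB test, 8 GiB / 4 KiB = 2,097,152, which is the "2M" Andrew expects. The hypothetical helpers below redo that arithmetic and add an idealized fault-around count, assuming every fault maps a full window of 1 << order pages; real runs will not hit that bound exactly.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096ULL

/* Minor faults expected when each 4 KiB page is touched exactly once. */
static uint64_t pages_touched(uint64_t bytes)
{
	return bytes / PAGE_SIZE_4K;
}

/*
 * Idealized fault-around: each fault maps a whole window of
 * (1 << order) pages, so the count drops by that factor (rounded up).
 */
static uint64_t faults_with_faultaround(uint64_t bytes, unsigned int order)
{
	uint64_t window = 1ULL << order;

	return (pages_touched(bytes) + window - 1) / window;
}

/* Andrew's dc step: minor faults times 4 KiB, expressed in KiB. */
static uint64_t faults_to_kib(uint64_t faults)
{
	return faults * 4;
}
```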

Re: [PATCH 11/19] perf, c2c: Add in sort on physid

2014-02-28 Thread Namhyung Kim

Hi Andi,

On Sat, Mar 1, 2014 at 1:07 AM, Andi Kleen  wrote:
>> I don't think I understand the problem enough to know what to fix.  I just
>> copied this piece of code from builtin-report.c and things seemed to work.
>>
>> Mind giving me some details and I can look at fixing it. :-)
>
> Even though sort.c has all these sort keys, it only sorts by period.
> It should instead sort by all the specified keys, in order.
>
> Namhyung looked at it at some point.

Yes, I'm working on it now, but I only have a little time.  Hopefully
I can send an RFC version next week.

Thanks,
Namhyung

Re: [PATCH 1/1] mm: implement ->map_pages for shmem/tmpfs

2014-02-28 Thread Hugh Dickins

On Fri, 28 Feb 2014, Ning Qu wrote:

> In shmem/tmpfs, we also use the generic filemap_map_pages,
> seems the additional checking is not worth a separate version
> of map_pages for it.
> 
> Signed-off-by: Ning Qu 
> ---
>  mm/shmem.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 1f18c9d..2ea4e89 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2783,6 +2783,7 @@ static const struct super_operations shmem_ops = {
>  
>  static const struct vm_operations_struct shmem_vm_ops = {
>   .fault  = shmem_fault,
> + .map_pages  = filemap_map_pages,
>  #ifdef CONFIG_NUMA
>   .set_policy = shmem_set_policy,
>   .get_policy = shmem_get_policy,
> -- 

(There's no need for a 0/1; all the info should go into the one patch.)

I expect this will prove to be a very sensible and adequate patch,
thank you: it probably wouldn't be worth more effort to give shmem
anything special of its own, and filemap_map_pages() is already
(almost) coping with exceptional entries.
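The point of the fix Hugh posts here is that an exceptional entry must be skipped before anything treats it as a `struct page`. Below is a userspace model under stated assumptions: slots hold either real pointers (at least 4-byte aligned) or tagged values with bit 1 set, which is assumed to mirror RADIX_TREE_EXCEPTIONAL_ENTRY in kernels of this era, and `map_pages_model()`, `struct page_model` and the other names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tag bit for "exceptional" (non-pointer) entries. */
#define EXCEPTIONAL_TAG 2UL

struct page_model {
	int mapped;
};

static int is_exceptional(const void *entry)
{
	return ((uintptr_t)entry & EXCEPTIONAL_TAG) != 0;
}

/* Encode a small value (e.g. a swap entry) as a tagged slot. */
static void *make_exceptional(unsigned long value)
{
	return (void *)(uintptr_t)((value << 2) | EXCEPTIONAL_TAG);
}

/* Map every real page in slots[]; skip holes and exceptional entries. */
static int map_pages_model(void **slots, size_t n)
{
	int mapped = 0;

	for (size_t i = 0; i < n; i++) {
		void *entry = slots[i];

		if (entry == NULL)
			continue;
		if (is_exceptional(entry))
			continue;	/* never dereference: not a struct page */

		((struct page_model *)entry)->mapped = 1;
		mapped++;
	}
	return mapped;
}
```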

But I can't Ack it until I've tested it some more, and I won't be able to
do so until Sunday; even then there's some doubt, since this and Kirill's
are built upon mmotm/next, which after a while gives me spinlock
lockups under load these days, yet to be investigated.

"almost" above because, Kirill, even without Ning's extension to
shmem, your filemap_map_pages() soon crashes on an exceptional entry:

Don't try to dereference an exceptional entry.

Signed-off-by: Hugh Dickins 

--- mmotm+kirill/mm/filemap.c   2014-02-28 15:17:50.984019060 -0800
+++ linux/mm/filemap.c  2014-02-28 16:38:04.976633308 -0800
@@ -2084,7 +2084,7 @@ repeat:
if (radix_tree_deref_retry(page))
break;
else
-   goto next;
+   continue;
}
 
if (!page_cache_get_speculative(page))

kernfs: possible deadlock between of->mutex and mmap_sem

2014-02-28 Thread Sasha Levin


Hi all,

I've stumbled on the following while fuzzing with trinity inside a KVM tools guest running the latest
-next kernel.


We deal with files that have an mmap op by giving them a different locking class than files that
don't, because mmap_sem nesting is different for those files.


We assume that for mmap-supporting files, of->mutex will be nested inside mm->mmap_sem. However,
this is not always the case. Consider the following:


kernfs_fop_write()
	copy_from_user()
		might_fault()

might_fault() indicates that we may take mm->mmap_sem, which nests mm->mmap_sem inside
of->mutex: the reverse of the nesting assumed above.


I'll send a patch to fix it some time next week unless someone beats me to it :)


[ 1182.846501] ==
[ 1182.847256] [ INFO: possible circular locking dependency detected ]
[ 1182.848111] 3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26 Tainted: G        W
[ 1182.849088] ---
[ 1182.849927] trinity-c236/10658 is trying to acquire lock:
[ 1182.850094]  (>mutex#2){+.+.+.}, at: [] kernfs_fop_mmap+0x54/0x120
[ 1182.850094]
[ 1182.850094] but task is already holding lock:
[ 1182.850094]  (>mmap_sem){++}, at: [] vm_mmap_pgoff+0x6e/0xe0
[ 1182.850094]
[ 1182.850094] which lock already depends on the new lock.
[ 1182.850094]
[ 1182.850094]
[ 1182.850094] the existing dependency chain (in reverse order) is:
[ 1182.850094]
-> #1 (>mmap_sem){++}:
[ 1182.856968]        [] validate_chain+0x6c5/0x7b0
[ 1182.856968]        [] __lock_acquire+0x4cd/0x5a0
[ 1182.856968]        [] lock_acquire+0x182/0x1d0
[ 1182.856968]        [] might_fault+0x7e/0xb0
[ 1182.860975]        [] kernfs_fop_write+0xd8/0x190
[ 1182.860975]        [] vfs_write+0xe3/0x1d0
[ 1182.860975]        [] SyS_write+0x5d/0xa0
[ 1182.860975]        [] tracesys+0xdd/0xe2
[ 1182.860975]
-> #0 (>mutex#2){+.+.+.}:
[ 1182.860975]        [] check_prev_add+0x13f/0x560
[ 1182.860975]        [] validate_chain+0x6c5/0x7b0
[ 1182.860975]        [] __lock_acquire+0x4cd/0x5a0
[ 1182.860975]        [] lock_acquire+0x182/0x1d0
[ 1182.860975]        [] mutex_lock_nested+0x6a/0x510
[ 1182.860975]        [] kernfs_fop_mmap+0x54/0x120
[ 1182.860975]        [] mmap_region+0x310/0x5c0
[ 1182.860975]        [] do_mmap_pgoff+0x385/0x430
[ 1182.860975]        [] vm_mmap_pgoff+0x8f/0xe0
[ 1182.860975]        [] SyS_mmap_pgoff+0x1b0/0x210
[ 1182.860975]        [] SyS_mmap+0x1d/0x20
[ 1182.860975]        [] tracesys+0xdd/0xe2
[ 1182.860975]
[ 1182.860975] other info that might help us debug this:
[ 1182.860975]
[ 1182.860975]  Possible unsafe locking scenario:
[ 1182.860975]
[ 1182.860975]        CPU0                    CPU1
[ 1182.860975]        ----                    ----
[ 1182.860975]   lock(>mmap_sem);
[ 1182.860975]                                lock(>mutex#2);
[ 1182.860975]                                lock(>mmap_sem);
[ 1182.860975]   lock(>mutex#2);
[ 1182.860975]
[ 1182.860975]  *** DEADLOCK ***
[ 1182.860975]
[ 1182.860975] 1 lock held by trinity-c236/10658:
[ 1182.860975]  #0:  (>mmap_sem){++}, at: [] vm_mmap_pgoff+0x6e/0xe0
[ 1182.860975]
[ 1182.860975] stack backtrace:
[ 1182.860975] CPU: 2 PID: 10658 Comm: trinity-c236 Tainted: G        W    3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26

[ 1182.860975]   88011911fa48 8438e945 

[ 1182.860975]   88011911fa98 811a0109 
88011911fab8
[ 1182.860975]  88011911fab8 88011911fa98 880119128cc0 
880119128cf8
[ 1182.860975] Call Trace:
[ 1182.860975]  [] dump_stack+0x52/0x7f
[ 1182.860975]  [] print_circular_bug+0x129/0x160
[ 1182.860975]  [] check_prev_add+0x13f/0x560
[ 1182.860975]  [] ? deactivate_slab+0x511/0x550
[ 1182.860975]  [] validate_chain+0x6c5/0x7b0
[ 1182.860975]  [] __lock_acquire+0x4cd/0x5a0
[ 1182.860975]  [] ? mmap_region+0x24a/0x5c0
[ 1182.860975]  [] lock_acquire+0x182/0x1d0
[ 1182.860975]  [] ? kernfs_fop_mmap+0x54/0x120
[ 1182.860975]  [] mutex_lock_nested+0x6a/0x510
[ 1182.860975]  [] ? kernfs_fop_mmap+0x54/0x120
[ 1182.860975]  [] ? get_parent_ip+0x11/0x50
[ 1182.860975]  [] ? kernfs_fop_mmap+0x54/0x120
[ 1182.860975]  [] kernfs_fop_mmap+0x54/0x120
[ 1182.860975]  [] mmap_region+0x310/0x5c0
[ 1182.860975]  [] do_mmap_pgoff+0x385/0x430
[ 1182.860975]  [] ? vm_mmap_pgoff+0x6e/0xe0
[ 1182.860975]  [] vm_mmap_pgoff+0x8f/0xe0
[ 1182.860975]  [] ? __rcu_read_unlock+0x44/0xb0
[ 1182.860975]  [] ? dup_fd+0x3c0/0x3c0
[ 1182.860975]  [] SyS_mmap_pgoff+0x1b0/0x210
[ 1182.860975]  [] SyS_mmap+0x1d/0x20
[ 1182.860975]  [] tracesys+0xdd/0xe2
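The report boils down to two recorded orderings of the same pair of lock classes: the write path takes mmap_sem (via might_fault()) while holding of->mutex, and the mmap path takes of->mutex while holding mmap_sem. The toy checker below sketches that idea with a small lock-order graph and a reachability test. It is not lockdep's implementation, and the class and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* Hypothetical lock classes matching the report. */
enum lock_class_model { MMAP_SEM, OF_MUTEX };

/* order_edge[a][b]: some path acquired b while holding a. */
static bool order_edge[MAX_CLASSES][MAX_CLASSES];

/* Is 'to' reachable from 'from' in the recorded order graph? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < MAX_CLASSES; next++)
		if (order_edge[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record "acquired 'want' while holding 'held'".  Returns false (a
 * possible circular dependency) if 'held' is already reachable from
 * 'want', i.e. an earlier path took the locks in the opposite order.
 */
static bool record_acquire(int held, int want)
{
	bool seen[MAX_CLASSES];

	memset(seen, 0, sizeof(seen));
	if (reachable(want, held, seen))
		return false;
	order_edge[held][want] = true;
	return true;
}
```

Replaying the splat: `record_acquire(OF_MUTEX, MMAP_SEM)` (the write path, chain #1) succeeds, and the later `record_acquire(MMAP_SEM, OF_MUTEX)` (the mmap path, chain #0) is flagged, even though no deadlock has actually happened yet.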


Thanks,
Sasha

Re: [PATCH] audit: Use struct net not pid_t to remember the network namespce to reply in

2014-02-28 Thread Richard Guy Briggs

On 14/02/28, Eric W. Biederman wrote:
> While reading through 3.14-rc1 I found a pretty significant mishandling
> of network namespaces in the recent audit changes.
> 
> In struct audit_netlink_list and audit_reply add a reference to the
> network namespace of the caller and remove the userspace pid of the
> caller.  This cleanly remembers the caller's network namespace, and
> removes a huge class of races and nasty failure modes that can occur
> when attempting to relook up the caller's network namespace from a pid_t
> (including the caller's network namespace changing, pid wraparound, and
> the pid simply not being present).

Ok, so I see that avoiding pid_t in struct audit_reply and struct
audit_netlink_list is necessary.  Why not switch to struct pid?

How does this patch handle the caller's network namespace changing?

> Signed-off-by: "Eric W. Biederman" 
> ---
>  kernel/audit.c   |   10 ++
>  kernel/audit.h   |2 +-
>  kernel/auditfilter.c |3 ++-
>  3 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/audit.c b/kernel/audit.c
> index 34c5a2310fbf..1e5756f16f6f 100644
> --- a/kernel/audit.c
> +++ b/kernel/audit.c
> @@ -182,7 +182,7 @@ struct audit_buffer {
>  
>  struct audit_reply {
>   __u32 portid;
> - pid_t pid;
> + struct net *net;
>   struct sk_buff *skb;
>  };
>  
> @@ -500,7 +500,7 @@ int audit_send_list(void *_dest)
>  {
>   struct audit_netlink_list *dest = _dest;
>   struct sk_buff *skb;
> - struct net *net = get_net_ns_by_pid(dest->pid);
> + struct net *net = dest->net;
>   struct audit_net *aunet = net_generic(net, audit_net_id);
>  
>   /* wait for parent to finish and send an ACK */
> @@ -510,6 +510,7 @@ int audit_send_list(void *_dest)
>   while ((skb = __skb_dequeue(>q)) != NULL)
>   netlink_unicast(aunet->nlsk, skb, dest->portid, 0);
>  
> + put_net(net);
>   kfree(dest);
>  
>   return 0;
> @@ -543,7 +544,7 @@ out_kfree_skb:
>  static int audit_send_reply_thread(void *arg)
>  {
>   struct audit_reply *reply = (struct audit_reply *)arg;
> - struct net *net = get_net_ns_by_pid(reply->pid);
> + struct net *net = reply->net;
>   struct audit_net *aunet = net_generic(net, audit_net_id);
>  
>   mutex_lock(_cmd_mutex);
> @@ -552,6 +553,7 @@ static int audit_send_reply_thread(void *arg)
>   /* Ignore failure. It'll only happen if the sender goes away,
>  because our timeout is set to infinite. */
>   netlink_unicast(aunet->nlsk , reply->skb, reply->portid, 0);
> + put_net(net);
>   kfree(reply);
>   return 0;
>  }
> @@ -583,8 +585,8 @@ static void audit_send_reply(__u32 portid, int seq, int type, int done,
>   if (!skb)
>   goto out;
>  
> + reply->net = get_net(current->nsproxy->net_ns);
>   reply->portid = portid;
> - reply->pid = task_pid_vnr(current);
>   reply->skb = skb;
>  
>   tsk = kthread_run(audit_send_reply_thread, reply, "audit_send_reply");
> diff --git a/kernel/audit.h b/kernel/audit.h
> index 57cc64d67718..8df132214606 100644
> --- a/kernel/audit.h
> +++ b/kernel/audit.h
> @@ -247,7 +247,7 @@ extern void   audit_panic(const char *message);
>  
>  struct audit_netlink_list {
>   __u32 portid;
> - pid_t pid;
> + struct net *net;
>   struct sk_buff_head q;
>  };
>  
> diff --git a/kernel/auditfilter.c b/kernel/auditfilter.c
> index 14a78cca384e..a5e3d73d73e4 100644
> --- a/kernel/auditfilter.c
> +++ b/kernel/auditfilter.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "audit.h"
>  
>  /*
> @@ -1083,8 +1084,8 @@ int audit_list_rules_send(__u32 portid, int seq)
>   dest = kmalloc(sizeof(struct audit_netlink_list), GFP_KERNEL);
>   if (!dest)
>   return -ENOMEM;
> + dest->net = get_net(current->nsproxy->net_ns);
>   dest->portid = portid;
> - dest->pid = task_pid_vnr(current);
>   skb_queue_head_init(>q);
>  
>   mutex_lock(_filter_mutex);
> -- 
> 1.7.5.4
> 
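The patch swaps "remember a pid, look the namespace up later" for "pin the namespace now, release it when done", using get_net() when the request is queued and put_net() in the worker thread. Below is a userspace sketch of that lifetime pattern with a hypothetical refcounted `struct net_model`; the real put_net() frees the namespace when the count drops to zero, which the model only notes in a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct net: just a reference count. */
struct net_model {
	atomic_int count;
};

static struct net_model *get_net_model(struct net_model *net)
{
	atomic_fetch_add(&net->count, 1);
	return net;
}

static void put_net_model(struct net_model *net)
{
	atomic_fetch_sub(&net->count, 1);	/* real code frees at zero */
}

struct reply_model {
	unsigned int portid;
	struct net_model *net;	/* pinned at request time, never re-looked-up */
};

/* Request side: capture the caller's namespace while it is known valid. */
static struct reply_model *make_reply(struct net_model *caller_net,
				      unsigned int portid)
{
	struct reply_model *r = malloc(sizeof(*r));

	if (!r)
		return NULL;
	r->portid = portid;
	r->net = get_net_model(caller_net);
	return r;
}

/* Worker side: use the pinned namespace, then drop the reference. */
static unsigned int send_reply(struct reply_model *r)
{
	unsigned int port = r->portid;	/* netlink_unicast() in the real code */

	put_net_model(r->net);
	free(r);
	return port;
}
```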

- RGB

--
Richard Guy Briggs 
Senior Software Engineer, Kernel Security, AMER ENG Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635, Alt: +1.613.693.0684x3545

Re: [PATCH 11/19] perf, c2c: Add in sort on physid

2014-02-28 Thread Andi Kleen

> I don't think I understand the problem enough to know what to fix.  I just
> copied this piece of code from builtin-report.c and things seemed to work.
> 
> Mind giving me some details and I can look at fixing it. :-)

Even though sort.c has all these sort keys, it only sorts by period.
It should instead sort by all the specified keys, in order.

Namhyung looked at it at some point.
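What Andi describes is a comparator chain: compare by the first user-specified key and fall through to the next key only on a tie. A minimal sketch with qsort() and hypothetical types follows; it is not perf's sort.c, and the descending-period tiebreak is an assumption chosen to resemble perf's usual output.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, cut-down histogram entry. */
struct entry_model {
	unsigned long physid;
	unsigned long period;
};

typedef int (*sort_key_fn)(const struct entry_model *a,
			   const struct entry_model *b);

static int cmp_physid(const struct entry_model *a, const struct entry_model *b)
{
	return (a->physid > b->physid) - (a->physid < b->physid);
}

/* Larger period first. */
static int cmp_period_desc(const struct entry_model *a,
			   const struct entry_model *b)
{
	return (a->period < b->period) - (a->period > b->period);
}

/* The user-specified keys, in priority order (set before sorting). */
static sort_key_fn sort_keys[4];
static size_t nr_sort_keys;

static int cmp_all_keys(const void *pa, const void *pb)
{
	const struct entry_model *a = pa, *b = pb;

	for (size_t i = 0; i < nr_sort_keys; i++) {
		int r = sort_keys[i](a, b);

		if (r)
			return r;	/* first key that differs decides */
	}
	return 0;
}

static void sort_entries(struct entry_model *e, size_t n)
{
	qsort(e, n, sizeof(*e), cmp_all_keys);
}
```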

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.

Re: [PATCH 2/3] cpufreq: Initialize policy before making it available for others to use

2014-02-28 Thread Rafael J. Wysocki

On Tuesday, February 25, 2014 02:20:10 PM Viresh Kumar wrote:
> Policy must be fully initialized before it is made available for use by
> others.

True enough.  And the problem is?

> This patch moves some initialization code before making policy available
> for others.

So why/how exactly does this fix the problem?

> Signed-off-by: Viresh Kumar 
> ---
>  drivers/cpufreq/cpufreq.c | 28 ++--
>  1 file changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index cc4f244..110c0cd 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -1116,6 +1116,20 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>   goto err_set_policy_cpu;
>   }
>  
> + /* related cpus should atleast have policy->cpus */
> + cpumask_or(policy->related_cpus, policy->related_cpus, policy->cpus);
> +
> + /*
> +  * affected cpus must always be the one, which are online. We aren't
> +  * managing offline cpus here.
> +  */
> + cpumask_and(policy->cpus, policy->cpus, cpu_online_mask);
> +
> + if (!frozen) {
> + policy->user_policy.min = policy->min;
> + policy->user_policy.max = policy->max;
> + }
> +
>   write_lock_irqsave(_driver_lock, flags);
>   for_each_cpu(j, policy->cpus)
>   per_cpu(cpufreq_cpu_data, j) = policy;
> @@ -1169,20 +1183,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>   }
>   }
>  
> - /* related cpus should atleast have policy->cpus */
> - cpumask_or(policy->related_cpus, policy->related_cpus, policy->cpus);
> -
> - /*
> -  * affected cpus must always be the one, which are online. We aren't
> -  * managing offline cpus here.
> -  */
> - cpumask_and(policy->cpus, policy->cpus, cpu_online_mask);
> -
> - if (!frozen) {
> - policy->user_policy.min = policy->min;
> - policy->user_policy.max = policy->max;
> - }
> -
>   blocking_notifier_call_chain(_policy_notifier_list,
>CPUFREQ_START, policy);
>  
> 
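As a sketch of the general hazard the patch description points at (not of whatever concrete failure prompted it), the model below finishes the fields the patch moves before storing the policy in the per-CPU table, and treats that store as the moment others may start reading it. All names are hypothetical, and the driver lock the real code takes around the per-CPU assignment is omitted.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4

/* Hypothetical, trimmed-down policy: only the fields the patch moves. */
struct policy_model {
	unsigned int min, max;
	unsigned int user_min, user_max;
	unsigned int cpus_mask, related_mask;
};

static struct policy_model *cpu_data[NR_CPUS_MODEL];

/* Stand-in for "others": a reader that uses the policy as soon as it is visible. */
static unsigned int observed_user_max;

static void publish(struct policy_model *p, unsigned int cpu)
{
	cpu_data[cpu] = p;
	observed_user_max = cpu_data[cpu]->user_max;
}

static void add_policy(struct policy_model *p, unsigned int cpu,
		       unsigned int online_mask, int frozen)
{
	/* 1. Finish initialization while nobody else can see p. */
	p->related_mask |= p->cpus_mask;
	p->cpus_mask &= online_mask;
	if (!frozen) {
		p->user_min = p->min;
		p->user_max = p->max;
	}

	/* 2. Only now make it available (under a lock in the real code). */
	publish(p, cpu);
}
```

Swapping steps 1 and 2 would let the reader in `publish()` see `user_max == 0`, which is the kind of window the reordering in this patch closes.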

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

contact me

2014-02-28 Thread Harley Wang

contact me for details of $21.4m transfer.



Re: [PATCH 1/3] cpufreq: move call to __find_governor() to cpufreq_init_policy()

2014-02-28 Thread Rafael J. Wysocki

On Tuesday, February 25, 2014 02:20:09 PM Viresh Kumar wrote:
> We call __find_governor() during addition of first CPU of every policy to find
> the last governor used for this CPU before it was hotplugged-out.
> 
> After that we call cpufreq_parse_governor() in cpufreq_init_policy() either with
> this governor or the default governor. And right after that, policy->governor is set
> to NULL.
> 
> So, instead of doing this, move the relevant parts into cpufreq_init_policy()
> only and initialize policy->governor to NULL at the beginning.
> 
> Signed-off-by: Viresh Kumar 
> ---
> 
> Hi Saravana,
> 
> I hope only the first two patches are needed to fix things for you, but you
> can probably test all three.
> 
> @Rafael: We might need to get these in next rc only as these are more or less
> fixes.
> 
>  drivers/cpufreq/cpufreq.c | 34 --
>  1 file changed, 16 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index c755b5f..cc4f244 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -879,18 +879,27 @@ err_out_kobj_put:
>  
>  static void cpufreq_init_policy(struct cpufreq_policy *policy)
>  {
> + struct cpufreq_governor *gov = NULL;
>   struct cpufreq_policy new_policy;
>   int ret = 0;
>  
>   memcpy(_policy, policy, sizeof(*policy));
>  

And while I'm at it, can we *please* avoid adding new #ifdef blocks into
function bodies?

Please introduce a wrapper around __find_governor() returning NULL for
CONFIG_HOTPLUG_CPU unset.
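One way to follow this request is to keep the #ifdef at file scope and give the function body a helper that simply returns NULL when CONFIG_HOTPLUG_CPU is unset. The sketch assumes hypothetical stand-ins (`cpufreq_saved_governor()`, `find_governor_by_name()`, a per-CPU array of saved governor names) for the real cpufreq internals; CONFIG_HOTPLUG_CPU is defined by hand only so the example builds on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CONFIG_HOTPLUG_CPU 1	/* set by hand so the sketch builds standalone */

struct cpufreq_governor {
	const char *name;
};

/* Hypothetical stand-ins for the registered governors and saved state. */
static struct cpufreq_governor gov_performance = { "performance" };
static struct cpufreq_governor gov_ondemand = { "ondemand" };
static struct cpufreq_governor *const default_governor = &gov_ondemand;
static const char *saved_governor_name[4] = { "performance" };

static struct cpufreq_governor *find_governor_by_name(const char *name)
{
	if (name == NULL)
		return NULL;
	if (strcmp(name, gov_performance.name) == 0)
		return &gov_performance;
	if (strcmp(name, gov_ondemand.name) == 0)
		return &gov_ondemand;
	return NULL;
}

/* The #ifdef lives here, at file scope, not inside the caller's body. */
#ifdef CONFIG_HOTPLUG_CPU
static struct cpufreq_governor *cpufreq_saved_governor(unsigned int cpu)
{
	return find_governor_by_name(saved_governor_name[cpu]);
}
#else
static inline struct cpufreq_governor *cpufreq_saved_governor(unsigned int cpu)
{
	(void)cpu;
	return NULL;
}
#endif

/* Caller: straight-line code, no preprocessor conditionals. */
static struct cpufreq_governor *pick_governor(unsigned int cpu)
{
	struct cpufreq_governor *gov = cpufreq_saved_governor(cpu);

	return gov ? gov : default_governor;
}
```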

> + /* Update governor of new_policy to the governor used before hotplug */
> +#ifdef CONFIG_HOTPLUG_CPU
> + gov = __find_governor(per_cpu(cpufreq_cpu_governor, policy->cpu));
> +#endif
> + if (gov)
> + pr_debug("Restoring governor %s for cpu %d\n",
> + policy->governor->name, policy->cpu);
> + else
> + gov = CPUFREQ_DEFAULT_GOVERNOR;
> +
> + new_policy.governor = gov;
> +
>   /* Use the default policy if its valid. */
>   if (cpufreq_driver->setpolicy)
> - cpufreq_parse_governor(policy->governor->name,
> - _policy.policy, NULL);
> -
> - /* assure that the starting sequence is run in cpufreq_set_policy */
> - policy->governor = NULL;
> + cpufreq_parse_governor(gov->name, _policy.policy, NULL);
>  
>   /* set default policy */
>   ret = cpufreq_set_policy(policy, _policy);
> @@ -944,11 +953,11 @@ static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu)
>   unsigned long flags;
>  
>   read_lock_irqsave(_driver_lock, flags);
> -
>   policy = per_cpu(cpufreq_cpu_data_fallback, cpu);
> -

Why do these whitespace changes belong to this patch?

>   read_unlock_irqrestore(_driver_lock, flags);
>  
> + policy->governor = NULL;
> +
>   return policy;
>  }
>  
> @@ -1036,7 +1045,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>   unsigned long flags;
>  #ifdef CONFIG_HOTPLUG_CPU
>   struct cpufreq_policy *tpolicy;
> - struct cpufreq_governor *gov;
>  #endif
>  
>   if (cpu_is_offline(cpu))
> @@ -1094,7 +1102,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>   else
>   policy->cpu = cpu;
>  
> - policy->governor = CPUFREQ_DEFAULT_GOVERNOR;
>   cpumask_copy(policy->cpus, cpumask_of(cpu));
>  
>   init_completion(>kobj_unregister);
> @@ -1179,15 +1186,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>   blocking_notifier_call_chain(_policy_notifier_list,
>CPUFREQ_START, policy);
>  
> -#ifdef CONFIG_HOTPLUG_CPU
> - gov = __find_governor(per_cpu(cpufreq_cpu_governor, cpu));
> - if (gov) {
> - policy->governor = gov;
> - pr_debug("Restoring governor %s for cpu %d\n",
> -policy->governor->name, cpu);
> - }
> -#endif
> -
>   if (!frozen) {
>   ret = cpufreq_add_dev_interface(policy, dev);
>   if (ret)
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: [RFC][PATCH 0/5] arch: atomic rework

2014-02-28 Thread Paul E. McKenney

On Thu, Feb 27, 2014 at 12:53:12PM -0800, Paul E. McKenney wrote:
> On Thu, Feb 27, 2014 at 11:47:08AM -0800, Linus Torvalds wrote:
> > On Thu, Feb 27, 2014 at 11:06 AM, Paul E. McKenney
> >  wrote:
> > >
> > > 3.  The comparison was against another RCU-protected pointer,
> > > where that other pointer was properly fetched using one
> > > of the RCU primitives.  Here it doesn't matter which pointer
> > > you use.  At least as long as the rcu_assign_pointer() for
> > > that other pointer happened after the last update to the
> > > pointed-to structure.
> > >
> > > I am a bit nervous about #3.  Any thoughts on it?
> > 
> > I think that it might be worth pointing out as an example, and saying
> > that code like
> > 
> >p = atomic_read(consume);
> >X;
> >q = atomic_read(consume);
> >Y;
> >if (p == q)
> > data = p->val;
> > 
> > then the access of "p->val" is constrained to be data-dependent on
> > *either* p or q, but you can't really tell which, since the compiler
> > can decide that the values are interchangeable.
> > 
> > I cannot for the life of me come up with a situation where this would
> > matter, though. If "X" contains a fence, then that fence will be a
> > stronger ordering than anything the consume through "p" would
> > guarantee anyway. And if "X" does *not* contain a fence, then the
> > atomic reads of p and q are unordered *anyway*, so then whether the
> > ordering to the access through "p" is through p or q is kind of
> > irrelevant. No?
> 
> I can make a contrived litmus test for it, but you are right, the only
> time you can see it happen is when X has no barriers, in which case
> you don't have any ordering anyway -- both the compiler and the CPU can
> reorder the loads into p and q, and the read from p->val can, as you say,
> come from either pointer.
> 
> For whatever it is worth, here is the litmus test:
> 
> T1:   p = kmalloc(...);
>   if (p == NULL)
>   deal_with_it();
>   p->a = 42;  /* Each field in its own cache line. */
>   p->b = 43;
>   p->c = 44;
>   atomic_store_explicit(, p, memory_order_release);
>   p->b = 143;
>   p->c = 144;
>   atomic_store_explicit(, p, memory_order_release);
> 
> T2:   p = atomic_load_explicit(, memory_order_consume);
>   r1 = p->b;  /* Guaranteed to get 143. */
>   q = atomic_load_explicit(, memory_order_consume);
>   if (p == q) {
>   /* The compiler decides that q->c is same as p->c. */
>   r2 = p->c; /* Could get 44 on weakly ordered system. */
>   }
> 
> The loads from gp1 and gp2 are, as you say, unordered, so you get what
> you get.
> 
> And publishing a structure via one RCU-protected pointer, updating it,
> then publishing it via another pointer seems to me to be asking for
> trouble anyway.  If you really want to do something like that and still
> see consistency across all the fields in the structure, please put a lock
> in the structure and use it to guard updates and accesses to those fields.

And here is a patch documenting the restrictions for the current Linux
kernel.  The rules change a bit due to rcu_dereference() acting a bit
differently than atomic_load_explicit(, memory_order_consume).
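For contrast with the consume discussion, here is a minimal C11 sketch of publishing a fully initialized structure with a release store and reading fields back through the pointer actually loaded. It assumes a compiler that, like current GCC and Clang, implements memory_order_consume as memory_order_acquire, so it asks for acquire directly; it does not model rcu_dereference(), which relies on dependency ordering instead. `gp`, `publish_foo()` and `read_foo_b()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct foo {
	int a, b, c;
};

/* Hypothetical global pointer, standing in for an RCU-protected gp. */
static _Atomic(struct foo *) gp;

/* Updater: fill in every field, then publish with release semantics. */
static void publish_foo(struct foo *p, int a, int b, int c)
{
	p->a = a;
	p->b = b;
	p->c = c;
	atomic_store_explicit(&gp, p, memory_order_release);
}

/*
 * Reader: fields are read through the pointer that was loaded, not
 * through some other pointer that merely compares equal to it.
 */
static int read_foo_b(void)
{
	struct foo *p = atomic_load_explicit(&gp, memory_order_acquire);

	return p ? p->b : -1;
}
```

Updating fields after the release store, as T1 does in the litmus test above, is outside what this pattern guarantees; a lock in the structure is the simpler tool for that case.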

Thoughts?

Thanx, Paul



documentation: Record rcu_dereference() value mishandling

Recent LKML discussions (see http://lwn.net/Articles/586838/ and
http://lwn.net/Articles/588300/ for the LWN writeups) brought out
some ways of misusing the return value from rcu_dereference() that
are not necessarily completely intuitive.  This commit therefore
documents what can and cannot safely be done with these values.

Signed-off-by: Paul E. McKenney 

diff --git a/Documentation/RCU/00-INDEX b/Documentation/RCU/00-INDEX
index fa57139f50bf..f773a264ae02 100644
--- a/Documentation/RCU/00-INDEX
+++ b/Documentation/RCU/00-INDEX
@@ -12,6 +12,8 @@ lockdep-splat.txt
- RCU Lockdep splats explained.
 NMI-RCU.txt
- Using RCU to Protect Dynamic NMI Handlers
+rcu_dereference.txt
+   - Proper care and feeding of return values from rcu_dereference()
 rcubarrier.txt
- RCU and Unloadable Modules
 rculist_nulls.txt
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index 9d10d1db16a5..877947130ebe 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -114,12 +114,16 @@ over a rather long period of time, but improvements are always welcome!
http://www.openvms.compaq.com/wizard/wiz_2637.html
 
The rcu_dereference() primitive is also an excellent
-   documentation aid, letting the person reading the code
-   know exactly which pointers are protected by RCU.
+   documentation aid, letting the person reading the
+   code know

Re: [PATCH 08/19] perf c2c: Shared data analyser

2014-02-28 Thread Andi Kleen

> David:
>  It looks like you're running on an older Intel processor, which is missing necessary events for C2C to work.

mem-loads should be supported on Nehalem and up.
mem-stores needs Sandy Bridge and up.

You can check with perf list.

-Andi

Re: [PATCH 1/3] cpufreq: move call to __find_governor() to cpufreq_init_policy()

2014-02-28 Thread Rafael J. Wysocki

On Tuesday, February 25, 2014 02:20:09 PM Viresh Kumar wrote:
> We call __find_governor() during addition of first CPU of every policy to find
> the last governor used for this CPU before it was hotplugged-out.
> 
> After that we call cpufreq_parse_governor() in cpufreq_init_policy() either with
> this governor or the default governor. And right after that, policy->governor is set
> to NULL.

This is a problem, right?  So care to write *why* it is a problem here?

> So, instead of doing this, move the relevant parts into cpufreq_init_policy()
> only and initialize policy->governor to NULL at the beginning.

And this change is supposed to fix that problem, right?  You're not moving
stuff around just for the fun of it?

> Signed-off-by: Viresh Kumar 
>
> ---
> 
> Hi Saravana,
> 
> I hope only the first two patches are needed to fix things for you, but you
> can probably test all three.
> 
> @Rafael: We might need to get these in next rc only as these are more or less
> fixes.
> 
>  drivers/cpufreq/cpufreq.c | 34 --
>  1 file changed, 16 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> index c755b5f..cc4f244 100644
> --- a/drivers/cpufreq/cpufreq.c
> +++ b/drivers/cpufreq/cpufreq.c
> @@ -879,18 +879,27 @@ err_out_kobj_put:
>  
>  static void cpufreq_init_policy(struct cpufreq_policy *policy)
>  {
> + struct cpufreq_governor *gov = NULL;
>   struct cpufreq_policy new_policy;
>   int ret = 0;
>  
>   memcpy(_policy, policy, sizeof(*policy));
>  
> + /* Update governor of new_policy to the governor used before hotplug */
> +#ifdef CONFIG_HOTPLUG_CPU
> + gov = __find_governor(per_cpu(cpufreq_cpu_governor, policy->cpu));
> +#endif
> + if (gov)
> + pr_debug("Restoring governor %s for cpu %d\n",
> + policy->governor->name, policy->cpu);
> + else
> + gov = CPUFREQ_DEFAULT_GOVERNOR;
> +
> + new_policy.governor = gov;
> +
>   /* Use the default policy if its valid. */
>   if (cpufreq_driver->setpolicy)
> - cpufreq_parse_governor(policy->governor->name,
> - _policy.policy, NULL);
> -
> - /* assure that the starting sequence is run in cpufreq_set_policy */
> - policy->governor = NULL;
> + cpufreq_parse_governor(gov->name, _policy.policy, NULL);
>  
>   /* set default policy */
>   ret = cpufreq_set_policy(policy, _policy);
> @@ -944,11 +953,11 @@ static struct cpufreq_policy *cpufreq_policy_restore(unsigned int cpu)
>   unsigned long flags;
>  
>   read_lock_irqsave(_driver_lock, flags);
> -
>   policy = per_cpu(cpufreq_cpu_data_fallback, cpu);
> -
>   read_unlock_irqrestore(_driver_lock, flags);
>  
> + policy->governor = NULL;
> +
>   return policy;
>  }
>  
> @@ -1036,7 +1045,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>   unsigned long flags;
>  #ifdef CONFIG_HOTPLUG_CPU
>   struct cpufreq_policy *tpolicy;
> - struct cpufreq_governor *gov;
>  #endif
>  
>   if (cpu_is_offline(cpu))
> @@ -1094,7 +1102,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>   else
>   policy->cpu = cpu;
>  
> - policy->governor = CPUFREQ_DEFAULT_GOVERNOR;
>   cpumask_copy(policy->cpus, cpumask_of(cpu));
>  
>   init_completion(>kobj_unregister);
> @@ -1179,15 +1186,6 @@ static int __cpufreq_add_dev(struct device *dev, struct subsys_interface *sif,
>   blocking_notifier_call_chain(_policy_notifier_list,
>CPUFREQ_START, policy);
>  
> -#ifdef CONFIG_HOTPLUG_CPU
> - gov = __find_governor(per_cpu(cpufreq_cpu_governor, cpu));
> - if (gov) {
> - policy->governor = gov;
> - pr_debug("Restoring governor %s for cpu %d\n",
> -policy->governor->name, cpu);
> - }
> -#endif
> -
>   if (!frozen) {
>   ret = cpufreq_add_dev_interface(policy, dev);
>   if (ret)
> 

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

Re: perf_fuzzer compiled for x32 causes reboot

2014-02-28 Thread H. Peter Anvin

On 02/28/2014 03:34 PM, Vince Weaver wrote:
> 
> Well while it might appear that I spend all of my days finding perf_event 
> bugs, I actually am a college professor so I do occasionally have to run 
> off to teach a class, meet with students, or write papers/grants for other 
> academics to reject.
> 

We really appreciate your help.  This has been really critical.

>
> It's nice others can reproduce the issue now, it would have saved me a lot 
> of trouble, although now in theory I have a much better handle of how to 
> use/abuse ftrace so I guess it was worth it.
> 
> Once the fix gets into git I'm sure the relentless perf_fuzzer will let us 
> know if there are any other issues left.  I do look forward to the day 
> when I can leave it running overnight and have a clean syslog the next 
> morning.
> 

We all do, definitely, and your help has been a huge step in that direction.

-hpa



Re: [PATCH RFC/RFT v3 1/9] drivers: base: add new class "cpu" to group cpu devices

2014-02-28 Thread Greg Kroah-Hartman

On Wed, Feb 19, 2014 at 04:06:08PM +, Sudeep Holla wrote:
> From: Sudeep Holla 
> 
> This patch creates a new class called "cpu" and assigns it to all the
> cpu devices. This helps in grouping all the cpu devices and associated
> child devices under the same class.
> 
> This patch also:
> 1. modifies the get_parent_device to return the legacy path
>(/sys/devices/system/cpu/..) for the cpu class devices to support
>existing sysfs ABI
> 2. avoids creating a link in the class directory pointing to the device, as
>there would be a per-cpu instance of these devices with the same name
> 3. makes sure subsystem symlink continues pointing to cpu bus instead of
>cpu class for cpu devices
> 
> Signed-off-by: Sudeep Holla 
> Cc: Greg Kroah-Hartman 

Does the sysfs layout change at all with this patch applied?

thanks,

greg k-h

Re: [PATCH RFC/RFT v3 2/9] drivers: base: support cpu cache information interface to userspace via sysfs

2014-02-28 Thread Greg Kroah-Hartman

On Wed, Feb 19, 2014 at 04:06:09PM +, Sudeep Holla wrote:
> From: Sudeep Holla 
> 
> This patch adds initial support for providing processor cache information
> to userspace through the sysfs interface. This is based on the already existing
> implementations (x86, ia64, s390 and powerpc) and hence the interface is
> intended to be fully compatible.
> 
> The main purpose of this generic support is to avoid further code
> duplication to support new architectures and also to unify all the existing
> different implementations.
> 
> This implementation maintains the hierarchy of cache objects which reflects
> the system's cache topology. Cache devices are instantiated as needed as
> CPUs come online. The cache information is replicated per-cpu even if the
> caches are shared. The per-cpu array of cache information is used mainly for
> sysfs-related bookkeeping.
> 
> It also implements the shared_cpu_map attribute, which is essential for
> enabling both kernel and user-space to discover the system's overall cache
> topology.
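To make the relationship between shared_cpu_map and shared_cpu_list concrete, here is a small userspace sketch that renders a CPU bitmask in the range-list text form (a map with bits 0 and 1 set corresponds to the list "0-1"). The function name cpumask_to_list() and the fixed 64-CPU width are assumptions for illustration; this is not the kernel's own bitmap formatting code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the set bits of a 64-bit CPU mask as a
 * comma-separated list of ranges, e.g. 0x3 -> "0-1", 0x55 -> "0,2,4,6".
 * If `out` is too small the result is truncated but stays NUL-terminated.
 */
void cpumask_to_list(uint64_t mask, char *out, size_t outlen)
{
	size_t pos = 0;
	int cpu = 0;

	assert(out != NULL && outlen > 0);
	out[0] = '\0';

	while (cpu < 64) {
		if (!(mask & (UINT64_C(1) << cpu))) {
			cpu++;
			continue;
		}

		/* Start of a run: extend it over consecutive set bits. */
		int start = cpu;
		while (cpu + 1 < 64 && (mask & (UINT64_C(1) << (cpu + 1))))
			cpu++;

		int n;
		if (start == cpu)
			n = snprintf(out + pos, outlen - pos, "%s%d",
				     pos ? "," : "", start);
		else
			n = snprintf(out + pos, outlen - pos, "%s%d-%d",
				     pos ? "," : "", start, cpu);
		if (n < 0 || (size_t)n >= outlen - pos)
			return;		/* out of space: keep what fits */
		pos += (size_t)n;
		cpu++;
	}
}
```

For example, a mask of 0xf0f (CPUs 0-3 and 8-11) comes out as "0-3,8-11".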
> 
> This patch also adds the missing ABI documentation for the cacheinfo sysfs
> interface, which already exists and is well defined and widely used.
> 
> Signed-off-by: Sudeep Holla 
> Cc: Greg Kroah-Hartman 
> Cc: Rob Herring 
> Cc: linux-...@vger.kernel.org
> ---
>  Documentation/ABI/testing/sysfs-devices-system-cpu |  40 ++
>  drivers/base/Makefile  |   2 +-
>  drivers/base/cacheinfo.c   | 484 
> +
>  include/linux/cacheinfo.h  |  55 +++
>  4 files changed, 580 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/base/cacheinfo.c
>  create mode 100644 include/linux/cacheinfo.h
> 
> diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu
> index d5a0d33..dabe03e 100644
> --- a/Documentation/ABI/testing/sysfs-devices-system-cpu
> +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu
> @@ -224,3 +224,43 @@ Description: Parameters for the Intel P-state driver
>   frequency range.
>  
>   More details can be found in Documentation/cpu-freq/intel-pstate.txt
> +
> +What: /sys/devices/system/cpu/cpu*/cache/index*/
> +Date: February 2014
> +Contact: Linux kernel mailing list 

No, your name goes here, you don't get to run away from this new code :)

> +Description: Parameters for the CPU cache attributes
> +
> + attributes:
> + - writethrough: data is written to both the cache line
> + and to the block in the lower-level memory
> + - writeback: data is written only to the cache line and
> +  the modified cache line is written to main
> +  memory only when it is replaced
> + - writeallocate: allocate a memory location to a cache line
> +  on a cache miss because of a write
> + - readallocate: allocate a memory location to a cache line
> + on a cache miss because of a read
> +
> + coherency_line_size: the minimum amount of data that gets transferred
> +
> + level: the cache hierarchy in the multi-level cache configuration
> +
> + number_of_sets: total number of sets in the cache, a set is a
> + collection of cache lines with the same cache index
> +
> + physical_line_partition: number of physical cache lines per cache tag
> +
> + shared_cpu_list: the list of cpus sharing the cache
> +
> + shared_cpu_map: logical cpu mask containing the list of cpus sharing
> + the cache
> +
> + size: the total cache size in kB
> +
> + type:
> + - instruction: cache that only holds instructions
> + - data: cache that only caches data
> + - unified: cache that holds both data and instructions
> +
> + ways_of_associativity: degree of freedom in placing a particular block
> + of memory in the cache

With this patch, does this all work for x86, or does it need more glue
logic?

thanks,

greg k-h

Re: [PATCH] tty: Add sysfs symlink for console name->tty device

2014-02-28 Thread Peter Hurley


On 02/28/2014 07:35 PM, Greg Kroah-Hartman wrote:
> On Wed, Feb 26, 2014 at 09:40:51AM -0500, Peter Hurley wrote:
>> Enable a user-space process to discover the underlying tty device
>> for a console, if one exists, and when the tty device is later
>> created or destroyed.
>
> What userspace code has been tested with this change?

Every existing distro + personal copies going back 42 versions.
No breakage. ;)

>> Add sysfs symlinks for registered consoles to their respective
>> devices in [sys/class,sys/devices/virtual]/tty/console.
>> Scan consoles at tty device (un)registration to handle deferred
>> console<->device (un)binding.
>
> I don't understand, what does userspace now look like in sysfs?  Do we
> need Documentation/ABI/ updates here?
>
> And David has fixed up his original patch, which doesn't break plymouth,
> and I'll be taking that, so I don't see why this patch is needed.

Ok. I tried.

Regards,
Peter Hurley


Re: [PATCH 0/1] mm, shmem: map few pages around fault address if they are in page cache

2014-02-28 Thread Ning Qu

Sorry about the mistake in my earlier experiments; here are the real numbers.

Btw, apparently there are still some questions about the results, and
I will sync with Kirill about his test command line.

Below are just some simple experiment numbers from this patch; let me know if
you would like more:

Tested on a Xeon machine with 64GiB of RAM, using the current default fault
order 4.

Sequential access 8GiB file
                      Baseline     with-patch
1 thread
  minor fault         8,389,052    4,456,530
  time, seconds       9.55         8.31

Random access 8GiB file
                      Baseline     with-patch
1 thread
  minor fault         8,389,315    6,423,386
  time, seconds       11.68        10.51
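To put the table in relative terms, the reduction is 100 × (baseline − patched) / baseline. Taking the figures at face value (questions about them remain, as noted above), this gives roughly 47% fewer minor faults and 13% less time for the sequential run, and roughly 23% fewer minor faults and 10% less time for the random run. The helper name pct_reduction() is hypothetical.

```c
#include <assert.h>

/* Percentage reduction from a baseline measurement to a patched one. */
double pct_reduction(double baseline, double patched)
{
	assert(baseline != 0.0);	/* avoid division by zero */
	return 100.0 * (baseline - patched) / baseline;
}
```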



Best wishes,
-- 
Ning Qu

Re: [PATCH] tty: Add sysfs symlink for console name->tty device

2014-02-28 Thread Greg Kroah-Hartman

On Wed, Feb 26, 2014 at 09:40:51AM -0500, Peter Hurley wrote:
> Enable a user-space process to discover the underlying tty device
> for a console, if one exists, and when the tty device is later
> created or destroyed.

What userspace code has been tested with this change?

> Add sysfs symlinks for registered consoles to their respective
> devices in [sys/class,sys/devices/virtual]/tty/console.
> Scan consoles at tty device (un)registration to handle deferred
> console<->device (un)binding.

I don't understand, what does userspace now look like in sysfs?  Do we
need Documentation/ABI/ updates here?

And David has fixed up his original patch, which doesn't break plymouth,
and I'll be taking that, so I don't see why this patch is needed.

thanks,

greg k-h

Re: zram: lockdep spew for zram->init_lock

2014-02-28 Thread Andrew Morton

On Fri, 28 Feb 2014 08:56:29 +0900 Minchan Kim  wrote:

> Sasha reported the following lockdep spew from zram.
> 
> It was introduced by [1] in recent linux-next, but it's a false positive
> because zram_meta_alloc under down_write(init_lock) can't be called
> while zram is working as a swap device, so we could annotate the lock.
> 
> But I don't think that's worth it, because it would make lockdep, a great
> tool, less effective. Instead, move zram_meta_alloc out of the lock as in the
> good old days. This means we may do an unnecessary allocation/free of
> zram_meta for an already-initialized device, as Sergey pointed out in [1],
> but that wouldn't be common, nor harmful if someone did it. Rather than
> annotating, I'd like to respect lockdep, which is a great tool for preventing subtle bugs.
> 
> [1] zram: delete zram_init_device
>
> ...
>
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -537,26 +537,27 @@ static ssize_t disksize_store(struct device *dev,
>   struct device_attribute *attr, const char *buf, size_t len)
>  {
>   u64 disksize;
> + struct zram_meta *meta;
>   struct zram *zram = dev_to_zram(dev);
>  
>   disksize = memparse(buf, NULL);
>   if (!disksize)
>   return -EINVAL;
>  
> + disksize = PAGE_ALIGN(disksize);
> + meta = zram_meta_alloc(disksize);
> + if (!meta)
> + return -ENOMEM;
> +
>   down_write(&zram->init_lock);
>   if (init_done(zram)) {
> + zram_meta_free(meta);
>   up_write(&zram->init_lock);
>   pr_info("Cannot change disksize for initialized device\n");
>   return -EBUSY;
>   }
>  
> - disksize = PAGE_ALIGN(disksize);
> - zram->meta = zram_meta_alloc(disksize);
> - if (!zram->meta) {
> - up_write(&zram->init_lock);
> - return -ENOMEM;
> - }
> -
> + zram->meta = meta;
>   zram->disksize = disksize;
>   set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
>   up_write(&zram->init_lock);

When applying zram-use-zcomp-compressing-backends.patch on top of this
we get a bit of a mess, and simple conflict resolution results in a
leak.

disksize_store() was one of those nasty functions which does multiple
"return" statements after performing locking and resource allocation. 
As usual, this led to a resource leak.  Remember folks, "return" is a
goto in disguise.


Here's what I ended up with.  Please review.

static ssize_t disksize_store(struct device *dev,
		struct device_attribute *attr, const char *buf, size_t len)
{
	u64 disksize;
	struct zram_meta *meta;
	struct zram *zram = dev_to_zram(dev);
	int err;

	disksize = memparse(buf, NULL);
	if (!disksize)
		return -EINVAL;

	disksize = PAGE_ALIGN(disksize);
	meta = zram_meta_alloc(disksize);
	if (!meta)
		return -ENOMEM;

	down_write(&zram->init_lock);
	if (init_done(zram)) {
		pr_info("Cannot change disksize for initialized device\n");
		err = -EBUSY;
		goto out_free_meta;
	}

	zram->comp = zcomp_create(default_compressor);
	if (!zram->comp) {
		pr_info("Cannot initialise %s compressing backend\n",
				default_compressor);
		err = -EINVAL;
		goto out_free_meta;
	}

	zram->meta = meta;
	zram->disksize = disksize;
	set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
	up_write(&zram->init_lock);

	return len;

out_free_meta:
	up_write(&zram->init_lock);
	zram_meta_free(meta);
	return err;
}
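The shape used above (allocate before taking the lock, then funnel every later failure through one unwind label) can be tried out in userspace. This is an illustration only: a pthread mutex stands in for the init_lock rw_semaphore, and struct device_state, meta_alloc() and set_disksize() are hypothetical names, not zram code. It returns 0 on success where the real sysfs handler returns len.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct zram_meta and struct zram. */
struct meta {
	size_t size;
};

struct device_state {
	pthread_mutex_t init_lock;	/* plays the role of zram->init_lock */
	struct meta *meta;		/* non-NULL once initialised */
	size_t disksize;
};

struct meta *meta_alloc(size_t size)
{
	struct meta *m = malloc(sizeof(*m));

	if (m)
		m->size = size;
	return m;
}

/* Returns 0 on success or a negative errno value. */
int set_disksize(struct device_state *d, size_t disksize)
{
	struct meta *meta;
	int err;

	if (!disksize)
		return -EINVAL;

	/* Allocate before taking the lock, so no allocation runs under it. */
	meta = meta_alloc(disksize);
	if (!meta)
		return -ENOMEM;

	pthread_mutex_lock(&d->init_lock);
	if (d->meta) {			/* already initialised */
		err = -EBUSY;
		goto out_free_meta;
	}

	d->meta = meta;			/* ownership moves to the device */
	d->disksize = disksize;
	pthread_mutex_unlock(&d->init_lock);
	return 0;

out_free_meta:
	/* Single unwind path for every failure after the lock is taken. */
	pthread_mutex_unlock(&d->init_lock);
	free(meta);
	return err;
}
```

Adding another failure step later means one more `err = ...; goto out_free_meta;`, with no chance of forgetting the unlock or the free.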


Re: [PATCH] drivercore: deferral race condition fix

2014-02-28 Thread Greg Kroah-Hartman

On Wed, Feb 26, 2014 at 09:06:54AM +0200, Peter Ujfalusi wrote:
> When the kernel is built with CONFIG_PREEMPT it is possible to reach a state
> where all modules are loaded but some drivers are still stuck in the deferred
> list, and an external event is needed to kick the deferred queue to probe
> these drivers.
> 
> The issue has been observed on embedded systems with CONFIG_PREEMPT enabled,
> audio support built as modules and using nfsroot for root filesystem.
> 
> The following fragment of a log shows such a sequence: all audio modules
> were loaded but the sound card is not present, since the machine driver
> failed to probe due to a missing dependency during its probe.
> The board is am335x-evmsk (McASP<->tlv320aic3106 codec) with davinci-evm
> machine driver:
> 
> ...
> [   12.615118] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: ENTER
> [   12.719969] davinci_evm sound.3: davinci_evm_probe: ENTER
> [   12.725753] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card
> [   12.753846] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component
> [   12.922051] davinci-mcasp 4803c000.mcasp: davinci_mcasp_probe: snd_soc_register_component DONE
> [   12.950839] davinci_evm sound.3: ASoC: platform (null) not registered
> [   12.957898] davinci_evm sound.3: davinci_evm_probe: snd_soc_register_card DONE (-517)
> [   13.099026] davinci-mcasp 4803c000.mcasp: Kicking the deferred list
> [   13.177838] davinci-mcasp 4803c000.mcasp: really_probe: probe_count = 2
> [   13.194130] davinci_evm sound.3: snd_soc_register_card failed (-517)
> [   13.346755] davinci_mcasp_driver_init: LEAVE
> [   13.377446] platform sound.3: Driver davinci_evm requests probe deferral
> [   13.592527] platform sound.3: really_probe: probe_count = 0
> 
> In the log the machine driver enters its probe at 12.719969 (at this point it
> has been removed from the deferred lists). The McASP driver is already
> executing its probe (12.615118) and finishes first as well.
> The machine driver tries to construct the sound card (12.950839) but does
> not find one of the components, so it fails. After this the McASP driver
> registers all the ASoC components and the deferred work is prepared at
> 13.099026 (note that at this time the machine driver is not in the lists, so
> it is not going to be handled when the work is executing).
> Lastly the machine driver exits from its probe and the core places it on the
> deferred list, but no other driver is going to load, so the deferred queue
> is not going to be kicked again until an external event occurs, such as
> connecting a USB stick.
> 
> The proposed solution is to try the deferred queue once more when the last
> driver is asking for deferring and we had drivers loaded while this last
> driver was probing.
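One way to express "try the deferred queue once more" without relying on timing is to snapshot a count of deferred-queue kicks before calling probe and compare it afterwards. The sketch below is a userspace model under that assumption; deferred_trigger_count, probe_with_retrigger() and the demo probes are hypothetical names, not the code of the patch under review. Because any kick during the probe changes the counter, the check does not depend on how many drivers registered in the meantime.

```c
#include <assert.h>
#include <stdatomic.h>

#define EPROBE_DEFER 517	/* the -517 seen in the log above */

/* Hypothetical: incremented every time the deferred queue is kicked. */
static atomic_int deferred_trigger_count;
static int retrigger_requests;	/* demo bookkeeping only */

void trigger_deferred_probe(void)
{
	atomic_fetch_add(&deferred_trigger_count, 1);
	/* a real implementation would also queue the deferred-probe work */
}

typedef int (*probe_fn)(void);

/*
 * Run one probe. If it defers and the queue was kicked while it was
 * running, that kick could not have seen this driver (it was off the
 * list), so kick again instead of leaving it stranded.
 */
int probe_with_retrigger(probe_fn probe)
{
	int seen = atomic_load(&deferred_trigger_count);
	int ret = probe();

	if (ret == -EPROBE_DEFER &&
	    seen != atomic_load(&deferred_trigger_count)) {
		retrigger_requests++;
		trigger_deferred_probe();
	}
	return ret;
}

/* Demo probes: one defers while another driver finishes loading
 * (simulated by kicking the queue mid-probe), one defers quietly. */
int probe_defers_during_kick(void)
{
	trigger_deferred_probe();
	return -EPROBE_DEFER;
}

int probe_defers_quietly(void)
{
	return -EPROBE_DEFER;
}
```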

"once more"?  What happens if we get a new driver in when that one is
being probed?

It sounds like there's a race condition here somewhere, or improper
locking going on, just "let's try it again" doesn't sound like the
correct fix to me, does it to you?

thanks,

greg k-h

Re: [f2fs-dev] [PATCH] f2fs: fix dirty page accounting when redirty

2014-02-28 Thread Dave Chinner

On Fri, Feb 28, 2014 at 10:12:05AM +0800, Chao Yu wrote:
> We should de-account the dirty counters for a page when it is redirtied in ->writepage().
> 
> Wu Fengguang described in 'commit 971767caf632190f77a40b4011c19948232eed75':
> "writeback: fix dirtied pages accounting on redirty
> De-account the accumulative dirty counters on page redirty.
> 
> Page redirties (very common in ext4) will introduce mismatch between
> counters (a) and (b)
> 
> a) NR_DIRTIED, BDI_DIRTIED, tsk->nr_dirtied
> b) NR_WRITTEN, BDI_WRITTEN
> 
> This will introduce systematic errors in balanced_rate and result in
> dirty page position errors (ie. the dirty pages are no longer balanced
> around the global/bdi setpoints)."
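The mismatch described above can be seen with a toy counter model: every re-dirty after a skipped ->writepage() bumps the dirtied count again, while the written count moves only once. The struct and function names below are illustrative, not the kernel's counters. In the kernel, account_page_redirty() performs the decrement, and redirty_page_for_writepage() bumps wbc->pages_skipped and calls account_page_redirty() before re-dirtying the page.

```c
#include <assert.h>

/* Toy versions of the counter families (a) and (b) quoted above. */
struct dirty_counters {
	long nr_dirtied;	/* like NR_DIRTIED / BDI_DIRTIED */
	long nr_written;	/* like NR_WRITTEN / BDI_WRITTEN */
};

/*
 * One page is dirtied, then skipped by ->writepage() `skips` times
 * (each skip re-dirties it), then finally written back. When
 * `deaccount` is non-zero, each skip first undoes the earlier dirtied
 * count, as account_page_redirty() does.
 */
void simulate_page(struct dirty_counters *c, int skips, int deaccount)
{
	c->nr_dirtied++;			/* initial dirtying */
	for (int i = 0; i < skips; i++) {
		if (deaccount)
			c->nr_dirtied--;	/* de-account the redirty */
		c->nr_dirtied++;		/* page dirtied again */
	}
	c->nr_written++;			/* eventual writeback */
}
```

Without de-accounting, three skips leave the dirtied count at 4 against 1 written; with it, the two counters stay equal.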
> 
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/checkpoint.c |1 +
>  fs/f2fs/data.c   |1 +
>  fs/f2fs/node.c   |1 +
>  3 files changed, 3 insertions(+)
> 
> diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
> index c8516ee..f069249 100644
> --- a/fs/f2fs/checkpoint.c
> +++ b/fs/f2fs/checkpoint.c
> @@ -178,6 +178,7 @@ no_write:
>  redirty_out:
>   dec_page_count(sbi, F2FS_DIRTY_META);
>   wbc->pages_skipped++;
> + account_page_redirty(page);
>   set_page_dirty(page);
>   return AOP_WRITEPAGE_ACTIVATE;

redirty_page_for_writepage()?

Cheers,

Dave.
-- 
Dave Chinner
da...@fromorbit.com

Re: [PATCH v2 1/2] net: macb: Check DMA mappings for error

2014-02-28 Thread Sören Brinkmann

Hi Ben,

On Thu, 2014-02-27 at 11:57PM +, Ben Hutchings wrote:
> On Thu, 2014-02-27 at 13:58 -0800, Soren Brinkmann wrote:
> [...]
> > diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> > index 3190d38e16fb..a9c2ccfc1740 100644
> > --- a/drivers/net/ethernet/cadence/macb.c
> > +++ b/drivers/net/ethernet/cadence/macb.c
> > @@ -632,11 +632,16 @@ static void gem_rx_refill(struct macb *bp)
> >"Unable to allocate sk_buff\n");
> > break;
> > }
> > -   bp->rx_skbuff[entry] = skb;
> >  
> > /* now fill corresponding descriptor entry */
> > paddr = dma_map_single(&bp->pdev->dev, skb->data,
> >        bp->rx_buffer_size, DMA_FROM_DEVICE);
> > +   if (dma_mapping_error(&bp->pdev->dev, paddr)) {
> > +   dev_kfree_skb(skb);
> > +   break;
> > +   }
> > +
> > +   bp->rx_skbuff[entry] = skb;
> >  
> > if (entry == RX_RING_SIZE - 1)
> > paddr |= MACB_BIT(RX_WRAP);
> > @@ -1040,6 +1045,10 @@ static int macb_start_xmit(struct sk_buff *skb, struct net_device *dev)
> 
> A bit more context:
> 
>   entry = macb_tx_ring_wrap(bp->tx_head);
>   bp->tx_head++;
> 
> > netdev_vdbg(bp->dev, "Allocated ring entry %u\n", entry);
> > mapping = dma_map_single(&bp->pdev->dev, skb->data,
> >  len, DMA_TO_DEVICE);
> > +   if (dma_mapping_error(&bp->pdev->dev, mapping)) {
> > +   kfree_skb(skb);
> > +   goto unlock;
> > +   }
> 
> You need to move the bp->tx_head increment below this error check.
> Sorry I didn't spot this the first time.
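Ben's point is about ordering: the ring slot should only be claimed once the mapping is known to be good, otherwise a failed mapping leaves tx_head advanced past a slot that was never filled. A minimal userspace model, assuming a hypothetical ring and a mapping hook that can be made to fail in place of dma_map_single()/dma_mapping_error():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_RING_SIZE 8		/* hypothetical; must be a power of two */

struct tx_ring {
	unsigned int head;		/* free-running producer index */
	uint64_t desc[TX_RING_SIZE];	/* stand-in for DMA addresses */
};

/* Hypothetical mapping hook: returns false to mimic dma_mapping_error(). */
typedef bool (*map_fn)(uint64_t *addr_out);

unsigned int ring_wrap(unsigned int index)
{
	return index & (TX_RING_SIZE - 1);
}

/* Claim a slot only after the mapping succeeds. Returns 0 or -1. */
int tx_enqueue(struct tx_ring *ring, map_fn map)
{
	unsigned int entry = ring_wrap(ring->head);
	uint64_t addr;

	if (!map(&addr))
		return -1;		/* nothing consumed: head unchanged */

	ring->desc[entry] = addr;
	ring->head++;			/* advance only on success */
	return 0;
}

/* Demo mapping hooks. */
bool map_ok(uint64_t *addr_out)
{
	*addr_out = 0x1000;
	return true;
}

bool map_fail(uint64_t *addr_out)
{
	(void)addr_out;
	return false;
}
```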

No problem. I'll wait till next week with a re-spin, in case somebody
else has comments.

Sören



Re: [PATCH 02/16] scsi: atari_scsi: fix sleep_on race

2014-02-28 Thread Michael Schmitz


Hello Arnd,

On Thursday 27 February 2014, Michael Schmitz wrote:
  

Arnd Bergmann wrote:

  
  
Nack - the completion condition in the first hunk has its logic 
reversed. Try this instead (while() loops while condition true, do {} 
until () loops while condition false, no?)
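The inversion is De Morgan's law: wait_event_cmd() sleeps until its condition becomes true, so its condition must be the negation of the old while() condition. The sketch below checks that for every input combination; the boolean parameters are stand-ins for in_irq(), falcon_got_lock and stdma_others_waiting(), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Old form: keep sleeping while this is true.
 *   while (!in_irq() && falcon_got_lock && stdma_others_waiting())
 *       sleep_on(...);                                              */
bool old_loop_continues(bool in_irq, bool got_lock, bool others_waiting)
{
	return !in_irq && got_lock && others_waiting;
}

/* New form: wait_event_cmd() stops sleeping once this is true. */
bool wait_event_done(bool in_irq, bool got_lock, bool others_waiting)
{
	return in_irq || !got_lock || !others_waiting;
}

/* Exactly one of "keep sleeping" and "done" must hold for every state. */
bool conditions_are_complementary(void)
{
	for (int bits = 0; bits < 8; bits++) {
		bool a = bits & 1, b = bits & 2, c = bits & 4;

		if (old_loop_continues(a, b, c) == wait_event_done(a, b, c))
			return false;
	}
	return true;
}
```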



Sorry about messing it up again. I thought I had fixed it up the
way you commented when you said it worked.
 
  
I'm 99% confident I had tested your current version of the patch before 
and found it still attempts to schedule while in interrupt. I can retest 
if you prefer, but that'll have to wait a few days.



I definitely trust you to have the right version, since you did the
testing.
  


I'm glad I double checked, since there's one other error left in my 
correction to your patch below:


The in_irq() condition is not sufficient, we need in_interrupt() there. 
This has somehow slipped into a related patch sent to linux-scsi, so 
I'll have to refactor the lot. Bugger.


I'll resend the correct version via Geert.

  

diff --git a/drivers/scsi/atari_scsi.c b/drivers/scsi/atari_scsi.c
index a3e6c8a..cc1b013 100644
--- a/drivers/scsi/atari_scsi.c
+++ b/drivers/scsi/atari_scsi.c
@@ -90,6 +90,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 

 #include 
@@ -549,8 +550,10 @@ static void falcon_get_lock(void)
 
local_irq_save(flags);
 
-   while (!in_irq() && falcon_got_lock && stdma_others_waiting())
-   sleep_on(&falcon_fairness_wait);
+   wait_event_cmd(falcon_fairness_wait,
+   in_irq() || !falcon_got_lock || !stdma_others_waiting(),
+   local_irq_restore(flags),
+   local_irq_save(flags));
 
while (!falcon_got_lock) {

if (in_irq())



Yes, by inspection your version looks correct and mine looks wrong.
I had figured this out before, just sent the wrong version.
  


These things happen if you bother fixing other people's weird code :-)
And as I mentioned above, I missed another detail myself

  

@@ -562,7 +565,10 @@ static void falcon_get_lock(void)
falcon_trying_lock = 0;
wake_up(&falcon_try_wait);
} else {
-   sleep_on(&falcon_try_wait);
+   wait_event_cmd(falcon_try_wait,
+   falcon_got_lock && !falcon_trying_lock,
+   local_irq_restore(flags),
+   local_irq_save(flags));
}



I did correct this part compared to my first patch, but forgot
to change the other hunk.

Can you send your version of the patch to Geert for inclusion?
That way I don't have the danger of missing another negation.
This code is clearly too weird to rely on inspection alone and
we know that your version was working when you last tested it.
  


Will do - I'll CC: you in so you can ACK the patch if Geert needs 
convincing.


Cheers,

   Michael


[PATCH 2/4] HID: cp2112: remove the last hid_output_raw_report() call

2014-02-28 Thread Benjamin Tissoires

I don't have access to the device, so I copied/pasted the code
from hidraw.

Signed-off-by: Benjamin Tissoires 
---
 drivers/hid/hid-cp2112.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/hid/hid-cp2112.c b/drivers/hid/hid-cp2112.c
index 860db694..c4f87bd 100644
--- a/drivers/hid/hid-cp2112.c
+++ b/drivers/hid/hid-cp2112.c
@@ -290,7 +290,21 @@ static int cp2112_hid_output(struct hid_device *hdev, u8 *data, size_t count,
if (!buf)
return -ENOMEM;
 
-   ret = hdev->hid_output_raw_report(hdev, buf, count, report_type);
+   /* Fixme: test which function is actually called for output reports */
+   if (report_type == HID_OUTPUT_REPORT) {
+   ret = hid_hw_output_report(hdev, buf, count);
+   /*
+* compatibility with old implementation of USB-HID:
+* if the device does not support receiving output reports
+* on an interrupt endpoint, fall back to the SET_REPORT HID command.
+*/
+   if (ret != -ENOSYS)
+   goto out_free;
+   }
+
+   ret = hid_hw_raw_request(hdev, buf[0], buf, count, report_type,
+   HID_REQ_SET_REPORT);
+out_free:
kfree(buf);
return ret;
 }
-- 
1.8.5.3
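The fallback in the hunk above follows a simple rule: try the output-report path first, and only if the transport reports -ENOSYS (not supported) retry the same buffer as a SET_REPORT request; any other result, success or failure, is returned as-is. A userspace model, with hypothetical transport hooks standing in for hid_hw_output_report() and hid_hw_raw_request():

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum report_type { OUTPUT_REPORT, FEATURE_REPORT };

/* Hypothetical transport hooks; each returns bytes sent or -errno. */
struct transport {
	int (*output_report)(const unsigned char *buf, size_t len);
	int (*set_report)(const unsigned char *buf, size_t len);
};

/* Which path carried the last report (demo bookkeeping only). */
enum { PATH_NONE, PATH_OUTPUT, PATH_SET_REPORT };
static int last_path = PATH_NONE;

int send_report(const struct transport *t, enum report_type type,
		const unsigned char *buf, size_t len)
{
	if (type == OUTPUT_REPORT) {
		int ret = t->output_report(buf, len);

		/* Only "not supported" falls back; real errors are returned. */
		if (ret != -ENOSYS)
			return ret;
	}
	return t->set_report(buf, len);
}

/* Demo hooks. */
int out_unsupported(const unsigned char *buf, size_t len)
{
	(void)buf;
	(void)len;
	return -ENOSYS;
}

int out_ok(const unsigned char *buf, size_t len)
{
	(void)buf;
	last_path = PATH_OUTPUT;
	return (int)len;
}

int set_report_ok(const unsigned char *buf, size_t len)
{
	(void)buf;
	last_path = PATH_SET_REPORT;
	return (int)len;
}
```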

