[PATCH] platform/x86/intel-uncore-freq: Add Sapphire Rapids server support

2021-02-03 Thread Artem Bityutskiy
From: Artem Bityutskiy 

Sapphire Rapids uncore frequency control is the same as Skylake and Ice Lake.
Add the Sapphire Rapids CPU model number to the match array.

Signed-off-by: Artem Bityutskiy 
Reviewed-by: Tony Luck 
---
 drivers/platform/x86/intel-uncore-frequency.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/platform/x86/intel-uncore-frequency.c b/drivers/platform/x86/intel-uncore-frequency.c
index 12d5ab7e1f5d..3ee4c5c8a64f 100644
--- a/drivers/platform/x86/intel-uncore-frequency.c
+++ b/drivers/platform/x86/intel-uncore-frequency.c
@@ -377,6 +377,7 @@ static const struct x86_cpu_id intel_uncore_cpu_ids[] = {
X86_MATCH_INTEL_FAM6_MODEL(SKYLAKE_X,   NULL),
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_X,   NULL),
X86_MATCH_INTEL_FAM6_MODEL(ICELAKE_D,   NULL),
+   X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X, NULL),
{}
 };
 
-- 
2.26.2



[PATCH] docs: trace: fix event state structure name

2020-11-04 Thread Artem Bityutskiy
From: Artem Bityutskiy 

The documentation refers to a non-existent 'struct synth_trace_state'
structure. The correct name is 'struct synth_event_trace_state'.

In other words, this patch is a mechanical substitution:
s/synth_trace_state/synth_event_trace_state/g

Signed-off-by: Artem Bityutskiy 
---
 Documentation/trace/events.rst | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/trace/events.rst b/Documentation/trace/events.rst
index f792b1959a33..bdba7b0e19ef 100644
--- a/Documentation/trace/events.rst
+++ b/Documentation/trace/events.rst
@@ -787,13 +787,13 @@ To trace a synthetic using the piecewise method described above, the
 synth_event_trace_start() function is used to 'open' the synthetic
 event trace::
 
-   struct synth_trace_state trace_state;
+   struct synth_event_trace_state trace_state;
 
ret = synth_event_trace_start(schedtest_event_file, &trace_state);
 
 It's passed the trace_event_file representing the synthetic event
 using the same methods as described above, along with a pointer to a
-struct synth_trace_state object, which will be zeroed before use and
+struct synth_event_trace_state object, which will be zeroed before use and
 used to maintain state between this and following calls.
 
 Once the event has been opened, which means space for it has been
@@ -805,7 +805,7 @@ lookup per field.
 
 To assign the values one after the other without lookups,
 synth_event_add_next_val() should be used.  Each call is passed the
-same synth_trace_state object used in the synth_event_trace_start(),
+same synth_event_trace_state object used in the synth_event_trace_start(),
 along with the value to set the next field in the event.  After each
 field is set, the 'cursor' points to the next field, which will be set
 by the subsequent call, continuing until all the fields have been set
@@ -834,7 +834,7 @@ this method would be (without error-handling code)::
ret = synth_event_add_next_val(395, &trace_state);
 
 To assign the values in any order, synth_event_add_val() should be
-used.  Each call is passed the same synth_trace_state object used in
+used.  Each call is passed the same synth_event_trace_state object used in
 the synth_event_trace_start(), along with the field name of the field
 to set and the value to set it to.  The same sequence of calls as in
 the above examples using this method would be (without error-handling
@@ -856,7 +856,7 @@ can be used but not both at the same time.
 
 Finally, the event won't be actually traced until it's 'closed',
 which is done using synth_event_trace_end(), which takes only the
-struct synth_trace_state object used in the previous calls::
+struct synth_event_trace_state object used in the previous calls::
 
ret = synth_event_trace_end(&trace_state);
 
-- 
2.26.2
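
For reference, here is what the piecewise flow described in the documentation above looks like when put together in one place. This is only an illustrative sketch in kernel-module context; schedtest_event_file and the value 395 are the placeholders the documentation itself uses, not anything defined here.

/* Sketch of the piecewise synthetic-event flow, with the corrected struct name. */
#include <linux/trace_events.h>

static int trace_schedtest_sample(struct trace_event_file *schedtest_event_file)
{
        struct synth_event_trace_state trace_state;
        int ret;

        /* 'Open' the trace; trace_state is zeroed before use. */
        ret = synth_event_trace_start(schedtest_event_file, &trace_state);
        if (ret)
                return ret;

        /*
         * Set the fields in declaration order with synth_event_add_next_val(),
         * or by name with synth_event_add_val() - but not both in one trace.
         */
        ret = synth_event_add_next_val(395, &trace_state);

        /* 'Close' the trace; only here does the event actually get recorded. */
        synth_event_trace_end(&trace_state);

        return ret;
}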



Re: [RFC v4 1/1] selftests/cpuidle: Add support for cpuidle latency measurement

2020-09-03 Thread Artem Bityutskiy
On Thu, 2020-09-03 at 17:50 +0300, Artem Bityutskiy wrote:
> Well, things depend on platform, it is really "void", it is just
> different and it measures an optimized case. The result may be smaller
> observed latency.

Sorry, I meant to say it is _not_ really "void".



Re: [RFC v4 1/1] selftests/cpuidle: Add support for cpuidle latency measurement

2020-09-03 Thread Artem Bityutskiy
On Thu, 2020-09-03 at 17:30 +0530, Pratik Sampat wrote:
> I certainly did not know that the Intel architecture is aware of
> timers and pre-wakes the CPUs, which makes the timer experiment
> observations void.

Well, things depend on platform, it is really "void", it is just
different and it measures an optimized case. The result may be smaller
observed latency. And things depend on the platform.

> However, we are also collecting a baseline measurement wherein we run
> the same test on a 100% busy CPU and the measurement of latency from
> that could be considered to the kernel-userspace overhead.
> The rest of the measurements would be considered keeping this baseline
> in mind.

Yes, this should give the idea of the overhead, but still, at least for
many Intel platforms I would not be comfortable using the resulting
number (measured latency - baseline) for a cpuidle driver, because
there are just too many variables there. I am not sure I could assume
the baseline measured this way is an invariant - it could be noticeably
different depending on whether you use C-states or not.

> > At least on Intel platforms, this will mean that the IPI method won't
> > cover deep C-states like, say, PC6, because one CPU is busy. Again, not
> > saying this is not interesting, just pointing out the limitation.
> 
> That's a valid point. We have similar deep idle states in POWER too.
> The idea here is that this test should be run on an already idle
> system, of course there will be kernel jitters along the way
> which can cause little skewness in observations across some CPUs but I
> believe the observations overall should be stable.

If the baseline and the cpuidle latency are numbers of the same order of
magnitude, and you are measuring on a controlled lab system, maybe yes.
But if the baseline is, say, in milliseconds, and you are measuring a
10-microsecond C-state, then probably no.

> Another solution to this could be using isolcpus, but that just
> increases the complexity all the more.
> If you have any suggestions of any other way that could guarantee
> idleness that would be great.

Well, I did not try to guarantee idleness. I just use timers and an
external device (the network card), so no CPU needs to be busy and the
system can enter deep C-states. Then I just look at the median, the 99th
percentile, etc.

But by all means, IPI is also a very interesting experiment. It just
covers a different usage scenario.

When I started experimenting in this area, one of my main early
takeaways was the realization that C-state latency really depends on the
event source.

HTH,
Artem.



Re: [RFC v4 1/1] selftests/cpuidle: Add support for cpuidle latency measurement

2020-09-02 Thread Artem Bityutskiy
On Wed, 2020-09-02 at 17:15 +0530, Pratik Rajesh Sampat wrote:
> Measure cpuidle latencies on wakeup to determine and compare with the
> advertised wakeup latencies for each idle state.

It looks like the measurements include more than just the C-state wakeup;
they also include the overhead of waking up the process, the context
switch, and potentially any interrupts that happen on that CPU. I am not
saying this is not interesting data, it surely is, but it is going to be
larger than what you see in cpuidle latency tables. Potentially
significantly larger.

Therefore, I am not sure this program should be advertised as a "cpuidle
measurement". It really measures the "IPI latency" in the case of the IPI
method.

> A baseline measurement for each case of IPI and timers is taken at
> 100 percent CPU usage to quantify the kernel-userspace overhead
> during execution.

At least on Intel platforms, this will mean that the IPI method won't
cover deep C-states like, say, PC6, because one CPU is busy. Again, not
saying this is not interesting, just pointing out the limitation.

I was working on somewhat similar stuff for x86 platforms, and I am
almost ready to publish it on GitHub. I can notify you when I do so
if you are interested. But here is a short presentation of the approach
that I gave at Plumbers last year:

https://youtu.be/Opk92aQyvt0?t=8266

(the link points to the start of my talk)



Re: [PATCH v3 3/5] cpufreq: intel_pstate: Tweak the EPP sysfs interface

2020-08-28 Thread Artem Bityutskiy
On Thu, 2020-08-27 at 17:14 +0200, Rafael J. Wysocki wrote:
> From: "Rafael J. Wysocki" 
> 
> Modify the EPP sysfs interface to reject attempts to change the EPP
> to values different from 0 ("performance") in the active mode with
> the "performance" policy (ie. scaling_governor set to "performance"),
> to avoid situations in which the kernel appears to discard data
> passed to it via the EPP sysfs attribute.
> 
> Signed-off-by: Rafael J. Wysocki 

This one looks good to me, thanks.

Reviewed-by: Artem Bityutskiy 

-- 
Best Regards,
Artem Bityutskiy



Re: cpu-freq: running the perf increases the data rate?

2020-08-28 Thread Artem Bityutskiy
On Thu, 2020-08-27 at 22:25 +0530, Subhashini Rao Beerisetty wrote:
> I have an application which finds the data rate over the PCIe
> interface. I’m getting the lesser data rate in one of my Linux X86
> systems.

Some more description, maybe? Do you have a PCIe device reading one
RAM buffer and then writing to another RAM buffer? Or does it generate
some data and write it to a RAM buffer? Presumably it uses DMA? How
much is the CPU involved in the process? Are we talking about
transferring a few kilobytes or gigabytes?

> When I change the scaling_governor from "powersave" to "performance"
> mode for each CPU, then there is slight improvement in the PCIe data
> rate.

This definitely makes your CPU(s) run at max speed, but depending on the
platform and settings, it may also affect C-states. Are the CPU(s)
generally idle while you measure, or busy (involved in the test)? You
could run 'turbostat' while measuring the bandwidth to get some CPU
statistics (e.g., do C-states happen during the PCI test, how busy are
the CPUs?).

> Parallely I started profiling the workload with perf. Whenever I start
> running the profile command “perf stat -a -d -p ” surprisingly
> the application resulted in excellent data rate over PCIe, but when I
> kill the perf command again PCIe data rate drops. I am really confused
> about this behavior. Any clues from this behaviour?

Well, one possible reason comes to mind: you get rid of C-states when
you run perf, and this increases the PCIe bandwidth. You can just try
disabling C-states (there are sysfs knobs) and check it out. Turbostat
could be useful to check for this: with and without perf, run 'turbostat
sleep 10' or something like that (measure for 10 seconds in this
example), and do this while running your PCI test.
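
For what it's worth, "disabling C-states via the sysfs knobs" can be done per state through the standard cpuidle 'disable' files. A minimal sketch (run as root; the glob pattern assumes the usual /sys/devices/system/cpu layout, and re-enabling means writing 0 back):

/* Sketch: disable every cpuidle state of every CPU via the sysfs knobs. */
#include <stdio.h>
#include <glob.h>

int main(void)
{
        glob_t g;
        size_t i;

        if (glob("/sys/devices/system/cpu/cpu*/cpuidle/state*/disable",
                 0, NULL, &g))
                return 1;

        for (i = 0; i < g.gl_pathc; i++) {
                FILE *f = fopen(g.gl_pathv[i], "w");

                if (!f)
                        continue;
                fputs("1", f);  /* 1 = disable this state, 0 = re-enable it */
                fclose(f);
        }
        globfree(&g);
        return 0;
}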

But I am really just guessing here; I do not know enough about your
test and the system (e.g., "a Linux x86 system" can be so many things,
like an Intel or AMD server or a mobile device)...




Re: [PATCH] intel_idle: Add ICL support

2020-08-26 Thread Artem Bityutskiy
On Wed, 2020-08-26 at 18:43 +0200, Rafael J. Wysocki wrote:
> > I am curious, somehow that patch makes a difference.
> 
> It does make a difference, because it makes the processor spend more
> time in PC2.  Which very well may be because the processor cannot
> enter deeper C-states.

OK, you are right, we indeed do not have PC10 in either case. The file
was wrapped and I unwrapped it incorrectly. So indeed this is the real
problem to solve, and the patch does not solve it.



Re: [PATCH] intel_idle: Add ICL support

2020-08-26 Thread Artem Bityutskiy
On Wed, 2020-08-26 at 18:02 +0200, Rafael J. Wysocki wrote:
> To that end, I would try to upgrade the graphics firmware and see if
> you can get some nonzero PC8 residency then.

I am curious, somehow that patch makes a difference.

Guilhem, what do we actually compare: the same kernel, just patched vs.
unpatched? Or are these two different kernels, one of which was patched?

What does 'uname -r' say for the "with" and "without" kernels?

Did you compile both kernels yourself, and is the only difference between
them the intel_idle patch?



Re: [PATCH] intel_idle: Add ICL support

2020-08-26 Thread Artem Bityutskiy
On Wed, 2020-08-26 at 19:25 +0300, Artem Bityutskiy wrote:
> C6

OK, too long a day, I meant C10 here...



Re: [PATCH] intel_idle: Add ICL support

2020-08-26 Thread Artem Bityutskiy
On Wed, 2020-08-26 at 19:19 +0300, Artem Bityutskiy wrote:
> Maybe there is a BIOS update that fixes this problem? Maybe Windows
> users get it quickly because stuff like this is often well-integrated in
> Windows? Would you please check if there is a newer BIOS?

Oh, wait a second. So ACPI_C3 is C6, the deepest C-state one can
request. Sorry, I missed this at first. Scratch my questions about Windows
and a newer BIOS.

So ACPI does expose the deepest C-state, but something prevents your
system from going into PC10.



Re: [PATCH] intel_idle: Add ICL support

2020-08-26 Thread Artem Bityutskiy
Indeed, when I compare them:

acpi_idle (without the patch):

CPU%c1  CPU%c6  CPU%c7  CoreTmp PkgTmp  GFX%rc6 Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 
Pkg%pc8 Pkg%pc9 Pk%pc10 PkgWatt
29.48   0.0060.71   58  58  97.96   16.96   0.000.000.00
0.000.000.006.08

intel_idle (with the patch):

CPU%c1  CPU%c6  CPU%c7  CoreTmp PkgTmp  GFX%rc6 Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 
Pkg%pc8 Pkg%pc9 Pk%pc10 PkgWatt
56  56  96.64   300 68.29   48.58   0.000.000.000.00
0.000.007.380.00

With intel_idle we reach PC10, without it we only go as deep as PC2 - a
huge difference.

I really wonder why the BIOS does not expose deeper C-states... And if
it does not, is it for a reason? And how does Windows work then?

Maybe there is a BIOS update that fixes this problem? Maybe Windows
users get it quickly because stuff like this is often well-integrated in
Windows? Would you please check if there is a newer BIOS?

Artem.



Re: [PATCH] intel_idle: Add ICL support

2020-08-26 Thread Artem Bityutskiy
On Wed, 2020-08-26 at 16:16 +0300, Artem Bityutskiy wrote:
> Just get a reasonably new turbostat (it is part of the kernel tree, you
> can compile it yourself) and run it for a few seconds (like 'turbostat
> sleep 10'), get the output (there will be a lot of it), and we can check
> what is actually going on with regard to C-states.

Oh, and if you could do that with and without your patch, we could even
compare things. But try to do it at least with the default (acpi_idle)
configuration.

Artem.



Re: [PATCH] intel_idle: Add ICL support

2020-08-26 Thread Artem Bityutskiy
On Wed, 2020-08-26 at 15:03 +0200, Guilhem Lettron wrote:
> On Wed, 26 Aug 2020 at 14:43, Rafael J. Wysocki  wrote:
> > On Wed, Aug 26, 2020 at 2:05 PM Guilhem Lettron  wrote:
> > > Use the same C-states as SKL
> > 
> > Why is this change needed?
> 
> On my laptop, a Dell XPS 13 7390 2-in-1 with i7-1065G7, ACPI only
> report "C1_ACPI", "C2_ACPI" and "C3_ACPI".

Also, if you could run turbostat, we could see which _actual_ HW C-states
are used on your system and which package C-states are reached.

Just get a reasonably new turbostat (it is part of the kernel tree, you
can compile it yourself) and run it for a few seconds (like 'turbostat
sleep 10'), get the output (there will be a lot of it), and we can check
what is actually going on with regard to C-states.

Artem.



Re: [PATCH] intel_idle: Add ICL support

2020-08-26 Thread Artem Bityutskiy
On Wed, 2020-08-26 at 15:03 +0200, Guilhem Lettron wrote:
> On Wed, 26 Aug 2020 at 14:43, Rafael J. Wysocki  wrote:
> > On Wed, Aug 26, 2020 at 2:05 PM Guilhem Lettron  wrote:
> > > Use the same C-states as SKL
> > 
> > Why is this change needed?
> 
> On my laptop, a Dell XPS 13 7390 2-in-1 with i7-1065G7, ACPI only
> report "C1_ACPI", "C2_ACPI" and "C3_ACPI".

Did you try to dig into the BIOS menus and check if you can enable
more/deeper C-states?

Artem.



Re: [PATCH v2 2/5] cpufreq: intel_pstate: Always return last EPP value from sysfs

2020-08-26 Thread Artem Bityutskiy
Thanks for the answer, Rafael. It looks like there are two different
things now.

1. What the kernel returns when I _read_ the e_p_p file - the truth or a
"cached" value?

2. How the kernel behaves when I _write_ to the e_p_p file something it
cannot provide - error or success?

For #1, I think we need to keep it simple and always return the true
policy value. It does not matter what someone wrote there. If some
process wrote "powersave" but the kernel uses EPP 0 anyway, another
process probably wants to know the truth and get "performance" when
reading e_p_p.

On Tue, 2020-08-25 at 16:51 +0200, Rafael J. Wysocki wrote:
> An alternative is to fail writes to energy_performance_preference if
> the driver works in the active mode and the scaling algorithm for the
> scaling CPU is performance and *then* to make reads from it return the
> value in the register.

Yes, this is #2. This sounds like the _right_ way to do it.

Suppose my script wants to exercise the system with 4 different EPP
policies. It changes the policy and runs measurements, and each run takes
a few _days_.

Now, my script asks for "powersave". The kernel _knows_ it cannot provide
it (performance + active mode enabled). Why would it not return an error
("can't do") instead of success ("yes, Sir!")?

Note, I deliberately use simple words like "my script" instead of "a
user-space process" to make it easier to convey the idea.

Anyway, if the kernel returns an error, I can go and improve my script
WRT controlling the performance + active mode knobs.



Re: [PATCH v2 2/5] cpufreq: intel_pstate: Always return last EPP value from sysfs

2020-08-24 Thread Artem Bityutskiy
On Mon, 2020-08-24 at 19:42 +0200, Rafael J. Wysocki wrote:
> From: "Rafael J. Wysocki" 
> 
> Make the energy_performance_preference policy attribute in sysfs
> always return the last EPP value written to it instead of the one
> currently in the HWP Request MSR to avoid possible confusion when
> the performance scaling algorithm is used in the active mode with
> HWP enabled (in which case the EPP is forced to 0 regardless of
> what value it has been set to via sysfs).

Why is this a good idea, I wonder. If there was a prior discussion,
please point me to it.

The general approach to changing settings via sysfs is often like this:

1. Write the new value.
2. Read it back and verify that it is the same, because there is no
better way to verify that the kernel "accepted" the value.

Let's say I write 'balanced' to energy_performance_preference. I read
it back, and it contains 'balanced', so I am happy and trust that the
kernel changed the EPP to "balanced".

If the kernel, in fact, uses something else, I want to know about it
and have my script fail. Why is caching the value and making my script
_think_ it succeeded a good idea?

In other words, in my usage scenarios at least, I prefer the kernel to
tell the true EPP value, not some "cached, but not used" value.
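
To make the point concrete, here is a minimal user-space sketch of the write-then-read-back check a script could do. The sysfs path and the "balance_power" value are assumptions for illustration only, and error handling is kept to the bare minimum:

/* Sketch: write an EPP value and verify the kernel really accepted it. */
#include <stdio.h>
#include <string.h>

int main(void)
{
        const char *path =
            "/sys/devices/system/cpu/cpufreq/policy0/energy_performance_preference";
        char readback[64] = "";
        FILE *f;

        f = fopen(path, "w");
        if (!f)
                return 1;
        fputs("balance_power", f);
        if (fclose(f))          /* an error here is the kernel saying "can't do" */
                return 1;

        f = fopen(path, "r");
        if (!f || !fgets(readback, sizeof(readback), f))
                return 1;
        fclose(f);

        /* If the kernel silently used something else, this is how the script learns it. */
        if (strncmp(readback, "balance_power", strlen("balance_power")))
                fprintf(stderr, "EPP not applied, kernel reports: %s", readback);

        return 0;
}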

Artem.



Re: [PATCH 04/14] bdi: initialize ->ra_pages in bdi_init

2020-07-20 Thread Artem Bityutskiy
On Mon, 2020-07-20 at 14:07 +0200, Christoph Hellwig wrote:
> What about jffs2 and blk2mtd raw block devices?

If my memory serves me correctly, JFFS2 did not mind readahead.



Re: [PATCH] MAINTAINERS: mtd/ubi/ubifs: Remove inactive maintainers

2019-10-19 Thread Artem Bityutskiy
On Thu, 2019-10-17 at 16:22 +0200, Miquel Raynal wrote:
> Despite their substantial personal investment in the MTD/UBI/UBIFS a
> few years back, David, Brian, Artem and Adrian are not actively
> maintaining the subsystem anymore. We warmly salute them for all the
> work they have achieved and will of course still welcome their
> participation and reviews.
> 
> That said, Marek retired himself a few weeks ago quoting Harald [1]:
> 
> It matters who has which title and when. Should somebody not
> be an active maintainer, make sure he's not listed as such.
> 
> For this same reason, let’s trim the maintainers list with the
> actually active ones over the past two years.
> 
> [1] http://laforge.gnumonks.org/blog/20180307-mchardy-gpl/
> 
> Cc: David Woodhouse 
> Cc: Brian Norris 
> Cc: Artem Bityutskiy 
> Cc: Adrian Hunter 
> Cc: Marek Vasut 
> Cc: Miquel Raynal 
> Cc: Richard Weinberger 
> Cc: Vignesh Raghavendra 
> Cc: Tudor Ambarus 
> Signed-off-by: Miquel Raynal 

Acked-by: Artem Bityutskiy 




Re: [PATCH] MAINTAINERS: mtd/ubi/ubifs: Remove inactive maintainers

2019-10-17 Thread Artem Bityutskiy
On Thu, 2019-10-17 at 16:22 +0200, Miquel Raynal wrote:
> Despite their substantial personal investment in the MTD/UBI/UBIFS a
> few years back, David, Brian, Artem and Adrian are not actively
> maintaining the subsystem anymore. We warmly salute them for all the
> work they have achieved and will of course still welcome their
> participation and reviews.
> 
> That said, Marek retired himself a few weeks ago quoting Harald [1]:
> 
> It matters who has which title and when. Should somebody not
> be an active maintainer, make sure he's not listed as such.
> 
> For this same reason, let’s trim the maintainers list with the
> actually active ones over the past two years.

I am fine with being removed from maintainers, thanks!



Re: [PATCH v5 0/2] ubi: add possibility to skip CRC check for static UBI volumes

2018-07-02 Thread Artem Bityutskiy
On Mon, 2018-07-02 at 11:43 +0200, Quentin Schulz wrote:
> Some users of static UBI volumes implement their own integrity check, thus
> making the volume CRC check done at open time useless. For instance, this
> is the case when one use the ubiblock + dm-verity + squashfs combination,
> where dm-verity already checks integrity of the block device but this time
> at the block granularity instead of verifying the whole volume.
> 
> Skipping this test drastically improves the boot-time.
> 
> mtd-utils patches are already merged. See:
> http://git.infradead.org/mtd-utils.git/commit/9095d213f536aec0f3c37f177f3b907afde7
> http://git.infradead.org/mtd-utils.git/commit/7b4a65a27d2621b58c634d02c6a068ed9562383c
> http://git.infradead.org/mtd-utils.git/commit/5e9bc0daa41d84ce5de81c4a1665d65f51893c10
> http://git.infradead.org/mtd-utils.git/commit/8ba21ab75b41a1f9a6e27eed3ea80c9829669c5a

For the whole patch-set,

Reviewed-by: Artem Bityutskiy 



Re: [PATCH v3 1/2] ubi: provide a way to skip CRC checks

2018-07-02 Thread Artem Bityutskiy
On Mon, 2018-07-02 at 09:51 +0200, Boris Brezillon wrote:
> Well, I thought checking the CRC just after updating the volume made
> sense, just to make sure things were written correctly on the medium.
> Let's add a comment explaining why we keep the check here, unless you
> see a strong reason to get rid of this check in the update path.

I am fine with this, thanks.


Re: [PATCH v3 1/2] ubi: provide a way to skip CRC checks

2018-07-02 Thread Artem Bityutskiy
On Mon, 2018-07-02 at 09:43 +0200, Richard Weinberger wrote:
> Artem,
> 
> Am Montag, 2. Juli 2018, 09:30:25 CEST schrieb Artem Bityutskiy:
> > Hi,
> > 
> > On Thu, 2018-06-28 at 09:40 +0200, Quentin Schulz wrote:
> > > diff --git a/drivers/mtd/ubi/kapi.c b/drivers/mtd/ubi/kapi.c
> > > index d4b2e87..e9e9ecb 100644
> > > --- a/drivers/mtd/ubi/kapi.c
> > > +++ b/drivers/mtd/ubi/kapi.c
> > > @@ -202,7 +202,7 @@ struct ubi_volume_desc *ubi_open_volume(int
> > > ubi_num, int vol_id, int mode)
> > >   desc->mode = mode;
> > >  
> > >   mutex_lock(&ubi->ckvol_mutex);
> > > - if (!vol->checked) {
> > > + if (!vol->checked && !vol->skip_check) {
> > >   /* This is the first open - check the volume */
> > >   err = ubi_check_volume(ubi, vol_id);
> > >   if (err < 0) {
> > 
> > Did you deliberately not add a similar check to
> > 'vol_cdev_write()'?
> > You want to skip the check on load but keep the check after a
> > volume update?
> > This looks a bit inconsistent to me. At the very least it deserves a
> > comment in
> > 'vol_cdev_write()' about why the 'skip_check' flag is ignored there.
> 
> Well, the idea is skipping the check, not the crc32 on the medium.
> That way we can later, if needed, offer a way to drop the flag but
> and don't have to rewrite with crc32-enabled.

I understand that. I am talking about cdev.c line 370. We will check
the CRC after the update regardless of 'skip_check'. So I am pointing out
that this is not very consistent with what we do in
'ubi_open_volume()'.

Is this deliberate or not? If it is deliberate, we should at least
add an explanatory comment in 'vol_cdev_write()'.


Re: [PATCH v3 1/2] ubi: provide a way to skip CRC checks

2018-07-02 Thread Artem Bityutskiy
Hi,

On Thu, 2018-06-28 at 09:40 +0200, Quentin Schulz wrote:
> diff --git a/drivers/mtd/ubi/kapi.c b/drivers/mtd/ubi/kapi.c
> index d4b2e87..e9e9ecb 100644
> --- a/drivers/mtd/ubi/kapi.c
> +++ b/drivers/mtd/ubi/kapi.c
> @@ -202,7 +202,7 @@ struct ubi_volume_desc *ubi_open_volume(int ubi_num, int 
> vol_id, int mode)
>   desc->mode = mode;
>  
>   mutex_lock(&ubi->ckvol_mutex);
> - if (!vol->checked) {
> + if (!vol->checked && !vol->skip_check) {
>   /* This is the first open - check the volume */
>   err = ubi_check_volume(ubi, vol_id);
>   if (err < 0) {

Did you deliberately not add a similar check to 'vol_cdev_write()'?
You want to skip the check on load but keep the check after a volume update?
This looks a bit inconsistent to me. At the very least it deserves a comment in
'vol_cdev_write()' about why the 'skip_check' flag is ignored there.


Re: [PATCH][v4] tools/power turbostat: if --iterations, print for specific time of iterations

2018-04-12 Thread Artem Bityutskiy
On Fri, 2018-04-13 at 11:40 +0800, Yu Chen wrote:
> diff --git a/tools/power/x86/turbostat/turbostat.c 
> b/tools/power/x86/turbostat/turbostat.c
> index bd9c6b31a504..a2fe96f038f0 100644
> --- a/tools/power/x86/turbostat/turbostat.c
> +++ b/tools/power/x86/turbostat/turbostat.c
> @@ -48,6 +48,7 @@ char *proc_stat = "/proc/stat";
>  FILE *outf;
>  int *fd_percpu;
>  struct timespec interval_ts = {5, 0};
> +int iterations;
>  unsigned int debug;
>  unsigned int quiet;
>  unsigned int sums_need_wide_columns;
> @@ -470,6 +471,7 @@ void help(void)
>   "   {core | package | j,k,l..m,n-p }\n"
>   "--quietskip decoding system configuration header\n"
>   "--interval sec Override default 5-second measurement interval\n"
> + "--iterations count Number of measurement iterations(requires 
> '--interval'\n"

Missing whitespace before "(requires", and a missing closing ")".

-- 
Best Regards,
Artem Bityutskiy


Re: [PATCH][v3] tools/power turbostat: if --max_loop, print for specific time of loops

2018-04-11 Thread Artem Bityutskiy
A couple of nitpicks.

On Wed, 2018-04-11 at 18:30 +0800, Yu Chen wrote:
> @@ -48,6 +48,7 @@ char *proc_stat = "/proc/stat";
>  FILE *outf;
>  int *fd_percpu;
>  struct timespec interval_ts = {5, 0};
> +int iterations;

OK, out of several choices, you selected "iterations".

>  unsigned int debug;
>  unsigned int quiet;
>  unsigned int sums_need_wide_columns;
> @@ -470,6 +471,7 @@ void help(void)
>   "   {core | package | j,k,l..m,n-p }\n"
>   "--quietskip decoding system configuration header\n"
>   "--interval sec Override default 5-second measurement interval\n"
> + "--iterations loops The number of loops if interval is specified\n"

Since "iterations" is the term, be consistent and do not mix it with
"loops". Who knows may be the "loops" term will be used for something
else in the future. Use something like this:

"--iterations countNumber of measurement iterations (requires '
--interval')"

>   print this help message
>   "--list list column headers only\n"
>   "--out file create or truncate \"file\" for all output\n"
> @@ -2565,6 +2567,7 @@ void turbostat_loop()
>  {
>   int retval;
>   int restarted = 0;
> + int loops = 0;

Please name variables in a consistent manner; this should really be
something like 'int iters = 0', or maybe 'done_iters', or something.
But not "loops".

> @@ -4999,6 +5010,7 @@ void cmdline(int argc, char **argv)
>   {"Dump",no_argument,0, 'D'},
>   {"debug",   no_argument,0, 'd'},/* 
> internal, not documented */
>   {"interval",required_argument,  0, 'i'},
> + {"iterations",  required_argument,  0, 't'},

If you used the term "count", you could have consistent long and short
option names, like '--count / -c'. I find '--iterations / -t' to be
inconsistent, and the short option harder to remember, because I think
about time, not "iterations", when I see -t.


Re: [PATCH][v2] tools/power turbostat: if --max_loop, print for specific time of loops

2018-04-11 Thread Artem Bityutskiy
On Tue, 2018-04-10 at 22:51 +0800, Yu Chen wrote:
> + case 't':
> + {
> + int loops = strtod(optarg, NULL);
> +
> + if (loops <= 0) {
> + fprintf(outf, "loops %d should be 
> positive number\n",
> + iterations);
> + exit(2);
> + }
> + iterations = loops;
> + }

What is the point of the additional {} scope and the 'loops' variable
in it? You could use the 'iterations' variable directly and simplify
the code.
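
Just to illustrate the suggestion, the case could look roughly like this. This is an untested fragment, not a complete program; it keeps the existing 'iterations' global and 'outf' stream from the patch, and using strtol() instead of strtod() is only a suggestion:

        case 't':
                iterations = strtol(optarg, NULL, 10);
                if (iterations <= 0) {
                        fprintf(outf, "iterations %d should be a positive number\n",
                                iterations);
                        exit(2);
                }
                break;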

Artem.


Re: [PATCH][RFC] tools/power turbostat: if --max_loop, print for specific time of loops

2018-04-10 Thread Artem Bityutskiy
On Tue, 2018-04-10 at 12:22 +0200, Rafael J. Wysocki wrote:
> On Tue, Apr 10, 2018 at 12:18 PM, Yu Chen 
> wrote:
> > From: Chen Yu 
> > 
> > There's a use case during test to only print specific round of
> > loops
> > if --interval is specified, for example, with this patch applied:
> > 
> > turbostat -i 5 --max_loops 4
> > will capture 4 samples with 5 seconds interval.
> 
> Why --max_loops and not just --loops or --iterations?

Or just --count.

Artem.



Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-27 Thread Artem Bityutskiy
On Mon, 2018-03-26 at 10:39 +0200, Thorsten Leemhuis wrote:
> Lo! Your friendly Linux regression tracker here ;-)
> 
> On 08.03.2018 14:18, Artem Bityutskiy wrote:
> > On Thu, 2018-03-08 at 18:53 +0800, Ming Lei wrote:
> > > This patchset tries to spread among online CPUs as far as possible, so
> > > that we can avoid to allocate too less irq vectors with online CPUs
> > > mapped.
> > 
> > […] 
> > Tested-by: Artem Bityutskiy 
> > Link: https://lkml.kernel.org/r/1519311270.2535.53.ca...@intel.com
> > 
> > this patchset fixes the v4.16-rcX regression that I reported a few weeks
> > ago. I applied it and verified that the Dell R640 server that I mentioned
> > in the bug report boots up and the disk works.
> 
> Artem (or anyone else), what's the status here? I have this on my list
> of regressions, but it looks like there wasn't any progress in the past
> week. Or was it discussed somewhere else or even fixed in the meantime
> and I missed it? Ciao, Thorsten

Hi, it is not fixed upstream.

I got an e-mail from James saying that the fixes are in the "fixes"
branch of his tree. There is no word about when they will be merged.
There is also no stable tag.




Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-14 Thread Artem Bityutskiy
On Wed, 2018-03-14 at 12:11 +0800, Dou Liyang wrote:
> > At 03/13/2018 05:35 PM, Rafael J. Wysocki wrote:
> > > On Tue, Mar 13, 2018 at 9:39 AM, Artem Bityutskiy 
> > > > Longer term, yeah, I agree. Kernel's notion of possible CPU
> > > > count
> > > > should be realistic.
> > 
> > I did a patch for that, Artem, could you help me to test it.
> > 
> 
> I didn't consider the nr_cpu_ids before. please ignore the old patch
> and
> try the following RFC patch.

Sure, I can help with testing a patch. Could we please:

1. Start a new thread for this
2. Include the ACPI forum/folks

Thanks,
Artem.


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-13 Thread Artem Bityutskiy
On Tue, 2018-03-13 at 16:35 +0800, Ming Lei wrote:
> Then looks this issue need to fix by making possible CPU count
> accurate
> because there are other resources allocated according to
> num_possible_cpus(),
> such as percpu variables.

Short term, the regression should be fixed. It is already v4.16-rc6; we
have little time left.

Longer term, yeah, I agree. Kernel's notion of possible CPU count
should be realistic.

Artem.


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-13 Thread Artem Bityutskiy
On Tue, 2018-03-13 at 11:11 +0800, Dou Liyang wrote:
>  I also
> met the situation that BIOS told to ACPI that it could support
> physical
> CPUs hotplug, But actually, there was no hardware slots in the
> machine.
> the ACPI tables like user inputs which should be validated when we
> use.

This is exactly what happens on Skylake Xeon systems. When I check
dmesg or this file:

/sys/devices/system/cpu/possible

on 2S (two-socket) and 4S (four-socket) systems, I see the same number:
432.

This number comes from the ACPI MADT. I will speculate (I did not check
myself) that 8S systems will report the same number as well, because of
the Skylake-SP (Scalable Platform) architecture.

Number 432 is good for 8S systems, but it is way too large for 2S and
4S systems - 4x or 2x larger than the theoretical maximum.

I do not know why BIOSes have to report unrealistically high numbers, I
am just sharing my observation.

So yes, the Linux kernel's knowledge of the possible CPU count may be too
large. If we use that number to evenly spread IRQ vectors among the CPUs,
we end up with wasted vectors, and even bugs, as I observe on a 2S
Skylake.
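
For anyone who wants to check their own box, here is a tiny sketch that prints the two ranges being discussed; both sysfs files are standard, and no parsing is attempted:

/* Sketch: print the kernel's view of possible vs. present CPUs. */
#include <stdio.h>

static void print_range(const char *name, const char *path)
{
        char buf[128] = "";
        FILE *f = fopen(path, "r");

        if (f && fgets(buf, sizeof(buf), f))
                printf("%-9s %s", name, buf);   /* e.g. "possible  0-431" */
        if (f)
                fclose(f);
}

int main(void)
{
        print_range("possible", "/sys/devices/system/cpu/possible");
        print_range("present", "/sys/devices/system/cpu/present");
        return 0;
}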

Artem.


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-08 Thread Artem Bityutskiy
On Fri, 2018-03-09 at 09:24 +0800, Ming Lei wrote:
> Hi Thomas,
> 
> On Fri, Mar 09, 2018 at 12:20:09AM +0100, Thomas Gleixner wrote:
> > On Thu, 8 Mar 2018, Ming Lei wrote:
> > > Actually, it isn't a real fix, the real one is in the following
> > > two:
> > > 
> > >   0c20244d458e scsi: megaraid_sas: fix selection of reply queue
> > >   ed6d043be8cd scsi: hpsa: fix selection of reply queue
> > 
> > Where are these commits? Neither Linus tree not -next know anything
> > about
> > them
> 
> Both aren't merged yet, but they should land V4.16, IMO.

Is it a secret where they are? If not, could you please give me a
pointer and I'll give them a test.


Re: [PATCH] ubi: Reject MLC NAND

2018-03-08 Thread Artem Bityutskiy
On Thu, 2018-03-08 at 16:28 +0100, Richard Weinberger wrote:
> You mean *not* widely used?

Yes, indeed, thanks for the correction. MLC just wasn't there yet when
UBI/UBIFS were upstreamed.


Re: [PATCH] ubi: Reject MLC NAND

2018-03-08 Thread Artem Bityutskiy
On Thu, 2018-03-08 at 15:43 +0100, Richard Weinberger wrote:
> As stated by David Woodhouse, it was a huge mistake by UBI to not
> reject MLC 
> NAND from the very beginning.

Correction: when we were developing UBI/UBIFS and upstreamed them, MLC
was widely used yet we did not really know about it. So there was
nothing to reject yet.

The mistake is that we did not add the rejection in time. When people
started reporting MLC issues, we answered that the UBI/UBIFS stack
needed more work to make MLC safe to use, and we hoped someone would do
that work.

Artem.


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-08 Thread Artem Bityutskiy
On Thu, 2018-03-08 at 15:18 +0200, Artem Bityutskiy wrote:
> Tested-by: Artem Bityutskiy 
> Link: https://lkml.kernel.org/r/1519311270.2535.53.ca...@intel.com

And for completeness:
Linux-Regression-ID: lr#15a115


Re: [PATCH V3 0/4] genirq/affinity: irq vector spread among online CPUs as far as possible

2018-03-08 Thread Artem Bityutskiy
On Thu, 2018-03-08 at 18:53 +0800, Ming Lei wrote:
> Hi,
> 
> This patchset tries to spread among online CPUs as far as possible, so
> that we can avoid to allocate too less irq vectors with online CPUs
> mapped.
> 
> For example, in a 8cores system, 4 cpu cores(4~7) are offline/non present,
> on a device with 4 queues:
> 
> 1) before this patchset
>   irq 39, cpu list 0-2
>   irq 40, cpu list 3-4,6
>   irq 41, cpu list 5
>   irq 42, cpu list 7
> 
> 2) after this patchset
>   irq 39, cpu list 0,4
>   irq 40, cpu list 1,6
>   irq 41, cpu list 2,5
>   irq 42, cpu list 3,7
> 
> Without this patchset, only two vectors(39, 40) can be active, but there
> can be 4 active irq vectors after applying this patchset.

Tested-by: Artem Bityutskiy 
Link: https://lkml.kernel.org/r/1519311270.2535.53.ca...@intel.com

Ming,

this patchset fixes the v4.16-rcX regression that I reported a few weeks
ago. I applied it and verified that the Dell R640 server that I mentioned
in the bug report boots up and the disk works.

So this is not just an improvement, it also includes a bugfix. 

Thanks!


Re: [PATCH] ubi: Reject MLC NAND

2018-03-07 Thread Artem Bityutskiy
On Wed, 2018-03-07 at 09:01 +0100, Richard Weinberger wrote:
> Pavel,
> 
> Am Mittwoch, 7. März 2018, 00:18:05 CET schrieb Pavel Machek:
> > On Sat 2018-03-03 11:45:54, Richard Weinberger wrote:
> > > While UBI and UBIFS seem to work at first sight with MLC NAND,
> > > you will
> > > most likely lose all your data upon a power-cut or due to
> > > read/write
> > > disturb.
> > > In order to protect users from bad surprises, refuse to attach to
> > > MLC
> > > NAND.
> > > 
> > > Cc: sta...@vger.kernel.org
> > 
> > That sounds like _really_ bad idea for stable. All it does is it
> > removes support for hardware that somehow works.
> 
> MLC is not supported and does not work. Full stop.

Ack.


Re: regression: SCSI/SATA failure

2018-03-05 Thread Artem Bityutskiy
Linux-Regression-ID: lr#15a115

On Thu, 2018-02-22 at 16:54 +0200, Artem Bityutskiy wrote:
> Hi Christoph,
> 
> one of our test box Skylake servers does not boot with v4.16-rcX.
> Bisection lead us to this commit:
> 
> 84676c1f21e8 genirq/affinity: assign vectors to all possible CPUs
> 
> Reverting this single commit fixes the problem.
> 
> The server is a Dell R640 machine with the latest Dell BIOS. It has a
> single SATA SSD and we do not use raid, even though the system does
> have a megaraid controller.

Correction: we have RAID0 with this single disk.

> Are you aware of this issue? Below is the failure message and the
> full
> dmesg with some debugging boot parameters is here:
> 
> https://pastebin.com/raw/tTYrTAEQ

FYI, the regression still exists and reverting this single patch fixes
it. But today Dell server

I did not have time to really debug this, but I think people who are
working with this should quickly see what is going on.

I think the platform reports a far too large possible CPU count. Indeed,
in dmesg I see this:

[0.00] smpboot: Allowing 328 CPUs, 224 hotplug CPUs

224 is way too large for this system. It only has 2 sockets, but the
number looks as if the system had 4 sockets.

The commit changes IRQ affinity logic from being per-present CPU to
being per-possible CPU:

-   for_each_present_cpu(cpu)
+   for_each_possible_cpu(cpu)

And it looks like this has an unexpected side-effect on this Dell
platform.

Artem.


Re: regression: SCSI/SATA failure

2018-02-22 Thread Artem Bityutskiy
On Thu, 2018-02-22 at 16:54 +0200, Artem Bityutskiy wrote:
> Hi Christoph,
> 
> one of our test box Skylake servers does not boot with v4.16-rcX.
> Bisection lead us to this commit:
> 
> 84676c1f21e8 genirq/affinity: assign vectors to all possible CPUs
> 
> Reverting this single commit fixes the problem.

I meant to send the e-mail from my @gmail.com address; sorry for the
company disclaimers that will probably be appended to the message.

Artem.


Re: [PATCH] ubifs: Fix inode leak in xattr code

2017-05-15 Thread Artem Bityutskiy
On Mon, 2017-05-15 at 17:22 +0200, Richard Weinberger wrote:
> Alternatively we could add a iget_locked/drop_nlink/iput sequence to
> ubifs_tnc_remove_ino(). But that will make unlink() much slower for
> files
> that contain xattrs.

At that level we'd need to do it for every xattr, even those that were
never accessed, which would be slow indeed.

But we really only need to check the inode cache: hey, icache, I am
dying, and if you have any of my guys (xattrs), I want them to die with
me. 

So the question is how to find our guys in the inode cache. I am not
sure. Probably we'd have to have our own list of cached inodes in
the host inode, and maintain it.
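
Purely as a sketch of that idea, and with entirely hypothetical names (none of these fields exist in UBIFS today), the host inode would carry something like:

/* Hypothetical sketch only - UBIFS has no such fields today. */
#include <linux/list.h>
#include <linux/spinlock.h>

struct host_xattr_cache {
        struct list_head xattr_inodes;  /* cached xattr inodes of this host */
        spinlock_t lock;                /* protects the list */
};

/*
 * The idea: each xattr inode links itself into its host's list when it is
 * instantiated, unlinks itself on eviction, and when the host inode dies
 * it walks the list and drops the cached xattr inodes with it.
 */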

Artem.


Re: [PATCH] ubifs: Fix inode leak in xattr code

2017-05-15 Thread Artem Bityutskiy
On Mon, 2017-05-15 at 16:20 +0200, Richard Weinberger wrote:
> To solve this problem, set i_nlink for all xattr inodes to 0, such
> that
> the iput() in the UBIFS xattr code makes the temporary inode vanish
> immediately.

What if there is an iget right after the iput? With this patch, will we
need to go all the way to the slow media instead of just getting the
inode from the inode cache?


Re: [PATCH v2 3/3] fs: ubifs: set s_uuid in super block

2017-05-02 Thread Artem Bityutskiy
On Tue, 2017-04-11 at 22:43 +0200, Richard Weinberger wrote:
> Makes sense.
> 
> Artem, do you remember why UBIFS didn't set s_uuid in first place?

I think we just did not notice it.


Re: [PATCH 1/1] mtd: ubi: fix improper return value

2016-12-05 Thread Artem Bityutskiy
On Mon, 2016-12-05 at 09:23 +0100, Boris Brezillon wrote:
> I started to implement that too but unfortunately never had the time
> to
> finish it :-(.
> Don't know why you were trying to move to kzalloc-ed buffer, but my
> goal was to avoid the extra copy when the controller transfers data
> using DMA, and the recent posts regarding vmalloc-ed buffers and DMA
> might solve the issue.

Yes, I wanted to do that globally for UBI/UBIFS to get rid of vmalloc.

> This being said, UBI and UBIFS tend to allocate big portions of
> memory (usually a full eraseblock), and sometime this is
> overkill.

Those checks were just hacky debugging functions at the beginning, and
then they got cleaned up without a rewrite.

> For example, I'm not sure we need to allocate that much memory to do
> things like 'check if this portion is all filled with 0xff'. 

Because memcmp() was very easy to use. Back then the focus was on
getting other things to work well, and effort was saved on less
important parts. And 128KiB was not terribly bad. Today, these less
important things are important.

> Allocating
> a ->max_write_size buffer and iterating over write-units should be
> almost as efficient and still consume less memory. But this has
> nothing
> to do with the vmalloc vs kmalloc debate ;-).

Well, this is related. Someone may start small and take care of these
first :-)
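
As an illustration of the "iterate over write-units instead of one huge buffer" idea, here is a user-space sketch under the assumption that the region can simply be read in chunks; the real UBI code would of course use its own I/O and allocation helpers:

/* Sketch: check that a region is all 0xFF using a small, fixed-size buffer. */
#define _XOPEN_SOURCE 700
#include <stdlib.h>
#include <unistd.h>

/* Returns 1 if the len bytes at offset are all 0xFF, 0 if not, -1 on error. */
static int region_is_all_ff(int fd, off_t offset, size_t len, size_t chunk)
{
        unsigned char *buf = malloc(chunk);
        int ret = 1;

        if (!buf)
                return -1;

        while (len && ret == 1) {
                size_t n = len < chunk ? len : chunk;
                ssize_t rd = pread(fd, buf, n, offset);
                size_t i;

                if (rd != (ssize_t)n) {
                        ret = -1;
                        break;
                }
                for (i = 0; i < n; i++) {
                        if (buf[i] != 0xFF) {
                                ret = 0;
                                break;
                        }
                }
                offset += n;
                len -= n;
        }
        free(buf);
        return ret;
}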

Artem.


Re: [PATCH 1/1] mtd: ubi: fix improper return value

2016-12-04 Thread Artem Bityutskiy
On Sun, 2016-12-04 at 21:52 +0100, Richard Weinberger wrote:
> We should better think about how to get ubi_self_check_all_ff()
> fixed.
> When enabled on a modern NAND, vmalloc() is likely to fail now and
> then
> since len is the erase block size and can be up to a few mega bytes.

I made an attempt to switch from virtually contiguous buffers to an
array of page pointers, but never finished it.


Re: [PATCH 1/1] MAINTAINERS: add a maintainer for the SPI NOR subsystem

2016-10-18 Thread Artem Bityutskiy
On Tue, 2016-10-18 at 11:46 -0700, Brian Norris wrote:
> I could go with either method I suppose, but I don't personally like
> the
> idea of splitting out the various bits of MTD into *completely*
> independent lines of development. As long as someone (not necessarily
> me) can manage pulling the sub-subsystems together, I think it would
> make sense to have 1 PR for Linus for non-UBI/FS MTD changes.

I think this is a good point. MTD is pretty much a "don't care"
subsystem for Linus, and I do not think he'll appreciate 2 or more micro
MTD pull requests. It makes a lot more sense to consolidate all this
under a single MTD tree and make a single pull request.


Re: [PATCH] logfs: remove from tree

2016-09-11 Thread Artem Bityutskiy
On Sun, 2016-09-11 at 15:04 +0200, Christoph Hellwig wrote:
> Logfs was introduced to the kernel in 2009, and hasn't seen any non
> drive-by changes since 2012, while having lots of unsolved issues
> including the complete lack of error handling, with more and more
> issues popping up without any fixes.
> 
> The logfs.org domain has been bouncing from a mail, and the
> maintainer
> on the non-logfs.org domain hasn't repsonded to past queries either.
> 
> Signed-off-by: Christoph Hellwig 

Back in 2008 logfs and UBIFS were sort of competing projects. I
remember we inspected the logfs code and tested it - we did not find proper
wear-levelling and bad block handling, we did not see proper error
handling, and it exploded when we were running relatively simple tests.
We indicated this here in a very humble way to avoid the "conflict of
interest" perception:

https://lkml.org/lkml/2008/3/31/117

I have not followed logfs since then, but I think there wasn't much
development since then either, and all these issues are still there. I mean,
unless I am horribly mistaken, logfs does not really have the basic
features of a flash file system, and there is no point in keeping it in the
tree and consuming people's time maintaining it.

Artem.


Re: [PATCH] UBIFS: fix assertion in layout_in_gaps()

2016-08-12 Thread Artem Bityutskiy
On Fri, 2016-08-12 at 15:26 +0200, Vincent Stehlé wrote:
> An assertion in layout_in_gaps() verifies that the gap_lebs pointer
> is
> below the maximum bound. When computing this maximum bound the
> idx_lebs
> count is multiplied by sizeof(int), while C pointers arithmetic does
> take
> into account the size of the pointed elements implicitly already.
> Remove
> the multiplication to fix the assertion.
> 
> Fixes: 1e51764a3c2ac05a ("UBIFS: add new flash file system")
> Signed-off-by: Vincent Stehlé 
> Cc: Artem Bityutskiy 

Signed-off-by: Artem Bityutskiy 

Thanks!

-- 
Best Regards,
Artem Bityutskiy


Re: [PATCH] MAINTAINERS: Update UBIFS entry

2016-03-19 Thread Artem Bityutskiy
On Thu, 2016-03-03 at 14:22 +0100, Richard Weinberger wrote:
> ...to represent the status quo.
> 
> Signed-off-by: Richard Weinberger 
> ---
>  MAINTAINERS | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Acked-by: Artem Bityutskiy 

Thanks Richard!


Re: [PATCH v2] MAINTAINERS: add a maintainer for the NAND subsystem

2016-03-10 Thread Artem Bityutskiy
On Wed, 2016-03-09 at 13:48 -0800, Brian Norris wrote:
> On Wed, Mar 09, 2016 at 09:57:24PM +0100, Boris Brezillon wrote:
> > 
> > Add myself as the maintainer of the NAND subsystem.
> > 
> > Signed-off-by: Boris Brezillon 
> > ---
> > Changes since v1:
> > - rename the MAINTAINERS entry to NAND FLASH SUBSYSTEM
> > - add Richard as a reviewer
> > - add a git tree
> Acked-by: Brian Norris 

Acked-by: Artem Bityutskiy 



Re: [PATCH V2] ubifs: Add logging functions for ubifs_msg, ubifs_err and ubifs_warn

2016-03-04 Thread Artem Bityutskiy
On Thu, 2016-03-03 at 04:25 -0800, Joe Perches wrote:
> My suggestion would be to add him and designate you and Adrian
> as R: reviewers instead of M: maintainers like below.

I think it is a little bit too early.


Re: [PATCH V2] ubifs: Add logging functions for ubifs_msg, ubifs_err and ubifs_warn

2016-03-03 Thread Artem Bityutskiy
On Wed, 2016-03-02 at 10:58 -0800, Joe Perches wrote:
> 
>  UBI FILE SYSTEM (UBIFS)
> -M:   Artem Bityutskiy 
> -M:   Adrian Hunter 
> +M:   Richard Weinberger 
>  L:   linux-...@lists.infradead.org
>  T:   git git://git.infradead.org/ubifs-2.6.git
>  W:   http://www.linux-mtd.infradead.org/doc/ubifs.html

Hi Joe,

could please, re-send this patch with the following modifications:

1. Put Richard's name first.
2. Do not remove mine and Adrian's name. We are not very active, but
still useful.

Thanks!


Re: [PATCH] ubifs: Fix error codes in ubifs_iget()

2016-01-03 Thread Artem Bityutskiy
On Sun, 2016-01-03 at 15:51 +0200, Artem Bityutskiy wrote:
> On Sat, 2016-01-02 at 23:11 +0100, Richard Weinberger wrote:
> > We cannot use positive error codes in ERR_PTR().
> > IS_ERR() won't catch them.
> 
> Right, but why there is a "err = -EINVAL;" when at 'out_invalid'.

Sorry Richard, I edited the sentence and did not notice it was messy.

Here is what I wanted to say: right, but there is an "err = -EINVAL;" at
the end of 'out_invalid'.


Re: [PATCH] ubifs: Use XATTR_*_PREFIX_LEN

2016-01-03 Thread Artem Bityutskiy
On Sat, 2016-01-02 at 23:12 +0100, Richard Weinberger wrote:
> ...instead of open coding it.
> 
> Signed-off-by: Richard Weinberger 

Looks good, thanks!

Signed-off-by: Artem Bityutskiy 


Re: [PATCH] ubifs: Fix error codes in ubifs_iget()

2016-01-03 Thread Artem Bityutskiy
On Sat, 2016-01-02 at 23:11 +0100, Richard Weinberger wrote:
> We cannot use positive error codes in ERR_PTR().
> IS_ERR() won't catch them.

Right, but why there is a "err = -EINVAL;" when at 'out_invalid'.

> Cc: sta...@vger.kernel.org
> Signed-off-by: Richard Weinberger 

I do not see a bug, but I see removal of useful code which lets you
understand which verification failed. Am I missing something?


Re: [PATCH] fs/ubifs: remove unnecessary new_valid_dev check

2015-10-01 Thread Artem Bityutskiy
On Wed, 2015-09-30 at 10:11 +0200, Richard Weinberger wrote:
> >  {
> > -   if (new_valid_dev(rdev)) {
> > -   dev->new = cpu_to_le32(new_encode_dev(rdev));
> > -   return sizeof(dev->new);
> > -   } else {
> > -   dev->huge = cpu_to_le64(huge_encode_dev(rdev));
> > -   return sizeof(dev->huge);
> > -   }
> > +   dev->new = cpu_to_le32(new_encode_dev(rdev));
> > +   return sizeof(dev->new);
> >  }
> 
> Reviewed-by: Richard Weinberger 

Signed-off-by: Artem Bityutskiy 


Re: [PATCH cgroup/for-4.3-fixes] cgroup, writeback: don't enable cgroup writeback on traditional hierarchies

2015-09-29 Thread Artem Bityutskiy
On Sat, 2015-09-26 at 18:14 -0400, Tejun Heo wrote:
> Hello, Artem.
> 
> Thanks a lot for the debug dump.  Can you please test whether the
> below patch fixes the issue?

Hi,

I've tested this: 7 out of 7 reboots were successful, so it looks like
this patch fixes the problem.

Artem.



Re: [PATCH cgroup/for-4.3-fixes] cgroup, writeback: don't enable cgroup writeback on traditional hierarchies

2015-09-26 Thread Artem Bityutskiy
On Fri, 2015-09-25 at 11:49 -0400, Tejun Heo wrote:
> Hello, Artem.

I've applied this patch on top of

ced255c Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux

I've added 'ignore_loglevel' to the kernel boot parameters. I have the
serial console configured and I collect the serial output.

I've reproduced the issue on a machine with host name 'ivbep0'. The
serial log is at the end of this e-mail.

To recap, the test I did is:

1. The machine runs the ced255c kernel + your patch.
2. We are re-installing the same kernel there by doing this:
2.1 Remove the modules: /usr/lib/modules/v4.3-rc2+
2.2 Copy the modules to /usr/lib/modules/v4.3-rc2+
3. We run 'sync'
4. We run 'reboot'

Expected outcome: the machine reboots into the same kernel, modules can
be loaded and they are not corrupted.

Real outcome: the kernel boots up, but modules cannot be loaded,
because they are corrupted, for example:

[root@ivbep0 ~]# modprobe xfs
modprobe: ERROR: magic check fail: 0 instead of b007f457
[root@ivbep0 ~]# hexdump /usr/lib/modules/4.3.0-rc2+/kernel/fs/xfs/xfs.ko  
000        
*
17adc90         
17adc9e
[root@ivbep0 ~]# file /usr/lib/modules/4.3.0-rc2+/kernel/fs/xfs/xfs.ko 
/usr/lib/modules/4.3.0-rc2+/kernel/fs/xfs/xfs.ko: data

To make it easier for you, I've edited the log and added a couple of
comments there describing the stage we are at.

HTH,
Artem.


#
# Artem: The below is printed when we run 'sync'
#
[   94.900810] XXX SYNCING 8:1
[   95.075622] XXX sync_inodes_sb(8:1): dirty inode 13107315 skipped, wb=15 
comp_gen=0->0 which=1->1 i_state=0x7
[   95.087258] XXX sync_inodes_sb(8:1): dirty inode 13107314 skipped, wb=15 
comp_gen=0->0 which=1->1 i_state=0x7
[   95.098881] XXX sync_inodes_sb(8:1): dirty inode 13107313 skipped, wb=15 
comp_gen=0->0 which=1->1 i_state=0x7
[   95.110487] XXX sync_inodes_sb(8:1): dirty inode 13107312 skipped, wb=15 
comp_gen=0->0 which=1->1 i_state=0x7
[   95.122085] XXX sync_inodes_sb(8:1): dirty inode 13107311 skipped, wb=15 
comp_gen=0->0 which=1->1 i_state=0x7
[   95.133677] XXX sync_inodes_sb(8:1): dirty inode 13107310 skipped, wb=15 
comp_gen=0->0 which=1->1 i_state=0x7
[   95.145259] XXX sync_inodes_sb(8:1): dirty inode 13107309 skipped, wb=15 
comp_gen=0->0 which=1->1 i_state=0x7
[   95.156854] XXX sync_inodes_sb(8:1): dirty inode 13107308 skipped, wb=15 
comp_gen=0->0 which=1->1 i_state=0x7
[   95.168447] XXX sync_inodes_sb(8:1): dirty inode 13107307 skipped, wb=15 
comp_gen=0->0 which=1->1 i_state=0x7
[   95.180034] XXX sync_inodes_sb(8:1): dirty inode 13107306 skipped, wb=15 
comp_gen=0->0 which=1->1 i_state=0x7
[   95.191703] XXX all_wbs  = 1
[   95.195596] XXX iterated_wbs = 1
[   95.199488] XXX written_wbs  = 1
[   95.203382] XXX sync_wbs = 
[   95.568881] echo (1452): drop_caches: 3
[   95.982140] XXX SYNCING 8:1
[   95.999746] XXX all_wbs  = 1
[   96.003668] XXX iterated_wbs = 1
[   96.007569] XXX written_wbs  = 1
[   96.011472] XXX sync_wbs = 
[   96.394702] XXX SYNCING 8:1
[   96.404041] ixgbe :81:00.0: removed PHC on enp129s0f0
[   96.803148] systemd[1]: Received SIGRTMIN+20 from PID 1499 (plymouthd).
[  OK  ] Stopped Network Manager.
[   96.816673] XXX all_wbs  = 1
[   96.820401] XXX iterated_wbs = 1
[   96.824099] XXX written_wbs  = 1
[   96.827790] XXX sync_wbs = 
#
# Artem: The below is printed when we run 'restart'
#
[  OK  ] Started Show Plymouth Power Off Screen.
[  OK  ] Stopped User Manager for UID 0.
 Stopping Permit User Sessions...
[  OK  ] Removed slice user-0.slice.
[  OK  ] Stopped Permit User Sessions.
[  OK  ] Stopped target Remote File Systems.
[  OK  ] Stopped target Basic System.
[  OK  ] Stopped target Paths.
[  OK  ] Stopped target Sockets.
[  OK  ] Closed D-Bus [   97.249846] audit_printk_skb: 18 callbacks 
suppressed
System Message B[   97.256420] audit: type=1305 audit(1443252516.709:330): 
audit_pid=0 old=1012 auid=4294967295 ses=4294967295 res=1
us Socket.
[[[   97.269838] audit: type=1131 audit(1443252516.729:331): pid=1 uid=0 
auid=4294967295 ses=4294967295 msg='unit=auditd comm="systemd" 
exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
32m  OK  ] S[   97.291815] audit: type=1131 audit(1443252516.751:332): 
pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-tmpfiles-setup 
comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? 
res=success'
topped target System Initializat[   97.315950] audit: type=1131 
audit(1443252516.776:333): pid=1 uid=0 auid=4294967295 ses=4294967295 
msg='unit=fedora-import-state comm="systemd" exe="/usr/lib/systemd/systemd" 
hostname=? addr=? terminal=? res=success'
ion.
[  OK  ] Stopped target Encrypte[   97.343202] audit: type=1131 
audit(1443252516.803:334): pid=1 uid=0 auid=4294967295 ses=4294967295 
msg='unit=s

Re: [PATCH cgroup/for-4.3-fixes] cgroup, writeback: don't enable cgroup writeback on traditional hierarchies

2015-09-25 Thread Artem Bityutskiy
On Fri, 2015-09-25 at 09:49 +0300, Artem Bityutskiy wrote:
> On Thu, 2015-09-24 at 16:45 -0400, Tejun Heo wrote:
> > Hello, Artem.
> > 
> > On Thu, Sep 24, 2015 at 11:09:46AM +0300, Artem Bityutskiy wrote:
> > > On Wed, 2015-09-23 at 17:07 -0400, Tejun Heo wrote:
> > > > So, this should make the regression go away.  It doesn't fix
> > > > the
> > > > underlying bugs but they shouldn't get triggered by people not
> > > > experimenting with cgroup.
> > > 
> > > this hits the nail on the head and makes the problem go away.
> > 
> > Yeah but there still is an underlying problem here.  I've been
> > going
> > through the sync path today but can't trigger or spot anything
> > wrong.
> > Can you please apply the patch at the end of this mail, trigger the
> > failure and report the kernel log?
> > 
> > Thanks a lot.
> 
> Does not compile with multiple errors like
> 
> linux/fs/fs-writeback.c:799:10: error: ‘struct bdi_writeback’ has no
> member named ‘last_comp_gen’
>bdi->wb.last_comp_gen = bdi->wb.comp_gen;

I tried to extend your patch with these fields, but I am not sure I got
it right, so please send a new patch and I'll run the reboot corruption
test with it.
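
(For reference, a rough guess at what I mean by "these fields" - only the
names come from the compile error quoted above; the types, comments and
placement are my assumptions, not your actual debug patch:)

struct bdi_writeback {
	/* ... existing fields ... */

	/*
	 * Guessed debug counters: comp_gen would be bumped when a writeback
	 * work completes, and last_comp_gen would be the value sampled by
	 * sync_inodes_sb().  Only the names are taken from the error above.
	 */
	unsigned long	comp_gen;
	unsigned long	last_comp_gen;
};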

Please note that because this test is about reboots, I'll probably output
everything to the serial console. Therefore, please do not print too much
data; otherwise I'd have to modify my scripts to collect dmesg before
restarting, which is more work.

Artem.


Re: [PATCH cgroup/for-4.3-fixes] cgroup, writeback: don't enable cgroup writeback on traditional hierarchies

2015-09-24 Thread Artem Bityutskiy
On Thu, 2015-09-24 at 16:45 -0400, Tejun Heo wrote:
> Hello, Artem.
> 
> On Thu, Sep 24, 2015 at 11:09:46AM +0300, Artem Bityutskiy wrote:
> > On Wed, 2015-09-23 at 17:07 -0400, Tejun Heo wrote:
> > > So, this should make the regression go away.  It doesn't fix the
> > > underlying bugs but they shouldn't get triggered by people not
> > > experimenting with cgroup.
> > 
> > this hits the nail on the head and makes the problem go away.
> 
> Yeah but there still is an underlying problem here.  I've been going
> through the sync path today but can't trigger or spot anything wrong.
> Can you please apply the patch at the end of this mail, trigger the
> failure and report the kernel log?
> 
> Thanks a lot.

It does not compile - there are multiple errors like

linux/fs/fs-writeback.c:799:10: error: ‘struct bdi_writeback’ has no member named ‘last_comp_gen’
   bdi->wb.last_comp_gen = bdi->wb.comp_gen;
  ^





Re: [PATCH 2/2] ext4: implement cgroup writeback support

2015-09-24 Thread Artem Bityutskiy
On Wed, 2015-09-23 at 16:48 -0400, Theodore Ts'o wrote:
> On Wed, Sep 23, 2015 at 10:47:16PM +0300, Artem Bityutskiy wrote:
> > And, by the way, I forgot this detail - I cut AC power off at the
> > end, then put it back on after a 20 seconds delay. I mean, this is
> > a
> > clean reboot, but with power cut at the end.
> 
> Is this reproducible without the power cut?  And what model SSD are
> you using, and are you sure that it has Power Loss Protection (many
> SSD vendors use power loss protection as the price discrimination
> feature between consumer-grade SSD and enterprise-grade SSD's that
> cost $$$).  If this problem wasn't showing up with 4.2, and is only
> failing with 4.3-rcX, then it might not be a hardware issue --- but
> it's also possible that there was a timing issue which was hiding a
> hardware problem.  So for the purposes of debugging, removing the
> power cut from the set of variables is a useful thing to do.

Ted, you are right that the problem may be anywhere, but since Tejun's
patch fixes it, I decided not to spend time on testing without AC power
cuts.

But I checked, and one of my boxes uses an HDD and shows the same
problem, which makes the theory of imperfect SSD firmware being
involved less likely.

Thanks,
Artem.


Re: [PATCH cgroup/for-4.3-fixes] cgroup, writeback: don't enable cgroup writeback on traditional hierarchies

2015-09-24 Thread Artem Bityutskiy
On Wed, 2015-09-23 at 17:07 -0400, Tejun Heo wrote:
> Hello,
> 
> So, this should make the regression go away.  It doesn't fix the
> underlying bugs but they shouldn't get triggered by people not
> experimenting with cgroup.

Tejun,

this hits the nail on the head and makes the problem go away.

I've tested the tip of Linus' tree (v4.3-rc2+) plus this patch - no
data corruption after reboots.

I've tested just the tip of Linus' tree (v4.3-rc2+) without this
patch, and I do see the data corruption after reboots.

Tested-by: Artem Bityutskiy 

Artem.


Re: [PATCH 2/2] ext4: implement cgroup writeback support

2015-09-23 Thread Artem Bityutskiy
On Wed, Sep 23, 2015 at 9:24 PM, Theodore Ts'o  wrote:
> Artem,
>
> Can you (or someone on the cgroups list, perhaps) give more details
> about how Fedora 22 sets up groups?
>
> Unfortunately apparently no one has gotten an official Fedora image
> for Google Compute Engine so it's a bit of a pain for me to reproduce
> the problem.  (I suppose I could use AWS, but all of my test
> infrastructure uses GCE, and I'd really rather not have to install a
> Java Runtime on my laptop. :-)

[ My apologies for top-posting and for sending HTML e-mails, which do not
get through vger. I am using the Gmail web interface and just learned how
to send plain text from here, so I am re-sending my longer answer. ]

Hi Ted, Chris, Tejun, all,

quick and probably messy reply before I go to sleep...

I can give more information tomorrow. But one note: it would be helpful to
get questions like "send us the output of this command" rather than "what
are the cgroups you are in", because I am not fluent with cgroups. IOW,
more specific questions are welcome.

Some more about my setup. I have a number of test boxes, which are
1/2/4-socket servers. I compile the kernel for them on a separate worker
box. Then I copy the kernel binary to /boot and the modules to
/lib/modules, run 'sync', and reboot into the new kernel. Very often many
module files are then corrupted - they won't load because of magic/CRC
mismatches.

I copy the files over scp. Well, it is not exactly scp, but rather a
Python 'scp' module, which is based on the 'paramiko' module. But I think
this should not matter.

Anyway, maybe there are some cgroups related to scp/ssh sessions or
/lib/modules in Fedora 22?

Also note that I tried to be careful while bisecting: I used 4 servers in
parallel and did 5 reboot tests on each of them. With this patch reverted,
all 4 boxes survive 5 reboots just fine. Without it reverted, each fails
1-3 of the reboots.

And, by the way, I forgot this detail - I cut the AC power off at the end,
then put it back on after a 20-second delay. I mean, this is a clean
reboot, but with the power cut at the end. So the process is this:

1. I run 'sync' on the box remotely over ssh.
2. I run 'reboot' on the box remotely over ssh; the ssh connection gets
closed at this point.
3. I ping the box, and keep doing this until it stops echoing back.
4. I wait several seconds, and then just cut the AC power off. The wall
socket power is off.

So if there was something in, say, the SSD cache which was not synced, it
is gone too.

Maybe this patch reveals an existing issue. My setup has been stable with
4.2 and many previous kernels; it only fails with 4.3-rcX, and my
bisecting led to this patch.


Re: [PATCH 2/2] ext4: implement cgroup writeback support

2015-09-23 Thread Artem Bityutskiy
On Wed, 2015-09-23 at 15:49 +0300, Artem Bityutskiy wrote:
> On Tue, 2015-07-21 at 23:56 -0400, Theodore Ts'o wrote:
> > > v2: Updated for MS_CGROUPWB -> SB_I_CGROUPWB.
> > > 
> > > Signed-off-by: Tejun Heo 
> > > Cc: "Theodore Ts'o" 
> > > Cc: Andreas Dilger 
> > > Cc: linux-e...@vger.kernel.org
> > 
> > Thanks, applied.
> 
> Hi, this patch introduces a regression - a major one, I'd say.
> 
> Symptoms: copy a bunch of file, run sync, then run 'reboot', and
> after
> you boot up the copied files are corrupted. So basically the user
> -visible symptom is that 'sync' does not work.

Just FYI, this is the issue I briefly reported last Fri:

https://lkml.org/lkml/2015/9/18/640

Artem.


Re: [PATCH 2/2] ext4: implement cgroup writeback support

2015-09-23 Thread Artem Bityutskiy
On Tue, 2015-07-21 at 23:56 -0400, Theodore Ts'o wrote:
> > v2: Updated for MS_CGROUPWB -> SB_I_CGROUPWB.
> > 
> > Signed-off-by: Tejun Heo 
> > Cc: "Theodore Ts'o" 
> > Cc: Andreas Dilger 
> > Cc: linux-e...@vger.kernel.org
> 
> Thanks, applied.

Hi, this patch introduces a regression - a major one, I'd say.

Symptoms: copy a bunch of files, run sync, then run 'reboot', and after
you boot up the copied files are corrupted. So basically the user-visible
symptom is that 'sync' does not work.

It took me quite an effort to bisect it, but it led me to this patch.

If I take the latest upstream (v4.3-rc2+), and revert this patch:

001e4a8 ext4: imlpement cgroup writeback support

then the problem goes away - files are not corrupted after reboot.

I use ext4 on top of a "bare" partition, no LVM or dm layers involved.

I use Fedora 22 with all the latest package updates, and I only change
the kernel there.

The corruption seems to be that the files start with a bunch of zeroes
instead of the real data, but I did not check carefully - I only looked at
one file briefly.

Artem.



sync issues in 4.3-rc1?

2015-09-18 Thread Artem Bityutskiy
Hi,

I observe a problem in 4.3-rc1 which looks like there are issues with
sync - it does not really sync everything.

It is Friday night here, and I am dropping a quick report; I did not
spend much time investigating it.

Symptoms:

1. I run v4.3-rc1 + Fedora 22: most FC22 services disabled, no GUI, a
minimal FC22 system, quite silent.
2. I copy a directory with a bunch of files, let's call it "dir". In my
case this is just the kernel modules tree (produced with 'make
modules_install').
3. Run 'halt' or 'poweroff'.
4. After the system boots up, very often, but not always, "dir" is
corrupted.

I did not do strict verification, but the corruption seems to be that the
files are there, yet some are corrupted. I checked one file - xfs.ko - and
it starts with a bunch of zero bytes, while the original does not.

I've done this quick test - run the following short shell script with
4.3-rc1 and with 4.2, and get different results. The system is otherwise
the same; just the kernel is different.






$ while true; do rm -rf ~/4.3.0-rc1/; cp -r /usr/lib/modules/4.3.0-rc1 ~/ && sync && grep Dirty /proc/meminfo; done

I get 0's more consistently in the case of v4.2. It does not necessarily
mean anything, but it looks suspicious.

Here are the numbers.

v4.2
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty:   116 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty:   456 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty:   160 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty:   456 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty:72 kB
Dirty: 0 kB
Dirty:12 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB


v4.3-rc1
Dirty:   752 kB
Dirty:   752 kB
Dirty:60 kB
Dirty:   812 kB
Dirty:   688 kB
Dirty: 0 kB
Dirty:   424 kB
Dirty: 0 kB
Dirty:   632 kB
Dirty: 0 kB
Dirty:   272 kB
Dirty:   272 kB
Dirty: 0 kB
Dirty:   188 kB
Dirty:   940 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty:  1624 kB
Dirty: 0 kB
Dirty:   424 kB
Dirty:   316 kB
Dirty:  1068 kB
Dirty:   316 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty:   448 kB
Dirty: 0 kB
Dirty:   752 kB
Dirty: 0 kB
Dirty:   424 kB
Dirty: 0 kB
Dirty:   804 kB
Dirty:   804 kB
Dirty:   804 kB
Dirty:   840 kB
Dirty:   840 kB
Dirty:   888 kB
Dirty: 0 kB
Dirty:   424 kB
Dirty:   112 kB
Dirty:   112 kB
Dirty:   112 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty: 0 kB
Dirty:   224 kB
Dirty:   276 kB
Dirty:

Re: [PATCH 2/2] ubifs: Allow O_DIRECT

2015-08-25 Thread Artem Bityutskiy
On Mon, 2015-08-24 at 21:28 -0400, Chris Mason wrote:
> This is what btrfs already does for O_DIRECT plus compressed, or
> other
> cases where people don't want their applications to break on top of
> new
> features that aren't quite compatible with it.

I do not know how much of direct IO we can do in UBIFS - we have
compression, and there is no block layer under us. But someone should
try and see what could be done.

Artem.


Re: [PATCH 2/2] ubifs: Allow O_DIRECT

2015-08-24 Thread Artem Bityutskiy
On Mon, 2015-08-24 at 01:03 -0700, Christoph Hellwig wrote:
> On Mon, Aug 24, 2015 at 11:02:42AM +0300, Artem Bityutskiy wrote:
> > Back when we were writing UBIFS, we did not need direct IO, so we
> > did
> > not implement it. But yes, probably someone who cares could just
> > try
> > implementing this feature.
> 
> So I think the answer here is to implement a real version insted of
> adding hacks to pretend it is supported.

Fair enough, thank for the input!

Artem.


Re: [PATCH 2/2] ubifs: Allow O_DIRECT

2015-08-24 Thread Artem Bityutskiy
On Mon, 2015-08-24 at 00:53 -0700, Christoph Hellwig wrote:
> On Mon, Aug 24, 2015 at 10:13:25AM +0300, Artem Bityutskiy wrote:
> > 1. we are the only FS erroring out on O_DIRECT
> > 2. other file-systems not supporting direct IO just fake it
> 
> There are lots of file systems not supporting O_DIRECT, but ubifs
> might
> be the most common one.  Given that O_DIRECT implementations aren't
> hard, so what's holding back a real implementation?

Back when we were writing UBIFS, we did not need direct IO, so we did
not implement it. But yes, probably someone who cares could just try
implementing this feature.

Artem.


Re: [PATCH 2/2] ubifs: Allow O_DIRECT

2015-08-24 Thread Artem Bityutskiy
On Thu, 2015-08-20 at 15:34 +0300, Artem Bityutskiy wrote:
> On Thu, 2015-08-20 at 13:40 +0200, Richard Weinberger wrote:
> > > Basically, we need to see what is the "common practice" here, and
> > > follow it. This requires a small research. What would be the most
> > > popular Linux FS which does not support direct I/O? Can we check 
> > > what
> > > it does?
> > 
> > All popular filesystems seem to support direct IO.
> > That's the problem, application do not expect O_DIRECT to fail.
> > 
> > My intention was to do it like exofs:
> 
> Fair enough, thanks!
> 
> Signed-off-by: Artem Bityutskiy 

Richard, you mention this was suggested by Dave; could you please point
to the discussion, if possible?

Artem.


Re: [PATCH 2/2] ubifs: Allow O_DIRECT

2015-08-24 Thread Artem Bityutskiy
On Thu, 2015-08-20 at 13:49 -0700, Brian Norris wrote:
> Pardon the innocent bystander's comment, but:
> 
> On Thu, Aug 20, 2015 at 01:40:02PM +0200, Richard Weinberger wrote:
> > Am 20.08.2015 um 13:31 schrieb Artem Bityutskiy:
> > > On Thu, 2015-08-20 at 11:00 +0800, Dongsheng Yang wrote:
> > > > On 08/20/2015 04:35 AM, Richard Weinberger wrote:
> > > > > Currently UBIFS does not support direct IO, but some
> > > > > applications
> > > > > blindly use the O_DIRECT flag.
> > > > > Instead of failing upon open() we can do better and fall back
> > > > > to buffered IO.
> > > > 
> > > > H, to be honest, I am not sure we have to do it as Dave
> > > > suggested. I think that's just a work-around for current
> > > > fstests.
> > > > 
> > > > IMHO, perform a buffered IO when user request direct IO without
> > > > any warning sounds not a good idea. Maybe adding a warning
> > > > would
> > > > make it better.
> > > > 
> > > > I think we need more discussion about AIO&DIO in ubifs, and
> > > > actually
> > > > I have a plan for it. But I have not listed the all cons and
> > > > pros of
> > > > it so far.
> > > > 
> > > > Artem, what's your opinion?
> > > 
> > > Yes, this is my worry too.
> > > 
> > > Basically, we need to see what is the "common practice" here, and
> > > follow it. This requires a small research. What would be the most
> > > popular Linux FS which does not support direct I/O? Can we check
> > > what
> > > it does?
> > 
> > All popular filesystems seem to support direct IO.
> > That's the problem, application do not expect O_DIRECT to fail.
> 
> Why can't we just suggest fixing the applications? The man pages for
> O_DIRECT clearly document the EINVAL return code:
> 
>   EINVAL The filesystem does not support the O_DIRECT flag. See NOTES
>   for more information.
> 
> and under NOTES:
> 
>   O_DIRECT  support  was added under Linux in kernel version 2.4.10.
>   Older Linux kernels simply ignore this flag.  Some filesystems may
> not
>   implement the flag and open() will fail with EINVAL if it is used.

That's an option.

When writing the code, we were defensive and preferred to error out for
everything we do not support. This is generally a good SW engineering
practice.

Now, some user-space fails when direct I/O is not supported. We can
choose to fake direct I/O or fix user-space. The latter seems to be the
preferred course of action, and you are correctly pointing to the man
page.

However, if

1. we are the only FS erroring out on O_DIRECT
2. other file-systems not supporting direct IO just fake it

we may just follow the crowd and fake it too.

I am kind of trusting Richard here - I assume he did the research and
the above is the case, which is why I am fine with his patch.

Does this logic seem acceptable to you? Other folk's opinion would be
great to hear.

Artem.


Re: [PATCH 2/2] ubifs: Allow O_DIRECT

2015-08-20 Thread Artem Bityutskiy
On Thu, 2015-08-20 at 13:40 +0200, Richard Weinberger wrote:
> > Basically, we need to see what is the "common practice" here, and
> > follow it. This requires a small research. What would be the most
> > popular Linux FS which does not support direct I/O? Can we check 
> > what
> > it does?
> 
> All popular filesystems seem to support direct IO.
> That's the problem, application do not expect O_DIRECT to fail.
> 
> My intention was to do it like exofs:

Fair enough, thanks!

Signed-off-by: Artem Bityutskiy 


Re: [PATCH 2/2] ubifs: Allow O_DIRECT

2015-08-20 Thread Artem Bityutskiy
On Thu, 2015-08-20 at 11:00 +0800, Dongsheng Yang wrote:
> On 08/20/2015 04:35 AM, Richard Weinberger wrote:
> > Currently UBIFS does not support direct IO, but some applications
> > blindly use the O_DIRECT flag.
> > Instead of failing upon open() we can do better and fall back
> > to buffered IO.
> 
> H, to be honest, I am not sure we have to do it as Dave
> suggested. I think that's just a work-around for current fstests.
> 
> IMHO, perform a buffered IO when user request direct IO without
> any warning sounds not a good idea. Maybe adding a warning would
> make it better.
> 
> I think we need more discussion about AIO&DIO in ubifs, and actually
> I have a plan for it. But I have not listed the all cons and pros of
> it so far.
> 
> Artem, what's your opinion?

Yes, this is my worry too.

Basically, we need to see what the "common practice" is here, and
follow it. This requires a little research. What would be the most
popular Linux FS that does not support direct I/O? Can we check what
it does?

Artem.


Re: [PATCH 2/2] ubifs: Allow O_DIRECT

2015-08-20 Thread Artem Bityutskiy
On Wed, 2015-08-19 at 22:35 +0200, Richard Weinberger wrote:
> Currently UBIFS does not support direct IO, but some applications
> blindly use the O_DIRECT flag.
> Instead of failing upon open() we can do better and fall back
> to buffered IO.
> 
> Cc: Dongsheng Yang 
> Cc: dedeki...@gmail.com
> Suggested-by: Dave Chinner 
> Signed-off-by: Richard Weinberger 

Richard,

The idea was to explicitly reject what we do not support. Let's say I
am an app which requires O_DIRECT and does not want to work without it.
What would I do to ensure I am really getting O_DIRECT?
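
(Purely as an illustration - this is not from the thread - a minimal
sketch of how such an application could insist on O_DIRECT, relying on
the open(2) man page behavior of failing with EINVAL when the filesystem
does not implement the flag:)

#define _GNU_SOURCE
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Open 'path' for direct I/O, or bail out if the FS does not support it. */
static int open_direct_or_die(const char *path)
{
	int fd = open(path, O_RDWR | O_DIRECT);

	if (fd < 0) {
		if (errno == EINVAL)
			fprintf(stderr, "%s: O_DIRECT is not supported here\n", path);
		else
			perror(path);
		exit(1);
	}

	/* The caller must use properly aligned buffers for read()/write(). */
	return fd;
}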

Could you please check what other file-systems which do not support
O_DIRECT do in this case? Do they also fall back to normal IO instead
of explicitly failing? If yes, we can do what is considered to be the
"standard" behavior.

Thanks!




Re: [PATCH 1/2] ubifs: Remove dead xattr code

2015-08-19 Thread Artem Bityutskiy
On Thu, 2015-08-20 at 10:48 +0800, Dongsheng Yang wrote:
> On 08/20/2015 04:35 AM, Richard Weinberger wrote:
> > This is a partial revert of commit 
> > d7f0b70d30ffb9bbe6b8a3e1035cf0b79965ef53
> > ("UBIFS: Add security.* XATTR support for the UBIFS").
> 
> Hi Richard,
>   What about a full reverting of this commit. In ubifs, we
> *can* support any namespace of xattr including user, trusted, 
> security
> or other anyone prefixed by any words. But we have a 
> check_namespace()
> in xattr.c to limit what we want to support. That said, if we want to
> "Add security.* XATTR support for the UBIFS", what we need to do is
> just extending the check_namespace() to allow security namespace 
> pass.
> And yes, check_namespace() have been supporting security namespace.
> 
> So, IMHO, we do not depend on the generic mechanism at all, and we 
> can
> fully revert this commit.

We just weren't careful enough.

-- 
Best Regards,
Artem Bityutskiy


Re: [PATCH 25/31] fs/ubifs: use kmemdup rather than duplicating its implementation

2015-08-10 Thread Artem Bityutskiy
On Fri, 2015-08-07 at 09:59 +0200, Andrzej Hajda wrote:
> The patch was generated using fixed coccinelle semantic patch
> scripts/coccinelle/api/memdup.cocci [1].
> 
> [1]: http://permalink.gmane.org/gmane.linux.kernel/2014320
> 
> Signed-off-by: Andrzej Hajda 

Picked to 'linux-ubifs.git', thanks!


Re: [PATCH 0/2] numa,sched: resolve conflict between load balancing and NUMA balancing

2015-05-29 Thread Artem Bityutskiy
On Wed, 2015-05-27 at 15:04 -0400, r...@redhat.com wrote:
> A previous attempt to resolve a major conflict between load balancing and
> NUMA balancing, changeset 095bebf61a46 ("sched/numa: Do not move past the
> balance point if unbalanced"), introduced its own problems.
> 
> Revert that changeset, and introduce a new fix, which actually seems to
> resolve the issues observed in Jirka's tests.
> 
> A test where the main thread creates a large memory area, and spawns
> a worker thread to iterate over the memory (placed on another node
> by select_task_rq_fair), after which the main thread goes to sleep
> and waits for the worker thread to loop over all the memory now sees
> the worker thread migrated to where the memory is, instead of having
> all the memory migrated over like before.
> 
> Jirka has run a number of performance tests on several systems:
> single instance SpecJBB 2005 performance is 7-15% higher on a 4 node
> system, with higher gains on systems with more cores per socket.
> Multi-instance SpecJBB 2005 (one per node), linpack, and stream see
> little or no changes with the revert of 095bebf61a46 and this patch.

[Re-sending since it didn't hit the mailing list first time, due to
HTML]

Tested-by: Artem Bityutskiy 

Hi Rik,

I've run our eCommerce Web workload benchmark. Last time I did not
revert 095bebf61a46; this time I tested this patch-set. Let me summarize
everything.

Here is the average web server response time in millisecs for various
kernels.

1. v4.1-rc1 - 1450
2. v4.1-rc1 + a43455a1d572daf7b730fe12eb747d1e17411365 reverted  -  300
3. v4.1-rc1 + NUMA disabled  - 480
4. v4.1-rc5 + this patch-set - 1260

So as you can see, for our workload reverting
a43455a1d572daf7b730fe12eb747d1e17411365 results in the Web server being
most responsive (reminder: this is a 2-socket Haswell-EP server).

Just disabling NUMA also gives a big improvement, but not as good as
reverting the "offending" (from our workload's point of view) patch.

This patch-set does result in a noticeable improvement too.

Artem.



Re: [PATCH] ubifs: fix to check error code of register_shrinker

2015-05-27 Thread Artem Bityutskiy
On Tue, 2015-05-26 at 13:29 +0200, Richard Weinberger wrote:
> Am 26.05.2015 um 13:23 schrieb Chao Yu:
> > Ping.
> > 
> > Add Cc Richard Weinberger.
> 
> Patch looks good to me.
> Artem, if you don't mind I'd pick it up for the 4.2 queue.

Yes, please, thanks!

Signed-off-by: Artem Bityutskiy 




Re: [PATCH RFC] numa,sched: have fbq_classify_* factor out the current task

2015-05-21 Thread Artem Bityutskiy
On Tue, 2015-05-19 at 10:55 -0400, Rik van Riel wrote:
> The load balancer tries to find, through find_busiest_group and
> find_busiest_queue, a busy CPU with tasks that are running on
> the wrong NUMA node.
> 
> However, the load balancer only moves runnable-but-not-running
> tasks in most situations. This fails horribly when the current
> task on a CPU is on the wrong NUMA node, but the other task(s)
> on the run queue are placed correctly.
> 
> In that situation, what started out as one misplaced tasks
> quickly turns into two misplaced tasks.
> 
> Try to avoid that by factoring out the placement of the current
> task, in order to find groups and runqueues with misplaced tasks
> that are not currently running.
> 
> Signed-off-by: Rik van Riel 

This seems to give a small improvement for our eCommerce web workload:
the average server response time went down from ~1.4 to ~1.34 seconds. I
can run the workload for a longer time to get better numbers.

Thanks,
Artem.



Re: [RFC][PATCH 0/4] sched,numa: pinned task accounting

2015-05-18 Thread Artem Bityutskiy
On Fri, 2015-05-15 at 17:43 +0200, Peter Zijlstra wrote:
> Hi,
> 
> Here's a first few patches that provide pinned task accounting; they are boot
> tested only so far.
> 
> I don't think this is enough to address Artem's regression, but its a
> foundation to add some more smarts. In particular we should make it harder
> still to migrate well placed tasks away due to these pinned tasks.

Hi,

the most useful thing I can do is to test your patches, which I did, and
there seems to be no difference: the average server response time is
still 1.4 seconds.

Artem.



Re: [PATCH] numa,sched: only consider less busy nodes as numa balancing destination

2015-05-13 Thread Artem Bityutskiy
On Wed, 2015-05-13 at 08:29 +0200, Peter Zijlstra wrote:
> > 2) ensure that rq->nr_numa_running and rq->nr_preferred_running also
> >get incremented for kernel threads that are bound to a particular
> >CPU - currently CPU-bound kernel threads will cause the NUMA
> >statistics to look like a CPU has tasks that do not belong on that
> >NUMA node
> 
> I'm thinking accounting those to nr_pinned, lemme see how that works
> out.

It does not make any difference; the avg. response time is still ~1.4 sec.

Thank you!



Re: [PATCH] numa,sched: only consider less busy nodes as numa balancing destination

2015-05-12 Thread Artem Bityutskiy
On Fri, 2015-05-08 at 16:03 -0400, Rik van Riel wrote:
> Currently the load balancer has a preference for moving
> tasks to their preferred nodes (NUMA_FAVOUR_HIGHER, true),
> but there is no resistance to moving tasks away from their
> preferred nodes (NUMA_RESIST_LOWER, false).  That setting
> was arrived at after a fair amount of experimenting, and
> is probably correct.

FYI, (NUMA_RESIST_LOWER, true) does not make any difference for me.




Re: [PATCH] numa,sched: only consider less busy nodes as numa balancing destination

2015-05-11 Thread Artem Bityutskiy
On Fri, 2015-05-08 at 16:03 -0400, Rik van Riel wrote:
> This works well when dealing with tasks that are constantly
> running, but fails catastrophically when dealing with tasks
> that go to sleep, wake back up, go back to sleep, wake back
> up, and generally mess up the load statistics that the NUMA
> balancing code use in a random way.

Sleeping is something that happens a lot in this workload, I believe:
processes do a lot of network I/O, file I/O, and a lot of IPC.

Would you please expand on this a bit more - why would this scenario
"mess up the load statistics"?

> If the normal scheduler load balancer is moving tasks the
> other way the NUMA balancer is moving them, things will
> not converge, and tasks will have worse memory locality
> than not doing NUMA balancing at all.

Are the regular and NUMA balancers independent?

Are there mechanisms to detect ping-pong situations? I'd like to verify
your theory, and these kinds of mechanisms would be helpful.

> Currently the load balancer has a preference for moving
> tasks to their preferred nodes (NUMA_FAVOUR_HIGHER, true),
> but there is no resistance to moving tasks away from their
> preferred nodes (NUMA_RESIST_LOWER, false).  That setting
> was arrived at after a fair amount of experimenting, and
> is probably correct.

I guess I can try setting NUMA_RESIST_LOWER to true and see what
happens. But probably first I need to confirm that your theory (the
balancers playing ping-pong) is correct - any hints on how I would do
this?

Thanks!

Artem.




Re: [PATCH] numa,sched: only consider less busy nodes as numa balancing destination

2015-05-08 Thread Artem Bityutskiy
On Wed, 2015-05-06 at 11:41 -0400, Rik van Riel wrote:
> On Wed, 06 May 2015 13:35:30 +0300
> Artem Bityutskiy  wrote:
> 
> > we observe a tremendous regression between kernel version 3.16 and 3.17
> > (and up), and I've bisected it to this commit:
> > 
> > a43455a sched/numa: Ensure task_numa_migrate() checks the preferred node
> 
> Artem, Jirka, does this patch fix (or at least improve) the issues you
> have been seeing?  Does it introduce any new regressions?

Hi Rik,

first of all thanks for your help!

I've tried this patch and it has a very small effect. I've also run the
benchmark with auto-NUMA disabled, which is useful, I think. I used the
tip of Linus' tree (v4.1-rc2+).


 Kernel      Avg response time, ms
 ---------------------------------
 Vanilla     1481
 Patched     1240
 Reverted     256
 Disabled     309


Vanilla: pristine v4.1-rc2+
Patched: Vanilla + this patch
Reverted: Vanilla + a revert of a43455a
Disabled: Vanilla and auto-NUMA disabled via procfs

I ran the benchmark for 1 hour for every configuration this time. I
cannot state the deviation for sure right now, but I think it is tens of
milliseconds, so disabled vs. reverted _may_ be within the error range;
I need to do more experiments.

So this patch dropped the average Web server response time from about
1.4 seconds to about 1.2 seconds, which isn't a bad improvement, but it
is far less than what we get by reverting that patch.

Artem.



Re: [PATCH] numa,sched: only consider less busy nodes as numa balancing destination

2015-05-07 Thread Artem Bityutskiy
On Wed, 2015-05-06 at 11:41 -0400, Rik van Riel wrote:
> On Wed, 06 May 2015 13:35:30 +0300
> Artem Bityutskiy  wrote:
> 
> > we observe a tremendous regression between kernel version 3.16 and 3.17
> > (and up), and I've bisected it to this commit:
> > 
> > a43455a sched/numa: Ensure task_numa_migrate() checks the preferred node
> 
> Artem, Jirka, does this patch fix (or at least improve) the issues you
> have been seeing?  Does it introduce any new regressions?
> 
> Peter, Mel, I think it may be time to stop waiting for the impedance
> mismatch between the load balancer and NUMA balancing to be resolved,
> and try to just avoid the issue in the NUMA balancing code...

I'll give it a try as soon as I can and report back, thanks!



autoNUMA web workload regression

2015-05-06 Thread Artem Bityutskiy
Hi Rik,

we observe a tremendous regression between kernel version 3.16 and 3.17
(and up), and I've bisected it to this commit:

a43455a sched/numa: Ensure task_numa_migrate() checks the preferred node

http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a43455a1d572daf7b730fe12eb747d1e17411365

We run a Web server (nginx) on a 2-socket Haswell server and we emulate
an e-Commerce Web-site. Clients send requests to the server and measure
the response time. Clients load the server quite heavily - CPU
utilization is more than 90% as measured with turbostat. We use Fedora
20.

If I take 3.17 and revert this patch, I observe an average response time
improvement of 600% or more compared to vanilla 3.17.

If I take 4.1-rc1 and revert this patch, I observe an average response
time improvement of 300% or more compared to vanilla 3.17.

I asked Fengguang Wu to run LKP workloads on multiple 4- and 8-socket
machines for v4.1-rc1 with and without this patch, and there seems to be
no difference - all the micro-benchmarks performed similarly and the
differences were within the error range.

IOW, it looks like this patch has a bad effect on Web server QoS (slower
response times). What do you think?

Thank you!



Re: linux-next: unnecessary merge in the ubifs tree

2015-03-26 Thread Artem Bityutskiy
On Thu, 2015-03-26 at 11:32 +1100, Stephen Rothwell wrote:
> Hi Artem,
> 
> I noticed merge commit 3527a86b7ae1 ("Merge tag 'v4.0-rc5' into
> linux-next").  Just a note that if you had merged the commit that the
> tag points to, you would have just had a fast forward instead of a
> "real" merge.  Merging the tag forces the --no-ff flag on git merge
> (and adding --ff does not change that).  You could have merged
> "v4.0-rc5^0" (I think) or (apparently) pass --ff-only to get the fast
> forward.

You are right, and thank you for telling me about this. I hope Linus won't
flame me for it. I'll definitely keep this in mind next time.
Artem.




Re: [PATCH V2 linux-next] UBIFS: simplify returns

2015-03-25 Thread Artem Bityutskiy
On Thu, 2015-03-12 at 21:34 +0100, Fabian Frederick wrote:
> directly return recover_head() and ubifs_leb_unmap()
> instead of storing value in err and testing it.
> 
> Signed-off-by: Fabian Frederick 
> ---

Pushed to linux-ubifs/master, thanks!




Re: [PATCH 1/1] UBIFS: fix incorrect unlocking handling

2015-03-25 Thread Artem Bityutskiy
On Tue, 2015-03-17 at 22:09 -0400, Taesoo Kim wrote:
> When ubifs_init_security() fails, 'ui_mutex' is incorrectly
> unlocked and incorrectly restores 'i_size'. There are four
> such places that were introduce by the last commit.
> 
> Signed-off-by: Taesoo Kim 

Pushed to linux-ubifs.git/master, thanks!




[GIT PULL] UBI fixes for 4.0-rc5

2015-03-20 Thread Artem Bityutskiy
Hi Linus,

a pull request follows, but first a short explanation.

This pull request is based on a "random" commit, unfortunately. We
usually use a tagged -rc commit as a base, but not this time. Sorry for
this. Richard Weinberger is the new co-maintainer for UBI, and he was a
little bit too fast and did not wait for -rc1.


The following changes since commit 3d883483dc0a7261d73d8b1857a7387a1dd99eee:

  Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal (2015-02-19 17:51:22 -0800)

are available in the git repository at:

  git://git.infradead.org/linux-ubifs.git tags/upstream-4.0-rc5

for you to fetch changes up to b388e6a7a6ba988998ddd83919ae8d3debf1a13d:

  UBI: fix missing brace control flow (2015-02-23 22:17:24 +0100)


This pull request fixes a bug introduced during the v4.0 merge window where we
forgot to put braces where they should be.


Brian Norris (1):
  UBI: fix missing brace control flow

 drivers/mtd/ubi/eba.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


--
Regards,
Artem.



Re: [PATCH] UBIFS: Fix trivial typos in comments

2015-03-12 Thread Artem Bityutskiy
On Fri, 2015-03-06 at 23:14 +0100, Yannick Guerrini wrote:
> Change 'comress' to 'compress'
> Change 'inteval' to 'interval'
> Change 'disabe' to 'disable'
> Change 'nenver' to 'never'
> 
> Signed-off-by: Yannick Guerrini 

Pushed to linux-ubifs.git/master, thank you!



[GIT PULL] UBI/UBIFS updates 3.19-rc1

2014-12-12 Thread Artem Bityutskiy
Hi Linus,

The following changes since commit 0df1f2487d2f0d04703f142813d53615d62a1da4:

  Linux 3.18-rc3 (2014-11-02 15:01:51 -0800)

are available in the git repository at:

  git://git.infradead.org/linux-ubifs.git tags/upstream-3.19-rc1

for you to fetch changes up to f38aed975c0c3645bbdfc5ebe35726e64caaf588:

  UBI: Fix invalid vfree() (2014-11-07 15:14:09 +0200)


This pull request includes the following UBI/UBIFS changes:
* UBI debug messages now include the UBI device number. This change
  is responsible for the big diffstat since it touched every debugging
  print statement.
* An Xattr bug-fix which fixes SELinux support
* Several error path fixes in UBI/UBIFS


Artem Bityutskiy (1):
  UBIFS: fix budget leak in error path

Richard Weinberger (3):
  UBI: vtbl: Use ubi_eba_atomic_leb_change()
  UBI: Fix double free after do_sync_erase()
  UBI: Fix invalid vfree()

Subodh Nijsure (1):
  UBIFS: fix a couple bugs in UBIFS xattr length calculation

Tanya Brokhman (1):
  UBI: Extend UBI layer debug/messaging capabilities

 drivers/mtd/ubi/attach.c  | 126 ++
 drivers/mtd/ubi/block.c   |  41 -
 drivers/mtd/ubi/build.c   | 126 +++---
 drivers/mtd/ubi/cdev.c    |  36 +++-
 drivers/mtd/ubi/debug.c   |  10 +-
 drivers/mtd/ubi/eba.c     |  53 +++--
 drivers/mtd/ubi/fastmap.c |  96
 drivers/mtd/ubi/io.c      | 150 +-
 drivers/mtd/ubi/kapi.c    |   6 +++---
 drivers/mtd/ubi/misc.c    |   4 ++--
 drivers/mtd/ubi/ubi.h     |  13 +++--
 drivers/mtd/ubi/upd.c     |  10 ++
 drivers/mtd/ubi/vmt.c     |  69 -
 drivers/mtd/ubi/vtbl.c    |  71 ++-
 drivers/mtd/ubi/wl.c      |  80 ++--
 fs/ubifs/file.c           |   1 +
 fs/ubifs/journal.c        |   7 ++-
 17 files changed, 466 insertions(+), 433 deletions(-)

--
Artem



Re: [PATCH 2/6] UBI: Fastmap: Ensure that only one fastmap work is scheduled

2014-11-27 Thread Artem Bityutskiy
On Thu, 2014-11-27 at 17:39 +0100, Richard Weinberger wrote:
> > So the background thread would first check this flag, and if it is set -
> > call the fastmap stuff. The go do the WL works.
> > 
> > Just off-the top of my head, take with grain of salt.
> 
> So you want me to redesign it?
> IMHO this is just a matter of taste.
> 
> Face it, my brain does not work like yours. I design things differently.

OK.



Re: [PATCH 3/6] UBI: Fastmap: Ensure that all fastmap work is done upon WL shutdown

2014-11-27 Thread Artem Bityutskiy
On Thu, 2014-11-27 at 17:35 +0100, Richard Weinberger wrote:
> Am 27.11.2014 um 17:29 schrieb Artem Bityutskiy:
> > On Thu, 2014-11-27 at 17:08 +0100, Richard Weinberger wrote:
> >>> Obviously, there is some misunderstanding. This looks like lack of
> >>> separation and misuse of layering. I am missing explanations why I am
> >>> wrong...
> >>
> >> So you want me to use the UBI WL background thread for the scheduled 
> >> fastmap work?
> > 
> > No. It is more like either use it or do not use it.
> 
> Sorry, I don't understand.
> What do you want to do to?

Just keep the code structured. I am just asking questions and trying to
analyze your patches. If at some point I would like you to do something
specific, I clearly state it. In this case I was complaining about
fastmap specifics in an unrelated file, so basically the wish is to have
that go away. How exactly is not specified - up to you :-) Or it may just
mean telling me why it is this way, i.e., justifying it.

When I was working with this code, I did give people specific
suggestions, line by line. Now I am doing more of a sanity check,
looking after the bigger picture.

I understand that this is not a picture of an ideal maintainer, and I am
no longer an ideal maintainer for this stuff (I think I used to be,
though), simply because of a lack of time. I am doing a best-effort job
now.

> >> I didn't do it that way because you said more than once that fastmap is 
> >> fastmap and
> >> WL is WL. Therefore I've separated it.
> > 
> > And "separated" meaning adding this code to wl.c?
> > 
> > +#ifdef CONFIG_MTD_UBI_FASTMAP
> > +   flush_work(&ubi->fm_work);
> > +#endif
> > 
> > Could it be separated some more then?
> > 
> 
> Of course, commit "UBI: Move fastmap specific functions out of wl.c" does.

I did not see it in this series. You could have told me this earlier, not
after 2 e-mail exchanges. Do not assume I remember the details of our
previous discussion - assume I forgot everything :-)

> But this commit is *bugfix* commit.

I thought adding a close function to fastmap.c was a simple task.



Re: [PATCH 2/6] UBI: Fastmap: Ensure that only one fastmap work is scheduled

2014-11-27 Thread Artem Bityutskiy
On Thu, 2014-11-27 at 17:13 +0100, Richard Weinberger wrote:
> Am 27.11.2014 um 16:27 schrieb Artem Bityutskiy:
> > On Mon, 2014-11-24 at 14:20 +0100, Richard Weinberger wrote:
> >> If the WL pool runs out of PEBs we schedule a fastmap write
> >> to refill it as soon as possible.
> >> Ensure that only one at a time is scheduled otherwise we might end in
> >> a fastmap write storm because writing the fastmap can schedule another
> >> write if bitflips are detected.
> > 
> > Could you please provide some more details about the "write storm". Does
> > it happen when there are 2 fastmap works in the queue? Or if they run
> > simultaneously? Why the storm happens and white kind of "writes" it
> > consists of?
> 
> If UBI needs to write a new fastmap while wear leveling (by using 
> get_peb_for_wl())
> a fastmap work is scheduled.
> We cannot write a fastmap in this context because we're in atomic context.
> At this point one fastmap write is scheduled. If now get_peb_for_wl() is 
> executed
> a second time it will schedule another fastmap work because the pools are 
> still not refilled.

Sounds like you just do not need any works and any queues at all. All
you need is a "please, fastmap me!" flag.

Then this flag should be checked every time we enter the background
thread or the fastmap code, and be acted upon.

So the background thread would first check this flag and, if it is set,
call the fastmap stuff. Then it would go do the WL works.

Just off the top of my head - take it with a grain of salt.
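
(A rough sketch of what I mean, for illustration only - the flag, the
fm_flags field and the helpers below are made up, this is not the real
UBI code:)

/* In atomic context, e.g. when the WL pool runs dry, just raise the flag: */
static void request_fastmap(struct ubi_device *ubi)
{
	set_bit(UBI_FM_DO_FASTMAP, &ubi->fm_flags);	/* made-up flag and field */
	wake_up_process(ubi->bgt_thread);
}

/* One iteration of the background thread: */
static int bgt_one_iteration(struct ubi_device *ubi)
{
	/* Act on a pending "please, fastmap me!" request first ... */
	if (test_and_clear_bit(UBI_FM_DO_FASTMAP, &ubi->fm_flags))
		ubi_update_fastmap(ubi);

	/* ... then go do the ordinary WL works. */
	return do_one_wl_work(ubi);	/* made-up helper */
}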



Re: [PATCH 3/6] UBI: Fastmap: Ensure that all fastmap work is done upon WL shutdown

2014-11-27 Thread Artem Bityutskiy
On Thu, 2014-11-27 at 17:08 +0100, Richard Weinberger wrote:
> > Obviously, there is some misunderstanding. This looks like lack of
> > separation and misuse of layering. I am missing explanations why I am
> > wrong...
> 
> So you want me to use the UBI WL background thread for the scheduled fastmap 
> work?

No. It is more like either use it or do not use it.

> I didn't do it that way because you said more than once that fastmap is 
> fastmap and
> WL is WL. Therefore I've separated it.

And "separated" meaning adding this code to wl.c?

+#ifdef CONFIG_MTD_UBI_FASTMAP
+   flush_work(&ubi->fm_work);
+#endif

Could it be separated some more then?



Re: [PATCH 3/6] UBI: Fastmap: Ensure that all fastmap work is done upon WL shutdown

2014-11-27 Thread Artem Bityutskiy
On Mon, 2014-11-24 at 14:20 +0100, Richard Weinberger wrote:
> ...otherwise the deferred work might run after datastructures
> got freed and corrupt memory.
> 
> Signed-off-by: Richard Weinberger 
> ---
>  drivers/mtd/ubi/wl.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
> index 7f135df..cb2e571 100644
> --- a/drivers/mtd/ubi/wl.c
> +++ b/drivers/mtd/ubi/wl.c
> @@ -2041,6 +2041,9 @@ static void protection_queue_destroy(struct ubi_device 
> *ubi)
>  void ubi_wl_close(struct ubi_device *ubi)
>  {
>   dbg_wl("close the WL sub-system");
> +#ifdef CONFIG_MTD_UBI_FASTMAP
> + flush_work(&ubi->fm_work);
> +#endif

If you are using the work infrastructure implemented in wl.c, then the
fastmap work should be no different from any other work. And we do flush
all works in 'shutdown_work()'. The fastmap work should be flushed there
too.

I think we discussed this already - there should be one single queue of
works, managed by the same set of functions, all flushed in the same
place, one by one...
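
(Roughly what I have in mind, as an illustration only - the exact
ubi_work layout and helper signatures here are assumptions, not the real
wl.c code:)

/*
 * Queue the fastmap refill as an ordinary wl.c work item, so that
 * shutdown_work() flushes it together with every other queued work.
 */
static int fastmap_refill_worker(struct ubi_device *ubi,
				 struct ubi_work *wrk, int shutdown)
{
	kfree(wrk);
	if (shutdown)
		return 0;
	return ubi_update_fastmap(ubi);
}

static int schedule_fastmap_refill(struct ubi_device *ubi)
{
	struct ubi_work *wrk = kmalloc(sizeof(*wrk), GFP_NOFS);

	if (!wrk)
		return -ENOMEM;
	wrk->func = fastmap_refill_worker;
	schedule_ubi_work(ubi, wrk);
	return 0;
}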

Obviously, there is some misunderstanding. This looks like a lack of
separation and a misuse of layering. I am missing explanations of why I
am wrong...



Re: [PATCH 2/6] UBI: Fastmap: Ensure that only one fastmap work is scheduled

2014-11-27 Thread Artem Bityutskiy
On Mon, 2014-11-24 at 14:20 +0100, Richard Weinberger wrote:
> If the WL pool runs out of PEBs we schedule a fastmap write
> to refill it as soon as possible.
> Ensure that only one at a time is scheduled otherwise we might end in
> a fastmap write storm because writing the fastmap can schedule another
> write if bitflips are detected.

Could you please provide some more details about the "write storm"? Does
it happen when there are 2 fastmap works in the queue? Or when they run
simultaneously? Why does the storm happen, and what kind of "writes"
does it consist of?

Thanks!



Re: Fastmap update v2 (pile 1)

2014-11-27 Thread Artem Bityutskiy
On Mon, 2014-11-24 at 14:20 +0100, Richard Weinberger wrote:
> Artem,
> 
> as requested I'm resending my fastmap work in smaller pieces.
> This is pile 1 of 7.

Thanks, but they could be even smaller and more frequent, in theory :)
You could have sent 3 or so of these several weeks ago - that would have
been even better, as I was much less busy at that point. And next week I
am out of the office and off the network.



Re: [PATCH 1/6] UBI: Fastmap: Care about the protection queue

2014-11-27 Thread Artem Bityutskiy
On Mon, 2014-11-24 at 14:20 +0100, Richard Weinberger wrote:
> Fastmap can miss a PEB if it is in the protection queue
> and not jet in the used tree.
> Treat every protected PEB as used.
> 
> Signed-off-by: Richard Weinberger 

Picked this one and pushed, thanks!


