Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Denys Vlasenko
On Thursday 16 August 2007 01:39, Satyam Sharma wrote:

  static inline void wait_for_init_deassert(atomic_t *deassert)
  {
 - while (!atomic_read(deassert));
 + while (!atomic_read(deassert))
 + cpu_relax();
   return;
  }

For less-than-briliant people like me, it's totally non-obvious that
cpu_relax() is needed for correctness here, not just to make P4 happy.

IOW: atomic_read name quite unambiguously means I will read
this variable from main memory. Which is not true and creates
potential for confusion and bugs.
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-24 Thread Denys Vlasenko
On Saturday 18 August 2007 05:13, Linus Torvalds wrote:
 On Sat, 18 Aug 2007, Satyam Sharma wrote:
  No code does (or would do, or should do):
 
  x.counter++;
 
  on an atomic_t x; anyway.

 That's just an example of a general problem.

 No, you don't use x.counter++. But you *do* use

   if (atomic_read(x) = 1)

 and loading into a register is stupid and pointless, when you could just
 do it as a regular memory-operand to the cmp instruction.

It doesn't mean that (volatile int*) cast is bad, it means that current gcc
is bad (or not good enough). IOW: instead of avoiding volatile cast,
it's better to fix the compiler.

 And as far as the compiler is concerned, the problem is the 100% same:
 combining operations with the volatile memop.

 The fact is, a compiler that thinks that

   movl mem,reg
   cmpl $val,reg

 is any better than

   cmpl $val,mem

 is just not a very good compiler.

Linus, in all honesty gcc has many more cases of suboptimal code,
case of volatile is just one of many.

Off the top of my head:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28417

unsigned v;
void f(unsigned A) { v = ((unsigned long long)A) * 365384439  (27+32); }

gcc-4.1.1 -S -Os -fomit-frame-pointer t.c

f:
movl$365384439, %eax
mull4(%esp)
movl%edx, %eax = ?
shrl$27, %eax
movl%eax, v
ret

Why is it moving %edx to %eax?

gcc-4.2.1 -S -Os -fomit-frame-pointer t.c

f:
movl$365384439, %eax
mull4(%esp)
movl%edx, %eax = ?
xorl%edx, %edx = ??!
shrl$27, %eax
movl%eax, v
ret

Progress... Now we also zero out %edx afterwards for no apparent reason.
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-24 Thread Denys Vlasenko
On Thursday 16 August 2007 00:22, Paul Mackerras wrote:
 Satyam Sharma writes:
 In the kernel we use atomic variables in precisely those situations
 where a variable is potentially accessed concurrently by multiple
 CPUs, and where each CPU needs to see updates done by other CPUs in a
 timely fashion.  That is what they are for.  Therefore the compiler
 must not cache values of atomic variables in registers; each
 atomic_read must result in a load and each atomic_set must result in a
 store.  Anything else will just lead to subtle bugs.

Amen.
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Denys Vlasenko
On Friday 24 August 2007 13:12, Kenn Humborg wrote:
  On Thursday 16 August 2007 01:39, Satyam Sharma wrote:
static inline void wait_for_init_deassert(atomic_t *deassert)
{
   - while (!atomic_read(deassert));
   + while (!atomic_read(deassert))
   + cpu_relax();
 return;
}
 
  For less-than-briliant people like me, it's totally non-obvious that
  cpu_relax() is needed for correctness here, not just to make P4 happy.
 
  IOW: atomic_read name quite unambiguously means I will read
  this variable from main memory. Which is not true and creates
  potential for confusion and bugs.

 To me, atomic_read means a read which is synchronized with other
 changes to the variable (using the atomic_XXX functions) in such
 a way that I will always only see the before or after
 state of the variable - never an intermediate state while a
 modification is happening.  It doesn't imply that I have to
 see the after state immediately after another thread modifies
 it.

So you are ok with compiler propagating n1 to n2 here:

n1 += atomic_read(x);
other_variable++;
n2 += atomic_read(x);

without accessing x second time. What's the point? Any sane coder
will say that explicitly anyway:

tmp = atomic_read(x);
n1 += tmp;
other_variable++;
n2 += tmp;

if only for the sake of code readability. Because first code
is definitely hinting that it reads RAM twice, and it's actively *bad*
for code readability when in fact it's not the case!

Locking, compiler and CPU barriers are complicated enough already,
please don't make them even harder to understand.
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-08-24 Thread Denys Vlasenko
On Friday 24 August 2007 18:15, Christoph Lameter wrote:
 On Fri, 24 Aug 2007, Denys Vlasenko wrote:
  On Thursday 16 August 2007 00:22, Paul Mackerras wrote:
   Satyam Sharma writes:
   In the kernel we use atomic variables in precisely those situations
   where a variable is potentially accessed concurrently by multiple
   CPUs, and where each CPU needs to see updates done by other CPUs in a
   timely fashion.  That is what they are for.  Therefore the compiler
   must not cache values of atomic variables in registers; each
   atomic_read must result in a load and each atomic_set must result in a
   store.  Anything else will just lead to subtle bugs.
 
  Amen.

 A timely fashion? One cannot rely on something like that when coding.
 The visibility of updates is insured by barriers and not by some fuzzy
 notion of timeliness.

But here you do have some notion of time:

while (atomic_read(x))
continue;

continue when other CPU(s) decrement it down to zero.
If read includes an insn which accesses RAM, you will
see new value sometime after other CPU decrements it.
Sometime after is on the order of nanoseconds here.
It is a valid concept of time, right?

The whole confusion is about whether atomic_read implies
read from RAM or not. I am in a camp which thinks it does.
You are in an opposite one.

We just need a less ambiguous name.

What about this:

/**
 * atomic_read - read atomic variable
 * @v: pointer of type atomic_t
 *
 * Atomically reads the value of @v.
 * No compiler barrier implied.
 */
#define atomic_read(v)  ((v)-counter)

+/**
+ * atomic_read_uncached - read atomic variable from memory
+ * @v: pointer of type atomic_t
+ *
+ * Atomically reads the value of @v. This is guaranteed to emit an insn
+ * which accesses memory, atomically. No ordering guarantees!
+ */
+#define atomic_read_uncached(v)  asm_or_volatile_ptr_magic(v)

I was thinking of s/atomic_read/atomic_get/ too, but it implies taking
atomic a-la get_cpu()...
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] i386: Fix a couple busy loops in mach_wakecpu.h:wait_for_init_deassert()

2007-08-24 Thread Denys Vlasenko
On Friday 24 August 2007 18:06, Christoph Lameter wrote:
 On Fri, 24 Aug 2007, Satyam Sharma wrote:
  But if people do seem to have a mixed / confused notion of atomicity
  and barriers, and if there's consensus, then as I'd said earlier, I
  have no issues in going with the consensus (eg. having API variants).
  Linus would be more difficult to convince, however, I suspect :-)

 The confusion may be the result of us having barrier semantics in
 atomic_read. If we take that out then we may avoid future confusions.

I think better name may help. Nuke atomic_read() altogether.

n = atomic_value(x);// doesnt hint as strongly at reading as atomic_read
n = atomic_fetch(x);// yes, we _do_ touch RAM
n = atomic_read_uncached(x); // or this

How does that sound?
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-09-09 Thread Denys Vlasenko
On Friday 17 August 2007 17:48, Linus Torvalds wrote:
 
 On Fri, 17 Aug 2007, Nick Piggin wrote:
  
  That's not obviously just taste to me. Not when the primitive has many
  (perhaps, the majority) of uses that do not require said barriers. And
  this is not solely about the code generation (which, as Paul says, is
  relatively minor even on x86). I prefer people to think explicitly
  about barriers in their lockless code.
 
 Indeed.
 
 I think the important issues are:
 
  - volatile itself is simply a badly/weakly defined issue. The semantics 
of it as far as the compiler is concerned are really not very good, and 
in practice tends to boil down to I will generate so bad code that 
nobody can accuse me of optimizing anything away.
 
  - volatile - regardless of how well or badly defined it is - is purely 
a compiler thing. It has absolutely no meaning for the CPU itself, so 
it at no point implies any CPU barriers. As a result, even if the 
compiler generates crap code and doesn't re-order anything, there's 
nothing that says what the CPU will do.
 
  - in other words, the *only* possible meaning for volatile is a purely 
single-CPU meaning. And if you only have a single CPU involved in the 
process, the volatile is by definition pointless (because even 
without a volatile, the compiler is required to make the C code appear 
consistent as far as a single CPU is concerned).
 
 So, let's take the example *buggy* code where we use volatile to wait 
 for other CPU's:
 
   atomic_set(var, 0);
   while (!atomic_read(var))
   /* nothing */;
 
 
 which generates an endless loop if we don't have atomic_read() imply 
 volatile.
 
 The point here is that it's buggy whether the volatile is there or not! 
 Exactly because the user expects multi-processing behaviour, but 
 volatile doesn't actually give any real guarantees about it. Another CPU 
 may have done:
 
   external_ptr = kmalloc(..);
   /* Setup is now complete, inform the waiter */
   atomic_inc(var);
 
 but the fact is, since the other CPU isn't serialized in any way, the 
 while-loop (even in the presense of volatile) doesn't actually work 
 right! Whatever the atomic_read() was waiting for may not have 
 completed, because we have no barriers!

Why is all this fixation on volatile? I don't think
people want volatile keyword per se, they want atomic_read(x) to
_always_ compile into an memory-accessing instruction, not register access.
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-09-10 Thread Denys Vlasenko
On Monday 10 September 2007 13:22, Kyle Moffett wrote:
 On Sep 10, 2007, at 06:56:29, Denys Vlasenko wrote:
  On Sunday 09 September 2007 19:18, Arjan van de Ven wrote:
  On Sun, 9 Sep 2007 19:02:54 +0100
  Denys Vlasenko [EMAIL PROTECTED] wrote:
 
  Why is all this fixation on volatile? I don't think people want  
  volatile keyword per se, they want atomic_read(x) to _always_  
  compile into an memory-accessing instruction, not register access.
 
  and ... why is that?  is there any valid, non-buggy code sequence  
  that makes that a reasonable requirement?
 
  Well, if you insist on having it again:
 
  Waiting for atomic value to be zero:
 
  while (atomic_read(x))
  continue;
 
  gcc may happily convert it into:
 
  reg = atomic_read(x);
  while (reg)
  continue;
 
 Bzzt.  Even if you fixed gcc to actually convert it to a busy loop on  
 a memory variable, you STILL HAVE A BUG as it may *NOT* be gcc that  
 does the conversion, it may be that the CPU does the caching of the  
 memory value.  GCC has no mechanism to do cache-flushes or memory- 
 barriers except through our custom inline assembly.

CPU can cache the value all right, but it cannot use that cached value
*forever*, it has to react to invalidate cycles on the shared bus
and re-fetch new data.

IOW: atomic_read(x) which compiles down to memory accessor
will work properly.

 the CPU.  Thirdly, on a large system it may take some arbitrarily  
 large amount of time for cache-propagation to update the value of the  
 variable in your local CPU cache.

Yes, but arbitrarily large amount of time is actually measured
in nanoseconds here. Let's say 1000ns max for hundreds of CPUs?

 Also, you   
 probably want a cpu_relax() in there somewhere to avoid overheating  
 the CPU.

Yes, but 
1. CPU shouldn't overheat (in a sense that it gets damaged),
   it will only use more power than needed.
2. cpu_relax() just throttles down my CPU, so it's performance
   optimization only. Wait, it isn't, it's a barrier too.
   Wow, cpu_relax is a barrier? How am I supposed to know
   that without reading lkml flamewars and/or header files?

Let's try reading headers. asm-x86_64/processor.h:

#define cpu_relax()   rep_nop()

So, is it a barrier? No clue yet.

/* REP NOP (PAUSE) is a good thing to insert into busy-wait loops. */
static inline void rep_nop(void)
{
__asm__ __volatile__(rep;nop: : :memory);
}

Comment explicitly says that it is a good thing (doesn't say
that it is mandatory) and says NOTHING about barriers!

Barrier-ness is not mentioned and is hidden in memory clobber.

Do you think it's obvious enough for average driver writer?
I think not, especially that it's unlikely for him to even start
suspecting that it is a memory barrier based on the cpu_relax
name.

 You simply CANNOT use an atomic_t as your sole synchronizing
 primitive, it doesn't work!  You virtually ALWAYS want to use an  
 atomic_t in the following types of situations:
 
 (A) As an object refcount.  The value is never read except as part of  
 an atomic_dec_return().  Why aren't you using struct kref?
 
 (B) As an atomic value counter (number of processes, for example).   
 Just reading the value is racy anyways, if you want to enforce a  
 limit or something then use atomic_inc_return(), check the result,  
 and use atomic_dec() if it's too big.  If you just want to return the  
 statistics then you are going to be instantaneous-point-in-time anyways.
 
 (C) As an optimization value (statistics-like, but exact accuracy  
 isn't important).
 
 Atomics are NOT A REPLACEMENT for the proper kernel subsystem, like  
 completions, mutexes, semaphores, spinlocks, krefs, etc.  It's not  
 useful for synchronization, only for keeping track of simple integer  
 RMW values.  Note that atomic_read() and atomic_set() aren't very  
 useful RMW primitives (read-nomodify-nowrite and read-set-zero- 
 write).  Code which assumes anything else is probably buggy in other  
 ways too.

You are basically trying to educate me how to use atomic properly.
You don't need to do it, as I am (currently) not a driver author.

I am saying that people who are already using atomic_read()
(and who unfortunately did not read your explanation above)
will still sometimes use atomic_read() as a way to read atomic value
*from memory*, and will create nasty heisenbugs for you to debug.
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-09-10 Thread Denys Vlasenko
On Monday 10 September 2007 14:38, Denys Vlasenko wrote:
 You are basically trying to educate me how to use atomic properly.
 You don't need to do it, as I am (currently) not a driver author.
 
 I am saying that people who are already using atomic_read()
 (and who unfortunately did not read your explanation above)
 will still sometimes use atomic_read() as a way to read atomic value
 *from memory*, and will create nasty heisenbugs for you to debug.

static inline int
qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha)
{
int  return_status = QLA_SUCCESS;
unsigned long loop_timeout ;
scsi_qla_host_t *pha = to_qla_parent(ha);

/* wait for 5 min at the max for loop to be ready */
loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ);

while ((!atomic_read(pha-loop_down_timer) 
atomic_read(pha-loop_state) == LOOP_DOWN) ||
atomic_read(pha-loop_state) != LOOP_READY) {
if (atomic_read(pha-loop_state) == LOOP_DEAD) {
return_status = QLA_FUNCTION_FAILED;
break;
}
msleep(1000);
if (time_after_eq(jiffies, loop_timeout)) {
return_status = QLA_FUNCTION_FAILED;
break;
}
}
return (return_status);
}

Is above correct or buggy? Correct, because msleep is a barrier.
Is it obvious? No.

static void
qla2x00_rst_aen(scsi_qla_host_t *ha)
{
if (ha-flags.online  !ha-flags.reset_active 
!atomic_read(ha-loop_down_timer) 
!(test_bit(ABORT_ISP_ACTIVE, ha-dpc_flags))) {
do {
clear_bit(RESET_MARKER_NEEDED, ha-dpc_flags);

/*
 * Issue marker command only when we are going to start
 * the I/O.
 */
ha-marker_needed = 1;
} while (!atomic_read(ha-loop_down_timer) 
(test_bit(RESET_MARKER_NEEDED, ha-dpc_flags)));
}
}

Is above correct? I honestly don't know. Correct, because set_bit is
a barrier on _all _memory_? Will it break if set_bit will be changed
to be a barrier only on its operand? Probably yes.

drivers/kvm/kvm_main.c

while (atomic_read(completed) != needed) {
cpu_relax();
barrier();
}

Obviously author did not know that cpu_relax is already a barrier.
See why I think driver authors will be confused?

arch/x86_64/kernel/crash.c

static void nmi_shootdown_cpus(void)
{
...
msecs = 1000; /* Wait at most a second for the other cpus to stop */
while ((atomic_read(waiting_for_crash_ipi)  0)  msecs) {
mdelay(1);
msecs--;
}
...
}

Is mdelay(1) a barrier? Yes, because it is a function on x86_64.
Absolutely the same code will be buggy on an arch where
mdelay(1) == udelay(1000), and udelay is implemented
as inline busy-wait.

arch/sparc64/kernel/smp.c

/* Wait for response */
while (atomic_read(data.finished) != cpus)
cpu_relax();
...later in the same file...
while (atomic_read(smp_capture_registry) != ncpus)
rmb();

I'm confused. Do we need cpu_relax() or rmb()? Does cpu_relax() imply rmb()?
(No it doesn't). Which of those two while loops needs correcting?
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-09-10 Thread Denys Vlasenko
On Monday 10 September 2007 15:51, Arjan van de Ven wrote:
 On Mon, 10 Sep 2007 11:56:29 +0100
 Denys Vlasenko [EMAIL PROTECTED] wrote:
 
  
  Well, if you insist on having it again:
  
  Waiting for atomic value to be zero:
  
  while (atomic_read(x))
  continue;
  
 
 and this I would say is buggy code all the way.

 Not from a pure C level semantics, but from a busy waiting is buggy
 semantics level and a I'm inventing my own locking semantics level.

After inspecting arch/*, I cannot agree with you.
Otherwise almost all major architectures use
conceptually buggy busy-waiting:

arch/alpha
arch/i386
arch/ia64
arch/m32r
arch/mips
arch/parisc
arch/powerpc
arch/sh
arch/sparc64
arch/um
arch/x86_64

All of the above contain busy-waiting on atomic_read.

Including these loops without barriers:

arch/mips/kernel/smtc.c
while (atomic_read(idle_hook_initialized)  1000)
;
arch/mips/sgi-ip27/ip27-nmi.c
while (atomic_read(nmied_cpus) != num_online_cpus());

[Well maybe num_online_cpus() is a barrier, I didn't check]

arch/sh/kernel/smp.c
if (wait)
while (atomic_read(smp_fn_call.finished) != (nr_cpus - 1));

Bugs?
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/24] make atomic_read() behave consistently across all architectures

2007-09-10 Thread Denys Vlasenko
On Monday 10 September 2007 16:09, Linus Torvalds wrote:
 On Mon, 10 Sep 2007, Denys Vlasenko wrote:
  static inline int
  qla2x00_wait_for_loop_ready(scsi_qla_host_t *ha)
  {
  int  return_status = QLA_SUCCESS;
  unsigned long loop_timeout ;
  scsi_qla_host_t *pha = to_qla_parent(ha);
  
  /* wait for 5 min at the max for loop to be ready */
  loop_timeout = jiffies + (MAX_LOOP_TIMEOUT * HZ);
  
  while ((!atomic_read(pha-loop_down_timer) 
  atomic_read(pha-loop_state) == LOOP_DOWN) ||
  atomic_read(pha-loop_state) != LOOP_READY) {
  if (atomic_read(pha-loop_state) == LOOP_DEAD) {
 ...
  Is above correct or buggy? Correct, because msleep is a barrier.
  Is it obvious? No.
 
 It's *buggy*. But it has nothing to do with any msleep() in the loop, or 
 anything else.
 
 And more importantly, it would be equally buggy even *with* a volatile 
 atomic_read().

I am not saying that this code is okay, this isn't the point.
(The code is in fact awful for several more reasons).

My point is that people are confused as to what atomic_read()
exactly means, and this is bad. Same for cpu_relax().
First one says read, and second one doesn't say barrier.

This is real code from current kernel which demonstrates this:

I don't know that cpu_relax() is a barrier already:

drivers/kvm/kvm_main.c
        while (atomic_read(completed) != needed) {
                cpu_relax();
                barrier();
        }

I think that atomic_read() is a read from memory and therefore
I don't need a barrier:

arch/x86_64/kernel/crash.c
        msecs = 1000; /* Wait at most a second for the other cpus to stop */
        while ((atomic_read(waiting_for_crash_ipi)  0)  msecs) {
                mdelay(1);
                msecs--;
        }

Since neither camp seems to give up, I am proposing renaming
them to something less confusing, and make everybody happy.

cpu_relax_barrier()
atomic_value(x)
atomic_fetch(x)

I'm not native English speaker, do these sound better?
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bnx2 dirver's firmware images

2007-09-18 Thread Denys Vlasenko
On Tuesday 18 September 2007 19:45, Michael Chan wrote:
 We can compress all the different sections of the firmware.  Currently,
 we only compress the biggest chunks and the rest are uncompressed.

 These zeros should compress to almost nothing.  But I agree that the
 firmware is still big.

You don't need to store and fetch zeros at all. You *know* that
they are zeros, right? So do this:

- REG_WR_IND(bp, offset, fw-bss[j]);
+ REG_WR_IND(bp, offset, 0);

--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bnx2 dirver's firmware images

2007-09-19 Thread Denys Vlasenko
On Tuesday 18 September 2007 21:05, Michael Chan wrote:
 The bnx2 firmware changes quite frequently.  A new driver quite often
 requires new firmware to work correctly.  Splitting them up makes things
 difficult for the user.

sounds reasonable.

I see that bnx2 has support for unpacking gzipped binary blobs,
and it it the only such net driver (maybe the only such driver
in the entire tree, I didn't check).

This can be very useful for all other firmware images in other drivers.

Last night I prepared a patch which basically separates unpacking
function from bnx2-related code. Can you run-test attached patch?

Meanwhile I will prepare follow-on patch which actually moves this
function out of the driver and into lib/*.

Size difference:

# size */bn*.o
   textdata bss dec hex filename
  54884   816896424  142997   22e95 net/bnx2.o
  55276   818236424  143523   230a3 net.org/bnx2.o

Signed-off-by: Denys Vlasenko [EMAIL PROTECTED]
--
vda
--- linux-2.6.23-rc6.org/drivers/net/bnx2.c	Tue Sep 11 22:33:54 2007
+++ linux-2.6.23-rc6.gunzip/drivers/net/bnx2.c	Wed Sep 19 00:01:19 2007
@@ -2767,93 +2767,61 @@
 	spin_unlock_bh(bp-phy_lock);
 }
 
-#define FW_BUF_SIZE	0x8000
-
+/* To be moved to generic lib/ */
 static int
-bnx2_gunzip_init(struct bnx2 *bp)
+bnx2_gunzip(u8 *zbuf, int len, void **outbuf)
 {
-	if ((bp-gunzip_buf = vmalloc(FW_BUF_SIZE)) == NULL)
-		goto gunzip_nomem1;
+	struct z_stream_s *strm;
+	void *gunzip_buf;
+	int rc;
+	int sz;
 
-	if ((bp-strm = kmalloc(sizeof(*bp-strm), GFP_KERNEL)) == NULL)
-		goto gunzip_nomem2;
+	/* gzip header (1f,8b,08... 10 bytes total + possible asciz filename)
+	 * is stripped, 32-bit unpacked size (LE) is prepended instead */
+	sz = *zbuf++;
+	sz = (sz  8) + *zbuf++;
+	sz = (sz  8) + *zbuf++;
+	sz = (sz  8) + *zbuf++;
 
-	bp-strm-workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
-	if (bp-strm-workspace == NULL)
+	rc = -ENOMEM;
+	gunzip_buf = vmalloc(sz);
+	if (gunzip_buf == NULL)
+		goto gunzip_nomem1;
+	strm = kmalloc(sizeof(*strm), GFP_KERNEL);
+	if (strm == NULL)
+		goto gunzip_nomem2;
+	strm-workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
+	if (strm-workspace == NULL)
 		goto gunzip_nomem3;
 
-	return 0;
+	strm-next_in = zbuf;
+	strm-avail_in = len;
+	strm-next_out = gunzip_buf;
+	strm-avail_out = sz;
 
-gunzip_nomem3:
-	kfree(bp-strm);
-	bp-strm = NULL;
+	rc = zlib_inflateInit2(strm, -MAX_WBITS);
+	if (rc == Z_OK) {
+		rc = zlib_inflate(strm, Z_FINISH);
+		if (rc == Z_OK) {
+			rc = sz - strm-avail_out;
+			*outbuf = gunzip_buf;
+		} else
+			rc = -EINVAL;
+		zlib_inflateEnd(strm);
+	} else
+		rc = -EINVAL;
 
+	kfree(strm-workspace);
+gunzip_nomem3:
+	kfree(strm);
 gunzip_nomem2:
-	vfree(bp-gunzip_buf);
-	bp-gunzip_buf = NULL;
-
+	if (rc != Z_OK)
+		vfree(gunzip_buf);
 gunzip_nomem1:
-	printk(KERN_ERR PFX %s: Cannot allocate firmware buffer for 
-			uncompression.\n, bp-dev-name);
-	return -ENOMEM;
+	return rc; /* returns Z_OK (0) if successful */
 }
 
 static void
-bnx2_gunzip_end(struct bnx2 *bp)
-{
-	kfree(bp-strm-workspace);
-
-	kfree(bp-strm);
-	bp-strm = NULL;
-
-	if (bp-gunzip_buf) {
-		vfree(bp-gunzip_buf);
-		bp-gunzip_buf = NULL;
-	}
-}
-
-static int
-bnx2_gunzip(struct bnx2 *bp, u8 *zbuf, int len, void **outbuf, int *outlen)
-{
-	int n, rc;
-
-	/* check gzip header */
-	if ((zbuf[0] != 0x1f) || (zbuf[1] != 0x8b) || (zbuf[2] != Z_DEFLATED))
-		return -EINVAL;
-
-	n = 10;
-
-#define FNAME	0x8
-	if (zbuf[3]  FNAME)
-		while ((zbuf[n++] != 0)  (n  len));
-
-	bp-strm-next_in = zbuf + n;
-	bp-strm-avail_in = len - n;
-	bp-strm-next_out = bp-gunzip_buf;
-	bp-strm-avail_out = FW_BUF_SIZE;
-
-	rc = zlib_inflateInit2(bp-strm, -MAX_WBITS);
-	if (rc != Z_OK)
-		return rc;
-
-	rc = zlib_inflate(bp-strm, Z_FINISH);
-
-	*outlen = FW_BUF_SIZE - bp-strm-avail_out;
-	*outbuf = bp-gunzip_buf;
-
-	if ((rc != Z_OK)  (rc != Z_STREAM_END))
-		printk(KERN_ERR PFX %s: Firmware decompression error: %s\n,
-		   bp-dev-name, bp-strm-msg);
-
-	zlib_inflateEnd(bp-strm);
-
-	if (rc == Z_STREAM_END)
-		return 0;
-
-	return rc;
-}
-
-static void
 load_rv2p_fw(struct bnx2 *bp, u32 *rv2p_code, u32 rv2p_code_len,
 	u32 rv2p_proc)
 {
@@ -2902,22 +2870,16 @@
 	/* Load the Text area. */
 	offset = cpu_reg-spad_base + (fw-text_addr - cpu_reg-mips_view_base);
 	if (fw-gz_text) {
-		u32 text_len;
-		void *text;
-
-		rc = bnx2_gunzip(bp, fw-gz_text, fw-gz_text_len, text,
- text_len);
-		if (rc)
-			return rc;
-
-		fw-text = text;
-	}
-	if (fw-gz_text) {
+		u32 *text;
 		int j;
 
+		rc = bnx2_gunzip(fw-gz_text, fw-gz_text_len, (void*) text);
+		if (rc  0)
+			return rc;
 		for (j = 0; j  (fw-text_len / 4); j++, offset += 4) {
-			REG_WR_IND(bp, offset, cpu_to_le32(fw-text[j]));
+			REG_WR_IND(bp, offset, cpu_to_le32(text[j]));
 	}
+		vfree(text);
 	}
 
 	/* Load the Data area. */
@@ -2979,28 +2941,22 @@
 {
 	struct cpu_reg cpu_reg;
 	struct fw_info *fw;
-	int rc = 0;
+	int rc;
 	void *text;
-	u32 text_len;
 
-	if ((rc

Re: bnx2 dirver's firmware images

2007-09-19 Thread Denys Vlasenko
On Wednesday 19 September 2007 22:00, Michael Chan wrote:
 On Wed, 2007-09-19 at 09:30 +0100, Denys Vlasenko wrote:
 +   /* gzip header (1f,8b,08... 10 bytes total + possible asciz filename)
 +* is stripped, 32-bit unpacked size (LE) is prepended instead */
 +   sz = *zbuf++;
 +   sz = (sz  8) + *zbuf++;
 +   sz = (sz  8) + *zbuf++;
 +   sz = (sz  8) + *zbuf++;
 
 I don't have a problem with removing the gzip header.  It doesn't
 contain very useful information other than a valid header for sanity
 check.  But I don't think we need to arbitrarily add the unpacked size
 in front of the gzipped data.  The driver knows the size (e.g. the size
 of RAM on the chip) and should pass it to the call.  The driver should
 also allocate the memory for the unpacked data instead of allocating the
 memory inside the call and freeing it by the caller.  For example, the
 driver may need to use pci_alloc_consistent() if the firmware is to be
 DMA'ed to the chip.
 
 Other than that, everything else looks fine.  Thanks.

Are you saying that you successfully run-tested it?
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: bnx2 dirver's firmware images

2007-09-20 Thread Denys Vlasenko
On Wednesday 19 September 2007 22:43, Michael Chan wrote:
 On Wed, 2007-09-19 at 21:29 +0100, Denys Vlasenko wrote:
 
  Are you saying that you successfully run-tested it?
 
 I've only reviewed the code.  Let's resolve these issues first before
 testing the code.

Please test these two patches.
I updated them according to your comments.
--
vda
diff -urpN linux-2.6.23-rc6/drivers/net/bnx2.c linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c
--- linux-2.6.23-rc6/drivers/net/bnx2.c	2007-09-14 00:08:11.0 +0100
+++ linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c	2007-09-20 15:47:06.0 +0100
@@ -52,6 +52,8 @@
 #include bnx2_fw.h
 #include bnx2_fw2.h
 
+#define FW_BUF_SIZE		0x8000
+
 #define DRV_MODULE_NAME		bnx2
 #define PFX DRV_MODULE_NAME	: 
 #define DRV_MODULE_VERSION	1.6.4
@@ -2767,89 +2769,44 @@ bnx2_set_rx_mode(struct net_device *dev)
 	spin_unlock_bh(bp-phy_lock);
 }
 
-#define FW_BUF_SIZE	0x8000
-
+/* To be moved to generic lib/ */
 static int
-bnx2_gunzip_init(struct bnx2 *bp)
+bnx2_gunzip(void *gunzip_buf, unsigned sz, u8 *zbuf, int len, void **outbuf)
 {
-	if ((bp-gunzip_buf = vmalloc(FW_BUF_SIZE)) == NULL)
-		goto gunzip_nomem1;
+	struct z_stream_s *strm;
+	int rc;
 
-	if ((bp-strm = kmalloc(sizeof(*bp-strm), GFP_KERNEL)) == NULL)
-		goto gunzip_nomem2;
+	/* gzip header (1f,8b,08... 10 bytes total + possible asciz filename)
+	 * is stripped */
 
-	bp-strm-workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
-	if (bp-strm-workspace == NULL)
+	rc = -ENOMEM;
+	strm = kmalloc(sizeof(*strm), GFP_KERNEL);
+	if (strm == NULL)
+		goto gunzip_nomem2;
+	strm-workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
+	if (strm-workspace == NULL)
 		goto gunzip_nomem3;
 
-	return 0;
+	strm-next_in = zbuf;
+	strm-avail_in = len;
+	strm-next_out = gunzip_buf;
+	strm-avail_out = sz;
+
+	rc = zlib_inflateInit2(strm, -MAX_WBITS);
+	if (rc == Z_OK) {
+		rc = zlib_inflate(strm, Z_FINISH);
+		if (rc == Z_OK)
+			rc = sz - strm-avail_out;
+		else
+			rc = -EINVAL;
+		zlib_inflateEnd(strm);
+	} else
+		rc = -EINVAL;
 
+	kfree(strm-workspace);
 gunzip_nomem3:
-	kfree(bp-strm);
-	bp-strm = NULL;
-
+	kfree(strm);
 gunzip_nomem2:
-	vfree(bp-gunzip_buf);
-	bp-gunzip_buf = NULL;
-
-gunzip_nomem1:
-	printk(KERN_ERR PFX %s: Cannot allocate firmware buffer for 
-			uncompression.\n, bp-dev-name);
-	return -ENOMEM;
-}
-
-static void
-bnx2_gunzip_end(struct bnx2 *bp)
-{
-	kfree(bp-strm-workspace);
-
-	kfree(bp-strm);
-	bp-strm = NULL;
-
-	if (bp-gunzip_buf) {
-		vfree(bp-gunzip_buf);
-		bp-gunzip_buf = NULL;
-	}
-}
-
-static int
-bnx2_gunzip(struct bnx2 *bp, u8 *zbuf, int len, void **outbuf, int *outlen)
-{
-	int n, rc;
-
-	/* check gzip header */
-	if ((zbuf[0] != 0x1f) || (zbuf[1] != 0x8b) || (zbuf[2] != Z_DEFLATED))
-		return -EINVAL;
-
-	n = 10;
-
-#define FNAME	0x8
-	if (zbuf[3]  FNAME)
-		while ((zbuf[n++] != 0)  (n  len));
-
-	bp-strm-next_in = zbuf + n;
-	bp-strm-avail_in = len - n;
-	bp-strm-next_out = bp-gunzip_buf;
-	bp-strm-avail_out = FW_BUF_SIZE;
-
-	rc = zlib_inflateInit2(bp-strm, -MAX_WBITS);
-	if (rc != Z_OK)
-		return rc;
-
-	rc = zlib_inflate(bp-strm, Z_FINISH);
-
-	*outlen = FW_BUF_SIZE - bp-strm-avail_out;
-	*outbuf = bp-gunzip_buf;
-
-	if ((rc != Z_OK)  (rc != Z_STREAM_END))
-		printk(KERN_ERR PFX %s: Firmware decompression error: %s\n,
-		   bp-dev-name, bp-strm-msg);
-
-	zlib_inflateEnd(bp-strm);
-
-	if (rc == Z_STREAM_END)
-		return 0;
-
 	return rc;
 }
 
@@ -2902,22 +2859,21 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_
 	/* Load the Text area. */
 	offset = cpu_reg-spad_base + (fw-text_addr - cpu_reg-mips_view_base);
 	if (fw-gz_text) {
-		u32 text_len;
-		void *text;
-
-		rc = bnx2_gunzip(bp, fw-gz_text, fw-gz_text_len, text,
- text_len);
-		if (rc)
-			return rc;
-
-		fw-text = text;
-	}
-	if (fw-gz_text) {
+		u32 *text;
 		int j;
 
+		text = vmalloc(FW_BUF_SIZE);
+		if (!text)
+			return -ENOMEM;
+		rc = bnx2_gunzip(text, FW_BUF_SIZE, fw-gz_text, fw-gz_text_len);
+		if (rc  0) {
+			vfree(text);
+			return rc;
+		}
 		for (j = 0; j  (fw-text_len / 4); j++, offset += 4) {
-			REG_WR_IND(bp, offset, cpu_to_le32(fw-text[j]));
+			REG_WR_IND(bp, offset, cpu_to_le32(text[j]));
 	}
+		vfree(text);
 	}
 
 	/* Load the Data area. */
@@ -2979,27 +2935,27 @@ bnx2_init_cpus(struct bnx2 *bp)
 {
 	struct cpu_reg cpu_reg;
 	struct fw_info *fw;
-	int rc = 0;
+	int rc;
 	void *text;
-	u32 text_len;
-
-	if ((rc = bnx2_gunzip_init(bp)) != 0)
-		return rc;
 
 	/* Initialize the RV2P processor. */
-	rc = bnx2_gunzip(bp, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1), text,
-			 text_len);
-	if (rc)
+	text = vmalloc(FW_BUF_SIZE);
+	if (!text)
+		return -ENOMEM;
+	rc = bnx2_gunzip(text, FW_BUF_SIZE, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1));
+	if (rc  0) {
+		vfree(text);
 		goto init_cpu_err;
+	}
+	load_rv2p_fw(bp, text, rc /* == len */, RV2P_PROC1);
 
-	load_rv2p_fw(bp, text, text_len, RV2P_PROC1);
-
-	rc = bnx2_gunzip(bp, bnx2_rv2p_proc2, sizeof(bnx2_rv2p_proc2), text,
-			 text_len);
-	if (rc

[PATCH 2/2] bnx2: move gzip unpacker to zlib

2007-09-21 Thread Denys Vlasenko
On Friday 21 September 2007 12:01, Denys Vlasenko wrote:
 I will move this code
 out of the driver and into zlib in follow-on patch.

No, I won't. I accidentally attached both patches to first email,
you can find it there. Sorry.
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Denys Vlasenko
Hi Jeff,

This patch modifies gzip unpacking code in bnx2 driver so that
it does not depend on bnx2 internals. I will move this code
out of the driver and into zlib in follow-on patch.

It can be useful in other drivers which need to store firmwares
or any other relatively big binary blobs - fonts, cursor bitmaps,
whatever.

Patch is run tested by Michael Chan (driver author).

Michael, can you add your ACK?

Signed-off-by: Denys Vlasenko [EMAIL PROTECTED]
--
vda
diff -urpN linux-2.6.23-rc6/drivers/net/bnx2.c linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c
--- linux-2.6.23-rc6/drivers/net/bnx2.c	2007-09-14 00:08:11.0 +0100
+++ linux-2.6.23-rc6.bnx2/drivers/net/bnx2.c	2007-09-21 11:44:03.0 +0100
@@ -52,6 +52,8 @@
 #include bnx2_fw.h
 #include bnx2_fw2.h
 
+#define FW_BUF_SIZE		0x8000
+
 #define DRV_MODULE_NAME		bnx2
 #define PFX DRV_MODULE_NAME	: 
 #define DRV_MODULE_VERSION	1.6.4
@@ -2767,89 +2769,45 @@ bnx2_set_rx_mode(struct net_device *dev)
 	spin_unlock_bh(bp-phy_lock);
 }
 
-#define FW_BUF_SIZE	0x8000
-
+/* To be moved to generic lib/ */
 static int
-bnx2_gunzip_init(struct bnx2 *bp)
+bnx2_gunzip(void *gunzip_buf, unsigned sz, u8 *zbuf, int len)
 {
-	if ((bp-gunzip_buf = vmalloc(FW_BUF_SIZE)) == NULL)
-		goto gunzip_nomem1;
+	struct z_stream_s *strm;
+	int rc;
 
-	if ((bp-strm = kmalloc(sizeof(*bp-strm), GFP_KERNEL)) == NULL)
-		goto gunzip_nomem2;
+	/* gzip header (1f,8b,08... 10 bytes total + possible asciz filename)
+	 * is stripped */
 
-	bp-strm-workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
-	if (bp-strm-workspace == NULL)
+	rc = -ENOMEM;
+	strm = kmalloc(sizeof(*strm), GFP_KERNEL);
+	if (strm == NULL)
+		goto gunzip_nomem2;
+	strm-workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
+	if (strm-workspace == NULL)
 		goto gunzip_nomem3;
 
-	return 0;
+	strm-next_in = zbuf;
+	strm-avail_in = len;
+	strm-next_out = gunzip_buf;
+	strm-avail_out = sz;
+
+	rc = zlib_inflateInit2(strm, -MAX_WBITS);
+	if (rc == Z_OK) {
+		rc = zlib_inflate(strm, Z_FINISH);
+		/* after Z_FINISH, only Z_STREAM_END is we unpacked it all */
+		if (rc == Z_STREAM_END)
+			rc = sz - strm-avail_out;
+		else
+			rc = -EINVAL;
+		zlib_inflateEnd(strm);
+	} else
+		rc = -EINVAL;
 
+	kfree(strm-workspace);
 gunzip_nomem3:
-	kfree(bp-strm);
-	bp-strm = NULL;
-
+	kfree(strm);
 gunzip_nomem2:
-	vfree(bp-gunzip_buf);
-	bp-gunzip_buf = NULL;
-
-gunzip_nomem1:
-	printk(KERN_ERR PFX %s: Cannot allocate firmware buffer for 
-			uncompression.\n, bp-dev-name);
-	return -ENOMEM;
-}
-
-static void
-bnx2_gunzip_end(struct bnx2 *bp)
-{
-	kfree(bp-strm-workspace);
-
-	kfree(bp-strm);
-	bp-strm = NULL;
-
-	if (bp-gunzip_buf) {
-		vfree(bp-gunzip_buf);
-		bp-gunzip_buf = NULL;
-	}
-}
-
-static int
-bnx2_gunzip(struct bnx2 *bp, u8 *zbuf, int len, void **outbuf, int *outlen)
-{
-	int n, rc;
-
-	/* check gzip header */
-	if ((zbuf[0] != 0x1f) || (zbuf[1] != 0x8b) || (zbuf[2] != Z_DEFLATED))
-		return -EINVAL;
-
-	n = 10;
-
-#define FNAME	0x8
-	if (zbuf[3]  FNAME)
-		while ((zbuf[n++] != 0)  (n  len));
-
-	bp-strm-next_in = zbuf + n;
-	bp-strm-avail_in = len - n;
-	bp-strm-next_out = bp-gunzip_buf;
-	bp-strm-avail_out = FW_BUF_SIZE;
-
-	rc = zlib_inflateInit2(bp-strm, -MAX_WBITS);
-	if (rc != Z_OK)
-		return rc;
-
-	rc = zlib_inflate(bp-strm, Z_FINISH);
-
-	*outlen = FW_BUF_SIZE - bp-strm-avail_out;
-	*outbuf = bp-gunzip_buf;
-
-	if ((rc != Z_OK)  (rc != Z_STREAM_END))
-		printk(KERN_ERR PFX %s: Firmware decompression error: %s\n,
-		   bp-dev-name, bp-strm-msg);
-
-	zlib_inflateEnd(bp-strm);
-
-	if (rc == Z_STREAM_END)
-		return 0;
-
 	return rc;
 }
 
@@ -2902,22 +2860,21 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_
 	/* Load the Text area. */
 	offset = cpu_reg-spad_base + (fw-text_addr - cpu_reg-mips_view_base);
 	if (fw-gz_text) {
-		u32 text_len;
-		void *text;
-
-		rc = bnx2_gunzip(bp, fw-gz_text, fw-gz_text_len, text,
- text_len);
-		if (rc)
-			return rc;
-
-		fw-text = text;
-	}
-	if (fw-gz_text) {
+		u32 *text;
 		int j;
 
+		text = vmalloc(FW_BUF_SIZE);
+		if (!text)
+			return -ENOMEM;
+		rc = bnx2_gunzip(text, FW_BUF_SIZE, fw-gz_text, fw-gz_text_len);
+		if (rc  0) {
+			vfree(text);
+			return rc;
+		}
 		for (j = 0; j  (fw-text_len / 4); j++, offset += 4) {
-			REG_WR_IND(bp, offset, cpu_to_le32(fw-text[j]));
+			REG_WR_IND(bp, offset, cpu_to_le32(text[j]));
 	}
+		vfree(text);
 	}
 
 	/* Load the Data area. */
@@ -2979,27 +2936,27 @@ bnx2_init_cpus(struct bnx2 *bp)
 {
 	struct cpu_reg cpu_reg;
 	struct fw_info *fw;
-	int rc = 0;
+	int rc;
 	void *text;
-	u32 text_len;
-
-	if ((rc = bnx2_gunzip_init(bp)) != 0)
-		return rc;
 
 	/* Initialize the RV2P processor. */
-	rc = bnx2_gunzip(bp, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1), text,
-			 text_len);
-	if (rc)
+	text = vmalloc(FW_BUF_SIZE);
+	if (!text)
+		return -ENOMEM;
+	rc = bnx2_gunzip(text, FW_BUF_SIZE, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1));
+	if (rc  0) {
+		vfree(text);
 		goto init_cpu_err;
+	}
+	load_rv2p_fw(bp, text

Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Denys Vlasenko
On Friday 21 September 2007 17:14, David Miller wrote:
 From: Denys Vlasenko [EMAIL PROTECTED]
 Date: Fri, 21 Sep 2007 12:01:24 +0100
 
  Hi Jeff,
 
 BNX2 and TG3 patches goes through Michael Chan and myself,
 and I usually merge them in instead of Jeff.

Didn't know that, sorry.

Do patches look ok to you?
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Denys Vlasenko
On Friday 21 September 2007 18:49, David Miller wrote:
 From: Denys Vlasenko [EMAIL PROTECTED]
 Date: Fri, 21 Sep 2007 18:03:55 +0100
 
  Do patches look ok to you?
 
 I'm travelling so I haven't looked closely yet :-)
 
 Michael can take a look and I'll try to do so as well
 tonight.

Good.

I plan to use gzip compression on following drivers' firmware,
if patches will be accepted:

   textdata bss dec hex filename
  17653  109968 240  127861   1f375 drivers/net/acenic.o
   6628  120448   4  127080   1f068 drivers/net/dgrs.o
 ^^

--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Denys Vlasenko
On Friday 21 September 2007 19:36, [EMAIL PROTECTED] wrote:
 On Fri, 21 Sep 2007 19:05:23 BST, Denys Vlasenko said:
 
  I plan to use gzip compression on following drivers' firmware,
  if patches will be accepted:
  
 textdata bss dec hex filename
17653  109968 240  127861   1f375 drivers/net/acenic.o
 6628  120448   4  127080   1f068 drivers/net/dgrs.o
   ^^
 
 Should this be redone to use the existing firmware loading framework to
 load the firmware instead?

Not in every case.

For example, bnx2 maintainer says that driver and
firmware are closely tied for his driver. IOW: you upgrade kernel
and your NIC is not working anymore.

Another argument is to make kernel be able to bring up NICs
without needing firmware images in initramfs/initrd/hard drive.
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Denys Vlasenko
On Friday 21 September 2007 21:13, Andi Kleen wrote:
 Denys Vlasenko [EMAIL PROTECTED] writes:
  
  I plan to use gzip compression on following drivers' firmware,
  if patches will be accepted:
  
 textdata bss dec hex filename
17653  109968 240  127861   1f375 drivers/net/acenic.o
 6628  120448   4  127080   1f068 drivers/net/dgrs.o
   ^^
 
 Just change the makefiles to always install gzip'ed modules
 modutils knows how to unzip them on the fly.

But I compile net/* into bzImage. I like netbooting :)
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] bnx2: factor out gzip unpacker

2007-09-21 Thread Denys Vlasenko
On Friday 21 September 2007 20:33, Krzysztof Oledzki wrote:
 
 On Fri, 21 Sep 2007, Denys Vlasenko wrote:
 
  On Friday 21 September 2007 19:36, [EMAIL PROTECTED] wrote:
  On Fri, 21 Sep 2007 19:05:23 BST, Denys Vlasenko said:
 
  I plan to use gzip compression on following drivers' firmware,
  if patches will be accepted:
 
 textdata bss dec hex filename
17653  109968 240  127861   1f375 drivers/net/acenic.o
 6628  120448   4  127080   1f068 drivers/net/dgrs.o
   ^^
 
  Should this be redone to use the existing firmware loading framework to
  load the firmware instead?
 
  Not in every case.
 
  For example, bnx2 maintainer says that driver and
  firmware are closely tied for his driver. IOW: you upgrade kernel
  and your NIC is not working anymore.

 Firmware may come with a kernel. We have a install modules, we can also 
 add install firmware.

Install where? I boot my machine over NFS, and it has no hard drive.

  Another argument is to make kernel be able to bring up NICs
  without needing firmware images in initramfs/initrd/hard drive.
 
 It is not possible to bring up things like FC or WiFi without firmware, 
 what special is in classic NICs?

Nothing.

It is just not (yet?) decreed from The Very Top that all and every
firmware image should be loaded using request_firmware().

Also people may want to gzip something else than firmware.
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: net-2.6.24 breaks powerpc mysteriously

2007-10-13 Thread Denys Vlasenko
On Friday 12 October 2007 03:45, David Miller wrote:
 From: Andrew Morton [EMAIL PROTECTED]
 Date: Thu, 11 Oct 2007 19:22:33 -0700
 
  With net-2.6.24 (pulled yesterday) applied:
  
  g5:/usr/src/25 ml arch/powerpc/boot/inflate.o
Using ARCH=powerpc CROSS_COMPILE=
CHK include/linux/version.h
CHK include/linux/utsrelease.h
CALLscripts/checksyscalls.sh
BOOTCC  arch/powerpc/boot/inflate.o
  arch/powerpc/boot/inflate.c:920:19: errno.h: No such file or directory
  arch/powerpc/boot/inflate.c:921:18: slab.h: No such file or directory
  arch/powerpc/boot/inflate.c:922:21: vmalloc.h: No such file or directory
  arch/powerpc/boot/inflate.c: In function `zlib_inflate_blob':
  arch/powerpc/boot/inflate.c:928: error: syntax error before '*' token
 
 The only thing we touched in zlib is in the patch below.
 
 I suspect the lib/zlib_inflate/inflate.c changes, I had no idea that
 some pieces of code try to use this into userspace.
 
 I supposed a hacky fix is to add __KERNEL__ ifdef protection around
 zlib_inflate_blob() and those troublesome includes.  A nicer fix is
 probably to change the zlib_inflate_blob() interface to pass in
 pointers to alloc() and free() routines instead of calling kernel ones
 directly.
 
 Denys?
 
 commit 8336793baf962163c9fab5a3f39614295fdbab27
 Author: Denys Vlasenko [EMAIL PROTECTED]
 Date:   Sun Sep 30 17:56:49 2007 -0700
 
 [ZLIB]: Move bnx2 driver gzip unpacker into zlib.
 
 Signed-off-by: Denys Vlasenko [EMAIL PROTECTED]
 Acked-by: Michael Chan [EMAIL PROTECTED]
 Signed-off-by: David S. Miller [EMAIL PROTECTED]

Please find updated patch in attachment.
Compile-tested.
--
vda
--- a/drivers/net/bnx2.c
+++ b/drivers/net/bnx2.c
@@ -2761,48 +2761,6 @@ bnx2_set_rx_mode(struct net_device *dev)
 	spin_unlock_bh(bp-phy_lock);
 }
 
-/* To be moved to generic lib/ */
-static int
-bnx2_gunzip(void *gunzip_buf, unsigned sz, u8 *zbuf, int len)
-{
-	struct z_stream_s *strm;
-	int rc;
-
-	/* gzip header (1f,8b,08... 10 bytes total + possible asciz filename)
-	 * is stripped */
-
-	rc = -ENOMEM;
-	strm = kmalloc(sizeof(*strm), GFP_KERNEL);
-	if (strm == NULL)
-		goto gunzip_nomem2;
-	strm-workspace = kmalloc(zlib_inflate_workspacesize(), GFP_KERNEL);
-	if (strm-workspace == NULL)
-		goto gunzip_nomem3;
-
-	strm-next_in = zbuf;
-	strm-avail_in = len;
-	strm-next_out = gunzip_buf;
-	strm-avail_out = sz;
-
-	rc = zlib_inflateInit2(strm, -MAX_WBITS);
-	if (rc == Z_OK) {
-		rc = zlib_inflate(strm, Z_FINISH);
-		/* after Z_FINISH, only Z_STREAM_END is we unpacked it all */
-		if (rc == Z_STREAM_END)
-			rc = sz - strm-avail_out;
-		else
-			rc = -EINVAL;
-		zlib_inflateEnd(strm);
-	} else
-		rc = -EINVAL;
-
-	kfree(strm-workspace);
-gunzip_nomem3:
-	kfree(strm);
-gunzip_nomem2:
-	return rc;
-}
-
 static void
 load_rv2p_fw(struct bnx2 *bp, u32 *rv2p_code, u32 rv2p_code_len,
 	u32 rv2p_proc)
@@ -2858,7 +2816,7 @@ load_cpu_fw(struct bnx2 *bp, struct cpu_reg *cpu_reg, struct fw_info *fw)
 		text = vmalloc(FW_BUF_SIZE);
 		if (!text)
 			return -ENOMEM;
-		rc = bnx2_gunzip(text, FW_BUF_SIZE, fw-gz_text, fw-gz_text_len);
+		rc = zlib_inflate_blob(text, FW_BUF_SIZE, fw-gz_text, fw-gz_text_len);
 		if (rc  0) {
 			vfree(text);
 			return rc;
@@ -2935,14 +2893,14 @@ bnx2_init_cpus(struct bnx2 *bp)
 	text = vmalloc(FW_BUF_SIZE);
 	if (!text)
 		return -ENOMEM;
-	rc = bnx2_gunzip(text, FW_BUF_SIZE, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1));
+	rc = zlib_inflate_blob(text, FW_BUF_SIZE, bnx2_rv2p_proc1, sizeof(bnx2_rv2p_proc1));
 	if (rc  0) {
 		vfree(text);
 		goto init_cpu_err;
 	}
 	load_rv2p_fw(bp, text, rc /* == len */, RV2P_PROC1);
 
-	rc = bnx2_gunzip(text, FW_BUF_SIZE, bnx2_rv2p_proc2, sizeof(bnx2_rv2p_proc2));
+	rc = zlib_inflate_blob(text, FW_BUF_SIZE, bnx2_rv2p_proc2, sizeof(bnx2_rv2p_proc2));
 	if (rc  0) {
 		vfree(text);
 		goto init_cpu_err;
--- a/include/linux/zlib.h
+++ b/include/linux/zlib.h
@@ -82,7 +82,7 @@
 struct internal_state;
 
 typedef struct z_stream_s {
-Byte*next_in;   /* next input byte */
+const Byte *next_in;   /* next input byte */
 uInt avail_in;  /* number of bytes available at next_in */
 uLongtotal_in;  /* total nb of input bytes read so far */
 
@@ -699,4 +699,8 @@ extern int zlib_inflateInit2(z_streamp strm, int  windowBits);
 struct internal_state {int dummy;}; /* hack for buggy compilers */
 #endif
 
+/* Utility function: initialize zlib, unpack binary blob, clean up zlib,
+ * return len or negative error code. */
+extern int zlib_inflate_blob(void *dst, unsigned dst_sz, const void *src, unsigned src_sz);
+
 #endif /* _ZLIB_H */
--- a/lib/zlib_inflate/inffast.c
+++ b/lib/zlib_inflate/inffast.c
@@ -69,22 +69,22 @@
 void inflate_fast(z_streamp strm, unsigned start)
 {
 struct inflate_state *state;
-unsigned char *in;  /* local strm-next_in */
-unsigned char *last;/* while in  last, enough input available */
-unsigned char *out; /* local strm-next_out */
-unsigned char

Re: net-2.6.24 breaks powerpc mysteriously

2007-10-13 Thread Denys Vlasenko
On Friday 12 October 2007 05:09, Paul Mackerras wrote:
  I supposed a hacky fix is to add __KERNEL__ ifdef protection around
  zlib_inflate_blob() and those troublesome includes.  A nicer fix is
 
 That would do, but I don't see why zlib_inflate_blob had to be added
 to inflate.c rather than being in a new file under lib/zlib_inflate.

Done.

 If nothing else, that bloats configs that currently use inflate.c (in
 the kernel) but don't need zlib_inflate_blob() - which would be all
 the existing uses. :)

I did it.

However, in general I prefer to have a better linker, one which
is capable of eliminate unused fuctions/data on the per-object basis,
not per .o file.

Patches to implement that are sent to Sam Ravnborg.
Hopefully they will make it to mainline.
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] New Kernel Bugs

2007-11-13 Thread Denys Vlasenko
On Tuesday 13 November 2007 07:08, Mark Lord wrote:
 Ingo Molnar wrote:
 ..

  This is all QA-101 that _cannot be argued against on a rational basis_,
  it's just that these sorts of things have been largely ignored for
  years, in favor of the all-too-easy open source means many eyeballs and
  that is our QA answer, which is a _good_ answer but by far not the most
  intelligent answer! Today many eyeballs is simply not good enough and
  nature (and other OS projects) will route us around if we dont change.

 ..

 QA-101 and many eyeballs are not at all in opposition.
 The latter is how we find out about bugs on uncommon hardware,
 and the former is what we need to track them and overall quality.

 A HUGE problem I have with current efforts, is that once someone
 reports a bug, the onus seems to be 99% on the *reporter* to find
 the exact line of code or commit.  Ghad what a repressive method.

This is the only method that scales.

Developer has only 24 hours in each day, and sometimes he needs to eat,
sleep, and maybe even pay attention to e.g. his kids.

But bug reporters are much more numerous and they have more
hours in one day combined.

BUT - it means that developers should try to increase user base,
not scare users away.

 And if the developer who broke the damn thing, or who at least
 claims to be supporting that code, cannot reproduce the bug,
 they drop it completely.

Developer should let reporter know that reporter needs to help
a bit here. Sometimes a bit of hand holding is needed, but it
pays off because you breed more qualified testers/bug reporters.

 Contrast that flawed approach with how Linus does things..
 he thinks through the symptoms, matches them to the code,
 and figures out what the few possibilities might be,
 and feeds back some trial balloon patches for the bug reporter to try.

 MUCH better.

 And remember, *I'm* an old-time Linux kernel developer.. just think about
 the people reporting bugs who haven't been around here since 1992..

Yes. Developers should not grow more and more unhelpful
and arrogant towards their users just because inexperienced
users send incomplete/poorly written bug reports.
They need to provide help, not humiliate/ignore.

I think we agree here.
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] New Kernel Bugs

2007-11-13 Thread Denys Vlasenko
On Tuesday 13 November 2007 10:56, Adrian Bunk wrote:
 On Tue, Nov 13, 2007 at 12:13:56PM -0500, Theodore Tso wrote:
  On Tue, Nov 13, 2007 at 04:52:32PM +0100, Benoit Boissinot wrote:
   Btw, I used to test every -mm kernel. But since I've switched distros
   (gentoo-ubuntu)
   and I have less time, I feel it's harder to test -rc or -mm kernels (I
   know this isn't a lkml problem
   but more a distro problem, but I would love having an ubuntu blessed
   repo with current dev kernel
   for the latest stable ubuntu release).
 
  There are two parts to this.  One is a Ubuntu development kernel which
  we can give to large numbers of people to expand our testing pool.
  But if we don't do a better job of responding to bug reports that
  would be generated by expanded testing this won't necessarily help us.
 ...

 The main problem aren't missing testers [1] - we already have relatively
 experienced people testing kernels and/or reporting bugs, and we slowly
 scare them away due to the many bug reports without any reaction.

 The main problem is finding experienced developers who spend time on
 looking into bug reports.

 Getting many relatively unexperienced users (who need more guidance for
 debugging issues) as additional testers is therefore IMHO not
 necessarily a good idea.

And where experienced developrs are coming from?
They are not born with Linux kernel skills.
They grow up from within user base.

Bigger user base - more developers (eventually)
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] New Kernel Bugs

2007-11-13 Thread Denys Vlasenko
On Tuesday 13 November 2007 11:57, Gabriel C wrote:
  The main problem is finding experienced developers who spend time on
  looking into bug reports.

 There are already. IMO the problem is the development model.

 There are tons new features in each new kernel release and 'tons new bugs'
 which are not fixed during the release cycle nor in the .XX stable kernels.

 Maybe after XX kernel releases there should be one just with bug-fixes
 _without_ any new features , eg: cleaning bugs from bugzilla , know
 regressions , cleaning up code , removing broken drivers and the like.

Won't work. You cannot force people to work on things they don't
find interesting, long-term.
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] New Kernel Bugs

2007-11-13 Thread Denys Vlasenko
On Wednesday 14 November 2007 00:27, Adrian Bunk wrote:
 You missed the following in my email:
 we slowly scare them away due to the many bug reports without any
  reaction.

 The problem is that bug reports take time. If you go away from easy
 things like compile errors then even things like describing what does
 no longer work, ideally producing a scenario where you can reproduce it
 and verifying whether it was present in previous kernels can easily take
 many hours that are spent before the initial bug report.

 If the bug report then gets ignored we discourage the person who sent
 the bug report to do any work related to the kernel again.

Cannot agree more. I am in a similar position right now.
My patch to aic7xxx driver was ubmitted four times
with not much reaction from scsi guys.

Finally they replied and asked to rediff it against their
git tree. I did that and sent patches back. No reply since then.

And mind you, the patch is not trying to do anything
complex, it mostly moves code around, removes 'inline',
adds 'const'. What should I think about it?
--
vda
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] net/ipv4/netfilter/ip_tables.c: remove some inlines

2007-12-16 Thread Denys Vlasenko
Hi Patrick, Harald,

I was working on unrelated problem and noticed that ip_tables.c
seem to abuse inline. I prepared a patch which removes inlines
except those which are used by packet matching code
(and thus are really performance-critical).
I added comments explaining that remaining inlines are
performance critical.

Result as reported by size:

   textdata bss dec hex filename
-  6451 380  8869191b07 ip_tables.o
+  6339 348  7267591a67 ip_tables.o

Please take this patch into netfilter queue.

Signed-off-by: Denys Vlasenko [EMAIL PROTECTED]
--
vda
diff -urpN linux-2.6.org/net/ipv4/netfilter/ip_tables.c linux-2.6.ipt/net/ipv4/netfilter/ip_tables.c
--- linux-2.6.org/net/ipv4/netfilter/ip_tables.c	2007-12-14 10:46:37.0 -0800
+++ linux-2.6.ipt/net/ipv4/netfilter/ip_tables.c	2007-12-16 12:37:46.0 -0800
@@ -74,6 +74,7 @@ do {\
Hence the start of any table is given by get_table() below.  */
 
 /* Returns whether matches rule or not. */
+/* Performance critical - called for every packet */
 static inline int
 ip_packet_match(const struct iphdr *ip,
 		const char *indev,
@@ -152,7 +153,7 @@ ip_packet_match(const struct iphdr *ip,
 	return 1;
 }
 
-static inline bool
+static bool
 ip_checkentry(const struct ipt_ip *ip)
 {
 	if (ip-flags  ~IPT_F_MASK) {
@@ -182,6 +183,7 @@ ipt_error(struct sk_buff *skb,
 	return NF_DROP;
 }
 
+/* Performance critical - called for every packet */
 static inline
 bool do_match(struct ipt_entry_match *m,
 	  const struct sk_buff *skb,
@@ -198,6 +200,7 @@ bool do_match(struct ipt_entry_match *m,
 		return false;
 }
 
+/* Performance critical */
 static inline struct ipt_entry *
 get_entry(void *base, unsigned int offset)
 {
@@ -205,6 +208,7 @@ get_entry(void *base, unsigned int offse
 }
 
 /* All zeroes == unconditional rule. */
+/* Mildly perf critical (only if packet tracing is on) */
 static inline int
 unconditional(const struct ipt_ip *ip)
 {
@@ -219,7 +223,7 @@ unconditional(const struct ipt_ip *ip)
 
 #if defined(CONFIG_NETFILTER_XT_TARGET_TRACE) || \
 defined(CONFIG_NETFILTER_XT_TARGET_TRACE_MODULE)
-static const char *hooknames[] = {
+static const char *const hooknames[] = {
 	[NF_IP_PRE_ROUTING]		= PREROUTING,
 	[NF_IP_LOCAL_IN]		= INPUT,
 	[NF_IP_FORWARD]			= FORWARD,
@@ -233,7 +237,7 @@ enum nf_ip_trace_comments {
 	NF_IP_TRACE_COMMENT_POLICY,
 };
 
-static const char *comments[] = {
+static const char *const comments[] = {
 	[NF_IP_TRACE_COMMENT_RULE]	= rule,
 	[NF_IP_TRACE_COMMENT_RETURN]	= return,
 	[NF_IP_TRACE_COMMENT_POLICY]	= policy,
@@ -249,6 +253,7 @@ static struct nf_loginfo trace_loginfo =
 	},
 };
 
+/* Mildly perf critical (only if packet tracing is on) */
 static inline int
 get_chainname_rulenum(struct ipt_entry *s, struct ipt_entry *e,
 		  char *hookname, char **chainname,
@@ -567,7 +572,7 @@ mark_source_chains(struct xt_table_info 
 	return 1;
 }
 
-static inline int
+static int
 cleanup_match(struct ipt_entry_match *m, unsigned int *i)
 {
 	if (i  (*i)-- == 0)
@@ -579,7 +584,7 @@ cleanup_match(struct ipt_entry_match *m,
 	return 0;
 }
 
-static inline int
+static int
 check_entry(struct ipt_entry *e, const char *name)
 {
 	struct ipt_entry_target *t;
@@ -599,7 +604,7 @@ check_entry(struct ipt_entry *e, const c
 	return 0;
 }
 
-static inline int check_match(struct ipt_entry_match *m, const char *name,
+static int check_match(struct ipt_entry_match *m, const char *name,
 const struct ipt_ip *ip, unsigned int hookmask,
 unsigned int *i)
 {
@@ -622,7 +627,7 @@ static inline int check_match(struct ipt
 	return ret;
 }
 
-static inline int
+static int
 find_check_match(struct ipt_entry_match *m,
 	const char *name,
 	const struct ipt_ip *ip,
@@ -651,7 +656,7 @@ err:
 	return ret;
 }
 
-static inline int check_target(struct ipt_entry *e, const char *name)
+static int check_target(struct ipt_entry *e, const char *name)
 {
 	struct ipt_entry_target *t;
 	struct xt_target *target;
@@ -672,7 +677,7 @@ static inline int check_target(struct ip
 	return ret;
 }
 
-static inline int
+static int
 find_check_entry(struct ipt_entry *e, const char *name, unsigned int size,
 	unsigned int *i)
 {
@@ -716,7 +721,7 @@ find_check_entry(struct ipt_entry *e, co
 	return ret;
 }
 
-static inline int
+static int
 check_entry_size_and_hooks(struct ipt_entry *e,
 			   struct xt_table_info *newinfo,
 			   unsigned char *base,
@@ -759,7 +764,7 @@ check_entry_size_and_hooks(struct ipt_en
 	return 0;
 }
 
-static inline int
+static int
 cleanup_entry(struct ipt_entry *e, unsigned int *i)
 {
 	struct ipt_entry_target *t;
@@ -1293,7 +1298,7 @@ __do_replace(const char *name, unsigned 
 	get_counters(oldinfo, counters);
 	/* Decrease module usage counts and free resource */
 	loc_cpu_old_entry = oldinfo-entries[raw_smp_processor_id()];
-	IPT_ENTRY_ITERATE(loc_cpu_old_entry, oldinfo-size, cleanup_entry,NULL);
+	IPT_ENTRY_ITERATE(loc_cpu_old_entry, oldinfo-size, cleanup_entry, NULL

Re: [GIT] Networking

2015-04-29 Thread Denys Vlasenko
On Wed, Apr 1, 2015 at 9:48 PM, David Miller da...@davemloft.net wrote:
 D.S. Ljungmark (1):
   ipv6: Don't reduce hop limit for an interface

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=6fd99094de2b83d1d4c8457f2c83483b2828e75a

I was testing this change and apparently it doesn't close the hole.

The python script I use to send RAs:

#!/usr/bin/env python
import sys
import time
import scapy.all
from scapy.layers.inet6 import *
ip = IPv6()
# ip.dst = 'ff02::1'
ip.dst = sys.argv[1]
icmp = ICMPv6ND_RA()
icmp.chlim = 1
for x in range(10):
send(ip/icmp)
time.sleep(1)

# ./ipv6-hop-limit.py fe80::21e:37ff:fed0:5006
.
Sent 1 packets.
...10 times...
Sent 1 packets.

After I do this, on the targeted machine I check hop_limits:

# for f in /proc/sys/net/ipv6/conf/*/hop_limit; do echo -n $f:; cat $f; done
/proc/sys/net/ipv6/conf/all/hop_limit:64
/proc/sys/net/ipv6/conf/default/hop_limit:64
/proc/sys/net/ipv6/conf/enp0s25/hop_limit:1  === THIS
/proc/sys/net/ipv6/conf/lo/hop_limit:64
/proc/sys/net/ipv6/conf/wlp3s0/hop_limit:64

As you see, the interface which received RAs still lowered
its hop_limit to 1. I take it means that the bug is still present
(right? I'm not a network guy...).

I triple-checked that I do run the kernel with the fix.
Further investigation shows that the code touched by the fix
is not even reached, hop_limit is changed elsewhere.

I'm willing to test additional patches.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netns: deinline net_generic()

2015-04-16 Thread Denys Vlasenko
Hi David, Eric,

As you may have surmised, this patch wasn't a result of me looking
at networking code; rather, it is a result of semi-automated search
for huge inlines.

The last step of this process would be the submission of patches.
I am expecting a range of responses from maintainers: enthusiastic,
no reply, go away, I don't care about code size,
and everything in between.

I can't invest a large amount of effort trying to push
every deinlining patch through. If someone, for example, would
flat out refuse to fix a huge inline in his driver, so be it.

I will be happy to run at least a few iterations over a patch
if maintainers do see a merit in deinlining, but have some
reservations wrt performance.

Your response falls into the latter case (you aren't refusing the patch
outright, but want it to be changed in some way).

It would help if you tell me how I should change the patches.

(I also hope to have a semi-generic way of addressing
performance concerns for future deinlining
patch submissions.)


On 04/14/2015 08:37 PM, David Miller wrote:
 From: Denys Vlasenko dvlas...@redhat.com
 Date: Tue, 14 Apr 2015 14:25:11 +0200
 
 On x86 allyesconfig build:
 The function compiles to 130 bytes of machine code.
 It has 493 callsites.
 Total reduction of vmlinux size: 27906 bytes.

text  data  bss   dec hex filename
 82447071 22255384 20627456 125329911 77861f7 vmlinux4
 82419165 22255384 20627456 125302005 777f4f5 vmlinux5

 Signed-off-by: Denys Vlasenko dvlas...@redhat.com
 
 That BUG_ON() was added 7 years ago, and I don't remember it ever
 triggering or helping us diagnose something, so just remove it and
 keep the function inlined.

There are two BUG_ONs. I'll remove both of them in v2.
Let me know if you want something else.

However, without BUG_ONs, function is still a bit big
on PREEMPT configs.

Among almost 500 users, many probably aren't hot paths.

How about having an inlined version, say, fast_net_generic(),
and a deinlined one, and using one or another as appropriate.
Is this ok with you?


On 04/14/2015 03:19 PM, Eric Dumazet wrote:
 On Tue, 2015-04-14 at 14:25 +0200, Denys Vlasenko wrote:
 On x86 allyesconfig build:
 The function compiles to 130 bytes of machine code.
 It has 493 callsites.
 Total reduction of vmlinux size: 27906 bytes.

text   data  bss   dec hex filename
 82447071 22255384 20627456 125329911 77861f7 vmlinux4
 82419165 22255384 20627456 125302005 777f4f5 vmlinux5

 This sounds a big hammer to me.

 These savings probably comes from the BUG_ON() that could simply be
 removed.
 The second one for sure has no purpose. First one looks defensive.

 For a typical (non allyesconfig) kernel, net_generic() would translate
 to :

 return net-gen[id - 1]

 Tunnels need this in fast path, so I presume we could introduce
 net_generic_rcu() to keep this stuff inlined where it matters.

 static inline void *net_generic_rcu(const struct net *net, int id)
 {
   struct net_generic *ng = rcu_dereference(net-gen);

   return ng-ptr[id - 1];
 }

I looked at the code and I'm not feeling confident enough
to find places where replacing net_generic() with
net_generic_rcu() is okay.

Would it be okay if I add net_generic_rcu() as you suggest,
but leave it to you (network people) to switch to it where
appropriate?

-- 
vda
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netns: deinline net_generic()

2015-04-17 Thread Denys Vlasenko
On 04/17/2015 07:42 PM, Eric Dumazet wrote:
 On Fri, 2015-04-17 at 19:05 +0200, Denys Vlasenko wrote:
 How do you expect one to find excessively large inlines,
 if not on allyesconfig build?
 
 Tuning kernel sources based on allyesconfig build _size_ only is
 terrible. We could build an interpreter based kernel and maybe reduce
 its size by 50% who knows...
 
 You are saying that all inline should be removed, since it is obvious
 kernel size _will_ be smaller.

I am not saying that. That would be stupid.


--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] netns: remove BUG_ONs from net_generic()

2015-04-17 Thread Denys Vlasenko
This inline has ~500 callsites.

On 04/14/2015 08:37 PM, David Miller wrote:
 That BUG_ON() was added 7 years ago, and I don't remember it ever
 triggering or helping us diagnose something, so just remove it and
 keep the function inlined.

On x86 allyesconfig build:

text data  bss   dec hex filename
82447071 22255384 20627456 125329911 77861f7 vmlinux4
82441375 22255384 20627456 125324215 7784bb7 vmlinux5prime

Signed-off-by: Denys Vlasenko dvlas...@redhat.com
CC: Eric W. Biederman ebied...@xmission.com
CC: David S. Miller da...@davemloft.net
CC: Jan Engelhardt jeng...@medozas.de
CC: Jiri Pirko jpi...@redhat.com
CC: linux-ker...@vger.kernel.org
CC: netdev@vger.kernel.org
---
 include/net/netns/generic.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/net/netns/generic.h b/include/net/netns/generic.h
index 0931618..70e1585 100644
--- a/include/net/netns/generic.h
+++ b/include/net/netns/generic.h
@@ -38,11 +38,9 @@ static inline void *net_generic(const struct net *net, int 
id)
 
rcu_read_lock();
ng = rcu_dereference(net-gen);
-   BUG_ON(id == 0 || id  ng-len);
ptr = ng-ptr[id - 1];
rcu_read_unlock();
 
-   BUG_ON(!ptr);
return ptr;
 }
 #endif
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netns: deinline net_generic()

2015-04-17 Thread Denys Vlasenko
On 04/16/2015 02:38 PM, Eric Dumazet wrote:
 On Thu, 2015-04-16 at 13:14 +0200, Denys Vlasenko wrote:
 
 However, without BUG_ONs, function is still a bit big
 on PREEMPT configs.
 
 Only on allyesconfig builds, that nobody use but to prove some points
 about code size.

How do you expect one to find excessively large inlines,
if not on allyesconfig build?

Only by using allyesconfig, I can measure how many calls
are there in the kernel. (grepping source is utterly unreliable
due to nested inlines and macros).

For the record: I am not using the _full_ allyesconfig,
I do disable some debugging options which clearly aren't
ever enabled on production systems. E.g. in my config:

# CONFIG_DEBUG_KMEMLEAK_TEST is not set
# CONFIG_KASAN is not set

etc.

 If you look at net_generic(), it is mostly used from code that is
 normally in 3 modules. How many people really load them ?
 
 net/tipc : 91 call sites
 net/sunrpc : 57
 fs/nfsd  fs/lockd : 183
 
 Then few remaining uses in tunnels.

Grepping is far from reliable. The above missed more than half
of all calls. I disassembed vmlinux after deinlining, there are
nearly 500 calls of net_generic().

 As we suggested, please just remove the BUG_ON().

Going to send the patch in a minute.
-- 
vda

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] 3c59x: Fix shared IRQ handling

2015-07-07 Thread Denys Vlasenko
As its first order of business, boomerang_interrupt() checks whether
the device really has any pending interrupts. If it does not,
it does nothing and returns, but it still returns IRQ_HANDLED.

This is wrong: interrupt was not handled, IRQ handlers of other
devices sharing this IRQ line need to be called.

vortex_interrupt() has it right: it returns IRQ_NONE in this case
via IRQ_RETVAL(0).

Do the same in boomerang_interrupt().

Signed-off-by: Denys Vlasenko dvlas...@redhat.com
CC: David S. Miller da...@davemloft.net
CC: linux-ker...@vger.kernel.org
CC: netdev@vger.kernel.org
---
 drivers/net/ethernet/3com/3c59x.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/3com/3c59x.c 
b/drivers/net/ethernet/3com/3c59x.c
index 41095eb..c11d6fc 100644
--- a/drivers/net/ethernet/3com/3c59x.c
+++ b/drivers/net/ethernet/3com/3c59x.c
@@ -2382,6 +2384,7 @@ boomerang_interrupt(int irq, void *dev_id)
void __iomem *ioaddr;
int status;
int work_done = max_interrupt_work;
+   int handled = 0;
 
ioaddr = vp-ioaddr;
 
@@ -2400,6 +2403,7 @@ boomerang_interrupt(int irq, void *dev_id)
 
if ((status  IntLatch) == 0)
goto handler_exit;  /* No interrupt: shared IRQs 
can cause this */
+   handled = 1;
 
if (status == 0x) { /* h/w no longer present (hotplug)? */
if (vortex_debug  1)
@@ -2501,7 +2505,7 @@ boomerang_interrupt(int irq, void *dev_id)
 handler_exit:
vp-handling_irq = 0;
spin_unlock(vp-lock);
-   return IRQ_HANDLED;
+   return IRQ_RETVAL(handled);
 }
 
 static int vortex_rx(struct net_device *dev)
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] iwlwifi: Deinline iwl_{read,write}{8,32}

2015-07-14 Thread Denys Vlasenko
On Tue, Jul 14, 2015 at 2:38 PM, Sergei Shtylyov
sergei.shtyl...@cogentembedded.com wrote:
 +#define IWL_READ_WRITE(static_inline) \
 +static_inline void iwl_write8(struct iwl_trans *trans, u32 ofs, u8 val) \
 +{ \
 +   trace_iwlwifi_dev_iowrite8(trans-dev, ofs, val); \
 +   iwl_trans_write8(trans, ofs, val); \
 +} \
 [...]

 +#if !defined(CONFIG_IWLWIFI_DEVICE_TRACING)
 +IWL_READ_WRITE(static inline)

Not static_inline?

Yes. Here we are putting two words, static inline, in front of every
function definition.
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] iwlwifi: Deinline iwl_{read,write}{8,32}

2015-07-14 Thread Denys Vlasenko
With CONFIG_IWLWIFI_DEVICE_TRACING=y, these functions are rather large,
too big for inlining.

With this .config: http://busybox.net/~vda/kernel_config,
after uninlining these functions have sizes and callsite counts
as follows:

iwl_read32  475 bytes, 51 callsites
iwl_write32 477 bytes, 90 callsites
iwl_write8  493 bytes, 3 callsites

Reduction in size is about 74,000 bytes:

text data  bss   dec hex filename
90758147 17226024 36659200 144643371 89f152b vmlinux0
90687995 17221928 36659200 144569123 89df323 vmlinux.after

Signed-off-by: Denys Vlasenko dvlas...@redhat.com
CC: Johannes Berg johannes.b...@intel.com
CC: Emmanuel Grumbach emmanuel.grumb...@intel.com
Cc: John W. Linville linvi...@tuxdriver.com
Cc: Intel Linux Wireless i...@linux.intel.com
Cc: linux-wirel...@vger.kernel.org
Cc: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
 drivers/net/wireless/iwlwifi/iwl-io.c |  7 +++
 drivers/net/wireless/iwlwifi/iwl-io.h | 39 +--
 2 files changed, 30 insertions(+), 16 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/iwl-io.c 
b/drivers/net/wireless/iwlwifi/iwl-io.c
index 27c66e4..aed121e 100644
--- a/drivers/net/wireless/iwlwifi/iwl-io.c
+++ b/drivers/net/wireless/iwlwifi/iwl-io.c
@@ -38,6 +38,13 @@
 
 #define IWL_POLL_INTERVAL 10   /* microseconds */
 
+#if defined(CONFIG_IWLWIFI_DEVICE_TRACING)
+IWL_READ_WRITE( /*not inlined*/ )
+IWL_EXPORT_SYMBOL(iwl_write8);
+IWL_EXPORT_SYMBOL(iwl_write32);
+IWL_EXPORT_SYMBOL(iwl_read32);
+#endif
+
 int iwl_poll_bit(struct iwl_trans *trans, u32 addr,
 u32 bits, u32 mask, int timeout)
 {
diff --git a/drivers/net/wireless/iwlwifi/iwl-io.h 
b/drivers/net/wireless/iwlwifi/iwl-io.h
index 705d12c..3c9d2a8 100644
--- a/drivers/net/wireless/iwlwifi/iwl-io.h
+++ b/drivers/net/wireless/iwlwifi/iwl-io.h
@@ -32,24 +32,31 @@
 #include iwl-devtrace.h
 #include iwl-trans.h
 
-static inline void iwl_write8(struct iwl_trans *trans, u32 ofs, u8 val)
-{
-   trace_iwlwifi_dev_iowrite8(trans-dev, ofs, val);
-   iwl_trans_write8(trans, ofs, val);
-}
-
-static inline void iwl_write32(struct iwl_trans *trans, u32 ofs, u32 val)
-{
-   trace_iwlwifi_dev_iowrite32(trans-dev, ofs, val);
-   iwl_trans_write32(trans, ofs, val);
+#define IWL_READ_WRITE(static_inline) \
+static_inline void iwl_write8(struct iwl_trans *trans, u32 ofs, u8 val) \
+{ \
+   trace_iwlwifi_dev_iowrite8(trans-dev, ofs, val); \
+   iwl_trans_write8(trans, ofs, val); \
+} \
+static_inline void iwl_write32(struct iwl_trans *trans, u32 ofs, u32 val) \
+{ \
+   trace_iwlwifi_dev_iowrite32(trans-dev, ofs, val); \
+   iwl_trans_write32(trans, ofs, val); \
+} \
+static_inline u32 iwl_read32(struct iwl_trans *trans, u32 ofs) \
+{ \
+   u32 val = iwl_trans_read32(trans, ofs); \
+   trace_iwlwifi_dev_ioread32(trans-dev, ofs, val); \
+   return val; \
 }
 
-static inline u32 iwl_read32(struct iwl_trans *trans, u32 ofs)
-{
-   u32 val = iwl_trans_read32(trans, ofs);
-   trace_iwlwifi_dev_ioread32(trans-dev, ofs, val);
-   return val;
-}
+#if !defined(CONFIG_IWLWIFI_DEVICE_TRACING)
+IWL_READ_WRITE(static inline)
+#else
+void iwl_write8(struct iwl_trans *trans, u32 ofs, u8 val);
+void iwl_write32(struct iwl_trans *trans, u32 ofs, u32 val);
+u32 iwl_read32(struct iwl_trans *trans, u32 ofs);
+#endif
 
 static inline void iwl_set_bit(struct iwl_trans *trans, u32 reg, u32 mask)
 {
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] jhash: Deinline jhash, jhash2 and __jhash_nwords

2015-07-16 Thread Denys Vlasenko
This patch deinlines jhash, jhash2 and __jhash_nwords.

It also removes rhashtable_jhash2(key, length, seed)
because it was merely calling jhash2(key, length, seed).

With this .config: http://busybox.net/~vda/kernel_config,
after deinlining these functions have sizes and callsite counts
as follows:

__jhash_nwords: 72 bytes, 75 calls
jhash: 297 bytes, 111 calls
jhash2: 205 bytes, 136 calls

Total size decrease is about 38,000 bytes:

text data  bss   dec hex filename
90663567 17221960 36659200 144544727 89d93d7 vmlinux5
90625577 17221864 36659200 144506641 89cff11 vmlinux.after

Signed-off-by: Denys Vlasenko dvlas...@redhat.com
CC: Thomas Graf tg...@suug.ch
CC: Alexander Duyck alexander.h.du...@redhat.com
CC: Jozsef Kadlecsik kad...@blackhole.kfki.hu
CC: Herbert Xu herb...@gondor.apana.org.au
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
Changes in v2: created a new source file, jhash.c

 include/linux/jhash.h | 123 +
 lib/Makefile  |   2 +-
 lib/jhash.c   | 149 ++
 lib/rhashtable.c  |  13 +++--
 4 files changed, 160 insertions(+), 127 deletions(-)
 create mode 100644 lib/jhash.c

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index 348c6f4..0b3f55d 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -31,131 +31,14 @@
 /* Mask the hash value, i.e (value  jhash_mask(n)) instead of (value % n) */
 #define jhash_mask(n)   (jhash_size(n)-1)
 
-/* __jhash_mix -- mix 3 32-bit values reversibly. */
-#define __jhash_mix(a, b, c)   \
-{  \
-   a -= c;  a ^= rol32(c, 4);  c += b; \
-   b -= a;  b ^= rol32(a, 6);  a += c; \
-   c -= b;  c ^= rol32(b, 8);  b += a; \
-   a -= c;  a ^= rol32(c, 16); c += b; \
-   b -= a;  b ^= rol32(a, 19); a += c; \
-   c -= b;  c ^= rol32(b, 4);  b += a; \
-}
-
-/* __jhash_final - final mixing of 3 32-bit values (a,b,c) into c */
-#define __jhash_final(a, b, c) \
-{  \
-   c ^= b; c -= rol32(b, 14);  \
-   a ^= c; a -= rol32(c, 11);  \
-   b ^= a; b -= rol32(a, 25);  \
-   c ^= b; c -= rol32(b, 16);  \
-   a ^= c; a -= rol32(c, 4);   \
-   b ^= a; b -= rol32(a, 14);  \
-   c ^= b; c -= rol32(b, 24);  \
-}
-
 /* An arbitrary initial parameter */
 #define JHASH_INITVAL  0xdeadbeef
 
-/* jhash - hash an arbitrary key
- * @k: sequence of bytes as key
- * @length: the length of the key
- * @initval: the previous hash, or an arbitray value
- *
- * The generic version, hashes an arbitrary sequence of bytes.
- * No alignment or length assumptions are made about the input key.
- *
- * Returns the hash value of the key. The result depends on endianness.
- */
-static inline u32 jhash(const void *key, u32 length, u32 initval)
-{
-   u32 a, b, c;
-   const u8 *k = key;
-
-   /* Set up the internal state */
-   a = b = c = JHASH_INITVAL + length + initval;
-
-   /* All but the last block: affect some 32 bits of (a,b,c) */
-   while (length  12) {
-   a += __get_unaligned_cpu32(k);
-   b += __get_unaligned_cpu32(k + 4);
-   c += __get_unaligned_cpu32(k + 8);
-   __jhash_mix(a, b, c);
-   length -= 12;
-   k += 12;
-   }
-   /* Last block: affect all 32 bits of (c) */
-   /* All the case statements fall through */
-   switch (length) {
-   case 12: c += (u32)k[11]24;
-   case 11: c += (u32)k[10]16;
-   case 10: c += (u32)k[9]8;
-   case 9:  c += k[8];
-   case 8:  b += (u32)k[7]24;
-   case 7:  b += (u32)k[6]16;
-   case 6:  b += (u32)k[5]8;
-   case 5:  b += k[4];
-   case 4:  a += (u32)k[3]24;
-   case 3:  a += (u32)k[2]16;
-   case 2:  a += (u32)k[1]8;
-   case 1:  a += k[0];
-__jhash_final(a, b, c);
-   case 0: /* Nothing left to add */
-   break;
-   }
-
-   return c;
-}
-
-/* jhash2 - hash an array of u32's
- * @k: the key which must be an array of u32's
- * @length: the number of u32's in the key
- * @initval: the previous hash, or an arbitray value
- *
- * Returns the hash value of the key.
- */
-static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
-{
-   u32 a, b, c;
-
-   /* Set up the internal state */
-   a = b = c = JHASH_INITVAL + (length2) + initval;
-
-   /* Handle most of the key */
-   while (length  3) {
-   a += k[0];
-   b += k[1];
-   c += k[2];
-   __jhash_mix(a, b, c);
-   length -= 3;
-   k += 3;
-   }
-
-   /* Handle the last 3 u32's: all the case statements fall through */
-   switch (length) {
-   case 3: c

[PATCH v3] jhash: Deinline jhash and jhash2

2015-07-19 Thread Denys Vlasenko
This patch deinlines jhash and jhash2.

It also removes rhashtable_jhash2(key, length, seed)
because it was merely calling jhash2(key, length, seed).

With this .config: http://busybox.net/~vda/kernel_config,
after deinlining these functions have sizes and callsite counts
as follows:

jhash: 297 bytes, 111 calls
jhash2: 205 bytes, 136 calls

Total size decrease is about 33,000 bytes:

text data  bss   dec hex filename
90663567 17221960 36659200 144544727 89d93d7 vmlinux5
90630370 17221864 36659200 144511434 89d11ca vmlinux.after

Signed-off-by: Denys Vlasenko dvlas...@redhat.com
CC: Thomas Graf tg...@suug.ch
CC: David Miller da...@davemloft.net
CC: Tom Herbert t...@herbertland.com
CC: Alexander Duyck alexander.h.du...@redhat.com
CC: Jozsef Kadlecsik kad...@blackhole.kfki.hu
CC: Herbert Xu herb...@gondor.apana.org.au
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
Changes in v2: created a new source file, jhash.c
Changes in v3: do not deinline __jhash_nwords;
  use EXPORT_SYMBOL instead of EXPORT_SYMBOL_GPL

 include/linux/jhash.h |  89 +--
 lib/Makefile  |   2 +-
 lib/jhash.c   | 113 ++
 lib/rhashtable.c  |  13 +++---
 4 files changed, 123 insertions(+), 94 deletions(-)
 create mode 100644 lib/jhash.c

diff --git a/include/linux/jhash.h b/include/linux/jhash.h
index 348c6f4..847c1aa 100644
--- a/include/linux/jhash.h
+++ b/include/linux/jhash.h
@@ -57,93 +57,8 @@
 /* An arbitrary initial parameter */
 #define JHASH_INITVAL  0xdeadbeef
 
-/* jhash - hash an arbitrary key
- * @k: sequence of bytes as key
- * @length: the length of the key
- * @initval: the previous hash, or an arbitray value
- *
- * The generic version, hashes an arbitrary sequence of bytes.
- * No alignment or length assumptions are made about the input key.
- *
- * Returns the hash value of the key. The result depends on endianness.
- */
-static inline u32 jhash(const void *key, u32 length, u32 initval)
-{
-   u32 a, b, c;
-   const u8 *k = key;
-
-   /* Set up the internal state */
-   a = b = c = JHASH_INITVAL + length + initval;
-
-   /* All but the last block: affect some 32 bits of (a,b,c) */
-   while (length  12) {
-   a += __get_unaligned_cpu32(k);
-   b += __get_unaligned_cpu32(k + 4);
-   c += __get_unaligned_cpu32(k + 8);
-   __jhash_mix(a, b, c);
-   length -= 12;
-   k += 12;
-   }
-   /* Last block: affect all 32 bits of (c) */
-   /* All the case statements fall through */
-   switch (length) {
-   case 12: c += (u32)k[11]24;
-   case 11: c += (u32)k[10]16;
-   case 10: c += (u32)k[9]8;
-   case 9:  c += k[8];
-   case 8:  b += (u32)k[7]24;
-   case 7:  b += (u32)k[6]16;
-   case 6:  b += (u32)k[5]8;
-   case 5:  b += k[4];
-   case 4:  a += (u32)k[3]24;
-   case 3:  a += (u32)k[2]16;
-   case 2:  a += (u32)k[1]8;
-   case 1:  a += k[0];
-__jhash_final(a, b, c);
-   case 0: /* Nothing left to add */
-   break;
-   }
-
-   return c;
-}
-
-/* jhash2 - hash an array of u32's
- * @k: the key which must be an array of u32's
- * @length: the number of u32's in the key
- * @initval: the previous hash, or an arbitray value
- *
- * Returns the hash value of the key.
- */
-static inline u32 jhash2(const u32 *k, u32 length, u32 initval)
-{
-   u32 a, b, c;
-
-   /* Set up the internal state */
-   a = b = c = JHASH_INITVAL + (length2) + initval;
-
-   /* Handle most of the key */
-   while (length  3) {
-   a += k[0];
-   b += k[1];
-   c += k[2];
-   __jhash_mix(a, b, c);
-   length -= 3;
-   k += 3;
-   }
-
-   /* Handle the last 3 u32's: all the case statements fall through */
-   switch (length) {
-   case 3: c += k[2];
-   case 2: b += k[1];
-   case 1: a += k[0];
-   __jhash_final(a, b, c);
-   case 0: /* Nothing left to add */
-   break;
-   }
-
-   return c;
-}
-
+u32 jhash(const void *key, u32 length, u32 initval);
+u32 jhash2(const u32 *k, u32 length, u32 initval);
 
 /* __jhash_nwords - hash exactly 3, 2 or 1 word(s) */
 static inline u32 __jhash_nwords(u32 a, u32 b, u32 c, u32 initval)
diff --git a/lib/Makefile b/lib/Makefile
index 6897b52..978be53 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -26,7 +26,7 @@ obj-y += bcd.o div64.o sort.o parser.o halfmd4.o 
debug_locks.o random32.o \
 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 gcd.o lcm.o list_sort.o uuid.o flex_array.o iov_iter.o clz_ctz.o \
 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
-percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
+percpu-refcount.o percpu_ida.o jhash.o rhashtable.o reciprocal_div.o
 obj-y

Re: [PATCH v2] jhash: Deinline jhash, jhash2 and __jhash_nwords

2015-07-19 Thread Denys Vlasenko
On 07/16/2015 08:17 PM, David Miller wrote:
 From: Tom Herbert t...@herbertland.com
 Date: Thu, 16 Jul 2015 08:43:25 -0700
 
 On Thu, Jul 16, 2015 at 5:40 AM, Denys Vlasenko dvlas...@redhat.com wrote:
 This patch deinlines jhash, jhash2 and __jhash_nwords.

 It also removes rhashtable_jhash2(key, length, seed)
 because it was merely calling jhash2(key, length, seed).

 With this .config: http://busybox.net/~vda/kernel_config,
 after deinlining these functions have sizes and callsite counts
 as follows:

 __jhash_nwords: 72 bytes, 75 calls
 jhash: 297 bytes, 111 calls
 jhash2: 205 bytes, 136 calls

 jhash is used in several places in the critical data path. Does the
 decrease in text size justify performance impact of not inlining it?
 
 Tom took the words right out of my mouth.
 
 Denys, you keep making deinlining changes like this all the time, like
 a robot.  But I never see you make any effort to look into the performance
 nor code generation ramifications of your changes.

The performance impact of the call/ret pair on modern x86
CPUs is only 5 cycles. To make a real difference in execution
speed, the function has to be less than 100 bytes of code.

jhash and jhash2, at more than 200 bytes of code, are definitely
far too large for inlining. Here is the smaller of two, jhash2:

jhash2:
   8d 84 b2 ef be ad delea-0x21524111(%rdx,%rsi,4),%eax
   55  push   %rbp
   89 c1   mov%eax,%ecx
   48 89 e5mov%rsp,%rbp
   89 c2   mov%eax,%edx
   eb 62   jmp81a0d204 jhash2+0x73
   03 47 08add0x8(%rdi),%eax
   03 17   add(%rdi),%edx
   83 ee 03sub$0x3,%esi
   03 4f 04add0x4(%rdi),%ecx
   48 83 c7 0c add$0xc,%rdi
   41 89 c0mov%eax,%r8d
   29 c2   sub%eax,%edx
   41 c1 c8 1c ror$0x1c,%r8d
   01 c8   add%ecx,%eax
   44 31 c2xor%r8d,%edx
   41 89 d0mov%edx,%r8d
   29 d1   sub%edx,%ecx
   01 c2   add%eax,%edx
   41 c1 c8 1a ror$0x1a,%r8d
   41 31 c8xor%ecx,%r8d
   44 89 c1mov%r8d,%ecx
   44 29 c0sub%r8d,%eax
   41 01 d0add%edx,%r8d
   c1 c9 18ror$0x18,%ecx
   31 c1   xor%eax,%ecx
   89 c8   mov%ecx,%eax
   29 ca   sub%ecx,%edx
   46 8d 0c 01 lea(%rcx,%r8,1),%r9d
   c1 c8 10ror$0x10,%eax
   31 d0   xor%edx,%eax
   89 c1   mov%eax,%ecx
   41 29 c0sub%eax,%r8d
   42 8d 14 08 lea(%rax,%r9,1),%edx
   c1 c9 0dror$0xd,%ecx
   44 31 c1xor%r8d,%ecx
   89 c8   mov%ecx,%eax
   41 29 c9sub%ecx,%r9d
   01 d1   add%edx,%ecx
   c1 c8 1cror$0x1c,%eax
   44 31 c8xor%r9d,%eax
   83 fe 03cmp$0x3,%esi
   77 99   ja 81a0d1a2 jhash2+0x11
   83 fe 02cmp$0x2,%esi
   74 0e   je 81a0d21c jhash2+0x8b
   83 fe 03cmp$0x3,%esi
   74 06   je 81a0d219 jhash2+0x88
   ff ce   dec%esi
   75 45   jne81a0d25c jhash2+0xcb
   eb 06   jmp81a0d21f jhash2+0x8e
   03 47 08add0x8(%rdi),%eax
   03 4f 04add0x4(%rdi),%ecx
   89 ce   mov%ecx,%esi
   03 17   add(%rdi),%edx
   31 c8   xor%ecx,%eax
   c1 ce 12ror$0x12,%esi
   29 f0   sub%esi,%eax
   89 c6   mov%eax,%esi
   31 c2   xor%eax,%edx
   c1 ce 15ror$0x15,%esi
   29 f2   sub%esi,%edx
   89 d6   mov%edx,%esi
   31 d1   xor%edx,%ecx
   c1 ce 07ror$0x7,%esi
   29 f1   sub%esi,%ecx
   89 ce   mov%ecx,%esi
   31 c8   xor%ecx,%eax
   c1 ce 10ror$0x10,%esi
   29 f0   sub%esi,%eax
   89 c6   mov%eax,%esi
   31 c2   xor%eax,%edx
   c1 ce 1cror$0x1c,%esi
   29 f2   sub%esi,%edx
   31

[PATCH] mac80211: Deinline drv_sta_state

2015-07-15 Thread Denys Vlasenko
With this .config: http://busybox.net/~vda/kernel_config,
after deinlining the function size is 3132 bytes and there are
7 callsites.

Total size reduction: about 20 kbytes.

Signed-off-by: Denys Vlasenko dvlas...@redhat.com
CC: John Linville linvi...@tuxdriver.com
CC: Michal Kazior michal.kaz...@tieto.com
Cc: Johannes Berg johannes.b...@intel.com
Cc: linux-wirel...@vger.kernel.org
Cc: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
 net/mac80211/Makefile |  1 +
 net/mac80211/driver-ops.c | 41 +
 net/mac80211/driver-ops.h | 29 ++---
 3 files changed, 44 insertions(+), 27 deletions(-)
 create mode 100644 net/mac80211/driver-ops.c

diff --git a/net/mac80211/Makefile b/net/mac80211/Makefile
index 3275f01..783e891 100644
--- a/net/mac80211/Makefile
+++ b/net/mac80211/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_MAC80211) += mac80211.o
 # mac80211 objects
 mac80211-y := \
main.o status.o \
+   driver-ops.o \
sta_info.o \
wep.o \
wpa.o \
diff --git a/net/mac80211/driver-ops.c b/net/mac80211/driver-ops.c
new file mode 100644
index 000..267c3b1
--- /dev/null
+++ b/net/mac80211/driver-ops.c
@@ -0,0 +1,41 @@
+/*
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#include net/mac80211.h
+#include ieee80211_i.h
+#include trace.h
+#include driver-ops.h
+
+__must_check
+int drv_sta_state(struct ieee80211_local *local,
+ struct ieee80211_sub_if_data *sdata,
+ struct sta_info *sta,
+ enum ieee80211_sta_state old_state,
+ enum ieee80211_sta_state new_state)
+{
+   int ret = 0;
+
+   might_sleep();
+
+   sdata = get_bss_sdata(sdata);
+   if (!check_sdata_in_driver(sdata))
+   return -EIO;
+
+   trace_drv_sta_state(local, sdata, sta-sta, old_state, new_state);
+   if (local-ops-sta_state) {
+   ret = local-ops-sta_state(local-hw, sdata-vif, sta-sta,
+   old_state, new_state);
+   } else if (old_state == IEEE80211_STA_AUTH 
+  new_state == IEEE80211_STA_ASSOC) {
+   ret = drv_sta_add(local, sdata, sta-sta);
+   if (ret == 0)
+   sta-uploaded = true;
+   } else if (old_state == IEEE80211_STA_ASSOC 
+  new_state == IEEE80211_STA_AUTH) {
+   drv_sta_remove(local, sdata, sta-sta);
+   }
+   trace_drv_return_int(local, ret);
+   return ret;
+}
diff --git a/net/mac80211/driver-ops.h b/net/mac80211/driver-ops.h
index 32a2e70..02d9133 100644
--- a/net/mac80211/driver-ops.h
+++ b/net/mac80211/driver-ops.h
@@ -573,37 +573,12 @@ static inline void drv_sta_pre_rcu_remove(struct 
ieee80211_local *local,
trace_drv_return_void(local);
 }
 
-static inline __must_check
+__must_check
 int drv_sta_state(struct ieee80211_local *local,
  struct ieee80211_sub_if_data *sdata,
  struct sta_info *sta,
  enum ieee80211_sta_state old_state,
- enum ieee80211_sta_state new_state)
-{
-   int ret = 0;
-
-   might_sleep();
-
-   sdata = get_bss_sdata(sdata);
-   if (!check_sdata_in_driver(sdata))
-   return -EIO;
-
-   trace_drv_sta_state(local, sdata, sta-sta, old_state, new_state);
-   if (local-ops-sta_state) {
-   ret = local-ops-sta_state(local-hw, sdata-vif, sta-sta,
-   old_state, new_state);
-   } else if (old_state == IEEE80211_STA_AUTH 
-  new_state == IEEE80211_STA_ASSOC) {
-   ret = drv_sta_add(local, sdata, sta-sta);
-   if (ret == 0)
-   sta-uploaded = true;
-   } else if (old_state == IEEE80211_STA_ASSOC 
-  new_state == IEEE80211_STA_AUTH) {
-   drv_sta_remove(local, sdata, sta-sta);
-   }
-   trace_drv_return_int(local, ret);
-   return ret;
-}
+ enum ieee80211_sta_state new_state);
 
 static inline void drv_sta_rc_update(struct ieee80211_local *local,
 struct ieee80211_sub_if_data *sdata,
-- 
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] mac80211: Deinline rate_control_rate_init, rate_control_rate_update

2015-07-15 Thread Denys Vlasenko
With this .config: http://busybox.net/~vda/kernel_config,
after deinlining these functions have sizes and callsite counts
as follows:

rate_control_rate_init: 554 bytes, 8 calls
rate_control_rate_update: 1596 bytes, 5 calls

Total size reduction: about 11 kbytes.

Signed-off-by: Denys Vlasenko dvlas...@redhat.com
CC: John Linville linvi...@tuxdriver.com
CC: Michal Kazior michal.kaz...@tieto.com
CC: Johannes Berg johannes.b...@intel.com
Cc: linux-wirel...@vger.kernel.org
Cc: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
 net/mac80211/rate.c | 59 
 net/mac80211/rate.h | 60 +++--
 2 files changed, 62 insertions(+), 57 deletions(-)

diff --git a/net/mac80211/rate.c b/net/mac80211/rate.c
index fda33f9..03687d2 100644
--- a/net/mac80211/rate.c
+++ b/net/mac80211/rate.c
@@ -29,6 +29,65 @@ module_param(ieee80211_default_rc_algo, charp, 0644);
 MODULE_PARM_DESC(ieee80211_default_rc_algo,
 Default rate control algorithm for mac80211 to use);
 
+void rate_control_rate_init(struct sta_info *sta)
+{
+   struct ieee80211_local *local = sta-sdata-local;
+   struct rate_control_ref *ref = sta-rate_ctrl;
+   struct ieee80211_sta *ista = sta-sta;
+   void *priv_sta = sta-rate_ctrl_priv;
+   struct ieee80211_supported_band *sband;
+   struct ieee80211_chanctx_conf *chanctx_conf;
+
+   ieee80211_sta_set_rx_nss(sta);
+
+   if (!ref)
+   return;
+
+   rcu_read_lock();
+
+   chanctx_conf = rcu_dereference(sta-sdata-vif.chanctx_conf);
+   if (WARN_ON(!chanctx_conf)) {
+   rcu_read_unlock();
+   return;
+   }
+
+   sband = local-hw.wiphy-bands[chanctx_conf-def.chan-band];
+
+   spin_lock_bh(sta-rate_ctrl_lock);
+   ref-ops-rate_init(ref-priv, sband, chanctx_conf-def, ista,
+   priv_sta);
+   spin_unlock_bh(sta-rate_ctrl_lock);
+   rcu_read_unlock();
+   set_sta_flag(sta, WLAN_STA_RATE_CONTROL);
+}
+
+void rate_control_rate_update(struct ieee80211_local *local,
+   struct ieee80211_supported_band *sband,
+   struct sta_info *sta, u32 changed)
+{
+   struct rate_control_ref *ref = local-rate_ctrl;
+   struct ieee80211_sta *ista = sta-sta;
+   void *priv_sta = sta-rate_ctrl_priv;
+   struct ieee80211_chanctx_conf *chanctx_conf;
+
+   if (ref  ref-ops-rate_update) {
+   rcu_read_lock();
+
+   chanctx_conf = rcu_dereference(sta-sdata-vif.chanctx_conf);
+   if (WARN_ON(!chanctx_conf)) {
+   rcu_read_unlock();
+   return;
+   }
+
+   spin_lock_bh(sta-rate_ctrl_lock);
+   ref-ops-rate_update(ref-priv, sband, chanctx_conf-def,
+ ista, priv_sta, changed);
+   spin_unlock_bh(sta-rate_ctrl_lock);
+   rcu_read_unlock();
+   }
+   drv_sta_rc_update(local, sta-sdata, sta-sta, changed);
+}
+
 int ieee80211_rate_control_register(const struct rate_control_ops *ops)
 {
struct rate_control_alg *alg;
diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h
index 25c9be5..624fe5b 100644
--- a/net/mac80211/rate.h
+++ b/net/mac80211/rate.h
@@ -71,64 +71,10 @@ rate_control_tx_status_noskb(struct ieee80211_local *local,
spin_unlock_bh(sta-rate_ctrl_lock);
 }
 
-static inline void rate_control_rate_init(struct sta_info *sta)
-{
-   struct ieee80211_local *local = sta-sdata-local;
-   struct rate_control_ref *ref = sta-rate_ctrl;
-   struct ieee80211_sta *ista = sta-sta;
-   void *priv_sta = sta-rate_ctrl_priv;
-   struct ieee80211_supported_band *sband;
-   struct ieee80211_chanctx_conf *chanctx_conf;
-
-   ieee80211_sta_set_rx_nss(sta);
-
-   if (!ref)
-   return;
-
-   rcu_read_lock();
-
-   chanctx_conf = rcu_dereference(sta-sdata-vif.chanctx_conf);
-   if (WARN_ON(!chanctx_conf)) {
-   rcu_read_unlock();
-   return;
-   }
-
-   sband = local-hw.wiphy-bands[chanctx_conf-def.chan-band];
-
-   spin_lock_bh(sta-rate_ctrl_lock);
-   ref-ops-rate_init(ref-priv, sband, chanctx_conf-def, ista,
-   priv_sta);
-   spin_unlock_bh(sta-rate_ctrl_lock);
-   rcu_read_unlock();
-   set_sta_flag(sta, WLAN_STA_RATE_CONTROL);
-}
-
-static inline void rate_control_rate_update(struct ieee80211_local *local,
+void rate_control_rate_init(struct sta_info *sta);
+void rate_control_rate_update(struct ieee80211_local *local,
struct ieee80211_supported_band *sband,
-   struct sta_info *sta, u32 changed)
-{
-   struct rate_control_ref *ref = local-rate_ctrl;
-   struct ieee80211_sta *ista = sta-sta;
-   void *priv_sta = sta-rate_ctrl_priv

Re: [PATCH v2] net: sctp: Don't use 64 kilobyte lookup table for four elements

2015-09-28 Thread Denys Vlasenko
On 09/28/2015 05:32 PM, David Laight wrote:
> From: Eric Dumazet
>> Sent: 28 September 2015 15:27
>> On Mon, 2015-09-28 at 14:12 +, David Laight wrote:
>>> From: Neil Horman
>>>> Sent: 28 September 2015 14:51
>>>> On Mon, Sep 28, 2015 at 02:34:04PM +0200, Denys Vlasenko wrote:
>>>>> Seemingly innocuous sctp_trans_state_to_prio_map[] array
>>>>> is way bigger than it looks, since
>>>>> "[SCTP_UNKNOWN] = 2" expands into "[0x] = 2" !
>>>>>
>>>>> This patch replaces it with switch() statement.
>>>
>>> What about just adding 1 (and masking) before indexing the array?
>>> That might require a static inline function with a local static array.
>>>
>>> Or define the array as (say) [16] and just mask the state before using
>>> it as an index?
>>
>> Just let the compiler do its job, instead of obfuscating source.
>>
>> Compilers can transform a switch into an (optimal) table if it is really
>> a gain.
> 
> The compiler can choose between a jump table and nested ifs for a switch
> statement. I've never seen it convert one into a data array index.

I don't know why people are fixated on a lookup table here.

For just four possible values, the amount of generated code
is less than one Icache cacheline.

Instruction cachelines are efficiently prefetched and branches
are predicted on all modern CPUs.
Possible data access for lookup table can not be prefetched
as efficiently.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] net: sctp: Don't use 64 kilobyte lookup table for four elements

2015-09-28 Thread Denys Vlasenko
Seemingly innocuous sctp_trans_state_to_prio_map[] array
is way bigger than it looks, since
"[SCTP_UNKNOWN] = 2" expands into "[0x] = 2" !

This patch replaces it with switch() statement.

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
CC: Vlad Yasevich <vyasev...@gmail.com>
CC: Neil Horman <nhor...@tuxdriver.com>
CC: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>
CC: linux-s...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---

Changes since v1: tweaked comments

 net/sctp/associola.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index 197c3f5..dae51ac 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1208,20 +1208,22 @@ void sctp_assoc_update(struct sctp_association *asoc,
  *   within this document.
  *
  * Our basic strategy is to round-robin transports in priorities
- * according to sctp_state_prio_map[] e.g., if no such
+ * according to sctp_trans_score() e.g., if no such
  * transport with state SCTP_ACTIVE exists, round-robin through
  * SCTP_UNKNOWN, etc. You get the picture.
  */
-static const u8 sctp_trans_state_to_prio_map[] = {
-   [SCTP_ACTIVE]   = 3,/* best case */
-   [SCTP_UNKNOWN]  = 2,
-   [SCTP_PF]   = 1,
-   [SCTP_INACTIVE] = 0,/* worst case */
-};
-
 static u8 sctp_trans_score(const struct sctp_transport *trans)
 {
-   return sctp_trans_state_to_prio_map[trans->state];
+   switch (trans->state) {
+   case SCTP_ACTIVE:
+   return 3;   /* best case */
+   case SCTP_UNKNOWN:
+   return 2;
+   case SCTP_PF:
+   return 1;
+   default: /* case SCTP_INACTIVE */
+   return 0;   /* worst case */
+   }
 }
 
 static struct sctp_transport *sctp_trans_elect_tie(struct sctp_transport 
*trans1,
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] net: force inlining of netif_tx_start/stop_queue, sock_hold, __sock_put

2016-04-08 Thread Denys Vlasenko
Sometimes gcc mysteriously doesn't inline
very small functions we expect to be inlined. See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
Arguably, gcc should do better, but gcc people aren't willing
to invest time into it, asking to use __always_inline instead.

With this .config:
http://busybox.net/~vda/kernel_config_OPTIMIZE_INLINING_and_Os,
the following functions get deinlined many times.

netif_tx_stop_queue: 207 copies, 590 calls:
55  push   %rbp
48 89 e5mov%rsp,%rbp
f0 80 8f e0 01 00 00 01 lock orb $0x1,0x1e0(%rdi)
5d  pop%rbp
c3  retq

netif_tx_start_queue: 47 copies, 111 calls
55  push   %rbp
48 89 e5mov%rsp,%rbp
f0 80 a7 e0 01 00 00 fe lock andb $0xfe,0x1e0(%rdi)
5d  pop%rbp
c3  retq

sock_hold: 39 copies, 124 calls
55  push   %rbp
48 89 e5mov%rsp,%rbp
f0 ff 87 80 00 00 00lock incl 0x80(%rdi)
5d  pop%rbp
c3  retq

__sock_put: 6 copies, 13 calls
55  push   %rbp
48 89 e5mov%rsp,%rbp
f0 ff 8f 80 00 00 00lock decl 0x80(%rdi)
5d  pop%rbp
c3  retq

This patch fixes this via s/inline/__always_inline/.

Code size decrease after the patch is ~2.5k:

text  data  bss   dec hex filename
56719876  56364551 36196352 149280779 8e5d80b vmlinux_before
56717440  56364551 36196352 149278343 8e5ce87 vmlinux

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
CC: David S. Miller <da...@davemloft.net>
CC: linux-ker...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: netfilter-de...@vger.kernel.org
---
 include/linux/netdevice.h | 4 ++--
 include/net/sock.h| 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index cb0d5d0..f924ddc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -2801,7 +2801,7 @@ static inline void netif_tx_schedule_all(struct 
net_device *dev)
netif_schedule_queue(netdev_get_tx_queue(dev, i));
 }
 
-static inline void netif_tx_start_queue(struct netdev_queue *dev_queue)
+static __always_inline void netif_tx_start_queue(struct netdev_queue 
*dev_queue)
 {
clear_bit(__QUEUE_STATE_DRV_XOFF, _queue->state);
 }
@@ -2851,7 +2851,7 @@ static inline void netif_tx_wake_all_queues(struct 
net_device *dev)
}
 }
 
-static inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
+static __always_inline void netif_tx_stop_queue(struct netdev_queue *dev_queue)
 {
set_bit(__QUEUE_STATE_DRV_XOFF, _queue->state);
 }
diff --git a/include/net/sock.h b/include/net/sock.h
index 255d3e0..fd15eb1 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -564,7 +564,7 @@ static inline bool __sk_del_node_init(struct sock *sk)
modifications.
  */
 
-static inline void sock_hold(struct sock *sk)
+static __always_inline void sock_hold(struct sock *sk)
 {
atomic_inc(>sk_refcnt);
 }
@@ -572,7 +572,7 @@ static inline void sock_hold(struct sock *sk)
 /* Ungrab socket in the context, which assumes that socket refcnt
cannot hit zero, f.e. it is true in context of any socketcall.
  */
-static inline void __sock_put(struct sock *sk)
+static __always_inline void __sock_put(struct sock *sk)
 {
atomic_dec(>sk_refcnt);
 }
-- 
1.8.1.4



[PATCH] drivers/net/ethernet/jme.c: Deinline jme_reset_mac_processor, save 2816 bytes

2016-04-08 Thread Denys Vlasenko
This function compiles to 895 bytes of machine code.

Clearly, this isn't a time-critical function.
For one, it has a number of udelay(1) calls.

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
CC: David S. Miller <da...@davemloft.net>
CC: linux-ker...@vger.kernel.org
CC: netdev@vger.kernel.org
---
 drivers/net/ethernet/jme.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/jme.c b/drivers/net/ethernet/jme.c
index 3ddf657..711cb19 100644
--- a/drivers/net/ethernet/jme.c
+++ b/drivers/net/ethernet/jme.c
@@ -222,7 +222,7 @@ jme_clear_ghc_reset(struct jme_adapter *jme)
jwrite32f(jme, JME_GHC, jme->reg_ghc);
 }
 
-static inline void
+static void
 jme_reset_mac_processor(struct jme_adapter *jme)
 {
static const u32 mask[WAKEUP_FRAME_MASK_DWNR] = {0, 0, 0, 0};
-- 
2.1.0



[PATCH] e1000e: prevent division by zero if TIMINCA is zero

2016-05-06 Thread Denys Vlasenko
Users report that under VMWare, er32(TIMINCA) returns zero.
This causes division by zero at init time as follows:

 ==>incvalue = er32(TIMINCA) & E1000_TIMINCA_INCVALUE_MASK;
for (i = 0; i < E1000_MAX_82574_SYSTIM_REREADS; i++) {
/* latch SYSTIMH on read of SYSTIML */
systim_next = (cycle_t)er32(SYSTIML);
systim_next |= (cycle_t)er32(SYSTIMH) << 32;

time_delta = systim_next - systim;
temp = time_delta;
 >  rem = do_div(temp, incvalue);

This change makes kernel survive this, and users report that
NIC does work after this change.

Since on real hardware incvalue is never zero, this should not affect
real hardware use case.

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
CC: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
CC: "Ruinskiy, Dima" <dima.ruins...@intel.com>
CC: intel-wired-...@lists.osuosl.org
CC: netdev@vger.kernel.org
CC: LKML <linux-ker...@vger.kernel.org>
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 269087c..0626935 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4315,7 +4315,8 @@ static cycle_t e1000e_cyclecounter_read(const struct 
cyclecounter *cc)
 
time_delta = systim_next - systim;
temp = time_delta;
-   rem = do_div(temp, incvalue);
+   /* VMWare users have seen incvalue of zero, don't div / 
0 */
+   rem = incvalue ? do_div(temp, incvalue) : (time_delta 
!= 0);
 
systim = systim_next;
 
-- 
1.8.1.4



Re: [PATCH 1/3] e1000e: e1000e_cyclecounter_read(): incvalue is 32 bits, not 64

2016-04-20 Thread Denys Vlasenko
On 04/19/2016 10:57 PM, Jeff Kirsher wrote:
> On Tue, 2016-04-19 at 14:34 +0200, Denys Vlasenko wrote:
>> "incvalue" variable holds a result of "er32(TIMINCA) &
>> E1000_TIMINCA_INCVALUE_MASK"
>> and used in "do_div(temp, incvalue)" as a divisor.
>>
>> Thus, "u64 incvalue" declaration is probably a mistake.
>> Even though it seems to be a harmless one, let's fix it.
>>
>> Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
>> CC: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
>> CC: Jesse Brandeburg <jesse.brandeb...@intel.com>
>> CC: Shannon Nelson <shannon.nel...@intel.com>
>> CC: Carolyn Wyborny <carolyn.wybo...@intel.com>
>> CC: Don Skidmore <donald.c.skidm...@intel.com>
>> CC: Bruce Allan <bruce.w.al...@intel.com>
>> CC: John Ronciak <john.ronc...@intel.com>
>> CC: Mitch Williams <mitch.a.willi...@intel.com>
>> CC: David S. Miller <da...@davemloft.net>
>> CC: LKML <linux-ker...@vger.kernel.org>
>> CC: netdev@vger.kernel.org
>> ---
>>  drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> First of all, trimmed down the recipient list since almost all of the
> reviewers you added have nothing to do with e1000e.

I took the list here, MAINTAINERS:

INTEL ETHERNET DRIVERS
M:  Jeff Kirsher <jeffrey.t.kirs...@intel.com>
R:  Jesse Brandeburg <jesse.brandeb...@intel.com>
R:  Shannon Nelson <shannon.nel...@intel.com>
R:  Carolyn Wyborny <carolyn.wybo...@intel.com>
R:  Don Skidmore <donald.c.skidm...@intel.com>
R:  Bruce Allan <bruce.w.al...@intel.com>
R:  John Ronciak <john.ronc...@intel.com>
R:  Mitch Williams <mitch.a.willi...@intel.com>

No more specific information who's e1000e reviewer.
Sorry.



[PATCH 1/3] e1000e: e1000e_cyclecounter_read(): incvalue is 32 bits, not 64

2016-04-19 Thread Denys Vlasenko
"incvalue" variable holds a result of "er32(TIMINCA) & 
E1000_TIMINCA_INCVALUE_MASK"
and used in "do_div(temp, incvalue)" as a divisor.

Thus, "u64 incvalue" declaration is probably a mistake.
Even though it seems to be a harmless one, let's fix it.

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
CC: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
CC: Jesse Brandeburg <jesse.brandeb...@intel.com>
CC: Shannon Nelson <shannon.nel...@intel.com>
CC: Carolyn Wyborny <carolyn.wybo...@intel.com>
CC: Don Skidmore <donald.c.skidm...@intel.com>
CC: Bruce Allan <bruce.w.al...@intel.com>
CC: John Ronciak <john.ronc...@intel.com>
CC: Mitch Williams <mitch.a.willi...@intel.com>
CC: David S. Miller <da...@davemloft.net>
CC: LKML <linux-ker...@vger.kernel.org>
CC: netdev@vger.kernel.org
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 9b4ec13..967311b 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4300,7 +4300,8 @@ static cycle_t e1000e_cyclecounter_read(const struct 
cyclecounter *cc)
}
 
if ((hw->mac.type == e1000_82574) || (hw->mac.type == e1000_82583)) {
-   u64 incvalue, time_delta, rem, temp;
+   u64 time_delta, rem, temp;
+   u32 incvalue;
int i;
 
/* errata for 82574/82583 possible bad bits read from SYSTIMH/L
-- 
1.8.1.4



[PATCH 3/3] e1000e: e1000e_cyclecounter_read(): do overflow check only if needed

2016-04-19 Thread Denys Vlasenko
SYSTIMH:SYSTIML registers are incremented by 24-bit value TIMINCA[23..0]

er32(SYSTIML) are probably moderately expensive (they are pci bus reads).
Can we avoid one of them? Yes, we can.

If the SYSTIML value we see is smaller than 0xff00, the overflow
into SYSTIMH would require at least two increments.

We do two reads, er32(SYSTIML) and er32(SYSTIMH), in this order.

Even if one increment happens between them, the overflow into SYSTIMH
is impossible, and we can avoid doing another er32(SYSTIML) read
and overflow check.

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
CC: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
CC: Jesse Brandeburg <jesse.brandeb...@intel.com>
CC: Shannon Nelson <shannon.nel...@intel.com>
CC: Carolyn Wyborny <carolyn.wybo...@intel.com>
CC: Don Skidmore <donald.c.skidm...@intel.com>
CC: Bruce Allan <bruce.w.al...@intel.com>
CC: John Ronciak <john.ronc...@intel.com>
CC: Mitch Williams <mitch.a.willi...@intel.com>
CC: David S. Miller <da...@davemloft.net>
CC: LKML <linux-ker...@vger.kernel.org>
CC: netdev@vger.kernel.org
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 99d0e6e..6f17f89 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4275,7 +4275,7 @@ static cycle_t e1000e_cyclecounter_read(const struct 
cyclecounter *cc)
struct e1000_adapter *adapter = container_of(cc, struct e1000_adapter,
 cc);
struct e1000_hw *hw = >hw;
-   u32 systimel_1, systimel_2, systimeh;
+   u32 systimel, systimeh;
cycle_t systim, systim_next;
/* SYSTIMH latching upon SYSTIML read does not work well.
 * This means that if SYSTIML overflows after we read it but before
@@ -4283,21 +4283,21 @@ static cycle_t e1000e_cyclecounter_read(const struct 
cyclecounter *cc)
 * will experience a huge non linear increment in the systime value
 * to fix that we test for overflow and if true, we re-read systime.
 */
-   systimel_1 = er32(SYSTIML);
+   systimel = er32(SYSTIML);
systimeh = er32(SYSTIMH);
-   systimel_2 = er32(SYSTIML);
-   /* Check for overflow. If there was no overflow, use the values */
-   if (systimel_1 <= systimel_2) {
-   systim = (cycle_t)systimel_1;
-   systim |= (cycle_t)systimeh << 32;
-   } else {
-   /* There was an overflow, read again SYSTIMH, and use
-* systimel_2
-*/
-   systimeh = er32(SYSTIMH);
-   systim = (cycle_t)systimel_2;
-   systim |= (cycle_t)systimeh << 32;
+   /* Is systimel is so large that overflow is possible? */
+   if (systimel >= (u32)0x - E1000_TIMINCA_INCVALUE_MASK) {
+   u32 systimel_2 = er32(SYSTIML);
+   if (systimel > systimel_2) {
+   /* There was an overflow, read again SYSTIMH, and use
+* systimel_2
+*/
+   systimeh = er32(SYSTIMH);
+   systimel = systimel_2;
+   }
}
+   systim = (cycle_t)systimel;
+   systim |= (cycle_t)systimeh << 32;
 
if ((hw->mac.type == e1000_82574) || (hw->mac.type == e1000_82583)) {
u64 time_delta, rem, temp;
-- 
1.8.1.4



[PATCH 2/3] e1000e: e1000e_cyclecounter_read(): fix er32(SYSTIML) overflow check

2016-04-19 Thread Denys Vlasenko
If two consecutive reads of the counter are the same, it is also not an 
overflow.
"systimel_1 < systimel_2" should be "systimel_1 <= systimel_2".

Before the patch, we could perform an *erroneous* correction:

Let's say that systimel_1 == systimel_2 == 0x.
"systimel_1 < systimel_2" is false, we think it's an overflow,
we read "systimeh = er32(SYSTIMH)" which meanwhile had incremented,
and use "(systimeh << 32) + systimel_2" value which is 2^32 too large.

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
CC: Jeff Kirsher <jeffrey.t.kirs...@intel.com>
CC: Jesse Brandeburg <jesse.brandeb...@intel.com>
CC: Shannon Nelson <shannon.nel...@intel.com>
CC: Carolyn Wyborny <carolyn.wybo...@intel.com>
CC: Don Skidmore <donald.c.skidm...@intel.com>
CC: Bruce Allan <bruce.w.al...@intel.com>
CC: John Ronciak <john.ronc...@intel.com>
CC: Mitch Williams <mitch.a.willi...@intel.com>
CC: David S. Miller <da...@davemloft.net>
CC: LKML <linux-ker...@vger.kernel.org>
CC: netdev@vger.kernel.org
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
b/drivers/net/ethernet/intel/e1000e/netdev.c
index 967311b..99d0e6e 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -4287,7 +4287,7 @@ static cycle_t e1000e_cyclecounter_read(const struct 
cyclecounter *cc)
systimeh = er32(SYSTIMH);
systimel_2 = er32(SYSTIML);
/* Check for overflow. If there was no overflow, use the values */
-   if (systimel_1 < systimel_2) {
+   if (systimel_1 <= systimel_2) {
systim = (cycle_t)systimel_1;
systim |= (cycle_t)systimeh << 32;
} else {
-- 
1.8.1.4



bug in "PCI: Support INTx masking on ConnectX-4 with firmware x.14.1100+"?

2017-07-10 Thread Denys Vlasenko
+   /* Reading from resource space should be 32b aligned */
+   fw_maj_min = ioread32be(fw_ver);
+   fw_sub_min = ioread32be(fw_ver + 1);
+   fw_major = fw_maj_min & 0x;
+   fw_minor = fw_maj_min >> 16;
+   fw_subminor = fw_sub_min & 0x;

Maybe second read should be ioread32be(fw_ver + 4)?


Re: [PATCH v2] net/sctp/ulpevent.c: Deinline sctp_ulpevent_set_owner, save 1616 bytes

2017-06-21 Thread Denys Vlasenko

On 06/21/2017 09:24 PM, Marcelo Ricardo Leitner wrote:

On Wed, Jun 21, 2017 at 07:03:27PM +0200, Denys Vlasenko wrote:

This function compiles to 147 bytes of machine code. 13 callsites.

I'm no expert on SCTP events, but quick reading of SCTP docs tells me that
SCTP events are not happening on every packet.
They are ASSOC_CHANGE, PEER_ADDR_CHANGE, REMOTE_ERROR and such.
Does not look performance critical.

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
CC: Vlad Yasevich <vyasev...@gmail.com>
CC: Neil Horman <nhor...@tuxdriver.com>
CC: David Miller <da...@davemloft.net>
CC: linux-s...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org


Acked-by: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>

Just wondering, are you conducting further research on this topic?
Because we probably could use such review on SCTP stack.


Here is the list of sctp inlines which has any (however small)
benefit when deinlined.

filename:lineno:inline_name:lines_of_source_code:saved_bytes_by_deinlining:size_of_code_of_deinlined_function

include/net/sctp/command.h:228:sctp_add_cmd_sf:7:8306:38
net/sctp/ulpevent.c:91:sctp_ulpevent_set_owner:13:1616:147
include/net/sctp/sctp.h:414:sctp_skb_set_owner_r:10:934:75
net/sctp/input.c:840:sctp_hash_key:13:896:359
net/sctp/input.c:823:sctp_hash_obj:13:704:409
include/net/sctp/checksum.h:60:sctp_compute_cksum:13:595:85
net/sctp/input.c:800:sctp_hash_cmp:18:320:124
net/sctp/socket.c:117:sctp_wspace:19:256:76
include/net/sctp/sctp.h:272:sctp_max_rto:7:204:68
net/sctp/socket.c:173:sctp_verify_addr:15:192:72
include/net/sctp/checksum.h:46:sctp_csum_update:4:147:21
include/net/sctp/sctp.h:519:param_type2af:8:134:35
include/net/sctp/sctp.h:399:sctp_list_dequeue:7:123:59
include/net/sctp/sctp.h:596:sctp_transport_dst_check:4:120:60
include/net/sctp/sctp.h:435:sctp_frag_point:12:65:65
net/sctp/outqueue.c:82:sctp_outq_dequeue_data:10:64:87
include/net/sctp/command.h:243:sctp_next_cmd:4:64:37
include/net/sctp/sm.h:347:sctp_data_size:6:19:19

For .c files, the patches are trivial and I have them auto-generated,
I'll send them to you privately to save you the mechanical work.

Unfortunately, for inlines in .h files this does not work
(a human is needed to decide where to more the function).

Of course, not every deinlining makes sense.


[PATCH] liquidio: stop using huge static buffer, save 4096k in .data

2017-06-19 Thread Denys Vlasenko
Only compile-tested - I don't have the hardware.

>From code inspection, octeon_pci_write_core_mem() appears to be safe wrt
unaligned source. In any case, u8 fbuf[] was not guaranteed to be aligned
anyway.

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
CC: Felix Manlunas <felix.manlu...@cavium.com>
CC: Prasad Kanneganti <prasad.kannega...@cavium.com>
CC: Derek Chickles <derek.chick...@cavium.com>
CC: David Miller <da...@davemloft.net>
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
 drivers/net/ethernet/cavium/liquidio/octeon_console.c | 6 +-
 drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c | 4 ++--
 drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h | 2 +-
 3 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_console.c 
b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
index 53f38d0..e08f760 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_console.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_console.c
@@ -724,13 +724,11 @@ static int octeon_console_read(struct octeon_device *oct, 
u32 console_num,
 }
 
 #define FBUF_SIZE  (4 * 1024 * 1024)
-u8 fbuf[FBUF_SIZE];
 
 int octeon_download_firmware(struct octeon_device *oct, const u8 *data,
 size_t size)
 {
int ret = 0;
-   u8 *p = fbuf;
u32 crc32_result;
u64 load_addr;
u32 image_len;
@@ -805,10 +803,8 @@ int octeon_download_firmware(struct octeon_device *oct, 
const u8 *data,
else
size = FBUF_SIZE;
 
-   memcpy(p, data, size);
-
/* download the image */
-   octeon_pci_write_core_mem(oct, load_addr, p, (u32)size);
+   octeon_pci_write_core_mem(oct, load_addr, data, 
(u32)size);
 
data += size;
rem -= (u32)size;
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c 
b/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c
index 5cd96e7..4c85ae6 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c
@@ -167,10 +167,10 @@ octeon_pci_read_core_mem(struct octeon_device *oct,
 void
 octeon_pci_write_core_mem(struct octeon_device *oct,
  u64 coreaddr,
- u8 *buf,
+ const u8 *buf,
  u32 len)
 {
-   __octeon_pci_rw_core_mem(oct, coreaddr, buf, len, 0);
+   __octeon_pci_rw_core_mem(oct, coreaddr, (u8 *)buf, len, 0);
 }
 
 u64 octeon_read_device_mem64(struct octeon_device *oct, u64 coreaddr)
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h 
b/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h
index bae2fdd..47a3ff5 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.h
@@ -66,7 +66,7 @@ octeon_pci_read_core_mem(struct octeon_device *oct,
 void
 octeon_pci_write_core_mem(struct octeon_device *oct,
  u64 coreaddr,
- u8 *buf,
+ const u8 *buf,
  u32 len);
 
 #endif
-- 
2.9.2



[PATCH] net/sctp/ulpevent.c: Deinline sctp_ulpevent_set_owner, save 1616 bytes

2017-06-21 Thread Denys Vlasenko
This function compiles to 147 bytes of machine code. 13 callsites.

I'm no expert on SCTP events, but quick reading of SCTP docs tells me that
SCTP events are not happening on every packet.
They are ASSOC_CHANGE, PEER_ADDR_CHANGE, REMOTE_ERROR and such.
Does not look performance critical.

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
CC: Vlad Yasevich <vyasev...@gmail.com>
CC: Neil Horman <nhor...@tuxdriver.com>
CC: David Miller <da...@davemloft.net>
CC: linux-s...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
 net/sctp/ulpevent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index ec2b3e0..049aa5a 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -88,7 +88,7 @@ int sctp_ulpevent_is_notification(const struct sctp_ulpevent 
*event)
 /* Hold the association in case the msg_name needs read out of
  * the association.
  */
-static inline void sctp_ulpevent_set_owner(struct sctp_ulpevent *event,
+static void sctp_ulpevent_set_owner(struct sctp_ulpevent *event,
   const struct sctp_association *asoc)
 {
struct sctp_chunk *chunk = event->chunk;
-- 
1.8.3.1



[PATCH v2] net/sctp/ulpevent.c: Deinline sctp_ulpevent_set_owner, save 1616 bytes

2017-06-21 Thread Denys Vlasenko
This function compiles to 147 bytes of machine code. 13 callsites.

I'm no expert on SCTP events, but quick reading of SCTP docs tells me that
SCTP events are not happening on every packet.
They are ASSOC_CHANGE, PEER_ADDR_CHANGE, REMOTE_ERROR and such.
Does not look performance critical.

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
CC: Vlad Yasevich <vyasev...@gmail.com>
CC: Neil Horman <nhor...@tuxdriver.com>
CC: David Miller <da...@davemloft.net>
CC: linux-s...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-ker...@vger.kernel.org
---
Changed since v1:
* reindented function argument list

 net/sctp/ulpevent.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/ulpevent.c b/net/sctp/ulpevent.c
index ec2b3e0..bc3f495 100644
--- a/net/sctp/ulpevent.c
+++ b/net/sctp/ulpevent.c
@@ -88,8 +88,8 @@ int sctp_ulpevent_is_notification(const struct sctp_ulpevent 
*event)
 /* Hold the association in case the msg_name needs read out of
  * the association.
  */
-static inline void sctp_ulpevent_set_owner(struct sctp_ulpevent *event,
-  const struct sctp_association *asoc)
+static void sctp_ulpevent_set_owner(struct sctp_ulpevent *event,
+   const struct sctp_association *asoc)
 {
struct sctp_chunk *chunk = event->chunk;
struct sk_buff *skb;
-- 
1.8.3.1



[PATCH] net: make getname() functions return length rather than use int* parameter

2018-02-12 Thread Denys Vlasenko
Before:
All these functions either return a negative error indicator,
or store length of sockaddr into "int *socklen" parameter
and return zero on success.

"int *socklen" parameter is awkward. For example, if caller does not
care, it still needs to provide on-stack storage for the value
it does not need.

None of the many FOO_getname() functions of various protocols
ever used old value of *socklen. They always just overwrite it.

This change drops this parameter, and makes all these functions, on success,
return length of sockaddr. It's always >= 0 and can be differentiated
from an error.

Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

rpc_sockname() lost "int buflen" parameter, since its only use was
to be passed to kernel_getsockname() as  and subsequently
not used in any way.

Userspace API is not changed.

textdata bss  dec hex filename
30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
30108109 2633612  873672 33615393 200ee21 vmlinux.o

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
CC: David S. Miller <da...@davemloft.net>
CC: linux-ker...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-blueto...@vger.kernel.org
CC: linux-decnet-u...@lists.sourceforge.net
CC: linux-wirel...@vger.kernel.org
CC: linux-r...@vger.kernel.org
CC: linux-s...@vger.kernel.org
CC: linux-...@vger.kernel.org
CC: linux-...@vger.kernel.org
---
 drivers/isdn/mISDN/socket.c|  5 ++---
 drivers/net/ppp/pppoe.c|  6 ++
 drivers/net/ppp/pptp.c |  6 ++
 drivers/scsi/iscsi_tcp.c   | 14 +++---
 drivers/soc/qcom/qmi_interface.c   |  3 +--
 drivers/staging/ipx/af_ipx.c   |  6 ++
 drivers/staging/irda/net/af_irda.c |  8 +++-
 include/linux/net.h|  8 +++-
 include/net/inet_common.h  |  2 +-
 include/net/ipv6.h |  2 +-
 include/net/sock.h |  2 +-
 net/appletalk/ddp.c|  5 ++---
 net/atm/pvc.c  |  5 ++---
 net/atm/svc.c  |  5 ++---
 net/ax25/af_ax25.c |  4 ++--
 net/bluetooth/hci_sock.c   |  4 ++--
 net/bluetooth/l2cap_sock.c |  5 ++---
 net/bluetooth/rfcomm/sock.c|  5 ++---
 net/bluetooth/sco.c|  5 ++---
 net/can/raw.c  |  6 ++
 net/core/sock.c|  5 +++--
 net/decnet/af_decnet.c |  6 ++
 net/ipv4/af_inet.c |  5 ++---
 net/ipv6/af_inet6.c|  5 ++---
 net/iucv/af_iucv.c |  5 ++---
 net/l2tp/l2tp_ip.c |  5 ++---
 net/l2tp/l2tp_ip6.c|  5 ++---
 net/l2tp/l2tp_ppp.c|  5 ++---
 net/llc/af_llc.c   |  5 ++---
 net/netlink/af_netlink.c   |  5 ++---
 net/netrom/af_netrom.c |  9 +
 net/nfc/llcp_sock.c|  5 ++---
 net/packet/af_packet.c | 10 --
 net/phonet/socket.c|  5 ++---
 net/qrtr/qrtr.c|  5 ++---
 net/rds/af_rds.c   |  5 ++---
 net/rds/tcp.c  |  7 ++-
 net/rose/af_rose.c |  5 ++---
 net/sctp/ipv6.c|  8 
 net/smc/af_smc.c   |  7 +++
 net/socket.c   | 35 +--
 net/sunrpc/clnt.c  |  6 +++---
 net/sunrpc/svcsock.c   | 13 -
 net/sunrpc/xprtsock.c  |  3 +--
 net/tipc/socket.c  |  5 ++---
 net/unix/af_unix.c | 10 +-
 net/vmw_vsock/af_vsock.c   |  4 ++--
 net/x25/af_x25.c   |  4 ++--
 48 files changed, 132 insertions(+), 171 deletions(-)

diff --git a/drivers/isdn/mISDN/socket.c b/drivers/isdn/mISDN/socket.c
index c5603d1a07d6..1f8f489b4167 100644
--- a/drivers/isdn/mISDN/socket.c
+++ b/drivers/isdn/mISDN/socket.c
@@ -560,7 +560,7 @@ data_sock_bind(struct socket *sock, struct sockaddr *addr, 
int addr_len)
 
 static int
 data_sock_getname(struct socket *sock, struct sockaddr *addr,
- int *addr_len, int peer)
+ int peer)
 {
struct sockaddr_mISDN   *maddr = (struct sockaddr_mISDN *) addr;
struct sock *sk = sock->sk;
@@ -570,14 +570,13 @@ data_sock_getname(struct socket *sock, struct sockaddr 
*addr,
 
lock_sock(sk);
 
-   *addr_len = sizeof(*maddr);
maddr->family = AF_ISDN;
maddr->dev = _pms(sk)->dev->id;
maddr->channel = _pms(sk)->ch.nr;
maddr->sapi = _pms(sk)->ch.addr & 0xff;
maddr->tei = (_pms(sk)->ch.addr >> 8) & 0xff;
release_sock(sk);
-   return 0;
+   return sizeof(*maddr);
 }
 
 static const struct proto_ops data_sock_ops = {
diff --git a/drivers/net/ppp/pppoe.c b/drivers/

[PATCH v2] net: make getname() functions return length rather than use int* parameter

2018-02-12 Thread Denys Vlasenko
Changes since v1:
Added changes in these files:
drivers/infiniband/hw/usnic/usnic_transport.c
drivers/staging/lustre/lnet/lnet/lib-socket.c
drivers/target/iscsi/iscsi_target_login.c
drivers/vhost/net.c
fs/dlm/lowcomms.c
fs/ocfs2/cluster/tcp.c
security/tomoyo/network.c


Before:
All these functions either return a negative error indicator,
or store length of sockaddr into "int *socklen" parameter
and return zero on success.

"int *socklen" parameter is awkward. For example, if caller does not
care, it still needs to provide on-stack storage for the value
it does not need.

None of the many FOO_getname() functions of various protocols
ever used old value of *socklen. They always just overwrite it.

This change drops this parameter, and makes all these functions, on success,
return length of sockaddr. It's always >= 0 and can be differentiated
from an error.

Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

rpc_sockname() lost "int buflen" parameter, since its only use was
to be passed to kernel_getsockname() as  and subsequently
not used in any way.

Userspace API is not changed.

textdata bss  dec hex filename
30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
30108109 2633612  873672 33615393 200ee21 vmlinux.o

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>
CC: David S. Miller <da...@davemloft.net>
CC: linux-ker...@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-blueto...@vger.kernel.org
CC: linux-decnet-u...@lists.sourceforge.net
CC: linux-wirel...@vger.kernel.org
CC: linux-r...@vger.kernel.org
CC: linux-s...@vger.kernel.org
CC: linux-...@vger.kernel.org
CC: linux-...@vger.kernel.org
---
 drivers/infiniband/hw/usnic/usnic_transport.c |  5 ++--
 drivers/isdn/mISDN/socket.c   |  5 ++--
 drivers/net/ppp/pppoe.c   |  6 ++---
 drivers/net/ppp/pptp.c|  6 ++---
 drivers/scsi/iscsi_tcp.c  | 14 +--
 drivers/soc/qcom/qmi_interface.c  |  3 +--
 drivers/staging/ipx/af_ipx.c  |  6 ++---
 drivers/staging/irda/net/af_irda.c|  8 +++---
 drivers/staging/lustre/lnet/lnet/lib-socket.c |  7 +++---
 drivers/target/iscsi/iscsi_target_login.c | 18 +++---
 drivers/vhost/net.c   |  7 +++---
 fs/dlm/lowcomms.c |  7 +++---
 fs/ocfs2/cluster/tcp.c|  6 ++---
 include/linux/net.h   |  8 +++---
 include/net/inet_common.h |  2 +-
 include/net/ipv6.h|  2 +-
 include/net/sock.h|  2 +-
 net/appletalk/ddp.c   |  5 ++--
 net/atm/pvc.c |  5 ++--
 net/atm/svc.c |  5 ++--
 net/ax25/af_ax25.c|  4 +--
 net/bluetooth/hci_sock.c  |  4 +--
 net/bluetooth/l2cap_sock.c|  5 ++--
 net/bluetooth/rfcomm/sock.c   |  5 ++--
 net/bluetooth/sco.c   |  5 ++--
 net/can/raw.c |  6 ++---
 net/core/sock.c   |  5 ++--
 net/decnet/af_decnet.c|  6 ++---
 net/ipv4/af_inet.c|  5 ++--
 net/ipv6/af_inet6.c   |  5 ++--
 net/iucv/af_iucv.c|  5 ++--
 net/l2tp/l2tp_ip.c|  5 ++--
 net/l2tp/l2tp_ip6.c   |  5 ++--
 net/l2tp/l2tp_ppp.c   |  5 ++--
 net/llc/af_llc.c  |  5 ++--
 net/netlink/af_netlink.c  |  5 ++--
 net/netrom/af_netrom.c|  9 ---
 net/nfc/llcp_sock.c   |  5 ++--
 net/packet/af_packet.c| 10 +++-
 net/phonet/socket.c   |  5 ++--
 net/qrtr/qrtr.c   |  5 ++--
 net/rds/af_rds.c  |  5 ++--
 net/rds/tcp.c |  7 ++
 net/rose/af_rose.c|  5 ++--
 net/sctp/ipv6.c   |  8 +++---
 net/smc/af_smc.c  | 11 -
 net/socket.c  | 35 +--
 net/sunrpc/clnt.c |  6 ++---
 net/sunrpc/svcsock.c  | 13 ++
 net/sunrpc/xprtsock.c |  3 +--
 net/tipc/socket.c |  5 ++--
 net/unix/af_unix.c| 10 
 net/vmw_vsock/af_vsock.c  |  4 +--
 net/x25/af_x25.c  |  4 +--
 security/tomoyo/network.c |  5 ++--
 55 file

Re: [PATCH] net: make getname() functions return length rather than use int* parameter

2018-02-12 Thread Denys Vlasenko

On 02/12/2018 06:47 PM, David Miller wrote:

From: Denys Vlasenko <dvlas...@redhat.com>
Date: Mon, 12 Feb 2018 15:15:18 +0100


Before:
All these functions either return a negative error indicator,
or store length of sockaddr into "int *socklen" parameter
and return zero on success.

"int *socklen" parameter is awkward. For example, if caller does not
care, it still needs to provide on-stack storage for the value
it does not need.

None of the many FOO_getname() functions of various protocols
ever used old value of *socklen. They always just overwrite it.

This change drops this parameter, and makes all these functions, on success,
return length of sockaddr. It's always >= 0 and can be differentiated
from an error.

Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

rpc_sockname() lost "int buflen" parameter, since its only use was
to be passed to kernel_getsockname() as  and subsequently
not used in any way.

Userspace API is not changed.

 textdata bss  dec hex filename
30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
30108109 2633612  873672 33615393 200ee21 vmlinux.o

Signed-off-by: Denys Vlasenko <dvlas...@redhat.com>


Please do an allmodconfig build, there are still some conversions you
missed:

security/tomoyo/network.c: In function ‘tomoyo_socket_listen_permission’:
security/tomoyo/network.c:658:19: warning: passing argument 3 of 
‘sock->ops->getname’ makes integer from pointer without a cast 
[-Wint-conversion]
 , _len, 0);
^
security/tomoyo/network.c:658:19: note: expected ‘int’ but argument is of type 
‘int *’
security/tomoyo/network.c:657:21: error: too many arguments to function 
‘sock->ops->getname’
const int error = sock->ops->getname(sock, (struct sockaddr *)
  ^~~~
fs/dlm/lowcomms.c: In function ‘lowcomms_error_report’:
fs/dlm/lowcomms.c:495:6: error: too many arguments to function 
‘kernel_getpeername’
   kernel_getpeername(con->sock, (struct sockaddr *), )) {
   ^~
fs/dlm/lowcomms.c: In function ‘tcp_accept_from_sock’:
fs/dlm/lowcomms.c:761:7: warning: passing argument 3 of ‘newsock->ops->getname’ 
makes integer from pointer without a cast [-Wint-conversion]
, 2)) {
^
fs/dlm/lowcomms.c:761:7: note: expected ‘int’ but argument is of type ‘int *’
fs/dlm/lowcomms.c:760:6: error: too many arguments to function 
‘newsock->ops->getname’
   if (newsock->ops->getname(newsock, (struct sockaddr *),
   ^~~


Sorry. Will send updated patch.