Re: Network stack changes

2013-09-19 Thread Luigi Rizzo
On Thu, Sep 19, 2013 at 03:54:34PM -0400, George Neville-Neil wrote:
 
 On Sep 14, 2013, at 15:24 , Luigi Rizzo ri...@iet.unipi.it wrote:
 
  
  
  On Saturday, September 14, 2013, Olivier Cochard-Labbé oliv...@cochard.me 
  wrote:
   On Sat, Sep 14, 2013 at 4:28 PM, Luigi Rizzo ri...@iet.unipi.it wrote:
  
   IXIA ? For the timescales we need to address we don't need an IXIA,
   a netmap sender is more than enough
  
  
   The great netmap generates only one IP flow (same src/dst IP and same
   src/dst port).
  
  True the sample app generates only one flow but it is trivial to modify it 
  to generate multiple flows. My point was, we have the ability to generate 
  high rate traffic, as long as we do tolerate a .1-1us jitter. Beyond that, 
  you do need some ixia-like solution.
  
 
 On the bandwidth side, can a modern sender with netmap really do a full 10G?  
 I hate the cost of an
 IXIA but I have not been able to destroy our stack as effectively with 
 anything else.

yes george, you can download the picobsd image

http://info.iet.unipi.it/~luigi/netmap/20120618-netmap-picobsd-head-amd64.bin

and try for yourself.

Granted this does not have all the knobs of an ixia but it can
surely blast the full 14.88 Mpps to the link, and it only takes a
bit of userspace programming to generate reasonably arbitrary streams
of packets. A netmap sender/receiver is not CPU bound even with 1 core.
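
For reference, the 14.88 Mpps figure is the 10 Gbit/s line rate for
minimum-size frames: each 64-byte frame also occupies 8 bytes of
preamble/SFD and 12 bytes of inter-frame gap on the wire, i.e. 84 bytes
or 672 bits. A small sketch of the arithmetic (the name line_rate_pps
is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Ethernet per-frame overhead on the wire, in bytes. */
enum { PREAMBLE_SFD = 8, INTERFRAME_GAP = 12 };

/*
 * Maximum packets per second for a given link speed (bits/s) and
 * frame size (bytes, including the 4-byte CRC).
 * Integer division truncates the result.
 */
static uint64_t
line_rate_pps(uint64_t link_bps, unsigned frame_bytes)
{
	uint64_t wire_bits;

	assert(frame_bytes >= 64);	/* minimum Ethernet frame */
	wire_bits = 8ULL * (frame_bytes + PREAMBLE_SFD + INTERFRAME_GAP);
	return link_bps / wire_bits;
}
```

With link_bps = 10000000000 and 64-byte frames this gives 14880952
packets per second, which is where "14.88 Mpps" comes from.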

cheers
luigi


 Best,
 George


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: Network stack changes

2013-09-14 Thread Luigi Rizzo
On Fri, Sep 13, 2013 at 11:08:27AM -0400, George Neville-Neil wrote:
 
 On Aug 29, 2013, at 7:49 , Adrian Chadd adr...@freebsd.org wrote:
...
  I still have some tool coding to do with PMC before I even think about
  tinkering with this as I'd like to measure stuff like per-packet latency as
  well as top-level processing overhead (ie, CPU_CLK_UNHALTED.THREAD_P /
  lagg0 TX bytes/pkts, RX bytes/pkts, NIC interrupts on that core, etc.)
  
 
 This would be very useful in identifying the actual hot spots, and would be 
 helpful
 to anyone who can generate a decent stream of packets with, say, an IXIA.

IXIA ? For the timescales we need to address we don't need an IXIA,
a netmap sender is more than enough

cheers
luigi


Re: Network stack changes

2013-09-14 Thread Luigi Rizzo
On Fri, Sep 13, 2013 at 11:08:27AM -0400, George Neville-Neil wrote:
 
 On Aug 29, 2013, at 7:49 , Adrian Chadd adr...@freebsd.org wrote:
...
 One quick note here.  Every time you increase batching you may increase 
 bandwidth
 but you will also increase per packet latency for the last packet in a batch.

The ones who suffer are the first ones, because their processing
is somewhat delayed to 1) let the input batch build up, and 2) complete
processing of the batch before pushing results to the next stage.

However one should never wait for an input batch to grow; you process
whatever your source gives you (one or more packets)
by the time you are ready (and if you are slow/overloaded, of course
you will get a large backlog at once). Either way, there is no
reason to create additional delay on input.

cheers
luigi


Re: Network stack changes

2013-09-14 Thread Luigi Rizzo
On Saturday, September 14, 2013, Olivier Cochard-Labbé oliv...@cochard.me
wrote:
 On Sat, Sep 14, 2013 at 4:28 PM, Luigi Rizzo ri...@iet.unipi.it wrote:

 IXIA ? For the timescales we need to address we don't need an IXIA,
 a netmap sender is more than enough


 The great netmap generates only one IP flow (same src/dst IP and same
 src/dst port).

True the sample app generates only one flow but it is trivial to modify it
to generate multiple flows. My point was, we have the ability to generate
high rate traffic, as long as we do tolerate a .1-1us jitter. Beyond that,
you do need some ixia-like solution.
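
One possible way to spread the traffic over several flows is to vary
the source port per packet. This is only a sketch: struct flow and
nth_flow are hypothetical names, not part of the netmap sample app,
and a real sender would also have to fix up the UDP/IP checksums
after changing the header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flow description (host byte order). */
struct flow {
	uint32_t src_ip, dst_ip;
	uint16_t src_port, dst_port;
};

/*
 * Return the flow to use for packet number pkt_index when cycling
 * through nflows consecutive source ports starting at base->src_port.
 * The port wraps around modulo 65536.
 */
static struct flow
nth_flow(const struct flow *base, uint32_t nflows, uint64_t pkt_index)
{
	struct flow f = *base;

	assert(nflows > 0 && nflows <= 65536);
	f.src_port = (uint16_t)(base->src_port + pkt_index % nflows);
	return f;
}
```

With nflows matching or exceeding the number of RX queues, a
multi-queue NIC on the router has a chance to spread the load.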

Cheers
Luigi


 This doesn't permit testing a multi-queue NIC (or SMP packet-filter) on a
 simple lab like this:
 netmap sender = freebsd router = netmap receiver

 Regards,

 Olivier


-- 
-+---
 Prof. Luigi RIZZO, ri...@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/. Universita` di Pisa
 TEL  +39-050-2211611   . via Diotisalvi 2
 Mobile   +39-338-6809875   . 56122 PISA (Italy)
-+---


Re: [PATCH] multiple instances of ipfw(4)

2013-06-10 Thread Luigi Rizzo
On Mon, Jun 10, 2013 at 3:30 PM, Ermal Luçi e...@freebsd.org wrote:

 Hello,

 reviving this old thread since i had time to bring the patch to FreeBSD 10
 and unified the whole controlling under ipfw(8) binary.

 For reminder, the patch located at [1] provides multiple instances for
 ipfw(4).
 Basically you can control which interfaces belong to which context/ruleset
 to make maintaining easier.


...



 Any objections on pushing this into FreeBSD?


 [1]

 https://github.com/pfsense/pfsense-tools/blob/master/patches/RELENG_10_0/CP_multi_instance_ipfw.diff




if i understand well, this has no runtime overhead as the ifp has
the index of the context it refers to ?
Or you need an additional IPFW_CTX_RLOCK() ?

Comments on the control/config path:
- in ipfw_ctl(), handling IP_FW_CTX_GET i am worried that you might
  overflow the temporary buffer when building the list. You compute
  the length under rlock, release the lock, malloc(), then fill the
  list without checking if the total size is still correct.
  This kind of code is terribly boring to write, but essentially
  you need a bound check in the second loop and possibly
  retry if you notice that you need more memory.
  ipfw show addresses the problem by failing and requesting the
  user application to pass a larger buffer.

- similarly, how do you guarantee that deleting a context while
  a packet is under processing does not cause dereferencing a
  NULL pointer ?
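
The size/allocate/fill pattern with a bound check and retry could look
roughly like this. It is a userspace sketch, not the patch's code: a
pthread mutex stands in for the kernel rlock, and ctx_names, ctx_count
and ctx_list_get are made-up names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CTX 64			/* hypothetical limit */

static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static const char *ctx_names[MAX_CTX];	/* hypothetical context table */
static int ctx_count;

/*
 * Return a malloc()ed buffer holding all context names, each
 * NUL-terminated; *lenp receives the number of bytes used.
 * Returns NULL if malloc() fails.
 */
static char *
ctx_list_get(size_t *lenp)
{
	assert(lenp != NULL);
	for (;;) {
		size_t need = 0, used = 0;
		char *buf;
		int i, fits = 1;

		/* Pass 1: compute the size under the lock. */
		pthread_mutex_lock(&ctx_lock);
		for (i = 0; i < ctx_count; i++)
			need += strlen(ctx_names[i]) + 1;
		pthread_mutex_unlock(&ctx_lock);

		/* Allocate without holding the lock. */
		buf = malloc(need ? need : 1);
		if (buf == NULL)
			return NULL;

		/* Pass 2: the list may have grown, so check every copy. */
		pthread_mutex_lock(&ctx_lock);
		for (i = 0; i < ctx_count; i++) {
			size_t n = strlen(ctx_names[i]) + 1;

			if (n > need - used) {	/* would overflow */
				fits = 0;
				break;
			}
			memcpy(buf + used, ctx_names[i], n);
			used += n;
		}
		pthread_mutex_unlock(&ctx_lock);

		if (fits) {
			*lenp = used;
			return buf;
		}
		free(buf);		/* size changed, retry */
	}
}
```

The alternative mentioned above (fail and let the caller pass a larger
buffer) avoids the retry loop at the cost of an extra round trip.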

cheers
luigi

 --
 Ermal
 ___
 freebsd-...@freebsd.org mailing list
 http://lists.freebsd.org/mailman/listinfo/freebsd-net
 To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org






Re: [PATCH] multiple instances of ipfw(4)

2013-06-10 Thread Luigi Rizzo
On Mon, Jun 10, 2013 at 06:52:01PM +0200, Ermal Luçi wrote:
 On Mon, Jun 10, 2013 at 5:01 PM, Luigi Rizzo ri...@iet.unipi.it wrote:
...
  if i understand well, this has no runtime overhead as the ifp has
  the index of the context it refers to ?
  Or you need an additional IPFW_CTX_RLOCK() ?
 
 
 Theoretically you would need for correctness the read lock.
 It has never been hit in pfSense hence no further investigation on it has
 been done.
 It can be made even a read mostly lock or to prevent the race the  write
 lock
 of the pfil hooks so no packets are passed through?!

adding another lock (even just a read lock) around invocations is
undesirable in my opinion. I'd rather check if there is already
some other lock which is already held so we can use it to protect
the list of contexts.

  Comments on the control/config path:
  - in ipfw_ctl(), handling IP_FW_CTX_GET i am worried that you might
overflow the temporary buffer when building the list. You compute
the length under rlock, release the lock, malloc(), then fill the
list without checking if the total size is still correct.
This kind of code is terribly boring to write, but essentially
you need a bound check in the second loop and possibly
retry if you notice that you need more memory.
ipfw show addresses the problem by failing and requesting the
user application to pass a larger buffer.
 
 
 Yeah that probably can be fixed.
 During implementation it was considered a rare enough operation not to
 justify further thought.

well, unlike the previous problem (locking), this has a very simple fix
and no performance implications so there are really no excuses...

 If you agree with the above i can redo the patch again with the above
 changes for review?

i would just be happy with the fix to IP_FW_CTX_GET and a big red flashing
comment in the place where the context is being accessed.
Or if you can find another lock to recycle, fine.

cheers
luigi


Re: Make kernel aware of NIC queues

2013-02-06 Thread Luigi Rizzo
On Wed, Feb 06, 2013 at 06:19:27PM +0400, Alexander V. Chernikov wrote:
 Hello list!
 
 Today more and more NICs are capable of splitting traffic to different 
 Rx/TX rings permitting OS to dispatch this traffic on different CPU 
 cores. However, there are some problems that arise from using multi-NIC 
 (or even single multi-port NIC) configurations:
...
 I propose implementing common API to permit drivers:
 * read user-supplied number of queues/other queue options (e.g:
 * notify kernel of each RX/TX queue being created/destroyed
 * make binding queues to cores via given API
 * Export data to userland (for example, via sysctl) to permit users:
 a) quickly see current configuration
 b) change CPU binding on-fly
 c) change flowid numbers on-fly (with the possibility to set 1) 
 NIC-supplied hash 2) manually supplied value 3) disable setting M_FLOWID)
 
 Having common interface will help users to make network stack tuning 
 easier and puts us one step further to make (probably userland) AI which 
 can auto-tune system according to template (router, webserver) and 
 rc.conf configuration (lagg presence, etc..)
 
 
 What do you guys think?

this is certainly a good idea and a welcome one.

Linux has tried to come up with a common framework to implement
this kind of controls using ethtool, and we should probably
have a look at their approach and reuse it (or at least the good ideas)
to avoid reinventing the same thing.

cheers
luigi


Re: Make kernel aware of NIC queues

2013-02-06 Thread Luigi Rizzo
On Wed, Feb 06, 2013 at 11:05:59AM -0500, George Neville-Neil wrote:
 
 On Feb 6, 2013, at 09:37 , Luigi Rizzo ri...@iet.unipi.it wrote:
...
  Linux has tried to come up with a common framework to implement
  this kind of controls using ethtool, and we should probably
  have a look at their approach and reuse it (or at least the good ideas)
  to avoid reinventing the same thing.
  
 And, though Luigi didn't say it, I will, this should integrate with netmap.

i did not say it because it will work without any extra effort:
- the netmap version i committed a few days ago already fetches
  the number of queues and the ring sizes at runtime;
- ethtool (or whatever we will call it) only operates on the
  configuration/control plane (number of queues and slots,
  partitioning of packets onto input queues, etc.), whereas netmap
  operates only on the data plane

cheers
luigi


speed tests (Re: Replace bcopy() to update ether_addr)

2012-08-22 Thread Luigi Rizzo
On Wed, Aug 22, 2012 at 02:32:21AM +0000, Bruce Evans wrote:
 luigi wrote:
 
  even more orthogonal:
  
  I found that copying 8n + (5, 6 or 7) bytes was much much slower than
  copying a multiple of 8 bytes. For n=0, 1,2,4,8 bytes are efficient,
  other cases are slow (turned into 2 or 3 different writes).
  
  The netmap code uses a pkt_copy routine that does exactly this
  rounding, gaining some 10-20ns per packet for small sizes.
 
 I don't believe 10-20ns for just the extra bytes.  memcpy() ends up
 with a movsb to copy the extra bytes.  This can be slow, but I don't
 believe 10-20ns (except on machines running at i486 speeds of course).

I am adding at the end a test program so people can try things on their hw.

Build it with

cc -O2 -Werror -Wall -Wextra  -lpthread -lrt testlock.c -o testlock


and on my i7 i get these results:

./testlock -m memcpy -l 7     - ~23 Mops/s    43 ns/cycle
./testlock -m bcopy -l 7      - ~10 Mops/s   100 ns/cycle
./testlock -m fastcopy -l 7   - ~64 Mops/s    16 ns/cycle
   (fastcopy rounds to the next multiple of 8)

Changing the length (-l ...) changes the speed, of course.
For some reason my machine is fast for 8n+(0,1,2,3) and slow for
8n+(4,5,6,7).
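
The rounding idea can be sketched as follows. This is an illustration
in the spirit of fastcopy, not the netmap pkt_copy code itself, and it
assumes that both buffers are padded so that reading and writing up to
the next multiple of 8 bytes is safe (fast_copy is a made-up name):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy len bytes rounded up to the next multiple of 8, one 64-bit
 * word at a time. ASSUMPTION: src and dst both have room for the
 * rounded length; up to 7 bytes past len are overwritten in dst.
 * The fixed-size memcpy() calls are normally compiled into single
 * 8-byte loads and stores, and are safe for any alignment.
 */
static void
fast_copy(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	size_t rounded = (len + 7) & ~(size_t)7;
	size_t off;

	assert(dst != NULL && src != NULL);
	for (off = 0; off < rounded; off += 8) {
		uint64_t w;

		memcpy(&w, s + off, sizeof(w));
		memcpy(d + off, &w, sizeof(w));
	}
}
```

A 7-byte request therefore moves exactly one 8-byte word instead of a
4 + 2 + 1 sequence of smaller writes.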

cheers
luigi


/*
 * Copyright (C) 2012 Luigi Rizzo. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *   1. Redistributions of source code must retain the above copyright
 *  notice, this list of conditions and the following disclaimer.
 *   2. Redistributions in binary form must reproduce the above copyright
 *  notice, this list of conditions and the following disclaimer in the
 *documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

/*
 * $Id: testlock.c 11731 2012-08-22 14:19:50Z luigi $
 *
 * Test program to study various ops and concurrency issues.
 * Create multiple threads, possibly bind to cpus, and run a workload.
 *
 * cc -O2 -Werror -Wall testlock.c -o testlock -lpthread
 *  you might need -lrt
 */

#include <inttypes.h>
#include <sys/types.h>
#include <pthread.h>	/* pthread_* */

#if defined(__APPLE__)

#include <libkern/OSAtomic.h>
#define atomic_add_int(p, n) OSAtomicAdd32(n, (int *)p)
#define atomic_cmpset_32(p, o, n)   OSAtomicCompareAndSwap32(o, n, (int *)p)

#elif defined(linux)

/* Non-atomic fallback: store new only if *p still equals old. */
int atomic_cmpset_32(volatile uint32_t *p, uint32_t old, uint32_t new)
{
	int ret = *p == old;
	if (ret)
		*p = new;
	return ret;
}

#if defined(HAVE_GCC_ATOMICS)
int atomic_add_int(volatile int *p, int v)
{
return __sync_fetch_and_add(p, v);
}
#else
inline
uint32_t atomic_add_int(uint32_t *p, int v)
{
	__asm __volatile (
	"   lock   xaddl   %0, %1 ;"
	: "+r" (v),	/* 0 (result) */
	  "=m" (*p)	/* 1 */
	: "m" (*p));	/* 2 */
return (v);
}
#endif

#else /* FreeBSD */
#include <sys/param.h>
#include <machine/atomic.h>
#include <pthread_np.h> /* pthread w/ affinity */

#if __FreeBSD_version > 500000
#include <sys/cpuset.h> /* cpu_set */
#if __FreeBSD_version > 800000
#define HAVE_AFFINITY
#endif

inline void prefetch (const void *x)
{
	__asm volatile("prefetcht0 %0" :: "m" (*(const unsigned long *)x));
}


#else /* FreeBSD 4.x */
/* Non-atomic fallback: store new only if *p still equals old. */
int atomic_cmpset_32(volatile uint32_t *p, uint32_t old, uint32_t new)
{
	int ret = *p == old;
	if (ret)
		*p = new;
	return ret;
}

#define PRIu64  "llu"
#endif /* FreeBSD 4.x */

#endif /* FreeBSD */

#include <signal.h> /* signal */
#include <stdlib.h>
#include <stdio.h>
#include <poll.h>
#include <inttypes.h>   /* PRI* macros */
#include <string.h> /* strcmp */
#include <fcntl.h>  /* open */
#include <unistd.h> /* getopt */


#include <sys/sysctl.h> /* sysctl */
#include <sys/time.h>   /* timersub */

static inline int min(int a, int b) { return a < b ? a : b; }

#define ONE_MILLION 1000000
/* debug support */
#define ND(format, ...) 
#define D(format, ...)  \
	fprintf(stderr, "%s [%d] " format "\n", \
	__FUNCTION__, __LINE__, ##__VA_ARGS__)

int verbose = 0;

#if 1//def

Re: speed tests (Re: Replace bcopy() to update ether_addr)

2012-08-22 Thread Luigi Rizzo
On Wed, Aug 22, 2012 at 05:26:47PM +0300, Mitya wrote:
 22.08.2012 17:36, Luigi Rizzo wrote:
 On Wed, Aug 22, 2012 at 02:32:21AM +0000, Bruce Evans wrote:
 luigi wrote:
 
 even more orthogonal:
 
 I found that copying 8n + (5, 6 or 7) bytes was much much slower than
 copying a multiple of 8 bytes. For n=0, 1,2,4,8 bytes are efficient,
 other cases are slow (turned into 2 or 3 different writes).
 
 The netmap code uses a pkt_copy routine that does exactly this
 rounding, gaining some 10-20ns per packet for small sizes.
 I don't believe 10-20ns for just the extra bytes.  memcpy() ends up
 with a movsb to copy the extra bytes.  This can be slow, but I don't
 believe 10-20ns (except on machines running at i486 speeds of course).
 I am adding at the end a test program so people can try things on their hw.
 
 Build it with
 
  cc -O2 -Werror -Wall -Wextra  -lpthread -lrt testlock.c -o testlock
 
 
 
 # uname -a
 FreeBSD m18.cabletv.dp.ua 9.0-STABLE FreeBSD 9.0-STABLE #1: Tue Apr 24 
 13:23:05 EEST 2012 r...@m18.cabletv.dp.ua:/usr/src/sys/i386/compile/m18 i386
 
 cc -O2 -Werror -Wall -Wextra  -lpthread -lrt testlock.c -o testlock
 
 testlock.c: In function 'test_rdtsc':
 testlock.c:151: error: can't find a register in class 'AD_REGS' while 
 reloading 'asm'
 testlock.c:151: error: 'asm' operand has impossible constraints

i forgot to mention that i tried this only on amd64, my ASM is horrible.
Just comment out the offending lines and do not run those tests.

Or, if you have a portable fix, let me know and everybody will
appreciate it.

cheers
luigi



Re: Replace bcopy() to update ether_addr

2012-08-22 Thread Luigi Rizzo
On Wed, Aug 22, 2012 at 03:21:06PM -0400, John Baldwin wrote:
 On Wednesday, August 22, 2012 2:54:07 pm Adrian Chadd wrote:
  On 22 August 2012 05:02, John Baldwin j...@freebsd.org wrote:
   On Tuesday, August 21, 2012 12:34:42 pm Adrian Chadd wrote:
   Hi,
  
   What about just creating an ETHER_ADDR_COPY(dst, src) and putting that
   in a relevant include file, then hide the ugliness there?
  
   The same benefits will likely appear when copying wifi MAC addresses
   to/from headers.
  
   Thanks, I'm glad someone noticed this.
  
   I doubt we even _need_ the ugliness.  We should just use *dst = *src
   unless there is a compelling reason not to.
  
  Because it's not very clear? :-) I'd much prefer my array-of-things
  copies to be explicit.
 
 Eh?  'struct foo *src, *dst; *dst = *src' is pretty bog-standard C.  That 
 isn't really all that obtuse.

the thread has probably forked causing people to miss the explanation
that Bruce gave: quite often the function is called by casting
arbitrary pointers into 'struct foo *', so the compiler's expectations
about alignment do not necessarily match the user's lies.

Unfortunately we are building kernels with many compiler checks
disabled, so there is a fair chance that the compiler will not
detect such invalid casts.

Probably addresses are aligned to 2-byte boundaries, but certainly
not on a 4-byte, which means that a safe copy might require 3
instructions, even though a compiler could otherwise decide to align
all non-packed 'struct foo' to a 4- or 8-byte boundary and possibly
do the copy with 2 or even 1 instruction.
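
To make the distinction concrete, here is a small sketch (the union
and function names are illustrative, not the kernel's). Structure
assignment is fine when the pointers really refer to objects of that
type, so the 2-byte alignment the compiler assumes actually holds; for
pointers cast from arbitrary packet offsets, memcpy() makes no
alignment assumption:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ETHER_ADDR_LEN 6

/* Hypothetical type: the uint16_t member forces 2-byte alignment. */
union ether_addr_u {
	uint8_t  octet[ETHER_ADDR_LEN];
	uint16_t word[ETHER_ADDR_LEN / 2];
};

/*
 * Valid only if dst and src really point to union ether_addr_u
 * objects; the compiler may then use 16-bit or wider moves.
 */
static void
ether_copy_struct(union ether_addr_u *dst, const union ether_addr_u *src)
{
	assert(dst != NULL && src != NULL);
	*dst = *src;
}

/* Safe for any alignment, e.g. an odd offset inside a packet buffer. */
static void
ether_copy_any(void *dst, const void *src)
{
	assert(dst != NULL && src != NULL);
	memcpy(dst, src, ETHER_ADDR_LEN);
}
```

Casting a packet pointer to the union type and calling
ether_copy_struct() is exactly the kind of "lie" about alignment
described above.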

I would also suggest trying the code i posted in response to bruce
so you can check how good or bad the various solutions are on
different architectures or CPUs, and see if there is a reasonable
compromise.

cheers
luigi

  Also, the optimisation and compiler silliness may not be THAT obvious
  on intel (except when you're luigi and using netmap) but I can't help
  but wonder whether the same does hold for MIPS/ARM. Getting it wrong
  there will lead to some very very poor performing code.
 
 Don't you think there's a really good chance the compiler knows how to copy a 
 structure appropriately for each architecture already?
 
 -- 
 John Baldwin


Re: Replace bcopy() to update ether_addr

2012-08-21 Thread Luigi Rizzo
On Tue, Aug 21, 2012 at 12:26:30PM +0200, Marius Strobl wrote:
...
  Why do we use bcopy() to copy only 6 bytes?
  Answer: on some architectures we cannot directly copy unaligned data.
  
  I propose this solution.
  
  In file /usr/src/include/net/ethernet.h add these lines:
  
  static inline void ether_addr_copy(ether_addr* src, ether_addr* dst) {
  #if defined(__i386__) || defined(__amd64__)
  *dst = *src;
  #else
  bcopy(src, dst, ETHER_ADDR_LEN);
  #endif
  }
...
  All these variants are much faster than bcopy()
  
 
 A bit orthogonal to this but also related to the performance
 impact of these bcopy() calls, for !__NO_STRICT_ALIGNMENT
 architectures these places probably should use memcpy()
 instead as bcopy() additionally has to check for overlap
 while the former does not. Overlaps are unlikely to be an issue
 in these cases and at least NetBSD apparently has done the
 switch to memcpy() 5.5 years ago.

even more orthogonal:

I found that copying 8n + (5, 6 or 7) bytes was much much slower than
copying a multiple of 8 bytes. For n=0, 1,2,4,8 bytes are efficient,
other cases are slow (turned into 2 or 3 different writes).

The netmap code uses a pkt_copy routine that does exactly this
rounding, gaining some 10-20ns per packet for small sizes.

cheers
luigi


Re: FreeBSD 10G forwarding performance @Intel

2012-07-03 Thread Luigi Rizzo
On Tue, Jul 03, 2012 at 08:11:14PM +0400, Alexander V. Chernikov wrote:
 Hello list!
 
 I'm quite stuck with bad forwarding performance on many FreeBSD boxes 
 doing firewalling.
...
 In most cases system can forward no more than 700 (or 1400) kpps which 
 is quite a bad number (Linux does, say, 5MPPs on nearly the same hardware).

among the many interesting tests you have run, i am curious
if you have tried to remove the update of the counters on route
entries. They might be another severe contention point.

cheers
luigi


Re: FreeBSD 10G forwarding performance @Intel

2012-07-03 Thread Luigi Rizzo
On Tue, Jul 03, 2012 at 09:37:38PM +0400, Alexander V. Chernikov wrote:
...
 Thanks, another good point. I forgot to merge this option from andre's 
 patch.
 
 Another 30-40-50kpps to win.

not much gain though.
What about the other IPSTAT_INC counters ?
I think the IPSTAT_INC macros were introduced (by rwatson ?)
following a discussion on how to make the counters per-cpu
and avoid the contention on cache lines.
But they are still implemented as a single instance,
and neither volatile nor atomic, so it is not even clear
that they can give reliable results, let alone the fact
that you are likely to get some cache misses.

the relevant macro is in ip_var.h.

Cheers
luigi

 
 +u_int rt_count  = 1;
 +SYSCTL_INT(_net, OID_AUTO, rt_count, CTLFLAG_RW, &rt_count, 1, "");
 
 @@ -601,17 +625,20 @@ passout:
 if (error != 0)
 IPSTAT_INC(ips_odropped);
 else {
 -   ro.ro_rt->rt_rmx.rmx_pksent++;
 +   if (rt_count)
 +   ro.ro_rt->rt_rmx.rmx_pksent++;
 IPSTAT_INC(ips_forward);
 IPSTAT_INC(ips_fastforward);
 
 
 
 cheers
 luigi
 
 
 
 -- 
 WBR, Alexander


Re: FreeBSD 10G forwarding performance @Intel

2012-07-03 Thread Luigi Rizzo
On Wed, Jul 04, 2012 at 12:31:56AM +0400, Alexander V. Chernikov wrote:
 On 04.07.2012 00:27, Luigi Rizzo wrote:
 On Tue, Jul 03, 2012 at 09:37:38PM +0400, Alexander V. Chernikov wrote:
 ...
 Thanks, another good point. I forgot to merge this option from andre's
 patch.
 
 Another 30-40-50kpps to win.
 
 not much gain though.
 What about the other IPSTAT_INC counters ?
 Well, we should then remove all such counters (total, forwarded) and 
 per-interface statistics (at least for forwarded packets).

I am not saying to remove them for good, but at least have a
try at what we can hope to save by implementing them
on a per-cpu basis.

There is a chance that one will not
see big gains until the majority of such shared counters
are fixed (there are probably 3-4 at least on the non-error
path for forwarded packets), plus the per-interface ones
that are not even wrapped in macros (see if_ethersubr.c)
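
As a rough idea of what "per-cpu" means here, a sketch with one
cache-line-sized slot per CPU, so updates from different cores never
touch the same line. This is a userspace illustration, not the kernel
DPCPU API; the names, MAX_CPUS and the 64-byte line size are
assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE 64		/* assumed cache line size */
#define MAX_CPUS   64		/* hypothetical upper bound */

struct pcpu_counter {
	struct {
		uint64_t val;
		char pad[CACHE_LINE - sizeof(uint64_t)]; /* avoid false sharing */
	} slot[MAX_CPUS];
};

/* Fast path: each CPU increments only its own slot, no atomics needed. */
static void
pcpu_inc(struct pcpu_counter *c, unsigned cpu)
{
	assert(cpu < MAX_CPUS);
	c->slot[cpu].val++;
}

/* Slow path (e.g. netstat): sum all slots; the result is approximate
 * while other CPUs keep updating. */
static uint64_t
pcpu_read(const struct pcpu_counter *c)
{
	uint64_t sum = 0;
	unsigned i;

	for (i = 0; i < MAX_CPUS; i++)
		sum += c->slot[i].val;
	return sum;
}
```

The cost moves from the per-packet path to the rare reader, which has
to walk all slots.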

 I think the IPSTAT_INC macros were introduced (by rwatson ?)
 following a discussion on how to make the counters per-cpu
 and avoid the contention on cache lines.
 But they are still implemented as a single instance,
 and neither volatile nor atomic, so it is not even clear
 that they can give reliable results, let alone the fact
 that you are likely to get some cache misses.
 
 the relevant macro is in ip_var.h.
 Hm. This seems to be just per-vnet structure instance.

yes but essentially they are still shared by all threads within a vnet
(besides you probably ran your tests in the main instance)

 We've got some more real DPCPU stuff (sys/pcpu.h and kern/subr_pcpu.c) 
 which can be used for global ipstat structure, however since it is 
 allocated from single area without possibility to free we can't use it 
 for per-interface counters.

yes, those should be moved to a private, dynamically allocated
region of the ifnet (the number of CPUs is known at driver init
time, i hope). But again for a quick test disabling the
if_{i|o}{bytes|packets} should do the job, if you can count
the received rate by some other means.

 I'll try to run tests without any possibly contested counters and report 
 the results on Thursday.

great, that would be really useful info.

cheers
luigi


Re: how to display C sources in Chromium (file:/// only)

2012-04-30 Thread Luigi Rizzo
On Mon, Apr 30, 2012 at 08:47:21AM -0700, Evan Martin wrote:
 On Sat, Apr 28, 2012 at 7:47 AM, Luigi Rizzo ri...@iet.unipi.it wrote:
  (hoping this is of interest for hackers- too)
 
  One of the most annoying features of chromium is that it downloads
  instead of displaying various types of files (.c, .h and so on).
 
 This has long annoyed me too!
 
  it seems that a partial fix can be achieved by adding the list of
  types we want to display to the array
 
    static const char* const supported_non_image_types[] = {
        ...
  +     "text/x-csrc",
  +     "text/x-chdr",
        ...
    }
 
 There's a comment at the end of the block:
   // Note: ADDING a new type here will probably render it AS HTML. This can
   // result in cross site scripting.
 I wonder how to tell?

I think that the comment is partially or completely wrong.

As one can easily verify, if the server reports Content-Type:
text/plain the data is displayed as plain text even if it contains
some html.  There must be another layer which decides how to render
the content but this list is only a YES/NO indication.

 If you follow the references back from the definition of this variable
 it appears to end up used here:
 http://code.google.com/searchframe#OAMlx_jo-ck/src/third_party/WebKit/Source/WebCore/loader/FrameLoader.cppexact_package=chromiumct=rccd=10q=IsSupportedNonImageMimeTypel=859
 which just indicates it's used when WebKit is deciding whether to
 inline the content or not.
 
 It seems to me like you could adjust the code in
 MimeUtil::IsSupportedNonImageMimeType to always return true for any
 text/* mime type.

I wouldn't be surprised if there were an easy override
in some config file. This issue has been mentioned for ages
on the chrome issue database, often referring to the correct
behaviour of other browsers, e.g. Firefox. Some relevant entries:

http://code.google.com/p/chromium/issues/detail?id=24675
http://code.google.com/p/chromium/issues/detail?id=118204
http://code.google.com/p/chromium/issues/detail?id=106150

But i remember seeing many others, some closed, some with a
long trail eventually mentioning deep security or architectural
issues as a motivation not to implement the feature.

After finding out the .local/share/mime/globs2 trick, i really
believe that those answers really meant

  "I have no idea why it is so, and i am too afraid of breaking
  something to even consider changing things"

Annoying but understandable given the size of the code.

cheers
luigi


how to display C sources in Chromium (file:/// only)

2012-04-28 Thread Luigi Rizzo
(hoping this is of interest for hackers- too)

One of the most annoying features of chromium is that it downloads
instead of displaying various types of files (.c, .h and so on).

After a bit of investigation i found that at least for local files
you can override this by defining your preferred mime types in
~/.local/share/mime/globs2 as follows:

 cat ~/.local/share/mime/globs2
10:text/plain:*.c
10:text/plain:*.cc
10:text/plain:*.c++
10:text/plain:*.cpp
10:text/plain:*.h

The first field is the priority (smaller number means more important),
then follows the mime type, then the pattern that you are matching.
The default rules (/usr/local/share/ ...) have a priority of 50
for .c, .h and so on.

For remotely-served files, the browser relies on the MIME Type
supplied by the server and the trick above does not work.

Looking at the Chromium sources

chromium-courgette-redacted-18.0.1025.162/net/base/mime_util.cc

it seems that a partial fix can be achieved by adding the list of
types we want to display to the array

static const char* const supported_non_image_types[] = {
...
+   "text/x-csrc",
+   "text/x-chdr",
...
}

Maybe we can have some optional patch to the FreeBSD port,
although i'd rather find a way to override the server-supplied
mime type in a way that does not require rebuilding Chrome.

Anyways, at least for local browsing, this seems a significant
improvement.

cheers
luigi


Re: iso2flash img

2012-03-22 Thread Luigi Rizzo
On Wed, Mar 21, 2012 at 11:11:42PM +1000, Da Rock wrote:
...
 In the meantime I think I may have stumbled on the solution to the 
 script: In the midst of all the output it mentions usage realpath [-q] 
 path. I wasn't 100% sure exactly what that meant, but I put the full 
 path to the iso and a full path to an img file and I *think* that 
 worked. I've yet to test the result; and I have no idea of the '-q' 
 option

REALPATH(1)             FreeBSD General Commands Manual            REALPATH(1)

NAME
 realpath -- return resolved physical path

SYNOPSIS
 realpath [-q] path [...]

DESCRIPTION
 The realpath utility uses the realpath(3) function to resolve all sym-
 bolic links, extra `/' characters and references to /./ and /../ in path.

 If -q is specified, warnings will not be printed when realpath(3) fails.

EXIT STATUS
 The realpath utility exits 0 on success, and >0 if an error occurs.

SEE ALSO
 realpath(3)

HISTORY
 The realpath utility first appeared in FreeBSD 4.3.

FreeBSD 8.1                    November 24, 2000                   FreeBSD 8.1
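
To make the description concrete, here is a small sketch of the same kind of
resolution done through Python's os.path.realpath() instead of the realpath(1)
utility. The directory and link names are made up for the example, and it
assumes a POSIX system where symlinks can be created.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: images/sub is a real directory, current -> images
    target = os.path.join(tmp, "images")
    os.makedirs(os.path.join(target, "sub"))
    link = os.path.join(tmp, "current")
    os.symlink(target, link)

    # A path with a symlink, an extra '/', a '/./' and a '/../' component
    messy = link + "//./sub/../"
    resolved = os.path.realpath(messy)
    expected = os.path.realpath(target)
    print(resolved == expected)
```

This is why passing full paths to the script, as described in the quoted
message, is a reasonable way to sidestep the problem.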


Re: iso2flash img

2012-03-22 Thread Luigi Rizzo
On Wed, Mar 21, 2012 at 10:52:53AM -0500, Mark Felder wrote:
 As an alternative I recently purchased a Zalman ZM-VE200 device (there's  
 also a USB3.0 flavor) that lets you copy ISOs to it and it will emulate a  
 CDROM/DVDROM/BDROM for you so you never have to deal with this mess again.  
 It works amazingly well. I was tired of fighting this problem and this is  
 an amazing solution -- I can keep every ISO I ever need on a single drive.
 
 http://www.zalman.com/eng/product/Product_Read.asp?idx=431
 http://www.zalman.com/eng/product/Product_Read.asp?idx=459
 http://www.rmprepusb.com/tutorials/ve200

really nice, thanks for the link. Now if they had something
that supported a USB key it would be even nicer...

cheers
luigi


Re: iso2flash img

2012-03-22 Thread Luigi Rizzo
On Thu, Mar 22, 2012 at 05:42:27PM +0100, Thomas Schmitt wrote:
 Hi,
 
 Vitaly Magerya vmage...@gmail.com:
   you might want to try to dd the iso image directly onto USB instead; there
   were talks that Ubuntu would support this starting at 11.10.
 
 Da Rock freebsd-hack...@herveybayaustralia.com.au:
  Nada. Tried that and it didn't work. I'm not sure how that would work given
  that it uses isolinux to boot- ergo needs a cd to load the kernel. Maybe
  some way to determine the install media?
 
 The trick is called isohybrid.
 It works by a DOS MBR which starts the same executable boot image
 that is pointed to by the El Torito boot catalog.
 If the ISO is on a hard disk (or alike), then the BIOS boots via MBR.
 If it is on an optical medium, then the BIOS boots via El Torito.

interesting. It does work for me indeed.
And it might be a nice trick for our images too, so we don't
have to build a memstick and an ISO image...
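
For reference, the isohybrid layout described in the quoted text can be
checked roughly as follows. This is only a heuristic sketch: it looks for the
MBR boot signature (0x55 0xAA at offset 510) and for the ISO 9660 volume
descriptor identifier "CD001" in sector 16. The function names are invented
for the example and the test data is synthetic.

```python
ISO_SECTOR = 2048               # ISO 9660 logical sector size
PVD_OFFSET = 16 * ISO_SECTOR    # volume descriptors start at sector 16

def has_mbr_signature(image: bytes) -> bool:
    """True if the first 512-byte sector ends with the 0x55 0xAA signature."""
    return len(image) >= 512 and image[510:512] == b"\x55\xaa"

def has_iso9660_descriptor(image: bytes) -> bool:
    """True if sector 16 carries the 'CD001' standard identifier."""
    return image[PVD_OFFSET + 1:PVD_OFFSET + 6] == b"CD001"

def looks_isohybrid(image: bytes) -> bool:
    """Heuristic: both an MBR and an ISO 9660 descriptor are present."""
    return has_mbr_signature(image) and has_iso9660_descriptor(image)

# Synthetic demonstration: a bare ISO-like buffer, then the same with an MBR.
plain_iso = bytearray(PVD_OFFSET + ISO_SECTOR)
plain_iso[PVD_OFFSET + 1:PVD_OFFSET + 6] = b"CD001"
hybrid = bytearray(plain_iso)
hybrid[510:512] = b"\x55\xaa"

print(looks_isohybrid(bytes(plain_iso)), looks_isohybrid(bytes(hybrid)))
```

For a real image, reading the first PVD_OFFSET + ISO_SECTOR bytes of the file
is enough to feed these functions.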

cheers
luigi

 The question is rather why it does not work for you.
 
 I downloaded
   ubuntu-11.10-desktop-i386.iso
 from
   http://www.ubuntu.com/download/ubuntu/download
 and put it onto an USB stick (by a Linux machine, but that should not matter)
   dd of=/dev/sdc if=ubuntu-11.10-desktop-i386.iso bs=2048
 Note that /dev/sdc is not the first partition but the whole USB stick.
 
 This stick boots on amd64 hardware.
 After some waiting with sparse iconography i get to the question
 whether i want to try or to install. I choose to try and get a
 graphical desktop. From the icon list i start Firefox and google
 a bit via my internet router. All seems well.
 
 
 On FreeBSD, GEOM complains about the DOS partition alignment.
 Partition 1 starts at block 64.
   fdisk -p /dev/da0
   # /dev/da0
   g c243 h255 s63
   p 1 0x17 64 1423896
   a 1
 Nevertheless these two commands work and open access to the image content:
   mount -t cd9660 /dev/da0 /mnt
   mount -t cd9660 /dev/da0s1 /mnt
 (The ISO has two superblocks and two directory trees.)
 
 
 Does your hardware boot from USB stick at all ?
 Is its firmware (U)EFI rather than BIOS ?
 
 
 Have a nice day :)
 
 Thomas
 


Re: [PATCH] multiple instances of ipfw(4)

2012-01-31 Thread Luigi Rizzo
On Mon, Jan 30, 2012 at 01:01:13PM +0100, Ermal Luçi wrote:
 Hello,
 
 To meet pfSense's needs, a patch was developed that allows multiple
 instances of ipfw(4) to co-exist in the kernel.
 It can be found here
 https://raw.github.com/bsdperimeter/pfsense-tools/master/patches/RELENG_9_0/CP_multi_instance_ipfw.diff
 
 It is used in conjunction with this tool
 https://raw.github.com/bsdperimeter/pfsense-tools/master/pfPorts/ipfw_context/files/ipfw_context.c
 It allows creation of contexts/instances and assignment of specific
 interfaces to specific contexts/instances.
 
 Surely i know that this is not the best way to implement generically
 but it gets the job done for us as it is, read below.
 
 What i would like to know is if there is interest to see such
 functionality in FreeBSD?
 
 I am asking first to see if there is some consensus about this as a
 feature, needed or not!
 If interest is shown i will transform the patch to allow:
 - ipfw(8) to manage creating/destroying contexts
 - ipfw(8) to manage interface membership, closing the race of two
 parallel clients modifying different contexts.
 
 There is another design choice to be made about storing the membership
 of interfaces into contexts/instances, but i do not see that as
 blocking.
 
 It is quite a handy feature, which can be exploited even to scale on SMP
 machines by extending it to bind a specific instance (with its
 interfaces) to a specific CPU/core?!
 
 Comments/Feedback expected,

if i understand what the patch does, i think it makes sense to be
able to hook ipfw instances to specific interfaces/sets of interfaces,
as it permits the writing of more readable rulesets. Right now the
workaround is to start the ruleset with skipto rules matching on
interface names, and then use some discipline in reserving a range
of rule numbers to each interface.
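
The skipto workaround mentioned above can be sketched as a small generator
that emits one dispatch rule per interface and reserves a block of rule
numbers for each. The interface names, rule numbers and block sizes are
arbitrary choices for the example, not something prescribed by ipfw.

```python
def skipto_preamble(interfaces, first_rule=100, block_start=1000,
                    block_size=1000):
    """Build ipfw commands that jump to a per-interface block of rules.

    Interface number i gets the rule range starting at
    block_start + i * block_size (hypothetical numbering scheme).
    """
    commands = []
    for index, ifname in enumerate(interfaces):
        rule_no = first_rule + 10 * index
        target = block_start + block_size * index
        commands.append(
            f"ipfw add {rule_no} skipto {target} ip from any to any via {ifname}"
        )
    return commands

for cmd in skipto_preamble(["em0", "em1", "vlan10"]):
    print(cmd)
```

The discipline then consists in keeping every rule for a given interface
inside its own numeric block, which is exactly the bookkeeping that
per-interface contexts would make unnecessary.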

Before making more changes to the code,
it would help if you could give a high level description of

1. what the change does and how specific cases are handled. E.g.
   With this change you can create multiple rulesets (contexts ?)
   and bind one or more interfaces to a context.
   - what happens with outgoing packets where the context
     to be picked up depends on the route in effect at the time
     of the transmission ?
   - what happens with encapsulated interfaces (vlan) ?
   - can you skipto across contexts (i guess not) ?

2. how intrusive are code changes ? The kernel patch you show
   seems small, which makes sense as i believe all that is needed is
   to start from a specific chain instead of the default one when
   an interface is bound to a context. A few comments:
   - if you use one of the if_ispare directly, instead of
     renaming it to if_context, this would make backporting and
     testing easier.
   - I think the explosion of sockopt names is a bad thing.
     The IP_FW3 command was introduced exactly to have a single
     entry point to the firewall and avoid a ton of new names
     in raw_ip.c and in.h
   - can you reduce the number of global ipfw-related variables ?
     There used to be one (layer3_chain), now you have 3 of them.
     You should delete layer3_chain and replace it with another
     single global (could be ip_fw_contexts) which contains the
     whole firewall state.
   - how do you handle reinjects (e.g. from dummynet or divert) ?
     I don't remember if the metadata that stores where you
     reinject the packet also has a pointer to the root of the
     chain.
   - i don't completely follow the connection between ip_fw_chain,
     ip_fw_ctx_iflist, ip_fw_ctxmember, ip_fw_ctx, ip_fw_ctx_list.
     The way i see it:
     - the ip_fw_chain could be embedded in the ip_fw_ctx, as they
       map 1:1
     - why do you need ip_fw_ctx_iflist and ip_fw_ctxmember ?
       You should never need to determine context membership
       during packet processing, and for sockopt calls you could
       as well iterate over the list of interfaces;

cheers
luigi


Re: dummynet(4) kernel process CPU usage monitoring

2011-11-21 Thread Luigi Rizzo
On Mon, Nov 21, 2011 at 01:23:22PM +0700, Eugene Grosbein wrote:
 Hi!
 
 I need to draw graph of dummynet's CPU usage.
 procstat -t 0 shows me TID (thread id) of dummynet kernel thread.
 ps -Hxo time,lwp shows me total CPU time consumed by this thread.
 
 Now I see this time has 9 seconds increase during 60 seconds of real time.
 This should be 9/60=15% CPU usage, but top -SHP shows me 0.00% meantime.

apart from the scaling on number of cores (e.g. if you have 8 cores
the 15% becomes a bare 2%) remember that percentages are computed
with some kind of filtering (EWMA ?) so if the load of dummynet threads
is bursty, the filter might eat most of it.
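
A small worked example of the two effects, under stated assumptions: the
percentage is taken relative to all cores, and the smoothing is a plain EWMA
with an arbitrarily chosen weight. Neither is confirmed to be what top
actually does; the numbers only show the order of magnitude.

```python
def cpu_share(cpu_seconds, wall_seconds, ncores=1):
    """Fraction of the available CPU time consumed over the interval."""
    return cpu_seconds / wall_seconds / ncores

def ewma(samples, alpha):
    """Exponentially weighted moving average, starting from zero."""
    value = 0.0
    smoothed = []
    for sample in samples:
        value = (1 - alpha) * value + alpha * sample
        smoothed.append(value)
    return smoothed

one_core = cpu_share(9, 60)                 # the figure from the question
eight_cores = cpu_share(9, 60, ncores=8)    # same load on an 8-core box
print(f"{one_core:.1%} of one core, {eight_cores:.2%} of eight cores")

# Bursty load: the thread is fully busy during 9 of the 60 one-second slots.
samples = [1.0 if t % 7 == 0 else 0.0 for t in range(60)]
smoothed = ewma(samples, alpha=0.05)        # alpha is an assumed weight
print(f"raw peak {max(samples):.2f}, smoothed peak {max(smoothed):.2f}")
```

With a small weight the smoothed value never gets close to the raw peak,
which is the sense in which the filter can hide a bursty thread.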

not completely sure this explains a steady 0.00%, if that is
what you are seeing

cheers
luigi

 Where is my error?
 
 Eugene Grosbein


Re: intel checksum offload

2011-09-18 Thread Luigi Rizzo
On Sun, Sep 18, 2011 at 03:19:46PM -0400, Arnaud Lacombe wrote:
 Hi,
 
 On Sat, Sep 17, 2011 at 4:32 PM, YongHyeon PYUN pyu...@gmail.com wrote:
  On Sat, Sep 17, 2011 at 11:57:10AM +0430, Hooman Fazaeli wrote:
  Hi list,
 
  The data sheet for intel 82576 advertises IP TX/RX checksum offload
  but the driver does not set CSUM_IP in ifp->if_hwassist. Does this mean
  that driver (and chip) do not support IP TX checksum offload or the support
  for TX is not yet included in the driver?
...
 This is slightly off-topic, but still..
 
 FWIW, I'm not really impressed by what chips claim to support vs. what
 has been implemented in the driver. As per the product brief, the
...
 [0]: the commit message say performance was not good, but it is not
 the driver's developer to decide whether or not a feature is good or
 not. The developer's job is to implement the chip capabilities, and
 let it to the user to enable or disable the capabilities. At best, the
 developer can decide whether or not to enable the feature by default.

actually, this is a perfect example where the developer has done the
right thing: implemented the feature, verified that performance is bad,
hence presumably removed support for the feature from the code (which also
means that the normal code path will run faster because there are no
run-time decisions to be made).

optional features are often costly even when disabled.

cheers
luigi


Re: intel checksum offload

2011-09-18 Thread Luigi Rizzo
On Sun, Sep 18, 2011 at 06:05:33PM -0400, Arnaud Lacombe wrote:
 Hi,
 
 On Sun, Sep 18, 2011 at 5:06 PM, Luigi Rizzo ri...@iet.unipi.it wrote:
  On Sun, Sep 18, 2011 at 03:19:46PM -0400, Arnaud Lacombe wrote:
  Hi,
 
  On Sat, Sep 17, 2011 at 4:32 PM, YongHyeon PYUN pyu...@gmail.com wrote:
   On Sat, Sep 17, 2011 at 11:57:10AM +0430, Hooman Fazaeli wrote:
   Hi list,
  
   The data sheet for intel 82576 advertises IP TX/RX checksum offload
   but the driver does not set CSUM_IP in ifp->if_hwassist. Does this mean
   that driver (and chip) do not support IP TX checksum offload or the
   support for TX is not yet included in the driver?
  ...
  This is slightly off-topic, but still..
 
  FWIW, I'm not really impressed by what chips claim to support vs. what
  has been implemented in the driver. As per the product brief, the
  ...
  [0]: the commit message say performance was not good, but it is not
  the driver's developer to decide whether or not a feature is good or
  not. The developer's job is to implement the chip capabilities, and
  let it to the user to enable or disable the capabilities. At best, the
  developer can decide whether or not to enable the feature by default.
 
  actually, this is a perfect example where the developer has done the
  right thing: implemented the feature, verified that performance is bad,
  hence presumably removed support for the feature from the code (which also
  means that the normal code path will run faster because there are no
  run-time decisions to be made).
 
  optional features are often costly even when disabled.
 
 I forgot to mention that in this case, the code full of
 EM_MULTIQUEUE's #ifdef and shared code is still fully compatible with
 the multiqueue's architecture. The only thing removed is a conditional
 and an assignation in the driver's attachment which was enabling the
 feature, ie. the cost you point out is still paid today, without any
 benefit.

the above suggests that you have a wonderful opportunity: with just
a little bit of time and effort you should be able to complete/re-enable
the missing code, run tests that you believe significant (given
your statement below) and prove or disprove the comment about
performance.

cheers
luigi

 
 Now I might also openly question the test method used by the folks at
 Intel, just seeing how much issue I've had with the driver (I still
 have for some, even if not driver related), which have not been
 reproduced there.
 
 Finally, when someone say performance are better that way, the first
 thing I'd be tempted to ask is: What is your test ? How did you
 collects the numbers ? How did you reach the conclusion ?. None of
 this stuff is public.
 regards,
  - Arnaud


Re: [GSoC]I want to remove everything perl/tcl/gtags in the new nvi

2011-07-14 Thread Luigi Rizzo
On Thu, Jul 14, 2011 at 01:22:49AM -0500, Zhihao Yuan wrote:
 Hi hackers,
 
 I'm doing my GSoC project, Multibyte Encoding Support in Nvi at
 https://github.com/lichray/nvi2 . Currently, the editor can support
 read/display/write multibyte encoding through iconv. Before adding
 more features like file encoding detection, I want to remove some
 features in nvi.
 
 First, gtags mode. This feature was imported by
 http://lists.gnu.org/archive/html/global-commit/2005-01/msg2.html
 . There's no gtags in our base system, and I can't find it in ports.
 This feature is useless (nvi-1.8x does not have it) and unexpected in
 the code (GTAGS macro everywhere). In a word, I want to remove it.
 
 Second, the perl/tcl interpreter support; you can apply a perl/tcl
 command to the file while you are editing. I bet no one here used
 this feature before. If the logic is simple, you can use subst; if
 it's not, you'd better write a script and run perl/tcl outside. I
 regard it as feature creep, and I don't like it.
 
 Any comments?

what you suggest makes perfect sense.

I'd also like to commend the attitude: not just say that you don't
like something, but also do proper investigation and give sensible
motivations for your likes or dislikes. Bravo.

cheers
luigi

 -- 
 Zhihao Yuan, nickname lichray
 The best way to predict the future is to invent it.
 ___
 4BSD -- http://4bsd.biz/


Re: FreeBSD I/OAT (QuickData now?) driver

2011-06-11 Thread Luigi Rizzo
On Sat, Jun 11, 2011 at 04:49:17PM +0100, Robert Watson wrote:
 
 On Mon, 6 Jun 2011, grarpamp wrote:
 
 I know we've got polling. And probably MSI-X in a couple drivers. Pretty 
 sure there is still one CPU doing the interrupt work? And none of the 
 multiple queue thread spreading tech exists?
 
 Actually, with most recent 10gbps cards, and even 1gbps cards, we process 
 inbound data with as many CPUs as the hardware has MSI-X enabled input and 
 output queues.  So a couple understates things significantly.
 
* Through PF_RING, expose the RX queues to the userland so that
 the application can spawn one thread per queue hence avoid using
 semaphores at all.
 
 I'm probably a bit out of date, but last I checked, PF_RING still implied 
 copying, albeit into shared memory buffers.  We support shared memory 
 between the kernel and userspace for BPF and have done for quite a while.  
 However, right now a single shared memory buffer is shared for all receive 
 queues on a NIC.  We have a Google summer of code student working on this 
 actively right now -- my hope is that by the end of the summer we'll have a 
 pretty functional system that allows different shared memory buffers to be 
 used for different input queues.  In particular, applications will be able 
 to query the set of queues available, detect CPU affinity for them, and 
 bind particular shared memory rings to particular queues.  It's worth 
 observing that for many types of high-performance analysis, BPF's packet 
 filtering and truncation support is quite helpful, and if you're going to 
 use multiple hardware threads per input queue anyway, you actually get a 
 nice split this way (as long as those threads share L2 caches).
 
 Luigi's work on mapping receive rings straight into userspace looks quite 
 interesting, but I'm pretty behind currently, so haven't had a chance to 
 read his NetMap paper.  The direct mapping of rings approach is what a 
 number of high-performance FreeBSD shops have been doing for a while, but 
 none had generalised it sufficiently to merge into our base stack.  I hope 
 to see this happen in the next year.

for the record, netmap also maps transmit rings, makes them device
independent, and supports the mapping of rings to different cores
through standard setaffinity() calls.

I'd really encourage people to look at the code (e.g. the pkt-gen.c
program, which is part of the archive) so you can see how easy it
is to use.

And of course, any feedback and suggestions are welcome

http://info.iet.unipi.it/~luigi/netmap/

cheers
luigi


Re: FreeBSD I/OAT (QuickData now?) driver

2011-06-06 Thread Luigi Rizzo
On Mon, Jun 06, 2011 at 10:13:51PM -0400, grarpamp wrote:
 Is this work part of what's needed to enable the FreeBSD
 equivalent of TNAPI?
 
 I know we've got polling. And probably MSI-X in a couple drivers.
 Pretty sure there is still one CPU doing the interrupt work?
 And none of the multiple queue thread spreading tech exists?

i have heard of some Gsoc work that addresses the problem
for cards that have a single queue, but drivers for other cards with
native multiqueue (e.g. ixgbe, e1000 drivers) seem to have
the ability to use one cpu per queue.

I'd argue that for many types of applications (basically all for
which PF_RING/TNAPI were designed), spreading
work across cores is a second order problem, you should
first avoid doing useless work.  Please have a look at

http://info.iet.unipi.it/~luigi/netmap/

which addresses both issues.

cheers
luigi

 http://www.ntop.org/blog
 http://www.ntop.org/TNAPI.html
 TNAPI attempts to solve the following problems:
 * Distribute the traffic across cores (i.e. the more core the more
 scalable is your networking application) for improving scalability.
 * Poll packets simultaneously from each RX queue (contrary to
 sequential NAPI polling) for fetching packets as fast as possible
 hence improve performance.
 * Through PF_RING, expose the RX queues to the userland so that
 the application can spawn one thread per queue hence avoid using
 semaphores at all.
 TNAPI achieves all this by starting one thread per RX queue. Received
 packets are then pushed to PF_RING (if available) or through the
 standard Linux stack. However in order to fully exploit this
 technology it is necessary to use PF_RING as it provides a straight
 packet path from kernel to userland. Furthermore it allows to create a
 virtual ethernet card per RX queue.


Boot0cfg bug redux (Re: sys/boot/boot0/boot0.S - r186598)

2011-01-10 Thread Luigi Rizzo
In order to understand the bug discussed in the recent thread
(original message attached at the end), Tom Judge passed me the
dump of the boot sector around the bug.

The system giving trouble has the following configuration

Fresh transcript:
file1: ORIGINAL BOOT SECTOR
# boot0cfg -v ad0
#   flag     start chs   type       end chs       offset         size
1   0x80      0:  1: 1   0xa5    494: 15:63           63       498897
2   0x00    495:  1: 1   0xa5    989: 15:63       499023       498897
3   0x00    990:  0: 1   0xa5    992: 15:63       997920         3024
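
As a sanity check on the listing, the CHS columns can be converted to block
numbers and compared with the offset and size columns. The geometry used here
(16 heads, 63 sectors per track) is an assumption inferred from the largest
head and sector values in the listing; the helper name is invented for the
example.

```python
HEADS = 16      # assumed: the largest head number in the listing is 15
SECTORS = 63    # assumed: the largest sector number in the listing is 63

def chs_to_lba(cyl, head, sect):
    """Convert a cylinder/head/sector triple to a zero-based block number."""
    return (cyl * HEADS + head) * SECTORS + (sect - 1)

# slice: (start chs, end chs, offset, size) as read from the listing
slices = {
    1: ((0, 1, 1), (494, 15, 63), 63, 498897),
    2: ((495, 1, 1), (989, 15, 63), 499023, 498897),
    3: ((990, 0, 1), (992, 15, 63), 997920, 3024),
}

for number, (start, end, offset, size) in slices.items():
    first = chs_to_lba(*start)
    last = chs_to_lba(*end)
    print(number, first == offset, last - first + 1 == size)
```

If the assumed geometry is right, every slice should be consistent on both
counts.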


file2: boot sector after running 'boot0cfg -s 2 -v ad0'
 cmp -x file1 file2
01b5 00 01  _OPT, default option

No big surprises here: the default selection changes from 0 to 1.
HOWEVER, boot0cfg does not alter the 'active' flag in the
partition table. This triggers, if i remember well, a 'feature'
in the boot1/boot2 code, which does not know/honor the selected
partition and instead boots the first partition marked as 'active',
and missing that, the first FreeBSD partition.

As a consequence, if we reboot without pressing an F-key, the system
boots from partition s1 even though the boot loader indicates F2.
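
The mismatch can be modelled in a few lines. This sketch encodes the selection
rule exactly as recalled above (first 'active' partition, otherwise first
FreeBSD partition); it is a model of that description, not code taken from
boot1/boot2, and the function name is made up.

```python
FREEBSD_TYPE = 0xA5
ACTIVE = 0x80

def slice_booted(partitions):
    """Return the 1-based slice picked by the rule described above.

    partitions is a list of (flag, type) pairs in partition-table order.
    """
    for number, (flag, _ptype) in enumerate(partitions, start=1):
        if flag & ACTIVE:
            return number
    for number, (_flag, ptype) in enumerate(partitions, start=1):
        if ptype == FREEBSD_TYPE:
            return number
    return None

# State after 'boot0cfg -s 2': _OPT says 1 (F2), slice 1 is still active.
table = [(0x80, 0xA5), (0x00, 0xA5), (0x00, 0xA5)]
default_option = 1
print("menu default: F%d, slice booted: s%d"
      % (default_option + 1, slice_booted(table)))
```

With the flag left on slice 1, the model lands on s1 regardless of the default
shown in the menu.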

file3: boot sector after the above reboot
 cmp -x file1 file3
01b5 00 01

Next, reboot this time pressing F2. After the boot we start from s2,
and the boot sector is now changed:

file4: boot sector after pressing F2

 cmp -x file1 file4
01b4 00 b1  _NXTDRV
01b5 00 01  _OPT, default option
01be 80 00  active flag, slice 1
01ce 00 80  active flag, slice 2

As expected the 'active' flag is updated as a result of a boot from 
the partition selected. 
This is something that could be done by 'boot0cfg -s ...' 
to achieve the desired behaviour.

The only surprise here is that _NXTDRV has changed. I am unsure 
if this was the result of an erroneous F5 keypress. Indeed 0xb1 is
probably the correct initial value of the byte at 0x1b4, probably
I/we forgot to initialize the field.


So, to summarize, I guess that a possible fix (that does not involve
using gpart, or even worse, modifying boot0.S, which probably does
not have any spare space) is to modify boot0cfg so that it sets the
'active' flag for the partition corresponding to the default entry.
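
A minimal sketch of that idea, operating on an in-memory copy of the boot
sector. The offsets come from the cmp output above (0x1b5 for _OPT, 0x1be and
0x1ce for the flags of slices 1 and 2) and the 16-byte partition entry size;
the function name is hypothetical and this is not the boot0cfg source.

```python
PART_TABLE = 0x1BE   # first partition entry; the flag is its first byte
ENTRY_SIZE = 16      # size of one partition table entry
OPT_OFFSET = 0x1B5   # _OPT, the zero-based default option
ACTIVE = 0x80

def sync_active_flag(sector: bytearray) -> int:
    """Mark the slice named by _OPT active and clear the other flags.

    Returns the 1-based number of the slice that was marked active.
    """
    if len(sector) < 512:
        raise ValueError("need a full 512-byte boot sector")
    option = sector[OPT_OFFSET]
    if option > 3:
        raise ValueError("default option does not name one of the 4 slices")
    for index in range(4):
        flag_offset = PART_TABLE + index * ENTRY_SIZE
        sector[flag_offset] = ACTIVE if index == option else 0x00
    return option + 1

# Synthetic sector: slice 1 active, default option already switched to F2.
sector = bytearray(512)
sector[PART_TABLE] = ACTIVE
sector[OPT_OFFSET] = 1
print(sync_active_flag(sector), hex(sector[0x1BE]), hex(sector[0x1CE]))
```

The result is the same pair of flag changes that pressing F2 produced in
file4, without needing a reboot.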

What do people think ?

cheers
luigi

On Sun, Jan 09, 2011 at 12:39:28AM -0600, Tom Judge wrote:
 Hi,
 
 Today I ran into an issue where setting the default slice with boot0cfg
 -s is broken.
 
 This is related to a section of this revision:
 
 + commit Warner's patch orb $NOUPDATE,_FLAGS(%bp)
   to avoid writing to disk in case of a timeout/default choice;
 
 This issue is quite well documented in bin/134907 which has been open
 since May 2009.
 
 Reproduced with a fresh nanobsd build:
 
 Boot 1 - Slice 1 active as set by nanobsd image builder:
 
 ===
 # boot0cfg -v ad0
 #   flag     start chs   type       end chs       offset         size
 1   0x80      0:  1: 1   0xa5    494: 15:63           63       498897
 2   0x00    495:  1: 1   0xa5    989: 15:63       499023       498897
 3   0x00    990:  0: 1   0xa5    992: 15:63       997920         3024
 
 version=2.0  drive=0x80  mask=0x3  ticks=182  bell=# (0x23)
 options=packet,update,nosetdrv
 volume serial ID 9090-9090
 default_selection=F1 (Slice 1)
 ===
 
 Update the active slice to 2:
 ===
 # boot0cfg -s 2 -v ad0
 #   flag     start chs   type       end chs       offset         size
 1   0x80      0:  1: 1   0xa5    494: 15:63           63       498897
 2   0x00    495:  1: 1   0xa5    989: 15:63       499023       498897
 3   0x00    990:  0: 1   0xa5    992: 15:63       997920         3024
 
 version=2.0  drive=0x80  mask=0x3  ticks=182  bell=# (0x23)
 options=packet,update,nosetdrv
 volume serial ID 9090-9090
 default_selection=F2 (Slice 2)
 ===
 
 Reboot and let boot0 time out and boot default slice 2:
 ===
 # boot0cfg -v ad0
 #   flag     start chs   type       end chs       offset         size
 1   0x80      0:  1: 1   0xa5    494: 15:63           63       498897
 2   0x00    495:  1: 1   0xa5    989: 15:63       499023       498897
 3   0x00    990:  0: 1   0xa5    992: 15:63       997920         3024
 
 version=2.0  drive=0x80  mask=0x3  ticks=182  bell=# (0x23)
 options=packet,update,nosetdrv
 volume serial ID 9090-9090
 default_selection=F2 (Slice 2)
 ===
 The system actually booted into slice 1 here.
 This was verified by dropping to the loader prompt and using show to grab:
 loaddev=disk0s1a:
 
 Reboot and hit 2 at the boot0 prompt:
 ===
 # boot0cfg -v ad0
 #   flag     start chs   type       end chs       offset         size
 1   0x00      0:  1: 1   0xa5    494: 15:63           63       498897
 2   0x80    495:  1: 1   0xa5    989: 15:63       499023       498897
 3   0x00    990:  0: 1   0xa5    992: 15:63       997920         3024
 
 version=2.0  drive=0x80  mask=0x3  ticks=182 

Re: Boot0cfg bug redux (Re: sys/boot/boot0/boot0.S - r186598)

2011-01-10 Thread Luigi Rizzo
On Tue, Jan 11, 2011 at 03:38:23AM +0300, Andrey V. Elsukov wrote:
 On 11.01.2011 02:33, Luigi Rizzo wrote:
  As a consequence, if we reboot without pressing an F-key, the system
  boots from partition s1 even though the boot loader indicates F2.
 skip
  So, to summarize, I guess that a possible fix (that does not involve
  using gpart, or even worse, modifying boot0.S, which probably does
  not have any spare space) is to modify boot0cfg so that it sets the
  'active' flag for the partition corresponding to the default entry.
  
  What do people think ?
 
 I don't remember which behavior was before, but it seems that users
 want to change active attribute when they use boot0cfg -s.

more than what was before the issue is whether the 0x80 flag
means ACTIVE (hence it should be set only for one partition)
or it means BOOTABLE (so it is ok to have it set for multiple
partitions, and bootloaders should ignore partitions with
the flag clear, no matter what the selection is).

boot0.S' behaviour is closer to ACTIVE than BOOTABLE.
Same for fdisk, and the change i was proposing is in line
with this interpretation.

 And i think it is not so hard to add several lines of code to pass
 set command to PART class.

i don't know how this would help, because i believe the code at the
origin of the bug is (or at least used to be) in boot2.c

cheers
luigi


Re: sys/boot/boot0/boot0.S - r186598

2011-01-09 Thread Luigi Rizzo
On Sun, Jan 09, 2011 at 12:39:28AM -0600, Tom Judge wrote:
 Hi,
 
 Today I ran into an issue where setting the default slice with boot0cfg
 -s is broken.

a few questions inline:

 This is related to a section of this revision:
 
 + commit Warner's patch orb $NOUPDATE,_FLAGS(%bp)
   to avoid writing to disk in case of a timeout/default choice;
 
 This issue is quite well documented in bin/134907 which has been open
 since May 2009.
 
 Reproduced with a fresh nanobsd build:
 
 Boot 1 - Slice 1 active as set by nanobsd image builder:
 
 ===
 # boot0cfg -v ad0
 #   flag     start chs   type       end chs       offset         size
 1   0x80      0:  1: 1   0xa5    494: 15:63           63       498897
 2   0x00    495:  1: 1   0xa5    989: 15:63       499023       498897
 3   0x00    990:  0: 1   0xa5    992: 15:63       997920         3024
 
 version=2.0  drive=0x80  mask=0x3  ticks=182  bell=# (0x23)
 options=packet,update,nosetdrv
 volume serial ID 9090-9090
 default_selection=F1 (Slice 1)
 ===
 
 Update the active slice to 2:
 ===
 # boot0cfg -s 2 -v ad0
 #   flag     start chs   type       end chs       offset         size
 1   0x80      0:  1: 1   0xa5    494: 15:63           63       498897
 2   0x00    495:  1: 1   0xa5    989: 15:63       499023       498897
 3   0x00    990:  0: 1   0xa5    992: 15:63       997920         3024
 
 version=2.0  drive=0x80  mask=0x3  ticks=182  bell=# (0x23)
 options=packet,update,nosetdrv
 volume serial ID 9090-9090
 default_selection=F2 (Slice 2)
 ===

what do you get here if you re-run

boot0cfg -v ad0

before rebooting ? It seems that boot0cfg does not re-read
data from disk so if the write for some reason fails
(e.g. kern.geom.debugflags=0) you don't see the actual configuration
of the boot sector.
Looking at the code there should be an error message if writing
to disk fails, but maybe the error reporting does not work well...


 Reboot and let boot0 time out and boot default slice 2:
 ===
 # boot0cfg -v ad0
 #   flag     start chs   type       end chs       offset         size
 1   0x80      0:  1: 1   0xa5    494: 15:63           63       498897
 2   0x00    495:  1: 1   0xa5    989: 15:63       499023       498897
 3   0x00    990:  0: 1   0xa5    992: 15:63       997920         3024
 
 version=2.0  drive=0x80  mask=0x3  ticks=182  bell=# (0x23)
 options=packet,update,nosetdrv
 volume serial ID 9090-9090
 default_selection=F2 (Slice 2)
 ===
 The system actually booted into slice 1 here.

What does the system show as Default when it reboots ? F1 or F2 ?
This is just to check if the update actually went to disk.


 This was verified by dropping to the loader prompt and using show to grab:
 loaddev=disk0s1a:
 
 Reboot and hit 2 at the boot0 prompt:
 ===
 # boot0cfg -v ad0
 #   flag     start chs   type       end chs       offset         size
 1   0x00      0:  1: 1   0xa5    494: 15:63           63       498897
 2   0x80    495:  1: 1   0xa5    989: 15:63       499023       498897
 3   0x00    990:  0: 1   0xa5    992: 15:63       997920         3024
 
 version=2.0  drive=0x80  mask=0x3  ticks=182  bell=# (0x23)
 options=packet,update,nosetdrv
 volume serial ID 9090-9090
 default_selection=F2 (Slice 2)
 ===
 
 This time we really boot into slice 2.
 
 The attached patch backs out the relevant part of r186598.
 
 There was a post on the embedded list that suggested this work around:
 echo 'a 2' | fdisk -f /dev/stdin ad0
 boot0cfg -s 2 ad0
 
 There are 2 issues with this:
 1) It can't be done without setting kern.geom.debugflags to 0x10.
 2) It resulted in most/all commands resulting in the error message
 Device not configured including the second command and 'shutdown -r now'.
 
 Both of which leave this really work around fairly broken.
 
 
 Tom
 

cheers
luigi


Re: sys/boot/boot0/boot0.S - r186598

2011-01-09 Thread Luigi Rizzo
On Sun, Jan 09, 2011 at 12:57:24PM -0600, Tom Judge wrote:
 On 09/01/2011 12:33, Luigi Rizzo wrote:
  On Sun, Jan 09, 2011 at 12:39:28AM -0600, Tom Judge wrote:
  Hi,
 
  Today I ran into an issue where setting the default slice with boot0cfg
  -s is broken.
  a few questions inline:
 
 Output inline, full script log attached.
 
 If you need more info let me know.

can you take a dump of the boot sector at various stages
indicated below:

 snip

DUMP #1: ORIGINAL BOOT SECTOR

run boot0cfg -s 2 -v ad0 

DUMP #2: AFTER THE BOOT SECTOR UPDATE

reboot without pressing an F-key
expect to be in s2 but actually end up in s1, as you found

DUMP #3: AFTER A REBOOT WITH NO KEYPRESS

reboot, this time selecting the slice with F2

DUMP #4: AFTER THE SUCCESSFUL BOOT IN SLICE 2

At least from this we can tell how #4 differs from #2/#3
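
A minimal sketch of how the four dumps could be taken and compared,
assuming the disk is ad0 as in this thread and that the boot sector is
the first 512 bytes of the device. The helper names and the dump file
names are made up for illustration; they are not part of boot0cfg.

```shell
#!/bin/sh
# Hypothetical helpers, not part of boot0cfg or of the original thread.

# Save the first 512-byte sector of a device (or image file) to a file.
dump_mbr() {    # device outfile
    dd if="$1" of="$2" bs=512 count=1 2>/dev/null
}

# List the byte offsets (decimal, counting from 1) where two dumps differ.
diff_dumps() {  # dump_a dump_b
    cmp -l "$1" "$2" | awk '{ print $1 }'
}

# Example (needs read access to the disk):
#   dump_mbr /dev/ad0 dump1.bin
#   boot0cfg -s 2 -v ad0
#   dump_mbr /dev/ad0 dump2.bin
#   diff_dumps dump1.bin dump2.bin
```

An empty result from diff_dumps means the two sectors are identical,
which for dumps #1 and #2 would suggest the update never reached the disk.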

cheers
luigi


Re: Slow disk access while rsync - what should I tune?

2010-11-08 Thread Luigi Rizzo
On Mon, Nov 08, 2010 at 04:47:40PM +0300, cronfy wrote:
 Hello,
 
  Might gsched(8) help ?
 
 I am using 7.3, there is no gsched as far as I know..

it actually works just fine there, just take the code from
http://info.iet.unipi.it/~luigi/geom_sched/

cheers
luigi

 I am going to try gjournal instead - there was a suggestion that gjournal
 may help here to scale huge io requests.
 
 -- 
 // cronfy


Re: an alternative to powerpoint

2010-07-24 Thread Luigi Rizzo
On Thu, Jul 22, 2010 at 05:19:25PM +0200, Oliver Fromme wrote:
 Ivan Voras ivo...@freebsd.org wrote:
   On 07/13/10 06:15, Luigi Rizzo wrote:
Have fun, it would be great if you could report how it works
on fancy devices (iphone, ipad, androids...) 
   
   For what it's worth, it doesn't work at all on Android :) (and the
   layout is messed up)
 
 It works pretty well on my Nexus One (Android 2.2) with
 the default browser.

good. There were several changes since the initial version just
to improve compatibility with other browsers -- among other things,
i included navigation buttons on the bottom line so you can
use it on phones and similar devices.

At the moment the 'distributed' version does not work with opera-mini
due to the peculiar way opera-mini handles javascript

cheers
luigi

 Best regards
Oliver
 
 -- 
 Oliver Fromme, secnetix GmbH & Co. KG, Marktplatz 29, 85567 Grafing b. M.
 Handelsregister: Registergericht Muenchen, HRA 74606,  Geschäftsfuehrung:
 secnetix Verwaltungsgesellsch. mbH, Handelsregister: Registergericht Mün-
 chen, HRB 125758,  Geschäftsführer: Maik Bachmann, Olaf Erb, Ralf Gebhart
 
 FreeBSD-Dienstleistungen, -Produkte und mehr:  http://www.secnetix.de/bsd
 
 A language that doesn't have everything is actually easier
 to program in than some that do.
 -- Dennis M. Ritchie


Re: an alternative to powerpoint

2010-07-14 Thread Luigi Rizzo
On Wed, Jul 14, 2010 at 01:24:09AM +0200, Julian H. Stacey wrote:
 Whoops I forgot cc hackers so resent.
 
   Haven't used it in years, but I liked it when I used 
   ports/misc/magicpoint.
  
  been there, done that:
  
  http://info.iet.unipi.it/~luigi/mgpm/
  
  cheers
  luigi
 
 Hey That's nice Luigi ! Multicast mgpm ...
 Hmm so eg BSD tech groups could do presentations with a bunch of laptops
 in a room, rather than needing an overhead projector Eh ? Nice !
 That I must read more on.

for what it's worth, i had implemented the same thing using
a browser-based solution -- the speaker sends its actions
through ajax to a simple web server, which logs them.
Other listeners also connect to the server which relays events
to them as they come. I intend to implement the same thing
in this 'sttp' version.

cheers
luigi


Re: an alternative to powerpoint

2010-07-13 Thread Luigi Rizzo
On Mon, Jul 12, 2010 at 10:41:41PM -0700, Bakul Shah wrote:
 On Tue, 13 Jul 2010 06:15:14 +0200 Luigi Rizzo ri...@iet.unipi.it  wrote:
  Maybe you all love powerpoint for presentations, but sometimes
  one just needs to put together a few slides, perhaps a few bullets
  or images grabbed around the net, so i was wondering how hard
  would it be to do something that accepts a plain text file
  as input (without a ton of formatting) and lets you do a decent
  slide show, and supports editing the slides on the fly within
  the browser.
  
  Well, it's not too hard:
  
  http://info.iet.unipi.it/~luigi/sttp/
  
  just 400 lines of javascript and 100 lines of css, plus
  your human-readable text.
  
  Have fun, it would be great if you could report how it works
  on fancy devices (iphone, ipad, androids...) as my testing
  platforms are limited to Firefox, IE and chrome (which unfortunately
  cannot save the edited file)
 
 Seems to work fine in Safari & Opera.
 
 Your note inspired me to search the 'Net!  Since I prefer
 \latex{goop} to <html>goop</html> I went looking for a latex
 class and found 'Prosper'.  Looks like it can produce some
 really nice slides! See the examples here:
 
 http://amath.colorado.edu/documentation/LaTeX/prosper/
 
 And here is a tutorial:
 
 http://www.math.umbc.edu/~rouben/prosper/
 
 And of course, it is already in /usr/ports/textproc/prosper!
 I will have to give it a try as I was getting tired of
 fiddling around in Keynote (and I don't like powerpoint).
 
 [Hope you don't mind my mentioning Prosper!]

latex based solutions are great when it comes to showing formulas.
I normally use prosper or similar things.
But placing figures is a bit of a nightmare, and at least
for slides there is a lot of visual clutter in the latex formatting
(of course one could write a preprocessor from plain text to latex/prosper).


Re: an alternative to powerpoint

2010-07-13 Thread Luigi Rizzo
On Tue, Jul 13, 2010 at 04:17:06PM +0300, Peter Pentchev wrote:
...
 Nice work indeed!
 
 Just as an aside, though - are you aware of Eric Meyer's S5,
 also available in your friendly neighbourhood Ports Collection
 as textproc/s5? :)

yes, there are many such things -- and i have done a fair amount
of work on Slidy, building a distributed version called 'syncslidy'
which allows distributed presentations.

The problem, after a fair amount of usage, is really editing the
slides in a simple way.

cheers
luigi

 But yours does look a bit simpler to enter text in, although
 I myself am quite used to typing HTML.
 
 G'luck,
 Peter
 
 -- 
 Peter Pentchev  r...@space.bg  r...@ringlet.net  r...@freebsd.org
 PGP key:  http://people.FreeBSD.org/~roam/roam.key.asc
 Key fingerprint   FDBA FD79 C26F 3C51 C95E  DF9E ED18 B68D 1619 4553
 This sentence would be seven words long if it were six words shorter.




Re: an alternative to powerpoint

2010-07-13 Thread Luigi Rizzo
On Tue, Jul 13, 2010 at 04:36:50PM -, Larry Baird wrote:
 In article 110613.02658.82...@localhost you wrote:
  Maybe you all love powerpoint for presentations, but sometimes
  one just needs to put together a few slides, perhaps a few bullets
  or images grabbed around the net, so i was wondering how hard
  would it be to do something that accepts a plain text file
  as input (without a ton of formatting) and lets you do a decent
  slide show, and supports editing the slides on the fly within
  the browser.
  
  Well, it's not too hard:
  
  http://info.iet.unipi.it/~luigi/sttp/
 Haven't used it in years, but I liked it when I used ports/misc/magicpoint.

been there, done that:

http://info.iet.unipi.it/~luigi/mgpm/

cheers
luigi

 Port description is:
 MagicPoint - an X11 based presentation tool
 
 MagicPoint is an X11 based presentation tool.  It is designed to make
 simple presentations easy while to make complicated presentations
 possible.  Its presentation file (whose suffix is typically .mgp) is
 just text so that you can create presentation files quickly with your
 favorite editor (e.g. Emacs).
 
 
 
 -- 
 
 Larry Baird| http://www.gta.com
 Global Technology Associates, Inc. | Orlando, FL
 Email: l...@gta.com | TEL 407-380-0220, FAX 407-380-6080


an alternative to powerpoint

2010-07-12 Thread Luigi Rizzo
Maybe you all love powerpoint for presentations, but sometimes
one just needs to put together a few slides, perhaps a few bullets
or images grabbed around the net, so i was wondering how hard
would it be to do something that accepts a plain text file
as input (without a ton of formatting) and lets you do a decent
slide show, and supports editing the slides on the fly within
the browser.

Well, it's not too hard:

http://info.iet.unipi.it/~luigi/sttp/

just 400 lines of javascript and 100 lines of css, plus
your human-readable text.

Have fun, it would be great if you could report how it works
on fancy devices (iphone, ipad, androids...) as my testing
platforms are limited to Firefox, IE and chrome (which unfortunately
cannot save the edited file)

cheers
luigi


Re: svn commit: r204615 - head/sbin/newfs

2010-03-03 Thread Luigi Rizzo
On Wed, Mar 03, 2010 at 12:44:06AM -0800, Garrett Cooper wrote:
...
 Maxim,
 
 Xin Li has a point. I ran some tests and the ad hoc parsing function
 eats up more memory than expand_number(3) [*]:

as someone reminded me, a static library only brings in the archive
members you actually use, whereas with a dynamic library you are forced,
at runtime, to bring in the entire library (which requires duplicating
the static data at least).
So i suspect that even if other programs on the same system already
use libutil, just the library overhead is more than the cost
of a single function.

If one is really concerned with memory usage and at the same time
wants (for the good reasons mentioned in the thread) to reuse
the expand_number code, then it seems that the best approach is
to force a static link with libutil (to bring in just the function you
need) and use the default approach for the rest.

cheers
luigi



does Copyright on source files expire ?

2009-03-25 Thread Luigi Rizzo
Someone just asked me permission to move to a 3-clause BSD
copyright some piece of software that I haven't touched in 10+ years.

I said yes, but then I was wondering what happens if the
person listed is not responding or not reachable anymore:
does copyright on source code expire, and if so, when ?
(I suppose it is related to either the date listed on the copyright,
or to the date of some remarkable event for the author).

cheers
luigi



Re: does Copyright on source files expire ?

2009-03-25 Thread Luigi Rizzo
On Wed, Mar 25, 2009 at 05:31:52AM -0400, David Schultz wrote:
 On Wed, Mar 25, 2009, Luigi Rizzo wrote:
  Someone just asked me permission to move to a 3-clause BSD
  copyright some piece of software that I haven't touched in 10+ years.
  
  I said yes, but then I was wondering what happens if the
  person listed is not responding or not reachable anymore:
  does copyright on source code expire, and if so, when ?
  (I suppose it is related to either the date listed on the copyright,
  or to the date of some remarkable event for the author).
 
 In the US, the rule that applies most of the time is that
 Copyright expires 70 years after the author dies, although there
 are many special cases where the term differs.
 
 A person's Copyright doesn't go away just because they die,
 disappear, or fail to respond. If you can't contact them, their
 heirs, or whomever they transferred the Copyright to, you're stuck.

so it's worse than a patent :)

cheers
luigi


Re: write-only variables in src/sys/ - possible bugs

2009-02-02 Thread Luigi Rizzo
On Mon, Feb 02, 2009 at 08:42:32PM +0100, Christoph Mallon wrote:
 Hi,
 
 I compiled a list of all local variables in src/sys/ (r188000), which 
 are only written to, but never read. This is more than the GCC warning, 

interesting list, thanks.
Also, 700 entries is not a bad result considering the size
of the codebase and the age of parts of it (i am pretty sure
there is a lot of code 15+ years old which received little
if any maintenance or use in the past decade).
(and i have nothing against old code except that compilers,
coding practices and the amount of peer review have improved
a lot over time, and so -- with some exceptions -- it is
easier to prevent some of these issues with more recent code).

cheers
luigi


Re: Enhancing cdboot [patch for review]

2008-12-08 Thread Luigi Rizzo
On Mon, Dec 08, 2008 at 02:40:41PM -0800, Maxim Sobolev wrote:
 Hi,
 
 Below please find patch that enhances cdboot with two compile-time options:
...
 Any comments/suggestions are appreciated. If there are no objections I
 would like to commit the change. The long-term goal is to make
 CDBOOT_PROMPT default mode for installation CD.
 
 http://sobomax.sippysoft.com/~sobomax/cdboot.diff

Looks good. Some comments:
1. since there is plenty of space in the cdboot sector, why don't you
   make the two options always compiled in, controlling which one to
   activate through flags in the bootsector itself, to be set by
   patching the binary sector using a mechanism similar to
   boot0cfg.
   Of course you cannot alter a cdrom after you burn it,
   but it makes it easier to build CDs with one or the other default,
   patching cdboot or the iso image itself before creating/burning it.

2. in fact, the 'silent' option could be disabled at runtime by
   pressing some key (e.g. adding a short wait loop before proceeding;
   if this is meant for custom, unattended CDs the extra delay should not
   matter much);

3. one nitpick -- in one of the first chunks you replace $start
   with $LOAD, but if i am not mistaken operation depends on $LOAD = $start,
   so why don't you always use the same one ?
   Also in terms of relocation size, wouldn't it make sense to
   hardwire the size of the cd boot sector:

-   mov $((end_init - start)/2),%cx
+   mov $1024,%cx

4. another nitpick -- the value you pass in %si to the MBR does not
   seem to point to anything useful. As discussed about boot0.S and
   the followup in the mailing lists, there seems to be no standard
   but at least some MBR expect %si to point to a partition entry,
   so you should probably initialize one in a way similar to that
   used by boot0.S

cheers
luigi


Re: Enhancing cdboot [patch for review]

2008-12-08 Thread Luigi Rizzo
On Mon, Dec 08, 2008 at 04:29:04PM -0800, Maxim Sobolev wrote:
 Luigi Rizzo wrote:
...
 4. another nitpick -- the value you pass in %si to the MBR does not
seem to point to anything useful. As discussed about boot0.S and
the followup in the mailing lists, there seems to be no standard
but at least some MBR expect %si to point to a partition entry,
so you should probably initialize one in a way similar to that
used by boot0.S
 
 Hmm, maybe I misunderstood it then. What do you mean by point to 
 partition entry exactly? Right now it points to the beginning on MBR.

ok, so here is what I know.

Even though there is no standard, at least ldlinux.sys and perhaps
other bootloaders expect %si to point to a 16-byte record containing
the partition descriptor (same structure as one of the 4 records
at 0x1be in the MBR) for the partition they were loaded from.

ldlinux.sys uses this info to relocate: it knows the location of the
other sectors of ldlinux.sys relative to the beginning of the partition,
and uses the start-of-partition from the record at %si to compute
these locations in terms of absolute disk positions.

Note that in principle a MBR does not need this info -- even if it
is a multi-sector boot code such as boot0ext, it may well assume to
be located at offset 0.

On the other hand if the code on the MBR uses %si, then you should
set the entry so that at least the starting CHS and LBA info point
to the first sector on disk, i.e. CHS=0,0,1 and LBA=0.

In practical terms -- make %si point to a 16-byte area of memory
containing all 0's except for the byte representing the sector
number for the start of the partition.
See the code in a recent sys/boot/i386/boot0/boot0.S which gives
some details on this.
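
To make the layout concrete, here is a minimal sketch that emits such a
16-byte record from the shell. It assumes the usual MBR partition entry
layout (byte 0 flag, bytes 1-3 starting head/sector/cylinder, byte 4
type, bytes 5-7 ending CHS, bytes 8-11 starting LBA, bytes 12-15 size);
the function name is made up, and this is only an illustration of the
data, not the code used by boot0.S.

```shell
#!/bin/sh
# Hypothetical helper: write a 16-byte partition entry that is all zeros
# except for the starting sector number, i.e. CHS=0,0,1 and LBA=0.
make_dummy_entry() {
    # byte 0: flag, byte 1: start head, byte 2: start sector (= 1),
    # bytes 3-15: start cylinder, type, end CHS, LBA start, size (all 0)
    printf '\000\000\001\000\000\000\000\000\000\000\000\000\000\000\000\000'
}

# Example: inspect the record as hex
#   make_dummy_entry | od -An -tx1
```

A boot loader that reads the start of the partition from the entry at %si
would then compute offsets relative to the first sector of the disk.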

cheers
luigi


Re: btx/pxeboot problem

2008-12-02 Thread Luigi Rizzo
On Tue, Dec 02, 2008 at 01:48:17PM +0200, Danny Braniss wrote:
 latest pxeboot (7.1):
   mother-board        NIC/LOM   CPU
   ------------        -------   ---
   Intel SWV25         em        xeon   works fine
   SUN X2200           bge       amd    works fine
   DELL PE 2950        bce       xeon   fails 95% of the time,
                                        hangs or goes into btx dump regs. mode :-)
   Intel SE7320VP21    msk       xeon   fails 50% of the time - hangs
 pxeboot with btx.S 1.45 2008/02/27 23:35:39, works fine.

 so it seems that changes since 1.45 have fixed it for some, but it
 breaks for others :-). I can help testing, but btx is way out of
 my league.

interesting, so this is the same problem i was seeing on the Asus/amd
machines here...

the commit log for 1.47 mentions interrupt issues which are consistent
with the random hangs or errors that I see while booting over the
network.

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/boot/i386/btx/btx/btx.S.diff?r1=1.46;r2=1.47

I wonder if the hangs are related to interrupts coming in at the
wrong time. I also wonder whether the same symptoms might also
affect the standard loader and not just pxeloader, in which case
the problem would be slightly more serious.

I am afraid my ability to debug the problem isn't going much
beyond this...

cheers
luigi


convert bootable freebsd or linux iso to bootable flash image

2008-11-25 Thread Luigi Rizzo
I have updated my iso2flash.sh script so that now it can
convert both FreeBSD _and_ linux ISO images to flash images.
For the latter, it uses a FAT filesystem, and puts a linux
loader with syslinux, for which you can find a port at

http://info.iet.unipi.it/~luigi/FreeBSD/index.html#syslinux-port

Below is the updated code for iso2flash.sh

--- cut here 

#!/bin/sh
# $Id: iso2flash.sh 897 2008-11-25 22:41:00Z luigi $
# convert an ISO image to flash image
# The type of filesystem depends on the content, but can be forced
# manually.
#
# based on picobsd tricks.
# Requires makefs, bsdlabel, sed and dd
# The linux image uses mtools and syslinux, see
#   http://info.iet.unipi.it/~luigi/FreeBSD/#syslinux-port
# see http://www.allbootdisks.com/download/iso.html

MAKEFS=makefs
MKLABEL=bsdlabel
BSDTAR=tar

# Create a linux disk starting from an ISO. Use a FAT media
# and syslinux to format it. Add some intelligence to figure
# out where the kernel is and what options it needs.

make_linux_image() {    # src_tree dest_image size_kb
    local tree=$1
    local img=$2
    local size=$(( $3 + 1000 ))     # size in kb
    local x=$(( 1 + $size / 128 ))  # 128k units, good for dd
    # if you have an old newfs_msdos...
    local OPTS="-h 16 -u 64 -S 512 -s $(( 2 * ${size} )) -o 0"
    [ -f ${img} ] && { chmod u+w ${img}; rm ${img} ; }
    dd if=/dev/zero bs=128k count=$x of=${img}  # create blank file
    newfs_msdos ${OPTS} ${img}      # format msdos
    syslinux ${img}                 # add linux bootcode

    # Try to identify where the kernel is
    local boot=`find $tree -name boot`
    if [ "x${boot}" != "x" -a -d "${boot}" ] ; then
        echo "moving boot code"
        chmod u+w ${boot} ${boot}/*
        mv ${boot}/* $tree
        [ -f ${tree}/syslinux.cfg ] || \
            mv ${tree}/isolinux.cfg ${tree}/syslinux.cfg
    fi
    if [ -d ${tree}/isolinux ] ; then
        # systemrescuecd
        local sys=${tree}/syslinux
        echo "moving files..."
        [ -d ${sys} ] || mkdir -p ${sys}
        chmod -R u+w ${tree}/isolinux
        mv ${tree}/isolinux/* ${sys}
        [ -f ${sys}/syslinux.cfg ] || mv ${sys}/isolinux.cfg ${sys}/syslinux.cfg
    fi
    if [ -f ${tree}/syslinux.cfg ] ; then
        :   # nothing to do in this case
    elif [ -f ${tree}/syslinux/syslinux.cfg ] ; then
        :   # nothing to do in this case
    elif [ -f ${tree}/linux ] ; then
        :   # nothing to do in this case
    elif [ -f ${tree}/CE_BZ ] ; then
        # splashtop / expressgate
        echo "default ce_bz" > ${tree}/syslinux.cfg
    else
        boot=`cd ${tree}; find . -name boot.img`
        if [ "x${boot}" != "x" -a -f "${tree}/${boot}" ] ; then
            cp -p /usr/local/share/syslinux/memdisk $tree
            ( echo "default memdisk";
              echo "append initrd=${boot}" ) > $tree/syslinux.cfg
        fi
    fi
    mcopy -i ${img} -s ${tree}/* ::/    # copy the tree
    mdir -/ -i ${img} ::                # show the results
}

# to add freedos code:
#perl sys-freedos.pl --disk=${img} --heads=16 --sectors=64 --offset=0 # --lb
#dd if=mbrfat.bin bs=90 iseek=1 oseek=1 of=${img} conv=notrunc

# Create a FreeBSD image.
make_freebsd_image() {  # tree imagefile size
    local tree=$1
    local imagefile=$2
    local boot1=${tree}/boot/boot1
    local boot2=${tree}/boot/boot2

    echo "convert tree $tree image $imagefile"
    ${MAKEFS} -t ffs -o bsize=4096 -o fsize=512 \
        -f 50 ${imagefile} ${tree}
    ${MKLABEL} -w -f ${imagefile} auto # write a label
    # copy partition c: into a: with some sed magic
    ${MKLABEL} -f ${imagefile} | sed -e '/  c:/{p;s/c:/a:/;}' | \
        ${MKLABEL} -R -f ${imagefile} /dev/stdin

    # dump the primary and secondary boot (primary is 512 bytes)
    dd if=${boot1} of=${imagefile} conv=notrunc 2>/dev/null
    # XXX secondary starts after the 0x114 = dec 276 bytes of the label
    # so we skip 276 from the source, and 276+512=788 from dst
    # the old style blocks used 512 and 1024 respectively
    dd if=${boot2} iseek=1 ibs=276 2>/dev/null | \
        dd of=${imagefile} oseek=1 obs=788 conv=notrunc 2>/dev/null
}

extract_image() {   # extract image to a tree
    [ -f $1 ] || return
    local tmp=${tree}.tree
    echo "Extract files from ${tree} into $tmp"
    (chmod -R +w $tmp; rm -rf $tmp )
    mkdir -p $tmp
    ls -la $tmp
    (cd $tmp && ${BSDTAR} xf $tree )
    ls -la $tmp
    tree=$tmp
}

guess_type() {
    echo "guess type"
    imgtype=error # default
    [ -f $tree/boot/loader -a -f $tree/boot/loader.rc ] && \
        { imgtype=bsd; return ; }
    local a=`find $tree -name isolinux`
    [ "x$a" != "x" -a -d "$a" ] && { imgtype=linux; return ; }
}

# option processing
while [ "x$*" != "x" ] ; do
    case x$1 in
    x-t )   # type
        shift
        imgtype=$1
        ;;
    *)
        break
        ;;
    esac
    shift
done

tree=`realpath $1`
image=`realpath $2`
echo type $imgtype tree $tree image $image

extract_image $tree
set `du -sk $tree`
size=$1
echo image size is $size kb

while true ; do
case x$imgtype in
*[Bb][Ss][Dd] )

convert bootable freebsd iso to bootable flash image

2008-11-14 Thread Luigi Rizzo
Just in case people have a similar need, or can point me to better
code to do the same job:

i needed to convert a bootable FreeBSD iso image into a bootable
flash image, and have come up with the following code (derived
from PicoBSD). The nice part is that this is all done without
requiring root permissions -- the iso extraction is done with
bsdtar, the file system is created using makefs, and the
other patching is done with bsdlabel and dd.
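
The label patching hinges on a small sed idiom used in the script
further down: print the c: line unchanged, then print it a second time
with c: rewritten to a:. A minimal sketch on made-up bsdlabel-style
text (the function name and the sample numbers are invented for
illustration):

```shell
#!/bin/sh
# Hypothetical wrapper around the sed expression used in the script:
# for every line containing "  c:", emit it as is and then emit a copy
# in which the first "c:" is replaced by "a:".
dup_c_as_a() {
    sed -e '/  c:/{p;s/c:/a:/;}'
}

# Example:
#   printf '# size offset fstype\n  c: 1000 0 unused 0 0\n' | dup_c_as_a
```

In the script the result is fed back to bsdlabel -R, so the image ends
up with an a: partition spanning the same range as c:.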

Now i need to find something similar to convert a bootable
linux image and a bootable DOS image :)

cheers
luigi

--- cut here 

#!/bin/sh
# convert a FreeBSD iso to flash image
#
# based on picobsd tricks.
# requires makefs, bsdlabel, bsdtar, sed and dd

MAKEFS=makefs
MKLABEL=bsdlabel
BSDTAR=tar

make_freebsd_image() {  # tree imagefile
    local tree=$1
    local imagefile=$2
    local boot1=${tree}/boot/boot1
    local boot2=${tree}/boot/boot2

    echo "convert tree $tree image $imagefile"
    ${MAKEFS} -t ffs -o bsize=4096 -o fsize=512 \
        -f 50 ${imagefile} ${tree}
    ${MKLABEL} -w -f ${imagefile} auto # write a label
    # copy partition c: into a: with some sed magic
    ${MKLABEL} -f ${imagefile} | sed -e '/  c:/{p;s/c:/a:/;}' | \
        ${MKLABEL} -R -f ${imagefile} /dev/stdin

    # dump the primary and secondary boot (primary is 512 bytes)
    dd if=${boot1} of=${imagefile} conv=notrunc 2>/dev/null
    # XXX secondary starts after the 0x114 = dec 276 bytes of the label
    # so we skip 276 from the source, and 276+512=788 from dst
    # the old style blocks used 512 and 1024 respectively
    dd if=${boot2} iseek=1 ibs=276 2>/dev/null | \
        dd of=${imagefile} oseek=1 obs=788 conv=notrunc 2>/dev/null
}

tree=$1
image=$2
if [ -f $1 ] ; then
    echo "Extract files from ${tree}"
    tmp=${image}.tree
    mkdir -p $tmp
    (cd $tmp && ${BSDTAR} xf $tree)
    tree=$tmp
fi
make_freebsd_image $tree $image
[ -d "$tmp" ] && (chmod -R +w $tmp && rm -rf $tmp)
#-- end of file --
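
The two-dd pipeline in make_freebsd_image copies data starting at one
byte offset in the source to a different byte offset in the destination,
without truncating the destination. A minimal sketch of that trick,
written with the POSIX spellings skip= and seek= (the script uses the
BSD synonyms iseek= and oseek=); the function name is made up and the
offsets must be greater than zero:

```shell
#!/bin/sh
# Hypothetical helper: copy src from byte offset src_off onwards into
# dst at byte offset dst_off, leaving the rest of dst untouched.
# Using the offset as the block size lets skip=1 / seek=1 jump exactly
# that many bytes.
splice_at() {   # src dst src_off dst_off
    dd if="$1" ibs="$3" skip=1 2>/dev/null |
        dd of="$2" obs="$4" seek=1 conv=notrunc 2>/dev/null
}

# Example, matching the numbers in the script (276 and 276+512=788):
#   splice_at ${tree}/boot/boot2 image.bin 276 788
```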


Re: Documentation on writing a custom socket

2008-03-09 Thread Luigi Rizzo
On Sun, Mar 09, 2008 at 10:17:14AM +0100, Hans Petter Selasky wrote:
 On Saturday 08 March 2008, Max Laier wrote:
  Am Sa, 8.03.2008, 11:33, schrieb Hans Petter Selasky:
   I'm planning to create a new socket type in FreeBSD called AF_Q921, which
   is
   to be used for ISDN telephony. Where do I find documentation on how to
 
  interesting ... can you share more information on this project?
 
 Hi Max,
 
 I'm currently working on some redesign of my ISDN4BSD stack in cooperation 
 with the FreeSwitch project. One of things that I want to do is to have all 
 ISDN adapters appear like network devices. The ISDN protocols usually use 
 something called Q.921 which is similar to TCP, only very simplified.

funny, I thought that the word 'simplified' and an ITU specification
were incompatible concepts :)

(for the record, the Q.921 spec is available online from ITU:

http://www.itu.int/rec/dologin_pub.asp?lang=e&id=T-REC-Q.921-199709-I!!PDF-E&type=items
)


cheers
luigi


Re: Pluggable Disk Scheduler Project

2007-10-17 Thread Luigi Rizzo
On Wed, Oct 17, 2007 at 03:09:35PM +0200, Ulf Lilleengen wrote:

... discussion on Hybrid vs. GEOM as a suitable location for
... pluggable disk schedulers

 However, I'd hate to see the Hybrid effort go to waste :) I was hoping some
 of the authors of the project would reply with their thoughts, so I CC'ed
 them. 

we are in good contact with Fabio and i am monitoring the discussion,
don't worry.

cheers
luigi


kernel headers dependency graph ? (systm.h tangle)

2007-02-08 Thread Luigi Rizzo
Hi,
Is there a tool to produce a dependency graph for C headers ?

If that matters (i.e. someone has already studied it),
i am interested in the header situation in the FreeBSD kernel.

It may be a well known thing, but i just realized
that sys/systm.h is entangled with sys/libkern.h and both bring
in a lot of other headers, and you cannot bring in simply
the string.h functions, or printf, because there is no
leaf header for them.
I don't know if this is the only case, or there are other
'classes' which are intermixed with lots of other stuff.

I suppose the problem has been already discussed and it is just
the result of historical reasons, but is there any reason other
than ENOTIME why (to cite things that are trivial to fix while  
preserving compatibility):

- we don't have sys/string.h with all the memcpy/bcopy and friends
  that are currently spread between systm.h and libkern.h

- printf/scanf and strto*() are not in their own header;

and so on ?
 
cheers
luigi



Re: unique hardware identification

2007-02-01 Thread Luigi Rizzo
On Thu, Feb 01, 2007 at 11:38:51AM -0700, M. Warner Losh wrote:
 In message: [EMAIL PROTECTED]
 Peter Jeremy [EMAIL PROTECTED] writes:
 : On Sun, 2007-Jan-28 10:39:36 -0600, Jon Passki wrote:
 : If the machine is a PXE-compliant device [2], it should have a GUID/ 
 : UUID [1] available.  This can be exposed by sysutil/hal [3] via the  
 : smbios.system.uuid field.
 : 
 : You can also get it via kenv(8) without needing any ports:
 : # kenv smbios.system.uuid
 : 9F345F4F-BEFC-D431-1340-61235A56DEF9
 
 I wonder why the smbios stuff isn't exported via sysctls as well...

and this is probably a lazy vendor :)

smbios.bios.reldate=07/12/2006
smbios.bios.vendor=American Megatrends Inc.
smbios.bios.version=P1.10
smbios.chassis.maker=To Be Filled By O.E.M.
smbios.chassis.serial=To Be Filled By O.E.M.
smbios.chassis.tag=To Be Filled By O.E.M.
smbios.chassis.version=To Be Filled By O.E.M.
smbios.planar.maker=  
smbios.planar.product=775i945GZ
smbios.planar.serial=  
smbios.planar.version=  
smbios.socket.enabled=1
smbios.socket.populated=1
smbios.system.maker=To Be Filled By O.E.M.
smbios.system.product=775i945GZ
smbios.system.serial=To Be Filled By O.E.M.
smbios.system.uuid=00020003-0004-0005-0006-000700080009
smbios.system.version=To Be Filled By O.E.M.

cheers
luigi


syslog bug ? (was Re: [PATCH] add header pppoe: in ng_pppoe.c printfs)

2006-08-05 Thread Luigi Rizzo
On Sat, Aug 05, 2006 at 12:42:12AM +0100, Joao Barros wrote:
...
 I patched and recompiled the kernel.
 After booting I notice that no messages from ppp are logged by syslog
 (messages|ppp.log)

What is your OS version ?

i hit a similar problem some time ago, and it seems that
the syslog client code remembers any error on the socket
(e.g. ICMP host/port unreachable messages) and does not
retry afterwards (or for some time, or there is some bug
in handling the error condition).
I am a bit fuzzy on the details because
this was some 3 years ago on a 4.x client.

Your problem is likely because ppp starts before the syslog daemon,
the initial message fails and then you get nothing anymore.

the vsyslog code in 6.x (libc/gen/syslog.c) is slightly different
from the one in 4.11.

cheers
luigi


Re: Boot manager beep (revisited)

2006-05-01 Thread Luigi Rizzo
On Mon, May 01, 2006 at 01:15:44PM +0300, Giorgos Keramidas wrote:
...
 I'd certainly prefer it if the beep was turned *off* by default,
 but I'm not sure if that's what everyone prefers.  This is why I
 opted for keeping the current behavior and making my personal
 preference an option :)

i do prefer it off.
and my laptop has no volume control available at boot.
many new laptops have everything in software, which means
very little controls available at boot time.

cheers
luigi


Re: IO schedulers in FBSD...

2005-09-15 Thread Luigi Rizzo
On Thu, Sep 15, 2005 at 06:45:27PM +0530, Pranav Peshwe wrote:
 Hello,
 Which is the I/O scheduler used by FBSD 5.4 ? 
 I googled in various ways but could not get an answer.

it is called FCFSUSIIABPIWCTTOIS, which stands for
First Come First Serve Unless Someone Is In A Better Position
In Which Case The Temporal Order Is Subverted

also known as the standard one-way elevator taught in all operating
system courses.
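
As a rough illustration (not the actual FreeBSD disk sorting code), a one-way elevator can be sketched in C as below. The name elevator_order and the use of bare block numbers are assumptions made for the example: requests at or beyond the current head position are served in ascending order, then the sweep wraps around to the lowest pending block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* qsort() comparator for block numbers. */
static int cmp_blk(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;

    return (x > y) - (x < y);
}

/* Reverse q[lo..hi-1] in place. */
static void reverse(unsigned long *q, size_t lo, size_t hi)
{
    while (lo + 1 < hi) {
        unsigned long t = q[lo];

        q[lo++] = q[--hi];
        q[hi] = t;
    }
}

/*
 * Hypothetical helper: reorder q[0..n-1] into one-way elevator order
 * for a head currently at block 'head'.
 */
static void elevator_order(unsigned long *q, size_t n, unsigned long head)
{
    size_t split = 0;

    assert(q != NULL || n == 0);
    if (n == 0)
        return;
    qsort(q, n, sizeof(*q), cmp_blk);
    /* Find the first request that is not behind the head. */
    while (split < n && q[split] < head)
        split++;
    /* Rotate left by 'split' using three reversals. */
    reverse(q, 0, split);
    reverse(q, split, n);
    reverse(q, 0, n);
}
```

With pending blocks 90, 10, 55, 30, 70 and the head at 50, the resulting service order is 55, 70, 90, 10, 30.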

 I do not have access to the source code.

you do, it is at

http://www.freebsd.org/cgi/cvsweb.cgi/

ready for your perusal.

cheers
luigi

 TIA.
 
 Sincere regards,
 Pranav.J.Peshwe
 
 ---
 A picture is worth a thousand words.
 A formula is worth a thousand pictures.
 - Edsger W. Dijkstra


Re: IO schedulers in FBSD...

2005-09-15 Thread Luigi Rizzo
On Thu, Sep 15, 2005 at 07:52:15PM +0530, Pranav Peshwe wrote:
...
 Thank you very much for the name and the source
 link.Is there any documentation available on this
 topic(fbsd io schedulers) ?
 Where is the io scheduler located in the
 src code tree ?

see http://wikitest.freebsd.org/moin.cgi/Hybrid for more docs on this

cheers
luigi


Re: 5.4 -- bridging, ipfw, dot1q

2005-08-12 Thread Luigi Rizzo
I am afraid the existing code cannot help you.
The packets you see are encapsulated in 802.1q aka VLAN frames,
and since ipfw2 does not try to decapsulate the packets, you
don't get to see the IP headers.

Your most reasonable option would be to write a new ipfw2 opcode,
say something like 'vlan-decap x-y', which succeeds if the packet has
a vlan header in the range x to y, and in this case skips the VLAN header,
tries to re-parse the header fields as in the beginning of
ip_fw_chk() after the section

/*
 * Collect parameters into local variables for faster matching.
 */

and then continues.
It's not a lot of code, in the worst case you can just cut and paste
the relevant 50-60 lines from the beginning of the code
(though of course it would be nice to rearrange the code to
reduce duplication).

By doing this you can do something like

ipfw add skipto 1000 vlan-decap 1-50

and then process vlans 1 to 50 at line 1000.
Maybe it is a good idea to split the vlan-id matching and the decapsulation.
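
A minimal sketch of the matching half of such an opcode, written as standalone C rather than as real ipfw2 kernel code. It assumes the 802.1q tag is still in-line in the frame (TPID 0x8100 right after the two MAC addresses, followed by the 16-bit TCI whose low 12 bits are the VLAN id); the function name and arguments are hypothetical, and a real opcode would work on the mbuf and then redo the header parsing described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN     14      /* dst MAC (6) + src MAC (6) + ethertype (2) */
#define VLAN_TAG_LEN    4       /* TPID (2) + TCI (2) */
#define ETHERTYPE_8021Q 0x8100

/* Read a big-endian 16-bit value. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical matcher for 'vlan-decap lo-hi'.
 * Returns 1 if the frame carries a VLAN tag with lo <= id <= hi, and
 * then reports where the encapsulated payload starts and its ethertype.
 * Returns 0 otherwise, leaving the outputs untouched.
 */
static int vlan_decap_match(const uint8_t *frame, size_t len,
    unsigned lo, unsigned hi, size_t *l3_off, uint16_t *inner_type)
{
    unsigned vid;

    assert(frame != NULL && l3_off != NULL && inner_type != NULL);
    if (len < ETH_HDR_LEN + VLAN_TAG_LEN)
        return 0;
    if (be16(frame + 12) != ETHERTYPE_8021Q)
        return 0;                       /* not a VLAN frame */
    vid = be16(frame + 14) & 0x0fff;    /* low 12 bits of the TCI */
    if (vid < lo || vid > hi)
        return 0;
    *inner_type = be16(frame + 16);     /* e.g. 0x0800 for IPv4 */
    *l3_off = ETH_HDR_LEN + VLAN_TAG_LEN;
    return 1;
}
```

Splitting the id match from the decapsulation, as suggested above, would amount to two functions: one that only tests the VLAN id, and one that only advances the offset.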

cheers
luigi

On Fri, Aug 12, 2005 at 05:07:13AM -0400, Dan Mahoney, System Admin wrote:
 Note:  I posted this to questions@ earlier, but upon further investigation 
 of the issue, I realize that I basically need a hack.
 
 Warning, long.
 
 My original question:
 
 [begin]
 
 I'm setting up a bridging firewall where the packets are passing through 
 on dot1q trunks.  Figure sixty or so.  Too many to create separate 
 interfaces.
 
 The bridge works.  Packet counts in the default match rule work (so I 
 assume the bridge at least sees the packets).
 
 Problem is, any reasonable rules (such as those which actually say to 
 block traffic by ip or port or anything) aren't working
 at all.  Not even logging counts.
 
 Setting the bridged flag doesn't seem to help.
 
 My only guess is that ipfw doesn't have the brains to look beyond the VLAN 
 tags.  Is this the case?  Is this supported under 4.x (I'm using 5, but 
 can downgrade), or is there any way AT ALL that I can get this to work?
 
 As a note, snort and trafshow and everything else work fine analyzing the 
 bridge traffic, it seems only the kernel has an issue.
 
 [end]
 
 Now my plea to hackers@:
 
 From what I can see, the packet type is mac, and that's the only rules 
 that match.   I'm not 100 percent sure if this is because of the point at 
 which this is being received, or because of the dot1q headers.  I have to 
 assume it's the headers because, well, otherwise putting ipfw on a bridge 
 would seem pretty silly to me.
 
 I basically need minor mods done to the kernel code so that dot1q trunked 
 traffic seen through a bridge is seen by ipfw rules (and matched by the 
 same)...
 
 I basically assume this doesn't work because of this post made by Ted 
 Middelstadt a couple years ago
 
 http://groups-beta.google.com/group/mailing.freebsd.questions/browse_frm/thread/79d023785ddc58ed/4e280a013b6325d4?tvc=1q=vlan+trunk+ipfw+bridge+tedhl=en#4e280a013b6325d4
 
 Of course, he says this:
 
 The biggest loss of NOT having an Ethernet-specific ipfw-like filtering
 program, is that there's no convenient vehicle to use for adding in code
 for filtering based on MAC addresses, which is certainly the domain of
 a bridge.
 
 And ipfw2 basically addresses this.
 
 This is what I see on my bridged packets with log:
 
 Aug 11 23:38:43 fwi kernel: ipfw: 360 Accept MAC in via em1
 
 I've tried every possible combination of arguments to ipfw which seem to 
 match. 
 None are hitting:
 
 00305  00 count ip from any to 56.199.242.178 layer2 
 mac-type 0x8100
 00305  00 count ip from any to 56.199.242.178 mac-type 
 0x8100
 00305  00 count ip from any to 56.199.242.178 mac-type 
 0x8100
 00305  00 count ip from any to 56.199.242.178 mac-type 
 0x8100 via em1
 00305  00 count ip from any to 56.199.242.82 mac-type 
 0x8100 via em1
 00305  00 count ip from any to 56.199.242.82 layer2 
 mac-type 0x
 
 If this is possible with standard vanilla bridging and standard ipfw, 
 please let me know, of course.  I'm guessing dot1q encapsulated traffic 
 just doesn't match this.  I can match traffic with an any to any mac-type 
 vlan or an any to any layer2 rule.  But I think I can't match on more 
 specific criteria (like an IP address) because the ipfw layer sees it as 
 non-ip traffic, and doesn't even attempt to match it (even though I'm 
 telling it specifically to do so), so it falls into the silently passed 
 portion.
 
 I don't know c.  And this is a bad time and place to learn.  The kernel 
 code is also fairly streamlined, and I *really* don't have the time to 
 learn structures and the like.  It's on my long-term to-do list, I swear.
 
 Otherwise, I'm relatively sure this is less than an hour's worth of work, 
 please someone let me know what it's worth to you and I'll make it happen.
 
 (While I'lll be happy with a quick hack, this really 

Re: 5.4 -- bridging, ipfw, dot1q

2005-08-12 Thread Luigi Rizzo
On Sat, Aug 13, 2005 at 12:49:56AM +0200, Jeremie Le Hen wrote:
 Hi,
 
  I am afraid the existing code cannot help you.
  The packets you see are encapsulated in 802.1q aka VLAN frames,
  and since ipfw2 does not try to decapsulate the packets, you
  don't get to see the IP headers.
  
  Your most reasonable option would be to write a new ipfw2 opcode,
  say something like 'vlan-decap x-y', which succeeds if the packet has
  a vlan header in the range x to y, and in this case skips the VLAN header,
  tries to re-parse the header fields as in the beginning of
  ip_fw_chk() after the section
  
  /*
   * Collect parameters into local variables for faster matching.
   */
  
  and then continues.
  It's not a lot of code, in the worst case you can just cut and paste
  the relevant 50-60 lines from the beginning of the code
  (though of course it would be nice to rearrange the code to
  reduce duplication).
  
  By doing this you can do something like
  
  ipfw add skipto 1000 vlan-decap 1-50
  
  and then process vlans 1 to 50 at line 1000.
  Maybe it is a good idea to split the vlan-id matching and the decapsulation.
 
 Isn't it possible to cheat using vlan(4) interface ?  I think it's
 possible to create them in order to use its code to zap the VLAN header
 and then use ipfw to filter on these vlan(4) interfaces.  This isn't
 more than a workaround, but it might help.

well it would be painful to configure, because the number of vlans
(according to what Dan says) is large, and he would have to define
N vlan interfaces on each of the physical ones, then define
N bridges between the corresponding vlans (and i think there is
a limit on how large N can be).
Additionally demuxing incoming packets would take O(N) time.

cheers
luigi



Re: Nagios and threads

2005-06-22 Thread Luigi Rizzo
reading also the continuation of this mail thread, I wonder if there
is any relationship with this issue i found a few days ago debugging
asterisk. It happens when linking the code with libc_r, but maybe
some of the bugs in libc_r were also imported in other thread
libraries.

cheers
luigi


Probably a known issue, but I thought it worthwhile reporting it,
if nothing else for archival purposes.

I think our userland thread library (libc_r) has some bugs in
handling descriptors.  I can reproduce the behaviour on -current
and 4.x, and I believe it applies to 5.x too.  

Following is a description of the problem and some code to replicate it.
The code includes a workaround, but it is not particularly nice.

Any better ideas ? I am not sure on what to do, but perhaps the
only sensible thing to do is to add a note with this workaround
(or better ones, if available) to our pthreads manpage

--- PROBLEM DESCRIPTION ---

Basically, our libc_r keeps two views of i/o descriptors, one
(external) is for threads and reflects the modes requested by the
threads (blocking or not, etc.); the internal view instead is how
descriptors are actually set in the kernel -- and there they should
always be set as O_NONBLOCK to avoid blocking on a syscall.

The bug occurs when a process does a fork(), and then either
a close() or an exec() -- a similar thing also occurs with popen().
The relevant source code is in

/usr/src/lib/libc_r/uthread/uthread_execve.c
/usr/src/lib/libc_r/uthread/uthread_close.c

Right before the exec(), the internal descriptors are put into
blocking mode if the external ones are blocking, and they are only
reset to O_NONBLOCK after termination of the child (upon SIGCHLD).
The same occurs for close(). 

Note that close() has hacks to leave pipes alone, but the same
code is not present in the execve() case where instead I believe
it would be necessary. Another thing to note is that there is
some kind of 'fate sharing' among the stdio descriptors (0, 1, 2)
which is not totally clear to me, but seems to require setting
O_NONBLOCK on all 3 to make sure that they are not changed to
blocking mode.

Because descriptors are shared between parent and child, for the
lifetime of the child descriptors in the parent will be blocking
and the scheduling of threads will be completely broken.

The only fix i have found is to act as follows:

    pipe(fd);   /* create a pipe with the child */
    p = fork();
    if (p == 0) { /* child */
        /* call fcntl() _before_ close() to avoid resetting
         * O_NONBLOCK on the internal descriptors. After that,
         * close the descriptors not needed in the child.
         */
        for (i = 0; i < getdtablesize(); i++) {
            long fl = fcntl(i, F_GETFL);
            if (fl != -1 && i != fd[0]) {
                /* open and must be closed in the child */
                fcntl(i, F_SETFL, O_NONBLOCK | fl);
                close(i);
            }
        }
        /* standard stuff (dup2, exec*()... */
        dup2(fd[0], STDOUT_FILENO); /* as an example */
        execl();
    } else { /* parent */
        close(fd[0]);   /* close child end. */
        ...
    }

but of course this is rather unintuitive. On the other hand,
I have no idea of a better way to address the problem, and being
fairly new to threads programming maybe others know better.

I am attaching two minimal programs to demonstrate the bug.

simple.c is a simple program (linked against the regular C library)
cc -o simple simple.c

that only plays with blocking mode on the descriptors.

thre.c is meant to be linked with libc_r.
cc -o thre thre.c -lc_r

It does a fork and exec of the other program.
If you call it without arguments, it does not implement the
above workaround, and you see how the 'internal' descriptors
change to blocking mode. If you call it with an argument, it
implements the workaround.

enjoy
luigi

On Mon, Jun 20, 2005 at 04:56:36PM -0400, Charles Sprickman wrote:
 Hello,
 
 Just curious if there's any regulars here who would like to help Ethan 
 out:
 
 http://nagios.sourceforge.net/docs/2_0/whatsnew.html
 
 Known Issues
 
 There are a few known issues with the Nagios 2.0 code at the moment. 
 Hopefully some of these will be fixed before 2.0 is released as stable...
 
 1. FreeBSD and threads. On FreeBSD there's a native user-level 
 implementation of threads called 'pthread' and there's also an optional 
 ports collection 'linuxthreads' that uses kernel hooks. Some folks from 
 Yahoo! have reported that using the pthread library causes Nagios to pause 
 under heavy I/O load, causing some service check results to be lost. 
 Switching to linuxthreads seems to help this problem, but not fix it. The 
 lock happens in liblthread's __pthread_acquire() - it can't ever acquire 
 the spinlock. It happens when the main thread 

Re: Loadable Scheduler in Freebsd

2004-11-06 Thread Luigi Rizzo
On Sat, Nov 06, 2004 at 11:21:23AM -0800, John-Mark Gurney wrote:
 Devesh Shah wrote this message on Thu, Nov 04, 2004 at 15:22 -0800:
  Based on the SYSINIT framework, I have made ULE scheduler as a loadable module but 
  have not quite
  figured how to migrate from default 4bsd to newly loaded ule scheduler or is it 
  possible at all.
 
 As someone suggested, switching schedulers would be very complex..

actually i beg to differ, as we implemented it in 4.x back
in summer 2002 -- our code allowed switching between schedulers at
runtime, and we had a prototype Proportional Share (PS for short)
scheduler which you could use instead of the standard BSD one.

I don't see much of a problem in switching schedulers at runtime,
if you properly hide the scheduler's internal information from
the process' descriptor, which is what we did. At which point,
switching schedulers only requires rearranging the scheduler's
information, with no impact on the process descriptor or state.
Of course you can't expect guarantees to be preserved across
switches, if nothing else because they might well be measured
in different ways.
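
To make the idea concrete, here is a toy sketch in plain C. It is not the 4.x code mentioned above; every name in it (struct sched_ops, sched_switch, the two toy schedulers) is invented for illustration. The process descriptor only carries an opaque pointer, so switching schedulers means discarding the old private state and building the new one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct proc {
    int   pid;
    void *sched_priv;   /* opaque; only the active scheduler interprets it */
};

struct sched_ops {
    const char *name;
    int  (*attach)(struct proc *p);   /* build private state, 0 on success */
    void (*detach)(struct proc *p);   /* release private state */
};

struct bsd_priv { int prio; };        /* toy priority-based scheduler */
struct ps_priv  { unsigned weight; }; /* toy proportional-share scheduler */

static int bsd_attach(struct proc *p)
{
    struct bsd_priv *s = malloc(sizeof(*s));

    if (s == NULL)
        return -1;
    s->prio = 0;
    p->sched_priv = s;
    return 0;
}

static int ps_attach(struct proc *p)
{
    struct ps_priv *s = malloc(sizeof(*s));

    if (s == NULL)
        return -1;
    s->weight = 1;      /* guarantees start afresh after a switch */
    p->sched_priv = s;
    return 0;
}

static void priv_detach(struct proc *p)
{
    free(p->sched_priv);
    p->sched_priv = NULL;
}

static const struct sched_ops bsd_sched = { "bsd", bsd_attach, priv_detach };
static const struct sched_ops ps_sched  = { "ps",  ps_attach,  priv_detach };

static const struct sched_ops *active;  /* NULL until the first switch */

/* Move every process from the active scheduler to 'next'. */
static int sched_switch(struct proc *procs, size_t n,
    const struct sched_ops *next)
{
    size_t i;

    assert(next != NULL);
    for (i = 0; i < n; i++) {
        if (active != NULL)
            active->detach(&procs[i]);
        if (next->attach(&procs[i]) != 0)
            return -1;  /* sketch only: no rollback on failure */
    }
    active = next;
    return 0;
}
```

Nothing outside the two attach/detach pairs knows the layout of the private state, which is the property that makes the runtime switch straightforward.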

if you wonder why our code was not committed, it was because there
was not, and i think there is not yet, a good theoretical framework
for multiprocessor proportional share scheduling, so our PS scheduler
(note, not the scheduler abstraction framework, only the PS scheduler
instance) would not work in the SMP case, and this apparently was
a requirement for inclusion.

cheers
luigi


Re: bridge callbacks in if_ed.c? (My pov)

2004-09-06 Thread Luigi Rizzo
On Sun, Sep 05, 2004 at 02:37:31PM -0700, Matthew Dillon wrote:
 Well, wait a second... are we talking about a lot of packets being
 discarded by the filter in 'normal' operation, or are we talking about
 an attack?  Because if we are talking about an attack the LAST ethernet
...

Sure, in this thread we are talking of a performance hack for a
specific piece of hardware, which may be obsolete and poorly
performing, but is also one of the few widespread ones supporting
coax. Once upon a time, this hack was basically the only way to
make a coax bridge perform decently and not saturate the bus (ISA
or PCMCIA).  Granted, these days maybe nobody uses coax anymore or
has a desire to upgrade these boxes.

If the existing code is broken (but please make a reasonable effort
to prove it, don't just hint things) or gets in the way because
e.g. it would complicate locking for everyone else, or because the
bridge_in_ptr() or BDG_ACTIVE() calls disappear from the API, then
i am all for the suggested change.

But if the suggested change is something in preparation for other
changes that may never see the light, then i'd rather just add a
comment/reminder in the relevant bridging file, and nuke the code
in if_ed.c and everywhere else when this becomes necessary.  After
all the problem (alleged layering violation) is well understood,
and the offender (assuming this is the only one -- the way to check
would be rename bridge_in_ptr() and BDG_ACTIVE() to something
different and try a build of the kernel and modules) and the 
trivial fix are known so postponing the change is not going
to harm anyone.

Speaking about layering violation -- sure, the above bridge thing 
is a small one, but there are much worse (and more critical) offenders.

E.g. the device driver preferably should not know who is going to
consume its packets, and you are pointing the finger at the bridging
code -- but this applies to bpf as well, yet several drivers still  
have explicit 

    if (ifp->if_bpf)
        bpf_mtap(ifp, m_head);
or implicit
    BPF_MTAP(ifp, m_head);
bpf hooks.

And another huge one is the support for delayed checksums, which
permeates the entire network stack and breaks bpf feeding it with   
packets carrying invalid checksums.

I guess the above means do what you like, just don't put an
'Approved by: luigi' line in the commit msg :)

cheers
luigi


Re: bridge callbacks in if_ed.c?

2004-09-05 Thread Luigi Rizzo
On Mon, Sep 06, 2004 at 12:52:49AM +0400, Gleb Smirnoff wrote:
   Luigi,
 
 I see that bridge callbacks are still living in if_ed.c
 from FreeBSD 2.x times. See if_ed.c:2816. I think this is
 not correct.
 
 Bridge code is called from ether_input(), which is
 indirectly called from if_ed.c:2836.
 
 Any objections about attached patch?

there are performance reasons to do this way -- grabbing
the entire packet is expensive because it is done via programmed
I/O, so the current code only grabs the header, does the
filtering, and grabs the rest of the packet only if
needed.

Probably the current code runs bridge_in_ptr() twice, but I
believe this is still cheaper than grabbing all packets
entirely.
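
The trade-off can be sketched with a small userspace model. This is not the if_ed.c code: ring_copy() merely stands in for ed_ring_copy(), the filter is a made-up example, and the byte counter is there only to show how much programmed I/O a dropped frame costs. In the driver quoted below, this path is taken only when bridging is active and no bpf listener is attached.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETHER_HDR_LEN 14

static size_t pio_bytes;    /* bytes moved by (simulated) programmed I/O */

/* Stand-in for ed_ring_copy(): copy from NIC memory, counting the cost. */
static void ring_copy(const unsigned char *nic, unsigned char *dst, size_t n)
{
    memcpy(dst, nic, n);
    pio_bytes += n;
}

/* Hypothetical filter type: nonzero means "drop this frame". */
typedef int (*hdr_filter_t)(const unsigned char *hdr);

/* Example filter: drop frames whose destination MAC starts with 0xff. */
static int drop_ff(const unsigned char *hdr)
{
    return hdr[0] == 0xff;
}

/*
 * Fetch the header first, ask the filter, and fetch the payload only
 * if the frame survives. Returns the bytes delivered in dst, 0 if dropped.
 */
static size_t rx_frame(const unsigned char *nic, size_t len,
    unsigned char *dst, hdr_filter_t drop)
{
    assert(nic != NULL && dst != NULL && drop != NULL);
    if (len < ETHER_HDR_LEN)
        return 0;
    ring_copy(nic, dst, ETHER_HDR_LEN);
    if (drop(dst))
        return 0;               /* only 14 bytes were transferred */
    ring_copy(nic + ETHER_HDR_LEN, dst + ETHER_HDR_LEN,
        len - ETHER_HDR_LEN);
    return len;
}
```

For a 1500-byte frame that the filter rejects, the model transfers 14 bytes instead of 1500, which is the saving the paragraph above refers to.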

I'd rather not apply the patch unless you can show that
the current code leads to incorrect behaviour.

cheers
luigi

 [ccing hackers@ and net@ to get more eyes reviewing]
 
 -- 
 Totus tuus, Glebius.
 GLEBIUS-RIPN GLEB-RIPE

 Index: if_ed.c
 ===
 RCS file: /home/ncvs/src/sys/dev/ed/if_ed.c,v
 retrieving revision 1.233
 diff -u -r1.233 if_ed.c
 --- if_ed.c   13 Aug 2004 23:04:23 -  1.233
 +++ if_ed.c   5 Sep 2004 20:48:19 -
 @@ -2810,26 +2810,9 @@
   eh = mtod(m, struct ether_header *);
  
   /*
 -  * Don't read in the entire packet if we know we're going to drop it
 -  * and no bpf is active.
 +  * Get packet, including link layer address, from interface.
*/
 - if (!ifp->if_bpf && BDG_ACTIVE( (ifp) ) ) {
 - struct ifnet *bif;
 -
 - ed_ring_copy(sc, buf, (char *)eh, ETHER_HDR_LEN);
 - bif = bridge_in_ptr(ifp, eh) ;
 - if (bif == BDG_DROP) {
 - m_freem(m);
 - return;
 - }
 - if (len > ETHER_HDR_LEN)
 - ed_ring_copy(sc, buf + ETHER_HDR_LEN,
 - (char *)(eh + 1), len - ETHER_HDR_LEN);
 - } else
 - /*
 -  * Get packet, including link layer address, from interface.
 -  */
 - ed_ring_copy(sc, buf, (char *)eh, len);
 + ed_ring_copy(sc, buf, (char *)eh, len);
  
   m->m_pkthdr.len = m->m_len = len;
  



Re: bridge callbacks in if_ed.c?

2004-09-05 Thread Luigi Rizzo
On Mon, Sep 06, 2004 at 03:01:00AM +0400, Gleb Smirnoff wrote:
...
 L I'd rather not apply the patch unless you can show that
 L the current code leads to incorrect behaviour.
 
 I suspect that packets dropped by bridge_in() called from if_ed will
 not be captured by bpf(4). This is incorrect.

if you read the code you see that the bpf behaviour is
as it should be, and your suspicion is unfounded.

-   if (!ifp->if_bpf && BDG_ACTIVE( (ifp) ) ) {

(my summary and pov  on the discussion in a separate email)

cheers
luigi


Re: indent bugfix / added features

2004-06-11 Thread Luigi Rizzo
On Fri, Jun 11, 2004 at 04:07:40PM +0200, Jens Schweikhardt wrote:
 On Thu, Jun 10, 2004 at 09:53:07PM -0500, Chip Norkus wrote:
...
 # normalize the code a bit.  In doing so I discovered a few deficiencies in
 # the stock FreeBSD (5.2-CURRENT) indent and decided to fix them, I
 # thought these might be fairly common wishes (and one of them is a
 # bugfix) and have attached a patch which does the following:
...
 #   I don't know if anyone would be interested in committing the bugfix (I
 # believe it is correct) or added features, but I hope someone else finds
 # this useful.
...
 I'm willing to commit this if you could demonstrate that it will not
 produce different output in the default case than it does now. Say,
 run the old and new versions against the FreeBSD src tree and make a diff
 which should vanish apart from the bug-fixing effects. If you want to
 make yourself known as a quality software engineer, do the same for
 various sets of indent options :-)

?? this sounds like an absurd request, please... 
 
First, for a small patch like this you are much better off looking 
at the source code diffs rather than checking the output in a 
necessarily small set of test cases. 
 
Second, either you trust the author (in which case his statement
"I believe it is correct" is all you need), or you don't, in which
case you'd have to check the patch yourself in whatever way you 
believe suitable. Either way, I don't see how the additional
tests you are asking for would change your behaviour.

cheers
luigi


Re: indent bugfix / added features

2004-06-11 Thread Luigi Rizzo
On Fri, Jun 11, 2004 at 05:20:19PM +0200, Jens Schweikhardt wrote:
...
 Sigh. A request for a little bit of QA and an emoticon as well and
 I'm criticised. I remember when I was not yet a committer that the
 better I could demonstrate that the code has no ill-effect the more
 chances some committer would bring it in the tree.

yeah but that's not a reason to give the same 'treatment' to
other people.

In the end, if you don't feel like taking the risk, you don't commit
the patch and nobody will blame you.

 I've looked at too many innocuous patches that performed unexpectedly
 when run, to not let myself get away with this. Of course this may be

i am not trying to discuss general principles, but only this
specific case. This is a very small patch and you can easily
check it while you type/paste it in (if nothing else just for
curiosity on what was the problem and how was it fixed).

In the end, if you don't feel like taking the risk, you don't commit
the patch; nobody will blame you.

 No hard feelings, Luigi :-)

nor on my side! I was just trying to make a point that we should not
try to scare or annoy people who are so kind to contribute patches
just because we don't have time to scrutinize them (which in the
end is our responsibility, not theirs).

I wouldn't have said a word if you had some actual comments/criticism
on the contributed code. Even if they were only style issues (which
we shouldn't even bother to criticise in these cases, as they can
be trivially fixed at commit time). But asking for "more input or
i won't even look at your code" (at least, that was the sense one
could perceive) was a bit too much...

cheers
luigi


Re: ipfw cached ucred patch

2004-06-02 Thread Luigi Rizzo
On Wed, Jun 02, 2004 at 03:14:43PM -0700, Christian S.J. Peron wrote:
 
 I understand what you are saying. The only real other choice 
 would be to copy out the entire cr_groups array. Do you know
 if this copy would be more expensive then the mutex lock/unlock
 associated with grabbing a reference to the ucred?

i bet the copy would be cheaper on almost any architecture -- it
is only 64 bytes anyways, with these sizes what kills you in memory
accesses is the latency, not the throughput.

cheers
luigi


Re: make(1) guru question

2004-04-06 Thread Luigi Rizzo
On Tue, Apr 06, 2004 at 08:26:00PM +0200, Jens Schweikhardt wrote:
 Fellow hackers,
 
 suppose you have a long list of files in a make variable V, exceeding
 kern.argmax. This means there is no way you can write a rule where $(V)
 is a command argument in any way shape or form. There is also no way to
 pass the value of V to xargs that I know of. For example with this

depending on the use, you might use something like

make -V variable_name | xargs ...

within the makefile. I got the suggestion from someone long ago
when I had this problem with src/sys/conf/Makefile.i386

cheers
luigi

 Makefile:
 
   # Make V exceed kern.argmax (64K).
   V != jot 12440
   all:
   @ echo $(V)
 
 This fails with
 
   echo:Argument list too long
   *** Error code 1
 
 Furthermore the workaround of creating a process for each file in V with
 
   V != jot 12440
   all:
   .for v in $(V)
   @ echo $(v)
   .endfor
 
 is not acceptable because it creates too much overhead for process
 creation (think of echo being an expensive command.) Question: is there
 any other way (short of increasing kern.argmax) to maybe divide and
 conquer the V contents by use of substitution magic? I'm thinking of
 something along repeatedly cramming N items in some variable and then
 calling echo less often.
 
 The original problem can be found in
 http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/52765
 
 Regards,
 
   Jens
 -- 
 Jens Schweikhardt http://www.schweikhardt.net/
 SIGSIG -- signature too long (core dumped)


Re: em0, polling performance, P4 2.8ghz FSB 800mhz

2004-03-03 Thread Luigi Rizzo
On Wed, Mar 03, 2004 at 10:03:11AM -0500, Andrew Gallatin wrote:
 
 Don Bowman writes:
 
   I'm not sure what affect on fxp. fxp is inherently limited
   by something internal to it, which prevents achieving 
   high packet rates. bge is the best chip, but doesn't

but you should not compare apples and oranges. the fxp is a 100mbit NIC,
the bge is a GigE NIC.

 Just curious - why is bge the best chip?  Is it because
 it exports a really nice API (separate recv ring for small messages),
 or is the chip inherently faster, regardless of its API?
 
 I'm trying to design a new ethernet API for a firmware-based nic,
 and I'm trying to convince a colleague that having separate
 receive rings for small and large frames is a really good thing.

i am actually not very convinced either, unless you are telling me
that there is a way to preserve ordering. Or you'd be in trouble
when, on your busy link, there is a mismatch between user-level and
link-level block sizes.

So, what is your design like, you want to pass the NIC buffers of
2-3 different sizes and let the NIC choose from the most appropriate
pool depending on the incoming frame size, but still return
received frames in a single ring in arrival order ?
This would make sense, but having completely separate rings
(small frames here, large frames there) with no ordering relation
would not.
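
A small model of the variant that would make sense, with all names and sizes invented for illustration (this is not the bge interface nor anyone's firmware API): the NIC picks a buffer from the smallest pool that fits, but reports every frame through one completion ring, so arrival order is preserved across pools.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8
#define NPOOLS     2

/* Hypothetical buffer sizes, smallest first. */
static const size_t pool_buf_size[NPOOLS] = { 128, 2048 };

struct completion {
    int    pool;    /* index into pool_buf_size[] */
    size_t len;     /* frame length */
};

struct rx_ring {
    struct completion slot[RING_SLOTS];
    unsigned          count;    /* completions posted, in arrival order */
};

/* Smallest pool whose buffers can hold 'len' bytes, or -1 if none. */
static int pick_pool(size_t len)
{
    int i;

    for (i = 0; i < NPOOLS; i++)
        if (len <= pool_buf_size[i])
            return i;
    return -1;
}

/*
 * Record an incoming frame in the single completion ring.
 * Returns the pool used, or -1 if the frame is too large or the ring
 * is full (this sketch has no consumer, so the ring never drains).
 */
static int rx_post(struct rx_ring *r, size_t len)
{
    int pool;

    assert(r != NULL);
    pool = pick_pool(len);
    if (pool < 0 || r->count >= RING_SLOTS)
        return -1;
    r->slot[r->count].pool = pool;
    r->slot[r->count].len = len;
    r->count++;
    return pool;
}
```

Frames of 64, 1500 and 100 bytes land in pools 0, 1 and 0, yet the host reads them back from slots 0, 1 and 2 in exactly that order. With two independent rings the relative order of the 1500-byte frame and its neighbours would be lost.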

cheers
luigi
 Drew


Re: Adding an IPFW rule from C program

2004-03-03 Thread Luigi Rizzo
On Wed, Mar 03, 2004 at 07:37:06AM +, Matt wrote:
 Tried this on the ipfw list but didnt get any response.
 
 Part of an app I am playing with needs to be able to add an ipfw
 rule.  I had though i got all of what i need from ipfw2.c and ip_fw.h
 but I am painfully new to C and must be missing something.  Not

you don't want to use the native API; do a system("ipfw add ...")
instead.
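
A minimal sketch of that approach. The helper names are made up for the example, and the rule text is passed to the shell verbatim, so it must never come from untrusted input; running it also needs root and an ipfw binary in the PATH.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format "ipfw add <num> <body>" into buf.
 * Returns the command length, or -1 if it did not fit.
 */
int ipfw_add_cmd(char *buf, size_t buflen, unsigned rule_num,
    const char *rule_body)
{
    int n;

    assert(buf != NULL && rule_body != NULL);
    n = snprintf(buf, buflen, "ipfw add %u %s", rule_num, rule_body);
    if (n < 0 || (size_t)n >= buflen)
        return -1;      /* encoding error or truncated command */
    return n;
}

/* Build the command and hand it to the shell; returns system()'s status. */
int ipfw_add(unsigned rule_num, const char *rule_body)
{
    char cmd[256];

    if (ipfw_add_cmd(cmd, sizeof(cmd), rule_num, rule_body) < 0)
        return -1;
    return system(cmd);
}
```

For example, ipfw_add(1000, "deny tcp from any to any 23") runs the command "ipfw add 1000 deny tcp from any to any 23".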

cheers
luigi


munmap.2 inconsistency ?

2004-02-06 Thread Luigi Rizzo
hi,
browsing through the munmap() page, it says
 Munmap() will fail if:

 [EINVAL]   The addr parameter was not page aligned, the len
...

now, i have verified that munmap works fine with any address returned
by mmap, even if not aligned, at least on a recent -STABLE.

As i assume that passing munmap() the same address returned by mmap()
is common behaviour, should we rephrase the manpage ?

cheers
luigi


Re: send(2) does not block, send(2) man page wrong?

2004-01-26 Thread Luigi Rizzo
On Mon, Jan 26, 2004 at 10:53:54AM -0800, Julian Elischer wrote:
...
 On Mon, 26 Jan 2004, Stuart Pook wrote:
 
   On 23 Jan 2004, Don Lewis wrote:
the send does not give an error: the packet is just thrown away.
   
   Which is the same result as you would get if the bottleneck is just one
   network hop away instead of at the local NIC.
  
  But it isn't. I'm broadcasting onto the local network.  With Linux and
  Solaris (which implement what FreeBSD send(2) says), it is so easy: I just
  send(2) away, and because the send blocks when the kernel buffer space is

I'd be really curious to know how Linux/Solaris actually implement
this blocking send and if they really block or use some kind
of timeout/retry loop in the kernel.

To implement a blocking send() on UDP sockets, you need a different
driver model from the one we have, one where sockets and other data
sources trying to access a full interface queue should be queued
into some kind of list hanging off the interface, so that when the
interface is ready again you can wake up the pending clients in
turn and process their requests.

This would cause the output queue to become effectively
unbounded (basically, it is like reserving at least one slot
per socket -- more if you want to deal with fragments),
and even if the slot can be allocated as part of
the socket, the delay would become unbounded as well.
Secondly, if the interface for some reason goes temporarily
down (e.g.  no-carrier or the like) the process would suddenly
block unless you mark the socket as non blocking.
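
For comparison, the userland alternative is a bounded retry loop around send(). The sketch below assumes the kernel reports a full output queue as ENOBUFS, which, judging from this thread, is not guaranteed to happen on every path; the function name, the retry count and the 1 ms pause are arbitrary choices for the example.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Send one datagram, retrying up to max_tries times while the kernel
 * answers ENOBUFS. Any other result (success or a different error) is
 * returned to the caller immediately.
 */
static ssize_t send_retry(int fd, const void *buf, size_t len, int max_tries)
{
    int i;

    assert(buf != NULL && max_tries > 0);
    for (i = 0; i < max_tries; i++) {
        ssize_t n = send(fd, buf, len, 0);

        if (n >= 0 || errno != ENOBUFS)
            return n;
        (void)poll(NULL, 0, 1);     /* sleep about 1 ms, then retry */
    }
    errno = ENOBUFS;
    return -1;
}
```

Unlike a blocking send in the kernel, this keeps the delay bounded: after max_tries attempts the caller gets the error back and can decide to drop the datagram.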

cheers
luigi


Re: XL driver checksum producing corrupted but checksum-correct packets

2004-01-24 Thread Luigi Rizzo
On Sat, Jan 24, 2004 at 01:38:37PM -0500, Robert Watson wrote:
...
 (2) Try the NDIS driver with the NDIS-u-lator on FreeBSD 5.x and see if
 that also has the problem.

but going this way you have no idea on what the driver does,
including enabling hw checksums. This looks like a
useless test at least for the purpose of finding out
what is going wrong

cheers
luigi


Re: XL driver checksum producing corrupted but checksum-correct packets

2004-01-24 Thread Luigi Rizzo
On Sat, Jan 24, 2004 at 02:12:12PM -0500, Robert Watson wrote:
...
  but going this way you have no idea on what the driver does, including
  enabling hw checksums. This looks like a useless test at least for the
  purpose of finding out what is going wrong
 
 Actually, I'm more curious about whether it's a known errata/misbehavior
 for the card that 3Com's drivers work around, or not.  The problem could
 well be completely unrelated to hardware checksumming per se -- the
 corruption might well be taking place as the buffer is moved from the
 card's buffer to the operating system managed buffer.  If the NDIS driver
 doesn't illustrate the same problem, it tells us that by frobbing
 appropriately, this problem can be worked around.  It also tells us that
 by looking a bit harder at what the driver is doing (i.e., how it frobs
 the hardware), we can learn something about the appropriate workaround. 

yes, but how would you know that, short of reverse engineering
the driver, or tracing I/O accesses to the hardware ?
It really looks like an overkill effort... I'd rather just
try to debug the issue working on an open source driver, or
dump the hardware altogether and replace it with something
known to work...

cheers
luigi


Re: send(2) does not block, send(2) man page wrong?

2004-01-23 Thread Luigi Rizzo
On Fri, Jan 23, 2004 at 06:09:20PM +0100, Andre Oppermann wrote:
...
 send() for UDP should block if the socket is filled and the interface
 can't drain the data fast enough.
  
  It doesn't (at least I cannot make it block)
 
 This stuff is rather complex.  A send() on a UDP socket processes right
 down to the if_output.  If that fails because the ifqueue is full, the
 packet will be free()d right away.  No luck with blocking and retrying.

and there would be no point in blocking given that the protocol (UDP)
is unreliable and designed not to give any guarantee whatsoever.
The most you can get is an error code on return from send()/write()
and friends.
Furthermore, send() and write() block on the socket buffer filling
up, not on the interface queue. Because UDP has no output socket
buffer, there is no way it can block.
Finally, overflows in the interface queue are never handled by send()
kernel code, not even for TCP: in this case, it is just TCP
congestion control that acts and, either at the next incoming ACK,
or upon a timeout, tries a retransmission.

  Send(2) indicates that it should do so.

i admit the manpage should definitely be clarified -- it says 'if
no message space is available at the socket...', but it does cover
the UDP behaviour.

Technically, in the UDP case there is always space at the sending
socket, because that space is never used - by definition of the UDP
protocol - and the packet goes straight to the ip layer and then
down to the interface.
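
A minimal sketch of what this means for a sender, assuming a FreeBSD-style
stack where a full interface queue shows up as ENOBUFS from sendto(): the
call does not block, so any pacing has to happen in userland. The helper
name udp_send_retry and the 1 ms pause are made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <time.h>

/*
 * Hypothetical helper: send one UDP datagram, retrying a bounded number
 * of times when the kernel reports ENOBUFS (output queue full).
 * Returns the number of bytes sent, or -1 with errno set.
 */
static ssize_t
udp_send_retry(int fd, const void *buf, size_t len,
    const struct sockaddr *to, socklen_t tolen, int max_tries)
{
	struct timespec pause = { 0, 1000000 };	/* 1 ms between attempts */
	ssize_t n = -1;
	int i;

	assert(buf != NULL && to != NULL);
	for (i = 0; i < max_tries; i++) {
		n = sendto(fd, buf, len, 0, to, tolen);
		if (n >= 0 || errno != ENOBUFS)
			break;		/* success, or an error we do not retry */
		if (i + 1 < max_tries)
			nanosleep(&pause, NULL);	/* let the queue drain */
	}
	return n;
}
```

Even with the retry loop the datagram can still be lost further down the
path; the loop only smooths over a local queue overflow.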

cheers
luigi


Re: syslog

2004-01-04 Thread Luigi Rizzo
On Sun, Jan 04, 2004 at 02:15:18AM +0700, Eugene Grosbein wrote:
 Hi!
 
 [EMAIL PROTECTED] wrote 8 years ago in src/lib/libc/gen/syslog.c:
 
 p += sprintf(p, "%.15s ", ctime(&now) + 4);
 
 What is '+ 4' for?

quite likely it is to skip the 'day of week' field -- the ctime
manpage says

 The ctime() function adjusts the time value for the current time zone in
 the same manner as localtime(), and returns a pointer to a 26-character
 string of the form:

   "Thu Nov 24 18:22:48 1986\n\0"

 All the fields have constant width.

so it makes sense
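
A small standalone sketch of the same trick, assuming only the fixed-width
ctime() layout quoted above: skipping 4 characters drops "Thu " and the
"%.15s" precision keeps "Nov 24 18:22:48". The function name syslog_stamp
is hypothetical, not the one used in syslog.c.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/*
 * Write the 15-character "Mmm dd hh:mm:ss" stamp for t into buf.
 * Returns the number of characters written (15), or -1 on failure.
 */
static int
syslog_stamp(time_t t, char *buf, size_t len)
{
	const char *full;

	assert(buf != NULL);
	full = ctime(&t);		/* e.g. "Thu Nov 24 18:22:48 1986\n" */
	if (full == NULL)
		return -1;
	/* +4 skips the day-of-week field, .15 stops before the year. */
	return snprintf(buf, len, "%.15s", full + 4);
}
```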

cheers
luigi

 http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/gen/syslog.c.diff?r1=1.2&r2=1.3
 
 Eugene Grosbein
 
 P.S. Please CC me, I'm not in list


Re: natd + ipfw question

2003-12-24 Thread Luigi Rizzo
On Tue, Dec 23, 2003 at 03:17:12PM -0500, Leo Bicknell wrote:
...
 I must not be clear on what "in" "out" "recv" and "xmit" mean, and
 after reading the manual page 3 times I'm now even more confused.

The names are reasonably intuitive...

  in  matches packets on the INput path (basically,
ip_input() and/or ether_input())

  out matches packets on the OUTput path;
(ip_output() and/or ether_output())

  recv foo0   matches packets that have been received from
interface foo0

  xmit bar1   matches packets that are going to be transmitted
on interface bar1

  via xx2 matches packets that are either received or
transmitted through interface xx2

the flow diagram near the beginning of the ipfw manpage should
clarify things a bit (i agree that the wording of 'recv/xmit/via'
section is a bit confusing, so if you have better suggestions they
are welcome)
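
The five keywords can be modelled as predicates over two fields. This is a
simplified sketch, not the ipfw code: struct pkt_info and the match_*
names are hypothetical, and it assumes that "via" checks the transmit
interface on the output path and the receive interface otherwise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced view of what the firewall knows about a packet. */
struct pkt_info {
	const char *rcvif;	/* receive interface, NULL if locally generated */
	const char *oif;	/* transmit interface, NULL on the input path */
};

static int
same_if(const char *a, const char *b)
{
	return a != NULL && b != NULL && strcmp(a, b) == 0;
}

/* "in" / "out": which path the packet is on. */
static int match_in(const struct pkt_info *p)  { return p->oif == NULL; }
static int match_out(const struct pkt_info *p) { return p->oif != NULL; }

/* "recv foo0": usable on both paths, as long as rcvif is known. */
static int
match_recv(const struct pkt_info *p, const char *ifname)
{
	return same_if(p->rcvif, ifname);
}

/* "xmit bar1": only meaningful on the output path. */
static int
match_xmit(const struct pkt_info *p, const char *ifname)
{
	return same_if(p->oif, ifname);
}

/* "via xx2": the interface relevant to the current path (assumption). */
static int
match_via(const struct pkt_info *p, const char *ifname)
{
	return same_if(p->oif != NULL ? p->oif : p->rcvif, ifname);
}
```

For a packet forwarded from fxp0 to fxp1, "recv fxp0" can match on both
passes, while "xmit fxp1" can only match on the output pass.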

cheers
luigi



Re: natd + ipfw question

2003-12-24 Thread Luigi Rizzo
On Wed, Dec 24, 2003 at 08:39:45AM -0500, Leo Bicknell wrote:
...
 Now that I've used IPFW2 for something more complicated than simple
 host filtering I see that the syntax and structure makes something
 like a firewall/nat box for any moderately interesting config way
 too complicated with way too many pitfalls. This whole "the packet
 may hit your rule between 0 and 4 times, depending on a pile of
 stuff" just doesn't fly, and add in the need for one_pass=0 to
 make dummynet traffic shaping work right, which adds some complication

honestly, i think you are misrepresenting things.
How many times you hit a rule depends on your ruleset, with
any firewall -- in fact, a ruleset is no different from a
program and if you want to do something useful with a program
you probably need to write slightly more than printf("hello world");
with a correspondingly increased chance for putting in bugs.
And you normally use one_pass=0 only when you want to build
complex firewall structures involving multiple pipes, or doing
dummynet filtering before natd (for which there is a better
way given that you can operate on both the input and output path).

I believe that what you want is not a better config language,
but some default rulesets that you can customize by
simply putting in your addresses (more or less).

cheers
luigi

 to the firewall rules and things are just all kinds of strange.
 
 That's no knock on the authors, backwards compatibility is important,
 and a lot has been grafted onto IPFW since it started (like divert/nat
 and the dummynet stuff).  I'll strongly recommend though that IPFW3
 have a whole new, from the ground up, redesigned config language.
 :)  And yes, I'm willing to help.
 
 -- 
Leo Bicknell - [EMAIL PROTECTED] - CCIE 3440
 PGP keys at http://www.ufp.org/~bicknell/
 Read TMBG List - [EMAIL PROTECTED], www.tmbg.org




Re: ipfw/ipf IP filtering thoughts

2003-11-30 Thread Luigi Rizzo
On Sun, Nov 30, 2003 at 06:53:10AM -0000, Antti Louko wrote:
 Generally, I like the (Free)BSD way of doing things.  But the IP
 filtering modules available for FreeBSD lack one feature when compared
 to Linux way (ipchains and iptables).

There is no call instruction by design in ipfw2. The reason is
that in many cases (e.g. after divert action or a dummynet 
pipe) packet processing might need to restart from the point
where it was suspended. Having a call/return would require to save
the return stack with the packet, which is expensive and was
even very hard to do before having m_tags, or to
introduce limitations in the actions, which is not nice and
not backward compatible.

I am not opposed to adding call/return actions (it would be trivial
to do in ipfw2, except for the state saving part) but would really
like to see a convincing example motivating their use. E.g. your
example (do certain tests only if the packet matches X) can be
trivially implemented by skipping to the end of the list if !X.

If you are concerned by readability of the resulting list, I
think you should consider ipfw[2] instructions as machine code
and instead read/generate them from a higher level description
in some scripting language.

I have one (small) extension which might help in producing more
efficient rulesets: introduce 'setflag'/'clearflag' actions (similar
to count) which can set or clear a small number of flags
(think of them as the bits in a 32-bit number)
when the packet matches, and then a flags command which can
look for certain flag configurations. So you could write
things like

setflag 0x100 src-ip a/24,b/26,c/30 ...
setflag 0x200 src-ip d/24,e,f ...
allow flags 0x300:0x300 dst-port 22,80
allow flags 0x100:0x100 dst-port 25

etc. so you can record the result of a potentially long
series of checks in a single flag and then act depending
on the flag configuration.

cheers
luigi

 In ipchains and iptables you have a sequential list of rules, very
 much like in ipfw and ipf, but you can have several different lists
 which have symbolic names and you can make calls from lists to other
 lists based on normal packet criteria.  If the list is exhausted, the
 scan returns to the previous list.  This makes it possible to make
 filtering decisions much more efficient in complex situations.  You can
 for example scan a certain list only for packets going to
 port 25 and so on.  In FreeBSD, you don't have this
 subroutine call feature at all and you are limited to only one
 sequential list with a goto.
 
 Any ideas how to proceed.  I think this would be really needed and
 widely used if available.


RFC: proposed new builtin for /bin/sh (associative arrays)

2003-10-31 Thread Luigi Rizzo

[Not sure what is the appropriate forum to discuss this, so
please redirect the discussion if you know where. I have Bcc-ed
a few /bin/sh committers]

I am trying to implement, in the most unintrusive way, something
resembling associative arrays in /bin/sh.  I am not interested in
syntactic sugar, so i am happy to use something like _ as a separator
between the array basename and the index, i.e.

foo_red foo_green foo_blue

would be part of the same array. And, as a first step, I am also
happy to be limited to [0-9A-Za-z_]+ as index values.
So all that was necessary was a command to enumerate the indexes
of an array, which is implemented by the attached patch, and
can be used as follows:

for i in `indexes foo_`
do
eval x=\$foo_$i
echo variable foo_$i has value $x
done

(basically, "indexes xyz" lists the remaining part of all variable
names that start with xyz. As a possibly useful side effect,
indexes "" lists all variable names, which i believe is not an
available sh function -- and i find it strange since we can
list the names of readonly and export-ed variables with the
"readonly" and "export" builtins).
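
The matching rule can be tried outside the shell with a standalone sketch.
It assumes variables are stored as "name=value" strings, as in the patch;
the function name index_of is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given a "name=value" string and a prefix, return a pointer to the part
 * of the name that follows the prefix and store its length in *len.
 * Returns NULL if the name does not start with the prefix.
 */
static const char *
index_of(const char *text, const char *prefix, size_t *len)
{
	const char *p = text, *q = prefix, *start;

	assert(text != NULL && prefix != NULL && len != NULL);
	/* Walk both strings while they agree and we are still in the name. */
	while (*p != '\0' && *p != '=' && *q != '\0' && *p == *q) {
		p++;
		q++;
	}
	if (*q != '\0')
		return NULL;		/* prefix not fully consumed: no match */
	start = p;
	while (*p != '\0' && *p != '=')
		p++;			/* the rest of the name is the index */
	*len = (size_t)(p - start);
	return start;
}
```

An empty prefix matches every variable, which is what makes the
"list all variable names" side effect fall out for free.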

Any comments ? Is this interesting enough to be committed
(with a proper manpage description) ?
I could provide a flag to indexes to return the values instead
of the names, but i believe this form is more useful.

The next step would be to allow arbitrary strings as indexes, but
that would be trickier because it would change the syntax for
variable names (e.g. allowing \[.*\] as the last part of a variable
name)

cheers
luigi

Index: builtins.def
===================================================================
RCS file: /home/ncvs/src/bin/sh/builtins.def,v
retrieving revision 1.7.2.2
diff -u -r1.7.2.2 builtins.def
--- builtins.def	27 Aug 2002 01:36:28 -0000	1.7.2.2
+++ builtins.def	30 Oct 2003 09:02:37 -0000
@@ -69,6 +69,7 @@
 fgcmd -j	fg
 getoptscmd	getopts
 hashcmd		hash
+indexes		indexes
 jobidcmd	jobid
 jobscmd		jobs
 #linecmd		line
Index: var.c
===================================================================
RCS file: /home/ncvs/src/bin/sh/var.c,v
retrieving revision 1.15.2.2
diff -u -r1.15.2.2 var.c
--- var.c	27 Aug 2002 01:36:28 -0000	1.15.2.2
+++ var.c	31 Oct 2003 09:06:27 -0000
@@ -602,6 +602,28 @@
 	return 0;
 }
 
+int
+indexes(int argc, char **argv)
+{
+	struct var **vpp;
+	struct var *vp;
+	char *p, *q;
+
+	if (argc != 2)
+		error("indexes require one argument");
+	for (vpp = vartab ; vpp < vartab + VTABSIZE ; vpp++) {
+		for (vp = *vpp ; vp ; vp = vp->next) {
+			for (p = vp->text, q = argv[1];
+			    *p != '=' && *p == *q; p++, q++)
+				;
+			if (*q != '\0')
+				continue;	/* not found */
+			while (*p != '=')
+				out1c(*p++);
+			out1c('\n');
+		}
+	}
+}
 
 /*
  * The local command.


suspect spl*() code in syscons.c

2003-10-21 Thread Luigi Rizzo
Hi,
both -current and -stable have the following snippet of code in
sys/dev/syscons/syscons.c:scclose():

{
...
int s;

if (SC_VTY(dev) != SC_CONSOLECTL) {
...
s = spltty();
...
}
spltty();
(*linesw[tp->t_line].l_close)(tp, flag);
ttyclose(tp);
spl0();
return(0);
}

Note that the omitted code never makes any spl*() call, nor does it
use the saved value again. Also, i am a bit suspicious about the
spltty()/spl0() sequence.
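
To make the concern concrete, here is a userland mock, not kernel code:
the priority level is modelled as a plain integer and the mock_* functions
only imitate the save/restore convention of the spl*() family. It shows
one way the two patterns could differ if the function were ever entered
at a raised level; whether that can happen in scclose() is exactly the
open question.

```c
#include <assert.h>

#define IPL_TTY	5		/* arbitrary level for the mock */

static int cur_ipl;		/* mock "current interrupt priority level" */

/* Raise to at least IPL_TTY and return the previous level. */
static int
mock_spltty(void)
{
	int old = cur_ipl;

	if (cur_ipl < IPL_TTY)
		cur_ipl = IPL_TTY;
	return old;
}

static void mock_splx(int s) { cur_ipl = s; }	/* restore saved level */
static void mock_spl0(void)  { cur_ipl = 0; }	/* drop unconditionally */

/* Pattern from the snippet: raise, work, then drop to 0. */
static void
close_with_spl0(void)
{
	(void)mock_spltty();
	/* ... line discipline close, ttyclose ... */
	mock_spl0();
}

/* Conventional pattern: raise, work, then restore what the caller had. */
static void
close_with_splx(void)
{
	int s = mock_spltty();

	assert(cur_ipl >= IPL_TTY);
	/* ... line discipline close, ttyclose ... */
	mock_splx(s);
}
```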

Can someone explain if this code is correct ?
(I have Bcc-ed the committers involved in writing this code,
maybe they know the answer).

cheers
luigi


Re: hardclock interrupt deadlock

2003-10-16 Thread Luigi Rizzo
On Thu, Oct 16, 2003 at 11:17:50AM -0400, Michael Marchetti wrote:
 Hi,
 
 We have encountered a problem where the system hangs.  We are running a 4.7
 SMP kernel using kernel polling on a Dual Xeon with hyperthreading enabled

puzzled on what you mean by kernel polling ... DEVICE_POLLING,
if that is what you mean, cannot work with SMP -- it should not even
build unless you manually disabled the check.

luigi

 (essentially a 4 processor system).  As a result, the only HW interrupts in
 the system are hardclock (8254), the rtc, serial console and scsi.  The
 synchronous interrupts are (8254 and rtc).  When the system is hung, I have
 found that the ipending and iactive bits for the 8254 and rtc are set
 (meaning the interrupt is pending and active) although giant lock is not
 held and all processors are idle (and halted).  This led me to believe that
 somehow the ipending bit was set just before the last interrupt returned.
 The only way the system would be able to run that interrupt again is if
 another interrupt would run and it would notice that ipending is set, and it
 would run (an interrupt delay would be seen).  In a non-polling system, I
 imagine the ethernet interrupts would wake it up.  I believe I found a
 potential hole where this could happen.
 
 In i386/isa/ipl.s:
 
 #ifdef SMP
   cli                           /* early to prevent INT deadlock */
 doreti_next2:
 #endif
   movl    %eax,%ecx
   notl    %ecx                  /* set bit = unmasked level */
 #ifndef SMP
   cli
 #endif
   andl    _ipending,%ecx        /* set bit = unmasked pending INT */
   jne     doreti_unpend
   movl    %eax,_cpl
 
 I'm concerned in the instance the ipending is checked and deemed to be not
 set, but just after another interrupt occurs causing ipending to be set.
 Because CPL is not yet unmasked, that interrupt is not forwarded.  In
 Particular, in i386/isa/apic_vector.s:
 
 3: ;                            /* other cpu has isr lock */    \
   APIC_ITRACE(apic_itrace_noisrlock, irq_num, APIC_ITRACE_NOISRLOCK) ;\
   lock ;                                                        \
   orl     $IRQ_BIT(irq_num), _ipending ;                        \
   testl   $IRQ_BIT(irq_num), _cpl ;                             \
   jne     4f ;                  /* this INT masked */           \
   call    forward_irq ;         /* forward irq to lock holder */ \
   POP_FRAME ;                   /* and return */                \
   iret ;                                                        \
   ALIGN_TEXT ;                                                  \
 
 The check for _cpl occurs right after the ipending, thus causing a potential
 race for checking/modifying the cpl.
 
 One quick solution that I thought might correct this would be in ipl.s,
 right after modifying the cpl, recheck the ipending again to see if it
 changed, such as:
 
 
 #ifdef SMP
   cli                           /* early to prevent INT deadlock */
 doreti_next2:
 #endif
   movl    %eax,%ecx
   notl    %ecx                  /* set bit = unmasked level */
 #ifndef SMP
   cli
 #endif
   andl    _ipending,%ecx        /* set bit = unmasked pending INT */
   jne     doreti_unpend
   movl    %eax,_cpl
   andl    _ipending,%ecx        /* set bit = unmasked pending INT */
   jne     doreti_unpend
 
 
 Any opinions/insight?
 
 thanks.


Re: hardclock interrupt deadlock

2003-10-16 Thread 'Luigi Rizzo'
On Thu, Oct 16, 2003 at 12:05:25PM -0400, Michael Marchetti wrote:
 I have enabled DEVICE_POLLING.   It does work with SMP (disabled the check).

You can remove code and pretend the remaining code works,
but that does not mean that it _actually_ does what is
expected to do.

cheers
luigi



Re: HZ = 1000 slows down application

2003-10-07 Thread Luigi Rizzo
On Tue, Oct 07, 2003 at 06:17:04PM -0400, [EMAIL PROTECTED] wrote:
 Hi Luigi, Mark,
 
 Thanks for your replies.
 
 We did some intensive profiling of our application. It does not seem like
 we are depending on clock ticks for any calculations.
 
 On the other hand we notice that our slow iterations happen almost at the
 same instant as "microuptime went backward" messages in the system log. We

if this is the case, probably your code at some point computes a
time difference which turns out negative (or if it is unsigned, it
becomes very very large) upon those events, thus causing some loop
to explode.
It should be easy to check if this is the case, and just ignore
those outliers rather than trying to figure out why the clock
goes backward. I used to see the same "microuptime went backwards"
msg on some of my 400MHz boxes, even without NTP enabled.
Maybe a buggy timer, not sure which timecounter was used on that
box (i read some time ago that the cpu on the soekris4801 has a
weird TSC implementation where the upper 32 bits change when the
lower 32 bits are 0xfffffffd, who knows what other bugs might be
in other hardware...)

cheers
luigi

 were told that ntpd is correcting the time when these messages appear. The
 vexing problem is that making HZ=1000 has increased the rate at which ntp
 updates the time. Is this possible ? Does ntp count the number of ticks
 before applying a correction ?
 
 This is the point we are at now. Any help to shed more light on this is
 appreciated.
 
 Thanks,
 -ansh
 
 
 
 
 
 Original Message:
 -
 From: Mark Santcroos [EMAIL PROTECTED]
 Date: Tue, 7 Oct 2003 19:16:14 +0200
 To: [EMAIL PROTECTED]
 Subject: Re: HZ = 1000 slows down application
 
 
 On Mon, Sep 22, 2003 at 02:22:02PM -0700, Luigi Rizzo wrote:
  On Mon, Sep 22, 2003 at 02:43:40PM -0400, [EMAIL PROTECTED] wrote:
  ...
   But now I noticed that my application is occasionally doing slower
   iterations. Average iteration time used to be 0.2 ms without polling
   enabled. With the device polling changes, the average time is still
   around the same, but once every few minutes the application sees
   iterations that are 3.3 seconds (*seconds*, not a typo) long.
  
  most likely your application makes some assumptions on the duration of
  a clock tick and then it gets confused when, say, a select returns
  quicker, or some time difference becomes negative, etc. etc. because
  of the finer granularity.
  
  very common type of bug.
 
 Hi,
 
 What was the outcome of this?
 
 Thanks
 
 Mark
 
 
 
 
 
 
 


Re: IPFW2

2003-09-23 Thread Luigi Rizzo
On Tue, Sep 23, 2003 at 12:28:07PM -0400, Matthew George wrote:
...
  you can count the traffic with dynamic rules (but this does not go
  to the logfile), not sure what you mean by 'see the transferred data file'
 
 from ipf(5):
 
 LOGGING
When a packet is logged, with either the  log  action  or  option, the
headers  of  the  packet  are written to the ipl packet logging psuedo-
device. Immediately following the log keyword, the following qualifiers
may be used (in order):
 
body   indicates  that  the first 128 bytes of the packet contents will
   be logged after the headers.
 
 I don't believe there is a comparable ipfw option ...

no, there isn't. However the attached patch lets you run any bpf-based
application on the packets which match an ipfw rule with 'log'
specifier when net.inet.ip.fw.verbose=0, thus achieving a very similar
if not a lot more powerful effect. Just use

sysctl net.inet.ip.fw.verbose=0
ipfw add count log ...

tcpdump -i ipfw0 ...

cheers
luigi

Index: sys/netinet/ip_fw2.c
===================================================================
RCS file: /home/ncvs/src/sys/netinet/ip_fw2.c,v
retrieving revision 1.6.2.16
diff -u -r1.6.2.16 ip_fw2.c
--- sys/netinet/ip_fw2.c	17 Jul 2003 06:03:39 -0000	1.6.2.16
+++ sys/netinet/ip_fw2.c	22 Sep 2003 22:21:38 -0000
@@ -51,10 +51,12 @@
 #include <sys/proc.h>
 #include <sys/socket.h>
 #include <sys/socketvar.h>
+#include <sys/sockio.h>	/* for SIOC* */
 #include <sys/sysctl.h>
 #include <sys/syslog.h>
 #include <sys/ucred.h>
 #include <net/if.h>
+#include <net/bpf.h>	/* for BPF */
 #include <net/route.h>
 #include <netinet/in.h>
 #include <netinet/in_systm.h>
@@ -225,9 +227,14 @@
     &dyn_short_lifetime, 0, "Lifetime of dyn. rules for other situations");
 SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, dyn_keepalive, CTLFLAG_RW,
     &dyn_keepalive, 0, "Enable keepalives for dyn. rules");
+static int fw_bpf_info = 1;
+SYSCTL_INT(_net_inet_ip_fw, OID_AUTO, bpf_info,
+    CTLFLAG_RW,
+    &fw_bpf_info, 0, "Add info in mac hdr");
 
 #endif /* SYSCTL_NODE */
 
+static struct ifnet ifn;	/* dummy ifnet to attach to bpf */
 
 static ip_fw_chk_t ipfw_chk;
 
@@ -1812,6 +1819,44 @@
 			case O_LOG:
 				if (fw_verbose)
 					ipfw_log(f, hlen, args->eh, m, oif);
+				else if (ifn.if_bpf != NULL) {
+					/*
+					 * Prepend a (readonly) header, fill it
+					 * with the real MAC header, or a dummy
+					 * one if not available. In this case
+					 * (layer3 packets) also restore the
+					 * byte ordering of some fields, and put
+					 * them back after bpf_mtap.
+					 * If requested, the first two bytes
+					 * of the src mac are replaced by the
+					 * rule number for userland filtering.
+					 */
+					struct m_hdr mh;
+					struct ether_header my_eh;
+					char *h;
+
+					mh.mh_next = m;
+					mh.mh_len = ETHER_HDR_LEN;
+					mh.mh_data = (char *)&my_eh;
+					if (args->eh)	/* layer2, complete */
+						h = (char *)args->eh;
+					else {
+						h = "DDDDDDSSSSSS\x08\x00";
+						ip->ip_off = ntohs(ip->ip_off);
+						ip->ip_len = ntohs(ip->ip_len);
+					}
+					bcopy(h, &my_eh, ETHER_HDR_LEN);
+					if (fw_bpf_info) {
+						mh.mh_data[0] = f->rulenum >> 8;
+						mh.mh_data[1] = f->rulenum & 0xff;
+					}
+					bpf_mtap(&ifn, (struct mbuf *)&mh);
+					if (args->eh == NULL) {
+						/* restore IP format */
+						ip->ip_off = htons(ip->ip_off);
+						ip->ip_len = htons(ip->ip_len);
+					}
+				}
 				match = 1;
 				break;
 
@@ -2767,11 +2833,34 @@
ipfw_timeout_h = timeout(ipfw_tick, NULL, dyn_keepalive_period*hz);
 }
 
+static int
+ipfw_ifnet_ioctl(struct ifnet *ifp, u_long cmd, caddr_t addr)
+{
+   int error = 0;
+
+   switch 

Re: IPFW2

2003-09-22 Thread Luigi Rizzo
On Mon, Sep 22, 2003 at 08:07:13PM +0200, Uwe Klann wrote:
 Hi All,
 
 From the Log file IPFW:-
 Sep 22 00:24:13 muc /kernel: ipfw: 3300 Accept TCP 217.10.213.30:4418
 217.9.121.209:21 in via fxp0
 
 How can I extend on FreeBSD 4.8 (ipfw2) the log contents to see the transferred
 data File and the amount of bytes that went out? Thank you in advance for your

you can count the traffic with dynamic rules (but this does not go
to the logfile), not sure what you mean by 'see the transferred data file'

luigi


 
 Uwe
 
 Uwe Klann
 Isensteinstr. 3
 80634 Munich
 Germany
 Mail: [EMAIL PROTECTED]
 
 
 


Re: HZ = 1000 slows down application

2003-09-22 Thread Luigi Rizzo
On Mon, Sep 22, 2003 at 02:43:40PM -0400, [EMAIL PROTECTED] wrote:
...
 But now I noticed that my application is occasionally doing slower
 iterations. Average iteration time used to be 0.2 ms without polling
 enabled. With the device polling changes, the average time is still around
 the same, but once every few minutes the application sees iterations that
 are 3.3 seconds (*seconds*, not a typo) long. 

most likely your application makes some assumptions on the duration of
a clock tick and then it gets confused when, say, a select returns
quicker, or some time difference becomes negative, etc. etc. because
of the finer granularity.

very common type of bug.

cheers
luigi

 This seems to happen as soon as I use the kernel with HZ=1000. Enabling or
 disabling device polling does not seem to make any difference to this
 behavior. I am trying to understand why there seem to be a few really long
 iterations. Could it happen that the application does not get any CPU for
 that long? Seems very counter intuitive that higher HZ should cause this. 
 
 Could anyone shed any light on what is happening ?
 
 Thanks,
 -ansh
 
 


can we disable AAAA queries in the resolver ?

2003-08-02 Thread Luigi Rizzo
hi,
recently i have been bitten by a problem which might be already
known, but still...

quite a few apps (sendmail and ssh among them) seem to always
try an AAAA query if compiled with ipv6 support, and even if
the kernel does not support ipv6, tcpdump shows AAAA queries going out
to the nameserver, and often timing out or otherwise causing my
apps significant delays at startup.

My understanding is that there are multiple buggy components here:
my ISP's nameserver certainly shouldn't behave so badly on AAAA
requests, and the applications should not bother asking AAAA queries
when the kernel has no ipv6 support.
On the other hand, the resolver code is probably just innocent
because if some application issues an AAAA request, the resolver
has no reason to object. Still, rather than fixing the many
broken applications, or the nameserver (on which i have no control)
i wonder if it is possible to instruct the resolver, perhaps through
some option in resolv.conf, to immediately return some kind
of negative replies on selected queries ?

cheers
luigi



Re: can we disable AAAA queries in the resolver ?

2003-08-02 Thread Luigi Rizzo
On Sat, Aug 02, 2003 at 09:59:18AM +0100, David Malone wrote:
 On Fri, Aug 01, 2003 at 11:52:00PM -0700, Luigi Rizzo wrote:
  My understanding is that there are multiple buggy components here:
  my ISP's nameserver certainly shouldn't behave so badly on AAAA
  requests, and the applications should not bother asking AAAA queries
  when the kernel has no ipv6 support.
...
 (Strictly speaking, you shouldn't cripple the resolver to not look
 up IPv6 addresses if none are configured 'cos you might want to
 look up the IPv6 address for some other reason than making a
 connection.  There is an AI_ADDRCONFIG flag for getaddrinfo that

i know, but what is happening is that all these applications
(including sendmail and our ssh, for that matter)
are broken in that they look for an AAAA record just for making
a connection. And then it is easier to have a sensible
default (that can be overridden by those apps who really need it)
than one which is correct but depends on too many things
[over which one has no control] to behave correctly.

 tells it to only look up addresses if you have an address in that
 family configured.  For some reason it isn't mentioned in our man
 page. I'm not sure what the status of our implementation is either...)
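
As a rough illustration of the AI_ADDRCONFIG behaviour described above,
an application that only wants addresses it can actually connect to could
set the flag in its getaddrinfo() hints. This is a minimal sketch, not
code taken from sendmail or ssh; the helper names init_connect_hints()
and lookup_for_connect() are made up for the example, and how strictly
the flag is honoured depends on the libc in use.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Fill in hints for a TCP lookup restricted to configured address families. */
void
init_connect_hints(struct addrinfo *hints)
{
    memset(hints, 0, sizeof(*hints));
    hints->ai_family = AF_UNSPEC;       /* IPv4 or IPv6, whichever is usable */
    hints->ai_socktype = SOCK_STREAM;
    hints->ai_flags = AI_ADDRCONFIG;    /* skip families with no local address */
}

/* Hypothetical wrapper: returns 0 and sets *res, or a getaddrinfo() error. */
int
lookup_for_connect(const char *host, const char *service,
    struct addrinfo **res)
{
    struct addrinfo hints;

    init_connect_hints(&hints);
    return getaddrinfo(host, service, &hints, res);
}
```

A caller would walk the returned list, try connect() on each entry, and
release it with freeaddrinfo(). A program that wants IPv6 addresses for
some other purpose simply leaves the flag out.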

browsing through the source code there is a bunch of
'options' in resolv.conf that are not mentioned in the manpage:

timeout: T
attempts: N
debug
no-tld-query
inet6
rotate
no-check-names
edns0
a6
dname
nibble: suffix
nibble2: suffix
bitstring: suffix
v6revmode: single|both


cheers
luigi

   David.


Re: Network pipes

2003-07-24 Thread Luigi Rizzo
hi,
i have the following questions:

* strange benchmark results! Given the description, I would expect 
  the |@ rsh and |@ ssh cases to give the same throughput, and
  in any case | rsh to be faster than | ssh. How come, instead,
  that the times differ by an order of magnitude ? Can you run the
  tests in similar conditions to appreciate the gains better ?

* I do not understand how you can remove the pipe on the remote host
  without modifying both sshd and sh there ?
  I think it would be very important to understand how much
  |@ depends on the behaviour of the remote daemon.

* the loss of encryption on the channel is certainly something that might
  escape the attention of the user. I also wonder in how many cases you
  really need the extra performance to justify the extra plumbing
  mechanism.

* there are subtle implications of your new plumbing in the way
  processes are started. With A | B | C the shell first creates the
  pipes, then it can start the processes in any order, and they can
  individually fail to start without any direct consequence other
  than an I/O failure. A |@ B |@ C requires that you start things
  from the end of the chain (because you cannot start a process 
  until you have a [socket] descriptor from the next stage in the
  chain), and if a process fails to start you cannot even start the
  next one in the sequence. Not that this is bad, just very different
  from regular pipes.

All the above leaves me a bit puzzled on whether or not this is a
useful addition... In fact, i am not convinced that network pipes
should be implemented in the shell...
 
cheers
luigi

On Thu, Jul 24, 2003 at 11:19:49AM +0300, Diomidis Spinellis wrote:
 I am currently testing a set of modifications to /bin/sh that allow a
 user to create a pipeline over the network using a socket as its
 endpoints.  Currently a command like
 
 tar cvf - / | ssh remotehost dd of=/dev/st0 bs=32k
 
 has tar sending each block through a pipe to a local ssh process, ssh
 communicating through a socket with a remote ssh daemon and dd
 communicating with sshd through a pipe again.  The changed shell allows
 you to write
 
 tar cvf - / |@ ssh remotehost -- dd of=/dev/st0 bs=32k | :
 
 The effect of the above command is that a socket is created between the
 local and the remote host with the standard output of tar and the
 standard input of dd redirected to that socket.  Authentication is still
 performed using ssh (or any other remote login mechanism you specify
 before the -- argument), but the flow between the two processes is from
 then on not protected in terms of integrity and privacy.  Thus the
 method will mostly be useful within the context of a LAN or a VPN.
 
 The authentication design requires the users to have a special command
 in their path on the remote host, but does not require an additional
 privileged server or the reservation of special ports.
 
 By eliminating two processes, the associated context switches, the data
 copying, and (in the case of ssh) encryption performance is markedly
 improved:
 
 dd if=/dev/zero bs=4k count=8192 | ssh remotehost -- dd of=/dev/null
 33554432 bytes transferred in 17.118648 secs (1960110 bytes/sec)
 dd if=/dev/zero bs=4k count=8192 |@ ssh remotehost -- dd of=/dev/null |
 :
 33554432 bytes transferred in  4.452980 secs (7535276 bytes/sec)
 
 Even eliminating the encryption overhead by using rsh you can still see 
 
 dd if=/dev/zero bs=4k count=8192 | rsh remotehost -- dd of=/dev/null
 33554432 bytes transferred in 131.907130 secs (254379 bytes/sec)
 dd if=/dev/zero bs=4k count=8192 |@ rsh remotehost -- dd of=/dev/null |
 :
 33554432 bytes transferred in 86.545385 secs (387709 bytes/sec)
 
 My questions are:
 
 1. How do you feel about integrating these changes to the /bin/sh in
 -CURRENT?  Note that network pipes are a different process plumbing
 mechanism, so they really do belong to a shell; implementing them
 through a separate command would be inelegant.
 
 2. Do you see any problems with the new syntax introduced?
 
 3. After the remote process starts running standard error output is
 lost.  Do you find this a significant problem?
 
 4. Both sides of the remote process are communication endpoints and have
 to be connected to other local processes via pipes.  Would it be enough
 to document this behaviour or should it be hidden from the user by means
 of forked read/write processes?
 
 Diomidis - http://www.spinellis.gr


Re: Network pipes

2003-07-24 Thread Luigi Rizzo
i understand the motivations (speeding up massive remote backups and the
like), but i do not believe you need to introduce a new shell
construct (with very different semantics) just to accommodate this.

I believe it is a lot more intuitive to say something like

foo [flags] source-host tar cvzf - /usr dest-host dd of=/dev/bar

and have your 'foo' command do the authentication using ssh or whatever
you require with the flags, create both ends of the socket, call
dup() as appropriate and then exec the source and destination
pipelines.

cheers
luigi

On Thu, Jul 24, 2003 at 02:04:21PM +0300, Diomidis Spinellis wrote:
 Luigi Rizzo wrote:
  * strange benchmark results! Given the description, I would expect
the |@ rsh and |@ ssh cases to give the same throughput, and
in any case | rsh to be faster than | ssh. How come, instead,
that the times differ by an order of magnitude ? Can you run the
tests in similar conditions to appreciate the gains better ?
 
 They were executed on different machines.  The ssh result was between
 freefall.freebsd.org and ref5, the rsh result was between old low-end
 Pentium machines on my home network.
 
  * I do not understand how you can remove the pipe on the remote host
without modifying both sshd and sh there ?
I think it would be very important to understand how much
|@ depends on the behaviour of the remote daemon.
 
 The remote daemon is only used for authentication.  Thus any remote host
 command execution method can be used without modifying the client or the
 server.  What the modified shell does is start on the remote machine a
 separate command netpipe.   Netpipe takes as arguments the originating
 host, the socket port, the command to execute, and its arguments. 
 Netpipe opens the socket back to the originating host, redirects its I/O
 to the socket, and execs the specified command.
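
The netpipe behaviour described above (connect back, redirect I/O, exec)
can be sketched roughly as follows. This is not the actual netpipe
source, which is not shown in the thread; connect_back() and
netpipe_run() are hypothetical names, and pointing both stdin and stdout
at the socket is an assumption about what "redirects its I/O" means.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Connect a TCP socket back to host:port; returns the fd, or -1 on failure. */
int
connect_back(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;              /* connected */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}

/*
 * Hypothetical core of a netpipe-like helper. argv must be
 * { host, port, command, args..., NULL }. Returns only on error.
 */
int
netpipe_run(char *const argv[])
{
    int fd = connect_back(argv[0], argv[1]);

    if (fd < 0)
        return -1;
    /* Point stdin and stdout at the socket, then become the command. */
    if (dup2(fd, STDIN_FILENO) < 0 || dup2(fd, STDOUT_FILENO) < 0)
        return -1;
    if (fd > STDERR_FILENO)
        close(fd);
    execvp(argv[2], &argv[2]);
    perror("execvp");
    return -1;
}
```

Nothing here authenticates the peer or protects the data; in the scheme
described above that job is left to ssh or rsh, which only starts the
helper on the remote side.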
 
  * the loss of encryption on the channel is certainly something that might
escape the attention of the user. I also wonder in how many cases you
really need the extra performance to justify the extra plumbing
mechanism.
 
 I felt the need for such functionality when moving GB data between
 different machines for creating a disk copy and backup to tape.  My
 requirements may be atypical, this is why I asked for input.
 
  * there are subtle implications of your new plumbing in the way
processes are started. With A | B | C the shell first creates the
pipes, then it can start the processes in any order, and they can
individually fail to start without any direct consequence other
than an I/O failure. A |@ B |@ C requires that you start things
from the end of the chain (because you cannot start a process
until you have a [socket] descriptor from the next stage in the
chain), and if a process fails to start you cannot even start the
next one in the sequence. Not that this is bad, just very different
from regular pipes.
 
 It is even worse.  You can not write A |@ B |@ C because sockets are
 created on the originating host.  For the above to work you would need a
 mechanism to create another socket between the B and C machines.  Maybe
 the syntax should be changed to make such constructions
 counterintuitive.
 
 
 Diomidis


Re: Network pipes

2003-07-24 Thread Luigi Rizzo
On Thu, Jul 24, 2003 at 10:36:41AM -0700, John-Mark Gurney wrote:
 Diomidis Spinellis wrote this message on Thu, Jul 24, 2003 at 14:04 +0300:
  separate command netpipe.   Netpipe takes as arguments the originating
  host, the socket port, the command to execute, and its arguments. 
  Netpipe opens the socket back to the originating host, redirects its I/O
  to the socket, and execs the specified command.
 
 This breaks nat firewalls.  It is a very common occurrence to only accept
 incoming connections, and only on certain ports.  This means any kind
 of firewall will probably be broken by this. :(

actually it is the other way around -- this solution simply won't
work on firewalled systems. But to tell the truth, i doubt you'd
do a multi-gb backup through a nat and be worried about the encryption
overhead.

cheers
luigi

 i.e. behind a nat to a public system, the return connection can't be
 established.  From any system to a nat redirected ssh server, the
 incoming connection won't make it to the destination machine.
 
 I think this should just be a utility like Luigi suggested.  This will
 help solve these problems.
 
 -- 
   John-Mark GurneyVoice: +1 415 225 5579
 
  All that I will do, has been done, All that I have, has not.


hints on shell string expansion ?

2003-07-07 Thread Luigi Rizzo
Hi,
i need a bit of help from creative /bin/sh users...

I am writing a script to generate ipfw test cases, and as
part of the script i need to generate 'actions' which can be either
one or more, e.g.

a1="allow"
a2="deny log"
a3="pipe 10"

Now, this works:

for act in "$a1" "$a2" "$a3"; do
    echo add $act ip from 1.2.3.4 to 5.6.7.8
done

but because the string of actions is used in several places,
I would love to find a way to group actions into a single
variable and then write something like this

actions="allow 'deny log' 'pipe 10'"
for act in $actions ; do
    echo add $act ip from 1.2.3.4 to 5.6.7.8
done

I have tried to play tricks with quotes and backquotes, backslashes,
eval, etc. but no methods helped. Any ideas ?

cheers
luigi


Re: High CPU usage on high-bandwidth long distance connections.

2003-03-19 Thread Luigi Rizzo
On Tue, Mar 18, 2003 at 01:28:31PM -0800, Ed Mooring wrote:
...
 I had something vaguely similar happen while I was porting the FreeBSD
 4.2 networking stack to LynxOS. It turned out the culprit was sbappend().
 It does a linear pointer chase down the mbuf chain each time you do
 a write() or send(). With a high bandwidth-delay product, that chain
 can get very long.
 
 This topic came up on freebsd-net last July, and Luigi Rizzo provided
 the following URL for a patch to cache the end of the mbuf chain, so
 sbappend() stays O(1) instead of O(n).

the patch was only for UDP though. I think the poster was seeing the problem
with TCP (which is also affected by the same thing).
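
To make the O(n) versus O(1) point concrete, here is a small sketch of
the idea of caching the end of a chain. It is not the sbappend() code or
the patch referred to in this thread; struct pkt and struct pktq are
simplified stand-ins for an mbuf chain and a socket buffer, and all
names are invented for the example.

```c
#include <stddef.h>

/* Simplified stand-ins for an mbuf chain and a socket buffer. */
struct pkt {
    struct pkt *next;
    int len;
};

struct pktq {
    struct pkt *head;
    struct pkt *tail;   /* cached end of chain */
    int cc;             /* total bytes queued */
};

/* O(n): walk the chain to find the end on every append. */
void
pktq_append_walk(struct pktq *q, struct pkt *p)
{
    struct pkt **pp = &q->head;

    while (*pp != NULL)
        pp = &(*pp)->next;
    p->next = NULL;
    *pp = p;
    q->tail = p;
    q->cc += p->len;
}

/* O(1): use the cached tail pointer instead of walking. */
void
pktq_append_tail(struct pktq *q, struct pkt *p)
{
    p->next = NULL;
    if (q->tail == NULL)
        q->head = p;
    else
        q->tail->next = p;
    q->tail = p;
    q->cc += p->len;
}
```

The price of the cached pointer is that every routine that removes or
splices elements has to keep it up to date, including resetting it to
NULL when the queue drains.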

cheers
luigi

 http://docs.freebsd.org/cgi/getmsg.cgi?fetch=366972+0+archive/2001/freebsd-net/20010211.freebsd-net
 
 The subject of the July thread was 'the incredible shrinking socket', if
 you want to hunt through the archives.
 
 Hope this helps.
 
 -- 
 Ed Mooring ([EMAIL PROTECTED])
 
 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with unsubscribe freebsd-hackers in the body of the message



Re: Realtek

2003-03-12 Thread Luigi Rizzo
On Wed, Mar 12, 2003 at 05:44:25PM +1100, Peter Jeremy wrote:
...
 Are you sure you were generating wire speed packets - this is about
 200,000 packets/sec at Fast speed.  ping -f runs at whatever rate

148.8 kpps (148,800 pps)
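
The Fast Ethernet figure can be checked with a little arithmetic: a
minimum-size frame is 64 bytes including the FCS, and each frame also
costs 8 bytes of preamble and a 12-byte inter-frame gap on the wire. A
small sketch, with ether_max_pps() being a name invented for this
example:

```c
#include <stdint.h>

/*
 * Maximum frames per second on an Ethernet link.
 * frame_bytes includes the 4-byte FCS (64 for a minimum-size frame);
 * each frame also occupies 8 bytes of preamble and a 12-byte gap.
 */
uint64_t
ether_max_pps(uint64_t link_bps, uint64_t frame_bytes)
{
    const uint64_t overhead = 8 + 12;

    return link_bps / ((frame_bytes + overhead) * 8);
}
```

For a 100 Mbit/s link and 64-byte frames this gives 148809 packets per
second, i.e. roughly 148.8 kpps rather than 200,000.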

 In order to get 200,000 pps, you're going to need 5-10 hosts
 generating traffic, each with a good NIC and connected to the test

one is enough as long as it is sufficiently fast (750MHz and above
in my experiments), you use a C program to call sendto() and
generate UDP packets, and your network card can cope with the
outgoing traffic (e.g. there is no way the 'fxp' can transmit
over ~120kpps no matter how fast the CPU is; 'xl' and several 'dc'
supported chips can do the job. Haven't tried other cards).

Using polling on the sender side helps but it is not
fundamental.
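
For reference, the kind of sendto() loop meant here can be sketched as
below. This is not the program used in those experiments; send_burst()
is a hypothetical helper, and the choice to silently skip failed sends
is just one way to handle a full interface queue.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Send 'count' datagrams of 'len' bytes on socket 'fd' as fast as sendto()
 * allows. 'dst' may be NULL if the socket is already connected.
 * Returns the number of datagrams the kernel accepted.
 */
unsigned long
send_burst(int fd, const struct sockaddr *dst, socklen_t dstlen,
    size_t len, unsigned long count)
{
    char payload[1472];     /* largest UDP payload in a 1500-byte MTU */
    unsigned long i, sent = 0;

    if (len > sizeof(payload))
        len = sizeof(payload);
    memset(payload, 0, sizeof(payload));
    for (i = 0; i < count; i++) {
        /* Failures (e.g. ENOBUFS on a full queue) are simply not counted. */
        if (sendto(fd, payload, len, 0, dst, dstlen) == (ssize_t)len)
            sent++;
    }
    return sent;
}
```

A caller would create the socket with socket(AF_INET, SOCK_DGRAM, 0),
fill in a struct sockaddr_in for the target, and pass a payload length
of 18 bytes to get minimum-size 64-byte Ethernet frames (14 header + 20
IP + 8 UDP + 18 payload + 4 FCS).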

cheers
luigi



Re: Realtek

2003-03-08 Thread Luigi Rizzo
On Fri, Mar 07, 2003 at 03:49:05PM -0600, Brandon D. Valentine wrote:
...
 I have had good luck with the Adaptec Quartet 66 cards, under both Linux
 and FreeBSD.  YMMV, though.  They come as 64-bit/66MHz cards, which
...
 controllers on it.  Chances are if you really need a four-port card $300
 is not that much to throw at it.

At this price level, you can also consider the Intel PRO1000/MT
(part number is PWLA8492MT) which has two Gig-E ports (copper), is
well supported under FreeBSD by the Intel-supported em driver,
and costs $174 (list price, if you shop e.g. on yahoo you can find
it cheaper than that).

The good thing about this card is that it works at Gig speed, and
it is widely available so hopefully it won't disappear from
the market by the time you place your order.

cheers
luigi



Re: FreeBSD firewall for high profile hosts - waste of time ?

2003-01-16 Thread Luigi Rizzo
why don't you read the ipfw manpage, install IPFW2, and rewrite
the ruleset using ipfw2 features (specifically the new syntax to
specify address sets) and dynamic rules:

something like

hosts={4,6,44,52,12,99,130,21,244}
ports=22,25,80,443
allow proto tcp src-ip 1.2.3.${hosts}/24 dst-port $ports setup keep-state
deny tcp from any to any

should reduce the 200+ rules that you have to the 4-lines/2-rules above.
Similar approach for UDP.

I think a lot of this discussion would have been saved if you
had given a good read to the ipfw manpage instead of trying to
tickle the ego of the list readers suggesting that the netwhatever
thing might perform better than FreeBSD on your task.

And i am stepping out of the discussion now...

cheers
luigi


On Thu, Jan 16, 2003 at 03:56:43PM -0800, Josh Brooks wrote:
 
  If I remember correctly he has less than 10Mbit
  uplink and a lot of count rules for client accounting.
  That is the reason I recommend that he use userland accounting.
  And as far as I understand a lot of count rules is
  the reason for trouble.
 
 I removed all the count rules a week or so ago.  Now I just have 2-300
 rules in the form:
 
 allow tcp from $IP to any established
 allow tcp from any to $IP established
 allow tcp from any to $IP 22,25,80,443 setup
 deny ip from any to $IP
 
 and I have that same set in there about 50-70 times - one for each
 customer IP address that has requested it.  That's it :)
 
 So each packet I get goes through about 5 rules at the front to check for
 bogus packets, then about 70 sets of the above until it either matches one
 of those, or goes out the end with the default allow rule.
 
 I _could_ put a ruleset like the above in for every customer, but then I
 would have about 2000 rules - so I only put them in for the customers that
 ask.  But again, even though every day I put in more and more special
 blocks for DoS packets, every day there is some new DoS packet that I have
 never seen before that hits me at thousands of packets per second, and all
 of them flow through that entire ruleset.
 -
 
 So I am going to:
 
 a) do the thing where I specify the interface for all my allow rules -
 that sounds like it will help a lot - 3 out of the 4 rules in the set
 above are allow rules - might as well push them through as soon as they
 get there.
 
 b) get better at blocking bogus packets every day :)
 
 c) start getting more complicated rate shaping with ipfw to limit icmp
 echo response and RSTs, etc.
 
 But I still don't know if any of that helps if I get a 20,000
 packet/second UDP flood to a valid port on an internal machine...
 
 



Re: net.inet.ip.dummynet.hash_size

2002-12-16 Thread Luigi Rizzo
On Mon, Dec 16, 2002 at 11:10:42AM +0200, mika ruohotie wrote:
 
 hello,
 
 ipfw man page says:
 
  buckets hash-table-size
  Specifies the size of the hash table used for storing the various
  queues.  Default value is 64 controlled by the sysctl(8) variable
  net.inet.ip.dummynet.hash_size, allowed range is 16 to 1024.
 
 and my question is if it's possible to somehow go beyond 1024, or if
 doing so wouldn't be advised (i.e., performance would be poor or so).

i think there was some recent commit which raised the limit above 1024
-- mind you, the number of buckets is not the upper bound for the
number of flows.
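
The reason the bucket count does not cap the number of flows is that
flows hashing to the same bucket are chained together. The sketch below
shows that general technique only; it is not the dummynet code, and the
structures, the modulo hash and the names are assumptions made for the
example.

```c
#include <stdlib.h>

#define NBUCKETS 16     /* deliberately small */

struct flow {
    struct flow *next;  /* chain of flows sharing a bucket */
    unsigned id;
};

struct flow_table {
    struct flow *bucket[NBUCKETS];
    unsigned nflows;
};

/* Find a flow by id, creating it if needed; NULL only if malloc fails. */
struct flow *
flow_lookup(struct flow_table *t, unsigned id)
{
    unsigned b = id % NBUCKETS;
    struct flow *f;

    for (f = t->bucket[b]; f != NULL; f = f->next)
        if (f->id == id)
            return f;
    f = malloc(sizeof(*f));
    if (f == NULL)
        return NULL;
    f->id = id;
    f->next = t->bucket[b];     /* colliding flows are chained */
    t->bucket[b] = f;
    t->nflows++;
    return f;
}
```

With many more flows than buckets the table still works, but each
lookup has to walk a longer chain, which is where a larger bucket count
helps.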

cheers
luigi

 i'd need to experiment shaping with large supernets giving each
 single ip address a designated bandwidth. (i'm using gigabit, if
 someone wonders if i really need that many, and the answer is yes)
 
 
 mickey
 



Re: panic: icmp_error: bad length

2002-12-11 Thread Luigi Rizzo
the diagnosis looks reasonable, though i do not remember changing
anything related to this between 4.6 and 4.7 so i wonder why the
error did not appear in earlier versions of the code.

icmp_error() consumes the mbuf so i believe it is ok to scramble it
but one should double check.
Note that NTOHS() seems to be deprecated in favour of the function version
of the same.
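
Assuming "the function version" means ntohs(), the conversion in the
patch quoted below could be written as in this sketch. The struct is a
cut-down stand-in holding only the two fields involved, not the real
struct ip, and ip_fields_to_host() is an invented name.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Only the two fields touched by the patch; not the real struct ip. */
struct ip_hdr_fields {
    uint16_t ip_len;
    uint16_t ip_off;
};

/* Convert the length and offset fields from network to host byte order. */
void
ip_fields_to_host(struct ip_hdr_fields *ip)
{
    ip->ip_len = ntohs(ip->ip_len);
    ip->ip_off = ntohs(ip->ip_off);
}
```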

cheers
luigi

On Thu, Dec 12, 2002 at 12:54:48AM +, Ian Dowse wrote:
 In message [EMAIL PROTECTED], Alexander Langer
  writes:
 Yeah, same situation here.  4.6 used to work w/o problem, 4.7 doesn't.
 
 Great, thanks for the debugging info. The bug seems to be that
 icmp_error() requires that the IP header fields are in host order,
 but when it is called on a bridged packet by the IPFW code, this is
 not the case. Something like the patch below (untested) should fix
 the IPFW1 case. A similar change is needed for IPFW2.
 
 Luigi: does this look reasonable? I'm not familiar enough with the
 IPFW code to know if it is OK to modify the mbuf like this. If not
 then it needs to be copied first like ip_forward() does, making
 sure that the IP header does not end up in a shared cluster.
 
 Ian
 
 Index: ip_fw.c
 ===
 RCS file: /home/iedowse/CVS/src/sys/netinet/ip_fw.c,v
 retrieving revision 1.131.2.38
 diff -u -r1.131.2.38 ip_fw.c
 --- ip_fw.c   21 Nov 2002 01:27:30 -  1.131.2.38
 +++ ip_fw.c   12 Dec 2002 00:43:22 -
 @@ -1573,6 +1573,11 @@
   break;
 }
   default:/* Send an ICMP unreachable using code */
 + /* Must convert to host order for icmp_error(). */
 + if (BRIDGED) {
 + NTOHS(ip->ip_len);
 + NTOHS(ip->ip_off);
 + }
   icmp_error(*m, ICMP_UNREACH,
   f->fw_reject_code, 0L, 0);
   *m = NULL;
 
 



out-of-order execution and code profiling

2002-11-25 Thread Luigi Rizzo
Hi,
I just got hit by a peculiar problem related to out-of-order
execution of instructions.
I was doing some low-level timing measurements using the rdtsc()
around selected pieces of code (the rdtsc() is included in
the TSTMP() functions that are in RELENG_4, source is in
sys/i386/isa/clock.c), as follows:

    TSTMP(3, ifp->if_unit, 1, 0);
    tmp = CSR_READ_1(sc, FXP_CSR_SCB_STATACK);
    TSTMP(3, ifp->if_unit, 2, 0);
    TSTMP(3, ifp->if_unit, 3, 0);

CSR_READ_1() does a volatile read on memory across a 33MHz
PCI bus, so it should take a very minimum of 100ns (about 75 cycles
at 750MHz), plus arbitration and bridge crossing and whatnot. To my
surprise, on a 750MHz Athlon box, the delta between the first two
timestamps turned out to be on the order of 39 clock cycles, whereas
the delta between 2 and 3 is in the 270-300 cycle range.

The only explanation i can find is that the rdtsc() within TSTMP()
is executed out of order.

I wonder, is there on the high-end i386 processors any 'barrier'
instruction of some kind that enforces in-order execution of some
piece of code ?
 
cheers
luigi




Re: out-of-order execution and code profiling

2002-11-25 Thread Luigi Rizzo
thanks a lot for the pointer to CPUID

luigi

On Mon, Nov 25, 2002 at 05:15:06PM -0800, Nate Lawson wrote:
...
 The Intel processor manual has an explicit example for this and recommends
 you use cpuid as a serializing instruction before the call to rdtsc.  
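
A minimal sketch of that recommendation, assuming GCC or clang inline
assembly on an x86 CPU. serialized_tsc() is a name made up here and is
not the TSTMP() code from RELENG_4; on other architectures the sketch
falls back to a monotonic clock so that it still compiles.

```c
#include <stdint.h>
#include <time.h>

/*
 * Read a timestamp after forcing earlier instructions to complete.
 * On x86, CPUID is a serializing instruction, so the RDTSC that follows
 * cannot be executed ahead of the code being measured.
 */
uint64_t
serialized_tsc(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t eax = 0, ebx, ecx = 0, edx, lo, hi;

    __asm__ __volatile__("cpuid"
        : "+a" (eax), "=b" (ebx), "+c" (ecx), "=d" (edx)
        :
        : "memory");
    (void)eax; (void)ebx; (void)ecx; (void)edx;   /* results not needed */
    __asm__ __volatile__("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
#else
    /* Fallback for other CPUs or compilers: a monotonic clock in ns. */
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}
```

CPUID itself takes a noticeable number of cycles, so it is worth timing
two back-to-back calls first and subtracting that overhead from the
deltas being measured.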




Re: Report from EuroBSDCon 2002

2002-11-19 Thread Luigi Rizzo
Very nice report, thanks :)

just wanted to mention that i am totally unrelated to the Torino
crew (at least, as far as i can tell... unless there was some
former student of mine!) and the credit for the cool stuff
they did is entirely to them.

cheers
luigi

On Tue, Nov 19, 2002 at 11:33:31AM +0100, Poul-Henning Kamp wrote:
...
 Luigi@ had sent a part of his crew from Torino to talk about VPN
 and other cool networking stuff in FreeBSD.




