Re: [PATCH 4/4] udp: memory accounting in IPv4
On Wed, Dec 05, 2007 at 11:28:34PM -0500, Hideo AOKI wrote: 1. Using sk_forward_alloc and adding socket lock UDP already uses a socket lock to send messages. However, it doesn't use the lock to receive messages. I wonder if we can also use the lock when sk_forward_alloc is updated in receive processing. I understand a performance issue might occur, but ... Having discussed this with Dave we've agreed that this is the best way to go. Thanks, -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmVHI~} [EMAIL PROTECTED] Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
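A minimal userspace model of the scheme under discussion may help readers who have not met sk_forward_alloc. The socket keeps a reserve of memory that is already charged to the protocol. It falls back to the shared, protocol-wide counter only when a packet does not fit in that reserve. On receive, the path reads the reserve, may top it up, and then subtracts from it. That whole sequence has to run under one lock, which is why taking the socket lock on receive comes up. Everything below is an assumption for illustration, not the kernel's code:

- struct model_sock, the model_rmem_*() helpers, the 4096-byte QUANTUM and the limit check are all hypothetical.
- A C11 atomic_flag spinlock stands in for the socket lock.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical accounting unit (page-sized in this model). */
#define QUANTUM 4096L

/* Userspace model of per-socket receive accounting; not a kernel struct. */
struct model_sock {
	atomic_flag lock;		/* stands in for the socket lock */
	long forward_alloc;		/* bytes reserved but not yet used */
	atomic_long *proto_allocated;	/* shared by all sockets, in quanta */
	long proto_limit;		/* hypothetical limit, in quanta */
};

static void model_lock(struct model_sock *sk)
{
	while (atomic_flag_test_and_set_explicit(&sk->lock,
						 memory_order_acquire))
		;	/* spin until the holder releases the lock */
}

static void model_unlock(struct model_sock *sk)
{
	atomic_flag_clear_explicit(&sk->lock, memory_order_release);
}

/* Charge 'size' bytes for a received packet; false means "drop it". */
static bool model_rmem_charge(struct model_sock *sk, long size)
{
	bool ok = true;

	model_lock(sk);
	if (sk->forward_alloc < size) {
		/* Reserve whole quanta to cover the shortfall. */
		long quanta = (size - sk->forward_alloc + QUANTUM - 1) / QUANTUM;
		long total = atomic_fetch_add(sk->proto_allocated, quanta) + quanta;

		if (total > sk->proto_limit) {
			/* Over the limit: give the quanta back and refuse. */
			atomic_fetch_sub(sk->proto_allocated, quanta);
			ok = false;
		} else {
			sk->forward_alloc += quanta * QUANTUM;
		}
	}
	if (ok)
		sk->forward_alloc -= size;
	model_unlock(sk);
	return ok;
}

/* Return 'size' bytes and hand whole unused quanta back to the pool. */
static void model_rmem_uncharge(struct model_sock *sk, long size)
{
	model_lock(sk);
	sk->forward_alloc += size;
	atomic_fetch_sub(sk->proto_allocated, sk->forward_alloc / QUANTUM);
	sk->forward_alloc %= QUANTUM;
	model_unlock(sk);
}
```

Because one lock is held across the check and the update, two receivers can no longer both see enough reserve and drive forward_alloc negative.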
routing policy based on u32 classifier
Hello everybody. Kindly, I would like to know if there is any plan to add this feature to a future kernel release. I know that fwmark is able to do this, but there is a limitation in source IP address selection. TIA
Re: [PATCH 2/8] [TFRC]: Put RX/TX initialisation into tfrc.c
| This separates RX/TX initialisation and puts all packet history / loss intervals | initialisation into tfrc.c. | The organisation is uniform: slab declaration - {rx,tx}_init() - {rx,tx}_exit() | | NAK, you can't call a __exit marked routine from a __init marked | routine. | Ok thanks, will fix that in revision 2.
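Background for the NAK: in the kernel, __init code is freed after initialisation. __exit code is dropped altogether when the code is built in. So an __init function that calls an __exit one on its error path can end up referencing code that no longer exists, and modpost reports this as a section mismatch. A common way to keep the uniform init/exit layout is to move the teardown into an unannotated helper that both paths call. The userspace sketch below shows that shape. tfrc_rx_init(), tfrc_rx_exit(), tfrc_rx_cleanup() and the two malloc'd buffers are hypothetical stand-ins for the slab setup described in the patch. __init and __exit are defined empty so the sketch compiles outside the kernel.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * In the kernel these place code in sections that may be discarded;
 * they are defined away here so the sketch builds in userspace.
 */
#define __init
#define __exit

/* Hypothetical stand-ins for the slab caches set up by the patch. */
static void *rx_hist_slab;
static void *rx_li_slab;

/* Unannotated teardown: safe to call from __init and __exit code alike. */
static void tfrc_rx_cleanup(void)
{
	free(rx_li_slab);
	rx_li_slab = NULL;
	free(rx_hist_slab);
	rx_hist_slab = NULL;
}

static int __init tfrc_rx_init(void)
{
	rx_hist_slab = malloc(64);
	if (rx_hist_slab == NULL)
		return -ENOMEM;

	rx_li_slab = malloc(64);
	if (rx_li_slab == NULL) {
		/* Error path uses the shared helper, not tfrc_rx_exit(). */
		tfrc_rx_cleanup();
		return -ENOMEM;
	}
	return 0;
}

static void __exit tfrc_rx_exit(void)
{
	tfrc_rx_cleanup();
}
```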
Re: [PATCH 5/8] [TFRC]: Loss interval code needs the macros/inlines that were moved
| |distcc[24516] ERROR: compile /root/.ccache/packet_his.tmp.aspire.home.net.24512.i on _tiptop failed |/usr/src/davem-2.6/net/dccp/ccids/lib/packet_history.c: In function '__one_after_loss': |/usr/src/davem-2.6/net/dccp/ccids/lib/packet_history.c:266: error: lvalue required as unary '&' operand <snip> | | Because you do it this way: | | tfrc_rx_hist_swap(TFRC_RX_HIST_ENTRY(h, 0), TFRC_RX_HIST_ENTRY(h, 3)); | | I checked and at least in this patch series all uses are of this type, | so why not do it using just the indexes, which would be simpler: | | tfrc_rx_hist_swap(h, 0, 3); | | With this implementation: | | static void tfrc_rx_hist_swap(struct tfrc_rx_hist *h, const int a, const int b) | { | const int idx_a = tfrc_rx_hist_index(h, a), | idx_b = tfrc_rx_hist_index(h, b); | struct tfrc_rx_hist_entry *tmp = h->ring[idx_a]; | | h->ring[idx_a] = h->ring[idx_b]; | h->ring[idx_b] = tmp; | } | Agreed, that is useful in the present case, since then everything uses inlines. The only suggestion I'd like to make is to use `u8' instead of `int' since the indices will have very low values. There is a related point: you will probably have noticed that loss_interval.c also uses macros. I don't know if you are planning to convert these also into inlines. I think that there would be less benefit in converting these, since they are local to loss_interval.c and mostly serve to improve readability. As I have at least one other patch to revise (plus another minor one), I'll rework this according to the above.
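Combining the two suggestions gives a swap by logical index with `u8' indices. A self-contained userspace sketch could look like this. The ring size of four, the loss_start field and the body of tfrc_rx_hist_index() are assumptions made so that the example compiles; the thread does not show the real definitions.

```c
#include <assert.h>
#include <stdint.h>

typedef uint8_t u8;

/* Assumed ring of four entries, so a mask of 3 wraps the index. */
#define TFRC_NDUPACK 3

/* Hypothetical stand-in for the real history entry. */
struct tfrc_rx_hist_entry {
	uint64_t seqno;
};

struct tfrc_rx_hist {
	struct tfrc_rx_hist_entry *ring[TFRC_NDUPACK + 1];
	u8 loss_start;	/* assumed: ring slot holding logical entry 0 */
};

/* Map a logical index to a ring slot (valid because 4 is a power of 2). */
static inline u8 tfrc_rx_hist_index(const struct tfrc_rx_hist *h, const u8 n)
{
	return (h->loss_start + n) & TFRC_NDUPACK;
}

/* Swap two history entries, addressed by logical index. */
static inline void tfrc_rx_hist_swap(struct tfrc_rx_hist *h,
				     const u8 a, const u8 b)
{
	const u8 idx_a = tfrc_rx_hist_index(h, a),
		 idx_b = tfrc_rx_hist_index(h, b);
	struct tfrc_rx_hist_entry *tmp = h->ring[idx_a];

	assert(a <= TFRC_NDUPACK && b <= TFRC_NDUPACK);
	h->ring[idx_a] = h->ring[idx_b];
	h->ring[idx_b] = tmp;
}
```

A call site then reads as tfrc_rx_hist_swap(h, 0, 3), with no address-of on a macro result.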
[PATCH 0/3] UCC TDM driver for MPC83xx platforms
There are three patches: [PATCH 1/3] drivers/misc: UCC TDM driver for mpc83xx platforms. This driver is usable in VoIP kind of applications to interface with SLIC kind of devices to exchange TDM voice samples. [PATCH 2/3] arch/: Platform changes - device tree entries for UCC TDM driver for MPC8323ERDB platform. - QE changes related to TDM, like: 1) Modified ucc_fast_init so that it can be used by fast UCC based TDM driver. Mainly changes have been made to configure TDM clocks and Fsyncs. 2) Modified get_brg_clk so that it can return the input frequency and input source of any BRG by reading the corresponding entries from device tree. 3) Added new nodes brg and clocks in the device tree which represent input clocks for different BRGs. 4) Modified qe_setbrg accordingly. - new device tree entries added for clocks and brg [PATCH 3/3] Documentation - Modified Documentation to explain the device tree entries related to UCC TDM driver and the new nodes added (clocks and brg). The patches apply over a merge of galak's for-2.6.25 plus for-2.6.24 plus of_doc_update branches. In brief the steps were: git clone git://git.kernel.org/pub/scm/linux/kernel/git/galak/powerpc.git powerpc-galak git checkout -b for-2.6.25 origin/for-2.6.25 git checkout -b for-2.6.24 origin/for-2.6.24 git checkout -b of_doc_update origin/of_doc_update git pull . for-2.6.24 # merge the other two git pull . for-2.6.25 git checkout -b tdm # clean slate for tdm rebase work Also, after applying the patches, changes have to be made corresponding to Tabi's patch "qe: add function qe_clock_source". The driver has been tested with a VoIP stack and application on MPC8323ERDB. With Regards Poonam
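As a conceptual illustration of time-division (de)multiplexing: each frame carries one time slot per channel. Splitting N frames into per-channel streams is therefore a strided copy, and transmit is the inverse. The 1/3 patch description says the TSA performs this work in hardware. The software version below only illustrates the data layout, not where the driver does it. It assumes one byte per slot (as with 8-bit u-law) and NUM_CHANNELS = 4. tdm_demux(), tdm_mux() and the buffer layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical number of active time slots (one per channel). */
#define NUM_CHANNELS 4

/*
 * Receive direction: frame f carries the sample for channel c at
 * frames[f * NUM_CHANNELS + c]; copy each channel into its own buffer.
 */
static void tdm_demux(const uint8_t *frames, size_t nframes,
		      uint8_t *const chan[NUM_CHANNELS])
{
	for (size_t f = 0; f < nframes; f++)
		for (size_t c = 0; c < NUM_CHANNELS; c++)
			chan[c][f] = frames[f * NUM_CHANNELS + c];
}

/*
 * Transmit direction: interleave per-channel buffers back into frames.
 * 'chan' is only read; it is not const-qualified at the byte level so
 * that a plain uint8_t *[] can be passed without a C conversion warning.
 */
static void tdm_mux(uint8_t *const chan[NUM_CHANNELS], size_t nframes,
		    uint8_t *frames)
{
	for (size_t f = 0; f < nframes; f++)
		for (size_t c = 0; c < NUM_CHANNELS; c++)
			frames[f * NUM_CHANNELS + c] = chan[c][f];
}
```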
[PATCH 1/3] UCC based TDM driver for MPC83xx platforms.
From: Poonam Agarwal-b10812 [EMAIL PROTECTED] The UCC TDM driver basically multiplexes and demultiplexes data from different channels. It can interface with, for example, SLIC kind of devices to receive TDM data, demultiplex it and send it to upper applications. At the transmit end it receives data for different channels, multiplexes it and sends it on the TDM channel. It internally uses the TSA (Time Slot Assigner), which does the multiplexing and demultiplexing, the UCC to perform SDMA between host buffers and the TSA, and the CMX to connect the TSA to the UCC. This driver will run on MPC8323E-RDB platforms. Signed-off-by: Poonam Aggrwal [EMAIL PROTECTED] Signed-off-by: Ashish Kalra [EMAIL PROTECTED] Signed-off-by: Kim Phillips [EMAIL PROTECTED] Signed-off-by: Michael Barkowski [EMAIL PROTECTED] --- drivers/misc/Kconfig | 21 + drivers/misc/Makefile | 1 + drivers/misc/ucc_tdm.c | 1068 drivers/misc/ucc_tdm.h | 227 ++ 4 files changed, 1317 insertions(+), 0 deletions(-) create mode 100644 drivers/misc/ucc_tdm.c create mode 100644 drivers/misc/ucc_tdm.h diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index b5e67c0..698a72c 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -219,6 +219,27 @@ config THINKPAD_ACPI_BAY If you are not sure, say Y here. +config UCC_TDM + bool "Freescale UCC TDM Driver" + depends on QUICC_ENGINE && UCC_FAST + default n + ---help--- + The TDM driver is for UCC based TDM devices, for example, the TDM device on + MPC832x RDB. Select it to run PowerVoIP on MPC832x RDB board. + The TDM driver can interface with SLIC kind of devices to transmit + and receive TDM samples. The TDM driver receives Time Division + multiplexed samples (for different channels) from the SLIC device, + demultiplexes them and sends them to the upper layers. At the transmit + end the TDM driver receives samples for different channels, + multiplexes them and sends them to the SLIC device.
+ +config TDM_LINEAR_PCM + bool "Linear PCM mode" + depends on UCC_TDM + ---help--- + This mode should be selected if the TDM driver's interface with the + SLIC device is linear PCM (e.g. 16-bit samples). If not selected, the + interface will be 8-bit u-law. config ATMEL_SSC tristate "Device driver for Atmel SSC peripheral" diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index 87f2685..6f0c49d 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -17,3 +17,4 @@ obj-$(CONFIG_SONY_LAPTOP) += sony-laptop.o obj-$(CONFIG_THINKPAD_ACPI) += thinkpad_acpi.o obj-$(CONFIG_FUJITSU_LAPTOP) += fujitsu-laptop.o obj-$(CONFIG_EEPROM_93CX6) += eeprom_93cx6.o +obj-$(CONFIG_UCC_TDM) += ucc_tdm.o diff --git a/drivers/misc/ucc_tdm.c b/drivers/misc/ucc_tdm.c new file mode 100644 index 000..232d537 --- /dev/null +++ b/drivers/misc/ucc_tdm.c @@ -0,0 +1,1068 @@ +/* + * drivers/misc/ucc_tdm.c + * + * UCC Based Linux TDM Driver + * This driver is designed to support UCC based TDM for PowerPC processors. + * This driver can interface with SLIC device to run VOIP kind of + * applications. + * + * Author: Ashish Kalra Poonam Aggrwal + * + * Copyright (c) 2007 Freescale Semiconductor, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version.
+ */ + +#include <linux/autoconf.h> +#include <linux/module.h> +#include <linux/sched.h> +#include <linux/kernel.h> +#include <linux/slab.h> +#include <linux/errno.h> +#include <linux/types.h> +#include <linux/interrupt.h> +#include <linux/time.h> +#include <linux/skbuff.h> +#include <linux/proc_fs.h> +#include <linux/delay.h> +#include <linux/dma-mapping.h> +#include <linux/string.h> +#include <linux/irq.h> +#include <linux/of_platform.h> +#include <linux/io.h> +#include <linux/wait.h> +#include <linux/timer.h> + +#include <asm/immap_qe.h> +#include <asm/qe.h> +#include <asm/ucc.h> +#include <asm/ucc_fast.h> +#include <asm/ucc_slow.h> + +#include "ucc_tdm.h" +#define DRV_DESC "Freescale QE UCC TDM Driver" +#define DRV_NAME "ucc_tdm" + +/* + * define the following #define if snooping or hardware-based cache coherency + * is disabled on the UCC transparent controller. This flag enables + * software-based cache-coherency support by explicitly flushing data cache + * contents after setting up the TDM output buffer(s) and invalidating the + * data cache contents before the TDM input buffer(s) are read. + */ +#undef UCC_CACHE_SNOOPING_DISABLED + +#define MAX_NUM_TDM_DEVICES 8 + +static struct tdm_ctrl *tdm_ctrl[MAX_NUM_TDM_DEVICES]; + +static int num_tdm_devices; +static int num_tdm_clients; + +#define PREV_PHASE(x) ((x == 0) ? MAX_PHASE : (x - 1)) +#define NEXT_PHASE(x) (((x + 1) > MAX_PHASE) ? 0 : (x +
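The TDM_LINEAR_PCM option above chooses between 16-bit linear samples and 8-bit u-law on the SLIC interface. For reference, here is a sketch of the standard G.711 u-law companding that the 8-bit mode refers to: bias 0x84, clipping at 32635, and code bits inverted on the wire. The function names are hypothetical. Nothing in this excerpt shows whether the driver converts samples itself or simply passes the 8-bit codes through.

```c
#include <stdint.h>

#define ULAW_BIAS 0x84   /* 132, added before locating the segment */
#define ULAW_CLIP 32635  /* largest magnitude that still fits after biasing */

/* Compress one 16-bit linear PCM sample to an 8-bit G.711 u-law code. */
static uint8_t linear_to_ulaw(int16_t pcm)
{
	int sign = (pcm < 0) ? 0x80 : 0;
	int mag = (pcm < 0) ? -(int)pcm : pcm;
	int exponent = 7;
	int mask;

	if (mag > ULAW_CLIP)
		mag = ULAW_CLIP;
	mag += ULAW_BIAS;

	/* Segment (exponent) = position of the highest set bit above bit 7. */
	for (mask = 0x4000; (mag & mask) == 0 && exponent > 0; mask >>= 1)
		exponent--;

	/* Sign, 3 exponent bits, 4 mantissa bits; all bits inverted. */
	return (uint8_t)~(sign | (exponent << 4) |
			  ((mag >> (exponent + 3)) & 0x0F));
}

/* Expand an 8-bit u-law code back to 16-bit linear PCM. */
static int16_t ulaw_to_linear(uint8_t code)
{
	int u = (uint8_t)~code;
	int exponent = (u >> 4) & 0x07;
	int mantissa = u & 0x0F;
	int mag = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS;

	return (int16_t)((u & 0x80) ? -mag : mag);
}
```

The round trip is lossy by design. For example, 1000 encodes to a code that decodes to 988, because the quantization step grows with the segment number.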
[0/4] DST: Distributed storage.
Distributed storage. I'm pleased to announce the 11th release of the distributed storage subsystem (DST). This is a maintenance release and includes bug fixes and simple feature extensions only. DST allows one to form storage on top of local and remote nodes and combine them into a linear or mirrored setup, which in turn can be exported to remote nodes. Short changelog: * wake up state when mirror detects an error, to speed up reconnect * if connecting in csum mode to a no-csum server, do not enable csums * do not clean queue until all users are removed * allow increasing the size of the storage in the linear add callback (with this change it is possible to add nodes to a linear array in real time without stopping the storage. The filesystem has to be prepared for the case when the underlying device has changed its size. Real-time addition of mirror nodes is also supported) * allow deleting the gendisk only after the device has been started * dst debug config option * Name: Gamardjoba, genacvale! ('Hi friend' in Georgian) Great thanks to Matthew Hodgson [EMAIL PROTECTED] for debugging! The overall list of DST features can be found on the project's homepage: http://tservice.net.ru/~s0mbre/old/?section=projects&item=dst Thank you. Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]
[1/4] DST: Distributed storage documentation.
Distributed storage documentation. Algorithms used in the system, userspace interfaces (sysfs dirs and files), design and implementation details are described here. Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED] diff --git a/Documentation/dst/algorithms.txt b/Documentation/dst/algorithms.txt new file mode 100644 index 000..1437a6a --- /dev/null +++ b/Documentation/dst/algorithms.txt @@ -0,0 +1,115 @@ +Each storage by itself is just a set of contiguous logical blocks, with +an allowed number of operations. Nodes, each of which has its own start and size, +are placed into the storage by an appropriate algorithm, which remaps a +logical sector number into the real node's sector. One can create +one's own algorithms, since DST has a pluggable interface for that. +Currently mirrored and linear algorithms are supported. + +Let's briefly describe how they work. + +Linear algorithm. +This algorithm uses the simple approach of concatenating storages into a single device with +increased size. Essentially, the new device +has a size equal to the sum of the sizes of the underlying nodes, and the nodes are +placed one after another. + + [Diagram: Node 1, Node 2 and Node 3 placed one after another, each with its own start and end, together spanning the DST storage from start to end; IO operations arrive at the DST storage.] + + Figure 1. + 3 nodes combined into single storage using linear algorithm. + +Mirror algorithm. +In this algorithm nodes are placed under each other, so when an +operation comes to the first one, it can be mirrored to all +underlying nodes. In case of reading, the actual data is obtained from +the nearest node: the algorithm keeps track of the previous operation +and knows where it stopped, so that the subsequent seek to the +start of the new request takes the shortest time. +Writing is always mirrored to all underlying nodes. + + [Diagram: IO operations arrive at the DST storage, which records a previous position; Node 1, Node 2 and Node 3 sit under it, each with its own previous position.] + + Figure 2.
+ 3 nodes combined into single storage using mirror algorithm. + +Each algorithm must implement a number of callbacks, +which must be registered at initialization time. + +struct dst_alg_ops +{ + int (*add_node)(struct dst_node *n); + void (*del_node)(struct dst_node *n); + int (*remap)(struct dst_request *req); + int (*error)(struct kst_state *state, int err); + struct module *owner; +}; + [EMAIL PROTECTED] +This callback is invoked when a new node is being added into the storage, +but before the node is actually added into the storage and can +be accessed through it. When it is called, all appropriate initialization +of the underlying device is already completed (the system has connected +to the remote node or got a reference to the local block device). At this +stage the algorithm can add the node into its private map. +It must return zero on success or a negative value otherwise. + [EMAIL PROTECTED] +This callback is invoked when a node is being deleted from the storage, +i.e. when its reference counter hits zero. It is called before +any cleaning is performed. +It does not return a value. + [EMAIL PROTECTED] +This callback is invoked each time a new bio hits the storage. +The request structure contains the BIO itself, a pointer to the node which originally +stores the whole region under the given IO request, and various parameters +used by the storage core to process this block request. +It must return zero on success or a negative value otherwise. It is up to +this method to perform all cleanup if remapping failed; for example, it must +call kst_bio_endio() for the given callback in case of error, which in turn +will call bio_endio(). Note that the dst_request structure provided in this +callback is allocated on the stack, so if there is a need to use it outside +of the given function, it must be cloned (this happens automatically +in the state's push callback, but that copy will not be shared by any other +user).
+ [EMAIL PROTECTED] +This callback is invoked for each error, which happened when processed
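Here is a rough model of the linear algorithm in Figure 1. Each node is appended at the current end of the storage. A logical sector is remapped by finding the node whose range contains it and subtracting that node's start. The add step corresponds to what dst_linear_add_node() in the 4/4 patch does with the node start and the storage's disk_size. The types and the lin_add_node()/lin_remap() helpers below are hypothetical userspace stand-ins, not the DST interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node descriptor: a contiguous range of logical sectors. */
struct lin_node {
	uint64_t start;	/* first logical sector served by this node */
	uint64_t size;	/* number of sectors */
};

struct lin_storage {
	struct lin_node nodes[8];
	size_t num_nodes;
	uint64_t disk_size;	/* sum of node sizes */
};

/* Append a node after the current end of the storage. */
static int lin_add_node(struct lin_storage *st, uint64_t size)
{
	struct lin_node *n;

	if (st->num_nodes == sizeof(st->nodes) / sizeof(st->nodes[0]))
		return -1;
	n = &st->nodes[st->num_nodes++];
	n->start = st->disk_size;
	n->size = size;
	st->disk_size += size;
	return 0;
}

/*
 * Translate a logical sector into (node index, sector within node).
 * Returns -1 if the sector lies beyond the end of the storage.
 */
static int lin_remap(const struct lin_storage *st, uint64_t sector,
		     size_t *node, uint64_t *node_sector)
{
	for (size_t i = 0; i < st->num_nodes; i++) {
		const struct lin_node *n = &st->nodes[i];

		if (sector >= n->start && sector < n->start + n->size) {
			*node = i;
			*node_sector = sector - n->start;
			return 0;
		}
	}
	return -1;
}
```

A request that crosses a node boundary would need to be split; this sketch does not handle that case.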
[3/4] DST: Network state machine.
Network state machine. Includes network async processing state machine and related tasks. Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED] diff --git a/drivers/block/dst/kst.c b/drivers/block/dst/kst.c new file mode 100644 index 000..8fa3387 --- /dev/null +++ b/drivers/block/dst/kst.c @@ -0,0 +1,1513 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED] + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/kernel.h> +#include <linux/module.h> +#include <linux/list.h> +#include <linux/slab.h> +#include <linux/socket.h> +#include <linux/kthread.h> +#include <linux/net.h> +#include <linux/in.h> +#include <linux/poll.h> +#include <linux/bio.h> +#include <linux/dst.h> + +#include <net/sock.h> + +struct kst_poll_helper +{ + poll_table pt; + struct kst_state *st; +}; + +static LIST_HEAD(kst_worker_list); +static DEFINE_MUTEX(kst_worker_mutex); + +/* + * This function creates bound socket for local export node.
+ */ +static int kst_sock_create(struct kst_state *st, struct saddr *addr, + int type, int proto, int backlog) +{ + int err; + + err = sock_create(addr->sa_family, type, proto, &st->socket); + if (err) + goto err_out_exit; + + err = st->socket->ops->bind(st->socket, (struct sockaddr *)addr, + addr->sa_data_len); + + err = st->socket->ops->listen(st->socket, backlog); + if (err) + goto err_out_release; + + st->socket->sk->sk_allocation = GFP_NOIO; + + return 0; + +err_out_release: + sock_release(st->socket); +err_out_exit: + return err; +} + +static void kst_sock_release(struct kst_state *st) +{ + if (st->socket) { + sock_release(st->socket); + st->socket = NULL; + } +} + +void kst_wake(struct kst_state *st) +{ + if (st) { + struct kst_worker *w = st->node->w; + unsigned long flags; + + spin_lock_irqsave(&w->ready_lock, flags); + if (list_empty(&st->ready_entry)) + list_add_tail(&st->ready_entry, &w->ready_list); + spin_unlock_irqrestore(&w->ready_lock, flags); + + wake_up(&w->wait); + } +} +EXPORT_SYMBOL_GPL(kst_wake); + +/* + * Polling machinery. + */ +static int kst_state_wake_callback(wait_queue_t *wait, unsigned mode, + int sync, void *key) +{ + struct kst_state *st = container_of(wait, struct kst_state, wait); + kst_wake(st); + return 1; +} + +static void kst_queue_func(struct file *file, wait_queue_head_t *whead, +poll_table *pt) +{ + struct kst_state *st = container_of(pt, struct kst_poll_helper, pt)->st; + + st->whead = whead; + init_waitqueue_func_entry(&st->wait, kst_state_wake_callback); + add_wait_queue(whead, &st->wait); +} + +static void kst_poll_exit(struct kst_state *st) +{ + if (st->whead) { + remove_wait_queue(st->whead, &st->wait); + st->whead = NULL; + } +} + +/* + * This function removes request from state tree and ordering list.
+ */ +void kst_del_req(struct dst_request *req) +{ + list_del_init(&req->request_list_entry); +} +EXPORT_SYMBOL_GPL(kst_del_req); + +static struct dst_request *kst_req_first(struct kst_state *st) +{ + struct dst_request *req = NULL; + + if (!list_empty(&st->request_list)) + req = list_entry(st->request_list.next, struct dst_request, + request_list_entry); + return req; +} + +/* + * This function dequeues first request from the queue and tree. + */ +static struct dst_request *kst_dequeue_req(struct kst_state *st) +{ + struct dst_request *req; + + mutex_lock(&st->request_lock); + req = kst_req_first(st); + if (req) + kst_del_req(req); + mutex_unlock(&st->request_lock); + return req; +} + +/* + * This function enqueues request into tree, indexed by start of the request, + * and also puts request into ordered queue. + */ +int kst_enqueue_req(struct kst_state *st, struct dst_request *req) +{ + if (unlikely(req->flags & DST_REQ_CHECK_QUEUE)) { + struct dst_request *r; + + list_for_each_entry(r, &st->request_list, request_list_entry) { + if (bio_rw(r->bio) != bio_rw(req->bio)) + continue; + + if (r->start >= req->start + req->size) + continue; + +
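The DST_REQ_CHECK_QUEUE loop is cut off in this message. Its two visible tests (same direction, then start position) suggest a search for an already queued request whose sector range overlaps the new one. If each request is treated as the half-open range [start, start + size), the complete overlap test reduces to the sketch below. struct range_req and range_overlaps() are hypothetical, and this is not the missing part of kst_enqueue_req().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request: a half-open sector range [start, start + size). */
struct range_req {
	uint64_t start;
	uint64_t size;
	int rw;		/* direction, e.g. 0 = read, 1 = write */
};

/* True if two requests of the same direction share at least one sector. */
static bool range_overlaps(const struct range_req *r,
			   const struct range_req *req)
{
	if (r->rw != req->rw)
		return false;
	if (r->start >= req->start + req->size)	/* r begins after req ends */
		return false;
	if (req->start >= r->start + r->size)	/* req begins after r ends */
		return false;
	return true;
}
```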
[4/4] DST: Algorithms used in distributed storage.
Algorithms used in distributed storage. Mirror and linear mapping code. Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED] diff --git a/drivers/block/dst/alg_linear.c b/drivers/block/dst/alg_linear.c new file mode 100644 index 000..9dc0976 --- /dev/null +++ b/drivers/block/dst/alg_linear.c @@ -0,0 +1,105 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED] + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/module.h> +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/dst.h> + +static struct dst_alg *alg_linear; + +/* + * This callback is invoked when node is removed from storage. + */ +static void dst_linear_del_node(struct dst_node *n) +{ +} + +/* + * This callback is invoked when node is added to storage. + */ +static int dst_linear_add_node(struct dst_node *n) +{ + struct dst_storage *st = n->st; + + dprintk("%s: disk_size: %llu, node_size: %llu.\n", + __func__, st->disk_size, n->size); + + mutex_lock(&st->tree_lock); + n->start = st->disk_size; + st->disk_size += n->size; + set_capacity(st->disk, st->disk_size); + mutex_unlock(&st->tree_lock); + + return 0; +} + +static int dst_linear_remap(struct dst_request *req) +{ + int err; + + if (req->node->bdev) { + generic_make_request(req->bio); + return 0; + } + + err = kst_check_permissions(req->state, req->bio); + if (err) + return err; + + return req->state->ops->push(req); +} + +/* + * Failover callback - it is invoked each time error happens during + * request processing.
+ */ +static int dst_linear_error(struct kst_state *st, int err) +{ + if (err) + set_bit(DST_NODE_FROZEN, &st->node->flags); + else + clear_bit(DST_NODE_FROZEN, &st->node->flags); + return 0; +} + +static struct dst_alg_ops alg_linear_ops = { + .remap = dst_linear_remap, + .add_node = dst_linear_add_node, + .del_node = dst_linear_del_node, + .error = dst_linear_error, + .owner = THIS_MODULE, +}; + +static int __devinit alg_linear_init(void) +{ + alg_linear = dst_alloc_alg("alg_linear", &alg_linear_ops); + if (!alg_linear) + return -ENOMEM; + + return 0; +} + +static void __devexit alg_linear_exit(void) +{ + dst_remove_alg(alg_linear); +} + +module_init(alg_linear_init); +module_exit(alg_linear_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Evgeniy Polyakov [EMAIL PROTECTED]"); +MODULE_DESCRIPTION("Linear distributed algorithm."); diff --git a/drivers/block/dst/alg_mirror.c b/drivers/block/dst/alg_mirror.c new file mode 100644 index 000..3c457ff --- /dev/null +++ b/drivers/block/dst/alg_mirror.c @@ -0,0 +1,1128 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED] + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details.
+ */ + +#include <linux/module.h> +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/poll.h> +#include <linux/dst.h> + +struct dst_mirror_node_data +{ + u64 age; +}; + +struct dst_mirror_priv +{ + unsigned int chunk_num; + + u64 last_start; + + spinlock_t backlog_lock; + struct list_head backlog_list; + + struct dst_mirror_node_data old_data, new_data; + + unsigned long *chunk; +}; + +static struct dst_alg *alg_mirror; +static struct bio_set *dst_mirror_bio_set; + +static int dst_mirror_resync(struct dst_node *n, int ndp); + +static void dst_mirror_mark_sync(struct dst_node *n) +{ + if (test_bit(DST_NODE_NOTSYNC, &n->flags)) { + struct dst_mirror_priv *priv = n->priv; + + clear_bit(DST_NODE_NOTSYNC, &n->flags); + dprintk("%s: node: %p, %llu:%llu synchronization " + "has been completed.\n", + __func__, n, n->start, n->size); + priv->old_data.age = 0; + } +} +
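The mirror documentation in 1/4 says that reads go to the nearest node, i.e. the one whose previous position is closest to the start of the new request, so the seek is shortest. dst_mirror_priv above keeps a last_start field that appears to serve this purpose. A minimal sketch of that selection follows. mirror_pick_read_node() and the plain array of positions are hypothetical. The real alg_mirror.c (not shown here) may differ, for example by skipping nodes that are not yet synchronised.

```c
#include <stddef.h>
#include <stdint.h>

/* Distance between two sector numbers, without unsigned underflow. */
static uint64_t sector_distance(uint64_t a, uint64_t b)
{
	return (a > b) ? a - b : b - a;
}

/*
 * Read path only (writes go to every mirror): pick the node whose last
 * position is closest to 'start', then record the end of this request
 * as that node's new position. Returns the node index, or -1 if empty.
 */
static int mirror_pick_read_node(uint64_t *last_pos, size_t num_nodes,
				 uint64_t start, uint64_t size)
{
	size_t best = 0;

	if (num_nodes == 0)
		return -1;
	for (size_t i = 1; i < num_nodes; i++)
		if (sector_distance(last_pos[i], start) <
		    sector_distance(last_pos[best], start))
			best = i;
	last_pos[best] = start + size;
	return (int)best;
}
```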
[2/4] DST: Core distributed storage files.
Core distributed storage files. Include userspace interfaces, initialization, block layer bindings and other core functionality. Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED] diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig index b4c8319..ca6592d 100644 --- a/drivers/block/Kconfig +++ b/drivers/block/Kconfig @@ -451,6 +451,8 @@ config ATA_OVER_ETH This driver provides Support for ATA over Ethernet block devices like the Coraid EtherDrive (R) Storage Blade. +source "drivers/block/dst/Kconfig" + source "drivers/s390/block/Kconfig" endmenu diff --git a/drivers/block/Makefile b/drivers/block/Makefile index dd88e33..fcf042d 100644 --- a/drivers/block/Makefile +++ b/drivers/block/Makefile @@ -29,3 +29,4 @@ obj-$(CONFIG_VIODASD) += viodasd.o obj-$(CONFIG_BLK_DEV_SX8) += sx8.o obj-$(CONFIG_BLK_DEV_UB) += ub.o +obj-$(CONFIG_DST) += dst/ diff --git a/drivers/block/dst/Kconfig b/drivers/block/dst/Kconfig new file mode 100644 index 000..e91f8ed --- /dev/null +++ b/drivers/block/dst/Kconfig @@ -0,0 +1,28 @@ +config DST + tristate "Distributed storage" + depends on NET + select CONNECTOR + select LIBCRC32C + ---help--- + This driver allows you to create distributed storage. + +config DST_DEBUG + bool "DST debug" + depends on DST + ---help--- + This option will turn on HEAVY debugging of the DST. + Turn it on ONLY if you have to debug some really obscure problem. + +config DST_ALG_LINEAR + tristate "Linear distribution algorithm" + depends on DST + ---help--- + This module allows you to create a linear mapping of the nodes + in the distributed storage. + +config DST_ALG_MIRROR + tristate "Mirror distribution algorithm" + depends on DST + ---help--- + This module allows you to create a mirror of the nodes in the + distributed storage.
diff --git a/drivers/block/dst/Makefile b/drivers/block/dst/Makefile new file mode 100644 index 000..1400e94 --- /dev/null +++ b/drivers/block/dst/Makefile @@ -0,0 +1,6 @@ +obj-$(CONFIG_DST) += dst.o + +dst-y := dcore.o kst.o + +obj-$(CONFIG_DST_ALG_LINEAR) += alg_linear.o +obj-$(CONFIG_DST_ALG_MIRROR) += alg_mirror.o diff --git a/drivers/block/dst/dcore.c b/drivers/block/dst/dcore.c new file mode 100644 index 000..17a5e61 --- /dev/null +++ b/drivers/block/dst/dcore.c @@ -0,0 +1,1631 @@ +/* + * 2007+ Copyright (c) Evgeniy Polyakov [EMAIL PROTECTED] + * All rights reserved. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + */ + +#include <linux/module.h> +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/blkdev.h> +#include <linux/bio.h> +#include <linux/slab.h> +#include <linux/connector.h> +#include <linux/socket.h> +#include <linux/dst.h> +#include <linux/device.h> +#include <linux/in.h> +#include <linux/in6.h> +#include <linux/buffer_head.h> + +#include <net/sock.h> + +static LIST_HEAD(dst_storage_list); +static LIST_HEAD(dst_alg_list); +static DEFINE_MUTEX(dst_storage_lock); +static DEFINE_MUTEX(dst_alg_lock); +static int dst_major; +static struct kst_worker *kst_main_worker; +static struct cb_id cn_dst_id = { CN_DST_IDX, CN_DST_VAL }; + +struct kmem_cache *dst_request_cache; + +static char dst_name[] = "Gamardjoba, genacvale!"; + +/* + * DST sysfs tree.
For a device called 'storage' which is formed + * on top of two nodes this looks like this: + * + * /sys/devices/storage/ + * /sys/devices/storage/alg : alg_linear + * /sys/devices/storage/n-800/type : R: 192.168.4.80:1025 + * /sys/devices/storage/n-800/size : 800 + * /sys/devices/storage/n-800/start : 800 + * /sys/devices/storage/n-800/clean + * /sys/devices/storage/n-800/dirty + * /sys/devices/storage/n-0/type : R: 192.168.4.81:1025 + * /sys/devices/storage/n-0/size : 800 + * /sys/devices/storage/n-0/start : 0 + * /sys/devices/storage/n-0/clean + * /sys/devices/storage/n-0/dirty + * /sys/devices/storage/remove_all_nodes + * /sys/devices/storage/nodes : sectors (start [size]): 0 [800] | 800 [800] + * /sys/devices/storage/name : storage + */ + +static int dst_dev_match(struct device *dev, struct device_driver *drv) +{ + return 1; +} + +static void dst_dev_release(struct device *dev) +{ +} + +static struct bus_type dst_dev_bus_type = { + .name = "dst", + .match = dst_dev_match, +}; + +static struct device dst_dev = { + .bus = &dst_dev_bus_type, + .release = dst_dev_release +}; + +static void dst_node_release(struct device *dev) +{ +} + +static
Re: [PATCH 5/8] [TFRC]: Loss interval code needs the macros/inlines that were moved
On Mon, Dec 10, 2007 at 11:31:53AM +, Gerrit Renker wrote: | |distcc[24516] ERROR: compile /root/.ccache/packet_his.tmp.aspire.home.net.24512.i on _tiptop failed |/usr/src/davem-2.6/net/dccp/ccids/lib/packet_history.c: In function '__one_after_loss': |/usr/src/davem-2.6/net/dccp/ccids/lib/packet_history.c:266: error: lvalue required as unary '&' operand <snip> | | Because you do it this way: | | tfrc_rx_hist_swap(TFRC_RX_HIST_ENTRY(h, 0), TFRC_RX_HIST_ENTRY(h, 3)); | | I checked and at least in this patch series all uses are of this type, | so why not do it using just the indexes, which would be simpler: | | tfrc_rx_hist_swap(h, 0, 3); | | With this implementation: | | static void tfrc_rx_hist_swap(struct tfrc_rx_hist *h, const int a, const int b) | { | const int idx_a = tfrc_rx_hist_index(h, a), | idx_b = tfrc_rx_hist_index(h, b); | struct tfrc_rx_hist_entry *tmp = h->ring[idx_a]; | | h->ring[idx_a] = h->ring[idx_b]; | h->ring[idx_b] = tmp; | } | Agreed, that is useful in the present case, since then everything uses inlines. The only suggestion I'd like to make is to use `u8' instead of `int' since the indices will have very low values. Agreed. There is a related point: you will probably have noticed that loss_interval.c also uses macros. I don't know if you are planning to convert these also into inlines. I think that there would be less benefit in converting these, since they are local to loss_interval.c and mostly serve to improve readability. In general I'm against using macros for functions, so please always consider doing things as inlines. I'll read some more patches today and provide comments as to whether I think it is OK for now to keep them as macros. As I have at least one other patch to revise (plus another minor one), I'll rework this according to the above. Thank you. - Arnaldo
[PATCH 1/3] drivers/misc :UCC based TDM driver for MPC83xx platforms.
From: Poonam Aggrwal [EMAIL PROTECTED]

The UCC TDM driver multiplexes and demultiplexes data from different
channels. It can interface with, for example, SLIC devices: it receives
TDM data, demultiplexes it and sends it to upper-layer applications. At
the transmit end it receives data for the different channels, multiplexes
it and sends it on the TDM channel.

Internally it uses the TSA (Time Slot Assigner), which does the
multiplexing and demultiplexing; the UCC, to perform SDMA between host
buffers and the TSA; and the CMX, to connect the TSA to the UCC.

This driver will run on MPC8323E-RDB platforms.

Signed-off-by: Poonam Aggrwal [EMAIL PROTECTED]
Signed-off-by: Ashish Kalra [EMAIL PROTECTED]
Signed-off-by: Kim Phillips [EMAIL PROTECTED]
Signed-off-by: Michael Barkowski [EMAIL PROTECTED]
---
 drivers/misc/Kconfig   |   21 +
 drivers/misc/Makefile  |    1 +
 drivers/misc/ucc_tdm.c | 1068
 drivers/misc/ucc_tdm.h |  227 ++
 4 files changed, 1317 insertions(+), 0 deletions(-)
 create mode 100644 drivers/misc/ucc_tdm.c
 create mode 100644 drivers/misc/ucc_tdm.h

diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index b5e67c0..698a72c 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -219,6 +219,27 @@ config THINKPAD_ACPI_BAY
 
           If you are not sure, say Y here.
 
+config UCC_TDM
+        bool "Freescale UCC TDM Driver"
+        depends on QUICC_ENGINE && UCC_FAST
+        default n
+        ---help---
+          The TDM driver is for UCC based TDM devices, for example, the TDM
+          device on MPC832x RDB. Select it to run PowerVoIP on the MPC832x RDB board.
+          The TDM driver can interface with SLIC devices to transmit
+          and receive TDM samples. The TDM driver receives time-division
+          multiplexed samples (for different channels) from the SLIC device,
+          demultiplexes them and sends them to the upper layers. At the transmit
+          end the TDM driver receives samples for different channels,
+          multiplexes them and sends them to the SLIC device.
+
+config TDM_LINEAR_PCM
+        bool "Linear PCM mode"
+        depends on UCC_TDM
+        ---help---
+          This mode should be selected if the TDM driver's interface with the
+          SLIC device is linear PCM (e.g. 16 bit samples). If not selected, the
+          interface will be 8 bit u-law.
 
 config ATMEL_SSC
         tristate "Device driver for Atmel SSC peripheral"
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index 87f2685..6f0c49d 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -17,3 +17,4 @@ obj-$(CONFIG_SONY_LAPTOP)      += sony-laptop.o
 obj-$(CONFIG_THINKPAD_ACPI)    += thinkpad_acpi.o
 obj-$(CONFIG_FUJITSU_LAPTOP)   += fujitsu-laptop.o
 obj-$(CONFIG_EEPROM_93CX6)     += eeprom_93cx6.o
+obj-$(CONFIG_UCC_TDM)          += ucc_tdm.o
diff --git a/drivers/misc/ucc_tdm.c b/drivers/misc/ucc_tdm.c
new file mode 100644
index 000..232d537
--- /dev/null
+++ b/drivers/misc/ucc_tdm.c
@@ -0,0 +1,1068 @@
+/*
+ * drivers/misc/ucc_tdm.c
+ *
+ * UCC Based Linux TDM Driver
+ * This driver is designed to support UCC based TDM for PowerPC processors.
+ * This driver can interface with SLIC device to run VOIP kind of
+ * applications.
+ *
+ * Author: Ashish Kalra & Poonam Aggrwal
+ *
+ * Copyright (c) 2007 Freescale Semiconductor, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ */
+
+#include <linux/autoconf.h>
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/errno.h>
+#include <linux/types.h>
+#include <linux/interrupt.h>
+#include <linux/time.h>
+#include <linux/skbuff.h>
+#include <linux/proc_fs.h>
+#include <linux/delay.h>
+#include <linux/dma-mapping.h>
+#include <linux/string.h>
+#include <linux/irq.h>
+#include <linux/of_platform.h>
+#include <linux/io.h>
+#include <linux/wait.h>
+#include <linux/timer.h>
+
+#include <asm/immap_qe.h>
+#include <asm/qe.h>
+#include <asm/ucc.h>
+#include <asm/ucc_fast.h>
+#include <asm/ucc_slow.h>
+
+#include "ucc_tdm.h"
+#define DRV_DESC "Freescale QE UCC TDM Driver"
+#define DRV_NAME "ucc_tdm"
+
+/*
+ * define the following #define if snooping or hardware-based cache coherency
+ * is disabled on the UCC transparent controller. This flag enables
+ * software-based cache-coherency support by explicitly flushing data cache
+ * contents after setting up the TDM output buffer(s) and invalidating the
+ * data cache contents before the TDM input buffer(s) are read.
+ */
+#undef UCC_CACHE_SNOOPING_DISABLED
+
+#define MAX_NUM_TDM_DEVICES 8
+
+static struct tdm_ctrl *tdm_ctrl[MAX_NUM_TDM_DEVICES];
+
+static int num_tdm_devices;
+static int num_tdm_clients;
+
+#define PREV_PHASE(x) ((x == 0) ? MAX_PHASE : (x - 1))
+#define NEXT_PHASE(x) (((x + 1) > MAX_PHASE) ? 0 : (x + 1))
+
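Given the preference for inlines over function-like macros expressed in the TFRC thread, `PREV_PHASE()` and `NEXT_PHASE()` could also be written as `static inline` functions. As macros they expand their argument unparenthesised (`x == 0`, `x - 1`), so an argument such as `a ? b : c` would be mis-parsed. `MAX_PHASE` is defined in ucc_tdm.h, which is not shown, so the value below is an assumption, as are the lowercase function names.

```c
/* Assumed value; the real MAX_PHASE lives in ucc_tdm.h (not shown). */
#define MAX_PHASE 1

/* Step backwards through phases 0..MAX_PHASE, wrapping 0 -> MAX_PHASE. */
static inline int prev_phase(int x)
{
	return (x == 0) ? MAX_PHASE : x - 1;
}

/* Step forwards through phases 0..MAX_PHASE, wrapping MAX_PHASE -> 0. */
static inline int next_phase(int x)
{
	return (x + 1 > MAX_PHASE) ? 0 : x + 1;
}
```

Unlike the macros, these evaluate their argument exactly once and are type-checked by the compiler.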
[PATCH 2/3] arch/ : Platform changes for UCC TDM driver for MPC8323ERDB.Also includes related QE changes.
From: Poonam Aggrwal [EMAIL PROTECTED]

This patch makes the necessary changes in the QE and UCC framework to
support TDM. It also adds support for configuring the BRG properly
through device tree entries, and includes the device tree entries for the
UCC TDM driver.

Tested on the MPC8323ERDB platform.

Signed-off-by: Poonam Aggrwal [EMAIL PROTECTED]
Signed-off-by: Ashish Kalra [EMAIL PROTECTED]
Signed-off-by: Kim Phillips [EMAIL PROTECTED]
Signed-off-by: Michael Barkowski [EMAIL PROTECTED]
---
 arch/powerpc/boot/dts/mpc832x_rdb.dts |   58 +++
 arch/powerpc/sysdev/qe_lib/qe.c       |  128 ++--
 arch/powerpc/sysdev/qe_lib/ucc.c      |  265 +
 arch/powerpc/sysdev/qe_lib/ucc_fast.c |   37 +
 include/asm-powerpc/qe.h              |    8 +
 include/asm-powerpc/ucc.h             |    4 +
 include/asm-powerpc/ucc_fast.h        |    4 +
 7 files changed, 492 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/boot/dts/mpc832x_rdb.dts b/arch/powerpc/boot/dts/mpc832x_rdb.dts
index 388c8a7..333408c 100644
--- a/arch/powerpc/boot/dts/mpc832x_rdb.dts
+++ b/arch/powerpc/boot/dts/mpc832x_rdb.dts
@@ -105,6 +105,17 @@
                         device_type = "par_io";
                         num-ports = <7>;
 
+                        ucc1pio:[EMAIL PROTECTED] {
+                                pio-map = <
+                        /* port  pin  dir  open_drain  assignment  has_irq */
+                                        0  e  2  0  1  0        /* CLK11 */
+                                        3 16  1  0  2  0        /* BRG9 */
+                                        3 1b  1  0  2  0        /* BRG3 */
+                                        0  0  3  0  2  0        /* TDMATxD0 */
+                                        0  4  3  0  2  0        /* TDMARxD0 */
+                                        3 1b  2  0  1  0>;      /* CLK1 */
+                        };
+
                         ucc2pio:[EMAIL PROTECTED] {
                                 pio-map = <
                         /* port  pin  dir  open_drain  assignment  has_irq */
@@ -169,6 +180,36 @@
                 };
         };
 
+        clocks {
+                compatible = "fsl,cpm-clocks";
+                /* clock freqs in Hz(for CLK1~CLK24).
+                 * CLK11 is 1024KHz,
+                 * all other clocks unused
+                 * #clock-cells define number of cells
+                 * used by the clock-frequency.
+                 * right now only #clock cells=1 is
+                 * implemented. Provision is there to
+                 * handle frequencies > 4Gig
+                 */
+                #clock-cells = <1>;
+                clock-frequency = <0 0 0 0 0 0
+                                0 0 0 0 d#1024000 0
+                                0 0 0 0 0 0
+                                0 0 0 0 0 0>;
+        };
+
+        [EMAIL PROTECTED] {
+                compatible = "fsl,cpm-brg";
+                /* input clock sources for all the 16 BRGs.
+                 * 1-24 for CLK1 to CLK24.
+                 * BRG9 uses CLK11, BRG1 and BRG2-8 use
+                 * the QE clock.
+                 */
+                fsl,brg-sources = <0 0 0 0 0 0 0 0
+                                b 0 0 0 0 0 0 0>;
+                reg = <640 7f>;
+        };
+
         [EMAIL PROTECTED] {
                 device_type = "spi";
                 compatible = "fsl_spi";
@@ -187,6 +228,23 @@
                 mode = "cpu";
         };
 
+        [EMAIL PROTECTED] {
+                device_type = "tdm";
+                compatible = "fsl,ucc-tdm";
+                model = "UCC";
+                device-id = <1>;
+                fsl,tdm-num = <1>;
+                fsl,si-num = <1>;
+                fsl,tdm-tx-clk = "CLK1";
+                fsl,tdm-rx-clk = "CLK1";
+                fsl,tdm-tx-sync = "BRG9";
+                fsl,tdm-rx-sync = "BRG9";
+                reg = <2000 200>;
+                interrupts = <20>;
+                interrupt-parent = <&qeic>;
+                pio-handle = <&ucc1pio>;
+        };
+
         [EMAIL PROTECTED] {
                 device_type = "network";
                 compatible = "ucc_geth";
diff --git a/arch/powerpc/sysdev/qe_lib/qe.c b/arch/powerpc/sysdev/qe_lib/qe.c
index 1df3b4a..abcf0b4 100644
--- a/arch/powerpc/sysdev/qe_lib/qe.c
+++ b/arch/powerpc/sysdev/qe_lib/qe.c
@@ -149,22 +149,116 @@ EXPORT_SYMBOL(qe_issue_cmd);
  */
 static unsigned int brg_clk = 0;
 
-unsigned int get_brg_clk(void)
+u32 get_brg_clk(enum qe_clock brgclk, enum qe_clock *brg_source)
 {
-        struct device_node *qe;
-
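The `clocks` and `brg` nodes above are meant to be read together: `fsl,brg-sources` gives, per BRG, either 0 (driven by the QE clock) or n for CLKn, and `clock-frequency` gives CLKn's rate at index n-1. In the example, BRG9's source `b` is 0xb = 11 (CLK11), since cells in this dts syntax are hexadecimal unless prefixed with `d#`. The hypothetical, device-tree-free C sketch below performs that lookup; it is not the patch's `get_brg_clk()`, and `qe_clk` is an assumed parameter standing in for the QE clock rate.

```c
#include <stdint.h>

#define NUM_EXT_CLKS 24
#define NUM_BRGS     16

/*
 * Hypothetical lookup: return the input frequency (Hz) of BRG number brg
 * (1..16). sources[] mirrors fsl,brg-sources (0 = QE clock, n = CLKn) and
 * clk_freq[] mirrors clock-frequency with #clock-cells = <1>.
 * Returns 0 for an out-of-range BRG number or clock source.
 */
static uint32_t brg_input_freq(int brg, const uint32_t sources[NUM_BRGS],
			       const uint32_t clk_freq[NUM_EXT_CLKS],
			       uint32_t qe_clk)
{
	uint32_t src;

	if (brg < 1 || brg > NUM_BRGS)
		return 0;
	src = sources[brg - 1];
	if (src == 0)
		return qe_clk;          /* driven by the QE clock (BRGCLK) */
	if (src > NUM_EXT_CLKS)
		return 0;
	return clk_freq[src - 1];       /* CLKn is entry n - 1 */
}
```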
[PATCH 3/3] Modified Documentation to explain dts entries for UCC TDM driver.
From: Poonam Aggrwal [EMAIL PROTECTED]

Modified the documentation to explain the new properties introduced for
the UCC TDM driver. Also, two new nodes, brg and clocks, have been added
to configure a BRG from the device tree.

Signed-off-by: Poonam Aggrwal [EMAIL PROTECTED]
Signed-off-by: Ashish Kalra [EMAIL PROTECTED]
Signed-off-by: Kim Phillips [EMAIL PROTECTED]
Signed-off-by: Michael Barkowski [EMAIL PROTECTED]
---
 Documentation/powerpc/booting-without-of.txt |   96 +-
 1 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt
index e9a3cb1..94a6b4b 100644
--- a/Documentation/powerpc/booting-without-of.txt
+++ b/Documentation/powerpc/booting-without-of.txt
@@ -1613,8 +1613,8 @@ platforms are moved over to use the flattened-device-tree model.
    Required properties:
    - device_type : should be "network", "hldc", "uart", "transparent"
-    "bisync" or "atm".
-   - compatible : could be "ucc_geth" or "fsl_atm" and so on.
+    "bisync", "atm" or "tdm".
+   - compatible : could be "ucc_geth", "fsl_atm" or "fsl,ucc-tdm" and so on.
    - model : should be "UCC".
    - device-id : the ucc number(1-8), corresponding to UCCx in UM.
    - reg : Offset and length of the register set for the device
@@ -1666,7 +1666,44 @@ platforms are moved over to use the flattened-device-tree model.
         pio-handle = <140001>;
         };
 
+   Required properties for tdm device_type:
+   - instead of tx-clock and rx-clock, the following clock properties are
+     required:
+     - fsl,tdm-tx-clk : This property selects the TX clock source for TDM
+       from a bank of clocks.
+     - fsl,tdm-rx-clk : This property selects the RX clock source for TDM
+       from a bank of clocks.
+     - fsl,tdm-tx-sync : This property selects the TX frame sync source
+       for TDM from a bank of clocks.
+     - fsl,tdm-rx-sync : This property selects the RX frame sync source
+       for TDM from a bank of clocks.
+
+       All the above mentioned properties are strings, with possible
+       values
+       "CLK1", "CLK2", "CLK3" ... "CLK24"
+       "BRG1", "BRG2", "BRG3" ... "BRG16"
+
+   - fsl,tdm-num : TDM to be used (1, 2, 3 or 4 for TDMA, TDMB, TDMC, TDMD)
+   - fsl,si-num : Serial Interface to be used.
+   Example:
+        [EMAIL PROTECTED] {
+                device_type = "tdm";
+                compatible = "fsl,ucc-tdm";
+                model = "UCC";
+                device-id = <1>;
+                fsl,tdm-num = <1>;
+                fsl,si-num = <1>;
+                fsl,tdm-tx-clk = "CLK1";
+                fsl,tdm-rx-clk = "CLK1";
+                fsl,tdm-tx-sync = "BRG9";
+                fsl,tdm-rx-sync = "BRG9";
+                reg = <2000 200>;
+                interrupts = <20>;
+                interrupt-parent = <&qeic>;
+                pio-handle = <&ucc1pio>;
+        };
+
    v) Parallel I/O Ports
 
    This node configures Parallel I/O ports for CPUs with QE support.
@@ -1772,6 +1809,61 @@ platforms are moved over to use the flattened-device-tree model.
         };
         };
 
+   viii) Clocks (clocks)
+   This node specifies the frequency values for all the external clocks,
+   viz. CLK1 to CLK24, in Hz.
+
+   Required Properties:
+   - compatible : should be "fsl,cpm-clocks".
+   - #clock-cells : It specifies the number of cells occupied by the
+     clock-frequency property. Currently only #clock-cells = <1> is supported
+     and implemented. This property is kept for the future, in case we need
+     frequencies higher than 4 GHz.
+   - clock-frequency : It is a list of u32 values representing the frequency
+     of each external clock (CLK1 to CLK24) in Hz. Each entry occupies the
+     number of cells specified by the #clock-cells property (1 for now).
+
+   Example:
+
+        clocks {
+                compatible = "fsl,cpm-clocks";
+                #clock-cells = <1>;
+                /* clock freqs in Hz(for CLK1~CLK24).
+                 * CLK11 is 1024KHz,
+                 * all other clocks unused
+                 */
+                clock-frequency = <0 0 0 0 0 0
+                                0 0 0 0 d#1024000 0
+                                0 0 0 0 0 0
+                                0 0 0 0 0 0>;
+        };
+
+   ix) Baud Rate Generator (BRG)
+
+   Required properties:
+   - compatible : should be "fsl,cpm-brg"
+   - fsl,brg-sources : defines the input clock for all 16 BRGs. The input
+     clock source can be 1 to 24 for CLK1 to CLK24. Zero means that the
+     particular BRG will be driven by the QE clock (BRGCLK).
+   - reg : This property defines the address and size of the memory-mapped
+     registers of the BRG.
+
+   Example:
+
+        [EMAIL PROTECTED] {
+                compatible = "fsl,cpm-brg";
+                /* input clock sources for all the 16 BRGs.
+                 * 1-24 for CLK1 to CLK24.
+                 * BRG9 uses CLK11 others use
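The four clock-selection properties are plain strings naming either an external clock ("CLK1" to "CLK24") or a BRG ("BRG1" to "BRG16"). A minimal parser for such strings might look like the sketch below; `parse_clock_name()` and `enum clk_kind` are hypothetical, and this is not the `qe_clock_source()` helper from Tabi's patch mentioned in the cover letter.

```c
#include <stdlib.h>
#include <string.h>

enum clk_kind { CLK_NONE, CLK_EXT, CLK_BRG };

/*
 * Hypothetical parser for fsl,tdm-{tx,rx}-{clk,sync} values.
 * Returns the kind and stores the 1-based number in *num, or returns
 * CLK_NONE if the string is not "CLK1".."CLK24" or "BRG1".."BRG16".
 */
static enum clk_kind parse_clock_name(const char *s, int *num)
{
	enum clk_kind kind;
	int max;
	char *end;
	long n;

	if (strncmp(s, "CLK", 3) == 0) {
		kind = CLK_EXT;
		max = 24;
	} else if (strncmp(s, "BRG", 3) == 0) {
		kind = CLK_BRG;
		max = 16;
	} else {
		return CLK_NONE;
	}

	/* Require a digit here: strtol alone would accept " 5" or "+5". */
	if (s[3] < '0' || s[3] > '9')
		return CLK_NONE;
	n = strtol(s + 3, &end, 10);
	if (*end != '\0' || n < 1 || n > max)
		return CLK_NONE;

	*num = (int)n;
	return kind;
}
```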
Re: 2.6.24-rc4-mm1
On Wed, 5 Dec 2007, Andrew Morton wrote:

> On Thu, 06 Dec 2007 17:59:37 +1100 Reuben Farrelly [EMAIL PROTECTED] wrote:
>
> > This non fatal oops which I have just noticed may be related to this
> > change then - certainly looks networking related.
>
> yep, but it isn't e1000. It's core TCP.
>
> > WARNING: at net/ipv4/tcp_input.c:2518 tcp_fastretrans_alert()
> > Pid: 0, comm: swapper Not tainted 2.6.24-rc4-mm1 #1
>
> Ilpo, Reuben's kernel is talking to you ;)

...Please try the patch below. Andrew, this probably fixes your problem
(the packets <= tp->packets_out) as well. Dave, please include this one
in net-2.6.25.

--
 i.

--
[PATCH] [TCP]: Fix fack_count miscountings (multiple places)

1) Fack_count is set incorrectly if the highest sent skb is already
   sacked (the skb->prev won't return it because it's on the other list
   already). This manifests as fackets_out counting errors later on; the
   second-order effects are very hard to track, so it may fix all
   outstanding TCP bug reports.

2) The prev == NULL check was the wrong way around.

3) The last skb's fack count was incorrectly skipped by the while () {} loop.

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 include/net/tcp.h |   22 ++++++++++++++++------
 1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9dbed0b..11a7e3e 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -1337,10 +1337,20 @@ static inline struct sk_buff *tcp_send_head(struct sock *sk)
 static inline void tcp_advance_send_head(struct sock *sk, struct sk_buff *skb)
 {
         struct sk_buff *prev = tcp_write_queue_prev(sk, skb);
+        unsigned int fc = 0;
+
+        if (prev == (struct sk_buff *)&sk->sk_write_queue)
+                prev = NULL;
+        else if (!tcp_skb_adjacent(sk, prev, skb))
+                prev = NULL;
 
-        if (prev != (struct sk_buff *)&sk->sk_write_queue)
-                TCP_SKB_CB(skb)->fack_count = TCP_SKB_CB(prev)->fack_count +
-                                              tcp_skb_pcount(prev);
+        if ((prev == NULL) && !__tcp_write_queue_empty(sk, TCP_WQ_SACKED))
+                prev = __tcp_write_queue_tail(sk, TCP_WQ_SACKED);
+
+        if (prev != NULL)
+                fc = TCP_SKB_CB(prev)->fack_count + tcp_skb_pcount(prev);
+
+        TCP_SKB_CB(skb)->fack_count = fc;
 
         sk->sk_send_head = tcp_write_queue_next(sk, skb);
         if (sk->sk_send_head == (struct sk_buff *)&sk->sk_write_queue)
@@ -1464,7 +1474,7 @@ static inline struct sk_buff *__tcp_reset_fack_counts(struct sock *sk,
 {
         unsigned int fc = 0;
 
-        if (prev == NULL)
+        if (prev != NULL)
                 fc = TCP_SKB_CB(*prev)->fack_count + tcp_skb_pcount(*prev);
 
         BUG_ON((*prev != NULL) && !tcp_skb_adjacent(sk, *prev, skb));
@@ -1521,7 +1531,7 @@ static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *inskb)
                 skb[otherq] = prev->next;
         }
 
-        while (skb[queue] != __tcp_write_queue_tail(sk, queue)) {
+        do {
                 /* Lazy find for the other queue */
                 if (skb[queue] == NULL) {
                         skb[queue] = tcp_write_queue_find(sk, TCP_SKB_CB(prev)->seq,
@@ -1535,7 +1545,7 @@ static inline void tcp_reset_fack_counts(struct sock *sk, struct sk_buff *inskb)
                         break;
 
                 queue ^= TCP_WQ_SACKED;
-        }
+        } while (skb[queue] != __tcp_write_queue_tail(sk, queue));
 }
 
 static inline void __tcp_insert_write_queue_after(struct sk_buff *skb,
-- 
1.5.0.6
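Fix 3 turns a `while` loop into a `do { } while` so that the last buffer's count is no longer skipped. The effect is easy to see in a simplified model where each segment's `fack_count` is the previous segment's count plus its packet count. The array-based `struct seg` and `reset_fack_counts()` below are hypothetical and ignore the two-queue (SACKed / not SACKed) layout of the real write queue.

```c
#include <stddef.h>

struct seg {
	unsigned int pcount;     /* packets covered by this segment */
	unsigned int fack_count; /* packets in all earlier segments */
};

/*
 * Recompute fack_count for segs[start..last] (requires start <= last),
 * assuming segs[start - 1], if any, is already correct. Because the loop
 * test comes after the body, segs[last] is updated as well.
 */
static void reset_fack_counts(struct seg *segs, size_t start, size_t last)
{
	size_t i = start;

	do {
		segs[i].fack_count = (i == 0) ? 0 :
			segs[i - 1].fack_count + segs[i - 1].pcount;
	} while (i++ != last);
}
```

A `while (i != last) { ...; i++; }` version of the same loop would leave `segs[last].fack_count` stale, which is the kind of miscount the patch describes.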
[PATCH] [TCP]: Bind fackets_out state to highest_sack more tightly
The added checks will catch most errors if the current, complex
fack_count counting logic is flawed somewhere.

fackets_out should always be advanceable whenever highest_sack is,
because fackets_out is nowadays accurate (and obviously it must not
exceed packets_out).

Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
---
 net/ipv4/tcp_input.c |   14 +++++++++-----
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 9499a12..23b2a34 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1270,24 +1270,28 @@ static int tcp_sacktag_one(struct sk_buff *skb, struct sock *sk,
                 }
         }
 
-        if (!before(TCP_SKB_CB(skb)->seq, tcp_highest_sack_seq(tp)))
+        fack_count += tcp_skb_pcount(skb);
+        if (!before(TCP_SKB_CB(skb)->seq, tcp_highest_sack_seq(tp))) {
+                WARN_ON((fack_count <= tp->fackets_out) ||
+                        (fack_count > tp->packets_out));
+
                 tcp_advance_highest_sack(sk, skb);
+                tp->fackets_out = fack_count;
+        } else
+                WARN_ON(fack_count > tp->fackets_out);
+
         tcp_write_queue_requeue(skb, sk, TCP_WQ_SACKED);
         TCP_SKB_CB(skb)->sacked |= TCPCB_SACKED_ACKED;
         flag |= FLAG_DATA_SACKED;
         tp->sacked_out += tcp_skb_pcount(skb);
 
-        fack_count += tcp_skb_pcount(skb);
-
         /* Lost marker hint past SACKed? Tweak RFC3517 cnt */
         if (!tcp_is_fack(tp) && (tp->lost_skb_hint != NULL) &&
             before(TCP_SKB_CB(skb)->seq,
                    TCP_SKB_CB(tp->lost_skb_hint)->seq))
                 tp->lost_cnt_hint += tcp_skb_pcount(skb);
 
-        if (fack_count > tp->fackets_out)
-                tp->fackets_out = fack_count;
 }
 
 /* D-SACK. We can detect redundant retransmission in S|R and plain R
-- 
1.5.0.6
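The new `WARN_ON()` checks encode an ordering invariant between three counters. Restated as a standalone predicate (a hypothetical helper; the kernel checks this inline in `tcp_sacktag_one()`), with `fack_count` already including the skb being tagged: if the skb advances highest_sack, fackets_out must strictly grow and may not exceed packets_out; otherwise fack_count may not exceed fackets_out.

```c
#include <stdbool.h>

/*
 * Hypothetical restatement of the sanity checks added above.
 * Returns true when the counters are consistent, i.e. when neither
 * WARN_ON() in tcp_sacktag_one() would fire.
 */
static bool fack_state_ok(unsigned int fack_count, unsigned int fackets_out,
			  unsigned int packets_out, bool advances_highest_sack)
{
	if (advances_highest_sack)
		return fack_count > fackets_out && fack_count <= packets_out;
	return fack_count <= fackets_out;
}
```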
Re: [PATCH] [TCP]: Bind fackets_out state to highest_sack more tightly
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Mon, 10 Dec 2007 14:27:24 +0200 (EET)

> Added checks will catch most of the errors if the current complex
> fack_count counting logic is flawed somewhere.
>
> Fackets_out should always be advancable if highest_sack is too because
> the fackets_out is nowadays accurate (and obviously it must be smaller
> than packets_out).
>
> Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]

Applied to net-2.6.25, thanks!
Re: [PATCH] [TCP]: Bind fackets_out state to highest_sack more tightly
On Mon, 10 Dec 2007, David Miller wrote:

> From: Ilpo_Järvinen [EMAIL PROTECTED]
> Date: Mon, 10 Dec 2007 14:27:24 +0200 (EET)
>
> > Added checks will catch most of the errors if the current complex
> > fack_count counting logic is flawed somewhere.
> >
> > Fackets_out should always be advancable if highest_sack is too because
> > the fackets_out is nowadays accurate (and obviously it must be smaller
> > than packets_out).
> >
> > Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
>
> Applied to net-2.6.25, thanks!

Please get the fack_count fix from the mm1 thread as well, before my
mailbox gets filled with stacktraces :-):

http://marc.info/?l=linux-netdev&m=119728952018975&w=2

--
 i.
Re: [1/4] DST: Distributed storage documentation.
On Dec 10, 2007 12:47 PM, Evgeniy Polyakov [EMAIL PROTECTED] wrote:
> diff --git a/Documentation/dst/sysfs.txt b/Documentation/dst/sysfs.txt
> new file mode 100644
> index 000..79d79dc
> --- /dev/null
> +++ b/Documentation/dst/sysfs.txt
> @@ -0,0 +1,30 @@
> +This file describes sysfs files created for each storage.
> +
> +1. Per-storage files.
> +Each storage has its own dir /sysfs/devices/$storage_name,

It's always /sys/devices/.

> +which contains following files:
> +
> +alg - contains name of the algorithm used to created given storage
> +name - name of the storage
> +nodes - map of the storage (list of nodes and their sizes and starts)
> +remove_all_nodes - writable file which allows to remove all nodes from given
> +        storage
> +n-$start-$cookie - per node directory, where
> +        $start - start of the given node in sectors,
> +        $cookie - unique node's id used by DST
> +
> +2. Per-node files.
> +Node's files are located in /sysfs/devices/$storage_name/n-$start-$cookie
> +directory, described above.

To which class or bus do the devices you create belong? Care to show a
tree or "ls -la" of the device?

Kay
Re: [1/4] DST: Distributed storage documentation.
On Mon, Dec 10, 2007 at 01:51:43PM +0100, Kay Sievers ([EMAIL PROTECTED]) wrote:
> On Dec 10, 2007 12:47 PM, Evgeniy Polyakov [EMAIL PROTECTED] wrote:
> > diff --git a/Documentation/dst/sysfs.txt b/Documentation/dst/sysfs.txt
> > new file mode 100644
> > index 000..79d79dc
> > --- /dev/null
> > +++ b/Documentation/dst/sysfs.txt
> > @@ -0,0 +1,30 @@
> > +This file describes sysfs files created for each storage.
> > +
> > +1. Per-storage files.
> > +Each storage has its own dir /sysfs/devices/$storage_name,
>
> It's always /sys/devices/.

I meant that each new device will be placed into /sys/devices/its_name,
but it can also be accessed via /sys/bus/dst/devices/.

> > +which contains following files:
> > +
> > +alg - contains name of the algorithm used to created given storage
> > +name - name of the storage
> > +nodes - map of the storage (list of nodes and their sizes and starts)
> > +remove_all_nodes - writable file which allows to remove all nodes from given
> > +        storage
> > +n-$start-$cookie - per node directory, where
> > +        $start - start of the given node in sectors,
> > +        $cookie - unique node's id used by DST
> > +
> > +2. Per-node files.
> > +Node's files are located in /sysfs/devices/$storage_name/n-$start-$cookie
> > +directory, described above.
>
> To which class or bus do the devices you create belong? Care to show a
> tree or "ls -la" of the device?

It is the 'dst' bus.

uganda:~/codes# ls -la /sys/devices/staorge/
total 0
drwxr-xr-x 4 root root    0 2007-12-10 11:46 .
drwxr-xr-x 9 root root    0 2007-12-10 11:46 ..
-r--r--r-- 1 root root 4096 2007-12-10 11:46 alg
lrwxrwxrwx 1 root root    0 2007-12-10 11:46 bus -> ../../bus/dst
drwxr-xr-x 3 root root    0 2007-12-10 11:46 n-0-81003e24117
-r--r--r-- 1 root root 4096 2007-12-10 11:46 name
-r--r--r-- 1 root root 4096 2007-12-10 11:46 nodes
drwxr-xr-x 2 root root    0 2007-12-10 11:46 power
-rw-r--r-- 1 root root 4096 2007-12-10 11:46 remove_all_nodes
lrwxrwxrwx 1 root root    0 2007-12-10 11:46 subsystem -> ../../bus/dst
-rw-r--r-- 1 root root 4096 2007-12-10 11:46 uevent
uganda:~/codes# ls -l /sys/bus/dst/
total 0
drwxr-xr-x 2 root root    0 2007-12-10 09:52 devices
drwxr-xr-x 2 root root    0 2007-12-10 09:52 drivers
-rw-r--r-- 1 root root 4096 2007-12-10 11:46 drivers_autoprobe
--w------- 1 root root 4096 2007-12-10 11:46 drivers_probe

> Kay

--
	Evgeniy Polyakov
[PATCH 0/4] [TFRC]: Revised Loss Intervals Patches (macro-less, new swap function)
This revision updates the earlier patches following discussion, and adds
one additional cleanup patch at the end.

Patch #1: Revision of the initialisation patch; fixes calling an __exit
          function from an __init function, as identified by Arnaldo.

Patch #2: Revision: converts tfrc_rx_hist_entry() back to an inline,
          following discussion with Arnaldo.

Patch #3: Reworked loss intervals database. Individual changes:
          - replaced tfrc_rx_hist_swap() with the routine suggested by Arnaldo;
          - replaced all access macros with inlines or in-place code;
          - replaced LIH_INDEX with an inline instead of a macro as well.

Patch #4: Removes redundant debugging output from syslog.
[PATCH 2/4] [PATCH v2] [TFRC]: Loss interval code needs the macros/inlines that were moved
This moves the inlines (which were previously declared as macros) back
into packet_history.h, since the loss detection code needs to read
entries from the RX history in order to create the relevant loss
entries: it needs at least tfrc_rx_hist_loss_prev() and
tfrc_rx_hist_last_rcv(), which in turn require the definitions of the
other inlines (formerly macros).

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
---
 net/dccp/ccids/lib/packet_history.c |   35 ---
 net/dccp/ccids/lib/packet_history.h |   35 +++
 2 files changed, 35 insertions(+), 35 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index 1346045..22114c6 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -151,41 +151,6 @@ void tfrc_rx_packet_history_exit(void)
         }
 }
 
-/**
- * tfrc_rx_hist_index - index to reach n-th entry after loss_start
- */
-static inline u8 tfrc_rx_hist_index(const struct tfrc_rx_hist *h, const u8 n)
-{
-        return (h->loss_start + n) & TFRC_NDUPACK;
-}
-
-/**
- * tfrc_rx_hist_last_rcv - entry with highest-received-seqno so far
- */
-static inline struct tfrc_rx_hist_entry *
-                        tfrc_rx_hist_last_rcv(const struct tfrc_rx_hist *h)
-{
-        return h->ring[tfrc_rx_hist_index(h, h->loss_count)];
-}
-
-/**
- * tfrc_rx_hist_entry - return the n-th history entry after loss_start
- */
-static inline struct tfrc_rx_hist_entry *
-                        tfrc_rx_hist_entry(const struct tfrc_rx_hist *h, const u8 n)
-{
-        return h->ring[tfrc_rx_hist_index(h, n)];
-}
-
-/**
- * tfrc_rx_hist_loss_prev - entry with highest-received-seqno before loss was detected
- */
-static inline struct tfrc_rx_hist_entry *
-                        tfrc_rx_hist_loss_prev(const struct tfrc_rx_hist *h)
-{
-        return h->ring[h->loss_start];
-}
-
 /* has the packet contained in skb been seen before? */
 int tfrc_rx_hist_duplicate(struct tfrc_rx_hist *h, struct sk_buff *skb)
 {
diff --git a/net/dccp/ccids/lib/packet_history.h b/net/dccp/ccids/lib/packet_history.h
index 3dfd182..e58b0fc 100644
--- a/net/dccp/ccids/lib/packet_history.h
+++ b/net/dccp/ccids/lib/packet_history.h
@@ -84,6 +84,41 @@ struct tfrc_rx_hist {
 #define rtt_sample_prev         loss_start
 };
 
+/**
+ * tfrc_rx_hist_index - index to reach n-th entry after loss_start
+ */
+static inline u8 tfrc_rx_hist_index(const struct tfrc_rx_hist *h, const u8 n)
+{
+        return (h->loss_start + n) & TFRC_NDUPACK;
+}
+
+/**
+ * tfrc_rx_hist_last_rcv - entry with highest-received-seqno so far
+ */
+static inline struct tfrc_rx_hist_entry *
+                        tfrc_rx_hist_last_rcv(const struct tfrc_rx_hist *h)
+{
+        return h->ring[tfrc_rx_hist_index(h, h->loss_count)];
+}
+
+/**
+ * tfrc_rx_hist_entry - return the n-th history entry after loss_start
+ */
+static inline struct tfrc_rx_hist_entry *
+                        tfrc_rx_hist_entry(const struct tfrc_rx_hist *h, const u8 n)
+{
+        return h->ring[tfrc_rx_hist_index(h, n)];
+}
+
+/**
+ * tfrc_rx_hist_loss_prev - entry with highest-received-seqno before loss was detected
+ */
+static inline struct tfrc_rx_hist_entry *
+                        tfrc_rx_hist_loss_prev(const struct tfrc_rx_hist *h)
+{
+        return h->ring[h->loss_start];
+}
+
 extern void tfrc_rx_hist_add_packet(struct tfrc_rx_hist *h,
                                     const struct sk_buff *skb, const u32 ndp);
-- 
1.5.3.GIT
[PATCH 4/4] [CCID3]: Redundant debugging output / documentation
Each time feedback is sent, two lines are printed:

  ccid3_hc_rx_send_feedback: client ... - entry
  ccid3_hc_rx_send_feedback: Interval ...usec, X_recv=..., 1/p=...

The first line is redundant and is therefore removed.

Further, the capitalisation in the documentation of ccid3_hc_rx_sock is
made consistent.

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
---
 net/dccp/ccids/ccid3.c |    2 --
 net/dccp/ccids/ccid3.h |    4 ++--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/dccp/ccids/ccid3.c b/net/dccp/ccids/ccid3.c
index 60fcb31..b92069b 100644
--- a/net/dccp/ccids/ccid3.c
+++ b/net/dccp/ccids/ccid3.c
@@ -685,8 +685,6 @@ static void ccid3_hc_rx_send_feedback(struct sock *sk,
         ktime_t now;
         s64 delta = 0;
 
-        ccid3_pr_debug("%s(%p) - entry \n", dccp_role(sk), sk);
-
         if (unlikely(hcrx->ccid3hcrx_state == TFRC_RSTATE_TERM))
                 return;
 
diff --git a/net/dccp/ccids/ccid3.h b/net/dccp/ccids/ccid3.h
index 3c33dc6..6ceeb80 100644
--- a/net/dccp/ccids/ccid3.h
+++ b/net/dccp/ccids/ccid3.h
@@ -135,9 +135,9 @@ enum ccid3_hc_rx_states {
  *
  *  @ccid3hcrx_x_recv  -  Receiver estimate of send rate (RFC 3448 4.3)
  *  @ccid3hcrx_rtt  -  Receiver estimate of rtt (non-standard)
- *  @ccid3hcrx_p  -  current loss event rate (RFC 3448 5.4)
+ *  @ccid3hcrx_p  -  Current loss event rate (RFC 3448 5.4)
  *  @ccid3hcrx_last_counter  -  Tracks window counter (RFC 4342, 8.1)
- *  @ccid3hcrx_state  -  receiver state, one of %ccid3_hc_rx_states
+ *  @ccid3hcrx_state  -  Receiver state, one of %ccid3_hc_rx_states
  *  @ccid3hcrx_bytes_recv  -  Total sum of DCCP payload bytes
  *  @ccid3hcrx_tstamp_last_feedback  -  Time at which last feedback was sent
  *  @ccid3hcrx_tstamp_last_ack  -  Time at which last feedback was sent
-- 
1.5.3.GIT
[PATCH 1/4] [PATCH v2] [TFRC]: Put RX/TX initialisation into tfrc.c
This separates RX/TX initialisation and puts all packet history / loss
intervals initialisation into tfrc.c.
The organisation is uniform: slab declaration -> {rx,tx}_init() -> {rx,tx}_exit()

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
---
 net/dccp/ccids/lib/packet_history.c |   68 --
 net/dccp/ccids/lib/tfrc.c           |   31
 2 files changed, 55 insertions(+), 44 deletions(-)

diff --git a/net/dccp/ccids/lib/packet_history.c b/net/dccp/ccids/lib/packet_history.c
index af44082..727b17d 100644
--- a/net/dccp/ccids/lib/packet_history.c
+++ b/net/dccp/ccids/lib/packet_history.c
@@ -57,6 +57,22 @@ struct tfrc_tx_hist_entry {
  */
 static struct kmem_cache *tfrc_tx_hist_slab;
 
+int __init tfrc_tx_packet_history_init(void)
+{
+        tfrc_tx_hist_slab = kmem_cache_create("tfrc_tx_hist",
+                                              sizeof(struct tfrc_tx_hist_entry),
+                                              0, SLAB_HWCACHE_ALIGN, NULL);
+        return tfrc_tx_hist_slab == NULL ? -ENOBUFS : 0;
+}
+
+void tfrc_tx_packet_history_exit(void)
+{
+        if (tfrc_tx_hist_slab != NULL) {
+                kmem_cache_destroy(tfrc_tx_hist_slab);
+                tfrc_tx_hist_slab = NULL;
+        }
+}
+
 static struct tfrc_tx_hist_entry *
         tfrc_tx_hist_find_entry(struct tfrc_tx_hist_entry *head, u64 seqno)
 {
@@ -119,6 +135,22 @@ EXPORT_SYMBOL_GPL(tfrc_tx_hist_rtt);
  */
 static struct kmem_cache *tfrc_rx_hist_slab;
 
+int __init tfrc_rx_packet_history_init(void)
+{
+        tfrc_rx_hist_slab = kmem_cache_create("tfrc_rxh_cache",
+                                              sizeof(struct tfrc_rx_hist_entry),
+                                              0, SLAB_HWCACHE_ALIGN, NULL);
+        return tfrc_rx_hist_slab == NULL ? -ENOBUFS : 0;
+}
+
+void tfrc_rx_packet_history_exit(void)
+{
+        if (tfrc_rx_hist_slab != NULL) {
+                kmem_cache_destroy(tfrc_rx_hist_slab);
+                tfrc_rx_hist_slab = NULL;
+        }
+}
+
 /**
  * tfrc_rx_hist_index - index to reach n-th entry after loss_start
  */
@@ -316,39 +348,3 @@ keep_ref_for_next_time:
         return sample;
 }
 EXPORT_SYMBOL_GPL(tfrc_rx_hist_sample_rtt);
-
-__init int packet_history_init(void)
-{
-        tfrc_tx_hist_slab = kmem_cache_create("tfrc_tx_hist",
-                                              sizeof(struct tfrc_tx_hist_entry), 0,
-                                              SLAB_HWCACHE_ALIGN, NULL);
-        if (tfrc_tx_hist_slab == NULL)
-                goto out_err;
-
-        tfrc_rx_hist_slab = kmem_cache_create("tfrc_rx_hist",
-                                              sizeof(struct tfrc_rx_hist_entry), 0,
-                                              SLAB_HWCACHE_ALIGN, NULL);
-        if (tfrc_rx_hist_slab == NULL)
-                goto out_free_tx;
-
-        return 0;
-
-out_free_tx:
-        kmem_cache_destroy(tfrc_tx_hist_slab);
-        tfrc_tx_hist_slab = NULL;
-out_err:
-        return -ENOBUFS;
-}
-
-void packet_history_exit(void)
-{
-        if (tfrc_tx_hist_slab != NULL) {
-                kmem_cache_destroy(tfrc_tx_hist_slab);
-                tfrc_tx_hist_slab = NULL;
-        }
-
-        if (tfrc_rx_hist_slab != NULL) {
-                kmem_cache_destroy(tfrc_rx_hist_slab);
-                tfrc_rx_hist_slab = NULL;
-        }
-}
diff --git a/net/dccp/ccids/lib/tfrc.c b/net/dccp/ccids/lib/tfrc.c
index 3a7a183..20763fa 100644
--- a/net/dccp/ccids/lib/tfrc.c
+++ b/net/dccp/ccids/lib/tfrc.c
@@ -14,27 +14,42 @@ module_param(tfrc_debug, bool, 0444);
 MODULE_PARM_DESC(tfrc_debug, "Enable debug messages");
 #endif
 
+extern int  tfrc_tx_packet_history_init(void);
+extern void tfrc_tx_packet_history_exit(void);
+extern int  tfrc_rx_packet_history_init(void);
+extern void tfrc_rx_packet_history_exit(void);
+
 extern int  dccp_li_init(void);
 extern void dccp_li_exit(void);
-extern int packet_history_init(void);
-extern void packet_history_exit(void);
 
 static int __init tfrc_module_init(void)
 {
         int rc = dccp_li_init();
 
-        if (rc == 0) {
-                rc = packet_history_init();
-                if (rc != 0)
-                        dccp_li_exit();
-        }
+        if (rc)
+                goto out;
+
+        rc = tfrc_tx_packet_history_init();
+        if (rc)
+                goto out_free_loss_intervals;
 
+        rc = tfrc_rx_packet_history_init();
+        if (rc)
+                goto out_free_tx_history;
+        return 0;
+
+out_free_tx_history:
+        tfrc_tx_packet_history_exit();
+out_free_loss_intervals:
+        dccp_li_exit();
+out:
         return rc;
 }
 
 static void __exit tfrc_module_exit(void)
 {
-        packet_history_exit();
+        tfrc_rx_packet_history_exit();
+        tfrc_tx_packet_history_exit();
         dccp_li_exit();
 }
 
-- 
1.5.3.GIT
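The new `tfrc_module_init()` follows the usual kernel goto-unwind idiom: every stage that succeeded gets a label that undoes it, and the labels run in the reverse order of initialisation. The userspace model below (all names hypothetical) records which stages ran, so the unwinding order for each failure point can be checked.

```c
#include <string.h>

/* Hypothetical trace of what ran, to make the unwinding order visible. */
static char trace[64];
static int fail_at; /* 1, 2 or 3 makes that stage fail; 0 = none fails */

/* Upper-case letter: stage n attempted; fails if n == fail_at. */
static int stage(int n, const char *tag)
{
	strcat(trace, tag);
	return fail_at == n ? -1 : 0;
}

/* Lower-case letter: undo of a previously successful stage. */
static void undo(const char *tag)
{
	strcat(trace, tag);
}

/* Mirrors tfrc_module_init(): loss intervals, then TX, then RX history. */
static int module_init_model(void)
{
	int rc = stage(1, "L");

	if (rc)
		goto out;
	rc = stage(2, "T");
	if (rc)
		goto out_free_loss_intervals;
	rc = stage(3, "R");
	if (rc)
		goto out_free_tx_history;
	return 0;

out_free_tx_history:
	undo("t");
out_free_loss_intervals:
	undo("l");
out:
	return rc;
}
```

For example, a failure in the third stage yields the trace "LTRtl": TX history is released before the loss-interval cache, the reverse of the order in which they were set up.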
[PATCH 3/4] [PATCH v2] [TFRC]: Ringbuffer to track loss interval history
A ringbuffer-based implementation of loss interval history is easier to maintain, allocate, and update.

The `swap' routine to keep the RX history sorted is due to, and was written by, Arnaldo Carvalho de Melo, simplifying an earlier macro-based variant.

Details:
 * access to the Loss Interval Records via macro wrappers (with safety checks);
 * simplified, on-demand allocation of entries (no extra memory consumption on lossless links); cache allocation is local to the module / exported as service;
 * provision of RFC-compliant algorithm to re-compute average loss interval;
 * provision of comprehensive, new loss detection algorithm
    - support for all cases of loss, including re-ordered/duplicate packets;
    - waiting for NDUPACK=3 packets to fill the hole;
    - updating loss records when a late-arriving packet fills a hole.

Signed-off-by: Gerrit Renker [EMAIL PROTECTED]
Signed-off-by: Ian McDonald [EMAIL PROTECTED]
---
 net/dccp/ccids/lib/loss_interval.c  |  161 +++-
 net/dccp/ccids/lib/loss_interval.h  |   56 ++-
 net/dccp/ccids/lib/packet_history.c |  202 +++
 net/dccp/ccids/lib/packet_history.h |   11 ++-
 net/dccp/ccids/lib/tfrc.h           |    3 +
 5 files changed, 423 insertions(+), 10 deletions(-)

diff --git a/net/dccp/ccids/lib/loss_interval.c b/net/dccp/ccids/lib/loss_interval.c
index c0a933a..39980d1 100644
--- a/net/dccp/ccids/lib/loss_interval.c
+++ b/net/dccp/ccids/lib/loss_interval.c
@@ -1,6 +1,7 @@
 /*
  *  net/dccp/ccids/lib/loss_interval.c
  *
+ *  Copyright (c) 2007   The University of Aberdeen, Scotland, UK
  *  Copyright (c) 2005-7 The University of Waikato, Hamilton, New Zealand.
  *  Copyright (c) 2005-7 Ian McDonald [EMAIL PROTECTED]
  *  Copyright (c) 2005   Arnaldo Carvalho de Melo [EMAIL PROTECTED]
@@ -10,12 +11,7 @@
  *  the Free Software Foundation; either version 2 of the License, or
  *  (at your option) any later version.
  */
-
-#include <linux/module.h>
 #include <net/sock.h>
-#include "../../dccp.h"
-#include "loss_interval.h"
-#include "packet_history.h"
 #include "tfrc.h"
 
 #define DCCP_LI_HIST_IVAL_F_LENGTH 8
@@ -27,6 +23,54 @@ struct dccp_li_hist_entry {
 	u32		 dccplih_interval;
 };
 
+static struct kmem_cache  *tfrc_lh_slab  __read_mostly;
+/* Loss Interval weights from [RFC 3448, 5.4], scaled by 10 */
+static const int tfrc_lh_weights[NINTERVAL] = { 10, 10, 10, 10, 8, 6, 4, 2 };
+
+/* implements LIFO semantics on the array */
+static inline u8 LIH_INDEX(const u8 ctr)
+{
+	return (LIH_SIZE - 1 - (ctr % LIH_SIZE));
+}
+
+/* the `counter' index always points at the next entry to be populated */
+static inline struct tfrc_loss_interval *tfrc_lh_peek(struct tfrc_loss_hist *lh)
+{
+	return lh->counter ? lh->ring[LIH_INDEX(lh->counter - 1)] : NULL;
+}
+
+/* given i with 0 <= i <= k, return I_i as per the rfc3448bis notation */
+static inline u32 tfrc_lh_get_interval(struct tfrc_loss_hist *lh, const u8 i)
+{
+	BUG_ON(i >= lh->counter);
+	return lh->ring[LIH_INDEX(lh->counter - i - 1)]->li_length;
+}
+
+/*
+ * On-demand allocation and de-allocation of entries
+ */
+static struct tfrc_loss_interval *tfrc_lh_demand_next(struct tfrc_loss_hist *lh)
+{
+	if (lh->ring[LIH_INDEX(lh->counter)] == NULL)
+		lh->ring[LIH_INDEX(lh->counter)] = kmem_cache_alloc(tfrc_lh_slab,
+								    GFP_ATOMIC);
+	return lh->ring[LIH_INDEX(lh->counter)];
+}
+
+void tfrc_lh_cleanup(struct tfrc_loss_hist *lh)
+{
+	if (!tfrc_lh_is_initialised(lh))
+		return;
+
+	for (lh->counter = 0; lh->counter < LIH_SIZE; lh->counter++)
+		if (lh->ring[LIH_INDEX(lh->counter)] != NULL) {
+			kmem_cache_free(tfrc_lh_slab,
+					lh->ring[LIH_INDEX(lh->counter)]);
+			lh->ring[LIH_INDEX(lh->counter)] = NULL;
+		}
+}
+EXPORT_SYMBOL_GPL(tfrc_lh_cleanup);
+
 static struct kmem_cache *dccp_li_cachep __read_mostly;
 
 static inline struct dccp_li_hist_entry *dccp_li_hist_entry_new(const gfp_t prio)
@@ -98,6 +142,65 @@ u32 dccp_li_hist_calc_i_mean(struct list_head *list)
 EXPORT_SYMBOL_GPL(dccp_li_hist_calc_i_mean);
 
+static void tfrc_lh_calc_i_mean(struct tfrc_loss_hist *lh)
+{
+	u32 i_i, i_tot0 = 0, i_tot1 = 0, w_tot = 0;
+	int i, k = tfrc_lh_length(lh) - 1; /* k is as in rfc3448bis, 5.4 */
+
+	for (i=0; i <= k; i++) {
+		i_i = tfrc_lh_get_interval(lh, i);
+
+		if (i < k) {
+			i_tot0 += i_i * tfrc_lh_weights[i];
+			w_tot  += tfrc_lh_weights[i];
+		}
+		if (i > 0)
+			i_tot1 += i_i * tfrc_lh_weights[i-1];
+	}
+
+	BUG_ON(w_tot == 0);
+	lh->i_mean = max(i_tot0, i_tot1) / w_tot;
+}
+
+/**
+ * tfrc_lh_update_i_mean  -  Update the `open' loss interval I_0
+ * For recomputing p: returns
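The LIFO ring indexing and the weighted average computed by tfrc_lh_calc_i_mean() can be exercised outside the kernel. The following is a minimal userspace sketch, assuming LIH_SIZE = NINTERVAL + 1 = 9 (loss_interval.h is not quoted above), storing interval lengths by value instead of slab-allocated records, and returning 0 where the kernel code would hit BUG_ON(w_tot == 0). The names lh_push(), lh_length() and friends are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NINTERVAL 8
#define LIH_SIZE  (NINTERVAL + 1)	/* assumption: NINTERVAL closed intervals plus I_0 */

/* Loss Interval weights from [RFC 3448, 5.4], scaled by 10 (as in the patch) */
static const uint32_t lh_weights[NINTERVAL] = { 10, 10, 10, 10, 8, 6, 4, 2 };

struct loss_hist {
	uint32_t ring[LIH_SIZE];	/* interval lengths, stored by value here */
	unsigned int counter;		/* number of intervals pushed so far */
};

/* LIFO mapping: successive counters fill the array from the top down, wrapping */
static unsigned int lih_index(unsigned int ctr)
{
	return LIH_SIZE - 1 - (ctr % LIH_SIZE);
}

/* Record a new (most recent) interval; once the ring is full the oldest is overwritten */
static void lh_push(struct loss_hist *lh, uint32_t len)
{
	lh->ring[lih_index(lh->counter)] = len;
	lh->counter++;
}

static unsigned int lh_length(const struct loss_hist *lh)
{
	return lh->counter < LIH_SIZE ? lh->counter : LIH_SIZE;
}

/* I_i in rfc3448bis notation: I_0 is the most recent interval */
static uint32_t lh_get_interval(const struct loss_hist *lh, unsigned int i)
{
	assert(i < lh_length(lh));	/* stricter analogue of BUG_ON(i >= lh->counter) */
	return lh->ring[lih_index(lh->counter - i - 1)];
}

/* I_mean = max(I_tot0, I_tot1) / W_tot, with k = length - 1 */
static uint32_t lh_calc_i_mean(const struct loss_hist *lh)
{
	uint32_t i_tot0 = 0, i_tot1 = 0, w_tot = 0;
	int i, k = (int)lh_length(lh) - 1;

	for (i = 0; i <= k; i++) {
		uint32_t i_i = lh_get_interval(lh, (unsigned int)i);

		if (i < k) {		/* I_0 .. I_{k-1}, weights w_0 .. w_{k-1} */
			i_tot0 += i_i * lh_weights[i];
			w_tot  += lh_weights[i];
		}
		if (i > 0)		/* I_1 .. I_k, weights w_0 .. w_{k-1} */
			i_tot1 += i_i * lh_weights[i - 1];
	}
	if (w_tot == 0)			/* fewer than two intervals: no average yet */
		return 0;
	return (i_tot0 > i_tot1 ? i_tot0 : i_tot1) / w_tot;
}
```

With intervals 100, 200 and then 50 pushed in that order, I_tot0 = 50*10 + 200*10 = 2500, I_tot1 = 200*10 + 100*10 = 3000 and W_tot = 20, so I_mean = 150. Taking the maximum means the open interval I_0 is counted only when it would raise the average.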
Re: [1/4] DST: Distributed storage documentation.
On Mon, 2007-12-10 at 15:58 +0300, Evgeniy Polyakov wrote:
On Mon, Dec 10, 2007 at 01:51:43PM +0100, Kay Sievers ([EMAIL PROTECTED]) wrote:
On Dec 10, 2007 12:47 PM, Evgeniy Polyakov [EMAIL PROTECTED] wrote:

diff --git a/Documentation/dst/sysfs.txt b/Documentation/dst/sysfs.txt
new file mode 100644
index 000..79d79dc
--- /dev/null
+++ b/Documentation/dst/sysfs.txt
@@ -0,0 +1,30 @@
+This file describes sysfs files created for each storage.
+
+1. Per-storage files.
+Each storage has its own dir /sysfs/devices/$storage_name,

It's always /sys/devices/.

I meant that for each new device, it will be placed into /sys/devices/its_name, but it can also be accessed via /sys/bus/dst/devices/

Still, it looks like a path. :)

Please don't reference any device directly with a /sys/devices/ path. You have to use the subsystem links to the devices in /sys/bus/dst/devices/. Devices are free to move around in /sys/devices, even during runtime. Yours don't, but anyway, please remove all mention of direct access to /sys/devices/.

Btw, where is the top-level /sys/devices/storage/ coming from? I don't see that in the code. We don't accept any new virtual parents here. Your devices will automatically appear in /sys/devices/virtual/dst/, and not below your own parent. But that path does not matter anyway, because you should only access them from the /sys/bus/dst/devices/ directory.

And in general please don't claim generic names like "storage" in any namespace for a very specific subsystem like this.

+which contains following files:
+
+alg - contains name of the algorithm used to created given storage
+name - name of the storage
+nodes - map of the storage (list of nodes and their sizes and starts)
+remove_all_nodes - writable file which allows to remove all nodes from given
+	storage
+n-$start-$cookie - per node directory, where
+	$start - start of the given node in sectors,
+	$cookie - unique node's id used by DST
+
+2. Per-node files.
+Node's files are located in /sysfs/devices/$storage_name/n-$start-$cookie
+directory, described above.

To which class or bus do the devices you create belong? Care to show a tree or ls -la of the device?

It is 'dst' bus.

uganda:~/codes# ls -la /sys/devices/staorge/
total 0
drwxr-xr-x 4 root root    0 2007-12-10 11:46 .
drwxr-xr-x 9 root root    0 2007-12-10 11:46 ..
-r--r--r-- 1 root root 4096 2007-12-10 11:46 alg
lrwxrwxrwx 1 root root    0 2007-12-10 11:46 bus -> ../../bus/dst
drwxr-xr-x 3 root root    0 2007-12-10 11:46 n-0-81003e24117
-r--r--r-- 1 root root 4096 2007-12-10 11:46 name
-r--r--r-- 1 root root 4096 2007-12-10 11:46 nodes
drwxr-xr-x 2 root root    0 2007-12-10 11:46 power
-rw-r--r-- 1 root root 4096 2007-12-10 11:46 remove_all_nodes
lrwxrwxrwx 1 root root    0 2007-12-10 11:46 subsystem -> ../../bus/dst
-rw-r--r-- 1 root root 4096 2007-12-10 11:46 uevent

Ok, how does:
  ls -l /sys/devices/storage/n-0-81003e24117
look?

uganda:~/codes# ls -l /sys/bus/dst/
total 0
drwxr-xr-x 2 root root    0 2007-12-10 09:52 devices
drwxr-xr-x 2 root root    0 2007-12-10 09:52 drivers
-rw-r--r-- 1 root root 4096 2007-12-10 11:46 drivers_autoprobe
--w------- 1 root root 4096 2007-12-10 11:46 drivers_probe

How does:
  ls -l /sys/bus/dst/devices
look?

Further questions:
Why do you do your own refcounting instead of using kref?
Why don't you use groups for the attributes?
Why don't you use default attributes for the device, where you get all error handling done by the core?

Kay
Re: [PATCH 0/2] netem: trace enhancement
I finally managed to rewrite the netem trace extension to use rtnetlink communication for the data transfer from user space to kernel space.

The kernel patch is available here:
http://www.tcn.hypert.net/tcn_kernel_2_6_23_rtnetlink

and the iproute patch is here:
http://www.tcn.hypert.net/tcn_iproute2_2_6_23_rtnetlink

Whenever new data is needed, the kernel module sends a notification to the user space process, which then sends a data package to the kernel module. I had to write a new qdisc_notify function (qdisc_notify_pid), since the existing one acquires a lock which we already hold in this situation.

I hope everything works as expected, and I'm looking forward to your comments.

Thanks!
Ariane
Re: [1/4] DST: Distributed storage documentation.
On Mon, Dec 10, 2007 at 03:31:48PM +0100, Kay Sievers ([EMAIL PROTECTED]) wrote:

I meant that for each new device, it will be placed into /sys/devices/its_name, but it can also be accessed via /sys/bus/dst/devices/

Still, it looks like a path. :)

Please don't reference any device directly with a /sys/devices/ path. You have to use the subsystem links to the devices in /sys/bus/dst/devices/. Devices are free to move around in /sys/devices, even during runtime. Yours don't, but anyway, please remove all mention of direct access to /sys/devices/.

Ok, I will update the documentation to reference /sys/bus/dst/devices instead of /sys/devices.

Btw, where is the top-level /sys/devices/storage/ coming from? I don't see that in the code. We don't accept any new virtual parents here. Your devices will automatically appear in /sys/devices/virtual/dst/, and not below your own parent. But that path does not matter anyway, because you should only access them from the /sys/bus/dst/devices/ directory. And in general please don't claim generic names like "storage" in any namespace for a very specific subsystem like this.

It is not a parent; it is an example of a device called 'storage'. If it were called 'testing', the path would be /sys/devices/testing, or more correctly /sys/bus/dst/devices/testing :)

It is 'dst' bus.

uganda:~/codes# ls -la /sys/devices/staorge/
total 0
drwxr-xr-x 4 root root    0 2007-12-10 11:46 .
drwxr-xr-x 9 root root    0 2007-12-10 11:46 ..
-r--r--r-- 1 root root 4096 2007-12-10 11:46 alg
lrwxrwxrwx 1 root root    0 2007-12-10 11:46 bus -> ../../bus/dst
drwxr-xr-x 3 root root    0 2007-12-10 11:46 n-0-81003e24117
-r--r--r-- 1 root root 4096 2007-12-10 11:46 name
-r--r--r-- 1 root root 4096 2007-12-10 11:46 nodes
drwxr-xr-x 2 root root    0 2007-12-10 11:46 power
-rw-r--r-- 1 root root 4096 2007-12-10 11:46 remove_all_nodes
lrwxrwxrwx 1 root root    0 2007-12-10 11:46 subsystem -> ../../bus/dst
-rw-r--r-- 1 root root 4096 2007-12-10 11:46 uevent

Ok, how does:
  ls -l /sys/devices/storage/n-0-81003e24117
look?

uganda:~/codes# ls -l /sys/devices/storage/n-0-81003ebc220/
total 0
drwxr-xr-x 2 root root    0 2007-12-10 13:23 power
-r--r--r-- 1 root root 4096 2007-12-10 13:30 size
-r--r--r-- 1 root root 4096 2007-12-10 13:30 start
-r--r--r-- 1 root root 4096 2007-12-10 13:30 type
-rw-r--r-- 1 root root 4096 2007-12-10 13:30 uevent

uganda:~/codes# ls -l /sys/bus/dst/
total 0
drwxr-xr-x 2 root root    0 2007-12-10 09:52 devices
drwxr-xr-x 2 root root    0 2007-12-10 09:52 drivers
-rw-r--r-- 1 root root 4096 2007-12-10 11:46 drivers_autoprobe
--w------- 1 root root 4096 2007-12-10 11:46 drivers_probe

How does:
  ls -l /sys/bus/dst/devices
look?

uganda:~/codes# ls -la /sys/bus/dst/devices/
total 0
drwxr-xr-x 2 root root 0 2007-12-10 13:30 .
drwxr-xr-x 4 root root 0 2007-12-10 13:22 ..
lrwxrwxrwx 1 root root 0 2007-12-10 13:30 storage -> ../../../devices/storage

Here 'storage' is just the name of the device; it can be anything else.

Further questions:
Why do you do your own refcounting instead of using kref?

That's because I have always used atomic operations as reference counters and never tried krefs :) They are actually the same (modulo tricky architectures where smp_mb_* are required), so I can replace them in the next release.

Why don't you use groups for the attributes?
For 3-4 attributes it is faster to register them in a loop than to type another structure :)

Why don't you use default attributes for the device, where you get all error handling done by the core?

What are 'default attributes', and for which devices? All my sysfs files are so trivial that they do not need anything special, and I do not see what error handling you mean.

-- 
Evgeniy Polyakov
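Following Kay's advice, userspace should reach DST devices through the subsystem links in /sys/bus/dst/devices/ rather than through a fixed /sys/devices/ path. A minimal userspace sketch of that access pattern, assuming a device named 'storage' and the attribute names shown in the listings above; dst_attr_path() and dst_read_attr() are hypothetical helpers, not part of DST.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "/sys/bus/dst/devices/<dev>/<attr>"; returns 0 on success, -1 if the
 * buffer is too small.  The bus link stays valid even if the device itself
 * moves around under /sys/devices. */
static int dst_attr_path(char *buf, size_t len, const char *dev, const char *attr)
{
	int n = snprintf(buf, len, "/sys/bus/dst/devices/%s/%s", dev, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read the first line of a DST attribute into out (newline stripped).
 * Returns its length, or -1 if the path is too long or the file can't be read. */
static int dst_read_attr(const char *dev, const char *attr, char *out, size_t outlen)
{
	char path[256];
	FILE *f;

	if (dst_attr_path(path, sizeof(path), dev, attr))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(out, (int)outlen, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	out[strcspn(out, "\n")] = '\0';
	return (int)strlen(out);
}
```

Per-node attributes are reached the same way, for example with attr set to "n-0-81003ebc220/size", since the node directories live below the device that the bus link points to.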
RE: routing policy based on u32 classifier
Marco wrote:
Hello everybody. Kindly, I would like to know if there is any plan to add this feature to a future kernel release. I know that fwmark is able to do this, but there is the limitation in source ip address selection.

Could you explain the limitation? My iptables manpage seems to suggest that u32 is pretty general. Are you just asking if the pom-ng ipt_u32 will be mainlined?
Re: [1/4] DST: Distributed storage documentation.
On Mon, Dec 10, 2007 at 05:50:55PM +0300, Evgeniy Polyakov ([EMAIL PROTECTED]) wrote:

Further questions:
Why do you do your own refcounting instead of using kref?

That's because I have always used atomic operations as reference counters and never tried krefs :) They are actually the same (modulo tricky architectures where smp_mb_* are required), so I can replace them in the next release.

Actually not: I have to set the reference counter to something other than 1 or +/- 1, and thus would have to call kref_get() in a loop, which is a very ugly step. Is there a kref_set() or something like that? At least not in 2.6.22, which I'm using for now.

Sigh, I've already converted most of DST...

-- 
Evgeniy Polyakov
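The point above is that a kref starts at one and moves in steps of one, so starting from an arbitrary count means a loop of kref_get() calls. The semantics can be sketched in userspace with C11 atomics; struct ref and its helpers are hypothetical stand-ins, not the kernel's kref API, and ref_set() only illustrates the kind of one-step initialisation being asked about.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace stand-in for a kref-style counter; NOT the kernel's struct kref. */
struct ref {
	atomic_int count;
};

static void ref_init(struct ref *r)
{
	atomic_init(&r->count, 1);		/* like kref_init(): start at one */
}

/* Hypothetical: initialise with an arbitrary count in one step instead of
 * calling the "get" operation n - 1 times in a loop. */
static void ref_set(struct ref *r, int n)
{
	assert(n > 0);
	atomic_init(&r->count, n);
}

static void ref_get(struct ref *r)
{
	atomic_fetch_add(&r->count, 1);
}

/* Drop one reference and call release() exactly once, when the count
 * reaches zero.  Returns 1 if the object was released, 0 otherwise. */
static int ref_put(struct ref *r, void (*release)(struct ref *))
{
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

/* Demo release callback: counts how often it runs. */
static int released;
static void count_release(struct ref *r)
{
	(void)r;
	released++;
}
```

The C11 operations used here default to sequentially consistent ordering, which sidesteps the per-architecture barrier questions raised above at some cost in speed.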
[patch 0/5] ipv6: make af_inet6 subsystems to return an error at init
This patchset continues the work of making the initialization functions of the different af_inet6 subsystems return an error code, and of handling that error so that initialization fails safely.

It takes into account:
 * flowlabel
 * exthdrs
 * frag
 * udp
 * udplite
 * tcp
 * raw

-- 
[patch 2/5] ipv6: make extended headers to return an error at initialization
This patch factorizes the code of the different init functions for rthdr, nodata and destopt into a single function, exthdrs_init. This function returns an error, so the af_inet6 module can check the initialization correctly.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/net/transp_v6.h |    5 +--
 net/ipv6/af_inet6.c     |   10 +--
 net/ipv6/exthdrs.c      |   64 +---
 3 files changed, 48 insertions(+), 31 deletions(-)

Index: net-2.6.25/include/net/transp_v6.h
===================================================================
--- net-2.6.25.orig/include/net/transp_v6.h
+++ net-2.6.25/include/net/transp_v6.h
@@ -17,10 +17,9 @@ extern struct proto tcpv6_prot;
 struct flowi;
 
 /* extention headers */
-extern void	ipv6_rthdr_init(void);
+extern int	ipv6_exthdrs_init(void);
+extern void	ipv6_exthdrs_exit(void);
 extern void	ipv6_frag_init(void);
-extern void	ipv6_nodata_init(void);
-extern void	ipv6_destopt_init(void);
 
 /* transport protocols */
 extern void	rawv6_init(void);
Index: net-2.6.25/net/ipv6/af_inet6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/af_inet6.c
+++ net-2.6.25/net/ipv6/af_inet6.c
@@ -859,10 +859,11 @@ static int __init inet6_init(void)
 		goto addrconf_fail;
 
 	/* Init v6 extension headers. */
-	ipv6_rthdr_init();
+	err = ipv6_exthdrs_init();
+	if (err)
+		goto ipv6_exthdrs_fail;
+
 	ipv6_frag_init();
-	ipv6_nodata_init();
-	ipv6_destopt_init();
 
 	/* Init v6 transport protocols. */
 	udpv6_init();
@@ -874,6 +875,8 @@ static int __init inet6_init(void)
 out:
 	return err;
+ipv6_exthdrs_fail:
+	addrconf_cleanup();
 addrconf_fail:
 	ip6_flowlabel_cleanup();
 ip6_flowlabel_fail:
@@ -932,6 +935,7 @@ static void __exit inet6_exit(void)
 	/* Cleanup code parts. */
 	ipv6_packet_cleanup();
+	ipv6_exthdrs_exit();
 	addrconf_cleanup();
 	ip6_flowlabel_cleanup();
 	ip6_route_cleanup();
Index: net-2.6.25/net/ipv6/exthdrs.c
===================================================================
--- net-2.6.25.orig/net/ipv6/exthdrs.c
+++ net-2.6.25/net/ipv6/exthdrs.c
@@ -308,28 +308,6 @@ static int ipv6_destopt_rcv(struct sk_bu
 	return -1;
 }
 
-static struct inet6_protocol destopt_protocol = {
-	.handler	=	ipv6_destopt_rcv,
-	.flags		=	INET6_PROTO_NOPOLICY | INET6_PROTO_GSO_EXTHDR,
-};
-
-void __init ipv6_destopt_init(void)
-{
-	if (inet6_add_protocol(&destopt_protocol, IPPROTO_DSTOPTS) < 0)
-		printk(KERN_ERR "ipv6_destopt_init: Could not register protocol\n");
-}
-
-static struct inet6_protocol nodata_protocol = {
-	.handler	=	dst_discard,
-	.flags		=	INET6_PROTO_NOPOLICY,
-};
-
-void __init ipv6_nodata_init(void)
-{
-	if (inet6_add_protocol(&nodata_protocol, IPPROTO_NONE) < 0)
-		printk(KERN_ERR "ipv6_nodata_init: Could not register protocol\n");
-}
-
 /********************************
   Routing header.
  ********************************/
@@ -527,12 +505,48 @@ static struct inet6_protocol rthdr_proto
 	.flags		=	INET6_PROTO_NOPOLICY | INET6_PROTO_GSO_EXTHDR,
 };
 
-void __init ipv6_rthdr_init(void)
+static struct inet6_protocol destopt_protocol = {
+	.handler	=	ipv6_destopt_rcv,
+	.flags		=	INET6_PROTO_NOPOLICY | INET6_PROTO_GSO_EXTHDR,
+};
+
+static struct inet6_protocol nodata_protocol = {
+	.handler	=	dst_discard,
+	.flags		=	INET6_PROTO_NOPOLICY,
+};
+
+int __init ipv6_exthdrs_init(void)
 {
-	if (inet6_add_protocol(&rthdr_protocol, IPPROTO_ROUTING) < 0)
-		printk(KERN_ERR "ipv6_rthdr_init: Could not register protocol\n");
+	int ret;
+
+	ret = inet6_add_protocol(&rthdr_protocol, IPPROTO_ROUTING);
+	if (ret)
+		goto out;
+
+	ret = inet6_add_protocol(&destopt_protocol, IPPROTO_DSTOPTS);
+	if (ret)
+		goto out_rthdr;
+
+	ret = inet6_add_protocol(&nodata_protocol, IPPROTO_NONE);
+	if (ret)
+		goto out_destopt;
+
+out:
+	return ret;
+out_rthdr:
+	inet6_del_protocol(&rthdr_protocol, IPPROTO_ROUTING);
+out_destopt:
+	inet6_del_protocol(&destopt_protocol, IPPROTO_DSTOPTS);
+	goto out;
 };
 
+void ipv6_exthdrs_exit(void)
+{
+	inet6_del_protocol(&nodata_protocol, IPPROTO_NONE);
+	inet6_del_protocol(&destopt_protocol, IPPROTO_DSTOPTS);
+	inet6_del_protocol(&rthdr_protocol, IPPROTO_ROUTING);
+}
+
 /**********************************
   Hop-by-hop options.
  **********************************/
-- 
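Each error label in an init function like ipv6_exthdrs_init() must undo exactly the registrations that succeeded before the failing one, which means the labels have to appear in reverse order of registration. As quoted, out_rthdr precedes out_destopt, so a failure to register nodata_protocol would remove destopt_protocol but leave rthdr_protocol registered. A userspace sketch with the labels in reverse order follows; the table, fail_at hook and helper names are hypothetical stand-ins for inet6_add_protocol() and inet6_del_protocol().

```c
#include <assert.h>

/* Hypothetical stand-ins: a tiny registration table indexed by protocol slot. */
enum { P_ROUTING, P_DSTOPTS, P_NONE, NPROTO };

static int registered[NPROTO];
static int fail_at = -1;	/* test hook: force this registration to fail */

static int add_protocol(int num)
{
	if (num == fail_at || registered[num])
		return -1;		/* nonzero on failure */
	registered[num] = 1;
	return 0;
}

static void del_protocol(int num)
{
	registered[num] = 0;
}

/* Labels are laid out in reverse registration order, so a failure at
 * step n falls through the undo steps for n-1, ..., 1 only. */
static int exthdrs_init(void)
{
	int ret;

	ret = add_protocol(P_ROUTING);
	if (ret)
		goto out;

	ret = add_protocol(P_DSTOPTS);
	if (ret)
		goto out_rthdr;

	ret = add_protocol(P_NONE);
	if (ret)
		goto out_destopt;

out:
	return ret;
out_destopt:
	del_protocol(P_DSTOPTS);
out_rthdr:
	del_protocol(P_ROUTING);
	goto out;
}
```

With this layout a failure at any step leaves the table exactly as it was before the call, which is what inet6_init() relies on when it jumps to its own unwind labels.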
[patch 5/5] ipv6: make the protocol initialization to return an error code
This patch makes the different protocols return an error code, so the af_inet6 module can check whether the initialization succeeded.

raw6 was converted too, for consistency with the rest of the protocols, although it is still registered at the same place. Because raw6 now has its own init function, the proto and ops structures can be moved into the raw.c file.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/net/ipv6.h       |    2 -
 include/net/transp_v6.h  |   12 ---
 net/ipv6/af_inet6.c      |   77 ---
 net/ipv6/ipv6_sockglue.c |    3 +
 net/ipv6/raw.c           |   52 +++
 net/ipv6/tcp_ipv6.c      |   38 ++-
 net/ipv6/udp.c           |   26 +--
 net/ipv6/udplite.c       |   25 ---
 8 files changed, 169 insertions(+), 66 deletions(-)

Index: net-2.6.25/include/net/transp_v6.h
===================================================================
--- net-2.6.25.orig/include/net/transp_v6.h
+++ net-2.6.25/include/net/transp_v6.h
@@ -23,10 +23,14 @@ extern int	ipv6_frag_init(void);
 extern void	ipv6_frag_exit(void);
 
 /* transport protocols */
-extern void	rawv6_init(void);
-extern void	udpv6_init(void);
-extern void	udplitev6_init(void);
-extern void	tcpv6_init(void);
+extern int	rawv6_init(void);
+extern void	rawv6_exit(void);
+extern int	udpv6_init(void);
+extern void	udpv6_exit(void);
+extern int	udplitev6_init(void);
+extern void	udplitev6_exit(void);
+extern int	tcpv6_init(void);
+extern void	tcpv6_exit(void);
 
 extern int	udpv6_connect(struct sock *sk,
 			      struct sockaddr *uaddr,
Index: net-2.6.25/net/ipv6/af_inet6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/af_inet6.c
+++ net-2.6.25/net/ipv6/af_inet6.c
@@ -529,42 +529,6 @@ static struct net_proto_family inet6_fam
 	.owner	=	THIS_MODULE,
 };
 
-/* Same as inet6_dgram_ops, sans udp_poll.  */
-static const struct proto_ops inet6_sockraw_ops = {
-	.family		   = PF_INET6,
-	.owner		   = THIS_MODULE,
-	.release	   = inet6_release,
-	.bind		   = inet6_bind,
-	.connect	   = inet_dgram_connect,	/* ok		*/
-	.socketpair	   = sock_no_socketpair,	/* a do nothing	*/
-	.accept		   = sock_no_accept,		/* a do nothing	*/
-	.getname	   = inet6_getname,
-	.poll		   = datagram_poll,		/* ok		*/
-	.ioctl		   = inet6_ioctl,		/* must change  */
-	.listen		   = sock_no_listen,		/* ok		*/
-	.shutdown	   = inet_shutdown,		/* ok		*/
-	.setsockopt	   = sock_common_setsockopt,	/* ok		*/
-	.getsockopt	   = sock_common_getsockopt,	/* ok		*/
-	.sendmsg	   = inet_sendmsg,		/* ok		*/
-	.recvmsg	   = sock_common_recvmsg,	/* ok		*/
-	.mmap		   = sock_no_mmap,
-	.sendpage	   = sock_no_sendpage,
-#ifdef CONFIG_COMPAT
-	.compat_setsockopt = compat_sock_common_setsockopt,
-	.compat_getsockopt = compat_sock_common_getsockopt,
-#endif
-};
-
-static struct inet_protosw rawv6_protosw = {
-	.type		= SOCK_RAW,
-	.protocol	= IPPROTO_IP,	/* wild card */
-	.prot		= &rawv6_prot,
-	.ops		= &inet6_sockraw_ops,
-	.capability	= CAP_NET_RAW,
-	.no_check	= UDP_CSUM_DEFAULT,
-	.flags		= INET_PROTOSW_REUSE,
-};
-
 int inet6_register_protosw(struct inet_protosw *p)
 {
 	struct list_head *lh;
@@ -771,7 +735,6 @@ static int __init inet6_init(void)
 	__this_module.can_unload = &ipv6_unload;
 #endif
 #endif
-
 	err = proto_register(&tcpv6_prot, 1);
 	if (err)
 		goto out;
@@ -796,14 +759,16 @@ static int __init inet6_init(void)
 	/* We MUST register RAW sockets before we create the ICMP6,
 	 * IGMP6, or NDISC control sockets.
 	 */
-	inet6_register_protosw(&rawv6_protosw);
+	err = rawv6_init();
+	if (err)
+		goto out_unregister_raw_proto;
 
 	/* Register the family here so that the init calls below will
 	 * be able to create sockets. (?? is this dangerous ??)
 	 */
 	err = sock_register(&inet6_family_ops);
 	if (err)
-		goto
[patch 1/5] ipv6: make flowlabel to return an error
This patch makes the flowlabel subsystem return an error code and cleans up the procfs ifdefs. af_inet6 will use the flowlabel init return code to check that the initialization was correct.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/net/ipv6.h       |    2 +-
 net/ipv6/af_inet6.c      |    5 -
 net/ipv6/ip6_flowlabel.c |   30 +++---
 3 files changed, 28 insertions(+), 9 deletions(-)

Index: net-2.6.25/include/net/ipv6.h
===================================================================
--- net-2.6.25.orig/include/net/ipv6.h
+++ net-2.6.25/include/net/ipv6.h
@@ -219,7 +219,7 @@ extern struct ipv6_txoptions	*fl6_merge_
 						   struct ipv6_txoptions * fopt);
 extern void			fl6_free_socklist(struct sock *sk);
 extern int			ipv6_flowlabel_opt(struct sock *sk, char __user *optval, int optlen);
-extern void			ip6_flowlabel_init(void);
+extern int			ip6_flowlabel_init(void);
 extern void			ip6_flowlabel_cleanup(void);
 
 static inline void fl6_sock_release(struct ip6_flowlabel *fl)
Index: net-2.6.25/net/ipv6/af_inet6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/af_inet6.c
+++ net-2.6.25/net/ipv6/af_inet6.c
@@ -851,7 +851,9 @@ static int __init inet6_init(void)
 	err = ip6_route_init();
 	if (err)
 		goto ip6_route_fail;
-	ip6_flowlabel_init();
+	err = ip6_flowlabel_init();
+	if (err)
+		goto ip6_flowlabel_fail;
 	err = addrconf_init();
 	if (err)
 		goto addrconf_fail;
@@ -874,6 +876,7 @@ out:
 addrconf_fail:
 	ip6_flowlabel_cleanup();
+ip6_flowlabel_fail:
 	ip6_route_cleanup();
 ip6_route_fail:
 #ifdef CONFIG_PROC_FS
Index: net-2.6.25/net/ipv6/ip6_flowlabel.c
===================================================================
--- net-2.6.25.orig/net/ipv6/ip6_flowlabel.c
+++ net-2.6.25/net/ipv6/ip6_flowlabel.c
@@ -692,20 +692,36 @@ static const struct file_operations ip6f
 	.llseek		=	seq_lseek,
 	.release	=	seq_release_private,
 };
-#endif
+static int ip6_flowlabel_proc_init(struct net *net)
+{
+	if (!proc_net_fops_create(net, "ip6_flowlabel", S_IRUGO, &ip6fl_seq_fops))
+		return -ENOMEM;
+	return 0;
+}
 
-void ip6_flowlabel_init(void)
+static void ip6_flowlabel_proc_fini(struct net *net)
 {
-#ifdef CONFIG_PROC_FS
-	proc_net_fops_create(&init_net, "ip6_flowlabel", S_IRUGO, &ip6fl_seq_fops);
+	proc_net_remove(net, "ip6_flowlabel");
+}
+#else
+static inline int ip6_flowlabel_proc_init(struct net *net)
+{
+	return 0;
+}
+static inline void ip6_flowlabel_proc_fini(struct net *net)
+{
+	return ;
+}
 #endif
+
+int ip6_flowlabel_init(void)
+{
+	return ip6_flowlabel_proc_init(&init_net);
 }
 
 void ip6_flowlabel_cleanup(void)
 {
 	del_timer(&ip6_fl_gc_timer);
-#ifdef CONFIG_PROC_FS
-	proc_net_remove(&init_net, "ip6_flowlabel");
-#endif
+	ip6_flowlabel_proc_fini(&init_net);
 }
-- 
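The patch moves the CONFIG_PROC_FS #ifdef out of ip6_flowlabel_init() and ip6_flowlabel_cleanup() by defining no-op inline stubs when procfs is disabled, so the callers stay free of conditionals and the procfs error now propagates. The same pattern reduced to userspace C; HAVE_PROC, the fl_* names and the proc_create_fails hook are hypothetical stand-ins for CONFIG_PROC_FS and proc_net_fops_create() returning NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define HAVE_PROC 1	/* hypothetical stand-in for CONFIG_PROC_FS; set to 0 for the stubs */

#if HAVE_PROC
static bool proc_create_fails;	/* test hook: simulate the proc entry creation failing */
static bool proc_entry;		/* whether the (simulated) proc entry exists */

static int fl_proc_init(void)
{
	if (proc_create_fails)
		return -ENOMEM;
	proc_entry = true;
	return 0;
}

static void fl_proc_fini(void)
{
	proc_entry = false;
}
#else
/* With the feature compiled out, init always succeeds and fini does nothing,
 * so callers need no #ifdef of their own. */
static inline int fl_proc_init(void) { return 0; }
static inline void fl_proc_fini(void) { }
#endif

static int fl_init(void)
{
	return fl_proc_init();	/* the error code now reaches the caller */
}

static void fl_cleanup(void)
{
	fl_proc_fini();
}
```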
[patch 3/5] ipv6: make frag to return an error at initialization
This patch makes ipv6_frag_init return an error code, so the af_inet6 module can handle the error.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/net/transp_v6.h |    3 ++-
 net/ipv6/af_inet6.c     |    8 ++--
 net/ipv6/reassembly.c   |   16 +---
 3 files changed, 21 insertions(+), 6 deletions(-)

Index: net-2.6.25/include/net/transp_v6.h
===================================================================
--- net-2.6.25.orig/include/net/transp_v6.h
+++ net-2.6.25/include/net/transp_v6.h
@@ -19,7 +19,8 @@ struct flowi;
 /* extention headers */
 extern int	ipv6_exthdrs_init(void);
 extern void	ipv6_exthdrs_exit(void);
-extern void	ipv6_frag_init(void);
+extern int	ipv6_frag_init(void);
+extern void	ipv6_frag_exit(void);
 
 /* transport protocols */
 extern void	rawv6_init(void);
Index: net-2.6.25/net/ipv6/af_inet6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/af_inet6.c
+++ net-2.6.25/net/ipv6/af_inet6.c
@@ -863,7 +863,9 @@ static int __init inet6_init(void)
 	if (err)
 		goto ipv6_exthdrs_fail;
 
-	ipv6_frag_init();
+	err = ipv6_frag_init();
+	if (err)
+		goto ipv6_frag_fail;
 
 	/* Init v6 transport protocols. */
 	udpv6_init();
@@ -875,6 +877,8 @@ static int __init inet6_init(void)
 out:
 	return err;
+ipv6_frag_fail:
+	ipv6_exthdrs_exit();
 ipv6_exthdrs_fail:
 	addrconf_cleanup();
 addrconf_fail:
@@ -934,7 +938,7 @@ static void __exit inet6_exit(void)
 	/* Cleanup code parts. */
 	ipv6_packet_cleanup();
-
+	ipv6_frag_exit();
 	ipv6_exthdrs_exit();
 	addrconf_cleanup();
 	ip6_flowlabel_cleanup();
Index: net-2.6.25/net/ipv6/reassembly.c
===================================================================
--- net-2.6.25.orig/net/ipv6/reassembly.c
+++ net-2.6.25/net/ipv6/reassembly.c
@@ -632,11 +632,13 @@ static struct inet6_protocol frag_protoc
 	.flags		=	INET6_PROTO_NOPOLICY,
 };
 
-void __init ipv6_frag_init(void)
+int __init ipv6_frag_init(void)
 {
-	if (inet6_add_protocol(&frag_protocol, IPPROTO_FRAGMENT) < 0)
-		printk(KERN_ERR "ipv6_frag_init: Could not register protocol\n");
+	int ret;
 
+	ret = inet6_add_protocol(&frag_protocol, IPPROTO_FRAGMENT);
+	if (ret)
+		goto out;
 	ip6_frags.ctl = &ip6_frags_ctl;
 	ip6_frags.hashfn = ip6_hashfn;
 	ip6_frags.constructor = ip6_frag_init;
@@ -646,4 +648,12 @@ void __init ipv6_frag_init(void)
 	ip6_frags.match = ip6_frag_match;
 	ip6_frags.frag_expire = ip6_frag_expire;
 	inet_frags_init(&ip6_frags);
+out:
+	return ret;
+}
+
+void ipv6_frag_exit(void)
+{
+	inet_frags_fini(&ip6_frags);
+	inet6_del_protocol(&frag_protocol, IPPROTO_FRAGMENT);
 }
-- 
[patch 4/5] ipv6: make inet6_register_protosw to return an error code
This patch makes inet6_register_protosw return an error code. The different protocols can then tell whether the registration succeeded and pass the error on to the initial caller, af_inet6.

Signed-off-by: Daniel Lezcano [EMAIL PROTECTED]
---
 include/net/protocol.h |    2 +-
 net/ipv6/af_inet6.c    |   11 +++
 2 files changed, 8 insertions(+), 5 deletions(-)

Index: net-2.6.25/include/net/protocol.h
===================================================================
--- net-2.6.25.orig/include/net/protocol.h
+++ net-2.6.25/include/net/protocol.h
@@ -102,7 +102,7 @@ extern void	inet_unregister_protosw(stru
 #if defined(CONFIG_IPV6) || defined (CONFIG_IPV6_MODULE)
 extern int	inet6_add_protocol(struct inet6_protocol *prot, unsigned char num);
 extern int	inet6_del_protocol(struct inet6_protocol *prot, unsigned char num);
-extern void	inet6_register_protosw(struct inet_protosw *p);
+extern int	inet6_register_protosw(struct inet_protosw *p);
 extern void	inet6_unregister_protosw(struct inet_protosw *p);
 #endif
 
Index: net-2.6.25/net/ipv6/af_inet6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/af_inet6.c
+++ net-2.6.25/net/ipv6/af_inet6.c
@@ -565,21 +565,23 @@ static struct inet_protosw rawv6_protosw
 	.flags		= INET_PROTOSW_REUSE,
 };
 
-void
-inet6_register_protosw(struct inet_protosw *p)
+int inet6_register_protosw(struct inet_protosw *p)
 {
 	struct list_head *lh;
 	struct inet_protosw *answer;
-	int protocol = p->protocol;
 	struct list_head *last_perm;
+	int protocol = p->protocol;
+	int ret;
 
 	spin_lock_bh(&inetsw6_lock);
 
+	ret = -EINVAL;
 	if (p->type >= SOCK_MAX)
 		goto out_illegal;
 
 	/* If we are trying to override a permanent protocol, bail. */
 	answer = NULL;
+	ret = -EPERM;
 	last_perm = &inetsw6[p->type];
 	list_for_each(lh, &inetsw6[p->type]) {
 		answer = list_entry(lh, struct inet_protosw, list);
@@ -603,9 +605,10 @@ inet6_register_protosw(struct inet_proto
 	 * system automatically returns to the old behavior.
 	 */
 	list_add_rcu(&p->list, last_perm);
+	ret = 0;
 out:
 	spin_unlock_bh(&inetsw6_lock);
-	return;
+	return ret;
 
 out_permanent:
 	printk(KERN_ERR "Attempt to override permanent protocol %d.\n",
-- 
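After this patch every exit path of inet6_register_protosw() carries a distinct code: ret is assigned just before each check that can jump to an error label, and set to 0 only once the entry has been added. A minimal userspace sketch of that convention, assuming a simplified registry that omits inetsw6_lock and keeps a single entry per socket type instead of the per-type list (so a non-permanent entry is simply replaced); NTYPES, struct protosw and register_protosw() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NTYPES 4	/* hypothetical stand-in for SOCK_MAX */

struct protosw {
	int  type;
	bool permanent;
};

/* One slot per socket type: a simplification of the per-type lists in af_inet6.c */
static struct protosw *table[NTYPES];

/* Each failure path sets ret before jumping, so the caller learns why the
 * registration failed: -EINVAL for a bad type, -EPERM for an attempt to
 * override a permanent entry, 0 on success. */
static int register_protosw(struct protosw *p)
{
	int ret;

	ret = -EINVAL;
	if (p->type < 0 || p->type >= NTYPES)
		goto out;

	ret = -EPERM;
	if (table[p->type] != NULL && table[p->type]->permanent)
		goto out;

	table[p->type] = p;
	ret = 0;
out:
	return ret;
}
```

Setting ret ahead of each check keeps every goto a one-liner, at the cost of a few assignments that are overwritten on the success path.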
Re: routing policy based on u32 classifier
Brian S Julin wrote:
Marco wrote:
Hello everybody. Kindly, I would like to know if there is any plan to add this feature to a future kernel release. I know that fwmark is able to do this, but there is the limitation in source ip address selection.

Could you explain the limitation?

Indeed. Here is an example:

      hdsl line            adsl line
          |                    |
          |    +---------+     |
          +----|  SQUID  |-----+
               +---------+
                    |
              internal lan

The linux default gateway points to the hdsl router. I want to redirect the squid http traffic (squid runs on the same host) to the adsl line. So I create a routing table (the 'adsl' table) and put a default route to the adsl router in it. I mark the http traffic and insert a rule based on fwmark which selects the 'adsl' table. So far, so good. But squid will select its source ip address from the hdsl network class, because that is where the default gateway is. So the locally generated packets must be SNATed to the adsl ip.

Is it clear?
Re: [patch 0/5] ipv6: make af_inet6 subsystems to return an error at init
Daniel Lezcano wrote:
This patchset continues the work of making the initialization functions of the different af_inet6 subsystems return an error code, and of handling that error so that initialization fails safely. It takes into account: flowlabel, exthdrs, frag, udp, udplite, tcp, raw.

I just noticed that I forgot to put ipv6 in brackets in the subject. Sorry for that :( Should I resend the patchset?
Re: routing policy based on u32 classifier
Brian S Julin wrote:
Almost clear... why can you not just add "src <ADSL IP>" to the fwmark route to set the default source address for locally originating packets?

IIRC, it doesn't work because netfilter isn't called during ip source address selection.
[PATCH 2.6.25] netns: struct net content re-work
Recently David Miller and Herbert Xu pointed out that struct net is becoming overbloated and unmaintainable. There are two solutions:
- provide a pointer to a network subsystem definition from struct net.
  This costs an additional dereference.
- place the subsystem definition into the structure itself. This will
  speed up run-time access at the cost of recompilation time.

The second approach looks better to us. Other subsystems will be converted to this approach if this one is accepted :)

Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>
---
diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index b62e31f..f60e1ce 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -8,6 +8,8 @@
 #include <linux/workqueue.h>
 #include <linux/list.h>
 
+#include <net/netns/unix.h>
+
 struct proc_dir_entry;
 struct net_device;
 struct sock;
@@ -46,8 +48,7 @@ struct net {
 	struct hlist_head	packet_sklist;
 
 	/* unix sockets */
-	int			sysctl_unix_max_dgram_qlen;
-	struct ctl_table_header	*unix_ctl;
+	struct netns_unix	unx;
 };
 
 #ifdef CONFIG_NET
diff --git a/include/net/netns/unix.h b/include/net/netns/unix.h
new file mode 100644
index 000..27b4e7f
--- /dev/null
+++ b/include/net/netns/unix.h
@@ -0,0 +1,13 @@
+/*
+ * Unix network namespace
+ */
+#ifndef __NETNS_UNIX_H__
+#define __NETNS_UNIX_H__
+
+struct ctl_table_header;
+struct netns_unix {
+	int			sysctl_unix_max_dgram_qlen;
+	struct ctl_table_header	*unix_ctl;
+};
+
+#endif /* __NETNS_UNIX_H__ */
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index b8a2189..06f7ec8 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -592,7 +592,7 @@ static struct sock * unix_create1(struct net *net, struct socket *sock)
 				&af_unix_sk_receive_queue_lock_key);
 
 	sk->sk_write_space	= unix_write_space;
-	sk->sk_max_ack_backlog	= net->sysctl_unix_max_dgram_qlen;
+	sk->sk_max_ack_backlog	= net->unx.sysctl_unix_max_dgram_qlen;
 	sk->sk_destruct		= unix_sock_destructor;
 	u	  = unix_sk(sk);
 	u->dentry = NULL;
@@ -2138,7 +2138,7 @@ static int unix_net_init(struct net *net)
 {
 	int error = -ENOMEM;
 
-	net->sysctl_unix_max_dgram_qlen = 10;
+	net->unx.sysctl_unix_max_dgram_qlen = 10;
 	if (unix_sysctl_register(net))
 		goto out;
 
diff --git a/net/unix/sysctl_net_unix.c b/net/unix/sysctl_net_unix.c
index 553ef6a..fbddfb5 100644
--- a/net/unix/sysctl_net_unix.c
+++ b/net/unix/sysctl_net_unix.c
@@ -18,7 +18,7 @@ static ctl_table unix_table[] = {
 	{
 		.ctl_name	= NET_UNIX_MAX_DGRAM_QLEN,
 		.procname	= "max_dgram_qlen",
-		.data		= &init_net.sysctl_unix_max_dgram_qlen,
+		.data		= &init_net.unx.sysctl_unix_max_dgram_qlen,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
@@ -40,9 +40,9 @@ int unix_sysctl_register(struct net *net)
 	if (table == NULL)
 		goto err_alloc;
 
-	table[0].data = &net->sysctl_unix_max_dgram_qlen;
-	net->unix_ctl = register_net_sysctl_table(net, unix_path, table);
-	if (net->unix_ctl == NULL)
+	table[0].data = &net->unx.sysctl_unix_max_dgram_qlen;
+	net->unx.unix_ctl = register_net_sysctl_table(net, unix_path, table);
+	if (net->unx.unix_ctl == NULL)
 		goto err_reg;
 
 	return 0;
@@ -57,8 +57,8 @@ void unix_sysctl_unregister(struct net *net)
 {
 	struct ctl_table *table;
 
-	table = net->unix_ctl->ctl_table_arg;
-	unregister_sysctl_table(net->unix_ctl);
+	table = net->unx.unix_ctl->ctl_table_arg;
+	unregister_sysctl_table(net->unx.unix_ctl);
 	kfree(table);
 }
 
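To make the trade-off in the patch description concrete, here is a userspace sketch with hypothetical stand-in structures (not the kernel's definitions). With a pointer member, reading the sysctl value needs one extra dereference; with the embedded member, the value sits at a fixed offset inside the containing struct, at the price of rebuilding everything that includes the header when the subsystem struct changes.

```c
/* Hypothetical stand-in for struct netns_unix. */
struct netns_unix_demo {
	int sysctl_max_dgram_qlen;
};

/* Option 1: the namespace holds a pointer to the subsystem state.
 * Reading the field first loads unx_ptr, then loads the field. */
struct net_with_ptr {
	struct netns_unix_demo *unx_ptr;
};

/* Option 2 (the approach the patch takes): the state is embedded, so the
 * field is read directly at a constant offset from the struct net pointer. */
struct net_embedded {
	struct netns_unix_demo unx;
};

static int qlen_via_ptr(const struct net_with_ptr *net)
{
	return net->unx_ptr->sysctl_max_dgram_qlen;
}

static int qlen_embedded(const struct net_embedded *net)
{
	return net->unx.sysctl_max_dgram_qlen;
}
```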
[PATCH 2.6.25] UNIX: remove unused declaration of sysctl_unix_max_dgram_qlen
ip neigh show not showing arp cache entries?
I'm seeing some strange behaviour on a 2.6.14 ppc64 system. If I run "ip neigh show" it prints out nothing, but if I run "arp" then I see the other nodes on the local network.

[EMAIL PROTECTED]:/root ip neigh show
[EMAIL PROTECTED]:/root arp -n
Address          HWtype  HWaddress           Flags Mask  Iface
172.24.132.0     ether   00:01:AF:14:E8:DA   C           bond0
172.24.132.1             (incomplete)                    bond0
172.24.136.0     ether   00:C0:8B:07:B3:7E   C           bond0
172.24.132.4     ether   00:01:AF:14:E8:DA   C           bond0
172.24.132.2     ether   00:01:AF:14:E8:DA   C           bond0

Any ideas what's going on here?

Thanks,
Chris
Re: [PATCH 2.6.25] netns: struct net content re-work
Denis V. Lunev wrote:
> Recently David Miller and Herbert Xu pointed out that struct net becomes
> overbloated and un-maintainable. There are two solutions:
> - provide a pointer to a network subsystem definition from struct net.
>   This costs an additional dereferrence
> - place sub-system definition into the structure itself. This will speedup
>   run-time access at the cost of recompilation time
>
> The second approach looks better for us.

Yes, we do not need/want a pointer in this structure and add more
dereference in the network code.

> Other sub-systems will be converted to this approach if this will be
> accepted :)
>
> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>
> ---
> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> index b62e31f..f60e1ce 100644
> --- a/include/net/net_namespace.h
> +++ b/include/net/net_namespace.h
> @@ -8,6 +8,8 @@
>  #include <linux/workqueue.h>
>  #include <linux/list.h>
> 
> +#include <net/netns/unix.h>
> +
>  struct proc_dir_entry;
>  struct net_device;
>  struct sock;
> @@ -46,8 +48,7 @@ struct net {
>  	struct hlist_head	packet_sklist;
> 
>  	/* unix sockets */
> -	int			sysctl_unix_max_dgram_qlen;
> -	struct ctl_table_header	*unix_ctl;
> +	struct netns_unix	unx;

Can you change this from unx to unix ?

If you encapsulate the structure definitions per subsystem, you can drop
the unix prefix in the variable declaration. Instead of having:

	netns->unix->unix_ctl

you will have:

	netns->unix->ctl

>  };
> 
>  #ifdef CONFIG_NET
> diff --git a/include/net/netns/unix.h b/include/net/netns/unix.h
> new file mode 100644
> index 000..27b4e7f
> --- /dev/null
> +++ b/include/net/netns/unix.h
> @@ -0,0 +1,13 @@
> +/*
> + * Unix network namespace
> + */
> +#ifndef __NETNS_UNIX_H__
> +#define __NETNS_UNIX_H__
> +
> +struct ctl_table_header;
> +struct netns_unix {
> +	int			sysctl_unix_max_dgram_qlen;
> +	struct ctl_table_header	*unix_ctl;
> +};
> +
> +#endif /* __NETNS_UNIX_H__ */

If I follow the logic, we will have a file per subsystem. These files
will be very small, no ? IMHO, having the structure defined in
net_namespace.h is enough.

> diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
> index b8a2189..06f7ec8 100644
> --- a/net/unix/af_unix.c
> +++ b/net/unix/af_unix.c
> @@ -592,7 +592,7 @@ static struct sock * unix_create1(struct net *net, struct socket *sock)
>  				&af_unix_sk_receive_queue_lock_key);
> 
>  	sk->sk_write_space	= unix_write_space;
> -	sk->sk_max_ack_backlog	= net->sysctl_unix_max_dgram_qlen;
> +	sk->sk_max_ack_backlog	= net->unx.sysctl_unix_max_dgram_qlen;
>  	sk->sk_destruct		= unix_sock_destructor;
>  	u	  = unix_sk(sk);
>  	u->dentry = NULL;
> @@ -2138,7 +2138,7 @@ static int unix_net_init(struct net *net)
>  {
>  	int error = -ENOMEM;
> 
> -	net->sysctl_unix_max_dgram_qlen = 10;
> +	net->unx.sysctl_unix_max_dgram_qlen = 10;
>  	if (unix_sysctl_register(net))
>  		goto out;
> 
> diff --git a/net/unix/sysctl_net_unix.c b/net/unix/sysctl_net_unix.c
> index 553ef6a..fbddfb5 100644
> --- a/net/unix/sysctl_net_unix.c
> +++ b/net/unix/sysctl_net_unix.c
> @@ -18,7 +18,7 @@ static ctl_table unix_table[] = {
>  	{
>  		.ctl_name	= NET_UNIX_MAX_DGRAM_QLEN,
>  		.procname	= "max_dgram_qlen",
> -		.data		= &init_net.sysctl_unix_max_dgram_qlen,
> +		.data		= &init_net.unx.sysctl_unix_max_dgram_qlen,
>  		.maxlen		= sizeof(int),
>  		.mode		= 0644,
>  		.proc_handler	= proc_dointvec
> @@ -40,9 +40,9 @@ int unix_sysctl_register(struct net *net)
>  	if (table == NULL)
>  		goto err_alloc;
> 
> -	table[0].data = &net->sysctl_unix_max_dgram_qlen;
> -	net->unix_ctl = register_net_sysctl_table(net, unix_path, table);
> -	if (net->unix_ctl == NULL)
> +	table[0].data = &net->unx.sysctl_unix_max_dgram_qlen;
> +	net->unx.unix_ctl = register_net_sysctl_table(net, unix_path, table);
> +	if (net->unx.unix_ctl == NULL)
>  		goto err_reg;
> 
>  	return 0;
> @@ -57,8 +57,8 @@ void unix_sysctl_unregister(struct net *net)
>  {
>  	struct ctl_table *table;
> 
> -	table = net->unix_ctl->ctl_table_arg;
> -	unregister_sysctl_table(net->unix_ctl);
> +	table = net->unx.unix_ctl->ctl_table_arg;
> +	unregister_sysctl_table(net->unx.unix_ctl);
>  	kfree(table);
>  }
htb and UDP packages bigger than 1500
Hello,

I noticed that HTB doesn't properly limit traffic if someone sends UDP packets bigger than 1500 bytes. Does HTB have some problems or known limits in this area?

There is other traffic in that class, and when I drop UDP packets bigger than 1500 bytes, the remaining traffic is limited properly to the correct value.

UDP part of the tcpdump log:

17:35:24.041364 IP 172.16.4.23.3185 > 80.51.230.58.9267: UDP, length 35
17:35:24.048834 IP 172.16.4.23.46073 > 83.53.197.205.50882: UDP, length 35
17:35:24.071234 IP 172.16.4.23.2904 > 81.38.28.69.26349: UDP, length 35
17:35:24.076924 IP 201.232.209.115.50750 > 172.16.4.23.57590: UDP, length 8251
17:35:24.140895 IP 90.242.120.252.11982 > 172.16.4.23.19111: UDP, length 35
17:35:24.140976 IP 79.186.64.121.47130 > 172.16.4.23.19111: UDP, length 35
17:35:24.141039 IP 24.150.182.77.55984 > 172.16.4.23.41279: UDP, length 29
17:35:24.141120 IP 83.37.212.153.25153 > 172.16.4.23.32824: UDP, length 8251
17:35:24.154874 IP 172.16.4.23.19111 > 90.242.120.252.11982: UDP, length 8251
17:35:24.210940 IP 83.8.18.150.28955 > 172.16.4.23.24825: UDP, length 8251
17:35:24.240382 IP 172.16.4.23.19111 > 79.186.64.121.47130: UDP, length 8251
17:35:24.272529 IP 83.19.224.219.32977 > 172.16.4.23.19111: UDP, length 35
17:35:24.307164 IP 85.219.10.150.18601 > 172.16.4.23.51986: UDP, length 8251
17:35:24.312335 IP 83.26.249.97.10137 > 172.16.4.23.9383: UDP, length 8251
17:35:24.404250 IP 83.19.224.11.1833 > 172.16.4.23.21258: UDP, length 8251
17:35:24.467562 IP 196.206.89.182.58764 > 172.16.4.23.19111: UDP, length 35
17:35:24.560058 IP 172.16.4.23.50417 > 82.5.204.164.1024: UDP, length 25
17:35:24.563842 IP 172.16.4.23.24825 > 83.8.18.150.28955: UDP, length 35
17:35:24.567316 IP 172.16.4.23.59727 > 195.60.65.36.61323: UDP, length 35
17:35:24.569976 IP 83.11.67.228.31949 > 172.16.4.23.56823: UDP, length 8251
17:35:24.617104 IP 172.16.4.23.28945 > 76.11.24.115.13887: UDP, length 29
17:35:24.619235 IP 172.16.4.23.21258 > 83.19.224.11.1833: UDP, length 35
17:35:24.626488 IP 172.16.4.23.9383 > 83.26.249.97.10137: UDP, length 35
17:35:24.640367 IP 172.16.4.23.47366 > 91.146.230.1.38928: UDP, length 25
17:35:24.644314 IP 90.242.120.252.1955 > 172.16.4.23.19111: UDP, length 35
17:35:24.652024 IP 81.184.124.145.22454 > 172.16.4.23.57089: UDP, length 8251

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
Re: [PATCH 2.6.25] netns: struct net content re-work
Daniel Lezcano wrote:
> Denis V. Lunev wrote:
>> Recently David Miller and Herbert Xu pointed out that struct net becomes
>> overbloated and un-maintainable. There are two solutions:
>> - provide a pointer to a network subsystem definition from struct net.
>>   This costs an additional dereferrence
>> - place sub-system definition into the structure itself. This will speedup
>>   run-time access at the cost of recompilation time
>>
>> The second approach looks better for us.
>
> Yes, we do not need/want a pointer in this structure and add more
> dereference in the network code.
>
>> Other sub-systems will be converted to this approach if this will be
>> accepted :)
>>
>> Signed-off-by: Denis V. Lunev <[EMAIL PROTECTED]>
>> ---
>> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
>> index b62e31f..f60e1ce 100644
>> --- a/include/net/net_namespace.h
>> +++ b/include/net/net_namespace.h
>> @@ -8,6 +8,8 @@
>>  #include <linux/workqueue.h>
>>  #include <linux/list.h>
>>
>> +#include <net/netns/unix.h>
>> +
>>  struct proc_dir_entry;
>>  struct net_device;
>>  struct sock;
>> @@ -46,8 +48,7 @@ struct net {
>>  	struct hlist_head	packet_sklist;
>>
>>  	/* unix sockets */
>> -	int			sysctl_unix_max_dgram_qlen;
>> -	struct ctl_table_header	*unix_ctl;
>> +	struct netns_unix	unx;
>
> Can you change this from unx to unix ?

no, it won't compile. Guess why :)

> If you encapsulate the structure definitions per subsystem, you can drop
> the unix prefix in the variable declaration. Instead of having:
>
> 	netns->unix->unix_ctl
>
> you will have:
>
> 	netns->unix->ctl

agree.

Kirill
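Kirill's "Guess why :)" most likely refers to the fact that GCC, in its GNU C modes on Unix-like targets, predefines the plain identifier `unix` as a macro (value 1), so a member declared as `struct netns_unix unix;` is rewritten by the preprocessor before the compiler sees it. Strict ISO modes such as `-std=c99` only provide the reserved spellings `__unix__` and `__unix`, so the result depends on target and `-std=` setting. This small sketch only reports whether the predefinition is present; the function name is hypothetical.

```c
/* Reports whether the preprocessor predefines the plain identifier "unix".
 * When it does (typically as 1), a declaration such as
 *     struct netns_unix unix;
 * becomes "struct netns_unix 1;" and fails to compile. */
static int unix_is_predefined_macro(void)
{
#ifdef unix
	return 1;
#else
	return 0;
#endif
}
```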
Re: ip neigh show not showing arp cache entries?
Chris Friesen wrote:
> I'm seeing some strange behaviour on a 2.6.14 ppc64 system. If I run
> "ip neigh show" it prints out nothing, but if I run "arp" then I see the
> other nodes on the local network.
>
> [EMAIL PROTECTED]:/root ip neigh show
> [EMAIL PROTECTED]:/root arp -n
> Address          HWtype  HWaddress           Flags Mask  Iface
> 172.24.132.0     ether   00:01:AF:14:E8:DA   C           bond0
> 172.24.132.1             (incomplete)                    bond0
> 172.24.136.0     ether   00:C0:8B:07:B3:7E   C           bond0
> 172.24.132.4     ether   00:01:AF:14:E8:DA   C           bond0
> 172.24.132.2     ether   00:01:AF:14:E8:DA   C           bond0
>
> Any ideas what's going on here?

I've got some further information. If I look for a specific address, it seems to work:

[EMAIL PROTECTED]:/root ip neigh show 172.24.136.0
172.24.136.0 dev bond0 lladdr 00:c0:8b:07:b3:7e REACHABLE

In the above scenario, the ARP cache lists the device as reachable via bond0. If I search the ARP cache to see whether the address is reachable from one of bond0's slave devices, should it come back positive or negative?

Chris
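When the two tools disagree, it can help to compare the data each one reads: the net-tools `arp` command reads the text table in /proc/net/arp, while `ip neigh` dumps the neighbour table over rtnetlink. A minimal sketch that parses one data line of /proc/net/arp, assuming the usual six-column layout (IP address, HW type, Flags, HW address, Mask, Device); `struct arp_entry` and `parse_arp_line()` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one data line of /proc/net/arp. */
struct arp_entry {
	char ip[64];
	unsigned int hw_type;	/* e.g. 0x1 for Ethernet */
	unsigned int flags;	/* e.g. 0x2 = ATF_COM, entry complete */
	char hw_addr[64];
	char mask[64];
	char dev[32];
};

/* Parses one data line (not the header line). Returns 0 on success, or -1
 * if the line does not have the six expected whitespace-separated columns.
 * %x accepts the "0x" prefix that the hex columns carry. */
static int parse_arp_line(const char *line, struct arp_entry *e)
{
	int n = sscanf(line, "%63s %x %x %63s %63s %31s",
		       e->ip, &e->hw_type, &e->flags,
		       e->hw_addr, e->mask, e->dev);

	return n == 6 ? 0 : -1;
}
```

In use, one would fgets() /proc/net/arp line by line, skip the header, and compare the parsed entries with the output of "ip neigh show".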
[PATCH] IPv6 support for NFS server
Here is a cleanup for the ip_map caching patch in nfs server.
It prepares for IPv6 text-based mounts and exports.

Tests: tested with only IPv4 network and basic nfs ops (mount, file
creation and modification)

-

Signed-off-by: Aurelien Charbon <[EMAIL PROTECTED]>

diff -p -u -r -N linux-2.6.24-rc4/fs/nfsd/export.c linux-2.6.24-rc4-IPv6-cache-based/fs/nfsd/export.c
--- linux-2.6.24-rc4/fs/nfsd/export.c	2007-12-10 16:11:37.0 +0100
+++ linux-2.6.24-rc4-IPv6-cache-based/fs/nfsd/export.c	2007-12-10 17:50:37.0 +0100
@@ -35,6 +35,7 @@
 #include <linux/lockd/bind.h>
 #include <linux/sunrpc/msg_prot.h>
 #include <linux/sunrpc/gss_api.h>
+#include <net/ipv6.h>
 
 #define NFSDDBG_FACILITY	NFSDDBG_EXPORT
 
@@ -1556,6 +1557,7 @@ exp_addclient(struct nfsctl_client *ncp)
 {
 	struct auth_domain	*dom;
 	int			i, err;
+	struct in6_addr addr6;
 
 	/* First, consistency check. */
 	err = -EINVAL;
@@ -1574,9 +1576,12 @@ exp_addclient(struct nfsctl_client *ncp)
 		goto out_unlock;
 
 	/* Insert client into hashtable. */
-	for (i = 0; i < ncp->cl_naddr; i++)
-		auth_unix_add_addr(ncp->cl_addrlist[i], dom);
-
+	for (i = 0; i < ncp->cl_naddr; i++) {
+		/* Mapping address */
+		ipv6_addr_set(&addr6, 0, 0,
+			htonl(0x), ncp->cl_addrlist[i].s_addr);
+		auth_unix_add_addr(&addr6, dom);
+	}
 	auth_unix_forget_old(dom);
 	auth_domain_put(dom);
 
diff -p -u -r -N linux-2.6.24-rc4/fs/nfsd/nfsctl.c linux-2.6.24-rc4-IPv6-cache-based/fs/nfsd/nfsctl.c
--- linux-2.6.24-rc4/fs/nfsd/nfsctl.c	2007-12-10 16:11:37.0 +0100
+++ linux-2.6.24-rc4-IPv6-cache-based/fs/nfsd/nfsctl.c	2007-12-10 18:15:22.0 +0100
@@ -37,6 +37,7 @@
 #include <linux/nfsd/syscall.h>
 
 #include <asm/uaccess.h>
+#include <net/ipv6.h>
 
 /*
  *	We have a single directory with 9 nodes in it.
@@ -222,6 +223,7 @@ static ssize_t write_getfs(struct file *
 	struct auth_domain *clp;
 	int err = 0;
 	struct knfsd_fh *res;
+	struct in6_addr in6;
 
 	if (size < sizeof(*data))
 		return -EINVAL;
@@ -236,7 +238,13 @@ static ssize_t write_getfs(struct file *
 	res = (struct knfsd_fh*)buf;
 
 	exp_readlock();
-	if (!(clp = auth_unix_lookup(sin->sin_addr)))
+
+	/* IPv6 address mapping */
+	ipv6_addr_set(&in6, 0, 0,
+			htonl(0x),
+			(((struct sockaddr_in *)&data->gd_addr)->sin_addr.s_addr));
+
+	if (!(clp = auth_unix_lookup(&in6)))
 		err = -EPERM;
 	else {
 		err = exp_rootfh(clp, data->gd_path, res, data->gd_maxlen);
@@ -257,6 +265,7 @@ static ssize_t write_getfd(struct file *
 	int err = 0;
 	struct knfsd_fh fh;
 	char *res;
+	struct in6_addr in6;
 
 	if (size < sizeof(*data))
 		return -EINVAL;
@@ -271,7 +280,12 @@ static ssize_t write_getfd(struct file *
 	res = buf;
 	sin = (struct sockaddr_in *)&data->gd_addr;
 	exp_readlock();
-	if (!(clp = auth_unix_lookup(sin->sin_addr)))
+	/* IPv6 address mapping */
+	ipv6_addr_set(&in6, 0, 0,
+			htonl(0x),
+			(((struct sockaddr_in *)&data->gd_addr)->sin_addr.s_addr));
+
+	if (!(clp = auth_unix_lookup(&in6)))
 		err = -EPERM;
 	else {
 		err = exp_rootfh(clp, data->gd_path, &fh, NFS_FHSIZE);
diff -p -u -r -N linux-2.6.24-rc4/include/linux/sunrpc/svcauth.h linux-2.6.24-rc4-IPv6-cache-based/include/linux/sunrpc/svcauth.h
--- linux-2.6.24-rc4/include/linux/sunrpc/svcauth.h	2007-12-10 16:01:43.0 +0100
+++ linux-2.6.24-rc4-IPv6-cache-based/include/linux/sunrpc/svcauth.h	2007-12-10 17:09:34.0 +0100
@@ -120,10 +120,10 @@ extern void	svc_auth_unregister(rpc_auth
 extern struct auth_domain *unix_domain_find(char *name);
 extern void auth_domain_put(struct auth_domain *item);
-extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom);
+extern int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom);
 extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new);
 extern struct auth_domain *auth_domain_find(char *name);
-extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
+extern struct auth_domain *auth_unix_lookup(struct in6_addr *addr);
 extern int auth_unix_forget_old(struct auth_domain *dom);
 extern void svcauth_unix_purge(void);
 extern void svcauth_unix_info_release(void *);
diff -p -u -r -N linux-2.6.24-rc4/net/sunrpc/svcauth_unix.c linux-2.6.24-rc4-IPv6-cache-based/net/sunrpc/svcauth_unix.c
--- linux-2.6.24-rc4/net/sunrpc/svcauth_unix.c	2007-12-10 16:01:46.0 +0100
+++ linux-2.6.24-rc4-IPv6-cache-based/net/sunrpc/svcauth_unix.c	2007-12-10 17:38:50.0 +0100
@@ -11,7 +11,8 @@
 #include <linux/hash.h>
 #include <linux/string.h>
 #include <net/sock.h>
-
+#include <net/ipv6.h>
+#include <linux/kernel.h>
 #define RPCDBG_FACILITY	RPCDBG_AUTH
 
 
@@ -84,7 +85,7 @@ static void svcauth_unix_domain_release(
 struct ip_map {
 	struct cache_head	h;
 	char			m_class[8]; /* e.g. nfsd */
-	struct in_addr
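The patch converts each IPv4 client address into an IPv6 address before it is added to, or looked up in, the ip_map cache, apparently as an IPv4-mapped address (the constant passed to ipv6_addr_set() is garbled in this copy of the mail). A userspace sketch of that layout, ::ffff:a.b.c.d as defined in RFC 4291 section 2.5.5.2; `ipv4_to_mapped()` is a hypothetical helper, not part of the patch.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Hypothetical helper: build the IPv4-mapped IPv6 address ::ffff:a.b.c.d.
 * Bytes 0-9 are zero, bytes 10-11 are 0xff, and bytes 12-15 hold the IPv4
 * address, which is already in network byte order in struct in_addr. */
static void ipv4_to_mapped(struct in6_addr *out, struct in_addr v4)
{
	memset(out, 0, sizeof(*out));
	out->s6_addr[10] = 0xff;
	out->s6_addr[11] = 0xff;
	memcpy(&out->s6_addr[12], &v4.s_addr, 4);
}
```

Storing every client in this one form lets a single IPv6-keyed cache hold both IPv4 and, later, native IPv6 clients.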
Re: [PATCH] IPv6 support for NFS server
On Mon, Dec 10, 2007 at 07:34:41PM +0100, Aurélien Charbon wrote:
> Here is a cleanup for the ip_map caching patch in nfs server.
> It prepares for IPv6 text-based mounts and exports.
>
> Tests: tested with only IPv4 network and basic nfs ops (mount, file
> creation and modification)

Thanks!  And also tested with an unmodified rpc.mountd?

--b.

> -
>
> Signed-off-by: Aurelien Charbon <[EMAIL PROTECTED]>
>
> diff -p -u -r -N linux-2.6.24-rc4/fs/nfsd/export.c linux-2.6.24-rc4-IPv6-cache-based/fs/nfsd/export.c
> --- linux-2.6.24-rc4/fs/nfsd/export.c	2007-12-10 16:11:37.0 +0100
> +++ linux-2.6.24-rc4-IPv6-cache-based/fs/nfsd/export.c	2007-12-10 17:50:37.0 +0100
> @@ -35,6 +35,7 @@
>  #include <linux/lockd/bind.h>
>  #include <linux/sunrpc/msg_prot.h>
>  #include <linux/sunrpc/gss_api.h>
> +#include <net/ipv6.h>
> 
>  #define NFSDDBG_FACILITY	NFSDDBG_EXPORT
> 
> @@ -1556,6 +1557,7 @@ exp_addclient(struct nfsctl_client *ncp)
>  {
>  	struct auth_domain	*dom;
>  	int			i, err;
> +	struct in6_addr addr6;
> 
>  	/* First, consistency check. */
>  	err = -EINVAL;
> @@ -1574,9 +1576,12 @@ exp_addclient(struct nfsctl_client *ncp)
>  		goto out_unlock;
> 
>  	/* Insert client into hashtable. */
> -	for (i = 0; i < ncp->cl_naddr; i++)
> -		auth_unix_add_addr(ncp->cl_addrlist[i], dom);
> -
> +	for (i = 0; i < ncp->cl_naddr; i++) {
> +		/* Mapping address */
> +		ipv6_addr_set(&addr6, 0, 0,
> +			htonl(0x), ncp->cl_addrlist[i].s_addr);
> +		auth_unix_add_addr(&addr6, dom);
> +	}
>  	auth_unix_forget_old(dom);
>  	auth_domain_put(dom);
> 
> diff -p -u -r -N linux-2.6.24-rc4/fs/nfsd/nfsctl.c linux-2.6.24-rc4-IPv6-cache-based/fs/nfsd/nfsctl.c
> --- linux-2.6.24-rc4/fs/nfsd/nfsctl.c	2007-12-10 16:11:37.0 +0100
> +++ linux-2.6.24-rc4-IPv6-cache-based/fs/nfsd/nfsctl.c	2007-12-10 18:15:22.0 +0100
> @@ -37,6 +37,7 @@
>  #include <linux/nfsd/syscall.h>
> 
>  #include <asm/uaccess.h>
> +#include <net/ipv6.h>
> 
>  /*
>   *	We have a single directory with 9 nodes in it.
> @@ -222,6 +223,7 @@ static ssize_t write_getfs(struct file *
>  	struct auth_domain *clp;
>  	int err = 0;
>  	struct knfsd_fh *res;
> +	struct in6_addr in6;
> 
>  	if (size < sizeof(*data))
>  		return -EINVAL;
> @@ -236,7 +238,13 @@ static ssize_t write_getfs(struct file *
>  	res = (struct knfsd_fh*)buf;
> 
>  	exp_readlock();
> -	if (!(clp = auth_unix_lookup(sin->sin_addr)))
> +
> +	/* IPv6 address mapping */
> +	ipv6_addr_set(&in6, 0, 0,
> +			htonl(0x),
> +			(((struct sockaddr_in *)&data->gd_addr)->sin_addr.s_addr));
> +
> +	if (!(clp = auth_unix_lookup(&in6)))
>  		err = -EPERM;
>  	else {
>  		err = exp_rootfh(clp, data->gd_path, res, data->gd_maxlen);
> @@ -257,6 +265,7 @@ static ssize_t write_getfd(struct file *
>  	int err = 0;
>  	struct knfsd_fh fh;
>  	char *res;
> +	struct in6_addr in6;
> 
>  	if (size < sizeof(*data))
>  		return -EINVAL;
> @@ -271,7 +280,12 @@ static ssize_t write_getfd(struct file *
>  	res = buf;
>  	sin = (struct sockaddr_in *)&data->gd_addr;
>  	exp_readlock();
> -	if (!(clp = auth_unix_lookup(sin->sin_addr)))
> +	/* IPv6 address mapping */
> +	ipv6_addr_set(&in6, 0, 0,
> +			htonl(0x),
> +			(((struct sockaddr_in *)&data->gd_addr)->sin_addr.s_addr));
> +
> +	if (!(clp = auth_unix_lookup(&in6)))
>  		err = -EPERM;
>  	else {
>  		err = exp_rootfh(clp, data->gd_path, &fh, NFS_FHSIZE);
> diff -p -u -r -N linux-2.6.24-rc4/include/linux/sunrpc/svcauth.h linux-2.6.24-rc4-IPv6-cache-based/include/linux/sunrpc/svcauth.h
> --- linux-2.6.24-rc4/include/linux/sunrpc/svcauth.h	2007-12-10 16:01:43.0 +0100
> +++ linux-2.6.24-rc4-IPv6-cache-based/include/linux/sunrpc/svcauth.h	2007-12-10 17:09:34.0 +0100
> @@ -120,10 +120,10 @@ extern void	svc_auth_unregister(rpc_auth
>  extern struct auth_domain *unix_domain_find(char *name);
>  extern void auth_domain_put(struct auth_domain *item);
> -extern int auth_unix_add_addr(struct in_addr addr, struct auth_domain *dom);
> +extern int auth_unix_add_addr(struct in6_addr *addr, struct auth_domain *dom);
>  extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new);
>  extern struct auth_domain *auth_domain_find(char *name);
> -extern struct auth_domain *auth_unix_lookup(struct in_addr addr);
> +extern struct auth_domain *auth_unix_lookup(struct in6_addr *addr);
>  extern int auth_unix_forget_old(struct auth_domain *dom);
>  extern void svcauth_unix_purge(void);
>  extern void svcauth_unix_info_release(void *);
> diff -p -u -r -N linux-2.6.24-rc4/net/sunrpc/svcauth_unix.c linux-2.6.24-rc4-IPv6-cache-based/net/sunrpc/svcauth_unix.c
> --- linux-2.6.24-rc4/net/sunrpc/svcauth_unix.c	2007-12-10 16:01:46.0 +0100
> +++ linux-2.6.24-rc4-IPv6-cache-based/net/sunrpc/svcauth_unix.c	2007-12-10 17:38:50.0 +0100
> @@ -11,7 +11,8 @@
>  #include <linux/hash.h>
>  #include <linux/string.h>
>  #include <net/sock.h>
> -
> +#include
Re: [PATCH RFC] [3/9] modpost: Declare the modpost error functions as printf like
On Thu, Nov 22, 2007 at 03:43:08AM +0100, Andi Kleen wrote:
> This way gcc can warn for wrong format strings

This looks good.
Can I get a s-o-b, then I will apply it.

	Sam

> ---
>  scripts/mod/modpost.c |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> Index: linux/scripts/mod/modpost.c
> ===
> --- linux.orig/scripts/mod/modpost.c
> +++ linux/scripts/mod/modpost.c
> @@ -33,7 +33,9 @@ enum export {
>  	export_unused_gpl, export_gpl_future, export_unknown
>  };
> 
> -void fatal(const char *fmt, ...)
> +#define PRINTF __attribute__ ((format (printf, 1, 2)))
> +
> +PRINTF void fatal(const char *fmt, ...)
>  {
>  	va_list arglist;
> 
> @@ -46,7 +48,7 @@ void fatal(const char *fmt, ...)
>  	exit(1);
>  }
> 
> -void warn(const char *fmt, ...)
> +PRINTF void warn(const char *fmt, ...)
>  {
>  	va_list arglist;
> 
> @@ -57,7 +59,7 @@ void warn(const char *fmt, ...)
>  	va_end(arglist);
>  }
> 
> -void merror(const char *fmt, ...)
> +PRINTF void merror(const char *fmt, ...)
>  {
>  	va_list arglist;
> 
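The PRINTF macro in this patch relies on GCC's `format` function attribute: `format(printf, N, M)` marks parameter N as a printf-style format string and says the values to check start at parameter M, so `-Wformat` (enabled by `-Wall`) can flag a mismatch such as `%u` given an `unsigned long`. A minimal sketch of the same attribute on a hypothetical buffer-formatting helper, placed on a separate prototype rather than in front of the definition as the patch does; since the format is the third parameter here, the indices are (3, 4) instead of (1, 2).

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: argument 3 is the format string and the variadic
 * arguments start at position 4, so GCC checks calls like printf(). */
static int format_msg(char *buf, size_t size, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

static int format_msg(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	return n;	/* length the full output would have had */
}
```

With the attribute in place, a call like format_msg(buf, sizeof(buf), "%u", 42UL) should produce a -Wformat warning instead of silently printing the wrong value on hosts where the types differ in size.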
Re: [PATCH RFC] [4/9] modpost: Fix format string warnings
On Thu, Nov 22, 2007 at 03:43:09AM +0100, Andi Kleen wrote:
> Fix wrong format strings in modpost exposed by the previous patch.
> Including one missing argument -- some random data was printed instead.

Looks good.
Can I get a s-o-b, then I will apply it.

	Sam

> ---
>  scripts/mod/modpost.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> Index: linux/scripts/mod/modpost.c
> ===
> --- linux.orig/scripts/mod/modpost.c
> +++ linux/scripts/mod/modpost.c
> @@ -388,7 +388,7 @@ static int parse_elf(struct elf_info *in
> 
>  	/* Check if file offset is correct */
>  	if (hdr->e_shoff > info->size) {
> -		fatal("section header offset=%u in file '%s' is bigger then filesize=%lu\n", hdr->e_shoff, filename, info->size);
> +		fatal("section header offset=%lu in file '%s' is bigger then filesize=%lu\n", (unsigned long)hdr->e_shoff, filename, info->size);
>  		return 0;
>  	}
> 
> @@ -409,7 +409,7 @@ static int parse_elf(struct elf_info *in
>  		const char *secname;
> 
>  		if (sechdrs[i].sh_offset > info->size) {
> -			fatal("%s is truncated. sechdrs[i].sh_offset=%u > sizeof(*hrd)=%ul\n", filename, (unsigned int)sechdrs[i].sh_offset, sizeof(*hdr));
> +			fatal("%s is truncated. sechdrs[i].sh_offset=%lu > sizeof(*hrd)=%lu\n", filename, (unsigned long)sechdrs[i].sh_offset, sizeof(*hdr));
>  			return 0;
>  		}
>  		secname = secstrings + sechdrs[i].sh_name;
> @@ -907,7 +907,8 @@ static void warn_sec_mismatch(const char
>  		     "before '%s' (at offset -0x%llx)\n",
>  		     modname, fromsec, (unsigned long long)r.r_offset,
>  		     secname, refsymname,
> -		     elf->strtab + after->st_name);
> +		     elf->strtab + after->st_name,
> +		     (unsigned long long)r.r_offset);
>  	} else {
>  		warn("%s(%s+0x%llx): Section mismatch: reference to %s:%s\n",
>  		     modname, fromsec, (unsigned long long)r.r_offset,
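The patch makes the arguments agree with `%lu` by casting to `unsigned long`. Where a 64-bit ELF offset may be printed on a host whose `unsigned long` is 32 bits, that cast can truncate the value; an alternative (not what the patch does) is the C99 `PRIu64` macro for `uint64_t` and the `%zu` conversion for `size_t`. A minimal sketch; `describe_truncation()` is a hypothetical helper that only mimics the shape of the modpost message.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: PRIu64 expands to the correct conversion for
 * uint64_t on this host, and %zu is the C99 conversion for size_t,
 * so no cast to unsigned long (and no truncation risk) is needed. */
static int describe_truncation(char *buf, size_t n, const char *file,
			       uint64_t sh_offset, size_t hdr_size)
{
	return snprintf(buf, n, "%s is truncated. sh_offset=%" PRIu64
			" > header size=%zu", file, sh_offset, hdr_size);
}
```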
Re: [PATCH RFC] [3/9] modpost: Declare the modpost error functions as printf like
On Mon, Dec 10, 2007 at 07:50:08PM +0100, Sam Ravnborg wrote:
> On Thu, Nov 22, 2007 at 03:43:08AM +0100, Andi Kleen wrote:
>> This way gcc can warn for wrong format strings
>
> This loks good.
> Can I get i s-o-b then I will apply it.

Sorry, it must have been left out by mistake.

Signed-off-by: Andi Kleen <[EMAIL PROTECTED]>

for both patches.

-Andi
Re: [1/4] DST: Distributed storage documentation.
On Mon, 2007-12-10 at 17:50 +0300, Evgeniy Polyakov wrote:
On Mon, Dec 10, 2007 at 03:31:48PM +0100, Kay Sievers ([EMAIL PROTECTED]) wrote:

I meant that for each new device, it will be placed into /sys/devices/its_name, but it can also be accessed via /sys/bus/dst/devices/

Still, it looks like a path. :)

Please don't reference any device directly with a /sys/devices/ path. You have to use the subsystem links to the devices in /sys/bus/dst/devices/. Devices are free to move around in /sys/devices, even during runtime. Yours don't, but anyway, please remove all mentions of direct access to /sys/devices/.

Ok, I will update the documentation to reference /sys/bus/dst/devices instead of /sys/devices.

Great, thanks!

Btw, where is the top-level /sys/devices/storage/ coming from? I don't see that in the code. We don't accept any new virtual parents here. Your devices will automatically appear in /sys/devices/virtual/dst/, and not below your own parent. But that path does not matter anyway, because you should only access them from the /sys/bus/dst/devices/ directory. And in general, please don't claim generic names like "storage" in any namespace for a very specific subsystem like this.

It is not a parent - it is an example of a device called 'storage'. If it were called 'testing', then the path would be /sys/devices/testing, or more correctly /sys/bus/dst/devices/testing :)

Ah, I see. It is the 'dst' bus.

uganda:~/codes# ls -la /sys/devices/staorge/
total 0
drwxr-xr-x 4 root root    0 2007-12-10 11:46 .
drwxr-xr-x 9 root root    0 2007-12-10 11:46 ..
-r--r--r-- 1 root root 4096 2007-12-10 11:46 alg
lrwxrwxrwx 1 root root    0 2007-12-10 11:46 bus -> ../../bus/dst
drwxr-xr-x 3 root root    0 2007-12-10 11:46 n-0-81003e24117
-r--r--r-- 1 root root 4096 2007-12-10 11:46 name
-r--r--r-- 1 root root 4096 2007-12-10 11:46 nodes
drwxr-xr-x 2 root root    0 2007-12-10 11:46 power
-rw-r--r-- 1 root root 4096 2007-12-10 11:46 remove_all_nodes
lrwxrwxrwx 1 root root    0 2007-12-10 11:46 subsystem -> ../../bus/dst
-rw-r--r-- 1 root root 4096 2007-12-10 11:46 uevent

Ok, how does "ls -l /sys/devices/storage/n-0-81003e24117" look?

uganda:~/codes# ls -l /sys/devices/storage/n-0-81003ebc220/
total 0
drwxr-xr-x 2 root root    0 2007-12-10 13:23 power
-r--r--r-- 1 root root 4096 2007-12-10 13:30 size
-r--r--r-- 1 root root 4096 2007-12-10 13:30 start
-r--r--r-- 1 root root 4096 2007-12-10 13:30 type
-rw-r--r-- 1 root root 4096 2007-12-10 13:30 uevent

This is a struct device instance without a subsystem (bus/class), right? It will not send an uevent to userspace. Is that intended? Why don't you add them all to the dst bus?

uganda:~/codes# ls -l /sys/bus/dst/
total 0
drwxr-xr-x 2 root root    0 2007-12-10 09:52 devices
drwxr-xr-x 2 root root    0 2007-12-10 09:52 drivers
-rw-r--r-- 1 root root 4096 2007-12-10 11:46 drivers_autoprobe
--w--- 1 root root 4096 2007-12-10 11:46 drivers_probe

How does "ls -l /sys/bus/dst/devices" look?

uganda:~/codes# ls -la /sys/bus/dst/devices/
total 0
drwxr-xr-x 2 root root 0 2007-12-10 13:30 .
drwxr-xr-x 4 root root 0 2007-12-10 13:22 ..
lrwxrwxrwx 1 root root 0 2007-12-10 13:30 storage -> ../../../devices/storage

Here 'storage' is just the name of a device called 'storage'; it can be anything else.

Fine.

Further questions: Why do you do your own refcounting instead of using kref?

That's because I have always used atomic operations as reference counters and have not tried krefs :) They are actually the same (modulo tricky arches where smp_mb_* are required), so I can replace them in the next release.
On Mon, 2007-12-10 at 18:12 +0300, Evgeniy Polyakov wrote: Actually not: I have to set the reference counter to something other than 1 or +/- 1, and thus would have to call kref_get() in a loop, which is a very ugly step. Is there kref_set() or something like that? At least not in 2.6.22, which is what I'm using for now. Yeah, a loop would look pretty ugly. How about just adding kref_set(), if you need it? Why don't you use groups for the attributes? For 3-4 attributes it is faster to register them in a loop than to type another structure :) Yeah, but if you need to recover from an error when the creation of a file fails, a group would do the proper rollback. Why don't you use default attributes for the device, where you get all error handling done by the core? What are 'default attributes', and for what devices? All my sysfs files are so trivial that they do not need anything special, and I do not see what error handling you mean. If all devices of a subsystem (bus/class) are of the same type, you can set a default array of attributes in the struct bus/class to be created for every device. If you have multiple types of devices in the same subsystem (bus/class), you can assign a device_type, which has the
Re: [PATCH RFC] [5/9] modpost: Fix a buffer overflow in modpost
On Thu, Nov 22, 2007 at 03:43:10AM +0100, Andi Kleen wrote: When passing a file name >1k the stack could be overflowed. Not really a security issue, but still better plugged. Looks good. A s-o-b line again please. Although I am not so happy with the use of gcc extensions.
Sam
---
 scripts/mod/modpost.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux/scripts/mod/modpost.c
===================================================================
--- linux.orig/scripts/mod/modpost.c
+++ linux/scripts/mod/modpost.c
@@ -1656,7 +1656,6 @@ int main(int argc, char **argv)
 {
 	struct module *mod;
 	struct buffer buf = { };
-	char fname[SZ];
 	char *kernel_read = NULL, *module_read = NULL;
 	char *dump_write = NULL;
 	int opt;
@@ -1709,6 +1708,8 @@ int main(int argc, char **argv)
 	err = 0;
 	for (mod = modules; mod; mod = mod->next) {
+		char fname[strlen(mod->name) + 10];
+
 		if (mod->skip)
 			continue;
[PATCH] drivers/net/ipg: Remove local definition of TRUE/FALSE
Remove local definition of TRUE/FALSE.

Signed-off-by: Richard Knutsson [EMAIL PROTECTED]
---
diff --git a/drivers/net/ipg.h b/drivers/net/ipg.h
index d5d092c..4484778 100644
--- a/drivers/net/ipg.h
+++ b/drivers/net/ipg.h
@@ -490,38 +490,34 @@ enum ipg_regs {
  * Tune
  */
 
-/* Miscellaneous Constants. */
-#define	TRUE	1
-#define	FALSE	0
-
 /* Assign IPG_APPEND_FCS_ON_TX > 0 for auto FCS append on TX. */
-#define	IPG_APPEND_FCS_ON_TX		TRUE
+#define	IPG_APPEND_FCS_ON_TX		true
 
 /* Assign IPG_APPEND_FCS_ON_TX > 0 for auto FCS strip on RX. */
-#define	IPG_STRIP_FCS_ON_RX		TRUE
+#define	IPG_STRIP_FCS_ON_RX		true
 
 /* Assign IPG_DROP_ON_RX_ETH_ERRORS > 0 to drop RX frames with
  * Ethernet errors.
  */
-#define	IPG_DROP_ON_RX_ETH_ERRORS	TRUE
+#define	IPG_DROP_ON_RX_ETH_ERRORS	true
 
 /* Assign IPG_INSERT_MANUAL_VLAN_TAG > 0 to insert VLAN tags manually
  * (via TFC).
  */
-#define	IPG_INSERT_MANUAL_VLAN_TAG	FALSE
+#define	IPG_INSERT_MANUAL_VLAN_TAG	false
 
 /* Assign IPG_ADD_IPCHECKSUM_ON_TX > 0 for auto IP checksum on TX. */
-#define	IPG_ADD_IPCHECKSUM_ON_TX	FALSE
+#define	IPG_ADD_IPCHECKSUM_ON_TX	false
 
 /* Assign IPG_ADD_TCPCHECKSUM_ON_TX > 0 for auto TCP checksum on TX.
  * DO NOT USE FOR SILICON REVISIONS B3 AND EARLIER.
  */
-#define	IPG_ADD_TCPCHECKSUM_ON_TX	FALSE
+#define	IPG_ADD_TCPCHECKSUM_ON_TX	false
 
 /* Assign IPG_ADD_UDPCHECKSUM_ON_TX > 0 for auto UDP checksum on TX.
  * DO NOT USE FOR SILICON REVISIONS B3 AND EARLIER.
  */
-#define	IPG_ADD_UDPCHECKSUM_ON_TX	FALSE
+#define	IPG_ADD_UDPCHECKSUM_ON_TX	false
 
 /* If inserting VLAN tags manually, assign the IPG_MANUAL_VLAN_xx
  * constants as desired.
Re: [1/4] DST: Distributed storage documentation.
On Mon, Dec 10, 2007 at 08:02:28PM +0100, Kay Sievers ([EMAIL PROTECTED]) wrote:
uganda:~/codes# ls -l /sys/devices/storage/n-0-81003ebc220/
total 0
drwxr-xr-x 2 root root    0 2007-12-10 13:23 power
-r--r--r-- 1 root root 4096 2007-12-10 13:30 size
-r--r--r-- 1 root root 4096 2007-12-10 13:30 start
-r--r--r-- 1 root root 4096 2007-12-10 13:30 type
-rw-r--r-- 1 root root 4096 2007-12-10 13:30 uevent
This is a struct device instance without a subsystem (bus/class), right? It will not send a uevent to userspace. Is that intended? Why don't you add them all to the dst bus? I created the dst bus for storage devices only. Nodes are very different objects, and they do not actually need any events from above, but I need to put some attributes somewhere, so it is an 'empty' device. Actually not: I have to set the reference counter to something other than 1 or +/- 1, and thus would have to call kref_get() in a loop, which is a very ugly step. Is there kref_set() or something like that? At least not in 2.6.22, which is what I'm using for now. Yeah, a loop would look pretty ugly. How about just adding kref_set(), if you need it? Well, then distributed storage will not be able to build as a standalone module, and kref_set() itself will not be accepted as a single patch, since there are no in-kernel users :) It is easily doable though. Why don't you use groups for the attributes? For 3-4 attributes it is faster to register them in a loop than to type another structure :) Yeah, but if you need to recover from an error when the creation of a file fails, a group would do the proper rollback. I do not care about such errors: if there is such an error for a file which exports information about the type of the node (i.e. the string L or R) or some other very meaningful info, then the system has more important things to worry about, so dst does not do anything special; it ignores such errors :) On the exit path the files will be checked and removed correctly.
If there are additional sysfs files, I think a group is a good way to implement them. Why don't you use default attributes for the device, where you get all error handling done by the core? What are 'default attributes', and for what devices? All my sysfs files are so trivial that they do not need anything special, and I do not see what error handling you mean. If all devices of a subsystem (bus/class) are of the same type, you can set a default array of attributes in the struct bus/class to be created for every device. If you have multiple types of devices in the same subsystem (bus/class), you can assign a device_type, which has the default attribute group. That way the core will create the files before the event is sent out to userspace, and the files can be accessed from the event itself. Not sure if that is needed for dst. Ok, I see. DST right now has 3 types of files: storage files, which are common to every storage device; node files, which are the same for every node; and per-algorithm private devices, which can differ (actually only the mirroring algorithm exports something to userspace). I think it is possible to use default attributes for storage devices, but node devices do not have a bus/class, so they will be untouched. Thanks, Kay
-- 
Evgeniy Polyakov
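Kay's argument for groups is that attribute creation becomes all-or-nothing: if one file fails, the ones already created are removed again. The sketch below is a userspace model of that rollback behaviour only, not the sysfs API. struct fake_dir, create_file(), remove_file(), create_group() and count_present() are hypothetical names; in the kernel the equivalent would be a struct attribute_group handed to sysfs_create_group().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: "files" are flags in an array. */
#define MAX_ATTRS 8

struct fake_dir {
	bool present[MAX_ATTRS];   /* which attribute files currently exist */
	int fail_at;               /* index whose creation fails, -1 for none */
};

/* Hypothetical per-file create: fails for the configured index. */
static int create_file(struct fake_dir *d, int idx)
{
	if (idx == d->fail_at)
		return -1;             /* simulated error, e.g. -ENOMEM */
	d->present[idx] = true;
	return 0;
}

static void remove_file(struct fake_dir *d, int idx)
{
	d->present[idx] = false;
}

/* Create attributes 0..n-1; on failure remove the ones already created,
 * so the caller sees either all files or none. */
static int create_group(struct fake_dir *d, int n)
{
	int i, err;

	assert(n <= MAX_ATTRS);
	for (i = 0; i < n; i++) {
		err = create_file(d, i);
		if (err) {
			while (--i >= 0)
				remove_file(d, i);
			return err;
		}
	}
	return 0;
}

static int count_present(const struct fake_dir *d)
{
	int i, n = 0;

	for (i = 0; i < MAX_ATTRS; i++)
		n += d->present[i];
	return n;
}
```

With fail_at set to 2, create_group(&d, 3) creates files 0 and 1, fails on 2, and leaves count_present() at 0; the hand-written loop Evgeniy describes would instead leave two of the three files behind.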
[PATCH] iproute2: off by one in nested attribute parse
Fix off-by-one in nested attribute management. Fixes segv in:
	tc qdisc show dev eth1
due to uninitialized attribute table.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 lib/libnetlink.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 12883fe..d13596f 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -632,6 +632,6 @@ int __parse_rtattr_nested_compat(struct rtattr *tb[], int max, struct rtattr *rt
 		rta = RTA_DATA(rta) + RTA_ALIGN(len);
 		return parse_rtattr_nested(tb, max, rta);
 	}
-	memset(tb, 0, sizeof(struct rtattr *) * max);
+	memset(tb, 0, sizeof(struct rtattr *) * (max + 1));
 	return 0;
 }
-- 
1.5.3.4
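The tb[] tables filled by the rtattr parsers are indexed by attribute type from 0 to max, so they hold max + 1 pointers (they are typically declared as tb[FOO_MAX + 1]). A small userspace demonstration of why clearing only max entries leaves the last slot stale; clear_table_buggy() and clear_table_fixed() are made-up helpers that contain just the two memset() variants, and struct rtattr is left opaque because only pointers are stored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct rtattr;  /* opaque here; the table only stores pointers */

/* tb[] is indexed by attribute type 0..max, i.e. it has max + 1 slots. */
static void clear_table_buggy(struct rtattr *tb[], int max)
{
	memset(tb, 0, sizeof(struct rtattr *) * max);        /* misses tb[max] */
}

static void clear_table_fixed(struct rtattr *tb[], int max)
{
	memset(tb, 0, sizeof(struct rtattr *) * (max + 1));  /* covers tb[max] */
}
```

After clear_table_buggy(tb, 3) on a four-slot table, tb[3] still holds whatever was there before, which is how a later lookup of the highest attribute type can dereference garbage.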
Re: [1/4] DST: Distributed storage documentation.
On Mon, 2007-12-10 at 22:33 +0300, Evgeniy Polyakov wrote: On Mon, Dec 10, 2007 at 08:02:28PM +0100, Kay Sievers ([EMAIL PROTECTED]) wrote:
uganda:~/codes# ls -l /sys/devices/storage/n-0-81003ebc220/
total 0
drwxr-xr-x 2 root root    0 2007-12-10 13:23 power
-r--r--r-- 1 root root 4096 2007-12-10 13:30 size
-r--r--r-- 1 root root 4096 2007-12-10 13:30 start
-r--r--r-- 1 root root 4096 2007-12-10 13:30 type
-rw-r--r-- 1 root root 4096 2007-12-10 13:30 uevent
This is a struct device instance without a subsystem (bus/class), right? It will not send a uevent to userspace. Is that intended? Why don't you add them all to the dst bus? I created the dst bus for storage devices only. Nodes are very different objects, and they do not actually need any events from above, but I need to put some attributes somewhere, so it is an 'empty' device. Ok. Actually not: I have to set the reference counter to something other than 1 or +/- 1, and thus would have to call kref_get() in a loop, which is a very ugly step. Is there kref_set() or something like that? At least not in 2.6.22, which is what I'm using for now. Yeah, a loop would look pretty ugly. How about just adding kref_set(), if you need it? Well, then distributed storage will not be able to build as a standalone module, and kref_set() itself will not be accepted as a single patch, since there are no in-kernel users :) It is easily doable though. Most rules have exceptions. :) Send a patch, so we can see what it looks like. Why don't you use groups for the attributes? For 3-4 attributes it is faster to register them in a loop than to type another structure :) Yeah, but if you need to recover from an error when the creation of a file fails, a group would do the proper rollback. I do not care about such errors: if there is such an error for a file which exports information about the type of the node (i.e. the string L or R) or some other very meaningful info, then the system has more important things to worry about, so dst does not do anything special; it ignores such errors :) On the exit path the files will be checked and removed correctly. If there are additional sysfs files, I think a group is a good way to implement them. Why don't you use default attributes for the device, where you get all error handling done by the core? What are 'default attributes', and for what devices? 
All my sysfs files are so trivial that they do not need anything special, and I do not see what error handling you mean. If all devices of a subsystem (bus/class) are of the same type, you can set a default array of attributes in the struct bus/class to be created for every device. If you have multiple types of devices in the same subsystem (bus/class), you can assign a device_type, which has the default attribute group. That way the core will create the files before the event is sent out to userspace, and the files can be accessed from the event itself. Not sure if that is needed for dst. Ok, I see. DST right now has 3 types of files: storage files, which are common to every storage device; node files, which are the same for every node; and per-algorithm private devices, which can differ (actually only the mirroring algorithm exports something to userspace). I think it is possible to use default attributes for storage devices, but node devices do not have a bus/class, so they will be untouched. Sounds fine. Thanks, Kay
Re: [1/4] DST: Distributed storage documentation.
On Mon, Dec 10, 2007 at 08:44:55PM +0100, Kay Sievers ([EMAIL PROTECTED]) wrote: Actually not: I have to set the reference counter to something other than 1 or +/- 1, and thus would have to call kref_get() in a loop, which is a very ugly step. Is there kref_set() or something like that? At least not in 2.6.22, which is what I'm using for now. Yeah, a loop would look pretty ugly. How about just adding kref_set(), if you need it? Well, then distributed storage will not be able to build as a standalone module, and kref_set() itself will not be accepted as a single patch, since there are no in-kernel users :) It is easily doable though. Most rules have exceptions. :) Send a patch, so we can see what it looks like. It looks really non-trivial :)

Signed-off-by: Evgeniy Polyakov [EMAIL PROTECTED]

diff --git a/include/linux/kref.h b/include/linux/kref.h
index 6fee353..5d18563 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -24,6 +24,7 @@ struct kref {
 	atomic_t refcount;
 };
 
+void kref_set(struct kref *kref, int num);
 void kref_init(struct kref *kref);
 void kref_get(struct kref *kref);
 int kref_put(struct kref *kref, void (*release) (struct kref *kref));
diff --git a/lib/kref.c b/lib/kref.c
index a6dc3ec..40aa9f9 100644
--- a/lib/kref.c
+++ b/lib/kref.c
@@ -15,13 +15,23 @@
 #include <linux/module.h>
 
 /**
+ * kref_set - initialize object and set refcount to requested number.
+ * @kref: object in question.
+ * @num: initial reference counter
+ */
+void kref_set(struct kref *kref, int num)
+{
+	atomic_set(&kref->refcount, num);
+	smp_mb();
+}
+
+/**
  * kref_init - initialize object.
  * @kref: object in question.
  */
 void kref_init(struct kref *kref)
 {
-	atomic_set(&kref->refcount,1);
-	smp_mb();
+	kref_set(kref, 1);
 }
 
 /**
-- 
Evgeniy Polyakov
Re: [PATCH] drivers/net/ipg: Remove local definition of TRUE/FALSE
Hi Richard, On Dec 10, 2007 9:29 PM, Richard Knutsson [EMAIL PROTECTED] wrote: Remove local definition of TRUE/FALSE. This is already fixed in Francois' tree: http://git.kernel.org/?p=linux/kernel/git/romieu/netdev-2.6.git;a=commitdiff;h=2af61e99e3d1c959840ea007ff56b15db794fb99 Pekka
Re: [1/4] DST: Distributed storage documentation.
On Mon, 2007-12-10 at 22:51 +0300, Evgeniy Polyakov wrote: On Mon, Dec 10, 2007 at 08:44:55PM +0100, Kay Sievers ([EMAIL PROTECTED]) wrote: Actually not: I have to set the reference counter to something other than 1 or +/- 1, and thus would have to call kref_get() in a loop, which is a very ugly step. Is there kref_set() or something like that? At least not in 2.6.22, which is what I'm using for now. Yeah, a loop would look pretty ugly. How about just adding kref_set(), if you need it? Well, then distributed storage will not be able to build as a standalone module, and kref_set() itself will not be accepted as a single patch, since there are no in-kernel users :) It is easily doable though. Most rules have exceptions. :) Send a patch, so we can see what it looks like. It looks really non-trivial :) Yeah, it does. :) We miss an EXPORT_SYMBOL(), right? Kay
Re: [PATCH RFC] [5/9] modpost: Fix a buffer overflow in modpost
On Monday 10 December 2007 20:32, Sam Ravnborg wrote: On Thu, Nov 22, 2007 at 03:43:10AM +0100, Andi Kleen wrote: When passing a file name >1k the stack could be overflowed. Not really a security issue, but still better plugged. Looks good. A s-o-b line again please. Signed-off-by: Andi Kleen [EMAIL PROTECTED] Although I am not so happy with the use of gcc extensions. That's not a gcc extension. It's C99. -Andi
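Andi's point is that a variable-length array such as char fname[strlen(mod->name) + 10] is standard C99, not a gcc extension. A minimal sketch of the same sizing, assuming, as in modpost, that the buffer receives "<name>.mod.c"; mod_c_name_len() is a hypothetical helper that formats the name and returns its length.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Size the buffer from the module name (C99 VLA), as the patch does,
 * instead of using a fixed-size array that a long name can overflow. */
static size_t mod_c_name_len(const char *name)
{
	char fname[strlen(name) + 10];   /* ".mod.c" + NUL needs only 7 extra */
	int n = snprintf(fname, sizeof(fname), "%s.mod.c", name);

	/* sizeof on a VLA is evaluated at run time; output never truncates */
	assert(n >= 0 && (size_t)n < sizeof(fname));
	return (size_t)n;
}
```

A 2000-character name simply gets a 2010-byte buffer here, where the old fixed fname[SZ] would have overflowed. VLAs do require compiling as C99 or later.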
Re: [1/4] DST: Distributed storage documentation.
On Mon, Dec 10, 2007 at 08:56:49PM +0100, Kay Sievers ([EMAIL PROTECTED]) wrote: On Mon, 2007-12-10 at 22:51 +0300, Evgeniy Polyakov wrote: On Mon, Dec 10, 2007 at 08:44:55PM +0100, Kay Sievers ([EMAIL PROTECTED]) wrote: Actually not: I have to set the reference counter to something other than 1 or +/- 1, and thus would have to call kref_get() in a loop, which is a very ugly step. Is there kref_set() or something like that? At least not in 2.6.22, which is what I'm using for now. Yeah, a loop would look pretty ugly. How about just adding kref_set(), if you need it? Well, then distributed storage will not be able to build as a standalone module, and kref_set() itself will not be accepted as a single patch, since there are no in-kernel users :) It is easily doable though. Most rules have exceptions. :) Send a patch, so we can see what it looks like. It looks really non-trivial :) Yeah, it does. :) We miss an EXPORT_SYMBOL(), right? Yep :)

diff --git a/include/linux/kref.h b/include/linux/kref.h
index 6fee353..5d18563 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -24,6 +24,7 @@ struct kref {
 	atomic_t refcount;
 };
 
+void kref_set(struct kref *kref, int num);
 void kref_init(struct kref *kref);
 void kref_get(struct kref *kref);
 int kref_put(struct kref *kref, void (*release) (struct kref *kref));
diff --git a/lib/kref.c b/lib/kref.c
index a6dc3ec..9ecd6e8 100644
--- a/lib/kref.c
+++ b/lib/kref.c
@@ -15,13 +15,23 @@
 #include <linux/module.h>
 
 /**
+ * kref_set - initialize object and set refcount to requested number.
+ * @kref: object in question.
+ * @num: initial reference counter
+ */
+void kref_set(struct kref *kref, int num)
+{
+	atomic_set(&kref->refcount, num);
+	smp_mb();
+}
+
+/**
  * kref_init - initialize object.
  * @kref: object in question.
  */
 void kref_init(struct kref *kref)
 {
-	atomic_set(&kref->refcount,1);
-	smp_mb();
+	kref_set(kref, 1);
 }
 
 /**
@@ -61,6 +71,7 @@ int kref_put(struct kref *kref, void (*release)(struct kref *kref))
 	return 0;
 }
 
+EXPORT_SYMBOL(kref_set);
 EXPORT_SYMBOL(kref_init);
 EXPORT_SYMBOL(kref_get);
 EXPORT_SYMBOL(kref_put);
-- 
Evgeniy Polyakov
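For readers comparing kref_set() with calling kref_get() in a loop, here is a userspace model built on C11 <stdatomic.h>. It is not the kernel implementation: the sequentially consistent atomic_store() only approximates atomic_set() followed by smp_mb(), and kref_model_*, count_release() and released are names made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace model of struct kref; not kernel code. */
struct kref_model {
	atomic_int refcount;
};

/* Set the counter to an arbitrary starting value in one step.
 * The seq_cst store approximates atomic_set() + smp_mb(). */
static void kref_model_set(struct kref_model *k, int num)
{
	atomic_store(&k->refcount, num);
}

static void kref_model_init(struct kref_model *k)
{
	kref_model_set(k, 1);
}

static void kref_model_get(struct kref_model *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* Drop one reference; call release() and return 1 on the last put. */
static int kref_model_put(struct kref_model *k,
			  void (*release)(struct kref_model *k))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

/* Example release callback that just counts invocations. */
static int released;

static void count_release(struct kref_model *k)
{
	(void)k;
	released++;
}
```

Starting at 3 via kref_model_set() reaches the same state as kref_model_init() followed by two kref_model_get() calls, and the third kref_model_put() is the one that triggers the release callback.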
Re: 2.6.24-rc4-mm1
On Mon, 10 Dec 2007, Ilpo Järvinen wrote: Dave, please include this one in net-2.6.25. ... -- [PATCH] [TCP]: Fix fack_count miscountings (multiple places) I have a better version of this coming up, so Dave, please don't put this one into net-2.6.25 (I noticed that both the original and the patched code can get into an infinite loop, and the new code is still flawed in some rare cases as well). I'll submit a better version soon. -- i.
Re: [PATCH RFC] [5/9] modpost: Fix a buffer overflow in modpost
On Mon, Dec 10, 2007 at 08:57:28PM +0100, Andi Kleen wrote: On Monday 10 December 2007 20:32, Sam Ravnborg wrote: On Thu, Nov 22, 2007 at 03:43:10AM +0100, Andi Kleen wrote: When passing a file name >1k the stack could be overflowed. Not really a security issue, but still better plugged. Looks good. A s-o-b line again please. Signed-off-by: Andi Kleen [EMAIL PROTECTED] Although I am not so happy with the use of gcc extensions. That's not a gcc extension. It's C99. OK. I have applied all three patches to kbuild.git. As I did not follow the whole thread about the namespace, I did not take those. And the first patch touching module.c should go in via akpm, I think. It is outside my core-competence area at least. Sam
Re: [PATCH] drivers/net/ipg: Remove local definition of TRUE/FALSE
Pekka Enberg wrote: Hi Richard, On Dec 10, 2007 9:29 PM, Richard Knutsson [EMAIL PROTECTED] wrote: Remove local definition of TRUE/FALSE. This is already fixed in Francois' tree: http://git.kernel.org/?p=linux/kernel/git/romieu/netdev-2.6.git;a=commitdiff;h=2af61e99e3d1c959840ea007ff56b15db794fb99 I see, thanks. Richard Knutsson
Re: [PATCH 8/8] [PATCH v2] [CCID3]: Interface CCID3 code with newer Loss Intervals Database
On Sat, Dec 08, 2007 at 10:06:28AM +0000, Gerrit Renker wrote: This hooks up the TFRC Loss Interval database with CCID 3 packet reception. In addition, it makes the CCID-specific computation of the first loss interval (which requires access to all the guts of CCID3) local to ccid3.c. The patch also fixes an omission in the DCCP code, that of a default / fallback RTT value (defined in section 3.4 of RFC 4340 as 0.2 sec); while at it, the upper bound of 4 seconds for an RTT sample has been reduced to match the initial TCP RTO value of 3 seconds from [RFC 1122, 4.2.3.1]. Signed-off-by: Gerrit Renker [EMAIL PROTECTED] Signed-off-by: Ian McDonald [EMAIL PROTECTED] When interfacing, we must make sure that the ccid3 tfrc_lh_slab is created; then tfrc_li_cachep is not needed. I'm doing this while keeping the structure of the patches, i.e. one introducing, the other removing. But we need to create tfrc_lh_slab if we want the tree to be bisectable. I'm doing this and keeping your Signed-off-by line; please holler if you disagree for some reason. - Arnaldo
Re: 2.6.24-rc4-mm1
On Tue, 11 Dec 2007 01:48:39 +1100 Reuben Farrelly [EMAIL PROTECTED] wrote: On 5/12/2007 4:17 PM, Andrew Morton wrote: Temporarily at http://userweb.kernel.org/~akpm/2.6.24-rc4-mm1/ Will appear later at ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.24-rc4/2.6.24-rc4-mm1/ - Lots of device IDs have been removed from the e1000 driver and moved over to e1000e. So if your e1000 stops working, you forgot to set CONFIG_E1000E. - The s390 build is still broken. I'm seeing this most incredibly unhelpful (to debug) but fortunately reproducible problem (so far 4/4 times) on this -mm kernel. I thought this problem may have been related to another bug which I have reported (a TCP oops), but even after applying a likely fix for that I am still seeing this problem. The machine boots up perfectly fine and runs well until I load it up. In this case I can reliably cause this to occur by pulling a 3G ISO across the GigE network from my Linux box to my PC. After maybe 50M or so, the console just displays this (ignore initial boot banner):
--
 * Starting local ...   [ ok ]
This is tornado.reub.net (Linux x86_64 2.6.24-rc4-mm1) 00:24:01
tornado login: *** buffer overf
---
Yes, after displaying the 'f' in what I can only guess is the word 'overflow', the box spontaneously reboots. There is no further console output until it starts to come back up again. The problem does not exist in 2.6.23-gentoo kernels nor in a vanilla 2.6.24-rc4-git6 (phew!), so this looks to be an -mm only problem at this stage. I enabled a number of kernel debugging options, but then I got no output at all when the machine crashed. I'm at a bit of a loss as to which subsystem this might be coming from, so I'm not sure who to CC. Box information is (still) up at http://www.reub.net/files/kernel/2.6.24-rc4-mm1/ hm. Grepping around for "buffer overflow" doesn't turn up anything except in drivers which you won't be using on that machine. I'd be suspecting networking, obviously.
If you're feeling keen, could you please grab a 2.6.24-rc4 tree, apply 2.6.24-rc4-mm1's origin.patch and git-net.patch, and see if the bug is still present?
[PATCH net-2.6.25] qdisc: new rate limiter (v2)
This is a time-based rate limiter for use in network testing. When doing network tests, it is often useful to test at reduced bandwidths. The existing Token Bucket Filter provides rate control, but it produces bursty traffic whose performance can differ from the real world. Another alternative is PSPacer, but it depends on pause frames, which may also cause issues. This qdisc depends on high-resolution timers and clocks, so it will probably use more CPU than others, making it a poor choice for traffic shaping for QoS.

Signed-off-by: Stephen Hemminger [EMAIL PROTECTED]
---
 include/linux/pkt_sched.h |   15 +
 net/sched/Kconfig         |   13 +
 net/sched/Makefile        |    1 
 net/sched/sch_rlim.c      |  354 ++
 4 files changed, 383 insertions(+)

--- a/include/linux/pkt_sched.h	2007-12-10 13:08:36.0 -0800
+++ b/include/linux/pkt_sched.h	2007-12-10 13:08:39.0 -0800
@@ -475,4 +475,19 @@ struct tc_netem_corrupt
 
 #define NETEM_DIST_SCALE	8192
 
+enum
+{
+	TCA_RLIM_UNSPEC,
+	TCA_RLIM_PARMS,
+	__TCA_RLIM_MAX,
+};
+#define TCA_RLIM_MAX (__TCA_RLIM_MAX - 1)
+
+struct tc_rlim_qopt
+{
+	__u64	rate;		/* bits per sec */
+	__u32	overhead;	/* crc overhead */
+	__u32	limit;		/* fifo limit (packets) */
+};
+
 #endif
--- a/net/sched/Kconfig	2007-12-10 13:08:36.0 -0800
+++ b/net/sched/Kconfig	2007-12-10 13:08:39.0 -0800
@@ -196,6 +196,19 @@ config NET_SCH_NETEM
 
 	  If unsure, say N.
 
+config NET_SCH_RLIM
+	tristate "Network Rate Limiter"
+	---help---
+	  Say Y here if you want to use timer based network rate limiter
+	  algorithm.
+
+	  See the top of <file:net/sched/sch_rlim.c> for more details.
+
+	  To compile this code as a module, choose M here: the
+	  module will be called sch_rlim.
+
+	  If unsure, say N.
+
 config NET_SCH_INGRESS
 	tristate "Ingress Qdisc"
 	---help---
--- a/net/sched/Makefile	2007-12-10 13:08:36.0 -0800
+++ b/net/sched/Makefile	2007-12-10 13:08:39.0 -0800
@@ -28,6 +28,7 @@ obj-$(CONFIG_NET_SCH_TEQL)	+= sch_teql.o
 obj-$(CONFIG_NET_SCH_PRIO)	+= sch_prio.o
 obj-$(CONFIG_NET_SCH_ATM)	+= sch_atm.o
 obj-$(CONFIG_NET_SCH_NETEM)	+= sch_netem.o
+obj-$(CONFIG_NET_SCH_RLIM)	+= sch_rlim.o
 obj-$(CONFIG_NET_CLS_U32)	+= cls_u32.o
 obj-$(CONFIG_NET_CLS_ROUTE4)	+= cls_route.o
 obj-$(CONFIG_NET_CLS_FW)	+= cls_fw.o
--- /dev/null	1970-01-01 00:00:00.0 +0000
+++ b/net/sched/sch_rlim.c	2007-12-10 13:26:39.0 -0800
@@ -0,0 +1,353 @@
+/*
+ * net/sched/sch_rlim.c	Timer based rate control
+ *
+ * Copyright (c) 2007 Stephen Hemminger [EMAIL PROTECTED]
+ *
+ */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <net/netlink.h>
+#include <net/pkt_sched.h>
+#include <asm/div64.h>
+
+/* Simple Rate control
+
+   Algorithm used in NISTnet and others.
+   Logically similar to Token Bucket, but more real time and less lumpy.
+
+   A packet is not allowed to be dequeued until after the deadline.
+   Each packet dequeued increases the deadline by rate * size.
+
+   If qdisc throttles, it starts a timer, which will wake it up
+   when it is ready to transmit. This scheduler works much better
+   if high resolution timers are available.
+
+   Like classful TBF, limit is just kept for backwards compatibility.
+   It is passed to the default pfifo qdisc - if the inner qdisc is
+   changed the limit is not effective anymore.
+
+*/
+
+/* Use scaled math to get 1/64 ns resolution */
+#define NSEC_SCALE 6
+
+struct rlim_sched_data {
+	ktime_t	next_send;	/* next scheduled departure */
+	u64	cost;		/* nsec/byte * 64 */
+	u32	overhead;	/* crc/preamble bytes */
+	u32	limit;		/* upper bound on fifo (packets) */
+
+	struct Qdisc *qdisc;	/* Inner qdisc, default - bfifo queue */
+	struct qdisc_watchdog watchdog;
+};
+
+static int rlim_enqueue(struct sk_buff *skb, struct Qdisc *sch)
+{
+	struct rlim_sched_data *q = qdisc_priv(sch);
+	int ret;
+
+	ret = q->qdisc->enqueue(skb, q->qdisc);
+	if (ret)
+		sch->qstats.drops++;
+	else {
+		sch->q.qlen++;
+		sch->bstats.bytes += skb->len;
+		sch->bstats.packets++;
+	}
+
+	return ret;
+}
+
+
+static u64 pkt_time(const struct rlim_sched_data *q,
+		    const struct sk_buff *skb)
+{
+	return (q->cost * (skb->len + q->overhead)) >> NSEC_SCALE;
+}
+
+static unsigned int rlim_drop(struct Qdisc *sch)
+{
+	struct rlim_sched_data *q = qdisc_priv(sch);
+	unsigned int len = 0;
+
+	if (q->qdisc->ops->drop && (len = q->qdisc->ops->drop(q->qdisc)) != 0) {
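The scaled arithmetic in pkt_time() can be checked in userspace. This sketch reuses NSEC_SCALE and the pkt_time() formula from the patch. How the qdisc derives cost from the configured rate is not visible in the excerpt, so rlim_cost() is an assumption (nanoseconds per byte for a rate in bits per second, scaled by 64), and rlim_dequeue_at() is one plausible reading of the "deadline" rule in the header comment, not the patch's dequeue code.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL
#define NSEC_SCALE   6   /* same 1/64 ns scaling as the patch */

/* Assumed derivation: ns per byte at rate_bps, scaled by 2^NSEC_SCALE. */
static uint64_t rlim_cost(uint64_t rate_bps)
{
	assert(rate_bps != 0);
	return (8 * NSEC_PER_SEC << NSEC_SCALE) / rate_bps;
}

/* Transmission time in ns for one packet, mirroring pkt_time(). */
static uint64_t rlim_pkt_time(uint64_t cost, uint32_t len, uint32_t overhead)
{
	return (cost * (len + overhead)) >> NSEC_SCALE;
}

/* Hypothetical deadline rule: a packet leaves no earlier than *next_send,
 * and its transmission time pushes the deadline for the next packet. */
static uint64_t rlim_dequeue_at(uint64_t *next_send, uint64_t now,
				uint64_t cost, uint32_t len, uint32_t overhead)
{
	uint64_t depart = *next_send > now ? *next_send : now;

	*next_send = depart + rlim_pkt_time(cost, len, overhead);
	return depart;
}
```

At 1 Gbit/s the scaled cost is 512 (8 ns per byte times 64), so a 1500-byte packet takes 12000 ns; back-to-back packets are then spaced exactly 12 us apart instead of leaving in bursts the way a token bucket allows.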
[PATCH 0/4] Pull request for 'sis190' branch
Please pull from branch 'sis190' in repository

git://git.kernel.org/pub/scm/linux/kernel/git/romieu/netdev-2.6.git sis190

to get the changes below.

Distance from 'upstream-linus' (7962024e9d16e9349d76b553326f3fa7be64305e)

c27e14e508664471b8e44ef1f81ec080213ea314
348de67fe200e25d8cb80cff35642192436cfeda
004a22d03d62cd08e5287273a5143447db009cd0
14deb44ffe7220be2de697d616f28cce17e72297

Diffstat

 drivers/net/sis190.c |   21 ++++++++++-----------
 1 files changed, 10 insertions(+), 11 deletions(-)

Shortlog

Francois Romieu (4):
      sis190: add cmos ram access code for the SiS19x/968 chipset pair
      sis190: remove duplicate INIT_WORK
      sis190: mdio operation failure is not correctly detected
      sis190: scheduling while atomic error

Patch

diff --git a/drivers/net/sis190.c b/drivers/net/sis190.c
index 7200883..c0db182 100644
--- a/drivers/net/sis190.c
+++ b/drivers/net/sis190.c
@@ -372,7 +372,7 @@ static void __mdio_cmd(void __iomem *ioaddr, u32 ctl)
 		msleep(1);
 	}
 
-	if (i > 999)
+	if (i > 99)
 		printk(KERN_ERR PFX "PHY command failed !\n");
 }
 
@@ -847,10 +847,8 @@ static void sis190_soft_reset(void __iomem *ioaddr)
 {
 	SIS_W32(IntrControl, 0x8000);
 	SIS_PCI_COMMIT();
-	msleep(1);
 	SIS_W32(IntrControl, 0x0);
 	sis190_asic_down(ioaddr);
-	msleep(1);
 }
 
 static void sis190_hw_start(struct net_device *dev)
@@ -1041,8 +1039,6 @@ static int sis190_open(struct net_device *dev)
 	if (rc < 0)
 		goto err_free_rx_1;
 
-	INIT_WORK(&tp->phy_task, sis190_phy_task);
-
 	sis190_request_timer(dev);
 
 	rc = request_irq(dev->irq, sis190_interrupt, IRQF_SHARED, dev->name, dev);
@@ -1549,28 +1545,31 @@ static int __devinit sis190_get_mac_addr_from_eeprom(struct pci_dev *pdev,
 }
 
 /**
- * sis190_get_mac_addr_from_apc - Get MAC address for SiS965 model
+ * sis190_get_mac_addr_from_apc - Get MAC address for SiS96x model
  * @pdev: PCI device
  * @dev: network device to get address for
  *
- * SiS965 model, use APC CMOS RAM to store MAC address.
+ * SiS96x model, use APC CMOS RAM to store MAC address.
  * APC CMOS RAM is accessed through ISA bridge.
  * MAC address is read into @net_dev->dev_addr.
  */
 static int __devinit sis190_get_mac_addr_from_apc(struct pci_dev *pdev,
 						  struct net_device *dev)
 {
+	static const u16 __devinitdata ids[] = { 0x0965, 0x0966, 0x0968 };
 	struct sis190_private *tp = netdev_priv(dev);
 	struct pci_dev *isa_bridge;
 	u8 reg, tmp8;
-	int i;
+	unsigned int i;
 
 	net_probe(tp, KERN_INFO "%s: Read MAC address from APC.\n",
 		  pci_name(pdev));
 
-	isa_bridge = pci_get_device(PCI_VENDOR_ID_SI, 0x0965, NULL);
-	if (!isa_bridge)
-		isa_bridge = pci_get_device(PCI_VENDOR_ID_SI, 0x0966, NULL);
+	for (i = 0; i < ARRAY_SIZE(ids); i++) {
+		isa_bridge = pci_get_device(PCI_VENDOR_ID_SI, ids[i], NULL);
+		if (isa_bridge)
+			break;
+	}
 
 	if (!isa_bridge) {
 		net_probe(tp, KERN_INFO "%s: Can not find ISA bridge.\n",
-- 
Ueimor
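The rewritten sis190_get_mac_addr_from_apc() walks a table of candidate ISA bridge IDs and stops at the first one found, which makes adding the 0x0968 bridge a one-entry change. A userspace model of that lookup; fake_present() is a hypothetical stand-in for pci_get_device(), and the model ignores the device reference that pci_get_device() returns (a driver must eventually drop it with pci_dev_put()).

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for pci_get_device(): is this ID on the bus? */
static int fake_present(const unsigned short *bus, size_t n, unsigned short id)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (bus[i] == id)
			return 1;
	return 0;
}

/* Return the first ISA bridge ID from ids[] present on the bus, or 0.
 * Order matters: earlier table entries win, as in the patch's loop. */
static unsigned short find_isa_bridge(const unsigned short *bus, size_t n)
{
	static const unsigned short ids[] = { 0x0965, 0x0966, 0x0968 };
	unsigned int i;

	for (i = 0; i < ARRAY_SIZE(ids); i++) {
		if (fake_present(bus, n, ids[i]))
			return ids[i];
	}
	return 0;
}
```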
[PATCH 2/4] sis190: remove duplicate INIT_WORK
It is already done in sis190_init_one.

Signed-off-by: Francois Romieu [EMAIL PROTECTED]
Cc: K.M. Liu [EMAIL PROTECTED]
---
 drivers/net/sis190.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/sis190.c b/drivers/net/sis190.c
index 51bbb60..f6a921c 100644
--- a/drivers/net/sis190.c
+++ b/drivers/net/sis190.c
@@ -1041,8 +1041,6 @@ static int sis190_open(struct net_device *dev)
 	if (rc < 0)
 		goto err_free_rx_1;
 
-	INIT_WORK(&tp->phy_task, sis190_phy_task);
-
 	sis190_request_timer(dev);
 
 	rc = request_irq(dev->irq, sis190_interrupt, IRQF_SHARED, dev->name, dev);
-- 
1.5.3.3
[PATCH 3/4] sis190: mdio operation failure is not correctly detected
i ranges from 0 to 100 in the 'for' loop a few lines above.

Reported by davem.

Signed-off-by: Francois Romieu [EMAIL PROTECTED]
Cc: K.M. Liu [EMAIL PROTECTED]
---
 drivers/net/sis190.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/sis190.c b/drivers/net/sis190.c
index f6a921c..973b369 100644
--- a/drivers/net/sis190.c
+++ b/drivers/net/sis190.c
@@ -372,7 +372,7 @@ static void __mdio_cmd(void __iomem *ioaddr, u32 ctl)
 		msleep(1);
 	}
 
-	if (i > 999)
+	if (i > 99)
 		printk(KERN_ERR PFX "PHY command failed !\n");
 }
 
-- 
1.5.3.3
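Why `i > 99` is the right test: a bounded polling loop that exits without `break` leaves its counter equal to the bound. A minimal user-space sketch of the `__mdio_cmd()` pattern follows; `poll_fn`, `mdio_poll_timed_out()` and the sample callbacks are hypothetical stand-ins for the GMIIControl register read, and the `msleep(1)` is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MDIO_MAX_POLLS 100	/* same bound as the loop in __mdio_cmd() */

/* Hypothetical stand-in for testing EhnMIInotDone in GMIIControl:
 * returns true once the PHY command has completed. */
typedef bool (*poll_fn)(unsigned int attempt, void *ctx);

/* Returns true when the command did not complete within the bound. */
static bool mdio_poll_timed_out(poll_fn done, void *ctx)
{
	unsigned int i;

	for (i = 0; i < MDIO_MAX_POLLS; i++) {
		if (done(i, ctx))
			break;
		/* the driver sleeps with msleep(1) here */
	}
	/* Without a break, i == MDIO_MAX_POLLS here: "i > 99" catches
	 * the timeout, whereas "i > 999" could never be true. */
	return i > MDIO_MAX_POLLS - 1;
}

/* Sample PHY that never completes the command. */
static bool never_done(unsigned int attempt, void *ctx)
{
	(void)attempt;
	(void)ctx;
	return false;
}

/* Sample PHY that completes once attempt >= *(unsigned int *)ctx. */
static bool done_at(unsigned int attempt, void *ctx)
{
	return attempt >= *(const unsigned int *)ctx;
}
```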
[PATCH 1/4] sis190: add cmos ram access code for the SiS19x/968 chipset pair
More work is needed to correctly handle the PHY of the new devices when
connected to a 10Mb link, but this change already helps some users as is.

Fix for: http://bugzilla.kernel.org/show_bug.cgi?id=9467

Signed-off-by: Francois Romieu [EMAIL PROTECTED]
Cc: K.M. Liu [EMAIL PROTECTED]
Cc: J. Gleacher [EMAIL PROTECTED]
Cc: Alexandre Penasso Teixeira [EMAIL PROTECTED]
Cc: Arliton Rocha [EMAIL PROTECTED]
Cc: Juan Jose Pablos [EMAIL PROTECTED]
Cc: Wipat Srutiprom [EMAIL PROTECTED]
---
 drivers/net/sis190.c |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/net/sis190.c b/drivers/net/sis190.c
index 7200883..51bbb60 100644
--- a/drivers/net/sis190.c
+++ b/drivers/net/sis190.c
@@ -1549,28 +1549,31 @@ static int __devinit sis190_get_mac_addr_from_eeprom(struct pci_dev *pdev,
 }
 
 /**
- *	sis190_get_mac_addr_from_apc - Get MAC address for SiS965 model
+ *	sis190_get_mac_addr_from_apc - Get MAC address for SiS96x model
  *	@pdev: PCI device
  *	@dev: network device to get address for
  *
- *	SiS965 model, use APC CMOS RAM to store MAC address.
+ *	SiS96x model, use APC CMOS RAM to store MAC address.
  *	APC CMOS RAM is accessed through ISA bridge.
  *	MAC address is read into @net_dev->dev_addr.
  */
 static int __devinit sis190_get_mac_addr_from_apc(struct pci_dev *pdev,
 						  struct net_device *dev)
 {
+	static const u16 __devinitdata ids[] = { 0x0965, 0x0966, 0x0968 };
 	struct sis190_private *tp = netdev_priv(dev);
 	struct pci_dev *isa_bridge;
 	u8 reg, tmp8;
-	int i;
+	unsigned int i;
 
 	net_probe(tp, KERN_INFO "%s: Read MAC address from APC.\n",
 		  pci_name(pdev));
 
-	isa_bridge = pci_get_device(PCI_VENDOR_ID_SI, 0x0965, NULL);
-	if (!isa_bridge)
-		isa_bridge = pci_get_device(PCI_VENDOR_ID_SI, 0x0966, NULL);
+	for (i = 0; i < ARRAY_SIZE(ids); i++) {
+		isa_bridge = pci_get_device(PCI_VENDOR_ID_SI, ids[i], NULL);
+		if (isa_bridge)
+			break;
+	}
 
 	if (!isa_bridge) {
 		net_probe(tp, KERN_INFO "%s: Can not find ISA bridge.\n",
-- 
1.5.3.3
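The table-driven lookup in patch 1/4 tries each SiS ISA bridge ID in order and stops at the first hit. A minimal user-space sketch, assuming a hypothetical `fake_pci_get_device()` that scans an array instead of the PCI bus. The real `pci_get_device()` also takes a reference that must later be dropped with `pci_dev_put()`; the sketch ignores that.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))
#define PCI_VENDOR_ID_SI 0x1039	/* Silicon Integrated Systems */

/* Hypothetical model of the PCI bus: a list of (vendor, device) pairs. */
struct fake_pci_dev {
	uint16_t vendor;
	uint16_t device;
};

/* Stand-in for pci_get_device(vendor, device, NULL): first match or NULL. */
static const struct fake_pci_dev *
fake_pci_get_device(const struct fake_pci_dev *bus, size_t n,
		    uint16_t vendor, uint16_t device)
{
	size_t k;

	for (k = 0; k < n; k++)
		if (bus[k].vendor == vendor && bus[k].device == device)
			return &bus[k];
	return NULL;
}

/* Same probe order as the patch: 0x0965, then 0x0966, then 0x0968. */
static const struct fake_pci_dev *
find_isa_bridge(const struct fake_pci_dev *bus, size_t n)
{
	static const uint16_t ids[] = { 0x0965, 0x0966, 0x0968 };
	const struct fake_pci_dev *isa_bridge = NULL;
	unsigned int i;

	for (i = 0; i < ARRAY_SIZE(ids); i++) {
		isa_bridge = fake_pci_get_device(bus, n, PCI_VENDOR_ID_SI, ids[i]);
		if (isa_bridge)
			break;
	}
	return isa_bridge;
}
```

Adding support for another bridge then only means extending `ids[]`, which is the design choice the patch makes over the previous chain of `if (!isa_bridge)` calls.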
[PATCH 4/4] sis190: scheduling while atomic error
sis190_tx_timeout -> sis190_hw_start -> sis190_soft_reset -> msleep *splat*

PCI transactions are correctly flushed here. The msleep() is probably
useless.

Signed-off-by: Francois Romieu [EMAIL PROTECTED]
Cc: K.M. Liu [EMAIL PROTECTED]
---
 drivers/net/sis190.c |    2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/drivers/net/sis190.c b/drivers/net/sis190.c
index 973b369..c0db182 100644
--- a/drivers/net/sis190.c
+++ b/drivers/net/sis190.c
@@ -847,10 +847,8 @@ static void sis190_soft_reset(void __iomem *ioaddr)
 {
 	SIS_W32(IntrControl, 0x8000);
 	SIS_PCI_COMMIT();
-	msleep(1);
 	SIS_W32(IntrControl, 0x0);
 	sis190_asic_down(ioaddr);
-	msleep(1);
 }
 
 static void sis190_hw_start(struct net_device *dev)
-- 
1.5.3.3
[IPv4] ESP: Discard dummy packets introduced in rfc4303
RFC4303 introduces dummy packets with a nexthdr value of 59
to implement traffic confidentiality. Such packets need to
be dropped silently and the payload may not be attempted to
be parsed as it consists of random chunk.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.25/net/ipv4/esp4.c
===================================================================
--- net-2.6.25.orig/net/ipv4/esp4.c	2007-12-10 15:57:23.0 +0100
+++ net-2.6.25/net/ipv4/esp4.c	2007-12-10 16:06:10.0 +0100
@@ -9,6 +9,7 @@
 #include <linux/pfkeyv2.h>
 #include <linux/random.h>
 #include <linux/spinlock.h>
+#include <linux/in6.h>
 #include <net/icmp.h>
 #include <net/protocol.h>
 #include <net/udp.h>
@@ -233,6 +234,10 @@
 
 	/* ... check padding bits here. Silly. :-) */
 
+	/* RFC4303: Drop dummy packets without any error */
+	if (nexthdr[1] == IPPROTO_NONE)
+		goto out;
+
 	iph = ip_hdr(skb);
 	ihl = iph->ihl * 4;
 
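RFC 4303 puts a Pad Length byte and a Next Header byte at the end of the decrypted payload, just before the ICV. `esp_input()` copies those two bytes into `nexthdr[0]` and `nexthdr[1]`. A minimal user-space sketch of the trailer check follows. It assumes the ICV has already been stripped. `esp_parse_trailer()` and `enum esp_verdict` are hypothetical names, and the bounds rule is the sketch's own, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NEXTHDR_NONE 59	/* IPPROTO_NONE: RFC 4303 dummy packet */

enum esp_verdict { ESP_DELIVER, ESP_DROP_DUMMY, ESP_DROP_MALFORMED };

/*
 * buf/len: decrypted ESP payload with the ICV removed, laid out as
 * [payload][padding][pad length][next header].
 * On ESP_DELIVER, *payload_len and *proto describe the inner packet.
 */
static enum esp_verdict esp_parse_trailer(const uint8_t *buf, size_t len,
					  size_t *payload_len, uint8_t *proto)
{
	uint8_t padlen, nexthdr;

	if (len < 2)
		return ESP_DROP_MALFORMED;
	padlen = buf[len - 2];		/* nexthdr[0] in esp_input() */
	nexthdr = buf[len - 1];		/* nexthdr[1] in esp_input() */
	if ((size_t)padlen + 2 > len)
		return ESP_DROP_MALFORMED;
	if (nexthdr == NEXTHDR_NONE)
		return ESP_DROP_DUMMY;	/* random filler: never parse it */
	*payload_len = len - padlen - 2;
	*proto = nexthdr;
	return ESP_DELIVER;
}
```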
[IPv6] ESP: Discard dummy packets introduced in rfc4303
RFC4303 introduces dummy packets with a nexthdr value of 59
to implement traffic confidentiality. Such packets need to
be dropped silently and the payload may not be attempted to
be parsed as it consists of random chunk.

Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Index: net-2.6.25/net/ipv6/esp6.c
===================================================================
--- net-2.6.25.orig/net/ipv6/esp6.c	2007-12-10 16:06:02.0 +0100
+++ net-2.6.25/net/ipv6/esp6.c	2007-12-10 16:08:02.0 +0100
@@ -238,6 +238,12 @@
 		}
 		/* ... check padding bits here. Silly. :-) */
 
+		/* RFC4303: Drop dummy packets without any error */
+		if (nexthdr[1] == IPPROTO_NONE) {
+			ret = -EINVAL;
+			goto out;
+		}
+
 		pskb_trim(skb, skb->len - alen - padlen - 2);
 		ret = nexthdr[1];
 	}
Re: [PATCH] [TCP]: Bind fackets_out state to highest_sack more tightly
From: Ilpo_Järvinen [EMAIL PROTECTED]
Date: Mon, 10 Dec 2007 14:39:46 +0200 (EET)

> On Mon, 10 Dec 2007, David Miller wrote:
>
> > From: Ilpo_Järvinen [EMAIL PROTECTED]
> > Date: Mon, 10 Dec 2007 14:27:24 +0200 (EET)
> >
> > > Added checks will catch most of the errors if the current complex
> > > fack_count counting logic is flawed somewhere. Fackets_out should
> > > always be advancable if highest_sack is too because the fackets_out
> > > is nowadays accurate (and obviously it must be smaller than
> > > packets_out).
> > >
> > > Signed-off-by: Ilpo Järvinen [EMAIL PROTECTED]
> >
> > Applied to net-2.6.25, thanks!
>
> Please get the fack_count fix as well from the mm1 thread before my
> mailbox gets filled with stacktraces :-) :
>
> http://marc.info/?l=linux-netdev&m=119728952018975&w=2

Done, thanks!
Re: [PATCH][Take3] PCI legacy I/O port free driver - Making Intel e1000 driver legacy I/O port free
Tomohiro Kusumi wrote:
> Dear Auke and e1000 maintainers
>
> Hi, this is the patch which makes the e1000 driver legacy I/O port free.
>
> I've received some advice from Auke quite long time ago, and submitted a patch
> (http://lkml.org/lkml/2007/8/10/11) which I think meets what Auke had told me.
> Since the patch has not received any reaction from the e1000 community,
> let me submit it once again (plus, the previous one had a bug regarding
> module parameter).

this opens up an interesting discussion - e1000 is going to be the driver for
8254x hardware only from 2.6.25 and on. e1000e will be the driver that powers
8257x hardware (and ich8/9 and es2lan NICs) and those are all the pci-e
hardware devices.

This means that the current e1000 driver will not power the pci-e hardware
anymore and thus those io-port free devices are removed from e1000.

considering the fact that 82542, 82543 and 82547 devices are, from 2.6.25 on,
the only devices that can be ioport free in this new e1000 driver, I think
that it almost makes no sense to code this functionality up for that.

so, I'm wondering if we should not drop this effort altogether, since it's
just a lot of code and none of the pci-e hardware should use ioport anymore.

Can you screen e1000e in jeff garzik's netdev-2.6#upstream tree and see if
that is correctly not using ioport? I think that would be worth the time.

Cheers,

Auke

> If the module parameter enable_legacy_ioport_free is set to 0, it does not
> differ from the existing e1000 driver, otherwise legacy I/O port free
> function is enabled. I may have done something wrong, so any comments
> would be helpful.
>
> Tomohiro Kusumi
>
> Signed-off-by: Tomohiro Kusumi [EMAIL PROTECTED]
> ---
> diff -Nur linux-2.6.23.org/drivers/net/e1000/e1000.h linux-2.6.23/drivers/net/e1000/e1000.h
> --- linux-2.6.23.org/drivers/net/e1000/e1000.h	2007-10-16 11:30:37.0 +0900
> +++ linux-2.6.23/drivers/net/e1000/e1000.h	2007-10-16 11:32:55.0 +0900
> @@ -342,6 +342,9 @@
>  	boolean_t quad_port_a;
>  	unsigned long flags;
>  	uint32_t eeprom_wol;
> +
> +	int use_ioport;
> +	int bars;
>  };
>  
>  enum e1000_state_t {
> diff -Nur linux-2.6.23.org/drivers/net/e1000/e1000_main.c linux-2.6.23/drivers/net/e1000/e1000_main.c
> --- linux-2.6.23.org/drivers/net/e1000/e1000_main.c	2007-10-16 11:30:38.0 +0900
> +++ linux-2.6.23/drivers/net/e1000/e1000_main.c	2007-10-16 14:48:16.390575464 +0900
> @@ -226,6 +226,11 @@
>  static pci_ers_result_t e1000_io_slot_reset(struct pci_dev *pdev);
>  static void e1000_io_resume(struct pci_dev *pdev);
>  
> +static unsigned int enable_legacy_ioport_free = 0;
> +module_param(enable_legacy_ioport_free, uint, 0644);
> +MODULE_PARM_DESC(enable_legacy_ioport_free, "Enable legacy I/O port free (default:0)");
> +static int e1000_test_legacy_ioport(struct pci_dev *pdev);
> +
>  static struct pci_error_handlers e1000_err_handler = {
>  	.error_detected = e1000_io_error_detected,
>  	.slot_reset = e1000_io_slot_reset,
> @@ -872,8 +877,24 @@
>  	int i, err, pci_using_dac;
>  	uint16_t eeprom_data = 0;
>  	uint16_t eeprom_apme_mask = E1000_EEPROM_APME;
> -	if ((err = pci_enable_device(pdev)))
> +	int bars = 0;
> +	int use_ioport = 0;
> +
> +	if (enable_legacy_ioport_free) {
> +		if ((use_ioport = e1000_test_legacy_ioport(pdev)) < 0) {
> +			E1000_ERR("e1000_test_legacy_ioport failed, aborting\n");
> +			return -1;
> +		}
> +		if (use_ioport)
> +			bars = pci_select_bars(pdev, IORESOURCE_MEM | IORESOURCE_IO);
> +		else
> +			bars = pci_select_bars(pdev, IORESOURCE_MEM);
> +		if ((err = pci_enable_device_bars(pdev, bars)) != 0)
> +			return err;
> +	}
> +	else if ((err = pci_enable_device(pdev)) != 0) {
>  		return err;
> +	}
>  
>  	if (!(err = pci_set_dma_mask(pdev, DMA_64BIT_MASK)) &&
>  	    !(err = pci_set_consistent_dma_mask(pdev, DMA_64BIT_MASK))) {
> @@ -887,7 +908,11 @@
>  		pci_using_dac = 0;
>  	}
>  
> -	if ((err = pci_request_regions(pdev, e1000_driver_name)))
> +	if (enable_legacy_ioport_free)
> +		err = pci_request_selected_regions(pdev, bars, e1000_driver_name);
> +	else
> +		err = pci_request_regions(pdev, e1000_driver_name);
> +	if (err)
>  		goto err_pci_reg;
>  
>  	pci_set_master(pdev);
> @@ -906,6 +931,10 @@
>  	adapter->pdev = pdev;
>  	adapter->hw.back = adapter;
>  	adapter->msg_enable = (1 << debug) - 1;
> +	if (enable_legacy_ioport_free) {
> +		adapter->use_ioport = use_ioport;
> +		adapter->bars = bars;
> +	}
>  
>  	mmio_start = pci_resource_start(pdev, BAR_0);
>  	mmio_len = pci_resource_len(pdev, BAR_0);
> @@ -915,12 +944,14 @@
>  	if (!adapter->hw.hw_addr)
>  		goto err_ioremap;
>  
> -	for (i = BAR_1; i <= BAR_5; i++) {
> -		if
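When `enable_legacy_ioport_free` is set, the patch enables and requests only the BARs returned by `pci_select_bars()`. That value is a bitmask in which bit *i* is set when resource *i* carries one of the requested `IORESOURCE_*` flags. A minimal user-space sketch of that selection rule follows. It assumes a hypothetical six-entry flags array (BAR 0 memory, BAR 2 I/O) in place of a real `struct pci_dev`, and the flag values are assumed to match `include/linux/ioport.h`.

```c
#include <assert.h>

#define IORESOURCE_IO	0x00000100UL	/* assumed to match linux/ioport.h */
#define IORESOURCE_MEM	0x00000200UL
#define NUM_BARS	6

/* Set bit i when BAR i has any of the wanted resource flags. */
static int select_bars(const unsigned long res_flags[NUM_BARS],
		       unsigned long wanted)
{
	int i, bars = 0;

	for (i = 0; i < NUM_BARS; i++)
		if (res_flags[i] & wanted)
			bars |= 1 << i;
	return bars;
}
```

With I/O ports disabled, only the memory BAR is selected (mask `0x1`). Asking for `IORESOURCE_MEM | IORESOURCE_IO` adds BAR 2 (mask `0x5`).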
Re: [PATCH 2/3] arch/ : Platform changes for UCC TDM driver for MPC8323ERDB.Also includes related QE changes.
On Mon, 10 Dec 2007 17:39:22 +0530 (IST) Poonam_Aggrwal-b10812 [EMAIL PROTECTED] wrote:
>
> +++ b/arch/powerpc/sysdev/qe_lib/qe.c
> @@ -149,22 +149,116 @@ EXPORT_SYMBOL(qe_issue_cmd);
>   */
>  static unsigned int brg_clk = 0;
>  
> -unsigned int get_brg_clk(void)
> +u32 get_brg_clk(enum qe_clock brgclk, enum qe_clock *brg_source)
>  {
> -	struct device_node *qe;
> -	if (brg_clk)
> -		return brg_clk;
> +	struct device_node *qe, *brg, *clocks;
> +	enum qe_clock brg_src;
> +	u32 brg_input_freq = 0;
> +	u32 brg_num;
> +	const unsigned int *prop;
>  
> -	qe = of_find_node_by_type(NULL, "qe");
> -	if (qe) {
> +	*brg_source = 0;
> +
> +	brg_num = brgclk - QE_BRG1;
> +	brg = of_find_compatible_node(NULL, NULL, "fsl,cpm-brg");
> +	if (brg) {
>  		unsigned int size;
> -		const u32 *prop = of_get_property(qe, "brg-frequency", &size);
> -		brg_clk = *prop;
> -		of_node_put(qe);
> -	};
> +		prop = of_get_property(brg,
> +				"fsl,brg-sources", &size);
> +
> +		brg_src = *(prop + brg_num);

You should probably sanity check that prop is not NULL and points to
something large enough.

You don't use brg after here, so the of_node_put(brg) could go here to
save putting it in multiple places later. Also, currently there are
paths through the following code that do not do the of_node_put(brg).

> +		if (brg_src == 0) {
> +			*brg_source = 0;
> +			if (brg_clk > 0) {
> +				of_node_put(brg);
> +				return brg_clk;
> +			}
> +			qe = of_find_node_by_type(NULL, "qe");
> +			if (qe) {
> +				unsigned int size;
> +				prop = of_get_property
> +					(qe, "brg-frequency", &size);
> +				of_node_put(qe);
> +				of_node_put(brg);
> +				return *prop;

NULL check here (yes, I know that the old code didn't check).

> +			}
> +		} else {
> +			*brg_source = brg_src + QE_CLK1 - 1;
> +			clocks = of_find_compatible_node(NULL, NULL,
> +					"fsl,cpm-clocks");
> +			prop = of_get_property(clocks,
> +					"#clock-cells", &size);
> +			/*
> +			 * clock-cells = 1 only supported right now.
> +			 */
> +			if (*prop != 1)

Again check for NULL (and possibly size).

> +				return 0;
> +			prop = of_get_property(clocks,
> +					"clock-frequency", &size);
> +
> +			brg_input_freq = *(prop+(brg_src - 1));

And again.

> +			of_node_put(clocks);
> +			of_node_put(brg);
> +			return brg_input_freq;
> +		}
> +	}
>  	return brg_clk;
>  }

-- 
Cheers,
Stephen Rothwell                    [EMAIL PROTECTED]
http://www.canb.auug.org.au/~sfr/
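Stephen's review asks for `prop` to be checked for NULL, and for being large enough, before it is indexed. `of_get_property()` returns NULL when the property is absent and reports the length in bytes, so a check could look like the following user-space sketch. `prop_read_cell()` is a hypothetical helper. The sketch ignores the big-endian cell encoding, which is native on PowerPC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: fetch cell 'index' of a property blob as returned
 * by of_get_property() (NULL if absent, 'len' in bytes).
 * Returns 0 and stores the cell in *out on success, -1 otherwise.
 */
static int prop_read_cell(const uint32_t *prop, int len, unsigned int index,
			  uint32_t *out)
{
	if (!prop || len < 0)
		return -1;			/* property missing */
	if ((size_t)len / sizeof(*prop) <= index)
		return -1;			/* too short for this index */
	*out = prop[index];
	return 0;
}
```

In `get_brg_clk()` the same check would guard `*(prop + brg_num)`, `*prop` for `#clock-cells`, and `*(prop + (brg_src - 1))`.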
Re: [patch 0/5] ipv6: make af_inet6 subsystems to return an error at init
From: Daniel Lezcano [EMAIL PROTECTED]
Date: Mon, 10 Dec 2007 16:32:50 +0100

> I just noticed that I forgot to put ipv6 under bracket.
> Sorry for that :(
> Should I resend the patchset ?

This is not necessary.
Re: [IPv4] ESP: Discard dummy packets introduced in rfc4303
From: Thomas Graf [EMAIL PROTECTED]
Date: Mon, 10 Dec 2007 23:17:03 +0100

> RFC4303 introduces dummy packets with a nexthdr value of 59
> to implement traffic confidentiality. Such packets need to
> be dropped silently and the payload may not be attempted to
> be parsed as it consists of random chunk.
>
> Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Applied to net-2.6, since this is more of a bug fix than anything
else.
Re: [IPv6] ESP: Discard dummy packets introduced in rfc4303
From: Thomas Graf [EMAIL PROTECTED]
Date: Mon, 10 Dec 2007 23:18:07 +0100

> RFC4303 introduces dummy packets with a nexthdr value of 59
> to implement traffic confidentiality. Such packets need to
> be dropped silently and the payload may not be attempted to
> be parsed as it consists of random chunk.
>
> Signed-off-by: Thomas Graf [EMAIL PROTECTED]

Also applied to net-2.6, thanks Thomas.
Re: [PATCH 1/3][BNX2]: Add PHY_DIS_EARLY_DAC workaround.
From: Michael Chan [EMAIL PROTECTED]
Date: Sun, 09 Dec 2007 13:16:48 -0800

> [BNX2]: Add PHY_DIS_EARLY_DAC workaround.
>
> 5709 Ax and Bx chips all need this workaround.
>
> Signed-off-by: Michael Chan [EMAIL PROTECTED]

Applied.
Re: [PATCH 2/3][BNX2]: Fix RX packet rot.
From: Michael Chan [EMAIL PROTECTED]
Date: Sun, 09 Dec 2007 13:17:14 -0800

> [BNX2]: Fix RX packet rot.
>
> Packets can be left in the RX ring if the NAPI budget is reached.
> This is caused by storing the latest rx index at the beginning of
> bnx2_rx_int(). We may not process all the work up to this index
> if the budget is reached and so some packets in the RX ring may rot
> when we later check for more work using this stored rx index.
>
> The fix is to not store this latest hw index and only store the
> processed rx index. We use a new function bnx2_get_hw_rx_cons()
> to fetch the latest hw rx index.
>
> Signed-off-by: Michael Chan [EMAIL PROTECTED]

Applied.
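The RX rot can be reproduced with a toy model of the ring. In the sketch below, all names are hypothetical, and ring wrap-around and the real status-block layout are ignored. The old check compares a stored snapshot of the hardware index with a fresh read, so it misses packets left behind when the budget runs out. The fixed check compares a fresh hardware index (the job the patch gives `bnx2_get_hw_rx_cons()`) with the processed index.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the bnx2 RX path. */
struct rx_ring {
	unsigned int hw_prod;	/* index the NIC has filled up to */
	unsigned int sw_cons;	/* index the driver has processed up to */
	unsigned int stale_hw;	/* old scheme: snapshot taken in rx_int */
};

/* Process at most 'budget' packets; returns how many were processed. */
static int rx_int(struct rx_ring *r, int budget)
{
	unsigned int hw = r->hw_prod;	/* fresh read of the hw index */
	int done = 0;

	r->stale_hw = hw;		/* what the old code kept around */
	while (r->sw_cons != hw && done < budget) {
		r->sw_cons++;		/* hand one packet to the stack */
		done++;
	}
	return done;
}

/* Old "more work?" check: only notices packets newer than the snapshot. */
static bool has_work_old(const struct rx_ring *r)
{
	return r->stale_hw != r->hw_prod;
}

/* Fixed check: fresh hw index versus what was actually processed. */
static bool has_work_fixed(const struct rx_ring *r)
{
	return r->hw_prod != r->sw_cons;
}
```

With 10 packets pending and a budget of 4, the old check reports no work even though 6 packets are still in the ring.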
Re: [PATCH 3/3][BNX2]: Update version to 1.6.9.
From: Michael Chan [EMAIL PROTECTED]
Date: Sun, 09 Dec 2007 13:18:02 -0800

> [BNX2]: Update version to 1.6.9.
>
> Signed-off-by: Michael Chan [EMAIL PROTECTED]

Applied.
Re: [PATCH 4/4] udp: memory accounting in IPv4
Herbert Xu wrote:
> On Wed, Dec 05, 2007 at 11:28:34PM -0500, Hideo AOKI wrote:
>> 1. Using sk_forward_alloc and adding socket lock
>>
>> UDP already uses a socket lock to send message. However, it doesn't
>> use the lock to receive message. I wonder if we can also use the lock
>> when sk_forward_alloc is updated in receive processing. I understand
>> performance issue might occur, but ...
>
> Having discussed this with Dave we've agreed that this is the best
> way to go.
>
> Thanks,

Hello,

Thank you so much for reviewing.
I chose this solution and developed a new patch set.
I'm testing the patch set right now.
I'll submit it to netdev as soon as I finish the test.

Best regards,
Hideo

-- 
Hitachi Computer Products (America) Inc.
[IPSEC]: Add xfrm_input_state helper
Hi Dave:

This is the last patch we need before converting ESP over to
crypto_aead.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
[IPSEC]: Add xfrm_input_state helper

This patch adds the xfrm_input_state helper function which returns
the current xfrm state being processed on the input path given an
sk_buff. This is currently only used by xfrm_input but will be used
by ESP upon asynchronous resumption.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index fb154a6..c49fe0f 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1302,4 +1302,9 @@ static inline void xfrm_states_delete(struct xfrm_state **states, int n)
 }
 #endif
 
+static inline struct xfrm_state *xfrm_input_state(struct sk_buff *skb)
+{
+	return skb->sp->xvec[skb->sp->len - 1];
+}
+
 #endif	/* _NET_XFRM_H */
diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 8b2b1b5..8624cbd 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -109,7 +109,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type)
 	/* A negative encap_type indicates async resumption. */
 	if (encap_type < 0) {
 		async = 1;
-		x = skb->sp->xvec[skb->sp->len - 1];
+		x = xfrm_input_state(skb);
 		seq = XFRM_SKB_CB(skb)->seq;
 		goto resume;
 	}
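`xfrm_input_state()` returns the most recently added entry of the skb's security path. The following user-space sketch uses pared-down, hypothetical stand-ins for `struct sk_buff`, `struct sec_path` and `struct xfrm_state`. Like the real helper, it assumes `skb->sp` is non-NULL and `sp->len >= 1`.

```c
#include <assert.h>
#include <stddef.h>

#define XFRM_MAX_DEPTH 6	/* depth of the xvec array, as in the kernel */

/* Hypothetical, pared-down stand-ins for the kernel structures. */
struct xfrm_state { unsigned int spi; };

struct sec_path {
	int len;				/* states applied so far */
	struct xfrm_state *xvec[XFRM_MAX_DEPTH];
};

struct sk_buff { struct sec_path *sp; };

/* Same expression as the helper in the patch: the state currently being
 * processed is the last one pushed onto the security path. */
static inline struct xfrm_state *xfrm_input_state(struct sk_buff *skb)
{
	return skb->sp->xvec[skb->sp->len - 1];
}
```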
Re: [PATCH 2.6.25] netns: struct net content re-work
The idea of separate structures makes sense, and seems needed and useful.

"Denis V. Lunev" [EMAIL PROTECTED] writes:

> diff --git a/include/net/netns/unix.h b/include/net/netns/unix.h
> new file mode 100644
> index 000..27b4e7f
> --- /dev/null
> +++ b/include/net/netns/unix.h
                    ^^
Given that we are making this per protocol, adding a separate directory
to hold them seems to be the wrong grouping. Ideally we want everything
for the protocol all together in the same location so it is easy to find.
Possibly with a user/kernel split.

So perhaps unix_net.h

> @@ -0,0 +1,13 @@
> +/*
> + * Unix network namespace
> + */
> +#ifndef __NETNS_UNIX_H__
> +#define __NETNS_UNIX_H__
> +
> +struct ctl_table_header;
> +struct netns_unix {
> +	int			sysctl_unix_max_dgram_qlen;
> +	struct ctl_table_header	*unix_ctl;
> +};

How about struct unix_net? I think that tracks a little better with
how we have done struct in_device, ip6_dev and their friends.

Eric
[PATCH] NET : dst_ifdown() cleanup
This cleanup shrinks the size of net/core/dst.o on i386 from 1299 to
1289 bytes.

(This is because dev_hold()/dev_put() do atomic_inc()/atomic_dec(),
which force the compiler to re-evaluate memory contents.)

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

diff --git a/net/core/dst.c b/net/core/dst.c
index 5c6cfc4..7eceeba 100644
--- a/net/core/dst.c
+++ b/net/core/dst.c
@@ -284,8 +284,8 @@ static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev,
 		dev_put(dev);
 		if (dst->neighbour && dst->neighbour->dev == dev) {
 			dst->neighbour->dev = dst->dev;
+			dev_hold(dst->dev);
 			dev_put(dev);
-			dev_hold(dst->neighbour->dev);
 		}
 	}
 }
Re: [PATCH 2.6.25] netns: struct net content re-work
Kirill Korotaev [EMAIL PROTECTED] writes:

> Daniel Lezcano wrote:
>> Denis V. Lunev wrote:
>>> Recently David Miller and Herbert Xu pointed out that struct net becomes
>>> overbloated and un-maintainable. There are two solutions:
>>> - provide a pointer to a network subsystem definition from struct net.
>>>   This costs an additional dereference
>>> - place sub-system definition into the structure itself. This will speedup
>>>   run-time access at the cost of recompilation time
>>>
>>> The second approach looks better for us.
>>
>> Yes, we do not need/want a pointer in this structure and add more
>> dereference in the network code.

If it does go that way we just carefully pass around a properly typed
structure in that subsystem to reduce the cost. Still it would be nice
not to need to add the extra pointer.

>>> index b62e31f..f60e1ce 100644
>>> --- a/include/net/net_namespace.h
>>> +++ b/include/net/net_namespace.h
>>> @@ -8,6 +8,8 @@
>>>  #include <linux/workqueue.h>
>>>  #include <linux/list.h>
>>>  
>>> +#include <net/netns/unix.h>
>>> +
>>>  struct proc_dir_entry;
>>>  struct net_device;
>>>  struct sock;
>>> @@ -46,8 +48,7 @@ struct net {
>>>  	struct hlist_head	packet_sklist;
>>>  
>>>  	/* unix sockets */
>>> -	int			sysctl_unix_max_dgram_qlen;
>>> -	struct ctl_table_header	*unix_ctl;
>>> +	struct netns_unix	unx;
>>
>> Can you change this from unx to unix ?
>
> no, it won't compile. Guess why :)

Hmm. It looks like it is a #define somewhere gcc?

Eric
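The trade-off Denis describes (a pointer from `struct net` versus embedding the subsystem structure) can be sketched in plain C. The `net_ptr` and `net_embed` types below are hypothetical. Only `struct netns_unix` follows the patch. The member is called `unx` because GCC, in its default GNU modes on Unix targets, predefines `unix` as a macro, so `struct netns_unix unix;` would not compile.

```c
#include <assert.h>
#include <stddef.h>

struct ctl_table_header;		/* opaque, as in the patch */

struct netns_unix {
	int sysctl_unix_max_dgram_qlen;
	struct ctl_table_header *unix_ctl;
};

/* Option 1 (hypothetical): struct net points at the subsystem data,
 * so every access loads the pointer first. */
struct net_ptr {
	struct netns_unix *unx;
};

/* Option 2 (what the patch does): the subsystem data is embedded,
 * so the field sits at a fixed offset from the struct net pointer. */
struct net_embed {
	struct netns_unix unx;
};

static int qlen_ptr(const struct net_ptr *net)
{
	return net->unx->sysctl_unix_max_dgram_qlen;	/* two loads */
}

static int qlen_embed(const struct net_embed *net)
{
	return net->unx.sysctl_unix_max_dgram_qlen;	/* one load */
}
```

The cost of option 2, as Denis notes, is that any change to `struct netns_unix` forces everything including `struct net` to be recompiled.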
Re: [PATCH 2.6.25] netns: struct net content re-work
From: [EMAIL PROTECTED] (Eric W. Biederman)
Date: Mon, 10 Dec 2007 21:04:07 -0700

> Kirill Korotaev [EMAIL PROTECTED] writes:
>
> > Daniel Lezcano wrote:
> >> Denis V. Lunev wrote:
> >> Can you change this from unx to unix ?
> >
> > no, it won't compile. Guess why :)
>
> Hmm. It looks like it is a #define somewhere gcc?

It is a platform CPP pre-define for UNIX platforms.
[RFC 0/3] Add AEAD support to ESP
Hi Dave:

This series of patches adds AEAD support to ESP. Please don't
merge it just yet because they depend on what's in the current
cryptodev-2.6 tree. Once that tree has settled down I'll ask
you to pull it and then these patches can go on top of that.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu [EMAIL PROTECTED]
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[PATCH 2/3] [IPSEC]: Allow async algorithms
[IPSEC]: Allow async algorithms

Now that ESP uses authenc we can turn on the support for async
algorithms in IPsec.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]
---

 net/xfrm/xfrm_algo.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index 1686f64..ae34a12 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -358,21 +358,21 @@ static const struct xfrm_algo_list xfrm_aalg_list = {
 	.algs = aalg_list,
 	.entries = ARRAY_SIZE(aalg_list),
 	.type = CRYPTO_ALG_TYPE_HASH,
-	.mask = CRYPTO_ALG_TYPE_HASH_MASK | CRYPTO_ALG_ASYNC,
+	.mask = CRYPTO_ALG_TYPE_HASH_MASK,
 };
 
 static const struct xfrm_algo_list xfrm_ealg_list = {
 	.algs = ealg_list,
 	.entries = ARRAY_SIZE(ealg_list),
 	.type = CRYPTO_ALG_TYPE_BLKCIPHER,
-	.mask = CRYPTO_ALG_TYPE_MASK | CRYPTO_ALG_ASYNC,
+	.mask = CRYPTO_ALG_TYPE_BLKCIPHER_MASK,
 };
 
 static const struct xfrm_algo_list xfrm_calg_list = {
 	.algs = calg_list,
 	.entries = ARRAY_SIZE(calg_list),
 	.type = CRYPTO_ALG_TYPE_COMPRESS,
-	.mask = CRYPTO_ALG_TYPE_MASK | CRYPTO_ALG_ASYNC,
+	.mask = CRYPTO_ALG_TYPE_MASK,
 };
 
 static struct xfrm_algo_desc *xfrm_find_algo(
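Removing `CRYPTO_ALG_ASYNC` from the masks is what admits asynchronous implementations. The crypto API skips an algorithm when `(cra_flags ^ type) & mask` is non-zero. So with `ASYNC` set in the mask but not in `type`, only synchronous algorithms match. A minimal sketch of that rule follows. The flag values are illustrative and assumed to mirror `include/linux/crypto.h` of that time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values, assumed to mirror include/linux/crypto.h. */
#define ALG_TYPE_MASK		0x0000000fu
#define ALG_TYPE_BLKCIPHER	0x00000004u
#define ALG_TYPE_ABLKCIPHER	0x00000005u
#define ALG_TYPE_BLKCIPHER_MASK	0x0000000cu
#define ALG_ASYNC		0x00000080u

/* An algorithm satisfies a (type, mask) query when it agrees with
 * 'type' on every bit selected by 'mask'. */
static bool alg_matches(uint32_t alg_flags, uint32_t type, uint32_t mask)
{
	return ((alg_flags ^ type) & mask) == 0;
}
```

With the old ealg mask, an async ablkcipher is rejected. With `CRYPTO_ALG_TYPE_BLKCIPHER_MASK`, the low type bit and `ASYNC` are both ignored, so it is accepted.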
[PATCH 1/3] [IPSEC]: Use crypto_aead and authenc in ESP
[IPSEC]: Use crypto_aead and authenc in ESP

This patch converts ESP to use the crypto_aead interface and in
particular the authenc algorithm. This lays the foundations for
future support of combined mode algorithms.

Signed-off-by: Herbert Xu [EMAIL PROTECTED]
---

 include/net/esp.h |   54 +-
 net/ipv4/esp4.c   |  465 --
 net/ipv6/esp6.c   |  422 +
 3 files changed, 536 insertions(+), 405 deletions(-)

diff --git a/include/net/esp.h b/include/net/esp.h
index c05f529..d9834f7 100644
--- a/include/net/esp.h
+++ b/include/net/esp.h
@@ -1,58 +1,22 @@
 #ifndef _NET_ESP_H
 #define _NET_ESP_H
 
-#include <linux/crypto.h>
-#include <net/xfrm.h>
-#include <linux/scatterlist.h>
+#include <linux/skbuff.h>
 
 #define ESP_NUM_FAST_SG		4
 
-struct esp_data
-{
-	struct scatterlist		sgbuf[ESP_NUM_FAST_SG];
-
-	/* Confidentiality */
-	struct {
-		int			padlen;		/* 0..255 */
-		/* ivlen is offset from enc_data, where encrypted data start.
-		 * It is logically different of crypto_tfm_alg_ivsize(tfm).
-		 * We assume that it is either zero (no ivec), or
-		 * >= crypto_tfm_alg_ivsize(tfm). */
-		int			ivlen;
-		int			ivinitted;
-		u8			*ivec;		/* ivec buffer */
-		struct crypto_blkcipher	*tfm;		/* crypto handle */
-	} conf;
-
-	/* Integrity. It is active when icv_full_len != 0 */
-	struct {
-		u8			*work_icv;
-		int			icv_full_len;
-		int			icv_trunc_len;
-		struct crypto_hash	*tfm;
-	} auth;
+struct crypto_aead;
+
+struct esp_data {
+	/* 0..255 */
+	int padlen;
+
+	/* Confidentiality & Integrity */
+	struct crypto_aead *aead;
 };
 
 extern void *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len);
 
-static inline int esp_mac_digest(struct esp_data *esp, struct sk_buff *skb,
-				 int offset, int len)
-{
-	struct hash_desc desc;
-	int err;
-
-	desc.tfm = esp->auth.tfm;
-	desc.flags = 0;
-
-	err = crypto_hash_init(&desc);
-	if (unlikely(err))
-		return err;
-	err = skb_icv_walk(skb, &desc, offset, len, crypto_hash_update);
-	if (unlikely(err))
-		return err;
-	return crypto_hash_final(&desc, esp->auth.work_icv);
-}
-
 struct ip_esp_hdr;
 
 static inline struct ip_esp_hdr *ip_esp_hdr(const struct sk_buff *skb)
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index de4592c..c1f5936 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -1,3 +1,5 @@
+#include <crypto/aead.h>
+#include <crypto/authenc.h>
 #include <linux/err.h>
 #include <linux/module.h>
 #include <net/ip.h>
@@ -7,20 +9,91 @@
 #include <linux/crypto.h>
 #include <linux/kernel.h>
 #include <linux/pfkeyv2.h>
-#include <linux/random.h>
+#include <linux/rtnetlink.h>
 #include <linux/spinlock.h>
 #include <net/icmp.h>
 #include <net/protocol.h>
 #include <net/udp.h>
 
+struct esp_skb_cb {
+	struct xfrm_skb_cb xfrm;
+	void *tmp;
+};
+
+#define ESP_SKB_CB(__skb) ((struct esp_skb_cb *)&((__skb)->cb[0]))
+
+/*
+ * Allocate an AEAD request structure with extra space for SG and IV.
+ *
+ * For alignment considerations the IV is placed at the front, followed
+ * by the request and finally the SG list.
+ *
+ * TODO: Use spare space in skb for this where possible.
+ */
+static void *esp_alloc_tmp(struct crypto_aead *aead, int nfrags)
+{
+	unsigned int len;
+
+	len = crypto_aead_ivsize(aead);
+	if (len) {
+		len += crypto_aead_alignmask(aead) &
+		       ~(crypto_tfm_ctx_alignment() - 1);
+		len = ALIGN(len, crypto_tfm_ctx_alignment());
+	}
+
+	len += sizeof(struct aead_givcrypt_request) + crypto_aead_reqsize(aead);
+	len = ALIGN(len, __alignof__(struct scatterlist));
+
+	len += sizeof(struct scatterlist *) * nfrags;
+
+	return kmalloc(len, GFP_ATOMIC);
+}
+
+static inline u8 *esp_tmp_iv(struct crypto_aead *aead, void *tmp)
+{
+	return crypto_aead_ivsize(aead) ?
+	       PTR_ALIGN((u8 *)tmp, crypto_aead_alignmask(aead) + 1) : tmp;
+}
+
+static inline struct aead_givcrypt_request *esp_tmp_req(
+	struct crypto_aead *aead, u8 *iv)
+{
+	struct aead_givcrypt_request *req;
+
+	req = (void *)PTR_ALIGN(iv + crypto_aead_ivsize(aead),
+				crypto_tfm_ctx_alignment());
+	aead_givcrypt_set_tfm(req, aead);
+	return req;
+}
+
+static inline struct scatterlist *esp_tmp_sg(struct crypto_aead *aead,
+					     struct aead_givcrypt_request *req)
+{
+	return (void *)ALIGN((unsigned long)(req + 1) +
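`esp_alloc_tmp()` sizes a single buffer that holds the IV, the AEAD request and the SG list, with padding so that each piece can be aligned. The following user-space sketch reproduces the same arithmetic. It assumes the buffer start is already aligned to `crypto_tfm_ctx_alignment()`, which the `alignmask & ~(ctx_align - 1)` padding appears to rely on. `struct tmp_params`, `tmp_len()` and `tmp_end()` are hypothetical names, and the sample sizes are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to a multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((uintptr_t)(a) - 1))

struct tmp_params {
	size_t ivsize;		/* crypto_aead_ivsize() */
	size_t alignmask;	/* crypto_aead_alignmask() */
	size_t ctx_align;	/* crypto_tfm_ctx_alignment() */
	size_t req_size;	/* sizeof(request) + crypto_aead_reqsize() */
	size_t sg_align;	/* __alignof__(struct scatterlist) */
	size_t per_frag;	/* bytes reserved per fragment */
};

/* Same arithmetic as esp_alloc_tmp() in the patch. */
static size_t tmp_len(const struct tmp_params *p, int nfrags)
{
	size_t len = p->ivsize;

	if (len) {
		len += p->alignmask & ~(p->ctx_align - 1);
		len = ALIGN_UP(len, p->ctx_align);
	}
	len += p->req_size;
	len = ALIGN_UP(len, p->sg_align);
	len += p->per_frag * nfrags;
	return len;
}

/* Lay out IV, request and SG list from address 'base' the way the
 * esp_tmp_*() helpers do; return the offset just past the SG list. */
static size_t tmp_end(const struct tmp_params *p, uintptr_t base, int nfrags)
{
	uintptr_t iv = p->ivsize ? ALIGN_UP(base, p->alignmask + 1) : base;
	uintptr_t req = ALIGN_UP(iv + p->ivsize, p->ctx_align);
	uintptr_t sg = ALIGN_UP(req + p->req_size, p->sg_align);

	return (size_t)(sg + p->per_frag * nfrags - base);
}
```

With a 16-byte IV that needs 16-byte alignment but a buffer only guaranteed 8-byte alignment, the extra 8 bytes of padding are exactly what the worst case consumes.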
[PATCH 3/3] [IPSEC]: Add support for combined mode algorithms
[IPSEC]: Add support for combined mode algorithms This patch adds support for combined mode algorithms with GCM being the first algorithm supported. Combined mode algorithms can be added through the xfrm_user interface using the new algorithm payload type XFRMA_ALG_AEAD. Each algorithms is identified by its name and the ICV length. For the purposes of matching algorithms in xfrm_tmpl structures, combined mode algorithms are occupy the same name space as encryption algorithms. This is in line with how they are negotiated using IKE. Signed-off-by: Herbert Xu [EMAIL PROTECTED] --- include/linux/pfkeyv2.h |3 + include/linux/xfrm.h|8 include/net/xfrm.h | 16 net/ipv4/esp4.c | 71 ++--- net/ipv6/esp6.c | 77 - net/xfrm/xfrm_algo.c| 90 net/xfrm/xfrm_user.c| 69 +++- 7 files changed, 303 insertions(+), 31 deletions(-) diff --git a/include/linux/pfkeyv2.h b/include/linux/pfkeyv2.h index d9db5f6..fb4649e 100644 --- a/include/linux/pfkeyv2.h +++ b/include/linux/pfkeyv2.h @@ -298,6 +298,9 @@ struct sadb_x_sec_ctx { #define SADB_X_EALG_BLOWFISHCBC7 #define SADB_EALG_NULL 11 #define SADB_X_EALG_AESCBC 12 +#define SADB_X_EALG_AES_GCM_ICV8 18 +#define SADB_X_EALG_AES_GCM_ICV12 19 +#define SADB_X_EALG_AES_GCM_ICV16 20 #define SADB_X_EALG_CAMELLIACBC22 #define SADB_EALG_MAX 253 /* last EALG */ /* private allocations should use 249-255 (RFC2407) */ diff --git a/include/linux/xfrm.h b/include/linux/xfrm.h index b58adc5..df2e62a 100644 --- a/include/linux/xfrm.h +++ b/include/linux/xfrm.h @@ -96,6 +96,13 @@ struct xfrm_algo { charalg_key[0]; }; +struct xfrm_algo_aead { + charalg_name[64]; + int alg_key_len;/* in bits */ + int alg_icv_len;/* in bits */ + charalg_key[0]; +}; + struct xfrm_stats { __u32 replay_window; __u32 replay; @@ -269,6 +276,7 @@ enum xfrm_attr_type_t { XFRMA_LASTUSED, XFRMA_POLICY_TYPE, /* struct xfrm_userpolicy_type */ XFRMA_MIGRATE, + XFRMA_ALG_AEAD, /* struct xfrm_algo_aead */ __XFRMA_MAX #define XFRMA_MAX (__XFRMA_MAX - 1) diff --git a/include/net/xfrm.h 
b/include/net/xfrm.h
index c49fe0f..92ee8e5 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -145,6 +145,7 @@ struct xfrm_state
 	struct xfrm_algo	*aalg;
 	struct xfrm_algo	*ealg;
 	struct xfrm_algo	*calg;
+	struct xfrm_algo_aead	*aead;
 
 	/* Data for encapsulator */
 	struct xfrm_encap_tmpl	*encap;
@@ -1021,6 +1022,10 @@ static inline int xfrm_id_proto_match(u8 proto, u8 userproto)
 /*
  * xfrm algorithm information
  */
+struct xfrm_algo_aead_info {
+	u16 icv_truncbits;
+};
+
 struct xfrm_algo_auth_info {
 	u16 icv_truncbits;
 	u16 icv_fullbits;
@@ -1035,11 +1040,20 @@ struct xfrm_algo_comp_info {
 	u16 threshold;
 };
 
+enum {
+	XFRM_ALGO_AEAD,
+	XFRM_ALGO_AUTH,
+	XFRM_ALGO_ENCR,
+	XFRM_ALGO_COMP,
+};
+
 struct xfrm_algo_desc {
 	char *name;
 	char *compat;
 	u8 available:1;
+	u16 type;
 	union {
+		struct xfrm_algo_aead_info aead;
 		struct xfrm_algo_auth_info auth;
 		struct xfrm_algo_encr_info encr;
 		struct xfrm_algo_comp_info comp;
@@ -1241,6 +1255,8 @@ extern struct xfrm_algo_desc *xfrm_calg_get_byid(int alg_id);
 extern struct xfrm_algo_desc *xfrm_aalg_get_byname(char *name, int probe);
 extern struct xfrm_algo_desc *xfrm_ealg_get_byname(char *name, int probe);
 extern struct xfrm_algo_desc *xfrm_calg_get_byname(char *name, int probe);
+extern struct xfrm_algo_desc *xfrm_aead_get_byname(char *name, int icv_len,
+						   int probe);
 
 struct hash_desc;
 struct scatterlist;
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index c1f5936..45fa230 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -412,32 +412,53 @@ static void esp_destroy(struct xfrm_state *x)
 	kfree(esp);
 }
 
-static int esp_init_state(struct xfrm_state *x)
+static int esp_init_aead(struct xfrm_state *x)
 {
-	struct esp_data *esp = NULL;
+	struct esp_data *esp = x->data;
+	struct crypto_aead *aead;
+	int err;
+
+	aead = crypto_alloc_aead(x->aead->alg_name, 0, 0);
+	err = PTR_ERR(aead);
+	if (IS_ERR(aead))
+		goto error;
+
+	esp->aead = aead;
+
+	err = crypto_aead_setkey(aead, x->aead->alg_key,
+				 (x->aead->alg_key_len + 7) / 8);
+	if (err)
+		goto error;
+
+	err = crypto_aead_setauthsize(aead, x->aead->alg_icv_len / 8);
+	if (err)
+		goto
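The new `type` field turns `xfrm_algo_desc` into a tagged union: it records which member of the union is the valid one, and `xfrm_aead_get_byname()` takes an ICV length in addition to the name. The patch only shows the declaration, so the following is a minimal userspace sketch under stated assumptions: the union name `uinfo`, the `probe` argument being dropped, lookup by name *and* ICV length, and the table contents are all illustrative, and `algo_desc`/`aead_get_byname()` are hypothetical names, not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the patch's tagged union. The union name `uinfo`
 * and the table entries below are assumptions for illustration only. */
enum { ALGO_AEAD, ALGO_AUTH, ALGO_ENCR, ALGO_COMP };

struct algo_aead_info { unsigned short icv_truncbits; };
struct algo_auth_info { unsigned short icv_truncbits, icv_fullbits; };

struct algo_desc {
	const char *name;
	unsigned short type;	/* selects the valid union member */
	union {
		struct algo_aead_info aead;
		struct algo_auth_info auth;
	} uinfo;
};

static const struct algo_desc algo_table[] = {
	{ "rfc4106(gcm(aes))", ALGO_AEAD, { .aead = { 64 } } },
	{ "rfc4106(gcm(aes))", ALGO_AEAD, { .aead = { 128 } } },
	{ "hmac(sha1)",        ALGO_AUTH, { .auth = { 96, 160 } } },
};

/* Return the AEAD entry whose name and ICV length (in bits) both match,
 * or NULL. Non-AEAD entries are skipped, so reading uinfo.aead always
 * reads the active union member. */
static const struct algo_desc *aead_get_byname(const char *name, int icv_len)
{
	size_t i;

	for (i = 0; i < sizeof(algo_table) / sizeof(algo_table[0]); i++) {
		const struct algo_desc *d = &algo_table[i];

		if (d->type != ALGO_AEAD)
			continue;
		if (d->uinfo.aead.icv_truncbits == icv_len &&
		    strcmp(d->name, name) == 0)
			return d;
	}
	return NULL;
}
```

Keying the lookup on the ICV length as well as the name lets one cipher name appear several times in the table, once per supported tag size.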
[PATCH] NETLINK : kzalloc() conversion
nl_pid_hash_alloc() is renamed to nl_pid_hash_zalloc().
It is now returning zeroed memory to its callers.

Signed-off-by: Eric Dumazet [EMAIL PROTECTED]

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 2e02b19..dbd7cad 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -237,13 +237,14 @@ found:
 	return sk;
 }
 
-static inline struct hlist_head *nl_pid_hash_alloc(size_t size)
+static inline struct hlist_head *nl_pid_hash_zalloc(size_t size)
 {
 	if (size <= PAGE_SIZE)
-		return kmalloc(size, GFP_ATOMIC);
+		return kzalloc(size, GFP_ATOMIC);
 	else
 		return (struct hlist_head *)
-			__get_free_pages(GFP_ATOMIC, get_order(size));
+			__get_free_pages(GFP_ATOMIC | __GFP_ZERO,
+					 get_order(size));
 }
 
 static inline void nl_pid_hash_free(struct hlist_head *table, size_t size)
@@ -272,11 +273,10 @@ static int nl_pid_hash_rehash(struct nl_pid_hash *hash, int grow)
 		size *= 2;
 	}
 
-	table = nl_pid_hash_alloc(size);
+	table = nl_pid_hash_zalloc(size);
 	if (!table)
 		return 0;
 
-	memset(table, 0, size);
 	otable = hash->table;
 	hash->table = table;
 	hash->mask = mask;
@@ -1919,7 +1919,7 @@ static int __init netlink_proto_init(void)
 	for (i = 0; i < MAX_LINKS; i++) {
 		struct nl_pid_hash *hash = &nl_table[i].hash;
 
-		hash->table = nl_pid_hash_alloc(1 * sizeof(*hash->table));
+		hash->table = nl_pid_hash_zalloc(1 * sizeof(*hash->table));
 		if (!hash->table) {
 			while (i-- > 0)
 				nl_pid_hash_free(nl_table[i].hash.table,
@@ -1927,7 +1927,6 @@ static int __init netlink_proto_init(void)
 			kfree(nl_table);
 			goto panic;
 		}
-		memset(hash->table, 0, 1 * sizeof(*hash->table));
 		hash->max_shift = order;
 		hash->shift = 0;
 		hash->mask = 0;
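Dropping the separate memset() calls is safe only because both allocation paths now hand back zeroed memory. The sketch below is a userspace analogue, not kernel code: it assumes calloc() stands in for kzalloc() and an anonymous mmap(), whose pages are zero-filled, stands in for __get_free_pages(... | __GFP_ZERO). The names table_zalloc() and table_free() are hypothetical.

```c
#define _DEFAULT_SOURCE		/* exposes MAP_ANONYMOUS under strict -std modes (glibc) */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t page_size(void)
{
	return (size_t)sysconf(_SC_PAGESIZE);
}

/* Small tables: calloc() returns zeroed memory, like kzalloc().
 * Large tables: anonymous mmap() pages are zero-filled, like
 * __GFP_ZERO. Either way the caller needs no memset(). */
static void *table_zalloc(size_t size)
{
	void *p;

	if (size <= page_size())
		return calloc(1, size);

	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* The caller passes the allocation size back (as nl_pid_hash_free()
 * takes `size`) so the matching release path is chosen. */
static void table_free(void *table, size_t size)
{
	if (size <= page_size())
		free(table);
	else
		munmap(table, size);
}
```

Keeping the zeroing inside the allocator means every caller, including a future one, gets initialized buckets without remembering a separate memset().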
Re: [PATCH] NETLINK : kzalloc() conversion
In article [EMAIL PROTECTED] (at Tue, 11 Dec 2007 06:40:18 +0100), Eric Dumazet [EMAIL PROTECTED] says:

> nl_pid_hash_alloc() is renamed to nl_pid_hash_zalloc().
> It is now returning zeroed memory to its callers.

I do think you do not need (and you should not) rename it
because XXX_zalloc() would imply we have raw XXX_alloc().

--yoshfuji
Re: [PATCH] NETLINK : kzalloc() conversion
YOSHIFUJI Hideaki / 吉藤英明 wrote:
> In article [EMAIL PROTECTED] (at Tue, 11 Dec 2007 06:40:18 +0100), Eric Dumazet [EMAIL PROTECTED] says:
>
>> nl_pid_hash_alloc() is renamed to nl_pid_hash_zalloc().
>> It is now returning zeroed memory to its callers.
>
> I do think you do not need (and you should not) rename it
> because XXX_zalloc() would imply we have raw XXX_alloc().

Well, it's a static function, so a single grep should satisfy the reader's concern :)
Re: [PATCH 2.6.25] netns: struct net content re-work
Eric W. Biederman wrote:
> The idea of separate structures makes sense, and seems needed and useful.
>
> Denis V. Lunev [EMAIL PROTECTED] writes:
>
>> diff --git a/include/net/netns/unix.h b/include/net/netns/unix.h
>> new file mode 100644
>> index 000..27b4e7f
>> --- /dev/null
>> +++ b/include/net/netns/unix.h
>                    ^^
> Given that we are making this per protocol, adding a separate directory to
> hold them seems to be the wrong grouping. Ideally we want everything for
> the protocol all together in the same location so it is easy to find.
> Possibly with a user/kernel split.
>
> So perhaps unix_net.h

The idea was simple:
- I can name 5 files right now
- I want them to be shown together by ls
- so, there are 2 ways, namely:
  # include/net/netns/unix.h
  # include/net/netns-unix.h

Regards,
Den

>> @@ -0,0 +1,13 @@
>> +/*
>> + * Unix network namespace
>> + */
>> +#ifndef __NETNS_UNIX_H__
>> +#define __NETNS_UNIX_H__
>> +
>> +struct ctl_table_header;
>> +struct netns_unix {
>> +	int sysctl_unix_max_dgram_qlen;
>> +	struct ctl_table_header *unix_ctl;
>> +};
>
> How about struct unix_net? I think that tracks a little better with
> how we have done struct in_device, ip6_dev and their friends.
>
> Eric
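Whatever the header and struct end up being called, the point of the re-work is that each namespace carries its own copy of the protocol's state, grouped in one sub-structure. A minimal userspace model under stated assumptions: the embedding field name `unx`, the init helper `unix_ns_init()`, and the default queue length of 10 are illustrative, not taken from the posted patch.

```c
#include <assert.h>
#include <stddef.h>

struct ctl_table_header;	/* opaque, as in the patch */

/* Per-protocol state, as in the posted include/net/netns/unix.h. */
struct netns_unix {
	int sysctl_unix_max_dgram_qlen;
	struct ctl_table_header *unix_ctl;
};

/* Hypothetical, heavily trimmed struct net: other protocols would add
 * their own sub-structures alongside `unx`. */
struct net {
	struct netns_unix unx;	/* all AF_UNIX state in one place */
};

/* Hypothetical per-namespace init: every namespace gets its own copy of
 * the tunable, so changing it in one does not affect another. */
static void unix_ns_init(struct net *net)
{
	net->unx.sysctl_unix_max_dgram_qlen = 10;	/* assumed default */
	net->unx.unix_ctl = NULL;	/* no sysctl table registered in this model */
}
```

Embedding the sub-structure by value, rather than through a pointer, keeps the per-namespace state in the same allocation as struct net.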