[PATCH RESEND v2 STABLE 4.4] futex: fix irq self-deadlock and satisfy assertion

2021-03-06 Thread Thomas Schoebel-Theuer
From: Thomas Schoebel-Theuer 

This patch and its problem analysis are specific to 4.4 LTS, due to incomplete
backporting of other fixes. Later LTS series have different backports.

Since v4.4.257, with CONFIG_PROVE_LOCKING=y,
the following triggers right after reboot of our pre-live systems,
which mirror our production setup:

Mar 03 11:27:33 icpu-test-bap10 kernel: =
Mar 03 11:27:33 icpu-test-bap10 kernel: [ INFO: inconsistent lock state ]
Mar 03 11:27:33 icpu-test-bap10 kernel: 4.4.259-rc1-grsec+ #730 Not tainted
Mar 03 11:27:33 icpu-test-bap10 kernel: -
Mar 03 11:27:33 icpu-test-bap10 kernel: inconsistent {IN-HARDIRQ-W} -> 
{HARDIRQ-ON-W} usage.
Mar 03 11:27:33 icpu-test-bap10 kernel: apache2-ssl/9310 
[HC0[0]:SC0[0]:HE1:SE1] takes:
Mar 03 11:27:33 icpu-test-bap10 kernel:  (&p->pi_lock){?.-.-.}, at: 
[] pi_state_update_owner+0x51/0xd7
Mar 03 11:27:33 icpu-test-bap10 kernel: {IN-HARDIRQ-W} state was registered at:
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
__lock_acquire+0x3a7/0xe4a
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
lock_acquire+0x18d/0x1bc
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
_raw_spin_lock_irqsave+0x3e/0x50
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
try_to_wake_up+0x2c/0x210
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
default_wake_function+0xd/0xf
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
autoremove_wake_function+0x11/0x35
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
__wake_up_common+0x48/0x7c
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
__wake_up+0x34/0x46
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
megasas_complete_int_cmd+0x31/0x33
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
megasas_complete_cmd+0x570/0x57b
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
complete_cmd_fusion+0x23e/0x33d
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
megasas_isr_fusion+0x67/0x74
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
handle_irq_event_percpu+0x134/0x311
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
handle_irq_event+0x33/0x51
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
handle_edge_irq+0xa3/0xc2
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
handle_irq+0xf9/0x101
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] do_IRQ+0x80/0xf5
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
ret_from_intr+0x0/0x20
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
arch_cpu_idle+0xa/0xc
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
default_idle_call+0x1e/0x20
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
cpu_startup_entry+0x141/0x22f
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
rest_init+0x135/0x13b
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
start_kernel+0x3fa/0x40a
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
x86_64_start_reservations+0x2a/0x2c
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
x86_64_start_kernel+0x11f/0x12c
Mar 03 11:27:33 icpu-test-bap10 kernel: irq event stamp: 1457
Mar 03 11:27:33 icpu-test-bap10 kernel: hardirqs last  enabled at (1457): 
[] get_user_pages_fast+0xeb/0x14f
Mar 03 11:27:33 icpu-test-bap10 kernel: hardirqs last disabled at (1456): 
[] get_user_pages_fast+0x5f/0x14f
Mar 03 11:27:33 icpu-test-bap10 kernel: softirqs last  enabled at (1446): 
[] release_sock+0x142/0x14d
Mar 03 11:27:33 icpu-test-bap10 kernel: softirqs last disabled at (1444): 
[] release_sock+0x34/0x14d
Mar 03 11:27:33 icpu-test-bap10 kernel:
other info that might help us debug 
this:
Mar 03 11:27:33 icpu-test-bap10 kernel:  Possible unsafe locking scenario:
Mar 03 11:27:33 icpu-test-bap10 kernel:        CPU0
Mar 03 11:27:33 icpu-test-bap10 kernel:        ----
Mar 03 11:27:33 icpu-test-bap10 kernel:   lock(&p->pi_lock);
Mar 03 11:27:33 icpu-test-bap10 kernel:   <Interrupt>
Mar 03 11:27:33 icpu-test-bap10 kernel:     lock(&p->pi_lock);
Mar 03 11:27:33 icpu-test-bap10 kernel:
 *** DEADLOCK ***
Mar 03 11:27:33 icpu-test-bap10 kernel: 2 locks held by apache2-ssl/9310:
Mar 03 11:27:33 icpu-test-bap10 kernel:  #0:  
(&(&(__futex_data.queues)[i].lock)->rlock){+.+...}, at: [] do
Mar 03 11:27:33 icpu-test-bap10 kernel:  #1:  (&lock->wait_lock){+.+...}, at: 
[] do_futex+0x639/0x809
Mar 03 11:27:33 icpu-test-bap10 kernel:
stack backtrace:
Mar 03 11:27:33 icpu-test-bap10 kernel: CPU: 13 PID: 9310 UID: 99 Comm: 
apache2-ssl Not tainted 4.4.259-rc1-grsec+ #730
Mar 03 11:27:33 icpu-test-bap10 kernel: Hardware name: Dell Inc. PowerEdge 
R630/02C2CP, BIOS 2.11.0 11/02/2019
Mar 03 11:27:33 icpu-test-bap10 kernel:   883fb79bfc00 
816f8fc2 883ffa66d300
Mar 03 11:27:33 icpu-test-bap10 kernel:  8eaa71f0 883fb79bfc50 
81088484 
Mar 03 11:27:33 icpu-test-bap10 kernel:  0001 0001 
0002 883ffa66db58
Mar 03 11:27:33 icpu-test-bap10 kernel: Call Trace:
Mar 03 11:27:33 icpu-test-bap10 kernel:  [] 
dump_stack+0x94/0xca
Mar 03 11:27:33 icpu-test-bap10 ke

[PATCH RESEND v2 STABLE 4.4] futex: fix spin_lock() / spin_unlock_irq() imbalance

2021-03-06 Thread Thomas Schoebel-Theuer
From: Thomas Schoebel-Theuer 

This patch and its problem analysis are specific to 4.4 LTS, due to incomplete
backporting of other fixes. Later LTS series have different backports.

The following is obviously incorrect:

static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this,
			 struct futex_hash_bucket *hb)
{
	[...]
	raw_spin_lock(&pi_state->pi_mutex.wait_lock);
	[...]
	raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
	[...]
}

The 4.4-specific fix should probably follow the direction of
b4abf91047c and make everything irq-safe.

Backporting b4abf91047c to 4.4 LTS could therefore be another good idea.

However, this might involve some more 4.4-specific work and
require thorough testing:

> git log --oneline v4.4..b4abf91047c -- kernel/futex.c kernel/locking/rtmutex.c | wc -l
10

So this patch is just an obvious quickfix for now.

Hint: the lock order is documented in 4.9.y and later. Similar
documentation is missing in 4.4.y. Could somebody please either backport it
as well, or write a new description if there are differences that I cannot
easily see at the moment? Without reliable docs,
inspecting the locking correctness may become a pain.
 
Signed-off-by: Thomas Schoebel-Theuer 
Cc: Thomas Gleixner 
Cc: Lee Jones 
Cc: Greg Kroah-Hartman 
Fixes: 394fc498142
Fixes: 6510e4a2d04
---
 kernel/futex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 70ad21bbb1d5..4a707bc7cceb 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1406,7 +1406,7 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this,
 	if (pi_state->owner != current)
 		return -EINVAL;
 
-	raw_spin_lock(&pi_state->pi_mutex.wait_lock);
+	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 	new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
 
 	/*
-- 
2.26.2



[PATCH STABLE 4.4] futex: fix irq self-deadlock and satisfy assertion

2021-03-05 Thread Thomas Schoebel-Theuer
From: Thomas Schoebel-Theuer 

Since v4.4.257, with CONFIG_PROVE_LOCKING=y,
the following triggers right after reboot of our pre-live systems,
which mirror our production setup:

Mar 03 11:27:33 icpu-test-bap10 kernel: =
Mar 03 11:27:33 icpu-test-bap10 kernel: [ INFO: inconsistent lock state ]
Mar 03 11:27:33 icpu-test-bap10 kernel: 4.4.259-rc1-grsec+ #730 Not tainted
Mar 03 11:27:33 icpu-test-bap10 kernel: -
Mar 03 11:27:33 icpu-test-bap10 kernel: inconsistent {IN-HARDIRQ-W} -> 
{HARDIRQ-ON-W} usage.
Mar 03 11:27:33 icpu-test-bap10 kernel: apache2-ssl/9310 
[HC0[0]:SC0[0]:HE1:SE1] takes:
Mar 03 11:27:33 icpu-test-bap10 kernel:  (&p->pi_lock){?.-.-.}, at: 
[] pi_state_update_owner+0x51/0xd7
Mar 03 11:27:33 icpu-test-bap10 kernel: {IN-HARDIRQ-W} state was registered at:
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
__lock_acquire+0x3a7/0xe4a
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
lock_acquire+0x18d/0x1bc
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
_raw_spin_lock_irqsave+0x3e/0x50
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
try_to_wake_up+0x2c/0x210
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
default_wake_function+0xd/0xf
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
autoremove_wake_function+0x11/0x35
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
__wake_up_common+0x48/0x7c
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
__wake_up+0x34/0x46
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
megasas_complete_int_cmd+0x31/0x33
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
megasas_complete_cmd+0x570/0x57b
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
complete_cmd_fusion+0x23e/0x33d
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
megasas_isr_fusion+0x67/0x74
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
handle_irq_event_percpu+0x134/0x311
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
handle_irq_event+0x33/0x51
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
handle_edge_irq+0xa3/0xc2
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
handle_irq+0xf9/0x101
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] do_IRQ+0x80/0xf5
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
ret_from_intr+0x0/0x20
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
arch_cpu_idle+0xa/0xc
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
default_idle_call+0x1e/0x20
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
cpu_startup_entry+0x141/0x22f
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
rest_init+0x135/0x13b
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
start_kernel+0x3fa/0x40a
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
x86_64_start_reservations+0x2a/0x2c
Mar 03 11:27:33 icpu-test-bap10 kernel:   [] 
x86_64_start_kernel+0x11f/0x12c
Mar 03 11:27:33 icpu-test-bap10 kernel: irq event stamp: 1457
Mar 03 11:27:33 icpu-test-bap10 kernel: hardirqs last  enabled at (1457): 
[] get_user_pages_fast+0xeb/0x14f
Mar 03 11:27:33 icpu-test-bap10 kernel: hardirqs last disabled at (1456): 
[] get_user_pages_fast+0x5f/0x14f
Mar 03 11:27:33 icpu-test-bap10 kernel: softirqs last  enabled at (1446): 
[] release_sock+0x142/0x14d
Mar 03 11:27:33 icpu-test-bap10 kernel: softirqs last disabled at (1444): 
[] release_sock+0x34/0x14d
Mar 03 11:27:33 icpu-test-bap10 kernel:
other info that might help us debug 
this:
Mar 03 11:27:33 icpu-test-bap10 kernel:  Possible unsafe locking scenario:
Mar 03 11:27:33 icpu-test-bap10 kernel:        CPU0
Mar 03 11:27:33 icpu-test-bap10 kernel:        ----
Mar 03 11:27:33 icpu-test-bap10 kernel:   lock(&p->pi_lock);
Mar 03 11:27:33 icpu-test-bap10 kernel:   <Interrupt>
Mar 03 11:27:33 icpu-test-bap10 kernel:     lock(&p->pi_lock);
Mar 03 11:27:33 icpu-test-bap10 kernel:
 *** DEADLOCK ***
Mar 03 11:27:33 icpu-test-bap10 kernel: 2 locks held by apache2-ssl/9310:
Mar 03 11:27:33 icpu-test-bap10 kernel:  #0:  
(&(&(__futex_data.queues)[i].lock)->rlock){+.+...}, at: [] do
Mar 03 11:27:33 icpu-test-bap10 kernel:  #1:  (&lock->wait_lock){+.+...}, at: 
[] do_futex+0x639/0x809
Mar 03 11:27:33 icpu-test-bap10 kernel:
stack backtrace:
Mar 03 11:27:33 icpu-test-bap10 kernel: CPU: 13 PID: 9310 UID: 99 Comm: 
apache2-ssl Not tainted 4.4.259-rc1-grsec+ #730
Mar 03 11:27:33 icpu-test-bap10 kernel: Hardware name: Dell Inc. PowerEdge 
R630/02C2CP, BIOS 2.11.0 11/02/2019
Mar 03 11:27:33 icpu-test-bap10 kernel:   883fb79bfc00 
816f8fc2 883ffa66d300
Mar 03 11:27:33 icpu-test-bap10 kernel:  8eaa71f0 883fb79bfc50 
81088484 
Mar 03 11:27:33 icpu-test-bap10 kernel:  0001 0001 
0002 883ffa66db58
Mar 03 11:27:33 icpu-test-bap10 kernel: Call Trace:
Mar 03 11:27:33 icpu-test-bap10 kernel:  [] 
dump_stack+0x94/0xca
Mar 03 11:27:33 icpu-test-bap10 kernel:  [] 
print_usage_bug+0x1bc/0x1d1
Mar 03 11:27:33 icpu-test-bap10 kernel:  [] ? 
check_usage_forwards+0x98/0x98
Mar 03 11:27:33 icpu-test-ba

[PATCH STABLE 4.4] futex: fix spin_lock() / spin_unlock_irq() imbalance

2021-03-05 Thread Thomas Schoebel-Theuer
From: Thomas Schoebel-Theuer 

The following is obviously incorrect:

static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this,
			 struct futex_hash_bucket *hb)
{
	[...]
	raw_spin_lock(&pi_state->pi_mutex.wait_lock);
	[...]
	raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock);
	[...]
}

The 4.4-specific fix should probably follow the direction of
b4abf91047c.

Backporting b4abf91047c to 4.4 LTS could be another good idea.

However, this might involve some more 4.4-specific work and
require thorough testing:

> git log --oneline v4.4..b4abf91047c -- kernel/futex.c kernel/locking/rtmutex.c | wc -l
10

So this patch is just an obvious quickfix for now.

Signed-off-by: Thomas Schoebel-Theuer 
Cc: Thomas Gleixner 
Cc: Lee Jones 
Cc: Greg Kroah-Hartman 
Fixes: 394fc498142
Fixes: 6510e4a2d04
---
 kernel/futex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 70ad21bbb1d5..4a707bc7cceb 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1406,7 +1406,7 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this,
 	if (pi_state->owner != current)
 		return -EINVAL;
 
-	raw_spin_lock(&pi_state->pi_mutex.wait_lock);
+	raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock);
 	new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
 
 	/*
-- 
2.26.2



[PATCH] sched/wait: fix endless kthread loop at timeout

2019-07-29 Thread Thomas Schoebel-Theuer
From: Thomas Schoebel-Theuer 

Scenario, possible in kernel 4.11.x and later:

1) kthread calls a waiting function with a timeout, and blocks.
2) kthread_stop() is called by somebody else.
3) The waiting condition does not change for a long time.
4) Nothing happens => normally the timeout would be reached by the kthread.

However, once kthread_should_stop() is true, the && in wait_woken() now
prevents any call to schedule_timeout().

As a consequence, the timeout value is never decreased. The timeout is
therefore never reached, and the caller spins in an endless loop,
burning the CPU in kernel mode.

This fix ensures the following semantics: kthread_should_stop() is treated
as equivalent to a timeout. This is beneficial because most users do not
want to wait for the timeout, but to stop the kthread as soon as possible.
It appears that this semantics was probably intended (otherwise the check
is_kthread_should_stop() would not make much sense), but just went wrong
due to the bug.

Here is an example, triggered by the external kernel module MARS on a
production kernel. However, the problem can be triggered by other
kthreads and on newer kernels, and also in very different scenarios,
not only during tcp_recvmsg().

In the following example, the kthread simply waits for network packets
to arrive, but in the test scenario the network had been blocked
underneath by a firewall rule in order to trigger the bug:

Mar 08 07:40:08 icpu5133 kernel: watchdog: BUG: soft lockup - CPU#29 stuck for 
23s! [mars_receiver8.:8139]
Mar 08 07:40:08 icpu5133 kernel: Modules linked in: mars(-) ip6table_mangle 
ip6table_raw iptable_raw ip_set_bitmap_port xt_DSCP xt_multiport ip_set_hash_ip 
xt_own
Mar 08 07:40:08 icpu5133 kernel: irq event stamp: 300719885
Mar 08 07:40:08 icpu5133 kernel: hardirqs last  enabled at (300719883): 
[] _raw_spin_unlock_irqrestore+0x3d/0x4f
Mar 08 07:40:08 icpu5133 kernel: hardirqs last disabled at (300719885): 
[] apic_timer_interrupt+0x82/0x90
Mar 08 07:40:08 icpu5133 kernel: softirqs last  enabled at (300719878): 
[] lock_sock_nested+0x50/0x98
Mar 08 07:40:08 icpu5133 kernel: softirqs last disabled at (300719884): 
[] release_sock+0x16/0xda
Mar 08 07:40:08 icpu5133 kernel: CPU: 29 PID: 8139 Comm: mars_receiver8. Not 
tainted 4.14.104+ #121
Mar 08 07:40:08 icpu5133 kernel: Hardware name: Dell Inc. PowerEdge 
R630/02C2CP, BIOS 2.5.5 08/16/2017
Mar 08 07:40:08 icpu5133 kernel: task: 88bf82764fc0 task.stack: 
c9001243
Mar 08 07:40:08 icpu5133 kernel: RIP: 0010:arch_local_irq_restore+0x2/0x8
Mar 08 07:40:08 icpu5133 kernel: RSP: 0018:c90012433b78 EFLAGS: 0246 
ORIG_RAX: ff10
Mar 08 07:40:08 icpu5133 kernel: RAX:  RBX: 88bf82764fc0 
RCX: fec792b4
Mar 08 07:40:08 icpu5133 kernel: RDX: c18b50d3 RSI:  
RDI: 0246
Mar 08 07:40:08 icpu5133 kernel: RBP: 0001 R08: 0001 
R09: 
Mar 08 07:40:08 icpu5133 kernel: R10: c90012433b08 R11: c90012433ba8 
R12: 0246
Mar 08 07:40:08 icpu5133 kernel: R13: 819df735 R14: 0001 
R15: 88bf82765818
Mar 08 07:40:08 icpu5133 kernel: FS:  () 
GS:88c05fb8() knlGS:
Mar 08 07:40:08 icpu5133 kernel: CS:  0010 DS:  ES:  CR0: 
80050033
Mar 08 07:40:08 icpu5133 kernel: CR2: 55abd12eb688 CR3: 0241e006 
CR4: 001606e0
Mar 08 07:40:08 icpu5133 kernel: Call Trace:
Mar 08 07:40:08 icpu5133 kernel:  lock_release+0x32f/0x33b
Mar 08 07:40:08 icpu5133 kernel:  release_sock+0x90/0xda
Mar 08 07:40:08 icpu5133 kernel:  sk_wait_data+0x7f/0x13f
Mar 08 07:40:08 icpu5133 kernel:  ? prepare_to_wait_exclusive+0xc1/0xc1
Mar 08 07:40:08 icpu5133 kernel:  tcp_recvmsg+0x4e6/0x91a
Mar 08 07:40:08 icpu5133 kernel:  ? flush_signals+0x2b/0x6a
Mar 08 07:40:08 icpu5133 kernel:  ? lock_acquire+0x20a/0x25a
Mar 08 07:40:08 icpu5133 kernel:  inet_recvmsg+0x8d/0xc0
Mar 08 07:40:08 icpu5133 kernel:  kernel_recvmsg+0x8f/0xaa
Mar 08 07:40:08 icpu5133 kernel:  ? ___might_sleep+0xf2/0x256
Mar 08 07:40:08 icpu5133 kernel:  mars_recv_raw+0x22a/0x4da [mars]
Mar 08 07:40:08 icpu5133 kernel:  desc_recv_struct+0x40/0x375 [mars]
Mar 08 07:40:08 icpu5133 kernel:  receiver_thread+0xa2/0x61a [mars]
Mar 08 07:40:08 icpu5133 kernel:  ? _hash_insert+0x160/0x160 [mars]
Mar 08 07:40:08 icpu5133 kernel:  ? kthread+0x1a6/0x1ae
Mar 08 07:40:08 icpu5133 kernel:  kthread+0x1a6/0x1ae
Mar 08 07:40:08 icpu5133 kernel:  ? __list_del_entry+0x60/0x60
Mar 08 07:40:08 icpu5133 kernel:  ret_from_fork+0x3a/0x50
Mar 08 07:40:08 icpu5133 kernel: Code: ee e8 c5 17 00 00 48 85 db 75 0e 31 f6 
48 c7 c7 c0 5f 53 82 e8 68 b9 58 00 48 89 5b 58 58 5b 5d c3 9c 58 0f 1f 44 00 
00 c3

Signed-off-by: Thomas Schoebel-Theuer 
---
 kernel/sched/wait.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/wait.c b/kernel/sched/wait.c
index c1e566a114ca..08f121154a91 100644
--- a/kernel/sched/wait.c
+

Re: Can we drop upstream Linux x32 support?

2018-12-14 Thread Thomas Schoebel-Theuer

On 12/14/18 22:41, Thomas Schöbel-Theuer wrote:

On 12/14/18 22:24, Andy Lutomirski wrote:


I'm talking about x32, which is a different beast.



So from my viewpoint the mentioned roadmap / timing requirements will 
remain the same, whatever you are dropping.


Enterprise-critical use cases will probably need to be migrated to 
KVM/qemu together with their old kernel versions anyway (because the 
original hardware will no longer be available in a few decades).




Here is a systematic approach to the problem.


AFAICS legacy 32bit userspace code (which exists in considerable quantity) 
can be executed at least in the following ways:



1) natively on 32bit-capable hardware, under 32bit kernels. Besides 
legacy hardware, this also encompasses most current Intel / AMD 64bit 
hardware in 32bit compatibility mode.


2) under 64bit kernels, using the 32bit compat layer from practically 
any kernel version.


3) under KVM/qemu.


If you drop only 1), users still have a fair chance of migrating to either 
of the other two possibilities.


As explained, a time frame of ~5 years should work for the vast majority.

If you clearly explain the migration paths to your users (and to the 
press), I think it will be acceptable.



[side note: I know of a single legacy instance which is now ~20 years 
old, but makes a revenue of several millions per month. These guys have 
large quantities of legacy hardware in stock. And they have enough money 
to hire a downstream maintainer in case of emergency.]



Fatal problems would only arise if you dropped all three 
possibilities in the very long term.



In ~100 years, possibility 3) should be sufficient for handling use 
cases like preservation of historic documents. The latter is roughly 
equivalent to running binary-only MSDOS, Windows NT, and similar, even 
in 100 years, and even non-natively under future hardware architectures.




Re: [PATCH] acpi / apei: fix NULL deref during init

2018-12-14 Thread Thomas Schoebel-Theuer

On 12/14/18 21:24, Borislav Petkov wrote:


> Because apei_resources_fini() happens under the same condition check and
> if arch_apei_filter_addr was false, it should not become true, all of a
> sudden. Or?


Hi Borislav,

please take a look at the stack trace. For some reason, and only on that 
specific hardware, the condition is false there, but later the indicated 
error exit is taken, whose message you can see immediately before the 
stack trace.


So this documents the one observed case where the NULL deref actually 
happens.


Of course, it would be possible to develop another solution, but this 
one appears the simplest and safest to me (minimum changes to the logic).


I have tested the patch on that specific hardware and verified that 
the NULL deref no longer triggers with the patch applied.


Of course, on any other hardware we have tested, the bug did not trigger 
at all.


If you don't have that specific hardware, you probably cannot easily 
trigger / verify the problem.


If you need access to the specific hardware, talk to me in a private 
conversation.


Cheers,

Thomas


[PATCH] acpi / apei: fix NULL deref during init

2018-12-14 Thread Thomas Schoebel-Theuer
Since commit d91525eb8ee6 ("ACPI, EINJ: Enhance error injection
tolerance level"), starting with kernel 4.0, the following happens during
boot on a specific old hardware:

APEI: Can not request [mem 0x0009c2f2-0x0009c2fc] for APEI ERST registers
BUG: unable to handle kernel NULL pointer dereference at   (null)
IP: [] __list_del_entry+0x5c/0x98
PGD 0
Oops:  [#1] SMP
Modules linked in:
CPU: 0 PID: 1 UID: 0 Comm: swapper/0 Not tainted 
4.4.0-ui18344.004-uiabi1-infong-amd64 #1
Hardware name: IBM IBM eServer BladeCenter HS12 -[8028Z5S]-/Server Blade, BIOS 
-[N1E150AUS-1.11]- 11/04/2010
task: 88021fe4e040 ti: 88021fe7c000 task.ti: 88021fe7c000
RIP: 0010:[]  [] __list_del_entry+0x5c/0x98
RSP: :88021fe7fd18  EFLAGS: 00010207
RAX:  RBX: 88021fe7fde0 RCX: 88021fe7fde0
RDX: 819bd040 RSI: dead0200 RDI: 88021fe7fde0
RBP: 88021fe7fd18 R08:  R09: 
R10: 816ce240 R11: 0001 R12: 819bd040
R13: 88021fe7fda0 R14: 88021d2cd840 R15: 
FS:  () GS:88022fc0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2:  CR3: 019b6000 CR4: 00040670
Stack:
 88021fe7fd30 81343dd7 88021fe7fde0 88021fe7fd58
 813931c0 88021fe7fda0 88021fe7fe00 88021d2cd840
 88021fe7fd70 813931e5 ffea 88021fe7fdf0
Call Trace:
 [] list_del+0xd/0x25
 [] apei_res_clean+0x1f/0x37
 [] apei_resources_fini+0xd/0x19
 [] apei_resources_request+0x24f/0x268
 [] ? apei_exec_for_each_entry+0x77/0x8e
 [] ? setup_erst_disable+0x12/0x12
 [] erst_init+0xed/0x2ca
 [] ? do_one_initcall+0x8c/0x174
 [] ? setup_erst_disable+0x12/0x12
 [] ? setup_erst_disable+0x12/0x12
 [] do_one_initcall+0xe9/0x174
 [] ? parse_args+0x161/0x296
 [] kernel_init_freeable+0x169/0x1f6
 [] ? do_early_param+0x88/0x88
 [] ? rest_init+0x79/0x79
 [] kernel_init+0x9/0xd5
 [] ret_from_fork+0x55/0x80
 [] ? rest_init+0x79/0x79
Code: 02 00 00 00 00 ad de 48 39 f0 75 1f 49 89 c0 48 c7 c2 38 de 8e 81 be 38 
00 00 00 48 c7 c7 13 dd 8e 81 31 c0 e8 94 36 d0 ff eb 3a <48> 8b 30 48 39 fe 74 
11 49 89 f0 48 c7 c2 6c de 8e 81 be 3b 00
RIP  [] __list_del_entry+0x5c/0x98
 RSP 
CR2: 
---[ end trace 3610e544cef27e81 ]---
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0009

The reason is the conditional initialization of the variable arch_res, which
happens only if arch_apei_filter_addr is set. When it is not set, the
variable remains uninitialized.

This may later trigger a splat, e.g. when some error path is taken.

Solution: do the initialisation unconditionally. Also as a safeguard.

Fixes: d91525eb8ee6a622ce476955fe1a2530ade87c83
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/acpi/apei/apei-base.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/apei-base.c b/drivers/acpi/apei/apei-base.c
index da370e1d31f4..ef931b8a0b11 100644
--- a/drivers/acpi/apei/apei-base.c
+++ b/drivers/acpi/apei/apei-base.c
@@ -494,8 +494,8 @@ int apei_resources_request(struct apei_resources *resources,
 	if (rc)
 		goto nvs_res_fini;
 
+	apei_resources_init(&arch_res);
 	if (arch_apei_filter_addr) {
-		apei_resources_init(&arch_res);
 		rc = apei_get_arch_resources(&arch_res);
 		if (rc)
 			goto arch_res_fini;
-- 
2.12.3



Re: [RFC 00/32] State of MARS Reo-Redundancy Module

2016-12-30 Thread Thomas Schoebel-Theuer

Typo correction:

On 12/30/2016 11:57 PM, Thomas Schoebel-Theuer wrote:

> standalone servers with local hardware RAIDs. They are hosting about
> 500 MARS resources (originally DRBD resources) just for the web servers;


This must read 2500. Somehow the leading "2" was eaten at wraparound.



[RFC 28/32] mars: add new module mars_proc

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/mars/mars_proc.c | 389 ++
 drivers/staging/mars/mars/mars_proc.h |  34 +++
 2 files changed, 423 insertions(+)
 create mode 100644 drivers/staging/mars/mars/mars_proc.c
 create mode 100644 drivers/staging/mars/mars/mars_proc.h

diff --git a/drivers/staging/mars/mars/mars_proc.c b/drivers/staging/mars/mars/mars_proc.c
new file mode 100644
index 000000000000..84b4dfc82211
--- /dev/null
+++ b/drivers/staging/mars/mars/mars_proc.c
@@ -0,0 +1,389 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "strategy.h"
+#include "mars_proc.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+xio_info_fn xio_info;
+
+static
+int trigger_sysctl_handler(
+   struct ctl_table *table,
+   int write,
+   void __user *buffer,
+   size_t *length,
+   loff_t *ppos)
+{
+   ssize_t res = 0;
+   size_t len = *length;
+
+   XIO_DBG("write = %d len = %ld pos = %lld\n", write, len, *ppos);
+
+   if (!len || *ppos > 0)
+   goto done;
+
+   if (write) {
+   char tmp[8] = {};
+
+   res = len; /*  fake consumption of all data */
+
+   if (len > 7)
+   len = 7;
+   if (!copy_from_user(tmp, buffer, len)) {
+   int code = 0;
+   int status = kstrtoint(tmp, 10, &code);
+
+   /* the return value from kstrtoint() does not matter */
+   (void)status;
+   if (code > 0)
+   local_trigger();
+   if (code > 1)
+   remote_trigger();
+   }
+   } else {
+   char *answer = "MARS module not operational\n";
+   char *tmp = NULL;
+   int mylen;
+
+   if (xio_info) {
+   answer = "internal error while determining xio_info\n";
+   tmp = xio_info();
+   if (tmp)
+   answer = tmp;
+   }
+
+   mylen = strlen(answer);
+   if (len > mylen)
+   len = mylen;
+   res = len;
+   if (copy_to_user(buffer, answer, len)) {
+   XIO_ERR("write %ld bytes at %p failed\n", len, buffer);
+   res = -EFAULT;
+   }
+   brick_string_free(tmp);
+   }
+
+done:
+   XIO_DBG("res = %ld\n", res);
+   *length = res;
+   if (res >= 0) {
+   *ppos += res;
+   return 0;
+   }
+   return res;
+}
+
+static
+int lamport_sysctl_handler(
+   struct ctl_table *table,
+   int write,
+   void __user *buffer,
+   size_t *length,
+   loff_t *ppos)
+{
+   ssize_t res = 0;
+   size_t len = *length;
+   int my_len = 128;
+   char *tmp = brick_string_alloc(my_len);
+   struct timespec know = CURRENT_TIME;
+   struct timespec lnow;
+
+   XIO_DBG("write = %d len = %ld pos = %lld\n", write, len, *ppos);
+
+   if (!len || *ppos > 0)
+   goto done;
+
+   if (write)
+   return -EINVAL;
+
+   get_lamport(&lnow);
+
+   res = scnprintf(
+   tmp,
+   my_len,
+   "CURRENT_TIME=%ld.%09ld\nlamport_now=%ld.%09ld\n",
+   know.tv_sec, know.tv_nsec,
+   lnow.tv_sec, lnow.tv_nsec
+   );
+
+   if (copy_to_user(buffer, tmp, res)) {
+   XIO_ERR("write %ld bytes at %p failed\n", res, buffer);
+   res = -EFAULT;
+   }
+   brick_string_free(tmp);
+
+done:
+   XIO_DBG("res = %ld\n", res);
+   *length = res;
+   if (res >= 0) {
+   *ppos += res;
+   return 0;
+   }
+   return res;
+}
+
+#ifdef CTL_UNNUMBERED
+#define _CTL_NAME  .ctl_name = CTL_UNNUMBERED,
+#define _CTL_STRATEGY(handler) .strategy = &handler,
+#else

[RFC 02/32] mars: add new module brick_say

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/brick_say.c | 920 +++
 include/linux/brick/brick_say.h  |  89 
 2 files changed, 1009 insertions(+)
 create mode 100644 drivers/staging/mars/brick_say.c
 create mode 100644 include/linux/brick/brick_say.h

diff --git a/drivers/staging/mars/brick_say.c b/drivers/staging/mars/brick_say.c
new file mode 100644
index 000000000000..f3bb49a0dfc3
--- /dev/null
+++ b/drivers/staging/mars/brick_say.c
@@ -0,0 +1,920 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+/***/
+
+/*  messaging */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+
+#ifndef GFP_BRICK
+#define GFP_BRICK  GFP_NOIO
+#endif
+
+#define SAY_ORDER  0
+#define SAY_BUFMAX (PAGE_SIZE << SAY_ORDER)
+#define SAY_BUF_LIMIT  (SAY_BUFMAX - 1500)
+#define MAX_FILELEN 16
+#define MAX_IDS 1000
+
+const char *say_class[MAX_SAY_CLASS] = {
+   [SAY_DEBUG] = "debug",
+   [SAY_INFO] = "info",
+   [SAY_WARN] = "warn",
+   [SAY_ERROR] = "error",
+   [SAY_FATAL] = "fatal",
+   [SAY_TOTAL] = "total",
+};
+
+int brick_say_logging = 1;
+
+module_param_named(say_logging, brick_say_logging, int, 0);
+int brick_say_debug;
+
+module_param_named(say_debug, brick_say_debug, int, 0);
+
+int brick_say_syslog_min = 1;
+int brick_say_syslog_max = -1;
+int brick_say_syslog_flood_class = 3;
+int brick_say_syslog_flood_limit = 20;
+int brick_say_syslog_flood_recovery = 300;
+
+int delay_say_on_overflow =
+#ifdef CONFIG_MARS_DEBUG
+   1;
+#else
+   0;
+#endif
+
+static atomic_t say_alloc_channels = ATOMIC_INIT(0);
+static atomic_t say_alloc_names = ATOMIC_INIT(0);
+static atomic_t say_alloc_pages = ATOMIC_INIT(0);
+
+static unsigned long flood_start_jiffies;
+static int flood_count;
+
+struct say_channel {
+   char *ch_name;
+   struct say_channel *ch_next;
+
+   /* protect against concurrent writes */
+   spinlock_t ch_lock[MAX_SAY_CLASS];
+   char *ch_buf[MAX_SAY_CLASS][2];
+
+   short ch_index[MAX_SAY_CLASS];
+   struct file *ch_filp[MAX_SAY_CLASS][2];
+   int ch_overflow[MAX_SAY_CLASS];
+   bool ch_written[MAX_SAY_CLASS];
+   bool ch_rollover;
+   bool ch_must_exist;
+   bool ch_is_dir;
+   bool ch_delete;
+   int ch_status_written;
+   int ch_id_max;
+   void *ch_ids[MAX_IDS];
+
+   wait_queue_head_t ch_progress;
+};
+
+struct say_channel *default_channel;
+
+static struct say_channel *channel_list;
+
+static rwlock_t say_lock = __RW_LOCK_UNLOCKED(say_lock);
+
+static struct task_struct *say_thread;
+
+static DECLARE_WAIT_QUEUE_HEAD(say_event);
+
+bool say_dirty;
+
+#define use_atomic() \
+	((preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK)) != 0 || irqs_disabled())
+
+static
+void wait_channel(struct say_channel *ch, int class)
+{
+	if (delay_say_on_overflow && ch->ch_index[class] > SAY_BUF_LIMIT) {
+		if (!use_atomic()) {
+			say_dirty = true;
+			wake_up_interruptible(&say_event);
+			wait_event_interruptible_timeout(
+				ch->ch_progress, ch->ch_index[class] < SAY_BUF_LIMIT, HZ / 10);
+		}
+	}
+}
+
+static
+struct say_channel *find_channel(const void *id)
+{
+	struct say_channel *res = default_channel;
+	struct say_channel *ch;
+
+	read_lock(&say_lock);
+	for (ch = channel_list; ch; ch = ch->ch_next) {
+		int i;
+
+		for (i = 0; i < ch->ch_id_max; i++) {
+			if (ch->ch_ids[i] == id) {
+				res = ch;
+				goto found;
+			}
+		}
+	}
+found:
+	read_unlock(&say_lock);
+	return res;
+}
+
+static
+void _remove_binding(struct task_str

[RFC 28/32] mars: add new module mars_proc

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/mars/mars_proc.c | 389 ++
 drivers/staging/mars/mars/mars_proc.h |  34 +++
 2 files changed, 423 insertions(+)
 create mode 100644 drivers/staging/mars/mars/mars_proc.c
 create mode 100644 drivers/staging/mars/mars/mars_proc.h

diff --git a/drivers/staging/mars/mars/mars_proc.c b/drivers/staging/mars/mars/mars_proc.c
new file mode 100644
index ..84b4dfc82211
--- /dev/null
+++ b/drivers/staging/mars/mars/mars_proc.c
@@ -0,0 +1,389 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "strategy.h"
+#include "mars_proc.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+xio_info_fn xio_info;
+
+static
+int trigger_sysctl_handler(
+	struct ctl_table *table,
+	int write,
+	void __user *buffer,
+	size_t *length,
+	loff_t *ppos)
+{
+	ssize_t res = 0;
+	size_t len = *length;
+
+	XIO_DBG("write = %d len = %ld pos = %lld\n", write, len, *ppos);
+
+	if (!len || *ppos > 0)
+		goto done;
+
+	if (write) {
+		char tmp[8] = {};
+
+		res = len; /*  fake consumption of all data */
+
+		if (len > 7)
+			len = 7;
+		if (!copy_from_user(tmp, buffer, len)) {
+			int code = 0;
+			int status = kstrtoint(tmp, 10, &code);
+
+			/* the return value from kstrtoint() does not matter */
+			(void)status;
+			if (code > 0)
+				local_trigger();
+			if (code > 1)
+				remote_trigger();
+		}
+	} else {
+		char *answer = "MARS module not operational\n";
+		char *tmp = NULL;
+		int mylen;
+
+		if (xio_info) {
+			answer = "internal error while determining xio_info\n";
+			tmp = xio_info();
+			if (tmp)
+				answer = tmp;
+		}
+
+		mylen = strlen(answer);
+		if (len > mylen)
+			len = mylen;
+		res = len;
+		if (copy_to_user(buffer, answer, len)) {
+			XIO_ERR("write %ld bytes at %p failed\n", len, buffer);
+			res = -EFAULT;
+		}
+		brick_string_free(tmp);
+	}
+
+done:
+	XIO_DBG("res = %ld\n", res);
+	*length = res;
+	if (res >= 0) {
+		*ppos += res;
+		return 0;
+	}
+	return res;
+}
+
+static
+int lamport_sysctl_handler(
+	struct ctl_table *table,
+	int write,
+	void __user *buffer,
+	size_t *length,
+	loff_t *ppos)
+{
+	ssize_t res = 0;
+	size_t len = *length;
+	int my_len = 128;
+	char *tmp = brick_string_alloc(my_len);
+	struct timespec know = CURRENT_TIME;
+	struct timespec lnow;
+
+	XIO_DBG("write = %d len = %ld pos = %lld\n", write, len, *ppos);
+
+	if (!len || *ppos > 0)
+		goto done;
+
+	if (write)
+		return -EINVAL;
+
+	get_lamport(&lnow);
+
+	res = scnprintf(
+		tmp,
+		my_len,
+		"CURRENT_TIME=%ld.%09ld\nlamport_now=%ld.%09ld\n",
+		know.tv_sec, know.tv_nsec,
+		lnow.tv_sec, lnow.tv_nsec
+	);
+
+	if (copy_to_user(buffer, tmp, res)) {
+		XIO_ERR("write %ld bytes at %p failed\n", res, buffer);
+		res = -EFAULT;
+	}
+	brick_string_free(tmp);
+
+done:
+	XIO_DBG("res = %ld\n", res);
+	*length = res;
+	if (res >= 0) {
+		*ppos += res;
+		return 0;
+	}
+	return res;
+}
+
+#ifdef CTL_UNNUMBERED
+#define _CTL_NAME  .ctl_name = CTL_UNNUMBERED,
+#define _CTL_STRATEGY(handler) .strategy = &handler,
+#else
+#def

[RFC 09/32] mars: add new module lib_rank

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/lib/lib_rank.c |  87 +++
 include/linux/brick/lib_rank.h  | 136 
 2 files changed, 223 insertions(+)
 create mode 100644 drivers/staging/mars/lib/lib_rank.c
 create mode 100644 include/linux/brick/lib_rank.h

diff --git a/drivers/staging/mars/lib/lib_rank.c b/drivers/staging/mars/lib/lib_rank.c
new file mode 100644
index ..6327479039b6
--- /dev/null
+++ b/drivers/staging/mars/lib/lib_rank.c
@@ -0,0 +1,87 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/*  (c) 2012 Thomas Schoebel-Theuer */
+
+#include 
+#include 
+
+#include 
+
+void ranking_compute(struct rank_data *rkd, const struct rank_info rki[], int x)
+{
+	int points = 0;
+	int i;
+
+	for (i = 0; ; i++) {
+		int x0;
+		int x1;
+		int y0;
+		int y1;
+
+		x0 = rki[i].rki_x;
+		if (x < x0)
+			break;
+
+		x1 = rki[i + 1].rki_x;
+
+		if (unlikely(x1 == RKI_DUMMY)) {
+			points = rki[i].rki_y;
+			break;
+		}
+
+		if (x > x1)
+			continue;
+
+		y0 = rki[i].rki_y;
+		y1 = rki[i + 1].rki_y;
+
+		/*  linear interpolation */
+		points = ((long long)(x - x0) * (long long)(y1 - y0)) / (x1 - x0) + y0;
+		break;
+	}
+	rkd->rkd_tmp += points;
+}
+
+int ranking_select(struct rank_data rkd[], int rkd_count)
+{
+	int res = -1;
+	long long max = LLONG_MIN / 2;
+	int i;
+
+	for (i = 0; i < rkd_count; i++) {
+		struct rank_data *tmp = &rkd[i];
+		long long rest = tmp->rkd_current_points;
+
+		if (rest <= 0)
+			continue;
+		/* rest -= tmp->rkd_got; */
+		if (rest > max) {
+			max = rest;
+			res = i;
+		}
+	}
+	/* Prevent underflow in the long term
+	 * and reset the "clocks" after each round of
+	 * weighted round-robin selection.
+	 */
+	if (max < 0 && res >= 0) {
+		for (i = 0; i < rkd_count; i++)
+			rkd[i].rkd_got += max;
+	}
+	return res;
+}
diff --git a/include/linux/brick/lib_rank.h b/include/linux/brick/lib_rank.h
new file mode 100644
index ..fa18fdf15597
--- /dev/null
+++ b/include/linux/brick/lib_rank.h
@@ -0,0 +1,136 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/*  (c) 2012 Thomas Schoebel-Theuer */
+
+#ifndef LIB_RANK_H
+#define LIB_RANK_H
+
+/* Generic round-robin scheduler based on ranking information.
+ */
+
+#define RKI_DUMMY  INT_MIN
+
+struct rank_info {
+   int rki_x;
+   int rki_y;
+};
+
+struct rank_data {
+   /*  public readonly */
+   long long rkd_current_points;
+
+   /*  private */
+   long long rkd_tmp;
+   long long rkd_got;
+};
+
+/* Ranking phase.
+ *
+ * Calls should follow the following usage pattern:
+ *
+ * ranking_start(...);
+ * for (...) {
+ *	ranking_compute(&rkd[this_time], ...);
+ *	// usually you need at least 1 call for each rkd[] element,
+ *	// but you can call more often to include ranking information
+ *	// from many different sources.
+ *	// Note: instead / additionally, you may also use
+ *	// ranking_add() or ranking_override().
+ * }
+ * ranking_stop(...);
+ *
+ * = > now the new ra
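
The piecewise-linear interpolation performed by ranking_compute() in lib_rank.c above can be tried outside the kernel. The following is only a userspace sketch under stated assumptions: rank_points() is a hypothetical helper that does not exist in the patch, it returns the points instead of adding them to rkd_tmp, and it assumes that the rki_x values are strictly increasing and that the table ends with an RKI_DUMMY entry.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define RKI_DUMMY INT_MIN	/* same sentinel value as in lib_rank.h */

struct rank_info {
	int rki_x;
	int rki_y;
};

/* Hypothetical helper (not part of the patch): mirrors the loop of
 * ranking_compute(), but returns the computed points to the caller.
 * Assumes strictly increasing rki_x and a terminating RKI_DUMMY entry.
 */
static int rank_points(const struct rank_info rki[], int x)
{
	int i;

	assert(rki != NULL);
	for (i = 0; ; i++) {
		int x0 = rki[i].rki_x;
		int x1;
		int y0;
		int y1;

		/* left of the first supporting point: no points */
		if (x < x0)
			return 0;

		x1 = rki[i + 1].rki_x;
		/* right of the last supporting point: keep its value */
		if (x1 == RKI_DUMMY)
			return rki[i].rki_y;

		/* not in this segment, try the next one */
		if (x > x1)
			continue;

		y0 = rki[i].rki_y;
		y1 = rki[i + 1].rki_y;

		/* linear interpolation between (x0, y0) and (x1, y1) */
		return (int)(((long long)(x - x0) * (long long)(y1 - y0)) / (x1 - x0) + y0);
	}
}
```

With a table such as { {0, 0}, {10, 100}, {20, 100}, {RKI_DUMMY, 0} }, an input of x = 5 falls into the first segment and is interpolated halfway between 0 and 100, while any x above 20 keeps the value of the last real entry.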

[RFC 07/32] mars: add new module lib_pairing_heap

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 include/linux/brick/lib_pairing_heap.h | 109 +
 1 file changed, 109 insertions(+)
 create mode 100644 include/linux/brick/lib_pairing_heap.h

diff --git a/include/linux/brick/lib_pairing_heap.h b/include/linux/brick/lib_pairing_heap.h
new file mode 100644
index ..9456e9ea348c
--- /dev/null
+++ b/include/linux/brick/lib_pairing_heap.h
@@ -0,0 +1,109 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef PAIRING_HEAP_H
+#define PAIRING_HEAP_H
+
+/* Algorithm: see http://en.wikipedia.org/wiki/Pairing_heap
+ * This is just an efficient translation from recursive to iterative form.
+ *
+ * Note: find_min() is so trivial that we don't implement it.
+ */
+
+/* generic version: KEYDEF is kept separate, allowing you to
+ * embed this structure into other container structures already
+ * possessing some key (just provide an empty KEYDEF in this case).
+ */
+#define _PAIRING_HEAP_TYPEDEF(KEYTYPE, KEYDEF) \
+	\
+struct pairing_heap_##KEYTYPE { \
+	KEYDEF \
+	struct pairing_heap_##KEYTYPE *next; \
+	struct pairing_heap_##KEYTYPE *subheaps; \
+}
+
+/* less generic version: define the key inside.
+ */
+#define PAIRING_HEAP_TYPEDEF(KEYTYPE)  \
+   _PAIRING_HEAP_TYPEDEF(KEYTYPE, KEYTYPE key;)
+
+/* generic methods: allow arbitrary CMP() functions.
+ */
+#define _PAIRING_HEAP_FUNCTIONS(_STATIC, KEYTYPE, CMP) \
+	\
+_STATIC \
+struct pairing_heap_##KEYTYPE *_ph_merge_##KEYTYPE( \
+struct pairing_heap_##KEYTYPE *heap1, struct pairing_heap_##KEYTYPE *heap2) \
+{ \
+	if (!heap1) \
+		return heap2; \
+	if (!heap2) \
+		return heap1; \
+	if (CMP(heap1, heap2) < 0) { \
+		heap2->next = heap1->subheaps; \
+		heap1->subheaps = heap2; \
+		return heap1; \
+	} \
+	heap1->next = heap2->subheaps; \
+	heap2->subheaps = heap1; \
+	return heap2; \
+} \
+	\
+_STATIC \
+void ph_insert_##KEYTYPE(struct pairing_heap_##KEYTYPE **heap, struct pairing_heap_##KEYTYPE *new) \
+{ \
+	new->next = NULL; \
+	new->subheaps = NULL; \
+	*heap = _ph_merge_##KEYTYPE(*heap, new); \
+} \
+	\
+_STATIC \
+void ph_delete_min_##KEYTYPE(struct pairing_heap_##KEYTYPE **heap) \
+{ \
+	struct pairing_heap_##KEYTYPE *tmplist = NULL; \
+	struct pairing_heap_##KEYTYPE *ptr; \
+	struct pairing_heap_##KEYTYPE *next; \
+   struct pairing_heap_##KEYTYPE *res;
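
The merge and insert steps of the macros above are easier to follow when written out for one concrete key type. The following is a minimal userspace sketch, assuming a plain int key and a direct `<` comparison in place of CMP(); struct ph_node, ph_merge() and ph_insert() are hypothetical names chosen for illustration and are not part of the patch. ph_delete_min is left out because its body is cut off in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical non-macro variant of struct pairing_heap_##KEYTYPE */
struct ph_node {
	int key;
	struct ph_node *next;		/* next sibling in the parent's child list */
	struct ph_node *subheaps;	/* first child */
};

/* Merge two heaps: the root with the larger key becomes
 * the first child of the root with the smaller key.
 */
static struct ph_node *ph_merge(struct ph_node *heap1, struct ph_node *heap2)
{
	if (!heap1)
		return heap2;
	if (!heap2)
		return heap1;
	if (heap1->key < heap2->key) {
		heap2->next = heap1->subheaps;
		heap1->subheaps = heap2;
		return heap1;
	}
	heap1->next = heap2->subheaps;
	heap2->subheaps = heap1;
	return heap2;
}

/* Insert is a merge with a single-element heap. */
static void ph_insert(struct ph_node **heap, struct ph_node *new_node)
{
	assert(heap != NULL && new_node != NULL);
	new_node->next = NULL;
	new_node->subheaps = NULL;
	*heap = ph_merge(*heap, new_node);
}
```

After any sequence of inserts the minimum is simply the root, `*heap`, which matches the remark in the header comment that find_min() is too trivial to implement.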

[RFC 06/32] mars: add new module brick

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/brick.c | 723 +++
 include/linux/brick/brick.h  | 620 +
 2 files changed, 1343 insertions(+)
 create mode 100644 drivers/staging/mars/brick.c
 create mode 100644 include/linux/brick/brick.h

diff --git a/drivers/staging/mars/brick.c b/drivers/staging/mars/brick.c
new file mode 100644
index ..be741e896fc9
--- /dev/null
+++ b/drivers/staging/mars/brick.c
@@ -0,0 +1,723 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#define _STRATEGY
+
+#include 
+#include 
+
+//
+
+/*  init / exit functions */
+
+void _generic_output_init(
+struct generic_brick *brick, const struct generic_output_type *type, struct generic_output *output)
+{
+	output->brick = brick;
+	output->type = type;
+	output->ops = type->master_ops;
+	output->nr_connected = 0;
+	INIT_LIST_HEAD(&output->output_head);
+}
+
+void _generic_output_exit(struct generic_output *output)
+{
+	list_del_init(&output->output_head);
+	output->brick = NULL;
+	output->type = NULL;
+	output->ops = NULL;
+	output->nr_connected = 0;
+}
+
+int generic_brick_init(const struct generic_brick_type *type, struct generic_brick *brick)
+{
+	brick->aspect_context.brick_index = get_brick_nr();
+	brick->type = type;
+	brick->ops = type->master_ops;
+	brick->nr_inputs = 0;
+	brick->nr_outputs = 0;
+	brick->power.off_led = true;
+	init_waitqueue_head(&brick->power.event);
+	INIT_LIST_HEAD(&brick->tmp_head);
+	return 0;
+}
+
+void generic_brick_exit(struct generic_brick *brick)
+{
+	list_del_init(&brick->tmp_head);
+	brick->type = NULL;
+	brick->ops = NULL;
+	brick->nr_inputs = 0;
+	brick->nr_outputs = 0;
+	put_brick_nr(brick->aspect_context.brick_index);
+}
+
+int generic_input_init(
+struct generic_brick *brick, int index, const struct generic_input_type *type, struct generic_input *input)
+{
+	if (index < 0 || index >= brick->type->max_inputs)
+		return -EINVAL;
+	if (brick->inputs[index])
+		return -EEXIST;
+	input->brick = brick;
+	input->type = type;
+	input->connect = NULL;
+	INIT_LIST_HEAD(&input->input_head);
+	brick->inputs[index] = input;
+	brick->nr_inputs++;
+	return 0;
+}
+
+void generic_input_exit(struct generic_input *input)
+{
+	list_del_init(&input->input_head);
+	input->brick = NULL;
+	input->type = NULL;
+	input->connect = NULL;
+}
+
+int generic_output_init(
+struct generic_brick *brick, int index, const struct generic_output_type *type, struct generic_output *output)
+{
+	if (index < 0 || index >= brick->type->max_outputs)
+		return -ENOMEM;
+	if (brick->outputs[index])
+		return -EEXIST;
+	_generic_output_init(brick, type, output);
+	brick->outputs[index] = output;
+	brick->nr_outputs++;
+	return 0;
+}
+
+int generic_size(const struct generic_brick_type *brick_type)
+{
+	int size = brick_type->brick_size;
+	int i;
+
+	size += brick_type->max_inputs * sizeof(void *);
+	for (i = 0; i < brick_type->max_inputs; i++)
+		size += brick_type->default_input_types[i]->input_size;
+	size += brick_type->max_outputs * sizeof(void *);
+	for (i = 0; i < brick_type->max_outputs; i++)
+		size += brick_type->default_output_types[i]->output_size;
+	return size;
+}
+
+int generic_connect(struct generic_input *input, struct generic_output *output)
+{
+	BRICK_DBG("generic_connect(input=%p, output=%p)\n", input, output);
+	if (unlikely(!input || !output))
+		return -EINVAL;
+	if (unlikely(input->connect))
+		return -EEXIST;
+	if (unlikely(!list_empty(&input->input_head)))
+		return -EINVAL;
+	/*  helps only against the most common errors */
+	if (unlikely(input->brick == output->brick))
+ 

[RFC 09/32] mars: add new module lib_rank

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/lib/lib_rank.c |  87 +++
 include/linux/brick/lib_rank.h  | 136 
 2 files changed, 223 insertions(+)
 create mode 100644 drivers/staging/mars/lib/lib_rank.c
 create mode 100644 include/linux/brick/lib_rank.h

diff --git a/drivers/staging/mars/lib/lib_rank.c 
b/drivers/staging/mars/lib/lib_rank.c
new file mode 100644
index ..6327479039b6
--- /dev/null
+++ b/drivers/staging/mars/lib/lib_rank.c
@@ -0,0 +1,87 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/*  (c) 2012 Thomas Schoebel-Theuer */
+
+#include 
+#include 
+
+#include 
+
+void ranking_compute(struct rank_data *rkd, const struct rank_info rki[], int 
x)
+{
+   int points = 0;
+   int i;
+
+   for (i = 0; ; i++) {
+   int x0;
+   int x1;
+   int y0;
+   int y1;
+
+   x0 = rki[i].rki_x;
+   if (x < x0)
+   break;
+
+   x1 = rki[i + 1].rki_x;
+
+   if (unlikely(x1 == RKI_DUMMY)) {
+   points = rki[i].rki_y;
+   break;
+   }
+
+   if (x > x1)
+   continue;
+
+   y0 = rki[i].rki_y;
+   y1 = rki[i + 1].rki_y;
+
+   /*  linear interpolation */
+   points = ((long long)(x - x0) * (long long)(y1 - y0)) / (x1 - 
x0) + y0;
+   break;
+   }
+   rkd->rkd_tmp += points;
+}
+
+int ranking_select(struct rank_data rkd[], int rkd_count)
+{
+   int res = -1;
+   long long max = LLONG_MIN / 2;
+   int i;
+
+   for (i = 0; i < rkd_count; i++) {
+   struct rank_data *tmp = [i];
+   long long rest = tmp->rkd_current_points;
+
+   if (rest <= 0)
+   continue;
+   /* rest -= tmp->rkd_got; */
+   if (rest > max) {
+   max = rest;
+   res = i;
+   }
+   }
+   /* Prevent underflow in the long term
+* and reset the "clocks" after each round of
+* weighted round-robin selection.
+*/
+   if (max < 0 && res >= 0) {
+   for (i = 0; i < rkd_count; i++)
+   rkd[i].rkd_got += max;
+   }
+   return res;
+}
diff --git a/include/linux/brick/lib_rank.h b/include/linux/brick/lib_rank.h
new file mode 100644
index ..fa18fdf15597
--- /dev/null
+++ b/include/linux/brick/lib_rank.h
@@ -0,0 +1,136 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/*  (c) 2012 Thomas Schoebel-Theuer */
+
+#ifndef LIB_RANK_H
+#define LIB_RANK_H
+
+/* Generic round-robin scheduler based on ranking information.
+ */
+
+#define RKI_DUMMY  INT_MIN
+
+struct rank_info {
+   int rki_x;
+   int rki_y;
+};
+
+struct rank_data {
+   /*  public readonly */
+   long long rkd_current_points;
+
+   /*  private */
+   long long rkd_tmp;
+   long long rkd_got;
+};
+
+/* Ranking phase.
+ *
+ * Calls should follow the following usage pattern:
+ *
+ * ranking_start(...);
+ * for (...) {
+ *ranking_compute([this_time], ...);
+ *// usually you need at least 1 call for each rkd[] element,
+ *// but you can call more often to include ranking information
+ *// from many different sources.
+ *// Note: instead / additionally, you may also use
+ *// ranking_add() or ranking_override().
+ * }
+ * ranking_stop(...);
+ *
+ * = > now the new ranking values are computed and already activ

[RFC 07/32] mars: add new module lib_pairing_heap

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 include/linux/brick/lib_pairing_heap.h | 109 +
 1 file changed, 109 insertions(+)
 create mode 100644 include/linux/brick/lib_pairing_heap.h

diff --git a/include/linux/brick/lib_pairing_heap.h 
b/include/linux/brick/lib_pairing_heap.h
new file mode 100644
index ..9456e9ea348c
--- /dev/null
+++ b/include/linux/brick/lib_pairing_heap.h
@@ -0,0 +1,109 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef PAIRING_HEAP_H
+#define PAIRING_HEAP_H
+
+/* Algorithm: see http://en.wikipedia.org/wiki/Pairing_heap
+ * This is just an efficient translation from recursive to iterative form.
+ *
+ * Note: find_min() is so trivial that we don't implement it.
+ */
+
+/* generic version: KEYDEF is kept separate, allowing you to
+ * embed this structure into other container structures already
+ * possessing some key (just provide an empty KEYDEF in this case).
+ */
+#define _PAIRING_HEAP_TYPEDEF(KEYTYPE, KEYDEF) \
+   \
+struct pairing_heap_##KEYTYPE {
\
+   KEYDEF  \
+   struct pairing_heap_##KEYTYPE *next;\
+   struct pairing_heap_##KEYTYPE *subheaps;\
+}
+
+/* less generic version: define the key inside.
+ */
+#define PAIRING_HEAP_TYPEDEF(KEYTYPE)  \
+   _PAIRING_HEAP_TYPEDEF(KEYTYPE, KEYTYPE key;)
+
+/* generic methods: allow arbitrary CMP() functions.
+ */
+#define _PAIRING_HEAP_FUNCTIONS(_STATIC, KEYTYPE, CMP) \
+   \
+_STATIC
\
+struct pairing_heap_##KEYTYPE *_ph_merge_##KEYTYPE(\
+struct pairing_heap_##KEYTYPE *heap1, struct pairing_heap_##KEYTYPE *heap2)\
+{  \
+   if (!heap1) \
+   return heap2;   \
+   if (!heap2) \
+   return heap1;   \
+   if (CMP(heap1, heap2) < 0) {\
+   heap2->next = heap1->subheaps;  \
+   heap1->subheaps = heap2;\
+   return heap1;   \
+   }   \
+   heap1->next = heap2->subheaps;  \
+   heap2->subheaps = heap1;\
+   return heap2;   \
+}  \
+   \
+_STATIC
\
+void ph_insert_##KEYTYPE(struct pairing_heap_##KEYTYPE **heap, struct 
pairing_heap_##KEYTYPE *new)\
+{  \
+   new->next = NULL;   \
+   new->subheaps = NULL;   \
+   *heap = _ph_merge_##KEYTYPE(*heap, new);\
+}  \
+   \
+_STATIC
\
+void ph_delete_min_##KEYTYPE(struct pairing_heap_##KEYTYPE **heap) \
+{  \
+   struct pairing_heap_##KEYTYPE *tmplist = NULL;  \
+   struct pairing_heap_##KEYTYPE *ptr; \
+   struct pairing_heap_##KEYTYPE *next;\
+   struct pairing_heap_##KEYTYPE *res;  

[RFC 06/32] mars: add new module brick

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/brick.c | 723 +++
 include/linux/brick/brick.h  | 620 +
 2 files changed, 1343 insertions(+)
 create mode 100644 drivers/staging/mars/brick.c
 create mode 100644 include/linux/brick/brick.h

diff --git a/drivers/staging/mars/brick.c b/drivers/staging/mars/brick.c
new file mode 100644
index ..be741e896fc9
--- /dev/null
+++ b/drivers/staging/mars/brick.c
@@ -0,0 +1,723 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#define _STRATEGY
+
+#include 
+#include 
+
+//
+
+/*  init / exit functions */
+
+void _generic_output_init(
+struct generic_brick *brick, const struct generic_output_type *type, struct 
generic_output *output)
+{
+   output->brick = brick;
+   output->type = type;
+   output->ops = type->master_ops;
+   output->nr_connected = 0;
+   INIT_LIST_HEAD(>output_head);
+}
+
+void _generic_output_exit(struct generic_output *output)
+{
+   list_del_init(>output_head);
+   output->brick = NULL;
+   output->type = NULL;
+   output->ops = NULL;
+   output->nr_connected = 0;
+}
+
+int generic_brick_init(const struct generic_brick_type *type, struct 
generic_brick *brick)
+{
+   brick->aspect_context.brick_index = get_brick_nr();
+   brick->type = type;
+   brick->ops = type->master_ops;
+   brick->nr_inputs = 0;
+   brick->nr_outputs = 0;
+   brick->power.off_led = true;
+   init_waitqueue_head(>power.event);
+   INIT_LIST_HEAD(>tmp_head);
+   return 0;
+}
+
+void generic_brick_exit(struct generic_brick *brick)
+{
+   list_del_init(>tmp_head);
+   brick->type = NULL;
+   brick->ops = NULL;
+   brick->nr_inputs = 0;
+   brick->nr_outputs = 0;
+   put_brick_nr(brick->aspect_context.brick_index);
+}
+
+int generic_input_init(
+struct generic_brick *brick, int index, const struct generic_input_type *type, 
struct generic_input *input)
+{
+   if (index < 0 || index >= brick->type->max_inputs)
+   return -EINVAL;
+   if (brick->inputs[index])
+   return -EEXIST;
+   input->brick = brick;
+   input->type = type;
+   input->connect = NULL;
+   INIT_LIST_HEAD(>input_head);
+   brick->inputs[index] = input;
+   brick->nr_inputs++;
+   return 0;
+}
+
+void generic_input_exit(struct generic_input *input)
+{
+   list_del_init(>input_head);
+   input->brick = NULL;
+   input->type = NULL;
+   input->connect = NULL;
+}
+
+int generic_output_init(
+struct generic_brick *brick, int index, const struct generic_output_type 
*type, struct generic_output *output)
+{
+   if (index < 0 || index >= brick->type->max_outputs)
+   return -ENOMEM;
+   if (brick->outputs[index])
+   return -EEXIST;
+   _generic_output_init(brick, type, output);
+   brick->outputs[index] = output;
+   brick->nr_outputs++;
+   return 0;
+}
+
+int generic_size(const struct generic_brick_type *brick_type)
+{
+   int size = brick_type->brick_size;
+   int i;
+
+   size += brick_type->max_inputs * sizeof(void *);
+   for (i = 0; i < brick_type->max_inputs; i++)
+      size += brick_type->default_input_types[i]->input_size;
+   size += brick_type->max_outputs * sizeof(void *);
+   for (i = 0; i < brick_type->max_outputs; i++)
+      size += brick_type->default_output_types[i]->output_size;
+   return size;
+}
+
+int generic_connect(struct generic_input *input, struct generic_output *output)
+{
+   BRICK_DBG("generic_connect(input=%p, output=%p)\n", input, output);
+   if (unlikely(!input || !output))
+      return -EINVAL;
+   if (unlikely(input->connect))
+      return -EEXIST;
+   if (unlikely(!list_empty(&input->input_head)))
+      return -EINVAL;
+   /*  helps only against the most common errors */
+   if (unlikely(input->brick == output->brick))
+      return -EDEADLK;
+

[RFC 01/32] mars: add new module lamport

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/lamport.c | 61 ++
 include/linux/brick/lamport.h  | 26 ++
 2 files changed, 87 insertions(+)
 create mode 100644 drivers/staging/mars/lamport.c
 create mode 100644 include/linux/brick/lamport.h

diff --git a/drivers/staging/mars/lamport.c b/drivers/staging/mars/lamport.c
new file mode 100644
index 000000000000..373093f6e35f
--- /dev/null
+++ b/drivers/staging/mars/lamport.c
@@ -0,0 +1,61 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+
+/*  TODO: replace with spinlock if possible (first check) */
+struct semaphore lamport_sem = __SEMAPHORE_INITIALIZER(lamport_sem, 1);
+struct timespec lamport_now = {};
+
+void get_lamport(struct timespec *now)
+{
+   int diff;
+
+   down(&lamport_sem);
+
+   *now = CURRENT_TIME;
+   diff = timespec_compare(now, &lamport_now);
+   if (diff >= 0) {
+      timespec_add_ns(now, 1);
+      memcpy(&lamport_now, now, sizeof(lamport_now));
+      timespec_add_ns(&lamport_now, 1);
+   } else {
+      timespec_add_ns(&lamport_now, 1);
+      memcpy(now, &lamport_now, sizeof(*now));
+   }
+
+   up(&lamport_sem);
+}
+
+void set_lamport(struct timespec *old)
+{
+   int diff;
+
+   down(&lamport_sem);
+
+   diff = timespec_compare(old, &lamport_now);
+   if (diff >= 0) {
+      memcpy(&lamport_now, old, sizeof(lamport_now));
+      timespec_add_ns(&lamport_now, 1);
+   }
+
+   up(&lamport_sem);
+}
diff --git a/include/linux/brick/lamport.h b/include/linux/brick/lamport.h
new file mode 100644
index 000000000000..9aac0ce01bb4
--- /dev/null
+++ b/include/linux/brick/lamport.h
@@ -0,0 +1,26 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef LAMPORT_H
+#define LAMPORT_H
+
+#include 
+
+extern void get_lamport(struct timespec *now);
+extern void set_lamport(struct timespec *old);
+
+#endif
-- 
2.11.0
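
The clock logic of get_lamport() and set_lamport() can be tried outside the kernel. What follows is only a minimal userspace sketch, not the module itself: struct stamp, stamp_compare(), stamp_add_ns() and get_lamport_at() are hypothetical names introduced for illustration, the wall-clock reading is passed in as a parameter instead of coming from CURRENT_TIME, and the semaphore is left out, so this version is not safe for concurrent callers.

```c
#include <assert.h>

/* Hypothetical stand-in for struct timespec, to keep the sketch portable. */
struct stamp {
	long long sec;
	long nsec;
};

/* Global Lamport clock; the patch protects it with lamport_sem (omitted here). */
static struct stamp lamport_now;

/* Returns <0, 0 or >0, like timespec_compare() in the patch. */
static int stamp_compare(const struct stamp *a, const struct stamp *b)
{
	if (a->sec != b->sec)
		return a->sec < b->sec ? -1 : 1;
	if (a->nsec != b->nsec)
		return a->nsec < b->nsec ? -1 : 1;
	return 0;
}

/* Adds a small non-negative nanosecond amount and normalises the result. */
static void stamp_add_ns(struct stamp *s, long ns)
{
	assert(ns >= 0 && ns < 1000000000L);
	s->nsec += ns;
	if (s->nsec >= 1000000000L) {
		s->nsec -= 1000000000L;
		s->sec++;
	}
}

/* Same branches as get_lamport(), with the wall clock supplied by the caller. */
void get_lamport_at(const struct stamp *wall, struct stamp *now)
{
	*now = *wall;
	if (stamp_compare(now, &lamport_now) >= 0) {
		/* Wall clock is ahead: hand out wall + 1ns, remember wall + 2ns. */
		stamp_add_ns(now, 1);
		lamport_now = *now;
		stamp_add_ns(&lamport_now, 1);
	} else {
		/* Wall clock is behind: advance the Lamport clock and hand it out. */
		stamp_add_ns(&lamport_now, 1);
		*now = lamport_now;
	}
}

/* Same as set_lamport(): never move the clock backwards. */
void set_lamport(const struct stamp *old)
{
	if (stamp_compare(old, &lamport_now) >= 0) {
		lamport_now = *old;
		stamp_add_ns(&lamport_now, 1);
	}
}
```

Calling get_lamport_at() twice with the same wall-clock value yields two strictly increasing stamps, and after set_lamport() with a stamp received from a peer, every later stamp is greater than the peer's.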



[RFC 11/32] mars: add new module lib_timing

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/lib/lib_timing.c |  68 +
 include/linux/brick/lib_timing.h  | 182 ++
 2 files changed, 250 insertions(+)
 create mode 100644 drivers/staging/mars/lib/lib_timing.c
 create mode 100644 include/linux/brick/lib_timing.h

diff --git a/drivers/staging/mars/lib/lib_timing.c b/drivers/staging/mars/lib/lib_timing.c
new file mode 100644
index 000000000000..1996052cb647
--- /dev/null
+++ b/drivers/staging/mars/lib/lib_timing.c
@@ -0,0 +1,68 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+
+#include 
+#include 
+
+#ifdef CONFIG_DEBUG_KERNEL
+
+int report_timing(struct timing_stats *tim, char *str, int maxlen)
+{
+   int len = 0;
+   int time = 1;
+   int resol = 1;
+
+   static const char * const units[] = {
+      "us",
+      "ms",
+      "s",
+      "ERROR"
+   };
+   const char *unit = units[0];
+   int unit_index = 0;
+   int i;
+
+   for (i = 0; i < TIMING_MAX; i++) {
+      int this_len = scnprintf(
+         str, maxlen, "<%d%s = %d (%lld) ", resol, unit,
+         tim->tim_count[i],
+         (long long)tim->tim_count[i] * time);
+
+      str += this_len;
+      len += this_len;
+      maxlen -= this_len;
+      if (maxlen <= 1)
+         break;
+      resol <<= 1;
+      time <<= 1;
+      if (resol >= 1000) {
+         resol = 1;
+         unit = units[++unit_index];
+      }
+   }
+   return len;
+}
+
+#endif /*  CONFIG_DEBUG_KERNEL */
+
+struct threshold global_io_threshold = {
+   .thr_limit = 30 * 100, /*  30 seconds */
+   .thr_factor = 100,
+   .thr_plus = 0,
+};
diff --git a/include/linux/brick/lib_timing.h b/include/linux/brick/lib_timing.h
new file mode 100644
index 000000000000..7081d984a2ce
--- /dev/null
+++ b/include/linux/brick/lib_timing.h
@@ -0,0 +1,182 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef LIB_TIMING_H
+#define LIB_TIMING_H
+
+#include 
+
+/* Simple infrastructure for timing of arbitrary operations and creation
+ * of some simple histogram statistics.
+ */
+
+#define TIMING_MAX 24
+
+struct timing_stats {
+#ifdef CONFIG_DEBUG_KERNEL
+   int tim_count[TIMING_MAX];
+
+#endif
+};
+
+#define _TIME_THIS(_stamp1, _stamp2, _CODE)\
+   ({  \
+   (_stamp1) = cpu_clock(raw_smp_processor_id());  \
+   \
+   _CODE;  \
+   \
+   (_stamp2) = cpu_clock(raw_smp_processor_id());  \
+   (_stamp2) - (_stamp1);  \
+   })
+
+#define TIME_THIS(_CODE)   \
+   ({  \
+   unsigned long long _stamp1; \
+   unsigned long long _stamp2; \
+   _TIME_THIS(_stamp1, _stamp2, _CODE);\
+   })
+
+#ifdef CONFIG_DEBUG_KERNEL
+
+#define _TIME_STATS(_timing, _stamp1, _stamp2, _CODE)  \
+   ({  \
+ 
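
The archived message breaks off before the _TIME_STATS macro body, so how a measurement is assigned to a histogram slot is not visible here. report_timing() labels slot i as "<1us", "<2us", "<4us" and so on, which suggests power-of-two slots. The sketch below is an assumption built only on those labels: timing_bucket(), timing_record() and struct timing_hist are hypothetical names and are not taken from the patch.

```c
#include <assert.h>

#define TIMING_MAX 24

/* Hypothetical userspace copy of the counter array in struct timing_stats. */
struct timing_hist {
	int tim_count[TIMING_MAX];
};

/* Hypothetical helper: smallest slot i with usec < 2^i, capped at the last slot. */
static int timing_bucket(unsigned long long usec)
{
	int i;

	for (i = 0; i < TIMING_MAX - 1; i++) {
		if (usec < (1ULL << i))
			break;
	}
	return i;
}

/* Count one measurement in its slot. */
static void timing_record(struct timing_hist *h, unsigned long long usec)
{
	int slot = timing_bucket(usec);

	assert(slot >= 0 && slot < TIMING_MAX);
	h->tim_count[slot]++;
}
```

With this mapping a 3 us operation is counted in the "<4us" slot, and anything longer than the largest slot boundary is counted in the last slot.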

[RFC 04/32] mars: add new module brick_checking

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 include/linux/brick/brick_checking.h | 107 +++
 1 file changed, 107 insertions(+)
 create mode 100644 include/linux/brick/brick_checking.h

diff --git a/include/linux/brick/brick_checking.h b/include/linux/brick/brick_checking.h
new file mode 100644
index 000000000000..957bd5227db9
--- /dev/null
+++ b/include/linux/brick/brick_checking.h
@@ -0,0 +1,107 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef BRICK_CHECKING_H
+#define BRICK_CHECKING_H
+
+/***/
+
+/*  checking */
+
+#if defined(CONFIG_MARS_DEBUG) || defined(CONFIG_MARS_CHECKS)
+#define BRICK_CHECKING true
+#else
+#define BRICK_CHECKING false
+#endif
+
+#define _CHECK_ATOMIC(atom, OP, minval) \
+do { \
+   if (BRICK_CHECKING) { \
+      int __test = atomic_read(atom); \
+      if (unlikely(__test OP(minval))) { \
+         atomic_set(atom, minval); \
+         BRICK_ERR("%d: atomic " #atom " " #OP " " #minval " (%d)\n", __LINE__, __test); \
+      } \
+   } \
+} while (0)
+
+#define CHECK_ATOMIC(atom, minval) \
+   _CHECK_ATOMIC(atom, <, minval)
+
+#define CHECK_HEAD_EMPTY(head) \
+do { \
+   if (BRICK_CHECKING && unlikely(!list_empty(head) && (head)->next)) { \
+      list_del_init(head); \
+      BRICK_ERR("%d: list_head " #head " (%p) not empty\n", __LINE__, head); \
+   } \
+} while (0)
+
+#ifdef CONFIG_MARS_DEBUG_MEM
+#define CHECK_PTR_DEAD(ptr, label) \
+do { \
+   if (BRICK_CHECKING && unlikely((ptr) == (void *)0x5a5a5a5a5a5a5a5a)) { \
+      BRICK_FAT("%d: pointer '" #ptr "' is DEAD\n", __LINE__); \
+      goto label; \
+   } \
+} while (0)
+#else
+#define CHECK_PTR_DEAD(ptr, label) /*empty*/
+#endif
+
+#define CHECK_PTR_NULL(ptr, label) \
+do { \
+   CHECK_PTR_DEAD(ptr, label); \
+   if (BRICK_CHECKING && unlikely(!(ptr))) { \
+      BRICK_FAT("%d: pointer '" #ptr "' is NULL\n", __LINE__); \
+      goto label; \
+   } \
+} while (0)
+
+#ifdef CONFIG_MARS_DEBUG
+#define CHECK_PTR(ptr, label) \
+do { \
+   CHECK_PTR_NULL(ptr, label); \
+   if (BRICK_CHECKING && unlikely(!virt_addr_valid(ptr))) { \
+      BRICK_FAT("%d: pointer '" #ptr "' (%p) is no valid virtual KERNEL address\n", __LINE__, ptr); \
+      goto label; \
+   } \
+} while (0)
+#else
+#define CHECK_PTR(ptr, label) CHECK_PTR_NULL(ptr, label)
+#endif
+
+#define CHECK_ASPECT(a_ptr, o_ptr, label)  \
+do {   \
+   if (BRICK_CHECKING && unlikely((a_ptr)->object != o_ptr)) { \
+ 
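
The CHECK_PTR_* macros above all follow one pattern: test the pointer, log the source line, and jump to a caller-supplied error label. The sketch below shows how such a macro might be used from ordinary code. It is a userspace approximation under stated assumptions: BRICK_FAT is replaced by a plain fprintf() wrapper, unlikely() and CHECK_PTR_DEAD() are left out, and demo_len() is a hypothetical caller that does not appear in the patch.

```c
#include <stdio.h>

#define BRICK_CHECKING 1

/* Assumption: stand-in for the kernel-side BRICK_FAT logging macro. */
#define BRICK_FAT(fmt, ...) fprintf(stderr, fmt, __VA_ARGS__)

/* Simplified copy of CHECK_PTR_NULL from the patch (no CHECK_PTR_DEAD step). */
#define CHECK_PTR_NULL(ptr, label) \
do { \
	if (BRICK_CHECKING && !(ptr)) { \
		BRICK_FAT("%d: pointer '" #ptr "' is NULL\n", __LINE__); \
		goto label; \
	} \
} while (0)

/* Hypothetical caller: returns the string length, or -1 on a NULL pointer. */
static int demo_len(const char *name)
{
	int len = 0;

	CHECK_PTR_NULL(name, err);
	while (name[len])
		len++;
	return len;
err:
	return -1;
}
```

Since BRICK_CHECKING is a compile-time constant, the compiler can drop the whole check when it is defined as false, which is presumably why the patch uses a constant inside the condition instead of wrapping every call site in #ifdef.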

[RFC 05/32] mars: add new module meta

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 include/linux/brick/meta.h | 106 +
 1 file changed, 106 insertions(+)
 create mode 100644 include/linux/brick/meta.h

diff --git a/include/linux/brick/meta.h b/include/linux/brick/meta.h
new file mode 100644
index 000000000000..a92b2b649c1f
--- /dev/null
+++ b/include/linux/brick/meta.h
@@ -0,0 +1,106 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef META_H
+#define META_H
+
+/***/
+
+/*  metadata descriptions */
+
+/* The idea is to describe your C structures in such a way that
+ * transfers to disk or over a network become self-describing.
+ *
+ * In essence, this is a kind of version-independent marshalling.
+ *
+ * Advantage:
+ * When you extend your original C struct (and of course update the
+ * corresponding meta structure), old data on disk (or network peers
+ * running an old version of your program) will remain valid.
+ * Upon read, newly added fields missing in the old version will be simply
+ * not filled in and therefore remain zeroed (if you don't forget to
+ * initially clear your structures via memset() / initializers / etc).
+ * Note that this works only if you never rename or remove existing
+ * fields; you should only add new ones.
+ * [TODO: add macros for description of ignored / renamed fields to
+ *  overcome this limitation]
+ * You may increase the size of integers, for example from 32bit to 64bit
+ * or even higher; sign extension will be automatically carried out
+ * when necessary.
+ * Also, you may change the order of fields, because the metadata interpreter
+ * will check each field individually; field offsets are automatically
+ * maintained.
+ *
+ * Disadvantage: this adds some (small) overhead.
+ */
+
+enum field_type {
+   FIELD_DONE,
+   FIELD_REF,
+   FIELD_SUB,
+   FIELD_STRING,
+   FIELD_RAW,
+   FIELD_INT,
+   FIELD_UINT,
+};
+
+struct meta {
+   /* char field_name[MAX_FIELD_LEN]; */
+   char *field_name;
+
+   short field_type;
+   short field_data_size;
+   short field_transfer_size;
+   int   field_offset;
+   const struct meta *field_ref;
+};
+
+#define _META_INI(NAME, STRUCT, TYPE, TSIZE)   \
+   .field_name = #NAME,\
+   .field_type = TYPE, \
+   .field_data_size = sizeof(((STRUCT *)NULL)->NAME),  \
+   .field_transfer_size = (TSIZE), \
+   .field_offset = offsetof(STRUCT, NAME)  \
+
+#define META_INI_TRANSFER(NAME, STRUCT, TYPE, TSIZE)   \
+   { _META_INI(NAME, STRUCT, TYPE, TSIZE) }
+
+#define META_INI(NAME, STRUCT, TYPE)   \
+   { _META_INI(NAME, STRUCT, TYPE, 0) }
+
+#define _META_INI_AIO(NAME, STRUCT, AIO)   \
+   .field_name = #NAME,\
+   .field_type = FIELD_REF,\
+   .field_data_size = sizeof(*(((STRUCT *)NULL)->NAME)),   \
+   .field_offset = offsetof(STRUCT, NAME), \
+   .field_ref = AIO
+
+#define META_INI_AIO(NAME, STRUCT, AIO) { _META_INI_AIO(NAME, STRUCT, AIO) }
+
+#define _META_INI_SUB(NAME, STRUCT, SUB)   \
+   .field_name = #NAME,\
+   .field_type = FIELD_SUB,\
+   .field_data_size = sizeof(((STRUCT *)NULL)->NAME),  \
+   .field_offset = offsetof(STRUCT, NAME), \
+   .field_ref = SUB
+
+#define META_INI_SUB(NAME, STRUCT, SUB) { _META_INI_SUB(NAME, STRUCT, SUB) }
+
+extern const struct meta *find_meta(const struct meta *meta, const char *field_name);
+/* extern void free_meta(void *data, const struct meta *meta); */
+
+#endif
-- 
2.11.0
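
To make the idea of a self-describing structure more concrete, here is a small userspace sketch of a meta table and a lookup by field name. Only enum field_type, struct meta and the shape of META_INI are taken from the header above (with field_name made const here). struct demo_msg and demo_msg_meta are hypothetical, and the body of find_meta() is a guess: the patch only declares it. The sketch assumes a table ends with an entry whose field_name is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum field_type {
	FIELD_DONE,
	FIELD_REF,
	FIELD_SUB,
	FIELD_STRING,
	FIELD_RAW,
	FIELD_INT,
	FIELD_UINT,
};

struct meta {
	const char *field_name;
	short field_type;
	short field_data_size;
	short field_transfer_size;
	int   field_offset;
	const struct meta *field_ref;
};

/* Same idea as META_INI in the header: name, type, size and offset of a field. */
#define META_INI(NAME, STRUCT, TYPE) \
	{ .field_name = #NAME, \
	  .field_type = TYPE, \
	  .field_data_size = sizeof(((STRUCT *)NULL)->NAME), \
	  .field_offset = offsetof(STRUCT, NAME) }

/* Hypothetical payload structure, for illustration only. */
struct demo_msg {
	int seq;
	unsigned long long pos;
};

/* Its description; the last entry (field_name == NULL) ends the table. */
static const struct meta demo_msg_meta[] = {
	META_INI(seq, struct demo_msg, FIELD_INT),
	META_INI(pos, struct demo_msg, FIELD_UINT),
	{ .field_name = NULL, .field_type = FIELD_DONE }
};

/* Assumed implementation: linear search by field name. */
const struct meta *find_meta(const struct meta *meta, const char *field_name)
{
	assert(meta && field_name);
	for (; meta->field_name; meta++) {
		if (strcmp(meta->field_name, field_name) == 0)
			return meta;
	}
	return NULL;
}
```

A receiver that knows only the field names sent by its peer could look up each name in its own table and copy the value to the local field_offset. A lookup that returns NULL would then correspond to a field the local version does not know, which matches the compatibility behaviour described in the header comment.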



[RFC 00/32] State of MARS Geo-Redundancy Module

2016-12-30 Thread Thomas Schoebel-Theuer
d start joining the MARS
development in 2017, at least to help me get it upstream.

I would be excited to be invited to the next kernel summit
or a similar meeting.

A happy new year from your devoted

Thomas


[1] https://github.com/schoebel/mars/blob/master/docu/MARS_GUUG2016.pdf

[2] https://github.com/schoebel/mars


Thomas Schoebel-Theuer (32):
  mars: add new module lamport
  mars: add new module brick_say
  mars: add new module brick_mem
  mars: add new module brick_checking
  mars: add new module meta
  mars: add new module brick
  mars: add new module lib_pairing_heap
  mars: add new module lib_queue
  mars: add new module lib_rank
  mars: add new module lib_limiter
  mars: add new module lib_timing
  mars: add new module vfs_compat
  mars: add new module xio
  mars: add new module xio_net
  mars: add new module lib_mapfree
  mars: add new module lib_log
  mars: add new module xio_bio
  mars: add new module xio_sio
  mars: add new module xio_client
  mars: add new module xio_if
  mars: add new module xio_copy
  mars: add new module xio_trans_logger
  mars: add new module xio_server
  mars: add new module strategy
  mars: add new module main_strategy
  mars: add new module net
  mars: add new module server_strategy
  mars: add new module mars_proc
  mars: add new module mars_main
  mars: add new module Makefile
  mars: add new module Kconfig
  mars: activate build

 drivers/staging/Kconfig|2 +
 drivers/staging/Makefile   |1 +
 drivers/staging/mars/Kconfig   |  266 +
 drivers/staging/mars/Makefile  |   96 +
 drivers/staging/mars/brick.c   |  723 +++
 drivers/staging/mars/brick_mem.c   | 1080 
 drivers/staging/mars/brick_say.c   |  920 +++
 drivers/staging/mars/lamport.c |   61 +
 drivers/staging/mars/lib/lib_limiter.c |  163 +
 drivers/staging/mars/lib/lib_rank.c|   87 +
 drivers/staging/mars/lib/lib_timing.c  |   68 +
 drivers/staging/mars/mars/main_strategy.c  | 2135 +++
 drivers/staging/mars/mars/mars_main.c  | 6160 
 drivers/staging/mars/mars/mars_proc.c  |  389 ++
 drivers/staging/mars/mars/mars_proc.h  |   34 +
 drivers/staging/mars/mars/net.c|  109 +
 drivers/staging/mars/mars/server_strategy.c|  436 ++
 drivers/staging/mars/mars/strategy.h   |  239 +
 drivers/staging/mars/xio_bricks/lib_log.c  |  506 ++
 drivers/staging/mars/xio_bricks/lib_mapfree.c  |  382 ++
 drivers/staging/mars/xio_bricks/xio.c  |  227 +
 drivers/staging/mars/xio_bricks/xio_bio.c  |  845 +++
 drivers/staging/mars/xio_bricks/xio_client.c   | 1083 
 drivers/staging/mars/xio_bricks/xio_copy.c | 1005 
 drivers/staging/mars/xio_bricks/xio_if.c   |  892 +++
 drivers/staging/mars/xio_bricks/xio_net.c  | 1849 ++
 drivers/staging/mars/xio_bricks/xio_server.c   |  493 ++
 drivers/staging/mars/xio_bricks/xio_sio.c  |  578 ++
 drivers/staging/mars/xio_bricks/xio_trans_logger.c | 3410 +++
 include/linux/brick/brick.h|  620 ++
 include/linux/brick/brick_checking.h   |  107 +
 include/linux/brick/brick_mem.h|  218 +
 include/linux/brick/brick_say.h|   89 +
 include/linux/brick/lamport.h  |   26 +
 include/linux/brick/lib_limiter.h  |   52 +
 include/linux/brick/lib_pairing_heap.h |  109 +
 include/linux/brick/lib_queue.h|  165 +
 include/linux/brick/lib_rank.h |  136 +
 include/linux/brick/lib_timing.h   |  182 +
 include/linux/brick/meta.h |  106 +
 include/linux/brick/vfs_compat.h   |   48 +
 include/linux/xio/lib_log.h|  333 ++
 include/linux/xio/lib_mapfree.h|   84 +
 include/linux/xio/xio.h|  319 +
 include/linux/xio/xio_bio.h|   85 +
 include/linux/xio/xio_client.h |  105 +
 include/linux/xio/xio_copy.h   |  115 +
 include/linux/xio/xio_if.h |  109 +
 include/linux/xio/xio_net.h|  177 +
 include/linux/xio/xio_server.h |   91 +
 include/linux/xio/xio_sio.h|   68 +
 include/linux/xio/xio_trans_logger.h   |  271 +
 52 files changed, 27854 insertions(+)
 create mode 100644 drivers/staging/mars/Kconfig
 create mode 100644 drivers/staging/mars/Makefile
 create mode 100644 drivers/staging/mars/brick.c
 create mode 100644 drivers/staging/mars/brick_mem.c
 create mode 100644 drivers/staging/mars/brick_say.c
 create mode 100644 drivers/staging/mars/lamport.c
 create m

[RFC 14/32] mars: add new module xio_net

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/xio_bricks/xio_net.c | 1849 +
 include/linux/xio/xio_net.h   |  177 +++
 2 files changed, 2026 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_net.c
 create mode 100644 include/linux/xio/xio_net.h

diff --git a/drivers/staging/mars/xio_bricks/xio_net.c b/drivers/staging/mars/xio_bricks/xio_net.c
new file mode 100644
index 000000000000..441eee1f3912
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_net.c
@@ -0,0 +1,1849 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+/**/
+
+/*  provisional version detection */
+
+#ifndef TCP_MAX_REORDERING
+#define __HAS_IOV_ITER
+#endif
+
+#ifdef sk_net_refcnt
+/* see eeb1bd5c40edb0e2fd925c8535e2fdebdbc5cef2 */
+#define __HAS_STRUCT_NET
+#endif
+
+/**/
+
+#define USE_BUFFERING
+
+#define SEND_PROTO_VERSION 2
+
+enum COMPRESS_TYPES {
+   COMPRESS_NONE = 0,
+   COMPRESS_LZO = 1,
+   /* insert further methods here */
+};
+
+int xio_net_compress_data;
+
+const u16 net_global_flags = 0
+#ifdef __HAVE_LZO
+   | COMPRESS_LZO
+#endif
+   ;
+
+/**/
+
+/* Internal data structures for low-level transfer of C structures
+ * described by struct meta.
+ * Only these low-level fields need to have a fixed size like s64.
+ * The size and bytesex of the higher-level C structures is converted
+ * automatically; therefore classical "int" or "long long" etc is viable.
+ */
+
+#define MAX_FIELD_LEN  (32 + 16)
+
+/* Please keep this at a size of 64 bytes by
+ * reuse of *spare* fields.
+ */
+struct xio_desc_cache {
+   u8    cache_sender_proto;
+   u8    cache_recver_proto;
+   s8    cache_is_bigendian;
+   u8    cache_spare0;
+   s16   cache_items;
+   u16   cache_spare1;
+   u32   cache_spare2;
+   u32   cache_spare3;
+   u64   cache_spare4[4];
+   u64   cache_sender_cookie;
+   u64   cache_recver_cookie;
+};
+
+/* Please keep this also at a size of 64 bytes by
+ * reuse of *spare* fields.
+ */
+struct xio_desc_item {
+   s8    field_type;
+   s8    field_spare0;
+   s16   field_data_size;
+   s16   field_sender_size;
+   s16   field_sender_offset;
+   s16   field_recver_size;
+   s16   field_recver_offset;
+   s32   field_spare;
+   char  field_name[MAX_FIELD_LEN];
+};
+
+/* This must not be mirror symmetric between big and little endian
+ */
+#define XIO_DESC_MAGIC 0x73D0A2EC6148F48Ell
+
+struct xio_desc_header {
+   u64 h_magic;
+   u64 h_cookie;
+   s16 h_meta_len;
+   s16 h_index;
+   u32 h_spare1;
+   u64 h_spare2;
+};
+
+#define MAX_INT_TRANSFER   16
+
+/**/
+
+/* Bytesex conversion / sign extension
+ */
+
+#ifdef __LITTLE_ENDIAN
+static const bool myself_is_bigendian;
+
+#endif
+#ifdef __BIG_ENDIAN
+static const bool myself_is_bigendian = true;
+
+#endif
+
+static inline
+void swap_bytes(void *data, int len)
+{
+   char *a = data;
+   char *b = data + len - 1;
+
+   while (a < b) {
+       char tmp = *a;
+
+       *a = *b;
+       *b = tmp;
+       a++;
+       b--;
+   }
+}
+
+#define SWAP_FIELD(x) swap_bytes(&(x), sizeof(x))
+
+static inline
+void swap_mc(struct xio_desc_cache *mc, int len)
+{
+   struct xio_desc_item *mi;
+
+   SWAP_FIELD(mc->cache_sender_cookie);
+   SWAP_FIELD(mc->cache_recver_cookie);
+   SWAP_FIELD(mc->cache_items);
+
+   len -= sizeof(*mc);
+
+   for (mi = (void *)(mc + 1); len > 0; mi++, len -= sizeof(*mi)) {
+       SWAP_FIELD(mi->field_data_size);
+       SWAP_FIELD(mi->field_sender_size);
+       SWAP_FIELD(mi->field_sender_offset);
+       SWAP_FIELD(mi->field_recver_size);
+       SWAP_FIELD(mi->field_recver_offset);
+   }
+}
+
+static inline
+char get_sign(const void *data, int 
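
The comment on XIO_DESC_MAGIC above requires that the constant is not mirror symmetric under byte reversal. A minimal user-space sketch of why that matters follows. It assumes (the truncated excerpt does not show this) that a receiver could compare the magic against both byte orders to decide whether the peer's data needs swapping. The names demo_swap_bytes() and demo_peer_is_swapped() are hypothetical and are not part of the patch; only the constant and the swap loop are taken from it.

```c
#include <stddef.h>
#include <stdint.h>

/* Same value as XIO_DESC_MAGIC in the patch. */
#define DEMO_MAGIC 0x73D0A2EC6148F48EULL

/* Reverse the byte order of an object in place (same idea as swap_bytes()). */
static void demo_swap_bytes(void *data, size_t len)
{
	unsigned char *a = data;
	unsigned char *b;

	if (len < 2)
		return;
	b = a + len - 1;
	while (a < b) {
		unsigned char tmp = *a;

		*a = *b;
		*b = tmp;
		a++;
		b--;
	}
}

/*
 * Hypothetical receiver-side check (an assumption, not shown in the patch):
 * returns 0 if the magic arrived in our own byte order,
 * 1 if it arrived byte-reversed, and -1 if it is not a valid magic at all.
 */
static int demo_peer_is_swapped(uint64_t magic_from_wire)
{
	if (magic_from_wire == DEMO_MAGIC)
		return 0;
	demo_swap_bytes(&magic_from_wire, sizeof(magic_from_wire));
	if (magic_from_wire == DEMO_MAGIC)
		return 1;
	return -1;
}
```

If the magic were a byte palindrome, both comparisons would succeed and the two cases could not be told apart, which is presumably what the comment guards against.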

[RFC 13/32] mars: add new module xio

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/xio_bricks/xio.c | 227 
 include/linux/xio/xio.h   | 319 ++
 2 files changed, 546 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio.c
 create mode 100644 include/linux/xio/xio.h

diff --git a/drivers/staging/mars/xio_bricks/xio.c b/drivers/staging/mars/xio_bricks/xio.c
new file mode 100644
index ..e58f11f497f9
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio.c
@@ -0,0 +1,227 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+//
+
+/*  infrastructure */
+
+struct banning xio_global_ban = {};
+atomic_t xio_global_io_flying = ATOMIC_INIT(0);
+
+//
+
+/*  object stuff */
+
+const struct generic_object_type aio_type = {
+   .object_type_name = "aio",
+   .default_size = sizeof(struct aio_object),
+   .object_type_nr = OBJ_TYPE_AIO,
+};
+
+//
+
+/*  brick stuff */
+
+/***/
+
+/*  meta descriptions */
+
+const struct meta xio_info_meta[] = {
+   META_INI(current_size, struct xio_info, FIELD_INT),
+   META_INI(tf_align,     struct xio_info, FIELD_INT),
+   META_INI(tf_min_size,  struct xio_info, FIELD_INT),
+   {}
+};
+
+const struct meta xio_aio_user_meta[] = {
+   META_INI(_object_cb.cb_error, struct aio_object, FIELD_INT),
+   META_INI(io_pos,              struct aio_object, FIELD_INT),
+   META_INI(io_len,              struct aio_object, FIELD_INT),
+   META_INI(io_may_write,        struct aio_object, FIELD_INT),
+   META_INI(io_prio,             struct aio_object, FIELD_INT),
+   META_INI(io_cs_mode,          struct aio_object, FIELD_INT),
+   META_INI(io_timeout,          struct aio_object, FIELD_INT),
+   META_INI(io_total_size,       struct aio_object, FIELD_INT),
+   META_INI(io_checksum,         struct aio_object, FIELD_RAW),
+   META_INI(io_flags,            struct aio_object, FIELD_INT),
+   META_INI(io_rw,               struct aio_object, FIELD_INT),
+   META_INI(io_id,               struct aio_object, FIELD_INT),
+   META_INI(io_skip_sync,        struct aio_object, FIELD_INT),
+   {}
+};
+
+const struct meta xio_timespec_meta[] = {
+   META_INI_TRANSFER(tv_sec,  struct timespec, FIELD_UINT, 8),
+   META_INI_TRANSFER(tv_nsec, struct timespec, FIELD_UINT, 4),
+   {}
+};
+
+//
+
+/*  crypto stuff */
+
+#include 
+#include 
+
+/* 896545098777564212b9e91af4c973f094649aa7 */
+#ifndef crt_hash
+#define HAS_NEW_CRYPTO
+#endif
+
+#ifdef HAS_NEW_CRYPTO
+
+/* For now, use shash.
+ * Later, asynchronous support should be added for full exploitation
+ * of crypto hardware.
+ */
+#include 
+
+static struct crypto_shash *xio_tfm;
+int xio_digest_size;
+
+struct mars_sdesc {
+   struct shash_desc shash;
+   char ctx[];
+};
+
+void xio_digest(unsigned char *digest, void *data, int len)
+{
+   int size = sizeof(struct mars_sdesc) + crypto_shash_descsize(xio_tfm);
+   struct mars_sdesc *sdesc = brick_mem_alloc(size);
+   int status;
+
+   sdesc->shash.tfm = xio_tfm;
+   sdesc->shash.flags = 0;
+
+   memset(digest, 0, xio_digest_size);
+   status = crypto_shash_digest(&sdesc->shash, data, len, digest);
+   if (unlikely(status < 0))
+       XIO_ERR(
+           "cannot calculate cksum on %p len=%d, status=%d\n",
+           data, len,
+           status);
+
+   brick_mem_free(sdesc);
+}
+
+#else  /* HAS_NEW_CRYPTO */
+
+/* Old implementation, to disappear.
+ * Was a quick'n dirty lab prototype with unnecessary
+ * global variables and locking.
+ */
+
+static struct crypto_hash *xio_tfm;
+static struct semaphore tfm_sem;
+int xio_digest_size;
+
+void xio_digest(unsigned char *digest, void *data, int len)
+{
+   struct hash_desc desc = {
+   .tfm = xio_tfm,
+   .flags = 0,
+   };
+   struct scatterlist sg;
+
+   memset(digest, 0, xio

[RFC 10/32] mars: add new module lib_limiter

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/lib/lib_limiter.c | 163 +
 include/linux/brick/lib_limiter.h  |  52 +++
 2 files changed, 215 insertions(+)
 create mode 100644 drivers/staging/mars/lib/lib_limiter.c
 create mode 100644 include/linux/brick/lib_limiter.h

diff --git a/drivers/staging/mars/lib/lib_limiter.c b/drivers/staging/mars/lib/lib_limiter.c
new file mode 100644
index ..e77b74a0eae7
--- /dev/null
+++ b/drivers/staging/mars/lib/lib_limiter.c
@@ -0,0 +1,163 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#define LIMITER_TIME_RESOLUTION NSEC_PER_SEC
+
+int rate_limit(struct rate_limiter *lim, int amount)
+{
+   int delay = 0;
+   long long now;
+
+   now = cpu_clock(raw_smp_processor_id());
+
+   /* Compute the maximum delay along the path
+    * down to the root of the hierarchy tree.
+    */
+   while (lim) {
+       long long window = now - lim->lim_stamp;
+
+       /* Sometimes, raw CPU clocks may do weird things...
+        * Smaller windows in the denominator than 1s could fake unrealistic rates.
+        */
+       if (unlikely(lim->lim_min_window <= 0))
+           lim->lim_min_window = 1000;
+       if (unlikely(lim->lim_max_window <= lim->lim_min_window))
+           lim->lim_max_window = lim->lim_min_window + 8000;
+       if (unlikely(window < (long long)lim->lim_min_window * (LIMITER_TIME_RESOLUTION / 1000)))
+           window = (long long)lim->lim_min_window * (LIMITER_TIME_RESOLUTION / 1000);
+
+       /* Update total statistics.
+        * They will intentionally wrap around.
+        * Userspace must take care of that.
+        */
+       if (likely(amount > 0)) {
+           lim->lim_total_amount += amount;
+           lim->lim_total_ops++;
+       }
+
+       /* Only use incremental accumulation at repeated calls, but
+        * never after longer pauses.
+        */
+       if (likely(lim->lim_stamp &&
+              window < (long long)lim->lim_max_window * (LIMITER_TIME_RESOLUTION / 1000))) {
+           long long rate_raw;
+           int rate;
+           int max_rate;
+
+           /* Races are possible, but taken into account.
+            * There is no real harm from rarely lost updates.
+            */
+           if (likely(amount > 0)) {
+               lim->lim_amount_accu += amount;
+               lim->lim_amount_cumul += amount;
+               lim->lim_ops_accu++;
+               lim->lim_ops_cumul++;
+           }
+
+           /* compute amount values */
+           rate_raw = lim->lim_amount_accu * LIMITER_TIME_RESOLUTION / window;
+           rate = rate_raw;
+           if (unlikely(rate_raw > INT_MAX))
+               rate = INT_MAX;
+           lim->lim_amount_rate = rate;
+
+           /* amount limit exceeded? */
+           max_rate = lim->lim_max_amount_rate;
+           if (max_rate > 0 && rate > max_rate) {
+               int this_delay = (
+
+                   window * rate / max_rate - window) / (LIMITER_TIME_RESOLUTION / 1000);
+               /*  compute maximum */
+               if (this_delay > delay && this_delay > 0)
+                   delay = this_delay;
+           }
+
+           /* compute ops values */
+           rate_raw = lim->lim_ops_accu * LIMITER_TIME_RESOLUTION / window;
+           rate = rate_raw;
+           if (unlikely(rate_raw > INT_MAX))
+               rate = INT_MAX;
+           lim->lim_ops_rate = rate;
+
+   
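
The delay arithmetic in rate_limit() above can be checked in isolation. The following user-space sketch reproduces only the "amount limit exceeded?" step for a single limiter level: the window is in nanoseconds, the rate is in units per second, and the result is in milliseconds. demo_delay_ms() and DEMO_TIME_RESOLUTION are hypothetical names introduced here. The sketch also assumes inputs small enough that the 64-bit products do not overflow, and it adds a zero-window guard that the excerpt handles differently (by clamping the window to lim_min_window).

```c
#include <limits.h>

/* Nanoseconds per second, standing in for LIMITER_TIME_RESOLUTION. */
#define DEMO_TIME_RESOLUTION 1000000000LL

/*
 * Given `accu` units accumulated over `window_ns` nanoseconds, return how
 * many milliseconds the caller would have to wait so that the average rate
 * falls back to `max_rate` units per second. Returns 0 if no delay is needed.
 */
static int demo_delay_ms(long long accu, long long window_ns, int max_rate)
{
	long long rate_raw;
	long long rate;
	long long delay;

	if (window_ns <= 0 || max_rate <= 0)
		return 0;

	/* observed rate in units per second, clamped to the int range */
	rate_raw = accu * DEMO_TIME_RESOLUTION / window_ns;
	rate = rate_raw > INT_MAX ? INT_MAX : rate_raw;
	if (rate <= max_rate)
		return 0;

	/* stretch the window by rate/max_rate, then convert the excess to ms */
	delay = (window_ns * rate / max_rate - window_ns) /
		(DEMO_TIME_RESOLUTION / 1000);
	if (delay > INT_MAX)
		delay = INT_MAX;
	return delay > 0 ? (int)delay : 0;
}
```

For example, 2000 units within a 1 s window against a limit of 1000 units/s gives a stretched window of 2 s, hence a delay of 1000 ms. The full function additionally takes the maximum of such delays over all limiter levels on the path to the root.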

[RFC 08/32] mars: add new module lib_queue

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 include/linux/brick/lib_queue.h | 165 
 1 file changed, 165 insertions(+)
 create mode 100644 include/linux/brick/lib_queue.h

diff --git a/include/linux/brick/lib_queue.h b/include/linux/brick/lib_queue.h
new file mode 100644
index ..72cd0a2710c2
--- /dev/null
+++ b/include/linux/brick/lib_queue.h
@@ -0,0 +1,165 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef LIB_QUEUE_H
+#define LIB_QUEUE_H
+
+#define QUEUE_ANCHOR(PREFIX, KEYTYPE, HEAPTYPE) \
+   /* parameters */\
+   /* readonly from outside */ \
+   atomic_t q_queued;  \
+   atomic_t q_flying;  \
+   atomic_t q_total;   \
+   /* tunables */  \
+   int q_batchlen; \
+   int q_io_prio;  \
+   bool q_ordering;\
+   /* private */   \
+   wait_queue_head_t *q_event; \
+   spinlock_t q_lock;  \
+   struct list_head q_anchor;  \
+   struct pairing_heap_##HEAPTYPE *heap_high;  \
+   struct pairing_heap_##HEAPTYPE *heap_low;   \
+   long long q_last_insert; /* jiffies */  \
+   KEYTYPE heap_margin;\
+   KEYTYPE last_pos
+
+#define QUEUE_FUNCTIONS(PREFIX, ELEM_TYPE, HEAD, KEYFN, KEYCMP, HEAPTYPE)\
+   \
+static inline  \
+void q_##PREFIX##_trigger(struct PREFIX##_queue *q)\
+{  \
+   if (q->q_event) {   \
+   wake_up_interruptible(q->q_event);  \
+   }   \
+}  \
+   \
+static inline  \
+void q_##PREFIX##_init(struct PREFIX##_queue *q)   \
+{  \
+   INIT_LIST_HEAD(&q->q_anchor);   \
+   q->heap_low = NULL; \
+   q->heap_high = NULL;\
+   spin_lock_init(&q->q_lock); \
+   atomic_set(&q->q_queued, 0);\
+   atomic_set(&q->q_flying, 0);\
+}  \
+   \
+static inline  \
+void q_##PREFIX##_insert(struct PREFIX##_queue *q, ELEM_TYPE * elem)   \
+{  \
+   unsigned long flags;\
+   \
+   spin_lock_irqsave(&q->q_lock, flags);   \
+   \
+   if (q->q_ordering) {\
+       struct pairing_heap_##HEAPTYPE **use = &q->heap_high;   \
+       if (KEYCMP(KEYFN(elem), &q->heap_margin) <= 0) {\
+           use = &q->heap_low; \
+   }  

[RFC 05/32] mars: add new module meta

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 include/linux/brick/meta.h | 106 +
 1 file changed, 106 insertions(+)
 create mode 100644 include/linux/brick/meta.h

diff --git a/include/linux/brick/meta.h b/include/linux/brick/meta.h
new file mode 100644
index ..a92b2b649c1f
--- /dev/null
+++ b/include/linux/brick/meta.h
@@ -0,0 +1,106 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef META_H
+#define META_H
+
+/***/
+
+/*  metadata descriptions */
+
+/* The idea is to describe your C structures in such a way that
+ * transfers to disk or over a network become self-describing.
+ *
+ * In essence, this is a kind of version-independent marshalling.
+ *
+ * Advantage:
+ * When you extend your original C struct (and of course update the
+ * corresponding meta structure), old data on disk (or network peers
+ * running an old version of your program) will remain valid.
+ * Upon read, newly added fields missing in the old version will be simply
+ * not filled in and therefore remain zeroed (if you don't forget to
+ * initially clear your structures via memset() / initializers / etc).
+ * Note that this works only if you never rename or remove existing
+ * fields; you should only add new ones.
+ * [TODO: add macros for description of ignored / renamed fields to
+ *  overcome this limitation]
+ * You may increase the size of integers, for example from 32bit to 64bit
+ * or even higher; sign extension will be automatically carried out
+ * when necessary.
+ * Also, you may change the order of fields, because the metadata interpreter
+ * will check each field individually; field offsets are automatically
+ * maintained.
+ *
+ * Disadvantage: this adds some (small) overhead.
+ */
+
+enum field_type {
+   FIELD_DONE,
+   FIELD_REF,
+   FIELD_SUB,
+   FIELD_STRING,
+   FIELD_RAW,
+   FIELD_INT,
+   FIELD_UINT,
+};
+
+struct meta {
+   /* char field_name[MAX_FIELD_LEN]; */
+   char *field_name;
+
+   short field_type;
+   short field_data_size;
+   short field_transfer_size;
+   int   field_offset;
+   const struct meta *field_ref;
+};
+
+#define _META_INI(NAME, STRUCT, TYPE, TSIZE)   \
+   .field_name = #NAME,\
+   .field_type = TYPE, \
+   .field_data_size = sizeof(((STRUCT *)NULL)->NAME),  \
+   .field_transfer_size = (TSIZE), \
+   .field_offset = offsetof(STRUCT, NAME)  \
+
+#define META_INI_TRANSFER(NAME, STRUCT, TYPE, TSIZE)   \
+   { _META_INI(NAME, STRUCT, TYPE, TSIZE) }
+
+#define META_INI(NAME, STRUCT, TYPE)   \
+   { _META_INI(NAME, STRUCT, TYPE, 0) }
+
+#define _META_INI_AIO(NAME, STRUCT, AIO)   \
+   .field_name = #NAME,\
+   .field_type = FIELD_REF,\
+   .field_data_size = sizeof(*(((STRUCT *)NULL)->NAME)),   \
+   .field_offset = offsetof(STRUCT, NAME), \
+   .field_ref = AIO
+
+#define META_INI_AIO(NAME, STRUCT, AIO) { _META_INI_AIO(NAME, STRUCT, AIO) }
+
+#define _META_INI_SUB(NAME, STRUCT, SUB)   \
+   .field_name = #NAME,\
+   .field_type = FIELD_SUB,\
+   .field_data_size = sizeof(((STRUCT *)NULL)->NAME),  \
+   .field_offset = offsetof(STRUCT, NAME), \
+   .field_ref = SUB
+
+#define META_INI_SUB(NAME, STRUCT, SUB) { _META_INI_SUB(NAME, STRUCT, SUB) }
+
+extern const struct meta *find_meta(const struct meta *meta, const char *field_name);
+/* extern void free_meta(void *data, const struct meta *meta); */
+
+#endif
-- 
2.11.0
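
The version-independent marshalling idea described in the meta.h comment can be illustrated with a small user-space sketch. Everything prefixed demo_ is hypothetical: the header only declares find_meta(), so the lookup loop below is an assumed implementation, and the conversion handles signed integers in host byte order only (no bytesex conversion, no FIELD_REF / FIELD_SUB). It shows a "new" struct with widened, reordered and added fields being filled from an "old" struct purely by field name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_meta {
	const char *field_name;
	short field_data_size;
	int field_offset;
};

/* Reduced variant of _META_INI(): name, size and offset only. */
#define DEMO_META_INI(NAME, STRUCT) \
	{ #NAME, (short)sizeof(((STRUCT *)NULL)->NAME), (int)offsetof(STRUCT, NAME) }

struct demo_v1 { int32_t io_len; int16_t io_prio; };
/* Newer layout: fields reordered, widened, and one field added. */
struct demo_v2 { int32_t io_flags; int32_t io_prio; int64_t io_len; };

static const struct demo_meta demo_v1_meta[] = {
	DEMO_META_INI(io_len, struct demo_v1),
	DEMO_META_INI(io_prio, struct demo_v1),
	{ NULL, 0, 0 }
};

static const struct demo_meta demo_v2_meta[] = {
	DEMO_META_INI(io_flags, struct demo_v2),
	DEMO_META_INI(io_prio, struct demo_v2),
	DEMO_META_INI(io_len, struct demo_v2),
	{ NULL, 0, 0 }
};

/* Assumed implementation of a find_meta()-style lookup by field name. */
static const struct demo_meta *demo_find_meta(const struct demo_meta *meta,
					      const char *field_name)
{
	for (; meta->field_name; meta++)
		if (!strcmp(meta->field_name, field_name))
			return meta;
	return NULL;
}

/* Load a signed integer of 1, 2, 4 or 8 bytes with sign extension. */
static long long demo_load(const void *base, const struct demo_meta *m)
{
	const char *p = (const char *)base + m->field_offset;
	int8_t v1; int16_t v2; int32_t v4; int64_t v8;

	switch (m->field_data_size) {
	case 1: memcpy(&v1, p, 1); return v1;
	case 2: memcpy(&v2, p, 2); return v2;
	case 4: memcpy(&v4, p, 4); return v4;
	case 8: memcpy(&v8, p, 8); return v8;
	}
	return 0;
}

/* Store a value into a field of 1, 2, 4 or 8 bytes (may truncate). */
static void demo_store(void *base, const struct demo_meta *m, long long val)
{
	char *p = (char *)base + m->field_offset;
	int8_t v1 = (int8_t)val; int16_t v2 = (int16_t)val;
	int32_t v4 = (int32_t)val; int64_t v8 = val;

	switch (m->field_data_size) {
	case 1: memcpy(p, &v1, 1); break;
	case 2: memcpy(p, &v2, 2); break;
	case 4: memcpy(p, &v4, 4); break;
	case 8: memcpy(p, &v8, 8); break;
	}
}

/* Fill dst field by field; fields unknown to the sender stay untouched. */
static void demo_convert(void *dst, const struct demo_meta *dst_meta,
			 const void *src, const struct demo_meta *src_meta)
{
	for (; dst_meta->field_name; dst_meta++) {
		const struct demo_meta *s =
			demo_find_meta(src_meta, dst_meta->field_name);

		if (s)
			demo_store(dst, dst_meta, demo_load(src, s));
	}
}
```

As the header comment notes, the destination has to be zeroed beforehand (for example via memset()), so that a field missing on the sender side, such as io_flags here, reads as zero afterwards.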



[RFC 00/32] State of MARS Geo-Redundancy Module

2016-12-30 Thread Thomas Schoebel-Theuer
d start joining the MARS
development in 2017, at least for helping me getting it upstream.

I would be excited to be invited to the next kernel summit
or a similar meeting.

A happy new year from your devoted

Thomas


[1] https://github.com/schoebel/mars/blob/master/docu/MARS_GUUG2016.pdf

[2] https://github.com/schoebel/mars


Thomas Schoebel-Theuer (32):
  mars: add new module lamport
  mars: add new module brick_say
  mars: add new module brick_mem
  mars: add new module brick_checking
  mars: add new module meta
  mars: add new module brick
  mars: add new module lib_pairing_heap
  mars: add new module lib_queue
  mars: add new module lib_rank
  mars: add new module lib_limiter
  mars: add new module lib_timing
  mars: add new module vfs_compat
  mars: add new module xio
  mars: add new module xio_net
  mars: add new module lib_mapfree
  mars: add new module lib_log
  mars: add new module xio_bio
  mars: add new module xio_sio
  mars: add new module xio_client
  mars: add new module xio_if
  mars: add new module xio_copy
  mars: add new module xio_trans_logger
  mars: add new module xio_server
  mars: add new module strategy
  mars: add new module main_strategy
  mars: add new module net
  mars: add new module server_strategy
  mars: add new module mars_proc
  mars: add new module mars_main
  mars: add new module Makefile
  mars: add new module Kconfig
  mars: activate build

 drivers/staging/Kconfig|2 +
 drivers/staging/Makefile   |1 +
 drivers/staging/mars/Kconfig   |  266 +
 drivers/staging/mars/Makefile  |   96 +
 drivers/staging/mars/brick.c   |  723 +++
 drivers/staging/mars/brick_mem.c   | 1080 
 drivers/staging/mars/brick_say.c   |  920 +++
 drivers/staging/mars/lamport.c |   61 +
 drivers/staging/mars/lib/lib_limiter.c |  163 +
 drivers/staging/mars/lib/lib_rank.c|   87 +
 drivers/staging/mars/lib/lib_timing.c  |   68 +
 drivers/staging/mars/mars/main_strategy.c  | 2135 +++
 drivers/staging/mars/mars/mars_main.c  | 6160 
 drivers/staging/mars/mars/mars_proc.c  |  389 ++
 drivers/staging/mars/mars/mars_proc.h  |   34 +
 drivers/staging/mars/mars/net.c|  109 +
 drivers/staging/mars/mars/server_strategy.c|  436 ++
 drivers/staging/mars/mars/strategy.h   |  239 +
 drivers/staging/mars/xio_bricks/lib_log.c  |  506 ++
 drivers/staging/mars/xio_bricks/lib_mapfree.c  |  382 ++
 drivers/staging/mars/xio_bricks/xio.c  |  227 +
 drivers/staging/mars/xio_bricks/xio_bio.c  |  845 +++
 drivers/staging/mars/xio_bricks/xio_client.c   | 1083 
 drivers/staging/mars/xio_bricks/xio_copy.c | 1005 
 drivers/staging/mars/xio_bricks/xio_if.c   |  892 +++
 drivers/staging/mars/xio_bricks/xio_net.c  | 1849 ++
 drivers/staging/mars/xio_bricks/xio_server.c   |  493 ++
 drivers/staging/mars/xio_bricks/xio_sio.c  |  578 ++
 drivers/staging/mars/xio_bricks/xio_trans_logger.c | 3410 +++
 include/linux/brick/brick.h|  620 ++
 include/linux/brick/brick_checking.h   |  107 +
 include/linux/brick/brick_mem.h|  218 +
 include/linux/brick/brick_say.h|   89 +
 include/linux/brick/lamport.h  |   26 +
 include/linux/brick/lib_limiter.h  |   52 +
 include/linux/brick/lib_pairing_heap.h |  109 +
 include/linux/brick/lib_queue.h|  165 +
 include/linux/brick/lib_rank.h |  136 +
 include/linux/brick/lib_timing.h   |  182 +
 include/linux/brick/meta.h |  106 +
 include/linux/brick/vfs_compat.h   |   48 +
 include/linux/xio/lib_log.h|  333 ++
 include/linux/xio/lib_mapfree.h|   84 +
 include/linux/xio/xio.h|  319 +
 include/linux/xio/xio_bio.h|   85 +
 include/linux/xio/xio_client.h |  105 +
 include/linux/xio/xio_copy.h   |  115 +
 include/linux/xio/xio_if.h |  109 +
 include/linux/xio/xio_net.h|  177 +
 include/linux/xio/xio_server.h |   91 +
 include/linux/xio/xio_sio.h|   68 +
 include/linux/xio/xio_trans_logger.h   |  271 +
 52 files changed, 27854 insertions(+)
 create mode 100644 drivers/staging/mars/Kconfig
 create mode 100644 drivers/staging/mars/Makefile
 create mode 100644 drivers/staging/mars/brick.c
 create mode 100644 drivers/staging/mars/brick_mem.c
 create mode 100644 drivers/staging/mars/brick_say.c
 create mode 100644 drivers/staging/mars/lamport.c
 create m

[RFC 10/32] mars: add new module lib_limiter

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/lib/lib_limiter.c | 163 +
 include/linux/brick/lib_limiter.h  |  52 +++
 2 files changed, 215 insertions(+)
 create mode 100644 drivers/staging/mars/lib/lib_limiter.c
 create mode 100644 include/linux/brick/lib_limiter.h

diff --git a/drivers/staging/mars/lib/lib_limiter.c 
b/drivers/staging/mars/lib/lib_limiter.c
new file mode 100644
index ..e77b74a0eae7
--- /dev/null
+++ b/drivers/staging/mars/lib/lib_limiter.c
@@ -0,0 +1,163 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#define LIMITER_TIME_RESOLUTIONNSEC_PER_SEC
+
+int rate_limit(struct rate_limiter *lim, int amount)
+{
+   int delay = 0;
+   long long now;
+
+   now = cpu_clock(raw_smp_processor_id());
+
+   /* Compute the maximum delay along the path
+* down to the root of the hierarchy tree.
+*/
+   while (lim) {
+   long long window = now - lim->lim_stamp;
+
+   /* Sometimes, raw CPU clocks may do weired things...
+* Smaller windows in the denominator than 1s could fake 
unrealistic rates.
+*/
+   if (unlikely(lim->lim_min_window <= 0))
+   lim->lim_min_window = 1000;
+   if (unlikely(lim->lim_max_window <= lim->lim_min_window))
+   lim->lim_max_window = lim->lim_min_window + 8000;
+   if (unlikely(window < (long long)lim->lim_min_window * 
(LIMITER_TIME_RESOLUTION / 1000)))
+   window = (long long)lim->lim_min_window * 
(LIMITER_TIME_RESOLUTION / 1000);
+
+   /* Update total statistics.
+* They will intentionally wrap around.
+* Userspace must take care of that.
+*/
+   if (likely(amount > 0)) {
+   lim->lim_total_amount += amount;
+   lim->lim_total_ops++;
+   }
+
+   /* Only use incremental accumulation at repeated calls, but
+* never after longer pauses.
+*/
+   if (likely(lim->lim_stamp &&
+  window < (long long)lim->lim_max_window * 
(LIMITER_TIME_RESOLUTION / 1000))) {
+   long long rate_raw;
+   int rate;
+   int max_rate;
+
+   /* Races are possible, but taken into account.
+* There is no real harm from rarely lost updates.
+*/
+   if (likely(amount > 0)) {
+   lim->lim_amount_accu += amount;
+   lim->lim_amount_cumul += amount;
+   lim->lim_ops_accu++;
+   lim->lim_ops_cumul++;
+   }
+
+   /* compute amount values */
+   rate_raw = lim->lim_amount_accu * 
LIMITER_TIME_RESOLUTION / window;
+   rate = rate_raw;
+   if (unlikely(rate_raw > INT_MAX))
+   rate = INT_MAX;
+   lim->lim_amount_rate = rate;
+
+   /* amount limit exceeded? */
+   max_rate = lim->lim_max_amount_rate;
+   if (max_rate > 0 && rate > max_rate) {
+   int this_delay = (window * rate / max_rate - window) / (LIMITER_TIME_RESOLUTION / 1000);
+   /*  compute maximum */
+   if (this_delay > delay && this_delay > 0)
+   delay = this_delay;
+   }
+
+   /* compute ops values */
+   rate_raw = lim->lim_ops_accu * LIMITER_TIME_RESOLUTION / window;
+   rate = rate_raw;
+   if (unlikely(rate_raw > INT_MAX))
+   rate = INT_MAX;
+   lim->lim_ops_rate = rate;
+
+   /* ops limit 

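The quoted rate_limit() is cut off, but the amount branch shows how a delay is derived from the measured rate. Below is a minimal userspace sketch of only that arithmetic, assuming nanosecond timestamps and a result in milliseconds. demo_delay_ms() and DEMO_TIME_RESOLUTION are hypothetical names for illustration and are not part of the patch.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_TIME_RESOLUTION 1000000000LL /* ns per second, mirrors NSEC_PER_SEC */

/* Hypothetical helper: how long (in ms) a caller would have to wait so that
 * `accu` units observed over `window_ns` fall back to `max_rate` units/s.
 * Returns 0 when no limit is configured or the limit is not exceeded.
 */
static int demo_delay_ms(long long accu, long long window_ns, int max_rate)
{
	long long rate_raw;
	long long delay;
	int rate;

	assert(window_ns > 0);

	/* units per second, clamped to INT_MAX as in the patch */
	rate_raw = accu * DEMO_TIME_RESOLUTION / window_ns;
	rate = rate_raw > INT_MAX ? INT_MAX : (int)rate_raw;

	if (max_rate <= 0 || rate <= max_rate)
		return 0;

	/* stretch the window by rate / max_rate, then convert ns to ms */
	delay = (window_ns * rate / max_rate - window_ns) /
		(DEMO_TIME_RESOLUTION / 1000);
	return delay > 0 ? (int)delay : 0;
}
```

With 2000 units accumulated in a one-second window and a limit of 1000 units/s, demo_delay_ms(2000, 1000000000LL, 1000) returns 1000, i.e. the window would have to be one second longer.
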
[RFC 08/32] mars: add new module lib_queue

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 include/linux/brick/lib_queue.h | 165 
 1 file changed, 165 insertions(+)
 create mode 100644 include/linux/brick/lib_queue.h

diff --git a/include/linux/brick/lib_queue.h b/include/linux/brick/lib_queue.h
new file mode 100644
index ..72cd0a2710c2
--- /dev/null
+++ b/include/linux/brick/lib_queue.h
@@ -0,0 +1,165 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef LIB_QUEUE_H
+#define LIB_QUEUE_H
+
+#define QUEUE_ANCHOR(PREFIX, KEYTYPE, HEAPTYPE) \
+   /* parameters */\
+   /* readonly from outside */ \
+   atomic_t q_queued;  \
+   atomic_t q_flying;  \
+   atomic_t q_total;   \
+   /* tunables */  \
+   int q_batchlen; \
+   int q_io_prio;  \
+   bool q_ordering;\
+   /* private */   \
+   wait_queue_head_t *q_event; \
+   spinlock_t q_lock;  \
+   struct list_head q_anchor;  \
+   struct pairing_heap_##HEAPTYPE *heap_high;  \
+   struct pairing_heap_##HEAPTYPE *heap_low;   \
+   long long q_last_insert; /* jiffies */  \
+   KEYTYPE heap_margin;\
+   KEYTYPE last_pos
+
+#define QUEUE_FUNCTIONS(PREFIX, ELEM_TYPE, HEAD, KEYFN, KEYCMP, HEAPTYPE)\
+   \
+static inline  \
+void q_##PREFIX##_trigger(struct PREFIX##_queue *q)\
+{  \
+   if (q->q_event) {   \
+   wake_up_interruptible(q->q_event);  \
+   }   \
+}  \
+   \
+static inline  \
+void q_##PREFIX##_init(struct PREFIX##_queue *q)   \
+{  \
+   INIT_LIST_HEAD(&q->q_anchor);   \
+   q->heap_low = NULL; \
+   q->heap_high = NULL;\
+   spin_lock_init(&q->q_lock); \
+   atomic_set(&q->q_queued, 0);\
+   atomic_set(&q->q_flying, 0);\
+}  \
+   \
+static inline  \
+void q_##PREFIX##_insert(struct PREFIX##_queue *q, ELEM_TYPE * elem)   \
+{  \
+   unsigned long flags;\
+   \
+   spin_lock_irqsave(&q->q_lock, flags);   \
+   \
+   if (q->q_ordering) {\
+   struct pairing_heap_##HEAPTYPE **use = &q->heap_high;   \
+   if (KEYCMP(KEYFN(elem), &q->heap_margin) <= 0) {\
+   use = &q->heap_low; \
+   }   \
+   ph_insert_##HEAPTYPE

[RFC 19/32] mars: add new module xio_client

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/xio_bricks/xio_client.c | 1083 ++
 include/linux/xio/xio_client.h   |  105 +++
 2 files changed, 1188 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_client.c
 create mode 100644 include/linux/xio/xio_client.h

diff --git a/drivers/staging/mars/xio_bricks/xio_client.c b/drivers/staging/mars/xio_bricks/xio_client.c
new file mode 100644
index ..209523378660
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_client.c
@@ -0,0 +1,1083 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+/*** own type definitions ***/
+
+#include 
+
+#define CLIENT_HASH_MAX (PAGE_SIZE / sizeof(struct list_head))
+
+int xio_client_abort = 10;
+
+int max_client_channels = 1;
+
+int max_client_bulk = 16;
+
+/*** own helper functions ***/
+
+static int thread_count;
+
+static
+void _do_resubmit(struct client_channel *ch)
+{
+   struct client_output *output = ch->output;
+   unsigned long flags;
+
+   spin_lock_irqsave(>lock, flags);
+   if (!list_empty(&ch->wait_list)) {
+   struct list_head *first = ch->wait_list.next;
+   struct list_head *last = ch->wait_list.prev;
+   struct list_head *old_start = output->aio_list.next;
+
+#define list_connect __list_del /*  the original routine has a misleading name: in reality it is more general */
+   list_connect(&output->aio_list, first);
+   list_connect(last, old_start);
+   INIT_LIST_HEAD(&ch->wait_list);
+   }
+   spin_unlock_irqrestore(>lock, flags);
+}
+
+static
+void _kill_thread(struct client_threadinfo *ti, const char *name)
+{
+   struct task_struct *thread = ti->thread;
+
+   if (thread) {
+   XIO_DBG("stopping %s thread\n", name);
+   ti->thread = NULL;
+   brick_thread_stop(thread);
+   }
+}
+
+static
+void _kill_channel(struct client_channel *ch)
+{
+   XIO_DBG("channel = %p\n", ch);
+   if (xio_socket_is_alive(&ch->socket)) {
+   XIO_DBG("shutdown socket\n");
+   xio_shutdown_socket(&ch->socket);
+   }
+   _kill_thread(&ch->receiver, "receiver");
+   if (ch->is_open) {
+   XIO_DBG("close socket\n");
+   xio_put_socket(&ch->socket);
+   }
+   ch->recv_error = 0;
+   ch->is_used = false;
+   ch->is_open = false;
+   ch->is_connected = false;
+   /* Re-Submit any waiting requests
+*/
+   _do_resubmit(ch);
+}
+
+static inline
+void _kill_all_channels(struct client_bundle *bundle)
+{
+   int i;
+
+   /*  first pass: shutdown in parallel without waiting */
+   for (i = 0; i < MAX_CLIENT_CHANNELS; i++) {
+   struct client_channel *ch = &bundle->channel[i];
+
+   if (xio_socket_is_alive(&ch->socket)) {
+   XIO_DBG("shutdown socket %d\n", i);
+   xio_shutdown_socket(&ch->socket);
+   }
+   }
+   /*  separate pass (may wait) */
+   for (i = 0; i < MAX_CLIENT_CHANNELS; i++)
+   _kill_channel(&bundle->channel[i]);
+}
+
+static int receiver_thread(void *data);
+
+static
+int _setup_channel(struct client_bundle *bundle, int ch_nr)
+{
+   struct client_channel *ch = &bundle->channel[ch_nr];
+   struct sockaddr_storage src_sockaddr;
+   struct sockaddr_storage dst_sockaddr;
+   int status;
+
+   ch->ch_nr = ch_nr;
+   if (unlikely(ch->receiver.thread)) {
+   XIO_WRN("receiver thread %d unexpectedly not dead\n", ch_nr);
+   _kill_thread(&ch->receiver, "receiver");
+   }
+
+   status = xio_create_sockaddr(&src_sockaddr, my_id());
+   if (unlikely(status < 0)) {
+   XIO_DBG("no src sockaddr, status = %d\n", status);
+   goto done;
+   }
+
+   status = xio_create_sockaddr(&dst_sockaddr, bundle->host);
+   if (unlikely(status < 0)) {
+   XIO_DBG("no dst sockaddr, status = %d\n", status);

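The comment in _do_resubmit() notes that __list_del() is more general than its name suggests: it only links two nodes to each other. The following is a minimal userspace sketch of that splice, assuming a plain circular doubly linked list. All demo_* names are hypothetical stand-ins and not the kernel's list API.

```c
#include <assert.h>
#include <stddef.h>

struct demo_list {
	struct demo_list *next, *prev;
};

static void demo_init(struct demo_list *head)
{
	head->next = head;
	head->prev = head;
}

/* Same effect as the kernel's __list_del(prev, next): make the two nodes
 * adjacent. Nothing is "deleted" unless a node happened to sit in between.
 */
static void demo_connect(struct demo_list *prev, struct demo_list *next)
{
	assert(prev && next);
	next->prev = prev;
	prev->next = next;
}

static void demo_add_tail(struct demo_list *item, struct demo_list *head)
{
	struct demo_list *last = head->prev;

	demo_connect(last, item);
	demo_connect(item, head);
}

/* Move all entries of src to the front of dst, preserving their order.
 * This mirrors the two list_connect() calls followed by INIT_LIST_HEAD().
 */
static void demo_splice_front(struct demo_list *src, struct demo_list *dst)
{
	if (src->next != src) {
		struct demo_list *first = src->next;
		struct demo_list *last = src->prev;
		struct demo_list *old_start = dst->next;

		demo_connect(dst, first);
		demo_connect(last, old_start);
		demo_init(src);
	}
}

static int demo_count(const struct demo_list *head)
{
	const struct demo_list *pos;
	int n = 0;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

Two pointer pairs are rewritten regardless of how many requests are waiting, so the resubmit step stays O(1) while the lock is held.
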
[RFC 03/32] mars: add new module brick_mem

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/brick_mem.c | 1080 ++
 include/linux/brick/brick_mem.h  |  218 
 2 files changed, 1298 insertions(+)
 create mode 100644 drivers/staging/mars/brick_mem.c
 create mode 100644 include/linux/brick/brick_mem.h

diff --git a/drivers/staging/mars/brick_mem.c b/drivers/staging/mars/brick_mem.c
new file mode 100644
index ..232dbf6cb0ca
--- /dev/null
+++ b/drivers/staging/mars/brick_mem.c
@@ -0,0 +1,1080 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+#include 
+#include 
+
+#define USE_KERNEL_PAGES   /*  currently mandatory (vmalloc does not work) */
+
+#define MAGIC_BLOCK 0x8B395D7B
+#define MAGIC_BEND 0x8B395D7C
+#define MAGIC_MEM1 0x8B395D7D
+#define MAGIC_MEM2 0x9B395D8D
+#define MAGIC_MEND1 0x8B395D7E
+#define MAGIC_MEND2 0x9B395D8E
+#define MAGIC_STR  0x8B395D7F
+#define MAGIC_SEND 0x9B395D8F
+
+#define INT_ACCESS(ptr, offset) (*(int *)(((char *)(ptr)) + (offset)))
+
+#define _BRICK_FMT(_fmt, _class)   \
+   "%ld.%09ld %ld.%09ld MEM_%-5s %s[%d] %s:%d %s(): "  \
+   _fmt,   \
+   _s_now.tv_sec, _s_now.tv_nsec,  \
+   _l_now.tv_sec, _l_now.tv_nsec,  \
+   say_class[_class],  \
+   current->comm, (int)smp_processor_id(), \
+   __BASE_FILE__,  \
+   __LINE__,   \
+   __func__
+
+#define _BRICK_MSG(_class, _dump, _fmt, _args...)  \
+   do {\
+   struct timespec _s_now = CURRENT_TIME;  \
+   struct timespec _l_now; \
+   get_lamport(&_l_now);   \
+   say(_class, _BRICK_FMT(_fmt, _class), ##_args); \
+   if (_dump)  \
+   dump_stack();   \
+   } while (0)
+
+#define BRICK_ERR(_fmt, _args...) _BRICK_MSG(SAY_ERROR, true,  _fmt, ##_args)
+#define BRICK_WRN(_fmt, _args...) _BRICK_MSG(SAY_WARN, false, _fmt, ##_args)
+#define BRICK_INF(_fmt, _args...) _BRICK_MSG(SAY_INFO, false, _fmt, ##_args)
+
+/***/
+
+/*  limit handling */
+
+#include 
+
+long long brick_global_memavail;
+long long brick_global_memlimit;
+
+atomic64_t brick_global_block_used = ATOMIC64_INIT(0);
+
+void get_total_ram(void)
+{
+   struct sysinfo i = {};
+
+   si_meminfo(&i);
+   /* si_swapinfo(); */
+   brick_global_memavail = (long long)i.totalram * (PAGE_SIZE / 1024);
+   BRICK_INF("total RAM = %lld [KiB]\n", brick_global_memavail);
+}
+
+/***/
+
+/*  small memory allocation (use this only for len < PAGE_SIZE) */
+
+#ifdef BRICK_DEBUG_MEM
+static atomic_t phys_mem_alloc = ATOMIC_INIT(0);
+static atomic_t mem_redirect_alloc = ATOMIC_INIT(0);
+static atomic_t mem_count[BRICK_DEBUG_MEM];
+static atomic_t mem_free[BRICK_DEBUG_MEM];
+static int  mem_len[BRICK_DEBUG_MEM];
+
+#define PLUS_SIZE  (6 * sizeof(int))
+#else
+#define PLUS_SIZE  (2 * sizeof(int))
+#endif
+
+static inline
+void *__brick_mem_alloc(int len)
+{
+   void *res;
+
+   if (len >= PAGE_SIZE) {
+#ifdef BRICK_DEBUG_MEM
+   atomic_inc(&mem_redirect_alloc);
+#endif
+   res = _brick_block_alloc(0, len, 0);
+   } else {
+   for (;;) {
+   res = kmalloc(len, GFP_BRICK);
+   if (likely(res))
+   break;
+   msleep(1000);
+   }

[RFC 22/32] mars: add new module xio_trans_logger

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/xio_bricks/xio_trans_logger.c | 3410 
 include/linux/xio/xio_trans_logger.h   |  271 ++
 2 files changed, 3681 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_trans_logger.c
 create mode 100644 include/linux/xio/xio_trans_logger.h

diff --git a/drivers/staging/mars/xio_bricks/xio_trans_logger.c b/drivers/staging/mars/xio_bricks/xio_trans_logger.c
new file mode 100644
index ..f82e9075ac5a
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_trans_logger.c
@@ -0,0 +1,3410 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/*  Trans_Logger brick */
+
+#define XIO_DEBUGGING
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+
+/*  variants */
+#define KEEP_UNIQUE
+#define DELAY_CALLERS  /*  this is _needed_ for production systems */
+/* When possible, queue 1 executes phase3_startio() directly without
+ * intermediate queueing into queue 3 => may be irritating, but has better
+ * performance. NOTICE: when some day the IO scheduling should be
+ * different between queue 1 and 3, you MUST disable this in order
+ * to distinguish between them!
+ */
+#define SHORTCUT_1_to_3
+
+/*  commenting this out is dangerous for data integrity! use only for testing! */
+#define USE_MEMCPY
+#define DO_WRITEBACK   /*  otherwise FAKE IO */
+#define REPLAY_DATA
+
+/*  tuning */
+#ifdef BRICK_DEBUG_MEM
+#define CONF_TRANS_CHUNKSIZE   (128 * 1024 - PAGE_SIZE * 2)
+#else
+#define CONF_TRANS_CHUNKSIZE   (128 * 1024)
+#endif
+#define CONF_TRANS_MAX_AIO_SIZE PAGE_SIZE
+#define CONF_TRANS_ALIGN   0
+
+#define XIO_RPL(_args...) /*empty*/
+
+struct trans_logger_hash_anchor {
+   struct rw_semaphore hash_mutex;
+   struct list_head hash_anchor;
+};
+
+#define NR_HASH_PAGES  64
+
+#define MAX_HASH_PAGES (PAGE_SIZE / sizeof(struct trans_logger_hash_anchor *))
+#define HASH_PER_PAGE  (PAGE_SIZE / sizeof(struct trans_logger_hash_anchor))
+#define HASH_TOTAL (NR_HASH_PAGES * HASH_PER_PAGE)
+
+#define STATIST_SIZE   2048
+
+/*** global tuning ***/
+
+int trans_logger_completion_semantics = 1;
+
+int trans_logger_do_crc =
+#ifdef CONFIG_MARS_DEBUG
+   true;
+#else
+   false;
+#endif
+
+int trans_logger_mem_usage; /* in KB */
+
+int trans_logger_max_interleave = -1;
+
+int trans_logger_resume = 1;
+
+int trans_logger_replay_timeout = 1; /*  in s */
+
+struct writeback_group global_writeback = {
+   .lock = __RW_LOCK_UNLOCKED(global_writeback.lock),
+   .group_anchor = LIST_HEAD_INIT(global_writeback.group_anchor),
+   .until_percent = 30,
+};
+
+static
+void add_to_group(struct writeback_group *gr, struct trans_logger_brick *brick)
+{
+   unsigned long flags;
+
+   write_lock_irqsave(&gr->lock, flags);
+   list_add_tail(&brick->group_head, &gr->group_anchor);
+   write_unlock_irqrestore(&gr->lock, flags);
+}
+
+static
+void remove_from_group(struct writeback_group *gr, struct trans_logger_brick *brick)
+{
+   unsigned long flags;
+
+   write_lock_irqsave(&gr->lock, flags);
+   list_del_init(&brick->group_head);
+   gr->leader = NULL;
+   write_unlock_irqrestore(&gr->lock, flags);
+}
+
+static
+struct trans_logger_brick *elect_leader(struct writeback_group *gr)
+{
+   struct trans_logger_brick *res = gr->leader;
+   struct list_head *tmp;
+   unsigned long flags;
+
+   if (res && gr->until_percent >= 0) {
+   loff_t used = atomic64_read(&res->shadow_mem_used);
+
+   if (used > gr->biggest * gr->until_percent / 100)
+   goto done;
+   }
+
+   read_lock_irqsave(&gr->lock, flags);
+   for (tmp = gr->group_anchor.next; tmp != &gr->group_anchor; tmp = tmp->next) {
+   struct trans_logger_brick *test = container_of(tmp, struct trans_logger_brick, group_head);
+   loff_t new_used = atomic64_read(&test->shadow_mem_used);
+
+   if (!res || new_used > atomic64_read(&res->shadow_mem_used)) {
+   res = test;
+   gr->biggest = new_used;

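The quoted elect_leader() is cut off, but the visible part shows a hysteresis: the current leader is kept as long as its memory usage stays above until_percent of the usage recorded at its election; only then is the group rescanned for the biggest user. Below is a minimal userspace sketch of that policy, assuming bricks are represented by indices into a usage array. struct demo_group and demo_elect() are hypothetical, and storing the winner at the end is an assumption about the truncated remainder.

```c
#include <assert.h>
#include <stddef.h>

struct demo_group {
	int leader;        /* index of the current leader, -1 if none */
	long long biggest; /* usage recorded when a new leader was picked */
	int until_percent; /* negative disables the hysteresis */
};

/* Return the index of the leader for the given usage snapshot. */
static int demo_elect(struct demo_group *gr, const long long *used, size_t n)
{
	int res = gr->leader;
	size_t i;

	assert(used != NULL);

	/* Fast path: keep the leader while it is still "big enough". */
	if (res >= 0 && gr->until_percent >= 0 &&
	    used[res] > gr->biggest * gr->until_percent / 100)
		return res;

	/* Slow path: pick the member with the largest usage. */
	for (i = 0; i < n; i++) {
		if (res < 0 || used[i] > used[res]) {
			res = (int)i;
			gr->biggest = used[i];
		}
	}

	gr->leader = res; /* assumption: the truncated original records the winner */
	return res;
}
```

The threshold avoids flapping between bricks whose usage is close together: a new election only happens once the leader has drained most of its backlog.
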
[RFC 18/32] mars: add new module xio_sio

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/xio_bricks/xio_sio.c | 578 ++
 include/linux/xio/xio_sio.h   |  68 
 2 files changed, 646 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_sio.c
 create mode 100644 include/linux/xio/xio_sio.h

diff --git a/drivers/staging/mars/xio_bricks/xio_sio.c b/drivers/staging/mars/xio_bricks/xio_sio.c
new file mode 100644
index ..c910cbda2ae5
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_sio.c
@@ -0,0 +1,578 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+/*** own type definitions ***/
+
+#include 
+
+/* own brick * input * output operations */
+
+static int sio_io_get(struct sio_output *output, struct aio_object *aio)
+{
+   struct file *file;
+
+   if (unlikely(!output->brick->power.on_led))
+   return -EBADFD;
+
+   if (aio->obj_initialized) {
+   obj_get(aio);
+   return aio->io_len;
+   }
+
+   file = output->mf->mf_filp;
+   if (file) {
+   loff_t total_size = i_size_read(file->f_mapping->host);
+
+   aio->io_total_size = total_size;
+   /* Only check reads.
+* Writes behind EOF are always allowed (sparse files)
+*/
+   if (!aio->io_may_write) {
+   loff_t len = total_size - aio->io_pos;
+
+   if (unlikely(len <= 0)) {
+   /* Special case: allow reads starting _exactly_ at EOF when a timeout is specified.
+    */
+   if (len < 0 || aio->io_timeout <= 0) {
+   XIO_DBG("ENODATA %lld\n", len);
+   return -ENODATA;
+   }
+   }
+   /*  Shorten below EOF, but allow special case */
+   if (aio->io_len > len && len > 0)
+   aio->io_len = len;
+   }
+   }
+
+   /* Buffered IO.
+*/
+   if (!aio->io_data) {
+   struct sio_aio_aspect *aio_a = sio_aio_get_aspect(output->brick, aio);
+
+   if (unlikely(!aio_a))
+   return -EILSEQ;
+   if (unlikely(aio->io_len <= 0)) {
+   XIO_ERR("bad io_len = %d\n", aio->io_len);
+   return -ENOMEM;
+   }
+   aio->io_data = brick_block_alloc(aio->io_pos, (aio_a->alloc_len = aio->io_len));
+   aio_a->do_dealloc = true;
+   /* atomic_inc(>total_alloc_count); */
+   /* atomic_inc(>alloc_count); */
+   }
+
+   obj_get_first(aio);
+   return aio->io_len;
+}
+
+static void sio_io_put(struct sio_output *output, struct aio_object *aio)
+{
+   struct file *file;
+   struct sio_aio_aspect *aio_a;
+
+   if (!obj_put(aio))
+   goto out_return;
+   file = output->mf->mf_filp;
+   aio->io_total_size = i_size_read(file->f_mapping->host);
+
+   aio_a = sio_aio_get_aspect(output->brick, aio);
+   if (aio_a && aio_a->do_dealloc) {
+   brick_block_free(aio->io_data, aio_a->alloc_len);
+   /* atomic_dec(>alloc_count); */
+   }
+
+   obj_free(aio);
+out_return:;
+}
+
+static
+int write_aops(struct sio_output *output, struct aio_object *aio)
+{
+   struct file *file = output->mf->mf_filp;
+   loff_t pos = aio->io_pos;
+   void *data = aio->io_data;
+   int  len = aio->io_len;
+   int ret = 0;
+
+   mm_segment_t oldfs;
+
+   oldfs = get_fs();
+   set_fs(get_ds());
+   ret = vfs_write(file, data, len, &pos);
+   set_fs(oldfs);
+   return ret;
+}
+
+static
+int read_aops(struct sio_output *output, struct aio_object *aio)
+{
+	loff_t pos = aio->io_pos;
+	int len = aio->io_len;
+	int ret;

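The EOF handling in sio_io_get() above can be restated as a small user-space sketch. All names here (struct read_req, clamp_read_to_eof(), and the -1 return value standing in for -ENODATA) are hypothetical and only mirror the logic shown in the patch; they are not part of MARS.

```c
/* Hypothetical stand-in for the aio_object fields used by the check. */
struct read_req {
	long long pos;     /* start offset of the read */
	int len;           /* requested length, shortened in place */
	int timeout;       /* > 0: caller accepts a read starting at EOF */
};

/*
 * Returns the (possibly shortened) length, or -1 where the patch
 * returns -ENODATA.
 */
static int clamp_read_to_eof(struct read_req *req, long long total_size)
{
	long long remaining = total_size - req->pos;

	if (remaining <= 0) {
		/* Behind EOF always fails; exactly at EOF fails without timeout. */
		if (remaining < 0 || req->timeout <= 0)
			return -1;
	}
	/* Shorten to EOF, but leave the "exactly at EOF" case untouched. */
	if (req->len > remaining && remaining > 0)
		req->len = (int)remaining;
	return req->len;
}
```

With a file size of 4096, a 512-byte read at offset 4000 is shortened to 96 bytes, a read starting at 4096 without a timeout is rejected, and the same read with a timeout keeps its full length.
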
[RFC 20/32] mars: add new module xio_if

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/xio_bricks/xio_if.c | 892 +++
 include/linux/xio/xio_if.h   | 109 
 2 files changed, 1001 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_if.c
 create mode 100644 include/linux/xio/xio_if.h

diff --git a/drivers/staging/mars/xio_bricks/xio_if.c b/drivers/staging/mars/xio_bricks/xio_if.c
new file mode 100644
index ..97e0cd541c5c
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_if.c
@@ -0,0 +1,892 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/* Interface to a Linux device.
+ * 1 Input, 0 Outputs.
+ */
+
+#define ALWAYS_UNPLUG			true
+#define PREFETCH_LEN			PAGE_SIZE
+
+/*  low-level device parameters */
+#define USE_MAX_SECTORS			(PAGE_SIZE >> 9)
+#define USE_MAX_PHYS_SEGMENTS		(PAGE_SIZE >> 9)
+#define USE_MAX_SEGMENT_SIZE		PAGE_SIZE
+#define USE_LOGICAL_BLOCK_SIZE		512
+#define USE_SEGMENT_BOUNDARY		(PAGE_SIZE - 1)
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#ifndef XIO_MAJOR
+#define XIO_MAJOR  (DRBD_MAJOR + 1)
+#endif
+
+/*** global tuning ***/
+
+int if_throttle_start_size;
+
+struct rate_limiter if_throttle = {
+	.lim_max_amount_rate = 5000,
+};
+
+/*** own type definitions ***/
+
+/*** own static definitions ***/
+
+/*  TODO: check bounds, ensure that free minor numbers are recycled */
+static int device_minor;
+
+/*** object * aspect constructors * destructors ***/
+
+/*** linux operations ***/
+
+static
+void _if_start_io_acct(struct if_input *input, struct bio_wrapper *biow)
+{
+	struct bio *bio = biow->bio;
+	const int rw = bio_data_dir(bio);
+	const int cpu = part_stat_lock();
+
+	(void)cpu;
+	part_round_stats(cpu, &input->disk->part0);
+	part_stat_inc(cpu, &input->disk->part0, ios[rw]);
+	part_stat_add(cpu, &input->disk->part0, sectors[rw], bio->bi_iter.bi_size >> 9);
+	part_inc_in_flight(&input->disk->part0, rw);
+	part_stat_unlock();
+	biow->start_time = jiffies;
+}
+
+static
+void _if_end_io_acct(struct if_input *input, struct bio_wrapper *biow)
+{
+	unsigned long duration = jiffies - biow->start_time;
+	struct bio *bio = biow->bio;
+	const int rw = bio_data_dir(bio);
+	const int cpu = part_stat_lock();
+
+	(void)cpu;
+	part_stat_add(cpu, &input->disk->part0, ticks[rw], duration);
+	part_round_stats(cpu, &input->disk->part0);
+	part_dec_in_flight(&input->disk->part0, rw);
+	part_stat_unlock();
+}
+
+/* callback
+ */
+static
+void if_endio(struct generic_callback *cb)
+{
+	struct if_aio_aspect *aio_a = cb->cb_private;
+	struct if_input *input;
+	int k;
+	int rw;
+	int error;
+
+	LAST_CALLBACK(cb);
+	if (unlikely(!aio_a || !aio_a->object)) {
+		/* Do not dereference aio_a here: it may be NULL. */
+		XIO_FAT("aio_a = %p aio = %p, something is very wrong here!\n",
+			aio_a, aio_a ? aio_a->object : NULL);
+		goto out_return;
+	}
+	input = aio_a->input;
+	CHECK_PTR(input, err);
+
+	rw = aio_a->object->io_rw;
+
+	for (k = 0; k < aio_a->bio_count; k++) {
+		struct bio_wrapper *biow;
+		struct bio *bio;
+
+		biow = aio_a->orig_biow[k];
+		aio_a->orig_biow[k] = NULL;
+		CHECK_PTR(biow, err);
+
+		CHECK_ATOMIC(&biow->bi_comp_cnt, 1);
+		if (!atomic_dec_and_test(&biow->bi_comp_cnt))
+			continue;
+
+		bio = biow->bio;
+		CHECK_PTR_NULL(bio, err);
+
+		_if_end_io_acct(input, biow);
+
+		error = CALLBACK_ERROR(aio_a->object);
+		if (unlikely(error < 0)) {
+			unsigned int bi_size = bio->bi_iter.bi_size;
+
+			XIO_ERR("NYI: error=%d RETRY LOGIC %u\n", error, bi_size);
+		} else { /*  bio conventions are slightly different... */

[RFC 23/32] mars: add new module xio_server

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/xio_bricks/xio_server.c | 493 +++
 include/linux/xio/xio_server.h   |  91 +
 2 files changed, 584 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_server.c
 create mode 100644 include/linux/xio/xio_server.h

diff --git a/drivers/staging/mars/xio_bricks/xio_server.c b/drivers/staging/mars/xio_bricks/xio_server.c
new file mode 100644
index ..28944d15a7bf
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_server.c
@@ -0,0 +1,493 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/*  Server brick (just for demonstration) */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*** own type definitions ***/
+
+#include 
+
+static struct xio_socket server_socket[NR_SERVER_SOCKETS];
+static struct task_struct *server_threads[NR_SERVER_SOCKETS];
+
+/*** own helper functions ***/
+
+int cb_thread(void *data)
+{
+	struct server_brick *brick = data;
+	struct xio_socket *sock = &brick->handler_socket;
+	bool aborted = false;
+	bool ok = xio_get_socket(sock);
+	int status = -EINVAL;
+
+	XIO_DBG("--- cb_thread starting on socket #%d, ok = %d\n", sock->s_debug_nr, ok);
+	if (!ok)
+		goto done;
+
+	brick->cb_running = true;
+	wake_up_interruptible(&brick->startup_event);
+
+	while (!brick_thread_should_stop() ||
+	       !list_empty(&brick->cb_read_list) ||
+	       !list_empty(&brick->cb_write_list) ||
+	       atomic_read(&brick->in_flight) > 0) {
+		struct server_aio_aspect *aio_a;
+		struct aio_object *aio;
+		struct list_head *tmp;
+		unsigned long flags;
+
+		wait_event_interruptible_timeout(
+			brick->cb_event,
+			!list_empty(&brick->cb_read_list) ||
+			!list_empty(&brick->cb_write_list),
+			1 * HZ);
+
+		spin_lock_irqsave(&brick->cb_lock, flags);
+		tmp = brick->cb_write_list.next;
+		if (tmp == &brick->cb_write_list) {
+			tmp = brick->cb_read_list.next;
+			if (tmp == &brick->cb_read_list) {
+				spin_unlock_irqrestore(&brick->cb_lock, flags);
+				brick_msleep(1000 / HZ);
+				continue;
+			}
+		}
+		list_del_init(tmp);
+		spin_unlock_irqrestore(&brick->cb_lock, flags);
+
+		aio_a = container_of(tmp, struct server_aio_aspect, cb_head);
+		aio = aio_a->object;
+		status = -EINVAL;
+		CHECK_PTR(aio, err);
+
+		status = 0;
+		/* Report a remote error when consistency cannot be guaranteed,
+		 * e.g. emergency mode during sync.
+		 */
+		if (brick->conn_brick &&
+		    brick->conn_brick->mode_ptr &&
+		    *brick->conn_brick->mode_ptr < 0 &&
+		    aio->object_cb)
+			aio->object_cb->cb_error = *brick->conn_brick->mode_ptr;
+		if (!aborted) {
+			down(&brick->socket_sem);
+			status = xio_send_cb(sock, aio);
+			up(&brick->socket_sem);
+		}
+
+err:
+		if (unlikely(status < 0) && !aborted) {
+			aborted = true;
+			XIO_WRN("cannot send response, status = %d\n", status);
+			/* Just shutdown the socket and forget all pending
+			 * requests.
+			 * The _client_ is responsible for resending
+			 * any lost operations.
+			 */
+			xio_shutdown_socket(sock);
+		}
+
+		if (aio_a->data) {
+			brick_block_free(aio_a->data, aio_a->len);
+			aio->io_data = NULL;
+		}
+ 

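The dequeue order used by cb_thread() above (pending writes are served before pending reads) can be illustrated with a user-space sketch. The list type and helper names below are hypothetical stand-ins modelled on the kernel's struct list_head; the locking done with cb_lock in the patch is deliberately left out.

```c
#include <stddef.h>

/* Minimal circular doubly-linked list, modelled on struct list_head. */
struct node {
	struct node *next;
	struct node *prev;
};

static void node_init(struct node *n)
{
	n->next = n;
	n->prev = n;
}

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink n and leave it self-linked, like list_del_init(). */
static void node_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	node_init(n);
}

/*
 * Take the oldest pending write; fall back to the oldest pending read;
 * return NULL when both queues are empty.
 */
static struct node *dequeue_write_first(struct node *writes, struct node *reads)
{
	struct node *tmp = writes->next;

	if (tmp == writes) {
		tmp = reads->next;
		if (tmp == reads)
			return NULL;
	}
	node_del_init(tmp);
	return tmp;
}
```

In this sketch a read queued before a write is still returned after it, which matches the order in which the patch inspects cb_write_list and cb_read_list.
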
[RFC 21/32] mars: add new module xio_copy

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/xio_bricks/xio_copy.c | 1005 
 include/linux/xio/xio_copy.h   |  115 
 2 files changed, 1120 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_copy.c
 create mode 100644 include/linux/xio/xio_copy.h

diff --git a/drivers/staging/mars/xio_bricks/xio_copy.c b/drivers/staging/mars/xio_bricks/xio_copy.c
new file mode 100644
index ..56b60f2f837e
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_copy.c
@@ -0,0 +1,1005 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/*  Copy brick (just for demonstration) */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#ifndef READ
+#define READ   0
+#define WRITE  1
+#endif
+
+#define COPY_CHUNK			(PAGE_SIZE)
+#define NR_COPY_REQUESTS		(32 * 1024 * 1024 / COPY_CHUNK)
+
+#define STATES_PER_PAGE			(PAGE_SIZE / sizeof(struct copy_state))
+#define MAX_SUB_TABLES							\
+	(NR_COPY_REQUESTS / STATES_PER_PAGE +				\
+	 (NR_COPY_REQUESTS % STATES_PER_PAGE ? 1 : 0))
+#define MAX_COPY_REQUESTS						\
+	(PAGE_SIZE / sizeof(struct copy_state *) * STATES_PER_PAGE)
+
+#define GET_STATE(brick, index)						\
+	((brick)->st[(index) / STATES_PER_PAGE][(index) % STATES_PER_PAGE])
+
+/*** own type definitions ***/
+
+#include 
+
+int xio_copy_overlap = 1;
+
+int xio_copy_read_prio = XIO_PRIO_NORMAL;
+
+int xio_copy_write_prio = XIO_PRIO_NORMAL;
+
+int xio_copy_read_max_fly;
+
+int xio_copy_write_max_fly;
+
+#define is_read_limited(brick)						\
+	(xio_copy_read_max_fly > 0 &&					\
+	 atomic_read(&(brick)->copy_read_flight) >= xio_copy_read_max_fly)
+
+#define is_write_limited(brick)						\
+	(xio_copy_write_max_fly > 0 &&					\
+	 atomic_read(&(brick)->copy_write_flight) >= xio_copy_write_max_fly)
+
+/*** own helper functions ***/
+
+/* TODO:
+ * The clash logic is untested / alpha stage (Feb. 2011).
+ *
+ * For now, the output is never used, so this cannot do harm.
+ *
+ * In order to get the output really working / enterprise grade,
+ * some larger test effort should be invested.
+ */
+static inline
+void _clash(struct copy_brick *brick)
+{
+	brick->trigger = true;
+	set_bit(0, &brick->clash);
+	atomic_inc(&brick->total_clash_count);
+	wake_up_interruptible(&brick->event);
+}
+
+static inline
+int _clear_clash(struct copy_brick *brick)
+{
+	int old;
+
+	old = test_and_clear_bit(0, &brick->clash);
+	return old;
+}
+
+/* Current semantics:
+ *
+ * All writes are always going to the original input A. They are _not_
+ * replicated to B.
+ *
+ * In order to get B really uptodate, you have to replay the right
+ * transaction logs there (at the right time).
+ * [If you had no writes on A at all during the copy, of course
+ * this is not necessary]
+ *
+ * When utilize_mode is on, reads can utilize the already copied
+ * region from B, but only as long as this region has not been
+ * invalidated by writes (indicated by low_dirty).
+ *
+ * TODO: implement replicated writes, together with some transaction
+ * replay logic applying the transaction logs _only_ after
+ * crashes during inconsistency caused by partial replication of writes.
+ */
+static
+int _determine_input(struct copy_brick *brick, struct aio_object *aio)
+{
+	int rw;
+	int below;
+	int behind;
+	loff_t io_end;
+
+	if (!brick->utilize_mode || brick->low_dirty)
+		return INPUT_A_IO;
+
+	io_end = aio->io_pos + aio->io_len;
+	below = io_end <= brick->copy_start;
+	behind = !brick->copy_end || aio->io_pos >= brick->copy_end;
+	rw = aio->io_may_write | aio->io_rw;
+	if (rw) {
+		if (!behind) {
+			brick->low_dirty = true;
+			if (!below) {
+				_clash(brick);
+  

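The STATES_PER_PAGE, MAX_SUB_TABLES and GET_STATE macros of xio_copy.c describe a two-level table: an array of page-sized sub-tables, with the number of sub-tables rounded up. A user-space sketch of the same indexing scheme follows. The names PAGE_BYTES, NR_REQUESTS, struct state, alloc_table() and free_table() are hypothetical, the page size of 4096 is an assumption, and calloc() stands in for whatever allocator the brick really uses.

```c
#include <stddef.h>
#include <stdlib.h>

#define PAGE_BYTES	4096	/* assumed page size */
#define NR_REQUESTS	1000	/* hypothetical number of table entries */

/* Hypothetical per-request state; the real struct copy_state differs. */
struct state {
	long long pos;
	long long len;
};

#define STATES_PER_PAGE	(PAGE_BYTES / sizeof(struct state))
/* Round up so that a partially filled last sub-table is still allocated. */
#define NR_SUB_TABLES	((NR_REQUESTS + STATES_PER_PAGE - 1) / STATES_PER_PAGE)
/* First index selects the sub-table, second index the slot inside it. */
#define GET_STATE(tab, index) \
	((tab)[(index) / STATES_PER_PAGE][(index) % STATES_PER_PAGE])

static void free_table(struct state **tab)
{
	size_t i;

	if (!tab)
		return;
	for (i = 0; i < NR_SUB_TABLES; i++)
		free(tab[i]);	/* free(NULL) is a no-op */
	free(tab);
}

static struct state **alloc_table(void)
{
	struct state **tab = calloc(NR_SUB_TABLES, sizeof(*tab));
	size_t i;

	if (!tab)
		return NULL;
	for (i = 0; i < NR_SUB_TABLES; i++) {
		tab[i] = calloc(STATES_PER_PAGE, sizeof(struct state));
		if (!tab[i]) {
			free_table(tab);
			return NULL;
		}
	}
	return tab;
}
```

The rounding in NR_SUB_TABLES is equivalent to the quotient-plus-remainder form used by MAX_SUB_TABLES in the patch; both yield the smallest number of sub-tables that can hold all entries.
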
[RFC 31/32] mars: add new module Kconfig

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/Kconfig | 266 +++
 1 file changed, 266 insertions(+)
 create mode 100644 drivers/staging/mars/Kconfig

diff --git a/drivers/staging/mars/Kconfig b/drivers/staging/mars/Kconfig
new file mode 100644
index ..836185e9509c
--- /dev/null
+++ b/drivers/staging/mars/Kconfig
@@ -0,0 +1,266 @@
+#
+# MARS configuration
+#
+
+config MARS
+   tristate "storage system MARS (EXPERIMENTAL)"
+   depends on BLOCK && PROC_SYSCTL && HIGH_RES_TIMERS && !DEBUG_SLAB && !DEBUG_SG
+   default n
+   ---help---
+ MARS provides long-distance replication of generic block devices.
+ It works asynchronously and tolerates network bottlenecks.
+ Please read the full documentation at
+   https://github.com/schoebel/mars/blob/master/docu/mars-manual.pdf?raw=true
+ Always compile MARS as a module!
+
+config MARS_CHECKS
+   bool "enable simple runtime checks in MARS"
+   depends on MARS
+   default y
+   ---help---
+ These checks should be rather lightweight. Use them
+ for beta testing and for production systems where
+ safety is more important than performance.
+ In case of bugs in the reference counting, an automatic repair
+ is attempted, which lowers the risk of memory corruption.
+ Disable only if you need the very last bit of
+ performance.
+ If unsure, say Y here.
+
+config MARS_DEBUG
+   bool "enable full runtime checks and some tracing in MARS"
+   depends on MARS
+   default n
+   ---help---
+ Some of these checks and some additional error tracing may
+ consume noticeable amounts of memory. However, this is extremely
+ valuable for finding bugs, even in production systems.
+
+ OFF for production systems. ON for testing!
+
+ If you encounter bugs in production systems, you
+ may / should use this also in production if you carefully
+ monitor your systems.
+
+config MARS_DEBUG_MEM
+   bool "debug memory operations"
+   depends on MARS_DEBUG
+   default n
+   ---help---
+ This adds considerable space and time overhead, but catches
+ many errors (including some that are not caught by kmemleak).
+
+ OFF for production systems. ON for testing!
+ Use only for development and thorough testing!
+
+config MARS_DEBUG_MEM_STRONG
+   bool "intensified debugging of memory operations"
+   depends on MARS_DEBUG_MEM
+   default y
+   ---help---
+ Trace all block allocations, find more errors.
+ Adds some overhead.
+
+ Use for debugging of new bricks or for intensified
+ regression testing.
+
+config MARS_DEBUG_ORDER0
+   bool "also debug order0 operations"
+   depends on MARS_DEBUG_MEM
+   default n
+   ---help---
+ Turn even order 0 allocations into order 1 ones and provoke
+ heavy memory fragmentation problems from the buddy allocator,
+ but catch some additional memory problems.
+ Use only if you know what you are doing!
+ Normally OFF.
+
+config MARS_DEFAULT_PORT
+   int "port number where MARS is listening"
+   depends on MARS
+   default 
+   ---help---
+ Best practice is to uniformly use the same port number
+ in a cluster. Therefore, this is a compile-time constant.
+ You may override this at insmod time via the mars_port= parameter.
+
+config MARS_NET_COMPAT
+   bool "compatibility with the 0.1 series network protocol"
+   depends on MARS
+   default y
+   ---help---
+   TRANSITIONAL: this is only needed for _mixed_ operations of the
+   MARS Light 0.1 kernel modules and 0.2 module.
+   Typically, you will need this only during upgrade for minimizing
+   downtime (e.g. first upgrade secondary side, then handover,
+   and finally upgrade the former primary side).
+   This option will be removed for 0.3 and later stable
+   series, since you will no longer need it.
+
+config MARS_LOGDIR
+   string "absolute path to the logging directory"
+   depends on MARS
+   default "/mars"
+   ---help---
+ Path to the directory where all MARS messages will reside.
+ Usually this is equal to the global /mars directory.
+
+ Logfiles and status files obey the following naming conventions:
+   0.debug.log
+   1.info.log
+   2.warn.log
+   3.error.log
+   4.fatal.log
+   5.total.log
+ Logfiles must already exist in order to be appended.
+ Logfiles can be rotated by renaming them and creating
+  

[RFC 31/32] mars: add new module Kconfig

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/Kconfig | 266 +++
 1 file changed, 266 insertions(+)
 create mode 100644 drivers/staging/mars/Kconfig

diff --git a/drivers/staging/mars/Kconfig b/drivers/staging/mars/Kconfig
new file mode 100644
index 000000000000..836185e9509c
--- /dev/null
+++ b/drivers/staging/mars/Kconfig
@@ -0,0 +1,266 @@
+#
+# MARS configuration
+#
+
+config MARS
+   tristate "storage system MARS (EXPERIMENTAL)"
+   depends on BLOCK && PROC_SYSCTL && HIGH_RES_TIMERS && !DEBUG_SLAB && !DEBUG_SG
+   default n
+   ---help---
+ MARS provides long-distance replication of generic block devices.
+ It works asynchronously and tolerates network bottlenecks.
+ Please read the full documentation at
+   https://github.com/schoebel/mars/blob/master/docu/mars-manual.pdf?raw=true
+ Always compile MARS as a module!
+
+config MARS_CHECKS
+   bool "enable simple runtime checks in MARS"
+   depends on MARS
+   default y
+   ---help---
+ These checks should be rather lightweight. Use them
+ for beta testing and for production systems where
+ safety is more important than performance.
+ In case of bugs in the reference counting, an automatic repair
+ is attempted, which lowers the risk of memory corruption.
+ Disable only if you need the very last bit of
+ performance.
+ If unsure, say Y here.
+
+config MARS_DEBUG
+   bool "enable full runtime checks and some tracing in MARS"
+   depends on MARS
+   default n
+   ---help---
+ Some of these checks and some additional error tracing may
+ consume noticeable amounts of memory. However, this is extremely
+ valuable for finding bugs, even in production systems.
+
+ OFF for production systems. ON for testing!
+
+ If you encounter bugs in production systems, you
+ may / should use this also in production if you carefully
+ monitor your systems.
+
+config MARS_DEBUG_MEM
+   bool "debug memory operations"
+   depends on MARS_DEBUG
+   default n
+   ---help---
+ This adds considerable space and time overhead, but catches
+ many errors (including some that are not caught by kmemleak).
+
+ OFF for production systems. ON for testing!
+ Use only for development and thorough testing!
+
+config MARS_DEBUG_MEM_STRONG
+   bool "intensified debugging of memory operations"
+   depends on MARS_DEBUG_MEM
+   default y
+   ---help---
+ Trace all block allocations, find more errors.
+ Adds some overhead.
+
+ Use for debugging of new bricks or for intensified
+ regression testing.
+
+config MARS_DEBUG_ORDER0
+   bool "also debug order0 operations"
+   depends on MARS_DEBUG_MEM
+   default n
+   ---help---
+ Turn even order 0 allocations into order 1 ones and provoke
+ heavy memory fragmentation problems from the buddy allocator,
+ but catch some additional memory problems.
+ Use only if you know what you are doing!
+ Normally OFF.
+
+config MARS_DEFAULT_PORT
+   int "port number where MARS is listening"
+   depends on MARS
+   default 
+   ---help---
+ Best practice is to uniformly use the same port number
+ in a cluster. Therefore, this is a compile-time constant.
+ You may override this at insmod time via the mars_port= parameter.
+
+config MARS_NET_COMPAT
+   bool "compatibility to 0.1 series network protocol"
+   depends on MARS
+   default y
+   ---help---
+   TRANSITIONAL: this is only needed for _mixed_ operations of the
+   MARS Light 0.1 kernel modules and 0.2 module.
+   Typically, you will need this only during upgrade for minimizing
+   downtime (e.g. first upgrade secondary side, then handover,
+   and finally upgrade the former primary side).
+   This option will be removed for 0.3 and later stable
+   series, since you will no longer need it.
+
+config MARS_LOGDIR
+   string "absolute path to the logging directory"
+   depends on MARS
+   default "/mars"
+   ---help---
+ Path to the directory where all MARS messages will reside.
+ Usually this is equal to the global /mars directory.
+
+ Logfiles and status files obey the following naming conventions:
+   0.debug.log
+   1.info.log
+   2.warn.log
+   3.error.log
+   4.fatal.log
+   5.total.log
+ Logfiles must already exist in order to be appended.
+ Logfiles can be rotated by renaming them and creating
+ a new empty file in place of the ol

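Note on the MARS_LOGDIR help text: the naming convention "<level>.<name>.log" can be illustrated with a small user-space sketch. The helper name log_path() and the table log_names[] are made up for this illustration and are not part of the patch; only the six file names and the default "/mars" directory come from the help text.

```c
#include <stdio.h>

/* Severity names in the order listed in the MARS_LOGDIR help text. */
static const char *const log_names[] = {
	"debug", "info", "warn", "error", "fatal", "total",
};

/*
 * Hypothetical helper (not part of MARS): format
 * "<logdir>/<level>.<name>.log" into buf.
 * Returns 0 on success, -1 on an unknown level or a too-small buffer.
 */
static int log_path(char *buf, size_t size, const char *logdir, int level)
{
	int count = (int)(sizeof(log_names) / sizeof(log_names[0]));
	int n;

	if (level < 0 || level >= count)
		return -1;
	n = snprintf(buf, size, "%s/%d.%s.log", logdir, level, log_names[level]);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

Calling log_path(buf, sizeof(buf), "/mars", 3) under these assumptions yields the path of the error log. Since the help text says logfiles must already exist to be appended, a rotation script would rename the old file and then create an empty one under the same name.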
[RFC 17/32] mars: add new module xio_bio

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/xio_bricks/xio_bio.c | 845 ++
 include/linux/xio/xio_bio.h   |  85 +++
 2 files changed, 930 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_bio.c
 create mode 100644 include/linux/xio/xio_bio.h

diff --git a/drivers/staging/mars/xio_bricks/xio_bio.c b/drivers/staging/mars/xio_bricks/xio_bio.c
new file mode 100644
index 000000000000..97bc4fc46f3e
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_bio.c
@@ -0,0 +1,845 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/*  Bio brick (interface to blkdev IO via kernel bios) */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+static struct timing_stats timings[2];
+
+struct threshold bio_submit_threshold = {
+   .thr_ban = _global_ban,
+   .thr_parent = _io_threshold,
+   .thr_limit = BIO_SUBMIT_MAX_LATENCY,
+   .thr_factor = 100,
+   .thr_plus = 0,
+};
+
+struct threshold bio_io_threshold[2] = {
+   [0] = {
+   .thr_ban = _global_ban,
+   .thr_parent = _io_threshold,
+   .thr_limit = BIO_IO_R_MAX_LATENCY,
+   .thr_factor = 10,
+   .thr_plus = 1,
+   },
+   [1] = {
+   .thr_ban = _global_ban,
+   .thr_parent = _io_threshold,
+   .thr_limit = BIO_IO_W_MAX_LATENCY,
+   .thr_factor = 10,
+   .thr_plus = 1,
+   },
+};
+
+/*** own type definitions ***/
+
+/*** own helper functions ***/
+
+/* This is called from the kernel bio layer.
+ */
+static
+void bio_callback(struct bio *bio)
+{
+   struct bio_aio_aspect *aio_a = bio->bi_private;
+   struct bio_brick *brick;
+   unsigned long flags;
+
+   CHECK_PTR(aio_a, err);
+   CHECK_PTR(aio_a->output, err);
+   brick = aio_a->output->brick;
+   CHECK_PTR(brick, err);
+
+   aio_a->status_code = bio->bi_error;
+
+   spin_lock_irqsave(&brick->lock, flags);
+   list_del(&aio_a->io_head);
+   list_add_tail(&aio_a->io_head, &brick->completed_list);
+   atomic_inc(&brick->completed_count);
+   spin_unlock_irqrestore(&brick->lock, flags);
+
+   wake_up_interruptible(&brick->response_event);
+   goto out_return;
+err:
+   XIO_FAT("cannot handle bio callback\n");
+out_return:;
+}
+
+/* Map from kernel address/length to struct page (if not already known),
+ * check alignment constraints, create bio from it.
+ * Return the length (may be smaller than requested).
+ */
+static
+int make_bio(
+struct bio_brick *brick, void *data, int len, loff_t pos, struct bio_aio_aspect *private, struct bio **_bio)
+{
+   unsigned long long sector;
+   int sector_offset;
+   int data_offset;
+   int page_offset;
+   int page_len;
+   int bvec_count;
+   int rest_len = len;
+   int result_len = 0;
+   int status;
+   int i;
+   struct bio *bio = NULL;
+   struct block_device *bdev;
+
+   status = -EINVAL;
+   CHECK_PTR(brick, out);
+   bdev = brick->bdev;
+   CHECK_PTR(bdev, out);
+
+   if (unlikely(rest_len <= 0)) {
+   XIO_ERR("bad bio len %d\n", rest_len);
+   goto out;
+   }
+
+   sector = pos >> 9; /*  TODO: make dynamic */
+   sector_offset = pos & ((1 << 9) - 1);  /*  TODO: make dynamic */
+   data_offset = ((unsigned long)data) & ((1 << 9) - 1);  /*  TODO: make dynamic */
+
+   if (unlikely(sector_offset > 0)) {
+   XIO_ERR("odd sector offset %d\n", sector_offset);
+   goto out;
+   }
+   if (unlikely(sector_offset != data_offset)) {
+   XIO_ERR("bad alignment: sector_offset %d != data_offset %d\n", sector_offset, data_offset);
+   goto out;
+   }
+   if (unlikely(rest_len & ((1 << 9) - 1))) {
+   XIO_ERR("odd length %d\n", rest_len);
+   goto out;
+   }
+
+   page_offset = ((unsigned long)data) & (PAGE_SIZE - 1);
+   page_len = rest_len + page_offset;
+   bvec_count = (page_len - 1) / PAGE_SIZE + 1;
+   if (bvec_co

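Note on make_bio(): the alignment rules and the page count at the top of the function can be tried out in user space. The following is only a sketch under stated assumptions: the names check_bio_geometry(), struct bio_geometry and the *_SKETCH macros are invented here, a 4 KiB page size is assumed, and the 512-byte sector size mirrors the hard-coded shift by 9 in the patch.

```c
#include <stdint.h>

#define SECTOR_SHIFT_SKETCH 9   /* 512-byte sectors, as hard-coded in make_bio() */
#define SECTOR_MASK_SKETCH  ((1u << SECTOR_SHIFT_SKETCH) - 1)
#define PAGE_SIZE_SKETCH    4096u /* assumption: 4 KiB pages */

struct bio_geometry {
	uint64_t sector;   /* first 512-byte sector of the request */
	int bvec_count;    /* number of pages the buffer touches */
};

/*
 * Hypothetical user-space mirror of the checks at the top of make_bio().
 * Returns 0 and fills *geo on success, -1 if an alignment rule is violated.
 */
static int check_bio_geometry(uintptr_t data, int len, uint64_t pos,
			      struct bio_geometry *geo)
{
	unsigned int page_offset;
	unsigned int page_len;

	if (len <= 0)
		return -1;      /* bad bio len */
	if (pos & SECTOR_MASK_SKETCH)
		return -1;      /* odd sector offset */
	if (data & SECTOR_MASK_SKETCH)
		return -1;      /* buffer is not sector-aligned */
	if ((unsigned int)len & SECTOR_MASK_SKETCH)
		return -1;      /* odd length */

	/* Offset of the buffer inside its first page. */
	page_offset = (unsigned int)(data & (PAGE_SIZE_SKETCH - 1));
	page_len = (unsigned int)len + page_offset;

	geo->sector = pos >> SECTOR_SHIFT_SKETCH;
	/* Round up to whole pages, same formula as in the patch. */
	geo->bvec_count = (int)((page_len - 1) / PAGE_SIZE_SKETCH + 1);
	return 0;
}
```

A 4096-byte buffer that starts 512 bytes into a page spans two pages, so it needs two bio vectors even though its length is exactly one page.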
[RFC 12/32] mars: add new module vfs_compat

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 include/linux/brick/vfs_compat.h | 48 
 1 file changed, 48 insertions(+)
 create mode 100644 include/linux/brick/vfs_compat.h

diff --git a/include/linux/brick/vfs_compat.h b/include/linux/brick/vfs_compat.h
new file mode 100644
index 000000000000..68d082b70b43
--- /dev/null
+++ b/include/linux/brick/vfs_compat.h
@@ -0,0 +1,48 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _MARS_COMPAT
+#define _MARS_COMPAT
+
+/* TRANSITIONAL compatibility to BOTH the old prepatch
+ * and the new wrappers around vfs_*().
+ */
+#ifndef MARS_MAJOR
+#define __USE_COMPAT
+#endif
+
+#ifdef __USE_COMPAT
+
+int _compat_symlink(
+const char __user *oldname,
+   const char __user *newname,
+   struct timespec *mtime);
+
+int _compat_mkdir(
+const char __user *pathname,
+ int mode);
+
+int _compat_rename(
+const char __user *oldname,
+  const char __user *newname);
+
+int _compat_unlink(const char __user *pathname);
+
+#else
+#include 
+#endif
+#endif
-- 
2.11.0



[RFC 26/32] mars: add new module net

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/mars/net.c | 109 
 1 file changed, 109 insertions(+)
 create mode 100644 drivers/staging/mars/mars/net.c

diff --git a/drivers/staging/mars/mars/net.c b/drivers/staging/mars/mars/net.c
new file mode 100644
index 000000000000..d1b9715c0a93
--- /dev/null
+++ b/drivers/staging/mars/mars/net.c
@@ -0,0 +1,109 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#include "strategy.h"
+#include 
+
+static
+char *_xio_translate_hostname(const char *name)
+{
+   char *res = brick_strdup(name);
+   char *test;
+   char *tmp;
+
+   for (tmp = res; *tmp; tmp++) {
+   if (*tmp == ':') {
+   *tmp = '\0';
+   break;
+   }
+   }
+
+   tmp = path_make("/mars/ips/ip-%s", res);
+   if (unlikely(!tmp))
+   goto done;
+
+   test = mars_readlink(tmp);
+   if (test && test[0]) {
+   XIO_DBG("'%s' => '%s'\n", tmp, test);
+   brick_string_free(res);
+   res = test;
+   } else {
+   brick_string_free(test);
+   XIO_WRN("no hostname translation for '%s'\n", tmp);
+   }
+   brick_string_free(tmp);
+
+done:
+   return res;
+}
+
+int xio_send_dent_list(struct xio_socket *sock, struct list_head *anchor)
+{
+   struct list_head *tmp;
+   struct mars_dent *dent;
+   int status = 0;
+
+   for (tmp = anchor->next; tmp != anchor; tmp = tmp->next) {
+   dent = container_of(tmp, struct mars_dent, dent_link);
+   status = xio_send_struct(sock, dent, mars_dent_meta);
+   if (status < 0)
+   break;
+   }
+   if (status >= 0) { /*  send EOR */
+   status = xio_send_struct(sock, NULL, mars_dent_meta);
+   }
+   return status;
+}
+
+int xio_recv_dent_list(struct xio_socket *sock, struct list_head *anchor)
+{
+   int status;
+
+   for (;;) {
+   struct mars_dent *dent = brick_zmem_alloc(sizeof(struct mars_dent));
+
+   INIT_LIST_HEAD(&dent->dent_link);
+   INIT_LIST_HEAD(&dent->brick_list);
+
+   status = xio_recv_struct(sock, dent, mars_dent_meta);
+   if (status <= 0) {
+   xio_free_dent(dent);
+   goto done;
+   }
+   list_add_tail(&dent->dent_link, anchor);
+   }
+done:
+   return status;
+}
+
+/* module init stuff */
+
+int __init init_sy_net(void)
+{
+   XIO_INF("init_sy_net()\n");
+   xio_translate_hostname = _xio_translate_hostname;
+   return 0;
+}
+
+void exit_sy_net(void)
+{
+   XIO_INF("exit_sy_net()\n");
+}
-- 
2.11.0
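
Note on _xio_translate_hostname(): the first two steps (cut the name at the first ':' and build the lookup path below /mars/ips/) can be sketched in user space as follows. The helper names strip_port() and ip_link_path() are invented for this illustration, and malloc()/snprintf() stand in for brick_strdup() and path_make(), whose exact semantics are not shown in the patch. The symlink lookup done by mars_readlink() is left out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical analogue of the loop in _xio_translate_hostname():
 * return a copy of name that ends before the first ':'.
 * The caller must free() the result. Returns NULL if allocation fails.
 */
static char *strip_port(const char *name)
{
	size_t len = strcspn(name, ":");  /* length up to ':' or end of string */
	char *res = malloc(len + 1);

	if (!res)
		return NULL;
	memcpy(res, name, len);
	res[len] = '\0';
	return res;
}

/*
 * Hypothetical analogue of path_make("/mars/ips/ip-%s", res).
 * Returns 0 on success, -1 if buf is too small.
 */
static int ip_link_path(char *buf, size_t size, const char *host)
{
	int n = snprintf(buf, size, "/mars/ips/ip-%s", host);

	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

The patch copies the whole string and then overwrites the ':' with a terminator; the sketch copies only the prefix, which gives the same result for the caller.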



[RFC 30/32] mars: add new module Makefile

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/Makefile | 96 +++
 1 file changed, 96 insertions(+)
 create mode 100644 drivers/staging/mars/Makefile

diff --git a/drivers/staging/mars/Makefile b/drivers/staging/mars/Makefile
new file mode 100644
index 000000000000..5e94c3c692c2
--- /dev/null
+++ b/drivers/staging/mars/Makefile
@@ -0,0 +1,96 @@
+#
+# Makefile for MARS
+#
+
+# remove_this
+#
+# TST: this was required by some sysadmins some years ago for
+# very 1&1-specific OOT Debian build methods.
+# Not tested in other environments. Might need some tweaks, or could
+# be removed in the long term.
+#
+ifndef CONFIG_MARS
+# mars_config.h is generated by a simple Kconfig parser (gen_config.pl)
+# at build time.
+# It does not respect any Kconfig dependencies.
+# Therefore, it is unsafe. Use at your own risk!
+# It is ONLY used for out-of-tree builds.
+#
+CONFIG_MARS_BIGMODULE := m
+CONFIG_MARS_NET_COMPAT := y
+obj-$(CONFIG_MARS_BIGMODULE)   += mars.o
+extra-y+= mars_config.h
+GEN_CONFIG_SCRIPT := $(src)/../scripts/gen_config.pl
+$(obj)/mars_config.h: $(obj)/buildtag.h
+$(obj)/mars_config.h: $(src)/Kconfig $(GEN_CONFIG_SCRIPT)
+   $(Q)$(kecho) "MARS: using compiler $$($(CC) --version | head -1)"
+   $(CC) -v
+   $(Q)$(kecho) "MARS: Generating $@"
+   $(Q)set -e; \
+   if [ ! -x $(GEN_CONFIG_SCRIPT) ]; then \
+   $(kecho) "MARS: cannot execute script $(GEN_CONFIG_SCRIPT)"; \
+   /bin/false; \
+   fi; \
+   cat $< | $(GEN_CONFIG_SCRIPT) > $@;
+   cat $@;
+endif
+# end_remove_this
+
+obj-$(CONFIG_MARS) += mars.o
+
+KBUILD_CFLAGS += -fdelete-null-pointer-checks
+
+# remove_this
+# The following is 1&1 specific. Don't use anywhere else.
+ifneq ($(KBUILD_EXTMOD),)
+  CONFIG_MARS := m
+# mars_config.h is generated by a simple Kconfig parser (gen_config.pl)
+# at build time.
+# It does not respect any Kconfig dependencies.
+# Therefore, it is unsafe. Use at your own risk!
+# It is ONLY used for out-of-tree builds.
+#
+extra-y+= mars_config.h
+GEN_CONFIG_SCRIPT := $(src)/../scripts/gen_config.pl
+$(obj)/mars_config.h: $(obj)/buildtag.h
+$(obj)/mars_config.h: $(src)/Kconfig $(GEN_CONFIG_SCRIPT)
+   $(Q)$(kecho) "MARS: using compiler $$($(CC) --version | head -1)"
+   $(CC) -v
+   $(Q)$(kecho) "MARS: Generating $@"
+   $(Q)set -e; \
+   if [ ! -x $(GEN_CONFIG_SCRIPT) ]; then \
+   $(kecho) "MARS: cannot execute script $(GEN_CONFIG_SCRIPT)"; \
+   /bin/false; \
+   fi; \
+   cat $< | $(GEN_CONFIG_SCRIPT) > $@;
+   cat $@;
+endif
+# end_remove_this
+
+obj-$(CONFIG_MARS) += mars.o
+
+mars-objs :=   \
+   lamport.o   \
+   brick_say.o \
+   brick_mem.o \
+   brick.o \
+   xio_bricks/xio.o\
+   xio_bricks/lib_log.o\
+   lib/lib_rank.o  \
+   lib/lib_limiter.o   \
+   lib/lib_timing.o\
+   xio_bricks/lib_mapfree.o\
+   xio_bricks/xio_net.o\
+   mars/server_strategy.o  \
+   xio_bricks/xio_server.o \
+   xio_bricks/xio_client.o \
+   xio_bricks/xio_sio.o\
+   xio_bricks/xio_bio.o\
+   xio_bricks/xio_if.o \
+   xio_bricks/xio_copy.o   \
+   xio_bricks/xio_trans_logger.o   \
+   mars/main_strategy.o\
+   mars/net.o  \
+   mars/mars_proc.o\
+   mars/mars_main.o
+
-- 
2.11.0



[RFC 15/32] mars: add new module lib_mapfree

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/xio_bricks/lib_mapfree.c | 382 ++
 include/linux/xio/lib_mapfree.h   |  84 ++
 2 files changed, 466 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/lib_mapfree.c
 create mode 100644 include/linux/xio/lib_mapfree.h

diff --git a/drivers/staging/mars/xio_bricks/lib_mapfree.c b/drivers/staging/mars/xio_bricks/lib_mapfree.c
new file mode 100644
index 000000000000..fc7c057fc993
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/lib_mapfree.c
@@ -0,0 +1,382 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*  time to wait between background mapfree operations */
+int mapfree_period_sec = 10;
+
+/*  some grace space where no regular cleanup should occur */
+int mapfree_grace_keep_mb = 16;
+
+static
+DECLARE_RWSEM(mapfree_mutex);
+
+static
+LIST_HEAD(mapfree_list);
+
+void mapfree_pages(struct mapfree_info *mf, int grace_keep)
+{
+   struct address_space *mapping;
+   pgoff_t start;
+   pgoff_t end;
+
+   if (unlikely(!mf))
+   goto done;
+   if (unlikely(!mf->mf_filp))
+   goto done;
+
+   mapping = mf->mf_filp->f_mapping;
+   if (unlikely(!mapping))
+   goto done;
+
+   if (grace_keep < 0) { /*  force full flush */
+   start = 0;
+   end = -1;
+   } else {
+   unsigned long flags;
+   loff_t tmp;
+   loff_t min;
+
+   spin_lock_irqsave(&mf->mf_lock, flags);
+
+   tmp = mf->mf_min[0];
+   min = tmp;
+   if (likely(mf->mf_min[1] < min))
+   min = mf->mf_min[1];
+   if (tmp) {
+   mf->mf_min[1] = tmp;
+   mf->mf_min[0] = 0;
+   }
+
+   spin_unlock_irqrestore(&mf->mf_lock, flags);
+
+   min -= (loff_t)grace_keep * (1024 * 1024); /*  megabytes */
+   end = 0;
+
+   if (min > 0 || mf->mf_last) {
+   start = mf->mf_last / PAGE_SIZE;
+   /*  add some grace overlapping */
+   if (likely(start > 0))
+   start--;
+   mf->mf_last = min;
+   end = min / PAGE_SIZE;
+   } else  { /*  there was no progress for at least 2 rounds */
+   start = 0;
+   if (!grace_keep) /*  also flush thoroughly */
+   end = -1;
+   }
+
+   XIO_DBG("file = '%s' start = %lu end = %lu\n", mf->mf_name, start, end);
+   }
+
+   if (end > start || end == -1)
+   invalidate_mapping_pages(mapping, start, end);
+
+done:;
+}
+
+static
+void _mapfree_put(struct mapfree_info *mf)
+{
+   if (atomic_dec_and_test(&mf->mf_count)) {
+   XIO_DBG("closing file '%s' filp = %p\n", mf->mf_name, mf->mf_filp);
+   list_del_init(&mf->mf_head);
+   CHECK_HEAD_EMPTY(&mf->mf_dirty_anchor);
+   if (likely(mf->mf_filp)) {
+   mapfree_pages(mf, -1);
+   filp_close(mf->mf_filp, NULL);
+   }
+   brick_string_free(mf->mf_name);
+   brick_mem_free(mf);
+   }
+}
+
+void mapfree_put(struct mapfree_info *mf)
+{
+   if (likely(mf)) {
+   down_write(&mapfree_mutex);
+   _mapfree_put(mf);
+   up_write(&mapfree_mutex);
+   }
+}
+
+struct mapfree_info *mapfree_get(const char *name, int flags)
+{
+   struct mapfree_info *mf = NULL;
+   struct list_head *tmp;
+
+   if (!(flags & O_DIRECT)) {
+   down_read(&mapfree_mutex);
+   for (tmp = mapfree_list.next; tmp != &mapfree_list; tmp = tmp->next) {
+   struct mapfree_info *_mf = container_of(tmp, struct mapfree_info, mf_head);
+
+   if (_mf->mf_flags == flags && !strcmp(_mf->mf_name, name)) {
+   mf = _mf;
+   atom

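Note on mapfree_pages(): the way the page range [start, end] is derived from the two mf_min generations, mf_last and grace_keep can be followed in a user-space sketch. Everything named *_sketch, struct page_range and mapfree_range() is invented for this illustration; a 4 KiB page size is assumed, the spinlock is omitted, and ULONG_MAX plays the role of the "end = -1" marker in the patch. The arithmetic otherwise follows the patch line by line.

```c
#include <limits.h>
#include <stdbool.h>

#define PAGE_SIZE_SKETCH 4096LL /* assumption: 4 KiB pages */

struct mf_sketch {
	long long min[2]; /* stands in for mf_min[0] and mf_min[1] */
	long long last;   /* stands in for mf_last */
};

struct page_range {
	unsigned long start;
	unsigned long end;  /* ULONG_MAX corresponds to "end = -1" */
	bool invalidate;    /* would invalidate_mapping_pages() be called? */
};

/* Hypothetical mirror of the range computation in mapfree_pages(). */
static struct page_range mapfree_range(struct mf_sketch *mf, int grace_keep_mb)
{
	struct page_range r = { 0, 0, false };

	if (grace_keep_mb < 0) {            /* force full flush */
		r.end = ULONG_MAX;
	} else {
		long long tmp = mf->min[0];
		long long min = tmp;

		if (mf->min[1] < min)
			min = mf->min[1];
		if (tmp) {                  /* rotate the two generations */
			mf->min[1] = tmp;
			mf->min[0] = 0;
		}
		min -= (long long)grace_keep_mb * (1024 * 1024);

		if (min > 0 || mf->last) {
			r.start = (unsigned long)(mf->last / PAGE_SIZE_SKETCH);
			if (r.start > 0)    /* one page of overlap */
				r.start--;
			mf->last = min;
			r.end = (unsigned long)(min / PAGE_SIZE_SKETCH);
		} else if (!grace_keep_mb) { /* no progress and no grace space */
			r.end = ULONG_MAX;
		}
	}
	r.invalidate = (r.end > r.start || r.end == ULONG_MAX);
	return r;
}
```

With both generations at 64 MiB and 16 MiB of grace space, the first call covers pages 0 to 12288 (48 MiB); a later call restarts one page before the previous end.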
[RFC 32/32] mars: activate build

2016-12-30 Thread Thomas Schoebel-Theuer
From: Thomas Schoebel-Theuer <t...@1und1.de>

---
 drivers/staging/Kconfig  | 2 ++
 drivers/staging/Makefile | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/staging/Kconfig b/drivers/staging/Kconfig
index 5d3b86a33857..bbccc4f0ebbe 100644
--- a/drivers/staging/Kconfig
+++ b/drivers/staging/Kconfig
@@ -56,6 +56,8 @@ source "drivers/staging/vt6656/Kconfig"
 
 source "drivers/staging/iio/Kconfig"
 
+source "drivers/staging/mars/Kconfig"
+
 source "drivers/staging/sm750fb/Kconfig"
 
 source "drivers/staging/xgifb/Kconfig"
diff --git a/drivers/staging/Makefile b/drivers/staging/Makefile
index 30918edef5e3..01732bd65542 100644
--- a/drivers/staging/Makefile
+++ b/drivers/staging/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_VT6655)  += vt6655/
 obj-$(CONFIG_VT6656)   += vt6656/
 obj-$(CONFIG_VME_BUS)  += vme/
 obj-$(CONFIG_IIO)  += iio/
+obj-$(CONFIG_MARS) += mars/
 obj-$(CONFIG_FB_SM750) += sm750fb/
 obj-$(CONFIG_FB_XGI)   += xgifb/
 obj-$(CONFIG_USB_EMXX) += emxx_udc/
-- 
2.11.0



[RFC 30/32] mars: add new module Makefile

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/Makefile | 96 +++
 1 file changed, 96 insertions(+)
 create mode 100644 drivers/staging/mars/Makefile

diff --git a/drivers/staging/mars/Makefile b/drivers/staging/mars/Makefile
new file mode 100644
index ..5e94c3c692c2
--- /dev/null
+++ b/drivers/staging/mars/Makefile
@@ -0,0 +1,96 @@
+#
+# Makefile for MARS
+#
+
+# remove_this
+#
+# TST: this was required by some sysadmins some years ago for
+# very 1&1-specific OOT Debian build methods.
+# Not tested in other environments. Might need some tweaks, or could
+# be removed in the long term.
+#
+ifndef CONFIG_MARS
+# mars_config.h is generated by a simple Kconfig parser (gen_config.pl)
+# at build time.
+# It does not respect any Kconfig dependencies.
+# Therefore, it is unsafe. Use at your own risk!
+# It is ONLY used for out-of-tree builds.
+#
+CONFIG_MARS_BIGMODULE := m
+CONFIG_MARS_NET_COMPAT := y
+obj-$(CONFIG_MARS_BIGMODULE)   += mars.o
+extra-y+= mars_config.h
+GEN_CONFIG_SCRIPT := $(src)/../scripts/gen_config.pl
+$(obj)/mars_config.h: $(obj)/buildtag.h
+$(obj)/mars_config.h: $(src)/Kconfig $(GEN_CONFIG_SCRIPT)
+   $(Q)$(kecho) "MARS: using compiler $($(CC) --version | head -1)"
+   $(CC) -v
+   $(Q)$(kecho) "MARS: Generating $@"
+   $(Q)set -e; \
+   if [ ! -x $(GEN_CONFIG_SCRIPT) ]; then \
+   $(kecho) "MARS: cannot execute script $(GEN_CONFIG_SCRIPT)"; \
+   /bin/false; \
+   fi; \
+   cat $< | $(GEN_CONFIG_SCRIPT) > $@;
+   cat $@;
+endif
+# end_remove_this
+
+obj-$(CONFIG_MARS) += mars.o
+
+KBUILD_CFLAGS += -fdelete-null-pointer-checks
+
+# remove_this
+# The following is 1&1 specific. Don't use anywhere else.
+ifneq ($(KBUILD_EXTMOD),)
+  CONFIG_MARS := m
+# mars_config.h is generated by a simple Kconfig parser (gen_config.pl)
+# at build time.
+# It does not respect any Kconfig dependencies.
+# Therefore, it is unsafe. Use at your own risk!
+# It is ONLY used for out-of-tree builds.
+#
+extra-y+= mars_config.h
+GEN_CONFIG_SCRIPT := $(src)/../scripts/gen_config.pl
+$(obj)/mars_config.h: $(obj)/buildtag.h
+$(obj)/mars_config.h: $(src)/Kconfig $(GEN_CONFIG_SCRIPT)
+   $(Q)$(kecho) "MARS: using compiler $($(CC) --version | head -1)"
+   $(CC) -v
+   $(Q)$(kecho) "MARS: Generating $@"
+   $(Q)set -e; \
+   if [ ! -x $(GEN_CONFIG_SCRIPT) ]; then \
+   $(kecho) "MARS: cannot execute script $(GEN_CONFIG_SCRIPT)"; \
+   /bin/false; \
+   fi; \
+   cat $< | $(GEN_CONFIG_SCRIPT) > $@;
+   cat $@;
+endif
+# end_remove_this
+
+obj-$(CONFIG_MARS) += mars.o
+
+mars-objs :=   \
+   lamport.o   \
+   brick_say.o \
+   brick_mem.o \
+   brick.o \
+   xio_bricks/xio.o\
+   xio_bricks/lib_log.o\
+   lib/lib_rank.o  \
+   lib/lib_limiter.o   \
+   lib/lib_timing.o\
+   xio_bricks/lib_mapfree.o\
+   xio_bricks/xio_net.o\
+   mars/server_strategy.o  \
+   xio_bricks/xio_server.o \
+   xio_bricks/xio_client.o \
+   xio_bricks/xio_sio.o\
+   xio_bricks/xio_bio.o\
+   xio_bricks/xio_if.o \
+   xio_bricks/xio_copy.o   \
+   xio_bricks/xio_trans_logger.o   \
+   mars/main_strategy.o\
+   mars/net.o  \
+   mars/mars_proc.o\
+   mars/mars_main.o
+
-- 
2.11.0



[RFC 15/32] mars: add new module lib_mapfree

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/xio_bricks/lib_mapfree.c | 382 ++
 include/linux/xio/lib_mapfree.h   |  84 ++
 2 files changed, 466 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/lib_mapfree.c
 create mode 100644 include/linux/xio/lib_mapfree.h

diff --git a/drivers/staging/mars/xio_bricks/lib_mapfree.c b/drivers/staging/mars/xio_bricks/lib_mapfree.c
new file mode 100644
index ..fc7c057fc993
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/lib_mapfree.c
@@ -0,0 +1,382 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*  time to wait between background mapfree operations */
+int mapfree_period_sec = 10;
+
+/*  some grace space where no regular cleanup should occur */
+int mapfree_grace_keep_mb = 16;
+
+static
+DECLARE_RWSEM(mapfree_mutex);
+
+static
+LIST_HEAD(mapfree_list);
+
+void mapfree_pages(struct mapfree_info *mf, int grace_keep)
+{
+   struct address_space *mapping;
+   pgoff_t start;
+   pgoff_t end;
+
+   if (unlikely(!mf))
+   goto done;
+   if (unlikely(!mf->mf_filp))
+   goto done;
+
+   mapping = mf->mf_filp->f_mapping;
+   if (unlikely(!mapping))
+   goto done;
+
+   if (grace_keep < 0) { /*  force full flush */
+   start = 0;
+   end = -1;
+   } else {
+   unsigned long flags;
+   loff_t tmp;
+   loff_t min;
+
+   spin_lock_irqsave(&mf->mf_lock, flags);
+
+   tmp = mf->mf_min[0];
+   min = tmp;
+   if (likely(mf->mf_min[1] < min))
+   min = mf->mf_min[1];
+   if (tmp) {
+   mf->mf_min[1] = tmp;
+   mf->mf_min[0] = 0;
+   }
+
+   spin_unlock_irqrestore(&mf->mf_lock, flags);
+
+   min -= (loff_t)grace_keep * (1024 * 1024); /*  megabytes */
+   end = 0;
+
+   if (min > 0 || mf->mf_last) {
+   start = mf->mf_last / PAGE_SIZE;
+   /*  add some grace overlapping */
+   if (likely(start > 0))
+   start--;
+   mf->mf_last = min;
+   end = min / PAGE_SIZE;
+   } else  { /*  there was no progress for at least 2 rounds */
+   start = 0;
+   if (!grace_keep) /*  also flush thoroughly */
+   end = -1;
+   }
+
+   XIO_DBG("file = '%s' start = %lu end = %lu\n", mf->mf_name, start, end);
+   }
+
+   if (end > start || end == -1)
+   invalidate_mapping_pages(mapping, start, end);
+
+done:;
+}
+
+static
+void _mapfree_put(struct mapfree_info *mf)
+{
+   if (atomic_dec_and_test(&mf->mf_count)) {
+   XIO_DBG("closing file '%s' filp = %p\n", mf->mf_name, mf->mf_filp);
+   list_del_init(&mf->mf_head);
+   CHECK_HEAD_EMPTY(&mf->mf_dirty_anchor);
+   if (likely(mf->mf_filp)) {
+   mapfree_pages(mf, -1);
+   filp_close(mf->mf_filp, NULL);
+   }
+   brick_string_free(mf->mf_name);
+   brick_mem_free(mf);
+   }
+}
+
+void mapfree_put(struct mapfree_info *mf)
+{
+   if (likely(mf)) {
+   down_write(&mapfree_mutex);
+   _mapfree_put(mf);
+   up_write(&mapfree_mutex);
+   }
+}
+
+struct mapfree_info *mapfree_get(const char *name, int flags)
+{
+   struct mapfree_info *mf = NULL;
+   struct list_head *tmp;
+
+   if (!(flags & O_DIRECT)) {
+   down_read(&mapfree_mutex);
+   for (tmp = mapfree_list.next; tmp != &mapfree_list; tmp = tmp->next) {
+   struct mapfree_info *_mf = container_of(tmp, struct mapfree_info, mf_head);
+
+   if (_mf->mf_flags == flags && !strcmp(_mf->mf_name, name)) {
+   mf = _mf;
+   atomic_inc(&mf->mf_count);
+  

[RFC 24/32] mars: add new module strategy

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/mars/strategy.h | 239 +++
 1 file changed, 239 insertions(+)
 create mode 100644 drivers/staging/mars/mars/strategy.h

diff --git a/drivers/staging/mars/mars/strategy.h b/drivers/staging/mars/mars/strategy.h
new file mode 100644
index ..d570772847c2
--- /dev/null
+++ b/drivers/staging/mars/mars/strategy.h
@@ -0,0 +1,239 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/*  OLD CODE = > will disappear! */
+#ifndef _OLD_STRATEGY
+#define _OLD_STRATEGY
+
+#define _STRATEGY  /*  call this only in strategy bricks, never in ordinary bricks */
+
+#include 
+
+#define MARS_ARGV_MAX  4
+
+extern loff_t global_total_space;
+extern loff_t global_remaining_space;
+
+extern int global_logrot_auto;
+extern int global_free_space_0;
+extern int global_free_space_1;
+extern int global_free_space_2;
+extern int global_free_space_3;
+extern int global_free_space_4;
+extern int global_sync_want;
+extern int global_sync_nr;
+extern int global_sync_limit;
+extern int mars_rollover_interval;
+extern int mars_scan_interval;
+extern int mars_propagate_interval;
+extern int mars_sync_flip_interval;
+extern int mars_peer_abort;
+extern int mars_emergency_mode;
+extern int mars_reset_emergency;
+extern int mars_keep_msg;
+
+extern int mars_fast_fullsync;
+
+#define MARS_DENT(TYPE) \
+   struct list_head dent_link; \
+   struct list_head brick_list;\
+   struct TYPE *d_parent;  \
+   char *d_argv[MARS_ARGV_MAX];  /* for internal use, will be automatically deallocated*/\
+   char *d_args; /* ditto uninterpreted */ \
+   char *d_name; /* current path component */  \
+   char *d_rest; /* some "meaningful" rest of d_name*/ \
+   char *d_path; /* full absolute path */  \
+   struct say_channel *d_say_channel; /* for messages */   \
+   loff_t d_corr_A; /* logical size correction */  \
+   loff_t d_corr_B; /* logical size correction */  \
+   int   d_depth;  \
+   /* from readdir() = > often DT_UNKNOWN */   \
+   /* don't rely on it - use stat_val.mode instead */  \
+   unsigned int d_type;\
+   int   d_class;/* for pre-grouping order */  \
+   int   d_serial;   /* for pre-grouping order */  \
+   int   d_version;  /* dynamic programming per call of mars_ent_work() */\
+   int   d_child_count;\
+   bool d_killme;  \
+   bool d_use_channel; \
+   struct kstat stat_val;  \
+   char *link_val; \
+   struct mars_global *d_global;   \
+   void (*d_private_destruct)(void *private);  \
+   void *d_private
+
+struct mars_dent {
+   MARS_DENT(mars_dent);
+};
+
+extern const struct meta mars_kstat_meta[];
+extern const struct meta mars_dent_meta[];
+
+struct mars_global {
+   struct rw_semaphore dent_mutex;
+   struct rw_semaphore brick_mutex;
+   struct generic_switch global_power;
+   struct list_head dent_anchor;
+   struct list_head brick_anchor;
+
+   wait_queue_head_t main_event;
+   int global_version;
+   int deleted_my_border;
+   int deleted_border;
+   int deleted_min;
+   bool main_trigger;
+};
+
+extern void bind_to_dent(struct mars_dent *dent, struct say_channel **ch);
+
+typedef int (
+*mars_dent_checker_fn)(
+struct mars_dent *parent,
+const char *name,
+int namlen,
+unsigned int d_type,
+int *prefix,
+int *serial,
+bool *use_channel);
+
+typedef int (*mars_dent_worker_fn)(struct mars_global *global, struct mars_dent *dent, bool prepare, bool direction);
+
+extern i
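
Editorial note, not part of the patch: the MARS_DENT(TYPE) macro above expands to a block of struct members, so several struct types can begin with the same fields. The following is a minimal userspace sketch of that pattern. All names in it (DEMO_DENT, demo_dent, demo_ext_dent, demo_depth_of) are hypothetical, and the field list is shortened; it is not the MARS definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, shortened stand-in for MARS_DENT(): expands to shared fields. */
#define DEMO_DENT(TYPE)			\
	struct TYPE *d_parent;		\
	const char *d_name;		\
	int d_depth

/* Plain entry: only the shared fields. */
struct demo_dent {
	DEMO_DENT(demo_dent);
};

/* Extended entry: the same leading fields plus one private member. */
struct demo_ext_dent {
	DEMO_DENT(demo_ext_dent);
	int extra;
};

/* Count how many parents lie above an entry by following d_parent. */
static int demo_depth_of(const struct demo_ext_dent *dent)
{
	int depth = 0;

	assert(dent != NULL);
	while (dent->d_parent) {
		dent = dent->d_parent;
		depth++;
	}
	return depth;
}
```

Because both structs are built from the same macro, their shared members are declared in the same order and with compatible types, which is presumably why the patch wraps them in a macro instead of repeating them.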

[RFC 16/32] mars: add new module lib_log

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/xio_bricks/lib_log.c | 506 ++
 include/linux/xio/lib_log.h   | 333 
 2 files changed, 839 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/lib_log.c
 create mode 100644 include/linux/xio/lib_log.h

diff --git a/drivers/staging/mars/xio_bricks/lib_log.c b/drivers/staging/mars/xio_bricks/lib_log.c
new file mode 100644
index ..e0d086a0981f
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/lib_log.c
@@ -0,0 +1,506 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+
+atomic_t global_aio_flying = ATOMIC_INIT(0);
+
+void exit_logst(struct log_status *logst)
+{
+   int count;
+
+   log_flush(logst);
+
+   /*  TODO: replace by event */
+   count = 0;
+   while (atomic_read(&logst->aio_flying) > 0) {
+   if (!count++)
+   XIO_DBG("waiting for IO terminating...");
+   brick_msleep(500);
+   }
+   if (logst->read_aio) {
+   XIO_DBG("putting read_aio\n");
+   GENERIC_INPUT_CALL(logst->input, aio_put, logst->read_aio);
+   logst->read_aio = NULL;
+   }
+   if (logst->log_aio) {
+   XIO_DBG("putting log_aio\n");
+   GENERIC_INPUT_CALL(logst->input, aio_put, logst->log_aio);
+   logst->log_aio = NULL;
+   }
+}
+
+void init_logst(struct log_status *logst, struct xio_input *input, loff_t start_pos, loff_t end_pos)
+{
+   exit_logst(logst);
+
+   memset(logst, 0, sizeof(struct log_status));
+
+   logst->input = input;
+   logst->brick = input->brick;
+   logst->start_pos = start_pos;
+   logst->log_pos = start_pos;
+   logst->end_pos = end_pos;
+   init_waitqueue_head(&logst->event);
+}
+
+#define XIO_LOG_CB_MAX 32
+
+struct log_cb_info {
+   struct aio_object *aio;
+   struct log_status *logst;
+   struct semaphore mutex;
+   atomic_t refcount;
+   int nr_cb;
+   void (*endios[XIO_LOG_CB_MAX])(void *private, int error);
+   void *privates[XIO_LOG_CB_MAX];
+};
+
+static
+void put_log_cb_info(struct log_cb_info *cb_info)
+{
+   if (atomic_dec_and_test(&cb_info->refcount))
+   brick_mem_free(cb_info);
+}
+
+static
+void _do_callbacks(struct log_cb_info *cb_info, int error)
+{
+   int i;
+
+   down(&cb_info->mutex);
+   for (i = 0; i < cb_info->nr_cb; i++) {
+   void (*end_fn)(void *private, int error);
+
+   end_fn = cb_info->endios[i];
+   cb_info->endios[i] = NULL;
+   if (end_fn)
+   end_fn(cb_info->privates[i], error);
+   }
+   up(&cb_info->mutex);
+}
+
+static
+void log_write_endio(struct generic_callback *cb)
+{
+   struct log_cb_info *cb_info = cb->cb_private;
+   struct log_status *logst;
+
+   LAST_CALLBACK(cb);
+   CHECK_PTR(cb_info, err);
+
+   logst = cb_info->logst;
+   CHECK_PTR(logst, done);
+
+   _do_callbacks(cb_info, cb->cb_error);
+
+done:
+   put_log_cb_info(cb_info);
+   atomic_dec(&logst->aio_flying);
+   atomic_dec(&global_aio_flying);
+   if (logst->signal_event)
+   wake_up_interruptible(logst->signal_event);
+
+   goto out_return;
+err:
+   XIO_FAT("internal pointer corruption\n");
+out_return:;
+}
+
+void log_flush(struct log_status *logst)
+{
+   struct aio_object *aio = logst->log_aio;
+   struct log_cb_info *cb_info;
+   int align_size;
+   int gap;
+
+   if (!aio || !logst->count)
+   goto out_return;
+   gap = 0;
+   align_size = (logst->align_size / PAGE_SIZE) * PAGE_SIZE;
+   if (align_size > 0) {
+   /*  round up to next alignment border */
+   int align_offset = logst->offset & (align_size - 1);
+
+   if (align_offset > 0) {
+   int restlen = aio->io_len - logst->offset;
+
+   gap = align_size - align_offset;
+   if (unlikely(gap > restlen))
+   gap = restlen;

[RFC 25/32] mars: add new module main_strategy

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/mars/main_strategy.c | 2135 +
 1 file changed, 2135 insertions(+)
 create mode 100644 drivers/staging/mars/mars/main_strategy.c

diff --git a/drivers/staging/mars/mars/main_strategy.c b/drivers/staging/mars/mars/main_strategy.c
new file mode 100644
index ..7929b566d645
--- /dev/null
+++ b/drivers/staging/mars/mars/main_strategy.c
@@ -0,0 +1,2135 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define XIO_DEBUGGING
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "strategy.h"
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#define SKIP_BIO   false
+
+/***/
+
+/*  meta descriptions */
+
+const struct meta mars_kstat_meta[] = {
+   META_INI(ino, struct kstat, FIELD_UINT),
+   META_INI(mode, struct kstat, FIELD_UINT),
+   META_INI(size, struct kstat, FIELD_INT),
+   META_INI_SUB(atime, struct kstat, xio_timespec_meta),
+   META_INI_SUB(mtime, struct kstat, xio_timespec_meta),
+   META_INI_SUB(ctime, struct kstat, xio_timespec_meta),
+   META_INI_TRANSFER(blksize, struct kstat, FIELD_UINT, 4),
+   {}
+};
+
+const struct meta mars_dent_meta[] = {
+   META_INI(d_name,struct mars_dent, FIELD_STRING),
+   META_INI(d_rest,struct mars_dent, FIELD_STRING),
+   META_INI(d_path,struct mars_dent, FIELD_STRING),
+   META_INI(d_type,struct mars_dent, FIELD_UINT),
+   META_INI(d_class,   struct mars_dent, FIELD_INT),
+   META_INI(d_serial,  struct mars_dent, FIELD_INT),
+   META_INI(d_corr_A,  struct mars_dent, FIELD_INT),
+   META_INI(d_corr_B,  struct mars_dent, FIELD_INT),
+   META_INI_SUB(stat_val, struct mars_dent, mars_kstat_meta),
+   META_INI(link_val,struct mars_dent, FIELD_STRING),
+   META_INI(d_args,struct mars_dent, FIELD_STRING),
+   META_INI(d_argv[0], struct mars_dent, FIELD_STRING),
+   META_INI(d_argv[1], struct mars_dent, FIELD_STRING),
+   META_INI(d_argv[2], struct mars_dent, FIELD_STRING),
+   META_INI(d_argv[3], struct mars_dent, FIELD_STRING),
+   {}
+};
+
+/***/
+
+/* The _compat_*() functions are needed for the out-of-tree version
+ * of MARS for adaptation to different kernel versions.
+ */
+
+/* Hack because of 8bcb77fabd7cbabcad49f58750be8683febee92b
+ */
+static int __path_parent(const char *name, struct path *path, unsigned flags)
+{
+   char *tmp;
+   int len;
+   int error;
+
+   len = strlen(name);
+   while (len > 0 && name[len] != '/')
+   len--;
+   if (unlikely(!len))
+   return -EINVAL;
+
+   tmp = brick_string_alloc(len + 1);
+   strncpy(tmp, name, len);
+   tmp[len] = '\0';
+
+   error = kern_path(tmp, flags | LOOKUP_DIRECTORY | LOOKUP_FOLLOW, path);
+
+   brick_string_free(tmp);
+   return error;
+}
+
+/* code is blindly stolen from symlinkat()
+ * and later adapted to various kernels
+ */
+int _compat_symlink(
+const char __user *oldname,
+   const char __user *newname,
+   struct timespec *mtime)
+{
+   const int newdfd = AT_FDCWD;
+   int error;
+   char *from;
+   struct dentry *dentry;
+   struct path path;
+   unsigned int lookup_flags = 0;
+
+   from = (char *)oldname;
+
+retry:
+   dentry = user_path_create(newdfd, newname, &path, lookup_flags);
+   error = PTR_ERR(dentry);
+   if (IS_ERR(dentry))
+   goto out_putname;
+
+   error = vfs_symlink(path.dentry->d_inode, dentry, from);
+   if (error >= 0 && mtime) {
+   struct iattr iattr = {
+   .ia_valid = ATTR_MTIME | ATTR_MTIME_SET | ATTR_TIMES_SET,
+   .ia_mtime.tv_sec = mtime->tv_sec,
+   .ia_mtime.tv_nsec = mtime->tv_nsec,
+   };
+
+   mutex_lock(&dentry->d_inode->i_mutex);
+   error = notify_change(dentry, &iattr, NULL);
+   mutex_unlock(&dentry->d_inode->i_mutex);
+   }
+   done_path
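
Editorial note, not part of the patch: __path_parent() above finds the parent directory of a path by scanning backwards for the last '/' and copying everything before it. The following is a minimal userspace sketch of that string step only. The name parent_of and the caller-supplied output buffer are hypothetical; the kernel function allocates with brick_string_alloc() and then calls kern_path().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the parent directory of `name` into `out` (capacity `out_len`).
 * Returns 0 on success, -1 if there is no parent component or `out` is
 * too small. Hypothetical userspace helper, not the kernel function.
 */
static int parent_of(const char *name, char *out, size_t out_len)
{
	size_t len;

	assert(name != NULL && out != NULL);

	len = strlen(name);
	/* Walk back to the last '/'; name[len] starts at the NUL terminator. */
	while (len > 0 && name[len] != '/')
		len--;
	if (len == 0)
		return -1;	/* no '/' after index 0, as in the -EINVAL case above */
	if (len + 1 > out_len)
		return -1;

	memcpy(out, name, len);
	out[len] = '\0';
	return 0;
}
```

As in the patch, a name whose only '/' is the leading one (for example "/mars") is rejected, because the scan stops at index 0 without checking that position.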

[RFC 27/32] mars: add new module server_strategy

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer <t...@schoebel-theuer.de>
---
 drivers/staging/mars/mars/server_strategy.c | 436 
 1 file changed, 436 insertions(+)
 create mode 100644 drivers/staging/mars/mars/server_strategy.c

diff --git a/drivers/staging/mars/mars/server_strategy.c b/drivers/staging/mars/mars/server_strategy.c
new file mode 100644
index ..3b880c10be49
--- /dev/null
+++ b/drivers/staging/mars/mars/server_strategy.c
@@ -0,0 +1,436 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2016 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2016 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/* MARS Light specific parts of xio_server
+ */
+
+#include 
+#include 
+#include 
+
+#define _STRATEGY
+#include 
+#include 
+#include 
+#include 
+
+#include "strategy.h"
+
+#include 
+#include 
+
+static
+int dummy_worker(struct mars_global *global, struct mars_dent *dent, bool prepare, bool direction)
+{
+   return 0;
+}
+
+static
+int _set_server_sio_params(struct xio_brick *_brick, void *private)
+{
+   struct sio_brick *sio_brick = (void *)_brick;
+
+   if (_brick->type != (void *)_sio_brick_type) {
+   XIO_ERR("bad brick type\n");
+   return -EINVAL;
+   }
+   sio_brick->o_direct = false;
+   sio_brick->o_fdsync = false;
+   XIO_INF("name = '%s' path = '%s'\n", _brick->brick_name, _brick->brick_path);
+   return 1;
+}
+
+static
+int _set_server_bio_params(struct xio_brick *_brick, void *private)
+{
+   struct bio_brick *bio_brick;
+
+   if (_brick->type == (void *)_sio_brick_type)
+   return _set_server_sio_params(_brick, private);
+   if (_brick->type != (void *)_bio_brick_type) {
+   XIO_ERR("bad brick type\n");
+   return -EINVAL;
+   }
+   bio_brick = (void *)_brick;
+   bio_brick->ra_pages = 0;
+   bio_brick->do_noidle = true;
+   bio_brick->do_sync = true;
+   bio_brick->do_unplug = true;
+   XIO_INF("name = '%s' path = '%s'\n", _brick->brick_name, _brick->brick_path);
+   return 1;
+}
+
+int handler_thread(void *data)
+{
+   struct mars_global handler_global = {
+   .dent_anchor = LIST_HEAD_INIT(handler_global.dent_anchor),
+   .brick_anchor = LIST_HEAD_INIT(handler_global.brick_anchor),
+   .global_power = {
+   .button = true,
+   },
+   .main_event = __WAIT_QUEUE_HEAD_INITIALIZER(handler_global.main_event),
+   };
+   struct task_struct *thread = NULL;
+   struct server_brick *brick = data;
+   struct xio_socket *sock = &brick->handler_socket;
+   bool ok = xio_get_socket(sock);
+   unsigned long statist_jiffies = jiffies;
+   int debug_nr;
+   int status = -EINVAL;
+
+   init_rwsem(&handler_global.dent_mutex);
+   init_rwsem(&handler_global.brick_mutex);
+
+   XIO_DBG("#%d --- handler_thread starting on socket %p\n", sock->s_debug_nr, sock);
+   if (!ok)
+   goto done;
+
+   thread = brick_thread_create(cb_thread, brick, "xio_cb%d", brick->version);
+   if (unlikely(!thread)) {
+   XIO_ERR("cannot create cb thread\n");
+   status = -ENOENT;
+   goto done;
+   }
+   brick->cb_thread = thread;
+
+   brick->handler_running = true;
+   wake_up_interruptible(&brick->startup_event);
+
+   while (!list_empty(&handler_global.brick_anchor) ||
+  xio_socket_is_alive(sock)) {
+   struct xio_cmd cmd = {};
+
+   handler_global.global_version++;
+
+   if (!list_empty(&handler_global.brick_anchor)) {
+   if (server_show_statist && !time_is_before_jiffies(statist_jiffies + 10 * HZ)) {
+   show_statistics(&handler_global, "handler");
+   statist_jiffies = jiffies;
+   }
+   if (!xio_socket_is_alive(sock) &&
+   atomic_read(&brick->in_flight) <= 0 &&
+   brick->conn_brick) {
+   if (generic_disconnect((void *)brick->inputs[0]) >= 0)
+   brick->conn_brick = NULL;
+   }
+
+  

[RFC 16/32] mars: add new module lib_log

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/xio_bricks/lib_log.c | 506 ++
 include/linux/xio/lib_log.h   | 333 
 2 files changed, 839 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/lib_log.c
 create mode 100644 include/linux/xio/lib_log.h

diff --git a/drivers/staging/mars/xio_bricks/lib_log.c 
b/drivers/staging/mars/xio_bricks/lib_log.c
new file mode 100644
index ..e0d086a0981f
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/lib_log.c
@@ -0,0 +1,506 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+
+atomic_t global_aio_flying = ATOMIC_INIT(0);
+
+void exit_logst(struct log_status *logst)
+{
+   int count;
+
+   log_flush(logst);
+
+   /*  TODO: replace by event */
+   count = 0;
+   while (atomic_read(>aio_flying) > 0) {
+   if (!count++)
+   XIO_DBG("waiting for IO terminating...");
+   brick_msleep(500);
+   }
+   if (logst->read_aio) {
+   XIO_DBG("putting read_aio\n");
+   GENERIC_INPUT_CALL(logst->input, aio_put, logst->read_aio);
+   logst->read_aio = NULL;
+   }
+   if (logst->log_aio) {
+   XIO_DBG("putting log_aio\n");
+   GENERIC_INPUT_CALL(logst->input, aio_put, logst->log_aio);
+   logst->log_aio = NULL;
+   }
+}
+
+void init_logst(struct log_status *logst, struct xio_input *input, loff_t 
start_pos, loff_t end_pos)
+{
+   exit_logst(logst);
+
+   memset(logst, 0, sizeof(struct log_status));
+
+   logst->input = input;
+   logst->brick = input->brick;
+   logst->start_pos = start_pos;
+   logst->log_pos = start_pos;
+   logst->end_pos = end_pos;
+   init_waitqueue_head(>event);
+}
+
+#define XIO_LOG_CB_MAX 32
+
+struct log_cb_info {
+   struct aio_object *aio;
+   struct log_status *logst;
+   struct semaphore mutex;
+   atomic_t refcount;
+   int nr_cb;
+   void (*endios[XIO_LOG_CB_MAX])(void *private, int error);
+   void *privates[XIO_LOG_CB_MAX];
+};
+
+static
+void put_log_cb_info(struct log_cb_info *cb_info)
+{
+   if (atomic_dec_and_test(_info->refcount))
+   brick_mem_free(cb_info);
+}
+
+static
+void _do_callbacks(struct log_cb_info *cb_info, int error)
+{
+   int i;
+
+   down(_info->mutex);
+   for (i = 0; i < cb_info->nr_cb; i++) {
+   void (*end_fn)(void *private, int error);
+
+   end_fn = cb_info->endios[i];
+   cb_info->endios[i] = NULL;
+   if (end_fn)
+   end_fn(cb_info->privates[i], error);
+   }
+   up(_info->mutex);
+}
+
+static
+void log_write_endio(struct generic_callback *cb)
+{
+   struct log_cb_info *cb_info = cb->cb_private;
+   struct log_status *logst;
+
+   LAST_CALLBACK(cb);
+   CHECK_PTR(cb_info, err);
+
+   logst = cb_info->logst;
+   CHECK_PTR(logst, done);
+
+   _do_callbacks(cb_info, cb->cb_error);
+
+done:
+   put_log_cb_info(cb_info);
+   atomic_dec(&logst->aio_flying);
+   atomic_dec(&global_aio_flying);
+   if (logst->signal_event)
+       wake_up_interruptible(logst->signal_event);
+
+   goto out_return;
+err:
+   XIO_FAT("internal pointer corruption\n");
+out_return:;
+}
+
+void log_flush(struct log_status *logst)
+{
+   struct aio_object *aio = logst->log_aio;
+   struct log_cb_info *cb_info;
+   int align_size;
+   int gap;
+
+   if (!aio || !logst->count)
+       goto out_return;
+   gap = 0;
+   align_size = (logst->align_size / PAGE_SIZE) * PAGE_SIZE;
+   if (align_size > 0) {
+       /*  round up to next alignment border */
+       int align_offset = logst->offset & (align_size - 1);
+
+       if (align_offset > 0) {
+           int restlen = aio->io_len - logst->offset;
+
+           gap = align_size - align_offset;
+           if (unlikely(gap > restlen))
+               gap = restlen;
+       }
+   }
+

[RFC 25/32] mars: add new module main_strategy

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/mars/main_strategy.c | 2135 +
 1 file changed, 2135 insertions(+)
 create mode 100644 drivers/staging/mars/mars/main_strategy.c

diff --git a/drivers/staging/mars/mars/main_strategy.c 
b/drivers/staging/mars/mars/main_strategy.c
new file mode 100644
index ..7929b566d645
--- /dev/null
+++ b/drivers/staging/mars/mars/main_strategy.c
@@ -0,0 +1,2135 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define XIO_DEBUGGING
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "strategy.h"
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#define SKIP_BIO   false
+
+/***/
+
+/*  meta descriptions */
+
+const struct meta mars_kstat_meta[] = {
+   META_INI(ino, struct kstat, FIELD_UINT),
+   META_INI(mode, struct kstat, FIELD_UINT),
+   META_INI(size, struct kstat, FIELD_INT),
+   META_INI_SUB(atime, struct kstat, xio_timespec_meta),
+   META_INI_SUB(mtime, struct kstat, xio_timespec_meta),
+   META_INI_SUB(ctime, struct kstat, xio_timespec_meta),
+   META_INI_TRANSFER(blksize, struct kstat, FIELD_UINT, 4),
+   {}
+};
+
+const struct meta mars_dent_meta[] = {
+   META_INI(d_name, struct mars_dent, FIELD_STRING),
+   META_INI(d_rest, struct mars_dent, FIELD_STRING),
+   META_INI(d_path, struct mars_dent, FIELD_STRING),
+   META_INI(d_type, struct mars_dent, FIELD_UINT),
+   META_INI(d_class, struct mars_dent, FIELD_INT),
+   META_INI(d_serial, struct mars_dent, FIELD_INT),
+   META_INI(d_corr_A, struct mars_dent, FIELD_INT),
+   META_INI(d_corr_B, struct mars_dent, FIELD_INT),
+   META_INI_SUB(stat_val, struct mars_dent, mars_kstat_meta),
+   META_INI(link_val, struct mars_dent, FIELD_STRING),
+   META_INI(d_args, struct mars_dent, FIELD_STRING),
+   META_INI(d_argv[0], struct mars_dent, FIELD_STRING),
+   META_INI(d_argv[1], struct mars_dent, FIELD_STRING),
+   META_INI(d_argv[2], struct mars_dent, FIELD_STRING),
+   META_INI(d_argv[3], struct mars_dent, FIELD_STRING),
+   {}
+};
+
+/***/
+
+/* The _compat_*() functions are needed for the out-of-tree version
+ * of MARS for adaptation to different kernel versions.
+ */
+
+/* Hack because of 8bcb77fabd7cbabcad49f58750be8683febee92b
+ */
+static int __path_parent(const char *name, struct path *path, unsigned flags)
+{
+   char *tmp;
+   int len;
+   int error;
+
+   len = strlen(name);
+   while (len > 0 && name[len] != '/')
+       len--;
+   if (unlikely(!len))
+       return -EINVAL;
+
+   tmp = brick_string_alloc(len + 1);
+   strncpy(tmp, name, len);
+   tmp[len] = '\0';
+
+   error = kern_path(tmp, flags | LOOKUP_DIRECTORY | LOOKUP_FOLLOW, path);
+
+   brick_string_free(tmp);
+   return error;
+}
+
+/* code is blindly stolen from symlinkat()
+ * and later adapted to various kernels
+ */
+int _compat_symlink(
+   const char __user *oldname,
+   const char __user *newname,
+   struct timespec *mtime)
+{
+   const int newdfd = AT_FDCWD;
+   int error;
+   char *from;
+   struct dentry *dentry;
+   struct path path;
+   unsigned int lookup_flags = 0;
+
+   from = (char *)oldname;
+
+retry:
+   dentry = user_path_create(newdfd, newname, &path, lookup_flags);
+   error = PTR_ERR(dentry);
+   if (IS_ERR(dentry))
+       goto out_putname;
+
+   error = vfs_symlink(path.dentry->d_inode, dentry, from);
+   if (error >= 0 && mtime) {
+       struct iattr iattr = {
+           .ia_valid = ATTR_MTIME | ATTR_MTIME_SET | ATTR_TIMES_SET,
+           .ia_mtime.tv_sec = mtime->tv_sec,
+           .ia_mtime.tv_nsec = mtime->tv_nsec,
+       };
+
+       mutex_lock(&dentry->d_inode->i_mutex);
+       error = notify_change(dentry, &iattr, NULL);
+       mutex_unlock(&dentry->d_inode->i_mutex);
+   }
+   done_path_create(&path, dentry);
+   if (retry_es

[RFC 27/32] mars: add new module server_strategy

2016-12-30 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/mars/server_strategy.c | 436 
 1 file changed, 436 insertions(+)
 create mode 100644 drivers/staging/mars/mars/server_strategy.c

diff --git a/drivers/staging/mars/mars/server_strategy.c 
b/drivers/staging/mars/mars/server_strategy.c
new file mode 100644
index ..3b880c10be49
--- /dev/null
+++ b/drivers/staging/mars/mars/server_strategy.c
@@ -0,0 +1,436 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2016 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2016 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/* MARS Light specific parts of xio_server
+ */
+
+#include 
+#include 
+#include 
+
+#define _STRATEGY
+#include 
+#include 
+#include 
+#include 
+
+#include "strategy.h"
+
+#include 
+#include 
+
+static
+int dummy_worker(struct mars_global *global, struct mars_dent *dent, bool prepare, bool direction)
+{
+   return 0;
+}
+
+static
+int _set_server_sio_params(struct xio_brick *_brick, void *private)
+{
+   struct sio_brick *sio_brick = (void *)_brick;
+
+   if (_brick->type != (void *)_sio_brick_type) {
+       XIO_ERR("bad brick type\n");
+       return -EINVAL;
+   }
+   sio_brick->o_direct = false;
+   sio_brick->o_fdsync = false;
+   XIO_INF("name = '%s' path = '%s'\n", _brick->brick_name, _brick->brick_path);
+   return 1;
+}
+
+static
+int _set_server_bio_params(struct xio_brick *_brick, void *private)
+{
+   struct bio_brick *bio_brick;
+
+   if (_brick->type == (void *)_sio_brick_type)
+       return _set_server_sio_params(_brick, private);
+   if (_brick->type != (void *)_bio_brick_type) {
+       XIO_ERR("bad brick type\n");
+       return -EINVAL;
+   }
+   bio_brick = (void *)_brick;
+   bio_brick->ra_pages = 0;
+   bio_brick->do_noidle = true;
+   bio_brick->do_sync = true;
+   bio_brick->do_unplug = true;
+   XIO_INF("name = '%s' path = '%s'\n", _brick->brick_name, _brick->brick_path);
+   return 1;
+}
+
+int handler_thread(void *data)
+{
+   struct mars_global handler_global = {
+       .dent_anchor = LIST_HEAD_INIT(handler_global.dent_anchor),
+       .brick_anchor = LIST_HEAD_INIT(handler_global.brick_anchor),
+       .global_power = {
+           .button = true,
+       },
+       .main_event = __WAIT_QUEUE_HEAD_INITIALIZER(handler_global.main_event),
+   };
+   struct task_struct *thread = NULL;
+   struct server_brick *brick = data;
+   struct xio_socket *sock = &brick->handler_socket;
+   bool ok = xio_get_socket(sock);
+   unsigned long statist_jiffies = jiffies;
+   int debug_nr;
+   int status = -EINVAL;
+
+   init_rwsem(&handler_global.dent_mutex);
+   init_rwsem(&handler_global.brick_mutex);
+
+   XIO_DBG("#%d --- handler_thread starting on socket %p\n", sock->s_debug_nr, sock);
+   if (!ok)
+       goto done;
+
+   thread = brick_thread_create(cb_thread, brick, "xio_cb%d", brick->version);
+   if (unlikely(!thread)) {
+       XIO_ERR("cannot create cb thread\n");
+       status = -ENOENT;
+       goto done;
+   }
+   brick->cb_thread = thread;
+
+   brick->handler_running = true;
+   wake_up_interruptible(&brick->startup_event);
+
+   while (!list_empty(&handler_global.brick_anchor) ||
+          xio_socket_is_alive(sock)) {
+       struct xio_cmd cmd = {};
+
+       handler_global.global_version++;
+
+       if (!list_empty(&handler_global.brick_anchor)) {
+           if (server_show_statist && !time_is_before_jiffies(statist_jiffies + 10 * HZ)) {
+               show_statistics(&handler_global, "handler");
+               statist_jiffies = jiffies;
+           }
+           if (!xio_socket_is_alive(sock) &&
+               atomic_read(&brick->in_flight) <= 0 &&
+               brick->conn_brick) {
+               if (generic_disconnect((void *)brick->inputs[0]) >= 0)
+                   brick->conn_brick = NULL;
+           }
+
+   status 

Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks

2016-03-12 Thread Thomas Schoebel-Theuer

On 03/12/2016 08:19 AM, Theodore Ts'o wrote:
> On Fri, Mar 11, 2016 at 04:44:16PM -0800, Linus Torvalds wrote:
>> There's a big difference between "give the user rope", and "tie the
>> rope in a noose and put a banana peel so that the user might stumble
>> into the rope and hang himself", though.
>
> [...]  And then the application has to run
> setgid with that group's privileges.

Your concept of hierarchically nesting containers via filesystem 
instances looks nice to me.


A potential concern could be whether gids are the right implementation 
for expressing hierarchically nested access permissions in a persistent way.


Your permissions attached to gids are nested (because inside of your 
containers you may have another instance of a completely different gid 
namespace), they are also persistent when your mount flags etc are 
restored properly after a crash (by some scripts), but probably use of 
gids for this might look like a kind of "misuse" of the original gid 
concept from the 1970s.


Maybe you currently don't have a better /persistent/ concept for 
expressing your needs, so maybe your solution could be just fine under 
the currently given circumstances.


Introduction of a new concept for overcoming the current limitations 
must be done very carefully.


The concerns about information leaks caused by bad discard semantics 
could be /hypothetically/ solved at /concept level/ in the following way. 
Please note that by "concept level" I don't want to imply any particular 
implementation. This is just a thought experiment for discussing the 
problems, just a "model of thinking":


a) Use a hierarchical namespace for naming subjects, e.g. 
hypervisorA.containerB.subcontainerC.user9 instead of gid=9


b) Attach actual permissions to each block of the underlying block 
device (fine-grained object model).


c) Correctly maintain access rights at each hierarchical layer, and for 
all operations (including discard with whatever semantics). In case some 
inner instance is untrusted and may do evil things, this will be 
intercepted / corrected at outer layers (which are more trusted). In 
essence, the nesting hierarchy is also a hierarchy of trust.


With such a model, information leaks caused by bad discard semantics etc. 
should be prevented at any level, even between completely unrelated 
containers or users, as long as no physical access to the disk is 
possible. In addition, encryption may be used to cover even that case.
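
To make a) to c) a bit more tangible, here is a minimal user-space sketch 
in C. It is only a model of thinking and not a proposal for an 
implementation: the names block_owner, subject_is_within(), may_access(), 
assign_block() and discard_block() are all hypothetical, subjects are 
plain dotted strings, and a tiny static table stands in for the 
per-block permissions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NR_BLOCKS 8

/* Hypothetical per-block permission table: the subject owning each block,
 * named by a dotted path like "hypervisorA.containerB.user9".
 * NULL means the block is currently unowned.
 */
static const char *block_owner[NR_BLOCKS];

/* True if 'subject' equals 'prefix' or is nested somewhere below it. */
static bool subject_is_within(const char *subject, const char *prefix)
{
	size_t len = strlen(prefix);

	if (strncmp(subject, prefix, len) != 0)
		return false;
	/* Only accept matches ending at a component border. */
	return subject[len] == '\0' || subject[len] == '.';
}

/* Outer layers are more trusted: a subject may access a block when the
 * owner is the subject itself or anything nested inside it.
 */
static bool may_access(const char *subject, unsigned int block)
{
	if (block >= NR_BLOCKS || !block_owner[block])
		return false;
	return subject_is_within(block_owner[block], subject);
}

/* Hand an unowned block to a subject. A real system would also have to
 * make sure that no old contents remain readable at this point.
 */
static bool assign_block(const char *subject, unsigned int block)
{
	if (block >= NR_BLOCKS || block_owner[block])
		return false;
	block_owner[block] = subject;
	return true;
}

/* Discard drops ownership, so stale data is unreachable for everybody. */
static bool discard_block(const char *subject, unsigned int block)
{
	if (!may_access(subject, block))
		return false;
	block_owner[block] = NULL;
	return true;
}
```

In this model an outer layer such as hypervisorA.containerB passes the 
check for blocks owned by anything nested inside it, while a sibling such 
as hypervisorA.containerC does not. After a discard the block has no 
owner, so nobody can read stale data until it is assigned again.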


Of course, a direct implementation of such extremely fine-grained access 
permissions would carry way too much overhead. Both the number of 
subjects as well as the number of objects must be reduced to some 
reasonable order of magnitude, at least at outer levels.


Thus the question is: how can we achieve almost the same effect with 
much less overhead?



Hmm, in my old Athomux research prototype, I proposed some solutions for 
this in an academic greenfield setting. But I am unsure what is 
transferable to a system with standard POSIX semantics, and what is not. 
Rethinking these concepts as well as checking them may take some time.


Here is a first alpha-stage attempt:

1) Give up the hierarchical subject namespace a), but maybe not fully. 
Access checking will continue /locally/ at each layer, by treating each 
subsystem as a (grey) blackbox. This is already the default 
implementation strategy. The total system may be less secure than in an 
idealized fine-grained system, because outer levels can no longer detect 
bad guys inside of their subsystem instances. The question is: how to 
get a "more secure" system than currently, with some reasonable effort.


2) Some /coarse/ access permission checks at the block layer b), but 
finer than today. Currently there is almost no checking at all (except 
when accessing a huge block device as a whole during open() => at 1&1 we 
have very large ones, and they may continue running for years). I am 
unsure how to achieve this in detail.


An idea for a long-term solution would be offloading of "allocation 
groups" to the block layer (if their size is coarsely dynamic in 
general, e.g. in steps of gigabytes), and to implement some coarse 
permission checks there. These could then be related to "containers" or 
"container groups". One of the problems is that some wide-spread network 
protocols like iSCSI have no clue about this, so this can only be an 
optional new feature.
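
As a rough illustration of such coarse checks, the following user-space 
C sketch maps a byte range to allocation groups and compares their owner 
with the requesting container. Everything in it is an assumption made 
for the example: the 1 GiB granularity (GROUP_SHIFT), the fixed-size 
group_owner table, the numeric container ids and the function name 
range_permitted() do not exist anywhere in the block layer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GROUP_SHIFT 30		/* assumed granularity: 1 GiB per group */
#define NR_GROUPS   16		/* assumed device size: 16 groups */
#define NO_OWNER    0		/* id 0 marks a free allocation group */

/* Hypothetical table: which container owns which allocation group. */
static uint32_t group_owner[NR_GROUPS];

/* Coarse check: every allocation group touched by [pos, pos + len)
 * must belong to the requesting container.
 */
static bool range_permitted(uint32_t container, uint64_t pos, uint64_t len)
{
	uint64_t first;
	uint64_t last;
	uint64_t g;

	if (container == NO_OWNER || len == 0)
		return false;
	if (pos + len < pos)	/* reject wrap-around of the range */
		return false;

	first = pos >> GROUP_SHIFT;
	last = (pos + len - 1) >> GROUP_SHIFT;
	if (last >= NR_GROUPS)
		return false;

	for (g = first; g <= last; g++) {
		if (group_owner[g] != container)
			return false;
	}
	return true;
}
```

The cost of the check depends only on the number of groups touched, not 
on the number of blocks, which is the point of keeping it coarse.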


Further ideas sought.

Cheers, Thomas

P.S. The concept of a "nest" in Athomux was already some kind of 
"recursively nested block device".




Re: [PATCH 2/2] block: create ioctl to discard-or-zeroout a range of blocks

2016-03-03 Thread Thomas Schoebel-Theuer
On 03/03/2016 11:56 PM, Dave Chinner wrote:
> That "new kind of write command" would enable delayed allocation
> algorithms to continue to work at the filesystem level on block
> devices that freespace management completely is offloaded to...
> Cheers, Dave. 

This would advocate a uniform /internal/ interface (family) across both
fs and block layers, similar in spirit to my old Athomux research
prototype long ago (see www.athomux.net).

This allows for recursive nesting in complex (distributed) storage/fs
hierarchies.

It would be nice if that internal interface (family) were (partly /
fully) asynchronous with callbacks. In the ideal case, it should be
compatible with workqueues (no need for blocking threads anymore).
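
A minimal user-space sketch in C of what I mean by "asynchronous with
callbacks" follows. It is not taken from MARS or from any kernel
interface: struct io_request, struct io_layer, layer_submit() and
layer_run_pending() are hypothetical names, and a simple linked list
stands in for a real workqueue.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* One request; the same shape could be used by fs and block layers. */
struct io_request {
	long long pos;
	int len;
	int error;				/* filled in on completion */
	void (*done)(struct io_request *req);	/* completion callback */
	void *private;				/* owned by the submitter */
	struct io_request *next;
};

/* One layer instance with a FIFO of pending requests. */
struct io_layer {
	struct io_request *head;
	struct io_request **tail;
	long long size;				/* capacity in bytes */
};

static void layer_init(struct io_layer *layer, long long size)
{
	layer->head = NULL;
	layer->tail = &layer->head;
	layer->size = size;
}

/* Submission never blocks: it only appends to the pending list. */
static void layer_submit(struct io_layer *layer, struct io_request *req)
{
	req->next = NULL;
	*layer->tail = req;
	layer->tail = &req->next;
}

/* Would be called from worker context (e.g. a workqueue item).
 * Returns the number of requests completed in this run.
 */
static int layer_run_pending(struct io_layer *layer)
{
	struct io_request *req;
	int count = 0;

	while ((req = layer->head) != NULL) {
		/* Unlink first, so a callback may safely resubmit. */
		layer->head = req->next;
		if (!layer->head)
			layer->tail = &layer->head;

		if (req->pos < 0 || req->pos + req->len > layer->size)
			req->error = -EINVAL;
		else
			req->error = 0;
		if (req->done)
			req->done(req);
		count++;
	}
	return count;
}

/* Example callback: count successful completions via req->private. */
static void count_done(struct io_request *req)
{
	int *counter = req->private;

	if (!req->error)
		(*counter)++;
}
```

Since the submitter only passes a request and a callback, the same
calling convention can be stacked recursively: the callback of an outer
layer may submit to an inner layer without any thread having to sleep.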

Uniformity is only needed at concept level. There might remain different
flavours of concrete interfaces at different subsystems, if the number
of subsystems remains as small as possible, and interfacing is close to
trivial.

I would like to support this also in future versions of MARS (see
github.com/schoebel/mars).

Cheers, Thomas



[RFC 04/31] mars: add new module brick_checking

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 include/linux/brick/brick_checking.h | 104 +++
 1 file changed, 104 insertions(+)
 create mode 100644 include/linux/brick/brick_checking.h

diff --git a/include/linux/brick/brick_checking.h 
b/include/linux/brick/brick_checking.h
new file mode 100644
index 000..a02f1bf
--- /dev/null
+++ b/include/linux/brick/brick_checking.h
@@ -0,0 +1,104 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef BRICK_CHECKING_H
+#define BRICK_CHECKING_H
+
+/***/
+
+/*  checking */
+
+#if defined(CONFIG_MARS_DEBUG) || defined(CONFIG_MARS_CHECKS)
+#define BRICK_CHECKING true
+#else
+#define BRICK_CHECKING false
+#endif
+
+#define _CHECK_ATOMIC(atom, OP, minval) \
+do { \
+   if (BRICK_CHECKING) { \
+       int __test = atomic_read(atom); \
+       if (unlikely(__test OP(minval))) { \
+           atomic_set(atom, minval); \
+           BRICK_ERR("%d: atomic " #atom " " #OP " " #minval " (%d)\n", __LINE__, __test); \
+       } \
+   } \
+} while (0)
+
+#define CHECK_ATOMIC(atom, minval) \
+   _CHECK_ATOMIC(atom, <, minval)
+
+#define CHECK_HEAD_EMPTY(head) \
+do { \
+   if (BRICK_CHECKING && unlikely(!list_empty(head) && (head)->next)) { \
+       list_del_init(head); \
+       BRICK_ERR("%d: list_head " #head " (%p) not empty\n", __LINE__, head); \
+   } \
+} while (0)
+
+#ifdef CONFIG_MARS_DEBUG_MEM
+#define CHECK_PTR_DEAD(ptr, label) \
+do { \
+   if (BRICK_CHECKING && unlikely((ptr) == (void *)0x5a5a5a5a5a5a5a5a)) { \
+       BRICK_FAT("%d: pointer '" #ptr "' is DEAD\n", __LINE__); \
+       goto label; \
+   } \
+} while (0)
+#else
+#define CHECK_PTR_DEAD(ptr, label) /*empty*/
+#endif
+
+#define CHECK_PTR_NULL(ptr, label) \
+do { \
+   CHECK_PTR_DEAD(ptr, label); \
+   if (BRICK_CHECKING && unlikely(!(ptr))) { \
+       BRICK_FAT("%d: pointer '" #ptr "' is NULL\n", __LINE__); \
+       goto label; \
+   } \
+} while (0)
+
+#ifdef CONFIG_MARS_DEBUG
+#define CHECK_PTR(ptr, label) \
+do { \
+   CHECK_PTR_NULL(ptr, label); \
+   if (BRICK_CHECKING && unlikely(!virt_addr_valid(ptr))) { \
+       BRICK_FAT("%d: pointer '" #ptr "' (%p) is no valid virtual KERNEL address\n", __LINE__, ptr); \
+       goto label; \
+   } \
+} while (0)
+#else
+#define CHECK_PTR(ptr, label) CHECK_PTR_NULL(ptr, label)
+#endif
+
+#define CHECK_ASPECT(a_ptr, o_ptr, label)  \
+do {   \
+   if (BRICK_CHECKING && unlikely((a_ptr)->object != o_ptr)) { \
+   BRICK_FAT("%d

[RFC 11/31] mars: add new module lib_timing

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/lib/lib_timing.c |  71 +
 include/linux/brick/lib_timing.h  | 181 ++
 2 files changed, 252 insertions(+)
 create mode 100644 drivers/staging/mars/lib/lib_timing.c
 create mode 100644 include/linux/brick/lib_timing.h

diff --git a/drivers/staging/mars/lib/lib_timing.c 
b/drivers/staging/mars/lib/lib_timing.c
new file mode 100644
index 000..7421dc4
--- /dev/null
+++ b/drivers/staging/mars/lib/lib_timing.c
@@ -0,0 +1,71 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+
+#include 
+#include 
+
+#ifdef CONFIG_DEBUG_KERNEL
+
+int report_timing(struct timing_stats *tim, char *str, int maxlen)
+{
+   int len = 0;
+   int time = 1;
+   int resol = 1;
+
+   static const char * const units[] = {
+       "us",
+       "ms",
+       "s",
+       "ERROR"
+   };
+   const char *unit = units[0];
+   int unit_index = 0;
+   int i;
+
+   for (i = 0; i < TIMING_MAX; i++) {
+       int this_len = scnprintf(str,
+
+                                maxlen,
+                                "<%d%s = %d (%lld) ",
+                                resol,
+                                unit,
+                                tim->tim_count[i],
+                                (long long)tim->tim_count[i] * time);
+       str += this_len;
+       len += this_len;
+       maxlen -= this_len;
+       if (maxlen <= 1)
+           break;
+       resol <<= 1;
+       time <<= 1;
+       if (resol >= 1000) {
+           resol = 1;
+           unit = units[++unit_index];
+       }
+   }
+   return len;
+}
+
+#endif /*  CONFIG_DEBUG_KERNEL */
+
+struct threshold global_io_threshold = {
+   .thr_limit = 30 * 100, /*  30 seconds */
+   .thr_factor = 100,
+   .thr_plus = 0,
+};
diff --git a/include/linux/brick/lib_timing.h b/include/linux/brick/lib_timing.h
new file mode 100644
index 000..8a7a1e9
--- /dev/null
+++ b/include/linux/brick/lib_timing.h
@@ -0,0 +1,181 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef LIB_TIMING_H
+#define LIB_TIMING_H
+
+#include 
+
+/* Simple infrastructure for timing of arbitrary operations and creation
+ * of some simple histogram statistics.
+ */
+
+#define TIMING_MAX 24
+
+struct timing_stats {
+#ifdef CONFIG_DEBUG_KERNEL
+   int tim_count[TIMING_MAX];
+
+#endif
+};
+
+#define _TIME_THIS(_stamp1, _stamp2, _CODE) \
+   ({ \
+       (_stamp1) = cpu_clock(raw_smp_processor_id()); \
+       \
+       _CODE; \
+       \
+       (_stamp2) = cpu_clock(raw_smp_processor_id()); \
+       (_stamp2) - (_stamp1); \
+   })
+
+#define TIME_THIS(_CODE) \
+   ({ \
+       unsigned long long _stamp1; \
+       unsigned long long _stamp2; \
+       _TIME_THIS(_stamp1, _stamp2, _CODE); \
+   })
+
+#ifdef CONFIG_DEBUG_KERNEL
+
+#define _TIME_STATS(_timing, _stamp1, _stamp2, _CODE)  \
+   ({

[RFC 08/31] mars: add new module lib_queue

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 include/linux/brick/lib_queue.h | 166 
 1 file changed, 166 insertions(+)
 create mode 100644 include/linux/brick/lib_queue.h

diff --git a/include/linux/brick/lib_queue.h b/include/linux/brick/lib_queue.h
new file mode 100644
index 000..f1b1a9e
--- /dev/null
+++ b/include/linux/brick/lib_queue.h
@@ -0,0 +1,166 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef LIB_QUEUE_H
+#define LIB_QUEUE_H
+
+#define QUEUE_ANCHOR(PREFIX, KEYTYPE, HEAPTYPE) \
+   /* parameters */\
+   /* readonly from outside */ \
+   atomic_t q_queued;  \
+   atomic_t q_flying;  \
+   atomic_t q_total;   \
+   /* tunables */  \
+   int q_batchlen; \
+   int q_io_prio;  \
+   bool q_ordering;\
+   /* private */   \
+   wait_queue_head_t *q_event; \
+   spinlock_t q_lock;  \
+   struct list_head q_anchor;  \
+   struct pairing_heap_##HEAPTYPE *heap_high;  \
+   struct pairing_heap_##HEAPTYPE *heap_low;   \
+   long long q_last_insert; /* jiffies */  \
+   KEYTYPE heap_margin;\
+   KEYTYPE last_pos;   \
+   /* this comment is for keeping TRAILING_SEMICOLON happy */
+
+#define QUEUE_FUNCTIONS(PREFIX, ELEM_TYPE, HEAD, KEYFN, KEYCMP, HEAPTYPE)\
+   \
+static inline  \
+void q_##PREFIX##_trigger(struct PREFIX##_queue *q)\
+{  \
+   if (q->q_event) { \
+       wake_up_interruptible(q->q_event); \
+   } \
+} \
+   \
+static inline \
+void q_##PREFIX##_init(struct PREFIX##_queue *q) \
+{ \
+   INIT_LIST_HEAD(&q->q_anchor); \
+   q->heap_low = NULL; \
+   q->heap_high = NULL; \
+   spin_lock_init(&q->q_lock); \
+   atomic_set(&q->q_queued, 0); \
+   atomic_set(&q->q_flying, 0); \
+} \
+   \
+static inline \
+void q_##PREFIX##_insert(struct PREFIX##_queue *q, ELEM_TYPE *elem) \
+{ \
+   unsigned long flags; \
+   \
+   spin_lock_irqsave(&q->q_lock, flags); \
+   \
+   if (q->q_ordering) { \
+       struct pairing_heap_##HEAPTYPE **use = &q->heap_high; \
+       if (KEYCMP(KEYFN(elem), &q->heap_margin) <= 0) {

[RFC 13/31] mars: add new module xio

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/xio_bricks/xio.c | 161 +
 include/linux/xio/xio.h   | 313 ++
 2 files changed, 474 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio.c
 create mode 100644 include/linux/xio/xio.h

diff --git a/drivers/staging/mars/xio_bricks/xio.c 
b/drivers/staging/mars/xio_bricks/xio.c
new file mode 100644
index 000..94aeb60
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio.c
@@ -0,0 +1,161 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+//
+
+/*  infrastructure */
+
+struct banning xio_global_ban = {};
+atomic_t xio_global_io_flying = ATOMIC_INIT(0);
+
+//
+
+/*  object stuff */
+
+const struct generic_object_type aio_type = {
+   .object_type_name = "aio",
+   .default_size = sizeof(struct aio_object),
+   .object_type_nr = OBJ_TYPE_AIO,
+};
+
+//
+
+/*  brick stuff */
+
+/***/
+
+/*  meta descriptions */
+
+const struct meta xio_info_meta[] = {
+   META_INI(current_size,struct xio_info, FIELD_INT),
+   META_INI(tf_align,struct xio_info, FIELD_INT),
+   META_INI(tf_min_size, struct xio_info, FIELD_INT),
+   {}
+};
+
+const struct meta xio_aio_user_meta[] = {
+   META_INI(_object_cb.cb_error, struct aio_object, FIELD_INT),
+   META_INI(io_pos,   struct aio_object, FIELD_INT),
+   META_INI(io_len,   struct aio_object, FIELD_INT),
+   META_INI(io_may_write,struct aio_object, FIELD_INT),
+   META_INI(io_prio,  struct aio_object, FIELD_INT),
+   META_INI(io_cs_mode,   struct aio_object, FIELD_INT),
+   META_INI(io_timeout,   struct aio_object, FIELD_INT),
+   META_INI(io_total_size,   struct aio_object, FIELD_INT),
+   META_INI(io_checksum,  struct aio_object, FIELD_RAW),
+   META_INI(io_flags, struct aio_object, FIELD_INT),
+   META_INI(io_rw,struct aio_object, FIELD_INT),
+   META_INI(io_id,struct aio_object, FIELD_INT),
+   META_INI(io_skip_sync,struct aio_object, FIELD_INT),
+   {}
+};
+
+const struct meta xio_timespec_meta[] = {
+   META_INI_TRANSFER(tv_sec,  struct timespec, FIELD_UINT, 8),
+   META_INI_TRANSFER(tv_nsec, struct timespec, FIELD_UINT, 4),
+   {}
+};
+
+//
+
+/*  crypto stuff */
+
+#include 
+#include 
+
+static struct crypto_hash *xio_tfm;
+static struct semaphore tfm_sem;
+int xio_digest_size;
+
+void xio_digest(unsigned char *digest, void *data, int len)
+{
+   struct hash_desc desc = {
+   .tfm = xio_tfm,
+   .flags = 0,
+   };
+   struct scatterlist sg;
+
+   memset(digest, 0, xio_digest_size);
+
+   /*  TODO: use per-thread instance, omit locking */
+   down(&tfm_sem);
+
+   crypto_hash_init(&desc);
+   sg_init_table(&sg, 1);
+   sg_set_buf(&sg, data, len);
+   crypto_hash_update(&desc, &sg, sg.length);
+   crypto_hash_final(&desc, digest);
+   up(&tfm_sem);
+}
+
+void aio_checksum(struct aio_object *aio)
+{
+   unsigned char checksum[xio_digest_size];
+   int len;
+
+   if (aio->io_cs_mode <= 0 || !aio->io_data)
+   goto out_return;
+   xio_digest(checksum, aio->io_data, aio->io_len);
+
+   len = sizeof(aio->io_checksum);
+   if (len > xio_digest_size)
+   len = xio_digest_size;
+   memcpy(&aio->io_checksum, checksum, len);
+out_return:;
+}
+
+/***/
+
+/*  init stuff */
+
+int __init init_xio(void)
+{
+   XIO_INF("init_xio()\n");
+
+   sema_init(&tfm_sem, 1);
+
+   xio_tfm = crypto_alloc_hash("md5", 0, CRYPTO_ALG_ASYNC);
+   if (!xio_tfm) {
+   XIO_ERR("cannot alloc crypto hash\n");
+   return -ENOMEM;
+   }
+   if (IS_ERR(xio_tfm)) {
+   XIO_ERR("alloc crypto hash failed, status

[RFC 19/31] mars: add new module xio_client

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/xio_bricks/xio_client.c | 1055 ++
 include/linux/xio/xio_client.h   |  105 +++
 2 files changed, 1160 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_client.c
 create mode 100644 include/linux/xio/xio_client.h

diff --git a/drivers/staging/mars/xio_bricks/xio_client.c b/drivers/staging/mars/xio_bricks/xio_client.c
new file mode 100644
index 000..6fdc261
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_client.c
@@ -0,0 +1,1055 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+/************************ own type definitions ***********************/
+
+#include 
+
+#define CLIENT_HASH_MAX (PAGE_SIZE / sizeof(struct list_head))
+
+int xio_client_abort = 10;
+
+int max_client_channels = 1;
+
+int max_client_bulk = 16;
+
+/************************ own helper functions ***********************/
+
+static int thread_count;
+
+static
+void _do_resubmit(struct client_channel *ch)
+{
+   struct client_output *output = ch->output;
+   unsigned long flags;
+
+   spin_lock_irqsave(&output->lock, flags);
+   if (!list_empty(&ch->wait_list)) {
+   struct list_head *first = ch->wait_list.next;
+   struct list_head *last = ch->wait_list.prev;
+   struct list_head *old_start = output->aio_list.next;
+
+#define list_connect __list_del /*  the original routine has a misleading name: in reality it is more general */
+   list_connect(&output->aio_list, first);
+   list_connect(last, old_start);
+   INIT_LIST_HEAD(&ch->wait_list);
+   }
+   spin_unlock_irqrestore(&output->lock, flags);
+}
+
+static
+void _kill_thread(struct client_threadinfo *ti, const char *name)
+{
+   struct task_struct *thread = ti->thread;
+
+   if (thread) {
+   XIO_DBG("stopping %s thread\n", name);
+   ti->thread = NULL;
+   brick_thread_stop(thread);
+   }
+}
+
+static
+void _kill_channel(struct client_channel *ch)
+{
+   XIO_DBG("channel = %p\n", ch);
+   if (xio_socket_is_alive(&ch->socket)) {
+   XIO_DBG("shutdown socket\n");
+   xio_shutdown_socket(&ch->socket);
+   }
+   _kill_thread(&ch->receiver, "receiver");
+   if (ch->is_open) {
+   XIO_DBG("close socket\n");
+   xio_put_socket(&ch->socket);
+   }
+   ch->recv_error = 0;
+   ch->is_used = false;
+   ch->is_open = false;
+   ch->is_connected = false;
+   /* Re-Submit any waiting requests
+*/
+   _do_resubmit(ch);
+}
+
+static inline
+void _kill_all_channels(struct client_bundle *bundle)
+{
+   int i;
+
+   /*  first pass: shutdown in parallel without waiting */
+   for (i = 0; i < MAX_CLIENT_CHANNELS; i++) {
+   struct client_channel *ch = &bundle->channel[i];
+
+   if (xio_socket_is_alive(&ch->socket)) {
+   XIO_DBG("shutdown socket %d\n", i);
+   xio_shutdown_socket(&ch->socket);
+   }
+   }
+   /*  separate pass (may wait) */
+   for (i = 0; i < MAX_CLIENT_CHANNELS; i++)
+   _kill_channel(&bundle->channel[i]);
+}
+
+static int receiver_thread(void *data);
+
+static
+int _setup_channel(struct client_bundle *bundle, int ch_nr)
+{
+   struct client_channel *ch = &bundle->channel[ch_nr];
+   struct sockaddr_storage src_sockaddr;
+   struct sockaddr_storage dst_sockaddr;
+   int status;
+
+   ch->ch_nr = ch_nr;
+   if (unlikely(ch->receiver.thread)) {
+   XIO_WRN("receiver thread %d unexpectedly not dead\n", ch_nr);
+   _kill_thread(&ch->receiver, "receiver");
+   }
+
+   status = xio_create_sockaddr(&src_sockaddr, my_id());
+   if (unlikely(status < 0)) {
+   XIO_DBG("no src sockaddr, status = %d\n", status);
+   goto done;
+   }
+
+   status = xio_create_sockaddr(&dst_sockaddr, bundle->host);
+   if (unlikely(status < 0)) {
+   XIO_DBG("no dst sockaddr, status = %d\n", status);
+   goto
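
Editorial note on _do_resubmit() in the patch above: it reuses
__list_del(prev, next) under the name list_connect to move every entry of
the channel's wait list to the front of the output's list, then
reinitializes the wait list. The user-space sketch below illustrates only
that splice. struct node, node_connect() and splice_front() are
hypothetical stand-ins invented for this note, not names from the patch,
and locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's circular, doubly linked list head. */
struct node {
	struct node *next;
	struct node *prev;
};

static void node_init(struct node *head)
{
	head->next = head;
	head->prev = head;
}

/* Same effect as __list_del(prev, next): make the two nodes adjacent. */
static void node_connect(struct node *prev, struct node *next)
{
	next->prev = prev;
	prev->next = next;
}

static void node_add_tail(struct node *elem, struct node *head)
{
	struct node *last = head->prev;

	node_connect(last, elem);
	node_connect(elem, head);
}

/* Move all entries of "wait" to the front of "active"; "wait" ends up empty. */
static void splice_front(struct node *wait, struct node *active)
{
	struct node *first, *last, *old_start;

	assert(wait != active);
	if (wait->next == wait)
		return; /* nothing is waiting */

	first = wait->next;
	last = wait->prev;
	old_start = active->next;

	node_connect(active, first);   /* active head now points at the waiters */
	node_connect(last, old_start); /* old contents follow the last waiter */
	node_init(wait);
}
```

When the active list is empty, old_start is the active head itself, so the
second node_connect() closes the ring correctly. In-tree code would
normally get the same result from list_splice_init().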

[RFC 27/31] mars: add new module mars_proc

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/mars_light/mars_proc.c | 369 
 include/linux/mars_light/mars_proc.h|  34 +++
 2 files changed, 403 insertions(+)
 create mode 100644 drivers/staging/mars/mars_light/mars_proc.c
 create mode 100644 include/linux/mars_light/mars_proc.h

diff --git a/drivers/staging/mars/mars_light/mars_proc.c b/drivers/staging/mars/mars_light/mars_proc.c
new file mode 100644
index 000..2a96614
--- /dev/null
+++ b/drivers/staging/mars/mars_light/mars_proc.c
@@ -0,0 +1,369 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+xio_info_fn xio_info;
+
+static
+int trigger_sysctl_handler(
+   struct ctl_table *table,
+   int write,
+   void __user *buffer,
+   size_t *length,
+   loff_t *ppos)
+{
+   ssize_t res = 0;
+   size_t len = *length;
+
+   XIO_DBG("write = %d len = %ld pos = %lld\n", write, len, *ppos);
+
+   if (!len || *ppos > 0)
+   goto done;
+
+   if (write) {
+   char tmp[8] = {};
+
+   res = len; /*  fake consumption of all data */
+
+   if (len > 7)
+   len = 7;
+   if (!copy_from_user(tmp, buffer, len)) {
+   int code = 0;
+   int status = kstrtoint(tmp, 10, &code);
+
+   /* the return value from kstrtoint() does not matter */
+   (void)status;
+   if (code > 0)
+   local_trigger();
+   if (code > 1)
+   remote_trigger();
+   }
+   } else {
+   char *answer = "MARS module not operational\n";
+   char *tmp = NULL;
+   int mylen;
+
+   if (xio_info) {
+   answer = "internal error while determining xio_info\n";
+   tmp = xio_info();
+   if (tmp)
+   answer = tmp;
+   }
+
+   mylen = strlen(answer);
+   if (len > mylen)
+   len = mylen;
+   res = len;
+   if (copy_to_user(buffer, answer, len)) {
+   XIO_ERR("write %ld bytes at %p failed\n", len, buffer);
+   res = -EFAULT;
+   }
+   brick_string_free(tmp);
+   }
+
+done:
+   XIO_DBG("res = %ld\n", res);
+   *length = res;
+   if (res >= 0) {
+   *ppos += res;
+   return 0;
+   }
+   return res;
+}
+
+static
+int lamport_sysctl_handler(
+   struct ctl_table *table,
+   int write,
+   void __user *buffer,
+   size_t *length,
+   loff_t *ppos)
+{
+   ssize_t res = 0;
+   size_t len = *length;
+
+   XIO_DBG("write = %d len = %ld pos = %lld\n", write, len, *ppos);
+
+   if (!len || *ppos > 0)
+   goto done;
+
+   if (write) {
+   return -EINVAL;
+   } else {
+   int my_len = 128;
+   char *tmp = brick_string_alloc(my_len);
+   struct timespec know = CURRENT_TIME;
+   struct timespec lnow;
+
+   get_lamport(&lnow);
+
+   res = scnprintf(tmp, my_len,
+  "CURRENT_TIME=%ld.%09ld\nlamport_now=%ld.%09ld\n",
+  know.tv_sec, know.tv_nsec,
+  lnow.tv_sec, lnow.tv_nsec
+   );
+
+   if (copy_to_user(buffer, tmp, res)) {
+   XIO_ERR("write %ld bytes at %p failed\n", res, buffer);
+   res = -EFAULT;
+   }
+   brick_string_free(tmp);
+   }
+
+done:
+   XIO_DBG("res = %ld\n", res);
+   *length = res;
+   if (res >= 0) {
+   *ppos += res;
+   return 0;
+   }
+   return res;
+}
+
+#ifdef CTL_UNNUMBERED
+#define _CTL_NAME  .ctl_name = CTL_UNNUMBERED,
+#define _CTL_STRATEGY(handler) .strategy = &handler,
+#else
+#defin
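
Editorial note on the write path of trigger_sysctl_handler() above: at
most 7 bytes of the written text are copied into an 8-byte buffer and
parsed as a decimal integer. A value above 0 fires the local trigger and
a value above 1 also fires the remote trigger. The user-space sketch
below mirrors that decision. parse_trigger() and the TRIGGER_* flags are
hypothetical names for this note, and strtol() stands in for kstrtoint(),
which is stricter about trailing characters.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define TRIGGER_LOCAL  1
#define TRIGGER_REMOTE 2

/* Returns a bitmask of the triggers that the written text would fire. */
static int parse_trigger(const char *buf, size_t len)
{
	char tmp[8] = {0};
	char *end;
	long code;
	int mask = 0;

	assert(buf != NULL);
	if (len > 7)
		len = 7; /* keep room for the terminating NUL */
	memcpy(tmp, buf, len);

	errno = 0;
	code = strtol(tmp, &end, 10);
	if (errno != 0 || end == tmp)
		code = 0; /* unparsable input fires nothing */

	if (code > 0)
		mask |= TRIGGER_LOCAL;
	if (code > 1)
		mask |= TRIGGER_REMOTE;
	return mask;
}
```

As in the patch, a parse failure is not reported to the writer; the code
simply stays 0 and no trigger fires.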

[RFC 29/31] mars: add new module Makefile

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/Makefile | 61 +++
 1 file changed, 61 insertions(+)
 create mode 100644 drivers/staging/mars/Makefile

diff --git a/drivers/staging/mars/Makefile b/drivers/staging/mars/Makefile
new file mode 100644
index 000..13d68cc
--- /dev/null
+++ b/drivers/staging/mars/Makefile
@@ -0,0 +1,61 @@
+#
+# Makefile for MARS
+#
+
+# remove_this
+ifndef CONFIG_MARS
+# mars_config.h is generated by a simple Kconfig parser (gen_config.pl)
+# at build time.
+# It does not respect any Kconfig dependencies.
+# Therefore, it is unsafe. Use at your own risk!
+# It is ONLY used for out-of-tree builds.
+#
+CONFIG_MARS_BIGMODULE := m
+CONFIG_MARS_NET_COMPAT := y
+obj-$(CONFIG_MARS_BIGMODULE)   += mars.o
+extra-y+= mars_config.h
+GEN_CONFIG_SCRIPT := $(src)/../scripts/gen_config.pl
+$(obj)/mars_config.h: $(obj)/buildtag.h
+$(obj)/mars_config.h: $(src)/Kconfig $(GEN_CONFIG_SCRIPT)
+   $(Q)$(kecho) "MARS: using compiler $($(CC) --version | head -1)"
+   $(CC) -v
+   $(Q)$(kecho) "MARS: Generating $@"
+   $(Q)set -e; \
+   if [ ! -x $(GEN_CONFIG_SCRIPT) ]; then \
+   $(kecho) "MARS: cannot execute script $(GEN_CONFIG_SCRIPT)"; \
+   /bin/false; \
+   fi; \
+   cat $< | $(GEN_CONFIG_SCRIPT) > $@;
+   cat $@;
+endif
+# end_remove_this
+
+obj-$(CONFIG_MARS) += mars.o
+
+KBUILD_CFLAGS += -fdelete-null-pointer-checks
+
+mars-objs :=   \
+   lamport.o   \
+   brick_say.o \
+   brick_mem.o \
+   brick.o \
+   xio_bricks/xio.o\
+   xio_bricks/lib_log.o\
+   lib/lib_rank.o  \
+   lib/lib_limiter.o   \
+   lib/lib_timing.o\
+   xio_bricks/lib_mapfree.o\
+   xio_bricks/xio_net.o\
+   mars_light/light_server_strategy.o  \
+   xio_bricks/xio_server.o \
+   xio_bricks/xio_client.o \
+   xio_bricks/xio_sio.o\
+   xio_bricks/xio_bio.o\
+   xio_bricks/xio_if.o \
+   xio_bricks/xio_copy.o   \
+   xio_bricks/xio_trans_logger.o   \
+   mars_light/light_strategy.o \
+   mars_light/light_net.o  \
+   mars_light/mars_proc.o  \
+   mars_light/mars_light.o
+
-- 
2.6.4



[RFC 06/31] mars: add new module brick

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/brick.c | 728 +++
 include/linux/brick/brick.h  | 642 ++
 2 files changed, 1370 insertions(+)
 create mode 100644 drivers/staging/mars/brick.c
 create mode 100644 include/linux/brick/brick.h

diff --git a/drivers/staging/mars/brick.c b/drivers/staging/mars/brick.c
new file mode 100644
index 000..9c3d5b9
--- /dev/null
+++ b/drivers/staging/mars/brick.c
@@ -0,0 +1,728 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#define _STRATEGY
+
+#include 
+#include 
+
+//
+
+/*  init / exit functions */
+
+void _generic_output_init(struct generic_brick *brick,
+   const struct generic_output_type *type,
+   struct generic_output *output)
+{
+   output->brick = brick;
+   output->type = type;
+   output->ops = type->master_ops;
+   output->nr_connected = 0;
+   INIT_LIST_HEAD(&output->output_head);
+}
+
+void _generic_output_exit(struct generic_output *output)
+{
+   list_del_init(&output->output_head);
+   output->brick = NULL;
+   output->type = NULL;
+   output->ops = NULL;
+   output->nr_connected = 0;
+}
+
+int generic_brick_init(const struct generic_brick_type *type, struct generic_brick *brick)
+{
+   brick->aspect_context.brick_index = get_brick_nr();
+   brick->type = type;
+   brick->ops = type->master_ops;
+   brick->nr_inputs = 0;
+   brick->nr_outputs = 0;
+   brick->power.off_led = true;
+   init_waitqueue_head(&brick->power.event);
+   INIT_LIST_HEAD(&brick->tmp_head);
+   return 0;
+}
+
+void generic_brick_exit(struct generic_brick *brick)
+{
+   list_del_init(&brick->tmp_head);
+   brick->type = NULL;
+   brick->ops = NULL;
+   brick->nr_inputs = 0;
+   brick->nr_outputs = 0;
+   put_brick_nr(brick->aspect_context.brick_index);
+}
+
+int generic_input_init(struct generic_brick *brick,
+   int index,
+   const struct generic_input_type *type,
+   struct generic_input *input)
+{
+   if (index < 0 || index >= brick->type->max_inputs)
+   return -EINVAL;
+   if (brick->inputs[index])
+   return -EEXIST;
+   input->brick = brick;
+   input->type = type;
+   input->connect = NULL;
+   INIT_LIST_HEAD(&input->input_head);
+   brick->inputs[index] = input;
+   brick->nr_inputs++;
+   return 0;
+}
+
+void generic_input_exit(struct generic_input *input)
+{
+   list_del_init(&input->input_head);
+   input->brick = NULL;
+   input->type = NULL;
+   input->connect = NULL;
+}
+
+int generic_output_init(struct generic_brick *brick,
+   int index,
+   const struct generic_output_type *type,
+   struct generic_output *output)
+{
+   if (index < 0 || index >= brick->type->max_outputs)
+   return -ENOMEM;
+   if (brick->outputs[index])
+   return -EEXIST;
+   _generic_output_init(brick, type, output);
+   brick->outputs[index] = output;
+   brick->nr_outputs++;
+   return 0;
+}
+
+int generic_size(const struct generic_brick_type *brick_type)
+{
+   int size = brick_type->brick_size;
+   int i;
+
+   size += brick_type->max_inputs * sizeof(void *);
+   for (i = 0; i < brick_type->max_inputs; i++)
+   size += brick_type->default_input_types[i]->input_size;
+   size += brick_type->max_outputs * sizeof(void *);
+   for (i = 0; i < brick_type->max_outputs; i++)
+   size += brick_type->default_output_types[i]->output_size;
+   return size;
+}
+
+int generic_connect(struct generic_input *input, struct generic_output *output)
+{
+   BRICK_DBG("generic_connect(input=%p, output=%p)\n", input, output);
+   if (unlikely(!input || !output))
+   return -EINVAL;
+   if (unlikely(input->connect))
+   return -EEXIST;
+   if (unlikely(!list_empty(&input->input_head)))
+   return -EINVAL;
+   /*  helps only against the most common errors */
+   if (unlikely(input->brick == output->bri

[RFC 26/31] mars: add new module light_server_strategy

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 .../mars/mars_light/light_server_strategy.c| 403 +
 1 file changed, 403 insertions(+)
 create mode 100644 drivers/staging/mars/mars_light/light_server_strategy.c

diff --git a/drivers/staging/mars/mars_light/light_server_strategy.c b/drivers/staging/mars/mars_light/light_server_strategy.c
new file mode 100644
index 000..6bb5cd7
--- /dev/null
+++ b/drivers/staging/mars/mars_light/light_server_strategy.c
@@ -0,0 +1,403 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/* MARS Light specific parts of xio_server
+ */
+
+#include 
+#include 
+#include 
+
+#define _STRATEGY
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+
+static
+int dummy_worker(struct mars_global *global, struct mars_dent *dent, bool prepare, bool direction)
+{
+   return 0;
+}
+
+static
+int _set_server_sio_params(struct xio_brick *_brick, void *private)
+{
+   struct sio_brick *sio_brick = (void *)_brick;
+
+   if (_brick->type != (void *)_sio_brick_type) {
+   XIO_ERR("bad brick type\n");
+   return -EINVAL;
+   }
+   sio_brick->o_direct = false;
+   sio_brick->o_fdsync = false;
+   XIO_INF("name = '%s' path = '%s'\n", _brick->brick_name, _brick->brick_path);
+   return 1;
+}
+
+static
+int _set_server_bio_params(struct xio_brick *_brick, void *private)
+{
+   struct bio_brick *bio_brick;
+
+   if (_brick->type == (void *)_sio_brick_type)
+   return _set_server_sio_params(_brick, private);
+   if (_brick->type != (void *)_bio_brick_type) {
+   XIO_ERR("bad brick type\n");
+   return -EINVAL;
+   }
+   bio_brick = (void *)_brick;
+   bio_brick->ra_pages = 0;
+   bio_brick->do_noidle = true;
+   bio_brick->do_sync = true;
+   bio_brick->do_unplug = true;
+   XIO_INF("name = '%s' path = '%s'\n", _brick->brick_name, _brick->brick_path);
+   return 1;
+}
+
+int handler_thread(void *data)
+{
+   struct mars_global handler_global = {
+   .dent_anchor = LIST_HEAD_INIT(handler_global.dent_anchor),
+   .brick_anchor = LIST_HEAD_INIT(handler_global.brick_anchor),
+   .global_power = {
+   .button = true,
+   },
+   .main_event = __WAIT_QUEUE_HEAD_INITIALIZER(handler_global.main_event),
+   };
+   struct task_struct *thread = NULL;
+   struct server_brick *brick = data;
+   struct xio_socket *sock = &brick->handler_socket;
+   bool ok = xio_get_socket(sock);
+   unsigned long statist_jiffies = jiffies;
+   int debug_nr;
+   int status = -EINVAL;
+
+   init_rwsem(&handler_global.dent_mutex);
+   init_rwsem(&handler_global.brick_mutex);
+
+   XIO_DBG("#%d --- handler_thread starting on socket %p\n", sock->s_debug_nr, sock);
+   if (!ok)
+   goto done;
+
+   thread = brick_thread_create(cb_thread, brick, "xio_cb%d", brick->version);
+   if (unlikely(!thread)) {
+   XIO_ERR("cannot create cb thread\n");
+   status = -ENOENT;
+   goto done;
+   }
+   brick->cb_thread = thread;
+
+   brick->handler_running = true;
+   wake_up_interruptible(&brick->startup_event);
+
+   while (!list_empty(&handler_global.brick_anchor) ||
+  xio_socket_is_alive(sock)) {
+   struct xio_cmd cmd = {};
+
+   handler_global.global_version++;
+
+   if (!list_empty(&handler_global.brick_anchor)) {
+   if (server_show_statist && !time_is_before_jiffies(statist_jiffies + 10 * HZ)) {
+   show_statistics(&handler_global, "handler");
+   statist_jiffies = jiffies;
+   }
+   if (!xio_socket_is_alive(sock) &&
+   atomic_read(&brick->in_flight) <= 0 &&
+   brick->conn_brick) {
+   if (generic_disconnect((void *)brick->inputs[0]) >= 0)
+   brick->conn_brick = NULL;
+   }
+
+

[RFC 15/31] mars: add new module lib_mapfree

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/xio_bricks/lib_mapfree.c | 380 ++
 include/linux/xio/lib_mapfree.h   |  84 ++
 2 files changed, 464 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/lib_mapfree.c
 create mode 100644 include/linux/xio/lib_mapfree.h

diff --git a/drivers/staging/mars/xio_bricks/lib_mapfree.c b/drivers/staging/mars/xio_bricks/lib_mapfree.c
new file mode 100644
index 000..6b464d7
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/lib_mapfree.c
@@ -0,0 +1,380 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*  time to wait between background mapfree operations */
+int mapfree_period_sec = 10;
+
+/*  some grace space where no regular cleanup should occur */
+int mapfree_grace_keep_mb = 16;
+
+static
+DECLARE_RWSEM(mapfree_mutex);
+
+static
+LIST_HEAD(mapfree_list);
+
+void mapfree_pages(struct mapfree_info *mf, int grace_keep)
+{
+   struct address_space *mapping;
+   pgoff_t start;
+   pgoff_t end;
+
+   if (unlikely(!mf))
+   goto done;
+   if (unlikely(!mf->mf_filp))
+   goto done;
+
+   mapping = mf->mf_filp->f_mapping;
+   if (unlikely(!mapping))
+   goto done;
+
+   if (grace_keep < 0) { /*  force full flush */
+   start = 0;
+   end = -1;
+   } else {
+   unsigned long flags;
+   loff_t tmp;
+   loff_t min;
+
+   spin_lock_irqsave(&mf->mf_lock, flags);
+
+   min = tmp = mf->mf_min[0];
+   if (likely(mf->mf_min[1] < min))
+   min = mf->mf_min[1];
+   if (tmp) {
+   mf->mf_min[1] = tmp;
+   mf->mf_min[0] = 0;
+   }
+
+   spin_unlock_irqrestore(&mf->mf_lock, flags);
+
+   min -= (loff_t)grace_keep * (1024 * 1024); /*  megabytes */
+   end = 0;
+
+   if (min > 0 || mf->mf_last) {
+   start = mf->mf_last / PAGE_SIZE;
+   /*  add some grace overlapping */
+   if (likely(start > 0))
+   start--;
+   mf->mf_last = min;
+   end = min / PAGE_SIZE;
+   } else  { /*  there was no progress for at least 2 rounds */
+   start = 0;
+   if (!grace_keep) /*  also flush thoroughly */
+   end = -1;
+   }
+
+   XIO_DBG("file = '%s' start = %lu end = %lu\n", mf->mf_name, start, end);
+   }
+
+   if (end > start || end == -1)
+   invalidate_mapping_pages(mapping, start, end);
+
+done:;
+}
+
+static
+void _mapfree_put(struct mapfree_info *mf)
+{
+   if (atomic_dec_and_test(&mf->mf_count)) {
+   XIO_DBG("closing file '%s' filp = %p\n", mf->mf_name, mf->mf_filp);
+   list_del_init(&mf->mf_head);
+   CHECK_HEAD_EMPTY(&mf->mf_dirty_anchor);
+   if (likely(mf->mf_filp)) {
+   mapfree_pages(mf, -1);
+   filp_close(mf->mf_filp, NULL);
+   }
+   brick_string_free(mf->mf_name);
+   brick_mem_free(mf);
+   }
+}
+
+void mapfree_put(struct mapfree_info *mf)
+{
+   if (likely(mf)) {
+   down_write(&mapfree_mutex);
+   _mapfree_put(mf);
+   up_write(&mapfree_mutex);
+   }
+}
+
+struct mapfree_info *mapfree_get(const char *name, int flags)
+{
+   struct mapfree_info *mf = NULL;
+   struct list_head *tmp;
+
+   if (!(flags & O_DIRECT)) {
+   down_read(&mapfree_mutex);
+   for (tmp = mapfree_list.next; tmp != &mapfree_list; tmp = tmp->next) {
+   struct mapfree_info *_mf = container_of(tmp, struct mapfree_info, mf_head);
+
+   if (_mf->mf_flags == flags && !strcmp(_mf->mf_name, name)) {
+   mf = _mf;
+   atomic_inc(&mf->mf_count);
+   break;
+  

[RFC 17/31] mars: add new module xio_bio

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/xio_bricks/xio_bio.c | 845 ++
 include/linux/xio/xio_bio.h   |  85 +++
 2 files changed, 930 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_bio.c
 create mode 100644 include/linux/xio/xio_bio.h

diff --git a/drivers/staging/mars/xio_bricks/xio_bio.c b/drivers/staging/mars/xio_bricks/xio_bio.c
new file mode 100644
index 000..ef18325
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_bio.c
@@ -0,0 +1,845 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/*  Bio brick (interface to blkdev IO via kernel bios) */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+static struct timing_stats timings[2];
+
+struct threshold bio_submit_threshold = {
+   .thr_ban = _global_ban,
+   .thr_parent = _io_threshold,
+   .thr_limit = BIO_SUBMIT_MAX_LATENCY,
+   .thr_factor = 100,
+   .thr_plus = 0,
+};
+
+struct threshold bio_io_threshold[2] = {
+   [0] = {
+   .thr_ban = _global_ban,
+   .thr_parent = _io_threshold,
+   .thr_limit = BIO_IO_R_MAX_LATENCY,
+   .thr_factor = 10,
+   .thr_plus = 1,
+   },
+   [1] = {
+   .thr_ban = _global_ban,
+   .thr_parent = _io_threshold,
+   .thr_limit = BIO_IO_W_MAX_LATENCY,
+   .thr_factor = 10,
+   .thr_plus = 1,
+   },
+};
+
+/************************ own type definitions ***********************/
+
+/************************ own helper functions ***********************/
+
+/* This is called from the kernel bio layer.
+ */
+static
+void bio_callback(struct bio *bio)
+{
+   struct bio_aio_aspect *aio_a = bio->bi_private;
+   struct bio_brick *brick;
+   unsigned long flags;
+
+   CHECK_PTR(aio_a, err);
+   CHECK_PTR(aio_a->output, err);
+   brick = aio_a->output->brick;
+   CHECK_PTR(brick, err);
+
+   aio_a->status_code = bio->bi_error;
+
+   spin_lock_irqsave(&brick->lock, flags);
+   list_del(&aio_a->io_head);
+   list_add_tail(&aio_a->io_head, &brick->completed_list);
+   atomic_inc(&brick->completed_count);
+   spin_unlock_irqrestore(&brick->lock, flags);
+
+   wake_up_interruptible(&brick->response_event);
+   goto out_return;
+err:
+   XIO_FAT("cannot handle bio callback\n");
+out_return:;
+}
+
+/* Map from kernel address/length to struct page (if not already known),
+ * check alignment constraints, create bio from it.
+ * Return the length (may be smaller than requested).
+ */
+static
+int make_bio(struct bio_brick *brick,
+   void *data,
+   int len,
+   loff_t pos,
+   struct bio_aio_aspect *private,
+   struct bio **_bio)
+{
+   unsigned long long sector;
+   int sector_offset;
+   int data_offset;
+   int page_offset;
+   int page_len;
+   int bvec_count;
+   int rest_len = len;
+   int result_len = 0;
+   int status;
+   int i;
+   struct bio *bio = NULL;
+   struct block_device *bdev;
+
+   status = -EINVAL;
+   CHECK_PTR(brick, out);
+   bdev = brick->bdev;
+   CHECK_PTR(bdev, out);
+
+   if (unlikely(rest_len <= 0)) {
+   XIO_ERR("bad bio len %d\n", rest_len);
+   goto out;
+   }
+
+   sector = pos >> 9; /*  TODO: make dynamic */
+   sector_offset = pos & ((1 << 9) - 1);  /*  TODO: make dynamic */
+   data_offset = ((unsigned long)data) & ((1 << 9) - 1);  /*  TODO: make dynamic */
+
+   if (unlikely(sector_offset > 0)) {
+   XIO_ERR("odd sector offset %d\n", sector_offset);
+   goto out;
+   }
+   if (unlikely(sector_offset != data_offset)) {
+   XIO_ERR("bad alignment: sector_offset %d != data_offset %d\n", sector_offset, data_offset);
+   goto out;
+   }
+   if (unlikely(rest_len & ((1 << 9) - 1))) {
+   XIO_ERR("odd length %d\n", rest_len);
+   goto out;
+   }
+
+   page_offset = ((unsigned long)data) & (PAGE_SIZE-1);
+   page_len = rest_len + page_offset;
+   bvec_count = (page_len - 1) / PAGE_S

[RFC 16/31] mars: add new module lib_log

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/xio_bricks/lib_log.c | 505 ++
 include/linux/xio/lib_log.h   | 329 +++
 2 files changed, 834 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/lib_log.c
 create mode 100644 include/linux/xio/lib_log.h

diff --git a/drivers/staging/mars/xio_bricks/lib_log.c b/drivers/staging/mars/xio_bricks/lib_log.c
new file mode 100644
index 000..a8382e5
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/lib_log.c
@@ -0,0 +1,505 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+
+atomic_t global_aio_flying = ATOMIC_INIT(0);
+
+void exit_logst(struct log_status *logst)
+{
+   int count;
+
+   log_flush(logst);
+
+   /*  TODO: replace by event */
+   count = 0;
+   while (atomic_read(&logst->aio_flying) > 0) {
+   if (!count++)
+   XIO_DBG("waiting for IO terminating...");
+   brick_msleep(500);
+   }
+   if (logst->read_aio) {
+   XIO_DBG("putting read_aio\n");
+   GENERIC_INPUT_CALL(logst->input, aio_put, logst->read_aio);
+   logst->read_aio = NULL;
+   }
+   if (logst->log_aio) {
+   XIO_DBG("putting log_aio\n");
+   GENERIC_INPUT_CALL(logst->input, aio_put, logst->log_aio);
+   logst->log_aio = NULL;
+   }
+}
+
+void init_logst(struct log_status *logst, struct xio_input *input, loff_t start_pos, loff_t end_pos)
+{
+   exit_logst(logst);
+
+   memset(logst, 0, sizeof(struct log_status));
+
+   logst->input = input;
+   logst->brick = input->brick;
+   logst->start_pos = start_pos;
+   logst->log_pos = start_pos;
+   logst->end_pos = end_pos;
+   init_waitqueue_head(&logst->event);
+}
+
+#define XIO_LOG_CB_MAX 32
+
+struct log_cb_info {
+   struct aio_object *aio;
+   struct log_status *logst;
+   struct semaphore mutex;
+   atomic_t refcount;
+   int nr_cb;
+   void (*endios[XIO_LOG_CB_MAX])(void *private, int error);
+   void *privates[XIO_LOG_CB_MAX];
+};
+
+static
+void put_log_cb_info(struct log_cb_info *cb_info)
+{
+   if (atomic_dec_and_test(&cb_info->refcount))
+   brick_mem_free(cb_info);
+}
+
+static
+void _do_callbacks(struct log_cb_info *cb_info, int error)
+{
+   int i;
+
+   down(&cb_info->mutex);
+   for (i = 0; i < cb_info->nr_cb; i++) {
+   void (*end_fn)(void *private, int error);
+
+   end_fn = cb_info->endios[i];
+   cb_info->endios[i] = NULL;
+   if (end_fn)
+   end_fn(cb_info->privates[i], error);
+   }
+   up(&cb_info->mutex);
+}
+
+static
+void log_write_endio(struct generic_callback *cb)
+{
+   struct log_cb_info *cb_info = cb->cb_private;
+   struct log_status *logst;
+
+   LAST_CALLBACK(cb);
+   CHECK_PTR(cb_info, err);
+
+   logst = cb_info->logst;
+   CHECK_PTR(logst, done);
+
+   _do_callbacks(cb_info, cb->cb_error);
+
+done:
+   put_log_cb_info(cb_info);
+   atomic_dec(&logst->aio_flying);
+   atomic_dec(&global_aio_flying);
+   if (logst->signal_event)
+   wake_up_interruptible(logst->signal_event);
+
+   goto out_return;
+err:
+   XIO_FAT("internal pointer corruption\n");
+out_return:;
+}
+
+void log_flush(struct log_status *logst)
+{
+   struct aio_object *aio = logst->log_aio;
+   struct log_cb_info *cb_info;
+   int align_size;
+   int gap;
+
+   if (!aio || !logst->count)
+   goto out_return;
+   gap = 0;
+   align_size = (logst->align_size / PAGE_SIZE) * PAGE_SIZE;
+   if (align_size > 0) {
+   /*  round up to next alignment border */
+   int align_offset = logst->offset & (align_size-1);
+
+   if (align_offset > 0) {
+   int restlen = aio->io_len - logst->offset;
+
+   gap = align_size - align_offset;
+   if (unlikely(gap > restlen))
+   gap = restlen;
+   }
+   }
+   if (gap 

[RFC 14/31] mars: add new module xio_net

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/xio_bricks/xio_net.c | 1830 +
 include/linux/xio/xio_net.h   |  171 +++
 2 files changed, 2001 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_net.c
 create mode 100644 include/linux/xio/xio_net.h

diff --git a/drivers/staging/mars/xio_bricks/xio_net.c b/drivers/staging/mars/xio_bricks/xio_net.c
new file mode 100644
index 000..dcc443c
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_net.c
@@ -0,0 +1,1830 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+/**/
+
+/*  provisionary version detection */
+
+#ifndef TCP_MAX_REORDERING
+#define __HAS_IOV_ITER
+#endif
+
+#ifdef sk_net_refcnt
+/* see eeb1bd5c40edb0e2fd925c8535e2fdebdbc5cef2 */
+#define __HAS_STRUCT_NET
+#endif
+
+/**/
+
+#define USE_BUFFERING
+
+#define SEND_PROTO_VERSION 2
+
+enum COMPRESS_TYPES {
+   COMPRESS_NONE = 0,
+   COMPRESS_LZO = 1,
+   /* insert further methods here */
+};
+
+int xio_net_compress_data;
+
+const u16 net_global_flags = 0
+#ifdef __HAVE_LZO
+   | COMPRESS_LZO
+#endif
+   ;
+
+/**/
+
+/* Internal data structures for low-level transfer of C structures
+ * described by struct meta.
+ * Only these low-level fields need to have a fixed size like s64.
+ * The size and bytesex of the higher-level C structures is converted
+ * automatically; therefore classical "int" or "long long" etc is viable.
+ */
+
+#define MAX_FIELD_LEN  (32 + 16)
+
+/* Please keep this at a size of 64 bytes by
+ * reuse of *spare* fields.
+ */
+struct xio_desc_cache {
+   u8    cache_sender_proto;
+   u8    cache_recver_proto;
+   s8    cache_is_bigendian;
+   u8    cache_spare0;
+   s16   cache_items;
+   u16   cache_spare1;
+   u32   cache_spare2;
+   u32   cache_spare3;
+   u64   cache_spare4[4];
+   u64   cache_sender_cookie;
+   u64   cache_recver_cookie;
+};
+
+/* Please keep this also at a size of 64 bytes by
+ * reuse of *spare* fields.
+ */
+struct xio_desc_item {
+   s8    field_type;
+   s8    field_spare0;
+   s16   field_data_size;
+   s16   field_sender_size;
+   s16   field_sender_offset;
+   s16   field_recver_size;
+   s16   field_recver_offset;
+   s32   field_spare;
+   char  field_name[MAX_FIELD_LEN];
+};
+
+/* This must not be mirror symmetric between big and little endian
+ */
+#define XIO_DESC_MAGIC 0x73D0A2EC6148F48Ell
+
+struct xio_desc_header {
+   u64 h_magic;
+   u64 h_cookie;
+   s16 h_meta_len;
+   s16 h_index;
+   u32 h_spare1;
+   u64 h_spare2;
+};
+
+#define MAX_INT_TRANSFER   16
+
+/**/
+
+/* Bytesex conversion / sign extension
+ */
+
+#ifdef __LITTLE_ENDIAN
+static const bool myself_is_bigendian;
+
+#endif
+#ifdef __BIG_ENDIAN
+static const bool myself_is_bigendian = true;
+
+#endif
+
+static inline
+void swap_bytes(void *data, int len)
+{
+   char *a = data;
+   char *b = data + len - 1;
+
+   while (a < b) {
+   char tmp = *a;
+
+   *a = *b;
+   *b = tmp;
+   a++;
+   b--;
+   }
+}
+
+#define SWAP_FIELD(x) swap_bytes(&(x), sizeof(x))
+
+static inline
+void swap_mc(struct xio_desc_cache *mc, int len)
+{
+   struct xio_desc_item *mi;
+
+   SWAP_FIELD(mc->cache_sender_cookie);
+   SWAP_FIELD(mc->cache_recver_cookie);
+   SWAP_FIELD(mc->cache_items);
+
+   len -= sizeof(*mc);
+
+   for (mi = (void *)(mc + 1); len > 0; mi++, len -= sizeof(*mi)) {
+   SWAP_FIELD(mi->field_data_size);
+   SWAP_FIELD(mi->field_sender_size);
+   SWAP_FIELD(mi->field_sender_offset);
+   SWAP_FIELD(mi->field_recver_size);
+   SWAP_FIELD(mi->field_recver_offset);
+   }
+}
+
+static inline
+char get_sign(const void *data, int len, bool is_bigendian, bool is_

[RFC 01/31] mars: add new module lamport

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/lamport.c | 61 ++
 include/linux/brick/lamport.h  | 26 ++
 2 files changed, 87 insertions(+)
 create mode 100644 drivers/staging/mars/lamport.c
 create mode 100644 include/linux/brick/lamport.h

diff --git a/drivers/staging/mars/lamport.c b/drivers/staging/mars/lamport.c
new file mode 100644
index 000..373093f
--- /dev/null
+++ b/drivers/staging/mars/lamport.c
@@ -0,0 +1,61 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+
+/*  TODO: replace with spinlock if possible (first check) */
+struct semaphore lamport_sem = __SEMAPHORE_INITIALIZER(lamport_sem, 1);
+struct timespec lamport_now = {};
+
+void get_lamport(struct timespec *now)
+{
+   int diff;
+
+   down(&lamport_sem);
+
+   *now = CURRENT_TIME;
+   diff = timespec_compare(now, &lamport_now);
+   if (diff >= 0) {
+   timespec_add_ns(now, 1);
+   memcpy(&lamport_now, now, sizeof(lamport_now));
+   timespec_add_ns(&lamport_now, 1);
+   } else {
+   timespec_add_ns(&lamport_now, 1);
+   memcpy(now, &lamport_now, sizeof(*now));
+   }
+
+   up(&lamport_sem);
+}
+
+void set_lamport(struct timespec *old)
+{
+   int diff;
+
+   down(&lamport_sem);
+
+   diff = timespec_compare(old, &lamport_now);
+   if (diff >= 0) {
+   memcpy(&lamport_now, old, sizeof(lamport_now));
+   timespec_add_ns(&lamport_now, 1);
+   }
+
+   up(&lamport_sem);
+}
diff --git a/include/linux/brick/lamport.h b/include/linux/brick/lamport.h
new file mode 100644
index 000..9aac0ce
--- /dev/null
+++ b/include/linux/brick/lamport.h
@@ -0,0 +1,26 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef LAMPORT_H
+#define LAMPORT_H
+
+#include 
+
+extern void get_lamport(struct timespec *now);
+extern void set_lamport(struct timespec *old);
+
+#endif
-- 
2.6.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
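
For readers skimming the archive, the clock logic of get_lamport() and
set_lamport() in the patch above can be sketched in user space roughly as
follows. This is only an illustration and not part of the patch: the names
sketch_lamport, sketch_get_lamport() and sketch_set_lamport() are made up,
time is reduced to a plain nanosecond counter instead of struct timespec,
and the lamport_sem serialization is left out, so the sketch assumes a
single thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the global lamport_now, in nanoseconds. */
struct sketch_lamport {
	uint64_t now_ns;
};

/*
 * Mirror of get_lamport(): hand out a stamp that is newer than the
 * wall clock reading and strictly newer than every earlier stamp.
 */
uint64_t sketch_get_lamport(struct sketch_lamport *clk, uint64_t wall_ns)
{
	uint64_t stamp = wall_ns;

	assert(clk != NULL);
	if (stamp >= clk->now_ns) {
		/* Wall clock is ahead: use it, move the Lamport clock past it. */
		stamp += 1;
		clk->now_ns = stamp + 1;
	} else {
		/* Wall clock is behind (e.g. stepped back): keep counting up. */
		clk->now_ns += 1;
		stamp = clk->now_ns;
	}
	return stamp;
}

/* Mirror of set_lamport(): advance past a stamp received from a peer. */
void sketch_set_lamport(struct sketch_lamport *clk, uint64_t remote_ns)
{
	assert(clk != NULL);
	if (remote_ns >= clk->now_ns)
		clk->now_ns = remote_ns + 1;
}
```

Feeding the sketch a wall clock reading that goes backwards (100, then 50)
still yields increasing stamps, which appears to be the purpose of the
else branch in get_lamport().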


[RFC 18/31] mars: add new module xio_sio

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/xio_bricks/xio_sio.c | 571 ++
 include/linux/xio/xio_sio.h   |  68 
 2 files changed, 639 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_sio.c
 create mode 100644 include/linux/xio/xio_sio.h

diff --git a/drivers/staging/mars/xio_bricks/xio_sio.c b/drivers/staging/mars/xio_bricks/xio_sio.c
new file mode 100644
index 000..5822847
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_sio.c
@@ -0,0 +1,571 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+/* own type definitions */
+
+#include 
+
+/* own brick * input * output operations */
+
+static int sio_io_get(struct sio_output *output, struct aio_object *aio)
+{
+   struct file *file;
+
+   if (unlikely(!output->brick->power.on_led))
+   return -EBADFD;
+
+   if (aio->obj_initialized) {
+   obj_get(aio);
+   return aio->io_len;
+   }
+
+   file = output->mf->mf_filp;
+   if (file) {
+   loff_t total_size = i_size_read(file->f_mapping->host);
+
+   aio->io_total_size = total_size;
+   /* Only check reads.
+* Writes behind EOF are always allowed (sparse files)
+*/
+   if (!aio->io_may_write) {
+   loff_t len = total_size - aio->io_pos;
+
+   if (unlikely(len <= 0)) {
+   /* Special case: allow reads starting _exactly_ at EOF when a timeout is specified.
+*/
+   if (len < 0 || aio->io_timeout <= 0) {
+   XIO_DBG("ENODATA %lld\n", len);
+   return -ENODATA;
+   }
+   }
+   /*  Shorten below EOF, but allow special case */
+   if (aio->io_len > len && len > 0)
+   aio->io_len = len;
+   }
+   }
+
+   /* Buffered IO.
+*/
+   if (!aio->io_data) {
+   struct sio_aio_aspect *aio_a = sio_aio_get_aspect(output->brick, aio);
+
+   if (unlikely(!aio_a))
+   return -EILSEQ;
+   if (unlikely(aio->io_len <= 0)) {
+   XIO_ERR("bad io_len = %d\n", aio->io_len);
+   return -ENOMEM;
+   }
+   aio->io_data = brick_block_alloc(aio->io_pos, (aio_a->alloc_len = aio->io_len));
+   aio_a->do_dealloc = true;
+   /* atomic_inc(>total_alloc_count); */
+   /* atomic_inc(>alloc_count); */
+   }
+
+   obj_get_first(aio);
+   return aio->io_len;
+}
+
+static void sio_io_put(struct sio_output *output, struct aio_object *aio)
+{
+   struct file *file;
+   struct sio_aio_aspect *aio_a;
+
+   if (!obj_put(aio))
+   goto out_return;
+   file = output->mf->mf_filp;
+   aio->io_total_size = i_size_read(file->f_mapping->host);
+
+   aio_a = sio_aio_get_aspect(output->brick, aio);
+   if (aio_a && aio_a->do_dealloc) {
+   brick_block_free(aio->io_data, aio_a->alloc_len);
+   /* atomic_dec(>alloc_count); */
+   }
+
+   obj_free(aio);
+out_return:;
+}
+
+static
+int write_aops(struct sio_output *output, struct aio_object *aio)
+{
+   struct file *file = output->mf->mf_filp;
+   loff_t pos = aio->io_pos;
+   void *data = aio->io_data;
+   int  len = aio->io_len;
+   int ret = 0;
+
+   mm_segment_t oldfs;
+
+   oldfs = get_fs();
+   set_fs(get_ds());
+   ret = vfs_write(file, data, len, &pos);
+   set_fs(oldfs);
+   return ret;
+}
+
+static
+int read_aops(struct sio_output *output, struct aio_object *aio)
+{
+   loff_t pos = aio->io_pos;
+   int len = aio->io_len;
+   int ret;
+
+   mm_seg

[RFC 02/31] mars: add new module brick_say

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/brick_say.c | 916 +++
 include/linux/brick/brick_say.h  |  96 
 2 files changed, 1012 insertions(+)
 create mode 100644 drivers/staging/mars/brick_say.c
 create mode 100644 include/linux/brick/brick_say.h

diff --git a/drivers/staging/mars/brick_say.c b/drivers/staging/mars/brick_say.c
new file mode 100644
index 000..7a51273
--- /dev/null
+++ b/drivers/staging/mars/brick_say.c
@@ -0,0 +1,916 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+/***/
+
+/*  messaging */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+
+#ifndef GFP_BRICK
+#define GFP_BRICK  GFP_NOIO
+#endif
+
+#define SAY_ORDER  0
+#define SAY_BUFMAX (PAGE_SIZE << SAY_ORDER)
+#define SAY_BUF_LIMIT  (SAY_BUFMAX - 1500)
+#define MAX_FILELEN    16
+#define MAX_IDS        1000
+
+const char *say_class[MAX_SAY_CLASS] = {
+   [SAY_DEBUG] = "debug",
+   [SAY_INFO] = "info",
+   [SAY_WARN] = "warn",
+   [SAY_ERROR] = "error",
+   [SAY_FATAL] = "fatal",
+   [SAY_TOTAL] = "total",
+};
+
+int brick_say_logging = 1;
+
+module_param_named(say_logging, brick_say_logging, int, 0);
+int brick_say_debug;
+
+module_param_named(say_debug, brick_say_debug, int, 0);
+
+int brick_say_syslog_min = 1;
+int brick_say_syslog_max = -1;
+int brick_say_syslog_flood_class = 3;
+int brick_say_syslog_flood_limit = 20;
+int brick_say_syslog_flood_recovery = 300;
+
+int delay_say_on_overflow =
+#ifdef CONFIG_MARS_DEBUG
+   1;
+#else
+   0;
+#endif
+
+static atomic_t say_alloc_channels = ATOMIC_INIT(0);
+static atomic_t say_alloc_names = ATOMIC_INIT(0);
+static atomic_t say_alloc_pages = ATOMIC_INIT(0);
+
+static unsigned long flood_start_jiffies;
+static int flood_count;
+
+struct say_channel {
+   char *ch_name;
+   struct say_channel *ch_next;
+   spinlock_t ch_lock[MAX_SAY_CLASS];
+   char *ch_buf[MAX_SAY_CLASS][2];
+
+   short ch_index[MAX_SAY_CLASS];
+   struct file *ch_filp[MAX_SAY_CLASS][2];
+   int ch_overflow[MAX_SAY_CLASS];
+   bool ch_written[MAX_SAY_CLASS];
+   bool ch_rollover;
+   bool ch_must_exist;
+   bool ch_is_dir;
+   bool ch_delete;
+   int ch_status_written;
+   int ch_id_max;
+   void *ch_ids[MAX_IDS];
+
+   wait_queue_head_t ch_progress;
+};
+
+struct say_channel *default_channel;
+
+static struct say_channel *channel_list;
+
+static rwlock_t say_lock = __RW_LOCK_UNLOCKED(say_lock);
+
+static struct task_struct *say_thread;
+
+static DECLARE_WAIT_QUEUE_HEAD(say_event);
+
+bool say_dirty;
+
+#define use_atomic()   \
+   ((preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK | HARDIRQ_MASK | NMI_MASK)) != 0 || irqs_disabled())
+
+static
+void wait_channel(struct say_channel *ch, int class)
+{
+   if (delay_say_on_overflow && ch->ch_index[class] > SAY_BUF_LIMIT) {
+   if (!use_atomic()) {
+   say_dirty = true;
+   wake_up_interruptible(&say_event);
+   wait_event_interruptible_timeout(ch->ch_progress,
+   ch->ch_index[class] < SAY_BUF_LIMIT,
+   HZ / 10);
+   }
+   }
+}
+
+static
+struct say_channel *find_channel(const void *id)
+{
+   struct say_channel *res = default_channel;
+   struct say_channel *ch;
+
+   read_lock(&say_lock);
+   for (ch = channel_list; ch; ch = ch->ch_next) {
+   int i;
+
+   for (i = 0; i < ch->ch_id_max; i++) {
+   if (ch->ch_ids[i] == id) {
+   res = ch;
+   goto found;
+   }
+   }
+   }
+found:
+   read_unlock(&say_lock);
+   return res;
+}
+
+static
+void _remove_binding(struct task_struct *whom)
+{
+   struct say_channel *ch;
+   int 

[RFC 23/31] mars: add new module xio_server

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/xio_bricks/xio_server.c | 486 +++
 include/linux/xio/xio_server.h   |  91 +
 2 files changed, 577 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_server.c
 create mode 100644 include/linux/xio/xio_server.h

diff --git a/drivers/staging/mars/xio_bricks/xio_server.c b/drivers/staging/mars/xio_bricks/xio_server.c
new file mode 100644
index 000..95a3327
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_server.c
@@ -0,0 +1,486 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/*  Server brick (just for demonstration) */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+/* own type definitions */
+
+#include 
+
+static struct xio_socket server_socket[NR_SERVER_SOCKETS];
+static struct task_struct *server_threads[NR_SERVER_SOCKETS];
+
+/* own helper functions */
+
+int cb_thread(void *data)
+{
+   struct server_brick *brick = data;
+   struct xio_socket *sock = &brick->handler_socket;
+   bool aborted = false;
+   bool ok = xio_get_socket(sock);
+   int status = -EINVAL;
+
+   XIO_DBG("--- cb_thread starting on socket #%d, ok = %d\n", sock->s_debug_nr, ok);
+   if (!ok)
+   goto done;
+
+   brick->cb_running = true;
+   wake_up_interruptible(&brick->startup_event);
+
+   while (!brick_thread_should_stop() || !list_empty(&brick->cb_read_list) || !list_empty(&brick->cb_write_list) || atomic_read(&brick->in_flight) > 0) {
+   struct server_aio_aspect *aio_a;
+   struct aio_object *aio;
+   struct list_head *tmp;
+   unsigned long flags;
+
+   wait_event_interruptible_timeout(
+   brick->cb_event,
+   !list_empty(&brick->cb_read_list) ||
+   !list_empty(&brick->cb_write_list),
+   1 * HZ);
+
+   spin_lock_irqsave(&brick->cb_lock, flags);
+   tmp = brick->cb_write_list.next;
+   if (tmp == &brick->cb_write_list) {
+   tmp = brick->cb_read_list.next;
+   if (tmp == &brick->cb_read_list) {
+   spin_unlock_irqrestore(&brick->cb_lock, flags);
+   brick_msleep(1000 / HZ);
+   continue;
+   }
+   }
+   list_del_init(tmp);
+   spin_unlock_irqrestore(&brick->cb_lock, flags);
+
+   aio_a = container_of(tmp, struct server_aio_aspect, cb_head);
+   aio = aio_a->object;
+   status = -EINVAL;
+   CHECK_PTR(aio, err);
+
+   status = 0;
+   /* Report a remote error when consistency cannot be guaranteed,
+* e.g. emergency mode during sync.
+*/
+   if (brick->conn_brick && brick->conn_brick->mode_ptr && *brick->conn_brick->mode_ptr < 0
+   && aio->object_cb)
+   aio->object_cb->cb_error = *brick->conn_brick->mode_ptr;
+   if (!aborted) {
+   down(&brick->socket_sem);
+   status = xio_send_cb(sock, aio);
+   up(&brick->socket_sem);
+   }
+
+err:
+   if (unlikely(status < 0) && !aborted) {
+   aborted = true;
+   XIO_WRN("cannot send response, status = %d\n", status);
+   /* Just shutdown the socket and forget all pending
+* requests.
+* The _client_ is responsible for resending
+* any lost operations.
+*/
+   xio_shutdown_socket(sock);
+   }
+
+   if (aio_a->data) {
+   brick_block_free(aio_a->data, aio_a->len);
+   aio->io_data = NULL;
+   }
+   if (aio_a->do_put) {
+   GENERIC_INPUT_CALL(brick->inputs[0], aio_put, aio);
+   atomic_

[RFC 25/31] mars: add new module light_net

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/mars_light/light_net.c | 109 
 1 file changed, 109 insertions(+)
 create mode 100644 drivers/staging/mars/mars_light/light_net.c

diff --git a/drivers/staging/mars/mars_light/light_net.c b/drivers/staging/mars/mars_light/light_net.c
new file mode 100644
index 000..9890edd
--- /dev/null
+++ b/drivers/staging/mars/mars_light/light_net.c
@@ -0,0 +1,109 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+static
+char *_xio_translate_hostname(const char *name)
+{
+   char *res = brick_strdup(name);
+   char *test;
+   char *tmp;
+
+   for (tmp = res; *tmp; tmp++) {
+   if (*tmp == ':') {
+   *tmp = '\0';
+   break;
+   }
+   }
+
+   tmp = path_make("/mars/ips/ip-%s", res);
+   if (unlikely(!tmp))
+   goto done;
+
+   test = mars_readlink(tmp);
+   if (test && test[0]) {
+   XIO_DBG("'%s' => '%s'\n", tmp, test);
+   brick_string_free(res);
+   res = test;
+   } else {
+   brick_string_free(test);
+   XIO_WRN("no hostname translation for '%s'\n", tmp);
+   }
+   brick_string_free(tmp);
+
+done:
+   return res;
+}
+
+int xio_send_dent_list(struct xio_socket *sock, struct list_head *anchor)
+{
+   struct list_head *tmp;
+   struct mars_dent *dent;
+   int status = 0;
+
+   for (tmp = anchor->next; tmp != anchor; tmp = tmp->next) {
+   dent = container_of(tmp, struct mars_dent, dent_link);
+   status = xio_send_struct(sock, dent, mars_dent_meta);
+   if (status < 0)
+   break;
+   }
+   if (status >= 0) { /*  send EOR */
+   status = xio_send_struct(sock, NULL, mars_dent_meta);
+   }
+   return status;
+}
+
+int xio_recv_dent_list(struct xio_socket *sock, struct list_head *anchor)
+{
+   int status;
+
+   for (;;) {
+   struct mars_dent *dent = brick_zmem_alloc(sizeof(struct mars_dent));
+
+   INIT_LIST_HEAD(&dent->dent_link);
+   INIT_LIST_HEAD(&dent->brick_list);
+
+   status = xio_recv_struct(sock, dent, mars_dent_meta);
+   if (status <= 0) {
+   xio_free_dent(dent);
+   goto done;
+   }
+   list_add_tail(&dent->dent_link, anchor);
+   }
+done:
+   return status;
+}
+
+/* module init stuff */
+
+int __init init_sy_net(void)
+{
+   XIO_INF("init_sy_net()\n");
+   xio_translate_hostname = _xio_translate_hostname;
+   return 0;
+}
+
+void exit_sy_net(void)
+{
+   XIO_INF("exit_sy_net()\n");
+}
-- 
2.6.4
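
As a reading aid for _xio_translate_hostname() above: the function cuts the
peer name at the first ':' (presumably a port suffix) and then looks up a
symlink under /mars/ips/. The path construction alone can be sketched in
user space as below. This is a rough illustration, not the patch's code:
sketch_ip_link_path() is a made-up name, and snprintf() into a caller
buffer stands in for brick_strdup() / path_make(), whose behaviour is not
shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/mars/ips/ip-<host>" from "<host>" or "<host>:<port>".
 * Returns 0 on success, -1 if the result does not fit into out.
 */
int sketch_ip_link_path(const char *name, char *out, size_t out_len)
{
	int host_len;
	int n;

	assert(name != NULL && out != NULL);

	/* Length of the part before the first ':' (whole string if none). */
	host_len = (int)strcspn(name, ":");

	n = snprintf(out, out_len, "/mars/ips/ip-%.*s", host_len, name);
	if (n < 0 || (size_t)n >= out_len)
		return -1;
	return 0;
}
```

With a sufficiently large buffer, "hostA:7777" and "hostA" both map to
"/mars/ips/ip-hostA"; the kernel code then resolves that link with
mars_readlink() and falls back to the stripped name when no link exists.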



[RFC 05/31] mars: add new module meta

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 include/linux/brick/meta.h | 106 +
 1 file changed, 106 insertions(+)
 create mode 100644 include/linux/brick/meta.h

diff --git a/include/linux/brick/meta.h b/include/linux/brick/meta.h
new file mode 100644
index 000..a92b2b6
--- /dev/null
+++ b/include/linux/brick/meta.h
@@ -0,0 +1,106 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef META_H
+#define META_H
+
+/***/
+
+/*  metadata descriptions */
+
+/* The idea is to describe your C structures in such a way that
+ * transfers to disk or over a network become self-describing.
+ *
+ * In essence, this is a kind of version-independent marshalling.
+ *
+ * Advantage:
+ * When you extend your original C struct (and of course update the
+ * corresponding meta structure), old data on disk (or network peers
+ * running an old version of your program) will remain valid.
+ * Upon read, newly added fields missing in the old version will be simply
+ * not filled in and therefore remain zeroed (if you don't forget to
+ * initially clear your structures via memset() / initializers / etc).
+ * Note that this works only if you never rename or remove existing
+ * fields; you should only add new ones.
+ * [TODO: add macros for description of ignored / renamed fields to
+ *  overcome this limitation]
+ * You may increase the size of integers, for example from 32bit to 64bit
+ * or even higher; sign extension will be automatically carried out
+ * when necessary.
+ * Also, you may change the order of fields, because the metadata interpreter
+ * will check each field individually; field offsets are automatically
+ * maintained.
+ *
+ * Disadvantage: this adds some (small) overhead.
+ */
+
+enum field_type {
+   FIELD_DONE,
+   FIELD_REF,
+   FIELD_SUB,
+   FIELD_STRING,
+   FIELD_RAW,
+   FIELD_INT,
+   FIELD_UINT,
+};
+
+struct meta {
+   /* char field_name[MAX_FIELD_LEN]; */
+   char *field_name;
+
+   short field_type;
+   short field_data_size;
+   short field_transfer_size;
+   int   field_offset;
+   const struct meta *field_ref;
+};
+
+#define _META_INI(NAME, STRUCT, TYPE, TSIZE)   \
+   .field_name = #NAME,\
+   .field_type = TYPE, \
+   .field_data_size = sizeof(((STRUCT *)NULL)->NAME),  \
+   .field_transfer_size = (TSIZE), \
+   .field_offset = offsetof(STRUCT, NAME)  \
+
+#define META_INI_TRANSFER(NAME, STRUCT, TYPE, TSIZE)   \
+   { _META_INI(NAME, STRUCT, TYPE, TSIZE) }
+
+#define META_INI(NAME, STRUCT, TYPE)   \
+   { _META_INI(NAME, STRUCT, TYPE, 0) }
+
+#define _META_INI_AIO(NAME, STRUCT, AIO)   \
+   .field_name = #NAME,\
+   .field_type = FIELD_REF,\
+   .field_data_size = sizeof(*(((STRUCT *)NULL)->NAME)),   \
+   .field_offset = offsetof(STRUCT, NAME), \
+   .field_ref = AIO
+
+#define META_INI_AIO(NAME, STRUCT, AIO) { _META_INI_AIO(NAME, STRUCT, AIO) }
+
+#define _META_INI_SUB(NAME, STRUCT, SUB)   \
+   .field_name = #NAME,\
+   .field_type = FIELD_SUB,\
+   .field_data_size = sizeof(((STRUCT *)NULL)->NAME),  \
+   .field_offset = offsetof(STRUCT, NAME), \
+   .field_ref = SUB
+
+#define META_INI_SUB(NAME, STRUCT, SUB) { _META_INI_SUB(NAME, STRUCT, SUB) }
+
+extern const struct meta *find_meta(const struct meta *meta, const char *field_name);
+/* extern void free_meta(void *data, const struct meta *meta); */
+
+#endif
-- 
2.6.4
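
To make the header comment in meta.h above more concrete, here is a small
user-space sketch of how a C struct might be described field by field and
then looked up by name. It is an assumption-laden illustration, not the
MARS implementation: struct sketch_meta is a trimmed copy of struct meta,
SKETCH_META_INI only imitates META_INI, struct sketch_record is invented,
and sketch_find_meta() is a guess at what find_meta() does (its body is not
part of this message). The table is assumed to end with a zeroed entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum sketch_field_type {
	SKETCH_FIELD_DONE,	/* zeroed terminator entry */
	SKETCH_FIELD_INT,
};

/* Trimmed, hypothetical variant of struct meta. */
struct sketch_meta {
	const char *field_name;
	short field_type;
	short field_data_size;
	int   field_offset;
};

/* Imitates META_INI: name, size and offset are derived by the compiler. */
#define SKETCH_META_INI(NAME, STRUCT, TYPE)				\
	{ .field_name = #NAME,						\
	  .field_type = (TYPE),						\
	  .field_data_size = sizeof(((STRUCT *)NULL)->NAME),		\
	  .field_offset = offsetof(STRUCT, NAME) }

/* Invented example payload. */
struct sketch_record {
	int id;
	long long length;
};

const struct sketch_meta sketch_record_meta[] = {
	SKETCH_META_INI(id, struct sketch_record, SKETCH_FIELD_INT),
	SKETCH_META_INI(length, struct sketch_record, SKETCH_FIELD_INT),
	{ 0 }
};

/* Linear search by field name; NULL when the field is not described. */
const struct sketch_meta *sketch_find_meta(const struct sketch_meta *meta,
					   const char *field_name)
{
	assert(meta != NULL && field_name != NULL);
	for (; meta->field_name; meta++) {
		if (!strcmp(meta->field_name, field_name))
			return meta;
	}
	return NULL;
}
```

Because each side of a transfer looks fields up by name and uses its own
offset and size, reordering or widening a field on one side would not
change the lookup on the other, which matches the advantage the header
comment describes.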



[RFC 21/31] mars: add new module xio_copy

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/xio_bricks/xio_copy.c | 1005 
 include/linux/xio/xio_copy.h   |  115 
 2 files changed, 1120 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_copy.c
 create mode 100644 include/linux/xio/xio_copy.h

diff --git a/drivers/staging/mars/xio_bricks/xio_copy.c b/drivers/staging/mars/xio_bricks/xio_copy.c
new file mode 100644
index 000..aa5bc56
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_copy.c
@@ -0,0 +1,1005 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/*  Copy brick (just for demonstration) */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#ifndef READ
+#define READ   0
+#define WRITE  1
+#endif
+
+#define COPY_CHUNK (PAGE_SIZE)
+#define NR_COPY_REQUESTS   (32 * 1024 * 1024 / COPY_CHUNK)
+
+#define STATES_PER_PAGE    (PAGE_SIZE / sizeof(struct copy_state))
+#define MAX_SUB_TABLES (NR_COPY_REQUESTS / STATES_PER_PAGE + (NR_COPY_REQUESTS % STATES_PER_PAGE ? 1 : 0)\
+   \
+)
+#define MAX_COPY_REQUESTS  (PAGE_SIZE / sizeof(struct copy_state *) * STATES_PER_PAGE)
+
+#define GET_STATE(brick, index) \
+   ((brick)->st[(index) / STATES_PER_PAGE][(index) % STATES_PER_PAGE])
+
+/* own type definitions */
+
+#include 
+
+int xio_copy_overlap = 1;
+
+int xio_copy_read_prio = XIO_PRIO_NORMAL;
+
+int xio_copy_write_prio = XIO_PRIO_NORMAL;
+
+int xio_copy_read_max_fly;
+
+int xio_copy_write_max_fly;
+
+#define is_read_limited(brick) \
+   (xio_copy_read_max_fly > 0 && atomic_read(&(brick)->copy_read_flight) >= xio_copy_read_max_fly)
+
+#define is_write_limited(brick) \
+   (xio_copy_write_max_fly > 0 && atomic_read(&(brick)->copy_write_flight) >= xio_copy_write_max_fly)
+
+/*** own helper functions ***/
+
+/* TODO:
+ * The clash logic is untested / alpha stage (Feb. 2011).
+ *
+ * For now, the output is never used, so this cannot do harm.
+ *
+ * In order to get the output really working / enterprise grade,
+ * some larger test effort should be invested.
+ */
+static inline
+void _clash(struct copy_brick *brick)
+{
+   brick->trigger = true;
+   set_bit(0, &brick->clash);
+   atomic_inc(&brick->total_clash_count);
+   wake_up_interruptible(&brick->event);
+}
+
+static inline
+int _clear_clash(struct copy_brick *brick)
+{
+   int old;
+
+   old = test_and_clear_bit(0, &brick->clash);
+   return old;
+}
+
+/* Current semantics:
+ *
+ * All writes are always going to the original input A. They are _not_
+ * replicated to B.
+ *
+ * In order to get B really uptodate, you have to replay the right
+ * transaction logs there (at the right time).
+ * [If you had no writes on A at all during the copy, of course
+ * this is not necessary]
+ *
+ * When utilize_mode is on, reads can utilize the already copied
+ * region from B, but only as long as this region has not been
+ * invalidated by writes (indicated by low_dirty).
+ *
+ * TODO: implement replicated writes, together with some transaction
+ * replay logic applying the transaction logs _only_ after
+ * crashes during inconsistency caused by partial replication of writes.
+ */
+static
+int _determine_input(struct copy_brick *brick, struct aio_object *aio)
+{
+   int rw;
+   int below;
+   int behind;
+   loff_t io_end;
+
+   if (!brick->utilize_mode || brick->low_dirty)
+   return INPUT_A_IO;
+
+   io_end = aio->io_pos + aio->io_len;
+   below = io_end <= brick->copy_start;
+   behind = !brick->copy_end || aio->io_pos >= brick->copy_end;
+   rw = aio->io_may_write | aio->io_rw;
+   if (rw) {
+   if (!behind) {
+   brick->low_dirty = true;
+   if (!below) {
+   _clash(brick);
+   wake_up_interruptible(&brick->event);
+ 
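
The visible part of `_determine_input()` classifies each request relative to the copy window: `below` means the request ends at or before `copy_start`, and `behind` means `copy_end` is unset or the request starts at or after it. A write that is not `behind` sets `low_dirty`, and if it is also not `below` it triggers a clash. The sketch below restates only that visible logic in userspace C. The `demo_*` types are hypothetical, and the rest of the function is cut off in this archive, so nothing beyond these conditions is modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the copy_start / copy_end fields of the brick. */
struct demo_window {
	int64_t copy_start;
	int64_t copy_end;	/* 0 is treated as "not set", as in the patch */
};

struct demo_region {
	bool below;	/* request ends at or before copy_start */
	bool behind;	/* copy_end unset, or request starts at or after it */
};

/* What a write request does to the brick state, per the visible code. */
struct demo_effect {
	bool low_dirty;
	bool clash;
};

static struct demo_region demo_classify(const struct demo_window *w,
					int64_t io_pos, int64_t io_len)
{
	struct demo_region r;
	int64_t io_end = io_pos + io_len;

	r.below = io_end <= w->copy_start;
	r.behind = !w->copy_end || io_pos >= w->copy_end;
	return r;
}

static struct demo_effect demo_write_effect(const struct demo_window *w,
					    int64_t io_pos, int64_t io_len)
{
	struct demo_region r = demo_classify(w, io_pos, io_len);
	struct demo_effect e;

	/* Any write that is not behind the window invalidates the copied region. */
	e.low_dirty = !r.behind;
	/* A write that is neither behind nor below also counts as a clash. */
	e.clash = !r.behind && !r.below;
	return e;
}
```

For a window of [100, 200), a write at position 150 sets both flags, a write ending at 50 only sets `low_dirty`, and a write starting at 200 sets neither.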

[RFC 20/31] mars: add new module xio_if

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/xio_bricks/xio_if.c | 961 +++
 include/linux/xio/xio_if.h   | 108 
 2 files changed, 1069 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_if.c
 create mode 100644 include/linux/xio/xio_if.h

diff --git a/drivers/staging/mars/xio_bricks/xio_if.c b/drivers/staging/mars/xio_bricks/xio_if.c
new file mode 100644
index 000..65e023c
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_if.c
@@ -0,0 +1,961 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/* Interface to a Linux device.
+ * 1 Input, 0 Outputs.
+ */
+
+#define REQUEST_MERGING
+#define ALWAYS_UNPLUG  true
+#define PREFETCH_LEN   PAGE_SIZE
+
+/*  low-level device parameters */
+#define IF_MAX_SEGMENT_SIZE    PAGE_SIZE
+#define USE_MAX_SECTORS        (IF_MAX_SEGMENT_SIZE >> 9)
+#define USE_MAX_PHYS_SEGMENTS  (IF_MAX_SEGMENT_SIZE >> 9)
+#define USE_MAX_SEGMENT_SIZE   IF_MAX_SEGMENT_SIZE
+#define USE_LOGICAL_BLOCK_SIZE 512
+#define USE_SEGMENT_BOUNDARY   (PAGE_SIZE-1)
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#ifndef XIO_MAJOR
+#define XIO_MAJOR  (DRBD_MAJOR + 1)
+#endif
+
+/*** global tuning ***/
+
+int if_throttle_start_size;
+
+struct rate_limiter if_throttle = {
+   .lim_max_rate = 5000,
+};
+
+/*** own type definitions ***/
+
+#include 
+
+#define IF_HASH_MAX    (PAGE_SIZE / sizeof(struct if_hash_anchor))
+#define IF_HASH_CHUNK  (PAGE_SIZE * 32)
+
+struct if_hash_anchor {
+   spinlock_t hash_lock;
+   struct list_head hash_anchor;
+};
+
+/*** own static definitions ***/
+
+/*  TODO: check bounds, ensure that free minor numbers are recycled */
+static int device_minor;
+
+/*** object * aspect constructors * destructors **/
+
+/*** linux operations ***/
+
+static
+void _if_start_io_acct(struct if_input *input, struct bio_wrapper *biow)
+{
+   struct bio *bio = biow->bio;
+   const int rw = bio_data_dir(bio);
+   const int cpu = part_stat_lock();
+
+   (void)cpu;
+   part_round_stats(cpu, &input->disk->part0);
+   part_stat_inc(cpu, &input->disk->part0, ios[rw]);
+   part_stat_add(cpu, &input->disk->part0, sectors[rw], bio->bi_iter.bi_size >> 9);
+   part_inc_in_flight(&input->disk->part0, rw);
+   part_stat_unlock();
+   biow->start_time = jiffies;
+}
+
+static
+void _if_end_io_acct(struct if_input *input, struct bio_wrapper *biow)
+{
+   unsigned long duration = jiffies - biow->start_time;
+   struct bio *bio = biow->bio;
+   const int rw = bio_data_dir(bio);
+   const int cpu = part_stat_lock();
+
+   (void)cpu;
+   part_stat_add(cpu, &input->disk->part0, ticks[rw], duration);
+   part_round_stats(cpu, &input->disk->part0);
+   part_dec_in_flight(&input->disk->part0, rw);
+   part_stat_unlock();
+}
+
+/* callback
+ */
+static
+void if_endio(struct generic_callback *cb)
+{
+   struct if_aio_aspect *aio_a = cb->cb_private;
+   struct if_input *input;
+   int k;
+   int rw;
+   int error;
+
+   LAST_CALLBACK(cb);
+   if (unlikely(!aio_a || !aio_a->object)) {
+   XIO_FAT("aio_a = %p aio = %p, something is very wrong here!\n", aio_a, aio_a ? aio_a->object : NULL);
+   goto out_return;
+   }
+   input = aio_a->input;
+   CHECK_PTR(input, err);
+
+   rw = aio_a->object->io_rw;
+
+   for (k = 0; k < aio_a->bio_count; k++) {
+   struct bio_wrapper *biow;
+   struct bio *bio;
+
+   biow = aio_a->orig_biow[k];
+   aio_a->orig_biow[k] = NULL;
+   CHECK_PTR(biow, err);
+
+   CHECK_ATOMIC(&biow->bi_comp_cnt, 1);
+   if (!atomic_dec_and_test(&biow->bi_comp_cnt))
+   continue;
+
+   bio = biow->bio;
+   CHECK_PTR_NULL(bio, err);
+
+   _if_end_io_acct(input, biow);
+
+  
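
The loop in `if_endio()` relies on a completion counter: every sub-request decrements `bi_comp_cnt`, and only the call that brings it to zero goes on to do the accounting and finish the original bio. A minimal userspace sketch of that pattern using C11 atomics follows. The `demo_*` names are hypothetical, and `atomic_fetch_sub()` stands in for the kernel's `atomic_dec_and_test()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct bio_wrapper. */
struct demo_bio_wrapper {
	atomic_int comp_cnt;	/* number of sub-requests still outstanding */
	int completed;		/* how often the final completion has run */
};

static void demo_wrapper_init(struct demo_bio_wrapper *w, int nr_sub)
{
	atomic_init(&w->comp_cnt, nr_sub);
	w->completed = 0;
}

/* Returns true only for the caller that drops the counter to zero. */
static bool demo_dec_and_test(atomic_int *cnt)
{
	/* atomic_fetch_sub() returns the value held before the subtraction. */
	return atomic_fetch_sub(cnt, 1) == 1;
}

/* Called once per finished sub-request. */
static void demo_sub_endio(struct demo_bio_wrapper *w)
{
	if (!demo_dec_and_test(&w->comp_cnt))
		return;	/* other sub-requests are still in flight */

	/* Stands in for the I/O accounting and for ending the original bio. */
	w->completed++;
}
```

Because the decrement and the zero test are a single atomic step, exactly one caller sees the transition to zero even when completions race on different CPUs.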

[RFC 22/31] mars: add new module xio_trans_logger

2015-12-31 Thread Thomas Schoebel-Theuer
Signed-off-by: Thomas Schoebel-Theuer 
---
 drivers/staging/mars/xio_bricks/xio_trans_logger.c | 3309 
 include/linux/xio/xio_trans_logger.h   |  263 ++
 2 files changed, 3572 insertions(+)
 create mode 100644 drivers/staging/mars/xio_bricks/xio_trans_logger.c
 create mode 100644 include/linux/xio/xio_trans_logger.h

diff --git a/drivers/staging/mars/xio_bricks/xio_trans_logger.c b/drivers/staging/mars/xio_bricks/xio_trans_logger.c
new file mode 100644
index 000..04d4c63
--- /dev/null
+++ b/drivers/staging/mars/xio_bricks/xio_trans_logger.c
@@ -0,0 +1,3309 @@
+/*
+ * MARS Long Distance Replication Software
+ *
+ * Copyright (C) 2010-2014 Thomas Schoebel-Theuer
+ * Copyright (C) 2011-2014 1&1 Internet AG
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+/*  Trans_Logger brick */
+
+#define XIO_DEBUGGING
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include 
+
+/*  variants */
+#define KEEP_UNIQUE
+#define DELAY_CALLERS  /*  this is _needed_ for production systems */
+/* When possible, queue 1 executes phase3_startio() directly without
+ * intermediate queueing into queue 3 => may be irritating, but has better
+ * performance. NOTICE: when some day the IO scheduling should be
+ * different between queue 1 and 3, you MUST disable this in order
+ * to distinguish between them!
+ */
+#define SHORTCUT_1_to_3
+
+/*  commenting this out is dangerous for data integrity! use only for testing! */
+#define USE_MEMCPY
+#define DO_WRITEBACK   /*  otherwise FAKE IO */
+#define REPLAY_DATA
+
+/*  tuning */
+#ifdef BRICK_DEBUG_MEM
+#define CONF_TRANS_CHUNKSIZE   (128 * 1024 - PAGE_SIZE * 2)
+#else
+#define CONF_TRANS_CHUNKSIZE   (128 * 1024)
+#endif
+#define CONF_TRANS_MAX_AIO_SIZE    PAGE_SIZE
+#define CONF_TRANS_ALIGN   0
+
+#define XIO_RPL(_args...) /*empty*/
+
+struct trans_logger_hash_anchor {
+   struct rw_semaphore hash_mutex;
+   struct list_head hash_anchor;
+};
+
+#define NR_HASH_PAGES  64
+
+#define MAX_HASH_PAGES (PAGE_SIZE / sizeof(struct trans_logger_hash_anchor *))
+#define HASH_PER_PAGE  (PAGE_SIZE / sizeof(struct trans_logger_hash_anchor))
+#define HASH_TOTAL (NR_HASH_PAGES * HASH_PER_PAGE)
+
+/*** global tuning ***/
+
+int trans_logger_completion_semantics = 1;
+
+int trans_logger_do_crc =
+#ifdef CONFIG_MARS_DEBUG
+   true;
+#else
+   false;
+#endif
+
+int trans_logger_mem_usage; /* in KB */
+
+int trans_logger_max_interleave = -1;
+
+int trans_logger_resume = 1;
+
+int trans_logger_replay_timeout = 1; /*  in s */
+
+struct writeback_group global_writeback = {
+   .lock = __RW_LOCK_UNLOCKED(global_writeback.lock),
+   .group_anchor = LIST_HEAD_INIT(global_writeback.group_anchor),
+   .until_percent = 30,
+};
+
+static
+void add_to_group(struct writeback_group *gr, struct trans_logger_brick *brick)
+{
+   unsigned long flags;
+
+   write_lock_irqsave(&gr->lock, flags);
+   list_add_tail(&brick->group_head, &gr->group_anchor);
+   write_unlock_irqrestore(&gr->lock, flags);
+}
+
+static
+void remove_from_group(struct writeback_group *gr, struct trans_logger_brick *brick)
+{
+   unsigned long flags;
+
+   write_lock_irqsave(&gr->lock, flags);
+   list_del_init(&brick->group_head);
+   gr->leader = NULL;
+   write_unlock_irqrestore(&gr->lock, flags);
+}
+
+static
+struct trans_logger_brick *elect_leader(struct writeback_group *gr)
+{
+   struct trans_logger_brick *res = gr->leader;
+   struct list_head *tmp;
+   unsigned long flags;
+
+   if (res && gr->until_percent >= 0) {
+   loff_t used = atomic64_read(&res->shadow_mem_used);
+
+   if (used > gr->biggest * gr->until_percent / 100)
+   goto done;
+   }
+
+   read_lock_irqsave(&gr->lock, flags);
+   for (tmp = gr->group_anchor.next; tmp != &gr->group_anchor; tmp = tmp->next) {
+   struct trans_logger_brick *test = container_of(tmp, struct trans_logger_brick, group_head);
+   loff_t new_used = atomic64_read(&test->shadow_mem_used);
+
+   if (!res || new_used > atomic64_read(&res->shadow_mem_used)) {
+   res = test;
+   gr->biggest = new_used;
+   }
+   }
+   read_unlo
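
As far as it is visible here, `elect_leader()` keeps the current leader while its `shadow_mem_used` stays above `until_percent` of the recorded `biggest` value, and otherwise rescans the group for the brick with the highest usage. The following userspace sketch restates that hysteresis with a plain array instead of the kernel list and without locking. The `demo_*` names are hypothetical, and storing the result back into `leader` is an assumption, since the function is cut off in this archive.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct trans_logger_brick. */
struct demo_brick {
	int64_t shadow_mem_used;
};

/* Hypothetical stand-in for struct writeback_group. */
struct demo_group {
	struct demo_brick *leader;
	int64_t biggest;	/* usage recorded when the leader was elected */
	int until_percent;	/* negative disables the hysteresis */
};

static struct demo_brick *demo_elect_leader(struct demo_group *gr,
					    struct demo_brick *bricks,
					    size_t nr)
{
	struct demo_brick *res = gr->leader;
	size_t i;

	/* Hysteresis: stick with the leader while it is above the threshold. */
	if (res && gr->until_percent >= 0 &&
	    res->shadow_mem_used > gr->biggest * gr->until_percent / 100)
		return res;

	/* Otherwise pick the brick with the largest usage. */
	for (i = 0; i < nr; i++) {
		struct demo_brick *test = &bricks[i];

		if (!res || test->shadow_mem_used > res->shadow_mem_used) {
			res = test;
			gr->biggest = test->shadow_mem_used;
		}
	}

	/* Assumption: the truncated original records the result like this. */
	gr->leader = res;
	return res;
}
```

With `until_percent = 30`, a leader elected at 500 units remains leader until its usage falls to 150 or below, even if another brick overtakes it in the meantime.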
