[Cluster-devel] kernel BUG at fs/gfs2/inode.h:64

2019-01-08 Thread Mark Syms
Hi,

We've seen this in testing with 4.19.

Full trace is at the bottom.

Looking at the code though it looks like it will assert if the value of change 
is equal to the number of blocks currently allocated to the inode. Is this 
expected or should the assert be using >= instead of > ?

Thanks,

Mark

--

2018-12-24 00:07:03 UTC] [ 3366.209273] kernel BUG at fs/gfs2/inode.h:64!

[2018-12-24 00:07:03 UTC] [ 3366.209305] invalid opcode:  [#1] SMP NOPTI

[2018-12-24 00:07:03 UTC] [ 3366.209327] CPU: 10 PID: 30664 Comm: kworker/10:1 
Tainted: G   O  4.19.0+0 #1

[2018-12-24 00:07:03 UTC] [ 3366.209352] Hardware name: HP ProLiant BL420c 
Gen8, BIOS I30 11/03/2014

[2018-12-24 00:07:03 UTC] [ 3366.209397] Workqueue: delete_workqueue 
delete_work_func [gfs2]

[2018-12-24 00:07:03 UTC] [ 3366.209437] RIP: 
e030:gfs2_add_inode_blocks.part.33+0x10/0x12 [gfs2]

[2018-12-24 00:07:03 UTC] [ 3366.209460] Code: 00 89 c2 48 c7 c7 80 bc 4c c0 31 
c0 e8 ec 44 c1 c0 e9 68 fc ff ff 0f 0b 0f 0b 48 8b 47 28 48 8b b8 08 04 00 00 
e8 bf dd ff ff <0f> 0b 0f 0b 0f 0b 66 66 66 66 90 0f 0b 48 8b 47 28 48 8b b8 08 
04

[2018-12-24 00:07:03 UTC] [ 3366.209512] RSP: e02b:c9004a53fbf0 EFLAGS: 
00010246

[2018-12-24 00:07:03 UTC] [ 3366.209532] RAX: 0043 RBX: 
8881709a4000 RCX: 

[2018-12-24 00:07:03 UTC] [ 3366.209556] RDX:  RSI: 
88818d496428 RDI: 88818d496428

[2018-12-24 00:07:03 UTC] [ 3366.209581] RBP: c9004a53fdb0 R08: 
 R09: 0553

[2018-12-24 00:07:03 UTC] [ 3366.209605] R10:  R11: 
c9004a53f968 R12: 888156a303d8

[2018-12-24 00:07:03 UTC] [ 3366.209629] R13: 00f47524 R14: 
8881709a3ac0 R15: 888188d31110

[2018-12-24 00:07:03 UTC] [ 3366.209683] FS:  7fa73976e700() 
GS:88818d48() knlGS:

[2018-12-24 00:07:03 UTC] [ 3366.209709] CS:  e033 DS:  ES:  CR0: 
80050033

[2018-12-24 00:07:03 UTC] [ 3366.209730] CR2: 7fca0def3000 CR3: 
000187fee000 CR4: 00042660

[2018-12-24 00:07:03 UTC] [ 3366.209802] Call Trace:

[2018-12-24 00:07:03 UTC] [ 3366.209835]  punch_hole+0xf4f/0x10d0 [gfs2]

[2018-12-24 00:07:03 UTC] [ 3366.209866]  ? __switch_to_asm+0x40/0x70

[2018-12-24 00:07:03 UTC] [ 3366.209907]  ? __switch_to_asm+0x40/0x70

[2018-12-24 00:07:03 UTC] [ 3366.209924]  ? __switch_to_asm+0x40/0x70

[2018-12-24 00:07:03 UTC] [ 3366.209949]  ? punch_hole+0xb3e/0x10d0 [gfs2]

[2018-12-24 00:07:03 UTC] [ 3366.209983]  gfs2_evict_inode+0x57b/0x680 [gfs2]

[2018-12-24 00:07:03 UTC] [ 3366.210036]  ? __inode_wait_for_writeback+0x75/0xe0

[2018-12-24 00:07:03 UTC] [ 3366.210066]  ? gfs2_evict_inode+0x1c7/0x680 [gfs2]

[2018-12-24 00:07:03 UTC] [ 3366.210090]  evict+0xc6/0x1a0

[2018-12-24 00:07:03 UTC] [ 3366.210151]  delete_work_func+0x60/0x70 [gfs2]

[2018-12-24 00:07:03 UTC] [ 3366.210177]  process_one_work+0x165/0x370

[2018-12-24 00:07:03 UTC] [ 3366.210197]  worker_thread+0x49/0x3e0

[2018-12-24 00:07:03 UTC] [ 3366.210216]  kthread+0xf8/0x130

[2018-12-24 00:07:03 UTC] [ 3366.210235]  ? rescuer_thread+0x310/0x310

[2018-12-24 00:07:03 UTC] [ 3366.210284]  ? kthread_bind+0x10/0x10

[2018-12-24 00:07:04 UTC] [ 3366.210301]  ret_from_fork+0x35/0x40

[2018-12-24 00:07:04 UTC] [ 3366.210319] Modules linked in: tun nfsv3 nfs_acl 
nfs lockd grace fscache gfs2 dlm iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi 8021q garp mrp stp llc openvswitch nsh nf_nat_ipv6 
nf_nat_ipv4 nf_conncount nf_nat ipt_REJECT nf_reject_ipv4 xt_tcpudp 
xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c 
iptable_filter dm_multipath sunrpc sb_edac intel_powerclamp crct10dif_pclmul 
crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd 
glue_helper dm_mod sg hpilo lpc_ich intel_rapl_perf psmouse ipmi_si 
ipmi_devintf ipmi_msghandler nls_utf8 isofs acpi_power_meter loop xen_wdt 
ip_tables x_tables sd_mod uhci_hcd serio_raw ehci_pci ehci_hcd hpsa igb(O) 
scsi_transport_sas scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_mod 
ipv6 crc_ccitt

[2018-12-24 00:07:04 UTC] [ 3366.210663] ---[ end trace d36fcbebd28b36e9 ]---



[Cluster-devel] [PATCH 3/3] dlm: allow binding to all network interfaces

2019-01-08 Thread David Windsor
Currently, in the kernel, DLM only is able to bind its
listen socket to a single network interface.  To support
more robust network configurations, DLM should be able
to bind to all network interfaces.

This patch adds a configfs node to enable/disable binding
to all network interfaces.  When 1 is written to this
configfs node, the DLM listen socket will bind to all network
interfaces.  When 0 is written to the node, DLM will bind
only to its current local network interface.

Signed-off-by: David Windsor 
---
 fs/dlm/config.c   | 21 +
 fs/dlm/config.h   |  3 ++-
 fs/dlm/lowcomms.c | 19 ++-
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/config.c b/fs/dlm/config.c
index 23d2677e10bd..77dc325c1972 100644
--- a/fs/dlm/config.c
+++ b/fs/dlm/config.c
@@ -29,6 +29,7 @@
  * /config/dlm//spaces//nodes//weight
  * /config/dlm//comms//nodeid
  * /config/dlm//comms//local
+ * /config/dlm//comms//bind_all
  * /config/dlm//comms//addr  (write only)
  * /config/dlm//comms//addr_list (read only)
  * /config/dlm//comms//error(write only)
@@ -39,6 +40,7 @@ static struct config_group *space_list;
 static struct config_group *comm_list;
 static struct dlm_comm *local_comm;
 static uint32_t dlm_comm_count;
+static int bind_all;
 
 struct dlm_clusters;
 struct dlm_cluster;
@@ -197,6 +199,7 @@ static struct configfs_attribute *cluster_attrs[] = {
 enum {
COMM_ATTR_NODEID = 0,
COMM_ATTR_LOCAL,
+   COMM_ATTR_BIND_ALL,
COMM_ATTR_ADDR,
COMM_ATTR_ADDR_LIST,
COMM_ATTR_ERROR,
@@ -681,8 +684,20 @@ static ssize_t comm_error_store(struct config_item *item, 
const char *buf,
return len;
 }
 
+static ssize_t comm_bind_all_show(struct config_item *item, char *buf)
+{
+   return sprintf(buf, "%d\n", bind_all);
+}
+
+static ssize_t comm_bind_all_store(struct config_item *item, const char *buf,
+  size_t len)
+{
+   return kstrtoint(buf, 0, &bind_all);
+}
+
 CONFIGFS_ATTR(comm_, nodeid);
 CONFIGFS_ATTR(comm_, local);
+CONFIGFS_ATTR(comm_, bind_all);
 CONFIGFS_ATTR_WO(comm_, error);
 CONFIGFS_ATTR_WO(comm_, addr);
 CONFIGFS_ATTR_RO(comm_, addr_list);
@@ -693,6 +708,7 @@ static struct configfs_attribute *comm_attrs[] = {
[COMM_ATTR_ADDR] = &comm_attr_addr,
[COMM_ATTR_ADDR_LIST] = &comm_attr_addr_list,
[COMM_ATTR_ERROR] = &comm_attr_error,
+   [COMM_ATTR_BIND_ALL] = &comm_attr_bind_all,
NULL,
 };
 
@@ -868,6 +884,11 @@ int dlm_our_addr(struct sockaddr_storage *addr, int num)
return 0;
 }
 
+int dlm_bind_all(void)
+{
+   return bind_all;
+}
+
 /* Config file defaults */
 #define DEFAULT_TCP_PORT   21064
 #define DEFAULT_BUFFER_SIZE 4096
diff --git a/fs/dlm/config.h b/fs/dlm/config.h
index 6041eec886ab..e3fd8ce45874 100644
--- a/fs/dlm/config.h
+++ b/fs/dlm/config.h
@@ -21,7 +21,7 @@ struct dlm_config_node {
uint32_t comm_seq;
 };
 
-#define DLM_MAX_ADDR_COUNT 3
+#define DLM_MAX_ADDR_COUNT 9
 
 struct dlm_config_info {
int ci_tcp_port;
@@ -49,6 +49,7 @@ int dlm_config_nodes(char *lsname, struct dlm_config_node 
**nodes_out,
 int dlm_comm_seq(int nodeid, uint32_t *seq);
 int dlm_our_nodeid(void);
 int dlm_our_addr(struct sockaddr_storage *addr, int num);
+int dlm_bind_all(void);
 
 #endif /* __CONFIG_DOT_H__ */
 
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index d37af1372ed0..8b9d02485116 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1374,6 +1374,9 @@ static int sctp_listen_for_all(void)
 static int tcp_listen_for_all(void)
 {
struct socket *sock = NULL;
+   struct sockaddr_in *sin4;
+   struct sockaddr_in6 *sin6;
+   struct sockaddr_storage sas, laddr;
struct connection *con = nodeid2con(0, GFP_NOFS);
int result = -EINVAL;
 
@@ -1382,7 +1385,21 @@ static int tcp_listen_for_all(void)
 
log_print("Using TCP for communications");
 
-   sock = tcp_create_listen_sock(con, dlm_local_addr[dlm_local_idx]);
+   memcpy(&sas, dlm_local_addr[dlm_local_idx], sizeof(sas));
+   memcpy(&laddr, dlm_local_addr[dlm_local_idx], sizeof(laddr));
+   if (dlm_bind_all()) {
+   if (sas.ss_family == AF_INET) {
+   sin4 = (struct sockaddr_in *) &sas;
+   sin4->sin_addr.s_addr = htonl(INADDR_ANY);
+   memcpy(&laddr, sin4, sizeof(laddr));
+   } else {
+   sin6 = (struct sockaddr_in6 *) &sas;
+   sin6->sin6_addr = in6addr_any;
+   memcpy(&laddr, sin6, sizeof(laddr));
+   }
+   }
+
+   sock = tcp_create_listen_sock(con, &laddr);
if (sock) {
add_sock(sock, con);
result = 0;
-- 
2.20.1



[Cluster-devel] [PATCH 1/3] dlm: check if workqueues are NULL before destroying

2019-01-08 Thread David Windsor
If a network failure occurs before any DLM traffic can be
generated, the send and receive workqueues can be NULL
when work_stop() is called.  Check to see if these workqueues
are NULL before calling destroy_workqueue().

Signed-off-by: David Windsor 
---
 fs/dlm/lowcomms.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 76976d6e50f9..905cbdbd31bc 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1630,8 +1630,10 @@ static void clean_writequeues(void)
 
 static void work_stop(void)
 {
-   destroy_workqueue(recv_workqueue);
-   destroy_workqueue(send_workqueue);
+   if (recv_workqueue)
+   destroy_workqueue(recv_workqueue);
+   if (send_workqueue)
+   destroy_workqueue(send_workqueue);
 }
 
 static int work_start(void)
-- 
2.20.1



[Cluster-devel] [PATCH 2/3] dlm: add TCP multihoming/failover support

2019-01-08 Thread David Windsor
Add the ability to specify multiple source addresses
for DLM nodes so that multihomed configurations can
use multiple addresses and still be recognized by the
receiving node.

While each node is capable of being configured for multiple
IPs, DLM requires each node have only one active address
at a time.

This patch introduces a round-robin heuristic for selecting
the next active interface, but other heuristics could
easily be added later.

To support failover, a new configfs node is added by this patch:
/sys/kernel/config/dlm/cluster/comms//error
This node is write-only, and is provided so that userspace
may signal the kernel when it detects a communications error.
The kernel will switch to the next local network interface
after 1 is written to the new configfs node.

Signed-off-by: David Windsor 
---
 fs/dlm/config.c   | 21 +
 fs/dlm/lowcomms.c | 34 +++---
 fs/dlm/lowcomms.h |  1 +
 3 files changed, 45 insertions(+), 11 deletions(-)

diff --git a/fs/dlm/config.c b/fs/dlm/config.c
index 1270551d24e3..23d2677e10bd 100644
--- a/fs/dlm/config.c
+++ b/fs/dlm/config.c
@@ -31,6 +31,7 @@
  * /config/dlm//comms//local
  * /config/dlm//comms//addr  (write only)
  * /config/dlm//comms//addr_list (read only)
+ * /config/dlm//comms//error(write only)
  * The  level is useless, but I haven't figured out how to avoid it.
  */
 
@@ -198,6 +199,7 @@ enum {
COMM_ATTR_LOCAL,
COMM_ATTR_ADDR,
COMM_ATTR_ADDR_LIST,
+   COMM_ATTR_ERROR,
 };
 
 enum {
@@ -662,8 +664,26 @@ static ssize_t comm_addr_list_show(struct config_item 
*item, char *buf)
return 4096 - allowance;
 }
 
+static ssize_t comm_error_store(struct config_item *item, const char *buf,
+   size_t len)
+{
+   int ret, i;
+
+   ret = kstrtoint(buf, 0, &i);
+   if (ret < 0)
+   return ret;
+
+   if (i == 0)
+   return 0;
+
+   dlm_lowcomms_next_addr();
+
+   return len;
+}
+
 CONFIGFS_ATTR(comm_, nodeid);
 CONFIGFS_ATTR(comm_, local);
+CONFIGFS_ATTR_WO(comm_, error);
 CONFIGFS_ATTR_WO(comm_, addr);
 CONFIGFS_ATTR_RO(comm_, addr_list);
 
@@ -672,6 +692,7 @@ static struct configfs_attribute *comm_attrs[] = {
[COMM_ATTR_LOCAL] = &comm_attr_local,
[COMM_ATTR_ADDR] = &comm_attr_addr,
[COMM_ATTR_ADDR_LIST] = &comm_attr_addr_list,
+   [COMM_ATTR_ERROR] = &comm_attr_error,
NULL,
 };
 
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 905cbdbd31bc..d37af1372ed0 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -159,6 +159,7 @@ static DEFINE_SPINLOCK(dlm_node_addrs_spin);
 static struct sockaddr_storage *dlm_local_addr[DLM_MAX_ADDR_COUNT];
 static int dlm_local_count;
 static int dlm_allow_conn;
+static int dlm_local_idx;
 
 /* Work queues */
 static struct workqueue_struct *recv_workqueue;
@@ -330,7 +331,7 @@ static int nodeid_to_addr(int nodeid, struct 
sockaddr_storage *sas_out,
if (!sa_out)
return 0;
 
-   if (dlm_local_addr[0]->ss_family == AF_INET) {
+   if (dlm_local_addr[dlm_local_idx]->ss_family == AF_INET) {
struct sockaddr_in *in4  = (struct sockaddr_in *) &sas;
struct sockaddr_in *ret4 = (struct sockaddr_in *) sa_out;
ret4->sin_addr.s_addr = in4->sin_addr.s_addr;
@@ -519,6 +520,8 @@ static void lowcomms_error_report(struct sock *sk)
   dlm_config.ci_tcp_port, sk->sk_err,
   sk->sk_err_soft);
}
+
+   dlm_lowcomms_next_addr();
 out:
read_unlock_bh(&sk->sk_callback_lock);
if (orig_report)
@@ -572,7 +575,7 @@ static void add_sock(struct socket *sock, struct connection 
*con)
 static void make_sockaddr(struct sockaddr_storage *saddr, uint16_t port,
  int *addr_len)
 {
-   saddr->ss_family =  dlm_local_addr[0]->ss_family;
+   saddr->ss_family =  dlm_local_addr[dlm_local_idx]->ss_family;
if (saddr->ss_family == AF_INET) {
struct sockaddr_in *in4_addr = (struct sockaddr_in *)saddr;
in4_addr->sin_port = cpu_to_be16(port);
@@ -1169,7 +1172,7 @@ static void tcp_connect_to_sock(struct connection *con)
 
/* Bind to our cluster-known address connecting to avoid
   routing problems */
-   memcpy(&src_addr, dlm_local_addr[0], sizeof(src_addr));
+   memcpy(&src_addr, dlm_local_addr[dlm_local_idx], sizeof(src_addr));
make_sockaddr(&src_addr, 0, &addr_len);
result = sock->ops->bind(sock, (struct sockaddr *) &src_addr,
 addr_len);
@@ -1213,6 +1216,7 @@ static void tcp_connect_to_sock(struct connection *con)
  con->retries, result);
mutex_unlock(&con->sock_mutex);
msleep(1000);
+   dlm_lowcomms_next_addr();
lowcomms_connect_sock(con);
return;
}
@@ -1293,6 +129

Re: [Cluster-devel] [PATCH 2/3] socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes

2019-01-08 Thread Arnd Bergmann
On Tue, Jan 8, 2019 at 6:24 AM Deepa Dinamani  wrote:
>
> SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval
> as the time format. struct timeval is not y2038 safe.
> The subsequent patches in the series add support for new socket
> timeout options with _NEW suffix that are y2038 safe.
> Rename the existing options with _OLD suffix forms so that the
> right option is enabled for userspace applications according
> to the architecture and time_t definition of libc.
>
> Signed-off-by: Deepa Dinamani 

Looks good overall. A few minor concerns:

The description above makes it sound like there is a bug with y2038-safety
in this particular interface, which I think is just not what you meant,
as the change is only needed for compatiblity with new C libraries
that work around the y2038 problem in general by changing their
timeval definition.

> diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
> index 76976d6e50f9..c98ad9777ad9 100644
> --- a/fs/dlm/lowcomms.c
> +++ b/fs/dlm/lowcomms.c
> @@ -1089,12 +1089,12 @@ static void sctp_connect_to_sock(struct connection 
> *con)
>  * since O_NONBLOCK argument in connect() function does not work here,
>  * then, we should restore the default value of this attribute.
>  */
> -   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *)&tv,
> +   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO_OLD, (char *)&tv,
>   sizeof(tv));
> result = sock->ops->connect(sock, (struct sockaddr *)&daddr, addr_len,
>0);
> memset(&tv, 0, sizeof(tv));
> -   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *)&tv,
> +   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO_OLD, (char *)&tv,
>   sizeof(tv));
>
> if (result == -EINPROGRESS)

It took me a bit to realize there that this is safe as well even if
we don't use SO_SNDTIMEO_NEW, for the same reason.

> --- a/net/compat.c
> +++ b/net/compat.c
> @@ -378,7 +378,7 @@ static int compat_sock_setsockopt(struct socket *sock, 
> int level, int optname,
> return do_set_attach_filter(sock, level, optname,
> optval, optlen);
> if (!COMPAT_USE_64BIT_TIME &&
> -   (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
> +   (optname == SO_RCVTIMEO_OLD || optname == SO_SNDTIMEO_OLD))
> return do_set_sock_timeout(sock, level, optname, optval, 
> optlen);
>
> return sock_setsockopt(sock, level, optname, optval, optlen);
> @@ -450,7 +450,7 @@ static int compat_sock_getsockopt(struct socket *sock, 
> int level, int optname,
> char __user *optval, int __user *optlen)
>  {
> if (!COMPAT_USE_64BIT_TIME &&
> -   (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
> +   (optname == SO_RCVTIMEO_OLD || optname == SO_SNDTIMEO_OLD))
> return do_get_sock_timeout(sock, level, optname, optval, 
> optlen);
> return sock_getsockopt(sock, level, optname, optval, optlen);
>  }

I looked at the original code and noticed that it's horrible, which of course
is not your fault, but I wonder if we should just fix it now to avoid that
get_fs()/set_fs() hack, since that code mostly implements what you
also have in your patch 3 (which is done more nicely).

I'll follow up with a patch to demonstrate what I mean here. Your third
patch will then just have to add another code path so we can handle
all of old_timespec32 (for existing 32-bit user space), __kernel_old_timespec
(for sparc64) and __kernel_sock_timeval (for everything else).

   Arnd



[Cluster-devel] [PATCH] socket: move compat timeout handling into sock.c

2019-01-08 Thread Arnd Bergmann
This is a cleanup to prepare for the addition of 64-bit time_t
in O_SNDTIMEO/O_RCVTIMEO. The existing compat handler seems
unnecessarily complex and error-prone, moving it all into the
main setsockopt()/getsockopt() implementation requires half
as much code and is easier to extend.

32-bit user space can now use old_timeval32 on both 32-bit
and 64-bit machines, while 64-bit code can use
__old_kernel_timeval.

Signed-off-by: Arnd Bergmann 
---
 net/compat.c| 66 +
 net/core/sock.c | 65 +++-
 2 files changed, 44 insertions(+), 87 deletions(-)

diff --git a/net/compat.c b/net/compat.c
index 959d1c51826d..ce8f6e8cdcd2 100644
--- a/net/compat.c
+++ b/net/compat.c
@@ -348,28 +348,6 @@ static int do_set_attach_filter(struct socket *sock, int 
level, int optname,
  sizeof(struct sock_fprog));
 }
 
-static int do_set_sock_timeout(struct socket *sock, int level,
-   int optname, char __user *optval, unsigned int optlen)
-{
-   struct compat_timeval __user *up = (struct compat_timeval __user 
*)optval;
-   struct timeval ktime;
-   mm_segment_t old_fs;
-   int err;
-
-   if (optlen < sizeof(*up))
-   return -EINVAL;
-   if (!access_ok(up, sizeof(*up)) ||
-   __get_user(ktime.tv_sec, &up->tv_sec) ||
-   __get_user(ktime.tv_usec, &up->tv_usec))
-   return -EFAULT;
-   old_fs = get_fs();
-   set_fs(KERNEL_DS);
-   err = sock_setsockopt(sock, level, optname, (char *)&ktime, 
sizeof(ktime));
-   set_fs(old_fs);
-
-   return err;
-}
-
 static int compat_sock_setsockopt(struct socket *sock, int level, int optname,
char __user *optval, unsigned int optlen)
 {
@@ -377,10 +355,6 @@ static int compat_sock_setsockopt(struct socket *sock, int 
level, int optname,
optname == SO_ATTACH_REUSEPORT_CBPF)
return do_set_attach_filter(sock, level, optname,
optval, optlen);
-   if (!COMPAT_USE_64BIT_TIME &&
-   (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
-   return do_set_sock_timeout(sock, level, optname, optval, 
optlen);
-
return sock_setsockopt(sock, level, optname, optval, optlen);
 }
 
@@ -417,44 +391,6 @@ COMPAT_SYSCALL_DEFINE5(setsockopt, int, fd, int, level, 
int, optname,
return __compat_sys_setsockopt(fd, level, optname, optval, optlen);
 }
 
-static int do_get_sock_timeout(struct socket *sock, int level, int optname,
-   char __user *optval, int __user *optlen)
-{
-   struct compat_timeval __user *up;
-   struct timeval ktime;
-   mm_segment_t old_fs;
-   int len, err;
-
-   up = (struct compat_timeval __user *) optval;
-   if (get_user(len, optlen))
-   return -EFAULT;
-   if (len < sizeof(*up))
-   return -EINVAL;
-   len = sizeof(ktime);
-   old_fs = get_fs();
-   set_fs(KERNEL_DS);
-   err = sock_getsockopt(sock, level, optname, (char *) &ktime, &len);
-   set_fs(old_fs);
-
-   if (!err) {
-   if (put_user(sizeof(*up), optlen) ||
-   !access_ok(up, sizeof(*up)) ||
-   __put_user(ktime.tv_sec, &up->tv_sec) ||
-   __put_user(ktime.tv_usec, &up->tv_usec))
-   err = -EFAULT;
-   }
-   return err;
-}
-
-static int compat_sock_getsockopt(struct socket *sock, int level, int optname,
-   char __user *optval, int __user *optlen)
-{
-   if (!COMPAT_USE_64BIT_TIME &&
-   (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
-   return do_get_sock_timeout(sock, level, optname, optval, 
optlen);
-   return sock_getsockopt(sock, level, optname, optval, optlen);
-}
-
 int compat_sock_get_timestamp(struct sock *sk, struct timeval __user 
*userstamp)
 {
struct compat_timeval __user *ctv;
@@ -527,7 +463,7 @@ static int __compat_sys_getsockopt(int fd, int level, int 
optname,
}
 
if (level == SOL_SOCKET)
-   err = compat_sock_getsockopt(sock, level,
+   err = sock_getsockopt(sock, level,
optname, optval, optlen);
else if (sock->ops->compat_getsockopt)
err = sock->ops->compat_getsockopt(sock, level,
diff --git a/net/core/sock.c b/net/core/sock.c
index 6aa2e7e0b4fb..e50b9a2abc92 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -335,14 +335,48 @@ int __sk_backlog_rcv(struct sock *sk, struct sk_buff *skb)
 }
 EXPORT_SYMBOL(__sk_backlog_rcv);
 
+static int sock_get_timeout(long timeo, void *optval)
+{
+   struct __kernel_old_timeval tv;
+
+   if (timeo == MAX_SCHEDULE_TIMEOUT) {
+   tv.tv_sec = 0;
+   tv.tv_usec = 0;
+   } else {
+   tv.tv_sec = ti

Re: [Cluster-devel] [PATCH 2/3] socket: Rename SO_RCVTIMEO/ SO_SNDTIMEO with _OLD suffixes

2019-01-08 Thread Deepa Dinamani
On Tue, Jan 8, 2019 at 12:04 PM Arnd Bergmann  wrote:
>
> On Tue, Jan 8, 2019 at 6:24 AM Deepa Dinamani  wrote:
> >
> > SO_RCVTIMEO and SO_SNDTIMEO socket options use struct timeval
> > as the time format. struct timeval is not y2038 safe.
> > The subsequent patches in the series add support for new socket
> > timeout options with _NEW suffix that are y2038 safe.
> > Rename the existing options with _OLD suffix forms so that the
> > right option is enabled for userspace applications according
> > to the architecture and time_t definition of libc.
> >
> > Signed-off-by: Deepa Dinamani 
>
> Looks good overall. A few minor concerns:
>
> The description above makes it sound like there is a bug with y2038-safety
> in this particular interface, which I think is just not what you meant,
> as the change is only needed for compatiblity with new C libraries
> that work around the y2038 problem in general by changing their
> timeval definition.

Right, there is y2038 safety issue, just the libc part that needs to be handled.
I will fix the commit text.

> > diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
> > index 76976d6e50f9..c98ad9777ad9 100644
> > --- a/fs/dlm/lowcomms.c
> > +++ b/fs/dlm/lowcomms.c
> > @@ -1089,12 +1089,12 @@ static void sctp_connect_to_sock(struct connection 
> > *con)
> >  * since O_NONBLOCK argument in connect() function does not work 
> > here,
> >  * then, we should restore the default value of this attribute.
> >  */
> > -   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *)&tv,
> > +   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO_OLD, (char *)&tv,
> >   sizeof(tv));
> > result = sock->ops->connect(sock, (struct sockaddr *)&daddr, 
> > addr_len,
> >0);
> > memset(&tv, 0, sizeof(tv));
> > -   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO, (char *)&tv,
> > +   kernel_setsockopt(sock, SOL_SOCKET, SO_SNDTIMEO_OLD, (char *)&tv,
> >   sizeof(tv));
> >
> > if (result == -EINPROGRESS)
>
> It took me a bit to realize there that this is safe as well even if
> we don't use SO_SNDTIMEO_NEW, for the same reason.

Correct.

> > --- a/net/compat.c
> > +++ b/net/compat.c
> > @@ -378,7 +378,7 @@ static int compat_sock_setsockopt(struct socket *sock, 
> > int level, int optname,
> > return do_set_attach_filter(sock, level, optname,
> > optval, optlen);
> > if (!COMPAT_USE_64BIT_TIME &&
> > -   (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
> > +   (optname == SO_RCVTIMEO_OLD || optname == SO_SNDTIMEO_OLD))
> > return do_set_sock_timeout(sock, level, optname, optval, 
> > optlen);
> >
> > return sock_setsockopt(sock, level, optname, optval, optlen);
> > @@ -450,7 +450,7 @@ static int compat_sock_getsockopt(struct socket *sock, 
> > int level, int optname,
> > char __user *optval, int __user *optlen)
> >  {
> > if (!COMPAT_USE_64BIT_TIME &&
> > -   (optname == SO_RCVTIMEO || optname == SO_SNDTIMEO))
> > +   (optname == SO_RCVTIMEO_OLD || optname == SO_SNDTIMEO_OLD))
> > return do_get_sock_timeout(sock, level, optname, optval, 
> > optlen);
> > return sock_getsockopt(sock, level, optname, optval, optlen);
> >  }
>
> I looked at the original code and noticed that it's horrible, which of course
> is not your fault, but I wonder if we should just fix it now to avoid that
> get_fs()/set_fs() hack, since that code mostly implements what you
> also have in your patch 3 (which is done more nicely).

I did think of getting rid of set_fs()/ get_fs() here.
But, I wasn't sure as the maintainers seemed to prefer to leave to the
old code as is in the other series for timestamps.

> I'll follow up with a patch to demonstrate what I mean here. Your third
> patch will then just have to add another code path so we can handle
> all of old_timespec32 (for existing 32-bit user space), __kernel_old_timespec
> (for sparc64) and __kernel_sock_timeval (for everything else).

Cool, I will rebase on top of your patch.

Thanks,
Deepa



Re: [Cluster-devel] [PATCH] socket: move compat timeout handling into sock.c

2019-01-08 Thread Deepa Dinamani
On Tue, Jan 8, 2019 at 12:10 PM Arnd Bergmann  wrote:
>
> This is a cleanup to prepare for the addition of 64-bit time_t
> in O_SNDTIMEO/O_RCVTIMEO. The existing compat handler seems
> unnecessarily complex and error-prone, moving it all into the
> main setsockopt()/getsockopt() implementation requires half
> as much code and is easier to extend.
>
> 32-bit user space can now use old_timeval32 on both 32-bit
> and 64-bit machines, while 64-bit code can use
> __old_kernel_timeval.
>
> Signed-off-by: Arnd Bergmann 

This will make the other series so much nicer. Thank you.

Acked-by: Deepa Dinamani