On 20/02/2023 14:21, Juergen Gross wrote:
On 20.02.23 15:15, Julien Grall wrote:
On 20/02/2023 13:49, Juergen Gross wrote:
On 20.02.23 13:07, Julien Grall wrote:
Hi Juergen,
On 20/02/2023 11:04, Juergen Gross wrote:
On 20.02.23 10:46, Julien Grall wrote:
Hi Juergen,
On 20/01/2023 10:00, Juergen Gross wrote:
The accounting for the number of nodes of a domain in an active
transaction is not working correctly, as it allows creating an
arbitrary number of nodes. The transaction will finally fail due to
exceeding the number of nodes quota, but before closing the
transaction an unprivileged guest could cause Xenstore to use a lot
of memory.
I know I said I would delay my decision on this patch. However, I
was still expecting the commit message to be updated based on our
previous discussion.
As said before, I was waiting for our discussion to settle before
doing the update.
It is not very good practice to resend a patch without recording
the disagreement, because new reviewers may not be aware of it and
the patch could end up being committed by mistake...
You said you wanted to look into this patch in detail after the previous
series, so I moved it into this series. The wording update would strongly
depend on the outcome of the discussion IMO, so I didn't do it yet.
This could have been mentioned after ---. That would allow people to
understand the concern and then participate.
Will do so in future.
Also thinking more about it, "The transaction will finally fail
due to exceeding the number of nodes quota" may not be true for a
couple of reasons:
1) The transaction may remove a node afterwards.
Yes, and? Even though it happens in a transaction, it is still a
violation of the quota. Even outside a transaction you could use the
same reasoning, but you don't (which is correct, of course).
I can understand your point. However, to me, the goal of the
transaction is to commit everything in one go at the end. So the
violations in the middle should not matter.
Of course they should.
We wouldn't allow writing over-sized nodes, even if they could be
rewritten in the same transaction with a smaller size.
I view the two differently.
We wouldn't allow creating nodes which would violate the overall memory
consumption quota.
We wouldn't allow nodes with more permission entries than allowed,
even if they
could be reduced in the same transaction again.
I don't see why the number of nodes shouldn't be taken into account.
Furthermore, I would expect a transaction to work on a snapshot of
the system. So it feels really strange to me that we are constantly
checking the quota with the updated values rather than the initial ones.
You are aware that the code without this patch is just neglecting the
nodes created in the transaction? It is just using the current number
of nodes outside any transaction for the quota check.
Are you referring to the quota check within the transaction?
I'm referring to the quota check in create_node().
So I could easily create a scenario where my new code will succeed,
but the current one will fail:

Assume the node quota is 1000, and at the start of the transaction the
guest owns 999 nodes. In the transaction the guest deletes 10 nodes,
then dom0 creates 5 additional nodes owned by the guest. The central
node counter is now 1004 (over quota, as the in-transaction deletions
are not reflected there yet), while the in-transaction count is 994.
In the transaction the guest can now happily create a new node (#995)
with my patch, but will fail to do so without it (it sees the 1004 due
to the transaction count being ignored).
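(Purely to make the arithmetic explicit, here is a tiny standalone model
of the two counters; this is illustrative only, not xenstored code:)

#include <stdio.h>

int main(void)
{
    const unsigned int quota = 1000;
    unsigned int central = 999;   /* guest-owned nodes as seen outside the transaction */
    unsigned int in_trans = 999;  /* per-transaction view of the same count */

    in_trans -= 10;               /* guest deletes 10 nodes inside the transaction;
                                     the central counter does not see this yet */
    central += 5;                 /* dom0 creates 5 guest-owned nodes outside it */
    in_trans += 5;

    printf("central counter: %u (%s quota)\n", central,
           central > quota ? "over" : "within");
    printf("in-transaction:  %u (%s quota)\n", in_trans,
           in_trans > quota ? "over" : "within");

    /* Prints 1004 (over) and 994 (within): the unpatched check looks at
       the central counter, the patched one at the in-transaction count. */
    return 0;
}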
This doesn't sound correct. Do you have any test I could use to
check the behavior?
Try it:
- create nodes in a guest until you hit the ENOSPC return code due to
  too many nodes
- start a transaction deleting some nodes, then try to create another
  one, which will fail with ENOSPC (a userspace sketch of these steps
  follows below)
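If it helps, roughly the same sequence can be driven from guest userspace
via libxenstore; a rough sketch, assuming the standard xenstore.h API
(paths are only illustrative, error handling trimmed):

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <xenstore.h>

int main(void)
{
    struct xs_handle *xs = xs_open(0);
    xs_transaction_t t;
    unsigned int i = 0;
    char path[64];

    if ( !xs )
        return 1;

    /* Step 1: create nodes until the quota is hit (ENOSPC). */
    for ( ;; )
    {
        snprintf(path, sizeof(path), "data/quota/%u", i);
        if ( !xs_write(xs, XBT_NULL, path, "x", 1) )
        {
            if ( errno == ENOSPC )
                break;
            fprintf(stderr, "unexpected error: %s\n", strerror(errno));
            return 1;
        }
        i++;
    }
    printf("hit ENOSPC after %u nodes\n", i);

    /* Step 2: inside a transaction, delete one node, then create another. */
    t = xs_transaction_start(xs);
    xs_rm(xs, t, "data/quota/0");
    if ( !xs_write(xs, t, "data/quota/extra", "x", 1) )
        printf("create inside transaction failed: %s\n", strerror(errno));
    else
        printf("create inside transaction succeeded\n");
    xs_transaction_end(xs, t, true /* abort, leave the store as it was */);

    xs_close(xs);
    return 0;
}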
So I have recreated an XTF test which I think matches what you wrote [1].

It is indeed failing without your patch. But there is still some weird
behavior here. I would expect the creation of the node to also fail if
the node were instead removed outside of the transaction.

This is not the case because we are looking at the current count. So
shouldn't we snapshot the global count?
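For illustration, the kind of snapshotting I have in mind is sketched
below; the names are made up and this is not the real xenstored code,
just the shape of the check:

#include <stdbool.h>

/* Illustrative only. */
struct node_quota_snapshot {
    unsigned int at_start;  /* domain's global node count at transaction start */
    int delta;              /* nodes created minus nodes removed in the transaction */
};

/* Quota check for node creation inside the transaction. */
static bool quota_ok(const struct node_quota_snapshot *s, unsigned int quota_max)
{
    return (int)s->at_start + s->delta <= (int)quota_max;
}

With something like this, a node removed outside the transaction after
it started would not make the in-transaction create succeed.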
Cheers,
[1]
#include <xtf.h>

const char test_title[] = "Test xenstore-transaction-limit-1";

#define BASELINE_DIR "data"
#define BASELINE_MAX_DIR BASELINE_DIR"/max"
#define MAX_NODES 2000

static bool max_out_nodes(void)
{
    unsigned int parent_id = 0, child_id = 0, nr_nodes = 0;
    xs_transaction_t tid = XBT_NULL;

    printk("Maxing out nodes\n");

    do
    {
        int rc;
        char path[256];

        rc = snprintf(path, ARRAY_SIZE(path), "%s/%u/%u",
                      BASELINE_MAX_DIR, parent_id, child_id);
        if ( rc >= (int)ARRAY_SIZE(path) )
        {
            xtf_error("Unable to create the path\n");
            return false;
        }

        rc = xenstore_mkdir(tid, path);

        /* Xenstored will return ENOSPC if we exceed a quota */
        if ( rc == ENOSPC )
        {
            /*
             * If we can't write the first child, then this likely means
             * we exceed the maximum number of nodes quota. Consider it a
             * success.
             *
             * Otherwise, we may have hit the maximum size of the parent.
             * Switch to a different parent.
             */
            if ( child_id == 0 )
            {
                printk("Stopped after %u iterations\n", nr_nodes);
                return true;
            }
            else
            {
                printk("Parent ID %u: created %u children\n",
                       parent_id, child_id);
                parent_id++;
                child_id = 0;
                continue;
            }
        }
        else if ( rc )
        {
            xtf_error("Unexpected error %d\n", rc);
            return false;
        }
        else
        {
            nr_nodes++;
            child_id++;
        }
    } while ( nr_nodes < MAX_NODES );

    xtf_error("Created %u nodes and the quota is still not reached?\n",
              nr_nodes);

    return false;
}

void test_main(void)
{
    xs_transaction_t tid;
    int rc;

    if ( !max_out_nodes() )
        return;

    tid = xenstore_transaction_start();
    if ( tid == XBT_NULL )
    {
        xtf_error("Cannot start transaction\n");
        return;
    }

    /* Remove one of the nodes within the transaction */
    rc = xenstore_rm(tid, "data/max/0/0");
    if ( rc )
    {
        xtf_error("Cannot remove the node data/max/0/0 (rc = %d)\n", rc);
        return;
    }

    /* Creating a new node should work because we removed one */
    rc = xenstore_write(tid, "data/foo", "bar");
    if ( rc )
    {
        xtf_error("Cannot create node data/foo (rc = %d)\n", rc);
        return;
    }

    xtf_success(NULL);
}
--
Julien Grall