Re: [PATCH 2/2] zram: do not count duplicated pages as compressed

2017-05-21 Thread Minchan Kim
On Sun, May 21, 2017 at 12:04:27AM -0700, Christoph Hellwig wrote:
> On Wed, May 17, 2017 at 05:32:12PM +0900, Minchan Kim wrote:
> > Is it okay for a block device (esp. zram, which is a compressed ram
> > block device) to return garbage when an ongoing overwrite IO fails?
> > 
> > O_DIRECT write 4 block "aaa.." -> success
> > read  4 block "aaa.." -> success
> > O_DIRECT write 4 block "bbb.." -> fail
> > read  4 block "000.." -> is it okay?
> > 
> > Hope to get an answer from experts. :)
> 
> It's "okay" as it's what existing real block devices do (at least on a
> sector boundary).  It's not "nice" though, so if you can avoid it,
> please do.

That was my understanding, so I wanted to avoid it in what is just a
simple code refactoring. Your comment helps to confirm that thought.

Thanks, Christoph!



Re: [PATCH 2/2] zram: do not count duplicated pages as compressed

2017-05-21 Thread Christoph Hellwig
On Wed, May 17, 2017 at 05:32:12PM +0900, Minchan Kim wrote:
> Is it okay for a block device (esp. zram, which is a compressed ram
> block device) to return garbage when an ongoing overwrite IO fails?
> 
> O_DIRECT write 4 block "aaa.." -> success
> read  4 block "aaa.." -> success
> O_DIRECT write 4 block "bbb.." -> fail
> read  4 block "000.." -> is it okay?
> 
> Hope to get an answer from experts. :)

It's "okay" as it's what existing real block devices do (at least on a
sector boundary).  It's not "nice" though, so if you can avoid it,
please do.


Re: [PATCH 2/2] zram: do not count duplicated pages as compressed

2017-05-17 Thread Minchan Kim
Hi Sergey,

On Wed, May 17, 2017 at 06:14:23PM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
> 
> On (05/17/17 17:32), Minchan Kim wrote:
> [..]
> > > what we can return now is a `partially updated' data, with some new
> > > and some stale pages. this is quite unlikely to end up anywhere good.
> > > am I wrong?
> > > 
> > > why does `rd block 4' in your case cause an Oops? as a worst case scenario?
> > > application does not expect page to be 'all A' at this point. pages are
> > > likely to belong to some mappings/files/etc., and there is likely a data
> > > dependency between them, dunno C++ objects that span across pages or
> > > JPEG images, etc. so returning "new data   new data   stale data" is a bit
> > > fishy.
> > 
> > I thought more about it and started to get confused. :/
> 
> sorry, I'm not sure I see what's the source of your confusion :)
> 
> my point is - we should not let READ succeed if we know that WRITE
> failed. assume the JPEG image example:

I don't think we should do it. I will write down my thoughts below. :)

> 
> 
> over-write block 1 aaa->xxx OK
> over-write block 2 bbb->yyy OK
> over-write block 3 ccc->zzz error
> 
> reading that JPEG file
> 
> read block 1 xxx OK
> read block 2 yyy OK
> read block 3 ccc OK   << we should not return OK here. because
>  "xxxyyyccc" is not the correct JPEG file
>  anyway.
> 
> do you agree that telling the application that read() succeeded while
> at the same time returning the corrupted "xxxyyyccc" instead of
> "xxxyyyzzz" is not correct?

I don't agree. I *think* a block device is just a dumb device, so
zram doesn't need to know about any objects from the upper layer.
What zram should consider is basically the read/write success or
failure of an IO unit (maybe, a BIO).

So if we assume each step in the above example is a BIO unit, I think
it's no problem to return "xxxyyyccc".

What I meant by "started to get confused" was about atomicity, not the
above.

I think it's okay to return "ccc" instead of "zzz", but is it okay for
zram to return "000", which is neither "ccc" nor "zzz"?
My conclusion, after a discussion with one of my FS friends, is that
it's okay.

Let's think about it.

The FS requests a write of "aaa" to block 4 and it fails for some
reason (a H/W failure, or a S/W failure like ENOMEM). The interface to
catch the failure is the completion function registered via bio_endio,
which normally records the error with AS_EIO via mapping_set_error as
well as the PG_error flag of the page. In this case, the FS assumes
block 4 can contain stale data, neither "zzz" nor "ccc", because the
device may have failed in the middle of writing data to the block. If
the block device doesn't support atomic writes (I guess that is the
more common case), it is safe to consider the block to contain garbage
now, rather than either the old value or the new value.
(I hope I explained my thought well :/)
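
For reference, that completion path looks roughly like this on a
kernel of this era (a sketch only; the function name is illustrative
and not from any patch in this thread):

static void example_write_end_io(struct bio *bio)
{
	struct page *page = bio->bi_io_vec[0].bv_page;

	if (bio->bi_error) {
		/* record the failure: PG_error on the page,
		 * AS_EIO on the mapping */
		SetPageError(page);
		mapping_set_error(page->mapping, bio->bi_error);
	}
	end_page_writeback(page);
	bio_put(bio);
}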

Having said that, I think everyone prefers a block device that
supports atomicity (i.e., returns either the old or the new data), so
I am reluctant to change the behavior just for a simple refactoring.

Thanks.


Re: [PATCH 2/2] zram: do not count duplicated pages as compressed

2017-05-17 Thread Sergey Senozhatsky
Hello Minchan,

On (05/17/17 17:32), Minchan Kim wrote:
[..]
> > what we can return now is a `partially updated' data, with some new
> > and some stale pages. this is quite unlikely to end up anywhere good.
> > am I wrong?
> > 
> > why does `rd block 4' in your case cause an Oops? as a worst case scenario?
> > application does not expect page to be 'all A' at this point. pages are
> > likely to belong to some mappings/files/etc., and there is likely a data
> > dependency between them, dunno C++ objects that span across pages or
> > JPEG images, etc. so returning "new data   new data   stale data" is a bit
> > fishy.
> 
> I thought more about it and started to get confused. :/

sorry, I'm not sure I see what's the source of your confusion :)

my point is - we should not let READ succeed if we know that WRITE
failed. assume the JPEG image example:


over-write block 1 aaa->xxx OK
over-write block 2 bbb->yyy OK
over-write block 3 ccc->zzz error

reading that JPEG file

read block 1 xxx OK
read block 2 yyy OK
read block 3 ccc OK   << we should not return OK here. because
 "xxxyyyccc" is not the correct JPEG file
 anyway.

do you agree that telling the application that read() succeeded while
at the same time returning the corrupted "xxxyyyccc" instead of
"xxxyyyzzz" is not correct?



so how about this,

- if we fail to compress a page (S/W or H/W compressor error, depending
  on the particular setup) let's store it uncompressed (as a PAGE_SIZE-d
  zspool object).

?

this should do the trick. at least we will have correct data:
xxx  - compressed
yyy  - compressed
zzz  - uncompressed, because compressing back-end returned an error.
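
in code, the fallback could look roughly like this (a sketch only, not
a tested patch; it reuses the zcomp_compress() flow visible in the
diffs elsewhere in this thread):

	src = kmap_atomic(page);
	ret = zcomp_compress(zstrm, src, &comp_len);
	kunmap_atomic(src);

	if (ret || comp_len >= PAGE_SIZE) {
		/* compressor error (or incompressible data): keep the
		 * data correct by storing the raw page instead */
		comp_len = PAGE_SIZE;
		ret = 0;
	}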

thoughts?

-ss


Re: [PATCH 2/2] zram: do not count duplicated pages as compressed

2017-05-17 Thread Minchan Kim
Hi Sergey,

On Tue, May 16, 2017 at 04:36:17PM +0900, Sergey Senozhatsky wrote:
> On (05/16/17 16:16), Minchan Kim wrote:
> > > but would this be correct? the data is not valid - we failed to store
> > > the valid one. but instead we assure application that read()/swapin/etc.,
> > > depending on the usage scenario, is successful (even though the data is
> > > not what application really expects to see), application tries to use the
> > > data from that page and probably crashes (dunno, for example page contained
> > > hash tables with pointers that are not valid anymore, etc. etc.).
> > > 
> > > I'm not optimistic about stale data reads; it basically will look like
> > > data corruption to the application.
> > 
> > Hmm, I don't understand what you say.
> > My point is that zram_free_page should be done only if the whole
> > write operation is successful.
> > With your change, the following situation can happen.
> > 
> > write block 4, 'all A' -> success
> > read  block 4, 'all A' verified -> Good
> > write block 4, 'all B' -> but failed with ENOMEM
> > read  block 4  expected 'all A' but 'all 0' -> Oops
> 
> yes. 'all A' in #4 can be incorrect. zram can be used as a block device
> with a file system, and pid that does write op not necessarily does read
> op later. it can be a completely different application. e.g. compilation,
> or anything else.
> 
> suppose PID A does
> 
> wr block 1   all a
> wr block 2   all a + 1
> wr block 3   all a + 2
> wr block 4   all a + 3
> 
> now PID A does
> 
> wr block 1   all m
> wr block 2   all m + 1
> wr block 3   all m + 2
> wr block 4   failed. block still has 'all a + 3'.
> exit
> 
> another application, PID C, reads in the file and tries to do
> something sane with it
> 
> rd block 1   all m
> rd block 2   all m + 1
> rd block 3   all m + 2
> rd block 4   all a + 3  << this is dangerous. we should return
> error from read() here; not stale data.
> 
> 
> what we can return now is a `partially updated' data, with some new
> and some stale pages. this is quite unlikely to end up anywhere good.
> am I wrong?
> 
> why does `rd block 4' in your case cause an Oops? as a worst case scenario?
> application does not expect page to be 'all A' at this point. pages are
> likely to belong to some mappings/files/etc., and there is likely a data
> dependency between them, dunno C++ objects that span across pages or
> JPEG images, etc. so returning "new data   new data   stale data" is a bit
> fishy.

I thought more about it and started to get confused. :/
So, let's Cc linux-block and fs people.

The question is:

Is it okay for a block device (esp. zram, which is a compressed ram
block device) to return garbage when an ongoing overwrite IO fails?

O_DIRECT write 4 block "aaa.." -> success
read  4 block "aaa.." -> success
O_DIRECT write 4 block "bbb.." -> fail
read  4 block "000.." -> is it okay?

Hope to get an answer from experts. :)
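
A hypothetical userspace reproducer for that sequence (it assumes
/dev/zram0 with 4K blocks and some way to make the second write fail,
e.g. error injection; names and numbers are illustrative):

#define _GNU_SOURCE		/* for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char *buf;
	int fd = open("/dev/zram0", O_RDWR | O_DIRECT);

	if (fd < 0 || posix_memalign((void **)&buf, 4096, 4096))
		return 1;

	memset(buf, 'a', 4096);
	if (pwrite(fd, buf, 4096, 4 * 4096) != 4096)	/* "aaa.." */
		perror("write aaa");
	memset(buf, 'b', 4096);
	if (pwrite(fd, buf, 4096, 4 * 4096) != 4096)	/* assume it fails */
		perror("write bbb");
	if (pread(fd, buf, 4096, 4 * 4096) != 4096)
		perror("read");
	printf("block 4 now starts with 0x%02x\n", (unsigned char)buf[0]);
	close(fd);
	return 0;
}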


Re: [PATCH 2/2] zram: do not count duplicated pages as compressed

2017-05-16 Thread Sergey Senozhatsky
On (05/16/17 16:16), Minchan Kim wrote:
> > but would this be correct? the data is not valid - we failed to store
> > the valid one. but instead we assure application that read()/swapin/etc.,
> > depending on the usage scenario, is successful (even though the data is
> > not what application really expects to see), application tries to use the
> > data from that page and probably crashes (dunno, for example page contained
> > hash tables with pointers that are not valid anymore, etc. etc.).
> > 
> > I'm not optimistic about stale data reads; it basically will look like
> > data corruption to the application.
> 
> Hmm, I don't understand what you say.
> My point is that zram_free_page should be done only if the whole
> write operation is successful.
> With your change, the following situation can happen.
> 
> write block 4, 'all A' -> success
> read  block 4, 'all A' verified -> Good
> write block 4, 'all B' -> but failed with ENOMEM
> read  block 4  expected 'all A' but 'all 0' -> Oops

yes. 'all A' in #4 can be incorrect. zram can be used as a block device
with a file system, and the pid that does the write op does not
necessarily do the read op later. it can be a completely different
application, e.g. compilation, or anything else.

suppose PID A does

wr block 1   all a
wr block 2   all a + 1
wr block 3   all a + 2
wr block 4   all a + 3

now PID A does

wr block 1   all m
wr block 2   all m + 1
wr block 3   all m + 2
wr block 4   failed. block still has 'all a + 3'.
exit

another application, PID C, reads in the file and tries to do
something sane with it

rd block 1   all m
rd block 2   all m + 1
rd block 3   all m + 2
rd block 4   all a + 3  << this is dangerous. we should return
   error from read() here; not stale data.


what we can return now is a `partially updated' data, with some new
and some stale pages. this is quite unlikely to end up anywhere good.
am I wrong?

why does `rd block 4' in your case cause an Oops? as a worst case scenario?
application does not expect page to be 'all A' at this point. pages are
likely to belong to some mappings/files/etc., and there is likely a data
dependency between them, dunno C++ objects that span across pages or
JPEG images, etc. so returning "new data   new data   stale data" is a bit
fishy.

-ss


Re: [PATCH 2/2] zram: do not count duplicated pages as compressed

2017-05-16 Thread Minchan Kim
On Tue, May 16, 2017 at 02:45:33PM +0900, Sergey Senozhatsky wrote:
> On (05/16/17 14:26), Minchan Kim wrote:
> [..]
> > > +   /*
> > > +    * Free memory associated with this sector
> > > +    * before overwriting unused sectors.
> > > +    */
> > > +   zram_slot_lock(zram, index);
> > > +   zram_free_page(zram, index);
> > 
> > Hmm, zram_free should happen only if the write is done successfully.
> > Otherwise, we lose the valid data even though the write IO failed.
> 
> but would this be correct? the data is not valid - we failed to store
> the valid one. but instead we assure application that read()/swapin/etc.,
> depending on the usage scenario, is successful (even though the data is
> not what application really expects to see), application tries to use the
> > data from that page and probably crashes (dunno, for example page contained
> 
> I'm not optimistic about stale data reads; it basically will look like
> data corruption to the application.

Hmm, I don't understand what you say.
My point is that zram_free_page should be done only if the whole
write operation is successful.
With your change, the following situation can happen.

write block 4, 'all A' -> success
read  block 4, 'all A' verified -> Good
write block 4, 'all B' -> but failed with ENOMEM
read  block 4  expected 'all A' but 'all 0' -> Oops

That is the problem I pointed out.
If I missed something, could you elaborate a bit?

Thanks!
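
In short, the ordering being argued for here is (a sketch; the zram_*
helpers are the ones from this thread, and example_compress_and_store()
is a hypothetical stand-in for the compress + store path):

static int example_bvec_write(struct zram *zram, struct page *page,
			      u32 index)
{
	struct zram_entry *entry;
	unsigned int comp_len;
	int ret;

	ret = example_compress_and_store(zram, page, &entry, &comp_len);
	if (ret)
		return ret;	/* old data at @index stays intact */

	zram_slot_lock(zram, index);
	zram_free_page(zram, index);	/* free the old object only now */
	zram_set_entry(zram, index, entry);
	zram_set_obj_size(zram, index, comp_len);
	zram_slot_unlock(zram, index);
	return 0;
}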

> 
> > > +
> > > if (zram_same_page_write(zram, index, page))
> > > -   return 0;
> > > +   goto out_unlock;
> > >  
> > > entry = zram_dedup_find(zram, page, &checksum);
> > > if (entry) {
> > > comp_len = entry->len;
> > > +   zram_set_flag(zram, index, ZRAM_DUP);
> >
> > In case of hitting dedup, we shouldn't increase compr_data_size.
> 
> no, we should not. you are right. my "... patch" is incomplete and
> wrong. please don't pay too much attention to it.
> 
> 
> > If we fix the above two problems, do you think it's still cleaner?
> > (I don't mean to be reluctant about your suggestion. It's just a
> >  real question to know your thought. :)
> 
> do you mean code duplication and stale data read?
> 
> I'd probably prefer to address stale data reads separately.
> but it seems that stale reads fix will re-do parts of your
> 0002 patch and, at least potentially, reduce code duplication.
> 
> so we can go with your 0002 and then stale reads fix will try
> to reduce code duplication (unless we want to have 4 places doing
> the same thing :) )
> 
>   -ss


Re: [PATCH 2/2] zram: do not count duplicated pages as compressed

2017-05-15 Thread Sergey Senozhatsky
On (05/16/17 14:26), Minchan Kim wrote:
[..]
> > +   /*
> > +    * Free memory associated with this sector
> > +    * before overwriting unused sectors.
> > +    */
> > +   zram_slot_lock(zram, index);
> > +   zram_free_page(zram, index);
> 
> Hmm, zram_free should happen only if the write is done successfully.
> Otherwise, we lose the valid data even though the write IO failed.

but would this be correct? the data is not valid - we failed to store
the valid one. but instead we assure application that read()/swapin/etc.,
depending on the usage scenario, is successful (even though the data is
not what application really expects to see), application tries to use the
data from that page and probably crashes (dunno, for example page contained
hash tables with pointers that are not valid anymore, etc. etc.).

I'm not optimistic about stale data reads; it basically will look like
data corruption to the application.

> > +
> > if (zram_same_page_write(zram, index, page))
> > -   return 0;
> > +   goto out_unlock;
> >  
> > entry = zram_dedup_find(zram, page, &checksum);
> > if (entry) {
> > comp_len = entry->len;
> > +   zram_set_flag(zram, index, ZRAM_DUP);
>
> In case of hitting dedup, we shouldn't increase compr_data_size.

no, we should not. you are right. my "... patch" is incomplete and
wrong. please don't pay too much attention to it.


> If we fix the above two problems, do you think it's still cleaner?
> (I don't mean to be reluctant about your suggestion. It's just a
>  real question to know your thought. :)

do you mean code duplication and stale data read?

I'd probably prefer to address stale data reads separately.
but it seems that stale reads fix will re-do parts of your
0002 patch and, at least potentially, reduce code duplication.

so we can go with your 0002 and then stale reads fix will try
to reduce code duplication (unless we want to have 4 places doing
the same thing :) )

-ss


Re: [PATCH 2/2] zram: do not count duplicated pages as compressed

2017-05-15 Thread Minchan Kim
On Tue, May 16, 2017 at 11:36:15AM +0900, Sergey Senozhatsky wrote:
> > > > @@ -794,7 +801,15 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
> > > > entry = zram_dedup_find(zram, page, &checksum);
> > > > if (entry) {
> > > > comp_len = entry->len;
> > > > -   goto found_dup;
> > > > +   zram_slot_lock(zram, index);
> > > > +   zram_free_page(zram, index);
> > > > +   zram_set_flag(zram, index, ZRAM_DUP);
> > > > +   zram_set_entry(zram, index, entry);
> > > > +   zram_set_obj_size(zram, index, comp_len);
> > > > +   zram_slot_unlock(zram, index);
> > > > +   atomic64_add(comp_len, &zram->stats.dup_data_size);
> > > > +   atomic64_inc(&zram->stats.pages_stored);
> > > > +   return 0;
> > > 
> > > hm. that's a somewhat big code duplication. isn't it?
> > 
> > Yub. 3 parts: the above part, zram_same_page_write, and the tail of
> > __zram_bvec_write.
> 
> hmm... good question... I can hardly think of anything significantly
> better; zram object handling is now a mix of flags, entries,
> ref_counters, etc. etc. maybe we can merge some of those ops if we
> keep the slot locked through the entire __zram_bvec_write(), but
> that does not look attractive.
> 
> something ABSOLUTELY untested and incomplete. not even compile tested (!).
> 99% broken and stupid (!). but there is one thing that it has revealed,
> and thus it is incomplete. see below:
> 
> ---
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 372602c7da49..b31543c40d54 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -509,11 +509,8 @@ static bool zram_same_page_write(struct zram *zram, u32 index,
> if (page_same_filled(mem, &element)) {
> kunmap_atomic(mem);
> /* Free memory associated with this sector now. */
> -   zram_slot_lock(zram, index);
> -   zram_free_page(zram, index);
> zram_set_flag(zram, index, ZRAM_SAME);
> zram_set_element(zram, index, element);
> -   zram_slot_unlock(zram, index);
>  
> atomic64_inc(&zram->stats.same_pages);
> return true;
> @@ -778,7 +775,7 @@ static int zram_compress(struct zram *zram, struct zcomp_strm **zstrm,
>  
>  static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
>  {
> -   int ret;
> +   int ret = 0;
> struct zram_entry *entry;
> unsigned int comp_len;
> void *src, *dst;
> @@ -786,12 +783,20 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
> struct page *page = bvec->bv_page;
> u32 checksum;
>  
> +   /*
> +    * Free memory associated with this sector
> +    * before overwriting unused sectors.
> +    */
> +   zram_slot_lock(zram, index);
> +   zram_free_page(zram, index);

Hmm, zram_free should happen only if the write is done successfully.
Otherwise, we lose the valid data even though the write IO failed.

> +
> if (zram_same_page_write(zram, index, page))
> -   return 0;
> +   goto out_unlock;
>  
> entry = zram_dedup_find(zram, page, &checksum);
> if (entry) {
> comp_len = entry->len;
> +   zram_set_flag(zram, index, ZRAM_DUP);

In case of hitting dedup, we shouldn't increase compr_data_size.
If we fix the above two problems, do you think it's still cleaner?
(I don't mean to be reluctant about your suggestion. It's just a
 real question to know your thought. :)


> goto found_dup;
> }
>  
> @@ -799,7 +804,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
> ret = zram_compress(zram, &zstrm, page, &entry, &comp_len);
> if (ret) {
> zcomp_stream_put(zram->comp);
> -   return ret;
> +   goto out_unlock;
> }
>  
> dst = zs_map_object(zram->mem_pool,
> @@ -817,20 +822,16 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
> zram_dedup_insert(zram, entry, checksum);
>  
>  found_dup:
> -   /*
> -    * Free memory associated with this sector
> -    * before overwriting unused sectors.
> -    */
> -   zram_slot_lock(zram, index);
> -   zram_free_page(zram, index);
> zram_set_entry(zram, index, entry);
> zram_set_obj_size(zram, index, comp_len);
> -   zram_slot_unlock(zram, index);
>  
> /* Update stats */
> atomic64_add(comp_len, &zram->stats.compr_data_size);
> atomic64_inc(&zram->stats.pages_stored);
> -   return 0;
> +
> +out_unlock:
> +   zram_slot_unlock(zram, index);
> +   return ret;
>  }
> 
> ---
> 
> 
> namely,
> that zram_compress() error return path from __zram_bvec_write().
> 
> currently, we touch the existing compressed 

Re: [PATCH 2/2] zram: do not count duplicated pages as compressed

2017-05-15 Thread Sergey Senozhatsky
Hello Minchan,

On (05/16/17 10:59), Minchan Kim wrote:
> Hi Sergey,
>

[..]
> You mean this?
> 
> static void zram_free_page(..) {
> if (zram_test_flag(zram, index, ZRAM_SAME))
> ...
> 
> if (!entry)
> return;
> 
> if (zram_dedup_enabled(zram) && zram_test_flag(zram, index, ZRAM_DUP)) {
> zram_clear_flag(ZRAM_DUP);
> atomic64_sub(entry->len, &zram->stats.dup_data_size);
> } else {
> atomic64_sub(zram_get_obj_size(zram, index),
> &zram->stats.compr_data_size);
> }
> 
> zram_entry_free
> zram_set_entry
> zram_set_obj_size
> }

yeah, something like this.

> > > @@ -794,7 +801,15 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
> > >   entry = zram_dedup_find(zram, page, &checksum);
> > >   if (entry) {
> > >   comp_len = entry->len;
> > > - goto found_dup;
> > > + zram_slot_lock(zram, index);
> > > + zram_free_page(zram, index);
> > > + zram_set_flag(zram, index, ZRAM_DUP);
> > > + zram_set_entry(zram, index, entry);
> > > + zram_set_obj_size(zram, index, comp_len);
> > > + zram_slot_unlock(zram, index);
> > > + atomic64_add(comp_len, &zram->stats.dup_data_size);
> > > + atomic64_inc(&zram->stats.pages_stored);
> > > + return 0;
> > 
> > hm. that's a somewhat big code duplication. isn't it?
> 
> Yub. 3 parts: the above part, zram_same_page_write, and the tail of __zram_bvec_write.

hmm... good question... I can hardly think of anything significantly
better; zram object handling is now a mix of flags, entries,
ref_counters, etc. etc. maybe we can merge some of those ops if we
keep the slot locked through the entire __zram_bvec_write(), but
that does not look attractive.

something ABSOLUTELY untested and incomplete. not even compile tested (!).
99% broken and stupid (!). but there is one thing that it has revealed,
and thus it is incomplete. see below:

---

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 372602c7da49..b31543c40d54 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -509,11 +509,8 @@ static bool zram_same_page_write(struct zram *zram, u32 index,
if (page_same_filled(mem, &element)) {
kunmap_atomic(mem);
/* Free memory associated with this sector now. */
-   zram_slot_lock(zram, index);
-   zram_free_page(zram, index);
zram_set_flag(zram, index, ZRAM_SAME);
zram_set_element(zram, index, element);
-   zram_slot_unlock(zram, index);
 
atomic64_inc(&zram->stats.same_pages);
return true;
@@ -778,7 +775,7 @@ static int zram_compress(struct zram *zram, struct zcomp_strm **zstrm,
 
 static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
 {
-   int ret;
+   int ret = 0;
struct zram_entry *entry;
unsigned int comp_len;
void *src, *dst;
@@ -786,12 +783,20 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
struct page *page = bvec->bv_page;
u32 checksum;
 
+   /*
+    * Free memory associated with this sector
+    * before overwriting unused sectors.
+    */
+   zram_slot_lock(zram, index);
+   zram_free_page(zram, index);
+
if (zram_same_page_write(zram, index, page))
-   return 0;
+   goto out_unlock;
 
entry = zram_dedup_find(zram, page, &checksum);
if (entry) {
comp_len = entry->len;
+   zram_set_flag(zram, index, ZRAM_DUP);
goto found_dup;
}
 
@@ -799,7 +804,7 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
ret = zram_compress(zram, &zstrm, page, &entry, &comp_len);
if (ret) {
zcomp_stream_put(zram->comp);
-   return ret;
+   goto out_unlock;
}
 
dst = zs_map_object(zram->mem_pool,
@@ -817,20 +822,16 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
zram_dedup_insert(zram, entry, checksum);
 
 found_dup:
-   /*
-    * Free memory associated with this sector
-    * before overwriting unused sectors.
-    */
-   zram_slot_lock(zram, index);
-   zram_free_page(zram, index);
zram_set_entry(zram, index, entry);
zram_set_obj_size(zram, index, comp_len);
-   zram_slot_unlock(zram, index);
 
/* Update stats */
atomic64_add(comp_len, &zram->stats.compr_data_size);
atomic64_inc(&zram->stats.pages_stored);
-   return 0;
+
+out_unlock:
+   zram_slot_unlock(zram, index);
+   return ret;
 }

---


namely,
that zram_compress() error return path from __zram_bvec_write().

currently, we touch the existing compressed object and overwrite it only
when we successfully compressed a 
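
spelled out, the error path in the diff above now does this (just
restating the hunks, nothing new):

	zram_slot_lock(zram, index);
	zram_free_page(zram, index);	/* the old object is gone here */

	ret = zram_compress(zram, &zstrm, page, &entry, &comp_len);
	if (ret) {
		zcomp_stream_put(zram->comp);
		goto out_unlock;	/* slot left holding no data */
	}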

Re: [PATCH 2/2] zram: do not count duplicated pages as compressed

2017-05-15 Thread Minchan Kim
Hi Sergey,

On Tue, May 16, 2017 at 10:30:22AM +0900, Sergey Senozhatsky wrote:
> On (05/15/17 16:41), Minchan Kim wrote:
> [..]
> > diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> > index b885356551e9..8152e405117b 100644
> > --- a/drivers/block/zram/zram_drv.c
> > +++ b/drivers/block/zram/zram_drv.c
> > @@ -624,15 +624,22 @@ static void zram_free_page(struct zram *zram, size_t index)
> > return;
> > }
> >  
> > +   if (zram_dedup_enabled(zram) &&
> > +   zram_test_flag(zram, index, ZRAM_DUP)) {
> > +   zram_clear_flag(zram, index, ZRAM_DUP);
> > +   atomic64_sub(entry->len, &zram->stats.dup_data_size);
> > +   goto out;
> > +   }
> 
> so that `goto' there is to just jump over ->stats.compr_data_size?

Yub.

> can you sub ->stats.compr_data_size before the `if' and avoid labels?


> 
> > if (!entry)
> > return;
> 
> shouldn't this `if' be moved before `if (zram_dedup_enabled(zram)`?

You mean this?

static void zram_free_page(..) {
if (zram_test_flag(zram, index, ZRAM_SAME))
...

if (!entry)
return;

if (zram_dedup_enabled(zram) && zram_test_flag(zram, index, ZRAM_DUP)) {
zram_clear_flag(ZRAM_DUP);
atomic64_sub(entry->len, &zram->stats.dup_data_size);
} else {
atomic64_sub(zram_get_obj_size(zram, index),
&zram->stats.compr_data_size);
}

zram_entry_free
zram_set_entry
zram_set_obj_size
}

> 
> 
> [..]
> > @@ -794,7 +801,15 @@ static int __zram_bvec_write(struct zram *zram, struct 
> > bio_vec *bvec, u32 index)
> > >   entry = zram_dedup_find(zram, page, &checksum);
> > if (entry) {
> > comp_len = entry->len;
> > -   goto found_dup;
> > +   zram_slot_lock(zram, index);
> > +   zram_free_page(zram, index);
> > +   zram_set_flag(zram, index, ZRAM_DUP);
> > +   zram_set_entry(zram, index, entry);
> > +   zram_set_obj_size(zram, index, comp_len);
> > +   zram_slot_unlock(zram, index);
> > > +   atomic64_add(comp_len, &zram->stats.dup_data_size);
> > > +   atomic64_inc(&zram->stats.pages_stored);
> > +   return 0;
> 
> hm. that's a somewhat big code duplication. isn't it?

Yub. 3 parts: the above part, zram_same_page_write, and the tail of __zram_bvec_write.

Do you have any idea? Feel free to suggest. :)
Thanks.


Re: [PATCH 2/2] zram: do not count duplicated pages as compressed

2017-05-15 Thread Sergey Senozhatsky
On (05/15/17 16:41), Minchan Kim wrote:
[..]
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index b885356551e9..8152e405117b 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -624,15 +624,22 @@ static void zram_free_page(struct zram *zram, size_t index)
>   return;
>   }
>  
> + if (zram_dedup_enabled(zram) &&
> + zram_test_flag(zram, index, ZRAM_DUP)) {
> + zram_clear_flag(zram, index, ZRAM_DUP);
> + atomic64_sub(entry->len, &zram->stats.dup_data_size);
> + goto out;
> + }

so that `goto' there is to just jump over ->stats.compr_data_size?
can you sub ->stats.compr_data_size before the `if' and avoid labels?

>   if (!entry)
>   return;

shouldn't this `if' be moved before `if (zram_dedup_enabled(zram)`?


[..]
> @@ -794,7 +801,15 @@ static int __zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index)
>   entry = zram_dedup_find(zram, page, &checksum);
>   if (entry) {
>   comp_len = entry->len;
> - goto found_dup;
> + zram_slot_lock(zram, index);
> + zram_free_page(zram, index);
> + zram_set_flag(zram, index, ZRAM_DUP);
> + zram_set_entry(zram, index, entry);
> + zram_set_obj_size(zram, index, comp_len);
> + zram_slot_unlock(zram, index);
> + atomic64_add(comp_len, &zram->stats.dup_data_size);
> + atomic64_inc(&zram->stats.pages_stored);
> + return 0;

hm. that's a somewhat big code duplication. isn't it?

-ss

