[chromium-dev] Re: Spellchecker and memory-mapped dicts

2009-10-22 Thread Peter Kasting
It seems like loading into memory will give more predictable access
times for the first words that get spellchecked (up to the point where
the memory-mapped file would have been entirely paged in).  My memory
purger code will (hopefully) dump the dictionary out of memory
occasionally, which makes the behavior right after open matter more.
Combining the two, I think loading into memory is a win.

I doubt the dictionaries are structured such that memory-mapping the file
will reduce the browser process memory footprint in a meaningful way.

PK

--~--~-~--~~~---~--~~
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
http://groups.google.com/group/chromium-dev
-~--~~~~--~~--~--~---



[chromium-dev] Re: Spellchecker and memory-mapped dicts

2009-10-22 Thread Chris Evans
There's also option 3)

Pre-fault the mmap()ed region in the file thread upon dictionary
initialization.
On Linux at least, that may give you better behaviour than malloc() + read()
in the event of memory pressure.

Cheers
Chris
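
A rough sketch of what "pre-faulting" could look like, written in Python
for brevity rather than in the C++ Chromium would use. The function name
map_and_prefault is made up for illustration. It touches one byte per page
so the page faults happen up front (for example on a background thread);
nothing here guarantees the pages stay resident afterwards.

```python
import mmap


def map_and_prefault(path):
    """Map a file read-only and touch one byte per page.

    Returns the mapping and the number of pages touched.
    """
    with open(path, "rb") as f:
        # The mapping remains valid after the file object is closed.
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    touched = 0
    # Reading a single byte from each page makes the kernel page it in now,
    # instead of on the first spellcheck lookup that happens to land there.
    for offset in range(0, len(mapping), mmap.PAGESIZE):
        _ = mapping[offset]
        touched += 1
    return mapping, touched
```

Under memory pressure these pages are clean and file-backed, so the kernel
can drop them without writing anything to swap.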

On Thu, Oct 22, 2009 at 1:39 PM, Evan Stade est...@chromium.org wrote:


 Hi all,

 At its last meeting the jank task force discussed improving
 responsiveness of the spellchecker but we didn't come to a solid
 conclusion so I thought I'd bring it up here to see if anyone else has
 opinions. The main concern is that we don't block the IO thread on
 file access. To this end, I recently moved initialization of the
 spellchecker from the IO thread to the file thread. However, instead
 of reading in the spellchecker dictionary in one solid chunk, we
 memory-map it. Then later we check individual words on the IO thread,
 which will be slow since the dictionary starts off effectively
 completely paged out. The proposal is that we read in the dictionary
 at spellchecker initialization instead of memory mapping it.

 Memory mapping pros:
 - possibly uses less overall memory, depending on the structure of the
 dictionary and the usage pattern of the user.
 - [struck out] loading the dictionary doesn't block for a long time.
 (This one no longer occurs either way due to my recent refactoring.)

 Reading it all at once pros:
 - costly disk accesses are kept to the file thread (excepting future
 memory paging)
 - overall disk access time is probably lower (since we can read in the
 dict in one chunk)

 For reference, the English dictionary is about 500K, and most
 dictionaries are under 2 megs. Some (such as Hungarian) are much
 larger, but no dictionary is over 10 megs.

 Opinions?

 -- Evan Stade
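
To make the two options in the proposal concrete, here is a minimal sketch
in Python (not the actual spellchecker code). The helper names and the tiny
sample word list are invented for illustration. Reading pays the disk cost
once, up front; mapping defers it to whichever thread first touches each
page.

```python
import mmap
import os
import tempfile


def load_by_reading(path):
    """Read the whole dictionary into a private bytes object in one call."""
    with open(path, "rb") as f:
        return f.read()


def load_by_mapping(path):
    """Map the dictionary read-only; pages come off disk on first touch."""
    with open(path, "rb") as f:
        # The mapping remains valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def make_sample_dictionary():
    """Write a small stand-in word list to a temp file (hypothetical data)."""
    fd, path = tempfile.mkstemp(suffix=".dic")
    with os.fdopen(fd, "wb") as f:
        f.write(b"apple\nbanana\ncherry\n")
    return path


if __name__ == "__main__":
    path = make_sample_dictionary()
    try:
        eager = load_by_reading(path)
        lazy = load_by_mapping(path)
        # Both expose the same bytes; only the timing of disk I/O differs.
        print(eager.find(b"banana"), lazy.find(b"banana"))
        lazy.close()
    finally:
        os.remove(path)
```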

 





[chromium-dev] Re: Spellchecker and memory-mapped dicts

2009-10-22 Thread Steve Vandebogart
If you plan to read the entire file, mmap()ing it, then faulting it in will
be slower than read()ing it, at least in some Linux versions.  I never
pinned down exactly why, but I think the kernel read-ahead mechanism works
slightly differently.
--
Steve

On Thu, Oct 22, 2009 at 2:02 PM, Chris Evans cev...@chromium.org wrote:

 There's also option 3)
 Pre-fault the mmap()ed region in the file thread upon dictionary
 initialization.
 On Linux at least, that may give you better behaviour than malloc() +
 read() in the event of memory pressure.

 Cheers
 Chris





[chromium-dev] Re: Spellchecker and memory-mapped dicts

2009-10-22 Thread Peter Kasting
On Thu, Oct 22, 2009 at 2:02 PM, Chris Evans cev...@chromium.org wrote:

 There's also option 3)
 Pre-fault the mmap()ed region in the file thread upon dictionary
 initialization.
 On Linux at least, that may give you better behaviour than malloc() +
 read() in the event of memory pressure.


On Windows, I believe that will either be equivalent to loading everything
into a memory data structure or very slightly worse.

I forgot to mention one facet of loading into memory.  If we need to, it is
probably easier for us to design the memory data structure so that most hits
occur in a smaller region of pages, and are thus more friendly to memory
pressure, than it is to structure the files on disk to have this property.

PK
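
A toy illustration of the locality point, in Python. Everything here is an
assumption made for the example: fixed-size entries, a 4096-byte page, and
the helper names. It only counts how many distinct pages a set of lookups
would touch under two layouts; it says nothing about how Hunspell actually
stores its data.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def layout_in_order(words):
    """Assign each word a slot index in the order given."""
    return {word: slot for slot, word in enumerate(words)}


def pages_touched(layout, entry_size, lookups):
    """Count distinct pages hit when each lookup reads one fixed-size entry.

    Slot i occupies bytes [i * entry_size, (i + 1) * entry_size).
    """
    pages = set()
    for word in lookups:
        start = layout[word] * entry_size
        end = start + entry_size - 1
        pages.update(range(start // PAGE_SIZE, end // PAGE_SIZE + 1))
    return len(pages)
```

If the frequently checked words are placed first, their lookups stay within
a handful of pages; if they are scattered through the table, nearly every
lookup lands on a different page.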




[chromium-dev] Re: Spellchecker and memory-mapped dicts

2009-10-22 Thread Brett Wilson

I've thought about this some (I wrote the memory map thing there now).

History of the spellchecker:
v1 : Per-process Hunspell storage (lots of memory duplicated in each
renderer, expensive to load).
v2 : Browser-process Hunspell storage (lots of memory, expensive to
load, only occurs once)
v3 : Browser-process memmap (less memory, cheap to load, only occurs once).

I would like to consider moving hunspell back to the renderer so we
can avoid sync IPCs and blocking the I/O thread on spellchecking.
Spellchecking isn't fast (especially suggestions) even when everything
is in memory, so it always sucks to have it block the I/O thread. Now
that it can be memmapped, each renderer can memmap its own image of
the data.

This doesn't help on Mac where we want to use the system spellchecker.
There would also be some amount of duplication since there are certain
tables that are initialized once at the beginning (I don't think it's
that big, though).

I would suggest first making the current histograms in the
spellchecker.cc file UMA (currently they're debug-only local ones) so
we can see how much blocking we're getting from Hunspell in the field.

Brett
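
As a generic illustration of measuring how long a call blocks, here is a
small Python sketch. It is not the UMA histogram API; the bucket bounds and
the names bucket_for and timed are invented for the example.

```python
import time
from collections import Counter

BUCKET_BOUNDS_MS = (1, 5, 10, 50, 100)  # hypothetical bucket upper bounds


def bucket_for(elapsed_ms):
    """Return the label of the first bucket whose bound exceeds elapsed_ms."""
    for bound in BUCKET_BOUNDS_MS:
        if elapsed_ms < bound:
            return "<%dms" % bound
    return ">=%dms" % BUCKET_BOUNDS_MS[-1]


def timed(histogram, func, *args):
    """Call func(*args), record how long it blocked, and return its result."""
    start = time.perf_counter()
    result = func(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    histogram[bucket_for(elapsed_ms)] += 1
    return result
```

Wrapping each spellcheck call this way, with a Counter as the histogram,
would show how often a lookup blocks for longer than a given bound.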




[chromium-dev] Re: Spellchecker and memory-mapped dicts

2009-10-22 Thread Scott Hess

On Linux what about mmap() and then madvise() with MADV_WILLNEED?  (Or
posix_fadvise() with POSIX_FADV_WILLNEED on the file descriptor.)

-scott
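
A minimal sketch of the mmap() plus madvise() idea, using Python's mmap
module as a stand-in for the C calls. The function name map_with_willneed
is made up. Python only exposes madvise() and MADV_WILLNEED on platforms
that have them (and from Python 3.8 on), so the sketch probes before
calling. The hint is advisory: the kernel may read ahead, or do nothing.

```python
import mmap


def map_with_willneed(path):
    """Map a file read-only and hint that the whole range is needed soon.

    Returns the mapping and whether the hint was actually issued.
    """
    with open(path, "rb") as f:
        # The mapping remains valid after the file object is closed.
        mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    hinted = False
    # Not every platform or Python version provides these, so check first.
    if hasattr(mapping, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
        mapping.madvise(mmap.MADV_WILLNEED)
        hinted = True
    return mapping, hinted
```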


On Thu, Oct 22, 2009 at 2:06 PM, Steve Vandebogart vand...@chromium.org wrote:
 If you plan to read the entire file, mmap()ing it, then faulting it in will
 be slower than read()ing it, at least in some Linux versions.  I never
 pinned down exactly why, but I think the kernel read-ahead mechanism works
 slightly differently.
 --
 Steve




[chromium-dev] Re: Spellchecker and memory-mapped dicts

2009-10-22 Thread Chris Evans
On Thu, Oct 22, 2009 at 2:22 PM, Brett Wilson bre...@chromium.org wrote:


 I would like to consider moving hunspell back to the renderer so we
 can avoid sync IPCs and blocking the I/O thread on spellchecking.


That would also be a stability win. Currently, any hunspell crashes due to
bust dictionaries take down the entire browser.

Cheers
Chris




[chromium-dev] Re: Spellchecker and memory-mapped dicts

2009-10-22 Thread Evan Martin

On Thu, Oct 22, 2009 at 2:22 PM, Brett Wilson bre...@chromium.org wrote:
 This doesn't help on Mac where we want to use the system spellchecker.

FYI, we got a patch to use the system spellchecker on Linux as well.
  http://code.google.com/p/chromium/issues/detail?id=24517
I should probably ping the original uploader again...

This Ubuntu document describes some use cases as to why unification is good:
  https://wiki.ubuntu.com/ConsolidateSpellingLibs

On the other hand, ChromeOS will certainly benefit from this.




[chromium-dev] Re: Spellchecker and memory-mapped dicts

2009-10-22 Thread Steve Vandebogart
It's been awhile since I looked at this, but the email I was able to dig up
suggests that madvise is no faster than faulting in the mmap()ed region by
hand.  However, using posix_fadvise should give the same speeds as read()ing
it into memory.  IIRC though, posix_fadvise will only read so much in a
single request and will let readahead handle the rest after that.
--
Steve

On Thu, Oct 22, 2009 at 2:27 PM, Scott Hess sh...@chromium.org wrote:

 On Linux what about mmap() and then madvise() with MADV_WILLNEED?  (Or
 posix_fadvise() with POSIX_FADV_WILLNEED on the file descriptor.)

 -scott





[chromium-dev] Re: Spellchecker and memory-mapped dicts

2009-10-22 Thread Scott Hess

Faulting it in by hand is only helpful if we're right!  If we're
wrong, it could evict other stuff from memory to support a feature
which a user may not use until the memory is faulted back out anyhow.

[From the rest of the thread, though, it sounds like maybe we should
just fix hunspell to be more efficient for our usage.]

-scott


On Thu, Oct 22, 2009 at 2:42 PM, Steve Vandebogart vand...@chromium.org wrote:
 It's been awhile since I looked at this, but the email I was able to dig up
 suggests that madvise is no faster than faulting in the mmap()ed region by
 hand.  However, using posix_fadvise should give the same speeds as read()ing
 it into memory.  IIRC though, posix_fadvise will only read so much in a
 single request and will let readahead handle the rest after that.
 --
 Steve




[chromium-dev] Re: Spellchecker and memory-mapped dicts

2009-10-22 Thread cpu

+1 on moving spell to the renderers.

We can memory map in the browser and map again in the renderers.
Hopefully read-only.
We eliminate the sync ipc and do not increase the memory usage.
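
A small Python sketch of the read-only mapping idea. The two mappings below
live in one process and merely stand in for "browser" and "renderer"; the
names map_read_only and try_write are invented for the example. Read-only
file mappings of the same file are backed by the same page cache pages,
which is why mapping it again should not multiply the memory cost.

```python
import mmap


def map_read_only(path):
    """Map a file read-only; writes through this mapping are rejected."""
    with open(path, "rb") as f:
        # The mapping remains valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def try_write(mapping):
    """Return True if the mapping accepted a write, False if it refused."""
    try:
        mapping[0] = 0
    except TypeError:
        # Python raises TypeError when a read-only mapping is modified.
        return False
    return True
```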





[chromium-dev] Re: Spellchecker and memory-mapped dicts

2009-10-22 Thread Steve Vandebogart
Probably a bit off topic at this point, but your response confuses me.
MADV_WILLNEED and POSIX_FADV_WILLNEED will bring the pages into RAM, just
like faulting in mmap()'ed pages by hand, or read()ing it into memory.  In
my experience, read() and fadvise() are faster than mmap()+faulting
everything in, or madvise().  Of course, read()ing it in means it has to be
swapped out and can't just be dropped.  If you want to suck the entire file
in at some point, probably the best way is to fadvise() it in, then mmap()
it and use it from there.

--
Steve
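
The "fadvise() it in, then mmap() it" sequence could look roughly like this
in Python, which wraps the same system calls. The function name
fadvise_then_map is hypothetical. os.posix_fadvise is only available on
some Unix platforms, so the sketch checks for it; whether this beats a
plain read() will depend on the kernel version, as discussed above.

```python
import mmap
import os


def fadvise_then_map(path):
    """Ask the kernel to start reading the file, then map it read-only.

    Returns the mapping and whether the fadvise hint was issued.
    """
    with open(path, "rb") as f:
        fd = f.fileno()
        hinted = False
        if hasattr(os, "posix_fadvise"):
            # Offset 0 with length 0 covers the file from start to end.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
            hinted = True
        # The mapping remains valid after the file object is closed.
        mapping = mmap.mmap(fd, 0, access=mmap.ACCESS_READ)
    return mapping, hinted
```

Because the data is used through the mapping, the pages stay file-backed
and can simply be dropped under memory pressure rather than swapped out.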

On Thu, Oct 22, 2009 at 2:52 PM, Scott Hess sh...@chromium.org wrote:

 Faulting it in by hand is only helpful if we're right!  If we're
 wrong, it could evict other stuff from memory to support a feature
 which a user may not use until the memory is faulted back out anyhow.

 [From the rest of the thread, though, it sounds like maybe we should
 just fix hunspell to be more efficient for our usage.]

 -scott





[chromium-dev] Re: Spellchecker and memory-mapped dicts

2009-10-22 Thread Scott Hess

WILLNEED says "Hey, OS, I think I'm going to look at these pages soon,
get yourself ready", but the OS could implement it as a no-op, and can
do it async.  If memory is under pressure the system can do less, if
memory is clear it can do more.  Actually reading the data blocks, and
it unconditionally pulls the data into memory.

-scott


On Thu, Oct 22, 2009 at 3:01 PM, Steve Vandebogart vand...@chromium.org wrote:
 Probably a bit off topic at this point, but your response confuses me
 - MADV_WILLNEED and POSIX_FADV_WILLNEED will bring the pages into ram, just
 like faulting in mmap()'ed pages by hand, or read()ing it into memory.  In
 my experience, read() and fadvise() are faster than mmap()+faulting
 everything in, or madvise().  Of course, read()ing it in means it has to be
 swapped out and can't just be dropped.
 If you want to suck the entire file in at some point, probably the best way
 is to fadvise() it in, then mmap() it and use it from there.
 --
 Steve
 On Thu, Oct 22, 2009 at 2:52 PM, Scott Hess sh...@chromium.org wrote:

 Faulting it in by hand is only helpful if we're right!  If we're
 wrong, it could evict other stuff from memory to support a feature
 which a user may not use until the memory is faulted back out anyhow.

 [From the rest of the thread, though, it sounds like maybe we should
 just fix hunspell to be more efficient for our usage.]

 -scott


 On Thu, Oct 22, 2009 at 2:42 PM, Steve Vandebogart vand...@chromium.org
 wrote:
  It's been a while since I looked at this, but the email I was able to dig
  up suggests that madvise() is no faster than faulting in the mmap()ed
  region by hand.  However, using posix_fadvise() should give the same speeds
  as read()ing it into memory.  IIRC though, posix_fadvise() will only read
  so much in a single request and will let readahead handle the rest after
  that.
  --
  Steve
 
  On Thu, Oct 22, 2009 at 2:27 PM, Scott Hess sh...@chromium.org wrote:
 
  On Linux what about mmap() and then madvise() with MADV_WILLNEED?  (Or
  posix_fadvise() with POSIX_FADV_WILLNEED on the file descriptor.)
 
  -scott
 
 
  On Thu, Oct 22, 2009 at 2:06 PM, Steve Vandebogart vand...@chromium.org
  wrote:
   If you plan to read the entire file, mmap()ing it and then faulting it in
   will be slower than read()ing it, at least in some Linux versions.  I
   never pinned down exactly why, but I think the kernel read-ahead mechanism
   works slightly differently.
   --
   Steve
  
   On Thu, Oct 22, 2009 at 2:02 PM, Chris Evans cev...@chromium.org wrote:

   There's also option 3)
   Pre-fault the mmap()ed region in the file thread upon dictionary
   initialization.
   On Linux at least, that may give you better behaviour than malloc() +
   read() in the event of memory pressure.
   Cheers
   Chris
  
   On Thu, Oct 22, 2009 at 1:39 PM, Evan Stade est...@chromium.org wrote:

   Hi all,

   At its last meeting the jank task force discussed improving
   responsiveness of the spellchecker but we didn't come to a solid
   conclusion so I thought I'd bring it up here to see if anyone else has
   opinions. The main concern is that we don't block the IO thread on
   file access. To this end, I recently moved initialization of the
   spellchecker from the IO thread to the file thread. However, instead
   of reading in the spellchecker dictionary in one solid chunk, we
   memory-map it. Then later we check individual words on the IO thread,
   which will be slow since the dictionary starts off effectively
   completely paged out. The proposal is that we read in the dictionary
   at spellchecker initialization instead of memory mapping it.
  
   Memory mapping pros:
   - possibly uses less overall memory, depending on the structure of the
     dictionary and the usage pattern of the user.
   - [struck out: loading the dictionary doesn't block for a long time]
     this one no longer occurs either way due to my recent refactoring

   Reading it all at once pros:
   - costly disk accesses are kept to the file thread (excepting future
     memory paging)
   - overall disk access time is probably lower (since we can read in the
     dict in one chunk)

   For reference, the English dictionary is about 500K, and most
   dictionaries are under 2 megs; some (such as Hungarian) are much
   larger, but no dictionary is over 10 megs.
  
   Opinions?
  
   -- Evan Stade




[chromium-dev] Re: Spellchecker and memory-mapped dicts

2009-10-22 Thread Steve Vandebogart
That is the intention of the interface, yes, but all Linux implementations
I've seen actually go and read whatever you say you will need, with a few
exceptions such as actually being out of memory.
--
Steve
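
The fadvise()-then-mmap() approach suggested earlier in the thread could look roughly like the Python sketch below; it is an illustration, not code from the thread. The name prefetch_then_map is hypothetical, and os.posix_fadvise is assumed to be available (it is on Linux but not on every platform, so the call is guarded).

```python
import mmap
import os


def prefetch_then_map(path):
    """Ask the kernel to read the file ahead, then map it read-only."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # Advisory request to start reading the whole file into the page cache.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, size, os.POSIX_FADV_WILLNEED)
        # The mapping keeps its own reference to the file, so fd can be closed.
        return mmap.mmap(fd, size, access=mmap.ACCESS_READ)
    finally:
        os.close(fd)
```

Lookups then go through the mapping, so the pages stay file-backed and can be dropped rather than swapped when memory is tight. Whether the prefetch completes before the first lookup depends on the kernel, as discussed above.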

On Thu, Oct 22, 2009 at 3:06 PM, Scott Hess sh...@chromium.org wrote:

 WILLNEED says "Hey, OS, I think I'm going to look at these pages soon,
 get yourself ready," but the OS could implement it as a no-op, and can
 do the work asynchronously.  If memory is under pressure the system can do
 less; if memory is clear it can do more.  By contrast, actually reading
 the data with read() blocks and really does bring it into memory.

 -scott



--~--~-~--~~~---~--~~
Chromium Developers mailing list: