Re: why kafka index file use memory mapped files ,however log file doesn't

2018-02-13 Thread jan
1. Perhaps a human-readable log, being write-only, and which may
buffer on the user side or in the kernel, may be more efficient
because small writes are accumulated in a buffer (cheap) before
actually being pushed to disk (less cheap)? If you mmap'd this
instead, how do you feel it would behave?

2. Did you read the post about mmapping that I linked to? The guy knows
more about it than I'll probably ever know and he says it's not that
simple. He's saying mmap is not a magic answer to anything.
This bit may be relevant: "APPLICATION BUFFERS WHICH EASILY FIT IN THE
L2 CACHE COST VIRTUALLY NOTHING ON A MODERN CPU!" (NB: the post is from
2004, so with the gap between cache/CPU and RAM/disk speeds growing
since then, it may be even more true now).
Kafka messages can be largish so perhaps that suggests why they use it
for data files.
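
The buffering argument in point 1 can be sketched roughly as follows. This is a minimal illustration, not anything taken from Kafka: `CountingRaw` and `write_lines` are hypothetical names, and the in-memory object merely stands in for a file descriptor so that the number of low-level writes can be counted.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory stand-in for a file descriptor that counts write() calls."""

    def __init__(self):
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self):
        return True

    def write(self, b):
        # Each call here models one trip to the OS (one write syscall).
        self.calls += 1
        self.data += bytes(b)
        return len(b)


def write_lines(buffer_size, n_lines=1000):
    """Write n_lines short log lines through a user-space buffer.

    Returns (number of low-level writes, total bytes written).
    """
    raw = CountingRaw()
    buffered = io.BufferedWriter(raw, buffer_size=buffer_size)
    for i in range(n_lines):
        buffered.write(b"log line %06d\n" % i)  # 16 bytes per line
    buffered.flush()
    return raw.calls, len(raw.data)


if __name__ == "__main__":
    print("8 KiB buffer:", write_lines(8192))
    print("1-byte buffer:", write_lines(1))
```

With the larger buffer the same bytes reach the underlying object in far fewer calls, which is the "accumulate cheaply, push to disk rarely" effect described above. It says nothing about how an mmap'd file would compare.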

If this comes across as a bit rude, that wasn't intended. I can't really
answer your question, just suggest a bit of reading and some
guesswork.

cheers

jan

On 13/02/2018, YuFeng Shen <v...@hotmail.com> wrote:
> If it is as you say, why does the index file use a memory-mapped file?


Re: why kafka index file use memory mapped files ,however log file doesn't

2018-02-13 Thread YuFeng Shen
If it is as you say, why does the index file use a memory-mapped file?




Re: why kafka index file use memory mapped files ,however log file doesn't

2018-02-12 Thread jan
A human-readable log file is likely to have much less activity in it
(I was using Kafka a year ago, and we could eat up gigabytes for the
data files while the log files were only a few megabytes). So there's
perhaps little to gain.

Also if the power isn't pulled and the OS doesn't crash, log messages
will be, I guess, buffered by the OS then written out as a full
buffer, or perhaps every nth tick if the buffer fills up very slowly.
So it's still reasonably efficient.

Adding a few hundred context switches a second for the human log
probably isn't a big deal. I remember seeing several tens of
thousands/sec when using Kafka (although, to be fair, other processes
were running on those multicore machines). I guess logging
overhead is down in the noise, though that's just a guess.

Also, I remember reading a rather surprising post about mmapping. Just
found it:
<https://lists.freebsd.org/pipermail/freebsd-questions/2004-June/050371.html>.
Snippets:
"There are major hardware related overheads to the use of mmap(), on
*ANY* operating system, that cannot be circumvented"
-and-
"you are assuming that copying is always bad (it isn't), that copying
is always horrendously expensive (it isn't), that memory mapping is
always cheap (it isn't cheap),"

A bit vague on my part, but HTH anyway

jan




Re: why kafka index file use memory mapped files ,however log file doesn't

2018-02-12 Thread Vincent Dautremont
Just a guess: wouldn't it be because the log files on disk can contain
data that was compressed when produced but needs to be uncompressed on
consumption (of a single message)?




Re: why kafka index file use memory mapped files ,however log file doesn't

2018-02-12 Thread YuFeng Shen
Hi jan,

I think the reason is the same as the reason the index files use
memory-mapped files.

A memory-mapped file avoids copying data between user-space and kernel
buffers, so it improves the performance of index file I/O, right? If so,
why can't the log files get the same performance improvement as the
memory-mapped index files?
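
For readers following the thread, the index side of this question can be made concrete with a rough sketch. It is not Kafka's code: the entry layout (two big-endian 32-bit integers, relative offset and byte position) and the names `write_index`, `lookup`, and `demo` are assumptions made for illustration. The point is only that a small file of fixed-size records suits random-access reads through a mapping.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical entry layout for illustration: two big-endian 32-bit ints,
# (relative offset, byte position in the log file).
ENTRY = struct.Struct(">ii")


def write_index(path, entries):
    """Write sorted (relative_offset, position) pairs as fixed-size records."""
    with open(path, "wb") as f:
        for rel_offset, position in entries:
            f.write(ENTRY.pack(rel_offset, position))


def lookup(index, target):
    """Binary-search for the largest indexed offset <= target.

    Returns the matching log position, or None if target precedes every entry.
    """
    lo, hi = 0, len(index) // ENTRY.size - 1
    found = None
    while lo <= hi:
        mid = (lo + hi) // 2
        # Reads straight from the mapped pages; no explicit read() call.
        rel_offset, position = ENTRY.unpack_from(index, mid * ENTRY.size)
        if rel_offset <= target:
            found = position
            lo = mid + 1
        else:
            hi = mid - 1
    return found


def demo():
    """Build a tiny index in a temp file, map it, and look up a few offsets."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_index(path, [(0, 0), (10, 4096), (20, 8192)])
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as index:
                return [lookup(index, t) for t in (-1, 0, 15, 20, 99)]
    finally:
        os.remove(path)
```

Calling `demo()` returns `[None, 0, 4096, 8192, 8192]`. Whether the same access pattern would pay off for a large, sequentially appended and sequentially read log file is exactly the open question in this thread, and the sketch does not settle it.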


Jacky





Re: why kafka index file use memory mapped files ,however log file doesn't

2018-02-10 Thread jan
I'm not sure I can answer your question, but may I pose another in
return: why do you feel having a memory mapped log file would be a
good thing?




why kafka index file use memory mapped files ,however log file doesn't

2018-02-08 Thread YuFeng Shen
Hi Experts,

We know that Kafka uses memory-mapped files for its index files; however,
its log files don't use memory-mapped files.

May I ask why the index files use memory-mapped files but the log files
don't?


Jacky