Re: 4.7.0-rc7 ext4 error in dx_probe

2016-08-22 Thread Török Edwin
On 2016-08-18 02:28, Darrick J. Wong wrote:
> On Wed, Aug 17, 2016 at 05:27:25PM +0300, Török Edwin wrote:
>> On 2016-08-08 09:28, Darrick J. Wong wrote:
>>> On Sun, Aug 07, 2016 at 11:56:34PM -0400, Theodore Ts'o wrote:
>>>> On Fri, Aug 05, 2016 at 12:15:48PM -0700, Darrick J. Wong wrote:
>>>>>>> [1] 
>>>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/ext4/inode.c?id=b47820edd1634dc1208f9212b7ecfb4230610a23
>>>>>>
>>>>>> I added the patch, rebuilt and rebooted.  It will take some time
>>>>>> before I'll report back since the issue is so hard to reproduce.
>>>>>
>>>>> FWIW I could trigger it reliably by running a bunch of directory traversal
>>>>> programs simultaneously on the same directory.  I have a script that fires
>>>>> up multiple mutts pointing to the Maildirs for the high traffic Linux 
>>>>> lists.
>>>>
>>>> Hmm, I wonder if we should request that this patch be backported to
>>>> -stable.  Darrick, what do you think?
>>>
>>> Seems like an excellent idea.
>>
>> Hi,
>>
>> I see 4.7.1 was released; however, before testing it I checked the
>> changelog and couldn't see the inode.c commit with the checksum fix.
>> Is there another commit that provides an equivalent fix, or is the
>> fix not part of 4.7.1?
> 
> There's no 'equivalent' fix; the one we have simply didn't make it
> to the stable maintainers before 4.7.1 came out.
> 
> (Hey Ted, can you send the patch to ... whomever's in charge of 4.7?)

Ping? 4.7.2 is out

The process for submitting to stable seems to be described here [1].
It doesn't say who can submit, but I guess it'd be better if one of the
people listed in the original commit, or an ext4 maintainer, submitted it.

FWIW 4.7.0+patch has been working reliably here; it would be nice if I
could tell people that it's safe to upgrade to 4.7.x (without manually
patching it).

Best regards,
--Edwin


Re: 4.7.0-rc7 ext4 error in dx_probe

2016-08-18 Thread Török Edwin
On 2016-08-17 23:02, Johannes Stezenbach wrote:
> On Fri, Aug 05, 2016 at 08:11:36PM +0200, Johannes Stezenbach wrote:
>> On Fri, Aug 05, 2016 at 10:02:28AM -0700, Darrick J. Wong wrote:
>>>
>>> When you're back on 4.7, can you apply this patch[1] to see if it fixes
>>> the problem?  I speculate that the new parallel dir lookup code enables
>>> multiple threads to be verifying the same directory block buffer at the
>>> same time.
>>>
>>> [1] 
>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/ext4/inode.c?id=b47820edd1634dc1208f9212b7ecfb4230610a23
>>
>> I added the patch, rebuilt and rebooted.  It will take some time
>> before I'll report back since the issue is so hard to reproduce.
> 
> FWIW, so far the issue didn't appear again after I applied
> the patch to 4.7.0, and I stressed it a bit with repo syncs,
> AOSP builds, rsync backups etc.

Same here: I've been using 4.7.0+patch daily, and everything is working fine.

Best regards,
--Edwin


Re: 4.7.0-rc7 ext4 error in dx_probe

2016-08-17 Thread Török Edwin
On 2016-08-08 09:28, Darrick J. Wong wrote:
> On Sun, Aug 07, 2016 at 11:56:34PM -0400, Theodore Ts'o wrote:
>> On Fri, Aug 05, 2016 at 12:15:48PM -0700, Darrick J. Wong wrote:
>>>>> [1] 
>>>>> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/fs/ext4/inode.c?id=b47820edd1634dc1208f9212b7ecfb4230610a23
>>>>
>>>> I added the patch, rebuilt and rebooted.  It will take some time
>>>> before I'll report back since the issue is so hard to reproduce.
>>>
>>> FWIW I could trigger it reliably by running a bunch of directory traversal
>>> programs simultaneously on the same directory.  I have a script that fires
>>> up multiple mutts pointing to the Maildirs for the high traffic Linux lists.
>>
>> Hmm, I wonder if we should request that this patch be backported to
>> -stable.  Darrick, what do you think?
> 
> Seems like an excellent idea.

Hi,

I see 4.7.1 was released; however, before testing it I checked the
changelog and couldn't see the inode.c commit with the checksum fix.
Is there another commit that provides an equivalent fix, or is the fix
not part of 4.7.1?

Best regards,
--Edwin


Re: 4.7.0-rc7 ext4 error in dx_probe

2016-08-09 Thread Török Edwin
On 2016-08-09 05:37, Darrick J. Wong wrote:
> On Tue, Aug 09, 2016 at 12:13:01AM +0300, Török Edwin wrote:
>> On 2016-08-08 19:55, Darrick J. Wong wrote:
>>> On Mon, Aug 08, 2016 at 12:08:18PM -0400, Theodore Ts'o wrote:
>>>> On Sun, Aug 07, 2016 at 11:28:10PM -0700, Darrick J. Wong wrote:
>>>>>
>>>>> I have one lingering concern -- is it a bug that two processes could be
>>>>> computing the checksum of a buffer simultaneously?  I would have
>>>>> thought ext4 would serialize that kind of buffer_head access...
>>>>
>>>> Do we know how this is happening?  We've always depended on the VFS to
>>>> provide this exclusion.  The only way we should be modifying the
>>>> buffer_head at the same time if two CPU's are trying to modify the
>>>> directory at the same time, and that should _never_ be happening, even
>>>> with the new directory parallism code, unless the file system has
>>>> given permission and intends to do its own fine-grained locking.
>>>
>>> It's a combination of two things, I think.  The first is that the
>>> checksum calculation routine (temporarily) set the checksum field to
>>> zero during the computation, which of course is a no-no.  The patch
>>> fixes that problem and should go in.
>>
>> Thanks a lot for the patch.
>> I wrote a small testcase (see below) that triggers the problem quite soon on
>> my box with kernel 4.7.0, and seems to have survived so far with kernel
>> 4.7.0+patch.
>> When it failed it printed something like "readdir: Bad message".
>>
>> The drop caches part is quite important for triggering the bug, and might
>> explain why this bug was hard to reproduce: IIUC this race condition can
>> happen only if 2+ threads/processes try to access the same directory, and the
>> directory's inode is not in the cache (i.e. was never cached, or got kicked
>> out of the cache).
> 
> Could you formulate this into an xfstest, please?  It would be very useful to
> have this as a regression test.
> 
> (Or attach a Signed-off-by and I'll take care of it eventually.)

I've attached a signed-off-by line and a copyright header (feel free to add 
yourself in the copyright header too):

Signed-off-by: Török Edwin 

>> /*
>>  $ gcc trigger.c -o trigger -pthread
>>  $ ./trigger
>> */

/*
 * Test concurrent readdir on uncached inodes
 *
 * Copyright (C) 2016 Skylable Ltd.
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version 2
 * of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
 */

>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <errno.h>
>> #include <dirent.h>
>> #include <pthread.h>
>> #include <unistd.h>
>> #include <fcntl.h>
>> #include <sys/types.h>
>> #include <sys/stat.h>
>>
>> #define FILES 10
>> #define THREADS 16
>> #define LOOPS 1000
>>
>> static void die(const char *msg)
>> {
>>  perror(msg);
>>  exit(EXIT_FAILURE);
>> }
>>
>> static void* list(void* arg)
>> {
>>  for(int i=0;i < LOOPS;i++) {
>>  DIR *d = opendir(".");
>>  if (!d) {
>>  die("opendir");
>>  }
>>  errno = 0;
>>  while(readdir(d)) {}
>>  if (errno) {
>>  die("readdir");
>>  }
>>  closedir(d);
>>  FILE *f = fopen("/proc/sys/vm/drop_caches", "w");
>>  if (f) {
>>  fputs("3", f);
>>  fclose(f);
>>  }
>>  }
>>  return NULL;
>> }
>>
>> int main()
>> {
>>  pthread_t t[THREADS];
>>
>>  if(mkdir("ext4test", 0755) < 0 && errno != EEXIST)
>>  die("mkdir");
>>  if(chdir("ext4test") < 0)
>>  die("chdir");
>>  for (unsigned i=0;i < FILES;i++) {
>>  char name[16];
>>  snprintf(name, sizeof(name), "%d", i); 
>>  int fd = open(name, O_WRONLY|O_CREAT, 0600);
>>  if (fd < 0)
>>  die("open");
>>  close(fd);
>>  }
>>  for (unsigned i=0;i < sizeof(t)/sizeof(t[0]); i++) {
>>  pthread_create(&t[i], NULL,list, NULL);
>>  }
>>  for (unsigned i=0;i < sizeof(t)/sizeof(t[0]); i++) {
>>  pthread_join(t[i], NULL);
>>  }
>>  return 0;
>> }
>>
>>
>>
>> -- 
>> Edwin Török | Co-founder and Lead Developer
>>
>> Skylable open-source object storage: reliable, fast, secure
>> http://www.skylable.com
> 


-- 
Edwin Török | Co-founder and Lead Developer

Skylable open-source object storage: reliable, fast, secure
http://www.skylable.com


Re: 4.7.0-rc7 ext4 error in dx_probe

2016-08-08 Thread Török Edwin
On 2016-08-08 19:55, Darrick J. Wong wrote:
> On Mon, Aug 08, 2016 at 12:08:18PM -0400, Theodore Ts'o wrote:
>> On Sun, Aug 07, 2016 at 11:28:10PM -0700, Darrick J. Wong wrote:
>>>
>>> I have one lingering concern -- is it a bug that two processes could be
>>> computing the checksum of a buffer simultaneously?  I would have
>>> thought ext4 would serialize that kind of buffer_head access...
>>
>> Do we know how this is happening?  We've always depended on the VFS to
>> provide this exclusion.  The only way we should be modifying the
>> buffer_head at the same time if two CPU's are trying to modify the
>> directory at the same time, and that should _never_ be happening, even
>> with the new directory parallism code, unless the file system has
>> given permission and intends to do its own fine-grained locking.
> 
> It's a combination of two things, I think.  The first is that the
> checksum calculation routine (temporarily) set the checksum field to
> zero during the computation, which of course is a no-no.  The patch
> fixes that problem and should go in.

Thanks a lot for the patch.
I wrote a small testcase (see below) that triggers the problem quite soon on my 
box with kernel 4.7.0, and seems to have survived so far with kernel 
4.7.0+patch.
When it failed it printed something like "readdir: Bad message".

The drop caches part is quite important for triggering the bug, and might
explain why this bug was hard to reproduce: IIUC this race condition can
happen only if 2+ threads/processes try to access the same directory, and
the directory's inode is not in the cache (i.e. was never cached, or got
kicked out of the cache).
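
To make the race concrete, the problematic pattern looks roughly like
this (a sketch only -- the struct, field and helper names here are made
up, this is not the actual ext4 code):

static u32 calc_csum_buggy(struct dx_tail *t, const void *buf, size_t len)
{
	u32 old = t->dt_checksum;
	u32 csum;

	t->dt_checksum = 0;		/* transiently visible to other CPUs */
	csum = crc32c(~0u, buf, len);	/* the checksum covers the zeroed field */
	t->dt_checksum = old;
	return csum;
}

A second reader verifying the same buffer_head in that window checksums
(or compares against) the zeroed field and reports a failure even though
the on-disk data is fine.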


/*
 $ gcc trigger.c -o trigger -pthread
 $ ./trigger
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <dirent.h>
#include <pthread.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

#define FILES 10
#define THREADS 16
#define LOOPS 1000

static void die(const char *msg)
{
	perror(msg);
	exit(EXIT_FAILURE);
}

static void* list(void* arg)
{
	for(int i=0;i < LOOPS;i++) {
		DIR *d = opendir(".");
		if (!d) {
			die("opendir");
		}
		errno = 0;
		while(readdir(d)) {}
		if (errno) {
			die("readdir");
		}
		closedir(d);
		FILE *f = fopen("/proc/sys/vm/drop_caches", "w");
		if (f) {
			fputs("3", f);
			fclose(f);
		}
	}
	return NULL;
}

int main()
{
	pthread_t t[THREADS];

	if(mkdir("ext4test", 0755) < 0 && errno != EEXIST)
		die("mkdir");
	if(chdir("ext4test") < 0)
		die("chdir");
	for (unsigned i=0;i < FILES;i++) {
		char name[16];
		snprintf(name, sizeof(name), "%d", i);
		int fd = open(name, O_WRONLY|O_CREAT, 0600);
		if (fd < 0)
			die("open");
		close(fd);
	}
	for (unsigned i=0;i < sizeof(t)/sizeof(t[0]); i++) {
		pthread_create(&t[i], NULL, list, NULL);
	}
	for (unsigned i=0;i < sizeof(t)/sizeof(t[0]); i++) {
		pthread_join(t[i], NULL);
	}
	return 0;
}


-- 
Edwin Török | Co-founder and Lead Developer

Skylable open-source object storage: reliable, fast, secure
http://www.skylable.com


Re: 4.7.0-rc7 ext4 error in dx_probe

2016-08-03 Thread Török Edwin
Hi,


I have just encountered a similar problem after recently upgrading to 4.7.0:

[Wed Aug  3 11:08:57 2016] EXT4-fs error (device dm-1): dx_probe:740: inode #13295: comm python: Directory index failed checksum
[Wed Aug  3 11:08:57 2016] Aborting journal on device dm-1-8.
[Wed Aug  3 11:08:57 2016] EXT4-fs (dm-1): Remounting filesystem read-only
[Wed Aug  3 11:08:57 2016] EXT4-fs error (device dm-1): ext4_journal_check_start:56: Detected aborted journal

I rebooted into single-user mode, fsck fixed the filesystem, and after
rebooting again the filesystem is rw now.

inode #13295 seems to be this directory, and I can list it now:
stat /usr/lib64/python3.4/site-packages
  File: '/usr/lib64/python3.4/site-packages'
  Size: 12288   Blocks: 24 IO Block: 4096   directory
Device: fd01h/64769d   Inode: 13295   Links: 180
Access: (0755/drwxr-xr-x)  Uid: (0/root)   Gid: (0/root)
Access: 2016-05-09 11:29:44.056661988 +0300
Modify: 2016-08-01 00:34:24.029779875 +0300
Change: 2016-08-01 00:34:24.029779875 +0300
 Birth: -

The filesystem was /; I only noticed it was read-only several hours later
when I tried to install something:
/dev/mapper/vg--ssd-root on / type ext4 
(rw,noatime,errors=remount-ro,data=ordered)

$ uname -a
Linux bolt 4.7.0-gentoo-rr #1 SMP Thu Jul 28 11:28:56 EEST 2016 x86_64 AMD FX(tm)-8350 Eight-Core Processor AuthenticAMD GNU/Linux

FWIW I've been using ext4 for years and this is the first time I've seen
this message.
Prior to 4.7 I was on 4.6.1 -> 4.6.2 -> 4.6.3 -> 4.6.4.

The kernel is from gentoo-sources + a patch for enabling AMD LWP (I've had
that patch since 4.6.3 and it's not related to I/O).

If I see this message again, what should I do to obtain more information
to track down the root cause?

Best regards,
-- 
Edwin Török | Co-founder and Lead Developer

Skylable open-source object storage: reliable, fast, secure
http://www.skylable.com


Re: xfsaild causing 30+ wakeups/s on an idle system since 2.6.25-rcX

2008-02-18 Thread Török Edwin
David Chinner wrote:
> On Sun, Feb 17, 2008 at 05:51:08PM +0100, Oliver Pinter wrote:
>   
>> On 2/17/08, Török Edwin <[EMAIL PROTECTED]> wrote:
>> 
>>> Hi,
>>>
>>> xfsaild is causing many wakeups, a quick investigation shows
>>> xfsaild_push is always
>>> returning 30 msecs timeout value.
>>>   
>
> That's a bug

OK. Your patch fixes the 30+ wakeups :)

> , and has nothing to do with power consumption. ;)
>   

I suggest using a sysctl value (similar to
/proc/sys/vm/dirty_writeback_centisecs) instead of a hardcoded default
of 1000; that would further reduce the wakeups.
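
A minimal sketch of what I mean (the tunable name and registration
details below are made up, this is not actual XFS code):

static int xfsaild_interval_centisecs = 100;	/* today's 1000 msec default */

static ctl_table xfsaild_table[] = {
	{
		.procname	= "xfsaild_interval_centisecs",
		.data		= &xfsaild_interval_centisecs,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= &proc_dointvec,
	},
	{ }
};

xfsaild_push() would then compute its idle timeout from the tunable
instead of the hardcoded constant, e.g. tout = xfsaild_interval_centisecs * 10;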

>
> I only managed to reproduce this on a lazy superblock counter
> filesystem (i.e.  new mkfs and recent kernel), 

The filesystem was created in July 2007

> Can you see if the patch below fixes the problem.

Yes, it reduces wakeups to 1/sec.

Thanks,
--Edwin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


xfsaild causing 30+ wakeups/s on an idle system since 2.6.25-rcX

2008-02-17 Thread Török Edwin
Hi,

xfsaild is causing many wakeups; a quick investigation shows that
xfsaild_push always returns a 30 msec timeout value.

This is on an idle system, running only gnome, and gnome-terminal.

I suggest changing the timeout logic in xfsaild to be more power
consumption friendly.

See below my original report to the powerTOP mailing list.

Best regards,
--Edwin
--- Begin Message ---

Török Edwin wrote:

Török Edwin wrote:

Hi,

On latest -git of 2.6.25 I am getting lots of wakeups from xfsaild.
  23.5% ( 33.3)   xfsaild : schedule_timeout (process_timeout)
  


[Should I Cc: xfs mailing list / lkml on this?]

The problem seems to be with the timeout logic in xfsaild_push, which
can return 4 timeout values (msecs): 1000, 10, 20, 30.
I inserted a marker and attached a probe function; schedule_timeout
always got called with 9 jiffies (which is 30 msecs, I have HZ=300).

Changing xfs_trans_ail.c:270 from "tout += 20" to "tout = 1000" made
xfsaild do only 1 wakeup/s instead of 33!

For some reason xfs_aild always thinks it has work (I/O) to do, and
never chooses the 1000 msec sleep value.



sounds like an XFS bug... worth reporting to the xfs/lkml folks for sure.

--- End Message ---


LatencyTOP: sync_page description

2008-02-15 Thread Török Edwin
Hi Arjan,

LatencyTOP says that sync_page is 'Writing a page to disk'; however,
I see it even when no writes are involved, such as during a readdir,
lseek, etc.
Naming it a write is misleading, as no running program is doing constant
writes to the disk. The only program that writes is writing to a temp
dir in /dev/shm.
What would be a better description for sync_page?

Here are some /proc/latency_stats containing sync_page:

125 6937678 210821 sync_page sync_page_killable sync_page_killable
__lock_page_killable wake_bit_function generic_file_aio_read
get_unused_fd_flags path_walk do_sync_read autoremove_wake_function
security_file_permission rw_verify_area
306 5677749 215746 sync_page sync_page_killable sync_page_killable
__lock_page_killable wake_bit_function generic_file_aio_read
do_sync_read autoremove_wake_function security_file_permission
rw_verify_area vfs_read vfs_llseek
21 435657 59966 sync_page sync_page __lock_page wake_bit_function
read_cache_page_async ntfs_readpage read_cache_page map_mft_record
ntfs_read_locked_inode ntfs_alloc_big_inode iget5_locked
ntfs_test_inode
195 2716409 133660 blk_unplug sync_page sync_page __lock_page
wake_bit_function read_cache_page_async ntfs_readpage
read_cache_page map_mft_record ntfs_read_locked_inode
ntfs_alloc_big_inode iget5_locked
28 1881278 181986 add_to_page_cache_lru sync_page sync_page_killable
sync_page_killable __lock_page_killable wake_bit_function
generic_file_aio_read get_unused_fd_flags path_walk do_sync_read
autoremove_wake_function security_file_permission
2 17132 9746 add_to_page_cache_lru sync_page sync_page_killable
sync_page_killable __lock_page_killable wake_bit_function
generic_file_aio_read do_sync_read autoremove_wake_function
security_file_permission rw_verify_area vfs_read
1 70 70 irq_exit sync_page sync_page_killable sync_page_killable
__lock_page_killable wake_bit_function generic_file_aio_read
do_sync_read autoremove_wake_function security_file_permission
rw_verify_area vfs_read
23 306682 114514 blk_unplug sync_page sync_page_killable
sync_page_killable __lock_page_killable wake_bit_function
generic_file_aio_read do_sync_read autoremove_wake_function
security_file_permission rw_verify_area vfs_read
1 153 153 hrtimer_interrupt smp_apic_timer_interrupt sync_page
sync_page_killable sync_page_killable __lock_page_killable
wake_bit_function generic_file_aio_read do_sync_read
autoremove_wake_function cfq_idle_slice_timer security_file_permission

Thanks,
--Edwin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange interaction between latencytop and the scheduler

2008-01-24 Thread Török Edwin
Török Edwin wrote:
>   
>> The performance aspect... collecting the data isn't cheap (which is
>> why it's made a sysctl),
>> I still plan to look at optimizing it but it won't ever be free.
>> 
>
> Yes, I understand that. Is there a way latencytop could track its own
> overhead? I suppose it would lead to more accurate results
> (not that there would be anything wrong with the current ones).
>   

The latencytop userspace tool shows latencies > 0.1 msec, thus capturing
backtraces for latencies < 0.1 msec could be avoided.
If I apply the patch below, then enabling latencytop doesn't freeze X
when running the "10 threads doing usleep(1) in an infinite loop" test.
Still, I don't want to lose track of the latencies we didn't collect
backtraces for, so I added a special "untraced" category, reported as the
first line in /proc/latency_stats. If needed, instead of hardcoding the
threshold, it could be made a sysctl, or set via writing to
/proc/latency_stats, ...

While I am running the test-program:
$ grep untraced /proc/latency_stats
4875605 5120414 49 untraced

On an idle system:
$ grep untraced /proc/latency_stats
532 3287 47 untraced
$ grep untraced /proc/latency_stats
853 5778 47 untraced
$ grep untraced /proc/latency_stats
950 6788 47 untraced
$ grep untraced /proc/latency_stats
1343 9977 49 untraced
$ grep untraced /proc/latency_stats
1448 11075 49 untraced

Best regards,
--Edwin

---
 latencytop.c |   19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)
--- /tmp/linux-2.6.24-rc8/kernel/latencytop.c	2008-01-24 11:27:32.727487585 +0200
+++ kernel/latencytop.c	2008-01-24 10:42:25.0 +0200
@@ -24,8 +24,11 @@
 static DEFINE_SPINLOCK(latency_lock);
 
 #define MAXLR 128
+/* we are not interested in latencies less than 0.1 msec, so
+ * don't get backtraces for latencies < 0.05 msec. */
+#define LATENCY_TRACE_THRESHOLD 50
 static struct latency_record latency_record[MAXLR];
-
+static struct latency_record untraced;
 int latencytop_enabled;
 
 void clear_all_latency_tracing(struct task_struct *p)
@@ -47,6 +50,7 @@
 
 	spin_lock_irqsave(&latency_lock, flags);
 	memset(&latency_record, 0, sizeof(latency_record));
+	memset(&untraced, 0, sizeof(untraced));
 	spin_unlock_irqrestore(&latency_lock, flags);
 }
 
@@ -124,6 +128,15 @@
 	if (inter && usecs > 5000)
 		return;
 
+	if (usecs < LATENCY_TRACE_THRESHOLD) {
+		/* don't get a stacktrace for very low latencies */
+		untraced.time += usecs;
+		if (usecs > untraced.max)
+			untraced.max = usecs;
+		untraced.count++;
+		return;
+	}
+
 	memset(&lat, 0, sizeof(lat));
 	lat.count = 1;
 	lat.time = usecs;
@@ -177,6 +190,10 @@
 
 	seq_puts(m, "Latency Top version : v0.1\n");
 
+	seq_printf(m, "%i %li %li untraced \n",
+		untraced.count,
+		untraced.time,
+		untraced.max);
 	for (i = 0; i < MAXLR; i++) {
 		if (latency_record[i].backtrace[0]) {
 			int q;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Strange interaction between latencytop and the scheduler

2008-01-22 Thread Török Edwin
Arjan van de Ven wrote:
> Török Edwin wrote:
>> Is this normal? (is overhead really 40msec?)
>
> I'm seeing similar, I would not rule out that this is actual scheduler
> behavior ;(
> at least it seems consistent.

Ok, it is good that we are seeing same behaviour.

> I also made a patch (to lkml yesterday) that counts the total and
> count of scheduler latencies
> to also show the cumulative effect of these latencies
Found it: http://lkml.org/lkml/2008/1/21/223.
I'll try it in the next few days. Do you want me to report results? [and
especially, should I Cc: Ingo Molnar on the report?]
>
>> I was also seeing an unusually high number of context switches (as shown
>> by vmstat), I usually have 400-800 with non-patched kernels (while
>> running mplayer too), but I was getting steadily over 1100 with the
>> patch (on idle system).
>
> that's weird; there's nothing blocking in the entire patch at all.
> (the only lock is a spinlock)

Maybe the overhead of latencytop is triggering other behaviour in the
scheduler? Just a guess.
Are there any tests I can run to see why there are more context switches?

>
> The performance aspect... collecting the data isn't cheap (which is
> why it's made a sysctl),
> I still plan to look at optimizing it but it won't ever be free.

Yes, I understand that. Is there a way latencytop could track its own
overhead? I suppose it would lead to more accurate results
(not that there would be anything wrong with the current ones).
>
>> * If I compile without CONFIG_SCHED_DEBUG, I no longer get *any*
>> latency from the scheduler, even if I run multi-threaded programs, etc.
>> Is this to be expected? (i.e. is this feature available only when
>> CONFIG_SCHED_DEBUG is enabled?)
>
> afaik yes.

Ok.

>> * percentages: new feature (nice!), but the values seem all wrong
>> (global perc. are always wrong, per app perc. are usually ok, but see
>> second example below)
> Some minor latencytop userspace issues:
>
> note that the percentages are percentage this entry had compared the
> total sum of latencies.
> The maximum entry is only loosely related to that; say you have 1
> latency in "foo" for 100ms
> but 900 of 1ms in "bar", "foo" will show 10% and "bar" will show 90%

Thanks, that explains it. It looks like a good substitute for the
average column; this is the functionality I was missing when that column
went away.
Still, I wouldn't mind seeing the average column too (maybe activated
via a hotkey, or shown only if enough screen space is available?).
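
(To spell out Arjan's example in code -- a trivial worked version of the
percentage logic as I understand it:

#include <stdio.h>

int main(void)
{
	double foo = 1 * 100.0;   /* one 100 ms latency in "foo" */
	double bar = 900 * 1.0;   /* nine hundred 1 ms latencies in "bar" */
	double total = foo + bar; /* 1000 ms of summed latency */

	printf("foo: %.0f%% (max 100.0 ms)\n", 100.0 * foo / total); /* 10% */
	printf("bar: %.0f%% (max   1.0 ms)\n", 100.0 * bar / total); /* 90% */
	return 0;
}

so the Maximum column and the percentage can rank entries very
differently.)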

>
>> * I miss the Average latency column. If it is too costly to keep account
>> of an overall average, can we have last N second average?
>
> it's not costly to calculate, it's the screen space versus the value
> of the information :(

If I stretch my terminal window there's room for 2 or 3 more columns :)

>
>> * unknown reasons show a backtrace, but backtrace doesn't have enough
>> room on screen
>
> still working on that; you can pass the --unknown option to dump these
> so that I can add them
> to the translation table.

I gathered these while writing this reply:

Unknown: put_device elv_insert blk_plug_device default_wake_function
blk_execute_rq blk_rq_bio_prep blk_rq_append_bio blk_rq_map_user sg_io
scsi_cmd_ioctl ip_queue_xmit tcp_transmit_skb
Unknown: md_thread autoremove_wake_function md_thread md_thread kthread
child_rip kthread child_rip
Unknown: kswapd autoremove_wake_function kswapd kswapd kthread child_rip
kthread child_rip


Best regards,
--Edwin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Strange interaction between latencytop and the scheduler

2008-01-22 Thread Török Edwin
Hi Arjan, Ingo,

I am using latest latencytop from git (commit
92b6ca9d40998697866689f64b95647eca3200cb), and I'm seeing some strange
things:
[for minor latencytop userspace specific issues see at end of mail]

* if I compile with CONFIG_SCHED_DEBUG, I always see a 30-40 msec latency
from the scheduler (Scheduler: waiting for cpu).
There is also a constant ~5% user, ~2% sys CPU usage on an idle system,
regardless of whether the latencytop sysctl is on or off.
Is this normal? (is the overhead really 40 msec?)
I was also seeing an unusually high number of context switches (as shown
by vmstat); I usually have 400-800 with non-patched kernels (while
running mplayer too), but I was getting steadily over 1100 with the
patch (on an idle system).

CPU usage with latencytop on:
Cpu(s):  7.3%us,  7.0%sy,  0.3%ni, 85.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

CPU usage with latencytop sysctl off:
Cpu(s):  5.0%us,  1.7%sy,  0.0%ni, 93.3%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

However top won't tell me who's using 5% CPU!
PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND
1 root  20   0 10388  648  548 S  0.0  0.0   0:01.70 init
2 root  15  -5 000 S  0.0  0.0   0:00.00 kthreadd
3 root  15  -5 000 S  0.0  0.0   0:00.13 ksoftirqd/0
4 root  RT  -5 000 S  0.0  0.0   0:00.00 watchdog/0

High number of context switches (>1100):
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 2  0    520  60828  92276 832212    0    0    28   155  330 1215 16  3 73  8
 0  0    520  60828  92276 832212    0    0     0     2  242 1158  5  1 94  0
 0  0    520  60828  92288 832212    0    0     0     3  290 1261  8  1 91  0
 0  0    520  60836  92288 832216    0    0     1     0  227 1407 10  2 89  0

Scheduler latency:
Cause                                                Maximum  Percentage
put_device elv_insert blk_plug_device default_wake  152.4 msec     4.3 %
Creating block layer request                         68.1 msec     2.5 %
Scheduler: waiting for cpu                           26.6 msec     6.5 %
Writing a page to disk                               15.5 msec     1.8 %
Checking for media change                            10.3 msec     7.8 %
SCSI cdrom ioctl                                      9.4 msec     1.3 %
Checking CDROM media present                          7.1 msec     0.3 %
Waiting for event (poll)                              5.0 msec    59.4 %
Application requested delay                           4.9 msec    11.6 %

Process hddtemp (3772)
put_device elv_insert blk_plug_device default_wake  152.4 msec    61.5 %
    blk_rq_bio_prep blk_rq_append_bio blk_rq_map_user sg_io scsi_
Creating block layer request                         68.1 msec    36.3 %
Scheduler: waiting for cpu                            6.5 msec     2.2 %


* If I compile without CONFIG_SCHED_DEBUG, I no longer get *any* latency
from the scheduler, even if I run multi-threaded programs, etc. Is this
to be expected? (i.e. is this feature available only when
CONFIG_SCHED_DEBUG is enabled?)


I was trying to trigger some scheduler latencies artificially, so I wrote
a program that creates 10 threads, each doing a usleep(1) in an infinite
loop.
Still I get no scheduler latency (I was getting scheduler latency with
version 0.1).
If the latencytop sysctl is on, when running this test, after a short
while X becomes unusable and almost freezes; yet I still can't see
latencies. Before it freezes I captured top output (see below); notice
how otherwise idle programs get a high CPU usage!
If the latencytop sysctl is off, no freeze occurs and the CPU usage% is
normal (a.out getting 99%).
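
For reference, the test program is essentially this (a minimal
reconstruction; the exact source I used may differ in details):

#include <pthread.h>
#include <unistd.h>

#define THREADS 10

static void *worker(void *arg)
{
	(void)arg;
	for (;;)
		usleep(1);	/* 1 usec sleep, in an infinite loop */
	return NULL;
}

int main(void)
{
	pthread_t t[THREADS];
	int i;

	for (i = 0; i < THREADS; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (i = 0; i < THREADS; i++)
		pthread_join(t[i], NULL);
	return 0;
}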

Top output when latencytop sysctl is on:

top - 19:33:31 up  1:28,  2 users,  load average: 3.45, 1.21, 1.05
Tasks: 120 total,   9 running, 110 sleeping,   1 stopped,   0 zombie
Cpu(s):  3.9%us, 95.5%sy,  0.6%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2060592k total,  1979756k used,80836k free,69028k buffers
Swap:  3903724k total,0k used,  3903724k free,  1473896k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+  COMMAND
 7008 edwin 20   0 38764  472  356 R 58.4  0.0   0:05.94 a.out
 5613 edwin 20   0  179m  10m 7452 R 17.3  0.5   1:45.19 gkrellm
 5920 edwin 20   0  565m  90m  25m R 17.3  4.5   1:41.82 icedove-bin

This is an idle program!
 5610 edwin 20   0  152m  21m  13m R  1.9  1.1   1:28.88 konsole
 6749 edwin 20   0  254m 7800 5496 S  1.9  0.4   0:00.87 mplayer
 5209 root  20   0  110m  35m 9016 S  1.3  1.8   1:36.31 Xorg
  908 root  15  -5 000 S  0.6  0.0   0:01.60 scsi_eh_0
 5132 postgres  20   0 77216 1284  736 S  0.6  0.1   0:00.35 postgres
 5151 root  35  15 41068  29m  648 S  0.6  1.5   0:27.96 preload
 7005 edwin 20   0 18976 1184  880 R  0.6  0.1   0:00.

Re: [PATCH] LatencyTop: make reason (blk_execute_rq+0x7a/0xd) known

2008-01-19 Thread Török Edwin
Arjan van de Ven wrote:
> Török Edwin wrote:
>>
>> Cause                      Maximum      Average
>> SCSI device ioctl          34.2 msec    14.4 msec
>
> great! I'll put this into my patchkit!

Thanks.

>> I also noticed some unsigned overflows; my top latency was sometimes a
>> very large number (18), which was probably -1, or some other negative
>> number.  I haven't found a way to reproduce it yet (it happens very
>> rarely).
>
>
> I've not seen this; I'll take another peak at the code.

I just captured this:

Cause                                    Maximum                   Average
Waiting for userspace lock               18446744073638300.0 msec  9223372036783524.0 msec
Waiting for processes to die (waitpid)   18446744073638296.0 msec  9223372036783520.0 msec
Waiting for event (poll)                 18446744073565216.0 msec  764505121372.0 msec
SCSI device ioctl                        35.2 msec                 13.7 msec
Application requested delay              19.5 msec                 0.0 msec
page fault                               18.6 msec                 4.0 msec
Reading from file                        16.6 msec                 1.5 msec
Waiting for buffer IO                    15.2 msec                 3.0 msec
mutex lock                               15.0 msec                 15.0 msec

I was also looking at /proc/latency_stats using watch, and I've seen
that there were negative numbers.
Unfortunately I couldn't copy+paste them, because they were gone too fast.

I was running a program with 4 threads doing usleep(0) in an infinite
loop. However I can't reproduce the overflow now with the same program.

Best regards,
--Edwin
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] LatencyTop: make reason (blk_execute_rq+0x7a/0xd) known

2008-01-19 Thread Török Edwin
Hi Arjan,

One of the top reasons in latencytop is:
Unknown reason (blk_execute_rq+0x7a/0xd)   152.3 msec   49.0 msec

It's from hddtemp, hald-addon-storage, ...

I wrote a simple systemtap script [1] to find out where it's called from,
and it's definitely from the ioctl (see below).

1820322406 hddtemp(3999): <- latency of 153260223 ns
Returning from: 0x80311ab0 : blk_execute_rq+0x0/0xd0 []
Returning to  : 0x80315580 : sg_io+0x200/0x3c0 []
 0x80315a02 : scsi_cmd_ioctl+0x2c2/0x410 []
 0x80417153 : ip_queue_xmit+0x2b3/0x420 []
 0x880c3be9 : sd_ioctl+0x99/0xf0 [sd_mod]
 0x80313839 : blkdev_driver_ioctl+0x39/0x40 []
 0x80313ab8 : blkdev_ioctl+0x278/0x710 []
 0x803e540d : release_sock+0x8d/0xa0 []
 0x803e54b3 : lock_sock_nested+0x93/0xa0 []
 0x8029a4ff : d_alloc+0x6f/0x1a0 []
 0x802f102f : security_d_instantiate+0x1f/0x30 []
 0x803e540d : release_sock+0x8d/0xa0 []
 0x8043c8c8 : inet_accept+0x88/0xc0 []
 0x803e408e : move_addr_to_user+0x5e/0x70 []
 0x802f0bb1 : security_socket_post_accept+0x11/0x20 []
 0x803e472e : sys_accept+0x13e/0x1d0 []
 0x802af12b : block_ioctl+0x1b/0x20 []
 0x8029482b : do_ioctl+0x1b/0x70 []
 0x80294a6d : vfs_ioctl+0x1ed/0x290 []
 0x80296281 : sys_select+0x41/0x1b0 []
 0x80294b92 : sys_ioctl+0x82/0xa0 []
 0x8020c18e : system_call+0x7e/0x83 []


I suggest adding an annotation to make this reason known to latencytop
by annotating sd_ioctl/scsi_cmd_ioctl; see my patch below.
After my patch, instead of blk_execute_rq the real reason is shown:

Cause                      Maximum      Average
SCSI device ioctl          34.2 msec    14.4 msec

I noticed there's a similar annotation in sr.c for cdrom_ioctl, but sr.c
also has annotations for scsi_nonblockable_ioctl and scsi_ioctl.
If they are needed for sd_block_ioctl too, maybe
scsi_nonblockable_ioctl/scsi_ioctl should have the annotation themselves.

I also noticed some unsigned overflows; my top latency was sometimes a
very large number (18), which was probably -1, or some other negative
number.  I haven't found a way to reproduce it yet (it happens very
rarely).

[1] Systemtap script:
#!/usr/bin/stap -v
global start
probe kernel.function("blk_execute_rq") {
start[tid()] = gettimeofday_ns()
}

probe kernel.function("blk_execute_rq").return {
# if we have not seen this before
if (!([tid()] in start)) next
delta = gettimeofday_ns() - start[tid()]
# show backtraces for latencies over 1 msec (delta is in ns)
if (delta > 1000000) {
printf("%s <- latency of %d ns\n", thread_indent(1), delta)
print_backtrace ()
}
delete start[tid()]
}

P.S.: LatencyTop could have a feature to ask for more details on unknown
latency reasons: it could generate a systemtap script itself that would
show the backtraces, to help figure out what's going on.

Best regards,
--Edwin



diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index a69b155..93ccfc0 100644
@@ -686,6 +687,7 @@ static int sd_ioctl(struct inode * inode, struct file * filp,
 	struct scsi_device *sdp = scsi_disk(disk)->device;
 	void __user *p = (void __user *)arg;
 	int error;
+	struct latency_entry reason;
 
 	SCSI_LOG_IOCTL(1, printk("sd_ioctl: disk=%s, cmd=0x%x\n",
 				disk->disk_name, cmd));
@@ -710,7 +712,9 @@ static int sd_ioctl(struct inode * inode, struct file * filp,
 	case SCSI_IOCTL_GET_BUS_NUMBER:
 		return scsi_ioctl(sdp, cmd, p);
 	default:
+		set_latency_reason("SCSI device ioctl", &reason);
 		error = scsi_cmd_ioctl(filp, disk->queue, disk, cmd, p);
+		restore_latency_reason(&reason);
 		if (error != -ENOTTY)
 			return error;
 	}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Improve hackbench

2008-01-04 Thread Török Edwin
Ingo Molnar wrote:
> * Zhang, Yanmin <[EMAIL PROTECTED]> wrote:
>
>   
>> hackbench is to test Linux scheduler. The original program is at 
>> http://devresources.linux-foundation.org/craiger/hackbench/src/hackbench.c 
>> Based on this multi-process version, a nice person created a 
>> multi-thread version. Pls. see 
>> http://www.bullopensource.org/posix/pi-futex/hackbench_pth.c
>> 
>
> great. I've uploaded your unified & improved version to:
>
>   http://redhat.com/~mingo/cfs-scheduler/tools/hackbench.c
>
> (i made some small changes - two warning fixes on gcc 4.2 and a default 
> of 10 groups when hackbench is called without parameters, plus a 
> printout.)
>   

On x86-64 there's a bug [*] that causes hackbench to segfault when
compiled with optimizations:
in reap_worker():
int status;
...
pthread_join(id, (void **)(void *)&status);

That is not correct: sizeof(void*) > sizeof(int) on x86-64, so
something gets overwritten on the stack. I tried with gcc
-fstack-protector, but it doesn't detect it !?
After applying the patch, it no longer segfaults.

This patch fixes it:
--- hackbench.c 2008-01-04 10:08:26.0 +0200
+++ ../hackbench.c  2008-01-04 13:45:22.0 +0200
@@ -241,8 +241,10 @@
wait(&status);
if (!WIFEXITED(status))
exit(1);
-   } else
-   pthread_join(id, (void **)(void *)&status);
+   } else {
+   void* status;
+   pthread_join(id, (void **)&status);
+   }
 }

 /* One group of senders and receivers */
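
(For reference, the portable pattern for collecting a thread's exit
status -- a minimal sketch, not part of the patch above:

	void *ret;
	pthread_join(id, &ret);
	/* only if the thread actually returned an int cast to void*: */
	int status = (int)(long)ret;

Passing the address of an int where pthread_join expects room for a
void* is what lets it scribble past the 4-byte variable on x86-64.)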




I also notice that the thread version is slower than the process version:

$ ./hackbench 5 thread
Running with 5*40 (== 200) tasks.
Time: 0.413
$ ./hackbench 5 thread
Running with 5*40 (== 200) tasks.
Time: 0.423
$ ./hackbench 5 thread 20
Running with 5*40 (== 200) tasks.
Time: 0.093
$ ./hackbench 5 thread 200
Running with 5*40 (== 200) tasks.
Time: 0.827
$ ./hackbench 5 thread 2000
Running with 5*40 (== 200) tasks.
Time: 8.409
$ ./hackbench 5 process 2000
Running with 5*40 (== 200) tasks.
Time: 7.669
$ ./hackbench -pipe 5  process 2000
Running with 5*40 (== 200) tasks.
Time: 3.416
$ ./hackbench -pipe 5  thread 2000
Running with 5*40 (== 200) tasks.
Time: 4.320

[*]
$ uname -a
Linux lightspeed2 2.6.24-rc6-ge697789d #3 Wed Jan 2 11:15:05 EET 2008 x86_64 GNU/Linux
$ gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--with-gxx-include-dir=/usr/include/c++/4.2 --program-suffix=-4.2
--enable-clocale=gnu --enable-libstdcxx-debug --enable-mpfr
--enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.2.3 20071123 (prerelease) (Debian 4.2.2-4)
$ wget http://redhat.com/~mingo/cfs-scheduler/tools/hackbench.c
--13:40:53--  http://redhat.com/~mingo/cfs-scheduler/tools/hackbench.c
   => `hackbench.c'
Resolving redhat.com... 209.132.177.50
Connecting to redhat.com|209.132.177.50|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://www.redhat.com/~mingo/cfs-scheduler/tools/hackbench.c
[following]
--13:40:54--  http://www.redhat.com/~mingo/cfs-scheduler/tools/hackbench.c
   => `hackbench.c'
Resolving www.redhat.com... 209.132.177.50
Connecting to www.redhat.com|209.132.177.50|:80... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: http://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c
[following]
--13:40:54--  http://people.redhat.com/mingo/cfs-scheduler/tools/hackbench.c
   => `hackbench.c'
Resolving people.redhat.com... 66.187.233.237
Connecting to people.redhat.com|66.187.233.237|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8,455 (8.3K) [text/plain]

100%[>] 8,455  --.--K/s

13:40:55 (61.93 KB/s) - `hackbench.c' saved [8455/8455]

$ gcc  -O2 -g -Wall -o hackbench hackbench.c -lpthread
hackbench.c:32:66: warning: missing terminating ' character
$ ./hackbench 1 thread
Running with 1*40 (== 40) tasks.
Segmentation fault

$ valgrind --trace-children=yes ./hackbench 1 thread
==27332== Memcheck, a memory error detector.
==27332== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==27332== Using LibVEX rev 1804, a library for dynamic binary translation.
==27332== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==27332== Using valgrind-3.3.0-Debian, a dynamic binary instrumentation
framework.
==27332== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==27332== For more details, rerun with: -v
==27332==
Running with 1*40 (== 40) tasks.
==27332== Thread 2:
==27332== Syscall param write(buf) p

Re: [RFT] Port 0x80 I/O speed

2007-12-12 Thread Török Edwin
Rene Herman wrote:
> Good day.
>
> Would some people on x86 (both 32 and 64) be kind enough to compile
> and run the attached program? This is about testing how long I/O port
> access to port 0x80 takes. It measures in CPU cycles so CPU speed is
> crucial in reporting.
>
> Posted a previous incarnation of this before, buried in the outb 0x80
> thread which had a serialising problem. This one should as far as I
> can see measure the right thing though. Please yell if you disagree...
>
Hi,

Tested on 2 systems.
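
For reference, the measurement is of this shape (a minimal sketch, not
the actual attached program, which e.g. serializes around rdtsc):

#include <stdio.h>
#include <sys/io.h>	/* iopl(), inb(), outb() -- x86, needs root */

static inline unsigned long long rdtsc(void)
{
	unsigned lo, hi;
	__asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
	return ((unsigned long long)hi << 32) | lo;
}

int main(void)
{
	unsigned long long t0, t1, t2;

	if (iopl(3) < 0) {	/* raise I/O privilege level for port access */
		perror("iopl");
		return 1;
	}
	t0 = rdtsc();
	outb(0, 0x80);		/* one write to the POST diagnostic port */
	t1 = rdtsc();
	(void)inb(0x80);	/* one read back */
	t2 = rdtsc();
	printf("cycles: out %llu, in %llu\n", t1 - t0, t2 - t1);
	return 0;
}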

System I

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 47
model name  : AMD Athlon(tm) 64 Processor 3200+
stepping: 2
cpu MHz : 2000.000
cache size  : 512 KB
fpu : yes
fpu_exception   : yes

Motherboard: Asus A9N-E


With -m32:

[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 619, in 583
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 619, in 583
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067

After making the __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
change, I get this with -m64:
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 618, in 583
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 618, in 583
[EMAIL PROTECTED]:~$ sudo ./port80
cycles: out 1107, in 1067

If I stop cpudyn I get a constant 618/583.

System II


processor   : 0
vendor_id   : GenuineIntel
cpu family  : 6
model   : 14
model name  : Genuine Intel(R) CPU   T2300  @ 1.66GHz
stepping: 8
cpu MHz : 1667.000
cache size  : 2048 KB
physical id : 0
siblings: 2

processor   : 1
vendor_id   : GenuineIntel
cpu family  : 6
model   : 14
model name  : Genuine Intel(R) CPU   T2300  @ 1.66GHz
stepping: 8
cpu MHz : 1667.000
cache size  : 2048 KB
physical id : 0
siblings: 2


Dell Inspiron 6400, Intel Core Duo (ICH7 chipset)

thunder:/home/edwin# ./port80
cycles: out 2480, in 1867
thunder:/home/edwin# ./port80
cycles: out 2482, in 1865
thunder:/home/edwin# ./port80
cycles: out 2968, in 1893
thunder:/home/edwin# ./port80
cycles: out 1991, in 1372
thunder:/home/edwin# ./port80
cycles: out 1979, in 1366
thunder:/home/edwin# ./port80
cycles: out 2473, in 1865
thunder:/home/edwin# ./port80
cycles: out 2484, in 1869

After setting CPU governor to performance:

# ./port80
cycles: out 2368, in 1783
thunder:/home/edwin# ./port80
cycles: out 2377, in 1783
thunder:/home/edwin# ./port80
cycles: out 2367, in 1774
thunder:/home/edwin# ./port80
cycles: out 2370, in 1780
thunder:/home/edwin# ./port80
cycles: out 2365, in 1782
thunder:/home/edwin# ./port80
cycles: out 2369, in 1774
thunder:/home/edwin# ./port80
cycles: out 2366, in 1784
thunder:/home/edwin# ./port80
cycles: out 2379, in 1786
thunder:/home/edwin# ./port80
cycles: out 2367, in 1773
thunder:/home/edwin# ./port80
cycles: out 2376, in 1783
thunder:/home/edwin# ./port80
cycles: out 2360, in 1784
thunder:/home/edwin# ./port80
cycles: out 2367, in 1783
thunder:/home/edwin# ./port80
cycles: out 2370, in 1783
thunder:/home/edwin# ./port80
cycles: out 2382, in 1782

Also tried in a loop, but values are not constant:
while true; do ./port80; done
cycles: out 2415, in 1818
cycles: out 2405, in 1817
cycles: out 2414, in 1810
cycles: out 2411, in 1819
cycles: out 2407, in 1821
cycles: out 2410, in 1820
cycles: out 2418, in 1821
cycles: out 2408, in 1847
cycles: out 2404, in 1805
cycles: out 2411, in 1858
cycles: out 2395, in 1765
cycles: out 2377, in 1786
cycles: out 2378, in 1813
cycles: out 2395, in 1800
cycles: out 2381, in 1793
cycles: out 2382, in 1790
cycles: out 2399, in 1835
cycles: out 1928, in 1327
cycles: out 2410, in 1781
cycles: out 1996, in 1287
cycles: out 2369, in 1768
cycles: out 2401, in 1805
cycles: out 2395, in 1802
cycles: out 2389, in 1786
cycles: out 2359, in 1768
cycles: out 2495, in 1858
cycles: out 2408, in 1809
cycles: out 2919, in 1

Re: [PATCH] Add quirk to set AHCI mode on ICH boards

2007-11-09 Thread Török Edwin
Alan Cox wrote:
> On Thu, 8 Nov 2007 22:46:22 -0500
> Jeff Garzik <[EMAIL PROTECTED]> wrote:
>
>   
>> On Thu, Nov 08, 2007 at 10:29:37PM -0500, Mark Lord wrote:
>> 
>>> And I might even privately patch my own kernels to map the ACHI BAR
>>> in the cases where the BIOS didn't...
>>>   
>> The inability to do this in the general case is the main reason why
>> AHCI was not unconditionally enabled, even in IDE mode, when it was
>> originally added...  :/
>> 
>
> We've done it all the time for various devices without problems (eg S3
> video cards). I'd like to see it go in - although perhaps attached to a
> force_ahci boot param initially
>   

There is one problem with force-enabling ahci: you'll lose the CDROM on
Dell laptops.
Prior to force-enabling ahci there is one "device" that sees the 2 sata
channels, and the 2 ide channels.
When you force-enable ahci, this device becomes the ahci controller (it
changes the device id),
and the IDE controller will appear as a separate new device (with
another device id), but it is disabled.

There are registers on the ICH7 that allow you to set enabled/disabled
status, but according to the documentation you should not enable a
device after it has been disabled. In practice I couldn't get the CDROM
to be re-enabled:
* either nothing happened
* or spurious irqs were sent that nobody handled, unless I used irq=poll;
but still no cdrom.
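
(What I tried looks roughly like this -- purely illustrative, the
register offset and bit below are placeholders, not taken from the ICH7
datasheet:

	/* hypothetical function-disable register and IDE-disable bit */
	#define ICH7_FUNC_DIS_REG	0x40
	#define ICH7_IDE_DIS_BIT	0x02

	u8 val;
	pci_read_config_byte(pdev, ICH7_FUNC_DIS_REG, &val);
	pci_write_config_byte(pdev, ICH7_FUNC_DIS_REG,
			      val & ~ICH7_IDE_DIS_BIT);

followed by a bus re-scan so the IDE function shows up again.)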

Force-enabling AHCI and not trying to enable the CDROM works, although
I occasionally got NCQ errors.

For a (long) discussion see this thread on the powertop mailing list:
http://www.bughost.org/pipermail/power/2007-June/000533.html
http://www.bughost.org/pipermail/power/2007-June/000573.html

And there is also another slightly different approach:
http://mjg59.livejournal.com/76062.html

Best regards,
--Edwin






-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/