[ofa-general] Re: QoS management in OpenSM - doc

2008-03-02 Thread Yevgeny Kliteynik

Sasha Khapyorsky wrote:

Hi Yevgeny,

On 16:42 Wed 27 Feb , Yevgeny Kliteynik wrote:

The following doc describes QoS management in OpenSM.
This doc (named QoS_management_in_OpenSM.txt) has been added to
the OFED docs, along with the QoS_in_OFED.txt.

I'd like to add this info to OpenSM man pages as well.


Yes, I think that it could be useful to have it under opensm/doc too.


I'm including the text here as is, so it will be easier to follow
possible changes. When those will be done, I'll fix the format to
match the OpenSM man pages and post a patch.

The only problem is that the whole OpenSM man has ~850 lines,
while this QoS management file has ~500 lines... :)


I would suggest having some basic part (50-100 lines) included in the
man page and referencing the entire document (under opensm/doc) for more
details.


OK, I can prepare some kind of summary that would go into the man page.
However, this means that a user would not be able to define a QoS policy
just from reading the OpenSM man pages - he will HAVE to check the full
doc under opensm/doc.


Please review.


Looks fine, few tiny nits are below.

[snip...]


==
 4. Policy File Syntax Guidelines
==

- Empty lines are ignored.


It is mentioned on the next line too.


Right




- Leading and trailing blanks, as well as empty lines, are ignored, so
  the indentation in the example is just for better readability.
- Comments are started with the pound sign (#) and terminated by EOL.
- Any keyword should be the first non-blank in the line, unless it's a
  comment.
- Keywords that denote section/subsection start have matching closing
  keywords.
- Having a QoS Level named DEFAULT is a must - it is applied to PR/MPR
  requests that didn't match any of the matching rules.
- Any section/subsection of the policy file is optional.
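A minimal C sketch of how a reader of the policy file might apply the comment and blank-line rules above before it looks for a keyword. The function names are hypothetical and this is not OpenSM's parser; it only illustrates the stated rules.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: drop everything from '#' to end of line, then
 * trim leading and trailing blanks. 'line' is modified in place and the
 * returned pointer points into it.
 */
char *qos_clean_line(char *line)
{
    char *hash = strchr(line, '#');
    char *end;

    if (hash != NULL)
        *hash = '\0';                    /* comments run to EOL */

    while (isspace((unsigned char)*line))
        line++;                          /* leading blanks */

    end = line + strlen(line);
    while (end > line && isspace((unsigned char)end[-1]))
        end--;                           /* trailing blanks */
    *end = '\0';

    return line;
}

/* Nonzero if a cleaned line is empty and therefore ignored. */
int qos_line_is_ignored(const char *cleaned)
{
    return cleaned[0] == '\0';
}
```

After cleaning, whatever remains starts with its keyword, which matches the "keyword is the first non-blank" rule.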


[snip...]


==
 6. Simplified QoS Policy - Details and Examples
==

Simplified QoS policy match rules are tailored for matching PR/MPR requests
issued by ULPs (or by some application on top of a ULP). This section has a
list of per-ULP (or per-application) match rules and the SL that should be
enforced on the matched PR/MPR query.

Match rules include:
 - Default match rule that is applied to a PR/MPR query that didn't match
   any of the other match rules
 - SDP
 - SDP application with a specific target TCP/IP port range
 - SRP with a specific target IB port GUID
 - RDS
 - iSER
 - iSER application with a specific target TCP/IP port range
 - IPoIB with a default PKey
 - IPoIB with a specific PKey
 - any ULP/application with a specific Service ID in the PR/MPR query
 - any ULP/application with a specific PKey in the PR/MPR query
 - any ULP/application with a specific target IB port GUID in the PR/MPR query
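To make the matching idea concrete, here is a C sketch that covers only the three "any ULP/application" rules (Service ID, PKey, target port GUID). The struct layout, the first-match-wins order and the explicit default SL argument are assumptions for illustration, not OpenSM's data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified PR/MPR query: only the fields that the
 * "any ULP/application" rules above can test. */
struct pr_query {
    uint64_t service_id;
    uint16_t pkey;
    uint64_t target_port_guid;
};

enum match_kind {
    MATCH_SERVICE_ID,       /* specific Service ID */
    MATCH_PKEY,             /* specific PKey */
    MATCH_TARGET_GUID       /* specific target IB port GUID */
};

struct ulp_rule {
    enum match_kind kind;
    uint64_t value;         /* value the query field must equal */
    uint8_t sl;             /* SL enforced when the rule matches */
};

/* Assumption: rules are tried in file order and the first match wins;
 * a query that matches nothing gets default_sl. */
uint8_t select_sl(const struct pr_query *q, const struct ulp_rule *rules,
                  size_t nrules, uint8_t default_sl)
{
    for (size_t i = 0; i < nrules; i++) {
        const struct ulp_rule *r = &rules[i];
        int hit = 0;

        switch (r->kind) {
        case MATCH_SERVICE_ID:
            hit = q->service_id == r->value;
            break;
        case MATCH_PKEY:
            hit = q->pkey == r->value;
            break;
        case MATCH_TARGET_GUID:
            hit = q->target_port_guid == r->value;
            break;
        }
        if (hit)
            return r->sl;
    }
    return default_sl;
}
```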


Since any section of the policy file is optional, as long as the basic rules
of the file are kept (such as not referring to a nonexistent port group,
having a default QoS Level, etc.), the simplified policy section (qos-ulps)
can serve as a complete QoS policy file.
The shortest policy file in this case would be as follows:

qos-ulps
default  : 0 #default SL
end-qos-ulps

It is equivalent to the previous example of the shortest policy file, and it
is also equivalent to not having a policy file at all.
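Each qos-ulps entry has the shape "<match rule> : <SL>". A hypothetical C helper that splits such an entry (after comments have been removed) might look like the following; the 0-15 range check reflects the 4-bit SL field, and the function name and error convention are assumptions.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical parser for one comment-free qos-ulps entry of the form
 * "<match rule> : <SL>", e.g. "default : 0" or "sdp, port-num 1-2 : 0".
 * Copies the trimmed match rule into 'rule' and stores the SL in *sl.
 * Returns 0 on success, -1 if the entry is malformed.
 */
int parse_ulp_entry(const char *line, char *rule, size_t rule_len, int *sl)
{
    const char *colon = strchr(line, ':');
    const char *end;
    char *num_end;
    size_t n;
    long v;

    if (colon == NULL)
        return -1;

    while (isspace((unsigned char)*line))      /* trim the rule text */
        line++;
    end = colon;
    while (end > line && isspace((unsigned char)end[-1]))
        end--;
    n = (size_t)(end - line);
    if (n == 0 || n >= rule_len)
        return -1;
    memcpy(rule, line, n);
    rule[n] = '\0';

    errno = 0;
    v = strtol(colon + 1, &num_end, 10);
    if (errno != 0 || num_end == colon + 1)
        return -1;                             /* no number after ':' */
    while (isspace((unsigned char)*num_end))
        num_end++;
    if (*num_end != '\0' || v < 0 || v > 15)   /* SL is a 4-bit field */
        return -1;

    *sl = (int)v;
    return 0;
}
```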

Below is an example of a simplified QoS policy with all the possible keywords:


qos-ulps
    default                      : 0 # default SL
    sdp, port-num 3              : 0 # SL for application running on top
                                     # of SDP when a destination
                                     # TCP/IP port is 3
    sdp, port-num 1-2            : 0
    sdp                          : 1 # default SL for any other
                                     # application running on top of SDP
    rds                          : 2 # SL for RDS traffic
    iser, port-num 900           : 0 # SL for iSER with a specific target
                                     # port
    iser                         : 3 # default SL for iSER
    ipoib, pkey 0x0001           : 0 # SL for IPoIB on partition with
                                     # pkey 0x0001
    ipoib                        : 4 # default IPoIB partition,
                                     # pkey=0x7FFF
    any, service-id 0x6234       : 6 # match any PR/MPR query with a
                                     # specific Service ID
    any, pkey 0x0ABC             : 6 # match any PR/MPR query with a
                                     # specific PKey
    srp, target-port-guid 0x1234 : 5 # SRP when SRP Target is located on
                                     #

___
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

[ofa-general] ofa_1_3_kernel 20080302-0200 daily build status

2008-03-02 Thread Vladimir Sokolovsky (Mellanox)
This email was generated automatically, please do not reply


git_url: git://git.openfabrics.org/ofed_1_3/linux-2.6.git
git_branch: ofed_kernel

Common build parameters:   --with-ipoib-mod --with-sdp-mod --with-srp-mod --with-user_mad-mod --with-user_access-mod --with-mthca-mod --with-mlx4-mod --with-core-mod --with-addr_trans-mod  --with-rds-mod --with-cxgb3-mod --with-nes-mod

Passed:
Passed on i686 with 2.6.15-23-server
Passed on i686 with linux-2.6.13
Passed on i686 with linux-2.6.12
Passed on i686 with linux-2.6.16
Passed on i686 with linux-2.6.14
Passed on i686 with linux-2.6.15
Passed on i686 with linux-2.6.17
Passed on i686 with linux-2.6.18
Passed on i686 with linux-2.6.19
Passed on i686 with linux-2.6.22
Passed on i686 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.12
Passed on x86_64 with linux-2.6.14
Passed on x86_64 with linux-2.6.13
Passed on x86_64 with linux-2.6.15
Passed on x86_64 with linux-2.6.16
Passed on x86_64 with linux-2.6.16.43-0.3-smp
Passed on x86_64 with linux-2.6.16.21-0.8-smp
Passed on x86_64 with linux-2.6.18
Passed on x86_64 with linux-2.6.17
Passed on x86_64 with linux-2.6.18-1.2798.fc6
Passed on x86_64 with linux-2.6.19
Passed on x86_64 with linux-2.6.18-8.el5
Passed on x86_64 with linux-2.6.18-53.el5
Passed on x86_64 with linux-2.6.21.1
Passed on x86_64 with linux-2.6.22
Passed on x86_64 with linux-2.6.20
Passed on x86_64 with linux-2.6.24
Passed on x86_64 with linux-2.6.22.5-31-default
Passed on x86_64 with linux-2.6.9-42.ELsmp
Passed on x86_64 with linux-2.6.9-55.ELsmp
Passed on ia64 with linux-2.6.13
Passed on ia64 with linux-2.6.12
Passed on ia64 with linux-2.6.15
Passed on ia64 with linux-2.6.14
Passed on ia64 with linux-2.6.16
Passed on ia64 with linux-2.6.18
Passed on ia64 with linux-2.6.17
Passed on ia64 with linux-2.6.16.21-0.8-default
Passed on ia64 with linux-2.6.21.1
Passed on ia64 with linux-2.6.22
Passed on ia64 with linux-2.6.19
Passed on powerpc with linux-2.6.12
Passed on ia64 with linux-2.6.23
Passed on ia64 with linux-2.6.24
Passed on powerpc with linux-2.6.14
Passed on powerpc with linux-2.6.15
Passed on powerpc with linux-2.6.13
Passed on ppc64 with linux-2.6.12
Passed on ppc64 with linux-2.6.14
Passed on ppc64 with linux-2.6.13
Passed on ppc64 with linux-2.6.15
Passed on ppc64 with linux-2.6.16
Passed on ppc64 with linux-2.6.17
Passed on ppc64 with linux-2.6.19
Passed on ppc64 with linux-2.6.18
Passed on ppc64 with linux-2.6.18-8.el5
Passed on ppc64 with linux-2.6.24

Failed:


Re: [ofa-general] Re: [OpenSM] updn routing performance fix???

2008-03-02 Thread Hal Rosenstock
On Sat, 2008-03-01 at 22:53 +, Sasha Khapyorsky wrote:
 On 19:59 Fri 29 Feb , Hal Rosenstock wrote:
  
  If that makes sense, then also query commands on this state would
  likely also.
 
 Not sure about this. It is dynamically updated flag, so it would be hard
 to catch a valid value by hand from the OpenSM console.

I was referring to the balance state, not that flag. Does that make
more sense?

-- Hal

 Sasha


Re: [ofa-general] Re: [OpenSM] updn routing performance fix???

2008-03-02 Thread Sasha Khapyorsky
On 05:04 Sun 02 Mar , Hal Rosenstock wrote:
 On Sat, 2008-03-01 at 22:53 +, Sasha Khapyorsky wrote:
  On 19:59 Fri 29 Feb , Hal Rosenstock wrote:
   
   If that makes sense, then also query commands on this state would
   likely also.
  
  Not sure about this. It is dynamically updated flag, so it would be hard
  to catch a valid value by hand from the OpenSM console.
 
 I was referring to the balance state not that flag. Does that make
 more sense ?

What do you mean? Routing dumps?

Sasha


Re: [ofa-general] page allocation failure

2008-03-02 Thread Eli Cohen
Looks like your system is low on memory. Maybe you need to add memory, or
maybe something is eating your memory (a memory leak?).

On Thu, 2008-02-28 at 18:42 +0100, Bernd Schubert wrote:
 Hello,
 
 on several of our Lustre servers we can see page allocation failures.
 
 This is with 2.6.22 + kernel modules from OFED 1.2.5
 
 
 [44464.764559] Lustre: 24052:0:(ldlm_lib.c:698:target_handle_connect()) Skipped 16 previous similar messages
 [54132.351263] ib_cm/2: page allocation failure. order:0, mode:0x10d0
 [54132.360738]
 [54132.360741] Call Trace:
 [54132.367803]  [8020ac61] show_trace+0x34/0x47
 [54132.373235]  [8020ac86] dump_stack+0x12/0x17
 [54132.378937]  [80251bc4] __alloc_pages+0x2a3/0x2bc
 [54132.386180]  [8020f75c] dma_alloc_pages+0x9b/0xbf
 [54132.395120]  [8020f7f6] dma_alloc_coherent+0x76/0x1cc
 [54132.401651]  [8809af1e] :ib_mthca:mthca_buf_alloc+0x1bd/0x2a3
 [54132.408897]  [8809f9a9] :ib_mthca:mthca_alloc_qp_common+0x246/0x4e5
 [54132.418884]  [880a0c6d] :ib_mthca:mthca_alloc_qp+0xab/0x102
 [54132.425774]  [880a5217] :ib_mthca:mthca_create_qp+0x126/0x281
 [54132.432716]  [88054bc5] :ib_core:ib_create_qp+0x17/0x91
 [54132.439102]  [88161c9f] :rdma_cm:rdma_create_qp+0x2d/0x153
 [54132.446301]  [8835d0cc] :ko2iblnd:kiblnd_create_conn+0x81c/0x1250
 [54132.456992]  [88365295] :ko2iblnd:kiblnd_passive_connect+0x605/0xdd0
 [54132.469847]  [88366975] :ko2iblnd:kiblnd_cm_callback+0x255/0xeb0
 [54132.478821]  [881620e7] :rdma_cm:cma_req_handler+0x322/0x389
 [54132.485637]  [88155fa4] :ib_cm:cm_process_work+0x17/0xad
 [54132.492182]  [88157025] :ib_cm:cm_req_handler+0x7ae/0x81b
 [54132.499236]  [881570bf] :ib_cm:cm_work_handler+0x2d/0xbaa
 [54132.506690]  [80236291] run_workqueue+0x7f/0x10b
 [54132.512652]  [80236b1a] worker_thread+0xda/0xe4
 [54132.520136]  [8023959a] kthread+0x47/0x75
 [54132.525570]  [8020a2f8] child_rip+0xa/0x12
 [54132.532975]
 [54132.535527] Mem-info:
 [54132.538157] Node 0 DMA per-cpu:
 [54132.542303] CPU0: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
 [54132.551752] CPU1: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
 [54132.561661] CPU2: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
 [54132.571154] CPU3: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
 [54132.580597] CPU4: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
 [54132.592354] CPU5: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
 [54132.601794] CPU6: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
 [54132.610719] CPU7: Hot: hi:0, btch:   1 usd:   0   Cold: hi:0, btch:   1 usd:   0
 [54132.619630] Node 0 DMA32 per-cpu:
 [54132.623551] CPU0: Hot: hi:  186, btch:  31 usd:  49   Cold: hi:   62, btch:  15 usd:  49
 [54132.632691] CPU1: Hot: hi:  186, btch:  31 usd:  26   Cold: hi:   62, btch:  15 usd:   3
 [54132.642680] CPU2: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:  54
 [54132.651897] CPU3: Hot: hi:  186, btch:  31 usd:   1   Cold: hi:   62, btch:  15 usd:  13
 [54132.663321] CPU4: Hot: hi:  186, btch:  31 usd:  43   Cold: hi:   62, btch:  15 usd:  55
 [54132.673282] CPU5: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:  49
 [54132.683636] CPU6: Hot: hi:  186, btch:  31 usd:  25   Cold: hi:   62, btch:  15 usd:   1
 [54132.693156] CPU7: Hot: hi:  186, btch:  31 usd:  13   Cold: hi:   62, btch:  15 usd:  56
 [54132.703412] Node 0 Normal per-cpu:
 [54132.707024] CPU0: Hot: hi:  186, btch:  31 usd: 130   Cold: hi:   62, btch:  15 usd:  14
 [54132.719317] CPU1: Hot: hi:  186, btch:  31 usd:  81   Cold: hi:   62, btch:  15 usd:   1
 [54132.729276] CPU2: Hot: hi:  186, btch:  31 usd: 134   Cold: hi:   62, btch:  15 usd:   2
 [54132.738819] CPU3: Hot: hi:  186, btch:  31 usd: 124   Cold: hi:   62, btch:  15 usd:   8
 [54132.748078] CPU4: Hot: hi:  186, btch:  31 usd:  21   Cold: hi:   62, btch:  15 usd:   4
 [54132.758029] CPU5: Hot: hi:  186, btch:  31 usd:  30   Cold: hi:   62, btch:  15 usd:   9
 [54132.766855] CPU6: Hot: hi:  186, btch:  31 usd: 120   Cold: hi:   62, btch:  15 usd:  13
 [54132.776462] CPU7: Hot: hi:  186, btch:  31 usd: 166   Cold: hi:   62, btch:  15 usd:  12
 [54132.786009] Active:28507 inactive:62701 dirty:8386 writeback:27 unstable:0
 [54132.786010]  free:5586 slab:273528 mapped:2136 pagetables:699 bounce:0
 [54132.803082] Node 0 DMA free:11192kB min:20kB low:24kB high:28kB active:0kB inactive:0kB present:10660kB pages_scanned:0 all_unreclaimable? yes
 [54132.816507] lowmem_reserve[]: 0 3255 4013
 [54132.820811] Node 0 DMA32 free:9812kB min:6564kB low:8204kB high:9844kB

Re: [ofa-general] Re: [OpenSM] updn routing performance fix???

2008-03-02 Thread Albert Chu
Hey Hal,

Are you saying a flag inside each osm_switch_t to indicate if that
specific switch is balanced?

The script I wrote for the balance check had difficulty with a lot of
corner cases (is the port connected to a CA? is it active? which ports
are up vs. down links, etc.).  At the end of the day, you just output a
lot of extra info and have to look through it manually.

Although probably not easy as a whole, these calculations would be easier
in opensm since that information is available.
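Assuming such a per-switch flag were computed from per-port route counts, a C sketch could look like the following. `struct port_load` and the `tolerance` parameter are hypothetical (OpenSM's osm_switch_t is not modeled); the sketch skips inactive and CA-facing ports, two of the corner cases above, but does not separate up links from down links.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port summary for one switch. */
struct port_load {
    bool active;        /* link is up */
    bool to_ca;         /* peer is a CA, so excluded from the comparison */
    unsigned lids;      /* destination LIDs routed through this port */
};

/*
 * Balanced here means: among active, switch-facing ports, the per-port
 * LID counts differ by at most 'tolerance'. Switches with fewer than
 * two such ports are trivially balanced.
 */
bool switch_is_balanced(const struct port_load *ports, size_t nports,
                        unsigned tolerance)
{
    unsigned min = 0, max = 0;
    size_t eligible = 0;

    for (size_t i = 0; i < nports; i++) {
        if (!ports[i].active || ports[i].to_ca)
            continue;
        if (eligible == 0 || ports[i].lids < min)
            min = ports[i].lids;
        if (eligible == 0 || ports[i].lids > max)
            max = ports[i].lids;
        eligible++;
    }
    return eligible < 2 || max - min <= tolerance;
}
```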

Al

 On 05:04 Sun 02 Mar , Hal Rosenstock wrote:
 On Sat, 2008-03-01 at 22:53 +, Sasha Khapyorsky wrote:
  On 19:59 Fri 29 Feb , Hal Rosenstock wrote:
  
   If that makes sense, then also query commands on this state would
   likely also.
 
  Not sure about this. It is dynamically updated flag, so it would be hard
  to catch a valid value by hand from the OpenSM console.

 I was referring to the balance state not that flag. Does that make
 more sense ?

 What do you mean? Routing dumps?

 Sasha



-- 
Albert Chu
[EMAIL PROTECTED]
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory



[ofa-general] Re: [PATCH] opensm: enforce routing paths rebalancing on switch reconnection

2008-03-02 Thread Albert Chu
Hey Sasha,

In order to make things work, I also had to add this patch.  Seems like a
corner case that needs to be handled since we never fall into
__osm_pi_rcv_process_switch_port().  (BTW, I am working off a 3.1.10
branch for the test cluster, so this patch is forward ported and
technically untested.)

--- a/opensm/opensm/osm_port_info_rcv.c
+++ b/opensm/opensm/osm_port_info_rcv.c
@@ -564,6 +564,7 @@ void osm_pi_rcv_process(IN void *context, IN void *data)
 			", Commencing heavy sweep\n",
 			cl_ntoh64(node_guid), cl_ntoh64(port_guid));
 		sm->p_subn->force_heavy_sweep = 1;
+		sm->p_subn->ignore_existing_lfts = 1;
 		goto Exit;
 	}

Al

 Hey Sasha,

 This patch should definitely work.  I'll let you know after I get a chance
 to try it.

 Al

 Hi Al,

 On 16:08 Sat 01 Mar , Sasha Khapyorsky wrote:

 When switch ports are reconnected, we need to recalculate the routing path
 balancing. Reconnection is detected by examining the port state: when it
 becomes INIT, routing path rebalancing (the ignore_existing_lfts flag) is
 enforced.

 Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]

 This patch is simpler than all previous ones. I tested it with ibsim
 already. Could you test in your environment?

 Sasha
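The reconnection rule in the patch description can be illustrated with a small, self-contained C sketch. The PortState numbering (1 = Down, 2 = Init, 3 = Armed, 4 = Active) follows the InfiniBand PortInfo attribute; `struct subnet_flags` and `note_switch_port_state()` are hypothetical stand-ins for the OpenSM subnet flags named in this thread, not OpenSM code.

```c
#include <stdbool.h>

/* PortInfo:PortState values from the InfiniBand specification. */
enum ib_port_state {
    IB_PORT_DOWN   = 1,
    IB_PORT_INIT   = 2,
    IB_PORT_ARMED  = 3,
    IB_PORT_ACTIVE = 4
};

/* Hypothetical stand-in for the two subnet flags discussed here. */
struct subnet_flags {
    bool force_heavy_sweep;
    bool ignore_existing_lfts;
};

/*
 * A switch port reported in INIT has just been (re)connected, so ask
 * for a heavy sweep and for the LFTs to be recomputed from scratch
 * rather than reusing the existing ones. Returns true if requested.
 */
bool note_switch_port_state(struct subnet_flags *subn,
                            enum ib_port_state state)
{
    if (state != IB_PORT_INIT)
        return false;
    subn->force_heavy_sweep = true;
    subn->ignore_existing_lfts = true;
    return true;
}
```

Al's follow-up patch above sets the same second flag on a path that otherwise only forced the heavy sweep, which is why the two flags appear together here.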










[ofa-general] Re: [PATCH] mmu notifiers #v8 + xpmem

2008-03-02 Thread Andrea Arcangeli
Here is an example of the further orthogonal work to do on top of #v8
during .26-rc to make the whole mmu notifier API sleep capable.

1) Every single ptep_clear_flush_young_notify and
ptep_clear_flush_notify must be converted as shown below; the patch
below converts a single one. do_wp_page has already been converted by
Christoph, but with invalidate_range (it should be changed to
invalidate_page by releasing the refcount on the page after calling
invalidate_page). I hope it's clear why I'd rather not depend on these
changes being merged in .25 in order to have the mmu notifier included
in .25.

2) Then after all this conversion work is finished, it's trivial to
delete ptep_clear_flush_young_notify and ptep_clear_flush_notify from
mmu_notifier.h (they will be unused macros once the conversion is
complete).

3) After that the VM has to be changed to convert anon_vma lock and
i_mmap_lock spinlocks to mutex/rwsemaphore.

4) Then finally the mmu_notifier_unregister must be dropped to make the
mmu notifier sleep capable with RCU in the mmu_notifier() fast path.

It's unclear at this point whether 3 and 4 should be switchable and happen
under a CONFIG_XPMEM or similar option, or whether everyone will benefit
from those spinlocks becoming mutexes (the only one certain to appreciate
such a change is preempt-rt; for the rest of the userbase I don't know for
sure, and I'd be more comfortable with a TPC number comparison before
making such a change by default, but I leave the commentary on such a
change to linux-mm in a separate thread).

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]

diff --git a/mm/rmap.c b/mm/rmap.c
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -274,7 +274,7 @@ static int page_referenced_one(struct pa
 	unsigned long address;
 	pte_t *pte;
 	spinlock_t *ptl;
-	int referenced = 0;
+	int referenced = 0, clear_flush_young = 0;
 
 	address = vma_address(page, vma);
 	if (address == -EFAULT)
@@ -287,8 +287,11 @@ static int page_referenced_one(struct pa
 	if (vma->vm_flags & VM_LOCKED) {
 		referenced++;
 		*mapcount = 1;	/* break early from loop */
-	} else if (ptep_clear_flush_young_notify(vma, address, pte))
-		referenced++;
+	} else {
+		clear_flush_young = 1;
+		if (ptep_clear_flush_young(vma, address, pte))
+			referenced++;
+	}
 
 	/* Pretend the page is referenced if the task has the
 	   swap token and is in the middle of a page fault. */
@@ -298,6 +301,11 @@ static int page_referenced_one(struct pa
 
 	(*mapcount)--;
 	pte_unmap_unlock(pte, ptl);
+
+	if (clear_flush_young)
+		referenced += mmu_notifier_clear_flush_young(vma->vm_mm,
+							     address);
+
 out:
 	return referenced;
 }
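The idea behind this conversion is to keep the primary-MMU test-and-clear under the PT lock but defer the secondary-MMU notification until after the lock is dropped, so the notifier is free to sleep. The C sketch below is only a user-space analogy: a pthread mutex stands in for the PT lock, and every name in it is hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical secondary-MMU hook; since it runs after the lock is
 * released, it would be allowed to block. */
typedef int (*young_fn)(void *priv, unsigned long address);

struct notifier {
    young_fn clear_flush_young;
    void *priv;
};

/* Example hook: reports and clears a "young" bit kept in priv. */
int demo_secondary_young(void *priv, unsigned long address)
{
    int *bit = priv;
    int was = *bit;

    (void)address;
    *bit = 0;
    return was;
}

/*
 * Same shape as the converted page_referenced_one(): the primary
 * test-and-clear happens under 'ptl', but the notifiers are only
 * flagged there and called once the lock has been released.
 */
int referenced_one(pthread_mutex_t *ptl, int *young_bit, int vm_locked,
                   const struct notifier *nfs, size_t n,
                   unsigned long address)
{
    int referenced = 0, clear_flush_young = 0;

    pthread_mutex_lock(ptl);
    if (vm_locked) {
        referenced++;               /* locked VMA: count it, skip aging */
    } else {
        clear_flush_young = 1;
        if (*young_bit) {           /* primary MMU accessed bit */
            *young_bit = 0;
            referenced++;
        }
    }
    pthread_mutex_unlock(ptl);

    /* Lock dropped: the hooks may now sleep. */
    if (clear_flush_young)
        for (size_t i = 0; i < n; i++)
            referenced += nfs[i].clear_flush_young(nfs[i].priv, address);

    return referenced;
}
```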



[ofa-general] [infiniband-diags] check_lft_balance script

2008-03-02 Thread Albert Chu
Hey Sasha,

Here's the script I mentioned before that I used for the balance checking
earlier.  It's nothing fancy, but it could probably be useful to others.

Al

From 110a19a2d1bdaafe1ace7b2c48f39be5c1ec388f Mon Sep 17 00:00:00 2001
From: Albert L. Chu [EMAIL PROTECTED]
Date: Sat, 1 Mar 2008 20:02:03 -0800
Subject: [PATCH] add check_lft_balance script


Signed-off-by: Albert L. Chu [EMAIL PROTECTED]
---
 infiniband-diags/Makefile.am  |6 +-
 infiniband-diags/infiniband-diags.spec.in |4 +
 infiniband-diags/man/check_lft_balance.8  |   42 
 infiniband-diags/scripts/check_lft_balance.pl |  319 +
 4 files changed, 369 insertions(+), 2 deletions(-)
 create mode 100644 infiniband-diags/man/check_lft_balance.8
 create mode 100755 infiniband-diags/scripts/check_lft_balance.pl

diff --git a/infiniband-diags/Makefile.am b/infiniband-diags/Makefile.am
index ca66e2d..8bbda9e 100644
--- a/infiniband-diags/Makefile.am
+++ b/infiniband-diags/Makefile.am
@@ -25,7 +25,8 @@ sbin_SCRIPTS = scripts/ibcheckerrs scripts/ibchecknet scripts/ibchecknode \
 	   scripts/ibqueryerrors.pl scripts/ibswportwatch.pl \
 	   scripts/iblinkinfo.pl scripts/ibprintswitch.pl \
 	   scripts/ibprintca.pl scripts/ibprintrt.pl \
-	   scripts/ibfindnodesusing.pl scripts/ibidsverify.pl
+	   scripts/ibfindnodesusing.pl scripts/ibidsverify.pl \
+	   scripts/check_lft_balance.pl
 
 src_ibaddr_SOURCES = src/ibaddr.c src/ibdiag_common.c
 src_ibaddr_CFLAGS = -Wall $(DBGFLAGS)
@@ -89,7 +90,8 @@ man_MANS = man/ibaddr.8 man/ibcheckerrors.8 man/ibcheckerrs.8 \
 	man/iblinkinfo.8 man/ibqueryerrors.8 man/ibswportwatch.8 \
 	man/ibprintswitch.8 man/ibprintca.8 man/ibfindnodesusing.8 \
 	man/ibdatacounts.8 man/ibdatacounters.8 \
-	man/ibrouters.8 man/ibprintrt.8 man/ibidsverify.8
+	man/ibrouters.8 man/ibprintrt.8 man/ibidsverify.8 \
+	man/check_lft_balance.8
 
 BUILT_SOURCES = ibdiag_version
 ibdiag_version:
diff --git a/infiniband-diags/infiniband-diags.spec.in b/infiniband-diags/infiniband-diags.spec.in
index 7a0e17b..9c8c0c4 100644
--- a/infiniband-diags/infiniband-diags.spec.in
+++ b/infiniband-diags/infiniband-diags.spec.in
@@ -48,6 +48,7 @@ rm -rf $RPM_BUILD_ROOT
 %{_sbindir}/vendstat
 %{_sbindir}/dump_mfts.sh
 %{_sbindir}/dump_lfts.sh
+%{_sbindir}/check_lft_balance.pl
 %{_sbindir}/set_nodedesc.sh
 %{_sbindir}/sm*
 %define _perldir %(perl -e 'use Config; $T=$Config{installsitearch}; $T=~/(.*)\\/site_perl.*/; print $1;')
@@ -56,6 +57,9 @@ rm -rf $RPM_BUILD_ROOT
 %doc README COPYING ChangeLog
 
 %changelog
+* Mon Mar 03 2008 Albert Chu [EMAIL PROTECTED] - 1.3.5
+- Add check_lft_balance script.
+
 * Wed Oct 31 2007 Ira Weiny [EMAIL PROTECTED] - 1.3.2
 - Change switch-map option to node-name-map
 
diff --git a/infiniband-diags/man/check_lft_balance.8 b/infiniband-diags/man/check_lft_balance.8
new file mode 100644
index 000..35243f6
--- /dev/null
+++ b/infiniband-diags/man/check_lft_balance.8
@@ -0,0 +1,42 @@
+.TH CHECK_LFT_BALANCE.PL 8 "March 1, 2008" "OpenIB" "OpenIB Diagnostics"
+
+.SH NAME
+check_lft_balance.pl \- check InfiniBand unicast forwarding tables balance
+
+.SH SYNOPSIS
+.B check_lft_balance.pl
+[-hRv]
+
+
+.SH DESCRIPTION
+.PP
+check_lft_balance.pl is a script that checks the balance of InfiniBand
+unicast forwarding tables.  It analyzes the output of
+.BR dump_lfts(8)
+and
+.BR iblinkinfo(8).
+
+.SH OPTIONS
+
+.PP
+.TP
+\fB\-h\fR
+show help
+.TP
+\fB\-R\fR
+Recalculate dump_lfts information, i.e., do not use the cached
+information.  This option is slower but should be used if the diag tools have
+not been used for some time or if there are other reasons to believe that
+the fabric has changed.
+.TP
+\fB\-v\fR
+verbose output
+
+.SH SEE ALSO
+.BR dump_lfts(8),
+.BR iblinkinfo(8)
+
+.SH AUTHORS
+.TP
+Albert Chu
+.RI  [EMAIL PROTECTED] 
diff --git a/infiniband-diags/scripts/check_lft_balance.pl b/infiniband-diags/scripts/check_lft_balance.pl
new file mode 100755
index 000..c4186ed
--- /dev/null
+++ b/infiniband-diags/scripts/check_lft_balance.pl
@@ -0,0 +1,319 @@
+#!/usr/bin/perl
+#
+# Copyright (C) 2001-2003 The Regents of the University of California.
+# Copyright (c) 2006 The Regents of the University of California.
+# Copyright (c) 2007 Voltaire, Inc. All rights reserved.
+#
+# Produced at Lawrence Livermore National Laboratory.
+# Written by Ira Weiny [EMAIL PROTECTED]
+#Jim Garlick [EMAIL PROTECTED]
+#Albert Chu [EMAIL PROTECTED]
+#
+# This software is available to you under a choice of one of two
+# licenses.  You may choose to be licensed under the terms of the GNU
+# General Public License (GPL) Version 2, available from the file
+# COPYING in the main directory of this source tree, or the
+# OpenIB.org BSD license below:
+#
+# Redistribution and use in source and binary forms, with or
+# 

[ofa-general] Re: [PATCH] mmu notifiers #v8 + xpmem

2008-03-02 Thread Peter Zijlstra

On Sun, 2008-03-02 at 17:03 +0100, Andrea Arcangeli wrote:

 4) Then finally the mmu_notifier_unregister must be dropped to make the
 mmu notifier sleep capable with RCU in the mmu_notifier() fast path.

Or require PREEMPTIBLE_RCU, which can handle sleeps.



[ofa-general] [PATCH] mmu notifiers #v8

2008-03-02 Thread Andrea Arcangeli
Difference between #v7 and #v8:

1) s/age_page/clear_flush_young/ (Nick's suggestion)
2) macro fix (Andrew)
3) move release before final unmap_vmas (for GRU, Jack/Christoph)
4) microoptimize mmu_notifier_unregister (Christoph)
5) use mmap_sem for registration serialization (Christoph)

The (void)xxx in macros doesn't work with args. Christoph's solution
looks best at avoiding warnings, even though it forces the mmu notifier
operation structure to be visible even when MMU_NOTIFIER=n (that's
the only downside).
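One way to read the macro problem: a stub that merely casts its arguments to void cannot cope with a variable argument list, whereas a call placed in dead code still references and type-checks every argument, which is also why the ops structure has to stay visible. The C sketch below shows that technique; the macro name is hypothetical, it uses C99 `__VA_ARGS__` instead of the kernel's `args...` form, and it is not the patch's actual code.

```c
#include <stddef.h>

struct mm_struct;                   /* opaque in this sketch */
struct mmu_notifier;

/* The ops type must stay visible even when notifiers are compiled out,
 * because the disabled macro below still type-checks a call through it. */
struct mmu_notifier_ops {
    void (*invalidate_page)(struct mmu_notifier *mn, struct mm_struct *mm,
                            unsigned long address);
};

struct mmu_notifier {
    const struct mmu_notifier_ops *ops;
};

/*
 * Hypothetical "notifiers disabled" variant: the call is dead code, so
 * nothing is executed, yet 'mm' and every extra argument are still
 * referenced (no unused-variable warnings) and type-checked.
 */
#define mmu_notifier_call(function, mm, ...)                        \
    do {                                                            \
        if (0) {                                                    \
            struct mmu_notifier *mn_ = NULL;                        \
            mn_->ops->function(mn_, (mm), __VA_ARGS__);             \
        }                                                           \
    } while (0)
```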

I didn't drop invalidate_page, because invalidate_range_begin/end
would be slower for usages like KVM/GRU (we don't need a begin/end
there because where invalidate_page is called, the VM holds a
reference on the page). do_wp_page should also use invalidate_page
since it can free the page after dropping the PT lock without losing
any performance (that's not true for the places where invalidate_range
is called).

It'd be nice if everyone involved could agree to converge on this API
for .25. KVM/GRU (and perhaps Quadrics) and similar usages will be
fully covered in .25. This is a kernel-internal API, so there's no
problem if all the methods become sleep capable only starting
in .26. The more involved part of the VM work needed to make it sleep
capable is pretty much orthogonal to this patch.

Signed-off-by: Andrea Arcangeli [EMAIL PROTECTED]
Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -10,6 +10,7 @@
 #include <linux/rbtree.h>
 #include <linux/rwsem.h>
 #include <linux/completion.h>
+#include <linux/mmu_notifier.h>
 #include <asm/page.h>
 #include <asm/mmu.h>
 
@@ -228,6 +229,8 @@ struct mm_struct {
 #ifdef CONFIG_CGROUP_MEM_CONT
 	struct mem_cgroup *mem_cgroup;
 #endif
+
+	struct mmu_notifier_head mmu_notifier; /* MMU notifier list */
 };
 
 #endif /* _LINUX_MM_TYPES_H */
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
new file mode 100644
--- /dev/null
+++ b/include/linux/mmu_notifier.h
@@ -0,0 +1,161 @@
+#ifndef _LINUX_MMU_NOTIFIER_H
+#define _LINUX_MMU_NOTIFIER_H
+
+#include <linux/list.h>
+#include <linux/spinlock.h>
+
+struct mmu_notifier;
+
+struct mmu_notifier_ops {
+	/*
+	 * Called when nobody can register any more notifier in the mm
+	 * and after the mn notifier has been disarmed already.
+	 */
+	void (*release)(struct mmu_notifier *mn,
+			struct mm_struct *mm);
+
+	/*
+	 * clear_flush_young is called after the VM is
+	 * test-and-clearing the young/accessed bitflag in the
+	 * pte. This way the VM will provide proper aging to the
+	 * accesses to the page through the secondary MMUs and not
+	 * only to the ones through the Linux pte.
+	 */
+	int (*clear_flush_young)(struct mmu_notifier *mn,
+				 struct mm_struct *mm,
+				 unsigned long address);
+
+	/*
+	 * Before this is invoked any secondary MMU is still ok to
+	 * read/write to the page previously pointed by the Linux pte
+	 * because the old page hasn't been freed yet.  If required
+	 * set_page_dirty has to be called internally to this method.
+	 */
+	void (*invalidate_page)(struct mmu_notifier *mn,
+				struct mm_struct *mm,
+				unsigned long address);
+
+	/*
+	 * invalidate_range_begin() and invalidate_range_end() must be
+	 * paired. Multiple invalidate_range_begin/ends may be nested
+	 * or called concurrently.
+	 */
+	void (*invalidate_range_begin)(struct mmu_notifier *mn,
+				       struct mm_struct *mm,
+				       unsigned long start, unsigned long end);
+	void (*invalidate_range_end)(struct mmu_notifier *mn,
+				     struct mm_struct *mm,
+				     unsigned long start, unsigned long end);
+};
+
+struct mmu_notifier {
+	struct hlist_node hlist;
+	const struct mmu_notifier_ops *ops;
+};
+
+struct mmu_notifier {
+   struct hlist_node hlist;
+   const struct mmu_notifier_ops *ops;
+};
+
+#ifdef CONFIG_MMU_NOTIFIER
+
+struct mmu_notifier_head {
+   struct hlist_head head;
+};
+
+#include <linux/mm_types.h>
+
+/*
+ * Must hold the mmap_sem for write.
+ *
+ * RCU is used to traverse the list. A quiescent period needs to pass
+ * before the notifier is guaranteed to be visible to all threads.
+ */
+extern void mmu_notifier_register(struct mmu_notifier *mn,
+ struct mm_struct *mm);
+/*
+ * Must hold the mmap_sem for write.
+ *
+ * RCU is used to traverse the list. A quiescent period needs to pass
+ * before the struct mmu_notifier can be freed. Alternatively it
+ * can be synchronously freed inside -release when the list can't
+ * change anymore and nobody could possibly walk it.
+ */
+extern void mmu_notifier_unregister(struct 
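The callback table above can be exercised outside the kernel with a userspace mock. The sketch below is hypothetical: `struct mock_mm`, `struct mock_notifier`, and every `mock_*`/`drv_*` name are stand-ins invented for illustration, not kernel APIs, and a plain singly linked list replaces the RCU-protected hlist. It only models two rules stated in the header comments: optional hooks may be left NULL and must be skipped, and invalidate_range_begin/end calls are paired but may nest.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel's
 * mm_struct / mmu_notifier definitions. */
struct mock_notifier;

struct mock_mm {
	struct mock_notifier *notifiers; /* plain list head, no RCU */
};

struct mock_notifier_ops {
	void (*invalidate_page)(struct mock_notifier *mn, struct mock_mm *mm,
				unsigned long address);
	void (*invalidate_range_begin)(struct mock_notifier *mn,
				       struct mock_mm *mm,
				       unsigned long start, unsigned long end);
	void (*invalidate_range_end)(struct mock_notifier *mn,
				     struct mock_mm *mm,
				     unsigned long start, unsigned long end);
};

struct mock_notifier {
	struct mock_notifier *next;
	const struct mock_notifier_ops *ops;
	int depth;                /* current begin/end nesting depth */
	unsigned long bytes_done; /* bytes covered by completed ranges */
};

static void mock_register(struct mock_notifier *mn, struct mock_mm *mm)
{
	mn->next = mm->notifiers; /* push onto the mm's list */
	mm->notifiers = mn;
}

/* Call each registered notifier's hook, skipping hooks left NULL. */
static void mock_range_begin(struct mock_mm *mm, unsigned long start,
			     unsigned long end)
{
	for (struct mock_notifier *mn = mm->notifiers; mn; mn = mn->next)
		if (mn->ops->invalidate_range_begin)
			mn->ops->invalidate_range_begin(mn, mm, start, end);
}

static void mock_range_end(struct mock_mm *mm, unsigned long start,
			   unsigned long end)
{
	for (struct mock_notifier *mn = mm->notifiers; mn; mn = mn->next)
		if (mn->ops->invalidate_range_end)
			mn->ops->invalidate_range_end(mn, mm, start, end);
}

/* Example driver hooks: track nesting so an unpaired end is caught. */
static void drv_begin(struct mock_notifier *mn, struct mock_mm *mm,
		      unsigned long start, unsigned long end)
{
	(void)mm; (void)start; (void)end;
	mn->depth++;
}

static void drv_end(struct mock_notifier *mn, struct mock_mm *mm,
		    unsigned long start, unsigned long end)
{
	(void)mm;
	assert(mn->depth > 0); /* every end must follow a begin */
	mn->depth--;
	mn->bytes_done += end - start;
}

/* invalidate_page is left NULL; a dispatcher must test for NULL before
 * calling it, as the range loops above do for their hooks. */
static const struct mock_notifier_ops drv_ops = {
	.invalidate_page = NULL,
	.invalidate_range_begin = drv_begin,
	.invalidate_range_end = drv_end,
};
```

Because the header comment allows begin/end pairs to nest, the sketch tracks a depth counter rather than a single "invalidating" flag.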


___
general mailing list
general@lists.openfabrics.org
http://lists.openfabrics.org/cgi-bin/mailman/listinfo/general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general


Re: [ofa-general] Re: [OpenSM] updn routing performance fix???

2008-03-02 Thread Hal Rosenstock
On Sun, 2008-03-02 at 14:17 +, Sasha Khapyorsky wrote:
 On 05:04 Sun 02 Mar , Hal Rosenstock wrote:
  On Sat, 2008-03-01 at 22:53 +, Sasha Khapyorsky wrote:
   On 19:59 Fri 29 Feb , Hal Rosenstock wrote:

If that makes sense, then query commands on this state would likely
make sense too.
   
   Not sure about this. It is a dynamically updated flag, so it would be hard
   to catch a valid value by hand from the OpenSM console.
  
  I was referring to the balance state, not that flag. Does that make
  more sense?
 
 What do you mean? Routing dumps?

A different routing dump reflecting balance or not and how out of
balance.

-- Hal

 
 Sasha


Re: [ofa-general] Re: [OpenSM] updn routing performance fix???

2008-03-02 Thread Hal Rosenstock
Hi Al,

On Sun, 2008-03-02 at 07:16 -0800, Albert Chu wrote:
 Hey Hal,
 
 Are you saying a flag inside each osm_switch_t to indicate if that
 specific switch is balanced?

I wasn't saying anything about implementation. I was saying there could
be OpenSM console commands to 1. rebalance, and 2. display relevant
state regarding balance/imbalance.

 The script I wrote for the balance check did have difficulty determining a
 lot of corner cases (is port connected to a CA? is it active?  what ports
 are up vs. down links, etc.).  At the end of the day you just output a lot
 of extra info and have to look through it manually.
 
 Although probably not easy as a whole, these calculations would be easier
 in opensm since that information is available.

That's what I was suggesting rather than a separate diag script although
the latter seems like it would be good too.

-- Hal

 Al
 
  On 05:04 Sun 02 Mar , Hal Rosenstock wrote:
  On Sat, 2008-03-01 at 22:53 +, Sasha Khapyorsky wrote:
   On 19:59 Fri 29 Feb , Hal Rosenstock wrote:
   
If that makes sense, then query commands on this state would likely
make sense too.
  
   Not sure about this. It is a dynamically updated flag, so it would be
   hard to catch a valid value by hand from the OpenSM console.
 
  I was referring to the balance state, not that flag. Does that make
  more sense?
 
  What do you mean? Routing dumps?
 
  Sasha


Re: [ofa-general] Re: [OpenSM] updn routing performance fix???

2008-03-02 Thread Sasha Khapyorsky
On 09:51 Sun 02 Mar , Hal Rosenstock wrote:
 
 A different routing dump reflecting balance or not and how out of
 balance.

This makes sense. Actually, OpenSM already has this sort of dump, but
it is printed to stdout.

Sasha


Re: [ofa-general] [infiniband-diags] check_lft_balance script

2008-03-02 Thread Albert Chu
Hey Sasha,

Noticed one more thing I could clean up from the original script. Here's
a new one.

Al

 Hey Sasha,

 Here's the script I mentioned before that I used for the balance checking
 earlier.  It's nothing fancy, but it could probably be useful to others.

 Al

 --
 Albert Chu
 [EMAIL PROTECTED]
 925-422-5311
 Computer Scientist
 High Performance Systems Division
 Lawrence Livermore National Laboratory


-- 
Albert Chu
[EMAIL PROTECTED]
925-422-5311
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory
From 656ba8ee8103fd27f3041570d8c44d823848abf4 Mon Sep 17 00:00:00 2001
From: Albert L. Chu [EMAIL PROTECTED]
Date: Sat, 1 Mar 2008 20:02:03 -0800
Subject: [PATCH] add check_lft_balance script


Signed-off-by: Albert L. Chu [EMAIL PROTECTED]
---
 infiniband-diags/Makefile.am                  |    6 +-
 infiniband-diags/infiniband-diags.spec.in     |    4 +
 infiniband-diags/man/check_lft_balance.8      |   42 
 infiniband-diags/scripts/check_lft_balance.pl |  314 +
 4 files changed, 364 insertions(+), 2 deletions(-)
 create mode 100644 infiniband-diags/man/check_lft_balance.8
 create mode 100755 infiniband-diags/scripts/check_lft_balance.pl

diff --git a/infiniband-diags/Makefile.am b/infiniband-diags/Makefile.am
index ca66e2d..8bbda9e 100644
--- a/infiniband-diags/Makefile.am
+++ b/infiniband-diags/Makefile.am
@@ -25,7 +25,8 @@ sbin_SCRIPTS = scripts/ibcheckerrs scripts/ibchecknet scripts/ibchecknode \
 	   scripts/ibqueryerrors.pl scripts/ibswportwatch.pl \
 	   scripts/iblinkinfo.pl scripts/ibprintswitch.pl \
 	   scripts/ibprintca.pl scripts/ibprintrt.pl \
-	   scripts/ibfindnodesusing.pl scripts/ibidsverify.pl
+	   scripts/ibfindnodesusing.pl scripts/ibidsverify.pl \
+	   scripts/check_lft_balance.pl
 
 src_ibaddr_SOURCES = src/ibaddr.c src/ibdiag_common.c
 src_ibaddr_CFLAGS = -Wall $(DBGFLAGS)
@@ -89,7 +90,8 @@ man_MANS = man/ibaddr.8 man/ibcheckerrors.8 man/ibcheckerrs.8 \
 	man/iblinkinfo.8 man/ibqueryerrors.8 man/ibswportwatch.8 \
 	man/ibprintswitch.8 man/ibprintca.8 man/ibfindnodesusing.8 \
 	man/ibdatacounts.8 man/ibdatacounters.8 \
-	man/ibrouters.8 man/ibprintrt.8 man/ibidsverify.8
+	man/ibrouters.8 man/ibprintrt.8 man/ibidsverify.8 \
+	man/check_lft_balance.8
 
 BUILT_SOURCES = ibdiag_version
 ibdiag_version:
diff --git a/infiniband-diags/infiniband-diags.spec.in b/infiniband-diags/infiniband-diags.spec.in
index 7a0e17b..9c8c0c4 100644
--- a/infiniband-diags/infiniband-diags.spec.in
+++ b/infiniband-diags/infiniband-diags.spec.in
@@ -48,6 +48,7 @@ rm -rf $RPM_BUILD_ROOT
 %{_sbindir}/vendstat
 %{_sbindir}/dump_mfts.sh
 %{_sbindir}/dump_lfts.sh
+%{_sbindir}/check_lft_balance.pl
 %{_sbindir}/set_nodedesc.sh
 %{_sbindir}/sm*
 %define _perldir %(perl -e 'use Config; $T=$Config{installsitearch}; $T=~/(.*)\\/site_perl.*/; print $1;')
@@ -56,6 +57,9 @@ rm -rf $RPM_BUILD_ROOT
 %doc README COPYING ChangeLog
 
 %changelog
+* Mon Mar 03 2008 Albert Chu [EMAIL PROTECTED] - 1.3.5
+- Add check_lft_balance script.
+
 * Wed Oct 31 2007 Ira Weiny [EMAIL PROTECTED] - 1.3.2
 - Change switch-map option to node-name-map
 
diff --git a/infiniband-diags/man/check_lft_balance.8 b/infiniband-diags/man/check_lft_balance.8
new file mode 100644
index 000..35243f6
--- /dev/null
+++ b/infiniband-diags/man/check_lft_balance.8
@@ -0,0 +1,42 @@
+.TH CHECK_LFT_BALANCE.SH 8 "March 1, 2008" "OpenIB" "OpenIB Diagnostics"
+
+.SH NAME
+check_lft_balance.sh \- check InfiniBand unicast forwarding tables balance
+
+.SH SYNOPSIS
+.B check_lft_balance.sh
+[-hRv]
+
+
+.SH DESCRIPTION
+.PP
+check_lft_balance.sh is a script which checks for balancing in InfiniBand
+unicast forwarding tables.  It analyzes the output of 
+.BR dump_lfts(8)
+and
+.BR iblinkinfo(8).
+
+.SH OPTIONS
+
+.PP
+.TP
+\fB\-h\fR
+show help
+.TP
+\fB\-R\fR
+Recalculate dump_lfts information, i.e. do not use the cached
+information.  This option is slower but should be used if the diag tools have
+not been used for some time or if there are other reasons to believe that
+the fabric has changed.
+.TP
+\fB\-v\fR
+verbose output
+
+.SH SEE ALSO
+.BR dump_lfts(8),
+.BR iblinkinfo(8)
+
+.SH AUTHORS
+.TP
+Albert Chu
+.RI  [EMAIL PROTECTED] 
diff --git a/infiniband-diags/scripts/check_lft_balance.pl b/infiniband-diags/scripts/check_lft_balance.pl
new file mode 100755
index 000..954f319
--- /dev/null
+++ b/infiniband-diags/scripts/check_lft_balance.pl
@@ -0,0 +1,314 @@
+#!/usr/bin/perl
+#
+# Copyright (C) 2001-2003 The Regents of the University of California.
+# Copyright (c) 2006 The Regents of the University of California.
+# Copyright (c) 2007 Voltaire, Inc. All rights reserved.
+#
+# Produced at Lawrence Livermore National Laboratory.
+# 
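The Perl script is cut off in this archive, so its exact checks are not visible here. As a rough illustration of the idea only, the hypothetical C sketch below counts, for a single switch, how many destination LIDs each egress port carries in its linear forwarding table, then reports the spread across a chosen set of ports (for example, the up-links discussed above). The table layout, the 0xFF "no route" marker, the function names, and the max-minus-min metric are all assumptions, not what check_lft_balance.pl necessarily does.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch assumption: an LFT entry of 0xFF means "no route" and is skipped. */
#define LFT_NO_PATH 0xFF

/* lft[lid] is the egress port for destination LID `lid`; counts[p] receives
 * the number of destination LIDs routed out of port p. */
static void count_lids_per_port(const unsigned char *lft, size_t nlids,
				unsigned counts[256])
{
	for (size_t p = 0; p < 256; p++)
		counts[p] = 0;
	for (size_t lid = 1; lid < nlids; lid++) /* LID 0 is reserved */
		if (lft[lid] != LFT_NO_PATH)
			counts[lft[lid]]++;
}

/* Max minus min LID count over the chosen ports; 0 means the routes are
 * spread perfectly evenly across them. */
static unsigned port_spread(const unsigned counts[256],
			    const unsigned char *ports, size_t nports)
{
	assert(nports > 0);
	unsigned lo = counts[ports[0]], hi = counts[ports[0]];
	for (size_t i = 1; i < nports; i++) {
		unsigned c = counts[ports[i]];
		if (c < lo)
			lo = c;
		if (c > hi)
			hi = c;
	}
	return hi - lo;
}
```

Which ports to compare is the hard part the thread mentions (CA-attached vs. switch-attached, active vs. down, up vs. down links); the sketch leaves that choice to the caller.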

[ofa-general] [PATCH] opensm: set SA attribute offset to 0 when no records are returned

2008-03-02 Thread Sasha Khapyorsky

IBA 1.2.1 clarifies (t.187, p.897) that the SA attribute offset shall be
set to zero if zero attributes are returned. Fix this.

Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]
---
 opensm/opensm/osm_sa.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/opensm/opensm/osm_sa.c b/opensm/opensm/osm_sa.c
index d85463e..46c5bf7 100644
--- a/opensm/opensm/osm_sa.c
+++ b/opensm/opensm/osm_sa.c
@@ -372,6 +372,8 @@ osm_sa_send_error(IN osm_sa_t * sa,
 
 	if (p_resp_sa_mad->method == IB_MAD_METHOD_SET)
 		p_resp_sa_mad->method = IB_MAD_METHOD_GET;
+	else if (p_resp_sa_mad->method == IB_MAD_METHOD_GETTABLE)
+		p_resp_sa_mad->attr_offset = 0;
 
 	p_resp_sa_mad->method |= IB_MAD_METHOD_RESP_MASK;
 
@@ -473,7 +475,7 @@ void osm_sa_respond(osm_sa_t *sa, osm_madw_t *madw, size_t attr_size,
 	resp_sa_mad->sm_key = 0;
 
 	/* Fill in the offset (paylen will be done by the rmpp SAR) */
-	resp_sa_mad->attr_offset = ib_get_attr_offset(attr_size);
+	resp_sa_mad->attr_offset = num_rec ? ib_get_attr_offset(attr_size) : 0;
 
 	p = ib_sa_mad_get_payload_ptr(resp_sa_mad);
 
-- 
1.5.4.rc2.60.gb2e62
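The SA AttributeOffset field is expressed in 8-byte units, so for a GetTable response it is normally the record size divided by 8; the patch forces it to 0 when no records are returned (and in the GetTable error path). The host-order sketch below shows just that rule. `sa_attr_offset` is a hypothetical helper, and the right shift by 3 is modelled on how ib_get_attr_offset() is understood to work; the real helper additionally converts the result to network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-order sketch: AttributeOffset counts 8-byte words. */
static uint16_t sa_attr_offset(size_t attr_size, size_t num_rec)
{
	/* This sketch assumes records are already padded to 8 bytes. */
	assert(attr_size % 8 == 0);
	if (num_rec == 0)
		return 0; /* IBA 1.2.1: no records returned => offset 0 */
	return (uint16_t)(attr_size >> 3);
}
```

For example, a 64-byte PathRecord gives an offset of 8 when at least one record is returned, and 0 for an empty table.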



[ofa-general] [PATCH] opensm: rename osm_sa_vendor_send() to osm_sa_send()

2008-03-02 Thread Sasha Khapyorsky

Rename osm_sa_vendor_send() to osm_sa_send() (since it is not part of the
vendor library). This also changes its prototype to better match the other
SA sender functions.

Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]
---
 opensm/include/opensm/osm_sa.h         |   17 ++---
 opensm/opensm/osm_inform.c             |    3 +--
 opensm/opensm/osm_sa.c                 |   21 +
 opensm/opensm/osm_sa_class_port_info.c |    2 +-
 4 files changed, 17 insertions(+), 26 deletions(-)

diff --git a/opensm/include/opensm/osm_sa.h b/opensm/include/opensm/osm_sa.h
index f4f751b..370e4e0 100644
--- a/opensm/include/opensm/osm_sa.h
+++ b/opensm/include/opensm/osm_sa.h
@@ -351,20 +351,17 @@ osm_sa_bind(IN osm_sa_t * const p_sa, IN const ib_net64_t port_guid);
 * SEE ALSO
 */
 
-/f* OpenSM: SA/osm_sa_vendor_send
+/f* OpenSM: SA/osm_sa_send
 * NAME
-*  osm_sa_vendor_send
+*  osm_sa_send
 *
 * DESCRIPTION
 *  Sends SA MAD via osm_vendor_send and maintains the QP1 sent statistic
 *
 * SYNOPSIS
 */
-ib_api_status_t
-osm_sa_vendor_send(IN osm_bind_handle_t h_bind,
-  IN osm_madw_t * const p_madw,
-  IN boolean_t const resp_expected,
-  IN osm_subn_t * const p_subn);
+ib_api_status_t osm_sa_send(osm_sa_t *sa, IN osm_madw_t * const p_madw,
+   IN boolean_t const resp_expected);
 
 /f* IBA Base: Types/osm_sa_send_error
 * NAME
@@ -376,10 +373,8 @@ osm_sa_vendor_send(IN osm_bind_handle_t h_bind,
 *
 * SYNOPSIS
 */
-void
-osm_sa_send_error(IN osm_sa_t * sa,
- IN const osm_madw_t * const p_madw,
- IN const ib_net16_t sa_status);
+void osm_sa_send_error(IN osm_sa_t * sa, IN const osm_madw_t * const p_madw,
+  IN const ib_net16_t sa_status);
 /*
 * PARAMETERS
 *  sa
diff --git a/opensm/opensm/osm_inform.c b/opensm/opensm/osm_inform.c
index bbd573c..9553f7f 100644
--- a/opensm/opensm/osm_inform.c
+++ b/opensm/opensm/osm_inform.c
@@ -365,8 +365,7 @@ static ib_api_status_t __osm_send_report(IN osm_infr_t * p_infr_rec,/* the info
 	*p_report_ntc = *p_ntc;
 
 	/* The TRUE is for: response is expected */
-	osm_sa_vendor_send(p_report_madw->h_bind, p_report_madw, TRUE,
-			   p_infr_rec->sa->p_subn);
+	osm_sa_send(p_infr_rec->sa, p_report_madw, TRUE);
 
 Exit:
OSM_LOG_EXIT(p_log);
diff --git a/opensm/opensm/osm_sa.c b/opensm/opensm/osm_sa.c
index 4edce47..d85463e 100644
--- a/opensm/opensm/osm_sa.c
+++ b/opensm/opensm/osm_sa.c
@@ -318,19 +318,17 @@ Exit:
return (status);
 }
 
-ib_api_status_t
-osm_sa_vendor_send(IN osm_bind_handle_t h_bind,
-  IN osm_madw_t * const p_madw,
-  IN boolean_t const resp_expected,
-  IN osm_subn_t * const p_subn)
+ib_api_status_t osm_sa_send(osm_sa_t *sa,
+   IN osm_madw_t * const p_madw,
+   IN boolean_t const resp_expected)
 {
ib_api_status_t status;
 
-	cl_atomic_inc(&p_subn->p_osm->stats.sa_mads_sent);
-	status = osm_vendor_send(h_bind, p_madw, resp_expected);
+	cl_atomic_inc(&sa->p_subn->p_osm->stats.sa_mads_sent);
+	status = osm_vendor_send(p_madw->h_bind, p_madw, resp_expected);
 	if (status != IB_SUCCESS) {
-		cl_atomic_dec(&p_subn->p_osm->stats.sa_mads_sent);
-		OSM_LOG(&p_subn->p_osm->log, OSM_LOG_ERROR, "ERR 4C04: "
+		cl_atomic_dec(&sa->p_subn->p_osm->stats.sa_mads_sent);
+		OSM_LOG(sa->p_log, OSM_LOG_ERROR, "ERR 4C04: "
 			"osm_vendor_send failed, status = %s\n",
ib_get_err_str(status));
}
@@ -392,8 +390,7 @@ osm_sa_send_error(IN osm_sa_t * sa,
 	if (osm_log_is_active(sa->p_log, OSM_LOG_FRAMES))
 		osm_dump_sa_mad(sa->p_log, p_resp_sa_mad, OSM_LOG_FRAMES);
 
-	osm_sa_vendor_send(osm_madw_get_bind_handle(p_resp_madw),
-			   p_resp_madw, FALSE, sa->p_subn);
+	osm_sa_send(sa, p_resp_madw, FALSE);
 
 Exit:
 	OSM_LOG_EXIT(sa->p_log);
@@ -501,7 +498,7 @@ void osm_sa_respond(osm_sa_t *sa, osm_madw_t *madw, size_t attr_size,
 		p += attr_size;
 	}
 
-	osm_sa_vendor_send(resp_madw->h_bind, resp_madw, FALSE, sa->p_subn);
+	osm_sa_send(sa, resp_madw, FALSE);
 
 	osm_dump_sa_mad(sa->p_log, resp_sa_mad, OSM_LOG_FRAMES);
 Exit:
diff --git a/opensm/opensm/osm_sa_class_port_info.c b/opensm/opensm/osm_sa_class_port_info.c
index 3a76a69..f0afb32 100644
--- a/opensm/opensm/osm_sa_class_port_info.c
+++ b/opensm/opensm/osm_sa_class_port_info.c
@@ -174,7 +174,7 @@ __osm_cpi_rcv_respond(IN osm_sa_t * sa,
 	if (osm_log_is_active(sa->p_log, OSM_LOG_FRAMES))
 		osm_dump_sa_mad(sa->p_log, p_resp_sa_mad, OSM_LOG_FRAMES);
 
-	osm_sa_vendor_send(p_resp_madw->h_bind, p_resp_madw, FALSE, sa->p_subn);
+	osm_sa_send(sa, p_resp_madw, FALSE);
 
 Exit:

Re: [ofa-general] [PATCH] opensm: set SA attribute offset to 0 when no records are returned

2008-03-02 Thread Yevgeny Kliteynik


Sasha Khapyorsky wrote:

IBA 1.2.1 clarifies (t.187, p.897) that the SA attribute offset shall be
set to zero if zero attributes are returned. Fix this.
  


Nice catch, thanks.
BTW, are you aware of any other IBA 1.2.1-related issues that need to
be fixed?

I mean, is OpenSM fully IBA 1.2.1 compliant?

-- Yevgeny


Signed-off-by: Sasha Khapyorsky [EMAIL PROTECTED]
---
 opensm/opensm/osm_sa.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/opensm/opensm/osm_sa.c b/opensm/opensm/osm_sa.c
index d85463e..46c5bf7 100644
--- a/opensm/opensm/osm_sa.c
+++ b/opensm/opensm/osm_sa.c
@@ -372,6 +372,8 @@ osm_sa_send_error(IN osm_sa_t * sa,
 
 	if (p_resp_sa_mad->method == IB_MAD_METHOD_SET)
 		p_resp_sa_mad->method = IB_MAD_METHOD_GET;
+	else if (p_resp_sa_mad->method == IB_MAD_METHOD_GETTABLE)
+		p_resp_sa_mad->attr_offset = 0;
 
 	p_resp_sa_mad->method |= IB_MAD_METHOD_RESP_MASK;
 
@@ -473,7 +475,7 @@ void osm_sa_respond(osm_sa_t *sa, osm_madw_t *madw, size_t attr_size,
 	resp_sa_mad->sm_key = 0;
 
 	/* Fill in the offset (paylen will be done by the rmpp SAR) */
-	resp_sa_mad->attr_offset = ib_get_attr_offset(attr_size);
+	resp_sa_mad->attr_offset = num_rec ? ib_get_attr_offset(attr_size) : 0;
 
 	p = ib_sa_mad_get_payload_ptr(resp_sa_mad);
 
  



Re: [nfs-rdma-devel] [ofa-general] Status of NFS-RDMA ? (fwd)

2008-03-02 Thread Tom Tucker

On Fri, 2008-02-29 at 09:29 +0100, Sebastian Schmitzdorff wrote:
 hi pawel,
 
 I was wondering if you have achieved better nfs rdma benchmark results 
 by now?

Pawel:

What is your network hardware setup? 

Thanks,
Tom

 
 regards
 Sebastian
 
 Pawel Dziekonski wrote:
  hi,
 
  the saga continues. ;)
 
  very basic benchmarks with surprising (at least to me) results: it
  looks like reading is much slower than writing, and NFS/RDMA is twice
  as slow at reading as classic NFS. :o
 
  results below - comments appreciated!
  regards, Pawel
 
 
  both nfs server and client have 8-cores, 16 GB RAM, Mellanox DDR HCAs
  (MT25204) connected port-port (no switch).
 
  local_hdd - 2 sata2 disks in soft-raid0,
  nfs_ipoeth - classic nfs over ethernet,
  nfs_ipoib - classic nfs over IPoIB,
  nfs_rdma - NFS/RDMA.
 
  simple write of 36GB file with dd (both machines have 16GB RAM):
  /usr/bin/time -p dd if=/dev/zero of=/mnt/qqq bs=1M count=36000
 
  local_hdd    sys 54.52  user 0.04  real 254.59
  nfs_ipoib    sys 36.35  user 0.00  real 266.63
  nfs_rdma     sys 39.03  user 0.02  real 323.77
  nfs_ipoeth   sys 34.21  user 0.01  real 375.24
 
  remount /mnt to clear cache and read a file from nfs share and
  write it to /dev/:
  /usr/bin/time -p dd if=/mnt/qqq of=/scratch/qqq bs=1M
 
  nfs_ipoib    sys 59.04  user 0.02  real 571.57
  nfs_ipoeth   sys 58.92  user 0.02  real 606.61
  nfs_rdma     sys 62.57  user 0.03  real 1296.36
 
 
 
  results from bonnie++:
 
  Version 1.03c        ------Sequential Write------- --Sequential Read--- --Random-
                       -Per Chr- --Block-- -Rewrite- -Per Chr-  --Block-- --Seeks--
  Machine    Size      K/sec %CP K/sec %CP K/sec %CP K/sec %CP  K/sec %CP  /sec %CP
  local_hdd  35G:128k            93353  12 58329   6           143293   7 243.6   1
  local_hdd  35G:256k            92283  11 58189   6           144202   8 172.2   2
  local_hdd  35G:512k            93879  12 57715   6           144167   8 128.2   4
  local_hdd  35G:1024k           93075  12 58637   6           144172   8  95.3   7
  nfs_ipoeth 35G:128k            91325   7 31848   4            64299   4 170.2   1
  nfs_ipoeth 35G:256k            90668   7 32036   5            64542   4 163.2   2
  nfs_ipoeth 35G:512k            93348   7 31757   5            64454   4  85.7   3
  nfs_ipoeth 35G:1024k           91283   7 31869   5            64241   5  51.7   4
  nfs_ipoib  35G:128k            91733   7 36641   5            65839   4 178.4   2
  nfs_ipoib  35G:256k            92453   7 36567   6            66682   4 166.9   3
  nfs_ipoib  35G:512k            91157   7 37660   6            66318   4  86.8   3
  nfs_ipoib  35G:1024k           92111   7 35786   6            66277   5  53.3   4
  nfs_rdma   35G:128k            91152   8 29942   5            32147   2 187.0   1
  nfs_rdma   35G:256k            89772   7 30560   5            34587   2 158.4   3
  nfs_rdma   35G:512k            91290   7 29698   5            34277   2  60.9   2
  nfs_rdma   35G:1024k           91336   8 29052   5            31742   2  41.5   3

                   ------Sequential Create------ --------Random Create--------
                   -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
  Machine    files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
  local_hdd     16 10587  36 +++++ +++  8674  29 10727  35 +++++ +++  7015  28
  local_hdd     16 11372  41 +++++ +++  8490  29 11192  43 +++++ +++  6881  27
  local_hdd     16 10789  35 +++++ +++  8520  29 11468  46 +++++ +++  6651  24
  local_hdd     16 10841  40 +++++ +++  8443  28 11162  41 +++++ +++  6441  22
  nfs_ipoeth    16  3753   7 13390  12  3795   7  3773   8 22181  16  3635   7
  nfs_ipoeth    16  3762   8 12358   7  3713   8  3753   7 20448  13  3632   6
  nfs_ipoeth    16  3834   7 12697   6  3729   8  3725   9 22807  11  3673   7
  nfs_ipoeth    16  3729   8 14260  10  3774   7  3744   7 25285  14  3688   7
  nfs_ipoib     16  6803  17 +++++ +++  6843  15  6820  14 +++++ +++  5834  11
  nfs_ipoib     16  6587  16 +++++ +++  4959   9  6832  14 +++++ +++  5608  12
  nfs_ipoib     16  6820  18 +++++ +++  6636  15  6479  15 +++++ +++  5679  13
  nfs_ipoib     16  6475  14 +++++ +++  6435  14  5543  11 +++++ +++  5431  11
  nfs_rdma      16  7014  15 +++++ +++  6714  10  7001  14 +++++ +++  5683   8
  nfs_rdma      16  7038  13 +++++ +++  6713  12  6956  11 +++++ +++  5488   8
  nfs_rdma      16  7058  12 +++++ +++  6797  11  6989  14 +++++ +++  5761   9
  nfs_rdma      16  7201  13 +++++ +++  6821  12  7072  15 +++++ +++  5609   9
 
 

 
 

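To put the dd timings on a common scale, the 36000 MiB transferred (bs=1M count=36000) can be divided by each run's "real" time. The small C sketch below does only that arithmetic; the struct and function names are illustrative. For the reads it gives roughly 63 MiB/s over IPoIB against roughly 28 MiB/s over NFS/RDMA, which matches the "twice as slow" observation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* dd moved 36000 blocks of 1 MiB each (bs=1M count=36000). */
#define DD_MIB 36000.0

/* Average throughput in MiB/s from wall-clock ("real") seconds. */
static double mib_per_sec(double mib, double real_seconds)
{
	assert(real_seconds > 0.0);
	return mib / real_seconds;
}

struct timing { const char *setup; double real_s; };

/* "real" values copied from the dd runs quoted above. */
static const struct timing writes[] = {
	{ "local_hdd", 254.59 }, { "nfs_ipoib", 266.63 },
	{ "nfs_rdma", 323.77 }, { "nfs_ipoeth", 375.24 },
};
static const struct timing reads[] = {
	{ "nfs_ipoib", 571.57 }, { "nfs_ipoeth", 606.61 },
	{ "nfs_rdma", 1296.36 },
};

static void print_rates(const char *label, const struct timing *t, size_t n)
{
	for (size_t i = 0; i < n; i++)
		printf("%-5s %-10s %7.2f MiB/s\n", label, t[i].setup,
		       mib_per_sec(DD_MIB, t[i].real_s));
}
```

Calling `print_rates("write", writes, 4)` and `print_rates("read", reads, 3)` prints one line per setup.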

[ofa-general] Re: [PATCH] mmu notifiers #v8

2008-03-02 Thread Nick Piggin
On Sun, Mar 02, 2008 at 04:54:57PM +0100, Andrea Arcangeli wrote:
 Difference between #v7 and #v8:
 
 1) s/age_page/clear_flush_young/ (Nick's suggestion)
 2) macro fix (Andrew)
 3) move release before final unmap_vmas (for GRU, Jack/Christoph)
 4) microoptimize mmu_notifier_unregister (Christoph)
 5) use mmap_sem for registration serialization (Christoph)
 
 The (void)xxx in macros doesn't work with args. Christoph's solution
 looks best at avoiding warnings, even if it forces the mmu notifier
 operation structure to be visible even when MMU_NOTIFIER=n (that's
 the only downside).

I have a couple of cleanup patches that change the structure of this
to something I prefer. Others may not, but I'll post them for debate
anyway.

 
 I didn't drop invalidate_page, because invalidate_range_begin/end
 would be slower for usages like KVM/GRU (we don't need a begin/end
 there because where invalidate_page is called, the VM holds a
 reference on the page). do_wp_page should also use invalidate_page
 since it can free the page after dropping the PT lock without losing
 any performance (that's not true for the places where invalidate_range
 is called).

I'm still not completely happy with this. I had a very quick look
at the GRU driver, but I don't see why it can't be implemented
more like the regular TLB model, and have TLB insertions depend on
the linux pte, and do invalidates _after_ restricting permissions
to the pte.

I.e., I'd still like to get rid of invalidate_range_begin, and get
rid of invalidate calls from places where permissions are relaxed.


 It'd be nice if everyone involved can agree to converge on this API
 for .25. KVM/GRU (and perhaps Quadrics) and similar usages will be
 fully covered in .25.

If we can agree on the API, then I don't see any reason why it can't
go into 2.6.25, unless someone wants more time to review it (but
2.6.25 release should be quite far away still so there should be quite
a bit of time).


[ofa-general] Re: [PATCH] mmu notifiers #v8

2008-03-02 Thread Nick Piggin
On Sun, Mar 02, 2008 at 04:54:57PM +0100, Andrea Arcangeli wrote:
 Difference between #v7 and #v8:

[patch] mmu-v8: demacro


Remove the macros from mmu_notifier.h, in favour of functions.

This requires untangling the include order circular dependencies as well,
so just remove struct mmu_notifier_head in favour of just using the hlist
in mm_struct.

Signed-off-by: Nick Piggin [EMAIL PROTECTED]
---
Index: linux-2.6/include/linux/mmu_notifier.h
===
--- linux-2.6.orig/include/linux/mmu_notifier.h
+++ linux-2.6/include/linux/mmu_notifier.h
@@ -55,12 +55,13 @@ struct mmu_notifier {
 
 #ifdef CONFIG_MMU_NOTIFIER
 
-struct mmu_notifier_head {
-	struct hlist_head head;
-};
-
 #include <linux/mm_types.h>
 
+static inline int mm_has_notifiers(struct mm_struct *mm)
+{
+	return unlikely(!hlist_empty(&mm->mmu_notifier_list));
+}
+
 /*
  * Must hold the mmap_sem for write.
  *
@@ -79,33 +80,59 @@ extern void mmu_notifier_register(struct
  */
 extern void mmu_notifier_unregister(struct mmu_notifier *mn,
 				    struct mm_struct *mm);
-extern void mmu_notifier_release(struct mm_struct *mm);
-extern int mmu_notifier_clear_flush_young(struct mm_struct *mm,
+
+extern void __mmu_notifier_release(struct mm_struct *mm);
+extern int __mmu_notifier_clear_flush_young(struct mm_struct *mm,
 					  unsigned long address);
+extern void __mmu_notifier_invalidate_page(struct mm_struct *mm,
+					  unsigned long address);
+extern void __mmu_notifier_invalidate_range_begin(struct mm_struct *mm,
+			  unsigned long start, unsigned long end);
+extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
+			  unsigned long start, unsigned long end);
+
+
+static inline void mmu_notifier_release(struct mm_struct *mm)
+{
+	if (mm_has_notifiers(mm))
+		__mmu_notifier_release(mm);
+}
+
+static inline int mmu_notifier_clear_flush_young(struct mm_struct *mm,
+					  unsigned long address)
+{
+	if (mm_has_notifiers(mm))
+		return __mmu_notifier_clear_flush_young(mm, address);
+	return 0;
+}
+
+static inline void mmu_notifier_invalidate_page(struct mm_struct *mm,
+					  unsigned long address)
+{
+	if (mm_has_notifiers(mm))
+		__mmu_notifier_invalidate_page(mm, address);
+}
+
+static inline void mmu_notifier_invalidate_range_begin(struct mm_struct *mm,
+			  unsigned long start, unsigned long end)
+{
+	if (mm_has_notifiers(mm))
+		__mmu_notifier_invalidate_range_begin(mm, start, end);
+}
+
+static inline void mmu_notifier_invalidate_range_end(struct mm_struct *mm,
+			  unsigned long start, unsigned long end)
+{
+	if (mm_has_notifiers(mm))
+		__mmu_notifier_invalidate_range_end(mm, start, end);
+}
 
-static inline void mmu_notifier_head_init(struct mmu_notifier_head *mnh)
+static inline void mmu_notifier_mm_init(struct mm_struct *mm)
 {
-	INIT_HLIST_HEAD(&mnh->head);
+	INIT_HLIST_HEAD(&mm->mmu_notifier_list);
 }
 
-#define mmu_notifier(function, mm, args...)				\
-	do {								\
-		struct mmu_notifier *__mn;				\
-		struct hlist_node *__n;					\
-		struct mm_struct * __mm = mm;				\
-									\
-		if (unlikely(!hlist_empty(&__mm->mmu_notifier.head))) { \
-			rcu_read_lock();				\
-			hlist_for_each_entry_rcu(__mn, __n,		\
-						 &__mm->mmu_notifier.head, \
-						 hlist)			\
-				if (__mn->ops->function)		\
-					__mn->ops->function(__mn,	\
-							    __mm,	\
-							    args);	\
-			rcu_read_unlock();				\
-		}							\
-	} while (0)
+
 
 #define ptep_clear_flush_notify(__vma, __address, __ptep)  \
 ({ \
@@ -113,7 +140,7 @@ static inline void mmu_notifier_head_ini
struct vm_area_struct * ___vma = __vma; \
unsigned long ___address = __address;   \
__pte = ptep_clear_flush(___vma, ___address, __ptep);   \
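The effect of the demacro patch is that each mmu_notifier_* entry point becomes a cheap inline emptiness test, with the list walk moved out of line into a __mmu_notifier_* function. The userspace sketch below shows that shape in isolation. `struct fake_mm`, `fake_has_notifiers`, `fake_invalidate_page`, and the slow-path counter are hypothetical, and the GCC/Clang builtin `__builtin_expect` stands in for the kernel's `unlikely()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for an hlist head embedded in mm_struct. */
struct fake_node { struct fake_node *next; };
struct fake_mm { struct fake_node *first; };

/* Counts how often the out-of-line path actually runs. */
static unsigned long slow_path_calls;

/* GCC/Clang builtin used here in place of the kernel's unlikely(). */
#define fake_unlikely(x) __builtin_expect(!!(x), 0)

static inline bool fake_has_notifiers(const struct fake_mm *mm)
{
	return fake_unlikely(mm->first != NULL);
}

/* Out-of-line slow path; the kernel version would walk the list under RCU. */
static void fake_invalidate_page_slow(struct fake_mm *mm,
				      unsigned long address)
{
	(void)address;
	assert(mm->first != NULL); /* only reached when someone is registered */
	slow_path_calls++;
}

/* Inline wrapper: with no notifiers the whole call is one pointer test. */
static inline void fake_invalidate_page(struct fake_mm *mm,
					unsigned long address)
{
	if (fake_has_notifiers(mm))
		fake_invalidate_page_slow(mm, address);
}
```

Compared with the removed `mmu_notifier(function, mm, args...)` macro, typed inline functions give argument type checking and leave no macro-argument evaluation surprises.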

[ofa-general] Re: [PATCH] mmu notifiers #v8

2008-03-02 Thread Nick Piggin
On Sun, Mar 02, 2008 at 04:54:57PM +0100, Andrea Arcangeli wrote:
 Difference between #v7 and #v8:

This one on top of the previous patch

[patch] mmu-v8: typesafe

Move definitions of struct mmu_notifier and struct mmu_notifier_ops under
CONFIG_MMU_NOTIFIER to ensure they don't get dereferenced when they
don't make sense.

Signed-off-by: Nick Piggin [EMAIL PROTECTED]
---
Index: linux-2.6/include/linux/mmu_notifier.h
===
--- linux-2.6.orig/include/linux/mmu_notifier.h
+++ linux-2.6/include/linux/mmu_notifier.h
@@ -3,8 +3,12 @@
 
 #include <linux/list.h>
 #include <linux/spinlock.h>
+#include <linux/mm_types.h>
 
 struct mmu_notifier;
+struct mmu_notifier_ops;
+
+#ifdef CONFIG_MMU_NOTIFIER
 
 struct mmu_notifier_ops {
 	/*
@@ -53,10 +57,6 @@ struct mmu_notifier {
 	const struct mmu_notifier_ops *ops;
 };
 
-#ifdef CONFIG_MMU_NOTIFIER
-
-#include <linux/mm_types.h>
-
 static inline int mm_has_notifiers(struct mm_struct *mm)
 {
 	return unlikely(!hlist_empty(&mm->mmu_notifier_list));



[ofa-general] Re: [PATCH] mmu notifiers #v8

2008-03-02 Thread Nick Piggin
On Sun, Mar 02, 2008 at 04:54:57PM +0100, Andrea Arcangeli wrote:
 Difference between #v7 and #v8:

Here are just a couple of checkpatch fixes on top of the last patches.

Index: linux-2.6/include/linux/mmu_notifier.h
===
--- linux-2.6.orig/include/linux/mmu_notifier.h
+++ linux-2.6/include/linux/mmu_notifier.h
@@ -46,7 +46,7 @@ struct mmu_notifier_ops {
 */
void (*invalidate_range_begin)(struct mmu_notifier *mn,
   struct mm_struct *mm,
-  unsigned long start, unsigned long end); 
  
+  unsigned long start, unsigned long end);
void (*invalidate_range_end)(struct mmu_notifier *mn,
 struct mm_struct *mm,
 unsigned long start, unsigned long end);
@@ -137,7 +137,7 @@ static inline void mmu_notifier_mm_init(
 #define ptep_clear_flush_notify(__vma, __address, __ptep)  \
 ({ \
 	pte_t __pte;							\
-	struct vm_area_struct * ___vma = __vma;				\
+	struct vm_area_struct *___vma = __vma;				\
 	unsigned long ___address = __address;				\
 	__pte = ptep_clear_flush(___vma, ___address, __ptep);		\
 	mmu_notifier_invalidate_page(___vma->vm_mm, ___address);	\
@@ -147,7 +147,7 @@ static inline void mmu_notifier_mm_init(
 #define ptep_clear_flush_young_notify(__vma, __address, __ptep)	\
 ({									\
 	int __young;							\
-	struct vm_area_struct * ___vma = __vma;				\
+	struct vm_area_struct *___vma = __vma;				\
 	unsigned long ___address = __address;				\
 	__young = ptep_clear_flush_young(___vma, ___address, __ptep);	\
 	__young |= mmu_notifier_clear_flush_young(___vma->vm_mm,	\


[ofa-general] Re: [patch 2/6] mmu_notifier: Callbacks to invalidate address ranges

2008-03-02 Thread Nick Piggin
On Thursday 28 February 2008 09:35, Christoph Lameter wrote:
 On Wed, 20 Feb 2008, Nick Piggin wrote:
  On Friday 15 February 2008 17:49, Christoph Lameter wrote:

  Also, what we are going to need here are not skeleton drivers
  that just do all the *easy* bits (of registering their callbacks),
  but actual fully working examples that do everything that any
  real driver will need to do. If not for the sanity of the driver
  writer, then for the sanity of the VM developers (I don't want
  to have to understand xpmem or infiniband in order to understand
  how the VM works).

 There are 3 different drivers that can already use it but the code is
 complex and not easy to review. Skeletons make it easy for people to get
 started with it.

Your skeleton is just registering notifiers and saying

/* you fill the hard part in */

If somebody needs a skeleton in order just to register the notifiers,
then almost by definition they are unqualified to write the hard
part ;)


 lru_add_drain();
 tlb = tlb_gather_mmu(mm, 0);
 update_hiwater_rss(mm);
   + mmu_notifier(invalidate_range_begin, mm, address, end, atomic);
 end = unmap_vmas(tlb, vma, address, end, nr_accounted, details);
 if (tlb)
 tlb_finish_mmu(tlb, address, end);
   + mmu_notifier(invalidate_range_end, mm, address, end, atomic);
 return end;
}
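The hunk above brackets the page-table teardown with invalidate_range_begin and invalidate_range_end. Below is a minimal userspace model of what a secondary-MMU driver might do with that pair. It assumes a hypothetical device that caches translations for up to 16 pages, and the struct and function names are illustrative, not the kernel's mmu_notifier API. Begin drops the device's translations and blocks new device faults until end runs.

```c
#include <stdbool.h>

#define NPAGES 16 /* hypothetical size of the device translation cache */

/* Hypothetical model of a secondary MMU, e.g. a NIC's translation table. */
struct secondary_mmu {
    int  active_invalidations; /* number of open begin/end windows */
    bool mapped[NPAGES];       /* pages the device currently translates */
};

/* Models invalidate_range_begin: drop device translations for [start, end). */
static void range_begin(struct secondary_mmu *s, unsigned long start,
                        unsigned long end)
{
    s->active_invalidations++;
    for (unsigned long p = start; p < end && p < NPAGES; p++)
        s->mapped[p] = false;
}

/* Models invalidate_range_end: the primary page tables are settled again. */
static void range_end(struct secondary_mmu *s)
{
    s->active_invalidations--;
}

/* A device fault may only re-establish a translation outside any window;
 * otherwise it could cache a page the VM is in the middle of unmapping. */
static bool device_fault(struct secondary_mmu *s, unsigned long page)
{
    if (s->active_invalidations > 0 || page >= NPAGES)
        return false; /* caller must retry after range_end() */
    s->mapped[page] = true;
    return true;
}
```

The counter (rather than a flag) lets overlapping begin/end windows nest, since the device may only resume faulting once every open window has closed.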
 
  Where do you invalidate for munmap()?

 zap_page_range() called from unmap_vmas().

But it is not allowed to sleep. Where do you call the sleepable one
from?


  Also, how do you resolve the case where you are not allowed to sleep?
  I would have thought either you have to handle it, in which case nobody
  needs to sleep; or you can't handle it, in which case the code is
  broken.

 That can be done in a variety of ways:

 1. Change VM locking

 2. Not handle file-backed mappings (XPmem could work mostly in such a
 config)

 3. Keep the refcount elevated until pages are freed in another execution
 context.
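Option 3 can be sketched as a deferred release. The non-sleeping callback only takes an extra reference and queues the page. A later context that may sleep drops those references once the remote side has stopped using the pages. This is a userspace model under stated assumptions: the refcount struct, the queue capacity, and the function names are hypothetical stand-ins, not kernel interfaces.

```c
#include <stdbool.h>

#define QMAX 64 /* hypothetical capacity of the deferred-release queue */

/* Hypothetical stand-in for a page's reference count. */
struct page_model {
    int refcount;
};

struct deferred_queue {
    struct page_model *pages[QMAX];
    int n;
};

/* Called from the atomic invalidate callback. It must not sleep, so it only
 * pins the page with an extra reference and queues it. */
static bool defer_release(struct deferred_queue *q, struct page_model *pg)
{
    if (q->n == QMAX)
        return false; /* queue full: the caller still needs a fallback */
    pg->refcount++;
    q->pages[q->n++] = pg;
    return true;
}

/* Runs later in a context that may sleep, e.g. after the remote node has
 * confirmed it dropped its mappings. Returns how many pages became free. */
static int drain_deferred(struct deferred_queue *q)
{
    int freed = 0;
    for (int i = 0; i < q->n; i++)
        if (--q->pages[i]->refcount == 0)
            freed++;
    q->n = 0;
    return freed;
}
```

The trade-off is that memory is returned some time after the unmap itself, and a full queue still has to be handled somehow.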

OK, there are ways to solve it or hack around it. But this is exactly
why I think the implementations should be kept separate. Andrea's
notifiers are coherent, work on all types of mappings, and will
hopefully match closely the regular TLB invalidation sequence in the
Linux VM (at the moment it is quite close, but I hope to make it a
bit closer) so that it requires almost no changes to the mm.

All the other attempts to make it sleep amount to hacking holes
in it (e.g. by removing coherency). So I don't think it is reasonable to
require that any patch handle all cases. I actually think Andrea's
patch is quite nice and simple in itself, whereas I am against the patches
that you posted.

What about a completely different approach... XPmem runs over NUMAlink,
right? Why not provide some non-sleeping way to basically IPI remote
nodes over the NUMAlink where they can process the invalidation? If your
intra-node cache coherency has to run over this link anyway, then
presumably it is capable.

Or another idea, why don't you LD_PRELOAD in the MPT library to also
intercept munmap, mprotect, mremap etc as well as just fork()? That
would give you similarly good enough coherency as the mmu notifier
patches except that you can't swap (which Robin said was not a big
problem).
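A minimal sketch of the LD_PRELOAD idea follows. It assumes glibc's dlsym(RTLD_NEXT, ...) to reach the real munmap(). The invalidate_remote() hook and its call counter are hypothetical placeholders for whatever the MPT library would actually do to tear down remote mappings.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Counts intercepted calls; stands in for real remote-invalidation work. */
static unsigned long munmap_calls;

/* Hypothetical hook: tell every remote user of [addr, addr + len) to drop
 * its mapping before the local pages disappear. */
static void invalidate_remote(void *addr, size_t len)
{
    (void)addr;
    (void)len;
    munmap_calls++;
}

/* Interposed munmap(): the dynamic linker resolves callers to this
 * definition first, and RTLD_NEXT finds the C library's version behind it. */
int munmap(void *addr, size_t len)
{
    /* Lazy lookup; a production shim would guard concurrent first calls. */
    static int (*real_munmap)(void *, size_t);

    if (!real_munmap) {
        real_munmap = (int (*)(void *, size_t))dlsym(RTLD_NEXT, "munmap");
        if (!real_munmap) {
            errno = ENOSYS;
            return -1;
        }
    }
    invalidate_remote(addr, len); /* notify before the pages go away */
    return real_munmap(addr, len);
}
```

Built as a shared object (for example `gcc -shared -fPIC -o libshim.so shim.c -ldl`, with hypothetical file names) and loaded via `LD_PRELOAD=./libshim.so`, the wrapper only sees calls that go through the dynamic symbol table, not raw system calls or calls made inside the C library. mprotect(), mremap() and fork() would need wrappers of the same shape.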


