Re: [openib-general] [PATCH] osm: partition manager force policy

Eitan Zahavi Sat, 17 Jun 2006 12:40:20 -0700

Sasha Khapyorsky wrote:

I'm working on the changes below. I will send them all as one patch


EZ
> Hi Eitan,
> 
> On 15:19 Thu 15 Jun     , Eitan Zahavi wrote:
> 
>>>>+/*
>>>>+* PARAMETERS
>>>>+*  p_physp
>>>>+*     [in] Pointer to an osm_physp_t object.
>>>>+*
>>>>+* RETURN VALUES
>>>>+*  The pointer to the P_Key table object.
>>>>+*
>>>>+* NOTES
>>>>+*
>>>>+* SEE ALSO
>>>>+*  Port, Physical Port
>>>>+*********/
>>>>+
>>>
>>>
>>>Is not this simpler to remove 'const' from existing
>>>osm_physp_get_pkey_tbl() function instead of using new one?
>>
>>There are plenty of const functions using this function internally
>>so I would have need to fix them too.
> 
> 
> You are right. Maybe separate patch for this?
> 
I think it is preferable to keep the const function.

> 
>>>>@@ -118,14 +121,29 @@ void osm_pkey_tbl_sync_new_blocks(
>>>>   p_block = cl_ptr_vector_get(&p_pkey_tbl->blocks, b);
>>>>   if ( b < new_blocks )
>>>>     p_new_block = cl_ptr_vector_get(&p_pkey_tbl->new_blocks, b);
>>>>-    else {
>>>>+           else 
>>>>+      {
>>>>     p_new_block = (ib_pkey_table_t *)malloc(sizeof(*p_new_block));
>>>>     if (!p_new_block)
>>>>       break;
>>>>+                   cl_ptr_vector_set(&((osm_pkey_tbl_t 
>>>>*)p_pkey_tbl)->new_blocks, +                                                
>>>> b, 
>>>>p_new_block);
>>>>+           }
>>>>+
>>>>     memset(p_new_block, 0, sizeof(*p_new_block));
>>>>-      cl_ptr_vector_set(&((osm_pkey_tbl_t *)p_pkey_tbl)->new_blocks, b, 
>>>>p_new_block);
>>>>   }
>>>>-    memcpy(p_new_block, p_block, sizeof(*p_new_block));
>>>>+}
>>>
>>>
>>>You changed this function so it does not do any sync anymore. Should
>>>function name be changed too?
>>
>>Yes correct I will change it. Is a better name:
>>osm_pkey_tbl_init_new_blocks ?
> 
> 
> Great name.
> 
> 
>>>>+  to show that on the "old" blocks
>>>>+*/
>>>>+int
>>>>+osm_pkey_tbl_set_new_entry( 
>>>>+   IN osm_pkey_tbl_t *p_pkey_tbl,
>>>>+   IN uint16_t        block_idx,
>>>>+   IN uint8_t         pkey_idx,
>>>>+   IN uint16_t        pkey)
>>>>+{  
>>>>+   ib_pkey_table_t *p_old_block;
>>>>+   ib_pkey_table_t *p_new_block;
>>>>+   
>>>>+   if (osm_pkey_tbl_make_block_pair(
>>>>+                    p_pkey_tbl,  block_idx, &p_old_block, &p_new_block))
>>>>+           return 1;
>>>>+           
>>>>+   cl_map_insert( &p_pkey_tbl->keys,
>>>>+                                           ib_pkey_get_base(pkey),
>>>>+                                    
>>>>&(p_old_block->pkey_entry[pkey_idx]));
>>>
>>>
>>>Here you map potentially empty pkey entry. Why? "old block" will be
>>>remapped anyway on pkey receiving.
>>
>>The reason I did this was that if the GetResp will fail I still want to 
>>represent
>>the settings in the map.But actually it might be better not to do that so 
>>next
>>time we run we will not find it without a GetResp.
> 
> 
> Agree.
> 
> 
>>>>+   IN       uint16_t                *p_pkey,
>>>>+   OUT uint32_t             *p_block_idx,
>>>>+   OUT uint8_t                      *p_pkey_index)
>>>>+{
>>>>+   uint32_t                          num_of_blocks;
>>>>+   uint32_t                          block_index;
>>>>+   ib_pkey_table_t *block;
>>>>+
>>>>+   CL_ASSERT( p_pkey_tbl );
>>>>+   CL_ASSERT( p_block_idx != NULL );
>>>>+   CL_ASSERT( p_pkey_idx != NULL );
>>>
>>>
>>>Why last two CL_ASSERTs? What should be problem with uninitialized
>>>pointers here?
>>>
>>
>>These are the outputs of the function. It does not make sense to call the 
>>functions with
>>null output pointers (calling by ref) . Anyway instead of putting the check 
>>in the free build
>>I used an assert
> 
> 
> I see. Actually I've overlooked that addresses and not values are
> checked. Please ignore this comment.
> 
> 
>>>>+
>>>>+   p_pkey_tbl = osm_physp_get_mod_pkey_tbl( p_physp );
>>>>+   if (! p_pkey_tbl)
>>>
>>>          ^^^^^^^^^^^^^
>>>Is it possible?
>>
>>Yes it is ! I run into it during testing. The port did not have any pkey 
>>table.
> 
> 
> static inline osm_pkey_tbl_t *
> osm_physp_get_mod_pkey_tbl( IN osm_physp_t* const p_physp )
> {
> ...
>   return( &p_physp->pkeys );
> };
> 
> This returns the address of physp's pkeys field. Right?
> Then if ( &p_physp->pkeys == NULL ) p_physp pointer should be equal to
> unsigned equivalent of -(offset of pkey field in physp struct).

Correct. I will remove the check.
> 
> 
>>>>+                                   "Fail to allocate new pending pkey 
>>>>entry for node "
>>>>+                                   "0x%016" PRIx64 " port %u\n",
>>>>+                                   cl_ntoh64( osm_node_get_node_guid( 
>>>>p_node ) ),
>>>>+                                   osm_physp_get_port_num( p_physp ) );
>>>>+           return;
>>>>+   }
>>>>+   p_pending->pkey = pkey;
>>>>+   p_orig_pkey = cl_map_get( &p_pkey_tbl->keys, ib_pkey_get_base( pkey 
>>>>) );
>>>>+   if ( !p_orig_pkey  || 
>>>>+             (ib_pkey_get_base(*p_orig_pkey) != ib_pkey_get_base(pkey) 
>>>>))
>>>
>>>
>>>There the cases of new pkey and updated pkey membership is mixed. Why?
>>
>>I am not following your question.
>>The specific case I am trying to catch is the one that for some reason the 
>>map points to
>>a pkey entry that was modified somehow and is different then the one you 
>>would expect by
>>the map.
> 
> 
> Didn't understand it at first pass, now it is clearer.
> 
> If pkey entry was modified somehow (how? bugs?), the assumption is that
> mapping still be valid? Then it is not new entry (or we will change
> pkey's index in the real table).
> 
PKey table mismatch between the block and map should never happen.
I will remove the check and replace that with an ASSERT so I catch the bug if 
we hit it.

> 
>>>>+   {
>>>>+           p_pending->is_new = TRUE;
>>>>+           cl_qlist_insert_tail(&p_pkey_tbl->pending, 
>>>>(cl_list_item_t*)p_pending);
>>>>+           stat = "inserted";
>>>>+   }
>>>>+   else
>>>>+   {
>>>>+           p_pending->is_new = FALSE;
>>>>+           if (osm_pkey_tbl_get_block_and_idx(p_pkey_tbl, p_orig_pkey,
>>>>+                                                                    
>>>>&p_pending->block, &p_pending->index))
>>>
>>>                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>AFAIK in this function there were CL_ASSERTs which check for uinitialized
>>>pointers.
>>
>>True. So the asserts are not required in this case.
> 
> 
> Up to you. Actually this my comment may be ignored, as stated above I
> didn't read this correctly.
> 
> 
>>>
>>>>+           {
>>>>+                   osm_log( p_log, OSM_LOG_ERROR,
>>>>+                                    "pkey_mgr_process_physical_port: 
>>>>ERR 0503: "
>>>>+                                           "Fail to obtain P_Key 0x%04x 
>>>>block and index for node "
>>>>+                                           "0x%016" PRIx64 " port %u\n",
>>>>+                                           cl_ntoh64( 
>>>>osm_node_get_node_guid( p_node ) ),
>>>>+                                           osm_physp_get_port_num( 
>>>>p_physp ) );
>>>>+                   return;
>>>>+           }
>>>>+           cl_qlist_insert_head(&p_pkey_tbl->pending, 
>>>>(cl_list_item_t*)p_pending);
>>>>+           stat = "updated";
>>>
>>>
>>>Is it will be updated? It is likely "already there" case. No?
>>>
>>>Also in this case you can already put the pkey in new_block instead of
>>>holding it in pending list. Then later you will only need to add new
>>>pkeys. This may simplify the flow and even save some mem.
>>
>>True but in my mind it does not simplify - on the contrary it makes the 
>>partition between
>>populating each port pending list and actually setting the pkey tables 
>>mixed.
> 
> 
> I meant new_block filling, not actual setting. You will be able to
> remove whole if { } else { } flow, as well as is_new, block and index
> fields from 'pending' structure (actually only pkey value itself will
> matter) - is it not nice simplification?
I still prefer the clear staging: append to list when scanning the partitions 
and
filling in the tables when looping on all ports.
> 
> 
>>I do not think the memory impact deserves this mix of staging
>>
>>
>>>
> 
>>>>+   max_num_of_blocks = pkey_mgr_get_physp_max_blocks( p_req->p_subn, 
>>>>p_physp );
>>>>+   if (    p_pkey_tbl->max_blocks > max_num_of_blocks )
>>>>     {
>>>>-         block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index );
>>>>-         for ( i = 0; i < IB_NUM_PKEY_ELEMENTS_IN_BLOCK; i++ )
>>>>+           osm_log( p_log, OSM_LOG_INFO,
>>>>+                                   "pkey_mgr_update_port: "
>>>>+                                   "Max number of blocks reduced from 
>>>>%u to %u " +                                        "for node 0x%016" 
>>>>PRIx64 " 
>>>>port %u\n",
>>>>+                                   p_pkey_tbl->max_blocks, 
>>>>max_num_of_blocks,
>>>>+                                   cl_ntoh64( osm_node_get_node_guid( 
>>>>p_node ) ),
>>>>+                                   osm_physp_get_port_num( p_physp ) ); 
>>>>+   }
>>>>+   p_pkey_tbl->max_blocks = max_num_of_blocks;
>>>>+
>>>>+   osm_pkey_tbl_sync_new_blocks( p_pkey_tbl );
>>>>+   cl_map_remove_all( &p_pkey_tbl->keys );
>>>
>>>
>>>What is the reason to drop map here? AFAIK it will be reinitialized later
>>>anyway when pkey blocks will be received.
>>
>>What if it is not received?
> 
> 
> Then we will have unreliable data there.
> 
> Maybe I know why you wanted this - this is part of "use pkey tables
> before sending/receiving to/from ports" idea?
> 
> 
>>>>@@ -255,24 +443,36 @@ pkey_mgr_update_peer_port(
>>>>  if (enforce == FALSE)
>>>>     return FALSE;
>>>>
>>>>-   p_pkey_tbl = osm_physp_get_pkey_tbl( p );
>>>>-   p_peer_pkey_tbl = osm_physp_get_pkey_tbl( peer );
>>>>+   p_pkey_tbl = osm_physp_get_pkey_tbl( p_physp );
>>>>+   p_peer_pkey_tbl = osm_physp_get_mod_pkey_tbl( peer );
>>>>  num_of_blocks = osm_pkey_tbl_get_num_blocks( p_pkey_tbl );
>>>>-   if ( num_of_blocks > osm_pkey_tbl_get_num_blocks( p_peer_pkey_tbl ) )
>>>>-      num_of_blocks = osm_pkey_tbl_get_num_blocks( p_peer_pkey_tbl );
>>>>+   peer_max_blocks = pkey_mgr_get_physp_max_blocks( p_subn, peer );
>>>>+   if (peer_max_blocks < p_pkey_tbl->used_blocks)
>>>>+   {
>>>>+           osm_log( p_log, OSM_LOG_ERROR,
>>>>+                                   "pkey_mgr_update_peer_port: ERR 
>>>>0508: "
>>>>+                                   "not enough entries (%u < %u) on 
>>>>switch 0x%016" PRIx64
>>>>+                                   " port %u\n",
>>>>+                                   peer_max_blocks, num_of_blocks,
>>>>+                                   cl_ntoh64( osm_node_get_node_guid( 
>>>>p_node ) ),
>>>>+                                   osm_physp_get_port_num( peer ) );
>>>>+           return FALSE;
>>>
>>>
>>>Do you think it is the best way, just to skip update - partitions are
>>>enforced already on the switch. May be better to truncate pkey tables
>>>in order to meet peer's capabilities?
>>
>>You are right about that - Its a bug!
>>I think the best approach here is to turn off the enforcement on the switch.
>>If we truncate the table we actually impact connectivity of the fabric.
>>I prefer a softer approach - an error in the log.
> 
> 
> Yes this should be good way to handle this.
> 
> 
>>>
>>>>+   }
>>>>
>>>>-   for ( block_index = 0; block_index < num_of_blocks; block_index++ )
>>>>+   p_peer_pkey_tbl->used_blocks = p_pkey_tbl->used_blocks;
>>>>+   for ( block_index = 0; block_index < p_pkey_tbl->used_blocks; 
>>>>block_index++)
>>>>  {
>>>>     block = osm_pkey_tbl_new_block_get( p_pkey_tbl, block_index );
>>>>     peer_block = osm_pkey_tbl_block_get( p_peer_pkey_tbl, block_index );
>>>>     if ( memcmp( peer_block, block, sizeof( *peer_block ) ) )
>>>>     {
>>>>+                   osm_pkey_tbl_set(p_peer_pkey_tbl, block_index, 
>>>>block);
>>>
>>>
>>>Why this (osm_pkey_tbl_set())? This will be called by receiver.
>>
>>Same as the above note about updating the map
>>I wanted to avoid to wait for the GetResp.
>>I think it is a mistake and we can actually remove it.
> 
> 
> Agree.
> 
> Sasha.


_______________________________________________
openib-general mailing list
openib-general@openib.org
http://openib.org/mailman/listinfo/openib-general

To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general

Re: [openib-general] [PATCH] osm: partition manager force policy

Reply via email to