Hi Steffen 
I think I understood your description correctly from the beginning. However, the 
problem you described should not happen with a static (unchanged) table, 
because of the inner logic of TableUtils. 
I assume that the agent does not return the rows in lexicographic order. That 
would have the same effect as a row appearing dynamically during retrieval. 

I cannot rule out an off-by-one error in TableUtils, but none of the unit tests 
I have run so far indicate one. 

What agent are you using?

Nevertheless, with the mode denseTableDoubleCheckIncompleteRows, the new 
version will not show the issue you observed.

Best regards 
Frank

> On 19.07.2018 at 17:20, Steffen Brüntjen <steffen.bruent...@macmon.eu> wrote:
> 
> Hi Frank
> 
> 
> I'm not sure whether we're talking about the same thing. The problem I 
> described is *not* a timing problem with rows being added to or removed from 
> the table while retrieving rows. The table I am querying doesn't change at 
> all and the problem is highly reproducible. Let's see the example again:
> 
> 
> This is what the List<TableEvent> result should look like, and what it actually 
> looks like - always - when max-bindings is set to 1 or 32 or some other value.
> 
> [ ... 75 normal rows ... ]
> [1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = 
> service]
> [1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = 
> reception]
> [1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, [...], 1.3.6.1.2.1.31.1.1.1.18.283 = 
> voice]
> [1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, [...], 1.3.6.1.2.1.31.1.1.1.18.373 = 
> clients]
> [1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, [...], 1.3.6.1.2.1.31.1.1.1.18.774 = 
> VLAN601]
> [1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, [...], 1.3.6.1.2.1.31.1.1.1.18.783 = 
> lab6]
> [ ... everything normal ... ]
> 
> 
> When setting the max-bindings to 4 (I'm requesting 7 columns), I - always - 
> get these TableEvents:
> 
> [ ... 75 normal rows ... ]
> [1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 = 
> service] 
> [1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 = 
> reception]
> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.283 = 2, 
> 1.3.6.1.2.1.31.1.1.1.15.283 = 0, 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.373 = 2, 
> 1.3.6.1.2.1.31.1.1.1.15.373 = 0, 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.774 = 2, 
> 1.3.6.1.2.1.31.1.1.1.15.774 = 0, 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.783 = 2, 
> 1.3.6.1.2.1.31.1.1.1.15.783 = 0, 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
> [1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, 1.3.6.1.2.1.31.1.1.1.17.283 = 2, 
> 1.3.6.1.2.1.31.1.1.1.6.283 = 0, 1.3.6.1.2.1.31.1.1.1.10.283 = 0, null, null, 
> null]
> [1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, 1.3.6.1.2.1.31.1.1.1.17.373 = 2, 
> 1.3.6.1.2.1.31.1.1.1.6.373 = 0, 1.3.6.1.2.1.31.1.1.1.10.373 = 0, null, null, 
> null]
> [1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, 1.3.6.1.2.1.31.1.1.1.17.774 = 2, 
> 1.3.6.1.2.1.31.1.1.1.6.774 = 0, 1.3.6.1.2.1.31.1.1.1.10.774 = 0, null, null, 
> null]
> [1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, 1.3.6.1.2.1.31.1.1.1.17.783 = 2, 
> 1.3.6.1.2.1.31.1.1.1.6.783 = 0, 1.3.6.1.2.1.31.1.1.1.10.783 = 0, null, null, 
> null]
> [ ... everything normal ... ]
> 
> 
> The returned List<TableEvent> contains 4 more results, because 4 table rows 
> are split into two TableEvents. We can see that these indexes seem to have 
> two rows:
>  index=283
>  index=373
>  index=774
>  index=783
> 
> 
> It's like this table
> 
> 
> IDX |  A  |  B  |  C  |  D
> ----+-----+-----+-----+-----
> 0   |  1  |  2  |  3  |  4
> 1   |  5  |  6  |  7  |  8
> 2   |  9  | 10  | 11  | 12
> 3   | 13  | 14  | 15  | 16
> 
> 
> becomes something like this when obtained by TableUtils:
> 
> IDX |  A  |  B  |  C  |  D
> ----+-----+-----+-----+-----
> 0   |  1  |  2  |  3  |  4
> 1   | null| null|  7  |  8        <-- index=1
> 2   | null| null| 11  | 12        <-- index=2
> 1   |  5  |  6  | null| null      <-- index=1
> 2   |  9  | 10  | null| null      <-- index=2
> 3   | 13  | 14  | 15  | 16
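
Until the split rows are avoided at the source, a caller could stitch them back together by index. The following is only a minimal sketch of that idea on the toy table above, not SNMP4J code: the class `SplitRowMerger` and its `merge` method are hypothetical, rows are modeled as plain string arrays, and it assumes all partial rows have the same number of columns. With SNMP4J one would presumably read the index and cells from `TableEvent.getIndex()` and `TableEvent.getColumns()` instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SplitRowMerger {

    /**
     * Merges partial rows that share an index (hypothetical helper).
     * For each column, the first non-null cell seen for that index is kept.
     * The result preserves the order in which indexes first appeared.
     */
    static Map<String, String[]> merge(String[] indexes, String[][] rows) {
        Map<String, String[]> merged = new LinkedHashMap<>();
        for (int r = 0; r < rows.length; r++) {
            String[] target = merged.get(indexes[r]);
            if (target == null) {
                // First time this index shows up: keep a copy of the row.
                merged.put(indexes[r], rows[r].clone());
                continue;
            }
            // Index seen before: fill the gaps left by the earlier partial row.
            for (int c = 0; c < target.length; c++) {
                if (target[c] == null) {
                    target[c] = rows[r][c];
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // The broken result from the toy table: rows 1 and 2 arrive in two halves.
        String[] indexes = {"0", "1", "2", "1", "2", "3"};
        String[][] rows = {
            {"1", "2", "3", "4"},
            {null, null, "7", "8"},
            {null, null, "11", "12"},
            {"5", "6", null, null},
            {"9", "10", null, null},
            {"13", "14", "15", "16"}
        };
        merge(indexes, rows).forEach((idx, cells) ->
            System.out.println(idx + " | " + String.join(" | ", cells)));
    }
}
```

Merging by index only repairs the symptom; it does not distinguish a split row from a row that is intentionally sparse, so it would leave genuinely missing cells as null.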
> 
> 
> I tried to describe the reason for this, but I admit it's a bit complicated. 
> Of course, it's also possible that I didn't understand your answer correctly. 
> Sorry for the confusion in that case. I'd then like to understand how sparse 
> and dense tables cause this problem. 
> 
> Thanks for the clarification on tooBig errors with GETBULK requests!
> 
> 
> Best regards
> Steffen Brüntjen
> 
> 
> 
> -----Original Message-----
> From: Frank Fock [mailto:f...@agentpp.com] 
> Sent: Thursday, 12 July 2018 08:41
> To: Steffen Brüntjen <steffen.bruent...@macmon.eu>
> Cc: snmp4j@agentpp.org
> Subject: Re: [SNMP4J] max-bindings with big tables
> 
> Hi Steffen,
> 
> If the agent sends a tooBig error on a GETBULK request, then this is an error 
> in the agent. See RFC 3416, section 4.2.3:
>    
>    If the size of the message encapsulating the Response-PDU
>         containing the requested number of variable bindings would be
>         greater than either a local constraint or the maximum message
>         size of the originator, then the response is generated with a
>         lesser number of variable bindings.  This lesser number is the
>         ordered set of variable bindings with some of the variable
>         bindings at the end of the set removed, such that the size of
>         the message encapsulating the Response-PDU is approximately
>         equal to but no greater than either a local constraint or the
>         maximum message size of the originator.  Note that the number
>         of variable bindings removed has no relationship to the values
>         of N, M, or R.
> 
> For the issue you reported, there is no general solution, because it 
> interferes with sparse tables. 
> A solution would either decrease the performance for sparse tables or 
> filter out sparse rows. 
> The latter is not acceptable for intentionally sparse tables. 
> For dense tables, filtering could be the best option. It would, however, 
> hide new rows even though the command generator has already detected them.
> 
> I am about to add an option to getDenseTable that activates filtering of 
> new rows that appear during the table retrieval and are therefore received 
> incompletely. Would that help you?
> 
> Best regards,
> Frank 
> 
>> On 9. Jul 2018, at 19:45, Steffen Brüntjen <steffen.bruent...@macmon.eu> 
>> wrote:
>> 
>> Hi Frank
>> 
>> Thank you for having a look at it. I agree, the performance with many 
>> bindings is indeed *much* higher and yes, values should be retrieved 
>> row-by-row in order to avoid data inconsistencies. But there are also 
>> problems with many bindings:
>> 
>> 1. Since the agent cannot decide how many values to send (unlike with 
>> max-repetitions), the packet size might get too big if you have a table 
>> with many (big) columns.
>> 
>> 2. There are agents that get into trouble when many columns are requested. 
>> This often results in timeouts (no tooBig error), and then there's no option 
>> other than requesting fewer bindings.
>> 
>> Maybe the proposed change is the way to go; it's subtle but effective (I 
>> believe).
>> 
>> Best regards
>> Steffen 
>> 
>> 
>> -----Original Message-----
>> From: Frank Fock [mailto:f...@agentpp.com] 
>> Sent: Friday, 6 July 2018 18:55
>> To: Steffen Brüntjen <steffen.bruent...@macmon.eu>
>> Cc: snmp4j@agentpp.org
>> Subject: Re: [SNMP4J] max-bindings with big tables
>> 
>> Hi Steffen,
>> I will try to reproduce this issue. 
>> Independent of the result, the parameters for TableUtils are not suitable 
>> for your setup. The maxNumColumnsPerPDU has to be as large as possible. 
>> Otherwise the overall performance will be bad and the likelihood of 
>> incomplete table rows increases significantly (through changes in the agent 
>> while TableUtils operates).
>> Best regards 
>> Frank
>> 
>>> On 06.07.2018 at 10:20, Steffen Brüntjen 
>>> <steffen.bruent...@macmon.eu> wrote:
>>> 
>>> Hi!
>>> 
>>> I'm using SNMP4J version 2.6.2.
>>> 
>>> Best regards
>>> Steffen
>>> 
>>> -----Original Message-----
>>> From: Frank Fock [mailto:f...@agentpp.com] 
>>> Sent: Thursday, 5 July 2018 19:37
>>> To: Steffen Brüntjen <steffen.bruent...@macmon.eu>
>>> Cc: snmp4j@agentpp.org
>>> Subject: Re: [SNMP4J] max-bindings with big tables
>>> 
>>> Hi Steffen 
>>> What SNMP4J version are you using?
>>> Best regards 
>>> Frank
>>> 
>>>> On 05.07.2018 at 17:04, Steffen Brüntjen 
>>>> <steffen.bruent...@macmon.eu> wrote:
>>>> 
>>>> Hi Frank
>>>> 
>>>> I believe I found an issue in the TableUtils class. In certain scenarios, 
>>>> the returned List<TableEvent> from getTable(Target target, OID[] 
>>>> columnOIDs, OID lowerBoundIndex, OID upperBoundIndex) will contain 
>>>> incomplete and duplicate rows.
>>>> 
>>>> 
>>>> Here's an excerpt of an example List<TableEvent> for a "good" result:
>>>> 
>>>> [1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 
>>>> = service]
>>>> [1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 
>>>> = reception]
>>>> [1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, [...], 1.3.6.1.2.1.31.1.1.1.18.283 
>>>> = voice]
>>>> [1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, [...], 1.3.6.1.2.1.31.1.1.1.18.373 
>>>> = clients]
>>>> [1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, [...], 1.3.6.1.2.1.31.1.1.1.18.774 
>>>> = VLAN601]
>>>> [1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, [...], 1.3.6.1.2.1.31.1.1.1.18.783 
>>>> = lab6]
>>>> 
>>>> 
>>>> But in some specific circumstances, I get results like these:
>>>> 
>>>> [ ... 75 normal rows ... ]
>>>> [1.3.6.1.2.1.31.1.1.1.1.278 = VLAN105, [...], 1.3.6.1.2.1.31.1.1.1.18.278 
>>>> = service] 
>>>> [1.3.6.1.2.1.31.1.1.1.1.279 = VLAN106, [...], 1.3.6.1.2.1.31.1.1.1.18.279 
>>>> = reception]
>>>> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.283 = 2, 
>>>> 1.3.6.1.2.1.31.1.1.1.15.283 = 0, 1.3.6.1.2.1.31.1.1.1.18.283 = voice]
>>>> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.373 = 2, 
>>>> 1.3.6.1.2.1.31.1.1.1.15.373 = 0, 1.3.6.1.2.1.31.1.1.1.18.373 = clients]
>>>> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.774 = 2, 
>>>> 1.3.6.1.2.1.31.1.1.1.15.774 = 0, 1.3.6.1.2.1.31.1.1.1.18.774 = VLAN601]
>>>> [null, null, null, null, 1.3.6.1.2.1.31.1.1.1.14.783 = 2, 
>>>> 1.3.6.1.2.1.31.1.1.1.15.783 = 0, 1.3.6.1.2.1.31.1.1.1.18.783 = lab6]
>>>> [1.3.6.1.2.1.31.1.1.1.1.283 = VLAN110, 1.3.6.1.2.1.31.1.1.1.17.283 = 2, 
>>>> 1.3.6.1.2.1.31.1.1.1.6.283 = 0, 1.3.6.1.2.1.31.1.1.1.10.283 = 0, null, 
>>>> null, null]
>>>> [1.3.6.1.2.1.31.1.1.1.1.373 = VLAN200, 1.3.6.1.2.1.31.1.1.1.17.373 = 2, 
>>>> 1.3.6.1.2.1.31.1.1.1.6.373 = 0, 1.3.6.1.2.1.31.1.1.1.10.373 = 0, null, 
>>>> null, null]
>>>> [1.3.6.1.2.1.31.1.1.1.1.774 = VLAN601, 1.3.6.1.2.1.31.1.1.1.17.774 = 2, 
>>>> 1.3.6.1.2.1.31.1.1.1.6.774 = 0, 1.3.6.1.2.1.31.1.1.1.10.774 = 0, null, 
>>>> null, null]
>>>> [1.3.6.1.2.1.31.1.1.1.1.783 = VLAN610, 1.3.6.1.2.1.31.1.1.1.17.783 = 2, 
>>>> 1.3.6.1.2.1.31.1.1.1.6.783 = 0, 1.3.6.1.2.1.31.1.1.1.10.783 = 0, null, 
>>>> null, null]
>>>> [ ... everything normal ... ]
>>>> 
>>>> 
>>>> Here we find some rows split into two: one block with the first 4 columns 
>>>> set to null, and another block with the last 3 columns set to null.
>>>> 
>>>> 
>>>> Here's the setting which produces the second result:
>>>> 
>>>> - max-bindings is set to 4 - TableUtils.setMaxNumColumnsPerPDU(int)
>>>> - max-repetitions is set to 30 - TableUtils.setMaxNumRowsPerPDU(int)
>>>> - the device returns many rows (like 120)
>>>> - the table request contains more columns than max-bindings
>>>> - the number of requested columns is not a multiple of max-bindings
>>>> - the problem will also depend on MTU size, but that's not important here
>>>> 
>>>> 
>>>> This is what happens:
>>>> 
>>>> 1. TableUtils will request the first 4 columns
>>>> 2. device returns 60 variable bindings, that's 15 cells per column
>>>> 3. TableUtils will request the latter 3 columns
>>>> 4. device returns 60 variable bindings, that's 20 cells per column
>>>> 
>>>> This repeats until all bindings are retrieved. So far, so good. The 
>>>> problem is that every second request (step 3) receives more rows, so 
>>>> these requests reach index 283 (as in the example above) earlier. I did 
>>>> some debugging and I think I found the reason: when the first results 
>>>> with index 283 are received (step 3), TableUtils creates a row for this 
>>>> index. That row is padded with null values for the first 4 columns so 
>>>> that its size equals 7 (and not 3). Having size=7, the row is considered 
>>>> finished too soon. TableUtils then prunes these incomplete but finished 
>>>> rows from rowCache. When TableUtils receives the other 4 columns for row 
>>>> 283, it creates a new row with the same index.
>>>> 
>>>> 
>>>> How to fix?
>>>> 
>>>> I believe a moderately easy, but not very good, way to fix this is to have 
>>>> the smaller chunk contain the first 3 columns rather than the last 3 
>>>> columns:
>>>> 
>>>> max-bindings = 4
>>>> columns: .1, .2, .3, .4, .5, .6, .7
>>>> Packet 1 should contain: .1, .2, and .3
>>>> Packet 2 should contain: .4, .5, .6, and .7
>>>> 
>>>> Number of columns for the first packet is NumColumnsTotal % maxBindings.
>>>> Number of columns for the other packets is maxBindings.
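
The proposed split can be sketched as below. This only illustrates the suggestion in this message and is not how TableUtils partitions columns: the class `ColumnChunks` and the method `remainderFirst` are hypothetical names, columns are represented by their 1-based position, and the case where the column count is an exact multiple of max-bindings (remainder 0) is handled by making the first chunk full-sized, which is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class ColumnChunks {

    /**
     * Splits column positions 1..totalColumns into chunks so that the
     * remainder chunk (totalColumns % maxBindings) comes first and all
     * following chunks hold maxBindings columns.
     */
    static List<List<Integer>> remainderFirst(int totalColumns, int maxBindings) {
        if (maxBindings <= 0) {
            throw new IllegalArgumentException("maxBindings must be positive");
        }
        List<List<Integer>> chunks = new ArrayList<>();
        int remainder = totalColumns % maxBindings;
        // Assumption: with no remainder, the first chunk is simply a full one.
        int size = (remainder == 0) ? maxBindings : remainder;
        int start = 0;
        while (start < totalColumns) {
            List<Integer> chunk = new ArrayList<>();
            for (int c = start; c < start + size; c++) {
                chunk.add(c + 1); // 1-based column position
            }
            chunks.add(chunk);
            start += size;
            size = maxBindings; // every chunk after the first is full-sized
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 7 columns with max-bindings = 4, as in the example above.
        System.out.println(remainderFirst(7, 4));
    }
}
```

With the smaller chunk first, the request that advances fastest through the rows is the one for the leading columns, so the chunk containing the last column no longer runs ahead of the others.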
>>>> 
>>>> 
>>>> Please tell me if you need more information or if my method invocation is 
>>>> wrong.
>>>> 
>>>> 
>>>> Best regards
>>>> Steffen Brüntjen
>>>> _______________________________________________
>>>> SNMP4J mailing list
>>>> SNMP4J@agentpp.org
>>>> https://oosnmp.net/mailman/listinfo/snmp4j
> 
