Hi Václav,

On Mon, Feb 4, 2019 at 12:48 PM Václav Jirovský <[email protected]>
wrote:

> Hello Al,
>
> thanks for your response.
>
> *"Perhaps you could call get-sth to learn the current tree size, then
> divide the range from what you currently have to the log's actual tree size
> into N subranges, start N workers each repeatedly calling get-entries
> always with start=first entry they don't yet have for their subrange, and
> end=limit of their allocated subrange until they have fetched all of their
> allocated entries?"*
>
> *This still doesn't solve the problem - I don't know a good constant
> value (tree size / N) for splitting into N subranges (among current
> servers, one returns 1024 entries per request, Google's mostly 256). So I
> would have to perform some "test entry downloads" to discover the best
> limit for fetching entries as <start, start+limit>, and if the server
> throttles (reduces) the entry count, I would have to issue further
> get-entries calls for the missing ranges (as you describe).*
>

This is essentially the approach we use for mirroring/ingesting logs; it
seems to work well enough, at least for our purposes.

IRL there are other nuances to consider too - e.g. many log operators will
act to protect their logs from high levels of traffic, so you may find that
the throughput you can actually achieve is not entirely in your hands
anyway.
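
To make that concrete, here's a rough Go sketch of the scheme (a sketch
only, not our actual mirroring code: fetchEntries is a hypothetical
wrapper around GET /ct/v1/get-entries, and production code would also
want retries, backoff and rate limiting):

package ctfetch

import (
	"context"
	"fmt"
	"sync"
)

// Entry stands in for one parsed item from a get-entries response.
type Entry struct {
	Index    uint64
	LeafData []byte
}

// fetchEntries is a hypothetical wrapper around
// GET <log>/ct/v1/get-entries?start=..&end=.. - it returns however many
// entries the log actually chose to serve, which may be fewer than asked.
func fetchEntries(ctx context.Context, logURL string, start, end uint64) ([]Entry, error) {
	// ... HTTP call elided ...
	return nil, fmt.Errorf("not implemented")
}

// fetchRange repeatedly calls get-entries, always starting from the first
// entry it doesn't yet have, until it holds all of [start, end].
func fetchRange(ctx context.Context, logURL string, start, end uint64, out chan<- Entry) error {
	next := start
	for next <= end {
		entries, err := fetchEntries(ctx, logURL, next, end)
		if err != nil {
			return err
		}
		if len(entries) == 0 {
			return fmt.Errorf("log returned no entries for [%d, %d]", next, end)
		}
		for _, e := range entries {
			out <- e
		}
		// Advance by however many entries the log actually gave us;
		// this copes with any per-request limit, even one that changes.
		next += uint64(len(entries))
	}
	return nil
}

// fetchAll splits [have, treeSize) into numWorkers subranges and fetches
// them in parallel. treeSize comes from a fresh get-sth call.
func fetchAll(ctx context.Context, logURL string, have, treeSize uint64, numWorkers int, out chan<- Entry) error {
	if have >= treeSize {
		return nil // already up to date
	}
	chunk := (treeSize - have) / uint64(numWorkers)
	if chunk == 0 {
		chunk = 1 // fewer new entries than workers
	}
	var wg sync.WaitGroup
	errs := make(chan error, numWorkers)
	for i := 0; i < numWorkers; i++ {
		start := have + uint64(i)*chunk
		if start >= treeSize {
			break // nothing left to hand out
		}
		end := start + chunk - 1
		if i == numWorkers-1 || end >= treeSize {
			end = treeSize - 1 // last worker takes the remainder
		}
		wg.Add(1)
		go func(s, e uint64) {
			defer wg.Done()
			if err := fetchRange(ctx, logURL, s, e, out); err != nil {
				errs <- err
			}
		}(start, end)
	}
	wg.Wait()
	close(errs)
	return <-errs // nil if no worker failed
}

The caller owns the out channel (and closes it once fetchAll returns);
each worker just advances by however many entries the log actually
returned, so a per-request limit that changes between calls is handled
automatically - which is also why a limit advertised in an STH could go
stale without breaking anything.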


>
> *"It seems like you're optimising for a very specific use case - if I just
> want to inspect one entry (e.g. a monitor tells me "hey, there's a cert for
> your domain at entry X) I might have to download 1000s of certs just to
> fetch the one I'm interested in? Also, what happens when I just want to
> request the last few entries from the tree?*
> *(Perhaps there's a typo in your proposal?)*
>
> *It's an optimisation for a (I think common) use case - I want my app to
> stay in sync with all log servers and to process all entries. Some
> servers (for example Google Argon 2019) push new entries very quickly,
> but the get-entries output is sometimes limited to fewer entries than the
> tree has grown by since my previous run (where I called get-entries and
> processed the results). If I had a constant split value, I could split
> the STH range into N subranges and download and process them in parallel
> much more easily.*
>
> *You are right, there is a typo; it should be:*
>
>>  Logs MAY restrict the number of entries that can be retrieved per
>>    "get-entries" request.  If a client requests more than the permitted
>>    number of entries ("get_entries_max_limit" output of "get-sth" request),
>>    the log SHALL return the maximum number of entries permissible.  If a
>>    client requests less than or equal to the permitted number of entries
>>    ("get_entries_max_limit" output of "get-sth" request), the log MUST
>>    return the requested number of entries.  These entries SHALL be
>>    sequential beginning with the entry specified by "start".
>>
>>
>
>
> *Just because I happen to have an STH with a given "get_entries_max_limit"
> it doesn't mean that the log didn't change that limit between when the STH
> was created and when I make a request, so I'd have to have logic to cope
> anyway - log operators may wish to adjust this limit dynamically/from time
> to time in order to react to usage patterns or resource
> consumption/availability etc."*
>
> *I understand your arguments - I don't have a solution at the moment, but
> I think it would be possible to solve.*
>
> Best,
> Vaclav
>
>
>
> On Mon, Feb 4, 2019 at 1:02 PM Al Cutter <[email protected]> wrote:
>
>> Hi Vaclav,
>>
>> I think there might be other ways of achieving what you're after.
>>
>> On Mon, 4 Feb 2019, 10:19 Václav Jirovský, <[email protected]>
>> wrote:
>>
>>> Hello all,
>>>
>>> I would like to propose a modification to the Retrieve Latest Signed
>>> Tree Head section of RFC 6962 - adding a new attribute
>>> *get_entries_max_limit*.
>>>
>>> Reason for this change - the current text of section 4.6:
>>>
>>>    Logs MAY restrict the number of entries that can be retrieved per
>>>    "get-entries" request.  If a client requests more than the permitted
>>>    number of entries, the log SHALL return the maximum number of entries
>>>    permissible.  These entries SHALL be sequential beginning with the
>>>    entry specified by "start".
>>>
>>>
>>> If you want to download all entries from a CT server, you don't know
>>> how many entries the server will return per request - so you have to
>>> process the response, count the number of entries actually returned,
>>> and only then issue the next request. This is not efficient; you could
>>> make these requests in parallel if you had a guaranteed number of
>>> returned entries.
>>>
>>
>> Perhaps you could call get-sth to learn the current tree size, then
>> divide the range from what you currently have to the log's actual tree size
>> into N subranges, start N workers each repeatedly calling get-entries
>> always with start=first entry they don't yet have for their subrange, and
>> end=limit of their allocated subrange until they have fetched all of their
>> allocated entries?
>>
>> That seems like it might give you the parallelism you're after?
>>
>>
>>
>>> *Proposed modification:*
>>>
>>>
>>> 4.3 <https://tools.ietf.org/html/rfc6962#section-4.3>.  Retrieve Latest 
>>> Signed Tree Head
>>>
>>> GET https://<log server>/ct/v1/get-sth
>>> No inputs.
>>>
>>> Outputs:
>>>
>>>       tree_size:  The size of the tree, in entries, in decimal.
>>>       timestamp:  The timestamp, in decimal.
>>>       sha256_root_hash:  The Merkle Tree Hash of the tree, in base64.
>>>
>>> *      get_entries_max_limit:  The maximum number of entries the log
>>>       will return per "get-entries" request.*
>>>
>>>       tree_head_signature:  A TreeHeadSignature for the above data.
>>>
>>>
>>> 4.6 <https://tools.ietf.org/html/rfc6962#section-4.6>.  Retrieve Entries 
>>> from Log
>>>
>>> GET https://<log server>/ct/v1/get-entries
>>>
>>>
>>> Inputs:
>>>       start:  0-based index of first entry to retrieve, in decimal.
>>>       end:  0-based index of last entry to retrieve, in decimal.
>>>
>>>
>>> .....
>>>
>>>
>>>    Logs MAY restrict the number of entries that can be retrieved per
>>>    "get-entries" request.  *If a client requests more than the permitted
>>>    number of entries ("get_entries_max_limit" output of "get-sth" request),
>>>    the log SHALL return the maximum number of entries permissible.  If a
>>>    client requests less than or equal to the permitted number of entries
>>>    ("get_entries_max_limit" output of "get-sth" request), the log MUST
>>>    return the maximum number of entries permissible.*  These entries SHALL
>>>    be sequential beginning with the entry specified by "start".
>>>
>>>
>> It seems like you're optimising for a very specific use case - if I just
>> want to inspect one entry (e.g. a monitor tells me "hey, there's a cert for
>> your domain at entry X") I might have to download 1000s of certs just to
>> fetch the one I'm interested in? Also, what happens when I just want to
>> request the last few entries from the tree?
>> (Perhaps there's a typo in your proposal?)
>>
>> Just because I happen to have an STH with a given "get_entries_max_limit"
>> it doesn't mean that the log didn't change that limit between when the STH
>> was created and when I make a request, so I'd have to have logic to cope
>> anyway - log operators may wish to adjust this limit dynamically/from time
>> to time in order to react to usage patterns or resource
>> consumption/availability etc.
>>
>> Cheers,
>> Al.
>>
>>
>>
>>>
>>> Best,
>>>
>>> Vaclav Jirovsky
>>>
>>>
>>
>
> --
> Václav Jirovský
> email: [email protected]
>
>
_______________________________________________
Trans mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/trans
