Re: Data, enumerateBytes: separate blocks?

2017-12-25 Thread Quincey Morris
On Dec 25, 2017, at 10:23, Daryle Walker wrote:
> 
> What happens if whichever byte value is second is gigabytes away from the 
> first?

Your Data extension code doesn’t solve that problem anyway:

> var firstCr, firstLf: Index?
> enumerateBytes { buffer, start, stop in
>     if let localLf = buffer.index(of: ParsingQueue.Constants.lf) {
>         firstLf = start.advanced(by: buffer.startIndex.distance(to: localLf))
>         stop = true
>     }
> 
>     if let firstCrIndex = firstCr, firstCrIndex.distance(to: start.advanced(by: buffer.count)) > 2 {
>         // No block after this current one could find a LF close enough to form CR-LF or CR-CR-LF.
>         stop = true
>     } else if let localCr = buffer.index(of: ParsingQueue.Constants.cr) {
>         firstCr = start.advanced(by: buffer.startIndex.distance(to: localCr))
>         stop = true
>     }
> }


In the case where the Data object is *one* multi-GB buffer, if it doesn’t 
contain a LF you will search gigabytes for the non-existent LF before searching 
them again for the CR. Even if you’re lucky and the Data object is made up of 
multiple smallish buffers, you will still search all the buffers that don’t 
have a CR for a LF, before you find the one that does have a CR.

So, if your goal is to minimize searching, you have to search for CR and LF 
simultaneously. There are two easy ways to do this:

1. Use “index(where:)” and test for both values in the closure.

2. Use a manual loop that indexes into a buffer pointer (C-style).

#1 is the obvious choice unless invoking the closure is too slow when a lot of 
bytes need to be examined. #2 would use “enumerateBytes” to get a series of 
buffer pointers efficiently, but there is no boundary code to be tested, since 
you’re only examining 1 byte at a time.
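
As a rough illustration of option #1, here is a minimal sketch of a single-pass search for whichever of CR or LF comes first. It uses “firstIndex(where:)”, the current spelling of “index(where:)”. The function name and the two top-level constants are hypothetical stand-ins (the constants for ParsingQueue.Constants.cr and .lf); they are not taken from the code under discussion.

```swift
import Foundation

// Hypothetical stand-ins for ParsingQueue.Constants.cr and .lf.
let cr: UInt8 = 0x0D
let lf: UInt8 = 0x0A

/// Returns the index of the first byte that is either CR or LF,
/// or nil if the data contains neither. Each byte is tested once,
/// so the search stops at whichever of the two values comes first.
func firstLineBreakByte(in data: Data) -> Data.Index? {
    return data.firstIndex(where: { $0 == cr || $0 == lf })
}

// Usage: the CR in "ab\r\ncd" is found without a separate LF search.
let sample = Data("ab\r\ncd".utf8)
let hit = firstLineBreakByte(in: sample)
```

Because the closure tests both values, a missing CR or a missing LF no longer forces a second scan of the whole Data object.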

Once you have the optional indices to the first CR or LF, and you find you need 
to check for a potential CR-LF or CR-CR-LF, you can do that by subscripting 
into the original Data object directly, outside of the search loop.

This approach would eliminate the problematic test case, and (unless I’m 
missing something obvious) have the initial search as its only O(n) 
computation, everything else being O(1), i.e. constant and trivial.
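
The follow-up check can be sketched as below, assuming the index of the first CR or LF is already known and that the separators of interest are CR, LF, CR-LF and CR-CR-LF, as discussed in this thread. The LineSeparator type and the function name are hypothetical, introduced only for illustration. The function subscripts into the Data object and looks at most two bytes ahead, so it does a constant amount of work.

```swift
import Foundation

/// Hypothetical result type: where the separator starts and how many bytes it spans.
struct LineSeparator: Equatable {
    let start: Data.Index
    let length: Int   // 1 for CR or LF alone, 2 for CR-LF, 3 for CR-CR-LF
}

/// Classifies the separator beginning at `index`, which is assumed to be
/// the index of the first CR or LF in `data`.
func separator(in data: Data, at index: Data.Index) -> LineSeparator {
    let cr: UInt8 = 0x0D
    let lf: UInt8 = 0x0A

    // Bounds-checked lookahead: nil when the offset runs past the end.
    func byte(_ offset: Int) -> UInt8? {
        let i = index + offset
        return i < data.endIndex ? data[i] : nil
    }

    guard data[index] == cr else {
        return LineSeparator(start: index, length: 1)      // LF alone
    }
    if byte(1) == lf {
        return LineSeparator(start: index, length: 2)      // CR-LF
    }
    if byte(1) == cr && byte(2) == lf {
        return LineSeparator(start: index, length: 3)      // CR-CR-LF
    }
    return LineSeparator(start: index, length: 1)          // CR alone
}
```

Nothing here depends on how the Data object stores its bytes, so there is no buffer-boundary case to test.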

___

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

Please do not post admin requests or moderator comments to the list.
Contact the moderators at cocoa-dev-admins(at)lists.apple.com

Help/Unsubscribe/Update your Subscription:
https://lists.apple.com/mailman/options/cocoa-dev/archive%40mail-archive.com

This email sent to arch...@mail-archive.com


Re: Data, enumerateBytes: separate blocks?

2017-12-25 Thread Charles Srstka
> On Dec 25, 2017, at 12:23 PM, Daryle Walker wrote:
> 
> Not quite.
> 
> My first versions of this idea, pre-Swift and therefore using NSData with 
> Objective-C, did use the direct search functions that come with the NSData 
> API. There seems to be a detail you missed in my sample code that explains 
> the use of “enumerateBytes”:
> 
> LF-only is also a searched-for separator.
> 
> That means no matter what, I must find the first CR and the first LF. Then I 
> compare their relative positions (and check for another CR if the spacing is 
> right). What happens if whichever byte value is second is gigabytes away from 
> the first? (Or equivalently, only one value is present and there’s gigabytes 
> of trailing data to fail to find the other value.) I would end up wasting the 
> user’s time for a second result I’d never use.

With either Collection or Data, the value that index(of:) returns for 
the second value will be one greater than what it returns for the first value 
in that case, regardless of how the data is stored under the hood.

Charles



Re: Data, enumerateBytes: separate blocks?

2017-12-25 Thread Daryle Walker

> On Dec 24, 2017, at 3:51 PM, Quincey Morris wrote:
> 
> On Dec 24, 2017, at 04:45, Charles Srstka wrote:
>> 
>> you could consider making its interface take generic collections of UInt8
> 
> This would not solve the *general* problem Daryle raised. He’s looking for a 
> way to test the logic of some buffer-boundary-crossing code, which makes 
> sense only if he has multiple buffers, which means he must be using 
> “enumerateBytes”, which is not supported by Collection. If he doesn’t use 
> enumerateBytes, then he doesn’t need anything but Data anyway.
> 
> However, considering what appears to be the *actual* problem (finding the 
> first CR or CR-LF or CR-CR-LF separator in a byte sequence), he could use 
> Data without using enumerateBytes, and still not risk copying the data to a 
> contiguous buffer.
> 
> This solution would use Data’s “index(of:)” to find the first CR, then a 
> combination of advancing the index and subscripting to test for LF in the 
> following 1 or 2 positions.

Not quite.

My first versions of this idea, pre-Swift and therefore using NSData with 
Objective-C, did use the direct search functions that come with the NSData API. 
There seems to be a detail you missed in my sample code that explains the use 
of “enumerateBytes”:

LF-only is also a searched-for separator.

That means no matter what, I must find the first CR and the first LF. Then I 
compare their relative positions (and check for another CR if the spacing is 
right). What happens if whichever byte value is second is gigabytes away from 
the first? (Or equivalently, only one value is present and there’s gigabytes of 
trailing data to fail to find the other value.) I would end up wasting the 
user’s time for a second result I’d never use.

— 
Daryle Walker
Mac, Internet, and Video Game Junkie
darylew AT mac DOT com 
