On Friday, August 19, 2011 03:07 unDEFER wrote:
> On Fri, 19 Aug 2011 06:53:37 +0400, Jonathan M Davis wrote:
> > Hmmm. Such a function isn't entirely a bad idea, but it also makes me a
> > bit nervous. Slicing is efficient. The slice function that you suggest
> > is not. I mean, it's efficient enough for what it's doing, but it's not
> > O(1) like slicing is, so having a slice function could be a bit
> > misleading.
>
> I know that it is not efficient, but that raises the question of why D
> decided not to support 8-bit encodings. Only they make operations like
> this efficient.
>
> > Once drop has been merged in, you'll be able to do this
> >
> >     auto s = takeExactly(drop(str, firstIndex), lastIndex - firstIndex);
> >
> > to get the same effect. It may be worth adding such a function though.
>
> I'm sorry, but it looks like there is no "drop()" function.
> Anyway, thank you. I really don't understand how takeExactly works, but it
> works. For newbies it is really not obvious that std.range works fine with
> UTF-8 strings.

I said "once drop has been merged in, you'll be able to..." It's not in a
release yet. There's a pull request for it (which was actually merged this
morning), so it will be in the next release, but it's not in the current one.

std.range most definitely works with UTF-8 strings. _All_ strings are
considered ranges of dchar. And as ranges, strings of char and wchar are not
considered sliceable or random access, and they have no length property (as
none of that works when multiple elements in the array make up a single
element in the range).

std.range.take creates a range with up to n elements of the range that it's
given. It's not the same type as the original range, since it's lazy and takes
elements from the original range only as you iterate it (it yields fewer than
n elements if the range holds fewer than n; otherwise it yields n elements).

std.range.takeExactly takes exactly n elements from the range, and if the
range defines a length property, then it returns the exact same type. I was
thinking that it managed to return the exact same type for strings as well, in
spite of the fact that they have no length property, but it does not appear
that it does. So, if you need the type to be string specifically, as opposed
to a generic range of dchar, then takeExactly isn't going to work. You could
call std.array.array on it to get a string again, but that's creating a new
string, which obviously isn't as efficient.

I would point out though that what's generally done when someone needs random
access to a string is to use dstring. So, if you're really looking to take
slices out of the middle of a string like that, it's better to just use
dstring. It _is_ sliceable and has a length property, because each element in
an array of dchar is a dchar, unlike arrays of char and wchar, where multiple
elements may be required to make a dchar.

> > Certainly
> >
> >     auto s = slice(firstIndex, lastIndex);
> >
> > is cleaner.
> > If we add it though, then we should probably give it a
> > different name. Maybe sliceByElementType? That does seem a bit long
> > though, if accurate.

A name like subString would make sense if we restricted the function to
strings, but if we added it, it would be useful for any range which didn't
define a length property, so we wouldn't be making it string-specific, and
then subString wouldn't make any sense as a function name. Though, come to
think of it, for any type of range other than an array of char or wchar, such
a function would not be able to return the original type, so its value is
certainly less in the general case.

Regardless, given the inefficiencies involved, I think that we should be
discouraging taking random slices of strings or wstrings. There's no reason to
make it so that you can't do it, but including a function in Phobos to do it
makes it overly easy IMHO. Someone who needs to be taking slices from the
middle of strings like that really should be using dstrings in most cases. If
it's a bit ugly to slice the middle of a string, that's probably a good thing.

As Sean pointed out, std.utf can give you the index into the string that you
need. The function for this direction is std.utf.toUTFindex (which should
probably be renamed to toUTFIndex to be properly camelcased, but I don't know
if we'll fix that or not): it takes a code point index and returns the
matching code unit index, while toUCSindex goes the other way.

    auto firstIndex = str.toUTFindex(7);
    auto lastIndex = firstIndex + str[firstIndex .. $].toUTFindex(8);
    auto slice = str[firstIndex .. lastIndex];

should give you the equivalent of str[7 .. 15] if str were a dstring. Note
that the second call returns an offset relative to firstIndex, so firstIndex
has to be added back in. You could also do it as

    auto slice = str[str.toUTFindex(7) .. str.toUTFindex(15)];

which would be clearer, but it would also be less efficient, since the first
7 code points get walked twice.

So, we _might_ add a slicing function to Phobos, but I'm skeptical of the
wisdom of making it that easy to slice a string or wstring like that given how
inefficient it is.
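To make the comparison concrete, here is a rough sketch of the three
approaches discussed above: translating code point indices into code unit
indices with std.utf, slicing a dstring, and the lazy drop/takeExactly
combination. sliceByCodePoint is a hypothetical name used only for
illustration (nothing like it is in Phobos), the sample string is made up, and
the sketch assumes the indices are within the string and that a Phobos with
drop is being used.

```d
import std.algorithm.comparison : equal;
import std.array : array;
import std.conv : to;
import std.range : drop, takeExactly;
import std.utf : toUTFindex;

// Hypothetical helper, not part of Phobos: returns code points
// [first, last) of a UTF-8 string as a slice of the original.
// Cost is linear in `last`, since the code units must be walked.
string sliceByCodePoint(string str, size_t first, size_t last)
{
    assert(first <= last);
    // Code unit offset at which code point number `first` starts.
    immutable begin = str.toUTFindex(first);
    // Resume from `begin` so the prefix is not walked a second time.
    immutable end = begin + str[begin .. $].toUTFindex(last - first);
    return str[begin .. end];
}

void main()
{
    string s = "Привет, мир! Hello";

    // Narrow string: translate code point indices to code unit indices.
    assert(sliceByCodePoint(s, 8, 11) == "мир");

    // dstring: one element per code point, so plain slicing is O(1).
    dstring d = s.to!dstring;
    assert(d[8 .. 11] == "мир"d);

    // Range version: lazy, but the result is not a string.
    auto r = s.drop(8).takeExactly(3);
    static assert(!is(typeof(r) == string));
    assert(equal(r, "мир"));

    // array() copies the elements out as dchar[]; convert if a
    // string is needed. Either way this allocates.
    string copy = r.array.to!string;
    assert(copy == "мир");
}
```

Only the first and second variants hand back a slice without allocating, and
only the dstring one does so in constant time, which is the trade-off being
argued here.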

std.utf already makes it possible in as efficient a manner as is possible -
just not in as concise a way - and if you're really taking slices out of the
middle of a string, you really should be doing it with dstrings. It's far more
efficient that way.

- Jonathan M Davis
