Errata (this is a keyboarding problem which I didn't catch) "fixed with" should have been "fixed width"
(I've not yet taken time to proofread my whole post, I wish I had more time...). -- Raul On Tue, Jun 4, 2013 at 2:32 PM, Raul Miller <[email protected]> wrote: > Part 1. O(n)? really? > > My understanding of "This problem can be solved in linear time" is > dubious. I feel that big-O calculations are often insufficiently > rigorous. > > For example, consider some code which loops over a string, counting > how many times a particular letter appears in that string. It's common > to claim that this can be solved in linear time. But typical > understandings of the meaning of this problem make it O(n*log n) or > O(1). > > The information produced by this algorithm requires log n bits to > represent. And, there are n operations on the counter which > accumulates those values. So, worse case, the limit on time to perform > the calculations is proportional to n*log n. > > On the other hand, if we use fixed with numbers to represent the count > produced by the algorithm, there's a constant bound on the time needed > to perform the calculations, so it's O(1). And there's something > similar happening with the index into the string which we use to > select the character we are inspecting. > > But if we define our hardware "properly" we can do this in linear > time. To do this, we define both the input and the output as "tape". > The "counting" operation consists of advancing the output tape. Now we > have linear time to scan the string, and linear time to build the > count. But we also have linear time to consume the count - there's no > motivation here for this algorithm to exist independent of its > consumer. > > Anyways, if this is what is meant by "linear time" it's not > particularly relevant to modern machines. (Though the "tape model" > still does have some relevance - in some sense it approximates the > time needed for random access given the size of the data set. We have > architectures with registers, cache lines, L1 and L2 cache, main > memory, mass storage, and cloud storage - the bigger the data set the > longer it takes to randomly access data, and "tape" in some sense > approximates this concept.) > > Recap: I am dubious about the rigor of this "linear time" claim - I > think a rigorous exposition of this claim would have to spell out > assumptions which require a machine with different time > characteristics from what the typical programmer would expect. (But I > still would be interested in reading about the algorithm.) > > > Part 2. Counting Squares > > Problems like this, where we have simultaneously identifying unknown > subsequences and the sequential relationship between them, are outside > my normal realm of activity. So there are probably better ways of > doing this. > > For example, everything in the for loop, where I'm appending to r, > conceptually can be a function F with the for loop replaced with > ; ({."1 lumps) <@F/. lumps > > Also "lump" and "lumps" are particularly non-descriptive names. > There's probably names in the literature on this subject which would > be meaningful. I should probably be defining helper functions, here, > rather than bundling everything into one big pile of code. > > Still, this does find the squares, and I've already spent more time > today thinking about this issue than I had planned on... > > squarest=:3 :0 > lumps=. /:~ ; y <@(}: ,.~ 1 -~ 2 -~/\ ])/. i.#y > r=. i.0 0 > for_box. ({."1 lumps) </. lumps do. > lump=. >box > len=. (<0 0) { lump > keep=. (len#1) E. 0,~2 -~/\ 1 {"1 lump > r=. r,keep#lump > end. > ~.({:"1 r) {&y@(+ i.)&.> 2*1+{."1 r > ) > > FYI, > > -- > Raul > > > On Tue, Jun 4, 2013 at 5:52 AM, Mike Müller <[email protected]> wrote: >> I asked about this problem in #jsoftware on Freenode, where it was mentioned >> to me that this might be of broader interest, >> so I will repeat my question here and hope to get some other interested in >> this topic. >> >> I am interested in finding squares in strings. A square is a string of the >> form uu, for some string u. >> For instance, 'bonbon' is a square. My first problem would be to check if a >> string is square-free, >> which means, that no factor (substring) of the string is a square. >> For example, 'square' is square-free, but 'square-free' is not (as the >> square 'ee' is a factor thereof) >> >> As I'm just learning J, I would be interested in "nice" ways to do this in >> J, not necessarily the fastest ones. >> This problem can be solved in linear time, see for instance [1], but I'm >> more interested in how to do it "the J way". >> As a starter, I can check if a string s is a square like that: 3 <: +/ s E. >> s,s >> >> Another problem would be to calculate the number of distinct squares in a >> string. >> I think this is also best explained by an example. Let s =. 'aababaabb'. >> Then the distinct squares appearing in s are 'aa, 'bb', 'abab', and 'baba'. >> So the number of distinct squares is 4. Note that the word _distinct_ is >> important here, as 'aa' occurs twice, >> but we only count it once. >> >> There's also some theoretical motivation for this second problem, as it is >> shown that this number is at most 2n, >> for any string of length n [2]. It is further conjectured that it is in fact >> at most n, but this is an open problem, >> and the best bound known so far is 2n - log(n) [3]. >> >> I would be interested to see how these problems could be solved nicely in J, >> and hope to learn more of this fascinating language. Thank you! >> >> Mike >> >> >> [1] M. G. Main and R. J. Lorentz. Linear time recognition of squarefree >> strings. In Combinatorial >> Algorithms on Words, volume F12 of NATO ASI Series, pages 271–278. Springer, >> 1985. >> >> [2]Fraenkel, A. S., & Simpson, J. (1998). How many squares can a string >> contain?. >> Journal of Combinatorial Theory, Series A, 82(1), 112-120. >> >> [3]Ilie, L. (2007). A note on the number of squares in a word. >> Theoretical Computer Science, 380(3), 373-376. >> >> >> >> >> >> ---------------------------------------------------------------------- >> For information about J forums see http://www.jsoftware.com/forums.htm ---------------------------------------------------------------------- For information about J forums see http://www.jsoftware.com/forums.htm
