Errata (this is a keyboarding problem which I didn't catch)

"fixed with" should have been "fixed width"

(I've not yet taken time to proofread my whole post, I wish I had more time...).

-- 
Raul

On Tue, Jun 4, 2013 at 2:32 PM, Raul Miller <[email protected]> wrote:
> Part 1. O(n)? really?
>
> My understanding of "This problem can be solved in linear time" is
> dubious. I feel that big-O calculations are often insufficiently
> rigorous.
>
> For example, consider some code which loops over a string, counting
> how many times a particular letter appears in that string. It's common
> to claim that this can be solved in linear time. But typical
> understandings of the meaning of this problem make it O(n*log n) or
> O(1).
>
> The information produced by this algorithm requires log n bits to
> represent. And, there are n operations on the counter which
> accumulates those values. So, worse case, the limit on time to perform
> the calculations is proportional to n*log n.
>
> On the other hand, if we use fixed with numbers to represent the count
> produced by the algorithm, there's a constant bound on the time needed
> to perform the calculations, so it's O(1). And there's something
> similar happening with the index into the string which we use to
> select the character we are inspecting.
>
> But if we define our hardware "properly" we can do this in linear
> time. To do this, we define both the input and the output as "tape".
> The "counting" operation consists of advancing the output tape. Now we
> have linear time to scan the string, and linear time to build the
> count. But we also have linear time to consume the count - there's no
> motivation here for this algorithm to exist independent of its
> consumer.
>
> Anyways, if this is what is meant by "linear time" it's not
> particularly relevant to modern machines. (Though the "tape model"
> still does have some relevance - in some sense it approximates the
> time needed for random access given the size of the data set. We have
> architectures with registers, cache lines, L1 and L2 cache, main
> memory, mass storage, and cloud storage - the bigger the data set the
> longer it takes to randomly access data, and "tape" in some sense
> approximates this concept.)
>
> Recap: I am dubious about the rigor of this "linear time" claim - I
> think a rigorous exposition of this claim would have to spell out
> assumptions which require a machine with different time
> characteristics from what the typical programmer would expect. (But I
> still would be interested in reading about the algorithm.)
>
>
> Part 2. Counting Squares
>
> Problems like this, where we have simultaneously identifying unknown
> subsequences and the sequential relationship between them, are outside
> my normal realm of activity. So there are probably better ways of
> doing this.
>
> For example, everything in the for loop, where I'm appending to r,
> conceptually can be a function F with the for loop replaced with
>   ; ({."1 lumps) <@F/. lumps
>
> Also "lump" and "lumps" are particularly non-descriptive names.
> There's probably names in the literature on this subject which would
> be meaningful. I should probably be defining helper functions, here,
> rather than bundling everything into one big pile of code.
>
> Still, this does find the squares, and I've already spent more time
> today thinking about this issue than I had planned on...
>
> squarest=:3 :0
>   lumps=. /:~ ; y <@(}: ,.~ 1 -~ 2 -~/\ ])/. i.#y
>   r=. i.0 0
>   for_box.   ({."1 lumps) </. lumps do.
>     lump=. >box
>     len=. (<0 0) { lump
>     keep=. (len#1) E. 0,~2 -~/\ 1 {"1 lump
>     r=. r,keep#lump
>   end.
>   ~.({:"1 r) {&y@(+ i.)&.> 2*1+{."1 r
> )
>
> FYI,
>
> --
> Raul
>
>
> On Tue, Jun 4, 2013 at 5:52 AM, Mike Müller <[email protected]> wrote:
>> I asked about this problem in #jsoftware on Freenode, where it was mentioned 
>> to me that this might be of broader interest,
>> so I will repeat my question here and hope to get some other interested in 
>> this topic.
>>
>> I am interested in finding squares in strings. A square is a string of the 
>> form uu, for some string u.
>> For instance, 'bonbon' is a square. My first problem would be to check if a 
>> string is square-free,
>> which means, that no factor (substring) of the string is a square.
>> For example, 'square' is square-free, but 'square-free' is not (as the 
>> square 'ee' is a factor thereof)
>>
>> As I'm just learning J, I would be interested in "nice" ways to do this in 
>> J, not necessarily the fastest ones.
>> This problem can be solved in linear time, see for instance [1], but I'm 
>> more interested in how to do it "the J way".
>> As a starter, I can check if a string s is a square like that: 3 <: +/ s E. 
>> s,s
>>
>> Another problem would be to calculate the number of distinct squares in a 
>> string.
>> I think this is also best explained by an example. Let s =. 'aababaabb'.
>> Then the distinct squares appearing in s are 'aa, 'bb', 'abab', and 'baba'.
>> So the number of distinct squares is 4. Note that the word _distinct_ is 
>> important here, as 'aa' occurs twice,
>> but we only count it once.
>>
>> There's also some theoretical motivation for this second problem, as it is 
>> shown that this number is at most 2n,
>> for any string of length n [2]. It is further conjectured that it is in fact 
>> at most n, but this is an open problem,
>> and the best bound known so far is 2n - log(n) [3].
>>
>> I would be interested to see how these problems could be solved nicely in J,
>> and hope to learn more of this fascinating language. Thank you!
>>
>> Mike
>>
>>
>> [1] M. G. Main and R. J. Lorentz. Linear time recognition of squarefree 
>> strings. In Combinatorial
>> Algorithms on Words, volume F12 of NATO ASI Series, pages 271–278. Springer, 
>> 1985.
>>
>> [2]Fraenkel, A. S., & Simpson, J. (1998). How many squares can a string 
>> contain?.
>> Journal of Combinatorial Theory, Series A, 82(1), 112-120.
>>
>> [3]Ilie, L. (2007). A note on the number of squares in a word.
>> Theoretical Computer Science, 380(3), 373-376.
>>
>>
>>
>>
>>
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to