Re: [Pharo-users] how to use indexes to copy a substring out of a string

2019-01-01 Thread Richard O'Keefe via Pharo-users
--- Begin Message ---
Let us approach this as simply as we can.
(1) There are two numbers 1 <= i <= n-3, i+2 <= j <= n-1
such that (s at: i) = (s at: j) and (s at: i+1) = (s at: j+1)

Brute force will get us where we want to go:
String
  hasTwoCharacterNonoverlappingRepeat
1 to: self size - 3 do: [:i |
  i + 2 to: self size - 1 do: [:j |
((self at: i  ) = (self at: j  ) and:
[(self at: i+1) = (self at: j+1)])
  ifTrue: [^true]]].
^false

This is pretty much how you would do it in C or Java.
This takes O(n**2) time where n is the size of the string.

If you want to be *really* clever, you can do something with
suffix arrays, but if you just want to be a little bit clever,
you can sort the two-character substrings and their positions
by just sorting the positions.
Make an array a of the integers 1..n-1.
Sort it by viewing each i in the array as the triple
(s[i],s[i+1],i).  You don't *make* the triples, just do the
comparisons as if they existed.
Now a linear scan of the array will find a match if there is one.
The whole process is O(n.lg n).  It's only worth doing when n is
large.

(2) Finding xyx.  You did not say whether the middle letter can be the same
as the outer ones or must be different.  I'm going to suppose that it
does not matter.

hasXYX
  3 to: self size do: [:i |
(self at: i-2) = (self at: i) ifTrue: [^true]].
  ^false

Again, this is just what you would do in Java or C:
public static boolean hasXYX(String s) {
final int n = s.length();
for (int i = 2; i < n; i++)
if (s.charAt(i-2) == s.charAt(i)) return true;
}
return false;
}

For what it's worth, in my system this scans a Scrabble dictionary
with 146522 words (1,220,068 bytes), finding 2779 "nice" words,
in 0.18 seconds, so there's not a lot of point in improving it.

In both cases, DON'T copy.  DON'T make substrings.
Just state the problem simply and convert it to obvious code.
Simple boring code is GOOD.


On Sun, 30 Dec 2018 at 05:29, Roelof Wobben  wrote:

> Hello,
>
> Still working on AdventOfCode
>
> Im struggeling to see how this can be solved.
>
> Now, a nice string is one with all of the following properties:
>
>- It contains a pair of any two letters that appears at least twice in
>the string without overlapping, like xyxy (xy) or aabcdefgaa (aa), but
>not like aaa (aa, but it overlaps).
>- It contains at least one letter which repeats with exactly one
>letter between them, like xyx, abcdefeghi (efe), or even aaa.
>
> my game plan was this
>
> 1) group the characters and filter everything out which has a count not
> equal to two
>
> 2) find the indexes of the characters and with the indexes use a copy
> method so I have a substring out of it
>
> 3) do the same with the substring so again 1 and 2
>
> but how can I make 2 work.
>
> Roelof
>
>
>
--- End Message ---


Re: [Pharo-users] how to use indexes to copy a substring out of a string

2018-12-30 Thread Ben Coman
On Sun, 30 Dec 2018 at 17:53, Roelof Wobben via Pharo-users <
pharo-users@lists.pharo.org> wrote:

> Op 30-12-2018 om 02:14 schreef Ben Coman:
>
> 'abcdefg' asArray  overlappingPairsRemainderCollect: [ :a :b :rem | {
> a.b.rem } ]
>
> This was deliberately only half a solution.  The block above "[ :a :b :rem
| { a.b.rem } ]"
is a placeholder only - to show you what data you would have to work with.
You need to adapt it.



> Sorry, I do not see how this can work so I can find the answer
>
> qjhvhtzxzqqjkmpb is nice because is has a pair that appears twice (qj)
>
> so if I try it on the code for overlappingPairsRemainderCollect  I see
> this as answer :
>
> "#($q $j #($h $v $h $t $z $x $z $q $q $j $k $m $p $b))"
>

The line above looks pertinent!! Inside the block you will have...
a ==> $q
b ==> $j
rem ==> #($h $v $h $t $z $x $z $q $q $j $k $m $p $b)



> "#($p $b #())"
>
> How does help me the output in telling me that there are two pairs that
> are there two times
>

You might start with the following block...
   [ :a :b :rem | self halt. rem overlappingPairsCollect: [ :a2 :b2 |
...something... ] ]
and from in the debugger experiment by inspecting the second half of it.

cheers -ben


Re: [Pharo-users] how to use indexes to copy a substring out of a string

2018-12-30 Thread Roelof Wobben via Pharo-users
--- Begin Message ---

  
  
Op 30-12-2018 om 02:14 schreef Ben
  Coman:


  'abcdefg' asArray  overlappingPairsRemainderCollect: [ :a :b :rem
  | { a.b.rem } ]


Sorry, I do not see how this can work so I can find the answer 

qjhvhtzxzqqjkmpb is nice because is has a pair that
appears twice (qj)

so if I try it on the code for overlappingPairsRemainderCollect  I
see this as answer : 

"#($q $j #($h $v $h $t $z $x $z $q $q $j $k $m $p $b))"
"#($j $h #($v $h $t $z $x $z $q $q $j $k $m $p $b))"
"#($h $v #($h $t $z $x $z $q $q $j $k $m $p $b))"
"#($v $h #($t $z $x $z $q $q $j $k $m $p $b))"
"#($h $t #($z $x $z $q $q $j $k $m $p $b))"
"#($t $z #($x $z $q $q $j $k $m $p $b))"
"#($z $x #($z $q $q $j $k $m $p $b))"
"#($x $z #($q $q $j $k $m $p $b))"
"#($z $q #($q $j $k $m $p $b))"
"#($q $q #($j $k $m $p $b))"
"#($q $j #($k $m $p $b))"
"#($j $k #($m $p $b))"
"#($k $m #($p $b))"
"#($m $p #($b))"
"#($p $b #())"

How does help me the output in telling me that there are two pairs
that are there two times 

  


--- End Message ---


Re: [Pharo-users] how to use indexes to copy a substring out of a string

2018-12-30 Thread Roelof Wobben via Pharo-users
--- Begin Message ---

  
  
Thanks Ben, 
  
  I think you almost give me the answer but I have to do some
  testing to see how it really works,
  
  Roelof
  
  
  
  Op 30-12-2018 om 02:14 schreef Ben Coman:


  
  

  

  

  

  




  On Sun, 30 Dec 2018 at 00:29,
Roelof Wobben 
wrote:
  
  
 Hello, 
  
  Still working on AdventOfCode 
  
  I'm struggling to see how this can be
  solved. 
  
  Now, a nice string is one with all of
the following properties:
  
It contains a pair of any two
  letters that appears at least twice in
  the string without overlapping, like xyxy
  (xy) or aabcdefgaa
  (aa), but not like aaa
  (aa, but it overlaps).
  

  
  To exclude overlaps, one approach could
be to subtract each candidate pair from the
string 
  and then check the remainder-string for
matches.
  I'm not sure, but suspect that as the
candidate pair progresses through the
string, only the 
  trailing remainder-string needs matching
since the preceding remainder-string has
already been checked.
  
  
  So a Pharo 7 Spotter search for 'pairs'
finds 28 implementors 
  where
SequenceableCollection>>overlappingPairsCollect:
  looks quite close to what is required...
  
  'abcdefg' asArray
  overlappingPairsCollect: [ :a :b | { a.b }
  ].  "#(
  #($a $b) 
  #($b $c) 
  #($c $d) 
  #($d $e) 
  #($e $f) 
  #($f $g))" 
  
  
  
  Reviewing its implementation...
  
   
  SequenceableCollection>>overlappingPairsCollect:
  aBlock 
	|
  retval |

	retval
  := self species ofSize: self size - 1.
	1
  to: self size - 1
		do:
  [:i | retval at: i put: (aBlock value:
  (self at: i) value: (self at: i + 1)) ].
	^retval
  
  
  
  it can be adapted to provide also a
remainder...
  
   
  SequenceableCollection>>overlappingPairsRemainderCollect:
  aBlock 
	|
  retval |

	retval
  := self species ofSize: self size - 1.
	1
  to: self size - 1
		do:
  [:i | retval at: i put: (aBlock value:
  (self at: i) value: (self at: i + 1)
  value: (self allButFirst: i + 1)) ].
	^retval
  
  
  
  for use like this...
  
  'abcdefg' asArray 
  

Re: [Pharo-users] how to use indexes to copy a substring out of a string

2018-12-29 Thread Ben Coman
On Sun, 30 Dec 2018 at 00:29, Roelof Wobben  wrote:

> Hello,
>
> Still working on AdventOfCode
>
> I'm struggling to see how this can be solved.
>
> Now, a nice string is one with all of the following properties:
>
>- It contains a pair of any two letters that appears at least twice in
>the string without overlapping, like xyxy (xy) or aabcdefgaa (aa), but
>not like aaa (aa, but it overlaps).
>
> To exclude overlaps, one approach could be to subtract each candidate pair
from the string
and then check the remainder-string for matches.
I'm not sure, but suspect that as the candidate pair progresses through the
string, only the
trailing remainder-string needs matching since the preceding
remainder-string has already been checked.

So a Pharo 7 Spotter search for 'pairs' finds 28 implementors
where SequenceableCollection>>overlappingPairsCollect:
looks quite close to what is required...
  'abcdefg' asArray overlappingPairsCollect: [ :a :b | { a.b } ].  "#(
  #($a $b)
  #($b $c)
  #($c $d)
  #($d $e)
  #($e $f)
  #($f $g))"

Reviewing its implementation...
SequenceableCollection>>overlappingPairsCollect: aBlock
| retval |
retval := self species ofSize: self size - 1.
1 to: self size - 1
do: [:i | retval at: i put: (aBlock value: (self at: i) value: (self at: i
+ 1)) ].
^retval

it can be adapted to provide also a remainder...
SequenceableCollection>>overlappingPairsRemainderCollect: aBlock
| retval |
retval := self species ofSize: self size - 1.
1 to: self size - 1
do: [:i | retval at: i put: (aBlock value: (self at: i) value: (self at: i
+ 1) value: (self allButFirst: i + 1)) ].
^retval

for use like this...
  'abcdefg' asArray  overlappingPairsRemainderCollect: [ :a :b :rem | {
a.b.rem } ] "#(
  #($a $b #($c $d $e $f $g))
  #($b $c #($d $e $f $g))
  #($c $d #($e $f $g))
  #($d $e #($f $g))
  #($e $f #($g))
  #($f $g #()))"



>- It contains at least one letter which repeats with exactly one
>letter between them, like xyx, abcdefeghi (efe), or even aaa.
>
> You might adapt "overlappingPairsCollect: aBlock"
into  "overlappingPairsSkip: n collect: aBlock"
but changing  "size -1"  and   "i + 1"
into  "size - n"  and   "i + n"

HTH
cheers -ben

P.S. If  #overlappingPairsSkip:collect:  seemed generically useful to push
into Pharo,
then the following would be redefined to reuse it...
  SequenceableCollection>>overlappingPairsCollect: aBlock
  ^self overlappingPairsSkip: 1 collect: aBlock

The extra layer of function call doesn't add much overhead.

On the other hand #overlappingPairsRemainderCollect:
does lots of extra copying through #allButFirst:
so would not be so good for reuse by  #overlappingPairsCollect:


Re: [Pharo-users] how to use indexes to copy a substring out of a string

2018-12-29 Thread Roelof Wobben

Op 29-12-2018 om 21:05 schreef Hernán Morales Durand via Pharo-users:

Thanks,

I will look at these but I think there are too advanced for Aoc 
challenges but maybe I see a way to solve it more easily





Re: [Pharo-users] how to use indexes to copy a substring out of a string

2018-12-29 Thread Hernán Morales Durand via Pharo-users
--- Begin Message ---
Hi Roelof

You may have a look at my StringExtensions package at
https://github.com/hernanmd/StringExtensions
I wrote several algorithms for working with substrings like:

https://github.com/hernanmd/StringExtensions/blob/master/repository/StringExtensions.package/String.extension/instance/indexesOfMotif..st
https://github.com/hernanmd/StringExtensions/blob/master/repository/StringExtensions.package/String.extension/instance/indicesOfSubstringOverlaps..st

Some of them used in https://github.com/hernanmd/BioSmalltalk where
you can find also algorithms for k-mer counting and clump finding:

https://github.com/hernanmd/BioSmalltalk/blob/master/repository/BioTools.package/BioSequence.class/instance/clumpFindK.length.times..st
https://github.com/hernanmd/BioSmalltalk/blob/master/repository/BioTools.package/BioSequence.class/instance/kmersCount.mismatches..st

for the Bio repo just consider a Sequence like a more advanced String.

Cheers,

Hernán

El sáb., 29 dic. 2018 a las 13:29, Roelof Wobben () escribió:
>
> Hello,
>
> Still working on AdventOfCode
>
> Im struggeling to see how this can be solved.
>
> Now, a nice string is one with all of the following properties:
>
> It contains a pair of any two letters that appears at least twice in the 
> string without overlapping, like xyxy (xy) or aabcdefgaa (aa), but not like 
> aaa (aa, but it overlaps).
> It contains at least one letter which repeats with exactly one letter between 
> them, like xyx, abcdefeghi (efe), or even aaa.
>
> my game plan was this
>
> 1) group the characters and filter everything out which has a count not equal 
> to two
>
> 2) find the indexes of the characters and with the indexes use a copy method 
> so I have a substring out of it
>
> 3) do the same with the substring so again 1 and 2
>
> but how can I make 2 work.
>
> Roelof
>
>

--- End Message ---


[Pharo-users] how to use indexes to copy a substring out of a string

2018-12-29 Thread Roelof Wobben

  
  
Hello, 

Still working on AdventOfCode 

Im struggeling to see how this can be solved. 

Now, a nice string is one with all of the following properties:

  It contains a pair of any two letters that appears at least
twice in the string without overlapping, like xyxy
(xy) or aabcdefgaa (aa),
but not like aaa (aa, but it
overlaps).
  It contains at least one letter which repeats with exactly one
letter between them, like xyx, abcdefeghi
(efe), or even aaa.

my game plan was this 

1) group the characters and filter everything out which has a
  count not equal to two 

2) find the indexes of the characters and with the indexes use a
  copy method so I have a substring out of it 

3) do the same with the substring so again 1 and 2 

but how can I make 2 work. 

Roelof