Do you need to match partial lines as well?
Generic string matching algorithms are pretty complex, but if you can
restrict the type of tokens then you can simplify the problem.
For example, you could approach this problem by tokenizing patterns.
Your example:
> I want to go home
> 1+3=4
> Hello World
> 1+3=4
> Hello World
> 8+2=10
> Hello World
> Klskd
can be tokenized as ANBNCNBNCNDNCNE. It is then much easier to find
repeating patterns in this tokenized string (it is shorter and only
one line long). And if you want {n}+{n}={n} to always be tokenized
with the same token, then it might even apply to your second
question.
A => I want to go home
B => 1+3=4
C => Hello World
D => 8+2=10
E => Klskd
N => new line character
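
A rough sketch of the idea above, written in Python only to keep it
short (the same thing maps onto Perl hashes). The names tokenize and
repeated_runs, and the regex that turns every run of digits into {n},
are my own illustration and not an existing module. It assumes
line-long tokens, uses small integers instead of the letters A..E,
and counts overlapping occurrences of a run as well.

```python
import re
from collections import Counter

def tokenize(lines, normalize=None):
    """Give each distinct line a small integer token (0, 1, 2, ...)."""
    ids = {}
    tokens = []
    for line in lines:
        key = normalize(line) if normalize else line
        # setdefault assigns the next free id the first time a key is seen
        tokens.append(ids.setdefault(key, len(ids)))
    return tokens

def repeated_runs(tokens, min_len=2):
    """Return {run: count} for every run of consecutive tokens seen more than once."""
    counts = Counter()
    n = len(tokens)
    for length in range(min_len, n):
        for start in range(n - length + 1):
            counts[tuple(tokens[start:start + length])] += 1
    return {run: c for run, c in counts.items() if c > 1}

lines = [
    "I want to go home",
    "1+3=4",
    "Hello World",
    "1+3=4",
    "Hello World",
    "8+2=10",
    "Hello World",
    "Klskd",
]

# Exact lines: only "1+3=4" followed by "Hello World" repeats.
exact_tokens = tokenize(lines)
exact_runs = repeated_runs(exact_tokens)

# Hypothetical normalizer: every run of digits becomes {n},
# so "1+3=4" and "8+2=10" share one token.
def digits_to_n(line):
    return re.sub(r"\d+", "{n}", line)

general_tokens = tokenize(lines, normalize=digits_to_n)
general_runs = repeated_runs(general_tokens)
```

This is brute force (it looks at every run of every length), which is
fine for small files; for large inputs a suffix array or suffix tree
over the token list would be the usual replacement.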
This kind of approach assumes that your tokens are line-long; all it
does is reduce the amount of data that needs to be matched. If you
don't assume that the tokens are line-long then
matching characters
On Wed, Mar 18, 2009 at 1:10 PM, Yossi Itzkovich wrote:
> Hi,
> I am looking for a module that will get a file with text lines, and will find
> repeating patterns.
> Example: given the following file:
> -----
> I want to go home
> 1+3=4
> Hello World
> 1+3=4
> Hello World
> 8+2=10
> Hello World
> Klskd
> -----
> The script should tell me that the sequence : 1+3=4 and Hello World repeat 2
> times. A better script may tell me even that the more general pattern:
> {number}+{number}={number} and Hello World repeat 3 times.
>
> Any suggestion?
>
> Thanks
> Yossi
> _______________________________________________
> Perl mailing list
> [email protected]
> http://perl.org.il/mailman/listinfo/perl
>