How to print all expressions that match a regular expression

2010-02-06 Thread hzh...@gmail.com
Hi,

I am a fresh man with python. I know there is regular expressions in
Python. What I need is that given a particular regular expression,
output all the matches. For example, given “[1|2|3]{2}” as the regular
expression, the program should output all 9 matches, i.e., "11 12 13
21 22 23 31 32 33".

Is there any well-written routine in Python or third-party program to
do this? If there isn't, could somebody make some suggestions on how
to write it myself?

Thanks.

Zhuo
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to print all expressions that match a regular expression

2010-02-06 Thread hzh...@gmail.com
Thanks for your reply.
So there isn't such a routine just because some of the regular
expressions cannot be enumerated. However, some of them can be
enumerated. I guess I have to write a function myself.

Zhuo

On Feb 6, 5:23 pm, Roy Smith  wrote:
> In article
> ,
>
>
>
>
>
>  "hzh...@gmail.com"  wrote:
> > Hi,
>
> > I am a fresh man with python. I know there is regular expressions in
> > Python. What I need is that given a particular regular expression,
> > output all the matches. For example, given ³[1|2|3]{2}² as the regular
> > expression, the program should output all 9 matches, i.e., "11 12 13
> > 21 22 23 31 32 33".
>
> > Is there any well-written routine in Python or third-party program to
> > do this? If there isn't, could somebody make some suggestions on how
> > to write it myself?
>
> > Thanks.
>
> > Zhuo
>
> Please enumerate all the strings which match ".*".  Use additional sheets
> of paper if needed.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to print all expressions that match a regular expression

2010-02-06 Thread hzh...@gmail.com

>
> So it seems we both misunderstood the problem.
>
> I didn't read the top level article until now, and reading it, I can't make
> sense of it.
>

Seems that you should read the whole thing before making a post, or
else you cannot know what we are talking about.
Steven doesn't misunderstand me. We are talking about what I need, and
he tries to help.



>
> > "Given the function hashlib.sha256, enumerate all the possible inputs
> > that give the hexadecimal result
> > 0a2591aaf3340ad92faecbc5908e74d04b51ee5d2deee78f089f1607570e2e91."
>
> I tried some "parrot" variants but no dice. :-(
>
> [snip]
>

This is a hash collision problem. Nobody has proved that SHA-256 is
collision free, even not in the random oracle model, because people
always suppose that a random oracle exists, and make hash function its
substitution. That means it may be broken someday. And any provable
security based on random oracle model is not secure.


> > I'm suggesting that, in general, there's no way to tell in advance which
> > regexes will be easy and which will be hard, and even when they are easy,
> > the enumeration will often be infinite.

It is hard to tell in advance. However, we can add some timing limit
or counting limit, to make it an algorithm, which can halt. For
example, whenever the program outputs more than 100 expressions
that match the input regex, we can halt because that exceeds our
limit. But surely this is not efficient because of the post-decision.



>
> Essentially, any regexp that includes '+' or '*' (directly or via e.g. 
> notation
> that denotes "digit sequence") yields an infinite number of strings.

Infinity is really relative, not absolute. It is relative to the
computing speed. For example, the regex '^[0|1]{2048}$' is rather
simple and doesn't contain '+' or '$', but trying to output all
expressions that match it has a complexity of 2^2048. If we can do
that, then we can break RSA-2048.
We must face the reality .

Zhuo

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to print all expressions that match a regular expression

2010-02-07 Thread hzh...@gmail.com

>
> And I really don't see how simple enumeration of range(2^2048) breaks
> RSA-2048, since that problem requires you to find two factors which,
> when multiplied together, give that specific value.
>

I can tell you why is that. RSA-2048 has a composite of length less
than 2^2048, which is a product of two large primes. So one of its
factors cannot exceed 2^2047, and we can treat the multiplication as a
computation with constant complexity. So the time complexity of
enumerating 2^2048 strings is the same with factoring a composite with
length 2^2048 which is the product of two primes.

And obviously, whenever we successfully factor the composite, we can
calculate the Euler function of it. So that given any public key
(n,e), calculating the private key (n,d) is easy.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to print all expressions that match a regular expression

2010-02-07 Thread hzh...@gmail.com
That is a method called brute force. According to my computation,
2^2048=
32317006071311007300714876688669951960444102669715484032130345427524655138867890
89319720141152291346368871796092189801949411955915049092109508815238644828312063
08773673009960917501977503896521067960576383840675682767922186426197561618380943
3847617047058164585203630504288757589154106580860755239912393038552191489668
34242068497478656456949485617603532632205807780565933102619270846031415025859286
41771167259436037184618573575983511523016459044036976132332872312271256847108202
09725157101726931323469678542580656697935045997268352998638215525166389437335543
602135433229604645318478604952148193555853611059596230656L

which is a very large number.

There are some other algorithms for factoring integers, including
Generalized number field sieve. And in quantum computing, there is an
algorithm called Shor, which is claimed to be a polynomial algorithm
if run under quantum computers. But seems that kind of computers
haven't been successfully built, or else RSA and many other security
mechanisms based on computation complexity cannot be used any longer.

What I need in my application is just to list all expressions that
match a particular regex, which I believe will be more efficient to
deal with if there is a general function for this purpose.
Unfortunately there is not such a function, so I will write my own
function to deal with my particular regex, which can be enumerated.

Sincerely,

Zhuo

On Feb 7, 10:38 am, Steve Holden  wrote:
> hzh...@gmail.com wrote:
> >> And I really don't see how simple enumeration of range(2^2048) breaks
> >> RSA-2048, since that problem requires you to find two factors which,
> >> when multiplied together, give that specific value.
>
> > I can tell you why is that. RSA-2048 has a composite of length less
> > than 2^2048, which is a product of two large primes. So one of its
> > factors cannot exceed 2^2047, and we can treat the multiplication as a
> > computation with constant complexity. So the time complexity of
> > enumerating 2^2048 strings is the same with factoring a composite with
> > length 2^2048 which is the product of two primes.
>
> > And obviously, whenever we successfully factor the composite, we can
> > calculate the Euler function of it. So that given any public key
> > (n,e), calculating the private key (n,d) is easy.
>
> So all I have to do to break RSA is to count to 2^2048?
>
> regards
>  Steve
> --
> Steve Holden           +1 571 484 6266   +1 800 494 3119
> PyCon is coming! Atlanta, Feb 2010  http://us.pycon.org/
> Holden Web LLC                http://www.holdenweb.com/
> UPCOMING EVENTS:        http://holdenweb.eventbrite.com/

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to print all expressions that match a regular expression

2010-02-07 Thread hzh...@gmail.com

>
> Please check out this example on the pyparsing wiki, 
> invRegex.py:http://pyparsing.wikispaces.com/file/view/invRegex.py.  This code
> implements a generator that returns successive matching strings for
> the given regex.  Running it, I see that you actually have a typo in
> your example.
>
> >>> print list(invert("[1|2|3]{2}"))
>
> ['11', '1|', '12', '13', '|1', '||', '|2', '|3', '21', '2|', '22',
> '23', '31', '3|', '32', '33']
>
> I think you meant either "[123]{2}" or "(1|2|3){2}".
>
> >>> print list(invert("[123]{2}"))
>
> ['11', '12', '13', '21', '22', '23', '31', '32', '33']
>
> >>> print list(invert("(1|2|3){2}"))
>
> ['11', '12', '13', '21', '22', '23', '31', '32', '33']
>
> Of course, as other posters have pointed out, this inverter does not
> accept regexen with unbounded multiple characters '+' or '*', but '?'
> and "{min,max}" notation will work.  Even '.' is supported, although
> this can generate a large number of return values.
>
> Of course, you'll also have to install pyparsing to get this to work.
>
> -- Paul


Hi Paul,

Thanks very much. This is exactly what I need now. I will check this
function.

Zhuo

-- 
http://mail.python.org/mailman/listinfo/python-list