Ziv, Thanks for the catch with the non-matched groups, I just fixed a bug. :-)
Also, thanks to Benjamin for pointing out an extremely useful blog post in a different thread! I'm not sure exactly if the client-side tests are actually client-side, though... How does the Ur/Web's client-side "VM" work? As it is, the JS code is impenetrable (making it hard to debug). I'll start a new thread for this. Please, let me ponder on your detailed response for a bit. This is a deep rabbit hole. Maybe waiting indefinitely for v2 of typed regex is the best outcome for now. 2017-03-09 21:20 GMT+06:00 Ziv Scully <[email protected]>: > I agree we probably don't want to do fancy things that are only available > the server. However, I think it's still possible to support nice behavior > for alt just by postprocessing on the Ur/Web side. > > Let's say "names" are capture groups from the library user's perspective and > "indices" are capture groups from the FFI's perspective. Right now, each > name corresponds to exactly one index. You could support alt by having each > name correspond to multiple indices, each of which is matched at most once. > > Concrete example: say you have (#X:a)(#Y:b)|(#X:c)(#Z:d)—that is, abusing > notation somewhat, an expression that captures names either X and Y or X and > Z. You could represent this using the record of index lists {X = [1,3], Y = > [2], Z = [4]}. When you get the result list back from the FFI, you have to > check groups 1 and 3 to see if X was matched. If you only ever concatenate > index lists as part of alt, you ensure that you never have two non-null > matches for the same name: given a match, in each name's index list, all but > at most one of the indices is from an alt branch that wasn't taken. > > You could take these ideas further to track statically when a group is > guaranteed to be matched (like X above) or not (like Y or Z above). The > easiest way is probably to have two record parameters for the type—one for > guaranteed match names, another for optional match names. For example > (leaving out folders and disjoitedness constraints): > > star : r ::: Unit -> s ::: Unit -> t r s -> t [] (r + s) > > plus : r ::: Unit -> s ::: Unit -> t r s -> t r s > > capture : r ::: Unit -> s ::: Unit -> nm :: Name -> t r s -> t (r ++ [nm]) > s. > > The left record has the names of guaranteed matches, and the right record > has the names of optional ones. You can see that plus keeps guaranteed > matches guaranteed, star makes all of them optional. The most complicated > type would, of course, be alt. > > alt : r ::: Unit -> r1 ::: Unit -> r2 ::: Unit s1 ::: Unit -> s2 ::: Unit > -> t (r ++ r1) s1 -> t (r ++ r2) s2 > -> t r (r1 ++ r2 ++ s1 ++ s2) > > That is, the only groups that are guaranteed to be matched are those > guaranteed in both of the branches; all others are optional. > > > On Thu, Mar 9, 2017 at 09:17 Artyom Shalkhakov <[email protected]> > wrote: >> >> 2017-03-09 19:39 GMT+06:00 Ziv Scully <[email protected]>: >> > Looks good—glad it worked out! >> > >> > My main question: what happens if the whole string is matched but a >> > particular subgroup isn't? This can happen with alt or star, for >> > instance. I >> > couldn't tell from the types. I would expect a return type for match to >> > be >> > something like `match r (option string)` instead of `option (match r >> > string)`. I don't know the underlying regex library, so perhaps there's >> > a >> > good reason for this. >> >> I think a test is in order; I don't know what happens in that case (do >> you have an example handy?). From JS RegExp object documentation, it >> seems that subgroups are always present (but, indeed, the strings >> might be nulls or something else entirely, this is JS, after all). >> Good catch! >> >> > >> > I also thought it may be useful if alt allowed the same group to be >> > captured >> > on either side, but I realize that takes further fiddling with subgroup >> > indices and post-processing of the results (each group field name would >> > map >> > to a list of indices instead of a single index), so it can wait for >> > version >> > 2 :). >> > >> >> I was a bit puzzled about the type you gave for [alt], it only dawned >> on me later what you meant with it. :-) >> >> I can see two questions here: >> >> 1. say we have an expression like [a(b)*c], if matched against >> [abbbbbbc] in JS, it will only capture the *first* b -- I guess >> Boost.Regex which is used on the server behaves the same (Benjamin, >> could you help out on this?) >> >> 2. say we have an expression like [(a)|(b)], in JS the groups will be >> 1 and 2. PCRE and some other engines support a "group reset" feature, >> so that we can indeed capture both alternatives with one group. This >> is unsupported in JS, so I decided to leave it as is, for now. >> >> > >> > On Tue, Mar 7, 2017 at 02:11 Artyom Shalkhakov >> > <[email protected]> >> > wrote: >> >> >> >> Hello Ziv, >> >> >> >> I've implemented your idea [1], and it works. Looking for feedback. >> >> >> >> 2017-02-20 15:48 GMT+06:00 Ziv Scully <[email protected]>: >> >> > By parametrizing over a type-level record, you can encode the names >> >> > of >> >> > matched subgroups in the type of the regular expression. The >> >> > interface >> >> > probably looks something like: >> >> > >> >> > (* Abstract type of uncompiled regular expressions. *) >> >> > con t :: {Unit} (* names of captured subgroups *) -> Type >> >> > >> >> > val literal : >> >> > string -> t [] >> >> > val any : t [] >> >> > val concat : r1 ::: {Unit} -> r2 ::: {Unit} -> [r1 ~ r2] => >> >> > t r1 -> t r2 -> t (r1 ++ r2) >> >> > val star : r ::: {Unit} -> >> >> > t r -> t r >> >> > val alt : r ::: {Unit} -> s1 ::: {Unit} -> s2 ::: {Unit} -> [r ~ s1] >> >> > => >> >> > [r ~ >> >> > s2] => [s1 ~ s2] => >> >> > t (r ++ s1) -> t (r ++ s2) -> t (r ++ s1 ++ s2) >> >> > val capture : r ::: {Unit} -> >> >> > nm :: Name -> [r ~ [nm]] => t r -> t (r ++ [nm]) >> >> > >> >> > (* Abstract type of compiled regular expressions *) >> >> > con compiled :: {Unit} -> Type >> >> > >> >> > val compile : r ::: {Unit} -> >> >> > t r -> compiled r >> >> > >> >> > Obviously this probably won't exactly work. For instance, if the >> >> > underlying >> >> > library requires explicit names for captured groups, you'll need >> >> > something >> >> > more like >> >> > >> >> > val capture : r ::: {Unit} -> nm :: Name -> [r ~ [nm]] => >> >> > {nm : string} -> t r -> t (r ++ [nm]) >> >> > >> >> > because there's no way to extract a string from a type-level Name. In >> >> > fact, >> >> > this way has another benefit: the internal representation of the type >> >> > can >> >> > just be a string! All the tracking of subgroups can happen at the >> >> > type >> >> > level. You may not even need compile. >> >> > >> >> > This interface assumes each subgroup may appear 0 or 1 times. You >> >> > could >> >> > maybe keep track of the number of matches more precisely by using >> >> > {Type} >> >> > instead of {Unit} and using a type class to implement type-level >> >> > functions. >> >> > >> >> > con t : {Type} (* maps names of captured subgroups to type-level >> >> > representation of number of matched fields *) -> Type >> >> > >> >> > type one >> >> > type oneOrNone >> >> > >> >> > class meet : Type -> Type -> Type >> >> > val meet_one_oneOrNone : weaken one oneOrNone oneOrNone >> >> > val meet_oneOrNone_one : weaken oneOrNone one oneOrNone >> >> > val meet_oneOrNone_oneOrNone : weaken oneOrNone oneOrNone oneOrNone >> >> > >> >> > This makes the types more complicated, but I suspect (though I don't >> >> > know >> >> > for sure) that Ur/Web will be able to infer the extra stuff in any >> >> > concrete >> >> > (that is, monomorphic) use, so the user code will still look pretty. >> >> > Here's >> >> > how some of the functions look with this. The place weaken gets used >> >> > is >> >> > alt. >> >> > >> >> > val star : r ::: {Type} -> >> >> > t r -> t (map (fn _ => oneOrNo) r) >> >> > >> >> > val alt : r ::: {Type * Type * Type} -> s1 ::: {Type} -> s2 ::: >> >> > {Type} >> >> > -> >> >> > [map fst3 r ~ s1] => [map snd3 r ~ s2] => [s1 ~ s2] => >> >> > $(map (fn (t1, t2, t3) => weaken t1 t2 t3) r) >> >> > t (map fst3 r ++ s1) -> t (map snd3 r ++ s2) >> >> > -> t (map thd3 r ++ map (fn _ => oneOrNo) (s1 ++ s2)) >> >> > >> >> > You could extend the same sort of idea to types like oneOrMore and >> >> > noneOrMore—useful if the underlying library can return a list of >> >> > matches >> >> > inside a + or *—by adding more cases to weaken. >> >> > >> >> > Here's a Haskell library that takes basically this approach: >> >> > https://github.com/sweirich/dth/tree/master/popl17/src >> >> > >> >> > Hope this helps! >> >> > Ziv >> >> > >> >> > >> >> > On Mon, Feb 20, 2017 at 12:00 AM, Artyom Shalkhakov >> >> > <[email protected]> wrote: >> >> >> >> >> >> 2017-02-20 10:47 GMT+06:00 <[email protected]>: >> >> >>> >> >> >>> Maybe just define another attribute in basis.urs? >> >> >>> >> >> >>> It would be much interesting to define a DSEL though. I have not >> >> >>> seen >> >> >>> any >> >> >>> attempts to create a DSEL for regular expressions. >> >> >>> >> >> >> >> >> >> Indeed! And it seems like it would be a nice exercise as well having >> >> >> some >> >> >> practical uses to it. >> >> >> >> >> >> I envision something along the lines of: >> >> >> >> >> >> datatype RegExp = ... (* constructors *) >> >> >> >> >> >> type t (* abstract, not exported outside the module *) >> >> >> val compile : RegExp -> t (* the trusted function *) >> >> >> val matches : string -> t -> bool (* check if passed-in string >> >> >> matches >> >> >> the >> >> >> expression *) >> >> >> >> >> >> And the implementation might use something unsafe (like a string, or >> >> >> an >> >> >> actual regexp object in JS). Then we could offload the >> >> >> interpretation >> >> >> of >> >> >> regexps to an off-the-shelf library. The [compile] function would >> >> >> print >> >> >> and >> >> >> escape the AST to a regexp string, and the Ur/Web compiler might >> >> >> actually >> >> >> optimize all of this away. >> >> >> >> >> >> If we are to take care of matching subgroups, that would require a >> >> >> more >> >> >> complicated type. I guess that should be doable as well, but I don't >> >> >> see >> >> >> immediately how to handle this. >> >> >> >> >> >>> >> >> >>> >> >> >>> >> >> >>> 20. Feb 2017 16:28 by [email protected]: >> >> >>> >> >> >>> >> >> >>> Hello all, >> >> >>> >> >> >>> 'm doing form validation stuff, following examples on MDN. [1] >> >> >>> >> >> >>> It seems like Ur/Web doesn't handle regular expressions? >> >> >>> >> >> >>> I guess it should not be too diffcult to define a DSEL in Ur/Web >> >> >>> (just >> >> >>> AST constructors and a few trusted functions: e.g. [compile : AST >> >> >>> -> >> >> >>> regexp], where [regexp] is an abstract type, or maybe just a string >> >> >>> :-)) and >> >> >>> have it work at least in the browser. >> >> >>> >> >> >>> Could somebody point me in the right direction here? >> >> >>> >> >> >>> -- >> >> >>> Cheers, >> >> >>> Artyom Shalkhakov >> >> >>> >> >> >>> [1] >> >> >>> >> >> >>> >> >> >>> https://developer.mozilla.org/en-US/docs/Web/Guide/HTML/HTML5/Constraint_validation >> >> >>> >> >> >> >> >> >> >> >> >> >> >> >> -- >> >> >> Cheers, >> >> >> Artyom Shalkhakov >> >> >> >> >> >> _______________________________________________ >> >> >> Ur mailing list >> >> >> [email protected] >> >> >> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur >> >> >> >> >> > >> >> > >> >> > _______________________________________________ >> >> > Ur mailing list >> >> > [email protected] >> >> > http://www.impredicative.com/cgi-bin/mailman/listinfo/ur >> >> > >> >> >> >> -- >> >> Cheers, >> >> Artyom Shalkhakov >> >> >> >> [1] https://github.com/bbarenblat/urweb-regex/pull/1 >> >> >> >> _______________________________________________ >> >> Ur mailing list >> >> [email protected] >> >> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur >> > >> > >> > _______________________________________________ >> > Ur mailing list >> > [email protected] >> > http://www.impredicative.com/cgi-bin/mailman/listinfo/ur >> > >> >> >> >> -- >> Cheers, >> Artyom Shalkhakov >> >> _______________________________________________ >> Ur mailing list >> [email protected] >> http://www.impredicative.com/cgi-bin/mailman/listinfo/ur > > > _______________________________________________ > Ur mailing list > [email protected] > http://www.impredicative.com/cgi-bin/mailman/listinfo/ur > -- Cheers, Artyom Shalkhakov _______________________________________________ Ur mailing list [email protected] http://www.impredicative.com/cgi-bin/mailman/listinfo/ur
