Hi Gerard, problems like these can usually be fixed by using parse's /all refinement and handling white space yourself. I find I almost always do this when I am doing more than simple string splitting.
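
A minimal sketch of the difference, assuming REBOL 2 (the strings and rules here are made up for illustration and are not taken from the script under discussion):

```rebol
REBOL [Title: "parse vs. parse/all (sketch)"]

; Assumes REBOL 2. The strings and rules are hypothetical examples.

; Without /all, parse skips white space between the patterns itself.
probe parse "a b" ["a" "b"]            ; true

; With /all, every character must be matched by the rule,
; so the space between "a" and "b" makes this rule fail.
probe parse/all "a b" ["a" "b"]        ; false

; Once the rule names the white space explicitly, it matches again.
probe parse/all "a b" ["a" " " "b"]    ; true
```

With /all nothing is skipped on your behalf, so the rules decide exactly where spaces, tabs and newlines are allowed.
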
Make a rule that accepts white space and include it at all the places you need it.

...
ws: charset [#" " #"^-" #"^/"]
...
english-day: ["Fri." |"Sat." |"Sun." |"Mon." |"Tue." |"Wed." |"Thu." |"Every day" some ws]
...
parse/all t4 rules2/expr
...

Gerard Cote wrote:
> Hi everybody,
>
> In an effort to increase a friend's interest in REBOL, I recently tried
> to create a simple data-mining app that could analyze theatre information
> about film presentation days and hours. The information comes from the
> French site http://cinemaquebec.com.
>
> For the moment my biggest problem comes from the fact that I don't fully
> understand the way PARSE works when it encounters newline characters.
>
> Let me give a simplified example extracted from the site to illustrate my
> point:
>
> t4: { Fri.: 1:00, 3:00, 7:00 Sat., sun., mon., tue., wed., thu.: 10:00am, 1:00, 3:00, 9:00, 10:00}
>
> Here we have one day (Fri.) followed by a colon (:), followed by 3 times.
> Right after, the cycle repeats with not one but 6 days separated by commas
> (,), again followed by a colon (:) and 5 other times.
>
> I wrote a block of relatively simple rules that work well against this
> simple example.
>
> Here is the result I get from the parse:
>
> >> parse t4 rules2/expr
> which-day: "Fri." 4
> Hour: "1" 1
> Min: "00" 2
> which-hour: " 1:00" 5
> Hour: "3" 1
> Min: "00" 2
> which-hour2: " 3:00" 5
> Hour: "7" 1
> Min: "00" 2
> which-hour2: " 7:00 " 6
> which-days: "Fri.: 1:00, 3:00, 7:00 " 23
> which-day: "Sat." 4
> which-day2: " sun." 5
> which-day2: " mon." 5
> which-day2: " tue." 5
> which-day2: " wed." 5
> which-day2: " thu." 5
> Hour: "10" 2
> Min: "00" 2
> which-hour: " 10:00" 6
> Hour: "1" 1
> Min: "00" 2
> which-hour2: " 1:00" 5
> Hour: "3" 1
> Min: "00" 2
> which-hour2: " 3:00" 5
> Hour: "9" 1
> Min: "00" 2
> which-hour2: " 9:00" 5
> Hour: "10" 2
> Min: "00" 2
> which-hour2: " 10:00" 6
> which-days2: {Sat., sun., mon., tue., wed., thu.: 10:00am, 1:00, 3:00, 9:00, 10:00} 68
> film-hours: { Fri.: 1:00, 3:00, 7:00 Sat., sun., mon., tue., wed., thu.: 10:00am, 1:00, 3:00, 9:00, 10:00}
> ----------------------------------------------------------
> == true
>
> I include my parse rules below for those who want to see how I did it.
> (For convenience I also attach them to this message.)
> You'll notice the many PRINTs that help me follow along with parse.
>
> rules2: make object! [
>     expr: [copy film-hours film-hours-rules
>         (print ["film-hours: " mold film-hours newline
>             "----------------------------------------------------------"
>             newline])
>         to end
>     ]
>
>     film-hours-rules: [copy which-days days-group
>         (print ["which-days: " mold which-days length? which-days])
>         any [copy which-days2 days-group
>             (print ["which-days2: " mold which-days2 length? which-days2])
>         ]
>     ]
>
>     days-group: [copy which-day day
>         (print ["which-day: " mold which-day length? which-day])
>         any ["," copy which-day2 day
>             (print ["which-day2: " mold which-day2 length? which-day2])
>         ]
>         ":"
>         copy which-hour show-hour
>         (print ["which-hour: " mold which-hour length? which-hour])
>         0 1 "am"
>         any ["," copy which-hour2 show-hour
>             (print ["which-hour2: " mold which-hour2 length? which-hour2])
>             0 1 "am"
>         ]
>     ]
>
>     digit: charset [#"0" - #"9"]
>     hour: [digit 0 1 digit]
>     minutes: [digit digit]
>     show-hour: [copy this-hour hour (print ["Hour:" mold this-hour length? this-hour])
>         ":"
>         copy this-min minutes (print ["Min:" mold this-min length? this-min])]
>
>     english-day: ["Fri." |"Sat." |"Sun." |"Mon." |"Tue." |"Wed." |"Thu." |"Every day"]
>     french-day: ["Ven." |"Sam." |"Dim." |"Lun." |"Mar." |"Mer." |"Jeu." |"Tous les jours"]
>     day: ["Fri." |"Sat." |"Sun." |"Mon." |"Tue." |"Wed." |"Thu." |"Every day"]
> ]
>
> Now my problem is this:
>
> When I submit data broken by a newline, in the form of a new t4 as
> follows, my rules no longer work:
>
> t4: { Fri.: 1:00, 3:00, 7:00
> Sat., sun., mon., tue., wed., thu.: 10:00am, 1:00, 3:00, 9:00, 10:00}
>
> The new results look like this:
>
> >> parse t4 rules2/expr
> which-day: "Fri." 4
> Hour: "1" 1
> Min: "00" 2
> which-hour: " 1:00" 5
> Hour: "3" 1
> Min: "00" 2
> which-hour2: " 3:00" 5
> Hour: "7" 1
> Min: "00" 2
> which-hour2: " 7:00" 5
> which-days: "Fri.: 1:00, 3:00, 7:00" 22
> film-hours: " Fri.: 1:00, 3:00, 7:00"
> ----------------------------------------------------------
>
> == true
>
> The second part of the results has been chopped off.
>
> Later, this chopped part gets mixed in with the next film title when I
> extend my rules to pick up the title after the last presentation time.
>
> Any help is appreciated.
>
> Regards, Gerard
>
> -- Binary/unsupported file stripped by Ecartis --
> -- Type: text/x-rebol
> -- File: parse-film-times.r
