> On Mar 20, 2019, at 8:19 PM, Joel Dueck <[email protected]> wrote:
>
> The regular-expression version is slower, much more so, it seems, the larger
> the first txexpr that you give it.
>
> I am sure both functions could be made much faster. In particular the regex
> version matches all the words in each string, there is probably a better
> pattern that would stop after the first N words.

Your regexp-based function is slower because `regexp-match*` eagerly finds all
the matches, whether you need them or not. The port-based function is faster
because it works incrementally.

But you can have both regexps and incremental processing by passing an input
port as the argument to `regexp-match`. In this example, the pattern is matched
incrementally, and if one txexpr doesn't supply enough words, we move on to the
next txexpr and keep going.
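
    (require racket/string)

    ;; Assumes `tx-strs` (not defined here) returns a txexpr's text as a string.
    (define (first-words-regex2 txs n)
      (define words
        (let loop ([txs txs] [n n])
          (cond
            [(null? txs) '()] ; ran out of txexprs before finding n words
            [else
             (define ip (open-input-string (tx-strs (car txs))))
             ;; Pull at most n words from the port, stopping at the first failed match.
             (define words (for*/list ([i (in-range n)]
                                       [bs (in-value (regexp-match #px"\\w+" ip))]
                                       #:break (not bs))
                             (bytes->string/utf-8 (car bs))))
             (if (= (length words) n)
                 words
                 (append words (loop (cdr txs) (- n (length words)))))])))
      (string-join words " "))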
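
To see why the port makes this incremental, here is a minimal sketch of how
`regexp-match` behaves when given an input port. The `next-word` helper is
hypothetical, made up just for this illustration. Each successful match
consumes the port's input only through the end of that match, so the next call
resumes where the previous one stopped.

```racket
#lang racket/base

;; Hypothetical helper (illustration only): read the next word from a port.
;; On a port, regexp-match returns byte strings, or #f if nothing matches.
(define (next-word ip)
  (define m (regexp-match #px"\\w+" ip))
  (and m (bytes->string/utf-8 (car m))))

(define ip (open-input-string "one two three four"))
(next-word ip) ; "one"
(next-word ip) ; "two"
(read-line ip) ; " three four" (the unmatched remainder is still in the port)
```

Once two words have been read, the rest of the string is never scanned, which
is the work that `regexp-match*` can't avoid.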