Great, simple is good here, Lorenz. I tried $1 but was surprised to get
full strings bound to ?email for non-matching inputs as well. I had to
combine it with a FILTER on the same pattern to get the desired
results. So do I read the replacement semantics correctly now:
backreference $0 is the whole matched substring and $1 is the first
captured group (following XPath and XQuery Functions and Operators 3.1)?
The query now looks like this:
SELECT ?email
WHERE { ?s ?p ?o
FILTER regex(STR(?s),".*/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/.*","i")
BIND (REPLACE(STR(?s),".*/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/.*","$1") AS ?email)
} GROUP BY ?email
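
For reference, the FILTER is needed because REPLACE returns its input
unchanged when the pattern does not match, so every non-matching ?s would
otherwise end up in ?email as a full string. A possible variant of the
query above that states the pattern only once is sketched below. The
?pattern variable is a name introduced here for illustration only, and
the sketch still assumes that the address sits between two / characters.

```sparql
SELECT ?email
WHERE {
  ?s ?p ?o
  # ?pattern is a hypothetical helper variable; it holds the regex once
  BIND (".*/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/.*" AS ?pattern)
  # keep only subjects whose string form matches the pattern
  FILTER regex(STR(?s), ?pattern)
  # the pattern covers the whole string, so only group 1 remains
  BIND (REPLACE(STR(?s), ?pattern, "$1") AS ?email)
} GROUP BY ?email
```

Sharing one pattern between FILTER and REPLACE should keep the two from
drifting apart when the regex is edited later.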
On Wed, Apr 24, 2019 at 9:03 AM Lorenz B.
<[email protected]> wrote:
>
> Or maybe even simpler:
>
> BIND(REPLACE(STR(?url),".*/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/.*","$1")
> AS ?email)
>
> >> BIND (REPLACE(STR(?s),"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+","$0") AS ?email)
> > replaces the matching email address with the email address itself, so
> > the result is the same as the input.
> >
> > You need to replace everything else with the email address. REPLACE is
> > not an "extract" function. You can try
> >
> > BIND
> > (REPLACE(STR(?url),"[a-zA-Z0-9/:._-]+/([a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+)/[a-zA-Z0-9/._-]+","$1")
> > AS ?email)
> >
> > Note: I assume that email addresses are wrapped in / characters.
> >
> >
> >> very good Richard, thank you. I was working along these lines with the
> >> following
> >>
> >> BIND (REPLACE(STR(?url),"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+","$0") AS ?email)
> >>
> >> where ?url contains the match, but this binds the entire string to
> >> ?email again
> >>
> >> e.g. data:
> >>
> >> url =
> >> http://www.imagesnippets.com/imgtag/rdf/[email protected]/1598550_10204479279247862_1280347905880818932_o
> >>
> >> query
> >>
> >> SELECT ?email
> >> WHERE {
> >> ?s ?p ?o
> >> BIND (REPLACE(STR(?s),"[a-zA-Z0-9._-]+@[a-zA-Z0-9._-]+","$0") AS ?email)
> >> }
> >>
> >> On Tue, Apr 23, 2019 at 6:00 PM Richard Cyganiak <[email protected]>
> >> wrote:
> >>> Hi Marco,
> >>>
> >>>> On 23 Apr 2019, at 15:53, Marco Neumann <[email protected]> wrote:
> >>>>
> >>>> I think I'm familiar with string functions in SPARQL, but as far as
> >>>> I can see there is nothing similar to grep-like pattern matching and
> >>>> extraction on strings in SPARQL. Or is there?
> >>> The replace function does pattern matching and allows extraction of
> >>> matched sub-patterns:
> >>> https://www.w3.org/TR/sparql11-query/#func-replace
> >>> https://www.w3.org/TR/xpath-functions/#func-replace
> >>>
> >>> replace(input, pattern, replacement)
> >>>
> >>> The special “variables” $1, $2, $3, and so on can be used in the
> >>> replacement string. They refer to parts of the input that were matched by
> >>> the first, second, third, and so on pair of parentheses in the regex
> >>> pattern. For example:
> >>>
> >>> replace("23 April 2019", "^([0-9][0-9]).*$", "$1")
> >>>
> >>> would return "23" because that is the part of the input matched by the
> >>> first (and only) pair of parentheses; the trailing .* matches the rest
> >>> of the input, so the whole string is replaced by $1.
> >>>
> >>> Also useful might be Jena’s own apf:strSplit property function:
> >>> https://jena.apache.org/documentation/query/library-propfunc.html
> >>>
> >>> It can split a literal into multiple literals based on a regular
> >>> expression.
> >>>
> >>> Taken together, these two functions can do a wide range of pattern
> >>> matching and extraction tasks.
> >>>
> >>> Hope that helps,
> >>> Richard
> >>
> --
> Lorenz Bühmann
> AKSW group, University of Leipzig
> Group: http://aksw.org - semantic web research center
>
--
---
Marco Neumann
KONA