I got the query down to 7s in a hot Fuseki (3.10-snapshot) instance vs 1-2s
with a Virtuoso DB on the same machine and data. Virtuoso is definitely
working its magic on this kind of query.
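
For anyone reading the algebra plan quoted further down in this thread, here is a toy Python sketch of what the query computes. It is only an illustration over an in-memory list of triples, not how Fuseki or Virtuoso evaluate it; the function name `run_plan` and its parameters are made up for this example.

```python
from collections import Counter
from itertools import islice

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def run_plan(triples, offset, limit, outer_limit=10000):
    """Toy evaluation of the quoted plan over a list of (s, p, o) tuples.

    Hypothetical helper for illustration only; no triple store API is used.
    """
    # Inner sub-select: { ?s a ?o } OFFSET offset LIMIT limit, projected to ?s.
    # Without ORDER BY the slice is arbitrary; here it follows list order.
    typed = (s for s, p, o in triples if p == RDF_TYPE)
    selected = Counter(islice(typed, offset, offset + limit))

    # Join with { ?x ?r ?s }: every triple whose object is a selected ?s.
    # A subject that appears n times in the inner result matches n times.
    counts = Counter()
    for x, r, s in triples:
        if s in selected:
            counts[(s, r)] += selected[s]

    # GROUP BY ?s ?r, project (?r, ?count), then the outer OFFSET 0 LIMIT.
    return [(r, n) for (s, r), n in islice(counts.items(), outer_limit)]
```

The expensive part in a real store is the join: the unconstrained pattern `?x ?r ?s` touches every triple unless the engine drives it from the 1000 selected subjects, which is presumably where the two engines differ.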

Andy, for now the fuseki results will work for me in the sandbox, no need
to further investigate this.

Thank You


On Wed 19 Dec 2018 at 22:32, Andy Seaborne <[email protected]> wrote:

> The join-bgp-slice/project could be done better, depending on the data
> shape.
>
>      Andy
>
> On 19/12/2018 13:08, Marco Neumann wrote:
> > Andy, I did a local profiling of the query in a standard
> > Eclipse/YourKit configuration, which took 45s. It looks like this is
> > just a matter of increasing heap space to allow Fuseki to complete the
> > query now.
> >
> > (slice 0 10000
> >   (project (?r ?count)
> >     (extend ((?count ?.0))
> >       (group (?s ?r) ((?.0 (count)))
> >         (join
> >           (bgp (triple ?x ?r ?s))
> >           (slice 124639 1000
> >             (project (?s)
> >               (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o)))))))))
> >
> > On Wed, Dec 19, 2018 at 10:57 AM Marco Neumann <[email protected]> wrote:
> >>
> >> so yes an apache proxy and a query timeout limit for fuseki instances
> >> it will be.
> >>
> >> I just checked the same query on an open source virtuoso instance
> >> (7.2) with the same data and it seems that virtuoso handles the
> >> request much more resourcefully and to completion. Andy can you
> >> enlighten me what the main difference here is in the treatment of the
> >> query by jena (39s) vs virtuoso (1s)?
> >>
> >> On Wed, Dec 19, 2018 at 6:56 AM Laura Morales <[email protected]> wrote:
> >>>
> >>>> and needs some explaining why we put open endpoints on the web
> >>>> without great restrictions
> >>>
> >>> I've always been puzzled by this as well. You never see a publicly
> >>> reachable PostgreSQL or MariaDB server, or any other database. There is
> >>> always a layer in between that defines a list of possible requests, and
> >>> every request is then optimized to retrieve data from the database. With
> >>> a public endpoint this optimization is not possible, since anybody can
> >>> write any query. I think the reason is simply that a SPARQL endpoint is
> >>> supposed to answer any type of query, traversing paths that are not
> >>> defined a priori. If you only want the server to serve a specific kind of
> >>> query, you can in fact put some kind of REST API in front of it and
> >>> translate every request to a SPARQL query; in this scenario you don't
> >>> need the endpoint to be public, but you're limiting the type of queries
> >>> that a user can ask.
> >>>
> >>>
> >>>
> >>>
> >>> Sent: Tuesday, December 18, 2018 at 11:40 PM
> >>> From: "Marco Neumann" <[email protected]>
> >>> To: "Bruno P. Kinoshita" <[email protected]>, [email protected]
> >>> Subject: Re: blocking IP to prevent malicious sparql queries
> >>> It's good to see people using sparql one way or another. It's still an
> >>> unusual thing in the wild and needs some explaining why we put open
> >>> endpoints on the web without great restrictions. But since this one is
> >>> intended to be a sandbox to play with and learn, I do indeed take a
> >>> positive view on this incident.
> >>>
> >>> On Tue 18 Dec 2018 at 21:34, Bruno P. Kinoshita
> >>> <[email protected]> wrote:
> >>>
> >>>> I think Laura's option is the best/easiest one, and good on you for the
> >>>> positive point-of-view on these spams Marco! :D
> >>>> Bruno
> >>>>
> >>>> From: Marco Neumann <[email protected]>
> >>>> To: [email protected]
> >>>> Sent: Wednesday, 19 December 2018 8:58 AM
> >>>> Subject: Re: blocking IP to prevent malicious sparql queries
> >>>>
> >>>> Thank you Laura,
> >>>>
> >>>> I was hoping for a quick fix and something along the lines of a fuseki
> >>>> blacklist filter in the shiro.ini
> >>>>
> >>>> but yes the reverse proxy is probably a more sensible approach at this
> >>>> point.
> >>>>
> >>>> In any event good to see sparql spam like this here, it means that the
> >>>> Semantic Web has most certainly arrived in the mainstream ;)
> >>>>
> >>>>
> >>>>
> >>>> On Tue, Dec 18, 2018 at 5:35 PM Laura Morales <[email protected]> wrote:
> >>>>
> >>>>> While I think the correct answer is YES (perhaps by implementing a
> >>>>> custom filter), I guess the answer is going to be "use a reverse proxy".
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> Sent: Tuesday, December 18, 2018 at 6:16 PM
> >>>>> From: "Marco Neumann" <[email protected]>
> >>>>> To: [email protected]
> >>>>> Subject: blocking IP to prevent malicious sparql queries
> >>>>> Is it possible to block individual IPs with the shiro.ini?
> >>>>>
> >>>>> We received a number of malicious SPARQL queries today from an IP in
> >>>>> France (193.52.210.70) that continuously issues the following SPARQL
> >>>>> query:
> >>>>>
> >>>>> SELECT ?r (count(*) AS ?count)
> >>>>> WHERE{ ?x ?r ?s
> >>>>> { SELECT ?s WHERE
> >>>>> { ?s a ?o }
> >>>>> OFFSET 124639 LIMIT 1000 }
> >>>>> } GROUP BY ?s ?r OFFSET 0 LIMIT 10000
> >>>>>
> >>>>> resulting in:
> >>>>>
> >>>>> [2018-12-18 18:10:31] AbstractConnector WARN
> >>>>> java.lang.OutOfMemoryError: GC overhead limit exceeded
> >>>>> [2018-12-18 18:10:34] Fuseki WARN [424] RC = 500 : GC overhead limit exceeded
> >>>>> java.lang.OutOfMemoryError: GC overhead limit exceeded
> >>>>> [2018-12-18 18:10:34] Fuseki INFO [424] 500 GC overhead limit exceeded (39.946 s)
> >>>>>
> >>>>> and pushes fuseki offline for a few minutes.
> >>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>>
> >>>>> ---
> >>>>> Marco Neumann
> >>>>> KONA
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>>
> >>>> ---
> >>>> Marco Neumann
> >>>> KONA
> >>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>>
> >>>
> >>> ---
> >>> Marco Neumann
> >>> KONA
> >>
> >>
> >>
> >> --
> >>
> >>
> >> ---
> >> Marco Neumann
> >> KONA
> >
> >
> >
> > --
> >
> >
> > ---
> > Marco Neumann
> > KONA
> >
>
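
The original question in this thread was how to block a single client IP. Whichever layer ends up doing it (a reverse proxy rule or a custom filter), the check itself is small. A minimal Python sketch, assuming the blocking happens in some front-end layer you control; `BLOCKED` and `is_blocked` are hypothetical names for this example and are not part of Fuseki, Shiro or Apache httpd.

```python
import ipaddress

# Hypothetical blocklist; the single address is the one reported in the thread.
# Whole ranges can be listed in CIDR form as well, e.g. "192.0.2.0/24".
BLOCKED = [ipaddress.ip_network("193.52.210.70/32")]

def is_blocked(client_ip, blocked=BLOCKED):
    """Return True if client_ip falls inside any blocked network."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is False when the address and network IP versions differ.
    return any(addr in net for net in blocked)
```

A front end would call `is_blocked()` with the remote address of each request and answer with 403 before the query ever reaches the SPARQL endpoint. This only stops a known address; the query timeout mentioned above is still needed against expensive queries from anywhere else.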
