The join-bgp-slice/project could be done better, depending on the data shape.
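For illustration only (one possible reshaping, not necessarily what "done better" means here): with the query quoted below, putting the paged sub-select first in the WHERE clause can let its at-most-1000 bindings drive indexed lookups of ?x ?r ?s rather than the join starting from the full triple pattern, and adding ORDER BY makes the OFFSET page deterministic:

SELECT ?r (count(*) AS ?count)
WHERE {
  { SELECT ?s WHERE { ?s a ?o } ORDER BY ?s OFFSET 124639 LIMIT 1000 }
  ?x ?r ?s .
}
GROUP BY ?s ?r
OFFSET 0 LIMIT 10000

Whether that actually wins depends, as noted, on the data shape.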

    Andy

On 19/12/2018 13:08, Marco Neumann wrote:
Andy, I did some local profiling of the query in a standard
Eclipse/YourKit configuration; it took 45s. It looks like this is just
a matter of increasing the heap space to allow Fuseki to complete the
query now (a sketch of raising the heap follows the algebra below).

(slice 0 10000
  (project (?r ?count)
    (extend ((?count ?.0))
      (group (?s ?r) ((?.0 (count)))
        (join
          (bgp (triple ?x ?r ?s))
          (slice 124639 1000
            (project (?s)
              (bgp (triple ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?o)))))))))
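For the record, the fuseki-server startup script honours the JVM_ARGS
environment variable (its default maximum heap is quite small), so raising
the heap can look like the line below; the heap size and dataset location
are only illustrative:

  # raise the maximum heap before starting the standalone server
  JVM_ARGS="-Xmx8G" ./fuseki-server --loc=/path/to/tdb /ds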

On Wed, Dec 19, 2018 at 10:57 AM Marco Neumann <[email protected]> wrote:

So yes, an Apache proxy and a query timeout limit for the Fuseki instances
it will be.
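As a sketch of the timeout half (the values and names are illustrative, not
something already in place here): the Fuseki configuration file can set a
query timeout through the ARQ context on the server description, e.g.

@prefix ja:     <http://jena.hpl.hp.com/2005/11/Assembler#> .
@prefix fuseki: <http://jena.apache.org/fuseki#> .

# "arq:queryTimeout" = "time to first result,overall time", in milliseconds
<#server> a fuseki:Server ;
    ja:context [ ja:cxtName "arq:queryTimeout" ; ja:cxtValue "10000,60000" ] .

A query that exceeds the limit is then aborted instead of being left to run
the server out of heap.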

I just checked the same query on an open-source Virtuoso instance
(7.2) with the same data, and it seems that Virtuoso handles the
request far more resourcefully and runs it to completion. Andy, can you
enlighten me as to the main difference in the treatment of the
query by Jena (39s) vs Virtuoso (1s)?

On Wed, Dec 19, 2018 at 6:56 AM Laura Morales <[email protected]> wrote:

and needs some explaining why we put open endpoints on the web without great 
restrictions

I've always been puzzled by this as well. You never see a publicly reachable 
PostgreSQL or MariaDB server, or any other database. There is always a layer 
in between which defines the set of possible requests, and every request 
is then optimized to retrieve its data from the database. With a public endpoint, 
by contrast, that optimization is not possible, since anybody can write any query. I 
think the reason is simply that a SPARQL endpoint is supposed to answer any 
type of query, traversing paths that are not well defined a priori. If 
you only want the server to serve a specific kind of query, you can in fact 
put some kind of REST API in front of it and translate every request into a 
SPARQL query; in that scenario you don't need the endpoint 
to be public, but you are limiting the type of queries that a user can ask.
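To make that concrete, here is a minimal sketch of such a facade in Python.
The endpoint URL, dataset name, the rdfs:label query and the IRI check are
all assumptions for illustration, not anything from this thread: one fixed
route per use case, the server fills a SPARQL template and forwards it to a
Fuseki instance that is not reachable from the outside.

# minimal sketch of a "REST facade" over a private SPARQL endpoint
import json
import urllib.parse
import urllib.request

FUSEKI = "http://localhost:3030/ds/sparql"   # assumed internal-only endpoint

# one fixed, pre-optimized query shape per API route; users never send raw SPARQL
LABEL_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE { <%s> rdfs:label ?label } LIMIT 10
"""

def get_labels(iri: str):
    # reject anything that could break out of the IRI position of the template
    if any(c in iri for c in '<> "{}|\\^`'):
        raise ValueError("invalid IRI")
    query = LABEL_QUERY % iri
    # SPARQL protocol: POST the query as a form parameter, ask for JSON results
    data = urllib.parse.urlencode({"query": query}).encode()
    req = urllib.request.Request(
        FUSEKI, data=data,
        headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        results = json.load(resp)
    return [b["label"]["value"] for b in results["results"]["bindings"]]

# e.g. get_labels("http://example.org/thing/1")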




Sent: Tuesday, December 18, 2018 at 11:40 PM
From: "Marco Neumann" <[email protected]>
To: "Bruno P. Kinoshita" <[email protected]>, [email protected]
Subject: Re: blocking IP to prevent malicious sparql queries
It's good to see people using SPARQL one way or another. It's still an
unusual thing in the wild, and it takes some explaining why we put open
endpoints on the web without strong restrictions. But since this one is
intended to be a sandbox to play with and learn from, I do take a positive
view of this incident.

On Tue 18 Dec 2018 at 21:34, Bruno P. Kinoshita
<[email protected]> wrote:

I think Laura's option is the best/easiest one, and good on you for the
positive point of view on this spam, Marco! :D
Bruno

From: Marco Neumann <[email protected]>
To: [email protected]
Sent: Wednesday, 19 December 2018 8:58 AM
Subject: Re: blocking IP to prevent malicious sparql queries

Thank you Laura,

I was hoping for a quick fix, something along the lines of a Fuseki
blacklist filter in the shiro.ini,

but yes, the reverse proxy is probably the more sensible approach at this
point.

In any event it's good to see SPARQL spam like this here; it means that the
Semantic Web has most certainly arrived in the mainstream ;)



On Tue, Dec 18, 2018 at 5:35 PM Laura Morales <[email protected]> wrote:

While I think the correct answer is YES (perhaps by implementing a custom
filter), I guess the answer is going to be "use a reverse proxy".
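For what it's worth, a minimal httpd 2.4 sketch of that setup (the dataset
path and port are illustrative, the IP is the one reported below, and
mod_proxy_http plus mod_authz_core must be enabled):

<Location "/ds/sparql">
    ProxyPass        "http://localhost:3030/ds/sparql"
    ProxyPassReverse "http://localhost:3030/ds/sparql"
    <RequireAll>
        # allow everyone except the misbehaving address
        Require all granted
        Require not ip 193.52.210.70
    </RequireAll>
</Location>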




Sent: Tuesday, December 18, 2018 at 6:16 PM
From: "Marco Neumann" <[email protected]>
To: [email protected]
Subject: blocking IP to prevent malicious sparql queries
Is it possible to block individual IPs with the shiro.ini?

We have received a number of malicious SPARQL queries today from an IP in
France (193.52.210.70) that continuously issues the following SPARQL query:

SELECT ?r (count(*) AS ?count)
WHERE {
  ?x ?r ?s
  { SELECT ?s WHERE { ?s a ?o }
    OFFSET 124639 LIMIT 1000 }
}
GROUP BY ?s ?r
OFFSET 0 LIMIT 10000

resulting in:

[2018-12-18 18:10:31] AbstractConnector WARN
java.lang.OutOfMemoryError: GC overhead limit exceeded
[2018-12-18 18:10:34] Fuseki WARN [424] RC = 500 : GC overhead limit
exceeded
java.lang.OutOfMemoryError: GC overhead limit exceeded
[2018-12-18 18:10:34] Fuseki INFO [424] 500 GC overhead limit exceeded
(39.946 s)

and pushes Fuseki offline for a few minutes.


--


---
Marco Neumann
KONA