Re: Strategies to avoid log flooding
On 29/03/2023 14:24, Mikael Pesonen wrote:
> Here the next line was REPLACE, that's why regex:
>
> VALUES ?class_label { " \\(häiriö\\)" " \\(löydös\\)" " \\(toimenpide\\)" }
> ?concept rdfs:label ?fsnl
> FILTER (REGEX(?fsnl, $class_label)) .
> BIND (REPLACE(?fsnl, ?class_label, "") AS ?newl) .
>
> But indeed, I didn't mean to use $ in $class_label; no idea what that syntax means. But it was not the cause here?

No.

>> org.apache.jena.sparql.expr.ExprException: REGEX: Pattern is not a string: " \\(häiriö\\)"@fi

It says " \\(häiriö\\)"@fi

It has a language tag. The query you show does not seem to be the query being run.

Regex patterns are xsd:strings, not language-tagged strings.

> So to use constants, write the above like this?
>
> ?concept rdfs:label ?fsnl
> FILTER (REGEX(?fsnl, " \\(häiriö\\)") || REGEX(?fsnl, " \\(löydös\\)") || REGEX(?fsnl, " \\(toimenpide\\)")) .
> BIND (REPLACE(?fsnl, " \\(häiriö\\)", "") AS ?newl1) .
> BIND (REPLACE(?newl1, " \\(löydös\\)", "") AS ?newl2) .
> BIND (REPLACE(?newl2, " \\(toimenpide\\)", "") AS ?newl) .

Try it on a small amount of data.

Andy
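Andy's point about the language tag can be illustrated with a small Java model. The `Lit` and `SparqlFns` classes below are hypothetical stand-ins, not Jena's API; the sketch assumes only the SPARQL rule that a REGEX pattern must be a plain (xsd:string) literal, and shows how wrapping the pattern in STR() drops the @fi tag so the match can proceed. Java's string escaping happens to match SPARQL's here, so " \\(toimenpide\\)" denotes the same regex in both.

```java
import java.util.regex.Pattern;

// Hypothetical, simplified model of an RDF literal (not Jena's API):
// a lexical form plus an optional language tag.
final class Lit {
    final String lexical;
    final String lang; // null means a plain xsd:string

    Lit(String lexical, String lang) {
        this.lexical = lexical;
        this.lang = lang;
    }

    static Lit plain(String lexical) {
        return new Lit(lexical, null);
    }

    static Lit tagged(String lexical, String lang) {
        return new Lit(lexical, lang);
    }
}

// Hypothetical stand-ins for the SPARQL functions STR and REGEX.
final class SparqlFns {
    // STR(): keep the lexical form, drop the language tag.
    static Lit str(Lit literal) {
        return Lit.plain(literal.lexical);
    }

    // REGEX(text, pattern): the pattern must be a plain string, so a
    // language-tagged pattern is rejected. SPARQL REGEX is an unanchored
    // search, hence find() rather than matches().
    static boolean regex(Lit text, Lit pattern) {
        if (pattern.lang != null) {
            throw new IllegalArgumentException(
                "REGEX pattern is language-tagged (@" + pattern.lang + "), not a plain string");
        }
        return Pattern.compile(pattern.lexical).matcher(text.lexical).find();
    }
}
```

In the query itself, the corresponding change is REGEX(?fsnl, str(?class_label)).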
Re: Strategies to avoid log flooding
Here the next line was REPLACE, that's why regex:

VALUES ?class_label { " \\(häiriö\\)" " \\(löydös\\)" " \\(toimenpide\\)" }
?concept rdfs:label ?fsnl
FILTER (REGEX(?fsnl, $class_label)) .
BIND (REPLACE(?fsnl, ?class_label, "") AS ?newl) .

But indeed, I didn't mean to use $ in $class_label; no idea what that syntax means. But it was not the cause here?

So to use constants, write the above like this?

?concept rdfs:label ?fsnl
FILTER (REGEX(?fsnl, " \\(häiriö\\)") || REGEX(?fsnl, " \\(löydös\\)") || REGEX(?fsnl, " \\(toimenpide\\)")) .
BIND (REPLACE(?fsnl, " \\(häiriö\\)", "") AS ?newl1) .
BIND (REPLACE(?newl1, " \\(löydös\\)", "") AS ?newl2) .
BIND (REPLACE(?newl2, " \\(toimenpide\\)", "") AS ?newl) .

On 29/03/2023 15.20, Andy Seaborne wrote:
> That does not align with the log message, which says the pattern is " \\(häiriö\\)"@fi, meaning $class_label is @fi. Use str() to get the lexical part.
>
> The regex is potentially different every call, so the regex is compiled every call. (If it's the same, a constant, it is compiled once.) Here, write as three calls, one per constant. Or use CONTAINS, because a regex is unnecessary in this case.
>
> Andy

-- 
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
Semantic Technologies

e-mail: mikael.peso...@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND
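Mikael's constant-pattern rewrite can be sketched in Java to show what it computes for one ?fsnl value. This is a hypothetical illustration (`ClassLabelStripper` is an invented name, not Jena code); it assumes SPARQL's REGEX behaves like an unanchored `find()` and that each chained REPLACE feeds the next BIND. Because the three patterns are constants, they are compiled once rather than once per row, which is the difference Andy describes.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch (not Jena code) of the constant-pattern FILTER/BIND chain.
final class ClassLabelStripper {
    // Compiled once, because the patterns are constants. Unicode escapes spell
    // "häiriö" and "löydös"; the backslash escaping matches the SPARQL strings.
    private static final List<Pattern> SUFFIXES = List.of(
        Pattern.compile(" \\(h\u00e4iri\u00f6\\)"),  // matches " (häiriö)"
        Pattern.compile(" \\(l\u00f6yd\u00f6s\\)"),  // matches " (löydös)"
        Pattern.compile(" \\(toimenpide\\)"));       // matches " (toimenpide)"

    // FILTER (REGEX(..) || REGEX(..) || REGEX(..)): true if any pattern is found.
    static boolean hasClassSuffix(String fsnl) {
        for (Pattern p : SUFFIXES) {
            if (p.matcher(fsnl).find()) {
                return true;
            }
        }
        return false;
    }

    // The three chained BINDs: each REPLACE feeds the next (?newl1, ?newl2, ?newl).
    static String stripClassSuffixes(String fsnl) {
        String result = fsnl;
        for (Pattern p : SUFFIXES) {
            result = p.matcher(result).replaceAll("");
        }
        return result;
    }
}
```

Note that SPARQL's logical OR operator is ||; a single | is the property-path alternative and is not valid inside a FILTER expression.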
Re: Strategies to avoid log flooding
On 29/03/2023 12:56, Rob @ DNR wrote:
> ...
>> VALUES ?class_label { " \\(häiriö\\)" " \\(löydös\\)" " \\(toimenpide\\)" }
>> ?concept rdfs:label ?fsnl
>> FILTER (REGEX(?fsnl, $class_label)) .

That does not align with the log message, which says the pattern is " \\(häiriö\\)"@fi, meaning the value of $class_label is tagged @fi.

Use str() to get the lexical part, e.g. REGEX(?fsnl, str(?class_label)), and likewise for the REPLACE pattern.

The regex is potentially different every call, so the regex is compiled every call. (If it's the same, a constant, it is compiled once.)

Here, write it as three calls, one per constant.

Or use CONTAINS, because a regex is unnecessary in this case.

Andy
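Andy's CONTAINS suggestion can be checked with plain Java string operations. The sketch below (hypothetical class `ContainsVsRegex`, not Jena code) compares a substring test, the escaped regex, and the same regex without escaping, which is the kind of parenthesis-escaping slip mentioned in the thread; it assumes SPARQL REGEX behaves like an unanchored `find()`.

```java
import java.util.regex.Pattern;

// Hypothetical comparison (not Jena code) of three ways to test for the
// literal suffix " (toimenpide)".
final class ContainsVsRegex {
    // Compiled once, since the pattern is a constant.
    private static final Pattern ESCAPED = Pattern.compile(" \\(toimenpide\\)");
    // Same text without escaping: "(" and ")" now form a capture group,
    // so this actually searches for " toimenpide".
    private static final Pattern UNESCAPED = Pattern.compile(" (toimenpide)");

    // Like CONTAINS(?fsnl, " (toimenpide)"): a plain substring test, no escaping needed.
    static boolean viaContains(String fsnl) {
        return fsnl.contains(" (toimenpide)");
    }

    // Like REGEX(?fsnl, " \\(toimenpide\\)"): an unanchored regex search.
    static boolean viaEscapedRegex(String fsnl) {
        return ESCAPED.matcher(fsnl).find();
    }

    // Like REGEX(?fsnl, " (toimenpide)"): does not match the parenthesised suffix.
    static boolean viaUnescapedRegex(String fsnl) {
        return UNESCAPED.matcher(fsnl).find();
    }
}
```

CONTAINS only covers the FILTER test: SPARQL's REPLACE always interprets its second argument as a regular expression, so the BIND lines still need the escaped pattern.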
Re: Strategies to avoid log flooding
Yes, you can filter these out. The logger in question is the class name shown; the Log4j configuration will need to reference it by its fully qualified name, i.e. org.apache.jena.sparql.engine.iterator.QueryIterFilterExpr, and set it to ERROR/OFF to suppress these warnings.

Issuing millions of copies of the same warning certainly seems like a bug to me, especially since this is elicited by query input, so it could potentially be abused as a DoS attack vector.

Rob

From: Mikael Pesonen
Date: Wednesday, 29 March 2023 at 10:22
To: users@jena.apache.org
Subject: Re: Strategies to avoid log flooding

Below is the log, so is it possible to filter just these out?

Unfortunately I don't recall the exact regex, but it was related to escaping parentheses, so maybe this, or with one backslash:

...
VALUES ?class_label { " \\(häiriö\\)" " \\(löydös\\)" " \\(toimenpide\\)" }
?concept rdfs:label ?fsnl
FILTER (REGEX(?fsnl, $class_label)) .
...

So this is a bug, not a feature, and can be corrected?

Mar 27 13:13:33 insight-terms java[2512289]: [2023-03-27 13:13:33] QueryIterFilterExpr WARN Expression Exception in (regex ?fsnl ?class_label)
Mar 27 13:13:33 insight-terms java[2512289]: org.apache.jena.sparql.expr.ExprException: REGEX: Pattern is not a string: " \\(häiriö\\)"@fi
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.expr.E_Regex.makeRegexEngine(E_Regex.java:120) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.expr.E_Regex.eval(E_Regex.java:102) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.expr.ExprFunctionN.eval(ExprFunctionN.java:113) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.expr.ExprFunctionN.eval(ExprFunctionN.java:110) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.expr.ExprNode.isSatisfied(ExprNode.java:42) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.engine.iterator.QueryIterFilterExpr.accept(QueryIterFilterExpr.java:49) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:81) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.engine.iterator.QueryIterSlice.hasNextBinding(QueryIterSlice.java:76) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:38) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:38) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.exec.RowSetStream.hasNext(RowSetStream.java:47) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:81) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.executeQuery(SPARQLQueryProcessor.java:378) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execute(SPARQLQueryProcessor.java:277) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.executeWithParameter(SPARQLQueryProcessor.java:222) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execute(SPARQLQueryProcessor.java:207) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.fuseki.servlets.ActionService.executeLifecycle(ActionService.java:58) ~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]:     at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execPost(SPARQLQueryProcessor.java:83) ~[fuseki-se
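Rob's remark that millions of identical warnings look like a bug suggests one possible mitigation: collapse repeats of the same warning within a query. The sketch below is purely hypothetical (the `DedupWarnings` class and its per-query limit are assumptions, not how Jena behaves or plans to behave); it records output in a list instead of calling a real logger so the behaviour is easy to check.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: emit the first `limit` copies of each distinct warning
// during one query, count the rest, and report one summary line per warning
// when the query finishes.
final class DedupWarnings {
    private final int limit;
    private final Map<String, Integer> counts = new HashMap<>();
    private final List<String> emitted = new ArrayList<>(); // stands in for a real logger

    DedupWarnings(int limit) {
        this.limit = limit;
    }

    // Called once per failing row; only the first `limit` occurrences are emitted.
    void warn(String message) {
        int seen = counts.merge(message, 1, Integer::sum);
        if (seen <= limit) {
            emitted.add("WARN " + message);
        }
    }

    // Called once at the end of the query: summarize what was suppressed.
    void flushSummary() {
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int suppressed = e.getValue() - limit;
            if (suppressed > 0) {
                emitted.add("WARN (" + suppressed + " more) " + e.getKey());
            }
        }
        counts.clear();
    }

    List<String> emitted() {
        return emitted;
    }
}
```

With a limit of 1, a million identical REGEX failures in one query would produce one warning plus one summary line instead of a million stack traces.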
Re: Strategies to avoid log flooding
On 28/03/2023 16.04, Rob @ DNR wrote:
> A GitHub issue with a minimal example query that reproduces the issue would be a good start, so we can reproduce the issue and look into a fix.
>
> In workaround terms, end users control their logging configuration, so you could create a Log4j configuration that disables logging for the specific offending logger (assuming that this is a sufficiently specific logger to not suppress actually relevant logging).
>
> Rob
Re: Strategies to avoid log flooding
A GitHub issue with a minimal example query that reproduces the issue would be a good start, so we can reproduce the issue and look into a fix.

In workaround terms, end users control their logging configuration, so you could create a Log4j configuration that disables logging for the specific offending logger (assuming that this is a sufficiently specific logger to not suppress actually relevant logging).

Rob

From: Mikael Pesonen
Date: Tuesday, 28 March 2023 at 11:21
To: users@jena.apache.org
Subject: Strategies to avoid log flooding

Hi, there are some cases where Jena generates dozens of gigs, maybe even terabytes, of log in one query. If you add a bad REGEX, it generates a long warning-level exception for every row in the db, or at least a million of them (the disk filled up, so I don't know). Is there another way to avoid this other than disabling warnings?
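Rob's caveat about choosing a sufficiently specific logger relates to how logger levels are inherited: a logger uses the level configured for its own name or, failing that, for its nearest dotted ancestor. The Java sketch below is a simplified, hypothetical model of that idea (`LoggerLevels` is an invented class, not Log4j code), showing that targeting the fully qualified name org.apache.jena.sparql.engine.iterator.QueryIterFilterExpr silences only that class, while configuring a package such as org.apache.jena.sparql would also hide warnings from its siblings.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of hierarchical logger levels (not Log4j's
// implementation): a logger takes the level configured for its own name or
// for the nearest dotted ancestor, falling back to a root level.
final class LoggerLevels {
    // Ordered from least to most severe; OFF as a threshold disables everything.
    enum Level { DEBUG, INFO, WARN, ERROR, OFF }

    private final Level rootLevel;
    private final Map<String, Level> configured = new HashMap<>();

    LoggerLevels(Level rootLevel) {
        this.rootLevel = rootLevel;
    }

    void set(String loggerName, Level level) {
        configured.put(loggerName, level);
    }

    // Walk from the full name up through dotted prefixes: a.b.C, then a.b, then a.
    Level effectiveLevel(String loggerName) {
        String name = loggerName;
        while (true) {
            Level level = configured.get(name);
            if (level != null) {
                return level;
            }
            int dot = name.lastIndexOf('.');
            if (dot < 0) {
                return rootLevel;
            }
            name = name.substring(0, dot);
        }
    }

    // A message is logged if its level is at least the logger's threshold.
    boolean isEnabled(String loggerName, Level messageLevel) {
        Level threshold = effectiveLevel(loggerName);
        return threshold != Level.OFF && messageLevel.compareTo(threshold) >= 0;
    }
}
```

In Fuseki the real setting lives in its Log4j2 configuration file, as a logger entry whose name is the fully qualified class name and whose level is ERROR or OFF.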
Strategies to avoid log flooding
Hi, there are some cases where Jena generates dozens of gigs, maybe even terabytes, of log in one query. If you add a bad REGEX, it generates a long warning-level exception for every row in the db, or at least a million of them (the disk filled up, so I don't know). Is there another way to avoid this other than disabling warnings?