Re: Strategies to avoid log flooding

2023-03-29 Thread Andy Seaborne




On 29/03/2023 14:24, Mikael Pesonen wrote:


> The next line here was a REPLACE, which is why I used a regex:
>
> VALUES ?class_label { " \\(häiriö\\)" " \\(löydös\\)" " \\(toimenpide\\)" }
> ?concept rdfs:label ?fsnl FILTER (REGEX(?fsnl, $class_label)) .
> BIND (REPLACE(?fsnl, ?class_label, "") AS ?newl) .
>
> But indeed, I didn't mean to use $ in $class_label; I have no idea what
> that syntax means. It was not the cause here, though?


No.

>> org.apache.jena.sparql.expr.ExprException: REGEX: Pattern is not a
>> string: " \\(häiriö\\)"@fi

It says " \\(häiriö\\)"@fi

It has a language tag.

The query you show does not seem to be the query being run.

Regexes are xsd:strings, not language-tagged strings.
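
A minimal sketch of the str() approach, assuming the pattern values really do arrive with a language tag. The VALUES block quoted in this thread has no tags, so the @fi literals below are an assumption made only to match the logged error, and the SELECT clause and prefix declaration are added so the fragment parses on its own. In SPARQL, $class_label and ?class_label name the same variable, so the $ itself is harmless.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?concept ?fsnl ?newl
WHERE {
  # Assumption: the patterns carry @fi, as in the logged error message.
  VALUES ?class_label { " \\(häiriö\\)"@fi " \\(löydös\\)"@fi " \\(toimenpide\\)"@fi }
  ?concept rdfs:label ?fsnl .
  # STR() returns the lexical form without the language tag,
  # which is what REGEX and REPLACE accept as a pattern.
  FILTER (REGEX(?fsnl, STR(?class_label)))
  BIND (REPLACE(?fsnl, STR(?class_label), "") AS ?newl)
}
```

The pattern is still a variable here, so it may be compiled on every call; constants avoid that.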




> So, to use constants, should the above be written like this?
>
> ?concept rdfs:label ?fsnl
> FILTER (REGEX(?fsnl, " \\(häiriö\\)") || REGEX(?fsnl, " \\(löydös\\)") ||
> REGEX(?fsnl, " \\(toimenpide\\)") ) .
>
> BIND (REPLACE(?fsnl, " \\(häiriö\\)", "") AS ?newl1) .
> BIND (REPLACE(?newl1, " \\(löydös\\)", "") AS ?newl2) .
> BIND (REPLACE(?newl2, " \\(toimenpide\\)", "") AS ?newl) .


Try it on a small amount of data.

Andy








Re: Strategies to avoid log flooding

2023-03-29 Thread Mikael Pesonen



The next line here was a REPLACE, which is why I used a regex:

VALUES ?class_label { " \\(häiriö\\)" " \\(löydös\\)" " \\(toimenpide\\)" }
?concept rdfs:label ?fsnl FILTER (REGEX(?fsnl, $class_label)) .
BIND (REPLACE(?fsnl, ?class_label, "") AS ?newl) .

But indeed, I didn't mean to use $ in $class_label; I have no idea what
that syntax means. It was not the cause here, though?


So, to use constants, should the above be written like this?

?concept rdfs:label ?fsnl
FILTER (REGEX(?fsnl, " \\(häiriö\\)") || REGEX(?fsnl, " \\(löydös\\)") ||
REGEX(?fsnl, " \\(toimenpide\\)") ) .

BIND (REPLACE(?fsnl, " \\(häiriö\\)", "") AS ?newl1) .
BIND (REPLACE(?newl1, " \\(löydös\\)", "") AS ?newl2) .
BIND (REPLACE(?newl2, " \\(toimenpide\\)", "") AS ?newl) .




--
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's 
Tools - Text Tools - E-books and M-books

Mikael Pesonen
Semantic Technologies

e-mail: mikael.peso...@lingsoft.fi
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND



Re: Strategies to avoid log flooding

2023-03-29 Thread Andy Seaborne




On 29/03/2023 12:56, Rob @ DNR wrote:

> Yes, you can filter these out. The logger in question is the class name
> shown; the log4j configuration needs to reference it by its fully
> qualified name, i.e.
> org.apache.jena.sparql.engine.iterator.QueryIterFilterExpr, and set it to
> ERROR or OFF to suppress these warnings.
>
> Issuing millions of instances of the same warning certainly seems like a
> bug to me. Since it is triggered by query input, it could potentially be
> abused as a DoS attack vector.
>
> Rob
>
> From: Mikael Pesonen
> Date: Wednesday, 29 March 2023 at 10:22
> To: users@jena.apache.org
> Subject: Re: Strategies to avoid log flooding
>
>> Below is the log, so is it possible to filter just these out?
>>
>> Unfortunately I don't recall the exact regex, but it was related to
>> escaping parentheses, so maybe this, or with one backslash:
>> ...
>> VALUES ?class_label { " \\(häiriö\\)" " \\(löydös\\)" " \\(toimenpide\\)" }
>> ?concept rdfs:label ?fsnl FILTER (REGEX(?fsnl, $class_label)) .


That does not align with the log message, which says the pattern is
" \\(häiriö\\)"@fi, meaning $class_label has the language tag @fi.

Use str() to get the lexical part.

The regex is potentially different on every call, so it is compiled on
every call. (If it is the same each time, i.e. a constant, it is compiled once.)


Here, write as three calls, one per constant.

Or use CONTAINS, because a regex is unnecessary in this case.
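
A minimal sketch of the CONTAINS variant, written as a complete query. The SELECT clause and the rdfs prefix declaration are assumptions added so the fragment parses on its own. The single alternation pattern in REPLACE is one possible way to do the removal with one constant regex; it was not proposed in this thread.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?concept ?fsnl ?newl
WHERE {
  ?concept rdfs:label ?fsnl .
  # CONTAINS does plain substring matching, so the parentheses need no escaping.
  FILTER ( CONTAINS(?fsnl, " (häiriö)")
        || CONTAINS(?fsnl, " (löydös)")
        || CONTAINS(?fsnl, " (toimenpide)") )
  # One constant pattern removes whichever of the three suffixes is present.
  BIND (REPLACE(?fsnl, " \\((häiriö|löydös|toimenpide)\\)", "") AS ?newl)
}
```

Every pattern here is a constant, so nothing needs to be recompiled per row.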

Andy




Re: Strategies to avoid log flooding

2023-03-29 Thread Rob @ DNR
Yes, you can filter these out. The logger in question is the class name shown;
the log4j configuration needs to reference it by its fully qualified name,
i.e. org.apache.jena.sparql.engine.iterator.QueryIterFilterExpr, and set it
to ERROR or OFF to suppress these warnings.
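
A sketch of such a logger override, assuming the server reads a Log4j2 configuration in properties format (for example a log4j2.properties file). The identifier "filterexpr" is an arbitrary label chosen for illustration; only the logger name comes from this thread.

```properties
# Raise the threshold for the logger that emits the per-row filter warnings.
# "filterexpr" is a hypothetical label; any unique identifier works.
logger.filterexpr.name  = org.apache.jena.sparql.engine.iterator.QueryIterFilterExpr
logger.filterexpr.level = ERROR
```

This hides every WARN from that logger, not only the REGEX one, so it trades visibility for log volume.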

Issuing millions of instances of the same warning certainly seems like a bug
to me. Since it is triggered by query input, it could potentially be abused
as a DoS attack vector.

Rob


From: Mikael Pesonen 
Date: Wednesday, 29 March 2023 at 10:22
To: users@jena.apache.org 
Subject: Re: Strategies to avoid log flooding
Below is the log, so is it possible to filter just these out?

Unfortunately I don't recall the exact regex, but it was related to
escaping parentheses, so maybe this, or with one backslash:
...
VALUES ?class_label { " \\(häiriö\\)" " \\(löydös\\)" " \\(toimenpide\\)" }
?concept rdfs:label ?fsnl FILTER (REGEX(?fsnl, $class_label)) .
...

So this is a bug, not a feature, and can be corrected?

Mar 27 13:13:33 insight-terms java[2512289]: [2023-03-27 13:13:33]
QueryIterFilterExpr WARN  Expression Exception in (regex ?fsnl ?class_label)
Mar 27 13:13:33 insight-terms java[2512289]:
org.apache.jena.sparql.expr.ExprException: REGEX: Pattern is not a
string: " \\(häiriö\\)"@fi
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.expr.E_Regex.makeRegexEngine(E_Regex.java:120)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.expr.E_Regex.eval(E_Regex.java:102)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.expr.ExprFunctionN.eval(ExprFunctionN.java:113)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.expr.ExprFunctionN.eval(ExprFunctionN.java:110)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.expr.ExprNode.isSatisfied(ExprNode.java:42)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.engine.iterator.QueryIterFilterExpr.accept(QueryIterFilterExpr.java:49)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding(QueryIterProcessBinding.java:81)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.engine.iterator.QueryIterSlice.hasNextBinding(QueryIterSlice.java:76)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:38)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:38)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.exec.RowSetStream.hasNext(RowSetStream.java:47)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:81)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.executeQuery(SPARQLQueryProcessor.java:378)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execute(SPARQLQueryProcessor.java:277)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.executeWithParameter(SPARQLQueryProcessor.java:222)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execute(SPARQLQueryProcessor.java:207)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.fuseki.servlets.ActionService.executeLifecycle(ActionService.java:58)
~[fuseki-server.jar:4.6.1]
Mar 27 13:13:33 insight-terms java[2512289]: #011at
org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execPost(SPARQLQueryProcessor.java:83)
~[fuseki-se

Re: Strategies to avoid log flooding

2023-03-29 Thread Mikael Pesonen


On 28/03/2023 16.04, Rob @ DNR wrote:

> A GitHub issue with a minimal example query that reproduces the problem
> would be a good start, so we can investigate and look into a fix.
>
> As a workaround, end users control their logging configuration, so you
> could create a Log4j configuration that disables logging for the specific
> offending logger (assuming the logger is specific enough that this does
> not suppress relevant logging).
>
> Rob
>
> From: Mikael Pesonen
> Date: Tuesday, 28 March 2023 at 11:21
> To: users@jena.apache.org
> Subject: Strategies to avoid log flooding
>
>> Hi,
>>
>> There are some cases where Jena generates dozens of gigabytes, maybe even
>> terabytes, of log output in one query. If you add a bad REGEX, it logs a
>> long warning-level exception for every row in the database, or at least
>> a million of them (the disk filled up, so I don't know exactly). Is there
>> a way to avoid this other than disabling warnings?






Re: Strategies to avoid log flooding

2023-03-28 Thread Rob @ DNR
A GitHub issue with a minimal example query that reproduces the problem would
be a good start, so we can investigate and look into a fix.

As a workaround, end users control their logging configuration, so you could
create a Log4j configuration that disables logging for the specific offending
logger (assuming the logger is specific enough that this does not suppress
relevant logging).

Rob



Strategies to avoid log flooding

2023-03-28 Thread Mikael Pesonen

Hi,

There are some cases where Jena generates dozens of gigabytes, maybe even
terabytes, of log output in one query. If you add a bad REGEX, it logs a
long warning-level exception for every row in the database, or at least a
million of them (the disk filled up, so I don't know exactly). Is there a
way to avoid this other than disabling warnings?