Re: Tomcat 10 digester warnings

2021-01-04 Thread Romain Manni-Bucau
Le lun. 4 janv. 2021 à 15:04, Mark Thomas  a écrit :

> On 04/01/2021 11:54, Romain Manni-Bucau wrote:
>
> 
>
> > Well it also depends the classpath. I fully agree with a plain tomcat not
> > customized you don't get much - just have a more consistent usage of
> > resources but it is about being purist so not that interesting to me
> here.
> > Now make your classpath bloated - yes some people do ;) but I guess
> having
> > ~100jars is not that rare too these days - and this is really slower, in
> > particular with a custom security manager getResource check even if it
> got
> > optimized.
>
> If an app is that bloated then I'd be surprised if the changes proposed
> had any noticeable effect.
>

Startup is still very slow but this helps a little bit - but agree order or
magnitude is not critical.


>
> > Another interesting case is the tomee one or more exactly: the embedded
> > one. Let's assume somebody uses tomcat embeded (not rare these days) and
> > its dependency resolution comes with a servlet-api which is not tomcat
> one
> > (not rare too), then it can miss the resources because it is a pure API
> or
> > the project it comes from grab them from elsewhere. Why would he get
> > warnings if he does not use the validation.
>
> Any valid API JAR needs to include the resources as passing the TCK
> requires validation to be enabled - and that requires the schemas to be
> available. The current consensus of the Jakarta Servlet team is that the
> Servlet API JAR must include all the schema required for validation.
>
> Error messages where the root cause is an invalid API JAR are not
> something that concerns me.
>

Still it is allowed to provide the resource outside servlet jar I think.
TCK require signature validation and runtime validation, packaging is up to
the vendor.


>
> (As an aside, due to the dependencies between the schema it means that
> the Servlet API JAR has to include all the schema for the JSP API as
> well - they can't be decoupled into the JSP API JAR).
>
> > So to summarize I see two benefits to update a bit the impl:
> >
> > 1. Makes it smoother for embedded case (where validation is rare)
> > 2. Makes is a bit faster for highly customized standalone instances
> > (enterprise prebundled server/layer)
> >
> > Indeed I don't expect - can be done technically but hope it is never
> true -
> > both to be true at the same time :s.
>
> 1 assumes an invalid API JAR so I don't consider that to be a valid use
> case.
>

Not really, it would be invalid if tomcat would always validate it, it is
valid to consider it as a tomcat bug to log a kind of false positive
warning.
Typically the corrupted case I got shouldn't warn the user by default since
it is not used but I get dozens of warning lines which is very misleading
for end users.


>
> 2 looks to be the most likely (only?) use case where there is potential
> benefit to performing this optimisation
>
> >> The Maps are public but they aren't part of what we consider to be
> >> Tomcat's public API so while this would technically be an API change, I
> >> don't think that is a reason not to do this (or limit it to 10.1.x)
> >
> > Not sure I got this, it is just about chaging the internals of the map so
> > API does not break (like overriding get to be lazy but not changing the
> > generics).
>
> Ah. I was thinking of a simpler implementation that switched the
> Map to Map) or whatever results
> in the simplest code.
>
> That brings us back to the complexity vs performance benefit question.
>

To be concrete I had that in mind: use supplier in the not exposed map then
expose as read only this:

public class LazyMap extends ConcurrentHashMap {
private final Map> delegate = new HashMap<>();

@Override
public B get(final Object key) {
final B b = super.get(key);
if (b == null) {
final Supplier supplier = delegate.remove(key);
if (supplier != null) {
final B value = supplier.get();
final B existing = putIfAbsent((A) key, value);
return existing != null ? existing : value;
}
}
return b;
}

@Override
public boolean containsKey(Object key) {
return super.containsKey(key) || delegate.containsKey(key);
}

@Override
public boolean containsValue(Object value) {
ensurePopulated();
return super.containsValue(value);
}

@Override
public Set> entrySet() {
ensurePopulated();
return super.entrySet();
}

@Override
public KeySetView keySet() {
ensurePopulated();
return super.keySet();
}

@Override
public Collection values() {
ensurePopulated();
return super.values();
}

private void ensurePopulated() {
delegate.forEach((a, b) -> super.putIfAbsent(a, b.get()));
}
}


Don't think it is crazy but if you judge it is not worth no worries too.
Don't want to make you loose too much time on this, just 

Re: Tomcat 10 digester warnings

2021-01-04 Thread Mark Thomas
On 04/01/2021 11:54, Romain Manni-Bucau wrote:



> Well it also depends the classpath. I fully agree with a plain tomcat not
> customized you don't get much - just have a more consistent usage of
> resources but it is about being purist so not that interesting to me here.
> Now make your classpath bloated - yes some people do ;) but I guess having
> ~100jars is not that rare too these days - and this is really slower, in
> particular with a custom security manager getResource check even if it got
> optimized.

If an app is that bloated then I'd be surprised if the changes proposed
had any noticeable effect.

> Another interesting case is the tomee one or more exactly: the embedded
> one. Let's assume somebody uses tomcat embeded (not rare these days) and
> its dependency resolution comes with a servlet-api which is not tomcat one
> (not rare too), then it can miss the resources because it is a pure API or
> the project it comes from grab them from elsewhere. Why would he get
> warnings if he does not use the validation.

Any valid API JAR needs to include the resources as passing the TCK
requires validation to be enabled - and that requires the schemas to be
available. The current consensus of the Jakarta Servlet team is that the
Servlet API JAR must include all the schema required for validation.

Error messages where the root cause is an invalid API JAR are not
something that concerns me.

(As an aside, due to the dependencies between the schema it means that
the Servlet API JAR has to include all the schema for the JSP API as
well - they can't be decoupled into the JSP API JAR).

> So to summarize I see two benefits to update a bit the impl:
> 
> 1. Makes it smoother for embedded case (where validation is rare)
> 2. Makes is a bit faster for highly customized standalone instances
> (enterprise prebundled server/layer)
> 
> Indeed I don't expect - can be done technically but hope it is never true -
> both to be true at the same time :s.

1 assumes an invalid API JAR so I don't consider that to be a valid use
case.

2 looks to be the most likely (only?) use case where there is potential
benefit to performing this optimisation

>> The Maps are public but they aren't part of what we consider to be
>> Tomcat's public API so while this would technically be an API change, I
>> don't think that is a reason not to do this (or limit it to 10.1.x)
> 
> Not sure I got this, it is just about chaging the internals of the map so
> API does not break (like overriding get to be lazy but not changing the
> generics).

Ah. I was thinking of a simpler implementation that switched the
Map to Map) or whatever results
in the simplest code.

That brings us back to the complexity vs performance benefit question.

Mark

-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: Tomcat 10 digester warnings

2021-01-04 Thread Romain Manni-Bucau
Le lun. 4 janv. 2021 à 12:41, Mark Thomas  a écrit :

> On 04/01/2021 08:59, Romain Manni-Bucau wrote:
> > Hi Mark,
> >
> > The proposal is to move the locationFor to a Supplier instead of
> > eagerly loading the URL (means the maps passed to the resolver override
> the
> > read part to handle it) - or a fully lazy map impl works too.
> > Idea is to avoid to be able to hit locationFor() until it is used for the
> > related xsd.
>
> OK. Now I think I understand the proposal.
>
> > Prefetching = preresolution if you prefer. Overall the goal is to resolve
> > only when used. Most of the xsd are always useless so no real reason to
> > have them ready and consume time and memory for nothing + log warning if
> > irrelevant for the app (some filter out the xsd from jars cause it saves
> > ~1M and does not change at all the runtime for them).
>
> You are looking at the uncompressed size of the schemas (currently 1.3M
> with Tomcat 10). The compressed size is ~230k.
>
> How important is it to save that 230k of storage? As much as I'd like it
> to be the case that every system strives for efficient use of all
> resources my experience is that, for storage at least, that sort of
> storage saving is viewed as insignificant even at the uncompressed size.
>
> A quick test suggests that the static initializer in DigesterFactory
> takes about 4.5ms to complete. A profiler indicates that the memory
> footprint of those two maps are 10.6k and 1.3k respectively.
>
> How much time and memory would typically be saved by this proposal?
>
> Like any proposed performance improvement, this is going to come done to
> comparing the increase in complexity with the expected performance
> benefits.
>

Well it also depends the classpath. I fully agree with a plain tomcat not
customized you don't get much - just have a more consistent usage of
resources but it is about being purist so not that interesting to me here.
Now make your classpath bloated - yes some people do ;) but I guess having
~100jars is not that rare too these days - and this is really slower, in
particular with a custom security manager getResource check even if it got
optimized.

Another interesting case is the tomee one or more exactly: the embedded
one. Let's assume somebody uses tomcat embeded (not rare these days) and
its dependency resolution comes with a servlet-api which is not tomcat one
(not rare too), then it can miss the resources because it is a pure API or
the project it comes from grab them from elsewhere. Why would he get
warnings if he does not use the validation.

So to summarize I see two benefits to update a bit the impl:

1. Makes it smoother for embedded case (where validation is rare)
2. Makes is a bit faster for highly customized standalone instances
(enterprise prebundled server/layer)

Indeed I don't expect - can be done technically but hope it is never true -
both to be true at the same time :s.


>
> The Maps are public but they aren't part of what we consider to be
> Tomcat's public API so while this would technically be an API change, I
> don't think that is a reason not to do this (or limit it to 10.1.x)
>

Not sure I got this, it is just about chaging the internals of the map so
API does not break (like overriding get to be lazy but not changing the
generics).


>
> Mark
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: dev-h...@tomcat.apache.org
>
>


Re: Tomcat 10 digester warnings

2021-01-04 Thread Mark Thomas
On 04/01/2021 08:59, Romain Manni-Bucau wrote:
> Hi Mark,
> 
> The proposal is to move the locationFor to a Supplier instead of
> eagerly loading the URL (means the maps passed to the resolver override the
> read part to handle it) - or a fully lazy map impl works too.
> Idea is to avoid to be able to hit locationFor() until it is used for the
> related xsd.

OK. Now I think I understand the proposal.

> Prefetching = preresolution if you prefer. Overall the goal is to resolve
> only when used. Most of the xsd are always useless so no real reason to
> have them ready and consume time and memory for nothing + log warning if
> irrelevant for the app (some filter out the xsd from jars cause it saves
> ~1M and does not change at all the runtime for them).

You are looking at the uncompressed size of the schemas (currently 1.3M
with Tomcat 10). The compressed size is ~230k.

How important is it to save that 230k of storage? As much as I'd like it
to be the case that every system strives for efficient use of all
resources my experience is that, for storage at least, that sort of
storage saving is viewed as insignificant even at the uncompressed size.

A quick test suggests that the static initializer in DigesterFactory
takes about 4.5ms to complete. A profiler indicates that the memory
footprint of those two maps are 10.6k and 1.3k respectively.

How much time and memory would typically be saved by this proposal?

Like any proposed performance improvement, this is going to come done to
comparing the increase in complexity with the expected performance benefits.

The Maps are public but they aren't part of what we consider to be
Tomcat's public API so while this would technically be an API change, I
don't think that is a reason not to do this (or limit it to 10.1.x)

Mark

-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: Tomcat 10 digester warnings

2021-01-04 Thread Romain Manni-Bucau
Hi Mark,

The proposal is to move the locationFor to a Supplier instead of
eagerly loading the URL (means the maps passed to the resolver override the
read part to handle it) - or a fully lazy map impl works too.
Idea is to avoid to be able to hit locationFor() until it is used for the
related xsd.
Prefetching = preresolution if you prefer. Overall the goal is to resolve
only when used. Most of the xsd are always useless so no real reason to
have them ready and consume time and memory for nothing + log warning if
irrelevant for the app (some filter out the xsd from jars cause it saves
~1M and does not change at all the runtime for them).

Romain Manni-Bucau
@rmannibucau  |  Blog
 | Old Blog
 | Github  |
LinkedIn  | Book



Le lun. 4 janv. 2021 à 09:25, Mark Thomas  a écrit :

> On 03/01/2021 16:06, Romain Manni-Bucau wrote:
>
> 
>
> > So thought it was a Tomcat 10 issue but seems it is a TomEE 9 one (which
> > wrongly repackaged servlet-api jar).
>
> Indeed.
>
> > So overall Tomcat 10 is ok, just to follow up on one still relevant
> > question: is it worth doing all these lookups lazily (in a concurrent
> hash
> > map or so)?
>
> Sorry, I don't understand the question or your reference the the
> Digester "pre-fetching" schemas at start-up in your original post.
>
> The digester is configured with a custom EntityResolver2 that maps all
> known Java EE and Jakarta EE schemas to the local copies of those schema
> that ship with Tomcat. No schema is loaded until the digester finds a
> reference to it at which point the XML parser loads it via the standard
> EntityResolver2 mechanism.
>
> There is no pre-fetching or pre-loading of schemas.
>
> Mark
>
>
> >
> > Romain Manni-Bucau
> > @rmannibucau  |  Blog
> >  | Old Blog
> >  | Github <
> https://github.com/rmannibucau> |
> > LinkedIn  | Book
> > <
> https://www.packtpub.com/application-development/java-ee-8-high-performance
> >
> >
> >
> > Le dim. 3 janv. 2021 à 16:44, Mark Thomas  a écrit :
> >
> >> On 03/01/2021 10:35, Romain Manni-Bucau wrote:
> >>> Hi all
> >>>
> >>> Digester prefetches most ee schemas but ee 9 bundle does miss most of
> >> them
> >>> leading to a lot of warnings at startup.
> >>> Should log level be reduced or fetching be done lazily?
> >>
> >> Sorry, I am unable to understand the problem you are describing. What
> >> are the steps to recreate it from a clean Tomcat 10 install?
> >>
> >> Mark
> >>
> >> -
> >> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
> >> For additional commands, e-mail: dev-h...@tomcat.apache.org
> >>
> >>
> >
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: dev-h...@tomcat.apache.org
>
>


Re: Tomcat 10 digester warnings

2021-01-04 Thread Mark Thomas
On 03/01/2021 16:06, Romain Manni-Bucau wrote:



> So thought it was a Tomcat 10 issue but seems it is a TomEE 9 one (which
> wrongly repackaged servlet-api jar).

Indeed.

> So overall Tomcat 10 is ok, just to follow up on one still relevant
> question: is it worth doing all these lookups lazily (in a concurrent hash
> map or so)?

Sorry, I don't understand the question or your reference the the
Digester "pre-fetching" schemas at start-up in your original post.

The digester is configured with a custom EntityResolver2 that maps all
known Java EE and Jakarta EE schemas to the local copies of those schema
that ship with Tomcat. No schema is loaded until the digester finds a
reference to it at which point the XML parser loads it via the standard
EntityResolver2 mechanism.

There is no pre-fetching or pre-loading of schemas.

Mark


> 
> Romain Manni-Bucau
> @rmannibucau  |  Blog
>  | Old Blog
>  | Github  |
> LinkedIn  | Book
> 
> 
> 
> Le dim. 3 janv. 2021 à 16:44, Mark Thomas  a écrit :
> 
>> On 03/01/2021 10:35, Romain Manni-Bucau wrote:
>>> Hi all
>>>
>>> Digester prefetches most ee schemas but ee 9 bundle does miss most of
>> them
>>> leading to a lot of warnings at startup.
>>> Should log level be reduced or fetching be done lazily?
>>
>> Sorry, I am unable to understand the problem you are describing. What
>> are the steps to recreate it from a clean Tomcat 10 install?
>>
>> Mark
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
>> For additional commands, e-mail: dev-h...@tomcat.apache.org
>>
>>
> 


-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: Tomcat 10 digester warnings

2021-01-03 Thread Romain Manni-Bucau
Hmm, maybe it is a packaging issue. Testing TomEE 9 patched with Tomcat 10
binaries I got:

03-Jan-2021 17:00:57.997 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [XMLSchema.dtd] could not be found. This is very likely to break XML
validation if XML validation is enabled.
03-Jan-2021 17:00:57.997 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [datatypes.dtd] could not be found. This is very likely to break XML
validation if XML validation is enabled.
03-Jan-2021 17:00:57.997 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [xml.xsd] could not be found. This is very likely to break XML
validation if XML validation is enabled.
03-Jan-2021 17:00:57.998 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [web-app_2_2.dtd] could not be found. This is very likely to break
XML validation if XML validation is enabled.
03-Jan-2021 17:00:57.998 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [web-jsptaglibrary_1_1.dtd] could not be found. This is very likely
to break XML validation if XML validation is enabled.
03-Jan-2021 17:00:57.999 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [web-app_2_3.dtd] could not be found. This is very likely to break
XML validation if XML validation is enabled.
03-Jan-2021 17:00:57.999 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [web-jsptaglibrary_1_2.dtd] could not be found. This is very likely
to break XML validation if XML validation is enabled.
03-Jan-2021 17:00:57.999 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [j2ee_web_services_1_1.xsd] could not be found. This is very likely
to break XML validation if XML validation is enabled.
03-Jan-2021 17:00:58.000 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [j2ee_web_services_client_1_1.xsd] could not be found. This is very
likely to break XML validation if XML validation is enabled.
03-Jan-2021 17:00:58.000 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [web-app_2_4.xsd] could not be found. This is very likely to break
XML validation if XML validation is enabled.
03-Jan-2021 17:00:58.001 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [web-jsptaglibrary_2_0.xsd] could not be found. This is very likely
to break XML validation if XML validation is enabled.
03-Jan-2021 17:00:58.001 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [j2ee_1_4.xsd] could not be found. This is very likely to break XML
validation if XML validation is enabled.
03-Jan-2021 17:00:58.002 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [jsp_2_0.xsd] could not be found. This is very likely to break XML
validation if XML validation is enabled.
03-Jan-2021 17:00:58.002 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [web-app_2_5.xsd] could not be found. This is very likely to break
XML validation if XML validation is enabled.
03-Jan-2021 17:00:58.003 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [web-jsptaglibrary_2_1.xsd] could not be found. This is very likely
to break XML validation if XML validation is enabled.
03-Jan-2021 17:00:58.003 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [javaee_5.xsd] could not be found. This is very likely to break XML
validation if XML validation is enabled.
03-Jan-2021 17:00:58.003 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [jsp_2_1.xsd] could not be found. This is very likely to break XML
validation if XML validation is enabled.
03-Jan-2021 17:00:58.004 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [javaee_web_services_1_2.xsd] could not be found. This is very
likely to break XML validation if XML validation is enabled.
03-Jan-2021 17:00:58.004 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [javaee_web_services_client_1_2.xsd] could not be found. This is
very likely to break XML validation if XML validation is enabled.
03-Jan-2021 17:00:58.004 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [web-app_3_0.xsd] could not be found. This is very likely to break
XML validation if XML validation is enabled.
03-Jan-2021 17:00:58.005 WARNING [main]
org.apache.tomcat.util.descriptor.DigesterFactory.locationFor The XML
schema [web-fragment_3_0.xsd] could not be found. This is very likely to
break XML validation if XML validation is enabled.
03-Jan-2021 

Re: Tomcat 10 digester warnings

2021-01-03 Thread Mark Thomas
On 03/01/2021 10:35, Romain Manni-Bucau wrote:
> Hi all
> 
> Digester prefetches most ee schemas but ee 9 bundle does miss most of them
> leading to a lot of warnings at startup.
> Should log level be reduced or fetching be done lazily?

Sorry, I am unable to understand the problem you are describing. What
are the steps to recreate it from a clean Tomcat 10 install?

Mark

-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Tomcat 10 digester warnings

2021-01-03 Thread Romain Manni-Bucau
Hi all

Digester prefetches most ee schemas but ee 9 bundle does miss most of them
leading to a lot of warnings at startup.
Should log level be reduced or fetching be done lazily?