Re: [all] OSS Fuzz

2021-04-13 Thread Bruno P. Kinoshita
+1 for OSS Fuzz. Fabian also got in contact a few days earlier and asked me
about using it with Commons Imaging. I told him it had to be discussed here
first, but that I thought it could be useful (we are parsing several image file
formats, so probably a few things could be improved).


As for the mailing list, for me it depends on the number of messages and on
the false-positive rate. For example, if we get 50 e-mails in security@commons
in one week and it turns out only 1 is actually a security issue, while the
others are either normal bugs or not bugs at all, then eventually I think I'd
just create a filter to move all the security@commons mail to a folder and
have a look someday.


I think we don't have any idea how many e-mails we might get after enabling it
for one or for a few components. So I'd be OK with

- sending e-mails to security@commons initially, but if it spams the list with 
non-security related e-mails, then move to a separate mailing list; OR
- create the new mailing list (probably private too? until we filter the 
issues?) and use it for a few weeks/months. If the traffic is low, or most 
issues are really security related, then move to security@commons if others 
agree

Either way would be OK for me.

Cheers
Bruno


On Wednesday, 14 April 2021, 4:49:31 am NZST, Stefan Bodewig wrote:
 
 Hi all

I want to pick up (and finish) the discussion that started in
Compress[1].

Short Recap:


OSS Fuzz[2] runs fuzz testing for open source projects by invoking
methods of our code with random data, looking for unexpected outcomes
(undeclared exceptions or, worse, code that never returns because it is
stuck in an infinite loop, for example).

For Compress Fabian (who started [1]) has already identified and
reported several issues, one of which would have become a CVE if the
code in question had been part of any release of Compress. In the past
other people have run different fuzzers and found "interesting" results
in Compress as well.

Compress may be especially vulnerable as it basically tries to make
sense out of a bunch of user supplied bytes - but the same is probably
true for codec or imaging for example.

Fabian has offered to set up OSS Fuzz for Compress. Given that the
issues OSS Fuzz detects may or may not be security sensitive, I don't
feel it would be a good idea to have the tool send reports to a public
mailing list. Therefore I propose to create another subscription
moderated list just for these kinds of reports. I'm afraid it could be
too noisy for security@commons.

Proposal


Unless anybody objects before then, I will create such a list (I believe
there is a self-service thingy for that, otherwise I'll ask the infra
folks) on the coming Sunday. I'd add myself as a moderator but we will
need more moderators. Also I'll gladly accept ideas for the name of the
list.

If there are objections against yet another mailing list I'll ask Fabian
to set things up using a private mail alias. If you want to receive the
messages as well, please tell me.

Cheers

        Stefan

[1] 
https://lists.apache.org/thread.html/rb34ea7d9272b8e600437ea705b13aba1bcc2f23ceb55880bce27e479%40%3Cdev.commons.apache.org%3E

[2] https://google.github.io/oss-fuzz/

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org

  

Re: [Vote] Create a "machine learning" component

2021-04-13 Thread Gilles Sadowski
On Tue, 13 Apr 2021 at 18:21, Avijit Basak wrote:
>
> Hi
>
>   Please find my comments below.
>
> >> I don't follow the distinction "prod" vs "non-prod".
>  -- Actually in Prod we really need a very high performing system. So
> use of implicit parallelism in Spark would help us to achieve it. But for
> other types of work like POC or R&D we may not need such performance.

Isn't a GA inherently parallel?
If so, why not take advantage of the concurrency tools provided by the JDK?
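
To make the question concrete, here is a minimal sketch of what "JDK concurrency tools" could mean for a GA: the fitness of each individual is independent of the others, so a population can be scored with a parallel stream. Everything named here (`ParallelFitnessSketch`, `evaluate`, the OneMax fitness `oneMax`) is hypothetical and is not taken from Commons Math or from the proposed code; it only assumes that the fitness function is stateless and thread-safe.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Hypothetical illustration only: not the API of Commons Math or of the proposed component.
final class ParallelFitnessSketch {

    // Scores every individual independently. parallelStream() splits the work
    // across the common ForkJoinPool; for a List source the result keeps the
    // population's order, so scores[i] belongs to population.get(i).
    static <T> double[] evaluate(List<T> population, ToDoubleFunction<? super T> fitness) {
        return population.parallelStream().mapToDouble(fitness).toArray();
    }

    // Toy "OneMax" fitness (assumed for the example): the number of 1 bits in a chromosome.
    static double oneMax(int[] chromosome) {
        return Arrays.stream(chromosome).sum();
    }

    public static void main(String[] args) {
        List<int[]> population = Arrays.asList(
                new int[] {1, 0, 1},
                new int[] {1, 1, 1},
                new int[] {0, 0, 0});
        double[] scores = evaluate(population, ParallelFitnessSketch::oneMax);
        System.out.println(Arrays.toString(scores));
    }
}
```

Selection, crossover and mutation would still need their own treatment; the sketch only covers the evaluation step, which is usually the most expensive one.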

> >> the question was actually whether you are willing to modularize CM
>  -- I am not much aware of other ML components in Commons. I would look
> into it.

I've mentioned them in earlier messages:
 * Self-organizing feature map (artificial neural net)
 * Clustering

The former is multi-threaded; the latter should be refactored to
take advantage of multi-threading.

> >>You did not expand about the usability/performance (e.g. the issue of
> multi-threading)
>  -- Are we planning to incorporate parallel GA?

Aren't you?

> Then multi-threading
> would be a more appropriate option.

IMHO, a necessary one.

> >> So, as a way forward, I would suggest that you create a project on
> GitHub (copying all the settings from a *Commons modular* component, such as
> "Commons Numbers")
>  -- Could you kindly share the GitHub repository URL for any Commons
> modular component.

https://github.com/apache/commons-rng
https://github.com/apache/commons-numbers
https://github.com/apache/commons-geometry
https://github.com/apache/commons-statistics

>
> Thanks & Regards
> --Avijit Basak
>
>
> On Tue, 13 Apr 2021 at 18:29, Gilles Sadowski wrote:
>
> > Hello.
> >
> > On Mon, 12 Apr 2021 at 17:21, Avijit Basak wrote:
> > >
> > > Hi
> > >
> > >  Sorry for the delayed response. Thanks for your patience. Please
> > > find my comments below:
> > >
> > >  (1) Why not Spark?  [At least post over there (?).]
> > >   --We can move to Spark. But it will be very much useful if the things
> > > can also run without Spark. The use of Spark would make more sense in a
> > > production environment. But the portability of the library will be more
> > > useful for the non-prod environment.
> >
> > I don't follow the distinction "prod" vs "non-prod".
> >
> > > Definitely, we can reach the Spark
> > > team and query.
> >
> > That would be a good idea...
> >
> > >  (2) Further develop a monolithic CM?  [Who will do it?]
> > >   --I can help with the upgrade of the existing library related to GA
> > > functionality.
> >
> > Sure, but nobody is currently working on (2).
> >
> > >  (3) Modularize CM? [Who will do it?]
> > >   --I can help with the upgrade of the existing library related to GA
> > > functionality.
> >
> > I don't doubt it; but the question was actually whether you are willing
> > to modularize CM (that is: in addition to, and before, contributing to
> > the GA functionality).
> >
> > >  (4) New component (with another name) with the proposed contents?
> > >--This is the best option if permitted.
> >
> > Currently, only the two of us are in favour of this alternative.
> >
> > Nobody, by their action, is really in favour of any of the other
> > alternatives.
> > So, as a way forward, I would suggest that you create a project on GitHub
> > (copying all the settings from a Commons modular component, such as
> > "Commons Numbers"), to be eventually integrated here, once its potential
> > has been demonstrated.
> >
> > >   The code which I have written can be reused with minor modifications.
> > > So it won't take too much effort for this activity.
> >
> > You did not expand about the usability/performance (e.g. the issue of
> > multi-threading)...
> >
> > Regards,
> > Gilles
> >
> > >> [...]
> >




Re: [all] OSS Fuzz

2021-04-13 Thread Gary Gregory
Please don't use @security for automated emails; that ML, IMO, should be for
humans.

If you want to set up a new ML for bots, that's fine; we can direct GitHub's
Dependabot emails there if GitHub allows for that.

Gary

On Tue, Apr 13, 2021, 12:57 Mark Thomas wrote:

> On 13/04/2021 17:49, Stefan Bodewig wrote:
>
> 
>
> > Fabian has offered to set up OSS Fuzz for Compress. Given that the
> > issues OSS Fuzz detects may or may not be security sensitive, I don't
> > feel it would be a good idea to have the tool send reports to a public
> > mailing list. Therefore I propose to create another subscription
> > moderated list just for these kinds of reports. I'm afraid it could be
> > too noisy for security@commons.
>
> Following the "split by audience, not by topic" guideline, I'd suggest
> using security@commons.a.o rather than a separate list. Much, much
> bigger projects than Compress use OSS Fuzz and direct traffic to their
> security list where it seems to be manageable.
>
> Mark
>
>
>


Re: [all] OSS Fuzz

2021-04-13 Thread Mark Thomas

On 13/04/2021 17:49, Stefan Bodewig wrote:




> Fabian has offered to set up OSS Fuzz for Compress. Given that the
> issues OSS Fuzz detects may or may not be security sensitive, I don't
> feel it would be a good idea to have the tool send reports to a public
> mailing list. Therefore I propose to create another subscription
> moderated list just for these kinds of reports. I'm afraid it could be
> too noisy for security@commons.


Following the "split by audience, not by topic" guideline, I'd suggest 
using security@commons.a.o rather than a separate list. Much, much 
bigger projects than Compress use OSS Fuzz and direct traffic to their 
security list where it seems to be manageable.


Mark




[all] OSS Fuzz

2021-04-13 Thread Stefan Bodewig
Hi all

I want to pick up (and finish) the discussion that started in
Compress[1].

Short Recap:


OSS Fuzz[2] runs fuzz testing for open source projects by invoking
methods of our code with random data, looking for unexpected outcomes
(undeclared exceptions or, worse, code that never returns because it is
stuck in an infinite loop, for example).
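
For readers unfamiliar with the setup, a fuzz target of this kind can be sketched roughly as follows. This is a hypothetical illustration, not Fabian's harness. The entry-point name `fuzzerTestOneInput` follows the convention used by Jazzer, the JVM fuzzing engine that OSS Fuzz relies on. `java.util.zip.GZIPInputStream` stands in for a Compress decoder so that the sketch needs only the JDK, and the `countFindings` driver with plain random bytes is an assumed stand-in for the real coverage-guided engine, which would also take care of timeouts for code that never returns.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Random;
import java.util.zip.GZIPInputStream;

// Hypothetical sketch: the class name and the driver are illustrative assumptions.
final class FuzzTargetSketch {

    // Feeds arbitrary bytes to a decoder. IOException is the declared way to
    // reject malformed input, so it is swallowed; any other exception
    // propagates to the caller and counts as a finding.
    public static void fuzzerTestOneInput(byte[] data) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[4096];
            while (in.read(buffer) != -1) {
                // Drain the stream; only the decoder's behaviour matters here.
            }
        } catch (IOException expected) {
            // Declared failure mode for bad input: not a finding.
        }
    }

    // Stand-in for the fuzzing engine: tries random inputs and counts the
    // ones that end in an undeclared (runtime) exception.
    static int countFindings(long seed, int iterations) {
        Random random = new Random(seed);
        int findings = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] data = new byte[random.nextInt(64)];
            random.nextBytes(data);
            try {
                fuzzerTestOneInput(data);
            } catch (RuntimeException undeclared) {
                findings++;
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        System.out.println("findings: " + countFindings(42L, 1_000));
    }
}
```

Whether a given finding is a plain bug or a security issue still has to be decided by a human, which is what the question of where to send the reports is about.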

For Compress Fabian (who started [1]) has already identified and
reported several issues, one of which would have become a CVE if the
code in question had been part of any release of Compress. In the past
other people have run different fuzzers and found "interesting" results
in Compress as well.

Compress may be especially vulnerable as it basically tries to make
sense out of a bunch of user supplied bytes - but the same is probably
true for codec or imaging for example.

Fabian has offered to set up OSS Fuzz for Compress. Given that the
issues OSS Fuzz detects may or may not be security sensitive, I don't
feel it would be a good idea to have the tool send reports to a public
mailing list. Therefore I propose to create another subscription
moderated list just for these kinds of reports. I'm afraid it could be
too noisy for security@commons.

Proposal


Unless anybody objects before then, I will create such a list (I believe
there is a self-service thingy for that, otherwise I'll ask the infra
folks) on the coming Sunday. I'd add myself as a moderator but we will
need more moderators. Also I'll gladly accept ideas for the name of the
list.

If there are objections against yet another mailing list I'll ask Fabian
to set things up using a private mail alias. If you want to receive the
messages as well, please tell me.

Cheers

Stefan

[1] 
https://lists.apache.org/thread.html/rb34ea7d9272b8e600437ea705b13aba1bcc2f23ceb55880bce27e479%40%3Cdev.commons.apache.org%3E

[2] https://google.github.io/oss-fuzz/




Re: [Vote] Create a "machine learning" component

2021-04-13 Thread Avijit Basak
Hi

  Please find my comments below.

>> I don't follow the distinction "prod" vs "non-prod".
 -- Actually in Prod we really need a very high performing system. So
use of implicit parallelism in Spark would help us to achieve it. But for
other types of work like POC or R&D we may not need such performance.
>> the question was actually whether you are willing to modularize CM
 -- I am not much aware of other ML components in Commons. I would look
into it.
>>You did not expand about the usability/performance (e.g. the issue of
multi-threading)
 -- Are we planning to incorporate parallel GA? Then multi-threading
would be a more appropriate option.
>> So, as a way forward, I would suggest that you create a project on
GitHub (copying all the settings from a *Commons modular* component, such as
"Commons Numbers")
 -- Could you kindly share the GitHub repository URL for any Commons
modular component.

Thanks & Regards
--Avijit Basak


On Tue, 13 Apr 2021 at 18:29, Gilles Sadowski wrote:

> Hello.
>
> On Mon, 12 Apr 2021 at 17:21, Avijit Basak wrote:
> >
> > Hi
> >
> >  Sorry for the delayed response. Thanks for your patience. Please
> > find my comments below:
> >
> >  (1) Why not Spark?  [At least post over there (?).]
> >   --We can move to Spark. But it will be very much useful if the things
> > can also run without Spark. The use of Spark would make more sense in a
> > production environment. But the portability of the library will be more
> > useful for the non-prod environment.
>
> I don't follow the distinction "prod" vs "non-prod".
>
> > Definitely, we can reach the Spark
> > team and query.
>
> That would be a good idea...
>
> >  (2) Further develop a monolithic CM?  [Who will do it?]
> >   --I can help with the upgrade of the existing library related to GA
> > functionality.
>
> Sure, but nobody is currently working on (2).
>
> >  (3) Modularize CM? [Who will do it?]
> >   --I can help with the upgrade of the existing library related to GA
> > functionality.
>
> I don't doubt it; but the question was actually whether you are willing
> to modularize CM (that is: in addition to, and before, contributing to
> the GA functionality).
>
> >  (4) New component (with another name) with the proposed contents?
> >--This is the best option if permitted.
>
> Currently, only the two of us are in favour of this alternative.
>
> Nobody, by their action, is really in favour of any of the other
> alternatives.
> So, as a way forward, I would suggest that you create a project on GitHub
> (copying all the settings from a Commons modular component, such as
> "Commons Numbers"), to be eventually integrated here, once its potential
> has been demonstrated.
>
> >   The code which I have written can be reused with minor modifications.
> > So it won't take too much effort for this activity.
>
> You did not expand about the usability/performance (e.g. the issue of
> multi-threading)...
>
> Regards,
> Gilles
>
> >> [...]
>
>
>

-- 
Avijit Basak


Re: [Vote] Create a "machine learning" component

2021-04-13 Thread Gilles Sadowski
Hello.

On Mon, 12 Apr 2021 at 17:21, Avijit Basak wrote:
>
> Hi
>
>  Sorry for the delayed response. Thanks for your patience. Please
> find my comments below:
>
>  (1) Why not Spark?  [At least post over there (?).]
>   --We can move to Spark. But it will be very much useful if the things
> can also run without Spark. The use of Spark would make more sense in a
> production environment. But the portability of the library will be more
> useful for the non-prod environment.

I don't follow the distinction "prod" vs "non-prod".

> Definitely, we can reach the Spark
> team and query.

That would be a good idea...

>  (2) Further develop a monolithic CM?  [Who will do it?]
>--I can help with the upgrade of the existing library related to GA
> functionality.

Sure, but nobody is currently working on (2).

>  (3) Modularize CM? [Who will do it?]
>--I can help with the upgrade of the existing library related to GA
> functionality.

I don't doubt it; but the question was actually whether you are willing
to modularize CM (that is: in addition to, and before, contributing to
the GA functionality).

>  (4) New component (with another name) with the proposed contents?
>--This is the best option if permitted.

Currently, only the two of us are in favour of this alternative.

Nobody, by their action, is really in favour of any of the other alternatives.
So, as a way forward, I would suggest that you create a project on GitHub
(copying all the settings from a Commons modular component, such as
"Commons Numbers"), to be eventually integrated here, once its potential
has been demonstrated.

>   The code which I have written can be reused with minor modifications.
> So it won't take too much effort for this activity.

You did not expand about the usability/performance (e.g. the issue of
multi-threading)...

Regards,
Gilles

>> [...]




Re: [lang] Failing test on Java 16-EA.

2021-04-13 Thread Jaikiran Pai
Hello Gary,

I had a look at this one and was able to reproduce it. Based on my reading
of the code and what it does, IMO this is a JDK issue. Since it was
previously reported on this list[1] and a JDK issue was created for it
(https://bugs.openjdk.java.net/browse/JDK-8262108), I decided to reopen that
issue and have included the details of my investigation there.
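
One hedged way to check whether a failure of this kind comes from the JDK rather than from library code is to reproduce it with JDK classes only. The sketch below is an illustration built on assumptions, not the analysis attached to the JDK issue: it supposes the symptom is a format/parse round trip that breaks for some locale, and the class name `JdkRoundTripSketch`, the pattern and the timestamp are all hypothetical choices.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical triage helper: uses java.text only, no Commons Lang classes.
final class JdkRoundTripSketch {

    // Formats the date and parses the text back with the same formatter.
    // Returns false if parsing fails or yields a different instant.
    static boolean roundTrips(Locale locale, String pattern, Date date) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, locale);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        String text = format.format(date);
        try {
            return format.parse(text).equals(date);
        } catch (ParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Whole seconds, so a pattern without milliseconds loses nothing.
        Date date = new Date(1_600_000_000_000L);
        for (Locale locale : Locale.getAvailableLocales()) {
            if (!roundTrips(locale, "yyyy/MMMM/dd HH:mm:ss", date)) {
                System.out.println("JDK-only round trip failed for " + locale);
            }
        }
    }
}
```

If the JDK-only round trip already fails for a locale, a library test that compares its own parser against `SimpleDateFormat` for that locale cannot be expected to pass.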

[1] https://www.mail-archive.com/dev@commons.apache.org/msg70599.html

P.S: I'm not subscribed to this commons dev mailing list and I just watch/reply 
from the Apache mailing list tools, so my responses might be delayed.

-Jaikiran

On 2021/03/28 17:17:13, Gary Gregory wrote:
> I'm still looking for help on getting LANG working on Java 16...
> 
> Gary
> 
> On Sat, Mar 20, 2021, 21:39 Gary Gregory wrote:
> 
> > Now that Java 16 is out, we really need to look at this IMO but I would
> > like help from the community.
> >
> > My initial guess that this is a JDK bug might be wrong and it could be an
> > issue in our code.
> >
> > Gary
> >
> > On Tue, Feb 23, 2021, 22:13 Gary Gregory wrote:
> >
> >> Hi All:
> >>
> >> If you feel so inclined, I'd like help with
> >> FastDateParserTest.java#testParsesKnownJava16Ea25Failure().
> >>
> >> The test fails on Java 16 Early Access build 25 and above; I am now
> >> testing with build 36.
> >>
> >> I cannot tell if this is a bug in our code or in the underlying JRE.
> >>
> >> TY!
> >> Gary
> >>
> >
> 
