Re: [ALL] Volunteers for a Math IPMC?
On Sat, Jun 18, 2016 at 4:29 AM, Gilles <gil...@harfang.homelinux.org> wrote: > ... > I'm asking, again, whether I need to initiate a VOTE that would allow me > to set up a workspace ("git", etc.) and transfer some code from CM over > there. > Nothing is stopping you from setting something up. GitHub is usually the easiest way. It doesn't sound like that is what you want, however. I don't understand why not. > > It may be that incubation is a good thing for Commons Math, but it doesn't >> seem valid to say that incubation is necessary because CM is being kicked >> out of Commons. >> > > Never said so. > Hmm... I must have misunderstood the comment about CM not being interested in hosting "these components". > There is a confusion here: *I* say that CM is dead. > Strong words. Such statements are often frustrating to others. It does sound like the community has dwindled, perhaps beyond repair. The development situation *will* change because the context *has* changed > (unsupported code). CM cannot go on as it did before the fork. > You can never go home. No project stays the same. > Everybody (developers, users, Commons PMC) would be better off with a > selected set of new (supported) components because this is something we > can easily do *now* (RERO, etc.). > This was your assertion in the long email thread. It seemed that there were significant counter-positions. > I'm OK to go through the incubator to do that; but I don't see that it > is an easier path. Surely it looks longer. And it seems that even the > incubator people doubt that it will lead anywhere. > The incubator is for building community and adapting to Apache. If you don't have a seed community, then the incubator is the wrong place. You need to have more than just you. > > Given the uncertain outcome, going through the incubator would be an > attempt at rethinking the development of the currently unsupported > code. See e.g. 
> https://issues.apache.org/jira/browse/MATH-172 > [Or is that out of scope for an incubation proposal?] Incubator is not a place to rethink code. It is primarily for building community. > > > > Gilles > > > On Fri, Jun 17, 2016 at 3:35 PM, Gilles <gil...@harfang.homelinux.org> >> wrote: >> >> On Fri, 17 Jun 2016 08:51:36 -0700, Ted Dunning wrote: >>> >>> Excuse me? >>>> >>>> See inline. >>>> >>>> >>>> >>>> On Fri, Jun 17, 2016 at 7:50 AM, Gilles <gil...@harfang.homelinux.org> >>>> wrote: >>>> >>>> Hi all. >>>> >>>>> >>>>> On Tue, 14 Jun 2016 11:01:13 -0700, Ralph Goers wrote: >>>>> >>>>> I thought this had been made clear. Several months Commons voted to >>>>> >>>>>> make Math a TLP. But shortly after that most of the people involved >>>>>> with Commons Math felt that a TLP at the ASF would not work for them, >>>>>> so they forked the project and left, effectively voiding the TLP vote >>>>>> since the proposed PMC is no longer valid. There is one person left >>>>>> who was very involved in Commons Math and a few other people who have >>>>>> expressed interest in joining the new community. >>>>>> >>>>>> So this is a situation where we have an already existing code base >>>>>> where a lot of the people left are not familiar with quite a bit of >>>>>> it. The new group of people who are interested are trying to >>>>>> determine how they should move forward. There is some talk of breaking >>>>>> Commons Math into smaller components and possibly dropping some where >>>>>> there is no one to maintain it. >>>>>> >>>>>> >>>>>> The "Commons" project not being interested in hosting those >>>>> components, >>>>> is the "incubator" a good place for the developers wishing to go in >>>>> that >>>>> direction? >>>>> >>>>> >>>>> Perhaps before we move to next steps, could you provide some links to >>>> the >>>> discussion where it was decided that Commons is not interested in >>>> hosting >>>> these components? 
>>>> >>>> >>> I proposed to concretely examine this possibility in more than >>> one message: >>> http://markmail.org/message/ye6wvqvlvnqe4qrp >>> http://markmail.org/message/3gupcednhqtcfepw >>> http://markmail.org/message/3kob7djjicax6rgn >>> http://markmail.org/message/7rb2mxq7hhwzykvr >>> >>> And again in another thread: >>> http://markmail.org/message/fnlta2ttfne3aj5f >>> >>> >>> What's the next step? >>>>> >>>>> >>>>> Let's get to a common understanding of what went before. >>>> >>>> >>> Even that seems impossible. :-( >>> >>> >>> Gilles >>> >>> >>> >>> >>> - >>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org >>> For additional commands, e-mail: general-h...@incubator.apache.org >>> >>> >>> > > - > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > For additional commands, e-mail: general-h...@incubator.apache.org > >
Re: [ALL] Volunteers for a Math IPMC?
Gilles, Thanks for links. I just read that (long-winded) thread and I see no consensus that "Commons project is not being interested in hosting those components". It may be that incubation is a good thing for Commons Math, but it doesn't seem valid to say that incubation is necessary because CM is being kicked out of Commons. On Fri, Jun 17, 2016 at 3:35 PM, Gilles <gil...@harfang.homelinux.org> wrote: > On Fri, 17 Jun 2016 08:51:36 -0700, Ted Dunning wrote: > >> Excuse me? >> >> See inline. >> >> >> >> On Fri, Jun 17, 2016 at 7:50 AM, Gilles <gil...@harfang.homelinux.org> >> wrote: >> >> Hi all. >>> >>> On Tue, 14 Jun 2016 11:01:13 -0700, Ralph Goers wrote: >>> >>> I thought this had been made clear. Several months Commons voted to >>>> make Math a TLP. But shortly after that most of the people involved >>>> with Commons Math felt that a TLP at the ASF would not work for them, >>>> so they forked the project and left, effectively voiding the TLP vote >>>> since the proposed PMC is no longer valid. There is one person left >>>> who was very involved in Commons Math and a few other people who have >>>> expressed interest in joining the new community. >>>> >>>> So this is a situation where we have an already existing code base >>>> where a lot of the people left are not familiar with quite a bit of >>>> it. The new group of people who are interested are trying to >>>> determine how they should move forward. There is some talk of breaking >>>> Commons Math into smaller components and possibly dropping some where >>>> there is no one to maintain it. >>>> >>>> >>> The "Commons" project not being interested in hosting those components, >>> is the "incubator" a good place for the developers wishing to go in that >>> direction? >>> >>> >> Perhaps before we move to next steps, could you provide some links to the >> discussion where it was decided that Commons is not interested in hosting >> these components? 
>> > > I proposed to concretely examine this possibility in more than > one message: > http://markmail.org/message/ye6wvqvlvnqe4qrp > http://markmail.org/message/3gupcednhqtcfepw > http://markmail.org/message/3kob7djjicax6rgn > http://markmail.org/message/7rb2mxq7hhwzykvr > > And again in another thread: > http://markmail.org/message/fnlta2ttfne3aj5f > > >>> What's the next step? >>> >>> >> Let's get to a common understanding of what went before. >> > > Even that seems impossible. :-( > > > Gilles > > > > - > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > For additional commands, e-mail: general-h...@incubator.apache.org > >
Re: [ALL] Volunteers for a Math IPMC?
Excuse me? See inline. On Fri, Jun 17, 2016 at 7:50 AM, Gilleswrote: > Hi all. > > On Tue, 14 Jun 2016 11:01:13 -0700, Ralph Goers wrote: > >> I thought this had been made clear. Several months Commons voted to >> make Math a TLP. But shortly after that most of the people involved >> with Commons Math felt that a TLP at the ASF would not work for them, >> so they forked the project and left, effectively voiding the TLP vote >> since the proposed PMC is no longer valid. There is one person left >> who was very involved in Commons Math and a few other people who have >> expressed interest in joining the new community. >> >> So this is a situation where we have an already existing code base >> where a lot of the people left are not familiar with quite a bit of >> it. The new group of people who are interested are trying to >> determine how they should move forward. There is some talk of breaking >> Commons Math into smaller components and possibly dropping some where >> there is no one to maintain it. >> > > The "Commons" project not being interested in hosting those components, > is the "incubator" a good place for the developers wishing to go in that > direction? > Perhaps before we move to next steps, could you provide some links to the discussion where it was decided that Commons is not interested in hosting these components? > > What's the next step? > Let's get to a common understanding of what went before.
Re: [ALL] Volunteers for a Math IPMC?
On Wed, Jun 15, 2016 at 10:21 AM, John D. Ament wrote: > Yep absolutely. I don't think the incubator has ever rejected a project? > We have discouraged some submissions. But I have never seen a formal submission be denied.
Re: [ALL] Volunteers for a Math IPMC?
Jochen, The need to build the community (nearly) from scratch is definitely NOT a reason for rejection. It is simply a risk factor that must be mitigated to succeed in incubation. On Tue, Jun 14, 2016 at 10:51 PM, Jochen Wiedmann wrote: > On Tue, Jun 14, 2016 at 11:29 PM, John D. Ament > wrote: > > > We generally expect some kind of backing community to bring this to. We > > have seen pretty consistently that starting from an empty community > doesn't > > work. It doesn't mean that it's impossible, but very hard to do. > > Understood. On the other hand: Would that be sufficient reason for > rejecting a proposal? ("It didn't > work in the past" != "It won't work in this case") > > > -- > The next time you hear: "Don't reinvent the wheel!" > > > http://www.keystonedevelopment.co.uk/wp-content/uploads/2014/10/evolution-of-the-wheel-300x85.jpg
Re: [ALL] Volunteers for a Math IPMC?
On Tue, Jun 14, 2016 at 2:29 PM, John D. Ament wrote: > We generally expect some kind of backing community to bring this to. We > have seen pretty consistently that starting from an empty community doesn't > work. It doesn't mean that it's impossible, but very hard to do. > Frankly, the exceptions to this observation (such as Drill) pretty much reinforce the conclusion. Drill managed to build a community, but only because of a LOT of effort on the part of the founders of the project.
Re: [ALL] Volunteers for a Math IPMC?
Looking back through the discussion, it is a bit of a problem that one of the major reasons given for the fork is that the team thought that they didn't have a large enough PMC and that incubation wouldn't get them enough additional contributors. That made it seem like the project should go forward without meeting Apache requirements (i.e. outside). Is the situation really that different now that a vastly diminished team is likely to benefit from incubation enough to form a viable TLP? (I hate that this sounds negative ... it is a real question) On Tue, Jun 14, 2016 at 11:01 AM, Ralph Goers wrote: > I thought this had been made clear. Several months ago Commons voted to make > Math a TLP. But shortly after that most of the people involved with Commons > Math felt that a TLP at the ASF would not work for them, so they forked the > project and left, effectively voiding the TLP vote since the proposed PMC > is no longer valid. There is one person left who was very involved in > Commons Math and a few other people who have expressed interest in joining > the new community. > > So this is a situation where we have an already existing code base where a > lot of the people left are not familiar with quite a bit of it. The new > group of people who are interested are trying to determine how they should > move forward. There is some talk of breaking Commons Math into smaller > components and possibly dropping some where there is no one to maintain it. > > Ralph > > > On Jun 11, 2016, at 6:21 PM, Niclas Hedhman wrote: > > > > If you have a functioning community around Commons Math already, why do > you > > feel you need Incubation? > > > > People on a Math TLP would come out of the Commons PMC and simply submit > a > > Board Resolution, and I doubt that there would be any objections. There are > no > > legal concerns, no community training, no need for release management > > training, and so on... 
> > > > Or are you looking at a situation where the Commons community has no > > interest in Math subproject, and need new blood? > > > > > > Cheers > > Niclas > > > > On Sat, Jun 11, 2016 at 6:25 PM, James Carman < ja...@carmanconsulting.com> > > wrote: > > > >> We (the Commons PMC) have not decided yet what to do, but I just wanted > to > >> gauge the interest in joining the math IPMC if we choose to go TLP by > way > >> of the incubator. The idea would be that math (whatever its name may > be), > >> would go through the incubator in order to enrich its community prior to > >> becoming a TLP. Do we have any folks willing to throw their hat in the > >> ring? > >> > >> p.s. I've cross-posted to the incubator list as there are folks there > who > >> are very good at this stuff and could perhaps lend us some advice. > >> > > > > > > > > -- > > Niclas Hedhman, Software Developer > > http://zest.apache.org - New Energy for Java
Re: [collections] MultiValuedMap interface discussion
Following Guava on this has something to be said for it. https://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained Their decision is that Multimap#get returns a collection always. If there are no values, then an empty collection is returned so that you can always do m.get(key).size() or m.get(key).add(foo). The value returned is a magical view which only takes up space on demand so there is little consing done. There is an asMap method for which get will return null on missing keys. On Thu, Mar 27, 2014 at 2:55 PM, Paul Benedict pbened...@apache.org wrote: The downside of it returning an empty collection is you either have (1) to instantiate a collection just to say you have nothing or (2) you use an immutable collection. #1 is bad in itself and #2 is only as bad if the collection is otherwise writable. For example, it would be really strange for the returned collection to be mutable if you have something but immutable if you have nothing. My preference is you return null. That's the most rational answer, imo. On Thu, Mar 27, 2014 at 4:44 PM, Thomas Neidhart thomas.neidh...@gmail.com wrote: Hi, we are currently working on a new MultiValuedMap interface for collections, see https://issues.apache.org/jira/browse/COLLECTIONS-508. During the work we stumbled across an issue we would like to discuss. The MultiValuedMap is basically a Map that can hold multiple values associated to a given key. Thus the get(K key) method will normally return a Collection. In case no mapping for the key is stored in the map, it may either return null (like a normal map), or an empty collection. I would be in favor of defining that get() always returns a collection and never returns null. The advantage being that the result of get() can safely be used for further operations, e.g. size(), iterator(), ... keeping the interface of MultiValuedMap smaller and simple (i.e. no need to add additional methods there like size(K key) or iterator(K key)). 
The containsKey method would have to check if there is either no mapping at all for the key or the stored collection is empty: public boolean containsKey(K key) { Collection coll = decoratedMap().get(key); return coll != null && coll.size() > 0; } The downside would be that read operations may also alter the map, thus leading to unexpected ConcurrentModificationExceptions when iterating on e.g. values(). So, I would be interested in opinions about this. Thomas - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org -- Cheers, Paul
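Thomas's containsKey sketch and the Guava contract Ted describes can be condensed into a small, self-contained sketch. The class and method names below are illustrative only and are not the actual [collections] proposal:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Minimal sketch of the "get() never returns null" contract under discussion. */
class SimpleMultiValuedMap<K, V> {
    private final Map<K, List<V>> decorated = new HashMap<>();

    /** Adds a value for the given key, creating the backing list on demand. */
    public boolean put(K key, V value) {
        return decorated.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    /** Always returns a collection; empty (never null) when the key is absent. */
    public Collection<V> get(K key) {
        List<V> coll = decorated.get(key);
        return coll == null ? Collections.<V>emptyList() : coll;
    }

    /** A key is "contained" only if it maps to at least one value. */
    public boolean containsKey(K key) {
        Collection<V> coll = decorated.get(key);
        return coll != null && coll.size() > 0;
    }
}
```

Note that this naive sketch exhibits exactly the asymmetry Paul objects to: get() hands back the live, mutable list when the key is present but an immutable empty list when it is absent. Guava sidesteps that by returning a lazily materialized mutable view in both cases.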
Re: [math] Proposal for New way of Computing an approximate Percentile without storing input data
On Sun, Mar 23, 2014 at 2:09 AM, Thomas Neidhart thomas.neidh...@gmail.com wrote: There is already an issue for this: https://issues.apache.org/jira/browse/MATH-418 It also links other implementations and algorithms, maybe you could add a link to yours as well? Done. Thanks for the pointer.
Re: [math] Proposal for New way of Computing an approximate Percentile without storing input data
Murthy, I recently developed an alternative algorithm which provides superior accuracy for extreme quantiles. You can read more at https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf?raw=true The library involved is available via maven and is apache licensed. Apache Commons Math has a no dependency policy which might mean that sucking in the code would be a better option than simply linking to this. The state of the art before t-digest is generally considered to be either Greenwald and Khanna's algorithm GK01 or the Q-digest. References are in the paper above. In case it isn't obvious, source code is available on github at https://github.com/tdunning/t-digest The p^2 algorithm that you suggest is actually quite old and far from the state of the art. On Sat, Mar 22, 2014 at 2:40 PM, Phil Steitz phil.ste...@gmail.com wrote: On 3/22/14, 2:11 PM, venkatesha m wrote: Hi, I would like to propose adding a new way of computing the percentile without needing to store most of the input data. Since this is my first time contributing to Apache, please help me / correct me if I miss any procedure here. Here are the details. Description: The Percentile calculation in a traditional way requires all the data points to be stored and sorted before accessing the pth Percentile value of the data set. However the storage of points can become prohibitive when we need to make use of the existing Percentile implementation at big data scale (e.g. when computing the daily or weekly percentile value of a certain performance metric where the data points accumulated over a day and week may run to GB and TB). While platforms such as Hadoop exist to solve the data scale issue, the need for a statistical computation of quantiles without storing data is absolutely essential. While looking through the commons-math classes, I found that although a Percentile class is available, it is implemented with storage of the input as a requirement. 
So I was wondering if we could add a class to calculate Percentile without needing to store data. The algorithm that I have chosen to implement and propose is based on the P-Square algorithm ( http://www.cs.wustl.edu/~jain/papers/ftp/psqr.pdf ) which requires a minimal and finite set of memory stores to compute percentiles for a continuous stream of data. Ref: http://www.cs.wustl.edu/~jain/papers/ftp/psqr.pdf which has a succinct representation of the workflow of the algorithm Advantages: a) As is claimed in the original work, the accuracy improves over moderate to large data sets which is the need. b) A minimal and constant sized data store used to compute a large data set c) Useful in Hadoop MapReduce applications Implementation: I have implemented this algorithm based on StorelessUnivariateStatistic after checking out from the 3.2 branch. I have also opened a JIRA ticket on the same (https://issues.apache.org/jira/browse/MATH-1112 ) for requesting a new feature to be added. Please let me know when and how I could send my code for review. Thanks, Murthy and welcome! I am personally fine with the P-Square algorithm and would welcome a patch including implementation. Unless others disagree with this approach (give it a day or two), I would go ahead and attach a patch with the implementation to the JIRA you opened. Thanks in advance for your contributions! Phil thanks murthy
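For readers who want the shape of the algorithm without reading the paper, here is a rough, self-contained sketch of the P-Square marker update, with the formulas transcribed (0-indexed) from the Jain & Chlamtac paper linked above. This is illustrative only and is not the patch attached to MATH-1112:

```java
import java.util.Arrays;

/**
 * Sketch of the P-Square (P^2) algorithm: a constant-memory streaming
 * estimate of the p-th quantile using five markers. Needs at least five
 * observations before estimate() is meaningful.
 */
class PSquareSketch {
    private final double p;
    private final double[] q = new double[5];   // marker heights
    private final double[] n = new double[5];   // actual marker positions
    private final double[] np = new double[5];  // desired marker positions
    private final double[] dn = new double[5];  // desired position increments
    private int count;

    PSquareSketch(double p) {
        this.p = p;
        dn[0] = 0; dn[1] = p / 2; dn[2] = p; dn[3] = (1 + p) / 2; dn[4] = 1;
    }

    void add(double x) {
        if (count < 5) {                  // bootstrap: store the first five values
            q[count++] = x;
            if (count == 5) {
                Arrays.sort(q);
                for (int i = 0; i < 5; i++) n[i] = i;
                np[0] = 0; np[1] = 2 * p; np[2] = 4 * p; np[3] = 2 + 2 * p; np[4] = 4;
            }
            return;
        }
        count++;
        int k;                            // cell containing x, with extreme updates
        if (x < q[0]) { q[0] = x; k = 0; }
        else if (x >= q[4]) { q[4] = x; k = 3; }
        else { k = 0; while (x >= q[k + 1]) k++; }
        for (int i = k + 1; i < 5; i++) n[i]++;
        for (int i = 0; i < 5; i++) np[i] += dn[i];
        for (int i = 1; i <= 3; i++) {    // nudge interior markers toward targets
            double d = np[i] - n[i];
            if ((d >= 1 && n[i + 1] - n[i] > 1) || (d <= -1 && n[i - 1] - n[i] < -1)) {
                int s = d >= 0 ? 1 : -1;
                double qp = q[i] + s / (n[i + 1] - n[i - 1])
                        * ((n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
                         + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1]));
                if (q[i - 1] < qp && qp < q[i + 1]) {
                    q[i] = qp;            // parabolic (P^2) update
                } else {                  // fall back to linear interpolation
                    q[i] = q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i]);
                }
                n[i] += s;
            }
        }
    }

    double estimate() { return q[2]; }    // the middle marker tracks the p-quantile
}
```

The whole state is five doubles per array regardless of stream length, which is exactly the constant-memory property being proposed; accuracy at extreme quantiles is where t-digest improves on this.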
Re: [math] refactoring least squares
On Tue, Feb 25, 2014 at 6:22 AM, Konstantin Berlin kber...@gmail.com wrote: Hi, I am really having problems believing that matrix copying is the major problem in an optimization algorithm. Copying is O(N^2) operations. Surely, for any problem where performance would matter, it is completely dwarfed by the O(N^3) complexity of actually solving the normal equation. Also, I think testing should be done on an actual large problem where scaling issues would show up. The 1000x2 Jacobian would result in a 2x2 normal equation. Surely this is not a good test case. Konstantin As you point out, the test case in question shows how copying dominates computation for massively over-determined systems.
Re: [math] refactoring least squares
On Tue, Feb 25, 2014 at 8:39 AM, luc l...@spaceroots.org wrote: Also, I think testing should be done on an actual large problem where scaling issues would show up. The 1000x2 Jacobian would result in a 2x2 normal equation. Surely this is not a good test case. Konstantin As you point out, the test case in question shows how copying dominates computation for massively over-determined systems. You are right. Massively over-determined systems are also an important class of problems, so they need to be addressed. I am aware there are many other important classes of problems, though, so there are probably no silver bullets and what is important here is not important elsewhere. There are also cases for which forming the normal equations is avoided (mainly for the sake of numerical robustness if I remember correctly), so once again no silver bullets. Underdetermined systems, for instance, have pretty much the opposite problem in that the normal equations are very large. These systems are often solved using least squares with an L_2 regularizer.
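The last point can be made concrete with a toy sketch of the L_2-regularized ("ridge") normal equations, x = (A^T A + lambda*I)^(-1) A^T b, using plain arrays and naive Gaussian elimination. This is illustrative only; a production solver would prefer QR or SVD precisely for the numerical-robustness reasons mentioned above:

```java
/** Toy ridge (L_2-regularized) least squares via the normal equations. */
class RidgeSketch {
    /** Solves min ||A x - b||^2 + lambda ||x||^2 by forming (A^T A + lambda I) x = A^T b. */
    static double[] solveRidge(double[][] a, double[] b, double lambda) {
        int rows = a.length, cols = a[0].length;
        double[][] ata = new double[cols][cols];  // A^T A + lambda*I
        double[] atb = new double[cols];          // A^T b
        for (int i = 0; i < cols; i++) {
            for (int j = 0; j < cols; j++) {
                for (int k = 0; k < rows; k++) ata[i][j] += a[k][i] * a[k][j];
            }
            ata[i][i] += lambda;                  // the regularizer shifts the diagonal
            for (int k = 0; k < rows; k++) atb[i] += a[k][i] * b[k];
        }
        return gauss(ata, atb);
    }

    /** Naive Gaussian elimination with partial pivoting; destroys its arguments. */
    static double[] gauss(double[][] m, double[] rhs) {
        int n = rhs.length;
        for (int col = 0; col < n; col++) {
            int piv = col;
            for (int r = col + 1; r < n; r++)
                if (Math.abs(m[r][col]) > Math.abs(m[piv][col])) piv = r;
            double[] tmpRow = m[col]; m[col] = m[piv]; m[piv] = tmpRow;
            double tmp = rhs[col]; rhs[col] = rhs[piv]; rhs[piv] = tmp;
            for (int r = col + 1; r < n; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c < n; c++) m[r][c] -= f * m[col][c];
                rhs[r] -= f * rhs[col];
            }
        }
        double[] x = new double[n];
        for (int r = n - 1; r >= 0; r--) {        // back substitution
            double s = rhs[r];
            for (int c = r + 1; c < n; c++) s -= m[r][c] * x[c];
            x[r] = s / m[r][r];
        }
        return x;
    }
}
```

For a massively over-determined A (many rows, few columns), forming A^T A is the O(rows * cols^2) step that dominates; for underdetermined systems the same normal matrix becomes large, which is why no single strategy wins everywhere.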
Re: [math] refactoring least squares
On Mon, Feb 24, 2014 at 10:23 AM, Gilles gil...@harfang.homelinux.org wrote: One way to improve performance would be to provide pre-allocated space for the Jacobian and reuse it for each evaluation. Do you have actual data to back this statement? The LeastSquaresProblem interface would then be: void evaluate(RealVector point, RealVector resultResiduals, RealVector resultJacobian); I'm interested in hearing your ideas on other approaches to solve this issue. Or even if this is an issue worth solving. Not before we can be sure that in-place modification (rather than reallocation) always provides a performance benefit. Allocation is rarely the problem in these situations. The implied copying of data is. And even the copying isn't always a problem. For instance, it often pays off big to copy data to column (or row) major representation to improve cache coherency. The result is that a large fraction of the time is spent copying, but without the copying, the remaining time would take 10x longer. The net time taken is 3x faster with the copy.
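The two calling conventions under discussion can be sketched side by side with plain arrays. The class name and the tiny linear model below are hypothetical, not the actual LeastSquaresProblem API:

```java
/**
 * Sketch of allocating vs. pre-allocated evaluation of residuals and a
 * one-parameter Jacobian. Model: r_i = A_i * x0 + B_i - Y_i, dr_i/dx0 = A_i.
 */
class EvalStyles {
    static final double[] A = {1, 2, 3};
    static final double[] B = {0.5, 0.5, 0.5};
    static final double[] Y = {1, 2, 3};

    /** Allocating style: fresh output arrays on every call. */
    static double[][] evaluate(double x0) {
        double[] r = new double[A.length];
        double[] j = new double[A.length];
        for (int i = 0; i < A.length; i++) {
            r[i] = A[i] * x0 + B[i] - Y[i];
            j[i] = A[i];
        }
        return new double[][] { r, j };
    }

    /** In-place style: the caller supplies pre-allocated output arrays. */
    static void evaluate(double x0, double[] r, double[] j) {
        for (int i = 0; i < A.length; i++) {
            r[i] = A[i] * x0 + B[i] - Y[i];
            j[i] = A[i];
        }
    }
}
```

The only structural difference is who owns the output storage; as both sides of the thread caution, whether the in-place variant actually wins has to be measured, not assumed.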
Re: [math]
On Mon, Feb 17, 2014 at 5:01 AM, Emmanuel Joliet ejol...@sciops.esa.int wrote: https://issues.apache.org/jira/browse/MATH-870 Recently, many problems have been found out with class ... Please, consider not removing it. We use it heavily and need the class as it gives what we need (handling the input of course is necessary regarding NaN and infinities!). I understand the problem of incorrect/improper handling of NaN/Infinity conditions, but does that justify removing the interfaces/classes completely? Yeah... welcome to commons math. It does seem like an extreme sanction to me.
Re: [math] trouble with SingularValueDecomposition
Note that the only reason that the order is unconstrained is because the two corresponding singular values are equal. Strictly speaking, for equal singular values, any unitary transformation of the corresponding singular vectors are also valid singular vectors. On Sat, Feb 15, 2014 at 4:09 AM, Patrick Meyer meyer...@gmail.com wrote: Thanks Ted. As I mentioned my knowledge of SVD is limited, and I was not aware that it is OK to have a different order of the first two columns in the results (or the conditions under which the order doesn't matter). I am trying to track down a bug in some code and that’s what led me to the SVD. I guess I need to keep looking for the real bug. For completeness, my results R were the same as you reported. My results from CM are shown below and if you swap the first and second column, the results agree with R. U: 0.9940594018965339 0.06774763124429131 -0.08518312016997649 0.10615872136916754 -0.7761401247896214 0.621551704858 0.02400481989869077 0.6269104921377042 0.778721390144956 V: 0.9963653125425972 0.0 -0.08518312016997495 0.0531395658155507 -0.7815621241949481 0.621551704865 0.06657590034559915 0.6238274168581248 0.7787213901449556 -Original Message- From: Ted Dunning [mailto:ted.dunn...@gmail.com] Sent: Saturday, February 15, 2014 2:17 AM To: Commons Developers List Subject: Re: [math] trouble with SingularValueDecomposition For what its worth, I tested the Mahout SVD which shares code lineage with the Commons Math implementation. The results I got were: *sum(abs(m - u * s * v')) = 4.31946146e-16S =1.002319690998 1.0023196909981. U =0.994059401897 0.067747631244 -0.0851831201700.106158721369 -0.776140124790 0.62155170 0.024004819899 0.626910492138 0.778721390145 V =0.996365312543 0. -0.0851831201700.053139565816 -0.781562124195 0.621551700.066575900346 0.623827416858 0.778721390145* Note that the residue of the reconstruction is excellently small. This indicates that the result is correct. 
If you compare these to the R results, *[1] 1.0023196909980066 1.0023196909980066 1.$u [,1] [,2] [,3][1,] 0.067747631244291326 -0.994059401896534967 0.085183120169970525 [2,] -0.776140124789635122 -0.106158721369163295 -0.62155170469113[3,] 0.626910492137687125 -0.024004819898688426 -0.778721390144969994$v [,1] [,2] [,3] [1,] 0.0 -0.996365312542597747 0.085183120169970497[2,] -0.78156212419496163 -0.053139565815546450 -0.62155170469668[3,] 0.62382741685810772 -0.066575900345596822 -0.778721390144969550* These are identical to the previous results except that the first two singular values are equal which means that the order of the corresponding left and right singular vectors are different and there are sign changes in the singular vectors. My guess is that you will get the same results in Apache Commons Math. On Fri, Feb 14, 2014 at 6:07 PM, Patrick Meyer meyer...@gmail.com wrote: Hi, I am using the SingularValueDecomposition class with a matrix but it gives me a different result than R. My knowledge of SVD is limited, so any advice is welcomed. Here's the method in Java public void svdTest(){ double[][] x = { {1.0, -0.053071807862720116, 0.04236086650321309}, {0.05307180786272012, 1.0, 0.0058054424137053435}, {-0.04236086650321309, -0.005805442413705342, 1.0} }; RealMatrix X = new Array2DRowRealMatrix(x); SingularValueDecomposition svd = new SingularValueDecomposition(X); RealMatrix U = svd.getU(); for(int i=0;i<U.getRowDimension();i++){ for(int j=0;j<U.getColumnDimension();j++){ System.out.print(U.getEntry(i,j) + " "); } System.out.println(); } System.out.println(); System.out.println(); RealMatrix V = svd.getV(); for(int i=0;i<V.getRowDimension();i++){ for(int j=0;j<V.getColumnDimension();j++){ System.out.print(V.getEntry(i,j) + " "); } System.out.println(); } } And here's the function in R. 
x <- matrix(c( 1.0, -0.053071807862720116, 0.04236086650321309, 0.05307180786272012, 1.0, 0.0058054424137053435, -0.04236086650321309, -0.005805442413705342, 1.0), nrow=3, byrow=TRUE) svd(x) Does anyone know why I am getting different results for U and V? I am using commons math 3.1. Thanks, Patrick
Re: [math] trouble with SingularValueDecomposition
And what exactly are the results you are getting? On Fri, Feb 14, 2014 at 6:07 PM, Patrick Meyer meyer...@gmail.com wrote: Hi, I am using the SingularValueDecomposition class with a matrix but it gives me a different result than R. My knowledge of SVD is limited, so any advice is welcomed. Here's the method in Java public void svdTest(){ double[][] x = { {1.0, -0.053071807862720116, 0.04236086650321309}, {0.05307180786272012, 1.0, 0.0058054424137053435}, {-0.04236086650321309, -0.005805442413705342, 1.0} }; RealMatrix X = new Array2DRowRealMatrix(x); SingularValueDecomposition svd = new SingularValueDecomposition(X); RealMatrix U = svd.getU(); for(int i=0;i<U.getRowDimension();i++){ for(int j=0;j<U.getColumnDimension();j++){ System.out.print(U.getEntry(i,j) + " "); } System.out.println(); } System.out.println(); System.out.println(); RealMatrix V = svd.getV(); for(int i=0;i<V.getRowDimension();i++){ for(int j=0;j<V.getColumnDimension();j++){ System.out.print(V.getEntry(i,j) + " "); } System.out.println(); } } And here's the function in R. x <- matrix(c( 1.0, -0.053071807862720116, 0.04236086650321309, 0.05307180786272012, 1.0, 0.0058054424137053435, -0.04236086650321309, -0.005805442413705342, 1.0), nrow=3, byrow=TRUE) svd(x) Does anyone know why I am getting different results for U and V? I am using commons math 3.1. Thanks, Patrick
Re: [math] trouble with SingularValueDecomposition
For what its worth, I tested the Mahout SVD which shares code lineage with the Commons Math implementation. The results I got were: *sum(abs(m - u * s * v')) = 4.31946146e-16S =1.002319690998 1.0023196909981. U =0.994059401897 0.067747631244 -0.0851831201700.106158721369 -0.776140124790 0.62155170 0.024004819899 0.626910492138 0.778721390145 V =0.996365312543 0. -0.0851831201700.053139565816 -0.781562124195 0.621551700.066575900346 0.623827416858 0.778721390145* Note that the residue of the reconstruction is excellently small. This indicates that the result is correct. If you compare these to the R results, *[1] 1.0023196909980066 1.0023196909980066 1.$u [,1] [,2] [,3][1,] 0.067747631244291326 -0.994059401896534967 0.085183120169970525 [2,] -0.776140124789635122 -0.106158721369163295 -0.62155170469113[3,] 0.626910492137687125 -0.024004819898688426 -0.778721390144969994$v [,1] [,2] [,3] [1,] 0.0 -0.996365312542597747 0.085183120169970497[2,] -0.78156212419496163 -0.053139565815546450 -0.62155170469668[3,] 0.62382741685810772 -0.066575900345596822 -0.778721390144969550* These are identical to the previous results except that the first two singular values are equal which means that the order of the corresponding left and right singular vectors are different and there are sign changes in the singular vectors. My guess is that you will get the same results in Apache Commons Math. On Fri, Feb 14, 2014 at 6:07 PM, Patrick Meyer meyer...@gmail.com wrote: Hi, I am using the SingularValueDecomposition class with a matrix but it gives me a different result than R. My knowledge of SVD is limited, so any advice is welcomed. 
Here's the method in Java public void svdTest(){ double[][] x = { {1.0, -0.053071807862720116, 0.04236086650321309}, {0.05307180786272012, 1.0, 0.0058054424137053435}, {-0.04236086650321309, -0.005805442413705342, 1.0} }; RealMatrix X = new Array2DRowRealMatrix(x); SingularValueDecomposition svd = new SingularValueDecomposition(X); RealMatrix U = svd.getU(); for(int i=0;i<U.getRowDimension();i++){ for(int j=0;j<U.getColumnDimension();j++){ System.out.print(U.getEntry(i,j) + " "); } System.out.println(); } System.out.println(); System.out.println(); RealMatrix V = svd.getV(); for(int i=0;i<V.getRowDimension();i++){ for(int j=0;j<V.getColumnDimension();j++){ System.out.print(V.getEntry(i,j) + " "); } System.out.println(); } } And here's the function in R. x <- matrix(c( 1.0, -0.053071807862720116, 0.04236086650321309, 0.05307180786272012, 1.0, 0.0058054424137053435, -0.04236086650321309, -0.005805442413705342, 1.0), nrow=3, byrow=TRUE) svd(x) Does anyone know why I am getting different results for U and V? I am using commons math 3.1. Thanks, Patrick
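The check applied in this thread, sum(abs(M - U*S*V')), is the right first diagnostic whenever two SVD implementations disagree: any column permutation or sign flip of paired singular vectors still reconstructs M exactly. Below is a minimal sketch of that residual, exercised on a tiny hand-built decomposition rather than the thread's matrix:

```java
/** Sketch of the SVD reconstruction residual sum(abs(M - U * S * V^T)). */
class SvdCheck {
    static double reconstructionResidual(double[][] m, double[][] u,
                                         double[] s, double[][] v) {
        int rows = m.length, cols = m[0].length;
        double residual = 0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                double e = 0;  // (U * S * V^T)_{ij} = sum_k U_{ik} * s_k * V_{jk}
                for (int k = 0; k < s.length; k++) e += u[i][k] * s[k] * v[j][k];
                residual += Math.abs(m[i][j] - e);
            }
        }
        return residual;
    }
}
```

A residual near machine precision (like the 4.3e-16 reported above) means both decompositions are valid, even when U and V differ column by column.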
Re: [LANG] New class called StringAlgorithms?
On Fri, Jan 17, 2014 at 4:11 AM, Benedikt Ritter brit...@apache.org wrote: A concrete use case could be a query engine which allows customizing its string matching algorithm. Is this really a use case? It sounds very constructed to me. Have you ever thought I'd like to query on Google, but I'd like suggestions to be matched using the Levenshtein distance algorithm? This is definitely a use case. Furthermore, Levenshtein distance is often parametrized with edit costs and possibly an edit-cost matrix. Tuning a system for best accuracy by injecting alternative distance functions is a common activity, whether in a spelling suggestion system or a DNA alignment program.
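The parametrization mentioned above (injectable edit costs) can be sketched in a few lines. The class below is illustrative only, not part of [lang]:

```java
// Levenshtein distance with injected insert/delete/substitute costs, so a
// caller (e.g. a query engine or aligner) can tune the metric. With unit
// costs it reduces to the classic edit distance.
public class WeightedLevenshtein {

    private final double insertCost;
    private final double deleteCost;
    private final double substituteCost;

    public WeightedLevenshtein(double insertCost, double deleteCost, double substituteCost) {
        this.insertCost = insertCost;
        this.deleteCost = deleteCost;
        this.substituteCost = substituteCost;
    }

    public double distance(String a, String b) {
        // Two-row dynamic programming table over prefixes of a and b.
        double[] prev = new double[b.length() + 1];
        double[] curr = new double[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j * insertCost;                 // "" -> b[0..j) by inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i * deleteCost;                 // a[0..i) -> "" by deletes
            for (int j = 1; j <= b.length(); j++) {
                double sub = prev[j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : substituteCost);
                curr[j] = Math.min(sub, Math.min(prev[j] + deleteCost,
                                                 curr[j - 1] + insertCost));
            }
            double[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // Unit costs reproduce the classic distance: kitten -> sitting = 3.
        System.out.println(new WeightedLevenshtein(1, 1, 1).distance("kitten", "sitting"));
    }
}
```

An edit-cost matrix (per-character-pair substitution costs) would replace the ternary in the inner loop with a lookup.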
Re: [Math] src/userguide/java
In my experience, examples are most useful as ... well ... examples. As such, they should be an example of how user code works. That means that they should be a complete stand-alone project, just as most user programs should be complete and standalone. If you want to also deliver a pre-compiled version of the examples, that's great. But it doesn't affect the desirability of a stand-alone project. On Tue, Dec 31, 2013 at 9:25 AM, Gilles gil...@harfang.homelinux.orgwrote: On Tue, 31 Dec 2013 08:54:59 -0700, Phil Steitz wrote: On Dec 31, 2013, at 4:34 AM, Gilles gil...@harfang.homelinux.org wrote: On Sun, 29 Dec 2013 13:33:23 -0800, Phil Steitz wrote: On 12/29/13, 6:39 AM, Gilles wrote: Hello. Is there some framework in place in order to generate executable files from the Java sources located there? I guess that a configuration snippet could be added in the pom.xml[1] so that one of the build phases can also compile (and perhaps also run) the example applications. Regards, Gilles [1] I tried to use the pom.xml located in src/userguide but it failed to resolve artefact org.apache.commons:commons-math3:jar:3.3-SNAPSHOT. You need to install the [math] snapshot locally for maven to be able to resolve it. Run mvn install to get a current snapshot installed locally. OK. That's easy enough for me at the moment. [I just wanted to check that what I put under src/userguide/java does compile and run.] However, I wonder why it is deemed better to have another pom.xml rather than have the main one generate the examples JAR. Why exactly would one want to generate the examples jar? I get the use case for us of wanting to make sure they build, but for people wanting to use them as reference it would seem a self-contained build might be a little easier to work with. Setting it up the way it is now also makes it easy to test against prior releases. Also, the self contained build is faster. People may want to _run_ the examples, without the requirement to have maven installed. 
Gilles - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
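A stand-alone example project of the kind discussed above can be as small as the following pom.xml sketch. The coordinates are illustrative; only the commons-math3 dependency matters, and the snapshot version resolves only after running mvn install in a [math] checkout:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.example</groupId>
  <artifactId>math-examples</artifactId>
  <version>1.0-SNAPSHOT</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-math3</artifactId>
      <!-- a released version also works here, without the local install -->
      <version>3.3-SNAPSHOT</version>
    </dependency>
  </dependencies>
</project>
```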
Re: [math] Include test data from netlib
From the FAQ: 2.1) What is Netlib? The Netlib repository contains freely available software, documents, and databases of interest to the numerical, scientific computing, and other communities. The repository is maintained by AT&T Bell Laboratories, the University of Tennessee and Oak Ridge National Laboratory, and by colleagues world-wide. The collection is replicated at several sites around the world, automatically synchronized, to provide reliable and network efficient service to the global community. 2.3) Are there restrictions on the use of software retrieved from Netlib? Most netlib software packages have no restrictions on their use but we recommend you check with the authors to be sure. Checking with the authors is a nice courtesy anyway since many authors like to know how their codes are being used. 2.4) How do I submit software or documents to Netlib? Direct inquiries to netlib_maintain...@netlib.org On Fri, Dec 20, 2013 at 12:04 PM, Thomas Neidhart thomas.neidh...@gmail.com wrote: Hi, I have a question regarding test data available at http://www.netlib.org/lp/data/. Could this be included in our subversion repository? I could not find a license attached to these files, but it looks like they have been contributed to netlib from various sources. It would be quite valuable to include them in our automatic tests, but they could of course also just be executed stand-alone when working on the simplex solver. Thomas - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [MATH] Eigen decomposition of matrices consisting of RealFieldElements
I had the same question. Presumably, it is a reasonable thing to have in the corner case of needing eigenvalues for matrices of extended-precision decimal numbers or some such, but I would be very surprised if there were measurably non-zero demand for such a feature. On Thu, Dec 19, 2013 at 9:55 AM, Martin Grotle Soukup martin.grotle.sou...@gmail.com wrote: Hi, Apologies for being impatient, but does someone think this is a good idea? In case of the contrary I will not trouble this mailing list with this request any further. Best regards, Martin Grotle Soukup 2013/12/13 Martin Grotle Soukup martin.grotle.sou...@gmail.com Hi, The linear package in commons-math contains a class that does an eigenvalue decomposition of a RealMatrix. Would there be interest in adding a similar class doing eigenvalue decomposition of a matrix consisting of RealFieldElements? I am happy to help if this is the case. Best regards, Martin Grotle Soukup
Re: [math] Re: Sparse matrices not supported anymore?
On Fri, Nov 8, 2013 at 11:47 AM, Luc Maisonobe l...@spaceroots.org wrote: is there still consensus that we are going to remove the sparse implementations with 4.0? Well, I really think it is a pity, we should support this. But let's face it: up to now we have been unable to do it properly. Sébastien, who tried to do something in this direction, has left the project and nobody replaced him. I have done a fair bit of noodling and was unable to come up with a solution that is performant. The issue is that you essentially have to maintain an additional bitmask of exceptional values in addition to the implicit bitmask of non-zero elements. I don't see any way of determining that exceptional-value bitmask short of a full scan. Moreover, the cost of propagating the exceptional-value bitmask significantly changes the cost of various operations, because exceptions require an OR while multiplication allows use of an AND. Furthermore, even after the operation itself and the operation on the exception bitmask are done, there needs to be another scan of the results to find new exceptional values. So the upshot is that dealing with this will cost at least a significant integer-factor degradation in performance, at no benefit relative to the normal user's expectations with regard to sparse vector operations. I say no benefit because no other package handles this sort of issue, so users are quite used to imprecise handling of exceptional values.
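The exceptional-value problem above comes straight from IEEE 754 arithmetic; a minimal sketch of how the usual sparse shortcut changes the answer:

```java
// A sparse dot product skips entries that are not stored (implicitly zero),
// but IEEE arithmetic says 0.0 * NaN is NaN and 0.0 * Infinity is NaN, so
// skipping silently changes the answer relative to the dense computation.
public class SparseNanDemo {
    public static void main(String[] args) {
        // Dense computation over x = (0, 1) and y = (NaN, 2): the NaN propagates.
        double dense = 0.0 * Double.NaN + 1.0 * 2.0;
        System.out.println(Double.isNaN(dense));   // true

        // Sparse shortcut: the unstored zero entry of x is never visited,
        // so the NaN in y is never seen at all.
        double sparse = 1.0 * 2.0;                 // only stored entries
        System.out.println(sparse);                // 2.0
    }
}
```

This is why tracking exceptional values requires the extra bitmask (and the extra OR) described in the message above.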
Re: I need a map for long and double
Serialization of primitive maps is easy to implement since the maps pretty much just consist of a couple of arrays. Most of the developers involved will shy away from java serialization or any dependency on some other framework. So is that really a show stopper? On Wed, Nov 6, 2013 at 6:11 AM, Gary Gregory garydgreg...@gmail.com wrote: On Tue, Nov 5, 2013 at 11:49 PM, Gary Gregory garydgreg...@gmail.com wrote: Thank you all for replying. HPPC looks promising and it's Apache 2 licensed. I'll give it a closer look. HPPC does not allow for serialization and even says so, odd. Now looking at fastutil... Gary Gary On Tue, Nov 5, 2013 at 8:59 PM, Ted Dunning ted.dunn...@gmail.com wrote: Trove is GPL (last I looked). Mahout has primitive collection implementations (and is obviously ASL). There are other implementations such as hppc (see http://labs.carrotsearch.com/hppc.html ) Mahout is a decent implementation, but I think that hppc has had a round or two more optimization. And 150,000 entires in a table is not big for this sort of situation. Anything short of Integer.MAX_VALUE/small_factor should be fine. On Tue, Nov 5, 2013 at 5:49 PM, Bruno P. Kinoshita brunodepau...@yahoo.com.br wrote: Maybe Trove's TObjectMapLong? [1] http://trove4j.sourceforge.net/javadocs/gnu/trove/map/TObjectLongMap.html HTH, Bruno P. Kinoshita http://kinoshita.eti.br http://tupilabs.com From: Gary Gregory garydgreg...@gmail.com To: Commons Developers List dev@commons.apache.org Sent: Tuesday, November 5, 2013 11:39 PM Subject: I need a map for long and double Hi All: I'm looking for a Map implementation that takes a String as a key and a long as the value (and another taking a double as the value). I'd rather not take the extra memory of using generic map with a Long object value hit since the maps will have up to 150,000 entries. That would save me... a meg for each map I am guestimating (on a 64-bit JVM). A meg here, a meg there... I did not see anything in [collections] or Google Guava. 
Thoughts? Gary -- E-Mail: garydgreg...@gmail.com | ggreg...@apache.org Java Persistence with Hibernate, Second Edition http://www.manning.com/bauer3/ JUnit in Action, Second Edition http://www.manning.com/tahchiev/ Spring Batch in Action http://www.manning.com/templier/ Blog: http://garygregory.wordpress.com Home: http://garygregory.com/ Tweet! http://twitter.com/GaryGregory
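The parallel-arrays point from the thread above can be sketched as follows; the class name and layout are illustrative, not any particular library's:

```java
import java.io.*;

// If a primitive map is backed by parallel arrays, (de)serialization is
// just writing a length followed by the raw arrays, with no framework and
// no per-entry boxing.
public class LongDoubleArrays {
    long[] keys;
    double[] values;

    void write(DataOutputStream out) throws IOException {
        out.writeInt(keys.length);
        for (long k : keys) out.writeLong(k);
        for (double v : values) out.writeDouble(v);
    }

    static LongDoubleArrays read(DataInputStream in) throws IOException {
        int n = in.readInt();
        LongDoubleArrays m = new LongDoubleArrays();
        m.keys = new long[n];
        m.values = new double[n];
        for (int i = 0; i < n; i++) m.keys[i] = in.readLong();
        for (int i = 0; i < n; i++) m.values[i] = in.readDouble();
        return m;
    }

    public static void main(String[] args) throws IOException {
        LongDoubleArrays m = new LongDoubleArrays();
        m.keys = new long[]{1L, 2L, 3L};
        m.values = new double[]{0.5, 1.5, 2.5};

        // Round-trip through an in-memory buffer.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        m.write(new DataOutputStream(buf));
        LongDoubleArrays copy = read(new DataInputStream(
                new ByteArrayInputStream(buf.toByteArray())));

        System.out.println(copy.keys[2] + " -> " + copy.values[2]);
    }
}
```

A real open-addressed map would also persist its size and load factor, but the payload is still just the two arrays.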
Re: [math] Multithreaded performances
On Mon, Nov 4, 2013 at 10:09 PM, Romain Manni-Bucau rmannibu...@gmail.com wrote: Oh sorry, that's what I said earlier: in a real app, no, or not enough to be an issue, but in simple apps or very high throughput apps, yes. Le 5 nov. 2013 07:00, Ted Dunning ted.dunn...@gmail.com a écrit : That isn't what I meant. Do you really think that more than one metric has to update (increment, say) at precisely the same time? I realize that is what you said. Do you have any serious examples where metrics have to be updated all or nothing?
Re: I need a map for long and double
Trove is GPL (last I looked). Mahout has primitive collection implementations (and is obviously ASL). There are other implementations such as hppc (see http://labs.carrotsearch.com/hppc.html ) Mahout is a decent implementation, but I think that hppc has had a round or two more optimization. And 150,000 entries in a table is not big for this sort of situation. Anything short of Integer.MAX_VALUE/small_factor should be fine. On Tue, Nov 5, 2013 at 5:49 PM, Bruno P. Kinoshita brunodepau...@yahoo.com.br wrote: Maybe Trove's TObjectMapLong? [1] http://trove4j.sourceforge.net/javadocs/gnu/trove/map/TObjectLongMap.html HTH, Bruno P. Kinoshita http://kinoshita.eti.br http://tupilabs.com From: Gary Gregory garydgreg...@gmail.com To: Commons Developers List dev@commons.apache.org Sent: Tuesday, November 5, 2013 11:39 PM Subject: I need a map for long and double Hi All: I'm looking for a Map implementation that takes a String as a key and a long as the value (and another taking a double as the value). I'd rather not take the extra memory hit of using a generic map with a Long object value since the maps will have up to 150,000 entries. That would save me... a meg for each map I am guestimating (on a 64-bit JVM). A meg here, a meg there... I did not see anything in [collections] or Google Guava. Thoughts? Gary -- E-Mail: garydgreg...@gmail.com | ggreg...@apache.org Java Persistence with Hibernate, Second Edition http://www.manning.com/bauer3/ JUnit in Action, Second Edition http://www.manning.com/tahchiev/ Spring Batch in Action http://www.manning.com/templier/ Blog: http://garygregory.wordpress.com Home: http://garygregory.com/ Tweet! http://twitter.com/GaryGregory
Re: [math] Multithreaded performances
My experience is that the only way to get really high performance with counter-like objects is to have one per thread and combine them on read. On Mon, Nov 4, 2013 at 8:49 AM, Romain Manni-Bucau rmannibu...@gmail.com wrote: Hi, ATM sirona (a java monitoring library in incubator) relies a lot on the Summary stats object from [math3] but it needed a lock to ensure consistency. I know there is a synchronized version but this one scales less than the locked one. My question is quite simple then: will [math] add an implementation with thread safety guarantees and good performance? I am thinking for instance of the LongAdder of Doug Lea, which could be used as a good base. Romain Manni-Bucau Twitter: @rmannibucau Blog: http://rmannibucau.wordpress.com/ LinkedIn: http://fr.linkedin.com/in/rmannibucau Github: https://github.com/rmannibucau - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
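The per-thread-and-combine-on-read approach described above is exactly what the LongAdder mentioned in the thread implements (standard in java.util.concurrent since Java 8): writes go to striped cells to avoid contention, and sum() combines the cells on read.

```java
import java.util.concurrent.atomic.LongAdder;

// Four threads hammer one counter; increments stay cheap because they
// mostly hit thread-local cells, and sum() folds the cells together.
public class LongAdderDemo {
    public static void main(String[] args) throws InterruptedException {
        LongAdder counter = new LongAdder();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 100_000; i++) {
                    counter.increment();  // striped, mostly uncontended write
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(counter.sum());  // 400000, combined on read
    }
}
```

Note that sum() is only weakly consistent while writers are active, which is the copy-versus-mutate trade-off debated later in this thread.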
Re: [math] Multithreaded performances
I still think that what you need is a thread-safe copy rather than a thread-safe mutate. Even if you force every thread to do the copy, the aggregation still wins on complexity/correctness/performance grounds. On Mon, Nov 4, 2013 at 12:58 PM, Romain Manni-Bucau rmannibu...@gmail.com wrote: In sirona we collect (aggregate) data each N ms and we can still use stats during aggregation (worst case surely) Le 4 nov. 2013 21:48, Phil Steitz phil.ste...@gmail.com a écrit : On 11/4/13 12:12 PM, Romain Manni-Bucau wrote: But aggregation needs to lock, so it is not a real solution. Lock is fine in real cases but not in simple/light ones. ThreadLocal leaks... so a trade-off should be found Depends on the use case. If the use case is 0) launch a bunch of threads and let them gather stats individually 1) aggregate results Then the static aggregate method in AggregateSummaryStatistics that takes a collection as input will work with no locking required. Phil Le 4 nov. 2013 18:42, Phil Steitz phil.ste...@gmail.com a écrit : On 11/4/13 8:49 AM, Romain Manni-Bucau wrote: Hi, ATM sirona (a java monitoring library in incubator) relies a lot on the Summary stats object from [math3] but it needed a lock to ensure consistency. I know there is a synchronized version but this one scales less than the locked one. My question is quite simple then: will [math] add an implementation with thread safety guarantees and good performance? I am thinking for instance of the LongAdder of Doug Lea, which could be used as a good base. The short answer is yes, patches welcome. Ted makes a good point, though; and there is already some support for aggregation in the stats classes in [math] (i.e., you can aggregate the results of per-thread stats by using, e.g. AggregateSummaryStatistics#aggregate). See MATH-1016 re extending this to more stats.
Phil Romain Manni-Bucau Twitter: @rmannibucau Blog: http://rmannibucau.wordpress.com/ LinkedIn: http://fr.linkedin.com/in/rmannibucau Github: https://github.com/rmannibucau - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [math] Multithreaded performances
The copy doesn't have to lock if you build the right data structure. The thread leak problem can be more serious. On Mon, Nov 4, 2013 at 2:47 PM, Phil Steitz phil.ste...@gmail.com wrote: On 11/4/13 2:31 PM, Romain Manni-Bucau wrote: The copy will lock too. Right. That is why I asked exactly how things work. If you can't lock during aggregation, we need something different. And it doesnt solve leak issue of the one instance by thread solution, no? Correct, again depends on the setup how big a problem that is / what can be done to manage it. Phil Le 4 nov. 2013 23:27, Phil Steitz phil.ste...@gmail.com a écrit : On 11/4/13 2:22 PM, Ted Dunning wrote: I still think that what you need is a thread-safe copy rather than a thread-safe mutate. I was just thinking the same thing. Patches welcome. Phil Even if you force every thread to do the copy, the aggregation still still wins on complexity/correctness/performance ideas. On Mon, Nov 4, 2013 at 12:58 PM, Romain Manni-Bucau rmannibu...@gmail.comwrote: In sirona we collect (aggregate) data each N ms and we can still use stats during aggregation (worse case surely) Le 4 nov. 2013 21:48, Phil Steitz phil.ste...@gmail.com a écrit : On 11/4/13 12:12 PM, Romain Manni-Bucau wrote: But aggregation needs to lock so not a real solution. Lock is fine on real cases but not in simple/light ones. ThreadLocal leaks...so a trade off should be found Depends on the use case. If the use case is 0) launch a bunch of threads and let them gather stats individually 1) aggregate results Then the static aggregate method in AggregateSummaryStatistics that takes a collection as input will work with no locking required. Phil Le 4 nov. 2013 18:42, Phil Steitz phil.ste...@gmail.com a écrit : On 11/4/13 8:49 AM, Romain Manni-Bucau wrote: Hi, ATM sirona (a java monitoring library in incubator) relies a lot on Summary stats object from [math3] but it needed a lock to ensure consistency. 
I know there is a synchronized version but this one scales less than the locked one. My question is quite simple then: will [math] add an implementation with thread safety guarantees and good performance? I am thinking for instance of the LongAdder of Doug Lea, which could be used as a good base. The short answer is yes, patches welcome. Ted makes a good point, though; and there is already some support for aggregation in the stats classes in [math] (i.e., you can aggregate the results of per-thread stats by using, e.g. AggregateSummaryStatistics#aggregate). See MATH-1016 re extending this to more stats. Phil Romain Manni-Bucau Twitter: @rmannibucau Blog: http://rmannibucau.wordpress.com/ LinkedIn: http://fr.linkedin.com/in/rmannibucau Github: https://github.com/rmannibucau - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [math] Multithreaded performances
On Mon, Nov 4, 2013 at 8:23 PM, Phil Steitz phil.ste...@gmail.com wrote: On 11/4/13 3:44 PM, Ted Dunning wrote: The copy doesn't have to lock if you build the right data structure. The individual stats objects need to update multiple quantities atomically when new values come in. Consistency in the copy requires that you suppress updates while the copy is in progress unless you implement some kind of update queue internally. What exactly do you mean by the right data structure? I was talking about lockless data structures in general. Are you sure that real transactions are a requirement here?
Re: [math] Multithreaded performances
That isn't what I meant. Do you really think that more than one metric has to update (increment, say) at precisely the same time? On Mon, Nov 4, 2013 at 9:49 PM, Romain Manni-Bucau rmannibu...@gmail.comwrote: You cant stop the app cause you take a snapshot of the monitoring metrics so yes Le 5 nov. 2013 06:46, Ted Dunning ted.dunn...@gmail.com a écrit : On Mon, Nov 4, 2013 at 8:23 PM, Phil Steitz phil.ste...@gmail.com wrote: On 11/4/13 3:44 PM, Ted Dunning wrote: The copy doesn't have to lock if you build the right data structure. The individual stats objects need to update multiple quantities atomically when new values come in. Consistency in the copy requires that you suppress updates while the copy is in progress unless you implement some kind of update queue internally. What exactly do you mean by the right data structure? I was talking about lockless data structures in general. Are you sure that real transactions are a requirement here?
Re: [MATH] Interest in large patches for small cleanup / performance changes?
On Sun, Nov 3, 2013 at 10:56 AM, Luc Maisonobe l...@spaceroots.org wrote: I had proposed that error messages be incrementally built from simple base patterns, to be assembled either at the point where the exception is going to be thrown or inside specific exceptions[2] (or a combination of both). It often doesn't work. Sentence constructions are completely different in different languages, and it is impossible to simply build up from elementary components that are individually translated and assembled later. See all the documentation about the ancient gettext for example. Modern printf implementations deal with this by numbered arguments. This is not a problem any more. See http://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html#syntax
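The numbered-argument mechanism can be seen with java.util.Formatter directly: positional indices like %1$d let a translated pattern reorder arguments without touching the calling code. The message text below is made up for illustration:

```java
// The same argument array is rendered by two patterns with different word
// order; only the pattern string (the part a translator edits) changes.
public class NumberedArgsDemo {
    public static void main(String[] args) {
        Object[] values = {3, "the matrix"};
        // English pattern: count first, subject last.
        String en = String.format("%1$d eigenvalues of %2$s failed to converge", values);
        // A hypothetical translation that needs the subject first reorders
        // via the indices, reusing the same argument array.
        String other = String.format("%2$s: %1$d eigenvalues failed to converge", values);
        System.out.println(en);
        System.out.println(other);
    }
}
```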
Re: [MATH] Interest in large patches for small cleanup / performance changes?
How many of these actually matter any more? On Sat, Nov 2, 2013 at 7:52 AM, Sean Owen sro...@apache.org wrote: In Math, is there any appetite for large patches containing many instances of particular micro-optimizations? Examples: - Replace: a[i][j] = a[i][j] + foo; with: a[i][j] += foo; … which is faster/leaner in the byte code by a little bit. It might make a difference in many nested, tight loops. Does this actually matter after the JIT takes hold? And if the JIT doesn't care to optimize this away, does it even matter? - Inefficient toArray() calls with 0-length arg - Using Map.entrySet() instead of keySet() + get()s I think that this actually really does matter, but escape analysis has gotten dramatically better lately and may make the associated object creation much less of an issue. - Unnecessarily non-static private methods/classes This is stylistic and important. - StringBuffer vs StringBuilder I know for a fact that escape analysis in recent JVM's gets rid of the locks in most StringBuilder idioms and this just doesn't matter any more.
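For the entrySet() point above, a minimal sketch of the two idioms: iterating entries does one hash lookup per key in total, while keySet() plus get() does two.

```java
import java.util.HashMap;
import java.util.Map;

public class EntrySetDemo {
    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 1);
        counts.put("b", 2);

        int slow = 0;
        for (String key : counts.keySet()) {
            slow += counts.get(key);      // second hash lookup per key
        }

        int fast = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            fast += e.getValue();         // no extra lookup
        }

        System.out.println(slow + " " + fast);  // 3 3
    }
}
```

Whether the second lookup is ever visible in profiles is exactly the JIT/escape-analysis question raised above.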
Re: [Math] due-to attribute in changes.xml
On Thu, Oct 31, 2013 at 10:24 AM, Gilles gil...@harfang.homelinux.orgwrote: The person who raised the bug still took the trouble to do so. My question is still: is it sufficient? Without filing a bug report, the reporter is harming himself. Also, some reports are only feature requests. I deem it quite unfair that the release notes would contain lines such as * MATH-123456789: Algorithm Xxx implemented. Thanks to reporter. How is it controversial to say thank you for contributions? The report is a contribution and being nice could encourage more contributions. Being all officious about what suffices to be worthy enough to make the oh so mighty gatekeepers be generous is a great way to turn people off.
Re: [MATH] Repurposing a deprecated constructor in EigenDecomposition
On Wed, Oct 23, 2013 at 3:14 PM, Sean Owen sro...@gmail.com wrote: EigenDecomposition resembles QR in this respect, as far as they are implemented here. This argues for them to treat arguments similarly. Actually not. It is quite reasonable for the EigenDecomposition to stop when singularity is reached. This affects the shape of the eigenvector matrix. Perhaps add a new constructor with a double tolerance and a boolean that says to stop early. QR is subject to the same logic since partial QR is often more useful than full QR with singular R. This is the same logic as with Cholesky since QR and Cholesky are two sides of the same coin in many respects.
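The stop-at-singularity idea above amounts to treating singular values below a tolerance as zero. A standalone sketch of that test; the helper name and tolerance policy are illustrative, not a Commons Math API:

```java
// Numerical rank from a spectrum: count singular values above a threshold
// scaled by the largest one. A decomposition that "stops early" is doing
// this test as it goes.
public class NumericalRank {
    static int rank(double[] singularValues, double relativeTolerance) {
        // Singular values are conventionally sorted in decreasing order,
        // so the first entry is the largest.
        double threshold = singularValues[0] * relativeTolerance;
        int rank = 0;
        for (double s : singularValues) {
            if (s > threshold) {
                rank++;
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        double[] s = {2.0, 1.0, 1e-14};      // a numerically rank-2 spectrum
        System.out.println(rank(s, 1e-10));  // 2
    }
}
```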
Re: [MATH] Repurposing a deprecated constructor in EigenDecomposition
On Wed, Oct 23, 2013 at 8:33 PM, Sean Owen sro...@gmail.com wrote: it feels a little funny just because then we should have similar logic for other decompositions. I think I remember the LU one stops early, always. The stopping early is definitely an option with QR. With LU, it isn't so clear.
Re: svn commit: r1533990 - /commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java
Thread issue. Off topic for this thread. No idea how this happened. On Mon, Oct 21, 2013 at 3:25 PM, Phil Steitz phil.ste...@gmail.com wrote: Was this maybe to the wrong thread, or is there a doco issue here? Phil On Oct 20, 2013, at 10:42 PM, Ted Dunning ted.dunn...@gmail.com wrote: This makes it somewhat harder to read the docs code which is where I read docs 90+% of the time. On the other hand my IDE will do the right thing if I ask it to. Sent from my iPhone On Oct 20, 2013, at 14:27, Thomas Neidhart thomas.neidh...@gmail.com wrote: On 10/20/2013 11:24 PM, t...@apache.org wrote: Author: tn Date: Sun Oct 20 21:24:45 2013 New Revision: 1533990 URL: http://svn.apache.org/r1533990 Log: [MATH-1039] Avoid code duplication by calling logDensity itself. Modified: commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java Modified: commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java URL: http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java?rev=1533990&r1=1533989&r2=1533990&view=diff

--- commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java (original)
+++ commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java Sun Oct 20 21:24:45 2013
@@ -136,24 +136,8 @@ public class BetaDistribution extends Ab
     /** {@inheritDoc} */
     public double density(double x) {
-        recomputeZ();
-        if (x < 0 || x > 1) {
-            return 0;
-        } else if (x == 0) {
-            if (alpha < 1) {
-                throw new NumberIsTooSmallException(LocalizedFormats.CANNOT_COMPUTE_BETA_DENSITY_AT_0_FOR_SOME_ALPHA, alpha, 1, false);
-            }
-            return 0;
-        } else if (x == 1) {
-            if (beta < 1) {
-                throw new NumberIsTooSmallException(LocalizedFormats.CANNOT_COMPUTE_BETA_DENSITY_AT_1_FOR_SOME_BETA, beta, 1, false);
-            }
-            return 0;
-        } else {
-            double logX = FastMath.log(x);
-            double log1mX = FastMath.log1p(-x);
-            return FastMath.exp((alpha - 1) * logX + (beta - 1) * log1mX - z);
-        }
+        final double logDensity = logDensity(x);
+        return logDensity == Double.NEGATIVE_INFINITY ? 0 : FastMath.exp(logDensity);
     }
     /** {@inheritDoc} **/

I did this change for one class, but I propose to do this wherever applicable to avoid code duplication in the distribution classes. WDYT? Thomas - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [math] MathIllegalArgumentException
+1 The overwhelming standard practice is to use a plausible exception type (such as some form of IllegalArgumentException) with a message. On Mon, Oct 21, 2013 at 5:24 PM, Phil Steitz phil.ste...@gmail.com wrote: I hate to open this can of worms again, but the following is just too painful for me to ignore. From recent mods to BinomialConfidenceInterval javadoc: * @throws NumberIsTooLargeException if {@code numberOfSuccesses > numberOfTrials}. The NumberIsTooLarge exception adds exactly zero to what would be more natural - just throw MathIAE. Fortunately, the message is at least still there in the code: if (numberOfSuccesses > numberOfTrials) { throw new NumberIsTooLargeException(LocalizedFormats.NUMBER_OF_SUCCESS_LARGER_THAN_POPULATION_SIZE, numberOfSuccesses, numberOfTrials, true); } The NumberIsTooLarge is ridiculous. What number? Why isn't the second number too small? If we really are going insist on defining and advertising lots of little subexceptions to MathIAE, we need to define appropriate ones, or just leave MathIAE. My vote is to just allow throwing MathIAE with a descriptive message. If we insist on adding subexceptions for everything, in this case, we need something like SubsetSizeException and in another set of changes that I am about to commit that will end up similarly mangled, I will need InsufficientDataException I would like to get full community input on this topic for once and for all and either add a slew of new exceptions so what we throw is meaningful in the context of the caller, or just allow MathIAE to be thrown directly. So please all be brief and specify your preference for one of the two options below: 0) allow MathIAE to be thrown directly with an informative message 1) define caller-meaningful exceptions for situations such as insufficient data, invalid subset size, invalid probability, invalid interval, ...
I would much prefer 0), but if consensus is 1), I will start adding exceptions so what we throw is meaningful and open a ticket to clean up for 4.0. Phil - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
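Option 0) can be sketched with a plain IllegalArgumentException standing in for MathIAE; the point is that the message carries all the context the specialized exception type was trying to encode. The method and message text are illustrative:

```java
// A precondition check that throws a generic argument exception with a
// fully descriptive message, rather than a narrow subexception type.
public class PreconditionDemo {
    static void checkInterval(int numberOfSuccesses, int numberOfTrials) {
        if (numberOfSuccesses > numberOfTrials) {
            throw new IllegalArgumentException(String.format(
                "number of successes (%d) must be less than or equal to number of trials (%d)",
                numberOfSuccesses, numberOfTrials));
        }
    }

    public static void main(String[] args) {
        try {
            checkInterval(10, 5);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```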
Re: svn commit: r1533990 - /commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java
This makes it somewhat harder to read the docs code which is where I read docs 90+% of the time. On the other hand my IDE will do the right thing if I ask it to. Sent from my iPhone On Oct 20, 2013, at 14:27, Thomas Neidhart thomas.neidh...@gmail.com wrote: On 10/20/2013 11:24 PM, t...@apache.org wrote: Author: tn Date: Sun Oct 20 21:24:45 2013 New Revision: 1533990 URL: http://svn.apache.org/r1533990 Log: [MATH-1039] Avoid code duplication by calling logDensity itself. Modified: commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java Modified: commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java URL: http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java?rev=1533990&r1=1533989&r2=1533990&view=diff

--- commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java (original)
+++ commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java Sun Oct 20 21:24:45 2013
@@ -136,24 +136,8 @@ public class BetaDistribution extends Ab
     /** {@inheritDoc} */
     public double density(double x) {
-        recomputeZ();
-        if (x < 0 || x > 1) {
-            return 0;
-        } else if (x == 0) {
-            if (alpha < 1) {
-                throw new NumberIsTooSmallException(LocalizedFormats.CANNOT_COMPUTE_BETA_DENSITY_AT_0_FOR_SOME_ALPHA, alpha, 1, false);
-            }
-            return 0;
-        } else if (x == 1) {
-            if (beta < 1) {
-                throw new NumberIsTooSmallException(LocalizedFormats.CANNOT_COMPUTE_BETA_DENSITY_AT_1_FOR_SOME_BETA, beta, 1, false);
-            }
-            return 0;
-        } else {
-            double logX = FastMath.log(x);
-            double log1mX = FastMath.log1p(-x);
-            return FastMath.exp((alpha - 1) * logX + (beta - 1) * log1mX - z);
-        }
+        final double logDensity = logDensity(x);
+        return logDensity == Double.NEGATIVE_INFINITY ? 0 : FastMath.exp(logDensity);
     }
     /** {@inheritDoc} **/

I did this change for one class, but I propose to do this wherever applicable to avoid code duplication in the distribution classes. WDYT? Thomas - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [OT][LANG] Blog post about Validate vs. Guava Preconditions
In general, it is going to be very, very hard for Commons to go up against guava. The Preconditions stuff is only the tip of the iceberg. The advantages highlighted in the blog are typical of every aspect of guava ... well thought out (the different exception types and varargs, for instance) and absolutely no apologies for requiring recent Java versions. To actually match the quality of guava, Commons would have to stop worrying about minutiae like whether or not there is a Validate.isNotEmpty and start pushing hard and fast against the real issues. On Fri, Oct 18, 2013 at 8:20 AM, Benedikt Ritter brit...@apache.org wrote: Hi, this came in via twitter: http://piotrjagielski.com/blog/google-guava-vs-apache-commons-for-argument-validation/ What do we do to win the next contest? :-) Benedikt -- http://people.apache.org/~britter/ http://www.systemoutprintln.de/ http://twitter.com/BenediktRitter http://github.com/britter
Re: [math] Add Pair factory method, toString(), Comparator
On Thu, Oct 17, 2013 at 2:06 PM, Gilles gil...@harfang.homelinux.org wrote: The issue is closed, thank you. To be honest, I'm sorry I opened this issue, as it wasn't worth this much time or annoyance. If the regular contributors were thinking that way, no work would be done. There wouldn't be a project where people discuss just like we did. Gilles, If it weren't this way, there would be more regular contributors.
Re: [CHALLENGE] Move All of Commons to the Dormant
Careful there. Hen might suggest making that list dormant. Sent from my iPhone On Oct 16, 2013, at 0:38, Jörg Schaible joerg.schai...@gmx.de wrote: BTW: We already have a challenge result, it's just terribly out of date: https://wiki.apache.org/commons/CommonsPeople
Re: [math] Add Pair factory method, toString(), Comparator
Does this really add comparisons on average? Or does it only add comparisons on key equality? If the latter, the difference is definitely minute. Secondly, changing the comparator to include the value changes how sorted sets work. Usually this is good, occasionally bad. In any case, it is a rather subtle change, and it is difficult to determine whether there is an impact. On Wed, Oct 16, 2013 at 10:10 PM, Sean Owen sro...@gmail.com wrote: You are right that it adds 1 or 2 more branches per comparison. The new Comparator would at least be consistent with equals(), though it probably doesn't matter for correctness in practice. I am interested in closing this minor issue, so I suggest you ignore this part if you guess that this overhead is too much and that it's not worth offering this for other callers of CM. I'll just maintain my own copy. On Wed, Oct 16, 2013 at 9:56 PM, Gilles gil...@harfang.homelinux.org wrote: The potential problem is performance. The current code for sortInPlace is not as fast as it could be, very probably because it uses a Comparator. IIUC, using your new comparator will add another if (to test whether comparing the values is necessary). [And reverseOrder will also add a few operations on its own, I guess.] It would be nice to know whether the impact is really fairly negligible, or not. There is a class (in the test part of the repository) for performing simple benchmarks: org.apache.commons.math3.PerfTestUtils, which tries to provide fair comparison results by interleaving calls to two (or more) alternative codes. Best regards, Gilles
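For readers following along, here is a small self-contained sketch of the kind of comparator under discussion (hypothetical code, not the actual CM patch), using java.util's pair-like SimpleImmutableEntry in place of CM's Pair:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Comparator;

// Compare by key first and fall back to the value only on a key tie, so that
// compare() == 0 agrees with equals() for pairs of comparable components.
public class PairComparatorDemo {

    static <K extends Comparable<K>, V extends Comparable<V>>
    Comparator<SimpleImmutableEntry<K, V>> byKeyThenValue() {
        return (a, b) -> {
            int c = a.getKey().compareTo(b.getKey());
            // The extra branch below only runs on key equality, which is
            // why the average-case overhead should be minute.
            return c != 0 ? c : a.getValue().compareTo(b.getValue());
        };
    }

    public static void main(String[] args) {
        Comparator<SimpleImmutableEntry<Integer, String>> cmp = byKeyThenValue();
        SimpleImmutableEntry<Integer, String> p1 = new SimpleImmutableEntry<>(1, "a");
        SimpleImmutableEntry<Integer, String> p2 = new SimpleImmutableEntry<>(1, "b");
        SimpleImmutableEntry<Integer, String> p3 = new SimpleImmutableEntry<>(2, "a");
        System.out.println(cmp.compare(p1, p2) < 0); // key tie broken by value: true
        System.out.println(cmp.compare(p1, p3) < 0); // decided by the key alone: true
    }
}
```

Because a tie on both components yields 0 exactly when the pairs are equal, this comparator is consistent with equals(), which is the property that matters for TreeSet/TreeMap behavior.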
Re: [VOTE] Move Apache Commons to Git for SCM... - is not a consensus
Ralph, Majority votes at ASF almost never require a majority of all possible voters. Almost always the convention of at least 3 +1s and more +1s than -1s is used. As you can find in innumerable threads as well, consensus among the discussion participants is preferable for big changes (like moving to git). Consensus does not depend on the potential number of voters. In fact, virtually nothing depends on a quorum at ASF other than member votes. That said, this vote may well be a small victory that causes a larger problem. The hard question here is whether it is better to pause here in order to make faster progress later. Phil's point is a bit out of order ... if he had responded to the request for votes with his statement that the vote was premature, it would have been much better. To wait until after the vote has been lost and then claim that more discussion is needed is a bit of a problem, at least from the point of view of appearances. One very confusing procedural point is that half-way through the vote, the subject line reverted to [DISCUSS] rather than [VOTE]. See http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3CCALznzY4v1bPGrMotJkmSN8wp9hSjs8mMjSj89wfzBEgimhtxrw%40mail.gmail.com%3E This is the point at which Phil first commented. On the other hand, Phil also commented on the thread with the [VOTE] subject a number of times: http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3ca9d202a4-6e76-42d8-9606-1e40d6916...@gmail.com%3E http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c08688247-b00e-44c7-8b21-f107921b4...@gmail.com%3E http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c5256ff12.3070...@gmail.com%3E http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c110b24a9-dd67-436d-9e2d-e29521693...@gmail.com%3E In none of these did he say that the vote was premature.
On Sun, Oct 13, 2013 at 11:11 PM, Ralph Goers ralph.go...@dslextreme.com wrote: Actually, if you read Roy's post from a few days ago on Incubator General you will find that consensus is != to majority or unanimity. See http://mail-archives.apache.org/mod_mbox/incubator-general/201310.mbox/ajax/%3CC2FDB244-459D-4EC4-954A-7A7F6C4B179B%40gbiv.com%3E from which I quote below: Consensus is that everyone who shares an opinion agrees to a common resolution (even if they do not personally prefer that resolution). Unanimity means that everyone present agrees (for a PMC discussing things in email, that means everyone listed on the roster must affirmatively agree). Hence, consensus decisions can be vetoed, as is clearly stated in the HTTP Server Project Guidelines, unless the project has decided to adopt some other set of bylaws. As I understand this, consensus means that a majority must vote and there must not be any -1 votes among those who voted. Unanimity means everyone must vote and no one must vote -1. Of course, majority means there must be at least three +1 votes and more +1s than -1s. Notice that http://httpd.apache.org/dev/guidelines.html specifically says "An action item requiring consensus approval must receive at least 3 binding +1 votes and no vetoes." However, I don't see any guidance on the httpd page that would indicate whether this vote requires a consensus or a majority. One could certainly argue that deciding to move from svn to git is procedural and thus only requires a majority; however, I tend to believe that consensus would be preferred for this vote. Ralph On Oct 13, 2013, at 1:52 PM, James Carman wrote: Phil, While I appreciate your concerns, the vote is a valid vote: Votes on procedural issues follow the common format of majority rule unless otherwise stated. That is, if there are more favourable votes than unfavourable ones, the issue is considered to have passed -- regardless of the number of votes in each category.
(If the number of votes seems too small to be representative of a community consensus, the issue is typically not pursued. However, see the description of lazy consensus for a modifying factor.) I got this information from: http://www.apache.org/foundation/voting.html We definitely have enough people voting to be considered a consensus (consensus != unanimous). However, we will not move forward with the Git move if we don't have any luck with our test component (different thread). If we see the test component isn't working out well, then we can just decide (or vote again) to scrap the idea and move on. Hopefully that addresses your concerns. Thanks, James On Sun, Oct 13, 2013 at 3:47 PM, Phil Steitz phil.ste...@gmail.com wrote: On 10/13/13 8:09 AM, James Carman wrote: Well, it has been 72 hours, so let's tally up the votes. As I see it (counting votes on both lists): +1s James Carman Romain Manni-Bucau
Re: [VOTE] Move Apache Commons to Git for SCM... - is not a consensus
James, You succeeded in creating a second thread. It is the first thread that had the reverted subject line. Ironically, it was one of your posts that reverted the subject line ... likely related to the confusion you had in the first place with gmail. Check the archives. They show the subject lines. On Mon, Oct 14, 2013 at 12:07 AM, James Carman ja...@carmanconsulting.com wrote: There were two threads. As I explained, the first two DISCUSSION/VOTE threads were getting mingled together in gmail, so I started another thread for the VOTE hoping to avoid confusion (apparently I failed in that). On Sunday, October 13, 2013, Ted Dunning wrote: Ralph, Majority votes at ASF almost never require a majority of all possible voters. Almost always the convention of at least 3 +1s and more +1s than -1s is used. As you can find in innumerable threads as well, consensus among the discussion participants is preferable for big changes (like moving to git). Consensus does not depend on the potential number of voters. In fact, virtually nothing depends on a quorum at ASF other than member votes. That said, this vote may well be a small victory that causes a larger problem. The hard question here is whether it is better to pause here in order to make faster progress later. Phil's point is a bit out of order ... if he had responded to the request for votes with his statement that the vote was premature, it would have been much better. To wait until after the vote has been lost and then claim that more discussion is needed is a bit of a problem, at least from the point of view of appearances. One very confusing procedural point is that half-way through the vote, the subject line reverted to [DISCUSS] rather than [VOTE]. See http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3CCALznzY4v1bPGrMotJkmSN8wp9hSjs8mMjSj89wfzBEgimhtxrw%40mail.gmail.com%3E This is the point at which Phil first commented.
On the other hand, Phil also commented on the thread with the [VOTE] subject a number of times: http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3ca9d202a4-6e76-42d8-9606-1e40d6916...@gmail.com%3E http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c08688247-b00e-44c7-8b21-f107921b4...@gmail.com%3E http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c5256ff12.3070...@gmail.com%3E http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c110b24a9-dd67-436d-9e2d-e29521693...@gmail.com%3E In none of these did he say that the vote was premature. On Sun, Oct 13, 2013 at 11:11 PM, Ralph Goers ralph.go...@dslextreme.com wrote: Actually, if you read Roy's post from a few days ago on Incubator General you will find that consensus is != to majority or unanimity. See http://mail-archives.apache.org/mod_mbox/incubator-general/201310.mbox/ajax/%3CC2FDB244-459D-4EC4-954A-7A7F6C4B179B%40gbiv.com%3E from which I quote below: Consensus is that everyone who shares an opinion agrees to a common resolution (even if they do not personally prefer that resolution). Unanimity means that everyone present agrees (for a PMC discussing things in email, that means everyone listed on the roster must affirmatively agree). Hence, consensus decisions can be vetoed, as is clearly stated in the HTTP Server Project Guidelines, unless the project has decided to adopt some other set of bylaws. As I understand this, consensus means that a majority must vote and there must not be any -1 votes among those who voted. Unanimity means everyone must vote and no one must vote -1.
Notice that http://httpd.apache.org/dev/guidelines.html specifically says An action item requiring consensus approval must receive at least 3 binding +1 votes and no vetoes., However, I don't see any guidance on the httpd page that would indicate whether this vote requires a consensus or a majority. One could certainly argue that deciding to move from svn to git is procedural and thus only requires a majority, however I tend to believe that consensus would be what would be preferred for this vote. Ralph On Oct 13, 2013, at 1:52 PM, James Carman wrote: Phil, While I appreciate your concerns, the vote is a valid vote: Votes on procedural issues follow the common format of majority rule unless otherwise stated. That is, if there are more favourable votes than unfavourable ones, the issue is considered to have passed -- regardless of the number of votes in each category. (If the number of votes seems too small to be representative of a community consensus, the issue is typically
Re: [VOTE] Move Apache Commons to Git for SCM... - is not a consensus
Ralph, I completely agree that this vote wasn't consensus. But where you say "As I understand this, consensus means that a majority must vote and there must not be any -1 votes among those who voted", I disagree. The only quorum typically required for ASF consensus votes is 3 +1's, not a majority of possible voters. On Mon, Oct 14, 2013 at 2:15 AM, Ralph Goers ralph.go...@dslextreme.com wrote: Please re-read my message. James stated "We definitely have enough people voting to be considered a consensus (consensus != unanimous)." My point was to quote what Roy posted a few days ago, which said that while consensus isn't unanimity, it also isn't a simple majority vote either, so to state that consensus was reached is incorrect because there were several -1 votes. Ralph On Oct 13, 2013, at 3:51 PM, Ted Dunning wrote: Ralph, Majority votes at ASF almost never require a majority of all possible voters. Almost always the convention of at least 3 +1s and more +1s than -1s is used. As you can find in innumerable threads as well, consensus among the discussion participants is preferable for big changes (like moving to git). Consensus does not depend on the potential number of voters. In fact, virtually nothing depends on a quorum at ASF other than member votes. That said, this vote may well be a small victory that causes a larger problem. The hard question here is whether it is better to pause here in order to make faster progress later. Phil's point is a bit out of order ... if he had responded to the request for votes with his statement that the vote was premature, it would have been much better. To wait until after the vote has been lost and then claim that more discussion is needed is a bit of a problem, at least from the point of view of appearances. One very confusing procedural point is that half-way through the vote, the subject line reverted to [DISCUSS] rather than [VOTE].
See http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3CCALznzY4v1bPGrMotJkmSN8wp9hSjs8mMjSj89wfzBEgimhtxrw%40mail.gmail.com%3E This is the point at which Phil first commented. On the other hand, Phil also commented on the thread with the [VOTE] subject a number of times: http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3ca9d202a4-6e76-42d8-9606-1e40d6916...@gmail.com%3E http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c08688247-b00e-44c7-8b21-f107921b4...@gmail.com%3E http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c5256ff12.3070...@gmail.com%3E http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c110b24a9-dd67-436d-9e2d-e29521693...@gmail.com%3E In none of these did he say that the vote was premature. On Sun, Oct 13, 2013 at 11:11 PM, Ralph Goers ralph.go...@dslextreme.com wrote: Actually, if you read Roy's post from a few days ago on Incubator General you will find that consensus is != to majority or unanimity. See http://mail-archives.apache.org/mod_mbox/incubator-general/201310.mbox/ajax/%3CC2FDB244-459D-4EC4-954A-7A7F6C4B179B%40gbiv.com%3E from which I quote below: Consensus is that everyone who shares an opinion agrees to a common resolution (even if they do not personally prefer that resolution). Unanimity means that everyone present agrees (for a PMC discussing things in email, that means everyone listed on the roster must affirmatively agree). Hence, consensus decisions can be vetoed, as is clearly stated in the HTTP Server Project Guidelines, unless the project has decided to adopt some other set of bylaws. As I understand this, consensus means that a majority must vote and there must not be any -1 votes among those who voted. Unanimity means everyone must vote and no one must vote -1.
Of course, majority means there must be at least three +1 votes and more +1s than -1s. Notice that http://httpd.apache.org/dev/guidelines.html specifically says An action item requiring consensus approval must receive at least 3 binding +1 votes and no vetoes., However, I don't see any guidance on the httpd page that would indicate whether this vote requires a consensus or a majority. One could certainly argue that deciding to move from svn to git is procedural and thus only requires a majority, however I tend to believe that consensus would be what would be preferred for this vote. Ralph On Oct 13, 2013, at 1:52 PM, James Carman wrote: Phil, While I appreciate your concerns, the vote is a valid vote: Votes on procedural issues follow the common format of majority rule unless otherwise stated. That is, if there are more favourable votes than unfavourable ones, the issue is considered to have passed -- regardless of the number of votes in each category
Re: [DISCUSS] Why is releasing such a pain and what can we do to make it easier?
On Mon, Oct 14, 2013 at 2:55 AM, Henri Yandell flame...@gmail.com wrote: I propose release votes be simple revision-based requests and involve no artifact churn :) Hen, This is a pretty good idea. But I still think that artifact churn will be necessary to get enough valid QA on the artifacts. It should be possible, though, to get a source artifact out without so much pain.
Re: [VOTE] Moving to Git...
I hate myself a bit for jumping in here, but as much as I prefer git, I really don't think that changing will make that much difference. The problem with Commons is that people have much more energy for interminable conversations about things that don't much matter (like this thread). People who do things don't generally want to talk them to death. If half the energy that goes into long debates went into coding for Commons, there wouldn't be a problem. A long discussion about whether the long discussions are a problem is even worse than the long discussions themselves, so I am now part of the problem. Perhaps a good rule of thumb would be no more than 5 email messages about non-code issues per patch that you have posted to a Commons component. I am probably at or beyond that limit, so I will shut up and not respond further. Given that the open source community has gradually been re-inventing aspects of scientific society (salons = meetups, RS = ASF, and so on), maybe it is time to invent something like peer review to moderate the long conversations. On Fri, Oct 11, 2013 at 6:01 AM, Christian Grobmeier grobme...@gmail.com wrote: +1 I expect this move to happen step by step and see only little risk if we start with a single component first. Since half the world works with git these days, I see less risk in general too. On 10 Oct 2013, at 16:41, James Carman wrote: All, We have had some great discussions about moving our SCM to Git. I think it's time to put it to a vote. So, here we go: +1 - yes, move to Git -1 - no, do not move to Git The vote will be left open for 72 hours. Go!
James --- http://www.grobmeier.de @grobmeier GPG: 0xA5CC90DB
Re: [OT] Anyone going to JavaOne?
I am not going, but we have a ton of guys there. Drop by the MapR booth and say hi! On Thu, Sep 19, 2013 at 12:50 PM, James Carman ja...@carmanconsulting.com wrote: Is anyone planning on going? It would be great to meet some of you guys face-to-face for once, if you're going to be there. James
[MATH] Re: commons-math pull request: Two implementations of least squares in separeat...
On Tue, Aug 27, 2013 at 7:41 AM, Gilles gil...@harfang.homelinux.org wrote: The patch does not apply cleanly (special options needed to handle output from git?). Try different prefix levels. The -p0 option is commonly helpful.
Re: MannWhitneyUTest.mannWhitneyU
I think that it will be somewhat slower, but next to imperceptibly so. It will not be any more accurate. It should be noted, however, that this code will fail for input longer than 2^16 because of integer overflow. On Sun, Aug 25, 2013 at 8:27 PM, Dave Brosius dbros...@apache.org wrote: I would think that in public double mannWhitneyU(final double[] x, final double[] y) final double U1 = sumRankX - (x.length * (x.length + 1)) / 2; should be final double U1 = sumRankX - (x.length * (x.length + 1)) / 2.0; right?
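The overflow Ted refers to can be demonstrated in isolation (a standalone sketch, not the actual MannWhitneyUTest source):

```java
// x.length * (x.length + 1) is evaluated in int arithmetic, which silently
// wraps once the product exceeds Integer.MAX_VALUE (from roughly n = 46341
// on), corrupting the rank-sum correction term n*(n+1)/2. Widening to long
// before the multiply avoids the wrap; the result then fits in a double.
public class RankSumOverflow {

    static double correctionBroken(int n) {
        return (n * (n + 1)) / 2;          // int multiply wraps around
    }

    static double correctionFixed(int n) {
        return ((long) n * (n + 1)) / 2.0; // widen before multiplying
    }

    public static void main(String[] args) {
        int n = 100_000;
        System.out.println(correctionBroken(n)); // wrapped, wrong
        System.out.println(correctionFixed(n));  // 5.00005e9, correct
    }
}
```

Note that n*(n+1) is always even, so Dave's / 2 vs / 2.0 distinction only changes the result after the product has already wrapped to an odd int; the real fix is the widening multiply.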
Re: Need for an alternatives to the main line of code.
On Thu, Aug 22, 2013 at 11:52 AM, Luc Maisonobe luc.maison...@free.fr wrote: Then you just clone it as you would clone any repository and provide a link to your own repository. If I remember correctly, Evan did just that a few days ago. And you can do with it as you will. Build a prototype without tests to make a point. Or fork the code into a proprietary product. Or whatever you like.
Re: Need for an alternatives to the main line of code.
+1 Sent from my iPhone On Aug 21, 2013, at 9:42, Ajo Fod ajo@gmail.com wrote: I hope you'll agree that, as it stands, this makes CM capable of solving only a subset of the mathematical problems it could solve with a more open policy. The argument for alternative designs of the API is great too, because it allows people to comment on the API design as they use it.
Re: [math] Kolmogorov-Smirnov 2-sample test
On Sat, Aug 10, 2013 at 8:59 AM, Ajo Fod ajo@gmail.com wrote: If the data doesn't fit, you probably need a storeless quantile estimator like QuantileBin1D from the colt libraries. Then pick a resolution and do the single-pass search. Peripheral to the actual topic, but the Colt libraries are out of date in almost every respect. When we added unit tests, even the most basic functions turned up dozens of serious bugs. With respect to more advanced estimation such as quantiles, nothing in Colt comes close to streamlib. Even the Mahout on-line estimators are generally superior. QuantileBin1D, in particular, lacks the machinery of QDigests (not surprising, since they were published in 2004, long after Colt went dormant). Check out https://github.com/clearspring/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/quantile/QDigest.java and the original paper http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf
Re: [Math] Fluent API, inheritance and immutability
This is often dealt with by using builder classes and not putting all the fluent methods on the objects being constructed. The other way to deal with this is to use a covariant return type. For instance, there is no guarantee that Pattern.compile returns any particular class other than that it returns a sub-class of Pattern. Do you have a specific example of the problem you are alluding to? On Wed, Aug 7, 2013 at 11:23 AM, Gilles gil...@harfang.homelinux.org wrote: Hi. It seems that any two of those concepts are at odds with the third one. E.g. you can have a fluent API and immutability, but this then prevents you from defining fluent API methods in a base class, because immutability requires creating a new object (and the base class cannot know how to build a subclass). Dropping immutability would allow defining withXxx at the hierarchy level where they belong, because the fluent methods would modify instance fields (and return the existing this). Whereas keeping immutability requires that all withXxx always be redefined at the bottom level of the hierarchy. Also, if a base class is abstract, no fluent method can be defined in it, since it cannot be instantiated. This also leads to the situation where the (re)initialization of fields that belong to the base class must be delegated to withXxx methods in all the subclasses. Thus, in this particular case, immutability entails code duplication. I wonder whether it would be possible to have the best of all worlds by 1. dropping immutability of the instance fields, 2. requiring that all classes participating in the fluent API implement a copy constructor, 3. requiring that all non-abstract classes implement a copy method (whose contract is to return a fresh copy). Hence, code that would like to ensure that it is the sole owner of an object would be able to call the copy method on a mutable instance that would have been constructed with the fluent API.
[On a side note: that proposal would also seem to reduce the overhead (however small that may be) of creating a new object for each modification, as well as allow usage in situations where creating a new instance would be undesirable, e.g. applying a withXxx method on an object stored in a collection would create a local instance and require that it be assigned back into the collection, preventing the use of language constructs such as foreach loops.] I know that dropping immutability seems a step backwards from what we're trying to achieve in Commons Math, but it seems that we must let go of something (and security could be retained by unit tests that check the contract of copy). If I'm missing something obvious, please let me know! Gilles
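One partial answer to the return-type half of the problem (the covariant-return idea raised earlier in the thread) is a self-referential type parameter: the base class can declare the fluent method with a precise return type, while construction is still delegated to the subclass, as Gilles notes it must be. A sketch with hypothetical classes, not CM code:

```java
// The base class declares withXxx returning T, the concrete subtype, so a
// fluent chain never degrades to the base type. Each subclass supplies the
// "copy with new field value" step, keeping every instance immutable.
public class FluentHierarchyDemo {

    abstract static class Solver<T extends Solver<T>> {
        final int maxIterations;

        Solver(int maxIterations) { this.maxIterations = maxIterations; }

        // Declared here with the precise return type; implemented below.
        abstract T withMaxIterations(int maxIterations);
    }

    static final class BisectionSolver extends Solver<BisectionSolver> {
        final double tolerance;

        BisectionSolver(int maxIterations, double tolerance) {
            super(maxIterations);
            this.tolerance = tolerance;
        }

        @Override
        BisectionSolver withMaxIterations(int n) {
            return new BisectionSolver(n, tolerance); // new immutable instance
        }

        BisectionSolver withTolerance(double t) {
            return new BisectionSolver(maxIterations, t);
        }
    }

    public static void main(String[] args) {
        BisectionSolver s = new BisectionSolver(100, 1e-9)
                .withMaxIterations(500)  // returns BisectionSolver, not Solver
                .withTolerance(1e-12);
        System.out.println(s.maxIterations + " " + s.tolerance);
    }
}
```

This keeps the fluent chain type-safe without mutation, though it does not remove the per-subclass boilerplate that Gilles objects to; builder classes remain the other standard escape hatch.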
Re: [Math] Does any committer understand Change of variables?
The math is quite simple. What is not clear is what the numerical properties are for substitution of the sort being advocated. Which functions will do better with substitution? Which will do better with Laguerre polynomials? On Sat, Jul 20, 2013 at 8:59 AM, Ajo Fod ajo@gmail.com wrote: The method is described here: http://en.wikipedia.org/wiki/Integration_by_substitution My patch uses it for improper integration via the change of variable t/(1-t^2) as suggested in : http://en.wikipedia.org/wiki/Numerical_integration Please reach back if anyone understands this concept. Cheers, -Ajo
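For readers unfamiliar with the technique, here is a minimal numeric check of the substitution Ajo cites (a sketch, not his patch): x = t/(1 - t^2) maps (-1, 1) onto the whole real line with dx = (1 + t^2)/(1 - t^2)^2 dt, so an improper integral becomes a proper one that a fixed-interval rule can evaluate. The midpoint rule below recovers the Gaussian integral, whose exact value is sqrt(pi).

```java
import java.util.function.DoubleUnaryOperator;

// Integrate f over (-inf, inf) by substituting x = t/(1 - t^2) and applying
// the composite midpoint rule on (-1, 1). Midpoints keep t away from +/-1,
// where the weight (1 + t^2)/(1 - t^2)^2 blows up; the integrand's decay
// drives the product to 0 there for rapidly decaying f.
public class SubstitutionDemo {

    static double integrateOverRealLine(DoubleUnaryOperator f, int n) {
        double h = 2.0 / n, sum = 0;
        for (int i = 0; i < n; i++) {
            double t = -1 + (i + 0.5) * h;                 // midpoint of subinterval i
            double x = t / (1 - t * t);                    // substituted abscissa
            double w = (1 + t * t) / Math.pow(1 - t * t, 2); // Jacobian dx/dt
            sum += f.applyAsDouble(x) * w * h;
        }
        return sum;
    }

    public static void main(String[] args) {
        double v = integrateOverRealLine(x -> Math.exp(-x * x), 100_000);
        System.out.println(v); // close to sqrt(pi), about 1.7724539
    }
}
```

Which functions this works well for is exactly the open question in the thread: the substitution is only as good as the smoothness of the transformed integrand on (-1, 1).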
Re: [Math] Cleaning up the curve fitters
The discussion about how to get something into Commons when it is (a) well documented and (b) demonstrated better on at least some domains is partially procedural, but it hinges on technical factors. I think that Ajo is being very reserved here. When I faced similar discouragement in the past with Commons Math contributions, I simply went elsewhere. It still seems to me that it would serve CM well to pay more attention to Ajo's comments and suggestions. Simply saying that we should focus on technical discussion when CM's list is filled with esthetic arguments really just sounds like a way of pushing people away. On Fri, Jul 19, 2013 at 10:21 AM, Phil Steitz phil.ste...@gmail.com wrote: As I said above, let's focus on actual technical discussion here. We implement standard, well-documented algorithms. We need to provide references and convince ourselves that what we release is numerically sound, well-documented and well-tested. We do our best with the volunteer resources we have. Your help and contributions are appreciated. Phil On 7/19/13 9:44 AM, Ajo Fod wrote: Hi, I very much appreciate the work that has been done in CM, and this is precisely why I'd like more people to contribute. Even when you didn't accept my MATH-995 patch, I got useful input from Konstantin, and it has already made my application more efficient. What you required of me in the improper integral example was a comparison of different methods. This sort of research takes time. I hear that Gilles is working on it. I appreciate that you guys spent so much effort on this. However, my contention is that your efforts at researching alternate solutions to a patch are not justified until you come up with a test that the patch fails OR you know an alternative that performs better for an application you have. In the first case, you're losing the efficiency of open source by reinventing a possibly different wheel without sufficient marginal reward.
In the second case, beware of the fact that numerical algorithms are hairy beasts, and it takes a while to encode something new. The efficiency of Commons comes from putting the burden of development on the developers who need the code. So, I propose an alternate approach to deciding whether a submitted patch should be accepted: 1. Check if the patch fills a gap in existing CM code. 2. If so, check if it passes known tests. 3. If so, write up alternate tests to see if the code breaks. 4. If not, wrap the code up in a suitable API and accept the patch. This has two advantages. First, CM will have more capabilities per unit of your precious time. Second, you give people the feeling that they are making a difference. As far as the debate on AQ (AdaptiveQuadrature) vs LGQ (IterativeLegendreGaussIntegrator) goes, the FACTS that support AQ over LGQ are: 1. An example where LGQ failed and AQ succeeded. I also explained why LGQ fails and why AQ will probably converge more correctly. Generally, adaptive quadratures are known to be so successful at integration that Konstantin even wondered why we don't have something yet. 2. Efficiency improvement: I also showed that AQ is more efficient on at least one example in terms of accuracy in digits per function evaluation. So, conversely, it's now your turn to provide concrete examples where LGQ does better than AQ. You could pose credible objections by providing examples where: 1. AQ fails but LGQ passes. 2. LGQ is more efficient in accuracy per evaluation. All that to illustrate, with examples, where the perception that it is hard to convince the gatekeepers of Commons of the merits of a patch comes from. I have a package in my codebase with assorted patches that I just don't think are worth the time to try to post to Commons. I think it is very inefficient if others have such private patches.
Cheers, Ajo On Thu, Jul 18, 2013 at 2:15 PM, Phil Steitz phil.ste...@gmail.com wrote: On 7/18/13 1:48 PM, Ajo Fod wrote: Hello folks, There is a lot of work in API design. However, Konstantin's point is that it takes a lot of effort to convince Gilles of any alternatives. API design issues should really be second to functionality. This idea seems to be lost in conversations. With patience and collaboration you can have both and we *need* to have both. You can't get to a stable API and approachable and maintainable code base without thinking carefully about API design. I agree with Gilles that providing tests and benchmarks that exhibit the advantages of a particular method are probably the best way to show other contributors the value of an alternative approach. There is some value to this, but honestly much more value in carefully researching and presenting the numerical analysis to support improvement / performance claims. It is quite depressing to the contributor to see one's contribution be rejected
Re: [Math] Cleaning up the curve fitters
On Fri, Jul 19, 2013 at 12:27 PM, Phil Steitz phil.ste...@gmail.com wrote: It still seems to me that it would serve CM well to pay more attention to Ajo's comments and suggestions. Simply saying that we should focus on technical discussion when CM's list is filled with esthetic arguments really just sounds like a way of pushing people away. Please read the threads. This is not esthetics. Maybe you can help. I read the threads. I stand by my assertions that there are *lots* of non-technical discussions on CM. Rejecting this particular argument since it is procedural+technical rather than just technical seems less supportable than continuing those other threads. Help? The way that I help with the esthetic threads is by staying out of the way. Contributing to the noise level is not helpful and the endurance and spare time of the typical CM poster is apparently boundless. With this thread, the input I have is that Ajo's comments make a lot of sense (to me) and the technical arguments against including AQ as an option do not make a lot of sense (to me).
Re: [math] Math-Jax in javadoc?
We have adopted this in Mahout based on the suggestion I saw here. It works great. On Sun, Jul 14, 2013 at 2:31 PM, Ajo Fod ajo@gmail.com wrote: I like this idea too. I'm curious to know how it works. +1 On Sun, Jul 14, 2013 at 11:35 AM, Thomas Neidhart thomas.neidh...@gmail.com wrote: On 07/14/2013 07:50 PM, Phil Steitz wrote: I think we have talked about this before but did not achieve consensus or at least never got it set up. I am finishing the javadoc for Kolmogorov-Smirnov tests (MATH-437) and would really like to just embed TeX formulas in the class javadoc. I found that just adding an additionalparam element to the javadoc plugin config in the pom works to get MathJax configured. Then you just use TeX escapes \( and \) for inline formulas, \[ and \] for display formulas. If others are OK with this, I will open a JIRA to make the pom change and document usage in the programmers guide. +1 Thomas - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
Re: [math] Math-Jax in javadoc?
I am not sure that I tested with other than javadoc:javadoc. I also don't think that Mahout uses the site target for generating javadocs (at least not in continuous integration). We are only now doing our first release since making the change, so I can't say how that is turning out just yet. On Sun, Jul 14, 2013 at 10:01 PM, Phil Steitz phil.ste...@gmail.com wrote: On 7/14/13 3:24 PM, Ted Dunning wrote: We have adopted this in Mahout based on the suggestion I saw here. It works great. I just opened a ticket (https://issues.apache.org/jira/browse/MATH-1006) and attached a patch. For some reason, the maven site plugin does not pass the -header option through to the javadoc plugin. When I do mvn javadoc:javadoc it works; but mvn site does not bring the MathJax engine in. How did you guys get this to work in Mahout? Thanks in advance. Phil On Sun, Jul 14, 2013 at 2:31 PM, Ajo Fod ajo@gmail.com wrote: I like this idea too. I'm curious to know how it works. +1 On Sun, Jul 14, 2013 at 11:35 AM, Thomas Neidhart thomas.neidh...@gmail.com wrote: On 07/14/2013 07:50 PM, Phil Steitz wrote: I think we have talked about this before but did not achieve consensus or at least never got it set up. I am finishing the javadoc for Kolmogorov-Smirnov tests (MATH-437) and would really like to just embed TeX formulas in the class javadoc. I found that just adding an additionalparam element to the javadoc plugin config in the pom works to get MathJax configured. Then you just use TeX escapes \( and \) for inline formulas, \[ and \] for display formulas. If others are OK with this, I will open a JIRA to make the pom change and document usage in the programmers guide. +1 Thomas
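For reference, the pom change being discussed looks roughly like the fragment below. This is a sketch, not the exact MATH-1006 patch: configuration element names and the MathJax CDN URL have varied across maven-javadoc-plugin versions, so treat the details as illustrative.

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <configuration>
    <!-- Inject the MathJax engine into every generated page so that
         \( ... \) and \[ ... \] TeX escapes in javadoc are rendered. -->
    <additionalparam>
      -header '&lt;script type="text/javascript"
        src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"&gt;&lt;/script&gt;'
    </additionalparam>
  </configuration>
</plugin>
```

As noted in the thread, `mvn javadoc:javadoc` honors this, while the site plugin may fail to pass `-header` through to javadoc.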
Re: Lang: ObjectUtils
A bigger question is why this is needed at all. Why not just use composition? In Guava, one would do this: Iterables.all(Arrays.asList(foo), new Predicate<Double>() { @Override public boolean apply(Double input) { return input != null; } }); Surely the same is already possible with commons. On Thu, Jul 4, 2013 at 12:23 PM, Dave Brosius dbros...@mebigfatguy.com wrote: This implies that having arrays with some null elements is a) somewhat common and b) a good idea. I'd say both are not true. I'm not sure the library should promote that the above is the case. On 07/04/2013 02:43 PM, Rafael Santini wrote: Hi, I would like to propose a method in ObjectUtils class that receives an array of objects and returns true if all objects are not null. I have implemented the following: public static boolean isNull(Object object) { return object == null; } public static boolean isNotNull(Object object) { return isNull(object) == false; } public static boolean isNotNull(Object... objects) { for (Object object : objects) { if (isNull(object)) { return false; } } return true; } Can I submit a patch for this feature? Thanks, Rafael Santini
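As a further aside, on Java 8+ the same composition needs no third-party library at all. A minimal sketch (the method name allNonNull is mine for illustration, not a Lang proposal):

```java
import java.util.Arrays;
import java.util.Objects;

public class NullChecks {

    // True when the array itself and every element of it are non-null.
    public static boolean allNonNull(Object... objects) {
        return objects != null && Arrays.stream(objects).allMatch(Objects::nonNull);
    }

    public static void main(String[] args) {
        System.out.println(allNonNull("a", 1, 2.0));   // no nulls
        System.out.println(allNonNull("a", null));     // contains a null
    }
}
```

This is the same Predicate-composition idea as the Guava snippet above, expressed with streams and the JDK's own Objects::nonNull.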
Re: [math] On MATH-995: Problems with LegendreGaussQuadrature class.
On Fri, Jun 28, 2013 at 9:05 AM, Gilles gil...@harfang.homelinux.org wrote: Did you read my other (rather more lengthy) post? Is that jumping? Yes. You jumped on him rather than helped him be productive. The general message is "we have something in the works, don't bother us with your ideas".
Re: [Bag] random pick
I think that this implementation is a problem. Bag implementations tend to fall into different categories, according to whether they provide unit (or log) time random access, random deletion and ordered traversal. Most implementations don't provide unit time for all operations. I think that your implementation assumes multiple operations have unit time. Otherwise sampling all of the items will take quadratic time. Also, your implementations alter the underlying collections. That seems like a bad policy. Sampling without replacement can be implemented as iteration over a permutation with unit amortized time. A trivial implementation uses an auxiliary array. It is also possible to do without the auxiliary. You can get unit amortized time for linked list permutations as well, but it is difficult to do without the extra storage. Finally, the names should not consist of combinations of pick and remit. The correct English terms are sample and replacement. On Sat, Mar 16, 2013 at 6:25 PM, Othmen Tiliouine tiliouine.oth...@gmail.com wrote: This is an example of use of pick from Bag https://github.com/influence160/flera see the classes https://github.com/influence160/flera/blob/master/flera-core/src/main/java/com/otiliouine/flera/SuccessionBasedWordGenerator.javaand https://github.com/influence160/flera/blob/master/flera-core/src/main/java/com/otiliouine/flera/analyzer/SuccessionBasedDictionaryAnalyzer.java 2013/3/13 Othmen Tiliouine tiliouine.oth...@gmail.com I remplaced the patch 2013/3/13 Ted Dunning ted.dunn...@gmail.com You seem to have reformatted the entire file. This makes it nearly impossible to review your suggested change. Can you make a diff that doesn't involve changing every line in the file? 
On Tue, Mar 12, 2013 at 3:48 PM, Othmen Tiliouine tiliouine.oth...@gmail.com wrote: i puted the suggestion and attached the patch https://issues.apache.org/jira/browse/COLLECTIONS-448 2013/3/12 Thomas Neidhart thomas.neidh...@gmail.com On 03/12/2013 08:58 AM, Othmen Tiliouine wrote: Thank you Ted, I understand from your mail that there is no particular reason that makes that the interface Bag no contains these methods and that this subject has never been discussed in the mailing list. I'll see if I can create this patch this weekend, but i want to know, what do you think what are the methods I should add public Object pick(); //pick random element (with remove) public Object pickAndRemit() ; //pick random element (without remove) public Collection pick(int n); //pick random n element (with remove) public Collection pickAndRemit(int n) ; //pick random n element (without remove) public Iterator pick(int n); //pick random n element (with remove) public Iterator pickAndRemit(int n) ; //pick random n element (without remove) public List pick(int n); //pick random n element (with remove) public List pickAndRemit(int n) ; //pick random n element (without remove) public Bag pick(int n); //pick random n element (with remove) public Bag pickAndRemit(int n) ; //pick random n element (without remove) maby i must provide the two kind of methods ( Bag pick(int n) and Iterator pickOrdered(int n) ) ? there is something I do not understand why the bag does not use generics ? the current version of collections in the trunk is already adapted to generics. We are currently in the process of preparing a release (4.0). So when you provide a patch, please align it to the version in the trunk. Thomas - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org
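The auxiliary-array idea Ted mentions (sampling without replacement as iteration over a permutation, in unit amortized time, without mutating the source) can be sketched as follows; the class name is illustrative, not a proposed Bag API:

```java
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Sampling without replacement as iteration over a lazily built permutation
// (partial Fisher-Yates shuffle). After the O(n) copy, each next() does one
// swap, so drawing k of n items costs O(k); the source list is never modified.
public class WithoutReplacementSampler<T> implements Iterator<T> {

    private final Object[] pool;   // auxiliary array: the only mutated state
    private final Random rng;
    private int remaining;

    public WithoutReplacementSampler(List<T> source, Random rng) {
        this.pool = source.toArray();
        this.rng = rng;
        this.remaining = pool.length;
    }

    @Override
    public boolean hasNext() {
        return remaining > 0;
    }

    @SuppressWarnings("unchecked")
    @Override
    public T next() {
        int i = rng.nextInt(remaining--);   // pick among the not-yet-drawn prefix
        Object picked = pool[i];
        pool[i] = pool[remaining];          // swap the drawn item out of the prefix
        pool[remaining] = picked;
        return (T) picked;
    }
}
```

Because the sampler copies the collection up front, the bag stays immutable during sampling, which is exactly the "randomized iterator" design point raised earlier in the thread.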
Re: [Bag] random pick
You seem to have reformatted the entire file. This makes it nearly impossible to review your suggested change. Can you make a diff that doesn't involve changing every line in the file? On Tue, Mar 12, 2013 at 3:48 PM, Othmen Tiliouine tiliouine.oth...@gmail.com wrote: i puted the suggestion and attached the patch https://issues.apache.org/jira/browse/COLLECTIONS-448 2013/3/12 Thomas Neidhart thomas.neidh...@gmail.com On 03/12/2013 08:58 AM, Othmen Tiliouine wrote: Thank you Ted, I understand from your mail that there is no particular reason that makes that the interface Bag no contains these methods and that this subject has never been discussed in the mailing list. I'll see if I can create this patch this weekend, but i want to know, what do you think what are the methods I should add public Object pick(); //pick random element (with remove) public Object pickAndRemit() ; //pick random element (without remove) public Collection pick(int n); //pick random n element (with remove) public Collection pickAndRemit(int n) ; //pick random n element (without remove) public Iterator pick(int n); //pick random n element (with remove) public Iterator pickAndRemit(int n) ; //pick random n element (without remove) public List pick(int n); //pick random n element (with remove) public List pickAndRemit(int n) ; //pick random n element (without remove) public Bag pick(int n); //pick random n element (with remove) public Bag pickAndRemit(int n) ; //pick random n element (without remove) maby i must provide the two kind of methods ( Bag pick(int n) and Iterator pickOrdered(int n) ) ? there is something I do not understand why the bag does not use generics ? the current version of collections in the trunk is already adapted to generics. We are currently in the process of preparing a release (4.0). So when you provide a patch, please align it to the version in the trunk. 
Thomas
Re: [Bag] random pick
Othmen, The common way to contribute code is to file a bug report/enhancement request at the correct commons component: https://issues.apache.org/jira/secure/BrowseProjects.jspa#10260 My guess is that you want collections, which is at https://issues.apache.org/jira/browse/COLLECTIONS Then you should put your suggested solution onto that JIRA as an attachment. The attachment should be a patch file. That will be a place where the merits of the contribution can be discussed. My own comment here is that the common idiom in the English statistical literature for sampling from a bag is either sampling without replacement or sampling with replacement. Moreover, it is typical that you allow for multiple items to be sampled rather than requiring sampling to proceed one element at a time. Sampling n items from an n-item bag without replacement, for instance, would return a permutation of the bag (if you get an ordered sample) or a partition of the bag (if you get an unordered sample). There is also the question of whether the bag should be considered immutable during sampling. If you want to leave the bag unchanged by sampling, then you probably should be returning a sampler object of some kind that is a kind of randomized iterator or iterable. These kinds of design decisions need to be hashed out in the process of getting your contribution into the code. Good luck with your contribution! On Mon, Mar 11, 2013 at 4:16 PM, Othmen Tiliouine tiliouine.oth...@gmail.com wrote: Hello, I just saw the Bag interface and its implementations, and I'm surprised that Bag<T> (and none of its implementations) exposes the methods public T pick() and public T pickAndRemit() (pick a random element). The Bag object we see in nature is mainly used for exactly this; that's why it is widely used in probability: when I have 2 white balls and one black one, and I draw a ball randomly, I am 2 times more likely to get a white ball. I think that if this characteristic existed it would be very valuable.
this is a simple implementation of pick() and pickAndRemit() with HashBag

package com.otiliouine.commons.collections;

import java.util.Iterator;

import org.apache.commons.collections.bag.HashBag;

public class OpaqueHashBag extends HashBag implements OpaqueBag {

    public Object pick() {
        if (size() == 0) {
            return null;
        }
        int randomIndex = (int) (Math.random() * size());
        int searchIndex = randomIndex;
        Iterator iterator = this.iterator(); // iterator = this.map.keySet().iterator()
        Object selectedItem = iterator.next();
        while (searchIndex > 0) {
            searchIndex--;
            selectedItem = iterator.next();
        }
        // int count;
        // while ((count = getCount(selectedItem)) > searchIndex + 1) {
        //     searchIndex -= count;
        //     selectedItem = iterator.next();
        // }
        return selectedItem;
    }

    public Object pickAndRemit() {
        Object picked = pick();
        if (picked != null) {
            remove(picked, 1);
        }
        return picked;
    }
}

it can be optimized if it is written in the AbstractMapBag class

and this is the test

public class TestOpaqueHashBag {

    private OpaqueHashBag bag;

    public static void main(String... args) {
        TestOpaqueHashBag t = new TestOpaqueHashBag();
        t.before();
        t.testpick();
    }

    public void before() {
        bag = new OpaqueHashBag();
        bag.add("white", 6);
        bag.add("black", 3);
        bag.add("red", 1);
    }

    public void testpick() {
        int w = 0, b = 0, r = 0;
        for (int i = 0; i < 1000; i++) {
            String ball = (String) bag.pick();
            if (ball.equals("white")) {
                w++;
            } else if (ball.equals("black")) {
                b++;
            } else {
                r++;
            }
        }
        System.out.println("%white = " + w / 10);
        System.out.println("%black = " + b / 10);
        System.out.println("%red = " + r / 10);
    }
}

output: %white = 59 %black = 30 %red = 9 Othmen TILIOUINE
Re: [ALL] How to handle static imports [was: Re: svn commit: r1441784 - /commons/sandbox/beanutils2/trunk/src/main/java/org/apache/commons/beanutils2/PropertyDescriptorsRegistry.java]
Another common use is with JUnit, to statically import assertEquals and the like. On Mon, Feb 4, 2013 at 4:41 PM, Gary Gregory garydgreg...@gmail.com wrote: On Mon, Feb 4, 2013 at 4:32 PM, Benedikt Ritter brit...@apache.org wrote: ... We haven't decided yet how to handle static imports. To form some rules, we'd like to hear what others think about static imports and what rules of thumb you use in your projects. I do not use static imports at work. I do not like using them unless it is for math-like expressions (with PI and the like).
Re: [math] DiscreteEmpiricalDistribution
This will be very useful. Sampling from discrete ECDFs is also closely related to generating samples from a multinomial distribution. I did a bit of work on the latter problem. The result of that work is in org.apache.mahout.math.random.Multinomial The major difference that you will have is that you have an ordered domain that you are sampling from, while Multinomial simply samples from a set. It would be relatively easy to use Multinomial<Interval> to do what you want, where Interval represents a half-open interval (and allows +Inf as a right bound and -Inf as a left bound), but I think that you might gain something from the ability to split an interval, which would make Multinomial irrelevant to your needs. With Multinomial<Interval>, one strategy would be to delete an interval and insert both halves, which may be a bit more expensive than desired. Doing lots of deletions is also bad in Multinomial because it leaves an entry in place with zero probability (because I was lazy). Trying to mutate the Interval you are splitting so that it forms the left half of the new interval doesn't actually help, because modifying the probability of an element just does a deletion and insertion. Better to use the first strategy. It would be very easy to add a garbage collection step that eliminates zero-probability entries. As I mentioned, I was just lazy. Anyway, steal concept or code as you feel appropriate. On Mon, Jan 7, 2013 at 8:03 AM, Phil Steitz phil.ste...@gmail.com wrote: The EmpiricalDistribution class in the random package is designed to support large samples. It does not store all of the data points in memory, but instead bins the data and uses smoothing kernels within the bins. I have recently had the need for a discrete empirical distribution - i.e., an implementation that stores the full dataset in memory and creates the empirical distribution exactly from it.
If there are no objections, I would like to open a JIRA and commit o.a.c.m.distribution.DiscreteEmpiricalDistribution implementing this. I will document the approach and design decisions in the JIRA if others are OK with this addition. Phil
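To make the binned-versus-exact distinction concrete: an exact discrete empirical distribution just stores the sample and draws points with probability equal to their relative frequency. A minimal sketch (not the proposed DiscreteEmpiricalDistribution API, whose design would be settled in the JIRA):

```java
import java.util.Arrays;
import java.util.Random;

// Exact discrete empirical distribution: keeps the full sample in memory,
// evaluates the empirical CDF by binary search on the sorted data, and
// samples by drawing a stored point uniformly at random (which reproduces
// the empirical frequencies exactly). Illustrative sketch only.
public class DiscreteEmpirical {

    private final double[] sorted;
    private final Random rng;

    public DiscreteEmpirical(double[] data, Random rng) {
        this.sorted = data.clone();
        Arrays.sort(this.sorted);
        this.rng = rng;
    }

    // Empirical CDF: fraction of the sample <= x.
    public double cdf(double x) {
        int i = Arrays.binarySearch(sorted, x);
        if (i < 0) {
            i = -i - 2;   // index of the largest element strictly below x
        } else {
            while (i + 1 < sorted.length && sorted[i + 1] == x) {
                i++;      // advance to the rightmost tie
            }
        }
        return (i + 1) / (double) sorted.length;
    }

    public double sample() {
        return sorted[rng.nextInt(sorted.length)];
    }
}
```

Unlike the kernel-smoothed EmpiricalDistribution, every sample here is one of the observed data points, and repeated observations get proportionally higher mass.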
Re: [math] [linear] immutability
On Tue, Jan 1, 2013 at 10:25 AM, Phil Steitz phil.ste...@gmail.com wrote: Agreed we should keep the discussion concrete. Sebastien and Luc have both mentioned specific examples where the overhead of matrix data copy and storage creates practical problems. Konstantin mentioned another (Gaussian elimination) which is sort of humorous because we have in fact effectively implemented that already, embedded in the LU decomp class - but to do it, we took the approach that I mentioned above, which is to abandon the linear algebraic objects and operate directly on double arrays. And frankly, that can be disastrous for performance as well. As Konstantin mentioned, it is critical to have BLAS operations exposed to get good performance. Level 2 and 3 operations are particularly important. Phil's allusions to collections is particularly apt. There are immutable collections (see guava) and they are very handy. And there are mutable collections and they are very handy. And there are multi-thread performance friendly mutable collections (see ConcurrentHashMap). They all share a fairly simple API and they have some very simple abstract class implementation helpers.
Re: [math] [linear] immutability
On Tue, Jan 1, 2013 at 11:17 AM, Sébastien Brisard sebastien.bris...@m4x.org wrote: Please mention that when I first mentioned in-place operations, I didn't have speed in mind, but really memory. I think we would not gain much speedwise, as Java has become very good at allocating objects (this would be true of large problems, where typically a few big objects would be allocated at each iteration. The conclusion would probably be different with many small objects to be allocated at each iteration). Allocation is not the problem. The problem is memory bandwidth due to the copies that are a side effect of the allocation.
Re: [math] [linear] immutability
My apologies, but I have totally lost track of who said what because too many comments have enormous lines immediately adjacent to responses. On Tue, Jan 1, 2013 at 11:39 AM, Somebody s...@body.org wrote: I thought that maybe it was due to the underlying (dynamic) data structure for sparse vectors/matrices (OpenIntToDoubleHashMap), while typical storage schemes (compressed row, skyline) might be more efficient (since only arrays are used), but do not lend themselves to changes of the value of initially null coefficients. That's why I was suggesting immutable matrices as a start, but what I really meant was matrices where the null coefficients are constant. Please note that your objection does not hold with iterative linear solvers (conjugate gradient, ...), so immutable matrices might still be interesting. One of the benefits of making it easy to extend matrices can be found in Mahout (which I should emphasize is probably only a source of ideas, not a perfect paragon of perfection by any means). As a result of easy extensibility, we have in mahout two kinds of sparse vectors and many kinds of sparse matrices. One kind of sparse vector uses an open hash table (RandomAccessHashTable). It works well for mutable situations, but is a bit more memory hungry than might be liked. The other kind is implemented using an array of indexes and an array of doubles. It can be read randomly with log n worst case performance or log log n performance if the indices are well distributed. It is nearly immutable in that after a sequence of mutations, it requires substantial time and possibly memory to merge itself back together for more read operations. What it does phenomenally well, however, is support iteration with little memory overhead. As far as matrices are concerned, we have a dense matrix backed by a single indexed double[]. We have a row sparse matrix that has rows that are either kind of sparse vectors. 
We have block sparse matrices whose row patches are matrices that need not exist but are created lazily if written to. We have memory mapped dense matrices. We have a memory mapped dense matrix that maps several regions of large files together to form a single matrix (since mmap only supports 2GB files). Some of these storage forms are in Mahout math. Some are in applications. The virtue here is that we didn't have to discuss these matters much. We could just say "Sounds like a great idea, can you build a prototype to demonstrate your point?" to any bright spark who came along. Many prototypes were built and several were incorporated. So the impact of a simple core API (with linear algebra, mutable operations and OK, but not great, visitor patterns) and associated abstract classes and abstract tests was as much social as technical. The social virtues have, in turn, led to technical virtues.
Re: [math] [linear] immutability
On Mon, Dec 31, 2012 at 9:30 AM, Phil Steitz phil.ste...@gmail.com wrote: If we stick to 0) algebraic objects are immutable 1) algorithms defined using algebraic concepts should be implemented using algebraic objects ... 0) Start, with Konstantin's help, by fleshing out the InPlace matrix / vector interface 1) Integrate Mahout code as part of a wholesale refactoring of the linear package 2) Extend use of the visitor pattern to perform mutations in-place (similar to 0) in effect) Speaking as one of the main authors of the Mahout code and very occasional contributor to CM, I doubt that integrating it directly will suit CM needs/prejudices. For instance, the whole sparse matrix problem where 0 x Inf = 0 instead of NaN is probably not satisfactory for CM, but speed was considered a more important requirement for Mahout. Similarly, Mahout math depends on a primitive collection implementation that generates over 200 classes from templates. That makes some of the sparse codes very fast, but it might lead to some indigestion for CM.
Re: [math] major problem with new released version 3.1
Dim has it exactly right here. On Sun, Dec 30, 2012 at 10:38 AM, Dimitri Pourbaix pourb...@astro.ulb.ac.be wrote: In optimization, the user supplies the function to be minimised. In curve fitting, the user supplies a series of observations and the model to be fitted. Trying to combine both into a unique scheme (how highly abstract it is) is simply misleading.
Re: [math] major problem with new released version 3.1
Konstantin, We are close then. Yes, optimization should use vector methods where possible. But visitor functions are very easy to add in an abstract class. They impose very little burden on the implementor. On Sun, Dec 30, 2012 at 8:52 AM, Konstantin Berlin kber...@gmail.com wrote: I think we might have a misunderstanding. What I am discussing is not the general implementation for a matrix, but the base interface that would be required for input into optimizers. I was saying that there should not be a burden of implementing the visitor pattern for users creating a custom class for optimization input, if it is not used internally by the optimizers.
Re: [math] major problem with new released version 3.1
The GPU requires native code that is executed on the GPU. Standard linear algebra libraries exist for this so if the API can express a standard linear algebra routine concisely, then the GPU can be used. General Java code usually can't be executed on a GPU. There is some late breaking news on this front, but the way to get performance is generally to recognize standard idioms that have accelerated implementations. In Mahout, for instance, we can recognize many linear algebra operations from idiomatic use of visitor patterns. For instance, in this code, Vector u, v; v.assign(Functions.PLUS, u); Mahout will recognize the call to assign as a vector addition. This is easy with vector operations but much harder to recognize matrix operations expressed with simple visitor patterns. On Sun, Dec 30, 2012 at 11:26 AM, Sébastien Brisard sebastien.bris...@m4x.org wrote: and hence preclude vector based process operations, such as you would find on a GPU. So if the user wanted to speed up the computation using a GPU they would not be able to do it, if we base it on a single element at a time visitor pattern. I fail to see how the GPU could not be used. I am no expert on GPU programming, but I can easily imagine a new implementation of RealVector, say GpuBasedRealVector, where the walkInDefaultOrder method would send multiple values at a time to the GPU. I've already done that for multi-core machines (using fork/join), and the visitor pattern was certainly not a limitation.
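The recognize-the-idiom point is easier to see with a concrete shape. Below is a minimal sketch of a visitor-style vector in the spirit of Mahout's v.assign(Functions.PLUS, u); the names are illustrative, not the actual Mahout or CM API:

```java
// Minimal visitor-style vector: a tiny functional interface applied
// element-wise by assign(). The JIT typically inlines the visitor body,
// so the loop optimizes as if it were hand-written. Names illustrative only.
public class VisitorVector {

    public interface ElementFunction {
        double apply(double current, double operand);
    }

    // Shared function object: a library can recognize it by identity and
    // dispatch to an optimized (or GPU-backed) vector-add kernel.
    public static final ElementFunction PLUS = (a, b) -> a + b;

    private final double[] data;

    public VisitorVector(double... data) {
        this.data = data.clone();
    }

    // In place: v.assign(PLUS, u) means v += u, with no temporary allocated.
    public VisitorVector assign(ElementFunction f, VisitorVector other) {
        for (int i = 0; i < data.length; i++) {
            data[i] = f.apply(data[i], other.data[i]);
        }
        return this;
    }

    public double get(int i) {
        return data[i];
    }
}
```

The identity check on the shared PLUS object is the recognition step described above: the generic visitor loop is the fallback, and known idioms can be routed to accelerated implementations.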
Re: [math] major problem with new released version 3.1
On Sun, Dec 30, 2012 at 12:16 PM, Konstantin Berlin kber...@gmail.com wrote: ... There would be no burden on the user's side: the visitor pattern has been implemented for RealVectors in version 3.1. Besides, we could provide all the relevant visitors (addition, scaling, …) There is an additional burden on the user if you require implementation of the full RealVector or RealMatrix interface, which includes a large set of functions not needed by the optimizer: the user has no idea which functions you will use inside the optimizer and which he can leave blank. For example, if a user needs to create their own implementation of a vector multiplication, because they have a special case which can be handled faster, or because they use a GPU, they are still burdened with implementing unnecessary support for the dozens of other functions which will never be used. For a GPU example, as Ted has pointed out, they can implement a GPU version of the basic operations (add, multiply, etc.), but to guarantee general support for any Java function using the visitor pattern, the user would also need to implement a Java version of the visitor pattern. With a good abstract class to inherit from, this isn't much of a problem in practice. You should need to implement very little, and you should be able to override what you will without much danger. It also helps to have standardized tests that can pretty exhaustively test new implementations for correctness. To a surprising extent, this allows new implementations to be well tested nearly on the day they are written.
Re: [math] major problem with new released version 3.1
Actually, the visitor pattern or variants thereof can produce very performant linear algebra implementations. You can't usually get quite down to optimized BLAS performance, but you get pretty darned fast code. The reason is that the visitor is typically a very simple class which is immediately inlined by the JIT. Then it is subject to all of the normal optimizations exactly as if the code were written as a single concrete loop. For many implementations, the bounds checks will be hoisted out of the loop so you get pretty decent code. More importantly in many cases, visitors allow in-place algorithms. Combined with view operators that limit visibility to part of a matrix, and the inlining phenomenon mentioned above, this can have enormous implications for performance. A great case in point is the Mahout math library. With no special efforts taken and using the visitor style fairly ubiquitously, I can get about 2 Gflops from my laptop. Using Atlas as a LAPACK implementation gives me about 5 Gflops. I agree with the point that linear algebra operators should be used where possible, but that just isn't feasible for lots of operations in real applications. Getting solid performance with simple code in those applications is a real virtue. On Sat, Dec 29, 2012 at 3:22 PM, Konstantin Berlin kber...@gmail.com wrote: While the visitor pattern is a good abstraction, I think it would make for terrible linear algebra performance. All operations should be based on basic vector operations, which internally can take advantage of sequential memory access. For large problems it makes a difference. The visitor pattern is a nice add-on, but it should not be the engine driving the package under the hood, in my opinion.
Re: [math] major problem with new released version 3.1
Who said force? Linear algebra operations should be available. Visitors should be available. Your mileage will vary. That was always true. On Sat, Dec 29, 2012 at 3:59 PM, Konstantin Berlin kber...@gmail.com wrote: Also, if you have a GPU implementation of a matrix, or use another type of vector processor, there is no way you can program that in if you force vector operations to use a visitor pattern. On Dec 29, 2012, at 6:43 PM, Konstantin Berlin kber...@gmail.com wrote: That's a good point about the compiler. I never tested the performance of visitors vs. sequential array access. I just don't want the vector operations to be tied to any particular implementation detail. On Dec 29, 2012, at 6:30 PM, Ted Dunning ted.dunn...@gmail.com wrote: Actually, the visitor pattern or variants thereof can produce very performant linear algebra implementations. You can't usually get quite down to optimized BLAS performance, but you get pretty darned fast code. The reason is that the visitor is typically a very simple class which is immediately inlined by the JIT. Then it is subject to all of the normal optimizations exactly as if the code were written as a single concrete loop. For many implementations, the bounds checks will be hoisted out of the loop so you get pretty decent code. More importantly in many cases, visitors allow in-place algorithms. Combined with view operators that limit visibility to part of a matrix, and the inlining phenomenon mentioned above, this can have enormous implications for performance. A great case in point is the Mahout math library. With no special efforts taken and using the visitor style fairly ubiquitously, I can get about 2 Gflops from my laptop. Using Atlas as a LAPACK implementation gives me about 5 Gflops. I agree with the point that linear algebra operators should be used where possible, but that just isn't feasible for lots of operations in real applications. Getting solid performance with simple code in those applications is a real virtue. On Sat, Dec 29, 2012 at 3:22 PM, Konstantin Berlin kber...@gmail.com wrote: While the visitor pattern is a good abstraction, I think it would make for terrible linear algebra performance. All operations should be based on basic vector operations, which internally can take advantage of sequential memory access. For large problems it makes a difference. The visitor pattern is a nice add-on, but it should not be the engine driving the package under the hood, in my opinion.
Re: [math] pearson and spearman correlation runtime complexity
Can you say more about how you implemented these? The Pearson coefficient should be quite simple. A few passes through the data should suffice and it can probably be done in one pass, especially if you aren't worried about 1ULP accuracy. The Spearman coefficient should be no worse than the cost of sorting plus the cost of the Pearson computation. There are often faster methods as well if there are no ties. On Thu, Dec 13, 2012 at 6:57 AM, Martin Rosellen martin.rosel...@fu-berlin.de wrote: Hi again, I tried to implement the pearson and spearman algorithm myself and the computation took very long. That is why I now use the commons math solution. I am curious about the runtime complexity of the Pearson and the Spearman correlation coefficient. Can someone help me with that? Greetz Martin
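For the record, the single pass mentioned above can be sketched as follows. This is a rough illustration, not the Commons Math implementation, and it uses the naive running-sum formulation, which is exactly the variant that gives up 1ULP accuracy for an O(n) single sweep.

```java
// Sketch of a one-pass Pearson coefficient from running sums.
// Illustrative only; a production version would use a numerically
// stabler (e.g. Welford-style) update to protect accuracy.
public class PearsonDemo {
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < n; i++) {          // single sweep of the data
            sx += x[i]; sy += y[i];
            sxx += x[i] * x[i]; syy += y[i] * y[i];
            sxy += x[i] * y[i];
        }
        double cov = sxy - sx * sy / n;        // n * covariance
        double vx = sxx - sx * sx / n;         // n * variance of x
        double vy = syy - sy * sy / n;         // n * variance of y
        return cov / Math.sqrt(vx * vy);
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4};
        double[] b = {2, 4, 6, 8};
        System.out.println(pearson(a, b));     // perfectly correlated -> 1.0
    }
}
```

Spearman is then just this applied to ranks, which is where the sorting cost comes from.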
Re: [Math] Old to new API (MultivariateDifferentiable(Vector)Function)
Correctness isn't that hard to get. You just need to add a bitmap for exceptional values in all matrices. This bitmap can be accessed by sparse operations so that the iteration is across the union of non-zero elements in the sparse vector/matrix and exceptional elements in the operand. The fact is, however, that preserving NaN results in these corner cases is not normally a huge priority for users. Deleting all support for sparse vectors, on the other hand, is a huge impact on the utility of commons math. To my mind, deleting hugely useful functionality in the face of a small issue is upside down, especially when there is actually a pretty simple fix available. On Sat, Dec 1, 2012 at 8:02 AM, Sébastien Brisard sebastien.bris...@m4x.org wrote: A few months ago, we started a thread on this issue (on the users ML). It got no answers! I am personally not happy with removing the support for sparse vectors/matrices, but the truth is we didn't see a way to achieve the same degree of correctness as --say-- array based matrices and vectors. As far as I am concerned, though, this is still an open question, and if you have ideas about these issues, we would be glad to hear them!
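The bitmap fix can be sketched roughly as follows. This is illustrative only (hypothetical names, a map-based sparse vector rather than real Commons Math code): the element-wise product iterates the union of stored sparse entries and exceptional operand entries, so 0 * NaN is not silently dropped to 0.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Rough sketch of the exceptional-value bitmap idea (not CM code).
// Sparse entries are a map; a BitSet marks NaN/Inf positions in the
// operand so the product loop covers the union of both index sets.
public class SparseNaNDemo {
    static double[] ebeMultiply(Map<Integer, Double> sparse, double[] dense) {
        BitSet exceptional = new BitSet(dense.length);
        for (int i = 0; i < dense.length; i++) {
            if (Double.isNaN(dense[i]) || Double.isInfinite(dense[i])) {
                exceptional.set(i);
            }
        }
        double[] result = new double[dense.length];   // implicit zeros elsewhere
        for (Map.Entry<Integer, Double> e : sparse.entrySet()) {
            result[e.getKey()] = e.getValue() * dense[e.getKey()];
        }
        // Union step: where the sparse side has no entry, 0 * NaN and
        // 0 * Inf must still yield NaN instead of a silent 0.
        for (int i = exceptional.nextSetBit(0); i >= 0; i = exceptional.nextSetBit(i + 1)) {
            if (!sparse.containsKey(i)) {
                result[i] = 0.0 * dense[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Double> v = new HashMap<>();
        v.put(1, 3.0);
        double[] w = {Double.NaN, 2.0, 1.0};
        System.out.println(java.util.Arrays.toString(ebeMultiply(v, w))); // [NaN, 6.0, 0.0]
    }
}
```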
Re: [math] Checking preconditions on package private functions
That's fine. I think raw use of reflection might make the tests pretty complicated, but the idea is reasonable. Jmockit allows mocking of static methods (I have used it to mock System.nanoTime(), for instance). By using a partial mock class, you can gain access to private methods as well. On Thu, Nov 29, 2012 at 10:59 PM, Sébastien Brisard sebastien.bris...@m4x.org wrote: Does anyone oppose the usage of reflection in unit testing to access private methods? I personally think it is a good compromise between encapsulation and comprehensive testing.
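As a sketch of the plain-reflection variant (hypothetical classes, no JMockit): the test reaches a private method without widening its visibility in the production code.

```java
import java.lang.reflect.Method;

// Illustrative sketch of testing a private method via reflection.
// Solver and halve are hypothetical stand-ins for real CM internals.
public class ReflectionTestDemo {
    static class Solver {
        private double halve(double x) { return x / 2.0; }
    }

    static double invokeHalve(Solver s, double x) {
        try {
            Method m = Solver.class.getDeclaredMethod("halve", double.class);
            m.setAccessible(true);   // bypass the private modifier, for testing only
            return (double) m.invoke(s, x);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(invokeHalve(new Solver(), 8.0)); // prints 4.0
    }
}
```

The boilerplate is what makes raw reflection "pretty complicated" at scale; a small helper like the one above usually tames it.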
Re: [math] Checking preconditions on package private functions
I can only say from my own experience that people make mistakes over time and having the code warn them when that happens is a good thing. Your experience may be different but I have to admit that I have done some pretty silly things along the lines of forgetting to follow some constraint. On Nov 27, 2012, at 8:08 AM, Gilles Sadowski gil...@harfang.homelinux.org wrote: My point is that _if_ the methods can be made private, then we can assume that they are used properly there (which is the only place where they can be called). It's the same as the fact that we wouldn't check whether, say, we wrote a + b instead of a - b.
Re: [CSV] Discussion about the new CSVFormatBuilder
Surely you meant to say no other commons library. Builder patterns are relatively common. See guava for instance: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/base/Splitter.html On Tue, Nov 20, 2012 at 11:49 AM, Gary Gregory garydgreg...@gmail.com wrote: - it has been argued that using the builder pattern only to make sure CSVFormats are valid is overengineered. No other library has this kind of validation.
Re: [CSV] Discussion about the new CSVFormatBuilder
Another way of looking at the builder style is that it is Java's way of using keyword arguments for complex constructors. It also allows a reasonable amount of future-proofing. These benefits are hard to replicate with constructors. On the other hand, builder-style patterns are a royal pain with serialization frameworks. On Tue, Nov 20, 2012 at 2:57 PM, Gary Gregory garydgreg...@gmail.com wrote: Ok this is good. Let's see some healthy debating. :) What is the alternate API? To me the bother is the extra build() call, but that's the pattern. Could an alt API be used and co-exist? Is making the ctor an option? It would have to do some validation. Gary On Nov 20, 2012, at 16:59, Emmanuel Bourg ebo...@apache.org wrote: On 20/11/2012 20:01, Benedikt Ritter wrote: Please share your thoughts about the builder. Sorry Benedikt but I have to say I really don't like this design. I prefer a simpler API for the reasons you mentioned in the disadvantages. The minor improvements from the developer's point of view are much less important than the ease of use from the user's point of view. Emmanuel Bourg
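The keyword-argument reading can be made concrete with a generic sketch (illustrative names, not the actual CSVFormat API). Note how build() is the natural home for the validation discussed earlier, and how a new setter can be added later without touching any existing caller.

```java
// Generic builder sketch: reads like keyword arguments, centralizes
// validation in build(), and is future-proof against new options.
// Names are hypothetical, not the Commons CSV API.
public class FormatBuilderDemo {
    static final class Format {
        final char delimiter;
        final char quote;
        final boolean ignoreEmptyLines;

        private Format(Builder b) {
            this.delimiter = b.delimiter;
            this.quote = b.quote;
            this.ignoreEmptyLines = b.ignoreEmptyLines;
        }
    }

    static final class Builder {
        private char delimiter = ',';
        private char quote = '"';
        private boolean ignoreEmptyLines = true;

        Builder delimiter(char c) { this.delimiter = c; return this; }
        Builder quote(char c) { this.quote = c; return this; }
        Builder ignoreEmptyLines(boolean b) { this.ignoreEmptyLines = b; return this; }

        Format build() {
            // the validation that a telescoping constructor would scatter
            if (delimiter == quote) {
                throw new IllegalStateException("delimiter and quote must differ");
            }
            return new Format(this);
        }
    }

    public static void main(String[] args) {
        // Reads like: Format(delimiter=';', ignoreEmptyLines=false)
        Format f = new Builder().delimiter(';').ignoreEmptyLines(false).build();
        System.out.println(f.delimiter);
    }
}
```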
Re: [Math] MATH-894
The typical answer to this when adding a functional method like compute is to also add a view object. The rationale is that a small number of view methods can be composed with a small number of compute/aggregate methods to get the expressive power of what would otherwise require a vast array of methods. On Nov 15, 2012, at 7:03 AM, Phil Steitz phil.ste...@gmail.com wrote: Then in RDA, we add compute(ArrayFunction) as a public method. Then if we make UnivariateStatistic extend this new interface, DescriptiveStatistics can get what it needs from this. Just a thought. Would love to get better ideas on this. What is in trunk now works; but having to subclass for internal use makes me wonder if we have solved the problem.
Re: [all] moving to svnpubsub or CMS?
On Thu, Nov 15, 2012 at 2:46 AM, Olivier Lamy ol...@apache.org wrote: This is a false dichotomy. Maven site generation can work with ASF CMS if desired. That is sort of true but doesn't really apply to commons. I created the Flume site using Maven and Maven generates the site from RST source files. This isn't a typical Maven project site. In essence it is a CMS project that happens to invoke Maven whenever content is changed. IMO the Logging project site is a lot closer to what Commons needs. The top level site is managed by the CMS. Each of the project sites is built by Maven and is directly checked into the production area and each is independently managed. See http://wiki.apache.org/logging/ManagingTheWebSite Agree on this option (this is what we experimented with maven too; not yet live) This hybrid approach is what I was referring to.
Re: [Math] MATH-894
On Thu, Nov 15, 2012 at 8:42 AM, Phil Steitz phil.ste...@gmail.com wrote: On 11/15/12 8:01 AM, Ted Dunning wrote: The typical answer to this when adding a functional method like compute is to also add a view object. The rationale is that a small number of view methods can be composed with a small number of compute/aggregate methods to get the expressive power of what would otherwise require a vast array of methods. If I understand correctly, we already have a view object exposed - getElements. The challenge is that this method returns a copy and what we would like is a way to get a function computed directly on the data encapsulated in the RDA. Without function pointers or real array references, I don't see a straightforward way to do this. When I say view, I mean something that is a reference and is not a copy. The getElements method is a copy, not view under this terminology. The Colt/Mahout approach is to define a view object which opaquely remembers a reference to the original, an offset and a length. Functions and other arguments can be passed to this view object which operates on a subset of the original contents by calling the function. Performance is actually quite good. The JIT seems to in-line the view object access to the underlying object and also in-lines evaluation of the function so that the actual code that is executed is pretty much what you would write in C, but you don't have to worry as much since the pattern of access is more controlled. For completeness, this is essentially what java.nio does with the *Buffer classes as well. You can wrap an array and then you can ask for slices out of that array while retaining the reference semantics.
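The java.nio analogy is easy to demonstrate with real API calls: slice() produces a window over the same storage, so writes through the view are visible in the original array with no copying.

```java
import java.nio.DoubleBuffer;

// Demonstrates reference semantics of java.nio buffer views, as
// discussed above: the slice shares storage with the wrapped array.
public class BufferViewDemo {
    public static void main(String[] args) {
        double[] data = {0, 1, 2, 3, 4, 5};
        DoubleBuffer whole = DoubleBuffer.wrap(data);
        whole.position(2).limit(5);
        DoubleBuffer view = whole.slice();   // window over data[2..4], no copy

        view.put(0, 99.0);                   // write through the view
        System.out.println(data[2]);         // prints 99.0: same storage
    }
}
```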
Re: [Math] MATH-894
On Thu, Nov 15, 2012 at 10:04 AM, Phil Steitz phil.ste...@gmail.com wrote: Do you know how to do that with a primitive array? Can you provide some sample code? You don't. See my next paragraph. See the assign method in this class: https://github.com/apache/mahout/blob/trunk/math/src/main/java/org/apache/mahout/math/VectorView.java Thanks for your help on this. Phil The Colt/Mahout approach is to define a view object which opaquely remembers a reference to the original, an offset and a length. Functions and other arguments can be passed to this view object which operates on a subset of the original contents by calling the function. Performance is actually quite good. The JIT seems to in-line the view object access to the underlying object and also in-lines evaluation of the function so that the actual code that is executed is pretty much what you would write in C, but you don't have to worry as much since the pattern of access is more controlled. For completeness, this is essentially what java.nio does with the *Buffer classes as well. You can wrap an array and then you can ask for slices out of that array while retaining the reference semantics.
Re: [Math] MATH-894
On Thu, Nov 15, 2012 at 10:42 AM, Phil Steitz phil.ste...@gmail.com wrote: On 11/15/12 10:29 AM, Ted Dunning wrote: On Thu, Nov 15, 2012 at 10:04 AM, Phil Steitz phil.ste...@gmail.com wrote: Do you know how to do that with a primitive array? Can you provide some sample code? You don't. See my next paragraph. See the assign method in this class: https://github.com/apache/mahout/blob/trunk/math/src/main/java/org/apache/mahout/math/VectorView.java Interesting. I see no assign method, but I can see what this thing does. It is not clear to me though how this idea could be meaningfully applied to solve the problem we have with applying statistics to an RDA without doing any array copying. Most likely I am missing the point. The assign methods are inherited. The signatures are like assign(DoubleFunction), assign(DoubleDoubleFunction, Matrix other) and so on. My thought was that if you need to operate on part of an RDA, then a RDA_View class might do the job.
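An RDA_View along those lines could be as small as this sketch (hypothetical names, modeled loosely on the Mahout assign idea, not actual Commons Math or Mahout code): the view holds a reference, an offset, and a length, and assign() applies a function in place over just that window.

```java
// Sketch of a view-plus-assign combination: no array copying, and
// the function is applied only within the view's window.
// All names here are illustrative.
public class ArrayViewDemo {
    interface DoubleFunction {
        double apply(double x);
    }

    static final class ArrayView {
        private final double[] data;   // reference, not a copy
        private final int offset;
        private final int length;

        ArrayView(double[] data, int offset, int length) {
            this.data = data;
            this.offset = offset;
            this.length = length;
        }

        /** Apply f in place over the window [offset, offset + length). */
        void assign(DoubleFunction f) {
            for (int i = 0; i < length; i++) {
                data[offset + i] = f.apply(data[offset + i]);
            }
        }
    }

    public static void main(String[] args) {
        double[] backing = {1, 2, 3, 4, 5};
        new ArrayView(backing, 1, 3).assign(x -> -x);  // touches only 2, 3, 4
        System.out.println(java.util.Arrays.toString(backing)); // [1.0, -2.0, -3.0, -4.0, 5.0]
    }
}
```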
Re: [Math] MATH-894
Yes. Sounds similar. On Thu, Nov 15, 2012 at 11:02 AM, Phil Steitz phil.ste...@gmail.com wrote: The assign methods are inherited. The signatures are like assign(DoubleFunction), assign(DoubleDoubleFunction, Matrix other) and so on. OK, assign looks like what I was calling evaluate and DoubleFunction looks like what I was calling ArrayFunction
Re: [all] moving to svnpubsub or CMS?
On Tue, Nov 13, 2012 at 11:40 PM, Luc Maisonobe l...@spaceroots.org wrote: Please, could someone who knows what to do step up? I can't volunteer the time to do this, but I can say that the process is really quite simple. We switched with Drill and the results are not bad at all. See http://incubator.apache.org/drill/ All you need to do is translate the pages to mark-down text, copy and adapt a few headers and stick the resulting files into a standardized directory structure in SVN. From there, you notify infra that you have a CMS set of pages ready to go and, shortly after, you have a CMS-supported site. To edit pages after that, you can either edit the markdown version in SVN, check it back in and trigger a CMS rebuild, or, far more easily, use the javascript bookmarklet provided for the purpose, which triggers an in-browser edit of the page. You can then stage, review and finally publish the page from your browser. The entire process is (somewhat voluminously) documented at http://www.apache.org/dev/cms.html It really is relatively painless.
Re: [all] moving to svnpubsub or CMS?
On Wed, Nov 14, 2012 at 6:54 AM, Emmanuel Bourg ebo...@apache.org wrote: On 14/11/2012 08:59, Ted Dunning wrote: All you need to do is translate the pages to mark-down text, copy and adapt a few headers and stick the resulting files into a standardized directory structure in SVN. From there, you notify infra that you have a CMS set of pages ready to go and shortly later, you have CMS supported site. Is it possible to publish generated content like Javadocs and Maven sites with this CMS? Yes. See the answer posted 6 hours before your question.
Re: [math] UTF-8 characters in javadoc comments
On Wed, Nov 14, 2012 at 12:11 AM, Sébastien Brisard sebastien.bris...@m4x.org wrote: There is no problem with the current setup of our website (at least, the website generated locally has no problem). For the new system, I would like to step up, but I really (really) have no clue what you are talking about... I don't know what svnpubsub or CMS are. There is a pointer in Ted's answer to your other message. I'll read through the doc, and if I feel not too incompetent, I will try it on CM if you want. I remember exactly that feeling! svnpubsub[1] is a mechanism that svn supports which allows listeners to subscribe to changes in an svn repository. It stands for svn-publish-subscribe. CMS[2][3] stands for content management system and Apache has built its own on top of svnpubsub. One rationale for reinventing this was the need for the site to be completely static. The way that the Apache CMS works is that you write documents in mark-down[4] format, which is basically just text with a few wiki-like conventions for common textual effects such as headers and links. These documents are converted to HTML and embedded in page boilerplate using a templating system similar to that used by Django[5]. A nice starting point for a totally simple site might be the Drill web site [6][7]. I say this because it is the only thing that Drill has put into SVN and is also new and thus relatively simpler than a fully fleshed out site. [1] http://www.apache.org/dev/cms.html#svnpubsub [2] http://en.wikipedia.org/wiki/Content_management_system [3] http://www.apache.org/dev/cms.html [4] http://daringfireball.net/projects/markdown/ [5] https://www.djangoproject.com/ [6] http://incubator.apache.org/drill/ [7] http://svn.apache.org/repos/asf/incubator/drill/
Re: [all] moving to svnpubsub or CMS?
On Wed, Nov 14, 2012 at 1:53 PM, Olivier Lamy ol...@apache.org wrote: 2012/11/14 Thomas Vandahl t...@apache.org: On 14.11.2012 08:40, Luc Maisonobe wrote: Please, could someone who knows what to do step up? Just a quick note that sites created by Maven can be published with svnpubsub using the SCM Publish Maven Plugin (http://maven.apache.org/sandbox/plugins/asf-svnpubsub-plugin/). I guess this may keep the effort manageable (no further experience, though). See especially the link http://maven.apache.org/sandbox/plugins/asf-svnpubsub-plugin/examples/importing-maven-site.html for how to do the initial import. IMHO the first checkin will be simpler doing a checkin of content from p.a.o If you use this maven plugin and your project doesn't have any sub modules, deploying a site is as simple as today: mvn site-deploy (for multi modules it's a bit different; if needed I can help, as I have set that up for other asf projects and wrote some part of this plugin :-) ). The question is more: do you want to continue maven site generation for docs or move to asf cms ? This is a false dichotomy. Maven site generation can work with ASF CMS if desired.
Re: [Math] MATH-878: Feature request with patch
On Sun, Nov 4, 2012 at 11:56 AM, Phil Steitz phil.ste...@gmail.com wrote: 0) Did you or anyone else ever analyze the bigram data in the paper using Fisher's test stats? That bigram data isn't particularly interesting; any text will show similar effects. Others have tested Fisher's exact test, but only a few cases turned up where there was any mileage. The cost of Fisher's test makes it much less interesting for the text, genomic, classification and recommendation applications of G^2. 1) Is the bigram data from [1] available anywhere? I don't think so. Any small technical text should exhibit similar characteristics. You can find more examples in my longer work on the subject: http://arxiv.org/abs/1207.1847 Most of these examples are based on publicly available data. 2) Do you think a direct implementation of Fisher's test for 2x2 designs and a monte carlo impl for r x c would be useful? I have this in C from years ago and could translate it fairly easily. I have no clue if people want this. G^2 is pretty well entrenched in text analysis and recommendations and there have been hundreds of citations to my original paper, many of which replicated the value of the test. As such, I wouldn't expect a lot of value in those applications. Other areas may well be a different story. A fully featured implementation of Fisher's exact test is pretty complex, however, since you have to take such different tacks at different data scales and with differently shaped tables.
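For reference, the direct 2x2 case is small. This is a sketch, not Commons Math code, and it omits the large-count and r x c machinery that makes a full implementation complex: it enumerates the hypergeometric distribution along the table's one free cell, which is fine exactly in the small-count regime where the test is affordable.

```java
// Sketch of a direct two-sided Fisher exact test for a 2x2 table.
// Log-factorials keep the hypergeometric probabilities stable.
public class FisherExactDemo {
    static double logFactorial(int n) {
        double s = 0;
        for (int i = 2; i <= n; i++) s += Math.log(i);
        return s;
    }

    /** Probability of the specific table [[a,b],[c,d]] given fixed margins. */
    static double tableProb(int a, int b, int c, int d) {
        int n = a + b + c + d;
        double logP = logFactorial(a + b) + logFactorial(c + d)
                    + logFactorial(a + c) + logFactorial(b + d)
                    - logFactorial(n)
                    - logFactorial(a) - logFactorial(b)
                    - logFactorial(c) - logFactorial(d);
        return Math.exp(logP);
    }

    /** Two-sided p-value: sum over all tables no more probable than observed. */
    static double fisher2x2(int a, int b, int c, int d) {
        double observed = tableProb(a, b, c, d);
        int r1 = a + b, c1 = a + c, n = a + b + c + d;
        double p = 0;
        for (int x = Math.max(0, r1 + c1 - n); x <= Math.min(r1, c1); x++) {
            double pr = tableProb(x, r1 - x, c1 - x, n - r1 - c1 + x);
            if (pr <= observed * (1 + 1e-7)) p += pr;   // small tolerance for FP ties
        }
        return p;
    }

    public static void main(String[] args) {
        // Fisher's classic "lady tasting tea" table: [[3,1],[1,3]]
        System.out.println(fisher2x2(3, 1, 1, 3)); // prints ~0.4857 (= 34/70)
    }
}
```

The "different tacks at different scales" caveat is real: for large counts the enumeration explodes and a network algorithm or Monte Carlo sampler takes over.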
Re: [Math] MATH-878: Feature request with patch
What kind of check did you want? I checked the code by eye and supplied several test cases. You might say that I am versed in statistics since I am the author of the major paper on this test as applied to computational linguistics. On Sun, Oct 21, 2012 at 11:07 PM, Phil Steitz phil.ste...@gmail.com wrote: On 10/20/12 3:58 AM, Gilles Sadowski wrote: Hello. https://issues.apache.org/jira/browse/MATH-878 Would someone well versed in statistics check that contribution? I wanted to get to this this weekend, but was not able to. I will look at it as soon as I can get some free cycles. Phil Best regards, Gilles
Re: [Math] MATH-878: Feature request with patch
On Mon, Oct 22, 2012 at 4:20 AM, Gilles Sadowski gil...@harfang.homelinux.org wrote: On Sun, Oct 21, 2012 at 11:25:08PM -0700, Ted Dunning wrote: What kind of check did you want? Well, I'm seeking to know whether the code can be included in Commons Math's trunk. Hard for me to say as I am usually out of step with c.m. Currently, the answer is a partial no (IMHO), because of the remarks which I formulated on the JIRA page. [If it were only that, I would have corrected the formatting problems (to my taste).] Fair. Thus: I'd like people to confirm that the code itself fits with the design of the o.a.c.m.stat package, and to take the responsibility for committing the patch (adapted to their taste!). :-) I can't comment on the design. Only on whether it seems to do what it says it should. I checked the code by eye and supplied several test cases. You might say that I am versed in statistics since I am the author of the major paper on this test as applied to computational linguistics. Thank you for the _contents_ review. Sorry for the misunderstanding that I was talking more about the form. I can't comment on the form.
Re: [Math] MATH-816 (mixture model distribution)
Existing code does have a certain cachet to it. On Thu, Oct 18, 2012 at 5:13 AM, Patrick Meyer meyer...@gmail.com wrote: I vote for simplicity. Current practice in the social sciences is to fit multiple models, each with a different number of components, and use fit statistics to choose the best model. There are some additional features I would like to see added and I have the code to contribute if it is not currently there. To be consistent with Mplus, we need to have the algorithm use multiple random starts and run a few of the best starts to completion. Mplus uses this strategy to effectively overcome local minima. -Original Message- From: Becksfort, Jared [mailto:jared.becksf...@stjude.org] Sent: Wednesday, October 17, 2012 11:37 PM To: Commons Developers List Subject: RE: [Math] MATH-816 (mixture model distribution) I see. I am planning to submit the EM fit for multivariate normal mixture models in the next couple of weeks (MATH-817). A Gibbs sampling DP fit may be a bit further out. I am not opposed to allowing the number of components to change, but I also like the simplicity of this class. Whatever you guys decide is probably fine. Jared From: Ted Dunning [ted.dunn...@gmail.com] Sent: Wednesday, October 17, 2012 9:41 PM To: Commons Developers List Subject: Re: [Math] MATH-816 (mixture model distribution) The issue is that with a fixed number of components, you need to do multiple runs to find a best fit number of components. Gibbs sampling against a Dirichlet process can get you to the same answer in about the same cost as a single run of EM with a fixed number of models. On Wed, Oct 17, 2012 at 7:31 PM, Becksfort, Jared jared.becksf...@stjude.org wrote: Ted, I am not sure I understand the problem with the fixed number of components. My understanding is that CM prefers immutable objects. Adding a component to an object would require reweighting in addition to modifying the component list. 
A new mixture model could be instantiated using the getComponents function and then adding or removing more components if necessary. Jared From: Ted Dunning [ted.dunn...@gmail.com] Sent: Wednesday, October 17, 2012 5:21 PM To: Commons Developers List Subject: Re: [Math] MATH-816 (mixture model distribution) Seems fine. I think that the limitation to a fixed number of mixture components is a bit limiting. So is the limitation to a uniform set of components. Both limitations can be eased without huge difficulty. Avoiding the fixed number of components can be done by using some variant of Dirichlet processes. Simply picking k_max relatively large and then using an approximate DP over that finite set works well. That said, mixture models are pretty nice to have. On Wed, Oct 17, 2012 at 2:13 PM, Gilles Sadowski gil...@harfang.homelinux.org wrote: Hello. Any objection to commit the code as proposed on the report page? https://issues.apache.org/jira/browse/MATH-816 Regards, Gilles Email Disclaimer: www.stjude.org/emaildisclaimer Consultation Disclaimer: www.stjude.org/consultationdisclaimer
Re: [Math] MATH-816 (mixture model distribution)
Seems fine. I think that the limitation to a fixed number of mixture components is a bit limiting. So is the limitation to a uniform set of components. Both limitations can be eased without huge difficulty. Avoiding the fixed number of components can be done by using some variant of Dirichlet processes. Simply picking k_max relatively large and then using an approximate DP over that finite set works well. That said, mixture models are pretty nice to have. On Wed, Oct 17, 2012 at 2:13 PM, Gilles Sadowski gil...@harfang.homelinux.org wrote: Hello. Any objection to commit the code as proposed on the report page? https://issues.apache.org/jira/browse/MATH-816 Regards, Gilles
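The finite approximation mentioned above is cheap to sketch: draw the mixture weights from a truncated stick-breaking construction with k_max sticks. This is illustrative code, not a proposed API; the Beta(1, alpha) draw uses the closed-form inverse CDF, since for that special case Beta(1, alpha) is just 1 - u^(1/alpha) with u uniform.

```java
import java.util.Random;

// Truncated stick-breaking weights for an approximate Dirichlet process:
// pick k_max large, break a unit stick k_max - 1 times, give the last
// component whatever remains. Illustrative sketch only.
public class StickBreakingDemo {
    static double[] weights(int kMax, double alpha, Random rnd) {
        double[] w = new double[kMax];
        double remaining = 1.0;
        for (int k = 0; k < kMax - 1; k++) {
            // Beta(1, alpha) via inverse CDF: v = 1 - u^(1/alpha)
            double v = 1.0 - Math.pow(rnd.nextDouble(), 1.0 / alpha);
            w[k] = v * remaining;
            remaining *= (1.0 - v);
        }
        w[kMax - 1] = remaining;   // last stick takes what is left
        return w;
    }

    public static void main(String[] args) {
        double[] w = weights(20, 1.0, new Random(42));
        double sum = 0;
        for (double x : w) sum += x;
        System.out.println(sum);   // weights sum to 1
    }
}
```

With a moderately small alpha, most of the mass concentrates in a few sticks, which is why the truncation at k_max behaves like a genuinely open-ended number of components.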
Re: [Math] MATH-816 (mixture model distribution)
The issue is that with a fixed number of components, you need to do multiple runs to find a best fit number of components. Gibbs sampling against a Dirichlet process can get you to the same answer in about the same cost as a single run of EM with a fixed number of models. On Wed, Oct 17, 2012 at 7:31 PM, Becksfort, Jared jared.becksf...@stjude.org wrote: Ted, I am not sure I understand the problem with the fixed number of components. My understanding is that CM prefers immutable objects. Adding a component to an object would require reweighting in addition to modifying the component list. A new mixture model could be instantiated using the getComponents function and then adding or removing more components if necessary. Jared From: Ted Dunning [ted.dunn...@gmail.com] Sent: Wednesday, October 17, 2012 5:21 PM To: Commons Developers List Subject: Re: [Math] MATH-816 (mixture model distribution) Seems fine. I think that the limitation to a fixed number of mixture components is a bit limiting. So is the limitation to a uniform set of components. Both limitations can be eased without huge difficulty. Avoiding the fixed number of components can be done by using some variant of Dirichlet processes. Simply picking k_max relatively large and then using an approximate DP over that finite set works well. That said, mixture models are pretty nice to have. On Wed, Oct 17, 2012 at 2:13 PM, Gilles Sadowski gil...@harfang.homelinux.org wrote: Hello. Any objection to commit the code as proposed on the report page? https://issues.apache.org/jira/browse/MATH-816 Regards, Gilles
Re: [math] G-Tests in math.stat.inference
Feel free to grab and adapt the Mahout code. It has some added wrinkles for convenience like the signed square root variant of the G-test. On Tue, Oct 9, 2012 at 12:53 PM, rado tzvetkov rtzvet...@yahoo.com wrote: Also I already have code to contribute and tests for G-Test for independence. (if needed) apache mahout also has some code implemented for LLR
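For anyone following along, the signed square root wrinkle is small. The sketch below is written in the spirit of the Mahout LogLikelihood utilities, but the names and signatures here are illustrative, not the actual Mahout API.

```java
// G^2 (log-likelihood ratio) for a 2x2 contingency table, plus the
// signed-square-root variant: the sign records whether cell k11 is
// over- or under-represented relative to independence.
public class LlrDemo {
    /** One term of G^2: observed * ln(observed/expected), 0 when observed is 0. */
    static double term(double observed, double expected) {
        return observed == 0 ? 0 : observed * Math.log(observed / expected);
    }

    /** G^2 statistic for the table [[k11,k12],[k21,k22]]. */
    static double gSquared(double k11, double k12, double k21, double k22) {
        double n = k11 + k12 + k21 + k22;
        double r1 = k11 + k12, r2 = k21 + k22;
        double c1 = k11 + k21, c2 = k12 + k22;
        return 2 * (term(k11, r1 * c1 / n) + term(k12, r1 * c2 / n)
                  + term(k21, r2 * c1 / n) + term(k22, r2 * c2 / n));
    }

    /** Signed root: positive when k11 exceeds its independence expectation. */
    static double signedRootLlr(double k11, double k12, double k21, double k22) {
        double n = k11 + k12 + k21 + k22;
        double sign = k11 / (k11 + k12) >= (k11 + k21) / n ? 1 : -1;
        return sign * Math.sqrt(Math.max(0, gSquared(k11, k12, k21, k22)));
    }

    public static void main(String[] args) {
        System.out.println(gSquared(10, 10, 10, 10));        // independent table -> 0
        System.out.println(signedRootLlr(100, 10, 10, 100)); // strong association, positive
    }
}
```

The signed root is handy because it behaves roughly like a z-score, so thresholds carry over between problems of very different scale.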
Re: [math] Logistic, Probit regression and Tolerance checks
This is great. A very useful feature would be to allow basic L_1 and L_2 regularization. This makes it much easier to avoid trouble with separable problems. It might be interesting to think for a moment how easy it would be to support generalized linear regression in this same package. Small changes to the loss function in the optimization should allow you to have not just logistic and probit regression, but also to get Poisson regression and SVM in the same framework. On Fri, Sep 7, 2012 at 3:22 AM, marios michaelidis mimari...@hotmail.com wrote: I am willing to provide complete Logistic and Probit regression algorithms, optimized by the Newton-Raphson maximum-likelihood method, in a programmatically easy way (e.g. regression(double matrix [][], double Target[], String Constant, double precision, double tolerance)), with academic references, and very quick (3 secs for a 60k set), with getter methods for all the common statistics such as null Deviance, Deviance, AIC, BIC, Chi-square of the model, betas, Wald statistics and p values, Cox-Snell R square, Nagelkerke's R-Square, Pseudo-R2, residuals, probabilities, classification matrix.
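The value of the L_2 suggestion is easy to see in a toy sketch. This is illustrative code (plain gradient ascent rather than Newton-Raphson, and not a proposal for the actual API): on separable data the unpenalized maximum-likelihood weight diverges to infinity, while the ridge penalty keeps it finite.

```java
// Logistic regression with an L2 (ridge) penalty: gradient ascent on
// the log-likelihood minus (lambda/2)*||w||^2. The shrinkage term is
// what keeps the weights finite on separable data.
public class RidgeLogisticDemo {
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    static double[] fit(double[][] x, int[] y, double lambda, int iters, double rate) {
        int d = x[0].length;
        double[] w = new double[d];
        for (int it = 0; it < iters; it++) {
            double[] grad = new double[d];
            for (int i = 0; i < x.length; i++) {
                double z = 0;
                for (int j = 0; j < d; j++) z += w[j] * x[i][j];
                double err = y[i] - sigmoid(z);     // residual on probability scale
                for (int j = 0; j < d; j++) grad[j] += err * x[i][j];
            }
            for (int j = 0; j < d; j++) {
                w[j] += rate * (grad[j] - lambda * w[j]);   // L2 shrinkage
            }
        }
        return w;
    }

    public static void main(String[] args) {
        // Perfectly separable 1-D data: with lambda = 0 the weight diverges.
        double[][] x = {{-2}, {-1}, {1}, {2}};
        int[] y = {0, 0, 1, 1};
        double[] w = fit(x, y, 0.1, 2000, 0.1);
        System.out.println(w[0]);   // finite positive weight despite separable data
    }
}
```

Swapping the `err` computation for the derivative of a Poisson or hinge loss is the "small changes to the loss function" point: the surrounding optimization loop does not change.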