Re: [ALL] Volunteers for a Math IPMC?

2016-06-18 Thread Ted Dunning
On Sat, Jun 18, 2016 at 4:29 AM, Gilles <gil...@harfang.homelinux.org>
wrote:

> ...
> I'm asking, again, whether I need to initiate a VOTE that would allow me
> to set up a workspace ("git", etc.) and transfer some code from CM over
> there.
>

Nothing is stopping you from setting something up.  Github is usually the
easiest way.

It doesn't sound like that is what you want, however. I don't understand
why not.


>
> It may be that incubation is a good thing for Commons Math, but it doesn't
>> seem valid to say that incubation is necessary because CM is being kicked
>> out of Commons.
>>
>
> Never said so.
>

Hmm... I must have misunderstood the comment about CM not being interested
in hosting "these components".


> There is a confusion here: *I* say that CM is dead.
>

Strong words. Such statements are often frustrating to others. It does
sound like the community has dwindled, perhaps beyond repair.

> The development situation *will* change because the context *has* changed
> (unsupported code). CM cannot go on as it did before the fork.
>

You can never go home. No project stays the same.


> Everybody (developers, users, Commons PMC) would be better off with a
> selected set of new (supported) components because this is something we
> can easily do *now* (RERO, etc.).
>

This was your assertion in the long email thread. It seemed that there were
significant counter-positions.


> I'm OK to go through the incubator to do that; but I don't see that it
> is an easier path.  Surely it looks longer.  And it seems that even the
> incubator people doubt that it will lead anywhere.
>

The incubator is for building community and adapting to Apache. If you
don't have a seed community, then the incubator is the wrong place. You need
to have more than just you.



>
> Given the uncertain outcome, going through the incubator would be an
> attempt at rethinking the development of the currently unsupported
> code.  See e.g.
>   https://issues.apache.org/jira/browse/MATH-172
> [Or is that out of scope for an incubation proposal?]


Incubator is not a place to rethink code. It is primarily for building
community.


>
>
>
> Gilles
>
>
> On Fri, Jun 17, 2016 at 3:35 PM, Gilles <gil...@harfang.homelinux.org>
>> wrote:
>>
>> On Fri, 17 Jun 2016 08:51:36 -0700, Ted Dunning wrote:
>>>
>>> Excuse me?
>>>>
>>>> See inline.
>>>>
>>>>
>>>>
>>>> On Fri, Jun 17, 2016 at 7:50 AM, Gilles <gil...@harfang.homelinux.org>
>>>> wrote:
>>>>
>>>> Hi all.
>>>>
>>>>>
>>>>> On Tue, 14 Jun 2016 11:01:13 -0700, Ralph Goers wrote:
>>>>>
>>>>> I thought this had been made clear.  Several months ago Commons voted to
>>>>>
>>>>>> make Math a TLP. But shortly after that most of the people involved
>>>>>> with Commons Math felt that a TLP at the ASF would not work for them,
>>>>>> so they forked the project and left, effectively voiding the TLP vote
>>>>>> since the proposed PMC is no longer valid.  There is one person left
>>>>>> who was very involved in Commons Math and a few other people who have
>>>>>> expressed interest in joining the new community.
>>>>>>
>>>>>> So this is a situation where we have an already existing code base
>>>>>> where a lot of the people left are not familiar with quite a bit of
>>>>>> it.  The new group of people who are interested are trying to
>>>>>> determine how they should move forward. There is some talk of breaking
>>>>>> Commons Math into smaller components and possibly dropping some where
>>>>>> there is no one to maintain it.
>>>>>>
>>>>>>
>>>>>> The "Commons" project not being interested in hosting those
>>>>> components,
>>>>> is the "incubator" a good place for the developers wishing to go in
>>>>> that
>>>>> direction?
>>>>>
>>>>>
>>>>> Perhaps before we move to next steps, could you provide some links to
>>>> the
>>>> discussion where it was decided that Commons is not interested in
>>>> hosting
>>>> these components?
>>>>
>>>>
>>> I proposed to concretely examine this possibility in more than
>>> one message:
>>>   http://markmail.org/message/ye6wvqvlvnqe4qrp
>>>   http://markmail.org/message/3gupcednhqtcfepw
>>>   http://markmail.org/message/3kob7djjicax6rgn
>>>   http://markmail.org/message/7rb2mxq7hhwzykvr
>>>
>>> And again in another thread:
>>>   http://markmail.org/message/fnlta2ttfne3aj5f
>>>
>>>
>>> What's the next step?
>>>>>
>>>>>
>>>>> Let's get to a common understanding of what went before.
>>>>
>>>>
>>> Even that seems impossible. :-(
>>>
>>>
>>> Gilles
>>>
>>>
>>>
>>>
>>> -
>>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>>> For additional commands, e-mail: general-h...@incubator.apache.org
>>>
>>>
>>>
>
>
>


Re: [ALL] Volunteers for a Math IPMC?

2016-06-17 Thread Ted Dunning
Gilles,

Thanks for links.

I just read that (long-winded) thread and I see no consensus that the
"Commons" project is not interested in hosting those components.

It may be that incubation is a good thing for Commons Math, but it doesn't
seem valid to say that incubation is necessary because CM is being kicked
out of Commons.



On Fri, Jun 17, 2016 at 3:35 PM, Gilles <gil...@harfang.homelinux.org>
wrote:

> On Fri, 17 Jun 2016 08:51:36 -0700, Ted Dunning wrote:
>
>> Excuse me?
>>
>> See inline.
>>
>>
>>
>> On Fri, Jun 17, 2016 at 7:50 AM, Gilles <gil...@harfang.homelinux.org>
>> wrote:
>>
>> Hi all.
>>>
>>> On Tue, 14 Jun 2016 11:01:13 -0700, Ralph Goers wrote:
>>>
>>> I thought this had been made clear.  Several months ago Commons voted to
>>>> make Math a TLP. But shortly after that most of the people involved
>>>> with Commons Math felt that a TLP at the ASF would not work for them,
>>>> so they forked the project and left, effectively voiding the TLP vote
>>>> since the proposed PMC is no longer valid.  There is one person left
>>>> who was very involved in Commons Math and a few other people who have
>>>> expressed interest in joining the new community.
>>>>
>>>> So this is a situation where we have an already existing code base
>>>> where a lot of the people left are not familiar with quite a bit of
>>>> it.  The new group of people who are interested are trying to
>>>> determine how they should move forward. There is some talk of breaking
>>>> Commons Math into smaller components and possibly dropping some where
>>>> there is no one to maintain it.
>>>>
>>>>
>>> The "Commons" project not being interested in hosting those components,
>>> is the "incubator" a good place for the developers wishing to go in that
>>> direction?
>>>
>>>
>> Perhaps before we move to next steps, could you provide some links to the
>> discussion where it was decided that Commons is not interested in hosting
>> these components?
>>
>
> I proposed to concretely examine this possibility in more than
> one message:
>   http://markmail.org/message/ye6wvqvlvnqe4qrp
>   http://markmail.org/message/3gupcednhqtcfepw
>   http://markmail.org/message/3kob7djjicax6rgn
>   http://markmail.org/message/7rb2mxq7hhwzykvr
>
> And again in another thread:
>   http://markmail.org/message/fnlta2ttfne3aj5f
>
>
>>> What's the next step?
>>>
>>>
>> Let's get to a common understanding of what went before.
>>
>
> Even that seems impossible. :-(
>
>
> Gilles
>
>
>
>
>


Re: [ALL] Volunteers for a Math IPMC?

2016-06-17 Thread Ted Dunning
Excuse me?

See inline.



On Fri, Jun 17, 2016 at 7:50 AM, Gilles 
wrote:

> Hi all.
>
> On Tue, 14 Jun 2016 11:01:13 -0700, Ralph Goers wrote:
>
>> I thought this had been made clear.  Several months ago Commons voted to
>> make Math a TLP. But shortly after that most of the people involved
>> with Commons Math felt that a TLP at the ASF would not work for them,
>> so they forked the project and left, effectively voiding the TLP vote
>> since the proposed PMC is no longer valid.  There is one person left
>> who was very involved in Commons Math and a few other people who have
>> expressed interest in joining the new community.
>>
>> So this is a situation where we have an already existing code base
>> where a lot of the people left are not familiar with quite a bit of
>> it.  The new group of people who are interested are trying to
>> determine how they should move forward. There is some talk of breaking
>> Commons Math into smaller components and possibly dropping some where
>> there is no one to maintain it.
>>
>
> The "Commons" project not being interested in hosting those components,
> is the "incubator" a good place for the developers wishing to go in that
> direction?
>

Perhaps before we move to next steps, could you provide some links to the
discussion where it was decided that Commons is not interested in hosting
these components?


>
> What's the next step?
>

Let's get to a common understanding of what went before.


Re: [ALL] Volunteers for a Math IPMC?

2016-06-15 Thread Ted Dunning
On Wed, Jun 15, 2016 at 10:21 AM, John D. Ament 
wrote:

> Yep absolutely.  I don't think the incubator has ever rejected a project?
>

We have discouraged some submissions. But I have never seen a formal
submission be denied.


Re: [ALL] Volunteers for a Math IPMC?

2016-06-15 Thread Ted Dunning
Jochen,

The need to build the community (nearly) from scratch is definitely NOT a
reason for rejection. It is simply a risk factor that must be mitigated to
succeed in incubation.


On Tue, Jun 14, 2016 at 10:51 PM, Jochen Wiedmann  wrote:

> On Tue, Jun 14, 2016 at 11:29 PM, John D. Ament 
> wrote:
>
> > We generally expect some kind of backing community to bring this to.  We
> > have seen pretty consistently that starting from an empty community
> > doesn't work.  It doesn't mean that it's impossible, but very hard to do.
>
> Understood. On the other hand: Would that be sufficient reason for
> rejecting a proposal? ("It didn't
> work in the past" != "It won't work in this case")
>
>
> --
> The next time you hear: "Don't reinvent the wheel!"
>
>
> http://www.keystonedevelopment.co.uk/wp-content/uploads/2014/10/evolution-of-the-wheel-300x85.jpg
>
>
>


Re: [ALL] Volunteers for a Math IPMC?

2016-06-14 Thread Ted Dunning
On Tue, Jun 14, 2016 at 2:29 PM, John D. Ament 
wrote:

> We generally expect some kind of backing community to bring this to.  We
> have seen pretty consistently that starting from an empty community doesn't
> work.  It doesn't mean that it's impossible, but very hard to do.
>

Frankly, the exceptions to this observation (such as Drill) pretty much
reinforce the conclusion. Drill managed to build a community, but only
because of a LOT of effort on the part of the founders of the project.


Re: [ALL] Volunteers for a Math IPMC?

2016-06-14 Thread Ted Dunning
Looking back through the discussion, it is a bit of a problem that one of
the major reasons given for the fork is that the team thought that they
didn't have a large enough PMC and that incubation wouldn't get them enough
additional contributors. That made it seem like the project should go
forward without meeting Apache requirements (i.e. outside).

Is the situation really that different now that a vastly diminished team is
likely to benefit from incubation enough to form a viable TLP?

(I hate that this sounds negative ... it is a real question)



On Tue, Jun 14, 2016 at 11:01 AM, Ralph Goers 
wrote:

> I thought this had been made clear.  Several months ago Commons voted to make
> Math a TLP. But shortly after that most of the people involved with Commons
> Math felt that a TLP at the ASF would not work for them, so they forked the
> project and left, effectively voiding the TLP vote since the proposed PMC
> is no longer valid.  There is one person left who was very involved in
> Commons Math and a few other people who have expressed interest in joining
> the new community.
>
> So this is a situation where we have an already existing code base where a
> lot of the people left are not familiar with quite a bit of it.  The new
> group of people who are interested are trying to determine how they should
> move forward. There is some talk of breaking Commons Math into smaller
> components and possibly dropping some where there is no one to maintain it.
>
> Ralph
>
> > On Jun 11, 2016, at 6:21 PM, Niclas Hedhman  wrote:
> >
> > If you have a functioning community around Commons Math already, why do
> > you feel you need Incubation?
> >
> > People on a Math TLP would come out of the Commons PMC and simply submit
> > a Board Resolution, and I doubt that there would be any objections. There
> > are no legal concerns, no community training, no need for release
> > management training, and so on...
> >
> > Or are you looking at a situation where the Commons community has no
> > interest in the Math subproject, and needs new blood?
> >
> >
> > Cheers
> > Niclas
> >
> > On Sat, Jun 11, 2016 at 6:25 PM, James Carman <ja...@carmanconsulting.com>
> > wrote:
> >
> >> We (the Commons PMC) have not decided yet what to do, but I just wanted
> >> to gauge the interest in joining the math IPMC if we choose to go TLP by
> >> way of the incubator. The idea would be that math (whatever its name may
> >> be), would go through the incubator in order to enrich its community
> >> prior to becoming a TLP. Do we have any folks willing to throw their hat
> >> in the ring?
> >>
> >> p.s. I've cross-posted to the incubator list as there are folks there
> >> who are very good at this stuff and could perhaps lend us some advice.
> >>
> >
> >
> >
> > --
> > Niclas Hedhman, Software Developer
> > http://zest.apache.org - New Energy for Java
>
>
>
>
>


Re: [collections] MultiValuedMap interface discussion

2014-03-27 Thread Ted Dunning
Following Guava on this has something to be said for it.

https://code.google.com/p/guava-libraries/wiki/NewCollectionTypesExplained

Their decision is that Multimap#get returns a collection always.  If there
are no values, then an empty collection is returned so that you can always
do

  m.get(key).size()

or

 m.get(key).add(foo)

The value returned is a magical view which only takes up space on demand so
there is little consing done.  There is an asMap method for which get will
return null on missing keys.
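
As a rough illustration of the "view that only takes up space on demand" idea, here is a minimal sketch in plain Java. It is not Guava's implementation and not the proposed Commons Collections API; the class LazyMultimap and the method getOrNull are hypothetical names made up for this example. The point it tries to show is that get() can always return a usable collection without a read ever inserting anything into the backing map.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical minimal multimap; not the Guava or Commons Collections API. */
class LazyMultimap<K, V> {
    private final Map<K, List<V>> backing = new HashMap<>();

    /** Always returns a live view, never null; nothing is stored until a value is added. */
    List<V> get(final K key) {
        return new AbstractList<V>() {
            @Override
            public V get(int index) {
                List<V> values = backing.get(key);
                if (values == null) {
                    throw new IndexOutOfBoundsException("index " + index + ", size 0");
                }
                return values.get(index);
            }

            @Override
            public int size() {
                List<V> values = backing.get(key);
                return values == null ? 0 : values.size();
            }

            @Override
            public void add(int index, V value) {
                if (index < 0 || index > size()) {
                    throw new IndexOutOfBoundsException("index " + index);
                }
                // Only a write creates the backing list, so reads never mutate the map.
                backing.computeIfAbsent(key, k -> new ArrayList<>()).add(index, value);
                modCount++;
            }
        };
    }

    /** True only if at least one value has been stored for the key. */
    boolean containsKey(K key) {
        List<V> values = backing.get(key);
        return values != null && !values.isEmpty();
    }

    /** Map-style lookup, in the spirit of asMap().get(): null for a missing key. */
    List<V> getOrNull(K key) {
        return backing.get(key);
    }
}
```

With this shape, m.get(key).size() and m.get(key).add(foo) both work on a missing key, and iterating over the map's contents is not disturbed by reads.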




On Thu, Mar 27, 2014 at 2:55 PM, Paul Benedict <pbened...@apache.org> wrote:

 The downside of it returning an empty collection is you either have (1) to
 instantiate a collection just to say you have nothing or (2) you use an
 immutable collection. #1 is bad in itself and #2 is only as bad if the
 collection is otherwise writable. For example, it would be really strange
 for the returned collection to be mutable if you have something but
 immutable if you have nothing.

 My preference is you return null. That's the most rational answer, imo.


 On Thu, Mar 27, 2014 at 4:44 PM, Thomas Neidhart
 <thomas.neidh...@gmail.com> wrote:

  Hi,
 
  we are currently working on a new MultiValuedMap interface for
  collections, see https://issues.apache.org/jira/browse/COLLECTIONS-508.
 
  During the work we stumbled across an issue we would like to discuss.
  The MultiValuedMap is basically a Map that can hold multiple values
  associated to a given key. Thus the get(K key) method will normally
  return a Collection.
 
  In case no mapping for the key is stored in the map, it may either
  return null (like a normal map), or an empty collection.
 
  I would be in favor to define that get() always returns a collection and
  never returns null. The advantage being that the result of get() can
  safely be used for further operations, e.g. size(), iterator(), ...
  keeping the interface of MultiValuedMap smaller and simple (i.e. no need
  to add additional methods there like size(K key) or iterator(K key)).
 
  The containsKey method would have to check if there is either no mapping
  at all for the key or the stored collection is empty:
 
  public boolean containsKey(K key) {
      Collection coll = decoratedMap().get(key);
      return coll != null && coll.size() > 0;
  }
 
  The downside would be that read operations may also alter the map thus
  leading to unexpected ConcurrentModificationExceptions when iterating on
  e.g. value().
 
  So, I would be interested on opinions about this.
 
  Thomas
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
  For additional commands, e-mail: dev-h...@commons.apache.org
 
 


 --
 Cheers,
 Paul



Re: [math] Proposal for New way of Computing an approximate Percentile without storing input data

2014-03-23 Thread Ted Dunning
On Sun, Mar 23, 2014 at 2:09 AM, Thomas Neidhart
<thomas.neidh...@gmail.com> wrote:


 There is already an issue for this:

 https://issues.apache.org/jira/browse/MATH-418

 It also links to other implementations and algorithms; maybe you could add
 a link to yours as well?


Done.  Thanks for the pointer.


Re: [math] Proposal for New way of Computing an approximate Percentile without storing input data

2014-03-22 Thread Ted Dunning
Murthy,

I recently developed an alternative algorithm which provides superior
accuracy for extreme quantiles.  You can read more at

https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf?raw=true

The library involved is available via Maven and is Apache licensed.  Apache
Commons Math has a no-dependency policy, which might mean that sucking in
the code would be a better option than simply linking to this.

The state of the art before t-digest is generally considered to be
either Greenwald and Khanna's algorithm GK01 or the Q-digest.  References
are in the paper above.

In case it isn't obvious, source code is available on github at

https://github.com/tdunning/t-digest

The p^2 algorithm that you suggest is actually quite old and far from the
state of the art.
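
For readers who have not seen it, the P-square (P^2) estimator under discussion can be sketched in a few dozen lines. This is a minimal illustration following my reading of the Jain and Chlamtac paper linked in this thread; it is not the patch attached to MATH-1112 and not t-digest, and the class name PSquareQuantile is made up for the example. It keeps five markers (heights and positions) and nudges the three interior ones with a parabolic, or failing that linear, interpolation as each value arrives.

```java
import java.util.Arrays;

/** Sketch of the P-square streaming quantile estimator; names are illustrative only. */
class PSquareQuantile {
    private final double p;
    private final double[] q = new double[5];   // marker heights
    private final double[] n = new double[5];   // actual marker positions (1-based)
    private final double[] np = new double[5];  // desired marker positions
    private final double[] dn = new double[5];  // increments of the desired positions
    private int count = 0;

    PSquareQuantile(double p) {
        this.p = p;
    }

    void add(double x) {
        if (count < 5) {                         // the first five values seed the markers
            q[count++] = x;
            if (count == 5) {
                Arrays.sort(q);
                for (int i = 0; i < 5; i++) n[i] = i + 1;
                np[0] = 1; np[1] = 1 + 2 * p; np[2] = 1 + 4 * p; np[3] = 3 + 2 * p; np[4] = 5;
                dn[0] = 0; dn[1] = p / 2; dn[2] = p; dn[3] = (1 + p) / 2; dn[4] = 1;
            }
            return;
        }
        count++;
        int k;                                   // index of the cell that x falls into
        if (x < q[0]) { q[0] = x; k = 0; }
        else if (x < q[1]) k = 0;
        else if (x < q[2]) k = 1;
        else if (x < q[3]) k = 2;
        else if (x <= q[4]) k = 3;
        else { q[4] = x; k = 3; }
        for (int i = k + 1; i < 5; i++) n[i]++;
        for (int i = 0; i < 5; i++) np[i] += dn[i];
        for (int i = 1; i <= 3; i++) {           // adjust the interior markers if they drifted
            double d = np[i] - n[i];
            if ((d >= 1 && n[i + 1] - n[i] > 1) || (d <= -1 && n[i - 1] - n[i] < -1)) {
                int s = d > 0 ? 1 : -1;
                double candidate = parabolic(i, s);
                q[i] = (q[i - 1] < candidate && candidate < q[i + 1]) ? candidate : linear(i, s);
                n[i] += s;
            }
        }
    }

    private double parabolic(int i, int s) {
        return q[i] + s / (n[i + 1] - n[i - 1])
            * ((n[i] - n[i - 1] + s) * (q[i + 1] - q[i]) / (n[i + 1] - n[i])
             + (n[i + 1] - n[i] - s) * (q[i] - q[i - 1]) / (n[i] - n[i - 1]));
    }

    private double linear(int i, int s) {
        return q[i] + s * (q[i + s] - q[i]) / (n[i + s] - n[i]);
    }

    /** Current estimate of the p-quantile; exact order statistic while fewer than five values. */
    double estimate() {
        if (count == 0) throw new IllegalStateException("no data");
        if (count < 5) {
            double[] seen = Arrays.copyOf(q, count);
            Arrays.sort(seen);
            return seen[(int) Math.round(p * (count - 1))];
        }
        return q[2];
    }
}
```

Memory use is constant (twenty doubles) regardless of stream length, which is the property the proposal is after; the accuracy concern for extreme quantiles raised above is not addressed by this sketch.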




On Sat, Mar 22, 2014 at 2:40 PM, Phil Steitz <phil.ste...@gmail.com> wrote:

 On 3/22/14, 2:11 PM, venkatesha m wrote:
  Hi,
 
  I would like to propose adding a new way of computing percentiles
  without needing to store most of the input data.  Since this is my
  first time contributing to Apache, please help me / correct me if I
  miss any procedure here.

  Here are the details.

  Description:
  The traditional Percentile calculation requires all the data points to
  be stored and sorted before accessing the pth percentile value of the
  data set. However, storing the points can become prohibitive when we
  need to use the existing Percentile implementation at big data scale
  (e.g. when computing the daily or weekly percentile value of a certain
  performance metric, where the data points accumulated over a day or a
  week may run to GB or TB).  While platforms such as Hadoop exist to
  solve the data scale issue, a statistical computation of quantiles
  without storing data is absolutely essential.
  While looking in the commons-math classes, I found that although a
  Percentile class is available, it requires the input to be stored. So
  I was wondering if we could add a class to calculate percentiles
  without needing to store data.  The algorithm that I have chosen to
  implement and propose is based on the P-Square algorithm
  ( http://www.cs.wustl.edu/~jain/papers/ftp/psqr.pdf ), which requires a
  small, fixed amount of memory to compute percentiles for a continuous
  stream of data.

  Ref:
  http://www.cs.wustl.edu/~jain/papers/ftp/psqr.pdf which has a succinct
  representation of the workflow of the algorithm

  Advantages:
  a) As claimed in the original work, the accuracy improves over
  moderate to large data sets, which is what is needed.
  b) A minimal, constant-sized data store is used to compute over a
  large data set
  c) Useful in Hadoop MapReduce applications

  Implementation:
  I have implemented this algorithm based on StorelessUnivariateStatistic
  after checking out from the 3.2 branch.  I have also opened a JIRA
  ticket (https://issues.apache.org/jira/browse/MATH-1112 ) requesting
  that the new feature be added.

  Please let me know when and how I could send my code for review.

 Thanks, Murthy and welcome!

 I am personally fine with the P-Square algorithm and would welcome a
 patch including implementation.  Unless others disagree with this
 approach (give it a day or two), I would go ahead and attach a patch
 with the implementation to the JIRA you opened.

 Thanks in advance for your contributions!

 Phil
 
  thanks
  murthy






Re: [math] refactoring least squares

2014-02-25 Thread Ted Dunning
On Tue, Feb 25, 2014 at 6:22 AM, Konstantin Berlin <kber...@gmail.com> wrote:

 Hi,

 I am really having problems believing that matrix copying is the major
 problem in an optimization algorithm. Copying is O(N^2) operations. Surely,
 for any problem where performance would matter, it is completely dwarfed by
 the O(N^3) complexity of actually solving the normal equation.

 Also, I think testing should be done on an actual large problem where
 scaling issues would show up. The 1000x2 Jacobian would result in a 2x2
 normal equation. Surely this is not a good test case.

 Konstantin


As you point out, the test case in question shows how copying dominates
computation for massively over-determined systems.
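
To make the over-determined case concrete, here is a minimal sketch in plain Java. It does not use the Commons Math API, and the class and method names are made up for illustration. For an m x 2 Jacobian, forming the normal equations is a single pass over the m rows and leaves only a 2 x 2 system, so the solve itself costs almost nothing and any extra per-evaluation copy of the m x 2 data is work of the same order as the useful arithmetic.

```java
/** Illustrative only: least squares for an m x 2 Jacobian via the 2 x 2 normal equations. */
class NormalEquations2 {
    /** Minimizes ||J c - y|| by solving (J'J) c = J'y; assumes J has full column rank. */
    static double[] solve(double[][] j, double[] y) {
        double a00 = 0, a01 = 0, a11 = 0, b0 = 0, b1 = 0;
        // One pass over the m rows accumulates J'J (symmetric 2 x 2) and J'y.
        for (int i = 0; i < j.length; i++) {
            a00 += j[i][0] * j[i][0];
            a01 += j[i][0] * j[i][1];
            a11 += j[i][1] * j[i][1];
            b0 += j[i][0] * y[i];
            b1 += j[i][1] * y[i];
        }
        // Closed-form solve of the 2 x 2 system by Cramer's rule.
        double det = a00 * a11 - a01 * a01;
        return new double[] {
            (a11 * b0 - a01 * b1) / det,
            (a00 * b1 - a01 * b0) / det
        };
    }
}
```

Forming the normal equations squares the condition number, so this is a sketch of the cost structure rather than a recommendation for ill-conditioned problems.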


Re: [math] refactoring least squares

2014-02-25 Thread Ted Dunning
On Tue, Feb 25, 2014 at 8:39 AM, luc <l...@spaceroots.org> wrote:

 Also, I think testing should be done on an actual large problem where
 scaling issues would show up. The 1000x2 Jacobian would result in a
 2x2 normal equation. Surely this is not a good test case.

 Konstantin


 As you point out, the test case in question shows how copying dominates
 computation for massively over-determined systems.


 You are right. Massively over-determined systems are also an important
 class of problems, so they need to be addressed. I am aware there are many
 other important classes of problems, though, so there are probably no
 silver bullets and what is important here is not important elsewhere.
 There are also cases for which forming the normal equations is avoided
 (mainly for the sake of numerical robustness if I remember correctly), so
 once again no silver bullets.


Underdetermined systems, for instance, have pretty much the opposite
problem in that the normal equations are very large.  These systems are
often solved using least squares with an L_2 regularizer.
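
A tiny sketch of the regularized case, in plain Java with made-up names. It assumes the simplest possible underdetermined system, a single equation a.x = b in n unknowns, and uses the standard identity that the minimizer of ||Ax - b||^2 + lambda ||x||^2 can be written x = A'(AA' + lambda I)^-1 b. For one equation, AA' is just the scalar a.a, so the n x n normal equations never have to be formed.

```java
/** Illustrative only: ridge (L_2-regularized) solution of one equation a.x = b in n unknowns. */
class RidgeSingleEquation {
    /** Returns x = a * b / (a.a + lambda), the minimizer of (a.x - b)^2 + lambda * ||x||^2. */
    static double[] solve(double[] a, double b, double lambda) {
        double aa = 0;
        for (double ai : a) {
            aa += ai * ai;              // the 1 x 1 matrix AA'
        }
        double scale = b / (aa + lambda);
        double[] x = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            x[i] = a[i] * scale;        // x = A' (AA' + lambda I)^-1 b
        }
        return x;
    }
}
```

As lambda goes to zero this approaches the minimum-norm solution of a.x = b; larger lambda shrinks x toward zero.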


Re: [math] refactoring least squares

2014-02-24 Thread Ted Dunning
On Mon, Feb 24, 2014 at 10:23 AM, Gilles <gil...@harfang.homelinux.org> wrote:

 One way to improve performance would be to provide pre-allocated space
 for the Jacobian and reuse it for each evaluation.


 Do you have actual data to back this statement?


  The
 LeastSquaresProblem interface would then be:

 void evaluate(RealVector point, RealVector resultResiduals, RealVector
 resultJacobian);

 I'm interested in hearing your ideas on other approaches to solve this
 issue. Or even if this is an issue worth solving.


 Not before we can be sure that in-place modification (rather than
 reallocation) always provides a performance benefit.


Allocation is rarely the problem in these situations.  The implied copying
of data is.

And even the copying isn't always a problem.  For instance, it often pays
off big to copy data to a column (or row) major representation to improve
cache locality.  A large fraction of the time may then be spent copying,
but without the copy the rest of the computation could take 10x longer, so
the net result is still something like 3x faster with the copy.
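
A minimal sketch of that kind of layout copy, in plain Java with made-up names (this is not Commons Math code, and it makes no timing claims). A row-major double[][] is copied once into a flat column-major array so that a column-wise traversal afterwards walks memory sequentially.

```java
/** Illustrative only: copy a row-major matrix to a flat column-major array, then scan columns. */
class ColumnMajorCopy {
    /** Copies a rectangular row-major matrix into one flat array, column after column. */
    static double[] toColumnMajor(double[][] a) {
        int rows = a.length;
        int cols = a[0].length;
        double[] flat = new double[rows * cols];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                flat[j * rows + i] = a[i][j];   // element (i, j) lands in column j's block
            }
        }
        return flat;
    }

    /** Sums each column; the inner loop touches consecutive array elements. */
    static double[] columnSums(double[] flat, int rows, int cols) {
        double[] sums = new double[cols];
        for (int j = 0; j < cols; j++) {
            for (int i = 0; i < rows; i++) {
                sums[j] += flat[j * rows + i];
            }
        }
        return sums;
    }
}
```

Whether the copy pays for itself depends on how many column-wise passes follow it; that would need measuring on the actual problem.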


Re: [math]

2014-02-17 Thread Ted Dunning
On Mon, Feb 17, 2014 at 5:01 AM, Emmanuel Joliet <ejol...@sciops.esa.int> wrote:

 https://issues.apache.org/jira/browse/MATH-870

  Recently, many problems have been found out with class ...
 Please, consider not removing it.
 We use it heavily and need the class as it gives what we need (handling
 the input is of course necessary regarding NaN and infinities!).
 I understand the problem of incorrect/improper handling of NaN/Infinity
 conditions, but that can't justify removing the interfaces/classes
 completely, can it?


Yeah... welcome to commons math.

It does seem like an extreme sanction to me.


Re: [math] trouble with SingularValueDecomposition

2014-02-15 Thread Ted Dunning
Note that the only reason that the order is unconstrained is that the
two corresponding singular values are equal.

Strictly speaking, for equal singular values, any unitary transformation
applied to both the corresponding left and right singular vectors also
yields valid singular vectors.
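
This can be checked numerically with a small sketch in plain Java (no Commons Math or Mahout calls; all names and the particular angles are made up for illustration). It builds A = U diag(s) V' from two orthogonal matrices, then rotates the first two columns of both U and V by the same angle and measures how much the reconstructed matrix changes.

```java
/** Illustrative only: equal singular values leave the singular vectors free to rotate. */
class EqualSingularValues {
    static double[][] multiply(double[][] a, double[][] b) {
        double[][] c = new double[a.length][b[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < b[0].length; j++)
                for (int k = 0; k < b.length; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    /** Computes u * diag(s) * v'. */
    static double[][] reconstruct(double[][] u, double[] s, double[][] v) {
        double[][] a = new double[u.length][v.length];
        for (int i = 0; i < u.length; i++)
            for (int j = 0; j < v.length; j++)
                for (int k = 0; k < s.length; k++)
                    a[i][j] += u[i][k] * s[k] * v[j][k];
        return a;
    }

    /** 3 x 3 rotation that mixes only columns 1 and 2. */
    static double[][] rotateFirstTwo(double t) {
        double c = Math.cos(t), s = Math.sin(t);
        return new double[][] {{c, -s, 0}, {s, c, 0}, {0, 0, 1}};
    }

    /** 3 x 3 rotation that mixes only columns 2 and 3. */
    static double[][] rotateLastTwo(double t) {
        double c = Math.cos(t), s = Math.sin(t);
        return new double[][] {{1, 0, 0}, {0, c, -s}, {0, s, c}};
    }

    /** Largest entry-wise change in U diag(s) V' when columns 1 and 2 of U and V are rotated. */
    static double changeAfterRotation(double[] s, double theta) {
        double[][] u = rotateLastTwo(0.7);
        double[][] v = multiply(rotateLastTwo(-0.4), rotateFirstTwo(1.1));
        double[][] r = rotateFirstTwo(theta);
        double[][] before = reconstruct(u, s, v);
        double[][] after = reconstruct(multiply(u, r), s, multiply(v, r));
        double max = 0;
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                max = Math.max(max, Math.abs(before[i][j] - after[i][j]));
        return max;
    }
}
```

With s = {2, 2, 1} the change is at rounding-error level for any angle; with s = {2, 3, 1} it is not, which is why the order and mixing of vectors is only free when the singular values coincide.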



On Sat, Feb 15, 2014 at 4:09 AM, Patrick Meyer <meyer...@gmail.com> wrote:

 Thanks Ted. As I mentioned my knowledge of SVD is limited, and I was not
 aware that it is OK to have a different order of the first two columns in
 the results (or the conditions under which the order doesn't matter). I am
 trying to track down a bug in some code and that’s what led me to the SVD.
 I guess I need to keep looking for the real bug.

 For completeness, my results R were the same as you reported. My results
 from CM are shown below and if you swap the first and second column, the
 results agree with R.

 U:
 0.9940594018965339  0.06774763124429131  -0.08518312016997649
 0.10615872136916754  -0.7761401247896214  0.621551704858
 0.02400481989869077  0.6269104921377042  0.778721390144956

 V:
 0.9963653125425972  0.0  -0.08518312016997495
 0.0531395658155507  -0.7815621241949481  0.621551704865
 0.06657590034559915  0.6238274168581248  0.7787213901449556



 -Original Message-
 From: Ted Dunning [mailto:ted.dunn...@gmail.com]
 Sent: Saturday, February 15, 2014 2:17 AM
 To: Commons Developers List
 Subject: Re: [math] trouble with SingularValueDecomposition

 For what it's worth, I tested the Mahout SVD which shares code lineage with
 the Commons Math implementation.

 The results I got were:

 sum(abs(m - u * s * v')) = 4.31946146e-16

 S =
    1.002319690998  1.002319690998  1.

 U =
    0.994059401897   0.067747631244  -0.085183120170
    0.106158721369  -0.776140124790   0.62155170
    0.024004819899   0.626910492138   0.778721390145

 V =
    0.996365312543   0.              -0.085183120170
    0.053139565816  -0.781562124195   0.62155170
    0.066575900346   0.623827416858   0.778721390145


 Note that the residue of the reconstruction is excellently small.  This
 indicates that the result is correct.


 If you compare these to the R results,

 [1] 1.0023196909980066 1.0023196909980066 1.

 $u
                       [,1]                  [,2]                  [,3]
 [1,]  0.067747631244291326 -0.994059401896534967  0.085183120169970525
 [2,] -0.776140124789635122 -0.106158721369163295 -0.62155170469113
 [3,]  0.626910492137687125 -0.024004819898688426 -0.778721390144969994

 $v
                      [,1]                  [,2]                  [,3]
 [1,]  0.0                 -0.996365312542597747  0.085183120169970497
 [2,] -0.78156212419496163 -0.053139565815546450 -0.62155170469668
 [3,]  0.62382741685810772 -0.066575900345596822 -0.778721390144969550


 These agree with the previous results, with two differences. Because the
 first two singular values are equal, the corresponding left and right
 singular vectors come out in a different order, and there are sign changes
 in the singular vectors.

 My guess is that you will get the same results in Apache Commons Math.



 On Fri, Feb 14, 2014 at 6:07 PM, Patrick Meyer <meyer...@gmail.com> wrote:

  Hi,
 
 
 
  I am using the SingularValueDecomposition class with a matrix but it
  gives me a different result than R. My knowledge of SVD is limited, so
  any advice is welcomed.
 
 
 
  Here's the method in Java
 
 
 
  public void svdTest(){
      double[][] x = {
          {1.0,  -0.053071807862720116,  0.04236086650321309},
          {0.05307180786272012,  1.0,  0.0058054424137053435},
          {-0.04236086650321309,  -0.005805442413705342,  1.0}
      };

      RealMatrix X = new Array2DRowRealMatrix(x);
      SingularValueDecomposition svd = new SingularValueDecomposition(X);

      // Print U (left singular vectors), one matrix row per line.
      RealMatrix U = svd.getU();
      for(int i=0;i<U.getRowDimension();i++){
          for(int j=0;j<U.getColumnDimension();j++){
              System.out.print(U.getEntry(i,j) + "  ");
          }
          System.out.println();
      }

      System.out.println();
      System.out.println();

      // Print V (right singular vectors), one matrix row per line.
      RealMatrix V = svd.getV();
      for(int i=0;i<V.getRowDimension();i++){
          for(int j=0;j<V.getColumnDimension();j++){
              System.out.print(V.getEntry(i,j) + "  ");
          }
          System.out.println();
      }
  }
 
 
 
 
 
  And here's the function in R.
 
 
 
  x<-matrix(c(
    1.0,  -0.053071807862720116,  0.04236086650321309,
    0.05307180786272012,  1.0,  0.0058054424137053435,
    -0.04236086650321309,  -0.005805442413705342,  1.0),
    nrow=3, byrow=TRUE)

  svd(x)
 
 
 
  Does anyone know why I am getting different results for U and V? I am
  using commons math 3.1.
 
 
 
  Thanks,
 
  Patrick

Re: [math] trouble with SingularValueDecomposition

2014-02-14 Thread Ted Dunning
And what exactly are the results you are getting?




On Fri, Feb 14, 2014 at 6:07 PM, Patrick Meyer <meyer...@gmail.com> wrote:

 Hi,



 I am using the SingularValueDecomposition class with a matrix but it gives
 me a different result than R. My knowledge of SVD is limited, so any advice
 is welcomed.



 Here's the method in Java



 public void svdTest(){
     double[][] x = {
         {1.0,  -0.053071807862720116,  0.04236086650321309},
         {0.05307180786272012,  1.0,  0.0058054424137053435},
         {-0.04236086650321309,  -0.005805442413705342,  1.0}
     };

     RealMatrix X = new Array2DRowRealMatrix(x);
     SingularValueDecomposition svd = new SingularValueDecomposition(X);

     // Print U (left singular vectors), one matrix row per line.
     RealMatrix U = svd.getU();
     for(int i=0;i<U.getRowDimension();i++){
         for(int j=0;j<U.getColumnDimension();j++){
             System.out.print(U.getEntry(i,j) + "  ");
         }
         System.out.println();
     }

     System.out.println();
     System.out.println();

     // Print V (right singular vectors), one matrix row per line.
     RealMatrix V = svd.getV();
     for(int i=0;i<V.getRowDimension();i++){
         for(int j=0;j<V.getColumnDimension();j++){
             System.out.print(V.getEntry(i,j) + "  ");
         }
         System.out.println();
     }
 }





 And here's the function in R.



 x<-matrix(c(
   1.0,  -0.053071807862720116,  0.04236086650321309,
   0.05307180786272012,  1.0,  0.0058054424137053435,
   -0.04236086650321309,  -0.005805442413705342,  1.0),
   nrow=3, byrow=TRUE)

 svd(x)



 Does anyone know why I am getting different results for U and V? I am using
 commons math 3.1.



 Thanks,

 Patrick










Re: [math] trouble with SingularValueDecomposition

2014-02-14 Thread Ted Dunning
For what it's worth, I tested the Mahout SVD which shares code lineage with
the Commons Math implementation.

The results I got were:

sum(abs(m - u * s * v')) = 4.31946146e-16

S =
   1.002319690998  1.002319690998  1.

U =
   0.994059401897   0.067747631244  -0.085183120170
   0.106158721369  -0.776140124790   0.62155170
   0.024004819899   0.626910492138   0.778721390145

V =
   0.996365312543   0.              -0.085183120170
   0.053139565816  -0.781562124195   0.62155170
   0.066575900346   0.623827416858   0.778721390145


Note that the residue of the reconstruction is excellently small.  This
indicates that the result is correct.
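
For reference, the residual quoted above, sum(abs(m - u * s * v')), can be computed along these lines. This is a plain-Java sketch with made-up names operating on raw arrays; it is not the Mahout or Commons Math code that produced the number above.

```java
/** Illustrative only: entry-wise absolute reconstruction error of a decomposition. */
class SvdResidual {
    /** Returns the sum over all entries of |m - u * diag(s) * v'|. */
    static double residual(double[][] m, double[][] u, double[] s, double[][] v) {
        double total = 0;
        for (int i = 0; i < m.length; i++) {
            for (int j = 0; j < m[i].length; j++) {
                double reconstructed = 0;
                for (int k = 0; k < s.length; k++) {
                    reconstructed += u[i][k] * s[k] * v[j][k];  // (u * diag(s) * v')[i][j]
                }
                total += Math.abs(m[i][j] - reconstructed);
            }
        }
        return total;
    }
}
```

A residual near machine precision says the three factors multiply back to the input; a complete check of an SVD would also confirm that U and V have orthonormal columns.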


If you compare these to the R results,

[1] 1.0023196909980066 1.0023196909980066 1.

$u
                      [,1]                  [,2]                  [,3]
[1,]  0.067747631244291326 -0.994059401896534967  0.085183120169970525
[2,] -0.776140124789635122 -0.106158721369163295 -0.62155170469113
[3,]  0.626910492137687125 -0.024004819898688426 -0.778721390144969994

$v
                     [,1]                  [,2]                  [,3]
[1,]  0.0                 -0.996365312542597747  0.085183120169970497
[2,] -0.78156212419496163 -0.053139565815546450 -0.62155170469668
[3,]  0.62382741685810772 -0.066575900345596822 -0.778721390144969550


These are identical to the previous results up to ordering and sign.  The
first two singular values are equal, which means that the order of the
corresponding left and right singular vectors is arbitrary, so they come out
in a different order here, and there are sign changes in the singular vectors.

My guess is that you will get the same results in Apache Commons Math.
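
The ordering and sign ambiguity can be checked without any library. What follows is a minimal sketch in plain Java, not code from Mahout or Commons Math; the class and method names are made up for illustration. It builds a small factorization from two rotation matrices, then swaps the first two singular vector pairs and flips a sign in both U and V, and compares the two reconstructions. With equal leading singular values the reconstructions agree to rounding error; with distinct values they do not.

```java
final class SvdAmbiguityDemo {

    /** Returns u * diag(s) * v^T for square n-by-n inputs. */
    static double[][] reconstruct(double[][] u, double[] s, double[][] v) {
        int n = s.length;
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                for (int k = 0; k < n; k++) {
                    m[i][j] += u[i][k] * s[k] * v[j][k];
                }
            }
        }
        return m;
    }

    /** Returns a copy of m with columns 0 and 1 exchanged and the new column 0 negated. */
    static double[][] swapAndFlip(double[][] m) {
        double[][] r = new double[m.length][];
        for (int i = 0; i < m.length; i++) {
            r[i] = m[i].clone();
            r[i][0] = -m[i][1];
            r[i][1] = m[i][0];
        }
        return r;
    }

    /** Largest entrywise difference between the original and the swapped reconstruction. */
    static double residualAfterSwap(double[] sigma) {
        double c = Math.cos(0.3);
        double s = Math.sin(0.3);
        double[][] u = {{c, -s, 0}, {s, c, 0}, {0, 0, 1}}; // rotation about z
        double[][] v = {{1, 0, 0}, {0, c, -s}, {0, s, c}}; // rotation about x
        double[][] a = reconstruct(u, sigma, v);
        double[][] b = reconstruct(swapAndFlip(u), sigma, swapAndFlip(v));
        double worst = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a.length; j++) {
                worst = Math.max(worst, Math.abs(a[i][j] - b[i][j]));
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        // Equal leading singular values: both factorizations describe the same matrix.
        System.out.println(residualAfterSwap(new double[] {2, 2, 1}));
        // Distinct singular values: the swapped factorization is a different matrix.
        System.out.println(residualAfterSwap(new double[] {3, 2, 1}));
    }
}
```

So comparing U and V entry by entry between two packages is not a reliable correctness test; comparing the singular values and the reconstruction residual is.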



On Fri, Feb 14, 2014 at 6:07 PM, Patrick Meyer <meyer...@gmail.com> wrote:

 Hi,



 I am using the SingularValueDecomposition class with a matrix but it gives
 me a different result than R. My knowledge of SVD is limited, so any advice
 is welcomed.



 Here's the method in Java



 public void svdTest() {

     double[][] x = {
         {1.0,  -0.053071807862720116,  0.04236086650321309},
         {0.05307180786272012,  1.0,  0.0058054424137053435},
         {-0.04236086650321309,  -0.005805442413705342,  1.0}
     };

     RealMatrix X = new Array2DRowRealMatrix(x);

     SingularValueDecomposition svd = new SingularValueDecomposition(X);

     RealMatrix U = svd.getU();
     for (int i = 0; i < U.getRowDimension(); i++) {
         for (int j = 0; j < U.getColumnDimension(); j++) {
             System.out.print(U.getEntry(i, j) + "  ");
         }
         System.out.println();
     }

     System.out.println();
     System.out.println();

     RealMatrix V = svd.getV();
     for (int i = 0; i < V.getRowDimension(); i++) {
         for (int j = 0; j < V.getColumnDimension(); j++) {
             System.out.print(V.getEntry(i, j) + "  ");
         }
         System.out.println();
     }

 }





 And here's the function in R.



 x <- matrix(c(
    1.0,                  -0.053071807862720116,  0.04236086650321309,
    0.05307180786272012,   1.0,                   0.0058054424137053435,
   -0.04236086650321309,  -0.005805442413705342,  1.0),
   nrow=3, byrow=TRUE)

 svd(x)



 Does anyone know why I am getting different results for U and V? I am using
 commons math 3.1.



 Thanks,

 Patrick










Re: [LANG] New class called StringAlgorithms?

2014-01-17 Thread Ted Dunning
On Fri, Jan 17, 2014 at 4:11 AM, Benedikt Ritter brit...@apache.org wrote:

  A concrete use case could be a query engine which allows customizing its
  string matching algorithm.
 

 Is this really a use case? It sounds very constructed to me. Have you ever
 thought "I'd like to query on Google, but I'd like suggestions to be
 matched using the Levenshtein distance algorithm"?


This is definitely a use case.

Furthermore, Levenshtein distance is often parametrized with edit costs and
possibly an edit cost matrix.  Tuning a system for best accuracy by
injecting alternative distance functions is a common activity whether in a
spelling suggestion system or DNA alignment program.
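
To make the point about injectable distance functions concrete, here is a minimal sketch in plain Java. It is not a proposal for the actual [lang] API: the `EditCosts` interface, the class name and the two cost tables are assumptions made up for this example. The same dynamic program serves both a spelling use case (unit costs) and a DNA-like use case (transitions cheaper than transversions) simply by swapping the cost object.

```java
final class WeightedLevenshtein {

    /** Hypothetical plug-in point: per-operation edit costs. */
    interface EditCosts {
        double insert(char c);
        double delete(char c);
        double substitute(char from, char to);
    }

    /** Classic Levenshtein: every edit costs 1. */
    static final EditCosts UNIT = new EditCosts() {
        public double insert(char c) { return 1; }
        public double delete(char c) { return 1; }
        public double substitute(char from, char to) { return from == to ? 0 : 1; }
    };

    /** Illustrative DNA costs: A<->G and C<->T substitutions cost half as much. */
    static final EditCosts DNA = new EditCosts() {
        public double insert(char c) { return 1; }
        public double delete(char c) { return 1; }
        public double substitute(char from, char to) {
            if (from == to) {
                return 0;
            }
            boolean transition = (from == 'A' && to == 'G') || (from == 'G' && to == 'A')
                    || (from == 'C' && to == 'T') || (from == 'T' && to == 'C');
            return transition ? 0.5 : 1;
        }
    };

    /** Two-row dynamic program; cost of the cheapest edit script turning a into b. */
    static double distance(CharSequence a, CharSequence b, EditCosts costs) {
        double[] prev = new double[b.length() + 1];
        double[] cur = new double[b.length() + 1];
        for (int j = 1; j <= b.length(); j++) {
            prev[j] = prev[j - 1] + costs.insert(b.charAt(j - 1));
        }
        for (int i = 1; i <= a.length(); i++) {
            char ca = a.charAt(i - 1);
            cur[0] = prev[0] + costs.delete(ca);
            for (int j = 1; j <= b.length(); j++) {
                char cb = b.charAt(j - 1);
                double del = prev[j] + costs.delete(ca);
                double ins = cur[j - 1] + costs.insert(cb);
                double sub = prev[j - 1] + costs.substitute(ca, cb);
                cur[j] = Math.min(sub, Math.min(del, ins));
            }
            double[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }
}
```

A caller tuning for accuracy would only replace the `EditCosts` instance, for example `WeightedLevenshtein.distance("GATTACA", "GACTACA", WeightedLevenshtein.DNA)`.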


Re: [Math] src/userguide/java

2013-12-31 Thread Ted Dunning
In my experience, examples are most useful as ... well ... examples.  As
such, they should be an example of how user code works.  That means that
they should be a complete stand-alone project, just as most user programs
should be complete and standalone.

If you want to also deliver a pre-compiled version of the examples, that's
great.  But it doesn't affect the desirability of a stand-alone project.



On Tue, Dec 31, 2013 at 9:25 AM, Gilles gil...@harfang.homelinux.org wrote:

 On Tue, 31 Dec 2013 08:54:59 -0700, Phil Steitz wrote:

 On Dec 31, 2013, at 4:34 AM, Gilles gil...@harfang.homelinux.org wrote:

  On Sun, 29 Dec 2013 13:33:23 -0800, Phil Steitz wrote:

 On 12/29/13, 6:39 AM, Gilles wrote:
 Hello.

 Is there some framework in place in order to generate executable
 files
 from the Java sources located there?
 I guess that a configuration snippet could be added in the
 pom.xml[1]
 so that one of the build phases can also compile (and perhaps also
 run)
 the example applications.


 Regards,
 Gilles

 [1] I tried to use the pom.xml located in src/userguide but it
 failed
to resolve artefact
 org.apache.commons:commons-math3:jar:3.3-SNAPSHOT.


 You need to install the [math] snapshot locally for maven to be able
 to resolve it.  Run mvn install to get a current snapshot
 installed locally.


 OK. That's easy enough for me at the moment.
 [I just wanted to check that what I put under src/userguide/java
 does compile and run.]
 However, I wonder why it is deemed better to have another pom.xml
 rather than have the main one generate the examples JAR.


 Why exactly would one want to generate the examples jar?  I get the
 use case for us of wanting to make sure they build, but for people
 wanting to use them as reference it would seem a self-contained build
 might be a little easier to work with.  Setting it up the way it is
 now also makes it easy to test against prior releases. Also, the self
 contained build is faster.


 People may want to _run_ the examples, without the requirement to have
 maven installed.


 Gilles


 -
 To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
 For additional commands, e-mail: dev-h...@commons.apache.org




Re: [math] Include test data from netlib

2013-12-20 Thread Ted Dunning
From the FAQ:


 *2.1) What is Netlib?  *The Netlib repository contains freely available
 software, documents, and databases of interest to the numerical, scientific
 computing, and other communities. The repository is maintained by AT&T Bell
 Laboratories, the University of Tennessee and Oak Ridge National
 Laboratory, and by colleagues world-wide. The collection is replicated at
 several sites around the world, automatically synchronized, to provide
 reliable and network efficient service to the global community.

  and


 *2.3) Are there restrictions on the use of software retrieved from Netlib?*
 Most netlib software packages have no restrictions on their use but we recommend
 you check with the authors to be sure. Checking with the authors is a nice
 courtesy anyway since many authors like to know how their codes are being
 used.


 *2.4) How do I submit software or documents to Netlib? *Direct inqueries
 to netlib_maintain...@netlib.org



On Fri, Dec 20, 2013 at 12:04 PM, Thomas Neidhart thomas.neidh...@gmail.com
 wrote:

 Hi,

 I have a question regarding test data available at
 http://www.netlib.org/lp/data/.

 Could this be included in our subversion repository? I could not find a
 license attached to these files, but it looks like they have been
 contributed to netlib from various sources.

 It would be quite valuable to include them in our automatic tests, but
 they could of course also just be executed stand-alone when working on
 the simplex solver.

 Thomas





Re: [MATH] Eigen decomposition of matrices consisting of RealFieldElements

2013-12-19 Thread Ted Dunning
I had the same question.

Presumably, it is a reasonable thing to have in the corner case of needing
eigenvalues for matrices with extended precision decimal numbers or some
such, but I would be very surprised if there were measurably non-zero demand
for such a feature.



On Thu, Dec 19, 2013 at 9:55 AM, Martin Grotle Soukup 
martin.grotle.sou...@gmail.com wrote:

 Hi,

 Apologies for being impatient, but does someone think this is a good idea?
 In case of the contrary I will not trouble this mailing list with this
 request any further.

 Best regards,
 Martin Grotle Soukup



 2013/12/13 Martin Grotle Soukup martin.grotle.sou...@gmail.com

  Hi,
 
  The linear-package in commons-math contains a class that does an eigen
  value decomposition of a RealMatrix. Would there be of interest to add a
  similar class doing eigen value decomposition of a matrix consisting of
  RealFieldElements?
 
  I am happy to help if this is the case.
 
  Best regards,
  Martin Grotle Soukup
 
 



Re: [math] Re: Sparse matrices not supported anymore?

2013-11-08 Thread Ted Dunning
On Fri, Nov 8, 2013 at 11:47 AM, Luc Maisonobe l...@spaceroots.org wrote:

  is there still consensus that we are going to remove the sparse
  implementations with 4.0?

 Well, I really think it is a pity, we should support this. But lets face
 it: up to now we have been unable to do it properly. Sébastien who tried
 to do something in this direction has left the project and nobody
 replaced him.


I have done a fair bit of noodling and was unable to come up with a
solution that is performant.

The issue is that you essentially have to maintain an additional bitmask of
exceptional values in addition to the implicit bitmask of non-zero
elements.  I don't see any way of determining that exceptional value
bitmask short of a full scan.  Moreover, the cost of propagating the
exceptional value bitmask significantly changes the cost of various
operations because exceptions require an OR while multiplication allows use
of an AND.  Furthermore, even after the operation itself and the operation
on the exception bitmask are done, there needs to be another scan of the
results to find new exceptional values.


So the upshot is that dealing with this will cost at least a significant
integer factor in performance, with no benefit relative to the normal
user's expectations of sparse vector operations.  I say no benefit because
no other package handles this sort of issue, so users are already very used
to imprecise handling of exceptional values.
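
For readers who have not run into the problem, here is a minimal sketch of what "exceptional values" means in this context. It is plain Java with made-up names and uses an ordinary map as a stand-in for a sparse vector; it is not the Commons Math or Mahout implementation. A dense elementwise product follows IEEE 754, so 0 * NaN is NaN, while a sparse product that only visits indices stored in both operands silently reports an implicit zero there.

```java
import java.util.Map;
import java.util.TreeMap;

final class SparseNaNDemo {

    /** Dense elementwise product: every index is visited, so 0 * NaN yields NaN. */
    static double[] denseMultiply(double[] a, double[] b) {
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            r[i] = a[i] * b[i];
        }
        return r;
    }

    /**
     * Sparse elementwise product over index -> value maps (missing index means 0).
     * Only indices present in BOTH operands are visited, which is what makes the
     * operation fast and also what loses NaN or infinity paired with an implicit zero.
     */
    static Map<Integer, Double> sparseMultiply(Map<Integer, Double> a, Map<Integer, Double> b) {
        Map<Integer, Double> r = new TreeMap<>();
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                r.put(e.getKey(), e.getValue() * other);
            }
        }
        return r;
    }

    public static void main(String[] args) {
        double[] dense = denseMultiply(new double[] {0.0, 2.0}, new double[] {Double.NaN, 3.0});
        Map<Integer, Double> a = new TreeMap<>();
        a.put(1, 2.0);
        Map<Integer, Double> b = new TreeMap<>();
        b.put(0, Double.NaN);
        b.put(1, 3.0);
        System.out.println(java.util.Arrays.toString(dense));
        System.out.println(sparseMultiply(a, b));
    }
}
```

Getting the dense answer out of the sparse code requires tracking where the NaN and infinite entries are and visiting the union of those positions, which is the extra bitmask and extra scan described above.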


Re: I need a map for long and double

2013-11-06 Thread Ted Dunning
Serialization of primitive maps is easy to implement since the maps pretty
much just consist of a couple of arrays.  Most of the developers involved
will shy away from java serialization or any dependency on some other
framework.

So is that really a show stopper?
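
As a rough illustration of how little is involved, here is a minimal sketch that writes and reads the backing arrays of a String-to-long table with the JDK's `DataOutputStream`. The `StringLongTable` class and its layout (a key array with null for empty slots plus a parallel value array) are assumptions invented for this example; HPPC, fastutil and Mahout each have their own internal layouts.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

final class StringLongTable {
    final String[] keys;  // null marks an empty slot
    final long[] values;  // values[i] belongs to keys[i]

    StringLongTable(String[] keys, long[] values) {
        this.keys = keys;
        this.values = values;
    }

    /** Writes the slot count, then each slot as a presence flag plus key and value. */
    void writeTo(OutputStream out) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(keys.length);
        for (int i = 0; i < keys.length; i++) {
            data.writeBoolean(keys[i] != null);
            if (keys[i] != null) {
                data.writeUTF(keys[i]); // note: writeUTF is limited to 65535 encoded bytes
                data.writeLong(values[i]);
            }
        }
        data.flush();
    }

    /** Reads back exactly what writeTo produced, preserving slot positions. */
    static StringLongTable readFrom(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int capacity = data.readInt();
        String[] keys = new String[capacity];
        long[] values = new long[capacity];
        for (int i = 0; i < capacity; i++) {
            if (data.readBoolean()) {
                keys[i] = data.readUTF();
                values[i] = data.readLong();
            }
        }
        return new StringLongTable(keys, values);
    }

    /** Convenience for demos: serialize to memory and read it back. */
    static StringLongTable roundTrip(StringLongTable table) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            table.writeTo(buffer);
            return readFrom(new ByteArrayInputStream(buffer.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because slot positions are preserved, no rehashing is needed on read, provided the reader uses the same hash function as the writer.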



On Wed, Nov 6, 2013 at 6:11 AM, Gary Gregory garydgreg...@gmail.com wrote:

 On Tue, Nov 5, 2013 at 11:49 PM, Gary Gregory garydgreg...@gmail.com
 wrote:

  Thank you all for replying.
 
  HPPC looks promising and it's Apache 2 licensed. I'll give it a closer
  look.
 

 HPPC does not allow for serialization and even says so, odd. Now looking at
 fastutil...

 Gary


 
  Gary
 
 
  On Tue, Nov 5, 2013 at 8:59 PM, Ted Dunning ted.dunn...@gmail.com
 wrote:
 
  Trove is GPL (last I looked).
 
  Mahout has primitive collection implementations (and is obviously ASL).
 
  There are other implementations such as hppc (see
  http://labs.carrotsearch.com/hppc.html )
 
  Mahout is a decent implementation, but I think that hppc has had a round
  or
  two more optimization.
 
  And 150,000 entries in a table is not big for this sort of situation.
   Anything short of Integer.MAX_VALUE/small_factor should be fine.
 
 
 
 
  On Tue, Nov 5, 2013 at 5:49 PM, Bruno P. Kinoshita 
  brunodepau...@yahoo.com.br wrote:
 
   Maybe Trove's TObjectLongMap?
  
   [1]
  
 
 http://trove4j.sourceforge.net/javadocs/gnu/trove/map/TObjectLongMap.html
  
  
   HTH,
  
   Bruno P. Kinoshita
   http://kinoshita.eti.br
   http://tupilabs.com
  
  
   
From: Gary Gregory garydgreg...@gmail.com
   To: Commons Developers List dev@commons.apache.org
   Sent: Tuesday, November 5, 2013 11:39 PM
   Subject: I need a map for long and double
   
   
   Hi All:
   
   I'm looking for a Map implementation that takes a String as a key
 and a
   long as the value (and another taking a double as the value). I'd
  rather
   not take the extra memory of using generic map with a Long object
 value
   hit
   since the maps will have up to 150,000 entries. That would save
 me... a
   meg
   for each map I am guestimating (on a 64-bit JVM). A meg here, a meg
   there...
   
   I did not see anything in [collections] or Google Guava.
   
   Thoughts?
   
   Gary
   
   --
   E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
   Java Persistence with Hibernate, Second Edition
   http://www.manning.com/bauer3/
   JUnit in Action, Second Edition http://www.manning.com/tahchiev/
   Spring Batch in Action http://www.manning.com/templier/
   Blog: http://garygregory.wordpress.com
   Home: http://garygregory.com/
   Tweet! http://twitter.com/GaryGregory
   
   
   
  
 
 
 
 
  --
  E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
  Java Persistence with Hibernate, Second Edition
 http://www.manning.com/bauer3/
  JUnit in Action, Second Edition http://www.manning.com/tahchiev/
  Spring Batch in Action http://www.manning.com/templier/
 
  Blog: http://garygregory.wordpress.com
  Home: http://garygregory.com/
  Tweet! http://twitter.com/GaryGregory
 



 --
 E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
 Java Persistence with Hibernate, Second Edition
 http://www.manning.com/bauer3/
 JUnit in Action, Second Edition http://www.manning.com/tahchiev/
 Spring Batch in Action http://www.manning.com/templier/
 Blog: http://garygregory.wordpress.com
 Home: http://garygregory.com/
 Tweet! http://twitter.com/GaryGregory



Re: [math] Multithreaded performances

2013-11-05 Thread Ted Dunning
On Mon, Nov 4, 2013 at 10:09 PM, Romain Manni-Bucau
rmannibu...@gmail.com wrote:

 Oh sorry, that's what I said earlier: in a real app, no, or not enough to be an
 issue, but in simple apps or very high throughput apps, yes.
  On Nov 5, 2013 at 07:00, Ted Dunning ted.dunn...@gmail.com wrote:

  That isn't what I meant.
 
  Do you really think that more than one metric has to update (increment,
  say) at precisely the same time?
 


I realize that is what you said.  Do you have any serious examples where
metrics have to be updated all or nothing?


Re: I need a map for long and double

2013-11-05 Thread Ted Dunning
Trove is GPL (last I looked).

Mahout has primitive collection implementations (and is obviously ASL).

There are other implementations such as hppc (see
http://labs.carrotsearch.com/hppc.html )

Mahout is a decent implementation, but I think that hppc has had a round or
two more optimization.

And 150,000 entries in a table is not big for this sort of situation.
 Anything short of Integer.MAX_VALUE/small_factor should be fine.




On Tue, Nov 5, 2013 at 5:49 PM, Bruno P. Kinoshita 
brunodepau...@yahoo.com.br wrote:

 Maybe Trove's TObjectLongMap?

 [1]
 http://trove4j.sourceforge.net/javadocs/gnu/trove/map/TObjectLongMap.html


 HTH,

 Bruno P. Kinoshita
 http://kinoshita.eti.br
 http://tupilabs.com


 
  From: Gary Gregory garydgreg...@gmail.com
 To: Commons Developers List dev@commons.apache.org
 Sent: Tuesday, November 5, 2013 11:39 PM
 Subject: I need a map for long and double
 
 
 Hi All:
 
 I'm looking for a Map implementation that takes a String as a key and a
 long as the value (and another taking a double as the value). I'd rather
 not take the extra memory of using generic map with a Long object value
 hit
 since the maps will have up to 150,000 entries. That would save me... a
 meg
 for each map I am guestimating (on a 64-bit JVM). A meg here, a meg
 there...
 
 I did not see anything in [collections] or Google Guava.
 
 Thoughts?
 
 Gary
 
 --
 E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
 Java Persistence with Hibernate, Second Edition
 http://www.manning.com/bauer3/
 JUnit in Action, Second Edition http://www.manning.com/tahchiev/
 Spring Batch in Action http://www.manning.com/templier/
 Blog: http://garygregory.wordpress.com
 Home: http://garygregory.com/
 Tweet! http://twitter.com/GaryGregory
 
 
 



Re: [math] Multithreaded performances

2013-11-04 Thread Ted Dunning
My experience is that the only way to get really high performance with
counter-like objects is to have one per thread and combine them on read.
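
A minimal sketch of that idea, using only the JDK, is shown below. The class is hypothetical and is not what Sirona or [math] ship: each thread gets its own cell via a `ThreadLocal`, writes to it without contention, and readers sum all cells. The JDK 8 class `java.util.concurrent.atomic.LongAdder` applies a related striping idea for plain sums.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

final class PerThreadCounter {
    // Every cell ever handed out; readers iterate over this to combine.
    private final ConcurrentLinkedQueue<AtomicLong> cells = new ConcurrentLinkedQueue<>();

    // One cell per thread, registered on first use.
    private final ThreadLocal<AtomicLong> local = ThreadLocal.withInitial(() -> {
        AtomicLong cell = new AtomicLong();
        cells.add(cell);
        return cell;
    });

    /** Single writer per cell, so a plain read followed by an ordered store is enough. */
    void increment() {
        AtomicLong cell = local.get();
        cell.lazySet(cell.get() + 1);
    }

    /** Combine on read; the result is a moment-in-time approximation while writers run. */
    long sum() {
        long total = 0;
        for (AtomicLong cell : cells) {
            total += cell.get();
        }
        return total;
    }

    /** Demo helper: run the given number of threads and return the combined count. */
    static long runDemo(int threads, int incrementsPerThread) {
        PerThreadCounter counter = new PerThreadCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.increment();
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.sum();
    }
}
```

Note that this sketch never removes cells of finished threads, so in an environment with many short-lived threads the cell list grows; a real implementation would need some form of cleanup.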




On Mon, Nov 4, 2013 at 8:49 AM, Romain Manni-Bucau rmannibu...@gmail.com wrote:

 Hi,

 ATM sirona (a java monitoring library in incubator) relies a lot on
 the SummaryStatistics object from [math3], but it needed a lock to ensure
 consistency. I know there is a synchronized version but this one
 scales less well than the locked one.

 My question is quite simple then: will [math] add an implementation
 with a thread safety guarantee and good performance? I am thinking, for
 instance, of Doug Lea's LongAdder, which could be used as a good
 base.

 Romain Manni-Bucau
 Twitter: @rmannibucau
 Blog: http://rmannibucau.wordpress.com/
 LinkedIn: http://fr.linkedin.com/in/rmannibucau
 Github: https://github.com/rmannibucau





Re: [math] Multithreaded performances

2013-11-04 Thread Ted Dunning
I still think that what you need is a thread-safe copy rather than a
thread-safe mutate.  Even if you force every thread to do the copy, the
aggregation still wins on complexity/correctness/performance grounds.


On Mon, Nov 4, 2013 at 12:58 PM, Romain Manni-Bucau
rmannibu...@gmail.com wrote:

 In sirona we collect (aggregate) data every N ms and we can still use stats
 during aggregation (worst case surely)
 On Nov 4, 2013 at 21:48, Phil Steitz phil.ste...@gmail.com wrote:

  On 11/4/13 12:12 PM, Romain Manni-Bucau wrote:
   But aggregation needs to lock so not a real solution. Lock is fine on
  real
   cases but not in simple/light ones. ThreadLocal leaks...so a trade off
   should be found
 
  Depends on the use case.  If the use case is
 
  0) launch a bunch of threads and let them gather stats individually
  1) aggregate results
 
  Then the static aggregate method in AggregateSummaryStatistics that
  takes a collection as input will work with no locking required.
 
  Phil
   On Nov 4, 2013 at 18:42, Phil Steitz phil.ste...@gmail.com wrote:
  
   On 11/4/13 8:49 AM, Romain Manni-Bucau wrote:
   Hi,
  
   ATM sirona (a java monitoring library in incubator) relies a lot on
   Summary stats object from [math3] but it needed a lock to ensure
   consistency. I know there is a synchronized version but this one
   scales less then the locked one.
  
   My question is quite simple then: will [math] add an implementation
   with thread safety guarantee and good performances? I think for
   instance to the LongAdder of Doug Lea which could be used as a good
   base.
   The short answer is yes, patches welcome.
  
   Ted makes a good point, though; and there is already some support
   for aggregation in the stats classes in [math] (i.e., you can
   aggregate the results of per-thread stats by using, e.g.
   AggregateSummaryStatistics#aggregate).  See MATH-1016 re extending
   this to more stats.
  
   Phil
  
   Romain Manni-Bucau
   Twitter: @rmannibucau
   Blog: http://rmannibucau.wordpress.com/
   LinkedIn: http://fr.linkedin.com/in/rmannibucau
   Github: https://github.com/rmannibucau
  
  
  
  
  
  
 
 
 
 



Re: [math] Multithreaded performances

2013-11-04 Thread Ted Dunning
The copy doesn't have to lock if you build the right data structure.

The thread leak problem can be more serious.




On Mon, Nov 4, 2013 at 2:47 PM, Phil Steitz phil.ste...@gmail.com wrote:

 On 11/4/13 2:31 PM, Romain Manni-Bucau wrote:
  The copy will lock too.

 Right.  That is why I asked exactly how things work.  If you can't
 lock during aggregation, we need something different.

  And it doesnt solve leak issue of the one instance
  by thread solution, no?

 Correct, again depends on the setup how big a problem that is / what
 can be done to manage it.

 Phil
  On Nov 4, 2013 at 23:27, Phil Steitz phil.ste...@gmail.com wrote:
 
  On 11/4/13 2:22 PM, Ted Dunning wrote:
  I still think that what you need is a thread-safe copy rather than a
  thread-safe mutate.
  I was just thinking the same thing.  Patches welcome.
 
  Phil
Even if you force every thread to do the copy, the
  aggregation still still wins on complexity/correctness/performance
 ideas.
 
 
  On Mon, Nov 4, 2013 at 12:58 PM, Romain Manni-Bucau
  rmannibu...@gmail.comwrote:
 
  In sirona we collect (aggregate) data each N ms and we can still use
  stats
  during aggregation (worse case surely)
  On Nov 4, 2013 at 21:48, Phil Steitz phil.ste...@gmail.com wrote:
 
  On 11/4/13 12:12 PM, Romain Manni-Bucau wrote:
  But aggregation needs to lock so not a real solution. Lock is fine
 on
  real
  cases but not in simple/light ones. ThreadLocal leaks...so a trade
 off
  should be found
  Depends on the use case.  If the use case is
 
  0) launch a bunch of threads and let them gather stats individually
  1) aggregate results
 
  Then the static aggregate method in AggregateSummaryStatistics that
  takes a collection as input will work with no locking required.
 
  Phil
  On Nov 4, 2013 at 18:42, Phil Steitz phil.ste...@gmail.com wrote:
 
  On 11/4/13 8:49 AM, Romain Manni-Bucau wrote:
  Hi,
 
  ATM sirona (a java monitoring library in incubator) relies a lot
 on
  Summary stats object from [math3] but it needed a lock to ensure
  consistency. I know there is a synchronized version but this one
  scales less then the locked one.
 
  My question is quite simple then: will [math] add an
 implementation
  with thread safety guarantee and good performances? I think for
  instance to the LongAdder of Doug Lea which could be used as a
 good
  base.
  The short answer is yes, patches welcome.
 
  Ted makes a good point, though; and there is already some support
  for aggregation in the stats classes in [math] (i.e., you can
  aggregate the results of per-thread stats by using, e.g.
  AggregateSummaryStatistics#aggregate).  See MATH-1016 re extending
  this to more stats.
 
  Phil
 
  Romain Manni-Bucau
  Twitter: @rmannibucau
  Blog: http://rmannibucau.wordpress.com/
  LinkedIn: http://fr.linkedin.com/in/rmannibucau
  Github: https://github.com/rmannibucau
 
 
 
 
 
 
 
 
 
 
 
 






Re: [math] Multithreaded performances

2013-11-04 Thread Ted Dunning
On Mon, Nov 4, 2013 at 8:23 PM, Phil Steitz phil.ste...@gmail.com wrote:

 On 11/4/13 3:44 PM, Ted Dunning wrote:
  The copy doesn't have to lock if you build the right data structure.

 The individual stats objects need to update multiple quantities
 atomically when new values come in.  Consistency in the copy
 requires that you suppress updates while the copy is in progress
 unless you implement some kind of update queue internally.   What
 exactly do you mean by the right data structure?


I was talking about lockless data structures in general.

Are you sure that real transactions are a requirement here?
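
One well-known lockless pattern that still gives a consistent copy is to keep all related quantities in a single immutable object and replace it with a compare-and-set. The sketch below is only an illustration of that general idea, with invented class names; it is not the design of SummaryStatistics, and it trades locking for an allocation per update and possible retries under contention.

```java
import java.util.concurrent.atomic.AtomicReference;

final class SnapshotStats {

    /** Immutable bundle of quantities that must stay mutually consistent. */
    static final class Snapshot {
        final long n;
        final double sum;
        final double min;
        final double max;

        Snapshot(long n, double sum, double min, double max) {
            this.n = n;
            this.sum = sum;
            this.min = min;
            this.max = max;
        }

        double mean() {
            return n == 0 ? Double.NaN : sum / n;
        }
    }

    private final AtomicReference<Snapshot> state = new AtomicReference<>(
            new Snapshot(0, 0.0, Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY));

    /** Lock-free update: build the successor state and retry if another thread won the race. */
    void add(double x) {
        Snapshot old;
        Snapshot next;
        do {
            old = state.get();
            next = new Snapshot(old.n + 1, old.sum + x, Math.min(old.min, x), Math.max(old.max, x));
        } while (!state.compareAndSet(old, next));
    }

    /** The "copy" is a single volatile read; the returned object never changes afterwards. */
    Snapshot snapshot() {
        return state.get();
    }
}
```

Whether this beats per-thread accumulation depends on the update rate; under heavy contention the retries can dominate, which is one reason to question whether all metrics really need to move together.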


Re: [math] Multithreaded performances

2013-11-04 Thread Ted Dunning
That isn't what I meant.

Do you really think that more than one metric has to update (increment,
say) at precisely the same time?


On Mon, Nov 4, 2013 at 9:49 PM, Romain Manni-Bucau rmannibu...@gmail.com wrote:

 You can't stop the app because you take a snapshot of the monitoring metrics,
 so yes
 On Nov 5, 2013 at 06:46, Ted Dunning ted.dunn...@gmail.com wrote:

  On Mon, Nov 4, 2013 at 8:23 PM, Phil Steitz phil.ste...@gmail.com
 wrote:
 
   On 11/4/13 3:44 PM, Ted Dunning wrote:
The copy doesn't have to lock if you build the right data structure.
  
   The individual stats objects need to update multiple quantities
   atomically when new values come in.  Consistency in the copy
   requires that you suppress updates while the copy is in progress
   unless you implement some kind of update queue internally.   What
   exactly do you mean by the right data structure?
  
 
  I was talking about lockless data structures in general.
 
  Are you sure that real transactions are a requirement here?
 



Re: [MATH] Interest in large patches for small cleanup / performance changes?

2013-11-03 Thread Ted Dunning
On Sun, Nov 3, 2013 at 10:56 AM, Luc Maisonobe l...@spaceroots.org wrote:

  I had proposed that error messages be incrementally built from simple
  base patterns, to be assembled either at the point where the exception
  is going to be thrown or inside specific exceptions[2] (or a combination
  of both).

 It often doesn't work. Sentence constructions are completely different
 in different languages, and it is impossible to simply build up from
 elementary components that are individually translated and assembled
 later. See all the documentation about the ancient gettext for example.


Modern printf implementations deal with this by numbered arguments.  This
is not a problem any more.

See http://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html#syntax
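
For example, `java.util.Formatter` (and therefore `String.format`) accepts explicit argument indexes such as `%1$d` and `%2$d`, so a translated pattern can reorder the values without any change in the calling code. The patterns below are invented for illustration and are not taken from LocalizedFormats.

```java
import java.util.Locale;

final class NumberedArgsDemo {

    /** Formats args with a pattern that may reference them in any order via %n$ indexes. */
    static String render(String pattern, Object... args) {
        return String.format(Locale.ROOT, pattern, args);
    }

    public static void main(String[] args) {
        // Same arguments, same call site; only the (translatable) pattern differs.
        System.out.println(render("%1$d exceeds %2$d", 7, 5));
        System.out.println(render("%2$d is exceeded by %1$d", 7, 5));
    }
}
```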


Re: [MATH] Interest in large patches for small cleanup / performance changes?

2013-11-02 Thread Ted Dunning
How many of these actually matter any more?


On Sat, Nov 2, 2013 at 7:52 AM, Sean Owen sro...@apache.org wrote:

 In Math, is there any appetite for large patches containing many
 instances of particular micro-optimizations? Examples:

 - Replace:
 a[i][j] = a[i][j] + foo;
   with:
 a[i][j] += foo;
   … which is faster/leaner in the byte code by a little bit. It might
 make a difference in many nested, tight loops.


Does this actually matter after the JIT takes hold?  And if the JIT doesn't
care to optimize this away, does it even matter?



 - Inefficient toArray() calls with 0-length arg
 - Using Map.entrySet() instead of keySet() + get()s


I think that this actually really does matter, but escape analysis has
gotten dramatically better lately and may make the associated object
creation much less of an issue.


 - Unnecessarily non-static private methods/classes


This is stylistic and important.


 - StringBuffer vs StringBuilder


I know for a fact that escape analysis in recent JVMs gets rid of the
locks in most StringBuffer idioms and this just doesn't matter any more.


Re: [Math] due-to attribute in changes.xml

2013-10-31 Thread Ted Dunning
On Thu, Oct 31, 2013 at 10:24 AM, Gilles gil...@harfang.homelinux.org wrote:


 The person who raised the bug still took the trouble to do so.


 My question is still: is it sufficient?
 Without filing a bug report, the reporter is harming himself.

 Also, some reports are only feature requests. I deem it quite unfair that
 the release notes would contain lines such as
  * MATH-123456789: Algorithm Xxx implemented. Thanks to reporter.


How is it controversial to say thank you for contributions?  The report is
a contribution and being nice could encourage more contributions.

Being all officious about what suffices to be worthy enough to make the oh
so mighty gatekeepers be generous is a great way to turn people off.


Re: [MATH] Repurposing a deprecated constructor in EigenDecomposition

2013-10-23 Thread Ted Dunning
On Wed, Oct 23, 2013 at 3:14 PM, Sean Owen sro...@gmail.com wrote:

 EigenDecomposition resembles QR in this respect, as far as they are
 implemented here. This argues for them to treat arguments similarly.


Actually not.  It is quite reasonable for the EigenDecomposition to stop
when singularity is reached.  This affects the shape of the eigenvector
matrix.

Perhaps add a new constructor with a double tolerance and a boolean that
says to stop early.  QR is subject to the same logic since partial QR is
often more useful than full QR with singular R.  This is the same logic as
with Cholesky since QR and Cholesky are two sides of the same coin in many
respects.
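
To show what "a tolerance plus a stop-early flag" could look like, here is a rough sketch based on modified Gram-Schmidt orthogonalization in plain Java. It is an assumption-laden illustration, not the Commons Math QRDecomposition (which uses Householder reflections), and the behaviour when `stopEarly` is false (skip the dependent column and continue) is just one possible choice.

```java
import java.util.ArrayList;
import java.util.List;

final class PartialQrSketch {

    /**
     * Orthonormalizes the columns of a, left to right. A column whose residual norm is
     * at or below tolerance is treated as linearly dependent on the earlier ones:
     * with stopEarly the process ends there, otherwise the column is skipped.
     * Each returned row is one orthonormal vector (that is, a column of Q).
     */
    static double[][] orthonormalBasis(double[][] a, double tolerance, boolean stopEarly) {
        int rows = a.length;
        int cols = a[0].length;
        List<double[]> q = new ArrayList<>();
        for (int j = 0; j < cols; j++) {
            double[] v = new double[rows];
            for (int i = 0; i < rows; i++) {
                v[i] = a[i][j];
            }
            // Remove the components along the vectors found so far.
            for (double[] qk : q) {
                double dot = 0;
                for (int i = 0; i < rows; i++) {
                    dot += qk[i] * v[i];
                }
                for (int i = 0; i < rows; i++) {
                    v[i] -= dot * qk[i];
                }
            }
            double norm = 0;
            for (int i = 0; i < rows; i++) {
                norm += v[i] * v[i];
            }
            norm = Math.sqrt(norm);
            if (norm <= tolerance) {
                if (stopEarly) {
                    break; // singularity reached: return the partial factorization
                }
                continue;  // skip the dependent column and keep going
            }
            for (int i = 0; i < rows; i++) {
                v[i] /= norm;
            }
            q.add(v);
        }
        return q.toArray(new double[0][]);
    }
}
```

The number of vectors returned with `stopEarly` set tells the caller how many leading columns were independent, which is the "shape" information a partial decomposition exposes.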


Re: [MATH] Repurposing a deprecated constructor in EigenDecomposition

2013-10-23 Thread Ted Dunning
On Wed, Oct 23, 2013 at 8:33 PM, Sean Owen sro...@gmail.com wrote:

 it feels a little funny just
 because then we should have similar logic for other decompositions. I
 think I remember the LU one stops early, always.


The stopping early is definitely an option with QR.  With LU, it isn't so
clear.


Re: svn commit: r1533990 - /commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java

2013-10-21 Thread Ted Dunning
Thread issue.  Off topic for this thread.  No idea how this happened.


On Mon, Oct 21, 2013 at 3:25 PM, Phil Steitz phil.ste...@gmail.com wrote:

 Was this maybe to the wrong thread, or is there a doco issue here?

 Phil

  On Oct 20, 2013, at 10:42 PM, Ted Dunning ted.dunn...@gmail.com wrote:
 
  This makes it somewhat harder to read the docs code which is where I
 read docs 90+% of the time.
 
  On the other hand my IDE will do the right thing if I ask it to.
 
  Sent from my iPhone
 
  On Oct 20, 2013, at 14:27, Thomas Neidhart thomas.neidh...@gmail.com
 wrote:
 
  On 10/20/2013 11:24 PM, t...@apache.org wrote:
  Author: tn
  Date: Sun Oct 20 21:24:45 2013
  New Revision: 1533990
 
  URL: http://svn.apache.org/r1533990
  Log:
  [MATH-1039] Avoid code duplication by calling logDensity itself.
 
  Modified:
 
 commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java
 
  Modified:
 commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java
  URL:
 http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java?rev=1533990&r1=1533989&r2=1533990&view=diff
 
 ==
  ---
 commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java
 (original)
  +++
 commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java
 Sun Oct 20 21:24:45 2013
  @@ -136,24 +136,8 @@ public class BetaDistribution extends Ab

       /** {@inheritDoc} */
       public double density(double x) {
  -        recomputeZ();
  -        if (x < 0 || x > 1) {
  -            return 0;
  -        } else if (x == 0) {
  -            if (alpha < 1) {
  -                throw new NumberIsTooSmallException(LocalizedFormats.CANNOT_COMPUTE_BETA_DENSITY_AT_0_FOR_SOME_ALPHA, alpha, 1, false);
  -            }
  -            return 0;
  -        } else if (x == 1) {
  -            if (beta < 1) {
  -                throw new NumberIsTooSmallException(LocalizedFormats.CANNOT_COMPUTE_BETA_DENSITY_AT_1_FOR_SOME_BETA, beta, 1, false);
  -            }
  -            return 0;
  -        } else {
  -            double logX = FastMath.log(x);
  -            double log1mX = FastMath.log1p(-x);
  -            return FastMath.exp((alpha - 1) * logX + (beta - 1) * log1mX - z);
  -        }
  +        final double logDensity = logDensity(x);
  +        return logDensity == Double.NEGATIVE_INFINITY ? 0 : FastMath.exp(logDensity);
       }

       /** {@inheritDoc} **/
 
  I did this change for one class, but I propose to do this whereever
  applicable to avoid code duplication in the distribution classes.
 
  WDYT?
 
  Thomas
 
 
 





Re: [math] MathIllegalArgumentException

2013-10-21 Thread Ted Dunning
+1

The overwhelming standard practice is to use a plausible exception type
(such as some form of IllegalArgumentException) with a message.
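
As a small illustration of that practice, the check quoted below could be written as follows. This is a sketch with an invented helper class and invented message wording, using the JDK's `IllegalArgumentException` in place of MathIllegalArgumentException so that it is self-contained.

```java
final class ArgumentChecks {

    /** Validates a successes/trials pair; the message says which value is wrong and why. */
    static void checkSuccesses(int numberOfSuccesses, int numberOfTrials) {
        if (numberOfTrials <= 0) {
            throw new IllegalArgumentException(
                    "number of trials (" + numberOfTrials + ") must be positive");
        }
        if (numberOfSuccesses < 0) {
            throw new IllegalArgumentException(
                    "number of successes (" + numberOfSuccesses + ") must not be negative");
        }
        if (numberOfSuccesses > numberOfTrials) {
            throw new IllegalArgumentException(
                    "number of successes (" + numberOfSuccesses
                    + ") must not exceed number of trials (" + numberOfTrials + ")");
        }
    }
}
```

The caller catches one well-known exception type, and the message carries the context that a name like NumberIsTooLarge cannot.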




On Mon, Oct 21, 2013 at 5:24 PM, Phil Steitz phil.ste...@gmail.com wrote:

 I hate to open this can of worms again, but the following is just
 too painful for me to ignore.  From recent mods to
 BinomialConfidenceInterval javadoc:

  * @throws NumberIsTooLargeException if {@code numberOfSuccesses >
  numberOfTrials}.

 The NumberIsTooLarge exception adds exactly zero to what would be
 more natural - just throw MathIAE.  Fortunately, the message is at
 least still there in the code:

 if (numberOfSuccesses > numberOfTrials) {
     throw new NumberIsTooLargeException(
         LocalizedFormats.NUMBER_OF_SUCCESS_LARGER_THAN_POPULATION_SIZE,
         numberOfSuccesses, numberOfTrials, true);
 }

 The NumberIsTooLarge is ridiculous.  What number?  Why isn't the
 second number too small?  If we really are going to insist on
 defining and advertising lots of little subexceptions to MathIAE, we
 need to define appropriate ones, or just leave MathIAE.  My vote is
 to just allow throwing MathIAE with a descriptive message.  If we
 insist on adding subexceptions for everything, in this case, we need
 something like

 SubsetSizeException

 and in another set of changes that I am about to commit that will
 end up similarly mangled,  I will need

 InsufficientDataException

 I would like to get full community input on this topic for once and
 for all and either add a slew of new exceptions so what we throw is
 meaningful in the context of the caller, or just allow MathIAE to be
 thrown directly.

 So please all be brief and specify your preference for one of the
 two options below:

 0) allow MathIAE to be thrown directly with an informative message

 1) define caller-meaningful exceptions for situations such as
 insufficient data, invalid subset size, invalid probability, invalid
 interval, ...

 I would much prefer 0), but if consensus is 1), I will start adding
 exceptions so what we throw is meaningful and open a ticket to clean
 up for 4.0.

 Phil





Re: svn commit: r1533990 - /commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java

2013-10-20 Thread Ted Dunning
This makes it somewhat harder to read the docs in the code, which is where I
read docs 90+% of the time.

On the other hand my IDE will do the right thing if I ask it to. 

Sent from my iPhone

On Oct 20, 2013, at 14:27, Thomas Neidhart thomas.neidh...@gmail.com wrote:

 On 10/20/2013 11:24 PM, t...@apache.org wrote:
 Author: tn
 Date: Sun Oct 20 21:24:45 2013
 New Revision: 1533990
 
 URL: http://svn.apache.org/r1533990
 Log:
 [MATH-1039] Avoid code duplication by calling logDensity itself.
 
 Modified:

 commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java
 
 Modified: 
 commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java
 URL: 
 http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java?rev=1533990&r1=1533989&r2=1533990&view=diff
 ==
 --- 
 commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java
  (original)
 +++ 
 commons/proper/math/trunk/src/main/java/org/apache/commons/math3/distribution/BetaDistribution.java
  Sun Oct 20 21:24:45 2013
 @@ -136,24 +136,8 @@ public class BetaDistribution extends Ab
 
 /** {@inheritDoc} */
 public double density(double x) {
 -recomputeZ();
 -if (x < 0 || x > 1) {
 -return 0;
 -} else if (x == 0) {
 -if (alpha < 1) {
 -throw new 
 NumberIsTooSmallException(LocalizedFormats.CANNOT_COMPUTE_BETA_DENSITY_AT_0_FOR_SOME_ALPHA,
  alpha, 1, false);
 -}
 -return 0;
 -} else if (x == 1) {
 -if (beta < 1) {
 -throw new 
 NumberIsTooSmallException(LocalizedFormats.CANNOT_COMPUTE_BETA_DENSITY_AT_1_FOR_SOME_BETA,
  beta, 1, false);
 -}
 -return 0;
 -} else {
 -double logX = FastMath.log(x);
 -double log1mX = FastMath.log1p(-x);
 -return FastMath.exp((alpha - 1) * logX + (beta - 1) * log1mX - 
 z);
 -}
 +final double logDensity = logDensity(x);
 +return logDensity == Double.NEGATIVE_INFINITY ? 0 : 
 FastMath.exp(logDensity);
 }
 
 /** {@inheritDoc} **/
 
 I did this change for one class, but I propose to do this wherever
 applicable to avoid code duplication in the distribution classes.
 
 WDYT?
 
 Thomas
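
A self-contained sketch of the same delegation pattern follows. It uses a
hypothetical exponential distribution rather than BetaDistribution, with
java.lang.Math standing in for FastMath, so every name in it is an assumption
made for illustration only:

```java
/** Hypothetical distribution used only to illustrate density() delegating to logDensity(). */
final class ExponentialSketch {
    private final double rate;

    ExponentialSketch(double rate) {
        if (!(rate > 0)) {
            throw new IllegalArgumentException("rate must be positive, got " + rate);
        }
        this.rate = rate;
    }

    /** The support check lives here only: -infinity outside the support. */
    double logDensity(double x) {
        return x < 0 ? Double.NEGATIVE_INFINITY : Math.log(rate) - rate * x;
    }

    /** Reuses logDensity instead of repeating the support checks. */
    double density(double x) {
        final double logDensity = logDensity(x);
        return logDensity == Double.NEGATIVE_INFINITY ? 0 : Math.exp(logDensity);
    }

    public static void main(String[] args) {
        ExponentialSketch d = new ExponentialSketch(2.0);
        System.out.println(d.density(-1.0)); // outside the support
        System.out.println(d.density(0.5));  // inside the support
    }
}
```

The boundary handling then has a single home, which is the duplication the
commit is trying to remove.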
 



Re: [OT][LANG] Blog post about Validate vs. Guava Preconditions

2013-10-18 Thread Ted Dunning
In general, it is going to be very, very hard for Commons to go up against
guava. The Preconditions stuff is only the tip of the iceberg. The
advantages highlighted in the blog are typical of every aspect of guava ...
well thought out (the different exception types and varargs for instance)
and absolutely no apologies for requiring recent Java versions.

To actually match the quality of guava, Commons would have to stop worrying
about minutiae like whether or not there is a Validate.isNotEmpty and start
pushing hard and fast against the real issues.




On Fri, Oct 18, 2013 at 8:20 AM, Benedikt Ritter brit...@apache.org wrote:

 Hi,

 this came in via twitter:

 http://piotrjagielski.com/blog/google-guava-vs-apache-commons-for-argument-validation/

 What do we do, to win the next contest? :-)

 Benedikt


 --
 http://people.apache.org/~britter/
 http://www.systemoutprintln.de/
 http://twitter.com/BenediktRitter
 http://github.com/britter



Re: [math] Add Pair factory method, toString(), Comparator

2013-10-17 Thread Ted Dunning
On Thu, Oct 17, 2013 at 2:06 PM, Gilles gil...@harfang.homelinux.org wrote:

 The issue is closed, thank you. To be honest I'm sorry I opened this
 issue, as it wasn't worth this much time or annoyance.


 If the regular contributors were thinking that way, no work would be done.
 There wouldn't be a project where people discuss just like we did.


Gilles,

If it weren't this way, there would be more regular contributors.


Re: [CHALLENGE] Move All of Commons to the Dormant

2013-10-16 Thread Ted Dunning
Careful there.  Hen might suggest making that list dormant.  

Sent from my iPhone

On Oct 16, 2013, at 0:38, Jörg Schaible joerg.schai...@gmx.de wrote:

 BTW: We have already a challenge result, it's just terribly out of date:
 https://wiki.apache.org/commons/CommonsPeople




Re: [math] Add Pair factory method, toString(), Comparator

2013-10-16 Thread Ted Dunning
Does this really add comparisons on average?  Or does it only add
comparisons on key equality?  If the latter the difference is definitely
minute.

Secondly, changing the comparator to include the value changes how sets work.
 Usually this is good; occasionally it is bad.  In any case, the change is
rather subtle, and it is difficult to determine whether it has an impact.
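
Both points can be seen in a small sketch. It uses Map.Entry as a stand-in for
the CM Pair class, and the comparator names are hypothetical. The extra value
comparison runs only when the keys tie, and a TreeSet built on the two
orderings keeps a different number of elements:

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeSet;

/** Illustrative only: Map.Entry stands in for Pair. */
final class PairOrdering {
    /** Orders by key only; entries with equal keys compare as equal. */
    static final Comparator<Map.Entry<Integer, String>> BY_KEY =
        (a, b) -> Integer.compare(a.getKey(), b.getKey());

    /** Orders by key, then by value; the value is compared only on a key tie. */
    static final Comparator<Map.Entry<Integer, String>> BY_KEY_THEN_VALUE =
        (a, b) -> {
            int c = Integer.compare(a.getKey(), b.getKey());
            return c != 0 ? c : a.getValue().compareTo(b.getValue());
        };

    /** Adds three entries, two of which share a key, and reports the set size. */
    static int distinctCount(Comparator<Map.Entry<Integer, String>> order) {
        TreeSet<Map.Entry<Integer, String>> set = new TreeSet<>(order);
        set.add(new SimpleImmutableEntry<>(1, "a"));
        set.add(new SimpleImmutableEntry<>(1, "b"));
        set.add(new SimpleImmutableEntry<>(2, "c"));
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCount(BY_KEY));            // 2: (1, "b") is treated as a duplicate
        System.out.println(distinctCount(BY_KEY_THEN_VALUE)); // 3: all entries are kept
    }
}
```

Code that relied on the key-only ordering to drop entries with duplicate keys
would silently behave differently after such a change.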





On Wed, Oct 16, 2013 at 10:10 PM, Sean Owen sro...@gmail.com wrote:

 You are right that it adds 1 or 2 more branches per comparison. The new
 Comparator would at least be consistent with equals(), though it probably
 doesn't matter for correctness in practice.

 I am interested in closing this minor issue so I suggest you ignore this
 part if you guess that this overhead is too much, and that it's not worth
 offering this for other callers of CM. I'll just maintain my own copy.


 On Wed, Oct 16, 2013 at 9:56 PM, Gilles gil...@harfang.homelinux.org
 wrote:
 
 
  The potential problem is performance. The current code for sortInPlace
 is
  not as fast as it could be, very probably because it uses a Comparator.
  IIUC,
  using your new comparator will add another if (to test whether
 comparing
  the
  values is necessary). [And reverseOrder will also add a few
 operations
  on
  its own, I guess.]
  It would be nice to know whether the impact is really fairly negligible,
 or
  not.
 
  There is a class (in the test part of repository) for performing simple
  benchmarks:
org.apache.commons.math3.PerfTestUtils
  that tries to provide fair comparison results by interleaving calls to
 two
  (or more) alternative codes.
 
 
  Best regards,
 
  Gilles
 
 
 
 



Re: [VOTE] Move Apache Commons to Git for SCM... - is not a consensus

2013-10-13 Thread Ted Dunning
Ralph,

Majority votes at ASF almost never require a majority of all possible
voters.  Almost always the (plus >= 3 && plus > minus) convention is used.

As you can find in innumerable threads as well, consensus among the
discussion participants is preferable for big changes (like moving to git).
 Consensus does not depend on the potential number of voters.

In fact, virtually nothing depends on a quorum at ASF other than member
votes.
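
The two rules discussed in this thread can be written down as a small sketch.
The class and method names are hypothetical; the thresholds are the ones
quoted in the thread (at least three +1 votes and more +1s than -1s for a
majority vote; at least three binding +1 votes and no vetoes for consensus
approval). Note that neither check takes the number of eligible voters as an
input:

```java
/** Hypothetical helper; encodes only the thresholds quoted in this thread. */
final class VoteTally {
    private VoteTally() {}

    /** Majority rule: at least three +1 votes and more +1s than -1s. */
    static boolean passesMajority(int plus, int minus) {
        return plus >= 3 && plus > minus;
    }

    /** Consensus approval: at least three binding +1 votes and no vetoes. */
    static boolean passesConsensus(int bindingPlus, int vetoes) {
        return bindingPlus >= 3 && vetoes == 0;
    }

    public static void main(String[] args) {
        // Example counts, made up for illustration.
        System.out.println(passesMajority(10, 4));  // true
        System.out.println(passesConsensus(10, 4)); // false: vetoes were cast
    }
}
```

A vote can therefore pass as a majority vote and still fail to be a consensus.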

That said, this vote may well be a small victory that causes a larger problem.
 The hard question here is whether it is better to pause here in order to
make faster progress.  Phil's point is a bit out of order ... if he had
responded to the request for votes with his statement that the vote was
premature, it would have been much better.  To wait until after the vote
has been lost and then claim that more discussion is needed is a bit of a
problem, at least from the point of view of appearance.

One very confusing procedural point is that half-way through the vote, the
subject line reverted to [DISCUSS] rather than [VOTE].

See
http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3CCALznzY4v1bPGrMotJkmSN8wp9hSjs8mMjSj89wfzBEgimhtxrw%40mail.gmail.com%3E

This is the point at which Phil first commented.

On the other hand, Phil also commented on the thread with the [VOTE]
subject a number of times:

http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3ca9d202a4-6e76-42d8-9606-1e40d6916...@gmail.com%3E

http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c08688247-b00e-44c7-8b21-f107921b4...@gmail.com%3E

http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c5256ff12.3070...@gmail.com%3E

http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c110b24a9-dd67-436d-9e2d-e29521693...@gmail.com%3E

http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c110b24a9-dd67-436d-9e2d-e29521693...@gmail.com%3E

In none of these did he say that the vote was premature.





On Sun, Oct 13, 2013 at 11:11 PM, Ralph Goers ralph.go...@dslextreme.com wrote:

 Actually, if you read Roy's post from a few days ago on Incubator General
 you will find that consensus is != to majority or unanimity.  See
 http://mail-archives.apache.org/mod_mbox/incubator-general/201310.mbox/ajax/%3CC2FDB244-459D-4EC4-954A-7A7F6C4B179B%40gbiv.com%3E
 from which I quote below:

 Consensus is that everyone who shares an opinion agrees to a common
 resolution (even if they do not personally prefer that resolution).
 Unanimity means that everyone present agrees (for a PMC discussing things
 in email, that means everyone listed on the roster must affirmatively
 agree).

 Hence, consensus decisions can be vetoed, as is clearly stated in the HTTP
 Server Project Guidelines, unless the project has decided to adopt some
 other set of bylaws.
 As I understand this, consensus means that a majority must vote and there
 must not be any -1 votes among those who voted.  Unanimity means everyone
 must vote and no one must vote -1. Of course, majority means there must be
 at least three +1 votes and more +1s than -1s.

 Notice that http://httpd.apache.org/dev/guidelines.html specifically says
 "An action item requiring consensus approval must receive at least 3
 binding +1 votes and no vetoes."  However, I don't see any guidance on the
 httpd page that would indicate whether this vote requires a consensus or a
 majority. One could certainly argue that deciding to move from svn to git
 is procedural and thus only requires a majority, however I tend to
 believe that consensus would be what would be preferred for this vote.

 Ralph


 On Oct 13, 2013, at 1:52 PM, James Carman wrote:

  Phil,
 
  While I appreciate your concerns, the vote is a valid vote:
 
  Votes on procedural issues follow the common format of majority rule
  unless otherwise stated. That is, if there are more favourable votes
  than unfavourable ones, the issue is considered to have passed --
  regardless of the number of votes in each category. (If the number of
  votes seems too small to be representative of a community consensus,
  the issue is typically not pursued. However, see the description of
  lazy consensus for a modifying factor.)
 
  I got this information from:
 
  http://www.apache.org/foundation/voting.html
 
  We definitely have enough people voting to be considered a consensus
  (consensus != unanimous).
 
  However, we will not move forward with the Git move if we don't have
  any luck with our test component (different thread).  If we see the
  test component isn't working out well, then we can just decide (or
  vote again) to scrap the idea and move on.  Hopefully that addresses
  your concerns.
 
  Thanks,
 
  James
 
  On Sun, Oct 13, 2013 at 3:47 PM, Phil Steitz phil.ste...@gmail.com
 wrote:
  On 10/13/13 8:09 AM, James Carman wrote:
  Well, it has been 72 hours, so let's tally up the votes.  As I see it
  (counting votes on both lists):
 
  +1s
  James Carman
  Romain Manni-Bucau

Re: [VOTE] Move Apache Commons to Git for SCM... - is not a consensus

2013-10-13 Thread Ted Dunning
James,

You succeeded in creating a second thread.

It is the first thread that had a reverted subject line.  Ironically, it
was one of your posts that reverted the subject line ... likely related to
the confusion you had in the first place with gmail.

Check the archives.  They show the subject lines.


On Mon, Oct 14, 2013 at 12:07 AM, James Carman
ja...@carmanconsulting.com wrote:

 There were two threads.  As I explained, the first two DISCUSSION/VOTE
 threads were getting mingled together in gmail, so I started another thread
 for the VOTE hoping to avoid confusion (apparently I failed in that).



 On Sunday, October 13, 2013, Ted Dunning wrote:

  Ralph,
 
  Majority votes at ASF almost never require a majority of all possible
  voters.  Almost always the (plus >= 3 && plus > minus) convention is used.
 
  As you can find in innumerable threads as well, consensus among the
  discussion participants is preferable for big changes (like moving to
 git).
   Consensus does not depend on the potential number of voters.
 
  In fact, virtually nothing depends on a quorum at ASF other than member
  votes.
 
  That said, this vote may well be a small victory that causes a larger
 problem.
   The hard question here is whether it is better to pause here in order to
  make faster progress.  Phil's point is a bit out of order ... if he had
  responded to the request for votes with his statement that the vote was
  premature, it would have been much better.  To wait until after the vote
  has been lost and then claim that more discussion is needed is a bit of a
  problem, at least from the point of view of appearance.
 
  One very confusing procedural point is that half-way through the vote,
 the
  subject line reverted to [DISCUSS] rather than [VOTE].
 
  See
 
 
 http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3CCALznzY4v1bPGrMotJkmSN8wp9hSjs8mMjSj89wfzBEgimhtxrw%40mail.gmail.com%3E
 
  This is the point at which Phil first commented.
 
  On the other hand, Phil also commented on the thread with the [VOTE]
  subject a number of times:
 
 
 
 http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3ca9d202a4-6e76-42d8-9606-1e40d6916...@gmail.com%3E
 
 
 
 http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c08688247-b00e-44c7-8b21-f107921b4...@gmail.com%3E
 
 
 
 http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c5256ff12.3070...@gmail.com%3E
 
 
 
 http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c110b24a9-dd67-436d-9e2d-e29521693...@gmail.com%3E
 
 
 
 http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c110b24a9-dd67-436d-9e2d-e29521693...@gmail.com%3E
 
  In none of these did he say that the vote was premature.
 
 
 
 
 
  On Sun, Oct 13, 2013 at 11:11 PM, Ralph Goers 
 ralph.go...@dslextreme.com
  wrote:
 
   Actually, if you read Roy's post from a few days ago on Incubator
 General
   you will find that consensus is != to majority or unanimity.  See
  
 
 http://mail-archives.apache.org/mod_mbox/incubator-general/201310.mbox/ajax/%3CC2FDB244-459D-4EC4-954A-7A7F6C4B179B%40gbiv.com%3E
   from which I quote below:
  
   Consensus is that everyone who shares an opinion agrees to a common
   resolution (even if they do not personally prefer that resolution).
   Unanimity means that everyone present agrees (for a PMC discussing
 things
   in email, that means everyone listed on the roster must affirmatively
   agree).
  
   Hence, consensus decisions can be vetoed, as is clearly stated in the
  HTTP
   Server Project Guidelines, unless the project has decided to adopt some
   other set of bylaws.
   As I understand this, consensus means that a majority must vote and
 there
   must not be any -1 votes among those who voted.  Unanimity means
 everyone
   must vote and no one must vote -1. Of course, majority means there must
  be
   at least three +1 votes and more +1s than -1s.
  
   Notice that http://httpd.apache.org/dev/guidelines.html specifically
  says
   An action item requiring consensus approval must receive at least 3
   binding +1 votes and no vetoes.,  However, I don't see any guidance on
  the
   httpd page that would indicate whether this vote requires a consensus
 or
  a
   majority. One could certainly argue that deciding to move from svn to
 git
   is procedural and thus only requires a majority, however I tend to
   believe that consensus would be what would be preferred for this vote.
  
   Ralph
  
  
   On Oct 13, 2013, at 1:52 PM, James Carman wrote:
  
Phil,
   
While I appreciate your concerns, the vote is a valid vote:
   
Votes on procedural issues follow the common format of majority rule
unless otherwise stated. That is, if there are more favourable votes
than unfavourable ones, the issue is considered to have passed --
regardless of the number of votes in each category. (If the number of
votes seems too small to be representative of a community consensus,
the issue is typically

Re: [VOTE] Move Apache Commons to Git for SCM... - is not a consensus

2013-10-13 Thread Ted Dunning
Ralph,

I completely agree that this vote wasn't consensus.

But where you say

As I understand this, consensus means that a majority must vote and there
 must not be any -1 votes among those who voted.


I disagree.  The only quorum typically required for ASF consensus votes is
3 +1's, not a majority of possible voters.




On Mon, Oct 14, 2013 at 2:15 AM, Ralph Goers ralph.go...@dslextreme.com wrote:

 Please re-read my message. James stated  We definitely have enough people
 voting to be considered a consensus (consensus != unanimous).  My point
 was to quote what Roy posted a few days ago that said while consensus isn't
 unanimous it also isn't the simple majority vote either, so to state that
 consensus was reached is incorrect because there were several -1 votes.

 Ralph

 On Oct 13, 2013, at 3:51 PM, Ted Dunning wrote:

  Ralph,
 
  Majority votes at ASF almost never require a majority of all possible
  voters.  Almost always the (plus >= 3 && plus > minus) convention is used.
 
  As you can find in innumerable threads as well, consensus among the
  discussion participants is preferable for big changes (like moving to
 git).
  Consensus does not depend on the potential number of voters.
 
  In fact, virtually nothing depends on a quorum at ASF other than member
  votes.
 
  That said, this vote may well be a small victory that causes a larger
 problem.
  The hard question here is whether it is better to pause here in order to
  make faster progress.  Phil's point is a bit out of order ... if he had
  responded to the request for votes with his statement that the vote was
  premature, it would have been much better.  To wait until after the vote
  has been lost and then claim that more discussion is needed is a bit of a
  problem, at least from the point of view of appearance.
 
  One very confusing procedural point is that half-way through the vote,
 the
  subject line reverted to [DISCUSS] rather than [VOTE].
 
  See
 
 http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3CCALznzY4v1bPGrMotJkmSN8wp9hSjs8mMjSj89wfzBEgimhtxrw%40mail.gmail.com%3E
 
  This is the point at which Phil first commented.
 
  On the other hand, Phil also commented on the thread with the [VOTE]
  subject a number of times:
 
 
 http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3ca9d202a4-6e76-42d8-9606-1e40d6916...@gmail.com%3E
 
 
 http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c08688247-b00e-44c7-8b21-f107921b4...@gmail.com%3E
 
 
 http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c5256ff12.3070...@gmail.com%3E
 
 
 http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c110b24a9-dd67-436d-9e2d-e29521693...@gmail.com%3E
 
 
 http://mail-archives.apache.org/mod_mbox/commons-dev/201310.mbox/%3c110b24a9-dd67-436d-9e2d-e29521693...@gmail.com%3E
 
  In none of these did he say that the vote was premature.
 
 
 
 
 
  On Sun, Oct 13, 2013 at 11:11 PM, Ralph Goers 
 ralph.go...@dslextreme.com wrote:
 
  Actually, if you read Roy's post from a few days ago on Incubator
 General
  you will find that consensus is != to majority or unanimity.  See
 
 http://mail-archives.apache.org/mod_mbox/incubator-general/201310.mbox/ajax/%3CC2FDB244-459D-4EC4-954A-7A7F6C4B179B%40gbiv.com%3E
  from which I quote below:
 
  Consensus is that everyone who shares an opinion agrees to a common
  resolution (even if they do not personally prefer that resolution).
  Unanimity means that everyone present agrees (for a PMC discussing
 things
  in email, that means everyone listed on the roster must affirmatively
  agree).
 
  Hence, consensus decisions can be vetoed, as is clearly stated in the
 HTTP
  Server Project Guidelines, unless the project has decided to adopt some
  other set of bylaws.
  As I understand this, consensus means that a majority must vote and
 there
  must not be any -1 votes among those who voted.  Unanimity means
 everyone
  must vote and no one must vote -1. Of course, majority means there must
 be
  at least three +1 votes and more +1s than -1s.
 
  Notice that http://httpd.apache.org/dev/guidelines.html specifically
 says
  An action item requiring consensus approval must receive at least 3
  binding +1 votes and no vetoes.,  However, I don't see any guidance on
 the
  httpd page that would indicate whether this vote requires a consensus
 or a
  majority. One could certainly argue that deciding to move from svn to
 git
  is procedural and thus only requires a majority, however I tend to
  believe that consensus would be what would be preferred for this vote.
 
  Ralph
 
 
  On Oct 13, 2013, at 1:52 PM, James Carman wrote:
 
  Phil,
 
  While I appreciate your concerns, the vote is a valid vote:
 
  Votes on procedural issues follow the common format of majority rule
  unless otherwise stated. That is, if there are more favourable votes
  than unfavourable ones, the issue is considered to have passed --
  regardless of the number of votes in each category

Re: [DISCUSS] Why is releasing such a pain and what can we do to make it easier?

2013-10-13 Thread Ted Dunning
On Mon, Oct 14, 2013 at 2:55 AM, Henri Yandell flame...@gmail.com wrote:

 I propose release votes be simple revision based requests and involve no
 artifact churn :)


Hen,

This is a pretty good idea.

But I still think that artifact churn will be a necessary process in order
to get enough valid QA on the artifacts.  But it should be possible to get
a source artifact out without so much pain.


Re: [VOTE] Moving to Git...

2013-10-11 Thread Ted Dunning
I hate myself a bit for jumping in here, but as much as I prefer git, I
really don't think that changing will make that much difference.

The problem with commons is that people have much more energy for
interminable conversations about things that don't much matter (like this
thread).

People who do things don't generally want to talk them to death.  If half
the energy that goes into long debates went into coding for commons there
wouldn't be a problem.  Long discussions about whether to have discussions about
things that might make coding easier are even worse than long discussions,
so I am now part of the problem.

Perhaps a good rule of thumb would be no more than 5 email messages about
non-code issues per patch that you have posted to a commons component.  I
am probably at or beyond that limit so I will shut up and not respond
further.

Given that the open source community has gradually been re-inventing
aspects of scientific society (salons = meetups, RS = ASF and so on) maybe
it is time to invent something like peer review to moderate the long
conversations.





On Fri, Oct 11, 2013 at 6:01 AM, Christian Grobmeier grobme...@gmail.com wrote:

 +1

 I consider this move to happen step by step and see only little risk if we
 start with a single component first.
 As the half of the world works with git meanwhile I see less risk in
 general too.



 On 10 Oct 2013, at 16:41, James Carman wrote:

  All,

 We have had some great discussions about moving our SCM to Git.  I
 think it's time to put it to a vote.  So, here we go:

 +1 - yes, move to Git
 -1 - no, do not move to Git

 The vote will be left open for 72 hours.  Go!

 James





 ---
 http://www.grobmeier.de
 @grobmeier
 GPG: 0xA5CC90DB





Re: [OT] Anyone going to JavaOne?

2013-09-19 Thread Ted Dunning
I am not going, but we have a ton of guys there.

Drop by the MapR booth and say hi!




On Thu, Sep 19, 2013 at 12:50 PM, James Carman
ja...@carmanconsulting.com wrote:

 Is anyone planning on going?  It would be great to meet some of you
 guys face-to-face for once, if you're going to be there.

 James





[MATH] Re: commons-math pull request: Two implementations of least squares in separeat...

2013-08-27 Thread Ted Dunning
On Tue, Aug 27, 2013 at 7:41 AM, Gilles gil...@harfang.homelinux.org wrote:

 The patch does not apply cleanly (special options needed to handle
 output from git?).


Try different prefix levels.  The -p0 option is commonly helpful.


Re: MannWhitneyUTest.mannWhitneyU

2013-08-26 Thread Ted Dunning
I think that it will be somewhat slower, but next to imperceptibly so.

It will not be any more accurate.

It should be noted, however, that this code will fail for input longer than
2^16 because of integer overflow.
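
A short sketch of both points, using hypothetical helper names and only the
n * (n + 1) / 2 term from the quoted line. Because n * (n + 1) is always even,
integer division by 2 is exact whenever the product fits in an int, so
dividing by 2.0 gains nothing. The product itself, however, exceeds
Integer.MAX_VALUE once n is a little above 46,000, so it certainly wraps
around at 2^16:

```java
/** Illustrative helpers; they isolate the n * (n + 1) / 2 term from the quoted code. */
final class RankSumOverflow {
    private RankSumOverflow() {}

    /** Same arithmetic as the quoted line: the product is computed in int. */
    static double termInt(int n) {
        return (n * (n + 1)) / 2;
    }

    /** Widens to long before multiplying, so the product cannot wrap for int-sized n. */
    static double termLong(int n) {
        return ((long) n * (n + 1)) / 2;
    }

    public static void main(String[] args) {
        System.out.println(termInt(1000));    // 500500.0, exact
        System.out.println(termLong(1000));   // 500500.0, identical
        System.out.println(termInt(65536));   // 32768.0, the int product wrapped around
        System.out.println(termLong(65536));  // 2.147516416E9, the intended value
    }
}
```

Widening one operand to long (or to double) before the multiplication is one
way to remove the overflow; changing only the divisor to 2.0 does not.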



On Sun, Aug 25, 2013 at 8:27 PM, Dave Brosius dbros...@apache.org wrote:

 I would think that in

 public double mannWhitneyU(final double[] x, final double[] y)



 final double U1 = sumRankX - (x.length * (x.length + 1)) / 2;


 should be


 final double U1 = sumRankX - (x.length * (x.length + 1)) / 2.0;


 right?



Re: Need for an alternatives to the main line of code.

2013-08-23 Thread Ted Dunning
On Thu, Aug 22, 2013 at 11:52 AM, Luc Maisonobe luc.maison...@free.fr wrote:

 Then you just clone it as you
 would clone any repositories and provide a link to your own repository.
 If I remember well, Evan just did that a few days ago.


And you can do with it as you will.

Build a prototype without tests to make a point.

Or fork the code into a proprietary product.

Or whatever you like.


Re: Need for an alternatives to the main line of code.

2013-08-21 Thread Ted Dunning
+1

Sent from my iPhone

On Aug 21, 2013, at 9:42, Ajo Fod ajo@gmail.com wrote:

 
 I hope you'll agree that as it stands, this makes CM capable of only
 solving a subset of the mathematical problems that it could solve with a more
 open policy.
 
 The argument for alternative designs of the API is great too because it
 allows people to comment on the API design as they use it.




Re: [math] Kolmogorov-Smirnov 2-sample test

2013-08-10 Thread Ted Dunning
On Sat, Aug 10, 2013 at 8:59 AM, Ajo Fod ajo@gmail.com wrote:

 If the data doesn't fit, you probably need a StorelessQuantile estimator
 like QuantileBin1D from the colt libraries. Then pick a resolution and do
 the single pass search.


Peripheral to the actual topic, but the Colt libraries are out of date in
almost every respect.  When we added unit tests, even the most basic
functions turned up dozens of serious bugs.  With respect to more advanced
estimation such as quantiles, nothing in Colt comes close to streamlib.
 Even the Mahout on-line estimators are generally superior.

QuantileBin1D, in particular, lacks the machinery of QDigests (not
surprising since they were published in 2004, long after Colt went dormant).
 Check out

https://github.com/clearspring/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/quantile/QDigest.java

and the original paper

http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf


Re: [Math] Fluent API, inheritance and immutability

2013-08-07 Thread Ted Dunning
This is often dealt with by using builder classes and not putting all the
fluent methods on the objects being constructed.

The other way to deal with this is to use a covariant return type.  For
instance, there is no guarantee that Pattern.compile returns any particular
class other than that it returns a sub-class of Pattern.
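
A minimal sketch of the covariant-return approach, assuming hypothetical
Shape and Circle classes (nothing here is taken from Commons Math). The
subclass narrows the return type of the inherited fluent method, so a chain
of calls keeps the subclass type while every object stays immutable:

```java
/** Illustrative only: immutable fluent classes with a covariant override. */
final class FluentSketch {
    static class Shape {
        private final String name;

        Shape(String name) { this.name = name; }

        String name() { return name; }

        /** Returns a new instance rather than mutating this one. */
        Shape withName(String newName) { return new Shape(newName); }
    }

    static class Circle extends Shape {
        private final double radius;

        Circle(String name, double radius) {
            super(name);
            this.radius = radius;
        }

        double radius() { return radius; }

        /** Covariant override: the declared return type is Circle, not Shape. */
        @Override
        Circle withName(String newName) { return new Circle(newName, radius); }

        Circle withRadius(double newRadius) { return new Circle(name(), newRadius); }
    }

    public static void main(String[] args) {
        Circle unit = new Circle("unit", 1.0);
        // Compiles only because withName is declared to return Circle here.
        Circle big = unit.withName("big").withRadius(2.0);
        System.out.println(big.name() + " " + big.radius());
        System.out.println(unit.name() + " " + unit.radius()); // the original is unchanged
    }
}
```

The cost is that each concrete subclass has to override the inherited withXxx
methods; a separate builder class avoids that by keeping the fluent methods
off the immutable objects altogether.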

Do you have a specific example of the problem you are alluding to?




On Wed, Aug 7, 2013 at 11:23 AM, Gilles gil...@harfang.homelinux.org wrote:

 Hi.

 It seems that each two of those concepts are at odds with
 the third one.

 E.g. you can have a fluent API and immutability, but this
 then prevents you from defining fluent API methods in a base
 class because immutability requires creating a new object
 (but the base class cannot know how to build a subclass).

 Dropping immutability would allow defining withXxx at the
 hierarchy level where they belong because the fluent methods
 modify instance fields (and return an existing this).
 Whereas keeping immutability requires that all withXxx are
 always redefined at the bottom level of the hierarchy.

 Also, if a base class is abstract, no fluent method can be
 defined in it, since it cannot be instantiated. This also
 leads to the situation where the (re)initialization of fields
 that belong to the base class must be delegated to withXxx
 methods in all the subclasses.

 Thus, in this particular case, immutability entails code
 duplication.

 I wonder whether it would be possible to have the best of all
 worlds by
 1. dropping immutability of the instance fields,
 2. requiring that all classes participating in the fluent API
    implement a copy constructor,
 3. requiring that all non-abstract classes implement a copy
    method (whose contract is to return a fresh copy).
 Hence, code that would like to ensure that it is the sole owner
 of an object would be able to call the copy method on a
 mutable instance that would have been constructed with the
 fluent API.

 [On a side note: that proposal would also seem to reduce the
 overhead (however small that may be) of creating a new object
 for each modification, as well as allow usage in situations where
 creating a new instance would be undesirable e.g.:
 Applying a withXxx method on an object stored in a collection
 would create a local instance, and require that it be assigned back
 into the collection, preventing a language construct such as
 foreach loops.]

 I know that dropping immutability seems a step backwards from what
 we were trying to achieve in Commons Math, but it seems that we must
 let go of something (and security could be retained by unit tests
 that check the contract of copy).

 If I'm missing something obvious, please let me know!

 Gilles


 ---------------------------------------------------------------------
 To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
 For additional commands, e-mail: dev-h...@commons.apache.org




Re: [Math] Does any commiter understand Change of variables?

2013-07-20 Thread Ted Dunning
The math is quite simple.

What is not clear is what the numerical properties are for substitution of
the sort being advocated.

Which functions will do better with substitution?  Which will do better
with Laguerre polynomials?



On Sat, Jul 20, 2013 at 8:59 AM, Ajo Fod ajo@gmail.com wrote:

 The method is described here:
 http://en.wikipedia.org/wiki/Integration_by_substitution

 My patch uses it for improper integration via the change of variable
 t/(1-t^2) as suggested in :
 http://en.wikipedia.org/wiki/Numerical_integration
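
 A minimal sketch of that change of variable, assuming a plain midpoint rule
 on (-1, 1). It is neither the MATH-995 patch nor the Commons Math API; the
 class and method names are made up for illustration. With x = t/(1-t^2),
 the Jacobian is dx/dt = (1+t^2)/(1-t^2)^2.

```java
import java.util.function.DoubleUnaryOperator;

final class SubstitutionIntegralSketch {
    /**
     * Approximates the integral of f over the whole real line by substituting
     * x = t / (1 - t^2) and applying the midpoint rule with n equal
     * sub-intervals of (-1, 1).
     */
    static double integrateOverRealLine(DoubleUnaryOperator f, int n) {
        double h = 2.0 / n;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            double t = -1.0 + (i + 0.5) * h;          // midpoints never reach t = -1 or t = +1
            double d = 1.0 - t * t;
            double x = t / d;                          // mapped abscissa on the real line
            double jacobian = (1.0 + t * t) / (d * d); // dx/dt
            sum += f.applyAsDouble(x) * jacobian;
        }
        return sum * h;
    }

    public static void main(String[] args) {
        // The Gaussian integral has the known value sqrt(pi).
        double approx = integrateOverRealLine(x -> Math.exp(-x * x), 2000);
        System.out.println(approx + " vs sqrt(pi) = " + Math.sqrt(Math.PI));
    }
}
```

 The sketch says nothing about the numerical properties of the substitution
 for other integrands, which is exactly the open question raised above.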

 Please reach back if anyone understands this concept.

 Cheers,
 -Ajo



Re: [Math] Cleaning up the curve fitters

2013-07-19 Thread Ted Dunning
The discussion about how to get something into commons when it is (a) well
documented and (b) demonstrated better on at least some domains is
partially procedural, but it hinges on technical factors.

I think that Ajo is being very reserved here.  When I faced similar
discouragement in the past with commons math contributions, I simply went
elsewhere.

It still seems to me that it would serve CM well to pay more attention to
Ajo's comments and suggestions.  Simply saying that we should focus on
technical discussion when CM's list is filled with esthetic arguments
really just sounds like a way of pushing people away.


On Fri, Jul 19, 2013 at 10:21 AM, Phil Steitz phil.ste...@gmail.com wrote:

 As I said above, let's focus on actual technical discussion here.
 We implement standard, well-documented algorithms.  We need to
 provide references and convince ourselves that what we release is
 numerically sound, well-documented and well-tested.  We do our best
 with the volunteer resources we have.  Your help and contributions
 are appreciated.

 Phil

 On 7/19/13 9:44 AM, Ajo Fod wrote:
  Hi,
 
  I very much appreciate the work that has been done in CM and this is
  precisely why I'd like more people to contribute. Even when you didn't
  accept my MATH-995 patch, I got useful input from Konstantin and it has
  already made my application more efficient.
 
  What you required of me in the Improper integral example was a comparison
  of different methods. This sort of research takes time. I hear that Gilles
  is working on it. I appreciate that you guys spent so much effort on this.
 
  However, my contention is that your efforts at researching alternate
  solutions to a patch are not justified till you come up with a test that
  the patch fails OR you know an alternate performs better for an
  application you have. In the first case, you're losing the efficiency of
  open source by reinventing a possibly different wheel without sufficient
  marginal reward. In the second case, beware of the fact that numerical
  algorithms are hairy beasts, and it takes a while to encode something new.
  The efficiency of commons comes from putting the burden of development on
  the developers who need the code.
 
  So, I propose an alternate approach to testing if a submitted patch needs
  to be accepted:
  1. Check if the patch fills a gap in existing CM code
  2. if so, check if it passes known tests
  3. if so, write up alternate tests to see if the code breaks.
  4. if it holds up, wrap the code up in a suitable API and accept the patch
 
  This has two advantages. First, CM will have more capabilities per unit of
  your precious time. Second, you give people the feeling that they are
  making a difference.
 
  As far as the debate on AQ (AdaptiveQuadrature) vs
  LGQ (IterativeLegendreGaussIntegrator) goes:
  The FACTS that support AQ over LGQ are:
  1. An example where LGQ failed and AQ succeeded. I also explained why LGQ
  fails and AQ will probably converge more correctly. Generally, adaptive
  quadrature methods are known to be so successful at integration that
  Konstantin even wondered why we don't have something yet.
  2. Efficiency improvement: I also showed that AQ is more efficient on at
  least one example in terms of accuracy in digits per function evaluation.
  So, conversely, it's now your turn to provide concrete examples where LGQ
  does better than AQ. You could pose credible objections by providing
  examples where:
  1. AQ fails but LGQ passes.
  2. LGQ is more efficient in accuracy per evaluation.
 
  All that is to illustrate, with an example, where the perception arises
  that it is hard to convince the gatekeepers of commons of the merits of a
  patch. I have a package in my codebase with assorted patches that I just
  don't think is worth the time to try to post to commons. I think it is
  very inefficient if others have such private patches.
 
  Cheers,
  Ajo
 
 
 
 
 
 
 
  On Thu, Jul 18, 2013 at 2:15 PM, Phil Steitz phil.ste...@gmail.com
 wrote:
 
  On 7/18/13 1:48 PM, Ajo Fod wrote:
  Hello folks,
 
  There is a lot of work in API design. However, Konstantin's point is that
  it takes a lot of effort to convince Gilles of any alternatives. API design
  issues should really be second to functionality. This idea seems to be lost
  in conversations.
  With patience and collaboration you can have both and we *need* to
  have both.  You can't get to a stable API and approachable and
  maintainable code base without thinking carefully about API design.
  I agree with Gilles that providing tests and benchmarks that exhibit the
  advantages of a particular method are probably the best way to show other
  contributors the value of an alternative approach.
  There is some value to this, but honestly much more value in
  carefully researching and presenting the numerical analysis to
  support improvement / performance claims.
  It is quite depressing to the contributor to see one's contribution be
  rejected 

Re: [Math] Cleaning up the curve fitters

2013-07-19 Thread Ted Dunning
On Fri, Jul 19, 2013 at 12:27 PM, Phil Steitz phil.ste...@gmail.com wrote:

  It still seems to me that it would serve CM well to pay more attention to
  Ajo's comments and suggestions.  Simply saying that we should focus on
  technical discussion when CM's list is filled with esthetic arguments
  really just sounds like a way of pushing people away.

 Please read the threads.  This is not esthetics.  Maybe you can
 help.


I read the threads.  I stand by my assertions that there are *lots* of
non-technical discussions on CM.

Rejecting this particular argument since it is procedural+technical rather
than just technical seems less supportable than continuing those other
threads.

Help?  The way that I help with the esthetic threads is by staying out of
the way.  Contributing to the noise level is not helpful and the endurance
and spare time of the typical CM poster is apparently boundless.

With this thread, the input I have is that Ajo's comments make a lot of
sense (to me) and the technical arguments against including AQ as an
option do not make a lot of sense (to me).


Re: [math] Math-Jax in javadoc?

2013-07-14 Thread Ted Dunning
We have adopted this in Mahout based on the suggestion I saw here.

It works great.



On Sun, Jul 14, 2013 at 2:31 PM, Ajo Fod ajo@gmail.com wrote:

 I like this idea too. I'm curious to know how it works.

 +1


 On Sun, Jul 14, 2013 at 11:35 AM, Thomas Neidhart 
 thomas.neidh...@gmail.com
  wrote:

  On 07/14/2013 07:50 PM, Phil Steitz wrote:
   I think we have talked about this before but did not achieve
   consensus or at least never got it set up.  I am finishing the
   javadoc for Kolmogorov-Smirnov tests (MATH-437) and would really
   like to just embed TeX formulas in the class javadoc.  I found that
   just adding an additionalparam element to the javadoc plugin
   config in the pom works to get MathJax configured.  Then you just
   use TeX escapes \( and \) for inline, \[ and \] for formulas.  If
   others are OK with this, I will open a JIRA to make the pom change
   and document usage in the programmers guide.
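
   As an illustration of the escapes described above, here is a sketch of a
   class javadoc that embeds TeX. The class is hypothetical and is not the
   MATH-437 code; the formulas only render if the generated pages load
   MathJax, as described in this message.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

/**
 * One-sample Kolmogorov-Smirnov statistic (illustrative class only).
 * <p>
 * For a sample of size \(n\) with empirical distribution function \(F_n\) and
 * a reference distribution function \(F\), the statistic is
 * \[ D_n = \sup_x |F_n(x) - F(x)| \]
 * </p>
 */
final class KsStatisticSketch {
    /**
     * Computes \(D_n\) for the given sample against the reference CDF \(F\).
     *
     * @param sample observed values (not modified)
     * @param cdf reference distribution function \(F\)
     * @return the statistic \(D_n\)
     */
    static double statistic(double[] sample, DoubleUnaryOperator cdf) {
        double[] sorted = sample.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        double d = 0.0;
        for (int i = 0; i < n; i++) {
            double f = cdf.applyAsDouble(sorted[i]);
            // F_n jumps from i/n to (i+1)/n at sorted[i]; check both sides of the jump.
            d = Math.max(d, Math.max((i + 1.0) / n - f, f - (double) i / n));
        }
        return d;
    }
}
```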
 
  +1
 
  Thomas
 
 
 



Re: [math] Math-Jax in javadoc?

2013-07-14 Thread Ted Dunning
I am not sure that I tested with other than javadoc:javadoc.

I also don't think that Mahout uses the site target for generating javadocs
(at least not in continuous integration).

We are only now doing our first release since making the change so I can't
say how that is turning out just yet.



On Sun, Jul 14, 2013 at 10:01 PM, Phil Steitz phil.ste...@gmail.com wrote:

 On 7/14/13 3:24 PM, Ted Dunning wrote:
  We have adopted this in Mahout based on the suggestion I saw here.
 
  It works great.

 I just opened a ticket
 (https://issues.apache.org/jira/browse/MATH-1006) and attached a
 patch.  For some reason, the maven site plugin does not pass the
 -header option through to the javadoc plugin.  When I do mvn
 javadoc:javadoc it works; but mvn site does not bring the MathJax
 engine in.  How did you guys get this to work in mahout?  Thanks in
 advance.

 Phil
 
 
 
  On Sun, Jul 14, 2013 at 2:31 PM, Ajo Fod ajo@gmail.com wrote:
 
  I like this idea too. I'm curious to know how it works.
 
  +1
 
 
  On Sun, Jul 14, 2013 at 11:35 AM, Thomas Neidhart 
  thomas.neidh...@gmail.com
  wrote:
  On 07/14/2013 07:50 PM, Phil Steitz wrote:
  I think we have talked about this before but did not achieve
  consensus or at least never got it set up.  I am finishing the
  javadoc for Kolmogorov-Smirnov tests (MATH-437) and would really
  like to just embed TeX formulas in the class javadoc.  I found that
  just adding an additionalparam element to the javadoc plugin
  config in the pom works to get MathJax configured.  Then you just
  use TeX escapes \( and \) for inline, \[ and \] for formulas.  If
  others are OK with this, I will open a JIRA to make the pom change
  and document usage in the programmers guide.
  +1
 
  Thomas
 
 
 






Re: Lang: ObjectUtils

2013-07-04 Thread Ted Dunning
A bigger question is why this is needed at all.

Why not just use composition?  In guava, one would do this:

    Iterables.all(Arrays.asList(foo), new Predicate<Double>() {
        @Override
        public boolean apply(Double input) {
            return input != null;
        }
    });

Surely the same is already possible with commons.
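
For comparison, a sketch of the same composition using only the JDK. This
assumes Java 8 or later, which postdates this thread; the class and method
names are hypothetical.

```java
import java.util.Arrays;
import java.util.Objects;

final class NullChecksSketch {
    /** Returns true when the array itself and every element in it are non-null. */
    static boolean allNonNull(Object... values) {
        return values != null && Arrays.stream(values).allMatch(Objects::nonNull);
    }

    public static void main(String[] args) {
        System.out.println(allNonNull("a", 1));     // every element is set
        System.out.println(allNonNull("a", null));  // one element is missing
    }
}
```

An empty argument list is treated as "all non-null", which matches the loop
in the proposal quoted below.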



On Thu, Jul 4, 2013 at 12:23 PM, Dave Brosius dbros...@mebigfatguy.com wrote:

 This implies that having arrays with some null elements is

 a) somewhat common
 b) a good idea


 I'd say both are not true.

 I'm not sure the library should promote that the above is the case.



 On 07/04/2013 02:43 PM, Rafael Santini wrote:

 Hi,

 I would like to propose a method in ObjectUtils class that receives an
 array of objects and returns true if all objects are not null. I have
 implemented the following:

 public static boolean isNull(Object object) {
     return object == null;
 }

 public static boolean isNotNull(Object object) {
     return !isNull(object);
 }

 public static boolean isNotNull(Object... objects) {
     for (Object object : objects) {
         if (isNull(object)) {
             return false;
         }
     }
     return true;
 }

 Can I submit a patch for this feature?

 Thanks,

 Rafael Santini








Re: [math] On MATH-995: Problems with LegendreGaussQuadrature class.

2013-06-28 Thread Ted Dunning
On Fri, Jun 28, 2013 at 9:05 AM, Gilles gil...@harfang.homelinux.org wrote:

 Did you read my other (rather more lengthy) post?  Is that jumping?


Yes.  You jumped on him rather than helped him be productive.  The general
message is "we have something in the works, don't bother us with your
ideas."


Re: [Bag] random pick

2013-03-16 Thread Ted Dunning
I think that this implementation is a problem.

Bag implementations tend to fall into different categories, according to
whether they provide unit (or log) time random access, random deletion and
ordered traversal.  Most implementations don't provide unit time for all
operations.

I think that your implementation assumes multiple operations have unit
time.  Otherwise sampling all of the items will take quadratic time.

Also, your implementations alter the underlying collections.  That seems
like a bad policy.

Sampling without replacement can be implemented as iteration over a
permutation with unit amortized time.  A trivial implementation uses
an auxiliary array.  It is also possible to do without the auxiliary.  You
can get unit amortized time for linked list permutations as well, but it is
difficult to do without the extra storage.
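
A minimal sketch of the "trivial implementation" with an auxiliary array: an
iterator that walks a lazily built random permutation and never touches the
source collection. The class name is hypothetical, and the O(n) copy at
construction is amortized over the draws.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Random;

final class WithoutReplacementSampler<T> implements Iterator<T> {
    private final Object[] items;   // auxiliary copy; the source collection is never modified
    private final Random random;
    private int next = 0;           // items[0..next) have already been returned

    WithoutReplacementSampler(Collection<? extends T> source, Random random) {
        this.items = source.toArray();
        this.random = random;
    }

    @Override
    public boolean hasNext() {
        return next < items.length;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // One Fisher-Yates step: choose uniformly among the not-yet-returned items
        // and swap the chosen one into the "returned" prefix.
        int j = next + random.nextInt(items.length - next);
        Object chosen = items[j];
        items[j] = items[next];
        items[next] = chosen;
        next++;
        return (T) chosen;
    }
}
```

Draining the iterator yields a permutation of the source; stopping after k
calls yields an ordered sample of size k without replacement.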

Finally, the names should not consist of combinations of "pick" and
"remit".  The correct English terms are "sample" and "replacement".

On Sat, Mar 16, 2013 at 6:25 PM, Othmen Tiliouine 
tiliouine.oth...@gmail.com wrote:

 This is an example of use of pick from Bag

 https://github.com/influence160/flera

 see the classes

 https://github.com/influence160/flera/blob/master/flera-core/src/main/java/com/otiliouine/flera/SuccessionBasedWordGenerator.java
 and

 https://github.com/influence160/flera/blob/master/flera-core/src/main/java/com/otiliouine/flera/analyzer/SuccessionBasedDictionaryAnalyzer.java


 2013/3/13 Othmen Tiliouine tiliouine.oth...@gmail.com

  I replaced the patch
 
  2013/3/13 Ted Dunning ted.dunn...@gmail.com
 
  You seem to have reformatted the entire file.  This makes it nearly
  impossible to review your suggested change.
 
  Can you make a diff that doesn't involve changing every line in the
 file?
 
  On Tue, Mar 12, 2013 at 3:48 PM, Othmen Tiliouine 
  tiliouine.oth...@gmail.com wrote:
 
   I posted the suggestion and attached the patch
  
   https://issues.apache.org/jira/browse/COLLECTIONS-448
  
   2013/3/12 Thomas Neidhart thomas.neidh...@gmail.com
  
    On 03/12/2013 08:58 AM, Othmen Tiliouine wrote:
     Thank you Ted,

     I understand from your mail that there is no particular reason why
     the Bag interface does not contain these methods, and that this subject
     has never been discussed on the mailing list.

     I'll see if I can create this patch this weekend, but I want to know
     which methods you think I should add:

     public Object pick(); // pick a random element (with remove)
     public Object pickAndRemit(); // pick a random element (without remove)
     public Collection pick(int n); // pick n random elements (with remove)
     public Collection pickAndRemit(int n); // pick n random elements (without remove)
     public Iterator pick(int n); // pick n random elements (with remove)
     public Iterator pickAndRemit(int n); // pick n random elements (without remove)
     public List pick(int n); // pick n random elements (with remove)
     public List pickAndRemit(int n); // pick n random elements (without remove)
     public Bag pick(int n); // pick n random elements (with remove)
     public Bag pickAndRemit(int n); // pick n random elements (without remove)

     Maybe I must provide both kinds of methods (Bag pick(int n) and
     Iterator pickOrdered(int n))?

     There is something I do not understand: why does the bag not use
     generics?

    the current version of collections in the trunk is already adapted to
    generics. We are currently in the process of preparing a release (4.0).

    So when you provide a patch, please align it to the version in the trunk.
   
Thomas
   
   
   
   
  
 
 
 



Re: [Bag] random pick

2013-03-12 Thread Ted Dunning
You seem to have reformatted the entire file.  This makes it nearly
impossible to review your suggested change.

Can you make a diff that doesn't involve changing every line in the file?

On Tue, Mar 12, 2013 at 3:48 PM, Othmen Tiliouine 
tiliouine.oth...@gmail.com wrote:

 I posted the suggestion and attached the patch

 https://issues.apache.org/jira/browse/COLLECTIONS-448

 2013/3/12 Thomas Neidhart thomas.neidh...@gmail.com

  On 03/12/2013 08:58 AM, Othmen Tiliouine wrote:
   Thank you Ted,

   I understand from your mail that there is no particular reason why
   the Bag interface does not contain these methods, and that this subject
   has never been discussed on the mailing list.

   I'll see if I can create this patch this weekend, but I want to know
   which methods you think I should add:

   public Object pick(); // pick a random element (with remove)
   public Object pickAndRemit(); // pick a random element (without remove)
   public Collection pick(int n); // pick n random elements (with remove)
   public Collection pickAndRemit(int n); // pick n random elements (without remove)
   public Iterator pick(int n); // pick n random elements (with remove)
   public Iterator pickAndRemit(int n); // pick n random elements (without remove)
   public List pick(int n); // pick n random elements (with remove)
   public List pickAndRemit(int n); // pick n random elements (without remove)
   public Bag pick(int n); // pick n random elements (with remove)
   public Bag pickAndRemit(int n); // pick n random elements (without remove)

   Maybe I must provide both kinds of methods (Bag pick(int n) and
   Iterator pickOrdered(int n))?

   There is something I do not understand: why does the bag not use
   generics?
 
  the current version of collections in the trunk is already adapted to
  generics. We are currently in the process of preparing a release (4.0).
 
  So when you provide a patch, please align it to the version in the trunk.
 
  Thomas
 
 
 



Re: [Bag] random pick

2013-03-11 Thread Ted Dunning
Othmen,

The common way to contribute code is to file a bug report/enhancement
request at the correct commons component:

https://issues.apache.org/jira/secure/BrowseProjects.jspa#10260

My guess is that you want collections which is at

https://issues.apache.org/jira/browse/COLLECTIONS

Then you should put your suggested solution onto that JIRA as an
attachment.  The attachment should be a patch file.  That will be a place
that the merits of the contribution can be discussed.

My own comment here is that the common idiom in the English statistical
literature for sampling from a bag is either sampling without replacement
or sampling with replacement.  Moreover, it is typical that you allow for
multiple items to be sampled rather than requiring sampling to proceed one
element at a time.  Sampling n items from an n-item bag without
replacement, for instance, would return a permutation of the bag (if you
get an ordered sample) or a partition of the bag (if you get an unordered
sample).

There is also the question of whether the bag should be considered
immutable during sampling.  If you want to leave the bag unchanged by
sampling, then you probably should be returning a sampler object of some
kind that is kind of a randomized iterator or iterable.
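
A sketch of such a sampler object for sampling with replacement, assuming the
bag is given as a map from element to count. The class name is hypothetical
and this is not the Commons Collections API. The bag itself is only read at
construction; each draw is a binary search over cumulative counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class WithReplacementSampler<T> {
    private final List<T> values = new ArrayList<>();
    private final int[] cumulative;   // cumulative[i] = total count of values[0..i]
    private final Random random;

    WithReplacementSampler(Map<T, Integer> counts, Random random) {
        this.random = random;
        this.cumulative = new int[counts.size()];
        int total = 0;
        int i = 0;
        for (Map.Entry<T, Integer> e : counts.entrySet()) {
            if (e.getValue() < 0) {
                throw new IllegalArgumentException("negative count for " + e.getKey());
            }
            values.add(e.getKey());
            total += e.getValue();
            cumulative[i++] = total;
        }
        if (total == 0) {
            throw new IllegalArgumentException("cannot sample from an empty bag");
        }
    }

    /** Draws one element with probability proportional to its count. */
    T sample() {
        int total = cumulative[cumulative.length - 1];
        int r = random.nextInt(total);   // uniform in [0, total)
        // Binary search for the first index whose cumulative count exceeds r.
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > r) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return values.get(lo);
    }
}
```

Because the sampler keeps its own snapshot, later changes to the bag are not
reflected; whether that is the desired contract is one of the design
decisions mentioned below.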

These kinds of design decisions need to be hashed out in the process of
getting your contribution into the code.


Good luck with your contribution!

On Mon, Mar 11, 2013 at 4:16 PM, Othmen Tiliouine 
tiliouine.oth...@gmail.com wrote:

 Hello,

 I just saw the Bag interface and its implementations, and I'm surprised
 that Bag<T> (and none of its implementations) exposes the methods
 public T pick() and public T pickAndRemit() (pick a random element).
 Drawing at random is mainly what a bag is used for in the real world,
 which is why bags are so widely used in probability: if I put in 2 white
 balls and one black, then when I draw a ball at random I am twice as
 likely to get a white ball.

 I think that if this characteristic existed it would be very valuable.


 this is a simple implementation of pick() and pickAndRemit() with HashBag

 package com.otiliouine.commons.collections;

 import java.util.Iterator;

 import org.apache.commons.collections.bag.HashBag;

 public class OpaqueHashBag extends HashBag implements OpaqueBag {

     public Object pcik() {
         if (size() == 0) {
             return null;
         }
         // size() counts every occurrence, so each copy is equally likely
         int randomIndex = (int) (Math.random() * size());
         int searchIndex = randomIndex;

         Iterator iterator = this.iterator();
         //iterator = this.map.keySet().iterator()
         Object selectedItem = iterator.next();
         int count;

         // walk the iterator up to the chosen occurrence
         while (searchIndex > 0) {
             searchIndex--;
             selectedItem = iterator.next();
         }
         //while ((count = getCount(selectedItem)) < searchIndex + 1) {
         //    searchIndex -= count;
         //    selectedItem = iterator.next();
         //}
         return selectedItem;
     }

     public Object pickAndRemit() {
         Object picked = pcik();
         if (picked != null) {
             remove(picked, 1);
         }
         return picked;
     }
 }



 it can be optimized if it is written in the AbstractMapBag class

 and this is the test

 public class TestOpaqueHashBag {

     private OpaqueHashBag bag;

     public static void main(String... args) {
         TestOpaqueHashBag t = new TestOpaqueHashBag();
         t.before();
         t.testpick();
     }

     public void before() {
         bag = new OpaqueHashBag();
         bag.add("white", 6);
         bag.add("black", 3);
         bag.add("red", 1);
     }

     public void testpick() {
         int w = 0, b = 0, r = 0;
         for (int i = 0; i < 1000; i++) {
             String ball = (String) bag.pcik();

             if (ball.equals("white")) {
                 w++;
             } else if (ball.equals("black")) {
                 b++;
             } else {
                 r++;
             }
         }
         // 1000 draws, so count / 10 is a percentage
         System.out.println("%white = " + w / 10);
         System.out.println("%black = " + b / 10);
         System.out.println("%red = " + r / 10);
     }

 }

 output :

 %white = 59
 %black = 30
 %red = 9


 Othmen TILIOUINE



Re: [ALL] How to handle static imports [was: Re: svn commit: r1441784 - /commons/sandbox/beanutils2/trunk/src/main/java/org/apache/commons/beanutils2/PropertyDescriptorsRegistry.java]

2013-02-04 Thread Ted Dunning
Another common use is with junit to import assertEquals and such.

On Mon, Feb 4, 2013 at 4:41 PM, Gary Gregory garydgreg...@gmail.com wrote:

 On Mon, Feb 4, 2013 at 4:32 PM, Benedikt Ritter brit...@apache.org
 wrote:

  ...
  We haven't decided yet how to handle static imports. To form some rules,
  we'd like to hear what others think about static imports and what rules
 of
  thumb you use in your projects.

 I do not use static imports at work. I do not like using them unless it is
 for math like expressions (with PI and the like).




Re: [math] DiscreteEmpiricalDistribution

2013-01-07 Thread Ted Dunning
This will be very useful.

Sampling from discrete ECDF's is also closely related to generating samples
from a multinomial distribution.  I did a bit of work on the latter
problem.  The result of that work is in

org.apache.mahout.math.random.Multinomial

The major difference that you will have is that you have an ordered domain
that you are sampling from while Multinomial is simply sampling from a set.
It would be relatively easy to use Multinomial<Interval> to do what you
want where Interval represents a half open interval (and allows +Inf as a
right bound and -Inf as a left bound), but I think that you might gain
something from the ability to split an interval that would make Multinomial
irrelevant to your needs.

With Multinomial<Interval>, one strategy would be to delete an interval and
insert both halves which may be a bit more expensive than desired.  Doing
lots of deletions is also bad in Multinomial because it leaves an entry in
place with zero probability (because I was lazy).

Trying to mutate the Interval you are splitting so that it forms the left
half of the new interval doesn't actually help because modifying the
probability of an element just does a deletion and insertion.  Better to
use the first strategy.

It would be very easy to add a garbage collection step that eliminates zero
probability entries.  As I mentioned, I was just lazy.

Anyway, steal concept or code as you feel appropriate.

On Mon, Jan 7, 2013 at 8:03 AM, Phil Steitz phil.ste...@gmail.com wrote:

 The EmpiricalDistribution class in the random package is designed to
 support large samples.  It does not store all of data points in
 memory, but instead bins the data and uses smoothing kernels within
 the bins.  I have recently had the need for a discrete empirical
 distribution - i.e., an implementation that stores the full dataset
 in memory and creates the empirical distribution exactly from it.
 If there are no objections, I would like to open a JIRA and commit
 o.a.c.m.distribution.DiscreteEmpiricalDistribution implementing
 this.  I will document the approach and design decisions in the JIRA
 if others are OK with this addition.

 Phil





Re: [math] [linear] immutability

2013-01-01 Thread Ted Dunning
On Tue, Jan 1, 2013 at 10:25 AM, Phil Steitz phil.ste...@gmail.com wrote:

 Agreed we should keep the discussion concrete.  Sebastien and Luc
 have both mentioned specific examples where the overhead of matrix
 data copy and storage creates practical problems.  Konstantin
 mentioned another (Gaussian elimination) which is sort of humorous
 because we have in fact effectively implemented that already,
 embedded in the LU decomp class - but to do it, we took the approach
 that I mentioned above, which is to abandon the linear algebraic
 objects and operate directly on double arrays.


And frankly, that can be disastrous for performance as well.  As Konstantin
mentioned, it is critical to have BLAS operations exposed to get good
performance.  Level 2 and 3 operations are particularly important.

Phil's allusion to collections is particularly apt.  There are immutable
collections (see guava) and they are very handy.  And there are mutable
collections and they are very handy.  And there are multi-thread
performance friendly mutable collections (see ConcurrentHashMap).  They all
share a fairly simple API and they have some very simple abstract class
implementation helpers.


Re: [math] [linear] immutability

2013-01-01 Thread Ted Dunning
On Tue, Jan 1, 2013 at 11:17 AM, Sébastien Brisard 
sebastien.bris...@m4x.org wrote:

  Please note that when I first mentioned in-place operations, I didn't
 have speed in mind, but really memory.



 I think we would not gain much speedwise, as Java has become very good at
 allocating objects (this would be true of large problems, where typically a
 few big objects would be allocated at each iteration. The conclusion would
 probably be different with many small objects to be allocated at each
 iteration).


Allocation is not the problem.  The problem is memory bandwidth due to the
copies that are a side effect of the allocation.


Re: [math] [linear] immutability

2013-01-01 Thread Ted Dunning
My apologies, but I have totally lost track of who said what because too
many comments have enormous lines immediately adjacent to responses.

On Tue, Jan 1, 2013 at 11:39 AM, Somebody s...@body.org wrote:

 I thought that maybe it was due to the underlying
 (dynamic) data structure for sparse vectors/matrices
 (OpenIntToDoubleHashMap), while typical storage schemes (compressed row,
 skyline) might be more efficient (since only arrays are used), but do not
 lend themselves to changes of the value of initially null coefficients.
 That's why I was suggesting immutable matrices as a start, but what I
 really meant was matrices where the null coefficients are constant.

 Please note that your objection does not hold with iterative linear solvers
 (conjugate gradient, ...), so immutable matrices might still be
 interesting.



One of the benefits of making it easy to extend matrices can be found in
Mahout (which I should emphasize is probably only a source of ideas, not a
perfect paragon of perfection by any means).

As a result of easy extensibility, we have in mahout two kinds of sparse
vectors and many kinds of sparse matrices.

One kind of sparse vector uses an open hash table (RandomAccessHashTable).
 It works well for mutable situations, but is a bit more memory hungry than
might be liked.

The other kind is implemented using an array of indexes and an array of
doubles.  It can be read randomly with log n worst case performance or log
log n performance if the indices are well distributed.  It is nearly
immutable in that after a sequence of mutations, it requires substantial
time and possibly memory to merge itself back together for more read
operations.  What it does phenomenally well, however, is support iteration
with little memory overhead.
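
A minimal sketch of that second layout, assuming strictly increasing indices.
The class is illustrative only and is not Mahout's implementation; it uses a
plain binary search (log n) and does not attempt the log log n lookup or any
mutation support.

```java
import java.util.Arrays;

final class SortedSparseVectorSketch {
    private final int[] indices;    // strictly increasing positions of the stored entries
    private final double[] values;  // values[k] is the entry at position indices[k]

    SortedSparseVectorSketch(int[] sortedIndices, double[] values) {
        if (sortedIndices.length != values.length) {
            throw new IllegalArgumentException("index and value arrays differ in length");
        }
        this.indices = sortedIndices.clone();
        this.values = values.clone();
    }

    /** Random read: binary search over the index array; absent entries read as zero. */
    double get(int index) {
        int k = Arrays.binarySearch(indices, index);
        return k >= 0 ? values[k] : 0.0;
    }

    /** Sequential read: one pass over two primitive arrays, no per-entry objects. */
    double dot(double[] dense) {
        double sum = 0.0;
        for (int k = 0; k < indices.length; k++) {
            sum += values[k] * dense[indices[k]];
        }
        return sum;
    }
}
```

Inserting a new non-zero entry would require shifting or rebuilding both
arrays, which is why the layout is described above as nearly immutable.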

As far as matrices are concerned, we have a dense matrix backed by a single
indexed double[].  We have a row sparse matrix whose rows are either kind
of sparse vector.  We have block sparse matrices that have row patches
that are matrices which need not exist, but are created lazily if
written to.  We have memory mapped dense matrices.  We have a memory mapped
dense matrix that maps several regions of large files together to form a
single matrix (since mmap only supports 2GB files).

Some of these storage forms are in Mahout math.  Some are in applications.

The virtue here is that we didn't have to discuss these matters much.  We
could just say "Sounds like a great idea, can you build a prototype to
demonstrate your point?" to any bright spark who came along.  Many
prototypes were built and several were incorporated.

So the impact of a simple core API (with linear algebra, mutable operations
and OK, but not great visitor patterns) and associated abstract classes and
abstract tests was as much social as technical.  The social virtues have,
in turn, led to technical virtues.


Re: [math] [linear] immutability

2012-12-31 Thread Ted Dunning
On Mon, Dec 31, 2012 at 9:30 AM, Phil Steitz phil.ste...@gmail.com wrote:

 If we stick to

 0) algebraic objects are immutable
 1) algorithms defined using algebraic concepts should be implemented
 using algebraic objects

 ...
 0)  Start, with Konstantin's help, by fleshing out the InPlace
 matrix / vector interface
 1)  Integrate Mahout code as part of a wholesale refactoring of the
 linear package
 2)  Extend use of the visitor pattern to perform mutations
 in-place (similar to 0) in effect)


Speaking as one of the main authors of the Mahout code and very occasional
contributor to CM, I doubt that integrating it directly will suit CM
needs/prejudices.

For instance, the whole sparse matrix problem where 0 x Inf = 0 instead of
NaN is probably not satisfactory for CM, but speed was considered a more
important requirement for Mahout.  Similarly, Mahout math depends on a
primitive collection implementation that generates over 200 classes from
templates.  That makes some of the sparse codes very fast, but it might
lead to some indigestion for CM.
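
A small sketch of the 0 x Inf point, using hypothetical helper methods rather
than Mahout or CM code. A dense product multiplies every position, so IEEE 754
gives 0 * Infinity = NaN; a sparse product that visits only stored entries
never performs that multiplication and effectively treats it as 0.

```java
final class ZeroTimesInfinitySketch {
    /** Dense dot product: every position is multiplied, including zeros. */
    static double denseDot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Sparse dot product: only the stored (non-zero) entries of the left operand are visited. */
    static double sparseDot(int[] indices, double[] values, double[] b) {
        double sum = 0.0;
        for (int k = 0; k < indices.length; k++) {
            sum += values[k] * b[indices[k]];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] b = {3.0, Double.POSITIVE_INFINITY};
        // Same logical vector (2, 0), stored densely and sparsely.
        System.out.println(denseDot(new double[] {2.0, 0.0}, b));
        System.out.println(sparseDot(new int[] {0}, new double[] {2.0}, b));
    }
}
```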


Re: [math] major problem with new released version 3.1

2012-12-30 Thread Ted Dunning
Dim has it exactly right here.

On Sun, Dec 30, 2012 at 10:38 AM, Dimitri Pourbaix pourb...@astro.ulb.ac.be
 wrote:

 In optimization, the user supplies the function to be minimised.  In curve
 fitting, the user supplies a series of observations and the model to be
 fitted.  Trying to combine both into a unique scheme (how highly abstract
 it is) is simply misleading.



Re: [math] major problem with new released version 3.1

2012-12-30 Thread Ted Dunning
Konstantin,

We are close then.  Yes, optimization should use vector methods where possible.

But visitor functions are very easy to add in an abstract class.  They
impose very little burden on the implementor.

On Sun, Dec 30, 2012 at 8:52 AM, Konstantin Berlin kber...@gmail.com wrote:

 I think we might have a misunderstanding. What I am discussing is not the
 general implementation for a matrix, but the base interface that would be
 required for input into optimizers. I was saying that there should not be a
 burden of implementing visitor pattern for users creating a custom class
 for optimization input, if it is not used internally by the optimizers.


Re: [math] major problem with new released version 3.1

2012-12-30 Thread Ted Dunning
The GPU requires native code that is executed on the GPU.  Standard linear
algebra libraries exist for this so if the API can express a standard
linear algebra routine concisely, then the GPU can be used.  General Java
code usually can't be executed on a GPU.

There is some late breaking news on this front, but the way to get
performance is generally to recognize standard idioms that have accelerated
implementations.

In Mahout, for instance, we can recognize many linear algebra operations
from idiomatic use of visitor patterns.  For instance, in this code,

  Vector u, v;
  v.assign(Functions.PLUS, u);

Mahout will recognize the call to assign as a vector addition. This is easy
with vector operations but much harder to recognize matrix operations
expressed with simple visitor patterns.
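
A minimal sketch of the recognition described above. This is not Mahout's code: `Vec`, `PLUS` and the `usedFastPath` flag are hypothetical names, and the "fast path" is an ordinary loop standing in for a call into an accelerated library. The point is only that a well-known function object can be recognized by identity and dispatched to a specialized routine, while any other function falls back to the generic loop.

```java
import java.util.function.DoubleBinaryOperator;

public class AssignDispatchSketch {
    /** Well-known function object; the library can recognize it by identity. */
    public static final DoubleBinaryOperator PLUS = (a, b) -> a + b;

    public static final class Vec {
        public final double[] data;
        /** Records which branch ran last; present only so the sketch can be checked. */
        public boolean usedFastPath;

        public Vec(double... data) {
            this.data = data.clone();
        }

        /** Sets this[i] = f(this[i], other[i]) for every i and returns this. */
        public Vec assign(DoubleBinaryOperator f, Vec other) {
            if (other.data.length != data.length) {
                throw new IllegalArgumentException("dimension mismatch");
            }
            if (f == PLUS) {
                // Recognized idiom: a real library could hand this to BLAS or a GPU kernel.
                usedFastPath = true;
                for (int i = 0; i < data.length; i++) {
                    data[i] += other.data[i];
                }
            } else {
                // Unrecognized function: generic element-at-a-time loop.
                usedFastPath = false;
                for (int i = 0; i < data.length; i++) {
                    data[i] = f.applyAsDouble(data[i], other.data[i]);
                }
            }
            return this;
        }
    }
}
```

Recognizing a whole matrix product from per-element callbacks is much harder, which is the limitation noted above.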



On Sun, Dec 30, 2012 at 11:26 AM, Sébastien Brisard 
sebastien.bris...@m4x.org wrote:

  and hence preclude vector based process operations, such as you would
 find
  on a GPU. So if the user wanted to speed up the computation using a GPU
  they would not be able to do it, if we base it on a single element at a
  time visitor pattern.
 
 
 I fail to see how the GPU could not be used. I am no expert on GPU
 programming, but I can easily imagine a new implementation of RealVector,
 say GpuBasedRealVector, where the walkInDefaultOrder method would send
 multiple values at a time to the GPU. I've already done that for multi-core
 machines (using fork/join), and the visitor pattern was certainly not a
 limitation.



Re: [math] major problem with new released version 3.1

2012-12-30 Thread Ted Dunning
On Sun, Dec 30, 2012 at 12:16 PM, Konstantin Berlin kber...@gmail.com wrote:


 ...
  There would be no burden on the user's side: the visitor pattern has been
  implemented for RealVectors in version 3.1. Besides, we could provide all
  the relevant visitors (addition, scaling, …)

 There is an additional burden to the user if you require implementation
 of the full RealVector or RealMatrix interface, which includes a large
 set of functions not needed for the optimizer, since the user has no idea
 which functions you will use inside the optimizer and which he can leave
 blank. For example, if a user needs to create their own
 implementation of a vector multiplication, because they have a special case
 which can be handled faster, or because they use a GPU, they are still
 burdened with implementing unnecessary support for the dozens of other
 functions which will never be used. For a GPU example, like Ted has
 pointed out, they can implement a GPU version for basic operations (add,
 multiply, etc.), but to guarantee general support for any Java function using
 the visitor pattern the user would also need to implement a Java version of
 the visitor pattern.



With a good abstract class to inherit from, this isn't much of a problem in
practice.  You should need to implement very little and you should be able
to over-ride what you will without much danger.

It also helps to have standardized tests that can pretty exhaustively test
new implementations for correctness.  To a surprising extent, this allows
new implementations to be well tested nearly on the day they are written.
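
A minimal sketch of that arrangement, under stated assumptions: `AbstractVec`, `Visitor`, `ArrayVec` and `checkContract` are hypothetical names, not the Commons Math or Mahout API. The implementor supplies only `size`, `get` and `set`; the visitor walk and `dot` come from the abstract class, and one reusable contract check can be run against any new implementation.

```java
public class AbstractVectorSketch {
    /** Element visitor: receives an index and value, returns the new value. */
    public interface Visitor {
        double visit(int index, double value);
    }

    /** Implementors provide three primitives; everything else is inherited. */
    public abstract static class AbstractVec {
        public abstract int size();
        public abstract double get(int index);
        public abstract void set(int index, double value);

        /** In-place walk built only on get/set; subclasses may override for speed. */
        public void walkInPlace(Visitor visitor) {
            for (int i = 0; i < size(); i++) {
                set(i, visitor.visit(i, get(i)));
            }
        }

        public double dot(AbstractVec other) {
            double sum = 0.0;
            for (int i = 0; i < size(); i++) {
                sum += get(i) * other.get(i);
            }
            return sum;
        }
    }

    /** Simplest possible concrete implementation. */
    public static final class ArrayVec extends AbstractVec {
        private final double[] data;
        public ArrayVec(int size) { data = new double[size]; }
        @Override public int size() { return data.length; }
        @Override public double get(int index) { return data[index]; }
        @Override public void set(int index, double value) { data[index] = value; }
    }

    /** Reusable contract check: throws AssertionError if the implementation misbehaves. */
    public static void checkContract(AbstractVec v) {
        for (int i = 0; i < v.size(); i++) {
            v.set(i, i + 1);
        }
        v.walkInPlace((index, value) -> 2 * value);
        double expectedDot = 0.0;
        for (int i = 0; i < v.size(); i++) {
            double expected = 2.0 * (i + 1);
            if (v.get(i) != expected) {
                throw new AssertionError("walkInPlace failed at index " + i);
            }
            expectedDot += expected * expected;
        }
        if (v.dot(v) != expectedDot) {
            throw new AssertionError("dot failed");
        }
    }
}
```

In a real project the contract check would more likely be an abstract JUnit test class that each implementation's test extends; a static method is used here only to keep the sketch self-contained.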


Re: [math] major problem with new released version 3.1

2012-12-29 Thread Ted Dunning
Actually, the visitor pattern or variants thereof can produce very
performant linear algebra implementations.  You can't usually get quite
down to optimized BLAS performance, but you get pretty darned fast code.

The reason is that the visitor is typically a very simple class which is
immediately inlined by the JIT.  Then it is subject to all of the normal
optimizations exactly as if the code were written as a single concrete
loop.  For many implementations, the bounds checks will be hoisted out of
the loop so you get pretty decent code.

More importantly in many cases, visitors allow in place algorithms.
 Combined with view operators that limit visibility to part of a matrix,
and the inlining phenomenon mentioned above, this can have enormous
implications to performance.

A great case in point is the Mahout math library.  With no special efforts
taken and using the visitor style fairly ubiquitously, I can get about 2
GFLOPS from my laptop.  Using ATLAS as a LINPACK implementation gives me
about 5 GFLOPS.

I agree with the point that linear algebra operators should be used where
possible, but that just isn't feasible for lots of operations in real
applications.  Getting solid performance with simple code in those
applications is a real virtue.

On Sat, Dec 29, 2012 at 3:22 PM, Konstantin Berlin kber...@gmail.com wrote:

 While visitor pattern is a good abstraction, I think it would make for
 terrible linear algebra performance. All operations should be based on
 basic vector operations, which internally can take advantage of sequential
 memory access. For large problems it makes a difference. The visitor
 pattern is a nice add on, but it should not be the engine driving the
 package under the hood, in my opinion.



Re: [math] major problem with new released version 3.1

2012-12-29 Thread Ted Dunning
Who said force?  Linear algebra operations should be available.

Visitors should be available.

Your mileage will vary.  That was always true.

On Sat, Dec 29, 2012 at 3:59 PM, Konstantin Berlin kber...@gmail.com wrote:

 Also. If you have GPU implementation of a matrix, or use another type of a
 vector processor, there is no way you can program that in if you force
 vector operations to use a visitor pattern.

 On Dec 29, 2012, at 6:43 PM, Konstantin Berlin kber...@gmail.com wrote:

  That's a good point about the compiler. I never tested the performance
 of visitors vs. sequential array access. I just don't want the vector
 operations to be tied to any particular implementation detail.
 
 


 -
 To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
 For additional commands, e-mail: dev-h...@commons.apache.org




Re: [math] pearson and spearman correlation runtime complexity

2012-12-13 Thread Ted Dunning
Can you say more about how you implemented these?

The Pearson coefficient should be quite simple.  A few passes through the
data should suffice and it can probably be done in one pass, especially if
you aren't worried about 1 ULP accuracy.

The Spearman coefficient should be no worse than the cost of sorting plus
the cost of the Pearson computation.  There are often faster methods as
well if there are no ties.
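
A minimal sketch of both computations, assuming plain `double[]` inputs of equal length; the class and method names are hypothetical and this is not the Commons Math implementation. Pearson is done in one pass with running means and co-moments, which costs O(n). Spearman ranks each input by sorting (ties get the average rank) and then applies Pearson to the ranks, so it costs O(n log n).

```java
import java.util.Arrays;
import java.util.Comparator;

public class CorrelationSketch {
    /** One-pass Pearson correlation using running means and co-moments. */
    public static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        double meanX = 0, meanY = 0, sxx = 0, syy = 0, sxy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;          // deviation from the previous mean
            double dy = y[i] - meanY;
            meanX += dx / (i + 1);
            meanY += dy / (i + 1);
            sxx += dx * (x[i] - meanX);        // uses the updated mean
            syy += dy * (y[i] - meanY);
            sxy += dx * (y[i] - meanY);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** 1-based ranks; tied values share the average of their positions. */
    static double[] ranks(double[] v) {
        Integer[] order = new Integer[v.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> v[i]));
        double[] r = new double[v.length];
        int i = 0;
        while (i < order.length) {
            int j = i;
            while (j + 1 < order.length && v[order[j + 1]] == v[order[i]]) {
                j++;
            }
            double average = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) {
                r[order[k]] = average;
            }
            i = j + 1;
        }
        return r;
    }

    /** Spearman correlation: sort to get ranks, then Pearson on the ranks. */
    public static double spearman(double[] x, double[] y) {
        return pearson(ranks(x), ranks(y));
    }
}
```

An implementation that takes much longer than this on moderate inputs is probably doing quadratic work somewhere, for instance ranking by repeated scans instead of a sort.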

On Thu, Dec 13, 2012 at 6:57 AM, Martin Rosellen 
martin.rosel...@fu-berlin.de wrote:

 Hi again,

 I tried to implement the pearson and spearman algorithm myself and the
 computation took very long. That is why I now use the commons math
 solution. I am curious about the runtime complexity of the Pearson and the
 Spearman correlation coefficient. Can someone help me with that?

 Greetz
 Martin





Re: [Math] Old to new API (MultivariateDifferentiable(Vector)Function)

2012-12-01 Thread Ted Dunning
Correctness isn't that hard to get.  You just need to add a bitmap for
exceptional values in all matrices.  This bitmap can be accessed by sparse
operations so that the iteration is across the union of non-zero elements
in the sparse vector/matrix and exception elements in the operand.

The fact is, however, that preserving NaN results in these corner cases is
not normally a huge priority for users.  Deleting all support for sparse
vectors, on the other hand, has a huge impact on the utility of Commons Math.
To my mind, deleting hugely useful functionality in the face of a small
issue is upside down, especially when there is actually a pretty simple fix
available.
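
A minimal sketch of the bitmap idea for an element-wise product, under stated assumptions: the sparse operand is modelled as a map from index to non-zero value, and `Dense`, `exceptional` and `times` are hypothetical names rather than Commons Math classes. The dense operand records where its NaN and infinite entries are, so the product iterates over the sparse non-zeros plus those exceptional positions and nothing else.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

public class SparseTimesDenseSketch {
    /** Dense operand that tracks the positions of its non-finite entries. */
    public static final class Dense {
        final double[] values;
        final BitSet exceptional = new BitSet();

        public Dense(double... values) {
            this.values = values.clone();
            for (int i = 0; i < values.length; i++) {
                if (Double.isNaN(values[i]) || Double.isInfinite(values[i])) {
                    exceptional.set(i);
                }
            }
        }
    }

    /** Element-wise product; entries absent from the result are implicit zeros. */
    public static Map<Integer, Double> times(Map<Integer, Double> sparse, Dense dense) {
        Map<Integer, Double> result = new TreeMap<>();
        // Usual sparse iteration over the stored non-zero elements.
        for (Map.Entry<Integer, Double> e : sparse.entrySet()) {
            result.put(e.getKey(), e.getValue() * dense.values[e.getKey()]);
        }
        // Extra pass over exceptional positions only: implicit 0 times NaN or Inf is NaN.
        for (int i = dense.exceptional.nextSetBit(0); i >= 0;
                 i = dense.exceptional.nextSetBit(i + 1)) {
            if (!sparse.containsKey(i)) {
                result.put(i, 0.0 * dense.values[i]);
            }
        }
        return result;
    }
}
```

The extra cost is proportional to the number of exceptional values, which is normally zero, so the sparse speed advantage is kept in the common case.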

On Sat, Dec 1, 2012 at 8:02 AM, Sébastien Brisard sebastien.bris...@m4x.org
 wrote:

 A few months ago, we started a thread on this issue (on the users ML). It
 got no answers! I am personally not happy with removing the support for
 sparse vectors/matrices, but the truth is we didn't see a way to achieve
 the same degree of correctness as --say-- array based matrices and vectors.
 As far as I am concerned, though, this is still an open question, and if
 you have ideas about these issues, we would be glad to hear them!



Re: [math] Checking preconditions on package private functions

2012-11-29 Thread Ted Dunning
That's fine.  I think raw use of reflection might make the tests pretty
complicated, but the idea is reasonable.

JMockit allows mocking of static methods (I have used it to mock
System.nanoTime(), for instance).  By using a partial mock class, you can
gain access to private methods as well.
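
For comparison, a minimal sketch of the raw-reflection approach using only `java.lang.reflect`. `Calculator` and `clampToUnit` are hypothetical stand-ins for a production class and its private helper; no JMockit API is shown here.

```java
import java.lang.reflect.Method;

public class PrivateAccessSketch {
    /** Stand-in for a production class with a private helper. */
    public static final class Calculator {
        private static double clampToUnit(double x) {
            return Math.max(0.0, Math.min(1.0, x));
        }
    }

    /** Test-side helper: looks up the private method by name and invokes it. */
    public static double callClampToUnit(double x) {
        try {
            Method m = Calculator.class.getDeclaredMethod("clampToUnit", double.class);
            m.setAccessible(true);              // bypass the private modifier
            return (Double) m.invoke(null, x);  // null receiver because the method is static
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflection lookup failed", e);
        }
    }
}
```

The method name is a string, so a rename in the production class only shows up as a test failure at run time. That is part of what makes raw reflection in tests more complicated than it first looks.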

On Thu, Nov 29, 2012 at 10:59 PM, Sébastien Brisard 
sebastien.bris...@m4x.org wrote:

 Does anyone oppose the usage of reflection in unit testing to access
 private methods?
 I personally think it is a good compromise between encapsulation and
 comprehensive testing.



Re: [math] Checking preconditions on package private functions

2012-11-27 Thread Ted Dunning

I can only say from my own experience that people make mistakes over time and 
having the code warn them when that happens is a good thing.  

Your experience may be different but I have to admit that I have done some 
pretty silly things along the lines of forgetting to follow some constraint. 

On Nov 27, 2012, at 8:08 AM, Gilles Sadowski gil...@harfang.homelinux.org 
wrote:

 My point is that _if_ the methods can be made
 private, then we can assume that they are used properly there (which is
 the only place where they can be called). It's the same that we wouldn't
 check whether, say, we wrote
  a + b
 instead of
  a - b




Re: [CSV] Discussion about the new CSVFormatBuilder

2012-11-20 Thread Ted Dunning
Surely you meant to say no other commons library.

Builder patterns are relatively common.  See guava for instance:

http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/base/Splitter.html


On Tue, Nov 20, 2012 at 11:49 AM, Gary Gregory garydgreg...@gmail.com wrote:

  - it has been argued that using the builder pattern only to make sure
  CSVFormats are valid is overengineered. No other library has this kind of
  validation.



Re: [CSV] Discussion about the new CSVFormatBuilder

2012-11-20 Thread Ted Dunning
Another way of looking at the builder style is that it is Java's way of
using keyword arguments for complex constructors.  It also allows a
reasonable amount of future-proofing.

These benefits are hard to replicate with constructors.  On the other hand,
builder-style patterns are a royal pain with serialization frameworks.
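
A minimal sketch of the builder-as-keyword-arguments idea. `FormatSketch` and its three settings are hypothetical and are not the proposed CSVFormatBuilder API. Each setter names the argument it sets, unspecified settings keep their defaults, and cross-field validation happens once in `build()`.

```java
public final class FormatSketch {
    private final char delimiter;
    private final char quote;
    private final boolean trim;

    private FormatSketch(Builder b) {
        this.delimiter = b.delimiter;
        this.quote = b.quote;
        this.trim = b.trim;
    }

    public static Builder builder() {
        return new Builder();
    }

    public char delimiter() { return delimiter; }
    public char quote() { return quote; }
    public boolean trim() { return trim; }

    public static final class Builder {
        // Defaults play the role of default keyword-argument values.
        private char delimiter = ',';
        private char quote = '"';
        private boolean trim = false;

        public Builder delimiter(char c) { this.delimiter = c; return this; }
        public Builder quote(char c) { this.quote = c; return this; }
        public Builder trim(boolean t) { this.trim = t; return this; }

        /** Validates the combination once, then produces an immutable object. */
        public FormatSketch build() {
            if (delimiter == quote) {
                throw new IllegalStateException("delimiter and quote must differ");
            }
            return new FormatSketch(this);
        }
    }
}
```

A call such as `FormatSketch.builder().delimiter(';').trim(true).build()` reads much like keyword arguments, and a new setting can be added later without breaking existing callers. The private constructor is also why serialization frameworks that expect a no-argument constructor and setters struggle with this style.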

On Tue, Nov 20, 2012 at 2:57 PM, Gary Gregory garydgreg...@gmail.com wrote:

 Ok this is good. Let's see some healthy debating. :)

 What is the alternate API?

 To me the bother is the extra build() call, but that's the pattern.

 Could an alt API be used and co-exist?

 Is making the ctor an option? It would have to do some validation.

 Gary

 On Nov 20, 2012, at 16:59, Emmanuel Bourg ebo...@apache.org wrote:

  Le 20/11/2012 20:01, Benedikt Ritter a écrit :
 
  Please share your thoughts about the builder.
 
  Sorry Benedikt but I have to say I really don't like this design. I
  prefer a simpler API for the reasons you mentioned in the disadvantages.
  The minor improvements from the developer's point of view are much less
  important than the ease of use from user's point of view.
 
  Emmanuel Bourg
 
 





Re: [Math] MATH-894

2012-11-15 Thread Ted Dunning


The typical answer to this when adding a functional method like compute is to 
also add a view object. The rationale is that a small number of view methods 
can be composed with a small number of compute/aggregate methods to get the 
expressive power of what would otherwise require a vast array of methods.  

On Nov 15, 2012, at 7:03 AM, Phil Steitz phil.ste...@gmail.com wrote:

 
 Then in RDA, we add compute(ArrayFunction) as a public method.  Then
 if we make UnivariateStatistic extend this new interface,
 DescriptiveStatistics can get what it needs from this.
 
 Just a thought.  Would love to get better ideas on this.  What is
 in trunk now works; but having to subclass for internal use makes me
 wonder if we have solved the problem.




Re: [all] moving to svnpubsub or CMS?

2012-11-15 Thread Ted Dunning
On Thu, Nov 15, 2012 at 2:46 AM, Olivier Lamy ol...@apache.org wrote:

  This is a false dichotomy.
 
  Maven site generation can work with ASF CMS if desired.
 
  That is sort of true but doesn't really apply to commons.  I created the
 Flume site using Maven and Maven generates the site from RST source files.
  This isn't a typical Maven project site.  In essence it is a CMS project
 that happens to invoke Maven whenever content is changed.
 
  IMO the Logging project site is a lot closer to what Commons needs.  The
 top level site is managed by the CMS.  Each of the project sites is built
 by Maven and is directly checked into the production area and each is
 independently managed.  See
 http://wiki.apache.org/logging/ManagingTheWebSite

 Agree on this option (this what we experimented with maven too: not yet
 live)


This hybrid approach is what I was referring to.


Re: [Math] MATH-894

2012-11-15 Thread Ted Dunning
On Thu, Nov 15, 2012 at 8:42 AM, Phil Steitz phil.ste...@gmail.com wrote:

 On 11/15/12 8:01 AM, Ted Dunning wrote:
 
  The typical answer to this when adding a functional method like compute
 is to also add a view object. The rationale is that a small number of view
 methods can be composed with a small number of compute/aggregate methods to
 get the expressive power of what would otherwise require a vast array of
 methods.

 If I understand correctly, we already have a view object exposed -
 getElements.  The challenge is that this method returns a copy and
 what we would like is a way to get a function computed directly on
 the data encapsulated in the RDA.  Without function pointers or real
 array references, I don't see a straightforward way to do this.


When I say view, I mean something that is a reference and is not a copy.
 The getElements method returns a copy, not a view, under this terminology.

The Colt/Mahout approach is to define a view object which opaquely
remembers a reference to the original, an offset and a length.  Functions
and other arguments can be passed to this view object which operates on a
subset of the original contents by calling the function.  Performance is
actually quite good.  The JIT seems to in-line the view object access to
the underlying object and also in-lines evaluation of the function so that
the actual code that is executed is pretty much what you would write in C,
but you don't have to worry as much since the pattern of access is more
controlled.

For completeness, this is essentially what java.nio does with the *Buffer
classes as well.  You can wrap an array and then you can ask for slices out
of that array while retaining the reference semantics.
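
The java.nio behaviour mentioned above can be sketched with `DoubleBuffer.wrap` and `slice`, both standard library calls. The helper names `view` and `scaleInPlace` are hypothetical. The slice remembers the backing array, an offset and a length, and writes through the slice land in the original array.

```java
import java.nio.DoubleBuffer;

public class BufferViewSketch {
    /** Returns a view of data[offset .. offset + length) that shares storage with data. */
    public static DoubleBuffer view(double[] data, int offset, int length) {
        // wrap sets position = offset and limit = offset + length; slice re-bases at index 0
        return DoubleBuffer.wrap(data, offset, length).slice();
    }

    /** Multiplies every element visible through the view, in place. */
    public static void scaleInPlace(DoubleBuffer view, double factor) {
        for (int i = 0; i < view.limit(); i++) {
            view.put(i, view.get(i) * factor);  // absolute get/put, no copying
        }
    }
}
```

After `scaleInPlace(view(data, 1, 3), 10.0)` only `data[1]` to `data[3]` change, and no copy of the array is made at any point.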


Re: [Math] MATH-894

2012-11-15 Thread Ted Dunning
On Thu, Nov 15, 2012 at 10:04 AM, Phil Steitz phil.ste...@gmail.com wrote:

 Do you know how to do that with a primitive array?  Can you provide
 some sample code?


You don't.  See my next paragraph.

See the assign method in this class:

https://github.com/apache/mahout/blob/trunk/math/src/main/java/org/apache/mahout/math/VectorView.java






 Thanks for your help on this.

 Phil
 
 



Re: [Math] MATH-894

2012-11-15 Thread Ted Dunning
On Thu, Nov 15, 2012 at 10:42 AM, Phil Steitz phil.ste...@gmail.com wrote:

 On 11/15/12 10:29 AM, Ted Dunning wrote:
  On Thu, Nov 15, 2012 at 10:04 AM, Phil Steitz phil.ste...@gmail.com
 wrote:
 
  Do you know how to do that with a primitive array?  Can you provide
  some sample code?
 
  You don't.  See my next paragraph.
 
  See the assign method in this class:
 
 
 https://github.com/apache/mahout/blob/trunk/math/src/main/java/org/apache/mahout/math/VectorView.java

 Interesting.  I see no assign method, but I can see what this thing
 does.  It is not clear to me though how this idea could be
 meaningfully applied to solve the problem we have with applying
 statistics to an RDA without doing any array copying.   Most likely
 I am missing the point.


The assign methods are inherited.  The signatures are like
assign(DoubleFunction), assign(DoubleDoubleFunction, Matrix other) and so
on.

My thought was that if you need to operate on part of an RDA, then a
RDA_View class might do the job.
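
A minimal sketch of what such a view might look like. `View`, `assign` and `aggregate` are hypothetical names modelled loosely on the signatures mentioned above, not the Mahout or Commons Math API, and the `java.util.function` interfaces stand in for DoubleFunction and DoubleDoubleFunction. The view holds a reference to the backing array, so a statistic can be computed over the live elements of a resizable array without copying them.

```java
import java.util.function.DoubleBinaryOperator;
import java.util.function.DoubleUnaryOperator;

public class ArrayViewSketch {
    /** Window onto a backing array; holds a reference, an offset and a length. */
    public static final class View {
        private final double[] backing;
        private final int offset;
        private final int length;

        public View(double[] backing, int offset, int length) {
            if (offset < 0 || length < 0 || offset + length > backing.length) {
                throw new IndexOutOfBoundsException("view outside backing array");
            }
            this.backing = backing;
            this.offset = offset;
            this.length = length;
        }

        public int size() {
            return length;
        }

        /** Replaces each visible element x with f(x), writing into the backing array. */
        public View assign(DoubleUnaryOperator f) {
            for (int i = 0; i < length; i++) {
                backing[offset + i] = f.applyAsDouble(backing[offset + i]);
            }
            return this;
        }

        /** Folds the visible elements left to right, starting from zero. */
        public double aggregate(double zero, DoubleBinaryOperator combiner) {
            double acc = zero;
            for (int i = 0; i < length; i++) {
                acc = combiner.applyAsDouble(acc, backing[offset + i]);
            }
            return acc;
        }
    }
}
```

With a backing array that has spare capacity, `new View(internalArray, startIndex, numElements).aggregate(0.0, Double::sum)` would sum only the live elements; the three argument names in that call are again placeholders.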


Re: [Math] MATH-894

2012-11-15 Thread Ted Dunning
Yes.  Sounds similar.

On Thu, Nov 15, 2012 at 11:02 AM, Phil Steitz phil.ste...@gmail.com wrote:

  The assign methods are inherited.  The signatures are like
  assign(DoubleFunction), assign(DoubleDoubleFunction, Matrix other) and so
  on.

 OK, assign looks like what I was calling evaluate and
 DoubleFunction looks like what I was calling ArrayFunction



Re: [all] moving to svnpubsub or CMS?

2012-11-14 Thread Ted Dunning
On Tue, Nov 13, 2012 at 11:40 PM, Luc Maisonobe l...@spaceroots.org wrote:

 Please, could someone who knows what to do step up?


I can't volunteer the time to do this, but I can say that the process is really
quite simple.  We switched with Drill and the results are not bad at all.
 See http://incubator.apache.org/drill/

All you need to do is translate the pages to mark-down text, copy and adapt
a few headers and stick the resulting files into a standardized directory
structure in SVN.  From there, you notify infra that you have a CMS set of
pages ready to go and, shortly afterwards, you have a CMS-supported site.

To edit pages after that, you can either edit the markdown version in SVN,
check it back in and trigger a CMS rebuild, or, much more easily, use the
javascript bookmark provided for the purpose, which triggers an in-browser
edit of the page.  You can then stage, review and finally publish the page
from your browser.

The entire process is (somewhat voluminously) documented at
http://www.apache.org/dev/cms.html

It really is relatively painless.


Re: [all] moving to svnpubsub or CMS?

2012-11-14 Thread Ted Dunning
On Wed, Nov 14, 2012 at 6:54 AM, Emmanuel Bourg ebo...@apache.org wrote:

 Le 14/11/2012 08:59, Ted Dunning a écrit :

  All you need to do is translate the pages to mark-down text, copy and
 adapt
  a few headers and stick the resulting files into a standardized directory
  structure in SVN.  From there, you notify infra that you have a CMS set
 of
  pages ready to go and shortly later, you have  CMS supported site.

 Is it possible to publish generated content like Javadocs and Maven
 sites with this CMS ?


Yes.  See the answer posted 6 hours before your question.


Re: [math] UTF-8 characters in javadoc comments

2012-11-14 Thread Ted Dunning
On Wed, Nov 14, 2012 at 12:11 AM, Sébastien Brisard 
sebastien.bris...@m4x.org wrote:

  There is no problem with the current setup of our website (at least, the
 website generated locally has no problem).



For the new system, I would like to step up, but I really (really) have
 no clue what you are talking about... I don't know what svnpubsub or CMS
 are.

 There is a pointer in Ted's answer to your other message. I'll read
 through
 the doc, and if I feel not too incompetent, I will try it on CM if you
 want.


I remember exactly that feeling!

svnpubsub[1] is a mechanism that svn supports which allows listeners to
subscribe to changes in an svn repository.  It stands for
svn-publish-subscribe.  CMS[2][3] stands for content management system and
Apache has built their own on top of svnpubsub.  One rationale for
reinventing this was the need for the site to be completely static.

The way that Apache CMS works is that you write documents in mark-down[4]
format which is basically just text with a few wiki-like conventions for
common textual effects such as headers and links.  These documents are
converted to HTML and embedded in page boilerplate using a templating
system similar to Django's [5].

A nice starting point for a totally simple site might be the Drill web site
[6][7].  I say this because it is the only thing that Drill has put into
SVN and is also new and thus relatively simpler than a fully fleshed out
site.

[1] http://www.apache.org/dev/cms.html#svnpubsub
[2] http://en.wikipedia.org/wiki/Content_management_system
[3] http://www.apache.org/dev/cms.html
[4] http://daringfireball.net/projects/markdown/
[5] https://www.djangoproject.com/
[6] http://incubator.apache.org/drill/
[7] http://svn.apache.org/repos/asf/incubator/drill/


Re: [all] moving to svnpubsub or CMS?

2012-11-14 Thread Ted Dunning
On Wed, Nov 14, 2012 at 1:53 PM, Olivier Lamy ol...@apache.org wrote:

 2012/11/14 Thomas Vandahl t...@apache.org:
  On 14.11.2012 08:40, Luc Maisonobe wrote:
 
  Please, could someone who knows what to do step up?
 
 
  Just a quick note that sites created by Maven can be published with
  svnpubsub using the  SCM Publish Maven Plugin
  (http://maven.apache.org/sandbox/plugins/asf-svnpubsub-plugin/). I guess
  this may keep the effort manageable (no further experience, though).
 
  See especially the link
 
 http://maven.apache.org/sandbox/plugins/asf-svnpubsub-plugin/examples/importing-maven-site.html
  for how to do the initial import.
 IMHO first checkin will be simpler doing a checkin of content from p.a.o

 If you use this maven plugin and your project doesn't have any sub
 modules, deploying a site is as simple as today: mvn site-deploy (for
 multi modules it's a bit different; if needed I can help a bit as I have
 set that up for other asf projects and wrote some part of this plugin
 :-) ).

 The question is more do you want to continue maven site generation for
 docs or move to asf cms ?


This is a false dichotomy.

Maven site generation can work with ASF CMS if desired.


Re: [Math] MATH-878: Feature request with patch

2012-11-04 Thread Ted Dunning
On Sun, Nov 4, 2012 at 11:56 AM, Phil Steitz phil.ste...@gmail.com wrote:

 0) Did you or anyone else ever analyze the bigram data in the paper
 using Fisher's test stats?


That bigram data isn't particularly interesting; any text will show similar
effects.

Others have tested Fisher's exact test, but only a few cases turned up
where there was any mileage.  The cost of Fisher's test makes it much less
interesting for the text, genomic, classification and recommendation
applications of G^2.
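
For readers unfamiliar with the statistic, a minimal sketch of G^2 for an r x c table of counts, using the standard log-likelihood ratio form G^2 = 2 * sum over cells of k_ij * ln(k_ij * N / (rowSum_i * colSum_j)), with empty cells contributing zero. The class and method names are hypothetical and this is not the MATH-878 patch.

```java
public class GTestSketch {
    /** Log-likelihood ratio statistic for independence in a table of counts. */
    public static double gSquared(long[][] counts) {
        int rows = counts.length;
        int cols = counts[0].length;
        double[] rowSum = new double[rows];
        double[] colSum = new double[cols];
        double total = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowSum[i] += counts[i][j];
                colSum[j] += counts[i][j];
                total += counts[i][j];
            }
        }
        double sum = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                long k = counts[i][j];
                if (k > 0) {
                    // observed count times log(observed / expected under independence)
                    sum += k * Math.log(k * total / (rowSum[i] * colSum[j]));
                }
            }
        }
        return 2.0 * sum;
    }
}
```

The cost is a single pass over the table, which is a large part of why G^2 is preferred to Fisher's exact test when many tables have to be scored.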

1) Is the bigram data from [1] available anywhere?


I don't think so.  Any small technical text should exhibit similar
characteristics.

You can find more examples in my longer work on the subject:

http://arxiv.org/abs/1207.1847

Most of these examples are based on publicly available data.


  1) Do you think a direct implementation of Fisher's test for 2x2
 designs and a monte carlo impl for r x c would be useful?  I have
 this in C from years ago and could translate it fairly easily.


I have no clue if people want this.   G^2 is pretty well entrenched in text
analysis and recommendations and there have been hundreds of citations to
my original paper, many of which replicated the value of the test.  As
such, I wouldn't expect a lot of value in those applications.

Other areas may well be a different story.  A fully featured implementation
of Fisher's exact test is pretty complex, however, since you have to take
such different tacks at different data scales and with differently shaped
tables.


Re: [Math] MATH-878: Feature request with patch

2012-10-22 Thread Ted Dunning
What kind of check did you want?

I checked the code by eye and supplied several test cases.  You might say
that I am versed in statistics since I am the author of the major paper on
this test as applied to computational linguistics.

On Sun, Oct 21, 2012 at 11:07 PM, Phil Steitz phil.ste...@gmail.com wrote:

 On 10/20/12 3:58 AM, Gilles Sadowski wrote:
  Hello.
 
https://issues.apache.org/jira/browse/MATH-878
 
  Would someone well versed in statistics check that contribution?

 I wanted to get to this this weekend, but was not able to.  I will
 look at it as soon as I can get some free cycles.

 Phil
 
 
  Best regards,
  Gilles
 




Re: [Math] MATH-878: Feature request with patch

2012-10-22 Thread Ted Dunning
On Mon, Oct 22, 2012 at 4:20 AM, Gilles Sadowski 
gil...@harfang.homelinux.org wrote:

 On Sun, Oct 21, 2012 at 11:25:08PM -0700, Ted Dunning wrote:
  What kind of check did you want?

 Well, I'm seeking to know whether the code can be included in Commons
 Math's
 trunk.


Hard for me to say as I am usually out of step with c.m.


 Currently, the answer is a partial no (IMHO), because of the remarks
 which
 I formulated on the JIRA page.

[If it were only that, I would have corrected the formatting problems (to my
 taste).]


Fair.


 Thus: I'd like people to confirm that the code itself fits with the design
 of the o.a.c.m.stat package, and to take the responsibility for
 committing
 the patch (adapted to their taste!). :-)


I can't comment on the design.  Only on whether it seems to do what it says
it should.



  I checked the code by eye and supplied several test cases.  You might say
  that I am versed in statistics since I am the author of the major paper
 on
  this test as applied to computational linguistics.

 Thank you for the _contents_ review. Sorry for the misunderstanding that I
 was talking more about the form.


I can't comment on the form.


Re: [Math] MATH-816 (mixture model distribution) . .

2012-10-18 Thread Ted Dunning
Existing code does have a certain cachet to it.

On Thu, Oct 18, 2012 at 5:13 AM, Patrick Meyer meyer...@gmail.com wrote:

 I vote for simplicity. Current practice in the social sciences is to fit
 multiple models, each with a different number of components, and use fit
 statistics to choose the best model.

 There are some additional features I would like to see added and I have the
 code to contribute if it is not currently there. To be consistent with
 Mplus, we need to have the algorithm use multiple random starts and run a few
 of the best starts to completion. Mplus uses this strategy to effectively
 overcome local minima.


 -Original Message-
 From: Becksfort, Jared [mailto:jared.becksf...@stjude.org]
 Sent: Wednesday, October 17, 2012 11:37 PM
 To: Commons Developers List
 Subject: RE: [Math] MATH-816 (mixture model distribution) . .

 I see.  I am planning to submit the EM fit for multivariate normal mixture
 models in the next couple of weeks (Math-817).  A Gibbs sampling DP fit may
 be a bit further out.   I am not opposed to allowing the number of
 components to change, but I also like the simplicity of this class.
 Whatever you guys decide is probably fine.

 Jared
 
 From: Ted Dunning [ted.dunn...@gmail.com]
 Sent: Wednesday, October 17, 2012 9:41 PM
 To: Commons Developers List
 Subject: Re: [Math] MATH-816 (mixture model distribution) . .

 The issue is that with a fixed number of components, you need to do
 multiple
 runs to find a best fit number of components.  Gibbs sampling against a
 Dirichlet process can get you to the same answer in about the same cost as
 a
 single run of EM with a fixed number of models.

 On Wed, Oct 17, 2012 at 7:31 PM, Becksfort, Jared 
 jared.becksf...@stjude.org wrote:

  Ted,
 
  I am not sure I understand the problem with the fixed number of
  components.  My understanding is that CM prefers immutable objects.
  Adding a component to an object would require reweighting in addition
  to modifying the component list.  A new mixture model could be
  instantiated using the getComponents function and then adding or
  removing more components if necessary.
 
  Jared
  
  From: Ted Dunning [ted.dunn...@gmail.com]
  Sent: Wednesday, October 17, 2012 5:21 PM
  To: Commons Developers List
  Subject: Re: [Math] MATH-816 (mixture model distribution) . .
 
  Seems fine.
 
  I think that the limitation to a fixed number of mixture components is
  a bit limiting.  So is the limitation to a uniform set of components.
  Both limitations can be eased without huge difficulty.
 
  Avoiding the fixed number of components can be done by using some
  variant of Dirichlet processes.  Simply picking k_max relatively large
  and then using an approximate DP over that finite set works well.
 
  That said, mixture models are pretty nice to have.
 
  On Wed, Oct 17, 2012 at 2:13 PM, Gilles Sadowski 
  gil...@harfang.homelinux.org wrote:
 
   Hello.
  
   Any objection to commit the code as proposed on the report page?
 https://issues.apache.org/jira/browse/MATH-816
  
  
   Regards,
   Gilles
  
   
   - To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
   For additional commands, e-mail: dev-h...@commons.apache.org
  
  
 
  Email Disclaimer:  www.stjude.org/emaildisclaimer Consultation
  Disclaimer:  www.stjude.org/consultationdisclaimer
 
 




Re: [Math] MATH-816 (mixture model distribution)

2012-10-17 Thread Ted Dunning
Seems fine.

I think that the limitation to a fixed number of mixture components is a
bit limiting.  So is the limitation to a uniform set of components.  Both
limitations can be eased without huge difficulty.

Avoiding the fixed number of components can be done by using some variant
of Dirichlet processes.  Simply picking k_max relatively large and then
using an approximate DP over that finite set works well.
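
As a rough illustration of the "pick k_max relatively large" idea, the sketch below draws mixture weights from a truncated stick-breaking construction, which is one common finite approximation to a Dirichlet process. The class name, the parameters kMax and alpha, and the choice of stick-breaking itself are assumptions made for this example; nothing here comes from the MATH-816 patch.

```java
import java.util.Random;

/** Hypothetical sketch: truncated stick-breaking weights over at most kMax components. */
final class TruncatedStickBreaking {
    private TruncatedStickBreaking() {}

    /**
     * Draws kMax mixture weights. Each break fraction v is Beta(1, alpha)
     * distributed; component i receives v times whatever stick is left.
     * The last component takes the remaining mass so the weights sum to 1.
     */
    static double[] sampleWeights(int kMax, double alpha, Random rng) {
        if (kMax < 1 || alpha <= 0.0) {
            throw new IllegalArgumentException("kMax must be >= 1 and alpha > 0");
        }
        double[] weights = new double[kMax];
        double remaining = 1.0;
        for (int i = 0; i < kMax - 1; i++) {
            // Beta(1, alpha) by inverse CDF: v = 1 - u^(1/alpha), u uniform on [0, 1)
            double v = 1.0 - Math.pow(rng.nextDouble(), 1.0 / alpha);
            weights[i] = v * remaining;
            remaining *= (1.0 - v);
        }
        weights[kMax - 1] = remaining;
        return weights;
    }
}
```

With a small alpha most of the mass lands on the first few components, so the unused components up to kMax end up with negligible weight and the effective number of components is chosen by the data rather than fixed in advance.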

That said, mixture models are pretty nice to have.

On Wed, Oct 17, 2012 at 2:13 PM, Gilles Sadowski 
gil...@harfang.homelinux.org wrote:

 Hello.

 Any objection to commit the code as proposed on the report page?
   https://issues.apache.org/jira/browse/MATH-816


 Regards,
 Gilles





Re: [Math] MATH-816 (mixture model distribution)

2012-10-17 Thread Ted Dunning
The issue is that with a fixed number of components, you need to do
multiple runs to find a best fit number of components.  Gibbs sampling
against a Dirichlet process can get you to the same answer in about the
same cost as a single run of EM with a fixed number of models.

On Wed, Oct 17, 2012 at 7:31 PM, Becksfort, Jared 
jared.becksf...@stjude.org wrote:

 Ted,

 I am not sure I understand the problem with the fixed number of
 components.  My understanding is that CM prefers immutable objects. Adding
 a component to an object would require reweighting in addition to modifying
 the component list.  A new mixture model could be instantiated using the
 getComponents function and then adding or removing more components if
 necessary.

 Jared
 
 From: Ted Dunning [ted.dunn...@gmail.com]
 Sent: Wednesday, October 17, 2012 5:21 PM
 To: Commons Developers List
 Subject: Re: [Math] MATH-816 (mixture model distribution)

 Seems fine.

 I think that the limitation to a fixed number of mixture components is a
 bit limiting.  So is the limitation to a uniform set of components.  Both
 limitations can be eased without huge difficulty.

 Avoiding the fixed number of components can be done by using some variant
 of Dirichlet processes.  Simply picking k_max relatively large and then
 using an approximate DP over that finite set works well.

 That said, mixture models are pretty nice to have.

 On Wed, Oct 17, 2012 at 2:13 PM, Gilles Sadowski 
 gil...@harfang.homelinux.org wrote:

  Hello.
 
  Any objection to commit the code as proposed on the report page?
https://issues.apache.org/jira/browse/MATH-816
 
 
  Regards,
  Gilles
 




Re: [math] G-Tests in math.stat.inference

2012-10-09 Thread Ted Dunning
Feel free to grab and adapt the Mahout code.  It has some added wrinkles
for convenience like the signed square root variant of the G-test.
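
For readers unfamiliar with the signed square root variant, here is a minimal sketch of a G-test on a 2x2 contingency table together with a signed root. It is written from the textbook definition G = 2 * sum(O * ln(O / E)), not copied from Mahout; the class and method names are hypothetical, and the sign convention (negative when the first row's share of column one is below the second row's) is an assumption for illustration.

```java
/** Hypothetical sketch of a 2x2 G-test (log-likelihood ratio) and its signed square root. */
final class GTestSketch {
    private GTestSketch() {}

    /** x * ln(x), using the convention 0 * ln(0) = 0. */
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    /** Unnormalized entropy of a set of counts: N ln N - sum(k ln k). */
    private static double entropy(long... counts) {
        long total = 0;
        double sum = 0.0;
        for (long k : counts) {
            sum += xLogX(k);
            total += k;
        }
        return xLogX(total) - sum;
    }

    /** G statistic for the table [[k11, k12], [k21, k22]]. */
    static double g(long k11, long k12, long k21, long k22) {
        double rows = entropy(k11 + k12, k21 + k22);
        double cols = entropy(k11 + k21, k12 + k22);
        double cells = entropy(k11, k12, k21, k22);
        // Clamp at zero to absorb floating-point round-off.
        return Math.max(0.0, 2.0 * (rows + cols - cells));
    }

    /** Square root of G, signed by the direction of the association (assumes non-empty rows). */
    static double signedRootG(long k11, long k12, long k21, long k22) {
        double root = Math.sqrt(g(k11, k12, k21, k22));
        double rate1 = (double) k11 / (k11 + k12);
        double rate2 = (double) k21 / (k21 + k22);
        return rate1 < rate2 ? -root : root;
    }
}
```

The signed root keeps the direction of the effect, which the plain G statistic discards, and for large counts it behaves roughly like a standard normal score under independence.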

On Tue, Oct 9, 2012 at 12:53 PM, rado tzvetkov rtzvet...@yahoo.com wrote:

 Also, I already have code and tests to contribute for the G-Test for
 independence, if needed. Apache Mahout also has some code implemented for
 LLR.



Re: [math] Logistic, Probit regression and Tolerance checks

2012-09-07 Thread Ted Dunning
This is great.

A very useful feature would be to allow basic L_1 and L_2 regularization.

This makes it much easier to avoid problems with separable data, where the
unregularized maximum-likelihood coefficients grow without bound.

It might be interesting to think for a moment how easy it would be to
support generalized linear regression in this same package.  Small changes
to the loss function in the optimization should allow you to have not just
logistic and probit regression, but also to get Poisson regression and SVM
in the same framework.
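
To make the "small changes to the loss function" point concrete, the sketch below fits a linear predictor by plain gradient descent with an L_2 penalty and takes the loss derivative as a parameter. Everything in it is an assumption for illustration: the class name, the fit signature, the use of gradient descent rather than Newton-Raphson, and the three example losses. An L_1 penalty is left out because it would need a subgradient or proximal step.

```java
import java.util.function.DoubleBinaryOperator;

/** Hypothetical sketch: one fitting loop, several models via a pluggable loss derivative. */
final class RegularizedGlmSketch {
    private RegularizedGlmSketch() {}

    // Each operator maps (eta, y) to dLoss/dEta, where eta is the linear predictor x . beta.
    static final DoubleBinaryOperator LOGISTIC =
        (eta, y) -> 1.0 / (1.0 + Math.exp(-eta)) - y;   // y in {0, 1}
    static final DoubleBinaryOperator POISSON =
        (eta, y) -> Math.exp(eta) - y;                   // y is a count, log link
    static final DoubleBinaryOperator HINGE =
        (eta, y) -> y * eta < 1.0 ? -y : 0.0;            // y in {-1, +1}, linear SVM

    /**
     * Minimizes mean loss + (lambda / 2) * ||beta||^2 by fixed-step gradient descent.
     * Every column of x is penalized, including an intercept column if one is present.
     */
    static double[] fit(double[][] x, double[] y, DoubleBinaryOperator dLoss,
                        double lambda, double step, int iterations) {
        int n = x.length;
        int p = x[0].length;
        double[] beta = new double[p];
        for (int it = 0; it < iterations; it++) {
            double[] grad = new double[p];
            for (int i = 0; i < n; i++) {
                double eta = 0.0;
                for (int j = 0; j < p; j++) {
                    eta += x[i][j] * beta[j];
                }
                double d = dLoss.applyAsDouble(eta, y[i]);
                for (int j = 0; j < p; j++) {
                    grad[j] += d * x[i][j];
                }
            }
            for (int j = 0; j < p; j++) {
                beta[j] -= step * (grad[j] / n + lambda * beta[j]);
            }
        }
        return beta;
    }
}
```

On perfectly separable data, calling fit with LOGISTIC and lambda = 0 keeps increasing the coefficient the longer it runs, while a positive lambda settles at a finite value, which is the practical benefit of regularization mentioned above.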

On Fri, Sep 7, 2012 at 3:22 AM, marios michaelidis mimari...@hotmail.com wrote:

 I am willing to provide complete Logistic and Probit regression
 algorithms, fitted by maximum likelihood using Newton-Raphson
 optimization, in a programmatically easy way (e.g.
 regression(double matrix[][], double Target[], String Constant,
 double precision, double tolerance)), with academic references and very
 quick (3 secs for a 60k set), with getter methods for all the common
 statistics such as null Deviance, Deviance, AIC, BIC, Chi-square of the
 model, betas, Wald statistics and p values, Cox-Snell R-square,
 Nagelkerke’s R-Square, Pseudo_r2, residuals, probabilities, and
 classification matrix.


