Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-09 Thread Gilles Sadowski
On Sun 10 Nov 2019 at 00:33, Alex Herbert wrote:
>
>
>
> > On 9 Nov 2019, at 18:10, Alex Herbert  wrote:
> >
> >
> >
> >> On 9 Nov 2019, at 12:23, Gilles Sadowski wrote:
> >>
> >>> Snip
> >>
> >> I think that I tried
> >>  $ mvn -Dcheckstyle.failOnViolation=false test
> >>
> >> And still it wouldn't run the test (because I had introduced the
> >> "System.out" forbidden construct).
> >
> > Checkstyle is configured to run in the validate phase (right at the start) 
> > [1]. The default is normally verify. This was copied from commons RNG and 
> > predates my joining the project.
> >
> > So this is why you cannot run ‘mvn test’ as it runs checkstyle:check and 
> > fails you before doing the test.
> >
> > Now that fail on violation is ‘true’, perhaps we should move the check back to 
> > verify. That way you cannot do a 'mvn install' with any checkstyle errors. You 
> > can also run 'mvn checkstyle:check' to run the goal manually.
> >
> > [1] http://maven.apache.org/ref/3.6.2/maven-core/lifecycles.html 
> > 
> Short answer:
>
> *** Checkstyle ignores command-line user properties in favour of the POM 
> configuration. ***
>
> (Examples are shown below)
>
> If you want to run tests and use System.out.print to debug stuff (which is a 
> very reasonable dev action) then:
>
> 1. We move checkstyle to run after the test phase (i.e. in verify)
> 2. You have to put '// CHECKSTYLE: stop all' or e.g. '// CHECKSTYLE: stop 
> RegexpCheck' (for System.out violations) in the file you are working on
> 3. You have to set <failOnViolation>false</failOnViolation> in the POM 
> (temporarily)
> 4. You can use a property that is *unset* in the POM: 
> -Dcheckstyle.maxAllowedViolations=10
>
> I favour 1 or 4.
>
> Option 1 would have less typing and less to remember.

+1 :-)

Gilles

> —
>
> Examples:
>
> Hypothesis: Checkstyle is not using the properties set using -D.
>
>
> User-set properties should override the POM. This is how PMD works. If you 
> make a violation for PMD and do this:
>
> mvn pmd:check -Dpmd.failOnViolation=true -X
>
> You can see:
>
> [DEBUG]   (f) failOnViolation = true
>
> And it fails.
>
>
> mvn pmd:check -Dpmd.failOnViolation=false -X
>
> You can see:
>
> [DEBUG]   (f) failOnViolation = false
>
> And it is allowed.
>
>
> Trying the same with checkstyle:
>
> mvn checkstyle:check  -Dcheckstyle.failOnViolation=true/false 
> -Dcheckstyle.failsOnError=true/false -X
>
> All variants of true/false show the config from the POM:
>
> [DEBUG]   (f) failOnViolation = true
> [DEBUG]   (f) failsOnError = false
>
>
> Other properties work as documented:
>
> mvn checkstyle:check  -Dcheckstyle.maxAllowedViolations=10
>
> This shows as:
>
> [DEBUG]   (f) maxAllowedViolations = 10
>
> And the build passes.
>
>
> If I remove:
>
>   <failOnViolation>true</failOnViolation>
>
> from the checkstyle build plugin configuration in the POM then the 
> command-line user properties work, i.e.
>
> mvn checkstyle:check  -Dcheckstyle.failOnViolation=false
>
> [DEBUG]   (f) failOnViolation = false
> -> PASS
>
> mvn checkstyle:check  -Dcheckstyle.failOnViolation=true
>
> [DEBUG]   (f) failOnViolation = true
> -> FAIL
>
> mvn checkstyle:check
>
> [DEBUG]   (f) failOnViolation = true
> -> FAIL
>
> The last case uses the default value of true. It is still logged to the 
> console. The 'effective POM' does not have the failOnViolation property 
> anywhere, so checkstyle is logging this even though it has not been configured.
>
> I tried checkstyle plugin 3.0.0 and 3.1.0 (the latest). Same result.
>
>
> As a control I put <maxAllowedViolations>10</maxAllowedViolations> in the 
> checkstyle POM configuration. Checkstyle then ignores the same command-line 
> switch that worked before.
>
> *** So it seems checkstyle ignores command-line user properties in favour of 
> the POM configuration. ***
>
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-09 Thread Alex Herbert


> On 9 Nov 2019, at 18:10, Alex Herbert  wrote:
> 
> 
> 
>> On 9 Nov 2019, at 12:23, Gilles Sadowski wrote:
>> 
>>> Snip
>> 
>> I think that I tried
>>  $ mvn -Dcheckstyle.failOnViolation=false test
>> 
>> And still it wouldn't run the test (because I had introduced the
>> "System.out" forbidden construct).
> 
> Checkstyle is configured to run in the validate phase (right at the start) 
> [1]. The default is normally verify. This was copied from commons RNG and 
> predates my joining the project.
> 
> So this is why you cannot run ‘mvn test’ as it runs checkstyle:check and 
> fails you before doing the test.
> 
> Now that fail on violation is ‘true’, perhaps we should move the check back to 
> verify. That way you cannot do a 'mvn install' with any checkstyle errors. You 
> can also run 'mvn checkstyle:check' to run the goal manually.
> 
> [1] http://maven.apache.org/ref/3.6.2/maven-core/lifecycles.html 
> 
Short answer:

*** Checkstyle ignores command-line user properties in favour of the POM 
configuration. ***

(Examples are shown below)

If you want to run tests and use System.out.print to debug stuff (which is a 
very reasonable dev action) then:

1. We move checkstyle to run after the test phase (i.e. in verify)
2. You have to put '// CHECKSTYLE: stop all' or e.g. '// CHECKSTYLE: stop 
RegexpCheck' (for System.out violations) in the file you are working on
3. You have to set <failOnViolation>false</failOnViolation> in the POM 
(temporarily)
4. You can use a property that is *unset* in the POM: 
-Dcheckstyle.maxAllowedViolations=10

I favour 1 or 4.

Option 1 would have less typing and less to remember.
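For reference, option 2 would look something like the following in a source file. This is a hypothetical debugging snippet; the 'resume' marker is assumed to mirror the project's comment-filter configuration:

```java
public class Scratch {
    public static void main(String[] args) {
        // CHECKSTYLE: stop RegexpCheck
        System.out.println("debug value: " + 42); // temporarily allowed
        // CHECKSTYLE: resume RegexpCheck
    }
}
```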

—

Examples:

Hypothesis: Checkstyle is not using the properties set using -D.


User-set properties should override the POM. This is how PMD works. If you make 
a violation for PMD and do this:

mvn pmd:check -Dpmd.failOnViolation=true -X

You can see:

[DEBUG]   (f) failOnViolation = true

And it fails.


mvn pmd:check -Dpmd.failOnViolation=false -X

You can see:

[DEBUG]   (f) failOnViolation = false

And it is allowed.


Trying the same with checkstyle:

mvn checkstyle:check  -Dcheckstyle.failOnViolation=true/false 
-Dcheckstyle.failsOnError=true/false -X

All variants of true/false show the config from the POM:

[DEBUG]   (f) failOnViolation = true
[DEBUG]   (f) failsOnError = false


Other properties work as documented:

mvn checkstyle:check  -Dcheckstyle.maxAllowedViolations=10

This shows as:

[DEBUG]   (f) maxAllowedViolations = 10

And the build passes.


If I remove:

  <failOnViolation>true</failOnViolation>

from the checkstyle build plugin configuration in the POM then the 
command-line user properties work, i.e.

mvn checkstyle:check  -Dcheckstyle.failOnViolation=false

[DEBUG]   (f) failOnViolation = false
-> PASS

mvn checkstyle:check  -Dcheckstyle.failOnViolation=true

[DEBUG]   (f) failOnViolation = true
-> FAIL

mvn checkstyle:check

[DEBUG]   (f) failOnViolation = true
-> FAIL

The last case uses the default value of true. It is still logged to the 
console. The 'effective POM' does not have the failOnViolation property 
anywhere, so checkstyle is logging this even though it has not been configured.

I tried checkstyle plugin 3.0.0 and 3.1.0 (the latest). Same result.


As a control I put <maxAllowedViolations>10</maxAllowedViolations> in the 
checkstyle POM configuration. Checkstyle then ignores the same command-line 
switch that worked before.

*** So it seems checkstyle ignores command-line user properties in favour of 
the POM configuration. ***





Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-09 Thread Alex Herbert


> On 9 Nov 2019, at 12:23, Gilles Sadowski  wrote:
> 
>> Snip
> 
> I think that I tried
>  $ mvn -Dcheckstyle.failOnViolation=false test
> 
> And still it wouldn't run the test (because I had introduced the
> "System.out" forbidden construct).

Checkstyle is configured to run in the validate phase (right at the start) [1]. 
The default is normally verify. This was copied from commons RNG and predates 
my joining the project.

So this is why you cannot run ‘mvn test’ as it runs checkstyle:check and fails 
you before doing the test.

Now that fail on violation is ‘true’, perhaps we should move the check back to 
verify. That way you cannot do a 'mvn install' with any checkstyle errors. You can 
also run 'mvn checkstyle:check' to run the goal manually.

[1] http://maven.apache.org/ref/3.6.2/maven-core/lifecycles.html 


Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-09 Thread Alex Herbert


> On 9 Nov 2019, at 12:23, Gilles Sadowski  wrote:
> 
> Hello Alex.
> 
 [...]
 I think at least the minimum implementation of the abstraction will be
 fairly easy. It can then be performance checked with a few variants.
 
 There currently is not a JMH module for numbers. Should I create one
 under a module included with an optional profile?
>>> 
>>> What is going to be benchmarked?
>> 
>> The operations in Complex applied via a ComplexList or a Complex[], or a
>> specialisation that works directly on the numbers and does not use Complex. If
>> one of the purposes of the ComplexList is efficient computation on large
>> amounts of data then it would be a start to show the efficiency. I can do
>> this on a branch and see if it is useful.
> 
> No need for a branch, as all expressed opinions agree on the usefulness
> of "ComplexList", at least to save RAM.
> Indeed, a JMH module would then hopefully show that no performance is
> lost by the one additional creation of a "Complex" instance when the
> abstraction is backed by "double[]" (wrt "Complex[]").

Since this is the crux of my use case it will be interesting to see if using 
numbers can be as performant as what I am currently doing.

> 
 
 I have spent a fair bit of time trying to fix coverage of Complex. It is
 now at 100% with 95% branch coverage. It currently tests all the edge
 cases for the C standard. However it doesn’t really test actual
 computations.
>>> 
>>> Actual computations?
>>> Coveralls seems OK:
>>> 
>>> https://coveralls.io/builds/26869201/source?filename=commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
>>> 
>> 
>> The point is that you can get 100% coverage by calling all the methods. You
>> don’t have to do assertions on the results. So for example sin() has no test
>> on data that is not an edge case, for example:
>> 
>> complex(3, 4).sin() = ???
>> 
>> The only test is:
>> 
>> assertComplex(z.sin(), negI.multiply(iz.sinh()));
>> 
>> And the tests for sinh() test the edge cases.
>> 
>> This is why the following PR [1] exists that swaps the sign for the
>> imaginary component in the sin() computation. There are no tests to show the
>> functions working on “normal” computations. Essentially the most basic tests
>> that compute the functions on data that should go down the common path are
>> all missing. Only the computation edge cases are asserted. So I’ll create
>> some using GNU g++ to create expected results.
>> 
>> [1] https://github.com/apache/commons-numbers/pull/31/files
>> 
> 
> Got it.  Thanks.
> 
>>> 
 I am going to continue on this to add more tests that the computations
 output the same values as a C reference implementation, and hit all the
 branches.
 
 When working on the class it raised a few questions:
 
 1. HashCode
 
>>> This is currently the same for any number with a NaN component. However the
 C standard states that NaN components are allowed in infinites. So
 perhaps the hashCode should be updated to at least distinguish infinites
 from NaN.
>>> 
>>> What would be the purpose?
>> 
>> For storing in a HashSet. I noted this when first working on the class but
>> on reflection I admit that it won’t matter. The current equals method will
>> distinguish any binary difference so these are not equal:
>> 
>> Complex(inf, NaN)
>> Complex(-inf, NaN)
>> Complex(NaN, inf)
>> Complex(NaN, -inf)
>> 
>> They are all infinite but are different. They will be stored under the same
>> hash along with any other Complex with NaN. It may be of use for searching
>> to have any infinite stored under the same key but one that is different
>> from a NaN.
> 
> Finally, I think that the safer option is to follow the JDK convention
> and ensure that "hashCode is consistent with equals".
> 
> How about using "java.util.Arrays" to define both?

Arrays:
public static int hashCode(double a[]) {
    if (a == null)
        return 0;

    int result = 1;
    for (double element : a) {
        long bits = Double.doubleToLongBits(element);
        result = 31 * result + (int)(bits ^ (bits >>> 32));
    }
    return result;
}

Double:
public static int hashCode(double value) {
    long bits = doubleToLongBits(value);
    return (int)(bits ^ (bits >>> 32));
}


So the same result as Arrays.hashCode should be this:

return 31 * (31 + Double.hashCode(real)) + Double.hashCode(imaginary);

It can be checked it is the same as Arrays.hashCode(new double[]{real, 
imaginary}).
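For example, a quick check (hypothetical helper, not part of Complex) that the proposed expression matches Arrays.hashCode on the interleaved pair:

```java
import java.util.Arrays;

public class ComplexHashCheck {
    /** Candidate hash, intended to equal Arrays.hashCode(new double[]{real, imaginary}). */
    static int hash(double real, double imaginary) {
        return 31 * (31 + Double.hashCode(real)) + Double.hashCode(imaginary);
    }

    public static void main(String[] args) {
        double[][] samples = {
            {3, 4}, {0.0, -0.0}, {Double.NaN, Double.POSITIVE_INFINITY}, {1e300, -1e-300}
        };
        for (double[] s : samples) {
            int expected = Arrays.hashCode(new double[] {s[0], s[1]});
            if (hash(s[0], s[1]) != expected) {
                throw new AssertionError("mismatch for " + Arrays.toString(s));
            }
        }
        System.out.println("hash matches Arrays.hashCode for all samples");
    }
}
```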

The current hash is:

return 37 * (17 * Double.hashCode(imaginary) + Double.hashCode(real));

So not much difference except the factors, order of arguments and that the 
inner operation on imaginary is 

Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-09 Thread Gilles Sadowski
Hello Alex.

>>> [...]
>>> I think at least the minimum implementation of the abstraction will be
>>> fairly easy. It can then be performance checked with a few variants.
>>>
>>> There currently is not a JMH module for numbers. Should I create one
>>> under a module included with an optional profile?
>>
>> What is going to be benchmarked?
>
> The operations in Complex applied via a ComplexList or a Complex[], or a
> specialisation that works directly on the numbers and does not use Complex. If
> one of the purposes of the ComplexList is efficient computation on large
> amounts of data then it would be a start to show the efficiency. I can do
> this on a branch and see if it is useful.

No need for a branch, as all expressed opinions agree on the usefulness
of "ComplexList", at least to save RAM.
Indeed, a JMH module would then hopefully show that no performance is
lost by the one additional creation of a "Complex" instance when the
abstraction is backed by "double[]" (wrt "Complex[]").

>>>
>>> I have spent a fair bit of time trying to fix coverage of Complex. It is
>>> now at 100% with 95% branch coverage. It currently tests all the edge
>>> cases for the C standard. However it doesn’t really test actual
>>> computations.
>>
>> Actual computations?
>> Coveralls seems OK:
>>
>> https://coveralls.io/builds/26869201/source?filename=commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
>> 
>
> The point is that you can get 100% coverage by calling all the methods. You
> don’t have to do assertions on the results. So for example sin() has no test
> on data that is not an edge case, for example:
>
> complex(3, 4).sin() = ???
>
> The only test is:
>
> assertComplex(z.sin(), negI.multiply(iz.sinh()));
>
> And the tests for sinh() test the edge cases.
>
> This is why the following PR [1] exists that swaps the sign for the
> imaginary component in the sin() computation. There are no tests to show the
> functions working on “normal” computations. Essentially the most basic tests
> that compute the functions on data that should go down the common path are
> all missing. Only the computation edge cases are asserted. So I’ll create
> some using GNU g++ to create expected results.
>
> [1] https://github.com/apache/commons-numbers/pull/31/files
> 

Got it.  Thanks.

>>
>>> I am going to continue on this to add more tests that the computations
>>> output the same values as a C reference implementation, and hit all the
>>> branches.
>>>
>>> When working on the class it raised a few questions:
>>>
>>> 1. HashCode
>>>
>>> This is currently the same for any number with a NaN component. However the
>>> C standard states that NaN components are allowed in infinites. So
>>> perhaps the hashCode should be updated to at least distinguish infinites
>>> from NaN.
>>
>> What would be the purpose?
>
> For storing in a HashSet. I noted this when first working on the class but
> on reflection I admit that it won’t matter. The current equals method will
> distinguish any binary difference so these are not equal:
>
> Complex(inf, NaN)
> Complex(-inf, NaN)
> Complex(NaN, inf)
> Complex(NaN, -inf)
>
> They are all infinite but are different. They will be stored under the same
> hash along with any other Complex with NaN. It may be of use for searching
> to have any infinite stored under the same key but one that is different
> from a NaN.

Finally, I think that the safer option is to follow the JDK convention
and ensure that "hashCode is consistent with equals".

How about using "java.util.Arrays" to define both?

>>>
>>> 2. NaN definition
>>>
>>> What is not clear is if there exists a definition of NaN for a complex
>>> number. The C library has no way to test it with a single function call.
>>> You have to test both the real and imaginary components. On top of this
>>> you can do computation with NaNs. The GNU g++ compiler computes:
>>>
>>>(1e300 + i 1e300) * (1e30 + i NAN) = inf + i inf
>>>
>>> Thus this is allowing some computations with NaN to produce results. I
>>> don’t know if this is a bug in GNU g++ or intentional. The C standard
>>> states:
>>>
>>> “If both operands of the * operator are complex or if the second operand
>>> of the / operator
>>> is complex, the operator raises floating-point exceptions if appropriate
>>> for the calculation
>>> of the parts of the result, and may raise spurious floating-point
>>> exceptions.”
>>>
>>> That is pretty vague. It says that you can do what is “appropriate”.
>>>
>>> So it seems that perhaps we should only call a complex a NaN if both
>>> fields are NaN. If only one is NaN then the complex is semi-NaN and
>>> results of computations may or may not make sense.
>>
>> Do you have an example of a NaN that makes sense?
>> What is semi-NaN?
>
> semiNaN = 

Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-09 Thread Alex Herbert


> On 9 Nov 2019, at 02:42, Gilles Sadowski  wrote:
> 
> Hi.
> 
> On Sat 9 Nov 2019 at 01:48, Alex Herbert wrote:
>> 
>> 
>> 
>>> On 7 Nov 2019, at 23:29, Eric Barnhill  wrote:
>>> 
>>> On Thu, Nov 7, 2019 at 3:25 PM Gilles Sadowski  wrote:
>>> 
 On Thu 7 Nov 2019 at 18:36, Eric Barnhill wrote:
> 
> I should also add on this note, my use case for developing ComplexUtils
 in
> the first place was compatibility with JTransforms and JOCL. In both
 cases
> I wanted to convert Complex[] arrays into interleaved double[] arrays to
> feed into algorithms using those libraries.
 
 Implicit in my remark below is the question: Where does the "Complex[]"
 come from?  If it is never a good idea to create such an array, why provide
 code to convert from it?  Do we agree that we should rather create the
 "ComplexList" abstraction, including accessors that shape the data for
 use with e.g. JTransforms?
 
 
>>> I completely agree this is a superior solution and look forward to its
>>> development.
>> 
>> I think at least the minimum implementation of the abstraction will be 
>> fairly easy. It can then be performance checked with a few variants.
>> 
>> There currently is not a JMH module for numbers. Should I create one under a 
>> module included with an optional profile?
> 
> What is going to be benchmarked?

The operations in Complex applied via a ComplexList or a Complex[], or a 
specialisation that works directly on the numbers and does not use Complex. If 
one of the purposes of the ComplexList is efficient computation on large 
amounts of data then it would be a start to show the efficiency. I can do this 
on a branch and see if it is useful.
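A minimal sketch of the interleaved-storage idea under discussion (class name, API and growth policy here are hypothetical, not the eventual commons-numbers design):

```java
import java.util.Arrays;

/** Hypothetical minimal ComplexList backed by an interleaved double[]. */
class ComplexList {
    private double[] data; // layout: [re0, im0, re1, im1, ...]
    private int size;      // number of complex values stored

    ComplexList(int initialCapacity) {
        data = new double[Math.max(2, 2 * initialCapacity)];
    }

    void add(double real, double imaginary) {
        if (2 * size == data.length) {
            data = Arrays.copyOf(data, data.length * 2); // grow
        }
        data[2 * size] = real;
        data[2 * size + 1] = imaginary;
        size++;
    }

    double getReal(int i)      { return data[2 * i]; }
    double getImaginary(int i) { return data[2 * i + 1]; }
    int size()                 { return size; }

    /** Copy in the interleaved layout used by e.g. JTransforms. */
    double[] toInterleaved() {
        return Arrays.copyOf(data, 2 * size);
    }
}
```

Storing two doubles per element avoids the per-instance object header of a Complex[], which is the RAM saving driving the proposal.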

> 
>> 
>> I have spent a fair bit of time trying to fix coverage of Complex. It is now 
>> at 100% with 95% branch coverage. It currently tests all the edge cases for 
>> the C standard. However it doesn’t really test actual computations.
> 
> Actual computations?
> Coveralls seems OK:
>
> https://coveralls.io/builds/26869201/source?filename=commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java
>  
> 

The point is that you can get 100% coverage by calling all the methods. You 
don’t have to do assertions on the results. So for example sin() has no test on 
data that is not an edge case, for example:

complex(3, 4).sin() = ???

The only test is:

assertComplex(z.sin(), negI.multiply(iz.sinh()));

And the tests for sinh() test the edge cases.

This is why the following PR [1] exists that swaps the sign for the imaginary 
component in the sin() computation. There are no tests to show the functions 
working on “normal” computations. Essentially the most basic tests that compute 
the functions on data that should go down the common path are all missing. Only 
the computation edge cases are asserted. So I’ll create some using GNU g++ to 
create expected results.

[1] https://github.com/apache/commons-numbers/pull/31/files 



> 
>> I am going to continue on this to add more tests that the computations 
>> output the same values as a C reference implementation, and hit all the 
>> branches.
>> 
>> When working on the class it raised a few questions:
>> 
>> 1. HashCode
>> 
>> This is currently the same for any number with a NaN component. However the C 
>> standard states that NaN components are allowed in infinites. So perhaps the 
>> hashCode should be updated to at least distinguish infinites from NaN.
> 
> What would be the purpose?

For storing in a HashSet. I noted this when first working on the class but on 
reflection I admit that it won’t matter. The current equals method will 
distinguish any binary difference so these are not equal:

Complex(inf, NaN)
Complex(-inf, NaN)
Complex(NaN, inf)
Complex(NaN, -inf)

They are all infinite but are different. They will be stored under the same 
hash along with any other Complex with NaN. It may be of use for searching to 
have any infinite stored under the same key but one that is different from a 
NaN.

> 
>> 
>> 2. NaN definition
>> 
>> What is not clear is if there exists a definition of NaN for a complex 
>> number. The C library has no way to test it with a single function call. You 
>> have to test both the real and imaginary components. On top of this you can 
>> do computation with NaNs. The GNU g++ compiler computes:
>> 
>>(1e300 + i 1e300) * (1e30 + i NAN) = inf + i inf
>> 
>> Thus this is allowing some computations with NaN to produce results. I don’t 
>> know if this is a bug in GNU g++ or intentional. The C standard states:
>> 
>> “If both operands of the * operator are complex or if the second operand of 
>> the / operator
>> is complex, the operator raises floating-point 

Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-08 Thread Gilles Sadowski
Hi.

On Sat 9 Nov 2019 at 01:48, Alex Herbert wrote:
>
>
>
> > On 7 Nov 2019, at 23:29, Eric Barnhill  wrote:
> >
> > On Thu, Nov 7, 2019 at 3:25 PM Gilles Sadowski  wrote:
> >
> >> On Thu 7 Nov 2019 at 18:36, Eric Barnhill wrote:
> >>>
> >>> I should also add on this note, my use case for developing ComplexUtils
> >> in
> >>> the first place was compatibility with JTransforms and JOCL. In both
> >> cases
> >>> I wanted to convert Complex[] arrays into interleaved double[] arrays to
> >>> feed into algorithms using those libraries.
> >>
> >> Implicit in my remark below is the question: Where does the "Complex[]"
> >> come from?  If it is never a good idea to create such an array, why provide
> >> code to convert from it?  Do we agree that we should rather create the
> >> "ComplexList" abstraction, including accessors that shape the data for
> >> use with e.g. JTransforms?
> >>
> >>
> > I completely agree this is a superior solution and look forward to its
> > development.
>
> I think at least the minimum implementation of the abstraction will be fairly 
> easy. It can then be performance checked with a few variants.
>
> There currently is not a JMH module for numbers. Should I create one under a 
> module included with an optional profile?

What is going to be benchmarked?

>
> I have spent a fair bit of time trying to fix coverage of Complex. It is now 
> at 100% with 95% branch coverage. It currently tests all the edge cases for 
> the C standard. However it doesn’t really test actual computations.

Actual computations?
Coveralls seems OK:

https://coveralls.io/builds/26869201/source?filename=commons-numbers-complex/src/main/java/org/apache/commons/numbers/complex/Complex.java

> I am going to continue on this to add more tests that the computations output 
> the same values as a C reference implementation, and hit all the branches.
>
> When working on the class it raised a few questions:
>
> 1. HashCode
>
> This is currently the same for any number with a NaN component. However the C 
> standard states that NaN components are allowed in infinites. So perhaps the 
> hashCode should be updated to at least distinguish infinites from NaN.

What would be the purpose?

>
> 2. NaN definition
>
> What is not clear is if there exists a definition of NaN for a complex 
> number. The C library has no way to test it with a single function call. You 
> have to test both the real and imaginary components. On top of this you can 
> do computation with NaNs. The GNU g++ compiler computes:
>
> (1e300 + i 1e300) * (1e30 + i NAN) = inf + i inf
>
> Thus this is allowing some computations with NaN to produce results. I don’t 
> know if this is a bug in GNU g++ or intentional. The C standard states:
>
> “If both operands of the * operator are complex or if the second operand of 
> the / operator
> is complex, the operator raises floating-point exceptions if appropriate for 
> the calculation
> of the parts of the result, and may raise spurious floating-point exceptions.”
>
> That is pretty vague. It says that you can do what is “appropriate”.
>
> So it seems that perhaps we should only call a complex a NaN if both fields 
> are NaN. If only one is NaN then the complex is semi-NaN and results of 
> computations may or may not make sense.

Do you have an example of a NaN that makes sense?
What is semi-NaN?

>
> 3. Multiplication
>
> I applied a change to the algorithm provided in the C standard when 
> recovering (NaN,NaN) results of multiplying infinities. If you multiply 
> infinity by zero this should be NaN.
>
> “if one operand is an infinity and the other operand is a nonzero finite 
> number or an
> infinity, then the result of the * operator is an infinity;"
>
> Note the “non-zero” condition.
>
> But the algorithm provided does not do this and you can get an infinite 
> result back.
>
> This is what GNU g++ does:
>
> multiply(0 + i 0, 0 + i inf) = -nan + i -nan
> multiply(-0 + i 0, 0 + i inf) = -nan + i -nan
> multiply(0 + i 0, -0 + i inf) = -nan + i -nan
> multiply(-0 + i 0, -0 + i inf) = -nan + i -nan
> multiply(0 + i 0, 1 + i -inf) = -nan + i -nan
> multiply(-0 + i 0, 1 + i -inf) = -nan + i -nan
>
> So they have implemented the multiply differently because here multiplying 
> infinity by zero is NaN. Note how printf distinguishes the sign of infinite 
> values. I don’t think Java does this. It collates NaN into one 
> representation. I will have to convert to raw long bits to get the sign for a 
> cross reference to a C version.

I don't follow.  Why would it be useful to distinguish -NaN from NaN or i NaN?

>
> 4. Divide
>
> I have applied a change to this algorithm to check for overflow in the 
> division. The reference text states:
>
> “if the first operand is a finite number and the second operand is an 
> infinity, then the
> result of the / operator is a zero;”
>
> But for example GNU g++ does this:
>
> divide(1e+308 + i 1e+308, inf + i inf) = -nan + i 0
>
> This is what the reference 

Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-08 Thread Alex Herbert


> On 7 Nov 2019, at 23:29, Eric Barnhill  wrote:
> 
> On Thu, Nov 7, 2019 at 3:25 PM Gilles Sadowski  wrote:
> 
>> On Thu 7 Nov 2019 at 18:36, Eric Barnhill wrote:
>>> 
>>> I should also add on this note, my use case for developing ComplexUtils
>> in
>>> the first place was compatibility with JTransforms and JOCL. In both
>> cases
>>> I wanted to convert Complex[] arrays into interleaved double[] arrays to
>>> feed into algorithms using those libraries.
>> 
>> Implicit in my remark below is the question: Where does the "Complex[]"
>> come from?  If it is never a good idea to create such an array, why provide
>> code to convert from it?  Do we agree that we should rather create the
>> "ComplexList" abstraction, including accessors that shape the data for
>> use with e.g. JTransforms?
>> 
>> 
> I completely agree this is a superior solution and look forward to its
> development.

I think at least the minimum implementation of the abstraction will be fairly 
easy. It can then be performance checked with a few variants.

There currently is not a JMH module for numbers. Should I create one under a 
module included with an optional profile?



I have spent a fair bit of time trying to fix coverage of Complex. It is now at 
100% with 95% branch coverage. It currently tests all the edge cases for the C 
standard. However it doesn’t really test actual computations. I am going to 
continue on this to add more tests that the computations output the same values 
as a C reference implementation, and hit all the branches.

When working on the class it raised a few questions:

1. HashCode

This is currently the same for any number with a NaN component. However the C 
standard states that NaN components are allowed in infinites. So perhaps the 
hashCode should be updated to at least distinguish infinites from NaN.

2. NaN definition

What is not clear is if there exists a definition of NaN for a complex number. 
The C library has no way to test it with a single function call. You have to 
test both the real and imaginary components. On top of this you can do 
computation with NaNs. The GNU g++ compiler computes:

(1e300 + i 1e300) * (1e30 + i NAN) = inf + i inf

Thus this is allowing some computations with NaN to produce results. I don’t 
know if this is a bug in GNU g++ or intentional. The C standard states:

“If both operands of the * operator are complex or if the second operand of the 
/ operator
is complex, the operator raises floating-point exceptions if appropriate for 
the calculation
of the parts of the result, and may raise spurious floating-point exceptions.”

That is pretty vague. It says that you can do what is “appropriate”.

So it seems that perhaps we should only call a complex a NaN if both fields are 
NaN. If only one is NaN then the complex is semi-NaN and results of 
computations may or may not make sense.
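Under that stricter definition, a per-component test would look like this (helper names are hypothetical, chosen only for illustration):

```java
public class ComplexNaN {
    /** NaN only when both parts are NaN (the stricter definition suggested above). */
    static boolean isNaN(double re, double im) {
        return Double.isNaN(re) && Double.isNaN(im);
    }

    /** Exactly one part NaN: "semi-NaN"; computations may still produce usable results. */
    static boolean isSemiNaN(double re, double im) {
        return Double.isNaN(re) ^ Double.isNaN(im);
    }

    public static void main(String[] args) {
        System.out.println(isNaN(Double.NaN, Double.NaN));   // true
        System.out.println(isSemiNaN(1e30, Double.NaN));     // true
        System.out.println(isNaN(1e30, Double.NaN));         // false
    }
}
```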

3. Multiplication

I applied a change to the algorithm provided in the C standard when recovering 
(NaN,NaN) results of multiplying infinities. If you multiply infinity by zero 
this should be NaN. 

“if one operand is an infinity and the other operand is a nonzero finite number 
or an
infinity, then the result of the * operator is an infinity;"

Note the “non-zero” condition.

But the algorithm provided does not do this and you can get an infinite result 
back. 

This is what GNU g++ does:

multiply(0 + i 0, 0 + i inf) = -nan + i -nan
multiply(-0 + i 0, 0 + i inf) = -nan + i -nan
multiply(0 + i 0, -0 + i inf) = -nan + i -nan
multiply(-0 + i 0, -0 + i inf) = -nan + i -nan
multiply(0 + i 0, 1 + i -inf) = -nan + i -nan
multiply(-0 + i 0, 1 + i -inf) = -nan + i -nan

So they have implemented the multiply differently because here multiplying 
infinity by zero is NaN. Note how printf distinguishes the sign of infinite 
values. I don’t think Java does this. It collates NaN into one representation. 
I will have to convert to raw long bits to get the sign for a cross reference 
to a C version.
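The (NaN, NaN) results above fall straight out of a plain component-wise multiply; a quick sketch with raw doubles (not the Complex class) for the first g++ case:

```java
public class MultiplyZeroByInfinity {
    public static void main(String[] args) {
        // (a + bi)(c + di) = (ac - bd) + (ad + bc)i, computed naively
        // for (0 + i 0) * (0 + i inf).
        double a = 0, b = 0, c = 0, d = Double.POSITIVE_INFINITY;
        double re = a * c - b * d; // 0*0 - 0*inf = 0 - NaN = NaN
        double im = a * d + b * c; // 0*inf + 0*0 = NaN + 0 = NaN
        System.out.println(re + " + i " + im); // NaN + i NaN
    }
}
```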

4. Divide

I have applied a change to this algorithm to check for overflow in the 
division. The reference text states:

“if the first operand is a finite number and the second operand is an infinity, 
then the
result of the / operator is a zero;”

But for example GNU g++ does this:

divide(1e+308 + i 1e+308, inf + i inf) = -nan + i 0

This is what the reference algorithm does if it is not changed. Here overflow 
causes problems in the result (which should be zero). So I think my patch is 
correct and the result should be (0 + i 0). The current unit tests enforce 
this requirement.
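A sketch of where the overflow comes from (the textbook formula, not the actual patched code):

```java
// (a + i b) / (c + i d) via the textbook formula:
//   re = (a*c + b*d) / (c*c + d*d),  im = (b*c - a*d) / (c*c + d*d)
double a = 1e308, b = 1e308;                      // finite operand
double c = Double.POSITIVE_INFINITY, d = c;       // infinite operand
double denom = c * c + d * d;                     // infinity
double re = (a * c + b * d) / denom;              // inf / inf = NaN
double im = (b * c - a * d) / denom;              // (inf - inf) / inf = NaN
// The standard text quoted above requires (0 + i 0) here.
```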

5. Edge cases

Some of the methods have a large number of if statements to test edge cases. 
These are very verbose and hard to follow. They also leave the common path (a 
non-infinite, non-NaN, non-zero value) until the end, so a rearrangement may 
allow better branch prediction for the common use case. I’ll carry on working 
on this as I add more tests. 

6. tanh

I have put a TODO note in the code about this. It tests 

Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-07 Thread Eric Barnhill
On Thu, Nov 7, 2019 at 3:25 PM Gilles Sadowski  wrote:

> Le jeu. 7 nov. 2019 à 18:36, Eric Barnhill  a
> écrit :
> >
> > I should also add on this note, my use case for developing ComplexUtils
> in
> > the first place was compatibility with JTransforms and JOCL. In both
> cases
> > I wanted to convert Complex[] arrays into interleaved double[] arrays to
> > feed into algorithms using those libraries.
>
> Implicit in my remark below is the question: Where does the "Complex[]"
> come from?  If it is never a good idea to create such an array, why provide
> code to convert from it?  Do we agree that we should rather create the
> "ComplexList" abstraction, including accessors that shape the data for
> use with e.g. JTransforms?
>
>
I completely agree this is a superior solution and look forward to its
development.


Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-07 Thread Gilles Sadowski
Le jeu. 7 nov. 2019 à 18:36, Eric Barnhill  a écrit :
>
> I should also add on this note, my use case for developing ComplexUtils in
> the first place was compatibility with JTransforms and JOCL. In both cases
> I wanted to convert Complex[] arrays into interleaved double[] arrays to
> feed into algorithms using those libraries.

Implicit in my remark below is the question: Where does the "Complex[]"
come from?  If it is never a good idea to create such an array, why provide
code to convert from it?  Do we agree that we should rather create the
"ComplexList" abstraction, including accessors that shape the data for
use with e.g. JTransforms?

Regards,
Gilles

> On Thu, Nov 7, 2019 at 9:34 AM Eric Barnhill  wrote:
>
> >
> >
> > On Thu, Nov 7, 2019 at 6:09 AM Gilles Sadowski 
> > wrote:
> >
> >>
> >> This is also what started this thread: The user called the Commons Math's
> >> FFT utilities using arrays of "Complex" objects and got the "OutOfMemory"
> >> error.  Hence the question of whether storing many "Complex" objects is
> >> ever useful, as compared to the "ComplexList", backed with an array of
> >> primitives, and instantiating transient "Complex" instances on-demand.
> >>
> >
> > I'm glad it has provoked such improvements. As you implicitly reference in
> > your reply, probably we should just be shading JTransforms in the first
> > place. I started using JTransforms because I had trouble using the
> > commons-math FFT as well.
> >

-
To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
For additional commands, e-mail: dev-h...@commons.apache.org



Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-07 Thread Eric Barnhill
I should also add on this note, my use case for developing ComplexUtils in
the first place was compatibility with JTransforms and JOCL. In both cases
I wanted to convert Complex[] arrays into interleaved double[] arrays to
feed into algorithms using those libraries.

On Thu, Nov 7, 2019 at 9:34 AM Eric Barnhill  wrote:

>
>
> On Thu, Nov 7, 2019 at 6:09 AM Gilles Sadowski 
> wrote:
>
>>
>> This is also what started this thread: The user called the Commons Math's
>> FFT utilities using arrays of "Complex" objects and got the "OutOfMemory"
>> error.  Hence the question of whether storing many "Complex" objects is
>> ever useful, as compared to the "ComplexList", backed with an array of
>> primitives, and instantiating transient "Complex" instances on-demand.
>>
>
> I'm glad it has provoked such improvements. As you implicitly reference in
> your reply, probably we should just be shading JTransforms in the first
> place. I started using JTransforms because I had trouble using the
> commons-math FFT as well.
>


Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-07 Thread Eric Barnhill
On Thu, Nov 7, 2019 at 6:09 AM Gilles Sadowski  wrote:

>
> This is also what started this thread: The user called the Commons Math's
> FFT utilities using arrays of "Complex" objects and got the "OutOfMemory"
> error.  Hence the question of whether storing many "Complex" objects is
> ever useful, as compared to the "ComplexList", backed with an array of
> primitives, and instantiating transient "Complex" instances on-demand.
>

I'm glad it has provoked such improvements. As you implicitly reference in
your reply, probably we should just be shading JTransforms in the first
place. I started using JTransforms because I had trouble using the
commons-math FFT as well.


Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-07 Thread Gilles Sadowski
>  [...]
> >>>
> >>> public class ComplexArray {
> >>> private final MultidimensionalCounter counter;
> >>> // ...
> >>>
> >>> public ComplexArray(int... dimensions) {
> >>> counter = new MultidimensionalCounter(dimensions);
> >>> data = new Complex[counter.getSize()];
> >>> }
> >>>
> >>> /**
> >>>  * @param indices Storage location.
> >>>  * @return the complex number stored at the given location.
> >>>  */
> >>> public Complex get(int... indices) {
> >>> return data[counter.getCount(indices)];
> >>> }
> >>> }
> >>>
> >>> Or perhaps the data cube should be an additional abstraction on top of
> >>> "ComplexArray" (?).
> >> I think adding the abstraction on top is easier.
> >>
> >> Array can be renamed to Collection/List? Thus the count is the only
> >> thing that matters, and optionally a constant time access get(int index)
> >> method.
> >
> > +1
> >
> >>
> >> This can later be reshaped to a Matrix representation using the
> >> MultidimensionalCounter pattern.
> >>
> >> We have two use cases:
> >>
> >> 1. You know the final count of Complex objects
> >> 2. You do not know the count of Complex objects
> >>
> >> Use case 2 is used in streams where the ComplexList must be expandable
> >> for use as a collector. This can be satisfied with a constructor with an
> >> initial capacity. The ComplexList would thus offer a specialisation of
> >> ArrayList by storing the Complex objects using primitive array(s).
> >>
> >> Use case 2 rules out the possibility of immutability. This is the type
> >> of functionality under discussion and could be served using an interface
> >> to allow different implementations:
> >>
> >> public interface ComplexList {
> >> [...]
> >> }
> >
> > +1
> >
> >> This could be separated to put the set and add methods in a
> >> sub-interface allowing the top level interface to be an immutable object.
> >
> > Perhaps better to follow the JDK convention and provide a
> >public static ComplexList unmodifiableList(ComplexList delegate)
> > method.
> >
> >> However the apply functions currently are specified using Complex.
> >> Efficiency would be gained using primitive specialisations for operating
> >> on 1 or 2 complex numbers and outputting the results to a provided 
> >> consumer:
> >>
> >> public interface ComplexProvider {
> >> /**
> >>  * Gets the real component of the complex number.
> >>  *
> >>  * @return the real component
> >>  */
> >> double getReal();
> >>
> >> /**
> >>  * Gets the imaginary component of the complex number.
> >>  *
> >>  * @return the imaginary component
> >>  */
> >> double getImaginary();
> >> }
> >>
> >> public interfaceComplexProvider {
> >> /**
> >>  * Accept the components of the complex number.
> >>  * Optionally return a new ComplexProvider with the components.
> >>  *
> >>  * @param real the real
> >>  * @param imaginary the imaginary
> >>  * @return the complex provider (or null)
> >>  */
> >> ComplexProvider accept(double real, double imaginary);
> >> }
> >
> > What is gained wrt using "Complex" instances directly?
>
> The above is a typo. The accept method should be in a ComplexConsumer 
> interface.
>
> If the data is in a primitive array you would like to be able to access it, 
> and then set it back to the same array or a different array without going 
> through creation of a Complex for each (real, imaginary) pair. So the 
> ComplexProvider and ComplexConsumer provide an input and output for any 
> function that changes the complex data. I’ve made it a bit more complicated 
> with the ComplexConsumer returning a ComplexProvider just to allow all the 
> methods in Complex to be rewritten to provide static methods that can use 
> these interfaces.

I can see that one pass through the primitive array could yield the "real"
and "imaginary" arguments to "accept"; but it must return an instance of
a class (that implements "ComplexProvider"); hence calling a constructor.
In a chain of operations, only the first call to the constructor is avoided.

>
> A simpler approach is for the ComplexList to allow modification in-place by 
> providing a list cursor that can traverse the list (like an iterator) and 
> also skip to positions. It would allow get and set of the data at the current 
> position. This cursor could be used by another class to apply functions in 
> place to the data. The functions could be a user specified generic function 
> using the ComplexFunction interface I suggested.

Still, I don't see the gain wrt creating a result "ComplexList" and
letting the caller decide whether to replace the input data with the result:
ComplexList in = ...
ComplexList out = in.apply(function);
in = out;

> I think that performance would have to be assessed for various prototypes. 
> Some of the 

Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-06 Thread Alex Herbert



> On 6 Nov 2019, at 18:08, Gilles Sadowski  wrote:
> 
> Hi.
> 
 [...]
>>> 
>>> public class ComplexArray {
>>> private final MultidimensionalCounter counter;
>>> // ...
>>> 
>>> public ComplexArray(int... dimensions) {
>>> counter = new MultidimensionalCounter(dimensions);
>>> data = new Complex[counter.getSize()];
>>> }
>>> 
>>> /**
>>>  * @param indices Storage location.
>>>  * @return the complex number stored at the given location.
>>>  */
>>> public Complex get(int... indices) {
>>> return data[counter.getCount(indices)];
>>> }
>>> }
>>> 
>>> Or perhaps the data cube should be an additional abstraction on top of
>>> "ComplexArray" (?).
>> I think adding the abstraction on top is easier.
>> 
>> Array can be renamed to Collection/List? Thus the count is the only
>> thing that matters, and optionally a constant time access get(int index)
>> method.
> 
> +1
> 
>> 
>> This can later be reshaped to a Matrix representation using the
>> MultidimensionalCounter pattern.
>> 
>> We have two use cases:
>> 
>> 1. You know the final count of Complex objects
>> 2. You do not know the count of Complex objects
>> 
>> Use case 2 is used in streams where the ComplexList must be expandable
>> for use as a collector. This can be satisfied with a constructor with an
>> initial capacity. The ComplexList would thus offer a specialisation of
>> ArrayList by storing the Complex objects using primitive array(s).
>> 
>> Use case 2 rules out the possibility of immutability. This is the type
>> of functionality under discussion and could be served using an interface
>> to allow different implementations:
>> 
>> public interface ComplexList {
>> [...]
>> }
> 
> +1
> 
>> This could be separated to put the set and add methods in a
>> sub-interface allowing the top level interface to be an immutable object.
> 
> Perhaps better to follow the JDK convention and provide a
>public static ComplexList unmodifiableList(ComplexList delegate)
> method.
> 
>> However the apply functions currently are specified using Complex.
>> Efficiency would be gained using primitive specialisations for operating
>> on 1 or 2 complex numbers and outputting the results to a provided consumer:
>> 
>> public interface ComplexProvider {
>> /**
>>  * Gets the real component of the complex number.
>>  *
>>  * @return the real component
>>  */
>> double getReal();
>> 
>> /**
>>  * Gets the imaginary component of the complex number.
>>  *
>>  * @return the imaginary component
>>  */
>> double getImaginary();
>> }
>> 
>> public interfaceComplexProvider {
>> /**
>>  * Accept the components of the complex number.
>>  * Optionally return a new ComplexProvider with the components.
>>  *
>>  * @param real the real
>>  * @param imaginary the imaginary
>>  * @return the complex provider (or null)
>>  */
>> ComplexProvider accept(double real, double imaginary);
>> }
> 
> What is gained wrt using "Complex" instances directly?

The above is a typo. The accept method should be in a ComplexConsumer interface.

If the data is in a primitive array you would like to be able to access it, and 
then set it back to the same array or a different array without going through 
creation of a Complex for each (real, imaginary) pair. So the ComplexProvider 
and ComplexConsumer provide an input and output for any function that changes 
the complex data. I’ve made it a bit more complicated with the ComplexConsumer 
returning a ComplexProvider just to allow all the methods in Complex to be 
rewritten to provide static methods that can use these interfaces.

A simpler approach is for the ComplexList to allow modification in-place by 
providing a list cursor that can traverse the list (like an iterator) and also 
skip to positions. It would allow get and set of the data at the current 
position. This cursor could be used by another class to apply functions in 
place to the data. The functions could be a user specified generic function 
using the ComplexFunction interface I suggested.

I think that performance would have to be assessed for various prototypes. Some 
of the methods in Complex have a lot of work. So the creation of a Complex for 
each position may not be a big overhead allowing a stream approach to build a 
custom process pipeline which is then collected into a new list at the end. For 
operations that are fast such as negate, add, subtract, conjugate these would 
be easy to code into the ComplexList to operate directly on the data array. 

> 
>> [...]
>> 
>> Unfortunately processing the stream as encountered may result in out of
>> order elements if parallelisation occurs so the results may not be
>> collected back in the same order.
> 
> IIUC, this would be a non-starter for the data cube abstraction.
> 
>> Here there are 

Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-06 Thread Gilles Sadowski
Hi.

>>> [...]
> >
> > public class ComplexArray {
> >  private final MultidimensionalCounter counter;
> >  // ...
> >
> >  public ComplexArray(int... dimensions) {
> >  counter = new MultidimensionalCounter(dimensions);
> >  data = new Complex[counter.getSize()];
> >  }
> >
> >  /**
> >   * @param indices Storage location.
> >   * @return the complex number stored at the given location.
> >   */
> >  public Complex get(int... indices) {
> >  return data[counter.getCount(indices)];
> >  }
> > }
> >
> > Or perhaps the data cube should be an additional abstraction on top of
> > "ComplexArray" (?).
> I think adding the abstraction on top is easier.
>
> Array can be renamed to Collection/List? Thus the count is the only
> thing that matters, and optionally a constant time access get(int index)
> method.

+1

>
> This can later be reshaped to a Matrix representation using the
> MultidimensionalCounter pattern.
>
> We have two use cases:
>
> 1. You know the final count of Complex objects
> 2. You do not know the count of Complex objects
>
> Use case 2 is used in streams where the ComplexList must be expandable
> for use as a collector. This can be satisfied with a constructor with an
> initial capacity. The ComplexList would thus offer a specialisation of
> ArrayList by storing the Complex objects using primitive array(s).
>
> Use case 2 rules out the possibility of immutability. This is the type
> of functionality under discussion and could be served using an interface
> to allow different implementations:
>
> public interface ComplexList {
> [...]
> }

+1

> This could be separated to put the set and add methods in a
> sub-interface allowing the top level interface to be an immutable object.

Perhaps better to follow the JDK convention and provide a
public static ComplexList unmodifiableList(ComplexList delegate)
method.

> However the apply functions currently are specified using Complex.
> Efficiency would be gained using primitive specialisations for operating
> on 1 or 2 complex numbers and outputting the results to a provided consumer:
>
>  public interface ComplexProvider {
>  /**
>   * Gets the real component of the complex number.
>   *
>   * @return the real component
>   */
>  double getReal();
>
>  /**
>   * Gets the imaginary component of the complex number.
>   *
>   * @return the imaginary component
>   */
>  double getImaginary();
>  }
>
>  public interface ComplexConsumer {
>  /**
>   * Accept the components of the complex number.
>   * Optionally return a new ComplexProvider with the components.
>   *
>   * @param real the real
>   * @param imaginary the imaginary
>   * @return the complex provider (or null)
>   */
>  ComplexProvider accept(double real, double imaginary);
>  }

What is gained wrt using "Complex" instances directly?

> [...]
>
> Unfortunately processing the stream as encountered may result in out of
> order elements if parallelisation occurs so the results may not be
> collected back in the same order.

IIUC, this would be a non-starter for the data cube abstraction.

> Here there are solutions to different problems. So which are we trying
> to solve:
>
> 1. Efficient storage of large amounts of complex numbers.
>
> 2. Efficient computation on large amounts of complex numbers.

Those two IMO.

>
> 3. Custom computation on complex numbers.

Not sure what you mean.

>
> 4. Computation on large amounts of complex numbers in-place.

This would assume very, very, large amounts of data.
For applications that would manage with data that fill at most half the
available memory, this could be emulated with
   ComplexList data = ...
   data = function.apply(data);

And, with a "data cube" abstraction with several underlying blocks
of data, the amount of needed "spare" RAM could be reduced at will:
Instead of storing one block of 1000 x 1000, one could store 4 of
500 x 500; processing them sequentially would require four times less
spare RAM.
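As an illustration only (hypothetical class and method names, not an agreed API), such a block-structured store could look like:

```java
import java.util.function.UnaryOperator;

final class BlockedComplexData {
    // Each block holds interleaved (re, im) pairs.
    private final double[][] blocks;

    BlockedComplexData(int blockCount, int complexPerBlock) {
        blocks = new double[blockCount][2 * complexPerBlock];
    }

    double[] block(int i) {
        return blocks[i];
    }

    /**
     * Applies an out-of-place transform one block at a time, so the
     * "spare" RAM needed is bounded by a single block, not the whole cube.
     */
    void transformBlockwise(UnaryOperator<double[]> op) {
        for (int i = 0; i < blocks.length; i++) {
            blocks[i] = op.apply(blocks[i]);  // old block becomes garbage
        }
    }
}
```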

> For 1 we can use the abstraction to a collection (e.g. ComplexList).
>
> For 2 we can duplicate all methods in Complex to work directly on the
> underlying data.
>
> For 2/3 we can rewrite computation of Complex to work on input providers
> and output consumers and allow functions to be applied to the collection.
>
> For 4 we require different mutability of the collection.

I don't think that immutability is a very useful requirement when handling
very large amounts of data.  E.g. the data cube should probably allow this
kind of usage:
   ComplexCube c = ...
   c.transformInPlace(function);

> I will continue to think about this as the solution to all issues is not
> straightforward.

I didn't want to push this too far.  The sole use-case I was aware of is an
"OutOfMemoryError" due to storing "Complex" 

Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-06 Thread Alex Herbert


On 06/11/2019 12:41, Gilles Sadowski wrote:

Hello.

Le mar. 5 nov. 2019 à 18:38, Alex Herbert  a écrit :

On 05/11/2019 00:09, Eric Barnhill wrote:

That's interesting. The JTransforms library performs Fourier transforms
that can take complex input, output, or both. They do this with interleaved
double[] arrays, which I suppose is much more space efficient, and the
status of a number as real or imaginary is implicit by its location being
odd or even.

The packing of multi-dimensional structures into a single array is
common in image processing. It has computational efficiency when
operating on the entire set of numbers.

It is used a lot in image processing to pack 2D data as a single array.
In this case even random access to the (x,y) index as:

2D: [x][y]
1D: [y * maxx + x]

is comparable in speed, as the latter avoids the second array dereference.

I suspected that it might be the case.
If you have benchmark data that confirm it, then is there ever any
advantage to a multidimensional array?


A single array has advantages for a copy operation as it requires a
single .clone() method call. The disadvantage is that the ultimate size
of the multi-dimensional array is limited by the maximum length of a single array.

That's ~17 GB, and more than 1 billion complex numbers.
And if there are use-cases that require very large data cubes, then the
abstraction would allow to add further indirections.


In the case of complex numbers the use of alternating indices to store
real and imaginary allows the array to be easily resized and extended.
Operations that process all positions only need to cache in memory a
sub-section of the array.

[re1, im1, re2, im2, re3, im3]

The alternative is to store real and imaginary in a large block in
either 1 array or 2 arrays:

[re1, re2, re3, im1, im2, im3]

[re1, re2, re3]
[im1, im2, im3]

This has advantages for operations that require either the real or the
imaginary components on their own. It is a bit more work to extend the
data as it requires two copies of the existing data and packing of the
new values at the appropriate position.

For small sizes the use of 2 arrays would be less memory efficient. It
would require some type of pointer to hold the pair. However it could
store double the number of complex numbers.

The MultidimensionalCounter functionality you mention is for example known
in Matlab as ind2sub() and sub2ind() . It allows for 1d vectorizing of
certain functions which can improve performance. Some people swear by it. I
always found it an additional mental exercise that I didn't want to occupy
myself with unless I absolutely had to. So, maybe it makes sense to
"advertise" this approach like you say, but some users may just want the 3D
arrays for more rapid prototyping-ish applications.

As Alex has shown, the 1D representation is abstracted away: The data is
accessed as a multi-dimensional array and the user never needs to remember
that it is interleaved.


I wonder if there isn't some streaming solution for this -- the numbers are
stored as an interleaved 1D array, but are streamed through a Complex
constructor before any needed operations are performed.

My initial thought is not as double values. A DoubleStream processes
single values. To collect the real and imaginary components would
require that the stream can be processed two at a time.

This structure would work:

class ComplexArray {
  // Must be even in length. Packed as (real, imaginary)
  double[] data;
  int size;

  Stream<Complex> stream() {
      return IntStream.range(0, size / 2)
          .mapToObj(i -> Complex.ofCartesian(data[i*2], data[i*2+1]));
  }

  void add(Complex c) {
  checkCapacity(2);
  data[size++] = c.getReal();
  data[size++] = c.getImaginary();
  }

Perhaps "store" would be less confusing (?).
Final API names should have some consensus vote. At the moment this is a 
prototype for functionality.



  void combine(ComplexArray c) {
  checkCapacity(c.size);
  System.arraycopy(c.data, 0, data, size, c.size);
  size += c.size;
  }

  private void checkCapacity(int length) {
  final int minCapacity = size + length;
  final int oldCapacity = data.length;
  if (minCapacity > oldCapacity) {
  int newCapacity = ((oldCapacity * 3) >>> 1) + 1;
  if (newCapacity < minCapacity) {
  newCapacity = minCapacity;
  }
  data = Arrays.copyOf(data, newCapacity);
  }
  }
}

To allow easy usage as a multidimensional array, the API should contain
the "MultidimensionalCounter"[1] or equivalent:

public class ComplexArray {
 private final MultidimensionalCounter counter;
 // ...

 public ComplexArray(int... dimensions) {
 counter = new MultidimensionalCounter(dimensions);
 data = new Complex[counter.getSize()];
 }

 /**
  * @param indices Storage location.
  * @return the 

Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-06 Thread Gilles Sadowski
Hello.

Le mar. 5 nov. 2019 à 18:38, Alex Herbert  a écrit :
>
> On 05/11/2019 00:09, Eric Barnhill wrote:
> > That's interesting. The JTransforms library performs Fourier transforms
> > that can take complex input, output, or both. They do this with interleaved
> > double[] arrays, which I suppose is much more space efficient, and the
> > status of a number as real or imaginary is implicit by its location being
> > odd or even.
>
> The packing of multi-dimensional structures into a single array is
> common in image processing. It has computational efficiency when
> operating on the entire set of numbers.
>
> It is used a lot in image processing to pack 2D data as a single array.
> In this case even random access to the (x,y) index as:
>
> 2D: [x][y]
> 1D: [y * maxx + x]
>
> is comparable in speed, as the latter avoids the second array dereference.

I suspected that it might be the case.
If you have benchmark data that confirm it, then is there ever any
advantage to a multidimensional array?

> A single array has advantages for a copy operation as it requires a
> single .clone() method call. The disadvantage is that the ultimate size
> of the multi-dimensional array is limited by the maximum length of a single array.

That's ~17 GB, and more than 1 billion complex numbers.
And if there are use-cases that require very large data cubes, then the
abstraction would allow to add further indirections.
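A quick back-of-envelope check of those figures (assuming the length limit of a single `double[]` is about `Integer.MAX_VALUE` elements; real JVMs allow slightly fewer):

```java
long maxElements = Integer.MAX_VALUE;    // upper bound on double[] length
long maxBytes = maxElements * 8L;        // 17_179_869_176 bytes, ~17.2 GB
long maxComplex = maxElements / 2;       // 1_073_741_823 interleaved values
```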

> In the case of complex numbers the use of alternating indices to store
> real and imaginary allows the array to be easily resized and extended.
> Operations that process all positions only need to cache in memory a
> sub-section of the array.
>
> [re1, im1, re2, im2, re3, im3]
>
> The alternative is to store real and imaginary in a large block in
> either 1 array or 2 arrays:
>
> [re1, re2, re3, im1, im2, im3]
>
> [re1, re2, re3]
> [im1, im2, im3]
>
> This has advantages for operations that require either the real or the
> imaginary components on their own. It is a bit more work to extend the
> data as it requires two copies of the existing data and packing of the
> new values at the appropriate position.
>
> For small sizes the use of 2 arrays would be less memory efficient. It
> would require some type of pointer to hold the pair. However it could
> store double the number of complex numbers.
> >
> > The MultidimensionalCounter functionality you mention is for example known
> > in Matlab as ind2sub() and sub2ind() . It allows for 1d vectorizing of
> > certain functions which can improve performance. Some people swear by it. I
> > always found it an additional mental exercise that I didn't want to occupy
> > myself with unless I absolutely had to. So, maybe it makes sense to
> > "advertise" this approach like you say, but some users may just want the 3D
> > arrays for more rapid prototyping-ish applications.

As Alex has shown, the 1D representation is abstracted away: The data is
accessed as a multi-dimensional array and the user never needs to remember
that it is interleaved.

> >
> > I wonder if there isn't some streaming solution for this -- the numbers are
> > stored as an interleaved 1D array, but are streamed through a Complex
> > constructor before any needed operations are performed.
>
> My initial thought is not as double values. A DoubleStream processes
> single values. To collect the real and imaginary components would
> require that the stream can be processed two at a time.
>
> This structure would work:
>
> class ComplexArray {
>  // Must be even in length. Packed as (real, imaginary)
>  double[] data;
>  int size;
>
>  Stream<Complex> stream() {
>      return IntStream.range(0, size / 2)
>          .mapToObj(i -> Complex.ofCartesian(data[i*2], data[i*2+1]));
>  }
>
>  void add(Complex c) {
>  checkCapacity(2);
>  data[size++] = c.getReal();
>  data[size++] = c.getImaginary();
>  }

Perhaps "store" would be less confusing (?).

>
>  void combine(ComplexArray c) {
>  checkCapacity(c.size);
>  System.arraycopy(c.data, 0, data, size, c.size);
>  size += c.size;
>  }
>
>  private void checkCapacity(int length) {
>  final int minCapacity = size + length;
>  final int oldCapacity = data.length;
>  if (minCapacity > oldCapacity) {
>  int newCapacity = ((oldCapacity * 3) >>> 1) + 1;
>  if (newCapacity < minCapacity) {
>  newCapacity = minCapacity;
>  }
>  data = Arrays.copyOf(data, newCapacity);
>  }
>  }
> }

To allow easy usage as a multidimensional array, the API should contain
the "MultidimensionalCounter"[1] or equivalent:

public class ComplexArray {
private final MultidimensionalCounter counter;
// ...

public ComplexArray(int... dimensions) {
counter = new MultidimensionalCounter(dimensions);
data = new Complex[counter.getSize()];
}

/**
 * @param indices Storage 

Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-05 Thread Alex Herbert

On 05/11/2019 00:09, Eric Barnhill wrote:

That's interesting. The JTransforms library performs Fourier transforms
that can take complex input, output, or both. They do this with interleaved
double[] arrays, which I suppose is much more space efficient, and the
status of a number as real or imaginary is implicit by its location being
odd or even.


The packing of multi-dimensional structures into a single array is 
common in image processing. It has computational efficiency when 
operating on the entire set of numbers.


It is used a lot in image processing to pack 2D data as a single array. 
In this case even random access to the (x,y) index as:


2D: [x][y]
1D: [y * maxx + x]

is comparable in speed, as the latter avoids the second array dereference.

A single array has advantages for a copy operation as it requires a 
single .clone() method call. The disadvantage is that the ultimate size 
of the multi-dimensional array is limited by the maximum length of a single array.


In the case of complex numbers the use of alternating indices to store 
real and imaginary allows the array to be easily resized and extended. 
Operations that process all positions only need to cache in memory a 
sub-section of the array.


[re1, im1, re2, im2, re3, im3]

The alternative is to store real and imaginary in a large block in 
either 1 array or 2 arrays:


[re1, re2, re3, im1, im2, im3]

[re1, re2, re3]
[im1, im2, im3]

This has advantages for operations that require either the real or the 
imaginary components on their own. It is a bit more work to extend the 
data as it requires two copies of the existing data and packing of the 
new values at the appropriate position.


For small sizes the use of 2 arrays would be less memory efficient. It 
would require some type of pointer to hold the pair. However it could 
store double the number of complex numbers.


The MultidimensionalCounter functionality you mention is for example known
in Matlab as ind2sub() and sub2ind() . It allows for 1d vectorizing of
certain functions which can improve performance. Some people swear by it. I
always found it an additional mental exercise that I didn't want to occupy
myself with unless I absolutely had to. So, maybe it makes sense to
"advertise" this approach like you say, but some users may just want the 3D
arrays for more rapid prototyping-ish applications.

I wonder if there isn't some streaming solution for this -- the numbers are
stored as an interleaved 1D array, but are streamed through a Complex
constructor before any needed operations are performed.


My initial thought is that this does not map directly onto double 
values. A DoubleStream processes single values; to collect the real and 
imaginary components the stream would have to be processed two values 
at a time.


This structure would work:

class ComplexArray {
    // Must be even in length. Packed as (real, imaginary).
    double[] data = new double[16];
    int size;

    Stream<Complex> stream() {
        return IntStream.range(0, size / 2)
            .mapToObj(i -> Complex.ofCartesian(data[i * 2], data[i * 2 + 1]));
    }

    void add(Complex c) {
        checkCapacity(2);
        data[size++] = c.getReal();
        data[size++] = c.getImaginary();
    }

    void combine(ComplexArray c) {
        checkCapacity(c.size);
        System.arraycopy(c.data, 0, data, size, c.size);
        size += c.size;
    }

    /** Ensure the backing array can hold length more values, growing by ~1.5x. */
    private void checkCapacity(int length) {
        final int minCapacity = size + length;
        final int oldCapacity = data.length;
        if (minCapacity > oldCapacity) {
            int newCapacity = ((oldCapacity * 3) >>> 1) + 1;
            if (newCapacity < minCapacity) {
                newCapacity = minCapacity;
            }
            data = Arrays.copyOf(data, newCapacity);
        }
    }
}



And I guess that leads to my last question -- suppose someone wants to call
a trig function on a series of Complex numbers. Now let's imagine the
primitives have been stored in some maximally efficient way. It seems like,
to use any of the functionality in Complex, these numbers would have to be
unpacked, cast as Complex, operated on, then cast back to how they are
being stored. I wonder if that would prove to be more efficient in the end.


Probably not more efficient to create all the Complex instances. Ideally 
you would operate on the primitive data directly.
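As an illustration of operating on the primitives directly, here is a sketch of an in-place complex multiplication over the interleaved layout (the PrimitiveOps class and multiply method are hypothetical names); no Complex objects are created per element.

```java
// Sketch: multiply every element of an interleaved (re, im) array by the
// constant (re + im*i), in place, using only primitive arithmetic.
public class PrimitiveOps {
    public static void multiply(double[] data, double re, double im) {
        for (int i = 0; i < data.length; i += 2) {
            final double a = data[i];
            final double b = data[i + 1];
            data[i] = a * re - b * im;      // real part of (a + bi)(re + im*i)
            data[i + 1] = a * im + b * re;  // imaginary part
        }
    }
}
```

Multiplying {1, 0, 0, 1} by i (re=0, im=1) rotates each element, giving {0, 1, -1, 0}.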


Looking at what Complex is capable of doing it seems that there are a 
lot of operations that would have to be duplicated to allow operations 
to work on primitives. However you can create the API and implement it 
lazily using the above object, e.g.


public static ComplexArray apply(ComplexArray array,
        UnaryOperator<Complex> mapper) {
    return array.stream()
        .map(mapper)
        .collect(ComplexArray::new, ComplexArray::add, ComplexArray::combine);
}

public static ComplexArray log(ComplexArray array) {
    return apply(array, Complex::log);
}

This does create a new Complex object per element for each operation 
applied to the data.



Re: [Numbers] Arrays of "Complex" objects and RAM

2019-11-04 Thread Eric Barnhill
That's interesting. The JTransforms library performs Fourier transforms
that can take complex input, output, or both. They do this with interleaved
double[] arrays, which I suppose is much more space efficient, and the
status of a number as real or imaginary is implicit by its location being
odd or even.

The MultidimensionalCounter functionality you mention is for example known
in Matlab as ind2sub() and sub2ind() . It allows for 1d vectorizing of
certain functions which can improve performance. Some people swear by it. I
always found it an additional mental exercise that I didn't want to occupy
myself with unless I absolutely had to. So, maybe it makes sense to
"advertise" this approach like you say, but some users may just want the 3D
arrays for more rapid prototyping-ish applications.

I wonder if there isn't some streaming solution for this -- the numbers are
stored as an interleaved 1D array, but are streamed through a Complex
constructor before any needed operations are performed.

And I guess that leads to my last question -- suppose someone wants to call
a trig function on a series of Complex numbers. Now let's imagine the
primitives have been stored in some maximally efficient way. It seems like,
to use any of the functionality in Complex, these numbers would have to be
unpacked, cast as Complex, operated on, then cast back to how they are
being stored. I wonder if that would prove to be more efficient in the end.

On Sat, Nov 2, 2019 at 7:14 PM Gilles Sadowski  wrote:

> Hello.
>
> The class "ComplexUtils" deals with multi-dimensional arrays that hold
> instances of the "Complex" class.
> I've recently encountered a use-case where it was pointed out that storing
> many "Complex" instances (as seems the purpose of these utilities) is quite
> inefficient memory-wise as each instance will take 32 bytes[1] while the
> real and imaginary parts would only take 16 bytes as 2 primitive "double"s.
> This is compounded by multi-dimensional arrays where each sub-dimensional
> element is an array object (and thus takes another additional 16 bytes).
> For example, a
> double[10][5][4]
> would take
> 16 * (1 + 10 * 5) + 10 * 5 * 4 * 8
>   = 2416 bytes.
> Assuming that in the above array, the last dimension holds 2 complex
> numbers,
> the same data can be represented as
> Complex[10][5][2]
> that would take
> 16 * ((1 + 10 * 5) + (10 * 5 * 2)) + 10 * 5 * 2 * 2 * 8
>   = 4016 bytes.
> In both cases, the payload (10 * 5 * 2 complex numbers) is
> 10 * 5 * 2 * 2 * 8
>   = 1600 bytes.
> If stored in a one-dimensional array, the size in memory would be 1616
> bytes.
> Thus in the case of a data cube holding 100 complex numbers, a 3D array
> takes about 1.5 times (primitives) or 2.5 times ("Complex" objects) the
> memory of a 1D array.
> If this is correct, I wonder whether we should advertise such a
> "ComplexUtils"
> class.  It would perhaps be preferable to propose a data cube
> abstraction.[2]
>
> WDYT?
>
> Regards,
> Gilles
>
> [1] https://www.baeldung.com/java-size-of-object
> [2] Based on
> http://commons.apache.org/proper/commons-math/apidocs/org/apache/commons/math4/util/MultidimensionalCounter.html
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> For additional commands, e-mail: dev-h...@commons.apache.org
>
>