GitHub user velvia commented on the pull request:
https://github.com/apache/spark/pull/215#issuecomment-38514709
This is the changelog. One thing not noted here is that one of the hash
map APIs, add(), has also been deprecated and is not available in newer
versions. Spark was using it, so using a newer version of fastutil
caused difficulties. This usage might have since been removed from the
Spark code, however.
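The behaviour of the replacement method can be illustrated without fastutil on the classpath. The helper below is hypothetical and is built on java.util.HashMap. It assumes the semantics fastutil documents for addTo(): a key that is not in the map behaves as if it held the map's default return value (0 here), and the method returns the value held before the increment.

```java
import java.util.HashMap;
import java.util.Map;

class AddToSketch {
    // Hypothetical stand-in for addTo(): adds `incr` to the value stored for `key`,
    // treating a missing key as holding `defRetValue`; returns the previous value.
    static <K> int addTo(Map<K, Integer> map, K key, int incr, int defRetValue) {
        Integer old = map.get(key);
        int previous = (old == null) ? defRetValue : old;
        map.put(key, previous + incr);
        return previous;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : "a b a c a".split(" ")) {
            addTo(counts, word, 1, 0); // typical counting use
        }
        System.out.println(counts.get("a")); // 3
    }
}
```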
6.5.12
- Removed some useless wrapper creation in a few methods of tree-based map
classes.
- Fixed pathological maxFill computation for very small-sized big open
hash sets.
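The changelog does not describe the pathology itself. A common safeguard in open-addressing tables is to clamp the fill threshold so that at least one slot always stays empty. The sketch below assumes that kind of computation; the method names are illustrative and not taken from fastutil.

```java
class MaxFillSketch {
    // Naive threshold: ceil(n * f). For tiny tables and load factors close to 1
    // this can equal n, allowing the table to become completely full.
    static int naiveMaxFill(int n, float f) {
        return (int) Math.ceil(n * f);
    }

    // Clamped threshold: never more than n - 1 entries, so a probe for an
    // absent key always reaches an empty slot and terminates.
    static int maxFill(int n, float f) {
        return Math.min((int) Math.ceil(n * f), n - 1);
    }

    public static void main(String[] args) {
        System.out.println(naiveMaxFill(2, 0.99f)); // 2: no free slot left
        System.out.println(maxFill(2, 0.99f));      // 1
        System.out.println(maxFill(16, 0.75f));     // 12
    }
}
```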
6.5.11
- A very old and subtle performance bug in hash-based data structures has
been fixed. Backing arrays were allocated using the number of expected
elements divided by the load factor. However, since rehashing was
triggered by equality with the table size multiplied by the load factor,
if the expected number of elements multiplied by the load factor was an
integer, a useless rehash would happen for the very last added element.
The only effect was a useless increase in object creation.
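The arithmetic behind the 6.5.11 entry can be checked with a small simulation. This is a simplified model, not fastutil's code. It assumes three things:

- the table length is the smallest power of two at least expected / f;
- maxFill is ceil(length * f);
- a rehash that doubles the table fires as soon as the size reaches maxFill.

With 12 expected elements and f = 0.75, the table has 16 slots and maxFill is exactly 12, so the twelfth insertion triggers a rehash that was never needed. Triggering only when the size exceeds maxFill is shown as one possible alternative, not necessarily the fix fastutil adopted.

```java
class RehashModel {
    // Smallest power of two >= x (x >= 1).
    static int nextPowerOfTwo(int x) {
        int n = 1;
        while (n < x) n <<= 1;
        return n;
    }

    // Assumed table length for `expected` elements at load factor f.
    static int tableSize(int expected, float f) {
        return nextPowerOfTwo((int) Math.ceil(expected / f));
    }

    // Counts rehashes while inserting exactly `expected` distinct elements.
    // strictTrigger == false: rehash when size reaches maxFill (the behaviour described as buggy).
    // strictTrigger == true:  rehash only when size exceeds maxFill.
    static int rehashes(int expected, float f, boolean strictTrigger) {
        int n = tableSize(expected, f);
        int maxFill = (int) Math.ceil(n * f);
        int count = 0;
        for (int size = 1; size <= expected; size++) { // size after each insertion
            boolean trigger = strictTrigger ? size > maxFill : size >= maxFill;
            if (trigger) {
                count++;
                n *= 2;
                maxFill = (int) Math.ceil(n * f);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(rehashes(12, 0.75f, false)); // 1: useless rehash on the last insertion
        System.out.println(rehashes(12, 0.75f, true));  // 0
        System.out.println(rehashes(13, 0.75f, false)); // 0: table of 32 slots, maxFill 24
    }
}
```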
6.5.10
- Iterators in object set constructors are now of type Iterator, no
longer ObjectIterator. The kinds of allowed iterators have been
rationalised and made uniform across all classes implementing Set.
6.5.9
- New methods to get a type-specific Iterable from binary or
text files.
6.5.8
- Fixed a stupid bug in the creation of array-based FIFO queues.
6.5.7
- Fixed a very subtle bug in hash-based data structures: calling addAll()
on a newly created structure could take a very long time due to
correlation between the positions of elements in structures with
different table sizes.
6.5.6
- The equals() methods between arrays have been deprecated in favour of
the java.util.Arrays versions, which are intrinsified in recent JVMs.
- InspectableFileCachedInputStream.reopen() makes it possible to read
again, from the start, an instance on which close() was invoked.
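The java.util.Arrays methods recommended above compare arrays element by element. Calling equals() directly on an array only compares references. For double arrays, Arrays.equals compares elements as Double.equals does, so NaN equals NaN and 0.0 differs from -0.0.

```java
import java.util.Arrays;

class ArraysEqualsDemo {
    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        System.out.println(a.equals(b));         // false: Object.equals compares references
        System.out.println(Arrays.equals(a, b)); // true: element-wise comparison

        double[] x = {Double.NaN};
        double[] y = {Double.NaN};
        System.out.println(Arrays.equals(x, y)); // true: NaN compared as by Double.equals
        System.out.println(Arrays.equals(new double[] {0.0}, new double[] {-0.0})); // false
    }
}
```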
6.5.5
- The abstract implementation of equals() between (big) lists now uses
type-specific access methods (as the compareTo() method was already
doing) to avoid massive boxing/unboxing. Thanks to Adrien Grand for
suggesting this improvement.
- FIFO array-based queues are now serializable.
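The boxing issue in the 6.5.5 entry can be illustrated with a toy list. The sketch below is hypothetical and far smaller than fastutil's lists. Its equals() walks both lists through a primitive accessor, getInt(), rather than the boxed get(), so no Integer objects are involved in the comparison.

```java
import java.util.Arrays;

// Hypothetical minimal int list; not fastutil's implementation.
class IntListSketch {
    private final int[] a;

    IntListSketch(int... values) { a = values.clone(); }

    int size() { return a.length; }

    // Type-specific accessor: returns a primitive, no boxing.
    int getInt(int i) { return a[i]; }

    // Generic accessor: autoboxes the element on every call.
    Integer get(int i) { return a[i]; }

    @Override
    public boolean equals(Object o) {
        if (o == this) return true;
        if (!(o instanceof IntListSketch)) return false;
        IntListSketch other = (IntListSketch) o;
        if (size() != other.size()) return false;
        for (int i = 0; i < size(); i++) {
            // Comparing via getInt() avoids boxing;
            // get(i).equals(other.get(i)) would box both sides.
            if (getInt(i) != other.getInt(i)) return false;
        }
        return true;
    }

    @Override
    public int hashCode() { return Arrays.hashCode(a); }
}
```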
6.5.4
- Further fixes related to NaNs in sorting.
- Fixed a very old bug in FastByteArrayOutputStream.write(int).
Thanks to Massimo Santini for reporting this bug.
- We now use Arrays.MAX_ARRAY_SIZE, which is equal to Integer.MAX_VALUE
minus 8, to bound all array allocations. Previously, grow() and other
array-related functions could try to allocate an array of size
Integer.MAX_VALUE, which is technically allowed by the JLS but will not
work on most JVMs. The maximum length we use now is the same value as
that used by java.util.ArrayList. Thanks to William Harvey for
suggesting this change.
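The bounded allocation in the 6.5.4 entry can be sketched as a capacity computation. The sketch assumes doubling growth, which the 6.5.1 entry below mentions. The class and method names are hypothetical; only the bound Integer.MAX_VALUE - 8 comes from the changelog. The doubled length is computed in long so that it cannot overflow before being clamped.

```java
import java.util.Arrays;

class GrowSketch {
    // Upper bound on array lengths: the value quoted in the changelog.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // New length for an array of length oldLength that must hold minCapacity
    // elements: double the old length, but never exceed MAX_ARRAY_SIZE.
    static int newCapacity(int oldLength, int minCapacity) {
        if (minCapacity < 0 || minCapacity > MAX_ARRAY_SIZE) {
            throw new OutOfMemoryError("Requested array size exceeds MAX_ARRAY_SIZE");
        }
        long doubled = 2L * oldLength; // long arithmetic avoids int overflow
        return (int) Math.max(minCapacity, Math.min(doubled, MAX_ARRAY_SIZE));
    }

    // Returns `a` itself if it is already long enough, otherwise a larger copy.
    static int[] grow(int[] a, int minCapacity) {
        if (minCapacity <= a.length) return a;
        return Arrays.copyOf(a, newCapacity(a.length, minCapacity));
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(10, 11));                       // 20
        System.out.println(newCapacity(1_500_000_000, 1_500_000_001)); // 2147483639
    }
}
```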
6.5.3
- Corrected erroneous introduction of compare() methods on integral
classes (they appeared in Java 7).
6.5.2
- A few changes were necessary to make fastutil behave like Java on NaNs
when sorting. Double.compareTo() and Float.compareTo() treat NaN as
greater than positive infinity, and fastutil was not doing so.
As part of the change, now all comparisons between primitive types are
performed using the compare() method of the wrapper class
(microbenchmarks confirmed that there is no speed penalty for that,
probably due to inlining or even intrinsification). Thanks to Adam Klein
for reporting this bug.
- All quickSort() implementations that do not involve a comparator are now
deprecated, as there are equivalent/better versions in java.util.Arrays.
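The NaN ordering described in the 6.5.2 entry is the same one java.util.Arrays.sort(double[]) uses. Primitive comparison operators give no useful answer for NaN. The wrapper's compare() instead defines a total order in which -0.0 comes before 0.0 and NaN comes after positive infinity.

```java
import java.util.Arrays;

class NaNOrderDemo {
    // Sorts a sample containing NaN, infinity and both zeros.
    static double[] sortedSample() {
        double[] a = {Double.NaN, 1.0, Double.POSITIVE_INFINITY, 0.0, -0.0};
        Arrays.sort(a); // total order: -0.0 < 0.0, NaN after +Infinity
        return a;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedSample())); // [-0.0, 0.0, 1.0, Infinity, NaN]
        // Primitive operators are always false when NaN is involved...
        System.out.println(Double.NaN > Double.POSITIVE_INFINITY); // false
        // ...while the wrapper's compare() places NaN above +Infinity.
        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) > 0); // true
    }
}
```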
6.5.0 -> 6.5.1
- FastBuffered{Input/Output}Stream now has a constructor accepting an
explicitly given buffer.
- Abandoned golden-ratio based expansion of arrays and lists in favour of
a (more standard) doubling approach.
- Array-based FIFO queues now reduce their capacity automatically by
halving when the size becomes one fourth of the length.
- The add() method for open hash maps has been deprecated and replaced by
addTo(), as the name choice proved to be a recipe for disaster.
- New InspectableFileCachedInputStream for easily caching large byte
streams partially on file and partially in memory.
- The front() method for semi-indirect heaps took no comparator, but
was used in queues that could be given a comparator. There is now an
additional version accepting a comparator.
- Serial Version UIDs are now private.
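The growth and shrink policy in the 6.5.1 entry can be sketched as a circular-array FIFO queue. Its capacity doubles when it is full and halves when the size drops to one fourth of the length. The class is hypothetical and is not fastutil's implementation. Shrinking at one quarter rather than one half leaves slack after each resize, so alternating enqueues and dequeues near a boundary do not trigger a resize every time.

```java
import java.util.NoSuchElementException;

// Hypothetical array-based FIFO queue of ints (not fastutil's implementation).
class IntRingQueueSketch {
    private static final int MIN_LENGTH = 4;
    private int[] array = new int[MIN_LENGTH];
    private int start; // index of the oldest element
    private int size;  // number of stored elements

    void enqueue(int x) {
        if (size == array.length) resize(array.length * 2); // full: double
        array[(start + size) % array.length] = x;
        size++;
    }

    int dequeue() {
        if (size == 0) throw new NoSuchElementException();
        int x = array[start];
        start = (start + 1) % array.length;
        size--;
        // Halve once the queue is only one fourth full.
        if (array.length > MIN_LENGTH && size == array.length / 4) resize(array.length / 2);
        return x;
    }

    // Copies the elements, oldest first, into a new array of the given length.
    private void resize(int newLength) {
        int[] a = new int[newLength];
        for (int i = 0; i < size; i++) a[i] = array[(start + i) % array.length];
        array = a;
        start = 0;
    }

    int size() { return size; }
    int capacity() { return array.length; }
}
```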
6.4.6 -> 6.5.0
- Fixed type of array hash strategies.
- Fixed use of equals() instead of compareTo() in
SemiIndirectHeaps.front(). Thanks to Matthew Hatem for reporting this
bug.
- Now we generate custom hash maps for primitive types, too (as we were
already doing for sets).
6.4.5 -> 6.4.6
- In array-based priority queues changed() would not invalidate
the cached index of the smallest element.
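The 6.4.6 fix is easier to see with a concrete queue. The sketch below is a hypothetical unsorted-array priority queue, not fastutil's implementation. It caches the index of its smallest element, and the caller may change the first element's priority in place and then call changed(). If changed() did not reset the cache, first() would keep returning the stale index even after another element had become smaller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Element whose priority the caller may change in place.
class MutableTask {
    int priority;
    MutableTask(int priority) { this.priority = priority; }
}

// Hypothetical unsorted-array priority queue caching the index of its smallest element.
class CachedMinQueueSketch {
    private final List<MutableTask> items = new ArrayList<>();
    private int firstIndex = -1; // cached index of the smallest element; -1 means rescan

    void enqueue(MutableTask t) {
        items.add(t);
        // Keep a valid cache valid; an invalid one is recomputed lazily.
        if (firstIndex >= 0 && t.priority < items.get(firstIndex).priority) {
            firstIndex = items.size() - 1;
        }
    }

    private int findFirst() {
        if (items.isEmpty()) throw new NoSuchElementException();
        if (firstIndex < 0) {
            int best = 0;
            for (int i = 1; i < items.size(); i++) {
                if (items.get(i).priority < items.get(best).priority) best = i;
            }
            firstIndex = best;
        }
        return firstIndex;
    }

    MutableTask first() { return items.get(findFirst()); }

    MutableTask dequeue() {
        int i = findFirst();
        MutableTask t = items.get(i);
        int last = items.size() - 1;
        items.set(i, items.get(last)); // move the last element into the hole
        items.remove(last);
        firstIndex = -1;
        return t;
    }

    // Must be called after the first element's priority is modified in place;
    // it invalidates the cached index so the next access rescans.
    void changed() { firstIndex = -1; }
}
```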
6.4.4 -> 6.4.5
- In some very rare circumstances, enumeration of hash sets or maps
combined with massive element removal (using the iterator remove()
method) could have led to inconsistent enumeration (duplicates and
missing elements). Thanks to Hamish Morgan for reporting this bug.
On Mon, Mar 24, 2014 at 11:19 AM, Aaron Davidson
<[email protected]> wrote:
> What are the fixes to the OpenHashMaps? Their severity may impact whether
> we have to pick this into branch-0.9.