The Apache MADlib team is pleased to announce the immediate availability of the 1.14 release.
The main goals of this release are:

New features:
- New module - Balanced datasets: A sampling module to balance classification datasets by resampling, using techniques including undersampling, oversampling, uniform sampling, or user-defined proportion sampling (MADLIB-1168)
- Mini-batch: Added a mini-batch optimizer for MLP and a preprocessor function needed to create batches from the data (MADLIB-1200, MADLIB-1206, MADLIB-1220, MADLIB-1224, MADLIB-1226, MADLIB-1227)
- k-NN: Added weighted averaging/voting by distance (MADLIB-1181)
- Summary: Added additional stats: number of positive, negative, and zero values, and 95% confidence intervals for the mean (MADLIB-1167)
- Encode categorical: Updated to produce lower-case column names when possible (MADLIB-1202)
- MLP: Added support for an already one-hot encoded categorical dependent variable in a classification task (MADLIB-1222)
- PageRank: Added an option for personalization vertices. A subset of vertices can be given a higher weight, so that a random surfer is more likely to jump to them than to other vertices (MADLIB-1084)

Bug fixes:
- Fixed issue with invalid calls of construct_array that led to problems in PostgreSQL 10 (MADLIB-1185)
- Added newline between file concatenation during PGXN install (MADLIB-1194)
- Fixed upgrade issues in k-NN (MADLIB-1197)
- Added fix to ensure RF variable importance values are always non-negative
- Fixed inconsistency in LDA output and improved usability (MADLIB-1160, MADLIB-1201)
- Fixed MLP and RF predict for models trained in earlier versions to ensure missing optional parameters are given appropriate default values (MADLIB-1207)
- Fixed a database crash in DT that occurred when no features remained because categorical columns with a single level had been dropped
- Fixed step size initialization in MLP based on learning rate policy (MADLIB-1212)
- Fixed PCA issue that led to failure when the grouping column is of TEXT type (MADLIB-1215)
- Fixed cat levels output in DT when grouping is enabled (MADLIB-1218)
- Fixed and simplified initialization of model coefficients in MLP
- Removed source table dependency for predicting regression models in MLP (MADLIB-1223)
- Print loss of first iteration in MLP (MADLIB-1228)
- Fixed MLP failure on GPDB 4.3 when verbose=True (MADLIB-1209)
- Fixed RF issue that showed up when var_importance=True with no continuous features (MADLIB-1219)
- Fixed DT/RF issue for null_as_category=True and grouping enabled (MADLIB-1217)

Other:
- Reduced install-check runtime for PCA, DT, RF, elastic net (MADLIB-1216)
- Added CentOS 7 PostgreSQL 9.6/10 docker files

All release changes can be found here:
https://cwiki.apache.org/confluence/display/MADLIB/MADlib+1.14

You can download the source release and convenience binary packages from Apache MADlib's download page here:
http://madlib.apache.org/download.html

Alternatively, you can download through an ASF mirror near you:
https://www.apache.org/dyn/closer.lua/madlib/1.14

----

Apache MADlib is an open-source library for scalable in-database analytics. It provides data-parallel implementations of mathematical, statistical and machine learning methods for structured and unstructured data.

The MADlib mission: to foster widespread development of scalable analytic skills, by harnessing efforts from commercial practice, academic research, and open-source development.

We welcome your help and feedback. For more information on how to report problems, and to get involved, visit the project website at https://madlib.apache.org

----

Thank you to everyone who contributed to the MADlib 1.14 release. We look forward to continued community participation for the next release.

Regards,
Jingyi Mei