[GitHub] incubator-hivemall pull request #158: [HIVEMALL-215] Add step-by-step tutori...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/158#discussion_r214223029 --- Diff: docs/gitbook/supervised_learning/tutorial.md --- @@ -0,0 +1,461 @@ + + +# Step-by-Step Tutorial on Supervised Learning with Apache Hivemall + + + +## What is Hivemall? + +[Apache Hive](https://hive.apache.org/) is a data warehousing solution that enables us to process large-scale data in the form of SQL easily. Assume that you have a table named `purchase_history` which can be artificially created as: + +```sql +create table if not exists purchase_history as +select 1 as id, "Saturday" as day_of_week, "male" as gender, 600 as price, "book" as category, 1 as label +union all +select 2 as id, "Friday" as day_of_week, "female" as gender, 4800 as price, "sports" as category, 0 as label +union all +select 3 as id, "Friday" as day_of_week, "other" as gender, 18000 as price, "entertainment" as category, 0 as label +union all +select 4 as id, "Thursday" as day_of_week, "male" as gender, 200 as price, "food" as category, 0 as label +union all +select 5 as id, "Wednesday" as day_of_week, "female" as gender, 1000 as price, "electronics" as category, 1 as label +; +``` + +The syntax of Hive queries, namely **HiveQL**, is very similar to SQL: + +```sql +select count(1) from purchase_history; +``` + +> 5 + +[Apache Hivemall](https://github.com/apache/incubator-hivemall) is a collection of user-defined functions (UDFs) for HiveQL which is strongly optimized for machine learning (ML) and data science. 
To give an example, you can efficiently build a logistic regression model with the stochastic gradient descent (SGD) optimization by issuing the following ~10 lines of query: + +```sql +SELECT + train_classifier( +features, +label, +'-loss_function logloss -optimizer SGD' + ) as (feature, weight) +FROM + training +; +``` + + +Hivemall function [`hivemall_version()`](../misc/funcs.html#others) shows current Hivemall version, for example: + +```sql +select hivemall_version(); +``` + +> "0.5.1-incubating-SNAPSHOT" + +Below we list ML and relevant problems that Hivemall can solve: + +- [Binary and multi-class classification](../binaryclass/general.html) +- [Regression](../regression/general.html) +- [Recommendation](../recommend/cf.html) +- [Anomaly detection](../anomaly/lof.html) +- [Natural language processing](../misc/tokenizer.html) +- [Clustering](../misc/tokenizer.html) (i.e., topic modeling) +- [Data sketching](../misc/funcs.html#sketching) +- Evaluation + +Our [YouTube demo video](https://www.youtube.com/watch?v=cMUsuA9KZ_c) would be helpful to understand more about an overview of Hivemall. + +This tutorial explains the basic usage of Hivemall with examples of supervised learning of simple regressor and binary classifier. + +## Binary classification + +Imagine a scenario that we like to build a binary classifier from the mock `purchase_history` data and predict unforeseen purchases to conduct a new campaign effectively: + +| day\_of\_week | gender | price | category | label | +|:---:|:---:|:---:|:---:|:---| +|Saturday | male | 600 | book | 1 | +|Friday | female | 4800 | sports | 0 | +|Friday | other | 18000 | entertainment | 0 | +|Thursday | male | 200 | food | 0 | +|Wednesday | female | 1000 | electronics | 1 | + --- End diff -- Insert here something like.. You can create this table as follows: ```sql create table if not exists purchase_history as .. ``` ---
[GitHub] incubator-hivemall pull request #158: [HIVEMALL-215] Add step-by-step tutori...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/158#discussion_r214223937 --- Diff: docs/gitbook/supervised_learning/tutorial.md --- @@ -0,0 +1,461 @@ + + +# Step-by-Step Tutorial on Supervised Learning with Apache Hivemall + + + +## What is Hivemall? + +[Apache Hive](https://hive.apache.org/) is a data warehousing solution that enables us to process large-scale data in the form of SQL easily. Assume that you have a table named `purchase_history` which can be artificially created as: + +```sql +create table if not exists purchase_history as +select 1 as id, "Saturday" as day_of_week, "male" as gender, 600 as price, "book" as category, 1 as label +union all +select 2 as id, "Friday" as day_of_week, "female" as gender, 4800 as price, "sports" as category, 0 as label +union all +select 3 as id, "Friday" as day_of_week, "other" as gender, 18000 as price, "entertainment" as category, 0 as label +union all +select 4 as id, "Thursday" as day_of_week, "male" as gender, 200 as price, "food" as category, 0 as label +union all +select 5 as id, "Wednesday" as day_of_week, "female" as gender, 1000 as price, "electronics" as category, 1 as label +; +``` + +The syntax of Hive queries, namely **HiveQL**, is very similar to SQL: + +```sql +select count(1) from purchase_history; +``` + +> 5 + +[Apache Hivemall](https://github.com/apache/incubator-hivemall) is a collection of user-defined functions (UDFs) for HiveQL which is strongly optimized for machine learning (ML) and data science. 
To give an example, you can efficiently build a logistic regression model with the stochastic gradient descent (SGD) optimization by issuing the following ~10 lines of query: + +```sql +SELECT + train_classifier( +features, +label, +'-loss_function logloss -optimizer SGD' + ) as (feature, weight) +FROM + training +; +``` + + +Hivemall function [`hivemall_version()`](../misc/funcs.html#others) shows current Hivemall version, for example: + +```sql +select hivemall_version(); +``` + +> "0.5.1-incubating-SNAPSHOT" + +Below we list ML and relevant problems that Hivemall can solve: + +- [Binary and multi-class classification](../binaryclass/general.html) +- [Regression](../regression/general.html) +- [Recommendation](../recommend/cf.html) +- [Anomaly detection](../anomaly/lof.html) +- [Natural language processing](../misc/tokenizer.html) +- [Clustering](../misc/tokenizer.html) (i.e., topic modeling) +- [Data sketching](../misc/funcs.html#sketching) +- Evaluation + +Our [YouTube demo video](https://www.youtube.com/watch?v=cMUsuA9KZ_c) would be helpful to understand more about an overview of Hivemall. + +This tutorial explains the basic usage of Hivemall with examples of supervised learning of simple regressor and binary classifier. + +## Binary classification + +Imagine a scenario that we like to build a binary classifier from the mock `purchase_history` data and predict unforeseen purchases to conduct a new campaign effectively: + +| day\_of\_week | gender | price | category | label | +|:---:|:---:|:---:|:---:|:---| +|Saturday | male | 600 | book | 1 | +|Friday | female | 4800 | sports | 0 | +|Friday | other | 18000 | entertainment | 0 | +|Thursday | male | 200 | food | 0 | +|Wednesday | female | 1000 | electronics | 1 | + +Use Hivemall [`train_classifier()`](../misc/funcs.html#binary-classification) UDF to tackle the problem as follows. + +### Step 1. 
Feature representation + +First of all, we have to convert the records into pairs of the feature vector and corresponding target value. Here, Hivemall requires you to represent input features in a specific format. + +To be more precise, Hivemall represents single feature in a concatenation of **index** (i.e., **name**) and its **value**: + +- Quantitative feature: `<index>:<value>` + - e.g., `price:600.0` +- Categorical feature: `<index>#<value>` + - e.g., `gender#male` + --- End diff -- Better to insert the following sentences after the example. The feature index and the feature value are separated by a colon (`:`). When the colon is omitted, the value is considered to be `1.0`. So, a categorical feature `gender#male` is a [one-hot representation](https://www.quora.com/What-is-one-hot-encoding-and-when-is-it-used-in-data-science) of `index := gender#male` and `value := 1.0`. Note that `#` is not a special character. ---
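The feature-string convention discussed in this comment can be illustrated with a small Java sketch. It is only an illustration under stated assumptions: the class and method names are hypothetical and are not part of Hivemall, and the parsing rule (a feature without `:` carries an implicit value of `1.0`) is taken from the review comment rather than from Hivemall's source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a Hivemall class.
final class FeatureFormatSketch {

    // Quantitative feature: "<index>:<value>", e.g. "price:600.0".
    static String quantitative(String index, double value) {
        return index + ":" + value;
    }

    // Categorical feature: "<index>#<value>" with no explicit weight.
    // The whole string acts as the index and the value is implicitly 1.0.
    static String categorical(String index, String value) {
        return index + "#" + value;
    }

    // Splits a feature string into {index, value}.
    // Assumption from the review comment: no ":" means the value is 1.0.
    static String[] parse(String feature) {
        int pos = feature.lastIndexOf(':');
        if (pos < 0) {
            return new String[] {feature, "1.0"};
        }
        return new String[] {feature.substring(0, pos), feature.substring(pos + 1)};
    }

    public static void main(String[] args) {
        // Build the feature vector for the first row of the mock table.
        List<String> features = new ArrayList<>();
        features.add(quantitative("price", 600.0));
        features.add(categorical("day_of_week", "Saturday"));
        features.add(categorical("gender", "male"));
        features.add(categorical("category", "book"));
        System.out.println(features);

        String[] parsed = parse("gender#male");
        System.out.println(parsed[0] + " -> " + parsed[1]);
    }
}
```

Because `#` has no special meaning, `parse` treats `gender#male` as one opaque index, which is what makes the categorical form a one-hot feature.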
[GitHub] incubator-hivemall pull request #158: [HIVEMALL-215] Add step-by-step tutori...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/158#discussion_r214222772 --- Diff: docs/gitbook/supervised_learning/tutorial.md --- @@ -0,0 +1,461 @@ + + +# Step-by-Step Tutorial on Supervised Learning with Apache Hivemall + + + +## What is Hivemall? + +[Apache Hive](https://hive.apache.org/) is a data warehousing solution that enables us to process large-scale data in the form of SQL easily. Assume that you have a table named `purchase_history` which can be artificially created as: + +```sql +create table if not exists purchase_history as +select 1 as id, "Saturday" as day_of_week, "male" as gender, 600 as price, "book" as category, 1 as label +union all +select 2 as id, "Friday" as day_of_week, "female" as gender, 4800 as price, "sports" as category, 0 as label +union all +select 3 as id, "Friday" as day_of_week, "other" as gender, 18000 as price, "entertainment" as category, 0 as label +union all +select 4 as id, "Thursday" as day_of_week, "male" as gender, 200 as price, "food" as category, 0 as label +union all +select 5 as id, "Wednesday" as day_of_week, "female" as gender, 1000 as price, "electronics" as category, 1 as label +; +``` + +The syntax of Hive queries, namely **HiveQL**, is very similar to SQL: + +```sql +select count(1) from purchase_history; +``` + +> 5 + --- End diff -- A general introduction to Apache Hive and HiveQL is not required in Hivemall's documentation. The base document was written to introduce Hivemall to TD's customers, who might not be aware of the differences between Hive and Presto. You can start with `Apache Hivemall is a ... lines of query as follows:` ---
[GitHub] incubator-hivemall pull request #158: [HIVEMALL-215] Add step-by-step tutori...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/158#discussion_r214226384 --- Diff: docs/gitbook/supervised_learning/tutorial.md --- @@ -0,0 +1,461 @@ + + +# Step-by-Step Tutorial on Supervised Learning with Apache Hivemall + + + +## What is Hivemall? + +[Apache Hive](https://hive.apache.org/) is a data warehousing solution that enables us to process large-scale data in the form of SQL easily. Assume that you have a table named `purchase_history` which can be artificially created as: + +```sql +create table if not exists purchase_history as +select 1 as id, "Saturday" as day_of_week, "male" as gender, 600 as price, "book" as category, 1 as label +union all +select 2 as id, "Friday" as day_of_week, "female" as gender, 4800 as price, "sports" as category, 0 as label +union all +select 3 as id, "Friday" as day_of_week, "other" as gender, 18000 as price, "entertainment" as category, 0 as label +union all +select 4 as id, "Thursday" as day_of_week, "male" as gender, 200 as price, "food" as category, 0 as label +union all +select 5 as id, "Wednesday" as day_of_week, "female" as gender, 1000 as price, "electronics" as category, 1 as label +; +``` + +The syntax of Hive queries, namely **HiveQL**, is very similar to SQL: + +```sql +select count(1) from purchase_history; +``` + +> 5 + +[Apache Hivemall](https://github.com/apache/incubator-hivemall) is a collection of user-defined functions (UDFs) for HiveQL which is strongly optimized for machine learning (ML) and data science. 
To give an example, you can efficiently build a logistic regression model with the stochastic gradient descent (SGD) optimization by issuing the following ~10 lines of query: + +```sql +SELECT + train_classifier( +features, +label, +'-loss_function logloss -optimizer SGD' + ) as (feature, weight) +FROM + training +; +``` + + +Hivemall function [`hivemall_version()`](../misc/funcs.html#others) shows current Hivemall version, for example: + +```sql +select hivemall_version(); +``` + +> "0.5.1-incubating-SNAPSHOT" + +Below we list ML and relevant problems that Hivemall can solve: + +- [Binary and multi-class classification](../binaryclass/general.html) +- [Regression](../regression/general.html) +- [Recommendation](../recommend/cf.html) +- [Anomaly detection](../anomaly/lof.html) +- [Natural language processing](../misc/tokenizer.html) +- [Clustering](../misc/tokenizer.html) (i.e., topic modeling) +- [Data sketching](../misc/funcs.html#sketching) +- Evaluation + +Our [YouTube demo video](https://www.youtube.com/watch?v=cMUsuA9KZ_c) would be helpful to understand more about an overview of Hivemall. + +This tutorial explains the basic usage of Hivemall with examples of supervised learning of simple regressor and binary classifier. + +## Binary classification + +Imagine a scenario that we like to build a binary classifier from the mock `purchase_history` data and predict unforeseen purchases to conduct a new campaign effectively: + +| day\_of\_week | gender | price | category | label | +|:---:|:---:|:---:|:---:|:---| +|Saturday | male | 600 | book | 1 | +|Friday | female | 4800 | sports | 0 | +|Friday | other | 18000 | entertainment | 0 | +|Thursday | male | 200 | food | 0 | +|Wednesday | female | 1000 | electronics | 1 | + +Use Hivemall [`train_classifier()`](../misc/funcs.html#binary-classification) UDF to tackle the problem as follows. + +### Step 1. 
Feature representation + +First of all, we have to convert the records into pairs of the feature vector and corresponding target value. Here, Hivemall requires you to represent input features in a specific format. + +To be more precise, Hivemall represents single feature in a concatenation of **index** (i.e., **name**) and its **value**: + +- Quantitative feature: `<index>:<value>` + - e.g., `price:600.0` +- Categorical feature: `<index>#<value>` + - e.g., `gender#male` + +Each of those features is a string value in Hive, and "feature vector" means an array of string values like: + +``` +["price:600.0", "day of week#Saturday", "gender#male", "category#book"] +``` + +See also more detailed [document for input format](../getting_started/input-format.html)). + +Therefore, what we first need to do is to convert the records into an array of feature strings, and Hivemall functions [
[GitHub] incubator-hivemall pull request #158: [HIVEMALL-215] Add step-by-step tutori...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/158#discussion_r214236762 --- Diff: docs/gitbook/supervised_learning/tutorial.md --- @@ -0,0 +1,457 @@ + + +# Step-by-Step Tutorial on Supervised Learning with Apache Hivemall --- End diff -- Remove obvious `with Apache Hivemall` ---
[GitHub] incubator-hivemall issue #158: [HIVEMALL-215] Add step-by-step tutorial on S...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/158 @chezou Merged. Thank you for your first contribution! ---
[GitHub] incubator-hivemall pull request #159: [HIVEMALL-214][DOC] Update userguide f...
GitHub user myui opened a pull request: https://github.com/apache/incubator-hivemall/pull/159 [HIVEMALL-214][DOC] Update userguide for General Classifier/Regressor example ## What changes were proposed in this pull request? Refine user guide for generic classifier/regressor and so on. ## What type of PR is it? Documentation ## What is the Jira issue? https://issues.apache.org/jira/browse/HIVEMALL-214 ## How to use this feature? See user guide. You can merge this pull request into a Git repository by running: $ git pull https://github.com/myui/incubator-hivemall HIVEMALL-214 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/159.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #159 commit 6f40c466e21c78238a74f9c2f227df8ae156b3e2 Author: Makoto Yui Date: 2018-08-31T07:38:17Z Added general classifier example using a9a dataset commit 4963b63ab685aa539c6c0f5f3cd3230215ba4df7 Author: Makoto Yui Date: 2018-08-31T07:46:31Z Added assertions for deprecated contents commit 472821279d70e4171b7cf391a09bac10c95e28cb Author: Makoto Yui Date: 2018-08-31T08:02:13Z Capitalized topics and fixed a typo commit 649e77840ff154bd75cd7c1bfdfc245516b68b0d Author: Makoto Yui Date: 2018-08-31T11:18:50Z Refined user guide ---
[GitHub] incubator-hivemall pull request #161: [HIVEMALL-216] Fix Docker image based ...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/161#discussion_r214793366 --- Diff: docs/gitbook/docker/getting_started.md --- @@ -17,29 +17,31 @@ under the License. --> +# Getting started with Hivemall on Docker + This page introduces how to run Hivemall on Docker. > Caution > This docker image contains a single-node Hadoop enviroment for evaluating Hivemall. Not suited for production uses. -# Requirements +## Requirements * Docker Engine 1.6+ * Docker Compose 1.10+ -# 1. Build image +## 1. Build image --- End diff -- Could you remove `1.` and `2.`? See what's happening at http://hivemall.incubator.apache.org/userguide/docker/getting_started.html#1-build-image ---
[GitHub] incubator-hivemall pull request #160: [HIVEMALL-163] Add IS_INFINITE, IS_FIN...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/160#discussion_r214800712 --- Diff: core/src/main/java/hivemall/tools/math/IsInfiniteUDF.java --- @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package hivemall.tools.math; + +import org.apache.hadoop.hive.ql.exec.Description; +import org.apache.hadoop.hive.ql.exec.UDF; + +@Description(name = "is_infinite", value = "_FUNC_(x) - Determine if x is infinite.") +public final class IsInfiniteUDF extends UDF { +public Boolean evaluate(Double num) { +if (num == null) { +return null; +} else { +return !num.isNaN() && num.isInfinite(); --- End diff -- Is `!num.isNaN() &&` required? ---
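The question raised in this comment can be checked directly against the JDK: `Double.isInfinite()` returns `true` only for positive or negative infinity, so it already returns `false` for `NaN`. The sketch below is a standalone check, not the UDF itself; the class name is hypothetical and the Hive `UDF` base class is deliberately left out so the snippet runs without Hive on the classpath.

```java
// Standalone check (hypothetical class name); mirrors the null handling
// of the UDF in the diff but drops the NaN guard.
final class IsInfiniteCheck {

    static Boolean isInfinite(Double num) {
        if (num == null) {
            return null;
        }
        // Double.isInfinite() is false for NaN, so no extra isNaN() test is needed.
        return num.isInfinite();
    }

    public static void main(String[] args) {
        System.out.println(isInfinite(Double.NaN));
        System.out.println(isInfinite(Double.POSITIVE_INFINITY));
        System.out.println(isInfinite(Double.NEGATIVE_INFINITY));
        System.out.println(isInfinite(1.0));
        System.out.println(isInfinite(null));
    }
}
```

Under this reading, `!num.isNaN() && num.isInfinite()` and `num.isInfinite()` give the same result for every input.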
[GitHub] incubator-hivemall pull request #162: [HIVEMALL-217] Resolve missing links f...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/162#discussion_r214870585 --- Diff: docs/gitbook/tips/hadoop_tuning.md --- @@ -75,13 +75,13 @@ feature_dimensions (2^24 by the default) * 4 bytes (float) * 2 (iff covariance i ``` > 2^24 * 4 bytes * 2 * 1.2 ≈ 161MB -When [SpaceEfficientDenseModel](https://github.com/apache/incubator-hivemall/blob/master/src/main/java/hivemall/io/SpaceEfficientDenseModel.java) is used, the formula changes as follows: +When [SpaceEfficientDenseModel](https://github.com/myui/hivemall/blob/master/src/main/java/hivemall/io/SpaceEfficientDenseModel.java) is used, the formula changes as follows: --- End diff -- `github.com/myui` is deprecated. Use https://github.com/apache/incubator-hivemall/blob/master/core/src/main/java/hivemall/model/SpaceEfficientDenseModel.java instead, and replace the other occurrences of `github.com/myui` as well. ---
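The memory estimate quoted in the diff context (2^24 dimensions, 4-byte floats, a factor of 2, and a factor of 1.2) can be reproduced with a few lines of Java. This is a sketch under assumptions: the formula is truncated in the diff, so `withCovariance` reflects one reading of the cut-off "iff covariance" note, and `overheadFactor` is a hypothetical name for the 1.2 multiplier whose meaning the diff does not show.

```java
// Hypothetical helper reproducing the arithmetic from the quoted diff line.
final class ModelMemoryEstimate {

    // featureDimensions: number of model weights (2^24 in the quoted text)
    // bytesPerWeight:    4 for a float
    // withCovariance:    doubles the storage (assumed reading of the truncated note)
    // overheadFactor:    the 1.2 multiplier from the quoted text (meaning not shown there)
    static double estimateBytes(long featureDimensions, int bytesPerWeight,
            boolean withCovariance, double overheadFactor) {
        int arrays = withCovariance ? 2 : 1;
        return featureDimensions * (double) bytesPerWeight * arrays * overheadFactor;
    }

    public static void main(String[] args) {
        long dims = 1L << 24; // 2^24 = 16,777,216
        double bytes = estimateBytes(dims, 4, true, 1.2);
        // Decimal megabytes, matching the "161MB" figure in the quoted text.
        System.out.printf("%.0f bytes, about %.0f MB%n", bytes, bytes / 1e6);
    }
}
```

The intermediate values are 2^24 * 4 = 67,108,864 bytes, doubled to 134,217,728 bytes, then scaled by 1.2 to roughly 161 million bytes.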
[GitHub] incubator-hivemall pull request #162: [HIVEMALL-217] Resolve missing links f...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/162#discussion_r215535078 --- Diff: docs/gitbook/tips/emr.md --- @@ -21,15 +21,15 @@ ## Prerequisite Learn how to use Hive with Elastic MapReduce (EMR). -http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive.html +https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive.html Before launching an EMR job, * create ${s3bucket}/emr/outputs for outputs * optionally, create ${s3bucket}/emr/logs for logging -* put [emr_hivemall_bootstrap.sh](https://raw.github.com/myui/hivemall/master/scripts/misc/emr_hivemall_bootstrap.sh) on ${s3bucket}/emr/conf +* put [emr_hivemall_bootstrap.sh](https://raw.githubusercontent.com/apache/incubator-hivemall/master/resources/misc/emr_hivemall_bootstrap.sh) on ${s3bucket}/emr/conf Then, lunch an EMR job with hive in an interactive mode. -I'm usually lunching EMR instances with cheap Spot instances through [CLI client](http://aws.amazon.com/developertools/2264) as follows: +I'm usually lunching EMR instances with cheap Spot instances through [CLI client](https://aws.amazon.com/jp/tools/) as follows: --- End diff -- should be `https://aws.amazon.com/tools/` ---
[GitHub] incubator-hivemall pull request #163: [HIVEMALL-196][WIP] Support BM25 scori...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/163#discussion_r215564184 --- Diff: core/src/main/java/hivemall/ftvec/text/OkapiBM25UDF.java --- @@ -0,0 +1,167 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package hivemall.ftvec.text; + +import hivemall.UDFWithOptions; +import org.apache.commons.cli.CommandLine; +import org.apache.commons.cli.Options; +import org.apache.hadoop.hive.ql.exec.Description; +import org.apache.hadoop.hive.ql.exec.UDFArgumentException; +import org.apache.hadoop.hive.ql.metadata.Hive; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import hivemall.utils.hadoop.HiveUtils; +import org.apache.hadoop.hive.ql.udf.UDFType; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils; +import org.apache.hadoop.io.DoubleWritable; + +import javax.annotation.Nonnull; +import java.util.Arrays; + +@Description(name = "okapi_bm25", +value = "_FUNC_(double tf_word, int dl, double avgdl, int N, int n [, const string options]) - Return an Okapi BM25 score in float") +//TODO: What does stateful mean? --- End diff -- See https://hive.apache.org/javadocs/r1.2.2/api/org/apache/hadoop/hive/ql/udf/UDFType.html#stateful() So `stateful = false` is okay. Please remove this comment. ---
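For readers unfamiliar with the score this UDF is meant to return, here is a minimal Java sketch of a per-term Okapi BM25 score using the same inputs as the `@Description` above (`tf_word`, `dl`, `avgdl`, `N`, `n`). It is not the code from the pull request: the class name is hypothetical, `k1 = 1.2` and `b = 0.75` are commonly used defaults assumed here, and the IDF uses the non-negative `ln(1 + (N - n + 0.5) / (n + 0.5))` variant, which may differ from what the PR implements.

```java
// Hypothetical sketch of a single-term BM25 score; not the PR's implementation.
final class Bm25Sketch {

    // tf:      term frequency of the word in the document
    // dl:      document length in words
    // avgdl:   average document length in the collection
    // numDocs: total number of documents (N)
    // docFreq: number of documents containing the word (n)
    // k1, b:   free parameters (assumed defaults: 1.2 and 0.75)
    static double bm25(double tf, int dl, double avgdl, int numDocs, int docFreq,
            double k1, double b) {
        // Assumed IDF variant that stays non-negative for frequent terms.
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        // Longer-than-average documents are penalized through this factor.
        double lengthNorm = 1.0 - b + b * (dl / avgdl);
        return idf * (tf * (k1 + 1.0)) / (tf + k1 * lengthNorm);
    }

    public static void main(String[] args) {
        double base = bm25(1.0, 100, 100.0, 1000, 10, 1.2, 0.75);
        double moreOccurrences = bm25(3.0, 100, 100.0, 1000, 10, 1.2, 0.75);
        double longerDocument = bm25(1.0, 200, 100.0, 1000, 10, 1.2, 0.75);
        System.out.println(base + " " + moreOccurrences + " " + longerDocument);
    }
}
```

A unit test for the real UDF could assert the same qualitative properties: the score grows with term frequency and shrinks as the document gets longer than average.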
[GitHub] incubator-hivemall issue #163: [HIVEMALL-196][WIP] Support BM25 scoring
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/163 Please add a unit test and evaluate this function in a Hive environment. ---
[GitHub] incubator-hivemall pull request #164: [HIVEMALL-218] Fixed train_lda NPE whe...
GitHub user myui opened a pull request: https://github.com/apache/incubator-hivemall/pull/164 [HIVEMALL-218] Fixed train_lda NPE where input row is null ## What changes were proposed in this pull request? Fixed a NegativeArraySizeException in `train_lda` that occurred when the input is NULL ## What type of PR is it? Bug Fix ## What is the Jira issue? https://issues.apache.org/jira/browse/HIVEMALL-218 ## How was this patch tested? manual tests ## Checklist - [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit? - [x] Did you run system tests on Hive (or Spark)? You can merge this pull request into a Git repository by running: $ git pull https://github.com/myui/incubator-hivemall HIVEMALL-218 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/164.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #164 commit 67f6f68acad09c7a0e70f9fbdb183116eeec6a1d Author: Makoto Yui Date: 2018-09-07T08:56:43Z Fixed NegativeArraySizeException where input is NULL commit d367de34e34d42514c0bb6141fbf31f295e33e50 Author: Makoto Yui Date: 2018-09-07T09:15:05Z Fixed NPE in forward() ---
[GitHub] incubator-hivemall pull request #165: [HIVEMALL-219][BUGFIX] Fixed NPE in fi...
GitHub user myui opened a pull request: https://github.com/apache/incubator-hivemall/pull/165 [HIVEMALL-219][BUGFIX] Fixed NPE in finalizeTraining() ## What changes were proposed in this pull request? Fixed an NPE in finalizeTraining() that occurred when there are no training examples ## What type of PR is it? Bug Fix ## What is the Jira issue? https://issues.apache.org/jira/browse/HIVEMALL-219 ## How was this patch tested? to appear ## Checklist (Please remove this section if not needed; check `x` for YES, blank for NO) - [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit? - [ ] Did you run system tests on Hive (or Spark)? You can merge this pull request into a Git repository by running: $ git pull https://github.com/myui/incubator-hivemall HIVEMALL-219 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/165.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #165 commit bc0e14d1d29ba13b173165bca9d9511b19abbc6e Author: Makoto Yui Date: 2018-09-18T09:42:06Z Fixed NPE in finalizeTraining() ---
[GitHub] incubator-hivemall pull request #166: [HIVEMALL-219] Fixed LDA bug for singl...
GitHub user myui opened a pull request: https://github.com/apache/incubator-hivemall/pull/166 [HIVEMALL-219] Fixed LDA bug for single update and added unit tests ## What changes were proposed in this pull request? Fixed LDA bug for single update and added unit tests ## What type of PR is it? Bug Fix ## What is the Jira issue? https://issues.apache.org/jira/browse/HIVEMALL-219 ## How was this patch tested? unit tests and manual tests on EMR ## Checklist (Please remove this section if not needed; check `x` for YES, blank for NO) - [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit? - [x] Did you run system tests on Hive (or Spark)? You can merge this pull request into a Git repository by running: $ git pull https://github.com/myui/incubator-hivemall HIVEMALL-219-2 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/166.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #166 commit 202eddd71c00e3889c0a126fe1038df35c1513d9 Author: Makoto Yui Date: 2018-09-18T10:36:02Z Fixed LDA bug for single update and added unit tests ---
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226199666 --- Diff: core/src/main/java/hivemall/mf/CofactorModel.java --- @@ -0,0 +1,629 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package hivemall.mf; + +import hivemall.fm.Feature; +import hivemall.utils.math.MathUtils; +import hivemall.utils.math.MatrixUtils; +import org.apache.commons.math3.linear.ArrayRealVector; +import org.apache.commons.math3.linear.Array2DRowRealMatrix; +import org.apache.commons.math3.linear.RealMatrix; +import org.apache.commons.math3.linear.RealVector; +import org.apache.commons.math3.linear.SingularValueDecomposition; + +import javax.annotation.Nonnegative; +import javax.annotation.Nonnull; +import javax.annotation.Nullable; +import java.util.*; + +public class CofactorModel { + +public enum RankInitScheme { +random /* default */, gaussian; + +@Nonnegative +protected float maxInitValue; +@Nonnegative +protected double initStdDev; + +@Nonnull +public static CofactorModel.RankInitScheme resolve(@Nullable String opt) { +if (opt == null) { +return random; +} else if ("gaussian".equalsIgnoreCase(opt)) { +return gaussian; +} else if ("random".equalsIgnoreCase(opt)) { +return random; +} +return random; +} + +public void setMaxInitValue(float maxInitValue) { +this.maxInitValue = maxInitValue; +} + +public void setInitStdDev(double initStdDev) { +this.initStdDev = initStdDev; +} + +} + +private static final int EXPECTED_SIZE = 136861; +@Nonnegative +protected final int factor; + +// rank matrix initialization +protected final RankInitScheme initScheme; + +@Nonnull +private double globalBias; + +// storing trainable latent factors and weights +private Map theta; +private Map beta; +private Map betaBias; --- End diff -- Please use `Object2DoubleMap betaBias` instead to reduce memory consumption. ---
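The memory argument behind this suggestion is that a `java.util.Map` with `Double` values stores one boxed object per entry, while a primitive-valued map keeps the values in a plain `double[]`. The following Java sketch shows that idea with a tiny open-addressing map. It is an illustration only: the class is hypothetical, it is not the `Object2DoubleMap` named in the comment, and it omits removal and iteration.

```java
// Hypothetical, minimal String -> double map that stores values unboxed.
// Not the Object2DoubleMap referred to in the review comment.
final class StringToDoubleMapSketch {

    private String[] keys = new String[16]; // capacity is always a power of two
    private double[] values = new double[16]; // primitive storage, no Double objects
    private int size;

    void put(String key, double value) {
        // Keep the table at most half full so probing always finds a free slot.
        if ((size + 1) * 2 > keys.length) {
            resize();
        }
        int i = indexOf(key);
        if (keys[i] == null) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    double getOrDefault(String key, double defaultValue) {
        int i = indexOf(key);
        return keys[i] == null ? defaultValue : values[i];
    }

    int size() {
        return size;
    }

    // Linear probing: returns the slot holding the key, or the first empty slot.
    private int indexOf(String key) {
        int mask = keys.length - 1;
        int i = key.hashCode() & mask;
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) & mask;
        }
        return i;
    }

    // Doubles the capacity and re-inserts every existing entry.
    private void resize() {
        String[] oldKeys = keys;
        double[] oldValues = values;
        keys = new String[oldKeys.length * 2];
        values = new double[oldValues.length * 2];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != null) {
                int i = indexOf(oldKeys[j]);
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }

    public static void main(String[] args) {
        StringToDoubleMapSketch bias = new StringToDoubleMapSketch();
        bias.put("item1", 0.25);
        bias.put("item2", -0.5);
        System.out.println(bias.getOrDefault("item1", 0.0));
        System.out.println(bias.getOrDefault("unknown", 0.0));
    }
}
```

For bias terms, which hold a single scalar per item, this layout avoids both the boxed `Double` and the per-entry node object that `HashMap` allocates.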
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226198983 --- Diff: core/src/main/java/hivemall/mf/FactorizedModel.java --- @@ -30,25 +30,25 @@ import javax.annotation.concurrent.NotThreadSafe; @NotThreadSafe -public final class FactorizedModel { +public class FactorizedModel { --- End diff -- It seems FactorizedModel is not used in Cofactor. Is this change required? Revert if not used. ---
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226199747 --- Diff: core/src/main/java/hivemall/mf/CofactorModel.java --- @@ -0,0 +1,629 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package hivemall.mf; + +import hivemall.fm.Feature; +import hivemall.utils.math.MathUtils; +import hivemall.utils.math.MatrixUtils; +import org.apache.commons.math3.linear.ArrayRealVector; +import org.apache.commons.math3.linear.Array2DRowRealMatrix; +import org.apache.commons.math3.linear.RealMatrix; +import org.apache.commons.math3.linear.RealVector; +import org.apache.commons.math3.linear.SingularValueDecomposition; + +import javax.annotation.Nonnegative; +import javax.annotation.Nonnull; +import javax.annotation.Nullable; +import java.util.*; + +public class CofactorModel { + +public enum RankInitScheme { +random /* default */, gaussian; + +@Nonnegative +protected float maxInitValue; +@Nonnegative +protected double initStdDev; + +@Nonnull +public static CofactorModel.RankInitScheme resolve(@Nullable String opt) { +if (opt == null) { +return random; +} else if ("gaussian".equalsIgnoreCase(opt)) { +return gaussian; +} else if ("random".equalsIgnoreCase(opt)) { +return random; +} +return random; +} + +public void setMaxInitValue(float maxInitValue) { +this.maxInitValue = maxInitValue; +} + +public void setInitStdDev(double initStdDev) { +this.initStdDev = initStdDev; +} + +} + +private static final int EXPECTED_SIZE = 136861; +@Nonnegative +protected final int factor; + +// rank matrix initialization +protected final RankInitScheme initScheme; + +@Nonnull +private double globalBias; + +// storing trainable latent factors and weights +private Map theta; +private Map beta; +private Map betaBias; +private Map gamma; +private Map gammaBias; --- End diff -- Please use `Object2DoubleMap gammaBias` instead to reduce memory consumption. ---
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226202891 --- Diff: core/src/main/java/hivemall/mf/CofactorModel.java --- @@ -0,0 +1,629 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package hivemall.mf; + +import hivemall.fm.Feature; +import hivemall.utils.math.MathUtils; +import hivemall.utils.math.MatrixUtils; +import org.apache.commons.math3.linear.ArrayRealVector; +import org.apache.commons.math3.linear.Array2DRowRealMatrix; +import org.apache.commons.math3.linear.RealMatrix; +import org.apache.commons.math3.linear.RealVector; +import org.apache.commons.math3.linear.SingularValueDecomposition; + +import javax.annotation.Nonnegative; +import javax.annotation.Nonnull; +import javax.annotation.Nullable; +import java.util.*; + +public class CofactorModel { + +public enum RankInitScheme { +random /* default */, gaussian; + +@Nonnegative +protected float maxInitValue; +@Nonnegative +protected double initStdDev; + +@Nonnull +public static CofactorModel.RankInitScheme resolve(@Nullable String opt) { +if (opt == null) { +return random; +} else if ("gaussian".equalsIgnoreCase(opt)) { +return gaussian; +} else if ("random".equalsIgnoreCase(opt)) { +return random; +} +return random; +} + +public void setMaxInitValue(float maxInitValue) { +this.maxInitValue = maxInitValue; +} + +public void setInitStdDev(double initStdDev) { +this.initStdDev = initStdDev; +} + +} + +private static final int EXPECTED_SIZE = 136861; +@Nonnegative +protected final int factor; + +// rank matrix initialization +protected final RankInitScheme initScheme; + +@Nonnull +private double globalBias; + +// storing trainable latent factors and weights +private Map theta; +private Map beta; +private Map betaBias; +private Map gamma; +private Map gammaBias; + +// precomputed identity matrix +private RealMatrix identity; + +protected final Random[] randU, randI; + +// hyperparameters +private final float c0, c1; +private final float lambdaTheta, lambdaBeta, lambdaGamma; + +public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme, + @Nonnull float c0, @Nonnull float c1, float lambdaTheta, + float lambdaBeta, float lambdaGamma) { + +// rank init scheme is 
gaussian +// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98 +this.factor = factor; +this.initScheme = initScheme; +this.globalBias = 0.d; +this.lambdaTheta = lambdaTheta; +this.lambdaBeta = lambdaBeta; +this.lambdaGamma = lambdaGamma; + +this.theta = new HashMap<>(); +this.beta = new HashMap<>(); +this.betaBias = new HashMap<>(); +this.gamma = new HashMap<>(); +this.gammaBias = new HashMap<>(); + +this.randU = newRandoms(factor, 31L); +this.randI = newRandoms(factor, 41L); + +checkHyperparameterC(c0); +checkHyperparameterC(c1); +this.c0 = c0; +this.c1 = c1; + +} + +private void initFactorVector(String key, Map weights) { +if (weights.containsKey(key)) { +return; +} +RealVector v = new ArrayRealVector(factor); +switch (initScheme) { +case random:
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226204201 --- Diff: core/src/main/java/hivemall/mf/CofactorModel.java --- @@ -0,0 +1,629 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package hivemall.mf; + +import hivemall.fm.Feature; +import hivemall.utils.math.MathUtils; +import hivemall.utils.math.MatrixUtils; +import org.apache.commons.math3.linear.ArrayRealVector; +import org.apache.commons.math3.linear.Array2DRowRealMatrix; +import org.apache.commons.math3.linear.RealMatrix; +import org.apache.commons.math3.linear.RealVector; +import org.apache.commons.math3.linear.SingularValueDecomposition; + +import javax.annotation.Nonnegative; +import javax.annotation.Nonnull; +import javax.annotation.Nullable; +import java.util.*; + +public class CofactorModel { + +public enum RankInitScheme { +random /* default */, gaussian; + +@Nonnegative +protected float maxInitValue; +@Nonnegative +protected double initStdDev; + +@Nonnull +public static CofactorModel.RankInitScheme resolve(@Nullable String opt) { +if (opt == null) { +return random; +} else if ("gaussian".equalsIgnoreCase(opt)) { +return gaussian; +} else if ("random".equalsIgnoreCase(opt)) { +return random; +} +return random; +} + +public void setMaxInitValue(float maxInitValue) { +this.maxInitValue = maxInitValue; +} + +public void setInitStdDev(double initStdDev) { +this.initStdDev = initStdDev; +} + +} + +private static final int EXPECTED_SIZE = 136861; +@Nonnegative +protected final int factor; + +// rank matrix initialization +protected final RankInitScheme initScheme; + +@Nonnull +private double globalBias; + +// storing trainable latent factors and weights +private Map theta; +private Map beta; +private Map betaBias; +private Map gamma; +private Map gammaBias; + +// precomputed identity matrix +private RealMatrix identity; + +protected final Random[] randU, randI; + +// hyperparameters +private final float c0, c1; +private final float lambdaTheta, lambdaBeta, lambdaGamma; + +public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme, + @Nonnull float c0, @Nonnull float c1, float lambdaTheta, + float lambdaBeta, float lambdaGamma) { + +// rank init scheme is 
gaussian +// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98 +this.factor = factor; +this.initScheme = initScheme; +this.globalBias = 0.d; +this.lambdaTheta = lambdaTheta; +this.lambdaBeta = lambdaBeta; +this.lambdaGamma = lambdaGamma; + +this.theta = new HashMap<>(); +this.beta = new HashMap<>(); +this.betaBias = new HashMap<>(); +this.gamma = new HashMap<>(); +this.gammaBias = new HashMap<>(); + +this.randU = newRandoms(factor, 31L); +this.randI = newRandoms(factor, 41L); + +checkHyperparameterC(c0); +checkHyperparameterC(c1); +this.c0 = c0; +this.c1 = c1; + +} + +private void initFactorVector(String key, Map weights) { +if (weights.containsKey(key)) { +return; +} +RealVector v = new ArrayRealVector(factor); --- End diff -- ``` final double[] v =
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226239017 --- Diff: core/src/main/java/hivemall/mf/CofactorModel.java --- @@ -0,0 +1,629 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package hivemall.mf; + +import hivemall.fm.Feature; +import hivemall.utils.math.MathUtils; +import hivemall.utils.math.MatrixUtils; +import org.apache.commons.math3.linear.ArrayRealVector; +import org.apache.commons.math3.linear.Array2DRowRealMatrix; +import org.apache.commons.math3.linear.RealMatrix; +import org.apache.commons.math3.linear.RealVector; +import org.apache.commons.math3.linear.SingularValueDecomposition; + +import javax.annotation.Nonnegative; +import javax.annotation.Nonnull; +import javax.annotation.Nullable; +import java.util.*; + +public class CofactorModel { + +public enum RankInitScheme { +random /* default */, gaussian; + +@Nonnegative +protected float maxInitValue; +@Nonnegative +protected double initStdDev; + +@Nonnull +public static CofactorModel.RankInitScheme resolve(@Nullable String opt) { +if (opt == null) { +return random; +} else if ("gaussian".equalsIgnoreCase(opt)) { +return gaussian; +} else if ("random".equalsIgnoreCase(opt)) { +return random; +} +return random; +} + +public void setMaxInitValue(float maxInitValue) { +this.maxInitValue = maxInitValue; +} + +public void setInitStdDev(double initStdDev) { +this.initStdDev = initStdDev; +} + +} + +private static final int EXPECTED_SIZE = 136861; +@Nonnegative +protected final int factor; + +// rank matrix initialization +protected final RankInitScheme initScheme; + +@Nonnull +private double globalBias; + +// storing trainable latent factors and weights +private Map theta; +private Map beta; +private Map betaBias; +private Map gamma; +private Map gammaBias; + +// precomputed identity matrix +private RealMatrix identity; + +protected final Random[] randU, randI; + +// hyperparameters +private final float c0, c1; +private final float lambdaTheta, lambdaBeta, lambdaGamma; + +public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme, + @Nonnull float c0, @Nonnull float c1, float lambdaTheta, + float lambdaBeta, float lambdaGamma) { + +// rank init scheme is 
gaussian +// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98 +this.factor = factor; +this.initScheme = initScheme; +this.globalBias = 0.d; +this.lambdaTheta = lambdaTheta; +this.lambdaBeta = lambdaBeta; +this.lambdaGamma = lambdaGamma; + +this.theta = new HashMap<>(); +this.beta = new HashMap<>(); +this.betaBias = new HashMap<>(); +this.gamma = new HashMap<>(); +this.gammaBias = new HashMap<>(); + +this.randU = newRandoms(factor, 31L); +this.randI = newRandoms(factor, 41L); + +checkHyperparameterC(c0); +checkHyperparameterC(c1); +this.c0 = c0; +this.c1 = c1; + +} + +private void initFactorVector(String key, Map weights) { +if (weights.containsKey(key)) { +return; +} +RealVector v = new ArrayRealVector(factor); +switch (initScheme) { +case random:
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226237654 --- Diff: core/src/main/java/hivemall/mf/CofactorModel.java --- @@ -0,0 +1,629 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package hivemall.mf; + +import hivemall.fm.Feature; +import hivemall.utils.math.MathUtils; +import hivemall.utils.math.MatrixUtils; +import org.apache.commons.math3.linear.ArrayRealVector; +import org.apache.commons.math3.linear.Array2DRowRealMatrix; +import org.apache.commons.math3.linear.RealMatrix; +import org.apache.commons.math3.linear.RealVector; +import org.apache.commons.math3.linear.SingularValueDecomposition; + +import javax.annotation.Nonnegative; +import javax.annotation.Nonnull; +import javax.annotation.Nullable; +import java.util.*; + +public class CofactorModel { + +public enum RankInitScheme { +random /* default */, gaussian; + +@Nonnegative +protected float maxInitValue; +@Nonnegative +protected double initStdDev; + +@Nonnull +public static CofactorModel.RankInitScheme resolve(@Nullable String opt) { +if (opt == null) { +return random; +} else if ("gaussian".equalsIgnoreCase(opt)) { +return gaussian; +} else if ("random".equalsIgnoreCase(opt)) { +return random; +} +return random; +} + +public void setMaxInitValue(float maxInitValue) { +this.maxInitValue = maxInitValue; +} + +public void setInitStdDev(double initStdDev) { +this.initStdDev = initStdDev; +} + +} + +private static final int EXPECTED_SIZE = 136861; +@Nonnegative +protected final int factor; + +// rank matrix initialization +protected final RankInitScheme initScheme; + +@Nonnull +private double globalBias; + +// storing trainable latent factors and weights +private Map theta; +private Map beta; +private Map betaBias; +private Map gamma; +private Map gammaBias; + +// precomputed identity matrix +private RealMatrix identity; + +protected final Random[] randU, randI; + +// hyperparameters +private final float c0, c1; +private final float lambdaTheta, lambdaBeta, lambdaGamma; + +public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme, + @Nonnull float c0, @Nonnull float c1, float lambdaTheta, + float lambdaBeta, float lambdaGamma) { + +// rank init scheme is 
gaussian +// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98 +this.factor = factor; +this.initScheme = initScheme; +this.globalBias = 0.d; +this.lambdaTheta = lambdaTheta; +this.lambdaBeta = lambdaBeta; +this.lambdaGamma = lambdaGamma; + +this.theta = new HashMap<>(); +this.beta = new HashMap<>(); +this.betaBias = new HashMap<>(); +this.gamma = new HashMap<>(); +this.gammaBias = new HashMap<>(); + +this.randU = newRandoms(factor, 31L); +this.randI = newRandoms(factor, 41L); + +checkHyperparameterC(c0); +checkHyperparameterC(c1); +this.c0 = c0; +this.c1 = c1; + +} + +private void initFactorVector(String key, Map weights) { +if (weights.containsKey(key)) { +return; +} +RealVector v = new ArrayRealVector(factor); +switch (initScheme) { +case random:
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226239653 --- Diff: core/src/main/java/hivemall/mf/CofactorModel.java --- @@ -0,0 +1,629 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package hivemall.mf; + +import hivemall.fm.Feature; +import hivemall.utils.math.MathUtils; +import hivemall.utils.math.MatrixUtils; +import org.apache.commons.math3.linear.ArrayRealVector; +import org.apache.commons.math3.linear.Array2DRowRealMatrix; +import org.apache.commons.math3.linear.RealMatrix; +import org.apache.commons.math3.linear.RealVector; +import org.apache.commons.math3.linear.SingularValueDecomposition; + +import javax.annotation.Nonnegative; +import javax.annotation.Nonnull; +import javax.annotation.Nullable; +import java.util.*; + +public class CofactorModel { + +public enum RankInitScheme { +random /* default */, gaussian; + +@Nonnegative +protected float maxInitValue; +@Nonnegative +protected double initStdDev; + +@Nonnull +public static CofactorModel.RankInitScheme resolve(@Nullable String opt) { +if (opt == null) { +return random; +} else if ("gaussian".equalsIgnoreCase(opt)) { +return gaussian; +} else if ("random".equalsIgnoreCase(opt)) { +return random; +} +return random; +} + +public void setMaxInitValue(float maxInitValue) { +this.maxInitValue = maxInitValue; +} + +public void setInitStdDev(double initStdDev) { +this.initStdDev = initStdDev; +} + +} + +private static final int EXPECTED_SIZE = 136861; +@Nonnegative +protected final int factor; + +// rank matrix initialization +protected final RankInitScheme initScheme; + +@Nonnull +private double globalBias; + +// storing trainable latent factors and weights +private Map theta; +private Map beta; +private Map betaBias; +private Map gamma; +private Map gammaBias; + +// precomputed identity matrix +private RealMatrix identity; + +protected final Random[] randU, randI; + +// hyperparameters +private final float c0, c1; +private final float lambdaTheta, lambdaBeta, lambdaGamma; + +public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme, + @Nonnull float c0, @Nonnull float c1, float lambdaTheta, + float lambdaBeta, float lambdaGamma) { + +// rank init scheme is 
gaussian +// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98 +this.factor = factor; +this.initScheme = initScheme; +this.globalBias = 0.d; +this.lambdaTheta = lambdaTheta; +this.lambdaBeta = lambdaBeta; +this.lambdaGamma = lambdaGamma; + +this.theta = new HashMap<>(); +this.beta = new HashMap<>(); +this.betaBias = new HashMap<>(); +this.gamma = new HashMap<>(); +this.gammaBias = new HashMap<>(); + +this.randU = newRandoms(factor, 31L); +this.randI = newRandoms(factor, 41L); + +checkHyperparameterC(c0); +checkHyperparameterC(c1); +this.c0 = c0; +this.c1 = c1; + +} + +private void initFactorVector(String key, Map weights) { +if (weights.containsKey(key)) { +return; +} +RealVector v = new ArrayRealVector(factor); +switch (initScheme) { +case random:
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226241124 --- Diff: core/src/main/java/hivemall/mf/CofactorizationUDTF.java --- @@ -0,0 +1,574 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package hivemall.mf; + +import hivemall.UDTFWithOptions; +import hivemall.common.ConversionState; +import hivemall.fm.Feature; +import hivemall.fm.StringFeature; +import hivemall.utils.hadoop.HiveUtils; +import hivemall.utils.io.FileUtils; +import hivemall.utils.io.NioStatefulSegment; +import hivemall.utils.lang.NumberUtils; +import hivemall.utils.lang.Primitives; +import hivemall.utils.lang.SizeOf; +import org.apache.commons.cli.CommandLine; +import org.apache.commons.cli.Options; +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.hive.ql.exec.UDFArgumentException; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.serde2.objectinspector.*; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.BooleanObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector; +import org.apache.hadoop.mapred.Counters; +import org.apache.hadoop.mapred.Reporter; + +import javax.annotation.Nonnull; +import javax.annotation.Nullable; +import java.io.File; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.List; + +import static hivemall.utils.lang.Primitives.FALSE_BYTE; +import static hivemall.utils.lang.Primitives.TRUE_BYTE; + +public class CofactorizationUDTF extends UDTFWithOptions { +private static final Log LOG = LogFactory.getLog(CofactorizationUDTF.class); + +// Option variables +// The number of latent factors +protected int factor; +// The scaling hyperparameter for zero entries in the rank matrix +protected float scale_zero; +// The scaling hyperparameter for non-zero entries in the rank matrix +protected float scale_nonzero; +// The preferred size of the miniBatch for training +protected int batchSize; +// The initial mean rating +protected float globalBias; +// 
Whether update (and return) the mean rating or not +protected boolean updateGlobalBias; +// The number of iterations +protected int maxIters; +// Whether to use bias clause +protected boolean useBiasClause; +// Whether to use normalization +protected boolean useL2Norm; +// regularization hyperparameters +protected float lambdaTheta; +protected float lambdaBeta; +protected float lambdaGamma; + +// Initialization strategy of rank matrix +protected CofactorModel.RankInitScheme rankInit; + +// Model itself +protected CofactorModel model; +protected int numItems; + +// Variable managing status of learning + +// The number of processed training examples +protected long count; + +protected ConversionState cvState; +private ConversionState validationState; + +// Input OIs and Context +protected StringObjectInspector contextOI; +protected ListObjectInspector featuresOI; +protected BooleanObjectInspector isItemOI; +protected ListObjectInspector sppmiOI; + +// Used for iterations +protected NioStatefulSegment fileIO; +protected ByteBuffer inputBuf; +private long lastWritePos; + +private Feature contextProbe; +private Feature[] featuresProbe; +private Feature[] sppmiProbe; +private boolean isItemProbe; +private long numValidations; +private long numTraining; +
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226243032 --- Diff: core/src/main/java/hivemall/mf/CofactorModel.java --- @@ -0,0 +1,638 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package hivemall.mf; + +import hivemall.fm.Feature; +import hivemall.utils.math.MathUtils; +import hivemall.utils.math.MatrixUtils; +import org.apache.commons.math3.linear.ArrayRealVector; +import org.apache.commons.math3.linear.Array2DRowRealMatrix; +import org.apache.commons.math3.linear.RealMatrix; +import org.apache.commons.math3.linear.RealVector; +import org.apache.commons.math3.linear.SingularValueDecomposition; + +import javax.annotation.Nonnegative; +import javax.annotation.Nonnull; +import javax.annotation.Nullable; +import java.util.*; + +public class CofactorModel { + +public enum RankInitScheme { +random /* default */, gaussian; + +@Nonnegative +protected float maxInitValue; +@Nonnegative +protected double initStdDev; + +@Nonnull +public static CofactorModel.RankInitScheme resolve(@Nullable String opt) { +if (opt == null) { +return random; +} else if ("gaussian".equalsIgnoreCase(opt)) { +return gaussian; +} else if ("random".equalsIgnoreCase(opt)) { +return random; +} +return random; +} + +public void setMaxInitValue(float maxInitValue) { +this.maxInitValue = maxInitValue; +} + +public void setInitStdDev(double initStdDev) { +this.initStdDev = initStdDev; +} + +} + +private static final int EXPECTED_SIZE = 136861; +@Nonnegative +protected final int factor; + +// rank matrix initialization +protected final RankInitScheme initScheme; + +@Nonnull +private double globalBias; + +// storing trainable latent factors and weights +private Map theta; +private Map beta; +private Map betaBias; +private Map gamma; +private Map gammaBias; + +// precomputed identity matrix +private RealMatrix identity; + +protected final Random[] randU, randI; + +// hyperparameters +private final float c0, c1; +private final float lambdaTheta, lambdaBeta, lambdaGamma; + +public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme, + @Nonnull float c0, @Nonnull float c1, float lambdaTheta, + float lambdaBeta, float lambdaGamma) { + +// rank init scheme is 
gaussian +// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98 +this.factor = factor; +this.initScheme = initScheme; +this.globalBias = 0.d; +this.lambdaTheta = lambdaTheta; +this.lambdaBeta = lambdaBeta; +this.lambdaGamma = lambdaGamma; + +this.theta = new HashMap<>(); +this.beta = new HashMap<>(); +this.betaBias = new HashMap<>(); +this.gamma = new HashMap<>(); +this.gammaBias = new HashMap<>(); + +this.randU = newRandoms(factor, 31L); +this.randI = newRandoms(factor, 41L); + +checkHyperparameterC(c0); +checkHyperparameterC(c1); +this.c0 = c0; +this.c1 = c1; + +} + +private void initFactorVector(String key, Map weights) { +if (weights.containsKey(key)) { +return; +} +RealVector v = new ArrayRealVector(factor); +switch (initScheme) { +case random:
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226247247 --- Diff: core/src/main/java/hivemall/mf/CofactorModel.java --- @@ -0,0 +1,640 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package hivemall.mf; + +import hivemall.fm.Feature; +import hivemall.utils.math.MathUtils; +import hivemall.utils.math.MatrixUtils; +import it.unimi.dsi.fastutil.objects.Object2DoubleArrayMap; +import it.unimi.dsi.fastutil.objects.Object2DoubleMap; +import org.apache.commons.math3.linear.ArrayRealVector; +import org.apache.commons.math3.linear.Array2DRowRealMatrix; +import org.apache.commons.math3.linear.RealMatrix; +import org.apache.commons.math3.linear.RealVector; +import org.apache.commons.math3.linear.SingularValueDecomposition; + +import javax.annotation.Nonnegative; +import javax.annotation.Nonnull; +import javax.annotation.Nullable; +import java.util.*; + +public class CofactorModel { + +public enum RankInitScheme { +random /* default */, gaussian; + +@Nonnegative +protected float maxInitValue; +@Nonnegative +protected double initStdDev; + +@Nonnull +public static CofactorModel.RankInitScheme resolve(@Nullable String opt) { +if (opt == null) { +return random; +} else if ("gaussian".equalsIgnoreCase(opt)) { +return gaussian; +} else if ("random".equalsIgnoreCase(opt)) { +return random; +} +return random; +} + +public void setMaxInitValue(float maxInitValue) { +this.maxInitValue = maxInitValue; +} + +public void setInitStdDev(double initStdDev) { +this.initStdDev = initStdDev; +} + +} + +private static final int EXPECTED_SIZE = 136861; +@Nonnegative +protected final int factor; + +// rank matrix initialization +protected final RankInitScheme initScheme; + +@Nonnull +private double globalBias; + +// storing trainable latent factors and weights +private Map theta; +private Map beta; +private Object2DoubleMap betaBias; +private Map gamma; +private Object2DoubleMap gammaBias; + +// precomputed identity matrix +private RealMatrix identity; + +protected final Random[] randU, randI; + +// hyperparameters +private final float c0, c1; +private final float lambdaTheta, lambdaBeta, lambdaGamma; + +public CofactorModel(@Nonnegative int factor, @Nonnull 
RankInitScheme initScheme, + @Nonnull float c0, @Nonnull float c1, float lambdaTheta, + float lambdaBeta, float lambdaGamma) { + +// rank init scheme is gaussian +// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98 +this.factor = factor; +this.initScheme = initScheme; +this.globalBias = 0.d; +this.lambdaTheta = lambdaTheta; +this.lambdaBeta = lambdaBeta; +this.lambdaGamma = lambdaGamma; + +this.theta = new HashMap<>(); +this.beta = new HashMap<>(); +this.betaBias = new Object2DoubleArrayMap<>(); +this.gamma = new HashMap<>(); +this.gammaBias = new Object2DoubleArrayMap<>(); + +this.randU = newRandoms(factor, 31L); +this.randI = newRandoms(factor, 41L); + +checkHyperparameterC(c0); +checkHyperparameterC(c1); +this.c0 = c0; +this.c1 = c1; + +} + +private void initFactorVector(String key, Map weights)
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226525857 --- Diff: core/src/main/java/hivemall/mf/CofactorizationUDTF.java --- @@ -0,0 +1,574 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package hivemall.mf; + +import hivemall.UDTFWithOptions; +import hivemall.common.ConversionState; +import hivemall.fm.Feature; +import hivemall.fm.StringFeature; +import hivemall.utils.hadoop.HiveUtils; +import hivemall.utils.io.FileUtils; +import hivemall.utils.io.NioStatefulSegment; +import hivemall.utils.lang.NumberUtils; +import hivemall.utils.lang.Primitives; +import hivemall.utils.lang.SizeOf; +import org.apache.commons.cli.CommandLine; +import org.apache.commons.cli.Options; +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.hive.ql.exec.UDFArgumentException; +import org.apache.hadoop.hive.ql.metadata.HiveException; +import org.apache.hadoop.hive.serde2.objectinspector.*; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.BooleanObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory; +import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector; +import org.apache.hadoop.mapred.Counters; +import org.apache.hadoop.mapred.Reporter; + +import javax.annotation.Nonnull; +import javax.annotation.Nullable; +import java.io.File; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.util.ArrayList; +import java.util.List; + +import static hivemall.utils.lang.Primitives.FALSE_BYTE; +import static hivemall.utils.lang.Primitives.TRUE_BYTE; + +public class CofactorizationUDTF extends UDTFWithOptions { +private static final Log LOG = LogFactory.getLog(CofactorizationUDTF.class); + +// Option variables +// The number of latent factors +protected int factor; +// The scaling hyperparameter for zero entries in the rank matrix +protected float scale_zero; +// The scaling hyperparameter for non-zero entries in the rank matrix +protected float scale_nonzero; +// The preferred size of the miniBatch for training +protected int batchSize; +// The initial mean rating +protected float globalBias; +// 
Whether update (and return) the mean rating or not +protected boolean updateGlobalBias; +// The number of iterations +protected int maxIters; +// Whether to use bias clause +protected boolean useBiasClause; +// Whether to use normalization +protected boolean useL2Norm; +// regularization hyperparameters +protected float lambdaTheta; +protected float lambdaBeta; +protected float lambdaGamma; + +// Initialization strategy of rank matrix +protected CofactorModel.RankInitScheme rankInit; + +// Model itself +protected CofactorModel model; +protected int numItems; + +// Variable managing status of learning + +// The number of processed training examples +protected long count; + +protected ConversionState cvState; +private ConversionState validationState; + +// Input OIs and Context +protected StringObjectInspector contextOI; +protected ListObjectInspector featuresOI; +protected BooleanObjectInspector isItemOI; +protected ListObjectInspector sppmiOI; + +// Used for iterations +protected NioStatefulSegment fileIO; +protected ByteBuffer inputBuf; +private long lastWritePos; + +private Feature contextProbe; +private Feature[] featuresProbe; +private Feature[] sppmiProbe; +private boolean isItemProbe; +private long numValidations; +private long numTraining; +
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226578153
--- Diff: core/src/main/java/hivemall/fm/Feature.java ---
@@ -383,4 +383,10 @@ public static void l2normalize(@Nonnull final Feature[] features) {
         }
     }
 
+    @Override
--- End diff --
Why is this `equals` method required? I assume it is not used.
---
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226578495
--- Diff: core/src/main/java/hivemall/mf/CofactorModel.java ---
@@ -0,0 +1,715 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.mf;
+
+import hivemall.annotations.VisibleForTesting;
+import hivemall.fm.Feature;
+import hivemall.utils.lang.Preconditions;
+import hivemall.utils.math.MathUtils;
+import it.unimi.dsi.fastutil.objects.Object2DoubleArrayMap;
+import it.unimi.dsi.fastutil.objects.Object2DoubleMap;
+import org.apache.commons.math3.linear.ArrayRealVector;
+import org.apache.commons.math3.linear.Array2DRowRealMatrix;
+import org.apache.commons.math3.linear.RealMatrix;
+import org.apache.commons.math3.linear.RealVector;
+import org.apache.commons.math3.linear.SingularValueDecomposition;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+
+import javax.annotation.Nonnegative;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Random;
+
+
+public class CofactorModel {
+
+    public enum RankInitScheme {
+        random /* default */, gaussian;
+
--- End diff --
Please remove unnecessary line breaks.
---
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226579051 --- Diff: core/src/main/java/hivemall/mf/CofactorModel.java --- @@ -0,0 +1,715 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package hivemall.mf; + +import hivemall.annotations.VisibleForTesting; +import hivemall.fm.Feature; +import hivemall.utils.lang.Preconditions; +import hivemall.utils.math.MathUtils; +import it.unimi.dsi.fastutil.objects.Object2DoubleArrayMap; +import it.unimi.dsi.fastutil.objects.Object2DoubleMap; +import org.apache.commons.math3.linear.ArrayRealVector; +import org.apache.commons.math3.linear.Array2DRowRealMatrix; +import org.apache.commons.math3.linear.RealMatrix; +import org.apache.commons.math3.linear.RealVector; +import org.apache.commons.math3.linear.SingularValueDecomposition; +import org.apache.hadoop.hive.ql.metadata.HiveException; + +import javax.annotation.Nonnegative; +import javax.annotation.Nonnull; +import javax.annotation.Nullable; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.List; +import java.util.Map; +import java.util.Random; + + +public class CofactorModel { + +public enum RankInitScheme { +random /* default */, gaussian; + + +@Nonnegative +private float maxInitValue; +@Nonnegative +private double initStdDev; +@Nonnull +public static CofactorModel.RankInitScheme resolve(@Nullable String opt) { +if (opt == null) { +return random; +} else if ("gaussian".equalsIgnoreCase(opt)) { +return gaussian; +} else if ("random".equalsIgnoreCase(opt)) { +return random; +} +return random; +} + +public void setMaxInitValue(float maxInitValue) { +this.maxInitValue = maxInitValue; +} + +public void setInitStdDev(double initStdDev) { +this.initStdDev = initStdDev; +} + + +} + +@Nonnegative +private final int factor; + +// rank matrix initialization +private final RankInitScheme initScheme; + +@Nonnull +private double globalBias; + +// storing trainable latent factors and weights +private final Map theta; +private final Map beta; +private final Object2DoubleMap betaBias; +private final Map gamma; +private final Object2DoubleMap gammaBias; + +private final Random[] randU, randI; + +// hyperparameters +private final float c0, c1; 
+private final float lambdaTheta, lambdaBeta, lambdaGamma; + +// solve +private final RealMatrix B; +private final RealVector A; + +// error message strings +private static final String ARRAY_NOT_SQUARE_ERR = "Array is not square"; +private static final String DIFFERENT_DIMS_ERR = "Matrix, vector or array do not match in size"; + +public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme, + float c0, float c1, float lambdaTheta, float lambdaBeta, float lambdaGamma) { + +// rank init scheme is gaussian +// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98 +this.factor = factor; +this.initScheme = initScheme; +this.globalBias = 0.d; +this.lambdaTheta = lambdaTheta; +this.lambdaBeta = lambdaBeta; +this.lambdaGamma = lambdaGamma; + +this.theta = new HashMap<>(); +this.beta = new HashMap<>(); +this.betaBias = new Object2DoubleArrayMap<>(); +this.betaBias.defaultReturnValue(0.d)
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226578854 --- Diff: core/src/main/java/hivemall/mf/CofactorModel.java --- @@ -0,0 +1,715 @@
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226578817 --- Diff: core/src/main/java/hivemall/mf/CofactorModel.java --- @@ -0,0 +1,715 @@
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226579427 --- Diff: core/src/main/java/hivemall/mf/CofactorModel.java --- @@ -0,0 +1,715 @@
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226578559 --- Diff: core/src/main/java/hivemall/mf/CofactorModel.java --- @@ -0,0 +1,715 @@
[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/167#discussion_r226845079
--- Diff: core/src/main/java/hivemall/fm/Feature.java ---
@@ -383,4 +383,10 @@ public static void l2normalize(@Nonnull final Feature[] features) {
         }
     }
 
+    @Override
--- End diff --
See https://medium.com/codelog/overriding-hashcode-method-effective-java-notes-723c1fedf51c
Usually, overriding `equals` requires overriding `hashCode` as well, because `hashCode` (together with `equals`) is used for HashMap key lookup.
---
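A minimal sketch of the `equals`/`hashCode` point, under stated assumptions: `FeatureKey` and `EqualsHashCodeDemo` are hypothetical names for illustration and are not the actual `hivemall.fm.Feature` code. It only shows why a class used as a `HashMap` key should override both methods consistently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical demo class; not part of Hivemall.
final class EqualsHashCodeDemo {

    // Returns true when a distinct-but-equal key finds the stored entry.
    static boolean lookupSucceeds() {
        Map<FeatureKey, Double> weights = new HashMap<>();
        weights.put(new FeatureKey("item#42"), 0.5d);
        // HashMap first picks a bucket via hashCode(), then confirms with equals().
        return weights.containsKey(new FeatureKey("item#42"));
    }

    public static void main(String[] args) {
        System.out.println(lookupSucceeds());
    }
}

// Hypothetical stand-in for a feature used as a map key.
final class FeatureKey {
    private final String name;

    FeatureKey(String name) {
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof FeatureKey)) {
            return false;
        }
        return name.equals(((FeatureKey) o).name);
    }

    // Must agree with equals(): equal objects return equal hash codes.
    @Override
    public int hashCode() {
        return name.hashCode();
    }
}
```

If `hashCode` were left at the `Object` default while `equals` is overridden, the second `FeatureKey` instance would most likely land in a different bucket and the lookup would miss, which is the situation the comment warns about.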
[GitHub] incubator-hivemall issue #168: Add cache to reduce Maven build time on Travi...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/168 This does not seem to be working. Would `timeout: 1000` help? https://docs.travis-ci.com/user/caching/#setting-the-timeout Please add `[HIVEMALL-221]` to the PR title. ---
[GitHub] incubator-hivemall pull request #169: [HIVEMALL-222] Introduce Gradient Clip...
GitHub user myui opened a pull request: https://github.com/apache/incubator-hivemall/pull/169

[HIVEMALL-222] Introduce Gradient Clipping to avoid exploding gradient to General Classifier/Regressor

## What changes were proposed in this pull request?

Avoid [exploding gradients](http://www.cs.toronto.edu/~rgrosse/courses/csc321_2017/readings/L15%20Exploding%20and%20Vanishing%20Gradients.pdf) by gradient clipping (by value)

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-222

## How was this patch tested?

unit tests

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/myui/incubator-hivemall clipping

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/169.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #169

commit 0c10392d2a3c96b40df57e6b406333e0a239b9f9
Author: Makoto Yui
Date: 2018-10-24T08:14:15Z

    Updated for debugging purpose

commit e0dc4b954650c6751d6e37ee5ecf6c9656872b16
Author: Makoto Yui
Date: 2018-10-24T08:15:03Z

    Introduced gradient clipping by value to avoid exploding gradients

commit 7e932e99cfd990bb47ff7acfed44c19678fadc8f
Author: Makoto Yui
Date: 2018-10-24T08:15:52Z

    Added a unit test for gradient clipping

---
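For readers unfamiliar with the technique, here is a minimal sketch of gradient clipping by value. It is not the code from this PR: the class name `GradientClipping`, the method `clipByValue`, and the `threshold` parameter are assumptions made for illustration. Each gradient component is clamped independently into `[-threshold, threshold]`.

```java
import java.util.Arrays;

// Hypothetical helper; not the implementation merged in HIVEMALL-222.
final class GradientClipping {

    // Clamp a single gradient component into [-threshold, threshold].
    static double clipByValue(double grad, double threshold) {
        if (!(threshold > 0.d)) {
            throw new IllegalArgumentException("threshold must be positive: " + threshold);
        }
        return Math.max(-threshold, Math.min(threshold, grad));
    }

    // Clip every component of a gradient vector, returning a new array.
    static double[] clipByValue(double[] grads, double threshold) {
        double[] clipped = new double[grads.length];
        for (int i = 0; i < grads.length; i++) {
            clipped[i] = clipByValue(grads[i], threshold);
        }
        return clipped;
    }

    public static void main(String[] args) {
        double[] clipped = clipByValue(new double[] {0.3, -250.0, 12.5}, 10.0);
        System.out.println(Arrays.toString(clipped)); // prints [0.3, -10.0, 10.0]
    }
}
```

Clipping by value bounds each component separately, so it can change the direction of the gradient vector; clipping by norm, which rescales the whole vector, is the usual alternative when direction must be preserved.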
[GitHub] incubator-hivemall issue #168: [HIVEMALL-221] Add cache to reduce Maven buil...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/168 @maropu Is this `clean` required? https://github.com/apache/incubator-hivemall/blob/master/bin/run_travis_tests.sh#L42 ---
[GitHub] incubator-hivemall pull request #168: [HIVEMALL-221] Add cache to reduce Mav...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/168#discussion_r227796760
--- Diff: .travis.yml ---
@@ -1,5 +1,10 @@
 sudo: false
 
+cache:
+  timeout: 1500
+  directories:
+    - $HOME/.m2
--- End diff --
Shouldn't this be `$HOME/.m2/repository`?
https://github.com/apache/kafka/blob/trunk/.travis.yml#L52
https://github.com/airlift/drift/blob/master/.travis.yml#L11
https://github.com/mesos/storm/blob/master/.travis.yml#L6
---
[GitHub] incubator-hivemall pull request #168: [HIVEMALL-221] Add cache to reduce Mav...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/168#discussion_r227797186
--- Diff: .travis.yml ---
@@ -35,7 +40,7 @@ notifications:
   email: false
 
 script:
-  - ./bin/run_travis_tests.sh
+  - travis_wait 10 ./bin/run_travis_tests.sh
--- End diff --
Please revert this change because it has no effect.
---
[GitHub] incubator-hivemall issue #168: [HIVEMALL-221] Add cache to reduce Maven buil...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/168 See what happens. ---
[GitHub] incubator-hivemall issue #168: [HIVEMALL-221] Add cache to reduce Maven buil...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/168

```
[WARNING] Could not transfer metadata org.apache.hivemall:hivemall-spark2.1:0.5.1-incubating-SNAPSHOT/maven-metadata.xml from/to apache.snapshots (https://repository.apache.org/snapshots): Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] failed: Connection timed out (Connection timed out)
[WARNING] Failure to transfer org.apache.hivemall:hivemall-spark2.1:0.5.1-incubating-SNAPSHOT/maven-metadata.xml from https://repository.apache.org/snapshots/ was cached in the local repository, resolution will not be reattempted until the update interval of apache-snapshots has elapsed or updates are forced. Original error: Could not transfer metadata org.apache.hivemall:hivemall-spark2.1:0.5.1-incubating-SNAPSHOT/maven-metadata.xml from/to apache-snapshots (https://repository.apache.org/snapshots/): Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] failed: Connection timed out (Connection timed out)
[WARNING] Failure to transfer org.apache.hivemall:hivemall-spark2.1:0.5.1-incubating-SNAPSHOT/maven-metadata.xml from https://repository.apache.org/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of apache.snapshots has elapsed or updates are forced. Original error: Could not transfer metadata org.apache.hivemall:hivemall-spark2.1:0.5.1-incubating-SNAPSHOT/maven-metadata.xml from/to apache.snapshots (https://repository.apache.org/snapshots): Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] failed: Connection timed out (Connection timed out)
[INFO] Downloading from apache-snapshots: https://repository.apache.org/snapshots/org/apache/hivemall/hivemall-spark2.1/0.5.1-incubating-SNAPSHOT/hivemall-spark2.1-0.5.1-incubating-SNAPSHOT-sources.jar
[INFO] Downloading from apache.snapshots: https://repository.apache.org/snapshots/org/apache/hivemall/hivemall-spark2.1/0.5.1-incubating-SNAPSHOT/hivemall-spark2.1-0.5.1-incubating-SNAPSHOT-sources.jar
```

Hmm, could we provide a mirror repository in Travis CI?
---
[GitHub] incubator-hivemall issue #168: [HIVEMALL-221] Add cache to reduce Maven buil...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/168 We might need to configure an ASF mirror to avoid timeouts from the default ASF repository. https://maven.apache.org/guides/mini/guide-mirror-settings.html https://code.i-harness.com/ja/q/c326f0 ---
[GitHub] incubator-hivemall issue #163: [HIVEMALL-196] Support BM25 scoring
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/163 @jaxony Merged with some modifications. Thank you for your first contribution to Apache Hivemall! ---
[GitHub] incubator-hivemall pull request #170: [WIP][HIVEMALL-223] Add -kv_map and -v...
GitHub user myui opened a pull request: https://github.com/apache/incubator-hivemall/pull/170 [WIP][HIVEMALL-223] Add -kv_map and -vk_map option to to_ordered_list UDAF ## What changes were proposed in this pull request? Add `-kv_map` and `-vk_map` option to `to_ordered_list` UDAF. ## What type of PR is it? Improvement ## What is the Jira issue? https://issues.apache.org/jira/browse/HIVEMALL-223 ## How was this patch tested? unit tests and manual tests on EMR ## How to use this feature? Will be described in http://hivemall.incubator.apache.org/userguide/misc/generic_funcs.html#array ## Checklist - [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit? - [ ] Did you run system tests on Hive (or Spark)? You can merge this pull request into a Git repository by running: $ git pull https://github.com/myui/incubator-hivemall HIVEMALL-223 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/170.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #170 commit 26f361ce7b355410772577f0754f4bb5537ababf Author: Makoto Yui Date: 2018-11-12T04:19:37Z Added -kv_map and -vk_map option commit 39ee911cb12e63f924229e962bbb00247297f75d Author: Makoto Yui Date: 2018-11-12T04:20:13Z Added WIP unit tests for -kv_map/vk_map option of to_ordered_list UDAF ---
[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/171#discussion_r233287092 --- Diff: spark/spark-2.3/src/test/scala/org/apache/spark/sql/hive/XGBoostSuite.scala --- @@ -77,6 +77,7 @@ final class XGBoostSuite extends VectorQueryTest { val model = hiveContext.sparkSession.read.format("libxgboost").load(tempDir) val predict = model.join(mllibTestDf) .xgboost_predict($"rowid", $"features", $"model_id", $"pred_model") --- End diff -- Let's disable xgboost for spark-2.3. ---
[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/171#discussion_r233287740 --- Diff: spark/spark-2.3/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala --- @@ -1935,18 +1935,6 @@ object HivemallOps { ) } - /** - * @see [[hivemall.tools.array.SubarrayUDF]] - * @group tools.array - */ - def subarray(original: Column, fromIndex: Column, toIndex: Column): Column = withExpr { -planHiveUDF( - "hivemall.tools.array.SubarrayUDF", - "subarray", - original :: fromIndex :: toIndex :: Nil -) - } --- End diff -- Would it be difficult to replace SubarrayUDF with ArraySliceUDF? ``` def subarray(original: Column, fromIndex: Column, length: Column): Column = withExpr { planHiveUDF( "hivemall.tools.array.ArraySliceUDF", ``` ---
[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/171#discussion_r233288186 --- Diff: spark/pom.xml --- @@ -52,6 +52,12 @@ hivemall-core ${project.version} compile + + + io.netty + netty-all + --- End diff -- ah... I see. ---
[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/171#discussion_r233324312 --- Diff: spark/spark-2.3/src/test/scala/org/apache/spark/sql/hive/XGBoostSuite.scala --- @@ -77,6 +77,7 @@ final class XGBoostSuite extends VectorQueryTest { val model = hiveContext.sparkSession.read.format("libxgboost").load(tempDir) val predict = model.join(mllibTestDf) .xgboost_predict($"rowid", $"features", $"model_id", $"pred_model") --- End diff -- BTW, could you paste the stack trace of the exception? ---
[GitHub] incubator-hivemall issue #172: Fix typo
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/172 Merged, thanks! ---
[GitHub] incubator-hivemall issue #171: [SPARK][HOTFIX] Fix the existing test failure...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/171 Merged. Thanks! ---
[GitHub] incubator-hivemall pull request #173: [HIVEMALL-227][DOC] Removed md5 and re...
GitHub user myui opened a pull request: https://github.com/apache/incubator-hivemall/pull/173 [HIVEMALL-227][DOC] Removed md5 and replace sha1 with sha512 following new ASF policy ## What changes were proposed in this pull request? Removed md5 and replace sha1 with sha512 following new ASF policy ## What type of PR is it? Documentation ## What is the Jira issue? https://issues.apache.org/jira/browse/HIVEMALL-227 You can merge this pull request into a Git repository by running: $ git pull https://github.com/myui/incubator-hivemall HIVEMALL-227 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/173.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #173 commit 583eb9991cf8db730d46b431b1cb80ebaeb293a8 Author: Makoto Yui Date: 2018-11-15T09:18:39Z Removed md5 and replace sha1 with sha512 following new ASF policy ---
[GitHub] incubator-hivemall pull request #175: [WIP][HIVEMALL-230] Revise Optimizer I...
GitHub user myui opened a pull request: https://github.com/apache/incubator-hivemall/pull/175 [WIP][HIVEMALL-230] Revise Optimizer Implementation ## What changes were proposed in this pull request? Revise Optimizer implementation. 1. Revise default hyperparameters of AdaDelta and Adam. 2. Support AdamW, AdamHD, Eve, and YellowFin optimizer. * Fixing Weight Decay Regularization in Adam https://openreview.net/forum?id=rk6qdGgCZ * On the Convergence of Adam and Beyond https://openreview.net/forum?id=ryQu7f-RZ * AdamHD (Adam with Hypergradient descent) https://arxiv.org/pdf/1703.04782.pdf * Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates https://arxiv.org/abs/1611.01505 * YellowFin and the Art of Momentum Tuning https://arxiv.org/abs/1706.03471 ## What type of PR is it? Improvement, Feature ## What is the Jira issue? https://issues.apache.org/jira/browse/HIVEMALL-230 ## How was this patch tested? unit tests, emr (to appear) ## How to use this feature? to appear ## Checklist - [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit? - [ ] Did you run system tests on Hive (or Spark)? 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/myui/incubator-hivemall adam_test Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/175.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #175 commit 5168cf06bf03c38f005d435a4415ce8cb8140891 Author: Makoto Yui Date: 2018-12-03T07:04:29Z Added ongoing unit test files commit ed1b6302183a687a3584fe62ce5fa92b26c828ad Author: Makoto Yui Date: 2018-12-04T09:41:42Z Fixed to show ETA in debug log commit 5c9d63f9fc184f05eed28f03986c6269c4ea6e93 Author: Makoto Yui Date: 2018-12-04T09:42:02Z Added unit tests commit 243f4b40899b960f4942c75f89c0c4c94974b03b Author: Makoto Yui Date: 2018-12-05T09:48:17Z Added comments commit ae29e9a669dcd311b154615e19900ec4b01fd4d8 Author: Makoto Yui Date: 2018-12-06T07:08:48Z Refactored commit c25ce02db537570c6ed75db74d9a3783b316c694 Author: Makoto Yui Date: 2018-12-06T07:10:05Z Added square() method commit 71671d10138aa54c0485809b6126753a54dbe3e8 Author: Makoto Yui Date: 2018-12-06T07:10:42Z Added helper methods commit 6f4edbbaaac37884533132dea00c81f36da45e50 Author: Makoto Yui Date: 2018-12-06T07:22:51Z Refactored ADAM implementation commit e61f22afaa46bdf705c2760cebaa601929a77608 Author: Makoto Yui Date: 2018-12-06T08:52:08Z Added logging message commit 22c3f7c132fc01528c93c6e15d40a2b70f1771c0 Author: Makoto Yui Date: 2018-12-06T08:53:01Z Improved -eta option to take eta0 for Fixed ETA estimator commit e9b9b1420c3b573b5cbe15e4340d862251fac81d Author: Makoto Yui Date: 2018-12-06T08:53:28Z Added unit test commit 7c6e4a1da5eaeb99c02a9a83f1519d5274131037 Author: Makoto Yui Date: 2018-12-06T09:06:16Z Made eta default hyper-parameter flexible for each optimizer commit a92293906d43c25ce47032644774723a0cf713d9 Author: Makoto Yui Date: 2018-12-06T09:36:26Z Changed the default hyperparameter of 
AdaDelta commit 1494ea298497a846650b2d9f6799add77105ae77 Author: Makoto Yui Date: 2018-12-07T05:03:21Z Reduced the size of test data commit 79197a84ca4d840ab3150730d5e6d4a5ad96e719 Author: Makoto Yui Date: 2018-12-07T05:39:13Z Improved -help option handling commit 4fdcf6c84ec81c174f5e107038660b1200b1a9a5 Author: Makoto Yui Date: 2018-12-07T05:48:07Z Added assertions commit e1c7a68df679a65f496268bd4acc286b19d0a964 Author: Makoto Yui Date: 2018-12-07T07:39:58Z Fixed AdaDelta eta to 1.0 commit b8e5698ecd7e7d2758ef85a338c053f5bbcc663d Author: Makoto Yui Date: 2018-12-07T09:13:48Z Supported -amsgrad in Adam commit aa512c3b71039f97c2ac08b598fcb11f1cfc4d80 Author: Makoto Yui Date: 2018-12-07T09:59:59Z Supported -decay option in ADAM optimizer commit 19bd276ff9867ba93f42c241feb9aa5aafd0836c Author: Makoto Yui Date: 2018-12-07T10:15:24Z Revise the default eta0/alpha value commit 19fa61145e8be18c3f86988905b35f171e1ee50e Author: Makoto Yui Date: 2018-12-10T08:37:05Z Revised ADAM hyperparameter treatment ---
[GitHub] incubator-hivemall issue #13: [WIP] Kernelized Passive-Aggressive Algorithm ...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/13 Usage
```sql
use a9a;

create external table train (
  rowid int,
  label float,
  features ARRAY<STRING>
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ","
STORED AS TEXTFILE LOCATION 's3://myui-dev/Datasets/a9a/train/';

create external table test (
  rowid int,
  label float,
  features ARRAY<STRING>
) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' COLLECTION ITEMS TERMINATED BY ","
STORED AS TEXTFILE LOCATION 's3://myui-dev/Datasets/a9a/test/';

create or replace view train_x3 as
select * from (
  select amplify(3, rowid, label, features) as (rowid, label, features)
  from train
) t
CLUSTER BY rand(31);

create or replace view test_exploded as
select
  t1.rowid, t2.h, t2.hk, t2.Xh, t2.Xk
from
  test t1
  LATERAL VIEW feature_pairs(features, "-kpa") t2 as h, hk, Xh, Xk;

drop table kpa_model;
create table kpa_model as
select
  feature,
  avg(w0) as w0,
  avg(w1) as w1,
  avg(w2) as w2,
  avg(w3) as w3
from (
  select train_kpa(features,label,"-c 0.01") as (feature, w0, w1, w2, w3)
  from train -- train_x3
) t
group by feature;

create or replace view kpa_predict as
WITH p1 as (
  select
    t1.rowid,
    kpa_predict(
      t1.Xh, -- nonnull
      t1.Xk, -- nonnull
      m1.w0, -- nullable
      m1.w1, -- nonnull
      m1.w2, -- nonnull
      m2.w3 -- nullable
    ) as score
  from
    test_exploded t1
    LEFT OUTER JOIN kpa_model m1 ON (m1.feature = t1.h)
    LEFT OUTER JOIN kpa_model m2 ON (m2.feature = t1.hk)
  group by rowid
)
select
  rowid,
  case when score > 0.0 then 1 else 0 end as label
from p1;

create or replace view kpa_submit as
select
  t.label as actual,
  p.label as predicted
from
  test t
  JOIN kpa_predict p on (t.rowid = p.rowid);

select count(1)/16281 from kpa_submit where actual = predicted;
```
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-hivemall pull request #19: [HIVEMALL-23] Added Java annotations
GitHub user myui opened a pull request: https://github.com/apache/incubator-hivemall/pull/19 [HIVEMALL-23] Added Java annotations Introduce Java Annotations such as `Experimental`, `Issue`, and `VisibleForTesting`. See https://issues.apache.org/jira/browse/HIVEMALL-23 for the detail. You can merge this pull request into a Git repository by running: $ git pull https://github.com/myui/incubator-hivemall dev/annotations Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/19.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #19 commit c05a8721ab3c705edd837dc56a64b39ffc705944 Author: myui Date: 2017-01-16T08:12:53Z Added Java annotations ---
[GitHub] incubator-hivemall issue #20: [HIVEMALL-28] Set HIVEMALL_HOME to absolute pa...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/20 @wangyum Thank you for the contribution. @maropu Could you review this one? ---
[GitHub] incubator-hivemall issue #20: [HIVEMALL-28] Set HIVEMALL_HOME to absolute pa...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/20 @wangyum merged. Thank you for the contribution! ---
[GitHub] incubator-hivemall issue #22: [HIVEMALL-30] Increase -Xmx to 1536 to avoid O...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/22 @maropu Could you take a look? ---
[GitHub] incubator-hivemall issue #21: [HIVEMALL-29] Add github pull request template
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/21 @wangyum Thank you for the contribution. Merged with small modifications. ---
[GitHub] incubator-hivemall issue #22: [HIVEMALL-30] Increase -Xmx to -Xmx1536m to av...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/22 @maropu BTW, why is `mvn -q scalastyle:check test -Pspark-2.0" exited with 1` happening? ---
[GitHub] incubator-hivemall issue #22: [HIVEMALL-30] Increase -Xmx to -Xmx1536m to av...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/22 @maropu It seems GC is still happening for some cases... ``` HivemallFeatureOpsSuite: No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself. ``` `MaxPermGen=1024m` might be too big; other memory regions may need more space. Do we need `-MaxPermGen`? Fewer parameters are better when configuring the JVM. ---
[GitHub] incubator-hivemall issue #22: [HIVEMALL-30] Increase -Xmx to -Xmx1536m to av...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/22 @wangyum @maropu Thanks. Merged with some modifications. Configuration for spark-1.6 should also be changed. ---
[GitHub] incubator-hivemall issue #24: [HIVEMALL-32] Print explicit error messages in...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/24 LGTM. Could you update ASF master as follows:
```
git checkout master
git checkout -b CheckCompiler
git pull https://github.com/maropu/incubator-hivemall.git hotfix/CheckCompiler
git log | grep "Author" | head -1
git checkout master
git merge --squash hotfix/CheckCompiler
git commit -a --author="Takeshi Yamamuro " --message="Close #24: Print explicit error messages in building xgboost with clang"
git push origin master
```
Note that it assumes that `origin` is https://git-wip-us.apache.org/repos/asf/incubator-hivemall.git ---
[GitHub] incubator-hivemall issue #25: [HIVEMALL-34] Fix a bug to wrongly use mllib v...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/25 @maropu LGTM. Please merge this PR. ---
[GitHub] incubator-hivemall issue #26: [HIVEMALL-35] Remove unnecessary implicit conv...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/26 @maropu LGTM. Please merge this PR and close the JIRA issue as fixed. ---
[GitHub] incubator-hivemall issue #27: [HIVEMALL-36] Refactor each_top_k
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/27 @maropu LGTM. Please merge this PR and close the JIRA ticket. BTW, `HiveUdfWithFeatureSuite` causes OOM again. ---
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97726633 --- Diff: core/src/main/java/hivemall/optimizer/Optimizer.java --- @@ -0,0 +1,246 @@ +/* + * Hivemall: Hive scalable Machine Learning Library + * + * Copyright (C) 2015 Makoto YUI + * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package hivemall.optimizer; + +import java.util.Map; +import javax.annotation.Nonnull; +import javax.annotation.concurrent.NotThreadSafe; + +import hivemall.model.WeightValue; +import hivemall.model.IWeightValue; + +public interface Optimizer { + +/** + * Update the weights of models thru this interface. --- End diff -- Revised. ---
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97936571 --- Diff: core/src/main/java/hivemall/optimizer/Optimizer.java --- @@ -0,0 +1,246 @@ +/* + * Hivemall: Hive scalable Machine Learning Library + * + * Copyright (C) 2015 Makoto YUI + * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package hivemall.optimizer; + +import java.util.Map; +import javax.annotation.Nonnull; +import javax.annotation.concurrent.NotThreadSafe; + +import hivemall.model.WeightValue; +import hivemall.model.IWeightValue; + +public interface Optimizer { + +/** + * Update the weights of models thru this interface. 
+ */ +float computeUpdatedValue(@Nonnull Object feature, float weight, float gradient); + +// Count up #step to tune learning rate +void proceedStep(); + +static abstract class OptimizerBase implements Optimizer { + +protected final EtaEstimator etaImpl; +protected final Regularization regImpl; + +protected int numStep = 1; + +public OptimizerBase(final Map<String, String> options) { +this.etaImpl = EtaEstimator.get(options); +this.regImpl = Regularization.get(options); +} + +@Override +public void proceedStep() { +numStep++; +} + +// Directly update a given `weight` in terms of performance +protected void computeUpdateValue( +@Nonnull final IWeightValue weight, float gradient) { +float delta = computeUpdateValueImpl(weight, regImpl.regularize(weight.get(), gradient)); +weight.set(weight.get() - etaImpl.eta(numStep) * delta); +} + +// Compute a delta to update +protected float computeUpdateValueImpl( +@Nonnull final IWeightValue weight, float gradient) { +return gradient; +} + +} + +@NotThreadSafe +static final class SGD extends OptimizerBase { + +private final IWeightValue weightValueReused; + +public SGD(final Map<String, String> options) { +super(options); +this.weightValueReused = new WeightValue(0.f); +} + +@Override +public float computeUpdatedValue( +@Nonnull Object feature, float weight, float gradient) { +computeUpdateValue(weightValueReused, gradient); +return weightValueReused.get(); +} + +} + +static abstract class AdaDelta extends OptimizerBase { + +private final float decay; +private final float eps; +private final float scale; + +public AdaDelta(Map<String, String> options) { +super(options); +float decay = 0.95f; +float eps = 1e-6f; +float scale = 100.0f; --- End diff -- It's a Hivemall extension. Hivemall's AdaGrad stores the `*scaled* sum of squared gradients` in `float`, not `double`, to reduce memory consumption. 
When the value is used, it tries to recover the original sum of squared gradients as follows: `double sumOfSquaredGradients = scaledSumOfSquaredGradients * scaling` ---
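The scaling trick described in this comment can be sketched as follows. This is a minimal illustration under stated assumptions, not Hivemall's actual code: the class `ScaledSqGradSketch` and its method names are hypothetical, and dividing each squared gradient by the scale before accumulating is an assumption about how the scaled sum is built.

```java
/** Hypothetical sketch: keep a *scaled* sum of squared gradients in a float. */
final class ScaledSqGradSketch {
    private final float scale;           // assumed scaling factor, e.g. 100
    private float scaledSumSqGrad = 0f;  // stored as float to save memory

    ScaledSqGradSketch(float scale) {
        this.scale = scale;
    }

    /** Accumulate g * (g / scale), i.e. the squared gradient divided by the scale. */
    void add(float gradient) {
        scaledSumSqGrad += gradient * (gradient / scale);
    }

    /** Recover the unscaled sum as a double at the point of use. */
    double sumOfSquaredGradients() {
        return (double) scaledSumSqGrad * scale;
    }

    public static void main(String[] args) {
        ScaledSqGradSketch acc = new ScaledSqGradSketch(100f);
        acc.add(3f);
        acc.add(4f);
        // 3^2 + 4^2 = 25, up to float rounding
        System.out.println(acc.sumOfSquaredGradients());
    }
}
```

The trade-off is that each stored accumulator takes 4 bytes instead of 8, at the cost of float rounding in the accumulated value.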
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97724411 --- Diff: core/src/main/java/hivemall/classifier/GeneralClassifierUDTF.java --- @@ -0,0 +1,122 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package hivemall.classifier; + +import java.util.HashMap; +import java.util.Map; +import javax.annotation.Nonnull; + +import org.apache.commons.cli.CommandLine; +import org.apache.commons.cli.Options; +import org.apache.hadoop.hive.ql.exec.UDFArgumentException; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; + +import hivemall.optimizer.LossFunctions; +import hivemall.model.FeatureValue; + +/** + * A general classifier class with replaceable optimization functions. 
+ */ +public class GeneralClassifierUDTF extends BinaryOnlineClassifierUDTF { + +protected final Map<String, String> optimizerOptions; + +public GeneralClassifierUDTF() { +super(true); // This enables new model interfaces +this.optimizerOptions = new HashMap<String, String>(); +// Set default values +optimizerOptions.put("optimizer", "adagrad"); +optimizerOptions.put("eta", "fixed"); +optimizerOptions.put("eta0", "1.0"); +optimizerOptions.put("regularization", "RDA"); +optimizerOptions.put("lambda", "1e-6"); +optimizerOptions.put("scale", "100.0"); +optimizerOptions.put("lambda", "1.0"); --- End diff -- `lambda` is specified twice. ---
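To illustrate why the duplicate `lambda` entry matters: `java.util.Map.put` replaces any value already stored under the same key, so the later `"1.0"` silently wins over `"1e-6"`. A minimal, self-contained sketch (the class `DuplicateKeySketch` and its `defaults()` method are hypothetical and not part of Hivemall):

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a second put() on the same key overwrites the first value. */
final class DuplicateKeySketch {

    /** Builds a small option map that sets "lambda" twice, as in the reviewed diff. */
    static Map<String, String> defaults() {
        Map<String, String> options = new HashMap<String, String>();
        options.put("lambda", "1e-6");
        options.put("scale", "100.0");
        options.put("lambda", "1.0"); // replaces "1e-6" without any warning
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = defaults();
        System.out.println(options.get("lambda")); // prints "1.0"
        System.out.println(options.size());        // prints "2"
    }
}
```

Because the overwrite is silent, the intended default regularization strength cannot be told from the code alone; only one of the two `put` calls should remain.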
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97716028 --- Diff: core/src/main/java/hivemall/model/NewDenseModel.java --- @@ -0,0 +1,293 @@ +/* + * Hivemall: Hive scalable Machine Learning Library + * + * Copyright (C) 2015 Makoto YUI + * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package hivemall.model; + +import java.util.Arrays; +import javax.annotation.Nonnull; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; + +import hivemall.model.WeightValue.WeightValueWithCovar; +import hivemall.utils.collections.IMapIterator; +import hivemall.utils.hadoop.HiveUtils; +import hivemall.utils.lang.Copyable; +import hivemall.utils.math.MathUtils; + +public final class NewDenseModel extends AbstractPredictionModel { +private static final Log logger = LogFactory.getLog(NewDenseModel.class); + +private int size; +private float[] weights; +private float[] covars; + +// optional value for MIX +private short[] clocks; +private byte[] deltaUpdates; + +public NewDenseModel(int ndims) { +this(ndims, false); +} + +public NewDenseModel(int ndims, boolean withCovar) { +super(); +int size = ndims + 1; +this.size = size; +this.weights = new float[size]; +if (withCovar) { +float[] covars = new float[size]; +Arrays.fill(covars, 1f); +this.covars = covars; +} else { 
+this.covars = null; +} +this.clocks = null; +this.deltaUpdates = null; +} + +@Override +protected boolean isDenseModel() { +return true; +} + +@Override +public boolean hasCovariance() { +return covars != null; +} + +@Override +public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x, +boolean sum_of_gradients) {} + +@Override +public void configureClock() { +if (clocks == null) { +this.clocks = new short[size]; +this.deltaUpdates = new byte[size]; +} +} + +@Override +public boolean hasClock() { +return clocks != null; +} + +@Override +public void resetDeltaUpdates(int feature) { +deltaUpdates[feature] = 0; +} + +private void ensureCapacity(final int index) { +if (index >= size) { +int bits = MathUtils.bitsRequired(index); +int newSize = (1 << bits) + 1; +int oldSize = size; +logger.info("Expands internal array size from " + oldSize + " to " + newSize + " (" ++ bits + " bits)"); +this.size = newSize; +this.weights = Arrays.copyOf(weights, newSize); +if (covars != null) { +this.covars = Arrays.copyOf(covars, newSize); +Arrays.fill(covars, oldSize, newSize, 1.f); +} +if(clocks != null) { +this.clocks = Arrays.copyOf(clocks, newSize); +this.deltaUpdates = Arrays.copyOf(deltaUpdates, newSize); +} +} +} + +@SuppressWarnings("unchecked") +@Override +public T get(Object feature) { +final int i = HiveUtils.parseInt(feature); +if (i >= size) { +return null; +} +if(covars != null) { +return (T) new WeightValueWithCovar(weights[i], covars[i]); +} else { +return (T) new WeightValue(weights[i]); +} +} + +@Override +public void set(Object feature, T value) { +int i = HiveUtils.parseInt(feature); +ensureCapacity(i); +
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97722648 --- Diff: core/src/main/java/hivemall/optimizer/EtaEstimator.java --- @@ -157,4 +158,34 @@ public static EtaEstimator get(@Nullable CommandLine cl, float defaultEta0) return new InvscalingEtaEstimator(eta0, power_t); } +@Nonnull +public static EtaEstimator get(@Nonnull final Map options) +throws IllegalArgumentException { +final String etaName = options.get("eta"); +if(etaName == null) { +return new FixedEtaEstimator(1.f); --- End diff -- Absolutely. Changed to `InvscalingEtaEstimator(0.1f, 0.1d)` when `eta` is not provided. (cc: @maropu ) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
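The fallback described in the reply above can be sketched as follows. This is a minimal illustration, not the code in `hivemall.optimizer.EtaEstimator`: the class name `InvScalingEtaSketch`, the `fromOptions` helper, and the formula `eta0 / t^power_t` (the usual definition of inverse scaling) are assumptions made for this example.

```java
import java.util.Map;

/** Hypothetical stand-in for an inverse-scaling learning-rate schedule. */
final class InvScalingEtaSketch {
    private final float eta0;
    private final double powerT;

    InvScalingEtaSketch(float eta0, double powerT) {
        this.eta0 = eta0;
        this.powerT = powerT;
    }

    /** Learning rate for the t-th update (t >= 1), assumed to be eta0 / t^powerT. */
    float eta(long t) {
        if (t < 1L) {
            throw new IllegalArgumentException("t must be >= 1: " + t);
        }
        return (float) (eta0 / Math.pow(t, powerT));
    }

    /**
     * Assumed fallback: when the "eta" option is absent, use inverse scaling
     * with eta0 = 0.1 and power_t = 0.1, as the reply describes.
     */
    static InvScalingEtaSketch fromOptions(Map<String, String> options) {
        if (options.get("eta") == null) {
            return new InvScalingEtaSketch(0.1f, 0.1d);
        }
        // Other schedules are out of scope for this sketch.
        throw new UnsupportedOperationException("eta=" + options.get("eta"));
    }
}
```

With an empty options map, `fromOptions` returns a schedule whose rate starts at 0.1 and decays slowly as the step count grows, instead of staying fixed at 1.0.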
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97715666 --- Diff: core/src/main/java/hivemall/classifier/GeneralClassifierUDTF.java --- @@ -0,0 +1,122 @@ +/* + * Hivemall: Hive scalable Machine Learning Library + * + * Copyright (C) 2015 Makoto YUI + * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package hivemall.classifier; + +import java.util.HashMap; +import java.util.Map; +import javax.annotation.Nonnull; + +import org.apache.commons.cli.CommandLine; +import org.apache.commons.cli.Options; +import org.apache.hadoop.hive.ql.exec.UDFArgumentException; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; + +import hivemall.optimizer.LossFunctions; +import hivemall.model.FeatureValue; + +/** + * A general classifier class with replaceable optimization functions. 
+ */ +public class GeneralClassifierUDTF extends BinaryOnlineClassifierUDTF { + +protected final Map optimizerOptions; + +public GeneralClassifierUDTF() { +super(true); // This enables new model interfaces +this.optimizerOptions = new HashMap(); +// Set default values +optimizerOptions.put("optimizer", "adagrad"); +optimizerOptions.put("eta", "fixed"); +optimizerOptions.put("eta0", "1.0"); +optimizerOptions.put("regularization", "RDA"); +optimizerOptions.put("lambda", "1e-6"); +optimizerOptions.put("scale", "100.0"); +optimizerOptions.put("lambda", "1.0"); +} + +@Override +public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException { +if(argOIs.length != 2 && argOIs.length != 3) { +throw new UDFArgumentException( +this.getClass().getSimpleName() + + " takes 2 or 3 arguments: List features, int label " + + "[, constant string options]"); +} +return super.initialize(argOIs); +} + +@Override +protected Options getOptions() { +Options opts = super.getOptions(); +opts.addOption("optimizer", "opt", true, "Optimizer to update weights [default: adagrad+rda]"); +opts.addOption("eta", "eta0", true, "Initial learning rate [default 1.0]"); +opts.addOption("lambda", true, "Lambda value of RDA [default: 1e-6f]"); +opts.addOption("scale", true, "Scaling factor for cumulative weights [100.0]"); +opts.addOption("regularization", "reg", true, "Regularization type [default not-defined]"); +opts.addOption("lambda", true, "Regularization term on weights [default 1.0]"); +return opts; +} + +@Override +protected CommandLine processOptions(ObjectInspector[] argOIs) throws UDFArgumentException { +final CommandLine cl = super.processOptions(argOIs); +assert(cl != null); +if(cl != null) { +for(final String arg : cl.getArgs()) { +optimizerOptions.put(arg, cl.getOptionValue(arg)); +} +} +return cl; +} + +@Override +protected Map getOptimzierOptions() { --- End diff -- removed `getOptimzierOptions`
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97715653 --- Diff: core/src/main/java/hivemall/classifier/GeneralClassifierUDTF.java --- @@ -0,0 +1,122 @@ +/* + * Hivemall: Hive scalable Machine Learning Library + * + * Copyright (C) 2015 Makoto YUI + * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package hivemall.classifier; + +import java.util.HashMap; +import java.util.Map; +import javax.annotation.Nonnull; + +import org.apache.commons.cli.CommandLine; +import org.apache.commons.cli.Options; +import org.apache.hadoop.hive.ql.exec.UDFArgumentException; +import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector; +import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector; + +import hivemall.optimizer.LossFunctions; +import hivemall.model.FeatureValue; + +/** + * A general classifier class with replaceable optimization functions. 
+ */ +public class GeneralClassifierUDTF extends BinaryOnlineClassifierUDTF { + +protected final Map optimizerOptions; + +public GeneralClassifierUDTF() { +super(true); // This enables new model interfaces +this.optimizerOptions = new HashMap(); +// Set default values +optimizerOptions.put("optimizer", "adagrad"); +optimizerOptions.put("eta", "fixed"); +optimizerOptions.put("eta0", "1.0"); +optimizerOptions.put("regularization", "RDA"); +optimizerOptions.put("lambda", "1e-6"); +optimizerOptions.put("scale", "100.0"); +optimizerOptions.put("lambda", "1.0"); +} + +@Override +public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException { +if(argOIs.length != 2 && argOIs.length != 3) { +throw new UDFArgumentException( +this.getClass().getSimpleName() + + " takes 2 or 3 arguments: List features, int label " + + "[, constant string options]"); +} +return super.initialize(argOIs); +} + +@Override +protected Options getOptions() { +Options opts = super.getOptions(); +opts.addOption("optimizer", "opt", true, "Optimizer to update weights [default: adagrad+rda]"); +opts.addOption("eta", "eta0", true, "Initial learning rate [default 1.0]"); +opts.addOption("lambda", true, "Lambda value of RDA [default: 1e-6f]"); +opts.addOption("scale", true, "Scaling factor for cumulative weights [100.0]"); +opts.addOption("regularization", "reg", true, "Regularization type [default not-defined]"); +opts.addOption("lambda", true, "Regularization term on weights [default 1.0]"); +return opts; +} + +@Override +protected CommandLine processOptions(ObjectInspector[] argOIs) throws UDFArgumentException { +final CommandLine cl = super.processOptions(argOIs); +assert(cl != null); +if(cl != null) { --- End diff -- Fixed
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97934044 --- Diff: core/src/main/java/hivemall/optimizer/Optimizer.java --- @@ -0,0 +1,246 @@ +/* + * Hivemall: Hive scalable Machine Learning Library + * + * Copyright (C) 2015 Makoto YUI + * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package hivemall.optimizer; + +import java.util.Map; +import javax.annotation.Nonnull; +import javax.annotation.concurrent.NotThreadSafe; + +import hivemall.model.WeightValue; +import hivemall.model.IWeightValue; + +public interface Optimizer { + +/** + * Update the weights of models thru this interface. + */ +float computeUpdatedValue(@Nonnull Object feature, float weight, float gradient); + +// Count up #step to tune learning rate --- End diff -- Renamed and added Javadoc.
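To show how a step counter can feed the learning rate behind an interface shaped like the one quoted above, here is a rough sketch. Only `computeUpdatedValue(feature, weight, gradient)` comes from the diff. The names `OptimizerSketch`, `PlainSgdSketch`, and `nextStep` are placeholders (the method name actually chosen in the pull request is not shown in this message), and the `eta0 / sqrt(step)` schedule is an assumption picked for simplicity.

```java
/** Placeholder interface mirroring the shape of the quoted Optimizer. */
interface OptimizerSketch {
    /** Returns the new weight for a feature given its current weight and gradient. */
    float computeUpdatedValue(Object feature, float weight, float gradient);

    /** Advances the step counter that drives the learning rate (hypothetical name). */
    void nextStep();
}

/** Plain SGD with an assumed eta0 / sqrt(step) learning-rate decay. */
final class PlainSgdSketch implements OptimizerSketch {
    private final float eta0;
    private long step = 1L;

    PlainSgdSketch(float eta0) {
        this.eta0 = eta0;
    }

    @Override
    public void nextStep() {
        step++;
    }

    @Override
    public float computeUpdatedValue(Object feature, float weight, float gradient) {
        // The feature key is unused here; per-feature optimizers such as AdaGrad would need it.
        final float eta = (float) (eta0 / Math.sqrt(step));
        return weight - eta * gradient;
    }
}
```

A trainer would call `nextStep()` once per training example and `computeUpdatedValue` once per feature in that example, so every feature in the same example sees the same learning rate.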
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97715714 --- Diff: core/src/main/java/hivemall/model/NewDenseModel.java --- @@ -0,0 +1,293 @@ +/* + * Hivemall: Hive scalable Machine Learning Library + * + * Copyright (C) 2015 Makoto YUI + * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package hivemall.model; + +import java.util.Arrays; +import javax.annotation.Nonnull; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; + +import hivemall.model.WeightValue.WeightValueWithCovar; +import hivemall.utils.collections.IMapIterator; +import hivemall.utils.hadoop.HiveUtils; +import hivemall.utils.lang.Copyable; +import hivemall.utils.math.MathUtils; + +public final class NewDenseModel extends AbstractPredictionModel { +private static final Log logger = LogFactory.getLog(NewDenseModel.class); + +private int size; +private float[] weights; +private float[] covars; + +// optional value for MIX +private short[] clocks; +private byte[] deltaUpdates; + +public NewDenseModel(int ndims) { +this(ndims, false); +} + +public NewDenseModel(int ndims, boolean withCovar) { +super(); +int size = ndims + 1; +this.size = size; +this.weights = new float[size]; +if (withCovar) { +float[] covars = new float[size]; +Arrays.fill(covars, 1f); +this.covars = covars; +} else { 
+this.covars = null; +} +this.clocks = null; +this.deltaUpdates = null; +} + +@Override +protected boolean isDenseModel() { +return true; +} + +@Override +public boolean hasCovariance() { +return covars != null; +} + +@Override +public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x, +boolean sum_of_gradients) {} + +@Override +public void configureClock() { +if (clocks == null) { +this.clocks = new short[size]; +this.deltaUpdates = new byte[size]; +} +} + +@Override +public boolean hasClock() { +return clocks != null; +} + +@Override +public void resetDeltaUpdates(int feature) { +deltaUpdates[feature] = 0; +} + +private void ensureCapacity(final int index) { +if (index >= size) { +int bits = MathUtils.bitsRequired(index); +int newSize = (1 << bits) + 1; +int oldSize = size; +logger.info("Expands internal array size from " + oldSize + " to " + newSize + " (" ++ bits + " bits)"); +this.size = newSize; +this.weights = Arrays.copyOf(weights, newSize); +if (covars != null) { +this.covars = Arrays.copyOf(covars, newSize); +Arrays.fill(covars, oldSize, newSize, 1.f); +} +if(clocks != null) { +this.clocks = Arrays.copyOf(clocks, newSize); +this.deltaUpdates = Arrays.copyOf(deltaUpdates, newSize); +} +} +} + +@SuppressWarnings("unchecked") +@Override +public T get(Object feature) { +final int i = HiveUtils.parseInt(feature); +if (i >= size) { +return null; +} +if(covars != null) { +return (T) new WeightValueWithCovar(weights[i], covars[i]); +} else { +return (T) new WeightValue(weights[i]); +} +} + +@Override +public void set(Object feature, T value) { --- End diff -- Fixed
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97716643 --- Diff: core/src/main/java/hivemall/model/NewSparseModel.java --- @@ -0,0 +1,197 @@ +/* + * Hivemall: Hive scalable Machine Learning Library + * + * Copyright (C) 2015 Makoto YUI + * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package hivemall.model; + +import hivemall.model.WeightValueWithClock.WeightValueParamsF1Clock; +import hivemall.model.WeightValueWithClock.WeightValueParamsF2Clock; +import hivemall.model.WeightValueWithClock.WeightValueWithCovarClock; +import hivemall.utils.collections.IMapIterator; +import hivemall.utils.collections.OpenHashMap; + +import javax.annotation.Nonnull; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; + +public final class NewSparseModel extends AbstractPredictionModel { +private static final Log logger = LogFactory.getLog(NewSparseModel.class); + +private final OpenHashMap weights; +private final boolean hasCovar; +private boolean clockEnabled; + +public NewSparseModel(int size) { +this(size, false); +} + +public NewSparseModel(int size, boolean hasCovar) { +super(); +this.weights = new OpenHashMap(size); +this.hasCovar = hasCovar; +this.clockEnabled = false; +} + +@Override +protected boolean isDenseModel() { +return false; +} + +@Override 
+public boolean hasCovariance() { +return hasCovar; +} + +@Override +public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x, +boolean sum_of_gradients) {} + +@Override +public void configureClock() { +this.clockEnabled = true; +} + +@Override +public boolean hasClock() { +return clockEnabled; +} + +@SuppressWarnings("unchecked") +@Override +public T get(final Object feature) { +return (T) weights.get(feature); +} + +@Override +public void set(final Object feature, final T value) { +assert (feature != null); +assert (value != null); + +final IWeightValue wrapperValue = wrapIfRequired(value); + +if (clockEnabled && value.isTouched()) { +IWeightValue old = weights.get(feature); +if (old != null) { +short newclock = (short) (old.getClock() + (short) 1); +wrapperValue.setClock(newclock); +int newDelta = old.getDeltaUpdates() + 1; +wrapperValue.setDeltaUpdates((byte) newDelta); +} +} +weights.put(feature, wrapperValue); + +onUpdate(feature, wrapperValue); +} + +@Override +public void delete(@Nonnull Object feature) { +weights.remove(feature); +} + +private IWeightValue wrapIfRequired(final IWeightValue value) { +final IWeightValue wrapper; +if (clockEnabled) { +switch (value.getType()) { +case NoParams: +wrapper = new WeightValueWithClock(value); +break; +case ParamsCovar: +wrapper = new WeightValueWithCovarClock(value); +break; +case ParamsF1: +wrapper = new WeightValueParamsF1Clock(value); +break; +case ParamsF2: +wrapper = new WeightValueParamsF2Clock(value); +break; +default: +throw new IllegalStateException("Unexpected value type: " + value.getType()); --- End diff -- It was a bug. Fixed.
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97715819 --- Diff: core/src/main/java/hivemall/model/NewDenseModel.java --- @@ -0,0 +1,293 @@ +/* + * Hivemall: Hive scalable Machine Learning Library + * + * Copyright (C) 2015 Makoto YUI + * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package hivemall.model; + +import java.util.Arrays; +import javax.annotation.Nonnull; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; + +import hivemall.model.WeightValue.WeightValueWithCovar; +import hivemall.utils.collections.IMapIterator; +import hivemall.utils.hadoop.HiveUtils; +import hivemall.utils.lang.Copyable; +import hivemall.utils.math.MathUtils; + +public final class NewDenseModel extends AbstractPredictionModel { +private static final Log logger = LogFactory.getLog(NewDenseModel.class); + +private int size; +private float[] weights; +private float[] covars; + +// optional value for MIX +private short[] clocks; +private byte[] deltaUpdates; + +public NewDenseModel(int ndims) { +this(ndims, false); +} + +public NewDenseModel(int ndims, boolean withCovar) { +super(); +int size = ndims + 1; +this.size = size; +this.weights = new float[size]; +if (withCovar) { +float[] covars = new float[size]; +Arrays.fill(covars, 1f); +this.covars = covars; +} else { 
+this.covars = null; +} +this.clocks = null; +this.deltaUpdates = null; +} + +@Override +protected boolean isDenseModel() { +return true; +} + +@Override +public boolean hasCovariance() { +return covars != null; +} + +@Override +public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x, +boolean sum_of_gradients) {} + +@Override +public void configureClock() { +if (clocks == null) { +this.clocks = new short[size]; +this.deltaUpdates = new byte[size]; +} +} + +@Override +public boolean hasClock() { +return clocks != null; +} + +@Override +public void resetDeltaUpdates(int feature) { +deltaUpdates[feature] = 0; +} + +private void ensureCapacity(final int index) { +if (index >= size) { +int bits = MathUtils.bitsRequired(index); +int newSize = (1 << bits) + 1; +int oldSize = size; +logger.info("Expands internal array size from " + oldSize + " to " + newSize + " (" ++ bits + " bits)"); +this.size = newSize; +this.weights = Arrays.copyOf(weights, newSize); +if (covars != null) { +this.covars = Arrays.copyOf(covars, newSize); +Arrays.fill(covars, oldSize, newSize, 1.f); +} +if(clocks != null) { +this.clocks = Arrays.copyOf(clocks, newSize); +this.deltaUpdates = Arrays.copyOf(deltaUpdates, newSize); +} +} +} + +@SuppressWarnings("unchecked") +@Override +public T get(Object feature) { +final int i = HiveUtils.parseInt(feature); +if (i >= size) { +return null; +} +if(covars != null) { +return (T) new WeightValueWithCovar(weights[i], covars[i]); +} else { +return (T) new WeightValue(weights[i]); +} +} + +@Override +public void set(Object feature, T value) { +int i = HiveUtils.parseInt(feature); +ensureCapacity(i); +
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97937226 --- Diff: core/src/main/java/hivemall/optimizer/Regularization.java --- @@ -0,0 +1,99 @@ +/* + * Hivemall: Hive scalable Machine Learning Library + * + * Copyright (C) 2015 Makoto YUI + * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package hivemall.optimizer; + +import javax.annotation.Nonnull; +import java.util.Map; + +public abstract class Regularization { + +protected final float lambda; + +public Regularization(final Map options) { +float lambda = 1e-6f; --- End diff -- Agreed. Revised to 0.0001.
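For context on where the revised default would take effect, the sketch below reads `lambda` from an options map and falls back to 0.0001 when it is missing. It is not the `Regularization` class from the pull request: the class name `L2RegularizationSketch`, the `"lambda"` key lookup, and the L2-style penalty `gradient + lambda * weight` are assumptions made for illustration.

```java
import java.util.Map;

/** Hypothetical L2-style regularizer with the revised default lambda. */
final class L2RegularizationSketch {
    /** Default taken from the reply above (previously 1e-6 in the quoted diff). */
    static final float DEFAULT_LAMBDA = 0.0001f;

    private final float lambda;

    L2RegularizationSketch(Map<String, String> options) {
        final String value = options.get("lambda"); // assumed option key
        this.lambda = (value == null) ? DEFAULT_LAMBDA : Float.parseFloat(value);
    }

    float lambda() {
        return lambda;
    }

    /** Adds the assumed L2 penalty term (lambda * weight) to the raw gradient. */
    float regularize(float weight, float gradient) {
        return gradient + lambda * weight;
    }
}
```

A larger default makes the penalty term noticeable for weights of ordinary magnitude, while callers can still pass their own `lambda` through the options map.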
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97715678 --- Diff: core/src/main/java/hivemall/model/NewDenseModel.java --- @@ -0,0 +1,293 @@ +/* + * Hivemall: Hive scalable Machine Learning Library + * + * Copyright (C) 2015 Makoto YUI + * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package hivemall.model; + +import java.util.Arrays; +import javax.annotation.Nonnull; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; + +import hivemall.model.WeightValue.WeightValueWithCovar; +import hivemall.utils.collections.IMapIterator; +import hivemall.utils.hadoop.HiveUtils; +import hivemall.utils.lang.Copyable; +import hivemall.utils.math.MathUtils; + +public final class NewDenseModel extends AbstractPredictionModel { +private static final Log logger = LogFactory.getLog(NewDenseModel.class); + +private int size; +private float[] weights; +private float[] covars; + +// optional value for MIX +private short[] clocks; +private byte[] deltaUpdates; + +public NewDenseModel(int ndims) { +this(ndims, false); +} + +public NewDenseModel(int ndims, boolean withCovar) { +super(); +int size = ndims + 1; +this.size = size; +this.weights = new float[size]; +if (withCovar) { +float[] covars = new float[size]; +Arrays.fill(covars, 1f); +this.covars = covars; +} else { 
+this.covars = null; +} +this.clocks = null; +this.deltaUpdates = null; +} + +@Override +protected boolean isDenseModel() { +return true; +} + +@Override +public boolean hasCovariance() { +return covars != null; +} + +@Override +public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x, +boolean sum_of_gradients) {} + +@Override +public void configureClock() { +if (clocks == null) { +this.clocks = new short[size]; +this.deltaUpdates = new byte[size]; +} +} + +@Override +public boolean hasClock() { +return clocks != null; +} + +@Override +public void resetDeltaUpdates(int feature) { +deltaUpdates[feature] = 0; +} + +private void ensureCapacity(final int index) { +if (index >= size) { +int bits = MathUtils.bitsRequired(index); +int newSize = (1 << bits) + 1; +int oldSize = size; +logger.info("Expands internal array size from " + oldSize + " to " + newSize + " (" ++ bits + " bits)"); +this.size = newSize; +this.weights = Arrays.copyOf(weights, newSize); +if (covars != null) { +this.covars = Arrays.copyOf(covars, newSize); +Arrays.fill(covars, oldSize, newSize, 1.f); +} +if(clocks != null) { +this.clocks = Arrays.copyOf(clocks, newSize); +this.deltaUpdates = Arrays.copyOf(deltaUpdates, newSize); +} +} +} + +@SuppressWarnings("unchecked") +@Override +public T get(Object feature) { --- End diff -- Fixed
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97715621 --- Diff: core/src/main/java/hivemall/model/NewSpaceEfficientDenseModel.java --- @@ -0,0 +1,317 @@ +/* + * Hivemall: Hive scalable Machine Learning Library + * + * Copyright (C) 2015 Makoto YUI + * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package hivemall.model; + +import hivemall.model.WeightValue.WeightValueWithCovar; +import hivemall.utils.collections.IMapIterator; +import hivemall.utils.hadoop.HiveUtils; +import hivemall.utils.lang.Copyable; +import hivemall.utils.lang.HalfFloat; +import hivemall.utils.math.MathUtils; + +import java.util.Arrays; +import javax.annotation.Nonnull; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; + +public final class NewSpaceEfficientDenseModel extends AbstractPredictionModel { +private static final Log logger = LogFactory.getLog(NewSpaceEfficientDenseModel.class); + +private int size; +private short[] weights; +private short[] covars; + +// optional value for MIX +private short[] clocks; +private byte[] deltaUpdates; + +public NewSpaceEfficientDenseModel(int ndims) { +this(ndims, false); +} + +public NewSpaceEfficientDenseModel(int ndims, boolean withCovar) { +super(); +int size = ndims + 1; +this.size = size; +this.weights = new short[size]; 
+if (withCovar) { +short[] covars = new short[size]; +Arrays.fill(covars, HalfFloat.ONE); +this.covars = covars; +} else { +this.covars = null; +} +this.clocks = null; +this.deltaUpdates = null; +} + +@Override +protected boolean isDenseModel() { +return true; +} + +@Override +public boolean hasCovariance() { +return covars != null; +} + +@Override +public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x, +boolean sum_of_gradients) {} + +@Override +public void configureClock() { +if (clocks == null) { +this.clocks = new short[size]; +this.deltaUpdates = new byte[size]; +} +} + +@Override +public boolean hasClock() { +return clocks != null; +} + +@Override +public void resetDeltaUpdates(int feature) { +deltaUpdates[feature] = 0; +} + +private float getWeight(final int i) { +final short w = weights[i]; +return (w == HalfFloat.ZERO) ? HalfFloat.ZERO : HalfFloat.halfFloatToFloat(w); +} + +private float getCovar(final int i) { +return HalfFloat.halfFloatToFloat(covars[i]); --- End diff -- That should not happen. `i` is checked before calling `getCovar` as follows:
```
int i = HiveUtils.parseInt(feature);
if (i >= size) {
    return 1f;
}
return getCovar(i);
```
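The guard described in the reply above can be tried out in isolation with the sketch below. It is a simplified stand-in rather than `NewSpaceEfficientDenseModel` itself: the class `GuardedCovarSketch` is hypothetical, it stores covariances in a plain `float[]` instead of half-float `short` values, and it takes an `int` index directly instead of parsing a feature object.

```java
import java.util.Arrays;

/** Hypothetical dense store showing the bounds check in front of getCovar. */
final class GuardedCovarSketch {
    private final float[] covars;

    GuardedCovarSketch(int size) {
        this.covars = new float[size];
        Arrays.fill(covars, 1f); // covariance starts at 1 for every slot
    }

    void setCovar(int i, float value) {
        covars[i] = value;
    }

    /** Public accessor: indices past the array end fall back to the default of 1. */
    float getCovariance(int i) {
        if (i >= covars.length) {
            return 1f;
        }
        return getCovar(i);
    }

    /** Unchecked accessor; callers are expected to have validated the index. */
    private float getCovar(int i) {
        return covars[i];
    }
}
```

Note that this guard, like the snippet in the reply, only covers indices that are too large; a negative index would still reach the array access.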
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97716142 --- Diff: core/src/main/java/hivemall/model/NewDenseModel.java --- @@ -0,0 +1,293 @@ +/* + * Hivemall: Hive scalable Machine Learning Library + * + * Copyright (C) 2015 Makoto YUI + * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package hivemall.model; + +import java.util.Arrays; +import javax.annotation.Nonnull; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; + +import hivemall.model.WeightValue.WeightValueWithCovar; +import hivemall.utils.collections.IMapIterator; +import hivemall.utils.hadoop.HiveUtils; +import hivemall.utils.lang.Copyable; +import hivemall.utils.math.MathUtils; + +public final class NewDenseModel extends AbstractPredictionModel { +private static final Log logger = LogFactory.getLog(NewDenseModel.class); + +private int size; +private float[] weights; +private float[] covars; + +// optional value for MIX +private short[] clocks; +private byte[] deltaUpdates; + +public NewDenseModel(int ndims) { +this(ndims, false); +} + +public NewDenseModel(int ndims, boolean withCovar) { +super(); +int size = ndims + 1; +this.size = size; +this.weights = new float[size]; +if (withCovar) { +float[] covars = new float[size]; +Arrays.fill(covars, 1f); +this.covars = covars; +} else { 
+this.covars = null; +} +this.clocks = null; +this.deltaUpdates = null; +} + +@Override +protected boolean isDenseModel() { +return true; +} + +@Override +public boolean hasCovariance() { +return covars != null; +} + +@Override +public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x, +boolean sum_of_gradients) {} + +@Override +public void configureClock() { +if (clocks == null) { +this.clocks = new short[size]; +this.deltaUpdates = new byte[size]; +} +} + +@Override +public boolean hasClock() { +return clocks != null; +} + +@Override +public void resetDeltaUpdates(int feature) { +deltaUpdates[feature] = 0; +} + +private void ensureCapacity(final int index) { +if (index >= size) { +int bits = MathUtils.bitsRequired(index); +int newSize = (1 << bits) + 1; +int oldSize = size; +logger.info("Expands internal array size from " + oldSize + " to " + newSize + " (" ++ bits + " bits)"); +this.size = newSize; +this.weights = Arrays.copyOf(weights, newSize); +if (covars != null) { +this.covars = Arrays.copyOf(covars, newSize); +Arrays.fill(covars, oldSize, newSize, 1.f); +} +if(clocks != null) { +this.clocks = Arrays.copyOf(clocks, newSize); +this.deltaUpdates = Arrays.copyOf(deltaUpdates, newSize); +} +} +} + +@SuppressWarnings("unchecked") +@Override +public T get(Object feature) { +final int i = HiveUtils.parseInt(feature); +if (i >= size) { +return null; +} +if(covars != null) { +return (T) new WeightValueWithCovar(weights[i], covars[i]); +} else { +return (T) new WeightValue(weights[i]); +} +} + +@Override +public void set(Object feature, T value) { +int i = HiveUtils.parseInt(feature); +ensureCapacity(i); +
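The `ensureCapacity` logic in the diff above can be read as a small sketch. This is a minimal, self-contained illustration, not the code under review: `GrowableWeights` and `grownSize` are hypothetical names, and the local `bitsRequired` is only an assumed stand-in for `MathUtils.bitsRequired` (taken here to return the number of bits needed to represent a non-negative value).

```java
import java.util.Arrays;

// Hypothetical illustration of the growth policy seen in the diff; not Hivemall code.
final class GrowableWeights {
    private float[] weights;
    private float[] covars; // stays null when covariance is not tracked

    GrowableWeights(int ndims, boolean withCovar) {
        int size = ndims + 1;
        this.weights = new float[size];
        if (withCovar) {
            this.covars = new float[size];
            Arrays.fill(this.covars, 1f);
        }
    }

    // Assumed stand-in for MathUtils.bitsRequired: bits needed for a non-negative int.
    static int bitsRequired(int value) {
        return 32 - Integer.numberOfLeadingZeros(value);
    }

    // New array length for an out-of-range index: next power of two above it, plus one.
    // Only meaningful for 0 <= index < 2^30, since 1 << 31 overflows int.
    static int grownSize(int index) {
        return (1 << bitsRequired(index)) + 1;
    }

    int size() {
        return weights.length;
    }

    float get(int index) {
        return index < weights.length ? weights[index] : 0f;
    }

    float getCovar(int index) {
        return covars[index];
    }

    void set(int index, float weight) {
        ensureCapacity(index);
        weights[index] = weight;
    }

    private void ensureCapacity(int index) {
        if (index >= weights.length) {
            int oldSize = weights.length;
            int newSize = grownSize(index);
            weights = Arrays.copyOf(weights, newSize);
            if (covars != null) {
                covars = Arrays.copyOf(covars, newSize);
                // Newly added slots get the same default covariance as the constructor uses.
                Arrays.fill(covars, oldSize, newSize, 1f);
            }
        }
    }
}
```

Under these assumptions, writing to index 10 in a model created with `ndims = 3` grows the arrays from 4 to 17 slots, and the new covariance slots are initialized to 1.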
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r97716349 --- Diff: core/src/main/java/hivemall/model/NewSpaceEfficientDenseModel.java --- @@ -0,0 +1,317 @@ +/* + * Hivemall: Hive scalable Machine Learning Library + * + * Copyright (C) 2015 Makoto YUI + * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST) + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package hivemall.model; + +import hivemall.model.WeightValue.WeightValueWithCovar; +import hivemall.utils.collections.IMapIterator; +import hivemall.utils.hadoop.HiveUtils; +import hivemall.utils.lang.Copyable; +import hivemall.utils.lang.HalfFloat; +import hivemall.utils.math.MathUtils; + +import java.util.Arrays; +import javax.annotation.Nonnull; + +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; + +public final class NewSpaceEfficientDenseModel extends AbstractPredictionModel { +private static final Log logger = LogFactory.getLog(NewSpaceEfficientDenseModel.class); + +private int size; +private short[] weights; +private short[] covars; + +// optional value for MIX +private short[] clocks; +private byte[] deltaUpdates; + +public NewSpaceEfficientDenseModel(int ndims) { +this(ndims, false); +} + +public NewSpaceEfficientDenseModel(int ndims, boolean withCovar) { +super(); +int size = ndims + 1; +this.size = size; +this.weights = new short[size]; 
+if (withCovar) { +short[] covars = new short[size]; +Arrays.fill(covars, HalfFloat.ONE); +this.covars = covars; +} else { +this.covars = null; +} +this.clocks = null; +this.deltaUpdates = null; +} + +@Override +protected boolean isDenseModel() { +return true; +} + +@Override +public boolean hasCovariance() { +return covars != null; +} + +@Override +public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x, +boolean sum_of_gradients) {} + +@Override +public void configureClock() { +if (clocks == null) { +this.clocks = new short[size]; +this.deltaUpdates = new byte[size]; +} +} + +@Override +public boolean hasClock() { +return clocks != null; +} + +@Override +public void resetDeltaUpdates(int feature) { +deltaUpdates[feature] = 0; +} + +private float getWeight(final int i) { +final short w = weights[i]; +return (w == HalfFloat.ZERO) ? HalfFloat.ZERO : HalfFloat.halfFloatToFloat(w); +} + +private float getCovar(final int i) { +return HalfFloat.halfFloatToFloat(covars[i]); +} + +private void _setWeight(final int i, final float v) { +if(Math.abs(v) >= HalfFloat.MAX_FLOAT) { +throw new IllegalArgumentException("Acceptable maximum weight is " ++ HalfFloat.MAX_FLOAT + ": " + v); +} +weights[i] = HalfFloat.floatToHalfFloat(v); +} + +private void setCovar(final int i, final float v) { +HalfFloat.checkRange(v); +covars[i] = HalfFloat.floatToHalfFloat(v); +} + +private void ensureCapacity(final int index) { +if (index >= size) { +int bits = MathUtils.bitsRequired(index); +int newSize = (1 << bits) + 1; +int oldSize = size; +logger.info("Expands internal array size from " + oldSize + " to " + newSize + " (" ++ bits + " bits)"); +this.size = newSize; +this.weights = Arrays.copyOf(weights, newSize); +if (covars != null) { +th
[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...
Github user myui commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/14#discussion_r96351621 --- Diff: core/src/main/java/hivemall/classifier/BinaryOnlineClassifierUDTF.java --- @@ -56,8 +57,19 @@ private boolean parseFeature; protected PredictionModel model; +protected Optimizer optimizerImpl; protected int count; +private boolean enableNewModel; + +public BinaryOnlineClassifierUDTF() { +this.enableNewModel = false; --- End diff -- `enableNewModel` is never used. Is this required? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-hivemall issue #14: [WIP] Separate optimizer implementations
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/14 LossFunction should be selectable, not fixed.
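One way to read "selectable, not fixed" is to resolve the loss function from a name supplied by the caller. The following is only a sketch under that assumption: `LossKind`, `fromName`, and the three constants are hypothetical names chosen for illustration and are not taken from the pull request or from Hivemall's API. Labels are assumed to be -1/+1 for the classification losses.

```java
import java.util.Locale;

// Hypothetical sketch: pick a loss function by name instead of hard-coding one.
enum LossKind {
    LOGLOSS {
        @Override
        double loss(double predicted, double label) {
            // log(1 + exp(-y * p)); may overflow to Infinity for very negative margins
            return Math.log1p(Math.exp(-label * predicted));
        }

        @Override
        double dloss(double predicted, double label) {
            // derivative of the loss with respect to the prediction
            return -label / (1.0 + Math.exp(label * predicted));
        }
    },
    HINGELOSS {
        @Override
        double loss(double predicted, double label) {
            return Math.max(0.0, 1.0 - label * predicted);
        }

        @Override
        double dloss(double predicted, double label) {
            // subgradient: -y inside the margin, 0 otherwise
            return (label * predicted < 1.0) ? -label : 0.0;
        }
    },
    SQUAREDLOSS {
        @Override
        double loss(double predicted, double label) {
            double diff = predicted - label;
            return 0.5 * diff * diff;
        }

        @Override
        double dloss(double predicted, double label) {
            return predicted - label;
        }
    };

    abstract double loss(double predicted, double label);

    abstract double dloss(double predicted, double label);

    // Case-insensitive lookup; throws IllegalArgumentException for unknown names.
    static LossKind fromName(String name) {
        return valueOf(name.trim().toUpperCase(Locale.ROOT));
    }
}
```

A trainer could then hold a `LossKind` field set from a user option, so the optimizer code calls `loss.dloss(p, y)` without knowing which loss was chosen.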
[GitHub] incubator-hivemall issue #28: [HIVEMALL-30] Temporarily ignore a streaming t...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/28 @maropu LGTM. Please merge it.
[GitHub] incubator-hivemall issue #30: [HIVEMALL-37] Support a SST-based change-point...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/30 LGTM. You can merge this PR.
[GitHub] incubator-hivemall issue #31: [HIVEMALL-40] Load xgboost-formatted data via ...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/31 LGTM. Please merge and close this PR.
[GitHub] incubator-hivemall issue #30: [HIVEMALL-37] Support a SST-based change-point...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/30 @maropu Could you add [SPARK] after [HIVEMALL-37] in the title?
[GitHub] incubator-hivemall issue #29: [HIVEMALL-39][SPARK] Put the use of HiveUDFs i...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/29 @maropu LGTM. Please merge and close this PR.
[GitHub] incubator-hivemall issue #30: [HIVEMALL-37][SPARK] Support a SST-based chang...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/30 @maropu LGTM. Please merge this PR and close the JIRA ticket.
[GitHub] incubator-hivemall issue #32: [HIVEMALL-42][DOC] Fix the link to license fil...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/32 @aajisaka We welcome your first contribution!
[GitHub] incubator-hivemall issue #33: [HIVEMALL-44][SAPRK] Implement a prototype of ...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/33 The output `top1Df` should be explained in the description. BTW, I personally prefer `top_k_join` over `join_top_k`.
[GitHub] incubator-hivemall issue #34: [HIVEMALL-45][SPARK] Upgrade spark v2.0.0 to v...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/34 @maropu LGTM. Please merge this PR and close the corresponding JIRA ticket as fixed.
[GitHub] incubator-hivemall issue #33: [HIVEMALL-44][SAPRK] Implement a prototype of ...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/33 @maropu waiting for markdown to be included in this PR :-)
[GitHub] incubator-hivemall issue #35: [HIVEMALL-31][SPARK] Support Spark-v2.1.0
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/35 @maropu test is failing for v2.1.

```
Saving to outputFile=/home/travis/build/apache/incubator-hivemall/spark/spark-common/target/scalastyle-output.xml
Processed 3 file(s)
Found 0 errors
Found 0 warnings
Found 0 infos
Finished in 667 ms
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/QueryTest.scala message=Header does not match expected text line=2
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala message=Header does not match expected text line=2
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/hive/HivemallOpsSuite.scala message=import.ordering.wrongOrderInGroup.message line=22 column=0
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/hive/HivemallOpsSuite.scala message=import.ordering.wrongOrderInGroup.message line=30 column=0
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/hive/test/TestHiveSingleton.scala message=Header does not match expected text line=2
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala message=Header does not match expected text line=2
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala message=Header does not match expected text line=2
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/test/VectorQueryTest.scala message=import.ordering.missingEmptyLine.message line=25 column=0
Saving to outputFile=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/target/scalastyle-output.xml
Processed 25 file(s)
Found 8 errors
Found 0 warnings
Found 0 infos
Finished in 2737 ms
[ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.8.0:check (default-cli) on project hivemall-spark: Failed during scalastyle execution: You have 8 Scalastyle violation(s). -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :hivemall-spark
The command "mvn -q scalastyle:check test -Pspark-2.1" exited with 1.
```
[GitHub] incubator-hivemall issue #34: [HIVEMALL-45][SPARK] Upgrade spark v2.0.0 to v...
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/34 Thanks!
[GitHub] incubator-hivemall issue #35: [HIVEMALL-31][SPARK] Support Spark-v2.1.0
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/35 @maropu LGTM. Please merge this PR and close the JIRA issue.
[GitHub] incubator-hivemall issue #35: [HIVEMALL-31][SPARK] Support Spark-v2.1.0
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/35 BTW, you can close https://github.com/apache/incubator-hivemall/pull/23 as well. `Close #35, #23: `
[GitHub] incubator-hivemall issue #35: [HIVEMALL-31][SPARK] Support Spark-v2.1.0
Github user myui commented on the issue: https://github.com/apache/incubator-hivemall/pull/35 Thanks!