[GitHub] incubator-hivemall pull request #158: [HIVEMALL-215] Add step-by-step tutori...

2018-08-30 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/158#discussion_r214223029
  
--- Diff: docs/gitbook/supervised_learning/tutorial.md ---
@@ -0,0 +1,461 @@
+
+
+# Step-by-Step Tutorial on Supervised Learning with Apache Hivemall
+
+
+
+## What is Hivemall?
+
+[Apache Hive](https://hive.apache.org/) is a data warehousing solution that enables us to process large-scale data in the form of SQL easily. Assume that you have a table named `purchase_history` which can be artificially created as:
+
+```sql
+create table if not exists purchase_history as
+select 1 as id, "Saturday" as day_of_week, "male" as gender, 600 as price, "book" as category, 1 as label
+union all
+select 2 as id, "Friday" as day_of_week, "female" as gender, 4800 as price, "sports" as category, 0 as label
+union all
+select 3 as id, "Friday" as day_of_week, "other" as gender, 18000 as price, "entertainment" as category, 0 as label
+union all
+select 4 as id, "Thursday" as day_of_week, "male" as gender, 200 as price, "food" as category, 0 as label
+union all
+select 5 as id, "Wednesday" as day_of_week, "female" as gender, 1000 as price, "electronics" as category, 1 as label
+;
+```
+
+The syntax of Hive queries, namely **HiveQL**, is very similar to SQL:
+
+```sql
+select count(1) from purchase_history;
+```
+
+> 5
+
+[Apache Hivemall](https://github.com/apache/incubator-hivemall) is a collection of user-defined functions (UDFs) for HiveQL which is strongly optimized for machine learning (ML) and data science. To give an example, you can efficiently build a logistic regression model with the stochastic gradient descent (SGD) optimization by issuing the following ~10 lines of query:
+
+```sql
+SELECT
+  train_classifier(
+    features,
+    label,
+    '-loss_function logloss -optimizer SGD'
+  ) as (feature, weight)
+FROM
+  training
+;
+```
+
+
+Hivemall function [`hivemall_version()`](../misc/funcs.html#others) shows current Hivemall version, for example:
+
+```sql
+select hivemall_version();
+```
+
+> "0.5.1-incubating-SNAPSHOT"
+
+Below we list ML and relevant problems that Hivemall can solve:
+
+- [Binary and multi-class classification](../binaryclass/general.html)
+- [Regression](../regression/general.html)
+- [Recommendation](../recommend/cf.html)
+- [Anomaly detection](../anomaly/lof.html)
+- [Natural language processing](../misc/tokenizer.html)
+- [Clustering](../misc/tokenizer.html) (i.e., topic modeling)
+- [Data sketching](../misc/funcs.html#sketching)
+- Evaluation
+
+Our [YouTube demo video](https://www.youtube.com/watch?v=cMUsuA9KZ_c) would be helpful to understand more about an overview of Hivemall.
+
+This tutorial explains the basic usage of Hivemall with examples of supervised learning of simple regressor and binary classifier.
+
+## Binary classification
+
+Imagine a scenario that we like to build a binary classifier from the mock `purchase_history` data and predict unforeseen purchases to conduct a new campaign effectively:
+
+| day\_of\_week | gender | price | category | label |
+|:---:|:---:|:---:|:---:|:---|
+|Saturday | male | 600 | book | 1 |
+|Friday | female | 4800 | sports | 0 |
+|Friday | other | 18000  | entertainment | 0 |
+|Thursday | male | 200 | food | 0 |
+|Wednesday | female | 1000 | electronics | 1 |
+
--- End diff --

Insert something like the following here:

You can create this table as follows:

```sql
create table if not exists purchase_history as ..
```


---


[GitHub] incubator-hivemall pull request #158: [HIVEMALL-215] Add step-by-step tutori...

2018-08-30 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/158#discussion_r214223937
  
--- Diff: docs/gitbook/supervised_learning/tutorial.md ---
@@ -0,0 +1,461 @@
+
+Use Hivemall [`train_classifier()`](../misc/funcs.html#binary-classification) UDF to tackle the problem as follows.
+
+### Step 1. Feature representation
+
+First of all, we have to convert the records into pairs of the feature vector and corresponding target value. Here, Hivemall requires you to represent input features in a specific format.
+
+To be more precise, Hivemall represents single feature in a concatenation of **index** (i.e., **name**) and its **value**:
+
+- Quantitative feature: `:`
+  - e.g., `price:600.0`
+- Categorical feature: `#`
+  - e.g., `gender#male`
+
--- End diff --

It would be better to insert the following sentences after the example.

Feature index and feature value are separated by a colon. When the colon is omitted, the value is considered to be `1.0`. So, a categorical feature `gender#male` is a [one-hot representation](https://www.quora.com/What-is-one-hot-encoding-and-when-is-it-used-in-data-science) of `index := gender#male` and `value := 1.0`. Note that `#` is not a special character.
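To make the convention concrete, here is a minimal Java sketch that assumes only the rules stated above: a feature is `index:value`, and a feature without a colon (such as the categorical `gender#male`) gets the implicit value `1.0`. The class and method names (`FeatureStrings`, `quantitative`, `categorical`, `parse`) are hypothetical, this is not Hivemall's own parser, and splitting at the last colon is an assumption.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the "index:value" feature-string convention;
// not Hivemall's own implementation.
public final class FeatureStrings {

    /** A parsed feature: its index (name) and its value. */
    public static final class ParsedFeature {
        public final String index;
        public final double value;

        ParsedFeature(String index, double value) {
            this.index = index;
            this.value = value;
        }

        @Override
        public String toString() {
            return index + " -> " + value;
        }
    }

    /** Quantitative feature: index and value joined by a colon, e.g. "price:600.0". */
    public static String quantitative(String name, double value) {
        return name + ":" + value;
    }

    /** Categorical feature, e.g. "gender#male"; '#' is an ordinary character of the index. */
    public static String categorical(String name, String category) {
        return name + "#" + category;
    }

    /** Splits at the last colon (assumption); without a colon the value defaults to 1.0. */
    public static ParsedFeature parse(String feature) {
        int pos = feature.lastIndexOf(':');
        if (pos < 0) {
            return new ParsedFeature(feature, 1.0d);
        }
        return new ParsedFeature(feature.substring(0, pos),
                Double.parseDouble(feature.substring(pos + 1)));
    }

    public static void main(String[] args) {
        // The first row of purchase_history as a feature vector (array of strings).
        List<String> features = Arrays.asList(
                quantitative("price", 600.0),
                categorical("day_of_week", "Saturday"),
                categorical("gender", "male"),
                categorical("category", "book"));
        System.out.println(features);
        for (String f : features) {
            System.out.println(parse(f));
        }
    }
}
```

Because the categorical value is implicit, the index alone (`gender#male`) identifies the one-hot dimension.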


---


[GitHub] incubator-hivemall pull request #158: [HIVEMALL-215] Add step-by-step tutori...

2018-08-30 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/158#discussion_r214222772
  
--- Diff: docs/gitbook/supervised_learning/tutorial.md ---
@@ -0,0 +1,461 @@
+
+
+# Step-by-Step Tutorial on Supervised Learning with Apache Hivemall
+
+
+
+## What is Hivemall?
+
+[Apache Hive](https://hive.apache.org/) is a data warehousing solution that enables us to process large-scale data in the form of SQL easily. Assume that you have a table named `purchase_history` which can be artificially created as:
+
+```sql
+create table if not exists purchase_history as
+select 1 as id, "Saturday" as day_of_week, "male" as gender, 600 as price, "book" as category, 1 as label
+union all
+select 2 as id, "Friday" as day_of_week, "female" as gender, 4800 as price, "sports" as category, 0 as label
+union all
+select 3 as id, "Friday" as day_of_week, "other" as gender, 18000 as price, "entertainment" as category, 0 as label
+union all
+select 4 as id, "Thursday" as day_of_week, "male" as gender, 200 as price, "food" as category, 0 as label
+union all
+select 5 as id, "Wednesday" as day_of_week, "female" as gender, 1000 as price, "electronics" as category, 1 as label
+;
+```
+
+The syntax of Hive queries, namely **HiveQL**, is very similar to SQL:
+
+```sql
+select count(1) from purchase_history;
+```
+
+> 5
+
--- End diff --

A general introduction to Apache Hive and HiveQL is not required in Hivemall's documentation. The base document is meant to introduce Hivemall to TD's customers, who might not be aware of the differences between Hive and Presto.

You can start with `Apache Hivemall is a ... lines of query as follows:`


---


[GitHub] incubator-hivemall pull request #158: [HIVEMALL-215] Add step-by-step tutori...

2018-08-30 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/158#discussion_r214226384
  
--- Diff: docs/gitbook/supervised_learning/tutorial.md ---
@@ -0,0 +1,461 @@
+Each of those features is a string value in Hive, and "feature vector" means an array of string values like:
+
+```
+["price:600.0", "day of week#Saturday", "gender#male", "category#book"]
+```
+
+See also more detailed [document for input format](../getting_started/input-format.html)).
+
+Therefore, what we first need to do is to convert the records into an array of feature strings, and Hivemall functions [

[GitHub] incubator-hivemall pull request #158: [HIVEMALL-215] Add step-by-step tutori...

2018-08-30 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/158#discussion_r214236762
  
--- Diff: docs/gitbook/supervised_learning/tutorial.md ---
@@ -0,0 +1,457 @@
+
+
+# Step-by-Step Tutorial on Supervised Learning with Apache Hivemall
--- End diff --

Remove the redundant `with Apache Hivemall`.


---


[GitHub] incubator-hivemall issue #158: [HIVEMALL-215] Add step-by-step tutorial on S...

2018-08-30 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/158
  
@chezou Merged. Thank you for your first contribution!


---


[GitHub] incubator-hivemall pull request #159: [HIVEMALL-214][DOC] Update userguide f...

2018-08-31 Thread myui
GitHub user myui opened a pull request:

https://github.com/apache/incubator-hivemall/pull/159

[HIVEMALL-214][DOC] Update userguide for General Classifier/Regressor example

## What changes were proposed in this pull request?

Refine user guide for generic classifier/regressor and so on.

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-214

## How to use this feature?

See user guide.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/myui/incubator-hivemall HIVEMALL-214

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/159.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #159


commit 6f40c466e21c78238a74f9c2f227df8ae156b3e2
Author: Makoto Yui 
Date:   2018-08-31T07:38:17Z

Added general classifier example using a9a dataset

commit 4963b63ab685aa539c6c0f5f3cd3230215ba4df7
Author: Makoto Yui 
Date:   2018-08-31T07:46:31Z

Added assertions for deprecated contents

commit 472821279d70e4171b7cf391a09bac10c95e28cb
Author: Makoto Yui 
Date:   2018-08-31T08:02:13Z

Capitalized topics and fixed a typo

commit 649e77840ff154bd75cd7c1bfdfc245516b68b0d
Author: Makoto Yui 
Date:   2018-08-31T11:18:50Z

Refined user guide




---


[GitHub] incubator-hivemall pull request #161: [HIVEMALL-216] Fix Docker image based ...

2018-09-03 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/161#discussion_r214793366
  
--- Diff: docs/gitbook/docker/getting_started.md ---
@@ -17,29 +17,31 @@
   under the License.
 -->
 
+# Getting started with Hivemall on Docker
+
 This page introduces how to run Hivemall on Docker.
 
 
 
 >  Caution
 > This docker image contains a single-node Hadoop enviroment for evaluating Hivemall. Not suited for production uses.
 
-# Requirements
+## Requirements
 
  * Docker Engine 1.6+
  * Docker Compose 1.10+
 
-# 1. Build image
+## 1. Build image
--- End diff --

Could you remove `1.` and `2.`?

See what's happening in

http://hivemall.incubator.apache.org/userguide/docker/getting_started.html#1-build-image


---


[GitHub] incubator-hivemall pull request #160: [HIVEMALL-163] Add IS_INFINITE, IS_FIN...

2018-09-03 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/160#discussion_r214800712
  
--- Diff: core/src/main/java/hivemall/tools/math/IsInfiniteUDF.java ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.tools.math;
+
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDF;
+
+@Description(name = "is_infinite", value = "_FUNC_(x) - Determine if x is infinite.")
+public final class IsInfiniteUDF extends UDF {
+    public Boolean evaluate(Double num) {
+        if (num == null) {
+            return null;
+        } else {
+            return !num.isNaN() && num.isInfinite();
--- End diff --

Is `!num.isNaN() &&` required? 
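For reference, Java's `Double.isInfinite()` returns `true` only for positive or negative infinity and `false` for NaN, which suggests the guard is redundant. The sketch below compares both expressions on a few edge cases; the class and method names are hypothetical, and the Hive `UDF` base class is left out so it runs with the JDK alone.

```java
// Hypothetical comparison of the UDF body with and without the NaN guard.
public final class IsInfiniteCheck {

    // Expression as written in the PR.
    public static Boolean withNaNGuard(Double num) {
        if (num == null) {
            return null;
        }
        return !num.isNaN() && num.isInfinite();
    }

    // Simplified expression: Double.isInfinite() is already false for NaN.
    public static Boolean withoutNaNGuard(Double num) {
        if (num == null) {
            return null;
        }
        return num.isInfinite();
    }

    public static void main(String[] args) {
        Double[] inputs = {null, 0.0, -1.5, Double.NaN, Double.POSITIVE_INFINITY,
                Double.NEGATIVE_INFINITY, Double.MAX_VALUE};
        for (Double x : inputs) {
            System.out.println(x + " -> " + withNaNGuard(x) + " / " + withoutNaNGuard(x));
        }
    }
}
```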


---


[GitHub] incubator-hivemall pull request #162: [HIVEMALL-217] Resolve missing links f...

2018-09-04 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/162#discussion_r214870585
  
--- Diff: docs/gitbook/tips/hadoop_tuning.md ---
@@ -75,13 +75,13 @@ feature_dimensions (2^24 by the default) * 4 bytes (float) * 2 (iff covariance i
 ```
 > 2^24 * 4 bytes * 2 * 1.2 ≈ 161MB
 
-When [SpaceEfficientDenseModel](https://github.com/apache/incubator-hivemall/blob/master/src/main/java/hivemall/io/SpaceEfficientDenseModel.java) is used, the formula changes as follows:
+When [SpaceEfficientDenseModel](https://github.com/myui/hivemall/blob/master/src/main/java/hivemall/io/SpaceEfficientDenseModel.java) is used, the formula changes as follows:
--- End diff --

`github.com/myui` is deprecated.

Use https://github.com/apache/incubator-hivemall/blob/master/core/src/main/java/hivemall/model/SpaceEfficientDenseModel.java instead.

Please replace other appearances of `github.com/myui` as well.


---


[GitHub] incubator-hivemall pull request #162: [HIVEMALL-217] Resolve missing links f...

2018-09-06 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/162#discussion_r215535078
  
--- Diff: docs/gitbook/tips/emr.md ---
@@ -21,15 +21,15 @@
 
 ## Prerequisite
 Learn how to use Hive with Elastic MapReduce (EMR).  

-http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hive.html
+https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive.html
 
 Before launching an EMR job, 
 * create ${s3bucket}/emr/outputs for outputs
 * optionally, create ${s3bucket}/emr/logs for logging
-* put [emr_hivemall_bootstrap.sh](https://raw.github.com/myui/hivemall/master/scripts/misc/emr_hivemall_bootstrap.sh) on ${s3bucket}/emr/conf
+* put [emr_hivemall_bootstrap.sh](https://raw.githubusercontent.com/apache/incubator-hivemall/master/resources/misc/emr_hivemall_bootstrap.sh) on ${s3bucket}/emr/conf
 
 Then, lunch an EMR job with hive in an interactive mode.
-I'm usually lunching EMR instances with cheap Spot instances through [CLI client](http://aws.amazon.com/developertools/2264) as follows:
+I'm usually lunching EMR instances with cheap Spot instances through [CLI client](https://aws.amazon.com/jp/tools/) as follows:
--- End diff --

This should be `https://aws.amazon.com/tools/`.


---


[GitHub] incubator-hivemall pull request #163: [HIVEMALL-196][WIP] Support BM25 scori...

2018-09-06 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/163#discussion_r215564184
  
--- Diff: core/src/main/java/hivemall/ftvec/text/OkapiBM25UDF.java ---
@@ -0,0 +1,167 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.ftvec.text;
+
+import hivemall.UDFWithOptions;
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.Options;
+import org.apache.hadoop.hive.ql.exec.Description;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.metadata.Hive;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import hivemall.utils.hadoop.HiveUtils;
+import org.apache.hadoop.hive.ql.udf.UDFType;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils;
+import org.apache.hadoop.io.DoubleWritable;
+
+import javax.annotation.Nonnull;
+import java.util.Arrays;
+
+@Description(name = "okapi_bm25",
+        value = "_FUNC_(double tf_word, int dl, double avgdl, int N, int n [, const string options]) - Return an Okapi BM25 score in float")
+//TODO: What does stateful mean?
--- End diff --


https://hive.apache.org/javadocs/r1.2.2/api/org/apache/hadoop/hive/ql/udf/UDFType.html#stateful()

So `stateful = false` is okay. Please remove this comment.


---


[GitHub] incubator-hivemall issue #163: [HIVEMALL-196][WIP] Support BM25 scoring

2018-09-06 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/163
  
Please add a unit test and evaluate this function in a Hive environment.
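A unit test needs independently computed expected values. The following plain-Java sketch of the standard Okapi BM25 per-term score mirrors the parameters in the `@Description` above (`tf_word`, `dl`, `avgdl`, `N`, `n`). It assumes the common defaults `k1 = 1.2` and `b = 0.75` and the non-negative idf variant `ln(1 + (N - n + 0.5) / (n + 0.5))`; the PR's defaults and idf formula may differ, in which case the expected values must be adjusted. The class name `Bm25Reference` is hypothetical.

```java
/**
 * Hypothetical reference computation of a single-term Okapi BM25 score.
 * Assumes k1 = 1.2, b = 0.75 and idf = ln(1 + (N - n + 0.5) / (n + 0.5)).
 */
public final class Bm25Reference {

    public static final double DEFAULT_K1 = 1.2d;
    public static final double DEFAULT_B = 0.75d;

    /** Inverse document frequency of a term that appears in docFreq out of numDocs documents. */
    public static double idf(int numDocs, int docFreq) {
        return Math.log(1.0d + (numDocs - docFreq + 0.5d) / (docFreq + 0.5d));
    }

    /**
     * @param tf      term frequency in the document (tf_word)
     * @param dl      document length
     * @param avgdl   average document length in the collection
     * @param numDocs total number of documents (N)
     * @param docFreq number of documents containing the term (n)
     */
    public static double score(double tf, int dl, double avgdl, int numDocs, int docFreq,
            double k1, double b) {
        // Length normalization is exactly 1 for a document of average length.
        double lengthNorm = 1.0d - b + b * dl / avgdl;
        return idf(numDocs, docFreq) * tf * (k1 + 1.0d) / (tf + k1 * lengthNorm);
    }

    public static double score(double tf, int dl, double avgdl, int numDocs, int docFreq) {
        return score(tf, dl, avgdl, numDocs, docFreq, DEFAULT_K1, DEFAULT_B);
    }

    public static void main(String[] args) {
        // Average-length document with tf = 1: the score reduces to the idf.
        System.out.println(score(1.0, 10, 10.0, 10, 1));
        // A longer document with the same tf scores lower.
        System.out.println(score(1.0, 20, 10.0, 10, 1));
    }
}
```

The two properties exercised in `main` (score equals idf when `tf = 1` and `dl = avgdl`; longer documents score lower) hold regardless of the idf variant, so they make convenient test cases.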


---


[GitHub] incubator-hivemall pull request #164: [HIVEMALL-218] Fixed train_lda NPE whe...

2018-09-07 Thread myui
GitHub user myui opened a pull request:

https://github.com/apache/incubator-hivemall/pull/164

[HIVEMALL-218] Fixed train_lda NPE where input row is null

## What changes were proposed in this pull request?

Fixed a NegativeArraySizeException in `train_lda` when the input is NULL

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-218

## How was this patch tested?

manual tests

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/myui/incubator-hivemall HIVEMALL-218

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/164.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #164


commit 67f6f68acad09c7a0e70f9fbdb183116eeec6a1d
Author: Makoto Yui 
Date:   2018-09-07T08:56:43Z

Fixed NegativeArraySizeException where input is NULL

commit d367de34e34d42514c0bb6141fbf31f295e33e50
Author: Makoto Yui 
Date:   2018-09-07T09:15:05Z

Fixed NPE in forward()




---


[GitHub] incubator-hivemall pull request #165: [HIVEMALL-219][BUGFIX] Fixed NPE in fi...

2018-09-18 Thread myui
GitHub user myui opened a pull request:

https://github.com/apache/incubator-hivemall/pull/165

[HIVEMALL-219][BUGFIX] Fixed NPE in finalizeTraining()

## What changes were proposed in this pull request?

Fixed NPE in finalizeTraining() when there are no training examples

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-219

## How was this patch tested?

to appear 

## Checklist


- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/myui/incubator-hivemall HIVEMALL-219

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/165.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #165


commit bc0e14d1d29ba13b173165bca9d9511b19abbc6e
Author: Makoto Yui 
Date:   2018-09-18T09:42:06Z

Fixed NPE in finalizeTraining()




---


[GitHub] incubator-hivemall pull request #166: [HIVEMALL-219] Fixed LDA bug for singl...

2018-09-18 Thread myui
GitHub user myui opened a pull request:

https://github.com/apache/incubator-hivemall/pull/166

[HIVEMALL-219] Fixed LDA bug for single update and added unit tests

## What changes were proposed in this pull request?

Fixed LDA bug for single update and added unit tests

## What type of PR is it?

Bug Fix

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-219

## How was this patch tested?

unit tests and manual tests on EMR

## Checklist


- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [x] Did you run system tests on Hive (or Spark)?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/myui/incubator-hivemall HIVEMALL-219-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/166.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #166


commit 202eddd71c00e3889c0a126fe1038df35c1513d9
Author: Makoto Yui 
Date:   2018-09-18T10:36:02Z

Fixed LDA bug for single update and added unit tests




---


[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-18 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226199666
  
--- Diff: core/src/main/java/hivemall/mf/CofactorModel.java ---
@@ -0,0 +1,629 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.mf;
+
+import hivemall.fm.Feature;
+import hivemall.utils.math.MathUtils;
+import hivemall.utils.math.MatrixUtils;
+import org.apache.commons.math3.linear.ArrayRealVector;
+import org.apache.commons.math3.linear.Array2DRowRealMatrix;
+import org.apache.commons.math3.linear.RealMatrix;
+import org.apache.commons.math3.linear.RealVector;
+import org.apache.commons.math3.linear.SingularValueDecomposition;
+
+import javax.annotation.Nonnegative;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.*;
+
+public class CofactorModel {
+
+    public enum RankInitScheme {
+        random /* default */, gaussian;
+
+        @Nonnegative
+        protected float maxInitValue;
+        @Nonnegative
+        protected double initStdDev;
+
+        @Nonnull
+        public static CofactorModel.RankInitScheme resolve(@Nullable String opt) {
+            if (opt == null) {
+                return random;
+            } else if ("gaussian".equalsIgnoreCase(opt)) {
+                return gaussian;
+            } else if ("random".equalsIgnoreCase(opt)) {
+                return random;
+            }
+            return random;
+        }
+
+        public void setMaxInitValue(float maxInitValue) {
+            this.maxInitValue = maxInitValue;
+        }
+
+        public void setInitStdDev(double initStdDev) {
+            this.initStdDev = initStdDev;
+        }
+
+    }
+
+    private static final int EXPECTED_SIZE = 136861;
+    @Nonnegative
+    protected final int factor;
+
+    // rank matrix initialization
+    protected final RankInitScheme initScheme;
+
+    @Nonnull
+    private double globalBias;
+
+    // storing trainable latent factors and weights
+    private Map theta;
+    private Map beta;
+    private Map betaBias;
--- End diff --

Please use `Object2DoubleMap betaBias` instead to reduce memory consumption.
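The memory saving comes from avoiding boxing: a `Map` with `Double` values allocates one `Double` object per entry, while a primitive-valued map keeps the values in a `double[]`. The hypothetical class below is not the `Object2DoubleMap` the reviewer refers to; it only sketches the idea with open addressing and linear probing, and it supports neither removal nor null keys.

```java
// Hypothetical minimal object-to-double map; values are stored unboxed in a double[].
public final class SimpleObject2DoubleMap<K> {

    private Object[] keys = new Object[16]; // capacity is always a power of two
    private double[] values = new double[16];
    private int size;
    private final double defaultValue;

    public SimpleObject2DoubleMap(double defaultValue) {
        this.defaultValue = defaultValue;
    }

    /** Returns the slot holding key, or the empty slot where it would be inserted. */
    private int findSlot(Object key) {
        int mask = keys.length - 1;
        int h = key.hashCode();
        int i = (h ^ (h >>> 16)) & mask;
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) & mask; // linear probing
        }
        return i;
    }

    public void put(K key, double value) {
        int i = findSlot(key);
        if (keys[i] == null) {
            keys[i] = key;
            size++;
        }
        values[i] = value;
        if (size * 2 > keys.length) { // keep the table at most half full
            rehash(keys.length * 2);
        }
    }

    public double get(K key) {
        int i = findSlot(key);
        return keys[i] == null ? defaultValue : values[i];
    }

    public int size() {
        return size;
    }

    private void rehash(int newCapacity) {
        Object[] oldKeys = keys;
        double[] oldValues = values;
        keys = new Object[newCapacity];
        values = new double[newCapacity];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldKeys[j] != null) {
                int i = findSlot(oldKeys[j]);
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }

    public static void main(String[] args) {
        SimpleObject2DoubleMap<String> betaBias = new SimpleObject2DoubleMap<>(0.d);
        for (int i = 0; i < 1000; i++) {
            betaBias.put("item" + i, i * 0.5d);
        }
        System.out.println(betaBias.size() + " " + betaBias.get("item42") + " "
                + betaBias.get("missing"));
    }
}
```

Keeping the load factor at or below one half guarantees that probing always reaches an empty slot, at the cost of some unused array capacity.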


---


[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-18 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226198983
  
--- Diff: core/src/main/java/hivemall/mf/FactorizedModel.java ---
@@ -30,25 +30,25 @@
 import javax.annotation.concurrent.NotThreadSafe;
 
 @NotThreadSafe
-public final class FactorizedModel {
+public class FactorizedModel {
--- End diff --

It seems `FactorizedModel` is not used in Cofactor.

Is this change required? Please revert it if not.


---


[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-18 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226199747
  
--- Diff: core/src/main/java/hivemall/mf/CofactorModel.java ---
@@ -0,0 +1,629 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.mf;
+
+import hivemall.fm.Feature;
+import hivemall.utils.math.MathUtils;
+import hivemall.utils.math.MatrixUtils;
+import org.apache.commons.math3.linear.ArrayRealVector;
+import org.apache.commons.math3.linear.Array2DRowRealMatrix;
+import org.apache.commons.math3.linear.RealMatrix;
+import org.apache.commons.math3.linear.RealVector;
+import org.apache.commons.math3.linear.SingularValueDecomposition;
+
+import javax.annotation.Nonnegative;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.*;
+
+public class CofactorModel {
+
+public enum RankInitScheme {
+random /* default */, gaussian;
+
+@Nonnegative
+protected float maxInitValue;
+@Nonnegative
+protected double initStdDev;
+
+@Nonnull
+public static CofactorModel.RankInitScheme resolve(@Nullable String opt) {
+if (opt == null) {
+return random;
+} else if ("gaussian".equalsIgnoreCase(opt)) {
+return gaussian;
+} else if ("random".equalsIgnoreCase(opt)) {
+return random;
+}
+return random;
+}
+
+public void setMaxInitValue(float maxInitValue) {
+this.maxInitValue = maxInitValue;
+}
+
+public void setInitStdDev(double initStdDev) {
+this.initStdDev = initStdDev;
+}
+
+}
+
+private static final int EXPECTED_SIZE = 136861;
+@Nonnegative
+protected final int factor;
+
+// rank matrix initialization
+protected final RankInitScheme initScheme;
+
+@Nonnull
+private double globalBias;
+
+// storing trainable latent factors and weights
+private Map theta;
+private Map beta;
+private Map betaBias;
+private Map gamma;
+private Map gammaBias;
--- End diff --

Please use `Object2DoubleMap gammaBias` instead to reduce memory consumption.
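Switching `gammaBias` to a primitive map also changes how unseen keys behave. fastutil's `Object2DoubleMap.getDouble` returns the map's default return value for an absent key; this is 0.0 unless it is changed with `defaultReturnValue`. By contrast, `get` on a boxed `Map` returns `null`, and unboxing that `null` into a `double` throws a `NullPointerException`. The JDK-only sketch below shows the pitfall and the boxed-map workaround `getOrDefault`. The class and method names are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

final class BoxedBiasLookup {
    /** Unsafe: auto-unboxes the result of get(), so a missing key throws NullPointerException. */
    static double biasUnsafe(Map<String, Double> biases, String key) {
        return biases.get(key);
    }

    /** Safe: falls back to 0.0 for unseen keys, mirroring a primitive map's default return value. */
    static double biasOrZero(Map<String, Double> biases, String key) {
        return biases.getOrDefault(key, 0.0d);
    }

    public static void main(String[] args) {
        Map<String, Double> gammaBias = new HashMap<>();
        gammaBias.put("item1", 0.25d);
        System.out.println(biasOrZero(gammaBias, "item1"));  // prints 0.25
        System.out.println(biasOrZero(gammaBias, "unseen")); // prints 0.0
        try {
            biasUnsafe(gammaBias, "unseen");
        } catch (NullPointerException e) {
            System.out.println("NPE on unboxing a missing key");
        }
    }
}
```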


---


[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-18 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226202891
  
--- Diff: core/src/main/java/hivemall/mf/CofactorModel.java ---
@@ -0,0 +1,629 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.mf;
+
+import hivemall.fm.Feature;
+import hivemall.utils.math.MathUtils;
+import hivemall.utils.math.MatrixUtils;
+import org.apache.commons.math3.linear.ArrayRealVector;
+import org.apache.commons.math3.linear.Array2DRowRealMatrix;
+import org.apache.commons.math3.linear.RealMatrix;
+import org.apache.commons.math3.linear.RealVector;
+import org.apache.commons.math3.linear.SingularValueDecomposition;
+
+import javax.annotation.Nonnegative;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.*;
+
+public class CofactorModel {
+
+public enum RankInitScheme {
+random /* default */, gaussian;
+
+@Nonnegative
+protected float maxInitValue;
+@Nonnegative
+protected double initStdDev;
+
+@Nonnull
+public static CofactorModel.RankInitScheme resolve(@Nullable String opt) {
+if (opt == null) {
+return random;
+} else if ("gaussian".equalsIgnoreCase(opt)) {
+return gaussian;
+} else if ("random".equalsIgnoreCase(opt)) {
+return random;
+}
+return random;
+}
+
+public void setMaxInitValue(float maxInitValue) {
+this.maxInitValue = maxInitValue;
+}
+
+public void setInitStdDev(double initStdDev) {
+this.initStdDev = initStdDev;
+}
+
+}
+
+private static final int EXPECTED_SIZE = 136861;
+@Nonnegative
+protected final int factor;
+
+// rank matrix initialization
+protected final RankInitScheme initScheme;
+
+@Nonnull
+private double globalBias;
+
+// storing trainable latent factors and weights
+private Map theta;
+private Map beta;
+private Map betaBias;
+private Map gamma;
+private Map gammaBias;
+
+// precomputed identity matrix
+private RealMatrix identity;
+
+protected final Random[] randU, randI;
+
+// hyperparameters
+private final float c0, c1;
+private final float lambdaTheta, lambdaBeta, lambdaGamma;
+
+public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme,
+ @Nonnull float c0, @Nonnull float c1, float lambdaTheta,
+ float lambdaBeta, float lambdaGamma) {
+
+// rank init scheme is gaussian
+// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98
+this.factor = factor;
+this.initScheme = initScheme;
+this.globalBias = 0.d;
+this.lambdaTheta = lambdaTheta;
+this.lambdaBeta = lambdaBeta;
+this.lambdaGamma = lambdaGamma;
+
+this.theta = new HashMap<>();
+this.beta = new HashMap<>();
+this.betaBias = new HashMap<>();
+this.gamma = new HashMap<>();
+this.gammaBias = new HashMap<>();
+
+this.randU = newRandoms(factor, 31L);
+this.randI = newRandoms(factor, 41L);
+
+checkHyperparameterC(c0);
+checkHyperparameterC(c1);
+this.c0 = c0;
+this.c1 = c1;
+
+}
+
+private void initFactorVector(String key, Map weights) {
+if (weights.containsKey(key)) {
+return;
+}
+RealVector v = new ArrayRealVector(factor);
+switch (initScheme) {
+case random:

[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-18 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226204201
  
--- Diff: core/src/main/java/hivemall/mf/CofactorModel.java ---
@@ -0,0 +1,629 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.mf;
+
+import hivemall.fm.Feature;
+import hivemall.utils.math.MathUtils;
+import hivemall.utils.math.MatrixUtils;
+import org.apache.commons.math3.linear.ArrayRealVector;
+import org.apache.commons.math3.linear.Array2DRowRealMatrix;
+import org.apache.commons.math3.linear.RealMatrix;
+import org.apache.commons.math3.linear.RealVector;
+import org.apache.commons.math3.linear.SingularValueDecomposition;
+
+import javax.annotation.Nonnegative;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.*;
+
+public class CofactorModel {
+
+public enum RankInitScheme {
+random /* default */, gaussian;
+
+@Nonnegative
+protected float maxInitValue;
+@Nonnegative
+protected double initStdDev;
+
+@Nonnull
+public static CofactorModel.RankInitScheme resolve(@Nullable String opt) {
+if (opt == null) {
+return random;
+} else if ("gaussian".equalsIgnoreCase(opt)) {
+return gaussian;
+} else if ("random".equalsIgnoreCase(opt)) {
+return random;
+}
+return random;
+}
+
+public void setMaxInitValue(float maxInitValue) {
+this.maxInitValue = maxInitValue;
+}
+
+public void setInitStdDev(double initStdDev) {
+this.initStdDev = initStdDev;
+}
+
+}
+
+private static final int EXPECTED_SIZE = 136861;
+@Nonnegative
+protected final int factor;
+
+// rank matrix initialization
+protected final RankInitScheme initScheme;
+
+@Nonnull
+private double globalBias;
+
+// storing trainable latent factors and weights
+private Map theta;
+private Map beta;
+private Map betaBias;
+private Map gamma;
+private Map gammaBias;
+
+// precomputed identity matrix
+private RealMatrix identity;
+
+protected final Random[] randU, randI;
+
+// hyperparameters
+private final float c0, c1;
+private final float lambdaTheta, lambdaBeta, lambdaGamma;
+
+public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme,
+ @Nonnull float c0, @Nonnull float c1, float lambdaTheta,
+ float lambdaBeta, float lambdaGamma) {
+
+// rank init scheme is gaussian
+// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98
+this.factor = factor;
+this.initScheme = initScheme;
+this.globalBias = 0.d;
+this.lambdaTheta = lambdaTheta;
+this.lambdaBeta = lambdaBeta;
+this.lambdaGamma = lambdaGamma;
+
+this.theta = new HashMap<>();
+this.beta = new HashMap<>();
+this.betaBias = new HashMap<>();
+this.gamma = new HashMap<>();
+this.gammaBias = new HashMap<>();
+
+this.randU = newRandoms(factor, 31L);
+this.randI = newRandoms(factor, 41L);
+
+checkHyperparameterC(c0);
+checkHyperparameterC(c1);
+this.c0 = c0;
+this.c1 = c1;
+
+}
+
+private void initFactorVector(String key, Map weights) {
+if (weights.containsKey(key)) {
+return;
+}
+RealVector v = new ArrayRealVector(factor);
--- End diff --

```
final double[] v =
```

[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-18 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226239017
  
--- Diff: core/src/main/java/hivemall/mf/CofactorModel.java ---
@@ -0,0 +1,629 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.mf;
+
+import hivemall.fm.Feature;
+import hivemall.utils.math.MathUtils;
+import hivemall.utils.math.MatrixUtils;
+import org.apache.commons.math3.linear.ArrayRealVector;
+import org.apache.commons.math3.linear.Array2DRowRealMatrix;
+import org.apache.commons.math3.linear.RealMatrix;
+import org.apache.commons.math3.linear.RealVector;
+import org.apache.commons.math3.linear.SingularValueDecomposition;
+
+import javax.annotation.Nonnegative;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.*;
+
+public class CofactorModel {
+
+public enum RankInitScheme {
+random /* default */, gaussian;
+
+@Nonnegative
+protected float maxInitValue;
+@Nonnegative
+protected double initStdDev;
+
+@Nonnull
+public static CofactorModel.RankInitScheme resolve(@Nullable String opt) {
+if (opt == null) {
+return random;
+} else if ("gaussian".equalsIgnoreCase(opt)) {
+return gaussian;
+} else if ("random".equalsIgnoreCase(opt)) {
+return random;
+}
+return random;
+}
+
+public void setMaxInitValue(float maxInitValue) {
+this.maxInitValue = maxInitValue;
+}
+
+public void setInitStdDev(double initStdDev) {
+this.initStdDev = initStdDev;
+}
+
+}
+
+private static final int EXPECTED_SIZE = 136861;
+@Nonnegative
+protected final int factor;
+
+// rank matrix initialization
+protected final RankInitScheme initScheme;
+
+@Nonnull
+private double globalBias;
+
+// storing trainable latent factors and weights
+private Map theta;
+private Map beta;
+private Map betaBias;
+private Map gamma;
+private Map gammaBias;
+
+// precomputed identity matrix
+private RealMatrix identity;
+
+protected final Random[] randU, randI;
+
+// hyperparameters
+private final float c0, c1;
+private final float lambdaTheta, lambdaBeta, lambdaGamma;
+
+public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme,
+ @Nonnull float c0, @Nonnull float c1, float lambdaTheta,
+ float lambdaBeta, float lambdaGamma) {
+
+// rank init scheme is gaussian
+// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98
+this.factor = factor;
+this.initScheme = initScheme;
+this.globalBias = 0.d;
+this.lambdaTheta = lambdaTheta;
+this.lambdaBeta = lambdaBeta;
+this.lambdaGamma = lambdaGamma;
+
+this.theta = new HashMap<>();
+this.beta = new HashMap<>();
+this.betaBias = new HashMap<>();
+this.gamma = new HashMap<>();
+this.gammaBias = new HashMap<>();
+
+this.randU = newRandoms(factor, 31L);
+this.randI = newRandoms(factor, 41L);
+
+checkHyperparameterC(c0);
+checkHyperparameterC(c1);
+this.c0 = c0;
+this.c1 = c1;
+
+}
+
+private void initFactorVector(String key, Map weights) {
+if (weights.containsKey(key)) {
+return;
+}
+RealVector v = new ArrayRealVector(factor);
+switch (initScheme) {
+case random:
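The quoted `initFactorVector` is cut off at `case random:`. An earlier comment on the same method suggests using `final double[] v` in place of `RealVector v = new ArrayRealVector(factor)`, which avoids a `RealVector` wrapper for every key. The sketch below is hypothetical and rests on three assumptions: the `random` scheme draws uniformly from [0, maxInitValue), the `gaussian` scheme draws from N(0, initStdDev²), and there is one `Random` per latent dimension, as the `randU`/`randI` arrays suggest. The class, the helper bodies, and both distributions are illustrative assumptions, not the PR's code.

```java
import java.util.Map;
import java.util.Random;

final class FactorInitSketch {
    enum RankInitScheme { random, gaussian }

    /** Assumed helper: one Random per latent dimension, seeded deterministically from a base seed. */
    static Random[] newRandoms(int factor, long seed) {
        Random[] rands = new Random[factor];
        for (int k = 0; k < factor; k++) {
            rands[k] = new Random(seed + k);
        }
        return rands;
    }

    /** Lazily creates the factor vector for key as a primitive double[]; existing vectors are kept. */
    static void initFactorVector(String key, Map<String, double[]> weights, int factor,
            RankInitScheme scheme, Random[] rands, double maxInitValue, double initStdDev) {
        if (weights.containsKey(key)) {
            return;
        }
        final double[] v = new double[factor];
        for (int k = 0; k < factor; k++) {
            switch (scheme) {
                case random:
                    v[k] = rands[k].nextDouble() * maxInitValue; // assumed: uniform in [0, maxInitValue)
                    break;
                case gaussian:
                    v[k] = rands[k].nextGaussian() * initStdDev; // assumed: N(0, initStdDev^2)
                    break;
                default:
                    throw new IllegalStateException("Unexpected scheme: " + scheme);
            }
        }
        weights.put(key, v);
    }
}
```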

[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-18 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226237654
  
--- Diff: core/src/main/java/hivemall/mf/CofactorModel.java ---
@@ -0,0 +1,629 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.mf;
+
+import hivemall.fm.Feature;
+import hivemall.utils.math.MathUtils;
+import hivemall.utils.math.MatrixUtils;
+import org.apache.commons.math3.linear.ArrayRealVector;
+import org.apache.commons.math3.linear.Array2DRowRealMatrix;
+import org.apache.commons.math3.linear.RealMatrix;
+import org.apache.commons.math3.linear.RealVector;
+import org.apache.commons.math3.linear.SingularValueDecomposition;
+
+import javax.annotation.Nonnegative;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.*;
+
+public class CofactorModel {
+
+public enum RankInitScheme {
+random /* default */, gaussian;
+
+@Nonnegative
+protected float maxInitValue;
+@Nonnegative
+protected double initStdDev;
+
+@Nonnull
+public static CofactorModel.RankInitScheme resolve(@Nullable String opt) {
+if (opt == null) {
+return random;
+} else if ("gaussian".equalsIgnoreCase(opt)) {
+return gaussian;
+} else if ("random".equalsIgnoreCase(opt)) {
+return random;
+}
+return random;
+}
+
+public void setMaxInitValue(float maxInitValue) {
+this.maxInitValue = maxInitValue;
+}
+
+public void setInitStdDev(double initStdDev) {
+this.initStdDev = initStdDev;
+}
+
+}
+
+private static final int EXPECTED_SIZE = 136861;
+@Nonnegative
+protected final int factor;
+
+// rank matrix initialization
+protected final RankInitScheme initScheme;
+
+@Nonnull
+private double globalBias;
+
+// storing trainable latent factors and weights
+private Map theta;
+private Map beta;
+private Map betaBias;
+private Map gamma;
+private Map gammaBias;
+
+// precomputed identity matrix
+private RealMatrix identity;
+
+protected final Random[] randU, randI;
+
+// hyperparameters
+private final float c0, c1;
+private final float lambdaTheta, lambdaBeta, lambdaGamma;
+
+public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme,
+ @Nonnull float c0, @Nonnull float c1, float lambdaTheta,
+ float lambdaBeta, float lambdaGamma) {
+
+// rank init scheme is gaussian
+// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98
+this.factor = factor;
+this.initScheme = initScheme;
+this.globalBias = 0.d;
+this.lambdaTheta = lambdaTheta;
+this.lambdaBeta = lambdaBeta;
+this.lambdaGamma = lambdaGamma;
+
+this.theta = new HashMap<>();
+this.beta = new HashMap<>();
+this.betaBias = new HashMap<>();
+this.gamma = new HashMap<>();
+this.gammaBias = new HashMap<>();
+
+this.randU = newRandoms(factor, 31L);
+this.randI = newRandoms(factor, 41L);
+
+checkHyperparameterC(c0);
+checkHyperparameterC(c1);
+this.c0 = c0;
+this.c1 = c1;
+
+}
+
+private void initFactorVector(String key, Map weights) {
+if (weights.containsKey(key)) {
+return;
+}
+RealVector v = new ArrayRealVector(factor);
+switch (initScheme) {
+case random:

[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-18 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226239653
  
--- Diff: core/src/main/java/hivemall/mf/CofactorModel.java ---
@@ -0,0 +1,629 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.mf;
+
+import hivemall.fm.Feature;
+import hivemall.utils.math.MathUtils;
+import hivemall.utils.math.MatrixUtils;
+import org.apache.commons.math3.linear.ArrayRealVector;
+import org.apache.commons.math3.linear.Array2DRowRealMatrix;
+import org.apache.commons.math3.linear.RealMatrix;
+import org.apache.commons.math3.linear.RealVector;
+import org.apache.commons.math3.linear.SingularValueDecomposition;
+
+import javax.annotation.Nonnegative;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.*;
+
+public class CofactorModel {
+
+public enum RankInitScheme {
+random /* default */, gaussian;
+
+@Nonnegative
+protected float maxInitValue;
+@Nonnegative
+protected double initStdDev;
+
+@Nonnull
+public static CofactorModel.RankInitScheme resolve(@Nullable String opt) {
+if (opt == null) {
+return random;
+} else if ("gaussian".equalsIgnoreCase(opt)) {
+return gaussian;
+} else if ("random".equalsIgnoreCase(opt)) {
+return random;
+}
+return random;
+}
+
+public void setMaxInitValue(float maxInitValue) {
+this.maxInitValue = maxInitValue;
+}
+
+public void setInitStdDev(double initStdDev) {
+this.initStdDev = initStdDev;
+}
+
+}
+
+private static final int EXPECTED_SIZE = 136861;
+@Nonnegative
+protected final int factor;
+
+// rank matrix initialization
+protected final RankInitScheme initScheme;
+
+@Nonnull
+private double globalBias;
+
+// storing trainable latent factors and weights
+private Map theta;
+private Map beta;
+private Map betaBias;
+private Map gamma;
+private Map gammaBias;
+
+// precomputed identity matrix
+private RealMatrix identity;
+
+protected final Random[] randU, randI;
+
+// hyperparameters
+private final float c0, c1;
+private final float lambdaTheta, lambdaBeta, lambdaGamma;
+
+public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme,
+ @Nonnull float c0, @Nonnull float c1, float lambdaTheta,
+ float lambdaBeta, float lambdaGamma) {
+
+// rank init scheme is gaussian
+// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98
+this.factor = factor;
+this.initScheme = initScheme;
+this.globalBias = 0.d;
+this.lambdaTheta = lambdaTheta;
+this.lambdaBeta = lambdaBeta;
+this.lambdaGamma = lambdaGamma;
+
+this.theta = new HashMap<>();
+this.beta = new HashMap<>();
+this.betaBias = new HashMap<>();
+this.gamma = new HashMap<>();
+this.gammaBias = new HashMap<>();
+
+this.randU = newRandoms(factor, 31L);
+this.randI = newRandoms(factor, 41L);
+
+checkHyperparameterC(c0);
+checkHyperparameterC(c1);
+this.c0 = c0;
+this.c1 = c1;
+
+}
+
+private void initFactorVector(String key, Map weights) {
+if (weights.containsKey(key)) {
+return;
+}
+RealVector v = new ArrayRealVector(factor);
+switch (initScheme) {
+case random:

[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-18 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226241124
  
--- Diff: core/src/main/java/hivemall/mf/CofactorizationUDTF.java ---
@@ -0,0 +1,574 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.mf;
+
+import hivemall.UDTFWithOptions;
+import hivemall.common.ConversionState;
+import hivemall.fm.Feature;
+import hivemall.fm.StringFeature;
+import hivemall.utils.hadoop.HiveUtils;
+import hivemall.utils.io.FileUtils;
+import hivemall.utils.io.NioStatefulSegment;
+import hivemall.utils.lang.NumberUtils;
+import hivemall.utils.lang.Primitives;
+import hivemall.utils.lang.SizeOf;
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.Options;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.*;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.BooleanObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
+import org.apache.hadoop.mapred.Counters;
+import org.apache.hadoop.mapred.Reporter;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+
+import static hivemall.utils.lang.Primitives.FALSE_BYTE;
+import static hivemall.utils.lang.Primitives.TRUE_BYTE;
+
+public class CofactorizationUDTF extends UDTFWithOptions {
+private static final Log LOG = LogFactory.getLog(CofactorizationUDTF.class);
+
+// Option variables
+// The number of latent factors
+protected int factor;
+// The scaling hyperparameter for zero entries in the rank matrix
+protected float scale_zero;
+// The scaling hyperparameter for non-zero entries in the rank matrix
+protected float scale_nonzero;
+// The preferred size of the miniBatch for training
+protected int batchSize;
+// The initial mean rating
+protected float globalBias;
+// Whether update (and return) the mean rating or not
+protected boolean updateGlobalBias;
+// The number of iterations
+protected int maxIters;
+// Whether to use bias clause
+protected boolean useBiasClause;
+// Whether to use normalization
+protected boolean useL2Norm;
+// regularization hyperparameters
+protected float lambdaTheta;
+protected float lambdaBeta;
+protected float lambdaGamma;
+
+// Initialization strategy of rank matrix
+protected CofactorModel.RankInitScheme rankInit;
+
+// Model itself
+protected CofactorModel model;
+protected int numItems;
+
+// Variable managing status of learning
+
+// The number of processed training examples
+protected long count;
+
+protected ConversionState cvState;
+private ConversionState validationState;
+
+// Input OIs and Context
+protected StringObjectInspector contextOI;
+protected ListObjectInspector featuresOI;
+protected BooleanObjectInspector isItemOI;
+protected ListObjectInspector sppmiOI;
+
+// Used for iterations
+protected NioStatefulSegment fileIO;
+protected ByteBuffer inputBuf;
+private long lastWritePos;
+
+private Feature contextProbe;
+private Feature[] featuresProbe;
+private Feature[] sppmiProbe;
+private boolean isItemProbe;
+private long numValidations;
+private long numTraining;
+
 

[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-18 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226243032
  
--- Diff: core/src/main/java/hivemall/mf/CofactorModel.java ---
@@ -0,0 +1,638 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.mf;
+
+import hivemall.fm.Feature;
+import hivemall.utils.math.MathUtils;
+import hivemall.utils.math.MatrixUtils;
+import org.apache.commons.math3.linear.ArrayRealVector;
+import org.apache.commons.math3.linear.Array2DRowRealMatrix;
+import org.apache.commons.math3.linear.RealMatrix;
+import org.apache.commons.math3.linear.RealVector;
+import org.apache.commons.math3.linear.SingularValueDecomposition;
+
+import javax.annotation.Nonnegative;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.*;
+
+public class CofactorModel {
+
+public enum RankInitScheme {
+random /* default */, gaussian;
+
+@Nonnegative
+protected float maxInitValue;
+@Nonnegative
+protected double initStdDev;
+
+@Nonnull
+public static CofactorModel.RankInitScheme resolve(@Nullable String opt) {
+if (opt == null) {
+return random;
+} else if ("gaussian".equalsIgnoreCase(opt)) {
+return gaussian;
+} else if ("random".equalsIgnoreCase(opt)) {
+return random;
+}
+return random;
+}
+
+public void setMaxInitValue(float maxInitValue) {
+this.maxInitValue = maxInitValue;
+}
+
+public void setInitStdDev(double initStdDev) {
+this.initStdDev = initStdDev;
+}
+
+}
+
+private static final int EXPECTED_SIZE = 136861;
+@Nonnegative
+protected final int factor;
+
+// rank matrix initialization
+protected final RankInitScheme initScheme;
+
+@Nonnull
+private double globalBias;
+
+// storing trainable latent factors and weights
+private Map theta;
+private Map beta;
+private Map betaBias;
+private Map gamma;
+private Map gammaBias;
+
+// precomputed identity matrix
+private RealMatrix identity;
+
+protected final Random[] randU, randI;
+
+// hyperparameters
+private final float c0, c1;
+private final float lambdaTheta, lambdaBeta, lambdaGamma;
+
+public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme,
+ @Nonnull float c0, @Nonnull float c1, float lambdaTheta,
+ float lambdaBeta, float lambdaGamma) {
+
+// rank init scheme is gaussian
+// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98
+this.factor = factor;
+this.initScheme = initScheme;
+this.globalBias = 0.d;
+this.lambdaTheta = lambdaTheta;
+this.lambdaBeta = lambdaBeta;
+this.lambdaGamma = lambdaGamma;
+
+this.theta = new HashMap<>();
+this.beta = new HashMap<>();
+this.betaBias = new HashMap<>();
+this.gamma = new HashMap<>();
+this.gammaBias = new HashMap<>();
+
+this.randU = newRandoms(factor, 31L);
+this.randI = newRandoms(factor, 41L);
+
+checkHyperparameterC(c0);
+checkHyperparameterC(c1);
+this.c0 = c0;
+this.c1 = c1;
+
+}
+
+private void initFactorVector(String key, Map weights) {
+if (weights.containsKey(key)) {
+return;
+}
+RealVector v = new ArrayRealVector(factor);
+switch (initScheme) {
+case random:

[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-18 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226247247
  
--- Diff: core/src/main/java/hivemall/mf/CofactorModel.java ---
@@ -0,0 +1,640 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.mf;
+
+import hivemall.fm.Feature;
+import hivemall.utils.math.MathUtils;
+import hivemall.utils.math.MatrixUtils;
+import it.unimi.dsi.fastutil.objects.Object2DoubleArrayMap;
+import it.unimi.dsi.fastutil.objects.Object2DoubleMap;
+import org.apache.commons.math3.linear.ArrayRealVector;
+import org.apache.commons.math3.linear.Array2DRowRealMatrix;
+import org.apache.commons.math3.linear.RealMatrix;
+import org.apache.commons.math3.linear.RealVector;
+import org.apache.commons.math3.linear.SingularValueDecomposition;
+
+import javax.annotation.Nonnegative;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.*;
+
+public class CofactorModel {
+
+public enum RankInitScheme {
+random /* default */, gaussian;
+
+@Nonnegative
+protected float maxInitValue;
+@Nonnegative
+protected double initStdDev;
+
+@Nonnull
+public static CofactorModel.RankInitScheme resolve(@Nullable String opt) {
+if (opt == null) {
+return random;
+} else if ("gaussian".equalsIgnoreCase(opt)) {
+return gaussian;
+} else if ("random".equalsIgnoreCase(opt)) {
+return random;
+}
+return random;
+}
+
+public void setMaxInitValue(float maxInitValue) {
+this.maxInitValue = maxInitValue;
+}
+
+public void setInitStdDev(double initStdDev) {
+this.initStdDev = initStdDev;
+}
+
+}
+
+private static final int EXPECTED_SIZE = 136861;
+@Nonnegative
+protected final int factor;
+
+// rank matrix initialization
+protected final RankInitScheme initScheme;
+
+@Nonnull
+private double globalBias;
+
+// storing trainable latent factors and weights
+private Map theta;
+private Map beta;
+private Object2DoubleMap betaBias;
+private Map gamma;
+private Object2DoubleMap gammaBias;
+
+// precomputed identity matrix
+private RealMatrix identity;
+
+protected final Random[] randU, randI;
+
+// hyperparameters
+private final float c0, c1;
+private final float lambdaTheta, lambdaBeta, lambdaGamma;
+
+public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme,
+ @Nonnull float c0, @Nonnull float c1, float lambdaTheta,
+ float lambdaBeta, float lambdaGamma) {
+
+// rank init scheme is gaussian
+// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98
+this.factor = factor;
+this.initScheme = initScheme;
+this.globalBias = 0.d;
+this.lambdaTheta = lambdaTheta;
+this.lambdaBeta = lambdaBeta;
+this.lambdaGamma = lambdaGamma;
+
+this.theta = new HashMap<>();
+this.beta = new HashMap<>();
+this.betaBias = new Object2DoubleArrayMap<>();
+this.gamma = new HashMap<>();
+this.gammaBias = new Object2DoubleArrayMap<>();
+
+this.randU = newRandoms(factor, 31L);
+this.randI = newRandoms(factor, 41L);
+
+checkHyperparameterC(c0);
+checkHyperparameterC(c1);
+this.c0 = c0;
+this.c1 = c1;
+
+}
+
+private void initFactorVector(String key, Map weights) 

[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-18 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226525857
  
--- Diff: core/src/main/java/hivemall/mf/CofactorizationUDTF.java ---
@@ -0,0 +1,574 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.mf;
+
+import hivemall.UDTFWithOptions;
+import hivemall.common.ConversionState;
+import hivemall.fm.Feature;
+import hivemall.fm.StringFeature;
+import hivemall.utils.hadoop.HiveUtils;
+import hivemall.utils.io.FileUtils;
+import hivemall.utils.io.NioStatefulSegment;
+import hivemall.utils.lang.NumberUtils;
+import hivemall.utils.lang.Primitives;
+import hivemall.utils.lang.SizeOf;
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.Options;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+import org.apache.hadoop.hive.serde2.objectinspector.*;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.BooleanObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
+import org.apache.hadoop.hive.serde2.objectinspector.primitive.StringObjectInspector;
+import org.apache.hadoop.mapred.Counters;
+import org.apache.hadoop.mapred.Reporter;
+
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+
+import static hivemall.utils.lang.Primitives.FALSE_BYTE;
+import static hivemall.utils.lang.Primitives.TRUE_BYTE;
+
+public class CofactorizationUDTF extends UDTFWithOptions {
+private static final Log LOG = LogFactory.getLog(CofactorizationUDTF.class);
+
+// Option variables
+// The number of latent factors
+protected int factor;
+// The scaling hyperparameter for zero entries in the rank matrix
+protected float scale_zero;
+// The scaling hyperparameter for non-zero entries in the rank matrix
+protected float scale_nonzero;
+// The preferred size of the miniBatch for training
+protected int batchSize;
+// The initial mean rating
+protected float globalBias;
+// Whether update (and return) the mean rating or not
+protected boolean updateGlobalBias;
+// The number of iterations
+protected int maxIters;
+// Whether to use bias clause
+protected boolean useBiasClause;
+// Whether to use normalization
+protected boolean useL2Norm;
+// regularization hyperparameters
+protected float lambdaTheta;
+protected float lambdaBeta;
+protected float lambdaGamma;
+
+// Initialization strategy of rank matrix
+protected CofactorModel.RankInitScheme rankInit;
+
+// Model itself
+protected CofactorModel model;
+protected int numItems;
+
+// Variable managing status of learning
+
+// The number of processed training examples
+protected long count;
+
+protected ConversionState cvState;
+private ConversionState validationState;
+
+// Input OIs and Context
+protected StringObjectInspector contextOI;
+protected ListObjectInspector featuresOI;
+protected BooleanObjectInspector isItemOI;
+protected ListObjectInspector sppmiOI;
+
+// Used for iterations
+protected NioStatefulSegment fileIO;
+protected ByteBuffer inputBuf;
+private long lastWritePos;
+
+private Feature contextProbe;
+private Feature[] featuresProbe;
+private Feature[] sppmiProbe;
+private boolean isItemProbe;
+private long numValidations;
+private long numTraining;
+
 

[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-19 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226578153
  
--- Diff: core/src/main/java/hivemall/fm/Feature.java ---
@@ -383,4 +383,10 @@ public static void l2normalize(@Nonnull final Feature[] features) {
 }
 }
 
+@Override
--- End diff --

Why is this `equals` method required? I assume it is not used.


---


[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-19 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226578495
  
--- Diff: core/src/main/java/hivemall/mf/CofactorModel.java ---
@@ -0,0 +1,715 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.mf;
+
+import hivemall.annotations.VisibleForTesting;
+import hivemall.fm.Feature;
+import hivemall.utils.lang.Preconditions;
+import hivemall.utils.math.MathUtils;
+import it.unimi.dsi.fastutil.objects.Object2DoubleArrayMap;
+import it.unimi.dsi.fastutil.objects.Object2DoubleMap;
+import org.apache.commons.math3.linear.ArrayRealVector;
+import org.apache.commons.math3.linear.Array2DRowRealMatrix;
+import org.apache.commons.math3.linear.RealMatrix;
+import org.apache.commons.math3.linear.RealVector;
+import org.apache.commons.math3.linear.SingularValueDecomposition;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+
+import javax.annotation.Nonnegative;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Random;
+
+
+public class CofactorModel {
+
+public enum RankInitScheme {
+random /* default */, gaussian;
+
--- End diff --

Please remove unnecessary line breaks.


---


[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-19 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226579051
  
--- Diff: core/src/main/java/hivemall/mf/CofactorModel.java ---
@@ -0,0 +1,715 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.mf;
+
+import hivemall.annotations.VisibleForTesting;
+import hivemall.fm.Feature;
+import hivemall.utils.lang.Preconditions;
+import hivemall.utils.math.MathUtils;
+import it.unimi.dsi.fastutil.objects.Object2DoubleArrayMap;
+import it.unimi.dsi.fastutil.objects.Object2DoubleMap;
+import org.apache.commons.math3.linear.ArrayRealVector;
+import org.apache.commons.math3.linear.Array2DRowRealMatrix;
+import org.apache.commons.math3.linear.RealMatrix;
+import org.apache.commons.math3.linear.RealVector;
+import org.apache.commons.math3.linear.SingularValueDecomposition;
+import org.apache.hadoop.hive.ql.metadata.HiveException;
+
+import javax.annotation.Nonnegative;
+import javax.annotation.Nonnull;
+import javax.annotation.Nullable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Random;
+
+
+public class CofactorModel {
+
+public enum RankInitScheme {
+random /* default */, gaussian;
+
+
+@Nonnegative
+private float maxInitValue;
+@Nonnegative
+private double initStdDev;
+@Nonnull
+public static CofactorModel.RankInitScheme resolve(@Nullable String opt) {
+if (opt == null) {
+return random;
+} else if ("gaussian".equalsIgnoreCase(opt)) {
+return gaussian;
+} else if ("random".equalsIgnoreCase(opt)) {
+return random;
+}
+return random;
+}
+
+public void setMaxInitValue(float maxInitValue) {
+this.maxInitValue = maxInitValue;
+}
+
+public void setInitStdDev(double initStdDev) {
+this.initStdDev = initStdDev;
+}
+
+
+}
+
+@Nonnegative
+private final int factor;
+
+// rank matrix initialization
+private final RankInitScheme initScheme;
+
+@Nonnull
+private double globalBias;
+
+// storing trainable latent factors and weights
+private final Map theta;
+private final Map beta;
+private final Object2DoubleMap betaBias;
+private final Map gamma;
+private final Object2DoubleMap gammaBias;
+
+private final Random[] randU, randI;
+
+// hyperparameters
+private final float c0, c1;
+private final float lambdaTheta, lambdaBeta, lambdaGamma;
+
+// solve
+private final RealMatrix B;
+private final RealVector A;
+
+// error message strings
+private static final String ARRAY_NOT_SQUARE_ERR = "Array is not square";
+private static final String DIFFERENT_DIMS_ERR = "Matrix, vector or array do not match in size";
+
+public CofactorModel(@Nonnegative int factor, @Nonnull RankInitScheme initScheme,
+ float c0, float c1, float lambdaTheta, float lambdaBeta, float lambdaGamma) {
+
+// rank init scheme is gaussian
+// https://github.com/dawenl/cofactor/blob/master/src/cofacto.py#L98
+this.factor = factor;
+this.initScheme = initScheme;
+this.globalBias = 0.d;
+this.lambdaTheta = lambdaTheta;
+this.lambdaBeta = lambdaBeta;
+this.lambdaGamma = lambdaGamma;
+
+this.theta = new HashMap<>();
+this.beta = new HashMap<>();
+this.betaBias = new Object2DoubleArrayMap<>();
+this.betaBias.defaultReturnValue(0.d)

[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-19 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226578854
  

[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-19 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226578817
  

[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-19 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226579427
  

[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-19 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226578559
  

[GitHub] incubator-hivemall pull request #167: [HIVEMALL-220] Implement Cofactor

2018-10-20 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/167#discussion_r226845079
  
--- Diff: core/src/main/java/hivemall/fm/Feature.java ---
@@ -383,4 +383,10 @@ public static void l2normalize(@Nonnull final Feature[] features) {
 }
 }
 
+@Override
--- End diff --

See https://medium.com/codelog/overriding-hashcode-method-effective-java-notes-723c1fedf51c
 

Usually, overriding `equals` requires overriding `hashCode` as well, because `hashCode` (together with `equals`) is used to look up keys in a `HashMap`.
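To illustrate the `equals`/`hashCode` contract mentioned above, here is a minimal sketch of a feature-like class used as a `HashMap` key. `FeatureKey` and its `name`/`value` fields are hypothetical stand-ins chosen for illustration; they are not the fields of `hivemall.fm.Feature`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical feature-like key; not hivemall.fm.Feature.
final class FeatureKey {
    private final String name;
    private final double value;

    FeatureKey(String name, double value) {
        this.name = Objects.requireNonNull(name);
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof FeatureKey)) {
            return false;
        }
        FeatureKey other = (FeatureKey) o;
        // Double.compare matches Double.equals semantics (NaN == NaN, -0.0 != 0.0)
        return name.equals(other.name) && Double.compare(value, other.value) == 0;
    }

    // Must agree with equals: equal objects have to produce equal hash codes,
    // otherwise HashMap looks in the wrong bucket and misses the entry.
    @Override
    public int hashCode() {
        return Objects.hash(name, value);
    }
}
```

Without the `hashCode` override, two `FeatureKey` instances that are `equals` would normally get different identity hash codes, so looking up a map with a fresh but equal key could return `null`. If `equals` is not needed at all, as asked earlier in this review, dropping both overrides is the simpler option.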


---


[GitHub] incubator-hivemall issue #168: Add cache to reduce Maven build time on Travi...

2018-10-23 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/168
  
It does not seem to be working.

Would `timeout: 1000` help?
https://docs.travis-ci.com/user/caching/#setting-the-timeout

Please add `[HIVEMALL-221]` to the PR title.


---


[GitHub] incubator-hivemall pull request #169: [HIVEMALL-222] Introduce Gradient Clip...

2018-10-24 Thread myui
GitHub user myui opened a pull request:

https://github.com/apache/incubator-hivemall/pull/169

[HIVEMALL-222] Introduce Gradient Clipping to avoid exploding gradient to General Classifier/Regressor

## What changes were proposed in this pull request?

Avoid [exploding gradients](http://www.cs.toronto.edu/~rgrosse/courses/csc321_2017/readings/L15%20Exploding%20and%20Vanishing%20Gradients.pdf) by clipping gradients by value.
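Gradient clipping by value bounds each gradient component to a fixed range before the weight update. A minimal sketch, assuming a symmetric threshold `[-threshold, threshold]`; the class and method names are hypothetical and are not the API added in this PR.

```java
// Hypothetical helper illustrating clipping by value; not Hivemall's API.
final class GradientClipping {

    private GradientClipping() {}

    /** Clips a scalar gradient into [-threshold, threshold]. */
    static float clipByValue(float gradient, float threshold) {
        if (!(threshold > 0f)) {
            throw new IllegalArgumentException("threshold must be positive: " + threshold);
        }
        return Math.max(-threshold, Math.min(threshold, gradient));
    }

    /** Clips every component of a gradient vector in place. */
    static void clipByValue(float[] gradients, float threshold) {
        for (int i = 0; i < gradients.length; i++) {
            gradients[i] = clipByValue(gradients[i], threshold);
        }
    }
}
```

Clipping by value bounds each component independently and can change the gradient's direction; clipping by norm (rescaling the whole vector when its L2 norm exceeds a threshold) preserves direction instead. The PR description specifies the by-value variant, so the sketch follows that.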

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-222

## How was this patch tested?

unit tests

## Checklist

(Please remove this section if not needed; check `x` for YES, blank for NO)

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, for your commit?
- [ ] Did you run system tests on Hive (or Spark)?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/myui/incubator-hivemall clipping

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/169.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #169


commit 0c10392d2a3c96b40df57e6b406333e0a239b9f9
Author: Makoto Yui 
Date:   2018-10-24T08:14:15Z

Updated for debugging purpose

commit e0dc4b954650c6751d6e37ee5ecf6c9656872b16
Author: Makoto Yui 
Date:   2018-10-24T08:15:03Z

Introduced gradient clipping by value to avoid exploding gradients

commit 7e932e99cfd990bb47ff7acfed44c19678fadc8f
Author: Makoto Yui 
Date:   2018-10-24T08:15:52Z

Added a unit test for gradient clipping




---


[GitHub] incubator-hivemall issue #168: [HIVEMALL-221] Add cache to reduce Maven buil...

2018-10-24 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/168
  
@maropu 

Is this `clean` required?

https://github.com/apache/incubator-hivemall/blob/master/bin/run_travis_tests.sh#L42


---


[GitHub] incubator-hivemall pull request #168: [HIVEMALL-221] Add cache to reduce Mav...

2018-10-24 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/168#discussion_r227796760
  
--- Diff: .travis.yml ---
@@ -1,5 +1,10 @@
 sudo: false
 
+cache:
+  timeout: 1500
+  directories:
+  - $HOME/.m2
--- End diff --

Shouldn't it be `$HOME/.m2/repository`?

https://github.com/apache/kafka/blob/trunk/.travis.yml#L52
https://github.com/airlift/drift/blob/master/.travis.yml#L11
https://github.com/mesos/storm/blob/master/.travis.yml#L6


---


[GitHub] incubator-hivemall pull request #168: [HIVEMALL-221] Add cache to reduce Mav...

2018-10-24 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/168#discussion_r227797186
  
--- Diff: .travis.yml ---
@@ -35,7 +40,7 @@ notifications:
   email: false
 
 script:
-  - ./bin/run_travis_tests.sh
+  - travis_wait 10 ./bin/run_travis_tests.sh
--- End diff --

Please revert this change because it has no effect.


---


[GitHub] incubator-hivemall issue #168: [HIVEMALL-221] Add cache to reduce Maven buil...

2018-10-24 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/168
  
See what happens. 


---


[GitHub] incubator-hivemall issue #168: [HIVEMALL-221] Add cache to reduce Maven buil...

2018-10-24 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/168
  
```
[WARNING] Could not transfer metadata 
org.apache.hivemall:hivemall-spark2.1:0.5.1-incubating-SNAPSHOT/maven-metadata.xml
 from/to apache.snapshots (https://repository.apache.org/snapshots): Connect to 
repository.apache.org:443 [repository.apache.org/207.244.88.140] failed: 
Connection timed out (Connection timed out)
[WARNING] Failure to transfer 
org.apache.hivemall:hivemall-spark2.1:0.5.1-incubating-SNAPSHOT/maven-metadata.xml
 from https://repository.apache.org/snapshots/ was cached in the local 
repository, resolution will not be reattempted until the update interval of 
apache-snapshots has elapsed or updates are forced. Original error: Could not 
transfer metadata 
org.apache.hivemall:hivemall-spark2.1:0.5.1-incubating-SNAPSHOT/maven-metadata.xml
 from/to apache-snapshots (https://repository.apache.org/snapshots/): Connect 
to repository.apache.org:443 [repository.apache.org/207.244.88.140] failed: 
Connection timed out (Connection timed out)
[WARNING] Failure to transfer 
org.apache.hivemall:hivemall-spark2.1:0.5.1-incubating-SNAPSHOT/maven-metadata.xml
 from https://repository.apache.org/snapshots was cached in the local 
repository, resolution will not be reattempted until the update interval of 
apache.snapshots has elapsed or updates are forced. Original error: Could not 
transfer metadata 
org.apache.hivemall:hivemall-spark2.1:0.5.1-incubating-SNAPSHOT/maven-metadata.xml
 from/to apache.snapshots (https://repository.apache.org/snapshots): Connect to 
repository.apache.org:443 [repository.apache.org/207.244.88.140] failed: 
Connection timed out (Connection timed out)
[INFO] Downloading from apache-snapshots: 
https://repository.apache.org/snapshots/org/apache/hivemall/hivemall-spark2.1/0.5.1-incubating-SNAPSHOT/hivemall-spark2.1-0.5.1-incubating-SNAPSHOT-sources.jar
[INFO] Downloading from apache.snapshots: 
https://repository.apache.org/snapshots/org/apache/hivemall/hivemall-spark2.1/0.5.1-incubating-SNAPSHOT/hivemall-spark2.1-0.5.1-incubating-SNAPSHOT-sources.jar
```

Hmm, could we provide a mirror repository in Travis CI?


---


[GitHub] incubator-hivemall issue #168: [HIVEMALL-221] Add cache to reduce Maven buil...

2018-10-28 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/168
  
We might need to configure an ASF mirror to avoid timeouts from the default ASF repository.

https://maven.apache.org/guides/mini/guide-mirror-settings.html
https://code.i-harness.com/ja/q/c326f0


---


[GitHub] incubator-hivemall issue #163: [HIVEMALL-196] Support BM25 scoring

2018-11-02 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/163
  
@jaxony Merged with some modification. Thank you for your first 
contribution to Apache Hivemall!


---


[GitHub] incubator-hivemall pull request #170: [WIP][HIVEMALL-223] Add -kv_map and -v...

2018-11-11 Thread myui
GitHub user myui opened a pull request:

https://github.com/apache/incubator-hivemall/pull/170

[WIP][HIVEMALL-223] Add -kv_map and -vk_map option to to_ordered_list UDAF

## What changes were proposed in this pull request?

Add `-kv_map` and `-vk_map` options to the `to_ordered_list` UDAF.

## What type of PR is it?

Improvement

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-223

## How was this patch tested?

unit tests and manual tests on EMR

## How to use this feature?

Will be described in 
http://hivemall.incubator.apache.org/userguide/misc/generic_funcs.html#array

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, 
for your commit?
- [ ] Did you run system tests on Hive (or Spark)?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/myui/incubator-hivemall HIVEMALL-223

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/170.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #170


commit 26f361ce7b355410772577f0754f4bb5537ababf
Author: Makoto Yui 
Date:   2018-11-12T04:19:37Z

Added -kv_map and -vk_map option

commit 39ee911cb12e63f924229e962bbb00247297f75d
Author: Makoto Yui 
Date:   2018-11-12T04:20:13Z

Added WIP unit tests for -kv_map/vk_map option of to_ordered_list UDAF




---


[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...

2018-11-13 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/171#discussion_r233287092
  
--- Diff: spark/spark-2.3/src/test/scala/org/apache/spark/sql/hive/XGBoostSuite.scala ---
@@ -77,6 +77,7 @@ final class XGBoostSuite extends VectorQueryTest {
 val model = hiveContext.sparkSession.read.format("libxgboost").load(tempDir)
 val predict = model.join(mllibTestDf)
   .xgboost_predict($"rowid", $"features", $"model_id", $"pred_model")
--- End diff --

Let's disable xgboost for spark-2.3.


---


[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...

2018-11-13 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/171#discussion_r233287740
  
--- Diff: spark/spark-2.3/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
@@ -1935,18 +1935,6 @@ object HivemallOps {
 )
   }
 
-  /**
-   * @see [[hivemall.tools.array.SubarrayUDF]]
-   * @group tools.array
-   */
-  def subarray(original: Column, fromIndex: Column, toIndex: Column): Column = withExpr {
-planHiveUDF(
-  "hivemall.tools.array.SubarrayUDF",
-  "subarray",
-  original :: fromIndex :: toIndex :: Nil
-)
-  }
--- End diff --

Can't we simply replace `SubarrayUDF` with `ArraySliceUDF`?

```
def subarray(original: Column, fromIndex: Column, length: Column): Column = withExpr {
  planHiveUDF(
    "hivemall.tools.array.ArraySliceUDF",
```
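The replacement mostly comes down to turning an end index into a length. A minimal sketch of that mapping on plain Java lists, assuming `subarray` treats `toIndex` as exclusive (like `java.util.List.subList`) and that the slice function takes an offset and a length; both helpers are hypothetical stand-ins, not the Hivemall UDFs.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for the two UDF signatures; they only illustrate the argument mapping.
final class SliceMapping {
    private SliceMapping() {}

    // subarray-style: elements in [fromIndex, toIndex), toIndex assumed exclusive.
    static <T> List<T> subarray(List<T> values, int fromIndex, int toIndex) {
        return values.subList(fromIndex, toIndex);
    }

    // slice-style: `length` elements starting at `offset`.
    static <T> List<T> slice(List<T> values, int offset, int length) {
        return values.subList(offset, offset + length);
    }

    // Old-style call expressed through the new signature: length = toIndex - fromIndex.
    static <T> List<T> subarrayViaSlice(List<T> values, int fromIndex, int toIndex) {
        return slice(values, fromIndex, toIndex - fromIndex);
    }

    public static void main(String[] args) {
        List<Integer> xs = Arrays.asList(10, 20, 30, 40, 50);
        System.out.println(subarray(xs, 1, 4));         // prints [20, 30, 40]
        System.out.println(subarrayViaSlice(xs, 1, 4)); // prints [20, 30, 40]
    }
}
```

Under these assumptions, callers of the old signature keep the same result as long as the length argument is computed as `toIndex - fromIndex`.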


---


[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...

2018-11-13 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/171#discussion_r233288186
  
--- Diff: spark/pom.xml ---
@@ -52,6 +52,12 @@
hivemall-core
${project.version}
compile
+   
+   
+   io.netty
+   
netty-all
+   
--- End diff --

ah... I see.


---


[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...

2018-11-13 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/171#discussion_r233324312
  
--- Diff: spark/spark-2.3/src/test/scala/org/apache/spark/sql/hive/XGBoostSuite.scala ---
@@ -77,6 +77,7 @@ final class XGBoostSuite extends VectorQueryTest {
 val model = hiveContext.sparkSession.read.format("libxgboost").load(tempDir)
 val predict = model.join(mllibTestDf)
   .xgboost_predict($"rowid", $"features", $"model_id", $"pred_model")
--- End diff --

BTW, could you paste the stack trace of the exception?


---


[GitHub] incubator-hivemall issue #172: Fix typo

2018-11-13 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/172
  
Merged, thanks! 


---


[GitHub] incubator-hivemall issue #171: [SPARK][HOTFIX] Fix the existing test failure...

2018-11-14 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/171
  
Merged. Thanks!


---


[GitHub] incubator-hivemall pull request #173: [HIVEMALL-227][DOC] Removed md5 and re...

2018-11-15 Thread myui
GitHub user myui opened a pull request:

https://github.com/apache/incubator-hivemall/pull/173

[HIVEMALL-227][DOC] Removed md5 and replace sha1 with sha512 following new 
ASF policy

## What changes were proposed in this pull request?

Removed md5 and replaced sha1 with sha512, following the new ASF policy.
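For readers verifying a release artifact against its `.sha512` file, the digest can be computed with the JDK's standard `MessageDigest` API. A minimal sketch; the class name is hypothetical, and the exact layout of the published checksum file (grouping, case, file-name prefix) is an assumption, so only the hex digest itself is compared and any file-name prefix must be stripped first.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper: computes and compares SHA-512 digests as lowercase hex.
final class Sha512Check {
    private Sha512Check() {}

    static String sha512Hex(byte[] data) {
        final byte[] digest;
        try {
            digest = MessageDigest.getInstance("SHA-512").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // Not expected on standard JDKs, which ship a SHA-512 provider.
            throw new IllegalStateException(e);
        }
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Ignores whitespace and case, since published digests may be grouped or uppercased.
    static boolean matches(String expectedHex, byte[] data) {
        String normalized = expectedHex.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
        return normalized.equals(sha512Hex(data));
    }

    public static void main(String[] args) {
        // For a real artifact, read the bytes with java.nio.file.Files.readAllBytes(path).
        byte[] data = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(sha512Hex(data));
    }
}
```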

## What type of PR is it?

Documentation

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-227


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/myui/incubator-hivemall HIVEMALL-227

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/173.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #173


commit 583eb9991cf8db730d46b431b1cb80ebaeb293a8
Author: Makoto Yui 
Date:   2018-11-15T09:18:39Z

Removed md5 and replace sha1 with sha512 following new ASF policy




---


[GitHub] incubator-hivemall pull request #175: [WIP][HIVEMALL-230] Revise Optimizer I...

2018-12-12 Thread myui
GitHub user myui opened a pull request:

https://github.com/apache/incubator-hivemall/pull/175

[WIP][HIVEMALL-230] Revise Optimizer Implementation

## What changes were proposed in this pull request?

Revise Optimizer implementation. 

1. Revise default hyperparameters of AdaDelta and Adam. 
2. Support AdamW, AdamHD, Eve, and YellowFin optimizers.

* Fixing Weight Decay Regularization in Adam
https://openreview.net/forum?id=rk6qdGgCZ
* On the Convergence of Adam and Beyond
https://openreview.net/forum?id=ryQu7f-RZ
* AdamHD (Adam with Hypergradient descent)
https://arxiv.org/pdf/1703.04782.pdf
* Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates
https://arxiv.org/abs/1611.01505
* YellowFin and the Art of Momentum Tuning
https://arxiv.org/abs/1706.03471
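As a reference point for the variants listed above, here is a minimal per-weight sketch of the Adam update with two of the ideas this PR touches: AMSGrad (keep the running maximum of the second-moment estimate) and decoupled weight decay in the spirit of AdamW. The class, its parameters, and taking the maximum of the raw second moment before bias correction are assumptions for illustration; this is not Hivemall's optimizer code.

```java
// Hypothetical single-weight Adam sketch; not Hivemall's implementation.
final class AdamSketch {
    private final double alpha;    // step size
    private final double beta1;    // decay rate of the first moment
    private final double beta2;    // decay rate of the second moment
    private final double eps;      // numerical stabilizer
    private final double decay;    // decoupled weight decay coefficient (0 disables it)
    private final boolean amsgrad;

    private double m = 0.0;    // first moment estimate
    private double v = 0.0;    // second moment estimate
    private double vMax = 0.0; // running max of v, used only when amsgrad is true
    private int t = 0;         // step counter

    AdamSketch(double alpha, double beta1, double beta2, double eps, double decay, boolean amsgrad) {
        this.alpha = alpha;
        this.beta1 = beta1;
        this.beta2 = beta2;
        this.eps = eps;
        this.decay = decay;
        this.amsgrad = amsgrad;
    }

    // Returns the updated weight for one gradient observation.
    double update(double weight, double gradient) {
        t++;
        m = beta1 * m + (1.0 - beta1) * gradient;
        v = beta2 * v + (1.0 - beta2) * gradient * gradient;
        double secondMoment = v;
        if (amsgrad) {
            vMax = Math.max(vMax, v);
            secondMoment = vMax;
        }
        double mHat = m / (1.0 - Math.pow(beta1, t));
        double vHat = secondMoment / (1.0 - Math.pow(beta2, t));
        double adaptiveStep = mHat / (Math.sqrt(vHat) + eps);
        // The decay term shrinks the weight directly instead of being added to the gradient.
        return weight - alpha * (adaptiveStep + decay * weight);
    }

    public static void main(String[] args) {
        AdamSketch adam = new AdamSketch(0.001, 0.9, 0.999, 1e-8, 0.0, false);
        double w = 0.0;
        for (double g : new double[] {0.5, -0.2, 0.1}) {
            w = adam.update(w, g);
        }
        System.out.println(w);
    }
}
```

Because the decay term bypasses `m` and `v`, it is not rescaled by the adaptive denominator, which is the distinction the AdamW paper draws against adding an L2 penalty to the gradient.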

## What type of PR is it?

Improvement, Feature

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-230

## How was this patch tested?

unit tests, EMR (to appear)

## How to use this feature?

to appear

## Checklist

- [x] Did you apply source code formatter, i.e., `./bin/format_code.sh`, 
for your commit?
- [ ] Did you run system tests on Hive (or Spark)?


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/myui/incubator-hivemall adam_test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/175.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #175


commit 5168cf06bf03c38f005d435a4415ce8cb8140891
Author: Makoto Yui 
Date:   2018-12-03T07:04:29Z

Added ongoing unit test files

commit ed1b6302183a687a3584fe62ce5fa92b26c828ad
Author: Makoto Yui 
Date:   2018-12-04T09:41:42Z

Fixed to show ETA in debug log

commit 5c9d63f9fc184f05eed28f03986c6269c4ea6e93
Author: Makoto Yui 
Date:   2018-12-04T09:42:02Z

Added unit tests

commit 243f4b40899b960f4942c75f89c0c4c94974b03b
Author: Makoto Yui 
Date:   2018-12-05T09:48:17Z

Added comments

commit ae29e9a669dcd311b154615e19900ec4b01fd4d8
Author: Makoto Yui 
Date:   2018-12-06T07:08:48Z

Refactored

commit c25ce02db537570c6ed75db74d9a3783b316c694
Author: Makoto Yui 
Date:   2018-12-06T07:10:05Z

Added square() method

commit 71671d10138aa54c0485809b6126753a54dbe3e8
Author: Makoto Yui 
Date:   2018-12-06T07:10:42Z

Added helper methods

commit 6f4edbbaaac37884533132dea00c81f36da45e50
Author: Makoto Yui 
Date:   2018-12-06T07:22:51Z

Refactored ADAM implementation

commit e61f22afaa46bdf705c2760cebaa601929a77608
Author: Makoto Yui 
Date:   2018-12-06T08:52:08Z

Added logging message

commit 22c3f7c132fc01528c93c6e15d40a2b70f1771c0
Author: Makoto Yui 
Date:   2018-12-06T08:53:01Z

Improved -eta option to take eta0 for Fixed ETA estimator

commit e9b9b1420c3b573b5cbe15e4340d862251fac81d
Author: Makoto Yui 
Date:   2018-12-06T08:53:28Z

Added unit test

commit 7c6e4a1da5eaeb99c02a9a83f1519d5274131037
Author: Makoto Yui 
Date:   2018-12-06T09:06:16Z

Made eta default hyper-parameter flexible for each optimizer

commit a92293906d43c25ce47032644774723a0cf713d9
Author: Makoto Yui 
Date:   2018-12-06T09:36:26Z

Changed the default hyperparameter of AdaDelta

commit 1494ea298497a846650b2d9f6799add77105ae77
Author: Makoto Yui 
Date:   2018-12-07T05:03:21Z

Reduced the size of test data

commit 79197a84ca4d840ab3150730d5e6d4a5ad96e719
Author: Makoto Yui 
Date:   2018-12-07T05:39:13Z

Improved -help option handling

commit 4fdcf6c84ec81c174f5e107038660b1200b1a9a5
Author: Makoto Yui 
Date:   2018-12-07T05:48:07Z

Added assertions

commit e1c7a68df679a65f496268bd4acc286b19d0a964
Author: Makoto Yui 
Date:   2018-12-07T07:39:58Z

Fixed AdaDelta eta to 1.0

commit b8e5698ecd7e7d2758ef85a338c053f5bbcc663d
Author: Makoto Yui 
Date:   2018-12-07T09:13:48Z

Supported -amsgrad in Adam

commit aa512c3b71039f97c2ac08b598fcb11f1cfc4d80
Author: Makoto Yui 
Date:   2018-12-07T09:59:59Z

Supported -decay option in ADAM optimizer

commit 19bd276ff9867ba93f42c241feb9aa5aafd0836c
Author: Makoto Yui 
Date:   2018-12-07T10:15:24Z

Revise the default eta0/alpha value

commit 19fa61145e8be18c3f86988905b35f171e1ee50e
Author: Makoto Yui 
Date:   2018-12-10T08:37:05Z

Revised ADAM hyperparameter treatment




---


[GitHub] incubator-hivemall issue #13: [WIP] Kernelized Passive-Aggressive Algorithm ...

2017-01-12 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/13
  
Usage

```sql
use a9a;

create external table train (
  rowid int,
  label float,
  features ARRAY<string>
) ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '\t' 
  COLLECTION ITEMS TERMINATED BY "," 
STORED AS TEXTFILE LOCATION 's3://myui-dev/Datasets/a9a/train/';

create external table test (
  rowid int, 
  label float,
  features ARRAY<string>
) ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY '\t' 
  COLLECTION ITEMS TERMINATED BY "," 
STORED AS TEXTFILE LOCATION 's3://myui-dev/Datasets/a9a/test/';

create or replace view train_x3
as
select 
  * 
from (
  select
 amplify(3, rowid, label, features) as (rowid, label, features)
  from  
 train 
) t
CLUSTER BY rand(31);

create or replace view test_exploded as
select 
  t1.rowid,
  t2.h,
  t2.hk,
  t2.Xh,
  t2.Xk
from
  test t1
  LATERAL VIEW feature_pairs(features, "-kpa") t2 as h, hk, Xh, Xk;
  
drop table kpa_model;
create table kpa_model as
select 
 feature,
 avg(w0) as w0,
 avg(w1) as w1,
 avg(w2) as w2,
 avg(w3) as w3
from 
 (select 
 train_kpa(features,label,"-c 0.01") as (feature, w0, w1, w2, w3)
  from 
 train
 -- train_x3
 ) t 
group by feature;

create or replace view kpa_predict 
as
WITH p1 as (
select
  t1.rowid, 
  kpa_predict(
t1.Xh, -- nonnull
t1.Xk, -- nonnull
m1.w0, -- nullable
m1.w1, -- nonnull
m1.w2, -- nonnull
m2.w3 -- nullable
  ) as score
from 
  test_exploded t1
  LEFT OUTER JOIN kpa_model m1 ON (m1.feature = t1.h)
  LEFT OUTER JOIN kpa_model m2 ON (m2.feature = t1.hk)
group by
  rowid
)
select
  rowid,
  case when score > 0.0 then 1 else 0 end as label
from
  p1;

create or replace view kpa_submit as
select 
  t.label as actual, 
  p.label as predicted
from 
  test t 
  JOIN kpa_predict p on (t.rowid = p.rowid);

select count(1)/16281 from kpa_submit 
where actual = predicted;
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hivemall pull request #19: [HIVEMALL-23] Added Java annotations

2017-01-16 Thread myui
GitHub user myui opened a pull request:

https://github.com/apache/incubator-hivemall/pull/19

[HIVEMALL-23] Added Java annotations

Introduce Java Annotations such as `Experimental`, `Issue`, and 
`VisibleForTesting`.

See https://issues.apache.org/jira/browse/HIVEMALL-23 for details.
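For readers unfamiliar with custom annotations, a minimal sketch of how a marker such as `Experimental` can be declared and checked. The retention policy, targets, and demo class are assumptions for illustration; the definitions in HIVEMALL-23 may differ (for example, a documentation-only marker does not need `RUNTIME` retention).

```java
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical marker annotation; RUNTIME retention is chosen so it can be read via reflection.
@Documented
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.CONSTRUCTOR, ElementType.FIELD})
@interface Experimental {
}

// Hypothetical class flagged as experimental.
@Experimental
class NewOptimizerDemo {
    public static void main(String[] args) {
        boolean flagged = NewOptimizerDemo.class.isAnnotationPresent(Experimental.class);
        System.out.println("experimental = " + flagged); // prints experimental = true
    }
}
```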



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/myui/incubator-hivemall dev/annotations

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/19.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19


commit c05a8721ab3c705edd837dc56a64b39ffc705944
Author: myui 
Date:   2017-01-16T08:12:53Z

Added Java annotations




---


[GitHub] incubator-hivemall issue #20: [HIVEMALL-28] Set HIVEMALL_HOME to absolute pa...

2017-01-19 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/20
  
@wangyum Thank you for the contribution.

@maropu Could you review this one?


---


[GitHub] incubator-hivemall issue #20: [HIVEMALL-28] Set HIVEMALL_HOME to absolute pa...

2017-01-20 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/20
  
@wangyum merged. Thank you for the contribution!


---


[GitHub] incubator-hivemall issue #22: [HIVEMALL-30] Increase -Xmx to 1536 to avoid O...

2017-01-22 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/22
  
@maropu Could you take a look at this?


---


[GitHub] incubator-hivemall issue #21: [HIVEMALL-29] Add github pull request template

2017-01-22 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/21
  
@wangyum Thank you for the contribution. Merged with small modifications.


---


[GitHub] incubator-hivemall issue #22: [HIVEMALL-30] Increase -Xmx to -Xmx1536m to av...

2017-01-22 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/22
  
@maropu BTW, why does `mvn -q scalastyle:check test -Pspark-2.0` exit with 1?


---


[GitHub] incubator-hivemall issue #22: [HIVEMALL-30] Increase -Xmx to -Xmx1536m to av...

2017-01-22 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/22
  
@maropu It seems GC is still happening in some cases...
```
HivemallFeatureOpsSuite:
No output has been received in the last 10m0s, this potentially indicates a 
stalled build or something wrong with the build itself.
```

`MaxPermGen=1024m` might be too large, leaving too little memory for everything else. Do we need `-MaxPermGen`? Fewer parameters are better when configuring the JVM.


---


[GitHub] incubator-hivemall issue #22: [HIVEMALL-30] Increase -Xmx to -Xmx1536m to av...

2017-01-22 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/22
  
@wangyum @maropu Thanks. Merged with some modifications. Configuration for 
spark-1.6 should also be changed.


---


[GitHub] incubator-hivemall issue #24: [HIVEMALL-32] Print explicit error messages in...

2017-01-23 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/24
  
LGTM. Could you update ASF master as follows:

```
git checkout master
git checkout -b CheckCompiler
git pull https://github.com/maropu/incubator-hivemall.git hotfix/CheckCompiler
# Check the original author to credit in the squashed commit
git log | grep "Author" | head -1
git checkout master
# Squash the local branch that received the pull
git merge --squash CheckCompiler
git commit -a --author="Takeshi Yamamuro " --message="Close #24: Print explicit error messages in building xgboost with clang"
git push origin master
```

Note that it assumes that `origin` is 
https://git-wip-us.apache.org/repos/asf/incubator-hivemall.git


---


[GitHub] incubator-hivemall issue #25: [HIVEMALL-34] Fix a bug to wrongly use mllib v...

2017-01-25 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/25
  
@maropu LGTM. Please merge this PR.


---


[GitHub] incubator-hivemall issue #26: [HIVEMALL-35] Remove unnecessary implicit conv...

2017-01-25 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/26
  
@maropu LGTM. Please merge this PR and close JIRA issue as fixed.


---


[GitHub] incubator-hivemall issue #27: [HIVEMALL-36] Refactor each_top_k

2017-01-25 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/27
  
@maropu LGTM. Please merge this PR and close the JIRA ticket.

BTW, `HiveUdfWithFeatureSuite` causes OOM again.


---


[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97726633
  
--- Diff: core/src/main/java/hivemall/optimizer/Optimizer.java ---
@@ -0,0 +1,246 @@
+/*
+ * Hivemall: Hive scalable Machine Learning Library
+ *
+ * Copyright (C) 2015 Makoto YUI
+ * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package hivemall.optimizer;
+
+import java.util.Map;
+import javax.annotation.Nonnull;
+import javax.annotation.concurrent.NotThreadSafe;
+
+import hivemall.model.WeightValue;
+import hivemall.model.IWeightValue;
+
+public interface Optimizer {
+
+/**
+ * Update the weights of models thru this interface.
--- End diff --

Revised.


---


[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97936571
  
--- Diff: core/src/main/java/hivemall/optimizer/Optimizer.java ---
@@ -0,0 +1,246 @@
+/*
+ * Hivemall: Hive scalable Machine Learning Library
+ *
+ * Copyright (C) 2015 Makoto YUI
+ * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package hivemall.optimizer;
+
+import java.util.Map;
+import javax.annotation.Nonnull;
+import javax.annotation.concurrent.NotThreadSafe;
+
+import hivemall.model.WeightValue;
+import hivemall.model.IWeightValue;
+
+public interface Optimizer {
+
+/**
+ * Update the weights of models thru this interface.
+ */
+float computeUpdatedValue(@Nonnull Object feature, float weight, float gradient);
+
+// Count up #step to tune learning rate
+void proceedStep();
+
+static abstract class OptimizerBase implements Optimizer {
+
+protected final EtaEstimator etaImpl;
+protected final Regularization regImpl;
+
+protected int numStep = 1;
+
+public OptimizerBase(final Map options) {
+this.etaImpl = EtaEstimator.get(options);
+this.regImpl = Regularization.get(options);
+}
+
+@Override
+public void proceedStep() {
+numStep++;
+}
+
+// Directly update a given `weight` in terms of performance
+protected void computeUpdateValue(
+@Nonnull final IWeightValue weight, float gradient) {
+float delta = computeUpdateValueImpl(weight, regImpl.regularize(weight.get(), gradient));
+weight.set(weight.get() - etaImpl.eta(numStep) * delta);
+}
+
+// Compute a delta to update
+protected float computeUpdateValueImpl(
+@Nonnull final IWeightValue weight, float gradient) {
+return gradient;
+}
+
+}
+
+@NotThreadSafe
+static final class SGD extends OptimizerBase {
+
+private final IWeightValue weightValueReused;
+
+public SGD(final Map options) {
+super(options);
+this.weightValueReused = new WeightValue(0.f);
+}
+
+@Override
+public float computeUpdatedValue(
+@Nonnull Object feature, float weight, float gradient) {
+computeUpdateValue(weightValueReused, gradient);
+return weightValueReused.get();
+}
+
+}
+
+static abstract class AdaDelta extends OptimizerBase {
+
+private final float decay;
+private final float eps;
+private final float scale;
+
+public AdaDelta(Map options) {
+super(options);
+float decay = 0.95f;
+float eps = 1e-6f;
+float scale = 100.0f;
--- End diff --

It's a Hivemall extension. Hivemall's AdaGrad stores the *scaled* sum of squared gradients in a `float`, not a `double`, to reduce memory consumption. When the value is used, the original sum of squared gradients is recovered as follows:

`double sumOfSquaredGradients = scaledSumOfSquaredGradients * scaling`
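A minimal sketch of that idea for a single weight, assuming AdaGrad's usual `eta / sqrt(eps + sum of squared gradients)` step: the accumulator is a `float` holding the sum divided by `scale`, and the original sum is recovered in `double` right before it is used. The class and its parameters are hypothetical, not Hivemall's code; the value 100 for `scale` mirrors the default in the diff above.

```java
// Hypothetical single-weight AdaGrad with a scaled float accumulator; not Hivemall's code.
final class ScaledAdaGrad {
    private final float eta;
    private final float eps;
    private final float scale;
    // Holds (sum of squared gradients) / scale as a float to save memory.
    private float scaledSumOfSquaredGradients = 0.f;

    ScaledAdaGrad(float eta, float eps, float scale) {
        this.eta = eta;
        this.eps = eps;
        this.scale = scale;
    }

    // Recover the original sum in double precision.
    double sumOfSquaredGradients() {
        return scaledSumOfSquaredGradients * (double) scale;
    }

    float update(float weight, float gradient) {
        scaledSumOfSquaredGradients += (gradient * gradient) / scale;
        double adjustedEta = eta / Math.sqrt(eps + sumOfSquaredGradients());
        return (float) (weight - adjustedEta * gradient);
    }

    public static void main(String[] args) {
        ScaledAdaGrad adagrad = new ScaledAdaGrad(1.f, 1e-6f, 100.f);
        float w = adagrad.update(0.f, 3.f);
        w = adagrad.update(w, 4.f);
        System.out.println(adagrad.sumOfSquaredGradients()); // approximately 25.0
    }
}
```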




---


[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97724411
  
--- Diff: core/src/main/java/hivemall/classifier/GeneralClassifierUDTF.java ---
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package hivemall.classifier;
+
+import java.util.HashMap;
+import java.util.Map;
+import javax.annotation.Nonnull;
+
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.Options;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+
+import hivemall.optimizer.LossFunctions;
+import hivemall.model.FeatureValue;
+
+/**
+ * A general classifier class with replaceable optimization functions.
+ */
+public class GeneralClassifierUDTF extends BinaryOnlineClassifierUDTF {
+
+protected final Map optimizerOptions;
+
+public GeneralClassifierUDTF() {
+super(true); // This enables new model interfaces
+this.optimizerOptions = new HashMap();
+// Set default values
+optimizerOptions.put("optimizer", "adagrad");
+optimizerOptions.put("eta", "fixed");
+optimizerOptions.put("eta0", "1.0");
+optimizerOptions.put("regularization", "RDA");
+optimizerOptions.put("lambda", "1e-6");
+optimizerOptions.put("scale", "100.0");
+optimizerOptions.put("lambda", "1.0");
--- End diff --

`lambda` is specified twice.
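Because `optimizerOptions` is a `HashMap`, the second `put` for `lambda` silently overwrites the first. The effective default therefore becomes `1.0`, not `1e-6`. The `getOptions()` hunk likewise registers `lambda` twice.

A stdlib-only sketch of a guard that makes such collisions fail fast is below. The `OptionDefaults` class and its `putDefault` method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that refuses to register the same default twice. */
final class OptionDefaults {
    private final Map<String, String> options = new HashMap<>();

    /** Registers a default value; throws if {@code key} was already registered. */
    OptionDefaults putDefault(String key, String value) {
        // putIfAbsent returns the existing value (non-null) on a collision
        String previous = options.putIfAbsent(key, value);
        if (previous != null) {
            throw new IllegalStateException(
                "Default for '" + key + "' already set to " + previous);
        }
        return this;
    }

    Map<String, String> asMap() {
        return options;
    }
}
```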




[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97716028
  
--- Diff: core/src/main/java/hivemall/model/NewDenseModel.java ---
@@ -0,0 +1,293 @@
+/*
+ * Hivemall: Hive scalable Machine Learning Library
+ *
+ * Copyright (C) 2015 Makoto YUI
+ * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package hivemall.model;
+
+import java.util.Arrays;
+import javax.annotation.Nonnull;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+
+import hivemall.model.WeightValue.WeightValueWithCovar;
+import hivemall.utils.collections.IMapIterator;
+import hivemall.utils.hadoop.HiveUtils;
+import hivemall.utils.lang.Copyable;
+import hivemall.utils.math.MathUtils;
+
+public final class NewDenseModel extends AbstractPredictionModel {
+private static final Log logger = LogFactory.getLog(NewDenseModel.class);
+
+private int size;
+private float[] weights;
+private float[] covars;
+
+// optional value for MIX
+private short[] clocks;
+private byte[] deltaUpdates;
+
+public NewDenseModel(int ndims) {
+this(ndims, false);
+}
+
+public NewDenseModel(int ndims, boolean withCovar) {
+super();
+int size = ndims + 1;
+this.size = size;
+this.weights = new float[size];
+if (withCovar) {
+float[] covars = new float[size];
+Arrays.fill(covars, 1f);
+this.covars = covars;
+} else {
+this.covars = null;
+}
+this.clocks = null;
+this.deltaUpdates = null;
+}
+
+@Override
+protected boolean isDenseModel() {
+return true;
+}
+
+@Override
+public boolean hasCovariance() {
+return covars != null;
+}
+
+@Override
+public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x,
+boolean sum_of_gradients) {}
+
+@Override
+public void configureClock() {
+if (clocks == null) {
+this.clocks = new short[size];
+this.deltaUpdates = new byte[size];
+}
+}
+
+@Override
+public boolean hasClock() {
+return clocks != null;
+}
+
+@Override
+public void resetDeltaUpdates(int feature) {
+deltaUpdates[feature] = 0;
+}
+
+private void ensureCapacity(final int index) {
+if (index >= size) {
+int bits = MathUtils.bitsRequired(index);
+int newSize = (1 << bits) + 1;
+int oldSize = size;
+logger.info("Expands internal array size from " + oldSize + " to " + newSize + " ("
++ bits + " bits)");
+this.size = newSize;
+this.weights = Arrays.copyOf(weights, newSize);
+if (covars != null) {
+this.covars = Arrays.copyOf(covars, newSize);
+Arrays.fill(covars, oldSize, newSize, 1.f);
+}
+if(clocks != null) {
+this.clocks = Arrays.copyOf(clocks, newSize);
+this.deltaUpdates = Arrays.copyOf(deltaUpdates, newSize);
+}
+}
+}
+
+@SuppressWarnings("unchecked")
+@Override
+public  T get(Object feature) {
+final int i = HiveUtils.parseInt(feature);
+if (i >= size) {
+return null;
+}
+if(covars != null) {
+return (T) new WeightValueWithCovar(weights[i], covars[i]);
+} else {
+return (T) new WeightValue(weights[i]);
+}
+}
+
+@Override
+public  void set(Object feature, T value) {
+int i = HiveUtils.parseInt(feature);
+ensureCapacity(i);
+  

[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97722648
  
--- Diff: core/src/main/java/hivemall/optimizer/EtaEstimator.java ---
@@ -157,4 +158,34 @@ public static EtaEstimator get(@Nullable CommandLine cl, float defaultEta0)
 return new InvscalingEtaEstimator(eta0, power_t);
 }
 
+@Nonnull
+public static EtaEstimator get(@Nonnull final Map options)
+throws IllegalArgumentException {
+final String etaName = options.get("eta");
+if(etaName == null) {
+return new FixedEtaEstimator(1.f);
--- End diff --

Absolutely. Changed the default to `InvscalingEtaEstimator(0.1f, 0.1d)` for when `eta` is not provided (cc: @maropu).
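For reference, an inverse-scaling schedule decays the learning rate as `eta_t = eta0 / t^power_t`. The sketch below assumes that form. The constructor arguments mirror the `(eta0, power_t)` pair passed to `InvscalingEtaEstimator` in the hunk above, so `(0.1f, 0.1d)` is the new default.

The class name `InvscalingEta` is hypothetical, not Hivemall's implementation.

```java
/** Hypothetical sketch of an inverse-scaling learning-rate schedule. */
final class InvscalingEta {
    private final float eta0;     // initial learning rate
    private final double powerT;  // decay exponent

    InvscalingEta(float eta0, double powerT) {
        this.eta0 = eta0;
        this.powerT = powerT;
    }

    /** Learning rate at step t (t starts at 1): eta0 / t^powerT. */
    float eta(long t) {
        return (float) (eta0 / Math.pow(t, powerT));
    }
}
```

With `power_t = 0.1` the decay is gentle. The rate drops from `0.1` at `t = 1` to roughly `0.079` at `t = 10`, whereas a fixed schedule would stay at `eta0` forever.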




[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97715666
  
--- Diff: core/src/main/java/hivemall/classifier/GeneralClassifierUDTF.java ---
@@ -0,0 +1,122 @@
+/*
+ * Hivemall: Hive scalable Machine Learning Library
+ *
+ * Copyright (C) 2015 Makoto YUI
+ * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package hivemall.classifier;
+
+import java.util.HashMap;
+import java.util.Map;
+import javax.annotation.Nonnull;
+
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.Options;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+
+import hivemall.optimizer.LossFunctions;
+import hivemall.model.FeatureValue;
+
+/**
+ * A general classifier class with replaceable optimization functions.
+ */
+public class GeneralClassifierUDTF extends BinaryOnlineClassifierUDTF {
+
+protected final Map optimizerOptions;
+
+public GeneralClassifierUDTF() {
+super(true); // This enables new model interfaces
+this.optimizerOptions = new HashMap();
+// Set default values
+optimizerOptions.put("optimizer", "adagrad");
+optimizerOptions.put("eta", "fixed");
+optimizerOptions.put("eta0", "1.0");
+optimizerOptions.put("regularization", "RDA");
+optimizerOptions.put("lambda", "1e-6");
+optimizerOptions.put("scale", "100.0");
+optimizerOptions.put("lambda", "1.0");
+}
+
+@Override
+public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
+if(argOIs.length != 2 && argOIs.length != 3) {
+throw new UDFArgumentException(
+this.getClass().getSimpleName()
+  + " takes 2 or 3 arguments: List features, int label "
+  + "[, constant string options]");
+}
+return super.initialize(argOIs);
+}
+
+@Override
+protected Options getOptions() {
+Options opts = super.getOptions();
+opts.addOption("optimizer", "opt", true, "Optimizer to update weights [default: adagrad+rda]");
+opts.addOption("eta", "eta0", true, "Initial learning rate [default 1.0]");
+opts.addOption("lambda", true, "Lambda value of RDA [default: 1e-6f]");
+opts.addOption("scale", true, "Scaling factor for cumulative weights [100.0]");
+opts.addOption("regularization", "reg", true, "Regularization type [default not-defined]");
+opts.addOption("lambda", true, "Regularization term on weights [default 1.0]");
+return opts;
+}
+
+@Override
+protected CommandLine processOptions(ObjectInspector[] argOIs) throws UDFArgumentException {
+final CommandLine cl = super.processOptions(argOIs);
+assert(cl != null);
+if(cl != null) {
+for(final String arg : cl.getArgs()) {
+optimizerOptions.put(arg, cl.getOptionValue(arg));
+}
+}
+return cl;
+}
+
+@Override
+protected Map getOptimzierOptions() {
--- End diff --

Removed `getOptimzierOptions`.




[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97715653
  
--- Diff: core/src/main/java/hivemall/classifier/GeneralClassifierUDTF.java ---
@@ -0,0 +1,122 @@
+/*
+ * Hivemall: Hive scalable Machine Learning Library
+ *
+ * Copyright (C) 2015 Makoto YUI
+ * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package hivemall.classifier;
+
+import java.util.HashMap;
+import java.util.Map;
+import javax.annotation.Nonnull;
+
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.Options;
+import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
+import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
+import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
+
+import hivemall.optimizer.LossFunctions;
+import hivemall.model.FeatureValue;
+
+/**
+ * A general classifier class with replaceable optimization functions.
+ */
+public class GeneralClassifierUDTF extends BinaryOnlineClassifierUDTF {
+
+protected final Map optimizerOptions;
+
+public GeneralClassifierUDTF() {
+super(true); // This enables new model interfaces
+this.optimizerOptions = new HashMap();
+// Set default values
+optimizerOptions.put("optimizer", "adagrad");
+optimizerOptions.put("eta", "fixed");
+optimizerOptions.put("eta0", "1.0");
+optimizerOptions.put("regularization", "RDA");
+optimizerOptions.put("lambda", "1e-6");
+optimizerOptions.put("scale", "100.0");
+optimizerOptions.put("lambda", "1.0");
+}
+
+@Override
+public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
+if(argOIs.length != 2 && argOIs.length != 3) {
+throw new UDFArgumentException(
+this.getClass().getSimpleName()
+  + " takes 2 or 3 arguments: List features, int label "
+  + "[, constant string options]");
+}
+return super.initialize(argOIs);
+}
+
+@Override
+protected Options getOptions() {
+Options opts = super.getOptions();
+opts.addOption("optimizer", "opt", true, "Optimizer to update weights [default: adagrad+rda]");
+opts.addOption("eta", "eta0", true, "Initial learning rate [default 1.0]");
+opts.addOption("lambda", true, "Lambda value of RDA [default: 1e-6f]");
+opts.addOption("scale", true, "Scaling factor for cumulative weights [100.0]");
+opts.addOption("regularization", "reg", true, "Regularization type [default not-defined]");
+opts.addOption("lambda", true, "Regularization term on weights [default 1.0]");
+return opts;
+}
+
+@Override
+protected CommandLine processOptions(ObjectInspector[] argOIs) throws UDFArgumentException {
+final CommandLine cl = super.processOptions(argOIs);
+assert(cl != null);
+if(cl != null) {
--- End diff --

Fixed
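The quoted `processOptions` checks `cl` with both an `assert` and an `if`, and then copies parsed values into `optimizerOptions`.

The sketch below illustrates overlaying user-supplied options on a map of defaults with one explicit null/empty check. It uses only the standard library, not Apache Commons CLI. The `OptionMerger` class, the `mergeOptions` method and its simplified `-key value` parsing are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: overlay "-key value" pairs on a map of defaults. */
final class OptionMerger {
    private OptionMerger() {}

    /**
     * Returns a new map holding the defaults overridden by the supplied options.
     * Simplification: option values that themselves start with '-' are not supported.
     */
    static Map<String, String> mergeOptions(Map<String, String> defaults, String optionString) {
        Map<String, String> merged = new HashMap<>(defaults); // never mutate the defaults
        if (optionString == null || optionString.trim().isEmpty()) {
            return merged; // nothing supplied: keep the defaults
        }
        String[] tokens = optionString.trim().split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            String token = tokens[i];
            if (!token.startsWith("-")) {
                throw new IllegalArgumentException("Expected an option name but got: " + token);
            }
            String key = token.substring(1);
            if (i + 1 >= tokens.length || tokens[i + 1].startsWith("-")) {
                throw new IllegalArgumentException("Missing value for option: " + key);
            }
            merged.put(key, tokens[++i]); // consume the value token
        }
        return merged;
    }
}
```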




[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97934044
  
--- Diff: core/src/main/java/hivemall/optimizer/Optimizer.java ---
@@ -0,0 +1,246 @@
+/*
+ * Hivemall: Hive scalable Machine Learning Library
+ *
+ * Copyright (C) 2015 Makoto YUI
+ * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package hivemall.optimizer;
+
+import java.util.Map;
+import javax.annotation.Nonnull;
+import javax.annotation.concurrent.NotThreadSafe;
+
+import hivemall.model.WeightValue;
+import hivemall.model.IWeightValue;
+
+public interface Optimizer {
+
+/**
+ * Update the weights of models thru this interface.
+ */
+float computeUpdatedValue(@Nonnull Object feature, float weight, float gradient);
+
+// Count up #step to tune learning rate
--- End diff --

Renamed and added Javadoc.
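The interface pairs a per-feature update with a step counter that drives the learning-rate schedule. `OptimizerBase` in the first hunk follows the same pattern: `numStep` starts at 1, `proceedStep()` increments it, and the update applies `eta(numStep)`.

A minimal sketch of that contract with Javadoc is below. `StepOptimizer` and `DecayingSgd` are hypothetical names, and the `eta0 / t` schedule is an assumption chosen for illustration.

```java
/** Hypothetical mirror of the reviewed interface, with Javadoc on the step method. */
interface StepOptimizer {
    /** Returns the updated value of {@code weight} for one feature, given its gradient. */
    float computeUpdatedValue(Object feature, float weight, float gradient);

    /**
     * Advances the step counter. In this sketch it is called once per training
     * example so that the learning rate decays over time.
     */
    void proceedStep();
}

/** Plain SGD whose learning rate decays as eta0 / t (an assumed schedule). */
final class DecayingSgd implements StepOptimizer {
    private final float eta0;
    private long step = 1; // starts at 1, like numStep in OptimizerBase

    DecayingSgd(float eta0) {
        this.eta0 = eta0;
    }

    @Override
    public float computeUpdatedValue(Object feature, float weight, float gradient) {
        float eta = eta0 / step;
        return weight - eta * gradient;
    }

    @Override
    public void proceedStep() {
        step++;
    }
}
```

With `eta0 = 1.0` and a constant gradient of `0.5`, the weight moves to `-0.5` at step 1. It then moves to `-0.75` after one `proceedStep()`, because the rate halves.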




[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97715714
  
--- Diff: core/src/main/java/hivemall/model/NewDenseModel.java ---
@@ -0,0 +1,293 @@
+/*
+ * Hivemall: Hive scalable Machine Learning Library
+ *
+ * Copyright (C) 2015 Makoto YUI
+ * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package hivemall.model;
+
+import java.util.Arrays;
+import javax.annotation.Nonnull;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+
+import hivemall.model.WeightValue.WeightValueWithCovar;
+import hivemall.utils.collections.IMapIterator;
+import hivemall.utils.hadoop.HiveUtils;
+import hivemall.utils.lang.Copyable;
+import hivemall.utils.math.MathUtils;
+
+public final class NewDenseModel extends AbstractPredictionModel {
+private static final Log logger = LogFactory.getLog(NewDenseModel.class);
+
+private int size;
+private float[] weights;
+private float[] covars;
+
+// optional value for MIX
+private short[] clocks;
+private byte[] deltaUpdates;
+
+public NewDenseModel(int ndims) {
+this(ndims, false);
+}
+
+public NewDenseModel(int ndims, boolean withCovar) {
+super();
+int size = ndims + 1;
+this.size = size;
+this.weights = new float[size];
+if (withCovar) {
+float[] covars = new float[size];
+Arrays.fill(covars, 1f);
+this.covars = covars;
+} else {
+this.covars = null;
+}
+this.clocks = null;
+this.deltaUpdates = null;
+}
+
+@Override
+protected boolean isDenseModel() {
+return true;
+}
+
+@Override
+public boolean hasCovariance() {
+return covars != null;
+}
+
+@Override
+public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x,
+boolean sum_of_gradients) {}
+
+@Override
+public void configureClock() {
+if (clocks == null) {
+this.clocks = new short[size];
+this.deltaUpdates = new byte[size];
+}
+}
+
+@Override
+public boolean hasClock() {
+return clocks != null;
+}
+
+@Override
+public void resetDeltaUpdates(int feature) {
+deltaUpdates[feature] = 0;
+}
+
+private void ensureCapacity(final int index) {
+if (index >= size) {
+int bits = MathUtils.bitsRequired(index);
+int newSize = (1 << bits) + 1;
+int oldSize = size;
+logger.info("Expands internal array size from " + oldSize + " to " + newSize + " ("
++ bits + " bits)");
+this.size = newSize;
+this.weights = Arrays.copyOf(weights, newSize);
+if (covars != null) {
+this.covars = Arrays.copyOf(covars, newSize);
+Arrays.fill(covars, oldSize, newSize, 1.f);
+}
+if(clocks != null) {
+this.clocks = Arrays.copyOf(clocks, newSize);
+this.deltaUpdates = Arrays.copyOf(deltaUpdates, newSize);
+}
+}
+}
+
+@SuppressWarnings("unchecked")
+@Override
+public  T get(Object feature) {
+final int i = HiveUtils.parseInt(feature);
+if (i >= size) {
+return null;
+}
+if(covars != null) {
+return (T) new WeightValueWithCovar(weights[i], covars[i]);
+} else {
+return (T) new WeightValue(weights[i]);
+}
+}
+
+@Override
+public  void set(Object feature, T value) {
--- End diff --

Fixed
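In the quoted `NewDenseModel`, `set` calls `ensureCapacity` before writing. That method grows the backing arrays to `(1 << bits) + 1` slots and fills the new covariance slots with `1.0`.

The stand-alone sketch below reproduces that growth policy. `DenseArrays` is hypothetical. Its `bitsRequired` is a stand-in that assumes `MathUtils.bitsRequired` returns the bit length of the index.

```java
import java.util.Arrays;

/** Hypothetical stand-alone version of the dense model's growth policy. */
final class DenseArrays {
    float[] weights;
    float[] covars; // null when covariance is not tracked
    int size;

    DenseArrays(int ndims, boolean withCovar) {
        this.size = ndims + 1;
        this.weights = new float[size];
        if (withCovar) {
            this.covars = new float[size];
            Arrays.fill(covars, 1.f); // covariances start at 1.0
        }
    }

    /** Number of bits needed to represent a non-negative value (assumed semantics). */
    static int bitsRequired(int value) {
        return 32 - Integer.numberOfLeadingZeros(value);
    }

    /** Grows the arrays so that {@code index} is valid; assumes index < 2^30. */
    void ensureCapacity(int index) {
        if (index < size) {
            return;
        }
        int bits = bitsRequired(index);
        int newSize = (1 << bits) + 1; // always > index because index < 2^bits
        int oldSize = size;
        size = newSize;
        weights = Arrays.copyOf(weights, newSize); // new weight slots are 0.0
        if (covars != null) {
            covars = Arrays.copyOf(covars, newSize);
            Arrays.fill(covars, oldSize, newSize, 1.f); // new covariance slots are 1.0
        }
    }
}
```

Growing to the next power of two keeps the number of reallocations logarithmic in the largest feature index. The trade-off is up to roughly twice the memory of an exact fit.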



[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97716643
  
--- Diff: core/src/main/java/hivemall/model/NewSparseModel.java ---
@@ -0,0 +1,197 @@
+/*
+ * Hivemall: Hive scalable Machine Learning Library
+ *
+ * Copyright (C) 2015 Makoto YUI
+ * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package hivemall.model;
+
+import hivemall.model.WeightValueWithClock.WeightValueParamsF1Clock;
+import hivemall.model.WeightValueWithClock.WeightValueParamsF2Clock;
+import hivemall.model.WeightValueWithClock.WeightValueWithCovarClock;
+import hivemall.utils.collections.IMapIterator;
+import hivemall.utils.collections.OpenHashMap;
+
+import javax.annotation.Nonnull;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+
+public final class NewSparseModel extends AbstractPredictionModel {
+private static final Log logger = LogFactory.getLog(NewSparseModel.class);
+
+private final OpenHashMap weights;
+private final boolean hasCovar;
+private boolean clockEnabled;
+
+public NewSparseModel(int size) {
+this(size, false);
+}
+
+public NewSparseModel(int size, boolean hasCovar) {
+super();
+this.weights = new OpenHashMap(size);
+this.hasCovar = hasCovar;
+this.clockEnabled = false;
+}
+
+@Override
+protected boolean isDenseModel() {
+return false;
+}
+
+@Override
+public boolean hasCovariance() {
+return hasCovar;
+}
+
+@Override
+public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x,
+boolean sum_of_gradients) {}
+
+@Override
+public void configureClock() {
+this.clockEnabled = true;
+}
+
+@Override
+public boolean hasClock() {
+return clockEnabled;
+}
+
+@SuppressWarnings("unchecked")
+@Override
+public  T get(final Object feature) {
+return (T) weights.get(feature);
+}
+
+@Override
+public  void set(final Object feature, final T value) {
+assert (feature != null);
+assert (value != null);
+
+final IWeightValue wrapperValue = wrapIfRequired(value);
+
+if (clockEnabled && value.isTouched()) {
+IWeightValue old = weights.get(feature);
+if (old != null) {
+short newclock = (short) (old.getClock() + (short) 1);
+wrapperValue.setClock(newclock);
+int newDelta = old.getDeltaUpdates() + 1;
+wrapperValue.setDeltaUpdates((byte) newDelta);
+}
+}
+weights.put(feature, wrapperValue);
+
+onUpdate(feature, wrapperValue);
+}
+
+@Override
+public void delete(@Nonnull Object feature) {
+weights.remove(feature);
+}
+
+private IWeightValue wrapIfRequired(final IWeightValue value) {
+final IWeightValue wrapper;
+if (clockEnabled) {
+switch (value.getType()) {
+case NoParams:
+wrapper = new WeightValueWithClock(value);
+break;
+case ParamsCovar:
+wrapper = new WeightValueWithCovarClock(value);
+break;
+case ParamsF1:
+wrapper = new WeightValueParamsF1Clock(value);
+break;
+case ParamsF2:
+wrapper = new WeightValueParamsF2Clock(value);
+break;
+default:
+throw new IllegalStateException("Unexpected value type: " + value.getType());
--- End diff --

It was a bug. Fixed.
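In the quoted `NewSparseModel.set`, a touched value that replaces an existing entry inherits the old entry's clock plus one. It also inherits the old delta-update count plus one, stored as a `byte`.

A minimal sketch of that bookkeeping is below. `ClockedWeight` and `touch` are hypothetical. It also shows that Java's narrowing casts wrap around silently, so the counters overflow rather than saturate. This illustrates the quoted logic; it is not a claim about which bug was fixed.

```java
/** Hypothetical stand-in for the clock fields carried by a weight value. */
final class ClockedWeight {
    final float value;
    short clock;
    byte deltaUpdates;

    ClockedWeight(float value) {
        this.value = value;
    }

    /** Mirrors the quoted bookkeeping: inherit old clock + 1 and old delta count + 1. */
    static ClockedWeight touch(ClockedWeight old, float newValue) {
        ClockedWeight next = new ClockedWeight(newValue);
        if (old != null) {
            next.clock = (short) (old.clock + 1);              // wraps past Short.MAX_VALUE
            next.deltaUpdates = (byte) (old.deltaUpdates + 1); // wraps past Byte.MAX_VALUE
        }
        return next;
    }
}
```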



[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97715819
  
--- Diff: core/src/main/java/hivemall/model/NewDenseModel.java ---
@@ -0,0 +1,293 @@
+/*
+ * Hivemall: Hive scalable Machine Learning Library
+ *
+ * Copyright (C) 2015 Makoto YUI
+ * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package hivemall.model;
+
+import java.util.Arrays;
+import javax.annotation.Nonnull;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+
+import hivemall.model.WeightValue.WeightValueWithCovar;
+import hivemall.utils.collections.IMapIterator;
+import hivemall.utils.hadoop.HiveUtils;
+import hivemall.utils.lang.Copyable;
+import hivemall.utils.math.MathUtils;
+
+public final class NewDenseModel extends AbstractPredictionModel {
+private static final Log logger = LogFactory.getLog(NewDenseModel.class);
+
+private int size;
+private float[] weights;
+private float[] covars;
+
+// optional value for MIX
+private short[] clocks;
+private byte[] deltaUpdates;
+
+public NewDenseModel(int ndims) {
+this(ndims, false);
+}
+
+public NewDenseModel(int ndims, boolean withCovar) {
+super();
+int size = ndims + 1;
+this.size = size;
+this.weights = new float[size];
+if (withCovar) {
+float[] covars = new float[size];
+Arrays.fill(covars, 1f);
+this.covars = covars;
+} else {
+this.covars = null;
+}
+this.clocks = null;
+this.deltaUpdates = null;
+}
+
+@Override
+protected boolean isDenseModel() {
+return true;
+}
+
+@Override
+public boolean hasCovariance() {
+return covars != null;
+}
+
+@Override
+public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x,
+boolean sum_of_gradients) {}
+
+@Override
+public void configureClock() {
+if (clocks == null) {
+this.clocks = new short[size];
+this.deltaUpdates = new byte[size];
+}
+}
+
+@Override
+public boolean hasClock() {
+return clocks != null;
+}
+
+@Override
+public void resetDeltaUpdates(int feature) {
+deltaUpdates[feature] = 0;
+}
+
+private void ensureCapacity(final int index) {
+if (index >= size) {
+int bits = MathUtils.bitsRequired(index);
+int newSize = (1 << bits) + 1;
+int oldSize = size;
+logger.info("Expands internal array size from " + oldSize + " to " + newSize + " ("
++ bits + " bits)");
+this.size = newSize;
+this.weights = Arrays.copyOf(weights, newSize);
+if (covars != null) {
+this.covars = Arrays.copyOf(covars, newSize);
+Arrays.fill(covars, oldSize, newSize, 1.f);
+}
+if(clocks != null) {
+this.clocks = Arrays.copyOf(clocks, newSize);
+this.deltaUpdates = Arrays.copyOf(deltaUpdates, newSize);
+}
+}
+}
+
+@SuppressWarnings("unchecked")
+@Override
+public  T get(Object feature) {
+final int i = HiveUtils.parseInt(feature);
+if (i >= size) {
+return null;
+}
+if(covars != null) {
+return (T) new WeightValueWithCovar(weights[i], covars[i]);
+} else {
+return (T) new WeightValue(weights[i]);
+}
+}
+
+@Override
+public  void set(Object feature, T value) {
+int i = HiveUtils.parseInt(feature);
+ensureCapacity(i);
+  

[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97937226
  
--- Diff: core/src/main/java/hivemall/optimizer/Regularization.java ---
@@ -0,0 +1,99 @@
+/*
+ * Hivemall: Hive scalable Machine Learning Library
+ *
+ * Copyright (C) 2015 Makoto YUI
+ * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package hivemall.optimizer;
+
+import javax.annotation.Nonnull;
+import java.util.Map;
+
+public abstract class Regularization {
+
+protected final float lambda;
+
+public Regularization(final Map options) {
+float lambda = 1e-6f;
--- End diff --

Agreed. Revised the default to 0.0001.
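`OptimizerBase` folds regularization into the gradient through `regImpl.regularize(weight, gradient)`. A common way to do this is to add lambda times the (sub)gradient of the penalty.

The sketch below assumes L1 and L2 penalties and uses the revised default `lambda = 0.0001`. The enum `Reg` is hypothetical, and Hivemall's own `Regularization` subclasses may differ in detail.

```java
/** Hypothetical sketch of gradient-level L1/L2 regularization. */
enum Reg {
    L1 {
        @Override
        float regularize(float lambda, float weight, float gradient) {
            // subgradient of lambda * |w|: lambda * sign(w), taking 0 at w == 0
            return gradient + lambda * Math.signum(weight);
        }
    },
    L2 {
        @Override
        float regularize(float lambda, float weight, float gradient) {
            // gradient of (lambda / 2) * w^2 is lambda * w
            return gradient + lambda * weight;
        }
    };

    static final float DEFAULT_LAMBDA = 0.0001f;

    abstract float regularize(float lambda, float weight, float gradient);
}
```

With `lambda = 0.0001`, a weight of `2.0` adds `0.0002` to the gradient under L2. Under L1, any non-zero weight shifts the gradient by exactly `0.0001` toward shrinking it.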




[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97715678
  
--- Diff: core/src/main/java/hivemall/model/NewDenseModel.java ---
@@ -0,0 +1,293 @@
+/*
+ * Hivemall: Hive scalable Machine Learning Library
+ *
+ * Copyright (C) 2015 Makoto YUI
+ * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package hivemall.model;
+
+import java.util.Arrays;
+import javax.annotation.Nonnull;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+
+import hivemall.model.WeightValue.WeightValueWithCovar;
+import hivemall.utils.collections.IMapIterator;
+import hivemall.utils.hadoop.HiveUtils;
+import hivemall.utils.lang.Copyable;
+import hivemall.utils.math.MathUtils;
+
+public final class NewDenseModel extends AbstractPredictionModel {
+private static final Log logger = LogFactory.getLog(NewDenseModel.class);
+
+private int size;
+private float[] weights;
+private float[] covars;
+
+// optional value for MIX
+private short[] clocks;
+private byte[] deltaUpdates;
+
+public NewDenseModel(int ndims) {
+this(ndims, false);
+}
+
+public NewDenseModel(int ndims, boolean withCovar) {
+super();
+int size = ndims + 1;
+this.size = size;
+this.weights = new float[size];
+if (withCovar) {
+float[] covars = new float[size];
+Arrays.fill(covars, 1f);
+this.covars = covars;
+} else {
+this.covars = null;
+}
+this.clocks = null;
+this.deltaUpdates = null;
+}
+
+@Override
+protected boolean isDenseModel() {
+return true;
+}
+
+@Override
+public boolean hasCovariance() {
+return covars != null;
+}
+
+@Override
+public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x,
+boolean sum_of_gradients) {}
+
+@Override
+public void configureClock() {
+if (clocks == null) {
+this.clocks = new short[size];
+this.deltaUpdates = new byte[size];
+}
+}
+
+@Override
+public boolean hasClock() {
+return clocks != null;
+}
+
+@Override
+public void resetDeltaUpdates(int feature) {
+deltaUpdates[feature] = 0;
+}
+
+private void ensureCapacity(final int index) {
+if (index >= size) {
+int bits = MathUtils.bitsRequired(index);
+int newSize = (1 << bits) + 1;
+int oldSize = size;
+logger.info("Expands internal array size from " + oldSize + " to " + newSize + " ("
++ bits + " bits)");
+this.size = newSize;
+this.weights = Arrays.copyOf(weights, newSize);
+if (covars != null) {
+this.covars = Arrays.copyOf(covars, newSize);
+Arrays.fill(covars, oldSize, newSize, 1.f);
+}
+if(clocks != null) {
+this.clocks = Arrays.copyOf(clocks, newSize);
+this.deltaUpdates = Arrays.copyOf(deltaUpdates, newSize);
+}
+}
+}
+
+@SuppressWarnings("unchecked")
+@Override
+public  T get(Object feature) {
--- End diff --

Fixed
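The `ensureCapacity` method quoted above grows every array to `(1 << bits) + 1` slots and initializes the new covariance slots to 1.0. Below is a minimal, self-contained sketch of that policy. `GrowableWeights` is a hypothetical class, not Hivemall code. Its `bitsRequired` is an assumed stand-in for `MathUtils.bitsRequired`, taken here to return the number of bits needed to represent the index.

```java
import java.util.Arrays;

// Hypothetical sketch of the growth policy in the quoted ensureCapacity().
final class GrowableWeights {
    private float[] weights;
    private float[] covars; // new slots start at 1.0, as in the diff

    GrowableWeights(int ndims) {
        weights = new float[ndims + 1];
        covars = new float[ndims + 1];
        Arrays.fill(covars, 1f);
    }

    // Assumed semantics: number of bits needed to represent value (value >= 0).
    static int bitsRequired(int value) {
        return 32 - Integer.numberOfLeadingZeros(value);
    }

    int size() {
        return weights.length;
    }

    void ensureCapacity(int index) {
        if (index >= weights.length) {
            int oldSize = weights.length;
            // index < 2^bits, so the new size always has room for index.
            int newSize = (1 << bitsRequired(index)) + 1;
            weights = Arrays.copyOf(weights, newSize);
            covars = Arrays.copyOf(covars, newSize);
            Arrays.fill(covars, oldSize, newSize, 1f);
        }
    }

    void set(int index, float w) {
        ensureCapacity(index);
        weights[index] = w;
    }

    float getWeight(int index) {
        return index < weights.length ? weights[index] : 0f;
    }

    float getCovar(int index) {
        return index < covars.length ? covars[index] : 1f;
    }
}
```

Rounding up to the next power of two keeps the number of reallocations logarithmic in the largest feature index.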




[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97715621
  
--- Diff: core/src/main/java/hivemall/model/NewSpaceEfficientDenseModel.java ---
@@ -0,0 +1,317 @@
+/*
+ * Hivemall: Hive scalable Machine Learning Library
+ *
+ * Copyright (C) 2015 Makoto YUI
+ * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package hivemall.model;
+
+import hivemall.model.WeightValue.WeightValueWithCovar;
+import hivemall.utils.collections.IMapIterator;
+import hivemall.utils.hadoop.HiveUtils;
+import hivemall.utils.lang.Copyable;
+import hivemall.utils.lang.HalfFloat;
+import hivemall.utils.math.MathUtils;
+
+import java.util.Arrays;
+import javax.annotation.Nonnull;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+
+public final class NewSpaceEfficientDenseModel extends AbstractPredictionModel {
+private static final Log logger = LogFactory.getLog(NewSpaceEfficientDenseModel.class);
+
+private int size;
+private short[] weights;
+private short[] covars;
+
+// optional value for MIX
+private short[] clocks;
+private byte[] deltaUpdates;
+
+public NewSpaceEfficientDenseModel(int ndims) {
+this(ndims, false);
+}
+
+public NewSpaceEfficientDenseModel(int ndims, boolean withCovar) {
+super();
+int size = ndims + 1;
+this.size = size;
+this.weights = new short[size];
+if (withCovar) {
+short[] covars = new short[size];
+Arrays.fill(covars, HalfFloat.ONE);
+this.covars = covars;
+} else {
+this.covars = null;
+}
+this.clocks = null;
+this.deltaUpdates = null;
+}
+
+@Override
+protected boolean isDenseModel() {
+return true;
+}
+
+@Override
+public boolean hasCovariance() {
+return covars != null;
+}
+
+@Override
+public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x,
+boolean sum_of_gradients) {}
+
+@Override
+public void configureClock() {
+if (clocks == null) {
+this.clocks = new short[size];
+this.deltaUpdates = new byte[size];
+}
+}
+
+@Override
+public boolean hasClock() {
+return clocks != null;
+}
+
+@Override
+public void resetDeltaUpdates(int feature) {
+deltaUpdates[feature] = 0;
+}
+
+private float getWeight(final int i) {
+final short w = weights[i];
+return (w == HalfFloat.ZERO) ? HalfFloat.ZERO : HalfFloat.halfFloatToFloat(w);
+}
+
+private float getCovar(final int i) {
+return HalfFloat.halfFloatToFloat(covars[i]);
--- End diff --

That should not happen. `i` is checked before `getCovar` is called, as follows:
```
int i = HiveUtils.parseInt(feature);
if (i >= size) {
    return 1f;
}
return getCovar(i);
```
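The same guard pattern is sketched below, assuming covariances are stored as IEEE 754 half-precision bit patterns in a `short[]`. The public getter performs the `i >= size` check and returns the default covariance 1.0, so the private `getCovar` never receives an out-of-range index. The class name and the conversion routines are hypothetical stand-ins for `hivemall.utils.lang.HalfFloat`. They handle normal values only, truncating the mantissa and flushing underflow to zero.

```java
import java.util.Arrays;

// Hypothetical sketch: half-float covariance storage with a guarded getter.
final class HalfFloatCovars {
    private final short[] covars;

    HalfFloatCovars(int size) {
        covars = new short[size];
        Arrays.fill(covars, floatToHalf(1f));
    }

    // float -> half bits: 1 sign bit, 5 exponent bits (bias 15), 10 mantissa bits.
    static short floatToHalf(float f) {
        int bits = Float.floatToIntBits(f);
        int sign = (bits >>> 16) & 0x8000;
        int exp = ((bits >>> 23) & 0xff) - 127 + 15;
        int mant = bits & 0x7fffff;
        if (exp <= 0) {
            return (short) sign; // too small for a normal half: signed zero
        }
        if (exp >= 31) {
            throw new IllegalArgumentException("Out of half-float range: " + f);
        }
        return (short) (sign | (exp << 10) | (mant >>> 13)); // truncate mantissa
    }

    // half bits -> float; exponent 0 is treated as zero (no subnormals produced above).
    static float halfToFloat(short h) {
        int sign = (h & 0x8000) << 16;
        int exp = (h >>> 10) & 0x1f;
        int mant = h & 0x3ff;
        if (exp == 0) {
            return Float.intBitsToFloat(sign);
        }
        return Float.intBitsToFloat(sign | ((exp - 15 + 127) << 23) | (mant << 13));
    }

    // The bounds check lives here, mirroring the snippet in the reply.
    float getCovariance(int i) {
        if (i >= covars.length) {
            return 1f;
        }
        return getCovar(i);
    }

    private float getCovar(int i) {
        return halfToFloat(covars[i]);
    }

    void setCovar(int i, float v) {
        covars[i] = floatToHalf(v);
    }
}
```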




[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97716142
  
--- Diff: core/src/main/java/hivemall/model/NewDenseModel.java ---
@@ -0,0 +1,293 @@
+/*
+ * Hivemall: Hive scalable Machine Learning Library
+ *
+ * Copyright (C) 2015 Makoto YUI
+ * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package hivemall.model;
+
+import java.util.Arrays;
+import javax.annotation.Nonnull;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+
+import hivemall.model.WeightValue.WeightValueWithCovar;
+import hivemall.utils.collections.IMapIterator;
+import hivemall.utils.hadoop.HiveUtils;
+import hivemall.utils.lang.Copyable;
+import hivemall.utils.math.MathUtils;
+
+public final class NewDenseModel extends AbstractPredictionModel {
+private static final Log logger = LogFactory.getLog(NewDenseModel.class);
+
+private int size;
+private float[] weights;
+private float[] covars;
+
+// optional value for MIX
+private short[] clocks;
+private byte[] deltaUpdates;
+
+public NewDenseModel(int ndims) {
+this(ndims, false);
+}
+
+public NewDenseModel(int ndims, boolean withCovar) {
+super();
+int size = ndims + 1;
+this.size = size;
+this.weights = new float[size];
+if (withCovar) {
+float[] covars = new float[size];
+Arrays.fill(covars, 1f);
+this.covars = covars;
+} else {
+this.covars = null;
+}
+this.clocks = null;
+this.deltaUpdates = null;
+}
+
+@Override
+protected boolean isDenseModel() {
+return true;
+}
+
+@Override
+public boolean hasCovariance() {
+return covars != null;
+}
+
+@Override
+public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x,
+boolean sum_of_gradients) {}
+
+@Override
+public void configureClock() {
+if (clocks == null) {
+this.clocks = new short[size];
+this.deltaUpdates = new byte[size];
+}
+}
+
+@Override
+public boolean hasClock() {
+return clocks != null;
+}
+
+@Override
+public void resetDeltaUpdates(int feature) {
+deltaUpdates[feature] = 0;
+}
+
+private void ensureCapacity(final int index) {
+if (index >= size) {
+int bits = MathUtils.bitsRequired(index);
+int newSize = (1 << bits) + 1;
+int oldSize = size;
+logger.info("Expands internal array size from " + oldSize + " to " + newSize + " ("
++ bits + " bits)");
+this.size = newSize;
+this.weights = Arrays.copyOf(weights, newSize);
+if (covars != null) {
+this.covars = Arrays.copyOf(covars, newSize);
+Arrays.fill(covars, oldSize, newSize, 1.f);
+}
+if(clocks != null) {
+this.clocks = Arrays.copyOf(clocks, newSize);
+this.deltaUpdates = Arrays.copyOf(deltaUpdates, newSize);
+}
+}
+}
+
+@SuppressWarnings("unchecked")
+@Override
+public  T get(Object feature) {
+final int i = HiveUtils.parseInt(feature);
+if (i >= size) {
+return null;
+}
+if(covars != null) {
+return (T) new WeightValueWithCovar(weights[i], covars[i]);
+} else {
+return (T) new WeightValue(weights[i]);
+}
+}
+
+@Override
+public  void set(Object feature, T value) {
+int i = HiveUtils.parseInt(feature);
+ensureCapacity(i);
+  

[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r97716349
  
--- Diff: core/src/main/java/hivemall/model/NewSpaceEfficientDenseModel.java ---
@@ -0,0 +1,317 @@
+/*
+ * Hivemall: Hive scalable Machine Learning Library
+ *
+ * Copyright (C) 2015 Makoto YUI
+ * Copyright (C) 2013-2015 National Institute of Advanced Industrial Science and Technology (AIST)
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package hivemall.model;
+
+import hivemall.model.WeightValue.WeightValueWithCovar;
+import hivemall.utils.collections.IMapIterator;
+import hivemall.utils.hadoop.HiveUtils;
+import hivemall.utils.lang.Copyable;
+import hivemall.utils.lang.HalfFloat;
+import hivemall.utils.math.MathUtils;
+
+import java.util.Arrays;
+import javax.annotation.Nonnull;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+
+public final class NewSpaceEfficientDenseModel extends AbstractPredictionModel {
+private static final Log logger = LogFactory.getLog(NewSpaceEfficientDenseModel.class);
+
+private int size;
+private short[] weights;
+private short[] covars;
+
+// optional value for MIX
+private short[] clocks;
+private byte[] deltaUpdates;
+
+public NewSpaceEfficientDenseModel(int ndims) {
+this(ndims, false);
+}
+
+public NewSpaceEfficientDenseModel(int ndims, boolean withCovar) {
+super();
+int size = ndims + 1;
+this.size = size;
+this.weights = new short[size];
+if (withCovar) {
+short[] covars = new short[size];
+Arrays.fill(covars, HalfFloat.ONE);
+this.covars = covars;
+} else {
+this.covars = null;
+}
+this.clocks = null;
+this.deltaUpdates = null;
+}
+
+@Override
+protected boolean isDenseModel() {
+return true;
+}
+
+@Override
+public boolean hasCovariance() {
+return covars != null;
+}
+
+@Override
+public void configureParams(boolean sum_of_squared_gradients, boolean sum_of_squared_delta_x,
+boolean sum_of_gradients) {}
+
+@Override
+public void configureClock() {
+if (clocks == null) {
+this.clocks = new short[size];
+this.deltaUpdates = new byte[size];
+}
+}
+
+@Override
+public boolean hasClock() {
+return clocks != null;
+}
+
+@Override
+public void resetDeltaUpdates(int feature) {
+deltaUpdates[feature] = 0;
+}
+
+private float getWeight(final int i) {
+final short w = weights[i];
+return (w == HalfFloat.ZERO) ? HalfFloat.ZERO : HalfFloat.halfFloatToFloat(w);
+}
+
+private float getCovar(final int i) {
+return HalfFloat.halfFloatToFloat(covars[i]);
+}
+
+private void _setWeight(final int i, final float v) {
+if(Math.abs(v) >= HalfFloat.MAX_FLOAT) {
+throw new IllegalArgumentException("Acceptable maximum weight is "
++ HalfFloat.MAX_FLOAT + ": " + v);
+}
+weights[i] = HalfFloat.floatToHalfFloat(v);
+}
+
+private void setCovar(final int i, final float v) {
+HalfFloat.checkRange(v);
+covars[i] = HalfFloat.floatToHalfFloat(v);
+}
+
+private void ensureCapacity(final int index) {
+if (index >= size) {
+int bits = MathUtils.bitsRequired(index);
+int newSize = (1 << bits) + 1;
+int oldSize = size;
+logger.info("Expands internal array size from " + oldSize + " to " + newSize + " ("
++ bits + " bits)");
+this.size = newSize;
+this.weights = Arrays.copyOf(weights, newSize);
+if (covars != null) {
+th

[GitHub] incubator-hivemall pull request #14: [WIP] Separate optimizer implementation...

2017-01-25 Thread myui
Github user myui commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/14#discussion_r96351621
  
--- Diff: core/src/main/java/hivemall/classifier/BinaryOnlineClassifierUDTF.java ---
@@ -56,8 +57,19 @@
 private boolean parseFeature;
 
 protected PredictionModel model;
+protected Optimizer optimizerImpl;
 protected int count;
 
+private boolean enableNewModel;
+
+public BinaryOnlineClassifierUDTF() {
+this.enableNewModel = false;
--- End diff --

`enableNewModel` is never used. Is this required?




[GitHub] incubator-hivemall issue #14: [WIP] Separate optimizer implementations

2017-01-25 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/14
  
LossFunction should be selectable, not fixed.
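One way to make the loss selectable is to resolve it by name when the model is initialized, for example from a `-loss_function`-style option, instead of hard-coding it. The sketch below is hypothetical: the `LossFunction` interface, the `LossFunctions.of` factory, and the accepted names are assumptions, not the interfaces in this PR. `logloss` and `hingeloss` assume labels in {-1, +1}.

```java
import java.util.Locale;

// Hypothetical interface: derivative of the loss w.r.t. the prediction p for target y.
interface LossFunction {
    float dloss(float p, float y);
}

// Hypothetical factory that resolves a loss function by (case-insensitive) name.
final class LossFunctions {
    private LossFunctions() {}

    static LossFunction of(String name) {
        switch (name.toLowerCase(Locale.ROOT)) {
            case "squaredloss":
                return (p, y) -> p - y; // d/dp of 0.5 * (p - y)^2
            case "logloss":
                // d/dp of log(1 + exp(-y * p)) = -y / (1 + exp(y * p))
                return (p, y) -> (float) (-y / (1.0 + Math.exp(y * p)));
            case "hingeloss":
                // subgradient of max(0, 1 - y * p)
                return (p, y) -> (y * p < 1f) ? -y : 0f;
            default:
                throw new IllegalArgumentException("Unsupported loss function: " + name);
        }
    }
}
```

The optimizer then only needs a `LossFunction` reference, so adding a new loss does not touch the update rule.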




[GitHub] incubator-hivemall issue #28: [HIVEMALL-30] Temporarily ignore a streaming t...

2017-01-26 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/28
  
@maropu LGTM. Please merge it.




[GitHub] incubator-hivemall issue #30: [HIVEMALL-37] Support a SST-based change-point...

2017-01-26 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/30
  
LGTM. You can merge this PR.




[GitHub] incubator-hivemall issue #31: [HIVEMALL-40] Load xgboost-formatted data via ...

2017-01-26 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/31
  
LGTM. Please merge and close this PR.




[GitHub] incubator-hivemall issue #30: [HIVEMALL-37] Support a SST-based change-point...

2017-01-26 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/30
  
@maropu Could you add [SPARK] after [HIVEMALL-37] in the title?




[GitHub] incubator-hivemall issue #29: [HIVEMALL-39][SPARK] Put the use of HiveUDFs i...

2017-01-26 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/29
  
@maropu LGTM. Please merge and close this PR.




[GitHub] incubator-hivemall issue #30: [HIVEMALL-37][SPARK] Support a SST-based chang...

2017-01-26 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/30
  
@maropu LGTM. Please merge this PR and close the JIRA ticket.




[GitHub] incubator-hivemall issue #32: [HIVEMALL-42][DOC] Fix the link to license fil...

2017-01-28 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/32
  
@aajisaka We welcome your first contribution!




[GitHub] incubator-hivemall issue #33: [HIVEMALL-44][SAPRK] Implement a prototype of ...

2017-01-30 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/33
  
The output `top1Df` should be explained in the description.

BTW, I personally prefer `top_k_join` to `join_top_k`.
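This sketch assumes `top_k_join` means "for each left key, keep the k joined right rows with the highest score". Under that assumption, the semantics can be illustrated with a bounded min-heap per join key. It is a hypothetical single-machine illustration in Java, not the Spark implementation in this PR. `TopKJoin` and `Row` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: keep the k highest-scoring right rows per left key.
final class TopKJoin {
    static final class Row {
        final String key;
        final String value;
        final double score;

        Row(String key, String value, double score) {
            this.key = key;
            this.value = value;
            this.score = score;
        }
    }

    private TopKJoin() {}

    static Map<String, List<Row>> topKJoin(int k, List<String> leftKeys, List<Row> right) {
        Comparator<Row> byScore = Comparator.comparingDouble(r -> r.score);
        Map<String, PriorityQueue<Row>> heaps = new LinkedHashMap<>();
        for (String key : leftKeys) {
            heaps.putIfAbsent(key, new PriorityQueue<>(byScore)); // min-heap on score
        }
        for (Row r : right) {
            PriorityQueue<Row> heap = heaps.get(r.key);
            if (heap == null) {
                continue; // right row with no matching left key is dropped
            }
            heap.offer(r);
            if (heap.size() > k) {
                heap.poll(); // evict the current lowest-scoring row
            }
        }
        Map<String, List<Row>> result = new LinkedHashMap<>();
        for (Map.Entry<String, PriorityQueue<Row>> e : heaps.entrySet()) {
            List<Row> rows = new ArrayList<>(e.getValue());
            rows.sort(byScore.reversed()); // highest score first
            result.put(e.getKey(), rows);
        }
        return result;
    }
}
```

The bounded heap keeps at most k rows per left key in memory instead of materializing the full join before ranking.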




[GitHub] incubator-hivemall issue #34: [HIVEMALL-45][SPARK] Upgrade spark v2.0.0 to v...

2017-01-30 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/34
  
@maropu LGTM. Please merge this PR and close the corresponding JIRA ticket as fixed.




[GitHub] incubator-hivemall issue #33: [HIVEMALL-44][SAPRK] Implement a prototype of ...

2017-01-30 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/33
  
@maropu Waiting for the Markdown documentation to be included in this PR :-)




[GitHub] incubator-hivemall issue #35: [HIVEMALL-31][SPARK] Support Spark-v2.1.0

2017-01-31 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/35
  
@maropu The test is failing for v2.1.

```
Saving to outputFile=/home/travis/build/apache/incubator-hivemall/spark/spark-common/target/scalastyle-output.xml
Processed 3 file(s)
Found 0 errors
Found 0 warnings
Found 0 infos
Finished in 667 ms
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/QueryTest.scala message=Header does not match expected text line=2
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala message=Header does not match expected text line=2
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/hive/HivemallOpsSuite.scala message=import.ordering.wrongOrderInGroup.message line=22 column=0
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/hive/HivemallOpsSuite.scala message=import.ordering.wrongOrderInGroup.message line=30 column=0
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/hive/test/TestHiveSingleton.scala message=Header does not match expected text line=2
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala message=Header does not match expected text line=2
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala message=Header does not match expected text line=2
error file=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/src/test/scala/org/apache/spark/sql/test/VectorQueryTest.scala message=import.ordering.missingEmptyLine.message line=25 column=0
Saving to outputFile=/home/travis/build/apache/incubator-hivemall/spark/spark-2.1/target/scalastyle-output.xml
Processed 25 file(s)
Found 8 errors
Found 0 warnings
Found 0 infos
Finished in 2737 ms
[ERROR] Failed to execute goal org.scalastyle:scalastyle-maven-plugin:0.8.0:check (default-cli) on project hivemall-spark: Failed during scalastyle execution: You have 8 Scalastyle violation(s). -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hivemall-spark
The command "mvn -q scalastyle:check test -Pspark-2.1" exited with 1.
```




[GitHub] incubator-hivemall issue #34: [HIVEMALL-45][SPARK] Upgrade spark v2.0.0 to v...

2017-01-31 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/34
  
Thanks!




[GitHub] incubator-hivemall issue #35: [HIVEMALL-31][SPARK] Support Spark-v2.1.0

2017-01-31 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/35
  
@maropu LGTM. Please merge this PR and close the JIRA issue.




[GitHub] incubator-hivemall issue #35: [HIVEMALL-31][SPARK] Support Spark-v2.1.0

2017-01-31 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/35
  
BTW, you can close https://github.com/apache/incubator-hivemall/pull/23 as well by starting the commit message with `Close #35, #23: `.




[GitHub] incubator-hivemall issue #35: [HIVEMALL-31][SPARK] Support Spark-v2.1.0

2017-01-31 Thread myui
Github user myui commented on the issue:

https://github.com/apache/incubator-hivemall/pull/35
  
Thanks!



