Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/118#discussion_r142308265
--- Diff: core/src/main/java/hivemall/tools/text/NgramsUDF.java ---
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/118#discussion_r142313252
--- Diff: core/src/main/java/hivemall/tools/text/NgramsUDF.java ---
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/118#discussion_r142323455
--- Diff: core/src/main/java/hivemall/tools/text/WordNgramsUDF.java ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/118#discussion_r142323289
--- Diff: core/src/main/java/hivemall/tools/text/WordNgramsUDF.java ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/118#discussion_r142323365
--- Diff: core/src/main/java/hivemall/tools/text/WordNgramsUDF.java ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/118#discussion_r142323566
--- Diff: core/src/main/java/hivemall/tools/text/WordNgramsUDF.java ---
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/118
@takuti LGTM. Can you merge this PR with squashing?
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/118
I personally prefer `wordgrams` though.
http://search.cpan.org/dist/Text-WordGrams/lib/Text/WordGrams.pm
---
GitHub user myui opened a pull request:
https://github.com/apache/incubator-hivemall/pull/121
[WIP][HIVEMALL-151] Support Matrix conversion from DoK to CSR/CSC matrix
## What changes were proposed in this pull request?
- Support Matrix conversion from DoK to CSR/CSC matrix
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/121
Review comments are welcome @takuti
Still work in progress
- [ ] Add more unit tests
- [ ] Revise SLIM implementation for [this
issue](https://github.com/apache/incubator
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/121
```
Caused by: java.lang.IllegalArgumentException: numCols SHOULD be greater
than zero. numCols = rowEnd - rowStart = 0 - 0 = 0
at hivemall.math.matrix.MatrixUtils.sortIndicies
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/105
FFM implementation is still work in progress.
- [ ] Need to revise `initialization scheme of V`
- [ ] Early stopping support
- [ ] Need to write documentation
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/121
@maropu could you review my PFOR implementation
https://github.com/apache/incubator-hivemall/pull/121/commits/c0f465a066379a1053a654de6655da71482ee482
if possible?
https://paperhub.s3
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/121
Decided to avoid using DoK => CSR conversion for dataMatrix because rows
are also sparse and not suited for CSR.
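
To illustrate the trade-off, here is a minimal sketch of a DoK to CSR conversion. It is not the code in this PR: the class name `DokToCsrSketch`, the `key` packing, and the `Csr` holder are hypothetical, and non-negative row/column indices are assumed.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; not Hivemall's MatrixUtils API.
final class DokToCsrSketch {

    /** Minimal CSR holder: rowPointers always has numRows + 1 entries. */
    static final class Csr {
        final int[] rowPointers;
        final int[] columnIndices;
        final double[] values;

        Csr(int[] rowPointers, int[] columnIndices, double[] values) {
            this.rowPointers = rowPointers;
            this.columnIndices = columnIndices;
            this.values = values;
        }
    }

    /** Packs (row, col) into one long key; assumes non-negative indices. */
    static long key(int row, int col) {
        return ((long) row << 32) | (col & 0xFFFFFFFFL);
    }

    /** Converts a dictionary-of-keys map into CSR arrays. */
    static Csr toCsr(Map<Long, Double> dok, int numRows) {
        // Sorting the packed keys yields row-major order (row first, then column).
        TreeMap<Long, Double> sorted = new TreeMap<>(dok);
        int nnz = sorted.size();
        int[] rowPointers = new int[numRows + 1];
        int[] columnIndices = new int[nnz];
        double[] values = new double[nnz];

        int i = 0;
        for (Map.Entry<Long, Double> e : sorted.entrySet()) {
            long k = e.getKey();
            int row = (int) (k >>> 32);
            int col = (int) k;
            rowPointers[row + 1]++; // count the entries of each row
            columnIndices[i] = col;
            values[i] = e.getValue();
            i++;
        }
        // Prefix sum turns per-row counts into row start offsets.
        for (int r = 0; r < numRows; r++) {
            rowPointers[r + 1] += rowPointers[r];
        }
        return new Csr(rowPointers, columnIndices, values);
    }
}
```

Note that `rowPointers` needs `numRows + 1` entries even when most rows are empty, which is one reason CSR is a poor fit when the rows themselves are sparse.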
---
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/121#discussion_r144224976
--- Diff: core/src/main/java/hivemall/math/matrix/MatrixUtils.java ---
@@ -70,4 +77,259 @@ public void apply(int i, int value) {
return
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/122#discussion_r144765287
--- Diff: core/src/main/java/hivemall/evaluation/AUCUDAF.java ---
@@ -110,7 +110,7 @@ public ClassificationEvaluator
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/122#discussion_r144765390
--- Diff: core/src/main/java/hivemall/evaluation/HitRateUDAF.java ---
@@ -71,9 +71,6 @@
+ " - Returns HitRate")
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/122
LGTM. Merged! Thanks.
---
GitHub user myui opened a pull request:
https://github.com/apache/incubator-hivemall/pull/123
[WIP][HIVEMALL-154] Refactor Field-aware Factorization Machines to support
Instance-wise L2 normalization
## What changes were proposed in this pull request?
- Support instance
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/123
Discussion in the user list:
https://lists.apache.org/thread.html/05c3071ad92d7a0c364d3367025c89d86c96d0971e247ee8e434f536@%3Cuser.hivemall.apache.org%3E
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/123
Enabling [instance-wise L2 normalization
](https://github.com/apache/incubator-hivemall/pull/123/files#diff-07f782d5891a557e0af0b31638b5680fR179)
resulted in worse performance than `-no
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/123
https://gyazo.com/4049ff3f38a3342859663b0fb0216914
Found infinite loop in findKey in a certain condition.
---
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/124#discussion_r147900558
--- Diff: core/src/main/java/hivemall/tools/list/UDAFToOrderedList.java ---
@@ -406,6 +406,11 @@ void merge(@Nonnull List o_keyList, @Nonnull
List
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/124#discussion_r148199276
--- Diff: core/src/main/java/hivemall/tools/list/UDAFToOrderedList.java ---
@@ -404,11 +407,9 @@ void merge(@Nonnull List o_keyList, @Nonnull
List
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/124#discussion_r148200058
--- Diff: core/src/main/java/hivemall/tools/list/UDAFToOrderedList.java ---
@@ -363,6 +363,9 @@ public void merge(@SuppressWarnings("deprec
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/124
@takuti LGTM. You can merge it with squashing.
---
GitHub user myui opened a pull request:
https://github.com/apache/incubator-hivemall/pull/125
approx_distinct_count UDAF using HyperLogLog++
## What changes were proposed in this pull request?
This PR introduces `approx_distinct_count` using
[HyperLogLog++](https
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/125#discussion_r152910918
--- Diff:
core/src/main/java/hivemall/sketch/hll/ApproxCountDistinctUDAF.java ---
@@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
@takuti
`normalize` may be introduced to Hive default UDF later as unicode
normalization or so.
So, `l1_normalize` is preferred.
Could you add gitbook documentation
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
Please link and close this issue as well.
https://issues.apache.org/jira/browse/HIVEMALL-59
---
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/126#discussion_r157173567
--- Diff: core/src/main/java/hivemall/ftvec/scaling/L1NormalizationUDF.java
---
@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
LGTM. Please merge it after EMR testing.
---
GitHub user myui opened a pull request:
https://github.com/apache/incubator-hivemall/pull/127
[WIP][HIVEMALL-2][HIVEMALL-155] Change maven release scheme and create
release guide
## What changes were proposed in this pull request?
(Please fill in changes proposed in this
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
It seems a recent RoaringBitmap release caused a problem.
https://github.com/RoaringBitmap/RoaringBitmap/issues/197
We need to fix the versions of the libraries we depend on.
https
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
https://github.com/apache/incubator-hivemall/commit/2fa6fb99dd059c2003829e9c455668835e26be24
fixed CI error.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
@takuti CI error is happening. Unit test is failing.
```
[ERROR]
/home/travis/build/apache/incubator-hivemall/core/src/test/java/hivemall/ftvec/scaling
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/126
@lemire Thank you for creating a great library.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/128
Another approach is to use
https://maven.apache.org/plugins/maven-resources-plugin/examples/copy-resources.html
to copy NOTICE, LICENSE, DISCLAIMER to `target/classes/META-INF` for each
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/128
@aajisaka here is my solution. DEPENDENCIES file created by
maven-remote-resources-plugin is missing though.
https://gist.github.com/myui/94af9ed1ca334422aa5fd078f60a673c
---
GitHub user myui opened a pull request:
https://github.com/apache/incubator-hivemall/pull/129
[HIVEMALL-164] Fixed pom for appending proper NOTICE/LICENSE/DISCLAIMER to
jars
## What changes were proposed in this pull request?
Fixed pom for appending proper NOTICE/LICENSE
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/119
Merged.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/130
Ah, I'm planning to revert this commit
https://github.com/apache/incubator-hivemall/commit/1e940aff316b1a91484ad08ba286492892b32d07
of v0.5.0 branch for rc2 release.
```
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/130
Better to use jenv, scalaenv for java/scala compiler section in the
document.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/132
@oza Thanks. Updated.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/133
@oza Sure. It seems there is some delay in git mirroring... Already merged
in ASF master.
https://git-wip-us.apache.org/repos/asf?p=incubator-hivemall.git
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/134
merged. Thanks!
---
GitHub user myui opened a pull request:
https://github.com/apache/incubator-hivemall/pull/135
[WIP] Merge Brickhouse functions
## What changes were proposed in this pull request?
Merge [brickhouse](https://github.com/klout/brickhouse) functions.
## What type of PR
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
Still WIP for
[reviewing](https://docs.google.com/spreadsheets/d/1gtFNcTvPR9OZAsbobj2D9d37tOx4nAoSlib9CLdEDQg/edit#gid=0)
functions to merge.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/80
@amaya382 Instead of using dockcross, I built native xgboost binaries for
most Linux platforms and Mac OS X in https://github.com/myui/build-xgboost-jvm/
To support more platforms
GitHub user myui opened a pull request:
https://github.com/apache/incubator-hivemall/pull/136
[HIVEMALL-174][DOC] Update RandomForest document to reflect changes in
usages
## What changes were proposed in this pull request?
Update RandomForest document to reflect changes
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/137
@maropu Yes, let's drop spark-2.0 support in the next release. Please do it
in another ticket.
FYI supported Spark version in EMR.
https://docs.aws.amazon.com/emr/l
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/137
CI failing due to `Failed to execute goal
org.scalastyle:scalastyle-maven-plugin:0.8.0:check (default) on project
hivemall-spark2.3: Failed during scalastyle execution: You have 8
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/137
@maropu Is this PR still WIP?
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/137
LGTM.
@maropu Could you merge this PR using `bin/merge_pr.py`?
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
```sql
select
NAMED_STRUCT("Name", "John", "age", 31),
to_json(
NAMED_STRUCT("Name", "John", "age", 31)
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/138
+1 to drop spark 2.0 support in the next release.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/139
@maropu CI failing.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/138
LGTM. Will merge.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/140
Merged. Project site will be updated later. Thanks.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@jeromebanks FYI, merging of Brickhouse functions is in progress in this PR.
We need to add unit tests, improve the quality of the functions, and add documentation.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/141
LGTM.
@maropu
Could you merge this PR into master?
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
```sql
create temporary function moving_avg as
'hivemall.statistics.MovingAverageUDTF';
select moving_avg(x, 3) from (select explode(array(1,2,3,4,5,6,7)) as
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/141
@maropu Is it possible to change the duplicated
`apache-hivemall:apache-hivemall:0.5.1-spark2.2` to
`apache-hivemall:hivemall-on-spark:v0.5.1-spark2.2` or so?
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@maropu Could you check whether `to_json` and `from_json` work on Spark or
not if possible?
I'm not sure hcatalog is provided in Spark environment.
https://github.com/a
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
```sql
create temporary function conditional_emit as
'hivemall.tools.array.ConditionalEmitUDTF';
WITH input as (
select array(true, false, true) as conditions,
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
```sql
create temporary function array_slice as
'hivemall.tools.array.ArraySliceUDF';
select
array_slice(
array("zero", "one", "
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@maropu Deprecated SubarrayUDF to use ArraySliceUDF instead. FYI
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@paulojblack Thank you for comments. Will confirm it and fix master.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/142
LGTM.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@jeromebanks Added you to [the committer
list](http://hivemall.incubator.apache.org/team-list.html) with
https://github.com/apache/incubator-hivemall/pull/135/commits
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@paulojblack You need to use up-to-date DDLs since we updated DDLs for
`subarray` UDF in
https://github.com/apache/incubator-hivemall/pull/135/commits
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
If you are using `v0.5.0`, then you need to use [one of
v0.5.0](https://github.com/apache/incubator-hivemall/blob/v0.5.0/resources/ddl/define-all.hive).
DDLs are pointing specified
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
@paulojblack Generally, we recommend using [official ASF
releases](http://hivemall.incubator.apache.org/download.html), not the one in the
master branch.
When you are using the master
GitHub user myui opened a pull request:
https://github.com/apache/incubator-hivemall/pull/144
[HIVEMALL-190][HOTFIX] Fixed a bug in tree_predict_v1 on loading old
prediction models
## What changes were proposed in this pull request?
Fix a bug in `tree_predict_v1` on
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/143
@takuti I think it's better to have hivemalldoc module (e.g.,
`tools/hivemalldoc` )
Hadoop tools.
https://github.com/apache/hadoop/tree/trunk/hadoop-tools
Here is
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/145
@takuti Ideally, we SHOULD have serialization tests for all UDFs. New UDFs
MUST have a serialization test.
---
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/146#discussion_r182308352
--- Diff: core/src/main/java/hivemall/smile/tools/TreeExportUDF.java ---
@@ -141,17 +141,17 @@ public String getDisplayString(String[] children
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/145
Sure.
---
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/146#discussion_r182315386
--- Diff: core/src/main/java/hivemall/smile/tools/TreeExportUDF.java ---
@@ -141,17 +141,17 @@ public String getDisplayString(String[] children
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/146
@takuti LGTM! You can merge this PR.
---
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/145#discussion_r182357781
--- Diff:
core/src/main/java/hivemall/ftvec/trans/QuantifiedFeaturesUDTF.java ---
@@ -87,30 +80,27 @@ public StructObjectInspector
initialize
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/145#discussion_r182360921
--- Diff: nlp/src/main/java/hivemall/nlp/tokenizer/KuromojiUDF.java ---
@@ -69,13 +69,10 @@
private static final int READ_TIMEOUT_MS
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
```sql
select generate_series(2,4);
value
2
3
4
select generate_series(5,1,-2);
value
5
3
1
select generate_series(4,3
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/148#discussion_r183578547
--- Diff: tools/hivemall-docs/pom.xml ---
@@ -0,0 +1,173 @@
+
+http://maven.apache.org/POM/4.0.0";
xmlns:xsi="http://www.w
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/148#discussion_r183642989
--- Diff: tools/hivemall-docs/pom.xml ---
@@ -0,0 +1,173 @@
+
+http://maven.apache.org/POM/4.0.0";
xmlns:xsi="http://www.w
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/135
```sql
create temporary function merge_maps as 'hivemall.tools.map.MergeMapsUDAF';
create table test as
SELECT map('A',10,'B',20,'C
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/148#discussion_r183682380
--- Diff: tools/hivemall-docs/pom.xml ---
@@ -0,0 +1,173 @@
+
+http://maven.apache.org/POM/4.0.0";
xmlns:xsi="http://www.w
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/148
LGTM. Merged! Thanks.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/145
Merged. Thanks. I'll do some refactoring in the following commit
(TestUtils.java is redundant etc).
```
TestUtils.java
TestBinariseLabelUDTF
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/145#discussion_r184277241
--- Diff:
core/src/main/java/hivemall/ftvec/trans/QuantifiedFeaturesUDTF.java ---
@@ -87,32 +87,37 @@ public StructObjectInspector
initialize
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/145#discussion_r184277838
--- Diff:
core/src/main/java/hivemall/ftvec/trans/QuantifiedFeaturesUDTF.java ---
@@ -87,32 +87,37 @@ public StructObjectInspector
initialize
Github user myui commented on a diff in the pull request:
https://github.com/apache/incubator-hivemall/pull/145#discussion_r184281067
--- Diff: nlp/src/main/java/hivemall/nlp/tokenizer/KuromojiUDF.java ---
@@ -69,13 +69,10 @@
private static final int READ_TIMEOUT_MS
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/149
@takuti
Linear term is not used in Libffm implementation. Better to do research
about other FFM impl as well.
https://github.com/chenhuang-learn/ffm/blob/master/ffm/src/ffm
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/149
@takuti I advise checking 2-3 updates to investigate how the gradient updates
differ.
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/149
@takuti Thank you for the detailed verification. Let's disable the linear term by
default.
Remove `-disable_wi` and `-enable_wi` (alias `-linear_term` ) to enable
linear
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/149
BTW, it might be better to implement `early stopping` using validation data.
https://github.com/guestwalk/libffm
We can use a similar approach to `_validationRatio` used in
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/149
Also, it's better to revise default `-iters` from 1 to 10 (at least 10
iterations with early stopping).
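
As a rough illustration of early stopping on validation data, here is a minimal sketch. It is neither Hivemall's nor libffm's implementation: the class name `EarlyStoppingSketch`, the `runEpochAndValidate` callback, and the `patience` parameter are assumptions made for this example.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch; not the FFM trainer in this PR.
final class EarlyStoppingSketch {

    /**
     * Runs up to maxIters iterations and stops early once the validation loss
     * has not improved for `patience` consecutive iterations.
     *
     * @param runEpochAndValidate trains one iteration (0-based index) and
     *        returns the loss measured on held-out validation data
     * @return the number of iterations actually run
     */
    static int train(IntToDoubleFunction runEpochAndValidate, int maxIters, int patience) {
        double bestLoss = Double.POSITIVE_INFINITY;
        int sinceImprovement = 0;
        int iter = 0;
        while (iter < maxIters) {
            double loss = runEpochAndValidate.applyAsDouble(iter);
            iter++;
            if (loss < bestLoss) {
                bestLoss = loss;       // new best: reset the counter
                sinceImprovement = 0;
            } else if (++sinceImprovement >= patience) {
                break;                 // no improvement for `patience` iterations
            }
        }
        return iter;
    }
}
```

With a larger default such as `-iters 10`, a check like this keeps the extra iterations from overfitting when the validation loss has already flattened out.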
---
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/149
@takuti So then, it is better to enable l2_norm by default and use
`-disable_l2norm` to disable L2 normalization. My concern is that L2
normalization performed worse for small datasets with
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/149
It might be better to reconsider `eta0` when enabling `l2norm` by
default and enlarging `max_init_size`. In my experience for FM, init random
size should be small when the avg feature
Github user myui commented on the issue:
https://github.com/apache/incubator-hivemall/pull/150
Merged. Thanks.
---