[GitHub] incubator-hivemall issue #35: [HIVEMALL-31][SPARK] Support Spark-v2.1.0

2017-01-31 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/35
  
Merged


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] incubator-hivemall issue #37: [HIVEMALL-47][SPARK] Support codegen for top-K...

2017-02-06 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/37
  
Updated the benchmark; the left table is ~140MB and the right table is
~70MB:
```
  TestUtils.benchmark("codegen top-k join") {
    /**
     * Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13 on Mac OS X 10.10.2
     * Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
     *
     * top_k_join:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
     * -------------------------------------------------------------------------------------
     * top_k_join wholestage off              3 /    5       2751.9           0.4       1.0X
     * top_k_join wholestage on               1 /    1       6494.4           0.2       2.4X
     */
```
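
As a sanity check on the table above, the derived columns can be recomputed from the reported rates. This is only a small sketch in plain Scala; `BenchmarkMath` and its method names are hypothetical helpers for illustration and are not part of `TestUtils`, Hivemall, or Spark.

```scala
// Hypothetical helper (not part of Hivemall or Spark): recomputes the
// derived benchmark columns from the reported throughput.
object BenchmarkMath {
  // Rate is in millions of rows per second, so one row takes 1000 / rate nanoseconds.
  def perRowNs(rateMPerSec: Double): Double = 1000.0 / rateMPerSec

  // Speedup of a run relative to the baseline run.
  def relative(baselineRate: Double, rate: Double): Double = rate / baselineRate

  def main(args: Array[String]): Unit = {
    val off = 2751.9 // wholestage off, M rows/s (from the table above)
    val on = 6494.4  // wholestage on, M rows/s (from the table above)
    println(f"per row: ${perRowNs(off)}%.1f ns -> ${perRowNs(on)}%.1f ns")
    println(f"relative: ${relative(off, on)}%.1fX")
  }
}
```

Rounded to one decimal place, these values agree with the `Per Row(ns)` and `Relative` columns reported in the benchmark.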




[GitHub] incubator-hivemall issue #37: [HIVEMALL-47][SPARK] Support codegen for top-K...

2017-02-06 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/37
  
The generated code for a top-K join is as follows:
```
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*ShuffledHashJoinTopK -1, [group#10], [group#27]
:- Exchange hashpartitioning(group#10, 200)
:  +- LocalTableScan [userId#9, group#10, x#11, y#12]
+- Exchange hashpartitioning(group#27, 200)
   +- LocalTableScan [group#27, position#28, x#29, y#30]

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private scala.collection.Iterator[] inputs;
/* 008 */   private org.apache.spark.sql.execution.joins.ShuffledHashJoinTopKExec shuffledhashjointopk_topKJoin;
/* 009 */   private org.apache.spark.sql.execution.joins.PriorityQueueShim shuffledhashjointopk_queue;
/* 010 */   private scala.collection.Iterator shuffledhashjointopk_leftIter;
/* 011 */   private InternalRow shuffledhashjointopk_leftRow;
/* 012 */   private int shuffledhashjointopk_value;
/* 013 */   private UTF8String shuffledhashjointopk_value1;
/* 014 */   private boolean shuffledhashjointopk_isNull;
/* 015 */   private double shuffledhashjointopk_value2;
/* 016 */   private double shuffledhashjointopk_value3;
/* 017 */   private int shuffledhashjointopk_value8;
/* 018 */   private double shuffledhashjointopk_value9;
/* 019 */   private org.apache.spark.sql.execution.joins.ShuffledHashJoinTopKExec shuffledhashjointopk_joinExec;
/* 020 */   private org.apache.spark.sql.execution.joins.HashedRelation shuffledhashjointopk_relation;
/* 021 */   private UnsafeRow shuffledhashjointopk_result;
/* 022 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder shuffledhashjointopk_holder;
/* 023 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter shuffledhashjointopk_rowWriter;
/* 024 */   private org.apache.spark.sql.catalyst.expressions.JoinedRow shuffledhashjointopk_joinedRow;
/* 025 */   private int shuffledhashjointopk_value23;
/* 026 */   private boolean shuffledhashjointopk_isNull18;
/* 027 */   private double shuffledhashjointopk_value24;
/* 028 */   private boolean shuffledhashjointopk_isNull19;
/* 029 */   private int shuffledhashjointopk_value25;
/* 030 */   private UTF8String shuffledhashjointopk_value26;
/* 031 */   private boolean shuffledhashjointopk_isNull20;
/* 032 */   private double shuffledhashjointopk_value27;
/* 033 */   private double shuffledhashjointopk_value28;
/* 034 */   private UTF8String shuffledhashjointopk_value29;
/* 035 */   private boolean shuffledhashjointopk_isNull21;
/* 036 */   private UTF8String shuffledhashjointopk_value30;
/* 037 */   private boolean shuffledhashjointopk_isNull22;
/* 038 */   private double shuffledhashjointopk_value31;
/* 039 */   private double shuffledhashjointopk_value32;
/* 040 */   private org.apache.spark.sql.execution.metric.SQLMetric shuffledhashjointopk_numOutputRows;
/* 041 */   private UnsafeRow shuffledhashjointopk_result1;
/* 042 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder shuffledhashjointopk_holder1;
/* 043 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter shuffledhashjointopk_rowWriter1;
/* 044 */
/* 045 */   public GeneratedIterator(Object[] references) {
/* 046 */     this.references = references;
/* 047 */   }
/* 048 */
/* 049 */   public void init(int index, scala.collection.Iterator[] inputs) {
/* 050 */     partitionIndex = index;
/* 051 */     this.inputs = inputs;
/* 052 */     wholestagecodegen_init_0();
/* 053 */     wholestagecodegen_init_1();
/* 054 */
/* 055 */   }
/* 056 */
/* 057 */   private void wholestagecodegen_init_0() {
/* 058 */     this.shuffledhashjointopk_topKJoin = (org.apache.spark.sql.execution.joins.ShuffledHashJoinTopKExec) references[0];
/* 059 */     shuffledhashjointopk_queue = shuffledhashjointopk_topKJoin.priorityQueue();
/* 060 */     shuffledhashjointopk_leftIter = inputs[0];
/* 061 */
/* 062 */     this.shuffledhashjointopk_joinExec = (org.apache.spark.sql.execution.joins.ShuffledHashJoinTopKExec) references[1];
/* 063 */
/* 064 */     shuffledhashjointopk_relation = (org.apache.spark.sql.execution.joins.HashedRelation) shuffledhashjointopk_joinExec.buildHashedRelation(inputs[1]);
/* 065 */     incPeakExecutionMemory(shuffledhashjointopk_relation.estimatedSize());
/* 066 */
/* 067 */     shuffledhashjointopk_result = new UnsafeRow(1);
/* 068
```

[GitHub] incubator-hivemall issue #38: Support spark-sql

2017-02-07 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/38
  
LGTM cc: @myui




[GitHub] incubator-hivemall issue #36: [Spark] Update gitbook for top_k_join

2017-02-04 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/36
  
okay, thanks




[GitHub] incubator-hivemall issue #41: [HIVEMALL-54][SPARK] Add an easy-to-use script...

2017-02-08 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/41
  
yea, I'll update just after this is merged.




[GitHub] incubator-hivemall issue #37: [HIVEMALL-47][SPARK] Support codegen for top-K...

2017-02-06 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/37
  
okay, merged.




[GitHub] incubator-hivemall issue #23: [HIVEMALL-31] Change the branch of spark-2.0 t...

2017-01-24 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/23
  
yea, could you add `[WIP]` to the title?




[GitHub] incubator-hivemall issue #24: [HIVEMALL-32] Print explicit error messages in...

2017-01-24 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/24
  
Merged.




[GitHub] incubator-hivemall pull request #29: [HIVEMALL-39] Put the use of HiveUDFs i...

2017-01-26 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/29

[HIVEMALL-39] Put the use of HiveUDFs in one place

## What changes were proposed in this pull request?
This is a refactoring; on master, `HivemallOps` directly uses the logical
plan nodes of Hive UDFs. However, these nodes are internal Spark classes and
their interfaces may change. So, this PR creates a new file,
`HivemallOpsImpl`, and puts all uses of these classes there.

## What type of PR is it?
Refactoring

### What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-39


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-39

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/29.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #29


commit 09c2233138f0976e0de8c871d319bd39f5819464
Author: Takeshi YAMAMURO <linguin@gmail.com>
Date:   2017-01-26T15:12:54Z

Put the use of HiveUDFs in one place






[GitHub] incubator-hivemall issue #29: [HIVEMALL-39][SPARK] Put the use of HiveUDFs i...

2017-01-26 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/29
  
Merged.




[GitHub] incubator-hivemall pull request #34: [HIVEMALL-45][SPARK] Upgrade spark v2.0...

2017-01-30 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/34

[HIVEMALL-45][SPARK] Upgrade spark v2.0.0 to v2.0.2 (latest)

## What changes were proposed in this pull request?
This pr updated pom.xml for the upgrade.

## What type of PR is it?
Improvement

### What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-45



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-45

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/34.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #34


commit ff097f94165486d1b19b8a7790e49476dcb9e24f
Author: Takeshi YAMAMURO <linguin@gmail.com>
Date:   2017-01-31T04:05:44Z

Upgrade spark v2.0.0 to v2.0.2 (latest)






[GitHub] incubator-hivemall pull request #26: [HIVEMALL-35] Remove unnecessary implic...

2017-01-25 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/26

[HIVEMALL-35] Remove unnecessary implicit conversions in HivemallUtils

## What changes were proposed in this pull request?
This pr removed entries for implicit conversion in `HivemallUtils`.

## What type of PR is it?
Improvement

### What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-35



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-35

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/26.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #26


commit d45657dc4aad5c647e8c702a1e8549670c5b1dee
Author: Takeshi YAMAMURO <linguin@gmail.com>
Date:   2017-01-25T14:21:32Z

Remove unnecessary implicit conversions in HivemallUtils






[GitHub] incubator-hivemall pull request #25: [HIVEMALL-34] Fix a bug to wrongly use ...

2017-01-25 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/25

[HIVEMALL-34] Fix a bug to wrongly use mllib vectors in some functions

## What changes were proposed in this pull request?
`to_hivemall_features` and `append_bias` in `HivemallUtils` wrongly used
mllib vectors; they should use ml vectors instead.

## What type of PR is it?
Bug Fix

### What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-34

## How was this patch tested?
Enabled a test in `HiveUdfWithVectorSuite`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-34

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/25.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #25


commit d1c8b64176fd6caf9f2f00a51017ccce40df6eb9
Author: Takeshi YAMAMURO <linguin@gmail.com>
Date:   2017-01-25T10:06:02Z

Fix a bug to wrongly use mllib vectors in some functions






[GitHub] incubator-hivemall issue #25: [HIVEMALL-34] Fix a bug to wrongly use mllib v...

2017-01-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/25
  
Merged!




[GitHub] incubator-hivemall pull request #28: [HIVEMALL-30] Temporarily ignore a stre...

2017-01-26 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/28

[HIVEMALL-30] Temporarily ignore a streaming test

## What changes were proposed in this pull request?
The test below is flaky and sometimes fails, so we temporarily ignore it.
The stack trace of the failure is:
```
HivemallOpsWithFeatureSuite:
Exception in thread "broadcast-exchange-60" java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:231)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:231)
    at org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:78)
    at org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:65)
    at net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:205)
    at net.jpountz.lz4.LZ4BlockOutputStream.finish(LZ4BlockOutputStream.java:235)
    at net.jpountz.lz4.LZ4BlockOutputStream.close(LZ4BlockOutputStream.java:175)
    at java.io.ObjectOutputStream$BlockDataOutputStream.close(ObjectOutputStream.java:1827)
    at java.io.ObjectOutputStream.close(ObjectOutputStream.java:741)
    at org.apache.spark.serializer.JavaSerializationStream.close(JavaSerializer.scala:57)
    at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$blockifyObject$1.apply$mcV$sp(TorrentBroadcast.scala:238)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1296)
    at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:237)
    at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:107)
    at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:86)
    at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
    at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:56)
    at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1370)
```

## What type of PR is it?
Bug Fix

### What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-30


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-30

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/28.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #28


commit db3de52892fee8027fab2a99499ee6658f1eb4fa
Author: Takeshi YAMAMURO <linguin@gmail.com>
Date:   2017-01-26T08:44:23Z

Temporarily ignore a streaming test






[GitHub] incubator-hivemall issue #26: [HIVEMALL-35] Remove unnecessary implicit conv...

2017-01-26 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/26
  
I made a pr for this flaky test failure in #28, so I'll merge this.




[GitHub] incubator-hivemall issue #27: [HIVEMALL-36] Refactor each_top_k

2017-01-25 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/27
  
okay, I'll merge this, then I'll check the OOM issue in follow-up 
activities. Thanks!




[GitHub] incubator-hivemall issue #29: [HIVEMALL-39] Put the use of HiveUDFs in one p...

2017-01-26 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/29
  
@myui could you check this before merging it?




[GitHub] incubator-hivemall issue #49: [HIVEMALL-26][SPARK] Make docs for regression ...

2017-02-23 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/49
  
Updated




[GitHub] incubator-hivemall pull request #49: [HIVEMALL-26][SPARK] Make docs for regr...

2017-02-23 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/49

[HIVEMALL-26][SPARK] Make docs for regression and binary classification

## What changes were proposed in this pull request?
This pr added docs for hivemall-on-spark.

## What type of PR is it?
Documentation

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-26

## How was this patch tested?
N/A


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-26-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/49.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #49


commit eabddeacd40e3c9d9b3b20938357f666f00132a1
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-02-23T11:03:09Z

Make docs for hivemall-on-spark






[GitHub] incubator-hivemall issue #49: [HIVEMALL-26][SPARK] Make docs for regression ...

2017-02-23 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/49
  
Many thanks!




[GitHub] incubator-hivemall issue #44: [HIVEMALL-65] Update define-all.spark and impo...

2017-02-14 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/44
  
LGTM. I'll merge it later.




[GitHub] incubator-hivemall issue #41: [HIVEMALL-54][SPARK] Add an easy-to-use script...

2017-02-10 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/41
  
Merged.




[GitHub] incubator-hivemall issue #20: [HIVEMALL-28] Set HIVEMALL_HOME to absolute pa...

2017-01-19 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/20
  
@wangyum Thanks for your work! What does this PR solve? Is there any issue
in the current script?




[GitHub] incubator-hivemall issue #20: [HIVEMALL-28] Set HIVEMALL_HOME to absolute pa...

2017-01-19 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/20
  
@wangyum oh, I found you've already described that in the JIRA ticket.
Could you write "what does this pr solve?" in this description? If you update
that, LGTM. cc: @myui




[GitHub] incubator-hivemall pull request #54: [HIVEMALL-76][SPARK] Fix wrong ranks in...

2017-02-28 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/54

[HIVEMALL-76][SPARK] Fix wrong ranks in top-K funcs

## What changes were proposed in this pull request?
This PR fixes the Spark `each_top_k`/`top_k_join` behaviour so that it is
consistent with the Hive ones.

## What type of PR is it?
Bug Fix

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-76

## How was this patch tested?
Added tests in `HivemallOpsSuite`.
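
For context, here is a minimal sketch of what a per-group top-K with ranks computes, written with plain Scala collections. It is an illustrative model only: `TopKSketch.eachTopK` is a hypothetical helper, not the Hivemall or Spark implementation, and its tie handling (consecutive ranks, in input order) is an assumption rather than the behaviour this PR defines.

```scala
// Hypothetical model of a per-group top-K (not the Hivemall implementation).
object TopKSketch {
  // For each group key, keeps the k highest-scoring rows and ranks them from 1.
  // Input rows are (groupKey, score, value); output rows are (rank, groupKey, score, value).
  def eachTopK[K, V](k: Int, rows: Seq[(K, Double, V)]): Seq[(Int, K, Double, V)] =
    rows.groupBy(_._1).toSeq.flatMap { case (_, groupRows) =>
      groupRows
        .sortBy(r => -r._2) // highest score first; the sort is stable, so ties keep input order
        .take(k)
        .zipWithIndex
        .map { case ((key, score, value), i) => (i + 1, key, score, value) }
    }
}
```

The order of groups in the result is unspecified, because `groupBy` returns a map; only the ranks within each group are meaningful.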


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-76

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/54.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #54








[GitHub] incubator-hivemall issue #54: [HIVEMALL-76][SPARK] Fix wrong ranks in top-K ...

2017-03-02 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/54
  
@myui passed.




[GitHub] incubator-hivemall issue #42: [HIVEMALL-38][SPARK] Support ChangeFinderUDF i...

2017-02-26 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/42
  
It's okay to merge




[GitHub] incubator-hivemall issue #62: [HIVEMALL-89][SQL] Support to_from/from_csv in...

2017-03-16 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/62
  
Merged to master




[GitHub] incubator-hivemall issue #59: [HIVEMALL-85] Upgrade hivemall-xgboost's hadoo...

2017-03-04 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/59
  
@wangyum Thanks for your continuous contributions! 
@myui do we have any reason to depend on hadoop-core `0.20.2-cdh3u6`?
I just used this dependency to be consistent with the other modules.




[GitHub] incubator-hivemall pull request #61: [HIVEMALL-88][SPARK] Support a function...

2017-03-08 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/61#discussion_r105089894
  
--- Diff: spark/spark-2.1/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
@@ -805,6 +805,47 @@ final class HivemallOps(df: DataFrame) extends Logging {
     JoinTopK(kInt, df.logicalPlan, right.logicalPlan, Inner, Option(joinExprs.expr))(score.named)
   }
 
+  private def doFlatten(schema: StructType, prefix: Option[String] = None) : Seq[Column] = {
+    schema.fields.flatMap { f =>
+      val colName = prefix.map(p => s"$p.${f.name}").getOrElse(f.name)
--- End diff --

In Spark, the dot is used as the separator for column names in a nested schema.
Currently, Spark users cannot change this separator via configuration.
For example:

```
scala> val ds = Seq((1, (1.0, "a"))).toDS()
ds: org.apache.spark.sql.Dataset[(Int, (Double, String))] = [_1: int, _2: struct<_1: double, _2: string>]

scala> ds.printSchema
root
 |-- _1: integer (nullable = false)
 |-- _2: struct (nullable = true)
 |    |-- _1: double (nullable = false)
 |    |-- _2: string (nullable = true)


scala> ds.select($"_2._2").show
+---+
| _2|
+---+
|  a|
+---+
```
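
The naming scheme in the diff above can be sketched without Spark. The following is a simplified model, not the code in this PR: `Struct`, `Field`, the leaf type objects, and `FlattenSketch.flattenNames` are hypothetical stand-ins for Spark's `StructType`/`StructField` and `doFlatten`, and the recursion into nested structs is an assumption about what the rest of the method does.

```scala
// Simplified stand-ins for Spark's schema types (hypothetical, for illustration only).
sealed trait DataType
case object IntType extends DataType
case object DoubleType extends DataType
case object StringType extends DataType
final case class Struct(fields: Seq[Field]) extends DataType
final case class Field(name: String, dataType: DataType)

object FlattenSketch {
  // Walks the schema depth-first and returns the dot-separated path of every leaf field.
  def flattenNames(schema: Struct, prefix: Option[String] = None): Seq[String] =
    schema.fields.flatMap { f =>
      // Same naming rule as in the diff: prepend the parent path, separated by a dot.
      val colName = prefix.map(p => s"$p.${f.name}").getOrElse(f.name)
      f.dataType match {
        case nested: Struct => flattenNames(nested, Some(colName))
        case _ => Seq(colName)
      }
    }
}
```

For a schema shaped like the one printed above (`_1: int, _2: struct<_1: double, _2: string>`), this sketch yields the names `_1`, `_2._1`, and `_2._2`.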




[GitHub] incubator-hivemall pull request #61: [HIVEMALL-88][SPARK] Support a function...

2017-03-08 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/61#discussion_r105090944
  
--- Diff: 
spark/spark-2.1/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
@@ -805,6 +805,47 @@ final class HivemallOps(df: DataFrame) extends Logging 
{
 JoinTopK(kInt, df.logicalPlan, right.logicalPlan, Inner, 
Option(joinExprs.expr))(score.named)
   }
 
+  private def doFlatten(schema: StructType, prefix: Option[String] = None) 
: Seq[Column] = {
+schema.fields.flatMap { f =>
+  val colName = prefix.map(p => s"$p.${f.name}").getOrElse(f.name)
--- End diff --

Ah, I found an issue;
```
scala> val df = Seq((1, (1.0, "a"))).toDF()
df: org.apache.spark.sql.DataFrame = [_1: int, _2: struct<_1: double, _2: string>]

scala> val ds1 = df.flatten().select("_2._1")
org.apache.spark.sql.AnalysisException: cannot resolve '`_2._1`' given input columns: [_1, _2._1, _2._2];;
'Project ['_2._1]
+- Project [_1#67 AS _1#73, _2#68._1 AS _2._1#74, _2#68._2 AS _2._2#75]
   +- LocalRelation [_1#67, _2#68]

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:75)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:72)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
```
So I'll reconsider this; please give me a sec. Thanks.


[GitHub] incubator-hivemall pull request #61: [HIVEMALL-88][SPARK] Support a function...

2017-03-08 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/61#discussion_r105093086
  
--- Diff: 
spark/spark-2.1/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
@@ -805,6 +805,47 @@ final class HivemallOps(df: DataFrame) extends Logging 
{
 JoinTopK(kInt, df.logicalPlan, right.logicalPlan, Inner, 
Option(joinExprs.expr))(score.named)
   }
 
+  private def doFlatten(schema: StructType, prefix: Option[String] = None) 
: Seq[Column] = {
+schema.fields.flatMap { f =>
+  val colName = prefix.map(p => s"$p.${f.name}").getOrElse(f.name)
--- End diff --

Actually, we can access this column by quoting its name with backticks:
```
scala> val df = Seq((1, (1.0, "a"))).toDF()
df: org.apache.spark.sql.DataFrame = [_1: int, _2: struct<_1: double, _2: string>]

scala> val ds1 = df.flatten().select("`_2._1`").show
+-----+
|_2._1|
+-----+
|  1.0|
+-----+

```


[GitHub] incubator-hivemall pull request #61: [HIVEMALL-88][SPARK] Support a function...

2017-03-08 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/61#discussion_r105088535
  
--- Diff: 
spark/spark-2.1/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
@@ -805,6 +805,47 @@ final class HivemallOps(df: DataFrame) extends Logging 
{
 JoinTopK(kInt, df.logicalPlan, right.logicalPlan, Inner, 
Option(joinExprs.expr))(score.named)
   }
 
+  private def doFlatten(schema: StructType, prefix: Option[String] = None) 
: Seq[Column] = {
+schema.fields.flatMap { f =>
+  val colName = prefix.map(p => s"$p.${f.name}").getOrElse(f.name)
--- End diff --

I know, but this is a Spark-specific convention, so the change you 
suggested makes `doFlatten` fail.


[GitHub] incubator-hivemall pull request #62: [HIVEMALL-89][SQL] Support to_from/from...

2017-03-08 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/62#discussion_r105090294
  
--- Diff: 
spark/spark-2.1/src/main/scala/org/apache/spark/sql/execution/datasources/csv/csvExpressions.scala
 ---
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import java.io.CharArrayWriter
+
+import jodd.util.CsvUtil
--- End diff --

Updated


[GitHub] incubator-hivemall pull request #61: [HIVEMALL-88][SPARK] Support a function...

2017-03-08 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/61#discussion_r105099394
  
--- Diff: 
spark/spark-2.1/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
@@ -805,6 +805,47 @@ final class HivemallOps(df: DataFrame) extends Logging 
{
 JoinTopK(kInt, df.logicalPlan, right.logicalPlan, Inner, 
Option(joinExprs.expr))(score.named)
   }
 
+  private def doFlatten(schema: StructType, prefix: Option[String] = None) 
: Seq[Column] = {
+schema.fields.flatMap { f =>
+  val colName = prefix.map(p => s"$p.${f.name}").getOrElse(f.name)
--- End diff --

@myui How about the latest fix? As you suggested, I added an option for 
the separator.
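
For illustration only, here is a minimal plain-Java sketch of what flattening with a configurable separator amounts to. It is not the `doFlatten` code from this PR and does not use the Spark API: the class `FlattenSketch`, its methods, and the `Map`-based schema model are all hypothetical stand-ins chosen so the example runs without Spark.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration only; hypothetical names, not the Spark or Hivemall API. */
final class FlattenSketch {

    /**
     * Collects the leaf field names of a nested schema, joining each path with the separator.
     * A nested struct is modelled as a Map; any other value is treated as a leaf type name.
     */
    static List<String> flattenNames(Map<String, Object> schema, String prefix, String separator) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Object> field : schema.entrySet()) {
            // Top-level fields keep their own name; nested ones get "<parent><separator><name>".
            String name = prefix == null ? field.getKey() : prefix + separator + field.getKey();
            if (field.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) field.getValue();
                names.addAll(flattenNames(nested, name, separator));
            } else {
                names.add(name);
            }
        }
        return names;
    }

    /** Models the schema used in this thread: [_1: int, _2: struct<_1: double, _2: string>]. */
    static Map<String, Object> exampleSchema() {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("_1", "double");
        inner.put("_2", "string");
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("_1", "int");
        root.put("_2", inner);
        return root;
    }

    public static void main(String[] args) {
        System.out.println(flattenNames(exampleSchema(), null, "."));
        System.out.println(flattenNames(exampleSchema(), null, "$"));
    }
}
```

With `.` as the separator, the flattened names contain the same character Spark uses to address nested fields, which is why they had to be quoted with backticks in the earlier comment. A different separator avoids that ambiguity at the cost of looking less familiar to Spark users.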


[GitHub] incubator-hivemall pull request #61: [HIVEMALL-88][SPARK] Support a function...

2017-03-08 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/61#discussion_r105090100
  
--- Diff: 
spark/spark-2.1/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
@@ -805,6 +805,47 @@ final class HivemallOps(df: DataFrame) extends Logging 
{
 JoinTopK(kInt, df.logicalPlan, right.logicalPlan, Inner, 
Option(joinExprs.expr))(score.named)
   }
 
+  private def doFlatten(schema: StructType, prefix: Option[String] = None) 
: Seq[Column] = {
+schema.fields.flatMap { f =>
+  val colName = prefix.map(p => s"$p.${f.name}").getOrElse(f.name)
--- End diff --

So, the dot is more natural for Spark users.


[GitHub] incubator-hivemall issue #62: [HIVEMALL-89][SQL] Support to_from/from_csv in...

2017-03-09 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/62
  
Updated descriptions for the two funcs.


[GitHub] incubator-hivemall issue #59: [HIVEMALL-85] Upgrade hivemall-xgboost's hadoo...

2017-03-05 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/59
  
@wangyum Why did you select `2.6.5` in this PR? Any reason?


[GitHub] incubator-hivemall pull request #100: [HOTFIX] Update documents for DataFram...

2017-07-11 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/100

[HOTFIX] Update documents for DataFrame in Spark

## What changes were proposed in this pull request?
This pr updated documents for `DataFrame` in Spark.

## What type of PR is it?
[Bug Fix | Hot Fix]

## What is the Jira issue?
N/A

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HOTFIX-20170712

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/100.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #100


commit 2d036aa76d5365ad9a4a3b4d3272232369f114f6
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-07-12T04:37:31Z

hotfix




[GitHub] incubator-hivemall issue #100: [HOTFIX] Update documents for DataFrame in Sp...

2017-07-12 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/100
  
Merged to master


[GitHub] incubator-hivemall pull request #99: [HIVEMALL-116][SQL][DOC] Add docs for S...

2017-07-11 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/99

[HIVEMALL-116][SQL][DOC] Add docs for SQL cases in hivemall-spark

## What changes were proposed in this pull request?
This pr added docs for SQL cases in `hivemall-spark`.

## What type of PR is it?
Documentation

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-116

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-116

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/99.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #99


commit e593d704cad18e897fd1187861855f389ed5184e
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-07-11T13:51:27Z

Add SQL docs for hivemall-spark




[GitHub] incubator-hivemall pull request #106: [HIVEMALL-136][SPARK] Support train_cl...

2017-07-27 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/106

[HIVEMALL-136][SPARK] Support train_classifier and train_regressor for Spark

## What changes were proposed in this pull request?
This pr added functions `train_classifier` and `train_regressor` in 
`HivemallOps`.

## What type of PR is it?
Improvement

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-136

## How was this patch tested?
Added tests in `HivemallOpsWithFeatureSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-136

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/106.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #106


commit a71de06f94b4acf5f53d8bd2ec5fe73c2e589b03
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-07-27T13:44:34Z

Support train_classifier and train_regressor




[GitHub] incubator-hivemall pull request #99: [HIVEMALL-116][SPARK][DOC] Add docs for...

2017-07-11 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/99#discussion_r126836953
  
--- Diff: docs/gitbook/spark/regression/e2006_sql.md ---
@@ -0,0 +1,151 @@
+
+
+E2006
+===

+http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#E2006-tfidf
+
+Data preparation
+
+
+```sh
+$ wget 
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/E2006.train.bz2
+$ wget 
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/E2006.test.bz2
+```
+
+```scala
+scala> :paste
+spark.read.format("libsvm").load("E2006.train.bz2")
+  .select($"label", to_hivemall_features($"features").as("features"))
+  .createOrReplaceTempView("rawTrainTable")
+
+// `label` must be [0.0, 1.0]
+sql("""
+  CREATE OR REPLACE TEMPORARY VIEW trainTable AS
+SELECT rescale(label, -7.899578, -0.51940954) AS label, features
--- End diff --

Fixed


[GitHub] incubator-hivemall issue #99: [HIVEMALL-116][SPARK][DOC] Add docs for SQL ca...

2017-07-11 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/99
  
Merged to master


[GitHub] incubator-hivemall issue #95: [HIVEMALL-119] Fix type cast issues in XGBoost...

2017-07-05 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/95
  
@amaya382 Can you check this?


[GitHub] incubator-hivemall pull request #95: [HIVEMALL-119] Fix type cast issues in ...

2017-07-06 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/95#discussion_r125845972
  
--- Diff: xgboost/src/main/java/hivemall/xgboost/XGBoostUDTF.java ---
@@ -326,7 +320,7 @@ public void close() throws HiveException {
 logger.info("model_id:" + modelId.toString() + " size:" + 
predModel.length);
 forward(new Object[] {modelId, predModel});
 } catch (Exception e) {
--- End diff --

It seems we can't, because `close()` only throws `HiveException`.
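
The thread does not show the exact suggestion being answered here, so the following is only a sketch of the general constraint: when a method's signature permits a single checked exception type, a broad `catch` has to wrap whatever it caught rather than rethrow it directly. `StubHiveException`, `Step`, and `CloseSketch` are hypothetical stand-ins so the example compiles without Hive on the classpath; this is not the code in `XGBoostUDTF`.

```java
/** Illustration only; hypothetical names, not the Hivemall implementation. */
final class CloseSketch {

    /** Stand-in for Hive's HiveException so the sketch compiles without Hive. */
    static final class StubHiveException extends Exception {
        StubHiveException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical unit of work that may fail with any checked exception. */
    interface Step {
        void run() throws Exception;
    }

    /** Mirrors a close() whose signature allows only the Hive exception type. */
    static void close(Step step) throws StubHiveException {
        try {
            step.run();
        } catch (Exception e) {
            // Rethrowing `e` as-is would not compile here, so wrap it and keep the cause.
            throw new StubHiveException("close() failed", e);
        }
    }
}
```

Keeping the original exception as the cause means the underlying failure still shows up in the stack trace.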


[GitHub] incubator-hivemall pull request #95: [HIVEMALL-119] Fix type cast issues in ...

2017-07-06 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/95#discussion_r125930040
  
--- Diff: xgboost/src/main/java/hivemall/xgboost/XGBoostUDTF.java ---
@@ -269,44 +270,35 @@ public void checkTargetValue(double target) throws 
HiveException {}
 public void process(Object[] args) throws HiveException {
--- End diff --

Is it ok to just call `mvn formatter:format`?


[GitHub] incubator-hivemall issue #95: [HIVEMALL-119] Fix type cast issues in XGBoost...

2017-07-05 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/95
  
ok, I hit the same error. I'll check again. Thanks.


[GitHub] incubator-hivemall issue #95: [HIVEMALL-119] Fix type cast issues in XGBoost...

2017-07-05 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/95
  
Without the `HadoopUtils.getTaskId()` call 
[here](https://github.com/maropu/incubator-hivemall/blob/e9fc6cfabd295c4c49faf43c4a44fe9eca2c9025/xgboost/src/main/java/hivemall/xgboost/XGBoostUDTF.java#L290),
it works fine, but I don't know why.



[GitHub] incubator-hivemall issue #95: [HIVEMALL-119] Fix type cast issues in XGBoost...

2017-07-05 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/95
  
@amaya382 Could you check again? I confirmed it works in my local env.


[GitHub] incubator-hivemall issue #75: [HIVEMALL-100] Fix build script

2017-04-27 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/75
  
LGTM


[GitHub] incubator-hivemall issue #103: [HIVEMALL-133][SPARK][WIP] Support spark-v2.2...

2017-07-27 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/103
  
ok


[GitHub] incubator-hivemall pull request #78: [HIVEMALL-103][Spark] Upgrade spark-v2....

2017-05-11 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/78

[HIVEMALL-103][Spark] Upgrade spark-v2.1.0 to v2.1.1

## What changes were proposed in this pull request?
This pr upgraded spark-v2.1.0 to v2.1.1.

## What type of PR is it?
Improvement

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-103

## How was this patch tested?
Existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-103

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/78.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #78


commit de9f2ce2d6cefa0228122edaf872e3ad9068a7c0
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-05-12T03:39:00Z

Upgrade spark-v2.1.0 to v2.1.1




[GitHub] incubator-hivemall issue #78: [HIVEMALL-103][Spark] Upgrade spark-v2.1.0 to ...

2017-05-12 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/78
  
ok


[GitHub] incubator-hivemall issue #78: [HIVEMALL-103][Spark] Upgrade spark-v2.1.0 to ...

2017-05-12 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/78
  
merged to master.


[GitHub] incubator-hivemall pull request #80: [WIP][HIVEMALL-99] Cross-compilation of...

2017-05-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/80#discussion_r117665868
  
--- Diff: bin/build_xgboost.sh ---
@@ -1,87 +0,0 @@
-#!/bin/bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-
-# xgboost requires g++-4.6 or higher 
(https://github.com/dmlc/xgboost/blob/master/doc/build.md),
-# so we need to first check if the requirement is satisfied.
-COMPILER_REQUIRED_VERSION="4.6"
-COMPILER_VERSION=`g++ --version 2> /dev/null`
-
-# Check if GNU g++ installed
-if [ $? = 127 ]; then
--- End diff --

We'd be better off printing explicit error messages when `clang` is used. 
Could you?


[GitHub] incubator-hivemall pull request #80: [WIP][HIVEMALL-99] Cross-compilation of...

2017-05-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/80#discussion_r117657316
  
--- Diff: xgboost/src/main/java/hivemall/xgboost/NativeLibLoader.java ---
@@ -54,15 +55,47 @@ private static boolean hasResource(String path) {
 }
 
 private static String getOSName() {
-return System.getProperty("os.name");
+return System.getProperty("os.name").toLowerCase();
+}
+
+private static String getOSArch() {
+return System.getProperty("os.arch").toLowerCase();
+}
+
+private static String getOSArchString() {
+String os = getOSName();
+if(os.startsWith("linux")) {
+os = "linux";
+} else if(os.startsWith("mac")) {
+os = "darwin";
+} else if(os.startsWith("windows")) {
+os = "windows";
+}
+
+String arch = getOSArch();
+if(arch.equals("amd64") || arch.equals("x86_64")) {
+arch = "x64";
+} else if(arch.endsWith("86")) {
+arch = "x86";
+} else if(arch.indexOf("arm64") != -1) {
+arch = "arm64";
+} else if(arch.indexOf("armv6") != -1) {
+arch = "armv6";
+} else if(arch.indexOf("armv7") != -1) {
+arch = "armv7";
+} else if(arch.indexOf("ppc") != -1) {
+arch = "ppc64le";
+}
+
+return os + "-" + arch;
--- End diff --

I think you could refer to [the 
code](https://github.com/xerial/snappy-java/blob/master/src/main/java/org/xerial/snappy/OSInfo.java)
 in `snappy-java`, which handles almost all cases for detecting the arch.
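
As a rough sketch of that kind of normalization (not a copy of `snappy-java`'s `OSInfo`, and not the code in this PR), the helper below maps raw `os.name` and `os.arch` values to a platform key. The class name `PlatformSketch`, the key format, and the set of recognized values are assumptions for illustration. It lowercases with `Locale.ROOT` so the result does not depend on the default locale, and it returns an explicit `unknown` instead of passing unrecognized values through.

```java
import java.util.Locale;

/** Illustration only; hypothetical names and mapping table. */
final class PlatformSketch {

    /** Maps a raw os.name value to a short OS identifier. */
    static String normalizeOs(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("linux")) return "linux";
        if (os.startsWith("mac") || os.startsWith("darwin")) return "darwin";
        if (os.startsWith("windows")) return "windows";
        return "unknown";
    }

    /** Maps a raw os.arch value to a short architecture identifier. */
    static String normalizeArch(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) return "x64";
        if (arch.equals("x86") || arch.matches("i[3-6]86")) return "x86";
        if (arch.equals("aarch64") || arch.equals("arm64")) return "arm64";
        if (arch.startsWith("armv7")) return "armv7";
        if (arch.startsWith("armv6")) return "armv6";
        if (arch.equals("ppc64le")) return "ppc64le";
        return "unknown";
    }

    /** Builds a key such as "linux-x64" that could select a bundled native library. */
    static String platformKey(String osName, String osArch) {
        return normalizeOs(osName) + "-" + normalizeArch(osArch);
    }

    public static void main(String[] args) {
        // Prints the key for the JVM this runs on; the value depends on the machine.
        System.out.println(platformKey(System.getProperty("os.name"), System.getProperty("os.arch")));
    }
}
```

Taking the raw strings as parameters, rather than reading system properties inside the helpers, makes the mapping testable for platforms that are not available locally.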


[GitHub] incubator-hivemall issue #80: [WIP][HIVEMALL-99] Cross-compilation of XGBoos...

2017-05-21 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/80
  
Yea, I also think we need to use `qemu` to test them.


[GitHub] incubator-hivemall pull request #80: [WIP][HIVEMALL-99] Cross-compilation of...

2017-05-22 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/80#discussion_r117690004
  
--- Diff: bin/build_xgboost.sh ---
@@ -1,87 +0,0 @@
-#!/bin/bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-
-# xgboost requires g++-4.6 or higher 
(https://github.com/dmlc/xgboost/blob/master/doc/build.md),
-# so we need to first check if the requirement is satisfied.
-COMPILER_REQUIRED_VERSION="4.6"
-COMPILER_VERSION=`g++ --version 2> /dev/null`
-
-# Check if GNU g++ installed
-if [ $? = 127 ]; then
--- End diff --

Ah, ok. But I think we should keep a script to build `xgboost` on native 
environments for CPU optimization (I think it'd be better to follow 
the same approach as `snappy-java`).


[GitHub] incubator-hivemall pull request #80: [WIP][HIVEMALL-99] Cross-compilation of...

2017-05-21 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/80#discussion_r117657090
  
--- Diff: bin/build_xgboost.sh ---
@@ -1,87 +0,0 @@
-#!/bin/bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements.  See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership.  The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License.  You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied.  See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-
-# xgboost requires g++-4.6 or higher 
(https://github.com/dmlc/xgboost/blob/master/doc/build.md),
-# so we need to first check if the requirement is satisfied.
-COMPILER_REQUIRED_VERSION="4.6"
-COMPILER_VERSION=`g++ --version 2> /dev/null`
-
-# Check if GNU g++ installed
-if [ $? = 127 ]; then
--- End diff --

@amaya382 Does the new script work for both gcc and clang?


---


[GitHub] incubator-hivemall issue #80: [WIP][HIVEMALL-99] Cross-compilation of XGBoos...

2017-06-15 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/80
  
Yeah, I think so. I just mean that I can't reproduce it on my laptop, so I 
can't look into this issue...


---


[GitHub] incubator-hivemall issue #80: [WIP][HIVEMALL-99] Cross-compilation of XGBoos...

2017-06-14 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/80
  
@amaya382 Aha, I haven't seen that exception. Actually, I didn't check the 
behaviour in Hive. Could you look into this issue?


---


[GitHub] incubator-hivemall issue #80: [WIP][HIVEMALL-99] Cross-compilation of XGBoos...

2017-06-15 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/80
  
@amaya382 Could you file a JIRA ticket for that?


---


[GitHub] incubator-hivemall pull request #122: [HIVEMALL-147][Spark] Support all Hive...

2017-10-15 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/122

[HIVEMALL-147][Spark] Support all Hivemall functions of v0.5-rc.1 in Spark 
Dataframe

## What changes were proposed in this pull request?
This PR adds more Hivemall functions to the Spark DataFrame API. However, some 
of the functions are not supported here because Spark cannot handle them 
(e.g., unsupported types, or return types that depend on options).

## What type of PR is it?
Feature

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-147

## How was this patch tested?
Added tests in `HivemallOpsWithFeatureSuite`.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-147-2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/122.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #122


commit 4963c2e71279c095759ba4f545cbbb47cff667b7
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-10-14T15:11:19Z

Support all Hivemall functions of v0.5-rc.1 in Spark Dataframe




---


[GitHub] incubator-hivemall pull request #122: [HIVEMALL-147][Spark] Support all Hive...

2017-10-15 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/122#discussion_r144753835
  
--- Diff: core/src/main/java/hivemall/evaluation/HitRateUDAF.java ---
@@ -71,9 +71,6 @@
 + " - Returns HitRate")
 public final class HitRateUDAF extends AbstractGenericUDAFResolver {
 
-// prevent instantiation
-private HitRateUDAF() {}
-
--- End diff --

This private constructor prevents Spark from instantiating the UDAF via 
reflection. Can we remove it?
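
To illustrate the problem, here is a minimal, self-contained sketch of why a private no-arg constructor gets in the way of reflective loading. The class names (`HiddenCtorResolver`, `OpenCtorResolver`) are hypothetical stand-ins, not Hivemall or Spark classes, and I'm assuming the loader looks up a public no-arg constructor; the exact reflective call Spark uses is not shown in this thread.

```java
import java.lang.reflect.Constructor;

public class PrivateCtorReflectionDemo {

    // Hypothetical stand-in for a resolver that hides its constructor.
    public static final class HiddenCtorResolver {
        private HiddenCtorResolver() {}
    }

    // Hypothetical stand-in for a resolver with a public constructor.
    public static final class OpenCtorResolver {
        public OpenCtorResolver() {}
    }

    /** Returns true if a public no-arg constructor can be found and invoked. */
    public static boolean canInstantiate(Class<?> clazz) {
        try {
            // getConstructor() only returns public constructors, so a private
            // one results in NoSuchMethodException.
            Constructor<?> ctor = clazz.getConstructor();
            ctor.newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiate(HiddenCtorResolver.class)); // prints false
        System.out.println(canInstantiate(OpenCtorResolver.class));   // prints true
    }
}
```

Removing the private constructor leaves the implicit public default constructor in place for a public class, which is what a reflective loader of this kind needs.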


---


[GitHub] incubator-hivemall pull request #122: [HIVEMALL-147][Spark] Support all Hive...

2017-10-15 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/122#discussion_r144753777
  
--- Diff: core/src/main/java/hivemall/evaluation/AUCUDAF.java ---
@@ -110,7 +110,7 @@ public ClassificationEvaluator() {}
 
 @Override
 public ObjectInspector init(Mode mode, ObjectInspector[] parameters) throws HiveException {
-    assert (parameters.length == 2 || parameters.length == 3) : parameters.length;
+    assert (0 < parameters.length && parameters.length <= 3) : parameters.length;
--- End diff --

In Spark, the original assertion fails because Spark passes a single element 
in `parameters` here for the final output (IIUC, [`AUC` finally outputs a single double-typed value for each group](https://github.com/apache/incubator-hivemall/pull/122/files#diff-9d758588c8fad559a15d0b2362e757b2R1134)).
Does this work well in Hive?
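
For reference, a stricter alternative to the relaxed range check would be to make the check depend on the mode. The sketch below is self-contained and uses a simplified stand-in for Hive's `GenericUDAFEvaluator.Mode`; `isValidArity` is a hypothetical helper, not part of this PR. It relies on my understanding of Hive's documented contract that `init` receives the original arguments in `PARTIAL1`/`COMPLETE` and a single partial aggregation in `PARTIAL2`/`FINAL`.

```java
public class UdafInitArityDemo {

    // Simplified stand-in for Hive's GenericUDAFEvaluator.Mode.
    public enum Mode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

    /**
     * Returns true if the number of init() parameters is plausible for the mode.
     * minArgs and maxArgs describe the arity of the original function call.
     */
    public static boolean isValidArity(Mode mode, int numParams, int minArgs, int maxArgs) {
        switch (mode) {
            case PARTIAL1:
            case COMPLETE:
                // The evaluator sees the original input columns.
                return minArgs <= numParams && numParams <= maxArgs;
            case PARTIAL2:
            case FINAL:
                // The evaluator sees exactly one partial aggregation.
                return numParams == 1;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        // AUC takes 2 or 3 arguments in the original call.
        System.out.println(isValidArity(Mode.COMPLETE, 2, 2, 3)); // prints true
        System.out.println(isValidArity(Mode.FINAL, 1, 2, 3));    // prints true
        System.out.println(isValidArity(Mode.COMPLETE, 1, 2, 3)); // prints false
    }
}
```

Compared with `0 < parameters.length && parameters.length <= 3`, a mode-aware check still rejects a one-argument call on the raw input while accepting the single-element case described above.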




---


[GitHub] incubator-hivemall pull request #112: [HIVEMALL-133][SPARK] Support spark-v2...

2017-09-11 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/112

[HIVEMALL-133][SPARK] Support spark-v2.2 in the hivemalls-spark module

## What changes were proposed in this pull request?
This PR adds support for spark-2.2 in Hivemall.

This pr is currently WIP because:
1. Java 7 support has been dropped in spark-v2.2 while Hivemall still supports 
it, so we need some entries to check the Java version when `spark-2.2` is enabled.
2. We need to move common code into `spark/spark-common`.

## What type of PR is it?
Improvement

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-133

## How was this patch tested?
Existing tests


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-133

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/112.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #112


commit 2d2750454c1567ba0e7a3af1401b9a3b4cbfda1f
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-07-20T02:18:05Z

Support spark-2.2

commit cbda47a8fcd667028256c722c0905d0553ea7945
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-07-27T14:54:13Z

Add enforce-plugin to validate java source/target versions

commit 18df884a0d36a7cd272f712c2b4414b212d958ee
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-07-27T15:22:14Z

Fix style errors

commit 55fda48afbc1af3f95ea5b40d0645b4d149cba72
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-07-27T15:22:24Z

Update .travis.yml

commit 95ec7833032701d6e87b19cad0ebedbc0a8f6cf4
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-07-28T02:09:21Z

Add bin/run_travis_tests.sh




---


[GitHub] incubator-hivemall issue #103: [HIVEMALL-133][SPARK][WIP] Support spark-v2.2...

2017-09-11 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/103
  
See #112


---


[GitHub] incubator-hivemall issue #103: [HIVEMALL-133][SPARK][WIP] Support spark-v2.2...

2017-09-10 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/103
  
Thanks, I'll check later


---


[GitHub] incubator-hivemall pull request #113: [HIVEMALL-136][SPARK] Support train_cl...

2017-09-11 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/113

[HIVEMALL-136][SPARK] Support train_classifier and train_regressor for 
Spark 

## What changes were proposed in this pull request?
This PR adds the functions `train_classifier` and `train_regressor` to 
`HivemallOps`.

## What type of PR is it?
Improvement

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-136

## How was this patch tested?
Added tests in `HivemallOpsWithFeatureSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-136

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/113.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #113


commit 3ec110dd347bbde9fc1e78a6af37deb080292388
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-07-27T13:44:34Z

Support train_classifier and train_regressor




---


[GitHub] incubator-hivemall issue #106: [HIVEMALL-136][SPARK] Support train_classifie...

2017-09-11 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/106
  
See #113


---


[GitHub] incubator-hivemall pull request #130: [HIVEMALL][SPARK][WIP] Fix Spark-relat...

2018-01-09 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/130

[HIVEMALL][SPARK][WIP] Fix Spark-related artifact issues

## What changes were proposed in this pull request?
The objective of this PR is to fix the Spark-related artifacts so that hivemall-v0.5.0 can be released at the ASF.

TODO
 - Update [the Release Guide](https://github.com/apache/incubator-hivemall/blob/master/src/site/markdown/release-guide.md) for Spark modules.

## What type of PR is it?

Bug Fix

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall 
FixSparkArtifactIssues

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/130.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #130


commit 0cef6bd4023198e0e7e9945d651268021da51dd5
Author: Takeshi Yamamuro <yamamuro@...>
Date:   2018-01-09T14:23:55Z

Fix Spark-related artifact issues




---


[GitHub] incubator-hivemall issue #130: [HIVEMALL][SPARK][WIP] Fix Spark-related arti...

2018-01-09 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/130
  
Yeah, never mind. The main goal of this PR is to resolve all the issues for 
the upcoming releases.


---


[GitHub] incubator-hivemall pull request #130: [HIVEMALL][SPARK][WIP] Fix Spark-relat...

2018-01-10 Thread maropu
Github user maropu closed the pull request at:

https://github.com/apache/incubator-hivemall/pull/130


---


[GitHub] incubator-hivemall issue #130: [HIVEMALL][SPARK][WIP] Fix Spark-related arti...

2018-01-10 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/130
  
See #131


---


[GitHub] incubator-hivemall pull request #131: [HIVEMALL][SPARK] Update release-guide...

2018-01-10 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/131

[HIVEMALL][SPARK] Update release-guide.md for spark releases

## What changes were proposed in this pull request?
This pr updated `release-guide.md` for spark releases.

## What type of PR is it?
Documentation

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall 
UpdateASFReleaseGuide

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/131.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #131


commit e24a78d6276f853b497ad6ba7f6c8e4f95b58644
Author: Takeshi Yamamuro <yamamuro@...>
Date:   2018-01-11T01:53:10Z

Update release-guide.md for spark releases




---


[GitHub] incubator-hivemall issue #139: [HIVEMALL-182][SPARK][WIP] Add an optimizer r...

2018-08-14 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/139
  
Sorry for the slow progress. I'm checking the feasibility in a separate repo 
of mine (because there are some issues to solve): 
https://github.com/maropu/spark-catalyst-rule-rewiter/tree/master
So, please give me more time. Thanks.


---


[GitHub] incubator-hivemall issue #141: [HIVEMALL-117][SPARK] Update the installation...

2018-04-04 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/141
  
Finished: https://spark-packages.org/package/apache-hivemall/apache-hivemall
@myui Could you check again?


---


[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions

2018-04-05 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/135
  
@myui Spark already has these functions: 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3118


---


[GitHub] incubator-hivemall issue #137: [HIVEMALL-179][SPARK] Support spark-v2.3

2018-03-28 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/137
  
merged to master


---


[GitHub] incubator-hivemall pull request #138: [HIVEMALL-180][SPARK] Drop the Spark-2...

2018-03-28 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/138

[HIVEMALL-180][SPARK] Drop the Spark-2.0 support

## What changes were proposed in this pull request?
This pr dropped the module for Spark-2.0.

## What type of PR is it?
Improvement

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-180

## How was this patch tested?
Existing tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-180

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/138.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #138






---


[GitHub] incubator-hivemall issue #139: [HIVEMALL-182][SPARK][WIP] Add an optimizer r...

2018-04-01 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/139
  
I'll fix later.


---


[GitHub] incubator-hivemall pull request #141: [HIVEMALL-117][SPARK] Update the insta...

2018-04-03 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/141

[HIVEMALL-117][SPARK] Update the installation guide for Spark

## What changes were proposed in this pull request?
This pr updated the installation guide for Spark.

## What type of PR is it?
Documentation

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-117

## How was this patch tested?
N/A



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-117

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/141.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #141


commit 1c0eb11b3095f8891d95ba84a84019c2e0142d47
Author: Takeshi Yamamuro <yamamuro@...>
Date:   2018-04-04T01:27:27Z

Update the installation guide for Spark




---


[GitHub] incubator-hivemall issue #141: [HIVEMALL-117][SPARK] Update the installation...

2018-04-03 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/141
  
I'll create a new GitHub account for this purpose and then move the repo 
there.
So, this is pending until the move is finished.


---


[GitHub] incubator-hivemall pull request #139: [HIVEMALL-182][SPARK][WIP] Add an opti...

2018-03-29 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/139

[HIVEMALL-182][SPARK][WIP] Add an optimizer rule to filter out columns with 
low variances

## What changes were proposed in this pull request?
This pr added a new optimizer rule `VarianceThreshold` in Spark.
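
As a rough illustration of the underlying idea only (this is not the Catalyst rule itself, and the PR text does not show how `VarianceThreshold` is implemented), variance-threshold feature selection on plain in-memory columns could look like the sketch below. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class VarianceThresholdDemo {

    /** Population variance of one column. */
    public static double variance(double[] column) {
        double mean = 0.0;
        for (double v : column) {
            mean += v;
        }
        mean /= column.length;
        double sumSq = 0.0;
        for (double v : column) {
            sumSq += (v - mean) * (v - mean);
        }
        return sumSq / column.length;
    }

    /** Returns the indices of columns whose variance is strictly above the threshold. */
    public static List<Integer> selectColumns(double[][] columns, double threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            if (variance(columns[i]) > threshold) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[][] columns = {
            {1.0, 1.0, 1.0, 1.0}, // constant column, variance 0
            {0.0, 2.0, 4.0, 6.0}, // variance 5
            {3.0, 3.1, 2.9, 3.0}  // nearly constant column
        };
        System.out.println(selectColumns(columns, 0.1)); // prints [1]
    }
}
```

An optimizer rule would presumably apply the same decision to column statistics and prune the low-variance columns from the query plan, rather than scanning arrays as done here.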

TODO
 - Add docs in gitbook
 - Add more tests
 - Brush up `VarianceThreshold` code

## What type of PR is it?
Feature

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-182

## How was this patch tested?
Added tests in `FeatureSelectionRuleSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-182

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/139.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #139


commit dc5af08c6a18fb9f4cebf7d7f619cbd053165e1f
Author: Takeshi Yamamuro <yamamuro@...>
Date:   2018-03-29T22:26:40Z

Add an optimizer rule to filter out columns with low variances




---


[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...

2018-11-13 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/171#discussion_r233262036
  
--- Diff: spark/spark-2.3/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
@@ -1935,18 +1935,6 @@ object HivemallOps {
     )
   }
 
-  /**
-   * @see [[hivemall.tools.array.SubarrayUDF]]
-   * @group tools.array
-   */
-  def subarray(original: Column, fromIndex: Column, toIndex: Column): Column = withExpr {
-    planHiveUDF(
-      "hivemall.tools.array.SubarrayUDF",
-      "subarray",
-      original :: fromIndex :: toIndex :: Nil
-    )
-  }
--- End diff --

We probably need to support the Brickhouse functions for Spark in a separate 
follow-up PR.



---


[GitHub] incubator-hivemall issue #171: [SPARK][HOTFIX][WIP] Fix existing test failur...

2018-11-13 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/incubator-hivemall/pull/171
  
Not finished yet (I'm still working).


---


[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...

2018-11-13 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/171#discussion_r233262624
  
--- Diff: spark/spark-2.3/src/test/scala/org/apache/spark/sql/hive/XGBoostSuite.scala ---
@@ -77,6 +77,7 @@ final class XGBoostSuite extends VectorQueryTest {
     val model = hiveContext.sparkSession.read.format("libxgboost").load(tempDir)
     val predict = model.join(mllibTestDf)
       .xgboost_predict($"rowid", $"features", $"model_id", $"pred_model")
--- End diff --

When invoking `xgboost_predict`, an assertion inside the xgboost library 
fails. I'm still looking into this failure, but I think we could skip the 
xgboost support for spark-2.3 in the upcoming release.


---


[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...

2018-11-13 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/171#discussion_r233288446
  
--- Diff: spark/spark-2.3/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
@@ -1935,18 +1935,6 @@ object HivemallOps {
     )
   }
 
-  /**
-   * @see [[hivemall.tools.array.SubarrayUDF]]
-   * @group tools.array
-   */
-  def subarray(original: Column, fromIndex: Column, toIndex: Column): Column = withExpr {
-    planHiveUDF(
-      "hivemall.tools.array.SubarrayUDF",
-      "subarray",
-      original :: fromIndex :: toIndex :: Nil
-    )
-  }
--- End diff --

I'll check


---


[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...

2018-11-13 Thread maropu
GitHub user maropu opened a pull request:

https://github.com/apache/incubator-hivemall/pull/171

[SPARK][HOTFIX][WIP] Fix existing test failures in spark-2.3

## What changes were proposed in this pull request?
This pr is to fix the test failures for spark-2.3.

## How was this patch tested?
Run the existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/maropu/incubator-hivemall HOTFIX-20181114

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-hivemall/pull/171.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #171


commit cde6fa6d11c2d6e23d52c3db282b9d66b69f4ee9
Author: Takeshi Yamamuro 
Date:   2018-11-13T23:14:33Z

Fix existing issues in spark-2.3




---


[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...

2018-11-13 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/171#discussion_r233261637
  
--- Diff: spark/pom.xml ---
@@ -52,6 +52,12 @@
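 <artifactId>hivemall-core</artifactId>
 <version>${project.version}</version>
 <scope>compile</scope>
+<exclusions>
+    <exclusion>
+        <groupId>io.netty</groupId>
+        <artifactId>netty-all</artifactId>
+    </exclusion>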
--- End diff --

This exclusion is needed because the `netty` version conflicts with the one in Spark.


---


[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX] Fix existing test fail...

2018-11-14 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/incubator-hivemall/pull/171#discussion_r233399519
  
--- Diff: spark/spark-2.3/src/test/scala/org/apache/spark/sql/hive/XGBoostSuite.scala ---
@@ -77,6 +77,7 @@ final class XGBoostSuite extends VectorQueryTest {
     val model = hiveContext.sparkSession.read.format("libxgboost").load(tempDir)
     val predict = model.join(mllibTestDf)
       .xgboost_predict($"rowid", $"features", $"model_id", $"pred_model")
--- End diff --

It seems the test fails inside JNI, so we have no stack trace:
```
XGBoostSuite:
- resolve libxgboost
- check XGBoost options
AssertError:read can not have position excceed buffer length
...
```


---