[GitHub] incubator-hivemall issue #35: [HIVEMALL-31][SPARK] Support Spark-v2.1.0
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/35 Merged --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-hivemall issue #37: [HIVEMALL-47][SPARK] Support codegen for top-K...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/37

Updated the benchmark; the size of the left table is ~140MB and the size of the right table is ~70MB.

```
TestUtils.benchmark("codegen top-k join") {
  /**
   * Java HotSpot(TM) 64-Bit Server VM 1.8.0_31-b13 on Mac OS X 10.10.2
   * Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
   *
   * top_k_join:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
   * ------------------------------------------------------------------------------------
   * top_k_join wholestage off            3 /    5       2751.9           0.4       1.0X
   * top_k_join wholestage on             1 /    1       6494.4           0.2       2.4X
   */
```
[GitHub] incubator-hivemall issue #37: [HIVEMALL-47][SPARK] Support codegen for top-K...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/37

A codegen'd top-K join is as follows:

```
Found 1 WholeStageCodegen subtrees.
== Subtree 1 / 1 ==
*ShuffledHashJoinTopK -1, [group#10], [group#27]
:- Exchange hashpartitioning(group#10, 200)
:  +- LocalTableScan [userId#9, group#10, x#11, y#12]
+- Exchange hashpartitioning(group#27, 200)
   +- LocalTableScan [group#27, position#28, x#29, y#30]

Generated code:
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIterator(references);
/* 003 */ }
/* 004 */
/* 005 */ final class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 006 */   private Object[] references;
/* 007 */   private scala.collection.Iterator[] inputs;
/* 008 */   private org.apache.spark.sql.execution.joins.ShuffledHashJoinTopKExec shuffledhashjointopk_topKJoin;
/* 009 */   private org.apache.spark.sql.execution.joins.PriorityQueueShim shuffledhashjointopk_queue;
/* 010 */   private scala.collection.Iterator shuffledhashjointopk_leftIter;
/* 011 */   private InternalRow shuffledhashjointopk_leftRow;
/* 012 */   private int shuffledhashjointopk_value;
/* 013 */   private UTF8String shuffledhashjointopk_value1;
/* 014 */   private boolean shuffledhashjointopk_isNull;
/* 015 */   private double shuffledhashjointopk_value2;
/* 016 */   private double shuffledhashjointopk_value3;
/* 017 */   private int shuffledhashjointopk_value8;
/* 018 */   private double shuffledhashjointopk_value9;
/* 019 */   private org.apache.spark.sql.execution.joins.ShuffledHashJoinTopKExec shuffledhashjointopk_joinExec;
/* 020 */   private org.apache.spark.sql.execution.joins.HashedRelation shuffledhashjointopk_relation;
/* 021 */   private UnsafeRow shuffledhashjointopk_result;
/* 022 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder shuffledhashjointopk_holder;
/* 023 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter shuffledhashjointopk_rowWriter;
/* 024 */   private org.apache.spark.sql.catalyst.expressions.JoinedRow shuffledhashjointopk_joinedRow;
/* 025 */   private int shuffledhashjointopk_value23;
/* 026 */   private boolean shuffledhashjointopk_isNull18;
/* 027 */   private double shuffledhashjointopk_value24;
/* 028 */   private boolean shuffledhashjointopk_isNull19;
/* 029 */   private int shuffledhashjointopk_value25;
/* 030 */   private UTF8String shuffledhashjointopk_value26;
/* 031 */   private boolean shuffledhashjointopk_isNull20;
/* 032 */   private double shuffledhashjointopk_value27;
/* 033 */   private double shuffledhashjointopk_value28;
/* 034 */   private UTF8String shuffledhashjointopk_value29;
/* 035 */   private boolean shuffledhashjointopk_isNull21;
/* 036 */   private UTF8String shuffledhashjointopk_value30;
/* 037 */   private boolean shuffledhashjointopk_isNull22;
/* 038 */   private double shuffledhashjointopk_value31;
/* 039 */   private double shuffledhashjointopk_value32;
/* 040 */   private org.apache.spark.sql.execution.metric.SQLMetric shuffledhashjointopk_numOutputRows;
/* 041 */   private UnsafeRow shuffledhashjointopk_result1;
/* 042 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder shuffledhashjointopk_holder1;
/* 043 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter shuffledhashjointopk_rowWriter1;
/* 044 */
/* 045 */   public GeneratedIterator(Object[] references) {
/* 046 */     this.references = references;
/* 047 */   }
/* 048 */
/* 049 */   public void init(int index, scala.collection.Iterator[] inputs) {
/* 050 */     partitionIndex = index;
/* 051 */     this.inputs = inputs;
/* 052 */     wholestagecodegen_init_0();
/* 053 */     wholestagecodegen_init_1();
/* 054 */
/* 055 */   }
/* 056 */
/* 057 */   private void wholestagecodegen_init_0() {
/* 058 */     this.shuffledhashjointopk_topKJoin = (org.apache.spark.sql.execution.joins.ShuffledHashJoinTopKExec) references[0];
/* 059 */     shuffledhashjointopk_queue = shuffledhashjointopk_topKJoin.priorityQueue();
/* 060 */     shuffledhashjointopk_leftIter = inputs[0];
/* 061 */
/* 062 */     this.shuffledhashjointopk_joinExec = (org.apache.spark.sql.execution.joins.ShuffledHashJoinTopKExec) references[1];
/* 063 */
/* 064 */     shuffledhashjointopk_relation = (org.apache.spark.sql.execution.joins.HashedRelation) shuffledhashjointopk_joinExec.buildHashedRelation(inputs[1]);
/* 065 */     incPeakExecutionMemory(shuffledhashjointopk_relation.estimatedSize());
/* 066 */
/* 067 */     shuffledhashjointopk_result = new UnsafeRow(1);
/* 068
```
[GitHub] incubator-hivemall issue #38: Support spark-sql
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/38 LGTM cc: @myui
[GitHub] incubator-hivemall issue #36: [Spark] Update gitbook for top_k_join
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/36 okay, thanks
[GitHub] incubator-hivemall issue #41: [HIVEMALL-54][SPARK] Add an easy-to-use script...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/41 yea, I'll update just after this is merged.
[GitHub] incubator-hivemall issue #37: [HIVEMALL-47][SPARK] Support codegen for top-K...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/37 okay, merged.
[GitHub] incubator-hivemall issue #23: [HIVEMALL-31] Change the branch of spark-2.0 t...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/23 yea, could you add `[WIP]` in this title?
[GitHub] incubator-hivemall issue #24: [HIVEMALL-32] Print explicit error messages in...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/24 Merged.
[GitHub] incubator-hivemall pull request #29: [HIVEMALL-39] Put the use of HiveUDFs i...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/29

[HIVEMALL-39] Put the use of HiveUDFs in one place

## What changes were proposed in this pull request?
This is a refactoring issue; in the master, we directly use the logical plan nodes of Hive UDFs in `HivemallOps`. However, these nodes are internal classes of Spark and their interfaces may evolve. So, this PR created a new file `HivemallOpsImpl` and put these classes there.

## What type of PR is it?
Refactoring

### What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-39

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-39

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/29.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #29

commit 09c2233138f0976e0de8c871d319bd39f5819464
Author: Takeshi YAMAMURO <linguin@gmail.com>
Date: 2017-01-26T15:12:54Z

    Put the use of HiveUDFs in one place
[GitHub] incubator-hivemall issue #29: [HIVEMALL-39][SPARK] Put the use of HiveUDFs i...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/29 Merged.
[GitHub] incubator-hivemall pull request #34: [HIVEMALL-45][SPARK] Upgrade spark v2.0...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/34

[HIVEMALL-45][SPARK] Upgrade spark v2.0.0 to v2.0.2 (latest)

## What changes were proposed in this pull request?
This PR updated pom.xml for the upgrade.

## What type of PR is it?
Improvement

### What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-45

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-45

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/34.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #34

commit ff097f94165486d1b19b8a7790e49476dcb9e24f
Author: Takeshi YAMAMURO <linguin@gmail.com>
Date: 2017-01-31T04:05:44Z

    Upgrade spark v2.0.0 to v2.0.2 (latest)
[GitHub] incubator-hivemall pull request #26: [HIVEMALL-35] Remove unnecessary implic...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/26

[HIVEMALL-35] Remove unnecessary implicit conversions in HivemallUtils

## What changes were proposed in this pull request?
This PR removed entries for implicit conversion in `HivemallUtils`.

## What type of PR is it?
Improvement

### What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-35

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-35

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/26.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #26

commit d45657dc4aad5c647e8c702a1e8549670c5b1dee
Author: Takeshi YAMAMURO <linguin@gmail.com>
Date: 2017-01-25T14:21:32Z

    Remove unnecessary implicit conversions in HivemallUtils
[GitHub] incubator-hivemall pull request #25: [HIVEMALL-34] Fix a bug to wrongly use ...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/25

[HIVEMALL-34] Fix a bug to wrongly use mllib vectors in some functions

## What changes were proposed in this pull request?
`to_hivemall_features` and `append_bias` in `HivemallUtils` wrongly used mllib vectors; they should use ml vectors instead.

## What type of PR is it?
Bug Fix

### What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-34

## How was this patch tested?
Enabled a test in `HiveUdfWithVectorSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-34

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/25.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #25

commit d1c8b64176fd6caf9f2f00a51017ccce40df6eb9
Author: Takeshi YAMAMURO <linguin@gmail.com>
Date: 2017-01-25T10:06:02Z

    Fix a bug to wrongly use mllib vectors in some functions
[GitHub] incubator-hivemall issue #25: [HIVEMALL-34] Fix a bug to wrongly use mllib v...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/25 Merged!
[GitHub] incubator-hivemall pull request #28: [HIVEMALL-30] Temporarily ignore a stre...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/28

[HIVEMALL-30] Temporarily ignore a streaming test

## What changes were proposed in this pull request?
The test below sometimes fails (it is too flaky), so we temporarily ignore it. The stack trace of this failure is:

```
HivemallOpsWithFeatureSuite:
Exception in thread "broadcast-exchange-60" java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:231)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$4.apply(TorrentBroadcast.scala:231)
	at org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:78)
	at org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:65)
	at net.jpountz.lz4.LZ4BlockOutputStream.flushBufferedData(LZ4BlockOutputStream.java:205)
	at net.jpountz.lz4.LZ4BlockOutputStream.finish(LZ4BlockOutputStream.java:235)
	at net.jpountz.lz4.LZ4BlockOutputStream.close(LZ4BlockOutputStream.java:175)
	at java.io.ObjectOutputStream$BlockDataOutputStream.close(ObjectOutputStream.java:1827)
	at java.io.ObjectOutputStream.close(ObjectOutputStream.java:741)
	at org.apache.spark.serializer.JavaSerializationStream.close(JavaSerializer.scala:57)
	at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$blockifyObject$1.apply$mcV$sp(TorrentBroadcast.scala:238)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1296)
	at org.apache.spark.broadcast.TorrentBroadcast$.blockifyObject(TorrentBroadcast.scala:237)
	at org.apache.spark.broadcast.TorrentBroadcast.writeBlocks(TorrentBroadcast.scala:107)
	at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:86)
	at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
	at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:56)
	at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1370)
```

## What type of PR is it?
Bug Fix

### What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-30

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-30

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/28.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #28

commit db3de52892fee8027fab2a99499ee6658f1eb4fa
Author: Takeshi YAMAMURO <linguin@gmail.com>
Date: 2017-01-26T08:44:23Z

    Temporarily ignore a streaming test
[GitHub] incubator-hivemall issue #26: [HIVEMALL-35] Remove unnecessary implicit conv...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/26 I made a pr for this flaky test failure in #28, so I'll merge this.
[GitHub] incubator-hivemall issue #27: [HIVEMALL-36] Refactor each_top_k
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/27 okay, I'll merge this, then I'll check the OOM issue in follow-up activities. Thanks!
[GitHub] incubator-hivemall issue #29: [HIVEMALL-39] Put the use of HiveUDFs in one p...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/29 @myui could you check this before merging it?
[GitHub] incubator-hivemall issue #49: [HIVEMALL-26][SPARK] Make docs for regression ...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/49 Updated
[GitHub] incubator-hivemall pull request #49: [HIVEMALL-26][SPARK] Make docs for regr...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/49

[HIVEMALL-26][SPARK] Make docs for regression and binary classification

## What changes were proposed in this pull request?
This PR added docs for hivemall-on-spark.

## What type of PR is it?
Documentation

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-26

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-26-1

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/49.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #49

commit eabddeacd40e3c9d9b3b20938357f666f00132a1
Author: Takeshi Yamamuro <yamam...@apache.org>
Date: 2017-02-23T11:03:09Z

    Make docs for hivemall-on-spark
[GitHub] incubator-hivemall issue #49: [HIVEMALL-26][SPARK] Make docs for regression ...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/49 Many thanks!
[GitHub] incubator-hivemall issue #44: [HIVEMALL-65] Update define-all.spark and impo...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/44 LGTM. I'll merge this later.
[GitHub] incubator-hivemall issue #41: [HIVEMALL-54][SPARK] Add an easy-to-use script...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/41 Merged.
[GitHub] incubator-hivemall issue #20: [HIVEMALL-28] Set HIVEMALL_HOME to absolute pa...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/20 @wangyum Thanks for your work! What does this pr solve? Any issue in the current script?
[GitHub] incubator-hivemall issue #20: [HIVEMALL-28] Set HIVEMALL_HOME to absolute pa...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/20 @wangyum oh, I found you've already described that in the JIRA ticket. Could you write "what does this pr solve?" in this description? If you update that, LGTM. cc: @myui
[GitHub] incubator-hivemall pull request #54: [HIVEMALL-76][SPARK] Fix worng ranks in...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/54

[HIVEMALL-76][SPARK] Fix worng ranks in top-K funcs

## What changes were proposed in this pull request?
This PR fixed the Spark `each_top_k`/`top_k_join` behaviour so that it is in line with the Hive ones.

## What type of PR is it?
Bug Fix

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-76

## How was this patch tested?
Added tests in `HivemallOpsSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-76

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/54.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #54
[GitHub] incubator-hivemall issue #54: [HIVEMALL-76][SPARK] Fix worng ranks in top-K ...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/54 @myui passed.
[GitHub] incubator-hivemall issue #42: [HIVEMALL-38][SPARK] Support ChangeFinderUDF i...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/42 It's okay to merge
[GitHub] incubator-hivemall issue #62: [HIVEMALL-89][SQL] Support to_from/from_csv in...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/62 Merged to master
[GitHub] incubator-hivemall issue #59: [HIVEMALL-85] Upgrade hivemall-xgboost's hadoo...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/59 @wangyum Thanks for your continuous contributions! @myui do we have any reason to depend on hadoop-core `0.20.2-cdh3u6`? I just used this dependency in line with the other modules.
[GitHub] incubator-hivemall pull request #61: [HIVEMALL-88][SPARK] Support a function...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/61#discussion_r105089894

    --- Diff: spark/spark-2.1/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
    @@ -805,6 +805,47 @@ final class HivemallOps(df: DataFrame) extends Logging {
         JoinTopK(kInt, df.logicalPlan, right.logicalPlan, Inner, Option(joinExprs.expr))(score.named)
       }
     
    +  private def doFlatten(schema: StructType, prefix: Option[String] = None) : Seq[Column] = {
    +    schema.fields.flatMap { f =>
    +      val colName = prefix.map(p => s"$p.${f.name}").getOrElse(f.name)
    --- End diff --

In Spark, the dot is used as the separator of column names in a nested schema. Currently, Spark users cannot change this separator via configuration. For example,

```
scala> val ds = Seq((1, (1.0, "a"))).toDS()
ds: org.apache.spark.sql.Dataset[(Int, (Double, String))] = [_1: int, _2: struct<_1: double, _2: string>]

scala> ds.printSchema
root
 |-- _1: integer (nullable = false)
 |-- _2: struct (nullable = true)
 |    |-- _1: double (nullable = false)
 |    |-- _2: string (nullable = true)

scala> ds.select($"_2._2").show
+---+
| _2|
+---+
|  a|
+---+
```
[GitHub] incubator-hivemall pull request #61: [HIVEMALL-88][SPARK] Support a function...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/61#discussion_r105090944

    --- Diff: spark/spark-2.1/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
    @@ -805,6 +805,47 @@ final class HivemallOps(df: DataFrame) extends Logging {
         JoinTopK(kInt, df.logicalPlan, right.logicalPlan, Inner, Option(joinExprs.expr))(score.named)
       }
     
    +  private def doFlatten(schema: StructType, prefix: Option[String] = None) : Seq[Column] = {
    +    schema.fields.flatMap { f =>
    +      val colName = prefix.map(p => s"$p.${f.name}").getOrElse(f.name)
    --- End diff --

Ah, I found an issue:

```
scala> val df = Seq((1, (1.0, "a"))).toDF()
df: org.apache.spark.sql.DataFrame = [_1: int, _2: struct<_1: double, _2: string>]

scala> val ds1 = df.flatten().select("_2._1")
org.apache.spark.sql.AnalysisException: cannot resolve '`_2._1`' given input columns: [_1, _2._1, _2._2];;
'Project ['_2._1]
+- Project [_1#67 AS _1#73, _2#68._1 AS _2._1#74, _2#68._2 AS _2._2#75]
   +- LocalRelation [_1#67, _2#68]

  at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:75)
  at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:72)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
```

So, I'll reconsider this; please give me a sec. Thanks.
[GitHub] incubator-hivemall pull request #61: [HIVEMALL-88][SPARK] Support a function...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/61#discussion_r105093086

--- Diff: spark/spark-2.1/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
@@ -805,6 +805,47 @@ final class HivemallOps(df: DataFrame) extends Logging {
       JoinTopK(kInt, df.logicalPlan, right.logicalPlan, Inner, Option(joinExprs.expr))(score.named)
   }

+  private def doFlatten(schema: StructType, prefix: Option[String] = None) : Seq[Column] = {
+    schema.fields.flatMap { f =>
+      val colName = prefix.map(p => s"$p.${f.name}").getOrElse(f.name)
--- End diff --

Actually, we can access this column like this:

```
scala> val df = Seq((1, (1.0, "a"))).toDF()
df: org.apache.spark.sql.DataFrame = [_1: int, _2: struct<_1: double, _2: string>]

scala> val ds1 = df.flatten().select("`_2._1`").show
+-----+
|_2._1|
+-----+
|  1.0|
+-----+
```
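The difference between selecting `_2._1` and selecting the backtick-quoted name comes down to how a column reference is split into name parts. The following is a minimal sketch in plain Java under a simplifying assumption: dots separate name parts unless they appear inside backticks. `NameParts.parse` is a hypothetical helper written for illustration, not Spark's API, and it ignores escaping rules.

```java
import java.util.ArrayList;
import java.util.List;

final class NameParts {
    // Splits a column reference on dots, treating backtick-quoted text as a single part.
    // Simplified on purpose: escaped backticks and malformed input are not handled.
    static List<String> parse(String ref) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < ref.length(); i++) {
            char c = ref.charAt(i);
            if (c == '`') {
                quoted = !quoted;           // toggle quoting; the backtick itself is dropped
            } else if (c == '.' && !quoted) {
                parts.add(cur.toString());  // an unquoted dot ends the current part
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }
}
```

Under this rule, `_2._1` yields two parts (a column `_2` and a field `_1`), while the backtick-quoted form yields one part naming the flattened column `_2._1`, which is presumably why only the quoted form resolves after `flatten()`.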
[GitHub] incubator-hivemall pull request #61: [HIVEMALL-88][SPARK] Support a function...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/61#discussion_r105088535

--- Diff: spark/spark-2.1/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
@@ -805,6 +805,47 @@ final class HivemallOps(df: DataFrame) extends Logging {
       JoinTopK(kInt, df.logicalPlan, right.logicalPlan, Inner, Option(joinExprs.expr))(score.named)
   }

+  private def doFlatten(schema: StructType, prefix: Option[String] = None) : Seq[Column] = {
+    schema.fields.flatMap { f =>
+      val colName = prefix.map(p => s"$p.${f.name}").getOrElse(f.name)
--- End diff --

I know, but this is a Spark-local specification. So, the change you suggested makes `doFlatten` fail.
[GitHub] incubator-hivemall pull request #62: [HIVEMALL-89][SQL] Support to_from/from...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/62#discussion_r105090294

--- Diff: spark/spark-2.1/src/main/scala/org/apache/spark/sql/execution/datasources/csv/csvExpressions.scala ---
@@ -0,0 +1,153 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources.csv
+
+import java.io.CharArrayWriter
+
+import jodd.util.CsvUtil
--- End diff --

Updated
[GitHub] incubator-hivemall pull request #61: [HIVEMALL-88][SPARK] Support a function...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/61#discussion_r105099394

--- Diff: spark/spark-2.1/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
@@ -805,6 +805,47 @@ final class HivemallOps(df: DataFrame) extends Logging {
       JoinTopK(kInt, df.logicalPlan, right.logicalPlan, Inner, Option(joinExprs.expr))(score.named)
   }

+  private def doFlatten(schema: StructType, prefix: Option[String] = None) : Seq[Column] = {
+    schema.fields.flatMap { f =>
+      val colName = prefix.map(p => s"$p.${f.name}").getOrElse(f.name)
--- End diff --

@myui How about the latest fix? As you suggested, I added an option for the separator.
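To illustrate what a separator option changes, here is a minimal sketch in plain Java of flattening nested field names with a configurable separator. It is not Hivemall's `doFlatten`: the schema is modelled as a nested `Map<String, Object>` (a nested map stands for a struct), and `FlattenNames.flatten` and its parameters are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class FlattenNames {
    // Collects leaf field names depth-first, joining nested names with `sep`.
    // `prefix` is null at the top level; a Map value is treated as a nested struct.
    static List<String> flatten(Map<String, Object> schema, String prefix, String sep) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Object> e : schema.entrySet()) {
            String name = (prefix == null) ? e.getKey() : prefix + sep + e.getKey();
            if (e.getValue() instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> nested = (Map<String, Object>) e.getValue();
                out.addAll(flatten(nested, name, sep)); // recurse into the struct
            } else {
                out.add(name);                          // leaf field
            }
        }
        return out;
    }
}
```

For the schema `[_1: int, _2: struct<_1: double, _2: string>]`, a `.` separator gives `_1`, `_2._1`, `_2._2`, and an arbitrary alternative such as `$` gives `_1`, `_2$_1`, `_2$_2`, whose names no longer look like nested-field references.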
[GitHub] incubator-hivemall pull request #61: [HIVEMALL-88][SPARK] Support a function...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/61#discussion_r105090100

--- Diff: spark/spark-2.1/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala ---
@@ -805,6 +805,47 @@ final class HivemallOps(df: DataFrame) extends Logging {
       JoinTopK(kInt, df.logicalPlan, right.logicalPlan, Inner, Option(joinExprs.expr))(score.named)
   }

+  private def doFlatten(schema: StructType, prefix: Option[String] = None) : Seq[Column] = {
+    schema.fields.flatMap { f =>
+      val colName = prefix.map(p => s"$p.${f.name}").getOrElse(f.name)
--- End diff --

So, the dot is more natural for Spark users.
[GitHub] incubator-hivemall issue #62: [HIVEMALL-89][SQL] Support to_from/from_csv in...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/62

Updated descriptions for the two funcs.
[GitHub] incubator-hivemall issue #59: [HIVEMALL-85] Upgrade hivemall-xgboost's hadoo...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/59

@wangyum Why did you select `2.6.5` in this PR? Any particular reason?
[GitHub] incubator-hivemall pull request #100: [HOTFIX] Update documents for DataFram...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/100

[HOTFIX] Update documents for DataFrame in Spark

## What changes were proposed in this pull request?
This pr updated documents for `DataFrame` in Spark.

## What type of PR is it?
[Bug Fix | Hot Fix]

## What is the Jira issue?
N/A

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/incubator-hivemall HOTFIX-20170712

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/100.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #100

commit 2d036aa76d5365ad9a4a3b4d3272232369f114f6
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-07-12T04:37:31Z

    hotfix
[GitHub] incubator-hivemall issue #100: [HOTFIX] Update documents for DataFrame in Sp...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/100

Merged to master
[GitHub] incubator-hivemall pull request #99: [HIVEMALL-116][SQL][DOC] Add docs for S...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/99

[HIVEMALL-116][SQL][DOC] Add docs for SQL cases in hivemall-spark

## What changes were proposed in this pull request?
This pr added docs for SQL cases in `hivemall-spark`.

## What type of PR is it?
Documentation

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-116

## How was this patch tested?
N/A

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-116

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/99.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #99

commit e593d704cad18e897fd1187861855f389ed5184e
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-07-11T13:51:27Z

    Add SQL docs for hivemall-spark
[GitHub] incubator-hivemall pull request #106: [HIVEMALL-136][SPARK] Support train_cl...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/106

[HIVEMALL-136][SPARK] Support train_classifier and train_regressor for Spark

## What changes were proposed in this pull request?
This pr added functions `train_classifier` and `train_regressor` in `HivemallOps`.

## What type of PR is it?
Improvement

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-136

## How was this patch tested?
Added tests in `HivemallOpsWithFeatureSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-136

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/106.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #106

commit a71de06f94b4acf5f53d8bd2ec5fe73c2e589b03
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-07-27T13:44:34Z

    Support train_classifier and train_regressor
[GitHub] incubator-hivemall pull request #99: [HIVEMALL-116][SPARK][DOC] Add docs for...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/99#discussion_r126836953

--- Diff: docs/gitbook/spark/regression/e2006_sql.md ---
@@ -0,0 +1,151 @@
+
+
+E2006
+===
+http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html#E2006-tfidf
+
+Data preparation
+
+
+```sh
+$ wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/E2006.train.bz2
+$ wget http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression/E2006.test.bz2
+```
+
+```scala
+scala> :paste
+spark.read.format("libsvm").load("E2006.train.bz2")
+  .select($"label", to_hivemall_features($"features").as("features"))
+  .createOrReplaceTempView("rawTrainTable")
+
+// `label` must be [0.0, 1.0]
+sql("""
+  CREATE OR REPLACE TEMPORARY VIEW trainTable AS
+    SELECT rescale(label, -7.899578, -0.51940954) AS label, features
--- End diff --

Fixed
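The `rescale(label, -7.899578, -0.51940954)` call in the quoted doc is there to bring labels into [0.0, 1.0], given the minimum and maximum label values. A minimal Java sketch of min-max scaling follows, assuming that this is what `rescale(value, min, max)` computes; `MinMax.rescale` is a hypothetical helper, and the handling of an empty range is a choice made for this sketch only.

```java
final class MinMax {
    // Maps `value` from the range [min, max] onto [0.0, 1.0] by min-max scaling.
    // Rejecting max <= min is this sketch's choice, not necessarily the UDF's behaviour.
    static double rescale(double value, double min, double max) {
        if (max <= min) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        return (value - min) / (max - min);
    }
}
```

With the bounds from the doc, the minimum label maps to 0.0 and the maximum label maps to 1.0; everything in between lands proportionally inside that interval.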
[GitHub] incubator-hivemall issue #99: [HIVEMALL-116][SPARK][DOC] Add docs for SQL ca...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/99

Merged to master
[GitHub] incubator-hivemall issue #95: [HIVEMALL-119] Fix type cast issues in XGBoost...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/95

@amaya382 Can you check this?
[GitHub] incubator-hivemall pull request #95: [HIVEMALL-119] Fix type cast issues in ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/95#discussion_r125845972

--- Diff: xgboost/src/main/java/hivemall/xgboost/XGBoostUDTF.java ---
@@ -326,7 +320,7 @@ public void close() throws HiveException {
             logger.info("model_id:" + modelId.toString() + " size:" + predModel.length);
             forward(new Object[] {modelId, predModel});
         } catch (Exception e) {
--- End diff --

It seems we can't, because `close()` only throws `HiveException`.
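The constraint mentioned here is a general Java one: a method declared with `throws HiveException` cannot let other checked exceptions escape, so they have to be wrapped. A minimal sketch, assuming nothing about the actual `XGBoostUDTF` code; `StubHiveException` stands in for Hive's `HiveException` so that the example compiles without Hive on the classpath, and `flushModel` is a hypothetical step.

```java
import java.io.IOException;

// Stand-in for Hive's HiveException (an assumption made to keep the sketch self-contained).
class StubHiveException extends Exception {
    StubHiveException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class CloseSketch {
    // Hypothetical step that may fail with a checked exception other than the declared one.
    static void flushModel(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("cannot serialize model");
        }
    }

    // Declared to throw only StubHiveException, so IOException is wrapped rather than propagated.
    static void close(boolean fail) throws StubHiveException {
        try {
            flushModel(fail);
        } catch (IOException e) {
            throw new StubHiveException("failed to close", e); // keep the original as the cause
        }
    }
}
```

Passing the original exception as the cause keeps its stack trace available to whoever handles the wrapped exception.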
[GitHub] incubator-hivemall pull request #95: [HIVEMALL-119] Fix type cast issues in ...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/95#discussion_r125930040

--- Diff: xgboost/src/main/java/hivemall/xgboost/XGBoostUDTF.java ---
@@ -269,44 +270,35 @@ public void checkTargetValue(double target) throws HiveException {}
     public void process(Object[] args) throws HiveException {
--- End diff --

Is it ok to just call `mvn formatter:format`?
[GitHub] incubator-hivemall issue #95: [HIVEMALL-119] Fix type cast issues in XGBoost...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/95

ok, I hit the same error. I'll check again. Thanks.
[GitHub] incubator-hivemall issue #95: [HIVEMALL-119] Fix type cast issues in XGBoost...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/95

Without the `HadoopUtils.getTaskId()` call [here](https://github.com/maropu/incubator-hivemall/blob/e9fc6cfabd295c4c49faf43c4a44fe9eca2c9025/xgboost/src/main/java/hivemall/xgboost/XGBoostUDTF.java#L290), it works fine, but I don't know why.
[GitHub] incubator-hivemall issue #95: [HIVEMALL-119] Fix type cast issues in XGBoost...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/95

@amaya382 Could you check again? I confirmed that it works in my local env.
[GitHub] incubator-hivemall issue #75: [HIVEMALL-100] Fix build script
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/75

LGTM
[GitHub] incubator-hivemall issue #103: [HIVEMALL-133][SPARK][WIP] Support spark-v2.2...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/103

ok
[GitHub] incubator-hivemall pull request #78: [HIVEMALL-103][Spark] Upgrade spark-v2....
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/78

[HIVEMALL-103][Spark] Upgrade spark-v2.1.0 to v2.1.1

## What changes were proposed in this pull request?
This pr upgraded spark-v2.1.0 to v2.1.1.

## What type of PR is it?
Improvement

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-103

## How was this patch tested?
Existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-103

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/78.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #78

commit de9f2ce2d6cefa0228122edaf872e3ad9068a7c0
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-05-12T03:39:00Z

    Upgrade spark-v2.1.0 to v2.1.1
[GitHub] incubator-hivemall issue #78: [HIVEMALL-103][Spark] Upgrade spark-v2.1.0 to ...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/78

ok
[GitHub] incubator-hivemall issue #78: [HIVEMALL-103][Spark] Upgrade spark-v2.1.0 to ...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/78

merged to master.
[GitHub] incubator-hivemall pull request #80: [WIP][HIVEMALL-99] Cross-compilation of...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/80#discussion_r117665868

--- Diff: bin/build_xgboost.sh ---
@@ -1,87 +0,0 @@
-#!/bin/bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-
-# xgboost requires g++-4.6 or higher (https://github.com/dmlc/xgboost/blob/master/doc/build.md),
-# so we need to first check if the requirement is satisfied.
-COMPILER_REQUIRED_VERSION="4.6"
-COMPILER_VERSION=`g++ --version 2> /dev/null`
-
-# Check if GNU g++ installed
-if [ $? = 127 ]; then
--- End diff --

We'd be better off printing explicit error messages when `clang` is used. Could you do that?
[GitHub] incubator-hivemall pull request #80: [WIP][HIVEMALL-99] Cross-compilation of...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/80#discussion_r117657316

--- Diff: xgboost/src/main/java/hivemall/xgboost/NativeLibLoader.java ---
@@ -54,15 +55,47 @@ private static boolean hasResource(String path) {
     }

     private static String getOSName() {
-        return System.getProperty("os.name");
+        return System.getProperty("os.name").toLowerCase();
+    }
+
+    private static String getOSArch() {
+        return System.getProperty("os.arch").toLowerCase();
+    }
+
+    private static String getOSArchString() {
+        String os = getOSName();
+        if(os.startsWith("linux")) {
+            os = "linux";
+        } else if(os.startsWith("mac")) {
+            os = "darwin";
+        } else if(os.startsWith("windows")) {
+            os = "windows";
+        }
+
+        String arch = getOSArch();
+        if(arch.equals("amd64") || arch.equals("x86_64")) {
+            arch = "x64";
+        } else if(arch.endsWith("86")) {
+            arch = "x86";
+        } else if(arch.indexOf("arm64") != -1) {
+            arch = "arm64";
+        } else if(arch.indexOf("armv6") != -1) {
+            arch = "armv6";
+        } else if(arch.indexOf("armv7") != -1) {
+            arch = "armv7";
+        } else if(arch.indexOf("ppc") != -1) {
+            arch = "ppc64le";
+        }
+
+        return os + "-" + arch;
--- End diff --

I think you could refer to [the code](https://github.com/xerial/snappy-java/blob/master/src/main/java/org/xerial/snappy/OSInfo.java) in `snappy-java`, which handles almost all the cases for detecting the arch.
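As a rough illustration of the kind of normalization discussed here, the sketch below maps raw `os.name` and `os.arch` values to an `os-arch` string. It is neither the PR's code nor `snappy-java`'s `OSInfo`: `PlatformName` and its methods are hypothetical, and the mapping covers only a few common values. Three choices are assumptions of this sketch: lowercasing with `Locale.ROOT` so the result does not depend on the default locale, treating `aarch64` as `arm64`, and mapping only the exact value `ppc64le` rather than anything containing `ppc`.

```java
import java.util.Locale;

final class PlatformName {
    // Normalizes raw os.name / os.arch values into an "os-arch" name.
    // Unknown values are passed through in lowercase instead of being guessed.
    static String of(String osName, String osArch) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.startsWith("linux")) {
            os = "linux";
        } else if (os.startsWith("mac")) {
            os = "darwin";
        } else if (os.startsWith("windows")) {
            os = "windows";
        }

        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("amd64") || arch.equals("x86_64")) {
            arch = "x64";
        } else if (arch.equals("aarch64") || arch.contains("arm64")) {
            arch = "arm64";
        } else if (arch.endsWith("86")) {
            arch = "x86"; // i386, i686, x86
        }
        return os + "-" + arch;
    }

    // Convenience wrapper that reads the properties of the running JVM.
    static String current() {
        return of(System.getProperty("os.name"), System.getProperty("os.arch"));
    }
}
```

Taking the raw values as parameters keeps the mapping testable on any machine; `current()` is the only part that depends on the running JVM.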
[GitHub] incubator-hivemall issue #80: [WIP][HIVEMALL-99] Cross-compilation of XGBoos...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/80

Yea, I also think we need to use `qemu` to test them.
[GitHub] incubator-hivemall pull request #80: [WIP][HIVEMALL-99] Cross-compilation of...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/80#discussion_r117690004

--- Diff: bin/build_xgboost.sh ---
@@ -1,87 +0,0 @@
-#!/bin/bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-
-# xgboost requires g++-4.6 or higher (https://github.com/dmlc/xgboost/blob/master/doc/build.md),
-# so we need to first check if the requirement is satisfied.
-COMPILER_REQUIRED_VERSION="4.6"
-COMPILER_VERSION=`g++ --version 2> /dev/null`
-
-# Check if GNU g++ installed
-if [ $? = 127 ]; then
--- End diff --

Ah, ok. But I think we should keep a script to build `xgboost` on native environments for the sake of CPU optimization (I think it'd be better to follow the same approach as `snappy-java`).
[GitHub] incubator-hivemall pull request #80: [WIP][HIVEMALL-99] Cross-compilation of...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/80#discussion_r117657090

--- Diff: bin/build_xgboost.sh ---
@@ -1,87 +0,0 @@
-#!/bin/bash
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-#   http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-#
-
-# xgboost requires g++-4.6 or higher (https://github.com/dmlc/xgboost/blob/master/doc/build.md),
-# so we need to first check if the requirement is satisfied.
-COMPILER_REQUIRED_VERSION="4.6"
-COMPILER_VERSION=`g++ --version 2> /dev/null`
-
-# Check if GNU g++ installed
-if [ $? = 127 ]; then
--- End diff --

@amaya382 Does the new script work for both gcc and clang?
[GitHub] incubator-hivemall issue #80: [WIP][HIVEMALL-99] Cross-compilation of XGBoos...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/80

Yea, I think so. I just mean I can't reproduce it on my laptop, so I can't look into this issue...
[GitHub] incubator-hivemall issue #80: [WIP][HIVEMALL-99] Cross-compilation of XGBoos...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/80

@amaya382 Aha, I've not seen that exception. Actually, I didn't check behaviours in Hive. Could you look into this issue?
[GitHub] incubator-hivemall issue #80: [WIP][HIVEMALL-99] Cross-compilation of XGBoos...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/80

@amaya382 Could you file a jira for that?
[GitHub] incubator-hivemall pull request #122: [HIVEMALL-147][Spark] Support all Hive...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/122

[HIVEMALL-147][Spark] Support all Hivemall functions of v0.5-rc.1 in Spark Dataframe

## What changes were proposed in this pull request?
This pr added more Hivemall functions for Spark DataFrame. However, some of the functions are not supported here because Spark simply cannot handle them (e.g., unsupported types, returned types depending on options, ...).

## What type of PR is it?
Feature

## What is the Jira issue?
https://issues.apache.org/jira/browse/HIVEMALL-147

## How was this patch tested?
Added tests in `HivemallOpsWithFeatureSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-147-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/122.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #122

commit 4963c2e71279c095759ba4f545cbbb47cff667b7
Author: Takeshi Yamamuro <yamam...@apache.org>
Date:   2017-10-14T15:11:19Z

    Support all Hivemall functions of v0.5-rc.1 in Spark Dataframe
[GitHub] incubator-hivemall pull request #122: [HIVEMALL-147][Spark] Support all Hive...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/122#discussion_r144753835 --- Diff: core/src/main/java/hivemall/evaluation/HitRateUDAF.java --- @@ -71,9 +71,6 @@ + " - Returns HitRate") public final class HitRateUDAF extends AbstractGenericUDAFResolver { -// prevent instantiation -private HitRateUDAF() {} - --- End diff -- This prevents Spark from loading UDAFs by using reflection. Can we remove this? ---
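The reflection concern in the comment above can be illustrated with a minimal Java sketch. It assumes the loader creates the resolver through its no-argument constructor from a different class, which is one common way a reflection-based loader works; the names `PrivateCtorResolver`, `PublicCtorResolver`, and `ReflectiveLoader` are hypothetical stand-ins, not Hivemall or Spark classes.

```java
import java.lang.reflect.InvocationTargetException;

// Hypothetical stand-in for a resolver that hides its constructor.
class PrivateCtorResolver {
    private PrivateCtorResolver() {}
}

// Hypothetical stand-in for a resolver with an accessible constructor.
class PublicCtorResolver {
    public PublicCtorResolver() {}
}

class ReflectiveLoader {
    // Tries to create an instance via the no-arg constructor, as a
    // reflection-based loader might; returns false if access is denied.
    static boolean canInstantiate(Class<?> cls) {
        try {
            cls.getDeclaredConstructor().newInstance();
            return true;
        } catch (IllegalAccessException e) {
            // The constructor exists but is not accessible from this class.
            return false;
        } catch (NoSuchMethodException | InstantiationException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canInstantiate(PrivateCtorResolver.class));
        System.out.println(canInstantiate(PublicCtorResolver.class));
    }
}
```

When the three classes are compiled as separate top-level classes, the first call reports that instantiation is denied and the second succeeds, which is why dropping the private constructor (and so falling back to the implicit public one) would unblock a loader of this kind.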
[GitHub] incubator-hivemall pull request #122: [HIVEMALL-147][Spark] Support all Hive...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/122#discussion_r144753777 --- Diff: core/src/main/java/hivemall/evaluation/AUCUDAF.java --- @@ -110,7 +110,7 @@ public ClassificationEvaluator() {} @Override public ObjectInspector init(Mode mode, ObjectInspector[] parameters) throws HiveException { -assert (parameters.length == 2 || parameters.length == 3) : parameters.length; +assert (0 < parameters.length && parameters.length <= 3) : parameters.length; --- End diff -- In Spark, this assertion fails because Spark passes a single parameter in `parameters` here for final output (IIUC [`AUC` finally outputs a single double-typed value for each group](https://github.com/apache/incubator-hivemall/pull/122/files#diff-9d758588c8fad559a15d0b2362e757b2R1134)). In Hive, does this work well? ---
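A mode-aware check is one alternative to relaxing the assertion. The sketch below is not the change made in this pull request. It relies on Hive's documented convention that `PARTIAL1` and `COMPLETE` receive the original arguments, while `PARTIAL2` and `FINAL` receive only the single partial-aggregation value. `AggMode` is a local stand-in enum that mirrors the constant names of Hive's `GenericUDAFEvaluator.Mode`, and `ParameterCheck` is a hypothetical helper.

```java
// Local stand-in mirroring the constant names of Hive's GenericUDAFEvaluator.Mode.
enum AggMode { PARTIAL1, PARTIAL2, FINAL, COMPLETE }

class ParameterCheck {
    // PARTIAL1/COMPLETE see the original arguments (two or three here);
    // PARTIAL2/FINAL see only the single partial-aggregation value.
    static boolean hasValidArity(AggMode mode, int numParameters) {
        switch (mode) {
            case PARTIAL1:
            case COMPLETE:
                return numParameters == 2 || numParameters == 3;
            case PARTIAL2:
            case FINAL:
                return numParameters == 1;
            default:
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasValidArity(AggMode.COMPLETE, 3));
        System.out.println(hasValidArity(AggMode.FINAL, 1));
        System.out.println(hasValidArity(AggMode.FINAL, 2));
    }
}
```

Compared with `0 < parameters.length && parameters.length <= 3`, a check of this shape would still reject a single original argument in `PARTIAL1` or `COMPLETE` mode.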
[GitHub] incubator-hivemall pull request #112: [HIVEMALL-133][SPARK] Support spark-v2...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/112 [HIVEMALL-133][SPARK] Support spark-v2.2 in the hivemalls-spark module ## What changes were proposed in this pull request? This PR adds support for spark-2.2 in Hivemall. It is currently WIP because: 1. Java 7 support has been dropped in spark-v2.2 but Hivemall still supports it, so we need some entries to check the Java version when `spark-2.2` is enabled. 2. We need to move common code into `spark/spark-common`. ## What type of PR is it? Improvement ## What is the Jira issue? https://issues.apache.org/jira/browse/HIVEMALL-133 ## How was this patch tested? Existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-133 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/112.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #112 commit 2d2750454c1567ba0e7a3af1401b9a3b4cbfda1f Author: Takeshi Yamamuro <yamam...@apache.org> Date: 2017-07-20T02:18:05Z Support spark-2.2 commit cbda47a8fcd667028256c722c0905d0553ea7945 Author: Takeshi Yamamuro <yamam...@apache.org> Date: 2017-07-27T14:54:13Z Add enforce-plugin to validate java source/target versions commit 18df884a0d36a7cd272f712c2b4414b212d958ee Author: Takeshi Yamamuro <yamam...@apache.org> Date: 2017-07-27T15:22:14Z Fix style errors commit 55fda48afbc1af3f95ea5b40d0645b4d149cba72 Author: Takeshi Yamamuro <yamam...@apache.org> Date: 2017-07-27T15:22:24Z Update .travis.yml commit 95ec7833032701d6e87b19cad0ebedbc0a8f6cf4 Author: Takeshi Yamamuro <yamam...@apache.org> Date: 2017-07-28T02:09:21Z Add bin/run_travis_tests.sh ---
[GitHub] incubator-hivemall issue #103: [HIVEMALL-133][SPARK][WIP] Support spark-v2.2...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/103 See #112 ---
[GitHub] incubator-hivemall issue #103: [HIVEMALL-133][SPARK][WIP] Support spark-v2.2...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/103 Thanks, I'll check later ---
[GitHub] incubator-hivemall pull request #113: [HIVEMALL-136][SPARK] Support train_cl...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/113 [HIVEMALL-136][SPARK] Support train_classifier and train_regressor for Spark ## What changes were proposed in this pull request? This pr added functions `train_classifier` and `train_regressor` in `HivemallOps`. ## What type of PR is it? Improvement ## What is the Jira issue? https://issues.apache.org/jira/browse/HIVEMALL-136 ## How was this patch tested? Added tests in `HivemallOpsWithFeatureSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-136 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/113.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #113 commit 3ec110dd347bbde9fc1e78a6af37deb080292388 Author: Takeshi Yamamuro <yamam...@apache.org> Date: 2017-07-27T13:44:34Z Support train_classifier and train_regressor ---
[GitHub] incubator-hivemall issue #106: [HIVEMALL-136][SPARK] Support train_classifie...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/106 See #113 ---
[GitHub] incubator-hivemall pull request #130: [HIVEMALL][SPARK][WIP] Fix Spark-relat...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/130 [HIVEMALL][SPARK][WIP] Fix Spark-related artifact issues ## What changes were proposed in this pull request? The objective of this PR is to fix the artifacts so that hivemall-v0.5.0 can be released in ASF. TODO - Update [the Release Guide](https://github.com/apache/incubator-hivemall/blob/master/src/site/markdown/release-guide.md) for Spark modules. ## What type of PR is it? Bug Fix You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/incubator-hivemall FixSparkArtifactIssues Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/130.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #130 commit 0cef6bd4023198e0e7e9945d651268021da51dd5 Author: Takeshi Yamamuro <yamamuro@...> Date: 2018-01-09T14:23:55Z Fix Spark-related artifact issues ---
[GitHub] incubator-hivemall issue #130: [HIVEMALL][SPARK][WIP] Fix Spark-related arti...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/130 yea, NVM. The main target of this PR is to solve all the issues for the upcoming releases. ---
[GitHub] incubator-hivemall pull request #130: [HIVEMALL][SPARK][WIP] Fix Spark-relat...
Github user maropu closed the pull request at: https://github.com/apache/incubator-hivemall/pull/130 ---
[GitHub] incubator-hivemall issue #130: [HIVEMALL][SPARK][WIP] Fix Spark-related arti...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/130 See #131 ---
[GitHub] incubator-hivemall pull request #131: [HIVEMALL][SPARK] Update release-guide...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/131 [HIVEMALL][SPARK] Update release-guide.md for spark releases ## What changes were proposed in this pull request? This pr updated `release-guide.md` for spark releases. ## What type of PR is it? Documentation You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/incubator-hivemall UpdateASFReleaseGuide Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/131.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #131 commit e24a78d6276f853b497ad6ba7f6c8e4f95b58644 Author: Takeshi Yamamuro <yamamuro@...> Date: 2018-01-11T01:53:10Z Update release-guide.md for spark releases ---
[GitHub] incubator-hivemall issue #139: [HIVEMALL-182][SPARK][WIP] Add an optimizer r...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/139 Sorry for my slow progress. I'm checking the feasibility in a separate repo of mine (because there are some issues to solve): https://github.com/maropu/spark-catalyst-rule-rewiter/tree/master So please give me more time. Thanks. ---
[GitHub] incubator-hivemall issue #141: [HIVEMALL-117][SPARK] Update the installation...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/141 Finished: https://spark-packages.org/package/apache-hivemall/apache-hivemall @myui check again? ---
[GitHub] incubator-hivemall issue #135: [WIP] Merge Brickhouse functions
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/135 @myui Spark already has these functions: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L3118 ---
[GitHub] incubator-hivemall issue #137: [HIVEMALL-179][SPARK] Support spark-v2.3
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/137 merged to master ---
[GitHub] incubator-hivemall pull request #138: [HIVEMALL-180][SPARK] Drop the Spark-2...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/138 [HIVEMALL-180][SPARK] Drop the Spark-2.0 support ## What changes were proposed in this pull request? This pr dropped the module for Spark-2.0. ## What type of PR is it? Improvement ## What is the Jira issue? https://issues.apache.org/jira/browse/HIVEMALL-180 ## How was this patch tested? Existing tests You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-180 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/138.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #138 ---
[GitHub] incubator-hivemall issue #139: [HIVEMALL-182][SPARK][WIP] Add an optimizer r...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/139 I'll fix later. ---
[GitHub] incubator-hivemall pull request #141: [HIVEMALL-117][SPARK] Update the insta...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/141 [HIVEMALL-117][SPARK] Update the installation guide for Spark ## What changes were proposed in this pull request? This pr updated the installation guide for Spark. ## What type of PR is it? Documentation ## What is the Jira issue? https://issues.apache.org/jira/browse/HIVEMALL-117 ## How was this patch tested? N/A You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-117 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/141.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #141 commit 1c0eb11b3095f8891d95ba84a84019c2e0142d47 Author: Takeshi Yamamuro <yamamuro@...> Date: 2018-04-04T01:27:27Z Update the installation guide for Spark ---
[GitHub] incubator-hivemall issue #141: [HIVEMALL-117][SPARK] Update the installation...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/141 I'll create a new GitHub account for this purpose and then move the repo there. So this is pending until the move is finished. ---
[GitHub] incubator-hivemall pull request #139: [HIVEMALL-182][SPARK][WIP] Add an opti...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/139 [HIVEMALL-182][SPARK][WIP] Add an optimizer rule to filter out columns with low variances ## What changes were proposed in this pull request? This pr added a new optimizer rule `VarianceThreshold` in Spark. TODO - Add docs in gitbook - Add more tests - Brush up `VarianceThreshold` code ## What type of PR is it? Feature ## What is the Jira issue? https://issues.apache.org/jira/browse/HIVEMALL-182 ## How was this patch tested? Added tests in `FeatureSelectionRuleSuite`. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/incubator-hivemall HIVEMALL-182 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/139.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #139 commit dc5af08c6a18fb9f4cebf7d7f619cbd053165e1f Author: Takeshi Yamamuro <yamamuro@...> Date: 2018-03-29T22:26:40Z Add an optimizer rule to filter out columns with low variances ---
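The idea behind filtering out low-variance columns can be sketched in plain Java. This is not the Catalyst rule from the pull request: `VarianceFilter`, its method names, and the use of population variance with a strict `>` comparison are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class VarianceFilter {
    // Population variance of one column's values (NaN for an empty column).
    static double variance(double[] values) {
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - mean) * (v - mean);
        }
        return sumSq / values.length;
    }

    // Returns the indices of columns whose variance is strictly above the threshold.
    static List<Integer> selectColumns(double[][] columns, double threshold) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < columns.length; i++) {
            if (variance(columns[i]) > threshold) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[][] columns = {
            {1.0, 1.0, 1.0, 1.0},  // constant column
            {0.0, 2.0, 4.0, 6.0},  // widely spread column
            {3.0, 3.0, 3.0, 3.5}   // nearly constant column
        };
        System.out.println(selectColumns(columns, 0.1));
    }
}
```

With a threshold of 0.1 only the second column (index 1) survives. An optimizer rule would apply the same decision by pruning the dropped columns from the query plan rather than from an in-memory array.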
[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/171#discussion_r233262036 --- Diff: spark/spark-2.3/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala --- @@ -1935,18 +1935,6 @@ object HivemallOps { ) } - /** - * @see [[hivemall.tools.array.SubarrayUDF]] - * @group tools.array - */ - def subarray(original: Column, fromIndex: Column, toIndex: Column): Column = withExpr { -planHiveUDF( - "hivemall.tools.array.SubarrayUDF", - "subarray", - original :: fromIndex :: toIndex :: Nil -) - } --- End diff -- We probably need to support Brickhouse functions for Spark in a separate follow-up PR. ---
[GitHub] incubator-hivemall issue #171: [SPARK][HOTFIX][WIP] Fix existing test failur...
Github user maropu commented on the issue: https://github.com/apache/incubator-hivemall/pull/171 Not finished yet (I'm still working). ---
[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/171#discussion_r233262624 --- Diff: spark/spark-2.3/src/test/scala/org/apache/spark/sql/hive/XGBoostSuite.scala --- @@ -77,6 +77,7 @@ final class XGBoostSuite extends VectorQueryTest { val model = hiveContext.sparkSession.read.format("libxgboost").load(tempDir) val predict = model.join(mllibTestDf) .xgboost_predict($"rowid", $"features", $"model_id", $"pred_model") --- End diff -- When invoking `xgboost_predict`, an assertion inside the xgboost library fails. I'm looking into this failure now, though I think we could skip the xgboost support for spark-2.3 in the upcoming release. ---
[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/171#discussion_r233288446 --- Diff: spark/spark-2.3/src/main/scala/org/apache/spark/sql/hive/HivemallOps.scala --- @@ -1935,18 +1935,6 @@ object HivemallOps { ) } - /** - * @see [[hivemall.tools.array.SubarrayUDF]] - * @group tools.array - */ - def subarray(original: Column, fromIndex: Column, toIndex: Column): Column = withExpr { -planHiveUDF( - "hivemall.tools.array.SubarrayUDF", - "subarray", - original :: fromIndex :: toIndex :: Nil -) - } --- End diff -- I'll check ---
[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...
GitHub user maropu opened a pull request: https://github.com/apache/incubator-hivemall/pull/171 [SPARK][HOTFIX][WIP] Fix existing test failures in spark-2.3 ## What changes were proposed in this pull request? This pr is to fix the test failures for spark-2.3. ## How was this patch tested? Run the existing tests. You can merge this pull request into a Git repository by running: $ git pull https://github.com/maropu/incubator-hivemall HOTFIX-20181114 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-hivemall/pull/171.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #171 commit cde6fa6d11c2d6e23d52c3db282b9d66b69f4ee9 Author: Takeshi Yamamuro Date: 2018-11-13T23:14:33Z Fix existing issues in spark-2.3 ---
[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX][WIP] Fix existing test...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/171#discussion_r233261637 --- Diff: spark/pom.xml --- @@ -52,6 +52,12 @@ hivemall-core ${project.version} compile + + + io.netty + netty-all + --- End diff -- Because the `netty` version conflicts with the one in spark. ---
[GitHub] incubator-hivemall pull request #171: [SPARK][HOTFIX] Fix existing test fail...
Github user maropu commented on a diff in the pull request: https://github.com/apache/incubator-hivemall/pull/171#discussion_r233399519 --- Diff: spark/spark-2.3/src/test/scala/org/apache/spark/sql/hive/XGBoostSuite.scala --- @@ -77,6 +77,7 @@ final class XGBoostSuite extends VectorQueryTest { val model = hiveContext.sparkSession.read.format("libxgboost").load(tempDir) val predict = model.join(mllibTestDf) .xgboost_predict($"rowid", $"features", $"model_id", $"pred_model") --- End diff -- Since it seems the test fails in JNI, we have no stacktrace; ``` XGBoostSuite: - resolve libxgboost - check XGBoost options AssertError:read can not have position excceed buffer length ... ``` ---