[GitHub] spark pull request: [SPARK-11419][STREAMING] Parallel recovery for...

2015-10-31 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9373#discussion_r43578032 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/FileBasedWriteAheadLog.scala --- @@ -17,6 +17,7 @@ package

[GitHub] spark pull request: [SPARK-11419][STREAMING] Parallel recovery for...

2015-11-01 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9373#discussion_r43587988 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/FileBasedWriteAheadLog.scala --- @@ -126,11 +127,11 @@ private[streaming] class

[GitHub] spark pull request: [SPARK-11419][STREAMING] Parallel recovery for...

2015-11-01 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9373#issuecomment-152867985 @harishreedharan Here are some benchmark results: For reference, the driver was a r3.2xlarge EC2 instance. ![image](https://cloud.githubusercontent.com

[GitHub] spark pull request: [SPARK-11198][STREAMING][KINESIS] Support de-a...

2015-11-01 Thread brkyvz
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/9403 [SPARK-11198][STREAMING][KINESIS] Support de-aggregation of records during recovery While the KCL handles de-aggregation during regular operation, during recovery we use the lower-level API

[GitHub] spark pull request: [SPARK-11198][STREAMING][KINESIS] Support de-a...

2015-11-02 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9403#issuecomment-153117592 Tested that this patch successfully de-aggregates in recovery as well. --- If your project is set up for it, you can reply to this email and have your reply appear on

[GitHub] spark pull request: [SPARK-11359][STREAMING][KINESIS] Checkpoint t...

2015-11-02 Thread brkyvz
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/9421 [SPARK-11359][STREAMING][KINESIS] Checkpoint to DynamoDB even when new data doesn't come in Currently, the checkpoints to DynamoDB occur only when new data comes in, as we update the clock fo

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-03 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9143#issuecomment-153416739 @tdas @zsxwing This is ready for another pass

[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

2015-11-04 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9367#discussion_r43927722 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -366,6 +367,72 @@ class SparkSubmitSuite

[GitHub] spark pull request: [SPARK-11195][CORE] Use correct classloader fo...

2015-11-04 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9367#discussion_r43927641 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -366,6 +367,72 @@ class SparkSubmitSuite

[GitHub] spark pull request: [SPARK-11198][STREAMING][KINESIS] Support de-a...

2015-11-04 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9403#discussion_r43928521 --- Diff: extras/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisStreamSuite.scala --- @@ -56,6 +56,8 @@ class KinesisStreamSuite

[GitHub] spark pull request: [SPARK-11198][STREAMING][KINESIS] Support de-a...

2015-11-04 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9403#discussion_r43929148 --- Diff: extras/kinesis-asl/src/test/scala/org/apache/spark/streaming/kinesis/KinesisTestUtils.scala --- @@ -90,24 +102,44 @@ private[kinesis] class

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-04 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r43947286 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala --- @@ -599,6 +607,7 @@ class ReceiverTracker(ssc

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-04 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r43948616 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,204 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-11359][STREAMING][KINESIS] Checkpoint t...

2015-11-04 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9421#discussion_r43967600 --- Diff: extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisCheckpointState.scala --- @@ -16,39 +16,77 @@ */ package

[GitHub] spark pull request: [SPARK-11359][STREAMING][KINESIS] Checkpoint t...

2015-11-05 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9421#discussion_r44069549 --- Diff: extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisRecordProcessor.scala --- @@ -135,18 +105,17 @@ private[kinesis] class

[GitHub] spark pull request: [SPARK-11359][STREAMING][KINESIS] Checkpoint t...

2015-11-05 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9421#discussion_r44083524 --- Diff: extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisCheckpointer.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-11359][STREAMING][KINESIS] Checkpoint t...

2015-11-05 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9421#discussion_r44086804 --- Diff: extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisReceiver.scala --- @@ -216,6 +226,25 @@ private[kinesis] class

[GitHub] spark pull request: [SPARK-11359][STREAMING][KINESIS] Checkpoint t...

2015-11-05 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9421#discussion_r44086994 --- Diff: extras/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/KinesisCheckpointer.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-05 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44090581 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/util/WriteAheadLogSuite.scala --- @@ -190,135 +362,197 @@ class WriteAheadLogSuite extends

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-05 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44092490 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,212 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-11198][STREAMING][KINESIS] Support de-a...

2015-11-05 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9403#issuecomment-154254187 test this please

[GitHub] spark pull request: [SPARK-11198][STREAMING][KINESIS] Support de-a...

2015-11-05 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9403#issuecomment-154256138 @zsxwing probably not. Locally tests pass. They fail on jenkins for some reason. I've enabled a lot of the logging to look deeper into it.

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-06 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9143#issuecomment-154342137 Addressed comments

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-06 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44180341 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-06 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44184988 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,206 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-06 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9143#issuecomment-154533740 @zsxwing The same receiver won't be able to send Block A and B, as the receiver will be blocked until it receives the reply. Different receivers may send A, and

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-06 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9143#issuecomment-154537412 @zsxwing Maybe I should add that to the Javadoc.

[GitHub] spark pull request: [SPARK-11198][STREAMING][KINESIS] Support de-a...

2015-11-06 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9403#issuecomment-154542698 test this please

[GitHub] spark pull request: [SPARK-10500][SPARKR] sparkr.zip cannot be cre...

2015-11-06 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9390#discussion_r44192284 --- Diff: R/install-dev.bat --- @@ -25,3 +25,9 @@ set SPARK_HOME=%~dp0.. MKDIR %SPARK_HOME%\R\lib R.exe CMD INSTALL --library="%SPARK_HO

[GitHub] spark pull request: [SPARK-10500][SPARKR] sparkr.zip cannot be cre...

2015-11-06 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9390#discussion_r44192531 --- Diff: core/src/main/scala/org/apache/spark/api/r/RUtils.scala --- @@ -23,6 +23,10 @@ import java.util.Arrays import org.apache.spark.{SparkEnv

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-08 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44229377 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala --- @@ -488,7 +491,12 @@ class ReceiverTracker(ssc

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-08 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9143#issuecomment-154861688 @tdas @zsxwing @harishreedharan I addressed all the comments, added a shutdown check to the ReceiverTracker RPC Endpoint, and removed the possibly flaky test. Could

[GitHub] spark pull request: [SPARK-11419][STREAMING] Parallel recovery for...

2015-11-08 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9373#issuecomment-154861924 @harishreedharan I've been trying to test this patch, but I just couldn't set up HDFS to work with Spark using the spark-ec2 scripts. Could you please help m

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-08 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44234779 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala --- @@ -488,7 +493,28 @@ class ReceiverTracker(ssc

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-08 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44239951 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/BatchedWriteAheadLog.scala --- @@ -0,0 +1,221 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-08 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44240091 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/util/WriteAheadLogSuite.scala --- @@ -190,283 +281,150 @@ class WriteAheadLogSuite extends

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-08 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44240396 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/util/WriteAheadLogSuite.scala --- @@ -58,49 +71,127 @@ class WriteAheadLogSuite extends

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-08 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44240881 --- Diff: streaming/src/test/scala/org/apache/spark/streaming/util/WriteAheadLogSuite.scala --- @@ -190,283 +281,150 @@ class WriteAheadLogSuite extends

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-08 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9143#discussion_r44241288 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala --- @@ -488,7 +493,28 @@ class ReceiverTracker(ssc

[GitHub] spark pull request: [SPARK-11198][STREAMING][KINESIS] Support de-a...

2015-11-09 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9403#issuecomment-155119060 Huh. Didn't change much but the tests passed this time. I wonder if it was a Java 7 vs. 8 mismatch... Just to be sure, will re-run tests

[GitHub] spark pull request: [SPARK-11198][STREAMING][KINESIS] Support de-a...

2015-11-09 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9403#issuecomment-155119083 test this please

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-09 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9143#issuecomment-155129707 Added the test for abrupt close, and added the batching to manual writer

[GitHub] spark pull request: [SPARK-11198][STREAMING][KINESIS] Support de-a...

2015-11-09 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9403#issuecomment-155152211 Aww man. The Kinesis tests passed, hive tests failed. Re-running

[GitHub] spark pull request: [SPARK-11198][STREAMING][KINESIS] Support de-a...

2015-11-09 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9403#issuecomment-15515 test this please

[GitHub] spark pull request: [SPARK-11359][STREAMING][KINESIS] Checkpoint t...

2015-11-09 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9421#issuecomment-155167933 @zsxwing It doesn't matter. Since we always checkpoint the latest value we can checkpoint, it doesn't matter if it is sequential or in parallel
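The point about ordering can be illustrated with a small sketch. This is not the code from the pull request: the class `LatestValueCheckpointer`, its methods, the shard names, and the integer sequence numbers are all hypothetical, and the in-memory `committed` dict merely stands in for the external checkpoint store. It only shows why writing "the latest value we can checkpoint" gives the same result whatever order the shards are checkpointed in.

```python
class LatestValueCheckpointer:
    """Hypothetical helper: remembers only the newest sequence number per shard."""

    def __init__(self):
        self.latest = {}     # shard id -> newest sequence number received
        self.committed = {}  # shard id -> sequence number last written to the store

    def record(self, shard_id, seq_number):
        # Keep only the highest sequence number seen for this shard.
        if seq_number > self.latest.get(shard_id, -1):
            self.latest[shard_id] = seq_number

    def checkpoint(self, shard_id):
        # Writing the latest value is idempotent, so repeated calls are harmless.
        if shard_id in self.latest:
            self.committed[shard_id] = self.latest[shard_id]


def run(order):
    """Record the same data, then checkpoint the shards in the given order."""
    cp = LatestValueCheckpointer()
    for shard_id, seq_number in [("A", 1), ("B", 7), ("A", 3), ("B", 9)]:
        cp.record(shard_id, seq_number)
    for shard_id in order:
        cp.checkpoint(shard_id)
    return cp.committed
```

Calling `run(["A", "B"])` and `run(["B", "A"])` leaves the same committed state, because each write depends only on that shard's own latest value.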

[GitHub] spark pull request: [SPARK-11359][STREAMING][KINESIS] Checkpoint t...

2015-11-09 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9421#issuecomment-155175426 @zsxwing It should be acceptable as well. Think about it like this: We have 2 receivers, A and B: t0 -> A receives batch with seq number x_0, B receives batch w

[GitHub] spark pull request: [SPARK-11198][STREAMING][KINESIS] Support de-a...

2015-11-09 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9403#issuecomment-155209271 test this please

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-09 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9143#issuecomment-155211188 test this please

[GitHub] spark pull request: [SPARK-11198][STREAMING][KINESIS] Support de-a...

2015-11-09 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9403#issuecomment-155231423 test this please

[GitHub] spark pull request: [SPARK-11419][STREAMING] Parallel recovery for...

2015-11-09 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/9373#discussion_r44353236 --- Diff: streaming/src/main/scala/org/apache/spark/streaming/util/FileBasedWriteAheadLog.scala --- @@ -149,27 +150,26 @@ private[streaming] class

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-09 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9143#issuecomment-155240395 Failed due to a flaky test. Restarting

[GitHub] spark pull request: [SPARK-11141][STREAMING] Batch ReceivedBlockTr...

2015-11-09 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/9143#issuecomment-155240408 test this please

[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...

2016-06-13 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/13655#discussion_r66895475 --- Diff: dev/sparktestsupport/modules.py --- @@ -337,6 +337,7 @@ def __hash__(self): "pyspark.sql.group", "pyspar

[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...

2016-06-13 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/13655#discussion_r66895718 --- Diff: python/pyspark/sql/streaming.py --- @@ -106,28 +114,37 @@ def active(self): """Returns a list of active queries assoc

[GitHub] spark issue #13655: [SPARK-15935][PySpark]Enable test for sql/streaming.py a...

2016-06-13 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/13655 a minor nit, otherwise LGTM!

[GitHub] spark pull request #13655: [SPARK-15935][PySpark]Enable test for sql/streami...

2016-06-14 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/13655#discussion_r67002254 --- Diff: python/pyspark/sql/streaming.py --- @@ -106,28 +120,34 @@ def active(self): """Returns a list of active queries assoc

[GitHub] spark issue #13665: [SPARK-15935][PySpark]Fix a wrong format tag in the erro...

2016-06-14 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/13665 LGTM! Thanks

[GitHub] spark pull request #13703: [SPARK-15981][SQL][STREAMING] Fixed bug and added...

2016-06-16 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/13703#discussion_r67405432 --- Diff: python/pyspark/sql/readwriter.py --- @@ -818,6 +849,8 @@ def option(self, key, value): """Adds an input option for the

[GitHub] spark pull request #13703: [SPARK-15981][SQL][STREAMING] Fixed bug and added...

2016-06-16 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/13703#discussion_r67405408 --- Diff: python/pyspark/sql/readwriter.py --- @@ -818,6 +849,8 @@ def option(self, key, value): """Adds an input option for the

[GitHub] spark pull request: [SPARK-15622] [SQL] Wrap the parent classloader of Janin...

2016-05-31 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/13366 Tested this. Fixes the bug for me! Thanks @yhuai

[GitHub] spark pull request: [SPARK-15622] [SQL] Wrap the parent classloader of Janin...

2016-05-31 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/13366 LGTM. It would be great to be able to add a unit test for this, but it won't be very easy, and it is deep in the code, where I feel people won't be able to break it easily.

[GitHub] spark pull request: [SPARK-15622] [SQL] Wrap the parent classloader of Janin...

2016-05-31 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/13366 oh okay, perfect

[GitHub] spark issue #13428: [SPARK-12666][CORE] SparkSubmit packages fix for when 'd...

2016-06-01 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/13428 @BryanCutler Am I dreaming or is this for real? I've been trying to solve this issue for a very long time. I honestly want to give you a high five now :) `default(runtime)` makes sense

[GitHub] spark pull request: [SPARK-14555] Second cut of Python API for Str...

2016-04-25 Thread brkyvz
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/12673 [SPARK-14555] Second cut of Python API for Structured Streaming ## What changes were proposed in this pull request? This PR adds Python APIs for: - `ContinuousQueryManager

[GitHub] spark pull request: [SPARK-14555] Second cut of Python API for Str...

2016-04-25 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/12673#issuecomment-214562057 cc @davies @marmbrus @tdas

[GitHub] spark pull request: [SPARK-14555] Second cut of Python API for Str...

2016-04-25 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/12673#discussion_r61009792 --- Diff: python/pyspark/sql/streaming.py --- @@ -87,6 +87,84 @@ def stop(self): self._jcq.stop() +class ContinuousQueryManager

[GitHub] spark pull request: [SPARK-14555] Second cut of Python API for Str...

2016-04-25 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/12673#discussion_r61010762 --- Diff: python/pyspark/sql/streaming.py --- @@ -87,6 +87,84 @@ def stop(self): self._jcq.stop() +class ContinuousQueryManager

[GitHub] spark pull request: [SPARK-14555] Second cut of Python API for Str...

2016-04-26 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/12673#issuecomment-214847754 @davies Addressed your comments

[GitHub] spark pull request: [SPARK-14555] Second cut of Python API for Str...

2016-04-27 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/12673#issuecomment-215198107 Ooops! Addressed both your comments

[GitHub] spark pull request: [SPARK-7710][SPARK-7998][DOCS] Docs for DataFr...

2015-08-22 Thread brkyvz
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/8378 [SPARK-7710][SPARK-7998][DOCS] Docs for DataFrameStatFunctions This PR contains examples on how to use some of the Stat Functions available for DataFrames under `df.stat`. @rxin You can

[GitHub] spark pull request: [SPARK-7710] add doc examples for DataFrameSta...

2015-08-22 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/6531#issuecomment-133779846 Closed in favor of #8378.

[GitHub] spark pull request: [SPARK-7710] add doc examples for DataFrameSta...

2015-08-22 Thread brkyvz
Github user brkyvz closed the pull request at: https://github.com/apache/spark/pull/6531

[GitHub] spark pull request: [MLLIB] Fix BLAS not scaling with beta = 0.0

2015-08-29 Thread brkyvz
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/8525 [MLLIB] Fix BLAS not scaling with beta = 0.0 @mengxr @jkbradley @rxin It would be great if this fix made it into RC3! You can merge this pull request into a Git repository by running

[GitHub] spark pull request: [SPARK-10353][MLLIB] BLAS gemm not scaling whe...

2015-08-29 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/8525#issuecomment-136092624 `dgemm` does handle the easy cases (alpha == 0.0 && beta == 1.0). `dscal` doesn't handle `beta == 1.0`. I still think we should keep them, because not

[GitHub] spark pull request: [SPARK-10353][MLLIB] BLAS gemm not scaling whe...

2015-08-29 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/8525#issuecomment-136092638 The most important change here is https://github.com/apache/spark/pull/8525/files#diff-6d42e8ee906809226b36f06c9d2a8e8fR414
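For readers unfamiliar with the scaling behaviour discussed in this thread, here is a minimal pure-Python sketch of the gemm contract, C := alpha * A * B + beta * C. It is not the MLlib code and does not call any BLAS library; the function name `gemm` and the list-of-lists matrix layout are assumptions made for illustration. It shows the two special cases mentioned: nothing changes when alpha == 0.0 and beta == 1.0, and when beta == 0.0 the old contents of C must be overwritten rather than multiplied by zero.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices stored as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if alpha == 0.0 and beta == 1.0:
        # Easy case: the result is C itself, returned as a copy.
        return [row[:] for row in c]
    out = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            # beta == 0.0 means "overwrite C": old values (even NaN) are ignored.
            scaled = 0.0 if beta == 0.0 else beta * c[i][j]
            dot = 0.0
            if alpha != 0.0:
                dot = sum(a[i][k] * b[k][j] for k in range(inner))
            out_row.append(alpha * dot + scaled)
        out.append(out_row)
    return out
```

With beta == 0.0, a `NaN` already stored in C does not leak into the result, which is the behaviour a naive `beta * c[i][j]` would get wrong.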

[GitHub] spark pull request: [SPARK-3976][MLlib] Added repartitioning for B...

2015-08-31 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/4286#discussion_r38369650 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala --- @@ -246,4 +246,135 @@ class BlockMatrix( val localMat

[GitHub] spark pull request: [SPARK-3976][MLlib] Added repartitioning for B...

2015-08-31 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/4286#discussion_r38369618 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala --- @@ -246,4 +246,135 @@ class BlockMatrix( val localMat

[GitHub] spark pull request: [SPARK-3976][MLlib] Added repartitioning for B...

2015-08-31 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/4286#issuecomment-136522922 The merge conflicts should be minor, due to the `Since(1.3.0)`s. @mengxr Would we like this for 1.6? I'm thinking of implementing some optimizations to the code as well

[GitHub] spark pull request: [SPARK-2434][MLlib]: Warning messages that poi...

2014-07-21 Thread brkyvz
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/1515 [SPARK-2434][MLlib]: Warning messages that point users to original MLlib implementations added to Examples [SPARK-2434][MLlib]: Warning messages that refer users to the original MLlib

[GitHub] spark pull request: [SPARK-2434][MLlib]: Warning messages that poi...

2014-07-21 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/1515#discussion_r15204194 --- Diff: examples/src/main/python/logistic_regression.py --- @@ -47,9 +47,15 @@ def readPointBatch(iterator): return [matrix] if __name__

[GitHub] spark pull request: [SPARK-2801][MLlib]: DistributionGenerator ren...

2014-08-01 Thread brkyvz
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/1732 [SPARK-2801][MLlib]: DistributionGenerator renamed to RandomDataGenerator. RandomRDD is now of generic type The RandomRDDGenerators used to only output RDD[Double]. Now

[GitHub] spark pull request: [SPARK-8313] R Spark packages support

2015-07-16 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/7139#issuecomment-122025235 @sun-rui Thanks for the comments! This PR aims to support complex Spark Packages, that is, packages that have Scala / Java, Python, and/or R code in them

[GitHub] spark pull request: [SPARK-8313] R Spark packages support

2015-07-16 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/7139#issuecomment-122149817 @sun-rui Unfortunately there is no concrete JIRA regarding a discussion. Regarding extra tools, and documentation, there is documentation [here](http://spark

[GitHub] spark pull request: [SPARK-9175] [MLlib] BLAS.gemm fails to update...

2015-07-20 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/7503#issuecomment-122918985 Can we add the same check to gemv please? In addition, instead of calling the dgemm routine to just scale C, maybe we can just call f2jBlas.scal? @mengxr What do you

[GitHub] spark pull request: [SPARK-8715] ArrayOutOfBoundsException fixed f...

2015-06-29 Thread brkyvz
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/7100 [SPARK-8715] ArrayOutOfBoundsException fixed for DataFrameStatSuite.crosstab cc @yhuai You can merge this pull request into a Git repository by running: $ git pull https://github.com/brkyvz

[GitHub] spark pull request: [SPARK-8186] [SPARK-8187] [SQL] datetime funct...

2015-06-29 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/6782#discussion_r33526888 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -946,6 +947,38 @@ object functions { def cosh(columnName: String): Column

[GitHub] spark pull request: [SPARK-8715] ArrayOutOfBoundsException fixed f...

2015-06-29 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/7100#issuecomment-116880115 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark pull request: [WIP][SPARK-8313] R Spark packages support

2015-06-30 Thread brkyvz
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/7139 [WIP][SPARK-8313] R Spark packages support ** This is a work in progress. Opened this up early for community testing ** @shivaram @cafreeman Could you please help me in testing this out

[GitHub] spark pull request: [WIP][SPARK-8313] R Spark packages support

2015-07-01 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/7139#issuecomment-117839051 @shivaram Thanks for the feedback. I'll add more error messages regarding structuring. Regarding (1), right now, this doesn't install anything on the exec

[GitHub] spark pull request: [SPARK-8803] handle special characters in elem...

2015-07-02 Thread brkyvz
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/7201 [SPARK-8803] handle special characters in elements in crosstab cc @rxin Having backticks or null as elements causes problems. Since elements become column names, we have to drop them

[GitHub] spark pull request: [SPARK-7944][SPARK-8013] Remove most of the Sp...

2015-07-07 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/6903#issuecomment-119240646 Hey, sorry for missing this thread. I'll publish the 2.11.7 version today and ping you all.

[GitHub] spark pull request: [SPARK-7944][SPARK-8013] Remove most of the Sp...

2015-07-07 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/6903#issuecomment-119272261 @dragos Just published the genjavadoc 2.11.7!

[GitHub] spark pull request: [SPARK-7944][SPARK-8013] Remove most of the Sp...

2015-07-07 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/6903#discussion_r34064787 --- Diff: repl/scala-2.11/src/test/scala/org/apache/spark/repl/ReplSuite.scala --- @@ -22,14 +22,11 @@ import java.net.URLClassLoader import

[GitHub] spark pull request: [SPARK-8903] Fix bug in cherry-pick of SPARK-8...

2015-07-08 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/7295#issuecomment-119697064 LGTM. Thanks @JoshRosen for your swift response!

[GitHub] spark pull request: [WIP][SPARK-8313] R Spark packages support

2015-07-08 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/7139#issuecomment-119828796 @shivaram @cafreeman I believe this is ready. I added unit and end-to-end tests.

[GitHub] spark pull request: [SPARK-8313] R Spark packages support

2015-07-09 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/7139#issuecomment-120066884 retest this please

[GitHub] spark pull request: [SPARK-8313] R Spark packages support

2015-07-09 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/7139#issuecomment-120105112 The test is failing because Jenkins is not allowing me to install an R package to `$SPARK_HOME/R/lib`. I'll have to update the code somehow. One question I have i

[GitHub] spark pull request: [SPARK-9263] Added flags to exclude dependenci...

2015-07-30 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/7599#issuecomment-126408446 @vanzin The tests passed locally :/ I don't want to see it get flaky again, let's see what happens.

[GitHub] spark pull request: [SPARK-8313] R Spark packages support

2015-07-30 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/7139#issuecomment-126484543 @sun-rui @shivaram Could you please take a look? Since the code freeze is right around the corner, I would like this to get in ASAP. In order to make sure that the R

[GitHub] spark pull request: [SPARK-9263] Added flags to exclude dependenci...

2015-07-30 Thread brkyvz
Github user brkyvz commented on the pull request: https://github.com/apache/spark/pull/7599#issuecomment-126517817 jenkins retest this please
