[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-03 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81668841 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StatefulAggregate.scala --- @@ -56,7 +57,12 @@ case class StateStoreRestoreExec

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-03 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81672040 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -136,16 +139,30 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-03 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81672432 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -317,15 +358,18 @@ class StreamExecution

[GitHub] spark pull request #15307: [WIP][SPARK-17731][SQL][STREAMING] Metrics for st...

2016-10-03 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r81684775 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -136,16 +139,30 @@ class StreamExecution

[GitHub] spark pull request #15352: [SPARK-17780][SQL]Report Throwable to user in Str...

2016-10-05 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15352#discussion_r82085912 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -207,13 +207,18 @@ class StreamExecution

[GitHub] spark pull request #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source...

2016-10-14 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r83524216 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Source.scala --- @@ -30,16 +30,37 @@ trait Source { /** Returns the

[GitHub] spark pull request #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source...

2016-10-14 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r83524491 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/socket.scala --- @@ -92,21 +105,64 @@ class TextSocketSource(host: String, port

[GitHub] spark issue #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source trait ...

2016-10-14 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/14553 Sorry, I missed the last few email notifications about this PR. I've merged with the head version and made updates to address the most recent round of review comments. Currently running regre

[GitHub] spark issue #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source trait ...

2016-10-17 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/14553 All my changes are in now, and regression tests pass. As far as I can see, all the review comments have been addressed at this point. --- If your project is set up for it, you can reply to this

[GitHub] spark pull request #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source...

2016-10-19 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r84138507 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/Source.scala --- @@ -30,16 +30,30 @@ trait Source { /** Returns the

[GitHub] spark issue #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source trait ...

2016-10-21 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/14553 I've been running tests since this morning; should have updates in soon.

[GitHub] spark pull request #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source...

2016-10-21 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r84538526 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -336,17 +342,27 @@ class StreamExecution

[GitHub] spark pull request #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source...

2016-10-21 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r84539662 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -336,17 +342,27 @@ class StreamExecution

[GitHub] spark pull request #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source...

2016-10-21 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r84569335 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -337,17 +343,27 @@ class StreamExecution

[GitHub] spark pull request #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source...

2016-10-26 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r85227658 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala --- @@ -111,6 +126,23 @@ case class MemoryStream[A : Encoder](id

[GitHub] spark pull request #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source...

2016-10-26 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r85227714 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala --- @@ -111,6 +126,23 @@ case class MemoryStream[A : Encoder](id

[GitHub] spark issue #14553: [SPARK-16963] [STREAMING] [SQL] Changes to Source trait ...

2016-10-26 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/14553 Updated the branch and addressed new review comments. Looks like my last push missed a one-line change to memory.scala. Tests are running now.

[GitHub] spark pull request #15162: [SPARK-17386] [STREAMING] [WIP] Make polling rate...

2016-10-26 Thread frreiss
Github user frreiss closed the pull request at: https://github.com/apache/spark/pull/15162

[GitHub] spark issue #15162: [SPARK-17386] [STREAMING] [WIP] Make polling rate adapti...

2016-10-26 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/15162 Closing the PR.

[GitHub] spark issue #15027: [SPARK-17475] [STREAMING] Delete CRC files if the filesy...

2016-10-27 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/15027 When I comment out line 155 in HDFSMetadataLog.scala on this branch (`if (fileManager.exists(crcPath)) fileManager.delete(crcPath)`) and run the test case attached to this PR, the test case fails

[GitHub] spark issue #15027: [SPARK-17475] [STREAMING] Delete CRC files if the filesy...

2016-11-02 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/15027 @viirya to answer your question re deleting vs moving the files: Deleting is easier to implement, because once the .crc file is deleted, you can be sure it won't appear again. Moving the che

[GitHub] spark pull request #14553: [SPARK-16963] Changes to Source trait and related...

2016-08-26 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r76498223 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MetadataLog.scala --- @@ -48,4 +49,13 @@ trait MetadataLog[T] { * Return

[GitHub] spark pull request #14553: [SPARK-16963] Changes to Source trait and related...

2016-08-26 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r76498251 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -244,6 +250,21 @@ class StreamExecution

[GitHub] spark pull request #14553: [SPARK-16963] Changes to Source trait and related...

2016-08-26 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r76498301 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/socket.scala --- @@ -24,21 +24,24 @@ import java.text.SimpleDateFormat

[GitHub] spark pull request #14553: [SPARK-16963] Changes to Source trait and related...

2016-08-26 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14553#discussion_r76498637 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -727,6 +732,48 @@ class FileStreamSourceSuite extends

[GitHub] spark pull request #13513: [SPARK-15698][SQL][Streaming] Add the ability to ...

2016-08-26 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13513#discussion_r76499068 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -129,3 +131,86 @@ class FileStreamSource

[GitHub] spark issue #14802: [SPARK-17235][SQL] Support purging of old logs in Metada...

2016-08-26 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/14802 LGTM. I have written nearly the exact same thing as part of [https://github.com/apache/spark/pull/14553], but can use this version of the method instead.

[GitHub] spark pull request #14773: [SPARK-17203][SQL] data source options should alw...

2016-08-26 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14773#discussion_r76503740 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/createDataSourceTables.scala --- @@ -65,7 +65,7 @@ case class

[GitHub] spark pull request #14691: [SPARK-16407][STREAMING] Allow users to supply cu...

2016-08-26 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14691#discussion_r76504239 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -123,12 +124,30 @@ final class DataStreamWriter[T] private

[GitHub] spark pull request #14691: [SPARK-16407][STREAMING] Allow users to supply cu...

2016-08-26 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14691#discussion_r76505064 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -123,12 +124,30 @@ final class DataStreamWriter[T] private

[GitHub] spark issue #14553: [SPARK-16963] Changes to Source trait and related implem...

2016-08-29 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/14553 @rxin and @marmbrus, would it be possible to get this PR reviewed soon? I can split it into smaller chunks if that would make things easier; I just need to know.

[GitHub] spark pull request #14803: [SPARK-17153][SQL] Should read partition data whe...

2016-08-29 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/14803#discussion_r76646983 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FileStreamSource.scala --- @@ -129,13 +129,20 @@ class FileStreamSource

[GitHub] spark pull request #14870: [SPARK-17303] Added spark-warehouse to dev/.rat-e...

2016-08-29 Thread frreiss
GitHub user frreiss opened a pull request: https://github.com/apache/spark/pull/14870 [SPARK-17303] Added spark-warehouse to dev/.rat-excludes ## What changes were proposed in this pull request? Excludes the `spark-warehouse` directory from the Apache RAT checks that src

[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...

2016-08-30 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/14803 LGTM

[GitHub] spark issue #14553: [SPARK-16963] Changes to Source trait and related implem...

2016-08-31 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/14553 @ScrapCodes, would you mind triggering a build of this PR?

[GitHub] spark pull request #14945: [SPARK-17386] Set default trigger interval to 1/1...

2016-09-02 Thread frreiss
GitHub user frreiss opened a pull request: https://github.com/apache/spark/pull/14945 [SPARK-17386] Set default trigger interval to 1/10 second ## What changes were proposed in this pull request? This pull request implements the most expedient change to fix SPARK-17386

[GitHub] spark pull request #14986: [WIP] [SPARK-17421] Don't use -XX:MaxPermSize opt...

2016-09-06 Thread frreiss
GitHub user frreiss opened a pull request: https://github.com/apache/spark/pull/14986 [WIP] [SPARK-17421] Don't use -XX:MaxPermSize option when Java version >= 8 ## What changes were proposed in this pull request? Modifies the `build/mvn` and `build/sbt-launch-

[GitHub] spark issue #14986: [WIP] [SPARK-17421] Don't use -XX:MaxPermSize option whe...

2016-09-07 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/14986 Makes sense. I will close this PR and just add a clarification to the documentation.

[GitHub] spark pull request #14986: [WIP] [SPARK-17421] Don't use -XX:MaxPermSize opt...

2016-09-07 Thread frreiss
Github user frreiss closed the pull request at: https://github.com/apache/spark/pull/14986

[GitHub] spark pull request #15005: [SPARK-17421] Documenting the current treatment o...

2016-09-07 Thread frreiss
GitHub user frreiss opened a pull request: https://github.com/apache/spark/pull/15005 [SPARK-17421] Documenting the current treatment of MAVEN_OPTS. ## What changes were proposed in this pull request? Modified the documentation to clarify that `build/mvn` and `pom.xml

[GitHub] spark issue #14945: [SPARK-17386] Set default trigger interval to 1/10 secon...

2016-09-07 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/14945 On a closer reading of the code, there is a more expedient fix: change the default STREAMING_POLLING_DELAY parameter. Will redo.

[GitHub] spark pull request #14945: [SPARK-17386] Set default trigger interval to 1/1...

2016-09-07 Thread frreiss
Github user frreiss closed the pull request at: https://github.com/apache/spark/pull/14945

[GitHub] spark pull request #15027: [SPARK-17475] [STREAMING] Delete CRC files if the...

2016-09-09 Thread frreiss
GitHub user frreiss opened a pull request: https://github.com/apache/spark/pull/15027 [SPARK-17475] [STREAMING] Delete CRC files if the filesystem doesn't use checksum files ## What changes were proposed in this pull request? When the metadata logs for various par

[GitHub] spark issue #15005: [SPARK-17421] [DOCS] Documenting the current treatment o...

2016-09-09 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/15005 Sure, I'll redo that part so that it includes two sets of recommended options. Note that docs in the Spark 2.0.0 release say that these options aren't necessary for Java 8.

[GitHub] spark pull request #15027: [SPARK-17475] [STREAMING] Delete CRC files if the...

2016-09-12 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15027#discussion_r78469982 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/HDFSMetadataLog.scala --- @@ -146,6 +146,11 @@ class HDFSMetadataLog[T: ClassTag

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-12 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/13513 You could just move the metadata deletion logic from FileStreamSinkLog into CompactibleFileStreamLog. Then FileStreamSource could issue DELETE log records for files that are older than

[GitHub] spark pull request #15067: [SPARK-17513] [STREAMING] [SQL] Make StreamExecut...

2016-09-12 Thread frreiss
GitHub user frreiss opened a pull request: https://github.com/apache/spark/pull/15067 [SPARK-17513] [STREAMING] [SQL] Make StreamExecution garbage-collect its metadata ## What changes were proposed in this pull request? This PR modifies StreamExecution such that it

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-13 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/13513 Ah, now I fully understand @zsxwing's earlier comment about the semantics of `Source.getBatch()`. Those semantics have a design flaw; see the email thread I started at

[GitHub] spark issue #15005: [SPARK-17421] [DOCS] Documenting the current treatment o...

2016-09-14 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/15005 Quick update: I'm running a series of test builds with various parameters to determine what parts of MAVEN_OPTS are currently necessary on different versions of Java. Will report back in a few

[GitHub] spark issue #15005: [SPARK-17421] [DOCS] Documenting the current treatment o...

2016-09-20 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/15005 I've about narrowed down the options that work for OpenJDK 7 and 8 on Mac and Linux. Working on IBM Java on Linux. I can have an update in by EOD today. BTW, one thing that's been

[GitHub] spark pull request #15067: [SPARK-17513] [STREAMING] [SQL] Make StreamExecut...

2016-09-20 Thread frreiss
Github user frreiss closed the pull request at: https://github.com/apache/spark/pull/15067

[GitHub] spark pull request #15067: [SPARK-17513] [STREAMING] [SQL] Make StreamExecut...

2016-09-20 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15067#discussion_r79662093 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -125,6 +125,32 @@ class StreamingQuerySuite extends

[GitHub] spark pull request #15162: [SPARK-17386] [STREAMING] [WIP] Make polling rate...

2016-09-20 Thread frreiss
GitHub user frreiss opened a pull request: https://github.com/apache/spark/pull/15162 [SPARK-17386] [STREAMING] [WIP] Make polling rate adaptive ## What changes were proposed in this pull request? This change makes the scheduler in `StreamExecution` adjust its rate of

[GitHub] spark pull request #15166: [SPARK-17513][SQL] Make StreamExecution garbage-c...

2016-09-20 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15166#discussion_r79722643 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -125,6 +125,30 @@ class StreamingQuerySuite extends

[GitHub] spark pull request #15166: [SPARK-17513][SQL] Make StreamExecution garbage-c...

2016-09-20 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15166#discussion_r79730664 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -125,6 +125,30 @@ class StreamingQuerySuite extends

[GitHub] spark pull request #15166: [SPARK-17513][SQL] Make StreamExecution garbage-c...

2016-09-20 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15166#discussion_r79730904 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQuerySuite.scala --- @@ -125,6 +125,30 @@ class StreamingQuerySuite extends

[GitHub] spark issue #15005: [SPARK-17421] [DOCS] Documenting the current treatment o...

2016-09-20 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/15005 Summary of testing: - On Java 8, the build fails intermittently with OOM when `-Xmx2g` is omitted - The `-XX:ReservedCodeCacheSize=512m` argument prevents warnings on both Java 7 and

[GitHub] spark pull request #15005: [SPARK-17421] [DOCS] Documenting the current trea...

2016-09-20 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15005#discussion_r79765892 --- Diff: docs/building-spark.md --- @@ -16,24 +16,31 @@ Building Spark using Maven requires Maven 3.3.9 or newer and Java 7+. ### Setting up

[GitHub] spark pull request #15005: [SPARK-17421] [DOCS] Documenting the current trea...

2016-09-20 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15005#discussion_r79766509 --- Diff: docs/building-spark.md --- @@ -16,24 +16,27 @@ Building Spark using Maven requires Maven 3.3.9 or newer and Java 7+. ### Setting up

[GitHub] spark pull request #15005: [SPARK-17421] [DOCS] Documenting the current trea...

2016-09-21 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15005#discussion_r79902326 --- Diff: docs/building-spark.md --- @@ -16,24 +16,32 @@ Building Spark using Maven requires Maven 3.3.9 or newer and Java 7+. ### Setting up

[GitHub] spark issue #15005: [SPARK-17421] [DOCS] Documenting the current treatment o...

2016-09-23 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/15005 Thanks @srowen for all the thoughtful comments! It's great to see committers spending time to help improve the build experience for new developers.

[GitHub] spark pull request #15262: [SPARK-17690][STREAMING][SQL] Add mini-dfs cluste...

2016-09-27 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15262#discussion_r80826485 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -330,15 +353,42 @@ class FileStreamSourceSuite extends

[GitHub] spark pull request #15258: [SPARK-17689][SQL][STREAMING] added excludeFiles ...

2016-09-27 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/15258#discussion_r80838376 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ListingFileCatalog.scala --- @@ -50,6 +50,19 @@ class ListingFileCatalog

[GitHub] spark issue #15258: [SPARK-17689][SQL][STREAMING] added excludeFiles option ...

2016-09-27 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/15258 This change allows FileInputStream to consume partial outputs of a system such as Hadoop or another copy of Spark, provided that the system adheres rigidly to the write policy of recent versions of

[GitHub] spark issue #15262: [SPARK-17690][STREAMING][SQL] Add mini-dfs cluster based...

2016-09-28 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/15262 LGTM overall. We may want to switch more of the test cases to use HDFS in a follow-on JIRA.

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-09 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66478261 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark issue #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScalarSubque...

2016-06-09 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/13155 @rxin I'll have an updated set of changes in tonight

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-09 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66509405 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-09 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66509444 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-09 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66509510 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-09 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66539170 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-09 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66558119 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-09 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66558125 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-09 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66560868 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-09 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66560947 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-09 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66561017 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-09 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66561815 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-09 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r66564793 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark issue #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScalarSubque...

2016-06-09 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/13155 Updated changes are in. Running a full regression suite overnight.

[GitHub] spark issue #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScalarSubque...

2016-06-10 Thread frreiss
Github user frreiss commented on the issue: https://github.com/apache/spark/pull/13155 Tests ran successfully on my machine.

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-24 Thread frreiss
Github user frreiss commented on the pull request: https://github.com/apache/spark/pull/13155#issuecomment-221336991 Could one of the committers please trigger another build on this PR? The change set passes all the tests on my machine, but it's good to be safe.

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-27 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64941953 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-27 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64942480 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-27 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64942870 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-27 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64943404 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-27 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64944724 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-28 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64995985 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-28 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64996012 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedScalarSubque...

2016-05-31 Thread frreiss
Github user frreiss commented on the pull request: https://github.com/apache/spark/pull/13155 Thanks @hvanhovell for the additional pass of review! I'll be preparing my slides for Spark Summit all day today but will come back to this PR as soon as that's done.

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-01 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r65454461 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-02 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r65583548 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request #13155: [SPARK-15370] [SQL] Update RewriteCorrelatedScala...

2016-06-02 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r65584546 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-13857][ML][WIP] Add "recommend all" fun...

2016-04-25 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/12574#discussion_r60958325 --- Diff: mllib/src/main/scala/org/apache/spark/ml/recommendation/ALS.scala --- @@ -218,11 +292,135 @@ class ALSModel private[ml] ( predict

[GitHub] spark pull request: [SPARK-13857][ML][WIP] Add "recommend all" fun...

2016-04-25 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/12574#discussion_r60958628 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala --- @@ -261,58 +261,93 @@ object

[GitHub] spark pull request: [SPARK-13857][ML][WIP] Add "recommend all" fun...

2016-04-25 Thread frreiss
Github user frreiss commented on the pull request: https://github.com/apache/spark/pull/12574#issuecomment-214463888 LGTM

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-17 Thread frreiss
GitHub user frreiss opened a pull request: https://github.com/apache/spark/pull/13155 [SPARK-15370] [SQL] Update RewriteCorrelatedScalarSubquery rule to fix COUNT bug ## What changes were proposed in this pull request? This pull request fixes the COUNT bug in the

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-18 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r63756577 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1648,16 +1648,56 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-18 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r63757089 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1648,16 +1648,56 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-18 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r63759450 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1648,16 +1648,56 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-18 Thread frreiss
Github user frreiss commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r63767862 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala --- @@ -293,4 +293,65 @@ class SubquerySuite extends QueryTest with
