GitHub user yintengfei opened a pull request: https://github.com/apache/spark/pull/15404
Branch 2.0

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/15404.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #15404

----

commit 5735b8bd769c64e2b0e0fae75bad794cde3edc99
Author: Reynold Xin <r...@databricks.com>
Date: 2016-08-18T08:37:25Z

[SPARK-16391][SQL] Support partial aggregation for reduceGroups

## What changes were proposed in this pull request?

This patch introduces a new private ReduceAggregator interface that is a subclass of Aggregator. ReduceAggregator only requires a single associative and commutative reduce function. ReduceAggregator is also used to implement KeyValueGroupedDataset.reduceGroups in order to support partial aggregation.

Note that the pull request was initially done by viirya.

## How was this patch tested?

Covered by original tests for reduceGroups, as well as a new test suite for ReduceAggregator.

Author: Reynold Xin <r...@databricks.com>
Author: Liang-Chi Hsieh <sim...@tw.ibm.com>

Closes #14576 from rxin/reduceAggregator.

(cherry picked from commit 1748f824101870b845dbbd118763c6885744f98a)
Signed-off-by: Wenchen Fan <wenc...@databricks.com>

commit ec5f157a32f0c65b5f93bdde7a6334e982b3b83c
Author: petermaxlee <petermax...@gmail.com>
Date: 2016-08-18T11:44:13Z

[SPARK-17117][SQL] 1 / NULL should not fail analysis

## What changes were proposed in this pull request?

This patch fixes the problem described in SPARK-17117, i.e. "SELECT 1 / NULL" throws an analysis exception:

```
org.apache.spark.sql.AnalysisException: cannot resolve '(1 / NULL)' due to data type mismatch: differing types in '(1 / NULL)' (int and null).
```

The problem is that division type coercion did not take null type into account.

## How was this patch tested?

A unit test for the type coercion, and a few end-to-end test cases using SQLQueryTestSuite.

Author: petermaxlee <petermax...@gmail.com>

Closes #14695 from petermaxlee/SPARK-17117.

(cherry picked from commit 68f5087d2107d6afec5d5745f0cb0e9e3bdd6a0b)
Signed-off-by: Herman van Hovell <hvanhov...@databricks.com>
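For context, the failing query above can be reproduced with a minimal sketch; the local SparkSession setup here is illustrative and not part of the patch:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative local session; any SparkSession behaves the same here.
val spark = SparkSession.builder().master("local[*]").appName("spark-17117-repro").getOrCreate()

// Before the fix: AnalysisException, because division type coercion did not
// handle NullType. After the fix: NULL is coerced and the result is null.
spark.sql("SELECT 1 / NULL").show()
```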
commit 176af17a7213a4c2847a04f715137257657f2961
Author: Xin Ren <iamsh...@126.com>
Date: 2016-08-10T07:49:06Z

[MINOR][SPARKR] R API documentation for "coltypes" is confusing

## What changes were proposed in this pull request?

The R API documentation for "coltypes" is confusing; I found this while working on another ticket. In the current version, http://spark.apache.org/docs/2.0.0/api/R/coltypes.html, the parameter `x` is documented twice, and the example is not very clear.

![current](https://cloud.githubusercontent.com/assets/3925641/17386808/effb98ce-59a2-11e6-9657-d477d258a80c.png)
![screen shot 2016-08-03 at 5 56 00 pm](https://cloud.githubusercontent.com/assets/3925641/17386884/91831096-59a3-11e6-84af-39890b3d45d8.png)

## How was this patch tested?

Tested manually on my local machine. The updated screenshots are below:

![screen shot 2016-08-07 at 11 29 20 pm](https://cloud.githubusercontent.com/assets/3925641/17471144/df36633c-5cf6-11e6-8238-4e32ead0e529.png)
![screen shot 2016-08-03 at 5 56 22 pm](https://cloud.githubusercontent.com/assets/3925641/17386896/9d36cb26-59a3-11e6-9619-6dae29f7ab17.png)

Author: Xin Ren <iamsh...@126.com>

Closes #14489 from keypointt/rExample.

(cherry picked from commit 1203c8415cd11540f79a235e66a2f241ca6c71e4)
Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit ea684b69cd6934bc093f4a5a8b0d8470e92157cd
Author: Eric Liang <e...@databricks.com>
Date: 2016-08-18T11:33:55Z

[SPARK-17069] Expose spark.range() as table-valued function in SQL

## What changes were proposed in this pull request?

This adds analyzer rules for resolving table-valued functions, and adds one builtin implementation for range(). The arguments for range() are the same as those of `spark.range()`.

## How was this patch tested?

Unit tests.

cc hvanhovell

Author: Eric Liang <e...@databricks.com>

Closes #14656 from ericl/sc-4309.

(cherry picked from commit 412dba63b511474a6db3c43c8618d803e604bc6b)
Signed-off-by: Reynold Xin <r...@databricks.com>

commit c180d637a3caca0d4e46f4980c10d1005eb453bc
Author: petermaxlee <petermax...@gmail.com>
Date: 2016-08-19T01:19:47Z

[SPARK-16947][SQL] Support type coercion and foldable expression for inline tables

## What changes were proposed in this pull request?

This patch improves inline table support with the following:

1. Support type coercion.
2. Support using foldable expressions. Previously only literals were supported.
3. Improve error message handling.
4. Improve test coverage.

## How was this patch tested?

Added a new unit test suite ResolveInlineTablesSuite and a new file-based end-to-end test inline-table.sql.

Author: petermaxlee <petermax...@gmail.com>

Closes #14676 from petermaxlee/SPARK-16947.

(cherry picked from commit f5472dda51b980a726346587257c22873ff708e3)
Signed-off-by: Reynold Xin <r...@databricks.com>

commit 05b180faa4bd87498516c05d4769cc2f51d56aae
Author: Reynold Xin <r...@databricks.com>
Date: 2016-08-19T02:02:32Z

HOTFIX: compilation broken due to protected ctor.

(cherry picked from commit b482c09fa22c5762a355f95820e4ba3e2517fb77)
Signed-off-by: Reynold Xin <r...@databricks.com>

commit d55d1f454e6739ccff9c748f78462d789b09991f
Author: Nick Lavers <nick.lav...@videoamp.com>
Date: 2016-08-19T09:11:59Z

[SPARK-16961][CORE] Fixed off-by-one error that biased randomizeInPlace

JIRA issue link: https://issues.apache.org/jira/browse/SPARK-16961

Changed one line of Utils.randomizeInPlace to allow elements to stay in place. Created a unit test that runs a Pearson's chi-squared test to determine whether the output diverges significantly from a uniform distribution.

Author: Nick Lavers <nick.lav...@videoamp.com>

Closes #14551 from nicklavers/SPARK-16961-randomizeInPlace.

(cherry picked from commit 5377fc62360d5e9b5c94078e41d10a96e0e8a535)
Signed-off-by: Sean Owen <so...@cloudera.com>
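For intuition, here is a sketch of an unbiased Fisher-Yates shuffle of the kind the one-line fix restores; this is illustrative code, not the actual `Utils.randomizeInPlace` source:

```scala
import scala.util.Random

// Unbiased Fisher-Yates shuffle. The off-by-one bug drew the swap index j
// from 0 until i (exclusive), so position i could never keep its element;
// drawing j from 0 to i inclusive removes that bias.
def randomizeInPlace[T](arr: Array[T], rand: Random = new Random): Array[T] = {
  for (i <- (arr.length - 1) to 1 by -1) {
    val j = rand.nextInt(i + 1) // inclusive upper bound lets arr(i) stay put
    val tmp = arr(j)
    arr(j) = arr(i)
    arr(i) = tmp
  }
  arr
}
```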
commit e0c60f1850706faf2830b09af3dc6b52ffd9991e
Author: Reynold Xin <r...@databricks.com>
Date: 2016-08-19T13:11:35Z

[SPARK-16994][SQL] Whitelist operators for predicate pushdown

## What changes were proposed in this pull request?

This patch changes the predicate pushdown optimization rule (PushDownPredicate) from using a blacklist to a whitelist. That is to say, operators must now be explicitly allowed. This approach is more future-proof: previously it was possible for us to introduce a new operator and thereby render the optimization rule incorrect.

This also fixes a bug: we previously allowed pushing a filter beneath a limit, which is incorrect. That is to say, before this patch, the optimizer would rewrite

```
select * from (select * from range(10) limit 5) where id > 3
```

to

```
select * from range(10) where id > 3 limit 5
```

## How was this patch tested?

- a unit test case in FilterPushdownSuite
- an end-to-end test in limit.sql

Author: Reynold Xin <r...@databricks.com>

Closes #14713 from rxin/SPARK-16994.

(cherry picked from commit 67e59d464f782ff5f509234212aa072a7653d7bf)
Signed-off-by: Wenchen Fan <wenc...@databricks.com>
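A sketch of why the old rewrite was unsound, assuming a SparkSession named `spark`:

```scala
// A filter applied AFTER a limit must only see the limited rows.
val limited = spark.range(10).limit(5) // five rows drawn from 0..9
val filtered = limited.where("id > 3") // filters just those five rows

// If the filter were pushed beneath the limit, the plan would become
// spark.range(10).where("id > 3").limit(5), which can return up to five
// rows (4..8) rather than filtering only the five rows the limit produced.
filtered.show()
```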
commit d0707c6baeb4003735a508f981111db370984354
Author: Kousuke Saruta <saru...@oss.nttdata.co.jp>
Date: 2016-08-19T15:11:25Z

[SPARK-11227][CORE] UnknownHostException can be thrown when NameNode HA is enabled.

## What changes were proposed in this pull request?

If the following conditions are satisfied, executors don't load the properties in `hdfs-site.xml` and an UnknownHostException can be thrown:

(1) NameNode HA is enabled
(2) spark.eventLogging is disabled, or the logging path is NOT on HDFS
(3) Standalone or Mesos is used as the cluster manager
(4) No code loads the `HdfsConfiguration` class in the driver, directly or indirectly
(5) The tasks access HDFS

(There might be some more conditions...)

For example, the following code causes an UnknownHostException when the conditions above are satisfied:

```
sc.textFile("<path on HDFS>").collect
```

```
java.lang.IllegalArgumentException: java.net.UnknownHostException: hacluster
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
    at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:656)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:438)
    at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:411)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:986)
    at org.apache.spark.SparkContext$$anonfun$hadoopFile$1$$anonfun$32.apply(SparkContext.scala:986)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:177)
    at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$6.apply(HadoopRDD.scala:177)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:177)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:213)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:209)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: hacluster
```

But the following code doesn't cause the exception, because the `textFile` method loads `HdfsConfiguration` indirectly:

```
sc.textFile("<path on HDFS>").collect
```

When a job includes operations that access HDFS, the `org.apache.hadoop.Configuration` object is wrapped by `SerializableConfiguration`, serialized, and broadcast from the driver to the executors. Each executor deserializes the object with `loadDefaults` set to false, so HDFS-related properties must be set before the configuration is broadcast.

## How was this patch tested?

Tested manually on my standalone cluster.

Author: Kousuke Saruta <saru...@oss.nttdata.co.jp>

Closes #13738 from sarutak/SPARK-11227.

(cherry picked from commit 071eaaf9d2b63589f2e66e5279a16a5a484de6f5)
Signed-off-by: Tom Graves <tgra...@yahoo-inc.com>

commit 3276ccfac807514d5a959415bcf58d2aa6ed8fbc
Author: Liang-Chi Hsieh <sim...@tw.ibm.com>
Date: 2016-07-26T04:00:01Z

[SPARK-16686][SQL] Remove PushProjectThroughSample since it is handled by ColumnPruning

We push down `Project` through `Sample` in `Optimizer` by the rule `PushProjectThroughSample`. However, if the projected columns produce new output, they will see the whole data instead of the sampled data. This introduces an inconsistency between the original plan (Sample then Project) and the optimized plan (Project then Sample). In the extreme case attached in the JIRA, if the projected column is a UDF that is not supposed to see the sampled-out data, the result of the UDF will be incorrect.

Since the rule `ColumnPruning` already handles general `Project` pushdown, we don't need `PushProjectThroughSample` anymore. The rule `ColumnPruning` also avoids the described issue.

Jenkins tests.

Author: Liang-Chi Hsieh <sim...@tw.ibm.com>

Closes #14327 from viirya/fix-sample-pushdown.

(cherry picked from commit 7b06a8948fc16d3c14e240fdd632b79ce1651008)
Signed-off-by: Reynold Xin <r...@databricks.com>
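A sketch of the Sample/Project inconsistency, assuming a SparkSession `spark`; the counting UDF is hypothetical and its counter is only meaningful in local mode:

```scala
import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical side-effecting UDF that records how many rows the projection sees.
val rowsSeen = new AtomicLong(0)
val observe = udf { (id: Long) => rowsSeen.incrementAndGet(); id }

val sampled = spark.range(1000).sample(withReplacement = false, fraction = 0.1)
sampled.select(observe(col("id"))).collect()
// Expected: roughly 100 rows observed. Under the buggy
// PushProjectThroughSample rule, the Project could be evaluated beneath the
// Sample, so the UDF saw all 1000 rows.
println(rowsSeen.get)
```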
commit ae89c8e170dd77e0b2adc04a2c85577f6df5cdef
Author: Sital Kedia <ske...@fb.com>
Date: 2016-08-19T18:27:30Z

[SPARK-17113] [SHUFFLE] Job failure due to Executor OOM in offheap mode

## What changes were proposed in this pull request?

This PR fixes an executor OOM in offheap mode caused by a bug in Cooperative Memory Management for UnsafeExternalSorter. UnsafeExternalSorter was checking whether a memory page is being used by upstream by comparing the base object address of the current page with the base object address of upstream. However, in the case of offheap memory allocation, the base object addresses are always null, so no spilling happened and eventually the operator would OOM.

Following is the stack trace this issue addresses:

```
java.lang.OutOfMemoryError: Unable to acquire 1220 bytes of memory, got 0
    at org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:341)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:362)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:93)
    at org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:170)
```

## How was this patch tested?

Tested by running the failing job.

Author: Sital Kedia <ske...@fb.com>

Closes #14693 from sitalkedia/fix_offheap_oom.

(cherry picked from commit cf0cce90364d17afe780ff9a5426dfcefa298535)
Signed-off-by: Davies Liu <davies....@gmail.com>

commit efe832200f2fdf90868f5d03b45f1d75502444b3
Author: petermaxlee <petermax...@gmail.com>
Date: 2016-08-20T01:14:45Z

[SPARK-17149][SQL] array.sql for testing array related functions

## What changes were proposed in this pull request?

This patch creates array.sql in SQLQueryTestSuite for testing array related functions, including:

- indexing
- array creation
- size
- array_contains
- sort_array

## How was this patch tested?

The patch itself is about adding tests.

Author: petermaxlee <petermax...@gmail.com>

Closes #14708 from petermaxlee/SPARK-17149.

(cherry picked from commit a117afa7c2d94f943106542ec53d74ba2b5f1058)
Signed-off-by: Reynold Xin <r...@databricks.com>

commit 379b1272925e534d99ddf4e4add054284900d200
Author: Srinath Shankar <srin...@databricks.com>
Date: 2016-08-20T02:54:26Z

[SPARK-17158][SQL] Change error message for out of range numeric literals

## What changes were proposed in this pull request?

Modifies the error message for numeric literals to:

    Numeric literal <literal> does not fit in range [min, max] for type <T>

## How was this patch tested?

Fixed up the error messages for literals.sql in SqlQueryTestSuite and re-ran via sbt. Also fixed up error messages in ExpressionParserSuite.

Author: Srinath Shankar <srin...@databricks.com>

Closes #14721 from srinathshankar/sc4296.

(cherry picked from commit ba1737c21aab91ff3f1a1737aa2d6b07575e36a3)
Signed-off-by: Reynold Xin <r...@databricks.com>

commit f7458c71d3b02864acb33fc48c130a0a734e9723
Author: petermaxlee <petermax...@gmail.com>
Date: 2016-08-20T05:19:38Z

[SPARK-17150][SQL] Support SQL generation for inline tables

## What changes were proposed in this pull request?

This patch adds support for SQL generation for inline tables. With this, it becomes possible to create a view that depends on inline tables.

## How was this patch tested?

Added a test case in LogicalPlanToSQLSuite.

Author: petermaxlee <petermax...@gmail.com>

Closes #14709 from petermaxlee/SPARK-17150.

(cherry picked from commit 45d40d9f66c666eec6df926db23937589d67225d)
Signed-off-by: Wenchen Fan <wenc...@databricks.com>
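As a sketch of what the improved inline table support allows, assuming a SparkSession `spark`:

```scala
// Column `score` mixes INT and DECIMAL literals (type coercion), and
// `1 + 1` is a foldable expression rather than a bare literal.
spark.sql("SELECT * FROM VALUES (1, 1), (2, 2.5), (3, 1 + 1) AS t(id, score)").show()
```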
commit 4c4c2753b1012e395ae3896396b6509d6082fdf2
Author: Liang-Chi Hsieh <sim...@tw.ibm.com>
Date: 2016-08-20T15:29:48Z

[SPARK-17104][SQL] LogicalRelation.newInstance should follow the semantics of MultiInstanceRelation

## What changes were proposed in this pull request?

Currently `LogicalRelation.newInstance()` simply creates another `LogicalRelation` object with the same parameters. However, the `newInstance()` method inherited from `MultiInstanceRelation` should return a copy of the object with unique expression ids. The current `LogicalRelation.newInstance()` can cause failures when doing a self-join.

## How was this patch tested?

Jenkins tests.

Author: Liang-Chi Hsieh <sim...@tw.ibm.com>

Closes #14682 from viirya/fix-localrelation.

(cherry picked from commit 31a015572024046f4deaa6cec66bb6fab110f31d)
Signed-off-by: Wenchen Fan <wenc...@databricks.com>

commit 24dd9a702694db1d2c28ff4c41edac2b3112df60
Author: petermaxlee <petermax...@gmail.com>
Date: 2016-08-20T16:25:55Z

[SPARK-17124][SQL] RelationalGroupedDataset.agg should preserve order and allow multiple aggregates per column

## What changes were proposed in this pull request?

This patch fixes a longstanding issue with one of the RelationalGroupedDataset.agg functions. Even though the signature accepts a vararg of pairs, the underlying implementation turned the sequence into a map, which is neither order-preserving nor able to hold multiple aggregates per column. This change also allows users to use this function to run multiple different aggregations on a single column, e.g.

```
agg("age" -> "max", "age" -> "count")
```

## How was this patch tested?

Added a test case in DataFrameAggregateSuite.

Author: petermaxlee <petermax...@gmail.com>

Closes #14697 from petermaxlee/SPARK-17124.

(cherry picked from commit 9560c8d29542a5dcaaa07b7af9ef5ddcdbb5d14d)
Signed-off-by: Wenchen Fan <wenc...@databricks.com>
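A runnable sketch of the fixed behavior; the SparkSession `spark` and the sample data are assumed:

```scala
import spark.implicits._

// Made-up sample data.
val people = Seq(("alice", 23), ("alice", 31), ("bob", 18)).toDF("name", "age")

// Both "age" aggregates are kept, in the order given. Previously the pairs
// were collapsed into a Map, which silently dropped one of them.
people.groupBy("name").agg("age" -> "max", "age" -> "count").show()
```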
commit faff9297d154596e35de555c819049ba9a51d57d
Author: Bryan Cutler <cutl...@gmail.com>
Date: 2016-08-20T20:45:26Z

[SPARK-12666][CORE] SparkSubmit packages fix for when 'default' conf doesn't exist in dependent module

## What changes were proposed in this pull request?

Adding a "(runtime)" suffix to the dependency configuration sets a fallback configuration to be used if the requested one is not found. E.g. with the setting "default(runtime)", Ivy will look for the conf "default" in the module's ivy file and, if not found, will look for the conf "runtime". This helps with the case of using "sbt publishLocal", which does not write a "default" conf into the published ivy.xml file.

## How was this patch tested?

Used spark-submit with the --packages option for a package published locally with no default conf, and for a package resolved from Maven central.

Author: Bryan Cutler <cutl...@gmail.com>

Closes #13428 from BryanCutler/fallback-package-conf-SPARK-12666.

(cherry picked from commit 9f37d4eac28dd179dd523fa7d645be97bb52af9c)
Signed-off-by: Josh Rosen <joshro...@databricks.com>

commit 26d5a8b0dab10310ec76b91465b3b4ff465e9746
Author: Xiangrui Meng <m...@databricks.com>
Date: 2016-08-21T17:31:25Z

[MINOR][R] add SparkR.Rcheck/ and SparkR_*.tar.gz to R/.gitignore

## What changes were proposed in this pull request?

Ignore temp files generated by `check-cran.sh`.

Author: Xiangrui Meng <m...@databricks.com>

Closes #14740 from mengxr/R-gitignore.

(cherry picked from commit ab7143463daf2056736c85e3a943c826b5992623)
Signed-off-by: Xiangrui Meng <m...@databricks.com>

commit 0297896119e11f23da4b14f62f50ec72b5fac57f
Author: Junyang Qian <junya...@databricks.com>
Date: 2016-08-20T13:59:23Z

[SPARK-16508][SPARKR] Fix CRAN undocumented/duplicated arguments warnings.

This PR tries to fix all the remaining "undocumented/duplicated arguments" warnings given by the CRAN check. The one warning left is for the doc of R's `stats::glm` exported in SparkR; to mute it, we would have to also document all arguments of that non-SparkR function. Some previous conversation is in #14558.

Tested with the R unit tests and the `check-cran.sh` script (with no-test).

Author: Junyang Qian <junya...@databricks.com>

Closes #14705 from junyangq/SPARK-16508-master.

(cherry picked from commit 01401e965b58f7e8ab615764a452d7d18f1d4bf0)
Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit e62b29f29f44196a1cbe13004ff4abfd8e5be1c1
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-08-21T20:07:47Z

[SPARK-17098][SQL] Fix `NullPropagation` optimizer to handle `COUNT(NULL) OVER` correctly

## What changes were proposed in this pull request?

Currently, the `NullPropagation` optimizer replaces `COUNT` on null literals in a bottom-up fashion. During that, `WindowExpression` is not covered properly. This PR adds the missing propagation logic.

**Before**

```scala
scala> sql("SELECT COUNT(1 + NULL) OVER ()").show
java.lang.UnsupportedOperationException: Cannot evaluate expression: cast(0 as bigint) windowspecdefinition(ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
```

**After**

```scala
scala> sql("SELECT COUNT(1 + NULL) OVER ()").show
+----------------------------------------------------------------------------------------------+
|count((1 + CAST(NULL AS INT))) OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)|
+----------------------------------------------------------------------------------------------+
|                                                                                             0|
+----------------------------------------------------------------------------------------------+
```

## How was this patch tested?

Pass the Jenkins test with a new test case.

Author: Dongjoon Hyun <dongj...@apache.org>

Closes #14689 from dongjoon-hyun/SPARK-17098.

(cherry picked from commit 91c2397684ab791572ac57ffb2a924ff058bb64f)
Signed-off-by: Herman van Hovell <hvanhov...@databricks.com>
commit 49cc44de3ad5495b2690633791941aa00a62b553
Author: Davies Liu <dav...@databricks.com>
Date: 2016-08-22T08:16:03Z

[SPARK-17115][SQL] decrease the threshold when split expressions

## What changes were proposed in this pull request?

In 2.0, we changed the threshold for splitting expressions from 16K to 64K, which caused very bad performance on wide tables, because the generated method can't be JIT-compiled by default (it exceeds the 8K bytecode limit). This PR decreases the threshold to 1K, based on the benchmark results for a wide table with 400 columns of LongType. It also fixes a bug around splitting expressions in whole-stage codegen (they should not be split there).

## How was this patch tested?

Added a benchmark suite.

Author: Davies Liu <dav...@databricks.com>

Closes #14692 from davies/split_exprs.

(cherry picked from commit 8d35a6f68d6d733212674491cbf31bed73fada0f)
Signed-off-by: Wenchen Fan <wenc...@databricks.com>

commit 2add45fabeb0ea4f7b17b5bc4910161370e72627
Author: Jagadeesan <a...@us.ibm.com>
Date: 2016-08-22T08:30:31Z

[SPARK-17085][STREAMING][DOCUMENTATION AND ACTUAL CODE DIFFERS - UNSUPPORTED OPERATIONS]

Changes in the Spark Structured Streaming doc at this link: https://spark.apache.org/docs/2.0.0/structured-streaming-programming-guide.html#unsupported-operations

Author: Jagadeesan <a...@us.ibm.com>

Closes #14715 from jagadeesanas2/SPARK-17085.

(cherry picked from commit bd9655063bdba8836b4ec96ed115e5653e246b65)
Signed-off-by: Sean Owen <so...@cloudera.com>

commit 79195982a4c6f8b1a3e02069dea00049cc806574
Author: Junyang Qian <junya...@databricks.com>
Date: 2016-08-22T17:03:48Z

[SPARKR][MINOR] Fix Cache Folder Path in Windows

## What changes were proposed in this pull request?

This PR fixes the scheme of the local cache folder in Windows. The name of the environment variable should be `LOCALAPPDATA` rather than `%LOCALAPPDATA%`.

## How was this patch tested?

Manual test in Windows 7.

Author: Junyang Qian <junya...@databricks.com>

Closes #14743 from junyangq/SPARKR-FixWindowsInstall.

(cherry picked from commit 209e1b3c0683a9106428e269e5041980b6cc327f)
Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit 94eff08757cee70c5b31fff7095bbb1e6ebc7ecf
Author: Sean Owen <so...@cloudera.com>
Date: 2016-08-22T18:15:53Z

[SPARK-16320][DOC] Document G1 heap region's effect on spark 2.0 vs 1.6

## What changes were proposed in this pull request?

Collect the GC discussion in one section, and document findings about the G1 GC heap region size.

## How was this patch tested?

Jekyll doc build.

Author: Sean Owen <so...@cloudera.com>

Closes #14732 from srowen/SPARK-16320.

(cherry picked from commit 342278c09cf6e79ed4f63422988a6bbd1e7d8a91)
Signed-off-by: Yin Huai <yh...@databricks.com>

commit 6dcc1a3f0cc8f2ed71f7bb6b1493852a58259d2f
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date: 2016-08-22T19:53:52Z

[SPARKR][MINOR] Add Xiangrui and Felix to maintainers

## What changes were proposed in this pull request?

This change adds Xiangrui Meng and Felix Cheung to the maintainers field in the package description.

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

Closes #14758 from shivaram/sparkr-maintainers.

(cherry picked from commit 6f3cd36f93c11265449fdce3323e139fec8ab22d)
Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit 01a4d69f309a1cc8d370ce9f85e6a4f31b6db3b8
Author: Eric Liang <e...@databricks.com>
Date: 2016-08-22T22:48:35Z

[SPARK-17162] Range does not support SQL generation

## What changes were proposed in this pull request?

The range operator previously didn't support SQL generation, which made it impossible to use in views.

## How was this patch tested?

Unit tests.

cc hvanhovell

Author: Eric Liang <e...@databricks.com>

Closes #14724 from ericl/spark-17162.

(cherry picked from commit 84770b59f773f132073cd2af4204957fc2d7bf35)
Signed-off-by: Reynold Xin <r...@databricks.com>
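Combined with the range() table-valued function from SPARK-17069 above, this enables sketches like the following; it assumes a SparkSession `spark` with a catalog that supports persistent views (e.g. Hive support), and the view name is made up:

```scala
// A persistent view over range() requires SQL generation for the range
// operator, which this commit adds.
spark.sql("CREATE OR REPLACE VIEW first_ten AS SELECT * FROM range(10)")
spark.sql("SELECT id FROM first_ten WHERE id % 2 = 0").show()
```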
commit b65b041af8b64413c7d460d4ea110b2044d6f36e
Author: Felix Cheung <felixcheun...@hotmail.com>
Date: 2016-08-22T22:53:10Z

[SPARK-16508][SPARKR] doc updates and more CRAN check fixes

- replace ``` ` ``` in code doc with `\code{thing}`
- remove added `...` for drop(DataFrame)
- fix remaining CRAN check warnings
- create doc with knitr

junyangq

Author: Felix Cheung <felixcheun...@hotmail.com>

Closes #14734 from felixcheung/rdoccleanup.

(cherry picked from commit 71afeeea4ec8e67edc95b5d504c557c88a2598b9)
Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit ff2f873800fcc3d699e52e60fd0e69eb01d12503
Author: Eric Liang <e...@databricks.com>
Date: 2016-08-22T23:32:14Z

[SPARK-16550][SPARK-17042][CORE] Certain classes fail to deserialize in block manager replication

## What changes were proposed in this pull request?

This is a straightforward clone of JoshRosen's original patch. I have follow-up changes to fix block replication for repl-defined classes as well, but those appear to be flaking tests, so I'm going to leave that for SPARK-17042.

## How was this patch tested?

End-to-end test in ReplSuite (also more tests in DistributedSuite from the original patch).

Author: Eric Liang <e...@databricks.com>

Closes #14311 from ericl/spark-16550.

(cherry picked from commit 8e223ea67acf5aa730ccf688802f17f6fc10907c)
Signed-off-by: Reynold Xin <r...@databricks.com>

commit 225898961bc4bc71d56f33c027adbb2d0929ae5a
Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>
Date: 2016-08-23T00:09:32Z

[SPARK-16577][SPARKR] Add CRAN documentation checks to run-tests.sh

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

This change adds CRAN documentation checks to be run as a part of `R/run-tests.sh`. As this script is also used by Jenkins, this means that we will get documentation checks on every PR going forward.

(If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Author: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

Closes #14759 from shivaram/sparkr-cran-jenkins.

(cherry picked from commit 920806ab272ba58a369072a5eeb89df5e9b470a6)
Signed-off-by: Shivaram Venkataraman <shiva...@cs.berkeley.edu>

commit eaea1c86b897d302107a9b6833a27a2b24ca31a0
Author: Cheng Lian <l...@databricks.com>
Date: 2016-08-23T01:11:47Z

[SPARK-17182][SQL] Mark Collect as non-deterministic

## What changes were proposed in this pull request?

This PR marks the abstract class `Collect` as non-deterministic since the results of `CollectList` and `CollectSet` depend on the actual order of input rows.

## How was this patch tested?

Existing test cases should be enough.

Author: Cheng Lian <l...@databricks.com>

Closes #14749 from liancheng/spark-17182-non-deterministic-collect.

(cherry picked from commit 2cdd92a7cd6f85186c846635b422b977bdafbcdd)
Signed-off-by: Wenchen Fan <wenc...@databricks.com>
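To illustrate the non-determinism, a sketch assuming a SparkSession `spark`:

```scala
import org.apache.spark.sql.functions.{col, collect_list}

// The order of values inside each collected list depends on partitioning
// and task scheduling, so two runs can produce differently ordered arrays.
// Marking Collect non-deterministic keeps the optimizer from reordering or
// deduplicating these expressions as if they were pure.
val df = spark.range(100).repartition(8)
df.groupBy(col("id") % 10).agg(collect_list(col("id"))).show(truncate = false)
```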