from:"marmbrus"

spark git commit: [SPARK-15062][SQL] fix list type infer serializer issue

2016-05-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 1c19c2769 -> 0fd95be3c [SPARK-15062][SQL] fix list type infer serializer issue ## What changes were proposed in this pull request? Make serializer correctly inferred if the input type is `List[_]`, since `List[_]` is type of `Seq[_]`, bef

spark git commit: [SPARK-14747][SQL] Add assertStreaming/assertNoneStreaming checks in DataFrameWriter

2016-05-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-2.0 fbc73f731 -> 65b94f460 [SPARK-14747][SQL] Add assertStreaming/assertNoneStreaming checks in DataFrameWriter ## Problem If an end user happens to write code mixed with continuous-query-oriented methods and non-continuous-query-oriente

spark git commit: [SPARK-14747][SQL] Add assertStreaming/assertNoneStreaming checks in DataFrameWriter

2016-05-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master f362363d1 -> 35d9c8aa6 [SPARK-14747][SQL] Add assertStreaming/assertNoneStreaming checks in DataFrameWriter ## Problem If an end user happens to write code mixed with continuous-query-oriented methods and non-continuous-query-oriented me

spark git commit: [SPARK-14830][SQL] Add RemoveRepetitionFromGroupExpressions optimizer.

2016-05-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master a35a67a83 -> 6e6320122 [SPARK-14830][SQL] Add RemoveRepetitionFromGroupExpressions optimizer. ## What changes were proposed in this pull request? This PR aims to optimize GroupExpressions by removing repeating expressions. `RemoveRepetiti
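
The idea of removing repeated grouping expressions can be sketched outside Spark. The snippet below is an illustration only and is not the optimizer rule from this commit: the function name `remove_repeated_group_expressions` is hypothetical, and expressions are modelled as plain strings compared after stripping whitespace, whereas a real optimizer would compare expression trees.

```python
def remove_repeated_group_expressions(exprs):
    """Drop repeated grouping expressions, keeping the first occurrence of each.

    Assumption: two expressions are "the same" if they are equal once all
    whitespace is removed. This is a stand-in for semantic equality.
    """
    seen = set()
    result = []
    for expr in exprs:
        key = "".join(expr.split())  # normalise away whitespace
        if key not in seen:
            seen.add(key)
            result.append(expr)
    return result


if __name__ == "__main__":
    # GROUP BY a, b, a, a + 1, a+1 keeps only three distinct expressions
    print(remove_repeated_group_expressions(["a", "b", "a", "a + 1", "a+1"]))
```

Order is preserved here so that the remaining grouping expressions appear in the sequence the user wrote them.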

spark git commit: [SPARK-14830][SQL] Add RemoveRepetitionFromGroupExpressions optimizer.

2016-05-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-2.0 1c2082b64 -> 972fd22e3 [SPARK-14830][SQL] Add RemoveRepetitionFromGroupExpressions optimizer. ## What changes were proposed in this pull request? This PR aims to optimize GroupExpressions by removing repeating expressions. `RemoveRepe

spark git commit: [SPARK-14579][SQL] Fix the race condition in StreamExecution.processAllAvailable again

2016-05-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-2.0 08ae32e61 -> 1c2082b64 [SPARK-14579][SQL] Fix the race condition in StreamExecution.processAllAvailable again ## What changes were proposed in this pull request? #12339 didn't fix the race condition. MemorySinkSuite is still flaky: h

spark git commit: [SPARK-14579][SQL] Fix the race condition in StreamExecution.processAllAvailable again

2016-05-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 992744186 -> a35a67a83 [SPARK-14579][SQL] Fix the race condition in StreamExecution.processAllAvailable again ## What changes were proposed in this pull request? #12339 didn't fix the race condition. MemorySinkSuite is still flaky: https

spark git commit: [SPARK-14637][SQL] object expressions cleanup

2016-05-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-2.0 ccb53a20e -> 1145ea01b [SPARK-14637][SQL] object expressions cleanup ## What changes were proposed in this pull request? Simplify and clean up some object expressions: 1. simplify the logic to handle `propagateNull` 2. add `propagateN

spark git commit: [SPARK-14637][SQL] object expressions cleanup

2016-05-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 214d1be4f -> 0513c3ac9 [SPARK-14637][SQL] object expressions cleanup ## What changes were proposed in this pull request? Simplify and clean up some object expressions: 1. simplify the logic to handle `propagateNull` 2. add `propagateNull`

spark git commit: [SPARK-14981][SQL] Throws exception if DESC is specified for sorting columns

2016-04-29 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 8ebae466a -> a04b1de5f [SPARK-14981][SQL] Throws exception if DESC is specified for sorting columns ## What changes were proposed in this pull request? Currently Spark SQL doesn't support sorting columns in descending order. However, the

spark git commit: [SPARK-14970][SQL] Prevent DataSource from enumerates all files in a directory if there is user specified schema

2016-04-28 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master d5ab42ceb -> 0ee5419b6 [SPARK-14970][SQL] Prevent DataSource from enumerates all files in a directory if there is user specified schema ## What changes were proposed in this pull request? The FileCatalog object gets created even if the use

spark git commit: [SPARK-14874][SQL][STREAMING] Remove the obsolete Batch representation

2016-04-27 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 7dd01d9c0 -> a234cc614 [SPARK-14874][SQL][STREAMING] Remove the obsolete Batch representation ## What changes were proposed in this pull request? The `Batch` class, which had been used to indicate progress in a stream, was abandoned by [[

spark git commit: [SPARK-14678][SQL] Add a file sink log to support versioning and compaction

2016-04-20 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 296c384af -> 7bc948557 [SPARK-14678][SQL] Add a file sink log to support versioning and compaction ## What changes were proposed in this pull request? This PR adds a special log for FileStreamSink for two purposes: - Versioning. A future

spark git commit: [SPARK-14741][SQL] Fixed error in reading json file stream inside a partitioned directory

2016-04-20 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master acc7e592c -> cb8ea9e1f [SPARK-14741][SQL] Fixed error in reading json file stream inside a partitioned directory ## What changes were proposed in this pull request? Consider the following directory structure dir/col=X/some-files If we cre

spark git commit: [SPARK-14555] First cut of Python API for Structured Streaming

2016-04-20 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 834277884 -> 80bf48f43 [SPARK-14555] First cut of Python API for Structured Streaming ## What changes were proposed in this pull request? This patch provides a first cut of python APIs for structured streaming. This PR provides the new cl

spark git commit: [SPARK-13929] Use Scala reflection for UDTs

2016-04-19 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 10f273d8d -> 3ae25f244 [SPARK-13929] Use Scala reflection for UDTs ## What changes were proposed in this pull request? Enable ScalaReflection and User Defined Types for plain Scala classes. This involves the move of `schemaFor` from `Scal

spark git commit: [SPARK-14474][SQL] Move FileSource offset log into checkpointLocation

2016-04-12 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master da60b34d2 -> 6bf692147 [SPARK-14474][SQL] Move FileSource offset log into checkpointLocation ## What changes were proposed in this pull request? Now that we have a single location for storing checkpointed state, this PR just propagates th

spark git commit: [SPARK-14494][SQL] Fix the race conditions in MemoryStream and MemorySink

2016-04-11 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 5de26194a -> 2dacc81ec [SPARK-14494][SQL] Fix the race conditions in MemoryStream and MemorySink ## What changes were proposed in this pull request? Make sure accessing mutable variables in MemoryStream and MemorySink are protected by `sy

spark git commit: [SPARK-14456][SQL][MINOR] Remove unused variables and logics in DataSource

2016-04-07 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 3aa7d7639 -> 8dcb0c7c9 [SPARK-14456][SQL][MINOR] Remove unused variables and logics in DataSource ## What changes were proposed in this pull request? In DataSource#write method, the variables `dataSchema` and `equality`, and related logic

spark git commit: [SPARK-14411][SQL] Add a note to warn that onQueryProgress is asynchronous

2016-04-05 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 45d8cdee3 -> 7329fe272 [SPARK-14411][SQL] Add a note to warn that onQueryProgress is asynchronous ## What changes were proposed in this pull request? onQueryProgress is asynchronous so the user may see some future status of `ContinuousQue

spark git commit: [SPARK-14402][SQL] initcap UDF doesn't match Hive/Oracle behavior in lowercasing rest of string

2016-04-05 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 9ee5c2571 -> c59abad05 [SPARK-14402][SQL] initcap UDF doesn't match Hive/Oracle behavior in lowercasing rest of string ## What changes were proposed in this pull request? Currently, Spark SQL `initCap` uses the `toTitleCase` function. Howeve
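
The behavioural difference named in the commit title (whether the rest of each word is lowercased) can be shown with a small stand-alone sketch. This is plain Python and not the Spark implementation; the function names are hypothetical, and splitting words on single spaces is an assumption made for the example.

```python
def initcap_first_letter_only(text):
    """Upper-case the first letter of each word and leave the rest untouched."""
    return " ".join(w[:1].upper() + w[1:] for w in text.split(" "))


def initcap_lowercase_rest(text):
    """Upper-case the first letter of each word and lower-case the rest,
    which is the behaviour the commit title attributes to Hive/Oracle."""
    return " ".join(w[:1].upper() + w[1:].lower() for w in text.split(" "))


if __name__ == "__main__":
    sample = "sPark sql"
    print(initcap_first_letter_only(sample))  # SPark Sql
    print(initcap_lowercase_rest(sample))     # Spark Sql
```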

spark git commit: [SPARK-14257][SQL] Allow multiple continuous queries to be started from the same DataFrame

2016-04-05 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master f77f11c67 -> 463bac001 [SPARK-14257][SQL] Allow multiple continuous queries to be started from the same DataFrame ## What changes were proposed in this pull request? Make StreamingRelation store the closure to create the source in Stream

spark git commit: [SPARK-14345][SQL] Decouple deserializer expression resolution from ObjectOperator

2016-04-05 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master e4bd50412 -> f77f11c67 [SPARK-14345][SQL] Decouple deserializer expression resolution from ObjectOperator ## What changes were proposed in this pull request? This PR decouples deserializer expression resolution from `ObjectOperator`, so

spark git commit: [SPARK-14287] isStreaming method for Dataset

2016-04-04 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 7201f033c -> ba24d1ee9 [SPARK-14287] isStreaming method for Dataset With the addition of StreamExecution (ContinuousQuery) to Datasets, data will become unbounded. With unbounded data, the execution of some methods and operations will not

spark git commit: [SPARK-14176][SQL] Add DataFrameWriter.trigger to set the stream batch period

2016-04-04 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 89f3befab -> 855ed44ed [SPARK-14176][SQL] Add DataFrameWriter.trigger to set the stream batch period ## What changes were proposed in this pull request? Add a processing time trigger to control the batch processing speed ## How was this p

[1/2] spark git commit: [SPARK-14255][SQL] Streaming Aggregation

2016-04-01 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 0b7d4966c -> 0fc4aaa71 http://git-wip-us.apache.org/repos/asf/spark/blob/0fc4aaa7/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreRDDSuite.scala ---

[2/2] spark git commit: [SPARK-14255][SQL] Streaming Aggregation

2016-04-01 Thread marmbrus

checks only the output of the last batch has been added to simulate the future addition of output modes. Author: Michael Armbrust Closes #12048 from marmbrus/statefulAgg. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0fc4aaa7

spark git commit: [SPARK-14160] Time Windowing functions for Datasets

2016-04-01 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 1e8861598 -> 1b829ce13 [SPARK-14160] Time Windowing functions for Datasets ## What changes were proposed in this pull request? This PR adds the function `window` as a column expression. `window` can be used to bucket rows into time window
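
Bucketing rows into time windows can be sketched in a few lines. The following is a minimal illustration in plain Python, assuming fixed-size, non-overlapping windows and integer timestamps in seconds; the helper names are hypothetical, and the `window` expression added by this commit may support options that this sketch does not model.

```python
from collections import defaultdict


def tumbling_window(timestamp, duration):
    """Return the (start, end) of the fixed-size window containing timestamp."""
    start = timestamp - (timestamp % duration)
    return (start, start + duration)


def bucket_rows(rows, duration):
    """Group (timestamp, value) rows by the window they fall into."""
    buckets = defaultdict(list)
    for timestamp, value in rows:
        buckets[tumbling_window(timestamp, duration)].append(value)
    return dict(buckets)


if __name__ == "__main__":
    rows = [(5, "a"), (59, "b"), (60, "c"), (125, "d")]
    print(bucket_rows(rows, 60))
```

Window ends are treated as exclusive here, so a row with timestamp 60 lands in the window starting at 60 rather than the one starting at 0.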

spark git commit: [SPARK-14070][SQL] Use ORC data source for SQL queries on ORC tables

2016-04-01 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master a884daad8 -> 1e8861598 [SPARK-14070][SQL] Use ORC data source for SQL queries on ORC tables ## What changes were proposed in this pull request? This patch enables use of OrcRelation for SQL queries which read data from Hive tables. Change

spark git commit: [SPARK-14191][SQL] Remove invalid Expand operator constraints

2016-04-01 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master df68beb85 -> a884daad8 [SPARK-14191][SQL] Remove invalid Expand operator constraints `Expand` operator now uses its child plan's constraints as its valid constraints (i.e., the base of constraints). This is not correct because `Expand` wi

spark git commit: [SPARK-13995][SQL] Extract correct IsNotNull constraints for Expression

2016-04-01 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 381358fbe -> df68beb85 [SPARK-13995][SQL] Extract correct IsNotNull constraints for Expression ## What changes were proposed in this pull request? JIRA: https://issues.apache.org/jira/browse/SPARK-13995 We infer relative `IsNotNull` const

spark git commit: [SPARK-14268][SQL] rename toRowExpressions and fromRowExpression to serializer and deserializer in ExpressionEncoder

2016-03-30 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 816f359cf -> d46c71b39 [SPARK-14268][SQL] rename toRowExpressions and fromRowExpression to serializer and deserializer in ExpressionEncoder ## What changes were proposed in this pull request? In `ExpressionEncoder`, we use `constructorFor

spark git commit: [SPARK-12443][SQL] encoderFor should support Decimal

2016-03-25 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 11fa8741c -> ca003354d [SPARK-12443][SQL] encoderFor should support Decimal ## What changes were proposed in this pull request? JIRA: https://issues.apache.org/jira/browse/SPARK-12443 `constructorFor` will call `dataTypeFor` to determine

spark git commit: [SPARK-14078] Streaming Parquet Based FileSink

2016-03-23 Thread marmbrus

ess test that checks the answer after non-deterministic injected failures. Author: Michael Armbrust Closes #11897 from marmbrus/fileSink. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6bc4be64 Tree: http://git-

spark git commit: [SPARK-13985][SQL] Deterministic batches with ids

2016-03-22 Thread marmbrus

ion with the `StateStore` (#11645). Author: Michael Armbrust Closes #11804 from marmbrus/batchIds. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/caea1521 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/caea1

spark git commit: [SPARK-14029][SQL] Improve BooleanSimplification optimization by implementing `Not` canonicalization.

2016-03-22 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 0ce01635c -> c632bdc01 [SPARK-14029][SQL] Improve BooleanSimplification optimization by implementing `Not` canonicalization. ## What changes were proposed in this pull request? Currently, **BooleanSimplification** optimization can handle

spark git commit: [SPARK-13883][SQL] Parquet Implementation of FileFormat.buildReader

2016-03-21 Thread marmbrus

ded. This code should be tested by the many existing tests for parquet. Author: Michael Armbrust Author: Sameer Agarwal Author: Nong Li Closes #11709 from marmbrus/parquetReader. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/com

spark git commit: [SPARK-13427][SQL] Support USING clause in JOIN.

2016-03-19 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 65b75e66e -> 637a78f1d [SPARK-13427][SQL] Support USING clause in JOIN. ## What changes were proposed in this pull request? Support queries that JOIN tables with USING clause. SELECT * from table1 JOIN table2 USING USING clause can be us

spark git commit: [SPARK-13791][SQL] Add MetadataLog and HDFSMetadataLog

2016-03-14 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 8e0b03060 -> b5e3bd87f [SPARK-13791][SQL] Add MetadataLog and HDFSMetadataLog ## What changes were proposed in this pull request? - Add a MetadataLog interface for reliable metadata storage. - Add HDFSMetadataLog as a MetadataLog implemen

spark git commit: [SPARK-10380][SQL] Fix confusing documentation examples for astype/drop_duplicates.

2016-03-14 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 4bf460979 -> 8e0b03060 [SPARK-10380][SQL] Fix confusing documentation examples for astype/drop_duplicates. ## What changes were proposed in this pull request? We have seen users getting confused by the documentation for astype and drop_du

spark git commit: [SPARK-13664][SQL] Add a strategy for planning partitioned and bucketed scans of files

2016-03-14 Thread marmbrus

ternal APIs to avoid unnecessary `toArray` calls - Rename `Partition` to `PartitionDirectory` to differentiate partitions used earlier in pruning from those where we have already enumerated the files and their sizes. Author: Michael Armbrust Closes #11646 from marmbrus/fileStrategy. Project

spark git commit: [SPARK-13658][SQL] BooleanSimplification rule is slow with large boolean expressions

2016-03-14 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 63f642aea -> 6a4bfcd62 [SPARK-13658][SQL] BooleanSimplification rule is slow with large boolean expressions JIRA: https://issues.apache.org/jira/browse/SPARK-13658 ## What changes were proposed in this pull request? Quoted from JIRA desc

svn commit: r1734450 - in /spark: ./ _layouts/ js/ news/_posts/ releases/_posts/ site/ site/docs/ site/docs/1.6.1/ site/docs/1.6.1/api/ site/docs/1.6.1/api/R/ site/docs/1.6.1/api/java/ site/docs/1.6.1

2016-03-10 Thread marmbrus

Author: marmbrus Date: Thu Mar 10 19:28:30 2016 New Revision: 1734450 URL: http://svn.apache.org/viewvc?rev=1734450&view=rev Log: Release Spark 1.6.1 [This commit notification would consist of 933 parts, which exceeds the limit of 50 ones, so it was shortened to the sum

svn commit: r12718 - /dev/spark/spark-1.6.1-rc1/ /release/spark/spark-1.6.1/

2016-03-10 Thread marmbrus

Author: marmbrus Date: Thu Mar 10 19:14:45 2016 New Revision: 12718 Log: Release Spark 1.6.1 Added: release/spark/spark-1.6.1/ - copied from r12717, dev/spark/spark-1.6.1-rc1/ Removed: dev/spark/spark-1.6.1-rc1

svn commit: r12717 - /dev/spark/spark-1.6.1-rc1/

2016-03-10 Thread marmbrus

Author: marmbrus Date: Thu Mar 10 19:10:54 2016 New Revision: 12717 Log: Add spark-1.6.1-rc1 Added: dev/spark/spark-1.6.1-rc1/ dev/spark/spark-1.6.1-rc1/spark-1.6.1-bin-cdh4.tgz (with props) dev/spark/spark-1.6.1-rc1/spark-1.6.1-bin-cdh4.tgz.asc dev/spark/spark-1.6.1-rc1/spark

[spark] Git Push Summary

2016-03-09 Thread marmbrus

Repository: spark Updated Tags: refs/tags/v1.6.1 [created] 15de51c23 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org

[spark] Git Push Summary

2016-03-09 Thread marmbrus

Repository: spark Updated Tags: refs/tags/v1.6.1 [deleted] 152252f15

[spark] Git Push Summary

2016-03-09 Thread marmbrus

Repository: spark Updated Tags: refs/tags/v1.6.1-rc1 [deleted] 15de51c23

[spark] Git Push Summary

2016-03-09 Thread marmbrus

Repository: spark Updated Tags: refs/tags/v1.6.1 [created] 152252f15

spark git commit: [SPARK-13781][SQL] Use ExpressionSets in ConstraintPropagationSuite

2016-03-09 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master e1772d3f1 -> dbf2a7cfa [SPARK-13781][SQL] Use ExpressionSets in ConstraintPropagationSuite ## What changes were proposed in this pull request? This PR is a small follow up on https://github.com/apache/spark/pull/11338 (https://issues.apac

spark git commit: [SPARK-13527][SQL] Prune Filters based on Constraints

2016-03-09 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 3dc9ae2e1 -> c6aa356cd [SPARK-13527][SQL] Prune Filters based on Constraints What changes were proposed in this pull request? Remove all the deterministic conditions in a [[Filter]] that are contained in the Child's Constraints. For
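
The pruning described above (dropping deterministic filter conditions that the child's constraints already guarantee) can be sketched as follows. This is a simplified stand-alone model, not the Catalyst rule: `Condition` and `prune_filter` are hypothetical names, and conditions are compared as plain strings.

```python
from collections import namedtuple

# A filter condition: its textual form and whether it is deterministic.
Condition = namedtuple("Condition", ["expr", "deterministic"])


def prune_filter(conditions, child_constraints):
    """Keep only the conditions the filter still has to evaluate.

    A condition is dropped when it is deterministic and already appears in
    the child's constraints. Non-deterministic conditions are always kept.
    """
    return [
        c for c in conditions
        if not (c.deterministic and c.expr in child_constraints)
    ]


if __name__ == "__main__":
    conditions = [
        Condition("a > 10", True),
        Condition("b IS NOT NULL", True),
        Condition("rand() > 0.5", False),
    ]
    child_constraints = {"a > 10", "rand() > 0.5"}
    for c in prune_filter(conditions, child_constraints):
        print(c.expr)
```

In this example `a > 10` is removed because the child already guarantees it, while `rand() > 0.5` stays because it is not deterministic.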

spark git commit: [SPARK-13728][SQL] Fix ORC PPD test so that pushed filters can be checked.

2016-03-09 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 23369c3bd -> cad29a40b [SPARK-13728][SQL] Fix ORC PPD test so that pushed filters can be checked. ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-13728 https://github.com/apache/spark/pull/11

spark git commit: [SPARK-13763][SQL] Remove Project when its Child's Output is Nil

2016-03-09 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 256704c77 -> 23369c3bd [SPARK-13763][SQL] Remove Project when its Child's Output is Nil What changes were proposed in this pull request? As shown in another PR: https://github.com/apache/spark/pull/11596, we are using `SELECT 1` as a

spark git commit: [SPARK-13754] Keep old data source name for backwards compatibility

2016-03-08 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 982ef2b87 -> cc4ab37ee [SPARK-13754] Keep old data source name for backwards compatibility ## Motivation CSV data source was contributed by Databricks. It is the inlined version of https://github.com/databricks/spark-csv. The data source n

spark git commit: [SPARK-13750][SQL] fix sizeInBytes of HadoopFsRelation

2016-03-08 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master d8813fa04 -> 982ef2b87 [SPARK-13750][SQL] fix sizeInBytes of HadoopFsRelation ## What changes were proposed in this pull request? This PR fixes the sizeInBytes of HadoopFsRelation. ## How was this patch tested? Added regression test for th

spark git commit: [SPARK-13648] Add Hive Cli to classes for isolated classloader

2016-03-07 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-1.6 cf4e62ec2 -> 695c8a257 [SPARK-13648] Add Hive Cli to classes for isolated classloader ## What changes were proposed in this pull request? Adding the hive-cli classes to the classloader ## How was this patch tested? The hive Versionss

spark git commit: [SPARK-13648] Add Hive Cli to classes for isolated classloader

2016-03-07 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master e720dda42 -> 46f25c241 [SPARK-13648] Add Hive Cli to classes for isolated classloader ## What changes were proposed in this pull request? Adding the hive-cli classes to the classloader ## How was this patch tested? The hive Versionssuite

spark git commit: [SPARK-13722][SQL] No Push Down for Non-deterministics Predicates through Generate

2016-03-07 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master a3ec50a4b -> b6071a700 [SPARK-13722][SQL] No Push Down for Non-deterministics Predicates through Generate What changes were proposed in this pull request? Non-deterministic predicates should not be pushed through Generate. How

spark git commit: [SPARK-13694][SQL] QueryPlan.expressions should always include all expressions

2016-03-07 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master d7eac9d79 -> 489641117 [SPARK-13694][SQL] QueryPlan.expressions should always include all expressions ## What changes were proposed in this pull request? It's weird that expressions don't always have all the expressions in it. This PR mar

spark git commit: [SPARK-13544][SQL] Rewrite/Propagate Constraints for Aliases in Aggregate

2016-02-29 Thread marmbrus

and `Aggregate`. So far, we only rewrite and propagate constraints if `Alias` is defined in `Project`. This PR is to resolve this issue in `Aggregate`. How was this patch tested? Added a test case for `Aggregate` in `ConstraintPropagationSuite`. marmbrus sameeragarwal Author: gatorsmile Clo

[spark] Git Push Summary

2016-02-26 Thread marmbrus

Repository: spark Updated Tags: refs/tags/v1.6.1-rc1 [deleted] 152252f15

spark git commit: [SPARK-13383][SQL] Keep broadcast hint after column pruning

2016-02-24 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 893018183 -> f37398699 [SPARK-13383][SQL] Keep broadcast hint after column pruning JIRA: https://issues.apache.org/jira/browse/SPARK-13383 ## What changes were proposed in this pull request? When we do column pruning in Optimizer, we put

spark git commit: [SPARK-13440][SQL] ObjectType should accept any ObjectType, If should not care about nullability

2016-02-23 Thread marmbrus

`DatasetSuite` for the reported failure. - all the unit tests in `ExpressionEncoderSuite` are augmented to also confirm successful analysis. These tests are actually what pointed out the additional issues with `If` resolution. Author: Michael Armbrust Closes #11316 from marmbrus/datasetOptio

spark git commit: Update branch-1.6 for 1.6.1 release

2016-02-22 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-1.6 f7898f9e2 -> 40d11d049 Update branch-1.6 for 1.6.1 release Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/40d11d04 Tree: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SPARK-11624][SPARK-11972][SQL] fix commands that need hive to exec

2016-02-22 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-1.6 85e6a2205 -> f7898f9e2 [SPARK-11624][SPARK-11972][SQL] fix commands that need hive to exec In SparkSQLCLI, we have created a `CliSessionState`, but then we call `SparkSQLEnv.init()`, which will start another `SessionState`. This would

spark git commit: [SPARK-11624][SPARK-11972][SQL] fix commands that need hive to exec

2016-02-22 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master a11b39951 -> 5d80fac58 [SPARK-11624][SPARK-11972][SQL] fix commands that need hive to exec In SparkSQLCLI, we have created a `CliSessionState`, but then we call `SparkSQLEnv.init()`, which will start another `SessionState`. This would lead

spark git commit: [SPARK-12546][SQL] Change default number of open parquet files

2016-02-22 Thread marmbrus

uet allocates a significant amount of memory that is not accounted for by our own mechanisms. As a workaround, we can ensure that only a single file is open per task unless the user explicitly asks for more. Author: Michael Armbrust Closes #11308 from marmbrus/parquetWriteOOM. (cherry picked from com

spark git commit: [SPARK-12546][SQL] Change default number of open parquet files

2016-02-22 Thread marmbrus

s a significant amount of memory that is not accounted for by our own mechanisms. As a workaround, we can ensure that only a single file is open per task unless the user explicitly asks for more. Author: Michael Armbrust Closes #11308 from marmbrus/parquetWriteOOM. Project: http://git-

spark git commit: [SPARK-13091][SQL] Rewrite/Propagate constraints for Aliases

2016-02-19 Thread marmbrus

any constraints on `a` now also apply to `b`. JIRA: https://issues.apache.org/jira/browse/SPARK-13091 cc marmbrus Author: Sameer Agarwal Closes #11144 from sameeragarwal/alias. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/com

spark git commit: [SPARK-13261][SQL] Expose maxCharactersPerColumn as a user configurable option

2016-02-19 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master dbb08cdd5 -> 14844118b [SPARK-13261][SQL] Expose maxCharactersPerColumn as a user configurable option This patch exposes `maxCharactersPerColumn` and `maxColumns` to users of the CSV data source. Author: Hossein Closes #11147 from falaki/SPAR

spark git commit: [SPARK-12966][SQL] ArrayType(DecimalType) support in Postgres JDBC

2016-02-19 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master c7c55637b -> dbb08cdd5 [SPARK-12966][SQL] ArrayType(DecimalType) support in Postgres JDBC Fixes error `org.postgresql.util.PSQLException: Unable to find server array type for provided name decimal(38,18)`. * Passes scale metadata to JDBC

spark git commit: [SPARK-13384][SQL] Keep attribute qualifiers after dedup in Analyzer

2016-02-19 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 6915cc23b -> c7c55637b [SPARK-13384][SQL] Keep attribute qualifiers after dedup in Analyzer JIRA: https://issues.apache.org/jira/browse/SPARK-13384 ## What changes were proposed in this pull request? When we de-duplicate attributes in Ana

spark git commit: [SPARK-13101][SQL] nullability of array type element should not fail analysis of encoder

2016-02-08 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 06f0df6df -> 8e4d15f70 [SPARK-13101][SQL] nullability of array type element should not fail analysis of encoder nullability should only be considered as an optimization rather than part of the type system, so instead of failing analysis f

spark git commit: [SPARK-12939][SQL] migrate encoder resolution logic to Analyzer

2016-02-05 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 7b73f1719 -> 1ed354a53 [SPARK-12939][SQL] migrate encoder resolution logic to Analyzer https://issues.apache.org/jira/browse/SPARK-12939 Now we will catch `ObjectOperator` in `Analyzer` and resolve the `fromRowExpression/deserializer` ins

spark git commit: [SPARK-13101][SQL][BRANCH-1.6] nullability of array type element should not fail analysis of encoder

2016-02-03 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-1.6 5fe8796c2 -> cdfb2a141 [SPARK-13101][SQL][BRANCH-1.6] nullability of array type element should not fail analysis of encoder nullability should only be considered as an optimization rather than part of the type system, so instead of fa

spark git commit: [SPARK-13166][SQL] Remove DataStreamReader/Writer

2016-02-03 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 3221eddb8 -> 915a75398 [SPARK-13166][SQL] Remove DataStreamReader/Writer They seem redundant and we can simply use DataFrameReader/Writer. The new usage looks like: ```scala val df = sqlContext.read.stream("...") val handle = df.write.str

spark git commit: [SPARK-12957][SQL] Initial support for constraint propagation in SparkSQL

2016-02-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master e86f8f63b -> 138c300f9 [SPARK-12957][SQL] Initial support for constraint propagation in SparkSQL Based on the semantics of a query, we can derive a number of data constraints on output of each (logical or physical) operator. For instance,

spark git commit: [DOCS] Update StructType.scala

2016-02-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-1.6 3c92333ee -> e81333be0 [DOCS] Update StructType.scala The example will throw error like :20: error: not found: value StructType Need to add this line: import org.apache.spark.sql.types._ Author: Kevin (Sangwoo) Kim Closes #10141 fro

spark git commit: [DOCS] Update StructType.scala

2016-02-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master d0df2ca40 -> b377b0353 [DOCS] Update StructType.scala The example will throw error like :20: error: not found: value StructType Need to add this line: import org.apache.spark.sql.types._ Author: Kevin (Sangwoo) Kim Closes #10141 from sw

spark git commit: [SPARK-13056][SQL] map column would throw NPE if value is null

2016-02-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-1.6 9c0cf22f7 -> 3c92333ee [SPARK-13056][SQL] map column would throw NPE if value is null Jira: https://issues.apache.org/jira/browse/SPARK-13056 Create a map like { "a": "somestring", "b": null} Query like SELECT col["b"] FROM t1; NPE wou

spark git commit: [SPARK-13056][SQL] map column would throw NPE if value is null

2016-02-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master cba1d6b65 -> 358300c79 [SPARK-13056][SQL] map column would throw NPE if value is null Jira: https://issues.apache.org/jira/browse/SPARK-13056 Create a map like { "a": "somestring", "b": null} Query like SELECT col["b"] FROM t1; NPE would b

spark git commit: [SPARK-13094][SQL] Add encoders for seq/array of primitives

2016-02-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-1.6 bd8efba8f -> 99594b213 [SPARK-13094][SQL] Add encoders for seq/array of primitives Author: Michael Armbrust Closes #11014 from marmbrus/seqEncoders. (cherry picked from commit 29d92181d0c49988c387d34e4a71b1afe02c29e2) Signed-off

spark git commit: [SPARK-13094][SQL] Add encoders for seq/array of primitives

2016-02-02 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 12a20c144 -> 29d92181d [SPARK-13094][SQL] Add encoders for seq/array of primitives Author: Michael Armbrust Closes #11014 from marmbrus/seqEncoders. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-

spark git commit: [SPARK-10820][SQL] Support for the continuous execution of structured queries

2016-02-02 Thread marmbrus

ses #11006 from marmbrus/structured-streaming. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/12a20c14 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/12a20c14 Diff: http://git-wip-us.apache.org/repos/asf/spark/d

spark git commit: [SPARK-11780][SQL] Add catalyst type aliases backwards compatibility

2016-02-01 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-1.6 215d5d884 -> 70fcbf68e [SPARK-11780][SQL] Add catalyst type aliases backwards compatibility Changed a target at branch-1.6 from #10635. Author: Takeshi YAMAMURO Closes #10915 from maropu/pr9935-v3. Project: http://git-wip-us.apache

spark git commit: [DOCS] Fix the jar location of datanucleus in sql-programming-guid.md

2016-02-01 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-1.6 9a5b25d0f -> 215d5d884 [DOCS] Fix the jar location of datanucleus in sql-programming-guid.md ISTM `lib` is better because `datanucleus` jars are located in `lib` for release builds. Author: Takeshi YAMAMURO Closes #10901 from maropu

spark git commit: [DOCS] Fix the jar location of datanucleus in sql-programming-guid.md

2016-02-01 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 8f26eb5ef -> da9146c91 [DOCS] Fix the jar location of datanucleus in sql-programming-guid.md ISTM `lib` is better because `datanucleus` jars are located in `lib` for release builds. Author: Takeshi YAMAMURO Closes #10901 from maropu/Doc

spark git commit: [SPARK-12705][SPARK-10777][SQL] Analyzer Rule ResolveSortReferences

2016-02-01 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 33c8a490f -> 8f26eb5ef [SPARK-12705][SPARK-10777][SQL] Analyzer Rule ResolveSortReferences JIRA: https://issues.apache.org/jira/browse/SPARK-12705 **Scope:** This PR is a general fix for sorting reference resolution when the child's `outp

spark git commit: [SPARK-12989][SQL] Delaying Alias Cleanup after ExtractWindowExpressions

2016-02-01 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-1.6 ddb963304 -> 9a5b25d0f [SPARK-12989][SQL] Delaying Alias Cleanup after ExtractWindowExpressions JIRA: https://issues.apache.org/jira/browse/SPARK-12989 In the rule `ExtractWindowExpressions`, we simply replace alias by the correspondi

spark git commit: [SPARK-12989][SQL] Delaying Alias Cleanup after ExtractWindowExpressions

2016-02-01 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 6075573a9 -> 33c8a490f [SPARK-12989][SQL] Delaying Alias Cleanup after ExtractWindowExpressions JIRA: https://issues.apache.org/jira/browse/SPARK-12989 In the rule `ExtractWindowExpressions`, we simply replace alias by the corresponding a

spark git commit: [SPARK-12926][SQL] SQLContext to display warning message when non-sql configs are being set

2016-01-28 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 415d0a859 -> 676803963 [SPARK-12926][SQL] SQLContext to display warning message when non-sql configs are being set Users unknowingly try to set core Spark configs in SQLContext but later realise that it didn't work. eg. sqlContext.sql("SE

spark git commit: [SPARK-12975][SQL] Throwing Exception when Bucketing Columns are part of Partitioning Columns

2016-01-25 Thread marmbrus

is just wasting extra CPU when reading or writing bucket tables. Thus, like Hive, we can issue an exception and let users do the change. Also added a test case for checking if the information of `sortBy` and `bucketBy` columns are correctly saved in the metastore table. Could you check if my unders

spark git commit: [SPARK-12816][SQL] De-alias type when generating schemas

2016-01-19 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 4dbd31612 -> c78e2080e [SPARK-12816][SQL] De-alias type when generating schemas Call `dealias` on local types to fix schema generation for abstract type members, such as ```scala type KeyValue = (Int, String) ``` Add simple test Author:

spark git commit: [SQL][MINOR] BoundReference do not need to be NamedExpression

2016-01-15 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 61c45876f -> 3f1c58d60 [SQL][MINOR] BoundReference do not need to be NamedExpression We made it a `NamedExpression` to work around some hacky cases a long time ago, and now it seems safe to remove it. Author: Wenchen Fan Closes #10765 fro

spark git commit: [SPARK-12813][SQL] Eliminate serialization for back to back operations

2016-01-14 Thread marmbrus

ion. - Eliminate serializations in more cases by adding more cases to `EliminateSerialization` Author: Michael Armbrust Closes #10747 from marmbrus/encoderExpressions. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cc7af

spark git commit: [HOT-FIX] bypass hive test when parse logical plan to json

2016-01-12 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-1.6 03e523e52 -> f71e5cc12 [HOT-FIX] bypass hive test when parse logical plan to json https://github.com/apache/spark/pull/10311 introduces some rare, non-deterministic flakiness for hive udf tests, see https://github.com/apache/spark/pul

spark git commit: [SPARK-9843][SQL] Make catalyst optimizer pass pluggable at runtime

2016-01-12 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master 1d8887953 -> 508592b1b [SPARK-9843][SQL] Make catalyst optimizer pass pluggable at runtime Let me know whether you'd like to see it in other place Author: Robert Kruszewski Closes #10210 from robert3005/feature/pluggable-optimizer. Pro

spark git commit: [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting

2016-01-11 Thread marmbrus

Repository: spark Updated Branches: refs/heads/branch-1.6 3b32aa9e2 -> dd2cf64f3 [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting Warning users about casting changes. Author: Brandon Bradley Closes #10708 from blbradley/spark-12758. (cherry picked from

spark git commit: [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting

2016-01-11 Thread marmbrus

Repository: spark Updated Branches: refs/heads/master a44991453 -> a767ee8a0 [SPARK-12758][SQL] add note to Spark SQL Migration guide about TimestampType casting Warning users about casting changes. Author: Brandon Bradley Closes #10708 from blbradley/spark-12758. Project: http://git-wip

[2/2] spark git commit: [SPARK-12696] Backport Dataset Bug fixes to 1.6

2016-01-08 Thread marmbrus

smile Author: Liang-Chi Hsieh Author: Cheng Lian Author: Nong Li Closes #10650 from marmbrus/dataset-backports. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a6190508 Tree: http://git-wip-us.apache.org/repos/asf/spark

