GitHub user damnMeddlingKid opened a pull request:
https://github.com/apache/spark/pull/10136
Kafka streaming
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Shopify/spark kafka_streaming
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/10136.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #10136
----
commit 854319e589c89b2b6b4a9d02916f6f748fc5680a
Author: Fernando Otero (ZeoS) <[email protected]>
Date: 2015-01-08T20:42:54Z
SPARK-5148 [MLlib] Make usersOut/productsOut storagelevel in ALS
configurable
Author: Fernando Otero (ZeoS) <[email protected]>
Closes #3953 from zeitos/storageLevel and squashes the following commits:
0f070b9 [Fernando Otero (ZeoS)] fix imports
6869e80 [Fernando Otero (ZeoS)] fix comment length
90c9f7e [Fernando Otero (ZeoS)] fix comment length
18a992e [Fernando Otero (ZeoS)] changing storage level
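For reference, a minimal sketch of what the new knob enables. The setter
name is assumed here to be setFinalRDDStorageLevel; the exact API comes
from the patch itself, so treat this as illustrative:
```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}
import org.apache.spark.storage.StorageLevel

val sc = new SparkContext(
  new SparkConf().setAppName("als-storage").setMaster("local[2]"))
val ratings = sc.parallelize(
  Seq(Rating(1, 1, 5.0), Rating(1, 2, 3.0), Rating(2, 1, 4.0)))

// Persist the output factor RDDs (usersOut/productsOut) at a
// caller-chosen level instead of a hard-coded default.
val model = new ALS()
  .setRank(5)
  .setIterations(5)
  .setFinalRDDStorageLevel(StorageLevel.MEMORY_AND_DISK_SER) // assumed name
  .run(ratings)
```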
commit d9cad94b1df0200207ba03fb0168373ccc3a8597
Author: Kousuke Saruta <[email protected]>
Date: 2015-01-08T21:43:09Z
[SPARK-4973][CORE] Local directory in the driver of client-mode continues
remaining even if application finished when external shuffle is enabled
When we enable the external shuffle service, local directories in the
driver of client mode remain even after the application has finished.
I think local directories for drivers should be deleted.
Author: Kousuke Saruta <[email protected]>
Closes #3811 from sarutak/SPARK-4973 and squashes the following commits:
ad944ab [Kousuke Saruta] Fixed DiskBlockManager to cleanup local directory
if it's the driver
43770da [Kousuke Saruta] Merge branch 'master' of
git://git.apache.org/spark into SPARK-4973
88feecd [Kousuke Saruta] Merge branch 'master' of
git://git.apache.org/spark into SPARK-4973
d99718e [Kousuke Saruta] Fixed SparkSubmit.scala and DiskBlockManager.scala
in order to delete local directories of the driver of local-mode when external
shuffle service is enabled
commit b14068bf7b2dff450101d48a59e79761e3ca4eb2
Author: RJ Nowling <[email protected]>
Date: 2015-01-08T23:03:43Z
[SPARK-4891][PySpark][MLlib] Add gamma/log normal/exp dist sampling to
PySpark MLlib
This is a follow-up to PR #3680: https://github.com/apache/spark/pull/3680.
Author: RJ Nowling <[email protected]>
Closes #3955 from rnowling/spark4891 and squashes the following commits:
1236a01 [RJ Nowling] Fix Python style issues
7a01a78 [RJ Nowling] Fix Python style issues
174beab [RJ Nowling] [SPARK-4891][PySpark][MLlib] Add gamma/log normal/exp
dist sampling to PySpark MLlib
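The PR exposes existing generators to Python; for reference, a sketch of
the Scala side of the same API (signatures from memory, so illustrative):
```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.random.RandomRDDs

val sc = new SparkContext(
  new SparkConf().setAppName("rand-dists").setMaster("local[2]"))

// Gamma(shape = 2.0, scale = 2.0), log-normal(mean = 0.0, std = 1.0) and
// exponential(mean = 1.0) sample RDDs of 1000 doubles each.
val gammaSamples = RandomRDDs.gammaRDD(sc, 2.0, 2.0, 1000L)
val logNormalSamples = RandomRDDs.logNormalRDD(sc, 0.0, 1.0, 1000L)
val expSamples = RandomRDDs.exponentialRDD(sc, 1.0, 1000L)
```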
commit 5a1b7a9c8a77b6d1ef5553490d0ccf291dfac06f
Author: Marcelo Vanzin <[email protected]>
Date: 2015-01-09T01:15:13Z
[SPARK-4048] Enhance and extend hadoop-provided profile.
This change does a few things to make the hadoop-provided profile more
useful:
- Create new profiles for other libraries / services that might be
provided by the infrastructure.
- Simplify and fix the poms so that the profiles are only activated while
building assemblies.
- Fix tests so that they're able to run when the profiles are activated.
- Add a new env variable to be used by distributions that use these
profiles to provide the runtime classpath for Spark jobs and daemons.
Author: Marcelo Vanzin <[email protected]>
Closes #2982 from vanzin/SPARK-4048 and squashes the following commits:
82eb688 [Marcelo Vanzin] Add a comment.
eb228c0 [Marcelo Vanzin] Fix borked merge.
4e38f4e [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
9ef79a3 [Marcelo Vanzin] Alternative way to propagate test classpath to
child processes.
371ebee [Marcelo Vanzin] Review feedback.
52f366d [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
83099fc [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
7377e7b [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
322f882 [Marcelo Vanzin] Fix merge fail.
f24e9e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
8b00b6a [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
9640503 [Marcelo Vanzin] Cleanup child process log message.
115fde5 [Marcelo Vanzin] Simplify a comment (and make it consistent with
another pom).
e3ab2da [Marcelo Vanzin] Fix hive-thriftserver profile.
7820d58 [Marcelo Vanzin] Fix CliSuite with provided profiles.
1be73d4 [Marcelo Vanzin] Restore flume-provided profile.
d1399ed [Marcelo Vanzin] Restore jetty dependency.
82a54b9 [Marcelo Vanzin] Remove unused profile.
5c54a25 [Marcelo Vanzin] Fix HiveThriftServer2Suite with *-provided
profiles.
1fc4d0b [Marcelo Vanzin] Update dependencies for hive-thriftserver.
f7b3bbe [Marcelo Vanzin] Add snappy to hadoop-provided list.
9e4e001 [Marcelo Vanzin] Remove duplicate hive profile.
d928d62 [Marcelo Vanzin] Redirect child stderr to parent's log.
4d67469 [Marcelo Vanzin] Propagate SPARK_DIST_CLASSPATH on Yarn.
417d90e [Marcelo Vanzin] Introduce "SPARK_DIST_CLASSPATH".
2f95f0d [Marcelo Vanzin] Propagate classpath to child processes during
testing.
1adf91c [Marcelo Vanzin] Re-enable maven-install-plugin for a few projects.
284dda6 [Marcelo Vanzin] Rework the "hadoop-provided" profile, add new ones.
commit 013e031d01dca052b94a094c08b7d7f76f640711
Author: Nicholas Chammas <[email protected]>
Date: 2015-01-09T01:42:08Z
[SPARK-5122] Remove Shark from spark-ec2
I moved the Spark-Shark version map [to the
wiki](https://cwiki.apache.org/confluence/display/SPARK/Spark-Shark+version+mapping).
This PR has a [matching PR in
mesos/spark-ec2](https://github.com/mesos/spark-ec2/pull/89).
Author: Nicholas Chammas <[email protected]>
Closes #3939 from nchammas/remove-shark and squashes the following commits:
66e0841 [Nicholas Chammas] fix style
ceeab85 [Nicholas Chammas] show default Spark GitHub repo
7270126 [Nicholas Chammas] validate Spark hashes
db4935d [Nicholas Chammas] validate spark version upfront
fc0d5b9 [Nicholas Chammas] remove Shark
commit 8a95a3e61580b1c1f6c0a3e124aa8469255db968
Author: WangTaoTheTonic <[email protected]>
Date: 2015-01-09T14:10:09Z
[SPARK-5169][YARN]fetch the correct max attempts
Sorry for fetching the wrong max attempts in this commit:
https://github.com/apache/spark/commit/8fdd48959c93b9cf809f03549e2ae6c4687d1fcd.
We need to fix it now.
tgravescs
If we set a spark.yarn.maxAppAttempts that is larger than
`yarn.resourcemanager.am.max-attempts` on the YARN side, it will be
overridden, as described here:
>The maximum number of application attempts. It's a global setting for all
application masters. Each application master can specify its individual maximum
number of application attempts via the API, but the individual number cannot be
more than the global upper bound. If it is, the resourcemanager will override
it. The default number is set to 2, to allow at least one retry for AM.
http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
Author: WangTaoTheTonic <[email protected]>
Closes #3942 from WangTaoTheTonic/HOTFIX and squashes the following commits:
9ac16ce [WangTaoTheTonic] fetch the correct max attempts
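A minimal sketch of the capping rule quoted above, with illustrative
names rather than the patch's own code:
```
// yarnGlobalMax mirrors yarn.resourcemanager.am.max-attempts (default 2).
def effectiveMaxAttempts(sparkSetting: Option[Int], yarnGlobalMax: Int): Int =
  sparkSetting match {
    case Some(n) => math.min(n, yarnGlobalMax) // the RM overrides larger values
    case None    => yarnGlobalMax              // fall back to YARN's global bound
  }

assert(effectiveMaxAttempts(Some(10), 2) == 2) // larger settings are capped
assert(effectiveMaxAttempts(None, 2) == 2)
```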
commit 82f1259aba249285fd271f9f20e095409cb4d20b
Author: Aaron Davidson <[email protected]>
Date: 2015-01-09T17:20:16Z
[Minor] Fix test RetryingBlockFetcherSuite after changed config name
Flaky due to the default retry interval being the same as our test's wait
timeout.
Author: Aaron Davidson <[email protected]>
Closes #3972 from aarondav/fix-test and squashes the following commits:
db77cab [Aaron Davidson] [Minor] Fix test after changed config name
commit 2f2b837e33eca6010fad3ad22c7d298fa6d042c9
Author: Sean Owen <[email protected]>
Date: 2015-01-09T17:35:46Z
SPARK-5136 [DOCS] Improve documentation around setting up Spark IntelliJ
project
This PR simply points to the IntelliJ wiki page instead of also including
IntelliJ notes in the docs. The intent however is to also update the wiki page
with updated tips. This is the text I propose for the IntelliJ section on the
wiki. I realize it omits some of the existing instructions on the wiki, about
enabling Hive, but I think those are actually optional.
------
IntelliJ supports both Maven- and SBT-based projects. It is recommended,
however, to import Spark as a Maven project. Choose "Import Project..." from
the File menu, and select the `pom.xml` file in the Spark root directory.
It is fine to leave all settings at their default values in the Maven
import wizard, with two caveats. First, it is usually useful to enable "Import
Maven projects automatically", since changes to the project structure will
automatically update the IntelliJ project.
Second, note the step that prompts you to choose active Maven build
profiles. As documented above, some build configurations require specific
profiles to be enabled. The same profiles that are enabled with `-P[profile
name]` above may be enabled on this screen. For example, if developing for
Hadoop 2.4 with YARN support, enable profiles `yarn` and `hadoop-2.4`.
These selections can be changed later by accessing the "Maven Projects"
tool window from the View menu, and expanding the Profiles section.
"Rebuild Project" can fail the first time the project is compiled, because
generate source files are not automatically generated. Try clicking the
"Generate Sources and Update Folders For All Projects" button in the "Maven
Projects" tool window to manually generate these sources.
Compilation may fail with an error like "scalac: bad option:
-P:/home/jakub/.m2/repository/org/scalamacros/paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar".
If so, go to Preferences > Build, Execution, Deployment > Scala Compiler and
clear the "Additional compiler options" field. It will work then although the
option will come back when the project reimports.
Author: Sean Owen <[email protected]>
Closes #3952 from srowen/SPARK-5136 and squashes the following commits:
f3baa66 [Sean Owen] Point to new IJ / Eclipse wiki link
016b7df [Sean Owen] Point to IntelliJ wiki page instead of also including
IntelliJ notes in the docs
commit 37fea2dde60567baa69e031ed8a7895d1b923429
Author: Patrick Wendell <[email protected]>
Date: 2015-01-09T17:40:18Z
HOTFIX: Minor improvements to make-distribution.sh
1. Renames $FWDIR to $SPARK_HOME (vast majority of diff).
2. Use Spark-provided Maven.
3. Logs build flags in the RELEASE file.
Author: Patrick Wendell <[email protected]>
Closes #3973 from pwendell/master and squashes the following commits:
340a2fa [Patrick Wendell] HOTFIX: Minor improvements to make-distribution.sh
commit 0a3aa5fac073e60d09a4afa2cd2a90f6faa2982c
Author: Kay Ousterhout <[email protected]>
Date: 2015-01-09T17:47:06Z
[SPARK-1143] Separate pool tests into their own suite.
The current TaskSchedulerImplSuite includes some tests that are actually
for the TaskSchedulerImpl, but the remainder of the tests avoid using the
TaskSchedulerImpl entirely, and actually test the pool and scheduling
algorithm mechanisms. This commit separates the pool/scheduling algorithm
tests into their own suite, and also simplifies those tests.
The pull request replaces #339.
Author: Kay Ousterhout <[email protected]>
Closes #3967 from kayousterhout/SPARK-1143 and squashes the following
commits:
8a898c4 [Kay Ousterhout] [SPARK-1143] Separate pool tests into their own
suite.
commit d2a450c8ab1669acfe6007ae87bec4dde60fea7e
Author: Liang-Chi Hsieh <[email protected]>
Date: 2015-01-09T18:27:33Z
[SPARK-5145][Mllib] Add BLAS.dsyr and use it in GaussianMixtureEM
This PR uses BLAS.dsyr to replace a few implementations in GaussianMixtureEM.
Author: Liang-Chi Hsieh <[email protected]>
Closes #3949 from viirya/blas_dsyr and squashes the following commits:
4e4d6cf [Liang-Chi Hsieh] Add unit test. Rename function name, modify doc
and style.
3f57fd2 [Liang-Chi Hsieh] Add BLAS.dsyr and use it in GaussianMixtureEM.
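For context, dsyr is the BLAS symmetric rank-1 update A := alpha * x * x^T + A,
which only touches one triangle of the symmetric matrix. A plain-Scala
sketch of the semantics (the patch itself delegates to the BLAS wrapper,
not a loop like this):
```
// Upper-triangular rank-1 update: a(i)(j) += alpha * x(i) * x(j) for j >= i.
def dsyrUpper(alpha: Double, x: Array[Double], a: Array[Array[Double]]): Unit =
  for (i <- x.indices; j <- i until x.length)
    a(i)(j) += alpha * x(i) * x(j)

val a = Array.fill(2)(Array.fill(2)(0.0))
dsyrUpper(1.0, Array(1.0, 2.0), a)
// a is now [[1.0, 2.0], [0.0, 4.0]]: only the upper triangle was written.
```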
commit 831a0d287203392bead89d2c553919bb2fb4456a
Author: Jongyoul Lee <[email protected]>
Date: 2015-01-09T18:47:08Z
[SPARK-3619] Upgrade to Mesos 0.21 to work around MESOS-1688
- update version from 0.18.1 to 0.21.0
- I'm running some tests to verify that Spark jobs work fine in a Mesos
0.21.0 environment.
Author: Jongyoul Lee <[email protected]>
Closes #3934 from jongyoul/SPARK-3619 and squashes the following commits:
ab994fa [Jongyoul Lee] [SPARK-3619] Upgrade to Mesos 0.21 to work around
MESOS-1688 - update version from 0.18.1 to 0.21.0
commit 40d8a94b1445e10a31f9dbbf7ff0757e7f159f2c
Author: Joseph K. Bradley <[email protected]>
Date: 2015-01-09T21:00:15Z
[SPARK-5015] [mllib] Random seed for GMM + make test suite deterministic
Issues:
* From JIRA: GaussianMixtureEM uses randomness but does not take a random
seed. It should take one as a parameter.
* This also makes the test suite flaky since initialization can fail due to
stochasticity.
Fix:
* Add random seed
* Use it in test suite
CC: mengxr tgaloppo
Author: Joseph K. Bradley <[email protected]>
Closes #3981 from jkbradley/gmm-seed and squashes the following commits:
f0df4fd [Joseph K. Bradley] Added seed parameter to GMM. Updated test
suite to use seed to prevent flakiness
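A sketch of the resulting usage, assuming the seed is exposed through a
setSeed-style setter (the class, named GaussianMixtureEM at the time, was
later renamed GaussianMixture, which is used below):
```
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.GaussianMixture
import org.apache.spark.mllib.linalg.Vectors

val sc = new SparkContext(
  new SparkConf().setAppName("gmm-seed").setMaster("local[2]"))
val data = sc.parallelize(
  Seq(Vectors.dense(1.0), Vectors.dense(5.0), Vectors.dense(5.2)))

// Fixing the seed makes the random initialization, and hence the fitted
// model, reproducible across runs.
val model = new GaussianMixture().setK(2).setSeed(42L).run(data)
```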
commit 7884948b953161e8df6d6a97e8ec37f69f3597e3
Author: WangTaoTheTonic <[email protected]>
Date: 2015-01-09T21:20:32Z
[SPARK-1953][YARN] yarn client mode Application Master memory size is
same as driver memory size
Ways to set Application Master's memory on yarn-client mode:
1. `spark.yarn.am.memory` in SparkConf or System Properties
2. default value 512m
Note: this argument is only available in yarn-client mode.
Author: WangTaoTheTonic <[email protected]>
Closes #3607 from WangTaoTheTonic/SPARK4181 and squashes the following
commits:
d5ceb1b [WangTaoTheTonic] spark.driver.memeory is used in both modes
6c1b264 [WangTaoTheTonic] rebase
b8410c0 [WangTaoTheTonic] minor optiminzation
ddcd592 [WangTaoTheTonic] fix the bug produced in rebase and some
improvements
3bf70cc [WangTaoTheTonic] rebase and give proper hint
987b99d [WangTaoTheTonic] disable --driver-memory in client mode
2b27928 [WangTaoTheTonic] inaccurate description
b7acbb2 [WangTaoTheTonic] incorrect method invoked
2557c5e [WangTaoTheTonic] missing a single blank
42075b0 [WangTaoTheTonic] arrange the args and warn logging
69c7dba [WangTaoTheTonic] rebase
1960d16 [WangTaoTheTonic] fix wrong comment
7fa9e2e [WangTaoTheTonic] log a warning
f6bee0e [WangTaoTheTonic] docs issue
d619996 [WangTaoTheTonic] Merge branch 'master' into SPARK4181
b09c309 [WangTaoTheTonic] use code format
ab16bb5 [WangTaoTheTonic] fix bug and add comments
44e48c2 [WangTaoTheTonic] minor fix
6fd13e1 [WangTaoTheTonic] add overhead mem and remove some configs
0566bb8 [WangTaoTheTonic] yarn client mode Application Master memory size
is same as driver memory size
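A sketch of how the new knob is set. The key spark.yarn.am.memory is the
one this patch introduces; the rest is ordinary SparkConf usage:
```
import org.apache.spark.SparkConf

// In yarn-client mode the AM is a small process separate from the driver,
// so it gets its own memory setting rather than reusing the driver's.
val conf = new SparkConf()
  .setMaster("yarn-client")
  .setAppName("yarn-client-example")
  .set("spark.yarn.am.memory", "1g") // defaults to 512m when unset
```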
commit a4f1946e4c42d1e350199b927018bfe9ed337929
Author: mcheah <[email protected]>
Date: 2015-01-09T22:16:20Z
[SPARK-4737] Task set manager properly handles serialization errors
Dealing with [SPARK-4737], the handling of serialization errors should not
be the DAGScheduler's responsibility. The task set manager now catches the
error and aborts the stage.
If the TaskSetManager throws a TaskNotSerializableException, the
TaskSchedulerImpl will return an empty list of task descriptions, because no
tasks were started. The scheduler should abort the stage gracefully.
Note that I'm not too familiar with this part of the codebase and its place
in the overall architecture of the Spark stack. If implementing it this way
will have any adverse side effects, please voice that loudly.
Author: mcheah <[email protected]>
Closes #3638 from mccheah/task-set-manager-properly-handle-ser-err and
squashes the following commits:
1545984 [mcheah] Some more style fixes from Andrew Or.
5267929 [mcheah] Fixing style suggestions from Andrew Or.
dfa145b [mcheah] Fixing style from Josh Rosen's feedback
b2a430d [mcheah] Not returning empty seq when a task set cannot be
serialized.
94844d7 [mcheah] Fixing compilation error, one brace too many
5f486f4 [mcheah] Adding license header for fake task class
bf5e706 [mcheah] Fixing indentation.
097e7a2 [mcheah] [SPARK-4737] Catching task serialization exception in
TaskSetManager
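A self-contained sketch of that control flow; abortStage is an
illustrative stand-in, not Spark's internal API:
```
import java.io.{ByteArrayOutputStream, NotSerializableException,
  ObjectOutputStream}

// Returns true if the task could be serialized; otherwise reports the
// failure through abortStage and returns false, so the caller hands the
// scheduler an empty list of task descriptions.
def tryLaunch(task: AnyRef, abortStage: String => Unit): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(task)
    true
  } catch {
    case e: NotSerializableException =>
      abortStage(s"Failed to serialize task: ${e.getMessage}")
      false
  }
```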
commit 30f7f1744c6441fae1e8299a27046d06d105b2e6
Author: Kousuke Saruta <[email protected]>
Date: 2015-01-09T22:40:45Z
[DOC] Fixed Mesos version in doc from 0.18.1 to 0.21.0
#3934 upgraded the Mesos version, so we should also fix the docs, right?
This issue is really minor, so I didn't file a JIRA.
Author: Kousuke Saruta <[email protected]>
Closes #3982 from sarutak/fix-mesos-version and squashes the following
commits:
9a86ee3 [Kousuke Saruta] Fixed mesos version from 0.18.1 to 0.21.0
commit a675d98ffec5054c1e0818b737609a34be9be983
Author: bilna <[email protected]>
Date: 2015-01-09T22:45:28Z
[Minor] Fix import order and other coding style
fixed import order and other coding style
Author: bilna <[email protected]>
Author: Bilna P <[email protected]>
Closes #3966 from Bilna/master and squashes the following commits:
5e76f04 [bilna] fix import order and other coding style
5718d66 [bilna] Merge remote-tracking branch 'upstream/master'
ae56514 [bilna] Merge remote-tracking branch 'upstream/master'
acea3a3 [bilna] Adding dependency with scope test
28681fa [bilna] Merge remote-tracking branch 'upstream/master'
fac3904 [bilna] Correction in Indentation and coding style
ed9db4c [bilna] Merge remote-tracking branch 'upstream/master'
4b34ee7 [Bilna P] Update MQTTStreamSuite.scala
04503cf [bilna] Added embedded broker service for mqtt test
89d804e [bilna] Merge remote-tracking branch 'upstream/master'
fc8eb28 [bilna] Merge remote-tracking branch 'upstream/master'
4b58094 [Bilna P] Update MQTTStreamSuite.scala
b1ac4ad [bilna] Added BeforeAndAfter
5f6bfd2 [bilna] Added BeforeAndAfter
e8b6623 [Bilna P] Update MQTTStreamSuite.scala
5ca6691 [Bilna P] Update MQTTStreamSuite.scala
8616495 [bilna] [SPARK-4631] unit test for MQTT
commit 37a27b427dc7ae8fe731907472b38a2e5ff54ae8
Author: WangTaoTheTonic <[email protected]>
Date: 2015-01-10T01:10:02Z
[SPARK-4990][Deploy]to find default properties file, search SPARK_CONF_DIR
first
https://issues.apache.org/jira/browse/SPARK-4990
Author: WangTaoTheTonic <[email protected]>
Author: WangTao <[email protected]>
Closes #3823 from WangTaoTheTonic/SPARK-4990 and squashes the following
commits:
133c43e [WangTao] Update spark-submit2.cmd
b1ab402 [WangTao] Update spark-submit
4cc7f34 [WangTaoTheTonic] rebase
55300bc [WangTaoTheTonic] use export to make it global
d8d3cb7 [WangTaoTheTonic] remove blank line
07b9ebf [WangTaoTheTonic] check SPARK_CONF_DIR instead of checking
properties file
c5a85eb [WangTaoTheTonic] to find default properties file, search
SPARK_CONF_DIR first
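The lookup order described above, condensed into a sketch (the real
change lives in the launch scripts and SparkSubmitArguments):
```
import java.io.File

// Prefer $SPARK_CONF_DIR/spark-defaults.conf; fall back to
// $SPARK_HOME/conf/spark-defaults.conf when it is not set.
def defaultPropertiesFile(env: Map[String, String]): Option[File] =
  env.get("SPARK_CONF_DIR")
    .orElse(env.get("SPARK_HOME").map(_ + File.separator + "conf"))
    .map(dir => new File(dir, "spark-defaults.conf"))
    .filter(_.isFile)
```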
commit 0a9c325e6a2d0028c30f3e13e6bc6c7e71170929
Author: MechCoder <[email protected]>
Date: 2015-01-10T01:45:18Z
[SPARK-4406] [MLib] FIX: Validate k in SVD
Raise an exception when k is non-positive in SVD.
Author: MechCoder <[email protected]>
Closes #3945 from MechCoder/spark-4406 and squashes the following commits:
64e6d2d [MechCoder] TST: Add better test errors and messages
12dae73 [MechCoder] [SPARK-4406] FIX: Validate k in SVD
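The fix boils down to a precondition of roughly this shape (a sketch,
not the exact patch):
```
// Reject a meaningless k before any computation starts.
def validateK(k: Int, numCols: Int): Unit = {
  require(k > 0, s"Number of singular values must be positive, got $k")
  require(k <= numCols, s"Requested $k singular values but only $numCols columns")
}
```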
commit 29534b6bf401043123aba92473389946bb84946a
Author: luogankun <[email protected]>
Date: 2015-01-10T04:38:41Z
[SPARK-5141][SQL]CaseInsensitiveMap throws java.io.NotSerializableException
CaseInsensitiveMap throws java.io.NotSerializableException.
Author: luogankun <[email protected]>
Closes #3944 from luogankun/SPARK-5141 and squashes the following commits:
b6d63d5 [luogankun] [SPARK-5141]CaseInsensitiveMap throws
java.io.NotSerializableException
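A minimal sketch of this bug class and the usual fix: a wrapper around a
Map that travels inside task closures must itself be Serializable. Names
here are illustrative, not Spark's internal class:
```
// Without "extends Serializable", shipping an instance inside a task
// closure fails with java.io.NotSerializableException.
class CaseInsensitiveLookup(m: Map[String, String]) extends Serializable {
  private val lowered = m.map { case (k, v) => (k.toLowerCase, v) }
  def get(key: String): Option[String] = lowered.get(key.toLowerCase)
}

val opts = new CaseInsensitiveLookup(Map("Path" -> "/tmp/data"))
assert(opts.get("PATH") == Some("/tmp/data"))
```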
commit 5d2bb0fffeb2e3cae744b410b55cef99595f0af1
Author: Alex Liu <[email protected]>
Date: 2015-01-10T21:19:12Z
[SPARK-4925][SQL] Publish Spark SQL hive-thriftserver maven artifact
Author: Alex Liu <[email protected]>
Closes #3766 from alexliu68/SPARK-SQL-4925 and squashes the following
commits:
3137b51 [Alex Liu] [SPARK-4925][SQL] Remove sql/hive-thriftserver module
from pom.xml
15f2e38 [Alex Liu] [SPARK-4925][SQL] Publish Spark SQL hive-thriftserver
maven artifact
commit cf5686b922a90612cea185c882033989b391a021
Author: Alex Liu <[email protected]>
Date: 2015-01-10T21:23:09Z
[SPARK-4943][SQL] Allow table name having dot for db/catalog
This pull request only fixes the parsing error and changes the API to use
tableIdentifier. Changes related to joining data sources across different
catalogs are not done in this pull request.
Author: Alex Liu <[email protected]>
Closes #3941 from alexliu68/SPARK-SQL-4943-3 and squashes the following
commits:
343ae27 [Alex Liu] [SPARK-4943][SQL] refactoring according to review
29e5e55 [Alex Liu] [SPARK-4943][SQL] fix failed Hive CTAS tests
6ae77ce [Alex Liu] [SPARK-4943][SQL] fix TestHive matching error
3652997 [Alex Liu] [SPARK-4943][SQL] Allow table name having dot to support
db/catalog ...
commit 37a79554360b7809a1b7413f831a8e91d68400d6
Author: scwf <[email protected]>
Date: 2015-01-10T21:53:21Z
[SPARK-4574][SQL] Adding support for defining schema in foreign DDL
commands.
Adding support for defining a schema in foreign DDL commands. Now foreign
DDL supports commands like:
```
CREATE TEMPORARY TABLE avroTable
USING org.apache.spark.sql.avro
OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
```
With this PR the user can define a schema instead of inferring it from
the file, so DDL commands like the following are supported:
```
CREATE TEMPORARY TABLE avroTable(a int, b string)
USING org.apache.spark.sql.avro
OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
```
Author: scwf <[email protected]>
Author: Yin Huai <[email protected]>
Author: Fei Wang <[email protected]>
Author: wangfei <[email protected]>
Closes #3431 from scwf/ddl and squashes the following commits:
7e79ce5 [Fei Wang] Merge pull request #22 from yhuai/pr3431yin
38f634e [Yin Huai] Remove Option from createRelation.
65e9c73 [Yin Huai] Revert all changes since applying a given schema has not
been testd.
a852b10 [scwf] remove cleanIdentifier
f336a16 [Fei Wang] Merge pull request #21 from yhuai/pr3431yin
baf79b5 [Yin Huai] Test special characters quoted by backticks.
50a03b0 [Yin Huai] Use JsonRDD.nullTypeToStringType to convert NullType to
StringType.
1eeb769 [Fei Wang] Merge pull request #20 from yhuai/pr3431yin
f5c22b0 [Yin Huai] Refactor code and update test cases.
f1cffe4 [Yin Huai] Revert "minor refactory"
b621c8f [scwf] minor refactory
d02547f [scwf] fix HiveCompatibilitySuite test failure
8dfbf7a [scwf] more tests for complex data type
ddab984 [Fei Wang] Merge pull request #19 from yhuai/pr3431yin
91ad91b [Yin Huai] Parse data types in DDLParser.
cf982d2 [scwf] fixed test failure
445b57b [scwf] address comments
02a662c [scwf] style issue
44eb70c [scwf] fix decimal parser issue
83b6fc3 [scwf] minor fix
9bf12f8 [wangfei] adding test case
7787ec7 [wangfei] added SchemaRelationProvider
0ba70df [wangfei] draft version
commit 94b489f8d3966f5133b75be4d79818a3b19a717d
Author: scwf <[email protected]>
Date: 2015-01-10T22:08:04Z
[SPARK-4861][SQL] Refactory command in spark sql
Follow-up for #3712.
This PR finally removes ```CommandStrategy``` and makes all commands
follow ```RunnableCommand```, so they can go through ```case r:
RunnableCommand => ExecutedCommand(r) :: Nil```.
One exception is Hive's ```DescribeCommand```, which is a special case
that needs to distinguish Hive tables from temporary tables, so
```HiveCommandStrategy``` is kept here.
Author: scwf <[email protected]>
Closes #3948 from scwf/followup-SPARK-4861 and squashes the following
commits:
6b48e64 [scwf] minor style fix
2c62e9d [scwf] fix for hive module
5a7a819 [scwf] Refactory command in spark sql
commit 447f643adf7ea2f89018ac380412eb5dc7133af5
Author: Yanbo Liang <[email protected]>
Date: 2015-01-10T22:16:37Z
SPARK-4963 [SQL] Add copy to SQL's Sample operator
https://issues.apache.org/jira/browse/SPARK-4963
SchemaRDD.sample() returns wrong results because GapSamplingIterator
operates on mutable rows.
HiveTableScan builds an RDD of SpecificMutableRow, and SchemaRDD.sample()
iterates over it with a GapSamplingIterator:
```
override def next(): T = {
  val r = data.next()
  advance
  r
}
```
GapSamplingIterator.next() returns the current underlying element,
assigning it to r.
However, if the underlying iterator yields mutable rows, as
HiveTableScan's does, the underlying iterator and r point to the same
object. The advance operation then drops some underlying elements, which
also mutates r, so we end up returning a value different from the
initial r.
To fix this issue, the most direct way is to make HiveTableScan return
mutable rows with a copy, as in my initial commit. This solution means
HiveTableScan cannot take full advantage of the reusable MutableRow, but
it makes the sample operation return correct results.
Furthermore, we could investigate making GapSamplingIterator.next() copy
elements internally. To achieve that, every element type an RDD can store
would have to implement something like Cloneable, which would be a huge
change.
Author: Yanbo Liang <[email protected]>
Closes #3827 from yanbohappy/spark-4963 and squashes the following commits:
0912ca0 [Yanbo Liang] code format keep
65c4e7c [Yanbo Liang] import file and clear annotation
55c7c56 [Yanbo Liang] better output of test case
cea7e2e [Yanbo Liang] SchemaRDD add copy operation before Sample operator
e840829 [Yanbo Liang] HiveTableScan return mutable row with copy
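The failure mode generalizes beyond Spark; a tiny self-contained
illustration of why buffering references to a reused mutable object goes
wrong without a copy:
```
// One StringBuilder is mutated and handed out three times, mimicking a
// reused mutable row. Buffering the references keeps only the final state.
val reused = new StringBuilder
val it = Iterator.tabulate(3) { i => reused.clear(); reused.append(i); reused }
val buffered = it.toArray.map(_.toString)
// buffered == Array("2", "2", "2"): every slot saw the same object.
// Copying at hand-out time (reused.toString) yields Array("0", "1", "2").
```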
commit 63729e175b4aa2ee25f05e2598785719c1e4acb7
Author: Michael Armbrust <[email protected]>
Date: 2015-01-10T22:25:45Z
[SPARK-5187][SQL] Fix caching of tables with HiveUDFs in the WHERE clause
Author: Michael Armbrust <[email protected]>
Closes #3987 from marmbrus/hiveUdfCaching and squashes the following
commits:
8bca2fa [Michael Armbrust] [SPARK-5187][SQL] Fix caching of tables with
HiveUDFs in the WHERE clause
commit 6687ee8c5000048a0f45ede5f1e1288adba96019
Author: YanTangZhai <[email protected]>
Date: 2015-01-10T23:05:23Z
[SPARK-4692] [SQL] Support ! boolean logic operator like NOT
Support the ! boolean logic operator, like NOT, in SQL as follows:
select * from for_test where !(col1 > col2)
Author: YanTangZhai <[email protected]>
Author: Michael Armbrust <[email protected]>
Closes #3555 from YanTangZhai/SPARK-4692 and squashes the following commits:
1a9f605 [YanTangZhai] Update HiveQuerySuite.scala
7c03c68 [YanTangZhai] Merge pull request #23 from apache/master
992046e [YanTangZhai] Update HiveQuerySuite.scala
ea618f4 [YanTangZhai] Update HiveQuerySuite.scala
192411d [YanTangZhai] Merge pull request #17 from YanTangZhai/master
e4c2c0a [YanTangZhai] Merge pull request #15 from apache/master
1e1ebb4 [YanTangZhai] Update HiveQuerySuite.scala
efc4210 [YanTangZhai] Update HiveQuerySuite.scala
bd2c444 [YanTangZhai] Update HiveQuerySuite.scala
1893956 [YanTangZhai] Merge pull request #14 from marmbrus/pr/3555
59e4de9 [Michael Armbrust] make hive test
718afeb [YanTangZhai] Merge pull request #12 from apache/master
950b21e [YanTangZhai] Update HiveQuerySuite.scala
74175b4 [YanTangZhai] Update HiveQuerySuite.scala
92242c7 [YanTangZhai] Update HiveQl.scala
6e643f8 [YanTangZhai] Merge pull request #11 from apache/master
e249846 [YanTangZhai] Merge pull request #10 from apache/master
d26d982 [YanTangZhai] Merge pull request #9 from apache/master
76d4027 [YanTangZhai] Merge pull request #8 from apache/master
03b62b0 [YanTangZhai] Merge pull request #7 from apache/master
8a00106 [YanTangZhai] Merge pull request #6 from apache/master
cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
cdef539 [YanTangZhai] Merge pull request #1 from apache/master
commit dbbd5f5d255d41505406cee046586b72ae6199e9
Author: CodingCat <[email protected]>
Date: 2015-01-10T23:35:41Z
[SPARK-5181] do not print writing WAL log when WAL is disabled
https://issues.apache.org/jira/browse/SPARK-5181
Currently, even if the logManager is not created, we still see the log
entry s"Writing to log $record".
A simple fix to make the log more accurate.
Author: CodingCat <[email protected]>
Closes #3985 from CodingCat/SPARK-5181 and squashes the following commits:
0e27dc5 [CodingCat] do not print writing WAL log when WAL is disabled
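The gist of the fix, sketched with illustrative names (the real change
guards a logDebug in the receiver path):
```
// Only claim we are writing to the WAL when a log manager exists.
def storeRecord(record: AnyRef, logManager: Option[AnyRef]): Unit = {
  logManager.foreach { _ =>
    println(s"Writing to log $record") // logDebug in the actual code
    // ... hand the record to the write-ahead log ...
  }
  // ... store the block as usual ...
}
```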
commit 04da7031fda06522c4927df45185255320c37f3e
Author: GuoQiang Li <[email protected]>
Date: 2015-01-10T23:38:43Z
[Minor]Resolve sbt warnings during build (MQTTStreamSuite.scala).
cc andrewor14
Author: GuoQiang Li <[email protected]>
Closes #3989 from witgo/MQTTStreamSuite and squashes the following commits:
a6e967e [GuoQiang Li] Resolve sbt warnings during build
(MQTTStreamSuite.scala).
commit c9b4a7de2304c0b80c4e9ea49c045ae64f26ed8f
Author: wangfei <[email protected]>
Date: 2015-01-11T01:04:56Z
[SPARK-4871][SQL] Show sql statement in spark ui when run sql with spark-sql
Author: wangfei <[email protected]>
Closes #3718 from scwf/sparksqlui and squashes the following commits:
e0d6b5d [wangfei] format fix
383b505 [wangfei] fix conflicts
4d2038a [wangfei] using setJobDescription
df79837 [wangfei] fix compile error
92ce834 [wangfei] show sql statement in spark ui when run sql use spark-sql
----