GitHub user damnMeddlingKid opened a pull request:

    https://github.com/apache/spark/pull/10136

    Kafka streaming

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Shopify/spark kafka_streaming

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10136.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10136
    
----
commit 854319e589c89b2b6b4a9d02916f6f748fc5680a
Author: Fernando Otero (ZeoS) <[email protected]>
Date:   2015-01-08T20:42:54Z

    SPARK-5148 [MLlib] Make usersOut/productsOut storage level in ALS configurable
    
    Author: Fernando Otero (ZeoS) <[email protected]>
    
    Closes #3953 from zeitos/storageLevel and squashes the following commits:
    
    0f070b9 [Fernando Otero (ZeoS)] fix imports
    6869e80 [Fernando Otero (ZeoS)] fix comment length
    90c9f7e [Fernando Otero (ZeoS)] fix comment length
    18a992e [Fernando Otero (ZeoS)] changing storage level
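
    The commit above makes a previously hard-coded persistence choice configurable. A minimal self-contained sketch of that builder-setter pattern (the names here are hypothetical, not the actual MLlib ALS API):

```scala
// Hypothetical sketch, not the real ALS API: a builder-style setter for the
// storage level used when persisting the output factor RDDs.
sealed trait StorageLevel
case object MemoryOnly extends StorageLevel
case object MemoryAndDisk extends StorageLevel

class ALSConfig {
  // Default mirrors the previously hard-coded behavior.
  private var outputStorageLevel: StorageLevel = MemoryAndDisk

  def setOutputStorageLevel(level: StorageLevel): this.type = {
    outputStorageLevel = level
    this
  }

  def getOutputStorageLevel: StorageLevel = outputStorageLevel
}

val cfg = new ALSConfig().setOutputStorageLevel(MemoryOnly)
```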

commit d9cad94b1df0200207ba03fb0168373ccc3a8597
Author: Kousuke Saruta <[email protected]>
Date:   2015-01-08T21:43:09Z

    [SPARK-4973][CORE] Local directory in the driver of client-mode continues 
remaining even if application finished when external shuffle is enabled
    
    When the external shuffle service is enabled, the client-mode driver's local directories remain on disk even after the application has finished. Local directories for drivers should be deleted as well.
    
    Author: Kousuke Saruta <[email protected]>
    
    Closes #3811 from sarutak/SPARK-4973 and squashes the following commits:
    
    ad944ab [Kousuke Saruta] Fixed DiskBlockManager to cleanup local directory 
if it's the driver
    43770da [Kousuke Saruta] Merge branch 'master' of 
git://git.apache.org/spark into SPARK-4973
    88feecd [Kousuke Saruta] Merge branch 'master' of 
git://git.apache.org/spark into SPARK-4973
    d99718e [Kousuke Saruta] Fixed SparkSubmit.scala and DiskBlockManager.scala 
in order to delete local directories of the driver of local-mode when external 
shuffle service is enabled
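
    The cleanup described above amounts to recursively deleting the driver's local scratch directories on shutdown. A simplified, self-contained sketch (the real fix lives in DiskBlockManager and guards on whether the process is the driver):

```scala
import java.io.File
import java.nio.file.Files

// Simplified sketch: recursively delete a local directory tree. The actual fix
// registers this kind of cleanup only for the driver's local directories.
def deleteRecursively(file: File): Unit = {
  if (file.isDirectory) {
    Option(file.listFiles()).getOrElse(Array.empty[File]).foreach(deleteRecursively)
  }
  file.delete()
}

// Simulate a leftover local directory with a scratch file inside it.
val dir = Files.createTempDirectory("spark-local-").toFile
new File(dir, "shuffle_0_0_0.data").createNewFile()
deleteRecursively(dir)
```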

commit b14068bf7b2dff450101d48a59e79761e3ca4eb2
Author: RJ Nowling <[email protected]>
Date:   2015-01-08T23:03:43Z

    [SPARK-4891][PySpark][MLlib] Add gamma/log normal/exp dist sampling to PySpark MLlib
    
    This is a follow-up to PR #3680: https://github.com/apache/spark/pull/3680.
    
    Author: RJ Nowling <[email protected]>
    
    Closes #3955 from rnowling/spark4891 and squashes the following commits:
    
    1236a01 [RJ Nowling] Fix Python style issues
    7a01a78 [RJ Nowling] Fix Python style issues
    174beab [RJ Nowling] [SPARK-4891][PySpark][MLlib] Add gamma/log normal/exp 
dist sampling to PySpark MLlib

commit 5a1b7a9c8a77b6d1ef5553490d0ccf291dfac06f
Author: Marcelo Vanzin <[email protected]>
Date:   2015-01-09T01:15:13Z

    [SPARK-4048] Enhance and extend hadoop-provided profile.
    
    This change does a few things to make the hadoop-provided profile more 
useful:
    
    - Create new profiles for other libraries / services that might be provided 
by the infrastructure
    - Simplify and fix the poms so that the profiles are only activated while 
building assemblies.
    - Fix tests so that they're able to run when the profiles are activated
    - Add a new env variable to be used by distributions that use these 
profiles to provide the runtime
      classpath for Spark jobs and daemons.
    
    Author: Marcelo Vanzin <[email protected]>
    
    Closes #2982 from vanzin/SPARK-4048 and squashes the following commits:
    
    82eb688 [Marcelo Vanzin] Add a comment.
    eb228c0 [Marcelo Vanzin] Fix borked merge.
    4e38f4e [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
    9ef79a3 [Marcelo Vanzin] Alternative way to propagate test classpath to 
child processes.
    371ebee [Marcelo Vanzin] Review feedback.
    52f366d [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
    83099fc [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
    7377e7b [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
    322f882 [Marcelo Vanzin] Fix merge fail.
    f24e9e7 [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
    8b00b6a [Marcelo Vanzin] Merge branch 'master' into SPARK-4048
    9640503 [Marcelo Vanzin] Cleanup child process log message.
    115fde5 [Marcelo Vanzin] Simplify a comment (and make it consistent with 
another pom).
    e3ab2da [Marcelo Vanzin] Fix hive-thriftserver profile.
    7820d58 [Marcelo Vanzin] Fix CliSuite with provided profiles.
    1be73d4 [Marcelo Vanzin] Restore flume-provided profile.
    d1399ed [Marcelo Vanzin] Restore jetty dependency.
    82a54b9 [Marcelo Vanzin] Remove unused profile.
    5c54a25 [Marcelo Vanzin] Fix HiveThriftServer2Suite with *-provided 
profiles.
    1fc4d0b [Marcelo Vanzin] Update dependencies for hive-thriftserver.
    f7b3bbe [Marcelo Vanzin] Add snappy to hadoop-provided list.
    9e4e001 [Marcelo Vanzin] Remove duplicate hive profile.
    d928d62 [Marcelo Vanzin] Redirect child stderr to parent's log.
    4d67469 [Marcelo Vanzin] Propagate SPARK_DIST_CLASSPATH on Yarn.
    417d90e [Marcelo Vanzin] Introduce "SPARK_DIST_CLASSPATH".
    2f95f0d [Marcelo Vanzin] Propagate classpath to child processes during 
testing.
    1adf91c [Marcelo Vanzin] Re-enable maven-install-plugin for a few projects.
    284dda6 [Marcelo Vanzin] Rework the "hadoop-provided" profile, add new ones.

commit 013e031d01dca052b94a094c08b7d7f76f640711
Author: Nicholas Chammas <[email protected]>
Date:   2015-01-09T01:42:08Z

    [SPARK-5122] Remove Shark from spark-ec2
    
    I moved the Spark-Shark version map [to the 
wiki](https://cwiki.apache.org/confluence/display/SPARK/Spark-Shark+version+mapping).
    
    This PR has a [matching PR in 
mesos/spark-ec2](https://github.com/mesos/spark-ec2/pull/89).
    
    Author: Nicholas Chammas <[email protected]>
    
    Closes #3939 from nchammas/remove-shark and squashes the following commits:
    
    66e0841 [Nicholas Chammas] fix style
    ceeab85 [Nicholas Chammas] show default Spark GitHub repo
    7270126 [Nicholas Chammas] validate Spark hashes
    db4935d [Nicholas Chammas] validate spark version upfront
    fc0d5b9 [Nicholas Chammas] remove Shark

commit 8a95a3e61580b1c1f6c0a3e124aa8469255db968
Author: WangTaoTheTonic <[email protected]>
Date:   2015-01-09T14:10:09Z

    [SPARK-5169][YARN]fetch the correct max attempts
    
    Sorry for fetching the wrong max attempts in commit https://github.com/apache/spark/commit/8fdd48959c93b9cf809f03549e2ae6c4687d1fcd. We need to fix it now.
    
    tgravescs
    
    If we set a `spark.yarn.maxAppAttempts` larger than `yarn.resourcemanager.am.max-attempts` on the YARN side, it will be overridden, as described here:
    >The maximum number of application attempts. It's a global setting for all 
application masters. Each application master can specify its individual maximum 
number of application attempts via the API, but the individual number cannot be 
more than the global upper bound. If it is, the resourcemanager will override 
it. The default number is set to 2, to allow at least one retry for AM.
    
    
http://hadoop.apache.org/docs/r2.6.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
    
    Author: WangTaoTheTonic <[email protected]>
    
    Closes #3942 from WangTaoTheTonic/HOTFIX and squashes the following commits:
    
    9ac16ce [WangTaoTheTonic] fetch the correct max attempts
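
    The capping rule quoted above can be sketched in a few lines (names hypothetical): an application-requested attempt count can never exceed YARN's global bound.

```scala
// An app-requested attempt count is capped at the global
// yarn.resourcemanager.am.max-attempts value; when unset, the global bound applies.
def effectiveMaxAttempts(requested: Option[Int], yarnGlobalMax: Int): Int =
  requested.map(r => math.min(r, yarnGlobalMax)).getOrElse(yarnGlobalMax)
```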

commit 82f1259aba249285fd271f9f20e095409cb4d20b
Author: Aaron Davidson <[email protected]>
Date:   2015-01-09T17:20:16Z

    [Minor] Fix test RetryingBlockFetcherSuite after changed config name
    
    Flaky because the default retry interval was the same as our test's wait timeout.
    
    Author: Aaron Davidson <[email protected]>
    
    Closes #3972 from aarondav/fix-test and squashes the following commits:
    
    db77cab [Aaron Davidson] [Minor] Fix test after changed config name

commit 2f2b837e33eca6010fad3ad22c7d298fa6d042c9
Author: Sean Owen <[email protected]>
Date:   2015-01-09T17:35:46Z

    SPARK-5136 [DOCS] Improve documentation around setting up Spark IntelliJ 
project
    
    This PR simply points to the IntelliJ wiki page instead of also including 
IntelliJ notes in the docs. The intent however is to also update the wiki page 
with updated tips. This is the text I propose for the IntelliJ section on the 
wiki. I realize it omits some of the existing instructions on the wiki, about 
enabling Hive, but I think those are actually optional.
    
    ------
    
    IntelliJ supports both Maven- and SBT-based projects. It is recommended, 
however, to import Spark as a Maven project. Choose "Import Project..." from 
the File menu, and select the `pom.xml` file in the Spark root directory.
    
    It is fine to leave all settings at their default values in the Maven import wizard, with two caveats. First, it is usually useful to enable "Import Maven projects automatically", since changes to the project structure will then automatically update the IntelliJ project.
    
    Second, note the step that prompts you to choose active Maven build profiles. As documented above, some build configurations require specific profiles to be enabled. The same profiles that are enabled with `-P[profile name]` above may be enabled on this screen. For example, if developing for Hadoop 2.4 with YARN support, enable the `yarn` and `hadoop-2.4` profiles.
    
    These selections can be changed later by accessing the "Maven Projects" 
tool window from the View menu, and expanding the Profiles section.
    
    "Rebuild Project" can fail the first time the project is compiled, because generated source files are not created automatically. Try clicking the "Generate Sources and Update Folders For All Projects" button in the "Maven Projects" tool window to generate these sources manually.
    
    Compilation may fail with an error like "scalac: bad option: -P:/home/jakub/.m2/repository/org/scalamacros/paradise_2.10.4/2.0.1/paradise_2.10.4-2.0.1.jar". If so, go to Preferences > Build, Execution, Deployment > Scala Compiler and clear the "Additional compiler options" field. It will then work, although the option will come back when the project reimports.
    
    Author: Sean Owen <[email protected]>
    
    Closes #3952 from srowen/SPARK-5136 and squashes the following commits:
    
    f3baa66 [Sean Owen] Point to new IJ / Eclipse wiki link
    016b7df [Sean Owen] Point to IntelliJ wiki page instead of also including 
IntelliJ notes in the docs

commit 37fea2dde60567baa69e031ed8a7895d1b923429
Author: Patrick Wendell <[email protected]>
Date:   2015-01-09T17:40:18Z

    HOTFIX: Minor improvements to make-distribution.sh
    
    1. Renames $FWDIR to $SPARK_HOME (vast majority of diff).
    2. Use Spark-provided Maven.
    3. Logs build flags in the RELEASE file.
    
    Author: Patrick Wendell <[email protected]>
    
    Closes #3973 from pwendell/master and squashes the following commits:
    
    340a2fa [Patrick Wendell] HOTFIX: Minor improvements to make-distribution.sh

commit 0a3aa5fac073e60d09a4afa2cd2a90f6faa2982c
Author: Kay Ousterhout <[email protected]>
Date:   2015-01-09T17:47:06Z

    [SPARK-1143] Separate pool tests into their own suite.
    
    The current TaskSchedulerImplSuite includes some tests that are
    actually for the TaskSchedulerImpl, but the remainder of the tests avoid 
using
    the TaskSchedulerImpl entirely, and actually test the pool and scheduling
    algorithm mechanisms. This commit separates the pool/scheduling algorithm
    tests into their own suite, and also simplifies those tests.
    
    The pull request replaces #339.
    
    Author: Kay Ousterhout <[email protected]>
    
    Closes #3967 from kayousterhout/SPARK-1143 and squashes the following 
commits:
    
    8a898c4 [Kay Ousterhout] [SPARK-1143] Separate pool tests into their own 
suite.

commit d2a450c8ab1669acfe6007ae87bec4dde60fea7e
Author: Liang-Chi Hsieh <[email protected]>
Date:   2015-01-09T18:27:33Z

    [SPARK-5145][Mllib] Add BLAS.dsyr and use it in GaussianMixtureEM
    
    This PR uses BLAS.dsyr to replace a few hand-written implementations in GaussianMixtureEM.
    
    Author: Liang-Chi Hsieh <[email protected]>
    
    Closes #3949 from viirya/blas_dsyr and squashes the following commits:
    
    4e4d6cf [Liang-Chi Hsieh] Add unit test. Rename function name, modify doc 
and style.
    3f57fd2 [Liang-Chi Hsieh] Add BLAS.dsyr and use it in GaussianMixtureEM.
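
    For reference, dsyr is the BLAS symmetric rank-1 update A := alpha * x * x^T + A. A naive self-contained sketch of what the routine computes (MLlib delegates to an optimized BLAS rather than looping like this, and updates only one triangle of the symmetric matrix):

```scala
// Naive dense reference for dsyr: A := alpha * x * x^T + A, here written out
// over the full matrix for clarity.
def dsyr(alpha: Double, x: Array[Double], a: Array[Array[Double]]): Unit = {
  val n = x.length
  for (i <- 0 until n; j <- 0 until n) {
    a(i)(j) += alpha * x(i) * x(j)
  }
}

val a = Array.fill(2)(Array.fill(2)(0.0))
dsyr(1.0, Array(1.0, 2.0), a)
```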

commit 831a0d287203392bead89d2c553919bb2fb4456a
Author: Jongyoul Lee <[email protected]>
Date:   2015-01-09T18:47:08Z

    [SPARK-3619] Upgrade to Mesos 0.21 to work around MESOS-1688
    
    - Update the version from 0.18.1 to 0.21.0.
    - I'm running some tests to verify that Spark jobs work correctly in a Mesos 0.21.0 environment.
    
    Author: Jongyoul Lee <[email protected]>
    
    Closes #3934 from jongyoul/SPARK-3619 and squashes the following commits:
    
    ab994fa [Jongyoul Lee] [SPARK-3619] Upgrade to Mesos 0.21 to work around 
MESOS-1688 - update version from 0.18.1 to 0.21.0

commit 40d8a94b1445e10a31f9dbbf7ff0757e7f159f2c
Author: Joseph K. Bradley <[email protected]>
Date:   2015-01-09T21:00:15Z

    [SPARK-5015] [mllib] Random seed for GMM + make test suite deterministic
    
    Issues:
    * From JIRA: GaussianMixtureEM uses randomness but does not take a random 
seed. It should take one as a parameter.
    * This also makes the test suite flaky since initialization can fail due to 
stochasticity.
    
    Fix:
    * Add random seed
    * Use it in test suite
    
    CC: mengxr  tgaloppo
    
    Author: Joseph K. Bradley <[email protected]>
    
    Closes #3981 from jkbradley/gmm-seed and squashes the following commits:
    
    f0df4fd [Joseph K. Bradley] Added seed parameter to GMM.  Updated test 
suite to use seed to prevent flakiness

commit 7884948b953161e8df6d6a97e8ec37f69f3597e3
Author: WangTaoTheTonic <[email protected]>
Date:   2015-01-09T21:20:32Z

    [SPARK-1953][YARN] yarn-client mode Application Master memory size is the same as driver memory size
    
    Ways to set the Application Master's memory in yarn-client mode:
    1.  `spark.yarn.am.memory` in SparkConf or System Properties
    2.  the default value, 512m

    Note: this argument is only available in yarn-client mode.
    
    Author: WangTaoTheTonic <[email protected]>
    
    Closes #3607 from WangTaoTheTonic/SPARK4181 and squashes the following 
commits:
    
    d5ceb1b [WangTaoTheTonic] spark.driver.memeory is used in both modes
    6c1b264 [WangTaoTheTonic] rebase
    b8410c0 [WangTaoTheTonic] minor optiminzation
    ddcd592 [WangTaoTheTonic] fix the bug produced in rebase and some 
improvements
    3bf70cc [WangTaoTheTonic] rebase and give proper hint
    987b99d [WangTaoTheTonic] disable --driver-memory in client mode
    2b27928 [WangTaoTheTonic] inaccurate description
    b7acbb2 [WangTaoTheTonic] incorrect method invoked
    2557c5e [WangTaoTheTonic] missing a single blank
    42075b0 [WangTaoTheTonic] arrange the args and warn logging
    69c7dba [WangTaoTheTonic] rebase
    1960d16 [WangTaoTheTonic] fix wrong comment
    7fa9e2e [WangTaoTheTonic] log a warning
    f6bee0e [WangTaoTheTonic] docs issue
    d619996 [WangTaoTheTonic] Merge branch 'master' into SPARK4181
    b09c309 [WangTaoTheTonic] use code format
    ab16bb5 [WangTaoTheTonic] fix bug and add comments
    44e48c2 [WangTaoTheTonic] minor fix
    6fd13e1 [WangTaoTheTonic] add overhead mem and remove some configs
    0566bb8 [WangTaoTheTonic] yarn client mode Application Master memory size 
is same as driver memory size
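
    The resolution order listed above can be sketched as follows (a simplification; the real code also handles cluster mode, where the AM is the driver, and memory overheads):

```scala
// Client-mode AM memory: use spark.yarn.am.memory when set, else the 512m default.
def amMemory(conf: Map[String, String]): String =
  conf.getOrElse("spark.yarn.am.memory", "512m")
```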

commit a4f1946e4c42d1e350199b927018bfe9ed337929
Author: mcheah <[email protected]>
Date:   2015-01-09T22:16:20Z

    [SPARK-4737] Task set manager properly handles serialization errors
    
    Dealing with [SPARK-4737], the handling of serialization errors should not 
be the DAGScheduler's responsibility. The task set manager now catches the 
error and aborts the stage.
    
    If the TaskSetManager throws a TaskNotSerializableException, the 
TaskSchedulerImpl will return an empty list of task descriptions, because no 
tasks were started. The scheduler should abort the stage gracefully.
    
    Note that I'm not too familiar with this part of the codebase or its place in the overall architecture of the Spark stack. If implementing it this way would have any adverse side effects, please voice that loudly.
    
    Author: mcheah <[email protected]>
    
    Closes #3638 from mccheah/task-set-manager-properly-handle-ser-err and 
squashes the following commits:
    
    1545984 [mcheah] Some more style fixes from Andrew Or.
    5267929 [mcheah] Fixing style suggestions from Andrew Or.
    dfa145b [mcheah] Fixing style from Josh Rosen's feedback
    b2a430d [mcheah] Not returning empty seq when a task set cannot be 
serialized.
    94844d7 [mcheah] Fixing compilation error, one brace too many
    5f486f4 [mcheah] Adding license header for fake task class
    bf5e706 [mcheah] Fixing indentation.
    097e7a2 [mcheah] [SPARK-4737] Catching task serialization exception in 
TaskSetManager

commit 30f7f1744c6441fae1e8299a27046d06d105b2e6
Author: Kousuke Saruta <[email protected]>
Date:   2015-01-09T22:40:45Z

    [DOC] Fixed Mesos version in doc from 0.18.1 to 0.21.0
    
    #3934 upgraded the Mesos version, so we should fix the docs as well, right?

    This issue is minor enough that I didn't file a JIRA ticket.
    
    Author: Kousuke Saruta <[email protected]>
    
    Closes #3982 from sarutak/fix-mesos-version and squashes the following 
commits:
    
    9a86ee3 [Kousuke Saruta] Fixed mesos version from 0.18.1 to 0.21.0

commit a675d98ffec5054c1e0818b737609a34be9be983
Author: bilna <[email protected]>
Date:   2015-01-09T22:45:28Z

    [Minor] Fix import order and other coding style
    
    fixed import order and other coding style
    
    Author: bilna <[email protected]>
    Author: Bilna P <[email protected]>
    
    Closes #3966 from Bilna/master and squashes the following commits:
    
    5e76f04 [bilna] fix import order and other coding style
    5718d66 [bilna] Merge remote-tracking branch 'upstream/master'
    ae56514 [bilna] Merge remote-tracking branch 'upstream/master'
    acea3a3 [bilna] Adding dependency with scope test
    28681fa [bilna] Merge remote-tracking branch 'upstream/master'
    fac3904 [bilna] Correction in Indentation and coding style
    ed9db4c [bilna] Merge remote-tracking branch 'upstream/master'
    4b34ee7 [Bilna P] Update MQTTStreamSuite.scala
    04503cf [bilna] Added embedded broker service for mqtt test
    89d804e [bilna] Merge remote-tracking branch 'upstream/master'
    fc8eb28 [bilna] Merge remote-tracking branch 'upstream/master'
    4b58094 [Bilna P] Update MQTTStreamSuite.scala
    b1ac4ad [bilna] Added BeforeAndAfter
    5f6bfd2 [bilna] Added BeforeAndAfter
    e8b6623 [Bilna P] Update MQTTStreamSuite.scala
    5ca6691 [Bilna P] Update MQTTStreamSuite.scala
    8616495 [bilna] [SPARK-4631] unit test for MQTT

commit 37a27b427dc7ae8fe731907472b38a2e5ff54ae8
Author: WangTaoTheTonic <[email protected]>
Date:   2015-01-10T01:10:02Z

    [SPARK-4990][Deploy]to find default properties file, search SPARK_CONF_DIR 
first
    
    https://issues.apache.org/jira/browse/SPARK-4990
    
    Author: WangTaoTheTonic <[email protected]>
    Author: WangTao <[email protected]>
    
    Closes #3823 from WangTaoTheTonic/SPARK-4990 and squashes the following 
commits:
    
    133c43e [WangTao] Update spark-submit2.cmd
    b1ab402 [WangTao] Update spark-submit
    4cc7f34 [WangTaoTheTonic] rebase
    55300bc [WangTaoTheTonic] use export to make it global
    d8d3cb7 [WangTaoTheTonic] remove blank line
    07b9ebf [WangTaoTheTonic] check SPARK_CONF_DIR instead of checking 
properties file
    c5a85eb [WangTaoTheTonic] to find default properties file, search 
SPARK_CONF_DIR first
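
    The lookup order this change establishes can be sketched as follows (a hypothetical helper; the actual fix is in the spark-submit shell scripts):

```scala
// Prefer $SPARK_CONF_DIR/spark-defaults.conf, falling back to $SPARK_HOME/conf/.
def defaultPropertiesFile(env: Map[String, String]): Option[String] =
  env.get("SPARK_CONF_DIR")
    .map(dir => s"$dir/spark-defaults.conf")
    .orElse(env.get("SPARK_HOME").map(home => s"$home/conf/spark-defaults.conf"))
```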

commit 0a9c325e6a2d0028c30f3e13e6bc6c7e71170929
Author: MechCoder <[email protected]>
Date:   2015-01-10T01:45:18Z

    [SPARK-4406] [MLib] FIX: Validate k in SVD
    
    Raise exception when k is non-positive in SVD
    
    Author: MechCoder <[email protected]>
    
    Closes #3945 from MechCoder/spark-4406 and squashes the following commits:
    
    64e6d2d [MechCoder] TST: Add better test errors and messages
    12dae73 [MechCoder] [SPARK-4406] FIX: Validate k in SVD
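
    The validation being added has roughly this shape (message text hypothetical): reject a non-positive k, or one larger than the matrix width, before doing any work.

```scala
// Fail fast with an informative message instead of producing a confusing
// downstream error when k is out of range.
def validateK(k: Int, numCols: Int): Unit =
  require(k > 0 && k <= numCols,
    s"Requested $k singular values but k must be in (0, $numCols]")
```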

commit 29534b6bf401043123aba92473389946bb84946a
Author: luogankun <[email protected]>
Date:   2015-01-10T04:38:41Z

    [SPARK-5141][SQL]CaseInsensitiveMap throws java.io.NotSerializableException
    
    CaseInsensitiveMap throws java.io.NotSerializableException.
    
    Author: luogankun <[email protected]>
    
    Closes #3944 from luogankun/SPARK-5141 and squashes the following commits:
    
    b6d63d5 [luogankun] [SPARK-5141]CaseInsensitiveMap throws 
java.io.NotSerializableException

commit 5d2bb0fffeb2e3cae744b410b55cef99595f0af1
Author: Alex Liu <[email protected]>
Date:   2015-01-10T21:19:12Z

    [SPARK-4925][SQL] Publish Spark SQL hive-thriftserver maven artifact
    
    Author: Alex Liu <[email protected]>
    
    Closes #3766 from alexliu68/SPARK-SQL-4925 and squashes the following 
commits:
    
    3137b51 [Alex Liu] [SPARK-4925][SQL] Remove sql/hive-thriftserver module 
from pom.xml
    15f2e38 [Alex Liu] [SPARK-4925][SQL] Publish Spark SQL hive-thriftserver 
maven artifact

commit cf5686b922a90612cea185c882033989b391a021
Author: Alex Liu <[email protected]>
Date:   2015-01-10T21:23:09Z

    [SPARK-4943][SQL] Allow table name having dot for db/catalog
    
    This pull request only fixes the parsing error and changes the API to use tableIdentifier. Changes related to joining data sources from different catalogs are not included in this pull request.
    
    Author: Alex Liu <[email protected]>
    
    Closes #3941 from alexliu68/SPARK-SQL-4943-3 and squashes the following 
commits:
    
    343ae27 [Alex Liu] [SPARK-4943][SQL] refactoring according to review
    29e5e55 [Alex Liu] [SPARK-4943][SQL] fix failed Hive CTAS tests
    6ae77ce [Alex Liu] [SPARK-4943][SQL] fix TestHive matching error
    3652997 [Alex Liu] [SPARK-4943][SQL] Allow table name having dot to support 
db/catalog ...

commit 37a79554360b7809a1b7413f831a8e91d68400d6
Author: scwf <[email protected]>
Date:   2015-01-10T21:53:21Z

    [SPARK-4574][SQL] Adding support for defining schema in foreign DDL 
commands.
    
    Adding support for defining a schema in foreign DDL commands. Foreign DDL now supports commands like:
    ```
    CREATE TEMPORARY TABLE avroTable
    USING org.apache.spark.sql.avro
    OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
    ```
    With this PR the user can define a schema instead of inferring it from the file, so DDL commands like the following are supported:
    ```
    CREATE TEMPORARY TABLE avroTable(a int, b string)
    USING org.apache.spark.sql.avro
    OPTIONS (path "../hive/src/test/resources/data/files/episodes.avro")
    ```
    
    Author: scwf <[email protected]>
    Author: Yin Huai <[email protected]>
    Author: Fei Wang <[email protected]>
    Author: wangfei <[email protected]>
    
    Closes #3431 from scwf/ddl and squashes the following commits:
    
    7e79ce5 [Fei Wang] Merge pull request #22 from yhuai/pr3431yin
    38f634e [Yin Huai] Remove Option from createRelation.
    65e9c73 [Yin Huai] Revert all changes since applying a given schema has not 
been testd.
    a852b10 [scwf] remove cleanIdentifier
    f336a16 [Fei Wang] Merge pull request #21 from yhuai/pr3431yin
    baf79b5 [Yin Huai] Test special characters quoted by backticks.
    50a03b0 [Yin Huai] Use JsonRDD.nullTypeToStringType to convert NullType to 
StringType.
    1eeb769 [Fei Wang] Merge pull request #20 from yhuai/pr3431yin
    f5c22b0 [Yin Huai] Refactor code and update test cases.
    f1cffe4 [Yin Huai] Revert "minor refactory"
    b621c8f [scwf] minor refactory
    d02547f [scwf] fix HiveCompatibilitySuite test failure
    8dfbf7a [scwf] more tests for complex data type
    ddab984 [Fei Wang] Merge pull request #19 from yhuai/pr3431yin
    91ad91b [Yin Huai] Parse data types in DDLParser.
    cf982d2 [scwf] fixed test failure
    445b57b [scwf] address comments
    02a662c [scwf] style issue
    44eb70c [scwf] fix decimal parser issue
    83b6fc3 [scwf] minor fix
    9bf12f8 [wangfei] adding test case
    7787ec7 [wangfei] added SchemaRelationProvider
    0ba70df [wangfei] draft version

commit 94b489f8d3966f5133b75be4d79818a3b19a717d
Author: scwf <[email protected]>
Date:   2015-01-10T22:08:04Z

    [SPARK-4861][SQL] Refactory command in spark sql
    
    Follow-up for #3712.
    This PR finally removes ```CommandStrategy``` and makes all commands follow ```RunnableCommand```, so they can go through ```case r: RunnableCommand => ExecutedCommand(r) :: Nil```.

    One exception is Hive's ```DescribeCommand```, which is a special case that needs to distinguish Hive tables from temporary tables, so ```HiveCommandStrategy``` is kept.
    
    Author: scwf <[email protected]>
    
    Closes #3948 from scwf/followup-SPARK-4861 and squashes the following 
commits:
    
    6b48e64 [scwf] minor style fix
    2c62e9d [scwf] fix for hive module
    5a7a819 [scwf] Refactory command in spark sql

commit 447f643adf7ea2f89018ac380412eb5dc7133af5
Author: Yanbo Liang <[email protected]>
Date:   2015-01-10T22:16:37Z

    SPARK-4963 [SQL] Add copy to SQL's Sample operator
    
    https://issues.apache.org/jira/browse/SPARK-4963
    SchemaRDD.sample() returns wrong results because GapSamplingIterator operates on mutable rows.
    HiveTableScan builds an RDD of SpecificMutableRow, and SchemaRDD.sample() iterates over it with a GapSamplingIterator:

    ```
    override def next(): T = {
      val r = data.next()
      advance
      r
    }
    ```

    GapSamplingIterator.next() returns the current underlying element and assigns it to r.
    However, if the underlying iterator yields mutable rows, as HiveTableScan's does, the underlying element and r point to the same object.
    The advance operation then drops some underlying elements, which also mutates r unexpectedly, so the value we return differs from the initial r.

    To fix this issue, the most direct way is to make HiveTableScan return mutable rows with a copy, as in my initial commit. This keeps HiveTableScan from getting the full benefit of reusable MutableRow, but it makes the sample operation return correct results.
    Further, we should investigate GapSamplingIterator.next() and make it copy internally. Achieving that would require every element an RDD can store to implement something like Cloneable, which would be a huge change.
    
    Author: Yanbo Liang <[email protected]>
    
    Closes #3827 from yanbohappy/spark-4963 and squashes the following commits:
    
    0912ca0 [Yanbo Liang] code format keep
    65c4e7c [Yanbo Liang] import file and clear annotation
    55c7c56 [Yanbo Liang] better output of test case
    cea7e2e [Yanbo Liang] SchemaRDD add copy operation before Sample operator
    e840829 [Yanbo Liang] HiveTableScan return mutable row with copy
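
    The aliasing bug described above can be reproduced without Spark at all. In this self-contained sketch, an iterator reuses one mutable buffer, so a previously returned element changes after the iterator advances; copying before advancing is the fix:

```scala
// A mutable element and an iterator that reuses a single buffer for every
// element, mirroring SpecificMutableRow reuse in HiveTableScan.
final class MutableRow(var value: Int) {
  def copyRow: MutableRow = new MutableRow(value)
}

final class ReusingIterator(data: Array[Int]) extends Iterator[MutableRow] {
  private val row = new MutableRow(0)   // one buffer reused for every element
  private var i = 0
  def hasNext: Boolean = i < data.length
  def next(): MutableRow = { row.value = data(i); i += 1; row }
}

// Without a copy, the saved reference is silently mutated by the next advance.
val it1 = new ReusingIterator(Array(1, 2))
val aliased = it1.next()
it1.next()            // advancing overwrites the shared buffer

// Copying before advancing keeps the saved value stable -- the fix.
val it2 = new ReusingIterator(Array(1, 2))
val copied = it2.next().copyRow
it2.next()
```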

commit 63729e175b4aa2ee25f05e2598785719c1e4acb7
Author: Michael Armbrust <[email protected]>
Date:   2015-01-10T22:25:45Z

    [SPARK-5187][SQL] Fix caching of tables with HiveUDFs in the WHERE clause
    
    Author: Michael Armbrust <[email protected]>
    
    Closes #3987 from marmbrus/hiveUdfCaching and squashes the following 
commits:
    
    8bca2fa [Michael Armbrust] [SPARK-5187][SQL] Fix caching of tables with 
HiveUDFs in the WHERE clause

commit 6687ee8c5000048a0f45ede5f1e1288adba96019
Author: YanTangZhai <[email protected]>
Date:   2015-01-10T23:05:23Z

    [SPARK-4692] [SQL] Support ! boolean logic operator like NOT
    
    Support the ! boolean logic operator, like NOT, in SQL as follows:
    select * from for_test where !(col1 > col2)
    
    Author: YanTangZhai <[email protected]>
    Author: Michael Armbrust <[email protected]>
    
    Closes #3555 from YanTangZhai/SPARK-4692 and squashes the following commits:
    
    1a9f605 [YanTangZhai] Update HiveQuerySuite.scala
    7c03c68 [YanTangZhai] Merge pull request #23 from apache/master
    992046e [YanTangZhai] Update HiveQuerySuite.scala
    ea618f4 [YanTangZhai] Update HiveQuerySuite.scala
    192411d [YanTangZhai] Merge pull request #17 from YanTangZhai/master
    e4c2c0a [YanTangZhai] Merge pull request #15 from apache/master
    1e1ebb4 [YanTangZhai] Update HiveQuerySuite.scala
    efc4210 [YanTangZhai] Update HiveQuerySuite.scala
    bd2c444 [YanTangZhai] Update HiveQuerySuite.scala
    1893956 [YanTangZhai] Merge pull request #14 from marmbrus/pr/3555
    59e4de9 [Michael Armbrust] make hive test
    718afeb [YanTangZhai] Merge pull request #12 from apache/master
    950b21e [YanTangZhai] Update HiveQuerySuite.scala
    74175b4 [YanTangZhai] Update HiveQuerySuite.scala
    92242c7 [YanTangZhai] Update HiveQl.scala
    6e643f8 [YanTangZhai] Merge pull request #11 from apache/master
    e249846 [YanTangZhai] Merge pull request #10 from apache/master
    d26d982 [YanTangZhai] Merge pull request #9 from apache/master
    76d4027 [YanTangZhai] Merge pull request #8 from apache/master
    03b62b0 [YanTangZhai] Merge pull request #7 from apache/master
    8a00106 [YanTangZhai] Merge pull request #6 from apache/master
    cbcba66 [YanTangZhai] Merge pull request #3 from apache/master
    cdef539 [YanTangZhai] Merge pull request #1 from apache/master

commit dbbd5f5d255d41505406cee046586b72ae6199e9
Author: CodingCat <[email protected]>
Date:   2015-01-10T23:35:41Z

    [SPARK-5181] do not print writing WAL log when WAL is disabled
    
    https://issues.apache.org/jira/browse/SPARK-5181
    
    Currently, even when the logManager is not created, we still see the log entry
    ```s"Writing to log $record"```

    A simple fix to make the log more accurate.
    
    Author: CodingCat <[email protected]>
    
    Closes #3985 from CodingCat/SPARK-5181 and squashes the following commits:
    
    0e27dc5 [CodingCat] do not print writing WAL log when WAL is disabled

commit 04da7031fda06522c4927df45185255320c37f3e
Author: GuoQiang Li <[email protected]>
Date:   2015-01-10T23:38:43Z

    [Minor]Resolve sbt warnings during build (MQTTStreamSuite.scala).
    
    cc andrewor14
    
    Author: GuoQiang Li <[email protected]>
    
    Closes #3989 from witgo/MQTTStreamSuite and squashes the following commits:
    
    a6e967e [GuoQiang Li] Resolve sbt warnings during build 
(MQTTStreamSuite.scala).

commit c9b4a7de2304c0b80c4e9ea49c045ae64f26ed8f
Author: wangfei <[email protected]>
Date:   2015-01-11T01:04:56Z

    [SPARK-4871][SQL] Show sql statement in spark ui when run sql with spark-sql
    
    Author: wangfei <[email protected]>
    
    Closes #3718 from scwf/sparksqlui and squashes the following commits:
    
    e0d6b5d [wangfei] format fix
    383b505 [wangfei] fix conflicts
    4d2038a [wangfei] using setJobDescription
    df79837 [wangfei] fix compile error
    92ce834 [wangfei] show sql statement in spark ui when run sql use spark-sql

----


