GitHub user ffchenAtCloudera opened a pull request:
https://github.com/apache/spark/pull/8443
[SPARK-10220] [SQL] org.apache.spark.sql.jdbc.JDBCRDD could not parse mysql
table column named reserved word
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/apache/spark master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/8443.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #8443
----
commit 017b5de07ef6cff249e984a2ab781c520249ac76
Author: Sudhakar Thota <[email protected]>
Date: 2015-08-11T21:31:51Z
[SPARK-8925] [MLLIB] Add @since tags to mllib.util
Went through the history of changes to the file MLUtils.scala and picked up
the version in which each change went in.
Author: Sudhakar Thota <[email protected]>
Author: Sudhakar Thota <[email protected]>
Closes #7436 from sthota2014/SPARK-8925_thotas.
commit 736af95bd0c41723d455246b634a0fb68b38a7c7
Author: Andrew Or <[email protected]>
Date: 2015-08-11T21:52:52Z
[HOTFIX] Fix style error caused by 017b5de
commit 5a5bbc29961630d649d4bd4acd5d19eb537b5fd0
Author: Marcelo Vanzin <[email protected]>
Date: 2015-08-11T23:33:08Z
[SPARK-9074] [LAUNCHER] Allow arbitrary Spark args to be set.
This change allows any Spark argument to be added to an app started through
SparkLauncher. Known arguments are properly validated, while unknown arguments
are allowed so that the library can launch newer Spark versions (in case
SPARK_HOME points at one).
Author: Marcelo Vanzin <[email protected]>
Closes #7975 from vanzin/SPARK-9074 and squashes the following commits:
b5e451a [Marcelo Vanzin] [SPARK-9074] [launcher] Allow arbitrary Spark args
to be set.
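For illustration, a minimal Scala sketch of the new `addSparkArg` calls (the
jar path and main class below are placeholders):

import org.apache.spark.launcher.SparkLauncher

val proc = new SparkLauncher()
  .setAppResource("/path/to/app.jar")     // placeholder application jar
  .setMainClass("com.example.MyApp")      // placeholder main class
  .addSparkArg("--verbose")               // known switch, validated
  .addSparkArg("--future-flag", "value")  // unknown arg, passed through as-is
  .launch()
proc.waitFor()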
commit afa757c98c537965007cad4c61c436887f3ac6a6
Author: Reynold Xin <[email protected]>
Date: 2015-08-12T01:08:49Z
[SPARK-9849] [SQL] DirectParquetOutputCommitter qualified name should be
backward compatible
DirectParquetOutputCommitter was moved in SPARK-9763. However, users can
explicitly set the class as a config option, so we must be able to resolve
the old committer's qualified name.
Author: Reynold Xin <[email protected]>
Closes #8114 from rxin/SPARK-9849.
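For reference, the kind of user configuration this keeps working; a hedged
sketch assuming an existing `sqlContext` and the pre-SPARK-9763 package name:

// the old fully qualified name continues to resolve after this change
sqlContext.setConf("spark.sql.parquet.output.committer.class",
  "org.apache.spark.sql.parquet.DirectParquetOutputCommitter")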
commit ca8f70e9d473d2c81866f3c330cc6545c33bdac7
Author: Andrew Or <[email protected]>
Date: 2015-08-12T03:46:58Z
[SPARK-9649] Fix flaky test MasterSuite again - disable REST
The REST server is not actually used in most tests and so we can disable
it. It is a source of flakiness because it tries to bind to a specific port in
vain. There was also some code that avoided the shuffle service in tests. This
is actually not necessary because the shuffle service is already off by default.
Author: Andrew Or <[email protected]>
Closes #8084 from andrewor14/fix-master-suite-again.
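For context, a hedged sketch of turning the REST server off via configuration
(assuming the standalone Master's `spark.master.rest.enabled` key):

import org.apache.spark.SparkConf

// avoid binding a REST port that tests do not need
val conf = new SparkConf().set("spark.master.rest.enabled", "false")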
commit 3ef0f32928fc383ad3edd5ad167212aeb9eba6e1
Author: Patrick Wendell <[email protected]>
Date: 2015-08-12T04:16:48Z
[SPARK-1517] Refactor release scripts to facilitate nightly publishing
This update contains some code changes to the release scripts that allow
easier nightly publishing. I've been using these new scripts on Jenkins for
cutting and publishing nightly snapshots for the last month or so, and it has
been going well. I'd like to get them merged back upstream so this can be
maintained by the community.
The main changes are:
1. Separates the release tagging from various build possibilities for an
already tagged release (`release-tag.sh` and `release-build.sh`).
2. Allow for injecting credentials through the environment, including GPG
keys. This is then paired with secure key injection in Jenkins.
3. Support for copying build results to a remote directory, and also
"rotating" results, e.g. the ability to keep the last N copies of binary or doc
builds.
I'm happy if anyone wants to take a look at this - it's not user facing but
an internal utility used for generating releases.
Author: Patrick Wendell <[email protected]>
Closes #7411 from pwendell/release-script-updates and squashes the
following commits:
74f9beb [Patrick Wendell] Moving maven build command to a variable
233ce85 [Patrick Wendell] [SPARK-1517] Refactor release scripts to
facilitate nightly publishing
commit 74a293f4537c6982345166f8883538f81d850872
Author: Eric Liang <[email protected]>
Date: 2015-08-12T04:26:03Z
[SPARK-9713] [ML] Document SparkR MLlib glm() integration in Spark 1.5
This documents the use of R model formulae in the SparkR guide. Also fixes
some bugs in the R api doc.
mengxr
Author: Eric Liang <[email protected]>
Closes #8085 from ericl/docs.
commit c3e9a120e33159fb45cd99f3a55fc5cf16cd7c6c
Author: Davies Liu <[email protected]>
Date: 2015-08-12T05:45:18Z
[SPARK-9831] [SQL] fix serialization with empty broadcast
Author: Davies Liu <[email protected]>
Closes #8117 from davies/fix_serialization and squashes the following
commits:
d21ac71 [Davies Liu] fix serialization with empty broadcast
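A minimal sketch of the scenario, not the regression test from the patch
(assuming an existing SparkContext `sc`; the empty value is the point):

val empty = sc.broadcast(Array.empty[Int])
// tasks must serialize and deserialize the empty broadcast without error
sc.parallelize(1 to 3).map(_ + empty.value.length).collect()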
commit b1581ac28840a4d2209ef8bb5c9f8700b4c1b286
Author: Josh Rosen <[email protected]>
Date: 2015-08-12T05:46:59Z
[SPARK-9854] [SQL] RuleExecutor.timeMap should be thread-safe
`RuleExecutor.timeMap` is currently a non-thread-safe mutable HashMap; this
can lead to infinite loops if multiple threads are concurrently modifying the
map. I believe that this is responsible for some hangs that I've observed in
HiveQuerySuite.
This patch addresses this by using a Guava `AtomicLongMap`.
Author: Josh Rosen <[email protected]>
Closes #8120 from JoshRosen/rule-executor-time-map-fix.
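For context, a small sketch of the thread-safe counting `AtomicLongMap`
provides (the rule name is illustrative):

import com.google.common.util.concurrent.AtomicLongMap

val timeMap = AtomicLongMap.create[String]()
// safe to call from many threads at once, unlike a plain mutable HashMap
timeMap.addAndGet("ConstantFolding", 42L)
println(timeMap.get("ConstantFolding"))  // 42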
commit b85f9a242a12e8096e331fa77d5ebd16e93c844d
Author: xutingjun <[email protected]>
Date: 2015-08-12T06:19:35Z
[SPARK-8366] maxNumExecutorsNeeded should properly handle failed tasks
Author: xutingjun <[email protected]>
Author: meiyoula <[email protected]>
Closes #6817 from XuTingjun/SPARK-8366.
commit a807fcbe50b2ce18751d80d39e9d21842f7da32a
Author: Rohit Agarwal <[email protected]>
Date: 2015-08-12T06:20:39Z
[SPARK-9806] [WEB UI] Don't share ReplayListenerBus between multiple
applications
Author: Rohit Agarwal <[email protected]>
Closes #8088 from mindprince/SPARK-9806.
commit 4e3f4b934f74e8c7c06f4940d6381343f9fd4918
Author: zsxwing <[email protected]>
Date: 2015-08-12T06:23:17Z
[SPARK-9829] [WEBUI] Display the update value for peak execution memory
The peak execution memory is not correct because it shows the sum of
finished tasks' values when a task finishes.
This PR fixes it by using the update value rather than the accumulator
value.
Author: zsxwing <[email protected]>
Closes #8121 from zsxwing/SPARK-9829.
commit bab89232854de7554e88f29cab76f1a1c349edc1
Author: Carson Wang <[email protected]>
Date: 2015-08-12T06:25:02Z
[SPARK-9426] [WEBUI] Job page DAG visualization is not shown
To reproduce the issue, go to the stage page and click DAG Visualization
once, then go to the job page to show the job DAG visualization. You will only
see the first stage of the job.
Root cause: the JavaScript uses local storage to remember your selection.
Once you click the stage DAG visualization, local storage sets
`expand-dag-viz-arrow-stage` to true. When you go to the job page, the JS
checks `expand-dag-viz-arrow-stage` in local storage first and tries to show
the stage DAG visualization on the job page.
To fix this, I set an id on the DAG span to distinguish the job page from the
stage page. In the JS code, we check the id and local storage together to make
sure we show the correct DAG visualization.
Author: Carson Wang <[email protected]>
Closes #8104 from carsonwang/SPARK-9426.
commit 5c99d8bf98cbf7f568345d02a814fc318cbfca75
Author: Timothy Chen <[email protected]>
Date: 2015-08-12T06:26:33Z
[SPARK-8798] [MESOS] Allow additional uris to be fetched with mesos
Some users like to download additional files into their sandbox that they can
refer to from their Spark program, or even later mount these files to another
directory.
Author: Timothy Chen <[email protected]>
Closes #7195 from tnachen/mesos_files.
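A hedged example assuming the `spark.mesos.uris` option this change adds
(the URIs are placeholders):

import org.apache.spark.SparkConf

// comma-separated list of extra files fetched into the Mesos sandbox
val conf = new SparkConf()
  .set("spark.mesos.uris", "http://example.com/extra.conf,hdfs:///data/lookup.bin")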
commit 741a29f98945538a475579ccc974cd42c1613be4
Author: Timothy Chen <[email protected]>
Date: 2015-08-12T06:33:22Z
[SPARK-9575] [MESOS] Add documentation around Mesos shuffle service.
andrewor14
Author: Timothy Chen <[email protected]>
Closes #7907 from tnachen/mesos_shuffle.
commit 9d0822455ddc8d765440d58c463367a4d67ef456
Author: Yijie Shen <[email protected]>
Date: 2015-08-12T11:54:00Z
[SPARK-9182] [SQL] Filters are not passed through to jdbc source
This PR fixes the inability to push filters down to a JDBC source, caused by
`Cast` during pattern matching.
When we compare columns of different types, there is a good chance a cast is
inserted on the column, so the expression no longer matches the pattern on a
bare Attribute and push-down fails.
Author: Yijie Shen <[email protected]>
Closes #8049 from yjshen/jdbc_pushdown.
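A simplified sketch of the pattern-matching idea, not the actual code in the
JDBC data source: look through a `Cast` to the underlying `Attribute` instead
of matching bare attributes only.

import org.apache.spark.sql.catalyst.expressions.{Attribute, Cast, Expression}

def columnName(e: Expression): Option[String] = e match {
  case a: Attribute          => Some(a.name)
  case Cast(a: Attribute, _) => Some(a.name)  // strip the cast, then push down
  case _                     => None
}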
commit 3ecb3794302dc12d0989f8d725483b2cc37762cf
Author: Cheng Lian <[email protected]>
Date: 2015-08-12T12:01:34Z
[SPARK-9407] [SQL] Relaxes Parquet ValidTypeMap to allow ENUM predicates to
be pushed down
This PR adds a hacky workaround for PARQUET-201, and should be removed once
we upgrade to parquet-mr 1.8.1 or higher versions.
In Parquet, not all types of columns can be used for filter push-down
optimization. The set of valid column types is controlled by `ValidTypeMap`.
Unfortunately, in parquet-mr 1.7.0 and prior versions, this limitation is too
strict, and doesn't allow `BINARY (ENUM)` columns to be pushed down. On the
other hand, `BINARY (ENUM)` is commonly seen in Parquet files written by
libraries like `parquet-avro`.
This restriction is problematic for Spark SQL, because Spark SQL doesn't
have a type that maps to Parquet `BINARY (ENUM)` directly, and always converts
`BINARY (ENUM)` to Catalyst `StringType`. Thus, a predicate involving a
`BINARY (ENUM)` is recognized as one involving a string field instead and may
be pushed down by the query optimizer. Such predicates are actually perfectly
legal, except that they fail the `ValidTypeMap` check.
The workaround added here is relaxing `ValidTypeMap` to include `BINARY
(ENUM)`. I also took the chance to simplify `ParquetCompatibilityTest` a
little bit when adding the regression test.
Author: Cheng Lian <[email protected]>
Closes #8107 from liancheng/spark-9407/parquet-enum-filter-push-down.
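To illustrate the user-visible effect, a hedged sketch (the file path and
column are hypothetical; `suit` is `BINARY (ENUM)` in the Parquet schema and
`StringType` in Spark SQL):

val df = sqlContext.read.parquet("/path/to/enums.parquet")
// with the relaxed ValidTypeMap, this string predicate can be pushed down
// to the Parquet reader instead of failing the check
df.filter(df("suit") === "SPADES").count()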
commit 2e680668f7b6fc158aa068aedd19c1878ecf759e
Author: Tom White <[email protected]>
Date: 2015-08-12T15:06:27Z
[SPARK-8625] [CORE] Propagate user exceptions in tasks back to driver
This allows clients to retrieve the original exception from the
cause field of the SparkException that is thrown by the driver.
If the original exception is not in fact Serializable then it will
not be returned, but the message and stacktrace will be. (All Java
Throwables implement the Serializable interface, but this is no
guarantee that a particular implementation can actually be
serialized.)
Author: Tom White <[email protected]>
Closes #7014 from tomwhite/propagate-user-exceptions.
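A sketch of what this enables on the driver side (assuming an existing
SparkContext `sc`):

try {
  sc.parallelize(1 to 10).map { i =>
    if (i == 5) throw new IllegalStateException("bad record")
    i
  }.collect()
} catch {
  case e: org.apache.spark.SparkException =>
    // the original user exception is now available as the cause (when it is
    // serializable; otherwise only its message and stack trace survive)
    println(e.getCause)
}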
commit be5d1912076c2ffd21ec88611e53d3b3c59b7ecc
Author: Andrew Or <[email protected]>
Date: 2015-08-12T16:24:50Z
[SPARK-9795] Dynamic allocation: avoid double counting when killing same
executor twice
This is based on KaiXinXiaoLei's changes in #7716.
The issue is that when someone calls `sc.killExecutor("1")` on the same
executor twice quickly, then the executor target will be adjusted downwards by
2 instead of 1 even though we're only actually killing one executor. In certain
cases where we don't adjust the target back upwards quickly, we'll end up with
jobs hanging.
This is a common danger because there are many places where this is called:
- `HeartbeatReceiver` kills an executor that has not been sending heartbeats
- `ExecutorAllocationManager` kills an executor that has been idle
- The user code might call this, which may interfere with the previous
callers
While it's not clear whether this fixes SPARK-9745, fixing this potential
race condition seems like a strict improvement. I've added a regression test to
illustrate the issue.
Author: Andrew Or <[email protected]>
Closes #8078 from andrewor14/da-double-kill.
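For illustration (assuming an existing SparkContext `sc` running with dynamic
allocation):

sc.killExecutor("1")
sc.killExecutor("1")  // duplicate request: target now drops by 1, not 2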
commit 66d87c1d76bea2b81993156ac1fa7dad6c312ebf
Author: Yuhao Yang <[email protected]>
Date: 2015-08-12T16:35:32Z
[SPARK-7583] [MLLIB] User guide update for RegexTokenizer
jira: https://issues.apache.org/jira/browse/SPARK-7583
User guide update for RegexTokenizer
Author: Yuhao Yang <[email protected]>
Closes #7828 from hhbyyh/regexTokenizerDoc.
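For reference, a small usage sketch of the feature being documented (the
column names are illustrative):

import org.apache.spark.ml.feature.RegexTokenizer

val tokenizer = new RegexTokenizer()
  .setInputCol("sentence")
  .setOutputCol("words")
  .setPattern("\\W")  // split on non-word characters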
commit e0110792ef71ebfd3727b970346a2e13695990a4
Author: Andrew Or <[email protected]>
Date: 2015-08-12T17:08:35Z
[SPARK-9747] [SQL] Avoid starving an unsafe operator in aggregation
This is the sister patch to #8011, but for aggregation.
In a nutshell: create the `TungstenAggregationIterator` before computing
the parent partition. Internally this creates a `BytesToBytesMap` which
acquires a page in the constructor as of this patch. This ensures that the
aggregation operator is not starved since we reserve at least 1 page in advance.
rxin yhuai
Author: Andrew Or <[email protected]>
Closes #8038 from andrewor14/unsafe-starve-memory-agg.
commit 57ec27dd7784ce15a2ece8a6c8ac7bd5fd25aea2
Author: Marcelo Vanzin <[email protected]>
Date: 2015-08-12T17:38:30Z
[SPARK-9804] [HIVE] Use correct value for isSrcLocal parameter.
If the correct parameter is not provided, Hive will run into an error
because it calls methods that are specific to the local filesystem to
copy the data.
Author: Marcelo Vanzin <[email protected]>
Closes #8086 from vanzin/SPARK-9804.
commit 70fe558867ccb4bcff6ec673438b03608bb02252
Author: Joseph K. Bradley <[email protected]>
Date: 2015-08-12T17:48:52Z
[SPARK-9847] [ML] Modified copyValues to distinguish between default,
explicit param values
From JIRA: Currently, Params.copyValues copies default parameter values to
the paramMap of the target instance, rather than the defaultParamMap. It should
copy to the defaultParamMap because explicitly setting a parameter can change
the semantics.
This issue arose in SPARK-9789, where 2 params "threshold" and "thresholds"
for LogisticRegression can have mutually exclusive values. If thresholds is
set, then fit() will copy the default value of threshold as well, easily
resulting in inconsistent settings for the 2 params.
CC: mengxr
Author: Joseph K. Bradley <[email protected]>
Closes #8115 from jkbradley/copyvalues-fix.
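A hedged sketch of the scenario described above (the values are illustrative):

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.param.ParamMap

val lr = new LogisticRegression().setThresholds(Array(0.4, 0.6))
val copied = lr.copy(ParamMap.empty)
// after this fix, the default of `threshold` is copied as a default, not as
// an explicit value that could conflict with the explicitly set `thresholds`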
commit 60103ecd3d9c92709a5878be7ebd57012813ab48
Author: Brennan Ashton <[email protected]>
Date: 2015-08-12T18:57:30Z
[SPARK-9726] [PYTHON] PySpark DF join no longer accepts on=None
rxin
First pull request for Spark, so let me know if I am missing anything.
The contribution is my original work and I license the work to the project
under the project's open source license.
Author: Brennan Ashton <[email protected]>
Closes #8016 from btashton/patch-1.
commit 762bacc16ac5e74c8b05a7c1e3e367d1d1633cef
Author: Yanbo Liang <[email protected]>
Date: 2015-08-12T20:24:18Z
[SPARK-9766] [ML] [PySpark] check and add missing docs for PySpark ML
Check and add missing docs for PySpark ML (this issue only checks missing
docs for o.a.s.ml, not o.a.s.mllib).
Author: Yanbo Liang <[email protected]>
Closes #8059 from yanboliang/SPARK-9766.
commit 551def5d6972440365bd7436d484a67138d9a8f3
Author: Joseph K. Bradley <[email protected]>
Date: 2015-08-12T21:27:13Z
[SPARK-9789] [ML] Added logreg threshold param back
Reinstated LogisticRegression.threshold Param for binary compatibility.
Param thresholds overrides threshold, if set.
CC: mengxr dbtsai feynmanliang
Author: Joseph K. Bradley <[email protected]>
Closes #8079 from jkbradley/logreg-reinstate-threshold.
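A small hedged sketch of the two params side by side (values chosen to be
mutually consistent):

val lr = new org.apache.spark.ml.classification.LogisticRegression()
lr.setThreshold(0.6)               // binary API, reinstated for compatibility
lr.setThresholds(Array(0.4, 0.6))  // if set, this param takes precedence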
commit 6f60298b1d7aa97268a42eca1e3b4851a7e88cb5
Author: Xiangrui Meng <[email protected]>
Date: 2015-08-12T21:28:23Z
[SPARK-8967] [DOC] add Since annotation
Add `Since` as a Scala annotation. The benefit is that we can use it
without having explicit JavaDoc. This is useful for inherited methods. The
limitation is that it doesn't show up in the generated Java API documentation.
This might be fixed by modifying genjavadoc. I think we could leave it as a
TODO.
This is how the generated Scala doc looks (screenshots, not reproduced here:
one for the `since` JavaDoc tag, one for the `Since` annotation).
rxin
Author: Xiangrui Meng <[email protected]>
Closes #8131 from mengxr/SPARK-8967.
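A sketch of how the annotation is applied; note the annotation is internal to
Spark, so this compiles only inside the Spark build, and the object and method
here are illustrative:

package org.apache.spark.mllib.util

import org.apache.spark.annotation.Since

object ExampleUtil {
  @Since("1.5.0")
  def loadData(path: String): String = path  // carries a version tag in Scaladoc
}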
commit a17384fa343628cec44437da5b80b9403ecd5838
Author: Reynold Xin <[email protected]>
Date: 2015-08-12T22:27:52Z
[SPARK-9907] [SQL] Python crc32 is mistakenly calling md5
Author: Reynold Xin <[email protected]>
Closes #8138 from rxin/SPARK-9907.
commit 738f353988dbf02704bd63f5e35d94402c59ed79
Author: Niranjan Padmanabhan <[email protected]>
Date: 2015-08-12T23:10:21Z
[SPARK-9092] Fixed incompatibility when both num-executors and dynamic
allocation are set. Now, dynamic allocation is set to false when
num-executors is explicitly specified as an argument. Consequently,
executorAllocationManager is not initialized in the SparkContext.
Author: Niranjan Padmanabhan <[email protected]>
Closes #7657 from neurons/SPARK-9092.
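For illustration, the conflicting settings expressed through SparkConf (keys
as in the standard Spark configuration):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.instances", "4")            // same as --num-executors 4
  .set("spark.dynamicAllocation.enabled", "true")  // now overridden by the explicit count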
commit ab7e721cfec63155641e81e72b4ad43cf6a7d4c7
Author: Michel Lemay <[email protected]>
Date: 2015-08-12T23:17:58Z
[SPARK-9826] [CORE] Fix cannot use custom classes in log4j.properties
Refactor Utils class and create ShutdownHookManager.
NOTE: Wasn't able to run /dev/run-tests on a Windows machine.
Manual tests were conducted locally using a custom log4j.properties file with
a Redis appender and logstash formatter (bundled in the fat jar submitted to
Spark), e.g.:
log4j.rootCategory=WARN,console,redis
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.logger.org.eclipse.jetty=WARN
log4j.logger.org.eclipse.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.spark.graphx.Pregel=INFO
log4j.appender.redis=com.ryantenney.log4j.FailoverRedisAppender
log4j.appender.redis.endpoints=hostname:port
log4j.appender.redis.key=mykey
log4j.appender.redis.alwaysBatch=false
log4j.appender.redis.layout=net.logstash.log4j.JSONEventLayoutV1
Author: michellemay <[email protected]>
Closes #8109 from michellemay/SPARK-9826.
----