GitHub user pprado opened a pull request:

    https://github.com/apache/spark/pull/13103

    Problem selecting from an empty ORC table

    ## Error when selecting from an empty ORC table
    
    > [pprado@hadoop-m ~]$ beeline -u jdbc:hive2://
    WARNING: Use "yarn jar" to launch YARN applications.
    Connecting to jdbc:hive2://
    Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
    Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    Beeline version 1.2.1000.2.4.2.0-258 by Apache Hive
    
    > 
    
    On beeline => `create table my_test (id int, name String) stored as orc;`
    On beeline => `select * from my_test;`
    
    > 
    16/05/13 18:18:57 [main]: ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
    OK
    +-------------+---------------+--+
    | my_test.id  | my_test.name  |
    +-------------+---------------+--+
    +-------------+---------------+--+
    No rows selected (1.227 seconds)
    
    > 
    
    Hive is OK!
    
    Now, when I run pyspark:
    
    > Welcome to
    >     SPARK   version 1.6.1
    > 
    > Using Python version 2.6.6 (r266:84292, Jul 23 2015 15:22:56)
    > SparkContext available as sc, HiveContext available as sqlContext.
    > 
    > 
    
    PySpark => `sqlContext.sql("select * from my_test")`
    
    > 16/05/13 18:33:41 INFO ParseDriver: Parsing command: select * from my_test
    > 16/05/13 18:33:41 INFO ParseDriver: Parse Completed
    > Traceback (most recent call last):
    >   File "<stdin>", line 1, in <module>
    >   File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/context.py", line 580, in sql
    >     return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
    >   File "/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
    >   File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/utils.py", line 53, in deco
    >     raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
    > pyspark.sql.utils.IllegalArgumentException: u'orcFileOperator: path hdfs://hadoop-m.c.sva-0001.internal:8020/apps/hive/warehouse/my_test does not have valid orc files matching the pattern'
    
    When I create a Parquet table instead, everything works fine; there is no problem.
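
    One possible direction for a fix, sketched here only as an illustration: when the table's warehouse directory contains no data files, fall back to the schema recorded in the Hive metastore instead of failing. All names below are illustrative placeholders, not Spark's actual `OrcFileOperator` internals.

    ```scala
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.sql.types.StructType

    // Hypothetical guard: if the table directory holds no ORC data files, trust
    // the metastore schema instead of throwing
    // "orcFileOperator: path ... does not have valid orc files matching the pattern".
    def orcSchemaOrMetastore(
        fs: FileSystem,
        tableDir: Path,
        metastoreSchema: StructType,
        readFooterSchema: Path => StructType): StructType = {
      val dataFiles = fs.listStatus(tableDir).filter { status =>
        val name = status.getPath.getName
        status.isFile && !name.startsWith("_") && !name.startsWith(".")
      }
      if (dataFiles.isEmpty) metastoreSchema          // empty table: use the metastore schema
      else readFooterSchema(dataFiles.head.getPath)   // caller-supplied ORC footer reader
    }
    ```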
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13103.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13103
    
----
commit bd33d4ee847973289a58032df35375f03e9f9865
Author: Kousuke Saruta <[email protected]>
Date:   2015-12-18T22:05:06Z

    [SPARK-12404][SQL] Ensure objects passed to StaticInvoke is Serializable
    
    Currently `StaticInvoke` receives `Any` as its object, and `StaticInvoke` itself can be serialized, but sometimes the object passed in is not serializable.
    
    For example, the following code raises an exception because `RowEncoder#extractorsFor`, which is invoked indirectly, creates a `StaticInvoke`.
    
    ```
    case class TimestampContainer(timestamp: java.sql.Timestamp)
    val rdd = sc.parallelize(1 to 2).map(_ => TimestampContainer(System.currentTimeMillis))
    val df = rdd.toDF
    val ds = df.as[TimestampContainer]
    val rdd2 = ds.rdd                                 <----------------- invokes extractorsFor indirectly
    ```
    
    I'll add test cases.
    
    Author: Kousuke Saruta <[email protected]>
    Author: Michael Armbrust <[email protected]>
    
    Closes #10357 from sarutak/SPARK-12404.
    
    (cherry picked from commit 6eba655259d2bcea27d0147b37d5d1e476e85422)
    Signed-off-by: Michael Armbrust <[email protected]>

commit eca401ee5d3ae683cbee531c1f8bc981f9603fc8
Author: Burak Yavuz <[email protected]>
Date:   2015-12-18T23:24:41Z

    [SPARK-11985][STREAMING][KINESIS][DOCS] Update Kinesis docs
    
     - Provide example on `message handler`
     - Provide bit on KPL record de-aggregation
     - Fix typos
    
    Author: Burak Yavuz <[email protected]>
    
    Closes #9970 from brkyvz/kinesis-docs.
    
    (cherry picked from commit 2377b707f25449f4557bf048bb384c743d9008e5)
    Signed-off-by: Shixiong Zhu <[email protected]>

commit d6a519ff20652494ac3aeba477526ad1fd810a3c
Author: Yanbo Liang <[email protected]>
Date:   2015-12-19T08:34:30Z

    [SQL] Fix mistaken doc of join type for dataframe.join
    
    Fix the mistaken documentation of the join type for `dataframe.join`.
    
    Author: Yanbo Liang <[email protected]>
    
    Closes #10378 from yanboliang/leftsemi.
    
    (cherry picked from commit a073a73a561e78c734119c8b764d37a4e5e70da4)
    Signed-off-by: Reynold Xin <[email protected]>

commit c754a08793458813d608e48ad1b158da770cd992
Author: pshearer <[email protected]>
Date:   2015-12-21T22:04:59Z

    Doc typo: ltrim = trim from left end, not right
    
    Author: pshearer <[email protected]>
    
    Closes #10414 from pshearer/patch-1.
    
    (cherry picked from commit fc6dbcc7038c2b030ef6a2dc8be5848499ccee1c)
    Signed-off-by: Andrew Or <[email protected]>

commit ca3998512dd7801379c96c9399d3d053ab7472cd
Author: Andrew Or <[email protected]>
Date:   2015-12-21T22:09:04Z

    [SPARK-12466] Fix harmless NPE in tests
    
    ```
    [info] ReplayListenerSuite:
    [info] - Simple replay (58 milliseconds)
    java.lang.NullPointerException
        at 
org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982)
        at 
org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980)
    ```
    
https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull
    
    This was introduced in #10284. It's harmless because the NPE is caused by a race that occurs mainly in `local-cluster` tests (but doesn't actually fail the tests).
    
    Tested locally to verify that the NPE is gone.
    
    Author: Andrew Or <[email protected]>
    
    Closes #10417 from andrewor14/fix-harmless-npe.
    
    (cherry picked from commit d655d37ddf59d7fb6db529324ac8044d53b2622a)
    Signed-off-by: Andrew Or <[email protected]>

commit 4062cda3087ae42c6c3cb24508fc1d3a931accdf
Author: Patrick Wendell <[email protected]>
Date:   2015-12-22T01:50:29Z

    Preparing Spark release v1.6.0-rc4

commit 5b19e7cfded0e2e41b6f427b4c3cfc3f06f85466
Author: Patrick Wendell <[email protected]>
Date:   2015-12-22T01:50:36Z

    Preparing development version 1.6.0-SNAPSHOT

commit 309ef355fc511b70765983358d5c92b5f1a26bce
Author: Shixiong Zhu <[email protected]>
Date:   2015-12-22T06:28:18Z

    [MINOR] Fix typos in JavaStreamingContext
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #10424 from zsxwing/typo.
    
    (cherry picked from commit 93da8565fea42d8ac978df411daced4a9ea3a9c8)
    Signed-off-by: Reynold Xin <[email protected]>

commit 0f905d7df43b20d9335ec880b134d8d4f962c297
Author: Josh Rosen <[email protected]>
Date:   2015-12-22T07:12:05Z

    [SPARK-11823][SQL] Fix flaky JDBC cancellation test in 
HiveThriftBinaryServerSuite
    
    This patch fixes a flaky "test jdbc cancel" test in HiveThriftBinaryServerSuite. This test is prone to a race condition which causes it to block indefinitely while waiting for an extremely slow query to complete, which caused many Jenkins builds to time out.
    
    For more background, see my comments on #6207 (the PR which introduced this 
test).
    
    Author: Josh Rosen <[email protected]>
    
    Closes #10425 from JoshRosen/SPARK-11823.
    
    (cherry picked from commit 2235cd44407e3b6b401fb84a2096ade042c51d36)
    Signed-off-by: Josh Rosen <[email protected]>

commit 94fb5e870403e19feca8faf7d98bba6d14f7a362
Author: Shixiong Zhu <[email protected]>
Date:   2015-12-22T23:33:30Z

    [SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #10439 from zsxwing/kafka-message-handler-doc.
    
    (cherry picked from commit 93db50d1c2ff97e6eb9200a995e4601f752968ae)
    Signed-off-by: Tathagata Das <[email protected]>

commit 942c0577b201a08fffdcaf71e4d1867266ae309e
Author: Shixiong Zhu <[email protected]>
Date:   2015-12-23T00:39:10Z

    [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for 
Streaming
    
    This PR adds Scala, Java and Python examples to show how to use Accumulator 
and Broadcast in Spark Streaming to support checkpointing.
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #10385 from zsxwing/accumulator-broadcast-example.
    
    (cherry picked from commit 20591afd790799327f99485c5a969ed7412eca45)
    Signed-off-by: Tathagata Das <[email protected]>

commit c6c9bf99af0ee0559248ad772460e9b2efde5861
Author: pierre-borckmans <[email protected]>
Date:   2015-12-23T07:00:42Z

    [SPARK-12477][SQL] - Tungsten projection fails for null values in array 
fields
    
    Accessing null elements in an array field fails when tungsten is enabled.
    It works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled.
    
    This PR solves this by checking, in the generated code, whether the accessed element of the array field is null.
    
    Example:
    ```
    // Array of String
    case class AS( as: Seq[String] )
    val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
    dfAS.registerTempTable("T_AS")
    for (i <- 0 to 2) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(","))}
    ```
    
    With Tungsten disabled:
    ```
    0 = [a]
    1 = [null]
    2 = [b]
    ```
    
    With Tungsten enabled:
    ```
    0 = [a]
    15/12/22 09:32:50 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 
15)
    java.lang.NullPointerException
        at 
org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
        at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
 Source)
        at 
org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
        at 
org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    ```
    
    Author: pierre-borckmans <[email protected]>
    
    Closes #10429 from 
pierre-borckmans/SPARK-12477_Tungsten-Projection-Null-Element-In-Array.
    
    (cherry picked from commit 43b2a6390087b7ce262a54dc8ab8dd825db62e21)
    Signed-off-by: Reynold Xin <[email protected]>

commit 5987b1658b837400691160c38ba6eedc47274ee4
Author: Adrian Bridgett <[email protected]>
Date:   2015-12-24T00:00:03Z

    [SPARK-12499][BUILD] don't force MAVEN_OPTS
    
    allow the user to override MAVEN_OPTS (2GB wasn't sufficient for me)
    
    Author: Adrian Bridgett <[email protected]>
    
    Closes #10448 from abridgett/feature/do_not_force_maven_opts.
    
    (cherry picked from commit ead6abf7e7fc14b451214951d4991d497aa65e63)
    Signed-off-by: Josh Rosen <[email protected]>

commit b49856ae5983aca8ed7df2f478fc5f399ec34ce8
Author: Nong Li <[email protected]>
Date:   2015-12-19T00:05:18Z

    [SPARK-12411][CORE] Decrease executor heartbeat timeout to match heartbeat 
interval
    
    Previously, the rpc timeout was the default network timeout, which is the 
same value
    the driver uses to determine dead executors. This means if there is a 
network issue,
    the executor is determined dead after one heartbeat attempt. There is a 
separate config
    for the heartbeat interval which is a better value to use for the heartbeat 
RPC. With
    this change, the executor will make multiple heartbeat attempts even with 
RPC issues.
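
    As a rough illustration of the two settings involved (the values below are just examples, not recommendations):

    ```scala
    import org.apache.spark.SparkConf

    // spark.executor.heartbeatInterval controls how often executors send heartbeats
    // (and, with this change, the timeout of each heartbeat RPC attempt), while
    // spark.network.timeout is what the driver uses to declare an executor dead.
    val conf = new SparkConf()
      .set("spark.executor.heartbeatInterval", "10s")  // example value
      .set("spark.network.timeout", "120s")            // example value
    ```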
    
    Author: Nong Li <[email protected]>
    
    Closes #10365 from nongli/spark-12411.

commit 4dd8712c1b64a64da0fa0413e2c9be68ad0ddc17
Author: Kazuaki Ishizaki <[email protected]>
Date:   2015-12-24T12:27:55Z

    [SPARK-12502][BUILD][PYTHON] Script /dev/run-tests fails when IBM Java is 
used
    
    Fix an exception with the IBM JDK by removing the update field from the JavaVersion tuple. This is needed because the IBM JDK does not provide update information ('_xx').
    
    Author: Kazuaki Ishizaki <[email protected]>
    
    Closes #10463 from kiszk/SPARK-12502.
    
    (cherry picked from commit 9e85bb71ad2d7d3a9da0cb8853f3216d37e6ff47)
    Signed-off-by: Kousuke Saruta <[email protected]>

commit 865dd8bccfc994310ad6664151d469043706ef3b
Author: CK50 <[email protected]>
Date:   2015-12-24T13:39:11Z

    [SPARK-12010][SQL] Spark JDBC requires support for column-name-free INSERT 
syntax
    
    In the past, Spark JDBC writes only worked with technologies that support the following INSERT statement syntax (JdbcUtils.scala: insertStatement()):
    
    INSERT INTO $table VALUES ( ?, ?, ..., ? )
    
    But some technologies require a list of column names:
    
    INSERT INTO $table ( $colNameList ) VALUES ( ?, ?, ..., ? )
    
    This was blocking the use of e.g. the Progress JDBC Driver for Cassandra.
    
    Another limitation is that the first syntax relies on the dataframe field ordering matching that of the target table. This works fine as long as the target table has been created by writer.jdbc().
    
    If the target table contains more columns (i.e., it was not created by writer.jdbc()), then the insert fails due to a mismatch in the number of columns or their data types.
    
    This PR switches to the recommended second INSERT syntax. Column names are taken from the dataframe field names.
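
    A minimal sketch of the column-qualified statement described above (simplified relative to the real JdbcUtils code; identifier quoting and dialect handling are omitted):

    ```scala
    import java.sql.{Connection, PreparedStatement}

    // Build "INSERT INTO table (c1, c2, ...) VALUES (?, ?, ...)" from the
    // dataframe's field names instead of relying on column order alone.
    def insertStatement(conn: Connection, table: String, columns: Seq[String]): PreparedStatement = {
      val cols = columns.mkString(", ")
      val placeholders = columns.map(_ => "?").mkString(", ")
      conn.prepareStatement(s"INSERT INTO $table ($cols) VALUES ($placeholders)")
    }
    ```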
    
    Author: CK50 <[email protected]>
    
    Closes #10380 from CK50/master-SPARK-12010-2.
    
    (cherry picked from commit 502476e45c314a1229b3bce1c61f5cb94a9fc04b)
    Signed-off-by: Sean Owen <[email protected]>

commit b8da77ef776ab9cdc130a70293d75e7bdcdf95b0
Author: gatorsmile <[email protected]>
Date:   2015-12-28T07:18:48Z

    [SPARK-12520] [PYSPARK] Correct Descriptions and Add Use Cases in Equi-Join
    
    After reading the JIRA https://issues.apache.org/jira/browse/SPARK-12520, I 
double checked the code.
    
    For example, users can do the Equi-Join like
      ```df.join(df2, 'name', 'outer').select('name', 'height').collect()```
    - There is a bug in 1.5 and 1.4: the code simply ignores the third parameter (join type) that users pass, so the join actually performed is always `Inner`, even if the user specified another type (e.g., `Outer`).
    - After PR https://github.com/apache/spark/pull/8600, 1.6 does not have this issue, but the description has not been updated.
    
    I plan to submit another PR to fix 1.5 and to issue an error message if users specify a non-inner join type when using an equi-join.
    
    Author: gatorsmile <[email protected]>
    
    Closes #10477 from gatorsmile/pyOuterJoin.

commit 1fbcb6e7be9cd9fa5255837cfc5358f2283f4aaf
Author: Yaron Weinsberg <[email protected]>
Date:   2015-12-28T20:19:11Z

    [SPARK-12517] add default RDD name for one created via sc.textFile
    
    The feature was first added at commit: 
7b877b27053bfb7092e250e01a3b887e1b50a109 but was later removed (probably by 
mistake) at commit: fc8b58195afa67fbb75b4c8303e022f703cbf007.
    This change sets the default name of RDDs created via sc.textFile(...) to the path argument.
    
    Here is the symptom:
    
    * Using spark-1.5.2-bin-hadoop2.6:
    
    scala> sc.textFile("/home/root/.bashrc").name
    res5: String = null
    
    scala> sc.binaryFiles("/home/root/.bashrc").name
    res6: String = /home/root/.bashrc
    
    * while using Spark 1.3.1:
    
    scala> sc.textFile("/home/root/.bashrc").name
    res0: String = /home/root/.bashrc
    
    scala> sc.binaryFiles("/home/root/.bashrc").name
    res1: String = /home/root/.bashrc
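
    Until this change is available, the same effect can be had from user code with `RDD.setName` (spark-shell session, reusing the example path from above):

    ```scala
    // Workaround: name the RDD after its path explicitly.
    val lines = sc.textFile("/home/root/.bashrc").setName("/home/root/.bashrc")
    lines.name  // "/home/root/.bashrc" instead of null
    ```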
    
    Author: Yaron Weinsberg <[email protected]>
    Author: yaron <[email protected]>
    
    Closes #10456 from wyaron/master.
    
    (cherry picked from commit 73b70f076d4e22396b7e145f2ce5974fbf788048)
    Signed-off-by: Kousuke Saruta <[email protected]>

commit 7c7d76f34c0e09aae12f03e7c2922d4eb50d1830
Author: Kousuke Saruta <[email protected]>
Date:   2015-12-28T20:33:19Z

    [SPARK-12424][ML] The implementation of ParamMap#filter is wrong.
    
    ParamMap#filter uses `mutable.Map#filterKeys`. The return type of `filterKeys` is collection.Map, not mutable.Map, but the result is cast to mutable.Map using `asInstanceOf`, so we get a `ClassCastException`.
    Also, the value returned by `Map#filterKeys` is not Serializable; that is a Scala issue (https://issues.scala-lang.org/browse/SI-6654).
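
    A minimal illustration of the difference in plain Scala (independent of Spark):

    ```scala
    import scala.collection.mutable

    val m = mutable.Map("a" -> 1, "b" -> 2)

    // filterKeys returns a collection.Map view (and it is not serializable);
    // casting it back to mutable.Map fails at runtime:
    // m.filterKeys(_ == "a").asInstanceOf[mutable.Map[String, Int]]  // ClassCastException

    // Copying via filter keeps the mutable type:
    val filtered: mutable.Map[String, Int] = m.filter { case (k, _) => k == "a" }
    ```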
    
    Author: Kousuke Saruta <[email protected]>
    
    Closes #10381 from sarutak/SPARK-12424.
    
    (cherry picked from commit 07165ca06fe0866677525f85fec25e4dbd336674)
    Signed-off-by: Kousuke Saruta <[email protected]>

commit a9c52d4954aa445ab751b38ddbfd8fb6f84d7c14
Author: Daoyuan Wang <[email protected]>
Date:   2015-12-28T22:02:30Z

    [SPARK-12222][CORE] Deserialize RoaringBitmap using Kryo serializer throw 
Buffer underflow exception
    
    Since we only need to implement `def skipBytes(n: Int)`,
    code in #10213 could be simplified.
    davies scwf
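
    A rough sketch of what such a bridge's `skipBytes` can look like when wrapping a Kryo `Input` (simplified, not the exact code in this PR):

    ```scala
    import com.esotericsoftware.kryo.io.Input

    // DataInput.skipBytes must report how many bytes were actually skipped; Kryo's
    // Input is an InputStream, so its skip(long) can be looped until done or EOF.
    def skipBytes(input: Input, n: Int): Int = {
      var remaining = n.toLong
      while (remaining > 0) {
        val skipped = input.skip(remaining)
        if (skipped == 0) return (n - remaining).toInt
        remaining -= skipped
      }
      n
    }
    ```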
    
    Author: Daoyuan Wang <[email protected]>
    
    Closes #10253 from adrian-wang/kryo.
    
    (cherry picked from commit a6d385322e7dfaff600465fa5302010a5f122c6b)
    Signed-off-by: Kousuke Saruta <[email protected]>

commit fd202485ace613d9930d0ede48ba8a65920004db
Author: Shixiong Zhu <[email protected]>
Date:   2015-12-28T23:01:51Z

    [SPARK-12489][CORE][SQL][MLIB] Fix minor issues found by FindBugs
    
    Include the following changes:
    
    1. Close `java.sql.Statement`
    2. Fix incorrect `asInstanceOf`.
    3. Remove unnecessary `synchronized` and `ReentrantLock`.
    
    Author: Shixiong Zhu <[email protected]>
    
    Closes #10440 from zsxwing/findbugs.
    
    (cherry picked from commit 710b41172958a0b3a2b70c48821aefc81893731b)
    Signed-off-by: Shixiong Zhu <[email protected]>

commit 85a871818ee1134deb29387c78c6ce21eb6d2acb
Author: Takeshi YAMAMURO <[email protected]>
Date:   2015-12-29T05:28:32Z

    [SPARK-11394][SQL] Throw IllegalArgumentException for unsupported types in 
postgresql
    
    If a DataFrame has BYTE types, writing it to PostgreSQL throws an exception:
    org.postgresql.util.PSQLException: ERROR: type "byte" does not exist
    
    Author: Takeshi YAMAMURO <[email protected]>
    
    Closes #9350 from maropu/FixBugInPostgreJdbc.
    
    (cherry picked from commit 73862a1eb9744c3c32458c9c6f6431c23783786a)
    Signed-off-by: Yin Huai <[email protected]>

commit c069ffc2b13879f471e6d888116f45f6a8902236
Author: Forest Fang <[email protected]>
Date:   2015-12-29T07:15:24Z

    [SPARK-12526][SPARKR] `ifelse`, `when`, `otherwise` unable to take Column as value
    
    `ifelse`, `when`, and `otherwise` are unable to take `Column`-typed S4 objects as values.
    
    For example:
    ```r
    ifelse(lit(1) == lit(1), lit(2), lit(3))
    ifelse(df$mpg > 0, df$mpg, 0)
    ```
    will both fail with
    ```r
    attempt to replicate an object of type 'environment'
    ```
    
    The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid the attempt to vectorize (i.e., `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency, because `ifelse` in base R is vectorized, but I cannot foresee any scenario in which these functions would need to be vectorized in SparkR.
    
    For reference, added test cases which trigger failures:
    ```r
    . Error: when(), otherwise() and ifelse() with column on a DataFrame 
----------
    error in evaluating the argument 'x' in selecting a method for function 
'collect':
      error in evaluating the argument 'col' in selecting a method for function 
'select':
      attempt to replicate an object of type 'environment'
    Calls: when -> when -> ifelse -> ifelse
    
    1: withCallingHandlers(eval(code, new_test_environment), error = 
capture_calls, message = function(c) invokeRestart("muffleMessage"))
    2: eval(code, new_test_environment)
    3: eval(expr, envir, enclos)
    4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 
1], c(NA, 1)) at test_sparkSQL.R:1126
    5: expect_that(object, equals(expected, label = expected.label, ...), info 
= info, label = label)
    6: condition(object)
    7: compare(actual, expected, ...)
    8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
    Error: Test failures
    Execution halted
    ```
    
    Author: Forest Fang <[email protected]>
    
    Closes #10481 from saurfang/spark-12526.
    
    (cherry picked from commit d80cc90b5545cff82cd9b340f12d01eafc9ca524)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit 8dc65497152f2c8949b08fddad853d31c4bd9ae5
Author: Holden Karau <[email protected]>
Date:   2015-12-30T19:14:47Z

    [SPARK-12300] [SQL] [PYSPARK] fix schema inference on local collections
    
    Current schema inference for local Python collections halts as soon as there are no NullTypes. This is different from what happens when we specify a sampling ratio of 1.0 on a distributed collection, and it can result in incomplete schema information.
    
    Author: Holden Karau <[email protected]>
    
    Closes #10275 from 
holdenk/SPARK-12300-fix-schmea-inferance-on-local-collections.
    
    (cherry picked from commit d1ca634db4ca9db7f0ba7ca38a0e03bcbfec23c9)
    Signed-off-by: Davies Liu <[email protected]>

commit cd86075b52d6363f674dffc3eb71d90449563879
Author: Carson Wang <[email protected]>
Date:   2015-12-30T21:49:10Z

    [SPARK-12399] Display correct error message when accessing REST API with an 
unknown app Id
    
    I got an exception when accessing the REST API below with an unknown application Id.
    `http://<server-url>:18080/api/v1/applications/xxx/jobs`
    Instead of an exception, I expect the error message "no such app: xxx", similar to the message returned when I access `/api/v1/applications/xxx`.
    ```
    org.spark-project.guava.util.concurrent.UncheckedExecutionException: 
java.util.NoSuchElementException: no app with key xxx
        at 
org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
        at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
        at 
org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
        at 
org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
        at 
org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:116)
        at 
org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:226)
        at 
org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:46)
        at 
org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
    ```
    
    Author: Carson Wang <[email protected]>
    
    Closes #10352 from carsonwang/unknownAppFix.
    
    (cherry picked from commit b244297966be1d09f8e861cfe2d8e69f7bed84da)
    Signed-off-by: Marcelo Vanzin <[email protected]>

commit 4e9dd16987b3cba19dcf6437f3b6c8aeb59e2e39
Author: felixcheung <[email protected]>
Date:   2016-01-03T15:23:35Z

    [SPARK-12327][SPARKR] fix code for lintr warning for commented code
    
    shivaram
    
    Author: felixcheung <[email protected]>
    
    Closes #10408 from felixcheung/rcodecomment.
    
    (cherry picked from commit c3d505602de2fd2361633f90e4fff7e041849e28)
    Signed-off-by: Shivaram Venkataraman <[email protected]>

commit f7a322382a3c1eed7088541add55a7813813a958
Author: Xiu Guo <[email protected]>
Date:   2016-01-04T04:48:56Z

    [SPARK-12562][SQL] DataFrame.write.format(text) requires the column name to 
be called value
    
    Author: Xiu Guo <[email protected]>
    
    Closes #10515 from xguo27/SPARK-12562.
    
    (cherry picked from commit 84f8492c1555bf8ab44c9818752278f61768eb16)
    Signed-off-by: Reynold Xin <[email protected]>

commit cd02038198fa57da816211d7bc65921ff9f1e9bb
Author: Nong Li <[email protected]>
Date:   2016-01-04T18:37:56Z

    [SPARK-12486] Worker should kill the executors more forcefully if possible.
    
    This patch updates the ExecutorRunner's terminate path to use the new Java 8 API to terminate processes more forcefully if possible. If the executor is unhealthy, it would previously ignore the destroy() call. Presumably, the new Java API was added to handle cases like this.
    
    We could update the termination path in the future to use OS-specific commands for older Java versions.
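
    Roughly, the Java 8 API in question can be used like this (a sketch, not the actual ExecutorRunner code):

    ```scala
    import java.util.concurrent.TimeUnit

    // Ask the process to exit, then escalate to destroyForcibly() (Java 8+)
    // if it has not terminated within the grace period.
    def terminate(process: Process, graceMillis: Long): Unit = {
      process.destroy()
      if (!process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
        process.destroyForcibly()
      }
    }
    ```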
    
    Author: Nong Li <[email protected]>
    
    Closes #10438 from nongli/spark-12486-executors.
    
    (cherry picked from commit 8f659393b270c46e940c4e98af2d996bd4fd6442)
    Signed-off-by: Andrew Or <[email protected]>

commit b5a1f564a3c099ef0b674599f0b012d9346115a3
Author: Pete Robbins <[email protected]>
Date:   2016-01-04T18:43:21Z

    [SPARK-12470] [SQL] Fix size reduction calculation
    
    also only allocate required buffer size
    
    Author: Pete Robbins <[email protected]>
    
    Closes #10421 from robbinspg/master.
    
    (cherry picked from commit b504b6a90a95a723210beb0031ed41a75d702f66)
    Signed-off-by: Davies Liu <[email protected]>
    
    Conflicts:
        
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoiner.scala

commit 7f37c1e45d52b7823d566349e2be21366d73651f
Author: Josh Rosen <[email protected]>
Date:   2016-01-04T18:39:42Z

    [SPARK-12579][SQL] Force user-specified JDBC driver to take precedence
    
    Spark SQL's JDBC data source allows users to specify an explicit JDBC 
driver to load (using the `driver` argument), but in the current code it's 
possible that the user-specified driver will not be used when it comes time to 
actually create a JDBC connection.
    
    In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection.
    
    This patch addresses this issue by first registering the user-specified 
driver with the DriverManager, then iterating over the driver manager's loaded 
drivers in order to obtain the correct driver and use it to create a connection 
(previously, we just called `DriverManager.getConnection()` directly).
    
    If a user did not specify a JDBC driver to use, then we call 
`DriverManager.getDriver` to figure out the class of the driver to use, then 
pass that class's name to executors; this guards against corner-case bugs in 
situations where the driver and executor JVMs might have different sets of JDBC 
drivers on their classpaths (previously, there was the (rare) potential for 
`DriverManager.getConnection()` to use different drivers on the driver and 
executors if the user had not explicitly specified a JDBC driver class and the 
classpaths were different).
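
    The lookup described above can be sketched as follows (a simplified illustration, not the actual patch; details such as driver wrapper handling are omitted):

    ```scala
    import java.sql.{Driver, DriverManager}
    import scala.collection.JavaConverters._

    // Pick the user-specified driver class out of DriverManager's registered
    // drivers rather than letting getConnection() choose one by subprotocol.
    def chooseDriver(userSpecifiedClass: Option[String], url: String): Driver = {
      val drivers = DriverManager.getDrivers.asScala.toSeq
      userSpecifiedClass match {
        case Some(cls) =>
          drivers.find(_.getClass.getCanonicalName == cls).getOrElse(
            throw new IllegalStateException(s"JDBC driver $cls was not registered"))
        case None =>
          DriverManager.getDriver(url)
      }
    }
    ```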
    
    This patch is inspired by a similar patch that I made to the 
`spark-redshift` library 
(https://github.com/databricks/spark-redshift/pull/143), which contains its own 
modified fork of some of Spark's JDBC data source code (for cross-Spark-version 
compatibility reasons).
    
    Author: Josh Rosen <[email protected]>
    
    Closes #10519 from JoshRosen/jdbc-driver-precedence.
    
    (cherry picked from commit 6c83d938cc61bd5fabaf2157fcc3936364a83f02)
    Signed-off-by: Yin Huai <[email protected]>

----

