GitHub user sduchh opened a pull request:

    https://github.com/apache/spark/pull/6837

    spark ssc.textFileStream returns empty

    I'm trying to stream from an HDFS folder (sparkstreamining). I tried:

        JavaDStream<String> sourceInfors =
            jssc.textFileStream("hdfs://172.19.23.24:8020/user/input");
        sourceInfors.print();

    As the documentation for textFileStream says: Create an input stream that
    monitors a Hadoop-compatible filesystem for new files and reads them as
    text files (using key as LongWritable, value as Text and input format as
    TextInputFormat). Files must be written to the monitored directory by
    "moving" them from another location within the same file system. File
    names starting with . are ignored.

    First, I ran this application in local mode while there were no files in
    the directory. Then I moved one file from elsewhere on the same file
    system into the "input" directory, but nothing was returned. Can anyone
    help solve this problem? Thanks.

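For reference, a minimal sketch of the intended usage (Scala; the master, batch interval, and app name here are illustrative and not a confirmed fix for the report above). Two points matter: nothing is processed until `ssc.start()` is called and the driver is kept alive, and only files moved atomically into the monitored directory after the stream has started are picked up.

```
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TextFileStreamDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TextFileStreamDemo").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Monitor the HDFS directory from the question; only files moved into it
    // *after* the stream has started are read.
    val lines = ssc.textFileStream("hdfs://172.19.23.24:8020/user/input")
    lines.print()

    ssc.start()             // nothing is processed until start() is called
    ssc.awaitTermination()  // keep the driver alive; now move new files into /user/input
  }
}
```
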
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6837.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6837
    
----
commit d9e141cb7b973e1fa21f0144e08d7ffa976c0b39
Author: Volodymyr Lyubinets <[email protected]>
Date:   2015-03-12T07:55:26Z

    [SPARK-6296] [SQL] Added equals to Column
    
    Author: Volodymyr Lyubinets <[email protected]>
    
    Closes #4988 from vlyubin/columncomp and squashes the following commits:
    
    92d7c8f [Volodymyr Lyubinets] Added equals to Column
    
    (cherry picked from commit 25b71d8c15572f0f2b951c827c169f8c65f726ad)
    Signed-off-by: Reynold Xin <[email protected]>

commit 850e69451e3ea501a527c81d10fd4b34e2979a2a
Author: Davies Liu <[email protected]>
Date:   2015-03-12T08:34:38Z

    [SPARK-6294] fix hang when call take() in JVM on PythonRDD
    
    Thread.interrupt() cannot terminate the thread in some cases, so we should
    not wait for the writerThread of PythonRDD.
    
    This PR also ignores some exceptions during cleanup.
    
    cc JoshRosen mengxr
    
    Author: Davies Liu <[email protected]>
    
    Closes #4987 from davies/fix_take and squashes the following commits:
    
    4488f1a [Davies Liu] fix hang when call take() in JVM on PythonRDD
    
    (cherry picked from commit 712679a7b447346a365b38574d7a86d56a93f767)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit 23069bd02af342668a6eadecb545f529acb843b6
Author: Joseph K. Bradley <[email protected]>
Date:   2015-03-12T23:46:29Z

    [mllib] [python] Add LassoModel to __all__ in regression.py
    
    Add LassoModel to __all__ in regression.py
    
    LassoModel does not show up in Python docs
    
    This should be merged into branch-1.3 and master.
    
    Author: Joseph K. Bradley <[email protected]>
    
    Closes #4970 from jkbradley/SPARK-6253 and squashes the following commits:
    
    c2cb533 [Joseph K. Bradley] Add LassoModel to __all__ in regression.py
    
    (cherry picked from commit 17c309c87e78da145dc358514150ec5700eed8f0)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit dc287f38f1cc192b7fa6ec0e83b36254f1cfec10
Author: Cheng Lian <[email protected]>
Date:   2015-03-13T13:34:50Z

    [SPARK-5310] [SQL] [DOC] Parquet section for the SQL programming guide
    
    Also fixed a bunch of minor styling issues.
    
    Author: Cheng Lian <[email protected]>
    
    Closes #5001 from liancheng/parquet-doc and squashes the following commits:
    
    89ad3db [Cheng Lian] Addresses @rxin's comments
    7eb6955 [Cheng Lian] Docs for the new Parquet data source
    415eefb [Cheng Lian] Some minor formatting improvements
    
    (cherry picked from commit 69ff8e8cfbecd81fd54100c4dab332c3bc992316)
    Signed-off-by: Cheng Lian <[email protected]>
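As a quick illustration of the material the new Parquet section covers, a minimal sketch against the Spark 1.3 DataFrame API (assumes a running SparkContext `sc`, e.g. in spark-shell; the path and column names are made up):

```
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Write a small DataFrame as Parquet, read it back, and query it via SQL.
val df = sc.parallelize(Seq((1, "alice"), (2, "bob"))).toDF("id", "name")
df.saveAsParquetFile("people.parquet")
val people = sqlContext.parquetFile("people.parquet")   // schema is preserved
people.registerTempTable("people")
sqlContext.sql("SELECT name FROM people WHERE id = 1").show()
```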

commit 214f68103219317416e2278e80b8fc0fb5a616f4
Author: Xiangrui Meng <[email protected]>
Date:   2015-03-13T17:27:28Z

    [SPARK-6278][MLLIB] Mention the change of objective in linear regression
    
    As discussed in the RC3 vote thread, we should mention the change of 
objective in linear regression in the migration guide. srowen
    
    Author: Xiangrui Meng <[email protected]>
    
    Closes #4978 from mengxr/SPARK-6278 and squashes the following commits:
    
    fb3bbe6 [Xiangrui Meng] mention regularization parameter
    bfd6cff [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into 
SPARK-6278
    375fd09 [Xiangrui Meng] address Sean's comments
    f87ae71 [Xiangrui Meng] mention step size change
    
    (cherry picked from commit 7f13434a5c52b815c584ec773ab0e5df1a35ea86)
    Signed-off-by: Xiangrui Meng <[email protected]>

commit dbee7e16c7434326cce6f6d5ab494093c60ee097
Author: Sean Owen <[email protected]>
Date:   2015-02-26T20:56:54Z

    SPARK-4704 [CORE] SparkSubmitDriverBootstrap doesn't flush output
    
    Join on output threads to make sure any lingering output from process 
reaches stdout, stderr before exiting
    
    CC andrewor14 since I believe he created this section of code
    
    Author: Sean Owen <[email protected]>
    
    Closes #4788 from srowen/SPARK-4704 and squashes the following commits:
    
    ad7114e [Sean Owen] Join on output threads to make sure any lingering 
output from process reaches stdout, stderr before exiting
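The idea, roughly, as a standalone sketch (generic Scala, not the actual SparkSubmitDriverBootstrapper code; the child command is a placeholder): copy the child's stdout/stderr on dedicated threads, then join those threads after waitFor() so any buffered output is flushed before the parent exits.

```
import java.io.{InputStream, OutputStream}

// Start a copier thread for one output stream of the child process.
def redirect(in: InputStream, out: OutputStream): Thread = {
  val t = new Thread(new Runnable {
    override def run(): Unit = {
      val buf = new Array[Byte](8192)
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        n = in.read(buf)
      }
      out.flush()
    }
  })
  t.start()
  t
}

val process = new ProcessBuilder("echo", "hello").start()  // placeholder child command
val stdoutThread = redirect(process.getInputStream, System.out)
val stderrThread = redirect(process.getErrorStream, System.err)
val exitCode = process.waitFor()
stdoutThread.join()   // make sure lingering output reaches stdout/stderr before exiting
stderrThread.join()
sys.exit(exitCode)
```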

commit 170af49bb0b183b2f4cb3ebbb3e9ab5327f907c9
Author: Davies Liu <[email protected]>
Date:   2015-03-09T23:24:06Z

    [SPARK-6194] [SPARK-677] [PySpark] fix memory leak in collect()
    
    Because of a circular reference between JavaObject and JavaMember, a Java
    object cannot be released until the Python GC kicks in, which causes a
    memory leak in collect() that may consume lots of memory in the JVM.
    
    This PR changes the way we send collected data back into Python from a
    local file to a socket, which avoids any disk IO during collect and also
    avoids any Python-side referrers to the Java object.
    
    cc JoshRosen
    
    Author: Davies Liu <[email protected]>
    
    Closes #4923 from davies/fix_collect and squashes the following commits:
    
    d730286 [Davies Liu] address comments
    24c92a4 [Davies Liu] fix style
    ba54614 [Davies Liu] use socket to transfer data from JVM
    9517c8f [Davies Liu] fix memory leak in collect()
    
    (cherry picked from commit 8767565cef01d847f57b7293d8b63b2422009b90)
    Signed-off-by: Josh Rosen <[email protected]>

commit a3493eb77a0aa7d3048e657459ebaa22e98ccf0c
Author: Sean Owen <[email protected]>
Date:   2015-02-26T22:08:56Z

    SPARK-4300 [CORE] Race condition during SparkWorker shutdown
    
    Close appender saving stdout/stderr before destroying process to avoid 
exception on reading closed input stream.
    (This also removes a redundant `waitFor()` although it was harmless)
    
    CC tdas since I think you wrote this method.
    
    Author: Sean Owen <[email protected]>
    
    Closes #4787 from srowen/SPARK-4300 and squashes the following commits:
    
    e0cdabf [Sean Owen] Close appender saving stdout/stderr before destroying 
process to avoid exception on reading closed input stream

commit 4aa41327d164ed5b2830cb18eb47b93ebd27401b
Author: Sean Owen <[email protected]>
Date:   2015-03-13T17:59:31Z

    SPARK-4044 [CORE] Thriftserver fails to start when JAVA_HOME points to JRE 
instead of JDK
    
    Don't use JAR_CMD unless present in archive check. Add datanucleus always 
if present, to avoid needing a check involving JAR_CMD.
    
    Follow up to https://github.com/apache/spark/pull/4873 for branch 1.3.
    
    Author: Sean Owen <[email protected]>
    
    Closes #4981 from srowen/SPARK-4044.2 and squashes the following commits:
    
    3aafc76 [Sean Owen] Don't use JAR_CMD unless present in archive check. Add 
datanucleus always if present, to avoid needing a check involving JAR_CMD

commit f81611dca7ce97ebd26262086ac0e2b5e5f997e5
Author: Zhang, Liye <[email protected]>
Date:   2015-02-27T07:11:43Z

    [SPARK-6036][CORE] avoid race condition between eventlogListener and akka 
actor system
    
    For detail description, pls refer to 
[SPARK-6036](https://issues.apache.org/jira/browse/SPARK-6036).
    
    Author: Zhang, Liye <[email protected]>
    
    Closes #4785 from liyezhang556520/EventLogInProcess and squashes the 
following commits:
    
    8b0b0a6 [Zhang, Liye] stop listener after DAGScheduler
    79b15b3 [Zhang, Liye] SPARK-6036 avoid race condition between 
eventlogListener and akka actor system

commit 9846790f49e2716e0b0c15f58e8547a1f04ba3ae
Author: Lev Khomich <[email protected]>
Date:   2015-03-10T10:55:42Z

    [SPARK-6087][CORE] Provide actionable exception if Kryo buffer is not large 
enough
    
    A simple try-catch wrapping KryoException to be more informative.
    
    Author: Lev Khomich <[email protected]>
    
    Closes #4947 from levkhomich/master and squashes the following commits:
    
    0f7a947 [Lev Khomich] [SPARK-6087][CORE] Provide actionable exception if 
Kryo buffer is not large enough
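A sketch of the kind of wrapper described (the buffer-overflow message check and the config name are recalled for the 1.3 line, not copied from the patch):

```
import com.esotericsoftware.kryo.KryoException
import org.apache.spark.SparkException

// Wrap a Kryo buffer overflow in an exception that tells the user what to tune.
def withActionableError[T](doSerialize: => T): T =
  try doSerialize catch {
    case e: KryoException if e.getMessage != null && e.getMessage.startsWith("Buffer overflow") =>
      throw new SparkException(
        "Kryo serialization failed: " + e.getMessage +
          ". To avoid this, increase spark.kryoserializer.buffer.max.mb.", e)
  }
```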

commit 3cdc8a35a7b9bbdf418988d0fe4524d413dce23c
Author: Andrew Or <[email protected]>
Date:   2015-03-03T21:44:05Z

    [SPARK-6132] ContextCleaner race condition across SparkContexts
    
    The problem is that `ContextCleaner` may clean variables that belong to a 
different `SparkContext`. This can happen if the `SparkContext` to which the 
cleaner belongs stops, and a new one is started immediately afterwards in the 
same JVM. In this case, if the cleaner is in the middle of cleaning a 
broadcast, for instance, it will do so through `SparkEnv.get.blockManager`, 
which could be one that belongs to a different `SparkContext`.
    
    JoshRosen and I suspect that this is the cause of many flaky tests, most 
notably the `JavaAPISuite`. We were able to reproduce the failure locally 
(though it is not deterministic and very hard to reproduce).
    
    Author: Andrew Or <[email protected]>
    
    Closes #4869 from andrewor14/cleaner-masquerade and squashes the following 
commits:
    
    29168c0 [Andrew Or] Synchronize ContextCleaner stop
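In outline, the added synchronization looks something like this simplified sketch (names are illustrative, not the actual ContextCleaner code): cleaning and stop() take the same lock, and cleaning is skipped once the owning context has stopped.

```
class Cleaner {
  private var stopped = false

  def stop(): Unit = synchronized {
    stopped = true          // from now on, no more cleaning on behalf of this context
  }

  def clean(task: () => Unit): Unit = synchronized {
    if (!stopped) {
      task()                // cannot race with a stop() of the owning context
    }
  }
}
```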

commit 338bea7b33a0faaa62c94ace334a79c0b1716a01
Author: Andrew Or <[email protected]>
Date:   2015-03-04T04:49:45Z

    [SPARK-6132][HOTFIX] ContextCleaner InterruptedException should be quiet
    
    If the cleaner is stopped, we shouldn't print a huge stack trace when the 
cleaner thread is interrupted because we purposefully did this.
    
    Author: Andrew Or <[email protected]>
    
    Closes #4882 from andrewor14/cleaner-interrupt and squashes the following 
commits:
    
    8652120 [Andrew Or] Just a hot fix

commit a08588c7eeaecf7003073c092320b37abd166191
Author: Andrew Or <[email protected]>
Date:   2015-03-03T23:09:57Z

    [SPARK-6133] Make sc.stop() idempotent
    
    Before we would get the following (benign) error if we called `sc.stop()` 
twice. This is because the listener bus would try to post the end event again 
even after it has already stopped. This happens occasionally when flaky tests 
fail, usually as a result of other sources of error. Either way we shouldn't be 
logging this error when it is not the cause of the failure.
    ```
    ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event 
SparkListenerApplicationEnd(1425348445682)
    ```
    
    Author: Andrew Or <[email protected]>
    
    Closes #4871 from andrewor14/sc-stop and squashes the following commits:
    
    a14afc5 [Andrew Or] Move code after code
    915db16 [Andrew Or] Move code into code
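A common way to get this idempotence, sketched with an AtomicBoolean guard (illustrative, not the actual SparkContext code):

```
import java.util.concurrent.atomic.AtomicBoolean

class StoppableService {
  private val stopped = new AtomicBoolean(false)

  def stop(): Unit = {
    // Only the first caller wins; later calls return silently instead of
    // posting events to an already-stopped listener bus.
    if (!stopped.compareAndSet(false, true)) return
    // ... post application-end event, stop listener bus, release resources ...
  }
}
```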

commit 30127812629a53a1b45c4d90b70c5cc55dd28fb6
Author: zzcclp <[email protected]>
Date:   2015-03-12T15:07:15Z

    [SPARK-6275][Documentation]Miss toDF() function in 
docs/sql-programming-guide.md
    
    Miss `toDF()` function in docs/sql-programming-guide.md
    
    Author: zzcclp <[email protected]>
    
    Closes #4977 from zzcclp/SPARK-6275 and squashes the following commits:
    
    9a96c7b [zzcclp] Miss toDF()
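The corrected pattern from the guide, roughly (Spark 1.3; the file path and schema are the guide's standard people example, reproduced from memory, and a running `sc` is assumed):

```
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

case class Person(name: String, age: Int)

// The trailing .toDF() is the call the guide was missing: it converts the RDD
// of case classes into a DataFrame before registering it as a table.
val people = sc.textFile("examples/src/main/resources/people.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))
  .toDF()
people.registerTempTable("people")
```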

commit ad475632106f90d380790dd236eb7f5d571f6a53
Author: Davies Liu <[email protected]>
Date:   2015-03-14T07:43:33Z

    [SPARK-6210] [SQL] use prettyString as column name in agg()
    
    Use prettyString instead of toString() (which includes the id of the
    expression) as the column name in agg().
    
    Author: Davies Liu <[email protected]>
    
    Closes #5006 from davies/prettystring and squashes the following commits:
    
    cb1fdcf [Davies Liu] use prettyString as column name in agg()
    
    (cherry picked from commit b38e073fee794188d5267f1812b095e51874839e)
    Signed-off-by: Reynold Xin <[email protected]>

commit 43fcab01a4cb8e0534083da36a9fac022575f4f2
Author: Jongyoul Lee <[email protected]>
Date:   2015-03-15T15:46:55Z

    [SPARK-3619] Part 2. Upgrade to Mesos 0.21 to work around MESOS-1688
    
    - MESOS_NATIVE_LIBRARY became deprecated
    - Changed MESOS_NATIVE_LIBRARY to MESOS_NATIVE_JAVA_LIBRARY
    
    Author: Jongyoul Lee <[email protected]>
    
    Closes #4361 from jongyoul/SPARK-3619-1 and squashes the following commits:
    
    f1ea91f [Jongyoul Lee] Merge branch 'SPARK-3619-1' of 
https://github.com/jongyoul/spark into SPARK-3619-1
    a6a00c2 [Jongyoul Lee] [SPARK-3619] Upgrade to Mesos 0.21 to work around 
MESOS-1688 - Removed 'Known issues' section
    2e15a21 [Jongyoul Lee] [SPARK-3619] Upgrade to Mesos 0.21 to work around 
MESOS-1688 - MESOS_NATIVE_LIBRARY become deprecated - Chagned 
MESOS_NATIVE_LIBRARY to MESOS_NATIVE_JAVA_LIBRARY
    0dace7b [Jongyoul Lee] [SPARK-3619] Upgrade to Mesos 0.21 to work around 
MESOS-1688 - MESOS_NATIVE_LIBRARY become deprecated - Chagned 
MESOS_NATIVE_LIBRARY to MESOS_NATIVE_JAVA_LIBRARY

commit 724aab4b908b7b4d9d1ff79a46d8baf41302f2df
Author: DoingDone9 <[email protected]>
Date:   2015-03-16T12:27:15Z

    [SPARK-6300][Spark Core] sc.addFile(path) does not support the relative 
path.
    
    When I run a command like sc.addFile("../test.txt"), it does not work and
    throws an exception:
    java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
path in absolute URI: file:../test.txt
    at org.apache.hadoop.fs.Path.initialize(Path.java:206)
    at org.apache.hadoop.fs.Path.<init>(Path.java:172)
    ........
    .......
    Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
file:../test.txt
    at java.net.URI.checkPath(URI.java:1804)
    at java.net.URI.<init>(URI.java:752)
    at org.apache.hadoop.fs.Path.initialize(Path.java:203)
    
    Author: DoingDone9 <[email protected]>
    
    Closes #4993 from DoingDone9/relativePath and squashes the following 
commits:
    
    ee375cd [DoingDone9] Update SparkContextSuite.scala
    d594e16 [DoingDone9] Update SparkContext.scala
    0ff3fa8 [DoingDone9] test for add file
    dced8eb [DoingDone9] Update SparkContext.scala
    e4a13fe [DoingDone9] getCanonicalPath
    161cae3 [DoingDone9] Merge pull request #4 from apache/master
    c87e8b6 [DoingDone9] Merge pull request #3 from apache/master
    cb1852d [DoingDone9] Merge pull request #2 from apache/master
    c3f046f [DoingDone9] Merge pull request #1 from apache/master
    
    (cherry picked from commit 00e730b94cba1202a73af1e2476ff5a44af4b6b2)
    Signed-off-by: Sean Owen <[email protected]>
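The gist of the change, as a hypothetical helper (not the actual SparkContext code): resolve a schemeless, relative local path to an absolute file before it is turned into a URI, so inputs like "../test.txt" no longer produce "Relative path in absolute URI".

```
import java.io.File
import java.net.URI

def resolveLocalPath(path: String): String = {
  val uri = new URI(path)
  if (uri.getScheme == null) {
    // No scheme: treat it as a local file and canonicalize it first.
    new File(path).getCanonicalFile.toURI.toString
  } else {
    path
  }
}

// resolveLocalPath("../test.txt") -> e.g. "file:/home/user/test.txt"
```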

commit 684ff2476e4ef8aa2d39e1385413edb1b9129838
Author: Sean Owen <[email protected]>
Date:   2015-03-11T14:09:09Z

    SPARK-6245 [SQL] jsonRDD() of empty RDD results in exception
    
    Avoid `UnsupportedOperationException` from JsonRDD.inferSchema on empty RDD.
    
    Not sure if this is supposed to be an error (but a better one), but it 
seems like this case can come up if the input is down-sampled so much that 
nothing is sampled.
    
    Now stuff like this:
    ```
    sqlContext.jsonRDD(sc.parallelize(List[String]()))
    ```
    just results in
    ```
    org.apache.spark.sql.DataFrame = []
    ```
    
    Author: Sean Owen <[email protected]>
    
    Closes #4971 from srowen/SPARK-6245 and squashes the following commits:
    
    3699964 [Sean Owen] Set() -> Set.empty
    3c619e1 [Sean Owen] Avoid UnsupportedOperationException from 
JsonRDD.inferSchema on empty RDD

commit 67fa6d1f830dee37244b5a30684d797093c7c134
Author: Volodymyr Lyubinets <[email protected]>
Date:   2015-03-16T19:13:18Z

    [SPARK-6330] Fix filesystem bug in newParquet relation
    
    If I'm running this locally and my path points to S3, this would currently
    error out because of an incorrect FS.
    I tested this in a scenario that previously didn't work, and this change
    seemed to fix the issue.
    
    Author: Volodymyr Lyubinets <[email protected]>
    
    Closes #5020 from vlyubin/parquertbug and squashes the following commits:
    
    a645ad5 [Volodymyr Lyubinets] Fix filesystem bug in newParquet relation
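A sketch of the fix's core idea (hypothetical helper, not the actual newParquet code): derive the FileSystem from each path rather than once from the Hadoop configuration, so hdfs://, s3n://, and file:// paths each resolve against the right filesystem.

```
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileStatus, Path}

def fileStatuses(paths: Seq[String], conf: Configuration): Seq[FileStatus] =
  paths.map { p =>
    val path = new Path(p)
    val fs = path.getFileSystem(conf)  // per-path FileSystem, not FileSystem.get(conf)
    fs.getFileStatus(path)
  }
```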

commit 47cce984eb6286fcedde8bf480442a66a87de09c
Author: lisurprise <[email protected]>
Date:   2015-03-16T20:10:32Z

    [SPARK-6077] Remove streaming tab while stopping StreamingContext
    
    Currently we create a new streaming tab for each StreamingContext even if
    there is already one on the same SparkContext, which results in duplicate
    StreamingTabs, none of which takes effect.
    snapshot: 
https://www.dropbox.com/s/t4gd6hqyqo0nivz/bad%20multiple%20streamings.png?dl=0
    How to reproduce:
    1)
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.storage.StorageLevel
    val ssc = new StreamingContext(sc, Seconds(1))
    val lines = ssc.socketTextStream("localhost", 9999, 
StorageLevel.MEMORY_AND_DISK_SER)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    .....
    2)
    ssc.stop(false)
    val ssc = new StreamingContext(sc, Seconds(1))
    val lines = ssc.socketTextStream("localhost", 9999, 
StorageLevel.MEMORY_AND_DISK_SER)
    val words = lines.flatMap(_.split(" "))
    val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
    wordCounts.print()
    ssc.start()
    
    Author: lisurprise <[email protected]>
    
    Closes #4828 from zhichao-li/master and squashes the following commits:
    
    c329806 [lisurprise] add test for attaching/detaching streaming tab
    51e6c7f [lisurprise] move detach method into StreamingTab
    31a44fa [lisurprise] add unit test for attaching and detaching new tab
    db25ed2 [lisurprise] clean code
    8281bcb [lisurprise] clean code
    193c542 [lisurprise] remove streaming tab while closing streaming context
    
    (cherry picked from commit f149b8b5e542af44650923d0156f037121b45a20)
    Signed-off-by: Tathagata Das <[email protected]>

commit 5c16ced1e6c2dcadc0179eda8b273071254e285b
Author: Kevin (Sangwoo) Kim <[email protected]>
Date:   2015-03-17T06:49:23Z

    [SPARK-6299][CORE] ClassNotFoundException in standalone mode when running 
groupByKey with class defined in REPL
    
    ```
    case class ClassA(value: String)
    val rdd = sc.parallelize(List(("k1", ClassA("v1")), ("k1", ClassA("v2")) ))
    rdd.groupByKey.collect
    ```
    This code used to throw an exception in spark-shell because, while
    shuffling, ```JavaSerializer``` uses ```defaultClassLoader```, which was
    defined as ```env.serializer.setDefaultClassLoader(urlClassLoader)```.
    
    It should be ```env.serializer.setDefaultClassLoader(replClassLoader)```,
    as in TaskRunner:
    ```
        override def run() {
          val deserializeStartTime = System.currentTimeMillis()
          Thread.currentThread.setContextClassLoader(replClassLoader)
    ```
    
    When ```replClassLoader``` cannot be defined, it is identical to
    ```urlClassLoader```.
    
    Author: Kevin (Sangwoo) Kim <[email protected]>
    
    Closes #5046 from swkimme/master and squashes the following commits:
    
    fa2b9ee [Kevin (Sangwoo) Kim] stylish test codes ( collect -> collect() )
    6e9620b [Kevin (Sangwoo) Kim] stylish test codes ( collect -> collect() )
    d23e4e2 [Kevin (Sangwoo) Kim] stylish test codes ( collect -> collect() )
    a4a3c8a [Kevin (Sangwoo) Kim] add 'class defined in repl - shuffle' test to 
ReplSuite
    bd00da5 [Kevin (Sangwoo) Kim] add 'class defined in repl - shuffle' test to 
ReplSuite
    c1b1fc7 [Kevin (Sangwoo) Kim] use REPL class loader for executor's 
serializer
    
    (cherry picked from commit f0edeae7f9ab7eae02c227be9162ec69d22c92bd)
    Signed-off-by: Reynold Xin <[email protected]>

commit 426816b5ca42443b99e8af7a6450604dd7794bd6
Author: Lomig Mégard <[email protected]>
Date:   2015-03-17T06:52:42Z

    [SQL][docs][minor] Fixed sample code in SQLContext scaladoc
    
    Error in the code sample of the `implicits` object in `SQLContext`.
    
    Author: Lomig Mégard <[email protected]>
    
    Closes #5051 from tarfaa/simple and squashes the following commits:
    
    5a88acc [Lomig Mégard] [docs][minor] Fixed sample code in SQLContext 
scaladoc
    
    (cherry picked from commit 68707225f1a4081aadbf0fd7e6221293a190529b)
    Signed-off-by: Reynold Xin <[email protected]>

commit 95f8d1c51dabf89a50985d488ac68977ebaf9771
Author: Tathagata Das <[email protected]>
Date:   2015-03-17T12:31:27Z

    [SPARK-6331] Load new master URL if present when recovering streaming 
context from checkpoint
    
    In streaming driver recovery, when the SparkConf is reconstructed based on
    the checkpointed configuration, it recovers the old master URL. This is
    okay if the cluster on which the streaming application is relaunched is
    the same cluster it was running on before. But if that cluster changes,
    there is no way to inject the new master URL of the new cluster. As a
    result, the restarted app tries to connect to the non-existent old cluster
    and fails.
    
    The solution is to check whether a master URL is set in the System 
properties (by Spark submit) before recreating the SparkConf. If a new master 
url is set in the properties, then use it as that is obviously the most 
relevant one. Otherwise load the old one (to maintain existing behavior).
    
    Author: Tathagata Das <[email protected]>
    
    Closes #5024 from tdas/SPARK-6331 and squashes the following commits:
    
    392fd44 [Tathagata Das] Fixed naming issue.
    c7c0b99 [Tathagata Das] Addressed comments.
    6a0857c [Tathagata Das] Updated testsuites.
    222485d [Tathagata Das] Load new master URL if present when recovering 
streaming context from checkpoint
    
    (cherry picked from commit c928796ade54f68e26bc55734a9867a046d2e3fe)
    Signed-off-by: Tathagata Das <[email protected]>
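The described check, as a rough sketch (the property name "spark.master" is the standard one; the surrounding names are illustrative): prefer a master URL set by spark-submit in the system properties over the one stored in the checkpoint.

```
import org.apache.spark.SparkConf

def reconstructConf(checkpointedProps: Map[String, String]): SparkConf = {
  val conf = new SparkConf(loadDefaults = false)
  checkpointedProps.foreach { case (k, v) => conf.set(k, v) }
  // If spark-submit set a new master for this relaunch, it wins over the
  // checkpointed value; otherwise keep the old behavior.
  sys.props.get("spark.master").foreach(conf.setMaster)
  conf
}
```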

commit 29e39e178e9e21a038cf1aef1d110b368b6d64f7
Author: Josh Rosen <[email protected]>
Date:   2015-03-17T16:18:57Z

    [SPARK-3266] Use intermediate abstract classes to fix type erasure issues 
in Java APIs
    
    This PR addresses a Scala compiler bug 
([SI-8905](https://issues.scala-lang.org/browse/SI-8905)) that was breaking 
some of the Spark Java APIs.  In a nutshell, it seems that methods whose 
implementations are inherited from generic traits sometimes have their type 
parameters erased to Object.  This was causing methods like `DoubleRDD.min()` 
to throw confusing NoSuchMethodErrors at runtime.
    
    The fix implemented here is to introduce an intermediate layer of abstract
    classes and inherit from those instead of directly extending the
    `Java*Like` traits.  This should not break binary compatibility.
    
    I also improved the test coverage of the Java API, adding several new tests 
for methods that failed at runtime due to this bug.
    
    Author: Josh Rosen <[email protected]>
    
    Closes #5050 from JoshRosen/javardd-si-8905-fix and squashes the following 
commits:
    
    2feb068 [Josh Rosen] Use intermediate abstract classes to work around 
SPARK-3266
    d5f3e5d [Josh Rosen] Add failing regression tests for SPARK-3266
    
    (cherry picked from commit 0f673c21f68ee3d5df3c01ae405709d3c1f4909b)
    Signed-off-by: Josh Rosen <[email protected]>
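An illustrative (non-Spark) reduction of the workaround: a method implemented only in a generic trait can surface with an Object-erased signature to Java callers; re-declaring it in an intermediate abstract class with the concrete type keeps the specialized signature visible.

```
trait CollectionLike[T] {
  def iterator: Iterator[T]
  def first(): T = iterator.next()   // implemented only in the generic trait
}

// Intermediate abstract class: re-declares the inherited method with the
// concrete element type so the Java-visible signature is not erased to Object.
abstract class AbstractDoubleCollection extends CollectionLike[java.lang.Double] {
  override def first(): java.lang.Double = super.first()
}

class DoubleCollection(xs: Seq[Double]) extends AbstractDoubleCollection {
  override def iterator: Iterator[java.lang.Double] = xs.iterator.map(Double.box)
}
```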

commit febb12308dac94af20279a31e8a6013690f42f24
Author: nemccarthy <[email protected]>
Date:   2015-03-17T16:33:11Z

    [SPARK-6313] Add config option to disable file locks/fetchFile cache to ...
    
    ...support NFS mounts.
    
    This is a work around for now with the goal to find a more permanent 
solution.
    https://issues.apache.org/jira/browse/SPARK-6313
    
    Author: nemccarthy <[email protected]>
    
    Closes #5036 from nemccarthy/master and squashes the following commits:
    
    2eaaf42 [nemccarthy] [SPARK-6313] Update config wording doc for 
spark.files.useFetchCache
    5de7eb4 [nemccarthy] [SPARK-6313] Add config option to disable file 
locks/fetchFile cache to support NFS mounts
    
    (cherry picked from commit 4cca3917dc30ee907e6cbd6a569b6ac58af963f7)
    Signed-off-by: Josh Rosen <[email protected]>
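For example, the option mentioned in the commit can be turned off in the application's configuration (the app name here is illustrative):

```
import org.apache.spark.SparkConf

// Disable the shared fetchFile cache (and its file locks), e.g. when the
// executors' local directories live on an NFS mount.
val conf = new SparkConf()
  .setAppName("nfs-friendly-app")
  .set("spark.files.useFetchCache", "false")
```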

commit ac0e7cc7f69c8300cc2e3a91a2f3fb2d2024ef25
Author: Imran Rashid <[email protected]>
Date:   2015-03-17T17:03:54Z

    [SPARK-6365] jetty-security also needed for SPARK_PREPEND_CLASSES to work
    
    https://issues.apache.org/jira/browse/SPARK-6365
    
    thanks vanzin for helping me figure this out
    
    Author: Imran Rashid <[email protected]>
    
    Closes #5071 from squito/1.3_fix_prepend_classes and squashes the following 
commits:
    
    712adc1 [Imran Rashid] [SPARK-6365] jetty-security also needed for 
SPARK_PREPEND_CLASSES to work

commit 476c4e117675ea841cfa35661c28c36419dd3bdc
Author: lewuathe <[email protected]>
Date:   2015-03-17T19:11:57Z

    [SPARK-6336] LBFGS should document what convergenceTol means
    
    LBFGS uses a convergence tolerance. This value should be documented as an
    argument.
    
    Author: lewuathe <[email protected]>
    
    Closes #5033 from Lewuathe/SPARK-6336 and squashes the following commits:
    
    e738b33 [lewuathe] Modify text to be more natural
    ac03c3a [lewuathe] Modify documentations
    6ccb304 [lewuathe] [SPARK-6336] LBFGS should document what convergenceTol 
means
    
    (cherry picked from commit d9f3e01688ad0a8d5fc2419a948a682ad7d957c9)
    Signed-off-by: Xiangrui Meng <[email protected]>
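For context, the convergence tolerance is set on the optimizer roughly like this (the gradient, updater, and values are illustrative):

```
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}

// convergenceTol decides when L-BFGS stops iterating: smaller values mean more
// iterations and a tighter fit; numIterations caps the work regardless.
val optimizer = new LBFGS(new LogisticGradient(), new SquaredL2Updater())
  .setConvergenceTol(1e-4)
  .setNumIterations(100)
  .setRegParam(0.01)
```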

commit 9d88f0cbdb3e994c3ab62eb7534b0b5308cc5265
Author: Pei-Lun Lee <[email protected]>
Date:   2015-03-18T00:34:46Z

    [SPARK-6330] [SQL] Add a test case for SPARK-6330
    
    When getting file statuses, create the file system from each path instead
    of a single one from the Hadoop configuration.
    
    Author: Pei-Lun Lee <[email protected]>
    
    Closes #5039 from ypcat/spark-6351 and squashes the following commits:
    
    a19a3fe [Pei-Lun Lee] [SPARK-6330] [SQL] fix test
    506f5a0 [Pei-Lun Lee] [SPARK-6351] [SQL] fix test
    fa2290e [Pei-Lun Lee] [SPARK-6330] [SQL] Rename test case and add comment
    606c967 [Pei-Lun Lee] Merge branch 'master' of 
https://github.com/apache/spark into spark-6351
    896e80a [Pei-Lun Lee] [SPARK-6351] [SQL] Add test case
    2ae0916 [Pei-Lun Lee] [SPARK-6351] [SQL] ParquetRelation2 supporting 
multiple file systems
    
    (cherry picked from commit 4633a87b86a6ef01fa724d31763dcb97cb5bc746)
    Signed-off-by: Cheng Lian <[email protected]>

commit 3ea38bc3d8c6b6a5f75ee23a4a8799a1d137c6dd
Author: Yin Huai <[email protected]>
Date:   2015-03-18T01:41:06Z

    [SPARK-6366][SQL] In Python API, the default save mode for save and 
saveAsTable should be "error" instead of "append".
    
    https://issues.apache.org/jira/browse/SPARK-6366
    
    Author: Yin Huai <[email protected]>
    
    Closes #5053 from yhuai/SPARK-6366 and squashes the following commits:
    
    fc81897 [Yin Huai] Use error as the default save mode for save/saveAsTable.
    
    (cherry picked from commit dc9c9196d63aa465e86ac52f0e86e10c12472100)
    Signed-off-by: Cheng Lian <[email protected]>

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
