What do you think about using a streaming RDD (a DStream) in this case?

Assuming streaming is available for PySpark, you could then collect results based on the number of events.
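
Something like this, perhaps (a rough sketch only, assuming the file can be fed in as a stream, e.g. by dropping chunks into a monitored directory; the path and batch interval below are made up):

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="ErrorGrep")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches

# Each micro-batch becomes an RDD, so matches surface as they are
# found instead of only after the whole scan completes.
lines = ssc.textFileStream("hdfs:///path/to/incoming/")  # hypothetical path
errors = lines.filter(lambda line: "error" in line)
errors.pprint()  # or saveAsTextFiles(...) to persist partial results

ssc.start()
ssc.awaitTermination()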

Best,

Matt

On 09/02/2014 10:38 AM, Andrew Or wrote:
Spark-shell, or any other Spark application, does not return the full results of
a job until it has finished executing. You could add a hook to write
partial results to a file, but you may want to do so sparingly to
incur fewer I/O operations. If you have a large file and the result
contains many lines, the results are unlikely to fit fully in memory
anyway, so it's probably not a bad idea to just write them to a file
in batches while the application is still running.
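
For example, something along these lines (a sketch only; it assumes your Spark version exposes RDD.toLocalIterator in PySpark, and the path and batch size are made up):

from pyspark import SparkContext

sc = SparkContext(appName="PartialResults")
errors = sc.textFile("hdfs:///path/to/large/file") \
           .filter(lambda line: "error" in line)

# toLocalIterator() pulls results to the driver one partition at a
# time, so matches can be flushed to disk while later partitions are
# still being computed.
BATCH = 10000  # write in 10k-line batches to keep I/O calls infrequent
buf = []
with open("errors.txt", "w") as out:
    for line in errors.toLocalIterator():
        buf.append(line)
        if len(buf) >= BATCH:
            out.write("\n".join(buf) + "\n")
            buf = []
    if buf:  # flush whatever is left over
        out.write("\n".join(buf) + "\n")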

-Andrew


2014-09-01 22:16 GMT-07:00 Hao Wang <wh.s...@gmail.com>:

    Hi, all

    I am wondering, if I use spark-shell to scan a large file for lines
    containing "error", whether the shell returns results while the job
    is executing, or only after the job has completely finished.
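
    For concreteness, the kind of job I mean, written in pyspark syntax
    with a made-up path, is roughly:

        errors = sc.textFile("hdfs:///path/to/large/file") \
                   .filter(lambda l: "error" in l)
        errors.collect()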

    Regards,
    Wang Hao (王灏)

    CloudTeam | School of Software Engineering
    Shanghai Jiao Tong University
    Address: 800 Dongchuan Road, Minhang District, Shanghai, 200240
    Email: wh.s...@gmail.com



