Hi Andrew,
Thanks for re-confirming the problem. I thought it only happened in my own build. :)
By the way, we have multiple users using spark-shell to explore their datasets, and we are continuously looking into ways to isolate their job history. In the current situation, we can't really ask them to create their own spark-defaults.conf, since that file is read-only. A workaround is to point the event log dir at a shared folder, e.g. /user/spark/logs, with permissions 1777. This isn't really ideal, since other people can see what other jobs are running on the shared cluster.
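For reference, a rough sketch of that workaround (run by someone with HDFS admin rights; the path is just our example):

  hdfs dfs -mkdir -p /user/spark/logs
  hdfs dfs -chmod 1777 /user/spark/logs

The sticky bit (the leading 1) at least keeps users from deleting each other's logs, but the directory listing is still world-readable.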
It would be nice to have better security here; if this were enhanced, people wouldn't be exposing their algorithms (which are usually embedded in their jobs' names) to other users.
Is there, or will there be, a JIRA ticket to track this? Any plans to enhance this part of spark-shell?


Date: Mon, 28 Jul 2014 13:54:56 -0700
Subject: Re: Issues on spark-shell and spark-submit behave differently on 
spark-defaults.conf parameter spark.eventLog.dir
From: and...@databricks.com
To: user@spark.apache.org

Hi Andrew,
It's definitely not bad practice to use spark-shell with the HistoryServer. The issue here is not with spark-shell, but with the way we pass Spark configs to the application. spark-defaults.conf does not currently support embedding environment variables; it interprets everything as a string literal.
You will have to manually specify "test" instead of "$USER" in the path you 
provide to spark.eventLog.dir.
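If you need the per-user path without editing the shared defaults file, one workaround (a sketch on my part; it assumes your build's spark-shell forwards --properties-file to spark-submit, and the paths are examples) is to have each user's login shell expand the variable into a private copy:

  sed "s|\$USER|$USER|g" /etc/spark/spark-defaults.conf > ~/spark-user.conf
  spark-shell --properties-file ~/spark-user.conf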

-Andrew

2014-07-28 12:40 GMT-07:00 Andrew Lee <alee...@hotmail.com>:

Hi All,
Not sure if anyone has run into this problem, but it exists in Spark 1.0.0 when you use the $USER env variable in the location you specify in conf/spark-defaults.conf:

spark.eventLog.dir hdfs:///user/$USER/spark/logs

For example, I'm running the command as user 'test'.
With spark-submit, the folder is created on the fly, and you will see the event logs on HDFS at /user/test/spark/logs/spark-pi-1405097484152.

But with spark-shell, the user 'test' folder is not created; instead you will see /user/$USER/spark/logs on HDFS. In other words, it tries to create /user/$USER/spark/logs (literally) instead of /user/test/spark/logs.

It looks like spark-shell doesn't pick up the $USER env variable when it resolves the eventLog directory for the running user 'test'.
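A quick way to confirm is to list the literal directory on HDFS (single-quote the path so your local shell doesn't expand it):

  hdfs dfs -ls '/user/$USER/spark/logs'

If that listing succeeds, the unexpanded directory was created.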

Is this considered a bug, or is it bad practice to use spark-shell with Spark's HistoryServer?
