[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

I'm closing this PR since there will be a better solution for this (e.g. [ZEPPELIN-1993](https://issues.apache.org/jira/browse/ZEPPELIN-1993)) :)

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user 1ambda commented on the issue: https://github.com/apache/zeppelin/pull/1339

Short summary and small thought about #1399

1. Using a symlink like `local-spark/master` would be safe, I think. It lets users replace their local Spark without renaming directories. Currently we are using a hard-coded name.

```
SPARK_CACHE="local-spark"
SPARK_ARCHIVE="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"
```

2. About UX,
   - most users (experienced in Zeppelin) do not use local Spark
   - for newcomers, we can provide embedded Spark using the Docker image that will be shipped by #1538
   - and typing `get-spark` is not too hard even if new users do not use the Docker images.

3. Now users need to type `get-spark`. It works as described:

```
$ zeppelin-review git:(pr/1339) ./bin/zeppelin-daemon.sh start
Log dir doesn't exist, create /Users/1ambda/github/apache-zeppelin/zeppelin-review/logs
Pid dir doesn't exist, create /Users/1ambda/github/apache-zeppelin/zeppelin-review/run

You do not have neither local-spark, nor external SPARK_HOME set up.
If you want to use Spark interpreter, you need to run get-spark at least one time or set SPARK_HOME.

Zeppelin start                                             [  OK  ]

$ zeppelin-review git:(pr/1339) ./bin/zeppelin-daemon.sh stop
Zeppelin stop                                              [  OK  ]

$ zeppelin-review git:(pr/1339) ./bin/zeppelin-daemon.sh get-spark
Download spark-2.0.1-bin-hadoop2.7.tgz from mirror ...

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  178M  100  178M    0     0  7157k      0  0:00:25  0:00:25 --:--:-- 6953k

spark-2.0.1-bin-hadoop2.7 is successfully downloaded and saved under /Users/lambda/github/apache-zeppelin/zeppelin-review/local-spark

$ zeppelin-review git:(pr/1339) ./bin/zeppelin-daemon.sh start
Zeppelin start                                             [  OK  ]
```
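The symlink idea in point 1 could be sketched roughly as follows. This is only an illustration, not code from the PR: the function name `link_local_spark` and its arguments are hypothetical, and `master` is simply the link name suggested in the comment above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: keep a stable "master" symlink inside the cache
# directory that points at whichever versioned Spark directory is current.

link_local_spark() {
  local cache_dir="$1"     # e.g. local-spark (SPARK_CACHE in the comment above)
  local archive_name="$2"  # e.g. spark-2.0.1-bin-hadoop2.7 (SPARK_ARCHIVE)

  # -s: symbolic link, -f: replace an existing link,
  # -n: replace the link itself instead of descending into its target directory
  ln -sfn "${archive_name}" "${cache_dir}/master"
}
```

With a layout like this, switching Spark versions would only mean calling `link_local_spark` again with a different archive name; nothing that refers to `local-spark/master` would need to be renamed.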
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user Leemoonsoo commented on the issue: https://github.com/apache/zeppelin/pull/1339

In case the user

1) doesn't plan to use the Spark interpreter and just wants to use other interpreters like Python or BigQuery, or
2) sets SPARK_HOME in the interpreter property instead of `conf/zeppelin-env.sh`,

the user may not be interested in local-spark, but will keep seeing these messages:

```
Lees-MacBook:pr1339 moon$ bin/zeppelin-daemon.sh start

You do not have neither local-spark, nor external SPARK_HOME set up.
If you want to use Spark interpreter, you need to run get-spark at least one time or set SPARK_HOME.

Zeppelin start                                             [  OK  ]
Lees-MacBook:pr1339 moon$ bin/zeppelin-daemon.sh stop
Zeppelin stop                                              [  OK  ]
Lees-MacBook:pr1339 moon$ bin/zeppelin-daemon.sh start

You do not have neither local-spark, nor external SPARK_HOME set up.
If you want to use Spark interpreter, you need to run get-spark at least one time or set SPARK_HOME.

Zeppelin start                                             [  OK  ]
```

@AhyoungRyu What do you think?
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

@bzz Just updated `upgrade.md` per your feedback.

@1ambda Sure. Thanks! Please do :)
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user 1ambda commented on the issue: https://github.com/apache/zeppelin/pull/1339

Let me also review this great PR and then give some feedback.
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user bzz commented on the issue: https://github.com/apache/zeppelin/pull/1339

Thank you @AhyoungRyu for the great job and for taking care to address the [user experience concerns](https://github.com/apache/zeppelin/pull/1339#issuecomment-259683752)!
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

ping
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

CI is green now, so this is ready for review. I updated the related docs again based on #1615 and @tae-jun's feedback as well.

@bzz Could you take a look at this again? As I mentioned in [this comment](https://github.com/apache/zeppelin/pull/1339#issuecomment-259683752), I added `You do not have neither local-spark, nor external SPARK_HOME set up.\nIf you want to use Spark interpreter, you need to run get-spark at least one time or set SPARK_HOME.` This message will be printed when the user starts Zeppelin if they have neither a `local-spark/` directory nor an external `SPARK_HOME` set on their machine. Please see [my latest commit](https://github.com/apache/zeppelin/pull/1339/commits/2747d9eec49aa04f92ac93408f4c00cb101cb23e) :)

Maybe this message can be removed in the future, once Zeppelin users have become accustomed to this change.
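The condition described in the comment above could look roughly like the sketch below. It is not the code from the linked commit: the function name `spark_setup_hint` and its single argument (the Zeppelin install directory) are assumptions for illustration, and only the message text is taken from the thread.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the start-up hint: print the message only when
# SPARK_HOME is unset/empty AND there is no local-spark directory.

spark_setup_hint() {
  local zeppelin_home="$1"  # assumed: root directory of the Zeppelin install

  if [ -z "${SPARK_HOME:-}" ] && [ ! -d "${zeppelin_home}/local-spark" ]; then
    echo "You do not have neither local-spark, nor external SPARK_HOME set up."
    echo "If you want to use Spark interpreter, you need to run get-spark at least one time or set SPARK_HOME."
  fi
}
```

Under this sketch, a user who sets `SPARK_HOME` only as an interpreter property (and not in the environment) would still see the hint on every start.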
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user tae-jun commented on the issue: https://github.com/apache/zeppelin/pull/1339

@AhyoungRyu Thanks for taking care of my feedback.
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

### To whom it may concern: breaking the current UX with this change

This change has many benefits compared to the current embedded Spark, as I wrote in the PR description (and as @tae-jun mentioned in [this comment](https://github.com/apache/zeppelin/pull/1339#issuecomment-259486249) as well. Thanks!). But as always, this kind of big change brings downsides as well (e.g. breaking the current UX). So I want to write down how we can address some major cases, as below. I think it would be better to share my opinion and get more feedback before merging. :)

1. New Spark/Zeppelin user, running Zeppelin for the first time: Quite easy to cover, and already handled by updating the related docs pages, I guess.
2. Existing Spark/Zeppelin user, running a new Zeppelin installation (e.g. upgrading version): Definitely this case is harder to handle than 1, as the user already expects that local mode will **just work** and surely won't read the docs. To resolve this, I'll update `bin/download-spark.sh` to print something like "You don't have local-spark/, you can download embedded Spark with `get-spark` option." when the user runs `./bin/zeppelin-daemon.sh start`. This sentence can be removed in the future, once Zeppelin users have become accustomed to the `get-spark` option.
3. Docker user, starting `bin/zeppelin.sh` inside the container: This one can also be hard to handle because the user might assume that Spark just works. So I would suggest applying this change to #1538 as a first step, since it can be a Zeppelin-provided official Docker script.
4. CI issue: Since @bzz raised some concerns about CI, let me answer again here to make sure :) The reason I removed `-Ppyspark` in `.travis.yml` is that the `pyspark` profile only exists in `spark-dependencies/pom.xml`, so the `pyspark` profile won't exist anymore after this PR is merged. Actually, the PySpark test case that @astroshim added recently had a conflict with this change. But we solved it by simply adding `` export SPARK_HOME=`pwd`/spark-$SPARK_VER-bin-hadoop$HADOOP_VER `` to `.travis.yml` so that Travis runs it before running the script. So there are no more CI issues, especially concerning the removal of the `spark-dependencies`-related build profiles.
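For readers unfamiliar with how that `SPARK_HOME` path is composed, here is a minimal sketch. The default values `2.0.1` and `2.7` are assumptions chosen to match the archive name seen elsewhere in this thread; on CI, `SPARK_VER` and `HADOOP_VER` would come from the build matrix instead.

```shell
#!/usr/bin/env bash
# Assumed example defaults; a CI environment would set these itself.
SPARK_VER="${SPARK_VER:-2.0.1}"
HADOOP_VER="${HADOOP_VER:-2.7}"

# $(pwd) is equivalent to the backtick form `pwd` used in the comment above.
export SPARK_HOME="$(pwd)/spark-${SPARK_VER}-bin-hadoop${HADOOP_VER}"
echo "SPARK_HOME=${SPARK_HOME}"
```

The resulting directory name matches the layout of an unpacked Spark binary archive, so tests that rely on `SPARK_HOME` find Spark in the working directory.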
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

@tae-jun I appreciate your nice feedback! I'll update `zeppelin-env.sh` and `install.md` again instead of `README.md`, as you suggested (since #1615 is trying to make it simpler, delivering only key content).
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user tae-jun commented on the issue: https://github.com/apache/zeppelin/pull/1339

Tested and worked as expected. Fantastic work!

But there are other benefits from this change. Before this change, users couldn't use the `SPARK_SUBMIT_OPTIONS` env variable with embedded Spark. (Am I right?) But now it's possible! I tested with `export SPARK_SUBMIT_OPTIONS="--driver-memory 4G"` and I could confirm on the Spark UI that it works.

![image](https://cloud.githubusercontent.com/assets/8201019/20149088/128d4bc4-a6f3-11e6-9990-705040e04a59.png)

Therefore, I think it would be better to update `conf/zeppelin-env.sh`. There is a comment which reads:

```sh
## Use provided spark installation ##
## defining SPARK_HOME makes Zeppelin run spark interpreter process using spark-submit
##
# export SPARK_HOME                # (required) When it is defined, load it instead of Zeppelin embedded Spark libraries
# export SPARK_SUBMIT_OPTIONS      # (optional) extra options to pass to spark submit. eg) "--driver-memory 512M --executor-memory 1G".
# export SPARK_APP_NAME            # (optional) The name of spark application.

## Use embedded spark binaries ##
## without SPARK_HOME defined, Zeppelin still able to run spark interpreter process using embedded spark binaries.
## however, it is not encouraged when you can define SPARK_HOME
##
```

This should be updated properly. In my opinion, it doesn't need to encourage using external Spark anymore :-)

And is it possible to use embedded Spark without `get-spark`? If not, I think that should be stated clearly in the README.

LGTM
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

@bzz Yeah, I also wanted to get more feedback on this change since it's a huge change, as you said. Thanks for asking, and I'm willing to explain again :)

> **1.** Is the comment above how it works now? Meaning, on the first run of ./bin/zeppelin-daemon.sh or ./bin/zeppelin.sh, does a download of Apache Spark (100+Mb) happen without asking the user?

At first, I intended to ask something like "Do you want to download local Spark?" when the user starts the Zeppelin daemon. But there are lots of things to think about, since this question would be asked before the Zeppelin server starts. E.g. [some people are using Zeppelin as a start-up service](https://github.com/apache/zeppelin/pull/1339#issuecomment-250672904) with their own script, as @jongyoul said. This kind of interactive mode would disturb their environment. So I decided to download local Spark with `./bin/zeppelin-daemon.sh get-spark` or `./bin/zeppelin.sh get-spark`. With the `get-spark` option, users are not prompted and can choose whether or not to download local-mode Spark. They can also use this local Spark without any configuration, aka `zero configuration`. But we need to let them know that the `get-spark` option exists. That's why I updated the documentation pages.

> **2.** Does this also mean that on CI it will happen on every run of SeleniumTests as well?

This change won't affect the CI build. I added `./bin/download-spark.sh` to download Spark only when the user runs `./bin/zeppelin-daemon.sh get-spark`.

> **3.** -Ppyspark disappeared, but I remember it was added because we need to re-pack some files from Apache Spark to incorporate them into the Zeppelin build in order for it to work on a cluster. Is that not the case any more? For Spark standalone and YARN, etc.

The `pyspark` profile only exists in `spark-dependencies` (please see [here](https://github.com/apache/zeppelin/blob/master/spark-dependencies/pom.xml#L820)). Since `spark-dependencies` won't exist anymore, `-Ppyspark` needs to be removed accordingly, I guess.
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user bzz commented on the issue: https://github.com/apache/zeppelin/pull/1339

Guys, what great work here, simplifying the build!

A quick question, @AhyoungRyu, as it's kind of a big change, and I'm sorry if this was explained before, but could you please recap:

```
AhyoungRyu commented on Sep 17
@Leemoonsoo Thanks for your quick feedback! The "zero configuration like before" makes sense. Let me update and will ping you again.
```

1. Is the comment above how it works now? Meaning, on the first run of `./bin/zeppelin-daemon.sh` or `./bin/zeppelin.sh`, does a download of Apache Spark (100+Mb) happen without asking the user?
2. Does this also mean that on CI it will happen on every run of SeleniumTests as well?
3. `-Ppyspark` disappeared, but I remember it was added because we need to re-pack some files from Apache Spark to incorporate them into the Zeppelin build in order for it to work on a cluster. Is that not the case any more? For Spark standalone and YARN, etc.

Thanks in advance!
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

Updated the related docs pages ([README.md](https://github.com/apache/zeppelin/pull/1339/files#diff-04c6e90faac2675aa89e2176d2eec7d8), [spark.md](https://github.com/apache/zeppelin/pull/1339/files#diff-83df2e7970d5a53a9028d05098bc626d), [upgrade.md](https://github.com/apache/zeppelin/pull/1339/files#diff-f472957a611b3e4d6c1171edca51cf93) and [install.md](https://github.com/apache/zeppelin/pull/1339/files#diff-f472957a611b3e4d6c1171edca51cf93)) and CI has passed now.
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

@astroshim It passed at last!! Thanks again. I'll update the related docs if there is no further discussion about these changes :)
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

All tests pass (except for the Selenium test) in my own Travis [AhyoungRyu/zeppelin/builds](https://travis-ci.org/AhyoungRyu/zeppelin/builds/174094481), but the Zeppelin Travis build hasn't even started... [apache/zeppelin/build](https://travis-ci.org/apache/zeppelin/builds/174094499).
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

@astroshim I appreciate your help! I've just pushed it, so let's wait until it finishes :)
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

@Leemoonsoo @jongyoul Sorry for my late update. I've just added a new option, `get-spark`, to [zeppelin-daemon.sh](https://github.com/apache/zeppelin/pull/1339/files#diff-bd1714fd11d1853b691468647374113dR23) and [zeppelin.sh](https://github.com/apache/zeppelin/pull/1339/files#diff-1724182f3ebaf54f5c9e202dcdf82415R46) to download the local Spark binary. I think this is simpler than getting the user's answer and then separating the "interactive mode" and "non-interactive mode" that @jongyoul mentioned [here](https://github.com/apache/zeppelin/pull/1339#issuecomment-250672904).

So to sum up, with my latest update people can download local Spark with `./bin/zeppelin-daemon.sh get-spark` or `./bin/zeppelin.sh get-spark`. If this way is okay, I'll update the related docs pages accordingly. I think we need to let people know that the `get-spark` option exists by updating the documentation. What do you think?
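One way to picture an explicit `get-spark` subcommand is a plain `case` dispatch, sketched below. This is not the PR's actual `zeppelin-daemon.sh`: `dispatch`, `start_daemon` and `download_spark` are hypothetical names, and the two helpers are stubs that only print a line so the sketch stays runnable without a network.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for the real start-up and download logic.
start_daemon()   { echo "Zeppelin start"; }
download_spark() { echo "downloading local Spark"; }

dispatch() {
  case "${1:-}" in
    start)
      start_daemon
      ;;
    get-spark)
      # The download happens only when the user asks for it explicitly,
      # so "start" never prompts or blocks.
      download_spark
      ;;
    *)
      echo "Usage: $0 {start|get-spark}" >&2
      return 1
      ;;
  esac
}
```

Because the download is its own subcommand, `start` stays non-interactive, which is what makes this approach compatible with service managers.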
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

@jongyoul Thanks for your feedback! Yeah, I didn't try to cover that case. So you mean we need to support people who are using [this upstart option](http://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/install/install.html#optional-start-apache-zeppelin-with-a-service-manager), am I right? :)
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user jongyoul commented on the issue: https://github.com/apache/zeppelin/pull/1339

@AhyoungRyu Thanks for your effort. LGTM. But I think it would be better to support a non-interactive mode for running the server, because some users launch Zeppelin as a start-up service on their servers and an interactive mode would break this.
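To illustrate the concern, a script could guard any prompt behind a terminal check, as in the sketch below. This is an assumption-laden illustration and not something the PR implements: `can_prompt` and `maybe_ask_download` are hypothetical names, and the prompt wording is borrowed from the discussion only as an example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: true only when stdin is attached to a terminal.
can_prompt() {
  [ -t 0 ]
}

maybe_ask_download() {
  local answer
  if can_prompt; then
    # Interactive shell: it is safe to wait for the user's answer.
    read -r -p "Do you want to download local Spark? (y/N) " answer
    [ "${answer}" = "y" ]
  else
    # Non-interactive launch (e.g. a start-up service): never block on input.
    return 1
  fi
}
```

A service manager typically starts the script without a terminal on stdin, so under this sketch the prompt is skipped and the start-up is not blocked.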
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

ping
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

I think [ZEPPELIN-1101](https://issues.apache.org/jira/browse/ZEPPELIN-1101) can also be resolved by this change.

> It looks related to ZEPPELIN-1099 which is about removing dependencies from Spark. I think we don't need to build spark-dependencies by ourselves. we'd better support script to download spark binary and set SPARK_HOME. How about it?

@jongyoul Since you replied as above in ZEPPELIN-1101, could you please take a look at this one? :)
[GitHub] zeppelin issue #1339: [ZEPPELIN-1332] Remove spark-dependencies & suggest ne...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1339

I think this PR is working as expected (at least for me, haha). So it's ready for review again. @moon If possible, could you please check this one again? :)