[GitHub] zeppelin pull request #:
Github user lresende commented on the pull request: https://github.com/apache/zeppelin/commit/bd714c2b96d28b9b6e1b2c71431ace99e5e963ec#commitcomment-18310691 In spark/src/main/java/org/apache/zeppelin/spark/DepInterpreter.java on line 179: Are you just building a regular distribution of Zeppelin and using an official release of Spark 1.5.1? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (ZEPPELIN-1216) Add a matrix table about "Zeppelin version X available interpreters"
Ahyoung created ZEPPELIN-1216:
Summary: Add a matrix table about "Zeppelin version X available interpreters"
Key: ZEPPELIN-1216
URL: https://issues.apache.org/jira/browse/ZEPPELIN-1216
Project: Zeppelin
Issue Type: Improvement
Components: documentation
Reporter: Ahyoung
Assignee: Ahyoung
Priority: Minor
There is no description of the interpreters available in each Zeppelin version. It would be helpful if Zeppelin provided a matrix table so that users can compare "Zeppelin version X each available interpreter (with specific version)" at a glance. Maybe the `[download page|https://zeppelin.apache.org/download.html]` is the proper place for this table.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] zeppelin issue #1206: ZEPPELIN-1215. Need to login using keytab and principa...
Github user zjffdu commented on the issue: https://github.com/apache/zeppelin/pull/1206 pyspark and sparkr both create SparkContext, so this fix also works in pyspark & sparkr
[GitHub] zeppelin issue #1160: [Zeppelin - 1152] Listing note revision history
Github user Leemoonsoo commented on the issue: https://github.com/apache/zeppelin/pull/1160 @corneadoug @khalidhuseynov Or how about aligning the commit message left, aligning the date right, and using a fixed-length date format? Then it gives a little better readability, I think.
```
add visualize paragraph...   2016-07-06 10:12:34
add tutorial                 2016-07-06 09:11:11
the first commit             2016-05-23 23:31:33
```
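The layout suggested above (message on the left, fixed-length date on the right, long messages shortened with an ellipsis) can be sketched roughly as follows. This is only an illustration in Python, not the Zeppelin front-end code: the function name `format_revision_row`, the `width` parameter and its default are assumptions made for the example.

```python
from datetime import datetime

# Fixed-length date format: 19 characters for four-digit years.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def format_revision_row(message: str, committed_at: datetime, width: int = 48) -> str:
    """Left-align the commit message and right-align the date in `width` columns.

    Hypothetical helper for illustration; a too-long message is cut and
    marked with "..." so that the date always stays visible.
    """
    date_text = committed_at.strftime(DATE_FORMAT)
    # Columns left for the message after reserving the date and a one-space gap.
    room = width - len(date_text) - 1
    if room < 4:
        raise ValueError("width too small to show a message next to the date")
    if len(message) > room:
        message = message[: room - 3] + "..."
    return message.ljust(room) + " " + date_text


if __name__ == "__main__":
    rows = [
        ("add visualize paragraph with a rather long commit message", datetime(2016, 7, 6, 10, 12, 34)),
        ("add tutorial", datetime(2016, 7, 6, 9, 11, 11)),
        ("the first commit", datetime(2016, 5, 23, 23, 31, 33)),
    ]
    for message, committed_at in rows:
        print(format_revision_row(message, committed_at))
```

With this approach the title is what gets hidden when space runs out, while the date is never truncated.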
[GitHub] zeppelin issue #1160: [Zeppelin - 1152] Listing note revision history
Github user khalidhuseynov commented on the issue: https://github.com/apache/zeppelin/pull/1160 @corneadoug yeah, I changed the order since it's more intuitive to have the name first and the date afterwards. But you're right, when the name is too long the ellipsis doesn't look good. How do you think we can fix it: maybe move the date to a new line, or any other suggestions?
[GitHub] zeppelin issue #1160: [Zeppelin - 1152] Listing note revision history
Github user corneadoug commented on the issue: https://github.com/apache/zeppelin/pull/1160 @khalidhuseynov The previous order in the list was Date - Title; now it is Title - Date. As you can see in your screenshot, when the text gets too long, there is an ellipsis: `add visualiz paragraph - July 19th 2016, 5...`
[GitHub] zeppelin issue #1160: [Zeppelin - 1152] Listing note revision history
Github user khalidhuseynov commented on the issue: https://github.com/apache/zeppelin/pull/1160 @corneadoug I'm not sure exactly what you mean, could you provide an example? @anthonycorbacho because:
* there are cases when the backend will send the list triggered by a different ws event (checkpoint), for example [here](https://github.com/khalidhuseynov/incubator-zeppelin/blob/f676b741c74d308157e7cf60ab3b1db59d215e95/zeppelin-server/src/main/java/org/apache/zeppelin/socket/NotebookServer.java#L1132)
* to stay consistent with the original work and provide the whole API through websocket, possibly providing a REST API later as well
[GitHub] zeppelin issue #1186: [ZEPPELIN-1179] Append scala version to maven artifact...
Github user minahlee commented on the issue: https://github.com/apache/zeppelin/pull/1186 I have one question before merging, is there any special reason that ignite doesn't have `_2.11` suffix?
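For context, the `_2.11` suffix follows the common convention of appending the Scala binary version (major.minor) to an artifact id. A minimal sketch of that naming convention is shown below; the helper name `with_scala_suffix` and the sample artifact id are hypothetical, and this is not how the Maven build in the pull request computes the name.

```python
def with_scala_suffix(artifact_id: str, scala_version: str) -> str:
    """Append the Scala binary version (major.minor) to an artifact id.

    Hypothetical helper for illustration only.
    """
    parts = scala_version.split(".")
    if len(parts) < 2:
        raise ValueError("expected a Scala version such as '2.11' or '2.11.8'")
    major, minor = parts[0], parts[1]
    return f"{artifact_id}_{major}.{minor}"


if __name__ == "__main__":
    # Sample artifact id chosen for the example.
    print(with_scala_suffix("zeppelin-spark", "2.11.8"))
```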
[GitHub] zeppelin issue #1186: [ZEPPELIN-1179] Append scala version to maven artifact...
Github user minahlee commented on the issue: https://github.com/apache/zeppelin/pull/1186 The CI failures seem unrelated. LGTM, merging if there is no more discussion.
[GitHub] zeppelin issue #1160: [Zeppelin - 1152] Listing note revision history
Github user anthonycorbacho commented on the issue: https://github.com/apache/zeppelin/pull/1160 @khalidhuseynov What is the benefit of having a new websocket call? Will a REST API call be enough?
[GitHub] zeppelin issue #1160: [Zeppelin - 1152] Listing note revision history
Github user corneadoug commented on the issue: https://github.com/apache/zeppelin/pull/1160 Is it better to have the Date potentially hidden or the title hidden?
[GitHub] zeppelin pull request #1199: [HOTFIX][ZEPPELIN-1169] Fix wrong Guava version
Github user asfgit closed the pull request at: https://github.com/apache/zeppelin/pull/1199
[GitHub] zeppelin issue #1199: [HOTFIX][ZEPPELIN-1169] Fix wrong Guava version
Github user jongyoul commented on the issue: https://github.com/apache/zeppelin/pull/1199 @bzz I've installed the Safari driver for Selenium 2.48.2, which is the same version that Zeppelin uses. But it's deprecated now, so I'll try to do it with Firefox 31.
[GitHub] zeppelin issue #1207: [DOC][ZEPPELIN-1209] Remove a useless sentence about d...
Github user jongyoul commented on the issue: https://github.com/apache/zeppelin/pull/1207 @AhyoungRyu Thanks for the quick fix. Could you please add a note that this feature is `Deprecated`? I plan to remove its dependency by 0.7.0. Other than that, LGTM.
[GitHub] zeppelin issue #1207: [DOC][ZEPPELIN-1209] Remove a useless sentence about d...
Github user AhyoungRyu commented on the issue: https://github.com/apache/zeppelin/pull/1207 @zjffdu @jongyoul Could you review this one? :)
[GitHub] zeppelin pull request #1207: [DOC][ZEPPELIN-1209] Remove a useless sentence ...
GitHub user AhyoungRyu opened a pull request: https://github.com/apache/zeppelin/pull/1207
[DOC][ZEPPELIN-1209] Remove a useless sentence about default interpreter in docs
### What is this PR for?
Since the new interpreter registration mechanism introduced in [ZEPPELIN-804](https://issues.apache.org/jira/browse/ZEPPELIN-804), we can no longer set the default interpreter using `zeppelin-site.xml` as described in [https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/install/install.html#apache-zeppelin-configuration](https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/install/install.html#apache-zeppelin-configuration) (see the `zeppelin.interpreters` property description in the configuration table). So we need to remove the related content from the Zeppelin docs site. The pages below will be updated:
- [https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/install/install.html#apache-zeppelin-configuration](https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/install/install.html#apache-zeppelin-configuration)
- [https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/development/writingzeppelininterpreter.html#060-and-later](https://zeppelin.apache.org/docs/0.7.0-SNAPSHOT/development/writingzeppelininterpreter.html#060-and-later)
### What type of PR is it?
Documentation
### What is the Jira issue?
[ZEPPELIN-1209](https://issues.apache.org/jira/browse/ZEPPELIN-1209)
### How should this be tested?
No need to test. Just removed two sentences about setting a default interpreter.
### Screenshots (if appropriate)
### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? no
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/AhyoungRyu/zeppelin ZEPPELIN-1209
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/zeppelin/pull/1207.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #1207
commit 9f44f224b894eb0fce9da7ad4c13ae12b96662a8
Author: AhyoungRyu
Date: 2016-07-20T04:30:26Z
    Remove useless sentence about default interpreter in docs
[GitHub] zeppelin issue #1206: ZEPPELIN-1215. Need to login using keytab and principa...
Github user zjffdu commented on the issue: https://github.com/apache/zeppelin/pull/1206 Might need to check whether it works in pyspark/sparkr.
[GitHub] zeppelin pull request #1206: ZEPPELIN-1215. Need to login using keytab and p...
GitHub user zjffdu opened a pull request: https://github.com/apache/zeppelin/pull/1206
ZEPPELIN-1215. Need to login using keytab and principal before creating SparkContext in secured cluster
### What is this PR for?
Need to login using keytab and principal before creating SparkContext in secured cluster.
### What type of PR is it?
[Bug Fix]
### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-1215
### How should this be tested?
Manually verified in a secured cluster.
### Screenshots (if appropriate)
### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? no
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/zjffdu/incubator-zeppelin ZEPPELIN-1215
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/zeppelin/pull/1206.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #1206
commit bd4a3d7f657f0b2b46f9f7a0e52c24aa0395be20
Author: Jeff Zhang
Date: 2016-07-20T04:18:30Z
    ZEPPELIN-1215. Need to login using keytab and principal before creating SparkContext in secured cluster
[jira] [Created] (ZEPPELIN-1215) Need to login using keytab and principal before creating SparkContext in secured cluster
Jeff Zhang created ZEPPELIN-1215:
Summary: Need to login using keytab and principal before creating SparkContext in secured cluster
Key: ZEPPELIN-1215
URL: https://issues.apache.org/jira/browse/ZEPPELIN-1215
Project: Zeppelin
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
[GitHub] zeppelin pull request #1192: [ZEPPELIN-1189] Get note revision websocket api
Github user asfgit closed the pull request at: https://github.com/apache/zeppelin/pull/1192
[GitHub] zeppelin issue #1192: [ZEPPELIN-1189] Get note revision websocket api
Github user bzz commented on the issue: https://github.com/apache/zeppelin/pull/1192 Looks great to me! Thank you for taking care. Merging if there is no further discussion.
[GitHub] zeppelin issue #1194: [ZEPPELIN-1193] Update Node JS related dependencies to...
Github user corneadoug commented on the issue: https://github.com/apache/zeppelin/pull/1194 @lresende we still get the warning from before (you can see it in the CI logs)
[GitHub] zeppelin issue #1195: [ZEPPELIN-759] Spark 2.0 support
Github user minahlee commented on the issue: https://github.com/apache/zeppelin/pull/1195 After looking closely at the pom files, I don't see a clear reason why spark.version is specified in the scala-2.11 and scala-2.10 profiles:
https://github.com/apache/zeppelin/blob/master/spark-dependencies/pom.xml#L347
https://github.com/apache/zeppelin/blob/master/r/pom.xml#L386
https://github.com/apache/zeppelin/blob/master/r/pom.xml#L398
And I think it's always good to ship the latest Spark version as the default. Can we also update the lines below?
https://github.com/apache/zeppelin/blob/master/spark-dependencies/pom.xml#L39
https://github.com/apache/zeppelin/blob/master/spark/pom.xml#L41
[GitHub] zeppelin pull request #1192: [ZEPPELIN-1189] Get note revision websocket api
Github user khalidhuseynov closed the pull request at: https://github.com/apache/zeppelin/pull/1192
[GitHub] zeppelin pull request #1192: [ZEPPELIN-1189] Get note revision websocket api
GitHub user khalidhuseynov reopened a pull request: https://github.com/apache/zeppelin/pull/1192
[ZEPPELIN-1189] Get note revision websocket api
### What is this PR for?
Adds websocket api for getting note revision.
### What type of PR is it?
Improvement | Feature
### Todos
* [x] - add backend websocket handle
* [x] - add frontend call
### What is the Jira issue?
[#1189](https://issues.apache.org/jira/browse/ZEPPELIN-1189)
### How should this be tested?
green CI (can be tested once frontend implemented)
### Screenshots (if appropriate)
### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? no
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/khalidhuseynov/incubator-zeppelin versioning/get-note-revision-api
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/zeppelin/pull/1192.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #1192
commit 79f8ac9525fb2f58fbde43511e742f65f47bf36e
Author: Khalid Huseynov
Date: 2016-07-15T09:03:06Z
    add getNoteRevision to front
commit 3783fc95439391125205c2393f10466088773561
Author: Khalid Huseynov
Date: 2016-07-15T09:12:26Z
    receive ws NOTE_REVISION msg
commit d9751c3117c0171076d5232f62b46ea9dd50b1bb
Author: Khalid Huseynov
Date: 2016-07-15T10:05:54Z
    add backend ws api to get note revision
commit baaa704fbf9ba24db09ef631a4ef2ceb32809396
Author: Khalid Huseynov
Date: 2016-07-15T10:32:33Z
    change NotebookRepo api to get note revision from Revision object to String revId
commit ce097ede81030da9273fea7b82dc38995d380e68
Author: Khalid Huseynov
Date: 2016-07-18T12:39:14Z
    add throws to notebook getRevisionNote
commit aa0a7d6af3655b7205c2e1ecb15733e90aff69a2
Author: Khalid Huseynov
Date: 2016-07-18T16:24:05Z
    Revert "change NotebookRepo api to get note revision from Revision object to String revId"
    This reverts commit baaa704fbf9ba24db09ef631a4ef2ceb32809396.
commit 683b481ddcf754eb81c041189a207625921eb1f2
Author: Khalid Huseynov
Date: 2016-07-18T16:45:41Z
    receive Revision object from frontend instead of revisionId string
commit f1ab9948c918bf921827fa90e558c57d9a9458f8
Author: Khalid Huseynov
Date: 2016-07-19T14:09:42Z
    Merge branch 'master' into versioning/get-note-revision-api
[GitHub] zeppelin issue #1193: [ZEPPELIN-1192] Block pyspark paragraph hang.
Github user astroshim commented on the issue: https://github.com/apache/zeppelin/pull/1193 please review this. :)
[GitHub] zeppelin issue #1195: [ZEPPELIN-759] Spark 2.0 support
Github user Leemoonsoo commented on the issue: https://github.com/apache/zeppelin/pull/1195 @minahlee I was trying to make the 'zeppelin-spark' interpreter support Spark 2.0 in this PR. If there is user demand, Spark 2.0 support in 'zeppelin-zrinterpreter' can be addressed in a separate issue.
[GitHub] zeppelin pull request #1203: [DOC][MINOR] Fix 'Drill JDBC Driver' link in jd...
Github user asfgit closed the pull request at: https://github.com/apache/zeppelin/pull/1203
[GitHub] zeppelin issue #1170: BigQuery Interpreter for Apazhe Zeppelin[ZEPPELIN-1153...
Github user bzz commented on the issue: https://github.com/apache/zeppelin/pull/1170 Looks great to me, thank you for taking care! I think the only thing left now is to determine the status of this dependency:
```
<dependency>
  <groupId>com.google.apis</groupId>
  <artifactId>google-api-services-bigquery</artifactId>
  <version>v2-rev265-1.21.0</version>
</dependency>
```
> Got it. These packages are licensed under Apache 2.0. I have asked around to see if the code is publicly available.

Is that an open source library? If so, let's link the sources in README.md
Re: [GSoC - 2016][Zeppelin Notebooks] Issues with Common Crawl Datasets
Hi Anish,

thank you for sharing your progress. I totally know what you mean: that's an expected pain of working with real big data.

I would advise conducting a series of experiments:

*1 moderate machine*, Spark 1.6 in local mode, 1 WARC input file (1 GB)
 - Spark in local mode is a single JVM process, so fine-tune it and make sure it uses ALL available memory (i.e. 16 GB)
 - We are not going to use in-memory caching, so the storage part can be turned off [1] and [2]
 - AFAIK DataFrames use memory more efficiently than RDDs, but I'm not sure if we can benefit from that here
 - Start with something simple, like `val mayBegLinks = mayBegData.keepValidPages().count()` and make sure it works
 - Proceed further until a few more complex queries work

*Cluster of N machines*, Spark 1.6 in standalone cluster mode
 - process a fraction of the whole dataset, i.e. 1 segment

I know that is not easy, but it's worth trying for 1 more week to see if the approach outlined above works.

Last but not least, do not hesitate to reach out to the CommonCrawl community [3] for advice; there are people using Apache Spark there as well.

Please keep us posted!

 1. http://spark.apache.org/docs/latest/tuning.html#memory-management-overview
 2. http://blog.cloudera.com/blog/2015/03/how-to-tune-your-apache-spark-jobs-part-2/
 3. https://groups.google.com/forum/#!forum/common-crawl

--
Alex

On Wed, Jul 20, 2016 at 2:27 AM, anish singh wrote:
> Hello,
>
> The last two weeks have been tough and full of learning, the code in the
> previous mail which performed only simple transformation and reduceByKey()
> to count similar domain links did not work even on the first segment(1005
> MB) of data. So I studied and read extensively on the web : blogs(cloudera,
> databricks and stack overflow) and books on Spark, tried all the options
> and configurations on memory and performance tuning but the code did not
> run. My current configurations to SPARK_SUBMIT_OPTIONS are set to
> "--driver-memory 9g --driver-java-options -XX:+UseG1GC
> -XX:+UseCompressedOops --conf spark.storage.memoryFraction=0.1" and even
> this does not work. Even simple operations such as rdd.count() after the
> transformations in the previous mail does not work. All this on an
> m4.xlarge machine.
>
> Moreover, in trying to set up standalone cluster on single machine by
> following instructions in the book 'Learning Spark', I messed with file
> '~/.ssh/authorized_keys' file which cut me out of the instance so I had to
> terminate it and start all over again after losing all the work done in one
> week.
>
> Today, I performed a comparison of memory and cpu load values using the
> size of data and the machine configurations between two conditions: (when I
> worked on my local machine) vs. (m4.xlarge single instance), where
>
> memory load = (data size) / (memory available for processing),
> cpu load = (data size) / (cores available for processing)
>
> the results of the comparison indicate that with the amount of data, the
> AWS instance is 100 times more constrained than the analysis that I
> previously did on my machine (for calculations, please see sheet [0] ).
> This has completely stalled work as I'm unable to perform any further
> operations on the data sets. Further, choosing another instance (such as 32
> GiB) may also not be sufficient (as per calculations in [0]). Please let me
> know if I'm missing something or how to proceed with this.
>
> [0]. https://drive.google.com/open?id=0ByXTtaL2yHBuYnJSNGt6T2U2RjQ
>
> Thanks,
> Anish.
>
> On Tue, Jul 12, 2016 at 12:35 PM, anish singh wrote:
>
> > Hello,
> >
> > I had been able to setup zeppelin with spark on aws ec2 m4.xlarge instance
> > a few days ago. In designing the notebook, I was trying to visualize the
> > link structure by the following code :
> >
> > val mayBegLinks = mayBegData.keepValidPages()
> >   .flatMap(r => ExtractLinks(r.getUrl, r.getContentString))
> >   .map(r => (ExtractDomain(r._1), ExtractDomain(r._2)))
> >   .filter(r => (r._1.equals("www.fangraphs.com")
> >     || r._1.equals("www.osnews.com") || r._1.equals("www.dailytech.com")))
> >
> > val linkWtMap = mayBegLinks.map(r => (r, 1)).reduceByKey((x, y) => x + y)
> > linkWtMap.toDF().registerTempTable("LnkWtTbl")
> >
> > where 'mayBegData' is some 2GB of WARC for the first two segments of May.
> > This paragraph runs smoothly but in the next paragraph using %sql and the
> > following statement :-
> >
> > select W._1 as Links, W._2 as Weight from LnkWtTbl W
> >
> > I get errors which are always java.lang.OutOfMemoryError because of
> > Garbage Collection space exceeded or heap space exceeded and the most
> > recent one is the following:
> >
> > org.apache.thrift.transport.TTransportException at
> > org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
> > at
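The two load ratios defined in the quoted message can be sketched as below. This is a minimal illustration in Python, assuming the ratios are computed exactly as stated there. The function name `load_ratios` is hypothetical, and the sample figures are only illustrative: the 1005 MB segment and 9 GB driver memory come from the message, while 4 cores is an assumption about the instance. None of it is taken from the referenced spreadsheet.

```python
from typing import Tuple


def load_ratios(data_mb: float, memory_mb: float, cores: int) -> Tuple[float, float]:
    """Return (memory load, cpu load) using the definitions from the message:

    memory load = data size / memory available for processing
    cpu load    = data size / cores available for processing
    """
    if memory_mb <= 0 or cores <= 0:
        raise ValueError("memory_mb and cores must be positive")
    memory_load = data_mb / memory_mb  # MB of data per MB of available memory
    cpu_load = data_mb / cores         # MB of data per available core
    return memory_load, cpu_load


if __name__ == "__main__":
    # Illustrative figures only (see the assumptions in the paragraph above).
    memory_load, cpu_load = load_ratios(data_mb=1005, memory_mb=9 * 1024, cores=4)
    print(f"memory load: {memory_load:.3f}, cpu load: {cpu_load:.2f} MB per core")
```

Comparing two setups then amounts to dividing the ratios obtained for each of them.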
[GitHub] zeppelin issue #1199: [HOTFIX][ZEPPELIN-1169] Fix wrong Guava version
Github user bzz commented on the issue: https://github.com/apache/zeppelin/pull/1199 Looks good to me. CI is green now. AFAIK it's not possible to use Selenium with Safari that easily, as it requires a signed browser extension. Firefox 31 should work well though.
[GitHub] zeppelin pull request #1199: [HOTFIX][ZEPPELIN-1169] Fix wrong Guava version
GitHub user jongyoul reopened a pull request: https://github.com/apache/zeppelin/pull/1199
[HOTFIX][ZEPPELIN-1169] Fix wrong Guava version
### What is this PR for?
Fixing the incompatible version for guava
### What type of PR is it?
[Hot Fix]
### Todos
* [x] - Revert guava.version for fitting in hadoop-2.6
### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1169
### How should this be tested?
1. `mvn clean package -Pspark-1.6 -Phadoop-2.4 -Pyarn -Ppyspark -DskipTests`
2. Run spark interpreter with simple script
### Screenshots (if appropriate)
### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No
You can merge this pull request into a Git repository by running:
    $ git pull https://github.com/jongyoul/zeppelin ZEPPELIN-1169
Alternatively you can review and apply these changes as the patch at:
    https://github.com/apache/zeppelin/pull/1199.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
    This closes #1199
commit e0b11c152be9516d3be88032ce196d0f52d28225
Author: Jongyoul Lee
Date: 2016-07-18T04:38:13Z
    Reverted guava.version for fitting in hadoop-2.6
[GitHub] zeppelin pull request #1199: [HOTFIX][ZEPPELIN-1169] Fix wrong Guava version
Github user jongyoul closed the pull request at: https://github.com/apache/zeppelin/pull/1199
[GitHub] zeppelin issue #1197: [ZEPPELIN-1196] Fix for bug ZEPPELIN-1196
Github user jongyoul commented on the issue: https://github.com/apache/zeppelin/pull/1197 LGTM. Merging if there's no more discussion.
[GitHub] zeppelin pull request #1205: [ZEPPELIN-1212] User impersonation support in J...
GitHub user prabhjyotsingh opened a pull request: https://github.com/apache/zeppelin/pull/1205

[ZEPPELIN-1212] User impersonation support in JDBC interpreter for Hive…

### What is this PR for?
Add impersonation support to JDBC interpreters, in addition to Kerberos authentication, to improve auditability in all JDBC interpreters.

### What type of PR is it?
[Bug Fix | Improvement]

### What is the Jira issue?
* [ZEPPELIN-1212](https://issues.apache.org/jira/browse/ZEPPELIN-1212)

### How should this be tested?
In the JDBC interpreter settings, add the following properties:
- zeppelin.jdbc.auth.type = KERBEROS
- zeppelin.jdbc.principal = principal value
- zeppelin.jdbc.keytab.location = keytab location
- enable shiro authentication via shiro.ini

Now try to run any Hive query (say, `show tables`); it should return valid results or errors depending on the user's permissions.

### Questions:
* Does the licenses files need update? n/a
* Is there breaking changes for older versions? n/a
* Does this needs documentation? n/a

… and Phoenix(Others)

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/prabhjyotsingh/zeppelin ZEPPELIN-1212

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zeppelin/pull/1205.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #1205

commit 66824a07445208b4ab4aa407c66abbc488b35eec
Author: Prabhjyot Singh
Date: 2016-07-19T16:36:16Z
ZEPPELIN-1212 User impersonation support in JDBC interpreter for Hive and Phoenix(Others)
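As a rough illustration of the test setup described in this PR, the sketch below checks a set of JDBC interpreter properties for the Kerberos entries it names. Only the three property names and the `KERBEROS` value come from the PR text; the helper `missing_kerberos_properties`, the dictionary layout, and the placeholder principal are assumptions made for the example and are not part of Zeppelin.

```python
# Hypothetical helper, not part of Zeppelin: checks a dict of JDBC
# interpreter properties for the Kerberos settings named in the PR.
REQUIRED_KERBEROS_KEYS = (
    "zeppelin.jdbc.principal",
    "zeppelin.jdbc.keytab.location",
)


def missing_kerberos_properties(props):
    """Return the Kerberos property names that are absent or blank."""
    auth_type = props.get("zeppelin.jdbc.auth.type", "").strip().upper()
    if auth_type != "KERBEROS":
        return []  # Kerberos is not requested, so nothing is required.
    return [key for key in REQUIRED_KERBEROS_KEYS
            if not props.get(key, "").strip()]


example = {
    "zeppelin.jdbc.auth.type": "KERBEROS",
    "zeppelin.jdbc.principal": "zeppelin@EXAMPLE.COM",  # placeholder value
    "zeppelin.jdbc.keytab.location": "",                # left blank on purpose
}
print(missing_kerberos_properties(example))
```

A check like this only confirms that the properties are filled in; whether the keytab and principal are actually valid can only be confirmed by running a query through the interpreter, as the PR suggests.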
[jira] [Created] (ZEPPELIN-1212) User impersonation support in JDBC interpreter for Hive and Phoenix(Others)
Prabhjyot Singh created ZEPPELIN-1212:
Summary: User impersonation support in JDBC interpreter for Hive and Phoenix(Others)
Key: ZEPPELIN-1212
URL: https://issues.apache.org/jira/browse/ZEPPELIN-1212
Project: Zeppelin
Issue Type: Bug
Components: zeppelin-server
Affects Versions: 0.6.0, 0.6.1, 0.7.0
Reporter: Prabhjyot Singh
Assignee: Prabhjyot Singh
Priority: Critical
Fix For: 0.6.1

Add impersonation support to JDBC interpreters, in addition to Kerberos Authentication to improve auditability in all JDBC interpreters.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] zeppelin pull request #:
Github user karup1990 commented on the pull request: https://github.com/apache/zeppelin/commit/bd714c2b96d28b9b6e1b2c71431ace99e5e963ec#commitcomment-18297977

In spark/src/main/java/org/apache/zeppelin/spark/DepInterpreter.java on line 179:

@Leemoonsoo @lresende I am hitting an NPE here.
`
ERROR [2016-07-19 12:27:36,533] ({pool-2-thread-2} Utils.java[instantiateClass]:74) - SparkJLineCompletion
java.lang.ClassNotFoundException: SparkJLineCompletion
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at org.apache.zeppelin.spark.Utils.instantiateClass(Utils.java:69)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:644)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:71)
`
I am running spark-1.5.1. Any pointers/ideas why this could happen?
[GitHub] zeppelin pull request #1184: [ZEPPELIN-1159] Livy interpreter gets "404 not ...
Github user asfgit closed the pull request at: https://github.com/apache/zeppelin/pull/1184
[GitHub] zeppelin pull request #1204: [Zeppelin-1089] Replace JsHint with Eslint
GitHub user corneadoug opened a pull request: https://github.com/apache/zeppelin/pull/1204

[Zeppelin-1089] Replace JsHint with Eslint

### What is this PR for?
This PR replaces JsHint with Eslint. The original configuration was ported from jsHint using [Polyjuice](https://www.npmjs.com/package/polyjuice), and the settings were then tuned further. For this PR, the goal is not to add new rules, but only to replace the system while keeping the same rules.

### What type of PR is it?
Improvement

### Todos
* [ ] - Add dev licenses for grunt-eslint
* [ ] - Remove license for jsHint
* [ ] - Remove .jshintrc

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1089

### How should this be tested?
You can simply build the code. You can try adding an unused variable in the js code to see an error.

### Questions:
* Does the licenses files need update? Yes
* Is there breaking changes for older versions? No
* Does this needs documentation? No

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/corneadoug/incubator-zeppelin ZEPPELIN-1089

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zeppelin/pull/1204.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #1204

commit 7b1da053cc8feb302ede04e95b6bdd10fc6933ea
Author: Damien CORNEAU
Date: 2016-07-19T09:40:00Z
Replace JsHint by Eslint

commit 0364ad48d03770603a2a952a1077e717cba107f7
Author: Damien CORNEAU
Date: 2016-07-19T10:00:43Z
Fix eslint for tests
[GitHub] zeppelin issue #1197: [ZEPPELIN-1196] Fix for bug ZEPPELIN-1196
Github user SachinJanani commented on the issue: https://github.com/apache/zeppelin/pull/1197 Thanks @jongyoul. Sure, I will make the change and also add a comment in the code about the 30-second timeout.
[GitHub] zeppelin issue #1195: [ZEPPELIN-759] Spark 2.0 support
Github user minahlee commented on the issue: https://github.com/apache/zeppelin/pull/1195 Shall we also change spark version in https://github.com/apache/zeppelin/blob/master/r/pom.xml#L386 and https://github.com/apache/zeppelin/blob/master/r/pom.xml#L398 ?
[GitHub] zeppelin issue #1160: [Zeppelin - 1152] Listing note revision history
Github user khalidhuseynov commented on the issue: https://github.com/apache/zeppelin/pull/1160 @Leemoonsoo updated code and screenshot with feedback
[GitHub] zeppelin issue #1197: [ZEPPELIN-1196] Fix for bug ZEPPELIN-1196
Github user jongyoul commented on the issue: https://github.com/apache/zeppelin/pull/1197 @SachinJanani 30 seconds would be realistic. Could you also please leave a note that a potential bug may occur when it fails to launch?
[GitHub] zeppelin pull request #1200: [ZEPPELIN-1191] Supported legacy way to run par...
GitHub user jongyoul reopened a pull request: https://github.com/apache/zeppelin/pull/1200

[ZEPPELIN-1191] Supported legacy way to run paragraph with group name only

### What is this PR for?
Preserving the legacy way to run a paragraph when users use the group name only

### What type of PR is it?
[Improvement]

### Todos
* [x] - Added the way to find an interpreter with group name at last

### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-1191

### How should this be tested?

### Screenshots (if appropriate)

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jongyoul/zeppelin ZEPPELIN-1191

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zeppelin/pull/1200.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #1200

commit 7c0dc372fb5e8ceb1c1054c962dd8de34b891a4d
Author: Jongyoul Lee
Date: 2016-07-18T08:30:01Z
Supported old way to run paragraph with group name only
[GitHub] zeppelin pull request #1200: [ZEPPELIN-1191] Supported legacy way to run par...
Github user jongyoul closed the pull request at: https://github.com/apache/zeppelin/pull/1200
[GitHub] zeppelin issue #1187: [ZEPPELIN-1163] Change some parameter name properly in...
Github user jongyoul commented on the issue: https://github.com/apache/zeppelin/pull/1187 @zmhassan It looks like you don't use a formatter in your IDE. Could you please change the formatter settings to fit the Google style and reformat all of the files you changed?
[GitHub] zeppelin pull request #1201: [MINOR] Enable pyspark test in local mode
GitHub user jongyoul reopened a pull request: https://github.com/apache/zeppelin/pull/1201

[MINOR] Enable pyspark test in local mode

### What is this PR for?
Enabling test for pyspark in local mode

### What type of PR is it?
[Improvement]

### Todos
* [x] - Add spark configuration and
* [x] - Edit a logic for finding spark home

### What is the Jira issue?
N/A

### How should this be tested?
1. Download spark under {ZEPPELIN_HOME}
1. run `testing/startSparkCluster.sh`
1. `mvn clean package -Pspark-1.6 -Ppyspark -Pyarn -DskipTests`
1. Run `ZeppelinSparkClusterTest`

### Screenshots (if appropriate)

### Questions:
* Does the licenses files need update? No
* Is there breaking changes for older versions? No
* Does this needs documentation? No

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jongyoul/zeppelin minor/enable-pyspark-test-in-local-mode

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zeppelin/pull/1201.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #1201

commit 67ccc2df0397267bb4d3e3140c24cd1c1efa78da
Author: Jongyoul Lee
Date: 2016-07-18T09:09:42Z
Added spark configuration while not using CI
[GitHub] zeppelin pull request #1201: [MINOR] Enable pyspark test in local mode
Github user jongyoul closed the pull request at: https://github.com/apache/zeppelin/pull/1201
[GitHub] zeppelin pull request #1179: [ZEPPELIN-1109] Remove bootstrap dialog fade-in...
Github user asfgit closed the pull request at: https://github.com/apache/zeppelin/pull/1179
[GitHub] zeppelin issue #1203: [DOC][MINOR] Fix 'Drill JDBC Driver' link in jdbc.md
Github user minahlee commented on the issue: https://github.com/apache/zeppelin/pull/1203 LGTM
[GitHub] zeppelin pull request #1203: [DOC][MINOR] Fix 'Drill JDBC Driver' link in jd...
GitHub user AhyoungRyu opened a pull request: https://github.com/apache/zeppelin/pull/1203

[DOC][MINOR] Fix 'Drill JDBC Driver' link in jdbc.md

### What is this PR for?
This PR is for fixing odd 'Drill JDBC Driver' link in `jdbc.md`

### What type of PR is it?
Documentation

### What is the Jira issue?
Since it's minor fixing, I didn't create a Jira issue for this.

### How should this be tested?
Please just see the attached screenshot images :)

### Screenshots (if appropriate)
- Before
![screen shot 2016-07-19 at 4 09 13 pm](https://cloud.githubusercontent.com/assets/10060731/16941504/78af23c2-4dcb-11e6-9031-479a98b972a1.png)
- After
![screen shot 2016-07-19 at 4 09 21 pm](https://cloud.githubusercontent.com/assets/10060731/16941507/7e5ddd40-4dcb-11e6-88c9-86635ab6b974.png)

### Questions:
* Does the licenses files need update? no
* Is there breaking changes for older versions? no
* Does this needs documentation? no

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/AhyoungRyu/zeppelin fix/jdbcInterpreterDocs

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zeppelin/pull/1203.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #1203

commit fddd01a0885a7cf4c008e0724669beae93ee6b20
Author: AhyoungRyu
Date: 2016-07-19T07:09:00Z
Fix 'Drill JDBC Driver' link

commit 9f359f84a14ff3918a33f0b02212051a65ade5f1
Author: AhyoungRyu
Date: 2016-07-19T07:14:13Z
Fix dead link
[jira] [Created] (ZEPPELIN-1210) Run interpreter per user
Jongyoul Lee created ZEPPELIN-1210:
Summary: Run interpreter per user
Key: ZEPPELIN-1210
URL: https://issues.apache.org/jira/browse/ZEPPELIN-1210
Project: Zeppelin
Issue Type: New Feature
Components: zeppelin-zengine
Affects Versions: 0.6.0
Reporter: Jongyoul Lee
Assignee: Jongyoul Lee

It helps users to use their own interpreters
[GitHub] zeppelin pull request #1198: [ZEPPELIN-1202] Documentation typo under writin...
Github user asfgit closed the pull request at: https://github.com/apache/zeppelin/pull/1198
[GitHub] zeppelin pull request #1178: [Zeppelin-1167] Group $scope.$on functions
Github user asfgit closed the pull request at: https://github.com/apache/zeppelin/pull/1178