[GitHub] [zeppelin] felixcheung commented on a change in pull request #3337: [ZEPPELIN-4078] Ipython queue performance
felixcheung commented on a change in pull request #3337: [ZEPPELIN-4078] Ipython queue performance
URL: https://github.com/apache/zeppelin/pull/3337#discussion_r267638063

## File path: python/src/main/resources/grpc/python/ipython_server.py ##
@@ -52,24 +52,19 @@ def execute(self, request, context):
print("execute code:\n")
print(request.code.encode('utf-8'))
sys.stdout.flush()
-stdout_queue = queue.Queue(maxsize = 10)
-stderr_queue = queue.Queue(maxsize = 10)
-image_queue = queue.Queue(maxsize = 5)
-
+stream_reply_queue = queue.Queue(maxsize = 20)

Review comment:
Should `maxsize` be a bit larger?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
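For context on the `maxsize` question: a `queue.Queue` created with a positive `maxsize` makes `put()` block once it holds that many items, until a consumer calls `get()`. The standalone sketch below (not code from `ipython_server.py`; `fill` is a hypothetical helper) shows that backpressure by using `put(..., timeout=...)`, which raises `queue.Full` instead of blocking forever.

```python
import queue

def fill(q: queue.Queue, n: int, timeout: float = 0.05) -> int:
    """Try to put n items; return how many fit before the queue was full."""
    accepted = 0
    for i in range(n):
        try:
            # Blocks for up to `timeout` seconds when the queue is at maxsize,
            # then raises queue.Full.
            q.put(i, timeout=timeout)
            accepted += 1
        except queue.Full:
            break
    return accepted

print(fill(queue.Queue(maxsize=20), 25))  # → 20: the 21st put times out
print(fill(queue.Queue(), 25))            # → 25: maxsize <= 0 means unbounded
```

With a consumer draining the queue concurrently, a larger `maxsize` mostly trades memory for fewer producer stalls; it does not change the order in which items come out.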
[GitHub] [zeppelin] felixcheung commented on a change in pull request #3337: [ZEPPELIN-4078] Ipython queue performance
felixcheung commented on a change in pull request #3337: [ZEPPELIN-4078] Ipython queue performance
URL: https://github.com/apache/zeppelin/pull/3337#discussion_r267637879

## File path: python/src/main/resources/grpc/python/ipython_server.py ##
@@ -52,24 +52,19 @@ def execute(self, request, context):
print("execute code:\n")
print(request.code.encode('utf-8'))
sys.stdout.flush()
-stdout_queue = queue.Queue(maxsize = 10)
-stderr_queue = queue.Queue(maxsize = 10)
-image_queue = queue.Queue(maxsize = 5)
-
+stream_reply_queue = queue.Queue(maxsize = 20)
+payload_reply = []
def _output_hook(msg):
msg_type = msg['header']['msg_type']
content = msg['content']
if msg_type == 'stream':
-stdout_queue.put(content['text'])
+ stream_reply_queue.put(ipython_pb2.ExecuteResponse(status=ipython_pb2.SUCCESS, type=ipython_pb2.TEXT, output=content['text']))
+elif msg_type == 'error':
+ stream_reply_queue.put(ipython_pb2.ExecuteResponse(status=ipython_pb2.ERROR, type=ipython_pb2.TEXT, output='\n'.join(content['traceback'])))
elif msg_type in ('display_data', 'execute_result'):
-stdout_queue.put(content['data'].get('text/plain', ''))
+ stream_reply_queue.put(ipython_pb2.ExecuteResponse(status=ipython_pb2.SUCCESS, type=ipython_pb2.TEXT, output=content['data'].get('text/plain', '')))
if 'image/png' in content['data']:
-image_queue.put(content['data']['image/png'])
-elif msg_type == 'error':
-stderr_queue.put('\n'.join(content['traceback']))
-
-
-payload_reply = []
+ stream_reply_queue.put(ipython_pb2.ExecuteResponse(status=ipython_pb2.SUCCESS, type=ipython_pb2.IMAGE, output=content['data']['image/png']))

Review comment:
This is a bit long and hard to read. Consider refactoring `stream_reply_queue.put` into a separate line.
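One way to act on the readability comment is to build each response first and call `stream_reply_queue.put` on its own line. The sketch below is hypothetical: `ExecuteResponse`, `Status`, and `OutputType` are stand-ins for the generated `ipython_pb2` classes, and `make_output_hook` is an illustrative helper, not code from the PR. It keeps the PR's single-queue design, so all output kinds come out in arrival order.

```python
import queue
from dataclasses import dataclass
from enum import Enum

# Hypothetical stand-ins for the generated ipython_pb2 types.
class Status(Enum):
    SUCCESS = 0
    ERROR = 1

class OutputType(Enum):
    TEXT = 0
    IMAGE = 1

@dataclass
class ExecuteResponse:
    status: Status
    type: OutputType
    output: str

def make_output_hook(stream_reply_queue: queue.Queue):
    """Return an output hook that funnels every reply into one queue, in order."""
    def _output_hook(msg):
        msg_type = msg['header']['msg_type']
        content = msg['content']
        if msg_type == 'stream':
            reply = ExecuteResponse(Status.SUCCESS, OutputType.TEXT, content['text'])
            stream_reply_queue.put(reply)
        elif msg_type == 'error':
            traceback = '\n'.join(content['traceback'])
            reply = ExecuteResponse(Status.ERROR, OutputType.TEXT, traceback)
            stream_reply_queue.put(reply)
        elif msg_type in ('display_data', 'execute_result'):
            data = content['data']
            text = ExecuteResponse(Status.SUCCESS, OutputType.TEXT, data.get('text/plain', ''))
            stream_reply_queue.put(text)
            if 'image/png' in data:
                image = ExecuteResponse(Status.SUCCESS, OutputType.IMAGE, data['image/png'])
                stream_reply_queue.put(image)
    return _output_hook
```

Usage: create `hook = make_output_hook(queue.Queue(maxsize=20))` and pass Jupyter-style message dicts such as `{'header': {'msg_type': 'stream'}, 'content': {'text': 'hi'}}`.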
[jira] [Created] (ZEPPELIN-4082) Error occured when using UDF with scoped notebook
Jin-Hyeok, Cha created ZEPPELIN-4082:
Summary: Error occured when using UDF with scoped notebook
Key: ZEPPELIN-4082
URL: https://issues.apache.org/jira/browse/ZEPPELIN-4082
Project: Zeppelin
Issue Type: Bug
Components: Interpreters
Affects Versions: 0.8.1
Environment:
* Zeppelin v0.8.1
* Spark v2.4.0 (1 Master, N Workers)
* Hadoop (Embedded, probably v2.7.x)
* The interpreter will be instantiated *Per Note* in *scoped* process
Reporter: Jin-Hyeok, Cha

When I defined my own function using the UDF (User-Defined Functions) feature, I got the following error:
{code:java}
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
{code}
I defined this simple function:
{code:java}
import java.text.SimpleDateFormat

def diffHour(s1: String, s2: String): Long = {
  var hour = 0L
  try {
    val sdf = new SimpleDateFormat("-MM-dd HH:mm:ss")
    val d1 = sdf.parse(s1)
    val d2 = sdf.parse(s2)
    hour = d2.getTime - d1.getTime
    hour /= 1000 * 60 * 60
  } catch {
    case e: Exception => hour = -1
  }
  hour
}
{code}
and registered it with the Spark SQL context:
{code:java}
sqlContext.udf.register("diffHour", diffHour _)
{code}
I expected to be able to use the function in SQL:
{code:java}
%sql
SELECT id, time, diffHour(time, '2019-01-01 00:00:00') as hour FROM users
{code}
Instead, the error shown above occurred.

I was using the *Per Note* and *scoped* settings for the Spark interpreter. When I changed the interpreter setting to *Globally*, the error did not occur. How can I fix this? Please help.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
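To make clear what the `diffHour` UDF is meant to compute (this does not address the `ClassCastException` itself), here is a hedged Python port. The pattern string in the reported Scala code appears to have lost its year field in this archive, so the format below is an assumption based on the `'2019-01-01 00:00:00'` literal in the SQL query. Like the Scala version, it truncates toward zero and returns -1 when parsing fails; `diff_hour` is a hypothetical name.

```python
from datetime import datetime

# Assumed layout, matching the '2019-01-01 00:00:00' literal in the SQL query.
FMT = "%Y-%m-%d %H:%M:%S"

def diff_hour(s1: str, s2: str) -> int:
    """Whole hours from s1 to s2, truncated toward zero; -1 if parsing fails."""
    try:
        d1 = datetime.strptime(s1, FMT)
        d2 = datetime.strptime(s2, FMT)
    except (TypeError, ValueError):
        return -1
    delta = d2 - d1
    # Milliseconds between the two timestamps, like getTime() differences in Java.
    millis = (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000
    # Mirror Scala's truncating Long division rather than Python's floor division.
    hours = abs(millis) // (1000 * 60 * 60)
    return hours if millis >= 0 else -hours
```

As in the original, a legitimate difference of exactly -1 hour is indistinguishable from a parse failure.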
[GitHub] [zeppelin] zjffdu opened a new pull request #3338: ZEPPELIN-4081. when the python process is killed, the task state is still running
zjffdu opened a new pull request #3338: ZEPPELIN-4081. when the python process is killed,the task state is still running
URL: https://github.com/apache/zeppelin/pull/3338

### What is this PR for?
This PR aborts Python code execution if the Python process has exited. In addition, it improves the error message for the IPython interpreter, although that interpreter does not have this issue.
### What type of PR is it?
[Bug Fix]
### Todos
* [ ] - Task
### What is the Jira issue?
* https://issues.apache.org/jira/browse/ZEPPELIN-4081
### How should this be tested?
* A unit test is added.
### Screenshots (if appropriate)
### Questions:
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? No
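The general idea, failing a task instead of waiting forever once the interpreter process is gone, can be sketched in Python as below. This is not the PR's code (Zeppelin's side of this is Java). `wait_for_completion` and the string statuses are hypothetical; the sketch only assumes the worker is a `subprocess.Popen` whose liveness can be checked with `poll()`.

```python
import subprocess
import sys
import threading

def wait_for_completion(proc: subprocess.Popen, done: threading.Event,
                        poll_interval: float = 0.1) -> str:
    """Wait for `done`, but give up as soon as the worker process exits.

    Returns "SUCCESS" when `done` is set, or an error string if `proc`
    exited (for example, because it was killed) before the work finished.
    """
    while not done.wait(poll_interval):
        # Popen.poll() returns None while the process is still running.
        if proc.poll() is not None:
            return f"ERROR: python process exited with code {proc.returncode}"
    return "SUCCESS"

if __name__ == "__main__":
    # Reproduce the ZEPPELIN-4081 scenario: a long sleep whose process gets killed.
    worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1000)"])
    threading.Timer(0.5, worker.kill).start()
    print(wait_for_completion(worker, threading.Event()))
```

Polling with a short interval keeps the check cheap while bounding how long a dead process can leave a task shown as running.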
[jira] [Created] (ZEPPELIN-4081) when the python process is killed,the task state is still running
MOBIN created ZEPPELIN-4081:
Summary: when the python process is killed,the task state is still running
Key: ZEPPELIN-4081
URL: https://issues.apache.org/jira/browse/ZEPPELIN-4081
Project: Zeppelin
Issue Type: Bug
Components: python-interpreter
Affects Versions: 0.8.0
Environment: CentOS 6.5, java version "1.7.0_75"
Reporter: MOBIN

When I execute the following Python code and then kill the Python process, the task state is still running:
{code:python}
import time
print("start")
time.sleep(1000)
print("end")
{code}
[GitHub] [zeppelin] AyWa commented on issue #3336: [ZEPPELIN-4078] Fix concurrent autocomplete and execute for Ipython
AyWa commented on issue #3336: [ZEPPELIN-4078] Fix concurrent autocomplete and execute for Ipython
URL: https://github.com/apache/zeppelin/pull/3336#issuecomment-475073644

@Leemoonsoo Thank you for the info. I guess it was because of `new Properties()`. I pushed a change to use `initIntpProperties()` in the test. Let's hope it passes 🤞
[GitHub] [zeppelin] jongyoul commented on a change in pull request #3316: [ZEPPELIN-3985]. Move note permission from notebook-authorization.json to note file
jongyoul commented on a change in pull request #3316: [ZEPPELIN-3985]. Move note permission from notebook-authorization.json to note file
URL: https://github.com/apache/zeppelin/pull/3316#discussion_r267533084

## File path: zeppelin-zengine/src/main/java/org/apache/zeppelin/notebook/AuthorizationService.java ##
@@ -0,0 +1,249 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.zeppelin.notebook;
+
+import com.google.common.base.Predicate;
+import com.google.common.collect.FluentIterable;
+import com.google.common.collect.Sets;
+import org.apache.commons.lang.StringUtils;
+import org.apache.zeppelin.conf.ZeppelinConfiguration;
+import org.apache.zeppelin.user.AuthenticationInfo;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import javax.inject.Inject;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * This class is responsible for maintain notes authorization info. And provide api for
+ * setting and querying note authorization info.
+ */
+public class AuthorizationService {
+
+ private static final Logger LOGGER = LoggerFactory.getLogger(AuthorizationService.class);
+ private static final Set EMPTY_SET = new HashSet<>();
+
+ private ZeppelinConfiguration conf;
+ private Notebook notebook;
+ /*
+ * contains roles for each user
+ */
+ private Map> userRoles = new HashMap<>();
+
+ @Inject
+ public AuthorizationService(Notebook notebook) {

Review comment:
It would be better to have `ZeppelinConfiguration` injected as a parameter of this constructor.
[GitHub] [zeppelin] Leemoonsoo edited a comment on issue #3336: [ZEPPELIN-4078] Fix concurrent autocomplete and execute for Ipython
Leemoonsoo edited a comment on issue #3336: [ZEPPELIN-4078] Fix concurrent autocomplete and execute for Ipython
URL: https://github.com/apache/zeppelin/pull/3336#issuecomment-474924351

Thanks @AyWa for the contribution. A test is failing with:
```
13:09:18,331 INFO org.apache.zeppelin.spark.OldSparkInterpreter:338 - -- Create new SparkContext local ---
13:09:18,335 INFO org.apache.spark.SparkContext:58 - Running Spark version 1.6.3
13:09:18,338 ERROR org.apache.spark.SparkContext:95 - Error initializing SparkContext.
org.apache.spark.SparkException: An application name must be set in your configuration
    at org.apache.spark.SparkContext.(SparkContext.scala:404)
    at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext_1(OldSparkInterpreter.java:426)
    at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext(OldSparkInterpreter.java:321)
    at org.apache.zeppelin.spark.OldSparkInterpreter.getSparkContext(OldSparkInterpreter.java:139)
    at org.apache.zeppelin.spark.OldSparkInterpreter.open(OldSparkInterpreter.java:696)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:66)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
    at org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:354)
    at org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:365)
    at org.apache.zeppelin.spark.IPySparkInterpreter.open(IPySparkInterpreter.java:52)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
    at org.apache.zeppelin.spark.IPySparkInterpreterTest.startInterpreter(IPySparkInterpreterTest.java:93)
    at org.apache.zeppelin.python.IPythonInterpreterTest.testIpython_shouldNotHang_whenCallingAutoCompleteAndInterpretConcurrently(IPythonInterpreterTest.java:250)
```
It looks like https://github.com/apache/zeppelin/blob/master/spark/interpreter/src/test/java/org/apache/zeppelin/spark/IPySparkInterpreterTest.java#L59 is somehow not applied in this test.
[GitHub] [zeppelin] Leemoonsoo commented on issue #3336: [ZEPPELIN-4078] Fix concurrent autocomplete and execute for Ipython
Leemoonsoo commented on issue #3336: [ZEPPELIN-4078] Fix concurrent autocomplete and execute for Ipython
URL: https://github.com/apache/zeppelin/pull/3336#issuecomment-474924351

Thanks @AyWa for the contribution. It looks like a test is failing with:
```
13:09:18,331 INFO org.apache.zeppelin.spark.OldSparkInterpreter:338 - -- Create new SparkContext local ---
13:09:18,335 INFO org.apache.spark.SparkContext:58 - Running Spark version 1.6.3
13:09:18,338 ERROR org.apache.spark.SparkContext:95 - Error initializing SparkContext.
org.apache.spark.SparkException: An application name must be set in your configuration
    at org.apache.spark.SparkContext.(SparkContext.scala:404)
    at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext_1(OldSparkInterpreter.java:426)
    at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext(OldSparkInterpreter.java:321)
    at org.apache.zeppelin.spark.OldSparkInterpreter.getSparkContext(OldSparkInterpreter.java:139)
    at org.apache.zeppelin.spark.OldSparkInterpreter.open(OldSparkInterpreter.java:696)
    at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:66)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
    at org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:354)
    at org.apache.zeppelin.interpreter.Interpreter.getInterpreterInTheSameSessionByClassName(Interpreter.java:365)
    at org.apache.zeppelin.spark.IPySparkInterpreter.open(IPySparkInterpreter.java:52)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
    at org.apache.zeppelin.spark.IPySparkInterpreterTest.startInterpreter(IPySparkInterpreterTest.java:93)
    at org.apache.zeppelin.python.IPythonInterpreterTest.testIpython_shouldNotHang_whenCallingAutoCompleteAndInterpretConcurrently(IPythonInterpreterTest.java:250)
```
It looks like https://github.com/apache/zeppelin/blob/master/spark/interpreter/src/test/java/org/apache/zeppelin/spark/IPySparkInterpreterTest.java#L59 is somehow not applied in this test.
[GitHub] [zeppelin] zjffdu commented on issue #3316: [ZEPPELIN-3985]. Move note permission from notebook-authorization.json to note file
zjffdu commented on issue #3316: [ZEPPELIN-3985]. Move note permission from notebook-authorization.json to note file
URL: https://github.com/apache/zeppelin/pull/3316#issuecomment-474852742

@felixcheung Could you help review it? Thanks.
[GitHub] [zeppelin] zjffdu commented on issue #3336: [ZEPPELIN-4078] Fix concurrent autocomplete and execute for Ipython
zjffdu commented on issue #3336: [ZEPPELIN-4078] Fix concurrent autocomplete and execute for Ipython
URL: https://github.com/apache/zeppelin/pull/3336#issuecomment-474766783

Thanks @AyWa, LGTM.
[GitHub] [zeppelin] AyWa opened a new pull request #3337: [ZEPPELIN-4078] Ipython queue performance
AyWa opened a new pull request #3337: [ZEPPELIN-4078] Ipython queue performance
URL: https://github.com/apache/zeppelin/pull/3337

### What is this PR for?
This PR fixes a bug that makes the **ipython** queue listener overuse the CPU. After this fix, CPU usage should be much lower.
It also includes a small refactor to use a single queue, which ensures that messages stay in order even with a sleep.
### What type of PR is it?
Bug Fix / performance improvement
### Todos
* [x] - Performance improvement
* [ ] - Add a performance test, or other tests?
### What is the Jira issue?
This addresses one part of the Jira issue.
https://issues.apache.org/jira/browse/ZEPPELIN-4078
### How should this be tested?
* First time? Set up Travis CI as described on https://zeppelin.apache.org/contribution/contributions.html#continuous-integration
* Strongly recommended: add automated unit tests for any new or changed behavior
* Outline any manual steps to test the PR here.
### Screenshots (if appropriate)
### Questions:
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? No
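A common cause of a CPU-hungry queue listener is a loop that keeps checking a queue without ever blocking. The sketch below shows the blocking alternative for a single reply queue, assuming the producer marks the end of one execution with a sentinel. `drain_replies` and `_DONE` are hypothetical names; this is not the PR's implementation.

```python
import queue
import threading

_DONE = object()  # hypothetical sentinel marking the end of one execution

def drain_replies(reply_queue: queue.Queue, timeout: float = 1.0):
    """Yield replies in arrival order, blocking instead of busy-polling.

    get(timeout=...) waits on the queue's internal condition variable until
    an item arrives, so an idle listener uses essentially no CPU.
    """
    while True:
        try:
            item = reply_queue.get(timeout=timeout)
        except queue.Empty:
            # Nothing yet; a real server could also check for cancellation here.
            continue
        if item is _DONE:
            return
        yield item

reply_queue = queue.Queue(maxsize=20)

def produce():
    for line in ("a", "b", "c"):
        reply_queue.put(line)
    reply_queue.put(_DONE)

threading.Thread(target=produce).start()
print(list(drain_replies(reply_queue)))  # → ['a', 'b', 'c']
```

Because stdout, stderr, and image replies all share one FIFO queue here, the consumer sees them in exactly the order the producer emitted them.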
[GitHub] [zeppelin] AyWa opened a new pull request #3336: [ZEPPELIN-4078] Fix concurrent autocomplete and execute for Ipython
AyWa opened a new pull request #3336: [ZEPPELIN-4078] Fix concurrent autocomplete and execute for Ipython
URL: https://github.com/apache/zeppelin/pull/3336

### What is this PR for?
This PR fixes a bug where the **ipython** `execute_interactive` call hangs forever if an autocomplete (`complete`) call is made at the same time (see the unit test, which fails on master).
For now, the fix is to synchronize the `execute` and `complete` methods. This should not cause a regression, because the kernel does not support concurrent execute and autocomplete anyway (see https://github.com/jupyter/notebook/issues/3763).
### What type of PR is it?
Bug Fix
### Todos
* [x] - Unit test fails on master and passes on this branch
* [x] - Fix with a lock
### What is the Jira issue?
This addresses one part of the Jira issue; other fixes will follow soon.
https://issues.apache.org/jira/browse/ZEPPELIN-4078
### How should this be tested?
* First time? Set up Travis CI as described on https://zeppelin.apache.org/contribution/contributions.html#continuous-integration
* Strongly recommended: add automated unit tests for any new or changed behavior
* Outline any manual steps to test the PR here.
### Screenshots (if appropriate)
### Questions:
* Do the license files need an update? No
* Are there breaking changes for older versions? No
* Does this need documentation? No
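Synchronizing `execute` and `complete` with a lock, as the PR's todo list mentions, can be sketched in Python with a shared `threading.Lock`: a completion request then waits for a running execution to finish instead of interleaving with it on the kernel. `SerializedKernelClient` and `_fake_kernel_call` are hypothetical; a real server would forward these calls to a Jupyter kernel client rather than sleep.

```python
import threading
import time

class SerializedKernelClient:
    """Hypothetical wrapper that never lets execute and complete overlap."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0      # calls currently "inside the kernel"
        self.max_active = 0  # highest overlap observed; stays 1 with the lock

    def _fake_kernel_call(self, duration: float):
        # Stand-in for a kernel request; records how many calls overlap.
        self.active += 1
        self.max_active = max(self.max_active, self.active)
        time.sleep(duration)
        self.active -= 1

    def execute(self, code: str) -> str:
        with self._lock:
            self._fake_kernel_call(0.05)
            return f"executed {code!r}"

    def complete(self, code: str, cursor_pos: int) -> list:
        with self._lock:
            self._fake_kernel_call(0.01)
            return [code[:cursor_pos]]
```

The trade-off is that autocomplete requests wait while a long-running paragraph executes; per the linked Jupyter issue, the kernel cannot serve both concurrently anyway.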