[jira] [Commented] (LIVY-489) Expose a JDBC endpoint for Livy
[ https://issues.apache.org/jira/browse/LIVY-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922946#comment-16922946 ]

Von Han Yu commented on LIVY-489:
---------------------------------

Awesome, thanks [~mgaido]. Will check the livy.conf.

> Expose a JDBC endpoint for Livy
> -------------------------------
>
>                 Key: LIVY-489
>                 URL: https://issues.apache.org/jira/browse/LIVY-489
>             Project: Livy
>          Issue Type: New Feature
>          Components: API, Server
>    Affects Versions: 0.6.0
>            Reporter: Marco Gaido
>            Assignee: Marco Gaido
>            Priority: Major
>             Fix For: 0.6.0
>
>
> Many users and BI tools use JDBC connections in order to retrieve data. As
> Livy exposes only a REST API, this is a limitation in its adoption. Hence,
> adding a JDBC endpoint may be a very useful feature, which could also make
> Livy a more attractive solution for end users to adopt.
> Moreover, Spark already exposes a JDBC interface, but it has many
> limitations, including that all queries are submitted to the same
> application, so there is no isolation/security; Livy can offer both,
> making a Livy JDBC API a better solution for companies/users who
> want to use Spark to run their queries through JDBC.
> In order to make the transition from existing solutions to the new JDBC
> server seamless, the proposal is to use the Hive thrift-server and extend it
> as was done by the STS.
> [Here you can find the design doc.|https://docs.google.com/document/d/18HAR_VnQLegbYyzGg8f4zwD4GtDP5q_t3K21eXecZC4/edit]

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
[jira] [Created] (LIVY-660) How can we use YARN and all the nodes in our cluster when submitting a pySpark job
Sebastian Rama created LIVY-660:
-----------------------------------

             Summary: How can we use YARN and all the nodes in our cluster when submitting a pySpark job
                 Key: LIVY-660
                 URL: https://issues.apache.org/jira/browse/LIVY-660
             Project: Livy
          Issue Type: Question
          Components: Server
    Affects Versions: 0.6.0
            Reporter: Sebastian Rama

How can we use YARN and all the nodes in our cluster when submitting a pySpark job? We have edited all the required .conf files but nothing happens. =(

[root@cdh-node06 conf]# cat livy-client.conf
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Use this keystore for the SSL certificate and key.
# livy.keystore = dew0wf-e

# Specify the keystore password.
# livy.keystore.password = # welfka

# Specify the key password.
# livy.key-password =

# Hadoop Credential Provider Path to get "livy.keystore.password" and "livy.key-password".
# Credential Provider can be created using command as follow:
# hadoop credential create "livy.keystore.password" -value "secret" -provider jceks://hdfs/path/to/livy.jceks
# livy.hadoop.security.credential.provider.path =

# What host address to start the server on. By default, Livy will bind to all network interfaces.
# livy.server.host = 0.0.0.0

# What port to start the server on.
# livy.server.port = 8998

# What base path ui should work on. By default UI is mounted on "/".
# E.g.: livy.ui.basePath = /my_livy - result in mounting UI on /my_livy/
# livy.ui.basePath = ""

# What spark master Livy sessions should use.
livy.spark.master = yarn

# What spark deploy mode Livy sessions should use.
livy.spark.deploy-mode = cluster

# Configure Livy server http request and response header size.
# livy.server.request-header.size = 131072
# livy.server.response-header.size = 131072

# Enabled to check whether timeout Livy sessions should be stopped.
# livy.server.session.timeout-check = true

# Time in milliseconds on how long Livy will wait before timing out an idle session.
# livy.server.session.timeout = 1h

# How long a finished session state should be kept in LivyServer for query.
# livy.server.session.state-retain.sec = 600s

# If livy should impersonate the requesting users when creating a new session.
# livy.impersonation.enabled = true

# Logs size livy can cache for each session/batch. 0 means don't cache the logs.
# livy.cache-log.size = 200

# Comma-separated list of Livy RSC jars. By default Livy will upload jars from its installation
# directory every time a session is started. By caching these files in HDFS, for example, startup
# time of sessions on YARN can be reduced.
# livy.rsc.jars =

# Comma-separated list of Livy REPL jars. By default Livy will upload jars from its installation
# directory every time a session is started. By caching these files in HDFS, for example, startup
# time of sessions on YARN can be reduced. Please list all the repl dependencies including
# Scala version-specific livy-repl jars, Livy will automatically pick the right dependencies
# during session creation.
# livy.repl.jars =

# Location of PySpark archives. By default Livy will upload the file from SPARK_HOME, but
# by caching the file in HDFS, startup time of PySpark sessions on YARN can be reduced.
# livy.pyspark.archives =

# Location of the SparkR package. By default Livy will upload the file from SPARK_HOME, but
# by caching the file in HDFS, startup time of R sessions on YARN can be reduced.
# livy.sparkr.package =

# List of local directories from where files are allowed to be added to user sessions. By
# default it's empty, meaning users can only reference remote URIs when starting their
# sessions.
# livy.file.local-dir-whitelist =

# Whether to enable csrf protection, by default it is false. If it is enabled, client should add
# http-header "X-Requested-By" in request if the http method is POST/DELETE/PUT/PATCH.
# livy.server.csrf-protection.enabled =

# Whether to enable HiveContext in livy interpreter, if it is true hive-site.xml will be detected
# on user request and then livy server
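An editorial note on the question above: `livy.spark.master = yarn` and `livy.spark.deploy-mode = cluster` only decide *where* sessions run; how many nodes a session actually uses is governed by the Spark conf passed when the session is created. A minimal sketch of building such a session-creation request (the host, port, and resource values are illustrative assumptions, not taken from this thread; the POST itself would need the `requests` package and a running Livy server):

```python
import json

# Spark conf controlling how much of the YARN cluster the session may use.
# The numbers below are placeholders; tune them to the actual cluster.
conf = {
    "spark.executor.instances": "8",
    "spark.executor.cores": "4",
    "spark.executor.memory": "4g",
    # Alternatively, let YARN grow/shrink the executor count on demand:
    # "spark.dynamicAllocation.enabled": "true",
}

# Body for POST /sessions on the Livy REST API.
payload = {"kind": "pyspark", "conf": conf}
body = json.dumps(payload)

# Sending it would look like this (hypothetical host, requires a live server):
# import requests
# r = requests.post("http://<livy-host>:8998/sessions", data=body,
#                   headers={"Content-Type": "application/json"})
print(body)
```

Without an explicit `spark.executor.instances` (or dynamic allocation), Spark's own defaults apply, which can easily leave most of the cluster idle regardless of what livy.conf says.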
[jira] [Commented] (LIVY-489) Expose a JDBC endpoint for Livy
[ https://issues.apache.org/jira/browse/LIVY-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922363#comment-16922363 ]

Marco Gaido commented on LIVY-489:
----------------------------------

Hi [~vonatzki]. Please check the design doc attached here. Moreover, please check the configurations added in LivyConf: you'll see that the thriftserver must be enabled and configured properly, and the default values for those configurations are there. As a suggestion, if it is possible for you, I'd recommend building Livy from the master branch, even though the thriftserver is also present in 0.6.0. Indeed, that was its first release, so you might find issues which have already been fixed on current master. Thanks.
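To make the advice above concrete: enabling the JDBC endpoint happens in livy.conf, not on the UI port. The fragment below is a sketch assuming the configuration keys are `livy.server.thrift.enabled` and `livy.server.thrift.port`; verify the exact names and defaults in LivyConf for your build, as the comment suggests.

```properties
# livy.conf — hypothetical minimal thriftserver setup; check LivyConf for exact keys
livy.server.thrift.enabled = true
# Port the thrift/JDBC endpoint listens on (distinct from the REST/UI port 8998)
livy.server.thrift.port = 10090
```

After restarting the server, a beeline connection would then target that port rather than the UI port, e.g. `beeline -u "jdbc:hive2://<livy-host>:10090"`.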
[jira] [Created] (LIVY-659) Travis failed on "can kill spark-submit while it's running"
runzhiwang created LIVY-659:
-------------------------------

             Summary: Travis failed on "can kill spark-submit while it's running"
                 Key: LIVY-659
                 URL: https://issues.apache.org/jira/browse/LIVY-659
             Project: Livy
          Issue Type: Bug
          Components: Tests
    Affects Versions: 0.6.0
            Reporter: runzhiwang

* can kill spark-submit while it's running *** FAILED *** (41 milliseconds)
  org.mockito.exceptions.verification.WantedButNotInvoked: Wanted but not invoked:
  lineBufferedProcess.destroy();
  -> at org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13$$anonfun$apply$mcV$sp$15$$anonfun$apply$mcV$sp$16.apply$mcV$sp(SparkYarnAppSpec.scala:226)
  Actually, there were zero interactions with this mock.
  at org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13$$anonfun$apply$mcV$sp$15$$anonfun$apply$mcV$sp$16.apply$mcV$sp(SparkYarnAppSpec.scala:226)
  at org.apache.livy.utils.SparkYarnAppSpec.org$apache$livy$utils$SparkYarnAppSpec$$cleanupThread(SparkYarnAppSpec.scala:43)
  at org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13$$anonfun$apply$mcV$sp$15.apply$mcV$sp(SparkYarnAppSpec.scala:224)
  at org.apache.livy.utils.Clock$.withSleepMethod(Clock.scala:31)
  at org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13.apply$mcV$sp(SparkYarnAppSpec.scala:201)
  at org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13.apply(SparkYarnAppSpec.scala:201)
  at org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13.apply(SparkYarnAppSpec.scala:201)
  at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)

Please refer to: https://travis-ci.org/captainzmc/incubator-livy/jobs/580596561
[jira] [Commented] (LIVY-659) Travis failed on "can kill spark-submit while it's running"
[ https://issues.apache.org/jira/browse/LIVY-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922336#comment-16922336 ]

runzhiwang commented on LIVY-659:
---------------------------------

I'm working on it.
[jira] [Comment Edited] (LIVY-636) Unable to create interactive session with additional JAR in spark.driver.extraClassPath
[ https://issues.apache.org/jira/browse/LIVY-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922323#comment-16922323 ]

Konstantin edited comment on LIVY-636 at 9/4/19 9:35 AM:
---------------------------------------------------------

Exactly the same exception when trying to add a jar from HDFS through the "jars" field in the spark session; possibly for the same reason. I tried adding a main class to the jar (it was meant to be library-only, so no main class was present initially): the exception goes away, but imports from this jar in the session statements are still impossible.

> Unable to create interactive session with additional JAR in
> spark.driver.extraClassPath
> ---------------------------------------------------------------------------------------
>
>                 Key: LIVY-636
>                 URL: https://issues.apache.org/jira/browse/LIVY-636
>             Project: Livy
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Ishita Virmani
>            Priority: Major
>         Attachments: applicationmaster.log, container.log, stacktrace.txt,
> test.png
>
>
> Command run:
> {{curl -H "Content-Type: application/json" -X POST -d '{"kind":"pyspark","conf":{"spark.driver.extraClassPath":"/data/XXX-0.0.1-SNAPSHOT.jar"}}' -i http:///session}}
> The above command fails to create a Spark session on YARN with a NullPointerException. The stack trace for the same has been attached.
> The JAR file here is present on the local driver path. Also tried using an HDFS path
> in the following manner: {{hdfs://:/data/XXX-0.0.1-SNAPSHOT.jar}}
[jira] [Commented] (LIVY-636) Unable to create interactive session with additional JAR in spark.driver.extraClassPath
[ https://issues.apache.org/jira/browse/LIVY-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922323#comment-16922323 ]

Konstantin commented on LIVY-636:
---------------------------------

Exactly the same exception when trying to add a jar from HDFS through the "jars" field; possibly for the same reason. I tried adding a main class to the jar (it was meant to be library-only, so no main class was present initially): the exception goes away, but imports from this jar are still impossible.
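For readers comparing the two failure modes discussed above: the jar can be declared either inside the Spark conf (`spark.driver.extraClassPath`, the original report) or via Livy's top-level `jars` field on the session-creation request (Konstantin's variant), which expects URIs reachable by the cluster, typically on HDFS. A sketch of both payloads — the namenode host and jar path are hypothetical placeholders, not values from this thread:

```python
import json

# Variant 1: jar injected through Spark driver classpath conf (the failing case
# in the original report; the path must exist on the driver node).
via_conf = {
    "kind": "pyspark",
    "conf": {"spark.driver.extraClassPath": "/data/XXX-0.0.1-SNAPSHOT.jar"},
}

# Variant 2: jar declared through Livy's top-level "jars" field; in yarn-cluster
# mode the URI must be readable by YARN, so an HDFS path is typical.
via_jars = {
    "kind": "pyspark",
    "jars": ["hdfs://namenode:8020/data/XXX-0.0.1-SNAPSHOT.jar"],
}

# Either payload would be POSTed to the sessions endpoint, e.g.:
# import requests
# requests.post("http://<livy-host>:8998/sessions", data=json.dumps(via_jars),
#               headers={"Content-Type": "application/json"})
print(json.dumps(via_conf))
print(json.dumps(via_jars))
```

Per the comments, both variants hit the same NullPointerException here, which suggests the bug is in how the session driver resolves the extra jar rather than in the payload shape itself.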
[jira] [Comment Edited] (LIVY-489) Expose a JDBC endpoint for Livy
[ https://issues.apache.org/jira/browse/LIVY-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922294#comment-16922294 ]

Von Han Yu edited comment on LIVY-489 at 9/4/19 8:50 AM:
---------------------------------------------------------

Hi [~mgaido], I am interested in testing this feature but I do not know where to start. Does this assume that I can connect via JDBC on the same port as the UI? I tried using beeline but it seems it's not working. Would appreciate some guidance on this. Thanks!
[jira] [Commented] (LIVY-489) Expose a JDBC endpoint for Livy
[ https://issues.apache.org/jira/browse/LIVY-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16922294#comment-16922294 ]

Von Han Yu commented on LIVY-489:
---------------------------------

Hi [~mgaido], I am interested in testing this feature but I do not know where to start. Does this assume that I can connect via JDBC on the same port as the UI? I tried using beeline but it seems it's not working. Would appreciate some guidance on this. Thanks!