[jira] [Commented] (LIVY-489) Expose a JDBC endpoint for Livy

2019-09-04 Thread Von Han Yu (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922946#comment-16922946
 ] 

Von Han Yu commented on LIVY-489:
-

Awesome, thanks [~mgaido]. Will check the livy.conf.

> Expose a JDBC endpoint for Livy
> ---
>
> Key: LIVY-489
> URL: https://issues.apache.org/jira/browse/LIVY-489
> Project: Livy
>  Issue Type: New Feature
>  Components: API, Server
>Affects Versions: 0.6.0
>Reporter: Marco Gaido
>Assignee: Marco Gaido
>Priority: Major
> Fix For: 0.6.0
>
>
> Many users and BI tools use JDBC connections in order to retrieve data. As 
> Livy exposes only a REST API, this is a limitation to its adoption. Hence, 
> adding a JDBC endpoint may be a very useful feature, which could also make 
> Livy a more attractive solution for end users to adopt.
> Moreover, Spark currently exposes a JDBC interface, but it has many 
> limitations, including that all queries are submitted to the same 
> application, so there is no isolation/security. Livy can offer both, making 
> a Livy JDBC API a better solution for companies/users who want to use Spark 
> to run their queries through JDBC.
> In order to make the transition from existing solutions to the new JDBC 
> server seamless, the proposal is to use the Hive thrift server and extend it 
> as was done by the STS.
> [Here you can find the design 
> doc.|https://docs.google.com/document/d/18HAR_VnQLegbYyzGg8f4zwD4GtDP5q_t3K21eXecZC4/edit]
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (LIVY-660) How can we use YARN and all the nodes in our cluster when submitting a pySpark job

2019-09-04 Thread Sebastian Rama (Jira)
Sebastian Rama created LIVY-660:
---

 Summary: How can we use YARN and all the nodes in our cluster when 
submitting a pySpark job
 Key: LIVY-660
 URL: https://issues.apache.org/jira/browse/LIVY-660
 Project: Livy
  Issue Type: Question
  Components: Server
Affects Versions: 0.6.0
Reporter: Sebastian Rama


How can we use YARN and all the nodes in our cluster when submitting a pySpark 
job?

We have edited all the required .conf files but nothing happens. =(

 

 

[root@cdh-node06 conf]# cat livy-client.conf

#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Use this keystore for the SSL certificate and key.
# livy.keystore =
dew0wf-e

# Specify the keystore password.
# livy.keystore.password =
#
welfka

# Specify the key password.
# livy.key-password =

# Hadoop Credential Provider Path to get "livy.keystore.password" and "livy.key-password".
# Credential Provider can be created using command as follow:
# hadoop credential create "livy.keystore.password" -value "secret" -provider jceks://hdfs/path/to/livy.jceks
# livy.hadoop.security.credential.provider.path =

# What host address to start the server on. By default, Livy will bind to all network interfaces.
# livy.server.host = 0.0.0.0

# What port to start the server on.
# livy.server.port = 8998

# What base path ui should work on. By default UI is mounted on "/".
# E.g.: livy.ui.basePath = /my_livy - result in mounting UI on /my_livy/
# livy.ui.basePath = ""

# What spark master Livy sessions should use.
livy.spark.master = yarn

# What spark deploy mode Livy sessions should use.
livy.spark.deploy-mode = cluster

# Configure Livy server http request and response header size.
# livy.server.request-header.size = 131072
# livy.server.response-header.size = 131072

# Enabled to check whether timeout Livy sessions should be stopped.
# livy.server.session.timeout-check = true

# Time in milliseconds on how long Livy will wait before timing out an idle session.
# livy.server.session.timeout = 1h
#
# How long a finished session state should be kept in LivyServer for query.
# livy.server.session.state-retain.sec = 600s

# If livy should impersonate the requesting users when creating a new session.
# livy.impersonation.enabled = true

# Logs size livy can cache for each session/batch. 0 means don't cache the logs.
# livy.cache-log.size = 200

# Comma-separated list of Livy RSC jars. By default Livy will upload jars from its installation
# directory every time a session is started. By caching these files in HDFS, for example, startup
# time of sessions on YARN can be reduced.
# livy.rsc.jars =

# Comma-separated list of Livy REPL jars. By default Livy will upload jars from its installation
# directory every time a session is started. By caching these files in HDFS, for example, startup
# time of sessions on YARN can be reduced. Please list all the repl dependencies including
# Scala version-specific livy-repl jars, Livy will automatically pick the right dependencies
# during session creation.
# livy.repl.jars =

# Location of PySpark archives. By default Livy will upload the file from SPARK_HOME, but
# by caching the file in HDFS, startup time of PySpark sessions on YARN can be reduced.
# livy.pyspark.archives =

# Location of the SparkR package. By default Livy will upload the file from SPARK_HOME, but
# by caching the file in HDFS, startup time of R sessions on YARN can be reduced.
# livy.sparkr.package =

# List of local directories from where files are allowed to be added to user sessions. By
# default it's empty, meaning users can only reference remote URIs when starting their
# sessions.
# livy.file.local-dir-whitelist =

# Whether to enable csrf protection, by default it is false. If it is enabled, client should add
# http-header "X-Requested-By" in request if the http method is POST/DELETE/PUT/PATCH.
# livy.server.csrf-protection.enabled =

# Whether to enable HiveContext in livy interpreter, if it is true hive-site.xml will be detected
# on user request and then livy server 
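For comparison, a minimal sketch of the request body such a configuration expects. With livy.spark.master = yarn and livy.spark.deploy-mode = cluster set as above, every session Livy creates is submitted to YARN; the URL and the executor settings below are assumptions for illustration (8998 is Livy's default REST port), not required fields.

```python
import json

# Assumed endpoint; adjust the host, 8998 is Livy's default REST port.
livy_url = "http://localhost:8998/sessions"

# Request body for an interactive PySpark session. The "conf" entries are
# illustrative Spark settings that let the job spread across multiple nodes.
payload = {
    "kind": "pyspark",
    "conf": {
        "spark.executor.instances": "4",
        "spark.executor.cores": "2",
    },
}

body = json.dumps(payload)
# POST `body` to livy_url with header Content-Type: application/json
```

If the session still runs locally with these settings, it is worth checking which config file the server actually reads (livy.conf vs livy-client.conf) and the Livy server logs for the spark-submit command it issues.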

[jira] [Commented] (LIVY-489) Expose a JDBC endpoint for Livy

2019-09-04 Thread Marco Gaido (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922363#comment-16922363
 ] 

Marco Gaido commented on LIVY-489:
--

Hi [~vonatzki]. Please check the design doc attached here. Moreover, please 
check the configurations added in LivyConf: you'll see that the thriftserver 
must be enabled and configured properly, and default values for the 
configurations are there.

As a suggestion, if it is possible for you, I'd recommend building Livy from 
the master branch, even though the thriftserver is also present in the 0.6.0 
release. Since that was its first release, you might hit issues which have 
already been fixed on current master.

Thanks.
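For readers landing on this thread: a sketch of what "enabled and configured properly" can look like in livy.conf. The property names follow the thrift-server options in LivyConf, but the port value shown is an assumption; check LivyConf for the authoritative names and defaults.

```
# livy.conf -- enable the Livy thrift (JDBC) endpoint
livy.server.thrift.enabled = true
# Port value is an assumption; verify the default in LivyConf.
livy.server.thrift.port = 10090
```

After restarting Livy, a beeline connection would target that thrift port rather than the REST/UI port (8998), e.g. beeline -u "jdbc:hive2://<livy-host>:10090".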



[jira] [Created] (LIVY-659) Travis failed on "can kill spark-submit while it's running"

2019-09-04 Thread runzhiwang (Jira)
runzhiwang created LIVY-659:
---

 Summary: Travis failed on "can kill spark-submit while it's 
running"
 Key: LIVY-659
 URL: https://issues.apache.org/jira/browse/LIVY-659
 Project: Livy
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.6.0
Reporter: runzhiwang


* can kill spark-submit while it's running *** FAILED *** (41 milliseconds)
 org.mockito.exceptions.verification.WantedButNotInvoked: Wanted but not 
invoked:
lineBufferedProcess.destroy();
-> at 
org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13$$anonfun$apply$mcV$sp$15$$anonfun$apply$mcV$sp$16.apply$mcV$sp(SparkYarnAppSpec.scala:226)
Actually, there were zero interactions with this mock.
 at 
org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13$$anonfun$apply$mcV$sp$15$$anonfun$apply$mcV$sp$16.apply$mcV$sp(SparkYarnAppSpec.scala:226)
 at 
org.apache.livy.utils.SparkYarnAppSpec.org$apache$livy$utils$SparkYarnAppSpec$$cleanupThread(SparkYarnAppSpec.scala:43)
 at 
org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13$$anonfun$apply$mcV$sp$15.apply$mcV$sp(SparkYarnAppSpec.scala:224)
 at org.apache.livy.utils.Clock$.withSleepMethod(Clock.scala:31)
 at 
org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13.apply$mcV$sp(SparkYarnAppSpec.scala:201)
 at 
org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13.apply(SparkYarnAppSpec.scala:201)
 at 
org.apache.livy.utils.SparkYarnAppSpec$$anonfun$1$$anonfun$apply$mcV$sp$13.apply(SparkYarnAppSpec.scala:201)
 at 
org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
 at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
 at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)

Please refer to: 
https://travis-ci.org/captainzmc/incubator-livy/jobs/580596561
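For context on why this check can be flaky, here is a sketch in Python's unittest.mock (rather than the ScalaTest/Mockito used by the suite, purely to illustrate the race): when the kill happens on a background thread, verifying destroy() before that thread has run reports zero interactions with the mock, exactly like the failure above. Synchronizing with the worker before verifying makes the check deterministic.

```python
import threading
from unittest.mock import Mock

# Stand-in for the mocked lineBufferedProcess in SparkYarnAppSpec.
process = Mock()

# The kill runs asynchronously, like the app-monitoring thread in SparkYarnApp.
killer = threading.Thread(target=process.destroy)
killer.start()

# Verifying immediately, without waiting, can intermittently observe
# zero calls ("Wanted but not invoked"). Joining the worker first
# removes the race:
killer.join()
process.destroy.assert_called_once()
```

In Mockito terms the analogous fix is to verify with a timeout (or otherwise wait for the cleanup thread) instead of verifying immediately.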





[jira] [Commented] (LIVY-659) Travis failed on "can kill spark-submit while it's running"

2019-09-04 Thread runzhiwang (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922336#comment-16922336
 ] 

runzhiwang commented on LIVY-659:
-

I'm working on it.



[jira] [Comment Edited] (LIVY-636) Unable to create interactive session with additional JAR in spark.driver.extraClassPath

2019-09-04 Thread Konstantin (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922323#comment-16922323
 ] 

Konstantin edited comment on LIVY-636 at 9/4/19 9:35 AM:
-

Exactly the same exception occurs when trying to add a jar from HDFS through 
the "jars" field in the Spark session, possibly for the same reason. I tried 
adding a main class to the jar (it was meant to be library-only, so no main 
class was present initially): the exception goes away, but imports from this 
jar in the session statements are still impossible.


was (Author: cerberuser):
Exactly same exception when trying to add jar from HDFS through the "jars" 
field; possibly for the same reason. Tried to add the main class to the jar (it 
was meant to be library-only, so there was no main class presented initially) - 
exception goes away, but imports from this jar are still impossible.

> Unable to create interactive session with additional JAR in 
> spark.driver.extraClassPath
> ---
>
> Key: LIVY-636
> URL: https://issues.apache.org/jira/browse/LIVY-636
> Project: Livy
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ishita Virmani
>Priority: Major
> Attachments: applicationmaster.log, container.log, stacktrace.txt, 
> test.png
>
>
> Command Run: {{curl -H "Content-Type: application/json" -X POST -d 
> '\{"kind":"pyspark","conf":{"spark.driver.extraClassPath":"/data/XXX-0.0.1-SNAPSHOT.jar"}}'
>  -i http:///session}}
> {{The above command fails to create a Spark Session on YARN with Null pointer 
> exception. Stack trace for the same has been attached along-with.}}
> The JAR file here is present on local driver Path. Also tried using HDFS path 
> in the following manner 
> {{hdfs://:/data/XXX-0.0.1-SNAPSHOT.jar}}
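An aside, not from the report itself: in cluster mode, spark.driver.extraClassPath must name a path that already exists on the YARN node where the driver runs, so a jar present only on the client machine is never shipped. Distributing the jar through the session's "jars" field is the usual alternative. A sketch of such a request body, using a placeholder HDFS path modeled on the one above:

```python
import json

# "jars" ships the archive with the application instead of assuming it
# already sits on the driver node's local filesystem.
payload = {
    "kind": "pyspark",
    # Placeholder path; the host:port elided in the report stays elided.
    "jars": ["hdfs:///data/XXX-0.0.1-SNAPSHOT.jar"],
}

body = json.dumps(payload)
# POST `body` to the Livy /sessions endpoint.
```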





[jira] [Commented] (LIVY-636) Unable to create interactive session with additional JAR in spark.driver.extraClassPath

2019-09-04 Thread Konstantin (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922323#comment-16922323
 ] 

Konstantin commented on LIVY-636:
-

Exactly the same exception occurs when trying to add a jar from HDFS through 
the "jars" field, possibly for the same reason. I tried adding a main class to 
the jar (it was meant to be library-only, so no main class was present 
initially): the exception goes away, but imports from this jar are still 
impossible.



[jira] [Comment Edited] (LIVY-489) Expose a JDBC endpoint for Livy

2019-09-04 Thread Von Han Yu (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922294#comment-16922294
 ] 

Von Han Yu edited comment on LIVY-489 at 9/4/19 8:50 AM:
-

Hi [~mgaido], I am interested in testing this feature but I do not know where 
to start. Does this assume that I can connect via JDBC on the same port as the 
UI? I tried using beeline, but it seems it's not working. I would appreciate 
some guidance on this.

 

Thanks!


was (Author: vonatzki):
Hi [~mgaido], I am interested on testing this feature but I do not know where 
to start. Does this assume that I can connect via JDBC in the same port of the 
UI? I tried using beeline but it seems it's not working. Would appreciate some 
guidance on this.

 

Thanks!



[jira] [Commented] (LIVY-489) Expose a JDBC endpoint for Livy

2019-09-04 Thread Von Han Yu (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922294#comment-16922294
 ] 

Von Han Yu commented on LIVY-489:
-

Hi [~mgaido], I am interested in testing this feature but I do not know where 
to start. Does this assume that I can connect via JDBC on the same port as the 
UI? I tried using beeline, but it seems it's not working. I would appreciate 
some guidance on this.

 

Thanks!
