[jira] [Assigned] (LIVY-637) get NullPointerException when create database using thriftserver

2019-08-21 Thread Saisai Shao (Jira)


 [ 
https://issues.apache.org/jira/browse/LIVY-637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-637:


Assignee: mingchao zhao

> get NullPointerException when create database using thriftserver
> 
>
> Key: LIVY-637
> URL: https://issues.apache.org/jira/browse/LIVY-637
> Project: Livy
>  Issue Type: Bug
>  Components: Thriftserver
>Affects Versions: 0.6.0
>Reporter: mingchao zhao
>Assignee: mingchao zhao
>Priority: Major
> Attachments: create.png, drop.png, use.png
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> When I connect to the thrift server with Spark beeline, a NullPointerException 
> occurs when executing the following SQL. The exception does not affect the 
> final execution result.
> create database test;
> use test;
> drop database test;
> 0: jdbc:hive2://localhost:10090> create database test;
>  java.lang.NullPointerException
>  at org.apache.hive.service.cli.ColumnBasedSet.<init>(ColumnBasedSet.java:50)
>  at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37)
>  at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:368)
>  at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:42)
>  at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1794)
>  at org.apache.hive.beeline.Commands.execute(Commands.java:860)
>  at org.apache.hive.beeline.Commands.sql(Commands.java:713)
>  at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:973)
>  at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813)
>  at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771)
>  at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484)
>  at org.apache.hive.beeline.BeeLine.main(BeeLine.java:467)
>  Error: Error retrieving next row (state=,code=0)
>  0: jdbc:hive2://localhost:10090> use test;
>  java.lang.NullPointerException
>  at org.apache.hive.service.cli.ColumnBasedSet.<init>(ColumnBasedSet.java:50)
>  at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37)
>  at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:368)
>  at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:42)
>  at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1794)
>  at org.apache.hive.beeline.Commands.execute(Commands.java:860)
>  at org.apache.hive.beeline.Commands.sql(Commands.java:713)
>  at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:973)
>  at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813)
>  at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771)
>  at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484)
>  at org.apache.hive.beeline.BeeLine.main(BeeLine.java:467)
>  Error: Error retrieving next row (state=,code=0)
> 0: jdbc:hive2://localhost:10090> drop database test;
>  java.lang.NullPointerException
>  at org.apache.hive.service.cli.ColumnBasedSet.<init>(ColumnBasedSet.java:50)
>  at org.apache.hive.service.cli.RowSetFactory.create(RowSetFactory.java:37)
>  at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:368)
>  at org.apache.hive.beeline.BufferedRows.<init>(BufferedRows.java:42)
>  at org.apache.hive.beeline.BeeLine.print(BeeLine.java:1794)
>  at org.apache.hive.beeline.Commands.execute(Commands.java:860)
>  at org.apache.hive.beeline.Commands.sql(Commands.java:713)
>  at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:973)
>  at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:813)
>  at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:771)
>  at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:484)
>  at org.apache.hive.beeline.BeeLine.main(BeeLine.java:467)
>  Error: Error retrieving next row (state=,code=0)
>  
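For reference, a minimal JDBC reproduction of what beeline does above (a sketch only: it assumes the Livy thrift server at localhost:10090 and hive-jdbc on the classpath; the row fetch after each DDL statement is what surfaces the client-side NullPointerException):

{noformat}
import java.sql.DriverManager

object DdlNpeRepro {
  def main(args: Array[String]): Unit = {
    // Connect to the Livy thrift server the same way beeline does.
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10090/default", "user", "")
    val stmt = conn.createStatement()
    for (sql <- Seq("create database test", "use test", "drop database test")) {
      // beeline calls execute() and then iterates the returned result set;
      // for these DDL statements the row fetch is where the NPE appears.
      if (stmt.execute(sql)) {
        val rs = stmt.getResultSet
        while (rs.next()) { /* no rows expected */ }
      }
    }
    conn.close()
  }
}
{noformat}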



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Closed] (LIVY-591) ACLs enforcement should occur on both session owner and proxy user

2019-08-21 Thread Saisai Shao (Jira)


 [ 
https://issues.apache.org/jira/browse/LIVY-591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao closed LIVY-591.

Resolution: Duplicate

> ACLs enforcement should occur on both session owner and proxy user
> --
>
> Key: LIVY-591
> URL: https://issues.apache.org/jira/browse/LIVY-591
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 0.6.0
>Reporter: Ankur Gupta
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Currently, ACL enforcement occurs only against the session owner, so a request 
> is authorized if the requesting user is the same as the session owner or has 
> the correct ACLs configured.
> Eg: 
> https://github.com/apache/incubator-livy/blob/master/server/src/main/scala/org/apache/livy/server/interactive/InteractiveSessionServlet.scala#L70
> In the case of impersonation, the proxy user is checked against the session 
> owner, when it should instead be checked against the session's proxy user. 
> Otherwise, a proxy user who created the session will not be able to submit 
> statements against it unless ACLs are configured correctly.
> Additionally, there appears to be no authorization check when creating a 
> session. We should add that check as well (against the modify-session ACLs).
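A minimal sketch of the check this issue argues for (the names are illustrative only, not the actual Livy server code): authorize against the session's proxy user when one is set, rather than always against the owner.

{noformat}
object AclCheckSketch {
  // Illustrative only: a request is allowed if the requesting user matches the
  // session's effective user (the proxy user if set, otherwise the owner), or
  // is listed in the configured modify ACLs.
  def isModifyAllowed(requestUser: String,
                      owner: String,
                      proxyUser: Option[String],
                      modifyAcls: Set[String]): Boolean = {
    val effectiveUser = proxyUser.getOrElse(owner)
    requestUser == effectiveUser || modifyAcls.contains(requestUser)
  }

  def main(args: Array[String]): Unit = {
    // A proxy user who created the session should be authorized even when the
    // session owner is the gateway user (e.g. "knox").
    println(isModifyAllowed("admin", owner = "knox", proxyUser = Some("admin"), modifyAcls = Set.empty))
  }
}
{noformat}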



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (LIVY-592) Proxy user cannot view its session log

2019-08-21 Thread Saisai Shao (Jira)


 [ 
https://issues.apache.org/jira/browse/LIVY-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-592.
--
Fix Version/s: 0.7.0
 Assignee: Yiheng Wang
   Resolution: Fixed

> Proxy user cannot view its session log
> --
>
> Key: LIVY-592
> URL: https://issues.apache.org/jira/browse/LIVY-592
> Project: Livy
>  Issue Type: Bug
>  Components: Server
> Environment: Docker running on Kubernetes
>Reporter: Zikun Xu
>Assignee: Yiheng Wang
>Priority: Minor
> Fix For: 0.7.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Here is how to reproduce the issue.
> 
> root@storage-0-0:~# kinit admin
> Password for admin@AZDATA.LOCAL:
> Warning: Your password will expire in 41 days on Tue Jun 11 08:35:19 2019
> root@storage-0-0:~#
> root@storage-0-0:~# curl -k -X POST --negotiate -u : --data '{"kind": 
> "pyspark", "proxyUser": "admin"}' -H "Content-Type: application/json" 
> 'https://gateway-0.azdata.local:8443/gateway/default/livy/v1/sessions'
> {"id":0,"name":null,"appId":null,"owner":"knox","proxyUser":"admin","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}
>  
> root@storage-0-0:~# curl -k --negotiate -u : 
> 'https://gateway-0.azdata.local:8443/gateway/default/livy/v1/sessions'
> {"from":0,"total":2,"sessions":[{"id":0,"name":null,"appId":"application_1556613676830_0001","owner":"knox","proxyUser":"admin","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":"[http://storage-0-0.storage-0-svc.test.svc.cluster.local:8042/node/containerlogs/container_1556613676830_0001_01_01/admin]","sparkUiUrl":"[http://master-0.azdata.local:8088/proxy/application_1556613676830_0001/]"},"log":[]},\{"id":1,"name":null,"appId":null,"owner":"knox","proxyUser":"bob","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}]}
> 
> From the result, you can see that the user admin cannot view the log of its 
> own session. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (LIVY-592) Proxy user cannot view its session log

2019-08-21 Thread Saisai Shao (Jira)


[ 
https://issues.apache.org/jira/browse/LIVY-592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912875#comment-16912875
 ] 

Saisai Shao commented on LIVY-592:
--

Issue resolved by pull request 202
https://github.com/apache/incubator-livy/pull/202

> Proxy user cannot view its session log
> --
>
> Key: LIVY-592
> URL: https://issues.apache.org/jira/browse/LIVY-592
> Project: Livy
>  Issue Type: Bug
>  Components: Server
> Environment: Docker running on Kubernetes
>Reporter: Zikun Xu
>Priority: Minor
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Here is how to reproduce the issue.
> 
> root@storage-0-0:~# kinit admin
> Password for admin@AZDATA.LOCAL:
> Warning: Your password will expire in 41 days on Tue Jun 11 08:35:19 2019
> root@storage-0-0:~#
> root@storage-0-0:~# curl -k -X POST --negotiate -u : --data '{"kind": 
> "pyspark", "proxyUser": "admin"}' -H "Content-Type: application/json" 
> 'https://gateway-0.azdata.local:8443/gateway/default/livy/v1/sessions'
> {"id":0,"name":null,"appId":null,"owner":"knox","proxyUser":"admin","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}
>  
> root@storage-0-0:~# curl -k --negotiate -u : 
> 'https://gateway-0.azdata.local:8443/gateway/default/livy/v1/sessions'
> {"from":0,"total":2,"sessions":[{"id":0,"name":null,"appId":"application_1556613676830_0001","owner":"knox","proxyUser":"admin","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":"[http://storage-0-0.storage-0-svc.test.svc.cluster.local:8042/node/containerlogs/container_1556613676830_0001_01_01/admin]","sparkUiUrl":"[http://master-0.azdata.local:8088/proxy/application_1556613676830_0001/]"},"log":[]},\{"id":1,"name":null,"appId":null,"owner":"knox","proxyUser":"bob","state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":[]}]}
> 
> From the result, you can see that the user admin cannot view the log of its 
> own session. 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: Are we going to use Apache JIRA instead of Github issues

2019-08-18 Thread Saisai Shao
>
>  The issue linking, Fix Version, and assignee features of JIRA are also
> helpful communication and organization tools.
>

Yes, I think so. GitHub issues seem a little too simple; there aren't many
statuses for tracking an issue unless we create a bunch of labels.

Wes McKinney wrote on Sat, Aug 17, 2019 at 2:37 AM:

> One significant issue with GitHub issues for ASF projects is that
> non-committers cannot edit issue or PR metadata (labels, requesting
> reviews, etc). The lack of formalism around Resolved and Closed states can
> place an extra communication burden to explain why an issue is closed.
> Sometimes projects use GitHub labels like 'wontfix'. The issue linking, Fix
> Version, and assignee features of JIRA are also helpful communication and
> organization tools.
>
> In other projects I have found JIRA easier to keep a larger number of
> people, release milestones, and issues organized. I can't imagine changing
> to GitHub issues in Apache Arrow, for example
>
> On Fri, Aug 16, 2019, 1:19 PM Ryan Blue  wrote:
>
>> I prefer to use github instead of JIRA because it is simpler and has
>> better search (in my opinion). I'm just one vote, though, so if most people
>> prefer to move to JIRA I'm open to it.
>>
>> What do you think is missing compared to JIRA?
>>
>> On Fri, Aug 16, 2019 at 3:09 AM Saisai Shao 
>> wrote:
>>
>>> Hi Team,
>>>
>>> Seems the Iceberg project uses GitHub issues instead of JIRA. IMHO JIRA is
>>> more powerful and easier to manage, and most Apache projects use JIRA to
>>> track everything. Is there any plan to move to JIRA, or do we stick with
>>> GitHub issues?
>>>
>>> Thanks
>>> Saisai
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>


Re: Release Spark 2.3.4

2019-08-18 Thread Saisai Shao
+1

Wenchen Fan wrote on Mon, Aug 19, 2019 at 10:28 AM:

> +1
>
> On Sat, Aug 17, 2019 at 3:37 PM Hyukjin Kwon  wrote:
>
>> +1 too
>>
>> On Sat, Aug 17, 2019 at 3:06 PM, Dilip Biswal wrote:
>>
>>> +1
>>>
>>> Regards,
>>> Dilip Biswal
>>> Tel: 408-463-4980
>>> dbis...@us.ibm.com
>>>
>>>
>>>
>>> - Original message -
>>> From: John Zhuge 
>>> To: Xiao Li 
>>> Cc: Takeshi Yamamuro , Spark dev list <
>>> dev@spark.apache.org>, Kazuaki Ishizaki 
>>> Subject: [EXTERNAL] Re: Release Spark 2.3.4
>>> Date: Fri, Aug 16, 2019 4:33 PM
>>>
>>> +1
>>>
>>> On Fri, Aug 16, 2019 at 4:25 PM Xiao Li  wrote:
>>>
>>> +1
>>>
>>> On Fri, Aug 16, 2019 at 4:11 PM Takeshi Yamamuro 
>>> wrote:
>>>
>>> +1, too
>>>
>>> Bests,
>>> Takeshi
>>>
>>> On Sat, Aug 17, 2019 at 7:25 AM Dongjoon Hyun 
>>> wrote:
>>>
>>> +1 for 2.3.4 release as the last release for `branch-2.3` EOL.
>>>
>>> Also, +1 for next week release.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>>
>>> On Fri, Aug 16, 2019 at 8:19 AM Sean Owen  wrote:
>>>
>>> I think it's fine to do these in parallel, yes. Go ahead if you are
>>> willing.
>>>
>>> On Fri, Aug 16, 2019 at 9:48 AM Kazuaki Ishizaki 
>>> wrote:
>>> >
>>> > Hi, All.
>>> >
>>> > Spark 2.3.3 was released six months ago (15th February, 2019) at
>>> http://spark.apache.org/news/spark-2-3-3-released.html, and about 18
>>> months have passed since Spark 2.3.0 was released (28th February,
>>> 2018).
>>> > As of today (16th August), there are 103 commits (69 JIRAs) in
>>> `branch-2.3` since 2.3.3.
>>> >
>>> > It would be great if we can have Spark 2.3.4.
>>> > If it is OK, shall we start `2.3.4 RC1` concurrently with 2.4.4, or after
>>> 2.4.4 is released?
>>> >
>>> > An issue list in JIRA:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12344844
>>> > A commit list in github from the last release:
>>> https://github.com/apache/spark/compare/66fd9c34bf406a4b5f86605d06c9607752bd637a...branch-2.3
>>> > The 8 correctness issues resolved in branch-2.3:
>>> >
>>> https://issues.apache.org/jira/browse/SPARK-26873?jql=project%20%3D%2012315420%20AND%20fixVersion%20%3D%2012344844%20AND%20labels%20in%20(%27correctness%27)%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
>>> >
>>> > Best Regards,
>>> > Kazuaki Ishizaki
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>>>
>>>
>>> --
>>> ---
>>> Takeshi Yamamuro
>>>
>>>
>>>
>>> --
>>> [image: Databricks Summit - Watch the talks]
>>> 
>>>
>>>
>>>
>>> --
>>> John Zhuge
>>>
>>>
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Are we going to use Apache JIRA instead of Github issues

2019-08-16 Thread Saisai Shao
Hi Team,

Seems the Iceberg project uses GitHub issues instead of JIRA. IMHO JIRA is more
powerful and easier to manage, and most Apache projects use JIRA to track
everything. Is there any plan to move to JIRA, or do we stick with GitHub issues?

Thanks
Saisai


[jira] [Assigned] (LIVY-623) Implement GetTables metadata operation

2019-08-15 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-623:


Assignee: Yiheng Wang

> Implement GetTables metadata operation
> --
>
> Key: LIVY-623
> URL: https://issues.apache.org/jira/browse/LIVY-623
> Project: Livy
>  Issue Type: Sub-task
>  Components: Thriftserver
>Reporter: Yiheng Wang
>Assignee: Yiheng Wang
>Priority: Minor
> Fix For: 0.7.0
>
>
> We should support GetTables metadata operation in Livy thrift server.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (LIVY-625) Implement GetFunctions metadata operation

2019-08-15 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-625:


Assignee: Yiheng Wang

> Implement GetFunctions metadata operation
> -
>
> Key: LIVY-625
> URL: https://issues.apache.org/jira/browse/LIVY-625
> Project: Livy
>  Issue Type: Sub-task
>  Components: Thriftserver
>Reporter: Yiheng Wang
>Assignee: Yiheng Wang
>Priority: Minor
> Fix For: 0.7.0
>
>
> We should support GetFunctions metadata operation in Livy thrift server.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (LIVY-624) Implement GetColumns metadata operation

2019-08-15 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-624:


Assignee: Yiheng Wang

> Implement GetColumns metadata operation
> ---
>
> Key: LIVY-624
> URL: https://issues.apache.org/jira/browse/LIVY-624
> Project: Livy
>  Issue Type: Sub-task
>  Components: Thriftserver
>Reporter: Yiheng Wang
>Assignee: Yiheng Wang
>Priority: Minor
> Fix For: 0.7.0
>
>
> We should support GetColumns metadata operation in Livy thrift server.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (LIVY-575) Implement missing metadata operations

2019-08-15 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-575:
-
Priority: Major  (was: Minor)

> Implement missing metadata operations
> -
>
> Key: LIVY-575
> URL: https://issues.apache.org/jira/browse/LIVY-575
> Project: Livy
>  Issue Type: Improvement
>  Components: Thriftserver
>Reporter: Marco Gaido
>Priority: Major
>
> Many metadata operations (e.g. table list retrieval, schema retrieval, ...) 
> are currently not implemented. We should implement them.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (LIVY-622) Implement GetSchemas metadata operation

2019-08-15 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-622:


Assignee: Yiheng Wang

> Implement GetSchemas metadata operation
> ---
>
> Key: LIVY-622
> URL: https://issues.apache.org/jira/browse/LIVY-622
> Project: Livy
>  Issue Type: Sub-task
>  Components: Thriftserver
>Reporter: Yiheng Wang
>Assignee: Yiheng Wang
>Priority: Minor
> Fix For: 0.7.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We should support GetSchemas metadata operation in Livy thrift server.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (LIVY-625) Implement GetFunctions metadata operation

2019-08-15 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-625.
--
   Resolution: Fixed
Fix Version/s: 0.7.0

Issue resolved by pull request 194
[https://github.com/apache/incubator-livy/pull/194]

> Implement GetFunctions metadata operation
> -
>
> Key: LIVY-625
> URL: https://issues.apache.org/jira/browse/LIVY-625
> Project: Livy
>  Issue Type: Sub-task
>  Components: Thriftserver
>Reporter: Yiheng Wang
>Priority: Minor
> Fix For: 0.7.0
>
>
> We should support GetFunctions metadata operation in Livy thrift server.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (LIVY-624) Implement GetColumns metadata operation

2019-08-15 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-624.
--
   Resolution: Fixed
Fix Version/s: 0.7.0

Issue resolved by pull request 194
[https://github.com/apache/incubator-livy/pull/194]

> Implement GetColumns metadata operation
> ---
>
> Key: LIVY-624
> URL: https://issues.apache.org/jira/browse/LIVY-624
> Project: Livy
>  Issue Type: Sub-task
>  Components: Thriftserver
>Reporter: Yiheng Wang
>Priority: Minor
> Fix For: 0.7.0
>
>
> We should support GetColumns metadata operation in Livy thrift server.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (LIVY-623) Implement GetTables metadata operation

2019-08-15 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-623.
--
   Resolution: Fixed
Fix Version/s: 0.7.0

Issue resolved by pull request 194
[https://github.com/apache/incubator-livy/pull/194]

> Implement GetTables metadata operation
> --
>
> Key: LIVY-623
> URL: https://issues.apache.org/jira/browse/LIVY-623
> Project: Livy
>  Issue Type: Sub-task
>  Components: Thriftserver
>Reporter: Yiheng Wang
>Priority: Minor
> Fix For: 0.7.0
>
>
> We should support GetTables metadata operation in Livy thrift server.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (LIVY-622) Implement GetSchemas metadata operation

2019-08-15 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-622.
--
   Resolution: Fixed
Fix Version/s: 0.7.0

Issue resolved by pull request 194
[https://github.com/apache/incubator-livy/pull/194]

> Implement GetSchemas metadata operation
> ---
>
> Key: LIVY-622
> URL: https://issues.apache.org/jira/browse/LIVY-622
> Project: Livy
>  Issue Type: Sub-task
>  Components: Thriftserver
>Reporter: Yiheng Wang
>Priority: Minor
> Fix For: 0.7.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> We should support GetSchemas metadata operation in Livy thrift server.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (LIVY-635) Travis failed to build

2019-08-15 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-635:
-
Component/s: Build

> Travis failed to build
> --
>
> Key: LIVY-635
> URL: https://issues.apache.org/jira/browse/LIVY-635
> Project: Livy
>  Issue Type: Bug
>  Components: Build, Tests, Thriftserver
>Affects Versions: 0.6.0
>Reporter: jiewang
>Assignee: jiewang
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [ERROR] Failed to execute goal on project livy-thriftserver: Could not 
> resolve dependencies for project 
> org.apache.livy:livy-thriftserver:jar:0.7.0-incubating-SNAPSHOT: Failed to 
> collect dependencies at org.apache.hive:hive-jdbc:jar:3.0.0 -> 
> org.apache.hive:hive-service:jar:3.0.0 -> 
> org.apache.hive:hive-llap-server:jar:3.0.0 -> 
> org.apache.hbase:hbase-server:jar:2.0.0-alpha4 -> 
> org.glassfish.web:javax.servlet.jsp:jar:2.3.2 -> 
> org.glassfish:javax.el:jar:3.0.1-b08-SNAPSHOT: Failed to read artifact 
> descriptor for org.glassfish:javax.el:jar:3.0.1-b08-SNAPSHOT: Could not 
> transfer artifact org.glassfish:javax.el:pom:3.0.1-b08-SNAPSHOT from/to 
> apache-snapshots (https://repository.apache.org/snapshots/): Connect to 
> repository.apache.org:443 [repository.apache.org/207.244.88.140] failed: 
> Connection timed out (Connection timed out) -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal on project livy-thriftserver: Could not resolve dependencies for project 
> org.apache.livy:livy-thriftserver:jar:0.7.0-incubating-SNAPSHOT: Failed to 
> collect dependencies at org.apache.hive:hive-jdbc:jar:3.0.0 -> 
> org.apache.hive:hive-service:jar:3.0.0 -> 
> org.apache.hive:hive-llap-server:jar:3.0.0 -> 
> org.apache.hbase:hbase-server:jar:2.0.0-alpha4 -> 
> org.glassfish.web:javax.servlet.jsp:jar:2.3.2 -> 
> org.glassfish:javax.el:jar:3.0.1-b08-SNAPSHOT
>   at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.getDependencies (LifecycleDependencyResolver.java:249)
>   at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.resolveProjectDependencies (LifecycleDependencyResolver.java:145)
>   at org.apache.maven.lifecycle.internal.MojoExecutor.ensureDependenciesAreResolved (MojoExecutor.java:246)
>   at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:200)
>   at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:154)
>   at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:146)
>   at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
>   at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
>   at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:51)
>   at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
>   at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:309)
>   at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:194)
>   at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:107)
>   at org.apache.maven.cli.MavenCli.execute (MavenCli.java:955)
>   at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:290)
>   at org.apache.maven.cli.MavenCli.main (MavenCli.java:194)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke (Method.java:498)
>   at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:289)
>   at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:229)
>   at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:415)
>   at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:356)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (LIVY-635) Travis failed to build

2019-08-15 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-635:


Assignee: jiewang

> Travis failed to build
> --
>
> Key: LIVY-635
> URL: https://issues.apache.org/jira/browse/LIVY-635
> Project: Livy
>  Issue Type: Bug
>  Components: Tests, Thriftserver
>Affects Versions: 0.6.0
>Reporter: jiewang
>Assignee: jiewang
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [ERROR] Failed to execute goal on project livy-thriftserver: Could not 
> resolve dependencies for project 
> org.apache.livy:livy-thriftserver:jar:0.7.0-incubating-SNAPSHOT: Failed to 
> collect dependencies at org.apache.hive:hive-jdbc:jar:3.0.0 -> 
> org.apache.hive:hive-service:jar:3.0.0 -> 
> org.apache.hive:hive-llap-server:jar:3.0.0 -> 
> org.apache.hbase:hbase-server:jar:2.0.0-alpha4 -> 
> org.glassfish.web:javax.servlet.jsp:jar:2.3.2 -> 
> org.glassfish:javax.el:jar:3.0.1-b08-SNAPSHOT: Failed to read artifact 
> descriptor for org.glassfish:javax.el:jar:3.0.1-b08-SNAPSHOT: Could not 
> transfer artifact org.glassfish:javax.el:pom:3.0.1-b08-SNAPSHOT from/to 
> apache-snapshots (https://repository.apache.org/snapshots/): Connect to 
> repository.apache.org:443 [repository.apache.org/207.244.88.140] failed: 
> Connection timed out (Connection timed out) -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal on project livy-thriftserver: Could not resolve dependencies for project 
> org.apache.livy:livy-thriftserver:jar:0.7.0-incubating-SNAPSHOT: Failed to 
> collect dependencies at org.apache.hive:hive-jdbc:jar:3.0.0 -> 
> org.apache.hive:hive-service:jar:3.0.0 -> 
> org.apache.hive:hive-llap-server:jar:3.0.0 -> 
> org.apache.hbase:hbase-server:jar:2.0.0-alpha4 -> 
> org.glassfish.web:javax.servlet.jsp:jar:2.3.2 -> 
> org.glassfish:javax.el:jar:3.0.1-b08-SNAPSHOT
>   at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.getDependencies (LifecycleDependencyResolver.java:249)
>   at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.resolveProjectDependencies (LifecycleDependencyResolver.java:145)
>   at org.apache.maven.lifecycle.internal.MojoExecutor.ensureDependenciesAreResolved (MojoExecutor.java:246)
>   at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:200)
>   at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:154)
>   at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:146)
>   at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
>   at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
>   at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:51)
>   at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
>   at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:309)
>   at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:194)
>   at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:107)
>   at org.apache.maven.cli.MavenCli.execute (MavenCli.java:955)
>   at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:290)
>   at org.apache.maven.cli.MavenCli.main (MavenCli.java:194)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke (Method.java:498)
>   at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:289)
>   at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:229)
>   at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:415)
>   at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:356)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (LIVY-635) Travis failed to build

2019-08-15 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-635.
--
   Resolution: Fixed
Fix Version/s: 0.7.0

Issue resolved by pull request 198
[https://github.com/apache/incubator-livy/pull/198]

> Travis failed to build
> --
>
> Key: LIVY-635
> URL: https://issues.apache.org/jira/browse/LIVY-635
> Project: Livy
>  Issue Type: Bug
>  Components: Tests, Thriftserver
>Affects Versions: 0.6.0
>Reporter: jiewang
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> [ERROR] Failed to execute goal on project livy-thriftserver: Could not 
> resolve dependencies for project 
> org.apache.livy:livy-thriftserver:jar:0.7.0-incubating-SNAPSHOT: Failed to 
> collect dependencies at org.apache.hive:hive-jdbc:jar:3.0.0 -> 
> org.apache.hive:hive-service:jar:3.0.0 -> 
> org.apache.hive:hive-llap-server:jar:3.0.0 -> 
> org.apache.hbase:hbase-server:jar:2.0.0-alpha4 -> 
> org.glassfish.web:javax.servlet.jsp:jar:2.3.2 -> 
> org.glassfish:javax.el:jar:3.0.1-b08-SNAPSHOT: Failed to read artifact 
> descriptor for org.glassfish:javax.el:jar:3.0.1-b08-SNAPSHOT: Could not 
> transfer artifact org.glassfish:javax.el:pom:3.0.1-b08-SNAPSHOT from/to 
> apache-snapshots (https://repository.apache.org/snapshots/): Connect to 
> repository.apache.org:443 [repository.apache.org/207.244.88.140] failed: 
> Connection timed out (Connection timed out) -> [Help 1]
> org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute 
> goal on project livy-thriftserver: Could not resolve dependencies for project 
> org.apache.livy:livy-thriftserver:jar:0.7.0-incubating-SNAPSHOT: Failed to 
> collect dependencies at org.apache.hive:hive-jdbc:jar:3.0.0 -> 
> org.apache.hive:hive-service:jar:3.0.0 -> 
> org.apache.hive:hive-llap-server:jar:3.0.0 -> 
> org.apache.hbase:hbase-server:jar:2.0.0-alpha4 -> 
> org.glassfish.web:javax.servlet.jsp:jar:2.3.2 -> 
> org.glassfish:javax.el:jar:3.0.1-b08-SNAPSHOT
>   at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.getDependencies (LifecycleDependencyResolver.java:249)
>   at org.apache.maven.lifecycle.internal.LifecycleDependencyResolver.resolveProjectDependencies (LifecycleDependencyResolver.java:145)
>   at org.apache.maven.lifecycle.internal.MojoExecutor.ensureDependenciesAreResolved (MojoExecutor.java:246)
>   at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:200)
>   at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:154)
>   at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:146)
>   at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:117)
>   at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:81)
>   at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:51)
>   at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:128)
>   at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:309)
>   at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:194)
>   at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:107)
>   at org.apache.maven.cli.MavenCli.execute (MavenCli.java:955)
>   at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:290)
>   at org.apache.maven.cli.MavenCli.main (MavenCli.java:194)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke (Method.java:498)
>   at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:289)
>   at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:229)
>   at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:415)
>   at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:356)



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (LIVY-620) Spark batch session always ends with success when configuration is master yarn and deploy-mode client

2019-08-14 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-620:


Assignee: Gustavo Martin

> Spark batch session always ends with success when configuration is master 
> yarn and deploy-mode client
> -
>
> Key: LIVY-620
> URL: https://issues.apache.org/jira/browse/LIVY-620
> Project: Livy
>  Issue Type: Improvement
>  Components: Batch
>Affects Versions: 0.5.0
>Reporter: Gustavo Martin
>Assignee: Gustavo Martin
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> In AWS emr-5.23.0 with Livy 0.5.0 and the following configuration in 
> /etc/livy/conf/livy.conf:
> {noformat}
> livy.spark.master    yarn
> livy.spark.deploy-mode   client
> {noformat}
> The batch session always ends with success because YARN always ends with status 
> Succeeded. Even if Spark fails for some reason (exceptions or whatever), the 
> batch session ends with success.
>  Not sure, but the issue of YARN always ending with success in client 
> deploy-mode might be related to this JIRA (see linked comment): 
> https://issues.apache.org/jira/browse/SPARK-11058?focusedCommentId=16052520&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16052520
> In client deploy-mode, when there are Spark errors, YARN still ends with status 
> Succeeded, but the process launched by Livy (the one running 
> org.apache.spark.deploy.SparkSubmit) is killed and exits with a non-zero return 
> code. So even though YARN always ends with success in this case, Livy can find 
> out whether the job ended with an error and end with an error itself.
> I have already implemented a patch (in master branch) that could fix this 
> issue:
> PR: [https://github.com/apache/incubator-livy/pull/192]
> {noformat}
> diff --git 
> a/server/src/main/scala/org/apache/livy/server/batch/BatchSession.scala 
> b/server/src/main/scala/org/apache/livy/server/batch/BatchSession.scala
> index 4b27058..c215a8e 100644
> --- a/server/src/main/scala/org/apache/livy/server/batch/BatchSession.scala
> +++ b/server/src/main/scala/org/apache/livy/server/batch/BatchSession.scala
> @@ -93,6 +93,7 @@ object BatchSession extends Logging {
>  
>val file = resolveURIs(Seq(request.file), livyConf)(0)
>val sparkSubmit = builder.start(Some(file), request.args)
>  
>Utils.startDaemonThread(s"batch-session-process-$id") {
>  childProcesses.incrementAndGet()
> @@ -101,6 +102,7 @@ object BatchSession extends Logging {
>  case 0 =>
>  case exitCode =>
>warn(s"spark-submit exited with code $exitCode")
> +  s.stateChanged(SparkApp.State.KILLED)
>}
>  } finally {
>childProcesses.decrementAndGet()
> @@ -182,6 +184,14 @@ class BatchSession(
>override def stateChanged(oldState: SparkApp.State, newState: 
> SparkApp.State): Unit = {
>  synchronized {
>debug(s"$this state changed from $oldState to $newState")
> +  if (_state != SessionState.Dead()) {
> +stateChanged(newState)
> +  }
> +}
> +  }
> +
> +  private def stateChanged(newState: SparkApp.State): Unit = {
> +synchronized {
>newState match {
>  case SparkApp.State.RUNNING =>
>_state = SessionState.Running
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (LIVY-620) Spark batch session always ends with success when configuration is master yarn and deploy-mode client

2019-08-14 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-620.
--
   Resolution: Fixed
Fix Version/s: 0.7.0

Issue resolved by pull request 192
[https://github.com/apache/incubator-livy/pull/192]

> Spark batch session always ends with success when configuration is master 
> yarn and deploy-mode client
> -
>
> Key: LIVY-620
> URL: https://issues.apache.org/jira/browse/LIVY-620
> Project: Livy
>  Issue Type: Improvement
>  Components: Batch
>Affects Versions: 0.5.0
>Reporter: Gustavo Martin
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> In AWS emr-5.23.0 with Livy 0.5.0 and the following configuration in 
> /etc/livy/conf/livy.conf:
> {noformat}
> livy.spark.master    yarn
> livy.spark.deploy-mode   client
> {noformat}
> The batch session always ends with success because YARN always ends with status 
> Succeeded. Even if Spark fails for some reason (exceptions or whatever), the 
> batch session ends with success.
>  Not sure, but the issue of YARN always ending with success in client 
> deploy-mode might be related to this JIRA (see linked comment): 
> https://issues.apache.org/jira/browse/SPARK-11058?focusedCommentId=16052520&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16052520
> In client deploy-mode, when there are Spark errors, YARN still ends with status 
> Succeeded, but the process launched by Livy (the one running 
> org.apache.spark.deploy.SparkSubmit) is killed and exits with a non-zero return 
> code. So even though YARN always ends with success in this case, Livy can find 
> out whether the job ended with an error and end with an error itself.
> I have already implemented a patch (in master branch) that could fix this 
> issue:
> PR: [https://github.com/apache/incubator-livy/pull/192]
> {noformat}
> diff --git 
> a/server/src/main/scala/org/apache/livy/server/batch/BatchSession.scala 
> b/server/src/main/scala/org/apache/livy/server/batch/BatchSession.scala
> index 4b27058..c215a8e 100644
> --- a/server/src/main/scala/org/apache/livy/server/batch/BatchSession.scala
> +++ b/server/src/main/scala/org/apache/livy/server/batch/BatchSession.scala
> @@ -93,6 +93,7 @@ object BatchSession extends Logging {
>  
>val file = resolveURIs(Seq(request.file), livyConf)(0)
>val sparkSubmit = builder.start(Some(file), request.args)
>  
>Utils.startDaemonThread(s"batch-session-process-$id") {
>  childProcesses.incrementAndGet()
> @@ -101,6 +102,7 @@ object BatchSession extends Logging {
>  case 0 =>
>  case exitCode =>
>warn(s"spark-submit exited with code $exitCode")
> +  s.stateChanged(SparkApp.State.KILLED)
>}
>  } finally {
>childProcesses.decrementAndGet()
> @@ -182,6 +184,14 @@ class BatchSession(
>override def stateChanged(oldState: SparkApp.State, newState: 
> SparkApp.State): Unit = {
>  synchronized {
>debug(s"$this state changed from $oldState to $newState")
> +  if (_state != SessionState.Dead()) {
> +stateChanged(newState)
> +  }
> +}
> +  }
> +
> +  private def stateChanged(newState: SparkApp.State): Unit = {
> +synchronized {
>newState match {
>  case SparkApp.State.RUNNING =>
>_state = SessionState.Running
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (LIVY-616) Livy Server discovery

2019-08-13 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16906098#comment-16906098
 ] 

Saisai Shao commented on LIVY-616:
--

I will spend time on the PR itself, but I'm a little busy these days, so the 
review may not be fast. I would also suggest you split the work into small 
sub-tasks for ease of review, such as API design, framework abstraction, 
detailed implementation, and so on.

> Livy Server discovery
> -
>
> Key: LIVY-616
> URL: https://issues.apache.org/jira/browse/LIVY-616
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Reporter: Oleksandr Shevchenko
>Priority: Major
> Attachments: Livy Server discovery spec.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, there is no way for a client to get the Livy Server URI without 
> setting the Livy address explicitly in livy.conf. A client has to set the 
> "livy.server.host" variable and then read it via LivyConf. The same applies 
> when using Livy with Zeppelin, where we need to set "zeppelin.livy.url". This 
> is very inconvenient when we install Livy packages on a couple of nodes and 
> don't know exactly where the Livy Server will be started, e.g. by Ambari or 
> Cloudera Manager. Also, in this case, we need to have Livy configuration files 
> on the node where we want to look up the Livy address. 
> It would be very helpful to add the Livy Server address to ZooKeeper and 
> expose an API for clients to get the Livy URL for use in client code for REST 
> calls. 
> Livy already supports saving state in ZooKeeper, but I don't see that we store 
> the Livy server address anywhere. Before starting to investigate and implement 
> this, I want to ask here about it.
> Please correct me if I missed something.
> Any comments will be highly appreciated!
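To make the proposal concrete, here is a rough sketch of what registration and lookup could look like with Curator (which Livy already uses for its ZooKeeper state store). The znode path and the API shape are illustrative assumptions, not taken from the attached spec.

{noformat}
import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.retry.ExponentialBackoffRetry
import org.apache.zookeeper.CreateMode

object LivyServerDiscoverySketch {
  private val ServerPath = "/livy/server-uri"  // illustrative znode path

  private def newClient(zkConnect: String) = {
    val client = CuratorFrameworkFactory.newClient(zkConnect, new ExponentialBackoffRetry(1000, 3))
    client.start()
    client
  }

  // Server side: publish the address as an ephemeral node so it disappears
  // automatically when the server goes away.
  def register(zkConnect: String, serverUri: String): Unit = {
    newClient(zkConnect).create()
      .creatingParentsIfNeeded()
      .withMode(CreateMode.EPHEMERAL)
      .forPath(ServerPath, serverUri.getBytes("UTF-8"))
  }

  // Client side: look up the address once, then issue ordinary REST calls against it.
  def lookup(zkConnect: String): String =
    new String(newClient(zkConnect).getData.forPath(ServerPath), "UTF-8")
}
{noformat}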



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (LIVY-616) Livy Server discovery

2019-08-12 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905734#comment-16905734
 ] 

Saisai Shao commented on LIVY-616:
--

I don't have objections to it; I'm just thinking about how to use it properly. 
In the current design we need to depend on the livy-server jar, which is not so 
good; maybe we should separate out this ZK discovery code path.

> Livy Server discovery
> -
>
> Key: LIVY-616
> URL: https://issues.apache.org/jira/browse/LIVY-616
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Reporter: Oleksandr Shevchenko
>Priority: Major
> Attachments: Livy Server discovery spec.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, there is no way for a client to get the Livy Server URI without 
> setting the Livy address explicitly in livy.conf. A client has to set the 
> "livy.server.host" variable and then read it via LivyConf. The same applies 
> when using Livy with Zeppelin, where we need to set "zeppelin.livy.url". This 
> is very inconvenient when we install Livy packages on a couple of nodes and 
> don't know exactly where the Livy Server will be started, e.g. by Ambari or 
> Cloudera Manager. Also, in this case, we need to have Livy configuration files 
> on the node where we want to look up the Livy address. 
> It would be very helpful to add the Livy Server address to ZooKeeper and 
> expose an API for clients to get the Livy URL for use in client code for REST 
> calls. 
> Livy already supports saving state in ZooKeeper, but I don't see that we store 
> the Livy server address anywhere. Before starting to investigate and implement 
> this, I want to ask here about it.
> Please correct me if I missed something.
> Any comments will be highly appreciated!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (LIVY-616) Livy Server discovery

2019-08-12 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904966#comment-16904966
 ] 

Saisai Shao commented on LIVY-616:
--

What I mean is using the HTTP endpoint directly, for example with curl, but it 
could be any other language binding. The problem is still the same (the client 
doesn't know the server URL), so how do we handle this?

What you mentioned above is using an SDK, either Java or Scala... But Livy is 
fundamentally a REST endpoint, so how do we handle scenarios where we don't 
have SDK support?

> Livy Server discovery
> -
>
> Key: LIVY-616
> URL: https://issues.apache.org/jira/browse/LIVY-616
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Reporter: Oleksandr Shevchenko
>Priority: Major
> Attachments: Livy Server discovery spec.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, there is no way for a client to get the Livy Server URI without 
> setting the Livy address explicitly in livy.conf. A client has to set the 
> "livy.server.host" variable and then read it via LivyConf. The same applies 
> when using Livy with Zeppelin, where we need to set "zeppelin.livy.url". This 
> is very inconvenient when we install Livy packages on a couple of nodes and 
> don't know exactly where the Livy Server will be started, e.g. by Ambari or 
> Cloudera Manager. Also, in this case, we need to have Livy configuration files 
> on the node where we want to look up the Livy address. 
> It would be very helpful to add the Livy Server address to ZooKeeper and 
> expose an API for clients to get the Livy URL for use in client code for REST 
> calls. 
> Livy already supports saving state in ZooKeeper, but I don't see that we store 
> the Livy server address anywhere. Before starting to investigate and implement 
> this, I want to ask here about it.
> Please correct me if I missed something.
> Any comments will be highly appreciated!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (LIVY-616) Livy Server discovery

2019-08-12 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904922#comment-16904922
 ] 

Saisai Shao commented on LIVY-616:
--

Yes, I was asking how to use an HTTP request directly; do you have a related API?

> Livy Server discovery
> -
>
> Key: LIVY-616
> URL: https://issues.apache.org/jira/browse/LIVY-616
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Reporter: Oleksandr Shevchenko
>Priority: Major
> Attachments: Livy Server discovery spec.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, there is no way for a client to get the Livy Server URI without 
> setting the Livy address explicitly in livy.conf. A client has to set the 
> "livy.server.host" variable and then read it via LivyConf. The same applies 
> when using Livy with Zeppelin, where we need to set "zeppelin.livy.url". This 
> is very inconvenient when we install Livy packages on a couple of nodes and 
> don't know exactly where the Livy Server will be started, e.g. by Ambari or 
> Cloudera Manager. Also, in this case, we need to have Livy configuration files 
> on the node where we want to look up the Livy address. 
> It would be very helpful to add the Livy Server address to ZooKeeper and 
> expose an API for clients to get the Livy URL for use in client code for REST 
> calls. 
> Livy already supports saving state in ZooKeeper, but I don't see that we store 
> the Livy server address anywhere. Before starting to investigate and implement 
> this, I want to ask here about it.
> Please correct me if I missed something.
> Any comments will be highly appreciated!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (LIVY-616) Livy Server discovery

2019-08-12 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904908#comment-16904908
 ] 

Saisai Shao commented on LIVY-616:
--

I think I was asking about users directly using the REST API, not the Livy Job 
API. How do they get the Livy host URL? Do you have such a design?

> Livy Server discovery
> -
>
> Key: LIVY-616
> URL: https://issues.apache.org/jira/browse/LIVY-616
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Reporter: Oleksandr Shevchenko
>Priority: Major
> Attachments: Livy Server discovery spec.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, there is no way for a client to get the Livy Server URI without 
> setting the Livy address explicitly in livy.conf. A client has to set the 
> "livy.server.host" variable and then read it via LivyConf. The same applies 
> when using Livy with Zeppelin, where we need to set "zeppelin.livy.url". This 
> is very inconvenient when we install Livy packages on a couple of nodes and 
> don't know exactly where the Livy Server will be started, e.g. by Ambari or 
> Cloudera Manager. Also, in this case, we need to have Livy configuration files 
> on the node where we want to look up the Livy address. 
> It would be very helpful to add the Livy Server address to ZooKeeper and 
> expose an API for clients to get the Livy URL for use in client code for REST 
> calls. 
> Livy already supports saving state in ZooKeeper, but I don't see that we store 
> the Livy server address anywhere. Before starting to investigate and implement 
> this, I want to ask here about it.
> Please correct me if I missed something.
> Any comments will be highly appreciated!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (LIVY-616) Livy Server discovery

2019-08-11 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904863#comment-16904863
 ] 

Saisai Shao commented on LIVY-616:
--

Two questions about your design:

1. How do you handle other scenarios, such as directly using the REST API rather 
than the client API?
2. For now this is bound to ZooKeeper, but Livy's recovery mode can also use 
HDFS; how do you support that scenario?

> Livy Server discovery
> -
>
> Key: LIVY-616
> URL: https://issues.apache.org/jira/browse/LIVY-616
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Reporter: Oleksandr Shevchenko
>Priority: Major
> Attachments: Livy Server discovery spec.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, there is no way for a client to get the Livy Server URI without 
> setting the Livy address explicitly in livy.conf. A client has to set the 
> "livy.server.host" variable and then read it via LivyConf. The same applies 
> when using Livy with Zeppelin, where we need to set "zeppelin.livy.url". This 
> is very inconvenient when we install Livy packages on a couple of nodes and 
> don't know exactly where the Livy Server will be started, e.g. by Ambari or 
> Cloudera Manager. Also, in this case, we need to have Livy configuration files 
> on the node where we want to look up the Livy address. 
> It would be very helpful to add the Livy Server address to ZooKeeper and 
> expose an API for clients to get the Livy URL for use in client code for REST 
> calls. 
> Livy already supports saving state in ZooKeeper, but I don't see that we store 
> the Livy server address anywhere. Before starting to investigate and implement 
> this, I want to ask here about it.
> Please correct me if I missed something.
> Any comments will be highly appreciated!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


Re: Creating session in livy with jars parameter

2019-08-11 Thread Saisai Shao
Can you please paste the exception?

Pavel Sivak wrote on Mon, Aug 12, 2019 at 11:52 AM:

> Sure Saisai,
> First step: run the Livy server. I can see that the server is running and the
> web interface is available.
> Second step: session creation. Example of the body in the POST request:
>
>> {"kind":"spark", "jars":["local:/path_to_jar/test.jar"]}
>>
> The session is in *STARTING* status; I can see that exception in the log, but
> the Spark UI is available and I can see my jar in the Environment tab.
>
> But if I use some other library (I took Guava, for example), the session is
> in *IDLE* status.
>
>> {"kind":"spark", "jars":["local:/path_to_guava_jar/guava.jar"]}
>
>
> I don't understand what the difference could be between my jar and some
> other jars...
>
> On Sun, Aug 11, 2019 at 10:46 PM Saisai Shao 
> wrote:
>
>> Would you mind listing the steps to reproduce your issue, and how do you
>> use REST APIs?
>>
>> Thanks
>> Saisai
>>
>> Pavel Sivak wrote on Sat, Aug 10, 2019 at 11:01 AM:
>>
>>> Hi,
>>> My idea is to create a Livy session with my library on the class path using
>>> the "jars" parameter.
>>> I'm using the REST API to create a session. After sending the POST request I
>>> can see that the Spark session is up, I can use the Spark UI, and my jar is
>>> in the Environment tab.
>>> But the status of the session in Livy is "Starting"...
>>> This is an example from the log file:
>>>
>>>> 19/08/09 22:10:37 INFO driver.SparkEntries: Created Spark session.
>>>> Exception in thread "main" java.lang.NullPointerException
>>>>at org.apache.livy.rsc.driver.JobWrapper.cancel(JobWrapper.java:90)
>>>>at org.apache.livy.rsc.driver.RSCDriver.shutdown(RSCDriver.java:127)
>>>>at org.apache.livy.rsc.driver.RSCDriver.run(RSCDriver.java:356)
>>>>at 
>>>> org.apache.livy.rsc.driver.RSCDriverBootstrapper.main(RSCDriverBootstrapper.java:93)
>>>>at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>at 
>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>>>at 
>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>at java.lang.reflect.Method.invoke(Method.java:498)
>>>>at 
>>>> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>>>>at 
>>>> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
>>>>at 
>>>> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
>>>>at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
>>>>
>>>> Without my library, I'm getting a Livy session in Idle status.
>>> Can you help me to figure this out?
>>> Thanks
>>> --
>>> Best wishes,
>>> Pavel Sivak
>>>
>>
>
> --
> Best wishes,
> Pavel Sivak
>
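For reference, the same session-creation call from code, as a minimal sketch using only the JDK HTTP client (it assumes the Livy server on the default port 8998; the jar path is a placeholder):

{noformat}
import java.io.OutputStreamWriter
import java.net.{HttpURLConnection, URL}
import scala.io.Source

object CreateSessionSketch {
  def main(args: Array[String]): Unit = {
    // Same JSON body as the POST example shown earlier in this thread.
    val body = """{"kind": "spark", "jars": ["local:/path_to_jar/test.jar"]}"""
    val conn = new URL("http://localhost:8998/sessions").openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    val writer = new OutputStreamWriter(conn.getOutputStream)
    writer.write(body)
    writer.close()
    // The response echoes the session description, including its state
    // ("starting" at first, then "idle" once the driver is fully up).
    println(Source.fromInputStream(conn.getInputStream).mkString)
  }
}
{noformat}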


Re: Creating session in livy with jars parameter

2019-08-11 Thread Saisai Shao
Would you mind listing the steps to reproduce your issue, and how do you
use REST APIs?

Thanks
Saisai

Pavel Sivak wrote on Sat, Aug 10, 2019 at 11:01 AM:

> Hi,
> My idea is to create a Livy session with my library on the class path using
> the "jars" parameter.
> I'm using the REST API to create a session. After sending the POST request I
> can see that the Spark session is up, I can use the Spark UI, and my jar is
> in the Environment tab.
> But the status of the session in Livy is "Starting"...
> This is an example from the log file:
>
>> 19/08/09 22:10:37 INFO driver.SparkEntries: Created Spark session.
>> Exception in thread "main" java.lang.NullPointerException
>>  at org.apache.livy.rsc.driver.JobWrapper.cancel(JobWrapper.java:90)
>>  at org.apache.livy.rsc.driver.RSCDriver.shutdown(RSCDriver.java:127)
>>  at org.apache.livy.rsc.driver.RSCDriver.run(RSCDriver.java:356)
>>  at 
>> org.apache.livy.rsc.driver.RSCDriverBootstrapper.main(RSCDriverBootstrapper.java:93)
>>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>  at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>  at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>  at java.lang.reflect.Method.invoke(Method.java:498)
>>  at 
>> org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
>>  at 
>> org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
>>  at 
>> org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:167)
>>  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
>>
>> Without my library I'm getting the Livy session in Idle status.
> Can you help me to figure this out?
> Thanks
> --
> Best wishes,
> Pavel Sivak
>


Re: Pre-registering UDTs / UDFs in Livy session

2019-08-11 Thread Saisai Shao
Unfortunately there's no such mechanism to inject custom code when a session
is started on the Livy side. I think you can add some code on the Spark side:
Spark has a listener interface, `SparkListener`, which provides an
`onApplicationStart` hook that is called immediately after the application
starts. You can take a look at SparkListener.
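
A minimal sketch of that approach, assuming a SparkSession can be obtained when
the event fires (the listener class name and the registered UDF are
placeholders, not part of Livy or Spark):

import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationStart}
import org.apache.spark.sql.SparkSession

// Hypothetical listener: registers UDFs as soon as the application starts.
// Enable it with --conf spark.extraListeners=com.example.UdfRegisteringListener
class UdfRegisteringListener extends SparkListener {
  override def onApplicationStart(event: SparkListenerApplicationStart): Unit = {
    // Assumes the SparkContext is usable at this point; getOrCreate reuses it.
    // Depending on timing, deferring this to onJobStart may be safer.
    val spark = SparkSession.builder().getOrCreate()
    // Register whatever UDFs/UDTs the session should always expose.
    spark.udf.register("to_upper", (s: String) => s.toUpperCase)
  }
}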

Thanks
Saisai

Sergii Mikhtoniuk  于2019年8月12日周一 上午6:14写道:

> Hi,
>
> I'm currently using Livy in two different contexts:
> - from Jupyter notebooks
> - from SqlLine/Beeline CLI over Thrift/JDBC connection.
>
> The data I work with includes GIS, so it is sometimes necessary to register
> custom (GeoSpark) geometry UDTs and UDFs in the Spark session.
>
> For the Jupyter notebook case I was able to simply add a custom step to my
> Jupyter kernel that registers UDTs after the session is created, but I don't
> know how to achieve the same in the JDBC client scenario.
>
> Is there any extension mechanism in Livy or Spark that would execute custom
> code on session init, or automatically discover and register UDFs/UDTs?
>
> As I understand from https://issues.apache.org/jira/browse/SPARK-7768 the
> UDT mechanism is still in flux, but perhaps there's a better solution than
> to fork Livy to add my custom registration code.
>
> Any pointers are much appreciated.
>
> - Sergii
>


[jira] [Assigned] (LIVY-547) Livy kills session after livy.server.session.timeout even if the session is active

2019-08-08 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-547:


Assignee: shanyu zhao

> Livy kills session after livy.server.session.timeout even if the session is 
> active
> --
>
> Key: LIVY-547
> URL: https://issues.apache.org/jira/browse/LIVY-547
> Project: Livy
>  Issue Type: Bug
>  Components: Server
>Reporter: Sandeep Nemuri
>Assignee: shanyu zhao
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Livy kills session after {{livy.server.session.timeout}} even if the session 
> is active.
> Code that runs more than the {{livy.server.session.timeout}} with 
> intermediate sleeps.
> {noformat}
> %pyspark 
> import time 
> import datetime 
> import random
> def inside(p):
> x, y = random.random(), random.random()
> return x*x + y*y < 1
> NUM_SAMPLES=10
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 100 s") 
> time.sleep(100) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 200 s") 
> time.sleep(200) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s1") 
> time.sleep(300)
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s2") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s3") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s4") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> {noformat}
> Livy log:
> {noformat}
> 19/01/07 17:38:59 INFO InteractiveSession: Interactive session 14 created 
> [appid: application_1546711709239_0002, owner: zeppelin-hwc327, proxyUser: 
> Some(admin), state: idle, kind: shared, info: 
> {driverLogUrl=http://hwc327-node3.hogwarts-labs.com:8042/node/containerlogs/container_e18_1546711709239_0002_01_01/admin,
>  
> sparkUiUrl=http://hwc327-node2.hogwarts-labs.com:8088/proxy/application_1546711709239_0002/}]
> 19/01/07 17:52:46 INFO InteractiveSession: Stopping InteractiveSession 14...
> 19/01/07 17:52:56 WARN RSCClient: Exception while waiting for end session 
> reply.
> java.util.concurrent.TimeoutException
> at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
> at org.apache.livy.rsc.RSCClient.stop(RSCClient.java:223)
> at 
> org.apache.livy.server.interactive.InteractiveSession$$anonfun$stopSession$1.apply(InteractiveSession.scala:471)
> at 
> org.apache.livy.server.interactive.InteractiveSession$$anonfun$stopSession$1.apply(InteractiveSession.scala:471)
> at scala.Option.foreach(Option.scala:236)
> at 
> org.apache.livy.server.interactive.InteractiveSession.stopSession(InteractiveSession.scala:471)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply$mcV$sp(Session.scala:174)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply(Session.scala:171)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply(Session.scala:171)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at 
> scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1

[jira] [Resolved] (LIVY-547) Livy kills session after livy.server.session.timeout even if the session is active

2019-08-08 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-547.
--
   Resolution: Fixed
Fix Version/s: 0.7.0

> Livy kills session after livy.server.session.timeout even if the session is 
> active
> --
>
> Key: LIVY-547
> URL: https://issues.apache.org/jira/browse/LIVY-547
> Project: Livy
>  Issue Type: Bug
>  Components: Server
>Reporter: Sandeep Nemuri
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Livy kills session after {{livy.server.session.timeout}} even if the session 
> is active.
> Code that runs more than the {{livy.server.session.timeout}} with 
> intermediate sleeps.
> {noformat}
> %pyspark 
> import time 
> import datetime 
> import random
> def inside(p):
> x, y = random.random(), random.random()
> return x*x + y*y < 1
> NUM_SAMPLES=10
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 100 s") 
> time.sleep(100) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 200 s") 
> time.sleep(200) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s1") 
> time.sleep(300)
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s2") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s3") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s4") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> {noformat}
> Livy log:
> {noformat}
> 19/01/07 17:38:59 INFO InteractiveSession: Interactive session 14 created 
> [appid: application_1546711709239_0002, owner: zeppelin-hwc327, proxyUser: 
> Some(admin), state: idle, kind: shared, info: 
> {driverLogUrl=http://hwc327-node3.hogwarts-labs.com:8042/node/containerlogs/container_e18_1546711709239_0002_01_01/admin,
>  
> sparkUiUrl=http://hwc327-node2.hogwarts-labs.com:8088/proxy/application_1546711709239_0002/}]
> 19/01/07 17:52:46 INFO InteractiveSession: Stopping InteractiveSession 14...
> 19/01/07 17:52:56 WARN RSCClient: Exception while waiting for end session 
> reply.
> java.util.concurrent.TimeoutException
> at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
> at org.apache.livy.rsc.RSCClient.stop(RSCClient.java:223)
> at 
> org.apache.livy.server.interactive.InteractiveSession$$anonfun$stopSession$1.apply(InteractiveSession.scala:471)
> at 
> org.apache.livy.server.interactive.InteractiveSession$$anonfun$stopSession$1.apply(InteractiveSession.scala:471)
> at scala.Option.foreach(Option.scala:236)
> at 
> org.apache.livy.server.interactive.InteractiveSession.stopSession(InteractiveSession.scala:471)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply$mcV$sp(Session.scala:174)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply(Session.scala:171)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply(Session.scala:171)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at 
> scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979

[jira] [Commented] (LIVY-547) Livy kills session after livy.server.session.timeout even if the session is active

2019-08-08 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16903616#comment-16903616
 ] 

Saisai Shao commented on LIVY-547:
--

Issue resolved by pull request 190
https://github.com/apache/incubator-livy/pull/190

> Livy kills session after livy.server.session.timeout even if the session is 
> active
> --
>
> Key: LIVY-547
> URL: https://issues.apache.org/jira/browse/LIVY-547
> Project: Livy
>  Issue Type: Bug
>  Components: Server
>Reporter: Sandeep Nemuri
>Priority: Major
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Livy kills session after {{livy.server.session.timeout}} even if the session 
> is active.
> Code that runs more than the {{livy.server.session.timeout}} with 
> intermediate sleeps.
> {noformat}
> %pyspark 
> import time 
> import datetime 
> import random
> def inside(p):
> x, y = random.random(), random.random()
> return x*x + y*y < 1
> NUM_SAMPLES=10
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 100 s") 
> time.sleep(100) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 200 s") 
> time.sleep(200) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s1") 
> time.sleep(300)
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s2") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s3") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> print("waiting for 300 s4") 
> time.sleep(300) 
> count = sc.parallelize(xrange(0, NUM_SAMPLES)) \
>  .filter(inside).count()
> print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES)
> {noformat}
> Livy log:
> {noformat}
> 19/01/07 17:38:59 INFO InteractiveSession: Interactive session 14 created 
> [appid: application_1546711709239_0002, owner: zeppelin-hwc327, proxyUser: 
> Some(admin), state: idle, kind: shared, info: 
> {driverLogUrl=http://hwc327-node3.hogwarts-labs.com:8042/node/containerlogs/container_e18_1546711709239_0002_01_01/admin,
>  
> sparkUiUrl=http://hwc327-node2.hogwarts-labs.com:8088/proxy/application_1546711709239_0002/}]
> 19/01/07 17:52:46 INFO InteractiveSession: Stopping InteractiveSession 14...
> 19/01/07 17:52:56 WARN RSCClient: Exception while waiting for end session 
> reply.
> java.util.concurrent.TimeoutException
> at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:49)
> at org.apache.livy.rsc.RSCClient.stop(RSCClient.java:223)
> at 
> org.apache.livy.server.interactive.InteractiveSession$$anonfun$stopSession$1.apply(InteractiveSession.scala:471)
> at 
> org.apache.livy.server.interactive.InteractiveSession$$anonfun$stopSession$1.apply(InteractiveSession.scala:471)
> at scala.Option.foreach(Option.scala:236)
> at 
> org.apache.livy.server.interactive.InteractiveSession.stopSession(InteractiveSession.scala:471)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply$mcV$sp(Session.scala:174)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply(Session.scala:171)
> at 
> org.apache.livy.sessions.Session$$anonfun$stop$1.apply(Session.scala:171)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at 
> scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at 
> scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
> at 
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at 
> scala.concurrent.forkjoin.ForkJ

Re: Any plan to support update, delete and others

2019-08-08 Thread Saisai Shao
Got it. Thanks a lot for the reply.

Best regards,
Saisai

Ryan Blue  于2019年8月9日周五 上午6:36写道:

> We've actually been doing all of our API work in upstream Spark instead of
> adding APIs to Iceberg for row-level data manipulation. That's why I'm
> involved in the DataSourceV2 work.
>
> I think for Delta, this is probably an effort to get some features out
> earlier. I think that's easier for Delta because it deeply integrates with
> Spark and adds new plans -- last I checked, some of the project had to be
> located in Spark packages because they use internal classes.
>
> I think that this API will probably be contributed to Spark itself when
> Spark supports update and merge operations. That's probably a good time for
> Iceberg to pick it up because Iceberg still needs to update the format for
> those.
>
> Otherwise, Spark supports the latest features available in DataSourceV2,
> and will continue to. In fact, we're adding features to DSv2 based on what
> we've built internally at Netflix to support Iceberg.
>
> On Wed, Aug 7, 2019 at 7:03 PM Saisai Shao  wrote:
>
>> Thanks a lot Ryan, that would be very helpful!
>>
>> Delta lake recently adds support for such operations in API level (
>> https://github.com/delta-io/delta/blob/master/src/main/scala/io/delta/tables/DeltaTable.scala).
>> I was thinking that in the API level the goal of Iceberg is similar, maybe
>> we could take that as a reference.
>>
>> Besides directly using Iceberg API to manipulate data is not so
>> straightforward, so it would be great if we could also have a DF API/SQL
>> support later on.
>>
>> Best regards
>> Saisai
>>
>> Ryan Blue  于2019年8月8日周四 上午1:22写道:
>>
>>> Hi Saisai,
>>>
>>> We are working on adding row-level delete support to Iceberg, where the
>>> deletes are applied when data is read. We’ve had a few good design
>>> discussions and have come up with a good way to integrate these into the
>>> format. Erik has written a good document on it:
>>> https://docs.google.com/document/d/1FMKh_SQ6xSUUmoCA8LerTkzIxDUN5JbStQp5Hzot4eo/edit#heading=h.p74qmh3a6ets
>>>
>>> I’ve also started a milestone to track this work:
>>> https://github.com/apache/incubator-iceberg/issues?q=is%3Aopen+is%3Aissue+milestone%3A%22Row-level+Delete%22
>>>
>>> That’s assuming that you’re talking about row-level deletes. Iceberg
>>> already supports file-level delete, overwrite, etc.
>>>
>>> Iceberg also already supports a vacuum operation using ExpireSnapshots
>>> <http://iceberg.apache.org/javadoc/master/index.html?org/apache/iceberg/ExpireSnapshots.html>.
>>> But, Spark (and other engines) don’t have a way to call this yet. Same for 
>>> MERGE
>>> INTO, open source Spark doesn’t support the operation yet. We’re also
>>> working on building support into Spark as we go.
>>>
>>> I hope that helps!
>>>
>>> On Wed, Aug 7, 2019 at 4:25 AM Saisai Shao 
>>> wrote:
>>>
>>>> Hi team,
>>>>
>>>> Delta lake project recently announced version 0.3.0, which added
>>>> several new features in API level, like update, delete, merge, vacuum, etc.
>>>> May I ask is there any plan to add such features in Iceberg?
>>>>
>>>> Thanks
>>>> Saisai
>>>>
>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


Re: Two newbie question about Iceberg

2019-08-08 Thread Saisai Shao
I'm still looking into this, to figure out a way to add the HIVE_LOCKS table
on the Spark side. Anyway, I will create an issue first to track this.

Best regards,
Saisai

Ryan Blue  于2019年8月9日周五 上午4:58写道:

> Any ideas on how to fix this? Can we create the HIVE_LOCKS table if it is
> missing automatically?
>
> On Wed, Aug 7, 2019 at 7:13 PM Saisai Shao  wrote:
>
>> Thanks guys for your reply.
>>
>> I didn't do anything special, I don't even have a configured Hive. I just
>> simply put the iceberg (assembly) jar into Spark and start a local Spark
>> process. I think the built-in Hive version of Spark is 1.2.1-spark (has a
>> slight pom change), and all the configurations related to SparkSQL/Hive are
>> default. I guess the reason is like Anton mentioned, I will take a try by
>> creating all tables (HIVE_LOCKS) using script. But I think we should fix
>> it, this potentially stops user to do a quick start by using local spark.
>>
>>> I think the reason why it works in tests is because we create all tables
>>> (including HIVE_LOCKS) using a script
>>>
>>
>> Best regards,
>> Saisai
>>
>> Anton Okolnychyi  于2019年8月7日周三 下午11:56写道:
>>
>>> I think the reason why it works in tests is because we create all tables
>>> (including HIVE_LOCKS) using a script. I am not sure lock tables are always
>>> created in embedded mode.
>>>
>>> > On 7 Aug 2019, at 16:49, Ryan Blue  wrote:
>>> >
>>> > This is the right list. Iceberg is fairly low in the stack, so most
>>> questions are probably dev questions.
>>> >
>>> > I'm surprised that this doesn't work with an embedded metastore
>>> because we use an embedded metastore in tests:
>>> https://github.com/apache/incubator-iceberg/blob/master/hive/src/test/java/org/apache/iceberg/hive/TestHiveMetastore.java
>>> >
>>> > But we are also using Hive 1.2.1 and a metastore schema for 3.1.0. I
>>> wonder if a newer version of Hive would avoid this problem? What version
>>> are you linking with?
>>> >
>>> > On Tue, Aug 6, 2019 at 8:59 PM Saisai Shao 
>>> wrote:
>>> > Hi team,
>>> >
>>> > I just met some issues when trying Iceberg with quick start guide. Not
>>> sure if it is proper to send this to @dev mail list (seems there's no user
>>> mail list).
>>> >
>>> > One issue is that seems current Iceberg cannot run with embedded
>>> metastore. It will throw an exception. Is this an on-purpose behavior
>>> (force to use remote HMS), or just a bug?
>>> >
>>> > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Unable
>>> to update transaction database java.sql.SQLSyntaxErrorException: Table/View
>>> 'HIVE_LOCKS' does not exist.
>>> > at
>>> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
>>> Source)
>>> > at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown
>>> Source)
>>> > at
>>> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown
>>> Source)
>>> >
>>> > Followed by this issue, seems like current Iceberg only binds to HMS
>>> as catalog, this is fine for production usage. But I'm wondering if we
>>> could have a simple catalog like in-memory catalog as Spark, so that it is
>>> easy for user to test and play. Is there any concern or plan?
>>> >
>>> > Best regards,
>>> > Saisai
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Ryan Blue
>>> > Software Engineer
>>> > Netflix
>>>
>>>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>


Re: Iceberg in Spark 3.0.0

2019-08-07 Thread Saisai Shao
IMHO I agree that we should have a branch to track the changes for Spark
3.0.0. Spark 3.0.0 has several changes regarding DataSource V2, so it would
be better to evaluate the changes and do the design while also considering
the 3.0 changes.

My two cents :)

Best regards,
Saisai

Edgar Rodriguez  于2019年8月8日周四 上午4:58写道:

> Hi everyone,
>
> I was wondering if there's a branch tracking the changes happening in
> Spark 3.0.0 for Iceberg. The DataSource V2 API has substantially changed
> from the one implemented in Iceberg master branch and since Spark 3.0.0
> would allow us to introduce Spark SQL support then it seems interesting to
> start tracking those changes to start evaluating some of the support as it
> evolves.
>
> Thanks.
>
> Cheers,
> --
> Edgar Rodriguez
>


Re: Two newbie question about Iceberg

2019-08-07 Thread Saisai Shao
Thanks guys for your reply.

I didn't do anything special; I don't even have a configured Hive. I just
put the Iceberg (assembly) jar into Spark and started a local Spark
process. I think the built-in Hive version of Spark is 1.2.1-spark (with a
slight pom change), and all the configurations related to SparkSQL/Hive are
the defaults. I guess the reason is as Anton mentioned; I will give it a try
by creating all tables (including HIVE_LOCKS) using a script. But I think we
should fix this, as it potentially stops users from doing a quick start with
local Spark.

> I think the reason why it works in tests is because we create all tables
> (including HIVE_LOCKS) using a script
>

Best regards,
Saisai

Anton Okolnychyi  于2019年8月7日周三 下午11:56写道:

> I think the reason why it works in tests is because we create all tables
> (including HIVE_LOCKS) using a script. I am not sure lock tables are always
> created in embedded mode.
>
> > On 7 Aug 2019, at 16:49, Ryan Blue  wrote:
> >
> > This is the right list. Iceberg is fairly low in the stack, so most
> questions are probably dev questions.
> >
> > I'm surprised that this doesn't work with an embedded metastore because
> we use an embedded metastore in tests:
> https://github.com/apache/incubator-iceberg/blob/master/hive/src/test/java/org/apache/iceberg/hive/TestHiveMetastore.java
> >
> > But we are also using Hive 1.2.1 and a metastore schema for 3.1.0. I
> wonder if a newer version of Hive would avoid this problem? What version
> are you linking with?
> >
> > On Tue, Aug 6, 2019 at 8:59 PM Saisai Shao 
> wrote:
> > Hi team,
> >
> > I just met some issues when trying Iceberg with quick start guide. Not
> sure if it is proper to send this to @dev mail list (seems there's no user
> mail list).
> >
> > One issue is that seems current Iceberg cannot run with embedded
> metastore. It will throw an exception. Is this an on-purpose behavior
> (force to use remote HMS), or just a bug?
> >
> > Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Unable to
> update transaction database java.sql.SQLSyntaxErrorException: Table/View
> 'HIVE_LOCKS' does not exist.
> > at
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
> Source)
> > at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
> > at
> org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown
> Source)
> >
> > Followed by this issue, seems like current Iceberg only binds to HMS as
> catalog, this is fine for production usage. But I'm wondering if we could
> have a simple catalog like in-memory catalog as Spark, so that it is easy
> for user to test and play. Is there any concern or plan?
> >
> > Best regards,
> > Saisai
> >
> >
> >
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
>
>


Re: Any plan to support update, delete and others

2019-08-07 Thread Saisai Shao
Thanks a lot Ryan, that would be very helpful!

Delta Lake recently added support for such operations at the API level (
https://github.com/delta-io/delta/blob/master/src/main/scala/io/delta/tables/DeltaTable.scala).
I was thinking that at the API level the goal of Iceberg is similar; maybe
we could take that as a reference.

Besides, directly using the Iceberg API to manipulate data is not so
straightforward, so it would be great if we could also have DF API/SQL
support later on.

Best regards
Saisai

Ryan Blue  于2019年8月8日周四 上午1:22写道:

> Hi Saisai,
>
> We are working on adding row-level delete support to Iceberg, where the
> deletes are applied when data is read. We’ve had a few good design
> discussions and have come up with a good way to integrate these into the
> format. Erik has written a good document on it:
> https://docs.google.com/document/d/1FMKh_SQ6xSUUmoCA8LerTkzIxDUN5JbStQp5Hzot4eo/edit#heading=h.p74qmh3a6ets
>
> I’ve also started a milestone to track this work:
> https://github.com/apache/incubator-iceberg/issues?q=is%3Aopen+is%3Aissue+milestone%3A%22Row-level+Delete%22
>
> That’s assuming that you’re talking about row-level deletes. Iceberg
> already supports file-level delete, overwrite, etc.
>
> Iceberg also already supports a vacuum operation using ExpireSnapshots
> <http://iceberg.apache.org/javadoc/master/index.html?org/apache/iceberg/ExpireSnapshots.html>.
> But, Spark (and other engines) don’t have a way to call this yet. Same for 
> MERGE
> INTO, open source Spark doesn’t support the operation yet. We’re also
> working on building support into Spark as we go.
>
> I hope that helps!
>
> On Wed, Aug 7, 2019 at 4:25 AM Saisai Shao  wrote:
>
>> Hi team,
>>
>> Delta lake project recently announced version 0.3.0, which added several
>> new features in API level, like update, delete, merge, vacuum, etc. May I
>> ask is there any plan to add such features in Iceberg?
>>
>> Thanks
>> Saisai
>>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>
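
For reference, a minimal sketch of the ExpireSnapshots API mentioned above,
calling it directly on a table loaded through HadoopTables (the warehouse
location and retention window are made-up values):

import org.apache.hadoop.conf.Configuration
import org.apache.iceberg.hadoop.HadoopTables

object ExpireOldSnapshots {
  def main(args: Array[String]): Unit = {
    // Load the Iceberg table by location; a Hive-catalog table could be
    // loaded through HiveCatalog instead.
    val table = new HadoopTables(new Configuration())
      .load("hdfs://namenode/warehouse/db/events") // hypothetical location

    // Expire snapshots older than seven days.
    val cutoff = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000
    table.expireSnapshots()
      .expireOlderThan(cutoff)
      .commit()
  }
}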


Any plan to support update, delete and others

2019-08-07 Thread Saisai Shao
Hi team,

The Delta Lake project recently announced version 0.3.0, which added several
new features at the API level, like update, delete, merge, vacuum, etc. May I
ask whether there is any plan to add such features to Iceberg?

Thanks
Saisai
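
For context, a rough sketch of the Delta Lake 0.3.0-style DeltaTable API being
asked about (the path, columns, and predicates are made up); it only
illustrates the level at which update, delete, and vacuum are exposed:

import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

object DeltaApiSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("delta-api-sketch").getOrCreate()
    val table = DeltaTable.forPath(spark, "/data/events") // hypothetical table path

    // Row-level operations are expressed against the table, not by rewriting files.
    table.delete("eventDate < '2019-01-01'")
    table.update(expr("status = 'stale'"), Map("status" -> expr("'archived'")))

    // Physically remove files no longer referenced by the table.
    table.vacuum()
  }
}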


Re: Podling Report Reminder - August 2019

2019-08-07 Thread Saisai Shao
Oh, sorry, let me check it. Also, I would be grateful if any mentor could
sign off on the report.

Thanks
Saisai

Justin Mclean  于2019年8月7日周三 下午3:39写道:

> Hi,
>
> Thanks for submitting your report but I notice you didn't answer the "Have
> your mentors been helpful and responsive?" question. It would be great if
> you could do that.
>
> Thanks,
> Justin
>


Two newbie question about Iceberg

2019-08-06 Thread Saisai Shao
Hi team,

I just met some issues when trying Iceberg with the quick start guide. Not sure
if it is proper to send this to the @dev mail list (it seems there's no user
mail list).

One issue is that it seems current Iceberg cannot run with an embedded
metastore; it will throw an exception. Is this intentional behavior (forcing
use of a remote HMS), or just a bug?

Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Unable to
update transaction database java.sql.SQLSyntaxErrorException: Table/View
'HIVE_LOCKS' does not exist.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at
org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown
Source)

Following on from this issue, it seems current Iceberg only binds to HMS as
the catalog, which is fine for production usage. But I'm wondering if we could
have a simple catalog, like Spark's in-memory catalog, so that it is easy
for users to test and play. Is there any concern or plan?

Best regards,
Saisai


[jira] [Commented] (LIVY-616) Livy Server discovery

2019-08-06 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900749#comment-16900749
 ] 

Saisai Shao commented on LIVY-616:
--

I see your point. Let me check the design and code then. Thanks!

> Livy Server discovery
> -
>
> Key: LIVY-616
> URL: https://issues.apache.org/jira/browse/LIVY-616
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Reporter: Oleksandr Shevchenko
>Priority: Major
> Attachments: Livy Server discovery spec.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, there isn't a way to get Livy Server URI by the client without 
> setting Livy address explicitly to livy.conf. A client should set 
> "livy.server.host" variable and then get it via LivyConf. The same behavior 
> if you want to use Livy with Zeppelin, we need to set "zeppelin.livy.url". It 
> very inconvenient when we install Livy packages on couple nodes and don't 
> know where exactly Livy Server will be started e.g. by Ambari or Cloudera 
> Manager. Also, in this case, we need to have Livy configuration files on a 
> node where we want to get Livy address. 
> It will be very helpful if we will add Livy Server address to Zookeeper and 
> expose API for clients to get Livy URL to use it in client code for REST 
> calls. 
> Livy already supports state saving in Zookeeper but I don't see that we store 
> Livy server address somewhere. Before starting investigating and 
> implementation I want to ask here about this.
> Please, correct me if I missed something.
> Any comments will be highly appreciated!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (LIVY-616) Livy Server discovery

2019-08-05 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900602#comment-16900602
 ] 

Saisai Shao commented on LIVY-616:
--

[~oshevchenko] you can either push an empty commit or reopen the PR to trigger 
the Travis test.

I'm curious about the scenarios for this proposal. It seems you don't want to 
maintain the Livy address on the client side, and instead want to ask ZK for 
the Livy server address. I can see some advantages to this proposal, but you 
then have to maintain the ZK address instead; what's the difference between 
maintaining one or the other?

Typically a server discovery mechanism is mainly used in an HA scenario, where 
you can get the active master address from ZK, but current Livy doesn't support 
HA. So IMHO, I don't see it as super useful for now.

> Livy Server discovery
> -
>
> Key: LIVY-616
> URL: https://issues.apache.org/jira/browse/LIVY-616
> Project: Livy
>  Issue Type: Improvement
>  Components: Server
>Reporter: Oleksandr Shevchenko
>Priority: Major
> Attachments: Livy Server discovery spec.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, there isn't a way to get Livy Server URI by the client without 
> setting Livy address explicitly to livy.conf. A client should set 
> "livy.server.host" variable and then get it via LivyConf. The same behavior 
> if you want to use Livy with Zeppelin, we need to set "zeppelin.livy.url". It 
> very inconvenient when we install Livy packages on couple nodes and don't 
> know where exactly Livy Server will be started e.g. by Ambari or Cloudera 
> Manager. Also, in this case, we need to have Livy configuration files on a 
> node where we want to get Livy address. 
> It will be very helpful if we will add Livy Server address to Zookeeper and 
> expose API for clients to get Livy URL to use it in client code for REST 
> calls. 
> Livy already supports state saving in Zookeeper but I don't see that we store 
> Livy server address somewhere. Before starting investigating and 
> implementation I want to ask here about this.
> Please, correct me if I missed something.
> Any comments will be highly appreciated!
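
To make the proposal concrete, a rough sketch (not from the issue) of the kind
of registration and lookup being described, using Apache Curator; the znode
path and addresses are hypothetical, not Livy's actual layout:

import org.apache.curator.framework.CuratorFrameworkFactory
import org.apache.curator.retry.ExponentialBackoffRetry
import org.apache.zookeeper.CreateMode
import scala.collection.JavaConverters._

object LivyDiscoverySketch {
  private val BasePath = "/livy/servers" // hypothetical znode path

  def main(args: Array[String]): Unit = {
    val client = CuratorFrameworkFactory.newClient(
      "zk1:2181,zk2:2181", new ExponentialBackoffRetry(1000, 3))
    client.start()

    // Server side: publish the server address as an ephemeral sequential node,
    // so the entry disappears automatically if the server goes away.
    client.create()
      .creatingParentsIfNeeded()
      .withMode(CreateMode.EPHEMERAL_SEQUENTIAL)
      .forPath(s"$BasePath/server-", "http://livy-host:8998".getBytes("UTF-8"))

    // Client side: discover Livy URLs instead of hard-coding them in livy.conf
    // or zeppelin.livy.url.
    val urls = client.getChildren.forPath(BasePath).asScala
      .map(child => new String(client.getData.forPath(s"$BasePath/$child"), "UTF-8"))
    println(urls.mkString(", "))

    client.close()
  }
}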



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (SPARK-28475) Add regex MetricFilter to GraphiteSink

2019-08-02 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-28475:
---

Assignee: Nick Karpov

> Add regex MetricFilter to GraphiteSink
> --
>
> Key: SPARK-28475
> URL: https://issues.apache.org/jira/browse/SPARK-28475
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Nick Karpov
>Assignee: Nick Karpov
>Priority: Major
> Fix For: 3.0.0
>
>
> Today all registered metric sources are reported to GraphiteSink with no 
> filtering mechanism, although the codahale project does support it.
> GraphiteReporter (ScheduledReporter) from the codahale project requires you 
> implement and supply the MetricFilter interface (there is only a single 
> implementation by default in the codahale project, MetricFilter.ALL).
> Propose to add an additional regex config to match and filter metrics to the 
> GraphiteSink
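
A rough sketch (not the actual Spark change) of a regex-based codahale
MetricFilter of the kind the issue proposes; the Spark side would build such a
filter from a new sink property and pass it to the GraphiteReporter builder:

import com.codahale.metrics.{Metric, MetricFilter}

// Report only metrics whose name matches the configured regular expression.
class RegexMetricFilter(pattern: String) extends MetricFilter {
  private val regex = pattern.r

  override def matches(name: String, metric: Metric): Boolean =
    regex.findFirstIn(name).isDefined
}

// Hypothetical usage: GraphiteReporter.forRegistry(registry)
//   .filter(new RegexMetricFilter("""jvm\..*"""))
//   .build(graphite)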



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28475) Add regex MetricFilter to GraphiteSink

2019-08-02 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-28475:

Priority: Minor  (was: Major)

> Add regex MetricFilter to GraphiteSink
> --
>
> Key: SPARK-28475
> URL: https://issues.apache.org/jira/browse/SPARK-28475
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Nick Karpov
>Assignee: Nick Karpov
>Priority: Minor
> Fix For: 3.0.0
>
>
> Today all registered metric sources are reported to GraphiteSink with no 
> filtering mechanism, although the codahale project does support it.
> GraphiteReporter (ScheduledReporter) from the codahale project requires you 
> implement and supply the MetricFilter interface (there is only a single 
> implementation by default in the codahale project, MetricFilter.ALL).
> Propose to add an additional regex config to match and filter metrics to the 
> GraphiteSink



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28475) Add regex MetricFilter to GraphiteSink

2019-08-02 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-28475.
-
   Resolution: Resolved
Fix Version/s: 3.0.0

> Add regex MetricFilter to GraphiteSink
> --
>
> Key: SPARK-28475
> URL: https://issues.apache.org/jira/browse/SPARK-28475
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Nick Karpov
>Priority: Major
> Fix For: 3.0.0
>
>
> Today all registered metric sources are reported to GraphiteSink with no 
> filtering mechanism, although the codahale project does support it.
> GraphiteReporter (ScheduledReporter) from the codahale project requires you 
> implement and supply the MetricFilter interface (there is only a single 
> implementation by default in the codahale project, MetricFilter.ALL).
> Propose to add an additional regex config to match and filter metrics to the 
> GraphiteSink



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28475) Add regex MetricFilter to GraphiteSink

2019-08-02 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898749#comment-16898749
 ] 

Saisai Shao commented on SPARK-28475:
-

This is resolved via https://github.com/apache/spark/pull/25232

> Add regex MetricFilter to GraphiteSink
> --
>
> Key: SPARK-28475
> URL: https://issues.apache.org/jira/browse/SPARK-28475
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.3
>Reporter: Nick Karpov
>Priority: Major
>
> Today all registered metric sources are reported to GraphiteSink with no 
> filtering mechanism, although the codahale project does support it.
> GraphiteReporter (ScheduledReporter) from the codahale project requires you 
> implement and supply the MetricFilter interface (there is only a single 
> implementation by default in the codahale project, MetricFilter.ALL).
> Propose to add an additional regex config to match and filter metrics to the 
> GraphiteSink



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (LIVY-597) Upgrade Livy guava dependency

2019-07-29 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao closed LIVY-597.

Resolution: Duplicate

> Upgrade Livy guava dependency
> -
>
> Key: LIVY-597
> URL: https://issues.apache.org/jira/browse/LIVY-597
> Project: Livy
>  Issue Type: Improvement
>Reporter: Arun Mahadevan
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The guava 15.0 that Livy is using is affected by CVE-2018-10237.
> It seems Livy's guava usage is limited and we can upgrade the version
> seamlessly.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (LIVY-587) Remove Guava dependency

2019-07-29 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-587:


Assignee: Saisai Shao  (was: jiewang)

> Remove Guava dependency
> ---
>
> Key: LIVY-587
> URL: https://issues.apache.org/jira/browse/LIVY-587
> Project: Livy
>  Issue Type: Task
>  Components: Core
>Affects Versions: 0.6.0
>Reporter: Marcelo Vanzin
>Assignee: Saisai Shao
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> It seems Guava has crept back into Livy at some point. Guava is kind of a 
> pain to maintain and update. We should avoid using it, especially since it 
> doesn't seem to be used for anything important.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (LIVY-587) Remove Guava dependency

2019-07-29 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-587:


Assignee: jiewang  (was: Saisai Shao)

> Remove Guava dependency
> ---
>
> Key: LIVY-587
> URL: https://issues.apache.org/jira/browse/LIVY-587
> Project: Livy
>  Issue Type: Task
>  Components: Core
>Affects Versions: 0.6.0
>Reporter: Marcelo Vanzin
>Assignee: jiewang
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> It seems Guava has crept back into Livy at some point. Guava is kind of a 
> pain to maintain and update. We should avoid using it, especially since it 
> doesn't seem to be used for anything important.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (LIVY-587) Remove Guava dependency

2019-07-29 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-587:


Assignee: jiewang

> Remove Guava dependency
> ---
>
> Key: LIVY-587
> URL: https://issues.apache.org/jira/browse/LIVY-587
> Project: Livy
>  Issue Type: Task
>  Components: Core
>Affects Versions: 0.6.0
>Reporter: Marcelo Vanzin
>Assignee: jiewang
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> It seems Guava has crept back into Livy at some point. Guava is kind of a 
> pain to maintain and update. We should avoid using it, especially since it 
> doesn't seem to be used for anything important.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (LIVY-587) Remove Guava dependency

2019-07-29 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-587.
--
   Resolution: Fixed
Fix Version/s: 0.7.0

Issue resolved by pull request 181
[https://github.com/apache/incubator-livy/pull/181]

> Remove Guava dependency
> ---
>
> Key: LIVY-587
> URL: https://issues.apache.org/jira/browse/LIVY-587
> Project: Livy
>  Issue Type: Task
>  Components: Core
>Affects Versions: 0.6.0
>Reporter: Marcelo Vanzin
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> It seems Guava has crept back into Livy at some point. Guava is kind of a 
> pain to maintain and update. We should avoid using it, especially since it 
> doesn't seem to be used for anything important.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (LIVY-613) Livy can't handle the java.sql.Date type correctly

2019-07-26 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-613:


Assignee: wyp

> Livy can't handle the java.sql.Date type correctly
> --
>
> Key: LIVY-613
> URL: https://issues.apache.org/jira/browse/LIVY-613
> Project: Livy
>  Issue Type: Bug
>  Components: REPL
>Affects Versions: 0.7.0
>Reporter: wyp
>Assignee: wyp
>Priority: Minor
> Fix For: 0.7.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When Spark table has java.sql.Date type column, Livy can't handle the 
> java.sql.Date type correctly. e.g
> {code:java}
> create table test(
> name string,
> birthday date
> );
> insert into test values ('Livy', '2019-07-24')
> curl -H "Content-Type:application/json" -X POST -d '{"code":"select * from 
> test", "kind":"sql"}' 192.168.1.6:8998/sessions/48/statements
> {"id":1,"code":"select * from 
> test","state":"waiting","output":null,"progress":0.0}
> curl 192.168.1.6:8998/sessions/48/statements/1
> {"id":1,"code":"select * from 
> test","state":"available","output":{"status":"ok","execution_count":1,"data":{"application/json":{"schema":{"type":"struct","fields":[{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"birthday","type":"date","nullable":true,"metadata":{}}]},"data":[["Livy",{}]]}}},"progress":1.0}{code}
> as you can see, the output of `select * from test` is ["Livy",{}], birthday 
> column's value isn't handle  correctly.
> The reason is that json4j can't handle java.sql.Date, so we should define the 
> CustomSerializer for java.sql.Date.
>  
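
A minimal sketch of the json4s CustomSerializer the issue calls for (the object
name is arbitrary), mapping java.sql.Date to and from its yyyy-MM-dd string
form:

import java.sql.Date
import org.json4s.{CustomSerializer, DefaultFormats, Formats}
import org.json4s.JsonAST.JString

case object SqlDateSerializer extends CustomSerializer[Date](_ => (
  // Deserialize: parse an ISO date string back into java.sql.Date.
  { case JString(s) => Date.valueOf(s) },
  // Serialize: render java.sql.Date as its "yyyy-MM-dd" string form.
  { case d: Date => JString(d.toString) }
))

object FormatsExample {
  // Add the serializer to the implicit Formats used when writing JSON output.
  implicit val formats: Formats = DefaultFormats + SqlDateSerializer
}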



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (LIVY-613) Livy can't handle the java.sql.Date type correctly

2019-07-26 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-613.
--
   Resolution: Fixed
Fix Version/s: 0.7.0

Issue resolved by pull request 186
[https://github.com/apache/incubator-livy/pull/186]

> Livy can't handle the java.sql.Date type correctly
> --
>
> Key: LIVY-613
> URL: https://issues.apache.org/jira/browse/LIVY-613
> Project: Livy
>  Issue Type: Bug
>  Components: REPL
>Affects Versions: 0.7.0
>Reporter: wyp
>Priority: Minor
> Fix For: 0.7.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> When Spark table has java.sql.Date type column, Livy can't handle the 
> java.sql.Date type correctly. e.g
> {code:java}
> create table test(
> name string,
> birthday date
> );
> insert into test values ('Livy', '2019-07-24')
> curl -H "Content-Type:application/json" -X POST -d '{"code":"select * from 
> test", "kind":"sql"}' 192.168.1.6:8998/sessions/48/statements
> {"id":1,"code":"select * from 
> test","state":"waiting","output":null,"progress":0.0}
> curl 192.168.1.6:8998/sessions/48/statements/1
> {"id":1,"code":"select * from 
> test","state":"available","output":{"status":"ok","execution_count":1,"data":{"application/json":{"schema":{"type":"struct","fields":[{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"birthday","type":"date","nullable":true,"metadata":{}}]},"data":[["Livy",{}]]}}},"progress":1.0}{code}
> as you can see, the output of `select * from test` is ["Livy",{}], birthday 
> column's value isn't handle  correctly.
> The reason is that json4j can't handle java.sql.Date, so we should define the 
> CustomSerializer for java.sql.Date.
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (LIVY-575) Implement missing metadata operations

2019-07-22 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16890656#comment-16890656
 ] 

Saisai Shao commented on LIVY-575:
--

Yes, it seems the wrong JIRAs were resolved.

> Implement missing metadata operations
> -
>
> Key: LIVY-575
> URL: https://issues.apache.org/jira/browse/LIVY-575
> Project: Livy
>  Issue Type: Improvement
>  Components: Thriftserver
>Reporter: Marco Gaido
>Priority: Minor
>
> Many metadata operations (eg. table list retrieval, schema retrieval, ...) 
> are currently not implemented. We should implement them.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (LIVY-575) Implement missing metadata operations

2019-07-22 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-575:
-
Fix Version/s: (was: 0.7.0)

> Implement missing metadata operations
> -
>
> Key: LIVY-575
> URL: https://issues.apache.org/jira/browse/LIVY-575
> Project: Livy
>  Issue Type: Improvement
>  Components: Thriftserver
>Reporter: Marco Gaido
>Priority: Minor
>
> Many metadata operations (eg. table list retrieval, schema retrieval, ...) 
> are currently not implemented. We should implement them.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Reopened] (LIVY-575) Implement missing metadata operations

2019-07-22 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reopened LIVY-575:
--

> Implement missing metadata operations
> -
>
> Key: LIVY-575
> URL: https://issues.apache.org/jira/browse/LIVY-575
> Project: Livy
>  Issue Type: Improvement
>  Components: Thriftserver
>Reporter: Marco Gaido
>Priority: Minor
> Fix For: 0.7.0
>
>
> Many metadata operations (eg. table list retrieval, schema retrieval, ...) 
> are currently not implemented. We should implement them.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (LIVY-609) LDAP auth for Livy thriftserver

2019-07-22 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-609:
-
Priority: Major  (was: Minor)

> LDAP auth for Livy thriftserver
> ---
>
> Key: LIVY-609
> URL: https://issues.apache.org/jira/browse/LIVY-609
> Project: Livy
>  Issue Type: New Feature
>  Components: Thriftserver
>Affects Versions: 0.6.0
>Reporter: dockerzhang
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> we should support LDAP auth for Livy thriftserver



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Updated] (LIVY-609) LDAP auth for Livy thriftserver

2019-07-22 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated LIVY-609:
-
Fix Version/s: (was: 0.7.0)

> LDAP auth for Livy thriftserver
> ---
>
> Key: LIVY-609
> URL: https://issues.apache.org/jira/browse/LIVY-609
> Project: Livy
>  Issue Type: New Feature
>  Components: Thriftserver
>Affects Versions: 0.6.0
>Reporter: dockerzhang
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> we should support LDAP auth for Livy thriftserver



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (LIVY-610) optimization for windows environment build.

2019-07-21 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-610:


Assignee: dockerzhang

> optimization for windows environment build.
> ---
>
> Key: LIVY-610
> URL: https://issues.apache.org/jira/browse/LIVY-610
> Project: Livy
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 0.6.0
>Reporter: dockerzhang
>Assignee: dockerzhang
>Priority: Trivial
> Fix For: 0.7.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * we can remove requireOS restriction for windows building.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (LIVY-610) optimization for windows environment build.

2019-07-21 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-610.
--
Resolution: Fixed

Issue resolved by pull request 184
[https://github.com/apache/incubator-livy/pull/184]

> optimization for windows environment build.
> ---
>
> Key: LIVY-610
> URL: https://issues.apache.org/jira/browse/LIVY-610
> Project: Livy
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 0.6.0
>Reporter: dockerzhang
>Priority: Trivial
> Fix For: 0.7.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> * we can remove requireOS restriction for windows building.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (LIVY-575) Implement missing metadata operations

2019-07-18 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16887773#comment-16887773
 ] 

Saisai Shao commented on LIVY-575:
--

Hi [~mgaido], we have 3 colleagues planning to work on the Livy project, but 
they're just starting, so I would suggest breaking this down into subtasks so 
that they can pick it up easily. Also it would be great if you could help 
review. Thanks!

> Implement missing metadata operations
> -
>
> Key: LIVY-575
> URL: https://issues.apache.org/jira/browse/LIVY-575
> Project: Livy
>  Issue Type: Improvement
>  Components: Thriftserver
>Reporter: Marco Gaido
>Priority: Minor
>
> Many metadata operations (eg. table list retrieval, schema retrieval, ...) 
> are currently not implemented. We should implement them.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (SPARK-28106) Spark SQL add jar with wrong hdfs path, SparkContext still add it to jar path ,and cause Task Failed

2019-07-16 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-28106:
---

Assignee: angerszhu

> Spark SQL add jar with wrong hdfs path, SparkContext still add it to jar path 
> ,and cause Task Failed
> 
>
> Key: SPARK-28106
> URL: https://issues.apache.org/jira/browse/SPARK-28106
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Minor
> Attachments: image-2019-06-19-21-23-22-061.png, 
> image-2019-06-20-11-49-13-691.png, image-2019-06-20-11-50-36-418.png, 
> image-2019-06-20-11-51-06-889.png
>
>
> When we use SparkSQL, about add jar command, if we add a wrong path of HDFS 
> such as "add jar hdfs:///home/hadoop/test/test.jar", when execute it:
>  * In hive case , HiveClientImple call add jar, when runHiveSql() called, it 
> will cause error but will still run next code , then call  
> SparkContext.addJar, but this method don't have a path check when path schema 
> is HDFS , then do other sql, TaskDescribtion will carry jarPath of 
> SparkContext's registered JarPath. Then it will carry wrong path then cause 
> error happen
>  * None hive case, the same, will only check local path but not check hdfs 
> path.
>  
> {code:java}
> 19/06/19 19:55:12 INFO SessionState: converting to local 
> hdfs://home/hadoop/aaa.jar
> Failed to read external resource hdfs://home/hadoop/aaa.jar
> 19/06/19 19:55:12 ERROR SessionState: Failed to read external resource 
> hdfs://home/hadoop/aaa.jar
> java.lang.RuntimeException: Failed to read external resource 
> hdfs://home/hadoop/aaa.jar
> at 
> org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1288)
> atorg.apache.hadoop.hive.ql.session.SessionState.resolveAndDownload(SessionState.java:1242)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1163)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1149)
> at 
> org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:866)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:835)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:212)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:258)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:835)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:825)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.addJar(HiveClientImpl.scala:983)
> at 
> org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:112)
> at 
> org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
> at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:195)
> at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:195)
> at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3365)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withCustomJobTag(SQLExecution.scala:119)
> at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:79)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:143)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
> at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3364)
> at org.apache.spark.sql.Dataset.(Dataset.scala:195)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:80)
> at org.apache.spark.sql.SparkSession.sql(Spa

[jira] [Resolved] (SPARK-28106) Spark SQL add jar with wrong hdfs path, SparkContext still add it to jar path ,and cause Task Failed

2019-07-16 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-28106.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24909
[https://github.com/apache/spark/pull/24909]

> Spark SQL add jar with wrong hdfs path, SparkContext still add it to jar path 
> ,and cause Task Failed
> 
>
> Key: SPARK-28106
> URL: https://issues.apache.org/jira/browse/SPARK-28106
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: image-2019-06-19-21-23-22-061.png, 
> image-2019-06-20-11-49-13-691.png, image-2019-06-20-11-50-36-418.png, 
> image-2019-06-20-11-51-06-889.png
>
>
> When using Spark SQL's add jar command with a wrong HDFS path, such as 
> "add jar hdfs:///home/hadoop/test/test.jar", executing it behaves as follows:
>  * Hive case: HiveClientImpl calls add jar; when runSqlHive() is invoked it 
> raises an error, but execution continues and SparkContext.addJar is still 
> called. That method does not validate the path when the scheme is HDFS, so 
> subsequent SQL statements produce TaskDescriptions that carry the bad jar 
> path registered with SparkContext, and the tasks fail (see the sketch after 
> this list).
>  * Non-Hive case: the same problem; only local paths are checked, HDFS paths 
> are not.
>  
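> A minimal sketch of the kind of validation that could run before 
> SparkContext.addJar registers a remote jar (the helper name and the Hadoop 
> FileSystem check are assumptions for illustration, not the actual fix):
> {code:scala}
> import java.net.URI
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.{FileSystem, Path}
>
> // Verify that a non-local jar path (e.g. hdfs://...) actually exists before
> // registering it, instead of letting every later TaskDescription carry a
> // path that no executor can download.
> def checkRemoteJarExists(path: String, hadoopConf: Configuration): Boolean = {
>   val uri = new URI(path)
>   uri.getScheme match {
>     case null | "local" | "file" => true // local paths are checked elsewhere
>     case _ =>
>       val fs = FileSystem.get(uri, hadoopConf)
>       fs.exists(new Path(uri))
>   }
> }
> {code}
> The original error log follows: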
> {code:java}
> 19/06/19 19:55:12 INFO SessionState: converting to local 
> hdfs://home/hadoop/aaa.jar
> Failed to read external resource hdfs://home/hadoop/aaa.jar
> 19/06/19 19:55:12 ERROR SessionState: Failed to read external resource 
> hdfs://home/hadoop/aaa.jar
> java.lang.RuntimeException: Failed to read external resource 
> hdfs://home/hadoop/aaa.jar
> at 
> org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1288)
> at org.apache.hadoop.hive.ql.session.SessionState.resolveAndDownload(SessionState.java:1242)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1163)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1149)
> at 
> org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:866)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:835)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:212)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:258)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:835)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:825)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.addJar(HiveClientImpl.scala:983)
> at 
> org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:112)
> at 
> org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
> at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:195)
> at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:195)
> at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3365)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withCustomJobTag(SQLExecution.scala:119)
> at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:79)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:143)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
> at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3364)
> at org.apache.spark.sql.Dataset.(Dataset.scala:195)
> at org.apache.

[jira] [Updated] (SPARK-28106) Spark SQL add jar with wrong hdfs path, SparkContext still add it to jar path ,and cause Task Failed

2019-07-16 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-28106:

Component/s: Spark Core

> Spark SQL add jar with wrong hdfs path, SparkContext still add it to jar path 
> ,and cause Task Failed
> 
>
> Key: SPARK-28106
> URL: https://issues.apache.org/jira/browse/SPARK-28106
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 2.2.0, 2.3.0, 2.4.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Minor
> Fix For: 3.0.0
>
> Attachments: image-2019-06-19-21-23-22-061.png, 
> image-2019-06-20-11-49-13-691.png, image-2019-06-20-11-50-36-418.png, 
> image-2019-06-20-11-51-06-889.png
>
>
> When using Spark SQL's add jar command with a wrong HDFS path, such as 
> "add jar hdfs:///home/hadoop/test/test.jar", executing it behaves as follows:
>  * Hive case: HiveClientImpl calls add jar; when runSqlHive() is invoked it 
> raises an error, but execution continues and SparkContext.addJar is still 
> called. That method does not validate the path when the scheme is HDFS, so 
> subsequent SQL statements produce TaskDescriptions that carry the bad jar 
> path registered with SparkContext, and the tasks fail.
>  * Non-Hive case: the same problem; only local paths are checked, HDFS paths 
> are not.
>  
> {code:java}
> 19/06/19 19:55:12 INFO SessionState: converting to local 
> hdfs://home/hadoop/aaa.jar
> Failed to read external resource hdfs://home/hadoop/aaa.jar
> 19/06/19 19:55:12 ERROR SessionState: Failed to read external resource 
> hdfs://home/hadoop/aaa.jar
> java.lang.RuntimeException: Failed to read external resource 
> hdfs://home/hadoop/aaa.jar
> at 
> org.apache.hadoop.hive.ql.session.SessionState.downloadResource(SessionState.java:1288)
> at org.apache.hadoop.hive.ql.session.SessionState.resolveAndDownload(SessionState.java:1242)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1163)
> at 
> org.apache.hadoop.hive.ql.session.SessionState.add_resources(SessionState.java:1149)
> at 
> org.apache.hadoop.hive.ql.processors.AddResourceProcessor.run(AddResourceProcessor.java:67)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:866)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:835)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:212)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:258)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.runHive(HiveClientImpl.scala:835)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.runSqlHive(HiveClientImpl.scala:825)
> at 
> org.apache.spark.sql.hive.client.HiveClientImpl.addJar(HiveClientImpl.scala:983)
> at 
> org.apache.spark.sql.hive.HiveSessionResourceLoader.addJar(HiveSessionStateBuilder.scala:112)
> at 
> org.apache.spark.sql.execution.command.AddJarCommand.run(resources.scala:40)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
> at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
> at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:195)
> at org.apache.spark.sql.Dataset$$anonfun$6.apply(Dataset.scala:195)
> at org.apache.spark.sql.Dataset$$anonfun$53.apply(Dataset.scala:3365)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withCustomJobTag(SQLExecution.scala:119)
> at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:79)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:143)
> at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
> at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3364)
> at org.apache.spark.sql.Dataset.(Dataset.scala:195)
> at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:80)
> at org.apache.spark.sql.SparkSession.s

[jira] [Resolved] (LIVY-582) python test_create_new_session_without_default_config test fails consistently

2019-07-15 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-582.
--
   Resolution: Fixed
Fix Version/s: 0.7.0

Issue resolved by pull request 180
[https://github.com/apache/incubator-livy/pull/180]

> python test_create_new_session_without_default_config test fails consistently
> -
>
> Key: LIVY-582
> URL: https://issues.apache.org/jira/browse/LIVY-582
> Project: Livy
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Felix Cheung
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> test_create_new_session_without_default_config 
> def test_create_new_session_without_default_config():
> > mock_and_validate_create_new_session(False)
> src/test/python/livy-tests/client_test.py:105:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> :3: in wrapper
> ???
> src/test/python/livy-tests/client_test.py:48: in 
> mock_and_validate_create_new_session
> load_defaults=defaults)
> src/main/python/livy/client.py:88: in __init__
> session_conf_dict).json()['id']
> src/main/python/livy/client.py:388: in _create_new_session
> headers=self._conn._JSON_HEADERS, data=data)
> src/main/python/livy/client.py:500: in send_request
> json=data, auth=self._spnego_auth())
> .eggs/requests-2.21.0-py2.7.egg/requests/api.py:60: in request
> return session.request(method=method, url=url, **kwargs)
> .eggs/requests-2.21.0-py2.7.egg/requests/sessions.py:533: in request
> resp = self.send(prep, **send_kwargs)
> .eggs/requests-2.21.0-py2.7.egg/requests/sessions.py:646: in send
> r = adapter.send(request, **kwargs)
> .eggs/responses-0.10.6-py2.7.egg/responses.py:626: in unbound_on_send
> return self._on_request(adapter, request, *a, **kwargs)
> self = 
> adapter = 
> request = 
> kwargs = {'cert': None, 'proxies': OrderedDict(), 'stream': False, 'timeout': 
> 10, ...}
> match = None, resp_callback = None
> error_msg = "Connection refused by Responses: POST 
> http://machine:8998/sessions/ doesn't match Responses Mock"
> response = ConnectionError(u"Connection refused by Responses: POST 
> http://machine:8998/sessions/doesn't match Responses Mock",)
> {code}
> Not sure why; this fails 100% of the time, and I don't see anything listening 
> on this port. Need some help troubleshooting this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (LIVY-582) python test_create_new_session_without_default_config test fails consistently

2019-07-15 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-582:


Assignee: Yiheng Wang

> python test_create_new_session_without_default_config test fails consistently
> -
>
> Key: LIVY-582
> URL: https://issues.apache.org/jira/browse/LIVY-582
> Project: Livy
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Felix Cheung
>Assignee: Yiheng Wang
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java}
> test_create_new_session_without_default_config 
> def test_create_new_session_without_default_config():
> > mock_and_validate_create_new_session(False)
> src/test/python/livy-tests/client_test.py:105:
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _
> :3: in wrapper
> ???
> src/test/python/livy-tests/client_test.py:48: in 
> mock_and_validate_create_new_session
> load_defaults=defaults)
> src/main/python/livy/client.py:88: in __init__
> session_conf_dict).json()['id']
> src/main/python/livy/client.py:388: in _create_new_session
> headers=self._conn._JSON_HEADERS, data=data)
> src/main/python/livy/client.py:500: in send_request
> json=data, auth=self._spnego_auth())
> .eggs/requests-2.21.0-py2.7.egg/requests/api.py:60: in request
> return session.request(method=method, url=url, **kwargs)
> .eggs/requests-2.21.0-py2.7.egg/requests/sessions.py:533: in request
> resp = self.send(prep, **send_kwargs)
> .eggs/requests-2.21.0-py2.7.egg/requests/sessions.py:646: in send
> r = adapter.send(request, **kwargs)
> .eggs/responses-0.10.6-py2.7.egg/responses.py:626: in unbound_on_send
> return self._on_request(adapter, request, *a, **kwargs)
> self = 
> adapter = 
> request = 
> kwargs = {'cert': None, 'proxies': OrderedDict(), 'stream': False, 'timeout': 
> 10, ...}
> match = None, resp_callback = None
> error_msg = "Connection refused by Responses: POST 
> http://machine:8998/sessions/ doesn't match Responses Mock"
> response = ConnectionError(u"Connection refused by Responses: POST 
> http://machine:8998/sessions/doesn't match Responses Mock",)
> {code}
> Not sure why; this fails 100% of the time, and I don't see anything listening 
> on this port. Need some help troubleshooting this.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Assigned] (LIVY-603) upgrade build spark version to 2.4.3

2019-07-11 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned LIVY-603:


Assignee: Yiheng Wang

> upgrade build spark version to 2.4.3
> 
>
> Key: LIVY-603
> URL: https://issues.apache.org/jira/browse/LIVY-603
> Project: Livy
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 0.6.0
>Reporter: Jeffrey(Xilang) Yan
>Assignee: Yiheng Wang
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Building with the current pom.xml will fail because Spark 2.4.0 has been 
> removed from [http://mirrors.advancedhosters.com/apache/spark/]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Resolved] (LIVY-603) upgrade build spark version to 2.4.3

2019-07-11 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-603.
--
Resolution: Fixed

Issue resolved by pull request 179
[https://github.com/apache/incubator-livy/pull/179]

> upgrade build spark version to 2.4.3
> 
>
> Key: LIVY-603
> URL: https://issues.apache.org/jira/browse/LIVY-603
> Project: Livy
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 0.6.0
>Reporter: Jeffrey(Xilang) Yan
>Priority: Major
> Fix For: 0.7.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Building with the current pom.xml will fail because Spark 2.4.0 has been 
> removed from [http://mirrors.advancedhosters.com/apache/spark/]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Commented] (LIVY-587) Remove Guava dependency

2019-07-10 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882007#comment-16882007
 ] 

Saisai Shao commented on LIVY-587:
--

I would suggest removing Guava rather than upgrading the Guava version. Guava is 
too heavy to include; unless it is unavoidable, let's not include it unnecessarily.

> Remove Guava dependency
> ---
>
> Key: LIVY-587
> URL: https://issues.apache.org/jira/browse/LIVY-587
> Project: Livy
>  Issue Type: Task
>  Components: Core
>Affects Versions: 0.6.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> It seems Guava has crept back into Livy at some point. Guava is kind of a 
> pain to maintain and update. We should avoid using it, especially since it 
> doesn't seem to be used for anything important.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (LIVY-587) Remove Guava dependency

2019-07-10 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16882007#comment-16882007
 ] 

Saisai Shao edited comment on LIVY-587 at 7/10/19 12:40 PM:


I would suggest removing Guava rather than upgrading the Guava version. Guava is 
too heavy to include; unless it is unavoidable, let's not include it 
unnecessarily.


was (Author: jerryshao):
I would suggest to remove guava rather than upgrade guava version. Guava is too 
heavy to include, unless it is unavoidable, let's not include it unnecessarily.

> Remove Guava dependency
> ---
>
> Key: LIVY-587
> URL: https://issues.apache.org/jira/browse/LIVY-587
> Project: Livy
>  Issue Type: Task
>  Components: Core
>Affects Versions: 0.6.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> It seems Guava has crept back into Livy at some point. Guava is kind of a 
> pain to maintain and update. We should avoid using it, especially since it 
> doesn't seem to be used for anything important.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (SPARK-28202) [Core] [Test] Avoid noises of system props in SparkConfSuite

2019-07-01 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-28202:
---

Assignee: ShuMing Li

> [Core] [Test] Avoid noises of system props in SparkConfSuite
> 
>
> Key: SPARK-28202
> URL: https://issues.apache.org/jira/browse/SPARK-28202
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: ShuMing Li
>Assignee: ShuMing Li
>Priority: Trivial
> Fix For: 3.0.0
>
>
> When the SPARK_HOME environment variable is set and the directory contains a 
> specific `spark-defaults.conf`, the `org.apache.spark.util.loadDefaultSparkProperties` 
> method may pollute the system properties. As a result, running the `core/test` 
> module can fail in `SparkConfSuite`.
>  
> It is easy to repair by constructing the `SparkConf` with `loadDefaults` set to 
> false, as sketched below.
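>  
> For illustration only (a minimal sketch, not part of the original report; the 
> extra conf key is a made-up example):
> {code:scala}
> import org.apache.spark.SparkConf
>
> // With loadDefaults = false the conf ignores spark.* JVM system properties,
> // so entries loaded from an external spark-defaults.conf cannot leak into
> // the test; assertions only see what the test itself sets.
> val conf = new SparkConf(loadDefaults = false)
>   .set("spark.app.name", "SparkConfSuite-isolated")
> {code}
> The original failure output follows: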
> ```
> [info] - accumulators (5 seconds, 565 milliseconds)
> [info] - deprecated configs *** FAILED *** (79 milliseconds)
> [info] 7 did not equal 4 (SparkConfSuite.scala:266)
> [info] org.scalatest.exceptions.TestFailedException:
> [info] at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
> [info] at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
> [info] at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
> [info] at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
> [info] at 
> org.apache.spark.SparkConfSuite.$anonfun$new$26(SparkConfSuite.scala:266)
> [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
> [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149)
> [info] at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
> [info] at 
> org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
> [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
> ```



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28202) [Core] [Test] Avoid noises of system props in SparkConfSuite

2019-07-01 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-28202.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24998
[https://github.com/apache/spark/pull/24998]

> [Core] [Test] Avoid noises of system props in SparkConfSuite
> 
>
> Key: SPARK-28202
> URL: https://issues.apache.org/jira/browse/SPARK-28202
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: ShuMing Li
>Priority: Trivial
> Fix For: 3.0.0
>
>
> When the SPARK_HOME environment variable is set and the directory contains a 
> specific `spark-defaults.conf`, the `org.apache.spark.util.loadDefaultSparkProperties` 
> method may pollute the system properties. As a result, running the `core/test` 
> module can fail in `SparkConfSuite`.
>  
> It is easy to repair by constructing the `SparkConf` with `loadDefaults` set to 
> false.
> ```
> [info] - accumulators (5 seconds, 565 milliseconds)
> [info] - deprecated configs *** FAILED *** (79 milliseconds)
> [info] 7 did not equal 4 (SparkConfSuite.scala:266)
> [info] org.scalatest.exceptions.TestFailedException:
> [info] at 
> org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:528)
> [info] at 
> org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:527)
> [info] at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
> [info] at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
> [info] at 
> org.apache.spark.SparkConfSuite.$anonfun$new$26(SparkConfSuite.scala:266)
> [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info] at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info] at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:186)
> [info] at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:149)
> [info] at 
> org.scalatest.FunSuiteLike.invokeWithFixture$1(FunSuiteLike.scala:184)
> [info] at 
> org.scalatest.FunSuiteLike.$anonfun$runTest$1(FunSuiteLike.scala:196)
> [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:289)
> ```



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25299) Use remote storage for persisting shuffle data

2019-07-01 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16876070#comment-16876070
 ] 

Saisai Shao commented on SPARK-25299:
-

Better to post a pdf version [~mcheah] :).

> Use remote storage for persisting shuffle data
> --
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>  Labels: SPIP
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on its node.
> There are some shortcomings with the way shuffle is fundamentally implemented 
> right now. Particularly:
>  * If any external shuffle service process or node becomes unavailable, all 
> applications that had an executor that ran on that node must recompute the 
> shuffle blocks that were lost.
>  * Similarly to the above, the external shuffle service must be kept running 
> at all times, which may waste resources when no applications are using that 
> shuffle service node.
>  * Mounting local storage can prevent users from taking advantage of 
> desirable isolation benefits from using containerized environments, like 
> Kubernetes. We had an external shuffle service implementation in an early 
> prototype of the Kubernetes backend, but it was rejected due to its strict 
> requirement to be able to mount hostPath volumes or other persistent volume 
> setups.
> In the following [architecture discussion 
> document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
>  (note: _not_ an SPIP), we brainstorm various high level architectures for 
> improving the external shuffle service in a way that addresses the above 
> problems. The purpose of this umbrella JIRA is to promote additional 
> discussion on how we can approach these problems, both at the architecture 
> level and the implementation level. We anticipate filing sub-issues that 
> break down the tasks that must be completed to achieve this goal.
> Edit June 28 2019: Our SPIP is here: 
> [https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25299) Use remote storage for persisting shuffle data

2019-06-27 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16874703#comment-16874703
 ] 

Saisai Shao commented on SPARK-25299:
-

The vote has passed, so what is our plan for code submission? [~yifeih] [~mcheah]

> Use remote storage for persisting shuffle data
> --
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>  Labels: SPIP
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on its node.
> There are some shortcomings with the way shuffle is fundamentally implemented 
> right now. Particularly:
>  * If any external shuffle service process or node becomes unavailable, all 
> applications that had an executor that ran on that node must recompute the 
> shuffle blocks that were lost.
>  * Similarly to the above, the external shuffle service must be kept running 
> at all times, which may waste resources when no applications are using that 
> shuffle service node.
>  * Mounting local storage can prevent users from taking advantage of 
> desirable isolation benefits from using containerized environments, like 
> Kubernetes. We had an external shuffle service implementation in an early 
> prototype of the Kubernetes backend, but it was rejected due to its strict 
> requirement to be able to mount hostPath volumes or other persistent volume 
> setups.
> In the following [architecture discussion 
> document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
>  (note: _not_ an SPIP), we brainstorm various high level architectures for 
> improving the external shuffle service in a way that addresses the above 
> problems. The purpose of this umbrella JIRA is to promote additional 
> discussion on how we can approach these problems, both at the architecture 
> level and the implementation level. We anticipate filing sub-issues that 
> break down the tasks that must be completed to achieve this goal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE][SPARK-25299] SPIP: Shuffle Storage API

2019-06-16 Thread Saisai Shao
+1 (binding)

Thanks
Saisai

Imran Rashid  wrote on Saturday, June 15, 2019 at 3:46 AM:

> +1 (binding)
>
> I think this is a really important feature for spark.
>
> First, there is already a lot of interest in alternative shuffle storage
> in the community, from dynamic allocation in Kubernetes to even just
> improving stability in standard on-premise use of Spark.  However, they're
> often stuck doing this in forks of Spark, and in ways that are not
> maintainable (because they copy-paste many spark internals) or are
> incorrect (for not correctly handling speculative execution & stage
> retries).
>
> Second, I think the specific proposal is good for finding the right
> balance between flexibility and too much complexity, to allow incremental
> improvements.  A lot of work has been put into this already to try to
> figure out which pieces are essential to make alternative shuffle storage
> implementations feasible.
>
> Of course, that means it doesn't include everything imaginable; some
> things still aren't supported, and some will still choose to use the older
> ShuffleManager api to give total control over all of shuffle.  But we know
> there are a reasonable set of things which can be implemented behind the
> api as the first step, and it can continue to evolve.
>
> On Fri, Jun 14, 2019 at 12:13 PM Ilan Filonenko  wrote:
>
>> +1 (non-binding). This API is versatile and flexible enough to handle
>> Bloomberg's internal use-cases. The ability for us to vary implementation
>> strategies is quite appealing. It is also worth to note the minimal changes
>> to Spark core in order to make it work. This is a very much needed addition
>> within the Spark shuffle story.
>>
>> On Fri, Jun 14, 2019 at 9:59 AM bo yang  wrote:
>>
>>> +1 This is great work, allowing plugin of different sort shuffle
>>> write/read implementation! Also great to see it retain the current Spark
>>> configuration
>>> (spark.shuffle.manager=org.apache.spark.shuffle.YourShuffleManagerImpl).
>>>
>>>
>>> On Thu, Jun 13, 2019 at 2:58 PM Matt Cheah  wrote:
>>>
 Hi everyone,



 I would like to call a vote for the SPIP for SPARK-25299
 , which proposes to
 introduce a pluggable storage API for temporary shuffle data.



 You may find the SPIP document here
 
 .



 The discussion thread for the SPIP was conducted here
 
 .



 Please vote on whether or not this proposal is agreeable to you.



 Thanks!



 -Matt Cheah

>>>


[jira] [Updated] (SPARK-25299) Use remote storage for persisting shuffle data

2019-06-12 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-25299:

Labels: SPIP  (was: )

> Use remote storage for persisting shuffle data
> --
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>  Labels: SPIP
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on its node.
> There are some shortcomings with the way shuffle is fundamentally implemented 
> right now. Particularly:
>  * If any external shuffle service process or node becomes unavailable, all 
> applications that had an executor that ran on that node must recompute the 
> shuffle blocks that were lost.
>  * Similarly to the above, the external shuffle service must be kept running 
> at all times, which may waste resources when no applications are using that 
> shuffle service node.
>  * Mounting local storage can prevent users from taking advantage of 
> desirable isolation benefits from using containerized environments, like 
> Kubernetes. We had an external shuffle service implementation in an early 
> prototype of the Kubernetes backend, but it was rejected due to its strict 
> requirement to be able to mount hostPath volumes or other persistent volume 
> setups.
> In the following [architecture discussion 
> document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
>  (note: _not_ an SPIP), we brainstorm various high level architectures for 
> improving the external shuffle service in a way that addresses the above 
> problems. The purpose of this umbrella JIRA is to promote additional 
> discussion on how we can approach these problems, both at the architecture 
> level and the implementation level. We anticipate filing sub-issues that 
> break down the tasks that must be completed to achieve this goal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [DISCUSS][SPARK-25299] SPIP: Shuffle storage API

2019-06-12 Thread Saisai Shao
I think maybe we could start a vote on this SPIP.

This has been discussed for a while, and the current doc is pretty complete
as for now. Also we saw lots of demands in the community about building
their own shuffle storage.

Thanks
Saisai

Imran Rashid  wrote on Tuesday, June 11, 2019 at 3:27 AM:

> I would be happy to shepherd this.
>
> On Wed, Jun 5, 2019 at 7:33 PM Matt Cheah  wrote:
>
>> Hi everyone,
>>
>>
>>
>> I wanted to pick this back up again. The discussion has quieted down both
>> on this thread and on the document.
>>
>>
>>
>> We made a few revisions to the document to hopefully make it easier to
>> read and to clarify our criteria for success in the project. Some of the
>> APIs have also been adjusted based on further discussion and things we’ve
>> learned.
>>
>>
>>
>> I was hoping to discuss what our next steps could be here. Specifically,
>>
>>1. Would any PMC be willing to become the shepherd for this SPIP?
>>2. Is there any more feedback regarding this proposal?
>>3. What would we need to do to take this to a voting phase and to
>>begin proposing our work against upstream Spark?
>>
>>
>>
>> Thanks,
>>
>>
>>
>> -Matt Cheah
>>
>>
>>
>> *From: *"Yifei Huang (PD)" 
>> *Date: *Monday, May 13, 2019 at 1:04 PM
>> *To: *Mridul Muralidharan 
>> *Cc: *Bo Yang , Ilan Filonenko , Imran
>> Rashid , Justin Uang , Liang
>> Tang , Marcelo Vanzin , Matei
>> Zaharia , Matt Cheah , Min
>> Shen , Reynold Xin , Ryan Blue <
>> rb...@netflix.com>, Vinoo Ganesh , Will Manning <
>> wmann...@palantir.com>, "b...@fb.com" , "
>> dev@spark.apache.org" , "fel...@uber.com" <
>> fel...@uber.com>, "f...@linkedin.com" , "
>> tgraves...@gmail.com" , "yez...@linkedin.com" <
>> yez...@linkedin.com>, "yue...@memverge.com" 
>> *Subject: *Re: [DISCUSS][SPARK-25299] SPIP: Shuffle storage API
>>
>>
>>
>> Hi Mridul - thanks for taking the time to give us feedback! Thoughts on
>> the points that you mentioned:
>>
>>
>>
>> The API is meant to work with the existing SortShuffleManager algorithm.
>> There aren't strict requirements on how other ShuffleManager
>> implementations must behave, so it seems impractical to design an API that
>> could also satisfy those unknown requirements. However, we do believe that
>> the API is rather generic, using OutputStreams for writes and InputStreams
>> for reads, and indexing the data by a shuffleId-mapId-reduceId combo, so if
>> other shuffle algorithms treat the data in the same chunks and want an
>> interface for storage, then they can also use this API from within their
>> implementation.
>>
>>
>>
>> About speculative execution, we originally made the assumption that each
>> shuffle task is deterministic, which meant that even if a later mapper
>> overrode a previous committed mapper's value, it's still the same contents.
>> Having searched some tickets and reading
>> https://github.com/apache/spark/pull/22112/files more carefully, I think
>> there are problems with our original thought if the writer writes all
>> attempts of a task to the same location. One example is if the writer
>> implementation writes each partition to the remote host in a sequence of
>> chunks. In such a situation, a reducer might read data half written by the
>> original task and half written by the running speculative task, which will
>> not be the correct contents if the mapper output is unordered. Therefore,
>> writes by a single mapper might have to be transactioned, which is not
>> clear from the API, and seems rather complex to reason about, so we
>> shouldn't expect this from the implementer.
>>
>>
>>
>> However, this doesn't affect the fundamentals of the API since we only
>> need to add an additional attemptId to the storage data index (which can be
>> stored within the MapStatus) to solve the problem of concurrent writes.
>> This would also make it more clear that the writer should use attempt ID as
>> an index to ensure that writes from speculative tasks don't interfere with
>> one another (we can add that to the API docs as well).
>>
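>> A hypothetical sketch of the storage interface shape described above (the
>> names are illustrative only, not the SPIP's actual API): writes and reads are
>> keyed by shuffleId/mapId/reduceId, with an attemptId added so speculative
>> attempts cannot interfere with one another.
>> {code:scala}
>> import java.io.{InputStream, OutputStream}
>>
>> // Each block of shuffle data is addressed by shuffleId/mapId/reduceId, plus
>> // an attemptId so speculative attempts of the same map task never collide.
>> case class ShuffleBlockCoordinate(
>>     shuffleId: Int,
>>     mapId: Int,
>>     reduceId: Int,
>>     attemptId: Int)
>>
>> trait ShuffleStoragePlugin {
>>   // The map task streams one partition of its output to remote storage.
>>   def openForWrite(block: ShuffleBlockCoordinate): OutputStream
>>   // The reducer reads back exactly the attempt recorded in the MapStatus.
>>   def openForRead(block: ShuffleBlockCoordinate): InputStream
>> }
>> {code}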
>>
>>
>> *From: *Mridul Muralidharan 
>> *Date: *Wednesday, May 8, 2019 at 8:18 PM
>> *To: *"Yifei Huang (PD)" 
>> *Cc: *Bo Yang , Ilan Filonenko , Imran
>> Rashid , Justin Uang , Liang
>> Tang , Marcelo Vanzin , Matei
>> Zaharia , Matt Cheah , Min
>> Shen , Reynold Xin , Ryan Blue <
>> rb...@netflix.com>, Vinoo Ganesh , Will Manning <
>> wmann...@palantir.com>, "b...@fb.com" , "
>> dev@spark.apache.org" , "fel...@uber.com" <
>> fel...@uber.com>, "f...@linkedin.com" , "
>> tgraves...@gmail.com" , "yez...@linkedin.com" <
>> yez...@linkedin.com>, "yue...@memverge.com" 
>> *Subject: *Re: [DISCUSS][SPARK-25299] SPIP: Shuffle storage API
>>
>>
>>
>>
>>
>> Unfortunately I do not have bandwidth to do a detailed review, but a few
>> things come to mind after a quick read:
>>
>>
>>
>> - While it might be tactically beneficial to align with existing
>> implementation, a clean design which does not tie into existing shuffle
>> implementation would be preferable (if it can be done without o

[jira] [Created] (SPARK-27996) Spark UI redirect will be failed behind the https reverse proxy

2019-06-10 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-27996:
---

 Summary: Spark UI redirect will be failed behind the https reverse 
proxy
 Key: SPARK-27996
 URL: https://issues.apache.org/jira/browse/SPARK-27996
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.4.3
Reporter: Saisai Shao


When the Spark live/history UI is proxied behind a reverse proxy, redirects can 
return the wrong scheme. For example:

If the reverse proxy is SSL-enabled, the client-to-proxy request is HTTPS, but if 
Spark's UI is not SSL-enabled, the proxy-to-Spark request is plain HTTP. Spark 
treats all incoming requests as HTTP, so the redirect URL simply starts with 
"http" and the redirect fails on the client side.

Most reverse proxies add an extra header, "X-Forwarded-Proto", to tell the 
backend server that the original client request was HTTPS, so Spark should use 
this header to build the correct redirect URL (a rough sketch of the idea is 
below).
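
A rough sketch of the idea (a hypothetical helper assuming a servlet-style 
request object; not the actual patch):

{code:scala}
import javax.servlet.http.HttpServletRequest

// Prefer the scheme the client actually used, as reported by the reverse proxy
// via X-Forwarded-Proto, over the scheme Spark itself saw on the wire.
def redirectLocation(req: HttpServletRequest, path: String): String = {
  val scheme = Option(req.getHeader("X-Forwarded-Proto")).getOrElse(req.getScheme)
  s"$scheme://${req.getHeader("Host")}$path"
}
{code}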



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [DISCUSS][SPARK-25299] SPIP: Shuffle storage API

2019-06-10 Thread Saisai Shao
I'm currently working with MemVerge on the Splash project (one
implementation of remote shuffle storage) and followed this ticket for a
while. I would like to be a shepherd if no one else volunteered to be.

Best regards,
Saisai

Matt Cheah  wrote on Thursday, June 6, 2019 at 8:33 AM:

> Hi everyone,
>
>
>
> I wanted to pick this back up again. The discussion has quieted down both
> on this thread and on the document.
>
>
>
> We made a few revisions to the document to hopefully make it easier to
> read and to clarify our criteria for success in the project. Some of the
> APIs have also been adjusted based on further discussion and things we’ve
> learned.
>
>
>
> I was hoping to discuss what our next steps could be here. Specifically,
>
>1. Would any PMC be willing to become the shepherd for this SPIP?
>2. Is there any more feedback regarding this proposal?
>3. What would we need to do to take this to a voting phase and to
>begin proposing our work against upstream Spark?
>
>
>
> Thanks,
>
>
>
> -Matt Cheah
>
>
>
> *From: *"Yifei Huang (PD)" 
> *Date: *Monday, May 13, 2019 at 1:04 PM
> *To: *Mridul Muralidharan 
> *Cc: *Bo Yang , Ilan Filonenko , Imran
> Rashid , Justin Uang , Liang
> Tang , Marcelo Vanzin , Matei
> Zaharia , Matt Cheah , Min
> Shen , Reynold Xin , Ryan Blue <
> rb...@netflix.com>, Vinoo Ganesh , Will Manning <
> wmann...@palantir.com>, "b...@fb.com" , "dev@spark.apache.org"
> , "fel...@uber.com" , "
> f...@linkedin.com" , "tgraves...@gmail.com" <
> tgraves...@gmail.com>, "yez...@linkedin.com" , "
> yue...@memverge.com" 
> *Subject: *Re: [DISCUSS][SPARK-25299] SPIP: Shuffle storage API
>
>
>
> Hi Mridul - thanks for taking the time to give us feedback! Thoughts on
> the points that you mentioned:
>
>
>
> The API is meant to work with the existing SortShuffleManager algorithm.
> There aren't strict requirements on how other ShuffleManager
> implementations must behave, so it seems impractical to design an API that
> could also satisfy those unknown requirements. However, we do believe that
> the API is rather generic, using OutputStreams for writes and InputStreams
> for reads, and indexing the data by a shuffleId-mapId-reduceId combo, so if
> other shuffle algorithms treat the data in the same chunks and want an
> interface for storage, then they can also use this API from within their
> implementation.
>
>
>
> About speculative execution, we originally made the assumption that each
> shuffle task is deterministic, which meant that even if a later mapper
> overrode a previous committed mapper's value, it's still the same contents.
> Having searched some tickets and reading
> https://github.com/apache/spark/pull/22112/files more carefully, I think
> there are problems with our original thought if the writer writes all
> attempts of a task to the same location. One example is if the writer
> implementation writes each partition to the remote host in a sequence of
> chunks. In such a situation, a reducer might read data half written by the
> original task and half written by the running speculative task, which will
> not be the correct contents if the mapper output is unordered. Therefore,
> writes by a single mapper might have to be transactioned, which is not
> clear from the API, and seems rather complex to reason about, so we
> shouldn't expect this from the implementer.
>
>
>
> However, this doesn't affect the fundamentals of the API since we only
> need to add an additional attemptId to the storage data index (which can be
> stored within the MapStatus) to solve the problem of concurrent writes.
> This would also make it more clear that the writer should use attempt ID as
> an index to ensure that writes from speculative tasks don't interfere with
> one another (we can add that to the API docs as well).
>
>
>
> *From: *Mridul Muralidharan 
> *Date: *Wednesday, May 8, 2019 at 8:18 PM
> *To: *"Yifei Huang (PD)" 
> *Cc: *Bo Yang , Ilan Filonenko , Imran
> Rashid , Justin Uang , Liang
> Tang , Marcelo Vanzin , Matei
> Zaharia , Matt Cheah , Min
> Shen , Reynold Xin , Ryan Blue <
> rb...@netflix.com>, Vinoo Ganesh , Will Manning <
> wmann...@palantir.com>, "b...@fb.com" , "dev@spark.apache.org"
> , "fel...@uber.com" , "
> f...@linkedin.com" , "tgraves...@gmail.com" <
> tgraves...@gmail.com>, "yez...@linkedin.com" , "
> yue...@memverge.com" 
> *Subject: *Re: [DISCUSS][SPARK-25299] SPIP: Shuffle storage API
>
>
>
>
>
> Unfortunately I do not have bandwidth to do a detailed review, but a few
> things come to mind after a quick read:
>
>
>
> - While it might be tactically beneficial to align with existing
> implementation, a clean design which does not tie into existing shuffle
> implementation would be preferable (if it can be done without over
> engineering). Shuffle implementation can change and there are custom
> implementations and experiments which differ quite a bit from what comes
> with Apache Spark.
>
>
>
>
>
> - Please keep speculative execution in mind while designing the
> interfaces: in 

Re: Support for Livy with Scala 2.12

2019-06-05 Thread Saisai Shao
Sorry, I don't have the bandwidth to support 2.12. I can help review it if
someone can do this.

Thanks
Saisai

 wrote on Wednesday, June 5, 2019 at 10:09 AM:

> Hi Saisai,
>
> I’m not familiar with Livy code.  We’re just using it for our Jupyter
> integration.
>
> I’m looking through the PR for the 2.11 migration that was done a year ago, and
> it looks like it is mostly POM changes.  If that’s not correct then I might
> need help to perform the upgrade.
>
> Do you have bandwidth to make this change?
>
>
>
>
> From: Saisai Shao <sai.sai.s...@gmail.com>
> Date: Tuesday, Jun 04, 2019, 8:56 PM
> To: user@livy.incubator.apache.org
> Subject: [External] Re: Support for Livy with Scala 2.12
>
> If you're familiar with the Livy code, I think the effort is not so big.
> Based on my previous experience with Scala 2.10 support, some code may
> need to be changed because of Scala version incompatibilities.
>
> Thanks
> Saisai
>
> santosh.dan...@ubs.com wrote on Tuesday, June 4, 2019 at 8:25 PM:
> How much effort do we need to put in to create a 2.12 module? Is that just a
> change in the POM files, or is a code change required?
>
> We have release planned for July to upgrade Jupyter and Livy to utilize
> spark 2.4.2.  This is blocking us from upgrade.
> From: Saisai Shao <sai.sai.s...@gmail.com>
> Date: Monday, Jun 03, 2019, 9:02 PM
> To: user@livy.incubator.apache.org
> Subject: [External] Re: Support for Livy with Scala 2.12
>
> Like what we did before to support both Scala 2.10 and 2.11 in Livy, I
> think we should also have a new module to support 2.12.
>
> santosh.dan...@ubs.com wrote on Tuesday, June 4, 2019 at 7:40 AM:
> Yes, the spark binary we downloaded is built with default Scala 2.12.  We
> want to use databricks delta which I think only support Scala 2.12.  So,
> I'm stuck with Scala 2.12.  Moreover, Spark community is going to
> decommission Scala 2.11 completely from Spark 3.0 release.  We might need
> to prepare Livy to support Scala 2.12 by default.
>
> From: Kevin Risden [mailto:kris...@apache.org]
> Sent: Monday, June 03, 2019 6:35 PM
> To: user@livy.incubator.apache.org
> Subject: [External] Re: Support for Livy with Scala 2.12
>
> Looks like the issue might be Spark 2.4.2 only? From
> https://spark.apache.org/downloads.html, "Note that, Spark is pre-built
> with Scala 2.11 except version 2.4.2, which is pre-built with Scala 2.12."
> So maybe you just got unlucky with using Spark 2.4.2?
>
> Kevin Risden
>
>
> On Mon, Jun 3, 2019 at 6:19 PM santosh.dan...@ubs.com wrote:
> Kevin,
>
> I'm using Livy 0.6.0.  The issues is related to not finding repl jars that
> support scala 2.12.  The error "requirement failed: Cannot find Livy REPL
> jars." is thrown because it couldn't find folder repl_2.12-jars under LIVY
> directory.
>
> I performed a test to make sure this issue is related to scala 2.12
> compatibility , I copied contents of repl_2.11-jars under Livy directory
> into new directory LIVY/repl_2.12-jars and this time I didn't get REPL jars
> exception it went ahead and created session but failed to start session due
> to rsc jars version incompatibility.
>
> LIVY Folder structure for error " requirement failed: Cannot find Livy
> REPL jars.""
>
> [/app/risk/ha02/livy]$ ls -ltr
> total 116
> -rwxr-xr-x 1 agriddev agriddev   160 Mar 19 14:39 NOTICE
> -rwxr-xr-x 1 agriddev agriddev 18665 Mar 19 14:39 LICENSE
> -rwxr-xr-x 1 agriddev agriddev   537 Mar 19 14:39 DISCLAIMER
> -rwxr-xr-x 1 agriddev agriddev 46355 Mar 19 14:42 THIRD-PARTY
> drwxr-xr-x 2 agriddev agriddev  4096 Mar 19 14:43 bin
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37 repl_2.11-jars
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37 rsc-jars
> drwxr-xr-x 2 agriddev agriddev 12288 Apr 14 22:37 jars
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37
> apache-livy-0.6.0-incubating-bin
> drwxr-xr-x 2 agriddev agriddev  4096 Jun  3 17:37 conf
> drwxr-xr-x 2 agriddev agriddev  4096 Jun  3 21:51 logs
>
> LIVY FOLDER STRUCTURE TO BYPASS "REQUIRE

Re: Support for Livy with Scala 2.12

2019-06-04 Thread Saisai Shao
If you're familiar with the Livy code, I think the effort is not so big.
Based on my previous experience with Scala 2.10 support, some code may
need to be changed because of Scala version incompatibilities.

Thanks
Saisai

 wrote on Tuesday, June 4, 2019 at 8:25 PM:

> How much effort do we need to put in to create a 2.12 module? Is that just a
> change in the POM files, or is a code change required?
>
> We have release planned for July to upgrade Jupyter and Livy to utilize
> spark 2.4.2.  This is blocking us from upgrade.
> From: Saisai Shao <sai.sai.s...@gmail.com>
> Date: Monday, Jun 03, 2019, 9:02 PM
> To: user@livy.incubator.apache.org
> Subject: [External] Re: Support for Livy with Scala 2.12
>
> Like what we did before to support both Scala 2.10 and 2.11 in Livy, I
> think we should also have a new module to support 2.12.
>
> santosh.dan...@ubs.com wrote on Tuesday, June 4, 2019 at 7:40 AM:
> Yes, the spark binary we downloaded is built with default Scala 2.12.  We
> want to use databricks delta which I think only support Scala 2.12.  So,
> I'm stuck with Scala 2.12.  Moreover, Spark community is going to
> decommission Scala 2.11 completely from Spark 3.0 release.  We might need
> to prepare Livy to support Scala 2.12 by default.
>
> From: Kevin Risden [mailto:kris...@apache.org]
> Sent: Monday, June 03, 2019 6:35 PM
> To: user@livy.incubator.apache.org
> Subject: [External] Re: Support for Livy with Scala 2.12
>
> Looks like the issue might be Spark 2.4.2 only? From
> https://spark.apache.org/downloads.html, "Note that, Spark is pre-built
> with Scala 2.11 except version 2.4.2, which is pre-built with Scala 2.12."
> So maybe you just got unlucky with using Spark 2.4.2?
>
> Kevin Risden
>
>
> On Mon, Jun 3, 2019 at 6:19 PM santosh.dan...@ubs.com wrote:
> Kevin,
>
> I'm using Livy 0.6.0.  The issues is related to not finding repl jars that
> support scala 2.12.  The error "requirement failed: Cannot find Livy REPL
> jars." is thrown because it couldn't find folder repl_2.12-jars under LIVY
> directory.
>
> I performed a test to make sure this issue is related to scala 2.12
> compatibility , I copied contents of repl_2.11-jars under Livy directory
> into new directory LIVY/repl_2.12-jars and this time I didn't get REPL jars
> exception it went ahead and created session but failed to start session due
> to rsc jars version incompatibility.
>
> LIVY Folder structure for error " requirement failed: Cannot find Livy
> REPL jars.""
>
> [/app/risk/ha02/livy]$ ls -ltr
> total 116
> -rwxr-xr-x 1 agriddev agriddev   160 Mar 19 14:39 NOTICE
> -rwxr-xr-x 1 agriddev agriddev 18665 Mar 19 14:39 LICENSE
> -rwxr-xr-x 1 agriddev agriddev   537 Mar 19 14:39 DISCLAIMER
> -rwxr-xr-x 1 agriddev agriddev 46355 Mar 19 14:42 THIRD-PARTY
> drwxr-xr-x 2 agriddev agriddev  4096 Mar 19 14:43 bin
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37 repl_2.11-jars
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37 rsc-jars
> drwxr-xr-x 2 agriddev agriddev 12288 Apr 14 22:37 jars
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37
> apache-livy-0.6.0-incubating-bin
> drwxr-xr-x 2 agriddev agriddev  4096 Jun  3 17:37 conf
> drwxr-xr-x 2 agriddev agriddev  4096 Jun  3 21:51 logs
>
> LIVY FOLDER STRUCTURE TO BYPASS "REQUIREMENT FAILED:CANNOT FIND LIVY REPL
> JARS"
>
> [/app/risk/ha02/livy]$ ls -ltr
> total 116
> -rwxr-xr-x 1 agriddev agriddev   160 Mar 19 14:39 NOTICE
> -rwxr-xr-x 1 agriddev agriddev 18665 Mar 19 14:39 LICENSE
> -rwxr-xr-x 1 agriddev agriddev   537 Mar 19 14:39 DISCLAIMER
> -rwxr-xr-x 1 agriddev agriddev 46355 Mar 19 14:42 THIRD-PARTY
> drwxr-xr-x 2 agriddev agriddev  4096 Mar 19 14:43 bin
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37 repl_2.11-jars
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37 rsc-jars
> drwxr-xr-x 2 agriddev agriddev 12288 Apr 14 22:37 jars
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37
> apache-livy-0.6.0-incubating-bin
> drwxr-xr-x 2 agriddev agriddev  4096 Jun  3 17:37 conf
> drwxr-xr-x 2 agriddev agriddev  4096 Jun  3 21:50 repl_2.12-jars
> drwxr-xr-x 2 agriddev agriddev  4096 Jun  3 21:51 logs
>
>
>
> Error Information
>
> zip
> 19/06/03 21:52:00 INFO LineBufferedStream: 19/06/03 21:52:00 INFO
> SecurityManager: Changing view acls to: agriddev
> 19/06/03 21:52:00 INFO LineBufferedStream: 19/06/03 21:52:00 INFO
> SecurityManager: Changing modify acls to: agriddev
> 19/06/03 21:52:00 INFO LineBufferedStream: 19/06/03 21:52:00 INFO
> SecurityManager: Changing view acls groups to:
> 19/06/03 21:52:00 INFO LineBufferedSt

Re: Support for Livy with Scala 2.12

2019-06-03 Thread Saisai Shao
Like what we did before to support both Scala 2.10 and 2.11 in Livy, I
think we should also have a new module to support 2.12.

 wrote on Tuesday, June 4, 2019 at 7:40 AM:

> Yes, the Spark binary we downloaded is built with Scala 2.12 by default.  We
> want to use Databricks Delta, which I think only supports Scala 2.12.  So
> I'm stuck with Scala 2.12.  Moreover, the Spark community is going to
> drop Scala 2.11 completely in the Spark 3.0 release.  We might need
> to prepare Livy to support Scala 2.12 by default.
>
>
>
> *From:* Kevin Risden [mailto:kris...@apache.org]
> *Sent:* Monday, June 03, 2019 6:35 PM
> *To:* user@livy.incubator.apache.org
> *Subject:* [External] Re: Support for Livy with Scala 2.12
>
>
>
> Looks like the issue might be Spark 2.4.2 only? From
> https://spark.apache.org/downloads.html, "Note that, Spark is pre-built
> with Scala 2.11 except version 2.4.2, which is pre-built with Scala 2.12."
> So maybe you just got unlucky with using Spark 2.4.2?
>
>
>
> Kevin Risden
>
>
>
>
>
> On Mon, Jun 3, 2019 at 6:19 PM  wrote:
>
> Kevin,
>
>
>
> I'm using Livy 0.6.0.  The issue is related to not finding REPL jars that
> support Scala 2.12.  The error "requirement failed: Cannot find Livy REPL
> jars." is thrown because it couldn't find the folder repl_2.12-jars under the
> LIVY directory.
>
>
>
> I performed a test to make sure this issue is related to Scala 2.12
> compatibility: I copied the contents of repl_2.11-jars under the Livy directory
> into a new directory LIVY/repl_2.12-jars, and this time I didn't get the REPL
> jars exception; it went ahead and created the session but failed to start it
> due to rsc jars version incompatibility.
>
>
>
> *LIVY Folder structure for error " requirement failed: Cannot find Livy
> REPL jars.""*
>
>
>
> [/app/risk/ha02/livy]$ ls -ltr
>
> total 116
>
> -rwxr-xr-x 1 agriddev agriddev   160 Mar 19 14:39 NOTICE
>
> -rwxr-xr-x 1 agriddev agriddev 18665 Mar 19 14:39 LICENSE
>
> -rwxr-xr-x 1 agriddev agriddev   537 Mar 19 14:39 DISCLAIMER
>
> -rwxr-xr-x 1 agriddev agriddev 46355 Mar 19 14:42 THIRD-PARTY
>
> drwxr-xr-x 2 agriddev agriddev  4096 Mar 19 14:43 bin
>
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37 repl_2.11-jars
>
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37 rsc-jars
>
> drwxr-xr-x 2 agriddev agriddev 12288 Apr 14 22:37 jars
>
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37
> apache-livy-0.6.0-incubating-bin
>
> drwxr-xr-x 2 agriddev agriddev  4096 Jun  3 17:37 conf
>
> drwxr-xr-x 2 agriddev agriddev  4096 Jun  3 21:51 logs
>
>
>
> *LIVY FOLDER STRUCTURE TO BYPASS "REQUIREMENT FAILED: CANNOT FIND LIVY REPL
> JARS"*
>
>
>
> [/app/risk/ha02/livy]$ ls -ltr
>
> total 116
>
> -rwxr-xr-x 1 agriddev agriddev   160 Mar 19 14:39 NOTICE
>
> -rwxr-xr-x 1 agriddev agriddev 18665 Mar 19 14:39 LICENSE
>
> -rwxr-xr-x 1 agriddev agriddev   537 Mar 19 14:39 DISCLAIMER
>
> -rwxr-xr-x 1 agriddev agriddev 46355 Mar 19 14:42 THIRD-PARTY
>
> drwxr-xr-x 2 agriddev agriddev  4096 Mar 19 14:43 bin
>
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37 repl_2.11-jars
>
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37 rsc-jars
>
> drwxr-xr-x 2 agriddev agriddev 12288 Apr 14 22:37 jars
>
> drwxr-xr-x 2 agriddev agriddev  4096 Apr 14 22:37
> apache-livy-0.6.0-incubating-bin
>
> drwxr-xr-x 2 agriddev agriddev  4096 Jun  3 17:37 conf
>
> drwxr-xr-x 2 agriddev agriddev  4096 Jun  3 21:50 repl_2.12-jars
>
> drwxr-xr-x 2 agriddev agriddev  4096 Jun  3 21:51 logs
>
>
>
>
>
> Error Information
>
>
>
> zip
>
> 19/06/03 21:52:00 INFO LineBufferedStream: 19/06/03 21:52:00 INFO
> SecurityManager: Changing view acls to: agriddev
>
> 19/06/03 21:52:00 INFO LineBufferedStream: 19/06/03 21:52:00 INFO
> SecurityManager: Changing modify acls to: agriddev
>
> 19/06/03 21:52:00 INFO LineBufferedStream: 19/06/03 21:52:00 INFO
> SecurityManager: Changing view acls groups to:
>
> 19/06/03 21:52:00 INFO LineBufferedStream: 19/06/03 21:52:00 INFO
> SecurityManager: Changing modify acls groups to:
>
> 19/06/03 21:52:00 INFO LineBufferedStream: 19/06/03 21:52:00 INFO
> SecurityManager: SecurityManager: authentication disabled; ui acls
> disabled; users  with view permissions: Set(agriddev); groups with view
> permissions: Set(); users  with modify permissions: Set(agriddev); groups
> with modify permissions: Set()
>
> 19/06/03 21:52:01 INFO LineBufferedStream: 19/06/03 21:52:01 INFO Client:
> Submitting application application_1559316432251_0172 to ResourceManager
>
> 19/06/03 21:52:01 INFO LineBufferedStream: 19/06/03 21:52:01 INFO
> YarnClientImpl: Submitted application application_1559316432251_0172
>
> 19/06/03 21:52:01 INFO LineBufferedStream: 19/06/03 21:52:01 INFO Client:
> Application report for application_1559316432251_0172 (state: ACCEPTED)
>
> 19/06/03 21:52:01 INFO LineBufferedStream: 19/06/03 21:52:01 INFO Client:
>
> 19/06/03 21:52:01 INFO LineBufferedStream:   client token: N/A
>
> 19/06/03 21:52:01 INFO LineBufferedStream:   diagnostics: [Mon Jun 03
> 21:52:01 + 2019] Application is Activated, wai
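
Below is a minimal Scala sketch of the workaround described in the quoted message
above: copy the Scala 2.11 REPL jars into a repl_2.12-jars directory so that Livy's
"Cannot find Livy REPL jars" check passes. The Livy home path is taken from the
listing above and is only an example, and, as noted in the thread, this merely
bypasses the check: the session still fails afterwards on rsc jar incompatibility.

import java.nio.file.{Files, Paths, StandardCopyOption}

// Example path taken from the directory listing above; adjust to your own install.
val livyHome = Paths.get("/app/risk/ha02/livy")
val src = livyHome.resolve("repl_2.11-jars")
val dst = Files.createDirectories(livyHome.resolve("repl_2.12-jars"))

// Copy every 2.11 REPL jar into the 2.12 directory so the startup check finds it.
Files.list(src).forEach { jar =>
  Files.copy(jar, dst.resolve(jar.getFileName), StandardCopyOption.REPLACE_EXISTING)
}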

[jira] [Commented] (SPARK-15348) Hive ACID

2019-05-21 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-15348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844750#comment-16844750
 ] 

Saisai Shao commented on SPARK-15348:
-

No, it doesn't support Hive ACID; it has its own mechanism to support ACID.

> Hive ACID
> -
>
> Key: SPARK-15348
> URL: https://issues.apache.org/jira/browse/SPARK-15348
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.0, 2.3.0
>Reporter: Ran Haim
>Priority: Major
>
> Spark does not support any feature of Hive's transactional tables:
> you cannot use Spark to delete/update a table, and it also has problems
> reading the aggregated data when no compaction was done.
> Also, it seems that compaction is not supported - alter table ... partition
>  COMPACT 'major'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15348) Hive ACID

2019-05-21 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-15348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844597#comment-16844597
 ] 

Saisai Shao commented on SPARK-15348:
-

I think the Delta Lake project is exactly what you want.

> Hive ACID
> -
>
> Key: SPARK-15348
> URL: https://issues.apache.org/jira/browse/SPARK-15348
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.0, 2.3.0
>Reporter: Ran Haim
>Priority: Major
>
> Spark does not support any feature of Hive's transactional tables:
> you cannot use Spark to delete/update a table, and it also has problems
> reading the aggregated data when no compaction was done.
> Also, it seems that compaction is not supported - alter table ... partition
>  COMPACT 'major'
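
As an illustration of the Delta Lake suggestion above, here is a minimal Scala sketch
of what row-level delete/update looks like with Delta Lake rather than Hive ACID. It
assumes the io.delta:delta-core dependency is on the classpath; the table path and
column names are hypothetical and used only for illustration.

{code:scala}
import io.delta.tables.DeltaTable
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

val spark = SparkSession.builder().appName("delta-acid-sketch").getOrCreate()

// Hypothetical Delta table path, used only for illustration.
val table = DeltaTable.forPath(spark, "/tmp/events")

// Row-level delete and update, the kind of operation this ticket asks for.
table.delete(col("status") === "obsolete")
table.update(col("status") === "stale", Map("status" -> lit("refreshed")))
{code}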



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-15348) Hive ACID

2019-05-21 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-15348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844597#comment-16844597
 ] 

Saisai Shao edited comment on SPARK-15348 at 5/21/19 7:38 AM:
--

I think the Delta Lake project is exactly what you want. It was recently
announced at the Spark AI Summit.


was (Author: jerryshao):
I think delta lake project is exactly what you want.

> Hive ACID
> -
>
> Key: SPARK-15348
> URL: https://issues.apache.org/jira/browse/SPARK-15348
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.2, 2.2.0, 2.3.0
>Reporter: Ran Haim
>Priority: Major
>
> Spark does not support any feature of Hive's transactional tables:
> you cannot use Spark to delete/update a table, and it also has problems
> reading the aggregated data when no compaction was done.
> Also, it seems that compaction is not supported - alter table ... partition
>  COMPACT 'major'



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Dynamic metric names

2019-05-06 Thread Saisai Shao
I think the main reason why that was not merged is that Spark itself
doesn't have such a requirement, and the metrics system is mainly used by
Spark itself. Most of the needs come from custom sources/sinks, but
Spark's MetricsSystem is not designed as a public API.

I think we could revisit or improve that PR if there's a solid reason
for it.

Thanks
Saisai

Sergey Zhemzhitsky wrote on Tue, May 7, 2019 at 5:49 AM:

> Hi Saisai,
>
> Thanks a lot for the link! This is exactly what I need.
> Just curious why this PR has not been merged, as it seems to implement
> a rather natural requirement.
>
> There are a number of use cases which can benefit from this feature, e.g.
> - collecting business metrics based on the data's attributes and reporting
> them into the monitoring system as a side effect of the data processing
> - visualizing technical metrics by means of alternative software (e.g.
> grafana) - currently it's hardly possible to know the actual number of
> jobs, stages, tasks and their names and IDs in advance to register all the
> corresponding metrics statically.
>
>
> Kind Regards,
> Sergey
>
>
> On Mon, May 6, 2019, 16:07 Saisai Shao  wrote:
>
>> I remembered there was a PR about doing a similar thing (
>> https://github.com/apache/spark/pull/18406). From my understanding, this
>> seems like a quite specific requirement; it may require code changes to
>> support your needs.
>>
>> Thanks
>> Saisai
>>
>> Sergey Zhemzhitsky wrote on Sat, May 4, 2019 at 4:44 PM:
>>
>>> Hello Spark Users!
>>>
>>> Just wondering whether it is possible to register a metric source
>>> without metrics known in advance and add the metrics themselves to this
>>> source later on?
>>>
>>> It seems that currently MetricSystem puts all the metrics from the
>>> source's MetricRegistry into a shared MetricRegistry of a MetricSystem
>>> during metric source registration [1].
>>>
>>> So in case there is a new metric with a new name added to the source's
>>> registry after this source registration, then this new metric will not be
>>> reported to the sinks.
>>>
>>> What I'd like to achieve is to be able to register new metrics with new
>>> names dynamically using a single metric source.
>>> Is it somehow possible?
>>>
>>>
>>> [1]
>>> https://github.com/apache/spark/blob/51de86baed0776304c6184f2c04b6303ef48df90/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala#L162
>>>
>>
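
The following is a minimal Scala sketch, not a documented public API, illustrating the
mechanism described in the quoted question above: metrics registered on a source before
MetricsSystem.registerSource() are copied into the shared registry and reported, while
metrics added afterwards stay in the source's local registry and never reach the sinks.
Since Source and MetricsSystem are private[spark], the sketch assumes the class is
compiled into an org.apache.spark package; all names below are illustrative only, and
registration itself would go through SparkEnv.get.metricsSystem.registerSource(...).

package org.apache.spark.metrics.source

import com.codahale.metrics.{Gauge, MetricRegistry}

class DynamicSource extends Source {
  override val sourceName: String = "dynamic"
  override val metricRegistry: MetricRegistry = new MetricRegistry

  // Registered before registerSource(this): copied into the shared registry at
  // registration time, so it is reported to the sinks.
  metricRegistry.register("known_upfront",
    new Gauge[Long] { override def getValue: Long = 1L })

  // Added after registration: stays in this local registry only, which is why
  // such metrics are not reported without a change along the lines of PR 18406.
  def addLater(name: String, value: () => Long): Unit =
    metricRegistry.register(name,
      new Gauge[Long] { override def getValue: Long = value() })
}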


Re: Dynamic metric names

2019-05-06 Thread Saisai Shao
I remembered there was a PR about doing a similar thing (
https://github.com/apache/spark/pull/18406). From my understanding, this
seems like a quite specific requirement; it may require code changes to
support your needs.

Thanks
Saisai

Sergey Zhemzhitsky wrote on Sat, May 4, 2019 at 4:44 PM:

> Hello Spark Users!
>
> Just wondering whether it is possible to register a metric source without
> metrics known in advance and add the metrics themselves to this source
> later on?
>
> It seems that currently MetricSystem puts all the metrics from the
> source's MetricRegistry into a shared MetricRegistry of a MetricSystem
> during metric source registration [1].
>
> So in case there is a new metric with a new name added to the source's
> registry after this source registration, then this new metric will not be
> reported to the sinks.
>
> What I'd like to achieve is to be able to register new metrics with new
> names dynamically using a single metric source.
> Is it somehow possible?
>
>
> [1]
> https://github.com/apache/spark/blob/51de86baed0776304c6184f2c04b6303ef48df90/core/src/main/scala/org/apache/spark/metrics/MetricsSystem.scala#L162
>


Re: Livy-0.6 release?

2019-03-12 Thread Saisai Shao
I can also help to release a new version. My only concern is how
mature the thrift module is: shall we enable it by default or leave it
disabled?

Thanks
Saisai

Jeff Zhang wrote on Tue, Mar 12, 2019 at 10:54 AM:

> Thanks Marcelo, I can help to test it on the Zeppelin side, which uses Livy
> as one of its interpreters.
>
> Marcelo Vanzin wrote on Tue, Mar 12, 2019 at 7:25 AM:
>
>> Since there isn't much activity going on from the project committers,
>> I guess I could spend some time to create a release.
>>
>> The main problem from my side is that I haven't actually used Livy in
>> a long time. So personally I have no idea of how stable the current
>> master is, and the most I can do is just run the built-in integration
>> tests. So there would be a release (assuming other PPMC members are
>> still around), but I wouldn't really be able to attest to its
>> stability. If people are ok with that...
>>
>> On Sat, Mar 2, 2019 at 6:04 AM kant kodali  wrote:
>> >
>> > Any rough timeline on 0.6? If Livy doesn't allow choosing a higher
>> Spark version, I guess that will be a blocker for a lot of people who want to
>> leverage new features from Spark. Any good solution to fix this?
>> >
>> > On Mon, Feb 11, 2019 at 3:46 PM Ruslan Dautkhanov 
>> wrote:
>> >>
>> >> Got it. Thanks Marcelo.
>> >>
>> >> I see LIVY-551 is now part of the master. Hope to see Livy 0.6 perhaps
>> soon.
>> >>
>> >>
>> >> Thank you!
>> >> Ruslan Dautkhanov
>> >>
>> >>
>> >> On Tue, Feb 5, 2019 at 12:38 PM Marcelo Vanzin 
>> wrote:
>> >>>
>> >>> I think LIVY-551 is the current blocker. Unfortunately I don't think
>> >>> we're really tracking things in jira that well, as far as releases go.
>> >>> At least I'm not.
>> >>>
>> >>> On Mon, Feb 4, 2019 at 6:32 PM Ruslan Dautkhanov <
>> dautkha...@gmail.com> wrote:
>> >>> >
>> >>> > +1 for 0.6 release so folks can upgrade to Spark 2.4..
>> >>> >
>> >>> > Marcelo, what particular patches are blocking Livy 0.6 release?
>> >>> >
>> >>> > I see 3 jiras with 0.6 as Fix Version - not sure if that's correct
>> way to find blockers.
>> >>> > https://goo.gl/9axfsw
>> >>> >
>> >>> >
>> >>> > Thank you!
>> >>> > Ruslan Dautkhanov
>> >>> >
>> >>> >
>> >>> > On Mon, Jan 28, 2019 at 2:24 PM Marcelo Vanzin 
>> wrote:
>> >>> >>
>> >>> >> There are a couple of patches under review that are currently
>> blocking
>> >>> >> the release.
>> >>> >>
>> >>> >> Once those are done, we can work on releasing 0.6.
>> >>> >>
>> >>> >> On Mon, Jan 28, 2019 at 11:18 AM Roger Liu <
>> liu.ro...@microsoft.com> wrote:
>> >>> >> >
>> >>> >> > Hey there,
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > I’m wondering if we have a timeline for releasing Livy-0.6? It's
>> been a year since the last release and there are features like Spark-2.4
>> support that are not incorporated in the livy-0.5 package.
>> >>> >> >
>> >>> >> >
>> >>> >> >
>> >>> >> > Thanks,
>> >>> >> >
>> >>> >> > Roger Liu
>> >>> >>
>> >>> >>
>> >>> >>
>> >>> >> --
>> >>> >> Marcelo
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Marcelo
>>
>>
>>
>> --
>> Marcelo
>>
>
>
> --
> Best Regards
>
> Jeff Zhang
>


Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-06 Thread Saisai Shao
Do we have other blocker/critical issues for Spark 2.4.1, or are we waiting for
something to be fixed? I roughly searched JIRA; it seems there are no
blocker/critical issues marked for 2.4.1.

Thanks
Saisai

shane knapp wrote on Thu, Mar 7, 2019 at 4:57 AM:

> i'll be popping in to the sig-big-data meeting on the 20th to talk about
> stuff like this.
>
> On Wed, Mar 6, 2019 at 12:40 PM Stavros Kontopoulos <
> stavros.kontopou...@lightbend.com> wrote:
>
>> Yes, it's a tough decision, and as we discussed today (
>> https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA
>> )
>> "Kubernetes support window is 9 months, Spark is two years". So we may
>> end up with old client versions on branches still supported like 2.4.x in
>> the future.
>> That gives us no choice but to upgrade, if we want to be on the safe
>> side. We have tested 3.0.0 with 1.11 internally and it works, but I don't
>> know what it means to run with old
>> clients.
>>
>>
>> On Wed, Mar 6, 2019 at 7:54 PM Sean Owen  wrote:
>>
>>> If the old client is basically unusable with the versions of K8S
>>> people mostly use now, and the new client still works with older
>>> versions, I could see including this in 2.4.1.
>>>
>>> Looking at
>>> https://github.com/fabric8io/kubernetes-client#compatibility-matrix
>>> it seems like the 4.1.1 client is needed for 1.10 and above. However
>>> it no longer supports 1.7 and below.
>>> We have 3.0.x, and versions through 4.0.x of the client support the
>>> same K8S versions, so no real middle ground here.
>>>
>>> 1.7.0 came out June 2017, it seems. 1.10 was March 2018. Minor release
>>> branches are maintained for 9 months per
>>> https://kubernetes.io/docs/setup/version-skew-policy/
>>>
>>> Spark 2.4.0 came in Nov 2018. I suppose we could say it should have
>>> used the newer client from the start as at that point (?) 1.7 and
>>> earlier were already at least 7 months past EOL.
>>> If we update the client in 2.4.1, versions of K8S as recently
>>> 'supported' as a year ago won't work anymore. I'm guessing there are
>>> still 1.7 users out there? That wasn't that long ago but if the
>>> project and users generally move fast, maybe not.
>>>
>>> Normally I'd say, that's what the next minor release of Spark is for;
>>> update if you want later infra. But there is no Spark 2.5.
>>> I presume downstream distros could modify the dependency easily (?) if
>>> needed and maybe already do. It wouldn't necessarily help end users.
>>>
>>> Does the 3.0.x client not work at all with 1.10+, or is it just unsupported?
>>> If it 'basically works but no guarantees' I'd favor not updating. If
>>> it doesn't work at all, hm. That's tough. I think I'd favor updating
>>> the client but think it's a tough call both ways.
>>>
>>>
>>>
>>> On Wed, Mar 6, 2019 at 11:14 AM Stavros Kontopoulos
>>>  wrote:
>>> >
>>> > Yes, Shane Knapp has done the work for that already, and the tests also
>>> pass; I am working on a PR now, and I could submit it for the 2.4 branch.
>>> > I understand that this is a major dependency update, but the problem I
>>> see is that the client version is so old that I don't think it makes
>>> > much sense for current users who are on k8s 1.10, 1.11 etc. (
>>> https://github.com/fabric8io/kubernetes-client#compatibility-matrix,
>>> 3.0.0 does not even exist in there).
>>> > I don't know what it means to use that old version with current k8s
>>> clusters in terms of bugs etc.
>>>
>>
>>
>>
>
> --
> Shane Knapp
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
>


Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-05 Thread Saisai Shao
Hi DB,

I saw that we already have 6 RCs, but the latest vote I can find is for RC2;
were they all canceled?

Thanks
Saisai

DB Tsai wrote on Fri, Feb 22, 2019 at 4:51 AM:

> I am cutting a new rc4 with fix from Felix. Thanks.
>
> Sincerely,
>
> DB Tsai
> --
> Web: https://www.dbtsai.com
> PGP Key ID: 0359BC9965359766
>
> On Thu, Feb 21, 2019 at 8:57 AM Felix Cheung 
> wrote:
> >
> > I merged the fix to 2.4.
> >
> >
> > 
> > From: Felix Cheung 
> > Sent: Wednesday, February 20, 2019 9:34 PM
> > To: DB Tsai; Spark dev list
> > Cc: Cesar Delgado
> > Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC2)
> >
> > Could you hold for a bit - I have one more fix to get in
> >
> >
> > 
> > From: d_t...@apple.com on behalf of DB Tsai 
> > Sent: Wednesday, February 20, 2019 12:25 PM
> > To: Spark dev list
> > Cc: Cesar Delgado
> > Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC2)
> >
> > Okay. Let's fail rc2, and I'll prepare rc3 with SPARK-26859.
> >
> > DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple,
> Inc
> >
> > > On Feb 20, 2019, at 12:11 PM, Marcelo Vanzin
>  wrote:
> > >
> > > Just wanted to point out that
> > > https://issues.apache.org/jira/browse/SPARK-26859 is not in this RC,
> > > and is marked as a correctness bug. (The fix is in the 2.4 branch,
> > > just not in rc2.)
> > >
> > > On Wed, Feb 20, 2019 at 12:07 PM DB Tsai 
> wrote:
> > >>
> > >> Please vote on releasing the following candidate as Apache Spark
> version 2.4.1.
> > >>
> > >> The vote is open until Feb 24 PST and passes if a majority +1 PMC
> votes are cast, with
> > >> a minimum of 3 +1 votes.
> > >>
> > >> [ ] +1 Release this package as Apache Spark 2.4.1
> > >> [ ] -1 Do not release this package because ...
> > >>
> > >> To learn more about Apache Spark, please see http://spark.apache.org/
> > >>
> > >> The tag to be voted on is v2.4.1-rc2 (commit
> 229ad524cfd3f74dd7aa5fc9ba841ae223caa960):
> > >> https://github.com/apache/spark/tree/v2.4.1-rc2
> > >>
> > >> The release files, including signatures, digests, etc. can be found
> at:
> > >> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc2-bin/
> > >>
> > >> Signatures used for Spark RCs can be found in this file:
> > >> https://dist.apache.org/repos/dist/dev/spark/KEYS
> > >>
> > >> The staging repository for this release can be found at:
> > >>
> https://repository.apache.org/content/repositories/orgapachespark-1299/
> > >>
> > >> The documentation corresponding to this release can be found at:
> > >> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc2-docs/
> > >>
> > >> The list of bug fixes going into 2.4.1 can be found at the following
> URL:
> > >> https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
> > >>
> > >> FAQ
> > >>
> > >> =
> > >> How can I help test this release?
> > >> =
> > >>
> > >> If you are a Spark user, you can help us test this release by taking
> > >> an existing Spark workload and running on this release candidate, then
> > >> reporting any regressions.
> > >>
> > >> If you're working in PySpark you can set up a virtual env and install
> > >> the current RC and see if anything important breaks, in the Java/Scala
> > >> you can add the staging repository to your project's resolvers and test
> > >> with the RC (make sure to clean up the artifact cache before/after so
> > >> you don't end up building with an out-of-date RC going forward).
> > >>
> > >> ===
> > >> What should happen to JIRA tickets still targeting 2.4.1?
> > >> ===
> > >>
> > >> The current list of open tickets targeted at 2.4.1 can be found at:
> > >> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 2.4.1
> > >>
> > >> Committers should look at those and triage. Extremely important bug
> > >> fixes, documentation, and API tweaks that impact compatibility should
> > >> be worked on immediately. Everything else please retarget to an
> > >> appropriate release.
> > >>
> > >> ==
> > >> But my bug isn't fixed?
> > >> ==
> > >>
> > >> In order to make timely releases, we will typically not hold the
> > >> release unless the bug in question is a regression from the previous
> > >> release. That being said, if there is something which is a regression
> > >> that has not been correctly targeted please ping me or a committer to
> > >> help target the issue.
> > >>
> > >>
> > >> DB Tsai | Siri Open Source Technologies [not a contribution] | 
> Apple, Inc
> > >>
> > >>
> > >> -
> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> > >>
> > >
> > >
> > > --
> > > Marcelo
> > >
> > > -
> > > To unsubscribe e-mail: dev-unsubscr...@sp

Re: Making travis logs less verbose

2019-02-11 Thread Saisai Shao
I'm OK with the upgrade, but the environment is quite painful; I fixed lots
of issues due to changes in the Travis base image. So I think the upgrade may
introduce several additional changes.

I cannot remember why two mvn builds are required; maybe you can give it a try.

Thanks
Saisai


Meisam Fathi wrote on Tue, Feb 12, 2019 at 11:16 AM:

> I noticed a couple other things in .travis.yml, but I am not sure why they
> are needed.
>
>- The base image is ubuntu:trusty. Can we update it to a later LTS
>version of ubuntu like Xenial?
>- Travis builds Livy twice: Is there a reason why two mvn builds are
>needed?
>
> install:  - mvn $MVN_FLAG install -Dskip -DskipTests -DskipITs
> -Dmaven.javadoc.skip=true -B -V
> script:  - mvn $MVN_FLAG verify -e
>
>
> On Mon, Feb 11, 2019 at 5:52 PM Saisai Shao 
> wrote:
>
> > The problem is that there's no better way to get detailed logs on Travis
> without
> > printing them out on screen. I was struggling with it when debugging on
> Travis.
> >
> > Thanks
> > Saisai
> >
> Meisam Fathi wrote on Sat, Feb 9, 2019 at 12:19 PM:
> >
> > > This may do the trick for maven
> > >
> > > mvn -Dorg.slf4j.simpleLogger.defaultLogLevel=warn ...
> > >
> > > Thanks,
> > > Meisam
> > >
> > > On Fri, Feb 8, 2019 at 2:11 PM Marcelo Vanzin
> >  > > >
> > > wrote:
> > >
> > > > If you know how to silence messages from the setup phase (apt / pip /
> > > > git), go for it. Those seem kinda hidden by Travis, but maybe there's
> > > > a setting I'm not familiar with.
> > > >
> > > > Maven also has a -B option that makes things a little less verbose in
> > > > non-interactive terminals. I think -quiet might be a little overkill.
> > > >
> > > > On Thu, Feb 7, 2019 at 2:40 PM Meisam Fathi 
> > > > wrote:
> > > > >
> > > > > Each build on travis generates 10K+ lines of log. Should we make
> > build
> > > > > commands less verbose by passing --quiet to them?
> > > > >
> > > > > As an example, apt-get installs and pip installs generate 3K+ lines
> > on
> > > > > their own. Maven generates another 6K+ lines of log, but I am not
> > sure
> > > if
> > > > > silencing Maven is a good idea. Passing --quiet to Maven silences
> > > scalac
> > > > > warnings.
> > > > >
> > > > > Having said all of that, should we make travis logs less verbose?
> If
> > > > yes, I
> > > > > can send a PR.
> > > > >
> > > > > Thanks,
> > > > > Meisam
> > > >
> > > >
> > > >
> > > > --
> > > > Marcelo
> > > >
> > >
> >
>


Re: Making travis logs less verbose

2019-02-11 Thread Saisai Shao
The problem is that there's no better way to get detailed logs on Travis without
printing them out on screen. I was struggling with it when debugging on Travis.

Thanks
Saisai

Meisam Fathi wrote on Sat, Feb 9, 2019 at 12:19 PM:

> This may do the trick for maven
>
> mvn -Dorg.slf4j.simpleLogger.defaultLogLevel=warn ...
>
> Thanks,
> Meisam
>
> On Fri, Feb 8, 2019 at 2:11 PM Marcelo Vanzin  >
> wrote:
>
> > If you know how to silence messages from the setup phase (apt / pip /
> > git), go for it. Those seem kinda hidden by Travis, but maybe there's
> > a setting I'm not familiar with.
> >
> > Maven also has a -B option that makes things a little less verbose in
> > non-interactive terminals. I think -quiet might be a little overkill.
> >
> > On Thu, Feb 7, 2019 at 2:40 PM Meisam Fathi 
> > wrote:
> > >
> > > Each build on travis generates 10K+ lines of log. Should we make build
> > > commands less verbose by passing --quiet to them?
> > >
> > > As an example, apt-get installs and pip installs generate 3K+ lines on
> > > their own. Maven generates another 6K+ lines of log, but I am not sure
> if
> > > silencing Maven is a good idea. Passing --quiet to Maven silences
> scalac
> > > warnings.
> > >
> > > Having said all of that, should we make travis logs less verbose? If
> > yes, I
> > > can send a PR.
> > >
> > > Thanks,
> > > Meisam
> >
> >
> >
> > --
> > Marcelo
> >
>


[jira] [Commented] (SPARK-24615) Accelerator-aware task scheduling for Spark

2019-01-24 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750872#comment-16750872
 ] 

Saisai Shao commented on SPARK-24615:
-

I'm really sorry about the delay. Due to some changes on my side, I didn't have 
enough time to work on this before. But I talked to Xiangrui offline recently, 
we will continue to work on this and finalize it in 3.0. 

> Accelerator-aware task scheduling for Spark
> ---
>
> Key: SPARK-24615
> URL: https://issues.apache.org/jira/browse/SPARK-24615
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Saisai Shao
>Priority: Major
>  Labels: Hydrogen, SPIP
>
> In the machine learning area, accelerator cards (GPU, FPGA, TPU) are
> predominant compared to CPUs. To make the current Spark architecture work
> with accelerator cards, Spark itself should understand the existence of
> accelerators and know how to schedule tasks onto the executors where
> accelerators are equipped.
> Spark's current scheduler schedules tasks based on the locality of the data
> plus the availability of CPUs. This introduces some problems when scheduling
> tasks that require accelerators.
>  # CPU cores usually outnumber accelerators on one node, so using CPU cores
> to schedule accelerator-required tasks will introduce a mismatch.
>  # In one cluster, we always assume that a CPU is present in each node, but
> this is not true of accelerator cards.
>  # The existence of heterogeneous tasks (accelerator required or not)
> requires the scheduler to schedule tasks in a smart way.
> So here we propose to improve the current scheduler to support heterogeneous
> tasks (accelerator required or not). This can be part of the work of Project
> Hydrogen.
> Details are attached in a Google doc. It doesn't cover all the implementation
> details, just highlights the parts that should be changed.
>  
> CC [~yanboliang] [~merlintang]
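
For illustration, here is a minimal Scala sketch of what this kind of accelerator-aware
scheduling looks like from the user side. The configuration keys below are the ones I
understand shipped with the Spark 3.0 work referenced in the comment above; the
discovery-script path and the GPU amounts are hypothetical values, not recommendations,
and spark.master is assumed to be supplied by spark-submit on a GPU-capable cluster.

{code:scala}
import org.apache.spark.{SparkConf, SparkContext, TaskContext}

// Assumed Spark 3.0-era resource configuration names; values are illustrative only.
val conf = new SparkConf()
  .setAppName("gpu-scheduling-sketch")
  .set("spark.executor.resource.gpu.amount", "2")
  .set("spark.executor.resource.gpu.discoveryScript", "/opt/spark/getGpus.sh")
  .set("spark.task.resource.gpu.amount", "1")

// spark.master comes from spark-submit; GPU scheduling needs a cluster manager.
val sc = new SparkContext(conf)

// Each task can then look up the GPU addresses assigned to it by the scheduler.
sc.parallelize(1 to 4, 4).foreach { _ =>
  val gpus = TaskContext.get().resources().get("gpu").map(_.addresses.mkString(","))
  println(s"assigned GPUs: ${gpus.getOrElse("none")}")
}
{code}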



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26512) Spark 2.4.0 is not working with Hadoop 2.8.3 in windows 10

2019-01-04 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16734072#comment-16734072
 ] 

Saisai Shao commented on SPARK-26512:
-

This seems like a Netty version problem; netty-3.9.9.Final.jar is unrelated. I
was wondering if we can put the Spark classpath in front of the Hadoop classpath;
maybe that would work. There's such a configuration for the driver/executor, but
I'm not sure if there's a similar one for the AM only.

> Spark 2.4.0 is not working with Hadoop 2.8.3 in windows 10
> --
>
> Key: SPARK-26512
> URL: https://issues.apache.org/jira/browse/SPARK-26512
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell, YARN
>Affects Versions: 2.4.0
> Environment: operating system : Windows 10
> Spark Version : 2.4.0
> Hadoop Version : 2.8.3
>Reporter: Anubhav Jain
>Priority: Minor
>  Labels: windows
> Attachments: log.png
>
>
> I have installed Hadoop version 2.8.3 in my Windows 10 environment and it is
> working fine. Now when I try to install Apache Spark (version 2.4.0) with YARN
> as the cluster manager, it is not working. When I try to submit a Spark job
> using spark-submit for testing, it shows up under the ACCEPTED tab in the YARN
> UI and after that it fails



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Move source repo to gitbox

2019-01-03 Thread Saisai Shao
+1

Luciano Resende  于2019年1月4日周五 上午8:26写道:

> +1
>
> On Thu, Jan 3, 2019 at 15:57 Marcelo Vanzin 
> wrote:
>
> > Creating a formal vote for this. I don't think we have a choice but
> > they seem to request a vote anyway. This will be lazy consensus, I
> > plan to create an infra ticket for the migration on Monday if the vote
> > passes.
> >
> > Starting with my +1.
> >
> >
> > --
> > Marcelo
> >
> --
> Sent from my Mobile device
>


[jira] [Commented] (SPARK-26512) Spark 2.4.0 is not working with Hadoop 2.8.3 in windows 10?

2019-01-03 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16733740#comment-16733740
 ] 

Saisai Shao commented on SPARK-26512:
-

Please list the problems you saw, along with any log or exception. We can't tell
anything from the above information.

> Spark 2.4.0 is not working with Hadoop 2.8.3 in windows 10?
> ---
>
> Key: SPARK-26512
> URL: https://issues.apache.org/jira/browse/SPARK-26512
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Spark Shell, YARN
>Affects Versions: 2.4.0
> Environment: operating system : Windows 10
> Spark Version : 2.4.0
> Hadoop Version : 2.8.3
>Reporter: Anubhav Jain
>Priority: Minor
>  Labels: windows
> Attachments: log.png
>
>
> I have installed Hadoop version 2.8.3 in my Windows 10 environment and it is
> working fine. Now when I try to install Apache Spark (version 2.4.0) with YARN
> as the cluster manager, it is not working. When I try to submit a Spark job
> using spark-submit for testing, it shows up under the ACCEPTED tab in the YARN
> UI and after that it fails



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26457) Show hadoop configurations in HistoryServer environment tab

2019-01-02 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-26457:

Priority: Minor  (was: Major)

> Show hadoop configurations in HistoryServer environment tab
> ---
>
> Key: SPARK-26457
> URL: https://issues.apache.org/jira/browse/SPARK-26457
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core, Web UI
>Affects Versions: 2.3.2, 2.4.0
> Environment: Maybe it is good to show some configurations in 
> HistoryServer environment tab for debugging some bugs about hadoop
>Reporter: deshanxiao
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26516) zeppelin with spark on mesos: environment variable setting

2019-01-02 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-26516.
-
Resolution: Invalid

> zeppelin with spark on mesos: environment variable setting
> --
>
> Key: SPARK-26516
> URL: https://issues.apache.org/jira/browse/SPARK-26516
> Project: Spark
>  Issue Type: Question
>  Components: Mesos, Spark Core
>Affects Versions: 2.4.0
>Reporter: Yui Hirasawa
>Priority: Major
>
> I am trying to use Zeppelin with Spark in Mesos mode, following [Apache 
> Zeppelin on Spark Cluster 
> Mode|https://zeppelin.apache.org/docs/0.8.0/setup/deployment/spark_cluster_mode.html#4-configure-spark-interpreter-in-zeppelin-1].
> In the instructions, we should set these environment variables:
> {code:java}
> export MASTER=mesos://127.0.1.1:5050
> export MESOS_NATIVE_JAVA_LIBRARY=[PATH OF libmesos.so]
> export SPARK_HOME=[PATH OF SPARK HOME]
> {code}
> As far as I know, these environment variables are used by Zeppelin, so they
> should be set on the local host rather than in the docker container (if I am
> wrong, please correct me).
> But Mesos and Spark are running inside the docker container, so do we need to
> set these environment variables so that they point to the paths inside the
> docker container? If so, how should one achieve that?
> Thanks in advance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26513) Trigger GC on executor node idle

2019-01-02 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-26513:

Fix Version/s: (was: 3.0.0)

> Trigger GC on executor node idle
> 
>
> Key: SPARK-26513
> URL: https://issues.apache.org/jira/browse/SPARK-26513
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Sandish Kumar HN
>Priority: Major
>
>  
> Correct me if I'm wrong.
>  *Stage:*
>       On a large cluster, each stage would have some executors, where a few
> executors would finish a couple of tasks first and then wait for the whole
> stage or the remaining tasks, executed by different executor nodes in the
> cluster, to finish. A stage will only be completed when all tasks in the
> current stage finish their execution, and the next stage's execution has to
> wait until all tasks of the current stage are completed.
>  
> Why don't we trigger GC when the executor node is waiting for the remaining
> tasks to finish, or when the executor is idle? The executor has to wait for
> the remaining tasks to finish anyway, which can take at least a couple of
> seconds, while a GC would take at most around 300 ms.
>  
> I have proposed a small code snippet which triggers GC when the set of
> running tasks is empty and heap usage on the current executor node is above
> a given threshold.
> This could improve performance for long-running Spark jobs.
> We referred to this paper
> [https://www.computer.org/csdl/proceedings/hipc/2016/5411/00/07839705.pdf]
> and we found performance improvements in our long-running Spark batch jobs.
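
A minimal Scala sketch of the idea described above (not the actual proposed patch):
trigger a GC hint when the executor has no running tasks and heap usage exceeds a
threshold. How the running-task count is obtained from the executor is left out, and
the object and method names are hypothetical; System.gc() is only a hint to the JVM.

{code:scala}
object IdleGcTrigger {
  /** Ask for a GC when the executor is idle and the heap is mostly full. */
  def maybeTriggerGc(runningTasks: Int, heapUsageThreshold: Double = 0.7): Unit = {
    val rt = Runtime.getRuntime
    val usedFraction = (rt.totalMemory() - rt.freeMemory()).toDouble / rt.maxMemory()
    if (runningTasks == 0 && usedFraction > heapUsageThreshold) {
      System.gc() // a hint only; the JVM may ignore it
    }
  }
}
{code}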



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


