[jira] [Commented] (SPARK-24615) Accelerator-aware task scheduling for Spark

2018-07-19 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550266#comment-16550266
 ] 

Saisai Shao commented on SPARK-24615:
-

Hi [~tgraves], I'm still not sure how to handle memory per stage. Unlike MR, 
Spark runs its tasks inside a shared JVM, and I'm not sure how to control 
memory usage within that JVM. Are you suggesting an approach similar to the 
GPU case: when the memory requirement cannot be satisfied, release the current 
executors and request new executors via dynamic resource allocation?

> Accelerator-aware task scheduling for Spark
> ---
>
> Key: SPARK-24615
> URL: https://issues.apache.org/jira/browse/SPARK-24615
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Major
>  Labels: Hydrogen, SPIP
>
> In the machine learning area, accelerator cards (GPU, FPGA, TPU) are 
> predominant compared to CPUs. To make the current Spark architecture work 
> with accelerator cards, Spark itself should understand the existence of 
> accelerators and know how to schedule tasks onto the executors where 
> accelerators are equipped.
> Spark's current scheduler schedules tasks based on the locality of the data 
> plus the availability of CPUs. This introduces some problems when scheduling 
> tasks that require accelerators.
>  # CPU cores usually outnumber accelerators on a node, so using CPU cores 
> to schedule accelerator-required tasks introduces a mismatch.
>  # In one cluster, we always assume that CPUs are equipped in each node, but 
> the same is not true of accelerator cards.
>  # The existence of heterogeneous tasks (accelerator required or not) 
> requires the scheduler to schedule tasks in a smart way.
> So here we propose to improve the current scheduler to support heterogeneous 
> tasks (accelerator required or not). This can be part of the work of Project 
> Hydrogen.
> Details are attached in a Google doc. It doesn't cover all the implementation 
> details, just highlights the parts that should be changed.
>  
> CC [~yanboliang] [~merlintang]
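
To make the mismatch in point 1 concrete, here is a small illustrative Scala sketch (not from the SPIP; {{ExecutorSlots}}, {{TaskReq}} and {{place}} are hypothetical names): scheduling purely by free CPU cores would let a 16-core / 2-GPU node accept 16 GPU tasks, while tracking GPU slots caps it at 2.

{code:scala}
// Illustrative only; these types are hypothetical and not part of the SPIP.
case class ExecutorSlots(host: String, freeCores: Int, freeGpus: Int)
case class TaskReq(needsGpu: Boolean)

// Greedy placement that decrements both CPU-core and GPU slots.
def place(tasks: Seq[TaskReq], exec: ExecutorSlots): Int = {
  var cores = exec.freeCores
  var gpus = exec.freeGpus
  var placed = 0
  for (t <- tasks if cores >= 1 && (!t.needsGpu || gpus >= 1)) {
    cores -= 1
    if (t.needsGpu) gpus -= 1
    placed += 1
  }
  placed
}

val node = ExecutorSlots("host-1", freeCores = 16, freeGpus = 2)
// CPU-only accounting would admit all 16 GPU tasks; GPU-aware accounting admits 2.
println(place(Seq.fill(16)(TaskReq(needsGpu = true)), node)) // 2
{code}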






[jira] [Commented] (SPARK-24723) Discuss necessary info and access in barrier mode + YARN

2018-07-19 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550182#comment-16550182
 ] 

Saisai Shao commented on SPARK-24723:
-

Hi [~mengxr], I don't think YARN has such a feature to configure password-less 
SSH on all containers. YARN itself doesn't rely on SSH, and in our deployment 
(Ambari), we don't use password-less SSH.
{quote}And does container by default run sshd? If not, which process is 
responsible for starting/terminating the daemon?
{quote}
If the container is not dockerized, it will share the system's sshd, and it is 
the system's responsibility to start/terminate this daemon.

If the container is dockerized, I think the Docker container should be 
responsible for starting sshd (IIUC).

Maybe we should check whether sshd is running before starting the MPI job; if 
sshd is not running, we simply cannot run the MPI job, no matter who is 
responsible for the sshd daemon.

[~leftnoteasy] might have some thoughts, since he is the originator of 
mpich2-yarn.
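
A minimal sketch of the pre-flight check suggested above, assuming sshd listens on the default port 22 on the container host (the host and port here are assumptions, not something Spark or YARN provides):

{code:scala}
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

// Best-effort probe: can we open a TCP connection to sshd before launching MPI?
def sshdReachable(host: String = "localhost", port: Int = 22, timeoutMs: Int = 2000): Boolean =
  Try {
    val socket = new Socket()
    try socket.connect(new InetSocketAddress(host, port), timeoutMs)
    finally socket.close()
  }.isSuccess

// Fail fast before the barrier stage starts the MPI job.
if (!sshdReachable()) {
  sys.error("sshd is not reachable; cannot launch the MPI job")
}
{code}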

 

> Discuss necessary info and access in barrier mode + YARN
> 
>
> Key: SPARK-24723
> URL: https://issues.apache.org/jira/browse/SPARK-24723
> Project: Spark
>  Issue Type: Story
>  Components: ML, Spark Core
>Affects Versions: 3.0.0
>    Reporter: Xiangrui Meng
>Assignee: Saisai Shao
>Priority: Major
>
> In barrier mode, to run hybrid distributed DL training jobs, we need to 
> provide users sufficient info and access so they can set up a hybrid 
> distributed training job, e.g., using MPI.
> This ticket limits the scope of discussion to Spark + YARN. There were some 
> past attempts from the Hadoop community. So we should find someone with good 
> knowledge to lead the discussion here.
>  
> Requirements:
>  * understand how to set up YARN to run MPI job as a YARN application
>  * figure out how to do it with Spark w/ Barrier






[jira] [Assigned] (SPARK-24195) sc.addFile for local:/ path is broken

2018-07-19 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-24195:
---

Assignee: Li Yuanjian

> sc.addFile for local:/ path is broken
> -
>
> Key: SPARK-24195
> URL: https://issues.apache.org/jira/browse/SPARK-24195
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1, 1.4.1, 1.5.2, 1.6.3, 2.0.2, 2.1.2, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>Assignee: Li Yuanjian
>Priority: Minor
>  Labels: starter
> Fix For: 2.4.0
>
>
> In changing SPARK-6300
> https://github.com/apache/spark/commit/00e730b94cba1202a73af1e2476ff5a44af4b6b2
> essentially the change to
> new File(path).getCanonicalFile.toURI.toString
> breaks when path is local:, as java.io.File doesn't handle it.
> eg.
> new 
> File("local:///home/user/demo/logger.config").getCanonicalFile.toURI.toString
> res1: String = file:/user/anotheruser/local:/home/user/demo/logger.config
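
For reference, a sketch of the kind of scheme check that avoids the problem above (this is not the actual Spark patch; {{resolveAddedFile}} is a hypothetical helper): only scheme-less paths are pushed through java.io.File, while local:/hdfs:/file: URIs are left untouched.

{code:scala}
import java.io.File
import java.net.URI

// Hypothetical helper for illustration only.
def resolveAddedFile(path: String): String = {
  val uri = new URI(path)
  if (uri.getScheme != null) path                      // "local:///home/user/demo/logger.config" stays as-is
  else new File(path).getCanonicalFile.toURI.toString  // bare paths are canonicalized
}
{code}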






[jira] [Resolved] (SPARK-24195) sc.addFile for local:/ path is broken

2018-07-19 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-24195.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21533
[https://github.com/apache/spark/pull/21533]

> sc.addFile for local:/ path is broken
> -
>
> Key: SPARK-24195
> URL: https://issues.apache.org/jira/browse/SPARK-24195
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.1, 1.4.1, 1.5.2, 1.6.3, 2.0.2, 2.1.2, 2.2.1, 2.3.0
>Reporter: Felix Cheung
>Priority: Minor
>  Labels: starter
> Fix For: 2.4.0
>
>
> In changing SPARK-6300
> https://github.com/apache/spark/commit/00e730b94cba1202a73af1e2476ff5a44af4b6b2
> essentially the change to
> new File(path).getCanonicalFile.toURI.toString
> breaks when path is local:, as java.io.File doesn't handle it.
> eg.
> new 
> File("local:///home/user/demo/logger.config").getCanonicalFile.toURI.toString
> res1: String = file:/user/anotheruser/local:/home/user/demo/logger.config






[jira] [Commented] (SPARK-24307) Support sending messages over 2GB from memory

2018-07-19 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16550167#comment-16550167
 ] 

Saisai Shao commented on SPARK-24307:
-

Issue resolved by pull request 21440
[https://github.com/apache/spark/pull/21440]

> Support sending messages over 2GB from memory
> -
>
> Key: SPARK-24307
> URL: https://issues.apache.org/jira/browse/SPARK-24307
> Project: Spark
>  Issue Type: Sub-task
>  Components: Block Manager, Spark Core
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Assignee: Imran Rashid
>Priority: Major
> Fix For: 2.4.0
>
>
> Spark's networking layer supports sending messages backed by a {{FileRegion}} 
> or a {{ByteBuf}}. Sending large FileRegions works, as netty supports large 
> FileRegions. However, {{ByteBuf}} is limited to 2GB. This is particularly 
> a problem for sending large datasets that are already in memory, e.g. cached 
> RDD blocks.
> E.g., if you try to replicate a block stored in memory that is over 2 GB, you 
> will see an exception like:
> {noformat}
> 18/05/16 12:40:57 ERROR client.TransportClient: Failed to send RPC 
> 7420542363232096629 to xyz.com/172.31.113.213:44358: 
> io.netty.handler.codec.EncoderException: java.lang.IndexOutOfBoundsException: 
> readerIndex: 0, writerIndex: -1294617291 (expected: 0 <= readerIndex <= 
> writerIndex <= capacity(-1294617291))
> io.netty.handler.codec.EncoderException: java.lang.IndexOutOfBoundsException: 
> readerIndex: 0, writerIndex: -1294617291 (expected: 0 <= readerIndex <= 
> writerIndex <= capacity(-1294617291))
> at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:106)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
> at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
> at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
> at 
> io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:302)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
> at 
> io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:38)
> at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1081)
> at 
> io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1128)
> at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1070)
> at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> -1294617291 (expected: 0 <= readerIndex <= writerIndex <= 
> capacity(-1294617291))
> at io.netty.buffer.AbstractByteBuf.setIndex(AbstractByteBuf.java:129)
> at 
> io.netty.buffer.CompositeByteBuf.setIndex(CompositeByteBuf.java:1688)
> at io.netty.buffer.CompositeByteBuf.(CompositeByteBuf.java:110)
> at io.netty.buffer.Unpooled.wrappedBuffer(Unpooled.java:359)
> at 
> org.apache.spark.util.io.ChunkedByteBuffer.toNetty(ChunkedByteBuffer.scala:87)
> at 
> org.apache.spark.storage.ByteBufferBlockData.toNetty(BlockManager.scala:95)
> at 
> org.apache.spark.storage.BlockManagerManagedBuffer.convertToNetty(BlockManagerManagedBuffer.scala:52)
> at 
> org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:58)
> at 
> org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.

[jira] [Resolved] (SPARK-24307) Support sending messages over 2GB from memory

2018-07-19 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-24307.
-
Resolution: Fixed
  Assignee: Imran Rashid

> Support sending messages over 2GB from memory
> -
>
> Key: SPARK-24307
> URL: https://issues.apache.org/jira/browse/SPARK-24307
> Project: Spark
>  Issue Type: Sub-task
>  Components: Block Manager, Spark Core
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Assignee: Imran Rashid
>Priority: Major
> Fix For: 2.4.0
>
>
> Spark's networking layer supports sending messages backed by a {{FileRegion}} 
> or a {{ByteBuf}}. Sending large FileRegions works, as netty supports large 
> FileRegions. However, {{ByteBuf}} is limited to 2GB. This is particularly 
> a problem for sending large datasets that are already in memory, e.g. cached 
> RDD blocks.
> E.g., if you try to replicate a block stored in memory that is over 2 GB, you 
> will see an exception like:
> {noformat}
> 18/05/16 12:40:57 ERROR client.TransportClient: Failed to send RPC 
> 7420542363232096629 to xyz.com/172.31.113.213:44358: 
> io.netty.handler.codec.EncoderException: java.lang.IndexOutOfBoundsException: 
> readerIndex: 0, writerIndex: -1294617291 (expected: 0 <= readerIndex <= 
> writerIndex <= capacity(-1294617291))
> io.netty.handler.codec.EncoderException: java.lang.IndexOutOfBoundsException: 
> readerIndex: 0, writerIndex: -1294617291 (expected: 0 <= readerIndex <= 
> writerIndex <= capacity(-1294617291))
> at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:106)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
> at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
> at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
> at 
> io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:302)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
> at 
> io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:38)
> at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1081)
> at 
> io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1128)
> at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1070)
> at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> -1294617291 (expected: 0 <= readerIndex <= writerIndex <= 
> capacity(-1294617291))
> at io.netty.buffer.AbstractByteBuf.setIndex(AbstractByteBuf.java:129)
> at 
> io.netty.buffer.CompositeByteBuf.setIndex(CompositeByteBuf.java:1688)
> at io.netty.buffer.CompositeByteBuf.(CompositeByteBuf.java:110)
> at io.netty.buffer.Unpooled.wrappedBuffer(Unpooled.java:359)
> at 
> org.apache.spark.util.io.ChunkedByteBuffer.toNetty(ChunkedByteBuffer.scala:87)
> at 
> org.apache.spark.storage.ByteBufferBlockData.toNetty(BlockManager.scala:95)
> at 
> org.apache.spark.storage.BlockManagerManagedBuffer.convertToNetty(BlockManagerManagedBuffer.scala:52)
> at 
> org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:58)
> at 
> org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:33)
> at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMe

[jira] [Updated] (SPARK-24307) Support sending messages over 2GB from memory

2018-07-19 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24307:

Fix Version/s: 2.4.0

> Support sending messages over 2GB from memory
> -
>
> Key: SPARK-24307
> URL: https://issues.apache.org/jira/browse/SPARK-24307
> Project: Spark
>  Issue Type: Sub-task
>  Components: Block Manager, Spark Core
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Priority: Major
> Fix For: 2.4.0
>
>
> Spark's networking layer supports sending messages backed by a {{FileRegion}} 
> or a {{ByteBuf}}. Sending large FileRegions works, as netty supports large 
> FileRegions. However, {{ByteBuf}} is limited to 2GB. This is particularly 
> a problem for sending large datasets that are already in memory, e.g. cached 
> RDD blocks.
> E.g., if you try to replicate a block stored in memory that is over 2 GB, you 
> will see an exception like:
> {noformat}
> 18/05/16 12:40:57 ERROR client.TransportClient: Failed to send RPC 
> 7420542363232096629 to xyz.com/172.31.113.213:44358: 
> io.netty.handler.codec.EncoderException: java.lang.IndexOutOfBoundsException: 
> readerIndex: 0, writerIndex: -1294617291 (expected: 0 <= readerIndex <= 
> writerIndex <= capacity(-1294617291))
> io.netty.handler.codec.EncoderException: java.lang.IndexOutOfBoundsException: 
> readerIndex: 0, writerIndex: -1294617291 (expected: 0 <= readerIndex <= 
> writerIndex <= capacity(-1294617291))
> at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:106)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
> at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:816)
> at 
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:723)
> at 
> io.netty.handler.timeout.IdleStateHandler.write(IdleStateHandler.java:302)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:738)
> at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:730)
> at 
> io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:38)
> at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:1081)
> at 
> io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:1128)
> at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:1070)
> at 
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
> at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
> at 
> io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
> at 
> io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IndexOutOfBoundsException: readerIndex: 0, writerIndex: 
> -1294617291 (expected: 0 <= readerIndex <= writerIndex <= 
> capacity(-1294617291))
> at io.netty.buffer.AbstractByteBuf.setIndex(AbstractByteBuf.java:129)
> at 
> io.netty.buffer.CompositeByteBuf.setIndex(CompositeByteBuf.java:1688)
> at io.netty.buffer.CompositeByteBuf.(CompositeByteBuf.java:110)
> at io.netty.buffer.Unpooled.wrappedBuffer(Unpooled.java:359)
> at 
> org.apache.spark.util.io.ChunkedByteBuffer.toNetty(ChunkedByteBuffer.scala:87)
> at 
> org.apache.spark.storage.ByteBufferBlockData.toNetty(BlockManager.scala:95)
> at 
> org.apache.spark.storage.BlockManagerManagedBuffer.convertToNetty(BlockManagerManagedBuffer.scala:52)
> at 
> org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:58)
> at 
> org.apache.spark.network.protocol.MessageEncoder.encode(MessageEncoder.java:33)
> at 
> io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:88)
> ... 17 more
> {noformat}
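
The negative writerIndex above is plain 32-bit overflow: netty's {{ByteBuf}} indexes with a Java {{int}}, so any payload larger than 2 GB wraps negative. A tiny Scala check (not Spark code; the ~3 GB size is back-computed from the logged writerIndex) reproduces the number in the trace:

{code:scala}
// ByteBuf read/write indices are 32-bit, so sizes above Int.MaxValue wrap around.
val blockSize = 3000350005L   // ~3 GB, back-computed from the log: -1294617291 + 2^32
println(blockSize.toInt)      // -1294617291, the writerIndex in the exception
println(Int.MaxValue)         // 2147483647, the effective per-ByteBuf ceiling
{code}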

[jira] [Updated] (SPARK-24037) stateful operators

2018-07-19 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24037:

Fix Version/s: (was: 2.4.0)

> stateful operators
> --
>
> Key: SPARK-24037
> URL: https://issues.apache.org/jira/browse/SPARK-24037
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Priority: Major
>
> pointer to https://issues.apache.org/jira/browse/SPARK-24036






[jira] [Updated] (SPARK-24037) stateful operators

2018-07-19 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24037:

Fix Version/s: 2.4.0

> stateful operators
> --
>
> Key: SPARK-24037
> URL: https://issues.apache.org/jira/browse/SPARK-24037
> Project: Spark
>  Issue Type: Sub-task
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jose Torres
>Priority: Major
>
> pointer to https://issues.apache.org/jira/browse/SPARK-24036






Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-19 Thread Saisai Shao
Sure, I can wait for this and create another RC then.

Thanks,
Saisai

Xiao Li  于2018年7月20日周五 上午9:11写道:

> Yes. https://issues.apache.org/jira/browse/SPARK-24867 is the one I
> created. The PR has been created. Since this is not rare, let us merge it
> to 2.3.2?
>
> Reynold' PR is to get rid of AnalysisBarrier. That is better than multiple
> patches we added for AnalysisBarrier after 2.3.0 release. We can target it
> to 2.4.
>
> Thanks,
>
> Xiao
>
> 2018-07-19 17:48 GMT-07:00 Saisai Shao :
>
>> I see, thanks Reynold.
>>
>> Reynold Xin  于2018年7月20日周五 上午8:46写道:
>>
>>> Looking at the list of pull requests it looks like this is the ticket:
>>> https://issues.apache.org/jira/browse/SPARK-24867
>>>
>>>
>>>
>>> On Thu, Jul 19, 2018 at 5:25 PM Reynold Xin  wrote:
>>>
>>>> I don't think my ticket should block this release. It's a big general
>>>> refactoring.
>>>>
>>>> Xiao do you have a ticket for the bug you found?
>>>>
>>>>
>>>> On Thu, Jul 19, 2018 at 5:24 PM Saisai Shao 
>>>> wrote:
>>>>
>>>>> Hi Xiao,
>>>>>
>>>>> Are you referring to this JIRA (
>>>>> https://issues.apache.org/jira/browse/SPARK-24865)?
>>>>>
>>>>> Xiao Li  于2018年7月20日周五 上午2:41写道:
>>>>>
>>>>>> dfWithUDF.cache()
>>>>>> dfWithUDF.write.saveAsTable("t")
>>>>>> dfWithUDF.write.saveAsTable("t1")
>>>>>>
>>>>>>
>>>>>> Cached data is not being used. It causes a big performance
>>>>>> regression.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> 2018-07-19 11:32 GMT-07:00 Sean Owen :
>>>>>>
>>>>>>> What regression are you referring to here? A -1 vote really needs a
>>>>>>> rationale.
>>>>>>>
>>>>>>> On Thu, Jul 19, 2018 at 1:27 PM Xiao Li 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I would first vote -1.
>>>>>>>>
>>>>>>>> I might find another regression caused by the analysis barrier.
>>>>>>>> Will keep you posted.
>>>>>>>>
>>>>>>>>
>>>>>>
>
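
For anyone reproducing this, a minimal sketch (not from the thread) of how one might check
whether a cached DataFrame with a UDF is actually reused: look for an InMemoryTableScan in
the physical plan.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val plusOne = udf((x: Int) => x + 1)
val dfWithUDF = Seq(1, 2, 3).toDF("v").withColumn("v1", plusOne($"v"))

dfWithUDF.cache()
dfWithUDF.count()  // materialize the cache

// If the cache is picked up, the physical plan contains an InMemoryTableScan.
// The same check can be applied to the plan behind the saveAsTable calls above.
val plan = dfWithUDF.queryExecution.executedPlan.toString
println(s"uses cache: ${plan.contains("InMemoryTableScan")}")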


Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-19 Thread Saisai Shao
I see, thanks Reynold.

Reynold Xin  于2018年7月20日周五 上午8:46写道:

> Looking at the list of pull requests it looks like this is the ticket:
> https://issues.apache.org/jira/browse/SPARK-24867
>
>
>
> On Thu, Jul 19, 2018 at 5:25 PM Reynold Xin  wrote:
>
>> I don't think my ticket should block this release. It's a big general
>> refactoring.
>>
>> Xiao do you have a ticket for the bug you found?
>>
>>
>> On Thu, Jul 19, 2018 at 5:24 PM Saisai Shao 
>> wrote:
>>
>>> Hi Xiao,
>>>
>>> Are you referring to this JIRA (
>>> https://issues.apache.org/jira/browse/SPARK-24865)?
>>>
>>> Xiao Li  于2018年7月20日周五 上午2:41写道:
>>>
>>>> dfWithUDF.cache()
>>>> dfWithUDF.write.saveAsTable("t")
>>>> dfWithUDF.write.saveAsTable("t1")
>>>>
>>>>
>>>> Cached data is not being used. It causes a big performance regression.
>>>>
>>>>
>>>>
>>>>
>>>> 2018-07-19 11:32 GMT-07:00 Sean Owen :
>>>>
>>>>> What regression are you referring to here? A -1 vote really needs a
>>>>> rationale.
>>>>>
>>>>> On Thu, Jul 19, 2018 at 1:27 PM Xiao Li  wrote:
>>>>>
>>>>>> I would first vote -1.
>>>>>>
>>>>>> I might find another regression caused by the analysis barrier. Will
>>>>>> keep you posted.
>>>>>>
>>>>>>
>>>>


Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-19 Thread Saisai Shao
Hi Xiao,

Are you referring to this JIRA (
https://issues.apache.org/jira/browse/SPARK-24865)?

Xiao Li  于2018年7月20日周五 上午2:41写道:

> dfWithUDF.cache()
> dfWithUDF.write.saveAsTable("t")
> dfWithUDF.write.saveAsTable("t1")
>
>
> Cached data is not being used. It causes a big performance regression.
>
>
>
>
> 2018-07-19 11:32 GMT-07:00 Sean Owen :
>
>> What regression are you referring to here? A -1 vote really needs a
>> rationale.
>>
>> On Thu, Jul 19, 2018 at 1:27 PM Xiao Li  wrote:
>>
>>> I would first vote -1.
>>>
>>> I might find another regression caused by the analysis barrier. Will
>>> keep you posted.
>>>
>>>
>


[jira] [Commented] (SPARK-24615) Accelerator-aware task scheduling for Spark

2018-07-19 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16549276#comment-16549276
 ] 

Saisai Shao commented on SPARK-24615:
-

Thanks [~tgraves] for the suggestion. 
{quote}Once I get to the point I want to do the ML I want to ask for the gpu's 
as well as ask for more memory during that stage because I didn't need more 
before this stage for all the etl work.  I realize you already have executors, 
but ideally spark with the cluster manager could potentially release the 
existing ones and ask for new ones with those requirements.
{quote}
Yes, I already discussed this with my colleague offline, and it is a valid 
scenario, but I think achieving it requires changing the current dynamic 
resource allocation mechanism. Currently I mark this as a non-goal in this 
proposal and only focus on static resource requesting (--executor-cores, 
--executor-gpus). I think we should support it later.
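
For concreteness, a hedged sketch of what such a static request could look like; {{spark.executor.gpus}} / --executor-gpus are names proposed in this discussion, not existing Spark 2.x options.

{code:scala}
import org.apache.spark.SparkConf

// Static, application-level resource request as discussed above.
// NOTE: "spark.executor.gpus" is hypothetical, mirroring the proposed
// --executor-gpus option; it is not a real Spark 2.x configuration key.
val conf = new SparkConf()
  .set("spark.executor.cores", "4")  // existing option
  .set("spark.executor.gpus", "1")   // hypothetical, per the proposal
{code}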

> Accelerator-aware task scheduling for Spark
> ---
>
> Key: SPARK-24615
> URL: https://issues.apache.org/jira/browse/SPARK-24615
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Major
>  Labels: Hydrogen, SPIP
>
> In the machine learning area, accelerator cards (GPU, FPGA, TPU) are 
> predominant compared to CPUs. To make the current Spark architecture work 
> with accelerator cards, Spark itself should understand the existence of 
> accelerators and know how to schedule tasks onto the executors where 
> accelerators are equipped.
> Spark's current scheduler schedules tasks based on the locality of the data 
> plus the availability of CPUs. This introduces some problems when scheduling 
> tasks that require accelerators.
>  # CPU cores usually outnumber accelerators on a node, so using CPU cores 
> to schedule accelerator-required tasks introduces a mismatch.
>  # In one cluster, we always assume that CPUs are equipped in each node, but 
> the same is not true of accelerator cards.
>  # The existence of heterogeneous tasks (accelerator required or not) 
> requires the scheduler to schedule tasks in a smart way.
> So here we propose to improve the current scheduler to support heterogeneous 
> tasks (accelerator required or not). This can be part of the work of Project 
> Hydrogen.
> Details are attached in a Google doc. It doesn't cover all the implementation 
> details, just highlights the parts that should be changed.
>  
> CC [~yanboliang] [~merlintang]






[jira] [Commented] (SPARK-24615) Accelerator-aware task scheduling for Spark

2018-07-18 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16548662#comment-16548662
 ] 

Saisai Shao commented on SPARK-24615:
-

Hi [~tgraves], I'm rewriting the design doc based on the comments mentioned 
above, so I have temporarily made it inaccessible. Sorry about that; I will 
reopen it.

I think it is hard to control memory usage per stage/task, because tasks run 
inside an executor that shares a single JVM. For CPUs, yes, I think we can do 
it, but I'm not sure about the usage scenario for it.

For the requirement of using different types of machines, what I can think of 
is leveraging dynamic resource allocation. For example, if a user wants to run 
some MPI jobs with barrier mode enabled, Spark could allocate new executors 
with accelerator resources via the cluster manager (for example using node 
labels if it is running on YARN). But I will not target this as a goal in this 
design, since a) it is currently a non-goal for the barrier scheduler, and b) 
it makes the design too complex; it would be better to separate it into 
another piece of work.

> Accelerator-aware task scheduling for Spark
> ---
>
> Key: SPARK-24615
> URL: https://issues.apache.org/jira/browse/SPARK-24615
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Major
>  Labels: Hydrogen, SPIP
>
> In the machine learning area, accelerator cards (GPU, FPGA, TPU) are 
> predominant compared to CPUs. To make the current Spark architecture work 
> with accelerator cards, Spark itself should understand the existence of 
> accelerators and know how to schedule tasks onto the executors where 
> accelerators are equipped.
> Spark's current scheduler schedules tasks based on the locality of the data 
> plus the availability of CPUs. This introduces some problems when scheduling 
> tasks that require accelerators.
>  # CPU cores usually outnumber accelerators on a node, so using CPU cores 
> to schedule accelerator-required tasks introduces a mismatch.
>  # In one cluster, we always assume that CPUs are equipped in each node, but 
> the same is not true of accelerator cards.
>  # The existence of heterogeneous tasks (accelerator required or not) 
> requires the scheduler to schedule tasks in a smart way.
> So here we propose to improve the current scheduler to support heterogeneous 
> tasks (accelerator required or not). This can be part of the work of Project 
> Hydrogen.
> Details are attached in a Google doc. It doesn't cover all the implementation 
> details, just highlights the parts that should be changed.
>  
> CC [~yanboliang] [~merlintang]






[jira] [Commented] (SPARK-24615) Accelerator-aware task scheduling for Spark

2018-07-17 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16547263#comment-16547263
 ] 

Saisai Shao commented on SPARK-24615:
-

Sure, I will also add it, as Xiangrui raised the same concern.

> Accelerator-aware task scheduling for Spark
> ---
>
> Key: SPARK-24615
> URL: https://issues.apache.org/jira/browse/SPARK-24615
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Major
>  Labels: Hydrogen, SPIP
>
> In the machine learning area, accelerator cards (GPU, FPGA, TPU) are 
> predominant compared to CPUs. To make the current Spark architecture work 
> with accelerator cards, Spark itself should understand the existence of 
> accelerators and know how to schedule tasks onto the executors where 
> accelerators are equipped.
> Spark's current scheduler schedules tasks based on the locality of the data 
> plus the availability of CPUs. This introduces some problems when scheduling 
> tasks that require accelerators.
>  # CPU cores usually outnumber accelerators on a node, so using CPU cores 
> to schedule accelerator-required tasks introduces a mismatch.
>  # In one cluster, we always assume that CPUs are equipped in each node, but 
> the same is not true of accelerator cards.
>  # The existence of heterogeneous tasks (accelerator required or not) 
> requires the scheduler to schedule tasks in a smart way.
> So here we propose to improve the current scheduler to support heterogeneous 
> tasks (accelerator required or not). This can be part of the work of Project 
> Hydrogen.
> Details are attached in a Google doc. It doesn't cover all the implementation 
> details, just highlights the parts that should be changed.
>  
> CC [~yanboliang] [~merlintang]






[jira] [Commented] (SPARK-24615) Accelerator-aware task scheduling for Spark

2018-07-17 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16546590#comment-16546590
 ] 

Saisai Shao commented on SPARK-24615:
-

Yes, currently the user is responsible for asking YARN for resources like 
GPUs, for example via something like {{--num-gpus}}.

Yes, I agree with you; the concerns you mentioned above are valid. But currently 
the design in this JIRA only targets task-level scheduling with accelerator 
resources already available. It assumes that accelerator resources have already 
been obtained by the executors and reported back to the driver, and the driver 
will schedule tasks based on the available resources.

Having Spark communicate with the cluster manager to request other resources 
like GPUs is currently not covered in this design doc.

Xiangrui mentioned that Spark's communication with the cluster manager should 
also be covered in this SPIP, so I'm still drafting.

 

> Accelerator-aware task scheduling for Spark
> ---
>
> Key: SPARK-24615
> URL: https://issues.apache.org/jira/browse/SPARK-24615
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Major
>  Labels: Hydrogen, SPIP
>
> In the machine learning area, accelerator cards (GPU, FPGA, TPU) are 
> predominant compared to CPUs. To make the current Spark architecture work 
> with accelerator cards, Spark itself should understand the existence of 
> accelerators and know how to schedule tasks onto the executors where 
> accelerators are equipped.
> Spark's current scheduler schedules tasks based on the locality of the data 
> plus the availability of CPUs. This introduces some problems when scheduling 
> tasks that require accelerators.
>  # CPU cores usually outnumber accelerators on a node, so using CPU cores 
> to schedule accelerator-required tasks introduces a mismatch.
>  # In one cluster, we always assume that CPUs are equipped in each node, but 
> the same is not true of accelerator cards.
>  # The existence of heterogeneous tasks (accelerator required or not) 
> requires the scheduler to schedule tasks in a smart way.
> So here we propose to improve the current scheduler to support heterogeneous 
> tasks (accelerator required or not). This can be part of the work of Project 
> Hydrogen.
> Details are attached in a Google doc. It doesn't cover all the implementation 
> details, just highlights the parts that should be changed.
>  
> CC [~yanboliang] [~merlintang]






Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-16 Thread Saisai Shao
I will put my +1 on this RC.

For the test failure fix, I will include it if there's another RC.

Sean Owen  于2018年7月16日周一 下午10:47写道:

> OK, hm, will try to get to the bottom of it. But if others can build this
> module successfully, I give a +1 . The test failure is inevitable here and
> should not block release.
>
> On Sun, Jul 15, 2018 at 9:39 PM Saisai Shao 
> wrote:
>
>> Hi Sean,
>>
>> I just did a clean build with mvn/sbt on 2.3.2, I didn't meet the errors
>> you pasted here. I'm not sure how it happens.
>>
>> Sean Owen  于2018年7月16日周一 上午6:30写道:
>>
>>> Looks good to me, with the following caveats.
>>>
>>> First see the discussion on
>>> https://issues.apache.org/jira/browse/SPARK-24813 ; the
>>> flaky HiveExternalCatalogVersionsSuite will probably fail all the time
>>> right now. That's not a regression and is a test-only issue, so don't think
>>> it must block the release. However if this fix holds up, and we need
>>> another RC, worth pulling in for sure.
>>>
>>> Also is anyone seeing this while building and testing the Spark SQL +
>>> Kafka module? I see this error even after a clean rebuild. I sort of get
>>> what the error is saying but can't figure out why it would only happen at
>>> test/runtime. Haven't seen it before.
>>>
>>> [error] missing or invalid dependency detected while loading class file
>>> 'MetricsSystem.class'.
>>>
>>> [error] Could not access term eclipse in package org,
>>>
>>> [error] because it (or its dependencies) are missing. Check your build
>>> definition for
>>>
>>> [error] missing or conflicting dependencies. (Re-run with
>>> `-Ylog-classpath` to see the problematic classpath.)
>>>
>>> [error] A full rebuild may help if 'MetricsSystem.class' was compiled
>>> against an incompatible version of org.
>>>
>>> [error] missing or invalid dependency detected while loading class file
>>> 'MetricsSystem.class'.
>>>
>>> [error] Could not access term jetty in value org.eclipse,
>>>
>>> [error] because it (or its dependencies) are missing. Check your build
>>> definition for
>>>
>>> [error] missing or conflicting dependencies. (Re-run with
>>> `-Ylog-classpath` to see the problematic classpath.)
>>>
>>> [error] A full rebuild may help if 'MetricsSystem.class' was compiled
>>> against an incompatible version of org.eclipse
>>>
>>> On Sun, Jul 15, 2018 at 3:09 AM Saisai Shao 
>>> wrote:
>>>
>>>> Please vote on releasing the following candidate as Apache Spark
>>>> version 2.3.2.
>>>>
>>>> The vote is open until July 20 PST and passes if a majority +1 PMC
>>>> votes are cast, with a minimum of 3 +1 votes.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 2.3.2
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>
>>>> The tag to be voted on is v2.3.2-rc3
>>>> (commit b3726dadcf2997f20231873ec6e057dba433ae64):
>>>> https://github.com/apache/spark/tree/v2.3.2-rc3
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc3-bin/
>>>>
>>>> Signatures used for Spark RCs can be found in this file:
>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1278/
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc3-docs/
>>>>
>>>> The list of bug fixes going into 2.3.2 can be found at the following
>>>> URL:
>>>> https://issues.apache.org/jira/projects/SPARK/versions/12343289
>>>>
>>>> Note. RC2 was cancelled because of one blocking issue SPARK-24781
>>>> during release preparation.
>>>>
>>>> FAQ
>>>>
>>>> =
>>>> How can I help test this release?
>>>> =
>>>>
>>>> If you are a Spark user, you can help us test this release by taking
>>>

[jira] [Commented] (SPARK-24615) Accelerator-aware task scheduling for Spark

2018-07-16 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545962#comment-16545962
 ] 

Saisai Shao commented on SPARK-24615:
-

Sorry [~tgraves] for the late response. Yes, when requesting executors, the user 
should know whether accelerators are required or not. If no accelerators satisfy 
the request, the job will be pending or not launched.

> Accelerator-aware task scheduling for Spark
> ---
>
> Key: SPARK-24615
> URL: https://issues.apache.org/jira/browse/SPARK-24615
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Major
>  Labels: Hydrogen, SPIP
>
> In the machine learning area, accelerator cards (GPU, FPGA, TPU) are 
> predominant compared to CPUs. To make the current Spark architecture work 
> with accelerator cards, Spark itself should understand the existence of 
> accelerators and know how to schedule tasks onto the executors where 
> accelerators are equipped.
> Spark's current scheduler schedules tasks based on the locality of the data 
> plus the availability of CPUs. This introduces some problems when scheduling 
> tasks that require accelerators.
>  # CPU cores usually outnumber accelerators on a node, so using CPU cores 
> to schedule accelerator-required tasks introduces a mismatch.
>  # In one cluster, we always assume that CPUs are equipped in each node, but 
> the same is not true of accelerator cards.
>  # The existence of heterogeneous tasks (accelerator required or not) 
> requires the scheduler to schedule tasks in a smart way.
> So here we propose to improve the current scheduler to support heterogeneous 
> tasks (accelerator required or not). This can be part of the work of Project 
> Hydrogen.
> Details are attached in a Google doc. It doesn't cover all the implementation 
> details, just highlights the parts that should be changed.
>  
> CC [~yanboliang] [~merlintang]






Re: [VOTE] SPARK 2.3.2 (RC3)

2018-07-15 Thread Saisai Shao
Hi Sean,

I just did a clean build with mvn/sbt on 2.3.2, I didn't meet the errors
you pasted here. I'm not sure how it happens.

Sean Owen  于2018年7月16日周一 上午6:30写道:

> Looks good to me, with the following caveats.
>
> First see the discussion on
> https://issues.apache.org/jira/browse/SPARK-24813 ; the
> flaky HiveExternalCatalogVersionsSuite will probably fail all the time
> right now. That's not a regression and is a test-only issue, so don't think
> it must block the release. However if this fix holds up, and we need
> another RC, worth pulling in for sure.
>
> Also is anyone seeing this while building and testing the Spark SQL +
> Kafka module? I see this error even after a clean rebuild. I sort of get
> what the error is saying but can't figure out why it would only happen at
> test/runtime. Haven't seen it before.
>
> [error] missing or invalid dependency detected while loading class file
> 'MetricsSystem.class'.
>
> [error] Could not access term eclipse in package org,
>
> [error] because it (or its dependencies) are missing. Check your build
> definition for
>
> [error] missing or conflicting dependencies. (Re-run with
> `-Ylog-classpath` to see the problematic classpath.)
>
> [error] A full rebuild may help if 'MetricsSystem.class' was compiled
> against an incompatible version of org.
>
> [error] missing or invalid dependency detected while loading class file
> 'MetricsSystem.class'.
>
> [error] Could not access term jetty in value org.eclipse,
>
> [error] because it (or its dependencies) are missing. Check your build
> definition for
>
> [error] missing or conflicting dependencies. (Re-run with
> `-Ylog-classpath` to see the problematic classpath.)
>
> [error] A full rebuild may help if 'MetricsSystem.class' was compiled
> against an incompatible version of org.eclipse
>
> On Sun, Jul 15, 2018 at 3:09 AM Saisai Shao 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.3.2.
>>
>> The vote is open until July 20 PST and passes if a majority +1 PMC votes
>> are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 2.3.2
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.3.2-rc3
>> (commit b3726dadcf2997f20231873ec6e057dba433ae64):
>> https://github.com/apache/spark/tree/v2.3.2-rc3
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc3-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1278/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc3-docs/
>>
>> The list of bug fixes going into 2.3.2 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12343289
>>
>> Note. RC2 was cancelled because of one blocking issue SPARK-24781 during
>> release preparation.
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your projects resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with a out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 2.3.2?
>> ===
>>
>> The current list of open tickets targeted at 2.3.2 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 2.3.2
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>>
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>>


[VOTE] SPARK 2.3.2 (RC3)

2018-07-15 Thread Saisai Shao
Please vote on releasing the following candidate as Apache Spark version
2.3.2.

The vote is open until July 20 PST and passes if a majority +1 PMC votes
are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.3.2
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.3.2-rc3
(commit b3726dadcf2997f20231873ec6e057dba433ae64):
https://github.com/apache/spark/tree/v2.3.2-rc3

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc3-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1278/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc3-docs/

The list of bug fixes going into 2.3.2 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12343289

Note. RC2 was cancelled because of one blocking issue SPARK-24781 during
release preparation.

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks; in Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 2.3.2?
===

The current list of open tickets targeted at 2.3.2 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 2.3.2

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


[jira] [Comment Edited] (SPARK-24804) There are duplicate words in the title in the DatasetSuite

2018-07-14 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16544386#comment-16544386
 ] 

Saisai Shao edited comment on SPARK-24804 at 7/15/18 1:29 AM:
--

Please don't set the target version or fix version. Committers will help to set 
them when this issue is resolved.

I'm removing the target version of this JIRA to avoid blocking the release of 
2.3.2.


was (Author: jerryshao):
Please don't set target version or fix version. Committers will help to set 
when this issue is resolved.

> There are duplicate words in the title in the DatasetSuite
> --
>
> Key: SPARK-24804
> URL: https://issues.apache.org/jira/browse/SPARK-24804
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: hantiantian
>Priority: Trivial
>







[jira] [Commented] (SPARK-24804) There are duplicate words in the title in the DatasetSuite

2018-07-14 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16544386#comment-16544386
 ] 

Saisai Shao commented on SPARK-24804:
-

Please don't set the target version or fix version. Committers will help to set 
them when this issue is resolved.

> There are duplicate words in the title in the DatasetSuite
> --
>
> Key: SPARK-24804
> URL: https://issues.apache.org/jira/browse/SPARK-24804
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: hantiantian
>Priority: Trivial
>







[jira] [Updated] (SPARK-24804) There are duplicate words in the title in the DatasetSuite

2018-07-14 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24804:

Priority: Trivial  (was: Minor)

> There are duplicate words in the title in the DatasetSuite
> --
>
> Key: SPARK-24804
> URL: https://issues.apache.org/jira/browse/SPARK-24804
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: hantiantian
>Priority: Trivial
>







[jira] [Updated] (SPARK-24804) There are duplicate words in the title in the DatasetSuite

2018-07-14 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24804:

Target Version/s:   (was: 2.3.2)

> There are duplicate words in the title in the DatasetSuite
> --
>
> Key: SPARK-24804
> URL: https://issues.apache.org/jira/browse/SPARK-24804
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: hantiantian
>Priority: Minor
>







[jira] [Updated] (SPARK-24804) There are duplicate words in the title in the DatasetSuite

2018-07-14 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24804:

Fix Version/s: (was: 2.3.2)

> There are duplicate words in the title in the DatasetSuite
> --
>
> Key: SPARK-24804
> URL: https://issues.apache.org/jira/browse/SPARK-24804
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: hantiantian
>Priority: Minor
>







Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-11 Thread Saisai Shao
Hi Sean,

The doc for RC1 is not usable because of a Sphinx issue. It should be rebuilt
with Python 3 to avoid the issue. There's also one more blocking issue in
SQL, so I will wait for that before cutting a new RC.

Sean Owen  于2018年7月12日周四 上午9:05写道:

> I guess my question is just whether the Python docs are usable or not in
> this RC. They looked reasonable to me but I don't know enough to know what
> the issue was. If the result is usable, then there's no problem here, even
> if something could be fixed/improved later.
>
> On Sun, Jul 8, 2018 at 7:25 PM Saisai Shao  wrote:
>
>> Hi Sean,
>>
>> SPARK-24530 is not included in this RC1 release. Actually, I'm not so familiar
>> with this issue, so I'm still using python2 to generate the docs.
>>
>> In the JIRA it mentioned that python3 with sphinx could workaround this
>> issue. @Hyukjin Kwon  would you please help to
>> clarify?
>>
>> Thanks
>> Saisai
>>
>>
>> Xiao Li  于2018年7月9日周一 上午1:59写道:
>>
>>> Three business days might be too short. Let us open the vote until the
>>> end of this Friday (July 13th)?
>>>
>>> Cheers,
>>>
>>> Xiao
>>>
>>> 2018-07-08 10:15 GMT-07:00 Sean Owen :
>>>
>>>> Just checking that the doc issue in
>>>> https://issues.apache.org/jira/browse/SPARK-24530 is worked around in
>>>> this release?
>>>>
>>>> This was pointed out as an example of a broken doc:
>>>>
>>>> https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression
>>>>
>>>> Here it is in 2.3.2 RC1:
>>>>
>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/_site/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression
>>>>
>>>> It wasn't immediately obvious to me whether this addressed the issue
>>>> that was identified or not.
>>>>
>>>>
>>>> Otherwise nothing is open for 2.3.2, sigs and license look good, tests
>>>> pass as last time, etc.
>>>>
>>>> +1
>>>>
>>>> On Sun, Jul 8, 2018 at 3:30 AM Saisai Shao 
>>>> wrote:
>>>>
>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>> version 2.3.2.
>>>>>
>>>>> The vote is open until July 11th PST and passes if a majority +1 PMC
>>>>> votes are cast, with a minimum of 3 +1 votes.
>>>>>
>>>>> [ ] +1 Release this package as Apache Spark 2.3.2
>>>>> [ ] -1 Do not release this package because ...
>>>>>
>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>
>>>>> The tag to be voted on is v2.3.2-rc1
>>>>> (commit 4df06b45160241dbb331153efbb25703f913c192):
>>>>> https://github.com/apache/spark/tree/v2.3.2-rc1
>>>>>
>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-bin/
>>>>>
>>>>> Signatures used for Spark RCs can be found in this file:
>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>
>>>>> The staging repository for this release can be found at:
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1277/
>>>>>
>>>>> The documentation corresponding to this release can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/
>>>>>
>>>>> The list of bug fixes going into 2.3.2 can be found at the following
>>>>> URL:
>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12343289
>>>>>
>>>>> PS. This is my first time to do release, please help to check if
>>>>> everything is landing correctly. Thanks ^-^
>>>>>
>>>>> FAQ
>>>>>
>>>>> =
>>>>> How can I help test this release?
>>>>> =
>>>>>
>>>>> If you are a Spark user, you can help us test this release by taking
>>>>> an existing Spark workload and running on this release candidate, then
>>>>> reporting any regressions.
>>>>>
>>>>> If you're working in PySpark you can set up a virtual env and install
>>>>> the curren

Re: Some questions about cached data in Livy

2018-07-11 Thread Saisai Shao
Hi Wandong,

Livy's shared object mechanism is mainly used to share objects between
different Livy jobs submitted through the Job API. For example, if job A
creates an object Foo that needs to be accessed by job B, the user can
store Foo into the JobContext under a provided name; after that, job B can
get the object back by that name. A sketch of this pattern is shown below.

This is different from Spark's cache mechanism. What you mentioned above
(the temp table) is a Spark-provided table cache mechanism, which is unrelated
to Livy.
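
For reference, a minimal sketch of that pattern through the programmatic Job
API is below. It assumes the shared-object accessors on JobContext
(setSharedObject / getSharedObject) available in recent Livy releases; the
server URL, the object name, and the exact signatures are illustrative
assumptions, not authoritative.

{code:scala}
import java.net.URI
import org.apache.livy.{Job, JobContext, LivyClientBuilder}

object LivySharedObjectSketch {
  def main(args: Array[String]): Unit = {
    // The Livy URL is illustrative; in a real deployment the jar containing
    // these Job classes must also be shipped (client.uploadJar / addJar).
    val client = new LivyClientBuilder()
      .setURI(new URI("http://livy-server:8998"))
      .build()
    try {
      // Job A: compute a value and publish it under a well-known name.
      client.submit(new Job[java.lang.Long] {
        override def call(jc: JobContext): java.lang.Long = {
          val count =
            jc.sc().parallelize(java.util.Arrays.asList[Integer](1, 2, 3)).count()
          val boxed = java.lang.Long.valueOf(count)
          jc.setSharedObject("rowCount", boxed)
          boxed
        }
      }).get()

      // Job B: fetch the shared object back by the same name.
      val shared = client.submit(new Job[java.lang.Long] {
        override def call(jc: JobContext): java.lang.Long =
          jc.getSharedObject[java.lang.Long]("rowCount")
      }).get()

      println(s"Shared row count: $shared")
    } finally {
      client.stop(true)
    }
  }
}
{code}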



Wandong Wu  于2018年7月11日周三 下午5:46写道:

> Dear Sir or Madam:
>
>   I am a Livy beginner. I use Livy, because within an interactive
> session, different spark jobs could share cached RDDs or DataFrames.
>
>   When I read some parquet files and create a table called “TmpTable”,
> the following queries will use this table. Does that mean the table has been
> cached?
>   If so, where is the table cached: in Livy or in the Spark cluster?
>
>   Spark also supports a cache function. When I read some parquet files
> and create a table called “TmpTable2”, I add this code: sql_ctx.cacheTable(
> *'tmpTable2'*).
>   In the next query using this table, it will be cached in the Spark
> cluster, and the following queries can use the cached table.
>
>   What is the difference between caching in Livy and caching in the Spark
> cluster?
>
> Thanks!
>
> Yours
> Wandong
>
>


[jira] [Commented] (SPARK-24781) Using a reference from Dataset in Filter/Sort might not work.

2018-07-10 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539635#comment-16539635
 ] 

Saisai Shao commented on SPARK-24781:
-

I see. I will wait for this before cutting a new 2.3.2 RC.

> Using a reference from Dataset in Filter/Sort might not work.
> -
>
> Key: SPARK-24781
> URL: https://issues.apache.org/jira/browse/SPARK-24781
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Takuya Ueshin
>Priority: Blocker
>
> When we use a reference from {{Dataset}} in {{filter}} or {{sort}}, which was 
> not used in the prior {{select}}, an {{AnalysisException}} occurs, e.g.,
> {code:scala}
> val df = Seq(("test1", 0), ("test2", 1)).toDF("name", "id")
> df.select(df("name")).filter(df("id") === 0).show()
> {code}
> {noformat}
> org.apache.spark.sql.AnalysisException: Resolved attribute(s) id#6 missing 
> from name#5 in operator !Filter (id#6 = 0).;;
> !Filter (id#6 = 0)
>+- AnalysisBarrier
>   +- Project [name#5]
>  +- Project [_1#2 AS name#5, _2#3 AS id#6]
> +- LocalRelation [_1#2, _2#3]
> {noformat}
> If we use {{col}} instead, it works:
> {code:scala}
> val df = Seq(("test1", 0), ("test2", 1)).toDF("name", "id")
> df.select(col("name")).filter(col("id") === 0).show()
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24781) Using a reference from Dataset in Filter/Sort might not work.

2018-07-10 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16539630#comment-16539630
 ] 

Saisai Shao commented on SPARK-24781:
-

Thanks Felix. Does this have to be in 2.3.2? [~ueshin]

> Using a reference from Dataset in Filter/Sort might not work.
> -
>
> Key: SPARK-24781
> URL: https://issues.apache.org/jira/browse/SPARK-24781
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Takuya Ueshin
>Priority: Blocker
>
> When we use a reference from {{Dataset}} in {{filter}} or {{sort}}, which was 
> not used in the prior {{select}}, an {{AnalysisException}} occurs, e.g.,
> {code:scala}
> val df = Seq(("test1", 0), ("test2", 1)).toDF("name", "id")
> df.select(df("name")).filter(df("id") === 0).show()
> {code}
> {noformat}
> org.apache.spark.sql.AnalysisException: Resolved attribute(s) id#6 missing 
> from name#5 in operator !Filter (id#6 = 0).;;
> !Filter (id#6 = 0)
>+- AnalysisBarrier
>   +- Project [name#5]
>  +- Project [_1#2 AS name#5, _2#3 AS id#6]
> +- LocalRelation [_1#2, _2#3]
> {noformat}
> If we use {{col}} instead, it works:
> {code:scala}
> val df = Seq(("test1", 0), ("test2", 1)).toDF("name", "id")
> df.select(col("name")).filter(col("id") === 0).show()
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-10 Thread Saisai Shao
The fix for https://issues.apache.org/jira/browse/SPARK-24530 has just been merged;
I will cancel this vote and prepare a new RC2 cut with the docs fixed.

Thanks
Saisai

Wenchen Fan  于2018年7月11日周三 下午12:25写道:

> +1
>
> On Wed, Jul 11, 2018 at 1:31 AM John Zhuge  wrote:
>
>> +1
>>
>> On Sun, Jul 8, 2018 at 1:30 AM Saisai Shao 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.3.2.
>>>
>>> The vote is open until July 11th PST and passes if a majority +1 PMC
>>> votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.3.2
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.3.2-rc1
>>> (commit 4df06b45160241dbb331153efbb25703f913c192):
>>> https://github.com/apache/spark/tree/v2.3.2-rc1
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1277/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/
>>>
>>> The list of bug fixes going into 2.3.2 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12343289
>>>
>>> PS. This is my first time to do release, please help to check if
>>> everything is landing correctly. Thanks ^-^
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks, in the Java/Scala
>>> you can add the staging repository to your projects resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with a out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 2.3.2?
>>> ===
>>>
>>> The current list of open tickets targeted at 2.3.2 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 2.3.2
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>
>>
>> --
>> John
>>
>


[jira] [Assigned] (SPARK-24678) We should use 'PROCESS_LOCAL' first for Spark-Streaming

2018-07-10 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-24678:
---

Assignee: sharkd tu

> We should use 'PROCESS_LOCAL' first for Spark-Streaming
> ---
>
> Key: SPARK-24678
> URL: https://issues.apache.org/jira/browse/SPARK-24678
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Affects Versions: 2.3.1
>Reporter: sharkd tu
>Assignee: sharkd tu
>Priority: Major
> Fix For: 2.4.0
>
>
> Currently, `BlockRDD.getPreferredLocations` only returns host info for blocks, 
> so the resulting schedule level is never better than 'NODE_LOCAL'. 
> With a small change, the schedule level can be improved to 
> 'PROCESS_LOCAL'.
>  
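
For reference, a rough sketch of the idea (not the actual patch): Spark's task
locality strings can name either a bare host, which caps locality at
NODE_LOCAL, or a specific executor via the "executor_<host>_<executorId>" form
parsed into ExecutorCacheTaskLocation, which allows PROCESS_LOCAL when the
block still lives in that executor. The helper name below is made up for
illustration.

{code:scala}
// Illustrative sketch only, not the actual change: build preferred-location
// strings that carry the executor identity of a block, so the scheduler can
// place the task PROCESS_LOCAL instead of merely NODE_LOCAL.
def executorPreferredLocation(host: String, executorId: String): Seq[String] = {
  // "executor_<host>_<executorId>" is the string form Spark parses into
  // ExecutorCacheTaskLocation; a bare host name only allows NODE_LOCAL.
  Seq(s"executor_${host}_${executorId}")
}

// e.g. a block held by executor "3" on host "worker-1":
// executorPreferredLocation("worker-1", "3") == Seq("executor_worker-1_3")
{code}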



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24678) We should use 'PROCESS_LOCAL' first for Spark-Streaming

2018-07-10 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-24678.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21658
[https://github.com/apache/spark/pull/21658]

> We should use 'PROCESS_LOCAL' first for Spark-Streaming
> ---
>
> Key: SPARK-24678
> URL: https://issues.apache.org/jira/browse/SPARK-24678
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager
>Affects Versions: 2.3.1
>Reporter: sharkd tu
>Priority: Major
> Fix For: 2.4.0
>
>
> Currently, `BlockRDD.getPreferredLocations` only returns host info for blocks, 
> so the resulting schedule level is never better than 'NODE_LOCAL'. 
> With a small change, the schedule level can be improved to 
> 'PROCESS_LOCAL'.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24646) Support wildcard '*' for spark.yarn.dist.forceDownloadSchemes

2018-07-08 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-24646.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 21633
[https://github.com/apache/spark/pull/21633]

> Support wildcard '*' for spark.yarn.dist.forceDownloadSchemes
> 
>
> Key: SPARK-24646
> URL: https://issues.apache.org/jira/browse/SPARK-24646
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
> Fix For: 2.4.0
>
>
> In the case of getting tokens via a customized {{ServiceCredentialProvider}}, 
> it is required that the {{ServiceCredentialProvider}} be available on the local 
> spark-submit process classpath. In this case, all the configured remote 
> sources should be forced to be downloaded locally.
> To make this configuration easier to use, this proposes adding wildcard '*' 
> support to {{spark.yarn.dist.forceDownloadSchemes}} and also clarifies the usage 
> of this configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24646) Support wildcard '*' for spark.yarn.dist.forceDownloadSchemes

2018-07-08 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-24646:
---

Assignee: Saisai Shao

> Support wildcard '*' for spark.yarn.dist.forceDownloadSchemes
> 
>
> Key: SPARK-24646
> URL: https://issues.apache.org/jira/browse/SPARK-24646
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
> Fix For: 2.4.0
>
>
> In the case of getting tokens via a customized {{ServiceCredentialProvider}}, 
> it is required that the {{ServiceCredentialProvider}} be available on the local 
> spark-submit process classpath. In this case, all the configured remote 
> sources should be forced to be downloaded locally.
> To make this configuration easier to use, this proposes adding wildcard '*' 
> support to {{spark.yarn.dist.forceDownloadSchemes}} and also clarifies the usage 
> of this configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Podling report this month

2018-07-08 Thread Saisai Shao
Updated. Please review and sign off.

Thanks

Saisai Shao  于2018年7月9日周一 上午8:27写道:

> Sorry I forgot about this, will draft this today.
>
> Thanks
> Saisai
>
> Jean-Baptiste Onofré  于2018年7月8日周日 下午1:17写道:
>
>> Hi guys,
>>
>> we have to report this month:
>>
>> https://wiki.apache.org/incubator/July2018
>>
>> Did someone already start a report ? I can bootstrap one if you want.
>>
>> Thanks,
>> Regards
>> JB
>> --
>> Jean-Baptiste Onofré
>> jbono...@apache.org
>> http://blog.nanthrax.net
>> Talend - http://www.talend.com
>>
>


Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Saisai Shao
Thanks @Hyukjin Kwon  . Yes, I'm using python2 to build the
docs; it looks like Python 2 with Sphinx has issues.

What is still pending for this PR (
https://github.com/apache/spark/pull/21659)? I'm planning to cut RC2 once
it is merged; do you have an ETA for this PR?

Hyukjin Kwon  于2018年7月9日周一 上午9:06写道:

> Seems Python 2's Sphinx was used -
> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/_site/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression
> and SPARK-24530 issue exists in the RC. it's kind of tricky to manually
> verify if Python 3 is used given my few tries in my local.
>
> I think the fix against SPARK-24530 is technically not merged yet;
> however, I don't think this blocks the release like the previous release. I
> think we could proceed in parallel.
> Will probably make a progress on
> https://github.com/apache/spark/pull/21659, and fix the release doc too.
>
>
> 2018년 7월 9일 (월) 오전 8:25, Saisai Shao 님이 작성:
>
>> Hi Sean,
>>
>> SPARK-24530 is not included in this RC1 release. Actually I'm not so familiar
>> with this issue, so I'm still using python2 to generate docs.
>>
>> In the JIRA it mentioned that python3 with sphinx could workaround this
>> issue. @Hyukjin Kwon  would you please help to
>> clarify?
>>
>> Thanks
>> Saisai
>>
>>
>> Xiao Li  于2018年7月9日周一 上午1:59写道:
>>
>>> Three business days might be too short. Let us open the vote until the
>>> end of this Friday (July 13th)?
>>>
>>> Cheers,
>>>
>>> Xiao
>>>
>>> 2018-07-08 10:15 GMT-07:00 Sean Owen :
>>>
>>>> Just checking that the doc issue in
>>>> https://issues.apache.org/jira/browse/SPARK-24530 is worked around in
>>>> this release?
>>>>
>>>> This was pointed out as an example of a broken doc:
>>>>
>>>> https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression
>>>>
>>>> Here it is in 2.3.2 RC1:
>>>>
>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/_site/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression
>>>>
>>>> It wasn't immediately obvious to me whether this addressed the issue
>>>> that was identified or not.
>>>>
>>>>
>>>> Otherwise nothing is open for 2.3.2, sigs and license look good, tests
>>>> pass as last time, etc.
>>>>
>>>> +1
>>>>
>>>> On Sun, Jul 8, 2018 at 3:30 AM Saisai Shao 
>>>> wrote:
>>>>
>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>> version 2.3.2.
>>>>>
>>>>> The vote is open until July 11th PST and passes if a majority +1 PMC
>>>>> votes are cast, with a minimum of 3 +1 votes.
>>>>>
>>>>> [ ] +1 Release this package as Apache Spark 2.3.2
>>>>> [ ] -1 Do not release this package because ...
>>>>>
>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>
>>>>> The tag to be voted on is v2.3.2-rc1
>>>>> (commit 4df06b45160241dbb331153efbb25703f913c192):
>>>>> https://github.com/apache/spark/tree/v2.3.2-rc1
>>>>>
>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-bin/
>>>>>
>>>>> Signatures used for Spark RCs can be found in this file:
>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>
>>>>> The staging repository for this release can be found at:
>>>>> https://repository.apache.org/content/repositories/orgapachespark-1277/
>>>>>
>>>>> The documentation corresponding to this release can be found at:
>>>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/
>>>>>
>>>>> The list of bug fixes going into 2.3.2 can be found at the following
>>>>> URL:
>>>>> https://issues.apache.org/jira/projects/SPARK/versions/12343289
>>>>>
>>>>> PS. This is my first time to do release, please help to check if
>>>>> everything is landing correctly. Thanks ^-^
>>>>>
>>>>> FAQ
>>>>>
>>>>> =
>>>>> How can I help test this release?
>>>>> ==

Re: Podling report this month

2018-07-08 Thread Saisai Shao
Sorry I forgot about this, will draft this today.

Thanks
Saisai

Jean-Baptiste Onofré  于2018年7月8日周日 下午1:17写道:

> Hi guys,
>
> we have to report this month:
>
> https://wiki.apache.org/incubator/July2018
>
> Did someone already start a report ? I can bootstrap one if you want.
>
> Thanks,
> Regards
> JB
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Re: [VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Saisai Shao
Hi Sean,

SPARK-24530 is not included in this RC1 release. Actually I'm not so familiar
with this issue, so I'm still using python2 to generate docs.

In the JIRA it mentioned that python3 with sphinx could workaround this
issue. @Hyukjin Kwon  would you please help to clarify?

Thanks
Saisai


Xiao Li  于2018年7月9日周一 上午1:59写道:

> Three business days might be too short. Let us open the vote until the end
> of this Friday (July 13th)?
>
> Cheers,
>
> Xiao
>
> 2018-07-08 10:15 GMT-07:00 Sean Owen :
>
>> Just checking that the doc issue in
>> https://issues.apache.org/jira/browse/SPARK-24530 is worked around in
>> this release?
>>
>> This was pointed out as an example of a broken doc:
>>
>> https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression
>>
>> Here it is in 2.3.2 RC1:
>>
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/_site/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression
>>
>> It wasn't immediately obvious to me whether this addressed the issue that
>> was identified or not.
>>
>>
>> Otherwise nothing is open for 2.3.2, sigs and license look good, tests
>> pass as last time, etc.
>>
>> +1
>>
>> On Sun, Jul 8, 2018 at 3:30 AM Saisai Shao 
>> wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.3.2.
>>>
>>> The vote is open until July 11th PST and passes if a majority +1 PMC
>>> votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.3.2
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v2.3.2-rc1
>>> (commit 4df06b45160241dbb331153efbb25703f913c192):
>>> https://github.com/apache/spark/tree/v2.3.2-rc1
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1277/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/
>>>
>>> The list of bug fixes going into 2.3.2 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12343289
>>>
>>> PS. This is my first time to do release, please help to check if
>>> everything is landing correctly. Thanks ^-^
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install
>>> the current RC and see if anything important breaks, in the Java/Scala
>>> you can add the staging repository to your projects resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with a out of date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 2.3.2?
>>> ===
>>>
>>> The current list of open tickets targeted at 2.3.2 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>> Version/s" = 2.3.2
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>>
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>
>


[VOTE] SPARK 2.3.2 (RC1)

2018-07-08 Thread Saisai Shao
Please vote on releasing the following candidate as Apache Spark version
2.3.2.

The vote is open until July 11th PST and passes if a majority +1 PMC votes
are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.3.2
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.3.2-rc1
(commit 4df06b45160241dbb331153efbb25703f913c192):
https://github.com/apache/spark/tree/v2.3.2-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1277/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.2-rc1-docs/

The list of bug fixes going into 2.3.2 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12343289

PS. This is my first time doing a release; please help check that everything
has landed correctly. Thanks ^-^

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env, install
the current RC, and see if anything important breaks; in Java/Scala
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
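
For example, a minimal build.sbt sketch for testing a project against the
staging repository above; the artifact version and Scala version shown are the
usual ones for a 2.3.x RC and are assumptions for illustration:

{code:scala}
// build.sbt -- resolve Spark artifacts from the 2.3.2 RC1 staging repository.
scalaVersion := "2.11.12"

resolvers += "Spark 2.3.2 RC1 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1277/"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "2.3.2" % "provided"
)
{code}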

===
What should happen to JIRA tickets still targeting 2.3.2?
===

The current list of open tickets targeted at 2.3.2 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 2.3.2

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


[jira] [Comment Edited] (SPARK-24723) Discuss necessary info and access in barrier mode + YARN

2018-07-06 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534565#comment-16534565
 ] 

Saisai Shao edited comment on SPARK-24723 at 7/6/18 8:25 AM:
-

[~mengxr] [~jiangxb1987]

There's one solution to handle the password-less SSH problem for all cluster 
managers in a programmatic way. It is borrowed from the MPI on YARN framework 
[https://github.com/alibaba/mpich2-yarn]

In that framework, before launching the MPI job, the application master 
(master) generates an SSH key pair and propagates the public key to all the 
containers (workers). During container start, each container writes the public 
key to its local authorized_keys file, so the MPI job started from the master 
node can SSH to all the containers without a password. After the MPI job is 
finished, all the containers delete this public key from the authorized_keys 
file to restore the environment.

In our case, we could do this in a similar way: before launching the MPI job, 
the 0-th task could also generate an SSH key pair and then propagate the 
public key to all the barrier tasks (maybe through BarrierTaskContext). The 
other tasks could receive the public key from the 0-th task and write it to 
their authorized_keys file (maybe via BarrierTaskContext). After this, 
password-less SSH is set up and mpirun can be started from the 0-th task 
without a password. After the MPI job is finished, all the barrier tasks could 
delete this public key from the authorized_keys file to restore the 
environment.

The example code is like below:
{code:java}
 rdd.barrier().mapPartitions { (iter, context) => 
  // Write iter to disk. ??? 
  // Wait until all tasks finished writing. 
  context.barrier() 
  // The 0-th task launches an MPI job. 
  if (context.partitionId() == 0) {
    // generate and propagate ssh keys.
    // Wait for keys to set up in other tasks.
    val hosts = context.getTaskInfos().map(_.host)
    // Set up MPI machine file using host infos. ???
    // Launch the MPI job by calling mpirun. ???
  } else {
    // get and setup public key
    // notify 0-th task that the public key is set up.
  }

  // Wait until the MPI job finished. 
  context.barrier()

  // Delete SSH key and revert the environment. 
  // Collect output and return. ??? 
 }
{code}
 

What is your opinion about this solution?


was (Author: jerryshao):
[~mengxr] [~jiangxb1987]

There's one solution to handle the password-less SSH problem for all cluster 
managers in a programmatic way. It is borrowed from the MPI on YARN framework 
[https://github.com/alibaba/mpich2-yarn]

In that framework, before launching the MPI job, the application master 
(master) generates an SSH key pair and propagates the public key to all the 
containers (workers). During container start, each container writes the public 
key to its local authorized_keys file, so the MPI job started from the master 
node can SSH to all the containers without a password. After the MPI job is 
finished, all the containers delete this public key from the authorized_keys 
file to restore the environment.

In our case, we could do this in a similar way: before launching the MPI job, 
the 0-th task could also generate an SSH key pair and then propagate the 
public key to all the barrier tasks (maybe through BarrierTaskContext). The 
other tasks could receive the public key from the 0-th task and write it to 
their authorized_keys file (maybe via BarrierTaskContext). After this, 
password-less SSH is set up and mpirun can be started from the 0-th task 
without a password. After the MPI job is finished, all the barrier tasks could 
delete this public key from the authorized_keys file to restore the 
environment.

The example code is like below:

 
rdd.barrier().mapPartitions { (iter, context) =>
  // Write iter to disk.???
  // Wait until all tasks finished writing.
  context.barrier()
  // The 0-th task launches an MPI job.
  if (context.partitionId() == 0) {
// generate and propagate ssh keys.
// Wait for keys to set up in other tasks.

val hosts = context.getTaskInfos().map(_.host)  
// Set up MPI machine file using host infos.  ???
// Launch the MPI job by calling mpirun.  ???
  } else {
// get and setup public key
// notify 0-th task that the public key is set up.
  }  
  // Wait until the MPI job finished.
  context.barrier()

  // Delete SSH key and revert the environment.  
  // Collect output and return.???  
}
 

What is your opinion about this solution?

> Discuss necessary info and access in barrier mode + YARN
> 
>
> Key: SPARK-24723
> URL: https://issues.apache.org/jira/browse/SPARK-24723
> Project: Spark
>  Issue Type: St

[jira] [Commented] (SPARK-24723) Discuss necessary info and access in barrier mode + YARN

2018-07-06 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534565#comment-16534565
 ] 

Saisai Shao commented on SPARK-24723:
-

[~mengxr] [~jiangxb1987]

There's one solution to handle the password-less SSH problem for all cluster 
managers in a programmatic way. It is borrowed from the MPI on YARN framework 
[https://github.com/alibaba/mpich2-yarn]

In that framework, before launching the MPI job, the application master 
(master) generates an SSH key pair and propagates the public key to all the 
containers (workers). During container start, each container writes the public 
key to its local authorized_keys file, so the MPI job started from the master 
node can SSH to all the containers without a password. After the MPI job is 
finished, all the containers delete this public key from the authorized_keys 
file to restore the environment.

In our case, we could do this in a similar way: before launching the MPI job, 
the 0-th task could also generate an SSH key pair and then propagate the 
public key to all the barrier tasks (maybe through BarrierTaskContext). The 
other tasks could receive the public key from the 0-th task and write it to 
their authorized_keys file (maybe via BarrierTaskContext). After this, 
password-less SSH is set up and mpirun can be started from the 0-th task 
without a password. After the MPI job is finished, all the barrier tasks could 
delete this public key from the authorized_keys file to restore the 
environment.

The example code is like below:

 
rdd.barrier().mapPartitions { (iter, context) =>
  // Write iter to disk.???
  // Wait until all tasks finished writing.
  context.barrier()
  // The 0-th task launches an MPI job.
  if (context.partitionId() == 0) {
// generate and propagate ssh keys.
// Wait for keys to set up in other tasks.

val hosts = context.getTaskInfos().map(_.host)  
// Set up MPI machine file using host infos.  ???
// Launch the MPI job by calling mpirun.  ???
  } else {
// get and setup public key
// notify 0-th task that the public key is set up.
  }  
  // Wait until the MPI job finished.
  context.barrier()

  // Delete SSH key and revert the environment.  
  // Collect output and return.???  
}
 

What is your opinion about this solution?

> Discuss necessary info and access in barrier mode + YARN
> 
>
> Key: SPARK-24723
> URL: https://issues.apache.org/jira/browse/SPARK-24723
> Project: Spark
>  Issue Type: Story
>  Components: ML, Spark Core
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Priority: Major
>
> In barrier mode, to run hybrid distributed DL training jobs, we need to 
> provide users sufficient info and access so they can set up a hybrid 
> distributed training job, e.g., using MPI.
> This ticket limits the scope of discussion to Spark + YARN. There were some 
> past attempts from the Hadoop community. So we should find someone with good 
> knowledge to lead the discussion here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24723) Discuss necessary info and access in barrier mode + YARN

2018-07-06 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534529#comment-16534529
 ] 

Saisai Shao commented on SPARK-24723:
-

After discussing with Xiangrui offline, resource reservation is not the key 
focus here. The main problem is how to provide the necessary information for 
barrier tasks to start an MPI job in a password-less manner.

> Discuss necessary info and access in barrier mode + YARN
> 
>
> Key: SPARK-24723
> URL: https://issues.apache.org/jira/browse/SPARK-24723
> Project: Spark
>  Issue Type: Story
>  Components: ML, Spark Core
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Priority: Major
>
> In barrier mode, to run hybrid distributed DL training jobs, we need to 
> provide users sufficient info and access so they can set up a hybrid 
> distributed training job, e.g., using MPI.
> This ticket limits the scope of discussion to Spark + YARN. There were some 
> past attempts from the Hadoop community. So we should find someone with good 
> knowledge to lead the discussion here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (LIVY-480) Apache livy and Ajax

2018-07-04 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533241#comment-16533241
 ] 

Saisai Shao commented on LIVY-480:
--

Please ask questions on the mailing list, to see if someone can answer 
them.

> Apache livy and Ajax
> 
>
> Key: LIVY-480
> URL: https://issues.apache.org/jira/browse/LIVY-480
> Project: Livy
>  Issue Type: Request
>  Components: API, Interpreter, REPL, Server
>Affects Versions: 0.5.0
>Reporter: Melchicédec NDUWAYO
>Priority: Major
>
> Good morning everyone,
> can someone help me with how to send a POST request via Ajax to the Apache 
> Livy server? I tried my best but I don't see how to do that. I have a 
> textarea in which a user can put his Spark code. I want to send that code to 
> Apache Livy via Ajax and get the result. I don't know if my question is clear. 
> Thank you.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (LIVY-480) Apache livy and Ajax

2018-07-04 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-480.
--
Resolution: Invalid

> Apache livy and Ajax
> 
>
> Key: LIVY-480
> URL: https://issues.apache.org/jira/browse/LIVY-480
> Project: Livy
>  Issue Type: Request
>  Components: API, Interpreter, REPL, Server
>Affects Versions: 0.5.0
>Reporter: Melchicédec NDUWAYO
>Priority: Major
>
> Good morning everyone,
> can someone help me with how to send a POST request via Ajax to the Apache 
> Livy server? I tried my best but I don't see how to do that. I have a 
> textarea in which a user can put his Spark code. I want to send that code to 
> Apache Livy via Ajax and get the result. I don't know if my question is clear. 
> Thank you.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SPARK-24723) Discuss necessary info and access in barrier mode + YARN

2018-07-04 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533205#comment-16533205
 ] 

Saisai Shao commented on SPARK-24723:
-

Hi [~mengxr], I would like to know the goal of this ticket. The goal of the 
barrier scheduler is to offer gang semantics at the task scheduling level, 
whereas gang semantics at the YARN level is more about resources.

I discussed with [~leftnoteasy] the feasibility of supporting gang 
semantics on YARN. YARN has a Reservation System which supports gang-like 
semantics (reserving requested resources), but it is not designed for gang 
scheduling. Here are some thoughts about supporting it on YARN 
[https://docs.google.com/document/d/1OA-iVwuHB8wlzwwlrEHOK6Q2SlKy3-5QEB5AmwMVEUU/edit?usp=sharing];
 I'm not sure if it aligns with the goal of this ticket.

> Discuss necessary info and access in barrier mode + YARN
> 
>
> Key: SPARK-24723
> URL: https://issues.apache.org/jira/browse/SPARK-24723
> Project: Spark
>  Issue Type: Story
>  Components: ML, Spark Core
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Priority: Major
>
> In barrier mode, to run hybrid distributed DL training jobs, we need to 
> provide users sufficient info and access so they can set up a hybrid 
> distributed training job, e.g., using MPI.
> This ticket limits the scope of discussion to Spark + YARN. There were some 
> past attempts from the Hadoop community. So we should find someone with good 
> knowledge to lead the discussion here.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24739) PySpark does not work with Python 3.7.0

2018-07-04 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16533189#comment-16533189
 ] 

Saisai Shao commented on SPARK-24739:
-

Do we have to fix it in 2.3.2? I don't think it is critical. For example, 
Java 9 has been out for a long time, but we still don't support it. So this 
seems not to be a big problem.

> PySpark does not work with Python 3.7.0
> ---
>
> Key: SPARK-24739
> URL: https://issues.apache.org/jira/browse/SPARK-24739
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.3, 2.2.2, 2.3.1
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Critical
>
> Python 3.7 is released in few days ago and our PySpark does not work. For 
> example
> {code}
> sc.parallelize([1, 2]).take(1)
> {code}
> {code}
> File "/.../spark/python/pyspark/rdd.py", line 1343, in __main__.RDD.take
> Failed example:
> sc.parallelize(range(100), 100).filter(lambda x: x > 90).take(3)
> Exception raised:
> Traceback (most recent call last):
>   File "/.../3.7/lib/python3.7/doctest.py", line 1329, in __run
> compileflags, 1), test.globs)
>   File "", line 1, in 
> sc.parallelize(range(100), 100).filter(lambda x: x > 90).take(3)
>   File "/.../spark/python/pyspark/rdd.py", line 1377, in take
> res = self.context.runJob(self, takeUpToNumLeft, p)
>   File "/.../spark/python/pyspark/context.py", line 1013, in runJob
> sock_info = self._jvm.PythonRDD.runJob(self._jsc.sc(), 
> mappedRDD._jrdd, partitions)
>   File "/.../spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", 
> line 1257, in __call__
> answer, self.gateway_client, self.target_id, self.name)
>   File "/.../spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 
> 328, in get_return_value
> format(target_id, ".", name), value)
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.runJob.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 
> 0 in stage 143.0 failed 1 times, most recent failure: Lost task 0.0 in stage 
> 143.0 (TID 688, localhost, executor driver): 
> org.apache.spark.api.python.PythonException: Traceback (most recent call 
> last):
>   File "/.../spark/python/pyspark/rdd.py", line 1373, in takeUpToNumLeft
> yield next(iterator)
> StopIteration
> The above exception was the direct cause of the following exception:
> Traceback (most recent call last):
>   File "/.../spark/python/lib/pyspark.zip/pyspark/worker.py", line 320, 
> in main
> process()
>   File "/.../spark/python/lib/pyspark.zip/pyspark/worker.py", line 315, 
> in process
> serializer.dump_stream(func(split_index, iterator), outfile)
>   File "/.../spark/python/lib/pyspark.zip/pyspark/serializers.py", line 
> 378, in dump_stream
> vs = list(itertools.islice(iterator, batch))
> RuntimeError: generator raised StopIteration
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:309)
>   at 
> org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:449)
>   at 
> org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:432)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:263)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:891)
>   at 
> org.apache.spark.InterruptibleIterator.foreach(InterruptibleIterator.scala:28)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>   at 
> org.apache.spark.InterruptibleIterator.to(InterruptibleIterator.scala:28)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>   at 
> org.apache.spark.InterruptibleIterator.toBuffer(InterruptibleIterator.scala:28)
>   at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
&g

Re: Time for 2.3.2?

2018-07-03 Thread Saisai Shao
FYI, currently we have one blocking issue (
https://issues.apache.org/jira/browse/SPARK-24535); I will start the release
after it is fixed.

Also, please let me know if there are other blockers or fixes that should land
in the 2.3.2 release.

Thanks
Saisai

Saisai Shao  于2018年7月2日周一 下午1:16写道:

> I will start preparing the release.
>
> Thanks
>
> John Zhuge  于2018年6月30日周六 上午10:31写道:
>
>> +1  Looking forward to the critical fixes in 2.3.2.
>>
>> On Thu, Jun 28, 2018 at 9:37 AM Ryan Blue 
>> wrote:
>>
>>> +1
>>>
>>> On Thu, Jun 28, 2018 at 9:34 AM Xiao Li  wrote:
>>>
>>>> +1. Thanks, Saisai!
>>>>
>>>> The impact of SPARK-24495 is large. We should release Spark 2.3.2 ASAP.
>>>>
>>>> Thanks,
>>>>
>>>> Xiao
>>>>
>>>> 2018-06-27 23:28 GMT-07:00 Takeshi Yamamuro :
>>>>
>>>>> +1, I heard some Spark users have skipped v2.3.1 because of these bugs.
>>>>>
>>>>> On Thu, Jun 28, 2018 at 3:09 PM Xingbo Jiang 
>>>>> wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> Wenchen Fan 于2018年6月28日 周四下午2:06写道:
>>>>>>
>>>>>>> Hi Saisai, that's great! please go ahead!
>>>>>>>
>>>>>>> On Thu, Jun 28, 2018 at 12:56 PM Saisai Shao 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1, like mentioned by Marcelo, these issues seems quite severe.
>>>>>>>>
>>>>>>>> I can work on the release if short of hands :).
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Jerry
>>>>>>>>
>>>>>>>>
>>>>>>>> Marcelo Vanzin  于2018年6月28日周四
>>>>>>>> 上午11:40写道:
>>>>>>>>
>>>>>>>>> +1. SPARK-24589 / SPARK-24552 are kinda nasty and we should get
>>>>>>>>> fixes
>>>>>>>>> for those out.
>>>>>>>>>
>>>>>>>>> (Those are what delayed 2.2.2 and 2.1.3 for those watching...)
>>>>>>>>>
>>>>>>>>> On Wed, Jun 27, 2018 at 7:59 PM, Wenchen Fan 
>>>>>>>>> wrote:
>>>>>>>>> > Hi all,
>>>>>>>>> >
>>>>>>>>> > Spark 2.3.1 was released just a while ago, but unfortunately we
>>>>>>>>> discovered
>>>>>>>>> > and fixed some critical issues afterward.
>>>>>>>>> >
>>>>>>>>> > SPARK-24495: SortMergeJoin may produce wrong result.
>>>>>>>>> > This is a serious correctness bug, and is easy to hit: have
>>>>>>>>> duplicated join
>>>>>>>>> > key from the left table, e.g. `WHERE t1.a = t2.b AND t1.a =
>>>>>>>>> t2.c`, and the
>>>>>>>>> > join is a sort merge join. This bug is only present in Spark 2.3.
>>>>>>>>> >
>>>>>>>>> > SPARK-24588: stream-stream join may produce wrong result
>>>>>>>>> > This is a correctness bug in a new feature of Spark 2.3: the
>>>>>>>>> stream-stream
>>>>>>>>> > join. Users can hit this bug if one of the join side is
>>>>>>>>> partitioned by a
>>>>>>>>> > subset of the join keys.
>>>>>>>>> >
>>>>>>>>> > SPARK-24552: Task attempt numbers are reused when stages are
>>>>>>>>> retried
>>>>>>>>> > This is a long-standing bug in the output committer that may
>>>>>>>>> introduce data
>>>>>>>>> > corruption.
>>>>>>>>> >
>>>>>>>>> > SPARK-24542: UDFXPath allow users to pass carefully crafted
>>>>>>>>> XML to
>>>>>>>>> > access arbitrary files
>>>>>>>>> > This is a potential security issue if users build access control
>>>>>>>>> module upon
>>>>>>>>> > Spark.
>>>>>>>>> >
>>>>>>>>> > I think we need a Spark 2.3.2 to address these issues(especially
>>>>>>>>> the
>>>>>>>>> > correctness bugs) ASAP. Any thoughts?
>>>>>>>>> >
>>>>>>>>> > Thanks,
>>>>>>>>> > Wenchen
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Marcelo
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -
>>>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>>>
>>>>>>>>>
>>>>>
>>>>> --
>>>>> ---
>>>>> Takeshi Yamamuro
>>>>>
>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>> --
>>> John Zhuge
>>>
>>


[jira] [Commented] (SPARK-24535) Fix java version parsing in SparkR on Windows

2018-07-03 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532145#comment-16532145
 ] 

Saisai Shao commented on SPARK-24535:
-

Hi [~felixcheung], what's the current status of this JIRA? Do you have an ETA 
for it?

> Fix java version parsing in SparkR on Windows
> -
>
> Key: SPARK-24535
> URL: https://issues.apache.org/jira/browse/SPARK-24535
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.1, 2.4.0
>Reporter: Shivaram Venkataraman
>Assignee: Felix Cheung
>Priority: Blocker
>
> We see errors on CRAN of the form 
> {code:java}
>   java version "1.8.0_144"
>   Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
>   Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>   Picked up _JAVA_OPTIONS: -XX:-UsePerfData 
>   -- 1. Error: create DataFrame from list or data.frame (@test_basic.R#21)  
> --
>   subscript out of bounds
>   1: sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
> sparkConfig = sparkRTestConfig) at 
> D:/temp/RtmpIJ8Cc3/RLIBS_3242c713c3181/SparkR/tests/testthat/test_basic.R:21
>   2: sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, 
> sparkExecutorEnvMap, 
>  sparkJars, sparkPackages)
>   3: checkJavaVersion()
>   4: strsplit(javaVersionFilter[[1]], "[\"]")
> {code}
> The complete log file is at 
> http://home.apache.org/~shivaram/SparkR_2.3.1_check_results/Windows/00check.log



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24530) Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) and pyspark.ml docs are broken

2018-07-03 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16532143#comment-16532143
 ] 

Saisai Shao commented on SPARK-24530:
-

Hi [~hyukjin.kwon], what is the current status of this JIRA? Do you have an ETA 
for it?

> Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) 
> and pyspark.ml docs are broken
> ---
>
> Key: SPARK-24530
> URL: https://issues.apache.org/jira/browse/SPARK-24530
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.4.0
>Reporter: Xiangrui Meng
>Assignee: Hyukjin Kwon
>Priority: Blocker
> Attachments: Screen Shot 2018-06-12 at 8.23.18 AM.png, Screen Shot 
> 2018-06-12 at 8.23.29 AM.png, image-2018-06-13-15-15-51-025.png, 
> pyspark-ml-doc-utuntu18.04-python2.7-sphinx-1.7.5.png
>
>
> I generated python docs from master locally using `make html`. However, the 
> generated html doc doesn't render class docs correctly. I attached the 
> screenshot from Spark 2.3 docs and master docs generated on my local. Not 
> sure if this is because my local setup.
> cc: [~dongjoon] Could you help verify?
>  
> The followings are our released doc status. Some recent docs seems to be 
> broken.
> *2.1.x*
> (O) 
> [https://spark.apache.org/docs/2.1.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (O) 
> [https://spark.apache.org/docs/2.1.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.1.2/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.2.x*
> (O) 
> [https://spark.apache.org/docs/2.2.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.2.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.3.x*
> (O) 
> [https://spark.apache.org/docs/2.3.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Time for 2.3.2?

2018-07-01 Thread Saisai Shao
I will start preparing the release.

Thanks

John Zhuge  于2018年6月30日周六 上午10:31写道:

> +1  Looking forward to the critical fixes in 2.3.2.
>
> On Thu, Jun 28, 2018 at 9:37 AM Ryan Blue 
> wrote:
>
>> +1
>>
>> On Thu, Jun 28, 2018 at 9:34 AM Xiao Li  wrote:
>>
>>> +1. Thanks, Saisai!
>>>
>>> The impact of SPARK-24495 is large. We should release Spark 2.3.2 ASAP.
>>>
>>> Thanks,
>>>
>>> Xiao
>>>
>>> 2018-06-27 23:28 GMT-07:00 Takeshi Yamamuro :
>>>
>>>> +1, I heard some Spark users have skipped v2.3.1 because of these bugs.
>>>>
>>>> On Thu, Jun 28, 2018 at 3:09 PM Xingbo Jiang 
>>>> wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Wenchen Fan 于2018年6月28日 周四下午2:06写道:
>>>>>
>>>>>> Hi Saisai, that's great! please go ahead!
>>>>>>
>>>>>> On Thu, Jun 28, 2018 at 12:56 PM Saisai Shao 
>>>>>> wrote:
>>>>>>
>>>>>>> +1, like mentioned by Marcelo, these issues seems quite severe.
>>>>>>>
>>>>>>> I can work on the release if short of hands :).
>>>>>>>
>>>>>>> Thanks
>>>>>>> Jerry
>>>>>>>
>>>>>>>
>>>>>>> Marcelo Vanzin  于2018年6月28日周四
>>>>>>> 上午11:40写道:
>>>>>>>
>>>>>>>> +1. SPARK-24589 / SPARK-24552 are kinda nasty and we should get
>>>>>>>> fixes
>>>>>>>> for those out.
>>>>>>>>
>>>>>>>> (Those are what delayed 2.2.2 and 2.1.3 for those watching...)
>>>>>>>>
>>>>>>>> On Wed, Jun 27, 2018 at 7:59 PM, Wenchen Fan 
>>>>>>>> wrote:
>>>>>>>> > Hi all,
>>>>>>>> >
>>>>>>>> > Spark 2.3.1 was released just a while ago, but unfortunately we
>>>>>>>> discovered
>>>>>>>> > and fixed some critical issues afterward.
>>>>>>>> >
>>>>>>>> > SPARK-24495: SortMergeJoin may produce wrong result.
>>>>>>>> > This is a serious correctness bug, and is easy to hit: have
>>>>>>>> duplicated join
>>>>>>>> > key from the left table, e.g. `WHERE t1.a = t2.b AND t1.a =
>>>>>>>> t2.c`, and the
>>>>>>>> > join is a sort merge join. This bug is only present in Spark 2.3.
>>>>>>>> >
>>>>>>>> > SPARK-24588: stream-stream join may produce wrong result
>>>>>>>> > This is a correctness bug in a new feature of Spark 2.3: the
>>>>>>>> stream-stream
>>>>>>>> > join. Users can hit this bug if one of the join side is
>>>>>>>> partitioned by a
>>>>>>>> > subset of the join keys.
>>>>>>>> >
>>>>>>>> > SPARK-24552: Task attempt numbers are reused when stages are
>>>>>>>> retried
>>>>>>>> > This is a long-standing bug in the output committer that may
>>>>>>>> introduce data
>>>>>>>> > corruption.
>>>>>>>> >
>>>>>>>> > SPARK-24542: UDFXPath allow users to pass carefully crafted
>>>>>>>> XML to
>>>>>>>> > access arbitrary files
>>>>>>>> > This is a potential security issue if users build access control
>>>>>>>> module upon
>>>>>>>> > Spark.
>>>>>>>> >
>>>>>>>> > I think we need a Spark 2.3.2 to address these issues(especially
>>>>>>>> the
>>>>>>>> > correctness bugs) ASAP. Any thoughts?
>>>>>>>> >
>>>>>>>> > Thanks,
>>>>>>>> > Wenchen
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Marcelo
>>>>>>>>
>>>>>>>>
>>>>>>>> -
>>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>>
>>>>>>>>
>>>>
>>>> --
>>>> ---
>>>> Takeshi Yamamuro
>>>>
>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>> --
>> John Zhuge
>>
>


[jira] [Commented] (SPARK-24530) Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) and pyspark.ml docs are broken

2018-07-01 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529352#comment-16529352
 ] 

Saisai Shao commented on SPARK-24530:
-

[~hyukjin.kwon], Spark 2.1.3 and 2.2.2 are up for vote; can you please fix the 
issue and leave comments in the related threads?

> Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) 
> and pyspark.ml docs are broken
> ---
>
> Key: SPARK-24530
> URL: https://issues.apache.org/jira/browse/SPARK-24530
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.4.0
>Reporter: Xiangrui Meng
>Assignee: Hyukjin Kwon
>Priority: Blocker
> Attachments: Screen Shot 2018-06-12 at 8.23.18 AM.png, Screen Shot 
> 2018-06-12 at 8.23.29 AM.png, image-2018-06-13-15-15-51-025.png, 
> pyspark-ml-doc-utuntu18.04-python2.7-sphinx-1.7.5.png
>
>
> I generated python docs from master locally using `make html`. However, the 
> generated html doc doesn't render class docs correctly. I attached the 
> screenshot from Spark 2.3 docs and master docs generated on my local. Not 
> sure if this is because my local setup.
> cc: [~dongjoon] Could you help verify?
>  
> The followings are our released doc status. Some recent docs seems to be 
> broken.
> *2.1.x*
> (O) 
> [https://spark.apache.org/docs/2.1.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (O) 
> [https://spark.apache.org/docs/2.1.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.1.2/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.2.x*
> (O) 
> [https://spark.apache.org/docs/2.2.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.2.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.3.x*
> (O) 
> [https://spark.apache.org/docs/2.3.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24530) Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) and pyspark.ml docs are broken

2018-07-01 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24530:

Target Version/s: 2.1.3, 2.2.2, 2.3.2, 2.4.0  (was: 2.4.0)

> Sphinx doesn't render autodoc_docstring_signature correctly (with Python 2?) 
> and pyspark.ml docs are broken
> ---
>
> Key: SPARK-24530
> URL: https://issues.apache.org/jira/browse/SPARK-24530
> Project: Spark
>  Issue Type: Bug
>  Components: ML, PySpark
>Affects Versions: 2.4.0
>Reporter: Xiangrui Meng
>Assignee: Hyukjin Kwon
>Priority: Blocker
> Attachments: Screen Shot 2018-06-12 at 8.23.18 AM.png, Screen Shot 
> 2018-06-12 at 8.23.29 AM.png, image-2018-06-13-15-15-51-025.png, 
> pyspark-ml-doc-utuntu18.04-python2.7-sphinx-1.7.5.png
>
>
> I generated python docs from master locally using `make html`. However, the 
> generated html doc doesn't render class docs correctly. I attached the 
> screenshot from Spark 2.3 docs and master docs generated on my local. Not 
> sure if this is because my local setup.
> cc: [~dongjoon] Could you help verify?
>  
> The followings are our released doc status. Some recent docs seems to be 
> broken.
> *2.1.x*
> (O) 
> [https://spark.apache.org/docs/2.1.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (O) 
> [https://spark.apache.org/docs/2.1.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.1.2/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.2.x*
> (O) 
> [https://spark.apache.org/docs/2.2.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.2.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> *2.3.x*
> (O) 
> [https://spark.apache.org/docs/2.3.0/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]
> (X) 
> [https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.classification.LogisticRegression]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Does Livy Support Basic Auth?

2018-07-01 Thread Saisai Shao
No, it doesn't support that; you would need to change the code to add a
specific Filter to make it work (see the rough sketch below).

Harsch, Tim wrote on Sat, Jun 30, 2018 at 12:49 AM:

> I thought I read somewhere that Livy supports basic auth, but don't see
> configurations for it.  Does it?
>
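
For illustration, here is a minimal sketch of such a Filter, assuming plain
javax.servlet APIs and hard-coded credentials; the class name and the way it
would be wired into Livy's Jetty server are made up, this is not an existing
Livy feature.

{code}
import java.util.Base64
import javax.servlet._
import javax.servlet.http.{HttpServletRequest, HttpServletResponse}

// Hypothetical filter, for illustration only: Livy does not ship this, it
// would have to be added to the server code and registered by hand.
class BasicAuthFilter(expectedUser: String, expectedPassword: String) extends Filter {

  override def init(config: FilterConfig): Unit = {}

  override def doFilter(req: ServletRequest, res: ServletResponse, chain: FilterChain): Unit = {
    val request = req.asInstanceOf[HttpServletRequest]
    val response = res.asInstanceOf[HttpServletResponse]

    // Accept the request only if the Basic credentials match the expected pair.
    val authorized = Option(request.getHeader("Authorization")).exists { header =>
      header.startsWith("Basic ") && {
        val decoded = new String(Base64.getDecoder.decode(header.stripPrefix("Basic ")), "UTF-8")
        decoded == s"$expectedUser:$expectedPassword"
      }
    }

    if (authorized) {
      chain.doFilter(req, res)
    } else {
      response.setHeader("WWW-Authenticate", "Basic realm=\"livy\"")
      response.sendError(HttpServletResponse.SC_UNAUTHORIZED)
    }
  }

  override def destroy(): Unit = {}
}
{code}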


Re: Time for 2.3.2?

2018-06-27 Thread Saisai Shao
+1. As Marcelo mentioned, these issues seem quite severe.

I can work on the release if we are short of hands :).

Thanks
Jerry


Marcelo Vanzin wrote on Thu, Jun 28, 2018 at 11:40 AM:

> +1. SPARK-24589 / SPARK-24552 are kinda nasty and we should get fixes
> for those out.
>
> (Those are what delayed 2.2.2 and 2.1.3 for those watching...)
>
> On Wed, Jun 27, 2018 at 7:59 PM, Wenchen Fan  wrote:
> > Hi all,
> >
> > Spark 2.3.1 was released just a while ago, but unfortunately we
> discovered
> > and fixed some critical issues afterward.
> >
> > SPARK-24495: SortMergeJoin may produce wrong result.
> > This is a serious correctness bug, and is easy to hit: have duplicated
> join
> > key from the left table, e.g. `WHERE t1.a = t2.b AND t1.a = t2.c`, and
> the
> > join is a sort merge join. This bug is only present in Spark 2.3.
> >
> > SPARK-24588: stream-stream join may produce wrong result
> > This is a correctness bug in a new feature of Spark 2.3: the
> stream-stream
> > join. Users can hit this bug if one of the join side is partitioned by a
> > subset of the join keys.
> >
> > SPARK-24552: Task attempt numbers are reused when stages are retried
> > This is a long-standing bug in the output committer that may introduce
> data
> > corruption.
> >
> > SPARK-24542: UDFXPath allow users to pass carefully crafted XML to
> > access arbitrary files
> > This is a potential security issue if users build access control module
> upon
> > Spark.
> >
> > I think we need a Spark 2.3.2 to address these issues(especially the
> > correctness bugs) ASAP. Any thoughts?
> >
> > Thanks,
> > Wenchen
>
>
>
> --
> Marcelo
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


[jira] [Commented] (LIVY-479) Enable livy.rsc.launcher.address configuration

2018-06-27 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525775#comment-16525775
 ] 

Saisai Shao commented on LIVY-479:
--

Yes, I've left the comments on the PR.

> Enable livy.rsc.launcher.address configuration
> --
>
> Key: LIVY-479
> URL: https://issues.apache.org/jira/browse/LIVY-479
> Project: Livy
>  Issue Type: Improvement
>  Components: RSC
>Affects Versions: 0.5.0
>Reporter: Tao Li
>Priority: Major
>
> The current behavior is that we are setting livy.rsc.launcher.address to RPC 
> server and ignoring whatever is specified in configs. However in some 
> scenarios there is a need for the user to be able to explicitly configure it. 
> For example, the IP address for an active Livy server might change over the 
> time, so rather than using the fixed IP address, we need to specify this 
> setting to a more generic host name. For this reason, we want to enable the 
> capability of user side configurations. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: A new link request to my project and one question

2018-06-25 Thread Saisai Shao
I think a Livy superuser is similar to Hadoop's proxy user: it allows this
user to impersonate others, but it doesn't check whether the other user is
allowed to be impersonated.

In the meantime, Livy has ACL mechanisms, which allow only ACL-verified
users to connect to the Livy server, so I think with ACLs we can do more
fine-grained control.

For the other missing points, I think we can improve the Livy code.

Marcelo Vanzin wrote on Tue, Jun 26, 2018 at 9:53 AM:

> Superusers are a little more than "allowed to impersonate others". I
> don't remember exactly what are the things that it allows, but it
> would be better to add finer grained permissions.
>
> On Mon, Jun 25, 2018 at 6:30 PM, Saisai Shao 
> wrote:
> > Yes, has a configuration "livy.superusers". Here in this case, the sql
> > server user should be added as a superuser, who can impersonate other
> > different users.
> >
> > Marcelo Vanzin wrote on Tue, Jun 26, 2018 at 9:12 AM:
> >
> >> You're talking about another service between the user and the
> application.
> >>
> >> In that case a parameter probably makes sense. But then you'd need to
> >> add those config options, because this is a dangerous feature, and
> >> Livy should know who is allowed to impersonate who. In this case the
> >> service needs to authenticate to Livy as a privileged user, and Livy's
> >> configuration would say that the service's user is allowed to
> >> impersonate certain users or groups (same as the other services that
> >> allow impersonation like YARN).
> >>
> >>
> >> On Mon, Jun 25, 2018 at 5:41 PM, Takeshi Yamamuro <
> linguin@gmail.com>
> >> wrote:
> >> > Yea, I know the Livy supports impersonation.
> >> > I assume a case blow
> >> > [different users] ---Some protocols---> [the server applications
> managing
> >> > multiple sessions for users] ---REST---> [Livy server]
> >> > In this case, Livy already has a way to pass proxyUser from the
> >> application
> >> > to Livy?
> >> > Sorry, but I'm not familiar with Livy internal logic.
> >> >
> >> >
> >> > On Tue, Jun 26, 2018 at 9:14 AM Marcelo Vanzin
> >> 
> >> > wrote:
> >> >
> >> >> On Mon, Jun 25, 2018 at 5:09 PM, Takeshi Yamamuro <
> >> linguin@gmail.com>
> >> >> wrote:
> >> >> > In that case, I think Livy is useful; the application can pass
> >> proxyUser
> >> >> to
> >> >> > build LivyClient for each user
> >> >> > and run spark queries as each user authorization.
> >> >>
> >> >> But Livy already supports impersonation. It can impersonate the
> >> >> authenticated user.
> >> >>
> >> >> You're suggesting adding a parameter so the user can request
> >> >> impersonation of some specific user, which is a different thing. What
> >> >> is the use case for that?
> >> >>
> >> >> --
> >> >> Marcelo
> >> >>
> >> >
> >> >
> >> > --
> >> > ---
> >> > Takeshi Yamamuro
> >>
> >>
> >>
> >> --
> >> Marcelo
> >>
>
>
>
> --
> Marcelo
>


[jira] [Resolved] (SPARK-24418) Upgrade to Scala 2.11.12

2018-06-25 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-24418.
-
Resolution: Fixed

Issue resolved by pull request 21495
[https://github.com/apache/spark/pull/21495]

> Upgrade to Scala 2.11.12
> 
>
> Key: SPARK-24418
> URL: https://issues.apache.org/jira/browse/SPARK-24418
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Assignee: DB Tsai
>Priority: Major
> Fix For: 2.4.0
>
>
> Scala 2.11.12+ will support JDK9+. However, this is not going to be a simple 
> version bump. 
> *loadFIles()* in *ILoop* was removed in Scala 2.11.12. We use it as a hack to 
> initialize the Spark before REPL sees any files.
> Issue filed in Scala community.
> https://github.com/scala/bug/issues/10913



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: A new link request to my project and one question

2018-06-25 Thread Saisai Shao
Yes, Livy has a configuration "livy.superusers". In this case, the SQL
server user should be added as a superuser, so that it can impersonate
other users.

Marcelo Vanzin wrote on Tue, Jun 26, 2018 at 9:12 AM:

> You're talking about another service between the user and the application.
>
> In that case a parameter probably makes sense. But then you'd need to
> add those config options, because this is a dangerous feature, and
> Livy should know who is allowed to impersonate who. In this case the
> service needs to authenticate to Livy as a privileged user, and Livy's
> configuration would say that the service's user is allowed to
> impersonate certain users or groups (same as the other services that
> allow impersonation like YARN).
>
>
> On Mon, Jun 25, 2018 at 5:41 PM, Takeshi Yamamuro 
> wrote:
> > Yea, I know the Livy supports impersonation.
> > I assume a case blow
> > [different users] ---Some protocols---> [the server applications managing
> > multiple sessions for users] ---REST---> [Livy server]
> > In this case, Livy already has a way to pass proxyUser from the
> application
> > to Livy?
> > Sorry, but I'm not familiar with Livy internal logic.
> >
> >
> > On Tue, Jun 26, 2018 at 9:14 AM Marcelo Vanzin
> 
> > wrote:
> >
> >> On Mon, Jun 25, 2018 at 5:09 PM, Takeshi Yamamuro <
> linguin@gmail.com>
> >> wrote:
> >> > In that case, I think Livy is useful; the application can pass
> proxyUser
> >> to
> >> > build LivyClient for each user
> >> > and run spark queries as each user authorization.
> >>
> >> But Livy already supports impersonation. It can impersonate the
> >> authenticated user.
> >>
> >> You're suggesting adding a parameter so the user can request
> >> impersonation of some specific user, which is a different thing. What
> >> is the use case for that?
> >>
> >> --
> >> Marcelo
> >>
> >
> >
> > --
> > ---
> > Takeshi Yamamuro
>
>
>
> --
> Marcelo
>


[jira] [Commented] (SPARK-24646) Support wildcard '*' for spark.yarn.dist.forceDownloadSchemes

2018-06-25 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523022#comment-16523022
 ] 

Saisai Shao commented on SPARK-24646:
-

Hi [~vanzin], here is a specific example:

We have a customized {{ServiceCredentialProvider}}, an HS2 
{{ServiceCredentialProvider}}, which lives in our own jar {{foo}}. So when the 
SparkSubmit process launches the YARN client, this {{foo}} jar has to be on the 
SparkSubmit process classpath.

If this {{foo}} jar is a local resource added by {{--jars}}, it will be on the 
classpath. But if it is a remote jar (for example on HDFS), only yarn-client 
mode will download it and add it to the classpath; in yarn-cluster mode it will 
not be downloaded, so this HS2 {{ServiceCredentialProvider}} will not be loaded.

When using the spark-submit script, the user can decide whether to add a remote 
or a local jar. But in the Livy scenario, Livy only supports remote jars (jars 
on HDFS), and we only configure it to support yarn-cluster mode. So in this 
scenario, we cannot load this customized {{ServiceCredentialProvider}} in 
yarn-cluster mode.

So the fix is to force downloading the jars to the local SparkSubmit process 
with the configuration "spark.yarn.dist.forceDownloadSchemes". To make this 
easy to use, I propose to add wildcard '*' support, which downloads all the 
remote resources without checking the scheme.
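
As a rough illustration of the proposed wildcard behavior (not the actual
Spark patch; the object and method names below are made up):

{code}
object ForceDownloadSchemes {
  // Return true when a resource with the given URI scheme should be downloaded
  // to the local SparkSubmit process. '*' matches every scheme.
  def shouldDownload(scheme: String, forceDownloadSchemes: Seq[String]): Boolean =
    forceDownloadSchemes.contains("*") || forceDownloadSchemes.contains(scheme)

  def main(args: Array[String]): Unit = {
    // e.g. spark.yarn.dist.forceDownloadSchemes=*
    assert(shouldDownload("hdfs", Seq("*")))
    // e.g. spark.yarn.dist.forceDownloadSchemes=http,https
    assert(shouldDownload("https", Seq("http", "https")))
    assert(!shouldDownload("hdfs", Seq("http", "https")))
  }
}
{code}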

> Support wildcard '*' for spark.yarn.dist.forceDownloadSchemes
> 
>
> Key: SPARK-24646
> URL: https://issues.apache.org/jira/browse/SPARK-24646
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>    Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Priority: Minor
>
> In the case of getting tokens via customized {{ServiceCredentialProvider}}, 
> it is required that {{ServiceCredentialProvider}} be available in local 
> spark-submit process classpath. In this case, all the configured remote 
> sources should be forced to download to local.
> For the ease of using this configuration, here propose to add wildcard '*' 
> support to {{spark.yarn.dist.forceDownloadSchemes}}, also clarify the usage 
> of this configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24646) Support wildcard '*' for spark.yarn.dist.forceDownloadSchemes

2018-06-24 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-24646:
---

 Summary: Support wildcard '*' for spark.yarn.dist.forceDownloadSchemes
 Key: SPARK-24646
 URL: https://issues.apache.org/jira/browse/SPARK-24646
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.3.0
Reporter: Saisai Shao


In the case of getting tokens via a customized {{ServiceCredentialProvider}}, the 
{{ServiceCredentialProvider}} must be available on the local spark-submit process 
classpath. In this case, all the configured remote sources should be forced to be 
downloaded locally.

For the ease of using this configuration, here I propose to add wildcard '*' 
support to {{spark.yarn.dist.forceDownloadSchemes}}, and also to clarify the 
usage of this configuration.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Can livy execute code when session is in busy state?

2018-06-21 Thread Saisai Shao
No. Busy means there is currently a job running in Spark, so the follow-up
code will wait until the previous job is done.
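
As a rough client-side sketch of how a caller could wait for an idle session
before submitting the next statement (the Livy URL, session id and the naive
JSON handling below are assumptions for illustration):

{code}
import scala.io.Source

object LivyBusyWait {
  // Poll GET /sessions/{id} and return only once the session state is "idle".
  def waitUntilIdle(livyUrl: String, sessionId: Int): Unit = {
    val statePattern = "\"state\"\\s*:\\s*\"([a-z_]+)\"".r
    var state = ""
    while (state != "idle") {
      val json = Source.fromURL(s"$livyUrl/sessions/$sessionId").mkString
      // Naive extraction; a real client would use a JSON library.
      state = statePattern.findFirstMatchIn(json).map(_.group(1)).getOrElse("")
      if (state != "idle") Thread.sleep(1000)
    }
  }

  def main(args: Array[String]): Unit =
    waitUntilIdle("http://livy-host:8998", 0) // then POST the next statement
}
{code}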

JF Chen wrote on Fri, Jun 22, 2018 at 11:53 AM:

> Can livy execute code when session  is in busy state?
>
> Regard,
> Junfeng Chen
>


[jira] [Commented] (SPARK-24374) SPIP: Support Barrier Scheduling in Apache Spark

2018-06-21 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519934#comment-16519934
 ] 

Saisai Shao commented on SPARK-24374:
-

Hi [~mengxr] [~jiangxb1987], SPARK-24615 would be a part of Project Hydrogen. I 
wrote a high-level design doc about the implementation; would you please help 
review and comment on it, to see whether the solution is feasible? Thanks a lot.

> SPIP: Support Barrier Scheduling in Apache Spark
> 
>
> Key: SPARK-24374
> URL: https://issues.apache.org/jira/browse/SPARK-24374
> Project: Spark
>  Issue Type: Epic
>  Components: ML, Spark Core
>Affects Versions: 3.0.0
>Reporter: Xiangrui Meng
>Assignee: Xiangrui Meng
>Priority: Major
>  Labels: Hydrogen, SPIP
> Attachments: SPIP_ Support Barrier Scheduling in Apache Spark.pdf
>
>
> (See details in the linked/attached SPIP doc.)
> {quote}
> The proposal here is to add a new scheduling model to Apache Spark so users 
> can properly embed distributed DL training as a Spark stage to simplify the 
> distributed training workflow. For example, Horovod uses MPI to implement 
> all-reduce to accelerate distributed TensorFlow training. The computation 
> model is different from MapReduce used by Spark. In Spark, a task in a stage 
> doesn’t depend on any other tasks in the same stage, and hence it can be 
> scheduled independently. In MPI, all workers start at the same time and pass 
> messages around. To embed this workload in Spark, we need to introduce a new 
> scheduling model, tentatively named “barrier scheduling”, which launches 
> tasks at the same time and provides users enough information and tooling to 
> embed distributed DL training. Spark can also provide an extra layer of fault 
> tolerance in case some tasks failed in the middle, where Spark would abort 
> all tasks and restart the stage.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-21 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519931#comment-16519931
 ] 

Saisai Shao commented on HIVE-16391:


Gentle ping [~hagleitn]: would you please help review the currently proposed 
patch and suggest the next step? Thanks a lot.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.2.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SPARK-24615) Accelerator aware task scheduling for Spark

2018-06-21 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24615:

Labels: SPIP  (was: )

> Accelerator aware task scheduling for Spark
> ---
>
> Key: SPARK-24615
> URL: https://issues.apache.org/jira/browse/SPARK-24615
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>    Reporter: Saisai Shao
>Priority: Major
>  Labels: SPIP
>
> In the machine learning area, accelerator card (GPU, FPGA, TPU) is 
> predominant compared to CPUs. To make the current Spark architecture to work 
> with accelerator cards, Spark itself should understand the existence of 
> accelerators and know how to schedule task onto the executors where 
> accelerators are equipped.
> Current Spark’s scheduler schedules tasks based on the locality of the data 
> plus the available of CPUs. This will introduce some problems when scheduling 
> tasks with accelerators required.
>  # CPU cores are usually more than accelerators on one node, using CPU cores 
> to schedule accelerator required tasks will introduce the mismatch.
>  # In one cluster, we always assume that CPU is equipped in each node, but 
> this is not true of accelerator cards.
>  # The existence of heterogeneous tasks (accelerator required or not) 
> requires scheduler to schedule tasks with a smart way.
> So here propose to improve the current scheduler to support heterogeneous 
> tasks (accelerator requires or not). This can be part of the work of Project 
> hydrogen.
> Details is attached in google doc. It doesn't cover all the implementation 
> details, just highlight the parts should be changed.
>  
> CC [~yanboliang] [~merlintang]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24615) Accelerator aware task scheduling for Spark

2018-06-21 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24615:

Description: 
In the machine learning area, accelerator card (GPU, FPGA, TPU) is predominant 
compared to CPUs. To make the current Spark architecture to work with 
accelerator cards, Spark itself should understand the existence of accelerators 
and know how to schedule task onto the executors where accelerators are 
equipped.

Current Spark’s scheduler schedules tasks based on the locality of the data 
plus the available of CPUs. This will introduce some problems when scheduling 
tasks with accelerators required.
 # CPU cores are usually more than accelerators on one node, using CPU cores to 
schedule accelerator required tasks will introduce the mismatch.
 # In one cluster, we always assume that CPU is equipped in each node, but this 
is not true of accelerator cards.
 # The existence of heterogeneous tasks (accelerator required or not) requires 
scheduler to schedule tasks with a smart way.

So here propose to improve the current scheduler to support heterogeneous tasks 
(accelerator requires or not). This can be part of the work of Project hydrogen.

Details is attached in google doc. It doesn't cover all the implementation 
details, just highlight the parts should be changed.

 

CC [~yanboliang] [~merlintang]

  was:
In the machine learning area, accelerator card (GPU, FPGA, TPU) is predominant 
compared to CPUs. To make the current Spark architecture to work with 
accelerator cards, Spark itself should understand the existence of accelerators 
and know how to schedule task onto the executors where accelerators are 
equipped.

Current Spark’s scheduler schedules tasks based on the locality of the data 
plus the available of CPUs. This will introduce some problems when scheduling 
tasks with accelerators required.
 # CPU cores are usually more than accelerators on one node, using CPU cores to 
schedule accelerator required tasks will introduce the mismatch.
 # In one cluster, we always assume that CPU is equipped in each node, but this 
is not true of accelerator cards.
 # The existence of heterogeneous tasks (accelerator required or not) requires 
scheduler to schedule tasks with a smart way.

So here propose to improve the current scheduler to support heterogeneous tasks 
(accelerator requires or not). This can be part of the work of Project hydrogen.

Details is attached in google doc.

 

CC [~yanboliang] [~merlintang]


> Accelerator aware task scheduling for Spark
> ---
>
> Key: SPARK-24615
> URL: https://issues.apache.org/jira/browse/SPARK-24615
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Saisai Shao
>Priority: Major
>
> In the machine learning area, accelerator card (GPU, FPGA, TPU) is 
> predominant compared to CPUs. To make the current Spark architecture to work 
> with accelerator cards, Spark itself should understand the existence of 
> accelerators and know how to schedule task onto the executors where 
> accelerators are equipped.
> Current Spark’s scheduler schedules tasks based on the locality of the data 
> plus the available of CPUs. This will introduce some problems when scheduling 
> tasks with accelerators required.
>  # CPU cores are usually more than accelerators on one node, using CPU cores 
> to schedule accelerator required tasks will introduce the mismatch.
>  # In one cluster, we always assume that CPU is equipped in each node, but 
> this is not true of accelerator cards.
>  # The existence of heterogeneous tasks (accelerator required or not) 
> requires scheduler to schedule tasks with a smart way.
> So here propose to improve the current scheduler to support heterogeneous 
> tasks (accelerator requires or not). This can be part of the work of Project 
> hydrogen.
> Details is attached in google doc. It doesn't cover all the implementation 
> details, just highlight the parts should be changed.
>  
> CC [~yanboliang] [~merlintang]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24615) Accelerator aware task scheduling for Spark

2018-06-21 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24615:

Description: 
In the machine learning area, accelerator card (GPU, FPGA, TPU) is predominant 
compared to CPUs. To make the current Spark architecture to work with 
accelerator cards, Spark itself should understand the existence of accelerators 
and know how to schedule task onto the executors where accelerators are 
equipped.

Current Spark’s scheduler schedules tasks based on the locality of the data 
plus the available of CPUs. This will introduce some problems when scheduling 
tasks with accelerators required.
 # CPU cores are usually more than accelerators on one node, using CPU cores to 
schedule accelerator required tasks will introduce the mismatch.
 # In one cluster, we always assume that CPU is equipped in each node, but this 
is not true of accelerator cards.
 # The existence of heterogeneous tasks (accelerator required or not) requires 
scheduler to schedule tasks with a smart way.

So here propose to improve the current scheduler to support heterogeneous tasks 
(accelerator requires or not). This can be part of the work of Project hydrogen.

Details is attached in google doc.

 

CC [~yanboliang] [~merlintang]

  was:
In the machine learning area, accelerator card (GPU, FPGA, TPU) is predominant 
compared to CPUs. To make the current Spark architecture to work with 
accelerator cards, Spark itself should understand the existence of accelerators 
and know how to schedule task onto the executors where accelerators are 
equipped.

Current Spark’s scheduler schedules tasks based on the locality of the data 
plus the available of CPUs. This will introduce some problems when scheduling 
tasks with accelerators required.
 # CPU cores are usually more than accelerators on one node, using CPU cores to 
schedule accelerator required tasks will introduce the mismatch.
 # In one cluster, we always assume that CPU is equipped in each node, but this 
is not true of accelerator cards.
 # The existence of heterogeneous tasks (accelerator required or not) requires 
scheduler to schedule tasks with a smart way.

So here propose to improve the current scheduler to support heterogeneous tasks 
(accelerator requires or not). Details is attached in google doc.


> Accelerator aware task scheduling for Spark
> ---
>
> Key: SPARK-24615
> URL: https://issues.apache.org/jira/browse/SPARK-24615
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>    Reporter: Saisai Shao
>Priority: Major
>
> In the machine learning area, accelerator card (GPU, FPGA, TPU) is 
> predominant compared to CPUs. To make the current Spark architecture to work 
> with accelerator cards, Spark itself should understand the existence of 
> accelerators and know how to schedule task onto the executors where 
> accelerators are equipped.
> Current Spark’s scheduler schedules tasks based on the locality of the data 
> plus the available of CPUs. This will introduce some problems when scheduling 
> tasks with accelerators required.
>  # CPU cores are usually more than accelerators on one node, using CPU cores 
> to schedule accelerator required tasks will introduce the mismatch.
>  # In one cluster, we always assume that CPU is equipped in each node, but 
> this is not true of accelerator cards.
>  # The existence of heterogeneous tasks (accelerator required or not) 
> requires scheduler to schedule tasks with a smart way.
> So here propose to improve the current scheduler to support heterogeneous 
> tasks (accelerator requires or not). This can be part of the work of Project 
> hydrogen.
> Details is attached in google doc.
>  
> CC [~yanboliang] [~merlintang]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24615) Accelerator aware task scheduling for Spark

2018-06-21 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-24615:
---

 Summary: Accelerator aware task scheduling for Spark
 Key: SPARK-24615
 URL: https://issues.apache.org/jira/browse/SPARK-24615
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: Saisai Shao


In the machine learning area, accelerator card (GPU, FPGA, TPU) is predominant 
compared to CPUs. To make the current Spark architecture to work with 
accelerator cards, Spark itself should understand the existence of accelerators 
and know how to schedule task onto the executors where accelerators are 
equipped.

Current Spark’s scheduler schedules tasks based on the locality of the data 
plus the available of CPUs. This will introduce some problems when scheduling 
tasks with accelerators required.
 # CPU cores are usually more than accelerators on one node, using CPU cores to 
schedule accelerator required tasks will introduce the mismatch.
 # In one cluster, we always assume that CPU is equipped in each node, but this 
is not true of accelerator cards.
 # The existence of heterogeneous tasks (accelerator required or not) requires 
scheduler to schedule tasks with a smart way.

So here propose to improve the current scheduler to support heterogeneous tasks 
(accelerator requires or not). Details is attached in google doc.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24493) Kerberos Ticket Renewal is failing in Hadoop 2.8+ and Hadoop 3

2018-06-19 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517720#comment-16517720
 ] 

Saisai Shao commented on SPARK-24493:
-

This bug is fixed by HDFS-12670; Spark should bump its supported Hadoop 3 
version to 3.1.1 once it is released.

> Kerberos Ticket Renewal is failing in Hadoop 2.8+ and Hadoop 3
> --
>
> Key: SPARK-24493
> URL: https://issues.apache.org/jira/browse/SPARK-24493
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 2.3.0
>Reporter: Asif M
>Priority: Major
>
> Kerberos Ticket Renewal is failing on long running spark job. I have added 
> below 2 kerberos properties in the HDFS configuration and ran a spark 
> streaming job 
> ([hdfs_wordcount.py|https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/hdfs_wordcount.py])
> {noformat}
> dfs.namenode.delegation.token.max-lifetime=1800000 (30min)
> dfs.namenode.delegation.token.renew-interval=900000 (15min)
> {noformat}
>  
> Spark Job failed at 15min with below error:
> {noformat}
> 18/06/04 18:56:51 INFO DAGScheduler: ShuffleMapStage 10896 (call at 
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py:2381)
>  failed in 0.218 s due to Job aborted due to stage failure: Task 0 in stage 
> 10896.0 failed 4 times, most recent failure: Lost task 0.3 in stage 10896.0 
> (TID 7290, , executor 1): 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for abcd: HDFS_DELEGATION_TOKEN owner=a...@example.com, 
> renewer=yarn, realUser=, issueDate=1528136773875, maxDate=1528138573875, 
> sequenceNumber=38, masterKeyId=6) is expired, current time: 2018-06-04 
> 18:56:51,276+ expected renewal time: 2018-06-04 18:56:13,875+
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy18.getBlockLocations(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:317)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:856)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:845)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:834)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:998)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:334)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
> at 
> org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:86)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:189)
> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.(NewHadoopRDD.scala:186)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:141)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache

Re: A new link request to my project and one question

2018-06-14 Thread Saisai Shao
Sure, I will merge the website code, thanks!

For the proxyUser thing, I think there's no particular reason not to add it;
maybe we just forgot to add proxyUser support.

It would be better if you could create a JIRA to track this issue. If
you're familiar with the Livy code, you can also submit a PR for it.😀

Thanks
Jerry

Takeshi Yamamuro wrote on Fri, Jun 15, 2018 at 7:33 AM:

> Hi, Livy dev,
>
> I opened a new pr in incubator-livy-website to add a new link in
> third-party-projects.md. It'd be great if you could check this;
> https://github.com/apache/incubator-livy-website/pull/23
>
> Btw, I have one question; currently, we cannot pass proxyUser
> in LivyClientBuilder. Any reason not to add code for that?
> I know we can handle this in an application side by adding a bit code like
>
> https://github.com/maropu/spark-sql-server/blob/master/sql/sql-server/src/main/java/org/apache/livyclient/common/CreateClientRequestWithProxyUser.java
> But, If Livy itself supported this, it'd be nice to me.
>
> Best,
> takeshi
>
> --
> ---
> Takeshi Yamamuro
>


[jira] [Updated] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-13 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated HIVE-16391:
---
Attachment: HIVE-16391.2.patch

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.2.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-13 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511851#comment-16511851
 ] 

Saisai Shao commented on HIVE-16391:


I see. I can keep the "core" classifier and use another name. I will update the 
patch.

[~owen.omalley], would you please help review this patch, since you created a 
Spark JIRA, or could you point to someone in the Hive community who can help 
review? Thanks a lot.

 

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (LIVY-477) Upgrade Livy Scala version to 2.11.12

2018-06-12 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-477.
--
   Resolution: Fixed
Fix Version/s: 0.6.0

> Upgrade Livy Scala version to 2.11.12
> -
>
> Key: LIVY-477
> URL: https://issues.apache.org/jira/browse/LIVY-477
> Project: Livy
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 0.5.0
>    Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
> Fix For: 0.6.0
>
>
> Scala version below 2.11.12 has CVE 
> ([https://scala-lang.org/news/security-update-nov17.html),] and Spark will 
> also upgrade its supported version to 2.11.12. 
> So here upgrading Livy's Scala version also.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (LIVY-477) Upgrade Livy Scala version to 2.11.12

2018-06-12 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509600#comment-16509600
 ] 

Saisai Shao commented on LIVY-477:
--

Issue resolved by pull request 100
[https://github.com/apache/incubator-livy/pull/100]

> Upgrade Livy Scala version to 2.11.12
> -
>
> Key: LIVY-477
> URL: https://issues.apache.org/jira/browse/LIVY-477
> Project: Livy
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 0.5.0
>Reporter: Saisai Shao
>Assignee: Saisai Shao
>Priority: Minor
>
> Scala version below 2.11.12 has CVE 
> ([https://scala-lang.org/news/security-update-nov17.html),] and Spark will 
> also upgrade its supported version to 2.11.12. 
> So here upgrading Livy's Scala version also.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (SPARK-24518) Using Hadoop credential provider API to store password

2018-06-11 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24518:

Description: 
Currently Spark configures passwords in plaintext, for example by putting them in 
the configuration file or passing them as launch arguments. Sometimes such 
configurations, like SSL passwords, are set by the cluster admin and should not 
be seen by users, but today these passwords are world readable to all users.

Hadoop's credential provider API supports storing passwords in a secure way, and 
Spark could read them securely as well, so here I propose to add support for 
using the credential provider API to get passwords.
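
As a sketch of the reading side, assuming Hadoop's credential provider API
with a placeholder jceks path and key name (this is not the eventual Spark
integration):

{code}
import org.apache.hadoop.conf.Configuration

object CredentialProviderExample {
  def main(args: Array[String]): Unit = {
    val hadoopConf = new Configuration()
    // Placeholder credential store created beforehand with the
    // "hadoop credential" CLI; the path and alias are illustrative only.
    hadoopConf.set("hadoop.security.credential.provider.path",
      "jceks://hdfs/user/admin/ssl.jceks")

    // getPassword falls back to the plain config value when no provider holds
    // the alias, so the existing plaintext path keeps working.
    val password: Array[Char] = hadoopConf.getPassword("spark.ssl.keyStorePassword")
    println(if (password == null) "no password found" else "password loaded")
  }
}
{code}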

> Using Hadoop credential provider API to store password
> --
>
> Key: SPARK-24518
> URL: https://issues.apache.org/jira/browse/SPARK-24518
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>    Reporter: Saisai Shao
>Priority: Minor
>
> Currently Spark configures passwords in plaintext, for example by putting them 
> in the configuration file or passing them as launch arguments. Sometimes such 
> configurations, like SSL passwords, are set by the cluster admin and should not 
> be seen by users, but today these passwords are world readable to all users.
> Hadoop's credential provider API supports storing passwords in a secure way, 
> and Spark could read them securely as well, so here I propose to add support 
> for using the credential provider API to get passwords.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24518) Using Hadoop credential provider API to store password

2018-06-11 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24518:

Environment: (was: Current Spark configs password in a plaintext way, 
like putting in the configuration file or adding as a launch arguments, 
sometimes such configurations like SSL password is configured by cluster admin, 
which should not be seen by user, but now this passwords are world readable to 
all the users.

Hadoop credential provider API support storing password in a secure way, in 
which Spark could read it in a secure way, so here propose to add support of 
using credential provider API to get password.)

> Using Hadoop credential provider API to store password
> --
>
> Key: SPARK-24518
> URL: https://issues.apache.org/jira/browse/SPARK-24518
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>    Reporter: Saisai Shao
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24518) Using Hadoop credential provider API to store password

2018-06-11 Thread Saisai Shao (JIRA)
Saisai Shao created SPARK-24518:
---

 Summary: Using Hadoop credential provider API to store password
 Key: SPARK-24518
 URL: https://issues.apache.org/jira/browse/SPARK-24518
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.3.0
 Environment: Current Spark configs password in a plaintext way, like 
putting in the configuration file or adding as a launch arguments, sometimes 
such configurations like SSL password is configured by cluster admin, which 
should not be seen by user, but now this passwords are world readable to all 
the users.

Hadoop credential provider API support storing password in a secure way, in 
which Spark could read it in a secure way, so here propose to add support of 
using credential provider API to get password.
Reporter: Saisai Shao






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (LIVY-477) Upgrade Livy Scala version to 2.11.12

2018-06-10 Thread Saisai Shao (JIRA)
Saisai Shao created LIVY-477:


 Summary: Upgrade Livy Scala version to 2.11.12
 Key: LIVY-477
 URL: https://issues.apache.org/jira/browse/LIVY-477
 Project: Livy
  Issue Type: Improvement
  Components: Build
Affects Versions: 0.5.0
Reporter: Saisai Shao
Assignee: Saisai Shao


Scala version below 2.11.12 has CVE 
([https://scala-lang.org/news/security-update-nov17.html),] and Spark will also 
upgrade its supported version to 2.11.12. 

So here upgrading Livy's Scala version also.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SPARK-23534) Spark run on Hadoop 3.0.0

2018-06-08 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506023#comment-16506023
 ] 

Saisai Shao commented on SPARK-23534:
-

Yes, it is a Hadoop 2.8+ issue, but we don't have a 2.8 profile for Spark; instead 
we have a 3.1 profile, so it will affect our Hadoop 3 build. That's why I also 
added a link here.

> Spark run on Hadoop 3.0.0
> -
>
> Key: SPARK-23534
> URL: https://issues.apache.org/jira/browse/SPARK-23534
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Priority: Major
>
> Major Hadoop vendors already/will step in Hadoop 3.0. So we should also make 
> sure Spark can run with Hadoop 3.0. This Jira tracks the work to make Spark 
> run on Hadoop 3.0.
> The work includes:
>  # Add a Hadoop 3.0.0 new profile to make Spark build-able with Hadoop 3.0.
>  # Test to see if there's dependency issues with Hadoop 3.0.
>  # Investigating the feasibility to use shaded client jars (HADOOP-11804).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Spark YARN job submission error (code 13)

2018-06-08 Thread Saisai Shao
In Spark on YARN, error code 13 means the SparkContext didn't initialize in
time. You can check the YARN application log to get more information.

BTW, did you just write a plain Python script without creating a
SparkContext/SparkSession?

Aakash Basu wrote on Fri, Jun 8, 2018 at 4:15 PM:

> Hi,
>
> I'm trying to run a program on a cluster using YARN.
>
> YARN is present there along with HADOOP.
>
> Problem I'm running into is as below -
>
> Container exited with a non-zero exit code 13
>> Failing this attempt. Failing the application.
>>  ApplicationMaster host: N/A
>>  ApplicationMaster RPC port: -1
>>  queue: default
>>  start time: 1528297574594
>>  final status: FAILED
>>  tracking URL:
>> http://MasterNode:8088/cluster/app/application_1528296308262_0004
>>  user: bblite
>> Exception in thread "main" org.apache.spark.SparkException: Application
>> application_1528296308262_0004 finished with failed status
>>
>
> I checked on the net and most of the stackoverflow problems say, that the
> users have given *.master('local[*]')* in the code while invoking the
> Spark Session and at the same time, giving *--master yarn* while doing
> the spark-submit, hence they're getting the error due to conflict.
>
> But, in my case, I've not mentioned any master at all at the code. Just
> trying to run it on yarn by giving *--master yarn* while doing the
> spark-submit. Below is the code spark invoking -
>
> spark = SparkSession\
> .builder\
> .appName("Temp_Prog")\
> .getOrCreate()
>
> Below is the spark-submit -
>
> *spark-submit --master yarn --deploy-mode cluster --num-executors 3
> --executor-cores 6 --executor-memory 4G
> /appdata/codebase/backend/feature_extraction/try_yarn.py*
>
> I've tried without --deploy-mode too, still no help.
>
> Thanks,
> Aakash.
>


[jira] [Resolved] (SPARK-24487) Add support for RabbitMQ.

2018-06-08 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved SPARK-24487.
-
Resolution: Won't Fix

> Add support for RabbitMQ.
> -
>
> Key: SPARK-24487
> URL: https://issues.apache.org/jira/browse/SPARK-24487
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Michał Jurkiewicz
>Priority: Major
>
> Add support for RabbitMQ.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24487) Add support for RabbitMQ.

2018-06-08 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505786#comment-16505786
 ] 

Saisai Shao commented on SPARK-24487:
-

I'm sure the Spark community will not merge this into the Spark code base. You 
can either maintain a package yourself or contribute it to Apache Bahir.

> Add support for RabbitMQ.
> -
>
> Key: SPARK-24487
> URL: https://issues.apache.org/jira/browse/SPARK-24487
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Michał Jurkiewicz
>Priority: Major
>
> Add support for RabbitMQ.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-23151) Provide a distribution of Spark with Hadoop 3.0

2018-06-08 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-23151:

Issue Type: Sub-task  (was: Dependency upgrade)
Parent: SPARK-23534

> Provide a distribution of Spark with Hadoop 3.0
> ---
>
> Key: SPARK-23151
> URL: https://issues.apache.org/jira/browse/SPARK-23151
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.0, 2.2.1
>Reporter: Louis Burke
>Priority: Major
>
> Provide a Spark package that supports Hadoop 3.0.0. Currently the Spark 
> package
> only supports Hadoop 2.7 i.e. spark-2.2.1-bin-hadoop2.7.tgz. The implication 
> is
> that using up to date Kinesis libraries alongside s3 causes a clash w.r.t
> aws-java-sdk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23151) Provide a distribution of Spark with Hadoop 3.0

2018-06-08 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505783#comment-16505783
 ] 

Saisai Shao commented on SPARK-23151:
-

I will convert this as a subtask of SPARK-23534.

> Provide a distribution of Spark with Hadoop 3.0
> ---
>
> Key: SPARK-23151
> URL: https://issues.apache.org/jira/browse/SPARK-23151
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Spark Core
>Affects Versions: 2.2.0, 2.2.1
>Reporter: Louis Burke
>Priority: Major
>
> Provide a Spark package that supports Hadoop 3.0.0. Currently the Spark 
> package
> only supports Hadoop 2.7 i.e. spark-2.2.1-bin-hadoop2.7.tgz. The implication 
> is
> that using up to date Kinesis libraries alongside s3 causes a clash w.r.t
> aws-java-sdk.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23534) Spark run on Hadoop 3.0.0

2018-06-07 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505771#comment-16505771
 ] 

Saisai Shao commented on SPARK-23534:
-

Spark with Hadoop 3 will fail to renew tokens in the long-running case (with 
keytab and principal).

> Spark run on Hadoop 3.0.0
> -
>
> Key: SPARK-23534
> URL: https://issues.apache.org/jira/browse/SPARK-23534
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 2.3.0
>Reporter: Saisai Shao
>Priority: Major
>
> Major Hadoop vendors already/will step in Hadoop 3.0. So we should also make 
> sure Spark can run with Hadoop 3.0. This Jira tracks the work to make Spark 
> run on Hadoop 3.0.
> The work includes:
>  # Add a Hadoop 3.0.0 new profile to make Spark build-able with Hadoop 3.0.
>  # Test to see if there's dependency issues with Hadoop 3.0.
>  # Investigating the feasibility to use shaded client jars (HADOOP-11804).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24493) Kerberos Ticket Renewal is failing in Hadoop 2.8+ and Hadoop 3

2018-06-07 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505768#comment-16505768
 ] 

Saisai Shao edited comment on SPARK-24493 at 6/8/18 6:56 AM:
-

Adding more background: the issue happens when building Spark with Hadoop 
2.8+.

In Spark, we use the {{TokenIdentifier}} returned by {{decodeIdentifier}} to get the 
renew interval for the HDFS token, but due to a missing service loader file, Hadoop 
fails to create the {{TokenIdentifier}} and returns *null*, which leads to an 
unexpected renew interval (Long.MaxValue).

The related code in Hadoop is:

{code}
private static Class<? extends TokenIdentifier> getClassForIdentifier(Text kind) {
  Class<? extends TokenIdentifier> cls = null;
  synchronized (Token.class) {
    if (tokenKindMap == null) {
      tokenKindMap = Maps.newHashMap();
      for (TokenIdentifier id : ServiceLoader.load(TokenIdentifier.class)) {
        tokenKindMap.put(id.getKind(), id.getClass());
      }
    }
    cls = tokenKindMap.get(kind);
  }
  if (cls == null) {
    LOG.debug("Cannot find class for token kind " + kind);
    return null;
  }
  return cls;
}
{code}

The problem is:

The HDFS "DelegationTokenIdentifier" class is in the hadoop-hdfs-client jar, 
but the service loader descriptor file 
"META-INF/services/org.apache.hadoop.security.token.TokenIdentifier" is in the 
hadoop-hdfs jar. The Spark submit process/driver process (depending on client 
or cluster mode) only relies on the hadoop-hdfs-client jar, not the hadoop-hdfs 
jar, so the ServiceLoader fails to find the HDFS "DelegationTokenIdentifier" 
class and returns null.

The issue is due to the change in HADOOP-6200. Previously we only had build 
profiles for Hadoop 2.6 and 2.7, so there was no issue. But now we have a build 
profile for Hadoop 3.1, so this breaks token renewal on Hadoop 3.1.

This is a Hadoop issue; I am creating a Spark JIRA to track it and will bump 
the version once the Hadoop side is fixed.
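
A quick way to check whether the descriptor is visible is a small classpath 
probe. The sketch below is only an illustration (the class name 
{{TokenKindProbe}} and the literal kind string are my own assumptions, not part 
of any patch here): it lists every {{TokenIdentifier}} implementation the 
ServiceLoader can see. If hadoop-hdfs is not on the classpath, the HDFS 
delegation token kind will be missing, which is exactly the condition that 
makes {{decodeIdentifier}} return null.

{code}
import java.util.ServiceLoader;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.security.token.TokenIdentifier;

// Classpath probe (illustrative only): list every TokenIdentifier the
// ServiceLoader can discover. If hadoop-hdfs.jar (which carries the
// META-INF/services descriptor) is missing, the HDFS delegation token
// kind will not appear and Token.decodeIdentifier() will return null.
public class TokenKindProbe {
  public static void main(String[] args) {
    Text hdfsKind = new Text("HDFS_DELEGATION_TOKEN"); // assumed kind string
    boolean found = false;
    for (TokenIdentifier id : ServiceLoader.load(TokenIdentifier.class)) {
      System.out.println("Found token kind: " + id.getKind()
          + " -> " + id.getClass().getName());
      if (hdfsKind.equals(id.getKind())) {
        found = true;
      }
    }
    System.out.println(found
        ? "HDFS delegation token identifier is resolvable"
        : "HDFS delegation token identifier NOT found on the classpath");
  }
}
{code}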


was (Author: jerryshao):
Adding more background: the issue happens when building Spark with Hadoop 2.8+.

In Spark, we use the {{TokenIdentifier}} returned by {{decodeIdentifier}} to 
get the renew interval for the HDFS token, but due to a missing service loader 
file, Hadoop fails to create the {{TokenIdentifier}} and returns *null*, which 
leads to an unexpected renew interval (Long.MaxValue).

The related code in Hadoop is:
  private static Class<? extends TokenIdentifier> getClassForIdentifier(Text kind) {
    Class<? extends TokenIdentifier> cls = null;
    synchronized (Token.class) {
      if (tokenKindMap == null) {
        tokenKindMap = Maps.newHashMap();
        for (TokenIdentifier id : ServiceLoader.load(TokenIdentifier.class)) {
          tokenKindMap.put(id.getKind(), id.getClass());
        }
      }
      cls = tokenKindMap.get(kind);
    }
    if (cls == null) {
      LOG.debug("Cannot find class for token kind " + kind);
      return null;
    }
    return cls;
  }


The problem is:

The HDFS "DelegationTokenIdentifier" class is in the hadoop-hdfs-client jar, 
but the service loader descriptor file 
"META-INF/services/org.apache.hadoop.security.token.TokenIdentifier" is in the 
hadoop-hdfs jar. The Spark submit process/driver process (depending on client 
or cluster mode) only relies on the hadoop-hdfs-client jar, not the hadoop-hdfs 
jar, so the ServiceLoader fails to find the HDFS "DelegationTokenIdentifier" 
class and returns null.

The issue is due to the change in HADOOP-6200. Previously we only had build 
profiles for Hadoop 2.6 and 2.7, so there was no issue. But now we have a build 
profile for Hadoop 3.1, so this breaks token renewal on Hadoop 3.1.

This is a Hadoop issue; I am creating a Spark JIRA to track it and will bump 
the version once the Hadoop side is fixed.

> Kerberos Ticket Renewal is failing in Hadoop 2.8+ and Hadoop 3
> --
>
> Key: SPARK-24493
> URL: https://issues.apache.org/jira/browse/SPARK-24493
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 2.3.0
>Reporter: Asif M
>Priority: Major
>
> Kerberos Ticket Renewal is failing on long running spark job. I have added 
> below 2 kerberos properties in the HDFS configuration and ran a spark 
> streaming job 
> ([hdfs_wordcount.py|https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/hdfs_wordcount.py])
> {noformat}
> dfs.namenode.delegation.token.max-lifetime=180 (30min)
> dfs.namenode.delegation.token.renew-interval=90 (15min)
> {noformat}
>  
> Spark Job failed at 15min with below error:
> {noformat}
> 18/06/04 18:56:51 INFO DAGScheduler: ShuffleMapStage 10896 (call at 
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gate

[jira] [Commented] (SPARK-24493) Kerberos Ticket Renewal is failing in Hadoop 2.8+ and Hadoop 3

2018-06-07 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505768#comment-16505768
 ] 

Saisai Shao commented on SPARK-24493:
-

Adding more background: the issue happens when building Spark with Hadoop 2.8+.

In Spark, we use the {{TokenIdentifier}} returned by {{decodeIdentifier}} to 
get the renew interval for the HDFS token, but due to a missing service loader 
file, Hadoop fails to create the {{TokenIdentifier}} and returns *null*, which 
leads to an unexpected renew interval (Long.MaxValue).

The related code in Hadoop is:
  private static Class<? extends TokenIdentifier> getClassForIdentifier(Text kind) {
    Class<? extends TokenIdentifier> cls = null;
    synchronized (Token.class) {
      if (tokenKindMap == null) {
        tokenKindMap = Maps.newHashMap();
        for (TokenIdentifier id : ServiceLoader.load(TokenIdentifier.class)) {
          tokenKindMap.put(id.getKind(), id.getClass());
        }
      }
      cls = tokenKindMap.get(kind);
    }
    if (cls == null) {
      LOG.debug("Cannot find class for token kind " + kind);
      return null;
    }
    return cls;
  }


The problem is:

The HDFS "DelegationTokenIdentifier" class is in the hadoop-hdfs-client jar, 
but the service loader descriptor file 
"META-INF/services/org.apache.hadoop.security.token.TokenIdentifier" is in the 
hadoop-hdfs jar. The Spark submit process/driver process (depending on client 
or cluster mode) only relies on the hadoop-hdfs-client jar, not the hadoop-hdfs 
jar, so the ServiceLoader fails to find the HDFS "DelegationTokenIdentifier" 
class and returns null.

The issue is due to the change in HADOOP-6200. Previously we only had build 
profiles for Hadoop 2.6 and 2.7, so there was no issue. But now we have a build 
profile for Hadoop 3.1, so this breaks token renewal on Hadoop 3.1.

This is a Hadoop issue; I am creating a Spark JIRA to track it and will bump 
the version once the Hadoop side is fixed.

> Kerberos Ticket Renewal is failing in Hadoop 2.8+ and Hadoop 3
> --
>
> Key: SPARK-24493
> URL: https://issues.apache.org/jira/browse/SPARK-24493
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 2.3.0
>Reporter: Asif M
>Priority: Major
>
> Kerberos Ticket Renewal is failing on long running spark job. I have added 
> below 2 kerberos properties in the HDFS configuration and ran a spark 
> streaming job 
> ([hdfs_wordcount.py|https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/hdfs_wordcount.py])
> {noformat}
> dfs.namenode.delegation.token.max-lifetime=180 (30min)
> dfs.namenode.delegation.token.renew-interval=90 (15min)
> {noformat}
>  
> Spark Job failed at 15min with below error:
> {noformat}
> 18/06/04 18:56:51 INFO DAGScheduler: ShuffleMapStage 10896 (call at 
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py:2381)
>  failed in 0.218 s due to Job aborted due to stage failure: Task 0 in stage 
> 10896.0 failed 4 times, most recent failure: Lost task 0.3 in stage 10896.0 
> (TID 7290, , executor 1): 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for abcd: HDFS_DELEGATION_TOKEN owner=a...@example.com, 
> renewer=yarn, realUser=, issueDate=1528136773875, maxDate=1528138573875, 
> sequenceNumber=38, masterKeyId=6) is expired, current time: 2018-06-04 
> 18:56:51,276+ expected renewal time: 2018-06-04 18:56:13,875+
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy18.getBlockLocations(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:317)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> 

[jira] [Updated] (SPARK-24493) Kerberos Ticket Renewal is failing in Hadoop 2.8+ and Hadoop 3

2018-06-07 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24493:

Summary: Kerberos Ticket Renewal is failing in Hadoop 2.8+ and Hadoop 3  
(was: Kerberos Ticket Renewal is failing in long running Spark job)

> Kerberos Ticket Renewal is failing in Hadoop 2.8+ and Hadoop 3
> --
>
> Key: SPARK-24493
> URL: https://issues.apache.org/jira/browse/SPARK-24493
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 2.3.0
>Reporter: Asif M
>Priority: Major
>
> Kerberos Ticket Renewal is failing on long running spark job. I have added 
> below 2 kerberos properties in the HDFS configuration and ran a spark 
> streaming job 
> ([hdfs_wordcount.py|https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/hdfs_wordcount.py])
> {noformat}
> dfs.namenode.delegation.token.max-lifetime=180 (30min)
> dfs.namenode.delegation.token.renew-interval=90 (15min)
> {noformat}
>  
> Spark Job failed at 15min with below error:
> {noformat}
> 18/06/04 18:56:51 INFO DAGScheduler: ShuffleMapStage 10896 (call at 
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py:2381)
>  failed in 0.218 s due to Job aborted due to stage failure: Task 0 in stage 
> 10896.0 failed 4 times, most recent failure: Lost task 0.3 in stage 10896.0 
> (TID 7290, , executor 1): 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for abcd: HDFS_DELEGATION_TOKEN owner=a...@example.com, 
> renewer=yarn, realUser=, issueDate=1528136773875, maxDate=1528138573875, 
> sequenceNumber=38, masterKeyId=6) is expired, current time: 2018-06-04 
> 18:56:51,276+ expected renewal time: 2018-06-04 18:56:13,875+
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy18.getBlockLocations(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:317)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:856)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:845)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:834)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:998)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:334)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
> at 
> org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:86)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:189)
> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.(NewHadoopRDD.scala:186)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:141)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spa

[jira] [Commented] (SPARK-24487) Add support for RabbitMQ.

2018-06-07 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505760#comment-16505760
 ] 

Saisai Shao commented on SPARK-24487:
-

You can add it in your own package using the DataSource API. There's no need to 
add it to the Spark code base.
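
For reference, the consumer side of such an external package would look roughly 
like the sketch below. It is only an illustration under assumptions: the short 
name "rabbitmq" and the host/queue options are hypothetical and would be defined 
by the third-party connector, not by Spark itself.

{code}
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Sketch only: "rabbitmq" is a hypothetical source short name that an
// external connector would register via its own DataSourceRegister;
// Spark itself does not ship it. Options are likewise illustrative.
public class RabbitMqReadExample {
  public static void main(String[] args) throws Exception {
    SparkSession spark = SparkSession.builder()
        .appName("rabbitmq-source-example")
        .getOrCreate();

    Dataset<Row> messages = spark.readStream()
        .format("rabbitmq")            // provided by the external package
        .option("host", "localhost")
        .option("queue", "events")
        .load();

    // Echo the stream to the console just to show the wiring.
    messages.writeStream()
        .format("console")
        .start()
        .awaitTermination();
  }
}
{code}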

 

> Add support for RabbitMQ.
> -
>
> Key: SPARK-24487
> URL: https://issues.apache.org/jira/browse/SPARK-24487
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Michał Jurkiewicz
>Priority: Major
>
> Add support for RabbitMQ.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24493) Kerberos Ticket Renewal is failing in long running Spark job

2018-06-07 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24493:

Component/s: Spark Core

> Kerberos Ticket Renewal is failing in long running Spark job
> 
>
> Key: SPARK-24493
> URL: https://issues.apache.org/jira/browse/SPARK-24493
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 2.3.0
>Reporter: Asif M
>Priority: Major
>
> Kerberos Ticket Renewal is failing on long running spark job. I have added 
> below 2 kerberos properties in the HDFS configuration and ran a spark 
> streaming job 
> ([hdfs_wordcount.py|https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/hdfs_wordcount.py])
> {noformat}
> dfs.namenode.delegation.token.max-lifetime=180 (30min)
> dfs.namenode.delegation.token.renew-interval=90 (15min)
> {noformat}
>  
> Spark Job failed at 15min with below error:
> {noformat}
> 18/06/04 18:56:51 INFO DAGScheduler: ShuffleMapStage 10896 (call at 
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py:2381)
>  failed in 0.218 s due to Job aborted due to stage failure: Task 0 in stage 
> 10896.0 failed 4 times, most recent failure: Lost task 0.3 in stage 10896.0 
> (TID 7290, , executor 1): 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for abcd: HDFS_DELEGATION_TOKEN owner=a...@example.com, 
> renewer=yarn, realUser=, issueDate=1528136773875, maxDate=1528138573875, 
> sequenceNumber=38, masterKeyId=6) is expired, current time: 2018-06-04 
> 18:56:51,276+ expected renewal time: 2018-06-04 18:56:13,875+
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy18.getBlockLocations(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:317)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:856)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:845)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:834)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:998)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:334)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
> at 
> org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:86)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:189)
> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.(NewHadoopRDD.scala:186)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:141)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at

[jira] [Updated] (SPARK-24493) Kerberos Ticket Renewal is failing in long running Spark job

2018-06-07 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24493:

Component/s: (was: Spark Core)
 YARN

> Kerberos Ticket Renewal is failing in long running Spark job
> 
>
> Key: SPARK-24493
> URL: https://issues.apache.org/jira/browse/SPARK-24493
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Asif M
>Priority: Major
>
> Kerberos Ticket Renewal is failing on long running spark job. I have added 
> below 2 kerberos properties in the HDFS configuration and ran a spark 
> streaming job 
> ([hdfs_wordcount.py|https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/hdfs_wordcount.py])
> {noformat}
> dfs.namenode.delegation.token.max-lifetime=180 (30min)
> dfs.namenode.delegation.token.renew-interval=90 (15min)
> {noformat}
>  
> Spark Job failed at 15min with below error:
> {noformat}
> 18/06/04 18:56:51 INFO DAGScheduler: ShuffleMapStage 10896 (call at 
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py:2381)
>  failed in 0.218 s due to Job aborted due to stage failure: Task 0 in stage 
> 10896.0 failed 4 times, most recent failure: Lost task 0.3 in stage 10896.0 
> (TID 7290, , executor 1): 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for abcd: HDFS_DELEGATION_TOKEN owner=a...@example.com, 
> renewer=yarn, realUser=, issueDate=1528136773875, maxDate=1528138573875, 
> sequenceNumber=38, masterKeyId=6) is expired, current time: 2018-06-04 
> 18:56:51,276+ expected renewal time: 2018-06-04 18:56:13,875+
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy18.getBlockLocations(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:317)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:856)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:845)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:834)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:998)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:334)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
> at 
> org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:86)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:189)
> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.(NewHadoopRDD.scala:186)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:141)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
&

[jira] [Updated] (SPARK-24493) Kerberos Ticket Renewal is failing in long running Spark job

2018-06-07 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated SPARK-24493:

Priority: Major  (was: Blocker)

> Kerberos Ticket Renewal is failing in long running Spark job
> 
>
> Key: SPARK-24493
> URL: https://issues.apache.org/jira/browse/SPARK-24493
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.3.0
>Reporter: Asif M
>Priority: Major
>
> Kerberos Ticket Renewal is failing on long running spark job. I have added 
> below 2 kerberos properties in the HDFS configuration and ran a spark 
> streaming job 
> ([hdfs_wordcount.py|https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/hdfs_wordcount.py])
> {noformat}
> dfs.namenode.delegation.token.max-lifetime=180 (30min)
> dfs.namenode.delegation.token.renew-interval=90 (15min)
> {noformat}
>  
> Spark Job failed at 15min with below error:
> {noformat}
> 18/06/04 18:56:51 INFO DAGScheduler: ShuffleMapStage 10896 (call at 
> /usr/hdp/current/spark2-client/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py:2381)
>  failed in 0.218 s due to Job aborted due to stage failure: Task 0 in stage 
> 10896.0 failed 4 times, most recent failure: Lost task 0.3 in stage 10896.0 
> (TID 7290, , executor 1): 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (token for abcd: HDFS_DELEGATION_TOKEN owner=a...@example.com, 
> renewer=yarn, realUser=, issueDate=1528136773875, maxDate=1528138573875, 
> sequenceNumber=38, masterKeyId=6) is expired, current time: 2018-06-04 
> 18:56:51,276+ expected renewal time: 2018-06-04 18:56:13,875+
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
> at org.apache.hadoop.ipc.Client.call(Client.java:1445)
> at org.apache.hadoop.ipc.Client.call(Client.java:1355)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
> at com.sun.proxy.$Proxy18.getBlockLocations(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:317)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy19.getBlockLocations(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:856)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:845)
> at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:834)
> at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:998)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:326)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:334)
> at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:950)
> at 
> org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:86)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1.liftedTree1$1(NewHadoopRDD.scala:189)
> at org.apache.spark.rdd.NewHadoopRDD$$anon$1.(NewHadoopRDD.scala:186)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:141)
> at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:70)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:105)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at

[jira] [Commented] (SPARK-24487) Add support for RabbitMQ.

2018-06-07 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505608#comment-16505608
 ] 

Saisai Shao commented on SPARK-24487:
-

What's the use case and purpose of integrating RabbitMQ with Spark?

> Add support for RabbitMQ.
> -
>
> Key: SPARK-24487
> URL: https://issues.apache.org/jira/browse/SPARK-24487
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Michał Jurkiewicz
>Priority: Major
>
> Add support for RabbitMQ.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-07 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16505607#comment-16505607
 ] 

Saisai Shao commented on HIVE-16391:


Any comment [~vanzin] [~ste...@apache.org]?

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Reopened] (LIVY-476) Changing log-level INFO

2018-06-07 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reopened LIVY-476:
--

> Changing log-level INFO
> ---
>
> Key: LIVY-476
> URL: https://issues.apache.org/jira/browse/LIVY-476
> Project: Livy
>  Issue Type: Question
>  Components: API
>Affects Versions: 0.4.0
> Environment: Azure 
>Reporter: Antoine Ly
>Priority: Major
> Attachments: scrren.jpg
>
>
> I am using the LIVY API with a web application. We use it to submit Spark 
> jobs and then collect the results to print them on the front page. To do so, 
> we have to interpret the LIVY callback.
> However, despite changing the log level (log4j params set to ERROR instead 
> of INFO), we still get INFO messages that prevent the application from 
> properly interpreting the LIVY output (see image below). It happens only 
> with an RSpark session.
> We also tried to change the log4j log level directly in HDFS and Spark, but 
> with no success...
> Could you please help us remove the underlined lines below?
>  
> Thanks a lot.
> !scrren.jpg!
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (LIVY-476) Changing log-level INFO

2018-06-07 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao resolved LIVY-476.
--
Resolution: Not A Problem

> Changing log-level INFO
> ---
>
> Key: LIVY-476
> URL: https://issues.apache.org/jira/browse/LIVY-476
> Project: Livy
>  Issue Type: Question
>  Components: API
>Affects Versions: 0.4.0
> Environment: Azure 
>Reporter: Antoine Ly
>Priority: Major
> Attachments: scrren.jpg
>
>
> I am using the LIVY API with a web application. We use it to submit Spark 
> jobs and then collect the results to print them on the front page. To do so, 
> we have to interpret the LIVY callback.
> However, despite changing the log level (log4j params set to ERROR instead 
> of INFO), we still get INFO messages that prevent the application from 
> properly interpreting the LIVY output (see image below). It happens only 
> with an RSpark session.
> We also tried to change the log4j log level directly in HDFS and Spark, but 
> with no success...
> Could you please help us remove the underlined lines below?
>  
> Thanks a lot.
> !scrren.jpg!
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (LIVY-476) Changing log-level INFO

2018-06-07 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16504545#comment-16504545
 ] 

Saisai Shao commented on LIVY-476:
--

I'm not quite sure what your issue is; can you please explain more?

> Changing log-level INFO
> ---
>
> Key: LIVY-476
> URL: https://issues.apache.org/jira/browse/LIVY-476
> Project: Livy
>  Issue Type: Question
>  Components: API
>Affects Versions: 0.4.0
> Environment: Azure 
>Reporter: Antoine Ly
>Priority: Major
> Attachments: scrren.jpg
>
>
> I am using the LIVY API with a web application. We use it to submit Spark 
> jobs and then collect the results to print them on the front page. To do so, 
> we have to interpret the LIVY callback.
> However, despite changing the log level (log4j params set to ERROR instead 
> of INFO), we still get INFO messages that prevent the application from 
> properly interpreting the LIVY output (see image below). It happens only 
> with an RSpark session.
> We also tried to change the log4j log level directly in HDFS and Spark, but 
> with no success...
> Could you please help us remove the underlined lines below?
>  
> Thanks a lot.
> !scrren.jpg!
>  
>  
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-06 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16503188#comment-16503188
 ] 

Saisai Shao commented on HIVE-16391:


Uploaded a new patch [^HIVE-16391.1.patch] to use the solution mentioned by 
Marcelo.

It simply adds two new Maven modules and renames the original "hive-exec" 
module. One new module is "hive-exec", which stays compatible with existing 
Hive users; the other, "hive-exec-spark", is specifically for Spark.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-06 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao updated HIVE-16391:
---
Attachment: HIVE-16391.1.patch

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.1.patch, HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-06 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502976#comment-16502976
 ] 

Saisai Shao commented on HIVE-16391:


[~vanzin] one problem with your proposed solution: the hive-exec test jar is no 
longer valid, because we changed the artifact name of the current "hive-exec" 
pom. This might affect users who rely on this test jar.

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HIVE-16391) Publish proper Hive 1.2 jars (without including all dependencies in uber jar)

2018-06-05 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/HIVE-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502756#comment-16502756
 ] 

Saisai Shao commented on HIVE-16391:


{quote}The problem with that is that it changes the meaning of Hive's 
artifacts, so anybody currently importing hive-exec would see a breakage, and 
that's probably not desired.
{quote}
 
 This might not be acceptable to the Hive community, because it would break 
current users, as you mentioned.

As [~joshrosen] mentioned, Spark wants the hive-exec jar that shades kryo and 
protobuf-java, not a pure non-shaded jar.
{quote}Another option is to change the artifact name of the current "hive-exec" 
pom. Then you'd publish the normal jar under the new artifact name, then have a 
separate module that imports that jar, shades dependencies, and publishes the 
result as "hive-exec". That would maintain compatibility with existing 
artifacts.
{quote}
I can try this approach, but it is not a small change for Hive, and I'm not 
sure the Hive community will accept it (at least for branch 1.2).

> Publish proper Hive 1.2 jars (without including all dependencies in uber jar)
> -
>
> Key: HIVE-16391
> URL: https://issues.apache.org/jira/browse/HIVE-16391
> Project: Hive
>  Issue Type: Task
>  Components: Build Infrastructure
>Affects Versions: 1.2.2
>Reporter: Reynold Xin
>Assignee: Saisai Shao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.2.3
>
> Attachments: HIVE-16391.patch
>
>
> Apache Spark currently depends on a forked version of Apache Hive. AFAIK, the 
> only change in the fork is to work around the issue that Hive publishes only 
> two sets of jars: one set with no dependency declared, and another with all 
> the dependencies included in the published uber jar. That is to say, Hive 
> doesn't publish a set of jars with the proper dependencies declared.
> There is general consensus on both sides that we should remove the forked 
> Hive.
> The change in the forked version is recorded here 
> https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2
> Note that the fork in the past included other fixes but those have all become 
> unnecessary.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

