Jakob Odersky created SPARK-12350:
-------------------------------------

Summary: VectorAssembler#transform() initially throws an exception
Key: SPARK-12350
URL: https://issues.apache.org/jira/browse/SPARK-12350
Project: Spark
Issue Type: Bug
Components: ML
Environment: sparkShell command from sbt
Reporter: Jakob Odersky
The first call to VectorAssembler.transform() throws an exception; subsequent calls work.

h3. Steps to reproduce

In spark-shell:

1. Create a dummy DataFrame and define an assembler:
{code}
import org.apache.spark.ml.feature.VectorAssembler

val df = sc.parallelize(List((1, 2), (3, 4))).toDF
val assembler = new VectorAssembler().setInputCols(Array("_1", "_2")).setOutputCol("features")
{code}

2. Run:
{code}
assembler.transform(df).show
{code}

On the first call, the following exception is thrown:
{code}
15/12/15 16:20:19 ERROR TransportRequestHandler: Error opening stream /classes/org/apache/spark/sql/catalyst/expressions/Object.class for request from /9.72.139.102:60610
java.lang.IllegalArgumentException: requirement failed: File not found: /classes/org/apache/spark/sql/catalyst/expressions/Object.class
	at scala.Predef$.require(Predef.scala:233)
	at org.apache.spark.rpc.netty.NettyStreamManager.openStream(NettyStreamManager.scala:60)
	at org.apache.spark.network.server.TransportRequestHandler.processStreamRequest(TransportRequestHandler.java:136)
	at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:106)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)
{code}

Subsequent calls work:
{code}
+---+---+---------+
| _1| _2| features|
+---+---+---------+
|  1|  2|[1.0,2.0]|
|  3|  4|[3.0,4.0]|
+---+---+---------+
{code}

It seems as though some internal state is not initialized before the first call.

[~iyounus] originally found this issue.
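For reference, the table printed by the successful calls shows the expected behavior: each row's input columns (_1 and _2) are converted to doubles and concatenated into a single "features" vector. The plain-Scala sketch below models only that row-level mapping, assuming integer input columns; it does not use Spark, is not Spark's implementation, and the names `AssembleSketch` and `assemble` are hypothetical.

```scala
// Hypothetical plain-Scala model of the expected assembly step.
// It does NOT use Spark and is not how VectorAssembler is implemented.
object AssembleSketch {
  // Convert the selected integer columns of one row to doubles and
  // concatenate them, in inputCols order, into a single vector.
  def assemble(row: Map[String, Int], inputCols: Seq[String]): Vector[Double] =
    inputCols.map(c => row(c).toDouble).toVector

  def main(args: Array[String]): Unit = {
    // The same two rows as the dummy DataFrame in the reproduction steps.
    val rows = List(Map("_1" -> 1, "_2" -> 2), Map("_1" -> 3, "_2" -> 4))
    val features = rows.map(r => assemble(r, Seq("_1", "_2")))
    // Print each vector in the same bracketed style as the "features" column.
    features.foreach(f => println(f.mkString("[", ",", "]")))
  }
}
```

Running `main` prints `[1.0,2.0]` and `[3.0,4.0]`, matching the "features" column that the second and later calls to `assembler.transform(df).show` produce.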