[jira] [Comment Edited] (SPARK-13065) streaming-twitter pass twitter4j.FilterQuery argument to TwitterUtils.createStream()
[ https://issues.apache.org/jira/browse/SPARK-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128832#comment-15128832 ] Andrew Davidson edited comment on SPARK-13065 at 2/2/16 7:20 PM: - Hi Sachin, I attached my Java implementation for this enhancement as a reference. I also changed the description above and added the code I use in my streaming Spark app's main(). I chose a bad name for the attachment; it's not in patch format. Kind regards, Andy was (Author: aedwip): sorry, bad name. It's not in patch format. > streaming-twitter pass twitter4j.FilterQuery argument to > TwitterUtils.createStream() > > > Key: SPARK-13065 > URL: https://issues.apache.org/jira/browse/SPARK-13065 > Project: Spark > Issue Type: Improvement > Components: Streaming > Affects Versions: 1.6.0 > Environment: all > Reporter: Andrew Davidson > Priority: Minor > Labels: twitter > Attachments: twitterFilterQueryPatch.tar.gz > > Original Estimate: 2h > Remaining Estimate: 2h > > The Twitter stream API is very powerful and provides a lot of support for > twitter.com-side filtering of status objects. Whenever possible we want to > let Twitter do as much work as possible for us. > Currently the Spark Twitter API only allows you to configure a small subset > of possible filters > String[] filters = {"tag1", "tag2"}; > JavaDStream tweets = TwitterUtils.createStream(ssc, twitterAuth, > filters); > The current implementation does > private[streaming] > class TwitterReceiver( > twitterAuth: Authorization, > filters: Seq[String], > storageLevel: StorageLevel > ) extends Receiver[Status](storageLevel) with Logging { > . . . > val query = new FilterQuery > if (filters.size > 0) { > query.track(filters.mkString(",")) > newTwitterStream.filter(query) > } else { > newTwitterStream.sample() > } > ... > Rather than construct the FilterQuery object in TwitterReceiver.onStart(), we > should be able to pass a FilterQuery object. > Looks like an easy fix. 
See source code links below. > Kind regards > Andy > https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L60 > https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L89 > 2/2/16 > Attached is my Java implementation for this problem. Feel free to reuse it > however you like. In my streaming Spark app main() I have the following code > FilterQuery query = config.getFilterQuery().fetch(); > if (query != null) { > // TODO https://issues.apache.org/jira/browse/SPARK-13065 > tweets = TwitterFilterQueryUtils.createStream(ssc, twitterAuth, > query); > } /*else > Spark native API > String[] filters = {"tag1", "tag2"}; > tweets = TwitterUtils.createStream(ssc, twitterAuth, filters); > > see > https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L89 > > causes > val query = new FilterQuery > if (filters.size > 0) { > query.track(filters.mkString(",")) > newTwitterStream.filter(query) > } */ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
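The change the issue asks for is small: let callers hand the receiver a ready-made FilterQuery instead of only a list of track strings. The real API is Scala/twitter4j; the Python sketch below only models the proposed control flow, and every name in it (FilterQuery stand-in, choose_stream_mode) is illustrative, not Spark's actual code.

```python
# Illustrative model of the SPARK-13065 proposal; not the real
# twitter4j/Spark classes. All names here are hypothetical.

class FilterQuery:
    """Stand-in for twitter4j.FilterQuery (track terms only)."""
    def __init__(self):
        self.tracks = []

    def track(self, csv_terms):
        self.tracks = csv_terms.split(",")
        return self

def choose_stream_mode(query=None, filters=()):
    """Return ("filter", query) or ("sample", None).

    Current behavior: the receiver builds a FilterQuery from `filters`.
    Proposed behavior: if the caller supplies `query`, use it as-is, so
    the full twitter4j filtering surface (follow, locations, ...) is
    available instead of only track terms.
    """
    if query is not None:
        return ("filter", query)
    if filters:
        return ("filter", FilterQuery().track(",".join(filters)))
    return ("sample", None)
```

With no query and no filters the receiver falls back to sampling, exactly as the quoted onStart() does today.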
[jira] [Updated] (SPARK-13094) No encoder implicits for Seq[Primitive]
[ https://issues.apache.org/jira/browse/SPARK-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13094: - Assignee: Michael Armbrust > No encoder implicits for Seq[Primitive] > --- > > Key: SPARK-13094 > URL: https://issues.apache.org/jira/browse/SPARK-13094 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.6.0 > Reporter: Deenar Toraskar > Assignee: Michael Armbrust > Fix For: 1.6.1, 2.0.0 > > > Dataset aggregators with complex types fail with "Unable to find encoder for > type stored in a Dataset", even though Datasets with these complex types are > supported. > {code} > val arraySum = new Aggregator[Seq[Float], Seq[Float], > Seq[Float]] with Serializable { > def zero: Seq[Float] = Nil > // The initial value. > def reduce(currentSum: Seq[Float], currentRow: Seq[Float]) = > sumArray(currentSum, currentRow) > def merge(sum: Seq[Float], row: Seq[Float]) = sumArray(sum, row) > def finish(b: Seq[Float]) = b // Return the final result. > def sumArray(a: Seq[Float], b: Seq[Float]): Seq[Float] = { > (a, b) match { > case (Nil, Nil) => Nil > case (Nil, row) => row > case (sum, Nil) => sum > case (sum, row) => (a, b).zipped.map { case (a, b) => a + b } > } > } > }.toColumn > {code} > {code} > :47: error: Unable to find encoder for type stored in a Dataset. > Primitive types (Int, String, etc) and Product types (case classes) are > supported by importing sqlContext.implicits._ Support for serializing other > types will be added in future releases. > }.toColumn > {code}
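The aggregator's sumArray helper is plain element-wise addition with empty-sequence handling; the encoder problem is orthogonal to it. A direct Python transcription of that helper, just to make the intended semantics concrete:

```python
def sum_arrays(a, b):
    """Element-wise sum of two equal-length sequences.

    An empty side is treated as the identity, mirroring the Nil cases
    in the Scala sumArray quoted in the issue above.
    """
    if not a:
        return list(b)
    if not b:
        return list(a)
    return [x + y for x, y in zip(a, b)]
```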
[jira] [Resolved] (SPARK-12783) Dataset map serialization error
[ https://issues.apache.org/jira/browse/SPARK-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12783. -- Resolution: Fixed Fix Version/s: 1.6.1 Closing, please reopen if you can reproduce in 1.6.1-RC1. > Dataset map serialization error > --- > > Key: SPARK-12783 > URL: https://issues.apache.org/jira/browse/SPARK-12783 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Muthu Jayakumar >Assignee: Wenchen Fan >Priority: Critical > Fix For: 1.6.1 > > Attachments: MyMap.scala > > > When Dataset API is used to map to another case class, an error is thrown. > {code} > case class MyMap(map: Map[String, String]) > case class TestCaseClass(a: String, b: String){ > def toMyMap: MyMap = { > MyMap(Map(a->b)) > } > def toStr: String = { > a > } > } > //Main method section below > import sqlContext.implicits._ > val df1 = sqlContext.createDataset(Seq(TestCaseClass("2015-05-01", "data1"), > TestCaseClass("2015-05-01", "data2"))).toDF() > df1.as[TestCaseClass].map(_.toStr).show() //works fine > df1.as[TestCaseClass].map(_.toMyMap).show() //fails > {code} > Error message: > {quote} > Caused by: java.io.NotSerializableException: > scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$1 > Serialization stack: > - object not serializable (class: > scala.reflect.runtime.SynchronizedSymbols$SynchronizedSymbol$$anon$1, value: > package lang) > - field (class: scala.reflect.internal.Types$ThisType, name: sym, type: > class scala.reflect.internal.Symbols$Symbol) > - object (class scala.reflect.internal.Types$UniqueThisType, > java.lang.type) > - field (class: scala.reflect.internal.Types$TypeRef, name: pre, type: > class scala.reflect.internal.Types$Type) > - object (class scala.reflect.internal.Types$ClassNoArgsTypeRef, String) > - field (class: scala.reflect.internal.Types$TypeRef, name: normalized, > type: class scala.reflect.internal.Types$Type) > - object (class 
scala.reflect.internal.Types$AliasNoArgsTypeRef, String) > - field (class: > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, name: keyType$1, > type: class scala.reflect.api.Types$TypeApi) > - object (class > org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$6, ) > - field (class: org.apache.spark.sql.catalyst.expressions.MapObjects, > name: function, type: interface scala.Function1) > - object (class org.apache.spark.sql.catalyst.expressions.MapObjects, > mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType)) > - field (class: org.apache.spark.sql.catalyst.expressions.Invoke, name: > targetObject, type: class > org.apache.spark.sql.catalyst.expressions.Expression) > - object (class org.apache.spark.sql.catalyst.expressions.Invoke, > invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType),array,ObjectType(class > [Ljava.lang.Object;))) > - writeObject data (class: > scala.collection.immutable.List$SerializationProxy) > - object (class scala.collection.immutable.List$SerializationProxy, > scala.collection.immutable.List$SerializationProxy@4c7e3aab) > - writeReplace data (class: > scala.collection.immutable.List$SerializationProxy) > - object (class scala.collection.immutable.$colon$colon, > List(invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > "collector.MyMap"),keyArray,ArrayType(StringType,true)),StringType),array,ObjectType(class > [Ljava.lang.Object;)), > invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: > 
"collector.MyMap"),valueArray,ArrayType(StringType,true)),StringType),array,ObjectType(class > [Ljava.lang.Object; > - field (class: org.apache.spark.sql.catalyst.expressions.StaticInvoke, > name: arguments, type: interface scala.collection.Seq) > - object (class org.apache.spark.sql.catalyst.expressions.StaticInvoke, > staticinvoke(class > org.apache.spark.sql.catalyst.util.ArrayBasedMapData$,ObjectType(interface > scala.collection.Map),toScalaMap,invoke(mapobjects(,invoke(upcast('map,MapType(StringType,StringType,true),- > field (class: "scala.collection.immutable.Map", name: "map"),- root class: >
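The trace above is the JVM refusing to serialize a scala.reflect symbol captured inside the generated expression tree. The failure class itself (a closure or object carrying an unserializable field) is easy to reproduce in any runtime; a Python analogy using pickle in place of Java serialization, purely for illustration:

```python
import pickle
import threading

class Wrapper:
    """Carries a field the serializer cannot handle, the way the
    MapObjects expression above carries a scala.reflect symbol."""
    def __init__(self):
        self.lock = threading.Lock()  # locks are not picklable

def is_serializable(obj):
    """True if obj round-trips through the serializer."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False
```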
[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots
[ https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128745#comment-15128745 ] Wenchen Fan commented on SPARK-12988: - I'd also like to forbid using invalid column names in `drop` > Can't drop columns that contain dots > > > Key: SPARK-12988 > URL: https://issues.apache.org/jira/browse/SPARK-12988 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.6.0 > Reporter: Michael Armbrust > > Neither of these works: > {code} > val df = Seq((1, 1)).toDF("a_b", "a.c") > df.drop("a.c").collect() > df: org.apache.spark.sql.DataFrame = [a_b: int, a.c: int] > {code} > {code} > val df = Seq((1, 1)).toDF("a_b", "a.c") > df.drop("`a.c`").collect() > df: org.apache.spark.sql.DataFrame = [a_b: int, a.c: int] > {code} > Given that you can't use drop to drop subfields, it seems to me that we > should treat the column name literally (i.e. as though it is wrapped in back > ticks).
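The fix proposed in the description is to treat the name passed to drop literally, as if it were backtick-quoted. A small Python sketch of that quoting rule (the helper names are hypothetical; Spark's real resolution happens in the Catalyst analyzer):

```python
def quote_column(name):
    """Wrap a column name in backticks so a dot is part of the name,
    not a struct-field accessor; escape embedded backticks."""
    return "`" + name.replace("`", "``") + "`"

def drop_column(schema, name):
    """Drop by literal name match, ignoring any dot inside the name
    and accepting an already-backticked form."""
    literal = name.strip("`")
    return [f for f in schema if f != literal]
```

Under this rule both df.drop("a.c") and df.drop("`a.c`") would remove the "a.c" column.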
[jira] [Resolved] (SPARK-12631) Make Parameter Descriptions Consistent for PySpark MLlib Clustering
[ https://issues.apache.org/jira/browse/SPARK-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-12631. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10610 [https://github.com/apache/spark/pull/10610] > Make Parameter Descriptions Consistent for PySpark MLlib Clustering > --- > > Key: SPARK-12631 > URL: https://issues.apache.org/jira/browse/SPARK-12631 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 1.6.0 >Reporter: Bryan Cutler >Assignee: Bryan Cutler >Priority: Trivial > Labels: doc, starter > Fix For: 2.0.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > Follow example parameter description format from parent task to fix up > clustering.py
[jira] [Commented] (SPARK-13120) Shade protobuf-java
[ https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128759#comment-15128759 ] Ted Yu commented on SPARK-13120: https://groups.google.com/forum/#!topic/protobuf/wAqvtPLBsE8 PB2 and PB3 are wire-compatible, but protobuf-java itself is not, so the dependency will be a problem. Shading protobuf-java would provide a better experience for downstream projects. > Shade protobuf-java > --- > > Key: SPARK-13120 > URL: https://issues.apache.org/jira/browse/SPARK-13120 > Project: Spark > Issue Type: Improvement > Components: Build > Reporter: Ted Yu > > See this thread for background information: > http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis > This issue shades com.google.protobuf:protobuf-java as > org.spark-project.protobuf
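For reference, relocating a dependency with the maven-shade-plugin looks like the fragment below. This is a generic illustration of the technique the issue proposes (the relocation pattern matches the issue text), not the exact change merged into Spark's build files:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <relocations>
      <relocation>
        <!-- rewrite com.google.protobuf.* references in the shaded jar -->
        <pattern>com.google.protobuf</pattern>
        <shadedPattern>org.spark-project.protobuf</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```

After relocation, downstream projects can depend on their own protobuf-java version without clashing with the copy bundled in the Spark jar.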
[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots
[ https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128829#comment-15128829 ] Yan commented on SPARK-12988: - My thinking is that projections should parse the column names, while the schema-based ops should keep the names as-is. One thing I'm not sure about is "Column". Given its current capabilities, it seems it is for projections, so its name should be backticked if it contains a '.'. But please correct me if I'm wrong here. > Can't drop columns that contain dots > > > Key: SPARK-12988 > URL: https://issues.apache.org/jira/browse/SPARK-12988 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 1.6.0 > Reporter: Michael Armbrust > > Neither of these works: > {code} > val df = Seq((1, 1)).toDF("a_b", "a.c") > df.drop("a.c").collect() > df: org.apache.spark.sql.DataFrame = [a_b: int, a.c: int] > {code} > {code} > val df = Seq((1, 1)).toDF("a_b", "a.c") > df.drop("`a.c`").collect() > df: org.apache.spark.sql.DataFrame = [a_b: int, a.c: int] > {code} > Given that you can't use drop to drop subfields, it seems to me that we > should treat the column name literally (i.e. as though it is wrapped in back > ticks).
[jira] [Updated] (SPARK-13065) streaming-twitter pass twitter4j.FilterQuery argument to TwitterUtils.createStream()
[ https://issues.apache.org/jira/browse/SPARK-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Davidson updated SPARK-13065: Description: The Twitter stream API is very powerful and provides a lot of support for twitter.com-side filtering of status objects. Whenever possible we want to let Twitter do as much work as possible for us. Currently the Spark Twitter API only allows you to configure a small subset of possible filters String[] filters = {"tag1", "tag2"}; JavaDStream tweets = TwitterUtils.createStream(ssc, twitterAuth, filters); The current implementation does private[streaming] class TwitterReceiver( twitterAuth: Authorization, filters: Seq[String], storageLevel: StorageLevel ) extends Receiver[Status](storageLevel) with Logging { . . . val query = new FilterQuery if (filters.size > 0) { query.track(filters.mkString(",")) newTwitterStream.filter(query) } else { newTwitterStream.sample() } ... Rather than construct the FilterQuery object in TwitterReceiver.onStart(), we should be able to pass a FilterQuery object. Looks like an easy fix. See source code links below. Kind regards, Andy https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L60 https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L89 2/2/16 Attached is my Java implementation for this problem. Feel free to reuse it however you like. 
In my streaming Spark app main() I have the following code FilterQuery query = config.getFilterQuery().fetch(); if (query != null) { // TODO https://issues.apache.org/jira/browse/SPARK-13065 tweets = TwitterFilterQueryUtils.createStream(ssc, twitterAuth, query); } /*else Spark native API String[] filters = {"tag1", "tag2"}; tweets = TwitterUtils.createStream(ssc, twitterAuth, filters); see https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L89 causes val query = new FilterQuery if (filters.size > 0) { query.track(filters.mkString(",")) newTwitterStream.filter(query) } */ was: The Twitter stream API is very powerful and provides a lot of support for twitter.com-side filtering of status objects. Whenever possible we want to let Twitter do as much work as possible for us. Currently the Spark Twitter API only allows you to configure a small subset of possible filters String[] filters = {"tag1", "tag2"}; JavaDStream tweets = TwitterUtils.createStream(ssc, twitterAuth, filters); The current implementation does private[streaming] class TwitterReceiver( twitterAuth: Authorization, filters: Seq[String], storageLevel: StorageLevel ) extends Receiver[Status](storageLevel) with Logging { . . . val query = new FilterQuery if (filters.size > 0) { query.track(filters.mkString(",")) newTwitterStream.filter(query) } else { newTwitterStream.sample() } ... Rather than construct the FilterQuery object in TwitterReceiver.onStart(), we should be able to pass a FilterQuery object. Looks like an easy fix. 
See source code links below. Kind regards, Andy https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L60 https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L89 > streaming-twitter pass twitter4j.FilterQuery argument to > TwitterUtils.createStream() > > > Key: SPARK-13065 > URL: https://issues.apache.org/jira/browse/SPARK-13065 > Project: Spark > Issue Type: Improvement > Components: Streaming > Affects Versions: 1.6.0 > Environment: all > Reporter: Andrew Davidson > Priority: Minor > Labels: twitter > Original Estimate: 2h > Remaining Estimate: 2h > > The Twitter stream API is very powerful and provides a lot of support for > twitter.com-side filtering of status objects. Whenever possible we want to > let Twitter do as much work as possible for us. > Currently the Spark Twitter API only allows you to configure a small subset > of possible filters > String[] filters = {"tag1", "tag2"}; > JavaDStream tweets = TwitterUtils.createStream(ssc, twitterAuth, > filters); > The current implementation does > private[streaming] > class TwitterReceiver( > twitterAuth: Authorization, >
[jira] [Resolved] (SPARK-12711) ML StopWordsRemover does not protect itself from column name duplication
[ https://issues.apache.org/jira/browse/SPARK-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-12711. --- Resolution: Fixed Fix Version/s: 1.6.1 2.0.0 Issue resolved by pull request 10741 [https://github.com/apache/spark/pull/10741] > ML StopWordsRemover does not protect itself from column name duplication > > > Key: SPARK-12711 > URL: https://issues.apache.org/jira/browse/SPARK-12711 > Project: Spark > Issue Type: Bug > Components: ML, MLlib > Affects Versions: 1.6.0 > Reporter: Grzegorz Chilkiewicz > Priority: Trivial > Labels: ml, mllib, newbie, suggestion > Fix For: 2.0.0, 1.6.1 > > > At work we were 'taking a closer look' at ML transformers and I > spotted that anomaly. > On first look, the resolution looks simple: > Add to StopWordsRemover.transformSchema this line (as is done in e.g. > PCA.transformSchema, StandardScaler.transformSchema, > OneHotEncoder.transformSchema): > {code} > require(!schema.fieldNames.contains($(outputCol)), s"Output column > ${$(outputCol)} already exists.") > {code} > Am I correct? Is that a bug? If yes, I am willing to prepare an > appropriate pull request. > Maybe a better idea is to make use of super.transformSchema in > StopWordsRemover (and possibly in all other places)? 
> Links to the files on GitHub mentioned above: > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala#L147 > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Transformer.scala#L109-L111 > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala#L101-L102 > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala#L138-L139 > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala#L75-L76
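The suggested guard is just a precondition on the output schema before the new column is appended. A Python sketch of the same check, with illustrative names (Spark's version is the Scala require(...) quoted in the issue):

```python
def transform_schema(field_names, output_col):
    """Fail fast if the output column already exists, mirroring the
    require(...) guard in PCA/StandardScaler/OneHotEncoder, then
    return the schema with the new column appended."""
    if output_col in field_names:
        raise ValueError("Output column %s already exists." % output_col)
    return list(field_names) + [output_col]
```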
[jira] [Commented] (SPARK-11293) Spillable collections leak shuffle memory
[ https://issues.apache.org/jira/browse/SPARK-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128935#comment-15128935 ] Mridul Muralidharan commented on SPARK-11293: - Not iterating to the end has a bunch of issues IIRC, including what you mention above. For example, memory-mapped buffers are not released, etc. Unfortunately, I don't think there is a general clean solution for it. Would be good to see what alternatives exist to resolve this. > Spillable collections leak shuffle memory > - > > Key: SPARK-11293 > URL: https://issues.apache.org/jira/browse/SPARK-11293 > Project: Spark > Issue Type: Bug > Components: Spark Core > Affects Versions: 1.3.1, 1.4.1, 1.5.1, 1.6.0 > Reporter: Josh Rosen > Assignee: Josh Rosen > Priority: Critical > > I discovered multiple leaks of shuffle memory while working on my memory > manager consolidation patch, which added the ability to do strict memory leak > detection for the bookkeeping that used to be performed by the > ShuffleMemoryManager. This uncovered a handful of places where tasks can > acquire execution/shuffle memory but never release it, starving themselves of > memory. > Problems that I found: > * {{ExternalSorter.stop()}} should release the sorter's shuffle/execution > memory. > * BlockStoreShuffleReader should call {{ExternalSorter.stop()}} using a > {{CompletionIterator}}. > * {{ExternalAppendOnlyMap}} exposes no equivalent of {{stop()}} for freeing > its resources.
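The CompletionIterator pattern mentioned in the issue wraps an iterator and fires a cleanup callback once the wrapped iterator is exhausted, which is also why not iterating to the end leaks: the callback never runs. A minimal Python sketch of the pattern (illustrative, not Spark's Scala class):

```python
class CompletionIterator:
    """Run `on_complete` exactly once, when the wrapped iterator is
    fully consumed. If the consumer stops early, cleanup never fires,
    which is the leak this issue and comment describe."""
    def __init__(self, it, on_complete):
        self._it = iter(it)
        self._on_complete = on_complete
        self._done = False

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self._it)
        except StopIteration:
            if not self._done:
                self._done = True
                self._on_complete()
            raise
```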
[jira] [Resolved] (SPARK-13094) No encoder implicits for Seq[Primitive]
[ https://issues.apache.org/jira/browse/SPARK-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13094. -- Resolution: Fixed Fix Version/s: 1.6.1 2.0.0 Issue resolved by pull request 11014 [https://github.com/apache/spark/pull/11014] > No encoder implicits for Seq[Primitive] > --- > > Key: SPARK-13094 > URL: https://issues.apache.org/jira/browse/SPARK-13094 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 1.6.0 > Reporter: Deenar Toraskar > Fix For: 2.0.0, 1.6.1 > > > Dataset aggregators with complex types fail with "Unable to find encoder for > type stored in a Dataset", even though Datasets with these complex types are > supported. > {code} > val arraySum = new Aggregator[Seq[Float], Seq[Float], > Seq[Float]] with Serializable { > def zero: Seq[Float] = Nil > // The initial value. > def reduce(currentSum: Seq[Float], currentRow: Seq[Float]) = > sumArray(currentSum, currentRow) > def merge(sum: Seq[Float], row: Seq[Float]) = sumArray(sum, row) > def finish(b: Seq[Float]) = b // Return the final result. > def sumArray(a: Seq[Float], b: Seq[Float]): Seq[Float] = { > (a, b) match { > case (Nil, Nil) => Nil > case (Nil, row) => row > case (sum, Nil) => sum > case (sum, row) => (a, b).zipped.map { case (a, b) => a + b } > } > } > }.toColumn > {code} > {code} > :47: error: Unable to find encoder for type stored in a Dataset. > Primitive types (Int, String, etc) and Product types (case classes) are > supported by importing sqlContext.implicits._ Support for serializing other > types will be added in future releases. > }.toColumn > {code}
[jira] [Updated] (SPARK-12780) Inconsistency returning value of ML python models' properties
[ https://issues.apache.org/jira/browse/SPARK-12780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-12780: -- Fix Version/s: 1.6.1 > Inconsistency returning value of ML python models' properties > - > > Key: SPARK-12780 > URL: https://issues.apache.org/jira/browse/SPARK-12780 > Project: Spark > Issue Type: Bug > Components: ML, PySpark > Reporter: Xusen Yin > Assignee: Xusen Yin > Priority: Minor > Fix For: 1.6.1, 2.0.0 > > > In spark/python/pyspark/ml/feature.py, StringIndexerModel has a property > method named labels, which is different from the other properties in other models. > In StringIndexerModel: > {code:title=StringIndexerModel|theme=FadeToGrey|linenumbers=true|language=python|firstline=0001|collapse=false} > @property > @since("1.5.0") > def labels(self): > """ > Ordered list of labels, corresponding to indices to be assigned. > """ > return self._java_obj.labels > {code} > In CounterVectorizerModel (as an example): > {code:title=CounterVectorizerModel|theme=FadeToGrey|linenumbers=true|language=python|firstline=0001|collapse=false} > @property > @since("1.6.0") > def vocabulary(self): > """ > An array of terms in the vocabulary. > """ > return self._call_java("vocabulary") > {code} > In StringIndexerModel, the returned value of labels is not an array of labels > as expected; instead it is a py4j JavaMember. > What's more, pickle on the Python side cannot deserialize a Scala Array > normally. According to my experiments, it translates Array[String] into > a tuple and Array[Int] into array.array, which may cause errors.
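Stripped of py4j, the inconsistency reported above is just the difference between returning a bound method object and calling it. A hedged Python illustration with stand-in classes (the stub below is not pyspark; only the two property bodies mirror the quoted code):

```python
class JavaModelStub:
    """Stands in for the py4j-wrapped JVM model; purely illustrative."""
    def labels(self):
        return ["a", "b", "c"]

class StringIndexerModelLike:
    def __init__(self):
        self._java_obj = JavaModelStub()

    @property
    def labels_buggy(self):
        # Mirrors `return self._java_obj.labels`: hands back the
        # member itself, not the list the caller expects.
        return self._java_obj.labels

    @property
    def labels_fixed(self):
        # Mirrors the `self._call_java("labels")` style: invokes it.
        return self._java_obj.labels()
```

Callers of the buggy property receive a callable rather than the ordered label list.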
[jira] [Updated] (SPARK-13002) Mesos scheduler backend does not follow the property spark.dynamicAllocation.initialExecutors
[ https://issues.apache.org/jira/browse/SPARK-13002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13002: - Target Version/s: 2.0.0 (was: 1.6.1, 2.0.0) > Mesos scheduler backend does not follow the property > spark.dynamicAllocation.initialExecutors > - > > Key: SPARK-13002 > URL: https://issues.apache.org/jira/browse/SPARK-13002 > Project: Spark > Issue Type: Bug > Components: Mesos > Affects Versions: 1.5.2, 1.6.0 > Reporter: Luc Bourlier > Labels: dynamic_allocation, mesos > > When starting a Spark job on a Mesos cluster, all available cores are > reserved (up to {{spark.cores.max}}), creating one executor per Mesos node, > and as many executors as needed. > This is the case even when dynamic allocation is enabled. > When dynamic allocation is enabled, the number of executors launched at > startup should be limited to the value of > {{spark.dynamicAllocation.initialExecutors}}. > The Mesos scheduler backend already follows the value computed by the > {{ExecutorAllocationManager}} for the number of executors that should be up > and running, except at startup, when it just creates all the executors it can.
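The expected startup behavior can be stated as a pure function over the relevant settings. The sketch below is illustrative only: the function name is hypothetical, and the two flags mirror spark.dynamicAllocation.enabled and spark.dynamicAllocation.initialExecutors from the issue text, not the actual backend code:

```python
def executors_to_launch(total_offered, dynamic_allocation, initial_executors):
    """How many executors to start with.

    Without dynamic allocation: grab everything offered, which is the
    current Mesos backend behavior. With it: cap the first wave at
    the configured initial executor count.
    """
    if not dynamic_allocation:
        return total_offered
    return min(total_offered, initial_executors)
```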
[jira] [Resolved] (SPARK-12913) Reimplement stat functions as declarative function
[ https://issues.apache.org/jira/browse/SPARK-12913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-12913. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10960 [https://github.com/apache/spark/pull/10960] > Reimplement stat functions as declarative function > -- > > Key: SPARK-12913 > URL: https://issues.apache.org/jira/browse/SPARK-12913 > Project: Spark > Issue Type: Improvement > Components: SQL > Reporter: Davies Liu > Assignee: Davies Liu > Fix For: 2.0.0 > > > As benchmarked and discussed here: > https://github.com/apache/spark/pull/10786/files#r50038294. > Because it benefits from codegen, a declarative aggregate function can be much > faster than an imperative one, so we should re-implement all the builtin aggregate > functions as declarative ones. > For skewness and kurtosis, we need to benchmark it to make sure that the > declarative one is actually faster than the imperative one.
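The imperative/declarative distinction: an imperative aggregate hides its state transitions inside opaque method calls, while a declarative one exposes initial values, update, merge, and evaluate as plain expressions a code generator can inline. A Python sketch of average in the declarative style, as an illustration of the shape only (this is not Catalyst's DeclarativeAggregate API):

```python
# Declarative-style average: state and transitions are plain expressions
# over a (sum, count) buffer instead of methods on a mutable object.
avg = {
    "initial": (0.0, 0),
    "update": lambda s, x: (s[0] + x, s[1] + 1),
    "merge": lambda a, b: (a[0] + b[0], a[1] + b[1]),
    "evaluate": lambda s: s[0] / s[1] if s[1] else None,
}

def run_aggregate(agg, partitions):
    """Fold each partition with `update`, then `merge` partial states,
    mimicking a partial-aggregation + final-merge plan."""
    states = []
    for part in partitions:
        s = agg["initial"]
        for x in part:
            s = agg["update"](s, x)
        states.append(s)
    total = states[0]
    for s in states[1:]:
        total = agg["merge"](total, s)
    return agg["evaluate"](total)
```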
[jira] [Resolved] (SPARK-13114) java.lang.NegativeArraySizeException in CSV
[ https://issues.apache.org/jira/browse/SPARK-13114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13114. - Resolution: Fixed Assignee: Hyukjin Kwon Fix Version/s: 2.0.0 > java.lang.NegativeArraySizeException in CSV > --- > > Key: SPARK-13114 > URL: https://issues.apache.org/jira/browse/SPARK-13114 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Davies Liu >Assignee: Hyukjin Kwon >Priority: Critical > Fix For: 2.0.0 > > > It could be that token.length > schemaFields.length > {code} > java.lang.NegativeArraySizeException > at > com.databricks.spark.csv.CsvRelation$$anonfun$buildScan$6.apply(CsvRelation.scala:171) > at > com.databricks.spark.csv.CsvRelation$$anonfun$buildScan$6.apply(CsvRelation.scala:162) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at > org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:148) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code}
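The crash is consistent with deriving an array size from the gap between the schema width and the token count, which goes negative when a malformed row has more tokens than the schema. A defensive Python sketch of the parse step (illustrative only, not the spark-csv code):

```python
def parse_row(tokens, schema):
    """Parse one CSV row against a schema without ever computing a
    negative fill size: pad short rows with None, drop surplus tokens."""
    if len(tokens) < len(schema):
        tokens = tokens + [None] * (len(schema) - len(tokens))
    return dict(zip(schema, tokens[:len(schema)]))
```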
[jira] [Updated] (SPARK-13065) streaming-twitter pass twitter4j.FilterQuery argument to TwitterUtils.createStream()
[ https://issues.apache.org/jira/browse/SPARK-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Davidson updated SPARK-13065: Attachment: twitterFilterQueryPatch.tar.gz Sorry, bad name; it's not in patch format. > streaming-twitter pass twitter4j.FilterQuery argument to > TwitterUtils.createStream() > > > Key: SPARK-13065 > URL: https://issues.apache.org/jira/browse/SPARK-13065 > Project: Spark > Issue Type: Improvement > Components: Streaming > Affects Versions: 1.6.0 > Environment: all > Reporter: Andrew Davidson > Priority: Minor > Labels: twitter > Attachments: twitterFilterQueryPatch.tar.gz > > Original Estimate: 2h > Remaining Estimate: 2h > > The Twitter stream API is very powerful and provides a lot of support for > twitter.com-side filtering of status objects. Whenever possible we want to > let Twitter do as much work as possible for us. > Currently the Spark Twitter API only allows you to configure a small subset > of possible filters > String[] filters = {"tag1", "tag2"}; > JavaDStream tweets = TwitterUtils.createStream(ssc, twitterAuth, > filters); > The current implementation does > private[streaming] > class TwitterReceiver( > twitterAuth: Authorization, > filters: Seq[String], > storageLevel: StorageLevel > ) extends Receiver[Status](storageLevel) with Logging { > . . . > val query = new FilterQuery > if (filters.size > 0) { > query.track(filters.mkString(",")) > newTwitterStream.filter(query) > } else { > newTwitterStream.sample() > } > ... > Rather than construct the FilterQuery object in TwitterReceiver.onStart(), we > should be able to pass a FilterQuery object. > Looks like an easy fix. 
See source code links below > kind regards > Andy > https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L60 > https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L89 > 2/2/16 > Attached is my java implementation for this problem. Feel free to reuse it > however you like. In my streaming spark app main() I have the following code >FilterQuery query = config.getFilterQuery().fetch(); > if (query != null) { > // TODO https://issues.apache.org/jira/browse/SPARK-13065 > tweets = TwitterFilterQueryUtils.createStream(ssc, twitterAuth, > query); > } /*else > spark native api > String[] filters = {"tag1", "tag2"} > tweets = TwitterUtils.createStream(ssc, twitterAuth, filters); > > see > https://github.com/apache/spark/blob/master/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala#L89 > > causes > val query = new FilterQuery > if (filters.size > 0) { > query.track(filters.mkString(",")) > newTwitterStream.filter(query) > } */ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
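The enhancement requested in SPARK-13065 above is easier to see with a stand-in query type. The class below is a hypothetical sketch, not twitter4j's FilterQuery or Spark's receiver code; it only illustrates why accepting a prebuilt query object is more expressive than accepting track terms alone, since a full query can also carry things like user ids to follow.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for twitter4j.FilterQuery, used only to contrast the
// two API shapes discussed in the issue.
public class FilterQueryDemo {
    static class FilterQueryLike {
        List<String> track = List.of();
        List<Long> follow = List.of();
        FilterQueryLike track(String... terms) { this.track = Arrays.asList(terms); return this; }
        FilterQueryLike follow(Long... ids) { this.follow = Arrays.asList(ids); return this; }
    }

    // Current API shape: the receiver rebuilds a query from strings, so only
    // track terms can ever be expressed.
    static FilterQueryLike fromFilters(List<String> filters) {
        return new FilterQueryLike().track(filters.toArray(new String[0]));
    }

    public static void main(String[] args) {
        // Proposed shape: the caller builds the full query, including follow
        // ids, and the receiver would use it as-is.
        FilterQueryLike q = new FilterQueryLike().track("tag1", "tag2").follow(42L);
        System.out.println(q.track.size() + " track terms, " + q.follow.size() + " follow ids");
    }
}
```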
[jira] [Updated] (SPARK-12711) ML StopWordsRemover does not protect itself from column name duplication
[ https://issues.apache.org/jira/browse/SPARK-12711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-12711: -- Assignee: Grzegorz Chilkiewicz > ML StopWordsRemover does not protect itself from column name duplication > > > Key: SPARK-12711 > URL: https://issues.apache.org/jira/browse/SPARK-12711 > Project: Spark > Issue Type: Bug > Components: ML, MLlib >Affects Versions: 1.6.0 >Reporter: Grzegorz Chilkiewicz >Assignee: Grzegorz Chilkiewicz >Priority: Trivial > Labels: ml, mllib, newbie, suggestion > Fix For: 1.6.1, 2.0.0 > > > At work we were 'taking a closer look' at ML transformers and I > spotted that anomaly. > On first look, the resolution looks simple: > Add to StopWordsRemover.transformSchema the line (as is done in e.g. > PCA.transformSchema, StandardScaler.transformSchema, > OneHotEncoder.transformSchema): > {code} > require(!schema.fieldNames.contains($(outputCol)), s"Output column > ${$(outputCol)} already exists.") > {code} > Am I correct? Is that a bug? If yes, I am willing to prepare an > appropriate pull request. > Maybe a better idea is to make use of super.transformSchema in > StopWordsRemover (and possibly in all other places)? 
> Links to files at github, mentioned above: > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala#L147 > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Transformer.scala#L109-L111 > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/StandardScaler.scala#L101-L102 > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/PCA.scala#L138-L139 > https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/OneHotEncoder.scala#L75-L76 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
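The require guard proposed in SPARK-12711 above can be sketched outside Spark. This is a hypothetical illustration of the check, not the MLlib code; the field names are invented.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the proposed guard: fail fast when the transformer's output
// column already exists in the input schema.
public class SchemaGuardDemo {
    static void requireNoDuplicate(List<String> fieldNames, String outputCol) {
        if (fieldNames.contains(outputCol)) {
            throw new IllegalArgumentException(
                "Output column " + outputCol + " already exists.");
        }
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("words", "filtered");
        requireNoDuplicate(schema, "cleaned");       // fine: new column name
        try {
            requireNoDuplicate(schema, "filtered");  // duplicate -> error
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Putting the check in a shared transformSchema, as the reporter suggests, would give every transformer this behaviour instead of repeating the require per class.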
[jira] [Commented] (SPARK-13121) java mapWithState mishandles scala Option
[ https://issues.apache.org/jira/browse/SPARK-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128871#comment-15128871 ] Apache Spark commented on SPARK-13121: -- User 'gabrielenizzoli' has created a pull request for this issue: https://github.com/apache/spark/pull/11028 > java mapWithState mishandles scala Option > - > > Key: SPARK-13121 > URL: https://issues.apache.org/jira/browse/SPARK-13121 > Project: Spark > Issue Type: Bug > Components: Java API, Streaming >Affects Versions: 1.6.0 >Reporter: Gabriele Nizzoli >Priority: Critical > Fix For: 1.6.1 > > > In Spark Streaming, the java mapWithState that uses Function3 has a bug in the > conversion from a scala Option to a java Optional. In the conversion, the > code in `StateSpec.scala`, line 222 is > `Optional.fromNullable(v.get)`. This fails if `v`, an `Option`, is `None`; > better to use `JavaUtils.optionToOptional(v)` instead. > A workaround is to use the Function4 call to mapWithState. This call has the > right conversion. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
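The conversion bug in SPARK-13121 above can be reproduced without Spark. The sketch below uses java.util.Optional as a stand-in for both scala.Option and Guava's Optional, and contrasts the failing pattern (dereference first, wrap second) with a conversion that handles the empty case; it is an illustration of the pattern, not the StateSpec.scala code.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

// java.util.Optional stands in for both scala.Option and Guava's Optional.
public class OptionConversionDemo {
    static <T> Optional<T> buggyConvert(Optional<T> scalaOption) {
        // Mirrors Optional.fromNullable(v.get): get() is called before the
        // emptiness check, so an empty option throws.
        return Optional.ofNullable(scalaOption.get());
    }

    static <T> Optional<T> fixedConvert(Optional<T> scalaOption) {
        // Mirrors the JavaUtils.optionToOptional(v) approach: the empty case
        // maps to an empty Optional instead of being dereferenced.
        return scalaOption.isPresent() ? Optional.of(scalaOption.get()) : Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> empty = Optional.empty();
        try {
            buggyConvert(empty);
        } catch (NoSuchElementException e) {
            System.out.println("buggy path throws: " + e.getClass().getSimpleName());
        }
        System.out.println("fixed path present: " + fixedConvert(empty).isPresent());
    }
}
```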
[jira] [Comment Edited] (SPARK-12988) Can't drop columns that contain dots
[ https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128902#comment-15128902 ] Dilip Biswal edited comment on SPARK-12988 at 2/2/16 7:56 PM: -- The subtle difference between column path and column name may not be very obvious to a common user of this API. val df = Seq((1, 1)).toDF("a_b", "a.b") df.select("`a.b`") df.drop("`a.b`") => the fact that one cannot use a back tick here: would it be that obvious to the user? I believe that was the motivation to allow it, but I am not sure of its implications. was (Author: dkbiswal): The shuttle difference between column path and column name may not be very obvious to a common user of this API. val df = Seq((1, 1)).toDF("a_b", "a.b") df.select("`a.b`") df.drop("`a.b`") => the fact that one can not use back tick here , would it be that obvious to the user ? I believe that was the motivation to allow it but then i am not sure of its implications. > Can't drop columns that contain dots > > > Key: SPARK-12988 > URL: https://issues.apache.org/jira/browse/SPARK-12988 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > Neither of these works: > {code} > val df = Seq((1, 1)).toDF("a_b", "a.c") > df.drop("a.c").collect() > df: org.apache.spark.sql.DataFrame = [a_b: int, a.c: int] > {code} > {code} > val df = Seq((1, 1)).toDF("a_b", "a.c") > df.drop("`a.c`").collect() > df: org.apache.spark.sql.DataFrame = [a_b: int, a.c: int] > {code} > Given that you can't use drop to drop subfields, it seems to me that we > should treat the column name literally (i.e. as though it is wrapped in back > ticks). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13009) spark-streaming-twitter_2.10 does not make it possible to access the raw twitter json
[ https://issues.apache.org/jira/browse/SPARK-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128843#comment-15128843 ] Andrew Davidson commented on SPARK-13009: - Hi Sean, I totally agree with you. The Twitter4j people asked me to file a RFE with spark. I agree it is their problem. I am just looking for some sort of workaround. My downstream systems will not be able to process the data I am capturing. I guess in the short term I will create the wrapper object and modify the spark twitter source code. Kind regards, Andy > spark-streaming-twitter_2.10 does not make it possible to access the raw > twitter json > - > > Key: SPARK-13009 > URL: https://issues.apache.org/jira/browse/SPARK-13009 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Andrew Davidson >Priority: Minor > > The Streaming-twitter package makes it easy for Java programmers to work with > twitter. The implementation returns the raw twitter data in JSON format as a > twitter4J StatusJSONImpl object > JavaDStream tweets = TwitterUtils.createStream(ssc, twitterAuth); > The Status class is different from the raw JSON, i.e. serializing the Status > object will not be the same as the original json. I have downstream systems that > can only process raw tweets, not twitter4J Status objects. > Here is my bug/RFE request made to Twitter4J. > They asked me to create a spark tracking issue. > On Thursday, January 21, 2016 at 6:27:25 PM UTC, Andy Davidson wrote: > Hi All > Quick problem summary: > My system uses the Status objects to do some analysis, however I need to > store the raw JSON. There are other systems that process that data that are > not written in Java. > Currently we are serializing the Status Object. The JSON is going to break > downstream systems. 
> I am using the Apache Spark Streaming spark-streaming-twitter_2.10 > http://spark.apache.org/docs/latest/streaming-programming-guide.html#advanced-sources > Request For Enhancement: > I imagine easy access to the raw JSON is a common requirement. Would it be > possible to add a member function getRawJson() to StatusJSONImpl? By default > the returned value would be null unless jsonStoreEnabled=True is set in the > config. > Alternative implementations: > > It should be possible to modify the spark-streaming-twitter_2.10 to provide > this support. The solution is not very clean: > It would require apache spark to define its own Status POJO. The current > StatusJSONImpl class is marked final. > The Wrapper is not going to work nicely with existing code. > spark-streaming-twitter_2.10 does not expose all of the twitter streaming > API, so many developers are writing their own implementations of > org.apache.spark.streaming.twitter.TwitterInputDStream. This makes maintenance > difficult. It's not easy to know when the spark implementation for twitter has > changed. > Code listing for > spark-1.6.0/external/twitter/src/main/scala/org/apache/spark/streaming/twitter/TwitterInputDStream.scala > private[streaming] > class TwitterReceiver( > twitterAuth: Authorization, > filters: Seq[String], > storageLevel: StorageLevel > ) extends Receiver[Status](storageLevel) with Logging { > @volatile private var twitterStream: TwitterStream = _ > @volatile private var stopped = false > def onStart() { > try { > val newTwitterStream = new > TwitterStreamFactory().getInstance(twitterAuth) > newTwitterStream.addListener(new StatusListener { > def onStatus(status: Status): Unit = { > store(status) > } > Ref: > https://forum.processing.org/one/topic/saving-json-data-from-twitter4j.html > What do people think? 
> Kind regards > Andy > From: on behalf of Igor Brigadir > > Reply-To: > Date: Tuesday, January 19, 2016 at 5:55 AM > To: Twitter4J > Subject: Re: [Twitter4J] trouble writing unit test > Main issue is that the Json object is in the wrong json format. > eg: "createdAt": 1449775664000 should be "created_at": "Thu Dec 10 19:27:44 > + 2015", ... > It looks like the json you have was serialized from a java Status object, > which makes json objects different to what you get from the API, > TwitterObjectFactory expects json from Twitter (I haven't had any problems > using TwitterObjectFactory instead of the Deprecated DataObjectFactory). > You could "fix" it by matching the keys & values you have with the correct, > twitter API json - it should look like the example here: > https://dev.twitter.com/rest/reference/get/statuses/show/%3Aid > But it might be easier to download the tweets again, but this
[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots
[ https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128902#comment-15128902 ] Dilip Biswal commented on SPARK-12988: -- The subtle difference between column path and column name may not be very obvious to a common user of this API. val df = Seq((1, 1)).toDF("a_b", "a.b") df.select("`a.b`") df.drop("`a.b`") => the fact that one cannot use a back tick here: would it be that obvious to the user? I believe that was the motivation to allow it, but I am not sure of its implications. > Can't drop columns that contain dots > > > Key: SPARK-12988 > URL: https://issues.apache.org/jira/browse/SPARK-12988 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.0 >Reporter: Michael Armbrust > > Neither of these works: > {code} > val df = Seq((1, 1)).toDF("a_b", "a.c") > df.drop("a.c").collect() > df: org.apache.spark.sql.DataFrame = [a_b: int, a.c: int] > {code} > {code} > val df = Seq((1, 1)).toDF("a_b", "a.c") > df.drop("`a.c`").collect() > df: org.apache.spark.sql.DataFrame = [a_b: int, a.c: int] > {code} > Given that you can't use drop to drop subfields, it seems to me that we > should treat the column name literally (i.e. as though it is wrapped in back > ticks). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
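The literal-name behaviour proposed for drop() in SPARK-12988 above can be sketched with a hypothetical helper (not Spark's implementation): strip surrounding back ticks and match the remaining name exactly, so both "a.c" and "`a.c`" remove the column literally named "a.c", with no dot-path resolution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of literal column-name matching for drop().
public class DropLiteralDemo {
    static String stripBackticks(String col) {
        if (col.length() >= 2 && col.startsWith("`") && col.endsWith("`")) {
            return col.substring(1, col.length() - 1);
        }
        return col;
    }

    static List<String> drop(List<String> columns, String col) {
        String literal = stripBackticks(col);
        List<String> out = new ArrayList<>(columns);
        out.remove(literal);  // exact match only; never resolves "a.c" as field c of a
        return out;
    }

    public static void main(String[] args) {
        List<String> schema = Arrays.asList("a_b", "a.c");
        System.out.println(drop(schema, "a.c"));    // [a_b]
        System.out.println(drop(schema, "`a.c`"));  // [a_b]
    }
}
```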
[jira] [Commented] (SPARK-13126) History Server page always has horizontal scrollbar
[ https://issues.apache.org/jira/browse/SPARK-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128927#comment-15128927 ] Zhuo Liu commented on SPARK-13126: -- Hi Alex, thanks for testing that out. I came up with a fix for that. Please feel free to test again. https://github.com/apache/spark/pull/11029 > History Server page always has horizontal scrollbar > --- > > Key: SPARK-13126 > URL: https://issues.apache.org/jira/browse/SPARK-13126 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth >Priority: Minor > Attachments: page_width.png > > > The new History Server page table is always wider than the page no matter how > much larger you make the window. Most likely an odd CSS error, doesn't seem > to be a simple fix when manipulating the css using the Web Inspector -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13126) History Server page always has horizontal scrollbar
[ https://issues.apache.org/jira/browse/SPARK-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13126: Assignee: (was: Apache Spark) > History Server page always has horizontal scrollbar > --- > > Key: SPARK-13126 > URL: https://issues.apache.org/jira/browse/SPARK-13126 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth >Priority: Minor > Attachments: page_width.png > > > The new History Server page table is always wider than the page no matter how > much larger you make the window. Most likely an odd CSS error, doesn't seem > to be a simple fix when manipulating the css using the Web Inspector -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13126) History Server page always has horizontal scrollbar
[ https://issues.apache.org/jira/browse/SPARK-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13126: Assignee: Apache Spark > History Server page always has horizontal scrollbar > --- > > Key: SPARK-13126 > URL: https://issues.apache.org/jira/browse/SPARK-13126 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth >Assignee: Apache Spark >Priority: Minor > Attachments: page_width.png > > > The new History Server page table is always wider than the page no matter how > much larger you make the window. Most likely an odd CSS error, doesn't seem > to be a simple fix when manipulating the css using the Web Inspector -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13146) API for managing streaming dataframes
[ https://issues.apache.org/jira/browse/SPARK-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13146: Assignee: Tathagata Das (was: Apache Spark) > API for managing streaming dataframes > - > > Key: SPARK-13146 > URL: https://issues.apache.org/jira/browse/SPARK-13146 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Tathagata Das >Assignee: Tathagata Das > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13121) java mapWithState mishandles scala Option
[ https://issues.apache.org/jira/browse/SPARK-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-13121. -- Resolution: Fixed Fix Version/s: 2.0.0 > java mapWithState mishandles scala Option > - > > Key: SPARK-13121 > URL: https://issues.apache.org/jira/browse/SPARK-13121 > Project: Spark > Issue Type: Bug > Components: Java API, Streaming >Affects Versions: 1.6.0 >Reporter: Gabriele Nizzoli >Priority: Critical > Fix For: 1.6.1, 2.0.0 > > > In Spark Streaming, the java mapWithState that uses Function3 has a bug in the > conversion from a scala Option to a java Optional. In the conversion, the > code in `StateSpec.scala`, line 222 is > `Optional.fromNullable(v.get)`. This fails if `v`, an `Option`, is `None`; > better to use `JavaUtils.optionToOptional(v)` instead. > A workaround is to use the Function4 call to mapWithState. This call has the > right conversion. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13147) improve readability of generated code
Davies Liu created SPARK-13147: -- Summary: improve readability of generated code Key: SPARK-13147 URL: https://issues.apache.org/jira/browse/SPARK-13147 Project: Spark Issue Type: Improvement Components: SQL Reporter: Davies Liu Assignee: Davies Liu 1. try to avoid the suffix (unique id) 2. remove multiple empty lines in code formatter 3. remove the comment if there is no code generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13146) API for managing streaming dataframes
Tathagata Das created SPARK-13146: - Summary: API for managing streaming dataframes Key: SPARK-13146 URL: https://issues.apache.org/jira/browse/SPARK-13146 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Tathagata Das Assignee: Tathagata Das -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7997) Remove the developer api SparkEnv.actorSystem and AkkaUtils
[ https://issues.apache.org/jira/browse/SPARK-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129019#comment-15129019 ] Apache Spark commented on SPARK-7997: - User 'zsxwing' has created a pull request for this issue: https://github.com/apache/spark/pull/11031 > Remove the developer api SparkEnv.actorSystem and AkkaUtils > --- > > Key: SPARK-7997 > URL: https://issues.apache.org/jira/browse/SPARK-7997 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13121) java mapWithState mishandles scala Option
[ https://issues.apache.org/jira/browse/SPARK-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-13121: - Assignee: Gabriele Nizzoli > java mapWithState mishandles scala Option > - > > Key: SPARK-13121 > URL: https://issues.apache.org/jira/browse/SPARK-13121 > Project: Spark > Issue Type: Bug > Components: Java API, Streaming >Affects Versions: 1.6.0 >Reporter: Gabriele Nizzoli >Assignee: Gabriele Nizzoli >Priority: Critical > Fix For: 1.6.1, 2.0.0 > > > In Spark Streaming, the java mapWithState that uses Function3 has a bug in the > conversion from a scala Option to a java Optional. In the conversion, the > code in `StateSpec.scala`, line 222 is > `Optional.fromNullable(v.get)`. This fails if `v`, an `Option`, is `None`; > better to use `JavaUtils.optionToOptional(v)` instead. > A workaround is to use the Function4 call to mapWithState. This call has the > right conversion. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13147) improve readability of generated code
[ https://issues.apache.org/jira/browse/SPARK-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13147: Assignee: Davies Liu (was: Apache Spark) > improve readability of generated code > - > > Key: SPARK-13147 > URL: https://issues.apache.org/jira/browse/SPARK-13147 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > > 1. try to avoid the suffix (unique id) > 2. remove multiple empty lines in code formatter > 3. remove the comment if there is no code generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13147) improve readability of generated code
[ https://issues.apache.org/jira/browse/SPARK-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129108#comment-15129108 ] Apache Spark commented on SPARK-13147: -- User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/11032 > improve readability of generated code > - > > Key: SPARK-13147 > URL: https://issues.apache.org/jira/browse/SPARK-13147 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > > 1. try to avoid the suffix (unique id) > 2. remove multiple empty lines in code formatter > 3. remove the comment if there is no code generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13147) improve readability of generated code
[ https://issues.apache.org/jira/browse/SPARK-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13147: Assignee: Apache Spark (was: Davies Liu) > improve readability of generated code > - > > Key: SPARK-13147 > URL: https://issues.apache.org/jira/browse/SPARK-13147 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Assignee: Apache Spark > > 1. try to avoid the suffix (unique id) > 2. remove multiple empty lines in code formatter > 3. remove the comment if there is no code generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13148) support zero-keytab Oozie application launch on a secure cluster
Steve Loughran created SPARK-13148: -- Summary: support zero-keytab Oozie application launch on a secure cluster Key: SPARK-13148 URL: https://issues.apache.org/jira/browse/SPARK-13148 Project: Spark Issue Type: New Feature Components: YARN Affects Versions: 1.6.0 Environment: YARN cluster with Kerberos enabled, launched from Oozie —where Oozie passes down the delegation tokens Reporter: Steve Loughran Oozie can launch Spark instances on insecure clusters, and on a secure cluster if Oozie is set up to provide a keytab. What it cannot currently do is launch a Spark application on a YARN cluster without a keytab. In this situation, Oozie collects the delegation tokens it is set up to collect (as a superuser in the cluster), saves them to a file, then points to the file in the `HADOOP_TOKEN_FILE_LOCATION` environment variable. These tokens need to be used to launch the application —rather than try to get some more -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13148) support zero-keytab Oozie application launch on a secure cluster
[ https://issues.apache.org/jira/browse/SPARK-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129136#comment-15129136 ] Steve Loughran commented on SPARK-13148: Note that Hadoop's UGI class automatically loads the file referenced off {{$HADOOP_TOKEN_FILE_LOCATION}} when it inits; this is the mechanism used to get tokens in the YARN AM. Client-side, they become the tokens of the current user. All that is needed is for the Yarn client to recognise that the situation has occurred (i.e. the env variable is set), add all those credentials to the AM's launch context —and skip trying to acquire tokens for filesystems, HBase and Hive. > support zero-keytab Oozie application launch on a secure cluster > - > > Key: SPARK-13148 > URL: https://issues.apache.org/jira/browse/SPARK-13148 > Project: Spark > Issue Type: New Feature > Components: YARN >Affects Versions: 1.6.0 > Environment: YARN cluster with Kerberos enabled, launched from Oozie > —where Oozie passes down the delegation tokens >Reporter: Steve Loughran > > Oozie can launch Spark instances on insecure clusters, and on a secure > cluster if Oozie is set up to provide a keytab. > What it cannot currently do is launch a Spark application on a YARN cluster > without a keytab. In this situation, Oozie collects the delegation tokens it > is setup to collect (as a superuser in the cluster), saves them to a file, > then points to the file in the `HADOOP_TOKEN_FILE_LOCATION` environment > variable. > These tokens need to be used to launch the application —rather than try to > get some more -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
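The detection step Steve describes for SPARK-13148 above (recognise that a delegation-token file has been handed down, then skip acquiring fresh tokens) can be sketched as follows. The environment variable name follows Hadoop's convention; the surrounding decision logic is a hypothetical illustration, not the actual YARN client code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of detecting an Oozie-style zero-keytab launch.
public class TokenFileCheck {
    static boolean shouldSkipTokenAcquisition(Map<String, String> env) {
        String path = env.get("HADOOP_TOKEN_FILE_LOCATION");
        // If a token file was handed down, reuse those credentials in the
        // AM launch context instead of acquiring new delegation tokens for
        // filesystems, HBase and Hive.
        return path != null && !path.isEmpty();
    }

    public static void main(String[] args) {
        Map<String, String> oozieEnv = new HashMap<>();
        oozieEnv.put("HADOOP_TOKEN_FILE_LOCATION", "/tmp/oozie-tokens");
        System.out.println(shouldSkipTokenAcquisition(oozieEnv));         // true
        System.out.println(shouldSkipTokenAcquisition(new HashMap<>()));  // false
    }
}
```

In a real launcher the check would read System.getenv() directly; passing the map in keeps the sketch testable.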
[jira] [Commented] (SPARK-13148) support zero-keytab Oozie application launch on a secure cluster
[ https://issues.apache.org/jira/browse/SPARK-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129176#comment-15129176 ] Apache Spark commented on SPARK-13148: -- User 'steveloughran' has created a pull request for this issue: https://github.com/apache/spark/pull/11033 > support zero-keytab Oozie application launch on a secure cluster > - > > Key: SPARK-13148 > URL: https://issues.apache.org/jira/browse/SPARK-13148 > Project: Spark > Issue Type: New Feature > Components: YARN >Affects Versions: 1.6.0 > Environment: YARN cluster with Kerberos enabled, launched from Oozie > —where Oozie passes down the delegation tokens >Reporter: Steve Loughran > > Oozie can launch Spark instances on insecure clusters, and on a secure > cluster if Oozie is set up to provide a keytab. > What it cannot currently do is launch a Spark application on a YARN cluster > without a keytab. In this situation, Oozie collects the delegation tokens it > is setup to collect (as a superuser in the cluster), saves them to a file, > then points to the file in the `HADOOP_TOKEN_FILE_LOCATION` environment > variable. > These tokens need to be used to launch the application —rather than try to > get some more -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13148) support zero-keytab Oozie application launch on a secure cluster
[ https://issues.apache.org/jira/browse/SPARK-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13148: Assignee: Apache Spark > support zero-keytab Oozie application launch on a secure cluster > - > > Key: SPARK-13148 > URL: https://issues.apache.org/jira/browse/SPARK-13148 > Project: Spark > Issue Type: New Feature > Components: YARN >Affects Versions: 1.6.0 > Environment: YARN cluster with Kerberos enabled, launched from Oozie > —where Oozie passes down the delegation tokens >Reporter: Steve Loughran >Assignee: Apache Spark > > Oozie can launch Spark instances on insecure clusters, and on a secure > cluster if Oozie is set up to provide a keytab. > What it cannot currently do is launch a Spark application on a YARN cluster > without a keytab. In this situation, Oozie collects the delegation tokens it > is setup to collect (as a superuser in the cluster), saves them to a file, > then points to the file in the `HADOOP_TOKEN_FILE_LOCATION` environment > variable. > These tokens need to be used to launch the application —rather than try to > get some more -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13148) support zero-keytab Oozie application launch on a secure cluster
[ https://issues.apache.org/jira/browse/SPARK-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13148: Assignee: (was: Apache Spark) > support zero-keytab Oozie application launch on a secure cluster > - > > Key: SPARK-13148 > URL: https://issues.apache.org/jira/browse/SPARK-13148 > Project: Spark > Issue Type: New Feature > Components: YARN >Affects Versions: 1.6.0 > Environment: YARN cluster with Kerberos enabled, launched from Oozie > —where Oozie passes down the delegation tokens >Reporter: Steve Loughran > > Oozie can launch Spark instances on insecure clusters, and on a secure > cluster if Oozie is set up to provide a keytab. > What it cannot currently do is launch a Spark application on a YARN cluster > without a keytab. In this situation, Oozie collects the delegation tokens it > is setup to collect (as a superuser in the cluster), saves them to a file, > then points to the file in the `HADOOP_TOKEN_FILE_LOCATION` environment > variable. > These tokens need to be used to launch the application —rather than try to > get some more -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13146) API for managing streaming dataframes
[ https://issues.apache.org/jira/browse/SPARK-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13146: Assignee: Apache Spark (was: Tathagata Das) > API for managing streaming dataframes > - > > Key: SPARK-13146 > URL: https://issues.apache.org/jira/browse/SPARK-13146 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Tathagata Das >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13146) API for managing streaming dataframes
[ https://issues.apache.org/jira/browse/SPARK-13146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129010#comment-15129010 ] Apache Spark commented on SPARK-13146: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/11030 > API for managing streaming dataframes > - > > Key: SPARK-13146 > URL: https://issues.apache.org/jira/browse/SPARK-13146 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Tathagata Das >Assignee: Tathagata Das > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13149) Add FileStreamSource and a simple version of FileStreamSink
[ https://issues.apache.org/jira/browse/SPARK-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13149: Assignee: Shixiong Zhu (was: Apache Spark) > Add FileStreamSource and a simple version of FileStreamSink > --- > > Key: SPARK-13149 > URL: https://issues.apache.org/jira/browse/SPARK-13149 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Shixiong Zhu >Assignee: Shixiong Zhu > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13149) Add FileStreamSource and a simple version of FileStreamSink
[ https://issues.apache.org/jira/browse/SPARK-13149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13149: Assignee: Apache Spark (was: Shixiong Zhu) > Add FileStreamSource and a simple version of FileStreamSink > --- > > Key: SPARK-13149 > URL: https://issues.apache.org/jira/browse/SPARK-13149 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Shixiong Zhu >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13150) Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test single session
Davies Liu created SPARK-13150: -- Summary: Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test single session Key: SPARK-13150 URL: https://issues.apache.org/jira/browse/SPARK-13150 Project: Spark Issue Type: Test Components: SQL Reporter: Davies Liu Assignee: Cheng Lian https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50551/testReport/org.apache.spark.sql.hive.thriftserver/SingleSessionSuite/test_single_session/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13101: Assignee: Apache Spark > Dataset complex types mapping to DataFrame (element nullability) mismatch > -- > > Key: SPARK-13101 > URL: https://issues.apache.org/jira/browse/SPARK-13101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 >Reporter: Deenar Toraskar >Assignee: Apache Spark >Priority: Blocker > > There seems to be a regression between 1.6.0 and 1.6.1 (snapshot build). By > default a Scala {{Seq\[Double\]}} is mapped by Spark as an ArrayType with a > nullable element > {noformat} > |-- valuations: array (nullable = true) > ||-- element: double (containsNull = true) > {noformat} > This could be read back as a Dataset in Spark 1.6.0 > {code} > val df = sqlContext.table("valuations").as[Valuation] > {code} > But with Spark 1.6.1 the same fails with > {code} > val df = sqlContext.table("valuations").as[Valuation] > org.apache.spark.sql.AnalysisException: cannot resolve 'cast(valuations as > array)' due to data type mismatch: cannot cast > ArrayType(DoubleType,true) to ArrayType(DoubleType,false); > {code} > Here are the classes I am using > {code} > case class Valuation(tradeId : String, > counterparty: String, > nettingAgreement: String, > wrongWay: Boolean, > valuations : Seq[Double], /* one per scenario */ > timeInterval: Int, > jobId: String) /* used for hdfs partitioning */ > val vals : Seq[Valuation] = Seq() > val valsDF = sqlContext.sparkContext.parallelize(vals).toDF > valsDF.write.partitionBy("jobId").mode(SaveMode.Overwrite).saveAsTable("valuations") > {code} > even the following gives the same result > {code} > val valsDF = vals.toDS.toDF > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
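The AnalysisException above boils down to an element-nullability compatibility rule: an array whose elements may be null cannot be cast to an array whose elements must not be, while the reverse direction is safe. A tiny model of that rule (an illustrative sketch, not Spark's actual Cast implementation):

```python
from collections import namedtuple

# Minimal stand-in for Spark SQL's ArrayType(elementType, containsNull).
ArrayType = namedtuple("ArrayType", ["element_type", "contains_null"])

def can_cast_array(src, dst):
    """Model of the nullability rule behind the 1.6.1 error: widening
    (adding element nullability) is allowed, narrowing is not."""
    if src.element_type != dst.element_type:
        return False
    # true -> false would silently drop nullability, so it is rejected.
    return dst.contains_null or not src.contains_null
```

Under this model, `ArrayType("double", True)` (what the saved table reports) cannot be cast to `ArrayType("double", False)` (what the `Seq[Double]` encoder expects), which is exactly the mismatch the ticket reports as a regression.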
[jira] [Commented] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129347#comment-15129347 ] Apache Spark commented on SPARK-13101: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/11035 > Dataset complex types mapping to DataFrame (element nullability) mismatch > -- > > Key: SPARK-13101 > URL: https://issues.apache.org/jira/browse/SPARK-13101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 >Reporter: Deenar Toraskar >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13101: Assignee: (was: Apache Spark) > Dataset complex types mapping to DataFrame (element nullability) mismatch > -- > > Key: SPARK-13101 > URL: https://issues.apache.org/jira/browse/SPARK-13101 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.6.1 >Reporter: Deenar Toraskar >Priority: Blocker > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13150) Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test single session
[ https://issues.apache.org/jira/browse/SPARK-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129239#comment-15129239 ] Davies Liu commented on SPARK-13150: This one usually fails together with: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50551/testReport/org.apache.spark.sql.hive.thriftserver/HiveThriftBinaryServerSuite/SPARK_11595_ADD_JAR_with_input_path_having_URL_scheme/ > Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test > single session > - > > Key: SPARK-13150 > URL: https://issues.apache.org/jira/browse/SPARK-13150 > Project: Spark > Issue Type: Test > Components: SQL >Reporter: Davies Liu >Assignee: Cheng Lian > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50551/testReport/org.apache.spark.sql.hive.thriftserver/SingleSessionSuite/test_single_session/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-13149) Add FileStreamSource and a simple version of FileStreamSink
Shixiong Zhu created SPARK-13149: Summary: Add FileStreamSource and a simple version of FileStreamSink Key: SPARK-13149 URL: https://issues.apache.org/jira/browse/SPARK-13149 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Shixiong Zhu Assignee: Shixiong Zhu -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-13122) Race condition in MemoryStore.unrollSafely() causes memory leak
[ https://issues.apache.org/jira/browse/SPARK-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-13122: -- Assignee: Adam Budde > Race condition in MemoryStore.unrollSafely() causes memory leak > --- > > Key: SPARK-13122 > URL: https://issues.apache.org/jira/browse/SPARK-13122 > Project: Spark > Issue Type: Bug > Components: Spark Core, Streaming >Affects Versions: 1.6.0 >Reporter: Adam Budde >Assignee: Adam Budde > > The > [unrollSafely()|https://github.com/apache/spark/blob/v1.6.0/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L249] > method in MemoryStore will progressively unroll the contents of a block > iterator into memory. It works by reserving an initial chunk of unroll memory > and periodically checking if more memory must be reserved as it unrolls the > iterator. The memory reserved for performing the unroll is considered > "pending" memory and is tracked on a per-task attempt ID basis in a map > object named pendingUnrollMemoryMap. When the unrolled block is committed to > storage memory in the > [tryToPut()|https://github.com/apache/spark/blob/v1.6.0/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L362] > method, a method named > [releasePendingUnrollMemoryForThisTask()|https://github.com/apache/spark/blob/v1.6.0/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L521] > is invoked and this pending memory is released. tryToPut() then proceeds to > allocate the storage memory required for the block. > The unrollSafely() method computes the amount of pending memory used for the > unroll operation by saving the amount of unroll memory reserved for the > particular task attempt ID at the start of the method in a variable named > previousMemoryReserved and subtracting this value from the unroll memory > dedicated to the task at the end of the method. This value is stored as the > variable amountToTransferToPending. 
This amount is then subtracted from the > per-task unrollMemoryMap and added to pendingUnrollMemoryMap. > The amount of unroll memory consumed for the task is obtained from > unrollMemoryMap via the currentUnrollMemoryForThisTask method. In order for > the semantics of unrollSafely() to work, the value of unrollMemoryMap for the > task returned by > [currentTaskAttemptId()|https://github.com/apache/spark/blob/v1.6.0/core/src/main/scala/org/apache/spark/storage/MemoryStore.scala#L475] > must not be mutated between the computation of previousMemoryReserved and > amountToTransferToPending. However, since there is no synchronization in > place to ensure that computing both variables and updating the memory maps > happens atomically, a race condition can occur when multiple threads for > which currentTaskAttemptId() returns the same value are both trying to store > blocks. This can lead to a negative value being computed for > amountToTransferToPending, corrupting the unrollMemoryMap and > pendingUnrollMemoryMap memory maps which in turn can lead to the memory > manager leaking unroll memory. 
> For example, let's consider how the state of the unrollMemoryMap and > pendingUnrollMemoryMap variables might be affected if two threads returning > the same value for currentTaskAttemptId() both execute unrollSafely() > concurrently: > ||Thread 1||Thread 2||unrollMemoryMap||pendingUnrollMemoryMap|| > |Enter unrollSafely()|-|0|0| > |previousMemoryReserved = 0|-|0|0| > |(perform unroll)|-|2097152 (2 MiB)|0| > |-|Enter unrollSafely()|2097152 (2 MiB)|0| > |-|previousMemoryReserved = 2097152|2097152 (2 MiB)|0| > |-|(perform unroll)|3145728 (3 MiB)|0| > |Enter finally { }|-|3145728 (3 MiB)|0| > |amtToTransfer = 3145728|-|3145728 (3 MiB)|0| > |Update memory maps|-|0|3145728 (3 MiB)| > |Return|Enter finally { }|0|3145728 (3 MiB)| > |-|amtToTransfer = -2097152|0|3145728 (3 MiB)| > |-|Update memory maps|-2097152 (-2 MiB)|1048576 (1 MiB)| > In this example, we end up leaking 2 MiB of unroll memory since both Thread 1 > and Thread 2 think that the task has only 1 MiB of unroll memory allocated to > it when it actually has 3 MiB. The negative value stored in unrollMemoryMap > will also propagate to future invocations of unrollSafely(). > In our particular case, this behavior manifests since the > currentTaskAttemptId() method is returning -1 for each Spark receiver task. > This in and of itself could be a bug and is something I'm going to look into. > We noticed that blocks would start to spill over to disk when more than > enough storage memory was available, so we inserted log statements into > MemoryManager's acquireUnrollMemory() and releaseUnrollMemory() in order to > collect the number of unroll bytes acquired and released. When we plot the > output, it is apparent that unroll memory is
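The lost-update interleaving in the table can be re-enacted deterministically. The sketch below is a simplified Python model of the bookkeeping (shared counters standing in for the per-attempt-id map entries), not the actual MemoryStore code; it scripts the exact snapshot/transfer sequence from the table and shows the negative transfer and the 2 MiB leak.

```python
MiB = 1024 * 1024

# Both "threads" share one task attempt id (-1 for receiver tasks), so they
# read and write the same map entries.
unroll = 0    # unrollMemoryMap[-1]
pending = 0   # pendingUnrollMemoryMap[-1]

t1_previous = unroll               # Thread 1 snapshots 0
unroll += 2 * MiB                  # Thread 1 unrolls 2 MiB

t2_previous = unroll               # Thread 2 snapshots 2 MiB (soon stale)
unroll += 1 * MiB                  # Thread 2 unrolls 1 MiB

# Thread 1's finally block transfers everything above its snapshot to
# pending -- including the 1 MiB that Thread 2 reserved.
t1_transfer = unroll - t1_previous  # 3 MiB, though Thread 1 unrolled only 2
unroll -= t1_transfer               # unrollMemoryMap entry drops to 0
pending += t1_transfer              # pending entry becomes 3 MiB

# Thread 2's snapshot is now stale, so its transfer goes negative.
t2_transfer = unroll - t2_previous  # 0 - 2 MiB = -2 MiB
pending += t2_transfer              # pending drops to 1 MiB

assert t2_transfer == -2 * MiB          # corrupted bookkeeping
assert pending == 1 * MiB               # the task "thinks" it holds 1 MiB
assert (2 + 1) * MiB - pending == 2 * MiB  # 2 MiB of unroll memory leaked
```

The fix direction the ticket implies is making the snapshot-and-transfer a single atomic operation (or keying the maps by something that is actually unique per concurrent caller).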
[jira] [Updated] (SPARK-13151) Investigate replacing SynchronizedBuffer as it is deprecated/unreliable
[ https://issues.apache.org/jira/browse/SPARK-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-13151: Component/s: Spark Core > Investigate replacing SynchronizedBuffer as it is deprecated/unreliable > --- > > Key: SPARK-13151 > URL: https://issues.apache.org/jira/browse/SPARK-13151 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Streaming >Reporter: holdenk >Priority: Trivial > > Building with Scala 2.11 results in the warning "trait SynchronizedBuffer in > package mutable is deprecated: Synchronization via traits is deprecated as it > is inherently unreliable. Consider > java.util.concurrent.ConcurrentLinkedQueue as an alternative" - we should > investigate if this is a reasonable suggestion. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
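The shape of the suggested change — collect results through a structure that is safe for concurrent producers instead of bolting synchronization onto a plain mutable buffer — can be illustrated outside Scala. A Python analogue using a thread-safe queue (illustrative only; the actual ticket concerns Scala's `SynchronizedBuffer` and Java's `ConcurrentLinkedQueue`):

```python
from queue import SimpleQueue
from threading import Thread

# Thread-safe for concurrent put() calls, so no external locking is needed,
# which is the property the deprecated SynchronizedBuffer trait tried (and
# often failed) to provide.
results = SimpleQueue()

def worker(n):
    results.put(n * n)

threads = [Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Arrival order is nondeterministic, so sort before inspecting.
collected = sorted(results.get() for _ in range(4))
```

The design point is the same as the compiler's hint: synchronization belongs in the collection's implementation, not mixed in via a trait.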
[jira] [Reopened] (SPARK-13150) Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test single session
[ https://issues.apache.org/jira/browse/SPARK-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reopened SPARK-13150: not fixed yet > Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test > single session > - > > Key: SPARK-13150 > URL: https://issues.apache.org/jira/browse/SPARK-13150 > Project: Spark > Issue Type: Test > Components: SQL >Reporter: Davies Liu >Assignee: Cheng Lian > Fix For: 2.0.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50551/testReport/org.apache.spark.sql.hive.thriftserver/SingleSessionSuite/test_single_session/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13150) Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test single session
[ https://issues.apache.org/jira/browse/SPARK-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129556#comment-15129556 ] Cheng Lian commented on SPARK-13150: It seems that the ADD JAR command in both flaky tests may fail silently, causing the subsequent failure of the CREATE TEMPORARY FUNCTION command. Still investigating. > Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test > single session > - > > Key: SPARK-13150 > URL: https://issues.apache.org/jira/browse/SPARK-13150 > Project: Spark > Issue Type: Test > Components: SQL >Reporter: Davies Liu >Assignee: Cheng Lian > Fix For: 2.0.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50551/testReport/org.apache.spark.sql.hive.thriftserver/SingleSessionSuite/test_single_session/ -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12991) Establish correspondence between SparkPlan and LogicalPlan nodes
[ https://issues.apache.org/jira/browse/SPARK-12991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12991: Assignee: Apache Spark > Establish correspondence between SparkPlan and LogicalPlan nodes > > > Key: SPARK-12991 > URL: https://issues.apache.org/jira/browse/SPARK-12991 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Mikhail Bautin >Assignee: Apache Spark > > In order to reuse RDDs across Spark SQL queries (SPARK-11838), we need to > know which {{LogicalPlan}} a {{SparkPlan}} node corresponds to. Unfortunately, > once a {{SparkPlan}} gets built, it is difficult to go back to the > {{LogicalPlan}} nodes. Ideally, there would be an optional field of type > {{LogicalPlan}} in {{SparkPlan}} that would get populated as the {{SparkPlan}} > gets built. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-12991) Establish correspondence between SparkPlan and LogicalPlan nodes
[ https://issues.apache.org/jira/browse/SPARK-12991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-12991: Assignee: (was: Apache Spark) > Establish correspondence between SparkPlan and LogicalPlan nodes > > > Key: SPARK-12991 > URL: https://issues.apache.org/jira/browse/SPARK-12991 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Mikhail Bautin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12991) Establish correspondence between SparkPlan and LogicalPlan nodes
[ https://issues.apache.org/jira/browse/SPARK-12991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129429#comment-15129429 ] Apache Spark commented on SPARK-12991: -- User 'mbautin' has created a pull request for this issue: https://github.com/apache/spark/pull/11036 > Establish correspondence between SparkPlan and LogicalPlan nodes > > > Key: SPARK-12991 > URL: https://issues.apache.org/jira/browse/SPARK-12991 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Mikhail Bautin > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-13020) fix random generator for map type
[ https://issues.apache.org/jira/browse/SPARK-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-13020. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10930 [https://github.com/apache/spark/pull/10930] > fix random generator for map type > - > > Key: SPARK-13020 > URL: https://issues.apache.org/jira/browse/SPARK-13020 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-11838) Spark SQL query fragment RDD reuse
[ https://issues.apache.org/jira/browse/SPARK-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129534#comment-15129534 ] Wenchen Fan commented on SPARK-11838: - Can the new `StreamFrame` satisfy this requirement (avoid re-computing for slowly changing tables)? cc [~zsxwing] > Spark SQL query fragment RDD reuse > -- > > Key: SPARK-11838 > URL: https://issues.apache.org/jira/browse/SPARK-11838 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Mikhail Bautin > > With many analytical Spark SQL workloads against slowly changing tables, > successive queries frequently share fragments that produce the same result. > Instead of re-computing those fragments for every query, it makes sense to > detect similar fragments and substitute RDDs previously created for matching > SparkPlan fragments into every new SparkPlan at execution time whenever > possible. Even if no RDDs are persist()-ed to memory/disk/off-heap memory, > many stages can still be skipped due to map output files being present on > executor nodes. > The implementation involves the following steps: > (1) Logical plan "canonicalization". > Logical plans mapping to the same "canonical" logical plan should always > produce the same results (except for possible output column reordering), > although the inverse statement won't always be true. > - Re-mapping expression ids to "canonical expression ids" (successively > increasing numbers always starting with 1). > - Eliminating alias names that are unimportant after analysis completion. > Only the names that are necessary to determine the Hive table columns to be > scanned are retained. > - Reordering columns in projections, grouping/aggregation expressions, etc. > This can be done e.g. by using the string representation as a sort key. Union > inputs always have to be reordered the same way. 
> - Tree traversal has to happen starting from leaves and progressing towards > the root, because we need to already have identified canonical expression ids > for children of a node before we can come up with sort keys that would allow > us to reorder expressions in a node deterministically. This is a bit more > complicated for Union nodes. > - Special handling for MetastoreRelations. We replace MetastoreRelation > with a special class CanonicalMetastoreRelation that uses attributes and > partitionKeys as part of its equals() and hashCode() implementation, but the > visible attributes and partitionKeys are restricted to expression ids that > the rest of the query actually needs from that MetastoreRelation. > An example of logical plans and corresponding canonical logical plans: > https://gist.githubusercontent.com/mbautin/ef1317b341211d9606cf/raw > (2) Tracking LogicalPlan fragments corresponding to SparkPlan fragments. When > generating a SparkPlan, we keep an optional reference to a LogicalPlan > instance in every node. This allows us to populate the cache with mappings > from canonical logical plans of query fragments to the corresponding RDDs > generated as part of query execution. Note that there is no new work > necessary to generate the RDDs; we are merely utilizing the RDDs that would > have been produced as part of SparkPlan execution anyway. > (3) SparkPlan fragment substitution. After generating a SparkPlan and before > calling prepare() or execute() on it, we check if any of its nodes have an > associated LogicalPlan that maps to a canonical logical plan matching a cache > entry. If so, we substitute a PhysicalRDD (or a new class UnsafePhysicalRDD > wrapping an RDD of UnsafeRow) scanning the previously created RDD instead of > the current query fragment. If the expected column order differs from what > the current SparkPlan fragment produces, we add a projection to reorder the > columns. 
We also add safe/unsafe row conversions as necessary to match the > row type that is expected by the parent of the current SparkPlan fragment. > (4) The execute() method of SparkPlan also needs to perform the cache lookup > and substitution described above before producing a new RDD for the current > SparkPlan node. The "loading cache" pattern (e.g. as implemented in Guava) > allows to reuse query fragments between simultaneously submitted queries: > whichever query runs execute() for a particular fragment's canonical logical > plan starts producing an RDD first, and if another query has a fragment with > the same canonical logical plan, it waits for the RDD to be produced by the > first query and inserts it in its SparkPlan instead. > This kind of query fragment caching will mostly be useful for slowly-changing > or static tables. Even with slowly-changing tables, the cache needs to be > invalidated when those data set changes take place. One of the following >
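The expression-id remapping in step (1) of the proposal above can be sketched compactly: walk the tree leaves-to-root, assign successive canonical ids starting at 1, and sort child expressions by a deterministic string key. This is a deliberately simplified Python illustration over `(op, id, children)` tuples, not Spark's plan representation, and it glosses over the Union-ordering subtleties the ticket calls out.

```python
def canonicalize(expr, mapping=None):
    """Remap expression ids to successive canonical ids (1, 2, ...) and
    reorder children deterministically. Simplified sketch of the idea."""
    if mapping is None:
        mapping = {}
    op, expr_id, children = expr
    # Canonicalize children first (leaves-to-root, as the description
    # requires), then sort them by string form so that equivalent plans
    # written in different orders line up.
    canon_children = sorted(
        (canonicalize(child, mapping) for child in children), key=str
    )
    if expr_id not in mapping:
        mapping[expr_id] = len(mapping) + 1  # successive ids from 1
    return (op, mapping[expr_id], canon_children)
```

Two plans that differ only in their original expression ids then map to the same canonical form, which is the property the fragment cache in steps (2)-(4) keys on.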
[jira] [Assigned] (SPARK-13124) Adding JQuery DataTables messed up the Web UI css and js
[ https://issues.apache.org/jira/browse/SPARK-13124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13124: Assignee: (was: Apache Spark) > Adding JQuery DataTables messed up the Web UI css and js > > > Key: SPARK-13124 > URL: https://issues.apache.org/jira/browse/SPARK-13124 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth > Attachments: css_issue.png, js_issue.png > > > With the addition of JQuery DataTables in SPARK-10873 all the old tables are > using the new DataTables css instead of the old css. Though we most likely > want to switch over completely to DataTables eventually, we should still keep > the old tables UI. > Also, when you open up Web Inspector, all pages in the WebUI throw a > jsonFormatter.min.js.map not found error. This file was not included in the > update and seems to be required to use Web Inspector on the new js file > (the error doesn't affect actual use). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-13124) Adding JQuery DataTables messed up the Web UI css and js
[ https://issues.apache.org/jira/browse/SPARK-13124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13124: Assignee: Apache Spark > Adding JQuery DataTables messed up the Web UI css and js > > > Key: SPARK-13124 > URL: https://issues.apache.org/jira/browse/SPARK-13124 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth >Assignee: Apache Spark > Attachments: css_issue.png, js_issue.png > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-13124) Adding JQuery DataTables messed up the Web UI css and js
[ https://issues.apache.org/jira/browse/SPARK-13124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129533#comment-15129533 ] Apache Spark commented on SPARK-13124: -- User 'ajbozarth' has created a pull request for this issue: https://github.com/apache/spark/pull/11038 > Adding JQuery DataTables messed up the Web UI css and js > > > Key: SPARK-13124 > URL: https://issues.apache.org/jira/browse/SPARK-13124 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth > Attachments: css_issue.png, js_issue.png > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-12986) Fix pydoc warnings in mllib/regression.py
[ https://issues.apache.org/jira/browse/SPARK-12986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-12986: -- Assignee: Nam Pham (was: Yu Ishikawa) > Fix pydoc warnings in mllib/regression.py > - > > Key: SPARK-12986 > URL: https://issues.apache.org/jira/browse/SPARK-12986 > Project: Spark > Issue Type: Bug > Components: MLlib, PySpark >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Nam Pham >Priority: Minor > > Got those warnings by running "make html" under "python/docs/": > {code} > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.LinearRegressionWithSGD:3: ERROR: Unexpected > indentation. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.LinearRegressionWithSGD:4: WARNING: Block quote ends > without a blank line; unexpected unindent. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.RidgeRegressionWithSGD:3: ERROR: Unexpected > indentation. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.RidgeRegressionWithSGD:4: WARNING: Block quote ends > without a blank line; unexpected unindent. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.LassoWithSGD:3: ERROR: Unexpected indentation. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.LassoWithSGD:4: WARNING: Block quote ends without a > blank line; unexpected unindent. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.IsotonicRegression:7: ERROR: Unexpected indentation. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.IsotonicRegression:12: ERROR: Unexpected indentation. 
> {code}
[jira] [Commented] (SPARK-12986) Fix pydoc warnings in mllib/regression.py
[ https://issues.apache.org/jira/browse/SPARK-12986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129541#comment-15129541 ] Xiangrui Meng commented on SPARK-12986: --- [~holdenk] I think it is useful to check the pydoc warnings in builds. ScalaDoc is quite noisy, but Python warnings usually point out real problems. Could you make a JIRA and ping Josh there? > Fix pydoc warnings in mllib/regression.py > - > > Key: SPARK-12986 > URL: https://issues.apache.org/jira/browse/SPARK-12986 > Project: Spark > Issue Type: Bug > Components: MLlib, PySpark >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Nam Pham >Priority: Minor > > Got those warnings by running "make html" under "python/docs/": > {code} > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.LinearRegressionWithSGD:3: ERROR: Unexpected > indentation. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.LinearRegressionWithSGD:4: WARNING: Block quote ends > without a blank line; unexpected unindent. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.RidgeRegressionWithSGD:3: ERROR: Unexpected > indentation. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.RidgeRegressionWithSGD:4: WARNING: Block quote ends > without a blank line; unexpected unindent. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.LassoWithSGD:3: ERROR: Unexpected indentation. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.LassoWithSGD:4: WARNING: Block quote ends without a > blank line; unexpected unindent. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.IsotonicRegression:7: ERROR: Unexpected indentation.
> /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.IsotonicRegression:12: ERROR: Unexpected indentation. > {code}
[jira] [Assigned] (SPARK-3611) Show number of cores for each executor in application web UI
[ https://issues.apache.org/jira/browse/SPARK-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-3611: --- Assignee: Apache Spark > Show number of cores for each executor in application web UI > > > Key: SPARK-3611 > URL: https://issues.apache.org/jira/browse/SPARK-3611 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 1.0.0 >Reporter: Matei Zaharia >Assignee: Apache Spark >Priority: Minor > Labels: starter > > This number is not always fully known, because e.g. in Mesos your executors > can scale up and down in # of CPUs, but it would be nice to show at least the > number of cores the machine has in that case, or the # of cores the executor > has been configured with if known.
[jira] [Commented] (SPARK-3611) Show number of cores for each executor in application web UI
[ https://issues.apache.org/jira/browse/SPARK-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129542#comment-15129542 ] Apache Spark commented on SPARK-3611: - User 'ajbozarth' has created a pull request for this issue: https://github.com/apache/spark/pull/11039 > Show number of cores for each executor in application web UI > > > Key: SPARK-3611 > URL: https://issues.apache.org/jira/browse/SPARK-3611 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 1.0.0 >Reporter: Matei Zaharia >Priority: Minor > Labels: starter > > This number is not always fully known, because e.g. in Mesos your executors > can scale up and down in # of CPUs, but it would be nice to show at least the > number of cores the machine has in that case, or the # of cores the executor > has been configured with if known.
[jira] [Assigned] (SPARK-3611) Show number of cores for each executor in application web UI
[ https://issues.apache.org/jira/browse/SPARK-3611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-3611: --- Assignee: (was: Apache Spark) > Show number of cores for each executor in application web UI > > > Key: SPARK-3611 > URL: https://issues.apache.org/jira/browse/SPARK-3611 > Project: Spark > Issue Type: New Feature > Components: Web UI >Affects Versions: 1.0.0 >Reporter: Matei Zaharia >Priority: Minor > Labels: starter > > This number is not always fully known, because e.g. in Mesos your executors > can scale up and down in # of CPUs, but it would be nice to show at least the > number of cores the machine has in that case, or the # of cores the executor > has been configured with if known.
[jira] [Commented] (SPARK-12291) Support UnsafeRow in BroadcastLeftSemiJoinHash
[ https://issues.apache.org/jira/browse/SPARK-12291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129369#comment-15129369 ] Apache Spark commented on SPARK-12291: -- User 'mbautin' has created a pull request for this issue: https://github.com/apache/spark/pull/11036 > Support UnsafeRow in BroadcastLeftSemiJoinHash > -- > > Key: SPARK-12291 > URL: https://issues.apache.org/jira/browse/SPARK-12291 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >
[jira] [Commented] (SPARK-13150) Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test single session
[ https://issues.apache.org/jira/browse/SPARK-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129433#comment-15129433 ] Apache Spark commented on SPARK-13150: -- User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/11037 > Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test > single session > - > > Key: SPARK-13150 > URL: https://issues.apache.org/jira/browse/SPARK-13150 > Project: Spark > Issue Type: Test > Components: SQL >Reporter: Davies Liu >Assignee: Cheng Lian > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50551/testReport/org.apache.spark.sql.hive.thriftserver/SingleSessionSuite/test_single_session/
[jira] [Assigned] (SPARK-13150) Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test single session
[ https://issues.apache.org/jira/browse/SPARK-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13150: Assignee: Apache Spark (was: Cheng Lian) > Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test > single session > - > > Key: SPARK-13150 > URL: https://issues.apache.org/jira/browse/SPARK-13150 > Project: Spark > Issue Type: Test > Components: SQL >Reporter: Davies Liu >Assignee: Apache Spark > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50551/testReport/org.apache.spark.sql.hive.thriftserver/SingleSessionSuite/test_single_session/
[jira] [Assigned] (SPARK-13150) Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test single session
[ https://issues.apache.org/jira/browse/SPARK-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13150: Assignee: Cheng Lian (was: Apache Spark) > Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test > single session > - > > Key: SPARK-13150 > URL: https://issues.apache.org/jira/browse/SPARK-13150 > Project: Spark > Issue Type: Test > Components: SQL >Reporter: Davies Liu >Assignee: Cheng Lian > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50551/testReport/org.apache.spark.sql.hive.thriftserver/SingleSessionSuite/test_single_session/
[jira] [Updated] (SPARK-13020) fix random generator for map type
[ https://issues.apache.org/jira/browse/SPARK-13020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-13020: - Assignee: Wenchen Fan > fix random generator for map type > - > > Key: SPARK-13020 > URL: https://issues.apache.org/jira/browse/SPARK-13020 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.0.0 > >
[jira] [Created] (SPARK-13151) Investigate replacing SynchronizedBuffer as it is deprecated/unreliable
holdenk created SPARK-13151: --- Summary: Investigate replacing SynchronizedBuffer as it is deprecated/unreliable Key: SPARK-13151 URL: https://issues.apache.org/jira/browse/SPARK-13151 Project: Spark Issue Type: Improvement Components: Streaming Reporter: holdenk Priority: Trivial Building with Scala 2.11 results in the warning "trait SynchronizedBuffer in package mutable is deprecated: Synchronization via traits is deprecated as it is inherently unreliable. Consider java.util.concurrent.ConcurrentLinkedQueue as an alternative" - we should investigate whether this is a reasonable suggestion.
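The suggested replacement is straightforward to sketch. Below is a minimal, hypothetical Java example (class and method names are illustrative, not Spark code) showing `java.util.concurrent.ConcurrentLinkedQueue` used as a thread-safe buffer in place of synchronization-via-traits:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

public class ConcurrentBufferSketch {
    // Append to one shared buffer from several threads with no explicit
    // locking; ConcurrentLinkedQueue is a lock-free, thread-safe queue.
    public static Queue<Integer> collect(int nThreads, int perThread) {
        Queue<Integer> buffer = new ConcurrentLinkedQueue<>();
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    buffer.offer(base + i); // safe concurrent append
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return buffer;
    }
}
```

Unlike a buffer synchronized through a mixin trait, the queue's thread safety lives in the data structure itself, so every access path is covered.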
[jira] [Resolved] (SPARK-12992) Vectorize parquet decoding using ColumnarBatch
[ https://issues.apache.org/jira/browse/SPARK-12992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-12992. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10908 [https://github.com/apache/spark/pull/10908] > Vectorize parquet decoding using ColumnarBatch > -- > > Key: SPARK-12992 > URL: https://issues.apache.org/jira/browse/SPARK-12992 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Nong Li > Fix For: 2.0.0 > > > Parquet files benefit from vectorized decoding. ColumnarBatches have been > designed to support this. This means that a single encoded parquet column is > decoded to a single ColumnVector.
[jira] [Created] (SPARK-13152) Fix task metrics deprecation warning
holdenk created SPARK-13152: --- Summary: Fix task metrics deprecation warning Key: SPARK-13152 URL: https://issues.apache.org/jira/browse/SPARK-13152 Project: Spark Issue Type: Improvement Components: Spark Core Reporter: holdenk Priority: Minor Right now incBytesRead and incRecordsRead are marked as deprecated and for internal use only. We should make private[spark] versions which are not deprecated and switch to those internally, so as not to clutter the warning output when building.
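The pattern being proposed - a deprecated public method delegating to a non-deprecated internal one so internal call sites compile warning-free - can be sketched as follows. This is a hypothetical Java illustration, not Spark's actual metrics class (in Spark the internal variant would be `private[spark]`-restricted rather than public):

```java
public class TaskInputMetrics {
    private long bytesRead = 0L;

    /** @deprecated kept only for compatibility with external callers. */
    @Deprecated
    public void incBytesRead(long n) { internalIncBytesRead(n); }

    // Non-deprecated variant for internal call sites; switching internal
    // code to this method removes deprecation warnings from the build.
    public void internalIncBytesRead(long n) { bytesRead += n; }

    public long bytesRead() { return bytesRead; }
}
```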
[jira] [Commented] (SPARK-6476) Spark fileserver not started on same IP as using spark.driver.host
[ https://issues.apache.org/jira/browse/SPARK-6476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129567#comment-15129567 ] Hao Xia commented on SPARK-6476: I found a workaround by setting env SPARK_LOCAL_IP on the driver > Spark fileserver not started on same IP as using spark.driver.host > -- > > Key: SPARK-6476 > URL: https://issues.apache.org/jira/browse/SPARK-6476 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.2.1 >Reporter: Rares Vernica > > I initially inquired about this here: > http://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/%3ccalq9kxcn2mwfnd4r4k0q+qh1ypwn3p8rgud1v6yrx9_05lv...@mail.gmail.com%3E > If the Spark driver host has multiple IPs and spark.driver.host is set to one > of them, I would expect the fileserver to start on the same IP. I checked > HttpServer and the jetty Server is started on the default IP of the machine: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/HttpServer.scala#L75 > Something like this might work instead: > {code:title=HttpServer.scala#L75} > val server = new Server(new InetSocketAddress(conf.get("spark.driver.host"), > 0)) > {code}
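The fix suggested above boils down to binding the server socket to an explicitly configured address instead of the wildcard/default interface. A minimal stand-alone Java sketch of that idea (the class name and `bind` helper are illustrative, not part of Spark or Jetty):

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindToConfiguredHost {
    // Bind a listening socket to an explicitly chosen host rather than the
    // JVM's default interface; port 0 lets the OS pick a free port.
    public static ServerSocket bind(String host) {
        try {
            ServerSocket server = new ServerSocket();
            server.bind(new InetSocketAddress(InetAddress.getByName(host), 0));
            return server;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

On a multi-homed machine, passing the value of `spark.driver.host` here would make the file server reachable at the same address the driver advertises.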
[jira] [Assigned] (SPARK-13150) Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test single session
[ https://issues.apache.org/jira/browse/SPARK-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13150: Assignee: Cheng Lian (was: Apache Spark) > Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test > single session > - > > Key: SPARK-13150 > URL: https://issues.apache.org/jira/browse/SPARK-13150 > Project: Spark > Issue Type: Test > Components: SQL >Reporter: Davies Liu >Assignee: Cheng Lian > Fix For: 2.0.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50551/testReport/org.apache.spark.sql.hive.thriftserver/SingleSessionSuite/test_single_session/
[jira] [Commented] (SPARK-13150) Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test single session
[ https://issues.apache.org/jira/browse/SPARK-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129598#comment-15129598 ] Apache Spark commented on SPARK-13150: -- User 'liancheng' has created a pull request for this issue: https://github.com/apache/spark/pull/11040 > Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test > single session > - > > Key: SPARK-13150 > URL: https://issues.apache.org/jira/browse/SPARK-13150 > Project: Spark > Issue Type: Test > Components: SQL >Reporter: Davies Liu >Assignee: Cheng Lian > Fix For: 2.0.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50551/testReport/org.apache.spark.sql.hive.thriftserver/SingleSessionSuite/test_single_session/
[jira] [Assigned] (SPARK-13150) Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test single session
[ https://issues.apache.org/jira/browse/SPARK-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13150: Assignee: Apache Spark (was: Cheng Lian) > Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test > single session > - > > Key: SPARK-13150 > URL: https://issues.apache.org/jira/browse/SPARK-13150 > Project: Spark > Issue Type: Test > Components: SQL >Reporter: Davies Liu >Assignee: Apache Spark > Fix For: 2.0.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50551/testReport/org.apache.spark.sql.hive.thriftserver/SingleSessionSuite/test_single_session/
[jira] [Updated] (SPARK-12629) SparkR: DataFrame's saveAsTable method has issues with the signature and HiveContext
[ https://issues.apache.org/jira/browse/SPARK-12629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-12629: -- Fix Version/s: 1.6.1 > SparkR: DataFrame's saveAsTable method has issues with the signature and > HiveContext > - > > Key: SPARK-12629 > URL: https://issues.apache.org/jira/browse/SPARK-12629 > Project: Spark > Issue Type: Bug > Components: SparkR >Reporter: Narine Kokhlikyan >Assignee: Narine Kokhlikyan > Fix For: 1.6.1, 2.0.0 > > > There are several issues with the DataFrame's saveAsTable method in SparkR. > Here is a summary of some of them. Hope this will help to fix the issues. > 1. According to SparkR's saveAsTable(...) documentation, we can call the > "saveAsTable(df, "myfile")" in order to store the dataframe. > However, this signature isn't working. It seems that "source" and "mode" are > forced according to signature. > 2. Within the method saveAsTable(...) it tries to retrieve the SQL context > and tries to create/initialize source as parquet, but this is also failing > because the context has to be Hive Context. Based on the error messages I see. > 3. In general the method fails when I try to call it with sqlContext > 4. Also, it seems that SQL DataFrame.saveAsTable is deprecated, we could use > df.write.saveAsTable(...) instead ... > [~shivaram] [~sunrui] [~felixcheung]
[jira] [Created] (SPARK-13145) checkAnswer should tolerate small float number error
Davies Liu created SPARK-13145: -- Summary: checkAnswer should tolerate small float number error Key: SPARK-13145 URL: https://issues.apache.org/jira/browse/SPARK-13145 Project: Spark Issue Type: Improvement Reporter: Davies Liu For example, we should compare Float/Double values like this: {code} abs(actual - expected) < expected * 1e-12 {code}
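The relative-tolerance check sketched in the issue is easy to express concretely. A hypothetical Java helper (names are illustrative; Spark's actual checkAnswer is Scala test code), with an absolute fallback near zero where a purely relative bound degenerates:

```java
public class ApproxEquals {
    // Relative-tolerance comparison in the spirit of the issue:
    // |actual - expected| < |expected| * eps, falling back to an
    // absolute bound when expected is exactly zero.
    public static boolean approxEqual(double actual, double expected, double eps) {
        double diff = Math.abs(actual - expected);
        if (expected == 0.0) return diff < eps;
        return diff < Math.abs(expected) * eps;
    }
}
```

Taking the absolute value of `expected` matters: the raw form `diff < expected * 1e-12` would reject every comparison against a negative expected value.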
[jira] [Resolved] (SPARK-13138) Add "logical" package prefix for ddl.scala
[ https://issues.apache.org/jira/browse/SPARK-13138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13138. - Resolution: Fixed Fix Version/s: 2.0.0 > Add "logical" package prefix for ddl.scala > -- > > Key: SPARK-13138 > URL: https://issues.apache.org/jira/browse/SPARK-13138 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > Fix For: 2.0.0 > > > ddl.scala is defined in the execution package, and yet its references to > "UnaryNode" and "Command" are logical. This was fairly confusing when I was > trying to understand the ddl code.
[jira] [Commented] (SPARK-13156) JDBC using multiple partitions creates additional tasks but only executes on one
[ https://issues.apache.org/jira/browse/SPARK-13156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129815#comment-15129815 ] Sean Owen commented on SPARK-13156: --- It just sounds like your data is skewed. Are you sure that's not it? > JDBC using multiple partitions creates additional tasks but only executes on > one > > > Key: SPARK-13156 > URL: https://issues.apache.org/jira/browse/SPARK-13156 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 1.5.0 > Environment: Hadoop 2.6.0-cdh5.4.0, Teradata, yarn-client >Reporter: Charles Drotar > > I can successfully kick off a query through JDBC to Teradata, and when it > runs it creates a task on each executor for every partition. The problem is > that all of the tasks except for one complete within a couple seconds and the > final task handles the entire dataset. > Example Code: > private val properties = new java.util.Properties() > properties.setProperty("driver","com.teradata.jdbc.TeraDriver") > properties.setProperty("username","foo") > properties.setProperty("password","bar") > val url = "jdbc:teradata://oneview/, TMODE=TERA,TYPE=FASTEXPORT,SESSIONS=10" > val numPartitions = 5 > val dbTableTemp = f"( SELECT id MOD $numPartitions%d AS modulo, id > FROM db.table > ) AS TEMP_TABLE" > val partitionColumn = "modulo" > val lowerBound = 0.toLong > val upperBound = (numPartitions-1).toLong > val df = > sqlContext.read.jdbc(url,dbTableTemp,partitionColumn,lowerBound,upperBound,numPartitions,properties) > df.write.parquet("/output/path/for/df/") > When I look at the Spark UI I see 5 tasks, but only 1 is actually > querying.
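Besides data skew, one plausible cause with these exact parameters is the stride arithmetic: Spark's JDBC source turns (lowerBound, upperBound, numPartitions) into range WHERE clauses, and when upperBound - lowerBound is smaller than numPartitions the integer stride can round down to 0, leaving one effectively unbounded partition. The sketch below is a rough illustration of that scheme, not Spark's exact code (internals vary by version):

```java
import java.util.ArrayList;
import java.util.List;

public class JdbcPartitionSketch {
    // Derive per-partition WHERE clauses from stride-based range
    // partitioning, as JDBC sources commonly do. Note the integer
    // division: stride becomes 0 when (upper - lower) < numPartitions.
    public static List<String> whereClauses(String col, long lower, long upper, int numPartitions) {
        long stride = (upper - lower) / numPartitions; // integer division!
        List<String> clauses = new ArrayList<>();
        long current = lower;
        for (int i = 0; i < numPartitions; i++) {
            String lb = (i == 0) ? null : col + " >= " + current;
            current += stride;
            String ub = (i == numPartitions - 1) ? null : col + " < " + current;
            if (lb == null) clauses.add(ub);
            else if (ub == null) clauses.add(lb);
            else clauses.add(lb + " AND " + ub);
        }
        return clauses;
    }
}
```

With lower=0, upper=4, numPartitions=5 (the reporter's values), the stride is 0, so the first partitions match empty ranges and the last clause is an unbounded `modulo >= 0` - every row lands in one task, matching the reported symptom. Choosing upperBound = numPartitions instead would avoid the collapse.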
[jira] [Commented] (SPARK-13151) Investigate replacing SynchronizedBuffer as it is deprecated/unreliable
[ https://issues.apache.org/jira/browse/SPARK-13151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129816#comment-15129816 ] Sean Owen commented on SPARK-13151: --- +1 to this and your next JIRA > Investigate replacing SynchronizedBuffer as it is deprecated/unreliable > --- > > Key: SPARK-13151 > URL: https://issues.apache.org/jira/browse/SPARK-13151 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Streaming >Reporter: holdenk >Priority: Trivial > > Building with scala 2.11 results in the warning trait SynchronizedBuffer in > package mutable is deprecated: Synchronization via traits is deprecated as it > is inherently unreliable. Consider > java.util.concurrent.ConcurrentLinkedQueue as an alternative - we should > investigate if this is a reasonable suggestion.
[jira] [Commented] (SPARK-13120) Shade protobuf-java
[ https://issues.apache.org/jira/browse/SPARK-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129820#comment-15129820 ] Sean Owen commented on SPARK-13120: --- Yeah, aren't you also saying this wouldn't help? And it is still not something identified as a problem in the thread you linked. > Shade protobuf-java > --- > > Key: SPARK-13120 > URL: https://issues.apache.org/jira/browse/SPARK-13120 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Ted Yu > > See this thread for background information: > http://search-hadoop.com/m/q3RTtdkUFK11xQhP1/Spark+not+able+to+fetch+events+from+Amazon+Kinesis > This issue shades com.google.protobuf:protobuf-java as > org.spark-project.protobuf
[jira] [Resolved] (SPARK-13125) makes the ratio of KafkaRDD partition to kafka topic partition configurable.
[ https://issues.apache.org/jira/browse/SPARK-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-13125. --- Resolution: Not A Problem Target Version/s: (was: 2.0.0) > makes the ratio of KafkaRDD partition to kafka topic partition configurable. > - > > Key: SPARK-13125 > URL: https://issues.apache.org/jira/browse/SPARK-13125 > Project: Spark > Issue Type: Improvement > Components: Input/Output >Affects Versions: 1.6.1 >Reporter: zhengcanbin > Labels: features > Original Estimate: 96h > Remaining Estimate: 96h > > Now each given Kafka topic/partition corresponds to an RDD partition, in some > case it's quite necessary to make this configurable, namely a ratio > configuration of RDDPartition/kafkaTopicPartition is needed.
[jira] [Commented] (SPARK-13150) Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test single session
[ https://issues.apache.org/jira/browse/SPARK-13150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129857#comment-15129857 ] Cheng Lian commented on SPARK-13150: Please refer to [this PR comment|https://github.com/apache/spark/pull/11040#issuecomment-179028394] for the reason of the test failure. > Flaky test: org.apache.spark.sql.hive.thriftserver.SingleSessionSuite.test > single session > - > > Key: SPARK-13150 > URL: https://issues.apache.org/jira/browse/SPARK-13150 > Project: Spark > Issue Type: Test > Components: SQL >Reporter: Davies Liu >Assignee: Cheng Lian > Fix For: 2.0.0 > > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50551/testReport/org.apache.spark.sql.hive.thriftserver/SingleSessionSuite/test_single_session/
[jira] [Commented] (SPARK-13125) makes the ratio of KafkaRDD partition to kafka topic partition configurable.
[ https://issues.apache.org/jira/browse/SPARK-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129860#comment-15129860 ] zhengcanbin commented on SPARK-13125: - Yes, you are right, but shuffle will increase network burden, and the number of partitions is limited by the total number of disks. In strictly real-time scenarios, having one topic partition correspond to multiple RDD partitions is important for increasing parallelism. > makes the ratio of KafkaRDD partition to kafka topic partition configurable. > - > > Key: SPARK-13125 > URL: https://issues.apache.org/jira/browse/SPARK-13125 > Project: Spark > Issue Type: Improvement > Components: Input/Output >Affects Versions: 1.6.1 >Reporter: zhengcanbin > Labels: features > Original Estimate: 96h > Remaining Estimate: 96h > > Now each given Kafka topic/partition corresponds to an RDD partition, in some > case it's quite necessary to make this configurable, namely a ratio > configuration of RDDPartition/kafkaTopicPartition is needed.
[jira] [Commented] (SPARK-13125) makes the ratio of KafkaRDD partition to kafka topic partition configurable.
[ https://issues.apache.org/jira/browse/SPARK-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129867#comment-15129867 ] zhengcanbin commented on SPARK-13125: - Shuffle will increase network burden, and the number of partitions is limited by the total number of disks. In strictly real-time scenarios, having one topic partition correspond to multiple RDD partitions is important for increasing parallelism. A lot of clients who run our application have raised this problem, so I still think it makes sense. > makes the ratio of KafkaRDD partition to kafka topic partition configurable. > - > > Key: SPARK-13125 > URL: https://issues.apache.org/jira/browse/SPARK-13125 > Project: Spark > Issue Type: Improvement > Components: Input/Output >Affects Versions: 1.6.1 >Reporter: zhengcanbin > Labels: features > Original Estimate: 96h > Remaining Estimate: 96h > > Now each given Kafka topic/partition corresponds to an RDD partition, in some > case it's quite necessary to make this configurable, namely a ratio > configuration of RDDPartition/kafkaTopicPartition is needed.
[jira] [Comment Edited] (SPARK-13125) makes the ratio of KafkaRDD partition to kafka topic partition configurable.
[ https://issues.apache.org/jira/browse/SPARK-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129867#comment-15129867 ] zhengcanbin edited comment on SPARK-13125 at 2/3/16 6:08 AM: - Shuffle will increase network burden, and the number of partitions is limited by the total number of disks. In strictly real-time scenarios, having one topic partition correspond to multiple RDD partitions is important for increasing parallelism. A lot of clients who run our application have raised this problem, so I still think it makes sense. was (Author: zhengcanbin): Shuffle will increase net burden, and number of partitions is limited by total number of disk. In strictly real-time scenarios, one topic partition corresponds to multiple rdd partitions is important for increasing parallelism. A lot of clients who runs our application has raised this problem, so I still think it makes sense. > makes the ratio of KafkaRDD partition to kafka topic partition configurable. > - > > Key: SPARK-13125 > URL: https://issues.apache.org/jira/browse/SPARK-13125 > Project: Spark > Issue Type: Improvement > Components: Input/Output >Affects Versions: 1.6.1 >Reporter: zhengcanbin > Labels: features > Original Estimate: 96h > Remaining Estimate: 96h > > Now each given Kafka topic/partition corresponds to an RDD partition, in some > case it's quite necessary to make this configurable, namely a ratio > configuration of RDDPartition/kafkaTopicPartition is needed.
[jira] [Reopened] (SPARK-13125) makes the ratio of KafkaRDD partition to kafka topic partition configurable.
[ https://issues.apache.org/jira/browse/SPARK-13125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengcanbin reopened SPARK-13125: - Shuffle will increase network burden, and the number of partitions is limited by the total number of disks. In strictly real-time scenarios, having one topic partition correspond to multiple RDD partitions is important for increasing parallelism. A lot of clients who run our application have raised this problem, so I still think it makes sense. > makes the ratio of KafkaRDD partition to kafka topic partition configurable. > - > > Key: SPARK-13125 > URL: https://issues.apache.org/jira/browse/SPARK-13125 > Project: Spark > Issue Type: Improvement > Components: Input/Output >Affects Versions: 1.6.1 >Reporter: zhengcanbin > Labels: features > Original Estimate: 96h > Remaining Estimate: 96h > > Now each given Kafka topic/partition corresponds to an RDD partition, in some > case it's quite necessary to make this configurable, namely a ratio > configuration of RDDPartition/kafkaTopicPartition is needed.
[jira] [Resolved] (SPARK-13147) improve readability of generated code
[ https://issues.apache.org/jira/browse/SPARK-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-13147. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11032 [https://github.com/apache/spark/pull/11032] > improve readability of generated code > - > > Key: SPARK-13147 > URL: https://issues.apache.org/jira/browse/SPARK-13147 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Davies Liu >Assignee: Davies Liu > Fix For: 2.0.0 > > > 1. try to avoid the suffix (unique id) > 2. remove multiple empty lines in the code formatter > 3. remove the comment if there is no code generated.
[jira] [Resolved] (SPARK-10820) Initial infrastructure
[ https://issues.apache.org/jira/browse/SPARK-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10820. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11006 [https://github.com/apache/spark/pull/11006] > Initial infrastructure > -- > > Key: SPARK-10820 > URL: https://issues.apache.org/jira/browse/SPARK-10820 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Michael Armbrust > Fix For: 2.0.0 > >
[jira] [Updated] (SPARK-13121) java mapWithState mishandles scala Option
[ https://issues.apache.org/jira/browse/SPARK-13121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-13121: - Fix Version/s: 1.6.1 > java mapWithState mishandles scala Option > - > > Key: SPARK-13121 > URL: https://issues.apache.org/jira/browse/SPARK-13121 > Project: Spark > Issue Type: Bug > Components: Java API, Streaming >Affects Versions: 1.6.0 >Reporter: Gabriele Nizzoli >Priority: Critical > Fix For: 1.6.1 > > > In Spark Streaming, the Java mapWithState overload that uses Function3 has a bug in the > conversion from a Scala Option to a Java Optional. In the conversion, the > code in `StateSpec.scala`, line 222, is > `Optional.fromNullable(v.get)`. This fails if `v`, an `Option`, is `None`; it is > better to use `JavaUtils.optionToOptional(v)` instead. > A workaround is to use the Function4 call to mapWithState. This call has the > right conversion.
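The failure mode is easy to reproduce outside Spark. Spark 1.6 used Guava's `Optional`; the sketch below uses `java.util.Optional` instead, which behaves the same way for this purpose. The bug is one of evaluation order: `v.get` is unwrapped eagerly, before any null guard runs, so an empty option throws.

```java
import java.util.NoSuchElementException;
import java.util.Optional;

public class EmptyOptionDemo {
    // Mirrors the buggy pattern: unwrap first, guard second. The analogue of
    // Optional.fromNullable(v.get) -- v.get() runs before ofNullable can
    // protect us, so an empty option throws NoSuchElementException.
    static Optional<String> buggyConvert(Optional<String> v) {
        return Optional.ofNullable(v.get());
    }

    // Mirrors the fix (JavaUtils.optionToOptional): convert the container as a
    // whole, never eagerly unwrapping the value.
    static Optional<String> safeConvert(Optional<String> v) {
        return v.map(Optional::of).orElse(Optional.empty());
    }

    public static void main(String[] args) {
        Optional<String> none = Optional.empty(); // stands in for a Scala None
        try {
            buggyConvert(none);
        } catch (NoSuchElementException e) {
            System.out.println("buggy path threw " + e.getClass().getSimpleName());
        }
        System.out.println("safe path isPresent: " + safeConvert(none).isPresent());
    }
}
```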
[jira] [Commented] (SPARK-13126) History Server page always has horizontal scrollbar
[ https://issues.apache.org/jira/browse/SPARK-13126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128928#comment-15128928 ] Apache Spark commented on SPARK-13126: -- User 'zhuoliu' has created a pull request for this issue: https://github.com/apache/spark/pull/11029 > History Server page always has horizontal scrollbar > --- > > Key: SPARK-13126 > URL: https://issues.apache.org/jira/browse/SPARK-13126 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0 >Reporter: Alex Bozarth >Priority: Minor > Attachments: page_width.png > > > The new History Server page table is always wider than the page no matter how > much larger you make the window. Most likely an odd CSS error; it doesn't seem > to be a simple fix when manipulating the CSS using the Web Inspector
[jira] [Issue Comment Deleted] (SPARK-12811) Estimator interface for generalized linear models (GLMs)
[ https://issues.apache.org/jira/browse/SPARK-12811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang updated SPARK-12811: Comment: was deleted (was: Should we put it under a new folder named "ml/glm"?) > Estimator interface for generalized linear models (GLMs) > > > Key: SPARK-12811 > URL: https://issues.apache.org/jira/browse/SPARK-12811 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Yanbo Liang >Priority: Critical > > In Spark 1.6, MLlib provides logistic regression and linear regression with > L1/L2/elastic-net regularization. We want to expand the support of > generalized linear models (GLMs) in 2.0, e.g., Poisson/Gamma families and > more link functions. SPARK-9835 implements a GLM solver for the case when the > number of features is small. We also need to design an interface for GLMs. > In SparkR, we can simply follow glm or glmnet. On the Python/Scala/Java side, > the interface should be consistent with LinearRegression and > LogisticRegression, e.g., > {code} > val glm = new GeneralizedLinearModel() > .setFamily("poisson") > .setSolver("irls") > {code} > It would be great if LinearRegression and LogisticRegression can reuse code > from GeneralizedLinearModel.
[jira] [Commented] (SPARK-12986) Fix pydoc warnings in mllib/regression.py
[ https://issues.apache.org/jira/browse/SPARK-12986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129760#comment-15129760 ] holdenk commented on SPARK-12986: - Sure :) > Fix pydoc warnings in mllib/regression.py > - > > Key: SPARK-12986 > URL: https://issues.apache.org/jira/browse/SPARK-12986 > Project: Spark > Issue Type: Bug > Components: MLlib, PySpark >Affects Versions: 2.0.0 >Reporter: Xiangrui Meng >Assignee: Nam Pham >Priority: Minor > > Got those warnings by running "make html" under "python/docs/": > {code} > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.LinearRegressionWithSGD:3: ERROR: Unexpected > indentation. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.LinearRegressionWithSGD:4: WARNING: Block quote ends > without a blank line; unexpected unindent. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.RidgeRegressionWithSGD:3: ERROR: Unexpected > indentation. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.RidgeRegressionWithSGD:4: WARNING: Block quote ends > without a blank line; unexpected unindent. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.LassoWithSGD:3: ERROR: Unexpected indentation. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.LassoWithSGD:4: WARNING: Block quote ends without a > blank line; unexpected unindent. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.IsotonicRegression:7: ERROR: Unexpected indentation. > /Users/meng/src/spark/python/pyspark/mllib/regression.py:docstring of > pyspark.mllib.regression.IsotonicRegression:12: ERROR: Unexpected indentation. 
> {code}
[jira] [Updated] (SPARK-13154) Add pydoc lint for docs
[ https://issues.apache.org/jira/browse/SPARK-13154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-13154: Component/s: PySpark > Add pydoc lint for docs > --- > > Key: SPARK-13154 > URL: https://issues.apache.org/jira/browse/SPARK-13154 > Project: Spark > Issue Type: Improvement > Components: PySpark >Reporter: holdenk >Priority: Trivial > > As we fixed in SPARK-12986 it would be useful to have a lint rule to catch > this automatically. > cc [~mengxr] & [~josephkb]
[jira] [Created] (SPARK-13154) Add pydoc lint for docs
holdenk created SPARK-13154: --- Summary: Add pydoc lint for docs Key: SPARK-13154 URL: https://issues.apache.org/jira/browse/SPARK-13154 Project: Spark Issue Type: Improvement Reporter: holdenk Priority: Trivial As we fixed in SPARK-12986 it would be useful to have a lint rule to catch this automatically. cc [~mengxr] & [~josephkb]
[jira] [Resolved] (SPARK-12732) Fix LinearRegression.train for the case when label is constant and fitIntercept=false
[ https://issues.apache.org/jira/browse/SPARK-12732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai resolved SPARK-12732. - Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10702 [https://github.com/apache/spark/pull/10702] > Fix LinearRegression.train for the case when label is constant and > fitIntercept=false > - > > Key: SPARK-12732 > URL: https://issues.apache.org/jira/browse/SPARK-12732 > Project: Spark > Issue Type: Bug > Components: MLlib >Reporter: Imran Younus >Assignee: Imran Younus >Priority: Minor > Fix For: 2.0.0 > > > If the target variable is constant, then the linear regression must check > whether fitIntercept is true or false, and handle these two cases separately. > If fitIntercept is true, then there is no training needed and we set the > intercept equal to the mean of y. > But if fitIntercept is false, then the model should still train. > Currently, LinearRegression handles both cases in the same way: it doesn't > train the model and sets the intercept equal to the mean of y, which means > that it returns a non-zero intercept even when the user forces the regression > through the origin.
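The two cases in the description can be checked with a tiny 1-D least-squares calculation (illustrative plain Java, not Spark code): with a constant label, fitting with an intercept gives intercept = mean(y) and slope 0 with no training needed, while forcing the fit through the origin still requires solving for a non-trivial slope.

```java
public class ConstantLabelFit {
    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double c = 5.0; // constant label: y_i = 5 for every i

        // fitIntercept = true: intercept = mean(y) = c and slope = 0 fit the
        // data exactly, so no iterative training is required.
        System.out.println("with intercept: b=" + c + ", w=0.0");

        // fitIntercept = false: the closed-form least-squares solution through
        // the origin is w = sum(x*y) / sum(x*x), which is non-zero here, so
        // the model must actually train instead of returning mean(y) as the
        // intercept.
        double sxy = 0, sxx = 0;
        for (double xi : x) {
            sxy += xi * c;
            sxx += xi * xi;
        }
        double w = sxy / sxx; // = c * sum(x) / sum(x^2) = 5 * 10 / 30
        System.out.println("through origin: w=" + w);
    }
}
```

The non-zero origin slope (5/3 for this data) is exactly what the unpatched code discarded by treating both cases identically.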
[jira] [Created] (SPARK-13155) add runtime null check when convert catalyst array to external array
Wenchen Fan created SPARK-13155: --- Summary: add runtime null check when convert catalyst array to external array Key: SPARK-13155 URL: https://issues.apache.org/jira/browse/SPARK-13155 Project: Spark Issue Type: Bug Components: SQL Reporter: Wenchen Fan {code} scala> Seq(("a", Seq(null, new Integer(1)))).toDS().as[(String, Array[Int])].collect() res5: Array[(String, Array[Int])] = Array((a,Array(0, 1))) {code} This is wrong; we should throw an exception in this case.
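The silent 0 comes from Scala's primitive-conversion rule: `null.asInstanceOf[Int]` evaluates to 0 rather than failing, which is how the null element above collects as `Array(0, 1)`. Java's unboxing of the same value fails loudly, which is the behavior the issue asks for. A minimal sketch of the contrast (plain Java, not Spark code):

```java
public class NullUnboxDemo {
    public static void main(String[] args) {
        Integer boxed = null; // a null element in an array of boxed ints
        // Scala's null.asInstanceOf[Int] silently yields 0; Java auto-unboxing
        // (which calls boxed.intValue()) makes the problem visible instead:
        try {
            int primitive = boxed;
            System.out.println("unboxed to " + primitive);
        } catch (NullPointerException e) {
            System.out.println("unboxing null threw NullPointerException");
        }
    }
}
```

A runtime null check during the catalyst-to-external conversion would surface the same kind of error instead of fabricating a 0.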
[jira] [Commented] (SPARK-13069) ActorHelper is not throttled by rate limiter
[ https://issues.apache.org/jira/browse/SPARK-13069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15129787#comment-15129787 ] sachin aggarwal commented on SPARK-13069: - As of Spark 2.0 (not yet released), Spark does not use Akka any more. See https://issues.apache.org/jira/browse/SPARK-5293 Can you check with the latest 2.0 build to see whether a similar problem still exists? > ActorHelper is not throttled by rate limiter > > > Key: SPARK-13069 > URL: https://issues.apache.org/jira/browse/SPARK-13069 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.6.0 >Reporter: Lin Zhao > > The rate at which an actor receiver sends data to Spark is not limited by maxRate or > back pressure. Spark would control how fast it writes the data to the block > manager, but the receiver actor sends events asynchronously and would fill > up the Akka mailbox with millions of events until memory runs out.