[jira] [Comment Edited] (SPARK-18974) FileInputDStream could not detect files which moved to the directory
[ https://issues.apache.org/jira/browse/SPARK-18974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782284#comment-15782284 ] Adam Wang edited comment on SPARK-18974 at 12/28/16 7:31 AM:
-
Thanks for the reminder; this was my fault. This bug does not seem easy to solve. Could we define a Set[Path] to check whether a file has already been read by the InputDStream? Such as:
{code}
@transient private var selectedFiles = new mutable.HashSet[String]()
...
override def start() {
  addOldFilesToSelectedFiles()
}
...
private def isNewFile(path: Path, currentTime: Long, modTimeIgnoreThreshold: Long): Boolean = {
  val pathStr = path.toString
  ...
  if (selectedFiles.contains(pathStr)) {
    return false
  }
  ...
}
{code}
But it would be very inefficient for a directory with many files. Is there any other way to detect the moved-in files?

was (Author: adam wang):
Thanks for the reminder; this was my fault. This bug does not seem easy to solve. Could we define a Set[Path] to check whether a file has already been read by the InputDStream? Such as:
{code}
@transient private var selectedFiles = new mutable.HashSet[String]()
...
override def start() {
  addOldFilesToSelectedFiles()
}
...
private def isNewFile(path: Path, currentTime: Long, modTimeIgnoreThreshold: Long): Boolean = {
  val pathStr = path.toString
  ...
  // Reject file if it was considered earlier
  if (selectedFiles.contains(pathStr)) {
    return false
  }
  logDebug(s"$pathStr accepted with mod time $modTime")
  return true
}
{code}
But it would be very inefficient for a directory with many files. Is there any other way to detect the moved-in files?
> FileInputDStream could not detect files which moved to the directory
> ---
>
> Key: SPARK-18974
> URL: https://issues.apache.org/jira/browse/SPARK-18974
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 1.6.3, 2.0.2
> Reporter: Adam Wang
>
> FileInputDStream uses the modification time to find new files, but if a file is moved into the monitored directory its modification time does not change, so FileInputDStream cannot detect such files.
> I think one way to fix this bug is to get the access_time and use that for the judgment, but it needs a Set of files to keep all old files, which would be very inefficient for a directory with many files.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
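The behaviour described in the ticket is easy to reproduce outside Spark. The sketch below (Python, standing in for the Scala internals; `new_files_by_mtime` and `new_files_by_seen_set` are hypothetical helpers, not Spark APIs) contrasts FileInputDStream's mod-time filter with the selected-files set proposed in the comment: a file moved into the watched directory keeps its old modification time, so the mod-time check misses it, while the seen-set check catches it at the cost of remembering every path ever selected.

```python
import os
import shutil
import tempfile

def new_files_by_mtime(directory, threshold):
    # FileInputDStream-style check: a file counts as new only if its
    # modification time is at or after the previous scan's threshold.
    return {name for name in os.listdir(directory)
            if os.path.getmtime(os.path.join(directory, name)) >= threshold}

def new_files_by_seen_set(directory, seen):
    # The approach proposed in the comment: remember every path ever
    # selected and treat anything unseen as new, regardless of mtime.
    fresh = set(os.listdir(directory)) - seen
    seen |= fresh
    return fresh

# A file created earlier and then *moved* into the watched directory
# keeps its old mtime, so the mod-time check misses it.
watched = tempfile.mkdtemp()
staging = tempfile.mkdtemp()
src = os.path.join(staging, "events.log")
with open(src, "w") as f:
    f.write("data")
os.utime(src, (1_000_000, 1_000_000))   # backdate the mtime
threshold = 2_000_000                   # the "last scan" was after that
shutil.move(src, watched)               # a rename preserves the old mtime

print(new_files_by_mtime(watched, threshold))   # set() -> file missed
seen = set()
print(new_files_by_seen_set(watched, seen))     # {'events.log'}
```

The inefficiency the commenter worries about shows up as the unbounded growth of `seen`; FileInputDStream bounds similar bookkeeping for its recently selected files with a remember-duration window rather than keeping every path forever.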
[jira] [Commented] (SPARK-18974) FileInputDStream could not detect files which moved to the directory
[ https://issues.apache.org/jira/browse/SPARK-18974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782284#comment-15782284 ] Adam Wang commented on SPARK-18974:
---
Thanks for the reminder; this was my fault. This bug does not seem easy to solve. Could we define a Set[Path] to check whether a file has already been read by the InputDStream? Such as:
{code}
@transient private var selectedFiles = new mutable.HashSet[String]()
...
override def start() {
  addOldFilesToSelectedFiles()
}
...
private def isNewFile(path: Path, currentTime: Long, modTimeIgnoreThreshold: Long): Boolean = {
  val pathStr = path.toString
  ...
  // Reject file if it was considered earlier
  if (selectedFiles.contains(pathStr)) {
    return false
  }
  logDebug(s"$pathStr accepted with mod time $modTime")
  return true
}
{code}
But it would be very inefficient for a directory with many files. Is there any other way to detect the moved-in files?
> FileInputDStream could not detect files which moved to the directory
> ---
>
> Key: SPARK-18974
> URL: https://issues.apache.org/jira/browse/SPARK-18974
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 1.6.3, 2.0.2
> Reporter: Adam Wang
>
> FileInputDStream uses the modification time to find new files, but if a file is moved into the monitored directory its modification time does not change, so FileInputDStream cannot detect such files.
> I think one way to fix this bug is to get the access_time and use that for the judgment, but it needs a Set of files to keep all old files, which would be very inefficient for a directory with many files.
[jira] [Comment Edited] (SPARK-18974) FileInputDStream could not detect files which moved to the directory
[ https://issues.apache.org/jira/browse/SPARK-18974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782284#comment-15782284 ] Adam Wang edited comment on SPARK-18974 at 12/28/16 7:30 AM:
-
Thanks for the reminder; this was my fault. This bug does not seem easy to solve. Could we define a Set[Path] to check whether a file has already been read by the InputDStream? Such as:
{code}
@transient private var selectedFiles = new mutable.HashSet[String]()
...
override def start() {
  addOldFilesToSelectedFiles()
}
...
private def isNewFile(path: Path, currentTime: Long, modTimeIgnoreThreshold: Long): Boolean = {
  val pathStr = path.toString
  ...
  // Reject file if it was considered earlier
  if (selectedFiles.contains(pathStr)) {
    return false
  }
  logDebug(s"$pathStr accepted with mod time $modTime")
  return true
}
{code}
But it would be very inefficient for a directory with many files. Is there any other way to detect the moved-in files?
> FileInputDStream could not detect files which moved to the directory
> ---
>
> Key: SPARK-18974
> URL: https://issues.apache.org/jira/browse/SPARK-18974
> Project: Spark
> Issue Type: Bug
> Components: DStreams
> Affects Versions: 1.6.3, 2.0.2
> Reporter: Adam Wang
>
> FileInputDStream uses the modification time to find new files, but if a file is moved into the monitored directory its modification time does not change, so FileInputDStream cannot detect such files.
> I think one way to fix this bug is to get the access_time and use that for the judgment, but it needs a Set of files to keep all old files, which would be very inefficient for a directory with many files.
[jira] [Updated] (SPARK-19002) Check pep8 against all the python scripts
[ https://issues.apache.org/jira/browse/SPARK-19002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-19002: - Summary: Check pep8 against all the python scripts (was: Check pep8 against dev/*.py scripts) > Check pep8 against all the python scripts > - > > Key: SPARK-19002 > URL: https://issues.apache.org/jira/browse/SPARK-19002 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Hyukjin Kwon > > We can check pep8 against dev/*.py scripts. There are already several python > scripts being checked.
[jira] [Updated] (SPARK-19002) Check pep8 against all the python scripts
[ https://issues.apache.org/jira/browse/SPARK-19002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-19002: - Description: We can check pep8 against all the Python scripts; currently only some of them are being checked. (was: We can check pep8 against dev/*.py scripts. There are already several python scripts being checked.) > Check pep8 against all the python scripts > - > > Key: SPARK-19002 > URL: https://issues.apache.org/jira/browse/SPARK-19002 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Hyukjin Kwon > > We can check pep8 against all the Python scripts; currently only some of them are being checked.
[jira] [Commented] (SPARK-19002) Check pep8 against dev/*.py scripts
[ https://issues.apache.org/jira/browse/SPARK-19002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782211#comment-15782211 ] Hyukjin Kwon commented on SPARK-19002: -- I changed the priority to Major because it seems some scripts are not executable with Python 3. > Check pep8 against dev/*.py scripts > --- > > Key: SPARK-19002 > URL: https://issues.apache.org/jira/browse/SPARK-19002 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Hyukjin Kwon > > We can check pep8 against dev/*.py scripts. There are already several python > scripts being checked.
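Independently of pep8 style, the Python 3 concern in the comment above can be checked mechanically. A minimal sketch, not Spark's actual lint tooling (`python3_syntax_errors` is a hypothetical helper): parse every `.py` file under a directory with the running Python 3 parser and report the ones that fail, which catches Python-2-only syntax such as bare `print` statements.

```python
import ast
import tempfile
from pathlib import Path

def python3_syntax_errors(root):
    # Parse each .py file with the Python 3 parser; files using
    # Python-2-only syntax (e.g. `print "x"`) raise SyntaxError.
    errors = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            ast.parse(path.read_text(), filename=str(path))
        except SyntaxError as exc:
            errors[path.name] = exc.msg
    return errors

# Demo with two throwaway scripts: one valid, one Python-2-only.
root = tempfile.mkdtemp()
Path(root, "ok.py").write_text("print('hello')\n")
Path(root, "legacy.py").write_text("print 'hello'\n")
print(python3_syntax_errors(root))  # only legacy.py is reported
```

A syntax check like this is weaker than actually running the scripts (runtime-only incompatibilities slip through), but it is cheap enough to run alongside a style checker in CI.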
[jira] [Updated] (SPARK-19002) Check pep8 against dev/*.py scripts
[ https://issues.apache.org/jira/browse/SPARK-19002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-19002: - Priority: Major (was: Trivial) > Check pep8 against dev/*.py scripts > --- > > Key: SPARK-19002 > URL: https://issues.apache.org/jira/browse/SPARK-19002 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Hyukjin Kwon > > We can check pep8 against dev/*.py scripts. There are already several python > scripts being checked.
[jira] [Resolved] (SPARK-18931) Create empty staging directory in partitioned table on insert
[ https://issues.apache.org/jira/browse/SPARK-18931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-18931.
-
Resolution: Duplicate
> Create empty staging directory in partitioned table on insert
> ---
>
> Key: SPARK-18931
> URL: https://issues.apache.org/jira/browse/SPARK-18931
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2
> Reporter: Egor Pahomov
>
> {code}
> CREATE TABLE temp.test_partitioning_4 (
>   num string
> )
> PARTITIONED BY (
>   day string)
> STORED AS parquet
> {code}
> On every
> {code}
> INSERT INTO TABLE temp.test_partitioning_4 PARTITION (day)
> SELECT day, count(*) AS num FROM hss.session
> WHERE year=2016 AND month=4
> GROUP BY day
> {code}
> a new directory ".hive-staging_hive_2016-12-19_15-55-11_298_3412488541559534475-4" is created on HDFS. It's a big issue, because I insert every day, and the accumulating empty directories are very bad for HDFS.
[jira] [Commented] (SPARK-18931) Create empty staging directory in partitioned table on insert
[ https://issues.apache.org/jira/browse/SPARK-18931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782017#comment-15782017 ] Xiao Li commented on SPARK-18931:
-
Please try it using the latest nightly build: https://people.apache.org/~pwendell/spark-nightly/. If it still fails, please reopen it. Thanks!
> Create empty staging directory in partitioned table on insert
> ---
>
> Key: SPARK-18931
> URL: https://issues.apache.org/jira/browse/SPARK-18931
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.2
> Reporter: Egor Pahomov
>
> {code}
> CREATE TABLE temp.test_partitioning_4 (
>   num string
> )
> PARTITIONED BY (
>   day string)
> STORED AS parquet
> {code}
> On every
> {code}
> INSERT INTO TABLE temp.test_partitioning_4 PARTITION (day)
> SELECT day, count(*) AS num FROM hss.session
> WHERE year=2016 AND month=4
> GROUP BY day
> {code}
> a new directory ".hive-staging_hive_2016-12-19_15-55-11_298_3412488541559534475-4" is created on HDFS. It's a big issue, because I insert every day, and the accumulating empty directories are very bad for HDFS.
[jira] [Commented] (SPARK-16563) Repeat calling Spark SQL thrift server fetchResults return empty for ExecuteStatement operation
[ https://issues.apache.org/jira/browse/SPARK-16563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782003#comment-15782003 ] vishal agrawal commented on SPARK-16563:
-
The fix for this issue is causing the Thrift server to hang while fetching a large amount of data. Raised a bug for the same: SPARK-18857. Kindly check.
> Repeat calling Spark SQL thrift server fetchResults return empty for
> ExecuteStatement operation
> ---
>
> Key: SPARK-16563
> URL: https://issues.apache.org/jira/browse/SPARK-16563
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.2
> Reporter: Gu Huiqin Alice
> Assignee: Gu Huiqin Alice
> Priority: Minor
> Fix For: 2.0.1, 2.1.0
>
> Repeated calls to FetchResults(... orientation=FetchOrientation.FETCH_FIRST ..) of the Spark SQL Thrift service return an empty set after calling ExecuteStatement of TCLIService.
> The bug exists in the function {{public RowSet getNextRowSet(FetchOrientation orientation, long maxRows)}}:
> https://github.com/apache/spark/blob/02c8072eea72425e89256347e1f373a3e76e6eba/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/SQLOperation.java#L332
> The iterator for getting results can be consumed only once, so repeated calls to FetchResults with the FETCH_FIRST parameter return an empty result.
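The one-shot-iterator bug described above is independent of Thrift and fits in a few lines. A sketch for illustration (Python; the class names are hypothetical stand-ins, not the SQLOperation API): results backed directly by an iterator come back empty on the second FETCH_FIRST, while a materialized copy can be replayed.

```python
class OneShotResult:
    # Buggy behaviour: results backed by a one-shot iterator, as in the
    # getNextRowSet code referenced in the description above.
    def __init__(self, rows):
        self._iter = iter(rows)

    def fetch_first(self):
        return list(self._iter)  # the second call finds the iterator drained

class ReplayableResult:
    # Fixed behaviour: materialize the rows once so FETCH_FIRST can rewind.
    def __init__(self, rows):
        self._rows = list(rows)

    def fetch_first(self):
        return list(self._rows)

buggy = OneShotResult([(1,), (2,)])
print(buggy.fetch_first())   # [(1,), (2,)]
print(buggy.fetch_first())   # []  <- the reported empty result
fixed = ReplayableResult([(1,), (2,)])
print(fixed.fetch_first())   # [(1,), (2,)]
print(fixed.fetch_first())   # [(1,), (2,)]
```

Note the trade-off visible in the comment above: making results replayable means holding on to them, which is exactly what reportedly becomes expensive for large result sets (SPARK-18857).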
[jira] [Resolved] (SPARK-18992) Move spark.sql.hive.thriftServer.singleSession to SQLConf
[ https://issues.apache.org/jira/browse/SPARK-18992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-18992. - Resolution: Fixed Fix Version/s: 2.2.0 > Move spark.sql.hive.thriftServer.singleSession to SQLConf > - > > Key: SPARK-18992 > URL: https://issues.apache.org/jira/browse/SPARK-18992 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.2, 2.1.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.2.0 > > > Since {{spark.sql.hive.thriftServer.singleSession}} is a configuration of the SQL component, this conf can be moved from {{SparkConf}} to {{StaticSQLConf}}. > When we introduced {{spark.sql.hive.thriftServer.singleSession}}, all the SQL configurations could be modified in different sessions. Later, static SQL configuration was added. It is a perfect fit for {{spark.sql.hive.thriftServer.singleSession}}. Previously, we did the same move for {{spark.sql.warehouse.dir}} from {{SparkConf}} to {{StaticSQLConf}}.
[jira] [Comment Edited] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x
[ https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781450#comment-15781450 ] Josh Bacon edited comment on SPARK-18737 at 12/27/16 11:16 PM:
---
Thanks for the quick reply. We've cut our code down to a minimum and are only using JavaDStream to call isEmpty (which hits the KryoException), so we do not have any classes to register, yet we are still experiencing KryoExceptions. Nonetheless, enabling JavaSerialization and setting requireRegister to false do not appear to work. Are there any other details you'd like me to provide to help identify this issue? We are using the library: org.apache.spark.streaming.kinesis.KinesisUtils
> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
> Issue Type: Bug
> Affects Versions: 2.0.0, 2.0.1
> Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 77 on executor id: 2 hostname: ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory > on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: > 410.4 MB) > 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, > ip-10-121-14-147.eu-central-1.compute.internal): > com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: > 13994 > at > com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137) > at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781) > at > org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229) > at > org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) > at org.apache.spark.util.NextIterator.to(NextIterator.scala:21) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) > at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) > at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21) > at > org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927) > at > org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927) > at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now > 2
[jira] [Commented] (SPARK-18393) DataFrame pivot output column names should respect aliases
[ https://issues.apache.org/jira/browse/SPARK-18393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781473#comment-15781473 ] Andrew Ray commented on SPARK-18393:
-
It wouldn't hurt to backport to 2.0; it's a pretty simple fix.
> DataFrame pivot output column names should respect aliases
> --
>
> Key: SPARK-18393
> URL: https://issues.apache.org/jira/browse/SPARK-18393
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Eric Liang
> Priority: Minor
>
> For example
> {code}
> val df = spark.range(100).selectExpr("id % 5 as x", "id % 2 as a", "id as b")
> df
>   .groupBy('x)
>   .pivot("a", Seq(0, 1))
>   .agg(expr("sum(b)").as("blah"), expr("count(b)").as("foo"))
>   .show()
> +---+--------------------+---------------------+--------------------+---------------------+
> |  x|0_sum(`b`) AS `blah`|0_count(`b`) AS `foo`|1_sum(`b`) AS `blah`|1_count(`b`) AS `foo`|
> +---+--------------------+---------------------+--------------------+---------------------+
> |  0|                 450|                   10|                 500|                   10|
> |  1|                 510|                   10|                 460|                   10|
> |  3|                 530|                   10|                 480|                   10|
> |  2|                 470|                   10|                 520|                   10|
> |  4|                 490|                   10|                 540|                   10|
> +---+--------------------+---------------------+--------------------+---------------------+
> {code}
> The column names here are quite hard to read. Ideally we would respect the aliases and generate column names like 0_blah, 0_foo, 1_blah, 1_foo instead.
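The naming scheme the ticket asks for is simple to state. A sketch for illustration (Python; `pivot_column_names` is a hypothetical helper, not the actual Spark fix): combine each pivot value with the aggregate's alias instead of its full expression string.

```python
def pivot_column_names(pivot_values, agg_aliases):
    # Desired scheme from the ticket: "<pivotValue>_<aggAlias>",
    # e.g. "0_blah" instead of "0_sum(`b`) AS `blah`".
    return [f"{value}_{alias}"
            for value in pivot_values
            for alias in agg_aliases]

print(pivot_column_names([0, 1], ["blah", "foo"]))
# ['0_blah', '0_foo', '1_blah', '1_foo']
```

Note the outer loop runs over pivot values and the inner loop over aggregate aliases, matching the column order in the table shown in the ticket.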
[jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x
[ https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781450#comment-15781450 ] Josh Bacon commented on SPARK-18737:
-
Thanks for the quick reply. We've cut our code down to a minimum and are only using JavaDStream to call isEmpty (which hits the KryoException), so we do not have any classes to register, yet we are still experiencing KryoExceptions. Nonetheless, enabling JavaSerialization and setting requireRegister to false do not appear to work. Are there any other details you'd like me to provide to help identify this issue?
> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
> Issue Type: Bug
> Affects Versions: 2.0.0, 2.0.1
> Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 77 on executor id: 2 hostname: ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory > on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: > 410.4 MB) > 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, > ip-10-121-14-147.eu-central-1.compute.internal): > com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: > 13994 > at > com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137) > at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781) > at > org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229) > at > org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) > at org.apache.spark.util.NextIterator.to(NextIterator.scala:21) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) > at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) > at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21) > at > org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927) > at > org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927) > at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now > 2.0.1, we see the Kyro deserialization exception and over time the Spark > streaming job stops processing since too many tasks failed. > Our action was to use conf.set("spark.serializer", > "org.apache.spark.serializer.JavaSerializer") and to disable Kryo class > registration with conf.set("spark.kryo.registrationRequired", false). We hope > to identify the root cause of the exception. > However, setting the serializer to JavaSerializer is oviously ignored by the > Spark-internals. Despite the setting we still see the exception printed in > the log and tasks fail. The occurence seems to
[jira] [Updated] (SPARK-17807) Scalatest listed as compile dependency in spark-tags
[ https://issues.apache.org/jira/browse/SPARK-17807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-17807: -- Assignee: Ryan Williams > Scalatest listed as compile dependency in spark-tags > > > Key: SPARK-17807 > URL: https://issues.apache.org/jira/browse/SPARK-17807 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Tom Standard >Assignee: Ryan Williams >Priority: Trivial > Fix For: 2.0.3, 2.1.1, 2.2.0 > > > In spark-tags:2.0.0, Scalatest is listed as a compile time dependency - > shouldn't this be in test scope?
[jira] [Updated] (SPARK-19006) should mention the max value allowed for spark.kryoserializer.buffer.max in doc
[ https://issues.apache.org/jira/browse/SPARK-19006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-19006:
--
Assignee: Yuexin Zhang
> should mention the max value allowed for spark.kryoserializer.buffer.max in doc
> -
>
> Key: SPARK-19006
> URL: https://issues.apache.org/jira/browse/SPARK-19006
> Project: Spark
> Issue Type: Documentation
> Reporter: Yuexin Zhang
> Assignee: Yuexin Zhang
> Priority: Trivial
> Fix For: 2.2.0
>
> On the configuration doc page: https://spark.apache.org/docs/latest/configuration.html
> we describe spark.kryoserializer.buffer.max as: "Maximum allowable size of Kryo serialization buffer. This must be larger than any object you attempt to serialize. Increase this if you get a 'buffer limit exceeded' exception inside Kryo."
> However, the source code has a hard-coded upper limit:
> {code}
> val maxBufferSizeMb = conf.getSizeAsMb("spark.kryoserializer.buffer.max", "64m").toInt
> if (maxBufferSizeMb >= ByteUnit.GiB.toMiB(2)) {
>   throw new IllegalArgumentException("spark.kryoserializer.buffer.max must be less than " +
>     s"2048 mb, got: + $maxBufferSizeMb mb.")
> }
> {code}
> We should mention "this value must be less than 2048 mb" on the config page as well.
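The guard quoted in the description is easy to restate. A sketch for illustration (Python; `validate_kryo_buffer_max` is a hypothetical helper mirroring the quoted Scala check, and the 2 GiB reason is an assumption, presumably a signed 32-bit buffer length): sizes of 2048 MB or more are rejected up front.

```python
GIB_2_IN_MIB = 2048  # ByteUnit.GiB.toMiB(2) in the quoted Scala

def validate_kryo_buffer_max(size_mb):
    # Mirrors the hard-coded guard quoted above: reject any configured
    # maximum of 2 GiB or more before the serializer is ever used.
    if size_mb >= GIB_2_IN_MIB:
        raise ValueError(
            f"spark.kryoserializer.buffer.max must be less than 2048 mb, got: {size_mb} mb.")
    return size_mb

print(validate_kryo_buffer_max(64))    # 64 (the default passes)
print(validate_kryo_buffer_max(2047))  # 2047 (largest accepted value)
try:
    validate_kryo_buffer_max(4096)
except ValueError as exc:
    print(exc)  # rejected, matching the quoted error message
```

Failing fast at configuration time, rather than when a large object overflows the buffer mid-job, is exactly why documenting the limit matters.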
[jira] [Commented] (SPARK-18393) DataFrame pivot output column names should respect aliases
[ https://issues.apache.org/jira/browse/SPARK-18393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781436#comment-15781436 ] Hyukjin Kwon commented on SPARK-18393:
--
Yup, I checked this against the current master. Hi [~a1ray], should we backport this to 2.0.x?
> DataFrame pivot output column names should respect aliases
> --
>
> Key: SPARK-18393
> URL: https://issues.apache.org/jira/browse/SPARK-18393
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Eric Liang
> Priority: Minor
>
> For example
> {code}
> val df = spark.range(100).selectExpr("id % 5 as x", "id % 2 as a", "id as b")
> df
>   .groupBy('x)
>   .pivot("a", Seq(0, 1))
>   .agg(expr("sum(b)").as("blah"), expr("count(b)").as("foo"))
>   .show()
> +---+--------------------+---------------------+--------------------+---------------------+
> |  x|0_sum(`b`) AS `blah`|0_count(`b`) AS `foo`|1_sum(`b`) AS `blah`|1_count(`b`) AS `foo`|
> +---+--------------------+---------------------+--------------------+---------------------+
> |  0|                 450|                   10|                 500|                   10|
> |  1|                 510|                   10|                 460|                   10|
> |  3|                 530|                   10|                 480|                   10|
> |  2|                 470|                   10|                 520|                   10|
> |  4|                 490|                   10|                 540|                   10|
> +---+--------------------+---------------------+--------------------+---------------------+
> {code}
> The column names here are quite hard to read. Ideally we would respect the aliases and generate column names like 0_blah, 0_foo, 1_blah, 1_foo instead.
[jira] [Resolved] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x
[ https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-18737. --- Resolution: Not A Problem Yes, see the message right above this. Spark 2.x does not necessarily work like 1.x and Kryo is not optional in all code paths. You will want to disable registration or register your classes, in general. Disabling Kryo isn't a good idea. If there's evidence this is from a path where Kryo should be disable-able then that much is a problem to fix, above notwithstanding. But then, using Kryo is still the better outcome. > Serialization setting "spark.serializer" ignored in Spark 2.x > - > > Key: SPARK-18737 > URL: https://issues.apache.org/jira/browse/SPARK-18737 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0, 2.0.1 >Reporter: Dr. Michael Menzel > > The following exception occurs although the JavaSerializer has been activated: > 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID > 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, > 5621 bytes) > 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching > task 77 on executor id: 2 hostname: > ip-10-121-14-147.eu-central-1.compute.internal. 
[jira] [Comment Edited] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x
[ https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781418#comment-15781418 ] Josh Bacon edited comment on SPARK-18737 at 12/27/16 10:33 PM: --- My team is experiencing the exact same issue as described by OP for our streaming jobs using the KinesisUtils library. Code worked in 1.6 previously but now experiences KryoExceptions in 2.0 (Unregistered Class Id) no matter if JavaSerialization is enabled instead or if requireRegister is set to false. Errors are experienced in a non-deterministic manner. We see no work-around currently for our streaming jobs in Spark 2.0.1. was (Author: jbacon): My team is experiencing the exact same issue as described by OP for our streaming jobs using the KinesisUtils library. Code worked in 1.6 previously but now experiences KryoExceptions in 2.0 (Unregistered Class Id) no matter if JavaSerialization is enabled instead or if requireRegister is set to false. Errors are experienced in a non-deterministic manner as well. We see no work-around currently for our streaming jobs in Spark 2.0.1. > Serialization setting "spark.serializer" ignored in Spark 2.x > - > > Key: SPARK-18737 > URL: https://issues.apache.org/jira/browse/SPARK-18737 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0, 2.0.1 >Reporter: Dr. Michael Menzel > > The following exception occurs although the JavaSerializer has been activated: > 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 5621 bytes) > 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 77 on executor id: 2 hostname: ip-10-121-14-147.eu-central-1.compute.internal. 
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory > on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: > 410.4 MB) > 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, > ip-10-121-14-147.eu-central-1.compute.internal): > com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: > 13994 > at > com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137) > at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781) > at > org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229) > at > org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) > at org.apache.spark.util.NextIterator.to(NextIterator.scala:21) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) > at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) > at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21) > at > org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927) > at > org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927) > at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 2.0.1, we see the Kryo deserialization exception and over time the Spark streaming job stops processing since too many tasks failed. > Our action was to use conf.set("spark.serializer", "org.apache.spark.serializer.JavaSerializer") and to disable Kryo class registration with conf.set("spark.kryo.registrationRequired", false). We hope to identify the root cause of the exception. > However, setting the serializer to JavaSerializer is obviously ignored by the Spark internals. Despite the setting we still see the exception printed in the log and tasks fail. The occurrence seems to be non-deterministic, but to become more frequent
[jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x
[ https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781418#comment-15781418 ] Josh Bacon commented on SPARK-18737: My team is experiencing the exact same issue as described by OP for our streaming jobs using the KinesisUtils library. Code worked in 1.6 previously but now experiences KryoExceptions in 2.0 (Unregistered Class Id) no matter if JavaSerialization is enabled instead or if requireRegister is set to false. Errors are experienced in a non-deterministic manner as well. We see no work-around currently for our jobs in Spark 2.0.1. > Serialization setting "spark.serializer" ignored in Spark 2.x > - > > Key: SPARK-18737 > URL: https://issues.apache.org/jira/browse/SPARK-18737 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0, 2.0.1 >Reporter: Dr. Michael Menzel > > The following exception occurs although the JavaSerializer has been activated: > 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 5621 bytes) > 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 77 on executor id: 2 hostname: ip-10-121-14-147.eu-central-1.compute.internal. 
[jira] [Comment Edited] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x
[ https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781418#comment-15781418 ] Josh Bacon edited comment on SPARK-18737 at 12/27/16 10:32 PM: --- My team is experiencing the exact same issue as described by OP for our streaming jobs using the KinesisUtils library. Code worked in 1.6 previously but now experiences KryoExceptions in 2.0 (Unregistered Class Id) no matter if JavaSerialization is enabled instead or if requireRegister is set to false. Errors are experienced in a non-deterministic manner as well. We see no work-around currently for our streaming jobs in Spark 2.0.1. was (Author: jbacon): My team is experiencing the exact same issue as described by OP for our streaming jobs using the KinesisUtils library. Code worked in 1.6 previously but now experiences KryoExceptions in 2.0 (Unregistered Class Id) no matter if JavaSerialization is enabled instead or if requireRegister is set to false. Errors are experienced in a non-deterministic manner as well. We see no work-around currently for our jobs in Spark 2.0.1. > Serialization setting "spark.serializer" ignored in Spark 2.x > - > > Key: SPARK-18737 > URL: https://issues.apache.org/jira/browse/SPARK-18737 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0, 2.0.1 >Reporter: Dr. Michael Menzel > > The following exception occurs although the JavaSerializer has been activated: > 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 5621 bytes) > 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching task 77 on executor id: 2 hostname: ip-10-121-14-147.eu-central-1.compute.internal. 
[jira] [Commented] (SPARK-18393) DataFrame pivot output column names should respect aliases
[ https://issues.apache.org/jira/browse/SPARK-18393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781324#comment-15781324 ] Ganesh Chand commented on SPARK-18393: -- [~hyukjin.kwon] - Can you confirm the Spark version? I get the same behavior as Eric mentioned above on 2.0.2. > DataFrame pivot output column names should respect aliases > -- > > Key: SPARK-18393 > URL: https://issues.apache.org/jira/browse/SPARK-18393 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Eric Liang >Priority: Minor > > For example > {code} > val df = spark.range(100).selectExpr("id % 5 as x", "id % 2 as a", "id as b") > df > .groupBy('x) > .pivot("a", Seq(0, 1)) > .agg(expr("sum(b)").as("blah"), expr("count(b)").as("foo")) > .show() > +---++-++-+ > | x|0_sum(`b`) AS `blah`|0_count(`b`) AS `foo`|1_sum(`b`) AS > `blah`|1_count(`b`) AS `foo`| > +---++-++-+ > | 0| 450| 10| 500| > 10| > | 1| 510| 10| 460| > 10| > | 3| 530| 10| 480| > 10| > | 2| 470| 10| 520| > 10| > | 4| 490| 10| 540| > 10| > +---++-++-+ > {code} > The column names here are quite hard to read. Ideally we would respect the > aliases and generate column names like 0_blah, 0_foo, 1_blah, 1_foo instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
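Until pivot output names respect aliases, one workaround is to rename the generated columns after the fact. The sketch below is plain Python rather than Spark code; it assumes the 2.x naming pattern shown in the example above, and `simplify` is a hypothetical helper, not a Spark API:

```python
import re

# Generated pivot column names in the example above look like:
#   <pivotValue>_<aggExpression> AS `<alias>`
# e.g. "0_sum(`b`) AS `blah`". Capture the pivot value and the alias,
# dropping the aggregate expression in between.
PIVOT_NAME = re.compile(r"^(?P<value>[^_]+)_.* AS `(?P<alias>[^`]+)`$")

def simplify(name: str) -> str:
    """Rewrite a generated pivot column name to <value>_<alias> if it matches."""
    m = PIVOT_NAME.match(name)
    return f"{m.group('value')}_{m.group('alias')}" if m else name

columns = ["x", "0_sum(`b`) AS `blah`", "0_count(`b`) AS `foo`",
           "1_sum(`b`) AS `blah`", "1_count(`b`) AS `foo`"]
print([simplify(c) for c in columns])
# → ['x', '0_blah', '0_foo', '1_blah', '1_foo']
```

In PySpark this could feed something like `df.toDF(*[simplify(c) for c in df.columns])` to obtain the 0_blah, 0_foo, 1_blah, 1_foo names the report asks for.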
[jira] [Commented] (SPARK-18993) Unable to build/compile Spark in IntelliJ due to missing Scala deps in spark-tags
[ https://issues.apache.org/jira/browse/SPARK-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781302#comment-15781302 ] Apache Spark commented on SPARK-18993: -- User 'srowen' has created a pull request for this issue: https://github.com/apache/spark/pull/16418 > Unable to build/compile Spark in IntelliJ due to missing Scala deps in > spark-tags > - > > Key: SPARK-18993 > URL: https://issues.apache.org/jira/browse/SPARK-18993 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Xiao Li >Priority: Critical > > After https://github.com/apache/spark/pull/16311 is merged, I am unable to > build it in my IntelliJ. Got the following compilation error: > {noformat} > Error:scalac: error while loading Object, Missing dependency 'object scala in > compiler mirror', required by > /Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Home/jre/lib/rt.jar(java/lang/Object.class) > Error:scalac: Error: object scala in compiler mirror not found. > scala.reflect.internal.MissingRequirementError: object scala in compiler > mirror not found. 
> at > scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:17) > at > scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:18) > at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:53) > at > scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:66) > at > scala.reflect.internal.Mirrors$RootsBase.getPackage(Mirrors.scala:173) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage$lzycompute(Definitions.scala:161) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage(Definitions.scala:161) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass$lzycompute(Definitions.scala:162) > at > scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass(Definitions.scala:162) > at > scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1395) > at scala.tools.nsc.Global$Run.(Global.scala:1215) > at xsbt.CachedCompiler0$$anon$2.(CompilerInterface.scala:105) > at xsbt.CachedCompiler0.run(CompilerInterface.scala:105) > at xsbt.CachedCompiler0.run(CompilerInterface.scala:94) > at xsbt.CompilerInterface.run(CompilerInterface.scala:22) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:101) > at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:47) > at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41) > at > org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:29) > at > org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:26) > at 
org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:67) > at > org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:24) > at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala) > at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319) > {noformat}
[jira] [Updated] (SPARK-18993) Unable to build/compile Spark in IntelliJ due to missing Scala deps in spark-tags
[ https://issues.apache.org/jira/browse/SPARK-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-18993: -- Summary: Unable to build/compile Spark in IntelliJ due to missing Scala deps in spark-tags (was: Unable to build/compile Spark in IntelliJ)
[jira] [Commented] (SPARK-18974) FileInputDStream could not detected files which moved to the directory
[ https://issues.apache.org/jira/browse/SPARK-18974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781279#comment-15781279 ] Shixiong Zhu commented on SPARK-18974: -- Not sure about HDFS. But I just checked my local file system on Mac. It doesn't change `access_time` after renaming a file, either. {code} $ stat -x a.txt File: "a.txt" Size: 2 FileType: Regular File Mode: (0644/-rw-r--r--) ... Device: 1,4 Inode: 437388576 Links: 1 Access: Tue Dec 27 13:10:13 2016 Modify: Tue Dec 27 13:10:13 2016 Change: Tue Dec 27 13:10:13 2016 $ sleep 3 $ mv a.txt b.txt $ stat -x b.txt File: "b.txt" Size: 2 FileType: Regular File Mode: (0644/-rw-r--r--) ... Device: 1,4 Inode: 437388576 Links: 1 Access: Tue Dec 27 13:10:13 2016 Modify: Tue Dec 27 13:10:13 2016 Change: Tue Dec 27 13:10:13 2016 {code} > FileInputDStream could not detected files which moved to the directory > --- > > Key: SPARK-18974 > URL: https://issues.apache.org/jira/browse/SPARK-18974 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 1.6.3, 2.0.2 >Reporter: Adam Wang > > FileInputDStream uses mod time to find new files, but if a file was moved into the directory its modification time would not be changed, so FileInputDStream could not detect these files. > I think a way to fix this bug is to get access_time and use it for the check, but that needs a Set of files to track all old files, which would be very inefficient for directories with many files.
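Shixiong's `stat` experiment can be reproduced portably. The sketch below (plain Python, not Spark code) shows why a mod-time-based scanner like FileInputDStream never notices the moved file: a rename updates the directory entry, not the file's own timestamps.

```python
import os
import tempfile
import time

# Create a file, record its mtime, then rename it after a pause.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "a.txt")
dst = os.path.join(tmpdir, "b.txt")

with open(src, "w") as f:
    f.write("hi")
mtime_before = os.stat(src).st_mtime

time.sleep(1)        # the rename would be visible if it touched mtime
os.rename(src, dst)  # the "move into the monitored directory"
mtime_after = os.stat(dst).st_mtime

# The rename preserved the modification time, so a scanner comparing
# mtime against its threshold still classifies the file as old.
assert mtime_after == mtime_before
```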
[jira] [Updated] (SPARK-18974) FileInputDStream could not detected files which moved to the directory
[ https://issues.apache.org/jira/browse/SPARK-18974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18974: - Component/s: (was: Input/Output)
[jira] [Updated] (SPARK-19007) Speedup and optimize the GradientBoostedTrees in the "data>memory" scene
[ https://issues.apache.org/jira/browse/SPARK-19007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-19007: -- Affects Version/s: (was: 1.6.3) (was: 1.6.2) (was: 1.6.1) (was: 1.5.2) (was: 1.5.1) (was: 1.6.0) (was: 1.5.0) (was: 2.0.0) Target Version/s: (was: 2.1.0) Priority: Minor (was: Major) Fix Version/s: (was: 2.1.0) > Speedup and optimize the GradientBoostedTrees in the "data>memory" scene > > > Key: SPARK-19007 > URL: https://issues.apache.org/jira/browse/SPARK-19007 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Affects Versions: 2.0.1, 2.0.2, 2.1.0 > Environment: A CDH cluster consisting of 3 Red Hat servers (120G memory, 40 cores, 43TB disk per server). >Reporter: zhangdenghui >Priority: Minor > Original Estimate: 168h > Remaining Estimate: 168h > > Test data: 80G of CTR training data from criteolabs (http://criteolabs.wpengine.com/downloads/download-terabyte-click-logs/); I used 1 of the 24 days' data. Some features needed to be replaced by newly generated continuous features; the way the new features are generated follows the approach mentioned in the xgboost paper. > Resources allocated: Spark on YARN, 20 executors, 8G memory and 2 cores per executor. > Parameters: numIterations 10, maxDepth 8, the rest left at defaults. > I tested the GradientBoostedTrees algorithm in mllib using the 80G CTR data mentioned above. > It took 1.5 hours in total, and I found many task failures after 6 or 7 GBT rounds. Without these task failures and retries it could be much faster, saving about half the time. I think they are caused by the RDD named predError in the while loop of the boost method in GradientBoostedTrees.scala: the lineage of predError grows after every GBT round, which eventually causes failures like this: > (ExecutorLostFailure (executor 6 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 10.2 GB of 10 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.) > I tried boosting spark.yarn.executor.memoryOverhead, but the memory needed is too much (even increasing it by half does not solve the problem), so I don't think that is a proper fix. > The predError checkpoint interval could be set smaller to cut the lineage, but that increases IO cost a lot. > I tried another way to solve this problem: I persisted the predError RDD every round, used pre_predError to record the previous round's RDD, and unpersisted it since it is no longer needed. > With my method the run took about 45 min, with no task failures and no additional memory. > So when the data is much larger than memory, this small improvement can speed up GradientBoostedTrees by 1-2x.
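The change described above is a generic caching pattern: persist the new `predError`, then unpersist the previous round's copy once nothing references it. The sketch below is plain Python with a stand-in `FakeRDD` class (purely illustrative, not the Spark RDD API) to show the bookkeeping:

```python
class FakeRDD:
    """Stand-in for an RDD that only tracks whether it is cached."""
    def __init__(self, name: str):
        self.name = name
        self.persisted = False

    def persist(self):
        self.persisted = True
        return self

    def unpersist(self):
        self.persisted = False
        return self

def boost(num_iterations: int):
    """Mimic the boosting loop: keep only the latest predError cached."""
    pred_error = FakeRDD("predError_0").persist()
    produced = [pred_error]
    for i in range(1, num_iterations):
        prev_pred_error = pred_error                      # remember the previous round
        pred_error = FakeRDD(f"predError_{i}").persist()  # derive and cache the new one
        prev_pred_error.unpersist()                       # previous copy no longer needed
        produced.append(pred_error)
    return produced

rdds = boost(10)
# Only the final round's predError remains cached; earlier copies were freed.
assert sum(r.persisted for r in rdds) == 1
```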
[jira] [Resolved] (SPARK-19006) should mentioned the max value allowed for spark.kryoserializer.buffer.max in doc
[ https://issues.apache.org/jira/browse/SPARK-19006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-19006. --- Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 16412 [https://github.com/apache/spark/pull/16412] > should mentioned the max value allowed for spark.kryoserializer.buffer.max in > doc > - > > Key: SPARK-19006 > URL: https://issues.apache.org/jira/browse/SPARK-19006 > Project: Spark > Issue Type: Documentation >Reporter: Yuexin Zhang >Priority: Trivial > Fix For: 2.2.0 > > > On the configuration doc page: https://spark.apache.org/docs/latest/configuration.html > we mention spark.kryoserializer.buffer.max: Maximum allowable size of Kryo serialization buffer. This must be larger than any object you attempt to serialize. Increase this if you get a "buffer limit exceeded" exception inside Kryo. > In the source code, it has a hard-coded upper limit: > val maxBufferSizeMb = conf.getSizeAsMb("spark.kryoserializer.buffer.max", "64m").toInt > if (maxBufferSizeMb >= ByteUnit.GiB.toMiB(2)) { > throw new IllegalArgumentException("spark.kryoserializer.buffer.max must be less than " + > s"2048 mb, got: + $maxBufferSizeMb mb.") > } > We should mention "this value must be less than 2048 mb" on the config page as well.
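The guard quoted from the source can be mirrored in a few lines. This is a plain-Python restatement of the documented 2048 MB ceiling, not Spark code (`check_kryo_buffer_max` is a hypothetical name):

```python
TWO_GIB_IN_MIB = 2 * 1024  # the hard-coded ceiling, 2 GiB expressed in MiB

def check_kryo_buffer_max(max_buffer_size_mb: int) -> None:
    """Reject spark.kryoserializer.buffer.max values of 2048 MB or more."""
    if max_buffer_size_mb >= TWO_GIB_IN_MIB:
        raise ValueError(
            "spark.kryoserializer.buffer.max must be less than 2048 mb, "
            f"got: {max_buffer_size_mb} mb."
        )

check_kryo_buffer_max(64)    # the default, accepted
check_kryo_buffer_max(2047)  # the largest accepted value
# check_kryo_buffer_max(2048) would raise ValueError
```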
[jira] [Comment Edited] (SPARK-18970) FileSource failure during file list refresh doesn't cause an application to fail, but stops further processing
[ https://issues.apache.org/jira/browse/SPARK-18970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771583#comment-15771583 ] Lev edited comment on SPARK-18970 at 12/27/16 8:15 PM: --- Actually this is exactly the behavior I want. My problem is that the application appeared to be alive, but was not processing any new files after this message in the log. I intentionally included the portion of the log at the top, showing that the file list was refreshed every couple of minutes before. But as you can see, refreshing has stopped after the logged error. was (Author: lev.numerify): Actually this is exactly the behavior I want. My problem is that the application appeared to be alive, but was not processing any new files after this message in the log. I intentionally included the portion of the log at the bottom, showing that the file list was refreshed every couple of minutes before. > FileSource failure during file list refresh doesn't cause an application to > fail, but stops further processing > -- > > Key: SPARK-18970 > URL: https://issues.apache.org/jira/browse/SPARK-18970 > Project: Spark > Issue Type: Bug > Components: SQL, Structured Streaming >Affects Versions: 2.0.0, 2.0.2 >Reporter: Lev > Attachments: sparkerror.log > > > Spark streaming application uses S3 files as streaming sources. After running > for several days, processing stopped even though the application continued to > run. 
> Stack trace: > java.io.FileNotFoundException: No such file or directory > 's3n://X' > at > com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:818) > at > com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:511) > at > org.apache.spark.sql.execution.datasources.HadoopFsRelation$$anonfun$7$$anonfun$apply$3.apply(fileSourceInterfaces.scala:465) > at > org.apache.spark.sql.execution.datasources.HadoopFsRelation$$anonfun$7$$anonfun$apply$3.apply(fileSourceInterfaces.scala:462) > at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434) > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) > at scala.collection.AbstractIterator.to(Iterator.scala:1336) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) > at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) > at scala.collection.AbstractIterator.toArray(Iterator.scala:1336) > at > org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:893) > at > org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:893) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at 
org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > I believe 2 things should (or can) be fixed: > 1. The application should fail in case of such an error. > 2. Allow the application to ignore such a failure, since there is a chance that > during the next refresh the error will not resurface. (In my case I believe the > error was caused by S3 cleaning the bucket at exactly the moment the > refresh was running.) > My code to create the streaming processing looks like the following: > val cq = sqlContext.readStream > .format("json") > .schema(struct) > .load(s"input") > .writeStream > .option("checkpointLocation", s"checkpoints") > .foreach(new ForeachWriter[Row] {...}) > .trigger(ProcessingTime("10 seconds")).start() > > cq.awaitTermination() --
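The second suggestion in the report — tolerate a failure that may not resurface on the next refresh — amounts to a bounded retry around the failing operation. A generic sketch of that idea (a helper of my own naming, not Spark API):

```python
def retry_transient(max_attempts, body):
    """Call `body` until it succeeds, re-raising only after max_attempts failures."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return body()
        except Exception as e:  # e.g. a transient S3 listing FileNotFoundException
            last_error = e
    raise last_error
```

Wrapping the file-list refresh (or the whole query restart) in something like this would let the application either recover from a one-off S3 inconsistency or fail loudly after the retries are exhausted, instead of silently stalling.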
[jira] [Updated] (SPARK-19006) should mentioned the max value allowed for spark.kryoserializer.buffer.max in doc
[ https://issues.apache.org/jira/browse/SPARK-19006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-19006: -- Priority: Trivial (was: Major) (Not "Major") > should mentioned the max value allowed for spark.kryoserializer.buffer.max in > doc > - > > Key: SPARK-19006 > URL: https://issues.apache.org/jira/browse/SPARK-19006 > Project: Spark > Issue Type: Documentation >Reporter: Yuexin Zhang >Priority: Trivial > > On configuration doc > page:https://spark.apache.org/docs/latest/configuration.html > We mentioned spark.kryoserializer.buffer.max : Maximum allowable size of Kryo > serialization buffer. This must be larger than any object you attempt to > serialize. Increase this if you get a "buffer limit exceeded" exception > inside Kryo. > from source code, it has hard coded upper limit : > val maxBufferSizeMb = conf.getSizeAsMb("spark.kryoserializer.buffer.max", > "64m").toInt > if (maxBufferSizeMb >= ByteUnit.GiB.toMiB(2)) { > throw new IllegalArgumentException("spark.kryoserializer.buffer.max must > be less than " + > s"2048 mb, got: + $maxBufferSizeMb mb.") > } > We should mention "this value must be less than 2048 mb" on the config page > as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18997) Recommended upgrade libthrift to 0.9.3
[ https://issues.apache.org/jira/browse/SPARK-18997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-18997: -- Priority: Minor (was: Critical) > Recommended upgrade libthrift to 0.9.3 > --- > > Key: SPARK-18997 > URL: https://issues.apache.org/jira/browse/SPARK-18997 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: meiyoula >Priority: Minor > > libthrift 0.9.2 has a serious security vulnerability:CVE-2015-3254 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18997) Recommended upgrade libthrift to 0.9.3
[ https://issues.apache.org/jira/browse/SPARK-18997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781139#comment-15781139 ] Sean Owen commented on SPARK-18997: --- So, the questions are: does this CVE impact Spark at all? and what are the other implications of updating like breaking changes? A maintenance release is typically fine to take, but this is a very large one. I'd like to see if the CVE even matters for Spark too -- can you provide more info? > Recommended upgrade libthrift to 0.9.3 > --- > > Key: SPARK-18997 > URL: https://issues.apache.org/jira/browse/SPARK-18997 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: meiyoula >Priority: Critical > > libthrift 0.9.2 has a serious security vulnerability:CVE-2015-3254 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18997) Recommended upgrade libthrift to 0.9.3
[ https://issues.apache.org/jira/browse/SPARK-18997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-18997: -- Issue Type: Improvement (was: Bug) > Recommended upgrade libthrift to 0.9.3 > --- > > Key: SPARK-18997 > URL: https://issues.apache.org/jira/browse/SPARK-18997 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: meiyoula >Priority: Minor > > libthrift 0.9.2 has a serious security vulnerability:CVE-2015-3254 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18757) Models in Pyspark support column setters
[ https://issues.apache.org/jira/browse/SPARK-18757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781096#comment-15781096 ] Joseph K. Bradley commented on SPARK-18757: --- I think it's useful to make the Python types match the Scala ones. However, I want to avoid introducing more abstract classes in Scala, unless they are really useful. We've run into issues with them, where some (especially the Classifier and ProbabilisticClassifier) need to be specialized so much that they are probably not worth the trouble. At some point, I hope we can replace them with traits and eliminate the "shared" implementations which are really not shareable across all subclasses. To see what I mean, check out the overrides in LogisticRegression. > Models in Pyspark support column setters > > > Key: SPARK-18757 > URL: https://issues.apache.org/jira/browse/SPARK-18757 > Project: Spark > Issue Type: Brainstorming > Components: ML, PySpark >Reporter: zhengruifeng > > Recently, I found three places in which column setters are missing: > KMeansModel, BisectingKMeansModel and OneVsRestModel. > These three models directly inherit `Model`, which doesn't have column setters, > so I had to add the missing setters manually in [SPARK-18625] and > [SPARK-18520]. > For now, models in pyspark still don't support column setters at all. > I suggest that we keep the hierarchy of pyspark models in line with that on > the Scala side: > For classification and regression algs, I'm making a trial in [SPARK-18739]. > In it, I try to copy the hierarchy from the Scala side. > For clustering algs, I think we may first create abstract classes > {{ClusteringModel}} and {{ProbabilisticClusteringModel}} on the Scala side, > and make clustering algs inherit them. Then, on the Python side, we copy the > hierarchy so that we don't need to add setters manually for each alg. > For feature algs, we can also use an abstract class {{FeatureModel}} on the > Scala side, and do the same thing. 
> What are your opinions? [~yanboliang][~josephkb][~sethah][~srowen] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
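The shared-hierarchy idea discussed in this thread — put the column setters in one base class so concrete models inherit them instead of redefining them — can be sketched like this. Class names follow the proposal; the API shape is illustrative, not Spark's actual ML classes:

```python
class ClusteringModel:
    """Proposed shared base: the column setter lives here, once."""
    def __init__(self):
        self.prediction_col = "prediction"

    def set_prediction_col(self, value):
        self.prediction_col = value
        return self  # return self so setters chain, Spark ML style


class KMeansModelSketch(ClusteringModel):
    pass


class BisectingKMeansModelSketch(ClusteringModel):
    pass
```

With this shape, adding a new clustering model never requires manually re-adding setters, which is the gap the reporter had to patch in SPARK-18625 and SPARK-18520.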
[jira] [Updated] (SPARK-18757) Models in Pyspark support column setters
[ https://issues.apache.org/jira/browse/SPARK-18757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-18757: -- Description: Recently, I found three places in which column setters are missing: KMeansModel, BisectingKMeansModel and OneVsRestModel. These three models directly inherit `Model` which dont have columns setters, so I had to add the missing setters manually in [SPARK-18625] and [SPARK-18520]. Fow now, models in pyspark still don't support column setters at all. I suggest that we keep the hierarchy of pyspark models in line with that in the scala side: For classifiation and regression algs, I‘m making a trial in [SPARK-18739]. In it, I try to copy the hierarchy from the scala side. For clustering algs, I think we may first create abstract classes {{ClusteringModel}} and {{ProbabilisticClusteringModel}} in the scala side, and make clustering algs inherit it. Then, in the python side, we copy the hierarchy so that we dont need to add setters manually for each alg. For features algs, we can also use a abstract class {{FeatureModel}} in scala side, and do the same thing. What's your opinions? [~yanboliang][~josephkb][~sethah][~srowen] was: Recently, I found three places in which column setters are missing: KMeansModel, BisectingKMeansModel and OneVsRestModel. These three models directly inherit `Model` which dont have columns setters, so I had to add the missing setters manually in [SPARK-18625] and [SPARK-18520]. Fow now, models in pyspark still don't support column setters at all. I suggest that we keep the hierarchy of pyspark models in line with that in the scala side: For classifiation and regression algs, I‘m making a trial in [SPARK-18379]. In it, I try to copy the hierarchy from the scala side. For clustering algs, I think we may first create abstract classes {{ClusteringModel}} and {{ProbabilisticClusteringModel}} in the scala side, and make clustering algs inherit it. 
Then, in the python side, we copy the hierarchy so that we dont need to add setters manually for each alg. For features algs, we can also use a abstract class {{FeatureModel}} in scala side, and do the same thing. What's your opinions? [~yanboliang][~josephkb][~sethah][~srowen] > Models in Pyspark support column setters > > > Key: SPARK-18757 > URL: https://issues.apache.org/jira/browse/SPARK-18757 > Project: Spark > Issue Type: Brainstorming > Components: ML, PySpark >Reporter: zhengruifeng > > Recently, I found three places in which column setters are missing: > KMeansModel, BisectingKMeansModel and OneVsRestModel. > These three models directly inherit `Model` which dont have columns setters, > so I had to add the missing setters manually in [SPARK-18625] and > [SPARK-18520]. > Fow now, models in pyspark still don't support column setters at all. > I suggest that we keep the hierarchy of pyspark models in line with that in > the scala side: > For classifiation and regression algs, I‘m making a trial in [SPARK-18739]. > In it, I try to copy the hierarchy from the scala side. > For clustering algs, I think we may first create abstract classes > {{ClusteringModel}} and {{ProbabilisticClusteringModel}} in the scala side, > and make clustering algs inherit it. Then, in the python side, we copy the > hierarchy so that we dont need to add setters manually for each alg. > For features algs, we can also use a abstract class {{FeatureModel}} in scala > side, and do the same thing. > What's your opinions? [~yanboliang][~josephkb][~sethah][~srowen] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-18842) De-duplicate paths in classpaths in processes for local-cluster mode to work around the length limitation on Windows
[ https://issues.apache.org/jira/browse/SPARK-18842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-18842. --- Resolution: Fixed Issue resolved by pull request 16398 [https://github.com/apache/spark/pull/16398] > De-duplicate paths in classpaths in processes for local-cluster mode to work > around the length limitation on Windows > > > Key: SPARK-18842 > URL: https://issues.apache.org/jira/browse/SPARK-18842 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon > Fix For: 2.2.0 > > > Currently, some tests are failing and hanging on Windows due to this > problem. For the reason described in SPARK-18718, some tests using {{local-cluster}} > mode were disabled on Windows because of the length limitation on the paths given to > classpaths. > The limit seems to be roughly 32K (see > https://blogs.msdn.microsoft.com/oldnewthing/20031210-00/?p=41553/ and > https://support.thoughtworks.com/hc/en-us/articles/213248526-Getting-around-maximum-command-line-length-is-32767-characters-on-Windows) > but executors were being launched with a command such as > https://gist.github.com/HyukjinKwon/5bc81061c250d4af5a180869b59d42ea in > tests only. > This command is roughly 40K long due to the class paths. However, it seems > more than half of the paths are duplicates. So, if we de-duplicate these paths, > the length is reduced to roughly 20K. > We should leave some headroom as more paths may be added in the future, but this > seems better than disabling all the tests for now, with minimised changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation
[ https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781029#comment-15781029 ] Tim Chan commented on SPARK-19013: -- Perhaps the documentation should be revised to recommend against using s3 as a location for {{checkpointLocation}}? I will test with an hdfs location and update this ticket with my findings. > java.util.ConcurrentModificationException when using s3 path as > checkpointLocation > --- > > Key: SPARK-19013 > URL: https://issues.apache.org/jira/browse/SPARK-19013 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.0.2 >Reporter: Tim Chan > > I have a structured stream job running on EMR. The job will fail due to this > {code} > Multiple HDFSMetadataLog are using s3://mybucket/myapp > org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162) > {code} > There is only one instance of this stream job running. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation
[ https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Chan updated SPARK-19013: - Description: I have a structured stream job running on EMR. The job will fail due to this {code} Multiple HDFSMetadataLog are using s3://mybucket/myapp org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162) {code} There is only one instance of this stream job running. was: I have a structured stream job running on EMR. The job will fail due to this ``` Multiple HDFSMetadataLog are using s3://mybucket/myapp org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162) ``` There is only one instance of this stream job running. > java.util.ConcurrentModificationException when using s3 path as > checkpointLocation > --- > > Key: SPARK-19013 > URL: https://issues.apache.org/jira/browse/SPARK-19013 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.0.2 >Reporter: Tim Chan > > I have a structured stream job running on EMR. The job will fail due to this > {code} > Multiple HDFSMetadataLog are using s3://mybucket/myapp > org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162) > {code} > There is only one instance of this stream job running. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation
[ https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-19013. -- Resolution: Not A Problem This is basically a S3 problem, not a Spark one. > java.util.ConcurrentModificationException when using s3 path as > checkpointLocation > --- > > Key: SPARK-19013 > URL: https://issues.apache.org/jira/browse/SPARK-19013 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.0.2 >Reporter: Tim Chan > > I have a structured stream job running on EMR. The job will fail due to this > ``` > Multiple HDFSMetadataLog are using s3://mybucket/myapp > org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162) > ``` > There is only one instance of this stream job running. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation
[ https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781019#comment-15781019 ] Shixiong Zhu commented on SPARK-19013: -- This is probably because of S3's negative cache: "a negative GET may be cached, such that even if an object is immediately created, the fact that there "wasn't" an object is still remembered." See https://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-trouble/index.html#visible-s3-inconsistency for details. > java.util.ConcurrentModificationException when using s3 path as > checkpointLocation > --- > > Key: SPARK-19013 > URL: https://issues.apache.org/jira/browse/SPARK-19013 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.0.2 >Reporter: Tim Chan > > I have a structured stream job running on EMR. The job will fail due to this > ``` > Multiple HDFSMetadataLog are using s3://mybucket/myapp > org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162) > ``` > There is only one instance of this stream job running. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
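The negative-caching behavior described in the comment above can be illustrated with a toy key-value store: once a GET misses, the miss itself is remembered, so an object created immediately afterwards stays invisible. This is a simulation of the failure mode only, not S3's implementation:

```python
class NegativeCachingStore:
    """Toy store that caches misses, mimicking S3's negative GET caching."""
    def __init__(self):
        self.store = {}
        self.negative_cache = set()

    def get(self, key):
        if key in self.negative_cache:
            return None                  # the cached miss wins
        if key in self.store:
            return self.store[key]
        self.negative_cache.add(key)     # remember that the key was absent
        return None

    def put(self, key, value):
        self.store[key] = value
```

A metadata log that probes for a batch file before writing it can trigger exactly this: the probe caches the miss, the write succeeds, and a subsequent read still reports the file missing, which is why a strongly consistent store is the safer choice for checkpointLocation.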
[jira] [Assigned] (SPARK-18990) make DatasetBenchmark fairer for Dataset
[ https://issues.apache.org/jira/browse/SPARK-18990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18990: Assignee: Wenchen Fan (was: Apache Spark) > make DatasetBenchmark fairer for Dataset > > > Key: SPARK-18990 > URL: https://issues.apache.org/jira/browse/SPARK-18990 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-18990) make DatasetBenchmark fairer for Dataset
[ https://issues.apache.org/jira/browse/SPARK-18990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18990: Assignee: Apache Spark (was: Wenchen Fan) > make DatasetBenchmark fairer for Dataset > > > Key: SPARK-18990 > URL: https://issues.apache.org/jira/browse/SPARK-18990 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18990) make DatasetBenchmark fairer for Dataset
[ https://issues.apache.org/jira/browse/SPARK-18990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-18990: - Fix Version/s: (was: 2.2.0) > make DatasetBenchmark fairer for Dataset > > > Key: SPARK-18990 > URL: https://issues.apache.org/jira/browse/SPARK-18990 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-18990) make DatasetBenchmark fairer for Dataset
[ https://issues.apache.org/jira/browse/SPARK-18990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai reopened SPARK-18990: -- > make DatasetBenchmark fairer for Dataset > > > Key: SPARK-18990 > URL: https://issues.apache.org/jira/browse/SPARK-18990 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.2.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation
Tim Chan created SPARK-19013: Summary: java.util.ConcurrentModificationException when using s3 path as checkpointLocation Key: SPARK-19013 URL: https://issues.apache.org/jira/browse/SPARK-19013 Project: Spark Issue Type: Bug Components: Structured Streaming Affects Versions: 2.0.2 Reporter: Tim Chan I have a structured stream job running on EMR. The job will fail due to this ``` Multiple HDFSMetadataLog are using s3://mybucket/myapp org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162) ``` There is only one instance of this stream job running. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780545#comment-15780545 ] Jork Zijlstra commented on SPARK-19012: --- The type is viewName: String, so you would expect any string to work. When executing createOrReplaceTempView(tableOrViewName: String), a ParseException of {code} == SQL == {tableOrViewName} {code} is thrown. You might not see that these are related, because it talks about a SQL ParseException while you only defined a temp view; you might not expect that the tableOrViewName triggers SQL parsing. A slightly clearer exception message ("viewName not supported") would be more helpful, or add the identifier rules to the documentation. As you said, prefixing the tableOrViewName with a non-numerical value does the trick (although to me this feels more like a workaround). > CreateOrReplaceTempView throws > org.apache.spark.sql.catalyst.parser.ParseException when viewName first char > is numerical > > > Key: SPARK-19012 > URL: https://issues.apache.org/jira/browse/SPARK-19012 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.1, 2.0.2 >Reporter: Jork Zijlstra > > Using a viewName where the first char is a numerical value on > dataframe.createOrReplaceTempView(viewName: String) causes: > {code} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', > 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', > 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', > 'UNBOUNDED', 'PRECEDING', 
'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', > 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0) > == SQL == > 1 > {code} > {code} > val tableOrViewName = "1" //fails > val tableOrViewName = "a" //works > sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
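A simplified way to see why "1" fails in the issue above: the lexer reads a leading digit as a numeric literal, so an unquoted view name must start with a letter or underscore. A rough pre-check along those lines (an approximation for illustration, not Spark's actual grammar):

```python
import re

# Approximate rule for an unquoted identifier: a letter or underscore first,
# then any mix of letters, digits, and underscores.
UNQUOTED_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def is_plausible_view_name(name):
    """Return True if `name` looks like an unquoted identifier."""
    return UNQUOTED_IDENTIFIER.match(name) is not None
```

Under this rule "1" and "1468079114" are rejected while "a" and a digit-suffixed name like "t1468079114" pass, matching the prefixing workaround mentioned in the comment.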
[jira] [Resolved] (SPARK-18990) make DatasetBenchmark fairer for Dataset
[ https://issues.apache.org/jira/browse/SPARK-18990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-18990. - Resolution: Fixed Fix Version/s: 2.2.0 > make DatasetBenchmark fairer for Dataset > > > Key: SPARK-18990 > URL: https://issues.apache.org/jira/browse/SPARK-18990 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.2.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19004) Fix `JDBCWriteSuite.testH2Dialect` by removing `getCatalystType`
[ https://issues.apache.org/jira/browse/SPARK-19004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-19004: Assignee: Dongjoon Hyun > Fix `JDBCWriteSuite.testH2Dialect` by removing `getCatalystType` > > > Key: SPARK-19004 > URL: https://issues.apache.org/jira/browse/SPARK-19004 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.1.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 2.2.0 > > > JDBCSuite and JDBCWriterSuite have its own testH2Dialect for their testing > purposes. > This issue fixes testH2Dialect in JDBCWriterSuite by removing getCatalystType > implementation in order to return correct types. Currently, it returns > Some(StringType) incorrectly. For the testH2Dialect in JDBCSuite, it's > intentional because of the test case Remap types via JdbcDialects. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-19004) Fix `JDBCWriteSuite.testH2Dialect` by removing `getCatalystType`
[ https://issues.apache.org/jira/browse/SPARK-19004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-19004. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 16409 [https://github.com/apache/spark/pull/16409] > Fix `JDBCWriteSuite.testH2Dialect` by removing `getCatalystType` > > > Key: SPARK-19004 > URL: https://issues.apache.org/jira/browse/SPARK-19004 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 2.1.0 >Reporter: Dongjoon Hyun >Priority: Minor > Fix For: 2.2.0 > > > JDBCSuite and JDBCWriteSuite each have their own testH2Dialect for testing > purposes. > This issue fixes testH2Dialect in JDBCWriteSuite by removing its getCatalystType > implementation so that the correct types are returned; currently it incorrectly > returns Some(StringType). The getCatalystType override in JDBCSuite's testH2Dialect is > intentional, because of the test case "Remap types via JdbcDialects".
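To illustrate the fix described above, here is a hedged sketch (not the actual patch) of what a test-only dialect without a `getCatalystType` override could look like. `canHandle` and `JdbcDialects.registerDialect` follow the public JdbcDialect API; everything else is an assumption:

{code}
// Sketch only: a test-only dialect for H2. With no getCatalystType
// override, Spark falls back to its default JDBC-to-Catalyst type
// mapping instead of forcing Some(StringType) for every column.
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

val testH2Dialect = new JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:h2")
  // getCatalystType intentionally not overridden here
}
JdbcDialects.registerDialect(testH2Dialect)
{code}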
[jira] [Resolved] (SPARK-18999) simplify Literal codegen
[ https://issues.apache.org/jira/browse/SPARK-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-18999. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 16402 [https://github.com/apache/spark/pull/16402] > simplify Literal codegen > > > Key: SPARK-18999 > URL: https://issues.apache.org/jira/browse/SPARK-18999 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Minor > Fix For: 2.2.0 > >
[jira] [Assigned] (SPARK-19010) Include Kryo exception in case of overflow
[ https://issues.apache.org/jira/browse/SPARK-19010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19010: Assignee: Apache Spark > Include Kryo exception in case of overflow > -- > > Key: SPARK-19010 > URL: https://issues.apache.org/jira/browse/SPARK-19010 > Project: Spark > Issue Type: Improvement >Affects Versions: 1.6.3 >Reporter: Sergei Lebedev >Assignee: Apache Spark > > SPARK-6087 replaced Kryo overflow exception with a SparkException giving > Spark-specific instructions on tackling the issue. The implementation also > [suppressed|https://github.com/apache/spark/pull/4947/files#diff-1f81c62dad0e2dfc387a974bb08c497cR165] > the original Kryo exception, which made it impossible to trace the operation > causing the overflow.
[jira] [Assigned] (SPARK-19010) Include Kryo exception in case of overflow
[ https://issues.apache.org/jira/browse/SPARK-19010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19010: Assignee: (was: Apache Spark) > Include Kryo exception in case of overflow > -- > > Key: SPARK-19010 > URL: https://issues.apache.org/jira/browse/SPARK-19010 > Project: Spark > Issue Type: Improvement >Affects Versions: 1.6.3 >Reporter: Sergei Lebedev > > SPARK-6087 replaced Kryo overflow exception with a SparkException giving > Spark-specific instructions on tackling the issue. The implementation also > [suppressed|https://github.com/apache/spark/pull/4947/files#diff-1f81c62dad0e2dfc387a974bb08c497cR165] > the original Kryo exception, which made it impossible to trace the operation > causing the overflow.
[jira] [Commented] (SPARK-19010) Include Kryo exception in case of overflow
[ https://issues.apache.org/jira/browse/SPARK-19010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780449#comment-15780449 ] Apache Spark commented on SPARK-19010: -- User 'superbobry' has created a pull request for this issue: https://github.com/apache/spark/pull/16416 > Include Kryo exception in case of overflow > -- > > Key: SPARK-19010 > URL: https://issues.apache.org/jira/browse/SPARK-19010 > Project: Spark > Issue Type: Improvement >Affects Versions: 1.6.3 >Reporter: Sergei Lebedev > > SPARK-6087 replaced Kryo overflow exception with a SparkException giving > Spark-specific instructions on tackling the issue. The implementation also > [suppressed|https://github.com/apache/spark/pull/4947/files#diff-1f81c62dad0e2dfc387a974bb08c497cR165] > the original Kryo exception, which made it impossible to trace the operation > causing the overflow.
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780429#comment-15780429 ] Herman van Hovell commented on SPARK-19012: --- {{"1"}} fails because it is a numeric literal. Either add a letter to the name, or place the name between backticks, e.g.: {{"`1`"}} This is not a bug per se, we could however improve the name parsing. > CreateOrReplaceTempView throws > org.apache.spark.sql.catalyst.parser.ParseException when viewName first char > is numerical > > > Key: SPARK-19012 > URL: https://issues.apache.org/jira/browse/SPARK-19012 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.1, 2.0.2 >Reporter: Jork Zijlstra > > Using a viewName where the the fist char is a numerical value on > dataframe.createOrReplaceTempView(viewName: String) causes: > {code} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', > 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', > 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', > 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', > 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 
'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0) > == SQL == > 1 > {code} > {code} > val tableOrViewName = "1" //fails > val tableOrViewName = "a" //works > sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName) > {code}
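A small sketch of the workaround suggested in the comment above, assuming a SparkSession named `sparkSession` and an ORC `path` as in the reporter's snippet:

{code}
val df = sparkSession.read.orc(path)

df.createOrReplaceTempView("1")    // fails: "1" parses as a numeric literal
df.createOrReplaceTempView("`1`")  // works: backticks make it an identifier
df.createOrReplaceTempView("a1")   // works: name starts with a letter
{code}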
[jira] [Updated] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jork Zijlstra updated SPARK-19012: -- Affects Version/s: 2.0.2 > CreateOrReplaceTempView throws > org.apache.spark.sql.catalyst.parser.ParseException when viewName first char > is numerical > > > Key: SPARK-19012 > URL: https://issues.apache.org/jira/browse/SPARK-19012 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.1, 2.0.2 >Reporter: Jork Zijlstra > > Using a viewName where the the fist char is a numerical value on > dataframe.createOrReplaceTempView(viewName: String) causes: > {code} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', > 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', > 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', > 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', > 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 
'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0) > == SQL == > 1 > {code} > {code} > val tableOrViewName = "1" //fails > val tableOrViewName = "a" //works > sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName) > {code}
[jira] [Updated] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jork Zijlstra updated SPARK-19012: -- Description: Using a viewName where the the fist char is a numerical value on dataframe.createOrReplaceTempView(viewName: String) causes: {code} Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 
'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0) == SQL == 1 {code} {code} val tableOrViewName = "1" //fails val tableOrViewName = "a" //works sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName) {code} was: Using a viewName where the the fist char is a numerical value on dataframe.createOrReplaceTempView(viewName: String) causes: {code} Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 
'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'I
[jira] [Created] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
Jork Zijlstra created SPARK-19012: - Summary: CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical Key: SPARK-19012 URL: https://issues.apache.org/jira/browse/SPARK-19012 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.0.1 Reporter: Jork Zijlstra Using a viewName where the the fist char is a numerical value on dataframe.createOrReplaceTempView(viewName: String) causes: {code} Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 
'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0) == SQL == 1468079114 {code} {code} val tableOrViewName = "1" //fails val tableOrViewName = "a" //works sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName) {code}
[jira] [Updated] (SPARK-19011) ApplicationDescription should add the Submission ID for the standalone cluster
[ https://issues.apache.org/jira/browse/SPARK-19011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hustfxj updated SPARK-19011: Description: A large standalone cluster may have lots of applications and drivers, so from the master page it may not be clear which driver started which application. So I think we can add the driver id to the ApplicationDescription. In fact, I have already implemented this. > ApplicationDescription should add the Submission ID for the standalone cluster > -- > > Key: SPARK-19011 > URL: https://issues.apache.org/jira/browse/SPARK-19011 > Project: Spark > Issue Type: Improvement >Reporter: hustfxj > > A large standalone cluster may have lots of applications and drivers, so > from the master page it may not be clear which driver started which application. So > I think we can add the driver id to the ApplicationDescription. In fact, I > have already implemented this.
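For illustration, a hypothetical, heavily simplified sketch of the proposal; the real ApplicationDescription carries more fields, and the field name `driverId` is an assumption, not the actual patch:

{code}
// Hypothetical sketch: carry the submission (driver) id in the
// application description so the master page can show which driver
// started which application.
private[spark] case class ApplicationDescription(
    name: String,
    maxCores: Option[Int],
    memoryPerExecutorMB: Int,
    appUiUrl: String,
    // proposed addition; None for apps not launched through a driver
    driverId: Option[String] = None)
{code}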
[jira] [Commented] (SPARK-19011) ApplicationDescription should add the Submission ID for the standalone cluster
[ https://issues.apache.org/jira/browse/SPARK-19011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780310#comment-15780310 ] hustfxj commented on SPARK-19011: - A large standalone cluster may have lots of applications and drivers, so from the master page it may not be clear which driver started which application. So I think we can add the driver id to the ApplicationDescription. In fact, I have already implemented this. > ApplicationDescription should add the Submission ID for the standalone cluster > -- > > Key: SPARK-19011 > URL: https://issues.apache.org/jira/browse/SPARK-19011 > Project: Spark > Issue Type: Improvement >Reporter: hustfxj >
[jira] [Created] (SPARK-19011) ApplicationDescription should add the Submission ID for the standalone cluster
hustfxj created SPARK-19011: --- Summary: ApplicationDescription should add the Submission ID for the standalone cluster Key: SPARK-19011 URL: https://issues.apache.org/jira/browse/SPARK-19011 Project: Spark Issue Type: Improvement Reporter: hustfxj
[jira] [Created] (SPARK-19010) Include Kryo exception in case of overflow
Sergei Lebedev created SPARK-19010: -- Summary: Include Kryo exception in case of overflow Key: SPARK-19010 URL: https://issues.apache.org/jira/browse/SPARK-19010 Project: Spark Issue Type: Improvement Affects Versions: 1.6.3 Reporter: Sergei Lebedev SPARK-6087 replaced Kryo overflow exception with a SparkException giving Spark-specific instructions on tackling the issue. The implementation also [suppressed|https://github.com/apache/spark/pull/4947/files#diff-1f81c62dad0e2dfc387a974bb08c497cR165] the original Kryo exception, which made it impossible to trace the operation causing the overflow.
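A sketch of the fix being requested: pass the original KryoException as the cause of the SparkException instead of dropping it. The surrounding code is a simplified assumption modeled on the serializer's write path, not the actual patch:

{code}
// Sketch: chain the original exception so the operation that caused the
// overflow stays visible in the stack trace.
try {
  kryo.writeClassAndObject(output, obj)
} catch {
  case e: KryoException if e.getMessage != null &&
      e.getMessage.startsWith("Buffer overflow") =>
    throw new SparkException("Kryo serialization failed: " + e.getMessage +
      ". To avoid this, increase spark.kryoserializer.buffer.max value.",
      e) // previously the cause was omitted, suppressing the Kryo exception
}
{code}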
[jira] [Resolved] (SPARK-18976) in standalone mode, an executor expired by HeartbeatReceiver still takes up cores but has no tasks assigned to it
[ https://issues.apache.org/jira/browse/SPARK-18976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-18976. --- Resolution: Duplicate > in standalone mode, an executor expired by HeartbeatReceiver still takes up > cores but has no tasks assigned to it > -- > > Key: SPARK-18976 > URL: https://issues.apache.org/jira/browse/SPARK-18976 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.6.1 > Environment: jdk1.8.0_77 Red Hat 4.4.7-11 >Reporter: liujianhui > Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png > > > h2. scene > When an executor is expired by the HeartbeatReceiver in the driver, the driver > marks that executor as not alive and the task scheduler no longer assigns tasks > to it, but the executor's status stays RUNNING and it keeps taking up cores. Here, > executor 18 was expired and runs no tasks; its task time is far less than that of > the normal executor 142, yet the app page still shows it as running. > !screenshot-1.png! > !screenshot-2.png! > !screenshot-3.png! > h2. process: > # The executor is expired by the HeartbeatReceiver because its last heartbeat > exceeded the executor timeout. > # The executor is removed in CoarseGrainedSchedulerBackend.killExecutors, so it > is marked as dead and no longer scheduled in offers from then on, > because it is in executorsPendingToRemove. > # The status of that executor stays RUNNING because the CoarseGrainedExecutorBackend > process still exists and re-registers its block manager with the driver every > 10s; log: > {code} > 16/12/22 17:04:26 INFO Executor: Told to re-register on heartbeat > 16/12/22 17:04:26 INFO BlockManager: BlockManager re-registering with master > 16/12/22 17:04:26 INFO BlockManagerMaster: Trying to register BlockManager > 16/12/22 17:04:26 INFO BlockManagerMaster: Registered BlockManager > 16/12/22 17:04:26 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:36 INFO Executor: Told to re-register on heartbeat > 16/12/22 17:04:36 INFO BlockManager: BlockManager re-registering with master > 16/12/22 17:04:36 INFO BlockManagerMaster: Trying to register BlockManager > 16/12/22 17:04:36 INFO BlockManagerMaster: Registered BlockManager > 16/12/22 17:04:36 INFO BlockManager: Reporting 0 blocks to the master. > 16/12/22 17:04:46 INFO Executor: Told to re-register on heartbeat > 16/12/22 17:04:46 INFO BlockManager: BlockManager re-registering with master > 16/12/22 17:04:46 INFO BlockManagerMaster: Trying to register BlockManager > 16/12/22 17:04:46 INFO BlockManagerMaster: Registered BlockManager > 16/12/22 17:04:46 INFO BlockManager: Reporting 0 blocks to the master. > 16/12/22 17:04:56 INFO Executor: Told to re-register on heartbeat > 16/12/22 17:04:56 INFO BlockManager: BlockManager re-registering with master > 16/12/22 17:04:56 INFO BlockManagerMaster: Trying to register BlockManager > 16/12/22 17:04:56 INFO BlockManagerMaster: Registered BlockManager > 16/12/22 17:04:56 INFO BlockManager: Reporting 0 blocks to the master. > {code} > h2. resolve > When the number of re-register attempts exceeds some threshold (e.g. 10), the > executor should exit.
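A hypothetical sketch of the suggested resolution above; the names, signatures, and threshold are assumptions, not the actual Spark code. The idea is to count consecutive re-register requests in the executor's heartbeat loop and exit once a threshold is exceeded:

{code}
// Sketch only: inside the executor's heartbeat reporting.
private val maxReregisterAttempts = 10 // assumed threshold
private var reregisterAttempts = 0

private def reportHeartBeat(): Unit = {
  val response = heartbeatReceiverRef.askWithRetry[HeartbeatResponse](heartbeat)
  if (response.reregisterBlockManager) {
    reregisterAttempts += 1
    if (reregisterAttempts > maxReregisterAttempts) {
      logError(s"Told to re-register $reregisterAttempts times; exiting")
      System.exit(1) // exit instead of re-registering forever
    }
    env.blockManager.reregister()
  } else {
    reregisterAttempts = 0 // reset on a successful heartbeat
  }
}
{code}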
[jira] [Commented] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
[ https://issues.apache.org/jira/browse/SPARK-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780014#comment-15780014 ] Jared commented on SPARK-15359: --- Hi, I also ran into a similar problem when running Spark on Mesos. In my testing, the Spark Mesos dispatcher did not register with the Mesos master successfully, yet the dispatcher was still brought up and kept listening on the default port 7077. I think the dispatcher should be shut down if the status of mesosDriver.run() is DRIVER_ABORTED. I didn't quite understand part of the description. What is the meaning of "successful registration"? Do you mean that mesosDriver.run() returns without aborting? If we're working on exactly the same problem, I would like to contribute to fixing this issue, for example by reviewing code changes or testing the fixes. Thanks, Jared > Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run() > --- > > Key: SPARK-15359 > URL: https://issues.apache.org/jira/browse/SPARK-15359 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Devaraj K >Priority: Minor > > The Mesos dispatcher handles a DRIVER_ABORTED status from mesosDriver.run() > during successful registration, but if mesosDriver.run() returns > DRIVER_ABORTED after a successful registration, no action is taken for the > status and the thread is simply terminated. > I think we need to throw the exception and shut down the dispatcher.
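A hedged sketch of the handling both the description and the comment ask for: check the status returned by mesosDriver.run() and shut the dispatcher down on DRIVER_ABORTED. `Protos.Status` comes from the Mesos Java API; the thread wrapper and logging are assumptions, not the dispatcher's actual code:

{code}
import org.apache.mesos.Protos.Status

// Sketch: run the Mesos driver on its own thread and fail fast on abort,
// instead of letting the thread die silently while the dispatcher keeps
// listening on port 7077 without a registration.
new Thread("MesosSchedulerDriver") {
  override def run(): Unit = {
    val status = mesosDriver.run() // blocks until the driver stops or aborts
    if (status == Status.DRIVER_ABORTED) {
      logError(s"Mesos driver aborted (status: $status); shutting down dispatcher")
      System.exit(1)
    }
  }
}.start()
{code}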