[jira] [Comment Edited] (SPARK-18974) FileInputDStream could not detected files which moved to the directory

2016-12-27 Thread Adam Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782284#comment-15782284
 ] 

Adam Wang edited comment on SPARK-18974 at 12/28/16 7:31 AM:
-

Thanks for the reminder, this is my fault. This bug does not seem easy to solve. Could 
we define a Set[Path] to check whether a file had been read by the InputDStream? For example:

@transient private var selectedFiles = new mutable.HashSet[String]()

...

  override def start() {
    addOldFilesToSelectedFiles()
  }

...

  private def isNewFile(path: Path, currentTime: Long,
      modTimeIgnoreThreshold: Long): Boolean = {
    val pathStr = path.toString
    ..
    if (selectedFiles.contains(pathStr)) {
      return false
    }
    ...
  }

But it would be very inefficient for a directory with a lot of files. Is there any other 
way to detect the files that were moved in?



was (Author: adam wang):
Thanks for the reminder, this is my fault. This bug does not seem easy to solve. Could 
we define a Set[Path] to check whether a file had been read by the InputDStream? For example:

@transient private var selectedFiles = new mutable.HashSet[String]()

...

  override def start() {
    addOldFilesToSelectedFiles()
  }

...

  private def isNewFile(path: Path, currentTime: Long,
      modTimeIgnoreThreshold: Long): Boolean = {
    val pathStr = path.toString
    ..
    // Reject the file if it was already considered earlier
    if (selectedFiles.contains(pathStr)) {
      return false
    }
    logDebug(s"$pathStr accepted with mod time $modTime")
    return true
  }

But it would be very inefficient for a directory with a lot of files. Is there any other 
way to detect the files that were moved in?


> FileInputDStream could not detected files which moved to the directory 
> ---
>
> Key: SPARK-18974
> URL: https://issues.apache.org/jira/browse/SPARK-18974
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 1.6.3, 2.0.2
>Reporter: Adam Wang
>
> FileInputDStream uses the modification time to find new files, but if a file is 
> moved into the directory its modification time is not changed, so 
> FileInputDStream cannot detect these files.
> I think one way to fix this bug is to get the access_time and use it for the 
> check, but that needs a Set of files to record all old files, which would be 
> very inefficient for a directory with a lot of files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18974) FileInputDStream could not detected files which moved to the directory

2016-12-27 Thread Adam Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782284#comment-15782284
 ] 

Adam Wang commented on SPARK-18974:
---

Thanks for the reminder, this is my fault. This bug does not seem easy to solve. Could 
we define a Set[Path] to check whether a file has been read by the InputDStream? For example:

@transient private var selectedFiles = new mutable.HashSet[String]()

...

  override def start() {
    addOldFilesToSelectedFiles()
  }

...

  private def isNewFile(path: Path, currentTime: Long,
      modTimeIgnoreThreshold: Long): Boolean = {
    val pathStr = path.toString
    ..
    // Reject the file if it was already considered earlier
    if (selectedFiles.contains(pathStr)) {
      return false
    }
    logDebug(s"$pathStr accepted with mod time $modTime")
    return true
  }

But it would be very inefficient for a directory with a lot of files. Is there any other 
way to detect the files that were moved in?
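
To make the idea concrete, here is a minimal, self-contained sketch of the path-based check (my own illustration, not code from FileInputDStream; the directory in the usage comment is hypothetical). It treats a file as new whenever its path has not been selected before, regardless of modification time, which is what makes moved files visible, and it also shows why the set grows with the total number of files ever seen:

{code}
import scala.collection.mutable

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object MovedFileDetector {
  // Paths that have already been selected in earlier batches.
  private val selectedFiles = new mutable.HashSet[String]()

  // Return the files in `dir` whose paths have not been seen before,
  // ignoring modification time entirely, and remember them for next time.
  def newFiles(fs: FileSystem, dir: Path): Seq[Path] = {
    val fresh = fs.listStatus(dir)
      .map(_.getPath)
      .filterNot(p => selectedFiles.contains(p.toString))
      .toSeq
    fresh.foreach(p => selectedFiles += p.toString)
    fresh
  }
}

// Usage sketch:
// val fs = FileSystem.get(new Configuration())
// val batch = MovedFileDetector.newFiles(fs, new Path("/tmp/streaming-input"))  // hypothetical directory
{code}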


> FileInputDStream could not detected files which moved to the directory 
> ---
>
> Key: SPARK-18974
> URL: https://issues.apache.org/jira/browse/SPARK-18974
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 1.6.3, 2.0.2
>Reporter: Adam Wang
>
> FileInputDStream uses the modification time to find new files, but if a file is 
> moved into the directory its modification time is not changed, so 
> FileInputDStream cannot detect these files.
> I think one way to fix this bug is to get the access_time and use it for the 
> check, but that needs a Set of files to record all old files, which would be 
> very inefficient for a directory with a lot of files.
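
For reference, a small sketch of the access_time idea from the description above (illustration only; the directory path is hypothetical). Note that on HDFS the access time is only as precise as dfs.namenode.accesstime.precision (one hour by default), so it is not an obvious replacement for the mod-time check:

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val fs = FileSystem.get(new Configuration())
// List the monitored directory and inspect both timestamps for each file.
fs.listStatus(new Path("/tmp/streaming-input")).foreach { status =>
  println(s"${status.getPath} " +
    s"modTime=${status.getModificationTime} accessTime=${status.getAccessTime}")
}
{code}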



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18974) FileInputDStream could not detected files which moved to the directory

2016-12-27 Thread Adam Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782284#comment-15782284
 ] 

Adam Wang edited comment on SPARK-18974 at 12/28/16 7:30 AM:
-

Thanks for the reminder, this is my fault. This bug does not seem easy to solve. Could 
we define a Set[Path] to check whether a file had been read by the InputDStream? For example:

@transient private var selectedFiles = new mutable.HashSet[String]()

...

  override def start() {
    addOldFilesToSelectedFiles()
  }

...

  private def isNewFile(path: Path, currentTime: Long,
      modTimeIgnoreThreshold: Long): Boolean = {
    val pathStr = path.toString
    ..
    // Reject the file if it was already considered earlier
    if (selectedFiles.contains(pathStr)) {
      return false
    }
    logDebug(s"$pathStr accepted with mod time $modTime")
    return true
  }

But it would be very inefficient for a directory with a lot of files. Is there any other 
way to detect the files that were moved in?



was (Author: adam wang):
Thanks for the reminder, this is my fault. This bug does not seem easy to solve. Could 
we define a Set[Path] to check whether a file has been read by the InputDStream? For example:

@transient private var selectedFiles = new mutable.HashSet[String]()

...

  override def start() {
    addOldFilesToSelectedFiles()
  }

...

  private def isNewFile(path: Path, currentTime: Long,
      modTimeIgnoreThreshold: Long): Boolean = {
    val pathStr = path.toString
    ..
    // Reject the file if it was already considered earlier
    if (selectedFiles.contains(pathStr)) {
      return false
    }
    logDebug(s"$pathStr accepted with mod time $modTime")
    return true
  }

But it would be very inefficient for a directory with a lot of files. Is there any other 
way to detect the files that were moved in?


> FileInputDStream could not detected files which moved to the directory 
> ---
>
> Key: SPARK-18974
> URL: https://issues.apache.org/jira/browse/SPARK-18974
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 1.6.3, 2.0.2
>Reporter: Adam Wang
>
> FileInputDStream uses the modification time to find new files, but if a file is 
> moved into the directory its modification time is not changed, so 
> FileInputDStream cannot detect these files.
> I think one way to fix this bug is to get the access_time and use it for the 
> check, but that needs a Set of files to record all old files, which would be 
> very inefficient for a directory with a lot of files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19002) Check pep8 against all the python scripts

2016-12-27 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-19002:
-
Summary: Check pep8 against all the python scripts  (was: Check pep8 
against dev/*.py scripts)

> Check pep8 against all the python scripts
> -
>
> Key: SPARK-19002
> URL: https://issues.apache.org/jira/browse/SPARK-19002
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Hyukjin Kwon
>
> We can check pep8 against dev/*.py scripts. There are already several python 
> scripts being checked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19002) Check pep8 against all the python scripts

2016-12-27 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-19002:
-
Description: We can check pep8 against all the python scripts. Currently, only 
some of the python scripts are being checked.  (was: We can check pep8 against 
dev/*.py scripts. There are already several python scripts being checked.)

> Check pep8 against all the python scripts
> -
>
> Key: SPARK-19002
> URL: https://issues.apache.org/jira/browse/SPARK-19002
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Hyukjin Kwon
>
> We can check pep8 against all the python scripts. Currently, only some of the 
> python scripts are being checked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19002) Check pep8 against dev/*.py scripts

2016-12-27 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782211#comment-15782211
 ] 

Hyukjin Kwon commented on SPARK-19002:
--

I changed the priority to major because it seems some scripts are not 
executable via python 3.

> Check pep8 against dev/*.py scripts
> ---
>
> Key: SPARK-19002
> URL: https://issues.apache.org/jira/browse/SPARK-19002
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Hyukjin Kwon
>
> We can check pep8 against dev/*.py scripts. There are already several python 
> scripts being checked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19002) Check pep8 against dev/*.py scripts

2016-12-27 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-19002:
-
Priority: Major  (was: Trivial)

> Check pep8 against dev/*.py scripts
> ---
>
> Key: SPARK-19002
> URL: https://issues.apache.org/jira/browse/SPARK-19002
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: Hyukjin Kwon
>
> We can check pep8 against dev/*.py scripts. There are already several python 
> scripts being checked.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18931) Create empty staging directory in partitioned table on insert

2016-12-27 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-18931.
-
Resolution: Duplicate

> Create empty staging directory in partitioned table on insert
> -
>
> Key: SPARK-18931
> URL: https://issues.apache.org/jira/browse/SPARK-18931
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Egor Pahomov
>
> CREATE TABLE temp.test_partitioning_4 (
>   num string
>  ) 
> PARTITIONED BY (
>   day string)
>   stored as parquet
> On every 
> INSERT INTO TABLE temp.test_partitioning_4 PARTITION (day)
> select day, count(*) as num from 
> hss.session where year=2016 and month=4 
> group by day
> new directory 
> ".hive-staging_hive_2016-12-19_15-55-11_298_3412488541559534475-4" created on 
> HDFS. It's a big issue, because I insert every day and a bunch of empty dirs 
> is very bad for HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18931) Create empty staging directory in partitioned table on insert

2016-12-27 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782017#comment-15782017
 ] 

Xiao Li commented on SPARK-18931:
-

Please try it using the latest nightly build: 
https://people.apache.org/~pwendell/spark-nightly/. 

If it still fails, please reopen it. Thanks!

> Create empty staging directory in partitioned table on insert
> -
>
> Key: SPARK-18931
> URL: https://issues.apache.org/jira/browse/SPARK-18931
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2
>Reporter: Egor Pahomov
>
> CREATE TABLE temp.test_partitioning_4 (
>   num string
>  ) 
> PARTITIONED BY (
>   day string)
>   stored as parquet
> On every 
> INSERT INTO TABLE temp.test_partitioning_4 PARTITION (day)
> select day, count(*) as num from 
> hss.session where year=2016 and month=4 
> group by day
> new directory 
> ".hive-staging_hive_2016-12-19_15-55-11_298_3412488541559534475-4" created on 
> HDFS. It's a big issue, because I insert every day and a bunch of empty dirs 
> is very bad for HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16563) Repeat calling Spark SQL thrift server fetchResults return empty for ExecuteStatement operation

2016-12-27 Thread vishal agrawal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782003#comment-15782003
 ] 

vishal agrawal commented on SPARK-16563:


The fix for this issue is causing the Thrift server to hang while fetching a large 
amount of data from the Thrift server. Raised SPARK-18857 for the same; kindly check.

> Repeat calling Spark SQL thrift server fetchResults return empty for 
> ExecuteStatement operation
> ---
>
> Key: SPARK-16563
> URL: https://issues.apache.org/jira/browse/SPARK-16563
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.2
>Reporter: Gu Huiqin Alice
>Assignee: Gu Huiqin Alice
>Priority: Minor
> Fix For: 2.0.1, 2.1.0
>
>
> Repeatedly calling FetchResults(... orientation=FetchOrientation.FETCH_FIRST ..) 
> of the Spark SQL thrift service will return an empty set after calling 
> ExecuteStatement of TCLIService. 
> The bug exists in the function *public RowSet getNextRowSet(FetchOrientation 
> orientation, long maxRows)*
> https://github.com/apache/spark/blob/02c8072eea72425e89256347e1f373a3e76e6eba/sql/hive-thriftserver/src/main/java/org/apache/hive/service/cli/operation/SQLOperation.java#L332
> The iterator for getting results can be used only once, so repeatedly calling 
> FetchResults with the FETCH_FIRST parameter will return an empty result. 
> FetchOrientation.FETCH_FIRST



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18992) Move spark.sql.hive.thriftServer.singleSession to SQLConf

2016-12-27 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-18992.
-
   Resolution: Fixed
Fix Version/s: 2.2.0

> Move spark.sql.hive.thriftServer.singleSession to SQLConf
> -
>
> Key: SPARK-18992
> URL: https://issues.apache.org/jira/browse/SPARK-18992
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.0
>Reporter: Xiao Li
>Assignee: Xiao Li
> Fix For: 2.2.0
>
>
> Since {{spark.sql.hive.thriftServer.singleSession}} is a configuration of the SQL 
> component, this conf can be moved from {{SparkConf}} to {{StaticSQLConf}}. 
> When we introduced {{spark.sql.hive.thriftServer.singleSession}}, all SQL 
> configurations could be modified in different sessions. Later, static SQL 
> configuration was added; it is a perfect fit for 
> {{spark.sql.hive.thriftServer.singleSession}}. Previously, we did the same 
> move for {{spark.sql.warehouse.dir}} from {{SparkConf}} to {{StaticSQLConf}}.
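
For context, a small sketch of how a static SQL conf behaves after this change (assuming a 2.2.0+ build where the conf lives in {{StaticSQLConf}}; the app name is hypothetical): it can only be set when the session is built and cannot be modified at runtime, which matches the intent of singleSession.

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("static-conf-example")  // hypothetical app name
  .config("spark.sql.hive.thriftServer.singleSession", "true")  // set at build time
  .getOrCreate()

// Static SQL confs are fixed for the lifetime of the shared SparkSession;
// attempting to change one per session fails, e.g.:
// spark.conf.set("spark.sql.hive.thriftServer.singleSession", "false")
// => org.apache.spark.sql.AnalysisException: Cannot modify the value of a static config
{code}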



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2016-12-27 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781450#comment-15781450
 ] 

Josh Bacon edited comment on SPARK-18737 at 12/27/16 11:16 PM:
---

Thanks for the quick reply,

We've cut our code down to a minimum and are only using JavaDStream and 
JavaDStream to call isEmpty (which hits the KryoException). So we do 
not have any classes to register, yet are still experiencing KryoExceptions. 
Nonetheless, enabling Java serialization and setting requireRegister to false 
do not appear to have any effect.

Are there any other details you'd like me to provide to help identify this 
issue?

We are using the library: org.apache.spark.streaming.kinesis.KinesisUtils


was (Author: jbacon):
Thanks for the quick reply,

We've cut our code down to a minimum and are only using JavaDStream and 
JavaDStream to call isEmpty (which hits the KryoException). So we do 
not have any classes to register, yet are still experiencing KryoExceptions. 
Nonetheless, enabling Java serialization and setting requireRegister to false 
do not appear to have any effect.

Are there any other details you'd like me to provide to help identify this 
issue?

> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 
> 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 
> 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching 
> task 77 on executor id: 2 hostname: 
> ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory 
> on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: 
> 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, 
> ip-10-121-14-147.eu-central-1.compute.internal): 
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
> at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
> at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 
> 2

[jira] [Commented] (SPARK-18393) DataFrame pivot output column names should respect aliases

2016-12-27 Thread Andrew Ray (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781473#comment-15781473
 ] 

Andrew Ray commented on SPARK-18393:


It wouldn't hurt to backport to 2.0; it's a pretty simple fix.

> DataFrame pivot output column names should respect aliases
> --
>
> Key: SPARK-18393
> URL: https://issues.apache.org/jira/browse/SPARK-18393
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Eric Liang
>Priority: Minor
>
> For example
> {code}
> val df = spark.range(100).selectExpr("id % 5 as x", "id % 2 as a", "id as b")
> df
>   .groupBy('x)
>   .pivot("a", Seq(0, 1))
>   .agg(expr("sum(b)").as("blah"), expr("count(b)").as("foo"))
>   .show()
> +---+--------------------+---------------------+--------------------+---------------------+
> |  x|0_sum(`b`) AS `blah`|0_count(`b`) AS `foo`|1_sum(`b`) AS `blah`|1_count(`b`) AS `foo`|
> +---+--------------------+---------------------+--------------------+---------------------+
> |  0|                 450|                   10|                 500|                   10|
> |  1|                 510|                   10|                 460|                   10|
> |  3|                 530|                   10|                 480|                   10|
> |  2|                 470|                   10|                 520|                   10|
> |  4|                 490|                   10|                 540|                   10|
> +---+--------------------+---------------------+--------------------+---------------------+
> {code}
> The column names here are quite hard to read. Ideally we would respect the 
> aliases and generate column names like 0_blah, 0_foo, 1_blah, 1_foo instead.
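
For anyone stuck on an affected version, a hedged workaround sketch (my own illustration, not the fix mentioned above; it assumes an existing SparkSession named spark): rename the generated pivot columns after the aggregation.

{code}
import org.apache.spark.sql.functions.expr

val df = spark.range(100).selectExpr("id % 5 as x", "id % 2 as a", "id as b")
val pivoted = df
  .groupBy("x")
  .pivot("a", Seq(0, 1))
  .agg(expr("sum(b)").as("blah"), expr("count(b)").as("foo"))

// Collapse names like "0_sum(`b`) AS `blah`" down to "0_blah".
val renamed = pivoted.columns.foldLeft(pivoted) { (d, name) =>
  d.withColumnRenamed(name, name.replaceAll("""_\w+\(`b`\) AS `(\w+)`""", "_$1"))
}
renamed.show()
{code}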



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2016-12-27 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781450#comment-15781450
 ] 

Josh Bacon commented on SPARK-18737:


Thanks for the quick reply,

We've cut our code down to a minimum and are only using JavaDStream and 
JavaDStream to call isEmpty (which hits the KryoException). So we do 
not have any classes to register, yet are still experiencing KryoExceptions. 
Nonetheless, enabling Java serialization and setting requireRegister to false 
do not appear to have any effect.

Are there any other details you'd like me to provide to help identify this 
issue?

> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 
> 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 
> 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching 
> task 77 on executor id: 2 hostname: 
> ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory 
> on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: 
> 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, 
> ip-10-121-14-147.eu-central-1.compute.internal): 
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
> at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
> at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 
> 2.0.1, we see the Kryo deserialization exception and over time the Spark 
> streaming job stops processing since too many tasks failed.
> Our action was to use conf.set("spark.serializer", 
> "org.apache.spark.serializer.JavaSerializer") and to disable Kryo class 
> registration with conf.set("spark.kryo.registrationRequired", false). We hope 
> to identify the root cause of the exception. 
> However, setting the serializer to JavaSerializer is obviously ignored by the 
> Spark-internals. Despite the setting we still see the exception printed in 
> the log and tasks fail. The occurrence seems to 

[jira] [Updated] (SPARK-17807) Scalatest listed as compile dependency in spark-tags

2016-12-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-17807:
--
Assignee: Ryan Williams

> Scalatest listed as compile dependency in spark-tags
> 
>
> Key: SPARK-17807
> URL: https://issues.apache.org/jira/browse/SPARK-17807
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.0
>Reporter: Tom Standard
>Assignee: Ryan Williams
>Priority: Trivial
> Fix For: 2.0.3, 2.1.1, 2.2.0
>
>
> In spark-tags:2.0.0, Scalatest is listed as a compile time dependency - 
> shouldn't this be in test scope?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19006) should mentioned the max value allowed for spark.kryoserializer.buffer.max in doc

2016-12-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-19006:
--
Assignee: Yuexin Zhang

> should mentioned the max value allowed for spark.kryoserializer.buffer.max in 
> doc
> -
>
> Key: SPARK-19006
> URL: https://issues.apache.org/jira/browse/SPARK-19006
> Project: Spark
>  Issue Type: Documentation
>Reporter: Yuexin Zhang
>Assignee: Yuexin Zhang
>Priority: Trivial
> Fix For: 2.2.0
>
>
> On the configuration doc page: https://spark.apache.org/docs/latest/configuration.html
> we mention spark.kryoserializer.buffer.max: Maximum allowable size of Kryo 
> serialization buffer. This must be larger than any object you attempt to 
> serialize. Increase this if you get a "buffer limit exceeded" exception 
> inside Kryo.
> From the source code, it has a hard-coded upper limit:
> val maxBufferSizeMb = conf.getSizeAsMb("spark.kryoserializer.buffer.max", "64m").toInt
> if (maxBufferSizeMb >= ByteUnit.GiB.toMiB(2)) {
>   throw new IllegalArgumentException("spark.kryoserializer.buffer.max must be less than " +
>     s"2048 mb, got: + $maxBufferSizeMb mb.")
> }
> We should mention "this value must be less than 2048 mb" on the config page 
> as well.
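
A tiny illustration of the limit described above (the app name is hypothetical): the value is a size string, and anything at or above 2048m is rejected when the Kryo serializer is constructed.

{code}
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("kryo-buffer-example")              // hypothetical app name
  .set("spark.kryoserializer.buffer.max", "1g")   // fine: below the 2048 mb cap
// .set("spark.kryoserializer.buffer.max", "2g")  // rejected: must be less than 2048 mb
{code}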



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18393) DataFrame pivot output column names should respect aliases

2016-12-27 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781436#comment-15781436
 ] 

Hyukjin Kwon commented on SPARK-18393:
--

Yup, I checked this against the current master. Hi [~a1ray], should we backport 
this to 2.0.x?

> DataFrame pivot output column names should respect aliases
> --
>
> Key: SPARK-18393
> URL: https://issues.apache.org/jira/browse/SPARK-18393
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Eric Liang
>Priority: Minor
>
> For example
> {code}
> val df = spark.range(100).selectExpr("id % 5 as x", "id % 2 as a", "id as b")
> df
>   .groupBy('x)
>   .pivot("a", Seq(0, 1))
>   .agg(expr("sum(b)").as("blah"), expr("count(b)").as("foo"))
>   .show()
> +---+--------------------+---------------------+--------------------+---------------------+
> |  x|0_sum(`b`) AS `blah`|0_count(`b`) AS `foo`|1_sum(`b`) AS `blah`|1_count(`b`) AS `foo`|
> +---+--------------------+---------------------+--------------------+---------------------+
> |  0|                 450|                   10|                 500|                   10|
> |  1|                 510|                   10|                 460|                   10|
> |  3|                 530|                   10|                 480|                   10|
> |  2|                 470|                   10|                 520|                   10|
> |  4|                 490|                   10|                 540|                   10|
> +---+--------------------+---------------------+--------------------+---------------------+
> {code}
> The column names here are quite hard to read. Ideally we would respect the 
> aliases and generate column names like 0_blah, 0_foo, 1_blah, 1_foo instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2016-12-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-18737.
---
Resolution: Not A Problem

Yes, see the message right above this. Spark 2.x does not necessarily work like 
1.x and Kryo is not optional in all code paths. You will want to disable 
registration or register your classes, in general. Disabling Kryo isn't a good 
idea.

If there's evidence this is from a path where Kryo should be disable-able then 
that much is a problem to fix, above notwithstanding. But then, using Kryo is 
still the better outcome.
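
A minimal sketch of the registration route suggested above; MyEvent is a hypothetical placeholder for whatever classes the job actually serializes.

{code}
import org.apache.spark.SparkConf

// Hypothetical application class standing in for whatever your job serializes.
case class MyEvent(id: Long, payload: Array[Byte])

val conf = new SparkConf()
  .setAppName("kryo-registration-example")  // hypothetical app name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[MyEvent], classOf[Array[MyEvent]]))
// Or, while debugging, relax the strict check instead of disabling Kryo:
// conf.set("spark.kryo.registrationRequired", "false")
{code}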

> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 
> 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 
> 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching 
> task 77 on executor id: 2 hostname: 
> ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory 
> on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: 
> 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, 
> ip-10-121-14-147.eu-central-1.compute.internal): 
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
> at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
> at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 
> 2.0.1, we see the Kryo deserialization exception and over time the Spark 
> streaming job stops processing since too many tasks failed.
> Our action was to use conf.set("spark.serializer", 
> "org.apache.spark.serializer.JavaSerializer") and to disable Kryo class 
> registration with conf.set("spark.kryo.registrationRequired", false). We hope 
> to identify the root cause of the exception. 
> However, setting the serializer to JavaSerializer is obviously ignored by the 
> Spark-internals. Despite the setting we still see the exception printed in 
> the log and tasks fail. The occurrence seems to be non-deterministic, but to 
> become more freq

[jira] [Comment Edited] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2016-12-27 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781418#comment-15781418
 ] 

Josh Bacon edited comment on SPARK-18737 at 12/27/16 10:33 PM:
---

My team is experiencing the exact same issue as described by the OP for our 
streaming jobs using the KinesisUtils library. The code worked in 1.6 previously 
but now hits KryoExceptions in 2.0 (Unregistered Class Id), no matter whether 
Java serialization is enabled instead or requireRegister is set to false. 
Errors occur in a non-deterministic manner. We see no work-around 
currently for our streaming jobs in Spark 2.0.1.


was (Author: jbacon):
My team is experiencing the exact same issue as described by the OP for our 
streaming jobs using the KinesisUtils library. The code worked in 1.6 previously 
but now hits KryoExceptions in 2.0 (Unregistered Class Id), no matter whether 
Java serialization is enabled instead or requireRegister is set to false. 
Errors occur in a non-deterministic manner as well. We see no 
work-around currently for our streaming jobs in Spark 2.0.1.

> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 
> 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 
> 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching 
> task 77 on executor id: 2 hostname: 
> ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory 
> on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: 
> 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, 
> ip-10-121-14-147.eu-central-1.compute.internal): 
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
> at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
> at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 
> 2.0.1, we see the Kryo deserialization exception and over time the Spark 
> streaming job stops processing sin

[jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2016-12-27 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781418#comment-15781418
 ] 

Josh Bacon commented on SPARK-18737:


My team is experiencing the exact same issue as described by the OP for our 
streaming jobs using the KinesisUtils library. The code worked in 1.6 previously 
but now hits KryoExceptions in 2.0 (Unregistered Class Id), no matter whether 
Java serialization is enabled instead or requireRegister is set to false. 
Errors occur in a non-deterministic manner as well. We see no 
work-around currently for our jobs in Spark 2.0.1.

> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 
> 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 
> 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching 
> task 77 on executor id: 2 hostname: 
> ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory 
> on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: 
> 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, 
> ip-10-121-14-147.eu-central-1.compute.internal): 
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
> at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
> at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 
> 2.0.1, we see the Kryo deserialization exception and over time the Spark 
> streaming job stops processing since too many tasks failed.
> Our action was to use conf.set("spark.serializer", 
> "org.apache.spark.serializer.JavaSerializer") and to disable Kryo class 
> registration with conf.set("spark.kryo.registrationRequired", false). We hope 
> to identify the root cause of the exception. 
> However, setting the serializer to JavaSerializer is obviously ignored by the 
> Spark-internals. Despite the setting we still see the exception printed in 
> the log and tasks fail. The occurrence seems to be non-deterministic, b

[jira] [Comment Edited] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x

2016-12-27 Thread Josh Bacon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781418#comment-15781418
 ] 

Josh Bacon edited comment on SPARK-18737 at 12/27/16 10:32 PM:
---

My team is experiencing the exact same issue as described by the OP for our 
streaming jobs using the KinesisUtils library. The code worked in 1.6 previously 
but now hits KryoExceptions in 2.0 (Unregistered Class Id), no matter whether 
Java serialization is enabled instead or requireRegister is set to false. 
Errors occur in a non-deterministic manner as well. We see no 
work-around currently for our streaming jobs in Spark 2.0.1.


was (Author: jbacon):
My team is experiencing the exact same issue as described by the OP for our 
streaming jobs using the KinesisUtils library. The code worked in 1.6 previously 
but now hits KryoExceptions in 2.0 (Unregistered Class Id), no matter whether 
Java serialization is enabled instead or requireRegister is set to false. 
Errors occur in a non-deterministic manner as well. We see no 
work-around currently for our jobs in Spark 2.0.1.

> Serialization setting "spark.serializer" ignored in Spark 2.x
> -
>
> Key: SPARK-18737
> URL: https://issues.apache.org/jira/browse/SPARK-18737
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Dr. Michael Menzel
>
> The following exception occurs although the JavaSerializer has been activated:
> 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 
> 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, 
> 5621 bytes)
> 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching 
> task 77 on executor id: 2 hostname: 
> ip-10-121-14-147.eu-central-1.compute.internal.
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory 
> on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: 
> 410.4 MB)
> 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, 
> ip-10-121-14-147.eu-central-1.compute.internal): 
> com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: 
> 13994
> at 
> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137)
> at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670)
> at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781)
> at 
> org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229)
> at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169)
> at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21)
> at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
> at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
> at 
> scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
> at org.apache.spark.util.NextIterator.to(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
> at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21)
> at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
> at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
> at org.apache.spark.scheduler.Task.run(Task.scala:86)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now 
> 2.0.1, we see the Kryo deserialization exception and over time the Spark 
> streaming job stops processing since

[jira] [Commented] (SPARK-18393) DataFrame pivot output column names should respect aliases

2016-12-27 Thread Ganesh Chand (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781324#comment-15781324
 ] 

Ganesh Chand commented on SPARK-18393:
--

[~hyukjin.kwon] - Can you confirm the Spark version? I get the same behavior as 
Eric mentioned above on 2.0.2.

> DataFrame pivot output column names should respect aliases
> --
>
> Key: SPARK-18393
> URL: https://issues.apache.org/jira/browse/SPARK-18393
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Eric Liang
>Priority: Minor
>
> For example
> {code}
> val df = spark.range(100).selectExpr("id % 5 as x", "id % 2 as a", "id as b")
> df
>   .groupBy('x)
>   .pivot("a", Seq(0, 1))
>   .agg(expr("sum(b)").as("blah"), expr("count(b)").as("foo"))
>   .show()
> +---+--------------------+---------------------+--------------------+---------------------+
> |  x|0_sum(`b`) AS `blah`|0_count(`b`) AS `foo`|1_sum(`b`) AS `blah`|1_count(`b`) AS `foo`|
> +---+--------------------+---------------------+--------------------+---------------------+
> |  0|                 450|                   10|                 500|                   10|
> |  1|                 510|                   10|                 460|                   10|
> |  3|                 530|                   10|                 480|                   10|
> |  2|                 470|                   10|                 520|                   10|
> |  4|                 490|                   10|                 540|                   10|
> +---+--------------------+---------------------+--------------------+---------------------+
> {code}
> The column names here are quite hard to read. Ideally we would respect the 
> aliases and generate column names like 0_blah, 0_foo, 1_blah, 1_foo instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18993) Unable to build/compile Spark in IntelliJ due to missing Scala deps in spark-tags

2016-12-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781302#comment-15781302
 ] 

Apache Spark commented on SPARK-18993:
--

User 'srowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/16418

> Unable to build/compile Spark in IntelliJ due to missing Scala deps in 
> spark-tags
> -
>
> Key: SPARK-18993
> URL: https://issues.apache.org/jira/browse/SPARK-18993
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Xiao Li
>Priority: Critical
>
> After https://github.com/apache/spark/pull/16311 is merged, I am unable to 
> build it in my IntelliJ. Got the following compilation error:
> {noformat}
> Error:scalac: error while loading Object, Missing dependency 'object scala in 
> compiler mirror', required by 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Home/jre/lib/rt.jar(java/lang/Object.class)
> Error:scalac: Error: object scala in compiler mirror not found.
> scala.reflect.internal.MissingRequirementError: object scala in compiler 
> mirror not found.
>   at 
> scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:17)
>   at 
> scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:18)
>   at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:53)
>   at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:66)
>   at 
> scala.reflect.internal.Mirrors$RootsBase.getPackage(Mirrors.scala:173)
>   at 
> scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage$lzycompute(Definitions.scala:161)
>   at 
> scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage(Definitions.scala:161)
>   at 
> scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass$lzycompute(Definitions.scala:162)
>   at 
> scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass(Definitions.scala:162)
>   at 
> scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1395)
>   at scala.tools.nsc.Global$Run.(Global.scala:1215)
>   at xsbt.CachedCompiler0$$anon$2.(CompilerInterface.scala:105)
>   at xsbt.CachedCompiler0.run(CompilerInterface.scala:105)
>   at xsbt.CachedCompiler0.run(CompilerInterface.scala:94)
>   at xsbt.CompilerInterface.run(CompilerInterface.scala:22)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:101)
>   at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:47)
>   at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41)
>   at 
> org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:29)
>   at 
> org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:26)
>   at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:67)
>   at 
> org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:24)
>   at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala)
>   at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18993) Unable to build/compile Spark in IntelliJ due to missing Scala deps in spark-tags

2016-12-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-18993:
--
Summary: Unable to build/compile Spark in IntelliJ due to missing Scala 
deps in spark-tags  (was: Unable to build/compile Spark in IntelliJ)

> Unable to build/compile Spark in IntelliJ due to missing Scala deps in 
> spark-tags
> -
>
> Key: SPARK-18993
> URL: https://issues.apache.org/jira/browse/SPARK-18993
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Xiao Li
>Priority: Critical
>
> After https://github.com/apache/spark/pull/16311 was merged, I am unable to 
> build Spark in IntelliJ. I get the following compilation error:
> {noformat}
> Error:scalac: error while loading Object, Missing dependency 'object scala in 
> compiler mirror', required by 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_74.jdk/Contents/Home/jre/lib/rt.jar(java/lang/Object.class)
> Error:scalac: Error: object scala in compiler mirror not found.
> scala.reflect.internal.MissingRequirementError: object scala in compiler 
> mirror not found.
>   at 
> scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:17)
>   at 
> scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:18)
>   at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:53)
>   at 
> scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:66)
>   at 
> scala.reflect.internal.Mirrors$RootsBase.getPackage(Mirrors.scala:173)
>   at 
> scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage$lzycompute(Definitions.scala:161)
>   at 
> scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackage(Definitions.scala:161)
>   at 
> scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass$lzycompute(Definitions.scala:162)
>   at 
> scala.reflect.internal.Definitions$DefinitionsClass.ScalaPackageClass(Definitions.scala:162)
>   at 
> scala.reflect.internal.Definitions$DefinitionsClass.init(Definitions.scala:1395)
>   at scala.tools.nsc.Global$Run.(Global.scala:1215)
>   at xsbt.CachedCompiler0$$anon$2.(CompilerInterface.scala:105)
>   at xsbt.CachedCompiler0.run(CompilerInterface.scala:105)
>   at xsbt.CachedCompiler0.run(CompilerInterface.scala:94)
>   at xsbt.CompilerInterface.run(CompilerInterface.scala:22)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:101)
>   at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:47)
>   at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41)
>   at 
> org.jetbrains.jps.incremental.scala.local.IdeaIncrementalCompiler.compile(IdeaIncrementalCompiler.scala:29)
>   at 
> org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:26)
>   at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:67)
>   at 
> org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:24)
>   at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala)
>   at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18974) FileInputDStream could not detected files which moved to the directory

2016-12-27 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781279#comment-15781279
 ] 

Shixiong Zhu commented on SPARK-18974:
--

Not sure about HDFS. But I just checked my local file system on Mac. It doesn't 
change `access_time` after renaming a file, either.

{code}
$ stat -x a.txt 
  File: "a.txt"
  Size: 2FileType: Regular File
  Mode: (0644/-rw-r--r--) ...
Device: 1,4   Inode: 437388576Links: 1
Access: Tue Dec 27 13:10:13 2016
Modify: Tue Dec 27 13:10:13 2016
Change: Tue Dec 27 13:10:13 2016
$ sleep 3
$ mv a.txt b.txt
$ stat -x b.txt
  File: "b.txt"
  Size: 2FileType: Regular File
  Mode: (0644/-rw-r--r--) ...
Device: 1,4   Inode: 437388576Links: 1
Access: Tue Dec 27 13:10:13 2016
Modify: Tue Dec 27 13:10:13 2016
Change: Tue Dec 27 13:10:13 2016
{code}

> FileInputDStream could not detected files which moved to the directory 
> ---
>
> Key: SPARK-18974
> URL: https://issues.apache.org/jira/browse/SPARK-18974
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 1.6.3, 2.0.2
>Reporter: Adam Wang
>
> FileInputDStream uses modification time to find new files, but if a file is 
> moved into the directory its modification time does not change, so 
> FileInputDStream cannot detect these files.
> I think one way to fix this bug is to read access_time and compare against 
> it, but that needs a Set of files to record all old files, which would be 
> very inefficient for directories containing many files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18974) FileInputDStream could not detected files which moved to the directory

2016-12-27 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu updated SPARK-18974:
-
Component/s: (was: Input/Output)

> FileInputDStream could not detected files which moved to the directory 
> ---
>
> Key: SPARK-18974
> URL: https://issues.apache.org/jira/browse/SPARK-18974
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 1.6.3, 2.0.2
>Reporter: Adam Wang
>
> FileInputDStream uses modification time to find new files, but if a file is 
> moved into the directory its modification time does not change, so 
> FileInputDStream cannot detect these files.
> I think one way to fix this bug is to read access_time and compare against 
> it, but that needs a Set of files to record all old files, which would be 
> very inefficient for directories containing many files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19007) Speedup and optimize the GradientBoostedTrees in the "data>memory" scene

2016-12-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-19007:
--
Affects Version/s: (was: 1.6.3)
   (was: 1.6.2)
   (was: 1.6.1)
   (was: 1.5.2)
   (was: 1.5.1)
   (was: 1.6.0)
   (was: 1.5.0)
   (was: 2.0.0)
 Target Version/s:   (was: 2.1.0)
 Priority: Minor  (was: Major)
Fix Version/s: (was: 2.1.0)

> Speedup and optimize the GradientBoostedTrees in the "data>memory" scene
> 
>
> Key: SPARK-19007
> URL: https://issues.apache.org/jira/browse/SPARK-19007
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib
>Affects Versions: 2.0.1, 2.0.2, 2.1.0
> Environment: A CDH cluster consisting of 3 Red Hat servers (120 GB 
> memory, 40 cores, 43 TB disk per server).
>Reporter: zhangdenghui
>Priority: Minor
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Test data: 80 GB of CTR training data from Criteo Labs 
> (http://criteolabs.wpengine.com/downloads/download-terabyte-click-logs/); 
> I used 1 of the 24 days of data. Some features needed to be replaced by newly 
> generated continuous features; the new features were generated following the 
> approach described in the XGBoost paper.
> Resources allocated: Spark on YARN, 20 executors, 8 GB memory and 2 cores per 
> executor.
> Parameters: numIterations 10, maxDepth 8; the remaining parameters are 
> defaults.
> I tested the GradientBoostedTrees algorithm in MLlib using the 80 GB of CTR 
> data mentioned above.
> It took 1.5 hours in total, and I saw many task failures after 6 or 7 GBT 
> rounds. Without these task failures and retries it could be much faster, 
> saving about half the time. I think the cause is the RDD named predError in 
> the while loop of the boost method in GradientBoostedTrees.scala: the lineage 
> of predError keeps growing after every GBT round, which eventually causes 
> failures like this:
> (ExecutorLostFailure (executor 6 exited caused by one of the running tasks) 
> Reason: Container killed by YARN for exceeding memory limits. 10.2 GB of 10 
> GB physical memory used. Consider boosting 
> spark.yarn.executor.memoryOverhead.)
> I tried increasing spark.yarn.executor.memoryOverhead, but the memory needed 
> is too much (even increasing it by half does not solve the problem), so I 
> don't think that is a proper fix.
> Setting the checkpoint interval for predError smaller would cut the lineage, 
> but it increases IO cost a lot.
> I tried another way to solve this problem: I persisted the predError RDD 
> every round, used pre_predError to keep a reference to the previous round's 
> RDD, and unpersisted it because it is no longer needed.
> With this change the job took about 45 minutes, with no task failures and no 
> extra memory added.
> So when the data is much larger than memory, this small improvement can 
> speed up GradientBoostedTrees by 1-2x.
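
For illustration, a minimal sketch of the persist/unpersist pattern described 
above (not the actual patch; the method name {{updatePredError}} and the 
storage level are placeholders):

{code}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Sketch only: persist the new per-round error RDD, materialize it, then drop
// the previous round's RDD so its cached blocks and lineage can be released.
def boostLoop(initialPredError: RDD[(Double, Double)], numIterations: Int)
             (updatePredError: RDD[(Double, Double)] => RDD[(Double, Double)]): RDD[(Double, Double)] = {
  var predError = initialPredError.persist(StorageLevel.MEMORY_AND_DISK)
  var iter = 0
  while (iter < numIterations) {
    val prePredError = predError
    predError = updatePredError(prePredError).persist(StorageLevel.MEMORY_AND_DISK)
    predError.count()                        // force materialization before unpersisting the parent
    prePredError.unpersist(blocking = false) // the previous round's RDD is no longer needed
    iter += 1
  }
  predError
}
{code}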



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-19006) should mentioned the max value allowed for spark.kryoserializer.buffer.max in doc

2016-12-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-19006.
---
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 16412
[https://github.com/apache/spark/pull/16412]

> should mentioned the max value allowed for spark.kryoserializer.buffer.max in 
> doc
> -
>
> Key: SPARK-19006
> URL: https://issues.apache.org/jira/browse/SPARK-19006
> Project: Spark
>  Issue Type: Documentation
>Reporter: Yuexin Zhang
>Priority: Trivial
> Fix For: 2.2.0
>
>
> On the configuration doc page 
> (https://spark.apache.org/docs/latest/configuration.html) we describe 
> spark.kryoserializer.buffer.max as: Maximum allowable size of Kryo 
> serialization buffer. This must be larger than any object you attempt to 
> serialize. Increase this if you get a "buffer limit exceeded" exception 
> inside Kryo.
> In the source code there is a hard-coded upper limit:
> val maxBufferSizeMb = conf.getSizeAsMb("spark.kryoserializer.buffer.max", 
> "64m").toInt
>   if (maxBufferSizeMb >= ByteUnit.GiB.toMiB(2)) {
> throw new IllegalArgumentException("spark.kryoserializer.buffer.max must 
> be less than " +
>   s"2048 mb, got: + $maxBufferSizeMb mb.")
>   }
> We should mention "this value must be less than 2048 mb" on the config page 
> as well.
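
For reference, a small sketch of how the limit shows up in user code; the 
values below are only examples:

{code}
import org.apache.spark.SparkConf

// Any value at or above 2048m is rejected when the Kryo serializer is created.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.max", "1024m")   // OK: below the 2048 MB limit
// .set("spark.kryoserializer.buffer.max", "2048m")  // would throw IllegalArgumentException
{code}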



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-18970) FileSource failure during file list refresh doesn't cause an application to fail, but stops further processing

2016-12-27 Thread Lev (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15771583#comment-15771583
 ] 

Lev edited comment on SPARK-18970 at 12/27/16 8:15 PM:
---

Actually this is exactly the behavior I want. My problem is that the 
application appeared to be alive, but was not processing any new files after 
this message in the log. I intentionally included the portion of the log at 
the top, showing that the file list was refreshed every couple of minutes 
before. But as you can see, refreshing stopped after the logged error.


was (Author: lev.numerify):
Actually this is exactly the behavior I want. My problem is that the 
application appeared to be alive, but was not processing any new files after 
this message in the log. I intentionally included the portion of the log at 
the bottom, showing that the file list was refreshed every couple of minutes 
before.

> FileSource failure during file list refresh doesn't cause an application to 
> fail, but stops further processing
> --
>
> Key: SPARK-18970
> URL: https://issues.apache.org/jira/browse/SPARK-18970
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Structured Streaming
>Affects Versions: 2.0.0, 2.0.2
>Reporter: Lev
> Attachments: sparkerror.log
>
>
> A Spark streaming application uses S3 files as streaming sources. After 
> running for several days, processing stopped even though the application 
> continued to run.
> Stack trace:
> java.io.FileNotFoundException: No such file or directory 
> 's3n://X'
>   at 
> com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.getFileStatus(S3NativeFileSystem.java:818)
>   at 
> com.amazon.ws.emr.hadoop.fs.EmrFileSystem.getFileStatus(EmrFileSystem.java:511)
>   at 
> org.apache.spark.sql.execution.datasources.HadoopFsRelation$$anonfun$7$$anonfun$apply$3.apply(fileSourceInterfaces.scala:465)
>   at 
> org.apache.spark.sql.execution.datasources.HadoopFsRelation$$anonfun$7$$anonfun$apply$3.apply(fileSourceInterfaces.scala:462)
>   at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>   at 
> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at 
> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1336)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1336)
>   at 
> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1336)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:893)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:893)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
>   at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1897)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> I believe 2 things should (or can) be fixed:
> 1. The application should fail in case of such an error.
> 2. Allow the application to ignore such a failure, since there is a chance 
> that the error will not resurface during the next refresh. (In my case I 
> believe the error was caused by S3 cleaning the bucket at exactly the same 
> moment the refresh was running.)
> My code to create the streaming query looks like the following:
>   val cq = sqlContext.readStream
> .format("json")
> .schema(struct)
> .load(s"input")
> .writeStream
> .option("checkpointLocation", s"checkpoints")
> .foreach(new ForeachWriter[Row] {...})
> .trigger(ProcessingTime("10 seconds")).start()
>   
> cq.awaitTermination() 



--

[jira] [Updated] (SPARK-19006) should mentioned the max value allowed for spark.kryoserializer.buffer.max in doc

2016-12-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-19006:
--
Priority: Trivial  (was: Major)

(Not "Major")

> should mentioned the max value allowed for spark.kryoserializer.buffer.max in 
> doc
> -
>
> Key: SPARK-19006
> URL: https://issues.apache.org/jira/browse/SPARK-19006
> Project: Spark
>  Issue Type: Documentation
>Reporter: Yuexin Zhang
>Priority: Trivial
>
> On the configuration doc page 
> (https://spark.apache.org/docs/latest/configuration.html) we describe 
> spark.kryoserializer.buffer.max as: Maximum allowable size of Kryo 
> serialization buffer. This must be larger than any object you attempt to 
> serialize. Increase this if you get a "buffer limit exceeded" exception 
> inside Kryo.
> In the source code there is a hard-coded upper limit:
> val maxBufferSizeMb = conf.getSizeAsMb("spark.kryoserializer.buffer.max", 
> "64m").toInt
>   if (maxBufferSizeMb >= ByteUnit.GiB.toMiB(2)) {
> throw new IllegalArgumentException("spark.kryoserializer.buffer.max must 
> be less than " +
>   s"2048 mb, got: + $maxBufferSizeMb mb.")
>   }
> We should mention "this value must be less than 2048 mb" on the config page 
> as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18997) Recommended upgrade libthrift to 0.9.3

2016-12-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-18997:
--
Priority: Minor  (was: Critical)

> Recommended upgrade libthrift  to 0.9.3
> ---
>
> Key: SPARK-18997
> URL: https://issues.apache.org/jira/browse/SPARK-18997
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: meiyoula
>Priority: Minor
>
> libthrift 0.9.2 has a serious security vulnerability: CVE-2015-3254



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18997) Recommended upgrade libthrift to 0.9.3

2016-12-27 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781139#comment-15781139
 ] 

Sean Owen commented on SPARK-18997:
---

So, the questions are: does this CVE impact Spark at all? And what are the 
other implications of updating, such as breaking changes?
A maintenance release is typically fine to take, but this is a very large one. 
I'd also like to see whether the CVE even matters for Spark -- can you provide 
more info?

> Recommended upgrade libthrift  to 0.9.3
> ---
>
> Key: SPARK-18997
> URL: https://issues.apache.org/jira/browse/SPARK-18997
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: meiyoula
>Priority: Critical
>
> libthrift 0.9.2 has a serious security vulnerability: CVE-2015-3254



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18997) Recommended upgrade libthrift to 0.9.3

2016-12-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-18997:
--
Issue Type: Improvement  (was: Bug)

> Recommended upgrade libthrift  to 0.9.3
> ---
>
> Key: SPARK-18997
> URL: https://issues.apache.org/jira/browse/SPARK-18997
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Reporter: meiyoula
>Priority: Minor
>
> libthrift 0.9.2 has a serious security vulnerability: CVE-2015-3254



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18757) Models in Pyspark support column setters

2016-12-27 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781096#comment-15781096
 ] 

Joseph K. Bradley commented on SPARK-18757:
---

I think it's useful to make the Python types match the Scala ones.

However, I want to avoid introducing more abstract classes in Scala, unless 
they are really useful.  We've run into issues with them, where some 
(especially the Classifier and ProbabilisticClassifier) need to be specialized 
so much that they are probably not worth the trouble.  At some point, I hope we 
can replace them with traits and eliminate the "shared" implementations which 
are really not shareable across all subclasses.  To see what I mean, check out 
the overrides in LogisticRegression.
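
As a rough illustration of the trait-based direction (the names below are 
hypothetical, not existing Spark classes), shared column setters could be 
mixed in rather than inherited from a deep abstract class:

{code}
import org.apache.spark.ml.param.{Param, Params}

// Hypothetical sketch: each setter lives in its own mixin, so a concrete model
// picks up only what it needs instead of specializing a large abstract class.
trait HasFeaturesColSetter extends Params {
  final val featuresCol: Param[String] =
    new Param[String](this, "featuresCol", "features column name")
  def setFeaturesCol(value: String): this.type = set(featuresCol, value)
}

trait HasPredictionColSetter extends Params {
  final val predictionCol: Param[String] =
    new Param[String](this, "predictionCol", "prediction column name")
  def setPredictionCol(value: String): this.type = set(predictionCol, value)
}

// A concrete model would then just mix them in, e.g.
// class SomeClusteringModel extends Model[SomeClusteringModel]
//   with HasFeaturesColSetter with HasPredictionColSetter { ... }
{code}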

> Models in Pyspark support column setters
> 
>
> Key: SPARK-18757
> URL: https://issues.apache.org/jira/browse/SPARK-18757
> Project: Spark
>  Issue Type: Brainstorming
>  Components: ML, PySpark
>Reporter: zhengruifeng
>
> Recently, I found three places in which column setters are missing: 
> KMeansModel, BisectingKMeansModel and OneVsRestModel.
> These three models directly inherit `Model`, which doesn't have column 
> setters, so I had to add the missing setters manually in [SPARK-18625] and 
> [SPARK-18520].
> For now, models in PySpark still don't support column setters at all.
> I suggest that we keep the hierarchy of PySpark models in line with that on 
> the Scala side:
> For classification and regression algs, I'm making a trial in [SPARK-18739]. 
> In it, I try to copy the hierarchy from the Scala side.
> For clustering algs, I think we may first create abstract classes 
> {{ClusteringModel}} and {{ProbabilisticClusteringModel}} on the Scala side, 
> and make clustering algs inherit them. Then, on the Python side, we copy the 
> hierarchy so that we don't need to add setters manually for each alg.
> For feature algs, we can also use an abstract class {{FeatureModel}} on the 
> Scala side and do the same thing.
> What are your opinions? [~yanboliang][~josephkb][~sethah][~srowen]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18757) Models in Pyspark support column setters

2016-12-27 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-18757:
--
Description: 
Recently, I found three places in which column setters are missing: 
KMeansModel, BisectingKMeansModel and OneVsRestModel.
These three models directly inherit `Model`, which doesn't have column setters, 
so I had to add the missing setters manually in [SPARK-18625] and [SPARK-18520].
For now, models in PySpark still don't support column setters at all.
I suggest that we keep the hierarchy of PySpark models in line with that on the 
Scala side:
For classification and regression algs, I'm making a trial in [SPARK-18739]. In 
it, I try to copy the hierarchy from the Scala side.
For clustering algs, I think we may first create abstract classes 
{{ClusteringModel}} and {{ProbabilisticClusteringModel}} on the Scala side, and 
make clustering algs inherit them. Then, on the Python side, we copy the 
hierarchy so that we don't need to add setters manually for each alg.
For feature algs, we can also use an abstract class {{FeatureModel}} on the 
Scala side and do the same thing.

What are your opinions? [~yanboliang][~josephkb][~sethah][~srowen]

  was:
Recently, I found three places in which column setters are missing: 
KMeansModel, BisectingKMeansModel and OneVsRestModel.
These three models directly inherit `Model`, which doesn't have column setters, 
so I had to add the missing setters manually in [SPARK-18625] and [SPARK-18520].
For now, models in PySpark still don't support column setters at all.
I suggest that we keep the hierarchy of PySpark models in line with that on the 
Scala side:
For classification and regression algs, I'm making a trial in [SPARK-18379]. In 
it, I try to copy the hierarchy from the Scala side.
For clustering algs, I think we may first create abstract classes 
{{ClusteringModel}} and {{ProbabilisticClusteringModel}} on the Scala side, and 
make clustering algs inherit them. Then, on the Python side, we copy the 
hierarchy so that we don't need to add setters manually for each alg.
For feature algs, we can also use an abstract class {{FeatureModel}} on the 
Scala side and do the same thing.

What are your opinions? [~yanboliang][~josephkb][~sethah][~srowen]


> Models in Pyspark support column setters
> 
>
> Key: SPARK-18757
> URL: https://issues.apache.org/jira/browse/SPARK-18757
> Project: Spark
>  Issue Type: Brainstorming
>  Components: ML, PySpark
>Reporter: zhengruifeng
>
> Recently, I found three places in which column setters are missing: 
> KMeansModel, BisectingKMeansModel and OneVsRestModel.
> These three models directly inherit `Model`, which doesn't have column 
> setters, so I had to add the missing setters manually in [SPARK-18625] and 
> [SPARK-18520].
> For now, models in PySpark still don't support column setters at all.
> I suggest that we keep the hierarchy of PySpark models in line with that on 
> the Scala side:
> For classification and regression algs, I'm making a trial in [SPARK-18739]. 
> In it, I try to copy the hierarchy from the Scala side.
> For clustering algs, I think we may first create abstract classes 
> {{ClusteringModel}} and {{ProbabilisticClusteringModel}} on the Scala side, 
> and make clustering algs inherit them. Then, on the Python side, we copy the 
> hierarchy so that we don't need to add setters manually for each alg.
> For feature algs, we can also use an abstract class {{FeatureModel}} on the 
> Scala side and do the same thing.
> What are your opinions? [~yanboliang][~josephkb][~sethah][~srowen]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18842) De-duplicate paths in classpaths in processes for local-cluster mode to work around the length limitation on Windows

2016-12-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-18842.
---
Resolution: Fixed

Issue resolved by pull request 16398
[https://github.com/apache/spark/pull/16398]

> De-duplicate paths in classpaths in processes for local-cluster mode to work 
> around the length limitation on Windows
> 
>
> Key: SPARK-18842
> URL: https://issues.apache.org/jira/browse/SPARK-18842
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
> Fix For: 2.2.0
>
>
> Currently, some tests fail or hang on Windows due to this problem. For the 
> reason described in SPARK-18718, some tests using {{local-cluster}} mode were 
> disabled on Windows due to the length limitation on the paths given as 
> classpaths.
> The limit seems to be roughly 32K (see 
> https://blogs.msdn.microsoft.com/oldnewthing/20031210-00/?p=41553/ and 
> https://support.thoughtworks.com/hc/en-us/articles/213248526-Getting-around-maximum-command-line-length-is-32767-characters-on-Windows)
>  but executors were being launched, in tests only, with commands such as 
> https://gist.github.com/HyukjinKwon/5bc81061c250d4af5a180869b59d42ea.
> The command length is roughly 40K due to the classpaths. However, more than 
> half of the entries appear to be duplicates, so de-duplicating these paths 
> reduces the length to roughly 20K.
> We may need to revisit this as more paths are added in the future, but for 
> now it seems better than disabling all those tests, and the change is minimal.
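
For illustration, a tiny sketch of the de-duplication idea (the helper name 
and example paths are placeholders):

{code}
// Sketch: drop duplicate classpath entries while keeping their original order.
// java.io.File.pathSeparator is ";" on Windows and ":" elsewhere.
def dedupClasspath(classpath: String): String =
  classpath.split(java.io.File.pathSeparator).distinct.mkString(java.io.File.pathSeparator)

// e.g. on Windows: dedupClasspath("a.jar;b.jar;a.jar") == "a.jar;b.jar"
{code}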



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation

2016-12-27 Thread Tim Chan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781029#comment-15781029
 ] 

Tim Chan commented on SPARK-19013:
--

Perhaps the documentation should be revised to recommend against using S3 as a 
location for {{checkpointLocation}}? I will test with an HDFS location and 
update this ticket with my findings.
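
For reference, a minimal sketch of the workaround being tested here (the 
{{df}} streaming DataFrame and the paths are placeholders):

{code}
// Sketch: keep the output data on S3 but put the streaming checkpoint on HDFS.
val query = df.writeStream
  .format("parquet")
  .option("checkpointLocation", "hdfs:///user/myapp/checkpoints") // placeholder HDFS path
  .option("path", "s3://mybucket/myapp/output")                   // placeholder output path
  .start()
{code}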

> java.util.ConcurrentModificationException when using s3 path as 
> checkpointLocation 
> ---
>
> Key: SPARK-19013
> URL: https://issues.apache.org/jira/browse/SPARK-19013
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.2
>Reporter: Tim Chan
>
> I have a structured stream job running on EMR. The job will fail due to this
> {code}
> Multiple HDFSMetadataLog are using s3://mybucket/myapp 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
> {code}
> There is only one instance of this stream job running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation

2016-12-27 Thread Tim Chan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Chan updated SPARK-19013:
-
Description: 
I have a structured stream job running on EMR. The job will fail due to this

{code}
Multiple HDFSMetadataLog are using s3://mybucket/myapp 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
{code}

There is only one instance of this stream job running.

  was:
I have a structured stream job running on EMR. The job will fail due to this

```
Multiple HDFSMetadataLog are using s3://mybucket/myapp 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
```

There is only one instance of this stream job running.


> java.util.ConcurrentModificationException when using s3 path as 
> checkpointLocation 
> ---
>
> Key: SPARK-19013
> URL: https://issues.apache.org/jira/browse/SPARK-19013
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.2
>Reporter: Tim Chan
>
> I have a structured stream job running on EMR. The job will fail due to this
> {code}
> Multiple HDFSMetadataLog are using s3://mybucket/myapp 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
> {code}
> There is only one instance of this stream job running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation

2016-12-27 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-19013.
--
Resolution: Not A Problem

This is basically an S3 problem, not a Spark one.

> java.util.ConcurrentModificationException when using s3 path as 
> checkpointLocation 
> ---
>
> Key: SPARK-19013
> URL: https://issues.apache.org/jira/browse/SPARK-19013
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.2
>Reporter: Tim Chan
>
> I have a structured stream job running on EMR. The job will fail due to this
> ```
> Multiple HDFSMetadataLog are using s3://mybucket/myapp 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
> ```
> There is only one instance of this stream job running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation

2016-12-27 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15781019#comment-15781019
 ] 

Shixiong Zhu commented on SPARK-19013:
--

This is probably because of S3's negative caching.

"a negative GET may be cached, such that even if an object is immediately 
created, the fact that there "wasn't" an object is still remembered." See 
https://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.8.0/bk_hdcloud-aws/content/s3-trouble/index.html#visible-s3-inconsistency
 for details.

> java.util.ConcurrentModificationException when using s3 path as 
> checkpointLocation 
> ---
>
> Key: SPARK-19013
> URL: https://issues.apache.org/jira/browse/SPARK-19013
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.0.2
>Reporter: Tim Chan
>
> I have a structured stream job running on EMR. The job will fail due to this
> ```
> Multiple HDFSMetadataLog are using s3://mybucket/myapp 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
> ```
> There is only one instance of this stream job running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18990) make DatasetBenchmark fairer for Dataset

2016-12-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18990:


Assignee: Wenchen Fan  (was: Apache Spark)

> make DatasetBenchmark fairer for Dataset
> 
>
> Key: SPARK-18990
> URL: https://issues.apache.org/jira/browse/SPARK-18990
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-18990) make DatasetBenchmark fairer for Dataset

2016-12-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-18990:


Assignee: Apache Spark  (was: Wenchen Fan)

> make DatasetBenchmark fairer for Dataset
> 
>
> Key: SPARK-18990
> URL: https://issues.apache.org/jira/browse/SPARK-18990
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18990) make DatasetBenchmark fairer for Dataset

2016-12-27 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-18990:
-
Fix Version/s: (was: 2.2.0)

> make DatasetBenchmark fairer for Dataset
> 
>
> Key: SPARK-18990
> URL: https://issues.apache.org/jira/browse/SPARK-18990
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-18990) make DatasetBenchmark fairer for Dataset

2016-12-27 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai reopened SPARK-18990:
--

> make DatasetBenchmark fairer for Dataset
> 
>
> Key: SPARK-18990
> URL: https://issues.apache.org/jira/browse/SPARK-18990
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-19013) java.util.ConcurrentModificationException when using s3 path as checkpointLocation

2016-12-27 Thread Tim Chan (JIRA)
Tim Chan created SPARK-19013:


 Summary: java.util.ConcurrentModificationException when using s3 
path as checkpointLocation 
 Key: SPARK-19013
 URL: https://issues.apache.org/jira/browse/SPARK-19013
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.0.2
Reporter: Tim Chan


I have a structured stream job running on EMR. The job will fail due to this

```
Multiple HDFSMetadataLog are using s3://mybucket/myapp 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.org$apache$spark$sql$execution$streaming$HDFSMetadataLog$$writeBatch(HDFSMetadataLog.scala:162)
```

There is only one instance of this stream job running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical

2016-12-27 Thread Jork Zijlstra (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780545#comment-15780545
 ] 

Jork Zijlstra commented on SPARK-19012:
---

The type is viewName: String, so you would expect any string to work.

When executing createOrReplaceTempView(tableOrViewName: String), a 
ParseException of
{code}
== SQL ==
{tableOrViewName}
{code}
is thrown. You might not see that these are related, because the message is 
about a SQL ParseException while you only defined a temp view.

You might not expect that the tableOrViewName triggers SQL parsing. A slightly 
clearer exception message ("viewName not supported") would be more helpful, or 
the identifier rules could be added to the documentation.

As you said, prefixing the tableOrViewName with a non-numerical value does the 
trick (although to me this feels more like a workaround).

> CreateOrReplaceTempView throws 
> org.apache.spark.sql.catalyst.parser.ParseException when viewName first char 
> is numerical
> 
>
> Key: SPARK-19012
> URL: https://issues.apache.org/jira/browse/SPARK-19012
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.1, 2.0.2
>Reporter: Jork Zijlstra
>
> Using a viewName whose first char is a numerical value with 
> dataframe.createOrReplaceTempView(viewName: String) causes:
> {code}
> Exception in thread "main" 
> org.apache.spark.sql.catalyst.parser.ParseException: 
> mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', 
> 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 
> 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 
> 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 
> 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 
> 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 
> 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 
> 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 
> 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 
> 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 
> 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 
> 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 
> 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 
> 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 
> 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 
> 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 
> 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 
> 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 
> 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', 
> 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 
> 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 
> 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 
> 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, 
> DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 
> 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 
> 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 
> 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 
> 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', 
> IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0)
> == SQL ==
> 1
> {code}
> {code}
> val tableOrViewName = "1" //fails
> val tableOrViewName = "a" //works
> sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18990) make DatasetBenchmark fairer for Dataset

2016-12-27 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-18990.
-
   Resolution: Fixed
Fix Version/s: 2.2.0

> make DatasetBenchmark fairer for Dataset
> 
>
> Key: SPARK-18990
> URL: https://issues.apache.org/jira/browse/SPARK-18990
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19004) Fix `JDBCWriteSuite.testH2Dialect` by removing `getCatalystType`

2016-12-27 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-19004:

Assignee: Dongjoon Hyun

> Fix `JDBCWriteSuite.testH2Dialect` by removing `getCatalystType`
> 
>
> Key: SPARK-19004
> URL: https://issues.apache.org/jira/browse/SPARK-19004
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.2.0
>
>
> JDBCSuite and JDBCWriteSuite each have their own testH2Dialect for testing 
> purposes.
> This issue fixes testH2Dialect in JDBCWriteSuite by removing the 
> getCatalystType implementation so that correct types are returned. Currently, 
> it incorrectly returns Some(StringType). For testH2Dialect in JDBCSuite, the 
> behavior is intentional because of the test case "Remap types via 
> JdbcDialects".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-19004) Fix `JDBCWriteSuite.testH2Dialect` by removing `getCatalystType`

2016-12-27 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19004?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-19004.
-
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 16409
[https://github.com/apache/spark/pull/16409]

> Fix `JDBCWriteSuite.testH2Dialect` by removing `getCatalystType`
> 
>
> Key: SPARK-19004
> URL: https://issues.apache.org/jira/browse/SPARK-19004
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.1.0
>Reporter: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.2.0
>
>
> JDBCSuite and JDBCWriteSuite each have their own testH2Dialect for testing 
> purposes.
> This issue fixes testH2Dialect in JDBCWriteSuite by removing the 
> getCatalystType implementation so that correct types are returned. Currently, 
> it incorrectly returns Some(StringType). For testH2Dialect in JDBCSuite, the 
> behavior is intentional because of the test case "Remap types via 
> JdbcDialects".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18999) simplify Literal codegen

2016-12-27 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-18999.
-
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 16402
[https://github.com/apache/spark/pull/16402]

> simplify Literal codegen
> 
>
> Key: SPARK-18999
> URL: https://issues.apache.org/jira/browse/SPARK-18999
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Minor
> Fix For: 2.2.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19010) Include Kryo exception in case of overflow

2016-12-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19010:


Assignee: Apache Spark

> Include Kryo exception in case of overflow
> --
>
> Key: SPARK-19010
> URL: https://issues.apache.org/jira/browse/SPARK-19010
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.6.3
>Reporter: Sergei Lebedev
>Assignee: Apache Spark
>
> SPARK-6087 replaced the Kryo overflow exception with a SparkException giving 
> Spark-specific instructions on tackling the issue. The implementation also 
> [suppressed|https://github.com/apache/spark/pull/4947/files#diff-1f81c62dad0e2dfc387a974bb08c497cR165]
>  the original Kryo exception, which makes it impossible to trace the 
> operation that caused the overflow.
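
For illustration, a sketch of the kind of change being proposed; the wrapper 
function and its parameters are placeholders standing in for the surrounding 
code in KryoSerializer, not the actual patch:

{code}
import com.esotericsoftware.kryo.{Kryo, KryoException}
import com.esotericsoftware.kryo.io.Output
import org.apache.spark.SparkException

// Sketch: keep the Spark-specific hint, but chain the original KryoException
// as the cause so the failing operation can still be traced from the stack.
def writeWithCause[T](kryo: Kryo, output: Output, t: T): Unit = {
  try {
    kryo.writeClassAndObject(output, t)
  } catch {
    case e: KryoException if e.getMessage.startsWith("Buffer overflow") =>
      throw new SparkException(
        "Kryo serialization failed: " + e.getMessage + ". To avoid this, " +
        "increase spark.kryoserializer.buffer.max value.", e) // `e` chained as the cause
  }
}
{code}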



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19010) Include Kryo exception in case of overflow

2016-12-27 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19010:


Assignee: (was: Apache Spark)

> Include Kryo exception in case of overflow
> --
>
> Key: SPARK-19010
> URL: https://issues.apache.org/jira/browse/SPARK-19010
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.6.3
>Reporter: Sergei Lebedev
>
> SPARK-6087 replaced the Kryo overflow exception with a SparkException giving 
> Spark-specific instructions on tackling the issue. The implementation also 
> [suppressed|https://github.com/apache/spark/pull/4947/files#diff-1f81c62dad0e2dfc387a974bb08c497cR165]
>  the original Kryo exception, which makes it impossible to trace the 
> operation that caused the overflow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19010) Include Kryo exception in case of overflow

2016-12-27 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780449#comment-15780449
 ] 

Apache Spark commented on SPARK-19010:
--

User 'superbobry' has created a pull request for this issue:
https://github.com/apache/spark/pull/16416

> Include Kryo exception in case of overflow
> --
>
> Key: SPARK-19010
> URL: https://issues.apache.org/jira/browse/SPARK-19010
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.6.3
>Reporter: Sergei Lebedev
>
> SPARK-6087 replaced the Kryo overflow exception with a SparkException giving 
> Spark-specific instructions on tackling the issue. The implementation also 
> [suppressed|https://github.com/apache/spark/pull/4947/files#diff-1f81c62dad0e2dfc387a974bb08c497cR165]
>  the original Kryo exception, which makes it impossible to trace the 
> operation that caused the overflow.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical

2016-12-27 Thread Herman van Hovell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780429#comment-15780429
 ] 

Herman van Hovell commented on SPARK-19012:
---

{{"1"}} fails because it is a numeric literal. Either add a letter to the 
name, or place the name between backticks, e.g.: {{"`1`"}}

This is not a bug per se, we could however improve the name parsing.
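
For reference, a small sketch of the backtick workaround, reusing the 
{{sparkSession}} and {{path}} from the report above:

{code}
val df = sparkSession.read.orc(path)
df.createOrReplaceTempView("`1468079114`")            // backticks: parsed as a quoted identifier
sparkSession.sql("SELECT * FROM `1468079114`").show() // the view is usable from SQL as well
{code}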

> CreateOrReplaceTempView throws 
> org.apache.spark.sql.catalyst.parser.ParseException when viewName first char 
> is numerical
> 
>
> Key: SPARK-19012
> URL: https://issues.apache.org/jira/browse/SPARK-19012
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.1, 2.0.2
>Reporter: Jork Zijlstra
>
> Using a viewName whose first char is a numerical value with 
> dataframe.createOrReplaceTempView(viewName: String) causes:
> {code}
> Exception in thread "main" 
> org.apache.spark.sql.catalyst.parser.ParseException: 
> mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', 
> 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 
> 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 
> 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 
> 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 
> 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 
> 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 
> 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 
> 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 
> 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 
> 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 
> 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 
> 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 
> 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 
> 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 
> 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 
> 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 
> 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 
> 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', 
> 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 
> 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 
> 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 
> 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, 
> DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 
> 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 
> 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 
> 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 
> 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', 
> IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0)
> == SQL ==
> 1
> {code}
> {code}
> val tableOrViewName = "1" //fails
> val tableOrViewName = "a" //works
> sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical

2016-12-27 Thread Jork Zijlstra (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jork Zijlstra updated SPARK-19012:
--
Affects Version/s: 2.0.2

> CreateOrReplaceTempView throws 
> org.apache.spark.sql.catalyst.parser.ParseException when viewName first char 
> is numerical
> 
>
> Key: SPARK-19012
> URL: https://issues.apache.org/jira/browse/SPARK-19012
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.1, 2.0.2
>Reporter: Jork Zijlstra
>
> Using a viewName whose first char is a numerical value with 
> dataframe.createOrReplaceTempView(viewName: String) causes:
> {code}
> Exception in thread "main" 
> org.apache.spark.sql.catalyst.parser.ParseException: 
> mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', 
> 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 
> 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 
> 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 
> 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 
> 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 
> 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 
> 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 
> 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 
> 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 
> 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 
> 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 
> 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', 
> 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 
> 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 
> 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', 
> 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', 
> 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', 
> 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', 
> 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', 
> 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 
> 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', 
> 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, 
> DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', 
> 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', 
> 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', 
> 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', 
> 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', 
> IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0)
> == SQL ==
> 1
> {code}
> {code}
> val tableOrViewName = "1" //fails
> val tableOrViewName = "a" //works
> sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName)
> {code}






[jira] [Updated] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical

2016-12-27 Thread Jork Zijlstra (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jork Zijlstra updated SPARK-19012:
--
Description: 
Using a viewName whose first character is numerical on 
dataframe.createOrReplaceTempView(viewName: String) causes:

{code}
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: 
mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 
'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 
'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 
'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 
'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 
'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 
'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 
'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', 
'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 
'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 
'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'INTERSECT', 'TO', 
'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 
'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 
'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 
'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 
'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 
'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 
'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 
'FORMATTED', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 
'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 
'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 
'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 
'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 
'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 
'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 
'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 
'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 
'CURRENT_TIMESTAMP', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0)

== SQL ==
1
{code}

{code}
val tableOrViewName = "1" //fails
val tableOrViewName = "a" //works
sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName)
{code}



  was:
Using a viewName whose first character is numerical on 
dataframe.createOrReplaceTempView(viewName: String) causes:

{code}
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: 
mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 
'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 
'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 
'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 
'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 
'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 
'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 
'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', 
'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 
'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 
'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'INTERSECT', 'TO', 
'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 
'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 
'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 
'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 
'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 
'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 
'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 
'FORMATTED', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 
'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 
'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 
'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 
'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 
'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 
'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 
'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 
'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'I

[jira] [Created] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical

2016-12-27 Thread Jork Zijlstra (JIRA)
Jork Zijlstra created SPARK-19012:
-

 Summary: CreateOrReplaceTempView throws 
org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is 
numerical
 Key: SPARK-19012
 URL: https://issues.apache.org/jira/browse/SPARK-19012
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.0.1
Reporter: Jork Zijlstra


Using a viewName whose first character is numerical on 
dataframe.createOrReplaceTempView(viewName: String) causes:

{code}
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: 
mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', 'ALL', 
'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', 'ROLLUP', 
'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', 'EXISTS', 
'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', 'ASC', 
'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', 'JOIN', 
'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', 'NATURAL', 'ON', 
'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'UNBOUNDED', 
'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', 'CREATE', 
'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', 'EXPLAIN', 
'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', 'COLUMNS', 'COLUMN', 
'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', 'EXCEPT', 'INTERSECT', 'TO', 
'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', 'ARRAY', 'MAP', 'STRUCT', 
'COMMENT', 'SET', 'RESET', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 
'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', 'BUCKET', 'OUT', 'OF', 'SORT', 
'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 
'SERDEPROPERTIES', 'RECORDREADER', 'RECORDWRITER', 'DELIMITED', 'FIELDS', 
'TERMINATED', 'COLLECTION', 'ITEMS', 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 
'FUNCTION', 'EXTENDED', 'REFRESH', 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 
'FORMATTED', TEMPORARY, 'OPTIONS', 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 
'BUCKETS', 'SKEWED', 'STORED', 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 
'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', 'TOUCH', 'COMPACT', 'CONCATENATE', 
'CHANGE', 'CASCADE', 'RESTRICT', 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 
'OUTPUTFORMAT', DATABASE, DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 
'LIST', 'STATISTICS', 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 
'LOCK', 'UNLOCK', 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 
'ROLE', 'ROLES', 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 
'INDEXES', 'LOCKS', 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 
'CURRENT_TIMESTAMP', IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0)

== SQL ==
1468079114
{code}

{code}
val tableOrViewName = "1" //fails
val tableOrViewName = "a" //works
sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName)
{code}
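
A minimal, hedged workaround sketch (illustrative only, not a fix in Spark 
itself): since the view name is parsed as a SQL identifier, one option is to 
make sure it does not start with a digit before registering it. The prefix 
"v_" and the helper name below are assumptions for illustration; backquoting 
the name may also be accepted by the parser, but prefixing is simpler.

{code}
// Hypothetical workaround sketch: prefix view names that start with a digit
// so they parse as plain SQL identifiers.
import org.apache.spark.sql.SparkSession

object TempViewNameWorkaround {
  // Returns a parser-safe view name; "1468079114" becomes "v_1468079114".
  def safeViewName(raw: String): String =
    if (raw.headOption.exists(_.isDigit)) s"v_$raw" else raw

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("tempview-name").getOrCreate()
    val df = spark.range(10).toDF("id")              // stand-in for sparkSession.read.orc(path)
    val tableOrViewName = safeViewName("1468079114") // the raw numeric name would fail to parse
    df.createOrReplaceTempView(tableOrViewName)
    spark.sql(s"SELECT COUNT(*) FROM $tableOrViewName").show()
    spark.stop()
  }
}
{code}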








[jira] [Updated] (SPARK-19011) ApplicationDescription should add the Submission ID for the standalone cluster

2016-12-27 Thread hustfxj (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hustfxj updated SPARK-19011:

Description: A large standalone cluster may have lots of applications and 
drivers, so from the master page I cannot tell which driver started which 
application. I think we can add the driver (submission) id to the 
ApplicationDescription. In fact, I have already implemented this; a rough 
sketch of the idea follows below.

> ApplicationDescription should add the Submission ID for the standalone cluster
> --
>
> Key: SPARK-19011
> URL: https://issues.apache.org/jira/browse/SPARK-19011
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>
> A large standalone cluster may have lots of applications and drivers, so from 
> the master page I cannot tell which driver started which application. I think 
> we can add the driver (submission) id to the ApplicationDescription. In fact, 
> I have already implemented this.
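
An illustrative sketch of the proposed shape of the change. This is a 
simplified stand-in, not the actual private[spark] ApplicationDescription; the 
field names and the example submission id are assumptions.

{code}
// Simplified stand-in showing the proposed addition: carry the submission/driver
// id so the master UI can link an application back to the driver that submitted it.
case class ApplicationDescriptionSketch(
    name: String,
    maxCores: Option[Int],
    appUiUrl: String,
    submissionId: Option[String] = None) // proposed addition

val desc = ApplicationDescriptionSketch(
  name = "my-app",
  maxCores = Some(8),
  appUiUrl = "http://driver-host:4040",
  submissionId = Some("driver-20161227120000-0001")) // id that would be shown on the master page
{code}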






[jira] [Commented] (SPARK-19011) ApplicationDescription should add the Submission ID for the standalone cluster

2016-12-27 Thread hustfxj (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780310#comment-15780310
 ] 

hustfxj commented on SPARK-19011:
-

A large standalone cluster may have lots of applications and drivers, so from 
the master page I cannot tell which driver started which application. I think 
we can add the driver (submission) id to the ApplicationDescription. In fact, I 
have already implemented this.

> ApplicationDescription should add the Submission ID for the standalone cluster
> --
>
> Key: SPARK-19011
> URL: https://issues.apache.org/jira/browse/SPARK-19011
> Project: Spark
>  Issue Type: Improvement
>Reporter: hustfxj
>







[jira] [Created] (SPARK-19011) ApplicationDescription should add the Submission ID for the standalone cluster

2016-12-27 Thread hustfxj (JIRA)
hustfxj created SPARK-19011:
---

 Summary: ApplicationDescription should add the Submission ID for 
the standalone cluster
 Key: SPARK-19011
 URL: https://issues.apache.org/jira/browse/SPARK-19011
 Project: Spark
  Issue Type: Improvement
Reporter: hustfxj









[jira] [Created] (SPARK-19010) Include Kryo exception in case of overflow

2016-12-27 Thread Sergei Lebedev (JIRA)
Sergei Lebedev created SPARK-19010:
--

 Summary: Include Kryo exception in case of overflow
 Key: SPARK-19010
 URL: https://issues.apache.org/jira/browse/SPARK-19010
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.6.3
Reporter: Sergei Lebedev


SPARK-6087 replaced the Kryo overflow exception with a SparkException that gives 
Spark-specific instructions on tackling the issue. However, the implementation also 
[suppressed|https://github.com/apache/spark/pull/4947/files#diff-1f81c62dad0e2dfc387a974bb08c497cR165]
 the original Kryo exception, which makes it impossible to trace the operation 
that caused the overflow. A sketch of chaining the original exception as the 
cause follows below.
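
A minimal sketch of the suggested improvement, under the assumption that the 
serializer wrapper looks roughly like Spark's; the helper name and the 
message-matching condition are assumptions for illustration.

{code}
// Keep the original KryoException as the cause of the SparkException instead of
// suppressing it, so the operation that overflowed the buffer stays traceable.
import com.esotericsoftware.kryo.KryoException
import org.apache.spark.SparkException

def serializeOrExplain[T](value: T)(doSerialize: T => Array[Byte]): Array[Byte] =
  try {
    doSerialize(value)
  } catch {
    case e: KryoException if Option(e.getMessage).exists(_.contains("Buffer overflow")) =>
      // Chain the original exception rather than dropping it.
      throw new SparkException(
        "Kryo serialization failed: buffer overflow. " +
          "To avoid this, increase spark.kryoserializer.buffer.max.", e)
  }
{code}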






[jira] [Resolved] (SPARK-18976) in standalone mode, executor expired by HeartbeatReceiver still takes up cores but no tasks assigned to it

2016-12-27 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-18976.
---
Resolution: Duplicate

> in standalone mode, executor expired by HeartbeatReceiver still takes up 
> cores but no tasks assigned to it 
> --
>
> Key: SPARK-18976
> URL: https://issues.apache.org/jira/browse/SPARK-18976
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.6.1
> Environment: jdk1.8.0_77 Red Hat 4.4.7-11
>Reporter: liujianhui
> Attachments: screenshot-1.png, screenshot-2.png, screenshot-3.png
>
>
> h2. Scene
> When an executor is expired by the HeartbeatReceiver in the driver, the driver 
> marks it as not alive and the task scheduler stops assigning tasks to it, but 
> the executor's status stays RUNNING and it keeps holding cores. Here executor 
> 18 was expired with no tasks running and far less task time than the normal 
> executor 142, yet the application page still shows it as running.
> !screenshot-1.png!
> !screenshot-2.png!
> !screenshot-3.png!
> h2. Process
> # The executor is expired by the HeartbeatReceiver because its last heartbeat 
> exceeded the executor timeout.
> # The executor is removed via CoarseGrainedSchedulerBackend.killExecutors, so 
> it is marked as dead and no longer receives offers because it sits in 
> executorsPendingToRemove.
> # The executor's status is still RUNNING because the CoarseGrainedExecutorBackend 
> process still exists and re-registers its block manager with the driver every 
> 10s, as in this log:
> {code}
> 16/12/22 17:04:26 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:26 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:26 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:26 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:26 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:36 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:36 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:36 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:36 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:36 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:46 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:46 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:46 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:46 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:46 INFO BlockManager: Reporting 0 blocks to the master.
> 16/12/22 17:04:56 INFO Executor: Told to re-register on heartbeat
> 16/12/22 17:04:56 INFO BlockManager: BlockManager re-registering with master
> 16/12/22 17:04:56 INFO BlockManagerMaster: Trying to register BlockManager
> 16/12/22 17:04:56 INFO BlockManagerMaster: Registered BlockManager
> 16/12/22 17:04:56 INFO BlockManager: Reporting 0 blocks to the master. 
> {code}
> h2. Proposed resolution
> When the number of re-register attempts exceeds some threshold (e.g. 10), the 
> executor should exit (e.g. with exit code 0); a sketch of this idea follows 
> below.
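
A hypothetical sketch of the reporter's suggestion, not Spark's actual fix (the 
issue was resolved as a duplicate); the class and method names are assumptions.

{code}
// Count consecutive "Told to re-register" heartbeats and exit once a threshold
// is exceeded, so a zombie executor stops holding cores without running tasks.
import java.util.concurrent.atomic.AtomicInteger

class ReRegisterWatchdog(maxAttempts: Int = 10) {
  private val attempts = new AtomicInteger(0)

  /** Call whenever the driver answers a heartbeat with "Told to re-register". */
  def onToldToReRegister(): Unit = {
    val n = attempts.incrementAndGet()
    if (n > maxAttempts) {
      // A real executor would go through its clean shutdown path here.
      System.err.println(s"Re-registered $n times without recovering; exiting.")
      sys.exit(0)
    }
  }

  /** Reset after a heartbeat that the driver accepts normally. */
  def onHeartbeatAccepted(): Unit = attempts.set(0)
}
{code}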






[jira] [Commented] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()

2016-12-27 Thread Jared (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15780014#comment-15780014
 ] 

Jared commented on SPARK-15359:
---

Hi, I also hit a similar problem when running Spark on Mesos.
In my testing, the Spark Mesos dispatcher did not register with the Mesos master 
successfully, but the dispatcher was still brought up and kept listening on the 
default port 7077.
I think the dispatcher should be shut down if the status returned by 
mesosDriver.run() is DRIVER_ABORTED.

I didn't quite understand part of the description. What is the meaning of 
"successful registration"? Do you mean that mesosDriver.run() returned without 
aborting?
If we are working on the same problem, I would like to help fix this issue, for 
example by reviewing code changes or testing the fixes.

Thanks,
Jared


> Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
> ---
>
> Key: SPARK-15359
> URL: https://issues.apache.org/jira/browse/SPARK-15359
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Mesos
>Reporter: Devaraj K
>Priority: Minor
>
> The Mesos dispatcher handles a DRIVER_ABORTED status from mesosDriver.run() 
> during registration, but if mesosDriver.run() returns DRIVER_ABORTED after a 
> successful registration, nothing handles that status and the thread simply 
> terminates. 
> I think we need to throw an exception and shut down the dispatcher (a sketch 
> of this handling follows below).
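
A minimal sketch of the suggested handling with assumed names; this is not the 
dispatcher's real API, only an illustration of checking the status returned by 
the scheduler driver's run() and shutting the dispatcher down instead of letting 
the scheduler thread die silently. The status values mirror 
org.apache.mesos.Protos.Status.

{code}
// Shut the dispatcher down when the Mesos scheduler driver aborts after registration.
sealed trait DriverStatus
case object DriverRunning extends DriverStatus
case object DriverAborted extends DriverStatus

def runSchedulerDriver(run: () => DriverStatus, shutdownDispatcher: () => Unit): Unit =
  run() match {
    case DriverAborted =>
      shutdownDispatcher()
      throw new IllegalStateException(
        "Mesos scheduler driver aborted after registration; dispatcher shut down.")
    case _ =>
      () // normal termination, nothing extra to do
  }
{code}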


