[jira] [Commented] (SPARK-2984) FileNotFoundException on _temporary directory

2018-12-22 Thread Gaurav Kumar (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-2984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727799#comment-16727799
 ] 

Gaurav Kumar commented on SPARK-2984:
-

I am seeing a similar issue with Spark 1.6 even though I have set 
spark.speculation=false. I hit this error while saving an RDD to HDFS with 
saveAsTextFile. Can this issue be reopened, please?
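
For context, a minimal sketch of the kind of job being described, with the 
speculation setting applied up front. This is illustrative only; the app name 
and HDFS path are placeholders, not taken from this report:

{code:java}
import org.apache.spark.{SparkConf, SparkContext}

object SaveWithoutSpeculation {
  def main(args: Array[String]): Unit = {
    // spark.speculation=false must be in effect before the job runs. With
    // speculation enabled, duplicate attempts of the same task can race on
    // the shared _temporary directory, as described in the issue below.
    val conf = new SparkConf()
      .setAppName("save-without-speculation")
      .set("spark.speculation", "false")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 1000)
    // Each run needs a fresh output directory; saveAsTextFile does not
    // overwrite an existing one.
    rdd.saveAsTextFile("hdfs:///tmp/example-output")

    sc.stop()
  }
}
{code}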

> FileNotFoundException on _temporary directory
> -
>
> Key: SPARK-2984
> URL: https://issues.apache.org/jira/browse/SPARK-2984
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Andrew Ash
>Assignee: Josh Rosen
>Priority: Critical
> Fix For: 1.3.0
>
>
> We've seen several stacktraces and threads on the user mailing list where 
> people are having issues with a {{FileNotFoundException}} stemming from an 
> HDFS path containing {{_temporary}}.
> I ([~aash]) think this may be related to {{spark.speculation}}.  I think the 
> error condition might manifest in this circumstance:
> 1) task T starts on an executor E1
> 2) it takes a long time, so task T' is started on another executor E2
> 3) T finishes in E1 so moves its data from {{_temporary}} to the final 
> destination and deletes the {{_temporary}} directory during cleanup
> 4) T' finishes in E2 and attempts to move its data from {{_temporary}}, but 
> those files no longer exist!  exception
> Some samples:
> {noformat}
> 14/08/11 08:05:08 ERROR JobScheduler: Error running job streaming job 
> 140774430 ms.0
> java.io.FileNotFoundException: File 
> hdfs://hadoopc/user/csong/output/human_bot/-140774430.out/_temporary/0/task_201408110805__m_07
>  does not exist.
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:654)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:102)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:712)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem$14.doCall(DistributedFileSystem.java:708)
> at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at 
> org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:708)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:360)
> at 
> org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
> at 
> org.apache.hadoop.mapred.FileOutputCommitter.commitJob(FileOutputCommitter.java:136)
> at 
> org.apache.spark.SparkHadoopWriter.commitJob(SparkHadoopWriter.scala:126)
> at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopDataset(PairRDDFunctions.scala:841)
> at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:724)
> at 
> org.apache.spark.rdd.PairRDDFunctions.saveAsHadoopFile(PairRDDFunctions.scala:643)
> at org.apache.spark.rdd.RDD.saveAsTextFile(RDD.scala:1068)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:773)
> at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$8.apply(DStream.scala:771)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:41)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
> at 
> org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
> at scala.util.Try$.apply(Try.scala:161)
> at org.apache.spark.streaming.scheduler.Job.run(Job.scala:32)
> at 
> org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:172)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> -- Chen Song at 
> http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFiles-file-not-found-exception-td10686.html
> {noformat}
> I am running a Spark Streaming job that uses saveAsTextFiles to save results 
> into hdfs files. However, it has an exception after 20 batches
> result-140631234/_temporary/0/task_201407251119__m_03 does not 
> exist.
> {noformat}
> and
> {noformat}
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>  No lease on /apps/data/vddil/real-time/checkpoint/temp: File does not exist. 
> Holder DFSClient_NONMAPREDUCE_327993456_13 does not have any open files.
>   at 

[jira] [Commented] (SPARK-26421) Spark2.4.0 integration hadoop3.1.1 causes hive sql not to use,just in idea local mode

2018-12-22 Thread thinktothings (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727789#comment-16727789
 ] 

thinktothings commented on SPARK-26421:
---

Thanks, I understand this issue now. I have another question: I don't understand 
how Spark 2.4.0 Dataset.show submits its job. Could you point me to relevant 
material to study? For the Spark 1.6 RDD API I already understand how an action 
submits a job, and I have read that part of the source code in detail.
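
For anyone else following along, a small self-contained sketch (local mode; 
the object and app names are placeholders): show() limits the output to a 
handful of rows and collects them to the driver, and that collect is what 
submits a job. A reasonable place to start reading the 2.4 source is 
Dataset.showString, which ends up calling Dataset.take/head.

{code:java}
import org.apache.spark.scheduler.{SparkListener, SparkListenerJobStart}
import org.apache.spark.sql.SparkSession

object ShowSubmitsAJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("show-submits-a-job")
      .getOrCreate()

    // Print a line every time a Spark job is submitted.
    spark.sparkContext.addSparkListener(new SparkListener {
      override def onJobStart(jobStart: SparkListenerJobStart): Unit =
        println(s"job ${jobStart.jobId} started with ${jobStart.stageInfos.size} stage(s)")
    })

    val df = spark.range(0, 1000).toDF("id")
    df.show(5)   // triggers a job: the plan runs with a small limit and rows are collected

    spark.stop()
  }
}
{code}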

> Spark2.4.0 integration hadoop3.1.1 causes hive sql not to use,just in idea 
> local mode
> -
>
> Key: SPARK-26421
> URL: https://issues.apache.org/jira/browse/SPARK-26421
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.4.0
> Environment: ).idea maven project
> ).jdk 1.8.0_191
> ).hadoop 3.1.1
> ).spark 2.4.0
>Reporter: thinktothings
>Priority: Major
>
> ).Spark2.4.0 integration hadoop3.1.1 causes hive sql not to use,just in idea 
> local mode
> ).idea maven project
> ).spark.sql connect hive 
> val spark = SparkSession
>  .builder()
>  .master("local")
>  .appName("Spark Hive Example")
>  .config("spark.sql.warehouse.dir", warehouseLocation)
>  .enableHiveSupport()
>  .getOrCreate()
> spark.sql("show databases").show()
>  
> ).running this produces the error below (local environment, not a cluster)
> --
> Exception in thread "main" java.lang.ExceptionInInitializerError
>  at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
>  at java.lang.Class.forName0(Native Method)
>  at java.lang.Class.forName(Class.java:348)
>  at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
>  at 
> org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1117)
>  at 
> org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:866)
>  at 
> com.opensource.bigdata.spark.sql.n_10_spark_hive.n_01_show_database.Run$.main(Run.scala:19)
>  at 
> com.opensource.bigdata.spark.sql.n_10_spark_hive.n_01_show_database.Run.main(Run.scala)
> Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major 
> version number: 3.1.1
>  at 
> org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
>  at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
>  at 
> org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
>  at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
>  ... 8 more
> Process finished with exit code 1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24422) Add JDK11 in our Jenkins' build servers

2018-12-22 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727788#comment-16727788
 ] 

shane knapp commented on SPARK-24422:
-

build is failing when attempting to build core:
- SparkListenerEvent *** FAILED ***

> Add JDK11 in our Jenkins' build servers
> ---
>
> Key: SPARK-24422
> URL: https://issues.apache.org/jira/browse/SPARK-24422
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Assignee: shane knapp
>Priority: Major
>







[jira] [Updated] (SPARK-26366) Except with transform regression

2018-12-22 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26366:
--
Fix Version/s: 2.3.3

> Except with transform regression
> 
>
> Key: SPARK-26366
> URL: https://issues.apache.org/jira/browse/SPARK-26366
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.2
>Reporter: Dan Osipov
>Assignee: Marco Gaido
>Priority: Major
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> There appears to be a regression between Spark 2.2 and 2.3. Below is the code 
> to reproduce it:
>  
> {code:java}
> import org.apache.spark.sql.functions.col
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> val inputDF = spark.sqlContext.createDataFrame(
>   spark.sparkContext.parallelize(Seq(
> Row("0", "john", "smith", "j...@smith.com"),
> Row("1", "jane", "doe", "j...@doe.com"),
> Row("2", "apache", "spark", "sp...@apache.org"),
> Row("3", "foo", "bar", null)
>   )),
>   StructType(List(
> StructField("id", StringType, nullable=true),
> StructField("first_name", StringType, nullable=true),
> StructField("last_name", StringType, nullable=true),
> StructField("email", StringType, nullable=true)
>   ))
> )
> val exceptDF = inputDF.transform( toProcessDF =>
>   toProcessDF.filter(
>   (
> col("first_name").isin(Seq("john", "jane"): _*)
>   and col("last_name").isin(Seq("smith", "doe"): _*)
>   )
>   or col("email").isin(List(): _*)
>   )
> )
> inputDF.except(exceptDF).show()
> {code}
> Output with Spark 2.2:
> {noformat}
> +---+----------+---------+----------------+
> | id|first_name|last_name|           email|
> +---+----------+---------+----------------+
> |  2|    apache|    spark|sp...@apache.org|
> |  3|       foo|      bar|            null|
> +---+----------+---------+----------------+{noformat}
> Output with Spark 2.3:
> {noformat}
> +---+----------+---------+----------------+
> | id|first_name|last_name|           email|
> +---+----------+---------+----------------+
> |  2|    apache|    spark|sp...@apache.org|
> +---+----------+---------+----------------+{noformat}
> Note, changing the last line to 
> {code:java}
> inputDF.except(exceptDF.cache()).show()
> {code}
> produces identical output for both Spark 2.3 and 2.2
>  






[jira] [Closed] (SPARK-26421) Spark2.4.0 integration hadoop3.1.1 causes hive sql not to use,just in idea local mode

2018-12-22 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-26421.
-

> Spark2.4.0 integration hadoop3.1.1 causes hive sql not to use,just in idea 
> local mode
> -
>
> Key: SPARK-26421
> URL: https://issues.apache.org/jira/browse/SPARK-26421
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.4.0
> Environment: ).idea maven project
> ).jdk 1.8.0_191
> ).hadoop 3.1.1
> ).spark 2.4.0
>Reporter: thinktothings
>Priority: Major
>
> ).Spark2.4.0 integration hadoop3.1.1 causes hive sql not to use,just in idea 
> local mode
> ).idea maven project
> ).spark.sql connect hive 
> val spark = SparkSession
>  .builder()
>  .master("local")
>  .appName("Spark Hive Example")
>  .config("spark.sql.warehouse.dir", warehouseLocation)
>  .enableHiveSupport()
>  .getOrCreate()
> spark.sql("show databases").show()
>  
> ).running this produces the error below (local environment, not a cluster)
> --
> Exception in thread "main" java.lang.ExceptionInInitializerError
>  at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
>  at java.lang.Class.forName0(Native Method)
>  at java.lang.Class.forName(Class.java:348)
>  at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
>  at 
> org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1117)
>  at 
> org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:866)
>  at 
> com.opensource.bigdata.spark.sql.n_10_spark_hive.n_01_show_database.Run$.main(Run.scala:19)
>  at 
> com.opensource.bigdata.spark.sql.n_10_spark_hive.n_01_show_database.Run.main(Run.scala)
> Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major 
> version number: 3.1.1
>  at 
> org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
>  at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
>  at 
> org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
>  at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
>  ... 8 more
> Process finished with exit code 1






[jira] [Resolved] (SPARK-26421) Spark2.4.0 integration hadoop3.1.1 causes hive sql not to use,just in idea local mode

2018-12-22 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-26421.
---
  Resolution: Duplicate
Target Version/s:   (was: 2.4.0)

> Spark2.4.0 integration hadoop3.1.1 causes hive sql not to use,just in idea 
> local mode
> -
>
> Key: SPARK-26421
> URL: https://issues.apache.org/jira/browse/SPARK-26421
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.4.0
> Environment: ).idea maven project
> ).jdk 1.8.0_191
> ).hadoop 3.1.1
> ).spark 2.4.0
>Reporter: thinktothings
>Priority: Major
>
> ).Spark2.4.0 integration hadoop3.1.1 causes hive sql not to use,just in idea 
> local mode
> ).idea maven project
> ).spark.sql connect hive 
> val spark = SparkSession
>  .builder()
>  .master("local")
>  .appName("Spark Hive Example")
>  .config("spark.sql.warehouse.dir", warehouseLocation)
>  .enableHiveSupport()
>  .getOrCreate()
> spark.sql("show databases").show()
>  
> ).running this produces the error below (local environment, not a cluster)
> --
> Exception in thread "main" java.lang.ExceptionInInitializerError
>  at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
>  at java.lang.Class.forName0(Native Method)
>  at java.lang.Class.forName(Class.java:348)
>  at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
>  at 
> org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1117)
>  at 
> org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:866)
>  at 
> com.opensource.bigdata.spark.sql.n_10_spark_hive.n_01_show_database.Run$.main(Run.scala:19)
>  at 
> com.opensource.bigdata.spark.sql.n_10_spark_hive.n_01_show_database.Run.main(Run.scala)
> Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major 
> version number: 3.1.1
>  at 
> org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
>  at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
>  at 
> org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
>  at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
>  ... 8 more
> Process finished with exit code 1






[jira] [Commented] (SPARK-26421) Spark2.4.0 integration hadoop3.1.1 causes hive sql not to use,just in idea local mode

2018-12-22 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727737#comment-16727737
 ] 

Dongjoon Hyun commented on SPARK-26421:
---

Hi, [~thinktothings]. This is a duplicate of SPARK-23534. 
Apache Spark does not officially support Hadoop 3.

> Spark2.4.0 integration hadoop3.1.1 causes hive sql not to use,just in idea 
> local mode
> -
>
> Key: SPARK-26421
> URL: https://issues.apache.org/jira/browse/SPARK-26421
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 2.4.0
> Environment: ).idea maven project
> ).jdk 1.8.0_191
> ).hadoop 3.1.1
> ).spark 2.4.0
>Reporter: thinktothings
>Priority: Major
>
> ).Spark2.4.0 integration hadoop3.1.1 causes hive sql not to use,just in idea 
> local mode
> ).idea maven project
> ).spark.sql connect hive 
> val spark = SparkSession
>  .builder()
>  .master("local")
>  .appName("Spark Hive Example")
>  .config("spark.sql.warehouse.dir", warehouseLocation)
>  .enableHiveSupport()
>  .getOrCreate()
> spark.sql("show databases").show()
>  
> ).running this produces the error below (local environment, not a cluster)
> --
> Exception in thread "main" java.lang.ExceptionInInitializerError
>  at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:105)
>  at java.lang.Class.forName0(Native Method)
>  at java.lang.Class.forName(Class.java:348)
>  at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
>  at 
> org.apache.spark.sql.SparkSession$.hiveClassesArePresent(SparkSession.scala:1117)
>  at 
> org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:866)
>  at 
> com.opensource.bigdata.spark.sql.n_10_spark_hive.n_01_show_database.Run$.main(Run.scala:19)
>  at 
> com.opensource.bigdata.spark.sql.n_10_spark_hive.n_01_show_database.Run.main(Run.scala)
> Caused by: java.lang.IllegalArgumentException: Unrecognized Hadoop major 
> version number: 3.1.1
>  at 
> org.apache.hadoop.hive.shims.ShimLoader.getMajorVersion(ShimLoader.java:174)
>  at org.apache.hadoop.hive.shims.ShimLoader.loadShims(ShimLoader.java:139)
>  at 
> org.apache.hadoop.hive.shims.ShimLoader.getHadoopShims(ShimLoader.java:100)
>  at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:368)
>  ... 8 more
> Process finished with exit code 1






[jira] [Commented] (SPARK-26243) Use java.time API for parsing timestamps and dates from JSON

2018-12-22 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727551#comment-16727551
 ] 

Apache Spark commented on SPARK-26243:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/23374

> Use java.time API for parsing timestamps and dates from JSON
> 
>
> Key: SPARK-26243
> URL: https://issues.apache.org/jira/browse/SPARK-26243
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, the CSV datasource uses Apache FastDateFormat, with a few fallbacks, 
> for parsing values of TimestampType and DateType. The result of parsing is an 
> instance of java.util.Date/Timestamp, which represents a specific instant in 
> time with millisecond precision. This ticket aims to switch to the Java 8 
> java.time API, which allows parsing with nanosecond precision.
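
As a standalone illustration of the precision gap (this is not the datasource 
code; FastDateFormat comes from commons-lang3, which is already on Spark's 
classpath, and the timestamp string is made up):

{code:java}
import java.sql.Timestamp
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import org.apache.commons.lang3.time.FastDateFormat

object PrecisionSketch {
  def main(args: Array[String]): Unit = {
    val text = "2018-12-22 10:15:30.123456789"

    // Legacy path: FastDateFormat yields a java.util.Date, so only the
    // millisecond part of the fraction survives.
    val legacy = FastDateFormat.getInstance("yyyy-MM-dd HH:mm:ss.SSS")
      .parse(text.substring(0, 23))
    println(new Timestamp(legacy.getTime))   // 2018-12-22 10:15:30.123

    // java.time path: DateTimeFormatter keeps the full nanosecond fraction.
    val formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSSSSSSSS")
    val parsed = LocalDateTime.parse(text, formatter)
    println(parsed.getNano)                  // 123456789
  }
}
{code}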






[jira] [Commented] (SPARK-26243) Use java.time API for parsing timestamps and dates from JSON

2018-12-22 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727552#comment-16727552
 ] 

Apache Spark commented on SPARK-26243:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/23374

> Use java.time API for parsing timestamps and dates from JSON
> 
>
> Key: SPARK-26243
> URL: https://issues.apache.org/jira/browse/SPARK-26243
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, the CSV datasource uses Apache FastDateFormat, with a few fallbacks, 
> for parsing values of TimestampType and DateType. The result of parsing is an 
> instance of java.util.Date/Timestamp, which represents a specific instant in 
> time with millisecond precision. This ticket aims to switch to the Java 8 
> java.time API, which allows parsing with nanosecond precision.






[jira] [Commented] (SPARK-26402) Accessing nested fields with different cases in case insensitive mode

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727529#comment-16727529
 ] 

ASF GitHub Bot commented on SPARK-26402:


asfgit closed pull request #23353: [SPARK-26402][SQL] Accessing nested fields 
with different cases in case insensitive mode
URL: https://github.com/apache/spark/pull/23353
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
index fe6db8b344d3d..4d218b936b3a2 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala
@@ -26,6 +26,7 @@ package org.apache.spark.sql.catalyst.expressions
  *
  * The following rules are applied:
  *  - Names and nullability hints for [[org.apache.spark.sql.types.DataType]]s 
are stripped.
+ *  - Names for [[GetStructField]] are stripped.
  *  - Commutative and associative operations ([[Add]] and [[Multiply]]) have 
their children ordered
  *by `hashCode`.
  *  - [[EqualTo]] and [[EqualNullSafe]] are reordered by `hashCode`.
@@ -37,10 +38,11 @@ object Canonicalize {
 expressionReorder(ignoreNamesTypes(e))
   }
 
-  /** Remove names and nullability from types. */
+  /** Remove names and nullability from types, and names from 
`GetStructField`. */
   private[expressions] def ignoreNamesTypes(e: Expression): Expression = e 
match {
 case a: AttributeReference =>
   AttributeReference("none", a.dataType.asNullable)(exprId = a.exprId)
+case GetStructField(child, ordinal, Some(_)) => GetStructField(child, 
ordinal, None)
 case _ => e
   }
 
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
index 28e6940f3cca3..9802a6e5891b8 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CanonicalizeSuite.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.catalyst.expressions
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.catalyst.dsl.plans._
 import org.apache.spark.sql.catalyst.plans.logical.Range
+import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
 
 class CanonicalizeSuite extends SparkFunSuite {
 
@@ -50,4 +51,32 @@ class CanonicalizeSuite extends SparkFunSuite {
 assert(range.where(arrays1).sameResult(range.where(arrays2)))
 assert(!range.where(arrays1).sameResult(range.where(arrays3)))
   }
+
+  test("SPARK-26402: accessing nested fields with different cases in case 
insensitive mode") {
+val expId = NamedExpression.newExprId
+val qualifier = Seq.empty[String]
+val structType = StructType(
+  StructField("a", StructType(StructField("b", IntegerType, false) :: 
Nil), false) :: Nil)
+
+// GetStructField with different names are semantically equal
+val fieldA1 = GetStructField(
+  AttributeReference("data1", structType, false)(expId, qualifier),
+  0, Some("a1"))
+val fieldA2 = GetStructField(
+  AttributeReference("data2", structType, false)(expId, qualifier),
+  0, Some("a2"))
+assert(fieldA1.semanticEquals(fieldA2))
+
+val fieldB1 = GetStructField(
+  GetStructField(
+AttributeReference("data1", structType, false)(expId, qualifier),
+0, Some("a1")),
+  0, Some("b1"))
+val fieldB2 = GetStructField(
+  GetStructField(
+AttributeReference("data2", structType, false)(expId, qualifier),
+0, Some("a2")),
+  0, Some("b2"))
+assert(fieldB1.semanticEquals(fieldB2))
+  }
 }
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BinaryComparisonSimplificationSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BinaryComparisonSimplificationSuite.scala
index a313681eeb8f0..5794691a365a9 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BinaryComparisonSimplificationSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/BinaryComparisonSimplificationSuite.scala
@@ -25,6 +25,7 @@ import 
org.apache.spark.sql.catalyst.expressions.Literal.{FalseLiteral, TrueLite
 import org.apache.spark.sql.catalyst.plans.PlanTest
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules._
+import 

[jira] [Commented] (SPARK-24422) Add JDK11 in our Jenkins' build servers

2018-12-22 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727502#comment-16727502
 ] 

shane knapp commented on SPARK-24422:
-

test build started:

https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.7-jdk-11-ubuntu-testing/1/

> Add JDK11 in our Jenkins' build servers
> ---
>
> Key: SPARK-24422
> URL: https://issues.apache.org/jira/browse/SPARK-24422
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Assignee: shane knapp
>Priority: Major
>







[jira] [Commented] (SPARK-25245) Explain regarding limiting modification on "spark.sql.shuffle.partitions" for structured streaming

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727501#comment-16727501
 ] 

ASF GitHub Bot commented on SPARK-25245:


srowen closed pull request #22238: [SPARK-25245][DOCS][SS] Explain regarding 
limiting modification on "spark.sql.shuffle.partitions" for structured streaming
URL: https://github.com/apache/spark/pull/22238
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/structured-streaming-programming-guide.md 
b/docs/structured-streaming-programming-guide.md
index 73de1892977ac..8c3622c857240 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -2812,6 +2812,16 @@ See [Input Sources](#input-sources) and [Output 
Sinks](#output-sinks) sections f
 
 # Additional Information
 
+**Notes**
+
+- Several configurations are not modifiable after the query has run. To change 
them, discard the checkpoint and start a new query. These configurations 
include:
+  - `spark.sql.shuffle.partitions`
+- This is due to the physical partitioning of state: state is partitioned 
via applying hash function to key, hence the number of partitions for state 
should be unchanged.
+- If you want to run fewer tasks for stateful operations, `coalesce` would 
help with avoiding unnecessary repartitioning.
+  - After `coalesce`, the number of (reduced) tasks will be kept unless 
another shuffle happens.
+  - `spark.sql.streaming.stateStore.providerClass`: To read the previous state 
of the query properly, the class of state store provider should be unchanged.
+  - `spark.sql.streaming.multipleWatermarkPolicy`: Modification of this would 
lead inconsistent watermark value when query contains multiple watermarks, 
hence the policy should be unchanged.
+
 **Further Reading**
 
 - See and run the
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index ef3ce98fd7add..ca100da9f019c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -266,7 +266,9 @@ object SQLConf {
 .createWithDefault(Long.MaxValue)
 
   val SHUFFLE_PARTITIONS = buildConf("spark.sql.shuffle.partitions")
-.doc("The default number of partitions to use when shuffling data for 
joins or aggregations.")
+.doc("The default number of partitions to use when shuffling data for 
joins or aggregations. " +
+  "Note: For structured streaming, this configuration cannot be changed 
between query " +
+  "restarts from the same checkpoint location.")
 .intConf
 .createWithDefault(200)
 
@@ -868,7 +870,9 @@ object SQLConf {
   .internal()
   .doc(
 "The class used to manage state data in stateful streaming queries. 
This class must " +
-  "be a subclass of StateStoreProvider, and must have a zero-arg 
constructor.")
+  "be a subclass of StateStoreProvider, and must have a zero-arg 
constructor. " +
+  "Note: For structured streaming, this configuration cannot be 
changed between query " +
+  "restarts from the same checkpoint location.")
   .stringConf
   .createWithDefault(
 
"org.apache.spark.sql.execution.streaming.state.HDFSBackedStateStoreProvider")


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Explain regarding limiting modification on "spark.sql.shuffle.partitions" for 
> structured streaming
> --
>
> Key: SPARK-25245
> URL: https://issues.apache.org/jira/browse/SPARK-25245
> Project: Spark
>  Issue Type: Documentation
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 3.0.0
>
>
> A couple of users have wondered why "spark.sql.shuffle.partitions" stays 
> unchanged when they modify the config value after the query has already run. 
> Some even submitted patches treating this behavior as a bug, but it follows 
> from how state is partitioned and the behavior is intentional.
> It seems worth explaining this in the guide docs so that users are no longer 
> surprised.
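
As a hypothetical sketch of why the setting is pinned (the source, sink, paths 
and names below are placeholders): the shuffle-partition count in effect at the 
first start determines how the aggregation state is partitioned in the 
checkpoint, so changing it before a later restart from the same 
checkpointLocation has no effect. To run fewer tasks for the stateful operator, 
the doc change above suggests coalesce instead.

{code:java}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object ShufflePartitionsPinned {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("shuffle-partitions-pinned")
      .config("spark.sql.shuffle.partitions", "8")   // only honored on the first start
      .getOrCreate()
    import spark.implicits._

    // Stateful aggregation: one state store instance per shuffle partition.
    val counts = spark.readStream
      .format("rate").option("rowsPerSecond", "10").load()
      .groupBy($"value" % 10)
      .count()

    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .option("checkpointLocation", "/tmp/pinned-checkpoint")   // placeholder path
      .trigger(Trigger.ProcessingTime("5 seconds"))
      .start()

    query.awaitTermination()
  }
}
{code}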




[jira] [Updated] (SPARK-25245) Explain regarding limiting modification on "spark.sql.shuffle.partitions" for structured streaming

2018-12-22 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-25245:
--
Priority: Minor  (was: Major)

> Explain regarding limiting modification on "spark.sql.shuffle.partitions" for 
> structured streaming
> --
>
> Key: SPARK-25245
> URL: https://issues.apache.org/jira/browse/SPARK-25245
> Project: Spark
>  Issue Type: Documentation
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Minor
> Fix For: 3.0.0
>
>
> A couple of users have wondered why "spark.sql.shuffle.partitions" stays 
> unchanged when they modify the config value after the query has already run. 
> Some even submitted patches treating this behavior as a bug, but it follows 
> from how state is partitioned and the behavior is intentional.
> It seems worth explaining this in the guide docs so that users are no longer 
> surprised.






[jira] [Resolved] (SPARK-25245) Explain regarding limiting modification on "spark.sql.shuffle.partitions" for structured streaming

2018-12-22 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-25245.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 22238
[https://github.com/apache/spark/pull/22238]

> Explain regarding limiting modification on "spark.sql.shuffle.partitions" for 
> structured streaming
> --
>
> Key: SPARK-25245
> URL: https://issues.apache.org/jira/browse/SPARK-25245
> Project: Spark
>  Issue Type: Documentation
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> A couple of users have wondered why "spark.sql.shuffle.partitions" stays 
> unchanged when they modify the config value after the query has already run. 
> Some even submitted patches treating this behavior as a bug, but it follows 
> from how state is partitioned and the behavior is intentional.
> It seems worth explaining this in the guide docs so that users are no longer 
> surprised.






[jira] [Assigned] (SPARK-25245) Explain regarding limiting modification on "spark.sql.shuffle.partitions" for structured streaming

2018-12-22 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-25245:
-

Assignee: Jungtaek Lim

> Explain regarding limiting modification on "spark.sql.shuffle.partitions" for 
> structured streaming
> --
>
> Key: SPARK-25245
> URL: https://issues.apache.org/jira/browse/SPARK-25245
> Project: Spark
>  Issue Type: Documentation
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> A couple of users have wondered why "spark.sql.shuffle.partitions" stays 
> unchanged when they modify the config value after the query has already run. 
> Some even submitted patches treating this behavior as a bug, but it follows 
> from how state is partitioned and the behavior is intentional.
> It seems worth explaining this in the guide docs so that users are no longer 
> surprised.






[jira] [Commented] (SPARK-24422) Add JDK11 in our Jenkins' build servers

2018-12-22 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727499#comment-16727499
 ] 

shane knapp commented on SPARK-24422:
-

btw, to address earlier comments:  JDK11 will be used for building spark, *not* 
for running jenkins.

> Add JDK11 in our Jenkins' build servers
> ---
>
> Key: SPARK-24422
> URL: https://issues.apache.org/jira/browse/SPARK-24422
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Assignee: shane knapp
>Priority: Major
>







[jira] [Resolved] (SPARK-25953) install jdk11 on jenkins workers

2018-12-22 Thread shane knapp (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shane knapp resolved SPARK-25953.
-
Resolution: Duplicate

dupe of https://issues.apache.org/jira/browse/SPARK-24422

> install jdk11 on jenkins workers
> 
>
> Key: SPARK-25953
> URL: https://issues.apache.org/jira/browse/SPARK-25953
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Critical
>
> once we pin down exactly what we want installed on the jenkins workers, i will 
> add it to our ansible and deploy.






[jira] [Closed] (SPARK-25953) install jdk11 on jenkins workers

2018-12-22 Thread shane knapp (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shane knapp closed SPARK-25953.
---

> install jdk11 on jenkins workers
> 
>
> Key: SPARK-25953
> URL: https://issues.apache.org/jira/browse/SPARK-25953
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Critical
>
> once we pin down exactly what we want installed on the jenkins workers, i will 
> add it to our ansible and deploy.






[jira] [Commented] (SPARK-24422) Add JDK11 in our Jenkins' build servers

2018-12-22 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727496#comment-16727496
 ] 

shane knapp commented on SPARK-24422:
-

alright, java 11 is installed on all of the jenkins workers.

centos workers:
{code:java}
[ sknapp@amp-jenkins-master ] [ ~ ]
$ pssh -h jenkins_workers.txt -i "export PATH=/usr/java/jdk-11.0.1/bin:$PATH; 
java -version"
[1] 08:18:25 [SUCCESS] amp-jenkins-worker-04
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[2] 08:18:25 [SUCCESS] amp-jenkins-worker-03
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[3] 08:18:25 [SUCCESS] amp-jenkins-worker-06
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[4] 08:18:25 [SUCCESS] amp-jenkins-worker-02
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[5] 08:18:25 [SUCCESS] amp-jenkins-worker-05
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[6] 08:18:25 [SUCCESS] amp-jenkins-worker-01
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
{code}
 

ubuntu workers:
{code:java}
[ sknapp@amp-jenkins-master ] [ ~ ]
$ pssh -h ubuntu_workers.txt -i "export PATH=/usr/lib/jvm/jdk-11.0.1/bin:$PATH; 
java -version"
[1] 08:20:20 [SUCCESS] amp-jenkins-staging-worker-02
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[2] 08:20:20 [SUCCESS] research-jenkins-worker-07
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[3] 08:20:20 [SUCCESS] research-jenkins-worker-08
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[4] 08:20:20 [SUCCESS] amp-jenkins-staging-worker-01
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
{code}
next steps?  i'm assuming some job configs will need an env var like 
JAVA11_HOME added.

> Add JDK11 in our Jenkins' build servers
> ---
>
> Key: SPARK-24422
> URL: https://issues.apache.org/jira/browse/SPARK-24422
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Assignee: shane knapp
>Priority: Major
>







[jira] [Assigned] (SPARK-24422) Add JDK11 in our Jenkins' build servers

2018-12-22 Thread shane knapp (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shane knapp reassigned SPARK-24422:
---

Assignee: shane knapp

> Add JDK11 in our Jenkins' build servers
> ---
>
> Key: SPARK-24422
> URL: https://issues.apache.org/jira/browse/SPARK-24422
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 2.3.0
>Reporter: DB Tsai
>Assignee: shane knapp
>Priority: Major
>







[jira] [Closed] (SPARK-26282) Update JVM to 8u191 on jenkins workers

2018-12-22 Thread shane knapp (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shane knapp closed SPARK-26282.
---

> Update JVM to 8u191 on jenkins workers
> --
>
> Key: SPARK-26282
> URL: https://issues.apache.org/jira/browse/SPARK-26282
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Major
>
> the jvm we're using to build/test spark on the centos workers is a bit...  
> long in the teeth:
> {noformat}
> [sknapp@amp-jenkins-worker-04 ~]$ java -version
> java version "1.8.0_60"
> Java(TM) SE Runtime Environment (build 1.8.0_60-b27)
> Java HotSpot(TM) 64-Bit Server VM (build 25.60-b23, mixed mode){noformat}
> on the ubuntu nodes, it's only a little bit less old:
> {noformat}
> sknapp@amp-jenkins-staging-worker-01:~$ java -version
> java version "1.8.0_171"
> Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
> Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode){noformat}
> steps to update on centos:
>  * manually install new(er) java
>  * update /etc/alternatives
>  * update JJB configs and update JAVA_HOME/JAVA_BIN
> steps to update on ubuntu:
>  * update ansible to install newer java
>  * deploy ansible
> questions:
>  * do we stick w/java8 for now?
>  * which version is sufficient?
> [~srowen]






[jira] [Commented] (SPARK-25953) install jdk11 on jenkins workers

2018-12-22 Thread shane knapp (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727495#comment-16727495
 ] 

shane knapp commented on SPARK-25953:
-

alright, java 11 is installed on all of the jenkins workers.

centos workers:
{code:java}
[ sknapp@amp-jenkins-master ] [ ~ ]
$ pssh -h jenkins_workers.txt -i "export PATH=/usr/java/jdk-11.0.1/bin:$PATH; 
java -version"
[1] 08:18:25 [SUCCESS] amp-jenkins-worker-04
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[2] 08:18:25 [SUCCESS] amp-jenkins-worker-03
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[3] 08:18:25 [SUCCESS] amp-jenkins-worker-06
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[4] 08:18:25 [SUCCESS] amp-jenkins-worker-02
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[5] 08:18:25 [SUCCESS] amp-jenkins-worker-05
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[6] 08:18:25 [SUCCESS] amp-jenkins-worker-01
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
{code}
 

ubuntu workers:
{code:java}
[ sknapp@amp-jenkins-master ] [ ~ ]
$ pssh -h ubuntu_workers.txt -i "export PATH=/usr/lib/jvm/jdk-11.0.1/bin:$PATH; 
java -version"
[1] 08:20:20 [SUCCESS] amp-jenkins-staging-worker-02
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[2] 08:20:20 [SUCCESS] research-jenkins-worker-07
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[3] 08:20:20 [SUCCESS] research-jenkins-worker-08
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
[4] 08:20:20 [SUCCESS] amp-jenkins-staging-worker-01
Stderr: java version "11.0.1" 2018-10-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.1+13-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.1+13-LTS, mixed mode)
{code}
next steps?  i'm assuming some job configs will need an env var like 
JAVA11_HOME added.

> install jdk11 on jenkins workers
> 
>
> Key: SPARK-25953
> URL: https://issues.apache.org/jira/browse/SPARK-25953
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Critical
>
> once we pin down exactly what we want installed on the jenkins workers, i will 
> add it to our ansible and deploy.






[jira] [Assigned] (SPARK-14023) Make exceptions consistent regarding fields and columns

2018-12-22 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-14023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-14023:
-

Assignee: Rekha Joshi

> Make exceptions consistent regarding fields and columns
> ---
>
> Key: SPARK-14023
> URL: https://issues.apache.org/jira/browse/SPARK-14023
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Assignee: Rekha Joshi
>Priority: Trivial
>
> As you can see below, a column is called a field depending on where an 
> exception is thrown. I think it should be "column" everywhere (since that's 
> what has a type from a schema).
> {code}
> scala> lr
> res32: org.apache.spark.ml.regression.LinearRegression = linReg_d9bfe808e743
> scala> lr.fit(ds)
> java.lang.IllegalArgumentException: Field "features" does not exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:214)
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:214)
>   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>   at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>   at org.apache.spark.sql.types.StructType.apply(StructType.scala:213)
>   at 
> org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:40)
>   at 
> org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:50)
>   at 
> org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:71)
>   at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:116)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:89)
>   ... 51 elided
> scala> lr.fit(ds)
> java.lang.IllegalArgumentException: requirement failed: Column label must be 
> of type DoubleType but was actually StringType.
>   at scala.Predef$.require(Predef.scala:219)
>   at 
> org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
>   at 
> org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:53)
>   at 
> org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:71)
>   at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:116)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:89)
>   ... 51 elided
> {code}






[jira] [Commented] (SPARK-14023) Make exceptions consistent regarding fields and columns

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-14023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727490#comment-16727490
 ] 

ASF GitHub Bot commented on SPARK-14023:


srowen opened a new pull request #23373: [SPARK-14023][CORE] Don't reference 
'field' in StructField errors for clarity in exceptions
URL: https://github.com/apache/spark/pull/23373
 
 
   ## What changes were proposed in this pull request?
   
   Variation of https://github.com/apache/spark/pull/20500
   I cheated by not referencing fields or columns at all as this exception 
propagates in contexts where both would be applicable.
   
   ## How was this patch tested?
   
   Existing tests


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Make exceptions consistent regarding fields and columns
> ---
>
> Key: SPARK-14023
> URL: https://issues.apache.org/jira/browse/SPARK-14023
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Trivial
>
> As you can see below, a column is called a field depending on where an 
> exception is thrown. I think it should be "column" everywhere (since that's 
> what has a type from a schema).
> {code}
> scala> lr
> res32: org.apache.spark.ml.regression.LinearRegression = linReg_d9bfe808e743
> scala> lr.fit(ds)
> java.lang.IllegalArgumentException: Field "features" does not exist.
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:214)
>   at 
> org.apache.spark.sql.types.StructType$$anonfun$apply$1.apply(StructType.scala:214)
>   at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
>   at scala.collection.AbstractMap.getOrElse(Map.scala:59)
>   at org.apache.spark.sql.types.StructType.apply(StructType.scala:213)
>   at 
> org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:40)
>   at 
> org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:50)
>   at 
> org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:71)
>   at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:116)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:89)
>   ... 51 elided
> scala> lr.fit(ds)
> java.lang.IllegalArgumentException: requirement failed: Column label must be 
> of type DoubleType but was actually StringType.
>   at scala.Predef$.require(Predef.scala:219)
>   at 
> org.apache.spark.ml.util.SchemaUtils$.checkColumnType(SchemaUtils.scala:42)
>   at 
> org.apache.spark.ml.PredictorParams$class.validateAndTransformSchema(Predictor.scala:53)
>   at 
> org.apache.spark.ml.Predictor.validateAndTransformSchema(Predictor.scala:71)
>   at org.apache.spark.ml.Predictor.transformSchema(Predictor.scala:116)
>   at org.apache.spark.ml.PipelineStage.transformSchema(Pipeline.scala:67)
>   at org.apache.spark.ml.Predictor.fit(Predictor.scala:89)
>   ... 51 elided
> {code}






[jira] [Commented] (SPARK-14023) Make exceptions consistent regarding fields and columns

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-14023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727491#comment-16727491
 ] 

ASF GitHub Bot commented on SPARK-14023:


srowen closed pull request #20500: [SPARK-14023][MLlib] Make exceptions 
consistent regarding fields and columns
URL: https://github.com/apache/spark/pull/20500
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
index e3b0969283a84..c6c295487ecc6 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/StructType.scala
@@ -57,7 +57,7 @@ import org.apache.spark.util.Utils
  *
  * // If this struct does not have a field called "d", it throws an exception.
  * struct("d")
- * // java.lang.IllegalArgumentException: Field "d" does not exist.
+ * // java.lang.IllegalArgumentException: Column "d" does not exist.
  * //   ...
  *
  * // Extract multiple StructFields. Field names are provided in a set.
@@ -69,7 +69,7 @@ import org.apache.spark.util.Utils
  * // Any names without matching fields will throw an exception.
  * // For the case shown below, an exception is thrown due to "d".
  * struct(Set("b", "c", "d"))
- * // java.lang.IllegalArgumentException: Field "d" does not exist.
+ * // java.lang.IllegalArgumentException: Column "d" does not exist.
  * //...
  * }}}
  *
@@ -260,24 +260,24 @@ case class StructType(fields: Array[StructField]) extends 
DataType with Seq[Stru
   /**
* Extracts the [[StructField]] with the given name.
*
-   * @throws IllegalArgumentException if a field with the given name does not 
exist
+   * @throws IllegalArgumentException if a column with the given name does not 
exist
*/
   def apply(name: String): StructField = {
 nameToField.getOrElse(name,
-  throw new IllegalArgumentException(s"""Field "$name" does not exist."""))
+  throw new IllegalArgumentException(s"""Column "$name" does not 
exist."""))
   }
 
   /**
* Returns a [[StructType]] containing [[StructField]]s of the given names, 
preserving the
* original order of fields.
*
-   * @throws IllegalArgumentException if a field cannot be found for any of 
the given names
+   * @throws IllegalArgumentException if a column cannot be found for any of 
the given names
*/
   def apply(names: Set[String]): StructType = {
 val nonExistFields = names -- fieldNamesSet
 if (nonExistFields.nonEmpty) {
   throw new IllegalArgumentException(
-s"Field ${nonExistFields.mkString(",")} does not exist.")
+s"Column ${nonExistFields.mkString(",")} does not exist.")
 }
 // Preserve the original order of fields.
 StructType(fields.filter(f => names.contains(f.name)))
@@ -286,11 +286,11 @@ case class StructType(fields: Array[StructField]) extends 
DataType with Seq[Stru
   /**
* Returns the index of a given field.
*
-   * @throws IllegalArgumentException if a field with the given name does not 
exist
+   * @throws IllegalArgumentException if a column with the given name does not 
exist
*/
   def fieldIndex(name: String): Int = {
 nameToIndex.getOrElse(name,
-  throw new IllegalArgumentException(s"""Field "$name" does not exist."""))
+  throw new IllegalArgumentException(s"""Column "$name" does not exist."""))
   }
 
   private[sql] def getFieldIndex(name: String): Option[Int] = {


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
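
For reference, the behaviour change is easy to see from StructType's apply method
alone. The following is a small sketch (not from the PR itself); the schema is made
up, and the before/after messages are taken from the diff above:

{code}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

val schema = StructType(Seq(StructField("id", IntegerType)))
schema("features")
// Before this change: java.lang.IllegalArgumentException: Field "features" does not exist.
// After this change:  java.lang.IllegalArgumentException: Column "features" does not exist.
{code}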


> Make exceptions consistent regarding fields and columns
> ---
>
> Key: SPARK-14023
> URL: https://issues.apache.org/jira/browse/SPARK-14023
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Trivial
>
> As you can see below, a column is sometimes called a "field", depending on 
> where the exception is thrown. I think it should be "column" everywhere (since 
> that's what carries a type in the schema).
> {code}
> scala> lr
> res32: org.apache.spark.ml.regression.LinearRegression = linReg_d9bfe808e743
> scala> lr.fit(ds)
> java.lang.IllegalArgumentException: Field "features" does not exist.
>   at 
> 

[jira] [Commented] (SPARK-26285) Add a metric source for accumulators (aka AccumulatorSource)

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727482#comment-16727482
 ] 

ASF GitHub Bot commented on SPARK-26285:


asfgit closed pull request #23242: [SPARK-26285][CORE] accumulator metrics 
sources for LongAccumulator and Doub…
URL: https://github.com/apache/spark/pull/23242
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git 
a/core/src/main/scala/org/apache/spark/metrics/source/AccumulatorSource.scala 
b/core/src/main/scala/org/apache/spark/metrics/source/AccumulatorSource.scala
new file mode 100644
index 0..45a4d224d45fe
--- /dev/null
+++ 
b/core/src/main/scala/org/apache/spark/metrics/source/AccumulatorSource.scala
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.metrics.source
+
+import com.codahale.metrics.{Gauge, MetricRegistry}
+
+import org.apache.spark.SparkContext
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.util.{AccumulatorV2, DoubleAccumulator, 
LongAccumulator}
+
+/**
+ * AccumulatorSource is a Spark metric Source that reports the current value
+ * of the accumulator as a gauge.
+ *
+ * It is restricted to the LongAccumulator and the DoubleAccumulator, as those
+ * are the current built-in numerical accumulators with Spark, and excludes
+ * the CollectionAccumulator, as that is a List of values (hard to report,
+ * to a metrics system)
+ */
+private[spark] class AccumulatorSource extends Source {
+  private val registry = new MetricRegistry
+  protected def register[T](accumulators: Map[String, AccumulatorV2[_, T]]): 
Unit = {
+accumulators.foreach {
+  case (name, accumulator) =>
+val gauge = new Gauge[T] {
+  override def getValue: T = accumulator.value
+}
+registry.register(MetricRegistry.name(name), gauge)
+}
+  }
+
+  override def sourceName: String = "AccumulatorSource"
+  override def metricRegistry: MetricRegistry = registry
+}
+
+@Experimental
+class LongAccumulatorSource extends AccumulatorSource
+
+@Experimental
+class DoubleAccumulatorSource extends AccumulatorSource
+
+/**
+ * :: Experimental ::
+ * Metrics source specifically for LongAccumulators. Accumulators
+ * are only valid on the driver side, so these metrics are reported
+ * only by the driver.
+ * Register LongAccumulators using:
+ *LongAccumulatorSource.register(sc, {"name" -> longAccumulator})
+ */
+@Experimental
+object LongAccumulatorSource {
+  def register(sc: SparkContext, accumulators: Map[String, LongAccumulator]): 
Unit = {
+val source = new LongAccumulatorSource
+source.register(accumulators)
+sc.env.metricsSystem.registerSource(source)
+  }
+}
+
+/**
+ * :: Experimental ::
+ * Metrics source specifically for DoubleAccumulators. Accumulators
+ * are only valid on the driver side, so these metrics are reported
+ * only by the driver.
+ * Register DoubleAccumulators using:
+ *DoubleAccumulatorSource.register(sc, {"name" -> doubleAccumulator})
+ */
+@Experimental
+object DoubleAccumulatorSource {
+  def register(sc: SparkContext, accumulators: Map[String, 
DoubleAccumulator]): Unit = {
+val source = new DoubleAccumulatorSource
+source.register(accumulators)
+sc.env.metricsSystem.registerSource(source)
+  }
+}
diff --git 
a/core/src/test/scala/org/apache/spark/metrics/source/AccumulatorSourceSuite.scala
 
b/core/src/test/scala/org/apache/spark/metrics/source/AccumulatorSourceSuite.scala
new file mode 100644
index 0..6a6c07cb068cc
--- /dev/null
+++ 
b/core/src/test/scala/org/apache/spark/metrics/source/AccumulatorSourceSuite.scala
@@ -0,0 +1,91 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, 

[jira] [Assigned] (SPARK-26285) Add a metric source for accumulators (aka AccumulatorSource)

2018-12-22 Thread Thomas Graves (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves reassigned SPARK-26285:
-

Assignee: Alessandro Bellina

> Add a metric source for accumulators (aka AccumulatorSource)
> 
>
> Key: SPARK-26285
> URL: https://issues.apache.org/jira/browse/SPARK-26285
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Alessandro Bellina
>Assignee: Alessandro Bellina
>Priority: Minor
> Fix For: 3.0.0
>
>
> We'd like a simple mechanism to register Spark accumulators against the 
> Codahale metrics registry. 
> This task proposes adding a LongAccumulatorSource and a 
> DoubleAccumulatorSource.
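
For context, a minimal usage sketch of the proposed API (based on the `register` 
signature in the diff above; the application name, accumulator name, and workload 
are made up for illustration):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.metrics.source.LongAccumulatorSource

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("accumulator-metrics-demo")
  .getOrCreate()
val sc = spark.sparkContext

// Create a named LongAccumulator on the driver and expose it as a gauge.
val recordsProcessed = sc.longAccumulator("records.processed")
LongAccumulatorSource.register(sc, Map("records.processed" -> recordsProcessed))

// Tasks update the accumulator as usual; the driver-side metrics sinks report its value.
sc.parallelize(1 to 100).foreach(_ => recordsProcessed.add(1))
{code}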



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26285) Add a metric source for accumulators (aka AccumulatorSource)

2018-12-22 Thread Thomas Graves (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves resolved SPARK-26285.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

> Add a metric source for accumulators (aka AccumulatorSource)
> 
>
> Key: SPARK-26285
> URL: https://issues.apache.org/jira/browse/SPARK-26285
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Alessandro Bellina
>Assignee: Alessandro Bellina
>Priority: Minor
> Fix For: 3.0.0
>
>
> We'd like a simple mechanism to register Spark accumulators against the 
> Codahale metrics registry. 
> This task proposes adding a LongAccumulatorSource and a 
> DoubleAccumulatorSource.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26269) YarnAllocator should have same blacklist behaviour with YARN to maximize use of cluster resource

2018-12-22 Thread Thomas Graves (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-26269:
--
Issue Type: Bug  (was: Improvement)

> YarnAllocator should have same blacklist behaviour with YARN to maximize use 
> of cluster resource
> ---
>
> Key: SPARK-26269
> URL: https://issues.apache.org/jira/browse/SPARK-26269
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 2.3.1, 2.3.2, 2.4.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Minor
> Fix For: 3.0.0
>
>
> Currently, YarnAllocator may blacklist a node when a container on it completes 
> with an exit status other than SUCCESS, PREEMPTED, KILLED_EXCEEDED_VMEM, or 
> KILLED_EXCEEDED_PMEM. However, for some of the other exit statuses, e.g. 
> KILLED_BY_RESOURCEMANAGER, YARN itself does not consider the related nodes to be 
> blacklist candidates (see YARN's explanation for details: 
> https://github.com/apache/hadoop/blob/228156cfd1b474988bc4fedfbf7edddc87db41e3/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/util/Apps.java#L273).
>  So, relaxing the current blacklist rule to match YARN's behaviour would 
> maximize use of cluster resources.
>  
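
For illustration only, a rough sketch of the relaxed rule being proposed (this is 
not the actual YarnAllocator code; the object and method names are made up, and 
the constant values are assumed to mirror YARN's ContainerExitStatus):

{code}
// Sketch: decide whether a completed container's exit status should count
// towards blacklisting its node. Statuses that are not the node's fault,
// including KILLED_BY_RESOURCEMANAGER under this proposal, do not blacklist it.
object ExitStatusSketch {
  val SUCCESS = 0
  val PREEMPTED = -102
  val KILLED_EXCEEDED_VMEM = -103
  val KILLED_EXCEEDED_PMEM = -104
  val KILLED_BY_RESOURCEMANAGER = -106

  private val notNodeFault: Set[Int] =
    Set(SUCCESS, PREEMPTED, KILLED_EXCEEDED_VMEM, KILLED_EXCEEDED_PMEM, KILLED_BY_RESOURCEMANAGER)

  def shouldBlacklistNode(exitStatus: Int): Boolean = !notNodeFault.contains(exitStatus)
}
{code}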



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26366) Except with transform regression

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727366#comment-16727366
 ] 

ASF GitHub Bot commented on SPARK-26366:


mgaido91 opened a new pull request #23372: [SPARK-26366][SQL][BACKPORT-2.3] 
ReplaceExceptWithFilter should consider NULL as False
URL: https://github.com/apache/spark/pull/23372
 
 
   ## What changes were proposed in this pull request?
   
   In `ReplaceExceptWithFilter` we do not properly handle the case in which 
the condition evaluates to NULL. Since negating NULL still returns NULL, the 
assumption that negating the condition returns all the rows which did not 
satisfy it does not hold: rows for which the condition is NULL may be dropped. 
This happens when the constraints inferred by `InferFiltersFromConstraints` 
are not enough, as with `OR` conditions (see the sketch after the list below).
   
   The rule also had problems with non-deterministic conditions: in such a 
scenario, it would change the probability of the output.
   
   The PR fixes these problems by:
- returning False when the condition is NULL (so that we do return all the 
rows which did not satisfy it);
- avoiding any transformation when the condition is non-deterministic.
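   
   A tiny illustration of the three-valued-logic pitfall described above (a 
sketch only, assuming a spark-shell session with `import spark.implicits._`; it 
is not part of the patch):
   
{code}
// Rows where the predicate evaluates to NULL are dropped by both the original
// filter and its negation, so negating the condition does not produce the
// complement of the filtered rows.
val df = Seq(Tuple1(Option(1)), Tuple1(Option(-1)), Tuple1(Option.empty[Int])).toDF("x")

df.filter($"x" > 0).count()     // 1 (only x = 1)
df.filter(!($"x" > 0)).count()  // 1 (only x = -1; the row with x = NULL is dropped both times)
{code}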
   
   ## How was this patch tested?
   
   added UTs
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Except with transform regression
> 
>
> Key: SPARK-26366
> URL: https://issues.apache.org/jira/browse/SPARK-26366
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.2
>Reporter: Dan Osipov
>Assignee: Marco Gaido
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> There appears to be a regression between Spark 2.2 and 2.3. Below is the code 
> to reproduce it:
>  
> {code:java}
> import org.apache.spark.sql.functions.col
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> val inputDF = spark.sqlContext.createDataFrame(
>   spark.sparkContext.parallelize(Seq(
>     Row("0", "john", "smith", "j...@smith.com"),
>     Row("1", "jane", "doe", "j...@doe.com"),
>     Row("2", "apache", "spark", "sp...@apache.org"),
>     Row("3", "foo", "bar", null)
>   )),
>   StructType(List(
>     StructField("id", StringType, nullable=true),
>     StructField("first_name", StringType, nullable=true),
>     StructField("last_name", StringType, nullable=true),
>     StructField("email", StringType, nullable=true)
>   ))
> )
> val exceptDF = inputDF.transform(toProcessDF =>
>   toProcessDF.filter(
>     (
>       col("first_name").isin(Seq("john", "jane"): _*)
>         and col("last_name").isin(Seq("smith", "doe"): _*)
>     )
>       or col("email").isin(List(): _*)
>   )
> )
> inputDF.except(exceptDF).show()
> {code}
> Output with Spark 2.2:
> {noformat}
> +---+----------+---------+----------------+
> | id|first_name|last_name|           email|
> +---+----------+---------+----------------+
> |  2|    apache|    spark|sp...@apache.org|
> |  3|       foo|      bar|            null|
> +---+----------+---------+----------------+{noformat}
> Output with Spark 2.3:
> {noformat}
> +---+----------+---------+----------------+
> | id|first_name|last_name|           email|
> +---+----------+---------+----------------+
> |  2|    apache|    spark|sp...@apache.org|
> +---+----------+---------+----------------+{noformat}
> Note, changing the last line to 
> {code:java}
> inputDF.except(exceptDF.cache()).show()
> {code}
> produces identical output for both Spark 2.3 and 2.2
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26430) Upgrade Surefire plugin to 3.0.0-M2

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727293#comment-16727293
 ] 

ASF GitHub Bot commented on SPARK-26430:


asfgit closed pull request #23370: [SPARK-26430][BUILD][test-maven] Upgrade 
Surefire plugin to 3.0.0-M2
URL: https://github.com/apache/spark/pull/23370
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/pom.xml b/pom.xml
index 310d7de955125..0771578def24a 100644
--- a/pom.xml
+++ b/pom.xml
@@ -2099,7 +2099,7 @@
 
   org.apache.maven.plugins
   maven-surefire-plugin
-  3.0.0-M1
+  3.0.0-M2
   
   
 


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Upgrade Surefire plugin to 3.0.0-M2
> ---
>
> Key: SPARK-26430
> URL: https://issues.apache.org/jira/browse/SPARK-26430
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue aims to upgrade the Maven Surefire plugin for JDK 11 support. 
> 3.0.0-M2 was [released Dec. 
> 9th|https://issues.apache.org/jira/projects/SUREFIRE/versions/12344396].
> {code}
> [SUREFIRE-1568] Versions 2.21 and higher doesn't work with junit-platform for 
> Java 9 module
> [SUREFIRE-1605] NoClassDefFoundError (RunNotifier) with JDK 11
> [SUREFIRE-1600] Surefire Project using surefire:2.12.4 is not fully able to 
> work with JDK 10+ on internal build system. Therefore surefire-shadefire 
> should go with Surefire:3.0.0-M2.
> [SUREFIRE-1593] 3.0.0-M1 produces invalid code sources on Windows
> [SUREFIRE-1602] Surefire fails loading class ForkedBooter when using a 
> sub-directory pom file and a local maven repo
> [SUREFIRE-1606] maven-shared-utils must not be on provider's classpath
> [SUREFIRE-1531] Option to switch-off Java 9 modules
> [SUREFIRE-1590] Deploy multiple versions of Report XSD
> [SUREFIRE-1591] Java 1.7 feature Diamonds replaced Generics
> [SUREFIRE-1594] Java 1.7 feature try-catch - multiple exceptions in one catch
> [SUREFIRE-1595] Java 1.7 feature System.lineSeparator()
> [SUREFIRE-1597] ModularClasspathForkConfiguration with debug logs (@args file 
> and its path on file system)
> [SUREFIRE-1596] Unnecessary check JAVA_RECENT == JAVA_1_7 in unit tests
> [SUREFIRE-1598] Fixed typo in assertion statement in integration test 
> Surefire855AllowFailsafeUseArtifactFileIT
> [SUREFIRE-1607] Roadmap on Project Site
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26428) Minimize deprecated `ProcessingTime` usage

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727288#comment-16727288
 ] 

ASF GitHub Bot commented on SPARK-26428:


asfgit closed pull request #23367: [SPARK-26428][SS][TEST] Minimize deprecated 
`ProcessingTime` usage
URL: https://github.com/apache/spark/pull/23367
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git 
a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala
 
b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala
index 61cbb3285a4f0..d4eb526540053 100644
--- 
a/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala
+++ 
b/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala
@@ -42,7 +42,7 @@ import org.apache.spark.sql.functions.{count, window}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.kafka010.KafkaSourceProvider._
 import org.apache.spark.sql.sources.v2.DataSourceOptions
-import org.apache.spark.sql.streaming.{ProcessingTime, StreamTest}
+import org.apache.spark.sql.streaming.{StreamTest, Trigger}
 import org.apache.spark.sql.streaming.util.StreamManualClock
 import org.apache.spark.sql.test.SharedSQLContext
 
@@ -236,7 +236,7 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 }
 
 testStream(mapped)(
-  StartStream(ProcessingTime(100), clock),
+  StartStream(Trigger.ProcessingTime(100), clock),
   waitUntilBatchProcessed,
   // 1 from smallest, 1 from middle, 8 from biggest
   CheckAnswer(1, 10, 100, 101, 102, 103, 104, 105, 106, 107),
@@ -247,7 +247,7 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 11, 108, 109, 110, 111, 112, 113, 114, 115, 116
   ),
   StopStream,
-  StartStream(ProcessingTime(100), clock),
+  StartStream(Trigger.ProcessingTime(100), clock),
   waitUntilBatchProcessed,
   // smallest now empty, 1 more from middle, 9 more from biggest
   CheckAnswer(1, 10, 100, 101, 102, 103, 104, 105, 106, 107,
@@ -282,7 +282,7 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 
 val mapped = kafka.map(kv => kv._2.toInt + 1)
 testStream(mapped)(
-  StartStream(trigger = ProcessingTime(1)),
+  StartStream(trigger = Trigger.ProcessingTime(1)),
   makeSureGetOffsetCalled,
   AddKafkaData(Set(topic), 1, 2, 3),
   CheckAnswer(2, 3, 4),
@@ -605,7 +605,7 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 }
 
 testStream(kafka)(
-  StartStream(ProcessingTime(100), clock),
+  StartStream(Trigger.ProcessingTime(100), clock),
   waitUntilBatchProcessed,
   // 5 from smaller topic, 5 from bigger one
   CheckLastBatch((0 to 4) ++ (100 to 104): _*),
@@ -618,7 +618,7 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
   // smaller topic empty, 5 from bigger one
   CheckLastBatch(110 to 114: _*),
   StopStream,
-  StartStream(ProcessingTime(100), clock),
+  StartStream(Trigger.ProcessingTime(100), clock),
   waitUntilBatchProcessed,
   // smallest now empty, 5 from bigger one
   CheckLastBatch(115 to 119: _*),
@@ -727,7 +727,7 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 // The message values are the same as their offsets to make the test easy 
to follow
 testUtils.withTranscationalProducer { producer =>
   testStream(mapped)(
-StartStream(ProcessingTime(100), clock),
+StartStream(Trigger.ProcessingTime(100), clock),
 waitUntilBatchProcessed,
 CheckAnswer(),
 WithOffsetSync(topicPartition, expectedOffset = 5) { () =>
@@ -850,7 +850,7 @@ abstract class KafkaMicroBatchSourceSuiteBase extends 
KafkaSourceSuiteBase {
 // The message values are the same as their offsets to make the test easy 
to follow
 testUtils.withTranscationalProducer { producer =>
   testStream(mapped)(
-StartStream(ProcessingTime(100), clock),
+StartStream(Trigger.ProcessingTime(100), clock),
 waitUntilBatchProcessed,
 CheckNewAnswer(),
 WithOffsetSync(topicPartition, expectedOffset = 5) { () =>
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
index d4bd9c7987f2d..de664cafed3b6 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala
+++ 

[jira] [Resolved] (SPARK-26430) Upgrade Surefire plugin to 3.0.0-M2

2018-12-22 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-26430.
---
   Resolution: Fixed
 Assignee: Dongjoon Hyun
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/23370

> Upgrade Surefire plugin to 3.0.0-M2
> ---
>
> Key: SPARK-26430
> URL: https://issues.apache.org/jira/browse/SPARK-26430
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue aims to upgrade the Maven Surefire plugin for JDK 11 support. 
> 3.0.0-M2 was [released Dec. 
> 9th|https://issues.apache.org/jira/projects/SUREFIRE/versions/12344396].
> {code}
> [SUREFIRE-1568] Versions 2.21 and higher doesn't work with junit-platform for 
> Java 9 module
> [SUREFIRE-1605] NoClassDefFoundError (RunNotifier) with JDK 11
> [SUREFIRE-1600] Surefire Project using surefire:2.12.4 is not fully able to 
> work with JDK 10+ on internal build system. Therefore surefire-shadefire 
> should go with Surefire:3.0.0-M2.
> [SUREFIRE-1593] 3.0.0-M1 produces invalid code sources on Windows
> [SUREFIRE-1602] Surefire fails loading class ForkedBooter when using a 
> sub-directory pom file and a local maven repo
> [SUREFIRE-1606] maven-shared-utils must not be on provider's classpath
> [SUREFIRE-1531] Option to switch-off Java 9 modules
> [SUREFIRE-1590] Deploy multiple versions of Report XSD
> [SUREFIRE-1591] Java 1.7 feature Diamonds replaced Generics
> [SUREFIRE-1594] Java 1.7 feature try-catch - multiple exceptions in one catch
> [SUREFIRE-1595] Java 1.7 feature System.lineSeparator()
> [SUREFIRE-1597] ModularClasspathForkConfiguration with debug logs (@args file 
> and its path on file system)
> [SUREFIRE-1596] Unnecessary check JAVA_RECENT == JAVA_1_7 in unit tests
> [SUREFIRE-1598] Fixed typo in assertion statement in integration test 
> Surefire855AllowFailsafeUseArtifactFileIT
> [SUREFIRE-1607] Roadmap on Project Site
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26428) Minimize deprecated `ProcessingTime` usage

2018-12-22 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-26428.
---
   Resolution: Fixed
 Assignee: Dongjoon Hyun
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/23367

> Minimize deprecated `ProcessingTime` usage
> --
>
> Key: SPARK-26428
> URL: https://issues.apache.org/jira/browse/SPARK-26428
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.0.0
>
>
> Use of the `ProcessingTime` class was deprecated in favor of 
> `Trigger.ProcessingTime` in Spark 2.2, and SPARK-21464 minimized its usage in 
> 2.2.1. Recently, its usage has grown again in test suites. This issue aims to 
> clean up the newly introduced deprecation warnings for Spark 3.0.
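
For reference, the non-deprecated form looks like this (a rough sketch using the 
built-in rate source and a console sink, assuming a spark-shell session where 
`spark` is the SparkSession; only the `Trigger.ProcessingTime` call is the API 
being migrated to):

{code}
import org.apache.spark.sql.streaming.Trigger

// Preferred: Trigger.ProcessingTime instead of the deprecated ProcessingTime class.
val streamingDF = spark.readStream.format("rate").load()

val query = streamingDF.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("100 milliseconds"))
  .start()

query.awaitTermination()
{code}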



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26427) Upgrade Apache ORC to 1.5.4

2018-12-22 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16727286#comment-16727286
 ] 

ASF GitHub Bot commented on SPARK-26427:


asfgit closed pull request #23364: [SPARK-26427][BUILD] Upgrade Apache ORC to 
1.5.4
URL: https://github.com/apache/spark/pull/23364
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7
index 71423af0789c6..1af29fcaff2aa 100644
--- a/dev/deps/spark-deps-hadoop-2.7
+++ b/dev/deps/spark-deps-hadoop-2.7
@@ -155,9 +155,9 @@ objenesis-2.5.1.jar
 okhttp-3.8.1.jar
 okio-1.13.0.jar
 opencsv-2.3.jar
-orc-core-1.5.3-nohive.jar
-orc-mapreduce-1.5.3-nohive.jar
-orc-shims-1.5.3.jar
+orc-core-1.5.4-nohive.jar
+orc-mapreduce-1.5.4-nohive.jar
+orc-shims-1.5.4.jar
 oro-2.0.8.jar
 osgi-resource-locator-1.0.1.jar
 paranamer-2.8.jar
diff --git a/dev/deps/spark-deps-hadoop-3.1 b/dev/deps/spark-deps-hadoop-3.1
index 93eafef045330..05f180b17a588 100644
--- a/dev/deps/spark-deps-hadoop-3.1
+++ b/dev/deps/spark-deps-hadoop-3.1
@@ -172,9 +172,9 @@ okhttp-2.7.5.jar
 okhttp-3.8.1.jar
 okio-1.13.0.jar
 opencsv-2.3.jar
-orc-core-1.5.3-nohive.jar
-orc-mapreduce-1.5.3-nohive.jar
-orc-shims-1.5.3.jar
+orc-core-1.5.4-nohive.jar
+orc-mapreduce-1.5.4-nohive.jar
+orc-shims-1.5.4.jar
 oro-2.0.8.jar
 osgi-resource-locator-1.0.1.jar
 paranamer-2.8.jar
diff --git a/pom.xml b/pom.xml
index 310d7de955125..de9421419edc2 100644
--- a/pom.xml
+++ b/pom.xml
@@ -132,7 +132,7 @@
 2.1.0
 10.12.1.1
 1.10.0
-1.5.3
+1.5.4
 nohive
 1.6.0
 9.4.12.v20180830
@@ -1740,6 +1740,10 @@
 ${orc.classifier}
 ${orc.deps.scope}
 
+  
+javax.xml.bind
+jaxb-api
+  
   
 org.apache.hadoop
 hadoop-common


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Upgrade Apache ORC to 1.5.4
> ---
>
> Key: SPARK-26427
> URL: https://issues.apache.org/jira/browse/SPARK-26427
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue aims to update the Apache ORC dependency to the latest version, 
> 1.5.4, released on Dec. 20 ([Release 
> Notes|https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12318320=12344187]).
> {code}
> [ORC-237] - OrcFile.mergeFiles Specified block size is less than configured 
> minimum value
> [ORC-409] - Changes for extending MemoryManagerImpl
> [ORC-410] - Fix a locale-dependent test in TestCsvReader
> [ORC-416] - Avoid opening data reader when there is no stripe
> [ORC-417] - Use dynamic Apache Maven mirror link
> [ORC-419] - Ensure to call `close` at RecordReaderImpl constructor exception
> [ORC-432] - openjdk 8 has a bug that prevents surefire from working
> [ORC-435] - Ability to read stripes that are greater than 2GB
> [ORC-437] - Make acid schema checks case insensitive
> [ORC-411] - Update build to work with Java 10.
> [ORC-418] - Fix broken docker build script
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26427) Upgrade Apache ORC to 1.5.4

2018-12-22 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-26427.
---
   Resolution: Fixed
 Assignee: Dongjoon Hyun
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/23364

> Upgrade Apache ORC to 1.5.4
> ---
>
> Key: SPARK-26427
> URL: https://issues.apache.org/jira/browse/SPARK-26427
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>
> This issue aims to update the Apache ORC dependency to the latest version, 
> 1.5.4, released on Dec. 20 ([Release 
> Notes|https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12318320=12344187]).
> {code}
> [ORC-237] - OrcFile.mergeFiles Specified block size is less than configured 
> minimum value
> [ORC-409] - Changes for extending MemoryManagerImpl
> [ORC-410] - Fix a locale-dependent test in TestCsvReader
> [ORC-416] - Avoid opening data reader when there is no stripe
> [ORC-417] - Use dynamic Apache Maven mirror link
> [ORC-419] - Ensure to call `close` at RecordReaderImpl constructor exception
> [ORC-432] - openjdk 8 has a bug that prevents surefire from working
> [ORC-435] - Ability to read stripes that are greater than 2GB
> [ORC-437] - Make acid schema checks case insensitive
> [ORC-411] - Update build to work with Java 10.
> [ORC-418] - Fix broken docker build script
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org