[jira] [Commented] (SPARK-26836) Columns get switched in Spark SQL using Avro backed Hive table if schema evolves

2019-02-10 Thread Gengliang Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764719#comment-16764719
 ] 

Gengliang Wang commented on SPARK-26836:


[~dongjoon] I don't think the issue is related to the spark-avro lib.
[~treff7es] Can you try reproducing the issue on Hive directly and see what the 
behavior is? 

> Columns get switched in Spark SQL using Avro backed Hive table if schema 
> evolves
> 
>
> Key: SPARK-26836
> URL: https://issues.apache.org/jira/browse/SPARK-26836
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1, 2.4.0
> Environment: I tested with Hive and HCatalog which runs on version 
> 2.3.4 and with Spark 2.3.1 and 2.4
>Reporter: Tamas Nemeth
>Priority: Major
>  Labels: correctness
> Attachments: doctors.avro, doctors_evolved.avro, 
> doctors_evolved.json, original.avsc
>
>
> I have a Hive Avro table where the Avro schema is stored on S3 next to the 
> Avro files.
> In the table definition, the avro.schema.url always points to the latest 
> partition's _schema.avsc file, which is always the latest schema. (Avro 
> schemas are backward and forward compatible in a table.)
> When new data comes in, I always add a new partition whose avro.schema.url 
> property is set to the _schema.avsc that was current when the partition was 
> added, and of course I always update the table's avro.schema.url property 
> to the latest one.
> Querying this table works fine until the schema evolves in a way that adds 
> a new optional property in the middle.
> When this happens, after a Spark SQL query the columns in the old partitions 
> get mixed up and show the wrong data.
> If I query the table with Hive, everything is perfectly fine and it gives me 
> back the correct columns, both for the partitions created with the old 
> schema and for the new ones created with the evolved schema.
>  
> Here is how I could reproduce it with the 
> [doctors.avro|https://github.com/apache/spark/blob/master/sql/hive/src/test/resources/data/files/doctors.avro]
>  example data in the SQL test suite.
>  # I have created two partition folders:
> {code:java}
> [hadoop@ip-192-168-10-158 hadoop]$ hdfs dfs -ls s3://somelocation/doctors/*/
> Found 2 items
> -rw-rw-rw- 1 hadoop hadoop 418 2019-02-06 12:48 s3://somelocation/doctors
> /dt=2019-02-05/_schema.avsc
> -rw-rw-rw- 1 hadoop hadoop 521 2019-02-06 12:13 s3://somelocation/doctors
> /dt=2019-02-05/doctors.avro
> Found 2 items
> -rw-rw-rw- 1 hadoop hadoop 580 2019-02-06 12:49 s3://somelocation/doctors
> /dt=2019-02-06/_schema.avsc
> -rw-rw-rw- 1 hadoop hadoop 577 2019-02-06 12:13 s3://somelocation/doctors
> /dt=2019-02-06/doctors_evolved.avro{code}
> Here the first partition holds data created with the schema before evolution 
> and the second one holds data created with the evolved schema. (The evolved 
> schema is the same as in your test case, except that I moved the extra_field 
> column from the second position to the last, and I generated two lines of 
> Avro data with the evolved schema.)
>  # I have created a Hive table with the following command:
>  
> {code:java}
> CREATE EXTERNAL TABLE `default.doctors`
>  PARTITIONED BY (
>  `dt` string
>  )
>  ROW FORMAT SERDE
>  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>  WITH SERDEPROPERTIES (
>  'avro.schema.url'='s3://somelocation/doctors/
> /dt=2019-02-06/_schema.avsc')
>  STORED AS INPUTFORMAT
>  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>  OUTPUTFORMAT
>  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>  LOCATION
>  's3://somelocation/doctors/'
>  TBLPROPERTIES (
>  'transient_lastDdlTime'='1538130975'){code}
>  
> Here, as you can see, the table's schema URL points to the latest schema.
> 3. I ran _msck repair table_ to pick up all the partitions.
> FYI: if I run my select * query at this point, everything is fine and no 
> column switch happens.
> 4. Then I changed the first partition's avro.schema.url to point to the 
> schema which is under that partition folder (the non-evolved one -> 
> s3://somelocation/doctors/
> /dt=2019-02-05/_schema.avsc)
> Then, if you run a _select * from default.spark_test_, the columns will be 
> mixed up (in the data below, the first_name column becomes the extra_field 
> column, I guess because in the latest schema it is the second column):
>  
> {code:java}
> number,extra_field,first_name,last_name,dt 
> 6,Colin,Baker,null,2019-02-05 
> 3,Jon,Pertwee,null,2019-02-05 
> 4,Tom,Baker,null,2019-02-05 
> 5,Peter,Davison,null,2019-02-05 
> 11,Matt,Smith,null,2019-02-05 
> 1,William,Hartnell,null,2019-02-05 
> 7,Sylvester,McCoy,null,2019-02-05 
> 8,Paul,McGann,null,2019-02-05 
> 2,Patrick,Troughton,null,2019-02-05 
> 9,Christopher,Eccleston,null,2019-02-05 
> {code}
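> 
> Purely as an illustration of the symptom (not a confirmed root cause): the 
> mix-up looks like positional instead of name-based column matching between 
> the old partition files and the evolved table schema. A toy Scala sketch:
> {code:java}
> // the evolved (reader) schema has extra_field inserted as the second column,
> // while the old partition files have the pre-evolution column order
> val tableSchema = Seq("number", "extra_field", "first_name", "last_name")
> val fileSchema = Seq("number", "first_name", "last_name")
> // positional matching reproduces the output above: first_name data shows
> // up under extra_field, and last_name under first_name
> fileSchema.zip(tableSchema).foreach { case (fileCol, tableCol) =>
>   println(s"file column '$fileCol' is shown as table column '$tableCol'")
> }
> // name-based matching (what Hive appears to do here) resolves correctly
> fileSchema.foreach(c => println(s"file column '$c' is shown as table column '$c'"))
> {code}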

[jira] [Commented] (SPARK-23408) Flaky test: StreamingOuterJoinSuite.left outer early state exclusion on right

2019-02-10 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764712#comment-16764712
 ] 

Apache Spark commented on SPARK-23408:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/23757

> Flaky test: StreamingOuterJoinSuite.left outer early state exclusion on right
> -
>
> Key: SPARK-23408
> URL: https://issues.apache.org/jira/browse/SPARK-23408
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Assignee: Tathagata Das
>Priority: Minor
> Fix For: 2.4.0
>
>
> Seen on an unrelated PR.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87386/testReport/org.apache.spark.sql.streaming/StreamingOuterJoinSuite/left_outer_early_state_exclusion_on_right/
> {noformat}
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException: 
> Assert on query failed: Check total state rows = List(4), updated state rows 
> = List(4): Array(1) did not equal List(4) incorrect updates rows
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
>   org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
>   
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
>   
> org.apache.spark.sql.streaming.StateStoreMetricsTest$$anonfun$assertNumStateRows$1.apply(StateStoreMetricsTest.scala:28)
>   
> org.apache.spark.sql.streaming.StateStoreMetricsTest$$anonfun$assertNumStateRows$1.apply(StateStoreMetricsTest.scala:23)
>   
> org.apache.spark.sql.streaming.StreamTest$$anonfun$liftedTree1$1$1$$anonfun$apply$14.apply$mcZ$sp(StreamTest.scala:568)
>   
> org.apache.spark.sql.streaming.StreamTest$class.verify$1(StreamTest.scala:371)
>   
> org.apache.spark.sql.streaming.StreamTest$$anonfun$liftedTree1$1$1.apply(StreamTest.scala:568)
>   
> org.apache.spark.sql.streaming.StreamTest$$anonfun$liftedTree1$1$1.apply(StreamTest.scala:432)
>   
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> == Progress ==
>AddData to MemoryStream[value#19652]: 3,4,5
>AddData to MemoryStream[value#19662]: 1,2,3
>CheckLastBatch: [3,10,6,9]
> => AssertOnQuery(, Check total state rows = List(4), updated state 
> rows = List(4))
>AddData to MemoryStream[value#19652]: 20
>AddData to MemoryStream[value#19662]: 21
>CheckLastBatch: 
>AddData to MemoryStream[value#19662]: 20
>CheckLastBatch: [20,30,40,60],[4,10,8,null],[5,10,10,null]
> == Stream ==
> Output Mode: Append
> Stream state: {MemoryStream[value#19652]: 0,MemoryStream[value#19662]: 0}
> Thread state: alive
> Thread stack trace: java.lang.Thread.sleep(Native Method)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:152)
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:120)
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
> {noformat}
> No other failures in the history, though.






[jira] [Commented] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-10 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764675#comment-16764675
 ] 

Hyukjin Kwon commented on SPARK-26855:
--

But I think it's tricky to fix, IMHO... It would be awesome if that were fixed.

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SparkR
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Felix Cheung
>Priority: Major
>
> SparkSubmitSuite
> "include an external JAR in SparkR"
> fails consistently, but the test before it, "correctly builds R packages 
> included in a jar with --packages", passes.
> The workaround is to build once with skipTests first; then everything passes.
> Ran into this while testing 2.3.3 RC2.






[jira] [Commented] (SPARK-26800) JDBC - MySQL nullable option is ignored

2019-02-10 Thread Maxim Gekk (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764665#comment-16764665
 ] 

Maxim Gekk commented on SPARK-26800:


Using NOT NULL for TIMESTAMP by default is non-standard behaviour. Please take 
a look at the [MySQL system 
variable|https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_explicit_defaults_for_timestamp]
 that controls this behaviour.

> JDBC - MySQL nullable option is ignored
> ---
>
> Key: SPARK-26800
> URL: https://issues.apache.org/jira/browse/SPARK-26800
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Francisco Miguel Biete Banon
>Priority: Minor
>
> Spark 2.4.0
> MySQL 5.7.21 (docker official MySQL image running with default config)
> Writing a dataframe with optionally null fields results in a table with NOT 
> NULL attributes in MySQL.
> {code:java}
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.{Row, SaveMode}
> import java.sql.Timestamp
> val data = Seq[Row](Row(1, null, "Boston"), Row(2, null, "New York"))
> val schema = StructType(
> StructField("id", IntegerType, true) ::
> StructField("when", TimestampType, true) ::
> StructField("city", StringType, true) :: Nil)
> println(schema.toDDL)
> val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
> df.write.mode(SaveMode.Overwrite).jdbc(jdbcUrl, "temp_bug", 
> jdbcProperties){code}
> Produces
> {code}
> CREATE TABLE `temp_bug` (
>   `id` int(11) DEFAULT NULL,
>   `when` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE 
> CURRENT_TIMESTAMP,
>   `city` text
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
> {code}
> I would expect "when" column to be defined as nullable.
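> 
> A possible workaround, following the system variable mentioned in the comment 
> above (a sketch, assuming MySQL Connector/J, whose sessionVariables connection 
> property sets server variables per session, and a MySQL version where the 
> variable is session-settable; it is read-only in older releases):
> {code:java}
> import java.util.Properties
> 
> val jdbcProperties = new Properties()
> jdbcProperties.put("user", "root")       // placeholder credentials
> jdbcProperties.put("password", "secret")
> // ask for standard TIMESTAMP nullability, so the generated CREATE TABLE
> // does not pick up the implicit NOT NULL DEFAULT CURRENT_TIMESTAMP
> jdbcProperties.put("sessionVariables", "explicit_defaults_for_timestamp=1")
> 
> df.write.mode(SaveMode.Overwrite).jdbc(jdbcUrl, "temp_bug", jdbcProperties)
> {code}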






[jira] [Comment Edited] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-10 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764662#comment-16764662
 ] 

Takeshi Yamamuro edited comment on SPARK-26855 at 2/11/19 4:47 AM:
---

Aha, good catch. I didn't notice this because I always run the tests after 
building once with skipTests first...


was (Author: maropu):
Aha, good catch. I always run the tests after building once with skipTests 
first...

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SparkR
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Felix Cheung
>Priority: Major
>
> SparkSubmitSuite
> "include an external JAR in SparkR"
> fails consistently, but the test before it, "correctly builds R packages 
> included in a jar with --packages", passes.
> The workaround is to build once with skipTests first; then everything passes.
> Ran into this while testing 2.3.3 RC2.






[jira] [Commented] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-10 Thread Takeshi Yamamuro (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764662#comment-16764662
 ] 

Takeshi Yamamuro commented on SPARK-26855:
--

Aha, good catch. I always run the tests after building once with skipTests 
first...

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SparkR
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Felix Cheung
>Priority: Major
>
> SparkSubmitSuite
> "include an external JAR in SparkR"
> fails consistently, but the test before it, "correctly builds R packages 
> included in a jar with --packages", passes.
> The workaround is to build once with skipTests first; then everything passes.
> Ran into this while testing 2.3.3 RC2.






[jira] [Resolved] (SPARK-26525) Fast release memory of ShuffleBlockFetcherIterator

2019-02-10 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-26525.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23438
[https://github.com/apache/spark/pull/23438]

> Fast release memory of ShuffleBlockFetcherIterator
> --
>
> Key: SPARK-26525
> URL: https://issues.apache.org/jira/browse/SPARK-26525
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.3.2
>Reporter: liupengcheng
>Assignee: liupengcheng
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, Spark does not release a ShuffleBlockFetcherIterator until the 
> whole task has finished.
> In some conditions, this incurs a memory leak.
> An example is Shuffle -> map -> Coalesce(shuffle = false). Each 
> ShuffleBlockFetcherIterator contains some metadata about 
> MapStatus (blocksByAddress), and each ShuffleMapTask can keep up to n (the 
> number of shuffle partitions) ShuffleBlockFetcherIterators, because they are 
> referred to by the onCompleteCallbacks of the TaskContext. In some cases 
> this may take huge amounts of memory, and the memory is not released until 
> the task finishes.
> Actually, we can release a ShuffleBlockFetcherIterator as soon as it is consumed.
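> 
> For reference, a hedged sketch of the "release when consumed" idea using 
> Spark's existing CompletionIterator idiom (releaseBuffers is a hypothetical 
> hook; the actual change in the PR may differ):
> {code:java}
> import java.io.InputStream
> import org.apache.spark.storage.BlockId
> import org.apache.spark.util.CompletionIterator
> 
> // wrap the fetch iterator so its buffers and metadata are dropped the moment
> // the last element is consumed, instead of at task completion
> def releasingIterator(
>     it: Iterator[(BlockId, InputStream)],
>     releaseBuffers: () => Unit): Iterator[(BlockId, InputStream)] = {
>   CompletionIterator[(BlockId, InputStream), Iterator[(BlockId, InputStream)]](
>     it, releaseBuffers())
> }
> {code}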






[jira] [Resolved] (SPARK-22826) [SQL] findWiderTypeForTwo Fails over StructField of Array

2019-02-10 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-22826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-22826.
--
Resolution: Cannot Reproduce

It was fixed somewhere. Please link the JIRA if anyone identifies the fix.

> [SQL] findWiderTypeForTwo Fails over StructField of Array
> -
>
> Key: SPARK-22826
> URL: https://issues.apache.org/jira/browse/SPARK-22826
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Aleksander Eskilson
>Priority: Major
>
> The {{findWiderTypeForTwo}} codepath in Catalyst {{TypeCoercion}} fails when 
> applied to {{StructType}}s having the following fields:
> {noformat}
>   StructType(StructField("a", ArrayType(StringType, containsNull=true)) 
> :: Nil),
>   StructType(StructField("a", ArrayType(StringType, containsNull=false)) 
> :: Nil)
> {noformat}
> When in {{findTightestCommonType}}, the function attempts to recursively find 
> the tightest common type of two arrays. These two arrays are not equal types 
> (since one would admit null elements and the other would not), but 
> {{findTightestCommonType}} has no match case for {{ArrayType}} (or 
> {{MapType}}), so the 
> [get|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L108]
>  operation on the dataType of the {{StructField}} throws a 
> {{NoSuchElementException}}.
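> 
> A minimal repro sketch built from the two fields above (assuming a 
> SparkSession named spark; illustrative):
> {code:java}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> 
> val s1 = StructType(StructField("a", ArrayType(StringType, containsNull = true)) :: Nil)
> val s2 = StructType(StructField("a", ArrayType(StringType, containsNull = false)) :: Nil)
> 
> val df1 = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], s1)
> val df2 = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], s2)
> 
> // widening the two array types used to throw NoSuchElementException instead
> // of yielding array<string> with containsNull = true
> df1.union(df2).printSchema()
> {code}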






[jira] [Updated] (SPARK-26759) Arrow optimization in SparkR's interoperability

2019-02-10 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-26759:
-
Description: 
Arrow 0.12.0 is released and it contains an R API. We could optimize Spark 
DataFrame <-> R DataFrame interoperability.

For instance see the examples below:
 - {{dapply}}

{code:java}
df <- createDataFrame(mtcars)
collect(dapply(df,
   function(r.data.frame) {
 data.frame(r.data.frame$gear)
   },
   structType("gear long")))
{code}
 - {{gapply}}

{code:java}
df <- createDataFrame(mtcars)
collect(gapply(df,
   "gear",
   function(key, group) {
 data.frame(gear = key[[1]], disp = mean(group$disp) > 
group$disp)
   },
   structType("gear double, disp boolean")))
{code}
 - R DataFrame -> Spark DataFrame

{code:java}
createDataFrame(mtcars)
{code}
 - Spark DataFrame -> R DataFrame

{code:java}
collect(df)
head(df)
{code}

  was:Arrow 0.12.0 is released and it contains an R API. We could optimize 
Spark DataFrame <-> R DataFrame interoperability.


> Arrow optimization in SparkR's interoperability
> ---
>
> Key: SPARK-26759
> URL: https://issues.apache.org/jira/browse/SPARK-26759
> Project: Spark
>  Issue Type: Umbrella
>  Components: SparkR, SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: release-notes
>
> Arrow 0.12.0 is released and it contains an R API. We could optimize Spark 
> DataFrame <-> R DataFrame interoperability.
> For instance see the examples below:
>  - {{dapply}}
> {code:java}
> df <- createDataFrame(mtcars)
> collect(dapply(df,
>function(r.data.frame) {
>  data.frame(r.data.frame$gear)
>},
>structType("gear long")))
> {code}
>  - {{gapply}}
> {code:java}
> df <- createDataFrame(mtcars)
> collect(gapply(df,
>"gear",
>function(key, group) {
>  data.frame(gear = key[[1]], disp = mean(group$disp) > 
> group$disp)
>},
>structType("gear double, disp boolean")))
> {code}
>  - R DataFrame -> Spark DataFrame
> {code:java}
> createDataFrame(mtcars)
> {code}
>  - Spark DataFrame -> R DataFrame
> {code:java}
> collect(df)
> head(df)
> {code}






[jira] [Resolved] (SPARK-26578) Synchronize putBytes's memory allocation and putting block on memoryManager

2019-02-10 Thread SongYadong (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SongYadong resolved SPARK-26578.

Resolution: Won't Fix

This may not be a big problem that must be fixed, so: Won't Fix.

> Synchronize putBytes's memory allocation and putting block on memoryManager
> ---
>
> Key: SPARK-26578
> URL: https://issues.apache.org/jira/browse/SPARK-26578
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: SongYadong
>Priority: Minor
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Just like the memory-allocation and block putting/evicting operations in 
> many of _MemoryStore_'s other methods, I think it may be better for 
> _putBytes_ to also be synchronized on _memoryManager_.
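> 
> A self-contained toy illustrating the proposed locking pattern (MemoryStore 
> internals are heavily simplified; this is not the real API):
> {code:java}
> object ToyMemoryStore {
>   private val memoryManager = new Object   // stands in for Spark's MemoryManager
>   private var freeBytes: Long = 1024L * 1024L
>   private val entries = scala.collection.mutable.Map[String, Long]()
> 
>   // allocation bookkeeping and block registration happen under one lock,
>   // as in MemoryStore's other methods
>   def putBytes(blockId: String, size: Long): Boolean = memoryManager.synchronized {
>     if (size <= freeBytes) {
>       freeBytes -= size
>       entries(blockId) = size
>       true
>     } else false
>   }
> }
> {code}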






[jira] [Commented] (SPARK-26578) Synchronize putBytes's memory allocation and putting block on memoryManager

2019-02-10 Thread SongYadong (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764609#comment-16764609
 ] 

SongYadong commented on SPARK-26578:


This may not be a big problem that must be fixed. 

> Synchronize putBytes's memory allocation and putting block on memoryManager
> ---
>
> Key: SPARK-26578
> URL: https://issues.apache.org/jira/browse/SPARK-26578
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: SongYadong
>Priority: Minor
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Just like the memory-allocation and block putting/evicting operations in 
> many of _MemoryStore_'s other methods, I think it may be better for 
> _putBytes_ to also be synchronized on _memoryManager_.






[jira] [Resolved] (SPARK-26843) Use ConfigEntry for hardcoded configs for "mesos" resource manager

2019-02-10 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-26843.
---
   Resolution: Fixed
 Assignee: Jungtaek Lim
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/23743

> Use ConfigEntry for hardcoded configs for "mesos" resource manager
> --
>
> Key: SPARK-26843
> URL: https://issues.apache.org/jira/browse/SPARK-26843
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Assignee: Jungtaek Lim
>Priority: Major
> Fix For: 3.0.0
>
>
> Make the hardcoded configs in the "mesos" module use ConfigEntry.
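> 
> For example, the ConfigEntry pattern looks like this (a sketch; the actual 
> entries live in the module's config object and may differ):
> {code:java}
> import org.apache.spark.internal.config.ConfigBuilder
> 
> object MesosConfig {
>   // define the key once as a typed ConfigEntry instead of scattering the
>   // raw string "spark.mesos.executor.home" through the code
>   val EXECUTOR_HOME = ConfigBuilder("spark.mesos.executor.home")
>     .doc("Home directory of Spark on Mesos executor hosts.")
>     .stringConf
>     .createOptional
> }
> {code}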






[jira] [Assigned] (SPARK-24211) Flaky test: StreamingOuterJoinSuite

2019-02-10 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24211:


Assignee: Apache Spark

> Flaky test: StreamingOuterJoinSuite
> ---
>
> Key: SPARK-24211
> URL: https://issues.apache.org/jira/browse/SPARK-24211
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.3.2
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>
> *windowed left outer join*
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/330/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/317/]
> *windowed right outer join*
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/334/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/328/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/371/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/345/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/366/]
> *left outer join with non-key condition violated*
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/337/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/366/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/386/]
> *left outer early state exclusion on left*
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/375]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/385/]






[jira] [Assigned] (SPARK-24211) Flaky test: StreamingOuterJoinSuite

2019-02-10 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24211:


Assignee: (was: Apache Spark)

> Flaky test: StreamingOuterJoinSuite
> ---
>
> Key: SPARK-24211
> URL: https://issues.apache.org/jira/browse/SPARK-24211
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.3.2
>Reporter: Dongjoon Hyun
>Priority: Major
>
> *windowed left outer join*
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/330/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/317/]
> *windowed right outer join*
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/334/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/328/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/371/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/345/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/366/]
> *left outer join with non-key condition violated*
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/337/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/366/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/386/]
> *left outer early state exclusion on left*
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/375]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/385/]






[jira] [Commented] (SPARK-24239) Flaky test: KafkaContinuousSourceSuite.subscribing topic by name from earliest offsets

2019-02-10 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764603#comment-16764603
 ] 

Apache Spark commented on SPARK-24239:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/23757

> Flaky test: KafkaContinuousSourceSuite.subscribing topic by name from 
> earliest offsets
> --
>
> Key: SPARK-24239
> URL: https://issues.apache.org/jira/browse/SPARK-24239
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/360/
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/353/






[jira] [Assigned] (SPARK-24239) Flaky test: KafkaContinuousSourceSuite.subscribing topic by name from earliest offsets

2019-02-10 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24239:


Assignee: Apache Spark

> Flaky test: KafkaContinuousSourceSuite.subscribing topic by name from 
> earliest offsets
> --
>
> Key: SPARK-24239
> URL: https://issues.apache.org/jira/browse/SPARK-24239
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/360/
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/353/






[jira] [Assigned] (SPARK-24239) Flaky test: KafkaContinuousSourceSuite.subscribing topic by name from earliest offsets

2019-02-10 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-24239:


Assignee: (was: Apache Spark)

> Flaky test: KafkaContinuousSourceSuite.subscribing topic by name from 
> earliest offsets
> --
>
> Key: SPARK-24239
> URL: https://issues.apache.org/jira/browse/SPARK-24239
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/360/
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/353/






[jira] [Commented] (SPARK-24211) Flaky test: StreamingOuterJoinSuite

2019-02-10 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764601#comment-16764601
 ] 

Apache Spark commented on SPARK-24211:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/23757

> Flaky test: StreamingOuterJoinSuite
> ---
>
> Key: SPARK-24211
> URL: https://issues.apache.org/jira/browse/SPARK-24211
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 2.3.2
>Reporter: Dongjoon Hyun
>Priority: Major
>
> *windowed left outer join*
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/330/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/317/]
> *windowed right outer join*
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/334/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/328/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/371/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/345/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/366/]
> *left outer join with non-key condition violated*
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/337/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/366/]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/386/]
> *left outer early state exclusion on left*
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/375]
>  - 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/385/]






[jira] [Resolved] (SPARK-26816) Add Benchmark for XORShiftRandom

2019-02-10 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-26816.
---
   Resolution: Fixed
 Assignee: Maxim Gekk
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/23752

> Add Benchmark for XORShiftRandom
> 
>
> Key: SPARK-26816
> URL: https://issues.apache.org/jira/browse/SPARK-26816
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
>  Labels: low-hanging-fruit
> Fix For: 3.0.0
>
>
> Currently, the benchmark for XORShiftRandom is mixed in with the implementation: 
> [https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/core/src/main/scala/org/apache/spark/util/random/XORShiftRandom.scala#L70-L107].
> We need to extract the code and create a separate benchmark.
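> 
> A hedged sketch of the extracted benchmark, modeled on Spark's benchmark 
> framework (names are illustrative, not the committed code):
> {code:java}
> import java.util.Random
> import org.apache.spark.benchmark.{Benchmark, BenchmarkBase}
> import org.apache.spark.util.random.XORShiftRandom
> 
> object XORShiftRandomBenchmark extends BenchmarkBase {
>   override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
>     val numIters = 1000000
>     val benchmark = new Benchmark("nextInt", numIters, output = output)
>     benchmark.addCase("java.util.Random") { _ =>
>       val rng = new Random(42); var i = 0
>       while (i < numIters) { rng.nextInt(); i += 1 }
>     }
>     benchmark.addCase("XORShiftRandom") { _ =>
>       val rng = new XORShiftRandom(42); var i = 0
>       while (i < numIters) { rng.nextInt(); i += 1 }
>     }
>     benchmark.run()
>   }
> }
> {code}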






[jira] [Closed] (SPARK-26684) Add logs when allocating large memory for PooledByteBufAllocator

2019-02-10 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-26684.
-

> Add logs when allocating large memory for PooledByteBufAllocator
> 
>
> Key: SPARK-26684
> URL: https://issues.apache.org/jira/browse/SPARK-26684
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: liupengcheng
>Priority: Trivial
>
> Currently, Spark uses `PooledByteBufAllocator` to allocate memory for channel 
> reading. However, the allocated heap/off-heap memory size is not tracked. 
> Sometimes this makes it difficult to find out the cause of OOM failures (for 
> instance, a direct-memory OOM); we have to dump the heap and use more 
> advanced tools like MAT to locate the cause.
> Actually, we can add some logs to `PooledByteBufAllocator` when allocating 
> large memory, which would facilitate debugging.
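> 
> A hedged sketch of the idea (illustrative, not the proposed patch): wrap the 
> allocator and log only the unusually large requests before delegating to 
> Netty's real allocator:
> {code:java}
> import io.netty.buffer.{ByteBuf, PooledByteBufAllocator}
> 
> class LoggingAllocator(underlying: PooledByteBufAllocator, thresholdBytes: Int) {
>   def directBuffer(initialCapacity: Int): ByteBuf = {
>     if (initialCapacity >= thresholdBytes) {
>       // surfaces large direct allocations that would otherwise be invisible
>       println(s"Allocating large direct buffer: $initialCapacity bytes")
>     }
>     underlying.directBuffer(initialCapacity)
>   }
> }
> {code}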






[jira] [Commented] (SPARK-26684) Add logs when allocating large memory for PooledByteBufAllocator

2019-02-10 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764558#comment-16764558
 ] 

Dongjoon Hyun commented on SPARK-26684:
---

Hi, [~liupengcheng].

As I commented on your PR, please contribute this to the `Netty` project 
itself. Inside the `Netty` project, it is a one-liner for each function.
- 
https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L321

> Add logs when allocating large memory for PooledByteBufAllocator
> 
>
> Key: SPARK-26684
> URL: https://issues.apache.org/jira/browse/SPARK-26684
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: liupengcheng
>Priority: Trivial
>
> Currently, Spark uses `PooledByteBufAllocator` to allocate memory for channel 
> reading. However, the allocated heap/off-heap memory size is not tracked. 
> Sometimes this makes it difficult to find out the cause of OOM failures (for 
> instance, a direct-memory OOM); we have to dump the heap and use more 
> advanced tools like MAT to locate the cause.
> Actually, we can add some logs to `PooledByteBufAllocator` when allocating 
> large memory, which would facilitate debugging.






[jira] [Resolved] (SPARK-26684) Add logs when allocating large memory for PooledByteBufAllocator

2019-02-10 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-26684.
---
Resolution: Invalid

> Add logs when allocating large memory for PooledByteBufAllocator
> 
>
> Key: SPARK-26684
> URL: https://issues.apache.org/jira/browse/SPARK-26684
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: liupengcheng
>Priority: Trivial
>
> Currently, Spark uses `PooledByteBufAllocator` to allocate memory for channel 
> reading. However, the allocated heap/off-heap memory size is not tracked. 
> Sometimes this makes it difficult to find out the cause of OOM failures (for 
> instance, a direct-memory OOM); we have to dump the heap and use more 
> advanced tools like MAT to locate the cause.
> Actually, we can add some logs to `PooledByteBufAllocator` when allocating 
> large memory, which would facilitate debugging.






[jira] [Commented] (SPARK-26783) Kafka parameter documentation doesn't match with the reality (upper/lowercase)

2019-02-10 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764553#comment-16764553
 ] 

Dongjoon Hyun commented on SPARK-26783:
---

Hi, [~gsomogyi]. Is this still valid? Could you update the JIRA issue 
description to be more specific?

> Kafka parameter documentation doesn't match with the reality (upper/lowercase)
> --
>
> Key: SPARK-26783
> URL: https://issues.apache.org/jira/browse/SPARK-26783
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Minor
>
> A good example of this is "failOnDataLoss", which is reported in SPARK-23685. 
> I've just checked, and there are several other parameters that suffer from 
> the same issue.






[jira] [Updated] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-10 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26855:
--
Affects Version/s: 2.4.0

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SparkR
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Felix Cheung
>Priority: Major
>
> SparkSubmitSuite
> "include an external JAR in SparkR"
> fails consistently, but the test before it, "correctly builds R packages 
> included in a jar with --packages", passes.
> The workaround is to build once with skipTests first; then everything passes.
> Ran into this while testing 2.3.3 RC2.






[jira] [Comment Edited] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-10 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764538#comment-16764538
 ] 

Dongjoon Hyun edited comment on SPARK-26855 at 2/10/19 8:15 PM:


+1 for fixing this. IIRC, it is not only this one; there were other instances 
that require a 2-phase build (which we have used for a long time), too.


was (Author: dongjoon):
IIRC, it is not only this one; there were other instances that require a 
2-phase build (which we have used for a long time), too.

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SparkR
>Affects Versions: 2.3.2
>Reporter: Felix Cheung
>Priority: Major
>
> SparkSubmitSuite
> "include an external JAR in SparkR"
> fails consistently, but the test before it, "correctly builds R packages 
> included in a jar with --packages", passes.
> The workaround is to build once with skipTests first; then everything passes.
> Ran into this while testing 2.3.3 RC2.






[jira] [Commented] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-10 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764538#comment-16764538
 ] 

Dongjoon Hyun commented on SPARK-26855:
---

IIRC, it is not only this one; there were other instances that require a 
2-phase build (which we have used for a long time), too.

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SparkR
>Affects Versions: 2.3.2
>Reporter: Felix Cheung
>Priority: Major
>
> SparkSubmitSuite
> "include an external JAR in SparkR"
> fails consistently, but the test before it, "correctly builds R packages 
> included in a jar with --packages", passes.
> The workaround is to build once with skipTests first; then everything passes.
> Ran into this while testing 2.3.3 RC2.






[jira] [Commented] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-10 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764537#comment-16764537
 ] 

Sean Owen commented on SPARK-26855:
---

I think this should be improved, but I am not even sure how much of it is 
supposed to work without building Spark separately first. That's always been 
how I test.

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SparkR
>Affects Versions: 2.3.2
>Reporter: Felix Cheung
>Priority: Major
>
> SparkSubmitSuite
> "include an external JAR in SparkR"
> fails consistently, but the test before it, "correctly builds R packages 
> included in a jar with --packages", passes.
> The workaround is to build once with skipTests first; then everything passes.
> Ran into this while testing 2.3.3 RC2.






[jira] [Updated] (SPARK-26840) Avoid cost-based join reorder in presence of join hints

2019-02-10 Thread Maryann Xue (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maryann Xue updated SPARK-26840:

Description: This is a fix for 
[https://github.com/apache/spark/pull/23524], which did not stop cost-based 
join reorder when the {{CostBasedJoinReorder}} rule recurses down the tree and 
applies join reorder for nested joins with hints.
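
To make the intended behavior concrete, here is a hedged sketch (t1, t2, t3 
are assumed DataFrames; illustrative, not a test from the PR): with a join 
hint anywhere in the tree, the reorder rule should leave the hinted join 
alone even while it recurses into the nested join.
{code:java}
// assumes spark.sql.cbo.enabled and spark.sql.cbo.joinReorder.enabled are true
val hinted = t1.hint("broadcast").join(t2, "id")  // the user pins this join
val nested = hinted.join(t3, "id")                // CostBasedJoinReorder recurses here
// after the fix, the hinted subtree is not reordered during the recursion
nested.explain(true)
{code}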

> Avoid cost-based join reorder in presence of join hints
> ---
>
> Key: SPARK-26840
> URL: https://issues.apache.org/jira/browse/SPARK-26840
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maryann Xue
>Priority: Minor
>
> This is a fix for 
> [https://github.com/apache/spark/pull/23524], which did not stop cost-based 
> join reorder when the {{CostBasedJoinReorder}} rule recurses down the tree 
> and applies join reorder for nested joins with hints.






[jira] [Assigned] (SPARK-26840) Avoid cost-based join reorder in presence of join hints

2019-02-10 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26840:


Assignee: (was: Apache Spark)

> Avoid cost-based join reorder in presence of join hints
> ---
>
> Key: SPARK-26840
> URL: https://issues.apache.org/jira/browse/SPARK-26840
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maryann Xue
>Priority: Minor
>
> This is a fix for 
> [https://github.com/apache/spark/pull/23524], which did not stop cost-based 
> join reorder when the {{CostBasedJoinReorder}} rule recurses down the tree 
> and applies join reorder for nested joins with hints.






[jira] [Assigned] (SPARK-26840) Avoid cost-based join reorder in presence of join hints

2019-02-10 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26840:


Assignee: Apache Spark

> Avoid cost-based join reorder in presence of join hints
> ---
>
> Key: SPARK-26840
> URL: https://issues.apache.org/jira/browse/SPARK-26840
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maryann Xue
>Assignee: Apache Spark
>Priority: Minor
>
> This is a fix for 
> [https://github.com/apache/spark/pull/23524], which did not stop cost-based 
> join reorder when the {{CostBasedJoinReorder}} rule recurses down the tree 
> and applies join reorder for nested joins with hints.






[jira] [Updated] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-10 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26855:
-
Description: 
SparkSubmitSuite

"include an external JAR in SparkR"

fails consistently, but the test before it, "correctly builds R packages 
included in a jar with --packages", passes.

The workaround is to build once with skipTests first; then everything passes.

Ran into this while testing 2.3.3 RC2.

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SparkR
>Affects Versions: 2.3.2
>Reporter: Felix Cheung
>Priority: Major
>
> SparkSubmitSuite
> "include an external JAR in SparkR"
> fails consistently, but the test before it, "correctly builds R packages 
> included in a jar with --packages", passes.
> The workaround is to build once with skipTests first; then everything passes.
> Ran into this while testing 2.3.3 RC2.






[jira] [Created] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-10 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-26855:


 Summary: SparkSubmitSuite fails on a clean build
 Key: SPARK-26855
 URL: https://issues.apache.org/jira/browse/SPARK-26855
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SparkR
Affects Versions: 2.3.2
Reporter: Felix Cheung









[jira] [Comment Edited] (SPARK-17454) Use Mesos disk resources

2019-02-10 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SPARK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764454#comment-16764454
 ] 

Clément Michaud edited comment on SPARK-17454 at 2/10/19 3:41 PM:
--

Hello,

I've added a way for executors to reserve disk resources from Mesos.

The change is [https://github.com/apache/spark/pull/23758].

[~mgummelt], I'd really appreciate it if you could review or help me find 
someone to review this change.


was (Author: clems4ever):
Hello,

I've added a way for executors to reserve disk resources from Mesos.

The change is https://github.com/apache/spark/pull/23758.

> Use Mesos disk resources
> 
>
> Key: SPARK-17454
> URL: https://issues.apache.org/jira/browse/SPARK-17454
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Chris Bannister
>Priority: Major
>
> Currently the driver will accept offers from Mesos which have enough RAM for 
> the executor, until its max cores is reached. There is no way to control the 
> required CPUs or disk for each executor; it would be very useful to be able 
> to apply something similar to spark.mesos.constraints to the resources of an 
> offer instead of its attributes.
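> 
> For context, spark.mesos.constraints today matches only offer attributes; a 
> minimal usage sketch (attribute names and values are illustrative):
> {code:java}
> import org.apache.spark.SparkConf
> 
> val conf = new SparkConf()
>   // matches Mesos attributes advertised on the offer (e.g. an "os" attribute);
>   // there is no equivalent today for resources such as cpus or disk
>   .set("spark.mesos.constraints", "os:centos7;zone:us-east-1a")
> {code}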






[jira] [Assigned] (SPARK-17454) Use Mesos disk resources

2019-02-10 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17454:


Assignee: (was: Apache Spark)

> Use Mesos disk resources
> 
>
> Key: SPARK-17454
> URL: https://issues.apache.org/jira/browse/SPARK-17454
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Chris Bannister
>Priority: Major
>
> Currently the driver will accept offers from Mesos which have enough RAM for 
> the executor, until its max cores is reached. There is no way to control the 
> required CPUs or disk for each executor; it would be very useful to be able 
> to apply something similar to spark.mesos.constraints to the resources of an 
> offer instead of its attributes.






[jira] [Commented] (SPARK-17454) Use Mesos disk resources

2019-02-10 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SPARK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764454#comment-16764454
 ] 

Clément Michaud commented on SPARK-17454:
-

Hello,

I've added a way for executors to reserve disk resources from Mesos.

The change is https://github.com/apache/spark/pull/23758.

> Use Mesos disk resources
> 
>
> Key: SPARK-17454
> URL: https://issues.apache.org/jira/browse/SPARK-17454
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Chris Bannister
>Priority: Major
>
> Currently the driver will accept offers from Mesos which have enough RAM for 
> the executor, until its max cores is reached. There is no way to control the 
> required CPUs or disk for each executor; it would be very useful to be able 
> to apply something similar to spark.mesos.constraints to the resources of an 
> offer instead of its attributes.






[jira] [Assigned] (SPARK-17454) Use Mesos disk resources

2019-02-10 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-17454:


Assignee: Apache Spark

> Use Mesos disk resources
> 
>
> Key: SPARK-17454
> URL: https://issues.apache.org/jira/browse/SPARK-17454
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 2.0.0
>Reporter: Chris Bannister
>Assignee: Apache Spark
>Priority: Major
>
> Currently the driver will accept offers from Mesos which have enough RAM for 
> the executor, until its max cores is reached. There is no way to control the 
> required CPUs or disk for each executor; it would be very useful to be able 
> to apply something similar to spark.mesos.constraints to the resources of an 
> offer instead of its attributes.






[jira] [Created] (SPARK-26854) Support ANY subquery

2019-02-10 Thread Mingcong Han (JIRA)
Mingcong Han created SPARK-26854:


 Summary: Support ANY subquery
 Key: SPARK-26854
 URL: https://issues.apache.org/jira/browse/SPARK-26854
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 2.4.0
Reporter: Mingcong Han


ANY syntax
{quote}
SELECT column(s)
FROM table
WHERE column(s) operator ANY
(SELECT column(s) FROM table WHERE condition);
{quote}
An `ANY` subquery can be regarded as a generalization of an `IN` subquery: an 
`IN` subquery is the special case of an `ANY` subquery whose operator is "=". 
The expression evaluates to `true` if the comparison between `column(s)` and 
any row in the subquery's result set returns `true`.
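
For example (a hedged illustration; employees and managers are made-up tables, 
and the rewrite shows the intended semantics using what runs today):
{code:java}
// proposed: SELECT name FROM employees
//           WHERE salary > ANY (SELECT salary FROM managers)
// "> ANY" is true when salary exceeds at least one row, so it is
// equivalent to the following, which Spark can already execute:
spark.sql("""
  SELECT name FROM employees
  WHERE salary > (SELECT min(salary) FROM managers)
""")
{code}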






[jira] [Updated] (SPARK-26829) In place standard scaler so the column remains same after transformation

2019-02-10 Thread Santokh Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santokh Singh updated SPARK-26829:
--
Description: 
StandardScaler and some similar transformations take an input column name and 
produce a new column, either accepting an output column name or generating 
one with a random name after performing the transformation.

With the "inplace" flag set to true, the transformation would not generate a 
new column in the output dataframe; it would preserve the schema of the df.

With the "inplace" flag set to false, it would work the way it currently 
works.
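
A hedged sketch of the proposed API (setInplace is hypothetical; today only 
setInputCol/setOutputCol exist):
{code:java}
import org.apache.spark.ml.feature.StandardScaler

val scaler = new StandardScaler()
  .setInputCol("features")
  .setInplace(true)  // hypothetical flag: overwrite "features" in place,
                     // keeping the dataframe schema unchanged
val scaled = scaler.fit(df).transform(df)  // schema of scaled == schema of df
{code}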

> In place standard scaler so the column remains same after transformation
> 
>
> Key: SPARK-26829
> URL: https://issues.apache.org/jira/browse/SPARK-26829
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.3.2
>Reporter: Santokh Singh
>Priority: Major
>
> StandardScaler and some similar transformations take an input column name 
> and produce a new column, either accepting an output column name or 
> generating one with a random name after performing the transformation.
> With the "inplace" flag set to true, the transformation would not generate a 
> new column in the output dataframe; it would preserve the schema of the df.
> With the "inplace" flag set to false, it would work the way it currently 
> works.






[jira] [Updated] (SPARK-26829) In place standard scaler so the column remains same after transformation

2019-02-10 Thread Santokh Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santokh Singh updated SPARK-26829:
--
Description: (was: Standard scaler and some similar transformations)

> In place standard scaler so the column remains same after transformation
> 
>
> Key: SPARK-26829
> URL: https://issues.apache.org/jira/browse/SPARK-26829
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.3.2
>Reporter: Santokh Singh
>Priority: Major
>







[jira] [Updated] (SPARK-26829) In place standard scaler so the column remains same after transformation

2019-02-10 Thread Santokh Singh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santokh Singh updated SPARK-26829:
--
Description: Standard scaler and some similar transformations

> In place standard scaler so the column remains same after transformation
> 
>
> Key: SPARK-26829
> URL: https://issues.apache.org/jira/browse/SPARK-26829
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.3.2
>Reporter: Santokh Singh
>Priority: Major
>
> Standard scaler and some similar transformations


