[jira] [Commented] (SPARK-26836) Columns get switched in Spark SQL using Avro backed Hive table if schema evolves
[ https://issues.apache.org/jira/browse/SPARK-26836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764719#comment-16764719 ] Gengliang Wang commented on SPARK-26836:

[~dongjoon] I don't think the issue is related to the spark-avro lib. [~treff7es] Can you try reproducing the issue on Hive directly and see what the behavior is?

> Columns get switched in Spark SQL using Avro backed Hive table if schema evolves
> ---
>
> Key: SPARK-26836
> URL: https://issues.apache.org/jira/browse/SPARK-26836
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.1, 2.4.0
> Environment: I tested with Hive and HCatalog running version 2.3.4, and with Spark 2.3.1 and 2.4
> Reporter: Tamas Nemeth
> Priority: Major
> Labels: correctness
> Attachments: doctors.avro, doctors_evolved.avro, doctors_evolved.json, original.avsc
>
> I have a Hive Avro table where the Avro schema is stored on S3 next to the Avro files.
> In the table definition the avro.schema.url always points to the latest partition's _schema.avsc file, which is always the latest schema. (Avro schemas are backward and forward compatible in a table.)
> When new data comes in, I always add a new partition whose avro.schema.url property is also set to the _schema.avsc that was used when it was added, and of course I always update the table's avro.schema.url property to the latest one.
> Querying this table works fine until the schema evolves in a way that a new optional property is added in the middle.
> When this happens, after the Spark SQL query the columns in the old partition get mixed up and show the wrong data.
> If I query the table with Hive then everything is perfectly fine: it gives me back the correct columns both for the partitions created with the old schema and for the ones created with the evolved schema.
> Here is how I could reproduce it with the [doctors.avro|https://github.com/apache/spark/blob/master/sql/hive/src/test/resources/data/files/doctors.avro] example data in the SQL test suite.
> # I created two partition folders:
> {code:java}
> [hadoop@ip-192-168-10-158 hadoop]$ hdfs dfs -ls s3://somelocation/doctors/*/
> Found 2 items
> -rw-rw-rw- 1 hadoop hadoop 418 2019-02-06 12:48 s3://somelocation/doctors/dt=2019-02-05/_schema.avsc
> -rw-rw-rw- 1 hadoop hadoop 521 2019-02-06 12:13 s3://somelocation/doctors/dt=2019-02-05/doctors.avro
> Found 2 items
> -rw-rw-rw- 1 hadoop hadoop 580 2019-02-06 12:49 s3://somelocation/doctors/dt=2019-02-06/_schema.avsc
> -rw-rw-rw- 1 hadoop hadoop 577 2019-02-06 12:13 s3://somelocation/doctors/dt=2019-02-06/doctors_evolved.avro{code}
> Here the first partition holds data created with the schema before evolving, and the second one holds the evolved data. (The evolved schema is the same as in your test case, except I moved the extra_field column from second position to last, and I generated two lines of Avro data with the evolved schema.)
> # I created a Hive table with the following command:
> {code:java}
> CREATE EXTERNAL TABLE `default.doctors`
> PARTITIONED BY (
>   `dt` string
> )
> ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> WITH SERDEPROPERTIES (
>   'avro.schema.url'='s3://somelocation/doctors/dt=2019-02-06/_schema.avsc')
> STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> LOCATION
>   's3://somelocation/doctors/'
> TBLPROPERTIES (
>   'transient_lastDdlTime'='1538130975'){code}
> As you can see, the table schema URL points to the latest schema.
> 3. I ran an _msck repair table_ to pick up all the partitions.
> FYI: if I run my select * query at this point then everything is fine and no column switch happens.
> 4.
> Then I changed the first partition's avro.schema.url to point to the schema under that partition's folder (the non-evolved one -> s3://somelocation/doctors/dt=2019-02-05/_schema.avsc).
> If you then run a _select * from default.spark_test_, the columns are mixed up (in the data below, the first_name values show up under the extra_field column; I guess because in the latest schema extra_field is the second column):
> {code:java}
> number,extra_field,first_name,last_name,dt
> 6,Colin,Baker,null,2019-02-05
> 3,Jon,Pertwee,null,2019-02-05
> 4,Tom,Baker,null,2019-02-05
> 5,Peter,Davison,null,2019-02-05
> 11,Matt,Smith,null,2019-02-05
> 1,William,Hartnell,null,2019-02-05
> 7,Sylvester,McCoy,null,2019-02-05
> 8,Paul,McGann,null,2019-02-05
> 2,Patrick,Troughton,null,2019-02-05
> 9,Christopher,Eccleston,null,2019-02-05
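The column mixup reported above comes down to how records written with the old schema are resolved against the evolved reader schema. The following is a minimal pure-Python sketch (illustrative only, not Spark's or Avro's actual resolution code) contrasting buggy positional mapping with correct name-based mapping; field names follow the doctors example:

```python
# Old writer schema and evolved reader schema; the evolved schema adds an
# optional "extra_field" in the middle, as in the report.
OLD_FIELDS = ["number", "first_name", "last_name"]
NEW_FIELDS = ["number", "extra_field", "first_name", "last_name"]

def read_positionally(record):
    """Buggy: maps old values onto the new columns by position."""
    padded = record + [None] * (len(NEW_FIELDS) - len(record))
    return dict(zip(NEW_FIELDS, padded))

def read_by_name(record):
    """Correct: matches writer fields to reader fields by name,
    defaulting missing (new optional) fields to None."""
    named = dict(zip(OLD_FIELDS, record))
    return {f: named.get(f) for f in NEW_FIELDS}

row = [6, "Colin", "Baker"]
print(read_positionally(row))  # "Colin" lands under extra_field
print(read_by_name(row))       # columns stay aligned
```

Positional reading reproduces exactly the shifted output in the quoted CSV (first_name under extra_field, last_name under first_name, null in last_name); name-based resolution keeps old-partition rows aligned.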
[jira] [Commented] (SPARK-23408) Flaky test: StreamingOuterJoinSuite.left outer early state exclusion on right
[ https://issues.apache.org/jira/browse/SPARK-23408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764712#comment-16764712 ] Apache Spark commented on SPARK-23408:

User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/23757

> Flaky test: StreamingOuterJoinSuite.left outer early state exclusion on right
> ---
>
> Key: SPARK-23408
> URL: https://issues.apache.org/jira/browse/SPARK-23408
> Project: Spark
> Issue Type: Bug
> Components: SQL, Tests
> Affects Versions: 2.4.0
> Reporter: Marcelo Vanzin
> Assignee: Tathagata Das
> Priority: Minor
> Fix For: 2.4.0
>
> Seen on an unrelated PR.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87386/testReport/org.apache.spark.sql.streaming/StreamingOuterJoinSuite/left_outer_early_state_exclusion_on_right/
> {noformat}
> sbt.ForkMain$ForkError: org.scalatest.exceptions.TestFailedException:
> Assert on query failed: Check total state rows = List(4), updated state rows = List(4): Array(1) did not equal List(4) incorrect updates rows
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:528)
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1560)
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:501)
> org.apache.spark.sql.streaming.StateStoreMetricsTest$$anonfun$assertNumStateRows$1.apply(StateStoreMetricsTest.scala:28)
> org.apache.spark.sql.streaming.StateStoreMetricsTest$$anonfun$assertNumStateRows$1.apply(StateStoreMetricsTest.scala:23)
> org.apache.spark.sql.streaming.StreamTest$$anonfun$liftedTree1$1$1$$anonfun$apply$14.apply$mcZ$sp(StreamTest.scala:568)
> org.apache.spark.sql.streaming.StreamTest$class.verify$1(StreamTest.scala:371)
> org.apache.spark.sql.streaming.StreamTest$$anonfun$liftedTree1$1$1.apply(StreamTest.scala:568)
> org.apache.spark.sql.streaming.StreamTest$$anonfun$liftedTree1$1$1.apply(StreamTest.scala:432)
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> == Progress ==
> AddData to MemoryStream[value#19652]: 3,4,5
> AddData to MemoryStream[value#19662]: 1,2,3
> CheckLastBatch: [3,10,6,9]
> => AssertOnQuery(, Check total state rows = List(4), updated state rows = List(4))
> AddData to MemoryStream[value#19652]: 20
> AddData to MemoryStream[value#19662]: 21
> CheckLastBatch:
> AddData to MemoryStream[value#19662]: 20
> CheckLastBatch: [20,30,40,60],[4,10,8,null],[5,10,10,null]
> == Stream ==
> Output Mode: Append
> Stream state: {MemoryStream[value#19652]: 0,MemoryStream[value#19662]: 0}
> Thread state: alive
> Thread stack trace: java.lang.Thread.sleep(Native Method)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:152)
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.runActivatedStream(MicroBatchExecution.scala:120)
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:279)
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
> {noformat}
> No other failures in the history, though.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-26855) SparkSubmitSuite fails on a clean build
[ https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764675#comment-16764675 ] Hyukjin Kwon commented on SPARK-26855:

But I think it's tricky to fix IMHO .. Should be awesome if that's fixed.

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SparkR
> Affects Versions: 2.3.2, 2.4.0
> Reporter: Felix Cheung
> Priority: Major
>
> The SparkSubmitSuite test "include an external JAR in SparkR" fails consistently, but the test before it, "correctly builds R packages included in a jar with --packages", passes.
> The workaround is to build once with skipTests first; then everything passes.
> Ran into this while testing 2.3.3 RC2.
[jira] [Commented] (SPARK-26800) JDBC - MySQL nullable option is ignored
[ https://issues.apache.org/jira/browse/SPARK-26800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764665#comment-16764665 ] Maxim Gekk commented on SPARK-26800:

Using NOT NULL for TIMESTAMP by default is non-standard behaviour. Please take a look at the [MySQL system variable|https://dev.mysql.com/doc/refman/5.6/en/server-system-variables.html#sysvar_explicit_defaults_for_timestamp] that controls the behaviour.

> JDBC - MySQL nullable option is ignored
> ---
>
> Key: SPARK-26800
> URL: https://issues.apache.org/jira/browse/SPARK-26800
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.4.0
> Reporter: Francisco Miguel Biete Banon
> Priority: Minor
>
> Spark 2.4.0
> MySQL 5.7.21 (docker official MySQL image running with default config)
> Writing a dataframe with optionally null fields results in a table with NOT NULL attributes in MySQL.
> {code:java}
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.{Row, SaveMode}
> import java.sql.Timestamp
> val data = Seq[Row](Row(1, null, "Boston"), Row(2, null, "New York"))
> val schema = StructType(
>   StructField("id", IntegerType, true) ::
>   StructField("when", TimestampType, true) ::
>   StructField("city", StringType, true) :: Nil)
> println(schema.toDDL)
> val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
> df.write.mode(SaveMode.Overwrite).jdbc(jdbcUrl, "temp_bug", jdbcProperties){code}
> Produces
> {code}
> CREATE TABLE `temp_bug` (
>   `id` int(11) DEFAULT NULL,
>   `when` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
>   `city` text
> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
> {code}
> I would expect the "when" column to be defined as nullable.
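The observed DDL follows from MySQL's implicit TIMESTAMP rules: when a generated CREATE TABLE omits an explicit nullability keyword, a server with explicit_defaults_for_timestamp disabled silently declares the column NOT NULL DEFAULT CURRENT_TIMESTAMP. One way to sidestep that is for the DDL generator to spell nullability out for every column. The sketch below is purely illustrative (the `column_ddl` helper and schema tuples are hypothetical, not Spark's JDBC writer):

```python
# Emit an explicit NULL / NOT NULL for every column, so MySQL's implicit
# TIMESTAMP defaulting never kicks in (hypothetical generator, not Spark's).
def column_ddl(name, mysql_type, nullable):
    constraint = "NULL" if nullable else "NOT NULL"
    return f"`{name}` {mysql_type} {constraint}"

# Schema mirroring the report: all three fields are declared nullable.
schema = [("id", "INT", True), ("when", "TIMESTAMP", True), ("city", "TEXT", True)]
ddl = ",\n".join(column_ddl(*col) for col in schema)
print(ddl)  # `when` comes out as "TIMESTAMP NULL", not implicitly NOT NULL
```

With an explicit `TIMESTAMP NULL`, the server keeps the column nullable regardless of the explicit_defaults_for_timestamp setting.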
[jira] [Comment Edited] (SPARK-26855) SparkSubmitSuite fails on a clean build
[ https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764662#comment-16764662 ] Takeshi Yamamuro edited comment on SPARK-26855 at 2/11/19 4:47 AM:
---
Aha, good catch. I didn't notice this cuz I always run the tests after building once with skipTests first...

was (Author: maropu): Aha, good catch. I always run the tests after building once with skipTests first...

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SparkR
> Affects Versions: 2.3.2, 2.4.0
> Reporter: Felix Cheung
> Priority: Major
>
> The SparkSubmitSuite test "include an external JAR in SparkR" fails consistently, but the test before it, "correctly builds R packages included in a jar with --packages", passes.
> The workaround is to build once with skipTests first; then everything passes.
> Ran into this while testing 2.3.3 RC2.
[jira] [Commented] (SPARK-26855) SparkSubmitSuite fails on a clean build
[ https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764662#comment-16764662 ] Takeshi Yamamuro commented on SPARK-26855:

Aha, good catch. I always run the tests after building once with skipTests first...

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
> Issue Type: Bug
> Components: Spark Core, SparkR
> Affects Versions: 2.3.2, 2.4.0
> Reporter: Felix Cheung
> Priority: Major
>
> The SparkSubmitSuite test "include an external JAR in SparkR" fails consistently, but the test before it, "correctly builds R packages included in a jar with --packages", passes.
> The workaround is to build once with skipTests first; then everything passes.
> Ran into this while testing 2.3.3 RC2.
[jira] [Resolved] (SPARK-26525) Fast release memory of ShuffleBlockFetcherIterator
[ https://issues.apache.org/jira/browse/SPARK-26525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-26525.

Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 23438 [https://github.com/apache/spark/pull/23438]

> Fast release memory of ShuffleBlockFetcherIterator
> ---
>
> Key: SPARK-26525
> URL: https://issues.apache.org/jira/browse/SPARK-26525
> Project: Spark
> Issue Type: Improvement
> Components: Shuffle
> Affects Versions: 2.3.2
> Reporter: liupengcheng
> Assignee: liupengcheng
> Priority: Major
> Fix For: 3.0.0
>
> Currently, Spark does not release a ShuffleBlockFetcherIterator until the whole task finishes.
> In some conditions, this incurs a memory leak.
> An example is Shuffle -> map -> Coalesce(shuffle = false). Each ShuffleBlockFetcherIterator contains some metadata about MapStatus(blocksByAddress), and each ShuffleMapTask may keep up to n (the number of shuffle partitions) ShuffleBlockFetcherIterators, because they are referred to by the onCompleteCallbacks of the TaskContext. In some cases this can take huge amounts of memory, and the memory is not released until the task finishes.
> Actually, we can release a ShuffleBlockFetcherIterator as soon as it's consumed.
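The eager-release idea behind the ticket can be sketched in a few lines: an iterator that drops its buffer references the moment it is exhausted, instead of waiting for a task-completion callback to fire. This is an illustrative Python sketch under that assumption, not Spark's actual implementation:

```python
# Toy model of "release as soon as consumed": the iterator frees its
# fetched blocks at exhaustion rather than at task end.
class FetchIterator:
    def __init__(self, blocks):
        self._blocks = list(blocks)  # stands in for fetched network buffers
        self.released = False

    def __iter__(self):
        for block in self._blocks:
            yield block
        self.cleanup()               # eager release at exhaustion

    def cleanup(self):
        # Idempotent: safe to also call from a task-end callback as a backstop.
        if not self.released:
            self._blocks = None      # drop references so memory can be freed
            self.released = True

it = FetchIterator([b"block1", b"block2"])
consumed = list(it)
print(it.released)  # True: buffers freed without waiting for "task end"
```

Keeping `cleanup` idempotent means the task-end callback can remain registered as a safety net for partially consumed iterators, which mirrors the hybrid approach a real fix would likely need.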
[jira] [Resolved] (SPARK-22826) [SQL] findWiderTypeForTwo Fails over StructField of Array
[ https://issues.apache.org/jira/browse/SPARK-22826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-22826.

Resolution: Cannot Reproduce

It's fixed somewhere. Please link the Jira if anyone identifies it.

> [SQL] findWiderTypeForTwo Fails over StructField of Array
> ---
>
> Key: SPARK-22826
> URL: https://issues.apache.org/jira/browse/SPARK-22826
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Aleksander Eskilson
> Priority: Major
>
> The {{findWiderTypeForTwo}} codepath in Catalyst {{TypeCoercion}} fails when applied to {{StructType}}s having the following fields:
> {noformat}
> StructType(StructField("a", ArrayType(StringType, containsNull=true)) :: Nil),
> StructType(StructField("a", ArrayType(StringType, containsNull=false)) :: Nil)
> {noformat}
> When in {{findTightestCommonType}}, the function attempts to recursively find the tightest common type of two arrays. These two arrays are not equal types (since one would admit null elements and the other would not), but {{findTightestCommonType}} has no match case for {{ArrayType}} (or {{MapType}}), so the [get|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala#L108] operation on the dataType of the {{StructField}} throws a {{NoSuchElementException}}.
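The fix the report implies is a recursive match case for ArrayType that unions the containsNull flags instead of requiring exact type equality. A toy Python sketch of that coercion rule (names and structure are illustrative, not Catalyst's actual API):

```python
# Toy "tightest common type": two array types whose element types agree
# coerce to an array whose containsNull is the OR of the two flags.
from dataclasses import dataclass

@dataclass(frozen=True)
class ArrayType:
    element: str
    contains_null: bool

def tightest_common_type(a, b):
    if a == b:
        return a
    if isinstance(a, ArrayType) and isinstance(b, ArrayType):
        # A fuller implementation would recurse on nested element types.
        if a.element == b.element:
            return ArrayType(a.element, a.contains_null or b.contains_null)
    return None  # no common type

t = tightest_common_type(ArrayType("string", True), ArrayType("string", False))
print(t)  # ArrayType(element='string', contains_null=True)
```

This is exactly the pair from the {noformat} block above: the two arrays differ only in containsNull, so the widened type admits nulls rather than raising.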
[jira] [Updated] (SPARK-26759) Arrow optimization in SparkR's interoperability
[ https://issues.apache.org/jira/browse/SPARK-26759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-26759:

Description:
Arrow 0.12.0 is released and it contains an R API. We could optimize Spark DataFrame <> R DataFrame interoperability.

For instance, see the examples below:
- {{dapply}}
{code:java}
df <- createDataFrame(mtcars)
collect(dapply(df,
               function(r.data.frame) { data.frame(r.data.frame$gear) },
               structType("gear long")))
{code}
- {{gapply}}
{code:java}
df <- createDataFrame(mtcars)
collect(gapply(df,
               "gear",
               function(key, group) {
                 data.frame(gear = key[[1]], disp = mean(group$disp) > group$disp)
               },
               structType("gear double, disp boolean")))
{code}
- R DataFrame -> Spark DataFrame
{code:java}
createDataFrame(mtcars)
{code}
- Spark DataFrame -> R DataFrame
{code:java}
collect(df)
head(df)
{code}

was: Arrow 0.12.0 is released and it contains an R API. We could optimize Spark DataFrame <> R DataFrame interoperability.

> Arrow optimization in SparkR's interoperability
> ---
>
> Key: SPARK-26759
> URL: https://issues.apache.org/jira/browse/SPARK-26759
> Project: Spark
> Issue Type: Umbrella
> Components: SparkR, SQL
> Affects Versions: 3.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: release-notes
>
> Arrow 0.12.0 is released and it contains an R API. We could optimize Spark DataFrame <> R DataFrame interoperability.
> For instance, see the examples below:
> - {{dapply}}
> {code:java}
> df <- createDataFrame(mtcars)
> collect(dapply(df,
>                function(r.data.frame) { data.frame(r.data.frame$gear) },
>                structType("gear long")))
> {code}
> - {{gapply}}
> {code:java}
> df <- createDataFrame(mtcars)
> collect(gapply(df,
>                "gear",
>                function(key, group) {
>                  data.frame(gear = key[[1]], disp = mean(group$disp) > group$disp)
>                },
>                structType("gear double, disp boolean")))
> {code}
> - R DataFrame -> Spark DataFrame
> {code:java}
> createDataFrame(mtcars)
> {code}
> - Spark DataFrame -> R DataFrame
> {code:java}
> collect(df)
> head(df)
> {code}
[jira] [Resolved] (SPARK-26578) Synchronize putBytes's memory allocation and putting block on memoryManager
[ https://issues.apache.org/jira/browse/SPARK-26578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SongYadong resolved SPARK-26578.

Resolution: Won't Fix

This may not be a big problem that must be fixed, so: won't fix.

> Synchronize putBytes's memory allocation and putting block on memoryManager
> ---
>
> Key: SPARK-26578
> URL: https://issues.apache.org/jira/browse/SPARK-26578
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: SongYadong
> Priority: Minor
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Just like the operations around memory allocation and putting/evicting blocks in _MemoryStore_'s many other methods, I think it may be better for _putBytes_ to also be synchronized on _memoryManager_.
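The proposal can be illustrated with a toy store in which the allocation check and the block insertion happen atomically under one lock, so a concurrent put cannot observe memory as free after another thread has claimed it. All names here are hypothetical stand-ins, not Spark's MemoryStore or MemoryManager:

```python
# Toy model: hold the "memory manager" lock across both the free-space
# check and the insertion, making put_bytes atomic.
import threading

class ToyMemoryStore:
    def __init__(self, capacity):
        self._lock = threading.Lock()  # stands in for synchronizing on memoryManager
        self._free = capacity
        self._blocks = {}

    def put_bytes(self, block_id, size, data):
        with self._lock:               # allocation + insert as one critical section
            if self._free < size:
                return False           # not enough memory; caller may spill/evict
            self._free -= size
            self._blocks[block_id] = data
            return True

store = ToyMemoryStore(capacity=100)
print(store.put_bytes("rdd_0_0", 60, b"x" * 60))  # True
print(store.put_bytes("rdd_0_1", 60, b"y" * 60))  # False: would overcommit
```

Without the shared lock, two threads could both pass the `self._free < size` check before either decrements it, overcommitting the store; that interleaving is the race the ticket is about.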
[jira] [Commented] (SPARK-26578) Synchronize putBytes's memory allocation and putting block on memoryManager
[ https://issues.apache.org/jira/browse/SPARK-26578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764609#comment-16764609 ] SongYadong commented on SPARK-26578:

This may not be a big problem that must be fixed.

> Synchronize putBytes's memory allocation and putting block on memoryManager
> ---
>
> Key: SPARK-26578
> URL: https://issues.apache.org/jira/browse/SPARK-26578
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: SongYadong
> Priority: Minor
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Just like the operations around memory allocation and putting/evicting blocks in _MemoryStore_'s many other methods, I think it may be better for _putBytes_ to also be synchronized on _memoryManager_.
[jira] [Resolved] (SPARK-26843) Use ConfigEntry for hardcoded configs for "mesos" resource manager
[ https://issues.apache.org/jira/browse/SPARK-26843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-26843.

Resolution: Fixed
Assignee: Jungtaek Lim
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/23743

> Use ConfigEntry for hardcoded configs for "mesos" resource manager
> ---
>
> Key: SPARK-26843
> URL: https://issues.apache.org/jira/browse/SPARK-26843
> Project: Spark
> Issue Type: Improvement
> Components: Mesos
> Affects Versions: 3.0.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Major
> Fix For: 3.0.0
>
> Make the hardcoded configs in the "mesos" module use ConfigEntry.
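For context, the ConfigEntry pattern replaces string keys scattered through the code with a single typed, documented constant that owns the key, default, and doc text. A rough Python sketch of the idea (the class shape is illustrative, not Spark's actual ConfigEntry builder; the key shown is one of the Mesos-related configs):

```python
# Toy ConfigEntry: one constant owns the key, default, and documentation,
# instead of hardcoding "spark.executor.uri" at every call site.
from dataclasses import dataclass

@dataclass(frozen=True)
class ConfigEntry:
    key: str
    default: object
    doc: str = ""

    def read_from(self, conf):
        """Look the entry up in a plain dict of settings."""
        return conf.get(self.key, self.default)

EXECUTOR_URI = ConfigEntry(
    "spark.executor.uri",
    default=None,
    doc="URI of the executor distribution to fetch on Mesos.")

conf = {"spark.executor.uri": "hdfs:///dist/spark.tgz"}
print(EXECUTOR_URI.read_from(conf))  # hdfs:///dist/spark.tgz
print(EXECUTOR_URI.read_from({}))    # None (typed default, declared once)
```

The payoff is that defaults and docs live in one place and typos in the key become compile-time (or at least grep-able) errors rather than silently ignored settings.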
[jira] [Assigned] (SPARK-24211) Flaky test: StreamingOuterJoinSuite
[ https://issues.apache.org/jira/browse/SPARK-24211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24211:

Assignee: Apache Spark

> Flaky test: StreamingOuterJoinSuite
> ---
>
> Key: SPARK-24211
> URL: https://issues.apache.org/jira/browse/SPARK-24211
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming, Tests
> Affects Versions: 2.3.2
> Reporter: Dongjoon Hyun
> Assignee: Apache Spark
> Priority: Major
>
> *windowed left outer join*
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/330/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/317/]
> *windowed right outer join*
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/334/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/328/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/371/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/345/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/366/]
> *left outer join with non-key condition violated*
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/337/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/366/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/386/]
> *left outer early state exclusion on left*
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/375]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/385/]
[jira] [Assigned] (SPARK-24211) Flaky test: StreamingOuterJoinSuite
[ https://issues.apache.org/jira/browse/SPARK-24211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24211:

Assignee: (was: Apache Spark)

> Flaky test: StreamingOuterJoinSuite
> ---
>
> Key: SPARK-24211
> URL: https://issues.apache.org/jira/browse/SPARK-24211
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming, Tests
> Affects Versions: 2.3.2
> Reporter: Dongjoon Hyun
> Priority: Major
>
> *windowed left outer join*
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/330/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/317/]
> *windowed right outer join*
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/334/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/328/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/371/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/345/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/366/]
> *left outer join with non-key condition violated*
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/337/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/366/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/386/]
> *left outer early state exclusion on left*
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/375]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/385/]
[jira] [Commented] (SPARK-24239) Flaky test: KafkaContinuousSourceSuite.subscribing topic by name from earliest offsets
[ https://issues.apache.org/jira/browse/SPARK-24239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764603#comment-16764603 ] Apache Spark commented on SPARK-24239:

User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/23757

> Flaky test: KafkaContinuousSourceSuite.subscribing topic by name from earliest offsets
> ---
>
> Key: SPARK-24239
> URL: https://issues.apache.org/jira/browse/SPARK-24239
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming, Tests
> Affects Versions: 2.4.0
> Reporter: Dongjoon Hyun
> Priority: Major
>
> - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/360/
> - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/353/
[jira] [Assigned] (SPARK-24239) Flaky test: KafkaContinuousSourceSuite.subscribing topic by name from earliest offsets
[ https://issues.apache.org/jira/browse/SPARK-24239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24239:

Assignee: Apache Spark

> Flaky test: KafkaContinuousSourceSuite.subscribing topic by name from earliest offsets
> ---
>
> Key: SPARK-24239
> URL: https://issues.apache.org/jira/browse/SPARK-24239
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming, Tests
> Affects Versions: 2.4.0
> Reporter: Dongjoon Hyun
> Assignee: Apache Spark
> Priority: Major
>
> - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/360/
> - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/353/
[jira] [Assigned] (SPARK-24239) Flaky test: KafkaContinuousSourceSuite.subscribing topic by name from earliest offsets
[ https://issues.apache.org/jira/browse/SPARK-24239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24239:

Assignee: (was: Apache Spark)

> Flaky test: KafkaContinuousSourceSuite.subscribing topic by name from earliest offsets
> ---
>
> Key: SPARK-24239
> URL: https://issues.apache.org/jira/browse/SPARK-24239
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming, Tests
> Affects Versions: 2.4.0
> Reporter: Dongjoon Hyun
> Priority: Major
>
> - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/360/
> - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/353/
[jira] [Commented] (SPARK-24211) Flaky test: StreamingOuterJoinSuite
[ https://issues.apache.org/jira/browse/SPARK-24211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764601#comment-16764601 ] Apache Spark commented on SPARK-24211:

User 'HeartSaVioR' has created a pull request for this issue: https://github.com/apache/spark/pull/23757

> Flaky test: StreamingOuterJoinSuite
> ---
>
> Key: SPARK-24211
> URL: https://issues.apache.org/jira/browse/SPARK-24211
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming, Tests
> Affects Versions: 2.3.2
> Reporter: Dongjoon Hyun
> Priority: Major
>
> *windowed left outer join*
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/330/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/317/]
> *windowed right outer join*
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/334/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/328/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/371/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/345/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/366/]
> *left outer join with non-key condition violated*
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/337/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.7/366/]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/386/]
> *left outer early state exclusion on left*
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.7/375]
> - [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/385/]
[jira] [Resolved] (SPARK-26816) Add Benchmark for XORShiftRandom
[ https://issues.apache.org/jira/browse/SPARK-26816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-26816. --- Resolution: Fixed Assignee: Maxim Gekk Fix Version/s: 3.0.0 This is resolved via https://github.com/apache/spark/pull/23752 > Add Benchmark for XORShiftRandom > > > Key: SPARK-26816 > URL: https://issues.apache.org/jira/browse/SPARK-26816 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Minor > Labels: low-hanging-fruit > Fix For: 3.0.0 > > > Currently, the benchmark for XORShiftRandom is mixed with the implementation: > [https://github.com/apache/spark/blob/5264164a67df498b73facae207eda12ee133be7d/core/src/main/scala/org/apache/spark/util/random/XORShiftRandom.scala#L70-L107] > . We need to extract that code into a separate benchmark.
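The refactoring the ticket asks for (benchmark code separated from the generator) can be sketched as follows. This is an illustrative Python sketch, not the Scala code in question; the shift triple (21, 35, 4) is Marsaglia's 64-bit xorshift variant, which is, to the best of my knowledge, the one Spark's `XORShiftRandom` uses.

```python
import time

MASK64 = (1 << 64) - 1  # emulate 64-bit overflow semantics in Python


def xorshift64_next(seed):
    """One step of Marsaglia's 64-bit xorshift with shifts (21, 35, 4)."""
    seed ^= (seed << 21) & MASK64
    seed ^= seed >> 35
    seed ^= (seed << 4) & MASK64
    return seed


def benchmark_xorshift(n, seed=42):
    """Time n generator steps. Keeping this function separate from the
    generator itself is the extraction the ticket requests."""
    start = time.perf_counter()
    for _ in range(n):
        seed = xorshift64_next(seed)
    return time.perf_counter() - start
```

The point of the split is that the generator stays a pure function while the timing harness lives elsewhere, mirroring how Spark's dedicated benchmark suites are organized.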
[jira] [Closed] (SPARK-26684) Add logs when allocating large memory for PooledByteBufAllocator
[ https://issues.apache.org/jira/browse/SPARK-26684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun closed SPARK-26684. - > Add logs when allocating large memory for PooledByteBufAllocator > > > Key: SPARK-26684 > URL: https://issues.apache.org/jira/browse/SPARK-26684 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 2.4.0 >Reporter: liupengcheng >Priority: Trivial > > Currently, Spark uses `PooledByteBufAllocator` to allocate memory for channel > reading. However, the allocated heap/offheap memory size is not tracked. > Sometimes, this makes it difficult to find the cause of OOM failures (for > instance, direct memory OOM); we have to dump the heap and use more advanced > tools like MAT to locate the cause. > We could add some logs to `PooledByteBufAllocator` when it allocates > large memory, which would facilitate debugging.
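The improvement the description proposes, a log line when an unusually large buffer is requested, can be sketched generically. The wrapper below is hypothetical Python, not Netty's `PooledByteBufAllocator` API; the class name, threshold, and fields are made up for illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("alloc")

LARGE_ALLOC_THRESHOLD = 16 * 1024 * 1024  # 16 MiB; illustrative threshold


class LoggingAllocator:
    """Delegate the allocation, but log when the request is large, so an
    OOM postmortem does not require a heap dump to see who asked for what."""

    def __init__(self):
        self.total_allocated = 0  # cumulative bytes handed out

    def allocate(self, nbytes):
        if nbytes >= LARGE_ALLOC_THRESHOLD:
            log.warning("large buffer requested: %d bytes (running total: %d)",
                        nbytes, self.total_allocated)
        self.total_allocated += nbytes
        return bytearray(nbytes)
```

The closing comment below points out that Netty itself is the right place for such a hook, since each allocation path there is a one-liner.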
[jira] [Commented] (SPARK-26684) Add logs when allocating large memory for PooledByteBufAllocator
[ https://issues.apache.org/jira/browse/SPARK-26684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764558#comment-16764558 ] Dongjoon Hyun commented on SPARK-26684: --- Hi, [~liupengcheng]. As I commented on your PR, please contribute this to the `Netty` project itself. Inside the `Netty` project, it is a one-liner for each function. - https://github.com/netty/netty/blob/4.1/buffer/src/main/java/io/netty/buffer/PooledByteBufAllocator.java#L321 > Add logs when allocating large memory for PooledByteBufAllocator > > > Key: SPARK-26684 > URL: https://issues.apache.org/jira/browse/SPARK-26684 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 2.4.0 >Reporter: liupengcheng >Priority: Trivial > > Currently, Spark uses `PooledByteBufAllocator` to allocate memory for channel > reading. However, the allocated heap/offheap memory size is not tracked. > Sometimes, this makes it difficult to find the cause of OOM failures (for > instance, direct memory OOM); we have to dump the heap and use more advanced > tools like MAT to locate the cause. > We could add some logs to `PooledByteBufAllocator` when it allocates > large memory, which would facilitate debugging.
[jira] [Resolved] (SPARK-26684) Add logs when allocating large memory for PooledByteBufAllocator
[ https://issues.apache.org/jira/browse/SPARK-26684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-26684. --- Resolution: Invalid > Add logs when allocating large memory for PooledByteBufAllocator > > > Key: SPARK-26684 > URL: https://issues.apache.org/jira/browse/SPARK-26684 > Project: Spark > Issue Type: Improvement > Components: Shuffle >Affects Versions: 2.4.0 >Reporter: liupengcheng >Priority: Trivial > > Currently, Spark uses `PooledByteBufAllocator` to allocate memory for channel > reading. However, the allocated heap/offheap memory size is not tracked. > Sometimes, this makes it difficult to find the cause of OOM failures (for > instance, direct memory OOM); we have to dump the heap and use more advanced > tools like MAT to locate the cause. > We could add some logs to `PooledByteBufAllocator` when it allocates > large memory, which would facilitate debugging.
[jira] [Commented] (SPARK-26783) Kafka parameter documentation doesn't match with the reality (upper/lowercase)
[ https://issues.apache.org/jira/browse/SPARK-26783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764553#comment-16764553 ] Dongjoon Hyun commented on SPARK-26783: --- Hi, [~gsomogyi]. Is this still valid? Could you update the JIRA issue description to be more specific? > Kafka parameter documentation doesn't match with the reality (upper/lowercase) > -- > > Key: SPARK-26783 > URL: https://issues.apache.org/jira/browse/SPARK-26783 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 3.0.0 >Reporter: Gabor Somogyi >Priority: Minor > > A good example of this is "failOnDataLoss", which is reported in SPARK-23685. > I've just checked, and several other parameters suffer from the same issue.
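For context on the upper/lowercase mismatch: Spark matches data source option keys case-insensitively (a case-insensitive map lowercases keys internally), so the camelCase spellings in the docs and the lowercase keys in the code should resolve to the same entry; the ticket is about spellings where docs and behavior diverge. A minimal sketch of that lookup behavior, with illustrative names:

```python
class CaseInsensitiveOptions:
    """Sketch of case-insensitive option lookup: keys are lowercased on
    insert and on read, so "failOnDataLoss" and "failondataloss" match."""

    def __init__(self, options):
        self._opts = {k.lower(): v for k, v in options.items()}

    def get(self, key, default=None):
        return self._opts.get(key.lower(), default)
```

With such a map, documentation can show either casing without changing semantics; problems arise only when a code path reads the raw, non-normalized map.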
[jira] [Updated] (SPARK-26855) SparkSubmitSuite fails on a clean build
[ https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-26855: -- Affects Version/s: 2.4.0 > SparkSubmitSuite fails on a clean build > --- > > Key: SPARK-26855 > URL: https://issues.apache.org/jira/browse/SPARK-26855 > Project: Spark > Issue Type: Bug > Components: Spark Core, SparkR >Affects Versions: 2.3.2, 2.4.0 >Reporter: Felix Cheung >Priority: Major > > SparkSubmitSuite > "include an external JAR in SparkR" > fails consistently but the test before it, "correctly builds R packages > included in a jar with --packages" passes. > the workaround is to build once with skipTests first, then everything passes. > ran into this while testing 2.3.3 RC2.
[jira] [Comment Edited] (SPARK-26855) SparkSubmitSuite fails on a clean build
[ https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764538#comment-16764538 ] Dongjoon Hyun edited comment on SPARK-26855 at 2/10/19 8:15 PM: +1 for fixing this. IIRC, besides this one there were other instances that required a 2-phase build (which we have used for a long time), too. was (Author: dongjoon): IIRC, besides this one there were other instances that required a 2-phase build (which we have used for a long time), too. > SparkSubmitSuite fails on a clean build > --- > > Key: SPARK-26855 > URL: https://issues.apache.org/jira/browse/SPARK-26855 > Project: Spark > Issue Type: Bug > Components: Spark Core, SparkR >Affects Versions: 2.3.2 >Reporter: Felix Cheung >Priority: Major > > SparkSubmitSuite > "include an external JAR in SparkR" > fails consistently but the test before it, "correctly builds R packages > included in a jar with --packages" passes. > the workaround is to build once with skipTests first, then everything passes. > ran into this while testing 2.3.3 RC2.
[jira] [Commented] (SPARK-26855) SparkSubmitSuite fails on a clean build
[ https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764538#comment-16764538 ] Dongjoon Hyun commented on SPARK-26855: --- IIRC, besides this one there were other instances that required a 2-phase build (which we have used for a long time), too. > SparkSubmitSuite fails on a clean build > --- > > Key: SPARK-26855 > URL: https://issues.apache.org/jira/browse/SPARK-26855 > Project: Spark > Issue Type: Bug > Components: Spark Core, SparkR >Affects Versions: 2.3.2 >Reporter: Felix Cheung >Priority: Major > > SparkSubmitSuite > "include an external JAR in SparkR" > fails consistently but the test before it, "correctly builds R packages > included in a jar with --packages" passes. > the workaround is to build once with skipTests first, then everything passes. > ran into this while testing 2.3.3 RC2.
[jira] [Commented] (SPARK-26855) SparkSubmitSuite fails on a clean build
[ https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764537#comment-16764537 ] Sean Owen commented on SPARK-26855: --- I think this should be improved, but I am not sure how much of it is supposed to work without first building Spark separately. That's always been how I test. > SparkSubmitSuite fails on a clean build > --- > > Key: SPARK-26855 > URL: https://issues.apache.org/jira/browse/SPARK-26855 > Project: Spark > Issue Type: Bug > Components: Spark Core, SparkR >Affects Versions: 2.3.2 >Reporter: Felix Cheung >Priority: Major > > SparkSubmitSuite > "include an external JAR in SparkR" > fails consistently but the test before it, "correctly builds R packages > included in a jar with --packages" passes. > the workaround is to build once with skipTests first, then everything passes. > ran into this while testing 2.3.3 RC2.
[jira] [Updated] (SPARK-26840) Avoid cost-based join reorder in presence of join hints
[ https://issues.apache.org/jira/browse/SPARK-26840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maryann Xue updated SPARK-26840: Description: This is a fix for [https://github.com/apache/spark/pull/23524], which did not stop cost-based join reorder when the {{CostBasedJoinReorder}} rule recurses down the tree and applies join reorder for nested joins with hints. > Avoid cost-based join reorder in presence of join hints > --- > > Key: SPARK-26840 > URL: https://issues.apache.org/jira/browse/SPARK-26840 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maryann Xue >Priority: Minor > > This is a fix for > [https://github.com/apache/spark/pull/23524], > which did not stop cost-based join reorder when the {{CostBasedJoinReorder}} > rule recurses down the tree and applies join reorder for nested joins with > hints.
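The fix described, not applying cost-based reorder when a join (sub)tree carries hints even as the rule recurses, can be sketched abstractly. The classes and traversal below are illustrative toys, not Catalyst's actual `CostBasedJoinReorder` code.

```python
class Join:
    """Toy join node; children may be nested Joins or plain relation names."""

    def __init__(self, left, right, hint=None):
        self.left, self.right, self.hint = left, right, hint


def has_hint(plan):
    """True if any join in the subtree carries a user-supplied hint."""
    if not isinstance(plan, Join):
        return False
    return (plan.hint is not None
            or has_hint(plan.left) or has_hint(plan.right))


def reorder(plan):
    """Recursive reorder pass with the guard: any subtree containing a
    hint is returned untouched instead of being cost-reordered."""
    if not isinstance(plan, Join) or has_hint(plan):
        return plan
    # A real implementation would pick a cheaper join order here;
    # this sketch only models the recursion and the guard.
    return Join(reorder(plan.left), reorder(plan.right))
```

The bug being fixed is, per the description, checking for hints only at the root: without the `has_hint` guard inside the recursion, nested hinted joins would still be reordered.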
[jira] [Assigned] (SPARK-26840) Avoid cost-based join reorder in presence of join hints
[ https://issues.apache.org/jira/browse/SPARK-26840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26840: Assignee: (was: Apache Spark) > Avoid cost-based join reorder in presence of join hints > --- > > Key: SPARK-26840 > URL: https://issues.apache.org/jira/browse/SPARK-26840 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maryann Xue >Priority: Minor > > This is a fix for > [https://github.com/apache/spark/pull/23524], > which did not stop cost-based join reorder when the {{CostBasedJoinReorder}} > rule recurses down the tree and applies join reorder for nested joins with > hints.
[jira] [Assigned] (SPARK-26840) Avoid cost-based join reorder in presence of join hints
[ https://issues.apache.org/jira/browse/SPARK-26840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-26840: Assignee: Apache Spark > Avoid cost-based join reorder in presence of join hints > --- > > Key: SPARK-26840 > URL: https://issues.apache.org/jira/browse/SPARK-26840 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Maryann Xue >Assignee: Apache Spark >Priority: Minor > > This is a fix for > [https://github.com/apache/spark/pull/23524], > which did not stop cost-based join reorder when the {{CostBasedJoinReorder}} > rule recurses down the tree and applies join reorder for nested joins with > hints.
[jira] [Updated] (SPARK-26855) SparkSubmitSuite fails on a clean build
[ https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-26855: - Description: SparkSubmitSuite "include an external JAR in SparkR" fails consistently but the test before it, "correctly builds R packages included in a jar with --packages" passes. the workaround is to build once with skipTests first, then everything passes. ran into this while testing 2.3.3 RC2. > SparkSubmitSuite fails on a clean build > --- > > Key: SPARK-26855 > URL: https://issues.apache.org/jira/browse/SPARK-26855 > Project: Spark > Issue Type: Bug > Components: Spark Core, SparkR >Affects Versions: 2.3.2 >Reporter: Felix Cheung >Priority: Major > > SparkSubmitSuite > "include an external JAR in SparkR" > fails consistently but the test before it, "correctly builds R packages > included in a jar with --packages" passes. > the workaround is to build once with skipTests first, then everything passes. > ran into this while testing 2.3.3 RC2.
[jira] [Created] (SPARK-26855) SparkSubmitSuite fails on a clean build
Felix Cheung created SPARK-26855: Summary: SparkSubmitSuite fails on a clean build Key: SPARK-26855 URL: https://issues.apache.org/jira/browse/SPARK-26855 Project: Spark Issue Type: Bug Components: Spark Core, SparkR Affects Versions: 2.3.2 Reporter: Felix Cheung
[jira] [Comment Edited] (SPARK-17454) Use Mesos disk resources
[ https://issues.apache.org/jira/browse/SPARK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764454#comment-16764454 ] Clément Michaud edited comment on SPARK-17454 at 2/10/19 3:41 PM: -- Hello, I've added a way for executors to reserve disk resources from Mesos. The change is [https://github.com/apache/spark/pull/23758]. [~mgummelt], I'd really appreciate it if you could review this change or help me find someone to review it. was (Author: clems4ever): Hello, I've added a way for executors to reserve disk resources from Mesos. The change is https://github.com/apache/spark/pull/23758. > Use Mesos disk resources > > > Key: SPARK-17454 > URL: https://issues.apache.org/jira/browse/SPARK-17454 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Chris Bannister >Priority: Major > > Currently the driver will accept offers from Mesos that have enough RAM for > the executor, until its max cores is reached. There is no way to control > the required CPUs or disk for each executor; it would be very useful to be > able to apply something similar to spark.mesos.constraints to resource offers > instead of to attributes on the offer.
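The behavior the ticket asks for, accepting an offer only when it satisfies per-resource requirements (CPUs, memory, disk) rather than RAM alone, reduces to a simple predicate over the offer's resource amounts. A sketch with illustrative field names, not Mesos' actual protobuf API:

```python
def offer_satisfies(offer, required):
    """True iff the offer carries at least the required amount of every
    requested resource; resources absent from the offer count as 0."""
    return all(offer.get(name, 0.0) >= amount
               for name, amount in required.items())
```

A scheduler could then decline offers failing this predicate instead of accepting anything with enough RAM, which is the gap the description points out.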
[jira] [Assigned] (SPARK-17454) Use Mesos disk resources
[ https://issues.apache.org/jira/browse/SPARK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17454: Assignee: (was: Apache Spark) > Use Mesos disk resources > > > Key: SPARK-17454 > URL: https://issues.apache.org/jira/browse/SPARK-17454 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Chris Bannister >Priority: Major > > Currently the driver will accept offers from Mesos that have enough RAM for > the executor, until its max cores is reached. There is no way to control > the required CPUs or disk for each executor; it would be very useful to be > able to apply something similar to spark.mesos.constraints to resource offers > instead of to attributes on the offer.
[jira] [Commented] (SPARK-17454) Use Mesos disk resources
[ https://issues.apache.org/jira/browse/SPARK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764454#comment-16764454 ] Clément Michaud commented on SPARK-17454: - Hello, I've added a way for executors to reserve disk resources from Mesos. The change is https://github.com/apache/spark/pull/23758. > Use Mesos disk resources > > > Key: SPARK-17454 > URL: https://issues.apache.org/jira/browse/SPARK-17454 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Chris Bannister >Priority: Major > > Currently the driver will accept offers from Mesos that have enough RAM for > the executor, until its max cores is reached. There is no way to control > the required CPUs or disk for each executor; it would be very useful to be > able to apply something similar to spark.mesos.constraints to resource offers > instead of to attributes on the offer.
[jira] [Assigned] (SPARK-17454) Use Mesos disk resources
[ https://issues.apache.org/jira/browse/SPARK-17454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17454: Assignee: Apache Spark > Use Mesos disk resources > > > Key: SPARK-17454 > URL: https://issues.apache.org/jira/browse/SPARK-17454 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 2.0.0 >Reporter: Chris Bannister >Assignee: Apache Spark >Priority: Major > > Currently the driver will accept offers from Mesos that have enough RAM for > the executor, until its max cores is reached. There is no way to control > the required CPUs or disk for each executor; it would be very useful to be > able to apply something similar to spark.mesos.constraints to resource offers > instead of to attributes on the offer.
[jira] [Created] (SPARK-26854) Support ANY subquery
Mingcong Han created SPARK-26854: Summary: Support ANY subquery Key: SPARK-26854 URL: https://issues.apache.org/jira/browse/SPARK-26854 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 2.4.0 Reporter: Mingcong Han ANY syntax {quote} SELECT column(s) FROM table WHERE column(s) operator ANY (SELECT column(s) FROM table WHERE condition); {quote} An `ANY` subquery can be regarded as a generalization of an `IN` subquery: `IN` is the special case of `ANY` whose operator is "=". The expression evaluates to `true` if the comparison between `column(s)` and any row in the subquery's result set returns `true`.
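The semantics in the description can be stated executably. The sketch below ignores SQL's three-valued NULL logic (a real implementation must return NULL rather than false in some NULL/empty cases), and the function names are illustrative:

```python
import operator

# comparison operators an ANY subquery may use
OPS = {"=": operator.eq, "<>": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


def any_match(value, op, subquery_rows):
    """True iff `value op row` holds for at least one subquery row."""
    cmp = OPS[op]
    return any(cmp(value, row) for row in subquery_rows)
```

`IN` then falls out as the special case `any_match(v, "=", rows)`, matching the description's claim that `IN` is `ANY` with the "=" operator.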
[jira] [Updated] (SPARK-26829) In place standard scaler so the column remains same after transformation
[ https://issues.apache.org/jira/browse/SPARK-26829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santokh Singh updated SPARK-26829: -- Description: Standard scaler and some similar transformations take an input column name and produce a new column, either accepting an output column name or generating one with a random name after performing the transformation. With the "inplace" flag set to true, no new column is generated in the output dataframe after the transformation; the schema of the df is preserved. With the "inplace" flag set to false, it works the way it currently works. > In place standard scaler so the column remains same after transformation > > > Key: SPARK-26829 > URL: https://issues.apache.org/jira/browse/SPARK-26829 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.3.2 >Reporter: Santokh Singh >Priority: Major > > Standard scaler and some similar transformations take an input column name and > produce a new column, either accepting an output column name or generating one > with a random name after performing the transformation. > With the "inplace" flag set to true, no new column is generated in the output dataframe > after the transformation; the schema of the df is preserved. > With the "inplace" flag set to false, it works the way it currently works.
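The proposed `inplace` flag can be illustrated on a toy list-of-dicts "dataframe". This sketches the flag's contract only; it is not Spark ML's `StandardScaler` API, and the `_scaled` suffix is made up for the example:

```python
import statistics


def standard_scale(rows, column, inplace=False):
    """Standard-scale one column: subtract the mean, divide by the
    population std-dev. inplace=True overwrites the column, preserving
    the schema; inplace=False adds a new `<column>_scaled` column."""
    vals = [r[column] for r in rows]
    mean = statistics.mean(vals)
    std = statistics.pstdev(vals) or 1.0  # guard against zero variance
    target = column if inplace else column + "_scaled"
    for r in rows:
        r[target] = (r[column] - mean) / std
    return rows
```

With `inplace=True` downstream code sees exactly the original schema, which is the convenience the ticket is after.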
[jira] [Updated] (SPARK-26829) In place standard scaler so the column remains same after transformation
[ https://issues.apache.org/jira/browse/SPARK-26829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santokh Singh updated SPARK-26829: -- Description: (was: Standard scaler and some similar transformations) > In place standard scaler so the column remains same after transformation > > > Key: SPARK-26829 > URL: https://issues.apache.org/jira/browse/SPARK-26829 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.3.2 >Reporter: Santokh Singh >Priority: Major >
[jira] [Updated] (SPARK-26829) In place standard scaler so the column remains same after transformation
[ https://issues.apache.org/jira/browse/SPARK-26829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Santokh Singh updated SPARK-26829: -- Description: Standard scaler and some similar transformations > In place standard scaler so the column remains same after transformation > > > Key: SPARK-26829 > URL: https://issues.apache.org/jira/browse/SPARK-26829 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.3.2 >Reporter: Santokh Singh >Priority: Major > > Standard scaler and some similar transformations