[jira] [Commented] (SPARK-25308) ArrayContains function may return an error in the code generation phase.
[ https://issues.apache.org/jira/browse/SPARK-25308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599951#comment-16599951 ]

Apache Spark commented on SPARK-25308:
--------------------------------------

User 'dilipbiswal' has created a pull request for this issue:
https://github.com/apache/spark/pull/22315

> ArrayContains function may return an error in the code generation phase.
> ------------------------------------------------------------------------
>
>                 Key: SPARK-25308
>                 URL: https://issues.apache.org/jira/browse/SPARK-25308
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Dilip Biswal
>            Priority: Major
>
> Invoking the ArrayContains function with a non-nullable array type throws the following error in the code generation phase.
> {code}
> Code generation of array_contains([1,2,3], 1) failed:
> java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 40, Column 11: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 40, Column 11: Expression "isNull_0" is not an rvalue
>   at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
>   at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
>   at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
>   at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
>   at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
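The 'Expression "isNull_0" is not an rvalue' message is Janino rejecting generated code that references a null-tracking variable that was never declared: for a non-nullable input the declaration is skipped, while a consumer still mentions it. A toy Python model of that pattern and the obvious repair (all names besides `isNull_0` are hypothetical; this is not Spark's actual CodeGenerator):

```python
def gen_array_contains(nullable: bool, fixed: bool = False) -> str:
    """Emit Java-like source for an array_contains null check.

    Toy model of the codegen pitfall: only the variable name isNull_0
    comes from the error message; everything else is hypothetical.
    """
    lines = []
    if nullable or fixed:
        # Repaired behaviour: always declare the variable the consumer
        # expects, as a constant false when the input cannot be null.
        init = "value_0.isNullAt(0)" if nullable else "false"
        lines.append(f"boolean isNull_0 = {init};")
    # Buggy behaviour (nullable=False, fixed=False): the reference below
    # survives although the declaration above was skipped, which Janino
    # rejects with 'Expression "isNull_0" is not an rvalue'.
    lines.append("boolean result_0 = !isNull_0 && contains(value_1, value_2);")
    return "\n".join(lines)

buggy = gen_array_contains(nullable=False)
assert "boolean isNull_0" not in buggy and "!isNull_0" in buggy
assert gen_array_contains(nullable=False, fixed=True).splitlines()[0] == "boolean isNull_0 = false;"
```

The repair mirrors the usual codegen convention: a non-nullable slot still gets an `isNull` variable, just initialized to a constant `false`, so downstream code never dangles.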
[jira] [Assigned] (SPARK-25308) ArrayContains function may return an error in the code generation phase.
[ https://issues.apache.org/jira/browse/SPARK-25308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25308:
------------------------------------

    Assignee: Apache Spark
[jira] [Assigned] (SPARK-25308) ArrayContains function may return an error in the code generation phase.
[ https://issues.apache.org/jira/browse/SPARK-25308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25308:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-25308) ArrayContains function may return an error in the code generation phase.
Dilip Biswal created SPARK-25308:
------------------------------------

             Summary: ArrayContains function may return an error in the code generation phase.
                 Key: SPARK-25308
                 URL: https://issues.apache.org/jira/browse/SPARK-25308
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.1
            Reporter: Dilip Biswal

Invoking the ArrayContains function with a non-nullable array type fails in the code generation phase with a Janino CompileException: Expression "isNull_0" is not an rvalue.
[jira] [Commented] (SPARK-25307) ArraySort function may return an error in the code generation phase.
[ https://issues.apache.org/jira/browse/SPARK-25307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599933#comment-16599933 ]

Apache Spark commented on SPARK-25307:
--------------------------------------

User 'dilipbiswal' has created a pull request for this issue:
https://github.com/apache/spark/pull/22314

> ArraySort function may return an error in the code generation phase.
> --------------------------------------------------------------------
>
>                 Key: SPARK-25307
>                 URL: https://issues.apache.org/jira/browse/SPARK-25307
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Dilip Biswal
>            Priority: Major
>
> Sorting a non-nullable array of booleans fails with a compilation error in the code generation phase:
> {code:java}
> java.util.concurrent.ExecutionException: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 51, Column 23: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 51, Column 23: No applicable constructor/method found for actual parameters "boolean[]"; candidates are: "public static void java.util.Arrays.sort(long[])", "public static void java.util.Arrays.sort(long[], int, int)", "public static void java.util.Arrays.sort(byte[], int, int)", "public static void java.util.Arrays.sort(float[])", "public static void java.util.Arrays.sort(float[], int, int)", "public static void java.util.Arrays.sort(char[])", "public static void java.util.Arrays.sort(char[], int, int)", "public static void java.util.Arrays.sort(short[], int, int)", "public static void java.util.Arrays.sort(short[])", "public static void java.util.Arrays.sort(byte[])", "public static void java.util.Arrays.sort(java.lang.Object[], int, int, java.util.Comparator)", "public static void java.util.Arrays.sort(java.lang.Object[], java.util.Comparator)", "public static void java.util.Arrays.sort(int[])", "public static void java.util.Arrays.sort(java.lang.Object[], int, int)", "public static void java.util.Arrays.sort(java.lang.Object[])", "public static void java.util.Arrays.sort(double[])", "public static void java.util.Arrays.sort(double[], int, int)", "public static void java.util.Arrays.sort(int[], int, int)"
>   at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
>   at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
>   at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
>   at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
>   at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)
> {code}
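The root cause is visible in the candidate list: java.util.Arrays.sort has overloads for every primitive array type except boolean[]. One way around the missing overload, shown here as a hypothetical Python sketch rather than the actual fix in PR 22314, is to sort a boolean array by counting, since it holds only two distinct values:

```python
def sort_booleans(values: list) -> list:
    """Sort a boolean array in O(n) by counting.

    java.util.Arrays.sort offers no boolean[] overload, so generated
    code cannot dispatch to it; counting false/true occurrences
    sidesteps the need for a comparison sort entirely.
    """
    trues = sum(1 for v in values if v)   # number of True entries
    falses = len(values) - trues
    # False sorts before True (False < True), so emit falses first.
    return [False] * falses + [True] * trues

assert sort_booleans([True, False, True, False]) == [False, False, True, True]
```

The same counting idea translates directly to a hand-written Java loop over a boolean[], which is the kind of special case a code generator can emit when no library overload exists.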
[jira] [Assigned] (SPARK-25307) ArraySort function may return an error in the code generation phase.
[ https://issues.apache.org/jira/browse/SPARK-25307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25307:
------------------------------------

    Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-25307) ArraySort function may return an error in the code generation phase.
[ https://issues.apache.org/jira/browse/SPARK-25307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25307:
------------------------------------

    Assignee: Apache Spark
[jira] [Created] (SPARK-25307) ArraySort function may return an error in the code generation phase.
Dilip Biswal created SPARK-25307:
------------------------------------

             Summary: ArraySort function may return an error in the code generation phase.
                 Key: SPARK-25307
                 URL: https://issues.apache.org/jira/browse/SPARK-25307
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.1
            Reporter: Dilip Biswal

Sorting a non-nullable array of booleans fails with a compilation error in the code generation phase: java.util.Arrays.sort has no overload applicable to the actual parameter type boolean[].
[jira] [Resolved] (SPARK-10697) Lift Calculation in Association Rule mining
[ https://issues.apache.org/jira/browse/SPARK-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-10697.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

Issue resolved by pull request 22236
[https://github.com/apache/spark/pull/22236]

> Lift Calculation in Association Rule mining
> -------------------------------------------
>
>                 Key: SPARK-10697
>                 URL: https://issues.apache.org/jira/browse/SPARK-10697
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Yashwanth Kumar
>            Assignee: Marco Gaido
>            Priority: Minor
>             Fix For: 2.4.0
>
> Lift is to be calculated for association rule mining in AssociationRules.scala under FPM.
> Lift is a measure of the performance of an association rule, and adding it will help compare model efficiency.
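For context on the resolved feature: the lift of a rule A => B is its confidence divided by the support of B, so it measures how much more often A and B co-occur than they would under independence. A small Python sketch (illustrative only, not MLlib's AssociationRules implementation):

```python
def lift(n_total: int, n_a: int, n_b: int, n_ab: int) -> float:
    """Lift of the association rule A => B from raw transaction counts.

    lift = confidence(A => B) / support(B)
         = (n_ab / n_a) / (n_b / n_total)

    A value above 1.0 means A and B co-occur more often than if they
    were independent; exactly 1.0 means independence.
    """
    confidence = n_ab / n_a      # P(B | A)
    support_b = n_b / n_total    # P(B)
    return confidence / support_b

# 100 transactions: A in 40, B in 25, both in 20.
# confidence = 20/40 = 0.5; support(B) = 0.25; lift = 2.0
assert lift(100, 40, 25, 20) == 2.0
```

With lift alongside confidence, two rules of equal confidence can be ranked by how surprising their consequent is, which is exactly the model-comparison use case the report mentions.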
[jira] [Assigned] (SPARK-10697) Lift Calculation in Association Rule mining
[ https://issues.apache.org/jira/browse/SPARK-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen reassigned SPARK-10697:
---------------------------------
    Assignee: Marco Gaido
[jira] [Updated] (SPARK-25306) Use cache to speed up `createFilter` in ORC
[ https://issues.apache.org/jira/browse/SPARK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-25306:
----------------------------------
    Affects Version/s: 1.6.3

> Use cache to speed up `createFilter` in ORC
> -------------------------------------------
>
>                 Key: SPARK-25306
>                 URL: https://issues.apache.org/jira/browse/SPARK-25306
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.2, 2.3.1, 2.4.0
>            Reporter: Dongjoon Hyun
>            Priority: Critical
>
> In the ORC data source, the `createFilter` function has exponential time complexity due to lack of memoization, as shown below. This issue aims to improve it.
>
> *REPRODUCE*
> {code}
> // Create and read a 1-row table with 1000 columns
> sql("set spark.sql.orc.filterPushdown=true")
> val selectExpr = (1 to 1000).map(i => s"id c$i")
> spark.range(1).selectExpr(selectExpr: _*).write.mode("overwrite").orc("/tmp/orc")
> print(s"With 0 filters, ")
> spark.time(spark.read.orc("/tmp/orc").count)
>
> // Increase the number of filters
> (20 to 30).foreach { width =>
>   val whereExpr = (1 to width).map(i => s"c$i is not null").mkString(" and ")
>   print(s"With $width filters, ")
>   spark.time(spark.read.orc("/tmp/orc").where(whereExpr).count)
> }
> {code}
>
> *RESULT*
> {code}
> With 0 filters, Time taken: 653 ms
> With 20 filters, Time taken: 962 ms
> With 21 filters, Time taken: 1282 ms
> With 22 filters, Time taken: 1982 ms
> With 23 filters, Time taken: 3855 ms
> With 24 filters, Time taken: 6719 ms
> With 25 filters, Time taken: 12669 ms
> With 26 filters, Time taken: 25032 ms
> With 27 filters, Time taken: 49585 ms
> With 28 filters, Time taken: 98980 ms  // over 1 min 38 seconds
> With 29 filters, Time taken: 198368 ms // over 3 mins
> With 30 filters, Time taken: 393744 ms // over 6 mins
> {code}
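The timings above roughly double with each added filter, the signature of an uncached recursion. A minimal Python model of the blowup and the memoization fix (a toy sketch of the recursion shape, not the actual Scala createFilter):

```python
from functools import lru_cache

raw_calls = 0

def convert(width: int) -> str:
    """Model of an uncached createFilter over `width` AND-ed predicates.

    To emit predicate `width` it first probes the rest of the tree for
    convertibility, then converts it for real, so every subtree is
    walked twice: T(n) = 2*T(n-1) + 1, i.e. O(2^n) calls.
    The predicate strings are hypothetical placeholders.
    """
    global raw_calls
    raw_calls += 1
    if width == 0:
        return "true"
    convert(width - 1)  # probe; result discarded
    return f"(c{width} is not null) and {convert(width - 1)}"

@lru_cache(maxsize=None)
def convert_cached(width: int) -> str:
    """Same recursion, memoized: each subtree is converted exactly once."""
    if width == 0:
        return "true"
    convert_cached(width - 1)  # the probe now hits the cache
    return f"(c{width} is not null) and {convert_cached(width - 1)}"

filters = convert(15)
assert raw_calls == 2 ** 16 - 1       # 65535 calls for only 15 filters
assert convert_cached(15) == filters  # same output, 16 evaluations
```

Caching converted sub-filters turns the doubling curve in the RESULT table back into a linear walk, which is the improvement the issue proposes.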
[jira] [Updated] (SPARK-25306) Use cache to speed up `createFilter` in ORC
[ https://issues.apache.org/jira/browse/SPARK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-25306:
----------------------------------
    Affects Version/s: 2.0.2
[jira] [Updated] (SPARK-25306) Use cache to speed up `createFilter` in ORC
[ https://issues.apache.org/jira/browse/SPARK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25306:
--
Description:

In the ORC data source, the `createFilter` function has exponential time complexity due to the lack of memoization, as shown below. This issue aims to improve it.

*REPRODUCE*
{code}
// Create and read a 1-row table with 1000 columns
sql("set spark.sql.orc.filterPushdown=true")
val selectExpr = (1 to 1000).map(i => s"id c$i")
spark.range(1).selectExpr(selectExpr: _*).write.mode("overwrite").orc("/tmp/orc")
print(s"With 0 filters, ")
spark.time(spark.read.orc("/tmp/orc").count)

// Increase the number of filters
(20 to 30).foreach { width =>
  val whereExpr = (1 to width).map(i => s"c$i is not null").mkString(" and ")
  print(s"With $width filters, ")
  spark.time(spark.read.orc("/tmp/orc").where(whereExpr).count)
}
{code}

*RESULT*
{code}
With 0 filters, Time taken: 653 ms
With 20 filters, Time taken: 962 ms
With 21 filters, Time taken: 1282 ms
With 22 filters, Time taken: 1982 ms
With 23 filters, Time taken: 3855 ms
With 24 filters, Time taken: 6719 ms
With 25 filters, Time taken: 12669 ms
With 26 filters, Time taken: 25032 ms
With 27 filters, Time taken: 49585 ms
With 28 filters, Time taken: 98980 ms  // over 1 min 38 seconds
With 29 filters, Time taken: 198368 ms // over 3 mins
With 30 filters, Time taken: 393744 ms // over 6 mins
{code}

> Use cache to speed up `createFilter` in ORC
> ---
>
> Key: SPARK-25306
> URL: https://issues.apache.org/jira/browse/SPARK-25306
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.1.3, 2.2.2, 2.3.1, 2.4.0
> Reporter: Dongjoon Hyun
> Priority: Critical

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
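The blowup measured above comes from re-walking the same predicate subtrees during filter conversion; caching the per-subexpression result, as the issue title proposes, makes the cost linear in the number of distinct subexpressions. A minimal Scala sketch of the idea follows; the `Expr` AST and `FilterBuilder.build` names are illustrative stand-ins, not Spark's actual `createFilter` code:

```scala
import scala.collection.mutable

// Toy predicate AST standing in for Spark's pushed-down data source filters.
sealed trait Expr
final case class IsNotNull(col: String) extends Expr
final case class And(left: Expr, right: Expr) extends Expr

object FilterBuilder {
  // Cache of already-converted subexpressions. Without it, a builder that
  // re-walks subtrees (e.g. to check convertibility before converting) does
  // exponential work on deeply nested And chains like the benchmark's.
  private val cache = mutable.HashMap.empty[Expr, Option[String]]

  def build(e: Expr): Option[String] = cache.get(e) match {
    case Some(cached) => cached // already converted this subtree once
    case None =>
      val res = e match {
        case IsNotNull(c) => Some(s"$c IS NOT NULL")
        case And(l, r) =>
          // Both sides must convert for the conjunction to convert.
          for { a <- build(l); b <- build(r) } yield s"($a AND $b)"
      }
      cache.update(e, res)
      res
  }
}
```

With the cache, each distinct subexpression is converted exactly once, so widening the `WHERE` clause from 20 to 30 conjuncts grows the work linearly rather than doubling it per added filter.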
[jira] [Updated] (SPARK-25306) Use cache to speed up `createFilter` in ORC
[ https://issues.apache.org/jira/browse/SPARK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-25306:
--
Affects Version/s: (was: SQL) 2.4.0, 2.1.3, 2.2.2, 2.3.1
[jira] [Commented] (SPARK-25306) Use cache to speed up `createFilter` in ORC
[ https://issues.apache.org/jira/browse/SPARK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599822#comment-16599822 ] Apache Spark commented on SPARK-25306:
--
User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/22313
[jira] [Assigned] (SPARK-25306) Use cache to speed up `createFilter` in ORC
[ https://issues.apache.org/jira/browse/SPARK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25306:
--
Assignee: Apache Spark
[jira] [Assigned] (SPARK-25306) Use cache to speed up `createFilter` in ORC
[ https://issues.apache.org/jira/browse/SPARK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25306:
--
Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-25306) Use cache to speed up `createFilter` in ORC
Dongjoon Hyun created SPARK-25306:
-
Summary: Use cache to speed up `createFilter` in ORC
Key: SPARK-25306
URL: https://issues.apache.org/jira/browse/SPARK-25306
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: SQL
Reporter: Dongjoon Hyun
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:38 PM:
-
Spark belongs to the community (no?) and should not serve any company's priorities, like Palantir's. [~mcheah], we don't need the meeting then if we are going to overlap with each other, fine. I have a question: why serve Palantir's priorities and not mine? This is not healthy. Why didn't you collaborate with me? Why did nobody respect my will to get it into 2.4? It was my priority back then. We are implementing the same design, so the whole discussion makes no sense; it is not about my implementation or yours. I am not going to implement the same thing again; that is not reasonable. But I also wouldn't create a PR that just barely works; you can check my comments. That part was easy: just call load and load the template (this work is not rocket science). Sorry, I don't see any real arguments in the discussion, and as I said, I don't want to reply, but the politically correct replies leave me no choice. We have talked on Slack several times, and privately as well. You could always have pinged me, but you decided to collaborate on this without anyone knowing. Probably people don't understand the meaning of fairness; I am not going to explain it here. We can always create any PR we like, and then we will see what work gets merged, cool. For good or bad, though, the meeting has power, because the k8s committers have the final say on merging, no? So I don't agree. The whole discussion is pointless to me; the message communicated, and the culture and attitude behind it, are clear. If the point is that committers or others in the Spark project should not violate the rules, fine: no rule was violated, no worries. This is the first time I have seen this kind of competing "collaboration" among people in a group working for the same cause. It is disappointing.
On the other hand, lesson learned; let's move on, no hard feelings.
> Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
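The mechanism the issue asks for, taking a user-supplied pod template as the base and overlaying only the fields Spark must control, can be sketched in plain Scala. The `PodSpec` case class and `overlay` function below are illustrative stand-ins, not Spark's or the Kubernetes client's actual types:

```scala
// Illustrative stand-in for a pod spec, reduced to two fields.
final case class PodSpec(
    labels: Map[String, String] = Map.empty,
    containerImage: Option[String] = None)

// Spark-managed fields win over the user's template; everything Spark does
// not manage (here: extra labels) passes through from the template untouched.
// This keeps customization declarative instead of adding one Spark conf key
// per Kubernetes pod field.
def overlay(template: PodSpec, sparkManaged: PodSpec): PodSpec =
  PodSpec(
    labels = template.labels ++ sparkManaged.labels,
    containerImage = sparkManaged.containerImage.orElse(template.containerImage))
```

Under this model, a user who needs a new pod field edits the template file; Spark only has to enforce the handful of fields it owns (role labels, the configured image, and so on).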
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:37 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy. You didnt collaborate with me why? .Why nobody respected my will to make it go in 2.4, It was my priority back then. We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but the politically correct replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. This is the first time I see this kind of competing "collaboration" among the people of a group that works for the same cause. It is disappointing. 
On the other hand, lesson learned, let's move on, no hard feelings. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy. You didnt collaborate with me why? ) .Why nobody respected my will to make it go in 2.4, It was my priority back then too. We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but the politically correct replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. This is the first time I see this kind of competing "collaboration" among the people of a group that works for the same cause. It is disappointing. On the other hand, lesson learned, let's move on, no hard feelings. 
> Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:36 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? !/jira/images/icons/emoticons/smile.png|width=16,height=16! Why nobody respected my will to make it go in 2.4, It was my priority back then too. We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but the politically correct replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. This is the first time I see this kind of competing "collaboration" among the people of a group that works for the same cause. 
It is disappointing. On the other hand, lesson learned, let's move on, no hard feelings. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) Why nobody respected my will to make it go in 2.4, It was my priority back then too. We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but the politically correct replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. This is the first time I see this kind of competing collaboration among the people of a group that works for the same cause. It is disappointing. On the other hand, lesson learned, let's move on, no hard feelings. 
> Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:36 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy. You didnt collaborate with me why? ) .Why nobody respected my will to make it go in 2.4, It was my priority back then too. We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but the politically correct replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. This is the first time I see this kind of competing "collaboration" among the people of a group that works for the same cause. It is disappointing. 
On the other hand, lesson learned, let's move on, no hard feelings. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? !/jira/images/icons/emoticons/smile.png|width=16,height=16! Why nobody respected my will to make it go in 2.4, It was my priority back then too. We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but the politically correct replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. This is the first time I see this kind of competing "collaboration" among the people of a group that works for the same cause. It is disappointing. On the other hand, lesson learned, let's move on, no hard feelings. 
> Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:35 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) Why nobody respected my will to make it go in 2.4, It was my priority back then too. We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but the politically correct replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. This is the first time I see this kind of competing collaboration among the people of a group that works for the same cause. It is disappointing. 
On the other hand, lesson learned, let's move on, no hard feelings.
> Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes-specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
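The pod-template approach described in the issue above can be sketched roughly as follows. This is an illustrative example only: the property names (`spark.kubernetes.driver.podTemplateFile` / `spark.kubernetes.executor.podTemplateFile`) are the ones the feature eventually shipped with in later Spark releases, which had not yet been decided at the time of this discussion, and the template contents are hypothetical.

```shell
# A plain Kubernetes pod spec keeps all pod-level customization in one
# declarative file, instead of one new Spark conf option per field:
cat > driver-pod-template.yaml <<'EOF'
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    disktype: ssd
  containers:
    - name: spark-kubernetes-driver
      resources:
        limits:
          memory: 4Gi
EOF

# Spark then only needs a single conf option pointing at the template
# (shown, not executed, since it requires a Spark installation and a cluster):
echo 'spark-submit \
  --master k8s://https://example.com:6443 \
  --conf spark.kubernetes.driver.podTemplateFile=driver-pod-template.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=executor-pod-template.yaml \
  ...'
```

This keeps Kubernetes-specific fields in Kubernetes' own declarative format, addressing both drawbacks named in the issue description.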
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:28 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) Why nobody respected my will to make it go in 2.4, It was my priority back then too. We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but the politically correct replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on, no hard feelings. was (Author: skonto): Spark belongs to the community (no?) 
and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) Why nobody respected my will to make it go in 2.4, It was my priority back then too. We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but the politically correct replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. 
> Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:27 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) Why nobody respected my will to make it go in 2.4, It was my priority back then too. We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but the politically correct replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. was (Author: skonto): Spark belongs to the community (no?) 
and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) Why nobody respected my will to make it go in 2.4, It was my priority back then too. We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. 
> Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:26 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) Why nobody respected my will to make it go in 2.4, It was my priority back then too. We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. was (Author: skonto): Spark belongs to the community (no?) 
and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. 
> Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:25 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. 
I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. The only point is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. 
We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:25 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. 
I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. That is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. 
We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:24 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have a question, why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. The only point is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. 
I have question why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately as well. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. The only point is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. 
We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:23 PM:
-

Spark belongs to the community (no?) and should not serve any company's priorities, like Palantir's. [~mcheah], we don't need the meeting then if we are going to overlap with each other, fine. I have a question: why serve Palantir's priorities and not mine? This is not healthy :). Why didn't you collaborate with me? :) We are implementing the same design; the whole discussion makes no sense, it is not about my or your implementation... I am not going to implement the same thing again, it's not reasonable. But I also wouldn't create a PR that just works; you can check my comments, that was easy, just call load and load the template (this work is not rocket science).

Sorry, I don't see any real arguments in the discussion, and as I said I don't want to reply, but replies leave me no choice. We have talked on Slack several times, and privately as well. You could always have pinged me, but you decided to collaborate on this without anyone knowing. Probably people don't understand the meaning of fairness; I am not going to explain it here. We can always create any PR we like and then we will see what work is merged, cool. For good or bad, though, the meeting has power because k8s committers have the final say on merging, no? So I don't agree.

The whole discussion for me is pointless; the message communicated, the culture, and the attitude are clear. The only point is for committers or others in the Spark project not to violate the rules, fine. No rule was violated, no worries. I am also OK. Lesson learned, let's move on.

> Support user-specified driver and executor pod templates
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
> Issue Type: New Feature
> Components: Kubernetes
> Affects Versions: 2.4.0
> Reporter: Yinan Li
> Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the current approach of adding new Spark configuration options has some serious drawbacks: 1) it means more Kubernetes-specific configuration options to maintain, and 2) it widens the gap between the declarative model used by Kubernetes and the configuration model used by Spark. We should start designing a solution that allows users to specify pod templates as central places for all customization needs for the driver and executor pods.

-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
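To make the proposal in the issue description concrete, here is a hedged sketch of what a user-specified pod template and its submit-time wiring could look like. The YAML fields are standard Kubernetes PodSpec fields; the `spark.kubernetes.driver.podTemplateFile` property name is an assumption for illustration and was not an API that existed in Spark 2.4 at the time of this discussion.

```shell
# Write a driver pod template expressing customizations that would
# otherwise each need a dedicated Spark configuration option.
cat > driver-pod-template.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  labels:
    team: data-platform          # arbitrary labeling Spark need not know about
spec:
  nodeSelector:
    disktype: ssd                # schedule the driver onto SSD-backed nodes
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "spark"
    effect: "NoSchedule"
EOF

# The template would then be referenced once at submit time instead of
# adding one Spark conf key per pod field (property name is hypothetical):
echo "spark-submit --conf spark.kubernetes.driver.podTemplateFile=driver-pod-template.yaml"
```

The design point is that the template file keeps the Kubernetes declarative model intact: users express pod-level customizations in native Kubernetes YAML, and Spark only needs a single pointer to the file.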
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:21 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... I am not going to implement the same thing again its not reasonable. But also I wouldnt create a PR that just works, you can check my comments, that was easy, just call load and load the template (it is not rocket science this work). Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. The only point is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. 
I have question why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. The only point is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. 
-- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:17 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. The only point is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... 
Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. The only point is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:14 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message communicated, culture and attitude is clear. The only point is for committers or others in the Spark project not to violate the rules fine. No rule was violated no worries. I am also ok. Lesson learned, let's move on. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... 
Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message culture and attitude is clear. The only point is for committers, the Spark project not to violate the rules fine. > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:13 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... Sorry I dont see any real arguments in the discussion, and as I said I dont want to reply but replies leave me no choice. We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message culture and attitude is clear. The only point is for committers, the Spark project not to violate the rules fine. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... Sorry I dont see any rela arguments in the discussion, and as I said I dont want to reply but you leave me no choice. 
We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message culture and attitude is clear. The only point is for committers, the Spark project not to violate the rules fine. > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:12 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about mine or your implementation... Sorry I dont see any rela arguments in the discussion, and as I said I dont want to reply but you leave me no choice. We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message culture and attitude is clear. The only point is for committers, the Spark project not to violate the rules fine. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about my or your implementation... We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message culture and attitude is clear. The only point is for committers, the Spark project not to violate the rules fine. > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:11 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We are implementing the same design, the whole discussion makes no sense, it is not about my or your implementation... We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message culture and attitude is clear. The only point is for committers, the Spark project not to violate the rules fine. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. 
For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message culture and attitude is clear. The only point is for committers, the Spark project not to violate the rules fine. > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:10 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why serve Palantir's priorities and not mine? This is not healthy :). You didnt collaborate with me why? :) We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message culture and attitude is clear. The only point is for committers, the Spark project not to violate the rules fine. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why serve Palantir's priorities and not mine? This is not healthy :). We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message culture and attitude is clear. 
The only point is for committers, the Spark project not to violate the rules fine. > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:10 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why serve Palantir's priorities and not mine? This is not healthy :). We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message culture and attitude is clear. The only point is for committers, the Spark project not to violate the rules fine. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why sever Palantir's priorities and not mine? This is not healthy :). We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message culture and attitude is clear. 
The only point is for committers, the Spark project not to violate the rules fine. > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728 ] Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:09 PM: - Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why sever Palantir's priorities and not mine? This is not healthy :). We have talked on slack several times and privately. You could always have pinged me but you decided to collaborate on this, without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message culture and attitude is clear. The only point is for committers, the Spark project not to violate the rules fine. was (Author: skonto): Spark belongs to the community (no?) and should not serve any company's priorities like Palantirs, [~mcheah] we don't need the meeting then if we are going to overlap with each other, fine. I have question why sever Palantir's priorities and not mine? This is not healthy :). We have talked on slack several times and privately. YOu could always have pinged but you decided to collaborate on this without anyone knowing. Probably people don't understand the meaning of fairness, I am not going to explain it here. We can always create any RP we like and then we will see what work is merged, cool. For good or bad though the meeting has power because k8s committers have the final saying on merging no? So I dont agree. The whole discussion for me is pointless, the message culture and attitude is clear. 
The only point is for committers, the Spark project not to violate the rules fine. > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17916) CSV data source treats empty string as null no matter what nullValue option is
[ https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599717#comment-16599717 ] Apache Spark commented on SPARK-17916: -- User 'koertkuipers' has created a pull request for this issue: https://github.com/apache/spark/pull/22312 > CSV data source treats empty string as null no matter what nullValue option is > -- > > Key: SPARK-17916 > URL: https://issues.apache.org/jira/browse/SPARK-17916 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1 >Reporter: Hossein Falaki >Assignee: Maxim Gekk >Priority: Major > Fix For: 2.4.0 > > > When user configures {{nullValue}} in CSV data source, in addition to those > values, all empty string values are also converted to null. > {code} > data: > col1,col2 > 1,"-" > 2,"" > {code} > {code} > spark.read.format("csv").option("nullValue", "-") > {code} > We will find a null in both rows. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
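The reported behavior can be sketched with a small Python model (illustrative only; `convert_field` is a hypothetical helper, not Spark's actual CSV parsing code): before the fix, an empty string was converted to null no matter what `nullValue` was configured.

```python
def convert_field(raw, null_value="-", treat_empty_as_null=True):
    """Toy model of the CSV null conversion described in SPARK-17916.

    Pre-fix behavior (treat_empty_as_null=True): empty strings become
    None in addition to the configured nullValue. Expected behavior
    (treat_empty_as_null=False): only the configured nullValue does.
    """
    if raw == null_value:
        return None
    if treat_empty_as_null and raw == "":
        return None  # pre-fix behavior: "" always becomes null
    return raw

rows = ["-", ""]
# Pre-fix: both rows come back as null
assert [convert_field(r) for r in rows] == [None, None]
# Expected with nullValue="-": only "-" becomes null, "" stays empty
assert [convert_field(r, treat_empty_as_null=False) for r in rows] == [None, ""]
```

The `treat_empty_as_null` flag here only names the two behaviors side by side; Spark's fix hinges on distinguishing the configured `nullValue` from the empty string, not on such a flag.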
[jira] [Commented] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one
[ https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599714#comment-16599714 ] Imran Rashid commented on SPARK-23253: -- I think I see the issue you are referring to, [~cloud_fan], but I'm not sure this change is actually the responsible one. Isn't it really from https://github.com/apache/spark/pull/9610 ? The change here only affected whether we bother to write {{lengths}} to a file; it doesn't actually change whether we use that file at all. There is more history discussing that change (and non-determinism etc.) in https://github.com/apache/spark/pull/9214 and https://github.com/apache/spark/pull/6648 > Only write shuffle temporary index file when there is not an existing one > - > > Key: SPARK-23253 > URL: https://issues.apache.org/jira/browse/SPARK-23253 > Project: Spark > Issue Type: Improvement > Components: Shuffle, Spark Core >Affects Versions: 2.2.1 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 2.4.0 > > > The shuffle index temporary file is used to create the shuffle index file > atomically; it is not needed when the index file already exists because > another attempt of the same task has already written it. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
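The improvement in the quoted issue (write to a temporary file and atomically publish it as the index file, but skip the write entirely when a committed index already exists from an earlier attempt of the same task) can be sketched as follows. This is an illustrative Python model with hypothetical names, not Spark's actual Scala shuffle code:

```python
import os
import tempfile


def write_index_file(index_path, lengths):
    """Sketch of the SPARK-23253 idea: skip the temporary-file write when a
    committed index file already exists from an earlier task attempt."""
    if os.path.exists(index_path):
        return False  # another attempt already committed the index
    dir_name = os.path.dirname(index_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as f:
            for n in lengths:
                f.write(f"{n}\n")
        os.replace(tmp_path, index_path)  # atomic publish on POSIX
        return True
    finally:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # clean up only if the publish never happened
```

Writing to a temp file in the same directory and renaming keeps readers from ever seeing a partially written index; the existence check merely avoids redundant work when a prior attempt already succeeded.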
[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates
[ https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599697#comment-16599697 ] Matt Cheah commented on SPARK-24434: Everyone, thank you for your contribution to this discussion. It is important for us to agree upon constructive next steps to avoid this kind of miscommunication in the future. I have a few notes to make from having collaborated with Yifei and Onur on this patch and on behalf of Palantir. Firstly, we apologize that we did not communicate clearly enough on Apache communication channels that we were working on this, and the urgency with which we needed this work done. We agree with [~felixcheung]'s assessment that notes from the weekly meetings that have bearing on Spark development should be sent back to the wider community. We are specifically sorry for not having said something to the effect of "I am taking a stab at implementing this at https://github.com/... . Stavros, are you cool with that?" Palantir and the Kubernetes Big Data group must improve our communication next time. Secondly, we would suggest that a work-in-progress patch proposed early in a feature's development would have been helpful for users preparing to adopt the feature in their internal tools. It's helpful for everyone to see the API and expected behavior of a new feature so that they can plan to take advantage of it ahead of time. Thirdly, a small clarifying comment on timelines and urgency: while we don't see the need for this to be in Spark 2.4, we will be taking the patch ahead of time on our fork of Spark, which follows the Apache master branch (see https://github.com/palantir/spark). We were hoping to cherry-pick this patch soon but could have been clearer in our communication of this need. 
Fourthly, we are sorry for the wording in "On 15 Aug it was discussed that as Stavros Kontopoulos was out, and was not actively working on this PR at that moment, Yifei Huang and I can take over and start working on this.”: Instead of “take over”, we should have said “contribute to this feature” in this comment specifically. Finally, moving forward we are happy to collaborate on what the community believes to be the best implementation of this feature. We are happy to use Onur's, but we can also use Stavros's. Regardless of the chosen implementation, credit should be given to all parties. For example if Onur's implementation is chosen, Stavros's design work should be called out in the pull request description. Either way, we would like to see this feature merged by Friday, September 07, though this will have to be delayed if the Spark 2.4 release branch is not cut before that time (since we don't want this going into master and ending up in Spark 2.4 as a result). We are open to feedback on any of the above points and suggestions on how we can improve the way we contribute to Spark in the future. > Support user-specified driver and executor pod templates > > > Key: SPARK-24434 > URL: https://issues.apache.org/jira/browse/SPARK-24434 > Project: Spark > Issue Type: New Feature > Components: Kubernetes >Affects Versions: 2.4.0 >Reporter: Yinan Li >Priority: Major > > With more requests for customizing the driver and executor pods coming, the > current approach of adding new Spark configuration options has some serious > drawbacks: 1) it means more Kubernetes specific configuration options to > maintain, and 2) it widens the gap between the declarative model used by > Kubernetes and the configuration model used by Spark. We should start > designing a solution that allows users to specify pod templates as central > places for all customization needs for the driver and executor pods. 
[jira] [Updated] (SPARK-25305) Respect attribute name in `CollapseProject` and `ColumnPruning`
[ https://issues.apache.org/jira/browse/SPARK-25305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-25305: --- Summary: Respect attribute name in `CollapseProject` and `ColumnPruning` (was: Respect attribute name in `CollapseProject`) > Respect attribute name in `CollapseProject` and `ColumnPruning` > --- > > Key: SPARK-25305 > URL: https://issues.apache.org/jira/browse/SPARK-25305 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Gengliang Wang >Priority: Major > > Currently, in the optimizer rule `CollapseProject`, the lower-level project is > collapsed into the upper level, but the alias names from the lower level are > propagated to the upper level. > We should preserve all the output names of the upper level. > See the PR description of https://github.com/apache/spark/pull/22311 for details. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-25305) Respect attribute name in `CollapseProject`
[ https://issues.apache.org/jira/browse/SPARK-25305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gengliang Wang updated SPARK-25305: --- Description: Currently in optimizer rule `CollapseProject`, the lower level project is collapsed into upper level, but the naming of alias in lower level is propagated in upper level. We should reserve all the output names in upper level. See PR description of https://github.com/apache/spark/pull/22311 for details. was: Currently in optimizer rule `CollapseProject`, the lower level project is collapsed into upper level, but the naming of alias in lower level is propagated in upper level. We should reserve all the output names in upper level. > Respect attribute name in `CollapseProject` > --- > > Key: SPARK-25305 > URL: https://issues.apache.org/jira/browse/SPARK-25305 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Gengliang Wang >Priority: Major > > Currently in optimizer rule `CollapseProject`, the lower level project is > collapsed into upper level, but the naming of alias in lower level is > propagated in upper level. > We should reserve all the output names in upper level. > See PR description of https://github.com/apache/spark/pull/22311 for details. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25305) Respect attribute name in `CollapseProject`
[ https://issues.apache.org/jira/browse/SPARK-25305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599665#comment-16599665 ] Apache Spark commented on SPARK-25305: -- User 'gengliangwang' has created a pull request for this issue: https://github.com/apache/spark/pull/22311 > Respect attribute name in `CollapseProject` > --- > > Key: SPARK-25305 > URL: https://issues.apache.org/jira/browse/SPARK-25305 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Gengliang Wang >Priority: Major > > Currently in optimizer rule `CollapseProject`, the lower level project is > collapsed into upper level, but the naming of alias in lower level is > propagated in upper level. > We should reserve all the output names in upper level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25305) Respect attribute name in `CollapseProject`
[ https://issues.apache.org/jira/browse/SPARK-25305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25305: Assignee: Apache Spark > Respect attribute name in `CollapseProject` > --- > > Key: SPARK-25305 > URL: https://issues.apache.org/jira/browse/SPARK-25305 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Gengliang Wang >Assignee: Apache Spark >Priority: Major > > Currently in optimizer rule `CollapseProject`, the lower level project is > collapsed into upper level, but the naming of alias in lower level is > propagated in upper level. > We should reserve all the output names in upper level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25305) Respect attribute name in `CollapseProject`
[ https://issues.apache.org/jira/browse/SPARK-25305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25305: Assignee: (was: Apache Spark) > Respect attribute name in `CollapseProject` > --- > > Key: SPARK-25305 > URL: https://issues.apache.org/jira/browse/SPARK-25305 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.4.0 >Reporter: Gengliang Wang >Priority: Major > > Currently in optimizer rule `CollapseProject`, the lower level project is > collapsed into upper level, but the naming of alias in lower level is > propagated in upper level. > We should reserve all the output names in upper level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-25305) Respect attribute name in `CollapseProject`
Gengliang Wang created SPARK-25305: -- Summary: Respect attribute name in `CollapseProject` Key: SPARK-25305 URL: https://issues.apache.org/jira/browse/SPARK-25305 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.0 Reporter: Gengliang Wang Currently, in the optimizer rule `CollapseProject`, the lower-level project is collapsed into the upper level, but the alias names from the lower level are propagated to the upper level. We should preserve all the output names of the upper level. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
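To make the rule concrete, here is a toy Python model of project collapsing that keeps the upper project's output names, which is the intent of the fix. The names and the expression encoding are invented for illustration; this is not Spark's optimizer code:

```python
def collapse_projects(outer, inner):
    """Toy model of CollapseProject with the SPARK-25305 fix applied.

    A project is a list of (name, expr) pairs, where expr is either a
    column-name string or a tuple like ("add", column, constant).
    Inner aliases are inlined into the outer project, but the OUTER
    project's output names are preserved.
    """
    inner_map = dict(inner)  # alias name -> expression
    collapsed = []
    for name, expr in outer:
        if isinstance(expr, str) and expr in inner_map:
            # substitute the inner expression, keep the outer output name
            collapsed.append((name, inner_map[expr]))
        else:
            collapsed.append((name, expr))
    return collapsed

inner = [("a_plus_1", ("add", "a", 1))]     # SELECT a + 1 AS a_plus_1
outer = [("b", "a_plus_1")]                 # SELECT a_plus_1 AS b
# Collapsed plan computes a + 1 but still exposes the name "b"
assert collapse_projects(outer, inner) == [("b", ("add", "a", 1))]
```

The bug was the reverse substitution: propagating the lower-level alias name (`a_plus_1` here) into the collapsed output instead of keeping `b`.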
[jira] [Assigned] (SPARK-25298) spark-tools build failure for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25298: Assignee: Apache Spark > spark-tools build failure for Scala 2.12 > > > Key: SPARK-25298 > URL: https://issues.apache.org/jira/browse/SPARK-25298 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 2.4.0 >Reporter: Darcy Shen >Assignee: Apache Spark >Priority: Major > > $ sbt-- > > ++ 2.12.6 > > compile > > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:22: > object runtime is not a member of package reflect > [error] import scala.reflect.runtime.\{universe => unv} > [error] ^ > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:23: > object runtime is not a member of package reflect > [error] import scala.reflect.runtime.universe.runtimeMirror > [error] ^ > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:41: > not found: value runtimeMirror > [error] private val mirror = runtimeMirror(classLoader) > [error] ^ > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:43: > not found: value unv > [error] private def isPackagePrivate(sym: unv.Symbol) = -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25298) spark-tools build failure for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599660#comment-16599660 ] Apache Spark commented on SPARK-25298: -- User 'sadhen' has created a pull request for this issue: https://github.com/apache/spark/pull/22310 > spark-tools build failure for Scala 2.12 > > > Key: SPARK-25298 > URL: https://issues.apache.org/jira/browse/SPARK-25298 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 2.4.0 >Reporter: Darcy Shen >Priority: Major > > $ sbt-- > > ++ 2.12.6 > > compile > > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:22: > object runtime is not a member of package reflect > [error] import scala.reflect.runtime.\{universe => unv} > [error] ^ > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:23: > object runtime is not a member of package reflect > [error] import scala.reflect.runtime.universe.runtimeMirror > [error] ^ > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:41: > not found: value runtimeMirror > [error] private val mirror = runtimeMirror(classLoader) > [error] ^ > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:43: > not found: value unv > [error] private def isPackagePrivate(sym: unv.Symbol) = -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25298) spark-tools build failure for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25298: Assignee: (was: Apache Spark) > spark-tools build failure for Scala 2.12 > > > Key: SPARK-25298 > URL: https://issues.apache.org/jira/browse/SPARK-25298 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 2.4.0 >Reporter: Darcy Shen >Priority: Major > > $ sbt-- > > ++ 2.12.6 > > compile > > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:22: > object runtime is not a member of package reflect > [error] import scala.reflect.runtime.\{universe => unv} > [error] ^ > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:23: > object runtime is not a member of package reflect > [error] import scala.reflect.runtime.universe.runtimeMirror > [error] ^ > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:41: > not found: value runtimeMirror > [error] private val mirror = runtimeMirror(classLoader) > [error] ^ > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:43: > not found: value unv > [error] private def isPackagePrivate(sym: unv.Symbol) = -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20384) supporting value classes over primitives in DataSets
[ https://issues.apache.org/jira/browse/SPARK-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20384: Assignee: Apache Spark > supporting value classes over primitives in DataSets > > > Key: SPARK-20384 > URL: https://issues.apache.org/jira/browse/SPARK-20384 > Project: Spark > Issue Type: Improvement > Components: Optimizer, SQL >Affects Versions: 2.1.0 >Reporter: Daniel Davis >Assignee: Apache Spark >Priority: Minor > > As a spark user who uses value classes in scala for modelling domain objects, > I also would like to make use of them for datasets. > For example, I would like to use the {{User}} case class which is using a > value-class for it's {{id}} as the type for a DataSet: > - the underlying primitive should be mapped to the value-class column > - function on the column (for example comparison ) should only work if > defined on the value-class and use these implementation > - show() should pick up the toString method of the value-class > {code} > case class Id(value: Long) extends AnyVal { > def toString: String = value.toHexString > } > case class User(id: Id, name: String) > val ds = spark.sparkContext > .parallelize(0L to 12L).map(i => (i, f"name-$i")).toDS() > .withColumnRenamed("_1", "id") > .withColumnRenamed("_2", "name") > // mapping should work > val usrs = ds.as[User] > // show should use toString > usrs.show() > // comparison with long should throw exception, as not defined on Id > usrs.col("id") > 0L > {code} > For example `.show()` should use the toString of the `Id` value class: > {noformat} > +---+---+ > | id| name| > +---+---+ > | 0| name-0| > | 1| name-1| > | 2| name-2| > | 3| name-3| > | 4| name-4| > | 5| name-5| > | 6| name-6| > | 7| name-7| > | 8| name-8| > | 9| name-9| > | A|name-10| > | B|name-11| > | C|name-12| > +---+---+ > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, 
e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20384) supporting value classes over primitives in DataSets
[ https://issues.apache.org/jira/browse/SPARK-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599659#comment-16599659 ] Apache Spark commented on SPARK-20384: -- User 'mt40' has created a pull request for this issue: https://github.com/apache/spark/pull/22309 > supporting value classes over primitives in DataSets > > > Key: SPARK-20384 > URL: https://issues.apache.org/jira/browse/SPARK-20384 > Project: Spark > Issue Type: Improvement > Components: Optimizer, SQL >Affects Versions: 2.1.0 >Reporter: Daniel Davis >Priority: Minor > > As a spark user who uses value classes in scala for modelling domain objects, > I also would like to make use of them for datasets. > For example, I would like to use the {{User}} case class which is using a > value-class for it's {{id}} as the type for a DataSet: > - the underlying primitive should be mapped to the value-class column > - function on the column (for example comparison ) should only work if > defined on the value-class and use these implementation > - show() should pick up the toString method of the value-class > {code} > case class Id(value: Long) extends AnyVal { > def toString: String = value.toHexString > } > case class User(id: Id, name: String) > val ds = spark.sparkContext > .parallelize(0L to 12L).map(i => (i, f"name-$i")).toDS() > .withColumnRenamed("_1", "id") > .withColumnRenamed("_2", "name") > // mapping should work > val usrs = ds.as[User] > // show should use toString > usrs.show() > // comparison with long should throw exception, as not defined on Id > usrs.col("id") > 0L > {code} > For example `.show()` should use the toString of the `Id` value class: > {noformat} > +---+---+ > | id| name| > +---+---+ > | 0| name-0| > | 1| name-1| > | 2| name-2| > | 3| name-3| > | 4| name-4| > | 5| name-5| > | 6| name-6| > | 7| name-7| > | 8| name-8| > | 9| name-9| > | A|name-10| > | B|name-11| > | C|name-12| > +---+---+ > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - 
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
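The behavior requested above (render through the wrapper's toString, reject comparison with the bare primitive) can be mimicked with a small wrapper class. This Python sketch is illustrative only; it has none of the zero-allocation properties of a Scala `AnyVal` value class:

```python
class Id:
    """Illustrative stand-in for the AnyVal value class in SPARK-20384:
    wraps a primitive, renders as hex, and refuses comparison with the
    raw primitive type."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return format(self.value, "X")  # hex rendering, like toHexString

    def __gt__(self, other):
        if not isinstance(other, Id):
            return NotImplemented  # Id > int falls through to a TypeError
        return self.value > other.value

assert str(Id(10)) == "A"   # show() would display "A" for id 10
assert Id(2) > Id(1)        # comparison defined only between Ids
```

In Python the rejection surfaces at runtime (a `TypeError` when comparing `Id` with a bare `int`), whereas the feature request asks Spark to reject `usrs.col("id") > 0L` during analysis.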
[jira] [Assigned] (SPARK-20384) supporting value classes over primitives in DataSets
[ https://issues.apache.org/jira/browse/SPARK-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20384: Assignee: (was: Apache Spark) > supporting value classes over primitives in DataSets > > > Key: SPARK-20384 > URL: https://issues.apache.org/jira/browse/SPARK-20384 > Project: Spark > Issue Type: Improvement > Components: Optimizer, SQL >Affects Versions: 2.1.0 >Reporter: Daniel Davis >Priority: Minor > > As a spark user who uses value classes in scala for modelling domain objects, > I also would like to make use of them for datasets. > For example, I would like to use the {{User}} case class which is using a > value-class for it's {{id}} as the type for a DataSet: > - the underlying primitive should be mapped to the value-class column > - function on the column (for example comparison ) should only work if > defined on the value-class and use these implementation > - show() should pick up the toString method of the value-class > {code} > case class Id(value: Long) extends AnyVal { > def toString: String = value.toHexString > } > case class User(id: Id, name: String) > val ds = spark.sparkContext > .parallelize(0L to 12L).map(i => (i, f"name-$i")).toDS() > .withColumnRenamed("_1", "id") > .withColumnRenamed("_2", "name") > // mapping should work > val usrs = ds.as[User] > // show should use toString > usrs.show() > // comparison with long should throw exception, as not defined on Id > usrs.col("id") > 0L > {code} > For example `.show()` should use the toString of the `Id` value class: > {noformat} > +---+---+ > | id| name| > +---+---+ > | 0| name-0| > | 1| name-1| > | 2| name-2| > | 3| name-3| > | 4| name-4| > | 5| name-5| > | 6| name-6| > | 7| name-7| > | 8| name-8| > | 9| name-9| > | A|name-10| > | B|name-11| > | C|name-12| > +---+---+ > {noformat} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: 
issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25298) spark-tools build failure for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599651#comment-16599651 ] Darcy Shen commented on SPARK-25298: The solution is {{sbt -Dscala-2.12 -Dscala.version=2.12.6}}; we should document it or improve the build definition. > spark-tools build failure for Scala 2.12 > > > Key: SPARK-25298 > URL: https://issues.apache.org/jira/browse/SPARK-25298 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 2.4.0 >Reporter: Darcy Shen >Priority: Major > > $ sbt-- > > ++ 2.12.6 > > compile > > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:22: > object runtime is not a member of package reflect > [error] import scala.reflect.runtime.\{universe => unv} > [error] ^ > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:23: > object runtime is not a member of package reflect > [error] import scala.reflect.runtime.universe.runtimeMirror > [error] ^ > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:41: > not found: value runtimeMirror > [error] private val mirror = runtimeMirror(classLoader) > [error] ^ > [error] > /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:43: > not found: value unv > [error] private def isPackagePrivate(sym: unv.Symbol) = -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25304) enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25304: Assignee: Apache Spark > enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12 > -- > > Key: SPARK-25304 > URL: https://issues.apache.org/jira/browse/SPARK-25304 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Darcy Shen >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-25304) enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599640#comment-16599640 ] Apache Spark commented on SPARK-25304: -- User 'sadhen' has created a pull request for this issue: https://github.com/apache/spark/pull/22308 > enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12 > -- > > Key: SPARK-25304 > URL: https://issues.apache.org/jira/browse/SPARK-25304 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.4.0 >Reporter: Darcy Shen >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-25304) enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
[ https://issues.apache.org/jira/browse/SPARK-25304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25304:
------------------------------------

    Assignee: (was: Apache Spark)

> enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
> ----------------------------------------------------------
>
>                 Key: SPARK-25304
>                 URL: https://issues.apache.org/jira/browse/SPARK-25304
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Darcy Shen
>            Priority: Major
>
[jira] [Commented] (SPARK-8489) Add regression tests for SPARK-8470
[ https://issues.apache.org/jira/browse/SPARK-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599641#comment-16599641 ]

Apache Spark commented on SPARK-8489:
-------------------------------------

User 'sadhen' has created a pull request for this issue:
https://github.com/apache/spark/pull/22308

> Add regression tests for SPARK-8470
> -----------------------------------
>
>                 Key: SPARK-8489
>                 URL: https://issues.apache.org/jira/browse/SPARK-8489
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 1.4.0
>            Reporter: Andrew Or
>            Assignee: Andrew Or
>            Priority: Critical
>             Fix For: 1.4.1, 1.5.0
>
> See SPARK-8470 for more detail. Basically the Spark Hive code silently
> overwrites the context class loader populated in SparkSubmit, resulting in
> certain classes missing when we do reflection in `SQLContext#createDataFrame`.
> That issue is already resolved in https://github.com/apache/spark/pull/6891,
> but we should add a regression test for the specific manifestation of the bug
> in SPARK-8470.
[jira] [Created] (SPARK-25304) enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
Darcy Shen created SPARK-25304:
----------------------------------

             Summary: enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
                 Key: SPARK-25304
                 URL: https://issues.apache.org/jira/browse/SPARK-25304
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 2.4.0
            Reporter: Darcy Shen
[jira] [Assigned] (SPARK-25289) ChiSqSelector max on empty collection
[ https://issues.apache.org/jira/browse/SPARK-25289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen reassigned SPARK-25289:
---------------------------------

    Assignee: Marco Gaido

> ChiSqSelector max on empty collection
> -------------------------------------
>
>                 Key: SPARK-25289
>                 URL: https://issues.apache.org/jira/browse/SPARK-25289
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 2.3.1
>            Reporter: Marie Beaulieu
>            Assignee: Marco Gaido
>            Priority: Major
>             Fix For: 2.4.0
>
> In org.apache.spark.mllib.feature.ChiSqSelector.fit, a max is taken on a
> possibly empty collection.
> I am using Spark 2.3.1.
> Here is an example to reproduce.
> {code:java}
> import org.apache.spark.mllib.feature.ChiSqSelector
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.regression.LabeledPoint
> import org.apache.spark.sql.SQLContext
> val sqlContext = new SQLContext(sc)
> implicit val spark = sqlContext.sparkSession
> val labeledPoints = (0 to 1).map(n => {
>   val v = Vectors.dense((1 to 3).map(_ => n * 1.0).toArray)
>   LabeledPoint(n.toDouble, v)
> })
> val rdd = sc.parallelize(labeledPoints)
> val selector = new ChiSqSelector().setSelectorType("fdr").setFdr(0.05)
> selector.fit(rdd){code}
> Here is the stack trace:
> {code:java}
> java.lang.UnsupportedOperationException: empty.max
> at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
> at scala.collection.mutable.ArrayOps$ofInt.max(ArrayOps.scala:234)
> at org.apache.spark.mllib.feature.ChiSqSelector.fit(ChiSqSelector.scala:280)
> {code}
> Looking at line 280 in ChiSqSelector, it's pretty obvious how the collection
> can be empty. A simple non-empty validation should do the trick.
[jira] [Resolved] (SPARK-25289) ChiSqSelector max on empty collection
[ https://issues.apache.org/jira/browse/SPARK-25289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-25289.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

Issue resolved by pull request 22303
[https://github.com/apache/spark/pull/22303]

> ChiSqSelector max on empty collection
> -------------------------------------
>
>                 Key: SPARK-25289
>                 URL: https://issues.apache.org/jira/browse/SPARK-25289
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>    Affects Versions: 2.3.1
>            Reporter: Marie Beaulieu
>            Assignee: Marco Gaido
>            Priority: Major
>             Fix For: 2.4.0
>
> In org.apache.spark.mllib.feature.ChiSqSelector.fit, a max is taken on a
> possibly empty collection.
> I am using Spark 2.3.1.
> Here is an example to reproduce.
> {code:java}
> import org.apache.spark.mllib.feature.ChiSqSelector
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.regression.LabeledPoint
> import org.apache.spark.sql.SQLContext
> val sqlContext = new SQLContext(sc)
> implicit val spark = sqlContext.sparkSession
> val labeledPoints = (0 to 1).map(n => {
>   val v = Vectors.dense((1 to 3).map(_ => n * 1.0).toArray)
>   LabeledPoint(n.toDouble, v)
> })
> val rdd = sc.parallelize(labeledPoints)
> val selector = new ChiSqSelector().setSelectorType("fdr").setFdr(0.05)
> selector.fit(rdd){code}
> Here is the stack trace:
> {code:java}
> java.lang.UnsupportedOperationException: empty.max
> at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
> at scala.collection.mutable.ArrayOps$ofInt.max(ArrayOps.scala:234)
> at org.apache.spark.mllib.feature.ChiSqSelector.fit(ChiSqSelector.scala:280)
> {code}
> Looking at line 280 in ChiSqSelector, it's pretty obvious how the collection
> can be empty. A simple non-empty validation should do the trick.
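The "simple non-empty validation" suggested in the report can be sketched in plain Scala (a hypothetical guard for illustration, not the actual ChiSqSelector patch merged in PR 22303): only call `.max` when the selected indices are non-empty, so an empty selection yields `None` instead of throwing `UnsupportedOperationException: empty.max`.

```scala
// Hypothetical sketch of the suggested validation: guard the `.max` call so
// that an empty selection (e.g. no feature passes the FDR threshold) returns
// None instead of throwing UnsupportedOperationException: empty.max.
object MaxGuard {
  def safeMaxIndex(selectedIndices: Array[Int]): Option[Int] =
    if (selectedIndices.nonEmpty) Some(selectedIndices.max) else None
}
```

With a guard like this, fitting on data where no feature survives selection would fall through to an empty result rather than crash.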
[jira] [Assigned] (SPARK-24615) Accelerator-aware task scheduling for Spark
[ https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Saisai Shao reassigned SPARK-24615:
-----------------------------------

    Assignee: (was: Saisai Shao)

> Accelerator-aware task scheduling for Spark
> -------------------------------------------
>
>                 Key: SPARK-24615
>                 URL: https://issues.apache.org/jira/browse/SPARK-24615
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Saisai Shao
>            Priority: Major
>              Labels: Hydrogen, SPIP
>
> In the machine learning area, accelerator cards (GPU, FPGA, TPU) are
> predominant compared to CPUs. To make the current Spark architecture work
> with accelerator cards, Spark itself should understand the existence of
> accelerators and know how to schedule tasks onto executors that are equipped
> with accelerators.
> Spark's current scheduler schedules tasks based on the locality of the data
> plus the availability of CPUs. This introduces some problems when scheduling
> tasks that require accelerators:
> # CPU cores usually outnumber accelerators on one node, so using CPU cores
> to schedule accelerator-required tasks introduces a mismatch.
> # In one cluster we can always assume that CPUs are present in each node,
> but this is not true of accelerator cards.
> # The existence of heterogeneous tasks (accelerator-required or not)
> requires the scheduler to schedule tasks in a smart way.
> So we propose to improve the current scheduler to support heterogeneous
> tasks (accelerator-required or not). This can be part of the work of Project
> Hydrogen.
> Details are attached in a Google doc. It doesn't cover all the
> implementation details; it just highlights the parts that should be changed.
>
> CC [~yanboliang] [~merlintang]
[jira] [Commented] (SPARK-25297) Future for Scala 2.12 will block on a already shutdown ExecutionContext
[ https://issues.apache.org/jira/browse/SPARK-25297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599580#comment-16599580 ]

Darcy Shen commented on SPARK-25297:
------------------------------------

This issue has been fixed by https://github.com/apache/spark/pull/22292

> Future for Scala 2.12 will block on a already shutdown ExecutionContext
> -----------------------------------------------------------------------
>
>                 Key: SPARK-25297
>                 URL: https://issues.apache.org/jira/browse/SPARK-25297
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Darcy Shen
>            Priority: Major
>
> See
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/193/
> The unit tests block on FileBasedWriteAheadLogWithFileCloseAfterWriteSuite
> in the console output.
[jira] [Commented] (SPARK-25279) Throw exception: zzcclp java.io.NotSerializableException: org.apache.spark.sql.TypedColumn in Spark-shell when run example of doc
[ https://issues.apache.org/jira/browse/SPARK-25279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599574#comment-16599574 ]

Zhichao Zhang commented on SPARK-25279:
---------------------------------------

[~dkbiswal], thank you. Do you mean that you tested with the latest code on
branch 2.2 and it works fine?

> Throw exception: zzcclp java.io.NotSerializableException:
> org.apache.spark.sql.TypedColumn in Spark-shell when run example of doc
> --------------------------------------------------------------------------
>
>                 Key: SPARK-25279
>                 URL: https://issues.apache.org/jira/browse/SPARK-25279
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell, SQL
>    Affects Versions: 2.2.1
>            Reporter: Zhichao Zhang
>            Priority: Minor
>
> Hi dev:
> I am using Spark-Shell to run the example which is in section
> 'http://spark.apache.org/docs/2.2.2/sql-programming-guide.html#type-safe-user-defined-aggregate-functions',
> and there is an error:
> {code:java}
> Caused by: java.io.NotSerializableException:
> org.apache.spark.sql.TypedColumn
> Serialization stack:
> - object not serializable (class: org.apache.spark.sql.TypedColumn, value:
> myaverage() AS `average_salary`)
> - field (class: $iw, name: averageSalary, type: class
> org.apache.spark.sql.TypedColumn)
> - object (class $iw, $iw@4b2f8ae9)
> - field (class: MyAverage$, name: $outer, type: class $iw)
> - object (class MyAverage$, MyAverage$@2be41d90)
> - field (class:
> org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression,
> name: aggregator, type: class org.apache.spark.sql.expressions.Aggregator)
> - object (class
> org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression,
> MyAverage(Employee))
> - field (class:
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression,
> name: aggregateFunction, type: class
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction)
> - object (class
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression,
> partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class Employee)),
> Some(class Employee), Some(StructType(StructField(name,StringType,true),
> StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0,
> Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0,
> Average, true])).count AS count#26L, newInstance(class Average), input[0,
> double, false] AS value#24, DoubleType, false, 0, 0))
> - writeObject data (class:
> scala.collection.immutable.List$SerializationProxy)
> - object (class scala.collection.immutable.List$SerializationProxy,
> scala.collection.immutable.List$SerializationProxy@5e92c46f)
> - writeReplace data (class:
> scala.collection.immutable.List$SerializationProxy)
> - object (class scala.collection.immutable.$colon$colon,
> List(partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class
> Employee)), Some(class Employee),
> Some(StructType(StructField(name,StringType,true),
> StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0,
> Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0,
> Average, true])).count AS count#26L, newInstance(class Average), input[0,
> double, false] AS value#24, DoubleType, false, 0, 0)))
> - field (class:
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec, name:
> aggregateExpressions, type: interface scala.collection.Seq)
> - object (class
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec,
> ObjectHashAggregate(keys=[],
> functions=[partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class
> Employee)), Some(class Employee),
> Some(StructType(StructField(name,StringType,true),
> StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0,
> Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0,
> Average, true])).count AS count#26L, newInstance(class Average), input[0,
> double, false] AS value#24, DoubleType, false, 0, 0)], output=[buf#37])
> +- *FileScan json [name#8,salary#9L] Batched: false, Format: JSON, Location:
> InMemoryFileIndex[file:/opt/spark2/examples/src/main/resources/employees.json],
> PartitionFilters: [], PushedFilters: [], ReadSchema:
> struct
> )
> - field (class:
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1,
> name: $outer, type: class
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec)
> - object (class
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1,
> )
> - field (class:
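The serialization stack above bottoms out at the spark-shell wrapper class `$iw`, which the `TypedColumn` drags into the object graph. The general mechanism can be shown in plain Scala, independent of Spark (an illustration only, not Spark's own code; `Wrapper` here plays the role of the REPL's `$iw`): Java serialization fails as soon as the graph reaches a value that does not implement `Serializable`.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// A Serializable holder whose field may reference something that is not
// Serializable -- serializing the holder then fails, just as serializing the
// aggregate expression fails once it reaches the captured $iw wrapper.
class Wrapper(val captured: AnyRef) extends Serializable

object SerializationDemo {
  // Returns true if obj survives a round of Java serialization.
  def serializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

A plain `String` serializes fine, while a `Wrapper` around a bare `new Object()` throws `NotSerializableException`, mirroring the failure in the report.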
[jira] [Resolved] (SPARK-25290) BytesToBytesMapOnHeapSuite randomizedStressTest can cause OutOfMemoryError
[ https://issues.apache.org/jira/browse/SPARK-25290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-25290.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.4.0

Issue resolved by pull request 22297
[https://github.com/apache/spark/pull/22297]

> BytesToBytesMapOnHeapSuite randomizedStressTest can cause OutOfMemoryError
> --------------------------------------------------------------------------
>
>                 Key: SPARK-25290
>                 URL: https://issues.apache.org/jira/browse/SPARK-25290
>             Project: Spark
>          Issue Type: Test
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Liang-Chi Hsieh
>            Assignee: Liang-Chi Hsieh
>            Priority: Major
>             Fix For: 2.4.0
>
> BytesToBytesMapOnHeapSuite randomizedStressTest caused OutOfMemoryError on
> several test runs. Seems better to reduce memory usage in this test.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95369/testReport/org.apache.spark.unsafe.map/BytesToBytesMapOnHeapSuite/randomizedStressTest/
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95482/testReport/org.apache.spark.unsafe.map/BytesToBytesMapOnHeapSuite/randomizedStressTest/
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95501/testReport/org.apache.spark.unsafe.map/BytesToBytesMapOnHeapSuite/randomizedStressTest/
[jira] [Assigned] (SPARK-25290) BytesToBytesMapOnHeapSuite randomizedStressTest can cause OutOfMemoryError
[ https://issues.apache.org/jira/browse/SPARK-25290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-25290:
-----------------------------------

    Assignee: Liang-Chi Hsieh

> BytesToBytesMapOnHeapSuite randomizedStressTest can cause OutOfMemoryError
> --------------------------------------------------------------------------
>
>                 Key: SPARK-25290
>                 URL: https://issues.apache.org/jira/browse/SPARK-25290
>             Project: Spark
>          Issue Type: Test
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Liang-Chi Hsieh
>            Assignee: Liang-Chi Hsieh
>            Priority: Major
>             Fix For: 2.4.0
>
> BytesToBytesMapOnHeapSuite randomizedStressTest caused OutOfMemoryError on
> several test runs. Seems better to reduce memory usage in this test.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95369/testReport/org.apache.spark.unsafe.map/BytesToBytesMapOnHeapSuite/randomizedStressTest/
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95482/testReport/org.apache.spark.unsafe.map/BytesToBytesMapOnHeapSuite/randomizedStressTest/
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95501/testReport/org.apache.spark.unsafe.map/BytesToBytesMapOnHeapSuite/randomizedStressTest/
[jira] [Commented] (SPARK-25303) A DStream that is checkpointed should allow its parent(s) to be removed and not persisted
[ https://issues.apache.org/jira/browse/SPARK-25303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599566#comment-16599566 ]

Nikunj Bansal commented on SPARK-25303:
---------------------------------------

I have a potential fix for this and SPARK-25202 available.

> A DStream that is checkpointed should allow its parent(s) to be removed and
> not persisted
> --------------------------------------------------------------------------
>
>                 Key: SPARK-25303
>                 URL: https://issues.apache.org/jira/browse/SPARK-25303
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0,
> 2.2.1, 2.2.2, 2.3.0, 2.3.1
>            Reporter: Nikunj Bansal
>            Priority: Major
>              Labels: Streaming, streaming
>
> A checkpointed DStream is supposed to cut the lineage to its parent(s) so
> that any persisted RDDs for the parent(s) are removed. However, combined
> with the issue in SPARK-25302, this results in the input stream RDDs being
> persisted a lot longer than they are actually required.
> See also the related bug SPARK-25302.
[jira] [Commented] (SPARK-25302) ReducedWindowedDStream not using checkpoints for reduced RDDs
[ https://issues.apache.org/jira/browse/SPARK-25302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599565#comment-16599565 ]

Nikunj Bansal commented on SPARK-25302:
---------------------------------------

I have a potential fix for this and SPARK-25303 available.

> ReducedWindowedDStream not using checkpoints for reduced RDDs
> -------------------------------------------------------------
>
>                 Key: SPARK-25302
>                 URL: https://issues.apache.org/jira/browse/SPARK-25302
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0,
> 2.2.1, 2.2.2, 2.3.0, 2.3.1
>            Reporter: Nikunj Bansal
>            Priority: Major
>              Labels: Streaming, streaming
>
> When using reduceByKeyAndWindow() with an inverse reduce function, Spark
> eventually creates a ReducedWindowedDStream. This class creates a reduced
> DStream but only persists it and does not checkpoint it. The result is that
> it ends up using cached RDDs and does not cut the lineage to the input
> DStream, so the input RDDs are eventually cached for much longer than they
> are needed.
[jira] [Commented] (SPARK-25302) ReducedWindowedDStream not using checkpoints for reduced RDDs
[ https://issues.apache.org/jira/browse/SPARK-25302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599563#comment-16599563 ]

Nikunj Bansal commented on SPARK-25302:
---------------------------------------

See also related issue SPARK-25303

> ReducedWindowedDStream not using checkpoints for reduced RDDs
> -------------------------------------------------------------
>
>                 Key: SPARK-25302
>                 URL: https://issues.apache.org/jira/browse/SPARK-25302
>             Project: Spark
>          Issue Type: Bug
>          Components: DStreams
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0,
> 2.2.1, 2.2.2, 2.3.0, 2.3.1
>            Reporter: Nikunj Bansal
>            Priority: Major
>              Labels: Streaming, streaming
>
> When using reduceByKeyAndWindow() with an inverse reduce function, Spark
> eventually creates a ReducedWindowedDStream. This class creates a reduced
> DStream but only persists it and does not checkpoint it. The result is that
> it ends up using cached RDDs and does not cut the lineage to the input
> DStream, so the input RDDs are eventually cached for much longer than they
> are needed.
[jira] [Created] (SPARK-25303) A DStream that is checkpointed should allow its parent(s) to be removed and not persisted
Nikunj Bansal created SPARK-25303:
-------------------------------------

             Summary: A DStream that is checkpointed should allow its parent(s) to be removed and not persisted
                 Key: SPARK-25303
                 URL: https://issues.apache.org/jira/browse/SPARK-25303
             Project: Spark
          Issue Type: Bug
          Components: DStreams
    Affects Versions: 2.3.1, 2.3.0, 2.2.2, 2.2.1, 2.2.0, 2.1.3, 2.1.2, 2.1.1, 2.1.0, 2.0.2, 2.0.1, 2.0.0
            Reporter: Nikunj Bansal


A checkpointed DStream is supposed to cut the lineage to its parent(s) so
that any persisted RDDs for the parent(s) are removed. However, combined with
the issue in SPARK-25302, this results in the input stream RDDs being
persisted a lot longer than they are actually required.

See also the related bug SPARK-25302.
[jira] [Created] (SPARK-25302) ReducedWindowedDStream not using checkpoints for reduced RDDs
Nikunj Bansal created SPARK-25302:
-------------------------------------

             Summary: ReducedWindowedDStream not using checkpoints for reduced RDDs
                 Key: SPARK-25302
                 URL: https://issues.apache.org/jira/browse/SPARK-25302
             Project: Spark
          Issue Type: Bug
          Components: DStreams
    Affects Versions: 2.3.1, 2.3.0, 2.2.2, 2.2.1, 2.2.0, 2.1.3, 2.1.2, 2.1.1, 2.1.0, 2.0.2, 2.0.1, 2.0.0
            Reporter: Nikunj Bansal


When using reduceByKeyAndWindow() with an inverse reduce function, Spark
eventually creates a ReducedWindowedDStream. This class creates a reduced
DStream but only persists it and does not checkpoint it. The result is that
it ends up using cached RDDs and does not cut the lineage to the input
DStream, so the input RDDs are eventually cached for much longer than they
are needed.
[jira] [Comment Edited] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one
[ https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599557#comment-16599557 ]

Wenchen Fan edited comment on SPARK-23253 at 9/1/18 7:05 AM:
-------------------------------------------------------------

cc [~joshrosen] [~zsxwing] [~r...@databricks.com] [~jiangxb1987]

was (Author: cloud_fan):
cc [~joshrosen] [~zsxwing] [~r...@databricks.com]

> Only write shuffle temporary index file when there is not an existing one
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23253
>                 URL: https://issues.apache.org/jira/browse/SPARK-23253
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.2.1
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Major
>             Fix For: 2.4.0
>
> The shuffle index temporary file is used for atomically creating the shuffle
> index file; it is not needed when the index file already exists because
> another attempt of the same task has already created it.
[jira] [Commented] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one
[ https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599557#comment-16599557 ]

Wenchen Fan commented on SPARK-23253:
-------------------------------------

cc [~joshrosen] [~zsxwing] [~r...@databricks.com]

> Only write shuffle temporary index file when there is not an existing one
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23253
>                 URL: https://issues.apache.org/jira/browse/SPARK-23253
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.2.1
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Major
>             Fix For: 2.4.0
>
> The shuffle index temporary file is used for atomically creating the shuffle
> index file; it is not needed when the index file already exists because
> another attempt of the same task has already created it.
[jira] [Commented] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one
[ https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599551#comment-16599551 ]

Wenchen Fan commented on SPARK-23253:
-------------------------------------

This is dangerous: we can only skip the shuffle write if the data in the
existing shuffle file are exactly the same as the data we are going to write,
but in the PR we only check the size. We could use a checksum to quickly
check whether the data are the same.

This caused a problem in https://github.com/apache/spark/pull/22112 , and I'm
reverting it in my PR. We should revisit this optimization later.

> Only write shuffle temporary index file when there is not an existing one
> -------------------------------------------------------------------------
>
>                 Key: SPARK-23253
>                 URL: https://issues.apache.org/jira/browse/SPARK-23253
>             Project: Spark
>          Issue Type: Improvement
>          Components: Shuffle, Spark Core
>    Affects Versions: 2.2.1
>            Reporter: Kent Yao
>            Assignee: Kent Yao
>            Priority: Major
>             Fix For: 2.4.0
>
> The shuffle index temporary file is used for atomically creating the shuffle
> index file; it is not needed when the index file already exists because
> another attempt of the same task has already created it.
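The checksum comparison suggested in the comment above can be sketched in plain Scala with the JDK's CRC32 (a hypothetical helper for illustration only, not Spark's actual shuffle code): treat the existing index data as "the same" only when both the length and a cheap checksum of the bytes match, rather than comparing sizes alone.

```scala
import java.util.zip.CRC32

// Hypothetical sketch of the suggested check: compare sizes first (cheap),
// then a CRC32 checksum of the bytes, and only skip the write when both match.
object ChecksumCheck {
  private def crc32(bytes: Array[Byte]): Long = {
    val c = new CRC32()
    c.update(bytes, 0, bytes.length)
    c.getValue
  }

  def sameData(existing: Array[Byte], incoming: Array[Byte]): Boolean =
    existing.length == incoming.length && crc32(existing) == crc32(incoming)
}
```

A size check alone would wrongly treat two equal-length files with different contents as identical; the checksum catches that case without a full byte-by-byte comparison of the two files.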