[jira] [Commented] (SPARK-25308) ArrayContains function may return an error in the code generation phase.

2018-09-01 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599951#comment-16599951
 ] 

Apache Spark commented on SPARK-25308:
--

User 'dilipbiswal' has created a pull request for this issue:
https://github.com/apache/spark/pull/22315

> ArrayContains function may return an error in the code generation phase.
> ---
>
> Key: SPARK-25308
> URL: https://issues.apache.org/jira/browse/SPARK-25308
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Dilip Biswal
>Priority: Major
>
> Invoking the ArrayContains function with a non-nullable array type throws 
> the following error in the code generation phase.
> {code}
> Code generation of array_contains([1,2,3], 1) failed:
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 40, Column 11: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 40, Column 11: Expression "isNull_0" is not an rvalue
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 40, Column 11: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 40, Column 11: Expression "isNull_0" is not an rvalue
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
>   at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at 
> com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)
> {code}






[jira] [Assigned] (SPARK-25308) ArrayContains function may return an error in the code generation phase.

2018-09-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25308:


Assignee: Apache Spark

> ArrayContains function may return an error in the code generation phase.
> ---
>
> Key: SPARK-25308
> URL: https://issues.apache.org/jira/browse/SPARK-25308
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Dilip Biswal
>Assignee: Apache Spark
>Priority: Major
>
> Invoking the ArrayContains function with a non-nullable array type throws 
> the following error in the code generation phase.
> {code}
> Code generation of array_contains([1,2,3], 1) failed:
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 40, Column 11: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 40, Column 11: Expression "isNull_0" is not an rvalue
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 40, Column 11: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 40, Column 11: Expression "isNull_0" is not an rvalue
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
>   at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at 
> com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)
> {code}






[jira] [Assigned] (SPARK-25308) ArrayContains function may return an error in the code generation phase.

2018-09-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25308:


Assignee: (was: Apache Spark)

> ArrayContains function may return an error in the code generation phase.
> ---
>
> Key: SPARK-25308
> URL: https://issues.apache.org/jira/browse/SPARK-25308
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Dilip Biswal
>Priority: Major
>
> Invoking the ArrayContains function with a non-nullable array type throws 
> the following error in the code generation phase.
> {code}
> Code generation of array_contains([1,2,3], 1) failed:
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 40, Column 11: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 40, Column 11: Expression "isNull_0" is not an rvalue
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 40, Column 11: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 40, Column 11: Expression "isNull_0" is not an rvalue
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
>   at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at 
> com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)
> {code}






[jira] [Created] (SPARK-25308) ArrayContains function may return an error in the code generation phase.

2018-09-01 Thread Dilip Biswal (JIRA)
Dilip Biswal created SPARK-25308:


 Summary: ArrayContains function may return an error in the code 
generation phase.
 Key: SPARK-25308
 URL: https://issues.apache.org/jira/browse/SPARK-25308
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.1
Reporter: Dilip Biswal


Invoking the ArrayContains function with a non-nullable array type throws the 
following error in the code generation phase.

{code}
Code generation of array_contains([1,2,3], 1) failed:
java.util.concurrent.ExecutionException: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 40, 
Column 11: failed to compile: org.codehaus.commons.compiler.CompileException: 
File 'generated.java', Line 40, Column 11: Expression "isNull_0" is not an 
rvalue
java.util.concurrent.ExecutionException: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 40, 
Column 11: failed to compile: org.codehaus.commons.compiler.CompileException: 
File 'generated.java', Line 40, Column 11: Expression "isNull_0" is not an 
rvalue
at 
com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at 
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
at 
com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
at 
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
at 
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
at 
com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)
{code}
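
A reproduction sketch from a spark-shell, assuming a Spark 2.3.1 session where 
`spark` is the active SparkSession; the literal array is non-nullable, which is 
the case the description says fails:

{code:scala}
// SQL form: array(1, 2, 3) is a non-nullable literal array, so this should
// hit the failing code-generation path from the report.
spark.sql("SELECT array_contains(array(1, 2, 3), 1)").show()

// Equivalent DataFrame API form.
import org.apache.spark.sql.functions.{array, array_contains, lit}
spark.range(1).select(array_contains(array(lit(1), lit(2), lit(3)), 1)).show()
{code}

Based on the error text, the generated null-tracking variable `isNull_0` seems 
to be referenced where Janino cannot treat it as a value when the child is 
non-nullable; the linked pull request contains the actual fix.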






[jira] [Commented] (SPARK-25307) ArraySort function may return an error in the code generation phase.

2018-09-01 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599933#comment-16599933
 ] 

Apache Spark commented on SPARK-25307:
--

User 'dilipbiswal' has created a pull request for this issue:
https://github.com/apache/spark/pull/22314

> ArraySort function may return an error in the code generation phase.
> ---
>
> Key: SPARK-25307
> URL: https://issues.apache.org/jira/browse/SPARK-25307
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Dilip Biswal
>Priority: Major
>
> Sorting an array of non-nullable booleans causes a compilation error in the 
> code generation phase. Below is the compilation error:
> {code:java}
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 51, Column 23: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 51, Column 23: No applicable constructor/method found for actual parameters 
> "boolean[]"; candidates are: "public static void 
> java.util.Arrays.sort(long[])", "public static void 
> java.util.Arrays.sort(long[], int, int)", "public static void 
> java.util.Arrays.sort(byte[], int, int)", "public static void 
> java.util.Arrays.sort(float[])", "public static void 
> java.util.Arrays.sort(float[], int, int)", "public static void 
> java.util.Arrays.sort(char[])", "public static void 
> java.util.Arrays.sort(char[], int, int)", "public static void 
> java.util.Arrays.sort(short[], int, int)", "public static void 
> java.util.Arrays.sort(short[])", "public static void 
> java.util.Arrays.sort(byte[])", "public static void 
> java.util.Arrays.sort(java.lang.Object[], int, int, java.util.Comparator)", 
> "public static void java.util.Arrays.sort(java.lang.Object[], 
> java.util.Comparator)", "public static void java.util.Arrays.sort(int[])", 
> "public static void java.util.Arrays.sort(java.lang.Object[], int, int)", 
> "public static void java.util.Arrays.sort(java.lang.Object[])", "public 
> static void java.util.Arrays.sort(double[])", "public static void 
> java.util.Arrays.sort(double[], int, int)", "public static void 
> java.util.Arrays.sort(int[], int, int)"
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
>   at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at 
> com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)
>  {code}






[jira] [Assigned] (SPARK-25307) ArraySort function may return an error in the code generation phase.

2018-09-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25307:


Assignee: (was: Apache Spark)

> ArraySort function may return an error in the code generation phase.
> ---
>
> Key: SPARK-25307
> URL: https://issues.apache.org/jira/browse/SPARK-25307
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Dilip Biswal
>Priority: Major
>
> Sorting an array of non-nullable booleans causes a compilation error in the 
> code generation phase. Below is the compilation error:
> {code:java}
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 51, Column 23: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 51, Column 23: No applicable constructor/method found for actual parameters 
> "boolean[]"; candidates are: "public static void 
> java.util.Arrays.sort(long[])", "public static void 
> java.util.Arrays.sort(long[], int, int)", "public static void 
> java.util.Arrays.sort(byte[], int, int)", "public static void 
> java.util.Arrays.sort(float[])", "public static void 
> java.util.Arrays.sort(float[], int, int)", "public static void 
> java.util.Arrays.sort(char[])", "public static void 
> java.util.Arrays.sort(char[], int, int)", "public static void 
> java.util.Arrays.sort(short[], int, int)", "public static void 
> java.util.Arrays.sort(short[])", "public static void 
> java.util.Arrays.sort(byte[])", "public static void 
> java.util.Arrays.sort(java.lang.Object[], int, int, java.util.Comparator)", 
> "public static void java.util.Arrays.sort(java.lang.Object[], 
> java.util.Comparator)", "public static void java.util.Arrays.sort(int[])", 
> "public static void java.util.Arrays.sort(java.lang.Object[], int, int)", 
> "public static void java.util.Arrays.sort(java.lang.Object[])", "public 
> static void java.util.Arrays.sort(double[])", "public static void 
> java.util.Arrays.sort(double[], int, int)", "public static void 
> java.util.Arrays.sort(int[], int, int)"
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
>   at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at 
> com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)
>  {code}






[jira] [Assigned] (SPARK-25307) ArraySort function may return an error in the code generation phase.

2018-09-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25307:


Assignee: Apache Spark

> ArraySort function may return an error in the code generation phase.
> ---
>
> Key: SPARK-25307
> URL: https://issues.apache.org/jira/browse/SPARK-25307
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Dilip Biswal
>Assignee: Apache Spark
>Priority: Major
>
> Sorting an array of non-nullable booleans causes a compilation error in the 
> code generation phase. Below is the compilation error:
> {code:java}
> java.util.concurrent.ExecutionException: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 51, Column 23: failed to compile: 
> org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
> 51, Column 23: No applicable constructor/method found for actual parameters 
> "boolean[]"; candidates are: "public static void 
> java.util.Arrays.sort(long[])", "public static void 
> java.util.Arrays.sort(long[], int, int)", "public static void 
> java.util.Arrays.sort(byte[], int, int)", "public static void 
> java.util.Arrays.sort(float[])", "public static void 
> java.util.Arrays.sort(float[], int, int)", "public static void 
> java.util.Arrays.sort(char[])", "public static void 
> java.util.Arrays.sort(char[], int, int)", "public static void 
> java.util.Arrays.sort(short[], int, int)", "public static void 
> java.util.Arrays.sort(short[])", "public static void 
> java.util.Arrays.sort(byte[])", "public static void 
> java.util.Arrays.sort(java.lang.Object[], int, int, java.util.Comparator)", 
> "public static void java.util.Arrays.sort(java.lang.Object[], 
> java.util.Comparator)", "public static void java.util.Arrays.sort(int[])", 
> "public static void java.util.Arrays.sort(java.lang.Object[], int, int)", 
> "public static void java.util.Arrays.sort(java.lang.Object[])", "public 
> static void java.util.Arrays.sort(double[])", "public static void 
> java.util.Arrays.sort(double[], int, int)", "public static void 
> java.util.Arrays.sort(int[], int, int)"
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
>   at 
> com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
>   at 
> com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>   at 
> com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
>   at 
> com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
>   at 
> com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
>   at 
> com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
>   at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
>   at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
>   at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
>   at 
> org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)
>  {code}






[jira] [Created] (SPARK-25307) ArraySort function may return an error in the code generation phase.

2018-09-01 Thread Dilip Biswal (JIRA)
Dilip Biswal created SPARK-25307:


 Summary: ArraySort function may return an error in the code 
generation phase.
 Key: SPARK-25307
 URL: https://issues.apache.org/jira/browse/SPARK-25307
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.3.1
Reporter: Dilip Biswal


Sorting an array of non-nullable booleans causes a compilation error in the 
code generation phase. Below is the compilation error:
{code:java}
java.util.concurrent.ExecutionException: 
org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 51, 
Column 23: failed to compile: org.codehaus.commons.compiler.CompileException: 
File 'generated.java', Line 51, Column 23: No applicable constructor/method 
found for actual parameters "boolean[]"; candidates are: "public static void 
java.util.Arrays.sort(long[])", "public static void 
java.util.Arrays.sort(long[], int, int)", "public static void 
java.util.Arrays.sort(byte[], int, int)", "public static void 
java.util.Arrays.sort(float[])", "public static void 
java.util.Arrays.sort(float[], int, int)", "public static void 
java.util.Arrays.sort(char[])", "public static void 
java.util.Arrays.sort(char[], int, int)", "public static void 
java.util.Arrays.sort(short[], int, int)", "public static void 
java.util.Arrays.sort(short[])", "public static void 
java.util.Arrays.sort(byte[])", "public static void 
java.util.Arrays.sort(java.lang.Object[], int, int, java.util.Comparator)", 
"public static void java.util.Arrays.sort(java.lang.Object[], 
java.util.Comparator)", "public static void java.util.Arrays.sort(int[])", 
"public static void java.util.Arrays.sort(java.lang.Object[], int, int)", 
"public static void java.util.Arrays.sort(java.lang.Object[])", "public static 
void java.util.Arrays.sort(double[])", "public static void 
java.util.Arrays.sort(double[], int, int)", "public static void 
java.util.Arrays.sort(int[], int, int)"
at 
com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:306)
at 
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:293)
at 
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
at 
com.google.common.util.concurrent.Uninterruptibles.getUninterruptibly(Uninterruptibles.java:135)
at 
com.google.common.cache.LocalCache$Segment.getAndRecordStats(LocalCache.java:2410)
at 
com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2380)
at 
com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2257)
at com.google.common.cache.LocalCache.get(LocalCache.java:4000)
at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:4004)
at 
com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
at 
org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.compile(CodeGenerator.scala:1305)

 {code}
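
The overload list in the error already hints at the root cause: 
`java.util.Arrays.sort` has no `boolean[]` overload, so generated code that 
tries to sort a primitive boolean array cannot compile. A reproduction sketch, 
assuming a spark-shell on a build where the `array_sort` function exists (it 
was targeted at 2.4.0):

{code:scala}
// The boolean literals yield a non-nullable boolean array, which appears to
// route code generation to a java.util.Arrays.sort call that does not exist.
spark.sql("SELECT array_sort(array(true, false, true))").show()
{code}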






[jira] [Resolved] (SPARK-10697) Lift Calculation in Association Rule mining

2018-09-01 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-10697.
---
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 22236
[https://github.com/apache/spark/pull/22236]

> Lift Calculation in Association Rule mining
> ---
>
> Key: SPARK-10697
> URL: https://issues.apache.org/jira/browse/SPARK-10697
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Yashwanth Kumar
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 2.4.0
>
>
> Lift is to be calculated for association rule mining in 
> AssociationRules.scala under FPM.
> Lift is a measure of the performance of an association rule.
> Adding lift will help compare the quality of the mined rules.
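
For reference, lift compares a rule's observed confidence with what 
independence would predict: lift(X => Y) = confidence(X => Y) / support(Y). A 
standalone sketch of that computation follows; the function and parameter 
names are illustrative, not MLlib's actual API:

{code:scala}
// Lift from raw itemset frequencies; all names here are illustrative.
def lift(freqUnion: Long,       // transactions containing both X and Y
         freqAntecedent: Long,  // transactions containing X
         freqConsequent: Long,  // transactions containing Y
         numTransactions: Long): Double = {
  val confidence = freqUnion.toDouble / freqAntecedent
  val consequentSupport = freqConsequent.toDouble / numTransactions
  confidence / consequentSupport
}

// lift > 1 means X and Y co-occur more often than independent items would,
// so higher lift indicates a more informative rule.
{code}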






[jira] [Assigned] (SPARK-10697) Lift Calculation in Association Rule mining

2018-09-01 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-10697:
-

Assignee: Marco Gaido

> Lift Calculation in Association Rule mining
> ---
>
> Key: SPARK-10697
> URL: https://issues.apache.org/jira/browse/SPARK-10697
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Yashwanth Kumar
>Assignee: Marco Gaido
>Priority: Minor
> Fix For: 2.4.0
>
>
> Lift is to be calculated for association rule mining in 
> AssociationRules.scala under FPM.
> Lift is a measure of the performance of an association rule.
> Adding lift will help compare the quality of the mined rules.






[jira] [Updated] (SPARK-25306) Use cache to speed up `createFilter` in ORC

2018-09-01 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25306:
--
Affects Version/s: 1.6.3

> Use cache to speed up `createFilter` in ORC
> ---
>
> Key: SPARK-25306
> URL: https://issues.apache.org/jira/browse/SPARK-25306
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.2, 2.3.1, 2.4.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> In the ORC data source, the `createFilter` function has exponential time 
> complexity due to lack of memoization, as the following reproduction shows. 
> This issue aims to improve it.
> *REPRODUCE*
> {code}
> // Create and read 1 row table with 1000 columns
> sql("set spark.sql.orc.filterPushdown=true")
> val selectExpr = (1 to 1000).map(i => s"id c$i")
> spark.range(1).selectExpr(selectExpr: 
> _*).write.mode("overwrite").orc("/tmp/orc")
> print(s"With 0 filters, ")
> spark.time(spark.read.orc("/tmp/orc").count)
> // Increase the number of filters
> (20 to 30).foreach { width =>
>   val whereExpr = (1 to width).map(i => s"c$i is not null").mkString(" and ")
>   print(s"With $width filters, ")
>   spark.time(spark.read.orc("/tmp/orc").where(whereExpr).count)
> }
> {code}
> *RESULT*
> {code}
> With 0 filters, Time taken: 653 ms
> With 20 filters, Time taken: 962 ms
> With 21 filters, Time taken: 1282 ms
> With 22 filters, Time taken: 1982 ms
> With 23 filters, Time taken: 3855 ms
> With 24 filters, Time taken: 6719 ms
> With 25 filters, Time taken: 12669 ms
> With 26 filters, Time taken: 25032 ms
> With 27 filters, Time taken: 49585 ms
> With 28 filters, Time taken: 98980 ms    // over 1 min 38 seconds
> With 29 filters, Time taken: 198368 ms   // over 3 mins
> With 30 filters, Time taken: 393744 ms   // over 6 mins
> {code}






[jira] [Updated] (SPARK-25306) Use cache to speed up `createFilter` in ORC

2018-09-01 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25306:
--
Affects Version/s: 2.0.2

> Use cache to speed up `createFilter` in ORC
> ---
>
> Key: SPARK-25306
> URL: https://issues.apache.org/jira/browse/SPARK-25306
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.2, 2.1.3, 2.2.2, 2.3.1, 2.4.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> In the ORC data source, the `createFilter` function has exponential time 
> complexity due to lack of memoization, as the following reproduction shows. 
> This issue aims to improve it.
> *REPRODUCE*
> {code}
> // Create and read 1 row table with 1000 columns
> sql("set spark.sql.orc.filterPushdown=true")
> val selectExpr = (1 to 1000).map(i => s"id c$i")
> spark.range(1).selectExpr(selectExpr: 
> _*).write.mode("overwrite").orc("/tmp/orc")
> print(s"With 0 filters, ")
> spark.time(spark.read.orc("/tmp/orc").count)
> // Increase the number of filters
> (20 to 30).foreach { width =>
>   val whereExpr = (1 to width).map(i => s"c$i is not null").mkString(" and ")
>   print(s"With $width filters, ")
>   spark.time(spark.read.orc("/tmp/orc").where(whereExpr).count)
> }
> {code}
> *RESULT*
> {code}
> With 0 filters, Time taken: 653 ms
> With 20 filters, Time taken: 962 ms
> With 21 filters, Time taken: 1282 ms
> With 22 filters, Time taken: 1982 ms
> With 23 filters, Time taken: 3855 ms
> With 24 filters, Time taken: 6719 ms
> With 25 filters, Time taken: 12669 ms
> With 26 filters, Time taken: 25032 ms
> With 27 filters, Time taken: 49585 ms
> With 28 filters, Time taken: 98980 ms    // over 1 min 38 seconds
> With 29 filters, Time taken: 198368 ms   // over 3 mins
> With 30 filters, Time taken: 393744 ms   // over 6 mins
> {code}






[jira] [Updated] (SPARK-25306) Use cache to speed up `createFilter` in ORC

2018-09-01 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25306:
--
Description: 
In the ORC data source, the `createFilter` function has exponential time 
complexity due to lack of memoization, as the following reproduction shows. 
This issue aims to improve it.

*REPRODUCE*
{code}
// Create and read 1 row table with 1000 columns
sql("set spark.sql.orc.filterPushdown=true")
val selectExpr = (1 to 1000).map(i => s"id c$i")
spark.range(1).selectExpr(selectExpr: 
_*).write.mode("overwrite").orc("/tmp/orc")
print(s"With 0 filters, ")
spark.time(spark.read.orc("/tmp/orc").count)

// Increase the number of filters
(20 to 30).foreach { width =>
  val whereExpr = (1 to width).map(i => s"c$i is not null").mkString(" and ")
  print(s"With $width filters, ")
  spark.time(spark.read.orc("/tmp/orc").where(whereExpr).count)
}
{code}

*RESULT*
{code}
With 0 filters, Time taken: 653 ms  
With 20 filters, Time taken: 962 ms
With 21 filters, Time taken: 1282 ms
With 22 filters, Time taken: 1982 ms
With 23 filters, Time taken: 3855 ms
With 24 filters, Time taken: 6719 ms
With 25 filters, Time taken: 12669 ms
With 26 filters, Time taken: 25032 ms
With 27 filters, Time taken: 49585 ms
With 28 filters, Time taken: 98980 ms    // over 1 min 38 seconds
With 29 filters, Time taken: 198368 ms   // over 3 mins
With 30 filters, Time taken: 393744 ms   // over 6 mins
{code}

  was:
In the ORC data source, the `createFilter` function has exponential time 
complexity due to lack of memoization, as the following reproduction shows. 
This issue aims to improve it.

*REPRODUCE*
{code}
// Create and read 1 row table with 1000 columns
sql("set spark.sql.orc.filterPushdown=true")
val selectExpr = (1 to 1000).map(i => s"id c$i")
spark.range(1).selectExpr(selectExpr: 
_*).write.mode("overwrite").orc("/tmp/orc")
print(s"With 0 filters, ")
spark.time(spark.read.orc("/tmp/orc").count)

// Increase the number of filters
(20 to 30).foreach { width =>
  val whereExpr = (1 to width).map(i => s"c$i is not null").mkString(" and ")
  print(s"With $width filters, ")
  spark.time(spark.read.orc("/tmp/orc").where(whereExpr).count)
}
{code}

*RESULT*
{code}
With 0 filters, Time taken: 653 ms  
With 20 filters, Time taken: 962 ms
With 21 filters, Time taken: 1282 ms
With 22 filters, Time taken: 1982 ms
With 23 filters, Time taken: 3855 ms
With 24 filters, Time taken: 6719 ms
With 25 filters, Time taken: 12669 ms
With 26 filters, Time taken: 25032 ms
With 27 filters, Time taken: 49585 ms
With 28 filters, Time taken: 98980 ms // over 1 min 38 seconds
With 29 filters, Time taken: 198368 ms   // over 3 mins
With 30 filters, Time taken: 393744 ms   // over 6 mins
{code}


> Use cache to speed up `createFilter` in ORC
> ---
>
> Key: SPARK-25306
> URL: https://issues.apache.org/jira/browse/SPARK-25306
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.3, 2.2.2, 2.3.1, 2.4.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> In the ORC data source, the `createFilter` function has exponential time 
> complexity due to lack of memoization, as the following reproduction shows. 
> This issue aims to improve it.
> *REPRODUCE*
> {code}
> // Create and read 1 row table with 1000 columns
> sql("set spark.sql.orc.filterPushdown=true")
> val selectExpr = (1 to 1000).map(i => s"id c$i")
> spark.range(1).selectExpr(selectExpr: 
> _*).write.mode("overwrite").orc("/tmp/orc")
> print(s"With 0 filters, ")
> spark.time(spark.read.orc("/tmp/orc").count)
> // Increase the number of filters
> (20 to 30).foreach { width =>
>   val whereExpr = (1 to width).map(i => s"c$i is not null").mkString(" and ")
>   print(s"With $width filters, ")
>   spark.time(spark.read.orc("/tmp/orc").where(whereExpr).count)
> }
> {code}
> *RESULT*
> {code}
> With 0 filters, Time taken: 653 ms
> With 20 filters, Time taken: 962 ms
> With 21 filters, Time taken: 1282 ms
> With 22 filters, Time taken: 1982 ms
> With 23 filters, Time taken: 3855 ms
> With 24 filters, Time taken: 6719 ms
> With 25 filters, Time taken: 12669 ms
> With 26 filters, Time taken: 25032 ms
> With 27 filters, Time taken: 49585 ms
> With 28 filters, Time taken: 98980 ms    // over 1 min 38 seconds
> With 29 filters, Time taken: 198368 ms   // over 3 mins
> With 30 filters, Time taken: 393744 ms   // over 6 mins
> {code}






[jira] [Updated] (SPARK-25306) Use cache to speed up `createFilter` in ORC

2018-09-01 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25306:
--
Affects Version/s: (was: SQL)
                   2.4.0
                   2.1.3
                   2.2.2
                   2.3.1

> Use cache to speed up `createFilter` in ORC
> ---
>
> Key: SPARK-25306
> URL: https://issues.apache.org/jira/browse/SPARK-25306
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.3, 2.2.2, 2.3.1, 2.4.0
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> In the ORC data source, the `createFilter` function has exponential time 
> complexity due to lack of memoization, as the following reproduction shows. 
> This issue aims to improve it.
> *REPRODUCE*
> {code}
> // Create and read 1 row table with 1000 columns
> sql("set spark.sql.orc.filterPushdown=true")
> val selectExpr = (1 to 1000).map(i => s"id c$i")
> spark.range(1).selectExpr(selectExpr: 
> _*).write.mode("overwrite").orc("/tmp/orc")
> print(s"With 0 filters, ")
> spark.time(spark.read.orc("/tmp/orc").count)
> // Increase the number of filters
> (20 to 30).foreach { width =>
>   val whereExpr = (1 to width).map(i => s"c$i is not null").mkString(" and ")
>   print(s"With $width filters, ")
>   spark.time(spark.read.orc("/tmp/orc").where(whereExpr).count)
> }
> {code}
> *RESULT*
> {code}
> With 0 filters, Time taken: 653 ms
> With 20 filters, Time taken: 962 ms
> With 21 filters, Time taken: 1282 ms
> With 22 filters, Time taken: 1982 ms
> With 23 filters, Time taken: 3855 ms
> With 24 filters, Time taken: 6719 ms
> With 25 filters, Time taken: 12669 ms
> With 26 filters, Time taken: 25032 ms
> With 27 filters, Time taken: 49585 ms
> With 28 filters, Time taken: 98980 ms    // over 1 min 38 seconds
> With 29 filters, Time taken: 198368 ms   // over 3 mins
> With 30 filters, Time taken: 393744 ms   // over 6 mins
> {code}






[jira] [Commented] (SPARK-25306) Use cache to speed up `createFilter` in ORC

2018-09-01 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599822#comment-16599822
 ] 

Apache Spark commented on SPARK-25306:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/22313

> Use cache to speed up `createFilter` in ORC
> ---
>
> Key: SPARK-25306
> URL: https://issues.apache.org/jira/browse/SPARK-25306
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: SQL
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> In the ORC data source, the `createFilter` function has exponential time 
> complexity due to lack of memoization, as the following reproduction shows. 
> This issue aims to improve it.
> *REPRODUCE*
> {code}
> // Create and read 1 row table with 1000 columns
> sql("set spark.sql.orc.filterPushdown=true")
> val selectExpr = (1 to 1000).map(i => s"id c$i")
> spark.range(1).selectExpr(selectExpr: 
> _*).write.mode("overwrite").orc("/tmp/orc")
> print(s"With 0 filters, ")
> spark.time(spark.read.orc("/tmp/orc").count)
> // Increase the number of filters
> (20 to 30).foreach { width =>
>   val whereExpr = (1 to width).map(i => s"c$i is not null").mkString(" and ")
>   print(s"With $width filters, ")
>   spark.time(spark.read.orc("/tmp/orc").where(whereExpr).count)
> }
> {code}
> *RESULT*
> {code}
> With 0 filters, Time taken: 653 ms
> With 20 filters, Time taken: 962 ms
> With 21 filters, Time taken: 1282 ms
> With 22 filters, Time taken: 1982 ms
> With 23 filters, Time taken: 3855 ms
> With 24 filters, Time taken: 6719 ms
> With 25 filters, Time taken: 12669 ms
> With 26 filters, Time taken: 25032 ms
> With 27 filters, Time taken: 49585 ms
> With 28 filters, Time taken: 98980 ms // over 1 min 38 seconds
> With 29 filters, Time taken: 198368 ms   // over 3 mins
> With 30 filters, Time taken: 393744 ms   // over 6 mins
> {code}






[jira] [Assigned] (SPARK-25306) Use cache to speed up `createFilter` in ORC

2018-09-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25306:


Assignee: Apache Spark

> Use cache to speed up `createFilter` in ORC
> ---
>
> Key: SPARK-25306
> URL: https://issues.apache.org/jira/browse/SPARK-25306
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: SQL
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Critical
>
> In the ORC data source, the `createFilter` function has exponential time 
> complexity due to lack of memoization, as the following reproduction shows. 
> This issue aims to improve it.
> *REPRODUCE*
> {code}
> // Create and read 1 row table with 1000 columns
> sql("set spark.sql.orc.filterPushdown=true")
> val selectExpr = (1 to 1000).map(i => s"id c$i")
> spark.range(1).selectExpr(selectExpr: 
> _*).write.mode("overwrite").orc("/tmp/orc")
> print(s"With 0 filters, ")
> spark.time(spark.read.orc("/tmp/orc").count)
> // Increase the number of filters
> (20 to 30).foreach { width =>
>   val whereExpr = (1 to width).map(i => s"c$i is not null").mkString(" and ")
>   print(s"With $width filters, ")
>   spark.time(spark.read.orc("/tmp/orc").where(whereExpr).count)
> }
> {code}
> *RESULT*
> {code}
> With 0 filters, Time taken: 653 ms
> With 20 filters, Time taken: 962 ms
> With 21 filters, Time taken: 1282 ms
> With 22 filters, Time taken: 1982 ms
> With 23 filters, Time taken: 3855 ms
> With 24 filters, Time taken: 6719 ms
> With 25 filters, Time taken: 12669 ms
> With 26 filters, Time taken: 25032 ms
> With 27 filters, Time taken: 49585 ms
> With 28 filters, Time taken: 98980 ms // over 1 min 38 seconds
> With 29 filters, Time taken: 198368 ms   // over 3 mins
> With 30 filters, Time taken: 393744 ms   // over 6 mins
> {code}






[jira] [Assigned] (SPARK-25306) Use cache to speed up `createFilter` in ORC

2018-09-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25306:


Assignee: (was: Apache Spark)

> Use cache to speed up `createFilter` in ORC
> ---
>
> Key: SPARK-25306
> URL: https://issues.apache.org/jira/browse/SPARK-25306
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: SQL
>Reporter: Dongjoon Hyun
>Priority: Critical
>
> In the ORC data source, the `createFilter` function has exponential time 
> complexity due to lack of memoization, as the following reproduction shows. 
> This issue aims to improve it.
> *REPRODUCE*
> {code}
> // Create and read 1 row table with 1000 columns
> sql("set spark.sql.orc.filterPushdown=true")
> val selectExpr = (1 to 1000).map(i => s"id c$i")
> spark.range(1).selectExpr(selectExpr: 
> _*).write.mode("overwrite").orc("/tmp/orc")
> print(s"With 0 filters, ")
> spark.time(spark.read.orc("/tmp/orc").count)
> // Increase the number of filters
> (20 to 30).foreach { width =>
>   val whereExpr = (1 to width).map(i => s"c$i is not null").mkString(" and ")
>   print(s"With $width filters, ")
>   spark.time(spark.read.orc("/tmp/orc").where(whereExpr).count)
> }
> {code}
> *RESULT*
> {code}
> With 0 filters, Time taken: 653 ms
> With 20 filters, Time taken: 962 ms
> With 21 filters, Time taken: 1282 ms
> With 22 filters, Time taken: 1982 ms
> With 23 filters, Time taken: 3855 ms
> With 24 filters, Time taken: 6719 ms
> With 25 filters, Time taken: 12669 ms
> With 26 filters, Time taken: 25032 ms
> With 27 filters, Time taken: 49585 ms
> With 28 filters, Time taken: 98980 ms // over 1 min 38 seconds
> With 29 filters, Time taken: 198368 ms   // over 3 mins
> With 30 filters, Time taken: 393744 ms   // over 6 mins
> {code}






[jira] [Created] (SPARK-25306) Use cache to speed up `createFilter` in ORC

2018-09-01 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-25306:
-

 Summary: Use cache to speed up `createFilter` in ORC
 Key: SPARK-25306
 URL: https://issues.apache.org/jira/browse/SPARK-25306
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: SQL
Reporter: Dongjoon Hyun


In the ORC data source, the `createFilter` function has exponential time 
complexity due to lack of memoization, as the following reproduction shows. 
This issue aims to improve it.

*REPRODUCE*
{code}
// Create and read 1 row table with 1000 columns
sql("set spark.sql.orc.filterPushdown=true")
val selectExpr = (1 to 1000).map(i => s"id c$i")
spark.range(1).selectExpr(selectExpr: 
_*).write.mode("overwrite").orc("/tmp/orc")
print(s"With 0 filters, ")
spark.time(spark.read.orc("/tmp/orc").count)

// Increase the number of filters
(20 to 30).foreach { width =>
  val whereExpr = (1 to width).map(i => s"c$i is not null").mkString(" and ")
  print(s"With $width filters, ")
  spark.time(spark.read.orc("/tmp/orc").where(whereExpr).count)
}
{code}

*RESULT*
{code}
With 0 filters, Time taken: 653 ms  
With 20 filters, Time taken: 962 ms
With 21 filters, Time taken: 1282 ms
With 22 filters, Time taken: 1982 ms
With 23 filters, Time taken: 3855 ms
With 24 filters, Time taken: 6719 ms
With 25 filters, Time taken: 12669 ms
With 26 filters, Time taken: 25032 ms
With 27 filters, Time taken: 49585 ms
With 28 filters, Time taken: 98980 ms // over 1 min 38 seconds
With 29 filters, Time taken: 198368 ms   // over 3 mins
With 30 filters, Time taken: 393744 ms   // over 6 mins
{code}
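
The timings roughly double with each extra filter, which matches O(2^n) 
growth: converting an `And` node visits each child once to check that it is 
convertible and again to build the result. Below is a sketch of the 
memoization idea on a toy filter tree; the types and names are illustrative 
assumptions, not Spark's actual ORC code:

{code:scala}
import scala.collection.mutable

sealed trait Filter
case class And(left: Filter, right: Filter) extends Filter
case class IsNotNull(attr: String) extends Filter

// Convert a filter tree to a (toy) search-argument string, caching the result
// per subtree so repeated check-then-build visits become cache hits instead
// of full re-conversions.
def convert(f: Filter,
            cache: mutable.Map[Filter, Option[String]]): Option[String] =
  cache.get(f) match {
    case Some(cached) => cached
    case None =>
      val result = f match {
        case And(l, r) =>
          for (lc <- convert(l, cache); rc <- convert(r, cache))
            yield s"(and $lc $rc)"
        case IsNotNull(a) => Some(s"(not (is-null $a))")
      }
      cache(f) = result
      result
  }

// Usage sketch:
// convert(And(IsNotNull("c1"), IsNotNull("c2")), mutable.Map.empty)
// returns Some("(and (not (is-null c1)) (not (is-null c2)))").
{code}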






[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:38 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities, like Palantir's. [~mcheah], we don't need the meeting then if we 
are going to overlap with each other, fine. I have a question: why serve 
Palantir's priorities and not mine? This is not healthy.

You didn't collaborate with me; why? Why did nobody respect my wish to make 
it go into 2.4? It was my priority back then.

We are implementing the same design, so the whole discussion makes no sense; 
it is not about my implementation or yours. I am not going to implement the 
same thing again; that is not reasonable. But I also wouldn't create a PR 
that just works; you can check my comments. That was easy: just call load and 
load the template (this work is not rocket science).

Sorry, I don't see any real arguments in the discussion, and as I said, I 
don't want to reply, but the politically correct replies leave me no choice.

We have talked on Slack several times, and privately as well. You could 
always have pinged me, but you decided to collaborate on this without anyone 
knowing. Probably people don't understand the meaning of fairness; I am not 
going to explain it here.

We can always create any PR we like, and then we will see what work is 
merged. Cool.

For good or bad, though, the meeting has power, because the k8s committers 
have the final say on merging, no? So I don't agree.

The whole discussion is pointless for me; the message communicated, the 
culture, and the attitude are clear. If that keeps committers or others in 
the Spark project from violating the rules, fine.

No rule was violated, no worries.

This is the first time I have seen this kind of competing "collaboration" 
among the people of a group that works for the same cause. It is 
disappointing.

On the other hand, lesson learned; let's move on, no hard feelings.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities, like Palantir's. [~mcheah], we don't need the meeting then if we 
are going to overlap with each other, fine. I have a question: why serve 
Palantir's priorities and not mine? This is not healthy.

You didn't collaborate with me; why? Why did nobody respect my wish to make 
it go into 2.4? It was my priority back then.

We are implementing the same design, so the whole discussion makes no sense; 
it is not about my implementation or yours. I am not going to implement the 
same thing again; that is not reasonable. But I also wouldn't create a PR 
that just works; you can check my comments. That was easy: just call load and 
load the template (this work is not rocket science).

Sorry, I don't see any real arguments in the discussion, and as I said, I 
don't want to reply, but the politically correct replies leave me no choice.

We have talked on Slack several times, and privately as well. You could 
always have pinged me, but you decided to collaborate on this without anyone 
knowing. Probably people don't understand the meaning of fairness; I am not 
going to explain it here.

We can always create any PR we like, and then we will see what work is 
merged. Cool.

For good or bad, though, the meeting has power, because the k8s committers 
have the final say on merging, no? So I don't agree.

The whole discussion is pointless for me; the message communicated, the 
culture, and the attitude are clear. If that keeps committers or others in 
the Spark project from violating the rules, fine.

No rule was violated, no worries. I am also ok.

This is the first time I have seen this kind of competing "collaboration" 
among the people of a group that works for the same cause. It is 
disappointing.

On the other hand, lesson learned; let's move on, no hard feelings.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 





[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:37 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities, like Palantir's. [~mcheah], we don't need the meeting then if we 
are going to overlap with each other, fine. I have a question: why serve 
Palantir's priorities and not mine? This is not healthy.

You didn't collaborate with me; why? Why did nobody respect my wish to make 
it go into 2.4? It was my priority back then.

We are implementing the same design, so the whole discussion makes no sense; 
it is not about my implementation or yours. I am not going to implement the 
same thing again; that is not reasonable. But I also wouldn't create a PR 
that just works; you can check my comments. That was easy: just call load and 
load the template (this work is not rocket science).

Sorry, I don't see any real arguments in the discussion, and as I said, I 
don't want to reply, but the politically correct replies leave me no choice.

We have talked on Slack several times, and privately as well. You could 
always have pinged me, but you decided to collaborate on this without anyone 
knowing. Probably people don't understand the meaning of fairness; I am not 
going to explain it here.

We can always create any PR we like, and then we will see what work is 
merged. Cool.

For good or bad, though, the meeting has power, because the k8s committers 
have the final say on merging, no? So I don't agree.

The whole discussion is pointless for me; the message communicated, the 
culture, and the attitude are clear. If that keeps committers or others in 
the Spark project from violating the rules, fine.

No rule was violated, no worries. I am also ok.

This is the first time I have seen this kind of competing "collaboration" 
among the people of a group that works for the same cause. It is 
disappointing.

On the other hand, lesson learned; let's move on, no hard feelings.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities, like Palantir's. [~mcheah], we don't need the meeting then if we 
are going to overlap with each other, fine. I have a question: why serve 
Palantir's priorities and not mine? This is not healthy.

You didn't collaborate with me; why? :) Why did nobody respect my wish to 
make it go into 2.4? It was my priority back then, too.

We are implementing the same design, so the whole discussion makes no sense; 
it is not about my implementation or yours. I am not going to implement the 
same thing again; that is not reasonable. But I also wouldn't create a PR 
that just works; you can check my comments. That was easy: just call load and 
load the template (this work is not rocket science).

Sorry, I don't see any real arguments in the discussion, and as I said, I 
don't want to reply, but the politically correct replies leave me no choice.

We have talked on Slack several times, and privately as well. You could 
always have pinged me, but you decided to collaborate on this without anyone 
knowing. Probably people don't understand the meaning of fairness; I am not 
going to explain it here.

We can always create any PR we like, and then we will see what work is 
merged. Cool.

For good or bad, though, the meeting has power, because the k8s committers 
have the final say on merging, no? So I don't agree.

The whole discussion is pointless for me; the message communicated, the 
culture, and the attitude are clear. If that keeps committers or others in 
the Spark project from violating the rules, fine.

No rule was violated, no worries. I am also ok.

This is the first time I have seen this kind of competing "collaboration" 
among the people of a group that works for the same cause. It is 
disappointing.

On the other hand, lesson learned; let's move on, no hard feelings.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 




[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:36 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities, like Palantir's. [~mcheah], we don't need the meeting then if we 
are going to overlap with each other, fine. I have a question: why serve 
Palantir's priorities and not mine? This is not healthy :).

You didn't collaborate with me; why? 
!/jira/images/icons/emoticons/smile.png|width=16,height=16! Why did nobody 
respect my wish to make it go into 2.4? It was my priority back then, too.

We are implementing the same design, so the whole discussion makes no sense; 
it is not about my implementation or yours. I am not going to implement the 
same thing again; that is not reasonable. But I also wouldn't create a PR 
that just works; you can check my comments. That was easy: just call load and 
load the template (this work is not rocket science).

Sorry, I don't see any real arguments in the discussion, and as I said, I 
don't want to reply, but the politically correct replies leave me no choice.

We have talked on Slack several times, and privately as well. You could 
always have pinged me, but you decided to collaborate on this without anyone 
knowing. Probably people don't understand the meaning of fairness; I am not 
going to explain it here.

We can always create any PR we like, and then we will see what work is 
merged. Cool.

For good or bad, though, the meeting has power, because the k8s committers 
have the final say on merging, no? So I don't agree.

The whole discussion is pointless for me; the message communicated, the 
culture, and the attitude are clear. If that keeps committers or others in 
the Spark project from violating the rules, fine.

No rule was violated, no worries. I am also ok.

This is the first time I have seen this kind of competing "collaboration" 
among the people of a group that works for the same cause. It is 
disappointing.

On the other hand, lesson learned; let's move on, no hard feelings.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  Why nobody respected my will to make it 
go in 2.4, It was my priority back then too. 

We are implementing the same design, the whole discussion makes no sense, it is 
not about mine or your implementation... I am not going to implement the same 
thing again its not reasonable. But also I wouldnt create a PR that just works, 
you can check my comments, that was easy, just call load and load the template 
(it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but the politically correct replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. That is for committers or others in the Spark project not to 
violate the rules fine. 

No rule was violated no worries. I am also ok.

This is the first time I see this kind of competing collaboration among the 
people of a group that works for the same cause. It is disappointing.

On the other hand, lesson learned, let's move on, no hard feelings.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:36 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy.

You didnt collaborate with me why? ) .Why nobody respected my will to make it 
go in 2.4, It was my priority back then too. 

We are implementing the same design, the whole discussion makes no sense, it is 
not about mine or your implementation... I am not going to implement the same 
thing again its not reasonable. But also I wouldnt create a PR that just works, 
you can check my comments, that was easy, just call load and load the template 
(it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but the politically correct replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. That is for committers or others in the Spark project not to 
violate the rules fine. 

No rule was violated no worries. I am also ok.

This is the first time I see this kind of competing "collaboration" among the 
people of a group that works for the same cause. It is disappointing.

On the other hand, lesson learned, let's move on, no hard feelings.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why?  
!/jira/images/icons/emoticons/smile.png|width=16,height=16!   Why nobody 
respected my will to make it go in 2.4, It was my priority back then too. 

We are implementing the same design, the whole discussion makes no sense, it is 
not about mine or your implementation... I am not going to implement the same 
thing again its not reasonable. But also I wouldnt create a PR that just works, 
you can check my comments, that was easy, just call load and load the template 
(it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but the politically correct replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. That is for committers or others in the Spark project not to 
violate the rules fine. 

No rule was violated no worries. I am also ok.

This is the first time I see this kind of competing "collaboration" among the 
people of a group that works for the same cause. It is disappointing.

On the other hand, lesson learned, let's move on, no hard feelings.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:35 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  Why nobody respected my will to make it 
go in 2.4, It was my priority back then too. 

We are implementing the same design, the whole discussion makes no sense, it is 
not about mine or your implementation... I am not going to implement the same 
thing again its not reasonable. But also I wouldnt create a PR that just works, 
you can check my comments, that was easy, just call load and load the template 
(it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but the politically correct replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. That is for committers or others in the Spark project not to 
violate the rules fine. 

No rule was violated no worries. I am also ok.

This is the first time I see this kind of competing collaboration among the 
people of a group that works for the same cause. It is disappointing.

On the other hand, lesson learned, let's move on, no hard feelings.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  Why nobody respected my will to make it 
go in 2.4, It was my priority back then too. 

We are implementing the same design, the whole discussion makes no sense, it is 
not about mine or your implementation... I am not going to implement the same 
thing again its not reasonable. But also I wouldnt create a PR that just works, 
you can check my comments, that was easy, just call load and load the template 
(it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but the politically correct replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. That is for committers or others in the Spark project not to 
violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on, 
no hard feelings.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional 

[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:28 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  Why nobody respected my will to make it 
go in 2.4, It was my priority back then too. 

We are implementing the same design, the whole discussion makes no sense, it is 
not about mine or your implementation... I am not going to implement the same 
thing again its not reasonable. But also I wouldnt create a PR that just works, 
you can check my comments, that was easy, just call load and load the template 
(it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but the politically correct replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. That is for committers or others in the Spark project not to 
violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on, 
no hard feelings.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  Why nobody respected my will to make it 
go in 2.4, It was my priority back then too. 

We are implementing the same design, the whole discussion makes no sense, it is 
not about mine or your implementation... I am not going to implement the same 
thing again its not reasonable. But also I wouldnt create a PR that just works, 
you can check my comments, that was easy, just call load and load the template 
(it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but the politically correct replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. That is for committers or others in the Spark project not to 
violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:27 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  Why nobody respected my will to make it 
go in 2.4, It was my priority back then too. 

We are implementing the same design, the whole discussion makes no sense, it is 
not about mine or your implementation... I am not going to implement the same 
thing again its not reasonable. But also I wouldnt create a PR that just works, 
you can check my comments, that was easy, just call load and load the template 
(it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but the politically correct replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. That is for committers or others in the Spark project not to 
violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  Why nobody respected my will to make it 
go in 2.4, It was my priority back then too. 

We are implementing the same design, the whole discussion makes no sense, it is 
not about mine or your implementation... I am not going to implement the same 
thing again its not reasonable. But also I wouldnt create a PR that just works, 
you can check my comments, that was easy, just call load and load the template 
(it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. That is for committers or others in the Spark project not to 
violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:26 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  Why nobody respected my will to make it 
go in 2.4, It was my priority back then too. 

We are implementing the same design, the whole discussion makes no sense, it is 
not about mine or your implementation... I am not going to implement the same 
thing again its not reasonable. But also I wouldnt create a PR that just works, 
you can check my comments, that was easy, just call load and load the template 
(it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. That is for committers or others in the Spark project not to 
violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation... 
I am not going to implement the same thing again its not reasonable. But also I 
wouldnt create a PR that just works, you can check my comments, that was easy, 
just call load and load the template (it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. That is for committers or others in the Spark project not to 
violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:25 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation... 
I am not going to implement the same thing again its not reasonable. But also I 
wouldnt create a PR that just works, you can check my comments, that was easy, 
just call load and load the template (it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. That is for committers or others in the Spark project not to 
violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation... 
I am not going to implement the same thing again its not reasonable. But also I 
wouldnt create a PR that just works, you can check my comments, that was easy, 
just call load and load the template (it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. The only point is for committers or others in the Spark 
project not to violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:25 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation... 
I am not going to implement the same thing again its not reasonable. But also I 
wouldnt create a PR that just works, you can check my comments, that was easy, 
just call load and load the template (it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. That is for committers or others in the Spark project not to 
violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation... 
I am not going to implement the same thing again its not reasonable. But also I 
wouldnt create a PR that just works, you can check my comments, that was easy, 
just call load and load the template (it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. That is for committers or others in the Spark project not to 
violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:24 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have a question, why serve 
Palantir's priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation... 
I am not going to implement the same thing again its not reasonable. But also I 
wouldnt create a PR that just works, you can check my comments, that was easy, 
just call load and load the template (it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. The only point is for committers or others in the Spark 
project not to violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have question why serve Palantir's 
priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation... 
I am not going to implement the same thing again its not reasonable. But also I 
wouldnt create a PR that just works, you can check my comments, that was easy, 
just call load and load the template (it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. The only point is for committers or others in the Spark 
project not to violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:23 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have question why serve Palantir's 
priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation... 
I am not going to implement the same thing again its not reasonable. But also I 
wouldnt create a PR that just works, you can check my comments, that was easy, 
just call load and load the template (it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately as well. You could always 
have pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. The only point is for committers or others in the Spark 
project not to violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have question why serve Palantir's 
priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation... 
I am not going to implement the same thing again its not reasonable. But also I 
wouldnt create a PR that just works, you can check my comments, that was easy, 
just call load and load the template (it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately. You could always have 
pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. The only point is for committers or others in the Spark 
project not to violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:21 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have question why serve Palantir's 
priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation... 
I am not going to implement the same thing again its not reasonable. But also I 
wouldnt create a PR that just works, you can check my comments, that was easy, 
just call load and load the template (it is not rocket science this work).

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately. You could always have 
pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. The only point is for committers or others in the Spark 
project not to violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have question why serve Palantir's 
priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation...

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately. You could always have 
pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. The only point is for committers or others in the Spark 
project not to violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:17 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have question why serve Palantir's 
priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation...

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately. You could always have 
pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. The only point is for committers or others in the Spark 
project not to violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have question why serve Palantir's 
priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation...

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately. You could always have 
pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. The only point is for committers or others in the Spark 
project not to violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728
 ] 

Stavros Kontopoulos edited comment on SPARK-24434 at 9/1/18 8:14 PM:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have question why serve Palantir's 
priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation...

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately. You could always have 
pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message communicated, culture and 
attitude is clear. The only point is for committers or others in the Spark 
project not to violate the rules fine. 

No rule was violated no worries. I am also ok. Lesson learned, let's move on.

 


was (Author: skonto):
Spark belongs to the community (no?) and should not serve any company's 
priorities like Palantirs, [~mcheah] we don't need the meeting then if we are 
going to overlap with each other, fine.  I have question why serve Palantir's 
priorities and not mine? This is not healthy :).

You didnt collaborate with me why? :)  We are implementing the same design, the 
whole discussion makes no sense, it is not about mine or your implementation...

Sorry I dont see any real arguments in the discussion, and as I said I dont 
want to reply but replies leave me no choice.

We have talked on slack several times and privately. You could always have 
pinged me but you decided to collaborate on this, without anyone knowing. 
Probably people don't understand the meaning of fairness, I am not going to 
explain it here.  

We can always create any RP we like and then we will see what work is merged, 
cool.

For good or bad though the meeting has power because k8s committers have the 
final saying on merging no? So I dont agree. 

The whole discussion for me is pointless, the message culture and attitude is 
clear. The only point is for committers, the Spark project not to violate the 
rules fine. 

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 
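To make the proposal concrete, here is a minimal sketch of what template-based 
customization might look like. The property names and file contents are 
hypothetical; the actual API was still under design in this discussion.

{code}
# driver-template.yaml -- a plain Kubernetes pod spec used as the base
apiVersion: v1
kind: Pod
metadata:
  labels:
    team: data-eng
spec:
  nodeSelector:
    disktype: ssd
  containers:
    - name: spark-kubernetes-driver
      resources:
        limits:
          memory: 6Gi
{code}

{code}
# Hypothetical submission flags pointing Spark at the templates; Spark would
# overlay its own required fields (image, ports, env vars) on the template.
spark-submit \
  --master k8s://https://<api-server>:6443 \
  --conf spark.kubernetes.driver.podTemplateFile=driver-template.yaml \
  --conf spark.kubernetes.executor.podTemplateFile=executor-template.yaml \
  ...
{code}

With this model, any Kubernetes feature expressible in a pod spec becomes 
available without Spark growing a dedicated configuration option for it.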



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599728#comment-16599728
 ] 

Stavros Kontopoulos commented on SPARK-24434:
-

Spark belongs to the community (no?) and should not serve any company's 
priorities. [~mcheah], we don't need the meeting then if we are going to overlap 
with each other, fine. 

We have talked on Slack several times and privately. You could always have 
pinged me, but you decided to collaborate on this without anyone knowing. 
Probably people don't understand the meaning of fairness; I am not going to 
explain it here.  

We can always create any PR we like and then we will see what work is merged, 
cool.

For good or bad, though, the meeting has power, because the k8s committers have 
the final say on merging, no? So I don't agree. 

The whole discussion for me is pointless; the message, culture, and attitude are 
clear. The only point is for the committers and the Spark project not to violate 
the rules, fine.

 

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17916) CSV data source treats empty string as null no matter what nullValue option is

2018-09-01 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599717#comment-16599717
 ] 

Apache Spark commented on SPARK-17916:
--

User 'koertkuipers' has created a pull request for this issue:
https://github.com/apache/spark/pull/22312

> CSV data source treats empty string as null no matter what nullValue option is
> --
>
> Key: SPARK-17916
> URL: https://issues.apache.org/jira/browse/SPARK-17916
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.1
>Reporter: Hossein Falaki
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 2.4.0
>
>
> When a user configures {{nullValue}} in the CSV data source, in addition to 
> those values, all empty string values are also converted to null.
> {code}
> data:
> col1,col2
> 1,"-"
> 2,""
> {code}
> {code}
> spark.read.format("csv").option("nullValue", "-")
> {code}
> We will find a null in both rows.
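A minimal reproduction sketch of the behavior described above (illustrative 
only, assuming a running {{SparkSession}} named {{spark}}):

{code}
import spark.implicits._

// Two rows: col2 is "-" (the configured nullValue) in one and "" in the other.
val csvLines = Seq("col1,col2", "1,\"-\"", "2,\"\"").toDS()

val df = spark.read
  .option("header", "true")
  .option("nullValue", "-")
  .csv(csvLines)

df.show()
// Before the fix, col2 is null in *both* rows: the empty string is coerced
// to null even though only "-" was configured as the null marker.
{code}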



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one

2018-09-01 Thread Imran Rashid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599714#comment-16599714
 ] 

Imran Rashid commented on SPARK-23253:
--

I think I see the issue you are referring to, [~cloud_fan], but I'm not sure 
this change is actually the responsible one.  Isn't it really from here 
https://github.com/apache/spark/pull/9610 ?  The change here only altered 
whether we bother to write {{lengths}} to a file; it doesn't actually change 
whether we use that file at all.

There is more history discussing that change (and non-determinism, etc.) in 
https://github.com/apache/spark/pull/9214 and 
https://github.com/apache/spark/pull/6648

> Only write shuffle temporary index file when there is not an existing one
> -
>
> Key: SPARK-23253
> URL: https://issues.apache.org/jira/browse/SPARK-23253
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 2.2.1
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 2.4.0
>
>
> The shuffle index temporary file is used for atomically creating the shuffle 
> index file; it is not needed when the index file already exists because 
> another attempt of the same task has already written it.
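For context, the atomic creation the description refers to is the usual 
write-to-temp-then-rename idiom. A minimal sketch with hypothetical file names 
(the real logic lives in Spark's IndexShuffleBlockResolver):

{code}
import java.nio.file.{Files, Paths, StandardCopyOption}

val indexFile = Paths.get("shuffle_0_0_0.index")
val tmpFile   = Paths.get("shuffle_0_0_0.index.tmp")
val bytes     = Array[Byte](1, 2, 3) // stand-in for the serialized offsets

// The optimization in this JIRA: skip the temp write entirely when a previous
// attempt of the same task has already produced a valid index file.
if (!Files.exists(indexFile)) {
  Files.write(tmpFile, bytes)
  Files.move(tmpFile, indexFile, StandardCopyOption.ATOMIC_MOVE)
}
{code}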



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599697#comment-16599697
 ] 

Matt Cheah commented on SPARK-24434:


Everyone, thank you for your contribution to this discussion. It is important 
for us to agree upon constructive next steps to avoid this kind of 
miscommunication in the future.

I have a few notes to make from having collaborated with Yifei and Onur on this 
patch and on behalf of Palantir.

Firstly, we apologize that we did not communicate clearly enough on Apache 
communication channels that we were working on this, and the urgency with which 
we needed this work done. We agree with [~felixcheung]'s assessment that notes 
from the weekly meetings that have bearing on Spark development should be sent 
back to the wider community. We are specifically sorry for not having said 
something to the effect of "I am taking a stab at implementing this at 
https://github.com/... . Stavros, are you cool with that?" Palantir and the 
Kubernetes Big Data group must improve our communication next time.

Secondly, we would suggest that a work-in-progress patch proposed early in the 
feature's development would have been helpful for users preparing to adopt this 
feature in their internal tools. It's helpful for everyone to see the API and 
expected behavior of a new feature so that they can plan to take advantage of 
it ahead of time.

Thirdly, a small clarifying comment on timelines and urgency. While we don't 
see the need for this to be in Spark 2.4, we will be taking the patch ahead of 
time on our fork of Spark, which follows the Apache master branch (see 
[https://github.com/palantir/spark]). We were hoping to cherry-pick this patch 
soon but could have been clearer in our communication of this need.

Fourthly, we are sorry for the wording in "On 15 Aug it was discussed that as 
Stavros Kontopoulos was out, and was not actively working on this PR at that 
moment, Yifei Huang and I can take over and start working on this.”: Instead of 
“take over”, we should have said “contribute to this feature” in this comment 
specifically.

Finally, moving forward we are happy to collaborate on what the community 
believes to be the best implementation of this feature. We are happy to use 
Onur's, but we can also use Stavros's. Regardless of the chosen implementation, 
credit should be given to all parties. For example if Onur's implementation is 
chosen, Stavros's design work should be called out in the pull request 
description. Either way, we would like to see this feature merged by Friday, 
September 07, though this will have to be delayed if the Spark 2.4 release 
branch is not cut before that time (since we don't want this going into master 
and ending up in Spark 2.4 as a result).

We are open to feedback on any of the above points and suggestions on how we 
can improve the way we contribute to Spark in the future.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25305) Respect attribute name in `CollapseProject` and `ColumnPruning`

2018-09-01 Thread Gengliang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-25305:
---
Summary: Respect attribute name in `CollapseProject` and `ColumnPruning`  
(was: Respect attribute name in `CollapseProject`)

> Respect attribute name in `CollapseProject` and `ColumnPruning`
> ---
>
> Key: SPARK-25305
> URL: https://issues.apache.org/jira/browse/SPARK-25305
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Gengliang Wang
>Priority: Major
>
> Currently, in the optimizer rule `CollapseProject`, the lower-level project is 
> collapsed into the upper level, but the alias names from the lower level are 
> propagated to the upper level.
> We should preserve all the output names of the upper level.
> See PR description of https://github.com/apache/spark/pull/22311 for details.
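To make the alias issue concrete, a small illustrative sketch (assumes a 
running {{SparkSession}} named {{spark}}; not taken from the original report):

{code}
import org.apache.spark.sql.functions.col

// Two stacked projections that CollapseProject merges into one.
val df = spark.range(3).toDF("a")
val collapsed = df.select(col("a").as("b")).select(col("b").as("c"))

// The collapsed Project must keep the upper level's output name "c";
// leaking the lower-level alias "b" would silently change the result schema.
collapsed.printSchema()  // root |-- c: long (nullable = false)
collapsed.explain(true)  // optimized plan: a single Project [a AS c]
{code}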



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25305) Respect attribute name in `CollapseProject`

2018-09-01 Thread Gengliang Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang updated SPARK-25305:
---
Description: 
Currently, in the optimizer rule `CollapseProject`, the lower-level project is 
collapsed into the upper level, but the alias names from the lower level are 
propagated to the upper level.
We should preserve all the output names of the upper level.

See PR description of https://github.com/apache/spark/pull/22311 for details.

  was:
Currently, in the optimizer rule `CollapseProject`, the lower-level project is 
collapsed into the upper level, but the alias names from the lower level are 
propagated to the upper level.
We should preserve all the output names of the upper level.


> Respect attribute name in `CollapseProject`
> ---
>
> Key: SPARK-25305
> URL: https://issues.apache.org/jira/browse/SPARK-25305
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Gengliang Wang
>Priority: Major
>
> Currently, in the optimizer rule `CollapseProject`, the lower-level project is 
> collapsed into the upper level, but the alias names from the lower level are 
> propagated to the upper level.
> We should preserve all the output names of the upper level.
> See PR description of https://github.com/apache/spark/pull/22311 for details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25305) Respect attribute name in `CollapseProject`

2018-09-01 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599665#comment-16599665
 ] 

Apache Spark commented on SPARK-25305:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/22311

> Respect attribute name in `CollapseProject`
> ---
>
> Key: SPARK-25305
> URL: https://issues.apache.org/jira/browse/SPARK-25305
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Gengliang Wang
>Priority: Major
>
> Currently, in the optimizer rule `CollapseProject`, the lower-level project is 
> collapsed into the upper level, but the alias names from the lower level are 
> propagated to the upper level.
> We should preserve all the output names of the upper level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25305) Respect attribute name in `CollapseProject`

2018-09-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25305:


Assignee: Apache Spark

> Respect attribute name in `CollapseProject`
> ---
>
> Key: SPARK-25305
> URL: https://issues.apache.org/jira/browse/SPARK-25305
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> Currently, in the optimizer rule `CollapseProject`, the lower-level project is 
> collapsed into the upper level, but the alias names from the lower level are 
> propagated to the upper level.
> We should preserve all the output names of the upper level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25305) Respect attribute name in `CollapseProject`

2018-09-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25305:


Assignee: (was: Apache Spark)

> Respect attribute name in `CollapseProject`
> ---
>
> Key: SPARK-25305
> URL: https://issues.apache.org/jira/browse/SPARK-25305
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Gengliang Wang
>Priority: Major
>
> Currently, in the optimizer rule `CollapseProject`, the lower-level project is 
> collapsed into the upper level, but the alias names from the lower level are 
> propagated to the upper level.
> We should preserve all the output names of the upper level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25305) Respect attribute name in `CollapseProject`

2018-09-01 Thread Gengliang Wang (JIRA)
Gengliang Wang created SPARK-25305:
--

 Summary: Respect attribute name in `CollapseProject`
 Key: SPARK-25305
 URL: https://issues.apache.org/jira/browse/SPARK-25305
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Gengliang Wang


Currently, in the optimizer rule `CollapseProject`, the lower-level project is 
collapsed into the upper level, but the alias names from the lower level are 
propagated to the upper level.
We should preserve all the output names of the upper level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25298) spark-tools build failure for Scala 2.12

2018-09-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25298:


Assignee: Apache Spark

> spark-tools build failure for Scala 2.12
> 
>
> Key: SPARK-25298
> URL: https://issues.apache.org/jira/browse/SPARK-25298
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Assignee: Apache Spark
>Priority: Major
>
> $ sbt
> > ++ 2.12.6
> > compile
>  
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:22:
>  object runtime is not a member of package reflect
> [error] import scala.reflect.runtime.\{universe => unv}
> [error]  ^
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:23:
>  object runtime is not a member of package reflect
> [error] import scala.reflect.runtime.universe.runtimeMirror
> [error]  ^
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:41:
>  not found: value runtimeMirror
> [error]   private val mirror = runtimeMirror(classLoader)
> [error]    ^
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:43:
>  not found: value unv
> [error]   private def isPackagePrivate(sym: unv.Symbol) =



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25298) spark-tools build failure for Scala 2.12

2018-09-01 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599660#comment-16599660
 ] 

Apache Spark commented on SPARK-25298:
--

User 'sadhen' has created a pull request for this issue:
https://github.com/apache/spark/pull/22310

> spark-tools build failure for Scala 2.12
> 
>
> Key: SPARK-25298
> URL: https://issues.apache.org/jira/browse/SPARK-25298
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Priority: Major
>
> $ sbt
> > ++ 2.12.6
> > compile
>  
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:22:
>  object runtime is not a member of package reflect
> [error] import scala.reflect.runtime.\{universe => unv}
> [error]  ^
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:23:
>  object runtime is not a member of package reflect
> [error] import scala.reflect.runtime.universe.runtimeMirror
> [error]  ^
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:41:
>  not found: value runtimeMirror
> [error]   private val mirror = runtimeMirror(classLoader)
> [error]    ^
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:43:
>  not found: value unv
> [error]   private def isPackagePrivate(sym: unv.Symbol) =



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25298) spark-tools build failure for Scala 2.12

2018-09-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25298:


Assignee: (was: Apache Spark)

> spark-tools build failure for Scala 2.12
> 
>
> Key: SPARK-25298
> URL: https://issues.apache.org/jira/browse/SPARK-25298
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Priority: Major
>
> $ sbt
> > ++ 2.12.6
> > compile
>  
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:22:
>  object runtime is not a member of package reflect
> [error] import scala.reflect.runtime.\{universe => unv}
> [error]  ^
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:23:
>  object runtime is not a member of package reflect
> [error] import scala.reflect.runtime.universe.runtimeMirror
> [error]  ^
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:41:
>  not found: value runtimeMirror
> [error]   private val mirror = runtimeMirror(classLoader)
> [error]    ^
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:43:
>  not found: value unv
> [error]   private def isPackagePrivate(sym: unv.Symbol) =



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20384) supporting value classes over primitives in DataSets

2018-09-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20384:


Assignee: Apache Spark

> supporting value classes over primitives in DataSets
> 
>
> Key: SPARK-20384
> URL: https://issues.apache.org/jira/browse/SPARK-20384
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer, SQL
>Affects Versions: 2.1.0
>Reporter: Daniel Davis
>Assignee: Apache Spark
>Priority: Minor
>
> As a Spark user who uses value classes in Scala for modelling domain objects, 
> I would also like to make use of them for Datasets. 
> For example, I would like to use the {{User}} case class, which uses a 
> value class for its {{id}}, as the type for a Dataset:
> - the underlying primitive should be mapped to the value-class column
> - functions on the column (for example comparison) should only work if 
> defined on the value class and use that implementation
> - show() should pick up the toString method of the value class
> {code}
> case class Id(value: Long) extends AnyVal {
>   override def toString: String = value.toHexString
> }
> case class User(id: Id, name: String)
> val ds = spark.sparkContext
>   .parallelize(0L to 12L).map(i => (i, f"name-$i")).toDS()
>   .withColumnRenamed("_1", "id")
>   .withColumnRenamed("_2", "name")
> // mapping should work
> val usrs = ds.as[User]
> // show should use toString
> usrs.show()
> // comparison with long should throw exception, as not defined on Id
> usrs.col("id") > 0L
> {code}
> For example `.show()` should use the toString of the `Id` value class:
> {noformat}
> +---+---+
> | id|   name|
> +---+---+
> |  0| name-0|
> |  1| name-1|
> |  2| name-2|
> |  3| name-3|
> |  4| name-4|
> |  5| name-5|
> |  6| name-6|
> |  7| name-7|
> |  8| name-8|
> |  9| name-9|
> |  A|name-10|
> |  B|name-11|
> |  C|name-12|
> +---+---+
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20384) supporting value classes over primitives in DataSets

2018-09-01 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599659#comment-16599659
 ] 

Apache Spark commented on SPARK-20384:
--

User 'mt40' has created a pull request for this issue:
https://github.com/apache/spark/pull/22309

> supporting value classes over primitives in DataSets
> 
>
> Key: SPARK-20384
> URL: https://issues.apache.org/jira/browse/SPARK-20384
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer, SQL
>Affects Versions: 2.1.0
>Reporter: Daniel Davis
>Priority: Minor
>
> As a Spark user who uses value classes in Scala for modelling domain objects, 
> I would also like to make use of them for Datasets. 
> For example, I would like to use the {{User}} case class, which uses a 
> value class for its {{id}}, as the type for a Dataset:
> - the underlying primitive should be mapped to the value-class column
> - functions on the column (for example comparison) should only work if 
> defined on the value class and use that implementation
> - show() should pick up the toString method of the value class
> {code}
> case class Id(value: Long) extends AnyVal {
>   override def toString: String = value.toHexString
> }
> case class User(id: Id, name: String)
> val ds = spark.sparkContext
>   .parallelize(0L to 12L).map(i => (i, f"name-$i")).toDS()
>   .withColumnRenamed("_1", "id")
>   .withColumnRenamed("_2", "name")
> // mapping should work
> val usrs = ds.as[User]
> // show should use toString
> usrs.show()
> // comparison with long should throw exception, as not defined on Id
> usrs.col("id") > 0L
> {code}
> For example `.show()` should use the toString of the `Id` value class:
> {noformat}
> +---+---+
> | id|   name|
> +---+---+
> |  0| name-0|
> |  1| name-1|
> |  2| name-2|
> |  3| name-3|
> |  4| name-4|
> |  5| name-5|
> |  6| name-6|
> |  7| name-7|
> |  8| name-8|
> |  9| name-9|
> |  A|name-10|
> |  B|name-11|
> |  C|name-12|
> +---+---+
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-20384) supporting value classes over primitives in DataSets

2018-09-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-20384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-20384:


Assignee: (was: Apache Spark)

> supporting value classes over primitives in DataSets
> 
>
> Key: SPARK-20384
> URL: https://issues.apache.org/jira/browse/SPARK-20384
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer, SQL
>Affects Versions: 2.1.0
>Reporter: Daniel Davis
>Priority: Minor
>
> As a Spark user who uses value classes in Scala for modelling domain objects, 
> I would also like to make use of them for Datasets. 
> For example, I would like to use the {{User}} case class, which uses a 
> value class for its {{id}}, as the type for a Dataset:
> - the underlying primitive should be mapped to the value-class column
> - functions on the column (for example comparison) should only work if 
> defined on the value class and use that implementation
> - show() should pick up the toString method of the value class
> {code}
> case class Id(value: Long) extends AnyVal {
>   override def toString: String = value.toHexString
> }
> case class User(id: Id, name: String)
> val ds = spark.sparkContext
>   .parallelize(0L to 12L).map(i => (i, f"name-$i")).toDS()
>   .withColumnRenamed("_1", "id")
>   .withColumnRenamed("_2", "name")
> // mapping should work
> val usrs = ds.as[User]
> // show should use toString
> usrs.show()
> // comparison with long should throw exception, as not defined on Id
> usrs.col("id") > 0L
> {code}
> For example `.show()` should use the toString of the `Id` value class:
> {noformat}
> +---+---+
> | id|   name|
> +---+---+
> |  0| name-0|
> |  1| name-1|
> |  2| name-2|
> |  3| name-3|
> |  4| name-4|
> |  5| name-5|
> |  6| name-6|
> |  7| name-7|
> |  8| name-8|
> |  9| name-9|
> |  A|name-10|
> |  B|name-11|
> |  C|name-12|
> +---+---+
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25298) spark-tools build failure for Scala 2.12

2018-09-01 Thread Darcy Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599651#comment-16599651
 ] 

Darcy Shen commented on SPARK-25298:


sbt -Dscala-2.12 -Dscala.version=2.12.6

 

This is the solution; we should document it or improve the build definition.
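For reference, a sketch of the full working invocation as a single command (the 
two flags are JVM system properties, presumably read by the sbt build 
definition):

{code}
$ sbt -Dscala-2.12 -Dscala.version=2.12.6 compile
{code}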

> spark-tools build failure for Scala 2.12
> 
>
> Key: SPARK-25298
> URL: https://issues.apache.org/jira/browse/SPARK-25298
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Priority: Major
>
> $ sbt
> > ++ 2.12.6
> > compile
>  
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:22:
>  object runtime is not a member of package reflect
> [error] import scala.reflect.runtime.\{universe => unv}
> [error]  ^
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:23:
>  object runtime is not a member of package reflect
> [error] import scala.reflect.runtime.universe.runtimeMirror
> [error]  ^
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:41:
>  not found: value runtimeMirror
> [error]   private val mirror = runtimeMirror(classLoader)
> [error]    ^
> [error] 
> /Users/rendong/wdi/spark/tools/src/main/scala/org/apache/spark/tools/GenerateMIMAIgnore.scala:43:
>  not found: value unv
> [error]   private def isPackagePrivate(sym: unv.Symbol) =



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25304) enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12

2018-09-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25304:


Assignee: Apache Spark

> enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
> --
>
> Key: SPARK-25304
> URL: https://issues.apache.org/jira/browse/SPARK-25304
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25304) enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12

2018-09-01 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599640#comment-16599640
 ] 

Apache Spark commented on SPARK-25304:
--

User 'sadhen' has created a pull request for this issue:
https://github.com/apache/spark/pull/22308

> enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
> --
>
> Key: SPARK-25304
> URL: https://issues.apache.org/jira/browse/SPARK-25304
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25304) enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12

2018-09-01 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25304:


Assignee: (was: Apache Spark)

> enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
> --
>
> Key: SPARK-25304
> URL: https://issues.apache.org/jira/browse/SPARK-25304
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8489) Add regression tests for SPARK-8470

2018-09-01 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599641#comment-16599641
 ] 

Apache Spark commented on SPARK-8489:
-

User 'sadhen' has created a pull request for this issue:
https://github.com/apache/spark/pull/22308

> Add regression tests for SPARK-8470
> ---
>
> Key: SPARK-8489
> URL: https://issues.apache.org/jira/browse/SPARK-8489
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 1.4.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>Priority: Critical
> Fix For: 1.4.1, 1.5.0
>
>
> See SPARK-8470 for more detail. Basically the Spark Hive code silently 
> overwrites the context class loader populated in SparkSubmit, resulting in 
> certain classes missing when we do reflection in `SQLContext#createDataFrame`.
> That issue is already resolved in https://github.com/apache/spark/pull/6891, 
> but we should add a regression test for the specific manifestation of the bug 
> in SPARK-8470.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25304) enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12

2018-09-01 Thread Darcy Shen (JIRA)
Darcy Shen created SPARK-25304:
--

 Summary: enable HiveSparkSubmitSuite SPARK-8489 test for Scala 2.12
 Key: SPARK-25304
 URL: https://issues.apache.org/jira/browse/SPARK-25304
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.4.0
Reporter: Darcy Shen






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25289) ChiSqSelector max on empty collection

2018-09-01 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-25289:
-

Assignee: Marco Gaido

> ChiSqSelector max on empty collection
> -
>
> Key: SPARK-25289
> URL: https://issues.apache.org/jira/browse/SPARK-25289
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.3.1
>Reporter: Marie Beaulieu
>Assignee: Marco Gaido
>Priority: Major
> Fix For: 2.4.0
>
>
> In org.apache.spark.mllib.feature.ChiSqSelector.fit, there is a max taken on 
> a possibly empty collection.
> I am using Spark 2.3.1.
> Here is an example to reproduce.
> {code:java}
> import org.apache.spark.mllib.feature.ChiSqSelector
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.regression.LabeledPoint
> import org.apache.spark.sql.SQLContext
> val sqlContext = new SQLContext(sc)
> implicit val spark = sqlContext.sparkSession
> val labeledPoints = (0 to 1).map(n => {
>   val v = Vectors.dense((1 to 3).map(_ => n * 1.0).toArray)
>   LabeledPoint(n.toDouble, v)
> })
> val rdd = sc.parallelize(labeledPoints)
> val selector = new ChiSqSelector().setSelectorType("fdr").setFdr(0.05)
> selector.fit(rdd){code}
> Here is the stack trace:
> {code:java}
> java.lang.UnsupportedOperationException: empty.max
> at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
> at scala.collection.mutable.ArrayOps$ofInt.max(ArrayOps.scala:234)
> at org.apache.spark.mllib.feature.ChiSqSelector.fit(ChiSqSelector.scala:280)
> {code}
> Looking at line 280 in ChiSqSelector, it's pretty obvious how the collection 
> can be empty. A simple non-empty validation should do the trick.
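
A minimal sketch of the non-empty guard suggested above, with hypothetical variable names; the eventual fix in Spark may differ:

{code:java}
// `selected` stands in for the feature indices that passed the FDR test in
// ChiSqSelector.fit; with the "fdr" selector type it can legitimately be empty.
val selected: Array[Int] = Array.empty

// Guarding the max: an empty selection yields no features instead of throwing
// java.lang.UnsupportedOperationException: empty.max.
val maxIndex: Option[Int] = if (selected.isEmpty) None else Some(selected.max)
{code}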



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25289) ChiSqSelector max on empty collection

2018-09-01 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-25289.
---
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 22303
[https://github.com/apache/spark/pull/22303]

> ChiSqSelector max on empty collection
> -
>
> Key: SPARK-25289
> URL: https://issues.apache.org/jira/browse/SPARK-25289
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.3.1
>Reporter: Marie Beaulieu
>Assignee: Marco Gaido
>Priority: Major
> Fix For: 2.4.0
>
>
> In org.apache.spark.mllib.feature.ChiSqSelector.fit, there is a max taken on 
> a possibly empty collection.
> I am using Spark 2.3.1.
> Here is an example to reproduce.
> {code:java}
> import org.apache.spark.mllib.feature.ChiSqSelector
> import org.apache.spark.mllib.linalg.Vectors
> import org.apache.spark.mllib.regression.LabeledPoint
> import org.apache.spark.sql.SQLContext
> val sqlContext = new SQLContext(sc)
> implicit val spark = sqlContext.sparkSession
> val labeledPoints = (0 to 1).map(n => {
>   val v = Vectors.dense((1 to 3).map(_ => n * 1.0).toArray)
>   LabeledPoint(n.toDouble, v)
> })
> val rdd = sc.parallelize(labeledPoints)
> val selector = new ChiSqSelector().setSelectorType("fdr").setFdr(0.05)
> selector.fit(rdd){code}
> Here is the stack trace:
> {code:java}
> java.lang.UnsupportedOperationException: empty.max
> at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
> at scala.collection.mutable.ArrayOps$ofInt.max(ArrayOps.scala:234)
> at org.apache.spark.mllib.feature.ChiSqSelector.fit(ChiSqSelector.scala:280)
> {code}
> Looking at line 280 in ChiSqSelector, it's pretty obvious how the collection 
> can be empty. A simple non-empty validation should do the trick.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24615) Accelerator-aware task scheduling for Spark

2018-09-01 Thread Saisai Shao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saisai Shao reassigned SPARK-24615:
---

Assignee: (was: Saisai Shao)

> Accelerator-aware task scheduling for Spark
> ---
>
> Key: SPARK-24615
> URL: https://issues.apache.org/jira/browse/SPARK-24615
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Saisai Shao
>Priority: Major
>  Labels: Hydrogen, SPIP
>
> In the machine learning area, accelerator cards (GPU, FPGA, TPU) are 
> predominant compared to CPUs. To make the current Spark architecture work 
> with accelerator cards, Spark itself should understand the existence of 
> accelerators and know how to schedule tasks onto the executors where 
> accelerators are equipped.
> Spark’s current scheduler schedules tasks based on the locality of the data 
> plus the availability of CPUs. This introduces some problems when scheduling 
> tasks that require accelerators.
>  # CPU cores usually outnumber accelerators on one node, so using CPU cores 
> to schedule accelerator-required tasks introduces a mismatch.
>  # In one cluster, we can always assume that CPUs are equipped in each node, but 
> this is not true of accelerator cards.
>  # The existence of heterogeneous tasks (accelerator-required or not) 
> requires the scheduler to schedule tasks in a smarter way.
> So here we propose to improve the current scheduler to support heterogeneous 
> tasks (accelerator-required or not). This can be part of the work of Project 
> Hydrogen.
> Details are attached in a Google doc. It doesn't cover all the implementation 
> details, just highlights the parts that should be changed.
>  
> CC [~yanboliang] [~merlintang]
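
A toy sketch of the mismatch in point 1 above; all names are hypothetical, and this is not Spark's actual scheduler API:

{code:java}
// Hypothetical two-dimensional resource bookkeeping for one executor and one task.
case class ExecutorResources(cpuCores: Int, gpus: Int)
case class TaskRequest(cpuCores: Int, gpus: Int)

// Scheduling on CPU cores alone can offer a GPU task to an executor that has
// free cores but no free GPUs; checking both dimensions avoids that mismatch.
def canSchedule(free: ExecutorResources, req: TaskRequest): Boolean =
  free.cpuCores >= req.cpuCores && free.gpus >= req.gpus

// Example: 16 free cores but 0 free GPUs cannot host a task that needs 1 GPU.
println(canSchedule(ExecutorResources(cpuCores = 16, gpus = 0), TaskRequest(1, 1))) // false
{code}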



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25297) Future for Scala 2.12 will block on a already shutdown ExecutionContext

2018-09-01 Thread Darcy Shen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599580#comment-16599580
 ] 

Darcy Shen commented on SPARK-25297:


This issue has been fixed by https://github.com/apache/spark/pull/22292

> Future for Scala 2.12 will block on a already shutdown ExecutionContext
> ---
>
> Key: SPARK-25297
> URL: https://issues.apache.org/jira/browse/SPARK-25297
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Darcy Shen
>Priority: Major
>
> *+see 
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/job/spark-master-test-maven-hadoop-2.7-ubuntu-scala-2.12/193/]+*
> *The unit tests block on FileBasedWriteAheadLogWithFileCloseAfterWriteSuite 
> in the console output.*



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25279) Throw exception: zzcclp java.io.NotSerializableException: org.apache.spark.sql.TypedColumn in Spark-shell when run example of doc

2018-09-01 Thread Zhichao Zhang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599574#comment-16599574
 ] 

Zhichao  Zhang commented on SPARK-25279:


[~dkbiswal], thank you. You mean that you used the latest code on branch 2.2 to 
test and it worked fine?

> Throw exception: zzcclp   java.io.NotSerializableException: 
> org.apache.spark.sql.TypedColumn in Spark-shell when run example of doc
> ---
>
> Key: SPARK-25279
> URL: https://issues.apache.org/jira/browse/SPARK-25279
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 2.2.1
>Reporter: Zhichao  Zhang
>Priority: Minor
>
> Hi dev: 
>   I am using Spark-Shell to run the example in section 
> '[http://spark.apache.org/docs/2.2.2/sql-programming-guide.html#type-safe-user-defined-aggregate-functions'],
>  
> and there is an error: 
> {code:java}
> Caused by: java.io.NotSerializableException: 
> org.apache.spark.sql.TypedColumn 
> Serialization stack: 
>         - object not serializable (class: org.apache.spark.sql.TypedColumn, 
> value: 
> myaverage() AS `average_salary`) 
>         - field (class: $iw, name: averageSalary, type: class 
> org.apache.spark.sql.TypedColumn) 
>         - object (class $iw, $iw@4b2f8ae9) 
>         - field (class: MyAverage$, name: $outer, type: class $iw) 
>         - object (class MyAverage$, MyAverage$@2be41d90) 
>         - field (class: 
> org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression, 
> name: aggregator, type: class org.apache.spark.sql.expressions.Aggregator) 
>         - object (class 
> org.apache.spark.sql.execution.aggregate.ComplexTypedAggregateExpression, 
> MyAverage(Employee)) 
>         - field (class: 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression, 
> name: aggregateFunction, type: class 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateFunction) 
>         - object (class 
> org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression, 
> partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class Employee)), 
> Some(class Employee), Some(StructType(StructField(name,StringType,true), 
> StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0, 
> Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0, 
> Average, true])).count AS count#26L, newInstance(class Average), input[0, 
> double, false] AS value#24, DoubleType, false, 0, 0)) 
>         - writeObject data (class: 
> scala.collection.immutable.List$SerializationProxy) 
>         - object (class scala.collection.immutable.List$SerializationProxy, 
> scala.collection.immutable.List$SerializationProxy@5e92c46f) 
>         - writeReplace data (class: 
> scala.collection.immutable.List$SerializationProxy) 
>         - object (class scala.collection.immutable.$colon$colon, 
> List(partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class 
> Employee)), Some(class Employee), 
> Some(StructType(StructField(name,StringType,true), 
> StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0, 
> Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0, 
> Average, true])).count AS count#26L, newInstance(class Average), input[0, 
> double, false] AS value#24, DoubleType, false, 0, 0))) 
>         - field (class: 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec, name: 
> aggregateExpressions, type: interface scala.collection.Seq) 
>         - object (class 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec, 
> ObjectHashAggregate(keys=[], 
> functions=[partial_myaverage(MyAverage$@2be41d90, Some(newInstance(class 
> Employee)), Some(class Employee), 
> Some(StructType(StructField(name,StringType,true), 
> StructField(salary,LongType,false))), assertnotnull(assertnotnull(input[0, 
> Average, true])).sum AS sum#25L, assertnotnull(assertnotnull(input[0, 
> Average, true])).count AS count#26L, newInstance(class Average), input[0, 
> double, false] AS value#24, DoubleType, false, 0, 0)], output=[buf#37]) 
> +- *FileScan json [name#8,salary#9L] Batched: false, Format: JSON, Location: 
> InMemoryFileIndex[file:/opt/spark2/examples/src/main/resources/employees.json],
>  
> PartitionFilters: [], PushedFilters: [], ReadSchema: 
> struct 
> ) 
>         - field (class: 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1,
>  
> name: $outer, type: class 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec) 
>         - object (class 
> org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1,
>  
> ) 
>         - field (class: 
> 
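
The stack trace above shows the usual REPL capture chain: the closure drags in its enclosing $iw wrapper, which is not serializable. A self-contained sketch of that failure mode with plain JVM serialization, no Spark required; the class and field names are illustrative:

{code:java}
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for the REPL's $iw wrapper: an enclosing class that is not
// Serializable. The lambda below refers to `column`, so it captures
// `Outer.this`, and serializing the lambda then fails.
class Outer {
  val column = "average_salary"
  val f: () => String = () => column
}

val oos = new ObjectOutputStream(new ByteArrayOutputStream())
try oos.writeObject(new Outer().f)
catch { case e: NotSerializableException => println("caught: " + e) }
{code}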

[jira] [Resolved] (SPARK-25290) BytesToBytesMapOnHeapSuite randomizedStressTest can cause OutOfMemoryError

2018-09-01 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-25290.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 22297
[https://github.com/apache/spark/pull/22297]

> BytesToBytesMapOnHeapSuite randomizedStressTest can cause OutOfMemoryError
> --
>
> Key: SPARK-25290
> URL: https://issues.apache.org/jira/browse/SPARK-25290
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 2.4.0
>
>
> BytesToBytesMapOnHeapSuite randomizedStressTest caused OutOfMemoryError on 
> several test runs. It seems better to reduce the memory usage in this test.
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95369/testReport/org.apache.spark.unsafe.map/BytesToBytesMapOnHeapSuite/randomizedStressTest/]
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95482/testReport/org.apache.spark.unsafe.map/BytesToBytesMapOnHeapSuite/randomizedStressTest/]
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95501/testReport/org.apache.spark.unsafe.map/BytesToBytesMapOnHeapSuite/randomizedStressTest/]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25290) BytesToBytesMapOnHeapSuite randomizedStressTest can cause OutOfMemoryError

2018-09-01 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-25290:
---

Assignee: Liang-Chi Hsieh

> BytesToBytesMapOnHeapSuite randomizedStressTest can cause OutOfMemoryError
> --
>
> Key: SPARK-25290
> URL: https://issues.apache.org/jira/browse/SPARK-25290
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 2.4.0
>
>
> BytesToBytesMapOnHeapSuite randomizedStressTest caused OutOfMemoryError on 
> several test runs. It seems better to reduce the memory usage in this test.
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95369/testReport/org.apache.spark.unsafe.map/BytesToBytesMapOnHeapSuite/randomizedStressTest/]
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95482/testReport/org.apache.spark.unsafe.map/BytesToBytesMapOnHeapSuite/randomizedStressTest/]
> [https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95501/testReport/org.apache.spark.unsafe.map/BytesToBytesMapOnHeapSuite/randomizedStressTest/]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25303) A DStream that is checkpointed should allow its parent(s) to be removed and not persisted

2018-09-01 Thread Nikunj Bansal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599566#comment-16599566
 ] 

Nikunj Bansal commented on SPARK-25303:
---

I have a potential fix for this and SPARK-25302 available.

> A DStream that is checkpointed should allow its parent(s) to be removed and 
> not persisted
> -
>
> Key: SPARK-25303
> URL: https://issues.apache.org/jira/browse/SPARK-25303
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.3.0, 2.3.1
>Reporter: Nikunj Bansal
>Priority: Major
>  Labels: Streaming, streaming
>
> A checkpointed DStream is supposed to cut the lineage to its parent(s) such 
> that any persisted RDDs for the parent(s) are removed. However, combined with 
> the issue in SPARK-25302, these bugs result in the input stream RDDs being 
> persisted a lot longer than they are actually required.
> See also related bug SPARK-25302.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25302) ReducedWindowedDStream not using checkpoints for reduced RDDs

2018-09-01 Thread Nikunj Bansal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599565#comment-16599565
 ] 

Nikunj Bansal commented on SPARK-25302:
---

I have a potential fix for this and SPARK-25303 available.

> ReducedWindowedDStream not using checkpoints for reduced RDDs
> -
>
> Key: SPARK-25302
> URL: https://issues.apache.org/jira/browse/SPARK-25302
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.3.0, 2.3.1
>Reporter: Nikunj Bansal
>Priority: Major
>  Labels: Streaming, streaming
>
> When using reduceByKeyAndWindow() with an inverse reduce function, Spark 
> eventually creates a ReducedWindowedDStream. This class creates a 
> reducedDStream but only persists it and does not checkpoint it. The result is 
> that it ends up using cached RDDs and does not cut the lineage to the input 
> DStream, eventually caching the input RDDs for much longer than 
> they are needed.
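
A minimal sketch of the pattern being described, assuming a socket source and placeholder durations and paths; the four-argument reduceByKeyAndWindow() overload with an inverse reduce function is what builds a ReducedWindowedDStream internally:

{code:java}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("windowed-counts")
val ssc = new StreamingContext(conf, Seconds(10))
// A checkpoint directory is mandatory when an inverse reduce function is used.
ssc.checkpoint("/tmp/streaming-checkpoints")

val pairs = ssc.socketTextStream("localhost", 9999).map(word => (word, 1))

// The overload with an inverse reduce function creates a ReducedWindowedDStream.
val counts = pairs.reduceByKeyAndWindow(
  (a: Int, b: Int) => a + b, // add counts entering the window
  (a: Int, b: Int) => a - b, // subtract counts leaving the window
  Seconds(30),               // window duration
  Seconds(10))               // slide duration

counts.print()
ssc.start()
ssc.awaitTermination()
{code}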



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25302) ReducedWindowedDStream not using checkpoints for reduced RDDs

2018-09-01 Thread Nikunj Bansal (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599563#comment-16599563
 ] 

Nikunj Bansal commented on SPARK-25302:
---

See also related issue SPARK-25303

> ReducedWindowedDStream not using checkpoints for reduced RDDs
> -
>
> Key: SPARK-25302
> URL: https://issues.apache.org/jira/browse/SPARK-25302
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.1.3, 2.2.0, 
> 2.2.1, 2.2.2, 2.3.0, 2.3.1
>Reporter: Nikunj Bansal
>Priority: Major
>  Labels: Streaming, streaming
>
> When using reduceByKeyAndWindow() with an inverse reduce function, Spark 
> eventually creates a ReducedWindowedDStream. This class creates a 
> reducedDStream but only persists it and does not checkpoint it. The result is 
> that it ends up using cached RDDs and does not cut the lineage to the input 
> DStream, eventually caching the input RDDs for much longer than 
> they are needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25303) A DStream that is checkpointed should allow its parent(s) to be removed and not persisted

2018-09-01 Thread Nikunj Bansal (JIRA)
Nikunj Bansal created SPARK-25303:
-

 Summary: A DStream that is checkpointed should allow its parent(s) 
to be removed and not persisted
 Key: SPARK-25303
 URL: https://issues.apache.org/jira/browse/SPARK-25303
 Project: Spark
  Issue Type: Bug
  Components: DStreams
Affects Versions: 2.3.1, 2.3.0, 2.2.2, 2.2.1, 2.2.0, 2.1.3, 2.1.2, 2.1.1, 
2.1.0, 2.0.2, 2.0.1, 2.0.0
Reporter: Nikunj Bansal


A checkpointed DStream is supposed to cut the lineage to its parent(s) such 
that any persisted RDDs for the parent(s) are removed. However, combined with 
the issue in SPARK-25302, these bugs result in the input stream RDDs being 
persisted a lot longer than they are actually required.

See also related bug SPARK-25302.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25302) ReducedWindowedDStream not using checkpoints for reduced RDDs

2018-09-01 Thread Nikunj Bansal (JIRA)
Nikunj Bansal created SPARK-25302:
-

 Summary: ReducedWindowedDStream not using checkpoints for reduced 
RDDs
 Key: SPARK-25302
 URL: https://issues.apache.org/jira/browse/SPARK-25302
 Project: Spark
  Issue Type: Bug
  Components: DStreams
Affects Versions: 2.3.1, 2.3.0, 2.2.2, 2.2.1, 2.2.0, 2.1.3, 2.1.2, 2.1.1, 
2.1.0, 2.0.2, 2.0.1, 2.0.0
Reporter: Nikunj Bansal


When using reduceByKeyAndWindow() with an inverse reduce function, Spark eventually 
creates a ReducedWindowedDStream. This class creates a reducedDStream but only 
persists it and does not checkpoint it. The result is that it ends up using 
cached RDDs and does not cut the lineage to the input DStream, eventually 
caching the input RDDs for much longer than they are needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one

2018-09-01 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599557#comment-16599557
 ] 

Wenchen Fan edited comment on SPARK-23253 at 9/1/18 7:05 AM:
-

cc [~joshrosen] [~zsxwing] [~r...@databricks.com] [~jiangxb1987]


was (Author: cloud_fan):
cc [~joshrosen] [~zsxwing] [~r...@databricks.com]

> Only write shuffle temporary index file when there is not an existing one
> -
>
> Key: SPARK-23253
> URL: https://issues.apache.org/jira/browse/SPARK-23253
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 2.2.1
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 2.4.0
>
>
> The shuffle index temporary file is used for atomically creating the shuffle 
> index file; it is not needed when the index file already exists because 
> another attempt of the same task has already written it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one

2018-09-01 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599557#comment-16599557
 ] 

Wenchen Fan commented on SPARK-23253:
-

cc [~joshrosen] [~zsxwing] [~r...@databricks.com]

> Only write shuffle temporary index file when there is not an existing one
> -
>
> Key: SPARK-23253
> URL: https://issues.apache.org/jira/browse/SPARK-23253
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 2.2.1
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 2.4.0
>
>
> The shuffle index temporary file is used for atomically creating the shuffle 
> index file; it is not needed when the index file already exists because 
> another attempt of the same task has already written it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one

2018-09-01 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599551#comment-16599551
 ] 

Wenchen Fan commented on SPARK-23253:
-

This is dangerous: we can only skip the shuffle write if the data in the existing 
shuffle file are exactly the same as the data we are going to write, but in the PR 
we only check the size. We could use a checksum to quickly check whether the data 
are the same.

This caused a problem in https://github.com/apache/spark/pull/22112 , I'm 
reverting it in my PR; we should revisit this optimization later.
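
A minimal sketch of that checksum idea using only the JDK; these helpers are hypothetical and not Spark's actual code:

{code:java}
import java.nio.file.{Files, Path}
import java.util.zip.CRC32

// CRC32 over the whole file; a streaming digest would be preferable for
// large shuffle files, but this keeps the sketch short.
def crc32Of(p: Path): Long = {
  val crc = new CRC32()
  crc.update(Files.readAllBytes(p))
  crc.getValue
}

// Cheap size comparison first, then the content checksum.
def sameContent(a: Path, b: Path): Boolean =
  Files.size(a) == Files.size(b) && crc32Of(a) == crc32Of(b)
{code}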

> Only write shuffle temporary index file when there is not an existing one
> -
>
> Key: SPARK-23253
> URL: https://issues.apache.org/jira/browse/SPARK-23253
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle, Spark Core
>Affects Versions: 2.2.1
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 2.4.0
>
>
> The shuffle index temporary file is used for atomically creating the shuffle 
> index file; it is not needed when the index file already exists because 
> another attempt of the same task has already written it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org