[jira] [Created] (SPARK-25538) incorrect row counts after distinct()

2018-09-25 Thread Steven Rand (JIRA)
Steven Rand created SPARK-25538:
---

 Summary: incorrect row counts after distinct()
 Key: SPARK-25538
 URL: https://issues.apache.org/jira/browse/SPARK-25538
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
 Environment: Reproduced on a CentOS 7 VM and from source in IntelliJ on OS X.
Reporter: Steven Rand


It appears that {{df.distinct.count}} can return incorrect values after 
SPARK-23713. It's possible that other operations are affected as well; 
{{distinct}} just happens to be the one we noticed. I believe this issue was 
introduced by SPARK-23713 because I can't reproduce it on commits before that 
one, and I can reproduce it at that commit and afterwards, as well as with 
{{tags/v2.4.0-rc1}}. 

Below are example spark-shell sessions that illustrate the problem. Unfortunately 
the data used in these examples can't be uploaded to this Jira ticket. I'll try 
to create test data which also reproduces the issue, and will upload that if I'm 
able to do so (a rough sketch of such a check follows the examples below).

Example from Spark 2.3.1, which behaves correctly:

{code}
scala> val df = spark.read.parquet("hdfs:///data")
df: org.apache.spark.sql.DataFrame = []

scala> df.count
res0: Long = 123

scala> df.distinct.count
res1: Long = 115
{code}

Example from Spark 2.4.0-rc1, which returns different output:

{code}
scala> val df = spark.read.parquet("hdfs:///data")
df: org.apache.spark.sql.DataFrame = []

scala> df.count
res0: Long = 123

scala> df.distinct.count
res1: Long = 116

scala> df.sort("col_0").distinct.count
res2: Long = 123

scala> df.withColumnRenamed("col_0", "newName").distinct.count
res3: Long = 115
{code}
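
Pending shareable test data, here is a rough, self-contained sketch (synthetic 
data and a hypothetical path, not the affected dataset; it may or may not trigger 
the bug) of the kind of check a reproduction would perform in spark-shell:

{code:scala}
// Write a small parquet dataset with known duplicates, then compare the
// ground-truth distinct count (computed on the driver) with Spark's answer.
spark.range(0, 123).selectExpr("id % 115 as col_0")
  .write.mode("overwrite").parquet("/tmp/spark-25538-repro")

val df = spark.read.parquet("/tmp/spark-25538-repro")
val expected = df.collect().distinct.length   // 115 distinct rows
val actual = df.distinct.count
assert(actual == expected, s"distinct.count returned $actual, expected $expected")
{code}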



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25537) spark.pyspark.driver.python when set in code doesn't work

2018-09-25 Thread Venkat Sambath (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkat Sambath updated SPARK-25537:
---
Description: spark.pyspark.driver.python and spark.pyspark.python, when set in 
code, don't get picked up by the driver or executors. They get picked up only 
when set through --conf or in spark-defaults.conf. Can we add a line to the doc 
https://spark.apache.org/docs/latest/configuration.html#application-properties 
stating that it is illegal to set these in application code, as we do for 
spark.driver.extraJavaOptions?  
(was: spark.pyspark.driver.python, spark.pyspark.python when set in code doesnt 
get picked up by driver. Can we add a line which states it is illegal to set 
these in application as we do for spark.driver.extraJavaOptions )

> spark.pyspark.driver.python when set in code doesn't work
> 
>
> Key: SPARK-25537
> URL: https://issues.apache.org/jira/browse/SPARK-25537
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 2.3.0
>Reporter: Venkat Sambath
>Priority: Minor
>
> spark.pyspark.driver.python and spark.pyspark.python, when set in code, don't 
> get picked up by the driver or executors. They get picked up only when set 
> through --conf or in spark-defaults.conf. Can we add a line to the doc 
> https://spark.apache.org/docs/latest/configuration.html#application-properties 
> stating that it is illegal to set these in application code, as we do for 
> spark.driver.extraJavaOptions?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25536) executorSource.METRIC read wrong record in Executor.scala Line444

2018-09-25 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628267#comment-16628267
 ] 

shahid commented on SPARK-25536:


I will raise a PR.

> executorSource.METRIC read wrong record in Executor.scala Line444
> -
>
> Key: SPARK-25536
> URL: https://issues.apache.org/jira/browse/SPARK-25536
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: ZhuoerXu
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25537) spark.pyspark.driver.python when set in code doesn't work

2018-09-25 Thread Venkat Sambath (JIRA)
Venkat Sambath created SPARK-25537:
--

 Summary: spark.pyspark.driver.python when set in code doesn't work
 Key: SPARK-25537
 URL: https://issues.apache.org/jira/browse/SPARK-25537
 Project: Spark
  Issue Type: Documentation
  Components: Spark Core
Affects Versions: 2.3.0
Reporter: Venkat Sambath


spark.pyspark.driver.python and spark.pyspark.python, when set in code, don't get 
picked up by the driver. Can we add a line stating that it is illegal to set 
these in application code, as we do for spark.driver.extraJavaOptions?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25378) ArrayData.toArray(StringType) assumes UTF8String in 2.4

2018-09-25 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628231#comment-16628231
 ] 

Wenchen Fan commented on SPARK-25378:
-

[~mengxr] what do you think? This is not a real compatibility issue, but is 
more like a special case for Spark's adoption.

> ArrayData.toArray(StringType) assumes UTF8String in 2.4
> --
>
> Key: SPARK-25378
> URL: https://issues.apache.org/jira/browse/SPARK-25378
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Xiangrui Meng
>Priority: Critical
>
> The following code works in 2.3.1 but fails in 2.4.0-SNAPSHOT:
> {code}
> import org.apache.spark.sql.catalyst.util._
> import org.apache.spark.sql.types.StringType
> ArrayData.toArrayData(Array("a", "b")).toArray[String](StringType)
> res0: Array[String] = Array(a, b)
> {code}
> In 2.4.0-SNAPSHOT, the error is
> {code}java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
>   at 
> org.apache.spark.sql.catalyst.util.GenericArrayData.getUTF8String(GenericArrayData.scala:75)
>   at 
> org.apache.spark.sql.catalyst.InternalRow$$anonfun$getAccessor$8.apply(InternalRow.scala:136)
>   at 
> org.apache.spark.sql.catalyst.InternalRow$$anonfun$getAccessor$8.apply(InternalRow.scala:136)
>   at org.apache.spark.sql.catalyst.util.ArrayData.toArray(ArrayData.scala:178)
>   ... 51 elided
> {code}
> cc: [~cloud_fan] [~yogeshg]
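
For reference, a minimal sketch (not an official fix from this ticket) of a 
caller-side workaround under the 2.4 behavior, assuming the caller can pre-convert 
its strings to UTF8String before building the ArrayData:

{code:scala}
import org.apache.spark.sql.catalyst.util.ArrayData
import org.apache.spark.sql.types.StringType
import org.apache.spark.unsafe.types.UTF8String

// Store UTF8String elements instead of java.lang.String so that the
// StringType accessor's cast inside GenericArrayData succeeds.
val data = ArrayData.toArrayData(Array("a", "b").map(UTF8String.fromString))
val strings: Array[String] = data.toArray[UTF8String](StringType).map(_.toString)
// strings: Array[String] = Array(a, b)
{code}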



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25536) executorSource.METRIC read wrong record in Executor.scala Line444

2018-09-25 Thread ZhuoerXu (JIRA)
ZhuoerXu created SPARK-25536:


 Summary: executorSource.METRIC read wrong record in Executor.scala 
Line444
 Key: SPARK-25536
 URL: https://issues.apache.org/jira/browse/SPARK-25536
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.3.2
Reporter: ZhuoerXu






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25378) ArrayData.toArray(StringType) assumes UTF8String in 2.4

2018-09-25 Thread Liang-Chi Hsieh (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628225#comment-16628225
 ] 

Liang-Chi Hsieh commented on SPARK-25378:
-

Have we reached any decision on this yet?

> ArrayData.toArray(StringType) assumes UTF8String in 2.4
> --
>
> Key: SPARK-25378
> URL: https://issues.apache.org/jira/browse/SPARK-25378
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Xiangrui Meng
>Priority: Critical
>
> The following code works in 2.3.1 but fails in 2.4.0-SNAPSHOT:
> {code}
> import org.apache.spark.sql.catalyst.util._
> import org.apache.spark.sql.types.StringType
> ArrayData.toArrayData(Array("a", "b")).toArray[String](StringType)
> res0: Array[String] = Array(a, b)
> {code}
> In 2.4.0-SNAPSHOT, the error is
> {code}java.lang.ClassCastException: java.lang.String cannot be cast to 
> org.apache.spark.unsafe.types.UTF8String
>   at 
> org.apache.spark.sql.catalyst.util.GenericArrayData.getUTF8String(GenericArrayData.scala:75)
>   at 
> org.apache.spark.sql.catalyst.InternalRow$$anonfun$getAccessor$8.apply(InternalRow.scala:136)
>   at 
> org.apache.spark.sql.catalyst.InternalRow$$anonfun$getAccessor$8.apply(InternalRow.scala:136)
>   at org.apache.spark.sql.catalyst.util.ArrayData.toArray(ArrayData.scala:178)
>   ... 51 elided
> {code}
> cc: [~cloud_fan] [~yogeshg]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25523) Multi thread execute sparkSession.read().jdbc(url, table, properties) problem

2018-09-25 Thread huanghuai (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huanghuai updated SPARK-25523:
--
Description: 
public static void test2() throws Exception {
    String ckUrlPrefix = "jdbc:clickhouse://";
    String quote = "`";
    JdbcDialects.registerDialect(new JdbcDialect() {
        @Override
        public boolean canHandle(String url) {
            return url.startsWith(ckUrlPrefix);
        }
        @Override
        public String quoteIdentifier(String colName) {
            return quote + colName + quote;
        }
    });

    SparkSession spark = initSpark();
    String ckUrl = "jdbc:clickhouse://192.168.2.148:8123/default";
    Properties ckProp = new Properties();
    ckProp.put("user", "default");
    ckProp.put("password", "");

    String prestoUrl = "jdbc:presto://192.168.2.148:9002/mysql-xxx/xxx";
    Properties prestoUrlProp = new Properties();
    prestoUrlProp.put("user", "root");
    prestoUrlProp.put("password", "");

    // new Thread(() -> {
    //     spark.read()
    //         .jdbc(ckUrl, "ontime", ckProp).show();
    // }).start();

    System.out.println("--");

    new Thread(() -> {
        spark.read()
            .jdbc(prestoUrl, "tx_user", prestoUrlProp).show();
    }).start();

    System.out.println("--");

    new Thread(() -> {
        Dataset load = spark.read()
            .format("com.vertica.spark.datasource.DefaultSource")
            .option("host", "192.168.1.102")
            .option("port", 5433)
            .option("user", "dbadmin")
            .option("password", "manager")
            .option("db", "test")
            .option("dbschema", "public")
            .option("table", "customers")
            .load();
        load.printSchema();
        load.show();
    }).start();

    System.out.println("--");
}

public static SparkSession initSpark() throws Exception {
    return SparkSession.builder()
        .master("spark://dsjkfb1:7077")
        .appName("Test")
        .config("spark.executor.instances", 3)
        .config("spark.executor.cores", 2)
        .config("spark.cores.max", 6)
        //.config("spark.default.parallelism", 1)
        .config("spark.submit.deployMode", "client")
        .config("spark.driver.memory", "2G")
        .config("spark.executor.memory", "3G")
        .config("spark.driver.maxResultSize", "2G")
        .config("spark.local.dir", "d:\\tmp")
        .config("spark.driver.host", "192.168.2.148")
        .config("spark.scheduler.mode", "FAIR")
        .config("spark.jars", "F:\\project\\xxx\\vertica-jdbc-7.0.1-0.jar," +
            "F:\\project\\xxx\\clickhouse-jdbc-0.1.40.jar," +
            "F:\\project\\xxx\\vertica-spark-connector-9.1-2.1.jar," +
            "F:\\project\\xxx\\presto-jdbc-0.189-mining.jar")
        .getOrCreate();
}

 

 

{color:#ff}*-- The above is the code --*{color}

{color:#ff}*Question: if I open the Vertica JDBC read, the thread hangs 
forever.*{color}

{color:#ff}*And the driver log looks like this:*{color}

 

2018-09-26 10:32:51 INFO SharedState:54 - Setting hive.metastore.warehouse.dir 
('null') to the value of spark.sql.warehouse.dir 
('file:/C:/Users/admin/Desktop/test-project/sparktest/spark-warehouse/').
 2018-09-26 10:32:51 INFO SharedState:54 - Warehouse path is 
'file:/C:/Users/admin/Desktop/test-project/sparktest/spark-warehouse/'.
 2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@2f70d6e2\{/SQL,null,AVAILABLE,@Spark}
 2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@1d66833d\{/SQL/json,null,AVAILABLE,@Spark}
 2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@65af6f3a\{/SQL/execution,null,AVAILABLE,@Spark}
 2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@55012968\{/SQL/execution/json,null,AVAILABLE,@Spark}
 2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@59e3f5aa\{/static/sql,null,AVAILABLE,@Spark}
 2018-09-26 10:32:52 INFO StateStoreCoordinatorRef:54 - Registered 
StateStoreCoordinator endpoint
 2018-09-26 10:32:52 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - 
Registered executor NettyRpcEndpointRef(spark-client://Executor) 
(192.168.4.232:49434) with ID 0
 2018-09-26 10:32:52 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - 
Registered executor NettyRpcEndpointRef(spark-client://Executor) 
(192.168.4.233:44834) with ID 2
 2018-09-26 10:32:52 INFO BlockManagerMasterEndpoint:54 - Registering block 
manager 192.168.4.232:35380 with 1458.6 MB RAM, BlockManagerId(0, 
192.168.4.232, 35380, None)
 2018-09-26 10:32:52 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - 
Registered executor NettyRpcEndpointRef(spark-client://Executor) 
(192.168.4.231:42504) with ID 1
 2018-09-26 10:32:52 INFO BlockManagerMasterEndpoint:54 - Registering block 
manager 192.168.4.233:40882 with 1458.6 MB RAM, BlockManagerId(2, 
192.168.4.233, 40882, None)
 2018-09-26 10:32:52 INFO BlockManagerMasterEndpoint:54 - Registering block 
manager 192.168.4.231:44682 with 1458.6 MB RAM, BlockManagerId(1, 
192.168.4.231, 44682, None)
 2018-09-26 

[jira] [Updated] (SPARK-25523) Multi thread execute sparkSession.read().jdbc(url, table, properties) problem

2018-09-25 Thread huanghuai (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huanghuai updated SPARK-25523:
--
Description: 
public static void test2() throws Exception{
 String ckUrlPrefix="jdbc:clickhouse://";
 String quote = "`";
 JdbcDialects.registerDialect(new JdbcDialect() {
 @Override
 public boolean canHandle(String url) {
 return url.startsWith(ckUrlPrefix);
 }
 @Override
 public String quoteIdentifier(String colName) {
 return quote + colName + quote;
 }
 });

 SparkSession spark = initSpark();
 String ckUrl = "jdbc:clickhouse://192.168.2.148:8123/default";
 Properties ckProp = new Properties();
 ckProp.put("user", "default");
 ckProp.put("password", "");


 String prestoUrl = "jdbc:presto://192.168.2.148:9002/mysql-xxx/xxx";
 Properties prestoUrlProp = new Properties();
 prestoUrlProp.put("user", "root");
 prestoUrlProp.put("password", "");



// new Thread(()->{
// spark.read()
// .jdbc(ckUrl, "ontime", ckProp).show();
// }).start();


 System.out.println("--");

 new Thread(()->{
 spark.read()
 .jdbc(prestoUrl, "tx_user", prestoUrlProp).show();
 }).start();

 System.out.println("--");

 new Thread(()->{
 Dataset load = spark.read()
 .format("com.vertica.spark.datasource.DefaultSource")
 .option("host", "192.168.1.102")
 .option("port", 5433)
 .option("user", "dbadmin")
 .option("password", "manager")

 .option("db", "test")
 .option("dbschema", "public")
 .option("table", "customers")
 .load();
 load.printSchema();
 load.show();
 }).start();
 System.out.println("--");
 }




 public static SparkSession initSpark() throws Exception{
 return SparkSession.builder()
 .master("spark://dsjkfb1:7077") //spark://dsjkfb1:7077
 .appName("Test")
 .config("spark.executor.instances",3)
 .config("spark.executor.cores",2)
 .config("spark.cores.max",6)
 //.config("spark.default.parallelism",1)
 .config("spark.submit.deployMode","client")
 .config("spark.driver.memory","2G")
 .config("spark.executor.memory","3G")
 .config("spark.driver.maxResultSize", "2G")
 .config("spark.local.dir", "d:\\tmp")
 .config("spark.driver.host", "192.168.2.148")
 .config("spark.scheduler.mode", "FAIR")
 .config("spark.jars", "F:\\project\\xxx\\vertica-jdbc-7.0.1-0.jar," +
 "F:\\project\\xxx\\clickhouse-jdbc-0.1.40.jar," +
 "F:\\project\\xxx\\vertica-spark-connector-9.1-2.1.jar," +
 "F:\\project\\xxx\\presto-jdbc-0.189-mining.jar") 
 .getOrCreate();
 }

 

 

{color:#FF}*-- The above is the code --*{color}

{color:#FF}*Question: if I open the Vertica JDBC read, the thread hangs 
forever.*{color}

{color:#FF}*And the driver log looks like this:*{color}

 

2018-09-26 10:32:51 INFO SharedState:54 - Setting hive.metastore.warehouse.dir 
('null') to the value of spark.sql.warehouse.dir 
('file:/C:/Users/admin/Desktop/test-project/sparktest/spark-warehouse/').
2018-09-26 10:32:51 INFO SharedState:54 - Warehouse path is 
'file:/C:/Users/admin/Desktop/test-project/sparktest/spark-warehouse/'.
2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@2f70d6e2\{/SQL,null,AVAILABLE,@Spark}
2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@1d66833d\{/SQL/json,null,AVAILABLE,@Spark}
2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@65af6f3a\{/SQL/execution,null,AVAILABLE,@Spark}
2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@55012968\{/SQL/execution/json,null,AVAILABLE,@Spark}
2018-09-26 10:32:51 INFO ContextHandler:781 - Started 
o.s.j.s.ServletContextHandler@59e3f5aa\{/static/sql,null,AVAILABLE,@Spark}
2018-09-26 10:32:52 INFO StateStoreCoordinatorRef:54 - Registered 
StateStoreCoordinator endpoint
2018-09-26 10:32:52 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - 
Registered executor NettyRpcEndpointRef(spark-client://Executor) 
(192.168.4.232:49434) with ID 0
2018-09-26 10:32:52 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - 
Registered executor NettyRpcEndpointRef(spark-client://Executor) 
(192.168.4.233:44834) with ID 2
2018-09-26 10:32:52 INFO BlockManagerMasterEndpoint:54 - Registering block 
manager 192.168.4.232:35380 with 1458.6 MB RAM, BlockManagerId(0, 
192.168.4.232, 35380, None)
2018-09-26 10:32:52 INFO CoarseGrainedSchedulerBackend$DriverEndpoint:54 - 
Registered executor NettyRpcEndpointRef(spark-client://Executor) 
(192.168.4.231:42504) with ID 1
2018-09-26 10:32:52 INFO BlockManagerMasterEndpoint:54 - Registering block 
manager 192.168.4.233:40882 with 1458.6 MB RAM, BlockManagerId(2, 
192.168.4.233, 40882, None)
2018-09-26 10:32:52 INFO BlockManagerMasterEndpoint:54 - Registering block 
manager 192.168.4.231:44682 with 1458.6 MB RAM, BlockManagerId(1, 
192.168.4.231, 44682, None)

[jira] [Commented] (SPARK-23839) consider bucket join in cost-based JoinReorder rule

2018-09-25 Thread Xiaoju Wu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628184#comment-16628184
 ] 

Xiaoju Wu commented on SPARK-23839:
---

[~smilegator] Is there any plan on the cost-based optimizer?

> consider bucket join in cost-based JoinReorder rule
> ---
>
> Key: SPARK-23839
> URL: https://issues.apache.org/jira/browse/SPARK-23839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiaoju Wu
>Priority: Minor
>
> The cost-based JoinReorder rule was implemented in Spark 2.2 and improved with 
> histograms in Spark 2.3. However, it doesn't take into account the cost of the 
> different join implementations. For example:
> TableA JOIN TableB JOIN TableC
> TableA will output 10,000 rows after filter and projection. 
> TableB will output 10,000 rows after filter and projection. 
> TableC will output 8,000 rows after filter and projection. 
> The current JoinReorder rule may optimize the plan to join TableC with TableA 
> first and then TableB. But if TableA and TableB are bucketed tables eligible 
> for a bucket join, it could be a different story. 
>  
> Also, to support bucket joins of more than 2 tables when one table's bucket 
> number is a multiple of another's (SPARK-17570), whether the bucket join can 
> take effect depends on the result of JoinReorder. For example, for "A join B 
> join C" with bucket numbers like 8, 4, 12, the JoinReorder rule should keep 
> the order "A join B join C" so that the bucket join takes effect, instead of 
> "C join A join B". 
>  
> Based on the current CBO JoinReorder, there are possibly two parts to be 
> changed:
>  # The CostBasedJoinReorder rule is applied in the optimizer phase, while join 
> selection happens in the planner phase and bucket join optimization in 
> EnsureRequirements, which is in the preparation phase. Both are after the 
> optimizer. 
>  # The current statistics and join cost formula are based on data selectivity 
> and cardinality; we need to add statistics that represent the cost of the join 
> method (shuffle, sort, hash, etc.) and feed them into the formula to estimate 
> the join cost. 
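
For illustration only, here is a minimal spark-shell sketch (hypothetical table 
and column names, not from the original report) of the bucketed-table scenario 
described above, with bucket numbers 8, 4 and 12:

{code:scala}
import spark.implicits._

// Create three bucketed tables with different bucket counts.
Seq.tabulate(1000)(i => (i % 100, s"a$i")).toDF("key", "va")
  .write.bucketBy(8, "key").sortBy("key").saveAsTable("bucketed_a")
Seq.tabulate(1000)(i => (i % 100, s"b$i")).toDF("key", "vb")
  .write.bucketBy(4, "key").sortBy("key").saveAsTable("bucketed_b")
Seq.tabulate(1000)(i => (i % 100, s"c$i")).toDF("key", "vc")
  .write.bucketBy(12, "key").sortBy("key").saveAsTable("bucketed_c")

// Whether a shuffle can be avoided between bucketed_a and bucketed_b depends on
// the join order chosen by CostBasedJoinReorder, which currently ignores bucketing.
spark.table("bucketed_a")
  .join(spark.table("bucketed_b"), "key")
  .join(spark.table("bucketed_c"), "key")
  .explain()
{code}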



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25517) Spark DataFrame option inferSchema="true", dataFormat=MM/dd/yyyy, fails to detect date type from the csv file while reading

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25517:
--
Environment: (was: Spark 2.3.0)

> Spark DataFrame option inferSchema="true", dataFormat=MM/dd/yyyy, fails to 
> detect date type from the csv file while reading
> ---
>
> Key: SPARK-25517
> URL: https://issues.apache.org/jira/browse/SPARK-25517
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Manoranjan Kumar
>Priority: Major
>  Labels: easyfix
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> spark.read.format("csv").option("inferSchema", true).option("dateFormat", 
> "MM/dd/") fails to detect or infer the date type while reading the csv 
> file having date column in the specified format(MM/dd/)
> For example, an employee csv file (employee.csv) has the following two sample 
> dummy records (with a header):
> emp_id,emp_name,joining_date,emp_age, emp_in_time,emp_salary
> 100,Bradd Pitt,{color:#f6c342}09/25/2018{color},26,{color:#f691b2}09/25/2018 
> 10:12:36{color},1.00
> 101,Angel Joli,{color:#f6c342}08/20/2018{color},28,{color:#f691b2}08/20/2018 
> 11:32:58{color},12000.00
> When I read the above csv file as a DataFrame like below:
> val empDF = spark.read.format("csv").option("inferSchema", 
> true).option("dateFormat","MM/dd/yyyy").option("timestampFormat","MM/dd/yyyy 
> HH:mm:ss").load("employee.csv")
> empDF.printSchema()
> results/output:
> root
>  |-- emp_id: integer (nullable = true)
>  |-- emp_name: string (nullable = true)
>  |-- {color:#d04437}joining_date: string{color} (nullable = true)
>  |-- emp_age: integer (nullable = true)
>  |-- {color:#d04437}emp_in_time: timestamp{color} (nullable = true)
>  |-- emp_salary: double (nullable = true)
> Please notice above (marked in {color:#d04437}red{color}) the data types 
> automatically inferred by Spark for joining_date and emp_in_time: for 
> joining_date, it fails to detect the date type and the column remains 
> {color:#d04437}string{color}, whereas it correctly detects 
> emp_in_time as {color:#d04437}timestamp{color}.
> This is the issue I struggled with for a whole day. When I dug into the 
> Spark source code, I found that the implementation for the date type is 
> missing, whereas the implementation for timestamp is present in all its glory.
> I am new here (a first-timer), so please get back to me if you need further 
> information or a live example with running code.
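
For what it's worth, a minimal spark-shell sketch of a possible workaround until 
date inference is supported (column and file names are taken from the example 
above, and a header row is assumed): read the column as a string and convert it 
explicitly with to_date.

{code:scala}
import org.apache.spark.sql.functions.{col, to_date}

// Let inferSchema leave joining_date as a string, then cast it to DateType.
val empDF = spark.read
  .format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("timestampFormat", "MM/dd/yyyy HH:mm:ss")
  .load("employee.csv")
  .withColumn("joining_date", to_date(col("joining_date"), "MM/dd/yyyy"))

empDF.printSchema()  // joining_date should now be reported as date
{code}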



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25517) Spark DataFrame option inferSchema="true", dataFormat=MM/dd/yyyy, fails to detect date type from the csv file while reading

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-25517.
---
Resolution: Duplicate

According to the comments on the PR, I'll close this as `Duplicate` for now.

> Spark DataFrame option inferSchema="true", dataFormat=MM/dd/yyyy, fails to 
> detect date type from the csv file while reading
> ---
>
> Key: SPARK-25517
> URL: https://issues.apache.org/jira/browse/SPARK-25517
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1
> Environment: Spark 2.3.0
>Reporter: Manoranjan Kumar
>Priority: Major
>  Labels: easyfix
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> spark.read.format("csv").option("inferSchema", true).option("dateFormat", 
> "MM/dd/") fails to detect or infer the date type while reading the csv 
> file having date column in the specified format(MM/dd/)
> For example, an employee csv file (employee.csv) has the following two sample 
> dummy records (with a header):
> emp_id,emp_name,joining_date,emp_age, emp_in_time,emp_salary
> 100,Bradd Pitt,{color:#f6c342}09/25/2018{color},26,{color:#f691b2}09/25/2018 
> 10:12:36{color},1.00
> 101,Angel Joli,{color:#f6c342}08/20/2018{color},28,{color:#f691b2}08/20/2018 
> 11:32:58{color},12000.00
> When I read the above csv file as a DataFrame like below:
> val empDF = spark.read.format("csv").option("inferSchema", 
> true).option("dateFormat","MM/dd/yyyy").option("timestampFormat","MM/dd/yyyy 
> HH:mm:ss").load("employee.csv")
> empDF.printSchema()
> results/output:
> root
>  |-- emp_id: integer (nullable = true)
>  |-- emp_name: string (nullable = true)
>  |-- {color:#d04437}joining_date: string{color} (nullable = true)
>  |-- emp_age: integer (nullable = true)
>  |-- {color:#d04437}emp_in_time: timestamp{color} (nullable = true)
>  |-- emp_salary: double (nullable = true)
> Please notice above (marked in {color:#d04437}red{color}) the data types 
> automatically inferred by Spark for joining_date and emp_in_time: for 
> joining_date, it fails to detect the date type and the column remains 
> {color:#d04437}string{color}, whereas it correctly detects 
> emp_in_time as {color:#d04437}timestamp{color}.
> This is the issue I struggled with for a whole day. When I dug into the 
> Spark source code, I found that the implementation for the date type is 
> missing, whereas the implementation for timestamp is present in all its glory.
> I am new here (a first-timer), so please get back to me if you need further 
> information or a live example with running code.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-25518) Spark kafka delegation token supported

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-25518.
-

> Spark kafka delegation token supported 
> ---
>
> Key: SPARK-25518
> URL: https://issues.apache.org/jira/browse/SPARK-25518
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.3.1
>Reporter: Mingjie Tang
>Priority: Major
>  Labels: feature
>
> As we can see, Kafka is going to support delegation tokens 
> [https://cwiki-test.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka].
> Spark needs to use delegation tokens for the Kafka cluster, just as it does 
> for the HDFS, Hive and HBase servers. 
> In this Jira, we are going to track how to provide the delegation token for 
> Spark Streaming to read and write data from/to Kafka. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-25526) SparkSQL over REST API not JDBC

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-25526.
-

> SparkSQL over REST API not JDBC
> ---
>
> Key: SPARK-25526
> URL: https://issues.apache.org/jira/browse/SPARK-25526
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: t oo
>Priority: Major
>
> Ideally a SparkSQL REST API could be added to Spark, similar to the project 
> below but updated for Spark 2+.
> [https://github.com/VeritoneAlpha/jaws-spark-sql-rest]
> That project does not support Spark 2. I would like to pass a query in a curl 
> request and get back the rows retrieved by the query in JSON format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25526) SparkSQL over REST API not JDBC

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25526:
--
Component/s: (was: Spark Submit)
 (was: Spark Core)
 (was: Java API)

> SparkSQL over REST API not JDBC
> ---
>
> Key: SPARK-25526
> URL: https://issues.apache.org/jira/browse/SPARK-25526
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: t oo
>Priority: Major
>
> Ideally a SparkSQL REST API could be added to Spark, similar to the project 
> below but updated for Spark 2+.
> [https://github.com/VeritoneAlpha/jaws-spark-sql-rest]
> That project does not support Spark 2. I would like to pass a query in a curl 
> request and get back the rows retrieved by the query in JSON format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25526) SparkSQL over REST API not JDBC

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-25526.
---
Resolution: Won't Fix

> SparkSQL over REST API not JDBC
> ---
>
> Key: SPARK-25526
> URL: https://issues.apache.org/jira/browse/SPARK-25526
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API, Spark Core, Spark Submit, SQL
>Affects Versions: 2.4.0
>Reporter: t oo
>Priority: Major
>
> Ideally a SparkSQL REST API could be added to Spark, similar to the project 
> below but updated for Spark 2+.
> [https://github.com/VeritoneAlpha/jaws-spark-sql-rest]
> That project does not support Spark 2. I would like to pass a query in a curl 
> request and get back the rows retrieved by the query in JSON format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25526) SparkSQL over REST API not JDBC

2018-09-25 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628165#comment-16628165
 ] 

Dongjoon Hyun commented on SPARK-25526:
---

In addition to that, LIVY-494 recently started to add a Livy Thrift Server. In 
short, with Livy you can use both JDBC and REST.

> SparkSQL over REST API not JDBC
> ---
>
> Key: SPARK-25526
> URL: https://issues.apache.org/jira/browse/SPARK-25526
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API, Spark Core, Spark Submit, SQL
>Affects Versions: 2.4.0
>Reporter: t oo
>Priority: Major
>
> Ideally a SparkSQL REST API could be added to Spark, similar to the project 
> below but updated for Spark 2+.
> [https://github.com/VeritoneAlpha/jaws-spark-sql-rest]
> That project does not support Spark 2. I would like to pass a query in a curl 
> request and get back the rows retrieved by the query in JSON format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25526) SparkSQL over REST API not JDBC

2018-09-25 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628161#comment-16628161
 ] 

Dongjoon Hyun commented on SPARK-25526:
---

Could you try Apache Livy? Apache Livy supports all official Apache Spark 
releases. Also, it's popular, and major vendors provide it as a built-in component.

> SparkSQL over REST API not JDBC
> ---
>
> Key: SPARK-25526
> URL: https://issues.apache.org/jira/browse/SPARK-25526
> Project: Spark
>  Issue Type: New Feature
>  Components: Java API, Spark Core, Spark Submit, SQL
>Affects Versions: 2.4.0
>Reporter: t oo
>Priority: Major
>
> Ideally a SparkSQL REST API could be added to Spark, similar to the project 
> below but updated for Spark 2+.
> [https://github.com/VeritoneAlpha/jaws-spark-sql-rest]
> That project does not support Spark 2. I would like to pass a query in a curl 
> request and get back the rows retrieved by the query in JSON format.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-25511) Map with "null" key not working in spark 2.3

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-25511.
-

> Map with "null" key not working in spark 2.3
> 
>
> Key: SPARK-25511
> URL: https://issues.apache.org/jira/browse/SPARK-25511
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Ravi Shankar
>Priority: Major
>
> I had a use case where I was creating a histogram of column values through a 
> UDAF in a Map data type. It is basically just a group-by count on a column's 
> values that is returned as a Map. I needed to plug in 
> all invalid values for the column as a "null -> count" entry in the map that 
> was returned. In 2.1.x, this was working fine and I could create a Map with 
> "null" as a key. This is not working in 2.3, and I am wondering if this is 
> expected and if I have to change my application code: 
>  
> {code:java}
> val myList = List(("a", "1"), ("b", "2"), ("a", "3"), (null, "4"))
> val map = myList.toMap
> val data = List(List("sublime", map))
> val rdd = sc.parallelize(data).map(l ⇒ Row.fromSeq(l.toSeq))
> val datasetSchema = StructType(List(StructField("name", StringType, true), 
> StructField("songs", MapType(StringType, StringType, true), true)))
> val df = spark.createDataFrame(rdd, datasetSchema)
> df.take(5).foreach(println)
> {code}
> Output in spark 2.1.x:
> {code:java}
> scala> df.take(5).foreach(println)
> [sublime,Map(a -> 3, b -> 2, null -> 4)]
> {code}
> Output in spark 2.3.x:
> {code:java}
> 2018-09-21 15:35:25 ERROR Executor:91 - Exception in task 2.0 in stage 14.0 
> (TID 39)
> java.lang.RuntimeException: Error while encoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 
> else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
> fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
> org.apache.spark.sql.Row, true]), 0, name), StringType), true, false) AS 
> name#38
> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 
> else newInstance(class org.apache.spark.sql.catalyst.util.ArrayBasedMapData) 
> AS songs#39
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:291)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$4.apply(SparkSession.scala:589)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$4.apply(SparkSession.scala:589)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:109)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException: Null value appeared in 
> non-nullable field:
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
>

[jira] [Updated] (SPARK-25511) Map with "null" key not working in spark 2.3

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25511:
--
Component/s: (was: Optimizer)
 SQL

> Map with "null" key not working in spark 2.3
> 
>
> Key: SPARK-25511
> URL: https://issues.apache.org/jira/browse/SPARK-25511
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Ravi Shankar
>Priority: Major
>
> I had a use case where I was creating a histogram of column values through a 
> UDAF in a Map data type. It is basically just a group-by count on a column's 
> values that is returned as a Map. I needed to plug in 
> all invalid values for the column as a "null -> count" entry in the map that 
> was returned. In 2.1.x, this was working fine and I could create a Map with 
> "null" as a key. This is not working in 2.3, and I am wondering if this is 
> expected and if I have to change my application code: 
>  
> {code:java}
> val myList = List(("a", "1"), ("b", "2"), ("a", "3"), (null, "4"))
> val map = myList.toMap
> val data = List(List("sublime", map))
> val rdd = sc.parallelize(data).map(l ⇒ Row.fromSeq(l.toSeq))
> val datasetSchema = StructType(List(StructField("name", StringType, true), 
> StructField("songs", MapType(StringType, StringType, true), true)))
> val df = spark.createDataFrame(rdd, datasetSchema)
> df.take(5).foreach(println)
> {code}
> Output in spark 2.1.x:
> {code:java}
> scala> df.take(5).foreach(println)
> [sublime,Map(a -> 3, b -> 2, null -> 4)]
> {code}
> Output in spark 2.3.x:
> {code:java}
> 2018-09-21 15:35:25 ERROR Executor:91 - Exception in task 2.0 in stage 14.0 
> (TID 39)
> java.lang.RuntimeException: Error while encoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 
> else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
> fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
> org.apache.spark.sql.Row, true]), 0, name), StringType), true, false) AS 
> name#38
> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 
> else newInstance(class org.apache.spark.sql.catalyst.util.ArrayBasedMapData) 
> AS songs#39
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:291)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$4.apply(SparkSession.scala:589)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$4.apply(SparkSession.scala:589)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:109)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException: Null value appeared in 
> non-nullable field:
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable 

[jira] [Resolved] (SPARK-25511) Map with "null" key not working in spark 2.3

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-25511.
---
Resolution: Invalid

Hi, [~ravi_b_shankar].

Historically, it has been designed and documented this way since Apache Spark 
1.4.0, and it has been marked as a stable interface since 2.1.0.

- 
https://github.com/apache/spark/blob/v1.4.0/sql/catalyst/src/main/scala/org/apache/spark/sql/types/MapType.scala#L26

And now we are releasing Spark 2.4. Given that, this request is invalid for 
general Spark users.

In addition, Spark cannot change the behavior. So, sorry, but I'll close this 
as `Invalid`.
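
As a side note for readers hitting the same NPE, a minimal workaround sketch 
(hypothetical sentinel value, not part of this ticket): replace the null key 
before the data reaches a MapType column, since map keys are non-nullable by design.

{code:scala}
// Map the null key to a sentinel string before building the map; MapType
// keys must be non-null, so only the values may carry nulls.
val myList = List(("a", "1"), ("b", "2"), ("a", "3"), (null, "4"))
val safeMap = myList.map { case (k, v) => (Option(k).getOrElse("__null__"), v) }.toMap
// safeMap: Map(a -> 3, b -> 2, __null__ -> 4)
{code}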

> Map with "null" key not working in spark 2.3
> 
>
> Key: SPARK-25511
> URL: https://issues.apache.org/jira/browse/SPARK-25511
> Project: Spark
>  Issue Type: Bug
>  Components: Optimizer
>Affects Versions: 2.3.1
>Reporter: Ravi Shankar
>Priority: Major
>
> I had a use case where I was creating a histogram of column values through a 
> UDAF in a Map data type. It is basically just a group-by count on a column's 
> values that is returned as a Map. I needed to plug in 
> all invalid values for the column as a "null -> count" entry in the map that 
> was returned. In 2.1.x, this was working fine and I could create a Map with 
> "null" as a key. This is not working in 2.3, and I am wondering if this is 
> expected and if I have to change my application code: 
>  
> {code:java}
> val myList = List(("a", "1"), ("b", "2"), ("a", "3"), (null, "4"))
> val map = myList.toMap
> val data = List(List("sublime", map))
> val rdd = sc.parallelize(data).map(l ⇒ Row.fromSeq(l.toSeq))
> val datasetSchema = StructType(List(StructField("name", StringType, true), 
> StructField("songs", MapType(StringType, StringType, true), true)))
> val df = spark.createDataFrame(rdd, datasetSchema)
> df.take(5).foreach(println)
> {code}
> Output in spark 2.1.x:
> {code:java}
> scala> df.take(5).foreach(println)
> [sublime,Map(a -> 3, b -> 2, null -> 4)]
> {code}
> Output in spark 2.3.x:
> {code:java}
> 2018-09-21 15:35:25 ERROR Executor:91 - Exception in task 2.0 in stage 14.0 
> (TID 39)
> java.lang.RuntimeException: Error while encoding: 
> java.lang.NullPointerException: Null value appeared in non-nullable field:
> If the schema is inferred from a Scala tuple/case class, or a Java bean, 
> please try to use scala.Option[_] or other nullable types (e.g. 
> java.lang.Integer instead of int/scala.Int).
> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 
> else staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, 
> fromString, validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
> org.apache.spark.sql.Row, true]), 0, name), StringType), true, false) AS 
> name#38
> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 
> else newInstance(class org.apache.spark.sql.catalyst.util.ArrayBasedMapData) 
> AS songs#39
>   at 
> org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:291)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$4.apply(SparkSession.scala:589)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$4.apply(SparkSession.scala:589)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>   at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>   at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:830)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>   at org.apache.spark.scheduler.Task.run(Task.scala:109)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>   at 
> 

[jira] [Comment Edited] (SPARK-23985) predicate push down doesn't work with simple compound partition spec

2018-09-25 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625876#comment-16625876
 ] 

Yuming Wang edited comment on SPARK-23985 at 9/26/18 1:54 AM:
--

 [~uzadude] Seems we should not push down the predicate. Please see these test cases:

[https://github.com/apache/spark/blob/2c73d2a948bdde798aaf0f87c18846281deb05fd/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala#L1086-L1144]

 

Here is an example:
{code:scala}
spark.range(10).selectExpr("cast(id % 5 as string) as a", "id as 
b").write.saveAsTable("t1")

val w1 = spark.sql(
  "select * from (select *, row_number() over (partition by alit order by b) as 
rn from " +
"(select *, a % 4 as alit from t1) x) y where a>2 order by a")
w1.show

val w2 = spark.sql(
  "select * from (select *, row_number() over (partition by alit order by b) as 
rn from " +
"(select *, a % 4 as alit from t1 where a> 2) x) y order by a")
w2.show
{code}
output:

{noformat}
+---+---+----+---+
|  a|  b|alit| rn|
+---+---+----+---+
|  3|  3| 3.0|  1|
|  3|  8| 3.0|  2|
|  4|  4| 0.0|  2|
|  4|  9| 0.0|  4|
+---+---+----+---+

+---+---+----+---+
|  a|  b|alit| rn|
+---+---+----+---+
|  3|  3| 3.0|  1|
|  3|  8| 3.0|  2|
|  4|  4| 0.0|  1|
|  4|  9| 0.0|  2|
+---+---+----+---+
{noformat}



was (Author: q79969786):
 [~uzadude] Seem we should not push down predicate. Pelase see these test case:

https://github.com/apache/spark/blob/2c73d2a948bdde798aaf0f87c18846281deb05fd/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala#L1086-L1144

> predicate push down doesn't work with simple compound partition spec
> 
>
> Key: SPARK-23985
> URL: https://issues.apache.org/jira/browse/SPARK-23985
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Ohad Raviv
>Priority: Minor
>
> while predicate push down works with this query: 
> {code:sql}
> select * from (
>select *, row_number() over (partition by a order by b) from t1
> )z 
> where a>1
> {code}
> it doesn't work with:
> {code:sql}
> select * from (
>select *, row_number() over (partition by concat(a,'lit') order by b) from 
> t1
> )z 
> where a>1
> {code}
>  
>  I added a test to FilterPushdownSuite which I think recreates the problem:
> {code}
>   test("Window: predicate push down -- ohad") {
> val winExpr = windowExpr(count('b),
>   windowSpec(Concat('a :: Nil) :: Nil, 'b.asc :: Nil, UnspecifiedFrame))
> val originalQuery = testRelation.select('a, 'b, 'c, 
> winExpr.as('window)).where('a > 1)
> val correctAnswer = testRelation
>   .where('a > 1).select('a, 'b, 'c)
>   .window(winExpr.as('window) :: Nil, 'a :: Nil, 'b.asc :: Nil)
>   .select('a, 'b, 'c, 'window).analyze
> comparePlans(Optimize.execute(originalQuery.analyze), correctAnswer)
>   }
> {code}
> I will try to create a PR with a correction.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25514) Generating pretty JSON by to_json

2018-09-25 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-25514.
--
   Resolution: Fixed
Fix Version/s: 2.5.0

Issue resolved by pull request 22534
[https://github.com/apache/spark/pull/22534]

> Generating pretty JSON by to_json
> -
>
> Key: SPARK-25514
> URL: https://issues.apache.org/jira/browse/SPARK-25514
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
> Fix For: 2.5.0
>
>
> It would be nice to have an option, for example *"pretty"*, which enables a 
> special output mode for the to_json function. In that mode, the produced JSON 
> string will have an easily readable representation. For example:
> {code:scala}
> val json = 
> """[{"book":{"publisher":[{"country":"NL","year":[1981,1986,1999]}]}}]"""
> to_json(from_json('col, ...), Map("pretty" -> "true")))
> [ {
>   "book" : {
> "publisher" : [ {
>   "country" : "NL",
>   "year" : [ 1981, 1986, 1999 ]
> } ]
>   }
> } ]
> {code}
> There are at least two use cases:
> # Exploring the content of nested columns. For example, the result of your 
> query is a few rows, some columns have a deeply nested structure, and you want 
> to analyze and find the value of one of the nested fields.
> # You already have JSON in one of the columns and want to explore the JSON 
> records. The new option will allow doing that easily, without copy-pasting the 
> JSON content into an editor, by combining the from_json and to_json functions.
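
For context, a rough end-to-end sketch of how the option could be used from a 
DataFrame in spark-shell, assuming a Spark build that includes this change (the 
schema and column name below are made up for illustration):

{code:scala}
import org.apache.spark.sql.functions.{col, from_json, to_json}
import org.apache.spark.sql.types._
import spark.implicits._

// Hypothetical schema matching the sample JSON above.
val schema = ArrayType(new StructType()
  .add("book", new StructType()
    .add("publisher", ArrayType(new StructType()
      .add("country", StringType)
      .add("year", ArrayType(IntegerType))))))

val df = Seq(
  """[{"book":{"publisher":[{"country":"NL","year":[1981,1986,1999]}]}}]"""
).toDF("col")

// With the new option, the nested structure is printed in an indented layout.
df.select(to_json(from_json(col("col"), schema), Map("pretty" -> "true"))).show(false)
{code}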



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25514) Generating pretty JSON by to_json

2018-09-25 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-25514:


Assignee: Maxim Gekk

> Generating pretty JSON by to_json
> -
>
> Key: SPARK-25514
> URL: https://issues.apache.org/jira/browse/SPARK-25514
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Minor
>
> It would be nice to have an option, for example *"pretty"*, which enables a 
> special output mode for the to_json function. In that mode, the produced JSON 
> string will have an easily readable representation. For example:
> {code:scala}
> val json = 
> """[{"book":{"publisher":[{"country":"NL","year":[1981,1986,1999]}]}}]"""
> to_json(from_json('col, ...), Map("pretty" -> "true")))
> [ {
>   "book" : {
> "publisher" : [ {
>   "country" : "NL",
>   "year" : [ 1981, 1986, 1999 ]
> } ]
>   }
> } ]
> {code}
> There are at least two use cases:
> # Exploring the content of nested columns. For example, the result of your 
> query is a few rows, some columns have a deeply nested structure, and you want 
> to analyze and find the value of one of the nested fields.
> # You already have JSON in one of the columns and want to explore the JSON 
> records. The new option will allow doing that easily, without copy-pasting the 
> JSON content into an editor, by combining the from_json and to_json functions.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21291) R bucketBy partitionBy API

2018-09-25 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-21291.
--
   Resolution: Fixed
Fix Version/s: 2.5.0

Issue resolved by pull request 22537
[https://github.com/apache/spark/pull/22537]

> R bucketBy partitionBy API
> --
>
> Key: SPARK-21291
> URL: https://issues.apache.org/jira/browse/SPARK-21291
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 2.5.0
>
>
> partitionBy exists but it's for windowspec only



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21291) R bucketBy partitionBy API

2018-09-25 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-21291:


Assignee: Huaxin Gao

> R bucketBy partitionBy API
> --
>
> Key: SPARK-21291
> URL: https://issues.apache.org/jira/browse/SPARK-21291
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 2.5.0
>
>
> partitionBy exists, but it is for WindowSpec only



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-25534) Make `SQLHelper` trait

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25534:
--
Comment: was deleted

(was: User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/22548)

> Make `SQLHelper` trait
> --
>
> Key: SPARK-25534
> URL: https://issues.apache.org/jira/browse/SPARK-25534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Currently, Spark has 7 `withTempPath` and 6 `withSQLConf` functions. This PR 
> aims to remove duplicated and inconsistent code and reduce them to the 
> following meaningful implementations.
> *withTempPath*
> - `SQLHelper.withTempPath`: The one which was used in `SQLTestUtils`.
> *withSQLConf*
> - `SQLHelper.withSQLConf`: The one which was used in `PlanTest`.
> - `ExecutorSideSQLConfSuite.withSQLConf`: The one which doesn't throw 
> `AnalysisException` on StaticConf changes.
> - `SQLTestUtils.withSQLConf`: The one which overrides intentionally to change 
> the active session.
> {code}
>   protected override def withSQLConf(pairs: (String, String)*)(f: => Unit): 
> Unit = {
> SparkSession.setActiveSession(spark)
> super.withSQLConf(pairs: _*)(f)
>   }
> {code}
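
For readers outside the test codebase, a rough sketch (not the actual Spark 
source) of what such a shared withSQLConf helper generally does: remember the 
current values, set the requested ones, run the body, and restore the originals 
afterwards. The trait and its name below are made up for illustration.

{code:scala}
// Sketch only: the general shape of a consolidated `withSQLConf` in a helper trait.
import org.apache.spark.sql.internal.SQLConf

trait SQLConfHelperSketch {
  protected def withSQLConf(pairs: (String, String)*)(f: => Unit): Unit = {
    val conf = SQLConf.get
    val (keys, values) = pairs.unzip
    // Remember the current values so they can be restored afterwards.
    val currentValues = keys.map { key =>
      if (conf.contains(key)) Some(conf.getConfString(key)) else None
    }
    keys.zip(values).foreach { case (k, v) => conf.setConfString(k, v) }
    try f finally {
      keys.zip(currentValues).foreach {
        case (key, Some(value)) => conf.setConfString(key, value)
        case (key, None)        => conf.unsetConf(key)
      }
    }
  }
}
{code}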



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25534) Make `SQLHelper` trait

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25534:
--
Summary: Make `SQLHelper` trait  (was: Make `SupportWithSQLConf` trait)

> Make `SQLHelper` trait
> --
>
> Key: SPARK-25534
> URL: https://issues.apache.org/jira/browse/SPARK-25534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
> duplicated and inconsistent `withSQLConf` code and reduce them to three 
> meaningful implementations.
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>6
> {code}
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>3
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25534) Make `SQLHelper` trait

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25534:
--
Description: 
Currently, Spark has 7 `withTempPath` and 6 `withSQLConf` functions. This PR 
aims to remove duplicated and inconsistent code and reduce them to the 
following meaningful implementations.

*withTempPath*
- `SQLHelper.withTempPath`: The one which was used in `SQLTestUtils`.

*withSQLConf*
- `SQLHelper.withSQLConf`: The one which was used in `PlanTest`.
- `ExecutorSideSQLConfSuite.withSQLConf`: The one which doesn't throw 
`AnalysisException` on StaticConf changes.
- `SQLTestUtils.withSQLConf`: The one which overrides intentionally to change 
the active session.
{code}
  protected override def withSQLConf(pairs: (String, String)*)(f: => Unit): 
Unit = {
SparkSession.setActiveSession(spark)
super.withSQLConf(pairs: _*)(f)
  }
{code}

  was:
Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
duplicated and inconsistent `withSQLConf` code and reduce them to three 
meaningful implementations.

{code}
$ git grep 'def withSQLConf(' | wc -l
   6
{code}

{code}
$ git grep 'def withSQLConf(' | wc -l
   3
{code}


> Make `SQLHelper` trait
> --
>
> Key: SPARK-25534
> URL: https://issues.apache.org/jira/browse/SPARK-25534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Currently, Spark has 7 `withTempPath` and 6 `withSQLConf` functions. This PR 
> aims to remove duplicated and inconsistent code and reduce them to the 
> following meaningful implementations.
> *withTempPath*
> - `SQLHelper.withTempPath`: The one which was used in `SQLTestUtils`.
> *withSQLConf*
> - `SQLHelper.withSQLConf`: The one which was used in `PlanTest`.
> - `ExecutorSideSQLConfSuite.withSQLConf`: The one which doesn't throw 
> `AnalysisException` on StaticConf changes.
> - `SQLTestUtils.withSQLConf`: The one which overrides intentionally to change 
> the active session.
> {code}
>   protected override def withSQLConf(pairs: (String, String)*)(f: => Unit): 
> Unit = {
> SparkSession.setActiveSession(spark)
> super.withSQLConf(pairs: _*)(f)
>   }
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25501) Kafka delegation token support

2018-09-25 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628120#comment-16628120
 ] 

Apache Spark commented on SPARK-25501:
--

User 'merlintang' has created a pull request for this issue:
https://github.com/apache/spark/pull/22550

> Kafka delegation token support
> --
>
> Key: SPARK-25501
> URL: https://issues.apache.org/jira/browse/SPARK-25501
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> In Kafka version 1.1, delegation token support was released. As Spark has 
> updated its Kafka client to 2.0.0, it is now possible to implement delegation 
> token support. Please see the description: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25501) Kafka delegation token support

2018-09-25 Thread Mingjie Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628119#comment-16628119
 ] 

Mingjie Tang commented on SPARK-25501:
--

[~gsomogyi] if you like, you can propose a PR based on my PR as well. 
https://github.com/apache/spark/pull/22550

> Kafka delegation token support
> --
>
> Key: SPARK-25501
> URL: https://issues.apache.org/jira/browse/SPARK-25501
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> In Kafka version 1.1, delegation token support was released. As Spark has 
> updated its Kafka client to 2.0.0, it is now possible to implement delegation 
> token support. Please see the description: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25501) Kafka delegation token support

2018-09-25 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25501:


Assignee: (was: Apache Spark)

> Kafka delegation token support
> --
>
> Key: SPARK-25501
> URL: https://issues.apache.org/jira/browse/SPARK-25501
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> In Kafka version 1.1, delegation token support was released. As Spark has 
> updated its Kafka client to 2.0.0, it is now possible to implement delegation 
> token support. Please see the description: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25501) Kafka delegation token support

2018-09-25 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628117#comment-16628117
 ] 

Apache Spark commented on SPARK-25501:
--

User 'merlintang' has created a pull request for this issue:
https://github.com/apache/spark/pull/22550

> Kafka delegation token support
> --
>
> Key: SPARK-25501
> URL: https://issues.apache.org/jira/browse/SPARK-25501
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> In Kafka version 1.1, delegation token support was released. As Spark has 
> updated its Kafka client to 2.0.0, it is now possible to implement delegation 
> token support. Please see the description: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25501) Kafka delegation token support

2018-09-25 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25501:


Assignee: Apache Spark

> Kafka delegation token support
> --
>
> Key: SPARK-25501
> URL: https://issues.apache.org/jira/browse/SPARK-25501
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Gabor Somogyi
>Assignee: Apache Spark
>Priority: Major
>
> In Kafka version 1.1, delegation token support was released. As Spark has 
> updated its Kafka client to 2.0.0, it is now possible to implement delegation 
> token support. Please see the description: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25422) flaky test: org.apache.spark.DistributedSuite.caching on disk, replicated (encryption = on) (with replication as stream)

2018-09-25 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-25422.
-
   Resolution: Fixed
Fix Version/s: 2.4.0

Issue resolved by pull request 22546
[https://github.com/apache/spark/pull/22546]

> flaky test: org.apache.spark.DistributedSuite.caching on disk, replicated 
> (encryption = on) (with replication as stream)
> 
>
> Key: SPARK-25422
> URL: https://issues.apache.org/jira/browse/SPARK-25422
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Wenchen Fan
>Assignee: Imran Rashid
>Priority: Major
> Fix For: 2.4.0
>
>
> stacktrace
> {code}
>  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 7, localhost, executor 1): java.io.IOException: 
> org.apache.spark.SparkException: corrupt remote block broadcast_0_piece0 of 
> broadcast_0: 1651574976 != 1165629262
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1320)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
>   at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
>   at org.apache.spark.scheduler.Task.run(Task.scala:121)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$7.apply(Executor.scala:367)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1347)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:373)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: corrupt remote block 
> broadcast_0_piece0 of broadcast_0: 1651574976 != 1165629262
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:167)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:151)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:151)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:151)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:231)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1313)
>   ... 13 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25422) flaky test: org.apache.spark.DistributedSuite.caching on disk, replicated (encryption = on) (with replication as stream)

2018-09-25 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-25422:
---

Assignee: Imran Rashid

> flaky test: org.apache.spark.DistributedSuite.caching on disk, replicated 
> (encryption = on) (with replication as stream)
> 
>
> Key: SPARK-25422
> URL: https://issues.apache.org/jira/browse/SPARK-25422
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Wenchen Fan
>Assignee: Imran Rashid
>Priority: Major
> Fix For: 2.4.0
>
>
> stacktrace
> {code}
>  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 7, localhost, executor 1): java.io.IOException: 
> org.apache.spark.SparkException: corrupt remote block broadcast_0_piece0 of 
> broadcast_0: 1651574976 != 1165629262
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1320)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
>   at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
>   at org.apache.spark.scheduler.Task.run(Task.scala:121)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$7.apply(Executor.scala:367)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1347)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:373)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: corrupt remote block 
> broadcast_0_piece0 of broadcast_0: 1651574976 != 1165629262
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:167)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:151)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:151)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:151)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:231)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1313)
>   ... 13 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23839) consider bucket join in cost-based JoinReorder rule

2018-09-25 Thread Xiao Li (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628080#comment-16628080
 ] 

Xiao Li commented on SPARK-23839:
-

To implement CBO in the planner, we would need a major change to the planner. 
The stats-based JoinReorder rule is just the current workaround until we build 
the actual cost-based optimizer. 

> consider bucket join in cost-based JoinReorder rule
> ---
>
> Key: SPARK-23839
> URL: https://issues.apache.org/jira/browse/SPARK-23839
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiaoju Wu
>Priority: Minor
>
> The cost-based JoinReorder rule has been available since Spark 2.2, and in the 
> Spark 2.3 release it was improved with histograms. However, it does not take 
> into account the cost of the different join implementations. For example:
> TableA JOIN TableB JOIN TableC
> TableA  will output 10,000 rows after filter and projection. 
> TableB  will output 10,000 rows after filter and projection. 
> TableC  will output 8,000 rows after filter and projection. 
> The current JoinReorder rule will possibly optimize the plan to join TableC 
> with TableA first and then TableB. But if TableA and TableB are bucketed 
> tables to which a bucket join can be applied, it could be a different story. 
>  
> Also, to support bucket joins of more than 2 tables when one table's bucket 
> number is a multiple of another's (SPARK-17570), whether the bucket join can 
> take effect depends on the result of JoinReorder. For example, for "A join B 
> join C" with bucket numbers like 8, 4, 12, the JoinReorder rule should keep 
> the order "A join B join C" so that the bucket join takes effect, instead of 
> producing "C join A join B". 
>  
> Based on the current CBO JoinReorder, there are possibly 2 parts to be changed:
>  # The CostBasedJoinReorder rule is applied in the optimizer phase, while we do 
> join selection in the planner phase and bucket join optimization in 
> EnsureRequirements, which is in the preparation phase. Both come after the 
> optimizer. 
>  # The current statistics and join cost formula are based on data selectivity 
> and cardinality; we need to add statistics that represent the join method 
> costs, such as shuffle, sort, and hash. We also need to add these statistics 
> into the formula to estimate the join cost. 
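
As a rough illustration of the bucket join effect discussed here (table names 
and sizes below are made up, and this is not part of the proposal): when two 
tables are bucketed on the join key with the same bucket count, the join can 
avoid a shuffle, which is precisely the cost the current stats-based reordering 
does not model.

{code:scala}
// Illustration only: two small tables bucketed on the join key.
spark.range(100000).selectExpr("id AS key", "id % 100 AS a")
  .write.mode("overwrite").bucketBy(8, "key").sortBy("key").saveAsTable("table_a")
spark.range(100000).selectExpr("id AS key", "id % 100 AS b")
  .write.mode("overwrite").bucketBy(8, "key").sortBy("key").saveAsTable("table_b")

// With matching bucketing, the physical plan should show no Exchange (shuffle)
// on either side of this join.
spark.table("table_a").join(spark.table("table_b"), "key").explain()
{code}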



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25535) Work around bad error checking in commons-crypto

2018-09-25 Thread Marcelo Vanzin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628078#comment-16628078
 ] 

Marcelo Vanzin commented on SPARK-25535:


I filed CRYPTO-141 for the commons-crypto fix. But in some parts of Spark we 
can avoid that issue, so let's do that to avoid requiring a new release of that 
library.

> Work around bad error checking in commons-crypto
> 
>
> Key: SPARK-25535
> URL: https://issues.apache.org/jira/browse/SPARK-25535
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.2
>Reporter: Marcelo Vanzin
>Priority: Major
>
> The commons-crypto library used for encryption can get confused when certain 
> errors happen; that can lead to crashes since the Java side thinks the 
> ciphers are still valid while the native side has already cleaned up the 
> ciphers.
> We can work around that in Spark by doing some error checking at a higher 
> level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25535) Work around bad error checking in commons-crypto

2018-09-25 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-25535:
--

 Summary: Work around bad error checking in commons-crypto
 Key: SPARK-25535
 URL: https://issues.apache.org/jira/browse/SPARK-25535
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.2
Reporter: Marcelo Vanzin


The commons-crypto library used for encryption can get confused when certain 
errors happen; that can lead to crashes since the Java side thinks the ciphers 
are still valid while the native side has already cleaned up the ciphers.

We can work around that in Spark by doing some error checking at a higher level.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25533) Inconsistent message for Completed Jobs in the JobUI, when there are failed jobs, compared to spark2.2

2018-09-25 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-25533:
---
Description: 
Test steps:

 1) bin/spark-shell
{code:java}
sc.parallelize(1 to 5, 5).collect()
sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Fail 
Job")}.collect()
{code}
*Output in spark - 2.3.1:*

!Screenshot from 2018-09-26 00-42-00.png!

*Output in spark - 2.2.1:*

!Screenshot from 2018-09-26 00-46-35.png!

  was:
Test steps:

 1) bin/spark-shell
{code:java}
sc.parallelize(1 to 5, 5).collect()
sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Bad 
executor")}.collect()
{code}
*Output in spark - 2.3.1:*

 !Screenshot from 2018-09-26 00-42-00.png! 

*Output in spark - 2.2.1:*

 !Screenshot from 2018-09-26 00-46-35.png! 


> Inconsistent message for Completed Jobs in the  JobUI, when there are failed 
> jobs, compared to spark2.2
> ---
>
> Key: SPARK-25533
> URL: https://issues.apache.org/jira/browse/SPARK-25533
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Major
> Attachments: Screenshot from 2018-09-26 00-42-00.png, Screenshot from 
> 2018-09-26 00-46-35.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sc.parallelize(1 to 5, 5).collect()
> sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Fail 
> Job")}.collect()
> {code}
> *Output in spark - 2.3.1:*
> !Screenshot from 2018-09-26 00-42-00.png!
> *Output in spark - 2.2.1:*
> !Screenshot from 2018-09-26 00-46-35.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25102) Write Spark version information to Parquet file footers

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25102:
--
Component/s: (was: Spark Core)
 SQL

> Write Spark version information to Parquet file footers
> ---
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Zoltan Ivanfi
>Priority: Minor
>
> -PARQUET-352- added support for the "writer.model.name" property in the 
> Parquet metadata to identify the object model (application) that wrote the 
> file.
> The easiest way to write this property is by overriding getName() of 
> org.apache.parquet.hadoop.api.WriteSupport. In Spark, this would mean adding 
> getName() to the 
> org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25102) Write Spark version information to Parquet file footers

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25102:
--
Priority: Minor  (was: Major)

> Write Spark version information to Parquet file footers
> ---
>
> Key: SPARK-25102
> URL: https://issues.apache.org/jira/browse/SPARK-25102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Zoltan Ivanfi
>Priority: Minor
>
> -PARQUET-352- added support for the "writer.model.name" property in the 
> Parquet metadata to identify the object model (application) that wrote the 
> file.
> The easiest way to write this property is by overriding getName() of 
> org.apache.parquet.hadoop.api.WriteSupport. In Spark, this would mean adding 
> getName() to the 
> org.apache.spark.sql.execution.datasources.parquet.ParquetWriteSupport class.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25533) Inconsistent message for Completed Jobs in the JobUI, when there are failed jobs, compared to spark2.2

2018-09-25 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627979#comment-16627979
 ] 

Apache Spark commented on SPARK-25533:
--

User 'shahidki31' has created a pull request for this issue:
https://github.com/apache/spark/pull/22549

> Inconsistent message for Completed Jobs in the  JobUI, when there are failed 
> jobs, compared to spark2.2
> ---
>
> Key: SPARK-25533
> URL: https://issues.apache.org/jira/browse/SPARK-25533
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Major
> Attachments: Screenshot from 2018-09-26 00-42-00.png, Screenshot from 
> 2018-09-26 00-46-35.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sc.parallelize(1 to 5, 5).collect()
> sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Bad 
> executor")}.collect()
> {code}
> *Output in spark - 2.3.1:*
>  !Screenshot from 2018-09-26 00-42-00.png! 
> *Output in spark - 2.2.1:*
>  !Screenshot from 2018-09-26 00-46-35.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25533) Inconsistent message for Completed Jobs in the JobUI, when there are failed jobs, compared to spark2.2

2018-09-25 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25533:


Assignee: (was: Apache Spark)

> Inconsistent message for Completed Jobs in the  JobUI, when there are failed 
> jobs, compared to spark2.2
> ---
>
> Key: SPARK-25533
> URL: https://issues.apache.org/jira/browse/SPARK-25533
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Major
> Attachments: Screenshot from 2018-09-26 00-42-00.png, Screenshot from 
> 2018-09-26 00-46-35.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sc.parallelize(1 to 5, 5).collect()
> sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Bad 
> executor")}.collect()
> {code}
> *Output in spark - 2.3.1:*
>  !Screenshot from 2018-09-26 00-42-00.png! 
> *Output in spark - 2.2.1:*
>  !Screenshot from 2018-09-26 00-46-35.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25533) Inconsistent message for Completed Jobs in the JobUI, when there are failed jobs, compared to spark2.2

2018-09-25 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25533:


Assignee: Apache Spark

> Inconsistent message for Completed Jobs in the  JobUI, when there are failed 
> jobs, compared to spark2.2
> ---
>
> Key: SPARK-25533
> URL: https://issues.apache.org/jira/browse/SPARK-25533
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Assignee: Apache Spark
>Priority: Major
> Attachments: Screenshot from 2018-09-26 00-42-00.png, Screenshot from 
> 2018-09-26 00-46-35.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sc.parallelize(1 to 5, 5).collect()
> sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Bad 
> executor")}.collect()
> {code}
> *Output in spark - 2.3.1:*
>  !Screenshot from 2018-09-26 00-42-00.png! 
> *Output in spark - 2.2.1:*
>  !Screenshot from 2018-09-26 00-46-35.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25534) Make `SupportWithSQLConf` trait

2018-09-25 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627974#comment-16627974
 ] 

Apache Spark commented on SPARK-25534:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/22548

> Make `SupportWithSQLConf` trait
> ---
>
> Key: SPARK-25534
> URL: https://issues.apache.org/jira/browse/SPARK-25534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
> duplicated and inconsistent `withSQLConf` code and reduce them to three 
> meaningful implementations.
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>6
> {code}
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>3
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25534) Make `SupportWithSQLConf` trait

2018-09-25 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25534:


Assignee: Dongjoon Hyun  (was: Apache Spark)

> Make `SupportWithSQLConf` trait
> ---
>
> Key: SPARK-25534
> URL: https://issues.apache.org/jira/browse/SPARK-25534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
> duplicated and inconsistent `withSQLConf` code and reduce them to three 
> meaningful implementations.
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>6
> {code}
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>3
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25534) Make `SupportWithSQLConf` trait

2018-09-25 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25534:


Assignee: Apache Spark  (was: Dongjoon Hyun)

> Make `SupportWithSQLConf` trait
> ---
>
> Key: SPARK-25534
> URL: https://issues.apache.org/jira/browse/SPARK-25534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>
> Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
> duplicated and inconsistent `withSQLConf` code and reduce them to three 
> meaningful implementations.
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>6
> {code}
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>3
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25534) Make `SupportWithSQLConf` trait

2018-09-25 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627973#comment-16627973
 ] 

Apache Spark commented on SPARK-25534:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/22548

> Make `SupportWithSQLConf` trait
> ---
>
> Key: SPARK-25534
> URL: https://issues.apache.org/jira/browse/SPARK-25534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
> duplicated and inconsistent `withSQLConf` code and reduce them to three 
> meaningful implementations.
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>6
> {code}
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>3
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25534) Make `SupportWithSQLConf` trait

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25534:
--
Description: 
Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
duplicated and inconsistent `withSQLConf` code and reduce them to three 
meaningful implementations.

{code}
$ git grep 'def withSQLConf(' | wc -l
   6
{code}

{code}
$ git grep 'def withSQLConf(' | wc -l
   3
{code}

  was:
Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
duplicated and inconsistent `withSQLConf` code and reduce them to two 
meaningful implementations.

{code}
$ git grep 'def withSQLConf(' | wc -l
   6
{code}

{code}
$ git grep 'def withSQLConf(' | wc -l
   3
{code}


> Make `SupportWithSQLConf` trait
> ---
>
> Key: SPARK-25534
> URL: https://issues.apache.org/jira/browse/SPARK-25534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
> duplicated and inconsistent `withSQLConf` code and reduce them to three 
> meaningful implementations.
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>6
> {code}
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>3
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25534) Make `SupportWithSQLConf` trait

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25534:
--
Description: 
Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
duplicated and inconsistent `withSQLConf` code and reduce them to two 
meaningful implementations.

{code}
$ git grep 'def withSQLConf(' | wc -l
   6
{code}

{code}
$ git grep 'def withSQLConf(' | wc -l
   3
{code}

  was:
Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
duplicated and inconsistent `withSQLConf` code and reduce them to two 
meaningful implementations.

{code}
$ git grep 'def withSQLConf(' | wc -l
   6
{code}

{code}
$ git grep 'def withSQLConf(' | wc -l
   2
{code}


> Make `SupportWithSQLConf` trait
> ---
>
> Key: SPARK-25534
> URL: https://issues.apache.org/jira/browse/SPARK-25534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
> duplicated and inconsistent `withSQLConf` code and reduce them to two 
> meaningful implementations.
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>6
> {code}
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>3
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25534) Make `SupportWithSQLConf` trait

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25534:
--
Description: 
Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
duplicated and inconsistent `withSQLConf` code and reduce them to two 
meaningful implementations.

{code}
$ git grep 'def withSQLConf(' | wc -l
   6
{code}

{code}
$ git grep 'def withSQLConf(' | wc -l
   2
{code}

  was:
Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
duplicated and inconsistent `withSQLConf` code and reduce them to two 
implementations.

{code}
$ git grep 'def withSQLConf(' | wc -l
   6
{code}

{code}
$ git grep 'def withSQLConf(' | wc -l
   2
{code}


> Make `SupportWithSQLConf` trait
> ---
>
> Key: SPARK-25534
> URL: https://issues.apache.org/jira/browse/SPARK-25534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
> duplicated and inconsistent `withSQLConf` code and reduce them to two 
> meaningful implementations.
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>6
> {code}
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>2
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25534) Make `SupportWithSQLConf` trait

2018-09-25 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627961#comment-16627961
 ] 

Dongjoon Hyun commented on SPARK-25534:
---

I'll make a PR soon.

> Make `SupportWithSQLConf` trait
> ---
>
> Key: SPARK-25534
> URL: https://issues.apache.org/jira/browse/SPARK-25534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
> duplicated and inconsistent `withSQLConf` code and reduce them to two 
> implementations.
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>6
> {code}
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>2
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25534) Make `SupportWithSQLConf` trait

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-25534:
-

Assignee: Dongjoon Hyun

> Make `SupportWithSQLConf` trait
> ---
>
> Key: SPARK-25534
> URL: https://issues.apache.org/jira/browse/SPARK-25534
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
> duplicated and inconsistent `withSQLConf` code and reduce them to two 
> implementations.
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>6
> {code}
> {code}
> $ git grep 'def withSQLConf(' | wc -l
>2
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25534) Make `SupportWithSQLConf` trait

2018-09-25 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-25534:
-

 Summary: Make `SupportWithSQLConf` trait
 Key: SPARK-25534
 URL: https://issues.apache.org/jira/browse/SPARK-25534
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.5.0
Reporter: Dongjoon Hyun


Currently, Spark has 6 `withSQLConf` functions. This issue aims to remove 
duplicated and inconsistent `withSQLConf` code and reduce them to two 
implementations.

{code}
$ git grep 'def withSQLConf(' | wc -l
   6
{code}

{code}
$ git grep 'def withSQLConf(' | wc -l
   2
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25501) Kafka delegation token support

2018-09-25 Thread Mingjie Tang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627934#comment-16627934
 ] 

Mingjie Tang commented on SPARK-25501:
--

[~gsomogyi] any updates on your PR? I can propose a PR as well. 

> Kafka delegation token support
> --
>
> Key: SPARK-25501
> URL: https://issues.apache.org/jira/browse/SPARK-25501
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> In Kafka version 1.1, delegation token support was released. As Spark has 
> updated its Kafka client to 2.0.0, it is now possible to implement delegation 
> token support. Please see the description: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25533) Inconsistent message for Completed Jobs in the JobUI, when there are failed jobs, compared to spark2.2

2018-09-25 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-25533:
---
Summary: Inconsistent message for Completed Jobs in the  JobUI, when there 
are failed jobs, compared to spark2.2  (was: Inconsistent message of Completed 
Jobs in the  JobUI, when there are failed jobs, compared to spark2.2)

> Inconsistent message for Completed Jobs in the  JobUI, when there are failed 
> jobs, compared to spark2.2
> ---
>
> Key: SPARK-25533
> URL: https://issues.apache.org/jira/browse/SPARK-25533
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Major
> Attachments: Screenshot from 2018-09-26 00-42-00.png, Screenshot from 
> 2018-09-26 00-46-35.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sc.parallelize(1 to 5, 5).collect()
> sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Bad 
> executor")}.collect()
> {code}
> *Output in spark - 2.3.1:*
>  !Screenshot from 2018-09-26 00-42-00.png! 
> *Output in spark - 2.2.1:*
>  !Screenshot from 2018-09-26 00-46-35.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25533) Inconsistent message of Completed Jobs in the JobUI, when there are failed jobs, compared to spark2.2

2018-09-25 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-25533:
---
Attachment: (was: Screenshot from 2018-09-26 00-42-00.png)

> Inconsistent message of Completed Jobs in the  JobUI, when there are failed 
> jobs, compared to spark2.2
> --
>
> Key: SPARK-25533
> URL: https://issues.apache.org/jira/browse/SPARK-25533
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Major
> Attachments: Screenshot from 2018-09-26 00-42-00.png, Screenshot from 
> 2018-09-26 00-46-35.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sc.parallelize(1 to 5, 5).collect()
> sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Bad 
> executor")}.collect()
> {code}
> *Output in spark - 2.3.1:*
>  !Screenshot from 2018-09-26 00-42-00.png! 
> *Output in spark - 2.2.1:*
>  !Screenshot from 2018-09-26 00-46-35.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25533) Inconsistent message of Completed Jobs in the JobUI, when there are failed jobs, compared to spark2.2

2018-09-25 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-25533:
---
Description: 
Test steps:

 1) bin/spark-shell
{code:java}
sc.parallelize(1 to 5, 5).collect()
sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Bad 
executor")}.collect()
{code}
*Output in spark - 2.3.1:*

 !Screenshot from 2018-09-26 00-42-00.png! 

*Output in spark - 2.2.1:*

 !Screenshot from 2018-09-26 00-46-35.png! 

  was:
Test steps:

 1) bin/spark-shell
{code:java}
sc.parallelize(1 to 5, 5).collect()
sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Bad 
executor")}.collect()
{code}
*Output in spark - 2.3.1:*

!image-2018-09-26-00-47-42-209.png!

*Output in spark - 2.2.1:*

!image-2018-09-26-00-47-16-344.png!


> Inconsistent message of Completed Jobs in the  JobUI, when there are failed 
> jobs, compared to spark2.2
> --
>
> Key: SPARK-25533
> URL: https://issues.apache.org/jira/browse/SPARK-25533
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Major
> Attachments: Screenshot from 2018-09-26 00-42-00.png, Screenshot from 
> 2018-09-26 00-46-35.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sc.parallelize(1 to 5, 5).collect()
> sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Bad 
> executor")}.collect()
> {code}
> *Output in spark - 2.3.1:*
>  !Screenshot from 2018-09-26 00-42-00.png! 
> *Output in spark - 2.2.1:*
>  !Screenshot from 2018-09-26 00-46-35.png! 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25533) Inconsistent message of Completed Jobs in the JobUI, when there are failed jobs, compared to spark2.2

2018-09-25 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-25533:
---
Attachment: Screenshot from 2018-09-26 00-46-35.png

> Inconsistent message of Completed Jobs in the  JobUI, when there are failed 
> jobs, compared to spark2.2
> --
>
> Key: SPARK-25533
> URL: https://issues.apache.org/jira/browse/SPARK-25533
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Major
> Attachments: Screenshot from 2018-09-26 00-42-00.png, Screenshot from 
> 2018-09-26 00-42-00.png, Screenshot from 2018-09-26 00-46-35.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sc.parallelize(1 to 5, 5).collect()
> sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Bad 
> executor")}.collect()
> {code}
> *Output in spark - 2.3.1:*
> !image-2018-09-26-00-47-42-209.png!
> *Output in spark - 2.2.1:*
> !image-2018-09-26-00-47-16-344.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25533) Inconsistent message of Completed Jobs in the JobUI, when there are failed jobs, compared to spark2.2

2018-09-25 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-25533:
---
Attachment: Screenshot from 2018-09-26 00-42-00.png

> Inconsistent message of Completed Jobs in the  JobUI, when there are failed 
> jobs, compared to spark2.2
> --
>
> Key: SPARK-25533
> URL: https://issues.apache.org/jira/browse/SPARK-25533
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Major
> Attachments: Screenshot from 2018-09-26 00-42-00.png, Screenshot from 
> 2018-09-26 00-42-00.png, Screenshot from 2018-09-26 00-46-35.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sc.parallelize(1 to 5, 5).collect()
> sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Bad 
> executor")}.collect()
> {code}
> *Output in spark - 2.3.1:*
> !image-2018-09-26-00-47-42-209.png!
> *Output in spark - 2.2.1:*
> !image-2018-09-26-00-47-16-344.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25533) Inconsistent message of Completed Jobs in the JobUI, when there are failed jobs, compared to spark2.2

2018-09-25 Thread shahid (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627823#comment-16627823
 ] 

shahid commented on SPARK-25533:


I am working on it.

> Inconsistent message of Completed Jobs in the  JobUI, when there are failed 
> jobs, compared to spark2.2
> --
>
> Key: SPARK-25533
> URL: https://issues.apache.org/jira/browse/SPARK-25533
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Major
> Attachments: Screenshot from 2018-09-26 00-42-00.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sc.parallelize(1 to 5, 5).collect()
> sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Bad 
> executor")}.collect()
> {code}
> *Output in spark - 2.3.1:*
> !image-2018-09-26-00-47-42-209.png!
> *Output in spark - 2.2.1:*
> !image-2018-09-26-00-47-16-344.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25533) Inconsistent message of Completed Jobs in the JobUI, when there are failed jobs, compared to spark2.2

2018-09-25 Thread shahid (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

shahid updated SPARK-25533:
---
Attachment: Screenshot from 2018-09-26 00-42-00.png

> Inconsistent message of Completed Jobs in the  JobUI, when there are failed 
> jobs, compared to spark2.2
> --
>
> Key: SPARK-25533
> URL: https://issues.apache.org/jira/browse/SPARK-25533
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: shahid
>Priority: Major
> Attachments: Screenshot from 2018-09-26 00-42-00.png
>
>
> Test steps:
>  1) bin/spark-shell
> {code:java}
> sc.parallelize(1 to 5, 5).collect()
> sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Bad 
> executor")}.collect()
> {code}
> *Output in spark - 2.3.1:*
> !image-2018-09-26-00-47-42-209.png!
> *Output in spark - 2.2.1:*
> !image-2018-09-26-00-47-16-344.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25533) Inconsistent message of Completed Jobs in the JobUI, when there are failed jobs, compared to spark2.2

2018-09-25 Thread shahid (JIRA)
shahid created SPARK-25533:
--

 Summary: Inconsistent message of Completed Jobs in the  JobUI, 
when there are failed jobs, compared to spark2.2
 Key: SPARK-25533
 URL: https://issues.apache.org/jira/browse/SPARK-25533
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.3.1
Reporter: shahid


Test steps:

 1) bin/spark-shell
{code:java}
sc.parallelize(1 to 5, 5).collect()
sc.parallelize(1 to 5, 2).map{ x => throw new RuntimeException("Bad 
executor")}.collect()
{code}
*Output in spark - 2.3.1:*

!image-2018-09-26-00-47-42-209.png!

*Output in spark - 2.2.1:*

!image-2018-09-26-00-47-16-344.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25495) FetchedData.reset doesn't reset _nextOffsetInFetchedData and _offsetAfterPoll

2018-09-25 Thread Shixiong Zhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-25495.
--
   Resolution: Fixed
Fix Version/s: 2.4.0

> FetchedData.reset doesn't reset _nextOffsetInFetchedData and _offsetAfterPoll
> -
>
> Key: SPARK-25495
> URL: https://issues.apache.org/jira/browse/SPARK-25495
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Blocker
>  Labels: correctness
> Fix For: 2.4.0
>
>
> FetchedData.reset doesn't reset _nextOffsetInFetchedData and _offsetAfterPoll, 
> which leaves the cached data inconsistent and may make the Kafka connector 
> return wrong results.
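To make the failure mode concrete, here is a minimal sketch of the bug pattern being described. Only the two field names come from the issue text; the class body is invented for illustration and is not the actual Kafka connector code.

{code:scala}
// Simplified, invented sketch of the bug pattern described above; the real class
// lives in the Kafka connector and is more involved. Only the field names follow
// the issue text.
class FetchedData(var records: Iterator[Array[Byte]]) {
  var _nextOffsetInFetchedData: Long = -1L
  var _offsetAfterPoll: Long = -1L

  // Buggy reset: clears the cached records but keeps the stale offsets, so later
  // reads may trust offsets that no longer match the (now empty) cache.
  def resetBuggy(): Unit = {
    records = Iterator.empty
  }

  // Fixed reset: also invalidates the offset bookkeeping.
  def reset(): Unit = {
    records = Iterator.empty
    _nextOffsetInFetchedData = -1L
    _offsetAfterPoll = -1L
  }
}
{code}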






[jira] [Assigned] (SPARK-17636) Parquet filter push down doesn't handle struct fields

2018-09-25 Thread DB Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-17636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai reassigned SPARK-17636:
---

Assignee: DB Tsai

> Parquet filter push down doesn't handle struct fields
> -
>
> Key: SPARK-17636
> URL: https://issues.apache.org/jira/browse/SPARK-17636
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.6.2, 1.6.3, 2.0.2
>Reporter: Mitesh
>Assignee: DB Tsai
>Priority: Minor
> Fix For: 2.5.0
>
>
> There's a *PushedFilters* entry for a simple numeric field, but not for a 
> numeric field inside a struct. Not sure whether this is a limitation imposed 
> by Parquet, or purely a Spark limitation.
> {noformat}
> scala> hc.read.parquet("s3a://some/parquet/file").select("day_timestamp", 
> "sale_id")
> res5: org.apache.spark.sql.DataFrame = [day_timestamp: 
> struct, sale_id: bigint]
> scala> res5.filter("sale_id > 4").queryExecution.executedPlan
> res9: org.apache.spark.sql.execution.SparkPlan =
> Filter[23814] [args=(sale_id#86324L > 
> 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
> +- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: 
> s3a://some/parquet/file, PushedFilters: [GreaterThan(sale_id,4)]
> scala> res5.filter("day_timestamp.timestamp > 4").queryExecution.executedPlan
> res10: org.apache.spark.sql.execution.SparkPlan =
> Filter[23815] [args=(day_timestamp#86302.timestamp > 
> 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
> +- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: 
> s3a://some/parquet/file
> {noformat}
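For readers who want to check the same thing in their own environment, a self-contained spark-shell sketch along these lines shows the difference in the physical plan; the path, case classes and column names below are made up for illustration.

{code:scala}
// Hypothetical reproduction: write a tiny Parquet file with a struct column, then
// compare the plans of a top-level filter and a nested-field filter.
case class Inner(ts: Long)
case class Sale(saleId: Long, nested: Inner)

val df = Seq(Sale(1L, Inner(10L)), Sale(5L, Inner(20L))).toDF()
df.write.mode("overwrite").parquet("/tmp/struct_pushdown_demo")

val loaded = spark.read.parquet("/tmp/struct_pushdown_demo")
// Top-level column: the scan node should show a PushedFilters entry.
loaded.filter($"saleId" > 4).explain()
// Nested struct field: on the affected versions, no PushedFilters entry appears.
loaded.filter($"nested.ts" > 4).explain()
{code}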






[jira] [Updated] (SPARK-17636) Parquet filter push down doesn't handle struct fields

2018-09-25 Thread DB Tsai (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-17636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai updated SPARK-17636:

Fix Version/s: 2.5.0

> Parquet filter push down doesn't handle struct fields
> -
>
> Key: SPARK-17636
> URL: https://issues.apache.org/jira/browse/SPARK-17636
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 1.6.2, 1.6.3, 2.0.2
>Reporter: Mitesh
>Assignee: DB Tsai
>Priority: Minor
> Fix For: 2.5.0
>
>
> There's a *PushedFilters* entry for a simple numeric field, but not for a 
> numeric field inside a struct. Not sure whether this is a limitation imposed 
> by Parquet, or purely a Spark limitation.
> {noformat}
> scala> hc.read.parquet("s3a://some/parquet/file").select("day_timestamp", 
> "sale_id")
> res5: org.apache.spark.sql.DataFrame = [day_timestamp: 
> struct, sale_id: bigint]
> scala> res5.filter("sale_id > 4").queryExecution.executedPlan
> res9: org.apache.spark.sql.execution.SparkPlan =
> Filter[23814] [args=(sale_id#86324L > 
> 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
> +- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: 
> s3a://some/parquet/file, PushedFilters: [GreaterThan(sale_id,4)]
> scala> res5.filter("day_timestamp.timestamp > 4").queryExecution.executedPlan
> res10: org.apache.spark.sql.execution.SparkPlan =
> Filter[23815] [args=(day_timestamp#86302.timestamp > 
> 4)][outPart=UnknownPartitioning(0)][outOrder=List()]
> +- Scan ParquetRelation[day_timestamp#86302,sale_id#86324L] InputPaths: 
> s3a://some/parquet/file
> {noformat}






[jira] [Resolved] (SPARK-25486) Refactor SortBenchmark to use main method

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-25486.
---
   Resolution: Fixed
Fix Version/s: 2.5.0

Issue resolved by pull request 22495
[https://github.com/apache/spark/pull/22495]

> Refactor SortBenchmark to use main method
> -
>
> Key: SPARK-25486
> URL: https://issues.apache.org/jira/browse/SPARK-25486
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: yucai
>Assignee: yucai
>Priority: Major
> Fix For: 2.5.0
>
>







[jira] [Assigned] (SPARK-25486) Refactor SortBenchmark to use main method

2018-09-25 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-25486:
-

Assignee: yucai

> Refactor SortBenchmark to use main method
> -
>
> Key: SPARK-25486
> URL: https://issues.apache.org/jira/browse/SPARK-25486
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: yucai
>Assignee: yucai
>Priority: Major
> Fix For: 2.5.0
>
>







[jira] [Commented] (SPARK-25528) data source V2 read side API refactoring

2018-09-25 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627650#comment-16627650
 ] 

Apache Spark commented on SPARK-25528:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/22547

> data source V2 read side API refactoring
> 
>
> Key: SPARK-25528
> URL: https://issues.apache.org/jira/browse/SPARK-25528
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>
> refactor the read side API according to this abstraction
> {code}
> batch: catalog -> table -> scan
> streaming: catalog -> table -> stream -> scan
> {code}
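As a rough illustration of that layering, a sketch like the one below may help; every trait and method name here is a placeholder invented for this note, not the actual DataSourceV2 interfaces.

{code:scala}
// Placeholder-only sketch of the catalog -> table -> (stream ->) scan layering.
trait Catalog { def loadTable(name: String): Table }
trait Table { def newScanBuilder(): ScanBuilder }
trait ScanBuilder { def build(): Scan }
trait Scan {
  def toBatch: Batch                        // batch: catalog -> table -> scan
  def toMicroBatchStream: MicroBatchStream  // streaming inserts a stream in between
}
trait Batch
trait MicroBatchStream
{code}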






[jira] [Commented] (SPARK-25528) data source V2 read side API refactoring

2018-09-25 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627649#comment-16627649
 ] 

Apache Spark commented on SPARK-25528:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/22547

> data source V2 read side API refactoring
> 
>
> Key: SPARK-25528
> URL: https://issues.apache.org/jira/browse/SPARK-25528
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>
> refactor the read side API according to this abstraction
> {code}
> batch: catalog -> table -> scan
> streaming: catalog -> table -> stream -> scan
> {code}






[jira] [Assigned] (SPARK-25528) data source V2 read side API refactoring

2018-09-25 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25528:


Assignee: Wenchen Fan  (was: Apache Spark)

> data source V2 read side API refactoring
> 
>
> Key: SPARK-25528
> URL: https://issues.apache.org/jira/browse/SPARK-25528
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>
> refactor the read side API according to this abstraction
> {code}
> batch: catalog -> table -> scan
> streaming: catalog -> table -> stream -> scan
> {code}






[jira] [Commented] (SPARK-25528) data source V2 read side API refactoring

2018-09-25 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627648#comment-16627648
 ] 

Apache Spark commented on SPARK-25528:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/22547

> data source V2 read side API refactoring
> 
>
> Key: SPARK-25528
> URL: https://issues.apache.org/jira/browse/SPARK-25528
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>
> refactor the read side API according to this abstraction
> {code}
> batch: catalog -> table -> scan
> streaming: catalog -> table -> stream -> scan
> {code}






[jira] [Assigned] (SPARK-25528) data source V2 read side API refactoring

2018-09-25 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-25528:


Assignee: Apache Spark  (was: Wenchen Fan)

> data source V2 read side API refactoring
> 
>
> Key: SPARK-25528
> URL: https://issues.apache.org/jira/browse/SPARK-25528
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>
> refactor the read side API according to this abstraction
> {code}
> batch: catalog -> table -> scan
> streaming: catalog -> table -> stream -> scan
> {code}






[jira] [Resolved] (SPARK-23521) SPIP: Standardize SQL logical plans with DataSourceV2

2018-09-25 Thread Ryan Blue (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan Blue resolved SPARK-23521.
---
Resolution: Fixed

Marking this as "Fixed" because the vote passed.

> SPIP: Standardize SQL logical plans with DataSourceV2
> -
>
> Key: SPARK-23521
> URL: https://issues.apache.org/jira/browse/SPARK-23521
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ryan Blue
>Priority: Major
>  Labels: SPIP
>
> Executive Summary: This SPIP is based on [discussion about the DataSourceV2 
> implementation|https://lists.apache.org/thread.html/55676ec1f5039d3deaf347d391cf82fe8574b8fa4eeab70110ed5b2b@%3Cdev.spark.apache.org%3E]
>  on the dev list. The proposal is to standardize the logical plans used for 
> write operations to make the planner more maintainable and to make Spark's 
> write behavior predictable and reliable. It proposes the following principles:
>  # Use well-defined logical plan nodes for all high-level operations: insert, 
> create, CTAS, overwrite table, etc.
>  # Use planner rules that match on these high-level nodes, so that it isn’t 
> necessary to create rules to match each eventual code path individually.
>  # Clearly define Spark’s behavior for these logical plan nodes. Physical 
> nodes should implement that behavior so that all code paths eventually make 
> the same guarantees.
>  # Specialize implementation when creating a physical plan, not logical 
> plans. This will avoid behavior drift and ensure planner code is shared 
> across physical implementations.
> The SPIP doc presents a small but complete set of those high-level logical 
> operations, most of which are already defined in SQL or implemented by some 
> write path in Spark.
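As a purely illustrative example of principle 1, a "well-defined logical plan node" for appends could be as small as the sketch below; the types are stand-ins for Catalyst's real classes, not the actual implementation.

{code:scala}
// Purely illustrative: one dedicated logical node per high-level operation, so
// planner rules can match on the node instead of on many individual code paths.
trait LogicalPlan
trait NamedRelation { def name: String }

// AppendData: append the rows produced by `query` to `table`, nothing more.
case class AppendData(table: NamedRelation, query: LogicalPlan) extends LogicalPlan
{code}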






[jira] [Commented] (SPARK-25527) Job stuck waiting for last stage to start

2018-09-25 Thread Li Yuanjian (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627593#comment-16627593
 ] 

Li Yuanjian commented on SPARK-25527:
-

{quote}
There are no Tasks waiting for completion, and the job just hangs.
{quote}
You can check the driver thread dump at that point; generally speaking, the 
driver is doing some heavy work, such as a commit, in this scenario. We need 
more clues to confirm this is a bug.
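For completeness, one low-tech way to capture such a thread dump is `jstack <driver-pid>` from the shell; the snippet below is an equivalent, self-contained alternative that can be pasted into the driver and uses only standard JDK APIs.

{code:scala}
// Dump the name, state and stack of every thread in the driver JVM.
import scala.collection.JavaConverters._

def dumpThreads(): Unit = {
  Thread.getAllStackTraces.asScala.foreach { case (thread, frames) =>
    println(s"--- ${thread.getName} (state: ${thread.getState}) ---")
    frames.foreach(frame => println(s"    at $frame"))
  }
}

dumpThreads()
{code}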

> Job stuck waiting for last stage to start
> -
>
> Key: SPARK-25527
> URL: https://issues.apache.org/jira/browse/SPARK-25527
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0
>Reporter: Ran Haim
>Priority: Major
>
> Sometimes a job gets stuck waiting for the last stage to start.
> There are no tasks waiting for completion, and the job just hangs, even though 
> executors are available for the job to run.
> I do not know how to reproduce this; all I know is that it happens randomly 
> after a couple of days of heavy load.
> Another detail that might help: it seems to happen when some tasks fail because 
> one or more executors were killed (due to memory issues or similar). Those 
> tasks eventually do get finished by other executors thanks to retries, but the 
> next stage hangs.






[jira] [Created] (SPARK-25532) A stable and efficient row representation

2018-09-25 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-25532:
---

 Summary: A stable and efficient row representation
 Key: SPARK-25532
 URL: https://issues.apache.org/jira/browse/SPARK-25532
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.5.0
Reporter: Wenchen Fan


Currently the data source v2 API uses {{InternalRow}} in the read/write 
interfaces, but {{InternalRow}} is not a stable API. We should design a stable 
row representation that is as efficient as {{InternalRow}}.
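One way to picture the requirement, as a sketch only (the trait and method names are invented and nothing here is a proposed API), is a small row surface whose primitive accessors avoid boxing, so it can stay stable across releases without giving up performance:

{code:scala}
// Invented-for-illustration sketch of a stable yet efficient row interface.
trait StableRow {
  def numFields: Int
  def isNullAt(ordinal: Int): Boolean
  def getInt(ordinal: Int): Int        // primitive accessors avoid boxing
  def getLong(ordinal: Int): Long
  def getDouble(ordinal: Int): Double
  def getString(ordinal: Int): String
}
{code}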






[jira] [Commented] (SPARK-25531) new write APIs for data source v2

2018-09-25 Thread Wenchen Fan (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627572#comment-16627572
 ] 

Wenchen Fan commented on SPARK-25531:
-

Hi [~rdblue], I've created this umbrella JIRA ticket to track the progress of 
the "Standardize SQL logical plans" project. Feel free to update description 
and add more sub-tasks. Thanks!

> new write APIs for data source v2
> -
>
> Key: SPARK-25531
> URL: https://issues.apache.org/jira/browse/SPARK-25531
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Wenchen Fan
>Priority: Major
>
> The current data source write API heavily depends on {{SaveMode}}, which 
> doesn't have clear semantics, especially when writing to tables.
> We should design a new set of write APIs without {{SaveMode}}.






[jira] [Updated] (SPARK-23521) SPIP: Standardize SQL logical plans with DataSourceV2

2018-09-25 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-23521:

Issue Type: Sub-task  (was: Improvement)
Parent: SPARK-25531

> SPIP: Standardize SQL logical plans with DataSourceV2
> -
>
> Key: SPARK-23521
> URL: https://issues.apache.org/jira/browse/SPARK-23521
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ryan Blue
>Priority: Major
>  Labels: SPIP
>
> Executive Summary: This SPIP is based on [discussion about the DataSourceV2 
> implementation|https://lists.apache.org/thread.html/55676ec1f5039d3deaf347d391cf82fe8574b8fa4eeab70110ed5b2b@%3Cdev.spark.apache.org%3E]
>  on the dev list. The proposal is to standardize the logical plans used for 
> write operations to make the planner more maintainable and to make Spark's 
> write behavior predictable and reliable. It proposes the following principles:
>  # Use well-defined logical plan nodes for all high-level operations: insert, 
> create, CTAS, overwrite table, etc.
>  # Use planner rules that match on these high-level nodes, so that it isn’t 
> necessary to create rules to match each eventual code path individually.
>  # Clearly define Spark’s behavior for these logical plan nodes. Physical 
> nodes should implement that behavior so that all code paths eventually make 
> the same guarantees.
>  # Specialize implementation when creating a physical plan, not logical 
> plans. This will avoid behavior drift and ensure planner code is shared 
> across physical implementations.
> The SPIP doc presents a small but complete set of those high-level logical 
> operations, most of which are already defined in SQL or implemented by some 
> write path in Spark.






[jira] [Updated] (SPARK-24923) DataSourceV2: Add CTAS and RTAS logical operations

2018-09-25 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-24923:

Parent Issue: SPARK-25531  (was: SPARK-22386)

> DataSourceV2: Add CTAS and RTAS logical operations
> --
>
> Key: SPARK-24923
> URL: https://issues.apache.org/jira/browse/SPARK-24923
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Ryan Blue
>Priority: Major
>
> When SPARK-24252 and SPARK-24251 are in, next plans to implement from the 
> SPIP are CTAS and RTAS.






[jira] [Updated] (SPARK-24253) DataSourceV2: Add DeleteSupport for delete and overwrite operations

2018-09-25 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-24253:

Parent Issue: SPARK-25531  (was: SPARK-22386)

> DataSourceV2: Add DeleteSupport for delete and overwrite operations
> ---
>
> Key: SPARK-24253
> URL: https://issues.apache.org/jira/browse/SPARK-24253
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ryan Blue
>Priority: Major
>
> Implementing delete and overwrite logical plans requires an API to delete 
> data from a data source.






[jira] [Updated] (SPARK-24251) DataSourceV2: Add AppendData logical operation

2018-09-25 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-24251:

Parent Issue: SPARK-25531  (was: SPARK-22386)

> DataSourceV2: Add AppendData logical operation
> --
>
> Key: SPARK-24251
> URL: https://issues.apache.org/jira/browse/SPARK-24251
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Ryan Blue
>Assignee: Ryan Blue
>Priority: Major
> Fix For: 2.4.0
>
>
> The SPIP to standardize SQL logical plans (SPARK-23521) proposes AppendData 
> for inserting data in append mode. This is the simplest plan to implement 
> first.






[jira] [Created] (SPARK-25531) new write APIs for data source v2

2018-09-25 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-25531:
---

 Summary: new write APIs for data source v2
 Key: SPARK-25531
 URL: https://issues.apache.org/jira/browse/SPARK-25531
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.5.0
Reporter: Wenchen Fan


The current data source write API heavily depends on {{SaveMode}}, which doesn't 
have clear semantics, especially when writing to tables.

We should design a new set of write APIs without {{SaveMode}}.
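As a hedged sketch of what "without {{SaveMode}}" could mean (the trait and method names below are hypothetical, not a proposed API), each write operation would carry exactly one well-defined behaviour instead of a mode flag whose meaning varies by data source:

{code:scala}
// Hypothetical sketch only: explicit verbs instead of a SaveMode flag.
trait WriteBuilder {
  def append(): Unit             // insert rows; the table must already exist
  def overwrite(): Unit          // replace the table contents atomically
  def create(): Unit             // fail if the table already exists
  def createOrReplace(): Unit    // CTAS / RTAS semantics
}
{code}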






[jira] [Updated] (SPARK-25523) Multi thread execute sparkSession.read().jdbc(url, table, properties) problem

2018-09-25 Thread huanghuai (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

huanghuai updated SPARK-25523:
--
Summary: Multi thread execute sparkSession.read().jdbc(url, table, 
properties) problem  (was: Multithreading executive 
sparkSession.read().jdbc(url, table, properties) problem)

> Multi thread execute sparkSession.read().jdbc(url, table, properties) problem
> -
>
> Key: SPARK-25523
> URL: https://issues.apache.org/jira/browse/SPARK-25523
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: h3. [IntelliJ 
> _IDEA_|http://www.baidu.com/link?url=7ZLtsOfyqR1YxLqcTU0Q-hqXWV_PsY6IzIzZoKhiXZZ4AcLrpQ4DoTG30yIN-Gs8]
>  
> local mode
>  
>Reporter: huanghuai
>Priority: Major
>
> When I execute sparkSession.read().jdbc(url, table, properties) from 2 or 3 
> threads, it waits indefinitely.
>  
> I found it by debugging step by step; it finally blocks in the 
> JdbcRelationProvider.createRelation() method, at line 34:
> val jdbcOptions = new JDBCOptions(parameters)
>  
> When I press F8 (step over), Spark does not move past this line.
>  
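A minimal sketch of the reported pattern is below; the JDBC URL, table names and credentials are placeholders, and the point is only that several threads call read.jdbc concurrently on the same SparkSession.

{code:scala}
// Hedged reproduction sketch with placeholder connection details.
import java.util.Properties
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

val props = new Properties()
props.setProperty("user", "test")
props.setProperty("password", "test")

val futures = (1 to 3).map { i =>
  Future {
    spark.read.jdbc("jdbc:mysql://example-host:3306/db", s"table_$i", props).count()
  }
}
futures.foreach(f => Await.result(f, Duration.Inf))
{code}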






[jira] [Resolved] (SPARK-25189) CustomMetrics should be created with ScanConfig

2018-09-25 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-25189.
-
Resolution: Not A Problem

The {{CustomMetrics}} API has been reverted.

> CustomMetrics should be created with ScanConfig
> ---
>
> Key: SPARK-25189
> URL: https://issues.apache.org/jira/browse/SPARK-25189
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Wenchen Fan
>Priority: Major
>
> The read side CustomMetrics needs to know the end offset of a micro-batch; we 
> should update the {{ProgressReporter}} to track ScanConfig, and let 
> {{SupportsCustomReaderMetrics#getCustomMetrics}} take a ScanConfig as a parameter.
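For context, the shape of the change being asked for looks roughly like the sketch below; the signatures are invented for illustration, and, as the resolution comment notes, the CustomMetrics API was later reverted.

{code:scala}
// Invented-for-illustration signatures showing the requested shape of the change.
trait ScanConfig { def endOffset: String }
trait CustomMetrics { def json: String }

trait SupportsCustomReaderMetrics {
  // Passing the ScanConfig lets the metrics know the end offset of the micro-batch.
  def getCustomMetrics(config: ScanConfig): CustomMetrics
}
{code}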






[jira] [Updated] (SPARK-25528) data source V2 read side API refactoring

2018-09-25 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-25528:

Description: 
refactor the read side API according to this abstraction
{code}
batch: catalog -> table -> scan
streaming: catalog -> table -> stream -> scan
{code}

> data source V2 read side API refactoring
> 
>
> Key: SPARK-25528
> URL: https://issues.apache.org/jira/browse/SPARK-25528
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>
> refactor the read side API according to this abstraction
> {code}
> batch: catalog -> table -> scan
> streaming: catalog -> table -> stream -> scan
> {code}






[jira] [Created] (SPARK-25530) data source v2 write side API refactoring

2018-09-25 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-25530:
---

 Summary: data source v2 write side API refactoring
 Key: SPARK-25530
 URL: https://issues.apache.org/jira/browse/SPARK-25530
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.5.0
Reporter: Wenchen Fan


refactor the write side API according to this abstraction
{code}
batch: catalog -> table -> write
streaming: catalog -> table -> stream -> write
{code}






[jira] [Deleted] (SPARK-25529) data source V2 read side API refactoring

2018-09-25 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan deleted SPARK-25529:



> data source V2 read side API refactoring
> 
>
> Key: SPARK-25529
> URL: https://issues.apache.org/jira/browse/SPARK-25529
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>







[jira] [Created] (SPARK-25529) data source V2 read side API refactoring

2018-09-25 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-25529:
---

 Summary: data source V2 read side API refactoring
 Key: SPARK-25529
 URL: https://issues.apache.org/jira/browse/SPARK-25529
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.5.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan









[jira] [Created] (SPARK-25528) data source V2 read side API refactoring

2018-09-25 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-25528:
---

 Summary: data source V2 read side API refactoring
 Key: SPARK-25528
 URL: https://issues.apache.org/jira/browse/SPARK-25528
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.5.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan









[jira] [Updated] (SPARK-25390) data source V2 API refactoring

2018-09-25 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-25390:

Target Version/s: 2.5.0

> data source V2 API refactoring
> --
>
> Key: SPARK-25390
> URL: https://issues.apache.org/jira/browse/SPARK-25390
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Wenchen Fan
>Priority: Major
>
> Currently it's not very clear how we should abstract data source v2 API. The 
> abstraction should be unified between batch and streaming, or similar but 
> have a well-defined difference between batch and streaming. And the 
> abstraction should also include catalog/table.
> An example of the abstraction:
> {code}
> batch: catalog -> table -> scan
> streaming: catalog -> table -> stream -> scan
> {code}






[jira] [Updated] (SPARK-25390) data source V2 API refactoring

2018-09-25 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-25390:

Description: 
Currently it's not very clear how we should abstract data source v2 API. The 
abstraction should be unified between batch and streaming, or similar but have 
a well-defined difference between batch and streaming. And the abstraction 
should also include catalog/table.

An example of the abstraction:
{code}
batch: catalog -> table -> scan
streaming: catalog -> table -> stream -> scan
{code}

We should refactor the data source v2 API according to the abstraction

  was:
Currently it's not very clear how we should abstract data source v2 API. The 
abstraction should be unified between batch and streaming, or similar but have 
a well-defined difference between batch and streaming. And the abstraction 
should also include catalog/table.

An example of the abstraction:
{code}
batch: catalog -> table -> scan
streaming: catalog -> table -> stream -> scan
{code}


> data source V2 API refactoring
> --
>
> Key: SPARK-25390
> URL: https://issues.apache.org/jira/browse/SPARK-25390
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Wenchen Fan
>Priority: Major
>
> Currently it's not very clear how we should abstract data source v2 API. The 
> abstraction should be unified between batch and streaming, or similar but 
> have a well-defined difference between batch and streaming. And the 
> abstraction should also include catalog/table.
> An example of the abstraction:
> {code}
> batch: catalog -> table -> scan
> streaming: catalog -> table -> stream -> scan
> {code}
> We should refactor the data source v2 API according to the abstraction






[jira] [Updated] (SPARK-25390) data source V2 API refactoring

2018-09-25 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-25390:

Summary: data source V2 API refactoring  (was: finalize the abstraction of 
data source V2 API)

> data source V2 API refactoring
> --
>
> Key: SPARK-25390
> URL: https://issues.apache.org/jira/browse/SPARK-25390
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Wenchen Fan
>Priority: Major
>
> Currently it's not very clear how we should abstract data source v2 API. The 
> abstraction should be unified between batch and streaming, or similar but 
> have a well-defined difference between batch and streaming. And the 
> abstraction should also include catalog/table.
> An example of the abstraction:
> {code}
> batch: catalog -> table -> scan
> streaming: catalog -> table -> stream -> scan
> {code}






[jira] [Updated] (SPARK-25390) finalize the abstraction of data source V2 API

2018-09-25 Thread Wenchen Fan (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-25390:

Affects Version/s: (was: 3.0.0)
   2.5.0

> finalize the abstraction of data source V2 API
> --
>
> Key: SPARK-25390
> URL: https://issues.apache.org/jira/browse/SPARK-25390
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.5.0
>Reporter: Wenchen Fan
>Priority: Major
>
> Currently it's not very clear how we should abstract data source v2 API. The 
> abstraction should be unified between batch and streaming, or similar but 
> have a well-defined difference between batch and streaming. And the 
> abstraction should also include catalog/table.
> An example of the abstraction:
> {code}
> batch: catalog -> table -> scan
> streaming: catalog -> table -> stream -> scan
> {code}






[jira] [Resolved] (SPARK-25500) Specify configmap and secrets in Spark driver and executor pods in Kubernetes

2018-09-25 Thread Abhishek Rao (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhishek Rao resolved SPARK-25500.
--
Resolution: Won't Fix

As mentioned in the ticket, we'll use the pod template approach to mount the 
configmap. Hence closing this ticket.

> Specify configmap and secrets in Spark driver and executor pods in Kubernetes
> -
>
> Key: SPARK-25500
> URL: https://issues.apache.org/jira/browse/SPARK-25500
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.1
>Reporter: Abhishek Rao
>Priority: Minor
>
> This builds on SPARK-23529. Support for specifying a configmap and secrets as 
> Spark configuration is requested.
> With PR #22146, the above functionality can be achieved by passing a template 
> file. However, for Spark properties (like log4j.properties, fairscheduler.xml 
> and metrics.properties), we are proposing this approach because it is 
> consistent with how other configuration options are specified in Spark.
> The configmaps and secrets have to be pre-created before using them as Spark 
> configuration.






[jira] [Commented] (SPARK-25422) flaky test: org.apache.spark.DistributedSuite.caching on disk, replicated (encryption = on) (with replication as stream)

2018-09-25 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627466#comment-16627466
 ] 

Apache Spark commented on SPARK-25422:
--

User 'squito' has created a pull request for this issue:
https://github.com/apache/spark/pull/22546

> flaky test: org.apache.spark.DistributedSuite.caching on disk, replicated 
> (encryption = on) (with replication as stream)
> 
>
> Key: SPARK-25422
> URL: https://issues.apache.org/jira/browse/SPARK-25422
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Wenchen Fan
>Priority: Major
>
> stacktrace
> {code}
>  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 7, localhost, executor 1): java.io.IOException: 
> org.apache.spark.SparkException: corrupt remote block broadcast_0_piece0 of 
> broadcast_0: 1651574976 != 1165629262
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1320)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
>   at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
>   at org.apache.spark.scheduler.Task.run(Task.scala:121)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$7.apply(Executor.scala:367)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1347)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:373)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: corrupt remote block 
> broadcast_0_piece0 of broadcast_0: 1651574976 != 1165629262
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:167)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:151)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:151)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:151)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:231)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1313)
>   ... 13 more
> {code}






[jira] [Commented] (SPARK-25422) flaky test: org.apache.spark.DistributedSuite.caching on disk, replicated (encryption = on) (with replication as stream)

2018-09-25 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627464#comment-16627464
 ] 

Apache Spark commented on SPARK-25422:
--

User 'squito' has created a pull request for this issue:
https://github.com/apache/spark/pull/22546

> flaky test: org.apache.spark.DistributedSuite.caching on disk, replicated 
> (encryption = on) (with replication as stream)
> 
>
> Key: SPARK-25422
> URL: https://issues.apache.org/jira/browse/SPARK-25422
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Wenchen Fan
>Priority: Major
>
> stacktrace
> {code}
>  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 7, localhost, executor 1): java.io.IOException: 
> org.apache.spark.SparkException: corrupt remote block broadcast_0_piece0 of 
> broadcast_0: 1651574976 != 1165629262
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1320)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:207)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:66)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:66)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:96)
>   at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:84)
>   at org.apache.spark.scheduler.Task.run(Task.scala:121)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$7.apply(Executor.scala:367)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1347)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:373)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: corrupt remote block 
> broadcast_0_piece0 of broadcast_0: 1651574976 != 1165629262
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:167)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:151)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:151)
>   at scala.collection.immutable.List.foreach(List.scala:392)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:151)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1$$anonfun$apply$2.apply(TorrentBroadcast.scala:231)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:211)
>   at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1313)
>   ... 13 more
> {code}






[jira] [Comment Edited] (SPARK-23985) predicate push down doesn't work with simple compound partition spec

2018-09-25 Thread Yuming Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16625876#comment-16625876
 ] 

Yuming Wang edited comment on SPARK-23985 at 9/25/18 2:19 PM:
--

 [~uzadude] It seems we should not push down the predicate. Please see these test cases:

https://github.com/apache/spark/blob/2c73d2a948bdde798aaf0f87c18846281deb05fd/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/FilterPushdownSuite.scala#L1086-L1144


was (Author: q79969786):
Thanks [~uzadude] I will deep dive it.

> predicate push down doesn't work with simple compound partition spec
> 
>
> Key: SPARK-23985
> URL: https://issues.apache.org/jira/browse/SPARK-23985
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Ohad Raviv
>Priority: Minor
>
> while predicate push down works with this query: 
> {code:sql}
> select * from (
>select *, row_number() over (partition by a order by b) from t1
> )z 
> where a>1
> {code}
> it doesn't work with:
> {code:sql}
> select * from (
>select *, row_number() over (partition by concat(a,'lit') order by b) from 
> t1
> )z 
> where a>1
> {code}
>  
>  I added a test to FilterPushdownSuite which I think recreates the problem:
> {code}
>   test("Window: predicate push down -- ohad") {
> val winExpr = windowExpr(count('b),
>   windowSpec(Concat('a :: Nil) :: Nil, 'b.asc :: Nil, UnspecifiedFrame))
> val originalQuery = testRelation.select('a, 'b, 'c, 
> winExpr.as('window)).where('a > 1)
> val correctAnswer = testRelation
>   .where('a > 1).select('a, 'b, 'c)
>   .window(winExpr.as('window) :: Nil, 'a :: Nil, 'b.asc :: Nil)
>   .select('a, 'b, 'c, 'window).analyze
> comparePlans(Optimize.execute(originalQuery.analyze), correctAnswer)
>   }
> {code}
> I will try to create a PR with a correction.






[jira] [Commented] (SPARK-24064) [Spark SQL] Create table using csv does not support binary column Type

2018-09-25 Thread Gururaj Shetty (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16627376#comment-16627376
 ] 

Gururaj Shetty commented on SPARK-24064:


If binary is not supported only for CSV, then this limitation should be captured 
in the documentation. That would be useful for users working with CSV.

> [Spark SQL] Create table  using csv does not support binary column Type
> ---
>
> Key: SPARK-24064
> URL: https://issues.apache.org/jira/browse/SPARK-24064
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment: OS Type: Suse 11
> Spark Version: 2.3.0
>Reporter: ABHISHEK KUMAR GUPTA
>Priority: Minor
>  Labels: test
>
> 1. Launch spark-sql --master yarn
> 2. create table csvTable (time timestamp, name string, isright boolean, 
> datetoday date, num binary, height double, score float, decimaler 
> decimal(10,0), id tinyint, age int, license bigint, length smallint) using 
> CSV options (path "/user/datatmo/customer1.csv");
> Result: Table creation is successful
> 3. Select * from csvTable;
> This throws the exception below:
> ERROR SparkSQLDriver:91 - Failed in [select * from csvtable]
> java.lang.UnsupportedOperationException: *CSV data source does not support 
> binary data type*.
>  at 
> org.apache.spark.sql.execution.datasources.csv.CSVUtils$.org$apache$spark$sql$execution$datasources$csv$CSVUtils$$verifyType$1(CSVUtils.scala:127)
>  at 
> org.apache.spark.sql.execution.datasources.csv.CSVUtils$$anonfun$verifySchema$1.apply(CSVUtils.scala:131)
>  at 
> org.apache.spark.sql.execution.datasources.csv.CSVUtils$$anonfun$verifySchema$1.apply(CSVUtils.scala:131)
>  at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>  at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
>  at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>  at org.apache.spark.sql.types.StructType.foreach(StructType.scala:99)
>  
> A normal (non-CSV) table, however, supports the binary data type.
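A possible workaround, sketched below under the assumption that the file layout matches the example above (column and path names are taken from it), is to declare the column as string in the CSV schema and cast it to binary after reading:

{code:scala}
// Hedged workaround sketch: read the column as string, then cast to binary.
val customers = spark.read
  .schema("time timestamp, name string, isright boolean, datetoday date, " +
          "num string, height double, score float, decimaler decimal(10,0), " +
          "id tinyint, age int, license bigint, length smallint")
  .csv("/user/datatmo/customer1.csv")

val withBinary = customers.withColumn("num", $"num".cast("binary"))
withBinary.printSchema()
{code}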






[jira] [Created] (SPARK-25527) Job stuck waiting for last stage to start

2018-09-25 Thread Ran Haim (JIRA)
Ran Haim created SPARK-25527:


 Summary: Job stuck waiting for last stage to start
 Key: SPARK-25527
 URL: https://issues.apache.org/jira/browse/SPARK-25527
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.1.0
Reporter: Ran Haim


Sometimes a job gets stuck waiting for the last stage to start.
There are no tasks waiting for completion, and the job just hangs, even though 
executors are available for the job to run.

I do not know how to reproduce this; all I know is that it happens randomly after 
a couple of days of heavy load.

Another detail that might help: it seems to happen when some tasks fail because 
one or more executors were killed (due to memory issues or similar). Those tasks 
eventually do get finished by other executors thanks to retries, but the next 
stage hangs.





