[jira] [Commented] (SPARK-26501) Unexpected overriden of exitFn in SparkSubmitSuite

2018-12-28 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730591#comment-16730591
 ] 

Apache Spark commented on SPARK-26501:
--

User 'liupc' has created a pull request for this issue:
https://github.com/apache/spark/pull/23404

> Unexpected overriden of exitFn in SparkSubmitSuite
> --
>
> Key: SPARK-26501
> URL: https://issues.apache.org/jira/browse/SPARK-26501
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.3.2, 2.4.0
>Reporter: liupengcheng
>Priority: Major
>
> When I run SparkSubmitSuite of Spark 2.3.2 in the IntelliJ IDE, I found that some 
> tests do not pass when I run them one by one, although they pass when the whole 
> SparkSubmitSuite is run.
> Tests that fail when run separately:
>  
> {code:java}
> test("SPARK_CONF_DIR overrides spark-defaults.conf") {
>   forConfDir(Map("spark.executor.memory" -> "2.3g")) { path =>
> val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
> val args = Seq(
>   "--class", SimpleApplicationTest.getClass.getName.stripSuffix("$"),
>   "--name", "testApp",
>   "--master", "local",
>   unusedJar.toString)
> val appArgs = new SparkSubmitArguments(args, Map("SPARK_CONF_DIR" -> 
> path))
> assert(appArgs.defaultPropertiesFile != null)
> assert(appArgs.defaultPropertiesFile.startsWith(path))
> assert(appArgs.propertiesFile == null)
> appArgs.executorMemory should be ("2.3g")
>   }
> }
> {code}
> Failure reason:
> {code:java}
> Error: Executor Memory cores must be a positive number
> Run with --help for usage help or --verbose for debug output
> {code}
>  
> After carefully checking the code, I found that the exitFn of SparkSubmit is 
> overridden by earlier tests via the testPrematureExit call.
> Although the above test was fixed by SPARK-22941, the overriding of exitFn 
> might cause other problems in the future.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26501) Unexpected overriden of exitFn in SparkSubmitSuite

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26501:


Assignee: (was: Apache Spark)

> Unexpected overriden of exitFn in SparkSubmitSuite
> --
>
> Key: SPARK-26501
> URL: https://issues.apache.org/jira/browse/SPARK-26501
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.3.2, 2.4.0
>Reporter: liupengcheng
>Priority: Major
>
> When I run SparkSubmitSuite of Spark 2.3.2 in the IntelliJ IDE, I found that some 
> tests do not pass when I run them one by one, although they pass when the whole 
> SparkSubmitSuite is run.
> Tests that fail when run separately:
>  
> {code:java}
> test("SPARK_CONF_DIR overrides spark-defaults.conf") {
>   forConfDir(Map("spark.executor.memory" -> "2.3g")) { path =>
> val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
> val args = Seq(
>   "--class", SimpleApplicationTest.getClass.getName.stripSuffix("$"),
>   "--name", "testApp",
>   "--master", "local",
>   unusedJar.toString)
> val appArgs = new SparkSubmitArguments(args, Map("SPARK_CONF_DIR" -> 
> path))
> assert(appArgs.defaultPropertiesFile != null)
> assert(appArgs.defaultPropertiesFile.startsWith(path))
> assert(appArgs.propertiesFile == null)
> appArgs.executorMemory should be ("2.3g")
>   }
> }
> {code}
> Failure reason:
> {code:java}
> Error: Executor Memory cores must be a positive number
> Run with --help for usage help or --verbose for debug output
> {code}
>  
> After carefully checking the code, I found that the exitFn of SparkSubmit is 
> overridden by earlier tests via the testPrematureExit call.
> Although the above test was fixed by SPARK-22941, the overriding of exitFn 
> might cause other problems in the future.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26501) Unexpected overriden of exitFn in SparkSubmitSuite

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26501:


Assignee: Apache Spark

> Unexpected overriden of exitFn in SparkSubmitSuite
> --
>
> Key: SPARK-26501
> URL: https://issues.apache.org/jira/browse/SPARK-26501
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.3.2, 2.4.0
>Reporter: liupengcheng
>Assignee: Apache Spark
>Priority: Major
>
> When I run SparkSubmitSuite of Spark 2.3.2 in the IntelliJ IDE, I found that some 
> tests do not pass when I run them one by one, although they pass when the whole 
> SparkSubmitSuite is run.
> Tests that fail when run separately:
>  
> {code:java}
> test("SPARK_CONF_DIR overrides spark-defaults.conf") {
>   forConfDir(Map("spark.executor.memory" -> "2.3g")) { path =>
> val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
> val args = Seq(
>   "--class", SimpleApplicationTest.getClass.getName.stripSuffix("$"),
>   "--name", "testApp",
>   "--master", "local",
>   unusedJar.toString)
> val appArgs = new SparkSubmitArguments(args, Map("SPARK_CONF_DIR" -> 
> path))
> assert(appArgs.defaultPropertiesFile != null)
> assert(appArgs.defaultPropertiesFile.startsWith(path))
> assert(appArgs.propertiesFile == null)
> appArgs.executorMemory should be ("2.3g")
>   }
> }
> {code}
> Failure reason:
> {code:java}
> Error: Executor Memory cores must be a positive number
> Run with --help for usage help or --verbose for debug output
> {code}
>  
> After carefully checking the code, I found that the exitFn of SparkSubmit is 
> overridden by earlier tests via the testPrematureExit call.
> Although the above test was fixed by SPARK-22941, the overriding of exitFn 
> might cause other problems in the future.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26501) Unexpected overrided exitFn in SparkSubmitSuite

2018-12-28 Thread liupengcheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liupengcheng updated SPARK-26501:
-
Summary: Unexpected overrided exitFn in SparkSubmitSuite  (was: Fix 
overrided exitFn in SparkSubmitSuite)

> Unexpected overrided exitFn in SparkSubmitSuite
> ---
>
> Key: SPARK-26501
> URL: https://issues.apache.org/jira/browse/SPARK-26501
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.3.2, 2.4.0
>Reporter: liupengcheng
>Priority: Major
>
> When I run SparkSubmitSuite of Spark 2.3.2 in the IntelliJ IDE, I found that some 
> tests do not pass when I run them one by one, although they pass when the whole 
> SparkSubmitSuite is run.
> Tests that fail when run separately:
>  
> {code:java}
> test("SPARK_CONF_DIR overrides spark-defaults.conf") {
>   forConfDir(Map("spark.executor.memory" -> "2.3g")) { path =>
> val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
> val args = Seq(
>   "--class", SimpleApplicationTest.getClass.getName.stripSuffix("$"),
>   "--name", "testApp",
>   "--master", "local",
>   unusedJar.toString)
> val appArgs = new SparkSubmitArguments(args, Map("SPARK_CONF_DIR" -> 
> path))
> assert(appArgs.defaultPropertiesFile != null)
> assert(appArgs.defaultPropertiesFile.startsWith(path))
> assert(appArgs.propertiesFile == null)
> appArgs.executorMemory should be ("2.3g")
>   }
> }
> {code}
> Failure reason:
> {code:java}
> Error: Executor Memory cores must be a positive number
> Run with --help for usage help or --verbose for debug output
> {code}
>  
> After carefully checking the code, I found that the exitFn of SparkSubmit is 
> overridden by earlier tests via the testPrematureExit call.
> Although the above test was fixed by SPARK-22941, the overriding of exitFn 
> might cause other problems in the future.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26501) Unexpected overriden of exitFn in SparkSubmitSuite

2018-12-28 Thread liupengcheng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liupengcheng updated SPARK-26501:
-
Summary: Unexpected overriden of exitFn in SparkSubmitSuite  (was: 
Unexpected overrided exitFn in SparkSubmitSuite)

> Unexpected overriden of exitFn in SparkSubmitSuite
> --
>
> Key: SPARK-26501
> URL: https://issues.apache.org/jira/browse/SPARK-26501
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy, Spark Core
>Affects Versions: 2.3.2, 2.4.0
>Reporter: liupengcheng
>Priority: Major
>
> When I run SparkSubmitSuite of Spark 2.3.2 in the IntelliJ IDE, I found that some 
> tests do not pass when I run them one by one, although they pass when the whole 
> SparkSubmitSuite is run.
> Tests that fail when run separately:
>  
> {code:java}
> test("SPARK_CONF_DIR overrides spark-defaults.conf") {
>   forConfDir(Map("spark.executor.memory" -> "2.3g")) { path =>
> val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
> val args = Seq(
>   "--class", SimpleApplicationTest.getClass.getName.stripSuffix("$"),
>   "--name", "testApp",
>   "--master", "local",
>   unusedJar.toString)
> val appArgs = new SparkSubmitArguments(args, Map("SPARK_CONF_DIR" -> 
> path))
> assert(appArgs.defaultPropertiesFile != null)
> assert(appArgs.defaultPropertiesFile.startsWith(path))
> assert(appArgs.propertiesFile == null)
> appArgs.executorMemory should be ("2.3g")
>   }
> }
> {code}
> Failure reason:
> {code:java}
> Error: Executor Memory cores must be a positive number
> Run with --help for usage help or --verbose for debug output
> {code}
>  
> After carefully checking the code, I found that the exitFn of SparkSubmit is 
> overridden by earlier tests via the testPrematureExit call.
> Although the above test was fixed by SPARK-22941, the overriding of exitFn 
> might cause other problems in the future.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26501) Fix overrided exitFn in SparkSubmitSuite

2018-12-28 Thread liupengcheng (JIRA)
liupengcheng created SPARK-26501:


 Summary: Fix overrided exitFn in SparkSubmitSuite
 Key: SPARK-26501
 URL: https://issues.apache.org/jira/browse/SPARK-26501
 Project: Spark
  Issue Type: Bug
  Components: Deploy, Spark Core
Affects Versions: 2.4.0, 2.3.2
Reporter: liupengcheng


When I run SparkSubmitSuite of Spark 2.3.2 in the IntelliJ IDE, I found that some 
tests do not pass when I run them one by one, although they pass when the whole 
SparkSubmitSuite is run.

Tests that fail when run separately:

 
{code:java}
test("SPARK_CONF_DIR overrides spark-defaults.conf") {
  forConfDir(Map("spark.executor.memory" -> "2.3g")) { path =>
val unusedJar = TestUtils.createJarWithClasses(Seq.empty)
val args = Seq(
  "--class", SimpleApplicationTest.getClass.getName.stripSuffix("$"),
  "--name", "testApp",
  "--master", "local",
  unusedJar.toString)
val appArgs = new SparkSubmitArguments(args, Map("SPARK_CONF_DIR" -> path))
assert(appArgs.defaultPropertiesFile != null)
assert(appArgs.defaultPropertiesFile.startsWith(path))
assert(appArgs.propertiesFile == null)
appArgs.executorMemory should be ("2.3g")
  }
}
{code}
Failure reason:
{code:java}
Error: Executor Memory cores must be a positive number
Run with --help for usage help or --verbose for debug output
{code}
 

After carefully checking the code, I found that the exitFn of SparkSubmit is 
overridden by earlier tests via the testPrematureExit call.

Although the above test was fixed by SPARK-22941, the overriding of exitFn 
might cause other problems in the future.
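For illustration only, a minimal sketch of the leak pattern described above and how restoring the hook in a finally block avoids it. The names here only loosely mirror SparkSubmit.exitFn and the testPrematureExit helper; this is not the actual suite code.

{code:scala}
// Hypothetical stand-in for a process-wide mutable exit hook like SparkSubmit.exitFn.
object Submit {
  var exitFn: Int => Unit = (code: Int) => sys.exit(code)
}

// A testPrematureExit-style helper that overrides the hook for one test only.
def withPrematureExitCheck(body: => Unit): Unit = {
  val saved = Submit.exitFn
  Submit.exitFn = (_: Int) => throw new IllegalStateException("premature exit")
  try {
    body // assertions that expect an early exit run here
  } finally {
    Submit.exitFn = saved // restore the hook so later tests are unaffected
  }
}
{code}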

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26022) PySpark Comparison with Pandas

2018-12-28 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-26022:
-
Target Version/s: 3.0.0

> PySpark Comparison with Pandas
> --
>
> Key: SPARK-26022
> URL: https://issues.apache.org/jira/browse/SPARK-26022
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Priority: Major
>
> It would be very nice if we could have a doc like 
> https://pandas.pydata.org/pandas-docs/stable/comparison_with_sql.html to show 
> the API differences between PySpark and Pandas.
> Reference:
> https://www.kdnuggets.com/2016/01/python-data-science-pandas-spark-dataframe-differences.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26500) Add conf to support ignore hdfs data locality

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26500:


Assignee: Apache Spark

> Add conf to support ignore hdfs data locality
> -
>
> Key: SPARK-26500
> URL: https://issues.apache.org/jira/browse/SPARK-26500
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: EdisonWang
>Assignee: Apache Spark
>Priority: Trivial
>
> When reading a large Hive table/directory with thousands of files, it can take 
> up to several minutes or even hours to calculate data locality for each 
> split in the driver, while the executors sit idle.
> This situation is even worse when running in SparkThriftServer mode, because 
> handleJobSubmitted (which calls getPreferredLocations) runs in a single 
> thread, so one big SQL query blocks all the following ones.
> At the same time, most companies' internal networks already use gigabit network 
> cards, so it is acceptable to read data without locality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26500) Add conf to support ignore hdfs data locality

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26500:


Assignee: (was: Apache Spark)

> Add conf to support ignore hdfs data locality
> -
>
> Key: SPARK-26500
> URL: https://issues.apache.org/jira/browse/SPARK-26500
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: EdisonWang
>Priority: Trivial
>
> When reading a large Hive table/directory with thousands of files, it can take 
> up to several minutes or even hours to calculate data locality for each 
> split in the driver, while the executors sit idle.
> This situation is even worse when running in SparkThriftServer mode, because 
> handleJobSubmitted (which calls getPreferredLocations) runs in a single 
> thread, so one big SQL query blocks all the following ones.
> At the same time, most companies' internal networks already use gigabit network 
> cards, so it is acceptable to read data without locality.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26500) Add conf to support ignore hdfs data locality

2018-12-28 Thread EdisonWang (JIRA)
EdisonWang created SPARK-26500:
--

 Summary: Add conf to support ignore hdfs data locality
 Key: SPARK-26500
 URL: https://issues.apache.org/jira/browse/SPARK-26500
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: EdisonWang


When reading a large Hive table/directory with thousands of files, it can take 
up to several minutes or even hours to calculate data locality for each split 
in the driver, while the executors sit idle.

This situation is even worse when running in SparkThriftServer mode, because 
handleJobSubmitted (which calls getPreferredLocations) runs in a single 
thread, so one big SQL query blocks all the following ones.

At the same time, most companies' internal networks already use gigabit network 
cards, so it is acceptable to read data without locality.
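A rough sketch of what such a switch might look like if it were added as a ConfigEntry. The config name below is purely illustrative, not an existing Spark option or the name this ticket will use.

{code:scala}
import org.apache.spark.internal.config.ConfigBuilder

// Hypothetical entry: when true, skip asking the NameNode for block locations
// while computing input splits, so the driver does not stall on locality lookups.
val IGNORE_HDFS_LOCALITY =
  ConfigBuilder("spark.hadoopRDD.ignoreDataLocality")   // illustrative key only
    .doc("If true, do not compute preferred locations for HadoopRDD splits.")
    .booleanConf
    .createWithDefault(false)
{code}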



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26442) Use ConfigEntry for hardcoded configs.

2018-12-28 Thread Takuya Ueshin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730528#comment-16730528
 ] 

Takuya Ueshin commented on SPARK-26442:
---

I see. I'll combine them into several subtasks.
I'll leave the ones that someone has already commented on as they are, so please keep going 
ahead.

> Use ConfigEntry for hardcoded configs.
> --
>
> Key: SPARK-26442
> URL: https://issues.apache.org/jira/browse/SPARK-26442
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> This umbrella JIRA is to make hardcoded configs use {{ConfigEntry}}.
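As a hedged illustration of the kind of change the umbrella covers: replacing a hardcoded key/default pair with a ConfigEntry definition that call sites reference. The key below is made up for the example, not one of the actual subtasks.

{code:scala}
import org.apache.spark.internal.config.ConfigBuilder

// Before: call sites repeat the raw string and the default.
//   val n = conf.getInt("spark.example.fooThreads", 8)

// After: a single ConfigEntry carries the key, type, doc and default.
val FOO_THREADS =
  ConfigBuilder("spark.example.fooThreads")   // made-up key for illustration
    .doc("Number of foo threads (illustrative example).")
    .intConf
    .createWithDefault(8)

// Call sites then read it in a typed way:
//   val n = conf.get(FOO_THREADS)
{code}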



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26442) Use ConfigEntry for hardcoded configs.

2018-12-28 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730519#comment-16730519
 ] 

Hyukjin Kwon commented on SPARK-26442:
--

Yes.. maybe at least the JIRAs should be combined, for instance 3 to 5 per issue, or 
we could leave one JIRA that describes the categories that should have multiple PRs.

Hey Takuya, since the JIRAs were opened by you, do you mind taking action on 
them? I assume you have a rough estimate of each JIRA.

> Use ConfigEntry for hardcoded configs.
> --
>
> Key: SPARK-26442
> URL: https://issues.apache.org/jira/browse/SPARK-26442
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> This umbrella JIRA is to make hardcoded configs use {{ConfigEntry}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26494) [spark sql] Type not found when using Spark to read Oracle TIMESTAMP(6) WITH LOCAL TIME ZONE

2018-12-28 Thread JIRA


 [ 
https://issues.apache.org/jira/browse/SPARK-26494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

秦坤 updated SPARK-26494:
---
Priority: Minor  (was: Major)

> [spark sql] Type not found when using Spark to read Oracle TIMESTAMP(6) WITH LOCAL TIME ZONE
> ---
>
> Key: SPARK-26494
> URL: https://issues.apache.org/jira/browse/SPARK-26494
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: 秦坤
>Priority: Minor
>
> When using Spark to read Oracle, the TIMESTAMP(6) WITH LOCAL TIME ZONE type is not found.
>  
> When the data type is TIMESTAMP(6) WITH LOCAL TIME ZONE, 
> the sqlType value seen by the getCatalystType function in the JdbcUtils class is -102.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26499) JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of ByteType

2018-12-28 Thread Thomas D'Silva (JIRA)
Thomas D'Silva created SPARK-26499:
--

 Summary: JdbcUtils.getCatalystType maps TINYINT to IntegerType 
instead of ByteType
 Key: SPARK-26499
 URL: https://issues.apache.org/jira/browse/SPARK-26499
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: Thomas D'Silva


I am trying to use the DataSource V2 API to read from a JDBC source. While 
using {{JdbcUtils.resultSetToSparkInternalRows}} to create an internal row from 
a ResultSet that has a column of type TINYINT, I ran into the following exception:
{code:java}
java.lang.IllegalArgumentException: Unsupported type tinyint
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter(JdbcUtils.scala:502)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetters$1.apply(JdbcUtils.scala:379)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetters$1.apply(JdbcUtils.scala:379)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetters(JdbcUtils.scala:379)
at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.<init>(JdbcUtils.scala:340)
{code}
This happens because ByteType is not handled in {{JdbcUtils.makeGetter}}.

Also, since {{JdbcUtils.getCommonJDBCType}} maps ByteType to TinyInt, I think 
{{getCatalystType}} should map TINYINT to ByteType (it currently maps TINYINT 
to IntegerType).
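A hedged sketch of the mapping change being proposed, heavily simplified (the real getCatalystType handles many more JDBC types and takes precision/metadata into account):

{code:scala}
import java.sql.Types
import org.apache.spark.sql.types._

// Simplified stand-in for JdbcUtils.getCatalystType: map TINYINT to ByteType
// (instead of IntegerType) so makeGetter can read it with ResultSet.getByte.
def toCatalystType(sqlType: Int): Option[DataType] = sqlType match {
  case Types.TINYINT  => Some(ByteType)     // proposed: was IntegerType
  case Types.SMALLINT => Some(ShortType)
  case Types.INTEGER  => Some(IntegerType)
  case _              => None
}
{code}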



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26499) JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of ByteType

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26499:


Assignee: (was: Apache Spark)

> JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of ByteType
> -
>
> Key: SPARK-26499
> URL: https://issues.apache.org/jira/browse/SPARK-26499
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Thomas D'Silva
>Priority: Major
>
> I am trying to use the DataSource V2 API to read from a JDBC source. While 
> using {{JdbcUtils.resultSetToSparkInternalRows}} to create an internal row 
> from a ResultSet that has a column of type TINYINT, I ran into the following 
> exception:
> {code:java}
> java.lang.IllegalArgumentException: Unsupported type tinyint
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter(JdbcUtils.scala:502)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetters$1.apply(JdbcUtils.scala:379)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetters$1.apply(JdbcUtils.scala:379)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetters(JdbcUtils.scala:379)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.<init>(JdbcUtils.scala:340)
> {code}
> This happens because ByteType is not handled in {{JdbcUtils.makeGetter}}.
> Also, since {{JdbcUtils.getCommonJDBCType}} maps ByteType to TinyInt, I think 
> {{getCatalystType}} should map TINYINT to ByteType (it currently maps TINYINT 
> to IntegerType).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26499) JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of ByteType

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26499:


Assignee: Apache Spark

> JdbcUtils.getCatalystType maps TINYINT to IntegerType instead of ByteType
> -
>
> Key: SPARK-26499
> URL: https://issues.apache.org/jira/browse/SPARK-26499
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Thomas D'Silva
>Assignee: Apache Spark
>Priority: Major
>
> I am trying to use the DataSource V2 API to read from a JDBC source. While 
> using {{JdbcUtils.resultSetToSparkInternalRows}} to create an internal row 
> from a ResultSet that has a column of type TINYINT, I ran into the following 
> exception:
> {code:java}
> java.lang.IllegalArgumentException: Unsupported type tinyint
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetter(JdbcUtils.scala:502)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetters$1.apply(JdbcUtils.scala:379)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetters$1.apply(JdbcUtils.scala:379)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.org$apache$spark$sql$execution$datasources$jdbc$JdbcUtils$$makeGetters(JdbcUtils.scala:379)
>   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.<init>(JdbcUtils.scala:340)
> {code}
> This happens because ByteType is not handled in {{JdbcUtils.makeGetter}}.
> Also, since {{JdbcUtils.getCommonJDBCType}} maps ByteType to TinyInt, I think 
> {{getCatalystType}} should map TINYINT to ByteType (it currently maps TINYINT 
> to IntegerType).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26498) Integrate barrier execution with MMLSpark's LightGBM

2018-12-28 Thread Ilya Matiach (JIRA)
Ilya Matiach created SPARK-26498:


 Summary: Integrate barrier execution with MMLSpark's LightGBM
 Key: SPARK-26498
 URL: https://issues.apache.org/jira/browse/SPARK-26498
 Project: Spark
  Issue Type: New Feature
  Components: ML, MLlib
Affects Versions: 2.4.0
Reporter: Ilya Matiach


I would like to use the new barrier execution mode introduced in Spark 2.4 with 
LightGBM in the Spark package mmlspark, but I ran into some issues.

Currently, the LightGBM distributed learner tries to figure out the number of 
cores on the cluster and then does a coalesce and a mapPartitions, and inside 
the mapPartitions we do a NetworkInit (where the address:port of all workers 
needs to be passed in the constructor) and pass the data in-memory to the 
native layer of the distributed lightgbm learner.

With barrier execution mode, I think the code would become much more robust.  
However, there are several issues that I am running into when trying to move my 
code over to the new barrier execution mode scheduler:

It does not support dynamic allocation; however, I think it would be convenient 
if it restarted the job when the number of workers has decreased, and allowed 
the developer to decide whether to restart the job if the number of workers has increased.

It does not work with the DataFrame or Dataset API, but I think it would be much more 
convenient if it did.

How does barrier execution mode deal with #partitions > #tasks?  If the number 
of partitions is larger than the number of “tasks” or workers, can barrier 
execution mode automatically coalesce the dataset to have # partitions == # 
tasks?
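For context, a minimal sketch of the Spark 2.4 barrier pattern being discussed here; it is illustrative only, and the LightGBM NetworkInit call is represented by a comment rather than real mmlspark code.

{code:scala}
import org.apache.spark.{BarrierTaskContext, SparkContext}

// Each barrier task learns the host:port of all peers, then waits at a barrier
// before the coordinated (NetworkInit-style) step would run.
def sketch(sc: SparkContext): Unit = {
  sc.parallelize(1 to 1000, numSlices = 4)
    .barrier()
    .mapPartitions { iter =>
      val ctx = BarrierTaskContext.get()
      val peers = ctx.getTaskInfos().map(_.address) // addresses of all tasks in the stage
      ctx.barrier()                                 // global sync point across all tasks
      // NetworkInit(peers) and the native training step would go here
      iter
    }
    .count()
}
{code}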



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23949) makes "&&" supports the function of predicate operator "and"

2018-12-28 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-23949.
---
Resolution: Won't Fix

> makes "&&" supports the function of predicate operator "and"
> 
>
> Key: SPARK-23949
> URL: https://issues.apache.org/jira/browse/SPARK-23949
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: hantiantian
>Priority: Minor
>
> In MySQL, the symbol && supports the function of the predicate operator "and"; 
> maybe we can add support for this in Spark SQL.
> For example,
> select * from tbl where id==1 && age=10
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26497) Show users where the pre-packaged SparkR and PySpark Dockerfiles are in the image build script.

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26497:


Assignee: holdenk  (was: Apache Spark)

> Show users where the pre-packaged SparkR and PySpark Dockerfiles are in the 
> image build script.
> ---
>
> Key: SPARK-26497
> URL: https://issues.apache.org/jira/browse/SPARK-26497
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Kubernetes
>Affects Versions: 3.0.0
>Reporter: holdenk
>Assignee: holdenk
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26497) Show users where the pre-packaged SparkR and PySpark Dockerfiles are in the image build script.

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26497:


Assignee: Apache Spark  (was: holdenk)

> Show users where the pre-packaged SparkR and PySpark Dockerfiles are in the 
> image build script.
> ---
>
> Key: SPARK-26497
> URL: https://issues.apache.org/jira/browse/SPARK-26497
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Kubernetes
>Affects Versions: 3.0.0
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26497) Show users where the pre-packaged SparkR and PySpark Dockerfiles are in the image build script.

2018-12-28 Thread holdenk (JIRA)
holdenk created SPARK-26497:
---

 Summary: Show users where the pre-packaged SparkR and PySpark 
Dockerfiles are in the image build script.
 Key: SPARK-26497
 URL: https://issues.apache.org/jira/browse/SPARK-26497
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, Kubernetes
Affects Versions: 3.0.0
Reporter: holdenk
Assignee: holdenk






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26496) Test "locality preferences of StateStoreAwareZippedRDD" frequently fails on High Sierra

2018-12-28 Thread Bruce Robbins (JIRA)
Bruce Robbins created SPARK-26496:
-

 Summary: Test "locality preferences of StateStoreAwareZippedRDD" 
frequently fails on High Sierra
 Key: SPARK-26496
 URL: https://issues.apache.org/jira/browse/SPARK-26496
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
 Environment: Mac OS X High Sierra

Reporter: Bruce Robbins


This is a bit esoteric and minor, but makes it difficult to run SQL unit tests 
successfully on High Sierra.

StreamingInnerJoinSuite."locality preferences of StateStoreAwareZippedRDD" 
generates a directory name using {{Random.nextString(10)}}, and frequently that 
directory name is unacceptable to High Sierra.

For example:
{noformat}
scala> val prefix = Random.nextString(10); val dir = new File("/tmp", "del_" + 
prefix + "-" + UUID.randomUUID.toString); dir.mkdirs()
prefix: String = 媈ᒢ탊渓뀟?녛ꃲ싢櫦
dir: java.io.File = /tmp/del_媈ᒢ탊渓뀟?녛ꃲ싢櫦-aff57fc6-ca38-4825-b4f3-473140edd4f6
res39: Boolean = true // this one was OK

scala> val prefix = Random.nextString(10); val dir = new File("/tmp", "del_" + 
prefix + "-" + UUID.randomUUID.toString); dir.mkdirs()
prefix: String = 窽텘⒘駖ⵚ駢⡞Ρ닋੎
dir: java.io.File = /tmp/del_窽텘⒘駖ⵚ駢⡞Ρ닋੎-a3f99855-c429-47a0-a108-47bca6905745
res40: Boolean = false  // nope, didn't like this one

scala> prefix.foreach(x => printf("%04x ", x.toInt))
7abd d158 2498 99d6 2d5a 99e2 285e 03a1 b2cb 0a4e 

scala> prefix(9)
res46: Char = ੎

scala> val prefix = "\u7abd"
prefix: String = 窽

scala> val dir = new File("/tmp", "del_" + prefix + "-" + 
UUID.randomUUID.toString); dir.mkdirs()
dir: java.io.File = /tmp/del_窽-d1c3af34-d34d-43fe-afed-ccef9a800ff4
res47: Boolean = true // it's OK with \u7abd

scala> val prefix = "\u0a4e"
prefix: String = ੎

scala> val dir = new File("/tmp", "del_" + prefix + "-" + 
UUID.randomUUID.toString); dir.mkdirs()
dir: java.io.File = /tmp/del_੎-3654a34c-6f74-4591-85af-a0f28b675a6f
res50: Boolean = false // doesn't like \u0a4e
{noformat}
I thought it might have something to do with my Java 8 version, but Python is 
equally affected:
{noformat}
>>> f = open(u"/tmp/del_\u7abd_file", "wb")
f = open(u"/tmp/del_\u7abd_file", "wb")
>>> f.write("hello\n")
f.write("hello\n")
# it's OK with \u7abd
>>> f2 = open(u"/tmp/del_\u0a4e_file", "wb")
f2 = open(u"/tmp/del_\u0a4e_file", "wb")
Traceback (most recent call last):
  File "", line 1, in 
IOError: [Errno 92] Illegal byte sequence: u'/tmp/del_\u0a4e_file'
# doesn't like \u0a4e
>>> f2 = open(u"/tmp/del_\ufa4e_file", "wb")
f2 = open(u"/tmp/del_\ufa4e_file", "wb")
# a little change and it's happy again
>>> 
{noformat}
Mac OS X Sierra is perfectly happy with these characters. This seems to be a 
limitation introduced by High Sierra.
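A small sketch of one possible fix direction, assuming the test only needs a unique suffix rather than arbitrary code points:

{code:scala}
import scala.util.Random

// Random.alphanumeric only emits ASCII letters and digits, so the resulting
// directory name is a valid path component on APFS/High Sierra as well.
val prefix: String = Random.alphanumeric.take(10).mkString   // e.g. "aZ3kQ9mB1x"
{code}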



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26493) spark.sql.extensions should support multiple extensions

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26493:


Assignee: Apache Spark

> spark.sql.extensions should support multiple extensions
> ---
>
> Key: SPARK-26493
> URL: https://issues.apache.org/jira/browse/SPARK-26493
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Jamison Bennett
>Assignee: Apache Spark
>Priority: Minor
>  Labels: starter
>
> The spark.sql.extensions configuration option should support multiple 
> extensions. It is currently possible to load multiple extensions using the 
> programmatic interface (e.g. 
> SparkSession.builder().master("..").withExtensions(sparkSessionExtensions1).withExtensions(sparkSessionExtensions2).getOrCreate()
>  ) but the same cannot currently be done with the command-line options 
> without writing a wrapper extension that combines multiple extensions.
>  
> Allowing multiple spark.sql.extensions values would allow the extensions to be 
> easily changed on the command line or via the configuration file. Multiple 
> extensions could be specified using a comma-separated list of class names. 
> Allowing multiple extensions should maintain backwards compatibility because 
> existing spark.sql.extensions configuration settings shouldn't contain a 
> comma, because the value is a class name.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26493) spark.sql.extensions should support multiple extensions

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26493:


Assignee: (was: Apache Spark)

> spark.sql.extensions should support multiple extensions
> ---
>
> Key: SPARK-26493
> URL: https://issues.apache.org/jira/browse/SPARK-26493
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Jamison Bennett
>Priority: Minor
>  Labels: starter
>
> The spark.sql.extensions configuration option should support multiple 
> extensions. It is currently possible to load multiple extensions using the 
> programmatic interface (e.g. 
> SparkSession.builder().master("..").withExtensions(sparkSessionExtensions1).withExtensions(sparkSessionExtensions2).getOrCreate()
>  ) but the same cannot currently be done with the command-line options 
> without writing a wrapper extension that combines multiple extensions.
>  
> Allowing multiple spark.sql.extensions values would allow the extensions to be 
> easily changed on the command line or via the configuration file. Multiple 
> extensions could be specified using a comma-separated list of class names. 
> Allowing multiple extensions should maintain backwards compatibility because 
> existing spark.sql.extensions configuration settings shouldn't contain a 
> comma, because the value is a class name.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26495) Simplify SelectedField extractor

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26495:


Assignee: Apache Spark  (was: Herman van Hovell)

> Simplify SelectedField extractor
> 
>
> Key: SPARK-26495
> URL: https://issues.apache.org/jira/browse/SPARK-26495
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Herman van Hovell
>Assignee: Apache Spark
>Priority: Major
>
> I was reading through the code of the {{SelectedField}} extractor and it is 
> overly complex. It contains a couple of redundant pattern matches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26495) Simplify SelectedField extractor

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26495:


Assignee: Herman van Hovell  (was: Apache Spark)

> Simplify SelectedField extractor
> 
>
> Key: SPARK-26495
> URL: https://issues.apache.org/jira/browse/SPARK-26495
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Herman van Hovell
>Assignee: Herman van Hovell
>Priority: Major
>
> I was reading through the code of the {{SelectedField}} extractor and it is 
> overly complex. It contains a couple of redundant pattern matches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24888) spark-submit --master spark://host:port --status driver-id does not work

2018-12-28 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-24888.
---
Resolution: Won't Fix

> spark-submit --master spark://host:port --status driver-id does not work 
> -
>
> Key: SPARK-24888
> URL: https://issues.apache.org/jira/browse/SPARK-24888
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 2.3.1
>Reporter: srinivasan
>Priority: Minor
>
> spark-submit --master spark://host:port --status driver-id
> does not return anything. The command terminates without any error or output.
> The behaviour is the same on Linux and Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24888) spark-submit --master spark://host:port --status driver-id does not work

2018-12-28 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-24888:
--
  Priority: Minor  (was: Major)
Issue Type: Improvement  (was: Bug)

> spark-submit --master spark://host:port --status driver-id does not work 
> -
>
> Key: SPARK-24888
> URL: https://issues.apache.org/jira/browse/SPARK-24888
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy
>Affects Versions: 2.3.1
>Reporter: srinivasan
>Priority: Minor
>
> spark-submit --master spark://host:port --status driver-id
> does not return anything. The command terminates without any error or output.
> The behaviour is the same on Linux and Windows.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2018-12-28 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-21688.
---
Resolution: Won't Fix

> performance improvement in mllib SVM with native BLAS 
> --
>
> Key: SPARK-21688
> URL: https://issues.apache.org/jira/browse/SPARK-21688
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.2.0
> Environment: 4 nodes: 1 master node, 3 worker nodes
> model name  : Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
> Memory : 180G
> num of core per node: 10
>Reporter: Vincent
>Priority: Minor
> Attachments: ddot unitest.png, mllib svm training.png, 
> native-trywait.png, svm-mkl-1.png, svm-mkl-2.png, svm1.png, svm2.png
>
>
> In the current MLlib SVM implementation, we found that the CPU is not fully 
> utilized; one reason is that f2j BLAS is hardcoded for use in the HingeGradient 
> computation. As we found out earlier 
> (https://issues.apache.org/jira/browse/SPARK-21305), with proper settings 
> native BLAS is generally better than f2j at the unit-test level, so here we 
> make the BLAS operations in SVM go through MKL BLAS and get an end-to-end 
> performance report showing that in most cases native BLAS outperforms 
> f2j BLAS by up to 50%.
> So, we suggest removing those fixed f2j calls and going for native BLAS if 
> available. If this proposal is acceptable, we will move on to benchmark other 
> algorithms impacted.
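For context, netlib-java (the BLAS layer used by MLlib in Spark 2.x) selects its backend via system properties such as the ones below; the ticket's point is that call sites which pin f2jBLAS directly bypass this selection, which is what the proposal would change. This is background, not part of the proposed patch.

{code:scala}
// netlib-java backend selection via system properties (must be set before the
// BLAS classes are first loaded). f2j is the pure-JVM fallback implementation.
System.setProperty("com.github.fommil.netlib.BLAS",
  "com.github.fommil.netlib.NativeSystemBLAS")
System.setProperty("com.github.fommil.netlib.LAPACK",
  "com.github.fommil.netlib.NativeSystemLAPACK")
{code}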



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26477) Use ConfigEntry for hardcoded configs for unsafe category.

2018-12-28 Thread Kazuaki Ishizaki (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730340#comment-16730340
 ] 

Kazuaki Ishizaki commented on SPARK-26477:
--

I will work on this.

> Use ConfigEntry for hardcoded configs for unsafe category.
> --
>
> Key: SPARK-26477
> URL: https://issues.apache.org/jira/browse/SPARK-26477
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26495) Simplify SelectedField extractor

2018-12-28 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-26495:
-

 Summary: Simplify SelectedField extractor
 Key: SPARK-26495
 URL: https://issues.apache.org/jira/browse/SPARK-26495
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: Herman van Hovell
Assignee: Herman van Hovell


I was reading through the code of the {{SelectedField}} extractor and it is 
overly complex. It contains a couple of redundant pattern matches.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26442) Use ConfigEntry for hardcoded configs.

2018-12-28 Thread Takuya Ueshin (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730334#comment-16730334
 ] 

Takuya Ueshin commented on SPARK-26442:
---

I'm sorry for bothering you.
Since there are so many hardcoded configs in use, I'd like to split the work into small 
pieces to avoid conflicting efforts and to make review easier.
Please feel free to combine some subtasks into one PR.
Thanks!

> Use ConfigEntry for hardcoded configs.
> --
>
> Key: SPARK-26442
> URL: https://issues.apache.org/jira/browse/SPARK-26442
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> This umbrella JIRA is to make hardcoded configs use {{ConfigEntry}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26397) Driver-side only metrics support

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26397:


Assignee: Apache Spark

> Driver-side only metrics support
> 
>
> Key: SPARK-26397
> URL: https://issues.apache.org/jira/browse/SPARK-26397
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuanjian Li
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.0.0
>
>
> As noted in the comment at 
> [https://github.com/apache/spark/pull/23327#discussion_r242646521|https://github.com/apache/spark/pull/23327#discussion_r242646521],
>  during the work on SPARK-26222 and SPARK-26223 we need support for 
> driver-side-only metrics, which will mark the metadata-related metrics as 
> driver-side only and not send them to the executor side.
> This issue needs some changes in SparkPlan and SparkPlanInfo; we should also 
> check whether there is any existing misuse beforehand.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26397) Driver-side only metrics support

2018-12-28 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730311#comment-16730311
 ] 

Apache Spark commented on SPARK-26397:
--

User 'xuanyuanking' has created a pull request for this issue:
https://github.com/apache/spark/pull/23396

> Driver-side only metrics support
> 
>
> Key: SPARK-26397
> URL: https://issues.apache.org/jira/browse/SPARK-26397
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuanjian Li
>Priority: Major
> Fix For: 3.0.0
>
>
> As noted in the comment at 
> [https://github.com/apache/spark/pull/23327#discussion_r242646521|https://github.com/apache/spark/pull/23327#discussion_r242646521],
>  during the work on SPARK-26222 and SPARK-26223 we need support for 
> driver-side-only metrics, which will mark the metadata-related metrics as 
> driver-side only and not send them to the executor side.
> This issue needs some changes in SparkPlan and SparkPlanInfo; we should also 
> check whether there is any existing misuse beforehand.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26397) Driver-side only metrics support

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26397:


Assignee: (was: Apache Spark)

> Driver-side only metrics support
> 
>
> Key: SPARK-26397
> URL: https://issues.apache.org/jira/browse/SPARK-26397
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuanjian Li
>Priority: Major
> Fix For: 3.0.0
>
>
> As noted in the comment at 
> [https://github.com/apache/spark/pull/23327#discussion_r242646521|https://github.com/apache/spark/pull/23327#discussion_r242646521],
>  during the work on SPARK-26222 and SPARK-26223 we need support for 
> driver-side-only metrics, which will mark the metadata-related metrics as 
> driver-side only and not send them to the executor side.
> This issue needs some changes in SparkPlan and SparkPlanInfo; we should also 
> check whether there is any existing misuse beforehand.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26494) [spark sql] Type not found when using Spark to read Oracle TIMESTAMP(6) WITH LOCAL TIME ZONE

2018-12-28 Thread JIRA
秦坤 created SPARK-26494:
--

 Summary: [spark sql] Type not found when using Spark to read Oracle 
TIMESTAMP(6) WITH LOCAL TIME ZONE
 Key: SPARK-26494
 URL: https://issues.apache.org/jira/browse/SPARK-26494
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0
Reporter: 秦坤


When using Spark to read Oracle, the TIMESTAMP(6) WITH LOCAL TIME ZONE type is not found.

When the data type is TIMESTAMP(6) WITH LOCAL TIME ZONE, 
the sqlType value seen by the getCatalystType function in the JdbcUtils class is -102.
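For illustration, one possible workaround is a custom JdbcDialect that maps the vendor-specific type code -102 to a Catalyst timestamp. This is a sketch under the assumption that treating the column as TimestampType is acceptable for the data in question; it is not a committed fix for this ticket.

{code:scala}
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}
import org.apache.spark.sql.types.{DataType, MetadataBuilder, TimestampType}

// Sketch: map Oracle's TIMESTAMP WITH LOCAL TIME ZONE (sqlType -102) to TimestampType.
object OracleLocalTzDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:oracle")
  override def getCatalystType(
      sqlType: Int, typeName: String, size: Int, md: MetadataBuilder): Option[DataType] =
    if (sqlType == -102) Some(TimestampType) else None
}

// Register before reading: JdbcDialects.registerDialect(OracleLocalTzDialect)
{code}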



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26493) spark.sql.extensions should support multiple extensions

2018-12-28 Thread Jamison Bennett (JIRA)
Jamison Bennett created SPARK-26493:
---

 Summary: spark.sql.extensions should support multiple extensions
 Key: SPARK-26493
 URL: https://issues.apache.org/jira/browse/SPARK-26493
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Jamison Bennett


The spark.sql.extensions configuration option should support multiple 
extensions. It is currently possible to load multiple extensions using the 
programmatic interface (e.g. 
SparkSession.builder().master("..").withExtensions(sparkSessionExtensions1).withExtensions(sparkSessionExtensions2).getOrCreate()
 ) but the same cannot currently be done with the command-line options without 
writing a wrapper extension that combines multiple extensions.

Allowing multiple spark.sql.extensions values would allow the extensions to be easily 
changed on the command line or via the configuration file. Multiple extensions 
could be specified using a comma-separated list of class names. Allowing 
multiple extensions should maintain backwards compatibility because existing 
spark.sql.extensions configuration settings shouldn't contain a comma, because 
the value is a class name.
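For reference, a hedged sketch of the wrapper-extension workaround mentioned above; ExtensionA and ExtensionB are placeholders standing in for two real extensions one might want to combine.

{code:scala}
import org.apache.spark.sql.SparkSessionExtensions

// Placeholders standing in for two real extensions.
class ExtensionA extends (SparkSessionExtensions => Unit) {
  override def apply(ext: SparkSessionExtensions): Unit = { /* inject rules here */ }
}
class ExtensionB extends (SparkSessionExtensions => Unit) {
  override def apply(ext: SparkSessionExtensions): Unit = { /* inject rules here */ }
}

// The single class name that spark.sql.extensions can point to today.
class CombinedExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(ext: SparkSessionExtensions): Unit = {
    new ExtensionA().apply(ext)
    new ExtensionB().apply(ext)
  }
}
{code}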



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22579) BlockManager.getRemoteValues and BlockManager.getRemoteBytes should be implemented using streaming

2018-12-28 Thread Eyal Farago (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730301#comment-16730301
 ] 

Eyal Farago commented on SPARK-22579:
-

Glanced over this on my cell; it seems like SPARK-25905 only addresses the reading 
side. When the executor serving this block has all values in memory, a similar 
issue happens on that executor as well.

> BlockManager.getRemoteValues and BlockManager.getRemoteBytes should be 
> implemented using streaming
> --
>
> Key: SPARK-22579
> URL: https://issues.apache.org/jira/browse/SPARK-22579
> Project: Spark
>  Issue Type: Improvement
>  Components: Block Manager, Spark Core
>Affects Versions: 2.1.0
>Reporter: Eyal Farago
>Priority: Major
>
> When an RDD partition is cached on an executor but the task requiring it is 
> running on another executor (process locality ANY), the cached partition is 
> fetched via BlockManager.getRemoteValues, which delegates to 
> BlockManager.getRemoteBytes; both calls are blocking.
> In my use case I had a 700GB RDD spread over 1000 partitions on a 6-node 
> cluster, cached to disk. Rough math shows that the average partition size is 
> 700MB.
> Looking at the Spark UI, it was obvious that tasks running with process locality 
> 'ANY' are much slower than local tasks (a ~40 seconds to 8-10 minutes ratio). I 
> was able to capture thread dumps of executors executing remote tasks and got 
> this stack trace:
> {quote}Thread ID  Thread Name Thread StateThread Locks
> 1521  Executor task launch worker-1000WAITING 
> Lock(java.util.concurrent.ThreadPoolExecutor$Worker@196462978})
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:202)
> scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218)
> scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
> scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
> scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
> scala.concurrent.Await$.result(package.scala:190)
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:190)
> org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:104)
> org.apache.spark.storage.BlockManager.getRemoteBytes(BlockManager.scala:582)
> org.apache.spark.storage.BlockManager.getRemoteValues(BlockManager.scala:550)
> org.apache.spark.storage.BlockManager.get(BlockManager.scala:638)
> org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:690)
> org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
> org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:89)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
> org.apache.spark.rdd.RDD.iterator(RDD.scala:287){quote}
> Digging into the code showed that the block manager first fetches all bytes 
> (getRemoteBytes) and then wraps them with a deserialization stream. This has 
> several drawbacks:
> 1. Blocking: the requesting executor is blocked while the remote executor is 
> serving the block.
> 2. Potentially large memory footprint on the requesting executor: in my use case 
> 700MB of raw bytes stored in a ChunkedByteBuffer.
> 3. Inefficient: the requesting side usually doesn't need all values at once, as it 
> consumes the values via an iterator.
> 4. Potentially large memory footprint on the serving executor: in case the block 
> is cached in deserialized form, the serving executor has to serialize it into 
> a ChunkedByteBuffer (BlockManager.doGetLocalBytes). This is both memory and CPU 
> intensive; the memory footprint can be reduced by using a limited buffer for 
> serialization, 'spilling' to the response stream.
> I suggest improving this by implementing either a full streaming mechanism or 
> some kind of pagination mechanism. In addition, the requesting executor should 
> be able to make progress with the data it already has, blocking only when the 
> local buffer is exhausted and the remote side hasn't delivered the next chunk of 
> the stream (or page, in the case of pagination) yet.
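
To make the pagination idea concrete, a rough sketch of an iterator that pulls serialized pages lazily and only blocks while the next page is being fetched (fetchPage, the page layout, and Java serialization are all assumptions here, not BlockManager's actual API):

{code:scala}
import java.io.{ByteArrayInputStream, ObjectInputStream}

// Hypothetical transport call: returns the next serialized page of the block's
// values, or None once the block is exhausted. A real implementation would be
// a remote fetch that is awaited per page instead of per block.
def fetchPage(blockId: String, pageIndex: Int): Option[Array[Byte]] = None  // placeholder

// Consume the block page by page: the caller makes progress with the values it
// already has and only blocks while the next page is being fetched.
def remoteValues[T](blockId: String): Iterator[T] =
  Iterator.from(0)
    .map(i => fetchPage(blockId, i))
    .takeWhile(_.isDefined)
    .flatMap { page =>
      val in = new ObjectInputStream(new ByteArrayInputStream(page.get))
      in.readObject().asInstanceOf[Seq[T]].iterator  // each page assumed to hold a serialized Seq[T]
    }
{code}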



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (SPARK-26442) Use ConfigEntry for hardcoded configs.

2018-12-28 Thread Sean Owen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730293#comment-16730293
 ] 

Sean Owen commented on SPARK-26442:
---

[~ueshin] yikes, why does this need 34 JIRAs! surely this is one logical 
change, maybe split over PRs

> Use ConfigEntry for hardcoded configs.
> --
>
> Key: SPARK-26442
> URL: https://issues.apache.org/jira/browse/SPARK-26442
> Project: Spark
>  Issue Type: Umbrella
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> This umbrella JIRA is to make hardcoded configs use {{ConfigEntry}}.
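
For readers unfamiliar with the pattern, a rough sketch of the migration using Spark's internal ConfigBuilder DSL (the key is real; the before/after usage lines are illustrative only):

{code:scala}
import org.apache.spark.internal.config.ConfigBuilder

// Before: the key string and its default are repeated wherever the config is read, e.g.
//   conf.getBoolean("spark.eventLog.enabled", false)

// After: the key is declared once as a typed ConfigEntry...
val EVENT_LOG_ENABLED = ConfigBuilder("spark.eventLog.enabled")
  .doc("Whether to log Spark events, useful for reconstructing the Web UI after the fact.")
  .booleanConf
  .createWithDefault(false)

// ...and read through the entry, so type, default and docs live in one place:
//   conf.get(EVENT_LOG_ENABLED)
{code}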



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26444) Stage color doesn't change with its status

2018-12-28 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-26444:
-

Assignee: Chenxiao Mao

> Stage color doesn't change with its status
> ---
>
> Key: SPARK-26444
> URL: https://issues.apache.org/jira/browse/SPARK-26444
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.0
>Reporter: Chenxiao Mao
>Assignee: Chenxiao Mao
>Priority: Major
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
> Attachments: active.png, complete.png, failed.png
>
>
> On job page, in event timeline section, stage color doesn't change according 
> to its status. See attachments for some screen shots. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26444) Stage color doesn't change with its status

2018-12-28 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-26444:
--
Priority: Minor  (was: Major)

> Stage color doesn't change with its status
> ---
>
> Key: SPARK-26444
> URL: https://issues.apache.org/jira/browse/SPARK-26444
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.0
>Reporter: Chenxiao Mao
>Assignee: Chenxiao Mao
>Priority: Minor
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
> Attachments: active.png, complete.png, failed.png
>
>
> On job page, in event timeline section, stage color doesn't change according 
> to its status. See attachments for some screen shots. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26444) Stage color doesn't change with its status

2018-12-28 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-26444.
---
   Resolution: Fixed
Fix Version/s: 2.4.1
   3.0.0
   2.3.3

Issue resolved by pull request 23385
[https://github.com/apache/spark/pull/23385]

> Stage color doesn't change with its status
> ---
>
> Key: SPARK-26444
> URL: https://issues.apache.org/jira/browse/SPARK-26444
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.4.0
>Reporter: Chenxiao Mao
>Assignee: Chenxiao Mao
>Priority: Major
> Fix For: 2.3.3, 3.0.0, 2.4.1
>
> Attachments: active.png, complete.png, failed.png
>
>
> On job page, in event timeline section, stage color doesn't change according 
> to its status. See attachments for some screen shots. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26470) Use ConfigEntry for hardcoded configs for eventLog category.

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26470:


Assignee: Apache Spark

> Use ConfigEntry for hardcoded configs for eventLog category.
> 
>
> Key: SPARK-26470
> URL: https://issues.apache.org/jira/browse/SPARK-26470
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26470) Use ConfigEntry for hardcoded configs for eventLog category.

2018-12-28 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26470:


Assignee: (was: Apache Spark)

> Use ConfigEntry for hardcoded configs for eventLog category.
> 
>
> Key: SPARK-26470
> URL: https://issues.apache.org/jira/browse/SPARK-26470
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26458) OneHotEncoderModel verifies the number of category values incorrectly when tries to transform a dataframe.

2018-12-28 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730188#comment-16730188
 ] 

Marco Gaido commented on SPARK-26458:
-

What is the issue you are encountering? Can you provide a reproducer for it, 
along with the current and expected behavior? Thanks.
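
For reference, a rough sketch of the kind of reproducer being asked for, assuming the problem is a mismatch between the fitted model's category sizes and the nominal metadata on the dataframe being transformed (column names are hypothetical, and whether this exact snippet triggers the reported exception is precisely what needs confirming):

{code:scala}
import org.apache.spark.ml.attribute.NominalAttribute
import org.apache.spark.ml.feature.OneHotEncoderEstimator
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Fit on a column with two categories (0.0 and 1.0), keeping invalid values.
val train = Seq(0.0, 1.0).toDF("cat")
val model = new OneHotEncoderEstimator()
  .setInputCols(Array("cat"))
  .setOutputCols(Array("cat_vec"))
  .setHandleInvalid("keep")
  .fit(train)

// Transform a dataframe whose column metadata declares *three* category values;
// the verification of category sizes in transformSchema is what the report questions.
val meta = NominalAttribute.defaultAttr.withName("cat").withValues("a", "b", "c").toMetadata()
val test = Seq(0.0, 1.0, 2.0).toDF("cat").select(col("cat").as("cat", meta))
model.transform(test).show()
{code}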

> OneHotEncoderModel verifies the number of category values incorrectly when 
> tries to transform a dataframe.
> --
>
> Key: SPARK-26458
> URL: https://issues.apache.org/jira/browse/SPARK-26458
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 2.3.1
>Reporter: duruihuan
>Priority: Major
>
> When handleInvalid is set to "keep", one should not compare the 
> categorySizes of transformSchema with the values in the metadata of the 
> dataframe to be transformed, because there may be more than one invalid 
> value in some columns of the dataframe, which causes an exception as described 
> in lines 302-306 of OneHotEncoderEstimator.scala. In conclusion, I think 
> the verifyNumOfValues call in the transformSchema method (line 299 of the code) 
> should be removed.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26433) Tail method for spark DataFrame

2018-12-28 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-26433.
--
Resolution: Won't Fix

> Tail method for spark DataFrame
> ---
>
> Key: SPARK-26433
> URL: https://issues.apache.org/jira/browse/SPARK-26433
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Jan Gorecki
>Priority: Major
>
> There is a head method for Spark DataFrames which works fine, but there doesn't 
> seem to be a tail method.
> ```
> >>> ans
> DataFrame[v1: bigint]
> >>> ans.head(3)
> [Row(v1=299443), Row(v1=299493), Row(v1=300751)]
> >>> ans.tail(3)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/jan/git/db-benchmark/spark/py-spark/lib/python3.6/site-packages/pyspark/sql/dataframe.py", line 1300, in __getattr__
>     "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
> AttributeError: 'DataFrame' object has no attribute 'tail'
> ```
> I would like to request a tail method for Spark DataFrames.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26433) Tail method for spark DataFrame

2018-12-28 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730180#comment-16730180
 ] 

Hyukjin Kwon commented on SPARK-26433:
--

You can simply do it after {{collect()}}. Let's avoid adding APIs when 
workarounds are easy. Spark already has a lot of APIs, and I think we should 
focus on deprecating and reducing them, and only add APIs when they're absolutely 
worth it.
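
As a minimal sketch of that workaround on the Scala side (assuming {{df}} is the DataFrame in question; {{collect()}} pulls every row to the driver, so this is only reasonable for small results):

{code:scala}
// Take the last three rows after collecting the whole result to the driver.
val lastThree = df.collect().takeRight(3)
{code}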

> Tail method for spark DataFrame
> ---
>
> Key: SPARK-26433
> URL: https://issues.apache.org/jira/browse/SPARK-26433
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Jan Gorecki
>Priority: Major
>
> There is a head method for Spark DataFrames which works fine, but there doesn't 
> seem to be a tail method.
> ```
> >>> ans
> DataFrame[v1: bigint]
> >>> ans.head(3)
> [Row(v1=299443), Row(v1=299493), Row(v1=300751)]
> >>> ans.tail(3)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/jan/git/db-benchmark/spark/py-spark/lib/python3.6/site-packages/pyspark/sql/dataframe.py", line 1300, in __getattr__
>     "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
> AttributeError: 'DataFrame' object has no attribute 'tail'
> ```
> I would like to request a tail method for Spark DataFrames.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26455) Spark Kinesis Integration with no SSL

2018-12-28 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-26455.
--
Resolution: Invalid

If this is asking for help, let's do that on the Spark mailing list. We can discuss 
and investigate further before filing an issue here. You could get a better 
answer there.

> Spark Kinesis Integration with no SSL
> -
>
> Key: SPARK-26455
> URL: https://issues.apache.org/jira/browse/SPARK-26455
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.3.0
>Reporter: Shashikant Bangera
>Priority: Major
>
> Hi,
> We are trying to access the endpoint through the library mentioned below and we get 
> the SSL error; I think it uses the KCL library internally. Looking at the 
> error: if I have to skip the certificate check, is it possible through a KCL utils 
> call? I do not find any provision to do that (setting SSL to false) within the 
> spark-streaming-kinesis library like we do with KCL. Can you please 
> help me with this?
> compile("org.apache.spark:spark-streaming-kinesis-asl_2.11:2.3.0") {
>  exclude group: 'org.apache.spark', module: 'spark-streaming_2.11'
> }
> Caused by: javax.net.ssl.SSLPeerUnverifiedException: Certificate for 
> <kinesis-endpoint> doesn't match any of the subject alternative names: 
> [kinesis-fips.us-east-1.amazonaws.com, 
> *.kinesis.us-east-1.vpce.amazonaws.com, kinesis.us-east-1.amazonaws.com]
>  at 
> org.apache.http.conn.ssl.SSLConnectionSocketFactory.verifyHostname(SSLConnectionSocketFactory.java:467)
>  at 
> org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:397)
>  at 
> org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:355)
>  at 
> shade.com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket(SdkTLSSocketFactory.java:132)
>  at 
> org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
>  at 
> org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:373)
>  at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at 
> shade.com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
>  at shade.com.amazonaws.http.conn.$Proxy18.connect(Unknown Source)
>  at 
> org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
>  at 
> org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
>  at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
>  at 
> org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
>  at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
>  at 
> org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
>  at 
> shade.com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
>  at 
> shade.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1238)
>  at 
> shade.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1058)
>  ... 20 more



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26454) IllegalArgument Exception is Thrown while creating new UDF with JAR

2018-12-28 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16730173#comment-16730173
 ] 

Hyukjin Kwon commented on SPARK-26454:
--

What's the full stacktrace of {{IllegalArgumentException}}? Do you mean this is 
also reproducible in the local file system?

> IllegalArgument Exception is Thrown while creating new UDF with JAR
> -
>
> Key: SPARK-26454
> URL: https://issues.apache.org/jira/browse/SPARK-26454
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell
>Affects Versions: 2.3.2
>Reporter: Udbhav Agrawal
>Priority: Major
>
> 【Test step】:
> 1.launch spark-shell
> 2. set role admin;
> 3. create new function
>   CREATE FUNCTION Func AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'
> 4. Do select on the function
> sql("select Func('2018-03-09')").show()
> 5.Create new UDF with same JAR
>    sql("CREATE FUNCTION newFunc AS 
> 'org.apache.hadoop.hive.ql.udf.generic.GenericUDFLastDayTest' USING JAR 
> 'hdfs:///tmp/super_udf/two_udfs.jar'")
> 6. Do select on the new function created.
> sql("select newFunc ('2018-03-09')").show()
> 【Output】:
> The function is created, but an IllegalArgumentException is thrown; the select 
> returns a result, but with an IllegalArgumentException.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26492) support streaming DecisionTreeRegressor

2018-12-28 Thread sky54521 (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sky54521 updated SPARK-26492:
-
Labels: DecisionTreeRegressor  (was: )

> support streaming DecisionTreeRegressor
> ---
>
> Key: SPARK-26492
> URL: https://issues.apache.org/jira/browse/SPARK-26492
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Affects Versions: 2.4.0
>Reporter: sky54521
>Priority: Major
>  Labels: DecisionTreeRegressor
>
> hope to support streaming DecisionTreeRegressor as soon as possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26492) support streaming DecisionTreeRegressor

2018-12-28 Thread sky54521 (JIRA)
sky54521 created SPARK-26492:


 Summary: support streaming DecisionTreeRegressor
 Key: SPARK-26492
 URL: https://issues.apache.org/jira/browse/SPARK-26492
 Project: Spark
  Issue Type: New Feature
  Components: MLlib
Affects Versions: 2.4.0
Reporter: sky54521


hope to support streaming DecisionTreeRegressor as soon as possible



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26446) Add cachedExecutorIdleTimeout docs at ExecutorAllocationManager

2018-12-28 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-26446:
-

Assignee: Qingxin Wu

> Add cachedExecutorIdleTimeout docs at ExecutorAllocationManager
> ---
>
> Key: SPARK-26446
> URL: https://issues.apache.org/jira/browse/SPARK-26446
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Spark Core
>Affects Versions: 2.4.0
>Reporter: Qingxin Wu
>Assignee: Qingxin Wu
>Priority: Minor
> Fix For: 3.0.0
>
>
> Add docs to describe how the remove policy acts when considering the property 
> _*{{spark.dynamicAllocation.cachedExecutorIdleTimeout}}*_ in 
> ExecutorAllocationManager.
>  
>  
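
For context, a small illustration of the settings such docs would describe (the values are arbitrary; only the config keys themselves are taken from Spark):

{code:scala}
import org.apache.spark.SparkConf

// With these (illustrative) settings, idle executors are removed after 60s,
// but executors holding cached blocks are kept around for 10 minutes.
val conf = new SparkConf()
  .set("spark.dynamicAllocation.enabled", "true")
  .set("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .set("spark.dynamicAllocation.cachedExecutorIdleTimeout", "600s")
{code}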



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26446) Add cachedExecutorIdleTimeout docs at ExecutorAllocationManager

2018-12-28 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-26446:
--
Component/s: Documentation

> Add cachedExecutorIdleTimeout docs at ExecutorAllocationManager
> ---
>
> Key: SPARK-26446
> URL: https://issues.apache.org/jira/browse/SPARK-26446
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Spark Core
>Affects Versions: 2.4.0
>Reporter: Qingxin Wu
>Assignee: Qingxin Wu
>Priority: Minor
> Fix For: 3.0.0
>
>
> Add docs to describe how the remove policy acts when considering the property 
> _*{{spark.dynamicAllocation.cachedExecutorIdleTimeout}}*_ in 
> ExecutorAllocationManager.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26446) Add cachedExecutorIdleTimeout docs at ExecutorAllocationManager

2018-12-28 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-26446.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/23386 .

> Add cachedExecutorIdleTimeout docs at ExecutorAllocationManager
> ---
>
> Key: SPARK-26446
> URL: https://issues.apache.org/jira/browse/SPARK-26446
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Qingxin Wu
>Priority: Minor
> Fix For: 3.0.0
>
>
> Add docs to describe how the remove policy acts when considering the property 
> _*{{spark.dynamicAllocation.cachedExecutorIdleTimeout}}*_ in 
> ExecutorAllocationManager.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26470) Use ConfigEntry for hardcoded configs for eventLog category.

2018-12-28 Thread Takuya Ueshin (JIRA)
Takuya Ueshin created SPARK-26470:
-

 Summary: Use ConfigEntry for hardcoded configs for eventLog 
category.
 Key: SPARK-26470
 URL: https://issues.apache.org/jira/browse/SPARK-26470
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org