[jira] [Assigned] (SPARK-22892) Simplify some estimation logic by using double instead of decimal

2017-12-28 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-22892:
---

Assignee: Zhenhua Wang

> Simplify some estimation logic by using double instead of decimal
> -
>
> Key: SPARK-22892
> URL: https://issues.apache.org/jira/browse/SPARK-22892
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Zhenhua Wang
>Assignee: Zhenhua Wang
>Priority: Minor
> Fix For: 2.3.0
>
>







[jira] [Resolved] (SPARK-22892) Simplify some estimation logic by using double instead of decimal

2017-12-28 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-22892.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 20062
[https://github.com/apache/spark/pull/20062]

> Simplify some estimation logic by using double instead of decimal
> -
>
> Key: SPARK-22892
> URL: https://issues.apache.org/jira/browse/SPARK-22892
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Zhenhua Wang
>Priority: Minor
> Fix For: 2.3.0
>
>







[jira] [Assigned] (SPARK-22834) Make insert commands have real children to fix UI issues

2017-12-28 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-22834:
---

Assignee: Gengliang Wang

> Make insert commands have real children to fix UI issues
> 
>
> Key: SPARK-22834
> URL: https://issues.apache.org/jira/browse/SPARK-22834
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
> Fix For: 2.3.0
>
>
> After https://github.com/apache/spark/pull/19474, the children of insert 
> commands are missing in the UI. To fix this, create a new physical plan, 
> `DataWritingCommandExec`, that executes a `DataWritingCommand` with real 
> children.






[jira] [Resolved] (SPARK-22834) Make insert commands have real children to fix UI issues

2017-12-28 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-22834.
-
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 20020
[https://github.com/apache/spark/pull/20020]

> Make insert commands have real children to fix UI issues
> 
>
> Key: SPARK-22834
> URL: https://issues.apache.org/jira/browse/SPARK-22834
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Gengliang Wang
> Fix For: 2.3.0
>
>
> After https://github.com/apache/spark/pull/19474, the children of insert 
> commands are missing in the UI. To fix this, create a new physical plan, 
> `DataWritingCommandExec`, that executes a `DataWritingCommand` with real 
> children.






[jira] [Resolved] (SPARK-22891) NullPointerException when use udf

2017-12-28 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-22891.
-
   Resolution: Fixed
 Assignee: Feng Liu
Fix Version/s: 2.3.0

> NullPointerException when use udf
> -
>
> Key: SPARK-22891
> URL: https://issues.apache.org/jira/browse/SPARK-22891
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.2.1
> Environment: hadoop 2.7.2
>Reporter: gaoyang
>Assignee: Feng Liu
>Priority: Minor
> Fix For: 2.3.0
>
>
> In my application, I use multiple threads. Each thread has a SparkSession 
> and uses SparkSession.sqlContext.udf.register to register my UDF. Sometimes 
> an exception like this is thrown:
> {code:java}
> java.lang.IllegalArgumentException: Error while instantiating 
> 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
>   at 
> org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
>   at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207)
>   at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203)
>   at 
> com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63)
>   at 
>   ... 20 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
>   at 
> org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:744)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1391)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:210)
>   ... 34 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:769)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:736)
>   ... 36 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.isCompatibleWith(HiveMetaStoreClient.java:287)
>   at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
>   at com.sun.proxy.$Proxy25.isCompatibleWith(Unknown Source)
>   at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:206)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:765)
>   ... 37 more
> {code}
> Also, I use Apache Hive 2.1.1 in my cluster.
> When I use Spark 2.1.x, the exception above never happens again.

[jira] [Assigned] (SPARK-22530) Add ArrayType Support for working with Pandas and Arrow

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22530:


Assignee: Apache Spark

> Add ArrayType Support for working with Pandas and Arrow
> ---
>
> Key: SPARK-22530
> URL: https://issues.apache.org/jira/browse/SPARK-22530
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>Assignee: Apache Spark
>
> Adding ArrayType support for {{toPandas()}}, {{createDataFrame}} from Pandas, 
> and {{pandas_udf}}. I believe arrays are already supported on the 
> Java/Scala side, so we just need to complete this for Python.
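For a rough picture of what this would enable, here is a minimal PySpark sketch (assuming an active SparkSession named {{spark}} and the Spark 2.3 Arrow config; illustrative only, not the PR's code):

{code:python}
import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import ArrayType, DoubleType

df = spark.createDataFrame([([1.0, 2.0],), ([3.0],)], ["v"])

# Scalar pandas_udf over an array column: each element of the incoming
# pandas Series is itself a list/array of doubles.
@pandas_udf(ArrayType(DoubleType()))
def doubled(v):
    return v.apply(lambda arr: [2.0 * x for x in arr])

df.select(doubled("v")).show()

# With Arrow enabled, toPandas() and createDataFrame from Pandas should
# round-trip the array column as well.
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
pdf = df.toPandas()
spark.createDataFrame(pdf).show()
{code}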






[jira] [Assigned] (SPARK-22530) Add ArrayType Support for working with Pandas and Arrow

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22530:


Assignee: (was: Apache Spark)

> Add ArrayType Support for working with Pandas and Arrow
> ---
>
> Key: SPARK-22530
> URL: https://issues.apache.org/jira/browse/SPARK-22530
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>
> Adding ArrayType support for {{toPandas()}}, {{createDataFrame}} from Pandas, 
> and {{pandas_udf}}. I believe arrays are already supported on the 
> Java/Scala side, so we just need to complete this for Python.






[jira] [Commented] (SPARK-22530) Add ArrayType Support for working with Pandas and Arrow

2017-12-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306023#comment-16306023
 ] 

Apache Spark commented on SPARK-22530:
--

User 'BryanCutler' has created a pull request for this issue:
https://github.com/apache/spark/pull/20114

> Add ArrayType Support for working with Pandas and Arrow
> ---
>
> Key: SPARK-22530
> URL: https://issues.apache.org/jira/browse/SPARK-22530
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 2.3.0
>Reporter: Bryan Cutler
>
> Adding ArrayType support for {{toPandas()}}, {{createDataFrame}} from Pandas, 
> and {{pandas_udf}}. I believe arrays are already supported on the 
> Java/Scala side, so we just need to complete this for Python.






[jira] [Commented] (SPARK-22905) Fix ChiSqSelectorModel save implementation

2017-12-28 Thread zhengruifeng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306020#comment-16306020
 ] 

zhengruifeng commented on SPARK-22905:
--

[~WeichenXu123] I checked and found that the same issue exists in 
{{GaussianMixtureModel}}; otherwise everything looks fine.

> Fix ChiSqSelectorModel save implementation
> --
>
> Key: SPARK-22905
> URL: https://issues.apache.org/jira/browse/SPARK-22905
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.2.1
>Reporter: Weichen Xu
>Assignee: Weichen Xu
> Fix For: 2.3.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, in `ChiSqSelectorModel`, save:
> {code}
> spark.createDataFrame(dataArray).repartition(1).write...
> {code}
> The default partition number used by createDataFrame is "defaultParallelism",
> and the current RoundRobinPartitioning won't guarantee that "repartition" 
> produces the same order as the local array. We need to fix it.






[jira] [Commented] (SPARK-22905) Fix ChiSqSelectorModel save implementation

2017-12-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306018#comment-16306018
 ] 

Apache Spark commented on SPARK-22905:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/20113

> Fix ChiSqSelectorModel save implementation
> --
>
> Key: SPARK-22905
> URL: https://issues.apache.org/jira/browse/SPARK-22905
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.2.1
>Reporter: Weichen Xu
>Assignee: Weichen Xu
> Fix For: 2.3.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, in `ChiSqSelectorModel`, save:
> {code}
> spark.createDataFrame(dataArray).repartition(1).write...
> {code}
> The default partition number used by createDataFrame is "defaultParallelism",
> and the current RoundRobinPartitioning won't guarantee that "repartition" 
> produces the same order as the local array. We need to fix it.






[jira] [Commented] (SPARK-22883) ML test for StructuredStreaming: spark.ml.feature, A-M

2017-12-28 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306014#comment-16306014
 ] 

Joseph K. Bradley commented on SPARK-22883:
---

https://github.com/apache/spark/pull/20111 is part 1 of 2 for this JIRA.

> ML test for StructuredStreaming: spark.ml.feature, A-M
> --
>
> Key: SPARK-22883
> URL: https://issues.apache.org/jira/browse/SPARK-22883
> Project: Spark
>  Issue Type: Test
>  Components: ML, Tests
>Affects Versions: 2.3.0
>Reporter: Joseph K. Bradley
>
> *For featurizers with names from A - M*
> Task for adding Structured Streaming tests for all Models/Transformers in a 
> sub-module in spark.ml
> For an example, see LinearRegressionSuite.scala in 
> https://github.com/apache/spark/pull/19843






[jira] [Commented] (SPARK-22734) Create Python API for VectorSizeHint

2017-12-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305978#comment-16305978
 ] 

Apache Spark commented on SPARK-22734:
--

User 'MrBago' has created a pull request for this issue:
https://github.com/apache/spark/pull/20112

> Create Python API for VectorSizeHint
> 
>
> Key: SPARK-22734
> URL: https://issues.apache.org/jira/browse/SPARK-22734
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 2.2.0
>Reporter: Bago Amirbekian
>







[jira] [Commented] (SPARK-22722) Test Coverage for Type Coercion Compatibility

2017-12-28 Thread Yuming Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305979#comment-16305979
 ] 

Yuming Wang commented on SPARK-22722:
-

[~smilegator] All tests are added, except 
[FunctionArgumentConversion|https://github.com/apache/spark/pull/20008#issuecomment-352670852]
 and 
[StackCoercion|https://github.com/apache/spark/pull/20006#pullrequestreview-84366891].

> Test Coverage for Type Coercion Compatibility
> -
>
> Key: SPARK-22722
> URL: https://issues.apache.org/jira/browse/SPARK-22722
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Xiao Li
>Assignee: Yuming Wang
>
> Hive compatibility is pretty important for users who run both Hive and Spark 
> SQL or migrate between them. 
> We plan to add a SQLConf for type coercion compatibility 
> (spark.sql.typeCoercion.mode). Users can choose Spark's native mode (default) 
> or Hive mode (hive). 
> Before we deliver the Hive compatibility mode, we plan to write a set of test 
> cases that can be easily run on both the Spark and Hive sides, so we can 
> easily compare whether the results are the same. When new type coercion rules 
> are added, we can also easily track the changes. These test cases can also be 
> backported to previous Spark versions to determine the changes we made. 






[jira] [Assigned] (SPARK-22734) Create Python API for VectorSizeHint

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22734:


Assignee: (was: Apache Spark)

> Create Python API for VectorSizeHint
> 
>
> Key: SPARK-22734
> URL: https://issues.apache.org/jira/browse/SPARK-22734
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 2.2.0
>Reporter: Bago Amirbekian
>







[jira] [Assigned] (SPARK-22734) Create Python API for VectorSizeHint

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22734:


Assignee: Apache Spark

> Create Python API for VectorSizeHint
> 
>
> Key: SPARK-22734
> URL: https://issues.apache.org/jira/browse/SPARK-22734
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 2.2.0
>Reporter: Bago Amirbekian
>Assignee: Apache Spark
>







[jira] [Commented] (SPARK-22922) Python API for fitMultiple

2017-12-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305971#comment-16305971
 ] 

Apache Spark commented on SPARK-22922:
--

User 'MrBago' has created a pull request for this issue:
https://github.com/apache/spark/pull/20058

> Python API for fitMultiple
> --
>
> Key: SPARK-22922
> URL: https://issues.apache.org/jira/browse/SPARK-22922
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 2.2.0
>Reporter: Bago Amirbekian
>
> Implement fitMultiple method on Estimator for pyspark.






[jira] [Assigned] (SPARK-22922) Python API for fitMultiple

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22922:


Assignee: Apache Spark

> Python API for fitMultiple
> --
>
> Key: SPARK-22922
> URL: https://issues.apache.org/jira/browse/SPARK-22922
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 2.2.0
>Reporter: Bago Amirbekian
>Assignee: Apache Spark
>
> Implement fitMultiple method on Estimator for pyspark.






[jira] [Assigned] (SPARK-22922) Python API for fitMultiple

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22922:


Assignee: (was: Apache Spark)

> Python API for fitMultiple
> --
>
> Key: SPARK-22922
> URL: https://issues.apache.org/jira/browse/SPARK-22922
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 2.2.0
>Reporter: Bago Amirbekian
>
> Implement fitMultiple method on Estimator for pyspark.






[jira] [Created] (SPARK-22922) Python API for fitMultiple

2017-12-28 Thread Bago Amirbekian (JIRA)
Bago Amirbekian created SPARK-22922:
---

 Summary: Python API for fitMultiple
 Key: SPARK-22922
 URL: https://issues.apache.org/jira/browse/SPARK-22922
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Affects Versions: 2.2.0
Reporter: Bago Amirbekian


Implement fitMultiple method on Estimator for pyspark.
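For context, a hedged sketch of how such an API might be consumed (the exact signature is whatever the PR settles on; the toy dataset and param maps below are made-up examples, assuming an active SparkSession named {{spark}}):

{code:python}
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

# Tiny toy dataset standing in for real training data.
train_df = spark.createDataFrame(
    [(Vectors.dense([0.0]), 0.0), (Vectors.dense([1.0]), 1.0)],
    ["features", "label"])

lr = LogisticRegression()
param_maps = [{lr.maxIter: 10}, {lr.maxIter: 50}, {lr.regParam: 0.1}]

# The shape under discussion: fit one estimator against several param
# maps and get back (index, model) pairs, so callers such as
# CrossValidator can consume each model as soon as its fit completes.
for index, model in lr.fitMultiple(train_df, param_maps):
    print(index, model)
{code}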






[jira] [Commented] (SPARK-22883) ML test for StructuredStreaming: spark.ml.feature, A-M

2017-12-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305948#comment-16305948
 ] 

Apache Spark commented on SPARK-22883:
--

User 'jkbradley' has created a pull request for this issue:
https://github.com/apache/spark/pull/20111

> ML test for StructuredStreaming: spark.ml.feature, A-M
> --
>
> Key: SPARK-22883
> URL: https://issues.apache.org/jira/browse/SPARK-22883
> Project: Spark
>  Issue Type: Test
>  Components: ML, Tests
>Affects Versions: 2.3.0
>Reporter: Joseph K. Bradley
>
> *For featurizers with names from A - M*
> Task for adding Structured Streaming tests for all Models/Transformers in a 
> sub-module in spark.ml
> For an example, see LinearRegressionSuite.scala in 
> https://github.com/apache/spark/pull/19843






[jira] [Assigned] (SPARK-22883) ML test for StructuredStreaming: spark.ml.feature, A-M

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22883:


Assignee: Apache Spark

> ML test for StructuredStreaming: spark.ml.feature, A-M
> --
>
> Key: SPARK-22883
> URL: https://issues.apache.org/jira/browse/SPARK-22883
> Project: Spark
>  Issue Type: Test
>  Components: ML, Tests
>Affects Versions: 2.3.0
>Reporter: Joseph K. Bradley
>Assignee: Apache Spark
>
> *For featurizers with names from A - M*
> Task for adding Structured Streaming tests for all Models/Transformers in a 
> sub-module in spark.ml
> For an example, see LinearRegressionSuite.scala in 
> https://github.com/apache/spark/pull/19843






[jira] [Assigned] (SPARK-22883) ML test for StructuredStreaming: spark.ml.feature, A-M

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22883:


Assignee: (was: Apache Spark)

> ML test for StructuredStreaming: spark.ml.feature, A-M
> --
>
> Key: SPARK-22883
> URL: https://issues.apache.org/jira/browse/SPARK-22883
> Project: Spark
>  Issue Type: Test
>  Components: ML, Tests
>Affects Versions: 2.3.0
>Reporter: Joseph K. Bradley
>
> *For featurizers with names from A - M*
> Task for adding Structured Streaming tests for all Models/Transformers in a 
> sub-module in spark.ml
> For an example, see LinearRegressionSuite.scala in 
> https://github.com/apache/spark/pull/19843






[jira] [Commented] (SPARK-22905) Fix ChiSqSelectorModel save implementation

2017-12-28 Thread Weichen Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305931#comment-16305931
 ] 

Weichen Xu commented on SPARK-22905:


[~podongfeng] Some of them only save one row, so there is no bug; some cases 
include a row-number column and sort on it when reading to get a stable 
order. But I am not sure whether I missed some cases; it would be great if 
you could help check.
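A small sketch of that row-number idea (illustrative only, assuming an active SparkSession named {{spark}}; the column name, path, and data are made up):

{code:python}
data = [0.3, 0.1, 0.2]  # stands in for a model's local array

df = spark.createDataFrame(
    [(i, v) for i, v in enumerate(data)], ["idx", "value"])

# repartition(1) shuffles the defaultParallelism input partitions into a
# single partition without guaranteeing the original row order.
df.repartition(1).write.mode("overwrite").parquet("/tmp/model_data")

# Sorting by the explicit row-number column restores a stable order.
loaded = [r.value for r in
          spark.read.parquet("/tmp/model_data").orderBy("idx").collect()]
assert loaded == data
{code}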

> Fix ChiSqSelectorModel save implementation
> --
>
> Key: SPARK-22905
> URL: https://issues.apache.org/jira/browse/SPARK-22905
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.2.1
>Reporter: Weichen Xu
>Assignee: Weichen Xu
> Fix For: 2.3.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, in `ChiSqSelectorModel`, save:
> {code}
> spark.createDataFrame(dataArray).repartition(1).write...
> {code}
> The default partition number used by createDataFrame is "defaultParallelism",
> and the current RoundRobinPartitioning won't guarantee that "repartition" 
> produces the same order as the local array. We need to fix it.






[jira] [Commented] (SPARK-22313) Mark/print deprecation warnings as DeprecationWarning for deprecated APIs

2017-12-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305917#comment-16305917
 ] 

Apache Spark commented on SPARK-22313:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/20110

> Mark/print deprecation warnings as DeprecationWarning for deprecated APIs
> -
>
> Key: SPARK-22313
> URL: https://issues.apache.org/jira/browse/SPARK-22313
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.3.0
>
>
> Currently, some {{warnings.warn(...)}} calls for deprecation use the default 
> category, {{UserWarning}}.
> If we use {{DeprecationWarning}}, this can actually be useful in an IDE, in 
> my case PyCharm. Please see before and after in the PR; I happened to open a 
> PR first to show my idea.
> Also, it looks like some deprecated functions do not emit these warnings. It 
> might be better to print those out explicitly.
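The generic Python pattern being proposed looks roughly like this (a sketch of the idea, not the actual Spark diff):

{code:python}
import warnings

def old_api():
    # warnings.warn defaults to UserWarning; passing DeprecationWarning
    # lets IDEs and warning filters recognise this as a deprecation.
    warnings.warn(
        "old_api is deprecated; use new_api instead.",
        DeprecationWarning,
        stacklevel=2,  # point at the caller, not this wrapper
    )
{code}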






[jira] [Comment Edited] (SPARK-22905) Fix ChiSqSelectorModel save implementation

2017-12-28 Thread zhengruifeng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305905#comment-16305905
 ] 

zhengruifeng edited comment on SPARK-22905 at 12/29/17 2:08 AM:


Many other models are saved in the same way, 
{{sparkSession.createDataFrame(...).repartition(1).write.parquet}}; do they 
need to be fixed as well?


was (Author: podongfeng):
Many other models are saved in the same way 
{sparkSession.createDataFrame(...).repartition(1).write.parquet}, are they 
needed to be fixed?

> Fix ChiSqSelectorModel save implementation
> --
>
> Key: SPARK-22905
> URL: https://issues.apache.org/jira/browse/SPARK-22905
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.2.1
>Reporter: Weichen Xu
>Assignee: Weichen Xu
> Fix For: 2.3.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, in `ChiSqSelectorModel`, save:
> {code}
> spark.createDataFrame(dataArray).repartition(1).write...
> {code}
> The default partition number used by createDataFrame is "defaultParallelism",
> and the current RoundRobinPartitioning won't guarantee that "repartition" 
> produces the same order as the local array. We need to fix it.






[jira] [Commented] (SPARK-22905) Fix ChiSqSelectorModel save implementation

2017-12-28 Thread zhengruifeng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305905#comment-16305905
 ] 

zhengruifeng commented on SPARK-22905:
--

Many other models are saved in the same way, 
{sparkSession.createDataFrame(...).repartition(1).write.parquet}; do they 
need to be fixed as well?

> Fix ChiSqSelectorModel save implementation
> --
>
> Key: SPARK-22905
> URL: https://issues.apache.org/jira/browse/SPARK-22905
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.2.1
>Reporter: Weichen Xu
>Assignee: Weichen Xu
> Fix For: 2.3.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, in `ChiSqSelectorModel`, save:
> {code}
> spark.createDataFrame(dataArray).repartition(1).write...
> {code}
> The default partition number used by createDataFrame is "defaultParallelism",
> and the current RoundRobinPartitioning won't guarantee that "repartition" 
> produces the same order as the local array. We need to fix it.






[jira] [Resolved] (SPARK-22905) Fix ChiSqSelectorModel save implementation

2017-12-28 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-22905.
---
   Resolution: Fixed
Fix Version/s: 2.3.0

Resolved by https://github.com/apache/spark/pull/20088

> Fix ChiSqSelectorModel save implementation
> --
>
> Key: SPARK-22905
> URL: https://issues.apache.org/jira/browse/SPARK-22905
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.2.1
>Reporter: Weichen Xu
>Assignee: Weichen Xu
> Fix For: 2.3.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, in `ChiSqSelectorModel`, save:
> {code}
> spark.createDataFrame(dataArray).repartition(1).write...
> {code}
> The default partition number used by createDataFrame is "defaultParallelism",
> and the current RoundRobinPartitioning won't guarantee that "repartition" 
> produces the same order as the local array. We need to fix it.






[jira] [Commented] (SPARK-22891) NullPointerException when use udf

2017-12-28 Thread Feng Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305881#comment-16305881
 ] 

Feng Liu commented on SPARK-22891:
--

A side note: if we don't want to merge 
https://github.com/apache/spark/pull/20029, we should make the creation of the 
hive client lazy inside the HiveSessionResourceLoader: 
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionStateBuilder.scala#L123
Hive client creation is expensive, so it does not make sense to materialize 
the client if we never use it. 
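For illustration, the lazy, thread-safe construction being suggested follows a familiar pattern (a generic Python sketch of the idea; the real change would live in Spark's Scala code):

{code:python}
import threading

class LazyClientHolder:
    """Lazily builds an expensive client, at most once, across threads.
    Generic sketch only; not Spark's implementation."""

    def __init__(self, factory):
        self._factory = factory  # e.g. an expensive hive client constructor
        self._client = None
        self._lock = threading.Lock()

    def get(self):
        if self._client is None:          # fast path without the lock
            with self._lock:
                if self._client is None:  # double-checked under the lock
                    self._client = self._factory()
        return self._client
{code}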

> NullPointerException when use udf
> -
>
> Key: SPARK-22891
> URL: https://issues.apache.org/jira/browse/SPARK-22891
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.2.1
> Environment: hadoop 2.7.2
>Reporter: gaoyang
>Priority: Minor
>
> In my application, I use multiple threads. Each thread has a SparkSession 
> and uses SparkSession.sqlContext.udf.register to register my UDF. Sometimes 
> an exception like this is thrown:
> {code:java}
> java.lang.IllegalArgumentException: Error while instantiating 
> 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
>   at 
> org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
>   at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207)
>   at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203)
>   at 
> com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63)
>   at 
>   ... 20 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
>   at 
> org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:744)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1391)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:210)
>   ... 34 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:769)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:736)
>   ... 36 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.isCompatibleWith(HiveMetaStoreClient.java:287)
>   at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)

[jira] [Commented] (SPARK-22891) NullPointerException when use udf

2017-12-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305874#comment-16305874
 ] 

Apache Spark commented on SPARK-22891:
--

User 'liufengdb' has created a pull request for this issue:
https://github.com/apache/spark/pull/20109

> NullPointerException when use udf
> -
>
> Key: SPARK-22891
> URL: https://issues.apache.org/jira/browse/SPARK-22891
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.2.1
> Environment: hadoop 2.7.2
>Reporter: gaoyang
>Priority: Minor
>
> In my application, I use multiple threads. Each thread has a SparkSession 
> and uses SparkSession.sqlContext.udf.register to register my UDF. Sometimes 
> an exception like this is thrown:
> {code:java}
> java.lang.IllegalArgumentException: Error while instantiating 
> 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
>   at 
> org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
>   at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207)
>   at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203)
>   at 
> com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63)
>   at 
>   ... 20 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
>   at 
> org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:744)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1391)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:210)
>   ... 34 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:769)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:736)
>   ... 36 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.isCompatibleWith(HiveMetaStoreClient.java:287)
>   at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
>   at com.sun.proxy.$Proxy25.isCompatibleWith(Unknown Source)
>   at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:206)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:765)
>   ... 37 more
> {code}
> Also, I use Apache Hive 2.1.1 in my cluster.
> When I use Spark 2.1.x, the exception above never happens again.

[jira] [Assigned] (SPARK-22891) NullPointerException when use udf

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22891:


Assignee: (was: Apache Spark)

> NullPointerException when use udf
> -
>
> Key: SPARK-22891
> URL: https://issues.apache.org/jira/browse/SPARK-22891
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.2.1
> Environment: hadoop 2.7.2
>Reporter: gaoyang
>Priority: Minor
>
> In my application, I use multiple threads. Each thread has a SparkSession 
> and uses SparkSession.sqlContext.udf.register to register my UDF. Sometimes 
> an exception like this is thrown:
> {code:java}
> java.lang.IllegalArgumentException: Error while instantiating 
> 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
>   at 
> org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
>   at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207)
>   at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203)
>   at 
> com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63)
>   at 
>   ... 20 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
>   at 
> org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:744)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1391)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:210)
>   ... 34 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:769)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:736)
>   ... 36 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.isCompatibleWith(HiveMetaStoreClient.java:287)
>   at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
>   at com.sun.proxy.$Proxy25.isCompatibleWith(Unknown Source)
>   at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:206)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:765)
>   ... 37 more
> {code}
> Also, I use Apache Hive 2.1.1 in my cluster.
> When I use Spark 2.1.x, the exception above never happens again.




[jira] [Assigned] (SPARK-22891) NullPointerException when use udf

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22891:


Assignee: Apache Spark

> NullPointerException when use udf
> -
>
> Key: SPARK-22891
> URL: https://issues.apache.org/jira/browse/SPARK-22891
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.2.1
> Environment: hadoop 2.7.2
>Reporter: gaoyang
>Assignee: Apache Spark
>Priority: Minor
>
> In my application, I use multiple threads. Each thread has a SparkSession 
> and uses SparkSession.sqlContext.udf.register to register my UDF. Sometimes 
> an exception like this is thrown:
> {code:java}
> java.lang.IllegalArgumentException: Error while instantiating 
> 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
>   at 
> org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
>   at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207)
>   at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203)
>   at 
> com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63)
>   at 
>   ... 20 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
>   at 
> org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:744)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1391)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:210)
>   ... 34 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:769)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:736)
>   ... 36 more
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.isCompatibleWith(HiveMetaStoreClient.java:287)
>   at sun.reflect.GeneratedMethodAccessor45.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:156)
>   at com.sun.proxy.$Proxy25.isCompatibleWith(Unknown Source)
>   at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:206)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:765)
>   ... 37 more
> {code}
> Also, I use Apache Hive 2.1.1 in my cluster.
> When I use Spark 2.1.x, the exception above never happens again.




[jira] [Comment Edited] (SPARK-22891) NullPointerException when use udf

2017-12-28 Thread Feng Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305862#comment-16305862
 ] 

Feng Liu edited comment on SPARK-22891 at 12/29/17 12:56 AM:
-

This is definitely caused by the race from 
https://issues.apache.org/jira/browse/HIVE-11935. 

In Spark 2.1, Spark creates the `metadataHive` lazily, only on 
`addJar` (https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala#L40),
so this can only be triggered by concurrent `addJar` calls (hard to imagine 
in practice).

In Spark 2.2, the `metadataHive` creation is tied to the `resourceLoader` 
creation (see the stack trace), so it starts to be triggered by new Spark 
session creation. In https://github.com/apache/spark/pull/20029, I'm trying to 
make an argument that it is safe to remove the new hive client creation. 
Besides this change, I think we should also make the hive client creation 
thread safe: 
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L251




was (Author: liufeng...@gmail.com):
This is definitely caused by the race from 
https://issues.apache.org/jira/browse/HIVE-11935. 

In spark 2.1, spark creates the `metadataHive` lazily until 
`addJar`(https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala#L40),
 so this can only be triggered by concurrent `addJar` (can't imagine this will 
happen in practice)

In spark 2.2, the `metadataHive` creation is tied to the `resourceLoader` 
creation (see the stack trace), so it starts to be triggered by new spark 
session creation. In https://github.com/apache/spark/pull/20029, I'm trying to 
make an argument that it is safe to remove the new hive client creation. 
Besides this change, I think should also make the hive client creation thread 
safe: 
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L251



> NullPointerException when use udf
> -
>
> Key: SPARK-22891
> URL: https://issues.apache.org/jira/browse/SPARK-22891
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.2.1
> Environment: hadoop 2.7.2
>Reporter: gaoyang
>Priority: Minor
>
> In my application, I use multiple threads. Each thread has a SparkSession 
> and uses SparkSession.sqlContext.udf.register to register my UDF. Sometimes 
> an exception like this is thrown:
> {code:java}
> java.lang.IllegalArgumentException: Error while instantiating 
> 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
>   at 
> org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
>   at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207)
>   at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203)
>   at 
> com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63)
>   at 
>   ... 20 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
>   at 
> org.apache.spark.sql.

[jira] [Comment Edited] (SPARK-22891) NullPointerException when use udf

2017-12-28 Thread Feng Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305862#comment-16305862
 ] 

Feng Liu edited comment on SPARK-22891 at 12/29/17 12:49 AM:
-

This is definitely caused by the race from 
https://issues.apache.org/jira/browse/HIVE-11935. 

In Spark 2.1, Spark creates the `metadataHive` lazily until 
`addJar`(https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala#L40),
 so this can only be triggered by concurrent `addJar` (which is unlikely to 
happen in practice).

In Spark 2.2, the `metadataHive` creation is tied to the `resourceLoader` 
creation (see the stack trace), so it starts to be triggered by new Spark 
session creation. In https://github.com/apache/spark/pull/20029, I'm trying to 
make an argument that it is safe to remove the new Hive client creation. 
Besides this change, I think we should also make the Hive client creation 
thread-safe: 
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L251




was (Author: liufeng...@gmail.com):
This is definitely caused by the race from 
https://issues.apache.org/jira/browse/HIVE-11935. 

In spark 2.1, spark creates the `metadataHive` lazily until 
`addJar`(https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala#L40),
 so this can only be triggered by concurrent `addJar` (can't imagine this will 
happen in practice)

In spark 2.2, the `metadataHive` creation is tied to the `resourceLoader` (see 
the stack trace), so it starts to be triggered by new spark session creation. 
In https://github.com/apache/spark/pull/20029, I'm trying to make an argument 
that it is safe to remove the new hive client creation. Besides change, I think 
should also make the hive client creation thread safe: 
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L251



> NullPointerException when use udf
> -
>
> Key: SPARK-22891
> URL: https://issues.apache.org/jira/browse/SPARK-22891
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.2.1
> Environment: hadoop 2.7.2
>Reporter: gaoyang
>Priority: Minor
>
> In my application, I use multiple threads. Each thread has a SparkSession and 
> uses SparkSession.sqlContext.udf.register to register my UDF. Sometimes an 
> exception like this is thrown:
> {code:java}
> java.lang.IllegalArgumentException: Error while instantiating 
> 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
>   at 
> org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
>   at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207)
>   at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203)
>   at 
> com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63)
>   at 
>   ... 20 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
>   at 
> org.apache.spark.sql.hive.HiveSessionSt

[jira] [Commented] (SPARK-22891) NullPointerException when use udf

2017-12-28 Thread Feng Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305862#comment-16305862
 ] 

Feng Liu commented on SPARK-22891:
--

This is definitely caused by the race from 
https://issues.apache.org/jira/browse/HIVE-11935. 

In Spark 2.1, Spark creates the `metadataHive` lazily until 
`addJar`(https://github.com/apache/spark/blob/branch-2.1/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveSessionState.scala#L40),
 so this can only be triggered by concurrent `addJar` (which is unlikely to 
happen in practice).

In Spark 2.2, the `metadataHive` creation is tied to the `resourceLoader` (see 
the stack trace), so it starts to be triggered by new Spark session creation. 
In https://github.com/apache/spark/pull/20029, I'm trying to make an argument 
that it is safe to remove the new Hive client creation. Besides that change, I 
think we should also make the Hive client creation thread-safe: 
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/IsolatedClientLoader.scala#L251
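
For reference, a hedged repro sketch of that race: several threads each derive a 
new session and force `sessionState` (and hence the Hive client) concurrently. 
It assumes a Hive-enabled build and Scala 2.12 SAM syntax; the UDF names are 
arbitrary.

{code}
import org.apache.spark.sql.SparkSession

object ConcurrentUdfRepro {
  def main(args: Array[String]): Unit = {
    val base = SparkSession.builder()
      .master("local[4]")
      .enableHiveSupport()  // requires Hive classes on the classpath
      .getOrCreate()

    val threads = (1 to 8).map { i =>
      new Thread(() => {
        // newSession() builds a fresh HiveSessionStateBuilder lazily; registering
        // a UDF forces sessionState and can race on the Hive client creation.
        val session = base.newSession()
        session.sqlContext.udf.register(s"plus_$i", (x: Int) => x + i)
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
  }
}
{code}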



> NullPointerException when use udf
> -
>
> Key: SPARK-22891
> URL: https://issues.apache.org/jira/browse/SPARK-22891
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.2.1
> Environment: hadoop 2.7.2
>Reporter: gaoyang
>Priority: Minor
>
> In my application, I use multiple threads. Each thread has a SparkSession and 
> uses SparkSession.sqlContext.udf.register to register my UDF. Sometimes an 
> exception like this is thrown:
> {code:java}
> java.lang.IllegalArgumentException: Error while instantiating 
> 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1062)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:137)
>   at 
> org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:136)
>   at scala.Option.getOrElse(Option.scala:121)
>   at 
> org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:136)
>   at 
> org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:133)
>   at org.apache.spark.sql.SparkSession.udf(SparkSession.scala:207)
>   at org.apache.spark.sql.SQLContext.udf(SQLContext.scala:203)
>   at 
> com.game.data.stat.clusterTask.tools.standard.IpConverterRegister$.run(IpConverterRegister.scala:63)
>   at 
>   ... 20 more
> Caused by: java.lang.reflect.InvocationTargetException
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:789)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.newSession(HiveClientImpl.scala:79)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader$lzycompute(HiveSessionStateBuilder.scala:45)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.resourceLoader(HiveSessionStateBuilder.scala:44)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:61)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
>   at 
> org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
>   at 
> org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
>   at 
> org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1059)
>   ... 20 more
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:744)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.getAuthenticator(SessionState.java:1391)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.(HiveClientImpl.scala:210)
>   ... 34 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setAuthorizerV2Config(SessionState.java:769)
>   at 
> org.apache.hadoop.hive.ql.session.SessionState.setupAuth(SessionState.java:736)
>   ... 36 more
> Caused by: ja

[jira] [Updated] (SPARK-14922) Alter Table Drop Partition Using Predicate-based Partition Spec

2017-12-28 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-14922:
--
Affects Version/s: 2.1.2
   2.2.1

> Alter Table Drop Partition Using Predicate-based Partition Spec
> ---
>
> Key: SPARK-14922
> URL: https://issues.apache.org/jira/browse/SPARK-14922
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.1.2, 2.2.1
>Reporter: Xiao Li
>
> Below is allowed in Hive, but not allowed in Spark.
> {noformat}
> alter table ptestfilter drop partition (c='US', d<'2')
> {noformat}
> This example is copied from drop_partitions_filter.q
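
For contrast, a minimal sketch of the equality-only partition spec that Spark 
does accept today (assumes a SparkSession `spark`; table and values are taken 
from the example above):

{code}
// Works in Spark: every partition column is compared with '=' only.
spark.sql("ALTER TABLE ptestfilter DROP PARTITION (c='US', d='1')")
{code}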



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-22818) csv escape of quote escape

2017-12-28 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-22818.
-
   Resolution: Fixed
 Assignee: Soonmok Kwon
Fix Version/s: 2.3.0

> csv escape of quote escape
> --
>
> Key: SPARK-22818
> URL: https://issues.apache.org/jira/browse/SPARK-22818
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Soonmok Kwon
>Assignee: Soonmok Kwon
>Priority: Minor
> Fix For: 2.3.0
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A DataFrame is stored in CSV format and loaded again. When there's a backslash 
> followed by a quotation mark, CSV reading seems to produce an error.
> This issue was raised before in 
> https://issues.apache.org/jira/browse/SPARK-19834 and postponed due to a bug 
> in its dependency. That bug is now resolved, so this issue can be reopened.
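
A hedged round-trip sketch of the failure mode described above (the output path 
is a placeholder, and the exact behavior depends on the uniVocity parser version 
Spark bundles):

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

// A value containing the escape character immediately before a quote.
val df = Seq("""a\"b""").toDF("value")
df.write.mode("overwrite").csv("/tmp/quote-escape-demo")

// Before the fix, reading this back could mis-handle the backslash preceding
// the quote; after it, the round trip should preserve the value.
spark.read.csv("/tmp/quote-escape-demo").show(false)
{code}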



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22921) Merge script should prompt for assigning jiras

2017-12-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305814#comment-16305814
 ] 

Apache Spark commented on SPARK-22921:
--

User 'squito' has created a pull request for this issue:
https://github.com/apache/spark/pull/20107

> Merge script should prompt for assigning jiras
> --
>
> Key: SPARK-22921
> URL: https://issues.apache.org/jira/browse/SPARK-22921
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Priority: Trivial
>
> It's a bit of a nuisance to have to go into JIRA to assign the issue when you 
> merge a PR. In general you assign either the original reporter or a 
> commenter; it would be nice if the merge script made that easy to do.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22921) Merge script should prompt for assigning jiras

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22921:


Assignee: Apache Spark

> Merge script should prompt for assigning jiras
> --
>
> Key: SPARK-22921
> URL: https://issues.apache.org/jira/browse/SPARK-22921
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Assignee: Apache Spark
>Priority: Trivial
>
> It's a bit of a nuisance to have to go into JIRA to assign the issue when you 
> merge a PR. In general you assign either the original reporter or a 
> commenter; it would be nice if the merge script made that easy to do.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22921) Merge script should prompt for assigning jiras

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22921:


Assignee: (was: Apache Spark)

> Merge script should prompt for assigning jiras
> --
>
> Key: SPARK-22921
> URL: https://issues.apache.org/jira/browse/SPARK-22921
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 2.3.0
>Reporter: Imran Rashid
>Priority: Trivial
>
> It's a bit of a nuisance to have to go into JIRA to assign the issue when you 
> merge a PR. In general you assign either the original reporter or a 
> commenter; it would be nice if the merge script made that easy to do.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-22904) Basic tests for decimal operations and string cast

2017-12-28 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-22904.
-
   Resolution: Fixed
 Assignee: Marco Gaido
Fix Version/s: 2.3.0

> Basic tests for decimal operations and string cast
> --
>
> Key: SPARK-22904
> URL: https://issues.apache.org/jira/browse/SPARK-22904
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Marco Gaido
>Assignee: Marco Gaido
> Fix For: 2.3.0
>
>
> Tests covering Spark's behavior for decimal operations that cause overflow or 
> precision loss, and for casting invalid strings to other data types or 
> passing invalid strings to some functions.
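
Two concrete cases such tests exercise, as a hedged illustration (assumes a 
SparkSession `spark`; the results reflect Spark 2.x's default behavior of 
returning NULL on decimal overflow and on invalid casts):

{code}
// A 38-digit decimal times 10 exceeds DecimalType's maximum precision of 38,
// so the multiplication overflows and yields NULL.
spark.sql("SELECT CAST(9e37 AS DECIMAL(38,0)) * 10").show()

// Casting an invalid string yields NULL rather than an error.
spark.sql("SELECT CAST('not a number' AS INT)").show()
{code}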



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-11035) Launcher: allow apps to be launched in-process

2017-12-28 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid resolved SPARK-11035.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 19591
[https://github.com/apache/spark/pull/19591]

> Launcher: allow apps to be launched in-process
> --
>
> Key: SPARK-11035
> URL: https://issues.apache.org/jira/browse/SPARK-11035
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Fix For: 2.3.0
>
>
> The launcher library is currently restricted to launching apps as child 
> processes. That is fine for a lot of cases, especially if the app is running 
> in client mode.
> But in certain cases, especially launching in cluster mode, it's more 
> efficient to avoid launching a new process, since that process won't be doing 
> much.
> We should add support for launching apps in process, even if restricted to 
> cluster mode at first. This will require some rework of the launch paths to 
> avoid using system properties to propagate configuration.
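
The change landed as an in-process launcher in the launcher library; a minimal 
usage sketch under that assumption (resource, class, and master values are 
placeholders; method names follow the 2.3 launcher API):

{code}
import org.apache.spark.launcher.{InProcessLauncher, SparkAppHandle}

val handle: SparkAppHandle = new InProcessLauncher()
  .setAppResource("/path/to/app.jar")  // placeholder
  .setMainClass("com.example.MyApp")   // placeholder
  .setMaster("yarn")
  .setDeployMode("cluster")            // in-process launch targets cluster mode
  .startApplication()

// The handle reports state transitions without a child process being forked.
println(handle.getState)
{code}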



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-22890) Basic tests for DateTimeOperations

2017-12-28 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-22890.
-
   Resolution: Fixed
 Assignee: Yuming Wang
Fix Version/s: 2.3.0

> Basic tests for DateTimeOperations
> --
>
> Key: SPARK-22890
> URL: https://issues.apache.org/jira/browse/SPARK-22890
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22921) Merge script should prompt for assigning jiras

2017-12-28 Thread Imran Rashid (JIRA)
Imran Rashid created SPARK-22921:


 Summary: Merge script should prompt for assigning jiras
 Key: SPARK-22921
 URL: https://issues.apache.org/jira/browse/SPARK-22921
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra
Affects Versions: 2.3.0
Reporter: Imran Rashid
Priority: Trivial


It's a bit of a nuisance to have to go into JIRA to assign the issue when you 
merge a PR. In general you assign either the original reporter or a commenter; 
it would be nice if the merge script made that easy to do.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11035) Launcher: allow apps to be launched in-process

2017-12-28 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid reassigned SPARK-11035:


Assignee: Marcelo Vanzin

> Launcher: allow apps to be launched in-process
> --
>
> Key: SPARK-11035
> URL: https://issues.apache.org/jira/browse/SPARK-11035
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>
> The launcher library is currently restricted to launching apps as child 
> processes. That is fine for a lot of cases, especially if the app is running 
> in client mode.
> But in certain cases, especially launching in cluster mode, it's more 
> efficient to avoid launching a new process, since that process won't be doing 
> much.
> We should add support for launching apps in process, even if restricted to 
> cluster mode at first. This will require some rework of the launch paths to 
> avoid using system properties to propagate configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11035) Launcher: allow apps to be launched in-process

2017-12-28 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid reassigned SPARK-11035:


Assignee: (was: Marcelo Vanzin)

> Launcher: allow apps to be launched in-process
> --
>
> Key: SPARK-11035
> URL: https://issues.apache.org/jira/browse/SPARK-11035
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Marcelo Vanzin
>
> The launcher library is currently restricted to launching apps as child 
> processes. That is fine for a lot of cases, especially if the app is running 
> in client mode.
> But in certain cases, especially launching in cluster mode, it's more 
> efficient to avoid launching a new process, since that process won't be doing 
> much.
> We should add support for launching apps in process, even if restricted to 
> cluster mode at first. This will require some rework of the launch paths to 
> avoid using system properties to propagate configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-11035) Launcher: allow apps to be launched in-process

2017-12-28 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid reassigned SPARK-11035:


Assignee: Marcelo Vanzin

> Launcher: allow apps to be launched in-process
> --
>
> Key: SPARK-11035
> URL: https://issues.apache.org/jira/browse/SPARK-11035
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 1.6.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
>
> The launcher library is currently restricted to launching apps as child 
> processes. That is fine for a lot of cases, especially if the app is running 
> in client mode.
> But in certain cases, especially launching in cluster mode, it's more 
> efficient to avoid launching a new process, since that process won't be doing 
> much.
> We should add support for launching apps in process, even if restricted to 
> cluster mode at first. This will require some rework of the launch paths to 
> avoid using system properties to propagate configuration.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12297) Add work-around for Parquet/Hive int96 timestamp bug.

2017-12-28 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid reassigned SPARK-12297:


Assignee: Imran Rashid  (was: Marcelo Vanzin)

> Add work-around for Parquet/Hive int96 timestamp bug.
> -
>
> Key: SPARK-12297
> URL: https://issues.apache.org/jira/browse/SPARK-12297
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Reporter: Ryan Blue
>Assignee: Imran Rashid
> Fix For: 2.3.0
>
>
> Spark copied Hive's behavior for parquet, but this was inconsistent with 
> other file formats, and inconsistent with Impala (which is the original 
> source of putting a timestamp as an int96 in parquet, I believe).  This made 
> timestamps in parquet act more like timestamps with timezones, while in other 
> file formats, timestamps have no time zone; they are a "floating time".
> The easiest way to see this issue is to write out a table with timestamps in 
> multiple different formats from one timezone, then try to read them back in 
> another timezone.  E.g., here I write out a few timestamps to parquet and 
> textfile hive tables, and also just as a json file, all in the 
> "America/Los_Angeles" timezone:
> {code}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> val tblPrefix = args(0)
> val schema = new StructType().add("ts", TimestampType)
> val rows = sc.parallelize(Seq(
>   "2015-12-31 23:50:59.123",
>   "2015-12-31 22:49:59.123",
>   "2016-01-01 00:39:59.123",
>   "2016-01-01 01:29:59.123"
> ).map { x => Row(java.sql.Timestamp.valueOf(x)) })
> val rawData = spark.createDataFrame(rows, schema).toDF()
> rawData.show()
> Seq("parquet", "textfile").foreach { format =>
>   val tblName = s"${tblPrefix}_$format"
>   spark.sql(s"DROP TABLE IF EXISTS $tblName")
>   spark.sql(
> raw"""CREATE TABLE $tblName (
>   |  ts timestamp
>   | )
>   | STORED AS $format
>  """.stripMargin)
>   rawData.write.insertInto(tblName)
> }
> rawData.write.json(s"${tblPrefix}_json")
> {code}
> Then I start a spark-shell in "America/New_York" timezone, and read the data 
> back from each table:
> {code}
> scala> spark.sql("select * from la_parquet").collect().foreach{println}
> [2016-01-01 02:50:59.123]
> [2016-01-01 01:49:59.123]
> [2016-01-01 03:39:59.123]
> [2016-01-01 04:29:59.123]
> scala> spark.sql("select * from la_textfile").collect().foreach{println}
> [2015-12-31 23:50:59.123]
> [2015-12-31 22:49:59.123]
> [2016-01-01 00:39:59.123]
> [2016-01-01 01:29:59.123]
> scala> spark.read.json("la_json").collect().foreach{println}
> [2015-12-31 23:50:59.123]
> [2015-12-31 22:49:59.123]
> [2016-01-01 00:39:59.123]
> [2016-01-01 01:29:59.123]
> scala> spark.read.json("la_json").join(spark.sql("select * from 
> la_textfile"), "ts").show()
> ++
> |  ts|
> ++
> |2015-12-31 23:50:...|
> |2015-12-31 22:49:...|
> |2016-01-01 00:39:...|
> |2016-01-01 01:29:...|
> ++
> scala> spark.read.json("la_json").join(spark.sql("select * from la_parquet"), 
> "ts").show()
> +---+
> | ts|
> +---+
> +---+
> {code}
> The textfile and json based data show the same times, and can be joined 
> against each other, while the times from the parquet data have changed (and 
> obviously joins fail).
> This is a big problem for any organization that may try to read the same data 
> (say in S3) with clusters in multiple timezones.  It can also be a nasty 
> surprise as an organization tries to migrate file formats.  Finally, it's a 
> source of incompatibility between Hive, Impala, and Spark.
> HIVE-12767 aims to fix this by introducing a table property which indicates 
> the "storage timezone" for the table.  Spark should add the same to ensure 
> consistency between file formats, and with Hive & Impala.
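
The 3-hour shift in the la_parquet output above is just the LA-to-NY offset 
applied to a fixed instant; a small java.time sketch of that arithmetic:

{code}
import java.time._

// The first timestamp, interpreted as LA wall-clock time, stored as an
// instant (the int96 behavior), then rendered as NY wall-clock time:
val laWall  = LocalDateTime.parse("2015-12-31T23:50:59.123")
val instant = laWall.atZone(ZoneId.of("America/Los_Angeles")).toInstant
val nyWall  = instant.atZone(ZoneId.of("America/New_York")).toLocalDateTime
println(nyWall)  // 2016-01-01T02:50:59.123, matching the la_parquet output
{code}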



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21616) SparkR 2.3.0 migration guide, release note

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21616:


Assignee: Apache Spark  (was: Felix Cheung)

> SparkR 2.3.0 migration guide, release note
> --
>
> Key: SPARK-21616
> URL: https://issues.apache.org/jira/browse/SPARK-21616
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: Apache Spark
>
> From looking at changes since 2.2.0, this/these should be documented in the 
> migration guide / release note for the 2.3.0 release, as these are behavior 
> changes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21616) SparkR 2.3.0 migration guide, release note

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-21616:


Assignee: Felix Cheung  (was: Apache Spark)

> SparkR 2.3.0 migration guide, release note
> --
>
> Key: SPARK-21616
> URL: https://issues.apache.org/jira/browse/SPARK-21616
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
> From looking at changes since 2.2.0, this/these should be documented in the 
> migration guide / release note for the 2.3.0 release, as these are behavior 
> changes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-12297) Add work-around for Parquet/Hive int96 timestamp bug.

2017-12-28 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid reassigned SPARK-12297:


Assignee: Marcelo Vanzin  (was: Imran Rashid)

> Add work-around for Parquet/Hive int96 timestamp bug.
> -
>
> Key: SPARK-12297
> URL: https://issues.apache.org/jira/browse/SPARK-12297
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Reporter: Ryan Blue
>Assignee: Marcelo Vanzin
> Fix For: 2.3.0
>
>
> Spark copied Hive's behavior for parquet, but this was inconsistent with 
> other file formats, and inconsistent with Impala (which is the original 
> source of putting a timestamp as an int96 in parquet, I believe).  This made 
> timestamps in parquet act more like timestamps with timezones, while in other 
> file formats, timestamps have no time zone; they are a "floating time".
> The easiest way to see this issue is to write out a table with timestamps in 
> multiple different formats from one timezone, then try to read them back in 
> another timezone.  E.g., here I write out a few timestamps to parquet and 
> textfile hive tables, and also just as a json file, all in the 
> "America/Los_Angeles" timezone:
> {code}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> val tblPrefix = args(0)
> val schema = new StructType().add("ts", TimestampType)
> val rows = sc.parallelize(Seq(
>   "2015-12-31 23:50:59.123",
>   "2015-12-31 22:49:59.123",
>   "2016-01-01 00:39:59.123",
>   "2016-01-01 01:29:59.123"
> ).map { x => Row(java.sql.Timestamp.valueOf(x)) })
> val rawData = spark.createDataFrame(rows, schema).toDF()
> rawData.show()
> Seq("parquet", "textfile").foreach { format =>
>   val tblName = s"${tblPrefix}_$format"
>   spark.sql(s"DROP TABLE IF EXISTS $tblName")
>   spark.sql(
> raw"""CREATE TABLE $tblName (
>   |  ts timestamp
>   | )
>   | STORED AS $format
>  """.stripMargin)
>   rawData.write.insertInto(tblName)
> }
> rawData.write.json(s"${tblPrefix}_json")
> {code}
> Then I start a spark-shell in "America/New_York" timezone, and read the data 
> back from each table:
> {code}
> scala> spark.sql("select * from la_parquet").collect().foreach{println}
> [2016-01-01 02:50:59.123]
> [2016-01-01 01:49:59.123]
> [2016-01-01 03:39:59.123]
> [2016-01-01 04:29:59.123]
> scala> spark.sql("select * from la_textfile").collect().foreach{println}
> [2015-12-31 23:50:59.123]
> [2015-12-31 22:49:59.123]
> [2016-01-01 00:39:59.123]
> [2016-01-01 01:29:59.123]
> scala> spark.read.json("la_json").collect().foreach{println}
> [2015-12-31 23:50:59.123]
> [2015-12-31 22:49:59.123]
> [2016-01-01 00:39:59.123]
> [2016-01-01 01:29:59.123]
> scala> spark.read.json("la_json").join(spark.sql("select * from 
> la_textfile"), "ts").show()
> ++
> |  ts|
> ++
> |2015-12-31 23:50:...|
> |2015-12-31 22:49:...|
> |2016-01-01 00:39:...|
> |2016-01-01 01:29:...|
> ++
> scala> spark.read.json("la_json").join(spark.sql("select * from la_parquet"), 
> "ts").show()
> +---+
> | ts|
> +---+
> +---+
> {code}
> The textfile and json based data show the same times, and can be joined 
> against each other, while the times from the parquet data have changed (and 
> obviously joins fail).
> This is a big problem for any organization that may try to read the same data 
> (say in S3) with clusters in multiple timezones.  It can also be a nasty 
> surprise as an organization tries to migrate file formats.  Finally, it's a 
> source of incompatibility between Hive, Impala, and Spark.
> HIVE-12767 aims to fix this by introducing a table property which indicates 
> the "storage timezone" for the table.  Spark should add the same to ensure 
> consistency between file formats, and with Hive & Impala.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21616) SparkR 2.3.0 migration guide, release note

2017-12-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305789#comment-16305789
 ] 

Apache Spark commented on SPARK-21616:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/20106

> SparkR 2.3.0 migration guide, release note
> --
>
> Key: SPARK-21616
> URL: https://issues.apache.org/jira/browse/SPARK-21616
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
> From looking at changes since 2.2.0, this/these should be documented in the 
> migration guide / release note for the 2.3.0 release, as these are behavior 
> changes



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22920) R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with trimString

2017-12-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305762#comment-16305762
 ] 

Apache Spark commented on SPARK-22920:
--

User 'felixcheung' has created a pull request for this issue:
https://github.com/apache/spark/pull/20105

> R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with 
> trimString
> -
>
> Key: SPARK-22920
> URL: https://issues.apache.org/jira/browse/SPARK-22920
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22920) R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with trimString

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22920:


Assignee: Apache Spark

> R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with 
> trimString
> -
>
> Key: SPARK-22920
> URL: https://issues.apache.org/jira/browse/SPARK-22920
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: Apache Spark
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22920) R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with trimString

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22920:


Assignee: (was: Apache Spark)

> R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with 
> trimString
> -
>
> Key: SPARK-22920
> URL: https://issues.apache.org/jira/browse/SPARK-22920
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22920) R sql functions for current_date, current_timestamp, rtrim/ltrim/trim with trimString

2017-12-28 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-22920:


 Summary: R sql functions for current_date, current_timestamp, 
rtrim/ltrim/trim with trimString
 Key: SPARK-22920
 URL: https://issues.apache.org/jira/browse/SPARK-22920
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 2.3.0
Reporter: Felix Cheung






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-22836) Executors page is not showing driver logs links

2017-12-28 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid resolved SPARK-22836.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 20038
[https://github.com/apache/spark/pull/20038]

> Executors page is not showing driver logs links
> ---
>
> Key: SPARK-22836
> URL: https://issues.apache.org/jira/browse/SPARK-22836
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
> Fix For: 2.3.0
>
>
> -I think this was mainly caused by SPARK-15951; that change modified the 
> executors page to read data from the REST API, and in 2.1 and 2.2 the REST 
> API does not return the driver as an executor. So no information about the 
> driver is shown in that page at all.- (Edit: the bug is unrelated to the 
> aforementioned issue.)
> In 2.3 the driver executor is listed, but it doesn't have any log links.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22836) Executors page is not showing driver logs links

2017-12-28 Thread Imran Rashid (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Imran Rashid reassigned SPARK-22836:


Assignee: Marcelo Vanzin

> Executors page is not showing driver logs links
> ---
>
> Key: SPARK-22836
> URL: https://issues.apache.org/jira/browse/SPARK-22836
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Fix For: 2.3.0
>
>
> -I think this was mainly caused by SPARK-15951; that change modified the 
> executors page to read data from the REST API, and in 2.1 and 2.2 the REST 
> API does not return the driver as an executor. So no information about the 
> driver is shown in that page at all.- (Edit: the bug is unrelated to the 
> aforementioned issue.)
> In 2.3 the driver executor is listed, but it doesn't have any log links.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22905) Fix ChiSqSelectorModel save implementation

2017-12-28 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley reassigned SPARK-22905:
-

Assignee: Weichen Xu

> Fix ChiSqSelectorModel save implementation
> --
>
> Key: SPARK-22905
> URL: https://issues.apache.org/jira/browse/SPARK-22905
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.2.1
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Currently, in `ChiSqSelectorModel`, save:
> {code}
> spark.createDataFrame(dataArray).repartition(1).write...
> {code}
> The default partition number used by createDataFrame is "defaultParallelism", 
> and the current RoundRobinPartitioning won't guarantee that "repartition" 
> generates the same ordering as the local array. We need to fix it.
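
One hedged way to avoid the reordering, sketched here rather than taken from the 
merged fix: build the RDD with a single partition up front, so no repartition 
(and hence no round-robin shuffle) is needed. `Data`, the values, and the path 
are illustrative:

{code}
// Assumes a SparkSession `spark` and its SparkContext `sc` are in scope.
case class Data(featureIndex: Int)
val dataArray = Seq(Data(1), Data(3), Data(7))

// One partition from the start: the local array order is preserved as written.
val df = spark.createDataFrame(sc.makeRDD(dataArray, numSlices = 1))
df.write.mode("overwrite").parquet("/tmp/chisq-model-data")  // placeholder path
{code}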



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2017-12-28 Thread Dong Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305691#comment-16305691
 ] 

Dong Jiang commented on SPARK-13127:


[~gaurav24], it looks like you, like me, are waiting for this ticket to be 
worked on. If you would like, comment on this thread on the developer list to 
advocate for resolving this issue in the Spark 2.3 release:
http://apache-spark-developers-list.1001551.n3.nabble.com/Timeline-for-Spark-2-3-td22793.html

> Upgrade Parquet to 1.9 (Fixes parquet sorting)
> --
>
> Key: SPARK-13127
> URL: https://issues.apache.org/jira/browse/SPARK-13127
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Justin Pihony
>
> Currently, when you write a sorted DataFrame to Parquet, the data read back 
> out is not sorted by default. [This is due to a bug in 
> Parquet|https://issues.apache.org/jira/browse/PARQUET-241] that was fixed in 
> 1.9.
> There is a workaround: read the file back in using a file glob (filepath/*).
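
A minimal sketch of that workaround (assumes a SparkSession `spark`; the path is 
a placeholder):

{code}
// Reading via a glob sidesteps the ordering issue reported above.
val df = spark.read.parquet("/data/sorted_output/*")
{code}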



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22919) Bump Apache httpclient versions

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22919:


Assignee: Apache Spark

> Bump Apache httpclient versions
> ---
>
> Key: SPARK-22919
> URL: https://issues.apache.org/jira/browse/SPARK-22919
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Fokko Driesprong
>Assignee: Apache Spark
>
> I would like to bump the PATCH versions of both Apache httpclient and Apache 
> httpcore. I use the SparkTC Stocator library for connecting to an object 
> store, and I would like to align the versions to reduce Java version 
> mismatches. Furthermore, it is good to bump these versions since they fix 
> stability and performance issues:
> https://archive.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.5.x.txt
> https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22919) Bump Apache httpclient versions

2017-12-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305649#comment-16305649
 ] 

Apache Spark commented on SPARK-22919:
--

User 'Fokko' has created a pull request for this issue:
https://github.com/apache/spark/pull/20103

> Bump Apache httpclient versions
> ---
>
> Key: SPARK-22919
> URL: https://issues.apache.org/jira/browse/SPARK-22919
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Fokko Driesprong
>
> I would like to bump the PATCH versions of both Apache httpclient and Apache 
> httpcore. I use the SparkTC Stocator library for connecting to an object 
> store, and I would like to align the versions to reduce Java version 
> mismatches. Furthermore, it is good to bump these versions since they fix 
> stability and performance issues:
> https://archive.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.5.x.txt
> https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22919) Bump Apache httpclient versions

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22919:


Assignee: (was: Apache Spark)

> Bump Apache httpclient versions
> ---
>
> Key: SPARK-22919
> URL: https://issues.apache.org/jira/browse/SPARK-22919
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Fokko Driesprong
>
> I would like to bump the PATCH versions of both Apache httpclient and Apache 
> httpcore. I use the SparkTC Stocator library for connecting to an object 
> store, and I would like to align the versions to reduce Java version 
> mismatches. Furthermore, it is good to bump these versions since they fix 
> stability and performance issues:
> https://archive.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.5.x.txt
> https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22919) Bump Apache httpclient versions

2017-12-28 Thread Fokko Driesprong (JIRA)
Fokko Driesprong created SPARK-22919:


 Summary: Bump Apache httpclient versions
 Key: SPARK-22919
 URL: https://issues.apache.org/jira/browse/SPARK-22919
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.1
Reporter: Fokko Driesprong


I would like to bump the PATCH versions of both Apache httpclient and Apache 
httpcore. I use the SparkTC Stocator library for connecting to an object store, 
and I would like to align the versions to reduce Java version mismatches. 
Furthermore, it is good to bump these versions since they fix stability and 
performance issues:
https://archive.apache.org/dist/httpcomponents/httpclient/RELEASE_NOTES-4.5.x.txt
https://www.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22577) executor page blacklist status should update with TaskSet level blacklisting

2017-12-28 Thread Attila Zsolt Piros (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305643#comment-16305643
 ] 

Attila Zsolt Piros commented on SPARK-22577:


I have started working on this issue.

> executor page blacklist status should update with TaskSet level blacklisting
> 
>
> Key: SPARK-22577
> URL: https://issues.apache.org/jira/browse/SPARK-22577
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 2.1.1
>Reporter: Thomas Graves
>
> Right now the executor blacklist status only updates with the 
> BlacklistTracker after a task set has finished and propagated the 
> blacklisting to the application level. We should change that to also show 
> blacklisting at the task set level. Without this, it can be very confusing to 
> the user why things aren't running.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22875) Assembly build fails for a high user id

2017-12-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-22875:
-

Assignee: Gera Shegalov

> Assembly build fails for a high user id
> ---
>
> Key: SPARK-22875
> URL: https://issues.apache.org/jira/browse/SPARK-22875
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2.1
>Reporter: Gera Shegalov
>Assignee: Gera Shegalov
>Priority: Minor
> Fix For: 2.3.0
>
>
> {code}
> ./build/mvn package -Pbigtop-dist -DskipTests
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single (dist) on project 
> spark-assembly_2.11: Execution dist of goal 
> org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single failed: user id 
> '123456789' is too big ( > 2097151 ). -> [Help 1]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-22875) Assembly build fails for a high user id

2017-12-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-22875.
---
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 20055
[https://github.com/apache/spark/pull/20055]

> Assembly build fails for a high user id
> ---
>
> Key: SPARK-22875
> URL: https://issues.apache.org/jira/browse/SPARK-22875
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2.1
>Reporter: Gera Shegalov
>Priority: Minor
> Fix For: 2.3.0
>
>
> {code}
> ./build/mvn package -Pbigtop-dist -DskipTests
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single (dist) on project 
> spark-assembly_2.11: Execution dist of goal 
> org.apache.maven.plugins:maven-assembly-plugin:3.1.0:single failed: user id 
> '123456789' is too big ( > 2097151 ). -> [Help 1]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2017-12-28 Thread Gaurav Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305612#comment-16305612
 ] 

Gaurav Shah commented on SPARK-13127:
-

I am surprised people haven't hit 
https://issues.apache.org/jira/browse/PARQUET-353; I constantly face OOM errors 
in a continuous streaming application. I wonder whether we will get Parquet 
1.9.1 and then upgrade Spark to use it: 
https://issues.apache.org/jira/browse/PARQUET-1027

> Upgrade Parquet to 1.9 (Fixes parquet sorting)
> --
>
> Key: SPARK-13127
> URL: https://issues.apache.org/jira/browse/SPARK-13127
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: Justin Pihony
>
> Currently, when you write a sorted DataFrame to Parquet, the data read back 
> out is not sorted by default. [This is due to a bug in 
> Parquet|https://issues.apache.org/jira/browse/PARQUET-241] that was fixed in 
> 1.9.
> There is a workaround: read the file back in using a file glob (filepath/*).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-17123) Performing set operations that combine string and date / timestamp columns may result in generated projection code which doesn't compile

2017-12-28 Thread Wassim (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305577#comment-16305577
 ] 

Wassim edited comment on SPARK-17123 at 12/28/17 4:39 PM:
--

Hello, I am having the same issue at runtime in Java with Spark 2.2.0; could 
you please suggest a solution?

{{org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
214, Column 114: No applicable constructor/method found for actual parameters 
"org.apache.spark.unsafe.types.UTF8String"; candidates are: "public static 
java.sql.Date 
org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)"}}


{code:xml}
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.2.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.2.0</version>
</dependency>
{code}


was (Author: wassimdr):
Hello, having same issue in java with spark 2.2.0, could you please suggest a 
solution 

{{org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
214, Column 114: No applicable constructor/method found for actual parameters 
"org.apache.spark.unsafe.types.UTF8String"; candidates are: "public static 
java.sql.Date 
org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)"}}


{code:xml}
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>2.2.0</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.2.0</version>
</dependency>
{code}

> Performing set operations that combine string and date / timestamp columns 
> may result in generated projection code which doesn't compile
> 
>
> Key: SPARK-17123
> URL: https://issues.apache.org/jira/browse/SPARK-17123
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Josh Rosen
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.0.2, 2.1.0
>
>
> The following example program causes SpecificSafeProjection code generation 
> to produce Java code which doesn't compile:
> {code}
> import org.apache.spark.sql.types._
> spark.sql("set spark.sql.codegen.fallback=false")
> val dateDF = spark.createDataFrame(sc.parallelize(Seq(Row(new 
> java.sql.Date(0)))), StructType(StructField("value", DateType) :: Nil))
> val longDF = sc.parallelize(Seq(new java.sql.Date(0).toString)).toDF
> dateDF.union(longDF).collect()
> {code}
> This fails at runtime with the following error:
> {code}
> failed to compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 28, Column 107: No applicable constructor/method found 
> for actual parameters "org.apache.spark.unsafe.types.UTF8String"; candidates 
> are: "public static java.sql.Date 
> org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)"
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificSafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificSafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private MutableRow mutableRow;
> /* 009 */   private Object[] values;
> /* 010 */   private org.apache.spark.sql.types.StructType schema;
> /* 011 */
> /* 012 */
> /* 013 */   public SpecificSafeProjection(Object[] references) {
> /* 014 */ this.references = references;
> /* 015 */ mutableRow = (MutableRow) references[references.length - 1];
> /* 016 */
> /* 017 */ this.schema = (org.apache.spark.sql.types.StructType) 
> references[0];
> /* 018 */   }
> /* 019 */
> /* 020 */   public java.lang.Object apply(java.lang.Object _i) {
> /* 021 */ InternalRow i = (InternalRow) _i;
> /* 022 */
> /* 023 */ values = new Object[1];
> /* 024 */
> /* 025 */ boolean isNull2 = i.isNullAt(0);
> /* 026 */ UTF8String value2 = isNull2 ? null : (i.getUTF8String(0));
> /* 027 */ boolean isNull1 = isNull2;
> /* 028 */ final java.sql.Date value1 = isNull1 ? null : 
> org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(value2);
> /* 029 */ isNull1 = value1 == null;
> /* 030 */ if (isNull1) {
> /* 031 */   values[0] = null;
> /* 032 */ } else {
> /* 033 */   values[0] = value1;
> /* 034 */ }
> /* 035 */
> /* 036 */ final org.apache.spark.sql.Row value = new 
> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema(values, 
> schema);
> /* 037 */ if (false) {
> /* 038 */   mutableRow.setNullAt(0);
> /* 039 */ } else {
> /* 040 */
> /* 041 */   mutableRow.update(0, value);
> /* 042 */ }
> /* 043 */
> /* 044 */ return mutableRow;
> /* 045 */   }
> /* 046 */ }
> {code}
> Here, the invocation of {{DateTimeUtils.toJavaDate}} is incorrect because the 
> generated code tries to call it with a UTF8String while the method expects an 
> int instead.



--
This message was sent by Atlassi

[jira] [Commented] (SPARK-17123) Performing set operations that combine string and date / timestamp columns may result in generated projection code which doesn't compile

2017-12-28 Thread Wassim (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305577#comment-16305577
 ] 

Wassim commented on SPARK-17123:


Hello, I am having the same issue in Java with Spark 2.2.0; could you please 
suggest a solution?

{{org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 
214, Column 114: No applicable constructor/method found for actual parameters 
"org.apache.spark.unsafe.types.UTF8String"; candidates are: "public static 
java.sql.Date 
org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)"}}


{code}
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
{code}

> Performing set operations that combine string and date / timestamp columns 
> may result in generated projection code which doesn't compile
> 
>
> Key: SPARK-17123
> URL: https://issues.apache.org/jira/browse/SPARK-17123
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Josh Rosen
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.0.2, 2.1.0
>
>
> The following example program causes SpecificSafeProjection code generation 
> to produce Java code which doesn't compile:
> {code}
> import org.apache.spark.sql.Row
> import org.apache.spark.sql.types._
> spark.sql("set spark.sql.codegen.fallback=false")
> val dateDF = spark.createDataFrame(sc.parallelize(Seq(Row(new 
> java.sql.Date(0)))), StructType(StructField("value", DateType) :: Nil))
> val longDF = sc.parallelize(Seq(new java.sql.Date(0).toString)).toDF
> dateDF.union(longDF).collect()
> {code}
> This fails at runtime with the following error:
> {code}
> failed to compile: org.codehaus.commons.compiler.CompileException: File 
> 'generated.java', Line 28, Column 107: No applicable constructor/method found 
> for actual parameters "org.apache.spark.unsafe.types.UTF8String"; candidates 
> are: "public static java.sql.Date 
> org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(int)"
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */   return new SpecificSafeProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificSafeProjection extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseProjection {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */   private MutableRow mutableRow;
> /* 009 */   private Object[] values;
> /* 010 */   private org.apache.spark.sql.types.StructType schema;
> /* 011 */
> /* 012 */
> /* 013 */   public SpecificSafeProjection(Object[] references) {
> /* 014 */ this.references = references;
> /* 015 */ mutableRow = (MutableRow) references[references.length - 1];
> /* 016 */
> /* 017 */ this.schema = (org.apache.spark.sql.types.StructType) 
> references[0];
> /* 018 */   }
> /* 019 */
> /* 020 */   public java.lang.Object apply(java.lang.Object _i) {
> /* 021 */ InternalRow i = (InternalRow) _i;
> /* 022 */
> /* 023 */ values = new Object[1];
> /* 024 */
> /* 025 */ boolean isNull2 = i.isNullAt(0);
> /* 026 */ UTF8String value2 = isNull2 ? null : (i.getUTF8String(0));
> /* 027 */ boolean isNull1 = isNull2;
> /* 028 */ final java.sql.Date value1 = isNull1 ? null : 
> org.apache.spark.sql.catalyst.util.DateTimeUtils.toJavaDate(value2);
> /* 029 */ isNull1 = value1 == null;
> /* 030 */ if (isNull1) {
> /* 031 */   values[0] = null;
> /* 032 */ } else {
> /* 033 */   values[0] = value1;
> /* 034 */ }
> /* 035 */
> /* 036 */ final org.apache.spark.sql.Row value = new 
> org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema(values, 
> schema);
> /* 037 */ if (false) {
> /* 038 */   mutableRow.setNullAt(0);
> /* 039 */ } else {
> /* 040 */
> /* 041 */   mutableRow.update(0, value);
> /* 042 */ }
> /* 043 */
> /* 044 */ return mutableRow;
> /* 045 */   }
> /* 046 */ }
> {code}
> Here, the invocation of {{DateTimeUtils.toJavaDate}} is incorrect because the 
> generated code tries to call it with a UTF8String while the method expects an 
> int instead.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine",

2017-12-28 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305549#comment-16305549
 ] 

Sean Owen commented on SPARK-22918:
---

It sounds like you have a SecurityManager enabled. Can you turn that off? Or 
are you sure you haven't made any special configuration like that?
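
If a SecurityManager is indeed installed by the test harness, one commonly 
suggested mitigation (a hedged sketch, not confirmed in this thread) is to 
clear it in test setup before the embedded Derby metastore boots:

{code}
// Hedged sketch: Derby 10.12+ demands SystemPermission("engine",
// "usederbyinternals") whenever a SecurityManager is present. Clearing the
// manager before the first SparkSession / Hive metastore access sidesteps
// the check.
if (System.getSecurityManager != null) {
  System.setSecurityManager(null)
}
{code}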

> sbt test (spark - local) fail after upgrading to 2.2.1 with: 
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
> 
>
> Key: SPARK-22918
> URL: https://issues.apache.org/jira/browse/SPARK-22918
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Damian Momot
>
> After upgrading 2.2.0 -> 2.2.1, the sbt test command in one of my projects 
> started to fail with the following exception:
> {noformat}
> java.security.AccessControlException: access denied 
> org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
>   at 
> java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
>   at 
> java.security.AccessController.checkPermission(AccessController.java:884)
>   at 
> org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
>  Source)
>   at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
> Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
>   at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at java.lang.Class.newInstance(Class.java:442)
>   at 
> org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
>   at 
> org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
>   at 
> org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
>   at 
> org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
>   at 
> org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286)
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>   at 
> org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
>   at 
> org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
>   at 
> org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
>   at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
>   at 
> org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
>   at 
> org.datanucleus.api.jdo.JDOPersisten

[jira] [Updated] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "u

2017-12-28 Thread Damian Momot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot updated SPARK-22918:
-
Description: 
After upgrading 2.2.0 -> 2.2.1, the sbt test command in one of my projects 
started to fail with the following exception:

{noformat}
java.security.AccessControlException: access denied 
org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
at 
java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at 
java.security.AccessController.checkPermission(AccessController.java:884)
at 
org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
 Source)
at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
Source)
at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at 
org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
at 
org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
at 
org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
at 
org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at 
org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
at 
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHe

[jira] [Updated] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "u

2017-12-28 Thread Damian Momot (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Damian Momot updated SPARK-22918:
-
Description: 
After upgrading 2.2.0 -> 2.2.1, the sbt test command started to fail with the 
following exception:

{noformat}
java.security.AccessControlException: access denied 
org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
at 
java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at 
java.security.AccessController.checkPermission(AccessController.java:884)
at 
org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
 Source)
at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
Source)
at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at 
org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
at 
org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
at 
org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
at 
org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at 
org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
at java.security.AccessController.doPrivileged(Native Method)
at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
at 
javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
   


[jira] [Created] (SPARK-22918) sbt test (spark - local) fail after upgrading to 2.2.1 with: java.security.AccessControlException: access denied org.apache.derby.security.SystemPermission( "engine", "u

2017-12-28 Thread Damian Momot (JIRA)
Damian Momot created SPARK-22918:


 Summary: sbt test (spark - local) fail after upgrading to 2.2.1 
with: java.security.AccessControlException: access denied 
org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
 Key: SPARK-22918
 URL: https://issues.apache.org/jira/browse/SPARK-22918
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.2.1
Reporter: Damian Momot


After upgrading 2.2.0 -> 2.2.1, the sbt test command started to fail with the 
following exception:

{noformat}
java.security.AccessControlException: access denied 
org.apache.derby.security.SystemPermission( "engine", "usederbyinternals" )
at 
java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at 
java.security.AccessController.checkPermission(AccessController.java:884)
at 
org.apache.derby.iapi.security.SecurityUtil.checkDerbyInternalsPrivilege(Unknown
 Source)
at org.apache.derby.iapi.services.monitor.Monitor.startMonitor(Unknown 
Source)
at org.apache.derby.iapi.jdbc.JDBCBoot$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.iapi.jdbc.JDBCBoot.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.boot(Unknown Source)
at org.apache.derby.jdbc.EmbeddedDriver.(Unknown Source)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at 
org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:47)
at 
org.datanucleus.store.rdbms.connectionpool.BoneCPConnectionPoolFactory.createConnectionPool(BoneCPConnectionPoolFactory.java:54)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131)
at 
org.datanucleus.store.rdbms.ConnectionFactoryImpl.(ConnectionFactoryImpl.java:85)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325)
at 
org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282)
at 
org.datanucleus.store.AbstractStoreManager.(AbstractStoreManager.java:240)
at 
org.datanucleus.store.rdbms.RDBMSStoreManager.(RDBMSStoreManager.java:286)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at 
org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
at 
org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
at 
org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
at 
org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at javax.jdo.JDOHelper$16.run(JDOHelper.jav

[jira] [Resolved] (SPARK-22917) Should not try to generate histogram for empty/null columns

2017-12-28 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-22917.
-
Resolution: Fixed

Issue resolved by pull request 20102
[https://github.com/apache/spark/pull/20102]

> Should not try to generate histogram for empty/null columns
> ---
>
> Key: SPARK-22917
> URL: https://issues.apache.org/jira/browse/SPARK-22917
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Zhenhua Wang
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22917) Should not try to generate histogram for empty/null columns

2017-12-28 Thread Wenchen Fan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-22917:
---

Assignee: Zhenhua Wang

> Should not try to generate histogram for empty/null columns
> ---
>
> Key: SPARK-22917
> URL: https://issues.apache.org/jira/browse/SPARK-22917
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Zhenhua Wang
>Assignee: Zhenhua Wang
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-21208) Ability to "setLocalProperty" from sc, in sparkR

2017-12-28 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-21208:


Assignee: Hyukjin Kwon

> Ability to "setLocalProperty" from sc, in sparkR
> 
>
> Key: SPARK-21208
> URL: https://issues.apache.org/jira/browse/SPARK-21208
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.1.1
>Reporter: Karuppayya
>Assignee: Hyukjin Kwon
> Fix For: 2.3.0
>
>
> Checked the API 
> [documentation|https://spark.apache.org/docs/latest/api/R/index.html] for 
> sparkR.
> Was not able to find a way to *setLocalProperty* on sc.
> Need the ability to *setLocalProperty* on the sparkContext (similar to what 
> is available for pyspark and scala).
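
For reference, a short illustrative sketch of the Scala counterpart the 
ticket asks SparkR to mirror:

{code}
// Thread-local property on the Scala SparkContext: jobs submitted from this
// thread inherit it (e.g. a fair-scheduler pool); other threads do not.
sc.setLocalProperty("spark.scheduler.pool", "production")
// ... submit jobs ...
sc.setLocalProperty("spark.scheduler.pool", null)  // clear when done
{code}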



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21208) Ability to "setLocalProperty" from sc, in sparkR

2017-12-28 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-21208.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 20075
[https://github.com/apache/spark/pull/20075]

> Ability to "setLocalProperty" from sc, in sparkR
> 
>
> Key: SPARK-21208
> URL: https://issues.apache.org/jira/browse/SPARK-21208
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.1.1
>Reporter: Karuppayya
>Assignee: Hyukjin Kwon
> Fix For: 2.3.0
>
>
> Checked the API 
> [documentation|https://spark.apache.org/docs/latest/api/R/index.html] for 
> sparkR.
> Was not able to find a way to *setLocalProperty* on sc.
> Need the ability to *setLocalProperty* on the sparkContext (similar to what 
> is available for pyspark and scala).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-22843) R localCheckpoint API

2017-12-28 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-22843.
--
   Resolution: Fixed
Fix Version/s: 2.3.0

Issue resolved by pull request 20073
[https://github.com/apache/spark/pull/20073]

> R localCheckpoint API
> -
>
> Key: SPARK-22843
> URL: https://issues.apache.org/jira/browse/SPARK-22843
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22843) R localCheckpoint API

2017-12-28 Thread Hyukjin Kwon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-22843:


Assignee: Hyukjin Kwon

> R localCheckpoint API
> -
>
> Key: SPARK-22843
> URL: https://issues.apache.org/jira/browse/SPARK-22843
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: Hyukjin Kwon
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-21828) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB...again

2017-12-28 Thread Marco Gaido (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Gaido resolved SPARK-21828.
-
Resolution: Duplicate

> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB...again
> -
>
> Key: SPARK-21828
> URL: https://issues.apache.org/jira/browse/SPARK-21828
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.1.0
>Reporter: Otis Smart
>Priority: Critical
>
> Hello!
> 1. I encounter a similar issue (see the text below) on PySpark 2.2 (e.g., a 
> dataframe with ~5 rows x 1100+ columns as input to the ".fit()" method of a 
> CrossValidator() that includes a Pipeline() with StringIndexer(), 
> VectorAssembler() and DecisionTreeClassifier()).
> 2. Was the aforementioned patch 
> (https://github.com/apache/spark/pull/15480) not included in the latest 
> release? What is the reason for, and the solution to, this persistent 
> issue?
> py4j.protocol.Py4JJavaError: An error occurred while calling o9396.fit.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 38 
> in stage 18.0 failed 4 times, most recent failure: Lost task 38.3 in stage 
> 18.0 (TID 1996, ip-10-0-14-83.ec2.internal, executor 4): 
> java.util.concurrent.ExecutionException: java.lang.Exception: failed to 
> compile: org.codehaus.janino.JaninoRuntimeException: Code of method 
> "compare(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I"
>  of class 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" 
> grows beyond 64 KB
> /* 001 */ public SpecificOrdering generate(Object[] references) {
> /* 002 */   return new SpecificOrdering(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificOrdering extends 
> org.apache.spark.sql.catalyst.expressions.codegen.BaseOrdering {
> /* 006 */
> /* 007 */   private Object[] references;
> /* 008 */
> /* 009 */
> /* 010 */   public SpecificOrdering(Object[] references) {
> /* 011 */     this.references = references;
> /* 012 */
> /* 013 */   }
> /* 014 */
> /* 015 */
> /* 016 */
> /* 017 */   public int compare(InternalRow a, InternalRow b) {
> /* 018 */     InternalRow i = null; // Holds current row being evaluated.
> /* 019 */
> /* 020 */     i = a;
> /* 021 */     boolean isNullA;
> /* 022 */     double primitiveA;
> /* 023 */     {
> /* 024 */
> /* 025 */       double value = i.getDouble(0);
> /* 026 */       isNullA = false;
> /* 027 */       primitiveA = value;
> /* 028 */     }
> /* 029 */     i = b;
> /* 030 */     boolean isNullB;
> /* 031 */     double primitiveB;
> /* 032 */     {
> /* 033 */
> /* 034 */       double value = i.getDouble(0);
> /* 035 */       isNullB = false;
> /* 036 */       primitiveB = value;
> /* 037 */     }
> /* 038 */     if (isNullA && isNullB) {
> /* 039 */       // Nothing
> /* 040 */     } else if (isNullA) {
> /* 041 */       return -1;
> /* 042 */     } else if (isNullB) {
> /* 043 */       return 1;
> /* 044 */     } else {
> /* 045 */       int comp = 
> org.apache.spark.util.Utils.nanSafeCompareDoubles(primitiveA, primitiveB);
> /* 046 */       if (comp != 0) {
> /* 047 */         return comp;
> /* 048 */       }
> /* 049 */     }
> /* 050 */
> /* 051 */
> ...
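
A hedged mitigation sketch (not from this ticket, and only applicable when 
the oversized method comes from whole-stage code generation): disabling 
whole-stage codegen forces interpreted evaluation for the affected plan 
while a proper fix lands.

{code}
// Assumption: spark is an active SparkSession. Generated orderings such as
// SpecificOrdering are compiled separately, so this may not help in every
// case of the 64 KB method limit.
spark.conf.set("spark.sql.codegen.wholeStage", "false")
{code}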



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21616) SparkR 2.3.0 migration guide, release note

2017-12-28 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305192#comment-16305192
 ] 

Felix Cheung commented on SPARK-21616:
--

SPARK-22315

> SparkR 2.3.0 migration guide, release note
> --
>
> Key: SPARK-21616
> URL: https://issues.apache.org/jira/browse/SPARK-21616
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>
> From looking at changes since 2.2.0, these should be documented in the 
> migration guide / release notes for the 2.3.0 release, as they are behavior 
> changes.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22917) Should not try to generate histogram for empty/null columns

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22917:


Assignee: Apache Spark

> Should not try to generate histogram for empty/null columns
> ---
>
> Key: SPARK-22917
> URL: https://issues.apache.org/jira/browse/SPARK-22917
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Zhenhua Wang
>Assignee: Apache Spark
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-22917) Should not try to generate histogram for empty/null columns

2017-12-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-22917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-22917:


Assignee: (was: Apache Spark)

> Should not try to generate histogram for empty/null columns
> ---
>
> Key: SPARK-22917
> URL: https://issues.apache.org/jira/browse/SPARK-22917
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Zhenhua Wang
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22917) Should not try to generate histogram for empty/null columns

2017-12-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-22917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305177#comment-16305177
 ] 

Apache Spark commented on SPARK-22917:
--

User 'wzhfy' has created a pull request for this issue:
https://github.com/apache/spark/pull/20102

> Should not try to generate histogram for empty/null columns
> ---
>
> Key: SPARK-22917
> URL: https://issues.apache.org/jira/browse/SPARK-22917
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Zhenhua Wang
> Fix For: 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-22917) Should not try to generate histogram for empty/null columns

2017-12-28 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-22917:


 Summary: Should not try to generate histogram for empty/null 
columns
 Key: SPARK-22917
 URL: https://issues.apache.org/jira/browse/SPARK-22917
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 2.3.0
Reporter: Zhenhua Wang






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21727) Operating on an ArrayType in a SparkR DataFrame throws error

2017-12-28 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16305157#comment-16305157
 ] 

Felix Cheung commented on SPARK-21727:
--

[~neilalex] How is it going?

> Operating on an ArrayType in a SparkR DataFrame throws error
> 
>
> Key: SPARK-21727
> URL: https://issues.apache.org/jira/browse/SPARK-21727
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Neil Alexander McQuarrie
>Assignee: Neil Alexander McQuarrie
>
> Previously 
> [posted|https://stackoverflow.com/questions/45056973/sparkr-dataframe-with-r-lists-as-elements]
>  this as a stack overflow question but it seems to be a bug.
> If I have an R data.frame where one of the column data types is an integer 
> *list* -- i.e., each of the elements in the column embeds an entire R list of 
> integers -- then it seems I can convert this data.frame to a SparkR DataFrame 
> just fine... SparkR treats the column as ArrayType(Double). 
> However, any subsequent operation on this SparkR DataFrame appears to throw 
> an error.
> Create an example R data.frame:
> {code}
> indices <- 1:4
> myDf <- data.frame(indices)
> myDf$data <- list(rep(0, 20))
> {code}
> Examine it to make sure it looks okay:
> {code}
> > str(myDf) 
> 'data.frame':   4 obs. of  2 variables:  
>  $ indices: int  1 2 3 4  
>  $ data   :List of 4
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
>..$ : num  0 0 0 0 0 0 0 0 0 0 ...
> > head(myDf)   
>   indices   data 
> 1   1 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 2   2 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 3   3 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 
> 4   4 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
> {code}
> Convert it to a SparkR DataFrame:
> {code}
> library(SparkR, lib.loc=paste0(Sys.getenv("SPARK_HOME"),"/R/lib"))
> sparkR.session(master = "local[*]")
> mySparkDf <- as.DataFrame(myDf)
> {code}
> Examine the SparkR DataFrame schema; notice that the list column was 
> successfully converted to ArrayType:
> {code}
> > schema(mySparkDf)
> StructType
> |-name = "indices", type = "IntegerType", nullable = TRUE
> |-name = "data", type = "ArrayType(DoubleType,true)", nullable = TRUE
> {code}
> However, operating on the SparkR DataFrame throws an error:
> {code}
> > collect(mySparkDf)
> 17/07/13 17:23:00 ERROR executor.Executor: Exception in task 0.0 in stage 1.0 
> (TID 1)
> java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: 
> java.lang.Double is not a valid external type for schema of array<double>
> if (assertnotnull(input[0, org.apache.spark.sql.Row, true]).isNullAt) null 
> else validateexternaltype(getexternalrowfield(assertnotnull(input[0, 
> org.apache.spark.sql.Row, true]), 0, indices), IntegerType) AS indices#0
> ... long stack trace ...
> {code}
> Using Spark 2.2.0, R 3.4.0, Java 1.8.0_131, Windows 10.
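
For what it's worth, a hypothetical Scala sketch (not from the report) that 
reproduces the same class of encoder error: the schema promises 
array<double> while each row carries a bare Double.

{code}
// Hypothetical Scala analogue of the SparkR failure: the declared schema is
// array<double>, but the external Row holds a scalar Double, so the safe
// projection rejects it when the job actually runs.
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
val schema = StructType(StructField("data", ArrayType(DoubleType)) :: Nil)
val rdd = sc.parallelize(Seq(Row(0.0)))  // scalar where an array is declared
spark.createDataFrame(rdd, schema).collect()
// => java.lang.Double is not a valid external type for schema of array<double>
{code}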



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org