[jira] [Assigned] (SPARK-20907) Use testQuietly for test suites that generate long log output
[ https://issues.apache.org/jira/browse/SPARK-20907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20907: Assignee: Apache Spark > Use testQuietly for test suites that generate long log output > - > > Key: SPARK-20907 > URL: https://issues.apache.org/jira/browse/SPARK-20907 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 2.2.0, 2.3.0 >Reporter: Kazuaki Ishizaki >Assignee: Apache Spark > > Use `testQuietly` instead of `test` for test cases that generate long output -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20907) Use testQuietly for test suites that generate long log output
[ https://issues.apache.org/jira/browse/SPARK-20907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027707#comment-16027707 ] Apache Spark commented on SPARK-20907: -- User 'kiszk' has created a pull request for this issue: https://github.com/apache/spark/pull/18135 > Use testQuietly for test suites that generate long log output > - > > Key: SPARK-20907 > URL: https://issues.apache.org/jira/browse/SPARK-20907 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 2.2.0, 2.3.0 >Reporter: Kazuaki Ishizaki > > Use `testQuietly` instead of `test` for test cases that generate long output -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20907) Use testQuietly for test suites that generate long log output
[ https://issues.apache.org/jira/browse/SPARK-20907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20907: Assignee: (was: Apache Spark) > Use testQuietly for test suites that generate long log output > - > > Key: SPARK-20907 > URL: https://issues.apache.org/jira/browse/SPARK-20907 > Project: Spark > Issue Type: Improvement > Components: Tests >Affects Versions: 2.2.0, 2.3.0 >Reporter: Kazuaki Ishizaki > > Use `testQuietly` instead of `test` for test cases that generate long output -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20909) Built-in SQL Function Support - DAYOFWEEK
[ https://issues.apache.org/jira/browse/SPARK-20909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20909: Assignee: Apache Spark > Built-in SQL Function Support - DAYOFWEEK > - > > Key: SPARK-20909 > URL: https://issues.apache.org/jira/browse/SPARK-20909 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.2.0 >Reporter: Yuming Wang >Assignee: Apache Spark > Labels: starter > > {noformat} > DAYOFWEEK(date) > {noformat} > Returns the weekday index of the argument. > Ref: > https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_dayofweek -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20909) Built-in SQL Function Support - DAYOFWEEK
[ https://issues.apache.org/jira/browse/SPARK-20909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20909: Assignee: (was: Apache Spark) > Built-in SQL Function Support - DAYOFWEEK > - > > Key: SPARK-20909 > URL: https://issues.apache.org/jira/browse/SPARK-20909 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.2.0 >Reporter: Yuming Wang > Labels: starter > > {noformat} > DAYOFWEEK(date) > {noformat} > Returns the weekday index of the argument. > Ref: > https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_dayofweek -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20909) Built-in SQL Function Support - DAYOFWEEK
[ https://issues.apache.org/jira/browse/SPARK-20909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027705#comment-16027705 ] Apache Spark commented on SPARK-20909: -- User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/18134 > Built-in SQL Function Support - DAYOFWEEK > - > > Key: SPARK-20909 > URL: https://issues.apache.org/jira/browse/SPARK-20909 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.2.0 >Reporter: Yuming Wang > Labels: starter > > {noformat} > DAYOFWEEK(date) > {noformat} > Returns the weekday index of the argument. > Ref: > https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_dayofweek -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-20908) Cache Manager: Hint should be ignored in plan matching
[ https://issues.apache.org/jira/browse/SPARK-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-20908. - Resolution: Fixed Fix Version/s: 2.2.0 > Cache Manager: Hint should be ignored in plan matching > -- > > Key: SPARK-20908 > URL: https://issues.apache.org/jira/browse/SPARK-20908 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li > Fix For: 2.2.0 > > > In Cache manager, the plan matching should ignore Hint. > {noformat} > val df1 = spark.range(10).join(broadcast(spark.range(10))) > df1.cache() > spark.range(10).join(spark.range(10)).explain() > {noformat} > The output plan of the above query shows that the second query is not using > the cached data of the first query. > {noformat} > BroadcastNestedLoopJoin BuildRight, Inner > :- *Range (0, 10, step=1, splits=2) > +- BroadcastExchange IdentityBroadcastMode >+- *Range (0, 10, step=1, splits=2) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20909) Built-in SQL Function Support - DAYOFWEEK
[ https://issues.apache.org/jira/browse/SPARK-20909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027694#comment-16027694 ] Yuming Wang commented on SPARK-20909: - I'm working on this. > Built-in SQL Function Support - DAYOFWEEK > - > > Key: SPARK-20909 > URL: https://issues.apache.org/jira/browse/SPARK-20909 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.2.0 >Reporter: Yuming Wang > Labels: starter > > {noformat} > DAYOFWEEK(date) > {noformat} > Returns the weekday index of the argument. > Ref: > https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_dayofweek -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20909) Built-in SQL Function Support - DAYOFWEEK
Yuming Wang created SPARK-20909: --- Summary: Built-in SQL Function Support - DAYOFWEEK Key: SPARK-20909 URL: https://issues.apache.org/jira/browse/SPARK-20909 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 2.2.0 Reporter: Yuming Wang {noformat} DAYOFWEEK(date) {noformat} Returns the weekday index of the argument. Ref: https://dev.mysql.com/doc/refman/5.5/en/date-and-time-functions.html#function_dayofweek -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
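Until a built-in DAYOFWEEK lands, the MySQL semantics quoted above can be approximated with functions that already exist in Spark 2.2. The sketch below is only an illustration of the requested behaviour, not the proposed implementation; the sample date is made up, and the (u % 7) + 1 shift converts SimpleDateFormat's 1 = Monday .. 7 = Sunday numbering into MySQL's 1 = Sunday .. 7 = Saturday.
{code}
// Hedged workaround sketch, assuming a SparkSession named `spark`.
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq("2017-05-27").toDF("d")  // 2017-05-27 is a Saturday
df.select(((date_format($"d", "u").cast("int") % 7) + 1).as("dayofweek")).show()
// expected output: 7, matching MySQL's DAYOFWEEK('2017-05-27')
{code}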
[jira] [Commented] (SPARK-20894) Error while checkpointing to HDFS (similar to JIRA SPARK-19268)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027685#comment-16027685 ] kant kodali commented on SPARK-20894: - Not a Bug. > Error while checkpointing to HDFS (similar to JIRA SPARK-19268) > --- > > Key: SPARK-20894 > URL: https://issues.apache.org/jira/browse/SPARK-20894 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.1.1 > Environment: Ubuntu, Spark 2.1.1, hadoop 2.7 >Reporter: kant kodali > Attachments: driver_info_log, executor1_log, executor2_log > > > Dataset df2 = df1.groupBy(functions.window(df1.col("Timestamp5"), "24 > hours", "24 hours"), df1.col("AppName")).count(); > StreamingQuery query = df2.writeStream().foreach(new > KafkaSink()).option("checkpointLocation","/usr/local/hadoop/checkpoint").outputMode("update").start(); > query.awaitTermination(); > This for some reason fails with the Error > ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1) > java.lang.IllegalStateException: Error reading delta file > /usr/local/hadoop/checkpoint/state/0/0/1.delta of HDFSStateStoreProvider[id = > (op=0, part=0), dir = /usr/local/hadoop/checkpoint/state/0/0]: > /usr/local/hadoop/checkpoint/state/0/0/1.delta does not exist > I did clear all the checkpoint data in /usr/local/hadoop/checkpoint/ and all > consumer offsets in Kafka from all brokers prior to running and yet this > error still persists. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-20894) Error while checkpointing to HDFS (similar to JIRA SPARK-19268)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kant kodali closed SPARK-20894. --- Resolution: Not A Problem > Error while checkpointing to HDFS (similar to JIRA SPARK-19268) > --- > > Key: SPARK-20894 > URL: https://issues.apache.org/jira/browse/SPARK-20894 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.1.1 > Environment: Ubuntu, Spark 2.1.1, hadoop 2.7 >Reporter: kant kodali > Attachments: driver_info_log, executor1_log, executor2_log > > > Dataset df2 = df1.groupBy(functions.window(df1.col("Timestamp5"), "24 > hours", "24 hours"), df1.col("AppName")).count(); > StreamingQuery query = df2.writeStream().foreach(new > KafkaSink()).option("checkpointLocation","/usr/local/hadoop/checkpoint").outputMode("update").start(); > query.awaitTermination(); > This for some reason fails with the Error > ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1) > java.lang.IllegalStateException: Error reading delta file > /usr/local/hadoop/checkpoint/state/0/0/1.delta of HDFSStateStoreProvider[id = > (op=0, part=0), dir = /usr/local/hadoop/checkpoint/state/0/0]: > /usr/local/hadoop/checkpoint/state/0/0/1.delta does not exist > I did clear all the checkpoint data in /usr/local/hadoop/checkpoint/ and all > consumer offsets in Kafka from all brokers prior to running and yet this > error still persists. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
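The ticket was closed without a confirmed root cause. One thing worth ruling out in setups like this (purely an assumption here, not the resolution) is path ambiguity: a scheme-less checkpoint location can resolve against different filesystems depending on each node's Hadoop configuration, so a fully qualified HDFS URI removes that variable.
{code}
// Hedged Scala sketch (the report's snippet is Java); the namenode host/port is a placeholder
// and KafkaSink is the reporter's own class.
val query = df2.writeStream
  .foreach(new KafkaSink())
  .option("checkpointLocation", "hdfs://namenode:8020/usr/local/hadoop/checkpoint")
  .outputMode("update")
  .start()
{code}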
[jira] [Commented] (SPARK-8184) date/time function: weekofyear
[ https://issues.apache.org/jira/browse/SPARK-8184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027680#comment-16027680 ] Apache Spark commented on SPARK-8184: - User 'wangyum' has created a pull request for this issue: https://github.com/apache/spark/pull/18132 > date/time function: weekofyear > -- > > Key: SPARK-8184 > URL: https://issues.apache.org/jira/browse/SPARK-8184 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Reynold Xin >Assignee: Tarek Auel > Fix For: 1.5.0 > > > weekofyear(string|date|timestamp): int > Returns the week number of a timestamp string: weekofyear("1970-11-01 > 00:00:00") = 44, weekofyear("1970-11-01") = 44. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
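As a quick reference for the semantics quoted above, the function is already callable from SQL; the expected values are the ones given in the issue description.
{code}
// Hedged check, assuming a SparkSession named `spark`.
spark.sql("SELECT weekofyear('1970-11-01 00:00:00'), weekofyear('1970-11-01')").show()
// both columns are expected to show 44
{code}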
[jira] [Assigned] (SPARK-20908) Cache Manager: Hint should be ignored in plan matching
[ https://issues.apache.org/jira/browse/SPARK-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20908: Assignee: Apache Spark (was: Xiao Li) > Cache Manager: Hint should be ignored in plan matching > -- > > Key: SPARK-20908 > URL: https://issues.apache.org/jira/browse/SPARK-20908 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 >Reporter: Xiao Li >Assignee: Apache Spark > > In Cache manager, the plan matching should ignore Hint. > {noformat} > val df1 = spark.range(10).join(broadcast(spark.range(10))) > df1.cache() > spark.range(10).join(spark.range(10)).explain() > {noformat} > The output plan of the above query shows that the second query is not using > the cached data of the first query. > {noformat} > BroadcastNestedLoopJoin BuildRight, Inner > :- *Range (0, 10, step=1, splits=2) > +- BroadcastExchange IdentityBroadcastMode >+- *Range (0, 10, step=1, splits=2) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-20908) Cache Manager: Hint should be ignored in plan matching
[ https://issues.apache.org/jira/browse/SPARK-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20908: Assignee: Xiao Li (was: Apache Spark) > Cache Manager: Hint should be ignored in plan matching > -- > > Key: SPARK-20908 > URL: https://issues.apache.org/jira/browse/SPARK-20908 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li > > In Cache manager, the plan matching should ignore Hint. > {noformat} > val df1 = spark.range(10).join(broadcast(spark.range(10))) > df1.cache() > spark.range(10).join(spark.range(10)).explain() > {noformat} > The output plan of the above query shows that the second query is not using > the cached data of the first query. > {noformat} > BroadcastNestedLoopJoin BuildRight, Inner > :- *Range (0, 10, step=1, splits=2) > +- BroadcastExchange IdentityBroadcastMode >+- *Range (0, 10, step=1, splits=2) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20908) Cache Manager: Hint should be ignored in plan matching
[ https://issues.apache.org/jira/browse/SPARK-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027660#comment-16027660 ] Apache Spark commented on SPARK-20908: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/18131 > Cache Manager: Hint should be ignored in plan matching > -- > > Key: SPARK-20908 > URL: https://issues.apache.org/jira/browse/SPARK-20908 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li > > In Cache manager, the plan matching should ignore Hint. > {noformat} > val df1 = spark.range(10).join(broadcast(spark.range(10))) > df1.cache() > spark.range(10).join(spark.range(10)).explain() > {noformat} > The output plan of the above query shows that the second query is not using > the cached data of the first query. > {noformat} > BroadcastNestedLoopJoin BuildRight, Inner > :- *Range (0, 10, step=1, splits=2) > +- BroadcastExchange IdentityBroadcastMode >+- *Range (0, 10, step=1, splits=2) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20908) Cache Manager: Hint should be ignored in plan matching
[ https://issues.apache.org/jira/browse/SPARK-20908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-20908: Description: In Cache manager, the plan matching should ignore Hint. {noformat} val df1 = spark.range(10).join(broadcast(spark.range(10))) df1.cache() spark.range(10).join(spark.range(10)).explain() {noformat} The output plan of the above query shows that the second query is not using the cached data of the first query. {noformat} BroadcastNestedLoopJoin BuildRight, Inner :- *Range (0, 10, step=1, splits=2) +- BroadcastExchange IdentityBroadcastMode +- *Range (0, 10, step=1, splits=2) {noformat} was: In Cache manager, the plan matching should ignore Hint. {noformat} val df1 = spark.range(10).join(broadcast(spark.range(10))) df1.cache() spark.range(10).join(spark.range(10)).explain() {noformat} The above query shows the plan that does not use the cached data {noformat} BroadcastNestedLoopJoin BuildRight, Inner :- *Range (0, 10, step=1, splits=2) +- BroadcastExchange IdentityBroadcastMode +- *Range (0, 10, step=1, splits=2) {noformat} > Cache Manager: Hint should be ignored in plan matching > -- > > Key: SPARK-20908 > URL: https://issues.apache.org/jira/browse/SPARK-20908 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.1, 2.2.0 >Reporter: Xiao Li >Assignee: Xiao Li > > In Cache manager, the plan matching should ignore Hint. > {noformat} > val df1 = spark.range(10).join(broadcast(spark.range(10))) > df1.cache() > spark.range(10).join(spark.range(10)).explain() > {noformat} > The output plan of the above query shows that the second query is not using > the cached data of the first query. > {noformat} > BroadcastNestedLoopJoin BuildRight, Inner > :- *Range (0, 10, step=1, splits=2) > +- BroadcastExchange IdentityBroadcastMode >+- *Range (0, 10, step=1, splits=2) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20908) Cache Manager: Hint should be ignored in plan matching
Xiao Li created SPARK-20908: --- Summary: Cache Manager: Hint should be ignored in plan matching Key: SPARK-20908 URL: https://issues.apache.org/jira/browse/SPARK-20908 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.1, 2.2.0 Reporter: Xiao Li Assignee: Xiao Li In Cache manager, the plan matching should ignore Hint. {noformat} val df1 = spark.range(10).join(broadcast(spark.range(10))) df1.cache() spark.range(10).join(spark.range(10)).explain() {noformat} The above query shows the plan that does not use the cached data {noformat} BroadcastNestedLoopJoin BuildRight, Inner :- *Range (0, 10, step=1, splits=2) +- BroadcastExchange IdentityBroadcastMode +- *Range (0, 10, step=1, splits=2) {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
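One hedged way to observe the behaviour being fixed (not part of the ticket itself) is to look for an InMemoryTableScan node in the second query's physical plan: while the hint difference defeats plan matching the node is absent, and after the fix it should appear.
{code}
// Hedged sketch, assuming a SparkSession named `spark`.
import org.apache.spark.sql.functions.broadcast

val df1 = spark.range(10).join(broadcast(spark.range(10)))
df1.cache()
val plan = spark.range(10).join(spark.range(10)).queryExecution.executedPlan
// true once the cached plan is picked up, false while the hint blocks the match
println(plan.collect { case n if n.nodeName.contains("InMemoryTableScan") => n }.nonEmpty)
{code}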
[jira] [Assigned] (SPARK-20876) If the input parameter is float type for ceil or floor, the result is not what we expected
[ https://issues.apache.org/jira/browse/SPARK-20876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-20876: --- Assignee: liuxian > If the input parameter is float type for ceil or floor, the result is not what we > expected > -- > > Key: SPARK-20876 > URL: https://issues.apache.org/jira/browse/SPARK-20876 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0, 2.2.0 >Reporter: liuxian >Assignee: liuxian > Fix For: 2.3.0 > > > spark-sql>SELECT ceil(cast(12345.1233 as float)); > spark-sql>12345 > For this case, the expected result is 12346 > spark-sql>SELECT floor(cast(-12345.1233 as float)); > spark-sql>-12345 > For this case, the expected result is -12346 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-20876) If the input parameter is float type for ceil or floor, the result is not what we expected
[ https://issues.apache.org/jira/browse/SPARK-20876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-20876. - Resolution: Fixed Fix Version/s: 2.3.0 > If the input parameter is float type for ceil or floor, the result is not what we > expected > -- > > Key: SPARK-20876 > URL: https://issues.apache.org/jira/browse/SPARK-20876 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.0, 2.2.0 >Reporter: liuxian >Assignee: liuxian > Fix For: 2.3.0 > > > spark-sql>SELECT ceil(cast(12345.1233 as float)); > spark-sql>12345 > For this case, the expected result is 12346 > spark-sql>SELECT floor(cast(-12345.1233 as float)); > spark-sql>-12345 > For this case, the expected result is -12346 -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
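For comparison, routing the same literals through double rather than float already produces the values the reporter expects, which is a hedged way to confirm the problem is specific to the float path.
{code}
// Hedged check, assuming a SparkSession named `spark`; expected output is 12346 and -12346.
spark.sql("SELECT ceil(CAST(12345.1233 AS double)), floor(CAST(-12345.1233 AS double))").show()
{code}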
[jira] [Resolved] (SPARK-20897) cached self-join should not fail
[ https://issues.apache.org/jira/browse/SPARK-20897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-20897. - Resolution: Fixed Fix Version/s: 2.2.0 > cached self-join should not fail > > > Key: SPARK-20897 > URL: https://issues.apache.org/jira/browse/SPARK-20897 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > Fix For: 2.2.0 > > > code to reproduce this bug: > {code} > // force to plan sort merge join > spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "0") > val df = Seq(1 -> "a").toDF("i", "j") > val df1 = df.as("t1") > val df2 = df.as("t2") > assert(df1.join(df2, $"t1.i" === $"t2.i").cache().count() == 1) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20907) Use testQuietly for test suites that generate long log output
Kazuaki Ishizaki created SPARK-20907: Summary: Use testQuietly for test suites that generate long log output Key: SPARK-20907 URL: https://issues.apache.org/jira/browse/SPARK-20907 Project: Spark Issue Type: Improvement Components: Tests Affects Versions: 2.2.0, 2.3.0 Reporter: Kazuaki Ishizaki Use `testQuietly` instead of `test` for test cases that generate long output -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
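A minimal sketch of the proposed substitution is shown below; the suite and test names are invented and it is not one of the suites the pull request actually touches. testQuietly is provided by SQLTestUtils (mixed in here through SharedSQLContext) and runs the same body as test(), but with log output suppressed so deliberately noisy cases stop flooding target/unit-tests.log.
{code}
import org.apache.spark.sql.{QueryTest, Row}
import org.apache.spark.sql.test.SharedSQLContext

// Hedged sketch only; class and test names are made up.
class LongLogOutputSuite extends QueryTest with SharedSQLContext {
  import testImplicits._

  testQuietly("query that used to flood the unit-test log") {
    val df = Seq(1, 2, 3).toDF("i")
    checkAnswer(df.filter($"i" > 1), Seq(Row(2), Row(3)))
  }
}
{code}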
[jira] [Comment Edited] (SPARK-19809) NullPointerException on empty ORC file
[ https://issues.apache.org/jira/browse/SPARK-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027476#comment-16027476 ] Dongjoon Hyun edited comment on SPARK-19809 at 5/27/17 4:18 PM: [~hyukjin.kwon]. I don't think so. Parquet file does not need `spark.sql.files.ignoreCorruptFiles` option. {code} scala> sql("create table empty_parquet(a int) stored as parquet location '/tmp/empty_parquet'").show ++ || ++ ++ $ touch /tmp/empty_parquet/zero.parquet scala> sql("select * from empty_parquet").show +---+ | a| +---+ +---+ {code} You can test this in Spark with SPARK-20728. {code} scala> sql("create table empty_orc2(a int) using orc location '/tmp/empty_orc'").show ++ || ++ ++ scala> sql("select * from empty_orc2").show +---+ | a| +---+ +---+ {code} I think this is a part of SPARK-20901. And ORC community will handle this. What we need is just to use latest ORC. One thing I'm wondering is this is tracked in https://issues.apache.org/jira/browse/ORC-162 (Open). was (Author: dongjoon): [~hyukjin.kwon]. I don't think so. Parquet file does not need `spark.sql.files.ignoreCorruptFiles` option. {code} scala> sql("create table empty_parquet(a int) stored as parquet location '/tmp/empty_parquet'").show ++ || ++ ++ $ touch /tmp/empty_parquet/zero.parquet scala> sql("select * from empty_parquet").show +---+ | a| +---+ +---+ {code} Also latest ORC file does not, too. It's fixed in https://issues.apache.org/jira/browse/ORC-162 . You can test this in Spark with SPARK-20728. {code} scala> sql("create table empty_orc2(a int) using orc location '/tmp/empty_orc'").show ++ || ++ ++ scala> sql("select * from empty_orc2").show +---+ | a| +---+ +---+ {code} I think this is a part of SPARK-20901. And ORC community already resolved this. What we need is just to use latest ORC. 
> NullPointerException on empty ORC file > -- > > Key: SPARK-19809 > URL: https://issues.apache.org/jira/browse/SPARK-19809 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 1.6.3, 2.0.2, 2.1.1 >Reporter: Michał Dawid > > When reading from hive ORC table if there are some 0 byte files we get > NullPointerException: > {code}java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:560) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1010) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at scala.collection.immutable.List.foreach(List.scala:318) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at
[jira] [Commented] (SPARK-19809) NullPointerException on empty ORC file
[ https://issues.apache.org/jira/browse/SPARK-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027476#comment-16027476 ] Dongjoon Hyun commented on SPARK-19809: --- [~hyukjin.kwon]. I don't think so. Parquet file does not need `spark.sql.files.ignoreCorruptFiles` option. {code} scala> sql("create table empty_parquet(a int) stored as parquet location '/tmp/empty_parquet'").show ++ || ++ ++ $ touch /tmp/empty_parquet/zero.parquet scala> sql("select * from empty_parquet").show +---+ | a| +---+ +---+ {code} Also latest ORC file does not, too. It's fixed in https://issues.apache.org/jira/browse/ORC-162 . You can test this in Spark with SPARK-20728. {code} scala> sql("create table empty_orc2(a int) using orc location '/tmp/empty_orc'").show ++ || ++ ++ scala> sql("select * from empty_orc2").show +---+ | a| +---+ +---+ {code} I think this is a part of SPARK-20901. And ORC community already resolved this. What we need is just to use latest ORC. > NullPointerException on empty ORC file > -- > > Key: SPARK-19809 > URL: https://issues.apache.org/jira/browse/SPARK-19809 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 1.6.3, 2.0.2, 2.1.1 >Reporter: Michał Dawid > > When reading from hive ORC table if there are some 0 byte files we get > NullPointerException: > {code}java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:560) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1010) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at scala.collection.immutable.List.foreach(List.scala:318) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > 
org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:190) > at > org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165) > at > org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174) > at > org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499) > at > org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56) > at > org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086) > at >
[jira] [Assigned] (SPARK-20875) Spark should print the log when the directory has been deleted
[ https://issues.apache.org/jira/browse/SPARK-20875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-20875: - Assignee: liuzhaokun Priority: Trivial (was: Major) > Spark should print the log when the directory has been deleted > -- > > Key: SPARK-20875 > URL: https://issues.apache.org/jira/browse/SPARK-20875 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: liuzhaokun >Assignee: liuzhaokun >Priority: Trivial > Fix For: 2.3.0 > > > When the "deleteRecursively" method is invoked, Spark doesn't print any log message confirming that > the path was deleted. For example, Spark only prints "Removing directory" when > the worker begins cleaning spark.work.dir, but doesn't print anything like "the > path has been deleted". So I can't tell from the worker's log file whether the path was > actually deleted if anything goes wrong on the Linux side. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-20875) Spark should print the log when the directory has been deleted
[ https://issues.apache.org/jira/browse/SPARK-20875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20875. --- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18102 [https://github.com/apache/spark/pull/18102] > Spark should print the log when the directory has been deleted > -- > > Key: SPARK-20875 > URL: https://issues.apache.org/jira/browse/SPARK-20875 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: liuzhaokun > Fix For: 2.3.0 > > > When the "deleteRecursively" method is invoked, Spark doesn't print any log message confirming that > the path was deleted. For example, Spark only prints "Removing directory" when > the worker begins cleaning spark.work.dir, but doesn't print anything like "the > path has been deleted". So I can't tell from the worker's log file whether the path was > actually deleted if anything goes wrong on the Linux side. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
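A hedged sketch of the kind of confirmation message the reporter asks for is below; it is not the patch from pull request 18102, and the wrapper object is invented. It is written as if it lived inside Spark's util package so it can reach the private[spark] Utils and Logging.
{code}
package org.apache.spark.util

import java.io.File

import org.apache.spark.internal.Logging

// Hedged sketch only; object and method names are made up.
private[spark] object DeletionLogging extends Logging {
  def deleteRecursivelyAndLog(dir: File): Unit = {
    logInfo(s"Removing directory ${dir.getAbsolutePath}")
    Utils.deleteRecursively(dir)
    if (!dir.exists()) {
      logInfo(s"Directory ${dir.getAbsolutePath} has been deleted")
    }
  }
}
{code}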
[jira] [Updated] (SPARK-20393) Strengthen Spark to prevent XSS vulnerabilities
[ https://issues.apache.org/jira/browse/SPARK-20393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-20393: -- Priority: Major (was: Minor) Fix Version/s: 2.1.2 > Strengthen Spark to prevent XSS vulnerabilities > --- > > Key: SPARK-20393 > URL: https://issues.apache.org/jira/browse/SPARK-20393 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.5.2, 2.0.2, 2.1.0 >Reporter: Nicholas Marion >Assignee: Nicholas Marion > Labels: security > Fix For: 2.1.2, 2.2.0 > > > Using IBM Security AppScan Standard, we discovered several easy to recreate > MHTML cross site scripting vulnerabilities in the Apache Spark Web GUI > application and these vulnerabilities were found to exist in Spark version > 1.5.2 and 2.0.2, the two levels we initially tested. Cross-site scripting > attack is not really an attack on the Spark server as much as an attack on > the end user, taking advantage of their trust in the Spark server to get them > to click on a URL like the ones in the examples below. So whether the user > could or could not change lots of stuff on the Spark server is not the key > point. It is an attack on the user themselves. If they click the link the > script could run in their browser and comprise their device. Once the > browser is compromised it could submit Spark requests but it also might not. > https://blogs.technet.microsoft.com/srd/2011/01/28/more-information-about-the-mhtml-script-injection-vulnerability/ > {quote} > Request: GET > /app/?appId=Content-Type:%20multipart/related;%20boundary=_AppScan%0d%0a-- > _AppScan%0d%0aContent-Location:foo%0d%0aContent-Transfer- > Encoding:base64%0d%0a%0d%0aPGh0bWw%2bPHNjcmlwdD5hbGVydCgiWFNTIik8L3NjcmlwdD48L2h0bWw%2b%0d%0a > HTTP/1.1 > Excerpt from response: No running application with ID > Content-Type: multipart/related; > boundary=_AppScan > --_AppScan > Content-Location:foo > Content-Transfer-Encoding:base64 > PGh0bWw+PHNjcmlwdD5hbGVydCgiWFNTIik8L3NjcmlwdD48L2h0bWw+ > > Result: In the above payload the BASE64 data decodes as: > alert("XSS") > Request: GET > /history/app-20161012202114-0038/stages/stage?id=1=0=Content- > Type:%20multipart/related;%20boundary=_AppScan%0d%0a--_AppScan%0d%0aContent- > Location:foo%0d%0aContent-Transfer- > Encoding:base64%0d%0a%0d%0aPGh0bWw%2bPHNjcmlwdD5hbGVydCgiWFNTIik8L3NjcmlwdD48L2h0bWw%2b%0d%0a > k.pageSize=100 HTTP/1.1 > Excerpt from response: Content-Type: multipart/related; > boundary=_AppScan > --_AppScan > Content-Location:foo > Content-Transfer-Encoding:base64 > PGh0bWw+PHNjcmlwdD5hbGVydCgiWFNTIik8L3NjcmlwdD48L2h0bWw+ > Result: In the above payload the BASE64 data decodes as: > alert("XSS") > Request: GET /log?appId=app-20170113131903-=0=Content- > Type:%20multipart/related;%20boundary=_AppScan%0d%0a--_AppScan%0d%0aContent- > Location:foo%0d%0aContent-Transfer- > Encoding:base64%0d%0a%0d%0aPGh0bWw%2bPHNjcmlwdD5hbGVydCgiWFNTIik8L3NjcmlwdD48L2h0bWw%2b%0d%0a > eLength=0 HTTP/1.1 > Excerpt from response: Bytes 0-0 of 0 of > /u/nmarion/Spark_2.0.2.0/Spark-DK/work/app-20170113131903-/0/Content- > Type: multipart/related; boundary=_AppScan > --_AppScan > Content-Location:foo > Content-Transfer-Encoding:base64 > PGh0bWw+PHNjcmlwdD5hbGVydCgiWFNTIik8L3NjcmlwdD48L2h0bWw+ > Result: In the above payload the BASE64 data decodes as: > alert("XSS") > {quote} > security@apache was notified and recommended a PR. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20896) spark executor get java.lang.ClassCastException when trigger two job at same time
[ https://issues.apache.org/jira/browse/SPARK-20896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027389#comment-16027389 ] Sean Owen commented on SPARK-20896: --- I don't think it has anything to do with running two jobs at the same time. You show some errors in your code above, is that related? If you're saying it's not a problem in spark-shell or spark-submit, then it's something to do with how your code interacts with Zeppelin, maybe. > spark executor get java.lang.ClassCastException when trigger two job at same > time > - > > Key: SPARK-20896 > URL: https://issues.apache.org/jira/browse/SPARK-20896 > Project: Spark > Issue Type: Bug > Components: MLlib >Affects Versions: 1.6.1 >Reporter: poseidon > > 1、zeppelin 0.6.2 in *SCOPE* mode > 2、spark 1.6.2 > 3、HDP 2.4 for HDFS YARN > trigger scala code like : > {quote} > var tmpDataFrame = sql(" select b1,b2,b3 from xxx.x") > val vectorDf = assembler.transform(tmpDataFrame) > val vectRdd = vectorDf.select("features").map{x:Row => x.getAs[Vector](0)} > val correlMatrix: Matrix = Statistics.corr(vectRdd, "spearman") > val columns = correlMatrix.toArray.grouped(correlMatrix.numRows) > val rows = columns.toSeq.transpose > val vectors = rows.map(row => new DenseVector(row.toArray)) > val vRdd = sc.parallelize(vectors) > import sqlContext.implicits._ > val dfV = vRdd.map(_.toArray).map{ case Array(b1,b2,b3) => (b1,b2,b3) }.toDF() > val rows = dfV.rdd.zipWithIndex.map(_.swap) > > .join(sc.parallelize(Array("b1","b2","b3")).zipWithIndex.map(_.swap)) > .values.map{case (row: Row, x: String) => Row.fromSeq(row.toSeq > :+ x)} > {quote} > --- > and code : > {quote} > var df = sql("select b1,b2 from .x") > var i = 0 > var threshold = Array(2.0,3.0) > var inputCols = Array("b1","b2") > var tmpDataFrame = df > for (col <- inputCols){ > val binarizer: Binarizer = new Binarizer().setInputCol(col) > .setOutputCol(inputCols(i)+"_binary") > .setThreshold(threshold(i)) > tmpDataFrame = binarizer.transform(tmpDataFrame).drop(inputCols(i)) > i = i+1 > } > var saveDFBin = tmpDataFrame > val dfAppendBin = sql("select b3 from poseidon.corelatdemo") > val rows = saveDFBin.rdd.zipWithIndex.map(_.swap) > .join(dfAppendBin.rdd.zipWithIndex.map(_.swap)) > .values.map{case (row1: Row, row2: Row) => Row.fromSeq(row1.toSeq > ++ row2.toSeq)} > import org.apache.spark.sql.types.StructType > val rowSchema = StructType(saveDFBin.schema.fields ++ > dfAppendBin.schema.fields) > saveDFBin = sqlContext.createDataFrame(rows, rowSchema) > //save result to table > import org.apache.spark.sql.SaveMode > saveDFBin.write.mode(SaveMode.Overwrite).saveAsTable(".") > sql("alter table . set lifecycle 1") > {quote} > on zeppelin with two different notebook at same time. 
> Found this exception log in executor : > {quote} > l1.dtdream.com): java.lang.ClassCastException: > org.apache.spark.mllib.linalg.DenseVector cannot be cast to scala.Tuple2 > at > $line127359816836.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:34) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1597) > at > org.apache.spark.rdd.ZippedWithIndexRDD$$anonfun$2.apply(ZippedWithIndexRDD.scala:52) > at > org.apache.spark.rdd.ZippedWithIndexRDD$$anonfun$2.apply(ZippedWithIndexRDD.scala:52) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1875) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1875) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) > at org.apache.spark.scheduler.Task.run(Task.scala:89) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {quote} > OR > {quote} > java.lang.ClassCastException: scala.Tuple2 cannot be cast to > org.apache.spark.mllib.linalg.DenseVector > at > $line34684895436.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(:57) > at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) > at >
[jira] [Commented] (SPARK-20320) AnalysisException: Columns of grouping_id (count(value#17L)) does not match grouping columns (count(value#17L))
[ https://issues.apache.org/jira/browse/SPARK-20320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027384#comment-16027384 ] lyc commented on SPARK-20320: - It seems `count("value")` should not be passed to `cube`; only column names belong there. Likewise with `groupBy`, it is invalid to write `group by count("value")`. > AnalysisException: Columns of grouping_id (count(value#17L)) does not match > grouping columns (count(value#17L)) > --- > > Key: SPARK-20320 > URL: https://issues.apache.org/jira/browse/SPARK-20320 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Jacek Laskowski >Priority: Minor > > I'm not questioning the {{AnalysisException}} (which I don't know whether > should be reported or not), but the exception message that tells...nothing > helpful. > {code} > val records = spark.range(5).flatMap(n => Seq.fill(n.toInt)(n)) > scala> > records.cube(count("value")).agg(grouping_id(count("value"))).queryExecution.logical > org.apache.spark.sql.AnalysisException: Columns of grouping_id > (count(value#17L)) does not match grouping columns (count(value#17L)); > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveGroupingAnalytics$$replaceGroupingFunc$1.applyOrElse(Analyzer.scala:313) > at > org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveGroupingAnalytics$$anonfun$org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveGroupingAnalytics$$replaceGroupingFunc$1.applyOrElse(Analyzer.scala:308) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
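A hedged illustration of the valid form lyc describes is below: column references go inside cube, while aggregates and grouping_id go inside agg. It reuses the records Dataset from the report and assumes a SparkSession named spark.
{code}
// Hedged sketch of the supported usage.
import spark.implicits._
import org.apache.spark.sql.functions._

val records = spark.range(5).flatMap(n => Seq.fill(n.toInt)(n))
records.cube($"value").agg(count("value"), grouping_id()).show()
{code}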
[jira] [Commented] (SPARK-20891) Reduce duplicate code in typedaggregators.scala
[ https://issues.apache.org/jira/browse/SPARK-20891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027381#comment-16027381 ] Ruben Janssen commented on SPARK-20891: --- OK, I will take that approach next time, thanks for the suggestion :) I have submitted the change and will continue with 20890 once it's merged in. > Reduce duplicate code in typedaggregators.scala > --- > > Key: SPARK-20891 > URL: https://issues.apache.org/jira/browse/SPARK-20891 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.3.0 >Reporter: Ruben Janssen > > With SPARK-20411, a significant number of functions will be added to > typedaggregators.scala, resulting in a large amount of duplicate code -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-20905) When running spark with yarn-client, large executor-cores will lead to bad performance.
[ https://issues.apache.org/jira/browse/SPARK-20905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-20905. --- Resolution: Invalid Questions should go to the mailing list, not JIRA > When running spark with yarn-client, large executor-cores will lead to bad > performance. > > > Key: SPARK-20905 > URL: https://issues.apache.org/jira/browse/SPARK-20905 > Project: Spark > Issue Type: Question > Components: Examples >Affects Versions: 2.0.0 >Reporter: Cherry Zhang > > Hi, all: > When I run a training job in Spark with yarn-client and set > executor-cores=20 (less than vcores=24) and executor-num=4 (my cluster has 4 > slaves), there is always one node whose computing time is larger than the > others. > I checked some blogs, and they say executor-cores should be set to less than 5 > when there are many concurrent threads. I tried executor-cores=4 > and executor-num=20, and then it worked. > But I don't know why; can you give some explanation? Thank you very much. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20894) Error while checkpointing to HDFS (similar to JIRA SPARK-19268)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kant kodali updated SPARK-20894: Attachment: driver_info_log > Error while checkpointing to HDFS (similar to JIRA SPARK-19268) > --- > > Key: SPARK-20894 > URL: https://issues.apache.org/jira/browse/SPARK-20894 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.1.1 > Environment: Ubuntu, Spark 2.1.1, hadoop 2.7 >Reporter: kant kodali > Attachments: driver_info_log, executor1_log, executor2_log > > > Dataset df2 = df1.groupBy(functions.window(df1.col("Timestamp5"), "24 > hours", "24 hours"), df1.col("AppName")).count(); > StreamingQuery query = df2.writeStream().foreach(new > KafkaSink()).option("checkpointLocation","/usr/local/hadoop/checkpoint").outputMode("update").start(); > query.awaitTermination(); > This for some reason fails with the Error > ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1) > java.lang.IllegalStateException: Error reading delta file > /usr/local/hadoop/checkpoint/state/0/0/1.delta of HDFSStateStoreProvider[id = > (op=0, part=0), dir = /usr/local/hadoop/checkpoint/state/0/0]: > /usr/local/hadoop/checkpoint/state/0/0/1.delta does not exist > I did clear all the checkpoint data in /usr/local/hadoop/checkpoint/ and all > consumer offsets in Kafka from all brokers prior to running and yet this > error still persists. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20894) Error while checkpointing to HDFS (similar to JIRA SPARK-19268)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kant kodali updated SPARK-20894: Attachment: (was: driver_log) > Error while checkpointing to HDFS (similar to JIRA SPARK-19268) > --- > > Key: SPARK-20894 > URL: https://issues.apache.org/jira/browse/SPARK-20894 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.1.1 > Environment: Ubuntu, Spark 2.1.1, hadoop 2.7 >Reporter: kant kodali > Attachments: driver_info_log, executor1_log, executor2_log > > > Dataset df2 = df1.groupBy(functions.window(df1.col("Timestamp5"), "24 > hours", "24 hours"), df1.col("AppName")).count(); > StreamingQuery query = df2.writeStream().foreach(new > KafkaSink()).option("checkpointLocation","/usr/local/hadoop/checkpoint").outputMode("update").start(); > query.awaitTermination(); > This for some reason fails with the Error > ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1) > java.lang.IllegalStateException: Error reading delta file > /usr/local/hadoop/checkpoint/state/0/0/1.delta of HDFSStateStoreProvider[id = > (op=0, part=0), dir = /usr/local/hadoop/checkpoint/state/0/0]: > /usr/local/hadoop/checkpoint/state/0/0/1.delta does not exist > I did clear all the checkpoint data in /usr/local/hadoop/checkpoint/ and all > consumer offsets in Kafka from all brokers prior to running and yet this > error still persists. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20897) cached self-join should not fail
[ https://issues.apache.org/jira/browse/SPARK-20897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-20897: Target Version/s: 2.2.0 > cached self-join should not fail > > > Key: SPARK-20897 > URL: https://issues.apache.org/jira/browse/SPARK-20897 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.2.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan > > code to reproduce this bug: > {code} > // force to plan sort merge join > spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "0") > val df = Seq(1 -> "a").toDF("i", "j") > val df1 = df.as("t1") > val df2 = df.as("t2") > assert(df1.join(df2, $"t1.i" === $"t2.i").cache().count() == 1) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19809) NullPointerException on empty ORC file
[ https://issues.apache.org/jira/browse/SPARK-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027361#comment-16027361 ] Hyukjin Kwon commented on SPARK-19809: -- I think this is then rather about handling malformed files (e.g., {{spark.sql.files.ignoreCorruptFiles}}). > NullPointerException on empty ORC file > -- > > Key: SPARK-19809 > URL: https://issues.apache.org/jira/browse/SPARK-19809 > Project: Spark > Issue Type: Bug > Components: Input/Output >Affects Versions: 1.6.3, 2.0.2, 2.1.1 >Reporter: Michał Dawid > > When reading from hive ORC table if there are some 0 byte files we get > NullPointerException: > {code}java.lang.NullPointerException > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$BISplitStrategy.getSplits(OrcInputFormat.java:560) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1010) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048) > at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:199) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66) > at org.apache.spark.rdd.UnionRDD$$anonfun$1.apply(UnionRDD.scala:66) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at > scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) > at scala.collection.immutable.List.foreach(List.scala:318) > at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) > at scala.collection.AbstractTraversable.map(Traversable.scala:105) > at org.apache.spark.rdd.UnionRDD.getPartitions(UnionRDD.scala:66) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:242) > at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:240) > at scala.Option.getOrElse(Option.scala:120) > at org.apache.spark.rdd.RDD.partitions(RDD.scala:240) > at > org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:190) > at > org.apache.spark.sql.execution.Limit.executeCollect(basicOperators.scala:165) > at > org.apache.spark.sql.execution.SparkPlan.executeCollectPublic(SparkPlan.scala:174) > at > org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499) > at > 
org.apache.spark.sql.DataFrame$$anonfun$org$apache$spark$sql$DataFrame$$execute$1$1.apply(DataFrame.scala:1499) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56) > at > org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086) > at > org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$execute$1(DataFrame.scala:1498) > at > org.apache.spark.sql.DataFrame.org$apache$spark$sql$DataFrame$$collect(DataFrame.scala:1505) > at > org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1375) > at > org.apache.spark.sql.DataFrame$$anonfun$head$1.apply(DataFrame.scala:1374) > at org.apache.spark.sql.DataFrame.withCallback(DataFrame.scala:2099) > at org.apache.spark.sql.DataFrame.head(DataFrame.scala:1374) > at org.apache.spark.sql.DataFrame.take(DataFrame.scala:1456) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at >
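For readers following the thread, this is the option being referred to in the comment above; the sketch below only shows how it is switched on (the path is a placeholder), and whether zero-byte ORC files should require it at all is exactly the point under discussion.
{code}
// Hedged illustration, assuming a SparkSession named `spark`.
spark.conf.set("spark.sql.files.ignoreCorruptFiles", "true")
spark.read.orc("/tmp/empty_orc").show()
{code}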
[jira] [Assigned] (SPARK-20365) Not so accurate classpath format for AM and Containers
[ https://issues.apache.org/jira/browse/SPARK-20365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20365: Assignee: (was: Apache Spark) > Not so accurate classpath format for AM and Containers > -- > > Key: SPARK-20365 > URL: https://issues.apache.org/jira/browse/SPARK-20365 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 2.2.0 >Reporter: Saisai Shao >Priority: Minor > > In Spark on YARN, when configuring "spark.yarn.jars" with local jars (jars > started with "local" scheme), we will get inaccurate classpath for AM and > containers. This is because we don't remove "local" scheme when concatenating > classpath. It is OK to run because classpath is separated with ":" and java > treat "local" as a separate jar. But we could improve it to remove the scheme. > {code} > java.class.path = >
[jira] [Assigned] (SPARK-20365) Not so accurate classpath format for AM and Containers
[ https://issues.apache.org/jira/browse/SPARK-20365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20365: Assignee: Apache Spark > Not so accurate classpath format for AM and Containers > -- > > Key: SPARK-20365 > URL: https://issues.apache.org/jira/browse/SPARK-20365 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 2.2.0 >Reporter: Saisai Shao >Assignee: Apache Spark >Priority: Minor > > In Spark on YARN, when configuring "spark.yarn.jars" with local jars (jars > using the "local" scheme), we get an inaccurate classpath for the AM and > containers, because we do not remove the "local" scheme when concatenating the > classpath. Jobs still run, since the classpath is separated with ":" and Java > treats "local" as a separate (nonexistent) entry, but we could improve this by > removing the scheme. > {code} > java.class.path = >
[jira] [Commented] (SPARK-20365) Not so accurate classpath format for AM and Containers
[ https://issues.apache.org/jira/browse/SPARK-20365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027350#comment-16027350 ] Apache Spark commented on SPARK-20365: -- User 'liyichao' has created a pull request for this issue: https://github.com/apache/spark/pull/18129 > Not so accurate classpath format for AM and Containers > -- > > Key: SPARK-20365 > URL: https://issues.apache.org/jira/browse/SPARK-20365 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 2.2.0 >Reporter: Saisai Shao >Priority: Minor > > In Spark on YARN, when configuring "spark.yarn.jars" with local jars (jars > using the "local" scheme), we get an inaccurate classpath for the AM and > containers, because we do not remove the "local" scheme when concatenating the > classpath. Jobs still run, since the classpath is separated with ":" and Java > treats "local" as a separate (nonexistent) entry, but we could improve this by > removing the scheme. > {code} > java.class.path = >
[jira] [Assigned] (SPARK-20906) Constrained Logistic Regression for SparkR
[ https://issues.apache.org/jira/browse/SPARK-20906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20906: Assignee: Apache Spark > Constrained Logistic Regression for SparkR > -- > > Key: SPARK-20906 > URL: https://issues.apache.org/jira/browse/SPARK-20906 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0, 2.2.1 >Reporter: Miao Wang >Assignee: Apache Spark > > PR https://github.com/apache/spark/pull/17715 added constrained logistic > regression to Spark ML. We should add it to SparkR as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
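For reference, a hedged sketch of the Scala ML API that the SparkR wrapper would expose, assuming the bound-constraint setters added by the linked PR for Spark 2.2; the matrix/vector shapes and the training DataFrame are illustrative:

{code}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.linalg.{Matrices, Vectors}

// Binomial case with 3 features: constrain all coefficients to be non-negative
// and cap the intercept at 1.0. Shapes are (numClasses x numFeatures) / (numClasses).
val lr = new LogisticRegression()
  .setMaxIter(50)
  .setLowerBoundsOnCoefficients(Matrices.dense(1, 3, Array(0.0, 0.0, 0.0)))
  .setUpperBoundsOnIntercepts(Vectors.dense(1.0))

// val model = lr.fit(trainingDF)  // trainingDF: DataFrame with "label" and "features" columns
{code}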
[jira] [Assigned] (SPARK-20906) Constrained Logistic Regression for SparkR
[ https://issues.apache.org/jira/browse/SPARK-20906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20906: Assignee: (was: Apache Spark) > Constrained Logistic Regression for SparkR > -- > > Key: SPARK-20906 > URL: https://issues.apache.org/jira/browse/SPARK-20906 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0, 2.2.1 >Reporter: Miao Wang > > PR https://github.com/apache/spark/pull/17715 added constrained logistic > regression to Spark ML. We should add it to SparkR as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20906) Constrained Logistic Regression for SparkR
[ https://issues.apache.org/jira/browse/SPARK-20906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027320#comment-16027320 ] Apache Spark commented on SPARK-20906: -- User 'wangmiao1981' has created a pull request for this issue: https://github.com/apache/spark/pull/18128 > Constrained Logistic Regression for SparkR > -- > > Key: SPARK-20906 > URL: https://issues.apache.org/jira/browse/SPARK-20906 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0, 2.2.1 >Reporter: Miao Wang > > PR https://github.com/apache/spark/pull/17715 added constrained logistic > regression to Spark ML. We should add it to SparkR as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-20906) Constrained Logistic Regression for SparkR
Miao Wang created SPARK-20906: - Summary: Constrained Logistic Regression for SparkR Key: SPARK-20906 URL: https://issues.apache.org/jira/browse/SPARK-20906 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 2.2.0, 2.2.1 Reporter: Miao Wang PR https://github.com/apache/spark/pull/17715 added constrained logistic regression to Spark ML. We should add it to SparkR as well. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-20894) Error while checkpointing to HDFS (similar to JIRA SPARK-19268)
[ https://issues.apache.org/jira/browse/SPARK-20894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] kant kodali updated SPARK-20894: Attachment: executor2_log executor1_log driver_log Attached driver and executor logs > Error while checkpointing to HDFS (similar to JIRA SPARK-19268) > --- > > Key: SPARK-20894 > URL: https://issues.apache.org/jira/browse/SPARK-20894 > Project: Spark > Issue Type: Bug > Components: Structured Streaming >Affects Versions: 2.1.1 > Environment: Ubuntu, Spark 2.1.1, hadoop 2.7 >Reporter: kant kodali > Attachments: driver_log, executor1_log, executor2_log > > > Dataset<Row> df2 = df1.groupBy(functions.window(df1.col("Timestamp5"), "24 hours", "24 hours"), > df1.col("AppName")).count(); > StreamingQuery query = df2.writeStream().foreach(new KafkaSink()) > .option("checkpointLocation", "/usr/local/hadoop/checkpoint").outputMode("update").start(); > query.awaitTermination(); > For some reason this fails with the error: > ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1) > java.lang.IllegalStateException: Error reading delta file > /usr/local/hadoop/checkpoint/state/0/0/1.delta of HDFSStateStoreProvider[id = > (op=0, part=0), dir = /usr/local/hadoop/checkpoint/state/0/0]: > /usr/local/hadoop/checkpoint/state/0/0/1.delta does not exist > I cleared all the checkpoint data in /usr/local/hadoop/checkpoint/ and all > consumer offsets in Kafka on all brokers before running, yet this error still > persists. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
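Not a confirmed fix for this ticket, just a hedged sketch (in Scala, for brevity) of the same query shape with a fully-qualified HDFS URI for {{checkpointLocation}}, so the driver and every executor resolve the state directory against the same filesystem. The namenode address and path are placeholders; {{df2}} and {{KafkaSink}} refer to the reporter's code above.

{code}
// Sketch: same streaming query, but with an explicit hdfs:// checkpoint URI.
val query = df2.writeStream
  .foreach(new KafkaSink())
  .option("checkpointLocation", "hdfs://namenode:8020/usr/local/hadoop/checkpoint")
  .outputMode("update")
  .start()

query.awaitTermination()
{code}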
[jira] [Commented] (SPARK-19372) Code generation for Filter predicate including many OR conditions exceeds JVM method size limit
[ https://issues.apache.org/jira/browse/SPARK-19372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027270#comment-16027270 ] Dongjoon Hyun commented on SPARK-19372: --- Thank you so much all! > Code generation for Filter predicate including many OR conditions exceeds JVM > method size limit > > > Key: SPARK-19372 > URL: https://issues.apache.org/jira/browse/SPARK-19372 > Project: Spark > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Jay Pranavamurthi >Assignee: Kazuaki Ishizaki > Fix For: 2.2.0, 2.3.0 > > Attachments: wide400cols.csv > > > For the attached CSV file, the code below causes the exception > "org.codehaus.janino.JaninoRuntimeException: Code of method > "(Lorg/apache/spark/sql/catalyst/InternalRow;)Z" of class > "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificPredicate" > grows beyond 64 KB". > Code: > {code:borderStyle=solid} > val conf = new SparkConf().setMaster("local[1]") > val sqlContext = > SparkSession.builder().config(conf).getOrCreate().sqlContext > val dataframe = > sqlContext > .read > .format("com.databricks.spark.csv") > .load("wide400cols.csv") > val filter = (0 to 399) > .foldLeft(lit(false))((e, index) => > e.or(dataframe.col(dataframe.columns(index)) =!= s"column${index+1}")) > val filtered = dataframe.filter(filter) > filtered.show(100) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
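The merged fix itself is not shown here; below is only one possible workaround sketch for queries like the reproducer above: express the "any column differs from its expected value" check as a single UDF over an array of the columns, so no 400-way OR expression has to be code-generated. Column count and value pattern follow the reproducer; note that null handling differs slightly from the original {{=!=}} semantics.

{code}
import org.apache.spark.sql.functions.{array, col, udf}

// Check all 400 columns inside one UDF instead of building a giant OR predicate.
val cols = dataframe.columns.take(400)
val anyMismatch = udf { (values: Seq[String]) =>
  values.zipWithIndex.exists { case (v, i) => v != s"column${i + 1}" }
}

val filtered = dataframe.filter(anyMismatch(array(cols.map(col): _*)))
filtered.show(100)
{code}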