[jira] [Assigned] (SPARK-16813) Remove private[sql] and private[spark] from catalyst package
[ https://issues.apache.org/jira/browse/SPARK-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-16813:

    Assignee: Reynold Xin (was: Apache Spark)

> Remove private[sql] and private[spark] from catalyst package
>
> Key: SPARK-16813
> URL: https://issues.apache.org/jira/browse/SPARK-16813
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Reynold Xin
> Assignee: Reynold Xin
>
> The catalyst package is meant to be internal, and as a result it does not
> make sense to mark things as private[sql] or private[spark]. It simply makes
> debugging harder when Spark developers need to inspect the plans at runtime.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16813) Remove private[sql] and private[spark] from catalyst package
[ https://issues.apache.org/jira/browse/SPARK-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400468#comment-15400468 ]

Apache Spark commented on SPARK-16813:

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/14418
[jira] [Assigned] (SPARK-16813) Remove private[sql] and private[spark] from catalyst package
[ https://issues.apache.org/jira/browse/SPARK-16813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-16813:

    Assignee: Apache Spark (was: Reynold Xin)
[jira] [Created] (SPARK-16813) Remove private[sql] and private[spark] from catalyst package
Reynold Xin created SPARK-16813:

    Summary: Remove private[sql] and private[spark] from catalyst package
    Key: SPARK-16813
    URL: https://issues.apache.org/jira/browse/SPARK-16813
    Project: Spark
    Issue Type: Improvement
    Components: SQL
    Reporter: Reynold Xin
    Assignee: Reynold Xin

The catalyst package is meant to be internal, and as a result it does not
make sense to mark things as private[sql] or private[spark]. It simply makes
debugging harder when Spark developers need to inspect the plans at runtime.
[jira] [Assigned] (SPARK-16812) Open up SparkILoop.getAddedJars
[ https://issues.apache.org/jira/browse/SPARK-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-16812:

    Assignee: Reynold Xin (was: Apache Spark)

> Open up SparkILoop.getAddedJars
>
> Key: SPARK-16812
> URL: https://issues.apache.org/jira/browse/SPARK-16812
> Project: Spark
> Issue Type: Improvement
> Components: Spark Shell
> Reporter: Reynold Xin
> Assignee: Reynold Xin
>
> SparkILoop.getAddedJars is a useful method to use so we can programmatically
> get the list of jars added.
[jira] [Assigned] (SPARK-16812) Open up SparkILoop.getAddedJars
[ https://issues.apache.org/jira/browse/SPARK-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-16812:

    Assignee: Apache Spark (was: Reynold Xin)
[jira] [Commented] (SPARK-16812) Open up SparkILoop.getAddedJars
[ https://issues.apache.org/jira/browse/SPARK-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400442#comment-15400442 ]

Apache Spark commented on SPARK-16812:

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/14417
[jira] [Created] (SPARK-16812) Open up SparkILoop.getAddedJars
Reynold Xin created SPARK-16812:

    Summary: Open up SparkILoop.getAddedJars
    Key: SPARK-16812
    URL: https://issues.apache.org/jira/browse/SPARK-16812
    Project: Spark
    Issue Type: Improvement
    Components: Spark Shell
    Reporter: Reynold Xin
    Assignee: Reynold Xin

SparkILoop.getAddedJars is a useful method to use so we can programmatically
get the list of jars added.
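For context, "programmatically get the list of jars added" amounts to reading a comma-separated jar list such as the spark.jars property. A minimal sketch of that idea in Python; the helper name and behavior here are illustrative assumptions, not Spark's actual SparkILoop implementation (which is Scala):

```python
def get_added_jars(jars_property):
    """Split a comma-separated jar list (e.g. the spark.jars property).

    Hypothetical stand-in for the idea behind SparkILoop.getAddedJars;
    not the actual Spark code.
    """
    if not jars_property:
        return []
    # Drop empty entries left by stray commas and trim whitespace.
    return [jar.strip() for jar in jars_property.split(",") if jar.strip()]
```

For example, `get_added_jars("a.jar, b.jar,")` yields `["a.jar", "b.jar"]`, and a missing property yields an empty list.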
[jira] [Assigned] (SPARK-16776) Fix Kafka deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-16776:

    Assignee: Apache Spark

> Fix Kafka deprecation warnings
>
> Key: SPARK-16776
> URL: https://issues.apache.org/jira/browse/SPARK-16776
> Project: Spark
> Issue Type: Sub-task
> Components: Streaming
> Reporter: holdenk
> Assignee: Apache Spark
>
> The new KafkaTestUtils depends on many deprecated APIs - see if we can
> refactor this to be avoided.
[jira] [Assigned] (SPARK-16776) Fix Kafka deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-16776:

    Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-16776) Fix Kafka deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400439#comment-15400439 ]

Apache Spark commented on SPARK-16776:

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/14416
[jira] [Updated] (SPARK-16732) Remove unused codes in subexpressionEliminationForWholeStageCodegen
[ https://issues.apache.org/jira/browse/SPARK-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-16732:

    Assignee: yucai

> Remove unused codes in subexpressionEliminationForWholeStageCodegen
>
> Key: SPARK-16732
> URL: https://issues.apache.org/jira/browse/SPARK-16732
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: yucai
> Assignee: yucai
> Fix For: 2.0.1, 2.1.0
>
> Some codes in subexpressionEliminationForWholeStageCodegen are never used
> actually. Remove them using this jira.
[jira] [Resolved] (SPARK-16732) Remove unused codes in subexpressionEliminationForWholeStageCodegen
[ https://issues.apache.org/jira/browse/SPARK-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-16732.

    Resolution: Fixed
    Fix Version/s: 2.1.0, 2.0.1
[jira] [Commented] (SPARK-16776) Fix Kafka deprecation warnings
[ https://issues.apache.org/jira/browse/SPARK-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400433#comment-15400433 ]

Hyukjin Kwon commented on SPARK-16776:

Oh, I meant to just leave the warnings, but let me do this quickly. Thanks for asking this!
[jira] [Resolved] (SPARK-16748) Errors thrown by UDFs cause TreeNodeException when the query has an ORDER BY clause
[ https://issues.apache.org/jira/browse/SPARK-16748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai resolved SPARK-16748.

    Resolution: Fixed
    Fix Version/s: 2.1.0, 2.0.1

Issue resolved by pull request 14395
[https://github.com/apache/spark/pull/14395]

> Errors thrown by UDFs cause TreeNodeException when the query has an ORDER BY
> clause
>
> Key: SPARK-16748
> URL: https://issues.apache.org/jira/browse/SPARK-16748
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Yin Huai
> Assignee: Tathagata Das
> Fix For: 2.0.1, 2.1.0
>
> {code}
> import org.apache.spark.sql.functions._
> val myUDF = udf((c: String) => s"""${c.take(5)}""")
> spark.sql("SELECT cast(null as string) as a").select(myUDF($"a").as("b")).orderBy($"b").collect
> {code}
> {code}
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
> Exchange rangepartitioning(b#345 ASC, 200)
> +- *Project [UDF(null) AS b#345]
>    +- Scan OneRowRelation[]
>   at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:50)
>   at org.apache.spark.sql.execution.exchange.ShuffleExchange.doExecute(ShuffleExchange.scala:113)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>   at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:233)
>   at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:113)
>   at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:361)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
>   at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
>   at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
>   at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
>   at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:225)
>   at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:272)
>   at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2183)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2532)
>   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2182)
>   at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2187)
>   at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$collect$1.apply(Dataset.scala:2187)
>   at org.apache.spark.sql.Dataset.withCallback(Dataset.scala:2545)
>   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2187)
>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2163)
> {code}
[jira] [Created] (SPARK-16811) spark streaming-Exception in thread “submit-job-thread-pool-40”
BinXu created SPARK-16811:

    Summary: spark streaming-Exception in thread "submit-job-thread-pool-40"
    Key: SPARK-16811
    URL: https://issues.apache.org/jira/browse/SPARK-16811
    Project: Spark
    Issue Type: Question
    Components: Streaming
    Affects Versions: 1.6.1
    Environment: spark 1.6.1
    Reporter: BinXu
    Priority: Blocker

{color:red}I am using Spark 1.6. When I run the streaming application for a few minutes, it causes an exception like this:{color}

{code}
Exception in thread "submit-job-thread-pool-40"
Exception in thread "submit-job-thread-pool-33"
Exception in thread "submit-job-thread-pool-23"
Exception in thread "submit-job-thread-pool-14"
Exception in thread "submit-job-thread-pool-29"
Exception in thread "submit-job-thread-pool-39"
Exception in thread "submit-job-thread-pool-2"
java.lang.Error: java.lang.InterruptedException
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1148)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:502)
	at org.apache.spark.scheduler.JobWaiter.awaitResult(JobWaiter.scala:73)
	at org.apache.spark.SimpleFutureAction.org$apache$spark$SimpleFutureAction$$awaitResult(FutureAction.scala:165)
	at org.apache.spark.SimpleFutureAction$$anon$1.run(FutureAction.scala:147)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	... 2 more
{code}

{color:red}This problem is so weird; the consume speed is enough. My batch time is 20s, and for the first few minutes (app is ok) the process time is 8s. But after that, the
[jira] [Resolved] (SPARK-16797) Repartiton call w/ 0 partitions drops data
[ https://issues.apache.org/jira/browse/SPARK-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Jeffrey resolved SPARK-16797.

    Resolution: Fixed

Resolved in Spark 2.0.0

> Repartiton call w/ 0 partitions drops data
>
> Key: SPARK-16797
> URL: https://issues.apache.org/jira/browse/SPARK-16797
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.6.2
> Reporter: Bryan Jeffrey
> Priority: Minor
> Labels: easyfix
> Fix For: 2.0.0
>
> When you call RDD.repartition(0) or DStream.repartition(0), the input data
> is silently dropped. This should not silently fail; instead an exception
> should be thrown to alert the user to the issue.
[jira] [Updated] (SPARK-16797) Repartiton call w/ 0 partitions drops data
[ https://issues.apache.org/jira/browse/SPARK-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Jeffrey updated SPARK-16797:

    Affects Version/s: (was: 2.0.0)
    Fix Version/s: 2.0.0
[jira] [Commented] (SPARK-16797) Repartiton call w/ 0 partitions drops data
[ https://issues.apache.org/jira/browse/SPARK-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400362#comment-15400362 ]

Bryan Jeffrey commented on SPARK-16797:

I just ran the same thing on Spark 2.0.0 as released, and the issue does appear to be resolved. I'm sorry about the confusion - I had run this earlier on a 2.0 preview & thought I was seeing the same issue. We can close this as fixed in 2.0.0. Thank you!
[jira] [Comment Edited] (SPARK-16797) Repartiton call w/ 0 partitions drops data
[ https://issues.apache.org/jira/browse/SPARK-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400356#comment-15400356 ]

Bryan Jeffrey edited comment on SPARK-16797 at 7/30/16 1:37 AM:

Hello. Here is a simple example using Spark 1.6.1:

{code}
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SimpleExample {
  def main(args: Array[String]): Unit = {
    val appName = "Simple Example"
    val sparkConf = new SparkConf().setAppName(appName)
    val sc = new SparkContext(sparkConf)
    val ssc = new StreamingContext(sc, Seconds(5))

    val input = Array(1,2,3,4,5,6,7,8)
    val queue = scala.collection.mutable.Queue(ssc.sparkContext.parallelize(input))
    val data: InputDStream[Int] = ssc.queueStream(queue)
    data.foreachRDD(x => println("Initial Count: " + x.count()))

    val streamPartition = data.repartition(0)
    streamPartition.foreachRDD(x => println("Stream Repartition: " + x.count()))

    val rddPartition = data.transform(x => x.repartition(0))
    rddPartition.foreachRDD(x => println("Rdd Repartition: " + x.count()))

    val singlePartitionStreamPartition = data.repartition(1)
    singlePartitionStreamPartition.foreachRDD(x => println("Stream w/ Single Partition: " + x.count()))

    ssc.start()
    ssc.awaitTermination()
  }
}
{code}

Output:

{code}
Initial Count: 8
Stream Repartition: 0
Rdd Repartition: 0
Stream w/ Single Partition: 8
... More streaming output (but obviously zero counts as we're using an InputDStream w/ no other data) ...
{code}
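The fix the issue asks for is easy to sketch outside Spark: validate the partition count before doing any work, so a zero count fails loudly instead of returning empty partitions. A toy round-robin partitioner stands in for Spark's shuffle; `repartition_checked` is a hypothetical helper for illustration, not a Spark API:

```python
def repartition_checked(data, num_partitions):
    """Toy round-robin repartition that fails fast on a non-positive
    partition count instead of silently dropping data (the behavior
    requested in SPARK-16797)."""
    if num_partitions <= 0:
        raise ValueError(
            "Number of partitions (%d) must be positive" % num_partitions)
    # Assign elements to buckets round-robin, like a hash-free shuffle.
    buckets = [[] for _ in range(num_partitions)]
    for i, item in enumerate(data):
        buckets[i % num_partitions].append(item)
    return buckets
```

With this guard, `repartition_checked(range(8), 0)` raises a ValueError immediately, mirroring the "Stream Repartition: 0" rows above that would otherwise go unnoticed.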
[jira] [Comment Edited] (SPARK-16797) Repartiton call w/ 0 partitions drops data
[ https://issues.apache.org/jira/browse/SPARK-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400356#comment-15400356 ]

Bryan Jeffrey edited comment on SPARK-16797 at 7/30/16 1:36 AM.
[jira] [Commented] (SPARK-16797) Repartiton call w/ 0 partitions drops data
[ https://issues.apache.org/jira/browse/SPARK-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400356#comment-15400356 ]

Bryan Jeffrey commented on SPARK-16797:

Hello. Here is a simple example using Spark 1.6.1:
[jira] [Updated] (SPARK-16802) joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/SPARK-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-16802:
    Affects Version/s: (was: 2.0.1)

> joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException
> --------------------------------------------------------------------
>
> Key: SPARK-16802
> URL: https://issues.apache.org/jira/browse/SPARK-16802
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Sylvain Zimmer
> Priority: Critical
>
> Hello!
> This is a little similar to [SPARK-16740|https://issues.apache.org/jira/browse/SPARK-16740] (should I have reopened it?).
> I would recommend giving another full review to {{HashedRelation.scala}}, particularly the new {{LongToUnsafeRowMap}} code. I've had a few other errors that I haven't managed to reproduce so far, as well as what I suspect could be memory leaks (I have a query in a loop OOMing after a few iterations despite not caching its results).
> Here is the script to reproduce the ArrayIndexOutOfBoundsException on the current 2.0 branch:
> {code}
> import os
> import random
> from pyspark import SparkContext
> from pyspark.sql import types as SparkTypes
> from pyspark.sql import SQLContext
>
> sc = SparkContext()
> sqlc = SQLContext(sc)
> schema1 = SparkTypes.StructType([
>     SparkTypes.StructField("id1", SparkTypes.LongType(), nullable=True)
> ])
> schema2 = SparkTypes.StructType([
>     SparkTypes.StructField("id2", SparkTypes.LongType(), nullable=True)
> ])
>
> def randlong():
>     return random.randint(-9223372036854775808, 9223372036854775807)
>
> while True:
>     l1, l2 = randlong(), randlong()
>     # Sample values that crash:
>     # l1, l2 = 4661454128115150227, -5543241376386463808
>     print "Testing with %s, %s" % (l1, l2)
>     data1 = [(l1, ), (l2, )]
>     data2 = [(l1, )]
>     df1 = sqlc.createDataFrame(sc.parallelize(data1), schema1)
>     df2 = sqlc.createDataFrame(sc.parallelize(data2), schema2)
>     crash = True
>     if crash:
>         os.system("rm -rf /tmp/sparkbug")
>         df1.write.parquet("/tmp/sparkbug/vertex")
>         df2.write.parquet("/tmp/sparkbug/edge")
>     df1 = sqlc.read.load("/tmp/sparkbug/vertex")
>     df2 = sqlc.read.load("/tmp/sparkbug/edge")
>     sqlc.registerDataFrameAsTable(df1, "df1")
>     sqlc.registerDataFrameAsTable(df2, "df2")
>     result_df = sqlc.sql("""
>         SELECT df1.id1
>         FROM df1
>         LEFT OUTER JOIN df2 ON df1.id1 = df2.id2
>     """)
>     print result_df.collect()
> {code}
> {code}
> java.lang.ArrayIndexOutOfBoundsException: 1728150825
>     at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.getValue(HashedRelation.scala:463)
>     at org.apache.spark.sql.execution.joins.LongHashedRelation.getValue(HashedRelation.scala:762)
>     at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
>     at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>     at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
>     at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117)
>     at scala.collection.Iterator$class.foreach(Iterator.scala:893)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
>     at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>     at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>     at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.to(SerDeUtil.scala:112)
>     at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toBuffer(SerDeUtil.scala:112)
>     at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
>     at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toArray(SerDeUtil.scala:112)
>     at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:899)
>     at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:899)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1898)
>     at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1898)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
>     at
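The class of bug behind an ArrayIndexOutOfBoundsException in a long-keyed hash map can be sketched in plain Python. This is a toy model only, not Spark's actual {{LongToUnsafeRowMap}} code: deriving an array slot from a 64-bit key without forcing the result non-negative yields an out-of-range index for some longs, because Java's `%` keeps the sign of the dividend.

```python
def java_mod(h, capacity):
    # Python's % is always non-negative; emulate Java's %, whose result
    # keeps the sign of the dividend.
    r = abs(h) % capacity
    return r if h >= 0 else -r

def naive_slot(key, capacity):
    # Buggy pattern: using the signed remainder directly as an array index.
    # A negative key gives a negative (out-of-range) slot -> AIOOBE in Java.
    return java_mod(key, capacity)

def masked_slot(key, capacity):
    # Safe pattern: mask to a non-negative slot (capacity must be a
    # power of two for the bitwise-AND trick to equal a true modulo).
    return key & (capacity - 1)

capacity = 1 << 10
key = -5543241376386463808          # one of the crashing sample values above
assert not (0 <= naive_slot(key, capacity) < capacity)
assert 0 <= masked_slot(key, capacity) < capacity
```

Whether Spark's actual index computation fails this way is exactly what the reporter asks reviewers of {{HashedRelation.scala}} to check; the sketch only illustrates why huge negative long keys are the natural suspects.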
[jira] [Updated] (SPARK-16802) joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/SPARK-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-16802: Priority: Blocker (was: Major)
[jira] [Updated] (SPARK-16802) joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/SPARK-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-16802: Priority: Critical (was: Blocker)
[jira] [Updated] (SPARK-16802) joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/SPARK-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-16802: Target Version/s: 2.0.1, 2.1.0
[jira] [Updated] (SPARK-16804) Correlated subqueries containing LIMIT return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-16804:
    Target Version/s: 2.0.1, 2.1.0 (was: 2.0.1)

> Correlated subqueries containing LIMIT return incorrect results
> ---------------------------------------------------------------
>
> Key: SPARK-16804
> URL: https://issues.apache.org/jira/browse/SPARK-16804
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Nattavut Sutyanyong
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> Correlated subqueries with LIMIT can return incorrect results. The rule ResolveSubquery in the Analysis phase moves correlated predicates into join predicates and neglects the semantics of the LIMIT.
> Example:
> {noformat}
> Seq(1, 2).toDF("c1").createOrReplaceTempView("t1")
> Seq(1, 2).toDF("c2").createOrReplaceTempView("t2")
> sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT 1)").show
> +---+
> | c1|
> +---+
> |  1|
> +---+
> {noformat}
> The correct result contains both rows from t1.

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
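The intended semantics can be modeled in plain Python (a reference sketch, not Spark code): EXISTS is true for an outer row as soon as at least one subquery row matches it, so a LIMIT inside the EXISTS subquery cannot change the answer, and both rows of t1 must survive.

```python
# Reference model of the query above: for each outer row of t1, test
# whether any row of t2 satisfies the correlated predicate. The LIMIT is
# applied per outer row, after the correlated filter, so it is a no-op
# for EXISTS -- which is exactly why the shown single-row result is wrong.
t1 = [1, 2]
t2 = [1, 2]

def exists_with_limit(c1, rows, limit=None):
    matches = [c2 for c2 in rows if c1 == c2]   # correlated filter t1.c1 = t2.c2
    if limit is not None:
        matches = matches[:limit]               # inner LIMIT: cannot empty a
                                                # non-empty match set
    return len(matches) > 0

result = [c1 for c1 in t1 if exists_with_limit(c1, t2, limit=1)]
assert result == [1, 2]   # the correct result: both rows of t1
```

The bug report's output shows Spark 2.0.0 returning only the first row, because the analyzer hoists the correlated predicate into a join without accounting for the LIMIT's placement.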
[jira] [Commented] (SPARK-16810) Refactor registerSinks with multiple constructors
[ https://issues.apache.org/jira/browse/SPARK-16810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400263#comment-15400263 ]

Apache Spark commented on SPARK-16810:

User 'lovexi' has created a pull request for this issue: https://github.com/apache/spark/pull/14415

> Refactor registerSinks with multiple constructors
> -------------------------------------------------
>
> Key: SPARK-16810
> URL: https://issues.apache.org/jira/browse/SPARK-16810
> Project: Spark
> Issue Type: Improvement
> Reporter: YangyangLiu
>
> Some metrics sinks may need application-specific information from SparkConf, so SparkConf has to be passed to those sinks via the sink constructor. The registerSinks class-reflection logic was refactored to allow multiple types of sink constructor to be initialized.
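The constructor-selection idea can be sketched in plain Python. This is a hedged illustration with made-up names (`LegacySink`, `ConfAwareSink`, `register_sink`); Spark's real implementation does this with Scala/Java reflection over constructor parameter lists in its MetricsSystem.

```python
# Sketch of trying several constructor signatures in order of richness and
# falling back -- the shape of the refactor the issue describes. All class
# and function names here are illustrative, not Spark's actual API.
class SparkConf(dict):
    """Stand-in for Spark's configuration object."""

class LegacySink:
    # Old-style sink: constructor takes only the sink properties.
    def __init__(self, properties):
        self.properties = properties
        self.conf = None

class ConfAwareSink:
    # New-style sink: constructor also receives the SparkConf.
    def __init__(self, properties, conf):
        self.properties = properties
        self.conf = conf

def register_sink(sink_class, properties, conf):
    # Try the SparkConf-aware signature first, then the legacy one,
    # mirroring reflection over multiple declared constructors.
    for args in ((properties, conf), (properties,)):
        try:
            return sink_class(*args)
        except TypeError:
            continue
    raise TypeError("no supported constructor for %s" % sink_class.__name__)

legacy = register_sink(LegacySink, {"period": "10"}, SparkConf())
aware = register_sink(ConfAwareSink, {"period": "10"}, SparkConf(app="x"))
assert legacy.conf is None
assert aware.conf == {"app": "x"}
```

The design point is that existing sinks keep working unchanged while new sinks can opt into the richer signature; only the registration code needs to know about both.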
[jira] [Assigned] (SPARK-16810) Refactor registerSinks with multiple constructors
[ https://issues.apache.org/jira/browse/SPARK-16810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16810: Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-16810) Refactor registerSinks with multiple constructors
[ https://issues.apache.org/jira/browse/SPARK-16810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16810: Assignee: Apache Spark
[jira] [Commented] (SPARK-16802) joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/SPARK-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400260#comment-15400260 ]

Sylvain Zimmer commented on SPARK-16802:

Maybe useful for others: an ugly workaround to avoid this code path is to cast the join keys to strings and do something like this instead:

{code}
SELECT df1.id1
FROM df1
LEFT OUTER JOIN df2 ON cast(df1.id1 as string) = cast(df2.id2 as string)
{code}
[jira] [Created] (SPARK-16810) Refactor registerSinks with multiple constructors
YangyangLiu created SPARK-16810:
    Summary: Refactor registerSinks with multiple constructors
    Key: SPARK-16810
    URL: https://issues.apache.org/jira/browse/SPARK-16810
    Project: Spark
    Issue Type: Improvement
    Reporter: YangyangLiu
[jira] [Assigned] (SPARK-16809) Link Mesos Dispatcher and History Server
[ https://issues.apache.org/jira/browse/SPARK-16809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-16809:
    Assignee: Apache Spark

> Link Mesos Dispatcher and History Server
> ----------------------------------------
>
> Key: SPARK-16809
> URL: https://issues.apache.org/jira/browse/SPARK-16809
> Project: Spark
> Issue Type: New Feature
> Components: Mesos
> Reporter: Michael Gummelt
> Assignee: Apache Spark
>
> This is largely a duplicate of SPARK-13401, but the PR for that JIRA seems to implement only sandbox linking, not history-server linking, which is the sole scope of this JIRA.
[jira] [Commented] (SPARK-16809) Link Mesos Dispatcher and History Server
[ https://issues.apache.org/jira/browse/SPARK-16809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400234#comment-15400234 ] Apache Spark commented on SPARK-16809: -- User 'mgummelt' has created a pull request for this issue: https://github.com/apache/spark/pull/14414
[jira] [Assigned] (SPARK-16809) Link Mesos Dispatcher and History Server
[ https://issues.apache.org/jira/browse/SPARK-16809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16809: Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-16809) Link Mesos Dispatcher and History Server
Michael Gummelt created SPARK-16809:
    Summary: Link Mesos Dispatcher and History Server
    Key: SPARK-16809
    URL: https://issues.apache.org/jira/browse/SPARK-16809
    Project: Spark
    Issue Type: New Feature
    Components: Mesos
    Reporter: Michael Gummelt
[jira] [Created] (SPARK-16808) History Server main page does not honor APPLICATION_WEB_PROXY_BASE
Michael Gummelt created SPARK-16808:
    Summary: History Server main page does not honor APPLICATION_WEB_PROXY_BASE
    Key: SPARK-16808
    URL: https://issues.apache.org/jira/browse/SPARK-16808
    Project: Spark
    Issue Type: Bug
    Affects Versions: 2.0.0
    Reporter: Michael Gummelt

The root of the history server is rendered dynamically with JavaScript, and this doesn't honor APPLICATION_WEB_PROXY_BASE:
https://github.com/apache/spark/blob/master/core/src/main/resources/org/apache/spark/ui/static/historypage-template.html#L67

Other links in the history server do honor it:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/UIUtils.scala#L146

This means the links on the history server root page are broken when deployed behind a proxy.

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
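The fix the server-rendered pages already apply is simple prefixing. A plain-Python sketch of that behavior (the helper name `prepend_proxy_base` is hypothetical; Spark's real logic lives in UIUtils, which the static JavaScript template bypasses):

```python
# Illustrative sketch, not Spark's actual code: behind a reverse proxy,
# every UI link must be prefixed with the proxy base taken from the
# APPLICATION_WEB_PROXY_BASE environment variable, or it 404s.
import os

def prepend_proxy_base(path, env=None):
    env = os.environ if env is None else env
    base = env.get("APPLICATION_WEB_PROXY_BASE", "")
    # Join base and path with exactly one slash between them.
    return base.rstrip("/") + "/" + path.lstrip("/")

# With a proxy base set, links are rewritten under the proxy prefix...
assert prepend_proxy_base("history/app-1/jobs",
                          {"APPLICATION_WEB_PROXY_BASE": "/proxy"}) \
    == "/proxy/history/app-1/jobs"
# ...and without one, plain absolute paths come back unchanged.
assert prepend_proxy_base("history/app-1/jobs", {}) == "/history/app-1/jobs"
```

The bug is that the JavaScript-rendered root page builds its links without any such prefix, so only that page breaks behind a proxy.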
[jira] [Updated] (SPARK-16807) Optimize some ABS() statements
[ https://issues.apache.org/jira/browse/SPARK-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Zimmer updated SPARK-16807:
    Priority: Minor (was: Major)

> Optimize some ABS() statements
> ------------------------------
>
> Key: SPARK-16807
> URL: https://issues.apache.org/jira/browse/SPARK-16807
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Sylvain Zimmer
> Priority: Minor
>
> I'm not a Catalyst expert, but I think some use cases for the ABS() function could generate simpler code.
> This is the code generated when doing something like {{ABS(x - y) > 0}} or {{ABS(x - y) = 0}} in Spark SQL:
> {code}
> /* 267 */ float filter_value6 = -1.0f;
> /* 268 */ filter_value6 = agg_value27 - agg_value32;
> /* 269 */ float filter_value5 = -1.0f;
> /* 270 */ filter_value5 = (float)(java.lang.Math.abs(filter_value6));
> /* 271 */
> /* 272 */ boolean filter_value4 = false;
> /* 273 */ filter_value4 = org.apache.spark.util.Utils.nanSafeCompareFloats(filter_value5, 0.0f) > 0;
> /* 274 */ if (!filter_value4) continue;
> {code}
> Maybe it could all be simplified to something like this?
> {code}
> filter_value4 = (agg_value27 != agg_value32)
> {code}
> (Of course you could write {{x != y}} directly in the SQL query, but the {{0}} in my example could be a configurable threshold, not something you can hardcode)

-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
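The proposed rewrite can be sanity-checked in plain Python (illustrative only, not Spark code): for ordinary floats, {{ABS(x - y) > 0}} and {{x != y}} agree, but NaN-producing inputs are where a real optimizer would have to be careful, which may be one reason the generated code goes through the NaN-safe comparator.

```python
# Check that abs(x - y) > 0 matches x != y for ordinary floats, and find
# where they diverge. via_abs mirrors the generated code's shape: a
# NaN-safe compare of abs(x - y) against 0 (nanSafeCompareFloats orders
# NaN above every value, so a NaN difference compares greater than 0).
import math

def via_abs(x, y):
    d = abs(x - y)
    return d > 0 or math.isnan(d)

def via_neq(x, y):
    return x != y

samples = [(1.0, 1.0), (1.5, -2.0), (0.0, -0.0), (1e300, -1e300)]
assert all(via_abs(x, y) == via_neq(x, y) for x, y in samples)

# Divergence: inf - inf is NaN, so via_abs says True while inf != inf
# is False -- the rewrite needs a guard against NaN-producing inputs.
inf = float("inf")
assert via_abs(inf, inf) != via_neq(inf, inf)
```

So the simplification looks valid for finite inputs (including the {{0.0}} vs {{-0.0}} case), and the threshold variant {{ABS(x - y) > t}} would similarly need NaN handling spelled out before Catalyst could apply it.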
[jira] [Updated] (SPARK-16807) Optimize some ABS() statements
[ https://issues.apache.org/jira/browse/SPARK-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Zimmer updated SPARK-16807:
    Description: appended the closing note that the {{0}} in the example could be a configurable threshold, not something you can hardcode as {{x != y}}; otherwise unchanged.
[jira] [Created] (SPARK-16807) Optimize some ABS() statements
Sylvain Zimmer created SPARK-16807: -- Summary: Optimize some ABS() statements Key: SPARK-16807 URL: https://issues.apache.org/jira/browse/SPARK-16807 Project: Spark Issue Type: Improvement Components: SQL Reporter: Sylvain Zimmer
I'm not a Catalyst expert, but I think some use cases for the ABS() function could generate simpler code. This is the code generated when doing something like {{ABS(x - y) > 0}} or {{ABS(x - y) = 0}} in Spark SQL:
{code}
/* 267 */ float filter_value6 = -1.0f;
/* 268 */ filter_value6 = agg_value27 - agg_value32;
/* 269 */ float filter_value5 = -1.0f;
/* 270 */ filter_value5 = (float)(java.lang.Math.abs(filter_value6));
/* 271 */
/* 272 */ boolean filter_value4 = false;
/* 273 */ filter_value4 = org.apache.spark.util.Utils.nanSafeCompareFloats(filter_value5, 0.0f) > 0;
/* 274 */ if (!filter_value4) continue;
{code}
Maybe it could all be simplified to something like this?
{code}
filter_value4 = (agg_value27 != agg_value32)
{code}
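The proposed rewrite is easy to sanity-check outside Catalyst. The following Python sketch (the helper names are mine, not Spark's) mimics the shape of the generated code above, with a NaN-safe comparison modeled on {{Utils.nanSafeCompareFloats}} (which treats NaN as equal to NaN and greater than any other float). It shows the rewrite holds for ordinary floats, but that NaN inputs need care if the inequality is itself NaN-safe:

```python
import math

def nan_safe_compare(x: float, y: float) -> int:
    # Modeled on Spark's Utils.nanSafeCompareFloats:
    # NaN == NaN, and NaN is greater than any non-NaN float.
    if math.isnan(x) and math.isnan(y):
        return 0
    if math.isnan(x):
        return 1
    if math.isnan(y):
        return -1
    return (x > y) - (x < y)

def abs_filter(a: float, b: float) -> bool:
    # Shape of the code Spark currently generates for ABS(a - b) > 0.
    return nan_safe_compare(abs(a - b), 0.0) > 0

def simplified_filter(a: float, b: float) -> bool:
    # Candidate rewrite, using a NaN-safe inequality: a != b.
    return nan_safe_compare(a, b) != 0

# Equivalent for ordinary floats (including the 0.0 vs -0.0 edge case):
for a, b in [(1.0, 1.0), (1.0, 2.0), (-3.5, 3.5), (0.0, -0.0)]:
    assert abs_filter(a, b) == simplified_filter(a, b)

nan = float("nan")
# ABS(NaN - NaN) > 0 is true under the NaN-safe ordering (NaN > 0.0)...
assert abs_filter(nan, nan) is True
# ...but NaN-safe inequality says NaN == NaN, so the rewrite flips this case.
assert simplified_filter(nan, nan) is False
```

So an optimizer rule along these lines would likely need to special-case NaN-producing inputs to preserve the current semantics.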
[jira] [Commented] (SPARK-1121) Only add avro if the build is for Hadoop 0.23.X and SPARK_YARN is set
[ https://issues.apache.org/jira/browse/SPARK-1121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400167#comment-15400167 ] Apache Spark commented on SPARK-1121: - User 'pwendell' has created a pull request for this issue: https://github.com/apache/spark/pull/37 > Only add avro if the build is for Hadoop 0.23.X and SPARK_YARN is set > - > > Key: SPARK-1121 > URL: https://issues.apache.org/jira/browse/SPARK-1121 > Project: Spark > Issue Type: Improvement > Components: Build >Reporter: Patrick Cogan >Assignee: prashant > Fix For: 1.0.0 > > > The reason why this is needed is that in the 0.23.X versions of hadoop-client > the avro dependency is fully excluded: > http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client/0.23.10/hadoop-client-0.23.10.pom > In later versions 2.2.X the avro dependency is correctly inherited from > hadoop-common: > http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client/2.2.0/hadoop-client-2.2.0.pom > So as a workaround Spark currently depends on Avro directly in the sbt and > scala builds. This is a bit ugly so I'd like to propose the following: > 1. In the Maven build just remove avro and make a note on the > building-with-maven page that they will need to manually add avro for this > build. > 2. On sbt only add the avro dependency if the version is 0.23.X and > SPARK_YARN is true. Also we only need to add avro not both {avro, avro-ipc} > like is there now. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16804) Correlated subqueries containing LIMIT return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400150#comment-15400150 ] Nattavut Sutyanyong commented on SPARK-16804: - To demonstrate that this fix does not unnecessarily block the "good" cases (where LIMIT is present but NOT on the correlated path), here is an example, which produces the same result set both with and without the proposed fix.
scala> sql("select c1 from t1 where exists (select 1 from (select 1 from t2 limit 1) where t1.c1=t2.c2)").show
+---+
| c1|
+---+
|  1|
+---+
> Correlated subqueries containing LIMIT return incorrect results > --- > > Key: SPARK-16804 > URL: https://issues.apache.org/jira/browse/SPARK-16804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Nattavut Sutyanyong > Original Estimate: 72h > Remaining Estimate: 72h > > Correlated subqueries with LIMIT could return incorrect results. The rule > ResolveSubquery in the Analysis phase moves correlated predicates into join > predicates and neglects the semantics of the LIMIT. > Example: > {noformat} > Seq(1, 2).toDF("c1").createOrReplaceTempView("t1") > Seq(1, 2).toDF("c2").createOrReplaceTempView("t2") > sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT > 1)").show > +---+ > | c1| > +---+ > | 1| > +---+ > {noformat} > The correct result contains both rows from T1.
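The difference between the correct semantics and the rewritten plan can be sketched in plain Python (illustrative only; the helper functions below are not Spark code, just a simulation of the two plans for the tables in the report):

```python
# Tables from the report: t1(c1) and t2(c2) each hold rows 1 and 2.
t1 = [1, 2]
t2 = [1, 2]

def exists_correct(c1):
    # Correct semantics: the correlated filter runs inside the subquery,
    # and LIMIT 1 applies to the already-filtered rows. EXISTS only asks
    # whether at least one row survives, so the LIMIT changes nothing.
    return len([c2 for c2 in t2 if c1 == c2][:1]) > 0

def exists_buggy(c1):
    # The rewritten plan: the correlated predicate is pulled up into a
    # join condition, so LIMIT 1 truncates t2 *before* the filter runs,
    # and only t2's first row is ever compared against c1.
    return any(c1 == c2 for c2 in t2[:1])

assert [c1 for c1 in t1 if exists_correct(c1)] == [1, 2]  # correct answer
assert [c1 for c1 in t1 if exists_buggy(c1)] == [1]       # result reported in the bug
```

This reproduces exactly the single-row result shown in the issue description.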
[jira] [Resolved] (SPARK-16806) from_unixtime function gives wrong answer
[ https://issues.apache.org/jira/browse/SPARK-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16806. --- Resolution: Duplicate > from_unixtime function gives wrong answer > - > > Key: SPARK-16806 > URL: https://issues.apache.org/jira/browse/SPARK-16806 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Dong Jiang > > The following is from 2.0, for the same epoch, the function with format > argument generates a different result for the year. > spark-sql> select from_unixtime(100), from_unixtime(100, 'YYYY-MM-dd > HH:mm:ss'); > 1969-12-31 19:01:40 1970-12-31 19:01:40
[jira] [Resolved] (SPARK-16794) Spark 2.0.0. with Yarn
[ https://issues.apache.org/jira/browse/SPARK-16794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16794. --- Resolution: Not A Problem > Spark 2.0.0. with Yarn > --- > > Key: SPARK-16794 > URL: https://issues.apache.org/jira/browse/SPARK-16794 > Project: Spark > Issue Type: Question >Affects Versions: 2.0.0 > Environment: AWS Cluster with Hortonworks 2.4 >Reporter: Eliano Marques > > I'm trying to start spark 2.0.0 with yarn. First I had the issues pointed out > here: https://issues.apache.org/jira/browse/SPARK-15343. I then check the > hadoop.yarn.timeline-service.enabled to false and the behaviour changed, i.e. > it started executing the application. However I'm facing the following error. > This might be a silly config but appreciate if you can help. > {code} > spark-shell --master yarn deploy-mode client > {code} > Log: > {code} > 16/07/29 10:04:17 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 16/07/29 10:04:17 WARN component.AbstractLifeCycle: FAILED > ServerConnector@16da476c{HTTP/1.1}{0.0.0.0:4040}: java.net.BindException: > Address already in use > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:433) > at sun.nio.ch.Net.bind(Net.java:425) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.spark_project.jetty.server.ServerConnector.open(ServerConnector.java:321) > at > org.spark_project.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80) > at > org.spark_project.jetty.server.ServerConnector.doStart(ServerConnector.java:236) > at > org.spark_project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) > at org.spark_project.jetty.server.Server.doStart(Server.java:366) > at > 
org.spark_project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) > at > org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:298) > at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:308) > at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:308) > at > org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2071) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) > at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2062) > at > org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:308) > at org.apache.spark.ui.WebUI.bind(WebUI.scala:139) > at > org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:451) > at > org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:451) > at scala.Option.foreach(Option.scala:257) > at org.apache.spark.SparkContext.(SparkContext.scala:451) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823) > at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95) > at $line3.$read$$iw$$iw.(:15) > at $line3.$read$$iw.(:31) > at $line3.$read.(:33) > at $line3.$read$.(:37) > at $line3.$read$.() > at $line3.$eval$.$print$lzycompute(:7) > at $line3.$eval$.$print(:6) > at $line3.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at 
scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786) > at > scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047) > at > scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638) > at > scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637) > at > scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31) > at > scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19) > at >
[jira] [Reopened] (SPARK-16794) Spark 2.0.0. with Yarn
[ https://issues.apache.org/jira/browse/SPARK-16794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reopened SPARK-16794: --- This should not be resolved as Fixed because there was no Spark change associated to it > Spark 2.0.0. with Yarn > --- > > Key: SPARK-16794 > URL: https://issues.apache.org/jira/browse/SPARK-16794 > Project: Spark > Issue Type: Question >Affects Versions: 2.0.0 > Environment: AWS Cluster with Hortonworks 2.4 >Reporter: Eliano Marques > > I'm trying to start spark 2.0.0 with yarn. First I had the issues pointed out > here: https://issues.apache.org/jira/browse/SPARK-15343. I then check the > hadoop.yarn.timeline-service.enabled to false and the behaviour changed, i.e. > it started executing the application. However I'm facing the following error. > This might be a silly config but appreciate if you can help. > {code} > spark-shell --master yarn deploy-mode client > {code} > Log: > {code} > 16/07/29 10:04:17 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... 
using builtin-java classes where applicable > 16/07/29 10:04:17 WARN component.AbstractLifeCycle: FAILED > ServerConnector@16da476c{HTTP/1.1}{0.0.0.0:4040}: java.net.BindException: > Address already in use > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:433) > at sun.nio.ch.Net.bind(Net.java:425) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.spark_project.jetty.server.ServerConnector.open(ServerConnector.java:321) > at > org.spark_project.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80) > at > org.spark_project.jetty.server.ServerConnector.doStart(ServerConnector.java:236) > at > org.spark_project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) > at org.spark_project.jetty.server.Server.doStart(Server.java:366) > at > org.spark_project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) > at > org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:298) > at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:308) > at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:308) > at > org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2071) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) > at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2062) > at > org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:308) > at org.apache.spark.ui.WebUI.bind(WebUI.scala:139) > at > org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:451) > at > org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:451) > at scala.Option.foreach(Option.scala:257) > at org.apache.spark.SparkContext.(SparkContext.scala:451) > at 
org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823) > at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95) > at $line3.$read$$iw$$iw.(:15) > at $line3.$read$$iw.(:31) > at $line3.$read.(:33) > at $line3.$read$.(:37) > at $line3.$read$.() > at $line3.$eval$.$print$lzycompute(:7) > at $line3.$eval$.$print(:6) > at $line3.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786) > at > scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047) > at > scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638) > at > scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637) > at > scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31) > at > scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
[jira] [Closed] (SPARK-16794) Spark 2.0.0. with Yarn
[ https://issues.apache.org/jira/browse/SPARK-16794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eliano Marques closed SPARK-16794. -- Resolution: Fixed > Spark 2.0.0. with Yarn > --- > > Key: SPARK-16794 > URL: https://issues.apache.org/jira/browse/SPARK-16794 > Project: Spark > Issue Type: Question >Affects Versions: 2.0.0 > Environment: AWS Cluster with Hortonworks 2.4 >Reporter: Eliano Marques > > I'm trying to start spark 2.0.0 with yarn. First I had the issues pointed out > here: https://issues.apache.org/jira/browse/SPARK-15343. I then check the > hadoop.yarn.timeline-service.enabled to false and the behaviour changed, i.e. > it started executing the application. However I'm facing the following error. > This might be a silly config but appreciate if you can help. > {code} > spark-shell --master yarn deploy-mode client > {code} > Log: > {code} > 16/07/29 10:04:17 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 16/07/29 10:04:17 WARN component.AbstractLifeCycle: FAILED > ServerConnector@16da476c{HTTP/1.1}{0.0.0.0:4040}: java.net.BindException: > Address already in use > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:433) > at sun.nio.ch.Net.bind(Net.java:425) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.spark_project.jetty.server.ServerConnector.open(ServerConnector.java:321) > at > org.spark_project.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80) > at > org.spark_project.jetty.server.ServerConnector.doStart(ServerConnector.java:236) > at > org.spark_project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) > at org.spark_project.jetty.server.Server.doStart(Server.java:366) > at > 
org.spark_project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) > at > org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:298) > at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:308) > at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:308) > at > org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2071) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) > at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2062) > at > org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:308) > at org.apache.spark.ui.WebUI.bind(WebUI.scala:139) > at > org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:451) > at > org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:451) > at scala.Option.foreach(Option.scala:257) > at org.apache.spark.SparkContext.(SparkContext.scala:451) > at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823) > at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95) > at $line3.$read$$iw$$iw.(:15) > at $line3.$read$$iw.(:31) > at $line3.$read.(:33) > at $line3.$read$.(:37) > at $line3.$read$.() > at $line3.$eval$.$print$lzycompute(:7) > at $line3.$eval$.$print(:6) > at $line3.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at 
scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786) > at > scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047) > at > scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638) > at > scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637) > at > scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31) > at > scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19) > at >
[jira] [Commented] (SPARK-16806) from_unixtime function gives wrong answer
[ https://issues.apache.org/jira/browse/SPARK-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400118#comment-15400118 ] Dongjoon Hyun commented on SPARK-16806: --- You can see SPARK-16384, too. > from_unixtime function gives wrong answer > - > > Key: SPARK-16806 > URL: https://issues.apache.org/jira/browse/SPARK-16806 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Dong Jiang > > The following is from 2.0, for the same epoch, the function with format > argument generates a different result for the year. > spark-sql> select from_unixtime(100), from_unixtime(100, 'YYYY-MM-dd > HH:mm:ss'); > 1969-12-31 19:01:40 1970-12-31 19:01:40
[jira] [Comment Edited] (SPARK-16806) from_unixtime function gives wrong answer
[ https://issues.apache.org/jira/browse/SPARK-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400113#comment-15400113 ] Dongjoon Hyun edited comment on SPARK-16806 at 7/29/16 10:19 PM: - Yep. Please use `yyyy` instead of `YYYY`.
{code}
scala> sql("select from_unixtime(100), from_unixtime(100, 'yyyy-MM-dd HH:mm:ss')").show
...
| 1969-12-31 16:01:40| 1969-12-31 16:01:40|
{code}
was (Author: dongjoon): Yep. Please use `yyyy` instead of `YYYY`.
{code}
scala> sql("select from_unixtime(100), from_unixtime(100, 'yyyy-MM-dd HH:mm:ss')").show
+---+---+
|from_unixtime(CAST(100 AS BIGINT), yyyy-MM-dd HH:mm:ss)|from_unixtime(CAST(100 AS BIGINT), yyyy-MM-dd HH:mm:ss)|
+---+---+
|1969-12-31 16:01:40| 1969-12-31 16:01:40|
+---+---+
{code}
> from_unixtime function gives wrong answer > - > > Key: SPARK-16806 > URL: https://issues.apache.org/jira/browse/SPARK-16806 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Dong Jiang > > The following is from 2.0, for the same epoch, the function with format > argument generates a different result for the year. > spark-sql> select from_unixtime(100), from_unixtime(100, 'YYYY-MM-dd > HH:mm:ss'); > 1969-12-31 19:01:40 1970-12-31 19:01:40
[jira] [Comment Edited] (SPARK-16806) from_unixtime function gives wrong answer
[ https://issues.apache.org/jira/browse/SPARK-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400113#comment-15400113 ] Dongjoon Hyun edited comment on SPARK-16806 at 7/29/16 10:18 PM: - Yep. Please use `yyyy` instead of `YYYY`.
{code}
scala> sql("select from_unixtime(100), from_unixtime(100, 'yyyy-MM-dd HH:mm:ss')").show
+---+---+
|from_unixtime(CAST(100 AS BIGINT), yyyy-MM-dd HH:mm:ss)|from_unixtime(CAST(100 AS BIGINT), yyyy-MM-dd HH:mm:ss)|
+---+---+
|1969-12-31 16:01:40| 1969-12-31 16:01:40|
+---+---+
{code}
was (Author: dongjoon): Yep. Please use `yyyy` instead of `YYYY`.
{code}
sql("select from_unixtime(100), from_unixtime(100, 'yyyy-MM-dd HH:mm:ss')").show
{code}
> from_unixtime function gives wrong answer > - > > Key: SPARK-16806 > URL: https://issues.apache.org/jira/browse/SPARK-16806 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Dong Jiang > > The following is from 2.0, for the same epoch, the function with format > argument generates a different result for the year. > spark-sql> select from_unixtime(100), from_unixtime(100, 'YYYY-MM-dd > HH:mm:ss'); > 1969-12-31 19:01:40 1970-12-31 19:01:40
[jira] [Commented] (SPARK-16806) from_unixtime function gives wrong answer
[ https://issues.apache.org/jira/browse/SPARK-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400113#comment-15400113 ] Dongjoon Hyun commented on SPARK-16806: --- Yep. Please use `yyyy` instead of `YYYY`.
{code}
sql("select from_unixtime(100), from_unixtime(100, 'yyyy-MM-dd HH:mm:ss')").show
{code}
> from_unixtime function gives wrong answer > - > > Key: SPARK-16806 > URL: https://issues.apache.org/jira/browse/SPARK-16806 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Dong Jiang > > The following is from 2.0, for the same epoch, the function with format > argument generates a different result for the year. > spark-sql> select from_unixtime(100), from_unixtime(100, 'YYYY-MM-dd > HH:mm:ss'); > 1969-12-31 19:01:40 1970-12-31 19:01:40
[jira] [Updated] (SPARK-16806) from_unixtime function gives wrong answer
[ https://issues.apache.org/jira/browse/SPARK-16806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dong Jiang updated SPARK-16806: --- Description: The following is from 2.0, for the same epoch, the function with format argument generates a different result for the year. spark-sql> select from_unixtime(100), from_unixtime(100, 'YYYY-MM-dd HH:mm:ss'); 1969-12-31 19:01:40 1970-12-31 19:01:40 was: The following is from 2.0, for the same epoch, the function with format argument generates a different result. spark-sql> select from_unixtime(100), from_unixtime(100, 'YYYY-MM-dd HH:mm:ss'); 1969-12-31 19:01:40 1970-12-31 19:01:40 > from_unixtime function gives wrong answer > - > > Key: SPARK-16806 > URL: https://issues.apache.org/jira/browse/SPARK-16806 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Dong Jiang > > The following is from 2.0, for the same epoch, the function with format > argument generates a different result for the year. > spark-sql> select from_unixtime(100), from_unixtime(100, 'YYYY-MM-dd > HH:mm:ss'); > 1969-12-31 19:01:40 1970-12-31 19:01:40
[jira] [Created] (SPARK-16806) from_unixtime function gives wrong answer
Dong Jiang created SPARK-16806: -- Summary: from_unixtime function gives wrong answer Key: SPARK-16806 URL: https://issues.apache.org/jira/browse/SPARK-16806 Project: Spark Issue Type: Bug Affects Versions: 2.0.0 Reporter: Dong Jiang The following is from 2.0, for the same epoch, the function with format argument generates a different result. spark-sql> select from_unixtime(100), from_unixtime(100, 'YYYY-MM-dd HH:mm:ss'); 1969-12-31 19:01:40 1970-12-31 19:01:40
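The 1970 result is consistent with the classic week-year pitfall: in Java's SimpleDateFormat pattern letters (which Spark 2.0's from_unixtime relies on), uppercase `YYYY` is the week-based year, not the calendar year, and Dec 31 1969 falls in a week that belongs to 1970. The same distinction can be reproduced in Python via the ISO week-based year (a minimal sketch; the wall-clock time below is taken from the report's UTC-5 output for epoch second 100):

```python
from datetime import datetime

# Wall-clock timestamp from the bug report: epoch second 100 rendered in a
# UTC-5 session time zone (1970-01-01 00:01:40 UTC -> 1969-12-31 19:01:40).
d = datetime(1969, 12, 31, 19, 1, 40)

calendar_year = d.year               # what lowercase 'yyyy' formats
week_based_year = d.isocalendar()[0] # ISO analogue of uppercase 'YYYY'

# Dec 31 1969 belongs to the week containing Jan 1 1970, hence the skew.
print(calendar_year, week_based_year)
```

Note that SimpleDateFormat's week year follows locale-dependent week rules rather than strict ISO 8601, but for this date both conventions place the week in 1970.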
[jira] [Commented] (SPARK-16804) Correlated subqueries containing LIMIT return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400105#comment-15400105 ] Nattavut Sutyanyong commented on SPARK-16804: - This fix blocks any correlated subquery when there is a LIMIT operation on the path from the parent table to the correlated predicate. We may consider relaxing this restriction once we have better support for processing correlated subqueries at run-time. SPARK-13417 is an umbrella task to track this effort.
Note that if the LIMIT is not on the correlated path, Spark returns the correct result. Example:
sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2) and exists (select 1 from t2 LIMIT 1)").show
will return both rows from T1, which is correctly handled both with and without this proposed fix.
This fix will change the behaviour of the query
sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT 1)").show
to return an error from the Analysis phase, as shown below:
org.apache.spark.sql.AnalysisException: Accessing outer query column is not allowed in a LIMIT: LocalLimit 1 ...
> Correlated subqueries containing LIMIT return incorrect results > --- > > Key: SPARK-16804 > URL: https://issues.apache.org/jira/browse/SPARK-16804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Nattavut Sutyanyong > Original Estimate: 72h > Remaining Estimate: 72h > > Correlated subqueries with LIMIT could return incorrect results. The rule > ResolveSubquery in the Analysis phase moves correlated predicates into join > predicates and neglects the semantics of the LIMIT. > Example: > {noformat} > Seq(1, 2).toDF("c1").createOrReplaceTempView("t1") > Seq(1, 2).toDF("c2").createOrReplaceTempView("t2") > sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT > 1)").show > +---+ > | c1| > +---+ > | 1| > +---+ > {noformat} > The correct result contains both rows from T1.
[jira] [Commented] (SPARK-16794) Spark 2.0.0. with Yarn
[ https://issues.apache.org/jira/browse/SPARK-16794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400094#comment-15400094 ] Eliano Marques commented on SPARK-16794: I investigated the issue further, and it turns out the problem was this:
{code}
Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher
{code}
We made it work by adding the following to spark-defaults.conf:
{code}
spark.driver.extraJavaOptions -Dhdp.version=current
spark.yarn.am.extraJavaOptions -Dhdp.version=current
{code}
Python (Jupyter and shell), Scala (Jupyter and shell) and R (shell and RStudio via sparklyr) are all now working with 2.0.0. Thanks again for your answer. Eliano
> Spark 2.0.0. with Yarn > --- > > Key: SPARK-16794 > URL: https://issues.apache.org/jira/browse/SPARK-16794 > Project: Spark > Issue Type: Question >Affects Versions: 2.0.0 > Environment: AWS Cluster with Hortonworks 2.4 >Reporter: Eliano Marques > > I'm trying to start spark 2.0.0 with yarn. First I had the issues pointed out > here: https://issues.apache.org/jira/browse/SPARK-15343. I then set > hadoop.yarn.timeline-service.enabled to false and the behaviour changed, i.e. > it started executing the application. However I'm facing the following error. > This might be a silly config issue, but I'd appreciate your help. > {code} > spark-shell --master yarn deploy-mode client > {code} > Log: > {code} > 16/07/29 10:04:17 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform...
using builtin-java classes where applicable > 16/07/29 10:04:17 WARN component.AbstractLifeCycle: FAILED > ServerConnector@16da476c{HTTP/1.1}{0.0.0.0:4040}: java.net.BindException: > Address already in use > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:433) > at sun.nio.ch.Net.bind(Net.java:425) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at > org.spark_project.jetty.server.ServerConnector.open(ServerConnector.java:321) > at > org.spark_project.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80) > at > org.spark_project.jetty.server.ServerConnector.doStart(ServerConnector.java:236) > at > org.spark_project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) > at org.spark_project.jetty.server.Server.doStart(Server.java:366) > at > org.spark_project.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) > at > org.apache.spark.ui.JettyUtils$.org$apache$spark$ui$JettyUtils$$connect$1(JettyUtils.scala:298) > at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:308) > at org.apache.spark.ui.JettyUtils$$anonfun$5.apply(JettyUtils.scala:308) > at > org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:2071) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160) > at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:2062) > at > org.apache.spark.ui.JettyUtils$.startJettyServer(JettyUtils.scala:308) > at org.apache.spark.ui.WebUI.bind(WebUI.scala:139) > at > org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:451) > at > org.apache.spark.SparkContext$$anonfun$10.apply(SparkContext.scala:451) > at scala.Option.foreach(Option.scala:257) > at org.apache.spark.SparkContext.(SparkContext.scala:451) > at 
org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2256) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:831) > at > org.apache.spark.sql.SparkSession$Builder$$anonfun$8.apply(SparkSession.scala:823) > at scala.Option.getOrElse(Option.scala:121) > at > org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:823) > at org.apache.spark.repl.Main$.createSparkSession(Main.scala:95) > at $line3.$read$$iw$$iw.(:15) > at $line3.$read$$iw.(:31) > at $line3.$read.(:33) > at $line3.$read$.(:37) > at $line3.$read$.() > at $line3.$eval$.$print$lzycompute(:7) > at $line3.$eval$.$print(:6) > at $line3.$eval.$print() > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786) > at >
[jira] [Comment Edited] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400089#comment-15400089 ] Charles Allen edited comment on SPARK-16798 at 7/29/16 10:06 PM: - Adding some more flavor, this is running in Mesos coarse mode against 0.28.2. If I take a subset of the data that failed and run it locally (local[4] or local[1]), it succeeds, which is annoying. here are the info logs from the failing tasks: {code} 16/07/29 18:19:20 INFO HadoopRDD: Input split: REDACTED1:0+163064 16/07/29 18:19:20 INFO TorrentBroadcast: Started reading broadcast variable 0 16/07/29 18:19:20 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 18.2 KB, free 3.6 GB) 16/07/29 18:19:20 INFO TorrentBroadcast: Reading broadcast variable 0 took 10 ms 16/07/29 18:19:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 209.2 KB, free 3.6 GB) 16/07/29 18:19:20 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 16/07/29 18:19:20 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 16/07/29 18:19:20 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap 16/07/29 18:19:20 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition 16/07/29 18:19:20 INFO deprecation: mapred.job.id is deprecated. 
Instead, use mapreduce.job.id 16/07/29 18:19:21 INFO NativeS3FileSystem: Opening 'REDACTED1' for reading 16/07/29 18:19:21 INFO CodecPool: Got brand-new decompressor [.gz] 16/07/29 18:19:21 ERROR Executor: Exception in task 9.0 in stage 0.0 (TID 9) java.lang.IllegalArgumentException: bound must be positive at java.util.Random.nextInt(Random.java:388) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/07/29 18:19:21 INFO CoarseGrainedExecutorBackend: Got assigned task 14 16/07/29 18:19:21 INFO Executor: Running task 14.0 in stage 0.0 (TID 14) 16/07/29 18:19:21 INFO HadoopRDD: Input split: REDACTED2:0+157816 16/07/29 18:19:21 INFO NativeS3FileSystem: Opening 'REDACTED2' for reading 16/07/29 18:19:21 INFO CodecPool: Got brand-new decompressor [.gz] 16/07/29 18:19:21 ERROR Executor: Exception in task 14.0 in stage 0.0 (TID 14) java.lang.IllegalArgumentException: bound must be positive at java.util.Random.nextInt(Random.java:388) at 
org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/07/29 18:19:21 INFO CoarseGrainedExecutorBackend: Got assigned task 15 16/07/29 18:19:21 INFO Executor: Running task 9.1 in stage 0.0 (TID 15) {code} was (Author: drcrallen): Adding some more flavor, this is running in Mesos coarse mode against
[jira] [Commented] (SPARK-16779) Fix unnecessary use of postfix operations
[ https://issues.apache.org/jira/browse/SPARK-16779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400092#comment-15400092 ] holdenk commented on SPARK-16779: - I've gone ahead and done a more full-scope version - leaving the XML postfix operators alone for now. > Fix unnecessary use of postfix operations > - > > Key: SPARK-16779 > URL: https://issues.apache.org/jira/browse/SPARK-16779 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: holdenk > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400089#comment-15400089 ] Charles Allen commented on SPARK-16798: --- Adding some more flavor, this is running in Mesos coarse mode against 0.28.2. If I take a subset of the data that failed and run it locally (local[4] or local[1]), it succeeds, which is annoying. here are the info logs from the failing tasks: {code} 16/07/29 18:19:20 INFO HadoopRDD: Input split: REDACTED1.gz:0+163064 16/07/29 18:19:20 INFO TorrentBroadcast: Started reading broadcast variable 0 16/07/29 18:19:20 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 18.2 KB, free 3.6 GB) 16/07/29 18:19:20 INFO TorrentBroadcast: Reading broadcast variable 0 took 10 ms 16/07/29 18:19:20 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 209.2 KB, free 3.6 GB) 16/07/29 18:19:20 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 16/07/29 18:19:20 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 16/07/29 18:19:20 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap 16/07/29 18:19:20 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition 16/07/29 18:19:20 INFO deprecation: mapred.job.id is deprecated. 
Instead, use mapreduce.job.id 16/07/29 18:19:21 INFO NativeS3FileSystem: Opening 'REDACTED1' for reading 16/07/29 18:19:21 INFO CodecPool: Got brand-new decompressor [.gz] 16/07/29 18:19:21 ERROR Executor: Exception in task 9.0 in stage 0.0 (TID 9) java.lang.IllegalArgumentException: bound must be positive at java.util.Random.nextInt(Random.java:388) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/07/29 18:19:21 INFO CoarseGrainedExecutorBackend: Got assigned task 14 16/07/29 18:19:21 INFO Executor: Running task 14.0 in stage 0.0 (TID 14) 16/07/29 18:19:21 INFO HadoopRDD: Input split: REDACTED2:0+157816 16/07/29 18:19:21 INFO NativeS3FileSystem: Opening 'REDACTED2' for reading 16/07/29 18:19:21 INFO CodecPool: Got brand-new decompressor [.gz] 16/07/29 18:19:21 ERROR Executor: Exception in task 14.0 in stage 0.0 (TID 14) java.lang.IllegalArgumentException: bound must be positive at java.util.Random.nextInt(Random.java:388) at 
org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) at org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) at org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/07/29 18:19:21 INFO CoarseGrainedExecutorBackend: Got assigned task 15 16/07/29 18:19:21 INFO Executor: Running task 9.1 in stage 0.0 (TID 15) {code} > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 >
[jira] [Commented] (SPARK-16804) Correlated subqueries containing LIMIT return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400090#comment-15400090 ] Nattavut Sutyanyong commented on SPARK-16804: - scala> sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT 1)").explain(true) == Parsed Logical Plan == 'Project ['c1] +- 'Filter exists#21 : +- 'SubqueryAlias exists#21 : +- 'GlobalLimit 1 :+- 'LocalLimit 1 : +- 'Project [unresolvedalias(1, None)] : +- 'Filter ('t1.c1 = 't2.c2) : +- 'UnresolvedRelation `t2` +- 'UnresolvedRelation `t1` == Analyzed Logical Plan == c1: int Project [c1#17] +- Filter predicate-subquery#21 [(c1#17 = c2#10)] : +- SubqueryAlias predicate-subquery#21 [(c1#17 = c2#10)] <== This correlated predicate is incorrectly moved above the LIMIT : +- GlobalLimit 1 :+- LocalLimit 1 : +- Project [1 AS 1#26, c2#10] : +- SubqueryAlias t2 : +- Project [value#8 AS c2#10] :+- LocalRelation [value#8] +- SubqueryAlias t1 +- Project [value#15 AS c1#17] +- LocalRelation [value#15] By rewriting the correlated predicate in the subquery in Analysis phase from below the LIMIT 1 operation to above it causing the scan of the subquery table to return only 1 row. The correct semantic is the LIMIT 1 must be applied on the subquery for each input value from the parent table. > Correlated subqueries containing LIMIT return incorrect results > --- > > Key: SPARK-16804 > URL: https://issues.apache.org/jira/browse/SPARK-16804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Nattavut Sutyanyong > Original Estimate: 72h > Remaining Estimate: 72h > > Correlated subqueries with LIMIT could return incorrect results. The rule > ResolveSubquery in the Analysis phase moves correlated predicates to a join > predicates and neglect the semantic of the LIMIT. 
> Example: > {noformat} > Seq(1, 2).toDF("c1").createOrReplaceTempView("t1") > Seq(1, 2).toDF("c2").createOrReplaceTempView("t2") > sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT > 1)").show > +---+ > > | c1| > +---+ > | 1| > +---+ > {noformat} > The correct result contains both rows from T1.
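The correct semantics described in the comment (the correlated predicate filters first, and LIMIT 1 applies inside the subquery once per outer row) can be sketched in plain Python over the same two-row example. The helper names are illustrative, not Spark APIs:

```python
t1 = [1, 2]  # Seq(1, 2).toDF("c1")
t2 = [1, 2]  # Seq(1, 2).toDF("c2")

def exists_with_limit(c1, limit=1):
    # Correct semantics: evaluate the correlated filter (t1.c1 = t2.c2) first,
    # then apply LIMIT per outer row. EXISTS is true if anything survives.
    matches = [c2 for c2 in t2 if c1 == c2][:limit]
    return len(matches) > 0

# Both rows of t1 have a match in t2, so both should be returned.
result = [c1 for c1 in t1 if exists_with_limit(c1)]

# The buggy plan instead truncates t2 to one row before the correlated
# filter runs, which drops the match for c1 = 2:
t2_truncated = t2[:1]
buggy = [c1 for c1 in t1 if any(c1 == c2 for c2 in t2_truncated)]
```

With the filter hoisted above the LIMIT, `buggy` reproduces the single-row output shown in the report, while `result` contains both rows from t1.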
[jira] [Commented] (SPARK-16805) Log timezone when query result does not match
[ https://issues.apache.org/jira/browse/SPARK-16805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400087#comment-15400087 ] Apache Spark commented on SPARK-16805: -- User 'rxin' has created a pull request for this issue: https://github.com/apache/spark/pull/14413 > Log timezone when query result does not match > - > > Key: SPARK-16805 > URL: https://issues.apache.org/jira/browse/SPARK-16805 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > It is useful to log the timezone when query result does not match, especially > on build machines that have different timezone from AMPLab Jenkins.
[jira] [Assigned] (SPARK-16805) Log timezone when query result does not match
[ https://issues.apache.org/jira/browse/SPARK-16805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16805: Assignee: Apache Spark (was: Reynold Xin) > Log timezone when query result does not match > - > > Key: SPARK-16805 > URL: https://issues.apache.org/jira/browse/SPARK-16805 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Apache Spark > > It is useful to log the timezone when query result does not match, especially > on build machines that have different timezone from AMPLab Jenkins.
[jira] [Assigned] (SPARK-16805) Log timezone when query result does not match
[ https://issues.apache.org/jira/browse/SPARK-16805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16805: Assignee: Reynold Xin (was: Apache Spark) > Log timezone when query result does not match > - > > Key: SPARK-16805 > URL: https://issues.apache.org/jira/browse/SPARK-16805 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Reynold Xin >Assignee: Reynold Xin > > It is useful to log the timezone when query result does not match, especially > on build machines that have different timezone from AMPLab Jenkins.
[jira] [Updated] (SPARK-16804) Correlated subqueries containing LIMIT return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-16804: Description: Correlated subqueries with LIMIT could return incorrect results. The rule ResolveSubquery in the Analysis phase moves correlated predicates to a join predicates and neglect the semantic of the LIMIT. Example: {noformat} Seq(1, 2).toDF("c1").createOrReplaceTempView("t1") Seq(1, 2).toDF("c2").createOrReplaceTempView("t2") sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT 1)").show +---+ | c1| +---+ | 1| +---+ {noformat} The correct result contains both rows from T1. was: Correlated subqueries with LIMIT could return incorrect results. The rule ResolveSubquery in the Analysis phase moves correlated predicates to a join predicates and neglect the semantic of the LIMIT. Example: Seq(1, 2).toDF("c1").createOrReplaceTempView("t1") Seq(1, 2).toDF("c2").createOrReplaceTempView("t2") sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT 1)").show +---+ | c1| +---+ | 1| +---+ The correct result contains both rows from T1. > Correlated subqueries containing LIMIT return incorrect results > --- > > Key: SPARK-16804 > URL: https://issues.apache.org/jira/browse/SPARK-16804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Nattavut Sutyanyong > Original Estimate: 72h > Remaining Estimate: 72h > > Correlated subqueries with LIMIT could return incorrect results. The rule > ResolveSubquery in the Analysis phase moves correlated predicates to a join > predicates and neglect the semantic of the LIMIT. > Example: > {noformat} > Seq(1, 2).toDF("c1").createOrReplaceTempView("t1") > Seq(1, 2).toDF("c2").createOrReplaceTempView("t2") > sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT > 1)").show > +---+ > > | c1| > +---+ > | 1| > +---+ > {noformat} > The correct result contains both rows from T1. 
[jira] [Created] (SPARK-16805) Log timezone when query result does not match
Reynold Xin created SPARK-16805: --- Summary: Log timezone when query result does not match Key: SPARK-16805 URL: https://issues.apache.org/jira/browse/SPARK-16805 Project: Spark Issue Type: Improvement Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin It is useful to log the timezone when query result does not match, especially on build machines that have different timezone from AMPLab Jenkins.
[jira] [Commented] (SPARK-16785) dapply doesn't return array or raw columns
[ https://issues.apache.org/jira/browse/SPARK-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400083#comment-15400083 ] Clark Fitzgerald commented on SPARK-16785: -- Yes I can do a PR for this. It may take me a little while though. > dapply doesn't return array or raw columns > -- > > Key: SPARK-16785 > URL: https://issues.apache.org/jira/browse/SPARK-16785 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.0.0 > Environment: Mac OS X >Reporter: Clark Fitzgerald >Priority: Minor > > Calling SparkR::dapplyCollect with R functions that return dataframes > produces an error. This comes up when returning columns of binary data- ie. > serialized fitted models. Also happens when functions return columns > containing vectors. > The error message: > R computation failed with > Error in (function (..., deparse.level = 1, make.row.names = TRUE, > stringsAsFactors = default.stringsAsFactors()) : > invalid list argument: all variables should have the same length > Reproducible example: > https://github.com/clarkfitzg/phd_research/blob/master/ddR/spark/sparkR_dapplyCollect7.R > Relates to SPARK-16611
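The R error quoted in the report comes from binding per-partition results whose columns do not all have the same length (vector or raw columns break the rectangular-frame assumption). A rough Python analogue of that consistency check, as an illustration of the failure mode rather than SparkR's actual implementation:

```python
def bind_rows(rows):
    # Stacking rows into a rectangular frame requires every row to have the
    # same number of fields; R's rbind raises the quoted error otherwise.
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError("all variables should have the same length")
    return [list(row) for row in rows]

ok = bind_rows([(1, "a"), (2, "b")])  # uniform rows stack fine

try:
    # A row carrying an extra vector-valued field trips the check.
    bind_rows([(1, [0.1, 0.2]), (2,)])
    raised = False
except ValueError:
    raised = True
```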
[jira] [Assigned] (SPARK-15355) Pro-active block replenishment in case of node/executor failures
[ https://issues.apache.org/jira/browse/SPARK-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15355: Assignee: (was: Apache Spark) > Pro-active block replenishment in case of node/executor failures > > > Key: SPARK-15355 > URL: https://issues.apache.org/jira/browse/SPARK-15355 > Project: Spark > Issue Type: Sub-task > Components: Block Manager, Spark Core >Reporter: Shubham Chopra > > Spark currently does not replenish lost replicas. For resiliency and high > availability, BlockManagerMasterEndpoint can proactively verify whether all > cached RDDs have enough replicas, and replenish them, in case they don’t.
[jira] [Assigned] (SPARK-15355) Pro-active block replenishment in case of node/executor failures
[ https://issues.apache.org/jira/browse/SPARK-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-15355: Assignee: Apache Spark > Pro-active block replenishment in case of node/executor failures > > > Key: SPARK-15355 > URL: https://issues.apache.org/jira/browse/SPARK-15355 > Project: Spark > Issue Type: Sub-task > Components: Block Manager, Spark Core >Reporter: Shubham Chopra >Assignee: Apache Spark > > Spark currently does not replenish lost replicas. For resiliency and high > availability, BlockManagerMasterEndpoint can proactively verify whether all > cached RDDs have enough replicas, and replenish them, in case they don’t.
[jira] [Commented] (SPARK-15355) Pro-active block replenishment in case of node/executor failures
[ https://issues.apache.org/jira/browse/SPARK-15355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400078#comment-15400078 ] Apache Spark commented on SPARK-15355: -- User 'shubhamchopra' has created a pull request for this issue: https://github.com/apache/spark/pull/14412 > Pro-active block replenishment in case of node/executor failures > > > Key: SPARK-15355 > URL: https://issues.apache.org/jira/browse/SPARK-15355 > Project: Spark > Issue Type: Sub-task > Components: Block Manager, Spark Core >Reporter: Shubham Chopra > > Spark currently does not replenish lost replicas. For resiliency and high > availability, BlockManagerMasterEndpoint can proactively verify whether all > cached RDDs have enough replicas, and replenish them, in case they don’t.
[jira] [Commented] (SPARK-16804) Correlated subqueries containing LIMIT return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400075#comment-15400075 ] Nattavut Sutyanyong commented on SPARK-16804: - I am working on a fix. Could you please assign this JIRA to nsyca? > Correlated subqueries containing LIMIT return incorrect results > --- > > Key: SPARK-16804 > URL: https://issues.apache.org/jira/browse/SPARK-16804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Nattavut Sutyanyong > Original Estimate: 72h > Remaining Estimate: 72h > > Correlated subqueries with LIMIT could return incorrect results. The rule > ResolveSubquery in the Analysis phase moves correlated predicates to a join > predicates and neglect the semantic of the LIMIT. > Example: > Seq(1, 2).toDF("c1").createOrReplaceTempView("t1") > Seq(1, 2).toDF("c2").createOrReplaceTempView("t2") > sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT > 1)").show > +---+ > > | c1| > +---+ > | 1| > +---+ > The correct result contains both rows from T1.
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400070#comment-15400070 ] Charles Allen commented on SPARK-16798: --- I am definitely running a *modified* 2.0.0, but modifications are in the scheduler, not the RDD paths. Right now I'm running 1.5.2_2.11 through the deployment system to get as close to apples-to-apples as I can (and so that workflows can be swapped between the two ad-hoc) > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Code at https://github.com/metamx/druid-spark-batch which was working under > 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. > {code} > java.lang.IllegalArgumentException: bound must be positive > at java.util.Random.nextInt(Random.java:388) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code}
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400071#comment-15400071 ] Charles Allen commented on SPARK-16798: --- I'll run 2.0.0 stock as another test that will go out during this push. > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Code at https://github.com/metamx/druid-spark-batch which was working under > 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. > {code} > java.lang.IllegalArgumentException: bound must be positive > at java.util.Random.nextInt(Random.java:388) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
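The failing call in the quoted trace is `java.util.Random.nextInt(bound)`, which throws `IllegalArgumentException: bound must be positive` whenever the bound is not strictly positive. The frames point at `RDD.coalesce`, which draws a random starting partition per task, so the error suggests the partition count used as the bound reached zero. A minimal Python sketch of that failing pattern, assuming a per-task seeded draw; the function name and seeding are illustrative, not Spark's actual code:

```python
import random

def choose_start_partition(task_index, num_partitions):
    # Mirrors the java.util.Random.nextInt(bound) contract from the trace:
    # the bound must be strictly positive, or the draw is rejected outright.
    if num_partitions <= 0:
        raise ValueError("bound must be positive")
    # Seeding by task index makes the choice deterministic per task.
    return random.Random(task_index).randrange(num_partitions)

ok = choose_start_partition(9, 4)  # a valid draw in [0, 4)

try:
    choose_start_partition(9, 0)   # reproduces the reported failure mode
    failed = False
except ValueError:
    failed = True
```

This would be consistent with the report that a local rerun on a data subset succeeds: a different partition count on the cluster could push the bound to zero there but not locally.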
[jira] [Commented] (SPARK-16804) Correlated subqueries containing LIMIT return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400068#comment-15400068 ] Apache Spark commented on SPARK-16804: -- User 'nsyca' has created a pull request for this issue: https://github.com/apache/spark/pull/14411 > Correlated subqueries containing LIMIT return incorrect results > --- > > Key: SPARK-16804 > URL: https://issues.apache.org/jira/browse/SPARK-16804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Nattavut Sutyanyong > Original Estimate: 72h > Remaining Estimate: 72h > > Correlated subqueries with LIMIT could return incorrect results. The rule > ResolveSubquery in the Analysis phase moves correlated predicates to a join > predicates and neglect the semantic of the LIMIT. > Example: > Seq(1, 2).toDF("c1").createOrReplaceTempView("t1") > Seq(1, 2).toDF("c2").createOrReplaceTempView("t2") > sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT > 1)").show > +---+ > > | c1| > +---+ > | 1| > +---+ > The correct result contains both rows from T1.
[jira] [Assigned] (SPARK-16804) Correlated subqueries containing LIMIT return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16804: Assignee: (was: Apache Spark) > Correlated subqueries containing LIMIT return incorrect results > --- > > Key: SPARK-16804 > URL: https://issues.apache.org/jira/browse/SPARK-16804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Nattavut Sutyanyong > Original Estimate: 72h > Remaining Estimate: 72h > > Correlated subqueries with LIMIT could return incorrect results. The rule > ResolveSubquery in the Analysis phase moves correlated predicates to a join > predicates and neglect the semantic of the LIMIT. > Example: > Seq(1, 2).toDF("c1").createOrReplaceTempView("t1") > Seq(1, 2).toDF("c2").createOrReplaceTempView("t2") > sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT > 1)").show > +---+ > > | c1| > +---+ > | 1| > +---+ > The correct result contains both rows from T1.
[jira] [Assigned] (SPARK-16804) Correlated subqueries containing LIMIT return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16804: Assignee: Apache Spark > Correlated subqueries containing LIMIT return incorrect results > --- > > Key: SPARK-16804 > URL: https://issues.apache.org/jira/browse/SPARK-16804 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Nattavut Sutyanyong >Assignee: Apache Spark > Original Estimate: 72h > Remaining Estimate: 72h > > Correlated subqueries with LIMIT could return incorrect results. The rule > ResolveSubquery in the Analysis phase moves correlated predicates to a join > predicates and neglect the semantic of the LIMIT. > Example: > Seq(1, 2).toDF("c1").createOrReplaceTempView("t1") > Seq(1, 2).toDF("c2").createOrReplaceTempView("t2") > sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT > 1)").show > +---+ > > | c1| > +---+ > | 1| > +---+ > The correct result contains both rows from T1.
[jira] [Commented] (SPARK-16802) joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/SPARK-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400045#comment-15400045 ] Miao Wang commented on SPARK-16802: --- Very easy to reproduce. I am learning SQL code now. > joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException > > > Key: SPARK-16802 > URL: https://issues.apache.org/jira/browse/SPARK-16802 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1 >Reporter: Sylvain Zimmer > > Hello! > This is a little similar to > [SPARK-16740|https://issues.apache.org/jira/browse/SPARK-16740] (should I > have reopened it?). > I would recommend giving another full review to {{HashedRelation.scala}}, > particularly the new {{LongToUnsafeRowMap}} code. I've had a few other errors > that I haven't managed to reproduce so far, as well as what I suspect could > be memory leaks (I have a query in a loop OOMing after a few iterations > despite not caching its results). > Here is the script to reproduce the ArrayIndexOutOfBoundsException on the > current 2.0 branch: > {code} > import os > import random > from pyspark import SparkContext > from pyspark.sql import types as SparkTypes > from pyspark.sql import SQLContext > sc = SparkContext() > sqlc = SQLContext(sc) > schema1 = SparkTypes.StructType([ > SparkTypes.StructField("id1", SparkTypes.LongType(), nullable=True) > ]) > schema2 = SparkTypes.StructType([ > SparkTypes.StructField("id2", SparkTypes.LongType(), nullable=True) > ]) > def randlong(): > return random.randint(-9223372036854775808, 9223372036854775807) > while True: > l1, l2 = randlong(), randlong() > # Sample values that crash: > # l1, l2 = 4661454128115150227, -5543241376386463808 > print "Testing with %s, %s" % (l1, l2) > data1 = [(l1, ), (l2, )] > data2 = [(l1, )] > df1 = sqlc.createDataFrame(sc.parallelize(data1), schema1) > df2 = sqlc.createDataFrame(sc.parallelize(data2), schema2) > crash = True > if crash: > os.system("rm -rf /tmp/sparkbug")
> df1.write.parquet("/tmp/sparkbug/vertex") > df2.write.parquet("/tmp/sparkbug/edge") > df1 = sqlc.read.load("/tmp/sparkbug/vertex") > df2 = sqlc.read.load("/tmp/sparkbug/edge") > sqlc.registerDataFrameAsTable(df1, "df1") > sqlc.registerDataFrameAsTable(df2, "df2") > result_df = sqlc.sql(""" > SELECT > df1.id1 > FROM df1 > LEFT OUTER JOIN df2 ON df1.id1 = df2.id2 > """) > print result_df.collect() > {code} > {code} > java.lang.ArrayIndexOutOfBoundsException: 1728150825 > at > org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.getValue(HashedRelation.scala:463) > at > org.apache.spark.sql.execution.joins.LongHashedRelation.getValue(HashedRelation.scala:762) > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown > Source) > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > at > org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at > org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) > at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) > at > org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.to(SerDeUtil.scala:112) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) > at > org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toBuffer(SerDeUtil.scala:112) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) > at > 
org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toArray(SerDeUtil.scala:112) > at > org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:899) > at > org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:899) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1898) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1898) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at
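As a rough intuition for how extreme 64-bit keys can break array indexing (a schematic, assumed mechanism for illustration only, not the actual {{LongToUnsafeRowMap}} logic): any scheme that derives an array position from a difference of long keys overflows once the keys span more than the signed 64-bit range, as the sample crashing values from the report do.

```python
# Schematic sketch only: derive an "index span" from two 64-bit keys the
# way a dense long-keyed map might (key - minKey), using Java-style
# wrapping 64-bit signed arithmetic.
def to_int64(x):
    # Emulate Java's wrapping 64-bit signed arithmetic.
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x

# Sample values that crash, taken from the reproduction script above:
l1, l2 = 4661454128115150227, -5543241376386463808

span = to_int64(max(l1, l2) - min(l1, l2))
print(span)   # negative: the subtraction wrapped around
# Any array sizing or bounds check derived from span is now wrong,
# which is consistent with an out-of-range index like 1728150825.
assert span < 0
```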
[jira] [Created] (SPARK-16804) Correlated subqueries containing LIMIT return incorrect results
Nattavut Sutyanyong created SPARK-16804: --- Summary: Correlated subqueries containing LIMIT return incorrect results Key: SPARK-16804 URL: https://issues.apache.org/jira/browse/SPARK-16804 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Nattavut Sutyanyong Correlated subqueries with LIMIT can return incorrect results. The rule ResolveSubquery in the Analysis phase moves correlated predicates into the join predicates and neglects the semantics of the LIMIT. Example: Seq(1, 2).toDF("c1").createOrReplaceTempView("t1") Seq(1, 2).toDF("c2").createOrReplaceTempView("t2") sql("select c1 from t1 where exists (select 1 from t2 where t1.c1=t2.c2 LIMIT 1)").show +---+ | c1| +---+ | 1| +---+ The correct result contains both rows of t1.
[jira] [Commented] (SPARK-16803) SaveAsTable does not work when source DataFrame is built on a Hive Table
[ https://issues.apache.org/jira/browse/SPARK-16803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400029#comment-15400029 ] Apache Spark commented on SPARK-16803: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/14410 > SaveAsTable does not work when source DataFrame is built on a Hive Table > > > Key: SPARK-16803 > URL: https://issues.apache.org/jira/browse/SPARK-16803 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > {noformat} > scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as > key, 'abc' as value") > res2: org.apache.spark.sql.DataFrame = [] > scala> val df = sql("select key, value as value from sample.sample") > df: org.apache.spark.sql.DataFrame = [key: int, value: string] > scala> df.write.mode("append").saveAsTable("sample.sample") > scala> sql("select * from sample.sample").show() > +---+-----+ > |key|value| > +---+-----+ > | 1| abc| > | 1| abc| > +---+-----+ > {noformat} > It works in Spark 1.6, but not in Spark 2.0. The error message from > Spark 2.0 is > {noformat} > scala> df.write.mode("append").saveAsTable("sample.sample") > org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation > sample, sample > is not supported.; > {noformat} > So far, we do not plan to support it in Spark 2.0. Spark 1.6 works because it > internally uses {{insertInto}}. But, if we change it back, it will break the > semantics of {{saveAsTable}} (this method uses by-name resolution, whereas > {{insertInto}} uses by-position resolution). > Instead, users should use the {{insertInto}} API. We should improve the error > message so that users understand how to work around this until it is supported.
[jira] [Assigned] (SPARK-16803) SaveAsTable does not work when source DataFrame is built on a Hive Table
[ https://issues.apache.org/jira/browse/SPARK-16803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16803: Assignee: Apache Spark > SaveAsTable does not work when source DataFrame is built on a Hive Table > > > Key: SPARK-16803 > URL: https://issues.apache.org/jira/browse/SPARK-16803 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li >Assignee: Apache Spark > > {noformat} > scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as > key, 'abc' as value") > res2: org.apache.spark.sql.DataFrame = [] > scala> val df = sql("select key, value as value from sample.sample") > df: org.apache.spark.sql.DataFrame = [key: int, value: string] > scala> df.write.mode("append").saveAsTable("sample.sample") > scala> sql("select * from sample.sample").show() > +---+-----+ > |key|value| > +---+-----+ > | 1| abc| > | 1| abc| > +---+-----+ > {noformat} > It works in Spark 1.6, but not in Spark 2.0. The error message from > Spark 2.0 is > {noformat} > scala> df.write.mode("append").saveAsTable("sample.sample") > org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation > sample, sample > is not supported.; > {noformat} > So far, we do not plan to support it in Spark 2.0. Spark 1.6 works because it > internally uses {{insertInto}}. But, if we change it back, it will break the > semantics of {{saveAsTable}} (this method uses by-name resolution, whereas > {{insertInto}} uses by-position resolution). > Instead, users should use the {{insertInto}} API. We should improve the error > message so that users understand how to work around this until it is supported.
[jira] [Assigned] (SPARK-16803) SaveAsTable does not work when source DataFrame is built on a Hive Table
[ https://issues.apache.org/jira/browse/SPARK-16803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16803: Assignee: (was: Apache Spark) > SaveAsTable does not work when source DataFrame is built on a Hive Table > > > Key: SPARK-16803 > URL: https://issues.apache.org/jira/browse/SPARK-16803 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > {noformat} > scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as > key, 'abc' as value") > res2: org.apache.spark.sql.DataFrame = [] > scala> val df = sql("select key, value as value from sample.sample") > df: org.apache.spark.sql.DataFrame = [key: int, value: string] > scala> df.write.mode("append").saveAsTable("sample.sample") > scala> sql("select * from sample.sample").show() > +---+-----+ > |key|value| > +---+-----+ > | 1| abc| > | 1| abc| > +---+-----+ > {noformat} > It works in Spark 1.6, but not in Spark 2.0. The error message from > Spark 2.0 is > {noformat} > scala> df.write.mode("append").saveAsTable("sample.sample") > org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation > sample, sample > is not supported.; > {noformat} > So far, we do not plan to support it in Spark 2.0. Spark 1.6 works because it > internally uses {{insertInto}}. But, if we change it back, it will break the > semantics of {{saveAsTable}} (this method uses by-name resolution, whereas > {{insertInto}} uses by-position resolution). > Instead, users should use the {{insertInto}} API. We should improve the error > message so that users understand how to work around this until it is supported.
[jira] [Commented] (SPARK-16789) Can't run saveAsTable with database name
[ https://issues.apache.org/jira/browse/SPARK-16789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400025#comment-15400025 ] Xiao Li commented on SPARK-16789: - Please check the description of the JIRA: https://issues.apache.org/jira/browse/SPARK-16803 Thanks! > Can't run saveAsTable with database name > > > Key: SPARK-16789 > URL: https://issues.apache.org/jira/browse/SPARK-16789 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: CentOS 7 JDK 1.8 Hive 1.2.1 >Reporter: SonixLegend > > The function {{saveAsTable}} with a database-qualified table name runs > successfully on 1.6.2, but after upgrading to 2.0 it fails with the error below. My code > and the error message follow. Can you help me? > conf/hive-site.xml > <property> > <name>hive.metastore.uris</name> > <value>thrift://localhost:9083</value> > </property> > val spark = > SparkSession.builder().appName("SparkHive").enableHiveSupport().getOrCreate() > import spark.implicits._ > import spark.sql > val source = sql("select * from sample.sample") > source.createOrReplaceTempView("test") > source.collect.foreach{tuple => println(tuple(0) + ":" + tuple(1))} > val target = sql("select key, 'Spark' as value from test") > println(target.count()) > target.write.mode(SaveMode.Append).saveAsTable("sample.sample") > spark.stop() > Exception in thread "main" org.apache.spark.sql.AnalysisException: Saving > data in MetastoreRelation sample, sample > is not supported.; > at > org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:218) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86) > at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:378) > at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354) > at com.newtouch.sample.SparkHive$.main(SparkHive.scala:25) > at com.newtouch.sample.SparkHive.main(SparkHive.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[jira] [Closed] (SPARK-16789) Can't run saveAsTable with database name
[ https://issues.apache.org/jira/browse/SPARK-16789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-16789. --- Resolution: Duplicate > Can't run saveAsTable with database name > > > Key: SPARK-16789 > URL: https://issues.apache.org/jira/browse/SPARK-16789 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 > Environment: CentOS 7 JDK 1.8 Hive 1.2.1 >Reporter: SonixLegend > > The function {{saveAsTable}} with a database-qualified table name runs > successfully on 1.6.2, but after upgrading to 2.0 it fails with the error below. My code > and the error message follow. Can you help me? > conf/hive-site.xml > <property> > <name>hive.metastore.uris</name> > <value>thrift://localhost:9083</value> > </property> > val spark = > SparkSession.builder().appName("SparkHive").enableHiveSupport().getOrCreate() > import spark.implicits._ > import spark.sql > val source = sql("select * from sample.sample") > source.createOrReplaceTempView("test") > source.collect.foreach{tuple => println(tuple(0) + ":" + tuple(1))} > val target = sql("select key, 'Spark' as value from test") > println(target.count()) > target.write.mode(SaveMode.Append).saveAsTable("sample.sample") > spark.stop() > Exception in thread "main" org.apache.spark.sql.AnalysisException: Saving > data in MetastoreRelation sample, sample > is not supported.; > at > org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:218) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:60) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86) > at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:378) > at > org.apache.spark.sql.DataFrameWriter.saveAsTable(DataFrameWriter.scala:354) > at com.newtouch.sample.SparkHive$.main(SparkHive.scala:25) > at com.newtouch.sample.SparkHive.main(SparkHive.scala) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[jira] [Updated] (SPARK-16803) SaveAsTable does not work when source DataFrame is built on a Hive Table
[ https://issues.apache.org/jira/browse/SPARK-16803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-16803: Description: {noformat} scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as key, 'abc' as value") res2: org.apache.spark.sql.DataFrame = [] scala> val df = sql("select key, value as value from sample.sample") df: org.apache.spark.sql.DataFrame = [key: int, value: string] scala> df.write.mode("append").saveAsTable("sample.sample") scala> sql("select * from sample.sample").show() +---+-+ |key|value| +---+-+ | 1| abc| | 1| abc| +---+-+ {noformat} In Spark 1.6, it works, but Spark 2.0 does not work. The error message from Spark 2.0 is {noformat} scala> df.write.mode("append").saveAsTable("sample.sample") org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation sample, sample is not supported.; {noformat} So far, we do not plan to support it in Spark 2.0. Spark 1.6 works because it internally uses {{insertInto}}. But, if we change it back it will break the semantic of {{saveAsTable}} (this method uses by-name resolution instead of using by-position resolution used by {{insertInto}}). Instead, users should use {{insertInto}} API. We should correct the error messages. Users can understand how to bypass it before we support it. was: {noformat} scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as key, 'abc' as value") res2: org.apache.spark.sql.DataFrame = [] scala> val df = sql("select key, value as value from sample.sample") df: org.apache.spark.sql.DataFrame = [key: int, value: string] scala> df.write.mode("append").saveAsTable("sample.sample") scala> sql("select * from sample.sample").show() +---+-+ |key|value| +---+-+ | 1| abc| | 1| abc| +---+-+ {noformat} In Spark 1.6, it works, but Spark 2.0 does not work. 
The error message from Spark 2.0 is {{noformat}} scala> df.write.mode("append").saveAsTable("sample.sample") org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation sample, sample is not supported.; {{noformat}} So far, we do not plan to support it in Spark 2.0. Spark 1.6 works because it internally uses {{insertInto}}. But, if we change it back it will break the semantic of {{saveAsTable}} (this method uses by-name resolution instead of using by-position resolution used by {{insertInto}}). Instead, users should use {{insertInto}} API. We should correct the error messages. Users can understand how to bypass it before we support it. > SaveAsTable does not work when source DataFrame is built on a Hive Table > > > Key: SPARK-16803 > URL: https://issues.apache.org/jira/browse/SPARK-16803 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > {noformat} > scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as > key, 'abc' as value") > res2: org.apache.spark.sql.DataFrame = [] > scala> val df = sql("select key, value as value from sample.sample") > df: org.apache.spark.sql.DataFrame = [key: int, value: string] > scala> df.write.mode("append").saveAsTable("sample.sample") > scala> sql("select * from sample.sample").show() > +---+-+ > |key|value| > +---+-+ > | 1| abc| > | 1| abc| > +---+-+ > {noformat} > In Spark 1.6, it works, but Spark 2.0 does not work. The error message from > Spark 2.0 is > {noformat} > scala> df.write.mode("append").saveAsTable("sample.sample") > org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation > sample, sample > is not supported.; > {noformat} > So far, we do not plan to support it in Spark 2.0. Spark 1.6 works because it > internally uses {{insertInto}}. But, if we change it back it will break the > semantic of {{saveAsTable}} (this method uses by-name resolution instead of > using by-position resolution used by {{insertInto}}). 
> Instead, users should use the {{insertInto}} API. We should improve the error > message so that users understand how to work around this until it is supported.
[jira] [Updated] (SPARK-16803) SaveAsTable does not work when source DataFrame is built on a Hive Table
[ https://issues.apache.org/jira/browse/SPARK-16803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-16803: Description: {noformat} scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as key, 'abc' as value") res2: org.apache.spark.sql.DataFrame = [] scala> val df = sql("select key, value as value from sample.sample") df: org.apache.spark.sql.DataFrame = [key: int, value: string] scala> df.write.mode("append").saveAsTable("sample.sample") scala> sql("select * from sample.sample").show() +---+-+ |key|value| +---+-+ | 1| abc| | 1| abc| +---+-+ {noformat} In Spark 1.6, it works, but Spark 2.0 does not work. The error message from Spark 2.0 is {{noformat}} scala> df.write.mode("append").saveAsTable("sample.sample") org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation sample, sample is not supported.; {{noformat}} So far, we do not plan to support it in Spark 2.0. Spark 1.6 works because it internally uses {{insertInto}}. But, if we change it back it will break the semantic of {{saveAsTable}} (this method uses by-name resolution instead of using by-position resolution used by {{insertInto}}). Instead, users should use {{insertInto}} API. We should correct the error messages. Users can understand how to bypass it before we support it. was: {{noformat}} scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as key, 'abc' as value") res2: org.apache.spark.sql.DataFrame = [] scala> val df = sql("select key, value as value from sample.sample") df: org.apache.spark.sql.DataFrame = [key: int, value: string] scala> df.write.mode("append").saveAsTable("sample.sample") scala> sql("select * from sample.sample").show() +---+-+ |key|value| +---+-+ | 1| abc| | 1| abc| +---+-+ {{noformat}} In Spark 1.6, it works, but Spark 2.0 does not work. 
The error message from Spark 2.0 is {{noformat}} scala> df.write.mode("append").saveAsTable("sample.sample") org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation sample, sample is not supported.; {{noformat}} So far, we do not plan to support it in Spark 2.0. Spark 1.6 works because it internally uses {{insertInto}}. But, if we change it back it will break the semantic of {{saveAsTable}} (this method uses by-name resolution instead of using by-position resolution used by {{insertInto}}). Instead, users should use {{insertInto}} API. We should correct the error messages. Users can understand how to bypass it before we support it. > SaveAsTable does not work when source DataFrame is built on a Hive Table > > > Key: SPARK-16803 > URL: https://issues.apache.org/jira/browse/SPARK-16803 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > {noformat} > scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as > key, 'abc' as value") > res2: org.apache.spark.sql.DataFrame = [] > scala> val df = sql("select key, value as value from sample.sample") > df: org.apache.spark.sql.DataFrame = [key: int, value: string] > scala> df.write.mode("append").saveAsTable("sample.sample") > scala> sql("select * from sample.sample").show() > +---+-+ > |key|value| > +---+-+ > | 1| abc| > | 1| abc| > +---+-+ > {noformat} > In Spark 1.6, it works, but Spark 2.0 does not work. The error message from > Spark 2.0 is > {{noformat}} > scala> df.write.mode("append").saveAsTable("sample.sample") > org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation > sample, sample > is not supported.; > {{noformat}} > So far, we do not plan to support it in Spark 2.0. Spark 1.6 works because it > internally uses {{insertInto}}. But, if we change it back it will break the > semantic of {{saveAsTable}} (this method uses by-name resolution instead of > using by-position resolution used by {{insertInto}}). 
> Instead, users should use the {{insertInto}} API. We should improve the error > message so that users understand how to work around this until it is supported.
[jira] [Created] (SPARK-16803) SaveAsTable does not work when source DataFrame is built on a Hive Table
Xiao Li created SPARK-16803: --- Summary: SaveAsTable does not work when source DataFrame is built on a Hive Table Key: SPARK-16803 URL: https://issues.apache.org/jira/browse/SPARK-16803 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0 Reporter: Xiao Li {{noformat}} scala> sql("create table sample.sample stored as SEQUENCEFILE as select 1 as key, 'abc' as value") res2: org.apache.spark.sql.DataFrame = [] scala> val df = sql("select key, value as value from sample.sample") df: org.apache.spark.sql.DataFrame = [key: int, value: string] scala> df.write.mode("append").saveAsTable("sample.sample") scala> sql("select * from sample.sample").show() +---+-----+ |key|value| +---+-----+ | 1| abc| | 1| abc| +---+-----+ {{noformat}} It works in Spark 1.6, but not in Spark 2.0. The error message from Spark 2.0 is {{noformat}} scala> df.write.mode("append").saveAsTable("sample.sample") org.apache.spark.sql.AnalysisException: Saving data in MetastoreRelation sample, sample is not supported.; {{noformat}} So far, we do not plan to support it in Spark 2.0. Spark 1.6 works because it internally uses {{insertInto}}. But, if we change it back, it will break the semantics of {{saveAsTable}} (this method uses by-name resolution, whereas {{insertInto}} uses by-position resolution). Instead, users should use the {{insertInto}} API. We should improve the error message so that users understand how to work around this until it is supported.
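The by-name versus by-position distinction driving this decision can be sketched in plain Python (illustrative only, with a hypothetical one-row table; this is not Spark's implementation):

```python
# Target table columns, and a source "DataFrame" whose columns happen to
# be declared in a different order.
table_columns = ["key", "value"]
df_columns = ["value", "key"]
row = {"value": "abc", "key": 1}

# saveAsTable-style resolution: match source columns to table columns by NAME,
# so the source column order does not matter.
by_name = [row[name] for name in table_columns]
print(by_name)  # → [1, 'abc']

# insertInto-style resolution: the i-th source column feeds the i-th table
# column by POSITION, regardless of names -- here "key" would receive 'abc'.
by_position = [row[df_columns[i]] for i in range(len(table_columns))]
print(by_position)  # → ['abc', 1]
```

Silently switching {{saveAsTable}} back to positional behavior would change which of these two answers users get, which is why the fix is a clearer error message rather than reverting to {{insertInto}}.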
[jira] [Updated] (SPARK-16441) Spark application hang when dynamic allocation is enabled
[ https://issues.apache.org/jira/browse/SPARK-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dhruve Ashar updated SPARK-16441: - Affects Version/s: 2.0.0 > Spark application hang when dynamic allocation is enabled > - > > Key: SPARK-16441 > URL: https://issues.apache.org/jira/browse/SPARK-16441 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 > Environment: hadoop 2.7.2 spark1.6.2 >Reporter: cen yuhai > > The Spark application waits for an RPC response indefinitely, and the Spark > listener is blocked by the dynamic allocation lock. Executors cannot connect to > the driver and are lost. > "spark-dynamic-executor-allocation" #239 daemon prio=5 os_prio=0 > tid=0x7fa304438000 nid=0xcec6 waiting on condition [0x7fa2b81e4000] >java.lang.Thread.State: TIMED_WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0x00070fdb94f8> (a > scala.concurrent.impl.Promise$CompletionLatch) > at > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328) > at > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208) > at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) > at > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107) > at > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > at scala.concurrent.Await$.result(package.scala:107) > at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) > at > 
org.apache.spark.scheduler.cluster.YarnSchedulerBackend.doRequestTotalExecutors(YarnSchedulerBackend.scala:59) > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:436) > - locked <0x828a8960> (a > org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend) > at > org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1438) > at > org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:359) > at > org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:310) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:264) > - locked <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.ExecutorAllocationManager$$anon$2.run(ExecutorAllocationManager.scala:223) > "SparkListenerBus" #161 daemon prio=5 os_prio=0 tid=0x7fa3053be000 > nid=0xcec9 waiting for monitor entry [0x7fa2b3dfc000] >java.lang.Thread.State: BLOCKED (on object monitor) > at > org.apache.spark.ExecutorAllocationManager$ExecutorAllocationListener.onTaskEnd(ExecutorAllocationManager.scala:618) > - waiting to lock <0x880e6308> (a > org.apache.spark.ExecutorAllocationManager) > at > org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:42) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31) > at > org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:55) > at > org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:37) > at > 
org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(AsynchronousListenerBus.scala:80) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(AsynchronousListenerBus.scala:65) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57) > at > org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:64) > at
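The two stack traces above describe a classic blocking-call-under-lock pattern: the "spark-dynamic-executor-allocation" thread holds the ExecutorAllocationManager monitor while awaiting an RPC reply, and the SparkListenerBus thread that would let things progress is itself BLOCKED on that same monitor. A minimal, hypothetical sketch of the pattern (thread and lock names are illustrative, not Spark's actual code):

```python
import threading
import time

manager_lock = threading.Lock()   # stands in for the ExecutorAllocationManager monitor
rpc_reply = threading.Event()     # stands in for the awaited RPC response
results = {}

def allocation_thread():
    with manager_lock:                                       # schedule() holds the lock...
        results["got_reply"] = rpc_reply.wait(timeout=0.2)   # ...while blocking on the RPC

def listener_thread():
    with manager_lock:            # onTaskEnd needs the same lock, so it stays BLOCKED...
        rpc_reply.set()           # ...and the "reply" is only delivered after the timeout

a = threading.Thread(target=allocation_thread)
b = threading.Thread(target=listener_thread)
a.start()
time.sleep(0.05)                  # let the allocation thread take the lock first
b.start()
a.join()
b.join()
# results["got_reply"] is False: the wait timed out because the thread
# that would have satisfied it was blocked on the lock the waiter held
```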
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400015#comment-15400015 ] Sean Owen commented on SPARK-16798: --- Yeah I noticed that too, and while the conclusion is probably the same, I wasn't sure why we didn't see a different message there. You are definitely running unmodified 2.0.0? > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Code at https://github.com/metamx/druid-spark-batch which was working under > 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. > {code} > java.lang.IllegalArgumentException: bound must be positive > at java.util.Random.nextInt(Random.java:388) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > 
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16798) java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2
[ https://issues.apache.org/jira/browse/SPARK-16798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400014#comment-15400014 ] Charles Allen commented on SPARK-16798: --- The super odd thing here is that RDD.scala:445 *SHOULD* be protected by the check introduced in https://github.com/apache/spark/pull/13282 , but for some reason it does not seem to be. > java.lang.IllegalArgumentException: bound must be positive : Worked in 1.5.2 > > > Key: SPARK-16798 > URL: https://issues.apache.org/jira/browse/SPARK-16798 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.0 >Reporter: Charles Allen > > Code at https://github.com/metamx/druid-spark-batch which was working under > 1.5.2 has ceased to function under 2.0.0 with the below stacktrace. > {code} > java.lang.IllegalArgumentException: bound must be positive > at java.util.Random.nextInt(Random.java:388) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:445) > at > org.apache.spark.rdd.RDD$$anonfun$coalesce$1$$anonfun$9.apply(RDD.scala:444) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$25.apply(RDD.scala:807) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at 
java.lang.Thread.run(Thread.java:745) > {code}
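For context, `java.util.Random.nextInt(bound)` throws `IllegalArgumentException: bound must be positive` whenever `bound <= 0`, and the `coalesce` code at RDD.scala:445 draws a random starting position from a partition count. A hedged Python sketch of that precondition (the guard mirrors the JDK's documented contract; `random_start_partition` is our illustrative name, not Spark's):

```python
import random

def next_int(bound):
    """Mimic java.util.Random.nextInt(int bound): bound must be > 0."""
    if bound <= 0:
        raise ValueError("bound must be positive")
    return random.randrange(bound)

# coalesce picks a random parent partition to start from; if the computed
# bound ever comes out as 0 (e.g. an empty parent RDD slips past the guard
# from PR 13282), the call fails exactly like the stack trace above
def random_start_partition(num_parent_partitions):
    return next_int(num_parent_partitions)
```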
[jira] [Commented] (SPARK-14818) Move sketch and mllibLocal out from mima exclusion
[ https://issues.apache.org/jira/browse/SPARK-14818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15400010#comment-15400010 ] Sean Owen commented on SPARK-14818: --- [~yhuai] do you want to implement this now? > Move sketch and mllibLocal out from mima exclusion > -- > > Key: SPARK-14818 > URL: https://issues.apache.org/jira/browse/SPARK-14818 > Project: Spark > Issue Type: Bug > Components: Build >Reporter: Yin Huai >Priority: Blocker > > In SparkBuild.scala, we exclude sketch and mllibLocal from the MiMa check (see > the definition of mimaProjects). After we release 2.0, we should move them > out from this list, so that we can check binary compatibility for them.
[jira] [Resolved] (SPARK-16667) Spark driver executor dont release unused memory
[ https://issues.apache.org/jira/browse/SPARK-16667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16667. --- Resolution: Not A Problem > Spark driver executor dont release unused memory > > > Key: SPARK-16667 > URL: https://issues.apache.org/jira/browse/SPARK-16667 > Project: Spark > Issue Type: Bug > Components: GraphX, Spark Core >Affects Versions: 1.6.0 > Environment: Ubuntu wily 64 bits > java 1.8 > 3 slaves (4GB) 1 master (2GB) virtual machines in VMware over an i7 4th generation > with 16 GB RAM >Reporter: Luis Angel Hernández Acosta > > I'm running a Spark app in a standalone cluster. My app creates a SparkContext and > performs many GraphX calculations over time. For each calculation, my app creates a > new Java thread and waits for its completion signal. Between calculations, > memory grows by 50-100 MB. I use a dedicated thread to make sure that every object created > for a calculation is destroyed after the calculation ends, but memory keeps growing. I > tried stopping the SparkContext: all executor memory allocated by the app is freed, > but my driver's memory still grows by the same 50-100 MB. > My graph calculation includes HDFS serialization of RDDs and loading the graph from HDFS. > Spark env: > export SPARK_MASTER_IP=master > export SPARK_WORKER_CORES=4 > export SPARK_WORKER_MEMORY=2919m > export SPARK_WORKER_INSTANCES=1 > export SPARK_DAEMON_MEMORY=256m > export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true > -Dspark.worker.cleanup.interval=10" > Those are my only configurations.
[jira] [Resolved] (SPARK-16725) Migrate Guava to 16+?
[ https://issues.apache.org/jira/browse/SPARK-16725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16725. --- Resolution: Won't Fix > Migrate Guava to 16+? > - > > Key: SPARK-16725 > URL: https://issues.apache.org/jira/browse/SPARK-16725 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 2.0.1 >Reporter: Min Wei >Priority: Minor > Original Estimate: 12h > Remaining Estimate: 12h > > Currently Spark depends on an old version of Guava, version 14. However > Spark-cassandra driver asserts on Guava version 16 and above. > It would be great to update the Guava dependency to version 16+ > diff --git a/core/src/main/scala/org/apache/spark/SecurityManager.scala > b/core/src/main/scala/org/apache/spark/SecurityManager.scala > index f72c7de..abddafe 100644 > --- a/core/src/main/scala/org/apache/spark/SecurityManager.scala > +++ b/core/src/main/scala/org/apache/spark/SecurityManager.scala > @@ -23,7 +23,7 @@ import java.security.{KeyStore, SecureRandom} > import java.security.cert.X509Certificate > import javax.net.ssl._ > > -import com.google.common.hash.HashCodes > +import com.google.common.hash.HashCode > import com.google.common.io.Files > import org.apache.hadoop.io.Text > > @@ -432,7 +432,7 @@ private[spark] class SecurityManager(sparkConf: SparkConf) > val secret = new Array[Byte](length) > rnd.nextBytes(secret) > > -val cookie = HashCodes.fromBytes(secret).toString() > +val cookie = HashCode.fromBytes(secret).toString() > SparkHadoopUtil.get.addSecretKeyToUserCredentials(SECRET_LOOKUP_KEY, > cookie) > cookie >} else { > diff --git a/core/src/main/scala/org/apache/spark/SparkEnv.scala > b/core/src/main/scala/org/apache/spark/SparkEnv.scala > index af50a6d..02545ae 100644 > --- a/core/src/main/scala/org/apache/spark/SparkEnv.scala > +++ b/core/src/main/scala/org/apache/spark/SparkEnv.scala > @@ -72,7 +72,7 @@ class SparkEnv ( > >// A general, soft-reference map for metadata needed during 
HadoopRDD > split computation >// (e.g., HadoopFileRDD uses this to cache JobConfs and InputFormats). > - private[spark] val hadoopJobMetadata = new > MapMaker().softValues().makeMap[String, Any]() > + private[spark] val hadoopJobMetadata = new > MapMaker().weakValues().makeMap[String, Any]() > >private[spark] var driverTmpDir: Option[String] = None > > diff --git a/pom.xml b/pom.xml > index d064cb5..7c3e036 100644 > --- a/pom.xml > +++ b/pom.xml > @@ -368,8 +368,7 @@ > > com.google.guava > guava > -14.0.1 > -provided > +19.0 > >
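The `HashCodes` → `HashCode` rename in the SecurityManager hunk above is the Guava 14 → 16+ API change driving this request; the call itself just renders random secret bytes as a hex string. A rough Python equivalent of that cookie generation (the function name is ours, not Spark's):

```python
import secrets

def generate_secret_cookie(length=16):
    # rnd.nextBytes(secret) followed by HashCode.fromBytes(secret).toString()
    # amounts to: draw random bytes, render them as lowercase hex
    secret = secrets.token_bytes(length)
    return secret.hex()
```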
[jira] [Resolved] (SPARK-16702) Driver hangs after executors are lost
[ https://issues.apache.org/jira/browse/SPARK-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16702. --- Resolution: Duplicate > Driver hangs after executors are lost > - > > Key: SPARK-16702 > URL: https://issues.apache.org/jira/browse/SPARK-16702 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.1, 1.6.2, 2.0.0 >Reporter: Angus Gerry > Attachments: SparkThreadsBlocked.txt > > > It's my first time, please be kind. > I'm still trying to debug this error locally - at this stage I'm pretty > convinced that it's a weird deadlock/livelock problem due to the use of > {{scheduleAtFixedRate}} within {{ExecutorAllocationManager}}. This problem is > possibly tangentially related to the issues discussed in SPARK-1560 around > the use of blocking calls within locks. > h4. Observed Behavior > When running a Spark job and executors are lost, the job occasionally goes > into a state where it makes no progress with tasks. Most commonly it seems > that the issue occurs when executors are preempted by YARN, but I'm not > confident enough to state that it's restricted to just this scenario. 
> Upon inspecting a thread dump from the driver, the following stack traces > seem noteworthy (a full thread dump is attached): > {noformat:title=Thread 178: spark-dynamic-executor-allocation (TIMED_WAITING)} > sun.misc.Unsafe.park(Native Method) > java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226) > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033) > java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326) > scala.concurrent.impl.Promise$DefaultPromise.tryAwait(Promise.scala:208) > scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:218) > scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) > scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190) > scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > scala.concurrent.Await$.result(package.scala:190) > org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75) > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101) > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:77) > org.apache.spark.scheduler.cluster.YarnSchedulerBackend.doRequestTotalExecutors(YarnSchedulerBackend.scala:59) > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.requestTotalExecutors(CoarseGrainedSchedulerBackend.scala:447) > org.apache.spark.SparkContext.requestTotalExecutors(SparkContext.scala:1423) > org.apache.spark.ExecutorAllocationManager.addExecutors(ExecutorAllocationManager.scala:359) > org.apache.spark.ExecutorAllocationManager.updateAndSyncNumExecutorsTarget(ExecutorAllocationManager.scala:310) > org.apache.spark.ExecutorAllocationManager.org$apache$spark$ExecutorAllocationManager$$schedule(ExecutorAllocationManager.scala:264) > org.apache.spark.ExecutorAllocationManager$$anon$2.run(ExecutorAllocationManager.scala:223) > 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304) > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178) > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > java.lang.Thread.run(Thread.java:745) > {noformat} > {noformat:title=Thread 22: dispatcher-event-loop-10 (BLOCKED)} > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend$DriverEndpoint.disableExecutor(CoarseGrainedSchedulerBackend.scala:289) > org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:121) > org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint$$anonfun$onDisconnected$1.apply(YarnSchedulerBackend.scala:120) > scala.Option.foreach(Option.scala:257) > org.apache.spark.scheduler.cluster.YarnSchedulerBackend$YarnDriverEndpoint.onDisconnected(YarnSchedulerBackend.scala:120) > org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:142) > org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204) > org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100) > org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215) > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) >
[jira] [Resolved] (SPARK-16688) OpenHashSet.MAX_CAPACITY is always based on Int even when using Long
[ https://issues.apache.org/jira/browse/SPARK-16688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16688. --- Resolution: Won't Fix > OpenHashSet.MAX_CAPACITY is always based on Int even when using Long > > > Key: SPARK-16688 > URL: https://issues.apache.org/jira/browse/SPARK-16688 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.2, 2.0.0 >Reporter: Ben McCann > > MAX_CAPACITY is hardcoded to a value of 1073741824: > {code}val MAX_CAPACITY = 1 << 30 > class LongHasher extends Hasher[Long] { > override def hash(o: Long): Int = (o ^ (o >>> 32)).toInt > }{code} > I'd like to stick more than 1B items in my hashmap. Spark's all about big > data, right?
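The `1 << 30` cap follows from JVM array indexing: arrays are addressed with a signed 32-bit `Int`, so the largest power-of-two capacity an open-addressing table can allocate is 2^30 (2^31 already exceeds `Int.MaxValue`). Using a `Long` hasher changes how keys are hashed, not how slots are addressed. A quick check of that arithmetic, plus a Python mimic of the `LongHasher` snippet quoted above:

```python
INT_MAX = 2**31 - 1        # JVM Int.MaxValue
MAX_CAPACITY = 1 << 30

def long_hash(o):
    """Mimic Scala's (o ^ (o >>> 32)).toInt for a 64-bit long."""
    o &= (1 << 64) - 1                  # treat o as an unsigned 64-bit pattern (>>> semantics)
    h = (o ^ (o >> 32)) & 0xFFFFFFFF    # fold the high word into the low word
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret low word as a signed Int

# capacity must fit in an Int-indexed array, so doubling past 2**30 is impossible
assert MAX_CAPACITY <= INT_MAX
assert MAX_CAPACITY * 2 > INT_MAX
```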
[jira] [Resolved] (SPARK-16705) Kafka Direct Stream is not experimental anymore
[ https://issues.apache.org/jira/browse/SPARK-16705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16705. --- Resolution: Duplicate > Kafka Direct Stream is not experimental anymore > --- > > Key: SPARK-16705 > URL: https://issues.apache.org/jira/browse/SPARK-16705 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.0.0 >Reporter: Bertrand Dechoux >Priority: Trivial > > http://spark.apache.org/docs/latest/streaming-kafka-integration.html > {quote} > Note that this is an experimental feature introduced in Spark 1.3 for the > Scala and Java API, in Spark 1.4 for the Python API. > {quote} > The feature was indeed marked as experimental for Spark 1.3 but is not > anymore. > * > https://spark.apache.org/docs/1.3.0/api/java/index.html?org/apache/spark/streaming/kafka/KafkaUtils.html > * > https://spark.apache.org/docs/1.6.2/api/java/index.html?org/apache/spark/streaming/kafka/KafkaUtils.html
[jira] [Updated] (SPARK-16802) joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/SPARK-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Zimmer updated SPARK-16802: --- Description: Hello! This is a little similar to [SPARK-16740|https://issues.apache.org/jira/browse/SPARK-16740] (should I have reopened it?). I would recommend to give another full review to {{HashedRelation.scala}}, particularly the new {{LongToUnsafeRowMap}} code. I've had a few other errors that I haven't managed to reproduce so far, as well as what I suspect could be memory leaks (I have a query in a loop OOMing after a few iterations despite not caching its results). Here is the script to reproduce the ArrayIndexOutOfBoundsException on the current 2.0 branch: {code} import os import random from pyspark import SparkContext from pyspark.sql import types as SparkTypes from pyspark.sql import SQLContext sc = SparkContext() sqlc = SQLContext(sc) schema1 = SparkTypes.StructType([ SparkTypes.StructField("id1", SparkTypes.LongType(), nullable=True) ]) schema2 = SparkTypes.StructType([ SparkTypes.StructField("id2", SparkTypes.LongType(), nullable=True) ]) def randlong(): return random.randint(-9223372036854775808, 9223372036854775807) while True: l1, l2 = randlong(), randlong() # Sample values that crash: # l1, l2 = 4661454128115150227, -5543241376386463808 print "Testing with %s, %s" % (l1, l2) data1 = [(l1, ), (l2, )] data2 = [(l1, )] df1 = sqlc.createDataFrame(sc.parallelize(data1), schema1) df2 = sqlc.createDataFrame(sc.parallelize(data2), schema2) crash = True if crash: os.system("rm -rf /tmp/sparkbug") df1.write.parquet("/tmp/sparkbug/vertex") df2.write.parquet("/tmp/sparkbug/edge") df1 = sqlc.read.load("/tmp/sparkbug/vertex") df2 = sqlc.read.load("/tmp/sparkbug/edge") sqlc.registerDataFrameAsTable(df1, "df1") sqlc.registerDataFrameAsTable(df2, "df2") result_df = sqlc.sql(""" SELECT df1.id1 FROM df1 LEFT OUTER JOIN df2 ON df1.id1 = df2.id2 """) print result_df.collect() {code} {code} 
java.lang.ArrayIndexOutOfBoundsException: 1728150825 at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.getValue(HashedRelation.scala:463) at org.apache.spark.sql.execution.joins.LongHashedRelation.getValue(HashedRelation.scala:762) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.to(SerDeUtil.scala:112) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toBuffer(SerDeUtil.scala:112) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toArray(SerDeUtil.scala:112) at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:899) at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:899) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1898) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1898) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at 
org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/07/29 20:19:00 WARN TaskSetManager: Lost task 0.0 in stage 17.0 (TID 50, localhost): java.lang.ArrayIndexOutOfBoundsException: 1728150825 at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.getValue(HashedRelation.scala:463)
[jira] [Updated] (SPARK-16802) joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/SPARK-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Zimmer updated SPARK-16802: --- Description: Hello! This is a little similar to [SPARK-16740|https://issues.apache.org/jira/browse/SPARK-16740] (should I have reopened it?). I would recommend to give another full review to {{HashedRelation.scala}}, particularly the new {{LongToUnsafeRowMap}} code. I've had a few other errors that I haven't managed to reproduce so far, as well as what I suspect could be memory leaks (I have a query in a loop OOMing after a few iterations despite not caching its results). Here is the script to reproduce the ArrayIndexOutOfBoundsException on the current 2.0 branch: {code} import os import random from pyspark import SparkContext from pyspark.sql import types as SparkTypes from pyspark.sql import SQLContext sc = SparkContext() sqlc = SQLContext(sc) schema1 = SparkTypes.StructType([ SparkTypes.StructField("id1", SparkTypes.LongType(), nullable=True), # SparkTypes.StructField("weight1", SparkTypes.FloatType(), nullable=True) ]) schema2 = SparkTypes.StructType([ SparkTypes.StructField("id2", SparkTypes.LongType(), nullable=True), # SparkTypes.StructField("weight2", SparkTypes.LongType(), nullable=True) ]) def randlong(): return random.randint(-9223372036854775808, 9223372036854775807) while True: l1, l2 = randlong(), randlong() # Sample values that crash: # l1, l2 = 4661454128115150227, -5543241376386463808 print "Testing with %s, %s" % (l1, l2) data1 = [(l1, ), (l2, )] data2 = [(l1, )] df1 = sqlc.createDataFrame(sc.parallelize(data1), schema1) df2 = sqlc.createDataFrame(sc.parallelize(data2), schema2) crash = True if crash: os.system("rm -rf /tmp/sparkbug") df1.write.parquet("/tmp/sparkbug/vertex") df2.write.parquet("/tmp/sparkbug/edge") df1 = sqlc.read.load("/tmp/sparkbug/vertex") df2 = sqlc.read.load("/tmp/sparkbug/edge") sqlc.registerDataFrameAsTable(df1, "df1") sqlc.registerDataFrameAsTable(df2, "df2") 
result_df = sqlc.sql(""" SELECT df1.id1 FROM df1 LEFT OUTER JOIN df2 ON df1.id1 = df2.id2 """) print result_df.collect() {code} {code} java.lang.ArrayIndexOutOfBoundsException: 1728150825 at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.getValue(HashedRelation.scala:463) at org.apache.spark.sql.execution.joins.LongHashedRelation.getValue(HashedRelation.scala:762) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.to(SerDeUtil.scala:112) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toBuffer(SerDeUtil.scala:112) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toArray(SerDeUtil.scala:112) at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:899) at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:899) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1898) at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1898) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) at org.apache.spark.scheduler.Task.run(Task.scala:85) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 16/07/29 20:19:00 WARN TaskSetManager: Lost task 0.0 in stage 17.0 (TID 50, localhost):
[jira] [Commented] (SPARK-16801) clearThreshold does not work for SparseVector
[ https://issues.apache.org/jira/browse/SPARK-16801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15399953#comment-15399953 ] Sean Owen commented on SPARK-16801: --- {{clearThreshold}} is unrelated to vectors entirely. It's a threshold on the dot product. You'd have to provide a specific example, but I don't think this can be the actual problem if there is one. > clearThreshold does not work for SparseVector > - > > Key: SPARK-16801 > URL: https://issues.apache.org/jira/browse/SPARK-16801 > Project: Spark > Issue Type: Bug > Components: MLlib, PySpark >Affects Versions: 1.6.2 >Reporter: Rahul Shah >Priority: Minor > > The LogisticRegression model in the mllib library behaves randomly when > passed a SparseVector instead of a DenseVector.
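To illustrate Sean's point that the threshold acts on the dot product, here is a hedged toy sketch of mllib-style binary logistic prediction (names, defaults, and structure are ours, not mllib's exact implementation). Whether the features arrived as a sparse or dense vector never enters this logic; only the dot product does:

```python
import math

def predict(weights, features, intercept=0.0, threshold=0.5):
    """Toy binary logistic prediction.

    threshold=None plays the role of clearThreshold(): return the raw
    probability instead of a 0/1 label.
    """
    # the only thing the input vector contributes is this dot product
    margin = sum(w * x for w, x in zip(weights, features)) + intercept
    score = 1.0 / (1.0 + math.exp(-margin))
    if threshold is None:            # cleared threshold => raw score
        return score
    return 1.0 if score > threshold else 0.0
```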
[jira] [Created] (SPARK-16802) joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException
Sylvain Zimmer created SPARK-16802: -- Summary: joins.LongToUnsafeRowMap crashes with ArrayIndexOutOfBoundsException Key: SPARK-16802 URL: https://issues.apache.org/jira/browse/SPARK-16802 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.0.0, 2.0.1 Reporter: Sylvain Zimmer Hello! This is a little similar to [SPARK-16740|https://issues.apache.org/jira/browse/SPARK-16740] (should I have reopened it?). I would recommend to give another full review to {{HashedRelation.scala}}, particularly the new {{LongToUnsafeRowMap}} code. I've had a few other errors that I haven't managed to reproduce so far, as well as what I suspect could be memory leaks (I have a query in a loop OOMing after a few iterations despite not caching its results). Here is the script to reproduce the ArrayIndexOutOfBoundsException on the current 2.0 branch: {{code}} import os import random from pyspark import SparkContext from pyspark.sql import types as SparkTypes from pyspark.sql import SQLContext sc = SparkContext() sqlc = SQLContext(sc) schema1 = SparkTypes.StructType([ SparkTypes.StructField("id1", SparkTypes.LongType(), nullable=True), # SparkTypes.StructField("weight1", SparkTypes.FloatType(), nullable=True) ]) schema2 = SparkTypes.StructType([ SparkTypes.StructField("id2", SparkTypes.LongType(), nullable=True), # SparkTypes.StructField("weight2", SparkTypes.LongType(), nullable=True) ]) def randlong(): return random.randint(-9223372036854775808, 9223372036854775807) while True: l1, l2 = randlong(), randlong() # Sample values that crash: # l1, l2 = 4661454128115150227, -5543241376386463808 print "Testing with %s, %s" % (l1, l2) data1 = [(l1, ), (l2, )] data2 = [(l1, )] df1 = sqlc.createDataFrame(sc.parallelize(data1), schema1) df2 = sqlc.createDataFrame(sc.parallelize(data2), schema2) crash = True if crash: os.system("rm -rf /tmp/sparkbug") df1.write.parquet("/tmp/sparkbug/vertex") df2.write.parquet("/tmp/sparkbug/edge") df1 = sqlc.read.load("/tmp/sparkbug/vertex") df2 = 
sqlc.read.load("/tmp/sparkbug/edge")
sqlc.registerDataFrameAsTable(df1, "df1")
sqlc.registerDataFrameAsTable(df2, "df2")
result_df = sqlc.sql("""
    SELECT df1.id1
    FROM df1
    LEFT OUTER JOIN df2 ON df1.id1 = df2.id2
""")
print result_df.collect()
{code}
{code}
java.lang.ArrayIndexOutOfBoundsException: 1728150825
	at org.apache.spark.sql.execution.joins.LongToUnsafeRowMap.getValue(HashedRelation.scala:463)
	at org.apache.spark.sql.execution.joins.LongHashedRelation.getValue(HashedRelation.scala:762)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:370)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.hasNext(SerDeUtil.scala:117)
	at scala.collection.Iterator$class.foreach(Iterator.scala:893)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.foreach(SerDeUtil.scala:112)
	at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
	at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
	at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.to(SerDeUtil.scala:112)
	at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toBuffer(SerDeUtil.scala:112)
	at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
	at org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler.toArray(SerDeUtil.scala:112)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:899)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$13.apply(RDD.scala:899)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1898)
	at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1898)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
	at org.apache.spark.scheduler.Task.run(Task.scala:85)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at
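Editorial note: independent of the indexing crash above, the LEFT OUTER JOIN in the reproducer pins down what a correct result must look like. A plain-Python sketch of those semantics (toy id values, hypothetical, no Spark involved):

```python
# LEFT OUTER JOIN semantics: every left-side id1 appears in the result,
# matched or not, so a correct plan never returns fewer rows than df1 has.
df1_id1 = [10, 20, 30]   # hypothetical df1.id1 values
df2_id2 = {20}           # hypothetical df2.id2 values

result = [(id1, id1 if id1 in df2_id2 else None) for id1 in df1_id1]
# Unmatched left rows pair with None (SQL NULL) instead of disappearing.
```

This is only a semantic reference point, not a model of Spark's hashed-relation implementation where the exception is thrown.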
[jira] [Updated] (SPARK-16664) Spark 1.6.2 - Persist call on Data frames with more than 200 columns is wiping out the data.
[ https://issues.apache.org/jira/browse/SPARK-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-16664:
------------------------------
    Fix Version/s: 1.6.3

> Spark 1.6.2 - Persist call on Data frames with more than 200 columns is
> wiping out the data.
> -----------------------------------------------------------------------
>
>                 Key: SPARK-16664
>                 URL: https://issues.apache.org/jira/browse/SPARK-16664
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.2
>            Reporter: Satish Kolli
>            Assignee: Wesley Tang
>            Priority: Blocker
>             Fix For: 1.6.3, 2.0.1, 2.1.0
>
> Calling persist on a data frame with more than 200 columns removes the
> data from the data frame. This is an issue in Spark 1.6.2; it works without
> any issues in Spark 1.6.1.
> The following test case demonstrates the problem. Please let me know if you
> need any additional information. Thanks.
> {code}
> import org.apache.spark._
> import org.apache.spark.rdd.RDD
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.{Row, SQLContext}
> import org.scalatest.FunSuite
>
> class TestSpark162_1 extends FunSuite {
>   test("test data frame with 200 columns") {
>     val sparkConfig = new SparkConf()
>     val parallelism = 5
>     sparkConfig.set("spark.default.parallelism", s"$parallelism")
>     sparkConfig.set("spark.sql.shuffle.partitions", s"$parallelism")
>     val sc = new SparkContext(s"local[3]", "TestNestedJson", sparkConfig)
>     val sqlContext = new SQLContext(sc)
>     // create dataframe with 200 columns and fake 200 values
>     val size = 200
>     val rdd: RDD[Seq[Long]] = sc.parallelize(Seq(Seq.range(0, size)))
>     val rowRdd: RDD[Row] = rdd.map(d => Row.fromSeq(d))
>     val schemas = List.range(0, size).map(a => StructField("name" + a, LongType, true))
>     val testSchema = StructType(schemas)
>     val testDf = sqlContext.createDataFrame(rowRdd, testSchema)
>     // test value
>     assert(testDf.persist.take(1).apply(0).toSeq(100).asInstanceOf[Long] == 100)
>     sc.stop()
>   }
>
>   test("test data frame with 201 columns") {
>     val sparkConfig = new SparkConf()
>     val parallelism = 5
>     sparkConfig.set("spark.default.parallelism", s"$parallelism")
>     sparkConfig.set("spark.sql.shuffle.partitions", s"$parallelism")
>     val sc = new SparkContext(s"local[3]", "TestNestedJson", sparkConfig)
>     val sqlContext = new SQLContext(sc)
>     // create dataframe with 201 columns and fake 201 values
>     val size = 201
>     val rdd: RDD[Seq[Long]] = sc.parallelize(Seq(Seq.range(0, size)))
>     val rowRdd: RDD[Row] = rdd.map(d => Row.fromSeq(d))
>     val schemas = List.range(0, size).map(a => StructField("name" + a, LongType, true))
>     val testSchema = StructType(schemas)
>     val testDf = sqlContext.createDataFrame(rowRdd, testSchema)
>     // test value
>     assert(testDf.persist.take(1).apply(0).toSeq(100).asInstanceOf[Long] == 100)
>     sc.stop()
>   }
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16797) Repartiton call w/ 0 partitions drops data
[ https://issues.apache.org/jira/browse/SPARK-16797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15399948#comment-15399948 ]

Dongjoon Hyun commented on SPARK-16797:
---------------------------------------
Hi, [~bjeffrey]. Could you give me more information on how to reproduce that? For me, it seems to raise a proper exception, like the following.
{code}
scala> spark.sparkContext.parallelize(1 to 10).repartition(0)
java.lang.IllegalArgumentException: requirement failed: Number of partitions (0) must be positive.
{code}

> Repartiton call w/ 0 partitions drops data
> ------------------------------------------
>
>                 Key: SPARK-16797
>                 URL: https://issues.apache.org/jira/browse/SPARK-16797
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.6.2, 2.0.0
>            Reporter: Bryan Jeffrey
>            Priority: Minor
>              Labels: easyfix
>
> When you call RDD.repartition(0) or DStream.repartition(0), the input data
> is silently dropped. This should not fail silently; instead, an exception
> should be thrown to alert the user to the issue.
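Editorial note: the fix the reporter asks for (fail fast rather than silently dropping data) amounts to a plain argument check before partitioning. A minimal sketch, not Spark's actual implementation; the round-robin split is purely illustrative:

```python
def repartition(data, num_partitions):
    """Sketch of a fail-fast partition-count check (not Spark's real code)."""
    if num_partitions <= 0:
        raise ValueError(
            "requirement failed: Number of partitions (%d) must be positive."
            % num_partitions)
    # Hypothetical round-robin split, for illustration only.
    return [data[i::num_partitions] for i in range(num_partitions)]
```

With the check in place, `repartition(data, 0)` raises immediately instead of returning an empty result.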
[jira] [Assigned] (SPARK-16796) Visible passwords on Spark environment page
[ https://issues.apache.org/jira/browse/SPARK-16796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-16796:
------------------------------------
    Assignee: Apache Spark

> Visible passwords on Spark environment page
> -------------------------------------------
>
>                 Key: SPARK-16796
>                 URL: https://issues.apache.org/jira/browse/SPARK-16796
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>            Reporter: Artur
>            Assignee: Apache Spark
>            Priority: Trivial
>         Attachments: Mask_spark_ssl_keyPassword_spark_ssl_keyStorePassword_spark_ssl_trustStorePassword_from_We1.patch
>
> The Spark properties holding passwords
> (spark.ssl.keyPassword, spark.ssl.keyStorePassword, spark.ssl.trustStorePassword)
> are visible in the Web UI on the Environment page.
> Can we mask them in the Web UI?
[jira] [Commented] (SPARK-16796) Visible passwords on Spark environment page
[ https://issues.apache.org/jira/browse/SPARK-16796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15399944#comment-15399944 ]

Apache Spark commented on SPARK-16796:
--------------------------------------
User 'Devian-ua' has created a pull request for this issue:
https://github.com/apache/spark/pull/14409

> Visible passwords on Spark environment page
> -------------------------------------------
>
>                 Key: SPARK-16796
>                 URL: https://issues.apache.org/jira/browse/SPARK-16796
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>            Reporter: Artur
>            Priority: Trivial
>         Attachments: Mask_spark_ssl_keyPassword_spark_ssl_keyStorePassword_spark_ssl_trustStorePassword_from_We1.patch
>
> The Spark properties holding passwords
> (spark.ssl.keyPassword, spark.ssl.keyStorePassword, spark.ssl.trustStorePassword)
> are visible in the Web UI on the Environment page.
> Can we mask them in the Web UI?
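Editorial note: the masking SPARK-16796 asks for can be sketched as redacting any config value whose key looks secret before it is rendered. This is an illustrative sketch, not the attached patch; the marker list and redaction string are assumptions:

```python
SENSITIVE_MARKERS = ("password", "secret", "token")  # assumed key heuristic

def redact(conf):
    """Return a copy of a Spark-style config dict with secret-looking values masked."""
    return {
        key: "*********(redacted)"
        if any(marker in key.lower() for marker in SENSITIVE_MARKERS)
        else value
        for key, value in conf.items()
    }
```

Applying it to `{"spark.ssl.keyPassword": "hunter2", "spark.app.name": "demo"}` masks only the password entry, so the environment page would show the key but never the secret value.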