[jira] [Comment Edited] (SPARK-15345) SparkSession's conf doesn't take effect when there's already an existing SparkContext

2016-05-24 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299568#comment-15299568
 ] 

Yi Zhou edited comment on SPARK-15345 at 5/25/16 6:56 AM:
--

1) Spark SQL can't find the existing Hive metastore databases in the spark-sql shell when 
issuing 'show databases;'.
2) It always reports that the database already exists (I saw a local Derby 
metastore_db folder in the current directory). It seems that Spark SQL can't read 
the Hive conf (e.g. hive-site.xml).
3) Key configurations in spark-defaults.conf:
{code}
spark.sql.hive.metastore.version=1.1.0
spark.sql.hive.metastore.jars=/usr/lib/hive/lib/*:/usr/lib/hadoop/client/*
spark.executor.extraClassPath=/etc/hive/conf
spark.driver.extraClassPath=/etc/hive/conf
spark.yarn.jars=local:/usr/lib/spark/jars/*
{code}
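As a quick sanity check (a minimal sketch, assuming a Spark 2.0 build with -Phive and 
hive-site.xml on the driver classpath; the check itself is illustrative and not part of 
the original report), the external metastore should be visible from a Hive-enabled 
session rather than from a freshly created local Derby catalog:
{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("metastore-check")
  .enableHiveSupport()   // requires the Hive classes and hive-site.xml on the classpath
  .getOrCreate()

// If hive-site.xml is picked up, this lists the metastore databases;
// if only 'default' shows up (and a metastore_db folder appears in the
// current directory), the session fell back to a local Derby metastore.
spark.sql("show databases").show()
{code}
The error reported when creating a database is below: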

16/05/23 09:48:24 ERROR metastore.RetryingHMSHandler: 
AlreadyExistsException(message:Database test_sparksql already exists)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:898)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:133)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
at com.sun.proxy.$Proxy34.create_database(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:645)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:91)
at com.sun.proxy.$Proxy35.createDatabase(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:341)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply$mcV$sp(HiveClientImpl.scala:289)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:289)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:289)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:260)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:207)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:206)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:249)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.createDatabase(HiveClientImpl.scala:288)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply$mcV$sp(HiveExternalCatalog.scala:94)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:94)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:94)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:68)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:93)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:142)
at 
org.apache.spark.sql.execution.command.CreateDatabaseCommand.run(ddl.scala:58)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at 
org.apache.spark.sql.execution.QueryExecuti

[jira] [Commented] (SPARK-15345) SparkSession's conf doesn't take effect when there's already an existing SparkContext

2016-05-24 Thread Yi Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299568#comment-15299568
 ] 

Yi Zhou commented on SPARK-15345:
-

1) Spark SQL can't find the existing Hive metastore databases in the spark-sql shell when 
issuing 'show databases;'.
2) It always reports that the database already exists (I saw a local Derby 
metastore_db folder in the current directory). It seems that Spark SQL can't read 
the Hive conf.
3) Key configurations in spark-defaults.conf:
{code}
spark.sql.hive.metastore.version=1.1.0
spark.sql.hive.metastore.jars=/usr/lib/hive/lib/*:/usr/lib/hadoop/client/*
spark.executor.extraClassPath=/etc/hive/conf
spark.driver.extraClassPath=/etc/hive/conf
spark.yarn.jars=local:/usr/lib/spark/jars/*
{code}

16/05/23 09:48:24 ERROR metastore.RetryingHMSHandler: 
AlreadyExistsException(message:Database test_sparksql already exists)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_database(HiveMetaStore.java:898)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:133)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
at com.sun.proxy.$Proxy34.create_database(Unknown Source)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createDatabase(HiveMetaStoreClient.java:645)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:91)
at com.sun.proxy.$Proxy35.createDatabase(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.createDatabase(Hive.java:341)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply$mcV$sp(HiveClientImpl.scala:289)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:289)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$createDatabase$1.apply(HiveClientImpl.scala:289)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:260)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:207)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:206)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:249)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.createDatabase(HiveClientImpl.scala:288)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply$mcV$sp(HiveExternalCatalog.scala:94)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:94)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$createDatabase$1.apply(HiveExternalCatalog.scala:94)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:68)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.createDatabase(HiveExternalCatalog.scala:93)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:142)
at 
org.apache.spark.sql.execution.command.CreateDatabaseCommand.run(ddl.scala:58)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:57)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:55)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:69)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at 
org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114)
at 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:85)
at 
org.apache.s

[jira] [Created] (SPARK-15522) DataFrame Column Names That are Numbers aren't referenced correctly in SQL

2016-05-24 Thread Jason Pohl (JIRA)
Jason Pohl created SPARK-15522:
--

 Summary: DataFrame Column Names That are Numbers aren't referenced 
correctly in SQL
 Key: SPARK-15522
 URL: https://issues.apache.org/jira/browse/SPARK-15522
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Jason Pohl


The following code is run:

val pre_piv_df_a = sqlContext.sql("""
SELECT
CASE WHEN Gender = 'M' Then 1 ELSE 0 END AS has_male,
CASE WHEN Gender = 'F' Then 1 ELSE 0 END AS has_female,
CAST(StartAge AS Double) AS StartAge_dbl,
CAST(EndAge AS Double) AS EndAge_dbl,
*
FROM alldem_union_curr
""")
.withColumn("JavaStartTimestamp", create_ts($"StartTimestamp"))
.drop("StartTimestamp").withColumnRenamed("JavaStartTimestamp", 
"StartTimestamp")
.drop("StartAge").drop("EndAge")
.withColumnRenamed("StartAge_dbl", "StartAge").withColumnRenamed("EndAge_dbl", 
"EndAge")

val pre_piv_df_b = pre_piv_df_a
.withColumn("media_month_cc", media_month_cc($"MediaMonth"))
.withColumn("media_week_cc", media_week_sts_cc($"StartTimestamp"))
.withColumn("media_day_cc", media_day_sts_cc($"StartTimestamp"))
.withColumn("week_day", week_day($"StartTimestamp"))
.withColumn("week_end", week_end($"StartTimestamp"))

.join(sqlContext.table("cad_nets"), $"Network" === $"nielsen_network", "inner")
.withColumnRenamed("cad_network", "norm_net_code_a")
.withColumn("norm_net_code", reCodeNets($"norm_net_code_a"))
pre_piv_df_b.registerTempTable("pre_piv_df")

val piv_qhID_df = pre_piv_df_b.groupBy("Network", "Audience", "StartDate", 
"rating_category_cd")
.pivot("qaID").agg("rating" -> "mean")

The pivot creates 96 columns with names like '01', '02', …, '96', one per quarter-hour ID.

In the SQL below, the bare numeric column reference causes problems. If I rename the 
columns to 'col01', 'col02', …, 'col96', I can run the SQL correctly and get the 
expected results.

select * from piv_qhID where 82 is NULL limit 20

returns no rows even though there are nulls.

On the other hand, the query:
select * from piv_qhID where 82 is NOT NULL limit 20

returns all rows (even those with nulls). The bare 82 appears to be parsed as an 
integer literal rather than a reference to the column named '82', so the first 
predicate is always false and the second is always true.

Renaming the columns fixes this, but it would be nice if such columns were 
referenced correctly.
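A minimal sketch of the usual workaround (assuming piv_qhID is registered as a temp 
table as above; backtick quoting is standard Spark SQL identifier quoting, shown here 
for illustration): quote the numeric name so it is parsed as an identifier rather than 
a literal:
{code}
// Backticks make Spark SQL treat 82 as a column name rather than an integer literal.
sqlContext.sql("select * from piv_qhID where `82` is NULL limit 20").show()
{code}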






[jira] [Closed] (SPARK-15043) Fix and re-enable flaky test: mllib.stat.JavaStatisticsSuite.testCorr

2016-05-24 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng closed SPARK-15043.
-
Resolution: Fixed

Fixed as part of SPARK-15030.

> Fix and re-enable flaky test: mllib.stat.JavaStatisticsSuite.testCorr
> -
>
> Key: SPARK-15043
> URL: https://issues.apache.org/jira/browse/SPARK-15043
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.0.0
>Reporter: Josh Rosen
>Assignee: Sean Owen
>Priority: Critical
>
> It looks like the {{mllib.stat.JavaStatisticsSuite.testCorr}} test has become 
> flaky:
> https://spark-tests.appspot.com/tests/org.apache.spark.mllib.stat.JavaStatisticsSuite/testCorr
> The first observed failure was in 
> https://spark-tests.appspot.com/builds/spark-master-test-maven-hadoop-2.6/816
> {code}
> java.lang.AssertionError: expected:<0.9986422261219262> but 
> was:<0.9986422261219272>
>   at 
> org.apache.spark.mllib.stat.JavaStatisticsSuite.testCorr(JavaStatisticsSuite.java:75)
> {code}
> I'm going to ignore this test now, but we need to come back and fix it.
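The failure above is a strict equality assertion on a floating-point correlation. A 
sketch of the usual remedy (illustrative only, not necessarily the change that resolved 
this ticket) is to compare against a tolerance:
{code}
// Compare floating-point results with a tolerance instead of exact equality.
val expected = 0.9986422261219262
val actual = 0.9986422261219272
assert(math.abs(expected - actual) < 1e-12,
  s"$actual is not within tolerance of $expected")
{code}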






[jira] [Updated] (SPARK-15043) Fix and re-enable flaky test: mllib.stat.JavaStatisticsSuite.testCorr

2016-05-24 Thread Xiangrui Meng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-15043:
--
Fix Version/s: 2.0.0

> Fix and re-enable flaky test: mllib.stat.JavaStatisticsSuite.testCorr
> -
>
> Key: SPARK-15043
> URL: https://issues.apache.org/jira/browse/SPARK-15043
> Project: Spark
>  Issue Type: Bug
>  Components: MLlib
>Affects Versions: 2.0.0
>Reporter: Josh Rosen
>Assignee: Sean Owen
>Priority: Critical
> Fix For: 2.0.0
>
>
> It looks like the {{mllib.stat.JavaStatisticsSuite.testCorr}} test has become 
> flaky:
> https://spark-tests.appspot.com/tests/org.apache.spark.mllib.stat.JavaStatisticsSuite/testCorr
> The first observed failure was in 
> https://spark-tests.appspot.com/builds/spark-master-test-maven-hadoop-2.6/816
> {code}
> java.lang.AssertionError: expected:<0.9986422261219262> but 
> was:<0.9986422261219272>
>   at 
> org.apache.spark.mllib.stat.JavaStatisticsSuite.testCorr(JavaStatisticsSuite.java:75)
> {code}
> I'm going to ignore this test now, but we need to come back and fix it.







[jira] [Commented] (SPARK-14529) Consolidate mllib and mllib-local into one mllib folder

2016-05-24 Thread Xiangrui Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299511#comment-15299511
 ] 

Xiangrui Meng commented on SPARK-14529:
---

We should decide whether we want to make this change in 2.0. I don't have a 
strong preference for either folder layout, so I would +1 keeping the current layout 
since it doesn't require code changes. How does that sound?

> Consolidate mllib and mllib-local into one mllib folder
> ---
>
> Key: SPARK-14529
> URL: https://issues.apache.org/jira/browse/SPARK-14529
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Reporter: DB Tsai
>Assignee: DB Tsai
>Priority: Minor
>
> In the 2.0 QA period (to avoid conflicts with other PRs), this task will 
> consolidate `mllib/src` into `mllib/mllib/src` and `mllib-local/src` into 
> `mllib/mllib-local/src`. 






[jira] [Comment Edited] (SPARK-15521) Add high level APIs based on dapply and gapply for easier usage

2016-05-24 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299483#comment-15299483
 ] 

Sun Rui edited comment on SPARK-15521 at 5/25/16 5:30 AM:
--

cc [~felixcheung], [~timhunter]


was (Author: sunrui):
cc [~felixcheung]

> Add high level APIs based on dapply and gapply for easier usage
> ---
>
> Key: SPARK-15521
> URL: https://issues.apache.org/jira/browse/SPARK-15521
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Sun Rui
>
> dapply() and gapply() of SparkDataFrame are two basic functions. For easier 
> use by users in the R community, some high-level functions can be added 
> on top of them.
> Candidates are:
> http://exposurescience.org/heR.doc/library/heR.Misc/html/dapply.html
> http://exposurescience.org/heR.doc/library/stats/html/aggregate.html






[jira] [Commented] (SPARK-15449) Wrong Data Format - Documentation Issue

2016-05-24 Thread Miao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299484#comment-15299484
 ] 

Miao Wang commented on SPARK-15449:
---

Got it. I will submit a PR soon after fixing the R unit test failures.

> Wrong Data Format - Documentation Issue
> ---
>
> Key: SPARK-15449
> URL: https://issues.apache.org/jira/browse/SPARK-15449
> Project: Spark
>  Issue Type: Documentation
>  Components: Examples
>Affects Versions: 1.6.1
>Reporter: Kiran Biradarpatil
>Priority: Minor
>
> The Java example given for MLlib NaiveBayes at 
> http://spark.apache.org/docs/latest/mllib-naive-bayes.html expects the data 
> in LibSVM format, but the example data in MLlib's 
> data/mllib/sample_naive_bayes_data.txt is not in that format. 
> Please rectify either the sample data file or the implementation example.
> Thanks!
> Kiran 
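For reference, a LibSVM-formatted line is a label followed by 1-based, sparse 
index:value pairs (the values below are made up for illustration):
{code}
0 1:1.0 2:0.0 3:3.5
1 2:2.0 4:1.0
{code}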






[jira] [Commented] (SPARK-15521) Add high level APIs based on dapply and gapply for easier usage

2016-05-24 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299483#comment-15299483
 ] 

Sun Rui commented on SPARK-15521:
-

cc [~felixcheung]

> Add high level APIs based on dapply and gapply for easier usage
> ---
>
> Key: SPARK-15521
> URL: https://issues.apache.org/jira/browse/SPARK-15521
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Sun Rui
>
> dapply() and gapply() of SparkDataFrame are two basic functions. For easier 
> use by users in the R community, some high-level functions can be added 
> on top of them.
> Candidates are:
> http://exposurescience.org/heR.doc/library/heR.Misc/html/dapply.html
> http://exposurescience.org/heR.doc/library/stats/html/aggregate.html






[jira] [Created] (SPARK-15521) Add high level APIs based on dapply and gapply for easier usage

2016-05-24 Thread Sun Rui (JIRA)
Sun Rui created SPARK-15521:
---

 Summary: Add high level APIs based on dapply and gapply for easier 
usage
 Key: SPARK-15521
 URL: https://issues.apache.org/jira/browse/SPARK-15521
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Sun Rui


dapply() and gapply() of SparkDataFrame are two basic functions. For easier 
use by users in the R community, some high-level functions can be added on top 
of them.

Candidates are:
http://exposurescience.org/heR.doc/library/heR.Misc/html/dapply.html
http://exposurescience.org/heR.doc/library/stats/html/aggregate.html







[jira] [Resolved] (SPARK-12071) Programming guide should explain NULL in JVM translate to NA in R

2016-05-24 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman resolved SPARK-12071.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13268
[https://github.com/apache/spark/pull/13268]

> Programming guide should explain NULL in JVM translate to NA in R
> -
>
> Key: SPARK-12071
> URL: https://issues.apache.org/jira/browse/SPARK-12071
> Project: Spark
>  Issue Type: Documentation
>  Components: SparkR
>Affects Versions: 1.6.0
>Reporter: Felix Cheung
>Priority: Minor
>  Labels: releasenotes, starter
> Fix For: 2.0.0
>
>
> This behavior seems to be new for Spark 1.6.0






[jira] [Resolved] (SPARK-15412) Improve linear & isotonic regression methods PyDocs

2016-05-24 Thread Shivaram Venkataraman (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman resolved SPARK-15412.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13199
[https://github.com/apache/spark/pull/13199]

> Improve linear & isotonic regression methods PyDocs
> ---
>
> Key: SPARK-15412
> URL: https://issues.apache.org/jira/browse/SPARK-15412
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Reporter: holdenk
>Assignee: holdenk
>Priority: Trivial
> Fix For: 2.0.0
>
>
> Very minor, but LinearRegression & IsotonicRegression's PyDocs are missing a 
> link, have a shorter description of boundaries, and aren't using list mode 
> for the types of regularization.






[jira] [Closed] (SPARK-10053) SparkR isn't exporting lapply

2016-05-24 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui closed SPARK-10053.
---
Resolution: Won't Fix

> SparkR isn't exporting lapply
> -
>
> Key: SPARK-10053
> URL: https://issues.apache.org/jira/browse/SPARK-10053
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.4.1
>Reporter: Simon Hafner
>
> SparkR isn't exporting lapply and lapplyPartition (anymore?). There is no 
> other function exported to enable distributed calculations over DataFrames 
> (except groupBy). https://spark.apache.org/docs/latest/api/R/index.html






[jira] [Commented] (SPARK-10053) SparkR isn't exporting lapply

2016-05-24 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299475#comment-15299475
 ] 

Sun Rui commented on SPARK-10053:
-

Since SparkDataFrame now supports dapply(), which is similar to lapplyPartition, this 
issue can be closed.

> SparkR isn't exporting lapply
> -
>
> Key: SPARK-10053
> URL: https://issues.apache.org/jira/browse/SPARK-10053
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.4.1
>Reporter: Simon Hafner
>
> SparkR isn't exporting lapply and lapplyPartition (anymore?). There is no 
> other function exported to enable distributed calculations over DataFrames 
> (except groupBy). https://spark.apache.org/docs/latest/api/R/index.html






[jira] [Closed] (SPARK-15196) Add a wrapper for dapply(repartition(col,...), ... )

2016-05-24 Thread Sun Rui (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sun Rui closed SPARK-15196.
---
Resolution: Not A Problem

> Add a wrapper for dapply(repartition(col,...), ... )
> 
>
> Key: SPARK-15196
> URL: https://issues.apache.org/jira/browse/SPARK-15196
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Narine Kokhlikyan
>
> As mentioned in :
> https://github.com/apache/spark/pull/12836#issuecomment-217338855
> We would like to create a wrapper for: dapply(repartition(col,...), ... )
> This will allow to run aggregate functions on groups which are identified by 
> a list of grouping columns.






[jira] [Commented] (SPARK-15196) Add a wrapper for dapply(repartition(col,...), ... )

2016-05-24 Thread Sun Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299473#comment-15299473
 ] 

Sun Rui commented on SPARK-15196:
-

As discussed in https://github.com/apache/spark/pull/12966, this approach has problems 
with applying an R function to groups, so I am closing it.

> Add a wrapper for dapply(repartition(col,...), ... )
> 
>
> Key: SPARK-15196
> URL: https://issues.apache.org/jira/browse/SPARK-15196
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Narine Kokhlikyan
>
> As mentioned in :
> https://github.com/apache/spark/pull/12836#issuecomment-217338855
> We would like to create a wrapper for: dapply(repartition(col,...), ... )
> This will allow to run aggregate functions on groups which are identified by 
> a list of grouping columns.






[jira] [Resolved] (SPARK-15508) Fix flaky test: o.a.s.streaming.kafka.JavaKafkaStreamSuite.testKafkaStream

2016-05-24 Thread Shixiong Zhu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shixiong Zhu resolved SPARK-15508.
--
   Resolution: Fixed
Fix Version/s: 2.0.0

> Fix flaky test: o.a.s.streaming.kafka.JavaKafkaStreamSuite.testKafkaStream
> --
>
> Key: SPARK-15508
> URL: https://issues.apache.org/jira/browse/SPARK-15508
> Project: Spark
>  Issue Type: Test
>  Components: Streaming
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
> Fix For: 2.0.0
>
>
> `JavaKafkaStreamSuite.testKafkaStream` assumes that when `sent.size == 
> result.size`, the contents of `sent` and `result` are the same. However, 
> that's not true: the content of `result` may not yet be the final content.
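A minimal sketch of a more tolerant pattern (names such as `sent` and `received` are 
illustrative, not the suite's actual fields): poll until the received data matches what 
was sent, or a deadline passes, instead of asserting as soon as the sizes agree:
{code}
// Retry the comparison until the full expected contents arrive or we time out.
def waitUntilEqual[T](sent: Set[T], received: () => Set[T], timeoutMs: Long = 10000L): Unit = {
  val deadline = System.currentTimeMillis() + timeoutMs
  while (received() != sent && System.currentTimeMillis() < deadline) {
    Thread.sleep(100)
  }
  assert(received() == sent, s"timed out waiting for $sent, last saw ${received()}")
}
{code}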






[jira] [Resolved] (SPARK-15498) fix slow tests

2016-05-24 Thread Cheng Lian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Lian resolved SPARK-15498.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13273
[https://github.com/apache/spark/pull/13273]

> fix slow tests
> --
>
> Key: SPARK-15498
> URL: https://issues.apache.org/jira/browse/SPARK-15498
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 2.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
> Fix For: 2.0.0
>
>







[jira] [Commented] (SPARK-15439) Failed to run unit test in SparkR

2016-05-24 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299432#comment-15299432
 ] 

Felix Cheung commented on SPARK-15439:
--

Possibly related to the R version or package versions. Can you check 
`installed.packages()`?

> Failed to run unit test in SparkR
> -
>
> Key: SPARK-15439
> URL: https://issues.apache.org/jira/browse/SPARK-15439
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Kai Jiang
>
> Failed to run ./R/run-tests.sh around a recent commit (May 19, 2016).
> It might be related to permissions: `sudo ./R/run-tests.sh` 
> worked sometimes, so without elevated permissions we may not be able to access the /tmp 
> directory. However, the SparkR unit testing is still brittle.
> [error 
> message|https://gist.github.com/vectorijk/71f4ff34e3d34a628b8a3013f0ca2aa2]






[jira] [Commented] (SPARK-14083) Analyze JVM bytecode and turn closures into Catalyst expressions

2016-05-24 Thread Takeshi Yamamuro (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299426#comment-15299426
 ] 

Takeshi Yamamuro commented on SPARK-14083:
--

To check the feasibility of supporting types other than Row, I moved Josh's 
and viirya's prototype forward: 
https://github.com/maropu/spark/compare/master...expression-analysis3
It seems we could easily support these types based on the prototype; see the test 
code at 
https://github.com/maropu/spark/compare/master...expression-analysis3#diff-46c2cd76bdf4dc90045f44f70cb33e15R28
I think these types have none of the null-handling issues discussed above, so they are 
a good first step toward supporting this feature.


> Analyze JVM bytecode and turn closures into Catalyst expressions
> 
>
> Key: SPARK-14083
> URL: https://issues.apache.org/jira/browse/SPARK-14083
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: Reynold Xin
>
> One big advantage of the Dataset API is the type safety, at the cost of 
> performance due to heavy reliance on user-defined closures/lambdas. These 
> closures are typically slower than expressions because we have more 
> flexibility to optimize expressions (known data types, no virtual function 
> calls, etc). In many cases, it's actually not going to be very difficult to 
> look into the byte code of these closures and figure out what they are trying 
> to do. If we can understand them, then we can turn them directly into 
> Catalyst expressions for more optimized executions.
> Some examples are:
> {code}
> df.map(_.name)  // equivalent to expression col("name")
> ds.groupBy(_.gender)  // equivalent to expression col("gender")
> df.filter(_.age > 18)  // equivalent to expression GreaterThan(col("age"), 
> lit(18))
> df.map(_.id + 1)  // equivalent to Add(col("id"), lit(1))
> {code}
> The goal of this ticket is to design a small framework for byte code analysis 
> and use that to convert closures/lambdas into Catalyst expressions in order 
> to speed up Dataset execution. It is a little bit futuristic, but I believe 
> it is very doable. The framework should be easy to reason about (e.g. similar 
> to Catalyst).
> Note that a big emphasis on "small" and "easy to reason about". A patch 
> should be rejected if it is too complicated or difficult to reason about.






[jira] [Resolved] (SPARK-15365) Metastore relation should fallback to HDFS size if statistics are not available from table meta data.

2016-05-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15365.
-
   Resolution: Fixed
 Assignee: Parth Brahmbhatt
Fix Version/s: 2.0.0

> Metastore relation should fallback to HDFS size if statistics are not 
> available from table meta data.
> -
>
> Key: SPARK-15365
> URL: https://issues.apache.org/jira/browse/SPARK-15365
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Parth Brahmbhatt
>Assignee: Parth Brahmbhatt
> Fix For: 2.0.0
>
>
> Currently, if a table is used in a join operation, we rely on the size returned by 
> the metastore to decide whether we can convert the operation to a broadcast join. This 
> optimization only kicks in for tables that have statistics available in the 
> metastore. Hive generally falls back to the HDFS size if the statistics are not 
> available directly from the metastore, and this seems like a reasonable choice to 
> adopt given the optimization benefit of using broadcast joins.
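For context, a minimal sketch of how that decision surfaces to users today (table names 
are illustrative): the planner broadcasts a relation when its estimated size falls below 
spark.sql.autoBroadcastJoinThreshold, and a broadcast hint can force it regardless of 
statistics:
{code}
import org.apache.spark.sql.functions.broadcast

// Relations estimated below this threshold (default 10 MB) are broadcast;
// without metastore statistics the estimate can be far too large.
spark.conf.set("spark.sql.autoBroadcastJoinThreshold", 10L * 1024 * 1024)

// Explicit hint, independent of statistics.
val joined = spark.table("facts").join(broadcast(spark.table("small_dim")), "id")
{code}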






[jira] [Resolved] (SPARK-15518) Rename various scheduler backend for consistency

2016-05-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15518.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Rename various scheduler backend for consistency
> 
>
> Key: SPARK-15518
> URL: https://issues.apache.org/jira/browse/SPARK-15518
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
> Fix For: 2.0.0
>
>
> Various scheduler backends are not named consistently, making it difficult to 
> understand what they do based on the names. It would be great to rename some 
> of them:
> - LocalScheduler -> LocalSchedulerBackend
> - AppClient -> StandaloneAppClient
> - AppClientListener -> StandaloneAppClientListener
> - SparkDeploySchedulerBackend -> StandaloneSchedulerBackend
> - CoarseMesosSchedulerBackend -> MesosCoarseGrainedSchedulerBackend
> - MesosSchedulerBackend -> MesosFineGrainedSchedulerBackend






[jira] [Updated] (SPARK-14955) JDBCRelation should report an IllegalArgumentException if stride equals 0

2016-05-24 Thread Yang Juan hu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Juan hu updated SPARK-14955:
-
Affects Version/s: 2.0.0

> JDBCRelation should report an IllegalArgumentException if stride equals 0
> -
>
> Key: SPARK-14955
> URL: https://issues.apache.org/jira/browse/SPARK-14955
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.1, 1.6.1, 2.0.0
>Reporter: Yang Juan hu
>Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> In file 
> https://github.com/apache/spark/blob/40ed2af587cedadc6e5249031857a922b3b234ca/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
> rows 56 and 57 have the following lines:
> val stride: Long = (partitioning.upperBound / numPartitions
>   - partitioning.lowerBound / numPartitions)
> if we invoke columnPartition as below: 
> columnPartition( JDBCPartitioningInfo("partitionColumn", 0, 7, 8) );
> columnPartition will generate the following where conditions:
> whereClause: partitionColumn < 0
> whereClause: partitionColumn >= 0 AND partitionColumn < 0
> whereClause: partitionColumn >= 0 AND partitionColumn < 0
> whereClause: partitionColumn >= 0 AND partitionColumn < 0
> whereClause: partitionColumn >= 0 AND partitionColumn < 0
> whereClause: partitionColumn >= 0 AND partitionColumn < 0
> whereClause: partitionColumn >= 0 AND partitionColumn < 0
> whereClause: partitionColumn >= 0
> This causes data skew: the last partition will contain all the data.
> I propose throwing an exception if the stride equals 0, to help Spark users become 
> aware of the data skew issue as soon as possible:
> if (stride == 0) throw new 
> IllegalArgumentException("partitioning.upperBound / numPartitions -  
> partitioning.lowerBound / numPartitions is zero");
> partitionColumn must be an integral type, so if we want to load a big table from a 
> DBMS, we need a workaround.
> A real case: exporting data from an Oracle database through pyspark.
> #data skew issue version
> df=ssc.read.format("jdbc").options( url=url,
> dbtable="( SELECT ORA_HASH(PART_COL,7)  AS PART_ID, A.* FROM DBMS_TAB A ) 
> TAB_ALIAS",
> fetchSize="1000",
> partitionColumn="PART_ID",
> numPartitions="8",
> lowerBound="0",
> upperBound="7").load()
> #no data skew issue version
> df=ssc.read.format("jdbc").options( url=url,
> dbtable="( SELECT ORA_HASH(PART_COL,7)+1  AS PART_ID, A.* FROM DBMS_TAB A 
> ) TAB_ALIAS",
> fetchSize="1000",
> partitionColumn="PART_ID",
> numPartitions="8",
> lowerBound="1",
> upperBound="8").load()






[jira] [Updated] (SPARK-14321) Reduce date format cost in date functions

2016-05-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-14321:

Target Version/s: 2.0.0

> Reduce date format cost in date functions
> -
>
> Key: SPARK-14321
> URL: https://issues.apache.org/jira/browse/SPARK-14321
> Project: Spark
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Minor
>
> Currently the code generated is
> {noformat}
> /* 066 */ UTF8String primitive5 = null;
> /* 067 */ if (!isNull4) {
> /* 068 */   try {
> /* 069 */ primitive5 = UTF8String.fromString(new 
> java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(
> /* 070 */ new java.util.Date(primitive7 * 1000L)));
> /* 071 */   } catch (java.lang.Throwable e) {
> /* 072 */ isNull4 = true;
> /* 073 */   }
> /* 074 */ }
> {noformat}
> Instantiation of a SimpleDateFormat is fairly expensive. It can be created once, on 
> an as-needed basis, and reused rather than constructed per row. 
> I will share the patch soon.
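A minimal sketch of the idea outside of the generated code (illustrative; the actual 
patch may differ): create the formatter once per thread and reuse it, since 
SimpleDateFormat is expensive to construct and not thread-safe:
{code}
import java.text.SimpleDateFormat
import java.util.Date

object TimestampFormatting {
  // One formatter per thread, created lazily and then reused for every row,
  // instead of a new SimpleDateFormat per evaluated row.
  private val formatter = new ThreadLocal[SimpleDateFormat] {
    override def initialValue(): SimpleDateFormat =
      new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
  }

  def formatSeconds(seconds: Long): String =
    formatter.get().format(new Date(seconds * 1000L))
}
{code}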






[jira] [Updated] (SPARK-15345) SparkSession's conf doesn't take effect when there's already an existing SparkContext

2016-05-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-15345:

Affects Version/s: (was: 2.0.0)
 Target Version/s: 2.0.0

> SparkSession's conf doesn't take effect when there's already an existing 
> SparkContext
> -
>
> Key: SPARK-15345
> URL: https://issues.apache.org/jira/browse/SPARK-15345
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Reporter: Piotr Milanowski
>Assignee: Reynold Xin
>Priority: Blocker
> Fix For: 2.0.0
>
>
> I am working with branch-2.0; Spark is compiled with Hive support (-Phive and 
> -Phive-thriftserver).
> I am trying to access databases using this snippet:
> {code}
> from pyspark.sql import HiveContext
> hc = HiveContext(sc)
> hc.sql("show databases").collect()
> [Row(result='default')]
> {code}
> This means that Spark doesn't find any of the databases specified in the configuration.
> Using the same configuration (i.e. hive-site.xml and core-site.xml) in Spark 
> 1.6 and launching the above snippet, I can print out the existing databases.
> When run in DEBUG mode this is what spark (2.0) prints out:
> {code}
> 16/05/16 12:17:47 INFO SparkSqlParser: Parsing command: show databases
> 16/05/16 12:17:47 DEBUG SimpleAnalyzer: 
> === Result of Batch Resolution ===
> !'Project [unresolveddeserializer(createexternalrow(if (isnull(input[0, 
> string])) null else input[0, string].toString, 
> StructField(result,StringType,false)), result#2) AS #3]   Project 
> [createexternalrow(if (isnull(result#2)) null else result#2.toString, 
> StructField(result,StringType,false)) AS #3]
>  +- LocalRelation [result#2]  
>   
>  +- LocalRelation [result#2]
> 
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure  
> (org.apache.spark.sql.Dataset$$anonfun$53) +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared fields: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public static final long 
> org.apache.spark.sql.Dataset$$anonfun$53.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  private final 
> org.apache.spark.sql.types.StructType 
> org.apache.spark.sql.Dataset$$anonfun$53.structType$1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final java.lang.Object 
> org.apache.spark.sql.Dataset$$anonfun$53.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final java.lang.Object 
> org.apache.spark.sql.Dataset$$anonfun$53.apply(org.apache.spark.sql.catalyst.InternalRow)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + populating accessed fields because 
> this is the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + fields accessed by starting 
> closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  +++ closure  
> (org.apache.spark.sql.Dataset$$anonfun$53) is now cleaned +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner: +++ Cleaning closure  
> (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1)
>  +++
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared fields: 1
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public static final long 
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.serialVersionUID
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + declared methods: 2
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final java.lang.Object 
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(java.lang.Object)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  public final 
> org.apache.spark.api.python.SerDeUtil$AutoBatchedPickler 
> org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1.apply(scala.collection.Iterator)
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + inner classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer classes: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + outer objects: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + populating accessed fields because 
> this is the starting closure
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + fields accessed by starting 
> closure: 0
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  + there are no enclosing objects!
> 16/05/16 12:17:47 DEBUG ClosureCleaner:  +++ closure  
> (org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$javaToPython$1)
>  is now cleaned +++

[jira] [Updated] (SPARK-14400) ScriptTransformation does not fail the job for bad user command

2016-05-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-14400:

Assignee: Tejas Patil

> ScriptTransformation does not fail the job for bad user command
> ---
>
> Key: SPARK-14400
> URL: https://issues.apache.org/jira/browse/SPARK-14400
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Tejas Patil
>Assignee: Tejas Patil
>Priority: Minor
>
> If the `script` to be run is an incorrect command, Spark does not catch the 
> failure in running the sub-process, and the job is marked as successful.
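A minimal way to observe the symptom (a hedged sketch; the command name and the table 
are illustrative and assume a Hive-enabled session): pipe rows through a command that 
does not exist and check whether the job still reports success:
{code}
// Script transformation pipes rows through an external command; with a bogus
// command the query should fail instead of completing "successfully".
spark.sql(
  """SELECT TRANSFORM (key, value)
    |USING 'some_nonexistent_command'
    |AS (k, v)
    |FROM src
  """.stripMargin).show()
{code}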






[jira] [Updated] (SPARK-14400) ScriptTransformation does not fail the job for bad user command

2016-05-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-14400:

Target Version/s: 2.0.0

> ScriptTransformation does not fail the job for bad user command
> ---
>
> Key: SPARK-14400
> URL: https://issues.apache.org/jira/browse/SPARK-14400
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Tejas Patil
>Priority: Minor
>
> If the `script` to be run is an incorrect command, Spark does not catch the 
> failure in running the sub-process, and the job is marked as successful.






[jira] [Updated] (SPARK-15520) SparkSession builder in python should also allow overriding confs of existing sessions

2016-05-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-15520:

Issue Type: Sub-task  (was: Bug)
Parent: SPARK-13485

> SparkSession builder in python should also allow overriding confs of existing 
> sessions
> --
>
> Key: SPARK-15520
> URL: https://issues.apache.org/jira/browse/SPARK-15520
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Eric Liang
>
> This is a leftover TODO from the SparkSession clean in this PR: 
> https://github.com/apache/spark/pull/13200






[jira] [Updated] (SPARK-15520) SparkSession builder in python should also allow overriding confs of existing sessions

2016-05-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-15520:

Target Version/s: 2.0.0

> SparkSession builder in python should also allow overriding confs of existing 
> sessions
> --
>
> Key: SPARK-15520
> URL: https://issues.apache.org/jira/browse/SPARK-15520
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Eric Liang
>
> This is a leftover TODO from the SparkSession clean in this PR: 
> https://github.com/apache/spark/pull/13200






[jira] [Resolved] (SPARK-15512) repartition(0) should raise IllegalArgumentException.

2016-05-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-15512.
-
   Resolution: Fixed
 Assignee: Dongjoon Hyun
Fix Version/s: 2.0.0

> repartition(0) should raise IllegalArgumentException.
> -
>
> Key: SPARK-15512
> URL: https://issues.apache.org/jira/browse/SPARK-15512
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
> Fix For: 2.0.0
>
>
> Previously, SPARK-8893 added the positive partition constrains on 
> repartition/coalesce operations in general.
> This PR adds one missing part for that and adds explicit two testcases.
> **Before**
> {code:title=repartition(0)|borderStyle=solid}
> scala> sc.parallelize(1 to 5).coalesce(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> scala> sc.parallelize(1 to 5).repartition(0).collect()
> res1: Array[Int] = Array()
> {code}
> **After**
> {code:title=repartition(0)|borderStyle=solid}
> scala> sc.parallelize(1 to 5).coalesce(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> scala> sc.parallelize(1 to 5).repartition(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> {code}






[jira] [Updated] (SPARK-13484) Filter outer joined result using a non-nullable column from the right table

2016-05-24 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated SPARK-13484:
-
Target Version/s: 2.0.0

> Filter outer joined result using a non-nullable column from the right table
> ---
>
> Key: SPARK-13484
> URL: https://issues.apache.org/jira/browse/SPARK-13484
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2, 1.6.0, 2.0.0
>Reporter: Xiangrui Meng
>
> Technically speaking, this is not a bug. But
> {code}
> val a = sqlContext.range(10).select(col("id"), lit(0).as("count"))
> val b = sqlContext.range(10).select((col("id") % 
> 3).as("id")).groupBy("id").count()
> a.join(b, a("id") === b("id"), "left_outer").filter(b("count").isNull).show()
> {code}
> returns nothing. This is because `b("count")` is not nullable and the filter 
> condition is always false by static analysis. However, it is common for users 
> to use `a(...)` and `b(...)` to filter the joined result.
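One possible workaround (a sketch, not necessarily the direction this ticket will take): 
resolve the column against the joined result, whose output schema marks count as 
nullable after the outer join, instead of against b:
{code}
// After the left outer join, the count attribute in the joined schema is nullable,
// so filtering on it is not folded away by static analysis.
val joined = a.join(b, a("id") === b("id"), "left_outer")
joined.filter(joined("count").isNull).show()
{code}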






[jira] [Assigned] (SPARK-15520) SparkSession builder in python should also allow overriding confs of existing sessions

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15520:


Assignee: Apache Spark

> SparkSession builder in python should also allow overriding confs of existing 
> sessions
> --
>
> Key: SPARK-15520
> URL: https://issues.apache.org/jira/browse/SPARK-15520
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Eric Liang
>Assignee: Apache Spark
>
> This is a leftover TODO from the SparkSession clean in this PR: 
> https://github.com/apache/spark/pull/13200






[jira] [Assigned] (SPARK-15520) SparkSession builder in python should also allow overriding confs of existing sessions

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15520:


Assignee: (was: Apache Spark)

> SparkSession builder in python should also allow overriding confs of existing 
> sessions
> --
>
> Key: SPARK-15520
> URL: https://issues.apache.org/jira/browse/SPARK-15520
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Eric Liang
>
> This is a leftover TODO from the SparkSession clean in this PR: 
> https://github.com/apache/spark/pull/13200






[jira] [Commented] (SPARK-15520) SparkSession builder in python should also allow overriding confs of existing sessions

2016-05-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299286#comment-15299286
 ] 

Apache Spark commented on SPARK-15520:
--

User 'ericl' has created a pull request for this issue:
https://github.com/apache/spark/pull/13289

> SparkSession builder in python should also allow overriding confs of existing 
> sessions
> --
>
> Key: SPARK-15520
> URL: https://issues.apache.org/jira/browse/SPARK-15520
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Eric Liang
>
> This is a leftover TODO from the SparkSession clean in this PR: 
> https://github.com/apache/spark/pull/13200






[jira] [Commented] (SPARK-10592) deprecate weights and use coefficients instead in ML models

2016-05-24 Thread Kai Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299283#comment-15299283
 ] 

Kai Jiang commented on SPARK-10592:
---

This is answered on corresponding Github PR.

> deprecate weights and use coefficients instead in ML models
> ---
>
> Key: SPARK-10592
> URL: https://issues.apache.org/jira/browse/SPARK-10592
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Xiangrui Meng
>Assignee: Kai Jiang
>Priority: Critical
> Fix For: 1.6.0
>
>
> The name `weights` becomes confusing as we are supporting weighted instances. 
> As discussed in https://github.com/apache/spark/pull/7884, we want to 
> deprecate `weights` and use `coefficients` instead:
> * Deprecate but do not remove `weights`.
> * Only make changes under `spark.ml`.






[jira] [Updated] (SPARK-15520) SparkSession builder in python should also allow overriding confs of existing sessions

2016-05-24 Thread Eric Liang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Liang updated SPARK-15520:
---
Component/s: SQL

> SparkSession builder in python should also allow overriding confs of existing 
> sessions
> --
>
> Key: SPARK-15520
> URL: https://issues.apache.org/jira/browse/SPARK-15520
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Eric Liang
>
> This is a leftover TODO from the SparkSession clean in this PR: 
> https://github.com/apache/spark/pull/13200






[jira] [Created] (SPARK-15520) SparkSession builder in python should also allow overriding confs of existing sessions

2016-05-24 Thread Eric Liang (JIRA)
Eric Liang created SPARK-15520:
--

 Summary: SparkSession builder in python should also allow 
overriding confs of existing sessions
 Key: SPARK-15520
 URL: https://issues.apache.org/jira/browse/SPARK-15520
 Project: Spark
  Issue Type: Bug
Reporter: Eric Liang


This is a leftover TODO from the SparkSession clean in this PR: 
https://github.com/apache/spark/pull/13200






[jira] [Updated] (SPARK-15518) Rename various scheduler backend for consistency

2016-05-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-15518:

Description: 
Various scheduler backends are not named consistently, making it difficult to 
understand what they do based on the names. It would be great to rename some of 
them:

- LocalScheduler -> LocalSchedulerBackend
- AppClient -> StandaloneAppClient
- AppClientListener -> StandaloneAppClientListener
- SparkDeploySchedulerBackend -> StandaloneSchedulerBackend
- CoarseMesosSchedulerBackend -> MesosCoarseGrainedSchedulerBackend
- MesosSchedulerBackend -> MesosFineGrainedSchedulerBackend


  was:
SparkDeploySchedulerBackend is a weird name and inconsistent with the rest of the 
backends (e.g. CoarseMesosSchedulerBackend).



> Rename various scheduler backend for consistency
> 
>
> Key: SPARK-15518
> URL: https://issues.apache.org/jira/browse/SPARK-15518
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> Various scheduler backends are not named consistently, making it difficult to 
> understand what they do based on the names. It would be great to rename some 
> of them:
> - LocalScheduler -> LocalSchedulerBackend
> - AppClient -> StandaloneAppClient
> - AppClientListener -> StandaloneAppClientListener
> - SparkDeploySchedulerBackend -> StandaloneSchedulerBackend
> - CoarseMesosSchedulerBackend -> MesosCoarseGrainedSchedulerBackend
> - MesosSchedulerBackend -> MesosFineGrainedSchedulerBackend






[jira] [Updated] (SPARK-15518) Rename various scheduler backend for consistency

2016-05-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-15518:

Summary: Rename various scheduler backend for consistency  (was: Make 
various scheduler backend class names consistent)

> Rename various scheduler backend for consistency
> 
>
> Key: SPARK-15518
> URL: https://issues.apache.org/jira/browse/SPARK-15518
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> SparkDeploySchedulerBackend is a weird name and inconsistent with the rest of the 
> backends (e.g. CoarseMesosSchedulerBackend).






[jira] [Assigned] (SPARK-15518) Make various scheduler backend class names consistent

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15518:


Assignee: Apache Spark  (was: Reynold Xin)

> Make various scheduler backend class names consistent
> -
>
> Key: SPARK-15518
> URL: https://issues.apache.org/jira/browse/SPARK-15518
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> SparkDeploySchedulerBackend is a weird name and inconsistent with the rest of the 
> backends (e.g. CoarseMesosSchedulerBackend).






[jira] [Assigned] (SPARK-15518) Make various scheduler backend class names consistent

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15518:


Assignee: Reynold Xin  (was: Apache Spark)

> Make various scheduler backend class names consistent
> -
>
> Key: SPARK-15518
> URL: https://issues.apache.org/jira/browse/SPARK-15518
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> SparkDeploySchedulerBackend is a weird name and inconsistent with the rest of the 
> backends (e.g. CoarseMesosSchedulerBackend).






[jira] [Commented] (SPARK-15518) Make various scheduler backend class names consistent

2016-05-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299273#comment-15299273
 ] 

Apache Spark commented on SPARK-15518:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/13288

> Make various scheduler backend class names consistent
> -
>
> Key: SPARK-15518
> URL: https://issues.apache.org/jira/browse/SPARK-15518
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> SparkDeploySchedulerBackend is a weird name and inconsistent with the rest of the 
> backends (e.g. CoarseMesosSchedulerBackend).






[jira] [Updated] (SPARK-15518) Make various scheduler backend class names consistent

2016-05-24 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-15518:

Summary: Make various scheduler backend class names consistent  (was: 
Rename SparkDeploySchedulerBackend to StandaloneSchedulerBackend)

> Make various scheduler backend class names consistent
> -
>
> Key: SPARK-15518
> URL: https://issues.apache.org/jira/browse/SPARK-15518
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> SparkDeploySchedulerBackend is a weird name and inconsistent with the rest of the 
> backends (e.g. CoarseMesosSchedulerBackend).






[jira] [Comment Edited] (SPARK-4820) Spark build encounters "File name too long" on some encrypted filesystems

2016-05-24 Thread Niko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299232#comment-15299232
 ] 

Niko edited comment on SPARK-4820 at 5/25/16 1:00 AM:
--

Yep, still an issue. I'm using Ubuntu 15.10 and building from scratch, version 
2.0.0-SNAPSHOT.


was (Author: nskirov):
Yep, still an issue. I'm using Ubuntu 15.10 and building from scratch.

> Spark build encounters "File name too long" on some encrypted filesystems
> -
>
> Key: SPARK-4820
> URL: https://issues.apache.org/jira/browse/SPARK-4820
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Patrick Wendell
>Assignee: Theodore Vasiloudis
>Priority: Minor
> Fix For: 1.4.0
>
>
> This was reported by Luchesar Cekov on github along with a proposed fix. The 
> fix has some potential downstream issues (it will modify the classnames) so 
> until we understand better how many users are affected we aren't going to 
> merge it. However, I'd like to include the issue and workaround here. If you 
> encounter this issue please comment on the JIRA so we can assess the 
> frequency.
> The issue produces this error:
> {code}
> [error] == Expanded type of tree ==
> [error] 
> [error] ConstantType(value = Constant(Throwable))
> [error] 
> [error] uncaught exception during compilation: java.io.IOException
> [error] File name too long
> [error] two errors found
> {code}
> The workaround in Maven is to add the following under the compile options: 
> {code}
> +  -Xmax-classfile-name
> +  128
> {code}
> In SBT add:
> {code}
> +scalacOptions in Compile ++= Seq("-Xmax-classfile-name", "128"),
> {code}






[jira] [Comment Edited] (SPARK-4820) Spark build encounters "File name too long" on some encrypted filesystems

2016-05-24 Thread Niko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299232#comment-15299232
 ] 

Niko edited comment on SPARK-4820 at 5/25/16 12:45 AM:
---

Yep, still an issue. I'm using Ubuntu 15.10 and building from scratch.


was (Author: nskirov):
Yep, still an issue. I'm using Ubuntu 15.10 and building from scratch. Also, 
could someone please give me a bit more details on where the configuration 
mentioned in the workaround actually lives?

> Spark build encounters "File name too long" on some encrypted filesystems
> -
>
> Key: SPARK-4820
> URL: https://issues.apache.org/jira/browse/SPARK-4820
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Patrick Wendell
>Assignee: Theodore Vasiloudis
>Priority: Minor
> Fix For: 1.4.0
>
>
> This was reported by Luchesar Cekov on github along with a proposed fix. The 
> fix has some potential downstream issues (it will modify the classnames) so 
> until we understand better how many users are affected we aren't going to 
> merge it. However, I'd like to include the issue and workaround here. If you 
> encounter this issue please comment on the JIRA so we can assess the 
> frequency.
> The issue produces this error:
> {code}
> [error] == Expanded type of tree ==
> [error] 
> [error] ConstantType(value = Constant(Throwable))
> [error] 
> [error] uncaught exception during compilation: java.io.IOException
> [error] File name too long
> [error] two errors found
> {code}
> The workaround in Maven is to add the following under the compile options: 
> {code}
> +  -Xmax-classfile-name
> +  128
> {code}
> In SBT add:
> {code}
> +scalacOptions in Compile ++= Seq("-Xmax-classfile-name", "128"),
> {code}






[jira] [Comment Edited] (SPARK-4820) Spark build encounters "File name too long" on some encrypted filesystems

2016-05-24 Thread Niko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299232#comment-15299232
 ] 

Niko edited comment on SPARK-4820 at 5/25/16 12:40 AM:
---

Yep, still an issue. I'm using Ubuntu 15.10 and building from scratch. Also, 
could someone please give me a bit more details on where the configuration 
mentioned in the workaround actually lives?


was (Author: nskirov):
Yep, still an issue. I'm using Ubuntu 15.10 and building from scratch.

> Spark build encounters "File name too long" on some encrypted filesystems
> -
>
> Key: SPARK-4820
> URL: https://issues.apache.org/jira/browse/SPARK-4820
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Patrick Wendell
>Assignee: Theodore Vasiloudis
>Priority: Minor
> Fix For: 1.4.0
>
>
> This was reported by Luchesar Cekov on github along with a proposed fix. The 
> fix has some potential downstream issues (it will modify the classnames) so 
> until we understand better how many users are affected we aren't going to 
> merge it. However, I'd like to include the issue and workaround here. If you 
> encounter this issue please comment on the JIRA so we can assess the 
> frequency.
> The issue produces this error:
> {code}
> [error] == Expanded type of tree ==
> [error] 
> [error] ConstantType(value = Constant(Throwable))
> [error] 
> [error] uncaught exception during compilation: java.io.IOException
> [error] File name too long
> [error] two errors found
> {code}
> The workaround in Maven is to add the following under the compile options: 
> {code}
> +  -Xmax-classfile-name
> +  128
> {code}
> In SBT add:
> {code}
> +scalacOptions in Compile ++= Seq("-Xmax-classfile-name", "128"),
> {code}






[jira] [Commented] (SPARK-15519) Shuffle Service fails to start if first yarn.nodemanager.local-dirs is bad

2016-05-24 Thread Matthew Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299234#comment-15299234
 ] 

Matthew Sharp commented on SPARK-15519:
---

I misclicked and resolved as Fixed instead of as Duplicate. Feel free to edit.

> Shuffle Service fails to start if first yarn.nodemanager.local-dirs is bad
> --
>
> Key: SPARK-15519
> URL: https://issues.apache.org/jira/browse/SPARK-15519
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, YARN
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04 LTS, MapR 5.1, hadoop-2.7.0
>Reporter: Matthew Sharp
>
> {{yarn.nodemanager.local-dirs}} is set to 
> {{/mnt/data0,/mnt/data1,/mnt/data2,/mnt/data3,/mnt/data4}}
> /mnt/data0 was not mounted due to a disk failure, so it was an empty 
> directory which users were not allowed to write to.
> Starting up the node manager, we get this in the logs:
> {quote}
> 2016-05-24 15:41:56,456 INFO 
> org.apache.spark.network.yarn.YarnShuffleService: Initializing YARN shuffle 
> service for Spark
> 2016-05-24 15:41:56,456 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: 
> Adding auxiliary service spark_shuffle, "spark_shuffle"
> 2016-05-24 15:41:56,609 ERROR 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error opening 
> leveldb file /mnt/data0/registeredExecutors.ldb.  Creating new file, will not 
> be able to recover state for existing applications
> org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: 
> /mnt/data0/registeredExecutors.ldb/LOCK: Permission denied
> at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
> at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
> at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:100)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:81)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.(ExternalShuffleBlockHandler.java:56)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:157)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:250)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:256)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
> 2016-05-24 15:41:56,611 WARN 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error deleting 
> /mnt/data0/registeredExecutors.ldb
> 2016-05-24 15:41:56,611 ERROR 
> org.apache.spark.network.yarn.YarnShuffleService: Failed to initialize 
> external shuffle service
> java.io.IOException: Unable to create state store
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:129)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:81)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.(ExternalShuffleBlockHandler.java:56)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:157)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:250)
> at 
> org.apache.hadoop.s

[jira] [Commented] (SPARK-4820) Spark build encounters "File name too long" on some encrypted filesystems

2016-05-24 Thread Niko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299232#comment-15299232
 ] 

Niko commented on SPARK-4820:
-

Yep, still an issue. I'm using Ubuntu 15.10 and building from scratch.

> Spark build encounters "File name too long" on some encrypted filesystems
> -
>
> Key: SPARK-4820
> URL: https://issues.apache.org/jira/browse/SPARK-4820
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Patrick Wendell
>Assignee: Theodore Vasiloudis
>Priority: Minor
> Fix For: 1.4.0
>
>
> This was reported by Luchesar Cekov on github along with a proposed fix. The 
> fix has some potential downstream issues (it will modify the classnames) so 
> until we understand better how many users are affected we aren't going to 
> merge it. However, I'd like to include the issue and workaround here. If you 
> encounter this issue please comment on the JIRA so we can assess the 
> frequency.
> The issue produces this error:
> {code}
> [error] == Expanded type of tree ==
> [error] 
> [error] ConstantType(value = Constant(Throwable))
> [error] 
> [error] uncaught exception during compilation: java.io.IOException
> [error] File name too long
> [error] two errors found
> {code}
> The workaround in Maven is to add the following under the compile options: 
> {code}
> +  -Xmax-classfile-name
> +  128
> {code}
> In SBT add:
> {code}
> +scalacOptions in Compile ++= Seq("-Xmax-classfile-name", "128"),
> {code}






[jira] [Assigned] (SPARK-15491) JSON serialization fails for JDBC DataFrames

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15491:


Assignee: (was: Apache Spark)

> JSON serialization fails for JDBC DataFrames
> 
>
> Key: SPARK-15491
> URL: https://issues.apache.org/jira/browse/SPARK-15491
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: MacOS 10.11.5, Spark 2.0.0-preview
>Reporter: Marc Prud'hommeaux
>
> The TreeNode.toJSON feature implemented in SPARK-12321 fails with an 
> assertion error on DataFrames that use JDBC in spark 2.0.0-preview:
> {code}
> scala> 
> sqlContext.read.json("examples/src/main/resources/people.json").select("name",
>  "age").agg(avg("age"), count("name")).filter(avg("age") > 
> 10).queryExecution.logical.toJSON
> res113: String = 
> [{"class":"org.apache.spark.sql.catalyst.plans.logical.Filter","num-children":1,"condition":[{"class":"org.apache.spark.sql.catalyst.expressions.GreaterThan","num-children":2,"left":0,"right":1},{"class":"org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression","num-children":1,...
> scala> sqlContext.read.format("jdbc").options(db + ("dbtable" -> 
> "categories")).load().queryExecution.logical.simpleString
> res120: String = Relation[category#2148,categoryname#2149] 
> JDBCRelation(categories)
> scala> sqlContext.read.format("jdbc").options(db + ("dbtable" -> 
> "categories")).load().queryExecution.logical.toJSON
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:156)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.org$apache$spark$sql$catalyst$trees$TreeNode$$parseToJson(TreeNode.scala:598)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$jsonFields$2.apply(TreeNode.scala:562)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$jsonFields$2.apply(TreeNode.scala:553)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.jsonFields(TreeNode.scala:553)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.org$apache$spark$sql$catalyst$trees$TreeNode$$collectJsonValue$1(TreeNode.scala:538)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.jsonValue(TreeNode.scala:543)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.toJSON(TreeNode.scala:529)
>   ... 48 elided
> {code}






[jira] [Assigned] (SPARK-15491) JSON serialization fails for JDBC DataFrames

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15491:


Assignee: Apache Spark

> JSON serialization fails for JDBC DataFrames
> 
>
> Key: SPARK-15491
> URL: https://issues.apache.org/jira/browse/SPARK-15491
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: MacOS 10.11.5, Spark 2.0.0-preview
>Reporter: Marc Prud'hommeaux
>Assignee: Apache Spark
>
> The TreeNode.toJSON feature implemented in SPARK-12321 fails with an 
> assertion error on DataFrames that use JDBC in spark 2.0.0-preview:
> {code}
> scala> 
> sqlContext.read.json("examples/src/main/resources/people.json").select("name",
>  "age").agg(avg("age"), count("name")).filter(avg("age") > 
> 10).queryExecution.logical.toJSON
> res113: String = 
> [{"class":"org.apache.spark.sql.catalyst.plans.logical.Filter","num-children":1,"condition":[{"class":"org.apache.spark.sql.catalyst.expressions.GreaterThan","num-children":2,"left":0,"right":1},{"class":"org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression","num-children":1,...
> scala> sqlContext.read.format("jdbc").options(db + ("dbtable" -> 
> "categories")).load().queryExecution.logical.simpleString
> res120: String = Relation[category#2148,categoryname#2149] 
> JDBCRelation(categories)
> scala> sqlContext.read.format("jdbc").options(db + ("dbtable" -> 
> "categories")).load().queryExecution.logical.toJSON
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:156)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.org$apache$spark$sql$catalyst$trees$TreeNode$$parseToJson(TreeNode.scala:598)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$jsonFields$2.apply(TreeNode.scala:562)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$jsonFields$2.apply(TreeNode.scala:553)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.jsonFields(TreeNode.scala:553)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.org$apache$spark$sql$catalyst$trees$TreeNode$$collectJsonValue$1(TreeNode.scala:538)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.jsonValue(TreeNode.scala:543)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.toJSON(TreeNode.scala:529)
>   ... 48 elided
> {code}






[jira] [Commented] (SPARK-15491) JSON serialization fails for JDBC DataFrames

2016-05-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299228#comment-15299228
 ] 

Apache Spark commented on SPARK-15491:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/13287

> JSON serialization fails for JDBC DataFrames
> 
>
> Key: SPARK-15491
> URL: https://issues.apache.org/jira/browse/SPARK-15491
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: MacOS 10.11.5, Spark 2.0.0-preview
>Reporter: Marc Prud'hommeaux
>
> The TreeNode.toJSON feature implemented in SPARK-12321 fails with an 
> assertion error on DataFrames that use JDBC in spark 2.0.0-preview:
> {code}
> scala> 
> sqlContext.read.json("examples/src/main/resources/people.json").select("name",
>  "age").agg(avg("age"), count("name")).filter(avg("age") > 
> 10).queryExecution.logical.toJSON
> res113: String = 
> [{"class":"org.apache.spark.sql.catalyst.plans.logical.Filter","num-children":1,"condition":[{"class":"org.apache.spark.sql.catalyst.expressions.GreaterThan","num-children":2,"left":0,"right":1},{"class":"org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression","num-children":1,...
> scala> sqlContext.read.format("jdbc").options(db + ("dbtable" -> 
> "categories")).load().queryExecution.logical.simpleString
> res120: String = Relation[category#2148,categoryname#2149] 
> JDBCRelation(categories)
> scala> sqlContext.read.format("jdbc").options(db + ("dbtable" -> 
> "categories")).load().queryExecution.logical.toJSON
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:156)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.org$apache$spark$sql$catalyst$trees$TreeNode$$parseToJson(TreeNode.scala:598)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$jsonFields$2.apply(TreeNode.scala:562)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$jsonFields$2.apply(TreeNode.scala:553)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.jsonFields(TreeNode.scala:553)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.org$apache$spark$sql$catalyst$trees$TreeNode$$collectJsonValue$1(TreeNode.scala:538)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.jsonValue(TreeNode.scala:543)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.toJSON(TreeNode.scala:529)
>   ... 48 elided
> {code}






[jira] [Resolved] (SPARK-15519) Shuffle Service fails to start if first yarn.nodemanager.local-dirs is bad

2016-05-24 Thread Matthew Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Sharp resolved SPARK-15519.
---
Resolution: Fixed

Agreed, closing as dupe.

> Shuffle Service fails to start if first yarn.nodemanager.local-dirs is bad
> --
>
> Key: SPARK-15519
> URL: https://issues.apache.org/jira/browse/SPARK-15519
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, YARN
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04 LTS, MapR 5.1, hadoop-2.7.0
>Reporter: Matthew Sharp
>
> {{yarn.nodemanager.local-dirs}} is set to 
> {{/mnt/data0,/mnt/data1,/mnt/data2,/mnt/data3,/mnt/data4}}
> /mnt/data0 was not mounted due to a disk failure, so it was an empty 
> directory which users were not allowed to write to.
> Starting up the node manager, we get this in the logs:
> {quote}
> 2016-05-24 15:41:56,456 INFO 
> org.apache.spark.network.yarn.YarnShuffleService: Initializing YARN shuffle 
> service for Spark
> 2016-05-24 15:41:56,456 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: 
> Adding auxiliary service spark_shuffle, "spark_shuffle"
> 2016-05-24 15:41:56,609 ERROR 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error opening 
> leveldb file /mnt/data0/registeredExecutors.ldb.  Creating new file, will not 
> be able to recover state for existing applications
> org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: 
> /mnt/data0/registeredExecutors.ldb/LOCK: Permission denied
> at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
> at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
> at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:100)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:81)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.(ExternalShuffleBlockHandler.java:56)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:157)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:250)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:256)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
> 2016-05-24 15:41:56,611 WARN 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error deleting 
> /mnt/data0/registeredExecutors.ldb
> 2016-05-24 15:41:56,611 ERROR 
> org.apache.spark.network.yarn.YarnShuffleService: Failed to initialize 
> external shuffle service
> java.io.IOException: Unable to create state store
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:129)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:81)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.(ExternalShuffleBlockHandler.java:56)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:157)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:250)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop

[jira] [Commented] (SPARK-15519) Shuffle Service fails to start if first yarn.nodemanager.local-dirs is bad

2016-05-24 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299204#comment-15299204
 ] 

Marcelo Vanzin commented on SPARK-15519:


Looks like this is the same as, or at least related to, SPARK-14963.

> Shuffle Service fails to start if first yarn.nodemanager.local-dirs is bad
> --
>
> Key: SPARK-15519
> URL: https://issues.apache.org/jira/browse/SPARK-15519
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, YARN
>Affects Versions: 1.6.1
> Environment: Ubuntu 14.04 LTS, MapR 5.1, hadoop-2.7.0
>Reporter: Matthew Sharp
>
> {{yarn.nodemanager.local-dirs}} is set to 
> {{/mnt/data0,/mnt/data1,/mnt/data2,/mnt/data3,/mnt/data4}}
> /mnt/data0 was not mounted due to a disk failure, so it was an empty 
> directory which users were not allowed to write to.
> Starting up the node manager, we get this in the logs:
> {quote}
> 2016-05-24 15:41:56,456 INFO 
> org.apache.spark.network.yarn.YarnShuffleService: Initializing YARN shuffle 
> service for Spark
> 2016-05-24 15:41:56,456 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: 
> Adding auxiliary service spark_shuffle, "spark_shuffle"
> 2016-05-24 15:41:56,609 ERROR 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error opening 
> leveldb file /mnt/data0/registeredExecutors.ldb.  Creating new file, will not 
> be able to recover state for existing applications
> org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: 
> /mnt/data0/registeredExecutors.ldb/LOCK: Permission denied
> at 
> org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
> at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
> at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:100)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:81)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.(ExternalShuffleBlockHandler.java:56)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:157)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:250)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:256)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
> at 
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
> 2016-05-24 15:41:56,611 WARN 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error deleting 
> /mnt/data0/registeredExecutors.ldb
> 2016-05-24 15:41:56,611 ERROR 
> org.apache.spark.network.yarn.YarnShuffleService: Failed to initialize 
> external shuffle service
> java.io.IOException: Unable to create state store
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:129)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:81)
> at 
> org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.(ExternalShuffleBlockHandler.java:56)
> at 
> org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:157)
> at 
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> at 
> org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
> at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:250)
> at 
> org.apache.hadoop.service.Abst

[jira] [Updated] (SPARK-15519) Shuffle Service fails to start if first yarn.nodemanager.local-dirs is bad

2016-05-24 Thread Matthew Sharp (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthew Sharp updated SPARK-15519:
--
Description: 
{{yarn.nodemanager.local-dirs}} is set to 
{{/mnt/data0,/mnt/data1,/mnt/data2,/mnt/data3,/mnt/data4}}

/mnt/data0 was not mounted due to a disk failure, so it was an empty directory 
which users were not allowed to write to.

Starting up the node manager, we get this in the logs:
{quote}
2016-05-24 15:41:56,456 INFO org.apache.spark.network.yarn.YarnShuffleService: 
Initializing YARN shuffle service for Spark
2016-05-24 15:41:56,456 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Adding 
auxiliary service spark_shuffle, "spark_shuffle"
2016-05-24 15:41:56,609 ERROR 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error opening 
leveldb file /mnt/data0/registeredExecutors.ldb.  Creating new file, will not 
be able to recover state for existing applications
org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: 
/mnt/data0/registeredExecutors.ldb/LOCK: Permission denied
at 
org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:100)
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:81)
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.(ExternalShuffleBlockHandler.java:56)
at 
org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:157)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:250)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:256)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
2016-05-24 15:41:56,611 WARN 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error deleting 
/mnt/data0/registeredExecutors.ldb
2016-05-24 15:41:56,611 ERROR org.apache.spark.network.yarn.YarnShuffleService: 
Failed to initialize external shuffle service
java.io.IOException: Unable to create state store
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:129)
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:81)
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.(ExternalShuffleBlockHandler.java:56)
at 
org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:157)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:250)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:256)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
Caused by: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: 
/mnt/data0/registeredExecutors.ldb/LOCK: Permission denied
at 
org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)

[jira] [Created] (SPARK-15519) Shuffle Service fails to start if first yarn.nodemanager.local-dirs is bad

2016-05-24 Thread Matthew Sharp (JIRA)
Matthew Sharp created SPARK-15519:
-

 Summary: Shuffle Service fails to start if first 
yarn.nodemanager.local-dirs is bad
 Key: SPARK-15519
 URL: https://issues.apache.org/jira/browse/SPARK-15519
 Project: Spark
  Issue Type: Bug
  Components: Shuffle, YARN
Affects Versions: 1.6.1
 Environment: Ubuntu 14.04 LTS, MapR 5.1, hadoop-2.7.0
Reporter: Matthew Sharp


{{yarn.nodemanager.local-dirs}} is set to 
{{/mnt/data0,/mnt/data1,/mnt/data2,/mnt/data3,/mnt/data4}}

/mnt/data0 was not mounted due to a disk failure, so it was an empty directory 
which users were not allowed to write to.

Starting up the node manager, we get this in the logs:
{quote}
2016-05-24 15:41:56,456 INFO org.apache.spark.network.yarn.YarnShuffleService: 
Initializing YARN shuffle service for Spark
2016-05-24 15:41:56,456 INFO 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Adding 
auxiliary service spark_shuffle, "spark_shuffle"
2016-05-24 15:41:56,609 ERROR 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error opening 
leveldb file /mnt/data0/registeredExecutors.ldb.  Creating new file, will not 
be able to recover state for existing applications
org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: 
/mnt/data0/registeredExecutors.ldb/LOCK: Permission denied
at 
org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:100)
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:81)
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.(ExternalShuffleBlockHandler.java:56)
at 
org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:157)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:250)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:256)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
2016-05-24 15:41:56,611 WARN 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver: error deleting 
/mnt/data0/registeredExecutors.ldb
2016-05-24 15:41:56,611 ERROR org.apache.spark.network.yarn.YarnShuffleService: 
Failed to initialize external shuffle service
java.io.IOException: Unable to create state store
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:129)
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.(ExternalShuffleBlockResolver.java:81)
at 
org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.(ExternalShuffleBlockHandler.java:56)
at 
org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:157)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:250)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:256)
at 
org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
at 
org.apache.hadoop.yarn.server.

[jira] [Commented] (SPARK-10592) deprecate weights and use coefficients instead in ML models

2016-05-24 Thread Bharath Venkatesh (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299190#comment-15299190
 ] 

Bharath Venkatesh commented on SPARK-10592:
---

Hi Kai

I am trying to build a learning model in Spark 1.6 and I think I am hitting a 
bug related to this deprecation.

This is our sample use case.

Creating a learning model:
tokenizer = Tokenizer(inputCol=, outputCol=)
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol=)
lr = LogisticRegression(maxIter=10, regParam=0.01)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
model = pipeline.fit(labeledData)

Creating a DF:
testData=sc.textFile('').map() 
testDf = sqlContext.createDataFrame(testData, schema).where()

Evaluate a target Dataset by calling Model.Transform:
predictionsDf = model.transform(testDf)

I am calling the transform function. The transform is in turn referring to 
`weights`, which seems to be deprecated. I am getting the warning below.

Warning:
/usr/lib/spark/python/pyspark/ml/classification.py:207: UserWarning: weights is 
deprecated. Use coefficients instead.
warnings.warn("weights is deprecated. Use coefficients instead.")
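A self-contained version of the pipeline sketched above, written against the Spark 2.x API (on 1.6, substitute sqlContext.createDataFrame). The column names and toy data are made up for illustration; the UserWarning quoted above is a deprecation notice emitted from inside pyspark.ml, not an error in the pipeline itself.

{code}
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import HashingTF, Tokenizer
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pipeline-demo").getOrCreate()

# Hypothetical training data: free text plus a numeric label.
labeledData = spark.createDataFrame(
    [("spark is great", 1.0), ("hadoop mapreduce", 0.0)],
    ["text", "label"])

tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")
lr = LogisticRegression(maxIter=10, regParam=0.01)
pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])
model = pipeline.fit(labeledData)

# Score new rows; a "weights is deprecated" warning here does not affect the
# predictions.
testDf = spark.createDataFrame([("spark streaming",), ("yarn",)], ["text"])
predictionsDf = model.transform(testDf)
predictionsDf.select("text", "prediction").show()
{code}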

> deprecate weights and use coefficients instead in ML models
> ---
>
> Key: SPARK-10592
> URL: https://issues.apache.org/jira/browse/SPARK-10592
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Reporter: Xiangrui Meng
>Assignee: Kai Jiang
>Priority: Critical
> Fix For: 1.6.0
>
>
> The name `weights` becomes confusing as we are supporting weighted instances. 
> As discussed in https://github.com/apache/spark/pull/7884, we want to 
> deprecate `weights` and use `coefficients` instead:
> * Deprecate but do not remove `weights`.
> * Only make changes under `spark.ml`.






[jira] [Commented] (SPARK-15014) Spark Shell could use Ammonite Shell

2016-05-24 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299168#comment-15299168
 ] 

Jakob Odersky commented on SPARK-15014:
---

You might still have some issues with classloaders; I didn't think of that at 
first.

> Spark Shell could use Ammonite Shell
> 
>
> Key: SPARK-15014
> URL: https://issues.apache.org/jira/browse/SPARK-15014
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Affects Versions: 1.6.1
> Environment: All
>Reporter: John-Michael Reed
>Priority: Minor
>  Labels: shell, shell-script
>
> Lihaoyi has an enhanced Scala Shell called Ammonite. 
> https://github.com/lihaoyi/Ammonite
> Users of Ammonite shell have tried to use it with Apache Spark. 
> https://github.com/lihaoyi/Ammonite/issues/382
> Spark Shell does not work with Ammonite Shell, but I want it to because the 
> Ammonite REPL offers enhanced auto-complete, pretty printing, and other 
> features. See http://www.lihaoyi.com/Ammonite/#Ammonite-REPL






[jira] [Commented] (SPARK-15463) Support for creating a dataframe from CSV in RDD[String]

2016-05-24 Thread Xin Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299167#comment-15299167
 ] 

Xin Wu commented on SPARK-15463:


I am looking into this. 

> Support for creating a dataframe from CSV in RDD[String]
> 
>
> Key: SPARK-15463
> URL: https://issues.apache.org/jira/browse/SPARK-15463
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: PJ Fanning
>
> I currently use Databricks' spark-csv lib, but some features don't work with 
> Apache Spark 2.0.0-SNAPSHOT. I understand that with the addition of CSV 
> support into spark-sql directly, that spark-csv won't be modified.
> I currently read some CSV data that has been pre-processed and is in 
> RDD[String] format.
> There is sqlContext.read.json(rdd: RDD[String]) but other formats don't 
> appear to support the creation of DataFrames based on loading from 
> RDD[String].
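Until such an API exists, one workaround is to parse the lines by hand and build the DataFrame from the result. A minimal PySpark sketch, assuming simple comma-separated fields with no quoting or escaping; the column names and sample rows are made up:

{code}
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName("csv-from-rdd").getOrCreate()
sc = spark.sparkContext

# Pre-processed CSV already held as an RDD of strings.
lines = sc.parallelize(["1,alice", "2,bob"])

# Manual split plus Row construction: the boilerplate this ticket would like
# the built-in CSV support to absorb, the way read.json(rdd) already does.
rows = (lines.map(lambda l: l.split(","))
             .map(lambda p: Row(id=int(p[0]), name=p[1])))
df = spark.createDataFrame(rows)
df.show()
{code}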






[jira] [Created] (SPARK-15518) Rename SparkDeploySchedulerBackend to StandaloneSchedulerBackend

2016-05-24 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-15518:
---

 Summary: Rename SparkDeploySchedulerBackend to 
StandaloneSchedulerBackend
 Key: SPARK-15518
 URL: https://issues.apache.org/jira/browse/SPARK-15518
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Reporter: Reynold Xin
Assignee: Reynold Xin


SparkDeploySchedulerBackend is a weird name and inconsistent with the rest of the 
backends (e.g. CoarseMesosSchedulerBackend).







[jira] [Comment Edited] (SPARK-15439) Failed to run unit test in SparkR

2016-05-24 Thread Miao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299143#comment-15299143
 ] 

Miao Wang edited comment on SPARK-15439 at 5/24/16 11:31 PM:
-

[~felixcheung] For the test case below:

test_that("Check masked functions", {
  # Check that we are not masking any new function from base, stats, testthat 
unexpectedly
  masked <- conflicts(detail = TRUE)$`package:SparkR`
  expect_true("describe" %in% masked)  # only when with testthat..
  func <- lapply(masked, function(x) { capture.output(showMethods(x))[[1]] })
  funcSparkROrEmpty <- grepl("\\(package SparkR\\)$|^$", func)
  maskedBySparkR <- masked[funcSparkROrEmpty]
  namesOfMasked <- c("describe", "cov", "filter", "lag", "na.omit", "predict", 
"sd", "var",
 "colnames", "colnames<-", "intersect", "rank", "rbind", 
"sample", "subset",
 "summary", "transform", "drop", "window", "as.data.frame",
 "endsWith", "startsWith")

When I run it locally, it has "endsWith" and "startsWith". However, when I 
submit the PR, the Jenkins test doesn't include these two.

I don't know what the difference is between running locally and running the 
automated test. I suspect a version issue, but I don't know where to set the 
version. Can you help check it? Thanks!


was (Author: wm624):
[~felixcheung] For the test case below:

test_that("Check masked functions", {
  # Check that we are not masking any new function from base, stats, testthat 
unexpectedly
  masked <- conflicts(detail = TRUE)$`package:SparkR`
  expect_true("describe" %in% masked)  # only when with testthat..
  func <- lapply(masked, function(x) { capture.output(showMethods(x))[[1]] })
  funcSparkROrEmpty <- grepl("\\(package SparkR\\)$|^$", func)
  maskedBySparkR <- masked[funcSparkROrEmpty]
  namesOfMasked <- c("describe", "cov", "filter", "lag", "na.omit", "predict", 
"sd", "var",
 "colnames", "colnames<-", "intersect", "rank", "rbind", 
"sample", "subset",
 "summary", "transform", "drop", "window", "as.data.frame",
 "endsWith", "startsWith")

When I run it locally, it has "endsWith" and "startsWith". However, when I 
submit the PR, the Jenkins test doesn't include these two.

I don't know what the difference is between running locally and running the 
automated test. Can you help check it? Thanks!

> Failed to run unit test in SparkR
> -
>
> Key: SPARK-15439
> URL: https://issues.apache.org/jira/browse/SPARK-15439
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Kai Jiang
>
> Failed to run ./R/run-tests.sh around a recent commit (May 19, 2016).
> It might be related to permissions. It seems `sudo ./R/run-tests.sh` worked 
> sometimes; without permission, maybe we couldn't access the /tmp directory. 
> However, the SparkR unit tests are still brittle.
> [error 
> message|https://gist.github.com/vectorijk/71f4ff34e3d34a628b8a3013f0ca2aa2]






[jira] [Commented] (SPARK-15449) Wrong Data Format - Documentation Issue

2016-05-24 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299145#comment-15299145
 ] 

Sean Owen commented on SPARK-15449:
---

I don't have the code in front of me, but I think only the 2.0 version matters, 
as it will shortly be the published version. The goal is for all examples to be 
consistent and correct. If they all use libsvm data and the appropriate load 
method, then it is done.
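One such load method, shown as a hedged PySpark sketch; the path points at the sample LibSVM file shipped in the Spark source tree, so adjust it to your checkout:

{code}
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("libsvm-demo").getOrCreate()

# DataFrame-based load of LibSVM-formatted data: a label column plus a
# sparse features vector column.
data = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
data.show(5)
{code}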

> Wrong Data Format - Documentation Issue
> ---
>
> Key: SPARK-15449
> URL: https://issues.apache.org/jira/browse/SPARK-15449
> Project: Spark
>  Issue Type: Documentation
>  Components: Examples
>Affects Versions: 1.6.1
>Reporter: Kiran Biradarpatil
>Priority: Minor
>
> The Java example given for MLlib NaiveBayes at 
> http://spark.apache.org/docs/latest/mllib-naive-bayes.html expects the data 
> in LibSVM format, but the example data in MLlib, 
> data/mllib/sample_naive_bayes_data.txt, is not in the right format. 
> So please rectify either the sample data file or the implementation example.
> Thanks!
> Kiran 






[jira] [Commented] (SPARK-15439) Failed to run unit test in SparkR

2016-05-24 Thread Miao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299143#comment-15299143
 ] 

Miao Wang commented on SPARK-15439:
---

[~felixcheung] For the test case below:

test_that("Check masked functions", {
  # Check that we are not masking any new function from base, stats, testthat 
unexpectedly
  masked <- conflicts(detail = TRUE)$`package:SparkR`
  expect_true("describe" %in% masked)  # only when with testthat..
  func <- lapply(masked, function(x) { capture.output(showMethods(x))[[1]] })
  funcSparkROrEmpty <- grepl("\\(package SparkR\\)$|^$", func)
  maskedBySparkR <- masked[funcSparkROrEmpty]
  namesOfMasked <- c("describe", "cov", "filter", "lag", "na.omit", "predict", 
"sd", "var",
 "colnames", "colnames<-", "intersect", "rank", "rbind", 
"sample", "subset",
 "summary", "transform", "drop", "window", "as.data.frame",
 "endsWith", "startsWith")

When I run it locally, the masked list includes "endsWith" and "startsWith". However, when I 
submit the PR, the Jenkins test doesn't include these two.

I don't know what differs between running locally and running the automated test. Can you 
help check it? Thanks!

> Failed to run unit test in SparkR
> -
>
> Key: SPARK-15439
> URL: https://issues.apache.org/jira/browse/SPARK-15439
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Kai Jiang
>
> Failed to run ./R/run-tests.sh   around recent commit (May 19, 2016)
> It might be related to permission. It seems I used `sudo ./R/run-tests.sh` 
> and it worked sometimes. Without permission, maybe we couldn't access /tmp 
> directory.  However, the SparkR unit testing is still brittle.
> [error 
> message|https://gist.github.com/vectorijk/71f4ff34e3d34a628b8a3013f0ca2aa2]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15014) Spark Shell could use Ammonite Shell

2016-05-24 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-15014.
---
Resolution: Won't Fix

> Spark Shell could use Ammonite Shell
> 
>
> Key: SPARK-15014
> URL: https://issues.apache.org/jira/browse/SPARK-15014
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Affects Versions: 1.6.1
> Environment: All
>Reporter: John-Michael Reed
>Priority: Minor
>  Labels: shell, shell-script
>
> Lihaoyi has an enhanced Scala Shell called Ammonite. 
> https://github.com/lihaoyi/Ammonite
> Users of Ammonite shell have tried to use it with Apache Spark. 
> https://github.com/lihaoyi/Ammonite/issues/382
> Spark Shell does not work with Ammonite Shell, but I want it to because the 
> Ammonite REPL offers enhanced auto-complete, pretty printing, and other 
> features. See http://www.lihaoyi.com/Ammonite/#Ammonite-REPL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15475) Add tests for writing and reading back empty data for Parquet, Json and Text data sources

2016-05-24 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-15475.
---
Resolution: Duplicate

> Add tests for writing and reading back empty data for Parquet, Json and Text 
> data sources
> -
>
> Key: SPARK-15475
> URL: https://issues.apache.org/jira/browse/SPARK-15475
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Hyukjin Kwon
>
> Currently, Parquet, JSON and Text data sources can write and read back empty 
> data but they are not being tested.
> ORC and CSV do not support this yet. (See SPARK-15474 and SPARK-15473).
> It might be great if they are tested.
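A rough sketch of the kind of round-trip test being proposed, assuming a local SparkSession 
and throwaway scratch paths (both assumptions for illustration, not the eventual test code); 
the schema is supplied explicitly on read so the check does not depend on inference over 
empty files:
{code}
import org.apache.spark.sql.SparkSession

object EmptyRoundTripSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("empty-roundtrip").getOrCreate()
    import spark.implicits._

    val empty = Seq.empty[String].toDF("value")  // zero rows, one string column
    Seq("parquet", "json", "text").foreach { fmt =>
      val path = s"/tmp/empty-data-$fmt"         // hypothetical scratch path
      empty.write.format(fmt).mode("overwrite").save(path)
      // Supply the schema explicitly so the read does not rely on inference.
      val readBack = spark.read.schema(empty.schema).format(fmt).load(path)
      assert(readBack.count() == 0, s"$fmt should read back zero rows")
    }
    spark.stop()
  }
}
{code}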



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15014) Spark Shell could use Ammonite Shell

2016-05-24 Thread Jakob Odersky (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299118#comment-15299118
 ] 

Jakob Odersky commented on SPARK-15014:
---

spark-shell is a very thin wrapper around the standard Scala REPL (with Spark 
dependencies). It does some configuration and exposes a Spark context and some 
imports; almost everything is implemented in these two files: 

- 
https://github.com/apache/spark/blob/master/repl/scala-2.11/src/main/scala/org/apache/spark/repl/Main.scala
- 
https://github.com/apache/spark/blob/master/repl/scala-2.11/src/main/scala/org/apache/spark/repl/SparkILoop.scala.

I don't know much about Ammonite, but as a workaround, could you use Spark as a 
standalone program in your shell? Just add the Spark dependencies and create a 
SparkContext manually.
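A minimal sketch of that workaround, assuming the Spark artifacts are already on the 
classpath (for example pulled in by Ammonite or an sbt project); the app name and master 
URL are illustrative only:
{code}
import org.apache.spark.{SparkConf, SparkContext}

object StandaloneSparkApp {
  def main(args: Array[String]): Unit = {
    // Build the context by hand instead of relying on spark-shell's pre-made `sc`.
    val conf = new SparkConf()
      .setAppName("ammonite-workaround")  // hypothetical application name
      .setMaster("local[*]")              // run inside the current JVM
    val sc = new SparkContext(conf)
    try {
      // Use the context exactly as you would use spark-shell's `sc`.
      println(sc.parallelize(1 to 100).sum())
    } finally {
      sc.stop()
    }
  }
}
{code}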

> Spark Shell could use Ammonite Shell
> 
>
> Key: SPARK-15014
> URL: https://issues.apache.org/jira/browse/SPARK-15014
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Shell
>Affects Versions: 1.6.1
> Environment: All
>Reporter: John-Michael Reed
>Priority: Minor
>  Labels: shell, shell-script
>
> Lihaoyi has an enhanced Scala Shell called Ammonite. 
> https://github.com/lihaoyi/Ammonite
> Users of Ammonite shell have tried to use it with Apache Spark. 
> https://github.com/lihaoyi/Ammonite/issues/382
> Spark Shell does not work with Ammonite Shell, but I want it to because the 
> Ammonite REPL offers enhanced auto-complete, pretty printing, and other 
> features. See http://www.lihaoyi.com/Ammonite/#Ammonite-REPL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15517) Add support for complete output mode

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15517:


Assignee: Tathagata Das  (was: Apache Spark)

> Add support for complete output mode 
> -
>
> Key: SPARK-15517
> URL: https://issues.apache.org/jira/browse/SPARK-15517
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>
> Currently structured streaming only supports append output mode. This task is 
> to do the following. 
> - Add support for complete output mode in the planner
> - Add public API for users to specify output mode



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15517) Add support for complete output mode

2016-05-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299113#comment-15299113
 ] 

Apache Spark commented on SPARK-15517:
--

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/13286

> Add support for complete output mode 
> -
>
> Key: SPARK-15517
> URL: https://issues.apache.org/jira/browse/SPARK-15517
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>
> Currently structured streaming only supports append output mode. This task is 
> to do the following. 
> - Add support for complete output mode in the planner
> - Add public API for users to specify output mode



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15517) Add support for complete output mode

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15517:


Assignee: Apache Spark  (was: Tathagata Das)

> Add support for complete output mode 
> -
>
> Key: SPARK-15517
> URL: https://issues.apache.org/jira/browse/SPARK-15517
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Tathagata Das
>Assignee: Apache Spark
>
> Currently structured streaming only supports append output mode. This task is 
> to do the following. 
> - Add support for complete output mode in the planner
> - Add public API for users to specify output mode



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15517) Add support for complete output mode

2016-05-24 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-15517:
-

 Summary: Add support for complete output mode 
 Key: SPARK-15517
 URL: https://issues.apache.org/jira/browse/SPARK-15517
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Tathagata Das
Assignee: Tathagata Das


Currently structured streaming only supports append output mode. This task is 
to do the following. 

- Add support for complete output mode in the planner
- Add public API for users to specify output mode



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15516) Schema merging in driver fails for parquet when merging LongType and IntegerType

2016-05-24 Thread Hossein Falaki (JIRA)
Hossein Falaki created SPARK-15516:
--

 Summary: Schema merging in driver fails for parquet when merging 
LongType and IntegerType
 Key: SPARK-15516
 URL: https://issues.apache.org/jira/browse/SPARK-15516
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
 Environment: Databricks
Reporter: Hossein Falaki


I tried to create a table from partitioned parquet directories that require 
schema merging. I get the following error:
{code}
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$24$$anonfun$apply$9.apply(ParquetRelation.scala:831)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$24$$anonfun$apply$9.apply(ParquetRelation.scala:826)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$24.apply(ParquetRelation.scala:826)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$24.apply(ParquetRelation.scala:801)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:756)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$22.apply(RDD.scala:756)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:318)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:282)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Failed to merge incompatible data 
types LongType and IntegerType
at org.apache.spark.sql.types.StructType$.merge(StructType.scala:462)
at 
org.apache.spark.sql.types.StructType$$anonfun$merge$1$$anonfun$apply$3.apply(StructType.scala:420)
at 
org.apache.spark.sql.types.StructType$$anonfun$merge$1$$anonfun$apply$3.apply(StructType.scala:418)
at scala.Option.map(Option.scala:145)
at 
org.apache.spark.sql.types.StructType$$anonfun$merge$1.apply(StructType.scala:418)
at 
org.apache.spark.sql.types.StructType$$anonfun$merge$1.apply(StructType.scala:415)
at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.apache.spark.sql.types.StructType$.merge(StructType.scala:415)
at org.apache.spark.sql.types.StructType.merge(StructType.scala:333)
at 
org.apache.spark.sql.execution.datasources.parquet.ParquetRelation$$anonfun$24$$anonfun$apply$9.apply(ParquetRelation.scala:829)
{code}

cc @rxin and [~mengxr]
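A minimal sketch of a possible reproduction, assuming local mode, a hypothetical output 
directory, and a single column written as IntegerType in one partition and LongType in 
another; this is not the original job that hit the error:
{code}
import org.apache.spark.sql.SparkSession

object SchemaMergeRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("schema-merge-repro").getOrCreate()
    import spark.implicits._

    val base = "/tmp/schema-merge-repro"  // hypothetical output directory
    Seq(1, 2, 3).toDF("id").write.parquet(s"$base/part=a")     // id written as IntegerType
    Seq(1L, 2L, 3L).toDF("id").write.parquet(s"$base/part=b")  // id written as LongType

    // Schema merging should promote id to LongType; per this report it fails instead.
    val merged = spark.read.option("mergeSchema", "true").parquet(base)
    merged.printSchema()
    spark.stop()
  }
}
{code}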



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15510) SparkR NaiveBayes should not require label to have NominalAttribute

2016-05-24 Thread Miao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299095#comment-15299095
 ] 

Miao Wang commented on SPARK-15510:
---

I will study the code and try to solve it.
Thanks!

> SparkR NaiveBayes should not require label to have NominalAttribute
> ---
>
> Key: SPARK-15510
> URL: https://issues.apache.org/jira/browse/SPARK-15510
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, SparkR
>Reporter: Joseph K. Bradley
>
> Currently, SparkR's NaiveBayes API fails if the label is numeric.  It works 
> if the label is a String.  This is because NaiveBayesWrapper requires that 
> the input column be annotated with NominalAttribute, which is created when 
> Strings are indexed by RFormula.  We should eliminate this restriction since 
> it causes failures easily, such as when trying to run NB on LibSVM datasets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15515) Error Handling in Running SQL Directly On Files

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15515:


Assignee: (was: Apache Spark)

> Error Handling in Running SQL Directly On Files
> ---
>
> Key: SPARK-15515
> URL: https://issues.apache.org/jira/browse/SPARK-15515
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> For ORC source format, we are reporting the strange error message when we did 
> not enable Hive support: 
> {noformat}
> Table or view not found: `org.apache.spark.sql.hive.orc`.`file_path`
> {noformat}
> The example query is like
> {noformat}
>   sql(s"select id from `org.apache.spark.sql.hive.orc`.`file_path`")
> {noformat}
> Instead, we should issue the error message like:
> {noformat}
> "The ORC data source must be used with Hive support enabled"
> {noformat}
> For the Avro format, we still report the error message like:
> {noformat}
> Table or view not found: `com.databricks.spark.avro`.`file_path`
> {noformat}
> The example query is like
> {noformat}
>   sql(s"select id from `avro`.`file_path`")
>   sql(s"select id from `com.databricks.spark.avro`.`file_path`")
> {noformat}
> The desired message should be like:
> {noformat}
> Failed to find data source: avro. Please use Spark package 
> http://spark-packages.org/package/databricks/spark-avro";
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15439) Failed to run unit test in SparkR

2016-05-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299078#comment-15299078
 ] 

Apache Spark commented on SPARK-15439:
--

User 'wangmiao1981' has created a pull request for this issue:
https://github.com/apache/spark/pull/13284

> Failed to run unit test in SparkR
> -
>
> Key: SPARK-15439
> URL: https://issues.apache.org/jira/browse/SPARK-15439
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Kai Jiang
>
> Failed to run ./R/run-tests.sh   around recent commit (May 19, 2016)
> It might be related to permission. It seems I used `sudo ./R/run-tests.sh` 
> and it worked sometimes. Without permission, maybe we couldn't access /tmp 
> directory.  However, the SparkR unit testing is still brittle.
> [error 
> message|https://gist.github.com/vectorijk/71f4ff34e3d34a628b8a3013f0ca2aa2]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15439) Failed to run unit test in SparkR

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15439:


Assignee: (was: Apache Spark)

> Failed to run unit test in SparkR
> -
>
> Key: SPARK-15439
> URL: https://issues.apache.org/jira/browse/SPARK-15439
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Kai Jiang
>
> Failed to run ./R/run-tests.sh   around recent commit (May 19, 2016)
> It might be related to permission. It seems I used `sudo ./R/run-tests.sh` 
> and it worked sometimes. Without permission, maybe we couldn't access /tmp 
> directory.  However, the SparkR unit testing is still brittle.
> [error 
> message|https://gist.github.com/vectorijk/71f4ff34e3d34a628b8a3013f0ca2aa2]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15515) Error Handling in Running SQL Directly On Files

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15515:


Assignee: Apache Spark

> Error Handling in Running SQL Directly On Files
> ---
>
> Key: SPARK-15515
> URL: https://issues.apache.org/jira/browse/SPARK-15515
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> For ORC source format, we are reporting the strange error message when we did 
> not enable Hive support: 
> {noformat}
> Table or view not found: `org.apache.spark.sql.hive.orc`.`file_path`
> {noformat}
> The example query is like
> {noformat}
>   sql(s"select id from `org.apache.spark.sql.hive.orc`.`file_path`")
> {noformat}
> Instead, we should issue the error message like:
> {noformat}
> "The ORC data source must be used with Hive support enabled"
> {noformat}
> For the Avro format, we still report the error message like:
> {noformat}
> Table or view not found: `com.databricks.spark.avro`.`file_path`
> {noformat}
> The example query is like
> {noformat}
>   sql(s"select id from `avro`.`file_path`")
>   sql(s"select id from `com.databricks.spark.avro`.`file_path`")
> {noformat}
> The desired message should be like:
> {noformat}
> Failed to find data source: avro. Please use Spark package 
> http://spark-packages.org/package/databricks/spark-avro";
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15514) Unable to detect incompatibility libraries for Spark 2.0 in Data Source Resolution

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15514:


Assignee: Apache Spark

> Unable to detect incompatibility libraries for Spark 2.0 in Data Source 
> Resolution
> --
>
> Key: SPARK-15514
> URL: https://issues.apache.org/jira/browse/SPARK-15514
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> In Data source resolution, the following class are removed in Spark 2.0. 
> However, we are unable to detect incompatibility libraries for Spark 2.0.
> {noformat}
> "org.apache.spark.sql.DataFrame"
> "org.apache.spark.sql.sources.HadoopFsRelationProvider"
> "org.apache.spark.Logging"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15439) Failed to run unit test in SparkR

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15439:


Assignee: Apache Spark

> Failed to run unit test in SparkR
> -
>
> Key: SPARK-15439
> URL: https://issues.apache.org/jira/browse/SPARK-15439
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Kai Jiang
>Assignee: Apache Spark
>
> Failed to run ./R/run-tests.sh   around recent commit (May 19, 2016)
> It might be related to permission. It seems I used `sudo ./R/run-tests.sh` 
> and it worked sometimes. Without permission, maybe we couldn't access /tmp 
> directory.  However, the SparkR unit testing is still brittle.
> [error 
> message|https://gist.github.com/vectorijk/71f4ff34e3d34a628b8a3013f0ca2aa2]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15514) Unable to detect incompatibility libraries for Spark 2.0 in Data Source Resolution

2016-05-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299079#comment-15299079
 ] 

Apache Spark commented on SPARK-15514:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/13283

> Unable to detect incompatibility libraries for Spark 2.0 in Data Source 
> Resolution
> --
>
> Key: SPARK-15514
> URL: https://issues.apache.org/jira/browse/SPARK-15514
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> In Data source resolution, the following class are removed in Spark 2.0. 
> However, we are unable to detect incompatibility libraries for Spark 2.0.
> {noformat}
> "org.apache.spark.sql.DataFrame"
> "org.apache.spark.sql.sources.HadoopFsRelationProvider"
> "org.apache.spark.Logging"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15514) Unable to detect incompatibility libraries for Spark 2.0 in Data Source Resolution

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15514:


Assignee: (was: Apache Spark)

> Unable to detect incompatibility libraries for Spark 2.0 in Data Source 
> Resolution
> --
>
> Key: SPARK-15514
> URL: https://issues.apache.org/jira/browse/SPARK-15514
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> In Data source resolution, the following class are removed in Spark 2.0. 
> However, we are unable to detect incompatibility libraries for Spark 2.0.
> {noformat}
> "org.apache.spark.sql.DataFrame"
> "org.apache.spark.sql.sources.HadoopFsRelationProvider"
> "org.apache.spark.Logging"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15515) Error Handling in Running SQL Directly On Files

2016-05-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299077#comment-15299077
 ] 

Apache Spark commented on SPARK-15515:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/13283

> Error Handling in Running SQL Directly On Files
> ---
>
> Key: SPARK-15515
> URL: https://issues.apache.org/jira/browse/SPARK-15515
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> For ORC source format, we are reporting the strange error message when we did 
> not enable Hive support: 
> {noformat}
> Table or view not found: `org.apache.spark.sql.hive.orc`.`file_path`
> {noformat}
> The example query is like
> {noformat}
>   sql(s"select id from `org.apache.spark.sql.hive.orc`.`file_path`")
> {noformat}
> Instead, we should issue the error message like:
> {noformat}
> "The ORC data source must be used with Hive support enabled"
> {noformat}
> For the Avro format, we still report the error message like:
> {noformat}
> Table or view not found: `com.databricks.spark.avro`.`file_path`
> {noformat}
> The example query is like
> {noformat}
>   sql(s"select id from `avro`.`file_path`")
>   sql(s"select id from `com.databricks.spark.avro`.`file_path`")
> {noformat}
> The desired message should be like:
> {noformat}
> Failed to find data source: avro. Please use Spark package 
> http://spark-packages.org/package/databricks/spark-avro";
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15491) JSON serialization fails for JDBC DataFrames

2016-05-24 Thread Huaxin Gao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299063#comment-15299063
 ] 

Huaxin Gao commented on SPARK-15491:


I will submit a PR soon. Thanks.

> JSON serialization fails for JDBC DataFrames
> 
>
> Key: SPARK-15491
> URL: https://issues.apache.org/jira/browse/SPARK-15491
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
> Environment: MacOS 10.11.5, Spark 2.0.0-preview
>Reporter: Marc Prud'hommeaux
>
> The TreeNode.toJSON feature implemented in SPARK-12321 fails with an 
> assertion error on DataFrames that use JDBC in spark 2.0.0-preview:
> {code}
> scala> 
> sqlContext.read.json("examples/src/main/resources/people.json").select("name",
>  "age").agg(avg("age"), count("name")).filter(avg("age") > 
> 10).queryExecution.logical.toJSON
> res113: String = 
> [{"class":"org.apache.spark.sql.catalyst.plans.logical.Filter","num-children":1,"condition":[{"class":"org.apache.spark.sql.catalyst.expressions.GreaterThan","num-children":2,"left":0,"right":1},{"class":"org.apache.spark.sql.catalyst.expressions.aggregate.AggregateExpression","num-children":1,...
> scala> sqlContext.read.format("jdbc").options(db + ("dbtable" -> 
> "categories")).load().queryExecution.logical.simpleString
> res120: String = Relation[category#2148,categoryname#2149] 
> JDBCRelation(categories)
> scala> sqlContext.read.format("jdbc").options(db + ("dbtable" -> 
> "categories")).load().queryExecution.logical.toJSON
> java.lang.AssertionError: assertion failed
>   at scala.Predef$.assert(Predef.scala:156)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.org$apache$spark$sql$catalyst$trees$TreeNode$$parseToJson(TreeNode.scala:598)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$jsonFields$2.apply(TreeNode.scala:562)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$jsonFields$2.apply(TreeNode.scala:553)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.jsonFields(TreeNode.scala:553)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.org$apache$spark$sql$catalyst$trees$TreeNode$$collectJsonValue$1(TreeNode.scala:538)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.jsonValue(TreeNode.scala:543)
>   at org.apache.spark.sql.catalyst.trees.TreeNode.toJSON(TreeNode.scala:529)
>   ... 48 elided
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15515) Error Handling in Running SQL Directly On Files

2016-05-24 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-15515:

Description: 
For ORC source format, we are reporting the strange error message when we did 
not enable Hive support: 
{noformat}
Table or view not found: `org.apache.spark.sql.hive.orc`.`file_path`
{noformat}
The example query is like
{noformat}
  sql(s"select id from `org.apache.spark.sql.hive.orc`.`file_path`")
{noformat}
Instead, we should issue the error message like:
{noformat}
"The ORC data source must be used with Hive support enabled"
{noformat}

For the Avro format, we still report the error message like:
{noformat}
Table or view not found: `com.databricks.spark.avro`.`file_path`
{noformat}

The example query is like
{noformat}
  sql(s"select id from `avro`.`file_path`")
  sql(s"select id from `com.databricks.spark.avro`.`file_path`")
{noformat}

The desired message should be like:
{noformat}
Failed to find data source: avro. Please use Spark package 
http://spark-packages.org/package/databricks/spark-avro";
{noformat}


  was:
For ORC source format, we are reporting the strange error message when we did 
not enable Hive support: 
{noformat}
Table or view not found: `org.apache.spark.sql.hive.orc`.`file_path`
{noformat}
The example query is like
{noformat}
  sql(s"select id from `org.apache.spark.sql.hive.orc`.`file_path`")
{noformat}
Instead, we should issue the error message like:
{noformat}
"The ORC data source must be used with Hive support enabled"
{noformat}

For the Avro format, we still report the error message like:
  sql(s"select id from `com.databricks.spark.avro`.`file_path`")

{noformat}
Table or view not found: `com.databricks.spark.avro`.`file_path`
{noformat}

The example query is like
{noformat}
  sql(s"select id from `avro`.`file_path`")
  sql(s"select id from `com.databricks.spark.avro`.`file_path`")
{noformat}

The desired message should be like:
{noformat}
Failed to find data source: avro. Please use Spark package 
http://spark-packages.org/package/databricks/spark-avro";
{noformat}



> Error Handling in Running SQL Directly On Files
> ---
>
> Key: SPARK-15515
> URL: https://issues.apache.org/jira/browse/SPARK-15515
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> For ORC source format, we are reporting the strange error message when we did 
> not enable Hive support: 
> {noformat}
> Table or view not found: `org.apache.spark.sql.hive.orc`.`file_path`
> {noformat}
> The example query is like
> {noformat}
>   sql(s"select id from `org.apache.spark.sql.hive.orc`.`file_path`")
> {noformat}
> Instead, we should issue the error message like:
> {noformat}
> "The ORC data source must be used with Hive support enabled"
> {noformat}
> For the Avro format, we still report the error message like:
> {noformat}
> Table or view not found: `com.databricks.spark.avro`.`file_path`
> {noformat}
> The example query is like
> {noformat}
>   sql(s"select id from `avro`.`file_path`")
>   sql(s"select id from `com.databricks.spark.avro`.`file_path`")
> {noformat}
> The desired message should be like:
> {noformat}
> Failed to find data source: avro. Please use Spark package 
> http://spark-packages.org/package/databricks/spark-avro";
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15515) Error Handling in Running SQL Directly On Files

2016-05-24 Thread Xiao Li (JIRA)
Xiao Li created SPARK-15515:
---

 Summary: Error Handling in Running SQL Directly On Files
 Key: SPARK-15515
 URL: https://issues.apache.org/jira/browse/SPARK-15515
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li


For ORC source format, we are reporting the strange error message when we did 
not enable Hive support: 
{noformat}
Table or view not found: `org.apache.spark.sql.hive.orc`.`file_path`
{noformat}
The example query is like
{noformat}
  sql(s"select id from `org.apache.spark.sql.hive.orc`.`file_path`")
{noformat}
Instead, we should issue the error message like:
{noformat}
"The ORC data source must be used with Hive support enabled"
{noformat}

For the Avro format, we still report the error message like:
  sql(s"select id from `com.databricks.spark.avro`.`file_path`")

{noformat}
Table or view not found: `com.databricks.spark.avro`.`file_path`
{noformat}

The example query is like
{noformat}
  sql(s"select id from `avro`.`file_path`")
  sql(s"select id from `com.databricks.spark.avro`.`file_path`")
{noformat}

The desired message should be like:
{noformat}
Failed to find data source: avro. Please use Spark package 
http://spark-packages.org/package/databricks/spark-avro";
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15512) repartition(0) should raise IllegalArgumentException.

2016-05-24 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-15512:
--
Component/s: Spark Core

> repartition(0) should raise IllegalArgumentException.
> -
>
> Key: SPARK-15512
> URL: https://issues.apache.org/jira/browse/SPARK-15512
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Dongjoon Hyun
>
> Previously, SPARK-8893 added the positive partition constrains on 
> repartition/coalesce operations in general.
> This PR adds one missing part for that and adds explicit two testcases.
> **Before**
> {code:title=repartition(0)|borderStyle=solid}
> scala> sc.parallelize(1 to 5).coalesce(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> scala> sc.parallelize(1 to 5).repartition(0).collect()
> res1: Array[Int] = Array()
> {code}
> **After**
> {code:title=repartition(0)|borderStyle=solid}
> scala> sc.parallelize(1 to 5).coalesce(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> scala> sc.parallelize(1 to 5).repartition(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15514) Unable to detect incompatibility libraries for Spark 2.0 in Data Source Resolution

2016-05-24 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-15514:

Affects Version/s: 2.0.0

> Unable to detect incompatibility libraries for Spark 2.0 in Data Source 
> Resolution
> --
>
> Key: SPARK-15514
> URL: https://issues.apache.org/jira/browse/SPARK-15514
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> In Data source resolution, the following class are removed in Spark 2.0. 
> However, we are unable to detect incompatibility libraries for Spark 2.0.
> {noformat}
> "org.apache.spark.sql.DataFrame"
> "org.apache.spark.sql.sources.HadoopFsRelationProvider"
> "org.apache.spark.Logging"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15514) Unable to detect incompatibility libraries for Spark 2.0 in Data Source Resolution

2016-05-24 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-15514:

Summary: Unable to detect incompatibility libraries for Spark 2.0 in Data 
Source Resolution  (was: nable to detect incompatibility libraries for Spark 
2.0)

> Unable to detect incompatibility libraries for Spark 2.0 in Data Source 
> Resolution
> --
>
> Key: SPARK-15514
> URL: https://issues.apache.org/jira/browse/SPARK-15514
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> In Data source resolution, the following class are removed in Spark 2.0. 
> However, we are unable to detect incompatibility libraries for Spark 2.0.
> {noformat}
> "org.apache.spark.sql.DataFrame"
> "org.apache.spark.sql.sources.HadoopFsRelationProvider"
> "org.apache.spark.Logging"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15514) Unable to detect incompatibility libraries for Spark 2.0 in Data Source Resolution

2016-05-24 Thread Xiao Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-15514:

Component/s: SQL

> Unable to detect incompatibility libraries for Spark 2.0 in Data Source 
> Resolution
> --
>
> Key: SPARK-15514
> URL: https://issues.apache.org/jira/browse/SPARK-15514
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> In Data source resolution, the following class are removed in Spark 2.0. 
> However, we are unable to detect incompatibility libraries for Spark 2.0.
> {noformat}
> "org.apache.spark.sql.DataFrame"
> "org.apache.spark.sql.sources.HadoopFsRelationProvider"
> "org.apache.spark.Logging"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15513) Bzip2Factory in Hadoop 2.7.1 is not thread safe

2016-05-24 Thread Yin Huai (JIRA)
Yin Huai created SPARK-15513:


 Summary: Bzip2Factory in Hadoop 2.7.1 is not thread safe
 Key: SPARK-15513
 URL: https://issues.apache.org/jira/browse/SPARK-15513
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
 Environment: Hadoop 2.7.1
Reporter: Yin Huai


This is caused by https://issues.apache.org/jira/browse/HADOOP-12191. While one thread is 
loading the native bzip2 library, other threads think that the native bzip2 library is not 
available and then throw exceptions. 

{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 6.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6.0 (TID 
37, localhost): java.lang.UnsupportedOperationException
at 
org.apache.hadoop.io.compress.bzip2.BZip2DummyCompressor.finished(BZip2DummyCompressor.java:48)
at 
org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:65)
at java.io.DataOutputStream.write(DataOutputStream.java:107)
at 
org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.writeObject(TextOutputFormat.java:81)
at 
org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.write(TextOutputFormat.java:102)
at org.apache.spark.SparkHadoopWriter.write(SparkHadoopWriter.scala:95)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply$mcV$sp(PairRDDFunctions.scala:1205)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply(PairRDDFunctions.scala:1203)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$7.apply(PairRDDFunctions.scala:1203)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1278)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1211)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13.apply(PairRDDFunctions.scala:1190)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Suppressed: java.lang.UnsupportedOperationException
at 
org.apache.hadoop.io.compress.bzip2.BZip2DummyCompressor.finished(BZip2DummyCompressor.java:48)
at 
org.apache.hadoop.io.compress.CompressorStream.finish(CompressorStream.java:89)
at 
org.apache.hadoop.io.compress.CompressorStream.close(CompressorStream.java:106)
at java.io.FilterOutputStream.close(FilterOutputStream.java:159)
at 
org.apache.hadoop.mapred.TextOutputFormat$LineRecordWriter.close(TextOutputFormat.java:108)
at 
org.apache.spark.SparkHadoopWriter.close(SparkHadoopWriter.scala:102)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsHadoopDataset$1$$anonfun$13$$anonfun$apply$8.apply$mcV$sp(PairRDDFunctions.scala:1211)
at 
org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1296)
... 8 more

Driver stacktrace:
at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at scala.Option.foreach(Option.scala:236)
at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at 
org.apache

[jira] [Created] (SPARK-15514) nable to detect incompatibility libraries for Spark 2.0

2016-05-24 Thread Xiao Li (JIRA)
Xiao Li created SPARK-15514:
---

 Summary: nable to detect incompatibility libraries for Spark 2.0
 Key: SPARK-15514
 URL: https://issues.apache.org/jira/browse/SPARK-15514
 Project: Spark
  Issue Type: Bug
Reporter: Xiao Li


In data source resolution, the following classes are removed in Spark 2.0. 
However, we are unable to detect incompatible libraries for Spark 2.0.
{noformat}
"org.apache.spark.sql.DataFrame"
"org.apache.spark.sql.sources.HadoopFsRelationProvider"
"org.apache.spark.Logging"
{noformat}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15439) Failed to run unit test in SparkR

2016-05-24 Thread Miao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299011#comment-15299011
 ] 

Miao Wang commented on SPARK-15439:
---

It seems that this one is caused by https://github.com/apache/spark/pull/11318

Let me change parts of that PR to test it.

> Failed to run unit test in SparkR
> -
>
> Key: SPARK-15439
> URL: https://issues.apache.org/jira/browse/SPARK-15439
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Kai Jiang
>
> Failed to run ./R/run-tests.sh   around recent commit (May 19, 2016)
> It might be related to permission. It seems I used `sudo ./R/run-tests.sh` 
> and it worked sometimes. Without permission, maybe we couldn't access /tmp 
> directory.  However, the SparkR unit testing is still brittle.
> [error 
> message|https://gist.github.com/vectorijk/71f4ff34e3d34a628b8a3013f0ca2aa2]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15439) Failed to run unit test in SparkR

2016-05-24 Thread Miao Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299004#comment-15299004
 ] 

Miao Wang commented on SPARK-15439:
---

[~shivaram] I fixed that failure. But I see a new failure at test_sparkSQL.R#922:

2. Error: subsetting (@test_sparkSQL.R#922) 
argument "subset" is missing, with no default
1: subset(df, select = "name") at 
/Users/mwang/spark_ws_0904/R/lib/SparkR/tests/testthat/test_sparkSQL.R:922
2: subset(df, select = "name")
3: .local(x, ...)
4: x[subset, select, drop = drop]

I am fixing it now.

> Failed to run unit test in SparkR
> -
>
> Key: SPARK-15439
> URL: https://issues.apache.org/jira/browse/SPARK-15439
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Kai Jiang
>
> Failed to run ./R/run-tests.sh   around recent commit (May 19, 2016)
> It might be related to permission. It seems I used `sudo ./R/run-tests.sh` 
> and it worked sometimes. Without permission, maybe we couldn't access /tmp 
> directory.  However, the SparkR unit testing is still brittle.
> [error 
> message|https://gist.github.com/vectorijk/71f4ff34e3d34a628b8a3013f0ca2aa2]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15512) repartition(0) should raise IllegalArgumentException.

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15512:


Assignee: Apache Spark

> repartition(0) should raise IllegalArgumentException.
> -
>
> Key: SPARK-15512
> URL: https://issues.apache.org/jira/browse/SPARK-15512
> Project: Spark
>  Issue Type: Bug
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>
> Previously, SPARK-8893 added the positive partition constrains on 
> repartition/coalesce operations in general.
> This PR adds one missing part for that and adds explicit two testcases.
> **Before**
> {code:title=repartition(0)|borderStyle=solid}
> scala> sc.parallelize(1 to 5).coalesce(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> scala> sc.parallelize(1 to 5).repartition(0).collect()
> res1: Array[Int] = Array()
> {code}
> **After**
> {code:title=repartition(0)|borderStyle=solid}
> scala> sc.parallelize(1 to 5).coalesce(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> scala> sc.parallelize(1 to 5).repartition(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-15512) repartition(0) should raise IllegalArgumentException.

2016-05-24 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-15512:


Assignee: (was: Apache Spark)

> repartition(0) should raise IllegalArgumentException.
> -
>
> Key: SPARK-15512
> URL: https://issues.apache.org/jira/browse/SPARK-15512
> Project: Spark
>  Issue Type: Bug
>Reporter: Dongjoon Hyun
>
> Previously, SPARK-8893 added the positive partition constrains on 
> repartition/coalesce operations in general.
> This PR adds one missing part for that and adds explicit two testcases.
> **Before**
> {code:title=repartition(0)|borderStyle=solid}
> scala> sc.parallelize(1 to 5).coalesce(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> scala> sc.parallelize(1 to 5).repartition(0).collect()
> res1: Array[Int] = Array()
> {code}
> **After**
> {code:title=repartition(0)|borderStyle=solid}
> scala> sc.parallelize(1 to 5).coalesce(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> scala> sc.parallelize(1 to 5).repartition(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15512) repartition(0) should raise IllegalArgumentException.

2016-05-24 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299000#comment-15299000
 ] 

Apache Spark commented on SPARK-15512:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/13282

> repartition(0) should raise IllegalArgumentException.
> -
>
> Key: SPARK-15512
> URL: https://issues.apache.org/jira/browse/SPARK-15512
> Project: Spark
>  Issue Type: Bug
>Reporter: Dongjoon Hyun
>
> Previously, SPARK-8893 added the positive partition constrains on 
> repartition/coalesce operations in general.
> This PR adds one missing part for that and adds explicit two testcases.
> **Before**
> {code:title=repartition(0)|borderStyle=solid}
> scala> sc.parallelize(1 to 5).coalesce(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> scala> sc.parallelize(1 to 5).repartition(0).collect()
> res1: Array[Int] = Array()
> {code}
> **After**
> {code:title=repartition(0)|borderStyle=solid}
> scala> sc.parallelize(1 to 5).coalesce(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> scala> sc.parallelize(1 to 5).repartition(0).collect()
> java.lang.IllegalArgumentException: requirement failed: Number of partitions 
> (0) must be positive.
> ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-15512) repartition(0) should raise IllegalArgumentException.

2016-05-24 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-15512:
-

 Summary: repartition(0) should raise IllegalArgumentException.
 Key: SPARK-15512
 URL: https://issues.apache.org/jira/browse/SPARK-15512
 Project: Spark
  Issue Type: Bug
Reporter: Dongjoon Hyun


Previously, SPARK-8893 added the positive partition constrains on 
repartition/coalesce operations in general.

This PR adds one missing part for that and adds explicit two testcases.

**Before**
{code:title=repartition(0)|borderStyle=solid}
scala> sc.parallelize(1 to 5).coalesce(0).collect()
java.lang.IllegalArgumentException: requirement failed: Number of partitions 
(0) must be positive.
...
scala> sc.parallelize(1 to 5).repartition(0).collect()
res1: Array[Int] = Array()
{code}

**After**
{code:title=repartition(0)|borderStyle=solid}
scala> sc.parallelize(1 to 5).coalesce(0).collect()
java.lang.IllegalArgumentException: requirement failed: Number of partitions 
(0) must be positive.
...
scala> sc.parallelize(1 to 5).repartition(0).collect()
java.lang.IllegalArgumentException: requirement failed: Number of partitions 
(0) must be positive.
...
{code}
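A sketch of the kind of guard described above, assuming the fix mirrors the existing 
coalesce() requirement; this is illustrative only, not the actual patch:
{code}
object PositivePartitionsCheck {
  // Mirror of the coalesce() requirement shown in the "Before"/"After" output above.
  def requirePositive(numPartitions: Int): Unit =
    require(numPartitions > 0,
      s"Number of partitions ($numPartitions) must be positive.")

  def main(args: Array[String]): Unit = {
    requirePositive(5)  // passes silently
    requirePositive(0)  // throws IllegalArgumentException, matching the "After" behaviour
  }
}
{code}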



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15458) Disable schema inference for streaming datasets on file streams

2016-05-24 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-15458.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 13238
[https://github.com/apache/spark/pull/13238]

> Disable schema inference for streaming datasets on file streams
> ---
>
> Key: SPARK-15458
> URL: https://issues.apache.org/jira/browse/SPARK-15458
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Reporter: Tathagata Das
>Assignee: Tathagata Das
> Fix For: 2.0.0
>
>
> If the user relies on the schema being inferred, file streams can break easily 
> for multiple reasons:
> - accidentally running on a directory which has no data
> - the schema changing underneath
> - on restart, the query will infer the schema again and may unexpectedly infer 
> an incorrect schema, as the files in the directory may be different at the time 
> of the restart.
> To avoid these complicated scenarios, for Spark 2.0 we are going to disable 
> schema inference by default with a config, so that the user is forced to state 
> explicitly what schema they want, rather than having the system try to infer it 
> and run into weird corner cases.
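A minimal sketch of what declaring the schema explicitly looks like for a file stream, 
assuming a JSON source; the input path and field names are illustrative assumptions:
{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

object ExplicitStreamSchema {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("explicit-schema").getOrCreate()

    // Declare the schema up front instead of letting the stream infer it from
    // whatever files happen to be in the directory at (re)start time.
    val schema = StructType(Seq(
      StructField("id", LongType),
      StructField("name", StringType)))

    val events = spark.readStream
      .schema(schema)
      .json("/tmp/incoming-json")  // hypothetical input directory

    events.printSchema()
    spark.stop()
  }
}
{code}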



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15511) Dropping data source table succeeds but throws exception

2016-05-24 Thread Andrew Or (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or resolved SPARK-15511.
---
Resolution: Not A Problem
  Assignee: Andrew Or

If you run into this issue again, just delete $SPARK_HOME/metastore_db

> Dropping data source table succeeds but throws exception
> 
>
> Key: SPARK-15511
> URL: https://issues.apache.org/jira/browse/SPARK-15511
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>Assignee: Andrew Or
>
> If the catalog is backed by Hive:
> {code}
> scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV")
> {code}
> {code}
> scala> sql("DROP TABLE boxes")
> 16/05/24 13:30:50 WARN DropTableCommand: 
> org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/user/hive/warehouse/boxes;
> com.google.common.util.concurrent.UncheckedExecutionException: 
> org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/user/hive/warehouse/boxes;
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4882)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.apply(LocalCache.java:4898)
>   at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:170)
> ...
> Caused by: org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/user/hive/warehouse/boxes;
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:317)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:306)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at scala.collection.immutable.List.flatMap(List.scala:344)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>   at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:133)
>   at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:69)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15489) Dataset kryo encoder fails on Collections$UnmodifiableCollection

2016-05-24 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298883#comment-15298883
 ] 

Michael Armbrust commented on SPARK-15489:
--

It should run in the same JVM when running in local mode; otherwise it'll run 
in an executor.

I think that when we construct an encoder, we should probably be passing this 
kind of information in.

> Dataset kryo encoder fails on Collections$UnmodifiableCollection
> 
>
> Key: SPARK-15489
> URL: https://issues.apache.org/jira/browse/SPARK-15489
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Amit Sela
>
> When using Encoders with kryo to encode generically typed objects in the 
> following manner:
> public static <T> Encoder<T> encoder() {
>   return Encoders.kryo((Class<T>) Object.class);
> }
> I get a decoding exception when trying to decode 
> `java.util.Collections$UnmodifiableCollection`, which probably comes from 
> Guava's `ImmutableList`.
> This happens when running with master = local[1]. The same code had no 
> problems with the RDD API.
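
A minimal Scala sketch of the reported setup, assuming the Spark 2.x 
SparkSession API; it wraps a collection in Collections.unmodifiableCollection 
so the dataset round-trip hits the class named in the failure:

{code:title=generic kryo encoder repro (sketch)|borderStyle=solid}
import java.util.{Arrays, Collection, Collections}
import scala.reflect.ClassTag
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

// Scala equivalent of the generic kryo-backed encoder from the report.
def kryoEncoder[T: ClassTag]: Encoder[T] = Encoders.kryo[T]

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("kryo-unmodifiable-collection")
  .getOrCreate()

// Wrapping makes the runtime class Collections$UnmodifiableCollection,
// the type mentioned in the reported decoding failure.
val data: Seq[Collection[String]] =
  Seq(Collections.unmodifiableCollection(Arrays.asList("a", "b")))

val ds = spark.createDataset(data)(kryoEncoder[Collection[String]])
ds.collect()  // decoding happens here
{code}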



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15489) Dataset kryo encoder fails on Collections$UnmodifiableCollection

2016-05-24 Thread Michael Armbrust (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-15489:
-
Target Version/s: 2.0.0

> Dataset kryo encoder fails on Collections$UnmodifiableCollection
> 
>
> Key: SPARK-15489
> URL: https://issues.apache.org/jira/browse/SPARK-15489
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Amit Sela
>
> When using Encoders with kryo to encode generically typed objects in the 
> following manner:
> public static <T> Encoder<T> encoder() {
>   return Encoders.kryo((Class<T>) Object.class);
> }
> I get a decoding exception when trying to decode 
> `java.util.Collections$UnmodifiableCollection`, which probably comes from 
> Guava's `ImmutableList`.
> This happens when running with master = local[1]. The same code had no 
> problems with the RDD API.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15511) Dropping data source table succeeds but throws exception

2016-05-24 Thread Xiao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15298879#comment-15298879
 ] 

Xiao Li commented on SPARK-15511:
-

If nobody takes it, I can work on it. : )

> Dropping data source table succeeds but throws exception
> 
>
> Key: SPARK-15511
> URL: https://issues.apache.org/jira/browse/SPARK-15511
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Andrew Or
>
> If the catalog is backed by Hive:
> {code}
> scala> sql("CREATE TABLE boxes (width INT, length INT, height INT) USING CSV")
> {code}
> {code}
> scala> sql("DROP TABLE boxes")
> 16/05/24 13:30:50 WARN DropTableCommand: 
> org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/user/hive/warehouse/boxes;
> com.google.common.util.concurrent.UncheckedExecutionException: 
> org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/user/hive/warehouse/boxes;
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4882)
>   at 
> com.google.common.cache.LocalCache$LocalLoadingCache.apply(LocalCache.java:4898)
>   at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:170)
> ...
> Caused by: org.apache.spark.sql.AnalysisException: Path does not exist: 
> file:/user/hive/warehouse/boxes;
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:317)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource$$anonfun$12.apply(DataSource.scala:306)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
>   at scala.collection.immutable.List.flatMap(List.scala:344)
>   at 
> org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:306)
>   at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:133)
>   at 
> org.apache.spark.sql.hive.HiveMetastoreCatalog$$anon$1.load(HiveMetastoreCatalog.scala:69)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


