[jira] [Assigned] (SPARK-14362) DDL Native Support: Drop View

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14362:


Assignee: Apache Spark

> DDL Native Support: Drop View
> -
>
> Key: SPARK-14362
> URL: https://issues.apache.org/jira/browse/SPARK-14362
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>Assignee: Apache Spark
>
> Native parsing and native analysis of DDL command: Drop View.
> Based on the Hive DDL document for 
> [DROP_VIEW_WEB_LINK](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropView), 
> `DROP VIEW` is defined as follows.
> Syntax:
> DROP VIEW [IF EXISTS] [db_name.]view_name;
>  - Removes the metadata for the specified view.
>  - It is illegal to use DROP TABLE on a view.
>  - It is illegal to use DROP VIEW on a table.






[jira] [Assigned] (SPARK-14362) DDL Native Support: Drop View

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14362:


Assignee: (was: Apache Spark)

> DDL Native Support: Drop View
> -
>
> Key: SPARK-14362
> URL: https://issues.apache.org/jira/browse/SPARK-14362
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Native parsing and native analysis of DDL command: Drop View.
> Based on the Hive DDL document for 
> [DROP_VIEW_WEB_LINK](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropView), 
> `DROP VIEW` is defined as follows.
> Syntax:
> DROP VIEW [IF EXISTS] [db_name.]view_name;
>  - Removes the metadata for the specified view.
>  - It is illegal to use DROP TABLE on a view.
>  - It is illegal to use DROP VIEW on a table.






[jira] [Commented] (SPARK-14362) DDL Native Support: Drop View

2016-04-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223698#comment-15223698
 ] 

Apache Spark commented on SPARK-14362:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/12146

> DDL Native Support: Drop View
> -
>
> Key: SPARK-14362
> URL: https://issues.apache.org/jira/browse/SPARK-14362
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xiao Li
>
> Native parsing and native analysis of DDL command: Drop View.
> Based on the Hive DDL document for 
> [DROP_VIEW_WEB_LINK](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropView), 
> `DROP VIEW` is defined as follows.
> Syntax:
> DROP VIEW [IF EXISTS] [db_name.]view_name;
>  - Removes the metadata for the specified view.
>  - It is illegal to use DROP TABLE on a view.
>  - It is illegal to use DROP VIEW on a table.






[jira] [Updated] (SPARK-14363) Executor OOM while trying to acquire new page from the memory manager

2016-04-03 Thread Sital Kedia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sital Kedia updated SPARK-14363:

Description: 
While running a Spark job, we see that the job fails because of an executor OOM 
with the following stack trace:
{code}
java.lang.OutOfMemoryError: Unable to acquire 76 bytes of memory, got 0
at 
org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:326)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:341)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

{code}

  was:
The job fails because of executor OOM with following stack trace - 
{code}
java.lang.OutOfMemoryError: Unable to acquire 76 bytes of memory, got 0
at 
org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:326)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:341)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at 

[jira] [Updated] (SPARK-14363) Executor OOM while trying to acquire new page from the memory manager

2016-04-03 Thread Sital Kedia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sital Kedia updated SPARK-14363:

Description: 
The job fails because of an executor OOM with the following stack trace:
{code}
java.lang.OutOfMemoryError: Unable to acquire 76 bytes of memory, got 0
at 
org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:326)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:341)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

{code}

  was:
The job fails because of executor OOM with following stack trace - 

java.lang.OutOfMemoryError: Unable to acquire 76 bytes of memory, got 0
at 
org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:326)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:341)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at 

[jira] [Updated] (SPARK-14361) Support EXCLUDE clause in Window function framing

2016-04-03 Thread Xin Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xin Wu updated SPARK-14361:
---
Description: The current Spark SQL does not support the {code}exclude{code} 
clause in the Window function framing clause, which is part of ANSI SQL2003's 
window syntax.
(was: The current Spark SQL does not support the `exclusion` clause, which is 
part of ANSI SQL2003's `Window` syntax. For example, IBM Netezza fully supports 
it, as shown in the 
[document web link](https://www.ibm.com/support/knowledgecenter/SSULQD_7.1.0/com.ibm.nz.dbu.doc/c_dbuser_window_aggregation_family_syntax.html). 
We propose to support it in this JIRA. 


# Introduction

Below is the ANSI SQL2003 `Window` syntax:
```
FUNCTION_NAME(expr) OVER {window_name | (window_specification)}
window_specification ::= [window_name] [partitioning] [ordering] [framing]
partitioning ::= PARTITION BY value[, value...] [COLLATE collation_name]
ordering ::= ORDER [SIBLINGS] BY rule[, rule...]
rule ::= {value | position | alias} [ASC | DESC] [NULLS {FIRST | LAST}]
framing ::= {ROWS | RANGE} {start | between} [exclusion]
start ::= {UNBOUNDED PRECEDING | unsigned-integer PRECEDING | CURRENT ROW}
between ::= BETWEEN bound AND bound
bound ::= {start | UNBOUNDED FOLLOWING | unsigned-integer FOLLOWING}
exclusion ::= {EXCLUDE CURRENT ROW | EXCLUDE GROUP | EXCLUDE TIES | EXCLUDE NO 
OTHERS}
```
The exclusion clause can be used to exclude certain rows from the window frame 
when calculating a window aggregate function (e.g. AVG, SUM, MAX, MIN, COUNT) 
relative to the current row. The window function types that do not support 
framing (and hence exclusion) are listed below:

1. Offset functions, such as lead(), lag()
2. Ranking functions, such as rank(), dense_rank(), percent_rank(), cume_dist, 
ntile()
3. Row number function, such as row_number()

# Definition
Syntax | Description
--- | ---
EXCLUDE CURRENT ROW | Specifies excluding the current row.
EXCLUDE GROUP | Specifies excluding the current row and all rows that are tied with it. Ties occur when there is a match on the order column or columns.
EXCLUDE NO OTHERS | Specifies not excluding any rows. This is the default if you specify no exclusion.
EXCLUDE TIES | Specifies excluding all rows that are tied with the current row (peer rows), but retaining the current row.

# Use-case Examples:

- Let's say you want to find out, for every employee, how his/her salary compares 
to the average salary of employees in the same department whose ages are within 
5 years of his/hers. The query could be:

```SQL
SELECT NAME, DEPT_ID, SALARY, AGE, AVG(SALARY) AS AVG_WITHIN_5_YEAR
OVER(PARTITION BY DEPT_ID 
 ORDER BY AGE 
 RANGE BETWEEN 5 PRECEDING AND 5 FOLLOWING 
 EXCLUDE CURRENT ROW) 
FROM EMPLOYEE

```

- Let's say you want to compare every customer's yearly purchase with the average 
yearly purchase of other customers who are in a different age group from the 
current customer. The query could be:

```SQL
SELECT CUST_NAME, AGE, PROD_CATEGORY, YEARLY_PURCHASE, AVG(YEARLY_PURCHASE) 
OVER(PARTITION BY PROD_CATEGORY 
 ORDER BY AGE 
 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
 EXCLUDE GROUP) 
FROM CUSTOMER_PURCHASE_SUM

```)

> Support EXCLUDE clause in Window function framing
> -
>
> Key: SPARK-14361
> URL: https://issues.apache.org/jira/browse/SPARK-14361
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Xin Wu
>
> The current Spark SQL does not support the {code}exclude{code} clause in the 
> Window function framing clause, which is part of ANSI SQL2003's window syntax.






[jira] [Created] (SPARK-14363) Executor OOM while trying to acquire new page from the memory manager

2016-04-03 Thread Sital Kedia (JIRA)
Sital Kedia created SPARK-14363:
---

 Summary: Executor OOM while trying to acquire new page from the 
memory manager
 Key: SPARK-14363
 URL: https://issues.apache.org/jira/browse/SPARK-14363
 Project: Spark
  Issue Type: Bug
  Components: Shuffle
Affects Versions: 1.6.1
Reporter: Sital Kedia


The job fails because of an executor OOM with the following stack trace:

java.lang.OutOfMemoryError: Unable to acquire 76 bytes of memory, got 0
at 
org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:326)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:341)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)






[jira] [Created] (SPARK-14362) DDL Native Support: Drop View

2016-04-03 Thread Xiao Li (JIRA)
Xiao Li created SPARK-14362:
---

 Summary: DDL Native Support: Drop View
 Key: SPARK-14362
 URL: https://issues.apache.org/jira/browse/SPARK-14362
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xiao Li


Native parsing and native analysis of DDL command: Drop View.

Based on the Hive DDL document for 
[DROP_VIEW_WEB_LINK](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropView), 
`DROP VIEW` is defined as follows.

Syntax:

DROP VIEW [IF EXISTS] [db_name.]view_name;

 - Removes the metadata for the specified view.
 - It is illegal to use DROP TABLE on a view.
 - It is illegal to use DROP VIEW on a table.
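
For illustration, a minimal usage sketch of the command's intended semantics; the 
view name and the {{sqlContext}} handle below are assumptions for the example, not 
taken from the patch:

{code}
// Hypothetical sketch only; names are illustrative.
sqlContext.sql("CREATE VIEW tmp_view AS SELECT 1 AS id")
sqlContext.sql("DROP VIEW IF EXISTS tmp_view")   // drops the view's metadata
sqlContext.sql("DROP VIEW IF EXISTS tmp_view")   // IF EXISTS: no error when the view is already gone
// DROP TABLE on a view (or DROP VIEW on a table) should be rejected.
{code}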







[jira] [Updated] (SPARK-14363) Executor OOM while trying to acquire new page from the memory manager

2016-04-03 Thread Sital Kedia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sital Kedia updated SPARK-14363:

Description: 
The job fails because of an executor OOM with the following stack trace:

java.lang.OutOfMemoryError: Unable to acquire 76 bytes of memory, got 0
at 
org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:326)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:341)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)



  was:
The job fails because of executor OOM with following stack trace - 

java.lang.OutOfMemoryError: Unable to acquire 76 bytes of memory, got 0
at 
org.apache.spark.memory.MemoryConsumer.allocatePage(MemoryConsumer.java:120)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.acquireNewPageIfNecessary(UnsafeExternalSorter.java:326)
at 
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.insertRecord(UnsafeExternalSorter.java:341)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.insertRow(UnsafeExternalRowSorter.java:91)
at 
org.apache.spark.sql.execution.UnsafeExternalRowSorter.sort(UnsafeExternalRowSorter.java:168)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:90)
at org.apache.spark.sql.execution.Sort$$anonfun$1.apply(Sort.scala:64)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$21.apply(RDD.scala:728)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at 

[jira] [Created] (SPARK-14361) Support EXCLUDE clause in Window function framing

2016-04-03 Thread Xin Wu (JIRA)
Xin Wu created SPARK-14361:
--

 Summary: Support EXCLUDE clause in Window function framing
 Key: SPARK-14361
 URL: https://issues.apache.org/jira/browse/SPARK-14361
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.0.0
Reporter: Xin Wu


The current Spark SQL does not support the `exclusion` clause, which is part of 
ANSI SQL2003's `Window` syntax. For example, IBM Netezza fully supports it, as 
shown in the 
[document web link](https://www.ibm.com/support/knowledgecenter/SSULQD_7.1.0/com.ibm.nz.dbu.doc/c_dbuser_window_aggregation_family_syntax.html). 
We propose to support it in this JIRA. 


# Introduction

Below is the ANSI SQL2003 `Window` syntax:
```
FUNCTION_NAME(expr) OVER {window_name | (window_specification)}
window_specification ::= [window_name] [partitioning] [ordering] [framing]
partitioning ::= PARTITION BY value[, value...] [COLLATE collation_name]
ordering ::= ORDER [SIBLINGS] BY rule[, rule...]
rule ::= {value | position | alias} [ASC | DESC] [NULLS {FIRST | LAST}]
framing ::= {ROWS | RANGE} {start | between} [exclusion]
start ::= {UNBOUNDED PRECEDING | unsigned-integer PRECEDING | CURRENT ROW}
between ::= BETWEEN bound AND bound
bound ::= {start | UNBOUNDED FOLLOWING | unsigned-integer FOLLOWING}
exclusion ::= {EXCLUDE CURRENT ROW | EXCLUDE GROUP | EXCLUDE TIES | EXCLUDE NO 
OTHERS}
```
The exclusion clause can be used to exclude certain rows from the window frame 
when calculating a window aggregate function (e.g. AVG, SUM, MAX, MIN, COUNT) 
relative to the current row. The window function types that do not support 
framing (and hence exclusion) are listed below:

1. Offset functions, such as lead(), lag()
2. Ranking functions, such as rank(), dense_rank(), percent_rank(), cume_dist, 
ntile()
3. Row number function, such as row_number()

# Definition
Syntax | Description
--- | ---
EXCLUDE CURRENT ROW | Specifies excluding the current row.
EXCLUDE GROUP | Specifies excluding the current row and all rows that are tied with it. Ties occur when there is a match on the order column or columns.
EXCLUDE NO OTHERS | Specifies not excluding any rows. This is the default if you specify no exclusion.
EXCLUDE TIES | Specifies excluding all rows that are tied with the current row (peer rows), but retaining the current row.

# Use-case Examples:

- Let's say you want to find out, for every employee, how his/her salary compares 
to the average salary of employees in the same department whose ages are within 
5 years of his/hers. The query could be:

```SQL
SELECT NAME, DEPT_ID, SALARY, AGE, AVG(SALARY) AS AVG_WITHIN_5_YEAR
OVER(PARTITION BY DEPT_ID 
 ORDER BY AGE 
 RANGE BETWEEN 5 PRECEDING AND 5 FOLLOWING 
 EXCLUDE CURRENT ROW) 
FROM EMPLOYEE

```

- Let's say you want to compare every customer's yearly purchase with the average 
yearly purchase of other customers who are in a different age group from the 
current customer. The query could be:

```SQL
SELECT CUST_NAME, AGE, PROD_CATEGORY, YEARLY_PURCHASE, AVG(YEARLY_PURCHASE) 
OVER(PARTITION BY PROD_CATEGORY 
 ORDER BY AGE 
 RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING 
 EXCLUDE GROUP) 
FROM CUSTOMER_PURCHASE_SUM

```






[jira] [Commented] (SPARK-7179) Add pattern after "show tables" to filter desire tablename

2016-04-03 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223684#comment-15223684
 ] 

Dongjoon Hyun commented on SPARK-7179:
--

Sure, thank you! :)

> Add pattern after "show tables" to filter desire tablename
> --
>
> Key: SPARK-7179
> URL: https://issues.apache.org/jira/browse/SPARK-7179
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: baishuo
>Priority: Minor
>







[jira] [Commented] (SPARK-14360) Allow using df.queryExecution.debug.codegen() to dump codegen

2016-04-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223671#comment-15223671
 ] 

Apache Spark commented on SPARK-14360:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/12144

> Allow using df.queryExecution.debug.codegen() to dump codegen
> -
>
> Key: SPARK-14360
> URL: https://issues.apache.org/jira/browse/SPARK-14360
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> We recently added the ability to dump the generated code for a given query. 
> However, the method is only available through an implicit after an import. 
> It'd slightly simplify things if it can be called directly in queryExecution.






[jira] [Assigned] (SPARK-14360) Allow using df.queryExecution.debug.codegen() to dump codegen

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14360:


Assignee: Reynold Xin  (was: Apache Spark)

> Allow using df.queryExecution.debug.codegen() to dump codegen
> -
>
> Key: SPARK-14360
> URL: https://issues.apache.org/jira/browse/SPARK-14360
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> We recently added the ability to dump the generated code for a given query. 
> However, the method is only available through an implicit after an import. 
> It'd slightly simplify things if it can be called directly in queryExecution.






[jira] [Assigned] (SPARK-14360) Allow using df.queryExecution.debug.codegen() to dump codegen

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14360:


Assignee: Apache Spark  (was: Reynold Xin)

> Allow using df.queryExecution.debug.codegen() to dump codegen
> -
>
> Key: SPARK-14360
> URL: https://issues.apache.org/jira/browse/SPARK-14360
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> We recently added the ability to dump the generated code for a given query. 
> However, the method is only available through an implicit after an import. 
> It'd slightly simplify things if it can be called directly in queryExecution.






[jira] [Created] (SPARK-14360) Allow using df.queryExecution.debug.codegen() to dump codegen

2016-04-03 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-14360:
---

 Summary: Allow using df.queryExecution.debug.codegen() to dump 
codegen
 Key: SPARK-14360
 URL: https://issues.apache.org/jira/browse/SPARK-14360
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin


We recently added the ability to dump the generated code for a given query. 
However, the method is only available through an implicit after an import. It'd 
slightly simplify things if it can be called directly in queryExecution.
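
As a rough sketch of the ergonomics being discussed (a hedged example: {{df}} is 
assumed to be an existing DataFrame, and the import path and method name reflect 
my recollection of the debug package rather than confirmed API):

{code}
// Current: the codegen dump is reachable only through an implicit added by an import.
import org.apache.spark.sql.execution.debug._
df.debugCodegen()

// Proposed: call it directly off the query execution, with no extra import:
// df.queryExecution.debug.codegen()
{code}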







[jira] [Commented] (SPARK-13258) --conf properties not honored in Mesos cluster mode

2016-04-03 Thread Vidhya Arvind (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223668#comment-15223668
 ] 

Vidhya Arvind commented on SPARK-13258:
---

hi - wondering if this has been fixed?

> --conf properties not honored in Mesos cluster mode
> ---
>
> Key: SPARK-13258
> URL: https://issues.apache.org/jira/browse/SPARK-13258
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.6.0
>Reporter: Michael Gummelt
>
> Spark properties set on {{spark-submit}} via the deprecated 
> {{SPARK_JAVA_OPTS}} are passed along to the driver, but those set via the 
> preferred {{--conf}} are not.
> For example, this results in the URI being fetched in the executor:
> {{SPARK_JAVA_OPTS="-Dspark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
>  -Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0" 
> ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077  
> --class org.apache.spark.examples.SparkPi 
> http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}
> This does not:
> {{SPARK_JAVA_OPTS="-Dspark.mesos.executor.docker.image=mesosphere/spark:1.6.0"
>  ./bin/spark-submit --deploy-mode cluster --master mesos://10.0.78.140:7077 
> --conf 
> spark.mesos.uris=https://raw.githubusercontent.com/mesosphere/spark/master/README.md
>  --class org.apache.spark.examples.SparkPi 
> http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar}}
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L369
> In the above line of code, you can see that SPARK_JAVA_OPTS is passed along 
> to the driver, so those properties take effect.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L373
> Whereas in this line of code, you see that {{--conf}} variables are set on 
> {{SPARK_EXECUTOR_OPTS}}, which AFAICT has absolutely no effect because this 
> env var is being set on the driver, not the executor.






[jira] [Commented] (SPARK-7179) Add pattern after "show tables" to filter desire tablename

2016-04-03 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-7179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223665#comment-15223665
 ] 

Reynold Xin commented on SPARK-7179:


cc [~dongjoon] want to try this one?


> Add pattern after "show tables" to filter desire tablename
> --
>
> Key: SPARK-7179
> URL: https://issues.apache.org/jira/browse/SPARK-7179
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Reporter: baishuo
>Priority: Minor
>







[jira] [Commented] (SPARK-11823) HiveThriftBinaryServerSuite tests timing out, leaves hanging processes

2016-04-03 Thread Reynold Xin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223664#comment-15223664
 ] 

Reynold Xin commented on SPARK-11823:
-

[~joshrosen] want to leave an instruction here and see if other people can pick 
it up?


> HiveThriftBinaryServerSuite tests timing out, leaves hanging processes
> --
>
> Key: SPARK-11823
> URL: https://issues.apache.org/jira/browse/SPARK-11823
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Reporter: shane knapp
> Attachments: 
> spark-jenkins-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-amp-jenkins-worker-05.out,
>  stack.log
>
>
> i've noticed on a few branches that the HiveThriftBinaryServerSuite tests 
> time out, and when that happens, the build is aborted but the tests leave 
> behind hanging processes that eat up cpu and ram.
> most recently, i discovered this happening w/the 1.6 SBT build, specifically 
> w/the hadoop 2.0 profile:
> https://amplab.cs.berkeley.edu/jenkins/job/Spark-1.6-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.0,label=spark-test/56/console
> [~vanzin] grabbed the jstack log, which i've attached to this issue.






[jira] [Created] (SPARK-14359) Improve user experience for typed aggregate functions in Java

2016-04-03 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-14359:
---

 Summary: Improve user experience for typed aggregate functions in 
Java
 Key: SPARK-14359
 URL: https://issues.apache.org/jira/browse/SPARK-14359
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin


See the Scala version in SPARK-14285. The main problem we'd need to work around 
is that Java cannot return primitive types in generics, and as a result we 
would have to return boxed types.







[jira] [Commented] (SPARK-14103) Python DataFrame CSV load on large file is writing to console in Ipython

2016-04-03 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223650#comment-15223650
 ] 

Hyukjin Kwon commented on SPARK-14103:
--

The underlying issue in Univocity has been fixed and they will release {{2.0.2}} (see 
https://github.com/uniVocity/univocity-parsers/issues/60). Could I bump up the 
version to resolve this issue?

> Python DataFrame CSV load on large file is writing to console in Ipython
> 
>
> Key: SPARK-14103
> URL: https://issues.apache.org/jira/browse/SPARK-14103
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
> Environment: Ubuntu, Python 2.7.11, Anaconda 2.5.0, Spark from Master 
> branch
>Reporter: Shubhanshu Mishra
>  Labels: csv, csvparser, dataframe, pyspark
>
> I am using Spark from the master branch, and when I run the following 
> command on a large tab-separated file, the contents of the file are 
> written to stderr:
> {code}
> df = sqlContext.read.load("temp.txt", format="csv", header="false", 
> inferSchema="true", delimiter="\t")
> {code}
> Here is a sample of output:
> {code}
> ^M[Stage 1:>  (0 + 2) 
> / 2]16/03/23 14:01:02 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 
> 2)
> com.univocity.parsers.common.TextParsingException: Error processing input: 
> Length of parsed input (101) exceeds the maximum number of characters 
> defined in your parser settings (100). Identified line separator 
> characters in the parsed content. This may be the cause of the error. The 
> line separator in your parser settings is set to '\n'. Parsed content:
> Privacy-shake",: a haptic interface for managing privacy settings in 
> mobile location sharing applications   privacy shake a haptic interface 
> for managing privacy settings in mobile location sharing applications  2010   
>  2010/09/07  international conference on human computer 
> interaction  interact4333105819371[\n]
> 3D4F6CA1Between the Profiles: Another such Bias. Technology 
> Acceptance Studies on Social Network Services   between the profiles 
> another such bias technology acceptance studies on social network services 
> 20152015/08/02  10.1007/978-3-319-21383-5_12international 
> conference on human-computer interaction  interact43331058
> 19502[\n]
> ...
> .
> web snippets20082008/05/04  10.1007/978-3-642-01344-7_13
> international conference on web information systems and technologies
> webist  44F2980219489
> 06FA3FFAInteractive 3D User Interfaces for Neuroanatomy Exploration   
>   interactive 3d user interfaces for neuroanatomy exploration 2009
> internationa]
> at 
> com.univocity.parsers.common.AbstractParser.handleException(AbstractParser.java:241)
> at 
> com.univocity.parsers.common.AbstractParser.parseNext(AbstractParser.java:356)
> at 
> org.apache.spark.sql.execution.datasources.csv.BulkCsvReader.next(CSVParser.scala:137)
> at 
> org.apache.spark.sql.execution.datasources.csv.BulkCsvReader.next(CSVParser.scala:120)
> at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> at 
> org.apache.spark.sql.execution.datasources.csv.BulkCsvReader.foreach(CSVParser.scala:120)
> at 
> scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:155)
> at 
> org.apache.spark.sql.execution.datasources.csv.BulkCsvReader.foldLeft(CSVParser.scala:120)
> at 
> scala.collection.TraversableOnce$class.aggregate(TraversableOnce.scala:212)
> at 
> org.apache.spark.sql.execution.datasources.csv.BulkCsvReader.aggregate(CSVParser.scala:120)
> at 
> org.apache.spark.rdd.RDD$$anonfun$aggregate$1$$anonfun$22.apply(RDD.scala:1058)
> at 
> org.apache.spark.rdd.RDD$$anonfun$aggregate$1$$anonfun$22.apply(RDD.scala:1058)
> at 
> org.apache.spark.SparkContext$$anonfun$35.apply(SparkContext.scala:1827)
> at 
> org.apache.spark.SparkContext$$anonfun$35.apply(SparkContext.scala:1827)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:69)
> at org.apache.spark.scheduler.Task.run(Task.scala:82)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:231)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ArrayIndexOutOfBoundsException
> 16/03/23 14:01:03 ERROR TaskSetManager: Task 0 in stage 1.0 failed 1 times; 
> aborting 

[jira] [Assigned] (SPARK-14301) Java examples code merge and clean up

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14301:


Assignee: (was: Apache Spark)

> Java examples code merge and clean up
> -
>
> Key: SPARK-14301
> URL: https://issues.apache.org/jira/browse/SPARK-14301
> Project: Spark
>  Issue Type: Sub-task
>  Components: Examples
>Reporter: Xusen Yin
>Priority: Minor
>  Labels: starter
>
> Duplicated code that I found in java/examples/mllib and java/examples/ml:
> * java/ml
> ** JavaCrossValidatorExample.java
> ** JavaDocument.java
> ** JavaLabeledDocument.java
> ** JavaTrainValidationSplitExample.java
> * Unsure code duplications of java/ml, double check
> ** JavaDeveloperApiExample.java
> ** JavaSimpleParamsExample.java
> ** JavaSimpleTextClassificationPipeline.java
> * java/mllib
> ** JavaKMeans.java
> ** JavaLDAExample.java
> ** JavaLR.java
> * Unsure code duplications of java/mllib, double check
> ** JavaALS.java
> ** JavaFPGrowthExample.java
> When merging and cleaning that code, be sure not to disturb the existing 
> example on/off blocks.






[jira] [Commented] (SPARK-14301) Java examples code merge and clean up

2016-04-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223651#comment-15223651
 ] 

Apache Spark commented on SPARK-14301:
--

User 'yongtang' has created a pull request for this issue:
https://github.com/apache/spark/pull/12143

> Java examples code merge and clean up
> -
>
> Key: SPARK-14301
> URL: https://issues.apache.org/jira/browse/SPARK-14301
> Project: Spark
>  Issue Type: Sub-task
>  Components: Examples
>Reporter: Xusen Yin
>Priority: Minor
>  Labels: starter
>
> Duplicated code that I found in java/examples/mllib and java/examples/ml:
> * java/ml
> ** JavaCrossValidatorExample.java
> ** JavaDocument.java
> ** JavaLabeledDocument.java
> ** JavaTrainValidationSplitExample.java
> * Unsure code duplications of java/ml, double check
> ** JavaDeveloperApiExample.java
> ** JavaSimpleParamsExample.java
> ** JavaSimpleTextClassificationPipeline.java
> * java/mllib
> ** JavaKMeans.java
> ** JavaLDAExample.java
> ** JavaLR.java
> * Unsure code duplications of java/mllib, double check
> ** JavaALS.java
> ** JavaFPGrowthExample.java
> When merging and cleaning that code, be sure not to disturb the existing 
> example on/off blocks.






[jira] [Assigned] (SPARK-14301) Java examples code merge and clean up

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14301:


Assignee: Apache Spark

> Java examples code merge and clean up
> -
>
> Key: SPARK-14301
> URL: https://issues.apache.org/jira/browse/SPARK-14301
> Project: Spark
>  Issue Type: Sub-task
>  Components: Examples
>Reporter: Xusen Yin
>Assignee: Apache Spark
>Priority: Minor
>  Labels: starter
>
> Duplicated code that I found in java/examples/mllib and java/examples/ml:
> * java/ml
> ** JavaCrossValidatorExample.java
> ** JavaDocument.java
> ** JavaLabeledDocument.java
> ** JavaTrainValidationSplitExample.java
> * Unsure code duplications of java/ml, double check
> ** JavaDeveloperApiExample.java
> ** JavaSimpleParamsExample.java
> ** JavaSimpleTextClassificationPipeline.java
> * java/mllib
> ** JavaKMeans.java
> ** JavaLDAExample.java
> ** JavaLR.java
> * Unsure code duplications of java/mllib, double check
> ** JavaALS.java
> ** JavaFPGrowthExample.java
> When merging and cleaning that code, be sure not to disturb the existing 
> example on/off blocks.






[jira] [Commented] (SPARK-14358) Change SparkListener from a trait to an abstract class, and remove JavaSparkListener

2016-04-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223647#comment-15223647
 ] 

Apache Spark commented on SPARK-14358:
--

User 'rxin' has created a pull request for this issue:
https://github.com/apache/spark/pull/12142

> Change SparkListener from a trait to an abstract class, and remove 
> JavaSparkListener
> 
>
> Key: SPARK-14358
> URL: https://issues.apache.org/jira/browse/SPARK-14358
> Project: Spark
>  Issue Type: Sub-task
>  Components: Scheduler, Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> Scala traits are difficult to maintain binary compatibility on, and as a 
> result we had to introduce JavaSparkListener. In Spark 2.0 we can change 
> SparkListener from a trait to an abstract class and then remove 
> JavaSparkListener.






[jira] [Assigned] (SPARK-14358) Change SparkListener from a trait to an abstract class, and remove JavaSparkListener

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14358:


Assignee: Reynold Xin  (was: Apache Spark)

> Change SparkListener from a trait to an abstract class, and remove 
> JavaSparkListener
> 
>
> Key: SPARK-14358
> URL: https://issues.apache.org/jira/browse/SPARK-14358
> Project: Spark
>  Issue Type: Sub-task
>  Components: Scheduler, Spark Core
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>
> Scala traits are difficult to maintain binary compatibility on, and as a 
> result we had to introduce JavaSparkListener. In Spark 2.0 we can change 
> SparkListener from a trait to an abstract class and then remove 
> JavaSparkListener.






[jira] [Assigned] (SPARK-14358) Change SparkListener from a trait to an abstract class, and remove JavaSparkListener

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14358:


Assignee: Apache Spark  (was: Reynold Xin)

> Change SparkListener from a trait to an abstract class, and remove 
> JavaSparkListener
> 
>
> Key: SPARK-14358
> URL: https://issues.apache.org/jira/browse/SPARK-14358
> Project: Spark
>  Issue Type: Sub-task
>  Components: Scheduler, Spark Core
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> Scala traits are difficult to maintain binary compatibility on, and as a 
> result we had to introduce JavaSparkListener. In Spark 2.0 we can change 
> SparkListener from a trait to an abstract class and then remove 
> JavaSparkListener.






[jira] [Created] (SPARK-14358) Change SparkListener from a trait to an abstract class, and remove JavaSparkListener

2016-04-03 Thread Reynold Xin (JIRA)
Reynold Xin created SPARK-14358:
---

 Summary: Change SparkListener from a trait to an abstract class, 
and remove JavaSparkListener
 Key: SPARK-14358
 URL: https://issues.apache.org/jira/browse/SPARK-14358
 Project: Spark
  Issue Type: Sub-task
  Components: Scheduler, Spark Core
Reporter: Reynold Xin
Assignee: Reynold Xin


Scala traits are difficult to maintain binary compatibility on, and as a result 
we had to introduce JavaSparkListener. In Spark 2.0 we can change SparkListener 
from a trait to an abstract class and then remove JavaSparkListener.
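
A toy sketch (not the actual Spark source) of the rationale: an abstract class 
whose callbacks have no-op default bodies lets new callbacks be added later 
without breaking binary compatibility for already-compiled subclasses, which is 
the problem traits posed and the reason JavaSparkListener existed.

{code}
// Illustrative only; the class and callback names are invented.
abstract class ListenerSketch {
  // No-op defaults: a new method added here later does not require
  // recompiling existing subclasses.
  def onJobStart(jobId: Int): Unit = { }
  def onJobEnd(jobId: Int): Unit = { }
}

class MyListener extends ListenerSketch {
  override def onJobEnd(jobId: Int): Unit = println(s"job $jobId finished")
}
{code}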







[jira] [Resolved] (SPARK-14356) Update spark.sql.execution.debug to work on Datasets

2016-04-03 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-14356.
-
   Resolution: Fixed
Fix Version/s: 2.0.0

> Update spark.sql.execution.debug to work on Datasets
> 
>
> Key: SPARK-14356
> URL: https://issues.apache.org/jira/browse/SPARK-14356
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Matei Zaharia
>Assignee: Matei Zaharia
>Priority: Minor
> Fix For: 2.0.0
>
>
> Currently it only works on DataFrame, which seems unnecessarily restrictive 
> for 2.0.






[jira] [Commented] (SPARK-14283) Avoid sort in randomSplit when possible

2016-04-03 Thread Bo Meng (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223637#comment-15223637
 ] 

Bo Meng commented on SPARK-14283:
-

Could you please provide more details, such as test cases, use cases, etc.?

> Avoid sort in randomSplit when possible
> ---
>
> Key: SPARK-14283
> URL: https://issues.apache.org/jira/browse/SPARK-14283
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Joseph K. Bradley
>
> Dataset.randomSplit sorts each partition in order to guarantee an ordering 
> and make randomSplit deterministic given the seed.  Since randomSplit is used 
> a fair amount in ML, it would be great to avoid the sort when possible.
> Are there cases when it could be avoided?
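
For context, a minimal sketch of the call pattern in question ({{df}}, the 
weights, and the seed below are invented for illustration):

{code}
// The per-partition sort described above exists so that, given this seed,
// the same rows land in the same split on every run.
val Array(train, test) = df.randomSplit(Array(0.8, 0.2), seed = 42L)
println(s"train=${train.count()}, test=${test.count()}")
{code}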






[jira] [Commented] (SPARK-13456) Cannot create encoders for case classes defined in Spark shell after upgrading to Scala 2.11

2016-04-03 Thread Wenchen Fan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223635#comment-15223635
 ] 

Wenchen Fan commented on SPARK-13456:
-

This may be related to https://issues.apache.org/jira/browse/SPARK-13611, where 
the imported type somehow doesn't match the real one. Anyway, I think it's a 
different issue from what this JIRA reported; [~jlaskowski], do you mind 
creating a new ticket for it?

> Cannot create encoders for case classes defined in Spark shell after 
> upgrading to Scala 2.11
> 
>
> Key: SPARK-13456
> URL: https://issues.apache.org/jira/browse/SPARK-13456
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Wenchen Fan
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Spark 2.0 started to use Scala 2.11 by default since [PR 
> #10608|https://github.com/apache/spark/pull/10608].  Unfortunately, after 
> this upgrade, Spark fails to create encoders for case classes defined in REPL:
> {code}
> import sqlContext.implicits._
> case class T(a: Int, b: Double)
> val ds = Seq(1 -> T(1, 1D), 2 -> T(2, 2D)).toDS()
> {code}
> Exception thrown:
> {noformat}
> org.apache.spark.sql.AnalysisException: Unable to generate an encoder for 
> inner class `T` without access to the scope that this class was defined in.
> Try moving this class out of its parent class.;
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:565)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:561)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:261)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:304)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1194)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:353)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5$$anonfun$apply$11.apply(TreeNode.scala:333)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at 

[jira] [Updated] (SPARK-14321) Reduce date format cost in date functions

2016-04-03 Thread Rajesh Balamohan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Balamohan updated SPARK-14321:
-
Summary: Reduce date format cost in date functions  (was: Reduce date 
format cost and string-to-date cost in date functions)

> Reduce date format cost in date functions
> -
>
> Key: SPARK-14321
> URL: https://issues.apache.org/jira/browse/SPARK-14321
> Project: Spark
>  Issue Type: Bug
>Reporter: Rajesh Balamohan
>Priority: Minor
>
> Currently the code generated is
> {noformat}
> /* 066 */ UTF8String primitive5 = null;
> /* 067 */ if (!isNull4) {
> /* 068 */   try {
> /* 069 */ primitive5 = UTF8String.fromString(new 
> java.text.SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(
> /* 070 */ new java.util.Date(primitive7 * 1000L)));
> /* 071 */   } catch (java.lang.Throwable e) {
> /* 072 */ isNull4 = true;
> /* 073 */   }
> /* 074 */ }
> {noformat}
> Instantiation of SimpleDateFormat is fairly expensive. It can be created on an 
> as-needed basis instead of on every call.
> I will share the patch soon.
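A minimal sketch of the direction described above (illustrative only, not the actual patch): 
construct the formatter once per instance rather than once per row.

{code}
import java.text.SimpleDateFormat
import java.util.Date

// Illustrative: hoist the expensive SimpleDateFormat construction out of the per-row path.
class UnixTimeFormatter(pattern: String = "yyyy-MM-dd HH:mm:ss") {
  // Created lazily, once per instance (SimpleDateFormat is not thread-safe, so keep it per-instance).
  private lazy val formatter = new SimpleDateFormat(pattern)
  def format(seconds: Long): String = formatter.format(new Date(seconds * 1000L))
}

object UnixTimeFormatterDemo extends App {
  val fmt = new UnixTimeFormatter()
  println(fmt.format(1459728000L))  // prints a timestamp in April 2016
}
{code}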



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14357) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure

2016-04-03 Thread Jason Moore (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Moore updated SPARK-14357:

Component/s: Spark Core

> Tasks that fail due to CommitDeniedException (a side-effect of speculation) 
> can cause job failure
> -
>
> Key: SPARK-14357
> URL: https://issues.apache.org/jira/browse/SPARK-14357
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.2, 1.6.0, 1.6.1
>Reporter: Jason Moore
>Priority: Critical
>
> Speculation can often result in a CommitDeniedException, but ideally this 
> shouldn't result in the job failing.  So changes were made along with 
> SPARK-8167 to ensure that the CommitDeniedException is caught and given a 
> failure reason that doesn't increment the failure count.
> However, I'm still noticing that this exception is causing jobs to fail using 
> the 1.6.1 release version.
> {noformat}
> 16/04/04 11:36:02 ERROR InsertIntoHadoopFsRelation: Aborting job.
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 18 in 
> stage 315.0 failed 8 times, most recent failure: Lost task 18.8 in stage 
> 315.0 (TID 100793, qaphdd099.quantium.com.au.local): 
> org.apache.spark.SparkException: Task failed while writing rows.
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:272)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
> at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> at org.apache.spark.scheduler.Task.run(Task.scala:89)
> at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: Failed to commit task
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:287)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:267)
> ... 8 more
> Caused by: org.apache.spark.executor.CommitDeniedException: 
> attempt_201604041136_0315_m_18_8: Not committed because the driver did 
> not authorize commit
> at 
> org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:135)
> at 
> org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:219)
> at 
> org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:282)
> ... 9 more
> {noformat}
> It seems to me that the CommitDeniedException gets wrapped into a 
> RuntimeException at 
> [WriterContainer.scala#L286|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala#L286]
>  and then into a SparkException at 
> [InsertIntoHadoopFsRelation.scala#L154|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L154]
>  which results in it not being able to be handled properly at 
> [Executor.scala#L290|https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/executor/Executor.scala#L290]
> Perhaps the solution is for this catch block to type-match on the inner-most 
> cause of the error?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14357) Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure

2016-04-03 Thread Jason Moore (JIRA)
Jason Moore created SPARK-14357:
---

 Summary: Tasks that fail due to CommitDeniedException (a 
side-effect of speculation) can cause job failure
 Key: SPARK-14357
 URL: https://issues.apache.org/jira/browse/SPARK-14357
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.6.1, 1.6.0, 1.5.2
Reporter: Jason Moore
Priority: Critical


Speculation can often result in a CommitDeniedException, but ideally this 
shouldn't result in the job failing.  So changes were made along with 
SPARK-8167 to ensure that the CommitDeniedException is caught and given a 
failure reason that doesn't increment the failure count.

However, I'm still noticing that this exception is causing jobs to fail using 
the 1.6.1 release version.

{noformat}
16/04/04 11:36:02 ERROR InsertIntoHadoopFsRelation: Aborting job.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 18 in 
stage 315.0 failed 8 times, most recent failure: Lost task 18.8 in stage 315.0 
(TID 100793, qaphdd099.quantium.com.au.local): org.apache.spark.SparkException: 
Task failed while writing rows.
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:272)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at 
org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation$$anonfun$run$1$$anonfun$apply$mcV$sp$3.apply(InsertIntoHadoopFsRelation.scala:150)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Failed to commit task
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:287)
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.writeRows(WriterContainer.scala:267)
... 8 more
Caused by: org.apache.spark.executor.CommitDeniedException: 
attempt_201604041136_0315_m_18_8: Not committed because the driver did not 
authorize commit
at 
org.apache.spark.mapred.SparkHadoopMapRedUtil$.commitTask(SparkHadoopMapRedUtil.scala:135)
at 
org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitTask(WriterContainer.scala:219)
at 
org.apache.spark.sql.execution.datasources.DefaultWriterContainer.commitTask$1(WriterContainer.scala:282)
... 9 more
{noformat}

It seems to me that the CommitDeniedException gets wrapped into a 
RuntimeException at 
[WriterContainer.scala#L286|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/WriterContainer.scala#L286]
 and then into a SparkException at 
[InsertIntoHadoopFsRelation.scala#L154|https://github.com/apache/spark/blob/branch-1.6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/InsertIntoHadoopFsRelation.scala#L154]
 which results in it not being able to be handled properly at 
[Executor.scala#L290|https://github.com/apache/spark/blob/branch-1.6/core/src/main/scala/org/apache/spark/executor/Executor.scala#L290]

Perhaps the solution is for this catch block to type-match on the inner-most 
cause of the error?
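A rough sketch of that idea (not the actual Executor change): walk the cause chain so a 
commit denial wrapped in RuntimeException/SparkException can still be recognized and treated 
as a non-counted failure.

{code}
import scala.annotation.tailrec

object CauseChain {
  // Illustrative helper: return the inner-most cause of a throwable.
  @tailrec
  def rootCause(t: Throwable): Throwable =
    if (t.getCause == null || (t.getCause eq t)) t else rootCause(t.getCause)
}

// Usage idea: in the executor's catch block, inspect CauseChain.rootCause(exception)
// instead of the outermost exception, so a wrapped CommitDeniedException is still seen.
{code}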



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14353) Dataset Time Windowing API for Python, R, and SQL

2016-04-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223610#comment-15223610
 ] 

Apache Spark commented on SPARK-14353:
--

User 'brkyvz' has created a pull request for this issue:
https://github.com/apache/spark/pull/12141

> Dataset Time Windowing API for Python, R, and SQL
> -
>
> Key: SPARK-14353
> URL: https://issues.apache.org/jira/browse/SPARK-14353
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SparkR, SQL
>Reporter: Burak Yavuz
>
> The time windowing function `window` was added to Datasets. This JIRA is to 
> track the status of the R, Python and SQL APIs.
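For context, a minimal sketch of the existing Scala API whose Python, R and SQL counterparts 
this ticket tracks (assuming Spark 2.0's SparkSession; the data is made up):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{count, window}

object WindowDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("WindowDemo").getOrCreate()
    import spark.implicits._

    val events = Seq(
      ("2016-04-03 10:01:00", "a"),
      ("2016-04-03 10:07:00", "b"),
      ("2016-04-03 10:12:00", "c")
    ).toDF("time", "id").withColumn("time", $"time".cast("timestamp"))

    // Count events falling into 10-minute tumbling windows.
    events.groupBy(window($"time", "10 minutes")).agg(count("*")).show(false)
    spark.stop()
  }
}
{code}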



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14356) Update spark.sql.execution.debug to work on Datasets

2016-04-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223605#comment-15223605
 ] 

Apache Spark commented on SPARK-14356:
--

User 'mateiz' has created a pull request for this issue:
https://github.com/apache/spark/pull/12140

> Update spark.sql.execution.debug to work on Datasets
> 
>
> Key: SPARK-14356
> URL: https://issues.apache.org/jira/browse/SPARK-14356
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Matei Zaharia
>Assignee: Matei Zaharia
>Priority: Minor
>
> Currently it only works on DataFrame, which seems unnecessarily restrictive 
> for 2.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14356) Update spark.sql.execution.debug to work on Datasets

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14356:


Assignee: Apache Spark  (was: Matei Zaharia)

> Update spark.sql.execution.debug to work on Datasets
> 
>
> Key: SPARK-14356
> URL: https://issues.apache.org/jira/browse/SPARK-14356
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Matei Zaharia
>Assignee: Apache Spark
>Priority: Minor
>
> Currently it only works on DataFrame, which seems unnecessarily restrictive 
> for 2.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14356) Update spark.sql.execution.debug to work on Datasets

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14356:


Assignee: Matei Zaharia  (was: Apache Spark)

> Update spark.sql.execution.debug to work on Datasets
> 
>
> Key: SPARK-14356
> URL: https://issues.apache.org/jira/browse/SPARK-14356
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Matei Zaharia
>Assignee: Matei Zaharia
>Priority: Minor
>
> Currently it only works on DataFrame, which seems unnecessarily restrictive 
> for 2.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14356) Update spark.sql.execution.debug to work on Datasets

2016-04-03 Thread Matei Zaharia (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matei Zaharia reassigned SPARK-14356:
-

Assignee: Matei Zaharia

> Update spark.sql.execution.debug to work on Datasets
> 
>
> Key: SPARK-14356
> URL: https://issues.apache.org/jira/browse/SPARK-14356
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Matei Zaharia
>Assignee: Matei Zaharia
>Priority: Minor
>
> Currently it only works on DataFrame, which seems unnecessarily restrictive 
> for 2.0.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14356) Update spark.sql.execution.debug to work on Datasets

2016-04-03 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-14356:
-

 Summary: Update spark.sql.execution.debug to work on Datasets
 Key: SPARK-14356
 URL: https://issues.apache.org/jira/browse/SPARK-14356
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Matei Zaharia
Priority: Minor


Currently it only works on DataFrame, which seems unnecessarily restrictive for 
2.0.
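For context, a minimal sketch of the current DataFrame-oriented usage (assuming the implicits 
in org.apache.spark.sql.execution.debug still expose debug() and debugCodegen()); this JIRA is 
about making the same helpers available for typed Datasets:

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug._  // adds debug() / debugCodegen()

object DebugDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("DebugDemo").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "value")
    df.debug()         // runs the query and prints per-operator debugging metrics
    df.debugCodegen()  // prints the generated code for the physical plan
    spark.stop()
  }
}
{code}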



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14355) Fix typos in Exception/Testcase/Comments and static analysis results

2016-04-03 Thread Reynold Xin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-14355.
-
   Resolution: Fixed
 Assignee: Dongjoon Hyun
Fix Version/s: 2.0.0

> Fix typos in Exception/Testcase/Comments and static analysis results
> 
>
> Key: SPARK-14355
> URL: https://issues.apache.org/jira/browse/SPARK-14355
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
> Fix For: 2.0.0
>
>
> This issue contains the following 5 types of maintenance fix over 59 files 
> (+94 lines, -93 lines).
> * Fix typos(exception/log strings, testcase name, comments) in 44 lines.
> * Fix lint-java errors (MaxLineLength) in 6 lines. (New codes after 
> SPARK-14011)
> * Use diamond operators in 40 lines. (New codes after SPARK-13702)
> * Fix redundant semicolon in 5 lines.
> * Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in 
> CSVInferSchemaSuite.scala.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-13456) Cannot create encoders for case classes defined in Spark shell after upgrading to Scala 2.11

2016-04-03 Thread Jacek Laskowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Laskowski reopened SPARK-13456:
-

It appears that the issue has not been entirely resolved yet.

> Cannot create encoders for case classes defined in Spark shell after 
> upgrading to Scala 2.11
> 
>
> Key: SPARK-13456
> URL: https://issues.apache.org/jira/browse/SPARK-13456
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Wenchen Fan
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Spark 2.0 started to use Scala 2.11 by default since [PR 
> #10608|https://github.com/apache/spark/pull/10608].  Unfortunately, after 
> this upgrade, Spark fails to create encoders for case classes defined in REPL:
> {code}
> import sqlContext.implicits._
> case class T(a: Int, b: Double)
> val ds = Seq(1 -> T(1, 1D), 2 -> T(2, 2D)).toDS()
> {code}
> Exception thrown:
> {noformat}
> org.apache.spark.sql.AnalysisException: Unable to generate an encoder for 
> inner class `T` without access to the scope that this class was defined in.
> Try moving this class out of its parent class.;
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:565)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:561)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:261)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:304)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1194)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:353)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5$$anonfun$apply$11.apply(TreeNode.scala:333)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1194)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
>   at 

[jira] [Commented] (SPARK-13456) Cannot create encoders for case classes defined in Spark shell after upgrading to Scala 2.11

2016-04-03 Thread Jacek Laskowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223574#comment-15223574
 ] 

Jacek Laskowski commented on SPARK-13456:
-

I thought it worked fine, but just today I ran across the following, which looks 
like the issue has not been resolved.

{code}
scala> :pa
// Entering paste mode (ctrl-D to finish)

case class Token(name: String, productId: Int, score: Double)
val data = Seq(
  Token("aaa", 100, 0.12),
  Token("aaa", 200, 0.29),
  Token("bbb", 200, 0.53),
  Token("bbb", 300, 0.42))

// Exiting paste mode, now interpreting.

defined class Token
data: Seq[Token] = List(Token(aaa,100,0.12), Token(aaa,200,0.29), 
Token(bbb,200,0.53), Token(bbb,300,0.42))

scala> val ds = data.toDS
ds: org.apache.spark.sql.Dataset[Token] = [name: string, productId: int ... 1 
more field]

scala> val ds: Dataset[Token] = data.toDS
<console>:27: error: not found: type Dataset
   val ds: Dataset[Token] = data.toDS
   ^

scala> import org.apache.spark.sql._
import org.apache.spark.sql._

scala> val ds: Dataset[Token] = data.toDS
<console>:30: error: type mismatch;
 found   : org.apache.spark.sql.Dataset[Token]
 required: org.apache.spark.sql.Dataset[Token]
   val ds: Dataset[Token] = data.toDS
 ^
scala> sc.version
res0: String = 2.0.0-SNAPSHOT
{code}

{code}
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0-SNAPSHOT
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
{code}

> Cannot create encoders for case classes defined in Spark shell after 
> upgrading to Scala 2.11
> 
>
> Key: SPARK-13456
> URL: https://issues.apache.org/jira/browse/SPARK-13456
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Wenchen Fan
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Spark 2.0 started to use Scala 2.11 by default since [PR 
> #10608|https://github.com/apache/spark/pull/10608].  Unfortunately, after 
> this upgrade, Spark fails to create encoders for case classes defined in REPL:
> {code}
> import sqlContext.implicits._
> case class T(a: Int, b: Double)
> val ds = Seq(1 -> T(1, 1D), 2 -> T(2, 2D)).toDS()
> {code}
> Exception thrown:
> {noformat}
> org.apache.spark.sql.AnalysisException: Unable to generate an encoder for 
> inner class `T` without access to the scope that this class was defined in.
> Try moving this class out of its parent class.;
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:565)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:561)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:261)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:304)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1194)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:353)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
>   at 
> 

[jira] [Issue Comment Deleted] (SPARK-13456) Cannot create encoders for case classes defined in Spark shell after upgrading to Scala 2.11

2016-04-03 Thread Jacek Laskowski (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Laskowski updated SPARK-13456:

Comment: was deleted

(was: It works now. Thanks a lot!)

> Cannot create encoders for case classes defined in Spark shell after 
> upgrading to Scala 2.11
> 
>
> Key: SPARK-13456
> URL: https://issues.apache.org/jira/browse/SPARK-13456
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Wenchen Fan
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Spark 2.0 started to use Scala 2.11 by default since [PR 
> #10608|https://github.com/apache/spark/pull/10608].  Unfortunately, after 
> this upgrade, Spark fails to create encoders for case classes defined in REPL:
> {code}
> import sqlContext.implicits._
> case class T(a: Int, b: Double)
> val ds = Seq(1 -> T(1, 1D), 2 -> T(2, 2D)).toDS()
> {code}
> Exception thrown:
> {noformat}
> org.apache.spark.sql.AnalysisException: Unable to generate an encoder for 
> inner class `T` without access to the scope that this class was defined in.
> Try moving this class out of its parent class.;
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:565)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:561)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:261)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:304)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1194)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:353)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5$$anonfun$apply$11.apply(TreeNode.scala:333)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1194)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
>   at 

[jira] [Commented] (SPARK-13456) Cannot create encoders for case classes defined in Spark shell after upgrading to Scala 2.11

2016-04-03 Thread Jacek Laskowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223573#comment-15223573
 ] 

Jacek Laskowski commented on SPARK-13456:
-

It works now. Thanks a lot!

> Cannot create encoders for case classes defined in Spark shell after 
> upgrading to Scala 2.11
> 
>
> Key: SPARK-13456
> URL: https://issues.apache.org/jira/browse/SPARK-13456
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0
>Reporter: Cheng Lian
>Assignee: Wenchen Fan
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Spark 2.0 started to use Scala 2.11 by default since [PR 
> #10608|https://github.com/apache/spark/pull/10608].  Unfortunately, after 
> this upgrade, Spark fails to create encoders for case classes defined in REPL:
> {code}
> import sqlContext.implicits._
> case class T(a: Int, b: Double)
> val ds = Seq(1 -> T(1, 1D), 2 -> T(2, 2D)).toDS()
> {code}
> Exception thrown:
> {noformat}
> org.apache.spark.sql.AnalysisException: Unable to generate an encoder for 
> inner class `T` without access to the scope that this class was defined in.
> Try moving this class out of its parent class.;
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:565)
>   at 
> org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$resolveDeserializer$1.applyOrElse(Analyzer.scala:561)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:262)
>   at 
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:261)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:304)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1194)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:287)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1194)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformChildren(TreeNode.scala:353)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:267)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5$$anonfun$apply$11.apply(TreeNode.scala:333)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>   at scala.collection.immutable.List.foreach(List.scala:381)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>   at scala.collection.immutable.List.map(List.scala:285)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$5.apply(TreeNode.scala:331)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:370)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:308)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1194)
>   at 
> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:300)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1194)
>   at 

[jira] [Updated] (SPARK-14163) SumEvaluator and countApprox cannot reliably handle RDDs of size 1

2016-04-03 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-14163:
--
Assignee: Marcin Tustin

> SumEvaluator and countApprox cannot reliably handle RDDs of size 1
> --
>
> Key: SPARK-14163
> URL: https://issues.apache.org/jira/browse/SPARK-14163
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.2, 1.6.0, 1.6.1, 2.0.0
>Reporter: Marcin Tustin
>Assignee: Marcin Tustin
>Priority: Minor
> Fix For: 2.0.0
>
>
> The bug exists in these lines: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/partial/SumEvaluator.scala#L59-L61
> In this code
> {code:title=SumEvaluator.scala|borderStyle=solid}
>   val degreesOfFreedom = (counter.count - 1).toInt
>   new TDistribution(degreesOfFreedom).inverseCumulativeProbability(1 
> - (1 - confidence) / 2)
> {code}
> If {{counter.count}} is 1 or 0 then {{new TDistribution(degreesOfFreedom)}} 
> will raise an exception because {{TDistribution}} expects its 
> {{degreesOfFreedom}} parameter to be 1 or greater.
> An example (written in pyspark):
> {noformat}
> >>> rdd = sc.parallelize([1])
> >>> rdd.countApprox(1000,0.5)
> 16/03/25 18:09:36 INFO SparkContext: Starting job: sumApprox at 
> NativeMethodAccessorImpl.java:-2
> 16/03/25 18:09:36 INFO DAGScheduler: Got job 1 (sumApprox at 
> NativeMethodAccessorImpl.java:-2) with 2 output partitions
> 16/03/25 18:09:36 INFO DAGScheduler: Final stage: ResultStage 1(sumApprox at 
> NativeMethodAccessorImpl.java:-2)
> 16/03/25 18:09:36 INFO DAGScheduler: Parents of final stage: List()
> 16/03/25 18:09:36 INFO DAGScheduler: Missing parents: List()
> 16/03/25 18:09:36 INFO DAGScheduler: Submitting ResultStage 1 
> (MapPartitionsRDD[6] at mapPartitions at SerDeUtil.scala:147), which has no 
> missing parents
> 16/03/25 18:09:36 INFO MemoryStore: ensureFreeSpace(4328) called with 
> curMem=7140, maxMem=555755765
> 16/03/25 18:09:36 INFO MemoryStore: Block broadcast_1 stored as values in 
> memory (estimated size 4.2 KB, free 530.0 MB)
> 16/03/25 18:09:36 INFO MemoryStore: ensureFreeSpace(2821) called with 
> curMem=11468, maxMem=555755765
> 16/03/25 18:09:36 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes 
> in memory (estimated size 2.8 KB, free 530.0 MB)
> 16/03/25 18:09:36 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
> on 10.5.5.158:56348 (size: 2.8 KB, free: 530.0 MB)
> 16/03/25 18:09:36 INFO SparkContext: Created broadcast 1 from broadcast at 
> DAGScheduler.scala:861
> 16/03/25 18:09:36 INFO DAGScheduler: Submitting 2 missing tasks from 
> ResultStage 1 (MapPartitionsRDD[6] at mapPartitions at SerDeUtil.scala:147)
> 16/03/25 18:09:36 INFO YarnScheduler: Adding task set 1.0 with 2 tasks
> 16/03/25 18:09:36 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, 
> r-hadoopeco-data-66215afe.hbinternal.com, PROCESS_LOCAL, 2071 bytes)
> 16/03/25 18:09:36 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, 
> r-hadoopeco-data-84205b1c.hbinternal.com, PROCESS_LOCAL, 2090 bytes)
> 16/03/25 18:09:36 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
> on r-hadoopeco-data-66215afe.hbinternal.com:43011 (size: 2.8 KB, free: 530.0 
> MB)
> 16/03/25 18:09:36 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) 
> in 66 ms on r-hadoopeco-data-66215afe.hbinternal.com (1/2)
> 16/03/25 18:09:36 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
> on r-hadoopeco-data-84205b1c.hbinternal.com:41613 (size: 2.8 KB, free: 530.0 
> MB)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/hdp/2.3.4.0-3485/spark/python/pyspark/rdd.py", line 2227, in 
> countApprox
> return int(drdd.sumApprox(timeout, confidence))
>   File "/usr/hdp/2.3.4.0-3485/spark/python/pyspark/rdd.py", line 2243, in 
> sumApprox
> r = jdrdd.sumApprox(timeout, confidence).getFinalValue()
>   File 
> "/usr/hdp/2.3.4.0-3485/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
>  line 538, in __call__
>   File "/usr/hdp/2.3.4.0-3485/spark/python/pyspark/sql/utils.py", line 36, in 
> deco
> return f(*a, **kw)
>   File 
> "/usr/hdp/2.3.4.0-3485/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
>  line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o69.sumApprox.
> : org.apache.commons.math3.exception.NotStrictlyPositiveException: degrees of 
> freedom (0)
>   at 
> org.apache.commons.math3.distribution.TDistribution.<init>(TDistribution.java:120)
>   at 
> org.apache.commons.math3.distribution.TDistribution.<init>(TDistribution.java:86)
>   at 
> org.apache.commons.math3.distribution.TDistribution.<init>(TDistribution.java:63)
>   at 
> 
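A rough sketch of the guard being discussed (illustrative, not the merged patch): only consult 
the t-distribution when there are at least two samples, and fall back to a degenerate bound 
otherwise.

{code}
import org.apache.commons.math3.distribution.TDistribution

object ApproxBound {
  // Illustrative: TDistribution requires degreesOfFreedom >= 1, i.e. count >= 2.
  def confidenceQuantile(count: Long, confidence: Double): Double =
    if (count > 1) {
      val degreesOfFreedom = (count - 1).toInt
      new TDistribution(degreesOfFreedom).inverseCumulativeProbability(1 - (1 - confidence) / 2)
    } else {
      Double.PositiveInfinity  // with 0 or 1 samples there is no meaningful interval
    }
}
{code}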

[jira] [Resolved] (SPARK-14163) SumEvaluator and countApprox cannot reliably handle RDDs of size 1

2016-04-03 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-14163.
---
   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 12016
[https://github.com/apache/spark/pull/12016]

> SumEvaluator and countApprox cannot reliably handle RDDs of size 1
> --
>
> Key: SPARK-14163
> URL: https://issues.apache.org/jira/browse/SPARK-14163
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.5.2, 1.6.0, 1.6.1, 2.0.0
>Reporter: Marcin Tustin
>Priority: Minor
> Fix For: 2.0.0
>
>
> The bug exists in these lines: 
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/partial/SumEvaluator.scala#L59-L61
> In this code
> {code:title=SumEvaluator.scala|borderStyle=solid}
>   val degreesOfFreedom = (counter.count - 1).toInt
>   new TDistribution(degreesOfFreedom).inverseCumulativeProbability(1 
> - (1 - confidence) / 2)
> {code}
> If {{counter.count}} is 1 or 0 then {{new TDistribution(degreesOfFreedom)}} 
> will raise an exception because {{TDistribution}} expects its 
> {{degreesOfFreedom}} parameter to be 1 or greater.
> An example (written in pyspark):
> {noformat}
> >>> rdd = sc.parallelize([1])
> >>> rdd.countApprox(1000,0.5)
> 16/03/25 18:09:36 INFO SparkContext: Starting job: sumApprox at 
> NativeMethodAccessorImpl.java:-2
> 16/03/25 18:09:36 INFO DAGScheduler: Got job 1 (sumApprox at 
> NativeMethodAccessorImpl.java:-2) with 2 output partitions
> 16/03/25 18:09:36 INFO DAGScheduler: Final stage: ResultStage 1(sumApprox at 
> NativeMethodAccessorImpl.java:-2)
> 16/03/25 18:09:36 INFO DAGScheduler: Parents of final stage: List()
> 16/03/25 18:09:36 INFO DAGScheduler: Missing parents: List()
> 16/03/25 18:09:36 INFO DAGScheduler: Submitting ResultStage 1 
> (MapPartitionsRDD[6] at mapPartitions at SerDeUtil.scala:147), which has no 
> missing parents
> 16/03/25 18:09:36 INFO MemoryStore: ensureFreeSpace(4328) called with 
> curMem=7140, maxMem=555755765
> 16/03/25 18:09:36 INFO MemoryStore: Block broadcast_1 stored as values in 
> memory (estimated size 4.2 KB, free 530.0 MB)
> 16/03/25 18:09:36 INFO MemoryStore: ensureFreeSpace(2821) called with 
> curMem=11468, maxMem=555755765
> 16/03/25 18:09:36 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes 
> in memory (estimated size 2.8 KB, free 530.0 MB)
> 16/03/25 18:09:36 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
> on 10.5.5.158:56348 (size: 2.8 KB, free: 530.0 MB)
> 16/03/25 18:09:36 INFO SparkContext: Created broadcast 1 from broadcast at 
> DAGScheduler.scala:861
> 16/03/25 18:09:36 INFO DAGScheduler: Submitting 2 missing tasks from 
> ResultStage 1 (MapPartitionsRDD[6] at mapPartitions at SerDeUtil.scala:147)
> 16/03/25 18:09:36 INFO YarnScheduler: Adding task set 1.0 with 2 tasks
> 16/03/25 18:09:36 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, 
> r-hadoopeco-data-66215afe.hbinternal.com, PROCESS_LOCAL, 2071 bytes)
> 16/03/25 18:09:36 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, 
> r-hadoopeco-data-84205b1c.hbinternal.com, PROCESS_LOCAL, 2090 bytes)
> 16/03/25 18:09:36 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
> on r-hadoopeco-data-66215afe.hbinternal.com:43011 (size: 2.8 KB, free: 530.0 
> MB)
> 16/03/25 18:09:36 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) 
> in 66 ms on r-hadoopeco-data-66215afe.hbinternal.com (1/2)
> 16/03/25 18:09:36 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
> on r-hadoopeco-data-84205b1c.hbinternal.com:41613 (size: 2.8 KB, free: 530.0 
> MB)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/hdp/2.3.4.0-3485/spark/python/pyspark/rdd.py", line 2227, in 
> countApprox
> return int(drdd.sumApprox(timeout, confidence))
>   File "/usr/hdp/2.3.4.0-3485/spark/python/pyspark/rdd.py", line 2243, in 
> sumApprox
> r = jdrdd.sumApprox(timeout, confidence).getFinalValue()
>   File 
> "/usr/hdp/2.3.4.0-3485/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
>  line 538, in __call__
>   File "/usr/hdp/2.3.4.0-3485/spark/python/pyspark/sql/utils.py", line 36, in 
> deco
> return f(*a, **kw)
>   File 
> "/usr/hdp/2.3.4.0-3485/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
>  line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling o69.sumApprox.
> : org.apache.commons.math3.exception.NotStrictlyPositiveException: degrees of 
> freedom (0)
>   at 
> org.apache.commons.math3.distribution.TDistribution.<init>(TDistribution.java:120)
>   at 
> org.apache.commons.math3.distribution.TDistribution.<init>(TDistribution.java:86)
>   at 
> 

[jira] [Assigned] (SPARK-14355) Fix typos in Exception/Testcase/Comments and static analysis results

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14355:


Assignee: Apache Spark

> Fix typos in Exception/Testcase/Comments and static analysis results
> 
>
> Key: SPARK-14355
> URL: https://issues.apache.org/jira/browse/SPARK-14355
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Minor
>
> This issue contains the following 5 types of maintenance fix over 59 files 
> (+94 lines, -93 lines).
> * Fix typos(exception/log strings, testcase name, comments) in 44 lines.
> * Fix lint-java errors (MaxLineLength) in 6 lines. (New codes after 
> SPARK-14011)
> * Use diamond operators in 40 lines. (New codes after SPARK-13702)
> * Fix redundant semicolon in 5 lines.
> * Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in 
> CSVInferSchemaSuite.scala.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14355) Fix typos in Exception/Testcase/Comments and static analysis results

2016-04-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223534#comment-15223534
 ] 

Apache Spark commented on SPARK-14355:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/12139

> Fix typos in Exception/Testcase/Comments and static analysis results
> 
>
> Key: SPARK-14355
> URL: https://issues.apache.org/jira/browse/SPARK-14355
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> This issue contains the following 5 types of maintenance fix over 59 files 
> (+94 lines, -93 lines).
> * Fix typos(exception strings, testcase name, comments) in 44 lines.
> * Fix lint-java errors (MaxLineLength) in 6 lines. (New codes after 
> SPARK-14011)
> * Use diamond operators in 40 lines. (New codes after SPARK-13702)
> * Fix redundant semicolon in 5 lines.
> * Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in 
> CSVInferSchemaSuite.scala.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14355) Fix typos in Exception/Testcase/Comments and static analysis results

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14355:


Assignee: (was: Apache Spark)

> Fix typos in Exception/Testcase/Comments and static analysis results
> 
>
> Key: SPARK-14355
> URL: https://issues.apache.org/jira/browse/SPARK-14355
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> This issue contains the following 5 types of maintenance fix over 59 files 
> (+94 lines, -93 lines).
> * Fix typos(exception/log strings, testcase name, comments) in 44 lines.
> * Fix lint-java errors (MaxLineLength) in 6 lines. (New codes after 
> SPARK-14011)
> * Use diamond operators in 40 lines. (New codes after SPARK-13702)
> * Fix redundant semicolon in 5 lines.
> * Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in 
> CSVInferSchemaSuite.scala.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14355) Fix typos in Exception/Testcase/Comments and static analysis results

2016-04-03 Thread Dongjoon Hyun (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-14355:
--
Description: 
This issue contains the following 5 types of maintenance fix over 59 files (+94 
lines, -93 lines).
* Fix typos(exception/log strings, testcase name, comments) in 44 lines.
* Fix lint-java errors (MaxLineLength) in 6 lines. (New codes after SPARK-14011)
* Use diamond operators in 40 lines. (New codes after SPARK-13702)
* Fix redundant semicolon in 5 lines.
* Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in 
CSVInferSchemaSuite.scala.

  was:
This issue contains the following 5 types of maintenance fix over 59 files (+94 
lines, -93 lines).
* Fix typos(exception strings, testcase name, comments) in 44 lines.
* Fix lint-java errors (MaxLineLength) in 6 lines. (New codes after SPARK-14011)
* Use diamond operators in 40 lines. (New codes after SPARK-13702)
* Fix redundant semicolon in 5 lines.
* Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in 
CSVInferSchemaSuite.scala.


> Fix typos in Exception/Testcase/Comments and static analysis results
> 
>
> Key: SPARK-14355
> URL: https://issues.apache.org/jira/browse/SPARK-14355
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Reporter: Dongjoon Hyun
>Priority: Minor
>
> This issue contains the following 5 types of maintenance fix over 59 files 
> (+94 lines, -93 lines).
> * Fix typos(exception/log strings, testcase name, comments) in 44 lines.
> * Fix lint-java errors (MaxLineLength) in 6 lines. (New codes after 
> SPARK-14011)
> * Use diamond operators in 40 lines. (New codes after SPARK-13702)
> * Fix redundant semicolon in 5 lines.
> * Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in 
> CSVInferSchemaSuite.scala.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14355) Fix typos in Exception/Testcase/Comments and static analysis results

2016-04-03 Thread Dongjoon Hyun (JIRA)
Dongjoon Hyun created SPARK-14355:
-

 Summary: Fix typos in Exception/Testcase/Comments and static 
analysis results
 Key: SPARK-14355
 URL: https://issues.apache.org/jira/browse/SPARK-14355
 Project: Spark
  Issue Type: Bug
  Components: Build
Reporter: Dongjoon Hyun
Priority: Minor


This issue contains the following 5 types of maintenance fix over 59 files (+94 
lines, -93 lines).
* Fix typos(exception strings, testcase name, comments) in 44 lines.
* Fix lint-java errors (MaxLineLength) in 6 lines. (New codes after SPARK-14011)
* Use diamond operators in 40 lines. (New codes after SPARK-13702)
* Fix redundant semicolon in 5 lines.
* Rename class `InferSchemaSuite` to `CSVInferSchemaSuite` in 
CSVInferSchemaSuite.scala.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-12956) add spark.yarn.hdfs.home.directory property

2016-04-03 Thread PJ Fanning (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PJ Fanning closed SPARK-12956.
--
Resolution: Duplicate

> add spark.yarn.hdfs.home.directory property
> ---
>
> Key: SPARK-12956
> URL: https://issues.apache.org/jira/browse/SPARK-12956
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: PJ Fanning
>
> https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
>  uses the default home directory based on the Hadoop configuration. I have a 
> use case where it would be useful to override this and provide an explicit 
> base path.
> If this seems like a generally useful config property, I can put together a pull 
> request.
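For illustration only (the property name below is the one proposed here and does not exist in 
Spark), a sketch of the kind of override being asked for:

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.SparkConf

object StagingDir {
  // Illustrative: fall back to the Hadoop home directory unless the (hypothetical)
  // spark.yarn.hdfs.home.directory property is set.
  def stagingBaseDir(sparkConf: SparkConf, hadoopConf: Configuration): Path =
    sparkConf.getOption("spark.yarn.hdfs.home.directory")
      .map(new Path(_))
      .getOrElse(FileSystem.get(hadoopConf).getHomeDirectory)
}
{code}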



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12956) add spark.yarn.hdfs.home.directory property

2016-04-03 Thread PJ Fanning (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223463#comment-15223463
 ] 

PJ Fanning commented on SPARK-12956:


[~tgraves] I think you can close this as a duplicate of SPARK-13063

> add spark.yarn.hdfs.home.directory property
> ---
>
> Key: SPARK-12956
> URL: https://issues.apache.org/jira/browse/SPARK-12956
> Project: Spark
>  Issue Type: Improvement
>  Components: YARN
>Affects Versions: 1.6.0
>Reporter: PJ Fanning
>
> https://github.com/apache/spark/blob/master/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
>  uses the default home directory based on the Hadoop configuration. I have a 
> use case where it would be useful to override this and provide an explicit 
> base path.
> If this seems like a generally useful config property, I can put together a pull 
> request.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14341) Throw exception on unsupported Create/Drop Macro DDL commands

2016-04-03 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-14341.
---
  Resolution: Resolved
Assignee: Bo Meng
Target Version/s: 2.0.0

> Throw exception on unsupported Create/Drop Macro DDL commands
> -
>
> Key: SPARK-14341
> URL: https://issues.apache.org/jira/browse/SPARK-14341
> Project: Spark
>  Issue Type: Improvement
>Reporter: Bo Meng
>Assignee: Bo Meng
>Priority: Minor
>
> According to
> [SPARK-14123|https://issues.apache.org/jira/browse/SPARK-14123], we need to 
> throw an exception for Create/Drop Macro DDL commands.
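
A rough sketch of the intended behaviour (helper name, regex, and exception 
type are placeholders, not the actual parser change):

{code}
// Reject CREATE/DROP TEMPORARY MACRO statements up front with a clear error
// instead of letting them fall through.
def rejectMacroDdl(sqlText: String): Unit = {
  val macroDdl = "(?i)^\\s*(CREATE|DROP)\\s+TEMPORARY\\s+MACRO\\b".r
  if (macroDdl.findFirstIn(sqlText).isDefined) {
    throw new UnsupportedOperationException(
      s"Operation not allowed: CREATE/DROP TEMPORARY MACRO: $sqlText")
  }
}
{code}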



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-13998) HashingTF should extend UnaryTransformer

2016-04-03 Thread Jacek Laskowski (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223326#comment-15223326
 ] 

Jacek Laskowski commented on SPARK-13998:
-

I don't personally, but I don't really like it whenever I see all these 
non-{{UnaryTransformer}} transformers like {{HashingTF}} or 
{{StopWordsRemover}} in my way. I'd like to give it a shot and get rid of the 
"anomaly". What is the JIRA for refactoring {{UnaryTransformer}} to support 
setting {{Attribute}}? Or perhaps [~yanboliang] wants to work on it?

Please guide me, [~josephkb] / [~mlnick].

> HashingTF should extend UnaryTransformer
> 
>
> Key: SPARK-13998
> URL: https://issues.apache.org/jira/browse/SPARK-13998
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 2.0.0
>Reporter: Jacek Laskowski
>Priority: Minor
>
> Currently 
> [HashingTF|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/feature/HashingTF.scala#L37]
>  extends {{Transformer with HasInputCol with HasOutputCol}}, but there is a 
> helper 
> [UnaryTransformer|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/Transformer.scala#L79-L80]
>  abstract class for exactly this reason.
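
For reference, a minimal sketch of what a {{UnaryTransformer}}-based 
transformer can look like (the class name and UID prefix are made up; the shape 
roughly mirrors the existing {{Tokenizer}}):

{code}
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

// Only the element-wise function and the output type need to be supplied;
// input/output column params and transformSchema come from UnaryTransformer.
class SimpleTokenizer(override val uid: String)
  extends UnaryTransformer[String, Seq[String], SimpleTokenizer] {

  def this() = this(Identifiable.randomUID("simpleTok"))

  override protected def createTransformFunc: String => Seq[String] =
    _.toLowerCase.split("\\s+").toSeq

  override protected def validateInputType(inputType: DataType): Unit =
    require(inputType == StringType, s"Input type must be StringType but got $inputType")

  override protected def outputDataType: DataType =
    ArrayType(StringType, containsNull = false)

  override def copy(extra: ParamMap): SimpleTokenizer = defaultCopy(extra)
}
{code}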



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14354) Let Expand take name expressions and infer output attributes

2016-04-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223290#comment-15223290
 ] 

Apache Spark commented on SPARK-14354:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/12138

> Let Expand take name expressions and infer output attributes
> 
>
> Key: SPARK-14354
> URL: https://issues.apache.org/jira/browse/SPARK-14354
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>
> Currently we create the Expand operator by specifying its projections 
> (Seq[Seq[Expression]]) and its output. This allows Expand to reuse the child 
> operator's attributes, which makes its constraints invalid. We should let it 
> take named expressions and infer the output itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14354) Let Expand take name expressions and infer output attributes

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14354:


Assignee: Apache Spark

> Let Expand take name expressions and infer output attributes
> 
>
> Key: SPARK-14354
> URL: https://issues.apache.org/jira/browse/SPARK-14354
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>
> Currently we create the Expand operator by specifying its projections 
> (Seq[Seq[Expression]]) and its output. This allows Expand to reuse the child 
> operator's attributes, which makes its constraints invalid. We should let it 
> take named expressions and infer the output itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14354) Let Expand take name expressions and infer output attributes

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14354:


Assignee: (was: Apache Spark)

> Let Expand take name expressions and infer output attributes
> 
>
> Key: SPARK-14354
> URL: https://issues.apache.org/jira/browse/SPARK-14354
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Liang-Chi Hsieh
>
> Currently we create the Expand operator by specifying its projections 
> (Seq[Seq[Expression]]) and its output. This allows Expand to reuse the child 
> operator's attributes, which makes its constraints invalid. We should let it 
> take named expressions and infer the output itself.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14354) Let Expand take name expressions and infer output attributes

2016-04-03 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-14354:
---

 Summary: Let Expand take name expressions and infer output 
attributes
 Key: SPARK-14354
 URL: https://issues.apache.org/jira/browse/SPARK-14354
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Liang-Chi Hsieh


Currently we create the Expand operator by specifying its projections 
(Seq[Seq[Expression]]) and its output. This allows Expand to reuse the child 
operator's attributes, which makes its constraints invalid. We should let it take 
named expressions and infer the output itself.
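
A conceptual sketch of the proposed direction (class name and signature are 
assumed for illustration, not the actual Catalyst code):

{code}
import org.apache.spark.sql.catalyst.expressions.{Attribute, NamedExpression}
import org.apache.spark.sql.catalyst.plans.logical.{LogicalPlan, UnaryNode}

// If each projection is a Seq[NamedExpression], the operator can derive fresh
// output attributes from the names instead of reusing the child's attributes.
case class ExpandSketch(
    projections: Seq[Seq[NamedExpression]],
    child: LogicalPlan) extends UnaryNode {

  override def output: Seq[Attribute] = projections.head.map(_.toAttribute)
}
{code}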



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14350) explain output should be in a single cell rather than one line per cell

2016-04-03 Thread Herman van Hovell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hovell resolved SPARK-14350.
---
Resolution: Resolved
  Assignee: Dongjoon Hyun

> explain output should be in a single cell rather than one line per cell
> ---
>
> Key: SPARK-14350
> URL: https://issues.apache.org/jira/browse/SPARK-14350
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Dongjoon Hyun
>
> See 
> {code}
> scala> sql("explain select 1").head
> res3: org.apache.spark.sql.Row = [== Physical Plan ==]
> {code}
> We should show the entire output, rather than just the first line, when head is 
> used. That is to say, the output should contain only one row, rather than one 
> row per line.
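
Put differently, the whole plan should come back as a single row; until then 
the full text can still be recovered by collecting every row (illustrative 
shell snippet):

{code}
// Desired behaviour: the whole plan text in a single row, so head is enough.
println(sql("explain select 1").head.getString(0))

// Pre-fix behaviour returns one row per plan line, so collect them all instead.
sql("explain select 1").collect().foreach(row => println(row.getString(0)))
{code}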



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14350) explain output should be in a single cell rather than one line per cell

2016-04-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223205#comment-15223205
 ] 

Apache Spark commented on SPARK-14350:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/12137

> explain output should be in a single cell rather than one line per cell
> ---
>
> Key: SPARK-14350
> URL: https://issues.apache.org/jira/browse/SPARK-14350
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>
> See 
> {code}
> scala> sql("explain select 1").head
> res3: org.apache.spark.sql.Row = [== Physical Plan ==]
> {code}
> We should show the entire output, rather than just the first line, when head is 
> used. That is to say, the output should contain only one row, rather than one 
> row per line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14350) explain output should be in a single cell rather than one line per cell

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14350:


Assignee: (was: Apache Spark)

> explain output should be in a single cell rather than one line per cell
> ---
>
> Key: SPARK-14350
> URL: https://issues.apache.org/jira/browse/SPARK-14350
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>
> See 
> {code}
> scala> sql("explain select 1").head
> res3: org.apache.spark.sql.Row = [== Physical Plan ==]
> {code}
> We should show the entire output, rather than just the first line, when head is 
> used. That is to say, the output should contain only one row, rather than one 
> row per line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14350) explain output should be in a single cell rather than one line per cell

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14350:


Assignee: Apache Spark

> explain output should be in a single cell rather than one line per cell
> ---
>
> Key: SPARK-14350
> URL: https://issues.apache.org/jira/browse/SPARK-14350
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>Assignee: Apache Spark
>
> See 
> {code}
> scala> sql("explain select 1").head
> res3: org.apache.spark.sql.Row = [== Physical Plan ==]
> {code}
> We should show the entire output, rather than just the first line, when head is 
> used. That is to say, the output should contain only one row, rather than one 
> row per line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14353) Dataset Time Windowing API for Python, R, and SQL

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14353:


Assignee: (was: Apache Spark)

> Dataset Time Windowing API for Python, R, and SQL
> -
>
> Key: SPARK-14353
> URL: https://issues.apache.org/jira/browse/SPARK-14353
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SparkR, SQL
>Reporter: Burak Yavuz
>
> The time windowing function `window` was added to Datasets. This JIRA is to 
> track the status of the R, Python, and SQL APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14353) Dataset Time Windowing API for Python, R, and SQL

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14353:


Assignee: Apache Spark

> Dataset Time Windowing API for Python, R, and SQL
> -
>
> Key: SPARK-14353
> URL: https://issues.apache.org/jira/browse/SPARK-14353
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SparkR, SQL
>Reporter: Burak Yavuz
>Assignee: Apache Spark
>
> The time windowing function `window` was added to Datasets. This JIRA is to 
> track the status of the R, Python, and SQL APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14353) Dataset Time Windowing API for Python, R, and SQL

2016-04-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223168#comment-15223168
 ] 

Apache Spark commented on SPARK-14353:
--

User 'brkyvz' has created a pull request for this issue:
https://github.com/apache/spark/pull/12136

> Dataset Time Windowing API for Python, R, and SQL
> -
>
> Key: SPARK-14353
> URL: https://issues.apache.org/jira/browse/SPARK-14353
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SparkR, SQL
>Reporter: Burak Yavuz
>
> The time windowing function `window` was added to Datasets. This JIRA is to 
> track the status of the R, Python, and SQL APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14353) Dataset Time Windowing API for Python, R, and SQL

2016-04-03 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-14353:
---

 Summary: Dataset Time Windowing API for Python, R, and SQL
 Key: SPARK-14353
 URL: https://issues.apache.org/jira/browse/SPARK-14353
 Project: Spark
  Issue Type: New Feature
  Components: PySpark, SparkR, SQL
Reporter: Burak Yavuz


The time windowing function `window` was added to Datasets. This JIRA is to 
track the status of the R, Python, and SQL APIs.
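
For reference, a minimal Scala usage sketch of the Dataset API that the R, 
Python and SQL APIs would mirror (the {{events}} DataFrame and its {{time}} 
column are made up for illustration):

{code}
import org.apache.spark.sql.functions.{col, window}

// Count events per 10-minute window, sliding every 5 minutes, on a DataFrame
// `events` that has a timestamp column named "time".
val counts = events
  .groupBy(window(col("time"), "10 minutes", "5 minutes"))
  .count()
{code}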



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14352) approxQuantile should support multi columns

2016-04-03 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223164#comment-15223164
 ] 

Apache Spark commented on SPARK-14352:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/12135

> approxQuantile should support multi columns
> ---
>
> Key: SPARK-14352
> URL: https://issues.apache.org/jira/browse/SPARK-14352
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: zhengruifeng
>
> It would be convenient and efficient to calculate quantiles of multiple 
> columns with approxQuantile.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14352) approxQuantile should support multi columns

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14352:


Assignee: (was: Apache Spark)

> approxQuantile should support multi columns
> ---
>
> Key: SPARK-14352
> URL: https://issues.apache.org/jira/browse/SPARK-14352
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: zhengruifeng
>
> It would be convenient and efficient to calculate quantiles of multiple 
> columns with approxQuantile.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-14352) approxQuantile should support multi columns

2016-04-03 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-14352:


Assignee: Apache Spark

> approxQuantile should support multi columns
> ---
>
> Key: SPARK-14352
> URL: https://issues.apache.org/jira/browse/SPARK-14352
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: zhengruifeng
>Assignee: Apache Spark
>
> It would be convenient and efficient to calculate quantiles of multiple 
> columns with approxQuantile.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-14352) approxQuantile should support multi columns

2016-04-03 Thread zhengruifeng (JIRA)
zhengruifeng created SPARK-14352:


 Summary: approxQuantile should support multi columns
 Key: SPARK-14352
 URL: https://issues.apache.org/jira/browse/SPARK-14352
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: zhengruifeng


It would be convenient and efficient to calculate quantiles of multiple columns 
with approxQuantile.
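
For context, a sketch of the current single-column API next to the kind of 
multi-column call this proposes (the multi-column overload is hypothetical 
here; {{df}} is an arbitrary DataFrame):

{code}
// Existing API: one column at a time.
val q1 = df.stat.approxQuantile("colA", Array(0.25, 0.5, 0.75), 0.01)
val q2 = df.stat.approxQuantile("colB", Array(0.25, 0.5, 0.75), 0.01)

// Proposed shape (hypothetical overload): all columns in a single pass,
// returning one Array[Double] of quantiles per column.
val qs: Array[Array[Double]] =
  df.stat.approxQuantile(Array("colA", "colB"), Array(0.25, 0.5, 0.75), 0.01)
{code}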



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14350) explain output should be in a single cell rather than one line per cell

2016-04-03 Thread Dongjoon Hyun (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15223159#comment-15223159
 ] 

Dongjoon Hyun commented on SPARK-14350:
---

Sure! I'll fix this.

> explain output should be in a single cell rather than one line per cell
> ---
>
> Key: SPARK-14350
> URL: https://issues.apache.org/jira/browse/SPARK-14350
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Reynold Xin
>
> See 
> {code}
> scala> sql("explain select 1").head
> res3: org.apache.spark.sql.Row = [== Physical Plan ==]
> {code}
> We should show the entire output, rather than just the first line, when head is 
> used. That is to say, the output should contain only one row, rather than one 
> row per line.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-14231) JSON data source fails to infer floats as decimal when precision is bigger than 38 or scale is bigger than precision.

2016-04-03 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu updated SPARK-14231:
---
Assignee: Hyukjin Kwon

> JSON data source fails to infer floats as decimal when precision is bigger 
> than 38 or scale is bigger than precision.
> -
>
> Key: SPARK-14231
> URL: https://issues.apache.org/jira/browse/SPARK-14231
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.0.0
>
>
> Currently, the JSON data source supports the {{floatAsBigDecimal}} option, 
> which reads floats as {{DecimalType}}.
> I noticed there are several restrictions on Spark {{DecimalType}}:
> 1. The precision cannot be bigger than 38.
> 2. The scale cannot be bigger than the precision.
> However, with this option, the inferred {{BigDecimal}} values do not 
> necessarily satisfy these conditions.
> This could be observed as below:
> {code}
> def simpleFloats: RDD[String] =
>   sqlContext.sparkContext.parallelize(
> """{"a": 0.01}""" ::
> """{"a": 0.02}""" :: Nil)
> val jsonDF = sqlContext.read
>   .option("floatAsBigDecimal", "true")
>   .json(simpleFloats)
> jsonDF.printSchema()
> {code}
> throws an exception below:
> {code}
> org.apache.spark.sql.AnalysisException: Decimal scale (2) cannot be greater 
> than precision (1).;
>   at org.apache.spark.sql.types.DecimalType.<init>(DecimalType.scala:44)
>   at 
> org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:144)
>   at 
> org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:108)
>   at 
> org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:59)
>   at 
> org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:57)
>   at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2249)
>   at 
> org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:57)
>   at 
> org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:55)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> ...
> {code}
> Since the JSON data source falls back to {{StringType}} when it fails to infer 
> a type, such values might have to be inferred as {{StringType}}, or maybe 
> simply as {{DoubleType}}.
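
The constraint is easy to reproduce on {{DecimalType}} directly (a small 
illustration assuming the values from the example above):

{code}
import org.apache.spark.sql.types.DecimalType

// BigDecimal("0.01") has precision 1 and scale 2, which DecimalType rejects
// because the scale may not exceed the precision (and precision is capped at 38).
val bd = new java.math.BigDecimal("0.01")
println(s"precision=${bd.precision()}, scale=${bd.scale()}")  // precision=1, scale=2
val dt = DecimalType(bd.precision(), bd.scale())  // throws AnalysisException
{code}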



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-14231) JSON data source fails to infer floats as decimal when precision is bigger than 38 or scale is bigger than precision.

2016-04-03 Thread Davies Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Davies Liu resolved SPARK-14231.

   Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request 12030
[https://github.com/apache/spark/pull/12030]

> JSON data source fails to infer floats as decimal when precision is bigger 
> than 38 or scale is bigger than precision.
> -
>
> Key: SPARK-14231
> URL: https://issues.apache.org/jira/browse/SPARK-14231
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Hyukjin Kwon
>Priority: Minor
> Fix For: 2.0.0
>
>
> Currently, the JSON data source supports the {{floatAsBigDecimal}} option, 
> which reads floats as {{DecimalType}}.
> I noticed there are several restrictions on Spark {{DecimalType}}:
> 1. The precision cannot be bigger than 38.
> 2. The scale cannot be bigger than the precision.
> However, with this option, the inferred {{BigDecimal}} values do not 
> necessarily satisfy these conditions.
> This could be observed as below:
> {code}
> def simpleFloats: RDD[String] =
>   sqlContext.sparkContext.parallelize(
> """{"a": 0.01}""" ::
> """{"a": 0.02}""" :: Nil)
> val jsonDF = sqlContext.read
>   .option("floatAsBigDecimal", "true")
>   .json(simpleFloats)
> jsonDF.printSchema()
> {code}
> throws an exception below:
> {code}
> org.apache.spark.sql.AnalysisException: Decimal scale (2) cannot be greater 
> than precision (1).;
>   at org.apache.spark.sql.types.DecimalType.<init>(DecimalType.scala:44)
>   at 
> org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:144)
>   at 
> org.apache.spark.sql.execution.datasources.json.InferSchema$.org$apache$spark$sql$execution$datasources$json$InferSchema$$inferField(InferSchema.scala:108)
>   at 
> org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:59)
>   at 
> org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.apply(InferSchema.scala:57)
>   at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2249)
>   at 
> org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:57)
>   at 
> org.apache.spark.sql.execution.datasources.json.InferSchema$$anonfun$1$$anonfun$apply$1.apply(InferSchema.scala:55)
>   at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> ...
> {code}
> Since the JSON data source falls back to {{StringType}} when it fails to infer 
> a type, such values might have to be inferred as {{StringType}}, or maybe 
> simply as {{DoubleType}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org