[jira] [Commented] (SPARK-8510) NumPy arrays and matrices as values in sequence files

2015-08-23 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708461#comment-14708461
 ] 

Apache Spark commented on SPARK-8510:
-

User 'paberline' has created a pull request for this issue:
https://github.com/apache/spark/pull/8384

 NumPy arrays and matrices as values in sequence files
 -

 Key: SPARK-8510
 URL: https://issues.apache.org/jira/browse/SPARK-8510
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Peter Aberline
Priority: Minor

 Using the DoubleArrayWritable example, I have added support for storing NumPy 
 double arrays and matrices as arrays of doubles and nested arrays of doubles 
 as value elements of Sequence Files.
 Each value element is a discrete matrix or array. This is useful where you 
 have many matrices that you don't want to join into a single Spark Data Frame 
 to store in a Parquet file.
 Pandas DataFrames can be easily converted to and from NumPy matrices, so I've 
 also added the ability to store the schema-less data from DataFrames and 
 Series that contain double data. 
 There seems to be demand for this functionality:
 http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E
 I'll be issuing a PR for this shortly.
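
For readers unfamiliar with the DoubleArrayWritable pattern mentioned above, here is a minimal sketch of its assumed shape (this is only an illustration of the usual Hadoop ArrayWritable idiom, not the code from the PR):

{code}
import org.apache.hadoop.io.{ArrayWritable, DoubleWritable, Writable}

// Hypothetical sketch: an ArrayWritable fixed to DoubleWritable, so a whole
// double array can be stored as a single value in a SequenceFile. A matrix
// would nest one of these per row.
class DoubleArrayWritable extends ArrayWritable(classOf[DoubleWritable]) {
  def this(values: Array[Double]) = {
    this()
    set(values.map(v => new DoubleWritable(v): Writable))
  }
}
{code}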






[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)

2015-08-23 Thread Nick Xie (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708410#comment-14708410
 ] 

Nick Xie commented on SPARK-3655:
-

I wanted to add a session id to each detail record, but the only way I can do 
that with mapStreamByKey is to create a LinkedList of detail records and return 
the list's iterator, which takes up extra memory as opposed to just modifying 
the record. I ended up creating a LinkedList of only the session records. It 
seems to work on my test machine; I will test it on the cluster next week.

 Support sorting of values in addition to keys (i.e. secondary sort)
 ---

 Key: SPARK-3655
 URL: https://issues.apache.org/jira/browse/SPARK-3655
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 1.1.0, 1.2.0
Reporter: koert kuipers
Assignee: Koert Kuipers

 Now that spark has a sort based shuffle, can we expect a secondary sort soon? 
 There are some use cases where getting a sorted iterator of values per key is 
 helpful.
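
As an illustration of what a sorted iterator of values per key can look like with the pieces already in core Spark (repartitionAndSortWithinPartitions over a composite key), here is a minimal sketch; it assumes a live SparkContext `sc` as in spark-shell, and is not the API proposed in this ticket:

{code}
// (needed on Spark < 1.3 outside the shell)
import org.apache.spark.SparkContext._
import org.apache.spark.{HashPartitioner, Partitioner}

// ((sessionId, timestamp), record): we want records grouped by sessionId and
// sorted by timestamp within each group.
val pairs = sc.parallelize(Seq(
  (("s1", 3L), "c"), (("s1", 1L), "a"), (("s2", 2L), "x"), (("s1", 2L), "b")))

// Partition on the sessionId only, so every record of a session lands in the
// same partition; the shuffle then sorts by the full (sessionId, timestamp) key.
class SessionPartitioner(override val numPartitions: Int) extends Partitioner {
  private val hash = new HashPartitioner(numPartitions)
  override def getPartition(key: Any): Int =
    hash.getPartition(key.asInstanceOf[(String, Long)]._1)
}

val sorted = pairs.repartitionAndSortWithinPartitions(new SessionPartitioner(2))

// Each partition's iterator now yields a session's records in timestamp order.
sorted.glom().collect().foreach(p => println(p.mkString(", ")))
{code}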






[jira] [Commented] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java

2015-08-23 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708444#comment-14708444
 ] 

Sean Owen commented on SPARK-10173:
---

It sounds like you're expecting some functionality of the Scala 2.11 REPL in the 
Spark REPL, which is not necessarily guaranteed. What do you mean that it 
occurs in the REPL but not the shell? The REPL isn't really a library in any 
event. A pull request might clarify what you want to change.

 valueOfTerm get none when SparkIMain used on Java 
 --

 Key: SPARK-10173
 URL: https://issues.apache.org/jira/browse/SPARK-10173
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 1.4.1
 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8
Reporter: SonixLegend
 Fix For: 1.4.2


 When I use the Spark REPL from my Java project, I found that if I used the 
 SparkIMain class built against Scala 2.10, the valueOfTerm function could get 
 the term value, but with the same class built against Scala 2.11, valueOfTerm 
 gave me a None result. I checked the source code and found that the line 
 sym.owner.companionSymbol returns none. Can you fix it? I used an alternative 
 method, changing valueOfTerm to eval, which then returns the correct result.






[jira] [Updated] (SPARK-8510) NumPy arrays and matrices as values in sequence files

2015-08-23 Thread Peter Aberline (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Aberline updated SPARK-8510:
--
Description: 
Using the DoubleArrayWritable as an example, I have added support for storing 
NumPy arrays and matrices as elements of Sequence Files.

Each value element is a discrete matrix or array. This is useful where you have 
many matrices that you don't want to join into a single Spark Data Frame to 
store in a Parquet file.

There seems to be demand for this functionality:

http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E

I originally put this work in PR 6995, but closed it after suggestions from a 
user on to use NumPy's built in serialization. My second version is in PR 8384.

  was:
Using the DoubleArrayWritable example, I have added support for storing NumPy 
double arrays and matrices as arrays of doubles and nested arrays of doubles as 
value elements of Sequence Files.

Each value element is a discrete matrix or array. This is useful where you have 
many matrices that you don't want to join into a single Spark Data Frame to 
store in a Parquet file.

Pandas DataFrames can be easily converted to and from NumPy matrices, so I've 
also added the ability to store the schema-less data from DataFrames and Series 
that contain double data. 

There seems to be demand for this functionality:

http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E

I'll be issuing a PR for this shortly.


 NumPy arrays and matrices as values in sequence files
 -

 Key: SPARK-8510
 URL: https://issues.apache.org/jira/browse/SPARK-8510
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Peter Aberline
Priority: Minor

 Using the DoubleArrayWritable as an example, I have added support for storing 
 NumPy arrays and matrices as elements of Sequence Files.
 Each value element is a discrete matrix or array. This is useful where you 
 have many matrices that you don't want to join into a single Spark Data Frame 
 to store in a Parquet file.
 There seems to be demand for this functionality:
 http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E
 I originally put this work in PR 6995, but closed it after suggestions from a 
 user on to use NumPy's built in serialization. My second version is in PR 
 8384.






[jira] [Issue Comment Deleted] (SPARK-8510) NumPy arrays and matrices as values in sequence files

2015-08-23 Thread Peter Aberline (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Aberline updated SPARK-8510:
--
Comment: was deleted

(was: See PR at https://github.com/apache/spark/pull/6995)

 NumPy arrays and matrices as values in sequence files
 -

 Key: SPARK-8510
 URL: https://issues.apache.org/jira/browse/SPARK-8510
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Peter Aberline
Priority: Minor

 Using the DoubleArrayWritable as an example, I have added support for storing 
 NumPy arrays and matrices as elements of Sequence Files.
 Each value element is a discrete matrix or array. This is useful where you 
 have many matrices that you don't want to join into a single Spark Data Frame 
 to store in a Parquet file.
 There seems to be demand for this functionality:
 http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E
 I originally put this work in PR 6995, but closed it after suggestions from a 
 user to use NumPy's built in serialization. My second version is in PR 8384.






[jira] [Updated] (SPARK-8510) NumPy arrays and matrices as values in sequence files

2015-08-23 Thread Peter Aberline (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Aberline updated SPARK-8510:
--
Description: 
Using the DoubleArrayWritable as an example, I have added support for storing 
NumPy arrays and matrices as elements of Sequence Files.

Each value element is a discrete matrix or array. This is useful where you have 
many matrices that you don't want to join into a single Spark Data Frame to 
store in a Parquet file.

There seems to be demand for this functionality:

http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E

I originally put this work in PR 6995, but closed it after suggestions from a 
user to use NumPy's built in serialization. My second version is in PR 8384.

  was:
Using the DoubleArrayWritable as an example, I have added support for storing 
NumPy arrays and matrices as elements of Sequence Files.

Each value element is a discrete matrix or array. This is useful where you have 
many matrices that you don't want to join into a single Spark Data Frame to 
store in a Parquet file.

There seems to be demand for this functionality:

http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E

I originally put this work in PR 6995, but closed it after suggestions from a 
user on to use NumPy's built in serialization. My second version is in PR 8384.


 NumPy arrays and matrices as values in sequence files
 -

 Key: SPARK-8510
 URL: https://issues.apache.org/jira/browse/SPARK-8510
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Peter Aberline
Priority: Minor

 Using the DoubleArrayWritable as an example, I have added support for storing 
 NumPy arrays and matrices as elements of Sequence Files.
 Each value element is a discrete matrix or array. This is useful where you 
 have many matrices that you don't want to join into a single Spark Data Frame 
 to store in a Parquet file.
 There seems to be demand for this functionality:
 http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E
 I originally put this work in PR 6995, but closed it after suggestions from a 
 user to use NumPy's built in serialization. My second version is in PR 8384.






[jira] [Issue Comment Deleted] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java

2015-08-23 Thread SonixLegend (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

SonixLegend updated SPARK-10173:

Comment: was deleted

(was: Apache Zeppelin is written and built against Scala 2.10, and it uses the 
Spark REPL interpreter. It works on a Spark built against Scala 2.10, but I 
built Spark against Scala 2.11 and then updated Zeppelin to Scala 2.11, which 
is when I got the problem of valueOfTerm returning none. You mean it is not 
necessarily supported; all right, I think I can wait for Zeppelin to be 
upgraded to Scala 2.11. Thanks for your help.)

 valueOfTerm get none when SparkIMain used on Java 
 --

 Key: SPARK-10173
 URL: https://issues.apache.org/jira/browse/SPARK-10173
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 1.4.1
 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8
Reporter: SonixLegend
 Fix For: 1.4.2


 When I use the Spark REPL from my Java project, I found that if I used the 
 SparkIMain class built against Scala 2.10, the valueOfTerm function could get 
 the term value, but with the same class built against Scala 2.11, valueOfTerm 
 gave me a None result. I checked the source code and found that the line 
 sym.owner.companionSymbol returns none. Can you fix it? I used an alternative 
 method, changing valueOfTerm to eval, which then returns the correct result.






[jira] [Commented] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java

2015-08-23 Thread SonixLegend (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708450#comment-14708450
 ] 

SonixLegend commented on SPARK-10173:
-

Apache Zeppelin is written and built against Scala 2.10, and it uses the Spark 
REPL interpreter. It works on a Spark built against Scala 2.10, but I built 
Spark against Scala 2.11 and then updated Zeppelin to Scala 2.11, which is when 
I got the problem of valueOfTerm returning none. You mean it is not necessarily 
supported; all right, I think I can wait for Zeppelin to be upgraded to Scala 
2.11. Thanks for your help.

 valueOfTerm get none when SparkIMain used on Java 
 --

 Key: SPARK-10173
 URL: https://issues.apache.org/jira/browse/SPARK-10173
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 1.4.1
 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8
Reporter: SonixLegend
 Fix For: 1.4.2


 When I use the Spark REPL from my Java project, I found that if I used the 
 SparkIMain class built against Scala 2.10, the valueOfTerm function could get 
 the term value, but with the same class built against Scala 2.11, valueOfTerm 
 gave me a None result. I checked the source code and found that the line 
 sym.owner.companionSymbol returns none. Can you fix it? I used an alternative 
 method, changing valueOfTerm to eval, which then returns the correct result.






[jira] [Commented] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java

2015-08-23 Thread SonixLegend (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708449#comment-14708449
 ] 

SonixLegend commented on SPARK-10173:
-

Apache Zeppelin is written and built against Scala 2.10, and it uses the Spark 
REPL interpreter. It works on a Spark built against Scala 2.10, but I built 
Spark against Scala 2.11 and then updated Zeppelin to Scala 2.11, which is when 
I got the problem of valueOfTerm returning none. You mean it is not necessarily 
supported; all right, I think I can wait for Zeppelin to be upgraded to Scala 
2.11. Thanks for your help.

 valueOfTerm get none when SparkIMain used on Java 
 --

 Key: SPARK-10173
 URL: https://issues.apache.org/jira/browse/SPARK-10173
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 1.4.1
 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8
Reporter: SonixLegend
 Fix For: 1.4.2


 When I use the Spark REPL from my Java project, I found that if I used the 
 SparkIMain class built against Scala 2.10, the valueOfTerm function could get 
 the term value, but with the same class built against Scala 2.11, valueOfTerm 
 gave me a None result. I checked the source code and found that the line 
 sym.owner.companionSymbol returns none. Can you fix it? I used an alternative 
 method, changing valueOfTerm to eval, which then returns the correct result.






[jira] [Updated] (SPARK-8510) NumPy arrays and matrices as values in sequence files

2015-08-23 Thread Peter Aberline (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Aberline updated SPARK-8510:
--
Description: 
Using the DoubleArrayWritable as an example, I have added support for storing 
NumPy arrays and matrices as elements of Sequence Files.

Each value element is a discrete matrix or array. This is useful where you have 
many matrices that you don't want to join into a single Spark DataFrame to 
store in a Parquet file.

There seems to be demand for this functionality:

http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E

I originally put this work in PR 6995, but closed it after suggestions from a 
user to use NumPy's built in serialization. My second version is in PR 8384.

  was:
Using the DoubleArrayWritable as an example, I have added support for storing 
NumPy arrays and matrices as elements of Sequence Files.

Each value element is a discrete matrix or array. This is useful where you have 
many matrices that you don't want to join into a single Spark Data Frame to 
store in a Parquet file.

There seems to be demand for this functionality:

http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E

I originally put this work in PR 6995, but closed it after suggestions from a 
user to use NumPy's built in serialization. My second version is in PR 8384.


 NumPy arrays and matrices as values in sequence files
 -

 Key: SPARK-8510
 URL: https://issues.apache.org/jira/browse/SPARK-8510
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Reporter: Peter Aberline
Priority: Minor

 Using the DoubleArrayWritable as an example, I have added support for storing 
 NumPy arrays and matrices as elements of Sequence Files.
 Each value element is a discrete matrix or array. This is useful where you 
 have many matrices that you don't want to join into a single Spark DataFrame 
 to store in a Parquet file.
 There seems to be demand for this functionality:
 http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E
 I originally put this work in PR 6995, but closed it after suggestions from a 
 user to use NumPy's built in serialization. My second version is in PR 8384.






[jira] [Commented] (SPARK-10172) History Server web UI gets messed up when sorting on any column

2015-08-23 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708309#comment-14708309
 ] 

Sean Owen commented on SPARK-10172:
---

Can you define 'messed up' with a more detailed description, a screenshot, or a 
pull request? It's not clear what you're reporting at the moment.

 History Server web UI gets messed up when sorting on any column
 ---

 Key: SPARK-10172
 URL: https://issues.apache.org/jira/browse/SPARK-10172
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.4.0, 1.4.1
Reporter: Min Shen

 If the History Server web UI displays the Attempt ID column, clicking any 
 table header to sort on a column messes up the entire page.
 This seems to be a problem with sorttable.js not being able to correctly 
 handle tables that use rowspan.






[jira] [Updated] (SPARK-10172) History Server web UI gets messed up when sorting on any column

2015-08-23 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-10172:
--
Priority: Minor  (was: Major)

 History Server web UI gets messed up when sorting on any column
 ---

 Key: SPARK-10172
 URL: https://issues.apache.org/jira/browse/SPARK-10172
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.4.0, 1.4.1
Reporter: Min Shen
Priority: Minor

 If the History Server web UI displays the Attempt ID column, clicking any 
 table header to sort on a column messes up the entire page.
 This seems to be a problem with sorttable.js not being able to correctly 
 handle tables that use rowspan.






[jira] [Assigned] (SPARK-10174) refactor out project, filter, ordering generator from SparkPlan

2015-08-23 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10174:


Assignee: (was: Apache Spark)

 refactor out project, filter, ordering generator from SparkPlan
 ---

 Key: SPARK-10174
 URL: https://issues.apache.org/jira/browse/SPARK-10174
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Wenchen Fan
Priority: Trivial








[jira] [Assigned] (SPARK-10174) refactor out project, filter, ordering generator from SparkPlan

2015-08-23 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10174:


Assignee: Apache Spark

 refactor out project, filter, ordering generator from SparkPlan
 ---

 Key: SPARK-10174
 URL: https://issues.apache.org/jira/browse/SPARK-10174
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Wenchen Fan
Assignee: Apache Spark
Priority: Trivial








[jira] [Commented] (SPARK-10174) refactor out project, filter, ordering generator from SparkPlan

2015-08-23 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708361#comment-14708361
 ] 

Apache Spark commented on SPARK-10174:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/8382

 refactor out project, filter, ordering generator from SparkPlan
 ---

 Key: SPARK-10174
 URL: https://issues.apache.org/jira/browse/SPARK-10174
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Wenchen Fan
Priority: Trivial








[jira] [Commented] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java

2015-08-23 Thread SonixLegend (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708396#comment-14708396
 ] 

SonixLegend commented on SPARK-10173:
-

I have tested the Scala 2.11 ILoop and IMain from a Java program and they work 
successfully. The error only occurs when Java calls the Spark REPL built 
against Scala 2.11, not the Spark shell or the Scala shell.

 valueOfTerm get none when SparkIMain used on Java 
 --

 Key: SPARK-10173
 URL: https://issues.apache.org/jira/browse/SPARK-10173
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 1.4.1
 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8
Reporter: SonixLegend
 Fix For: 1.4.2


 When I use the Spark REPL from my Java project, I found that if I used the 
 SparkIMain class built against Scala 2.10, the valueOfTerm function could get 
 the term value, but with the same class built against Scala 2.11, valueOfTerm 
 gave me a None result. I checked the source code and found that the line 
 sym.owner.companionSymbol returns none. Can you fix it? I used an alternative 
 method, changing valueOfTerm to eval, which then returns the correct result.






[jira] [Assigned] (SPARK-9730) Sort Merge Join for Full Outer Join

2015-08-23 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9730:
---

Assignee: (was: Apache Spark)

 Sort Merge Join for Full Outer Join
 ---

 Key: SPARK-9730
 URL: https://issues.apache.org/jira/browse/SPARK-9730
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Josh Rosen








[jira] [Commented] (SPARK-9730) Sort Merge Join for Full Outer Join

2015-08-23 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708443#comment-14708443
 ] 

Apache Spark commented on SPARK-9730:
-

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/8383

 Sort Merge Join for Full Outer Join
 ---

 Key: SPARK-9730
 URL: https://issues.apache.org/jira/browse/SPARK-9730
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Josh Rosen








[jira] [Assigned] (SPARK-9730) Sort Merge Join for Full Outer Join

2015-08-23 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-9730:
---

Assignee: Apache Spark

 Sort Merge Join for Full Outer Join
 ---

 Key: SPARK-9730
 URL: https://issues.apache.org/jira/browse/SPARK-9730
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Reporter: Josh Rosen
Assignee: Apache Spark








[jira] [Closed] (SPARK-10156) sql in sql NullPointerException

2015-08-23 Thread qihuang.zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

qihuang.zheng closed SPARK-10156.
-
Resolution: Fixed

Sorry, my fault. Invoking sql().map will not fire an action, so row.get will 
not get a value; that's why the NPE happens. Changing the map call to 
collect().foreach works.
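
For illustration, a minimal sketch of the collect-then-iterate pattern described above, reusing the helpers and the table/column names from the report quoted below (assumes the same shell setup as in the report):

{code}
// The outer query is brought to the driver first; the per-row lookup query is
// then issued from the driver, instead of from inside a map() on executors.
val outerRows = sql("select * from test").collect()

val result = outerRows.map { row =>
  val k = row.getInt(0)
  val v = row.getString(1)
  val set = sql(s"select collect_set(c2) from test2 where c1='$v'")
    .first().getSeq[String](0).toSet
  (k, v, set)
}
result.foreach(println)
{code}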

 sql in sql NullPointerException
 ---

 Key: SPARK-10156
 URL: https://issues.apache.org/jira/browse/SPARK-10156
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.1
Reporter: qihuang.zheng

 I have two tables, an outer and an inner one. After I query the outer table, I 
 use the row value to query the inner table, but I get a NullPointerException. 
 Here is the example:  
 val rdd1 = sc.parallelize(List((1,"a"),(1,"b"),(1,"c")));
 rdd1.toDF("c1","c2").registerTempTable("test")
 cacheTable("test")
 val rdd2 = sc.parallelize(List(("a","A"),("a","AA"),("b","B"),("c","C")));
 rdd2.toDF("c1","c2").registerTempTable("test2")
 cacheTable("test2")
 val rdd = sql("select * from test").map(row => {
   val k = row.getInt(0)
   val v = row.getString(1)
   val rddSet = sql(s"select collect_set(c2) from test2 where c1='$v'")
   val set = rddSet.first().getSeq[String](0).toSet
   (k, v, set)
 })
 // NullPointerException
 rdd.count()
 And the NPE:  
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in 
 stage 1.0 failed 4 times, most recent failure: Lost task 7.3 in stage 1.0 
 (TID 23, 192.168.6.53): java.lang.NullPointerException






[jira] [Updated] (SPARK-9401) Fully implement code generation for ConcatWs

2015-08-23 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-9401:
-
Assignee: Yijie Shen

 Fully implement code generation for ConcatWs
 

 Key: SPARK-9401
 URL: https://issues.apache.org/jira/browse/SPARK-9401
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Yijie Shen
 Fix For: 1.6.0


 In ConcatWs, we fall back to interpreted mode if the input is a mix of string 
 and array of strings. We should have code gen for those as well.
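
For reference, a small sketch of the mixed-argument case being described (a string column plus an array-of-strings column); this assumes a SQLContext with its implicits imported, and the expected output is based on the usual concat_ws semantics:

{code}
import org.apache.spark.sql.functions.{col, concat_ws}
// assumes: val sqlContext = new org.apache.spark.sql.SQLContext(sc); import sqlContext.implicits._

val df = Seq(("a", Seq("b", "c"))).toDF("s", "arr")

// One string argument and one array<string> argument: this is the combination
// that currently falls back to the interpreted (non-codegen) path.
df.select(concat_ws("-", col("s"), col("arr"))).show()   // expected: a-b-c
{code}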






[jira] [Updated] (SPARK-10130) type coercion for IF should have children resolved first

2015-08-23 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-10130:
--
Assignee: Adrian Wang

 type coercion for IF should have children resolved first
 

 Key: SPARK-10130
 URL: https://issues.apache.org/jira/browse/SPARK-10130
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Adrian Wang
Assignee: Adrian Wang
Priority: Blocker
 Fix For: 1.5.0


 SELECT IF(a > 0, a, 0) FROM (SELECT key a FROM src) temp;






[jira] [Updated] (SPARK-10171) AWS Lambda Executors

2015-08-23 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-10171:
--
Component/s: EC2

 AWS Lambda Executors
 

 Key: SPARK-10171
 URL: https://issues.apache.org/jira/browse/SPARK-10171
 Project: Spark
  Issue Type: Wish
  Components: EC2
Reporter: Jaka Jancar
Priority: Minor

 It would be great if Spark supported using AWS Lambda for execution in 
 addition to Standalone, Mesos and YARN, getting rid of the concept of a 
 cluster and having a single infinite-sized one.
 Couple of problems I see today:
   - Execution time is limited to 60s. This will probably change in the future.
   - Burstiness is still not very high.






[jira] [Updated] (SPARK-10172) History Server web UI gets messed up when sorting on any column

2015-08-23 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-10172:
--
Component/s: Web UI

 History Server web UI gets messed up when sorting on any column
 ---

 Key: SPARK-10172
 URL: https://issues.apache.org/jira/browse/SPARK-10172
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.4.0, 1.4.1
Reporter: Min Shen
Priority: Minor

 If the History Server web UI displays the Attempt ID column, clicking any 
 table header to sort on a column messes up the entire page.
 This seems to be a problem with sorttable.js not being able to correctly 
 handle tables that use rowspan.






[jira] [Commented] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java

2015-08-23 Thread SonixLegend (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708320#comment-14708320
 ] 

SonixLegend commented on SPARK-10173:
-

This is my test code.

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-repl_2.11</artifactId>
    <version>1.4.1</version>
  </dependency>
</dependencies>

SparkILoop iloop = new SparkILoop();

Settings settings = new Settings();

BooleanSetting setting = (BooleanSetting) settings.usejavacp();

setting.v_$eq(true);

iloop.settings_$eq(settings);

iloop.createInterpreter();

SparkIMain imain = iloop.intp();

try {
    imain.interpret("@transient var map = new java.util.HashMap[String, Object]()");

    Map<String, Object> map = (Map<String, Object>) imain.eval("map"); // imain.valueOfTerm("map");

    map.put("Test", "测试");

    imain.interpret("@transient val string = map.get(\"Test\").asInstanceOf[String]");
    imain.interpret("println(string)");
} catch (ScriptException e) {
    e.printStackTrace();
}

 valueOfTerm get none when SparkIMain used on Java 
 --

 Key: SPARK-10173
 URL: https://issues.apache.org/jira/browse/SPARK-10173
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 1.4.1
 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8
Reporter: SonixLegend
 Fix For: 1.4.2


 When I use the Spark REPL from my Java project, I found that if I used the 
 SparkIMain class built against Scala 2.10, the valueOfTerm function could get 
 the term value, but with the same class built against Scala 2.11, valueOfTerm 
 gave me a None result. I checked the source code and found that the line 
 sym.owner.companionSymbol returns none. Can you fix it? I used an alternative 
 method, changing valueOfTerm to eval, which then returns the correct result.






[jira] [Commented] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java

2015-08-23 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708324#comment-14708324
 ] 

Sean Owen commented on SPARK-10173:
---

Is this specific to Spark, or to the Scala shell? I am not sure this is 
behavior of Spark per se.

 valueOfTerm get none when SparkIMain used on Java 
 --

 Key: SPARK-10173
 URL: https://issues.apache.org/jira/browse/SPARK-10173
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 1.4.1
 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8
Reporter: SonixLegend
 Fix For: 1.4.2


 When I use the Spark REPL from my Java project, I found that if I used the 
 SparkIMain class built against Scala 2.10, the valueOfTerm function could get 
 the term value, but with the same class built against Scala 2.11, valueOfTerm 
 gave me a None result. I checked the source code and found that the line 
 sym.owner.companionSymbol returns none. Can you fix it? I used an alternative 
 method, changing valueOfTerm to eval, which then returns the correct result.






[jira] [Created] (SPARK-10174) refactor out project, filter, ordering generator from SparkPlan

2015-08-23 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-10174:
---

 Summary: refactor out project, filter, ordering generator from 
SparkPlan
 Key: SPARK-10174
 URL: https://issues.apache.org/jira/browse/SPARK-10174
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Wenchen Fan
Priority: Trivial









[jira] [Created] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java

2015-08-23 Thread SonixLegend (JIRA)
SonixLegend created SPARK-10173:
---

 Summary: valueOfTerm get none when SparkIMain used on Java 
 Key: SPARK-10173
 URL: https://issues.apache.org/jira/browse/SPARK-10173
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 1.4.1
 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8
Reporter: SonixLegend
 Fix For: 1.4.2


When I use the Spark REPL from my Java project, I found that if I used the 
SparkIMain class built against Scala 2.10, the valueOfTerm function could get 
the term value, but with the same class built against Scala 2.11, valueOfTerm 
gave me a None result. I checked the source code and found that the line 
sym.owner.companionSymbol returns none. Can you fix it? I used an alternative 
method, changing valueOfTerm to eval, which then returns the correct result.






[jira] [Commented] (SPARK-4879) Missing output partitions after job completes with speculative execution

2015-08-23 Thread Igor Berman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708331#comment-14708331
 ] 

Igor Berman commented on SPARK-4879:


There is a blog post 
http://tech.grammarly.com/blog/posts/Petabyte-Scale-Text-Processing-with-Spark.html
 with a link to a gist by Aaron Davidson: 
https://gist.github.com/aarondav/c513916e72101bbe14ec, which mentions in the 
comments that a DirectOutputCommitter should be used when speculation is enabled.

Maybe somebody will find it useful (IMHO, it should be in the documentation).
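
A minimal sketch of how such a committer is usually wired in; the class name below is hypothetical, a DirectOutputCommitter implementation like the one in the gist must be on the classpath, and this is not an official Spark setting:

{code}
// Tell the old (mapred) Hadoop output path to skip the temporary-directory
// commit/rename step, which is what races when speculative duplicates finish.
// Only safe for idempotent, overwrite-tolerant sinks such as S3.
sc.hadoopConfiguration.set("mapred.output.committer.class",
  "com.example.hadoop.DirectOutputCommitter")  // hypothetical FQCN of the gist's class
{code}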


 Missing output partitions after job completes with speculative execution
 

 Key: SPARK-4879
 URL: https://issues.apache.org/jira/browse/SPARK-4879
 Project: Spark
  Issue Type: Bug
  Components: Input/Output, Spark Core
Affects Versions: 1.0.2, 1.1.1, 1.2.0, 1.3.0
Reporter: Josh Rosen
Assignee: Josh Rosen
Priority: Critical
  Labels: backport-needed
 Fix For: 1.3.0

 Attachments: speculation.txt, speculation2.txt


 When speculative execution is enabled ({{spark.speculation=true}}), jobs that 
 save output files may report that they have completed successfully even 
 though some output partitions written by speculative tasks may be missing.
 h3. Reproduction
 This symptom was reported to me by a Spark user and I've been doing my own 
 investigation to try to come up with an in-house reproduction.
 I'm still working on a reliable local reproduction for this issue, which is a 
 little tricky because Spark won't schedule speculated tasks on the same host 
 as the original task, so you need an actual (or containerized) multi-host 
 cluster to test speculation.  Here's a simple reproduction of some of the 
 symptoms on EC2, which can be run in {{spark-shell}} with {{--conf 
 spark.speculation=true}}:
 {code}
 // Rig a job such that all but one of the tasks complete instantly
 // and one task runs for 20 seconds on its first attempt and instantly
 // on its second attempt:
 val numTasks = 100
 sc.parallelize(1 to numTasks, numTasks).repartition(2).mapPartitionsWithContext { case (ctx, iter) =>
   if (ctx.partitionId == 0) {  // If this is the one task that should run really slow
     if (ctx.attemptId == 0) {  // If this is the first attempt, run slow
       Thread.sleep(20 * 1000)
     }
   }
   iter
 }.map(x => (x, x)).saveAsTextFile("/test4")
 {code}
 When I run this, I end up with a job that completes quickly (due to 
 speculation) but reports failures from the speculated task:
 {code}
 [...]
 14/12/11 01:41:13 INFO scheduler.TaskSetManager: Finished task 37.1 in stage 
 3.0 (TID 411) in 131 ms on ip-172-31-8-164.us-west-2.compute.internal 
 (100/100)
 14/12/11 01:41:13 INFO scheduler.DAGScheduler: Stage 3 (saveAsTextFile at 
 console:22) finished in 0.856 s
 14/12/11 01:41:13 INFO spark.SparkContext: Job finished: saveAsTextFile at 
 console:22, took 0.885438374 s
 14/12/11 01:41:13 INFO scheduler.TaskSetManager: Ignoring task-finished event 
 for 70.1 in stage 3.0 because task 70 has already completed successfully
 scala 14/12/11 01:41:13 WARN scheduler.TaskSetManager: Lost task 49.1 in 
 stage 3.0 (TID 413, ip-172-31-8-164.us-west-2.compute.internal): 
 java.io.IOException: Failed to save output of task: 
 attempt_201412110141_0003_m_49_413
 
 org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:160)
 
 org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:172)
 
 org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:132)
 org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:109)
 
 org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:991)
 
 org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:974)
 org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
 org.apache.spark.scheduler.Task.run(Task.scala:54)
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 java.lang.Thread.run(Thread.java:745)
 {code}
 One interesting thing to note about this stack trace: if we look at 
 {{FileOutputCommitter.java:160}} 
 ([link|http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-core/2.5.0-mr1-cdh5.2.0/org/apache/hadoop/mapred/FileOutputCommitter.java#160]),
  this point in the execution seems to correspond to a case where a task 
 completes, attempts to commit its 

[jira] [Resolved] (SPARK-10164) GMM bug: match error

2015-08-23 Thread Joseph K. Bradley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-10164.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

Issue resolved by pull request 8370
[https://github.com/apache/spark/pull/8370]

 GMM bug: match error
 

 Key: SPARK-10164
 URL: https://issues.apache.org/jira/browse/SPARK-10164
 Project: Spark
  Issue Type: Bug
  Components: MLlib
Affects Versions: 1.5.0
Reporter: Joseph K. Bradley
Assignee: Joseph K. Bradley
Priority: Critical
 Fix For: 1.5.0


 GaussianMixture now distributes matrix decompositions for certain problem 
 sizes.  Distributed computation actually fails, but this was not tested in 
 unit tests.  This is a regression.
 Here is an example failure:
 {code}
 Exception in thread main scala.MatchError: ArrayBuffer(0.05001, 
 0.05001, 0.05001, 0.05
 001, 0.05001, 0.05001, 0.05001, 
 0.05001, 0.05001, 0.05001, 0.050
 01, 0.05001, 0.05001, 
 0.05001, 0.05001, 0.05001, 0.05000
 0001, 0.05001, 0.05001, 0.05001) (of 
 class scala.collection.mutable.ArrayBuffer)
 at scala.runtime.ScalaRunTime$.array_apply(ScalaRunTime.scala:71)
 at scala.Array$.slowcopy(Array.scala:81)
 at scala.Array$.copy(Array.scala:107)
 at 
 org.apache.spark.mllib.clustering.GaussianMixture.run(GaussianMixture.scala:215)
 at 
 mllib.perf.clustering.GaussianMixtureTest.run(GaussianMixtureTest.scala:60)
 at mllib.perf.TestRunner$$anonfun$2.apply(TestRunner.scala:66)
 at mllib.perf.TestRunner$$anonfun$2.apply(TestRunner.scala:64)
 at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 at 
 scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
 at scala.collection.immutable.Range.foreach(Range.scala:141)
 at 
 scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
 at scala.collection.AbstractTraversable.map(Traversable.scala:105)
 at mllib.perf.TestRunner$.main(TestRunner.scala:64)
 at mllib.perf.TestRunner.main(TestRunner.scala)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
 at 
 org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
 at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
 at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
 at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 15/08/21 21:25:33 INFO spark.SparkContext: Invoking stop() from shutdown hook
 {code}






[jira] [Commented] (SPARK-9671) ML 1.5 QA: Programming guide update and migration guide

2015-08-23 Thread Joseph K. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708625#comment-14708625
 ] 

Joseph K. Bradley commented on SPARK-9671:
--

Should mention change to GradientDescent to use convergence tolerance: 
[SPARK-3382].  This will change the behavior by stopping early in some cases.  
Note: If we do not want to change the behavior, we should set the default 
tolerance to 0.
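
For the guide, a minimal sketch of pinning the new tolerance to keep the old behavior (assuming the setConvergenceTol setter added by SPARK-3382):

{code}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD

val lr = new LogisticRegressionWithSGD()
// A tolerance of 0 disables the new early stopping, so the optimizer always
// runs the full number of iterations, as it did before this change.
lr.optimizer
  .setNumIterations(100)
  .setConvergenceTol(0.0)
{code}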

 ML 1.5 QA: Programming guide update and migration guide
 ---

 Key: SPARK-9671
 URL: https://issues.apache.org/jira/browse/SPARK-9671
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib
Reporter: Joseph K. Bradley
Priority: Critical

 Before the release, we need to update the MLlib Programming Guide.  Updates 
 will include:
 * Add migration guide subsection.
 ** Use the results of the QA audit JIRAs.
 * Check phrasing, especially in main sections (for outdated items such as "In 
 this release, ...")
 * Possibly reorganize parts of the Pipelines guide if needed.






[jira] [Updated] (SPARK-10172) History Server web UI gets messed up when sorting on any column

2015-08-23 Thread Min Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Min Shen updated SPARK-10172:
-
Attachment: screen-shot.png

[~srowen],

Screen shot attached. When the Attempt ID column is displayed, the columns in 
the table become misaligned after sorting on any column.

 History Server web UI gets messed up when sorting on any column
 ---

 Key: SPARK-10172
 URL: https://issues.apache.org/jira/browse/SPARK-10172
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 1.4.0, 1.4.1
Reporter: Min Shen
Priority: Minor
 Attachments: screen-shot.png


 If the History Server web UI displays the Attempt ID column, clicking any 
 table header to sort on a column messes up the entire page.
 This seems to be a problem with sorttable.js not being able to correctly 
 handle tables that use rowspan.






[jira] [Created] (SPARK-10176) Show partially analyzed plan when checkAnswer df fails to resolve

2015-08-23 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-10176:


 Summary: Show partially analyzed plan when checkAnswer df fails to 
resolve
 Key: SPARK-10176
 URL: https://issues.apache.org/jira/browse/SPARK-10176
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust


It would be much easier to debug test failures if we could see the failed plan 
instead of just the user friendly error message.






[jira] [Commented] (SPARK-9570) Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x'.

2015-08-23 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708600#comment-14708600
 ] 

Apache Spark commented on SPARK-9570:
-

User 'nssalian' has created a pull request for this issue:
https://github.com/apache/spark/pull/8385

 Consistent recommendation for submitting spark apps to YARN, -master yarn 
 --deploy-mode x vs -master yarn-x'.
 -

 Key: SPARK-9570
 URL: https://issues.apache.org/jira/browse/SPARK-9570
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, Spark Submit, YARN
Affects Versions: 1.4.1
Reporter: Neelesh Srinivas Salian
Priority: Minor
  Labels: starter

 There are still some inconsistencies in the documentation regarding 
 submission of applications to YARN.
 SPARK-3629 was done to correct this, but 
 http://spark.apache.org/docs/latest/submitting-applications.html#master-urls
 still has yarn-client and yarn-cluster as opposed to the norm of using 
 --master yarn and --deploy-mode cluster / client.
 This needs to be changed appropriately (if needed) to avoid confusion:
 https://spark.apache.org/docs/latest/running-on-yarn.html






[jira] [Commented] (SPARK-10145) Executor exit without useful messages when spark runs in spark-streaming

2015-08-23 Thread Baogang Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708653#comment-14708653
 ] 

Baogang Wang commented on SPARK-10145:
--

Any update?
This issue can be reproduced. 

 Executor exit without useful messages when spark runs in spark-streaming
 

 Key: SPARK-10145
 URL: https://issues.apache.org/jira/browse/SPARK-10145
 Project: Spark
  Issue Type: Bug
  Components: Streaming, YARN
 Environment: spark 1.3.1, hadoop 2.6.0, 6 nodes, each node has 32 
 cores and 32g memory  
Reporter: Baogang Wang
Priority: Critical
   Original Estimate: 168h
  Remaining Estimate: 168h

 Each node is allocated 30g of memory by YARN.
 My application receives messages from Kafka via direct stream. Each application 
 consists of 4 DStream windows.
 Spark application is submitted by this command:
 spark-submit --class spark_security.safe.SafeSockPuppet  --driver-memory 3g 
 --executor-memory 3g --num-executors 3 --executor-cores 4  --name 
 safeSparkDealerUser --master yarn  --deploy-mode cluster  
 spark_Security-1.0-SNAPSHOT.jar.nocalse 
 hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/spark_properties/safedealer.properties
 After about 1 hours, some executor exits. There is no more yarn logs after 
 the executor exits and there is no stack when the executor exits.
 When I see the yarn node manager log, it shows as follows :
 2015-08-17 17:25:41,550 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
  Start request for container_1439803298368_0005_01_01 by user root
 2015-08-17 17:25:41,551 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
  Creating a new application reference for app application_1439803298368_0005
 2015-08-17 17:25:41,551 INFO 
 org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root   
 IP=172.19.160.102   OPERATION=Start Container Request   
 TARGET=ContainerManageImpl  RESULT=SUCCESS  
 APPID=application_1439803298368_0005
 CONTAINERID=container_1439803298368_0005_01_01
 2015-08-17 17:25:41,551 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Application application_1439803298368_0005 transitioned from NEW to INITING
 2015-08-17 17:25:41,552 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Adding container_1439803298368_0005_01_01 to application 
 application_1439803298368_0005
 2015-08-17 17:25:41,557 WARN 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
  rollingMonitorInterval is set as -1. The log rolling mornitoring interval is 
 disabled. The logs will be aggregated after this application is finished.
 2015-08-17 17:25:41,663 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
  Application application_1439803298368_0005 transitioned from INITING to 
 RUNNING
 2015-08-17 17:25:41,664 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
  Container container_1439803298368_0005_01_01 transitioned from NEW to 
 LOCALIZING
 2015-08-17 17:25:41,664 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got 
 event CONTAINER_INIT for appId application_1439803298368_0005
 2015-08-17 17:25:41,664 INFO 
 org.apache.spark.network.yarn.YarnShuffleService: Initializing container 
 container_1439803298368_0005_01_01
 2015-08-17 17:25:41,665 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
  Resource 
 hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/spark-assembly-1.3.1-hadoop2.6.0.jar
  transitioned from INIT to DOWNLOADING
 2015-08-17 17:25:41,665 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
  Resource 
 hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/spark_Security-1.0-SNAPSHOT.jar
  transitioned from INIT to DOWNLOADING
 2015-08-17 17:25:41,665 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Created localizer for container_1439803298368_0005_01_01
 2015-08-17 17:25:41,668 INFO 
 org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
  Writing credentials to the nmPrivate file 
 /export/servers/hadoop2.6.0/tmp/nm-local-dir/nmPrivate/container_1439803298368_0005_01_01.tokens.
  Credentials list: 
 2015-08-17 17:25:41,682 INFO 
 org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: 
 Initializing user root
 2015-08-17 17:25:41,686 INFO 
 

[jira] [Created] (SPARK-10175) Enhance spark doap file

2015-08-23 Thread Luciano Resende (JIRA)
Luciano Resende created SPARK-10175:
---

 Summary: Enhance spark doap file
 Key: SPARK-10175
 URL: https://issues.apache.org/jira/browse/SPARK-10175
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Reporter: Luciano Resende


The Spark doap has broken links and is also missing entries related to issue 
tracker and mailing lists. This affects the list in projects.apache.org and 
also in the main apache website.






[jira] [Commented] (SPARK-9803) Add transform and subset to DataFrame

2015-08-23 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708536#comment-14708536
 ] 

Felix Cheung commented on SPARK-9803:
-

subset would be a synonym for `[` 
https://issues.apache.org/jira/browse/SPARK-9316


 Add transform and subset  to DataFrame
 --

 Key: SPARK-9803
 URL: https://issues.apache.org/jira/browse/SPARK-9803
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Hossein Falaki

 These three base functions are heavily used with R dataframes. It would be 
 great to have them work with Spark DataFrames:
 * transform
 * subset



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10118) Improve SparkR API docs for 1.5 release

2015-08-23 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708645#comment-14708645
 ] 

Yu Ishikawa commented on SPARK-10118:
-

[~shivaram] sure. I'll send a PR about that later.

 Improve SparkR API docs for 1.5 release
 ---

 Key: SPARK-10118
 URL: https://issues.apache.org/jira/browse/SPARK-10118
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, SparkR
Reporter: Shivaram Venkataraman

 This includes checking if the new DataFrame functions & expressions show up 
 appropriately in the roxygen docs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10175) Enhance spark doap file

2015-08-23 Thread Luciano Resende (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luciano Resende updated SPARK-10175:

Attachment: SPARK-10175

Updates to the doap file located in the website svn repository.

 Enhance spark doap file
 ---

 Key: SPARK-10175
 URL: https://issues.apache.org/jira/browse/SPARK-10175
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Reporter: Luciano Resende
 Attachments: SPARK-10175


 The Spark doap file has broken links and is also missing entries related to the issue 
 tracker and mailing lists. This affects the listing on projects.apache.org and 
 also on the main Apache website.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10039) Resetting REPL state does not work

2015-08-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708537#comment-14708537
 ] 

ASF GitHub Bot commented on SPARK-10039:


Github user felixcheung commented on the pull request:

https://github.com/apache/incubator-zeppelin/pull/228#issuecomment-133934888
  
We could call `sc.stop()` or `SparkIMain.reset()`?
Though apparently SparkIMain reset() has some issue: 
https://issues.apache.org/jira/browse/SPARK-10039


 Resetting REPL state does not work
 -

 Key: SPARK-10039
 URL: https://issues.apache.org/jira/browse/SPARK-10039
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell
Affects Versions: 1.4.1
Reporter: Kevin Jung
Priority: Minor

 Spark shell can't find the base directory of the class server after running the 
 :reset command. 
 {quote}
 scala> :reset 
 scala> 1 
 uncaught exception during compilation: java.lang.AssertionError 
 java.lang.AssertionError: assertion failed: Tried to find '$line33' in 
 '/tmp/spark-f47f3917-ac31-4138-bf1a-a8cefd094ac3' but it is not a directory 
 (no further commands work after this, including 'exit') 
 {quote}
 I figured out that the reset() method in SparkIMain tries to delete virtualDirectory and 
 then create it again. But virtualDirectory.create() makes a file, not a 
 directory. Details below.
 {quote}
 drwxrwxr-x.   3 root root    0 2015-08-17 09:09 
 spark-9cfc6b06-c902-4caf-8712-9ea63f17d017
 (After :reset)
 -rw-rw-r--.   1 root root    0 2015-08-17 09:09 
 spark-9cfc6b06-c902-4caf-8712-9ea63f17d017
 {quote}
 vd.delete; vd.givenPath.createDirectory(true); will temporarily solve the 
 problem.
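 For illustration, a minimal sketch of that workaround idea using java.nio directly 
 (the path is the one from the report above; this is not the actual SparkIMain code):
{code:scala}
import java.nio.file.{Files, Paths}

// The class-server output path that :reset turned into a plain file (from the report).
val classServerDir = Paths.get("/tmp/spark-f47f3917-ac31-4138-bf1a-a8cefd094ac3")

// Workaround idea: remove the leftover plain file and recreate the path as a
// directory, mirroring vd.delete; vd.givenPath.createDirectory(true).
Files.deleteIfExists(classServerDir)
Files.createDirectories(classServerDir)
{code}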



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10118) Improve SparkR API docs for 1.5 release

2015-08-23 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708695#comment-14708695
 ] 

Apache Spark commented on SPARK-10118:
--

User 'yu-iskw' has created a pull request for this issue:
https://github.com/apache/spark/pull/8386

 Improve SparkR API docs for 1.5 release
 ---

 Key: SPARK-10118
 URL: https://issues.apache.org/jira/browse/SPARK-10118
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, SparkR
Reporter: Shivaram Venkataraman

 This includes checking if the new DataFrame functions & expressions show up 
 appropriately in the roxygen docs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10118) Improve SparkR API docs for 1.5 release

2015-08-23 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10118:


Assignee: (was: Apache Spark)

 Improve SparkR API docs for 1.5 release
 ---

 Key: SPARK-10118
 URL: https://issues.apache.org/jira/browse/SPARK-10118
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, SparkR
Reporter: Shivaram Venkataraman

 This includes checking if the new DataFrame functions & expressions show up 
 appropriately in the roxygen docs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10118) Improve SparkR API docs for 1.5 release

2015-08-23 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10118:


Assignee: Apache Spark

 Improve SparkR API docs for 1.5 release
 ---

 Key: SPARK-10118
 URL: https://issues.apache.org/jira/browse/SPARK-10118
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, SparkR
Reporter: Shivaram Venkataraman
Assignee: Apache Spark

 This includes checking if the new DataFrame functions & expressions show up 
 appropriately in the roxygen docs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10134) Improve the performance of Binary Comparison

2015-08-23 Thread Cheng Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Hao updated SPARK-10134:
--
Priority: Minor  (was: Major)

 Improve the performance of Binary Comparison
 

 Key: SPARK-10134
 URL: https://issues.apache.org/jira/browse/SPARK-10134
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Cheng Hao
Priority: Minor

 Currently, comparing binary values byte by byte is quite slow; use the Guava 
 utility to improve the performance, which compares 8 bytes at a time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10178) HiveComparision test should print out dependent tables

2015-08-23 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-10178:


 Summary: HiveComparision test should print out dependent tables
 Key: SPARK-10178
 URL: https://issues.apache.org/jira/browse/SPARK-10178
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10177) Parquet support interprets timestamp values differently from Hive 0.14.0+

2015-08-23 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-10177:
--

 Summary: Parquet support interprets timestamp values differently 
from Hive 0.14.0+
 Key: SPARK-10177
 URL: https://issues.apache.org/jira/browse/SPARK-10177
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0
Reporter: Cheng Lian
Assignee: Cheng Lian
Priority: Blocker


Running the following SQL under Hive 0.14.0+ (tested against 0.14.0 and 1.2.1):
{code:sql}
CREATE TABLE ts_test STORED AS PARQUET
AS SELECT CAST('2015-01-01 00:00:00' AS TIMESTAMP);
{code}
Then read the Parquet file generated by Hive with Spark SQL:
{noformat}
scala> 
sqlContext.read.parquet("hdfs://localhost:9000/user/hive/warehouse_hive14/ts_test").collect()
res1: Array[org.apache.spark.sql.Row] = Array([2015-01-01 12:00:00.0])
{noformat}
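For reference, a minimal spark-shell check of the discrepancy (paths and values are taken 
from the report; assumes the sqlContext provided by the 1.5 shell):
{code:scala}
// Read the Hive-written Parquet file and compare it with the literal Spark itself
// produces for the same CAST; per the report these differ by a timezone-dependent offset.
val fromHive = sqlContext.read
  .parquet("hdfs://localhost:9000/user/hive/warehouse_hive14/ts_test")
  .collect()
val fromSpark = sqlContext
  .sql("SELECT CAST('2015-01-01 00:00:00' AS TIMESTAMP)")
  .collect()
println(fromHive.mkString(", "))   // [2015-01-01 12:00:00.0] per the report
println(fromSpark.mkString(", "))  // [2015-01-01 00:00:00.0]
{code}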




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10176) Show partially analyzed plan when checkAnswer df fails to resolve

2015-08-23 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10176:


Assignee: Michael Armbrust  (was: Apache Spark)

 Show partially analyzed plan when checkAnswer df fails to resolve
 -

 Key: SPARK-10176
 URL: https://issues.apache.org/jira/browse/SPARK-10176
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust

 It would be much easier to debug test failures if we could see the failed 
 plan instead of just the user friendly error message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10176) Show partially analyzed plan when checkAnswer df fails to resolve

2015-08-23 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708818#comment-14708818
 ] 

Apache Spark commented on SPARK-10176:
--

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/8389

 Show partially analyzed plan when checkAnswer df fails to resolve
 -

 Key: SPARK-10176
 URL: https://issues.apache.org/jira/browse/SPARK-10176
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust

 It would be much easier to debug test failures if we could see the failed 
 plan instead of just the user friendly error message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10179) LogisticRegressionWithSGD does not support multiclass

2015-08-23 Thread Shiqiao Du (JIRA)
Shiqiao Du created SPARK-10179:
--

 Summary: LogisticRegressionWithSGD does not support multiclass 
 Key: SPARK-10179
 URL: https://issues.apache.org/jira/browse/SPARK-10179
 Project: Spark
  Issue Type: Bug
  Components: MLlib
 Environment: Ubuntu 14.04, spark 1.4.1
Reporter: Shiqiao Du


LogisticRegressionWithSGD does not support multi-class input, while 
LogisticRegressionWithLBFGS does.
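
As a point of comparison, a minimal sketch of multiclass training with the LBFGS-based 
trainer (assumes an RDD[LabeledPoint] named training with labels 0 to numClasses-1):
{code:scala}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Train a k-class model; the LBFGS-based trainer supports multinomial
// logistic regression, unlike LogisticRegressionWithSGD.
def trainMulticlass(training: RDD[LabeledPoint], numClasses: Int) =
  new LogisticRegressionWithLBFGS()
    .setNumClasses(numClasses)
    .run(training)
{code}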



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10178) HiveComparision test should print out dependent tables

2015-08-23 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708807#comment-14708807
 ] 

Apache Spark commented on SPARK-10178:
--

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/8388

 HiveComparision test should print out dependent tables
 --

 Key: SPARK-10178
 URL: https://issues.apache.org/jira/browse/SPARK-10178
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10178) HiveComparision test should print out dependent tables

2015-08-23 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10178:


Assignee: Apache Spark  (was: Michael Armbrust)

 HiveComparision test should print out dependent tables
 --

 Key: SPARK-10178
 URL: https://issues.apache.org/jira/browse/SPARK-10178
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Michael Armbrust
Assignee: Apache Spark





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10178) HiveComparision test should print out dependent tables

2015-08-23 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10178:


Assignee: Michael Armbrust  (was: Apache Spark)

 HiveComparision test should print out dependent tables
 --

 Key: SPARK-10178
 URL: https://issues.apache.org/jira/browse/SPARK-10178
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Michael Armbrust
Assignee: Michael Armbrust





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-10176) Show partially analyzed plan when checkAnswer df fails to resolve

2015-08-23 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10176:


Assignee: Apache Spark  (was: Michael Armbrust)

 Show partially analyzed plan when checkAnswer df fails to resolve
 -

 Key: SPARK-10176
 URL: https://issues.apache.org/jira/browse/SPARK-10176
 Project: Spark
  Issue Type: Bug
  Components: SQL
Reporter: Michael Armbrust
Assignee: Apache Spark

 It would be much easier to debug test failures if we could see the failed 
 plan instead of just the user friendly error message.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-10142) Python Streaming checkpoint recovery does not work with non-local file path

2015-08-23 Thread Tathagata Das (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das resolved SPARK-10142.
---
   Resolution: Fixed
Fix Version/s: 1.5.0

 Python Streaming checkpoint recovery does not work with non-local file path
 ---

 Key: SPARK-10142
 URL: https://issues.apache.org/jira/browse/SPARK-10142
 Project: Spark
  Issue Type: Bug
  Components: PySpark, Streaming
Affects Versions: 1.3.1, 1.4.1
Reporter: Tathagata Das
Assignee: Tathagata Das
Priority: Critical
 Fix For: 1.5.0


 The Python code in StreamingContext.getOrCreate() checks whether the given 
 checkpointPath exists on the local file system. The solution is to use the same 
 code path as Java to verify whether a valid checkpoint is present or not.
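 For context, the Scala pattern that the Python path is meant to mirror looks roughly 
 like this (the checkpoint directory and setup function are placeholders):
{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Placeholder checkpoint directory on a non-local file system.
val checkpointDir = "hdfs:///user/root/streaming-checkpoint"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("checkpoint-example")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  ssc
}

// getOrCreate resolves the checkpoint path through the Hadoop FileSystem API,
// so HDFS and other non-local paths work; the Python fix follows the same route.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
{code}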



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-979) Add some randomization to scheduler to better balance in-memory partition distributions

2015-08-23 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708739#comment-14708739
 ] 

Apache Spark commented on SPARK-979:


User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/8387

 Add some randomization to scheduler to better balance in-memory partition 
 distributions
 ---

 Key: SPARK-979
 URL: https://issues.apache.org/jira/browse/SPARK-979
 Project: Spark
  Issue Type: Improvement
Reporter: Reynold Xin
Assignee: Kay Ousterhout
 Fix For: 1.0.0


 The Spark scheduler is very deterministic, which causes problems for the 
 following workload (in serial order on a cluster with a small number of 
 nodes):
 cache rdd 1 with 1 partition
 cache rdd 2 with 1 partition
 cache rdd 3 with 1 partition
 
 After a while, only executor 1 will have data in memory, eventually 
 leading to in-memory blocks being evicted to disk while all other executors are 
 empty. 
 We can solve this problem by adding some randomization to the cluster 
 scheduling, or by adding memory aware scheduling (which is much harder to 
 do). 
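 A minimal sketch of the randomization idea, shuffling resource offers before allocation 
 (names are illustrative, not the actual scheduler code):
{code:scala}
import scala.util.Random

// Illustrative offer type; the real scheduler works with per-executor offers.
case class Offer(executorId: String, freeCores: Int)

// Shuffling the offers keeps tasks from always landing on the first executor,
// which spreads cached partitions more evenly across the cluster.
def randomizedOffers(offers: Seq[Offer]): Seq[Offer] =
  Random.shuffle(offers)
{code}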



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10134) Improve the performance of Binary Comparison

2015-08-23 Thread Cheng Hao (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Hao updated SPARK-10134:
--
Fix Version/s: (was: 1.6.0)

 Improve the performance of Binary Comparison
 

 Key: SPARK-10134
 URL: https://issues.apache.org/jira/browse/SPARK-10134
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Cheng Hao
Priority: Minor

 Currently, comparing binary values byte by byte is quite slow; use the Guava 
 utility to improve the performance, which compares 8 bytes at a time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10134) Improve the performance of Binary Comparison

2015-08-23 Thread Cheng Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708766#comment-14708766
 ] 

Cheng Hao commented on SPARK-10134:
---

We can improve that by enabling comparison 8 bytes at a time on a 64-bit OS. 
https://bugs.openjdk.java.net/browse/JDK-8033148
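
For reference, a small sketch of the Guava-based approach mentioned above 
(UnsignedBytes.lexicographicalComparator compares several bytes per step when 
sun.misc.Unsafe is available):
{code:scala}
import com.google.common.primitives.UnsignedBytes

// Unsigned lexicographic comparison of two byte arrays; Guava picks an
// Unsafe-backed implementation that reads 8 bytes per iteration when it can.
val comparator = UnsignedBytes.lexicographicalComparator()

def compareBinary(a: Array[Byte], b: Array[Byte]): Int =
  comparator.compare(a, b)
{code}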

 Improve the performance of Binary Comparison
 

 Key: SPARK-10134
 URL: https://issues.apache.org/jira/browse/SPARK-10134
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Reporter: Cheng Hao

 Currently, comparing binary values byte by byte is quite slow; use the Guava 
 utility to improve the performance, which compares 8 bytes at a time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org