[jira] [Commented] (SPARK-8510) NumPy arrays and matrices as values in sequence files
[ https://issues.apache.org/jira/browse/SPARK-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708461#comment-14708461 ] Apache Spark commented on SPARK-8510: - User 'paberline' has created a pull request for this issue: https://github.com/apache/spark/pull/8384 NumPy arrays and matrices as values in sequence files - Key: SPARK-8510 URL: https://issues.apache.org/jira/browse/SPARK-8510 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Peter Aberline Priority: Minor Using the DoubleArrayWritable example, I have added support for storing NumPy double arrays and matrices as arrays of doubles and nested arrays of doubles as value elements of Sequence Files. Each value element is a discrete matrix or array. This is useful where you have many matrices that you don't want to join into a single Spark Data Frame to store in a Parquet file. Pandas DataFrames can be easily converted to and from NumPy matrices, so I've also added the ability to store the schema-less data from DataFrames and Series that contain double data. There seems to be demand for this functionality: http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E I'll be issuing a PR for this shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
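The PySpark API added by the pull request is not reproduced here; as a rough illustration of the DoubleArrayWritable pattern the description refers to, the Scala sketch below writes one discrete double array per record into a SequenceFile. The writable class and output path are illustrative, not part of the proposed change.
{code}
import org.apache.hadoop.io.{ArrayWritable, DoubleWritable, IntWritable, Writable}
import org.apache.hadoop.mapred.SequenceFileOutputFormat

// Classic Hadoop pattern: an ArrayWritable fixed to DoubleWritable elements.
class DoubleArrayWritable extends ArrayWritable(classOf[DoubleWritable]) {
  def this(values: Array[Double]) = {
    this()
    set(values.map(v => new DoubleWritable(v): Writable))
  }
}

// One discrete array per key, stored as (IntWritable, DoubleArrayWritable) records.
val arrays = Seq(1 -> Array(1.0, 2.0, 3.0), 2 -> Array(4.0, 5.0))
sc.parallelize(arrays)
  .map { case (k, arr) => (new IntWritable(k), new DoubleArrayWritable(arr)) }
  .saveAsHadoopFile(
    "/tmp/double-arrays.seq", // illustrative output path
    classOf[IntWritable],
    classOf[DoubleArrayWritable],
    classOf[SequenceFileOutputFormat[IntWritable, DoubleArrayWritable]])
{code}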
[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708410#comment-14708410 ] Nick Xie commented on SPARK-3655: - I wanted to add a session id to each detail record, but the only way I can do that with mapStreamByKey is to create a LinkedList of detail records and return the list's iterator, which takes up extra memory compared to just modifying the records in place. I ended up creating a LinkedList of only the session records. It seems to work on my test machine; I will test it on the cluster next week. Support sorting of values in addition to keys (i.e. secondary sort) --- Key: SPARK-3655 URL: https://issues.apache.org/jira/browse/SPARK-3655 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0, 1.2.0 Reporter: koert kuipers Assignee: Koert Kuipers Now that Spark has a sort-based shuffle, can we expect secondary sort soon? There are some use cases where getting a sorted iterator of values per key is helpful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
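For reference, the sort-based shuffle already supports the secondary-sort pattern via repartitionAndSortWithinPartitions. Below is a minimal Scala sketch (not the API proposed in this ticket): records carry a composite (sessionId, timestamp) key, and a partitioner that looks only at the session id keeps each session in one partition while the shuffle orders records by timestamp. Field names and the partition count are illustrative only.
{code}
import org.apache.spark.Partitioner

// Partition only on the session id so all records of a session land in the same
// partition; the shuffle then sorts the full (sessionId, timestamp) key within it.
class SessionPartitioner(partitions: Int) extends Partitioner {
  override def numPartitions: Int = partitions
  override def getPartition(key: Any): Int = key match {
    case (sessionId: String, _) =>
      ((sessionId.hashCode % partitions) + partitions) % partitions
  }
}

// (sessionId, timestamp, detail) records, e.g. from parsed log lines.
val records = sc.parallelize(Seq(("s1", 3L, "c"), ("s1", 1L, "a"), ("s2", 2L, "b")))
val sorted = records
  .map { case (sid, ts, detail) => ((sid, ts), detail) }
  .repartitionAndSortWithinPartitions(new SessionPartitioner(16))
// Each partition's iterator now yields a session's records contiguously and in
// timestamp order, so per-session state can be attached without buffering a list.
{code}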
[jira] [Commented] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java
[ https://issues.apache.org/jira/browse/SPARK-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708444#comment-14708444 ] Sean Owen commented on SPARK-10173: --- It sounds like you're expecting some functionality of the Scala 2.11 REPL in the Spark REPL, which is not necessarily guaranteed. What do you mean that it occurs in the REPL but not the shell? The REPL isn't really a library. In any event, a pull request might clarify what you want to change. valueOfTerm get none when SparkIMain used on Java -- Key: SPARK-10173 URL: https://issues.apache.org/jira/browse/SPARK-10173 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.4.1 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8 Reporter: SonixLegend Fix For: 1.4.2 When I use Spark Repl on my Java project, I found if I had used the class SparkIMain via Scala 2.10 then the function valueOfTerm could get the term value, but if I had used same class via Scala 2.11, the valueOfTerm gave me a none result, and I checked the source code, found the line sym.owner.companionSymbol got none. Can you fix it? I used a alternative method, change valueOfTerm to eval, then it will get current result. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8510) NumPy arrays and matrices as values in sequence files
[ https://issues.apache.org/jira/browse/SPARK-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Aberline updated SPARK-8510: -- Description: Using the DoubleArrayWritable as an example, I have added support for storing NumPy arrays and matrices as elements of Sequence Files. Each value element is a discrete matrix or array. This is useful where you have many matrices that you don't want to join into a single Spark Data Frame to store in a Parquet file. There seems to be demand for this functionality: http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E I originally put this work in PR 6995, but closed it after suggestions from a user on to use NumPy's built in serialization. My second version is in PR 8384. was: Using the DoubleArrayWritable example, I have added support for storing NumPy double arrays and matrices as arrays of doubles and nested arrays of doubles as value elements of Sequence Files. Each value element is a discrete matrix or array. This is useful where you have many matrices that you don't want to join into a single Spark Data Frame to store in a Parquet file. Pandas DataFrames can be easily converted to and from NumPy matrices, so I've also added the ability to store the schema-less data from DataFrames and Series that contain double data. There seems to be demand for this functionality: http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E I'll be issuing a PR for this shortly. NumPy arrays and matrices as values in sequence files - Key: SPARK-8510 URL: https://issues.apache.org/jira/browse/SPARK-8510 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Peter Aberline Priority: Minor Using the DoubleArrayWritable as an example, I have added support for storing NumPy arrays and matrices as elements of Sequence Files. Each value element is a discrete matrix or array. This is useful where you have many matrices that you don't want to join into a single Spark Data Frame to store in a Parquet file. There seems to be demand for this functionality: http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E I originally put this work in PR 6995, but closed it after suggestions from a user on to use NumPy's built in serialization. My second version is in PR 8384. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-8510) NumPy arrays and matrices as values in sequence files
[ https://issues.apache.org/jira/browse/SPARK-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Aberline updated SPARK-8510: -- Comment: was deleted (was: See PR at https://github.com/apache/spark/pull/6995) NumPy arrays and matrices as values in sequence files - Key: SPARK-8510 URL: https://issues.apache.org/jira/browse/SPARK-8510 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Peter Aberline Priority: Minor Using the DoubleArrayWritable as an example, I have added support for storing NumPy arrays and matrices as elements of Sequence Files. Each value element is a discrete matrix or array. This is useful where you have many matrices that you don't want to join into a single Spark Data Frame to store in a Parquet file. There seems to be demand for this functionality: http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E I originally put this work in PR 6995, but closed it after suggestions from a user to use NumPy's built in serialization. My second version is in PR 8384. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8510) NumPy arrays and matrices as values in sequence files
[ https://issues.apache.org/jira/browse/SPARK-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Aberline updated SPARK-8510: -- Description: Using the DoubleArrayWritable as an example, I have added support for storing NumPy arrays and matrices as elements of Sequence Files. Each value element is a discrete matrix or array. This is useful where you have many matrices that you don't want to join into a single Spark Data Frame to store in a Parquet file. There seems to be demand for this functionality: http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E I originally put this work in PR 6995, but closed it after suggestions from a user to use NumPy's built in serialization. My second version is in PR 8384. was: Using the DoubleArrayWritable as an example, I have added support for storing NumPy arrays and matrices as elements of Sequence Files. Each value element is a discrete matrix or array. This is useful where you have many matrices that you don't want to join into a single Spark Data Frame to store in a Parquet file. There seems to be demand for this functionality: http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E I originally put this work in PR 6995, but closed it after suggestions from a user on to use NumPy's built in serialization. My second version is in PR 8384. NumPy arrays and matrices as values in sequence files - Key: SPARK-8510 URL: https://issues.apache.org/jira/browse/SPARK-8510 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Peter Aberline Priority: Minor Using the DoubleArrayWritable as an example, I have added support for storing NumPy arrays and matrices as elements of Sequence Files. Each value element is a discrete matrix or array. This is useful where you have many matrices that you don't want to join into a single Spark Data Frame to store in a Parquet file. There seems to be demand for this functionality: http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E I originally put this work in PR 6995, but closed it after suggestions from a user to use NumPy's built in serialization. My second version is in PR 8384. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java
[ https://issues.apache.org/jira/browse/SPARK-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SonixLegend updated SPARK-10173: Comment: was deleted (was: Apache Zepplin is wrote and built on Scala 2.10, and it used Spark Repl interpreter. It works on Spark which built Scala 2.10. But I have built a Spark via Scala 2.11, then I updated the Zepplin to Scala 2.11, so I got the problem about valueOfTerm function return none. You means it is not necessary. All right, I think I can wait for Zepplin upgrade to Scala 2.11. Thanks for your help.) valueOfTerm get none when SparkIMain used on Java -- Key: SPARK-10173 URL: https://issues.apache.org/jira/browse/SPARK-10173 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.4.1 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8 Reporter: SonixLegend Fix For: 1.4.2 When I use Spark Repl on my Java project, I found if I had used the class SparkIMain via Scala 2.10 then the function valueOfTerm could get the term value, but if I had used same class via Scala 2.11, the valueOfTerm gave me a none result, and I checked the source code, found the line sym.owner.companionSymbol got none. Can you fix it? I used a alternative method, change valueOfTerm to eval, then it will get current result. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java
[ https://issues.apache.org/jira/browse/SPARK-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708450#comment-14708450 ] SonixLegend commented on SPARK-10173: - Apache Zeppelin is written and built on Scala 2.10, and it uses the Spark REPL interpreter. It works on a Spark built against Scala 2.10. But I have built Spark against Scala 2.11 and updated Zeppelin to Scala 2.11, and that is when I hit the problem of valueOfTerm returning none. You mean it is not necessarily guaranteed? All right, I think I can wait for Zeppelin to upgrade to Scala 2.11. Thanks for your help. valueOfTerm get none when SparkIMain used on Java -- Key: SPARK-10173 URL: https://issues.apache.org/jira/browse/SPARK-10173 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.4.1 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8 Reporter: SonixLegend Fix For: 1.4.2 When I use Spark Repl on my Java project, I found if I had used the class SparkIMain via Scala 2.10 then the function valueOfTerm could get the term value, but if I had used same class via Scala 2.11, the valueOfTerm gave me a none result, and I checked the source code, found the line sym.owner.companionSymbol got none. Can you fix it? I used a alternative method, change valueOfTerm to eval, then it will get current result. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java
[ https://issues.apache.org/jira/browse/SPARK-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708449#comment-14708449 ] SonixLegend commented on SPARK-10173: - Apache Zeppelin is written and built on Scala 2.10, and it uses the Spark REPL interpreter. It works on a Spark built against Scala 2.10. But I have built Spark against Scala 2.11 and updated Zeppelin to Scala 2.11, and that is when I hit the problem of valueOfTerm returning none. You mean it is not necessarily guaranteed? All right, I think I can wait for Zeppelin to upgrade to Scala 2.11. Thanks for your help. valueOfTerm get none when SparkIMain used on Java -- Key: SPARK-10173 URL: https://issues.apache.org/jira/browse/SPARK-10173 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.4.1 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8 Reporter: SonixLegend Fix For: 1.4.2 When I use Spark Repl on my Java project, I found if I had used the class SparkIMain via Scala 2.10 then the function valueOfTerm could get the term value, but if I had used same class via Scala 2.11, the valueOfTerm gave me a none result, and I checked the source code, found the line sym.owner.companionSymbol got none. Can you fix it? I used a alternative method, change valueOfTerm to eval, then it will get current result. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8510) NumPy arrays and matrices as values in sequence files
[ https://issues.apache.org/jira/browse/SPARK-8510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Aberline updated SPARK-8510: -- Description: Using the DoubleArrayWritable as an example, I have added support for storing NumPy arrays and matrices as elements of Sequence Files. Each value element is a discrete matrix or array. This is useful where you have many matrices that you don't want to join into a single Spark DataFrame to store in a Parquet file. There seems to be demand for this functionality: http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E I originally put this work in PR 6995, but closed it after suggestions from a user to use NumPy's built in serialization. My second version is in PR 8384. was: Using the DoubleArrayWritable as an example, I have added support for storing NumPy arrays and matrices as elements of Sequence Files. Each value element is a discrete matrix or array. This is useful where you have many matrices that you don't want to join into a single Spark Data Frame to store in a Parquet file. There seems to be demand for this functionality: http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E I originally put this work in PR 6995, but closed it after suggestions from a user to use NumPy's built in serialization. My second version is in PR 8384. NumPy arrays and matrices as values in sequence files - Key: SPARK-8510 URL: https://issues.apache.org/jira/browse/SPARK-8510 Project: Spark Issue Type: Improvement Components: PySpark Reporter: Peter Aberline Priority: Minor Using the DoubleArrayWritable as an example, I have added support for storing NumPy arrays and matrices as elements of Sequence Files. Each value element is a discrete matrix or array. This is useful where you have many matrices that you don't want to join into a single Spark DataFrame to store in a Parquet file. There seems to be demand for this functionality: http://mail-archives.us.apache.org/mod_mbox/spark-user/201506.mbox/%3CCAJQK-mg1PUCc_hkV=q3n-01ioq_pkwe1g-c39ximco3khqn...@mail.gmail.com%3E I originally put this work in PR 6995, but closed it after suggestions from a user to use NumPy's built in serialization. My second version is in PR 8384. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10172) History Server web UI gets messed up when sorting on any column
[ https://issues.apache.org/jira/browse/SPARK-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708309#comment-14708309 ] Sean Owen commented on SPARK-10172: --- Can you define 'messed up' with a more detailed description or a screenshot? Or a pull request? It's not clear what you're reporting at the moment. History Server web UI gets messed up when sorting on any column --- Key: SPARK-10172 URL: https://issues.apache.org/jira/browse/SPARK-10172 Project: Spark Issue Type: Bug Affects Versions: 1.4.0, 1.4.1 Reporter: Min Shen If the history web UI displays the Attempt ID column, when clicking the table header to sort on any column, the entire page gets messed up. This seems to be a problem with the sorttable.js not able to correctly handle tables with rowspan. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10172) History Server web UI gets messed up when sorting on any column
[ https://issues.apache.org/jira/browse/SPARK-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10172: -- Priority: Minor (was: Major) History Server web UI gets messed up when sorting on any column --- Key: SPARK-10172 URL: https://issues.apache.org/jira/browse/SPARK-10172 Project: Spark Issue Type: Bug Affects Versions: 1.4.0, 1.4.1 Reporter: Min Shen Priority: Minor If the history web UI displays the Attempt ID column, when clicking the table header to sort on any column, the entire page gets messed up. This seems to be a problem with the sorttable.js not able to correctly handle tables with rowspan. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10174) refactor out project, filter, ordering generator from SparkPlan
[ https://issues.apache.org/jira/browse/SPARK-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10174: Assignee: (was: Apache Spark) refactor out project, filter, ordering generator from SparkPlan --- Key: SPARK-10174 URL: https://issues.apache.org/jira/browse/SPARK-10174 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10174) refactor out project, filter, ordering generator from SparkPlan
[ https://issues.apache.org/jira/browse/SPARK-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10174: Assignee: Apache Spark refactor out project, filter, ordering generator from SparkPlan --- Key: SPARK-10174 URL: https://issues.apache.org/jira/browse/SPARK-10174 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Assignee: Apache Spark Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10174) refactor out project, filter, ordering generator from SparkPlan
[ https://issues.apache.org/jira/browse/SPARK-10174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708361#comment-14708361 ] Apache Spark commented on SPARK-10174: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/8382 refactor out project, filter, ordering generator from SparkPlan --- Key: SPARK-10174 URL: https://issues.apache.org/jira/browse/SPARK-10174 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java
[ https://issues.apache.org/jira/browse/SPARK-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708396#comment-14708396 ] SonixLegend commented on SPARK-10173: - I have tested the Scala 2.11 ILoop and IMain from a Java program and they work successfully. The error only occurs when Java calls the Spark REPL built against Scala 2.11, not the Spark shell or the Scala shell. valueOfTerm get none when SparkIMain used on Java -- Key: SPARK-10173 URL: https://issues.apache.org/jira/browse/SPARK-10173 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.4.1 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8 Reporter: SonixLegend Fix For: 1.4.2 When I use Spark Repl on my Java project, I found if I had used the class SparkIMain via Scala 2.10 then the function valueOfTerm could get the term value, but if I had used same class via Scala 2.11, the valueOfTerm gave me a none result, and I checked the source code, found the line sym.owner.companionSymbol got none. Can you fix it? I used a alternative method, change valueOfTerm to eval, then it will get current result. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9730) Sort Merge Join for Full Outer Join
[ https://issues.apache.org/jira/browse/SPARK-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9730: --- Assignee: (was: Apache Spark) Sort Merge Join for Full Outer Join --- Key: SPARK-9730 URL: https://issues.apache.org/jira/browse/SPARK-9730 Project: Spark Issue Type: New Feature Components: SQL Reporter: Josh Rosen -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9730) Sort Merge Join for Full Outer Join
[ https://issues.apache.org/jira/browse/SPARK-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708443#comment-14708443 ] Apache Spark commented on SPARK-9730: - User 'viirya' has created a pull request for this issue: https://github.com/apache/spark/pull/8383 Sort Merge Join for Full Outer Join --- Key: SPARK-9730 URL: https://issues.apache.org/jira/browse/SPARK-9730 Project: Spark Issue Type: New Feature Components: SQL Reporter: Josh Rosen -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
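For reference, the kind of query the sort-merge path in that pull request targets is a plain full outer equi-join; below is a minimal DataFrame sketch (column names are made up).
{code}
import sqlContext.implicits._

val left  = sc.parallelize(Seq((1, "a"), (2, "b"))).toDF("id", "l")
val right = sc.parallelize(Seq((2, "x"), (3, "y"))).toDF("id", "r")

// "outer" is a full outer join; sort-merge support lets it run as a merge of the
// two sorted sides instead of falling back to a hash-based outer join.
val joined = left.join(right, left("id") === right("id"), "outer")
joined.show()
{code}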
[jira] [Assigned] (SPARK-9730) Sort Merge Join for Full Outer Join
[ https://issues.apache.org/jira/browse/SPARK-9730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9730: --- Assignee: Apache Spark Sort Merge Join for Full Outer Join --- Key: SPARK-9730 URL: https://issues.apache.org/jira/browse/SPARK-9730 Project: Spark Issue Type: New Feature Components: SQL Reporter: Josh Rosen Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-10156) sql in sql NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-10156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] qihuang.zheng closed SPARK-10156. - Resolution: Fixed Sorry, my fault: invoking sql().map does not fire an action, so row.get does not get a value; that is why the NPE happens. Changing the map call to collect().foreach works. sql in sql NullPointerException --- Key: SPARK-10156 URL: https://issues.apache.org/jira/browse/SPARK-10156 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.1 Reporter: qihuang.zheng I have two tables, an outer and an inner one. After I query the outer table, I use the row values to query the inner table, but I get a NullPointerException. Here is the example: val rdd1 = sc.parallelize(List((1,"a"),(1,"b"),(1,"c"))); rdd1.toDF("c1","c2").registerTempTable("test") cacheTable("test") val rdd2 = sc.parallelize(List(("a","A"),("a","AA"),("b","B"),("c","C"))); rdd2.toDF("c1","c2").registerTempTable("test2") cacheTable("test2") val rdd = sql("select * from test").map(row => { val k = row.getInt(0) val v = row.getString(1) val rddSet = sql(s"select collect_set(c2) from test2 where c1='$v'") val set = rddSet.first().getSeq[String](0).toSet (k,v,set) }) //NullPointerException rdd.count() And the NPE: org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 1.0 failed 4 times, most recent failure: Lost task 7.3 in stage 1.0 (TID 23, 192.168.6.53): java.lang.NullPointerException -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
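A hedged sketch of the working pattern the resolution describes: collect the outer query's rows to the driver and issue the inner query per row from there, since sql() cannot be invoked inside an executor-side map(). Table and column names follow the report above; collect_set assumes a HiveContext.
{code}
// Outer query runs and is collected to the driver; the per-row inner queries are
// then issued from the driver instead of from inside a distributed map().
val rows = sqlContext.sql("select * from test").collect()
val result = rows.map { row =>
  val k = row.getInt(0)
  val v = row.getString(1)
  val set = sqlContext.sql(s"select collect_set(c2) from test2 where c1='$v'")
    .first().getSeq[String](0).toSet
  (k, v, set)
}
{code}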
[jira] [Updated] (SPARK-9401) Fully implement code generation for ConcatWs
[ https://issues.apache.org/jira/browse/SPARK-9401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-9401: - Assignee: Yijie Shen Fully implement code generation for ConcatWs Key: SPARK-9401 URL: https://issues.apache.org/jira/browse/SPARK-9401 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Yijie Shen Fix For: 1.6.0 In ConcatWs, we fall back to interpreted mode if the input is a mix of string and array of strings. We should have code gen for those as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
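For reference, a query of the shape this remaining case covers mixes plain strings with an array of strings in concat_ws; the values below are arbitrary and only illustrate the input combination that previously fell back to interpreted evaluation.
{code}
// A mix of string literals and an array<string> argument to concat_ws; this is the
// combination that ConcatWs currently evaluates without generated code.
sqlContext.sql("SELECT concat_ws('-', 'prefix', array('a', 'b', 'c'), 'suffix')").show()
{code}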
[jira] [Updated] (SPARK-10130) type coercion for IF should have children resolved first
[ https://issues.apache.org/jira/browse/SPARK-10130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10130: -- Assignee: Adrian Wang type coercion for IF should have children resolved first Key: SPARK-10130 URL: https://issues.apache.org/jira/browse/SPARK-10130 Project: Spark Issue Type: Bug Components: SQL Reporter: Adrian Wang Assignee: Adrian Wang Priority: Blocker Fix For: 1.5.0 SELECT IF(a > 0, a, 0) FROM (SELECT key a FROM src) temp; -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10171) AWS Lambda Executors
[ https://issues.apache.org/jira/browse/SPARK-10171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10171: -- Component/s: EC2 AWS Lambda Executors Key: SPARK-10171 URL: https://issues.apache.org/jira/browse/SPARK-10171 Project: Spark Issue Type: Wish Components: EC2 Reporter: Jaka Jancar Priority: Minor It would be great if Spark supported using AWS Lambda for execution in addition to Standalone, Mesos and YARN, getting rid of the concept of a cluster and having a single infinite-sized one. Couple of problems I see today: - Execution time is limited to 60s. This will probably change in the future. - Burstiness is still not very high. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10172) History Server web UI gets messed up when sorting on any column
[ https://issues.apache.org/jira/browse/SPARK-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10172: -- Component/s: Web UI History Server web UI gets messed up when sorting on any column --- Key: SPARK-10172 URL: https://issues.apache.org/jira/browse/SPARK-10172 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.4.0, 1.4.1 Reporter: Min Shen Priority: Minor If the history web UI displays the Attempt ID column, when clicking the table header to sort on any column, the entire page gets messed up. This seems to be a problem with the sorttable.js not able to correctly handle tables with rowspan. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java
[ https://issues.apache.org/jira/browse/SPARK-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708320#comment-14708320 ] SonixLegend commented on SPARK-10173: - This is my test code. <dependencies> <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-repl_2.11</artifactId> <version>1.4.1</version> </dependency> </dependencies> SparkILoop iloop = new SparkILoop(); Settings settings = new Settings(); BooleanSetting setting = (BooleanSetting) settings.usejavacp(); setting.v_$eq(true); iloop.settings_$eq(settings); iloop.createInterpreter(); SparkIMain imain = iloop.intp(); try { imain.interpret("@transient var map = new java.util.HashMap[String, Object]()"); Map<String, Object> map = (Map<String, Object>) imain.eval("map"); //imain.valueOfTerm("map"); map.put("Test", "测试"); imain.interpret("@transient val string = map.get(\"Test\").asInstanceOf[String]"); imain.interpret("println(string)"); } catch (ScriptException e) { e.printStackTrace(); } valueOfTerm get none when SparkIMain used on Java -- Key: SPARK-10173 URL: https://issues.apache.org/jira/browse/SPARK-10173 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.4.1 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8 Reporter: SonixLegend Fix For: 1.4.2 When I use Spark Repl on my Java project, I found if I had used the class SparkIMain via Scala 2.10 then the function valueOfTerm could get the term value, but if I had used same class via Scala 2.11, the valueOfTerm gave me a none result, and I checked the source code, found the line sym.owner.companionSymbol got none. Can you fix it? I used a alternative method, change valueOfTerm to eval, then it will get current result. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java
[ https://issues.apache.org/jira/browse/SPARK-10173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708324#comment-14708324 ] Sean Owen commented on SPARK-10173: --- Is this specific to Spark, or to the Scala shell? I am not sure this is behavior of Spark per se. valueOfTerm get none when SparkIMain used on Java -- Key: SPARK-10173 URL: https://issues.apache.org/jira/browse/SPARK-10173 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.4.1 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8 Reporter: SonixLegend Fix For: 1.4.2 When I use Spark Repl on my Java project, I found if I had used the class SparkIMain via Scala 2.10 then the function valueOfTerm could get the term value, but if I had used same class via Scala 2.11, the valueOfTerm gave me a none result, and I checked the source code, found the line sym.owner.companionSymbol got none. Can you fix it? I used a alternative method, change valueOfTerm to eval, then it will get current result. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10174) refactor out project, filter, ordering generator from SparkPlan
Wenchen Fan created SPARK-10174: --- Summary: refactor out project, filter, ordering generator from SparkPlan Key: SPARK-10174 URL: https://issues.apache.org/jira/browse/SPARK-10174 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10173) valueOfTerm get none when SparkIMain used on Java
SonixLegend created SPARK-10173: --- Summary: valueOfTerm get none when SparkIMain used on Java Key: SPARK-10173 URL: https://issues.apache.org/jira/browse/SPARK-10173 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.4.1 Environment: Spark 1.4.1 Scala 2.11 Java 1.7.0_21 Windows 8 Reporter: SonixLegend Fix For: 1.4.2 When I use the Spark REPL in my Java project, I found that if I used the class SparkIMain built against Scala 2.10, the function valueOfTerm could get the term value, but with the same class built against Scala 2.11, valueOfTerm gave me a None result; checking the source code, I found that the line sym.owner.companionSymbol returns none. Can you fix it? I used an alternative method, changing valueOfTerm to eval, which returns the correct result. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4879) Missing output partitions after job completes with speculative execution
[ https://issues.apache.org/jira/browse/SPARK-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708331#comment-14708331 ] Igor Berman commented on SPARK-4879: There is a blog post http://tech.grammarly.com/blog/posts/Petabyte-Scale-Text-Processing-with-Spark.html with a link to a gist by Aaron Davidson: https://gist.github.com/aarondav/c513916e72101bbe14ec which mentions in the comments that when using speculation the output committer should be a DirectOutputCommitter. Maybe somebody will find it useful (IMHO, it should be in the documentation). Missing output partitions after job completes with speculative execution Key: SPARK-4879 URL: https://issues.apache.org/jira/browse/SPARK-4879 Project: Spark Issue Type: Bug Components: Input/Output, Spark Core Affects Versions: 1.0.2, 1.1.1, 1.2.0, 1.3.0 Reporter: Josh Rosen Assignee: Josh Rosen Priority: Critical Labels: backport-needed Fix For: 1.3.0 Attachments: speculation.txt, speculation2.txt When speculative execution is enabled ({{spark.speculation=true}}), jobs that save output files may report that they have completed successfully even though some output partitions written by speculative tasks may be missing. h3. Reproduction This symptom was reported to me by a Spark user and I've been doing my own investigation to try to come up with an in-house reproduction. I'm still working on a reliable local reproduction for this issue, which is a little tricky because Spark won't schedule speculated tasks on the same host as the original task, so you need an actual (or containerized) multi-host cluster to test speculation. Here's a simple reproduction of some of the symptoms on EC2, which can be run in {{spark-shell}} with {{--conf spark.speculation=true}}: {code} // Rig a job such that all but one of the tasks complete instantly // and one task runs for 20 seconds on its first attempt and instantly // on its second attempt: val numTasks = 100 sc.parallelize(1 to numTasks, numTasks).repartition(2).mapPartitionsWithContext { case (ctx, iter) => if (ctx.partitionId == 0) { // If this is the one task that should run really slow if (ctx.attemptId == 0) { // If this is the first attempt, run slow Thread.sleep(20 * 1000) } } iter }.map(x => (x, x)).saveAsTextFile("/test4") {code} When I run this, I end up with a job that completes quickly (due to speculation) but reports failures from the speculated task: {code} [...] 
14/12/11 01:41:13 INFO scheduler.TaskSetManager: Finished task 37.1 in stage 3.0 (TID 411) in 131 ms on ip-172-31-8-164.us-west-2.compute.internal (100/100) 14/12/11 01:41:13 INFO scheduler.DAGScheduler: Stage 3 (saveAsTextFile at console:22) finished in 0.856 s 14/12/11 01:41:13 INFO spark.SparkContext: Job finished: saveAsTextFile at console:22, took 0.885438374 s 14/12/11 01:41:13 INFO scheduler.TaskSetManager: Ignoring task-finished event for 70.1 in stage 3.0 because task 70 has already completed successfully scala 14/12/11 01:41:13 WARN scheduler.TaskSetManager: Lost task 49.1 in stage 3.0 (TID 413, ip-172-31-8-164.us-west-2.compute.internal): java.io.IOException: Failed to save output of task: attempt_201412110141_0003_m_49_413 org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:160) org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:172) org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:132) org.apache.spark.SparkHadoopWriter.commit(SparkHadoopWriter.scala:109) org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:991) org.apache.spark.rdd.PairRDDFunctions$$anonfun$13.apply(PairRDDFunctions.scala:974) org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62) org.apache.spark.scheduler.Task.run(Task.scala:54) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:745) {code} One interesting thing to note about this stack trace: if we look at {{FileOutputCommitter.java:160}} ([link|http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-core/2.5.0-mr1-cdh5.2.0/org/apache/hadoop/mapred/FileOutputCommitter.java#160]), this point in the execution seems to correspond to a case where a task completes, attempts to commit its
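As a rough sketch of the workaround mentioned in the comment on SPARK-4879 above: turn on speculation and point the old-API Hadoop output committer at a direct committer. DirectOutputCommitter is not bundled with Spark, so the class name below is a placeholder for whatever you compile from the linked gist.
{code}
import org.apache.spark.SparkConf

// Speculation plus a direct (no temporary attempt directory) output committer;
// "com.example.DirectOutputCommitter" stands in for your own compiled class.
val conf = new SparkConf()
  .set("spark.speculation", "true")
  .set("spark.hadoop.mapred.output.committer.class", "com.example.DirectOutputCommitter")
val sc = new org.apache.spark.SparkContext(conf)
{code}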
[jira] [Resolved] (SPARK-10164) GMM bug: match error
[ https://issues.apache.org/jira/browse/SPARK-10164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-10164. --- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 8370 [https://github.com/apache/spark/pull/8370] GMM bug: match error Key: SPARK-10164 URL: https://issues.apache.org/jira/browse/SPARK-10164 Project: Spark Issue Type: Bug Components: MLlib Affects Versions: 1.5.0 Reporter: Joseph K. Bradley Assignee: Joseph K. Bradley Priority: Critical Fix For: 1.5.0 GaussianMixture now distributes matrix decompositions for certain problem sizes. Distributed computation actually fails, but this was not tested in unit tests. This is a regression. Here is an example failure: {code} Exception in thread main scala.MatchError: ArrayBuffer(0.05001, 0.05001, 0.05001, 0.05 001, 0.05001, 0.05001, 0.05001, 0.05001, 0.05001, 0.05001, 0.050 01, 0.05001, 0.05001, 0.05001, 0.05001, 0.05001, 0.05000 0001, 0.05001, 0.05001, 0.05001) (of class scala.collection.mutable.ArrayBuffer) at scala.runtime.ScalaRunTime$.array_apply(ScalaRunTime.scala:71) at scala.Array$.slowcopy(Array.scala:81) at scala.Array$.copy(Array.scala:107) at org.apache.spark.mllib.clustering.GaussianMixture.run(GaussianMixture.scala:215) at mllib.perf.clustering.GaussianMixtureTest.run(GaussianMixtureTest.scala:60) at mllib.perf.TestRunner$$anonfun$2.apply(TestRunner.scala:66) at mllib.perf.TestRunner$$anonfun$2.apply(TestRunner.scala:64) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) at scala.collection.immutable.Range.foreach(Range.scala:141) at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) at scala.collection.AbstractTraversable.map(Traversable.scala:105) at mllib.perf.TestRunner$.main(TestRunner.scala:64) at mllib.perf.TestRunner.main(TestRunner.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 15/08/21 21:25:33 INFO spark.SparkContext: Invoking stop() from shutdown hook {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
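For context, exercising the affected code path only requires running GaussianMixture at a problem size large enough that it distributes the matrix decompositions; the k, dimensionality, and data below are illustrative only.
{code}
import org.apache.spark.mllib.clustering.GaussianMixture
import org.apache.spark.mllib.linalg.Vectors

// Illustrative: a larger k and feature count push GaussianMixture onto the
// distributed decomposition path where the MatchError was thrown.
val data = sc.parallelize(Seq.fill(1000)(Vectors.dense(Array.fill(50)(math.random))))
val model = new GaussianMixture().setK(20).run(data)
println(model.weights.mkString(", "))
{code}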
[jira] [Commented] (SPARK-9671) ML 1.5 QA: Programming guide update and migration guide
[ https://issues.apache.org/jira/browse/SPARK-9671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708625#comment-14708625 ] Joseph K. Bradley commented on SPARK-9671: -- Should mention change to GradientDescent to use convergence tolerance: [SPARK-3382]. This will change the behavior by stopping early in some cases. Note: If we do not want to change the behavior, we should set the default tolerance to 0. ML 1.5 QA: Programming guide update and migration guide --- Key: SPARK-9671 URL: https://issues.apache.org/jira/browse/SPARK-9671 Project: Spark Issue Type: Sub-task Components: MLlib Reporter: Joseph K. Bradley Priority: Critical Before the release, we need to update the MLlib Programming Guide. Updates will include: * Add migration guide subsection. ** Use the results of the QA audit JIRAs. * Check phrasing, especially in main sections (for outdated items such as In this release, ...) * Possibly reorganize parts of the Pipelines guide if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
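A short sketch of the knob this note refers to; for algorithms whose optimizer is GradientDescent, setting the tolerance to 0.0 should restore the old always-run-numIterations behavior (assuming the setter name introduced by SPARK-3382).
{code}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD

// GradientDescent can now stop early once successive solutions differ by less than
// convergenceTol; a tolerance of 0.0 effectively disables the early stop.
val lr = new LogisticRegressionWithSGD()
lr.optimizer
  .setNumIterations(100)
  .setConvergenceTol(0.0)
{code}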
[jira] [Updated] (SPARK-10172) History Server web UI gets messed up when sorting on any column
[ https://issues.apache.org/jira/browse/SPARK-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Min Shen updated SPARK-10172: - Attachment: screen-shot.png [~srowen], Screen shot attached. When Attempt ID column is displayed, after sorting based on any column, the columns in the table become misaligned. History Server web UI gets messed up when sorting on any column --- Key: SPARK-10172 URL: https://issues.apache.org/jira/browse/SPARK-10172 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.4.0, 1.4.1 Reporter: Min Shen Priority: Minor Attachments: screen-shot.png If the history web UI displays the Attempt ID column, when clicking the table header to sort on any column, the entire page gets messed up. This seems to be a problem with the sorttable.js not able to correctly handle tables with rowspan. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10176) Show partially analyzed plan when checkAnswer df fails to resolve
Michael Armbrust created SPARK-10176: Summary: Show partially analyzed plan when checkAnswer df fails to resolve Key: SPARK-10176 URL: https://issues.apache.org/jira/browse/SPARK-10176 Project: Spark Issue Type: Bug Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust It would be much easier to debug test failures if we could see the failed plan instead of just the user friendly error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9570) Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x'.
[ https://issues.apache.org/jira/browse/SPARK-9570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708600#comment-14708600 ] Apache Spark commented on SPARK-9570: - User 'nssalian' has created a pull request for this issue: https://github.com/apache/spark/pull/8385 Consistent recommendation for submitting spark apps to YARN, -master yarn --deploy-mode x vs -master yarn-x'. - Key: SPARK-9570 URL: https://issues.apache.org/jira/browse/SPARK-9570 Project: Spark Issue Type: Improvement Components: Documentation, Spark Submit, YARN Affects Versions: 1.4.1 Reporter: Neelesh Srinivas Salian Priority: Minor Labels: starter There are still some inconsistencies in the documentation regarding submission of applications to YARN. SPARK-3629 was done to correct this, but http://spark.apache.org/docs/latest/submitting-applications.html#master-urls still lists yarn-client and yarn-cluster as opposed to the norm of --master yarn and --deploy-mode cluster / client. This needs to be changed appropriately (if needed) to avoid confusion: https://spark.apache.org/docs/latest/running-on-yarn.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10145) Executor exit without useful messages when spark runs in spark-streaming
[ https://issues.apache.org/jira/browse/SPARK-10145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708653#comment-14708653 ] Baogang Wang commented on SPARK-10145: -- Any update? This issue can be reproduced. Executor exit without useful messages when spark runs in spark-streaming Key: SPARK-10145 URL: https://issues.apache.org/jira/browse/SPARK-10145 Project: Spark Issue Type: Bug Components: Streaming, YARN Environment: spark 1.3.1, hadoop 2.6.0, 6 nodes, each node has 32 cores and 32g memory Reporter: Baogang Wang Priority: Critical Original Estimate: 168h Remaining Estimate: 168h Each node is allocated 30g memory by Yarn. My application receives messages from Kafka by directstream. Each application consists of 4 dstream window Spark application is submitted by this command: spark-submit --class spark_security.safe.SafeSockPuppet --driver-memory 3g --executor-memory 3g --num-executors 3 --executor-cores 4 --name safeSparkDealerUser --master yarn --deploy-mode cluster spark_Security-1.0-SNAPSHOT.jar.nocalse hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/spark_properties/safedealer.properties After about 1 hours, some executor exits. There is no more yarn logs after the executor exits and there is no stack when the executor exits. When I see the yarn node manager log, it shows as follows : 2015-08-17 17:25:41,550 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Start request for container_1439803298368_0005_01_01 by user root 2015-08-17 17:25:41,551 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl: Creating a new application reference for app application_1439803298368_0005 2015-08-17 17:25:41,551 INFO org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=root IP=172.19.160.102 OPERATION=Start Container Request TARGET=ContainerManageImpl RESULT=SUCCESS APPID=application_1439803298368_0005 CONTAINERID=container_1439803298368_0005_01_01 2015-08-17 17:25:41,551 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1439803298368_0005 transitioned from NEW to INITING 2015-08-17 17:25:41,552 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Adding container_1439803298368_0005_01_01 to application application_1439803298368_0005 2015-08-17 17:25:41,557 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished. 
2015-08-17 17:25:41,663 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1439803298368_0005 transitioned from INITING to RUNNING 2015-08-17 17:25:41,664 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container: Container container_1439803298368_0005_01_01 transitioned from NEW to LOCALIZING 2015-08-17 17:25:41,664 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event CONTAINER_INIT for appId application_1439803298368_0005 2015-08-17 17:25:41,664 INFO org.apache.spark.network.yarn.YarnShuffleService: Initializing container container_1439803298368_0005_01_01 2015-08-17 17:25:41,665 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/spark-assembly-1.3.1-hadoop2.6.0.jar transitioned from INIT to DOWNLOADING 2015-08-17 17:25:41,665 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://A01-R08-3-I160-102.JD.LOCAL:9000/user/root/.sparkStaging/application_1439803298368_0005/spark_Security-1.0-SNAPSHOT.jar transitioned from INIT to DOWNLOADING 2015-08-17 17:25:41,665 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Created localizer for container_1439803298368_0005_01_01 2015-08-17 17:25:41,668 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Writing credentials to the nmPrivate file /export/servers/hadoop2.6.0/tmp/nm-local-dir/nmPrivate/container_1439803298368_0005_01_01.tokens. Credentials list: 2015-08-17 17:25:41,682 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Initializing user root 2015-08-17 17:25:41,686 INFO
[jira] [Created] (SPARK-10175) Enhance spark doap file
Luciano Resende created SPARK-10175: --- Summary: Enhance spark doap file Key: SPARK-10175 URL: https://issues.apache.org/jira/browse/SPARK-10175 Project: Spark Issue Type: Bug Components: Project Infra Reporter: Luciano Resende The Spark doap has broken links and is also missing entries related to issue tracker and mailing lists. This affects the list in projects.apache.org and also in the main apache website. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9803) Add transform and subset to DataFrame
[ https://issues.apache.org/jira/browse/SPARK-9803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708536#comment-14708536 ] Felix Cheung commented on SPARK-9803: - subset would be a synonym for `[` https://issues.apache.org/jira/browse/SPARK-9316 Add transform and subset to DataFrame -- Key: SPARK-9803 URL: https://issues.apache.org/jira/browse/SPARK-9803 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Hossein Falaki These three base functions are heavily used with R dataframes. It would be great to have them work with Spark DataFrames: * transform * subset -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10118) Improve SparkR API docs for 1.5 release
[ https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708645#comment-14708645 ] Yu Ishikawa commented on SPARK-10118: - [~shivaram] sure. I'll send a PR about that later. Improve SparkR API docs for 1.5 release --- Key: SPARK-10118 URL: https://issues.apache.org/jira/browse/SPARK-10118 Project: Spark Issue Type: Documentation Components: Documentation, SparkR Reporter: Shivaram Venkataraman This includes checking if the new DataFrame functions expression show up appropriately in the roxygen docs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10175) Enhance spark doap file
[ https://issues.apache.org/jira/browse/SPARK-10175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luciano Resende updated SPARK-10175: Attachment: SPARK-10175 Updates to the doap located on the website svn repository. Enhance spark doap file --- Key: SPARK-10175 URL: https://issues.apache.org/jira/browse/SPARK-10175 Project: Spark Issue Type: Bug Components: Project Infra Reporter: Luciano Resende Attachments: SPARK-10175 The Spark doap has broken links and is also missing entries related to issue tracker and mailing lists. This affects the list in projects.apache.org and also in the main apache website. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10039) Resetting REPL state not work
[ https://issues.apache.org/jira/browse/SPARK-10039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708537#comment-14708537 ] ASF GitHub Bot commented on SPARK-10039: Github user felixcheung commented on the pull request: https://github.com/apache/incubator-zeppelin/pull/228#issuecomment-133934888 We could call `sc.stop()` or `SparkIMain.reset()`? Though apparently SparkIMain reset() has some issue: https://issues.apache.org/jira/browse/SPARK-10039 Resetting REPL state not work - Key: SPARK-10039 URL: https://issues.apache.org/jira/browse/SPARK-10039 Project: Spark Issue Type: Bug Components: Spark Shell Affects Versions: 1.4.1 Reporter: Kevin Jung Priority: Minor The Spark shell can't find the base directory of the class server after running the :reset command. {quote} scala> :reset scala> 1 uncaught exception during compilation: java.lang.AssertionError java.lang.AssertionError: assertion failed: Tried to find '$line33' in '/tmp/spark-f47f3917-ac31-4138-bf1a-a8cefd094ac3' but it is not a directory ~~~it is impossible to run any command anymore, including 'exit'~~~ {quote} I figured out that the reset() method in SparkIMain tries to delete the virtualDirectory and then create it again, but virtualDirectory.create() makes a file, not a directory. Details here. {quote} drwxrwxr-x. 3 root root 0 2015-08-17 09:09 spark-9cfc6b06-c902-4caf-8712-9ea63f17d017 (After :reset) \-rw-rw-r--. 1 root root 0 2015-08-17 09:09 spark-9cfc6b06-c902-4caf-8712-9ea63f17d017 {quote} Running vd.delete; vd.givenPath.createDirectory(true); will temporarily solve the problem. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10118) Improve SparkR API docs for 1.5 release
[ https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708695#comment-14708695 ] Apache Spark commented on SPARK-10118: -- User 'yu-iskw' has created a pull request for this issue: https://github.com/apache/spark/pull/8386 Improve SparkR API docs for 1.5 release --- Key: SPARK-10118 URL: https://issues.apache.org/jira/browse/SPARK-10118 Project: Spark Issue Type: Documentation Components: Documentation, SparkR Reporter: Shivaram Venkataraman This includes checking that the new DataFrame functions and expressions show up appropriately in the roxygen docs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10118) Improve SparkR API docs for 1.5 release
[ https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10118: Assignee: (was: Apache Spark) Improve SparkR API docs for 1.5 release --- Key: SPARK-10118 URL: https://issues.apache.org/jira/browse/SPARK-10118 Project: Spark Issue Type: Documentation Components: Documentation, SparkR Reporter: Shivaram Venkataraman This includes checking that the new DataFrame functions and expressions show up appropriately in the roxygen docs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10118) Improve SparkR API docs for 1.5 release
[ https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10118: Assignee: Apache Spark Improve SparkR API docs for 1.5 release --- Key: SPARK-10118 URL: https://issues.apache.org/jira/browse/SPARK-10118 Project: Spark Issue Type: Documentation Components: Documentation, SparkR Reporter: Shivaram Venkataraman Assignee: Apache Spark This includes checking that the new DataFrame functions and expressions show up appropriately in the roxygen docs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10134) Improve the performance of Binary Comparison
[ https://issues.apache.org/jira/browse/SPARK-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-10134: -- Priority: Minor (was: Major) Improve the performance of Binary Comparison Key: SPARK-10134 URL: https://issues.apache.org/jira/browse/SPARK-10134 Project: Spark Issue Type: Improvement Components: SQL Reporter: Cheng Hao Priority: Minor Currently, comparing binary values byte by byte is quite slow; using the Guava utility, which compares 8 bytes at a time, would improve performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10178) HiveComparision test should print out dependent tables
Michael Armbrust created SPARK-10178: Summary: HiveComparision test should print out dependent tables Key: SPARK-10178 URL: https://issues.apache.org/jira/browse/SPARK-10178 Project: Spark Issue Type: Improvement Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10177) Parquet support interprets timestamp values differently from Hive 0.14.0+
Cheng Lian created SPARK-10177: -- Summary: Parquet support interprets timestamp values differently from Hive 0.14.0+ Key: SPARK-10177 URL: https://issues.apache.org/jira/browse/SPARK-10177 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: Cheng Lian Assignee: Cheng Lian Priority: Blocker Running the following SQL under Hive 0.14.0+ (tested against 0.14.0 and 1.2.1): {code:sql} CREATE TABLE ts_test STORED AS PARQUET AS SELECT CAST('2015-01-01 00:00:00' AS TIMESTAMP); {code} Then read the Parquet file generated by Hive with Spark SQL: {noformat} scala> sqlContext.read.parquet("hdfs://localhost:9000/user/hive/warehouse_hive14/ts_test").collect() res1: Array[org.apache.spark.sql.Row] = Array([2015-01-01 12:00:00.0]) {noformat} The timestamp read back by Spark SQL differs from the 00:00:00 value that was written. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10176) Show partially analyzed plan when checkAnswer df fails to resolve
[ https://issues.apache.org/jira/browse/SPARK-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10176: Assignee: Michael Armbrust (was: Apache Spark) Show partially analyzed plan when checkAnswer df fails to resolve - Key: SPARK-10176 URL: https://issues.apache.org/jira/browse/SPARK-10176 Project: Spark Issue Type: Bug Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust It would be much easier to debug test failures if we could see the failed plan instead of just the user friendly error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10176) Show partially analyzed plan when checkAnswer df fails to resolve
[ https://issues.apache.org/jira/browse/SPARK-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708818#comment-14708818 ] Apache Spark commented on SPARK-10176: -- User 'marmbrus' has created a pull request for this issue: https://github.com/apache/spark/pull/8389 Show partially analyzed plan when checkAnswer df fails to resolve - Key: SPARK-10176 URL: https://issues.apache.org/jira/browse/SPARK-10176 Project: Spark Issue Type: Bug Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust It would be much easier to debug test failures if we could see the failed plan instead of just the user friendly error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10179) LogisticRegressionWithSGD does not support multiclass
Shiqiao Du created SPARK-10179: -- Summary: LogisticRegressionWithSGD does not support multiclass Key: SPARK-10179 URL: https://issues.apache.org/jira/browse/SPARK-10179 Project: Spark Issue Type: Bug Components: MLlib Environment: Ubuntu 14.04, spark 1.4.1 Reporter: Shiqiao Du LogisticRegressionWithSGD does not support multi-class input, while LogisticRegressionWithLBFGS does. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
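A minimal sketch of the LBFGS-based alternative mentioned above; the toy data, feature dimensions, and app name are made up for illustration. Unlike the SGD variant, LogisticRegressionWithLBFGS accepts a number of classes greater than two via setNumClasses.
{code:scala}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

object MulticlassLRExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("multiclass-lr").setMaster("local[2]"))
    // Toy three-class training set with labels 0.0, 1.0 and 2.0.
    val training = sc.parallelize(Seq(
      LabeledPoint(0.0, Vectors.dense(0.0, 1.0)),
      LabeledPoint(1.0, Vectors.dense(1.0, 0.0)),
      LabeledPoint(2.0, Vectors.dense(1.0, 1.0))
    ))
    // LBFGS-based logistic regression supports more than two classes.
    val model = new LogisticRegressionWithLBFGS()
      .setNumClasses(3)
      .run(training)
    println(model.predict(Vectors.dense(1.0, 1.0)))
    sc.stop()
  }
}
{code}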
[jira] [Commented] (SPARK-10178) HiveComparision test should print out dependent tables
[ https://issues.apache.org/jira/browse/SPARK-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708807#comment-14708807 ] Apache Spark commented on SPARK-10178: -- User 'marmbrus' has created a pull request for this issue: https://github.com/apache/spark/pull/8388 HiveComparision test should print out dependent tables -- Key: SPARK-10178 URL: https://issues.apache.org/jira/browse/SPARK-10178 Project: Spark Issue Type: Improvement Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10178) HiveComparision test should print out dependent tables
[ https://issues.apache.org/jira/browse/SPARK-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10178: Assignee: Apache Spark (was: Michael Armbrust) HiveComparision test should print out dependent tables -- Key: SPARK-10178 URL: https://issues.apache.org/jira/browse/SPARK-10178 Project: Spark Issue Type: Improvement Components: SQL Reporter: Michael Armbrust Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10178) HiveComparision test should print out dependent tables
[ https://issues.apache.org/jira/browse/SPARK-10178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10178: Assignee: Michael Armbrust (was: Apache Spark) HiveComparision test should print out dependent tables -- Key: SPARK-10178 URL: https://issues.apache.org/jira/browse/SPARK-10178 Project: Spark Issue Type: Improvement Components: SQL Reporter: Michael Armbrust Assignee: Michael Armbrust -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10176) Show partially analyzed plan when checkAnswer df fails to resolve
[ https://issues.apache.org/jira/browse/SPARK-10176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10176: Assignee: Apache Spark (was: Michael Armbrust) Show partially analyzed plan when checkAnswer df fails to resolve - Key: SPARK-10176 URL: https://issues.apache.org/jira/browse/SPARK-10176 Project: Spark Issue Type: Bug Components: SQL Reporter: Michael Armbrust Assignee: Apache Spark It would be much easier to debug test failures if we could see the failed plan instead of just the user friendly error message. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10142) Python Streaming checkpoint recovery does not work with non-local file path
[ https://issues.apache.org/jira/browse/SPARK-10142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-10142. --- Resolution: Fixed Fix Version/s: 1.5.0 Python Streaming checkpoint recovery does not work with non-local file path --- Key: SPARK-10142 URL: https://issues.apache.org/jira/browse/SPARK-10142 Project: Spark Issue Type: Bug Components: PySpark, Streaming Affects Versions: 1.3.1, 1.4.1 Reporter: Tathagata Das Assignee: Tathagata Das Priority: Critical Fix For: 1.5.0 The Python code in StreamingContext.getOrCreate() checks whether the given checkpointPath exists on the local file system. The solution is to use the same code path as Java to verify whether a valid checkpoint is present or not. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
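For comparison, a sketch of the JVM-side code path the fix defers to (the checkpoint directory, batch interval, and app name below are placeholders): StreamingContext.getOrCreate looks for a valid checkpoint through the Hadoop filesystem API rather than only on the local filesystem, and only invokes the factory function when none is found.
{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Placeholder checkpoint location; any Hadoop-supported filesystem works here.
val checkpointDir = "hdfs:///tmp/streaming-checkpoint"

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("checkpoint-recovery")
  val ssc = new StreamingContext(conf, Seconds(10))
  ssc.checkpoint(checkpointDir)
  ssc
}

// Recovers from an existing checkpoint if one is found, otherwise creates a new context.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
{code}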
[jira] [Commented] (SPARK-979) Add some randomization to scheduler to better balance in-memory partition distributions
[ https://issues.apache.org/jira/browse/SPARK-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708739#comment-14708739 ] Apache Spark commented on SPARK-979: User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/8387 Add some randomization to scheduler to better balance in-memory partition distributions --- Key: SPARK-979 URL: https://issues.apache.org/jira/browse/SPARK-979 Project: Spark Issue Type: Improvement Reporter: Reynold Xin Assignee: Kay Ousterhout Fix For: 1.0.0 The Spark scheduler is very deterministic, which causes problems for the following workload (run in serial order on a cluster with a small number of nodes): cache rdd 1 with 1 partition, cache rdd 2 with 1 partition, cache rdd 3 with 1 partition. After a while, only executor 1 will have data in memory, which eventually leads to evicting in-memory blocks to disk while all other executors remain empty. We can solve this problem by adding some randomization to the cluster scheduling, or by adding memory-aware scheduling (which is much harder to do). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
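A minimal sketch of that workload (RDD sizes and the app name are illustrative only): each cached RDD has exactly one partition, so with fully deterministic placement all of those partitions can end up on the same executor.
{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("single-partition-caching"))

// Cache several one-partition RDDs in serial order; materialize each one so the
// scheduler actually places (and caches) its single partition somewhere.
val cached = (1 to 3).map { _ =>
  val rdd = sc.parallelize(1 to 1000000, numSlices = 1).cache()
  rdd.count()
  rdd
}
{code}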
[jira] [Updated] (SPARK-10134) Improve the performance of Binary Comparison
[ https://issues.apache.org/jira/browse/SPARK-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Hao updated SPARK-10134: -- Fix Version/s: (was: 1.6.0) Improve the performance of Binary Comparison Key: SPARK-10134 URL: https://issues.apache.org/jira/browse/SPARK-10134 Project: Spark Issue Type: Improvement Components: SQL Reporter: Cheng Hao Priority: Minor Currently, comparing binary values byte by byte is quite slow; using the Guava utility, which compares 8 bytes at a time, would improve performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10134) Improve the performance of Binary Comparison
[ https://issues.apache.org/jira/browse/SPARK-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708766#comment-14708766 ] Cheng Hao commented on SPARK-10134: --- We can improve that by enabling comparison 8 bytes at a time on a 64-bit OS. https://bugs.openjdk.java.net/browse/JDK-8033148 Improve the performance of Binary Comparison Key: SPARK-10134 URL: https://issues.apache.org/jira/browse/SPARK-10134 Project: Spark Issue Type: Improvement Components: SQL Reporter: Cheng Hao Currently, comparing binary values byte by byte is quite slow; using the Guava utility, which compares 8 bytes at a time, would improve performance. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
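A sketch of the Guava utility being referred to (the usage below is illustrative, not the patch itself): UnsignedBytes.lexicographicalComparator() selects an Unsafe-backed implementation on 64-bit JVMs that compares eight bytes per iteration instead of one.
{code:scala}
import com.google.common.primitives.UnsignedBytes

// Lexicographic, unsigned byte-array comparison; on 64-bit JVMs Guava swaps in
// a comparator that reads 8 bytes at a time via sun.misc.Unsafe.
val cmp = UnsignedBytes.lexicographicalComparator()

def compareBinary(a: Array[Byte], b: Array[Byte]): Int = cmp.compare(a, b)

// Negative result: 0x010203 sorts before 0x0200 when bytes are compared as unsigned.
println(compareBinary(Array[Byte](1, 2, 3), Array[Byte](2, 0)))
{code}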