[jira] [Commented] (SPARK-6006) Optimize count distinct in case of high cardinality columns

2015-02-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336430#comment-14336430 ] Apache Spark commented on SPARK-6006: - User 'saucam' has created a pull request for

[jira] [Commented] (SPARK-5983) Don't respond to HTTP TRACE in HTTP-based UIs

2015-02-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336440#comment-14336440 ] Apache Spark commented on SPARK-5983: - User 'srowen' has created a pull request for

[jira] [Created] (SPARK-6007) Add numRows param in DataFrame.show

2015-02-25 Thread Jacky Li (JIRA)
Jacky Li created SPARK-6007: --- Summary: Add numRows param in DataFrame.show Key: SPARK-6007 URL: https://issues.apache.org/jira/browse/SPARK-6007 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-5666) Improvements in Mqtt Spark Streaming

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5666. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 4178

[jira] [Commented] (SPARK-6005) Flaky test: o.a.s.streaming.kafka.DirectKafkaStreamSuite.offset recovery

2015-02-25 Thread Iulian Dragos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336542#comment-14336542 ] Iulian Dragos commented on SPARK-6005: -- Looks similar, but unless I miss something,

[jira] [Commented] (SPARK-5837) HTTP 500 if try to access Spark UI in yarn-cluster or yarn-client mode

2015-02-25 Thread Marco Capuccini (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336475#comment-14336475 ] Marco Capuccini commented on SPARK-5837: setting yarn.resourcemanager.hostname

[jira] [Updated] (SPARK-5666) Improvements in Mqtt Spark Streaming

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5666: - Assignee: Prabeesh K Improvements in Mqtt Spark Streaming -

[jira] [Commented] (SPARK-5947) First class partitioning support in data sources API

2015-02-25 Thread Philippe Girolami (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336556#comment-14336556 ] Philippe Girolami commented on SPARK-5947: -- For some workloads, it can make more

[jira] [Created] (SPARK-6006) Optimize count distinct in case high cardinality columns

2015-02-25 Thread Yash Datta (JIRA)
Yash Datta created SPARK-6006: - Summary: Optimize count distinct in case high cardinality columns Key: SPARK-6006 URL: https://issues.apache.org/jira/browse/SPARK-6006 Project: Spark Issue Type:

[jira] [Updated] (SPARK-6006) Optimize count distinct in case of high cardinality columns

2015-02-25 Thread Yash Datta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yash Datta updated SPARK-6006: -- Summary: Optimize count distinct in case of high cardinality columns (was: Optimize count distinct in

[jira] [Commented] (SPARK-5978) Spark examples cannot compile with Hadoop 2

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336443#comment-14336443 ] Sean Owen commented on SPARK-5978: -- Hm, not sure I can reproduce this. If you build for

[jira] [Commented] (SPARK-6007) Add numRows param in DataFrame.show

2015-02-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336470#comment-14336470 ] Apache Spark commented on SPARK-6007: - User 'jackylk' has created a pull request for

[jira] [Resolved] (SPARK-5771) Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5771. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 4567

[jira] [Commented] (SPARK-4010) Spark UI returns 500 in yarn-client mode

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336654#comment-14336654 ] Sean Owen commented on SPARK-4010: -- [~Hanchen] see SPARK-5837 for a possible explanation.

[jira] [Commented] (SPARK-5940) Graph Loader: refactor + add more formats

2015-02-25 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336677#comment-14336677 ] Takeshi Yamamuro commented on SPARK-5940: - I made a quick fix as follows;

[jira] [Commented] (SPARK-4010) Spark UI returns 500 in yarn-client mode

2015-02-25 Thread Hanchen Su (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336644#comment-14336644 ] Hanchen Su commented on SPARK-4010: --- I still have the problem in Spark 1.2.1 Spark UI

[jira] [Updated] (SPARK-5771) Number of Cores in Completed Applications of Standalone Master Web Page always be 0 if sc.stop() is called

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5771: - Assignee: Liangliang Gu Number of Cores in Completed Applications of Standalone Master Web Page always

[jira] [Created] (SPARK-6009) IllegalArgumentException thrown by TimSort when SQL ORDER BY RAND ()

2015-02-25 Thread Paul Barber (JIRA)
Paul Barber created SPARK-6009: -- Summary: IllegalArgumentException thrown by TimSort when SQL ORDER BY RAND () Key: SPARK-6009 URL: https://issues.apache.org/jira/browse/SPARK-6009 Project: Spark

[jira] [Commented] (SPARK-5750) Document that ordering of elements in shuffled partitions is not deterministic across runs

2015-02-25 Thread Ilya Ganelin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336795#comment-14336795 ] Ilya Ganelin commented on SPARK-5750: - Did you have a particular doc in mind to

[jira] [Commented] (SPARK-6010) Exception thrown when reading Spark SQL generated Parquet files with different but compatible schemas

2015-02-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336796#comment-14336796 ] Apache Spark commented on SPARK-6010: - User 'liancheng' has created a pull request for

[jira] [Commented] (SPARK-5750) Document that ordering of elements in shuffled partitions is not deterministic across runs

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336864#comment-14336864 ] Sean Owen commented on SPARK-5750: -- Personally I think that would be a fine way forward,

[jira] [Resolved] (SPARK-6008) zip two rdds derived from pickleFile fails

2015-02-25 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-6008. --- Resolution: Duplicate Fix Version/s: 1.2.2 1.3.0 Assignee: Davies

[jira] [Created] (SPARK-6008) zip two rdds derived from pickleFile fails

2015-02-25 Thread Charles Hayden (JIRA)
Charles Hayden created SPARK-6008: - Summary: zip two rdds derived from pickleFile fails Key: SPARK-6008 URL: https://issues.apache.org/jira/browse/SPARK-6008 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-6010) Exception thrown when reading Spark SQL generated Parquet files with different but compatible schemas

2015-02-25 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-6010: - Summary: Exception thrown when reading Spark SQL generated Parquet files with different but compatible schemas Key: SPARK-6010 URL: https://issues.apache.org/jira/browse/SPARK-6010

[jira] [Updated] (SPARK-6012) Deadlock when asking for partitions from CoalescedRDD on top of a TakeOrdered operator

2015-02-25 Thread Max Seiden (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Seiden updated SPARK-6012: -- Summary: Deadlock when asking for partitions from CoalescedRDD on top of a TakeOrdered operator (was:

[jira] [Created] (SPARK-6015) Python docs' source code links are all broken

2015-02-25 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-6015: Summary: Python docs' source code links are all broken Key: SPARK-6015 URL: https://issues.apache.org/jira/browse/SPARK-6015 Project: Spark Issue

[jira] [Updated] (SPARK-6012) Deadlock when asking for SchemaRDD partitions with TakeOrdered operator

2015-02-25 Thread Max Seiden (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Seiden updated SPARK-6012: -- Description: h3. Summary I've found that a deadlock occurs when asking for the partitions from a

[jira] [Commented] (SPARK-6004) Pick the best model when training GradientBoostedTrees with validation

2015-02-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337020#comment-14337020 ] Joseph K. Bradley commented on SPARK-6004: -- Can you please add a short JIRA

[jira] [Commented] (SPARK-5124) Standardize internal RPC interface

2015-02-25 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337028#comment-14337028 ] Reynold Xin commented on SPARK-5124: I went and looked at the various use cases of

[jira] [Created] (SPARK-6014) java.io.IOException: Filesystem is thrown when ctrl+c or ctrl+d spark-sql on YARN

2015-02-25 Thread Cheolsoo Park (JIRA)
Cheolsoo Park created SPARK-6014: Summary: java.io.IOException: Filesystem is thrown when ctrl+c or ctrl+d spark-sql on YARN Key: SPARK-6014 URL: https://issues.apache.org/jira/browse/SPARK-6014

[jira] [Commented] (SPARK-6014) java.io.IOException: Filesystem is thrown when ctrl+c or ctrl+d spark-sql on YARN

2015-02-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337055#comment-14337055 ] Apache Spark commented on SPARK-6014: - User 'piaozhexiu' has created a pull request

[jira] [Commented] (SPARK-5750) Document that ordering of elements in shuffled partitions is not deterministic across runs

2015-02-25 Thread Ilya Ganelin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337104#comment-14337104 ] Ilya Ganelin commented on SPARK-5750: - I'd be happy to pull those in. Is it fine to

[jira] [Commented] (SPARK-5978) Spark examples cannot compile with Hadoop 2

2015-02-25 Thread Michael Nazario (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336914#comment-14336914 ] Michael Nazario commented on SPARK-5978: I was building this with 2.0.0-cdh4.7.0.

[jira] [Commented] (SPARK-5930) Documented default of spark.shuffle.io.retryWait is confusing

2015-02-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336926#comment-14336926 ] Apache Spark commented on SPARK-5930: - User 'srowen' has created a pull request for

[jira] [Commented] (SPARK-3441) Explain in docs that repartitionAndSortWithinPartitions enacts Hadoop style shuffle

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336948#comment-14336948 ] Sean Owen commented on SPARK-3441: -- Since another shuffle-related doc ticket came up for

[jira] [Updated] (SPARK-5978) Spark, Examples have Hadoop1/2 compat issues with Hadoop 2.0.x (e.g. CDH4)

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5978: - Component/s: (was: PySpark) Build Priority: Critical (was: Major)

[jira] [Commented] (SPARK-5124) Standardize internal RPC interface

2015-02-25 Thread Aaron Davidson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336974#comment-14336974 ] Aaron Davidson commented on SPARK-5124: --- I tend to prefer having explicit

[jira] [Commented] (SPARK-6004) Pick the best model when training GradientBoostedTrees with validation

2015-02-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337025#comment-14337025 ] Joseph K. Bradley commented on SPARK-6004: -- The point of the previous PR

[jira] [Created] (SPARK-6013) Add more Python ML examples for spark.ml

2015-02-25 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-6013: Summary: Add more Python ML examples for spark.ml Key: SPARK-6013 URL: https://issues.apache.org/jira/browse/SPARK-6013 Project: Spark Issue Type:

[jira] [Comment Edited] (SPARK-5124) Standardize internal RPC interface

2015-02-25 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337028#comment-14337028 ] Reynold Xin edited comment on SPARK-5124 at 2/25/15 7:29 PM: -

[jira] [Created] (SPARK-6011) Out of disk space due to Spark not deleting shuffle files of lost executors

2015-02-25 Thread pankaj arora (JIRA)
pankaj arora created SPARK-6011: --- Summary: Out of disk space due to Spark not deleting shuffle files of lost executors Key: SPARK-6011 URL: https://issues.apache.org/jira/browse/SPARK-6011 Project:

[jira] [Commented] (SPARK-6011) Out of disk space due to Spark not deleting shuffle files of lost executors

2015-02-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336950#comment-14336950 ] Apache Spark commented on SPARK-6011: - User 'pankajarora12' has created a pull request

[jira] [Commented] (SPARK-5978) Spark examples cannot compile with Hadoop 2

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336961#comment-14336961 ] Sean Owen commented on SPARK-5978: -- Ah right, the key is 2.0.x. This is a subset of the

[jira] [Updated] (SPARK-5124) Standardize internal RPC interface

2015-02-25 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-5124: --- Target Version/s: 1.4.0 (was: 1.3.0) Standardize internal RPC interface

[jira] [Resolved] (SPARK-5996) DataFrame.collect() doesn't recognize UDTs

2015-02-25 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5996. -- Resolution: Fixed Fix Version/s: 1.3.0 DataFrame.collect() doesn't recognize UDTs

[jira] [Created] (SPARK-6012) Deadlock when asking for SchemaRDD partitions with TakeOrdered operator

2015-02-25 Thread Max Seiden (JIRA)
Max Seiden created SPARK-6012: - Summary: Deadlock when asking for SchemaRDD partitions with TakeOrdered operator Key: SPARK-6012 URL: https://issues.apache.org/jira/browse/SPARK-6012 Project: Spark

[jira] [Updated] (SPARK-6016) Cannot read the parquet table after overwriting the existing table when spark.sql.parquet.cacheMetadata=true

2015-02-25 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-6016: Description: saveAsTable is fine and seems we have successfully deleted the old data and written the new

[jira] [Commented] (SPARK-5750) Document that ordering of elements in shuffled partitions is not deterministic across runs

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337116#comment-14337116 ] Sean Owen commented on SPARK-5750: -- Yes, although on some of those I'm not as clear what

[jira] [Updated] (SPARK-6011) Out of disk space due to Spark not deleting shuffle files of lost executors

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6011: - Component/s: (was: Spark Core) YARN Target Version/s: (was: 1.3.1)

[jira] [Commented] (SPARK-5836) Highlight in Spark documentation that by default Spark does not delete its temporary files

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337128#comment-14337128 ] Sean Owen commented on SPARK-5836: -- I'd like to take this up, since I've heard versions

[jira] [Created] (SPARK-6016) Cannot read the parquet table after overwriting the existing table when spark.sql.parquet.cacheMetadata=true

2015-02-25 Thread Yin Huai (JIRA)
Yin Huai created SPARK-6016: --- Summary: Cannot read the parquet table after overwriting the existing table when spark.sql.parquet.cacheMetadata=true Key: SPARK-6016 URL: https://issues.apache.org/jira/browse/SPARK-6016

[jira] [Commented] (SPARK-6016) Cannot read the parquet table after overwriting the existing table when spark.sql.parquet.cacheMetadata=true

2015-02-25 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337189#comment-14337189 ] Yin Huai commented on SPARK-6016: - cc [~lian cheng] Cannot read the parquet table after

[jira] [Commented] (SPARK-6015) Python docs' source code links are all broken

2015-02-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337247#comment-14337247 ] Joseph K. Bradley commented on SPARK-6015: -- That would be great; would you mind

[jira] [Updated] (SPARK-6015) Backport Python doc source code link fix to 1.2

2015-02-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-6015: - Description: The Python docs display {code}[source]{code} links which should link to

[jira] [Updated] (SPARK-6015) Backport Python doc source code link fix to 1.2

2015-02-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-6015: - Summary: Backport Python doc source code link fix to 1.2 (was: Python docs' source code

[jira] [Updated] (SPARK-6015) Backport Python doc source code link fix to 1.2

2015-02-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-6015: - Affects Version/s: (was: 1.3.0) 1.2.1 Backport Python doc

[jira] [Reopened] (SPARK-6015) Backport Python doc source code link fix to 1.2

2015-02-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley reopened SPARK-6015: -- Backport Python doc source code link fix to 1.2

[jira] [Updated] (SPARK-5975) SparkSubmit --jars not present on driver in python

2015-02-25 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5975: - Component/s: (was: Spark Core) PySpark SparkSubmit --jars not present on driver in

[jira] [Resolved] (SPARK-4845) Adding a parallelismRatio to control the partitions num of shuffledRDD

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-4845. -- Resolution: Won't Fix Fix Version/s: (was: 1.3.0) Target Version/s: (was: 1.3.0)

[jira] [Updated] (SPARK-6014) java.io.IOException: Filesystem is thrown when ctrl+c or ctrl+d spark-sql on YARN

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-6014: - Component/s: YARN java.io.IOException: Filesystem is thrown when ctrl+c or ctrl+d spark-sql on YARN

[jira] [Updated] (SPARK-5982) Remove Local Read Time

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5982: - Component/s: Spark Core Remove Local Read Time -- Key: SPARK-5982

[jira] [Updated] (SPARK-5949) Driver program has to register roaring bitmap classes used by spark with Kryo when number of partitions is greater than 2000

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5949: - Component/s: Spark Core Driver program has to register roaring bitmap classes used by spark with Kryo

[jira] [Assigned] (SPARK-6015) Python docs' source code links are all broken

2015-02-25 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu reassigned SPARK-6015: - Assignee: Davies Liu Python docs' source code links are all broken

[jira] [Commented] (SPARK-6015) Python docs' source code links are all broken

2015-02-25 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337240#comment-14337240 ] Davies Liu commented on SPARK-6015: --- Should we backport that fix into 1.2? Python

[jira] [Resolved] (SPARK-6015) Python docs' source code links are all broken

2015-02-25 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-6015. --- Resolution: Fixed Fix Version/s: 1.3.0 Python docs' source code links are all broken

[jira] [Updated] (SPARK-6015) Backport Python doc source code link fix to 1.2

2015-02-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-6015: - Target Version/s: 1.2.2 (was: 1.3.0) Backport Python doc source code link fix to 1.2

[jira] [Updated] (SPARK-6015) Backport Python doc source code link fix to 1.2

2015-02-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-6015: - Fix Version/s: (was: 1.3.0) Backport Python doc source code link fix to 1.2

[jira] [Updated] (SPARK-5970) Temporary directories are not removed (but their content is)

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-5970: - Priority: Minor (was: Major) Assignee: Milan Straka Temporary directories are not removed (but

[jira] [Resolved] (SPARK-5970) Temporary directories are not removed (but their content is)

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-5970. -- Resolution: Fixed Fix Version/s: 1.4.0 Issue resolved by pull request 4759

[jira] [Updated] (SPARK-5975) SparkSubmit --jars not present on driver in python

2015-02-25 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-5975: - Summary: SparkSubmit --jars not present on driver in python (was: SparkSubmit --jars not present on

[jira] [Commented] (SPARK-1182) Sort the configuration parameters in configuration.md

2015-02-25 Thread Brennon York (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337308#comment-14337308 ] Brennon York commented on SPARK-1182: - [~rxin] I've incorporated all the changes you

[jira] [Updated] (SPARK-4924) Factor out code to launch Spark applications into a separate library

2015-02-25 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-4924: -- Target Version/s: 1.4.0 (was: 1.3.0) Factor out code to launch Spark applications into a

[jira] [Resolved] (SPARK-1955) VertexRDD can incorrectly assume index sharing

2015-02-25 Thread Ankur Dave (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave resolved SPARK-1955. --- Resolution: Fixed Fix Version/s: 1.2.2 1.3.0 Assignee: Brennon York

[jira] [Created] (SPARK-6017) Provide transparent secure communication channel on Yarn

2015-02-25 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-6017: - Summary: Provide transparent secure communication channel on Yarn Key: SPARK-6017 URL: https://issues.apache.org/jira/browse/SPARK-6017 Project: Spark

[jira] [Updated] (SPARK-6016) Cannot read the parquet table after overwriting the existing table when spark.sql.parquet.cacheMetadata=true

2015-02-25 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-6016: Description: saveAsTable is fine and seems we have successfully deleted the old data and written the new

[jira] [Updated] (SPARK-6017) Provide transparent secure communication channel on Yarn

2015-02-25 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-6017: -- Attachment: secure_spark_on_yarn.pdf First draft of problem statement and proposed solutions.

[jira] [Updated] (SPARK-6016) Cannot read the parquet table after overwriting the existing table when spark.sql.parquet.cacheMetadata=true

2015-02-25 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-6016: Description: saveAsTable is fine and seems we have successfully deleted the old data and written the new

[jira] [Updated] (SPARK-6018) NoSuchMethodError in Spark app is swallowed by YARN AM

2015-02-25 Thread Cheolsoo Park (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheolsoo Park updated SPARK-6018: - Description: I discovered this bug while testing 1.3 RC with old 1.2 Spark job that I had. Due

[jira] [Resolved] (SPARK-2770) Rename spark-ganglia-lgpl to ganglia-lgpl

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-2770. -- Resolution: Won't Fix Rename spark-ganglia-lgpl to ganglia-lgpl

[jira] [Commented] (SPARK-6022) GraphX `diff` test incorrectly operating on values (not VertexId's)

2015-02-25 Thread Brennon York (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337513#comment-14337513 ] Brennon York commented on SPARK-6022: - FWIW I have this fix put in place under my

[jira] [Commented] (SPARK-3901) Add SocketSink capability for Spark metrics

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337512#comment-14337512 ] Sean Owen commented on SPARK-3901: -- It sounds like this can't proceed, blocked on

[jira] [Resolved] (SPARK-725) Ran out of disk space on EC2 master due to Ganglia logs

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-725. - Resolution: Not a Problem Sounds like the best guess was that this wasn't a Spark issue. Ran out of disk

[jira] [Resolved] (SPARK-5974) Add save/load to examples in ML guide

2015-02-25 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-5974. -- Resolution: Fixed Fix Version/s: 1.3.0 Add save/load to examples in ML guide

[jira] [Commented] (SPARK-1182) Sort the configuration parameters in configuration.md

2015-02-25 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337555#comment-14337555 ] Reynold Xin commented on SPARK-1182: Merged. Thanks for doing this, [~boyork].

[jira] [Resolved] (SPARK-786) Clean up old work directories in standalone worker

2015-02-25 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-786. - Resolution: Duplicate Clean up old work directories in standalone worker

[jira] [Commented] (SPARK-5124) Standardize internal RPC interface

2015-02-25 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337733#comment-14337733 ] Shixiong Zhu commented on SPARK-5124: - [~rxin] could you clarify how to reply the

[jira] [Created] (SPARK-6025) Helper method for GradientBoostedTrees to compute validation error

2015-02-25 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-6025: Summary: Helper method for GradientBoostedTrees to compute validation error Key: SPARK-6025 URL: https://issues.apache.org/jira/browse/SPARK-6025 Project:

[jira] [Commented] (SPARK-6004) Pick the best model when training GradientBoostedTrees with validation

2015-02-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337793#comment-14337793 ] Joseph K. Bradley commented on SPARK-6004: -- If validation is not being done, we

[jira] [Created] (SPARK-6027) Make KafkaUtils work in Python with kafka-assembly provided as --jar

2015-02-25 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-6027: Summary: Make KafkaUtils work in Python with kafka-assembly provided as --jar Key: SPARK-6027 URL: https://issues.apache.org/jira/browse/SPARK-6027 Project: Spark

[jira] [Commented] (SPARK-5124) Standardize internal RPC interface

2015-02-25 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337724#comment-14337724 ] Shixiong Zhu commented on SPARK-5124: - [~vanzin], thanks for the suggestions. I agree

[jira] [Commented] (SPARK-5124) Standardize internal RPC interface

2015-02-25 Thread Aaron Davidson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337745#comment-14337745 ] Aaron Davidson commented on SPARK-5124: --- [~zsxwing] For receiveAndReply, I think the

[jira] [Commented] (SPARK-6004) Pick the best model when training GradientBoostedTrees with validation

2015-02-25 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337762#comment-14337762 ] Joseph K. Bradley commented on SPARK-6004: -- I'm not too worried about stopping

[jira] [Updated] (SPARK-6020) Flaky test: o.a.s.sql.columnar.PartitionBatchPruningSuite

2015-02-25 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-6020: - Assignee: Cheng Lian Flaky test: o.a.s.sql.columnar.PartitionBatchPruningSuite

[jira] [Commented] (SPARK-5981) pyspark ML models should support predict/transform on vector within map

2015-02-25 Thread Manoj Kumar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337805#comment-14337805 ] Manoj Kumar commented on SPARK-5981: [~josephkb] Hi, Can I work on this? pyspark ML

[jira] [Commented] (SPARK-5185) pyspark --jars does not add classes to driver class path

2015-02-25 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337840#comment-14337840 ] Tathagata Das commented on SPARK-5185: -- I also encountered this for KafkaUtils in

[jira] [Updated] (SPARK-5185) pyspark --jars does not add classes to driver class path

2015-02-25 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-5185: - Assignee: Andrew Or (was: Burak Yavuz) pyspark --jars does not add classes to driver class path

[jira] [Updated] (SPARK-5185) pyspark --jars does not add classes to driver class path

2015-02-25 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-5185: - Assignee: Burak Yavuz pyspark --jars does not add classes to driver class path

[jira] [Commented] (SPARK-2168) History Server rendered page not suitable for load balancing

2015-02-25 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337845#comment-14337845 ] Apache Spark commented on SPARK-2168: - User 'elyast' has created a pull request for

[jira] [Created] (SPARK-6024) When a data source table has too many columns, cannot persist it in metastore.

2015-02-25 Thread Yin Huai (JIRA)
Yin Huai created SPARK-6024: --- Summary: When a data source table has too many columns, cannot persist it in metastore. Key: SPARK-6024 URL: https://issues.apache.org/jira/browse/SPARK-6024 Project: Spark

[jira] [Commented] (SPARK-6024) When a data source table has too many columns, its schema cannot be stored in metastore.

2015-02-25 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337722#comment-14337722 ] Yin Huai commented on SPARK-6024: - Seems we need to split the schema's string
