[jira] [Commented] (SPARK-3622) Provide a custom transformation that can output multiple RDDs

2014-09-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142982#comment-14142982 ] Patrick Wendell commented on SPARK-3622: In Spark most RDD operations are lazy, so

[jira] [Comment Edited] (SPARK-3622) Provide a custom transformation that can output multiple RDDs

2014-09-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142982#comment-14142982 ] Patrick Wendell edited comment on SPARK-3622 at 9/22/14 7:35 AM:

[jira] [Created] (SPARK-3635) Find Strongly Connected Components with Graphx has a small bug

2014-09-22 Thread Oded Zimerman (JIRA)
Oded Zimerman created SPARK-3635: Summary: Find Strongly Connected Components with Graphx has a small bug Key: SPARK-3635 URL: https://issues.apache.org/jira/browse/SPARK-3635 Project: Spark

[jira] [Commented] (SPARK-3635) Find Strongly Connected Components with Graphx has a small bug

2014-09-22 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143006#comment-14143006 ] Apache Spark commented on SPARK-3635: - User 'odedz' has created a pull request for

[jira] [Commented] (SPARK-3612) Executor shouldn't quit if heartbeat message fails to reach the driver

2014-09-22 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143007#comment-14143007 ] Apache Spark commented on SPARK-3612: - User 'sryza' has created a pull request for

[jira] [Commented] (SPARK-3621) Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can access

2014-09-22 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143009#comment-14143009 ] Sean Owen commented on SPARK-3621: -- If the data is shipped to the worker node, and the

[jira] [Created] (SPARK-3636) It is not friendly to interrupt a Job when user passes different storageLevels to a RDD

2014-09-22 Thread uncleGen (JIRA)
uncleGen created SPARK-3636: --- Summary: It is not friendly to interrupt a Job when user passes different storageLevels to a RDD Key: SPARK-3636 URL: https://issues.apache.org/jira/browse/SPARK-3636 Project:

[jira] [Commented] (SPARK-3636) It is not friendly to interrupt a Job when user passes different storageLevels to a RDD

2014-09-22 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143055#comment-14143055 ] Apache Spark commented on SPARK-3636: - User 'uncleGen' has created a pull request for

[jira] [Created] (SPARK-3637) NPE in ShuffleMapTask

2014-09-22 Thread Przemyslaw Pastuszka (JIRA)
Przemyslaw Pastuszka created SPARK-3637: --- Summary: NPE in ShuffleMapTask Key: SPARK-3637 URL: https://issues.apache.org/jira/browse/SPARK-3637 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-3638) Commons HTTP client dependency conflict in extras/kinesis-asl module

2014-09-22 Thread Aniket Bhatnagar (JIRA)
Aniket Bhatnagar created SPARK-3638: --- Summary: Commons HTTP client dependency conflict in extras/kinesis-asl module Key: SPARK-3638 URL: https://issues.apache.org/jira/browse/SPARK-3638 Project:

[jira] [Created] (SPARK-3639) Kinesis examples set master as local

2014-09-22 Thread Aniket Bhatnagar (JIRA)
Aniket Bhatnagar created SPARK-3639: --- Summary: Kinesis examples set master as local Key: SPARK-3639 URL: https://issues.apache.org/jira/browse/SPARK-3639 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-3638) Commons HTTP client dependency conflict in extras/kinesis-asl module

2014-09-22 Thread Aniket Bhatnagar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aniket Bhatnagar updated SPARK-3638: Component/s: Streaming Commons HTTP client dependency conflict in extras/kinesis-asl

[jira] [Commented] (SPARK-3638) Commons HTTP client dependency conflict in extras/kinesis-asl module

2014-09-22 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143109#comment-14143109 ] Apache Spark commented on SPARK-3638: - User 'aniketbhatnagar' has created a pull

[jira] [Commented] (SPARK-3639) Kinesis examples set master as local

2014-09-22 Thread Aniket Bhatnagar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143108#comment-14143108 ] Aniket Bhatnagar commented on SPARK-3639: - If the community agrees this is an

[jira] [Created] (SPARK-3640) KinesisUtils should accept a credentials object instead of forcing DefaultCredentialsProvider

2014-09-22 Thread Aniket Bhatnagar (JIRA)
Aniket Bhatnagar created SPARK-3640: --- Summary: KinesisUtils should accept a credentials object instead of forcing DefaultCredentialsProvider Key: SPARK-3640 URL: https://issues.apache.org/jira/browse/SPARK-3640

[jira] [Commented] (SPARK-3640) KinesisUtils should accept a credentials object instead of forcing DefaultCredentialsProvider

2014-09-22 Thread Aniket Bhatnagar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143111#comment-14143111 ] Aniket Bhatnagar commented on SPARK-3640: - I understand that the credentials need

[jira] [Commented] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2014-09-22 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143153#comment-14143153 ] Sean Owen commented on SPARK-3625: -- This prints 1000 both times for me, which is correct.

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-22 Thread Chengxiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143191#comment-14143191 ] Chengxiang Li commented on SPARK-2321: -- I agree that a stable, immutable, and

[jira] [Commented] (SPARK-3621) Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can access

2014-09-22 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143231#comment-14143231 ] Xuefu Zhang commented on SPARK-3621: In my limited understanding, to broadcast a

[jira] [Updated] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2014-09-22 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-3625: --- Description: The reproduce code: {code} sc.setCheckpointDir(checkpointDir) val c =

[jira] [Commented] (SPARK-3593) Support Sorting of Binary Type Data

2014-09-22 Thread Paul Magid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143295#comment-14143295 ] Paul Magid commented on SPARK-3593: --- I am putting Spark SQL 1.1 through its paces (in a

[jira] [Created] (SPARK-3641) Correctly populate SparkPlan.currentContext

2014-09-22 Thread Yin Huai (JIRA)
Yin Huai created SPARK-3641: --- Summary: Correctly populate SparkPlan.currentContext Key: SPARK-3641 URL: https://issues.apache.org/jira/browse/SPARK-3641 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2014-09-22 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143342#comment-14143342 ] Sean Owen commented on SPARK-3625: -- It still prints 1000 both times, which is correct.

[jira] [Created] (SPARK-3642) Better document the nuances of shared variables

2014-09-22 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-3642: - Summary: Better document the nuances of shared variables Key: SPARK-3642 URL: https://issues.apache.org/jira/browse/SPARK-3642 Project: Spark Issue Type:

[jira] [Updated] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2014-09-22 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-3625: --- Issue Type: Improvement (was: Bug) In some cases, the RDD.checkpoint does not work

[jira] [Updated] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2014-09-22 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-3625: --- Priority: Major (was: Blocker) In some cases, the RDD.checkpoint does not work

[jira] [Commented] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2014-09-22 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143410#comment-14143410 ] Guoqiang Li commented on SPARK-3625: Ok, it has been modified to improvement This

[jira] [Updated] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2014-09-22 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-3625: --- Description: The reproduce code: {code} sc.setCheckpointDir(checkpointDir) val c =

[jira] [Commented] (SPARK-3642) Better document the nuances of shared variables

2014-09-22 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143423#comment-14143423 ] Apache Spark commented on SPARK-3642: - User 'sryza' has created a pull request for

[jira] [Updated] (SPARK-3625) In some cases, the RDD.checkpoint does not work

2014-09-22 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-3625: --- Description: The reproduce code: {code} sc.setCheckpointDir(checkpointDir) val c =

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143425#comment-14143425 ] Josh Rosen commented on SPARK-2321: --- {quote} ... maybe we should redesign the

[jira] [Commented] (SPARK-3627) spark on yarn reports success even though job fails

2014-09-22 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143438#comment-14143438 ] Thomas Graves commented on SPARK-3627: -- We could make this a separate issue, but I've

[jira] [Updated] (SPARK-1475) Drain event logging queue before stopping event logger

2014-09-22 Thread Kan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kan Zhang updated SPARK-1475: - Summary: Drain event logging queue before stopping event logger (was: Draining event logging queue

[jira] [Updated] (SPARK-3588) Gaussian Mixture Model clustering

2014-09-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-3588: -- Assignee: Meethu Mathew Gaussian Mixture Model clustering -

[jira] [Commented] (SPARK-3631) Add docs for checkpoint usage

2014-09-22 Thread Burak Yavuz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143484#comment-14143484 ] Burak Yavuz commented on SPARK-3631: Thanks for setting this up [~aash]! [~pwendell],

[jira] [Commented] (SPARK-3627) spark on yarn reports success even though job fails

2014-09-22 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143523#comment-14143523 ] Thomas Graves commented on SPARK-3627: -- this might be the same as SPARK-3293 spark

[jira] [Comment Edited] (SPARK-3614) Filter on minimum occurrences of a term in IDF

2014-09-22 Thread RJ Nowling (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142860#comment-14142860 ] RJ Nowling edited comment on SPARK-3614 at 9/22/14 5:52 PM:

[jira] [Commented] (SPARK-3561) Native Hadoop/YARN integration for batch/ETL workloads

2014-09-22 Thread Adam Kawa (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143517#comment-14143517 ] Adam Kawa commented on SPARK-3561: -- We also would be very interested in trying this out

[jira] [Commented] (SPARK-1655) In naive Bayes, store conditional probabilities distributively.

2014-09-22 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143550#comment-14143550 ] Apache Spark commented on SPARK-1655: - User 'staple' has created a pull request for

[jira] [Commented] (SPARK-3641) Correctly populate SparkPlan.currentContext

2014-09-22 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143540#comment-14143540 ] Michael Armbrust commented on SPARK-3641: - The idea here is to be able to support

[jira] [Commented] (SPARK-3641) Correctly populate SparkPlan.currentContext

2014-09-22 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143566#comment-14143566 ] Yin Huai commented on SPARK-3641: - Sounds good. Let me fix it. Correctly populate

[jira] [Resolved] (SPARK-2373) RDD add span function (split an RDD to two RDD based on user's function)

2014-09-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-2373. --- Resolution: Won't Fix Resolving this as Won't Fix, per discussion on the PR. [Matei

[jira] [Created] (SPARK-3643) Add cluster-specific config settings to configuration page

2014-09-22 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3643: Summary: Add cluster-specific config settings to configuration page Key: SPARK-3643 URL: https://issues.apache.org/jira/browse/SPARK-3643 Project: Spark

[jira] [Created] (SPARK-3644) REST API for Spark application info (jobs / stages / tasks / storage info)

2014-09-22 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-3644: - Summary: REST API for Spark application info (jobs / stages / tasks / storage info) Key: SPARK-3644 URL: https://issues.apache.org/jira/browse/SPARK-3644 Project: Spark

[jira] [Updated] (SPARK-3644) REST API for Spark application info (jobs / stages / tasks / storage info)

2014-09-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-3644: -- Assignee: (was: Josh Rosen) REST API for Spark application info (jobs / stages / tasks / storage

[jira] [Updated] (SPARK-3644) REST API for Spark application info (jobs / stages / tasks / storage info)

2014-09-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-3644: -- Description: This JIRA is a forum to draft a design proposal for a REST interface for accessing

[jira] [Updated] (SPARK-3298) [SQL] registerAsTable / registerTempTable overwrites old tables

2014-09-22 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3298: Target Version/s: 1.2.0 [SQL] registerAsTable / registerTempTable overwrites old tables

[jira] [Assigned] (SPARK-3298) [SQL] registerAsTable / registerTempTable overwrites old tables

2014-09-22 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reassigned SPARK-3298: --- Assignee: Michael Armbrust [SQL] registerAsTable / registerTempTable overwrites old

[jira] [Commented] (SPARK-3298) [SQL] registerAsTable / registerTempTable overwrites old tables

2014-09-22 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143637#comment-14143637 ] Michael Armbrust commented on SPARK-3298: - I think the plan here is to add an

[jira] [Commented] (SPARK-3634) Python modules added through addPyFile should take precedence over system modules

2014-09-22 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143641#comment-14143641 ] Apache Spark commented on SPARK-3634: - User 'davies' has created a pull request for

[jira] [Created] (SPARK-3645) Make caching using SQL commands eager by default, with the option of being lazy

2014-09-22 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-3645: --- Summary: Make caching using SQL commands eager by default, with the option of being lazy Key: SPARK-3645 URL: https://issues.apache.org/jira/browse/SPARK-3645

[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-09-22 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143656#comment-14143656 ] Nicholas Chammas commented on SPARK-3431: - [~joshrosen] I can take a crack at this

[jira] [Created] (SPARK-3646) Copy SQL options from the spark context

2014-09-22 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-3646: --- Summary: Copy SQL options from the spark context Key: SPARK-3646 URL: https://issues.apache.org/jira/browse/SPARK-3646 Project: Spark Issue Type:

[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-09-22 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143684#comment-14143684 ] Josh Rosen commented on SPARK-3431: --- [~nchammas] I'm not sure. The different test

[jira] [Commented] (SPARK-3646) Copy SQL options from the spark context

2014-09-22 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143693#comment-14143693 ] Apache Spark commented on SPARK-3646: - User 'marmbrus' has created a pull request for

[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-09-22 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143707#comment-14143707 ] Nicholas Chammas commented on SPARK-3431: - {quote} Do you know how maven / sbt

[jira] [Commented] (SPARK-3610) History server log name should not be based on user input

2014-09-22 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143709#comment-14143709 ] Andrew Or commented on SPARK-3610: -- Hi all, I don't have the time to fix this, but this

[jira] [Commented] (SPARK-3298) [SQL] registerAsTable / registerTempTable overwrites old tables

2014-09-22 Thread Evan Chan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143716#comment-14143716 ] Evan Chan commented on SPARK-3298: -- Sounds good, thanks! -Evan Never doubt that a small

[jira] [Commented] (SPARK-2870) Thorough schema inference directly on RDDs of Python dictionaries

2014-09-22 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143720#comment-14143720 ] Nicholas Chammas commented on SPARK-2870: - [~marmbrus] - API-wise, how are you

[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-09-22 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143724#comment-14143724 ] Sean Owen commented on SPARK-3431: -- It's trivial to configure Maven surefire/failsafe to

[jira] [Commented] (SPARK-3270) Spark API for Application Extensions

2014-09-22 Thread Michal Malohlava (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143732#comment-14143732 ] Michal Malohlava commented on SPARK-3270: - Hi Patrick, you are right - in the

[jira] [Commented] (SPARK-3641) Correctly populate SparkPlan.currentContext

2014-09-22 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143737#comment-14143737 ] Michael Armbrust commented on SPARK-3641: - Hey [~yhuai] have you started on this

[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-09-22 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143744#comment-14143744 ] Nicholas Chammas commented on SPARK-3431: - I see. I'll try to look into it then. I

[jira] [Commented] (SPARK-3641) Correctly populate SparkPlan.currentContext

2014-09-22 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143750#comment-14143750 ] Yin Huai commented on SPARK-3641: - No, I have not started. I can start after your caching

[jira] [Commented] (SPARK-3129) Prevent data loss in Spark Streaming

2014-09-22 Thread Hari Shreedharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143775#comment-14143775 ] Hari Shreedharan commented on SPARK-3129: - I did multiple rounds of testing and it

[jira] [Commented] (SPARK-3032) Potential bug when running sort-based shuffle with sorting using TimSort

2014-09-22 Thread Aaron Davidson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143792#comment-14143792 ] Aaron Davidson commented on SPARK-3032: --- [~matei] any thoughts on this issue?

[jira] [Created] (SPARK-3647) Shaded Guava patch causes access issues with package private classes

2014-09-22 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-3647: - Summary: Shaded Guava patch causes access issues with package private classes Key: SPARK-3647 URL: https://issues.apache.org/jira/browse/SPARK-3647 Project: Spark

[jira] [Commented] (SPARK-3647) Shaded Guava patch causes access issues with package private classes

2014-09-22 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143796#comment-14143796 ] Marcelo Vanzin commented on SPARK-3647: --- There are two options I see here: - extend

[jira] [Resolved] (SPARK-3578) GraphGenerators.sampleLogNormal sometimes returns too-large result

2014-09-22 Thread Joseph E. Gonzalez (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph E. Gonzalez resolved SPARK-3578. --- Resolution: Fixed Fix Version/s: 1.2.0 Resolved by

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-22 Thread Mark Hamstra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143814#comment-14143814 ] Mark Hamstra commented on SPARK-2321: - Which would be kind of the opposite half of the

[jira] [Commented] (SPARK-3614) Filter on minimum occurrences of a term in IDF

2014-09-22 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143820#comment-14143820 ] Apache Spark commented on SPARK-3614: - User 'rnowling' has created a pull request for

[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-09-22 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143835#comment-14143835 ] Sean Owen commented on SPARK-3431: -- For your experiments, scalatest just copies an old

[jira] [Commented] (SPARK-2848) Shade Guava in Spark deliverables

2014-09-22 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143877#comment-14143877 ] Thomas Graves commented on SPARK-2848: -- [~vanzin] [~pwendell] I think the fix version

[jira] [Commented] (SPARK-2848) Shade Guava in Spark deliverables

2014-09-22 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143885#comment-14143885 ] Marcelo Vanzin commented on SPARK-2848: --- Yes, that's right, this was pushed onto

[jira] [Commented] (SPARK-3614) Filter on minimum occurrences of a term in IDF

2014-09-22 Thread Liquan Pei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143887#comment-14143887 ] Liquan Pei commented on SPARK-3614: --- To me, the less number of documents a term appears,

[jira] [Updated] (SPARK-2848) Shade Guava in Spark deliverables

2014-09-22 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-2848: -- Fix Version/s: (was: 1.1.0) 1.2.0 Shade Guava in Spark deliverables

[jira] [Created] (SPARK-3648) Provide a script for fetching remote PR's for review

2014-09-22 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-3648: -- Summary: Provide a script for fetching remote PR's for review Key: SPARK-3648 URL: https://issues.apache.org/jira/browse/SPARK-3648 Project: Spark Issue

[jira] [Updated] (SPARK-3648) Provide a script for fetching remote PR's for review

2014-09-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3648: --- Issue Type: New Feature (was: Bug) Provide a script for fetching remote PR's for review

[jira] [Created] (SPARK-3649) ClassCastException in GraphX custom serializers when sort-based shuffle spills

2014-09-22 Thread Ankur Dave (JIRA)
Ankur Dave created SPARK-3649: - Summary: ClassCastException in GraphX custom serializers when sort-based shuffle spills Key: SPARK-3649 URL: https://issues.apache.org/jira/browse/SPARK-3649 Project:

[jira] [Commented] (SPARK-3614) Filter on minimum occurrences of a term in IDF

2014-09-22 Thread RJ Nowling (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143898#comment-14143898 ] RJ Nowling commented on SPARK-3614: --- It could lead to over-fitting and thus

[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce

2014-09-22 Thread Grega Kespret (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143895#comment-14143895 ] Grega Kespret commented on SPARK-2620: -- We have this issue on Spark 1.1.0. case

[jira] [Commented] (SPARK-3622) Provide a custom transformation that can output multiple RDDs

2014-09-22 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143908#comment-14143908 ] Sandy Ryza commented on SPARK-3622: --- Is this a duplicate of SPARK-2688? Provide a

[jira] [Updated] (SPARK-1720) use LD_LIBRARY_PATH instead of -Djava.library.path

2014-09-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1720: --- Priority: Critical (was: Major) Target Version/s: 1.2.0 use LD_LIBRARY_PATH

[jira] [Updated] (SPARK-3649) ClassCastException in GraphX custom serializers when sort-based shuffle spills

2014-09-22 Thread Ankur Dave (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave updated SPARK-3649: -- Description: As

[jira] [Updated] (SPARK-3606) Spark-on-Yarn AmIpFilter does not work with Yarn HA.

2014-09-22 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-3606: - Affects Version/s: 1.1.0 Spark-on-Yarn AmIpFilter does not work with Yarn HA.

[jira] [Commented] (SPARK-3622) Provide a custom transformation that can output multiple RDDs

2014-09-22 Thread Xuefu Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143921#comment-14143921 ] Xuefu Zhang commented on SPARK-3622: They are related but not exactly the same.

[jira] [Commented] (SPARK-3621) Provide a way to broadcast an RDD (instead of just a variable made of the RDD) so that a job can access

2014-09-22 Thread bc Wong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14143933#comment-14143933 ] bc Wong commented on SPARK-3621: I think this is for the case of a map-side join where one

[jira] [Updated] (SPARK-3649) ClassCastException in GraphX custom serializers when sort-based shuffle spills

2014-09-22 Thread Ankur Dave (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave updated SPARK-3649: -- Description: As

[jira] [Commented] (SPARK-1720) use LD_LIBRARY_PATH instead of -Djava.library.path

2014-09-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143930#comment-14143930 ] Patrick Wendell commented on SPARK-1720: Another user reported this issue, so

[jira] [Created] (SPARK-3650) Triangle Count handles reverse edges incorrectly

2014-09-22 Thread Joseph E. Gonzalez (JIRA)
Joseph E. Gonzalez created SPARK-3650: - Summary: Triangle Count handles reverse edges incorrectly Key: SPARK-3650 URL: https://issues.apache.org/jira/browse/SPARK-3650 Project: Spark

[jira] [Commented] (SPARK-3650) Triangle Count handles reverse edges incorrectly

2014-09-22 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143954#comment-14143954 ] Apache Spark commented on SPARK-3650: - User 'jegonzal' has created a pull request for

[jira] [Commented] (SPARK-3647) Shaded Guava patch causes access issues with package private classes

2014-09-22 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14143968#comment-14143968 ] Apache Spark commented on SPARK-3647: - User 'vanzin' has created a pull request for

[jira] [Updated] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-09-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1860: --- Priority: Blocker (was: Critical) Standalone Worker cleanup should not clean up running

[jira] [Updated] (SPARK-1860) Standalone Worker cleanup should not clean up running executors

2014-09-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-1860: --- Target Version/s: 1.2.0 Standalone Worker cleanup should not clean up running executors

[jira] [Updated] (SPARK-3032) Potential bug when running sort-based shuffle with sorting using TimSort

2014-09-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3032: --- Priority: Critical (was: Major) Potential bug when running sort-based shuffle with sorting

[jira] [Updated] (SPARK-3032) Potential bug when running sort-based shuffle with sorting using TimSort

2014-09-22 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3032: --- Target Version/s: 1.2.0 Potential bug when running sort-based shuffle with sorting using

[jira] [Created] (SPARK-3651) Consolidate executor maps in CoarseGrainedSchedulerBackend

2014-09-22 Thread Andrew Or (JIRA)
Andrew Or created SPARK-3651: Summary: Consolidate executor maps in CoarseGrainedSchedulerBackend Key: SPARK-3651 URL: https://issues.apache.org/jira/browse/SPARK-3651 Project: Spark Issue Type:

[jira] [Commented] (SPARK-3620) Refactor config option handling code for spark-submit

2014-09-22 Thread Dale Richardson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144095#comment-14144095 ] Dale Richardson commented on SPARK-3620: Due to typesafe conf being based on a

[jira] [Created] (SPARK-3652) upgrade spark sql hive version to 0.13.1

2014-09-22 Thread wangfei (JIRA)
wangfei created SPARK-3652: -- Summary: upgrade spark sql hive version to 0.13.1 Key: SPARK-3652 URL: https://issues.apache.org/jira/browse/SPARK-3652 Project: Spark Issue Type: Dependency upgrade
