[jira] [Resolved] (SPARK-1063) Add .sortBy(f) method on RDD

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1063. -- Resolution: Fixed > Add .sortBy(f) method on RDD

[jira] [Resolved] (SPARK-824) Make less copies of blocks during remote reads

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-824. - Resolution: Fixed This is a pretty old issue that no longer affects the newest block manager and N

[jira] [Resolved] (SPARK-914) Make RDD implement Scala and Java Iterable interfaces

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-914. - Resolution: Fixed Fix Version/s: 1.0.0 > Make RDD implement Scala and Java Iterable interfac

[jira] [Resolved] (SPARK-880) When built with Hadoop2, spark-shell and examples don't initialize log4j properly

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-880. - Resolution: Fixed > When built with Hadoop2, spark-shell and examples don't initialize log4j > pro

[jira] [Resolved] (SPARK-812) Netty shuffle creates a lot of open file handles

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-812. - Resolution: Invalid No longer a problem for new versions of the Netty shuffle > Netty shuffle crea

[jira] [Commented] (SPARK-785) ClosureCleaner not invoked on most PairRDDFunctions

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199922#comment-14199922 ] Matei Zaharia commented on SPARK-785: - [~adav] it still seems to be, weirdly enough: fo

[jira] [Resolved] (SPARK-682) Memoize results of getPreferredLocations

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-682. - Resolution: Duplicate Fix Version/s: 1.1.0 > Memoize results of getPreferredLocations >

[jira] [Resolved] (SPARK-656) Let Amazon choose our EC2 clusters' availability zone if the user does not specify one

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-656. - Resolution: Fixed > Let Amazon choose our EC2 clusters' availability zone if the user does not > s

[jira] [Resolved] (SPARK-610) Support master failover in standalone mode

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-610. - Resolution: Fixed Fix Version/s: 0.8.1 Assignee: Aaron Davidson > Support master fa

[jira] [Resolved] (SPARK-619) Hadoop MapReduce should be configured to use all local disks for shuffle on AMI

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-619. - Resolution: Fixed > Hadoop MapReduce should be configured to use all local disks for shuffle on >

[jira] [Resolved] (SPARK-600) SparkContext.stop and clearJars delete local JAR files

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-600. - Resolution: Fixed Should no longer be a problem since 1.0 > SparkContext.stop and clearJars delete

[jira] [Closed] (SPARK-542) Cache Miss when machine have multiple hostname

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia closed SPARK-542. --- Resolution: Won't Fix New versions of Spark have ways to specify the hostname and IP address to bind t

[jira] [Resolved] (SPARK-565) Integrate spark in scala standard collection API

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-565. - Resolution: Won't Fix FYI I'm going to close this because we've locked down the API for 1.X, and it

[jira] [Resolved] (SPARK-4040) Update spark documentation for local mode and spark-streaming.

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-4040. -- Resolution: Fixed > Update spark documentation for local mode and spark-streaming. > --

[jira] [Updated] (SPARK-4040) Update spark documentation for local mode and spark-streaming.

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-4040: - Assignee: jay vyas > Update spark documentation for local mode and spark-streaming. > ---

[jira] [Resolved] (SPARK-4222) FixedLengthBinaryRecordReader should readFully

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-4222. -- Resolution: Fixed Fix Version/s: 1.2.0 > FixedLengthBinaryRecordReader should readFully >

[jira] [Updated] (SPARK-4222) FixedLengthBinaryRecordReader should readFully

2014-11-05 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-4222: - Assignee: Jascha Swisher > FixedLengthBinaryRecordReader should readFully > --

[jira] [Resolved] (SPARK-3466) Limit size of results that a driver collects for each action

2014-11-02 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-3466. -- Resolution: Fixed Fix Version/s: 1.2.0 > Limit size of results that a driver collects for

[jira] [Resolved] (SPARK-3929) Support for fixed-precision decimal

2014-11-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-3929. -- Resolution: Fixed Fix Version/s: 1.2.0 > Support for fixed-precision decimal > --

[jira] [Resolved] (SPARK-3931) Support reading fixed-precision decimals from Parquet

2014-11-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-3931. -- Resolution: Fixed Fix Version/s: 1.2.0 > Support reading fixed-precision decimals from Pa

[jira] [Resolved] (SPARK-3932) Support reading fixed-precision decimals from Hive 0.13

2014-11-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-3932. -- Resolution: Fixed Fix Version/s: 1.2.0 Done in https://github.com/apache/spark/pull/2983

[jira] [Commented] (SPARK-3931) Support reading fixed-precision decimals from Parquet

2014-11-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193666#comment-14193666 ] Matei Zaharia commented on SPARK-3931: -- Done in https://github.com/apache/spark/pull/

[jira] [Commented] (SPARK-4186) Support binaryFiles and binaryRecords API in Python

2014-11-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193363#comment-14193363 ] Matei Zaharia commented on SPARK-4186: -- [~davies] it would be great if you have a cha

[jira] [Created] (SPARK-4186) Support binaryFiles and binaryRecords API in Python

2014-11-01 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-4186: Summary: Support binaryFiles and binaryRecords API in Python Key: SPARK-4186 URL: https://issues.apache.org/jira/browse/SPARK-4186 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-2759) The ability to read binary files into Spark

2014-11-01 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2759. -- Resolution: Fixed Fix Version/s: 1.2.0 > The ability to read binary files into Spark > --

[jira] [Updated] (SPARK-1847) Pushdown filters on non-required parquet columns

2014-10-31 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1847: - Assignee: Yash Datta > Pushdown filters on non-required parquet columns >

[jira] [Updated] (SPARK-3968) Use parquet-mr filter2 api in spark sql

2014-10-31 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3968: - Assignee: Yash Datta > Use parquet-mr filter2 api in spark sql > -

[jira] [Resolved] (SPARK-1847) Pushdown filters on non-required parquet columns

2014-10-31 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-1847. -- Resolution: Fixed Fix Version/s: 1.2.0 > Pushdown filters on non-required parquet columns

[jira] [Created] (SPARK-4176) Support decimals with precision > 18 in Parquet

2014-10-31 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-4176: Summary: Support decimals with precision > 18 in Parquet Key: SPARK-4176 URL: https://issues.apache.org/jira/browse/SPARK-4176 Project: Spark Issue Type: New

[jira] [Updated] (SPARK-3561) Allow for pluggable execution contexts in Spark

2014-10-31 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3561: - Fix Version/s: (was: 1.2.0) > Allow for pluggable execution contexts in Spark > --

[jira] [Updated] (SPARK-3466) Limit size of results that a driver collects for each action

2014-10-28 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3466: - Priority: Critical (was: Major) > Limit size of results that a driver collects for each action >

[jira] [Created] (SPARK-4043) Add a flag for stopping threads of cancelled tasks if Thread.interrupt doesn't kill them

2014-10-21 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-4043: Summary: Add a flag for stopping threads of cancelled tasks if Thread.interrupt doesn't kill them Key: SPARK-4043 URL: https://issues.apache.org/jira/browse/SPARK-4043

[jira] [Commented] (SPARK-3466) Limit size of results that a driver collects for each action

2014-10-20 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178025#comment-14178025 ] Matei Zaharia commented on SPARK-3466: -- Ah, I see, that concern makes sense if the to

[jira] [Commented] (SPARK-3655) Secondary sort

2014-10-20 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177824#comment-14177824 ] Matei Zaharia commented on SPARK-3655: -- I believe you can build this on top of sortBy

[jira] [Commented] (SPARK-3466) Limit size of results that a driver collects for each action

2014-10-20 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177800#comment-14177800 ] Matei Zaharia commented on SPARK-3466: -- The way I'd build this is by putting a limit

[jira] [Updated] (SPARK-3467) Python BatchedSerializer should dynamically lower batch size for large objects

2014-10-20 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3467: - Assignee: Davies Liu > Python BatchedSerializer should dynamically lower batch size for large obje

[jira] [Created] (SPARK-3933) Optimize decimal type in Spark SQL for those with small precision

2014-10-13 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3933: Summary: Optimize decimal type in Spark SQL for those with small precision Key: SPARK-3933 URL: https://issues.apache.org/jira/browse/SPARK-3933 Project: Spark

[jira] [Created] (SPARK-3932) Support reading fixed-precision decimals from Hive 0.13

2014-10-13 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3932: Summary: Support reading fixed-precision decimals from Hive 0.13 Key: SPARK-3932 URL: https://issues.apache.org/jira/browse/SPARK-3932 Project: Spark Issue T

[jira] [Created] (SPARK-3931) Support reading fixed-precision decimals from Parquet

2014-10-13 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3931: Summary: Support reading fixed-precision decimals from Parquet Key: SPARK-3931 URL: https://issues.apache.org/jira/browse/SPARK-3931 Project: Spark Issue Typ

[jira] [Updated] (SPARK-3929) Support for fixed-precision decimal

2014-10-13 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3929: - Description: Spark SQL should support fixed-precision decimals, which are available in Hive 0.13 (

[jira] [Created] (SPARK-3930) Add precision and scale to Spark SQL's Decimal type

2014-10-13 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3930: Summary: Add precision and scale to Spark SQL's Decimal type Key: SPARK-3930 URL: https://issues.apache.org/jira/browse/SPARK-3930 Project: Spark Issue Type:

[jira] [Created] (SPARK-3929) Support for fixed-precision decimal

2014-10-13 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3929: Summary: Support for fixed-precision decimal Key: SPARK-3929 URL: https://issues.apache.org/jira/browse/SPARK-3929 Project: Spark Issue Type: New Feature

[jira] [Commented] (SPARK-3849) Automate remaining Spark Code Style Guide rules

2014-10-12 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14168961#comment-14168961 ] Matei Zaharia commented on SPARK-3849: -- Just to comment the same thing I did on the m

[jira] [Updated] (SPARK-2759) The ability to read binary files into Spark

2014-10-08 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2759: - Assignee: Kevin Mader > The ability to read binary files into Spark >

[jira] [Updated] (SPARK-2759) The ability to read binary files into Spark

2014-10-08 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2759: - Target Version/s: 1.2.0 > The ability to read binary files into Spark > --

[jira] [Resolved] (SPARK-3762) clear all SparkEnv references after stop

2014-10-07 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-3762. -- Resolution: Fixed Fix Version/s: 1.2.0 > clear all SparkEnv references after stop > -

[jira] [Resolved] (SPARK-2530) Relax incorrect assumption of one ExternalAppendOnlyMap per thread

2014-10-06 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2530. -- Resolution: Fixed Fix Version/s: 1.1.0 This was fixed by SPARK-2711. > Relax incorrect a

[jira] [Commented] (SPARK-3633) Fetches failure observed after SPARK-2711

2014-10-06 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161149#comment-14161149 ] Matei Zaharia commented on SPARK-3633: -- BTW one other possibility is that ExternalApp

[jira] [Commented] (SPARK-3633) Fetches failure observed after SPARK-2711

2014-10-06 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161142#comment-14161142 ] Matei Zaharia commented on SPARK-3633: -- In that case though, the problem might be tha

[jira] [Commented] (SPARK-3633) Fetches failure observed after SPARK-2711

2014-10-06 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161117#comment-14161117 ] Matei Zaharia commented on SPARK-3633: -- I'm curious, why do you think this is caused

[jira] [Updated] (SPARK-3356) Document when RDD elements' ordering within partitions is nondeterministic

2014-09-30 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3356: - Assignee: Sean Owen > Document when RDD elements' ordering within partitions is nondeterministic >

[jira] [Resolved] (SPARK-3356) Document when RDD elements' ordering within partitions is nondeterministic

2014-09-30 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-3356. -- Resolution: Fixed Fix Version/s: 1.2.0 > Document when RDD elements' ordering within part

[jira] [Resolved] (SPARK-3032) Potential bug when running sort-based shuffle with sorting using TimSort

2014-09-29 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-3032. -- Resolution: Fixed Fix Version/s: 1.2.0 1.1.1 > Potential bug when runn

[jira] [Commented] (SPARK-3032) Potential bug when running sort-based shuffle with sorting using TimSort

2014-09-29 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152014#comment-14152014 ] Matei Zaharia commented on SPARK-3032: -- Yup, this will appear in 1.1.1. I've merged i

[jira] [Resolved] (SPARK-3389) Add converter class to make reading Parquet files easy with PySpark

2014-09-27 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-3389. -- Resolution: Fixed Fix Version/s: 1.2.0 > Add converter class to make reading Parquet file

[jira] [Commented] (SPARK-3129) Prevent data loss in Spark Streaming

2014-09-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145537#comment-14145537 ] Matei Zaharia commented on SPARK-3129: -- Alright, in that case, this sounds pretty goo

[jira] [Comment Edited] (SPARK-3129) Prevent data loss in Spark Streaming

2014-09-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145324#comment-14145324 ] Matei Zaharia edited comment on SPARK-3129 at 9/23/14 7:53 PM: -

[jira] [Commented] (SPARK-3129) Prevent data loss in Spark Streaming

2014-09-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145324#comment-14145324 ] Matei Zaharia commented on SPARK-3129: -- Is that 100 MB/s per node or in total? That s

[jira] [Updated] (SPARK-3389) Add converter class to make reading Parquet files easy with PySpark

2014-09-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3389: - Assignee: Uri Laserson > Add converter class to make reading Parquet files easy with PySpark > ---

[jira] [Updated] (SPARK-3389) Add converter class to make reading Parquet files easy with PySpark

2014-09-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3389: - Target Version/s: 1.2.0 > Add converter class to make reading Parquet files easy with PySpark > --

[jira] [Resolved] (SPARK-2745) Add Java friendly methods to Duration class

2014-09-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2745. -- Resolution: Fixed Fix Version/s: 1.2.0 > Add Java friendly methods to Duration class > --

[jira] [Updated] (SPARK-2745) Add Java friendly methods to Duration class

2014-09-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2745: - Assignee: Sean Owen (was: Tathagata Das) > Add Java friendly methods to Duration class >

[jira] [Updated] (SPARK-2745) Add Java friendly methods to Duration class

2014-09-23 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2745: - Assignee: Sean Owen (was: Sean Owen) > Add Java friendly methods to Duration class >

[jira] [Updated] (SPARK-3032) Potential bug when running sort-based shuffle with sorting using TimSort

2014-09-22 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3032: - Assignee: Saisai Shao > Potential bug when running sort-based shuffle with sorting using TimSort >

[jira] [Commented] (SPARK-3032) Potential bug when running sort-based shuffle with sorting using TimSort

2014-09-22 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144243#comment-14144243 ] Matei Zaharia commented on SPARK-3032: -- I'm not completely sure that this is because

[jira] [Commented] (SPARK-3032) Potential bug when running sort-based shuffle with sorting using TimSort

2014-09-22 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14144244#comment-14144244 ] Matei Zaharia commented on SPARK-3032: -- Yeah actually I'm sure TimSort works fine wit

[jira] [Created] (SPARK-3643) Add cluster-specific config settings to configuration page

2014-09-22 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3643: Summary: Add cluster-specific config settings to configuration page Key: SPARK-3643 URL: https://issues.apache.org/jira/browse/SPARK-3643 Project: Spark Issu

[jira] [Updated] (SPARK-3629) Improvements to YARN doc

2014-09-22 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3629: - Labels: starter (was: ) > Improvements to YARN doc

[jira] [Updated] (SPARK-3629) Improvements to YARN doc

2014-09-21 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3629: - Description: Right now this doc starts off with a big list of config options, and only then tells

[jira] [Updated] (SPARK-3629) Improvements to YARN doc

2014-09-21 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3629: - Summary: Improvements to YARN doc (was: Improve ordering of YARN doc) > Improvements to YARN doc

[jira] [Created] (SPARK-3629) Improve ordering of YARN doc

2014-09-21 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3629: Summary: Improve ordering of YARN doc Key: SPARK-3629 URL: https://issues.apache.org/jira/browse/SPARK-3629 Project: Spark Issue Type: Documentation

[jira] [Updated] (SPARK-3628) Don't apply accumulator updates multiple times for tasks in result stages

2014-09-21 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3628: - Target Version/s: 1.1.1, 1.2.0, 0.9.3, 1.0.3 (was: 1.1.1, 1.2.0, 1.0.3) > Don't apply accumulator

[jira] [Updated] (SPARK-3628) Don't apply accumulator updates multiple times for tasks in result stages

2014-09-21 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3628: - Target Version/s: 1.1.1, 1.2.0, 1.0.3 (was: 1.1.1, 1.2.0, 0.9.3, 1.0.3) > Don't apply accumulator

[jira] [Comment Edited] (SPARK-3628) Don't apply accumulator updates multiple times for tasks in result stages

2014-09-21 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142756#comment-14142756 ] Matei Zaharia edited comment on SPARK-3628 at 9/21/14 10:49 PM:

[jira] [Comment Edited] (SPARK-3628) Don't apply accumulator updates multiple times for tasks in result stages

2014-09-21 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142756#comment-14142756 ] Matei Zaharia edited comment on SPARK-3628 at 9/21/14 10:43 PM:

[jira] [Commented] (SPARK-3628) Don't apply accumulator updates multiple times for tasks in result stages

2014-09-21 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14142756#comment-14142756 ] Matei Zaharia commented on SPARK-3628: -- BTW the problem is that this used to be guard

[jira] [Created] (SPARK-3628) Don't apply accumulator updates multiple times for tasks in result stages

2014-09-21 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3628: Summary: Don't apply accumulator updates multiple times for tasks in result stages Key: SPARK-3628 URL: https://issues.apache.org/jira/browse/SPARK-3628 Project: Spar

[jira] [Created] (SPARK-3619) Upgrade to Mesos 0.21 to work around MESOS-1688

2014-09-20 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3619: Summary: Upgrade to Mesos 0.21 to work around MESOS-1688 Key: SPARK-3619 URL: https://issues.apache.org/jira/browse/SPARK-3619 Project: Spark Issue Type: Imp

[jira] [Created] (SPARK-3611) Show number of cores for each executor in application web UI

2014-09-19 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3611: Summary: Show number of cores for each executor in application web UI Key: SPARK-3611 URL: https://issues.apache.org/jira/browse/SPARK-3611 Project: Spark I

[jira] [Commented] (SPARK-3129) Prevent data loss in Spark Streaming

2014-09-19 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14141382#comment-14141382 ] Matei Zaharia commented on SPARK-3129: -- So Hari, what is the maximum sustainable rate

[jira] [Comment Edited] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark

2014-09-18 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139163#comment-14139163 ] Matei Zaharia edited comment on SPARK-2593 at 9/18/14 4:56 PM: -

[jira] [Commented] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark

2014-09-18 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14139163#comment-14139163 ] Matei Zaharia commented on SPARK-2593: -- Sure, it would be great to do this for stream

[jira] [Commented] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark

2014-09-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138402#comment-14138402 ] Matei Zaharia commented on SPARK-2593: -- BTW doing this for the ActorReceiver for Spar

[jira] [Commented] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark

2014-09-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138390#comment-14138390 ] Matei Zaharia commented on SPARK-2593: -- The reason that we don't want to expose Akka

[jira] [Updated] (SPARK-2620) case class cannot be used as key for reduce

2014-09-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2620: - Affects Version/s: 1.1.0 > case class cannot be used as key for reduce > -

[jira] [Commented] (SPARK-3129) Prevent data loss in Spark Streaming

2014-09-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138281#comment-14138281 ] Matei Zaharia commented on SPARK-3129: -- Great, it will be nice to see how fast this i

[jira] [Commented] (SPARK-3129) Prevent data loss in Spark Streaming

2014-09-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138014#comment-14138014 ] Matei Zaharia commented on SPARK-3129: -- Hari, have you actually benchmarked a WAL bas

[jira] [Commented] (SPARK-3530) Pipeline and Parameters

2014-09-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137719#comment-14137719 ] Matei Zaharia commented on SPARK-3530: -- To comment on the versioning stuff here, "dep

[jira] [Updated] (SPARK-1449) Please delete old releases from mirroring system

2014-09-14 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-1449: - Assignee: Patrick Wendell > Please delete old releases from mirroring system > ---

[jira] [Commented] (SPARK-1449) Please delete old releases from mirroring system

2014-09-14 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1412#comment-1412 ] Matei Zaharia commented on SPARK-1449: -- Hey folks, sorry for the delay -- will look i

[jira] [Created] (SPARK-3467) Python BatchedSerializer should dynamically lower batch size for large objects

2014-09-09 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3467: Summary: Python BatchedSerializer should dynamically lower batch size for large objects Key: SPARK-3467 URL: https://issues.apache.org/jira/browse/SPARK-3467 Project:

[jira] [Updated] (SPARK-3466) Limit size of results that a driver collects for each action

2014-09-09 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3466: - Issue Type: New Feature (was: Improvement) > Limit size of results that a driver collects for eac

[jira] [Created] (SPARK-3466) Limit size of results that a driver collects for each action

2014-09-09 Thread Matei Zaharia (JIRA)
Matei Zaharia created SPARK-3466: Summary: Limit size of results that a driver collects for each action Key: SPARK-3466 URL: https://issues.apache.org/jira/browse/SPARK-3466 Project: Spark I

[jira] [Commented] (SPARK-3441) Explain in docs that repartitionAndSortWithinPartitions enacts Hadoop style shuffle

2014-09-08 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126175#comment-14126175 ] Matei Zaharia commented on SPARK-3441: -- I agree that we should have more of a doc her

[jira] [Updated] (SPARK-3444) Provide a way to easily change the log level in the Spark shell while running

2014-09-08 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3444: - Assignee: Holden Karau > Provide a way to easily change the log level in the Spark shell while run

[jira] [Updated] (SPARK-3444) Provide a way to easily change the log level in the Spark shell while running

2014-09-08 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-3444: - Assignee: Holden Karau (was: Holden Karau) > Provide a way to easily change the log level in the

[jira] [Commented] (SPARK-2688) Need a way to run multiple data pipeline concurrently

2014-09-08 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125868#comment-14125868 ] Matei Zaharia commented on SPARK-2688: -- Just as a note, to launch multiple Spark acti

[jira] [Updated] (SPARK-2978) Provide an MR-style shuffle transformation

2014-09-08 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia updated SPARK-2978: - Assignee: Sandy Ryza > Provide an MR-style shuffle transformation > --

[jira] [Resolved] (SPARK-2978) Provide an MR-style shuffle transformation

2014-09-08 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matei Zaharia resolved SPARK-2978. -- Resolution: Fixed Fix Version/s: 1.2.0 > Provide an MR-style shuffle transformation > ---
