[jira] [Updated] (SPARK-12957) Derive and propagate data constrains in logical plan

2016-02-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12957: - Assignee: Sameer Agarwal > Derive and propagate data constrains in logical p

[jira] [Resolved] (SPARK-13090) Add initial support for constraint propagation in SparkSQL

2016-02-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13090. -- Resolution: Fixed Fix Version/s: 2.0.0 > Add initial support for constra

Re: Spark DataFrame Catalyst - Another Oracle like query optimizer?

2016-02-02 Thread Michael Armbrust
> > A principal difference between RDDs and DataFrames/Datasets is that the > latter have a schema associated to them. This means that they support only > certain types (primitives, case classes and more) and that they are > uniform, whereas RDDs can contain any serializable object and must not > n

[jira] [Updated] (SPARK-13002) Mesos scheduler backend does not follow the property spark.dynamicAllocation.initialExecutors

2016-02-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13002: - Target Version/s: 2.0.0 (was: 1.6.1, 2.0.0) > Mesos scheduler backend does not fol

Re: Spark 1.6.1

2016-02-02 Thread Michael Armbrust
> Mingyu > > From: Romi Kuntsman > Date: Tuesday, February 2, 2016 at 3:16 AM > To: Michael Armbrust > Cc: Hamel Kothari , Ted Yu , > "dev@spark.apache.org" > Subject: Re: Spark 1.6.1 > > Hi Michael, > What about the memory leak bu

[jira] [Resolved] (SPARK-12783) Dataset map serialization error

2016-02-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12783. -- Resolution: Fixed Fix Version/s: 1.6.1 Closing, please reopen if you can

Re: Spark 1.6.1

2016-02-02 Thread Michael Armbrust
> > What about the memory leak bug? > https://issues.apache.org/jira/browse/SPARK-11293 > Even after the memory rewrite in 1.6.0, it still happens in some cases. > Will it be fixed for 1.6.1? > I think we have enough issues queued up that I would not hold the release for that, but if there is a pa

[jira] [Updated] (SPARK-13094) No encoder implicits for Seq[Primitive]

2016-02-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13094: - Assignee: Michael Armbrust > No encoder implicits for Seq[Primit

[jira] [Resolved] (SPARK-13094) No encoder implicits for Seq[Primitive]

2016-02-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13094. -- Resolution: Fixed Fix Version/s: 1.6.1 2.0.0 Issue resolved

[jira] [Resolved] (SPARK-10820) Initial infrastructure

2016-02-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10820. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11006

Re: optimal way to load parquet files with partition

2016-02-02 Thread Michael Armbrust
It depends how many partitions you have and if you are only doing a single operation. Loading all the data and filtering will require us to scan the directories to discover all the months. This information will be cached. Then we should prune and avoid reading unneeded data. Option 1 does not re
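The discover-then-prune behaviour this reply describes can be modelled without Spark. The sketch below is only an illustration in plain Python, not Spark's implementation: the `month=<n>` directory layout, the file name `part-00000.parquet`, and the helper names are all assumptions made for the example.

```python
import os
import tempfile


def discover_partitions(root):
    """Scan `root` once and map each `month=<n>` directory to its files."""
    partitions = {}
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        if os.path.isdir(path) and entry.startswith("month="):
            month = int(entry.split("=", 1)[1])
            partitions[month] = sorted(
                os.path.join(path, name) for name in os.listdir(path)
            )
    return partitions


def prune(partitions, wanted_months):
    """Keep only the files whose partition value passes the filter."""
    files = []
    for month in sorted(partitions):
        if month in wanted_months:
            files.extend(partitions[month])
    return files


def build_example_table(root, months=range(1, 13)):
    """Create a hypothetical table layout with one placeholder file per month."""
    for month in months:
        part_dir = os.path.join(root, "month=%d" % month)
        os.makedirs(part_dir)
        with open(os.path.join(part_dir, "part-00000.parquet"), "w") as handle:
            handle.write("placeholder")  # not a real Parquet file


def example():
    with tempfile.TemporaryDirectory() as root:
        build_example_table(root)
        partitions = discover_partitions(root)  # the one-off directory scan
        selected = prune(partitions, {1, 2})    # only these files would be read
        return len(partitions), [os.path.relpath(p, root) for p in selected]
```

In this model the directory scan touches every month once, while the filter decides which files are opened at all; that is the trade-off the reply weighs against listing the wanted paths explicitly.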

[jira] [Created] (SPARK-13128) API for building arrays / lists encoders

2016-02-01 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-13128: Summary: API for building arrays / lists encoders Key: SPARK-13128 URL: https://issues.apache.org/jira/browse/SPARK-13128 Project: Spark Issue Type

[jira] [Updated] (SPARK-13122) Race condition in MemoryStore.unrollSafely() causes memory leak

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13122: - Target Version/s: 1.6.1 > Race condition in MemoryStore.unrollSafely() causes mem

[jira] [Updated] (SPARK-13087) Grouping by a complex expression may lead to incorrect AttributeReferences in aggregations

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13087: - Affects Version/s: 2.0.0 > Grouping by a complex expression may lead to incorr

[jira] [Commented] (SPARK-13087) Grouping by a complex expression may lead to incorrect AttributeReferences in aggregations

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15127353#comment-15127353 ] Michael Armbrust commented on SPARK-13087: -- Here's a self-contained

[jira] [Updated] (SPARK-13087) Grouping by a complex expression may lead to incorrect AttributeReferences in aggregations

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13087: - Priority: Critical (was: Major) > Grouping by a complex expression may lead

[jira] [Updated] (SPARK-13094) No encoder implicits for Seq[Primitive]

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13094: - Summary: No encoder implicits for Seq[Primitive] (was: Dataset Aggregators do not work

[jira] [Commented] (SPARK-13094) Dataset Aggregators do not work with complex types

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126996#comment-15126996 ] Michael Armbrust commented on SPARK-13094: -- Sorry, I'm looking at th

[jira] [Updated] (SPARK-13094) Dataset Aggregators do not work with complex types

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13094: - Issue Type: Improvement (was: Bug) > Dataset Aggregators do not work with complex ty

[jira] [Updated] (SPARK-13094) Dataset Aggregators do not work with complex types

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13094: - Description: Dataset aggregators with complex types fail with unable to find encoder for

[jira] [Commented] (SPARK-13083) Small spark sql queries get blocked if there is a long running query over a lot a partitions

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126954#comment-15126954 ] Michael Armbrust commented on SPARK-13083: -- The other possibility is that

[jira] [Resolved] (SPARK-11780) Provide type aliases in org.apache.spark.sql.types for backwards compatibility

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-11780. -- Resolution: Fixed Fix Version/s: 1.6.1 Issue resolved by pull request 10915

[jira] [Resolved] (SPARK-13083) Small spark sql queries get blocked if there is a long running query over a lot a partitions

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13083. -- Resolution: Not A Problem Assignee: Michael Armbrust > Small spark sql quer

[jira] [Commented] (SPARK-13083) Small spark sql queries get blocked if there is a long running query over a lot a partitions

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126905#comment-15126905 ] Michael Armbrust commented on SPARK-13083: -- You need to also ensure the que

Re: Spark 1.6.1

2016-02-01 Thread Michael Armbrust
ackwards > compatible according to the Jackson folks. > > On Mon, Feb 1, 2016 at 10:29 AM Ted Yu wrote: > >> SPARK-12624 has been resolved. >> According to Wenchen, SPARK-12783 is fixed in 1.6.0 release. >> >> Are there other blockers for Spark 1.6.1 ? >>

[jira] [Updated] (SPARK-10777) order by fails when column is aliased and projection includes windowed aggregate

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10777: - Assignee: Xiao Li > order by fails when column is aliased and projection inclu

[jira] [Updated] (SPARK-12705) Sorting column can't be resolved if it's not in projection

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12705: - Assignee: Xiao Li > Sorting column can't be resolved if it's not

[jira] [Resolved] (SPARK-10777) order by fails when column is aliased and projection includes windowed aggregate

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10777. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10678

[jira] [Resolved] (SPARK-12705) Sorting column can't be resolved if it's not in projection

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12705. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10678

[jira] [Created] (SPARK-13118) Support for classes defined in package objects

2016-02-01 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-13118: Summary: Support for classes defined in package objects Key: SPARK-13118 URL: https://issues.apache.org/jira/browse/SPARK-13118 Project: Spark Issue

[jira] [Resolved] (SPARK-12989) Bad interaction between StarExpansion and ExtractWindowExpressions

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12989. -- Resolution: Fixed Fix Version/s: 1.6.1 2.0.0 Issue resolved

[jira] [Assigned] (SPARK-10820) Initial infrastructure

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reassigned SPARK-10820: Assignee: Michael Armbrust > Initial infrastruct

[jira] [Updated] (SPARK-10820) Initial infrastructure

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10820: - Summary: Initial infrastructure (was: Physical plan: determine physical operators

[jira] [Commented] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch

2016-02-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126678#comment-15126678 ] Michael Armbrust commented on SPARK-13101: -- /cc [~lian cheng] [~cloud

[jira] [Created] (SPARK-13099) ccjlbr

2016-01-29 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-13099: Summary: ccjlbr Key: SPARK-13099 URL: https://issues.apache.org/jira/browse/SPARK-13099 Project: Spark Issue Type: Bug Reporter: Michael

[jira] [Updated] (SPARK-13090) Add initial support for constraint propagation in SparkSQL

2016-01-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13090: - Assignee: Sameer Agarwal > Add initial support for constraint propagation in Spark

[jira] [Updated] (SPARK-13092) Track constraints in ExpressionSet

2016-01-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13092: - Assignee: Sameer Agarwal > Track constraints in Expression

[jira] [Updated] (SPARK-13091) Rewrite/Propagate constraints for Aliases

2016-01-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13091: - Assignee: Sameer Agarwal > Rewrite/Propagate constraints for Alia

[jira] [Updated] (SPARK-13094) Dataset Aggregators do not work with complex types

2016-01-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13094: - Target Version/s: 1.6.1 > Dataset Aggregators do not work with complex ty

[jira] [Commented] (SPARK-13094) Dataset Aggregators do not work with complex types

2016-01-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15124205#comment-15124205 ] Michael Armbrust commented on SPARK-13094: -- Sorry, I think I was unclear.

Re: Spark 1.6.1

2016-01-29 Thread Michael Armbrust
I think this is fixed in branch-1.6 already. If you can reproduce it there can you please open a JIRA and ping me? On Fri, Jan 29, 2016 at 12:16 PM, deenar < deenar.toras...@thinkreactive.co.uk> wrote: > Hi Michael > > The Dataset aggregators do not appear to support complex Spark-SQL types. I >

[jira] [Updated] (SPARK-13087) Grouping by a complex expression may lead to incorrect AttributeReferences in aggregations

2016-01-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13087: - Target Version/s: 1.6.1 > Grouping by a complex expression may lead to incorr

Re: Spark 2.0.0 release plan

2016-01-29 Thread Michael Armbrust
> > Regards > > Deenar > > > > On 27 January 2016 at 19:55, Michael Armbrust > > wrote: > >> > >> We do maintenance releases on demand when there is enough to justify > doing > >> one. I'm hoping to cut 1.6.1 soon, but have not had t

[jira] [Resolved] (SPARK-12926) SQLContext to display warning message when non-sql configs are being set

2016-01-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12926. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10849

Re: Broadcast join on multiple dataframes

2016-01-28 Thread Michael Armbrust
Can you provide the analyzed and optimized plans (explain(true))? On Thu, Jan 28, 2016 at 12:26 PM, Srikanth wrote: > Hello, > > I have a use case where one large table has to be joined with several > smaller tables. > I've added broadcast hint for all small tables in the joins. > > val large

[jira] [Commented] (SPARK-12725) SQL generation suffers from name conficts introduced by some analysis rules

2016-01-28 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122016#comment-15122016 ] Michael Armbrust commented on SPARK-12725: -- Why don't we just add

Re: Spark 2.0.0 release plan

2016-01-27 Thread Michael Armbrust
We do maintenance releases on demand when there is enough to justify doing one. I'm hoping to cut 1.6.1 soon, but have not had time yet. On Wed, Jan 27, 2016 at 8:12 AM, Daniel Siegmann < daniel.siegm...@teamaol.com> wrote: > Will there continue to be monthly releases on the 1.6.x branch during

[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots

2016-01-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15118433#comment-15118433 ] Michael Armbrust commented on SPARK-12988: -- Here are my thoughts a

[jira] [Commented] (SPARK-8279) udf_round_3 test fails

2016-01-26 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117838#comment-15117838 ] Michael Armbrust commented on SPARK-8279: - I don't think it wa

Re: NPE from sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply?

2016-01-26 Thread Michael Armbrust
That is a bug in generated code. It would be great if you could post a reproduction. On Tue, Jan 26, 2016 at 9:15 AM, Jacek Laskowski wrote: > Hi, > > Does this say anything to anyone? :) It's with Spark 2.0.0-SNAPSHOT > built today. Is this something I could fix myself in my code or is > this

Re: Datasets and columns

2016-01-25 Thread Michael Armbrust
kes one column is there > a way to do a custom encoder with my own columns > On Jan 25, 2016 1:30 PM, "Michael Armbrust" > wrote: > >> The encoder is responsible for mapping your class onto some set of >> columns. Try running: datasetMyType.printSchema() >>

[jira] [Created] (SPARK-12989) Bad interaction between StarExpansion and ExtractWindowExpressions

2016-01-25 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12989: Summary: Bad interaction between StarExpansion and ExtractWindowExpressions Key: SPARK-12989 URL: https://issues.apache.org/jira/browse/SPARK-12989 Project

[jira] [Resolved] (SPARK-12975) Throwing Exception when Bucketing Columns are part of Partitioning Columns

2016-01-25 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12975. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10891

[jira] [Updated] (SPARK-12975) Throwing Exception when Bucketing Columns are part of Partitioning Columns

2016-01-25 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12975: - Assignee: Xiao Li > Throwing Exception when Bucketing Columns are part of Partition

Re: Datasets and columns

2016-01-25 Thread Michael Armbrust
The encoder is responsible for mapping your class onto some set of columns. Try running: datasetMyType.printSchema() On Mon, Jan 25, 2016 at 1:16 PM, Steve Lewis wrote: > assume I have the following code > > SparkConf sparkConf = new SparkConf(); > > JavaSparkContext sqlCtx= new JavaSparkContex
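As a rough mental model of what "mapping your class onto some set of columns" means, the plain-Python sketch below derives a column list from a class definition and flattens an object into a row. It is not Spark's encoder API; `MyType` and its three fields are hypothetical, and the printed layout only loosely imitates `printSchema()`.

```python
from dataclasses import dataclass, fields


@dataclass
class MyType:
    # Hypothetical fields; the original message does not say what MyType holds.
    name: str
    score: float
    visits: int


def schema_of(cls):
    """One (column name, type name) pair per declared field, in order."""
    return [(f.name, f.type.__name__) for f in fields(cls)]


def to_row(obj):
    """Flatten an object into a tuple whose positions follow the schema."""
    return tuple(getattr(obj, f.name) for f in fields(obj))


def format_schema(cls):
    """Render the derived columns as an indented tree."""
    lines = ["root"]
    for name, type_name in schema_of(cls):
        lines.append(" |-- %s: %s" % (name, type_name))
    return "\n".join(lines)
```

Calling `print(format_schema(MyType))` lists one line per field, which is the kind of information the suggested `printSchema()` call would reveal about the real Dataset.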

Re: Trouble dropping columns from a DataFrame that has other columns with dots in their names

2016-01-25 Thread Michael Armbrust
Looks like you found a bug. I've filed them here: SPARK-12987 - Drop fails when columns contain dots SPARK-12988 - Can't drop columns that contain dots On Fri, Jan 22, 2016 at 3:18 PM, Joshua

[jira] [Created] (SPARK-12988) Can't drop columns that contain dots

2016-01-25 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12988: Summary: Can't drop columns that contain dots Key: SPARK-12988 URL: https://issues.apache.org/jira/browse/SPARK-12988 Project: Spark Issue Type

[jira] [Updated] (SPARK-12987) Drop fails when columns contain dots

2016-01-25 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12987: - Priority: Critical (was: Major) > Drop fails when columns contain d

[jira] [Created] (SPARK-12987) Drop fails when columns contain quotes

2016-01-25 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12987: Summary: Drop fails when columns contain quotes Key: SPARK-12987 URL: https://issues.apache.org/jira/browse/SPARK-12987 Project: Spark Issue Type

[jira] [Updated] (SPARK-12987) Drop fails when columns contain dots

2016-01-25 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12987: - Summary: Drop fails when columns contain dots (was: Drop fails when columns contain

Re: I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Michael Armbrust
ols in > neighboring regions > > On Wed, Jan 20, 2016 at 10:43 AM, Michael Armbrust > wrote: > >> The analog to PairRDD is a GroupedDataset (created by calling groupBy), >> which offers similar functionality, but doesn't require you to construct >> new objec

Re: I need help mapping a PairRDD solution to Dataset

2016-01-20 Thread Michael Armbrust
The analog to PairRDD is a GroupedDataset (created by calling groupBy), which offers similar functionality, but doesn't require you to construct new objects that are in the form of key/value pairs. It doesn't matter if they are complex objects, as long as you can create an encoder for them (current
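The difference from the PairRDD style can be illustrated in plain Python: the records stay whole and the key is computed by a function, so nothing has to be repacked into (key, value) tuples first. This is a conceptual sketch only, not the GroupedDataset API; the `Reading` record and both helpers are assumptions made for the example.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type; any object works as long as a key can be derived.
Reading = namedtuple("Reading", ["region", "sensor", "value"])


def group_by(items, key_func):
    """Group whole records under the key computed by `key_func`."""
    groups = defaultdict(list)
    for item in items:
        groups[key_func(item)].append(item)
    return dict(groups)


def map_groups(groups, func):
    """Apply `func(key, members)` to each group, in key order."""
    return [func(key, members) for key, members in sorted(groups.items())]


readings = [
    Reading("north", "s1", 2.0),
    Reading("south", "s2", 5.0),
    Reading("north", "s3", 4.0),
]
by_region = group_by(readings, lambda r: r.region)
totals = map_groups(by_region, lambda region, rs: (region, sum(r.value for r in rs)))
```

Each group still contains complete `Reading` objects, so later steps can use any field without unpacking a value half of a pair.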

Re: Redundant common columns of nature full outer join

2016-01-20 Thread Michael Armbrust
If you use the join that takes USING columns it should automatically coalesce (take the non null value from) the left/right columns: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L405 On Tue, Jan 19, 2016 at 10:51 PM, Zhong Wang wrote:
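The coalescing that this reply describes for a full outer join can be spelled out in plain Python. The sketch below is a model of the semantics, not Spark code; it assumes each key appears at most once per side, and the function name and sample rows are invented for the example.

```python
def full_outer_join_using(left, right, key, left_cols, right_cols):
    """Full outer join on `key`, emitting a single coalesced key column."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}
    joined = []
    for k in sorted(set(left_by_key) | set(right_by_key)):
        l_row = left_by_key.get(k, {})
        r_row = right_by_key.get(k, {})
        # Coalesce: take the key from the left side if present, else the right.
        l_key = l_row.get(key)
        out = {key: l_key if l_key is not None else r_row.get(key)}
        for col in left_cols:
            out[col] = l_row.get(col)   # None when the left side has no match
        for col in right_cols:
            out[col] = r_row.get(col)   # None when the right side has no match
        joined.append(out)
    return joined


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}]
result = full_outer_join_using(left, right, "id", ["a"], ["b"])
```

The result has one `id` column that is never null, instead of separate left and right key columns where one side is null for unmatched rows.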

[jira] [Resolved] (SPARK-12816) Schema generation for type aliases does not work

2016-01-19 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12816. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10749

Re: Spark SQL -Hive transactions support

2016-01-19 Thread Michael Armbrust
We don't support Hive-style transactions. On Tue, Jan 19, 2016 at 11:32 AM, hnagar wrote: > Hive has transactions support since version 0.14. > > I am using Spark 1.6, and Hive 1.2.1, are transactions supported in Spark > SQL now. I tried in the Spark-Shell and it gives the following error > > or

Re: Spark Dataset doesn't have api for changing columns

2016-01-19 Thread Michael Armbrust
In Spark 2.0 we are planning to combine DataFrame and Dataset so that all the methods will be available on either class. On Tue, Jan 19, 2016 at 3:42 AM, Milad khajavi wrote: > Hi Spark users, > > when I want to map the result of count on groupBy, I need to convert the > result to Dataframe, the

Re: Serializing DataSets

2016-01-18 Thread Michael Armbrust
What error? On Mon, Jan 18, 2016 at 9:01 AM, Simon Hafner wrote: > And for deserializing, > `sqlContext.read.parquet("path/to/parquet").as[T]` and catch the > error? > > 2016-01-14 3:43 GMT+08:00 Michael Armbrust : > > Yeah, thats the best way for now (note the co

Re: DataFrameWriter on partitionBy for parquet eat all RAM

2016-01-15 Thread Michael Armbrust
See here for some workarounds: https://issues.apache.org/jira/browse/SPARK-12546 On Thu, Jan 14, 2016 at 6:46 PM, Jerry Lam wrote: > Hi Arkadiusz, > > the partitionBy is not designed to have many distinct value the last time > I used it. If you search in the mailing list, I think there are coupl

[jira] [Created] (SPARK-12841) UnresolvedException with cast

2016-01-15 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12841: Summary: UnresolvedException with cast Key: SPARK-12841 URL: https://issues.apache.org/jira/browse/SPARK-12841 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-12813) Eliminate serialization for back to back operations

2016-01-14 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12813. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10747

Re: SQL UDF problem (with re to types)

2016-01-14 Thread Michael Armbrust
ved through proper generics > implementation in Java 1.8). > > On Thu, Jan 14, 2016 at 1:42 PM, Michael Armbrust > wrote: > >> We automatically convert types for UDFs defined in Scala, but we can't do >> it in Java because the types are erased by the compiler. If you w

Re: SQL UDF problem (with re to types)

2016-01-14 Thread Michael Armbrust
We automatically convert types for UDFs defined in Scala, but we can't do it in Java because the types are erased by the compiler. If you want to use double you should cast before calling the UDF. On Wed, Jan 13, 2016 at 8:10 PM, Raghu Ganti wrote: > So, when I try BigDecimal, it works. But, sh

[jira] [Updated] (SPARK-11780) Provide type aliases in org.apache.spark.sql.types for backwards compatibility

2016-01-13 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11780: - Target Version/s: 1.6.1 > Provide type aliases in org.apache.spark.sql.types

Spark 1.6.1

2016-01-13 Thread Michael Armbrust
Hey All, While I'm not aware of any critical issues with 1.6.0, there are several corner cases that users are hitting with the Dataset API that are fixed in branch-1.6. As such I'm considering a 1.6.1 release. At the moment there are only two critical issues targeted for 1.6.1: - SPARK-12624 -

[jira] [Updated] (SPARK-12783) Dataset map serialization error

2016-01-13 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12783: - Summary: Dataset map serialization error (was: Dataset map) > Dataset map serializat

[jira] [Resolved] (SPARK-12478) Dataset fields of product types can't be null

2016-01-13 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12478. -- Resolution: Fixed Fix Version/s: 1.6.1 This is fixed in branch-1.6 now

[jira] [Updated] (SPARK-12478) Dataset fields of product types can't be null

2016-01-13 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12478: - Fix Version/s: 2.0.0 > Dataset fields of product types can't

[jira] [Created] (SPARK-12813) Eliminate serialization for back to back operations

2016-01-13 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12813: Summary: Eliminate serialization for back to back operations Key: SPARK-12813 URL: https://issues.apache.org/jira/browse/SPARK-12813 Project: Spark

Re: How to make Dataset api as fast as DataFrame

2016-01-13 Thread Michael Armbrust
The focus of this release was to get the API out there, and there are a lot of low-hanging performance optimizations. That said, there is likely always going to be some cost of materializing objects. Another note: any time you're comparing performance, it's useful to include the output of explain so we c

Re: Serializing DataSets

2016-01-13 Thread Michael Armbrust
Yeah, that's the best way for now (note the conversion is purely logical so there is no cost of calling toDF()). We'll likely be combining the classes in Spark 2.0 to remove this awkwardness. On Tue, Jan 12, 2016 at 11:20 PM, Simon Hafner wrote: > What's the proper way to write DataSets to disk?

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-12 Thread Michael Armbrust
ed(DAGScheduler.scala:861) >> at >> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1607) >> at >> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599) >> at >> org.apache.spark.scheduler.DAGS

[jira] [Updated] (SPARK-12783) Dataset map

2016-01-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12783: - Assignee: Wenchen Fan Target Version/s: 1.6.1, 2.0.0 Priority

Re: [Spark SQL]: Issues with writing dataframe with Append Mode to Parquet

2016-01-12 Thread Michael Armbrust
There can be data loss when you are using the DirectOutputCommitter and speculation is turned on, so we disable it automatically. On Tue, Jan 12, 2016 at 1:11 PM, Jerry Lam wrote: > Hi spark users and developers, > > I wonder if the following observed behaviour is expected. I'm writing > datafram

[jira] [Resolved] (SPARK-9843) Catalyst: Allow adding custom optimizers

2016-01-12 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-9843. - Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10210 [https

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-12 Thread Michael Armbrust
> > df1.as[TestCaseClass].map(_.toMyMap).show() //fails > > This looks like a bug. What is the error? It might be fixed in branch-1.6/master if you can test there. > Please advice on what I may be missing here? > > > Also for join, may I suggest to have a custom encoder / transformation to > say

Re: Spark 1.6 udf/udaf alternatives in dataset?

2016-01-11 Thread Michael Armbrust
> > Also, while extracting a value into Dataset using as[U] method, how could > I specify a custom encoder/translation to case class (where I don't have > the same column-name mapping or same data-type mapping)? > There is no public API yet for defining your own encoders. You change the column na

Re: Dataset throws: Task not serializable

2016-01-11 Thread Michael Armbrust
the same error with > dummy data. > > Thanks! > > On Thu, Jan 7, 2016 at 2:03 PM, Michael Armbrust > wrote: > >> Were you running in the REPL? >> >> On Thu, Jan 7, 2016 at 10:34 AM, Michael Armbrust > > wrote: >> >>> Thanks for providing a great

[jira] [Resolved] (SPARK-12758) Add note to Spark SQL Migration section about SPARK-11724

2016-01-11 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12758. -- Resolution: Fixed Fix Version/s: 1.6.1 2.0.0 Issue resolved

[jira] [Commented] (SPARK-12714) Transforming Dataset with sequences of case classes to RDD causes Task Not Serializable exception

2016-01-11 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15092591#comment-15092591 ] Michael Armbrust commented on SPARK-12714: -- Would you be able to test

[jira] [Resolved] (SPARK-12696) Dataset serialization error

2016-01-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12696. -- Resolution: Fixed Fix Version/s: 1.6.1 Issue resolved by pull request 10650

[jira] [Updated] (SPARK-12704) we may repartition a relation even it's not needed

2016-01-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12704: - Issue Type: Improvement (was: Bug) > we may repartition a relation even it's no

[jira] [Commented] (SPARK-12704) we may repartition a relation even it's not needed

2016-01-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15088638#comment-15088638 ] Michael Armbrust commented on SPARK-12704: -- I think this explanation migh

[jira] [Updated] (SPARK-12696) Dataset serialization error

2016-01-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12696: - Target Version/s: 1.6.1 (was: 1.6.1, 2.0.0) > Dataset serialization er

Re: Dataset throws: Task not serializable

2016-01-07 Thread Michael Armbrust
Were you running in the REPL? On Thu, Jan 7, 2016 at 10:34 AM, Michael Armbrust wrote: > Thanks for providing a great description. I've opened > https://issues.apache.org/jira/browse/SPARK-12696 > > I'm actually getting a different error (running in notebooks though).

Re: Dataset throws: Task not serializable

2016-01-07 Thread Michael Armbrust
Thanks for providing a great description. I've opened https://issues.apache.org/jira/browse/SPARK-12696 I'm actually getting a different error (running in notebooks though). Something seems wrong either way. > > *P.S* mapping by name with case classes doesn't work if the order of the > fields of

[jira] [Updated] (SPARK-12696) Dataset serialization error

2016-01-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12696: - Priority: Blocker (was: Major) > Dataset serialization er

[jira] [Created] (SPARK-12696) Dataset serialization error

2016-01-07 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-12696: Summary: Dataset serialization error Key: SPARK-12696 URL: https://issues.apache.org/jira/browse/SPARK-12696 Project: Spark Issue Type: Bug

Re: problem with DataFrame df.withColumn() org.apache.spark.sql.AnalysisException: resolved attribute(s) missing

2016-01-06 Thread Michael Armbrust
oh, and I think I installed jekyll using "gem install jekyll" On Wed, Jan 6, 2016 at 4:17 PM, Michael Armbrust wrote: > from docs/ run: > > SKIP_API=1 jekyll serve --watch > > On Wed, Jan 6, 2016 at 4:12 PM, Andy Davidson < > a...@santacruzintegration.com>
