[jira] [Updated] (SPARK-13728) Fix ORC PPD

2016-03-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13728: - Assignee: Hyukjin Kwon > Fix ORC PPD > --- > > Key:

[jira] [Commented] (SPARK-13728) Fix ORC PPD

2016-03-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185463#comment-15185463 ] Michael Armbrust commented on SPARK-13728: -- That sounds like a good lea

[jira] [Commented] (SPARK-13665) Initial separation of concerns in HadoopFSRelation

2016-03-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185460#comment-15185460 ] Michael Armbrust commented on SPARK-13665: -- I think what everyone is goin

[jira] [Updated] (SPARK-13665) Initial separation of concerns in HadoopFSRelation

2016-03-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13665: - Summary: Initial separation of concerns in HadoopFSRelation (was: Initial separation of

[jira] [Created] (SPARK-13738) Clean up ResolveDataSource

2016-03-07 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-13738: Summary: Clean up ResolveDataSource Key: SPARK-13738 URL: https://issues.apache.org/jira/browse/SPARK-13738 Project: Spark Issue Type: Sub-task

[jira] [Resolved] (SPARK-13648) org.apache.spark.sql.hive.client.VersionsSuite fails NoClassDefFoundError on IBM JDK

2016-03-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13648. -- Resolution: Fixed Fix Version/s: 1.6.1 2.0.0 Issue resolved

[jira] [Updated] (SPARK-13648) org.apache.spark.sql.hive.client.VersionsSuite fails NoClassDefFoundError on IBM JDK

2016-03-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13648: - Fix Version/s: (was: 1.6.1) 1.6.2

[jira] [Updated] (SPARK-13722) No Push Down for Non-deterministic Predicates through Generate

2016-03-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13722: - Assignee: Xiao Li > No Push Down for Non-deterministic Predicates through Gener

[jira] [Resolved] (SPARK-13722) No Push Down for Non-deterministic Predicates through Generate

2016-03-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13722. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11562

[jira] [Updated] (SPARK-13730) Nulls in dataframes getting converted to 0 with spark 2.0 SNAPSHOT

2016-03-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13730: - Target Version/s: 2.0.0 > Nulls in dataframes getting converted to 0 with spark

Re: Nulls getting converted to 0 with spark 2.0 SNAPSHOT

2016-03-07 Thread Michael Armbrust
That looks like a bug to me. Open a JIRA? On Mon, Mar 7, 2016 at 11:30 AM, Franklyn D'souza < franklyn.dso...@shopify.com> wrote: > Just wanted to confirm that this is the expected behaviour. > > Basically I'm putting nulls into a non-nullable LongType column and doing > a transformation operati

[jira] [Created] (SPARK-13729) Reimplement the planning tests on SimpleTextRelation

2016-03-07 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-13729: Summary: Reimplement the planning tests on SimpleTextRelation Key: SPARK-13729 URL: https://issues.apache.org/jira/browse/SPARK-13729 Project: Spark

[jira] [Created] (SPARK-13728) Fix ORC PPD

2016-03-07 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-13728: Summary: Fix ORC PPD Key: SPARK-13728 URL: https://issues.apache.org/jira/browse/SPARK-13728 Project: Spark Issue Type: Sub-task

[jira] [Resolved] (SPARK-13694) QueryPlan.expressions should always include all expressions

2016-03-07 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13694. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11532

[jira] [Updated] (SPARK-13605) Bean encoder cannot handle nonbean properties - no way to Encode nonbean Java objects with columns

2016-03-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13605: - Fix Version/s: (was: 1.6.0) > Bean encoder cannot handle nonbean properties - no

[jira] [Updated] (SPARK-13605) Bean encoder cannot handle nonbean properties - no way to Encode nonbean Java objects with columns

2016-03-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13605: - Target Version/s: 2.0.0 (was: 1.6.0) > Bean encoder cannot handle nonbean propert

[jira] [Updated] (SPARK-13605) Bean encoder cannot handle nonbean properties - no way to Encode nonbean Java objects with columns

2016-03-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13605: - Component/s: SQL > Bean encoder cannot handle nonbean properties - no way to Enc

[jira] [Updated] (SPARK-13605) Bean encoder cannot handle nonbean properties - no way to Encode nonbean Java objects with columns

2016-03-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13605: - Description: in the current environment the only way to turn a List or JavaRDD into a

Re: Spark 1.5.2 : change datatype in programaticallly generated schema

2016-03-04 Thread Michael Armbrust
Change the type of a subset of the columns using withColumn, after you have loaded the DataFrame. Here is an example. On Thu, Mar 3,
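The example linked from this message did not survive in the archive. As a rough sketch of the approach described (not the original example), the helper below re-types selected columns with withColumn and cast after the DataFrame has been loaded. The object name, the helper name, and the column names are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.IntegerType

// Hypothetical helper, for illustration only.
object CastColumnsExample {
  // Re-types each named column to IntegerType. withColumn with an existing
  // column name replaces that column, so every other column is left untouched.
  def castToInt(df: DataFrame, columns: Seq[String]): DataFrame =
    columns.foldLeft(df) { (acc, name) =>
      acc.withColumn(name, col(name).cast(IntegerType))
    }
}
```

Calling `CastColumnsExample.castToInt(loaded, Seq("age"))` on a DataFrame whose `age` column was loaded as a string would return the same columns with `age` typed as an integer.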

Re: Spark SQL - udf with entire row as parameter

2016-03-04 Thread Michael Armbrust
You have to use SQL to call it (but you will be able to do it with dataframes in Spark 2.0 due to a better parser). You need to construct a struct(*) and then pass that to your function since a function must have a fixed number of arguments. Here is an example
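The linked example is missing from the archive, so here is a minimal sketch of the idea rather than the original code. It assumes Spark 2.x or later (SparkSession, createOrReplaceTempView); the UDF name `describe_row`, the view name `people`, and the helper are made up for illustration. The whole row is wrapped into one struct argument, which the Scala function receives as a Row.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

// Hypothetical helper, for illustration only.
object RowUdfExample {
  // Registers a one-argument UDF whose argument is a struct (a Row at runtime),
  // then calls it from SQL with struct(*) so it sees every column of the row.
  def describeRows(spark: SparkSession, df: DataFrame): DataFrame = {
    spark.udf.register("describe_row", (r: Row) => r.mkString("|"))
    df.createOrReplaceTempView("people")
    spark.sql("SELECT describe_row(struct(*)) AS described FROM people")
  }
}
```

The function still has a fixed arity of one; only the struct it receives grows or shrinks with the table's columns.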

Re: Does Spark 1.5.x really still support Hive 0.12?

2016-03-04 Thread Michael Armbrust
Read the docs at the link that you pasted: http://spark.apache.org/docs/latest/sql-programming-guide.html#interacting-with-different-versions-of-hive-metastore Spark will always compile against the same version of Hive (1.2.1), but it can dynamically load jars to speak to other versions. On Fri,

[jira] [Created] (SPARK-13683) Finalize the public interface for OutputWriter[Factory]

2016-03-04 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-13683: Summary: Finalize the public interface for OutputWriter[Factory] Key: SPARK-13683 URL: https://issues.apache.org/jira/browse/SPARK-13683 Project: Spark

[jira] [Created] (SPARK-13682) Finalize the public API for FileFormat

2016-03-04 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-13682: Summary: Finalize the public API for FileFormat Key: SPARK-13682 URL: https://issues.apache.org/jira/browse/SPARK-13682 Project: Spark Issue Type

[jira] [Updated] (SPARK-13681) Reimplement CommitFailureTestRelationSuite

2016-03-04 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13681: - Description: This test case got broken by [#11509|https://github.com/apache/spark/pull

[jira] [Created] (SPARK-13681) Reimplement CommitFailureTestRelationSuite

2016-03-04 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-13681: Summary: Reimplement CommitFailureTestRelationSuite Key: SPARK-13681 URL: https://issues.apache.org/jira/browse/SPARK-13681 Project: Spark Issue

[jira] [Created] (SPARK-13665) Initial separation of concerns

2016-03-03 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-13665: Summary: Initial separation of concerns Key: SPARK-13665 URL: https://issues.apache.org/jira/browse/SPARK-13665 Project: Spark Issue Type: Sub-task

[jira] [Created] (SPARK-13664) Simplify and Speedup HadoopFSRelation

2016-03-03 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-13664: Summary: Simplify and Speedup HadoopFSRelation Key: SPARK-13664 URL: https://issues.apache.org/jira/browse/SPARK-13664 Project: Spark Issue Type

Re: Selecting column in dataframe created with incompatible schema causes AnalysisException

2016-03-02 Thread Michael Armbrust
Note that if you specify the schema that you expect when reading JSON you basically get the "relaxed" mode that you are asking for. Records that don't match will end up with nulls. The problem here is Spark SQL knows that the operation you are asking for is invalid given the set of data you let i
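A small sketch of reading JSON with an explicit schema, under some assumptions: Spark 2.2 or later (for `json(Dataset[String])`), and a hypothetical two-field schema. It only illustrates the simplest case mentioned above, where a record lacks a field the schema expects and that field comes back as null.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Hypothetical schema and helper, for illustration only.
object JsonSchemaExample {
  // The schema we expect, declared up front instead of being inferred.
  val expected: StructType = StructType(Seq(
    StructField("id", LongType, nullable = true),
    StructField("name", StringType, nullable = true)))

  // Parses one JSON document per input string against the expected schema.
  def readWithSchema(spark: SparkSession, lines: Seq[String]): DataFrame = {
    import spark.implicits._
    spark.read.schema(expected).json(lines.toDS())
  }
}
```

Because the schema is fixed before any data is seen, selecting `name` is valid for the analyzer even if no record in a given input happens to contain it.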

[VOTE] Release Apache Spark 1.6.1 (RC1)

2016-03-02 Thread Michael Armbrust
Please vote on releasing the following candidate as Apache Spark version 1.6.1! The vote is open until Saturday, March 5, 2016 at 20:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.6.1 [ ] -1 Do not release this package because ...

Re: Selecting column in dataframe created with incompatible schema causes AnalysisException

2016-03-02 Thread Michael Armbrust
-dev +user StructType(StructField(data,ArrayType(StructType(StructField( > *stuff,ArrayType(*StructType(StructField(onetype,ArrayType(StructType(StructField(id,LongType,true), > StructField(name,StringType,true)),true),true), StructField(othertype, > ArrayType(StructType(StructField(company,String

[jira] [Updated] (SPARK-13393) Column mismatch issue in left_outer join using Spark DataFrame

2016-03-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13393: - Target Version/s: 2.0.0 > Column mismatch issue in left_outer join using Spark DataFr

Re: DataSet Evidence

2016-03-01 Thread Michael Armbrust
Hey Steve, This isn't possible today, but it would not be hard to allow. You should open a feature request JIRA. Michael On Mon, Feb 29, 2016 at 4:55 PM, Steve Lewis wrote: > I have a relatively complex Java object that I would like to use in a > dataset > > if I say > > Encoder evidence = E

Re: Mapper side join with DataFrames API

2016-03-01 Thread Michael Armbrust
It's helpful to always include the output of df.explain(true) when you are asking about performance. On Mon, Feb 29, 2016 at 6:14 PM, Deepak Gopalakrishnan wrote: > Hello All, > > I'm trying to join 2 dataframes A and B with a > > sqlContext.sql("SELECT * FROM A INNER JOIN B ON A.a=B.a"); > > Now

Re: Dataframe Partitioning

2016-03-01 Thread Michael Armbrust
If you have to pick a number, it's better to overestimate than underestimate, since task launching in Spark is relatively cheap compared to spilling to disk or OOMing (now much less likely due to Tungsten). Eventually, we plan to make this dynamic, but you should tune for your particular workload.

[jira] [Commented] (SPARK-13463) Support Column pruning for Dataset logical plan

2016-03-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15174448#comment-15174448 ] Michael Armbrust commented on SPARK-13463: -- If you are reading in a

[jira] [Resolved] (SPARK-13544) Rewrite/Propagate constraints for Aliases in Aggregate

2016-02-29 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13544?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13544. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11422

Re: Spark SQL support for sub-queries

2016-02-26 Thread Michael Armbrust
There will probably be some subquery support in 2.0. That particular query would be more efficient to express as an argmax however. Here is an example in Spark 1.6

Re: d.filter("id in max(id)")

2016-02-26 Thread Michael Armbrust
You can do max on a struct to get the max value for the first column, along with the values for other columns in the row (an argmax). Here is an example
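The example referenced here was not preserved, so what follows is only a sketch of the struct-based argmax, with hypothetical column names `id` and `name`. Structs are compared field by field from left to right, so taking max over struct(id, name) picks the largest `id` and carries that row's `name` along with it.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, max, struct}

// Hypothetical helper, for illustration only.
object ArgmaxExample {
  // Returns one row: the largest id together with the name from that same row.
  // Ties on id are broken by name, because later struct fields are compared next.
  def rowWithMaxId(df: DataFrame): DataFrame =
    df.select(max(struct(col("id"), col("name"))).as("top"))
      .select(col("top.id"), col("top.name"))
}
```

This needs a single aggregation pass, whereas a filter of the form `id = (SELECT max(id) ...)` would have to compute the maximum first and then scan again.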

[jira] [Updated] (SPARK-13383) Keep broadcast hint after column pruning

2016-02-24 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13383: - Assignee: Liang-Chi Hsieh > Keep broadcast hint after column prun

Re: Spark 1.6.1

2016-02-24 Thread Michael Armbrust
> FYI > > On Mon, Feb 22, 2016 at 10:07 PM, Luciano Resende > wrote: > >> >> >> On Mon, Feb 22, 2016 at 9:08 PM, Michael Armbrust > > wrote: >> >>> An update: people.apache.org has been shut down so the release scripts >>> are broken.

Re: Filter on a column having multiple values

2016-02-24 Thread Michael Armbrust
You can do this either with expr("... IN ...") or isin. Here is a full example. On Wed, Feb 24, 2016 at 2:40 PM, Ashok Kumar wrote
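Since the full example is not in the archive, here is a short sketch of both variants side by side. The column name `state` and the values are placeholders chosen for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, expr}

// Hypothetical helpers, for illustration only.
object MultiValueFilterExample {
  // Column API: keep rows whose state is one of the listed values.
  def viaIsin(df: DataFrame): DataFrame =
    df.filter(col("state").isin("CA", "NY"))

  // SQL expression string: the same predicate, parsed by the SQL parser.
  def viaExpr(df: DataFrame): DataFrame =
    df.filter(expr("state IN ('CA', 'NY')"))
}
```

Both forms should produce the same filter; isin tends to be more convenient when the values come from a Scala collection, for example `isin(values: _*)`.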

Re: How to Exploding a Map[String,Int] column in a DataFrame (Scala)

2016-02-24 Thread Michael Armbrust
You can do this using the explode function defined in org.apache.spark.sql.functions. Here is some example code. On Wed, Feb 24, 2
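The example code itself is missing from the archive. A minimal sketch of exploding a map column, assuming a DataFrame with a hypothetical key column `id` and a Map[String, Int] column `counts`:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical helper, for illustration only.
object ExplodeMapExample {
  // Emits one output row per map entry. Exploding a map yields two columns,
  // named key and value by default, alongside the columns selected with it.
  def explodeCounts(df: DataFrame): DataFrame =
    df.select(col("id"), explode(col("counts")))
}
```

A row whose map holds two entries becomes two rows with the same `id`; the default `key` and `value` names can be changed with `.as(Seq("k", "v"))` on the exploded column.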

[jira] [Resolved] (SPARK-13383) Keep broadcast hint after column pruning

2016-02-24 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13383. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11260

[jira] [Assigned] (SPARK-13092) Track constraints in ExpressionSet

2016-02-23 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reassigned SPARK-13092: Assignee: Michael Armbrust (was: Sameer Agarwal) > Track constraints

[jira] [Updated] (SPARK-13445) Seleting "data" with window function does not work unless aliased (using PARTITION BY)

2016-02-23 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13445: - Priority: Critical (was: Major) > Seleting "data" with window function

[jira] [Resolved] (SPARK-13440) Option fields in Datasets cause analysis exceptions when resolving columns

2016-02-23 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13440. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11316

[jira] [Commented] (SPARK-13456) Cannot create encoders for case classes defined in Spark shell after upgrading to Scala 2.11

2016-02-23 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159475#comment-15159475 ] Michael Armbrust commented on SPARK-13456: -- We need to inject OuterSc

[jira] [Commented] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell

2016-02-23 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159458#comment-15159458 ] Michael Armbrust commented on SPARK-1199: - Not that I know of. Also, please

[jira] [Commented] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell

2016-02-23 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159436#comment-15159436 ] Michael Armbrust commented on SPARK-1199: - You will have to define your

[jira] [Commented] (SPARK-1199) Type mismatch in Spark shell when using case class defined in shell

2016-02-23 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159362#comment-15159362 ] Michael Armbrust commented on SPARK-1199: - All classes defined in the REPL

Re: Spark 1.6.1

2016-02-22 Thread Michael Armbrust
An update: people.apache.org has been shut down so the release scripts are broken. Will try again after we fix them. On Mon, Feb 22, 2016 at 6:28 PM, Michael Armbrust wrote: > I've kicked off the build. Please be extra careful about merging into > branch-1.6 until after the release.

Re: Spark 1.6.1

2016-02-22 Thread Michael Armbrust
I've kicked off the build. Please be extra careful about merging into branch-1.6 until after the release. On Mon, Feb 22, 2016 at 10:24 AM, Michael Armbrust wrote: > I will cut the RC today. Sorry for the delay! > > On Mon, Feb 22, 2016 at 5:19 AM, Patrick Woody > wrote:

[jira] [Resolved] (SPARK-11972) [Spark SQL] the value of 'hiveconf' parameter in CLI can't be got after enter spark-sql session

2016-02-22 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-11972. -- Resolution: Fixed Fix Version/s: 1.6.1 2.0.0 Issue resolved

[jira] [Resolved] (SPARK-11624) Spark SQL CLI will set sessionstate twice

2016-02-22 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-11624. -- Resolution: Fixed Fix Version/s: 1.6.1 2.0.0 Issue resolved

[jira] [Assigned] (SPARK-13440) Option fields in Datasets cause analysis exceptions when resolving columns

2016-02-22 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reassigned SPARK-13440: Assignee: Michael Armbrust > Option fields in Datasets cause analysis excepti

[jira] [Resolved] (SPARK-12546) Writing to partitioned parquet table can fail with OOM

2016-02-22 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12546. -- Resolution: Fixed Fix Version/s: 1.6.1 2.0.0 Issue resolved

[jira] [Updated] (SPARK-12546) Writing to partitioned parquet table can fail with OOM

2016-02-22 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12546: - Labels: releasenotes (was: ) > Writing to partitioned parquet table can fail with

Re: Serializing collections in Datasets

2016-02-22 Thread Michael Armbrust
I think this will be fixed in 1.6.1. Can you test when we post the first RC? (hopefully later today) On Mon, Feb 22, 2016 at 1:51 PM, Daniel Siegmann < daniel.siegm...@teamaol.com> wrote: > Experimenting with datasets in Spark 1.6.0 I ran into a serialization > error when using case classes cont

[jira] [Updated] (SPARK-12546) Writing to partitioned parquet table can fail with OOM

2016-02-22 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12546: - Assignee: Michael Armbrust Target Version/s: 1.6.1 Priority

[jira] [Updated] (SPARK-11624) Spark SQL CLI will set sessionstate twice

2016-02-22 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11624: - Target Version/s: 1.6.1 > Spark SQL CLI will set sessionstate tw

[jira] [Updated] (SPARK-11624) Spark SQL CLI will set sessionstate twice

2016-02-22 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-11624: - Priority: Critical (was: Major) > Spark SQL CLI will set sessionstate tw

[jira] [Updated] (SPARK-13249) Filter null keys for inner join

2016-02-22 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13249: - Shepherd: Michael Armbrust > Filter null keys for inner j

Re: Spark 1.6.1

2016-02-22 Thread Michael Armbrust
I will cut the RC today. Sorry for the delay! On Mon, Feb 22, 2016 at 5:19 AM, Patrick Woody wrote: > Hey Michael, > > Any update on a first cut of the RC? > > Thanks! > -Pat > > On Mon, Feb 15, 2016 at 6:50 PM, Michael Armbrust > wrote: > >> I'm

[jira] [Resolved] (SPARK-13091) Rewrite/Propagate constraints for Aliases

2016-02-19 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13091. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11144

[jira] [Resolved] (SPARK-13261) Expose maxCharactersPerColumn as a user configurable option

2016-02-19 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13261. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11147

[jira] [Resolved] (SPARK-12966) Postgres JDBC ArrayType(DecimalType) 'Unable to find server array type'

2016-02-19 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12966. -- Resolution: Fixed Fix Version/s: 2.0.0 > Postgres JDBC ArrayType(DecimalT

[jira] [Updated] (SPARK-13384) Keep attribute qualifiers after dedup in Analyzer

2016-02-19 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13384: - Assignee: Liang-Chi Hsieh > Keep attribute qualifiers after dedup in Analy

[jira] [Resolved] (SPARK-13384) Keep attribute qualifiers after dedup in Analyzer

2016-02-19 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13384. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11261

Re: equalTo isin not working as expected with a constructed column with DataFrames

2016-02-19 Thread Michael Armbrust
Can you include the output of explain(true) on the dataframe in question? It would also be really helpful to see a small code fragment that reproduces the issue. On Thu, Feb 18, 2016 at 9:10 AM, Mehdi Ben Haj Abbes wrote: > Hi, > I forgot to mention that I'm using the 1.5.1 version. > Regards, >

Re: Spark Job Hanging on Join

2016-02-19 Thread Michael Armbrust
Please include the output of running explain() when reporting performance issues with DataFrames. On Fri, Feb 19, 2016 at 9:31 AM, Tamara Mendt wrote: > Hi all, > > I am running a Spark job that gets stuck attempting to join two > dataframes. The dataframes are not very large, one is about 2 M r

[jira] [Updated] (SPARK-13363) Aggregator not working with DataFrame

2016-02-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13363: - Priority: Blocker (was: Minor) > Aggregator not working with DataFr

[jira] [Updated] (SPARK-13363) Aggregator not working with DataFrame

2016-02-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13363: - Affects Version/s: (was: 2.0.0) 1.6.0 Target Version/s

Re: trouble using Aggregator with DataFrame

2016-02-17 Thread Michael Armbrust
Glad you like it :) This sounds like a bug, and we should fix it as we merge DataFrame / Dataset for 2.0. Could you open a JIRA targeted at 2.0? On Wed, Feb 17, 2016 at 2:22 PM, Koert Kuipers wrote: > first of all i wanted to say that i am very happy to see > org.apache.spark.sql.expressions.Agg

Re: How to use a custom partitioner in a dataframe in Spark

2016-02-17 Thread Michael Armbrust
Can you describe what you are trying to accomplish? What would the custom partitioner be? On Tue, Feb 16, 2016 at 1:21 PM, SRK wrote: > Hi, > > How do I use a custom partitioner when I do a saveAsTable in a dataframe. > > > Thanks, > Swetha > > > > -- > View this message in context: > http://ap

Re: cartesian with Dataset

2016-02-17 Thread Michael Armbrust
You will get a cartesian if you do a join/joinWith using lit(true) as the condition. We could consider adding an API for doing that more concisely. On Wed, Feb 17, 2016 at 4:08 AM, Alex Dzhagriev wrote: > Hello all, > > Is anybody aware of any plans to support cartesian for Datasets? Are there
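As a sketch of the workaround described above (the helper name is made up), joinWith with an always-true condition pairs every element of one Dataset with every element of the other. One caveat that is an assumption about the reader's environment: some Spark 2.x releases reject such joins unless `spark.sql.crossJoin.enabled` is set to `true`.

```scala
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.functions.lit

// Hypothetical helper, for illustration only.
object CartesianExample {
  // Joins on a condition that is true for every pair of rows, so the result
  // is the full cartesian product, typed as pairs of (left, right) elements.
  def cartesian[A, B](left: Dataset[A], right: Dataset[B]): Dataset[(A, B)] =
    left.joinWith(right, lit(true))
}
```

The output has left.count * right.count rows, so this is only practical when at least one side is small.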

Re: Dataset takes more memory compared to RDD

2016-02-15 Thread Michael Armbrust
What algorithm? Can you provide code? On Fri, Feb 12, 2016 at 3:22 PM, Raghava Mutharaju < m.vijayaragh...@gmail.com> wrote: > Hello All, > > I implemented an algorithm using both the RDDs and the Dataset API (in > Spark 1.6). Dataset version takes lot more memory than the RDDs. Is this > normal?

Re: Spark 1.6.1

2016-02-15 Thread Michael Armbrust
> issues targeting 1.6.1 are fixed > <https://github.com/apache/spark/pull/11131> now > <https://github.com/apache/spark/pull/10539>. > > On 3 February 2016 at 08:16, Daniel Darabos < > daniel.dara...@lynxanalytics.com> wrote: > >> >> On Tue, Feb 2, 201

[jira] [Updated] (SPARK-12583) spark shuffle fails with mesos after 2mins

2016-02-15 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12583: - Target Version/s: 1.6.1 > spark shuffle fails with mesos after 2m

Re: org.apache.spark.sql.AnalysisException: undefined function lit;

2016-02-13 Thread Michael Armbrust
selectExpr just uses the SQL parser to interpret the string you give it. So to get a string literal you would use quotes: df.selectExpr("*", "'" + time.miliseconds() + "' AS ms") On Fri, Feb 12, 2016 at 6:19 PM, Andy Davidson < a...@santacruzintegration.com> wrote: > I am trying to add a column
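To make the quoting explicit, here is a small sketch in Scala with a hypothetical helper name. The value is wrapped in single quotes inside the expression string, so the SQL parser reads it as a string literal rather than as a number or a function call.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper, for illustration only.
object SelectExprLiteralExample {
  // Keeps every existing column and appends a constant string column `ms`.
  // The single quotes around the value make it a SQL string literal.
  def withMillis(df: DataFrame, millis: Long): DataFrame =
    df.selectExpr("*", s"'$millis' AS ms")
}
```

Dropping the single quotes would make the parser treat the value as a numeric literal, giving a numeric `ms` column instead of a string one.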

Re: GroupedDataset needs a mapValues

2016-02-13 Thread Michael Armbrust
Instead of grouping with a lambda function, you can do it with a column expression to avoid materializing an unnecessary tuple: df.groupBy($"_1") Regarding the mapValues, you can do something similar using an Aggregator

Re: broadcast join in SparkSQL requires analyze table noscan

2016-02-10 Thread Michael Armbrust
> > My question is that is "NOSCAN" option a must? If I execute "ANALYZE TABLE > compute statistics" command in Hive shell, is the statistics > going to be used by SparkSQL to decide broadcast join? Yes, Spark SQL will only accept the simple no scan version. However, as long as the sizeInBytes

[jira] [Updated] (SPARK-13253) Error aliasing array columns.

2016-02-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13253: - Description: Getting an "UnsupportedOperationException" when trying to alia

[jira] [Updated] (SPARK-13253) Error aliasing array columns.

2016-02-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13253: - Target Version/s: 1.6.1, 2.0.0 > Error aliasing array colu

[jira] [Updated] (SPARK-13253) Error aliasing array columns.

2016-02-09 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13253: - Affects Version/s: 1.6.0 > Error aliasing array colu

Re: Error aliasing an array column.

2016-02-09 Thread Michael Armbrust
That looks like a bug in toString for columns. Can you open a JIRA? On Tue, Feb 9, 2016 at 1:38 PM, Rakesh Chalasani wrote: > Sorry, didn't realize the mail didn't show the code. Using Spark release > 1.6.0 > > Below is an example to reproduce it. > > import org.apache.spark.sql.SQLContext > va

Re: Preserving partitioning with dataframe select

2016-02-09 Thread Michael Armbrust
RDD level partitioning information is not used to decide when to shuffle for queries planned using Catalyst (since we have better information about distribution from the query plan itself). Instead you should be looking at the logic in EnsureRequirements

[jira] [Resolved] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch

2016-02-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13101. -- Resolution: Fixed Fix Version/s: (was: 1.6.1) 2.0.0

[jira] [Updated] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch

2016-02-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13101: - Fix Version/s: 1.6.1 > Dataset complex types mapping to DataFrame (element nullabil

[jira] [Commented] (SPARK-11725) Let UDF to handle null value

2016-02-05 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15135264#comment-15135264 ] Michael Armbrust commented on SPARK-11725: -- [~onetoinfin...@yahoo

[jira] [Updated] (SPARK-12939) migrate encoder resolution to Analyzer

2016-02-05 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-12939: - Assignee: Wenchen Fan > migrate encoder resolution to Analy

[jira] [Resolved] (SPARK-12939) migrate encoder resolution to Analyzer

2016-02-05 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12939. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10852

Re: Dataset Encoders for SparseVector

2016-02-04 Thread Michael Armbrust
We are hoping to add better support for UDTs in the next release, but for now you can use kryo to generate an encoder for any class: implicit val vectorEncoder = org.apache.spark.sql.Encoders.kryo[SparseVector] On Thu, Feb 4, 2016 at 12:22 PM, raj.kumar wrote: > Hi, > > I have a DataFrame df wi
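The same technique applies to any class, not only SparseVector. The sketch below uses a hypothetical plain class `Reading` (deliberately not a case class, so no encoder is derived for it) and assumes Spark 2.x or later for SparkSession. A kryo encoder stores each object as one opaque binary column, so individual fields are not visible to the query optimizer.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical class and helper, for illustration only.
object KryoEncoderExample {
  // A plain class with no automatically derived encoder.
  class Reading(val sensor: String, val value: Double) extends Serializable

  // Kryo-backed encoder: each Reading is serialized into a single binary column.
  implicit val readingEncoder: Encoder[Reading] = Encoders.kryo[Reading]

  // Builds a Dataset using the implicit kryo encoder above.
  def toDataset(spark: SparkSession, readings: Seq[Reading]): Dataset[Reading] =
    spark.createDataset(readings)
}
```

Typed operations such as map and filter work on the resulting Dataset, but column-based operations only see the single binary `value` column.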

[jira] [Reopened] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch

2016-02-03 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reopened SPARK-13101: -- Assignee: Wenchen Fan > Dataset complex types mapping to DataFrame (elem

[jira] [Updated] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch

2016-02-03 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-13101: - Target Version/s: 1.6.1, 2.0.0 (was: 1.6.1) > Dataset complex types mapping

[jira] [Resolved] (SPARK-13101) Dataset complex types mapping to DataFrame (element nullability) mismatch

2016-02-03 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13101. -- Resolution: Fixed Fix Version/s: 1.6.1 Issue resolved by pull request 11042

[jira] [Resolved] (SPARK-13166) Remove DataStreamReader/Writer

2016-02-03 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-13166. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11062

Re: Spark DataFrame Catalyst - Another Oracle like query optimizer?

2016-02-03 Thread Michael Armbrust
On Wed, Feb 3, 2016 at 1:42 PM, Nirav Patel wrote: > Awesome! I just read design docs. That is EXACTLY what I was talking > about! Looking forward to it! > Great :) Most of the API is there in 1.6. For the next release I would like to unify DataFrame <-> Dataset and do a lot of work on perform

[jira] [Reopened] (SPARK-12957) Derive and propagate data constrains in logical plan

2016-02-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust reopened SPARK-12957: -- > Derive and propagate data constrains in logical p

[jira] [Resolved] (SPARK-12957) Derive and propagate data constrains in logical plan

2016-02-02 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-12957. -- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 10844
