[jira] [Commented] (SPARK-7099) Floating point literals cannot be specified using exponent
[ https://issues.apache.org/jira/browse/SPARK-7099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711678#comment-14711678 ] Ryan Pham commented on SPARK-7099: -- We've moved to Spark 1.3.1, but it seems like the exponent format is still not supported. The select works when the number is written out in full.
SELECT cdescription179858030, cbigintcol823807900, cintcol1455799299, csmallintcol2049749987, ctinyintcol1324387732 FROM TABLE_1 WHERE (cbigintcol823807900 = 1E6)
15/08/25 10:44:40 ERROR AbstractFunctionalTests: java.lang.RuntimeException: [1.179] failure: ``)'' expected but identifier E6 found
Floating point literals cannot be specified using exponent -- Key: SPARK-7099 URL: https://issues.apache.org/jira/browse/SPARK-7099 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.1 Environment: Windows, Linux, Mac OS X Reporter: Peter Hagelund Priority: Minor Floating point literals cannot be expressed in scientific notation using an exponent, e.g. 1.23E4. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
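The parse failure above is a tokenization issue: `1E6` is split into the number `1` and the identifier `E6` before the expression grammar ever sees it. As a rough illustration (plain Python, not Spark's actual Scala parser), a numeric-literal pattern that accepts an optional exponent would classify these tokens as follows:

```python
import re

# Hedged sketch of a numeric-literal pattern with an optional exponent;
# Spark's real SQL lexer is written in Scala and differs in detail.
NUMBER = re.compile(r"""
    [0-9]+ (\.[0-9]*)?     # integer part, optional fraction
    ([eE] [+-]? [0-9]+)?   # optional exponent such as E6 or e-3
    $""", re.VERBOSE)

def is_numeric_literal(tok):
    return NUMBER.match(tok) is not None

print(is_numeric_literal("1E6"))     # True: accepted as one numeric token
print(is_numeric_literal("1.23E4"))  # True
print(is_numeric_literal("E6"))      # False: bare identifier
```

With a lexer rule like this, `1E6` is consumed as a single literal instead of a number followed by the identifier `E6`.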
[jira] [Assigned] (SPARK-10230) LDA public API should use docConcentration
[ https://issues.apache.org/jira/browse/SPARK-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10230: Assignee: Apache Spark LDA public API should use docConcentration -- Key: SPARK-10230 URL: https://issues.apache.org/jira/browse/SPARK-10230 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Assignee: Apache Spark Priority: Minor {{alpha}} is provided as an alias to {{docConcentration}} because it is commonly used in literature. However, we should prefer {{docConcentration}} since it is unambiguous what we mean. The public API currently uses {{ {get,set}OptimizeAlpha}} but should instead use {{ {get,set}OptimizeDocConcentration}}. We should also probably deprecate any public API's using {{alpha}} directly and refer users to the corresponding {{docConcentration}} methods.
[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711668#comment-14711668 ] Koert Kuipers commented on SPARK-3655: -- Oh, that's no good: I am using Guava without even declaring a dependency... Let me see if there is an alternative to using Guava for this. Support sorting of values in addition to keys (i.e. secondary sort) --- Key: SPARK-3655 URL: https://issues.apache.org/jira/browse/SPARK-3655 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0, 1.2.0 Reporter: koert kuipers Assignee: Koert Kuipers Now that Spark has a sort-based shuffle, can we expect a secondary sort soon? There are some use cases where getting a sorted iterator of values per key is helpful.
[jira] [Created] (SPARK-10231) Update @Since annotation for mllib.classification
Xiangrui Meng created SPARK-10231: - Summary: Update @Since annotation for mllib.classification Key: SPARK-10231 URL: https://issues.apache.org/jira/browse/SPARK-10231 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Minor Some public methods are missing @Since tags, and some versions are not correct.
[jira] [Commented] (SPARK-5456) Decimal Type comparison issue
[ https://issues.apache.org/jira/browse/SPARK-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711690#comment-14711690 ] Brandon Bradley commented on SPARK-5456: I'm still experiencing this in 1.4.0 and 1.4.1. I believe the fix for it should be in 1.4.1. Decimal Type comparison issue - Key: SPARK-5456 URL: https://issues.apache.org/jira/browse/SPARK-5456 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.2.0, 1.3.0 Reporter: Kuldeep Assignee: Adrian Wang Priority: Blocker Fix For: 1.3.2, 1.4.0 Not quite able to figure this out, but here is a JUnit test to reproduce it, in JavaAPISuite.java:
{code:title=DecimalBug.java}
@Test
public void decimalQueryTest() {
    List<Row> decimalTable = new ArrayList<Row>();
    decimalTable.add(RowFactory.create(new BigDecimal(1), new BigDecimal(2)));
    decimalTable.add(RowFactory.create(new BigDecimal(3), new BigDecimal(4)));
    JavaRDD<Row> rows = sc.parallelize(decimalTable);
    List<StructField> fields = new ArrayList<StructField>(7);
    fields.add(DataTypes.createStructField("a", DataTypes.createDecimalType(), true));
    fields.add(DataTypes.createStructField("b", DataTypes.createDecimalType(), true));
    sqlContext.applySchema(rows.rdd(), DataTypes.createStructType(fields)).registerTempTable("foo");
    Assert.assertEquals(sqlContext.sql("select * from foo where a > 0").collectAsList(), decimalTable);
}
{code}
Fails with java.lang.ClassCastException: java.math.BigDecimal cannot be cast to org.apache.spark.sql.types.Decimal
[jira] [Updated] (SPARK-10227) sbt build on Scala 2.11 fails
[ https://issues.apache.org/jira/browse/SPARK-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10227: -- Shepherd: Sean Owen Affects Version/s: 1.5.0 Target Version/s: (was: 1.5.0) Hm, I thought we zapped most or all of those, or else we wouldn't be able to build a 1.5 release candidate for Scala 2.11: https://repository.apache.org/content/repositories/orgapachespark-1137/org/apache/spark/ I wonder if it's only the SBT build that is set to fail on warnings? Cleaning these up would be the fastest solution anyway. Are you in a position to propose a PR? I can work with you on that, as I have done a fair bit of warning cleanup over time. sbt build on Scala 2.11 fails - Key: SPARK-10227 URL: https://issues.apache.org/jira/browse/SPARK-10227 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.5.0 Reporter: Luc Bourlier Scala 2.11 produces additional warnings compared to Scala 2.10, and with the addition of 'fatal warnings' to the sbt build, the current {{trunk}} (and {{branch-1.5}}) fails to build with sbt on Scala 2.11. Most of the warnings are about the {{@transient}} annotation not being set on relevant elements, and a few point to potential bugs.
[jira] [Updated] (SPARK-4223) Support * (meaning all users) as part of the acls
[ https://issues.apache.org/jira/browse/SPARK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-4223: - Assignee: Zhuo Liu Support * (meaning all users) as part of the acls - Key: SPARK-4223 URL: https://issues.apache.org/jira/browse/SPARK-4223 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.2.0 Reporter: Thomas Graves Assignee: Zhuo Liu Currently we support setting view and modify acls but you have to specify a list of users. It would be nice to support * meaning all users have access.
[jira] [Created] (SPARK-10228) Integer overflow in VertexRDDImpl.count
Robin Cheng created SPARK-10228: --- Summary: Integer overflow in VertexRDDImpl.count Key: SPARK-10228 URL: https://issues.apache.org/jira/browse/SPARK-10228 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.4.1 Reporter: Robin Cheng VertexRDDImpl overrides RDD.count() but aggregates Int instead of Long:
{code}
/** The number of vertices in the RDD. */
override def count(): Long = {
  partitionsRDD.map(_.size).reduce(_ + _)
}
{code}
This causes Pregel to stop iterating when the number of messages is negative, giving incorrect results.
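Why aggregating `Int` breaks: partition sizes whose sum exceeds 2^31 - 1 wrap around to a negative number under 32-bit two's-complement arithmetic. A quick illustration (Python sketch simulating Scala/Java `Int` addition; illustrative partition sizes, not from the report):

```python
def add_int32(a, b):
    # Simulate Scala/Java 32-bit Int addition with two's-complement wraparound.
    s = (a + b) & 0xFFFFFFFF
    return s - 0x100000000 if s >= 0x80000000 else s

partition_sizes = [1_500_000_000, 1_500_000_000]  # 3 billion vertices total

total = 0
for size in partition_sizes:
    total = add_int32(total, size)  # reduce over Int, as in the buggy code

print(total)                  # -1294967296: the wrapped 32-bit sum
print(sum(partition_sizes))   # 3000000000: the correct Long-style sum
```

The fix presumably amounts to widening before reducing, along the lines of `partitionsRDD.map(_.size.toLong).reduce(_ + _)` in the Scala code above.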
[jira] [Commented] (SPARK-10229) Wrong jline dependency when compiled against Scala 2.11
[ https://issues.apache.org/jira/browse/SPARK-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711630#comment-14711630 ] Sean Owen commented on SPARK-10229: --- Repeat of https://issues.apache.org/jira/browse/SPARK-10037 ? Are you using -Dscala-2.11? Wrong jline dependency when compiled against Scala 2.11 --- Key: SPARK-10229 URL: https://issues.apache.org/jira/browse/SPARK-10229 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.5.0 Reporter: Cheng Lian Priority: Blocker Scala migrated to the official jline in 2.11.0-M4, so the scala-specific fork of jline is gone and you can just depend on the official jline. The nonexistent org.scala-lang:jline:2.11.7 artifact is causing build failure when Spark is built against Scala 2.11.
[jira] [Commented] (SPARK-4223) Support * (meaning all users) as part of the acls
[ https://issues.apache.org/jira/browse/SPARK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711568#comment-14711568 ] Sean Owen commented on SPARK-4223: -- Done, added as Contributor and assigned Support * (meaning all users) as part of the acls - Key: SPARK-4223 URL: https://issues.apache.org/jira/browse/SPARK-4223 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.2.0 Reporter: Thomas Graves Assignee: Zhuo Liu Currently we support setting view and modify acls but you have to specify a list of users. It would be nice to support * meaning all users have access.
[jira] [Commented] (SPARK-10227) sbt build on Scala 2.11 fails
[ https://issues.apache.org/jira/browse/SPARK-10227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711634#comment-14711634 ] Luc Bourlier commented on SPARK-10227: -- I am working on a PR right now. sbt build on Scala 2.11 fails - Key: SPARK-10227 URL: https://issues.apache.org/jira/browse/SPARK-10227 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.5.0 Reporter: Luc Bourlier Scala 2.11 produces additional warnings compared to Scala 2.10, and with the addition of 'fatal warnings' to the sbt build, the current {{trunk}} (and {{branch-1.5}}) fails to build with sbt on Scala 2.11. Most of the warnings are about the {{@transient}} annotation not being set on relevant elements, and a few point to potential bugs.
[jira] [Commented] (SPARK-10219) Error when additional options provided as variable in write.df
[ https://issues.apache.org/jira/browse/SPARK-10219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711642#comment-14711642 ] Shivaram Venkataraman commented on SPARK-10219: --- I think that's happening because `mode` is actually an argument name that is taken in by the write.df method -- so I am not sure you need option=mode; just mode=mode or mode="append" should work? Error when additional options provided as variable in write.df -- Key: SPARK-10219 URL: https://issues.apache.org/jira/browse/SPARK-10219 Project: Spark Issue Type: Bug Components: R Affects Versions: 1.4.0 Environment: SparkR shell Reporter: Samuel Alexander Labels: spark-shell, sparkR Opened a SparkR shell. Created a df using:
df <- jsonFile(sqlContext, "examples/src/main/resources/people.json")
Assigned a variable like below:
mode <- "append"
When write.df was called using the statement below, the mentioned error occurred:
write.df(df, source="org.apache.spark.sql.parquet", path="par_path", option=mode)
Error in writeType(con, type) : Unsupported type for serialization name
Whereas when "append" is passed directly, i.e. not via the mode variable as below, everything works fine:
write.df(df, source="org.apache.spark.sql.parquet", path="par_path", option="append")
Note: For parquet it is not needed to have option. But we are using the Spark Salesforce package (http://spark-packages.org/package/springml/spark-salesforce) which requires additional options to be passed.
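The comment above hinges on how named arguments are matched: `mode` is a declared parameter of `write.df`, so passing it as `option=mode` routes the value somewhere else entirely. A Python analogy of the same mechanics (the `write_df` signature here is hypothetical, not SparkR's actual code):

```python
def write_df(df, source=None, mode="error", **options):
    # 'mode' is a declared parameter; any other keyword lands in **options,
    # which a data source may or may not know how to handle.
    return {"source": source, "mode": mode, "options": options}

mode = "append"

# Matches the declared parameter -- the save mode is set as intended:
print(write_df("df", source="parquet", mode=mode))
# Goes into **options under the key 'option'; 'mode' keeps its default:
print(write_df("df", source="parquet", option=mode))
```

In both languages the argument's *name*, not the variable holding the value, decides where it binds, which is why `mode=mode` works while `option=mode` does not.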
[jira] [Commented] (SPARK-10229) Wrong jline dependency when compiled against Scala 2.11
[ https://issues.apache.org/jira/browse/SPARK-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711646#comment-14711646 ] Cheng Lian commented on SPARK-10229: Sorry, I was using {{-Pscala-2.11}}. Thanks for clarification! Wrong jline dependency when compiled against Scala 2.11 --- Key: SPARK-10229 URL: https://issues.apache.org/jira/browse/SPARK-10229 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.5.0 Reporter: Cheng Lian Priority: Blocker Scala migrated to the official jline in 2.11.0-M4, so the scala-specific fork of jline is gone and you can just depend on the official jline. The nonexistent org.scala-lang:jline:2.11.7 artifact is causing build failure when Spark is built against Scala 2.11.
[jira] [Assigned] (SPARK-10230) LDA public API should use docConcentration
[ https://issues.apache.org/jira/browse/SPARK-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10230: Assignee: (was: Apache Spark) LDA public API should use docConcentration -- Key: SPARK-10230 URL: https://issues.apache.org/jira/browse/SPARK-10230 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor {{alpha}} is provided as an alias to {{docConcentration}} because it is commonly used in literature. However, we should prefer {{docConcentration}} since it is unambiguous what we mean. The public API currently uses {{ {get,set}OptimizeAlpha}} but should instead use {{ {get,set}OptimizeDocConcentration}}. We should also probably deprecate any public API's using {{alpha}} directly and refer users to the corresponding {{docConcentration}} methods.
[jira] [Commented] (SPARK-10230) LDA public API should use docConcentration
[ https://issues.apache.org/jira/browse/SPARK-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711691#comment-14711691 ] Apache Spark commented on SPARK-10230: -- User 'feynmanliang' has created a pull request for this issue: https://github.com/apache/spark/pull/8422 LDA public API should use docConcentration -- Key: SPARK-10230 URL: https://issues.apache.org/jira/browse/SPARK-10230 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor {{alpha}} is provided as an alias to {{docConcentration}} because it is commonly used in literature. However, we should prefer {{docConcentration}} since it is unambiguous what we mean. The public API currently uses {{ {get,set}OptimizeAlpha}} but should instead use {{ {get,set}OptimizeDocConcentration}}. We should also probably deprecate any public API's using {{alpha}} directly and refer users to the corresponding {{docConcentration}} methods.
[jira] [Created] (SPARK-10227) sbt build on Scala 2.11 fails
Luc Bourlier created SPARK-10227: Summary: sbt build on Scala 2.11 fails Key: SPARK-10227 URL: https://issues.apache.org/jira/browse/SPARK-10227 Project: Spark Issue Type: Bug Components: Spark Core Reporter: Luc Bourlier Scala 2.11 produces additional warnings compared to Scala 2.10, and with the addition of 'fatal warnings' to the sbt build, the current {{trunk}} (and {{branch-1.5}}) fails to build with sbt on Scala 2.11. Most of the warnings are about the {{@transient}} annotation not being set on relevant elements, and a few point to potential bugs.
[jira] [Commented] (SPARK-10230) LDA public API should use docConcentration
[ https://issues.apache.org/jira/browse/SPARK-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711660#comment-14711660 ] Feynman Liang commented on SPARK-10230: --- Working on this LDA public API should use docConcentration -- Key: SPARK-10230 URL: https://issues.apache.org/jira/browse/SPARK-10230 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor {{alpha}} is provided as an alias to {{docConcentration}} because it is commonly used in literature. However, we should prefer {{docConcentration}} since it is unambiguous what we mean. The public API currently uses {{ {get,set}OptimizeAlpha}} but should instead use {{ {get,set}OptimizeDocConcentration}}. We should also probably deprecate any public API's using {{alpha}} directly and refer users to the corresponding {{docConcentration}} methods.
[jira] [Created] (SPARK-10230) LDA public API should use docConcentration
Feynman Liang created SPARK-10230: - Summary: LDA public API should use docConcentration Key: SPARK-10230 URL: https://issues.apache.org/jira/browse/SPARK-10230 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor {{alpha}} is provided as an alias to {{docConcentration}} because it is commonly used in literature. However, we should prefer {{docConcentration}} since it is unambiguous what we mean. The public API currently uses {{ {get,set}OptimizeAlpha}} but should instead use {{ {get,set}OptimizeDocConcentration}}. We should also probably deprecate any public API's using {{alpha}} directly and refer users to the corresponding {{docConcentration}} methods.
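The deprecation path described above (keep `alpha` working for literature familiarity while steering users to `docConcentration`) is a standard alias pattern. A hedged Python sketch of the idea; the real API is Scala and the method names below are illustrative, not Spark's:

```python
import warnings

class LDA:
    """Illustrative model class with a deprecated parameter alias."""

    def __init__(self):
        self._doc_concentration = -1.0  # illustrative default

    def set_doc_concentration(self, value):
        # The unambiguous, preferred name.
        self._doc_concentration = value
        return self

    def set_alpha(self, value):
        # Alias kept so existing code and literature-style usage still work,
        # but callers are warned to migrate.
        warnings.warn("set_alpha is deprecated; use set_doc_concentration",
                      DeprecationWarning)
        return self.set_doc_concentration(value)
```

Both setters mutate the same field, so behavior is identical; only the preferred entry point changes.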
[jira] [Resolved] (SPARK-10198) Turn off Hive verifyPartitionPath by default
[ https://issues.apache.org/jira/browse/SPARK-10198?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10198. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 8404 [https://github.com/apache/spark/pull/8404] Turn off Hive verifyPartitionPath by default Key: SPARK-10198 URL: https://issues.apache.org/jira/browse/SPARK-10198 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0, 1.5.0 Reporter: Michael Armbrust Assignee: Michael Armbrust Priority: Blocker Fix For: 1.5.0 I've seen several cases in production where this option either causes us to fail reading valid tables, or incorrectly returns no results. It also invalidates our new metastore partition pruning feature. Since there is not much time to dig into the root cause, I propose we turn it off by default for Spark 1.5.
[jira] [Resolved] (SPARK-8531) Update ML user guide for MinMaxScaler
[ https://issues.apache.org/jira/browse/SPARK-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-8531. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 7211 [https://github.com/apache/spark/pull/7211] Update ML user guide for MinMaxScaler - Key: SPARK-8531 URL: https://issues.apache.org/jira/browse/SPARK-8531 Project: Spark Issue Type: Documentation Components: ML Affects Versions: 1.5.0 Reporter: yuhao yang Assignee: yuhao yang Priority: Minor Fix For: 1.5.0
[jira] [Commented] (SPARK-10231) Update @Since annotation for mllib.classification
[ https://issues.apache.org/jira/browse/SPARK-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711682#comment-14711682 ] Apache Spark commented on SPARK-10231: -- User 'mengxr' has created a pull request for this issue: https://github.com/apache/spark/pull/8421 Update @Since annotation for mllib.classification - Key: SPARK-10231 URL: https://issues.apache.org/jira/browse/SPARK-10231 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Minor Some public methods are missing @Since tags, and some versions are not correct.
[jira] [Resolved] (SPARK-10188) Pyspark CrossValidator with RMSE selects incorrect model
[ https://issues.apache.org/jira/browse/SPARK-10188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noel Smith resolved SPARK-10188. Resolution: Fixed Pyspark CrossValidator with RMSE selects incorrect model Key: SPARK-10188 URL: https://issues.apache.org/jira/browse/SPARK-10188 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 1.5.0 Reporter: Noel Smith Pyspark {{CrossValidator}} is giving incorrect results when selecting estimators using RMSE as an evaluation metric. In the example below, it should be selecting the {{LinearRegression}} estimator with zero regularization, as that gives the most accurate result, but instead it selects the one with the largest. Probably related to: SPARK-10097
{code}
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.regression import LinearRegression
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator, CrossValidatorModel
from pyspark.ml.feature import Binarizer
from pyspark.mllib.linalg import Vectors
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

# Label = 2 * feature
train = sqlContext.createDataFrame([
    (Vectors.dense([10.0]), 20.0),
    (Vectors.dense([100.0]), 200.0),
    (Vectors.dense([1000.0]), 2000.0)] * 10,
    ["features", "label"])

test = sqlContext.createDataFrame([
    (Vectors.dense([1000.0]),)], ["features"])

# Expected prediction 2000.0
print LinearRegression(regParam=0.0).fit(train).transform(test).collect()    # Predicts 2000.0 (perfect)
print LinearRegression(regParam=100.0).fit(train).transform(test).collect()  # Predicts 1869.31
print LinearRegression(regParam=100.0).fit(train).transform(test).collect()  # 741.08 (worst)

# Cross-validation
lr = LinearRegression()
rmse_eval = RegressionEvaluator(metricName="rmse")
grid = (ParamGridBuilder()
    .addGrid(lr.regParam, [0.0, 100.0, 100.0])
    .build())
cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, evaluator=rmse_eval)
cv_model = cv.fit(train)
cv_model.bestModel.transform(test).collect()  # Predicts 741.08 (i.e. worst model selected)
{code}
One workaround for users would be to add a wrapper around the selected evaluator to invert the metric:
{code}
class InvertedEvaluator(Evaluator):
    def __init__(self, evaluator):
        super(InvertedEvaluator, self).__init__()
        self.evaluator = evaluator

    def _evaluate(self, dataset):
        return -self.evaluator.evaluate(dataset)

invertedEvaluator = InvertedEvaluator(RegressionEvaluator(metricName="rmse"))
{code}
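The behavior reported above is consistent with the selector maximizing the metric regardless of its direction: RMSE is an error, so the largest value is the worst model. A small sketch of why the metric's direction matters when picking the best parameter set (illustrative scores, not from the actual run):

```python
# Hypothetical cross-validation scores: regParam -> RMSE (lower is better).
rmse_by_param = {0.0: 0.0, 100.0: 130.7}

# A selector that always maximizes picks the *worst* model for error metrics:
wrong_choice = max(rmse_by_param, key=rmse_by_param.get)

# Error metrics must be minimized (or negated before maximizing, which is
# exactly what the InvertedEvaluator workaround above does):
right_choice = min(rmse_by_param, key=rmse_by_param.get)

print(wrong_choice, right_choice)  # 100.0 0.0
```

Negating the metric turns minimization into maximization, which is why wrapping the evaluator is a viable stopgap until the selector respects the metric's direction.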
[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711701#comment-14711701 ] Koert Kuipers commented on SPARK-3655: -- Great. We have stress tested it with millions of records per key (and only 1.5g of ram per executor) to make sure there was no hidden assumption that data needs to fit in memory somehow, and it worked fine. Seems the shuffle-based sort keeps its promise... Support sorting of values in addition to keys (i.e. secondary sort) --- Key: SPARK-3655 URL: https://issues.apache.org/jira/browse/SPARK-3655 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0, 1.2.0 Reporter: koert kuipers Assignee: Koert Kuipers Now that Spark has a sort-based shuffle, can we expect a secondary sort soon? There are some use cases where getting a sorted iterator of values per key is helpful.
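A secondary sort -- a sorted iterator of values per key -- can be emulated locally by sorting on a composite (key, value) and then grouping. A minimal single-machine Python sketch of the idea; the distributed version relies on Spark's sort-based shuffle so values never need to fit in memory per key:

```python
from itertools import groupby
from operator import itemgetter

pairs = [("b", 3), ("a", 2), ("b", 1), ("a", 5), ("b", 2)]

# Sort by the composite (key, value) so each key's values come out in order...
pairs.sort(key=itemgetter(0, 1))

# ...then group by key to get a sorted iterator of values per key.
grouped = {k: [v for _, v in g] for k, g in groupby(pairs, key=itemgetter(0))}

print(grouped)  # {'a': [2, 5], 'b': [1, 2, 3]}
```

Note that `groupby` only merges *adjacent* runs, which is why the composite sort must happen first; that is also the essential trick behind shuffle-based secondary sort.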
[jira] [Commented] (SPARK-3533) Add saveAsTextFileByKey() method to RDDs
[ https://issues.apache.org/jira/browse/SPARK-3533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711291#comment-14711291 ] Jason Hubbard commented on SPARK-3533: -- Spark SQL has the ability to write to multiple file locations already SPARK-3007. I'm not recommending converting your RDD to DataFrame just to write to multiple locations, but it might be beneficial for them to share the same mechanism. One current limitation of the Spark SQL implementation is that each split will open a new Writer for each hive partition, and if there are a lot of hive partitions spread across the splits then it will cause many small files and possibly degrade performance because of memory usage. Add saveAsTextFileByKey() method to RDDs Key: SPARK-3533 URL: https://issues.apache.org/jira/browse/SPARK-3533 Project: Spark Issue Type: Improvement Components: PySpark, Spark Core Affects Versions: 1.1.0 Reporter: Nicholas Chammas Users often have a single RDD of key-value pairs that they want to save to multiple locations based on the keys. For example, say I have an RDD like this: {code} a = sc.parallelize(['Nick', 'Nancy', 'Bob', 'Ben', 'Frankie']).keyBy(lambda x: x[0]) a.collect() [('N', 'Nick'), ('N', 'Nancy'), ('B', 'Bob'), ('B', 'Ben'), ('F', 'Frankie')] a.keys().distinct().collect() ['B', 'F', 'N'] {code} Now I want to write the RDD out to different paths depending on the keys, so that I have one output directory per distinct key. Each output directory could potentially have multiple {{part-}} files, one per RDD partition. So the output would look something like: {code} /path/prefix/B [/part-1, /part-2, etc] /path/prefix/F [/part-1, /part-2, etc] /path/prefix/N [/part-1, /part-2, etc] {code} Though it may be possible to do this with some combination of {{saveAsNewAPIHadoopFile()}}, {{saveAsHadoopFile()}}, and the {{MultipleTextOutputFormat}} output format class, it isn't straightforward. 
It's not clear if it's even possible at all in PySpark. Please add a {{saveAsTextFileByKey()}} method or something similar to RDDs that makes it easy to save RDDs out to multiple locations at once.
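The directory layout the request describes can be sketched locally: bucket records by key, then write one part file per key directory. A minimal Python sketch using a hypothetical `save_by_key` helper (this is not the proposed RDD API, just an illustration of the desired output layout):

```python
import os
import tempfile
from collections import defaultdict

def save_by_key(pairs, prefix):
    # Group values by key, then write one part file per key directory,
    # mirroring the /path/prefix/<key>/part-* layout described above.
    buckets = defaultdict(list)
    for k, v in pairs:
        buckets[k].append(v)
    for k, values in buckets.items():
        d = os.path.join(prefix, str(k))
        os.makedirs(d, exist_ok=True)
        with open(os.path.join(d, "part-00000"), "w") as f:
            f.write("\n".join(values))
    return sorted(buckets)

pairs = [("N", "Nick"), ("N", "Nancy"), ("B", "Bob"),
         ("B", "Ben"), ("F", "Frankie")]
out = tempfile.mkdtemp()
keys = save_by_key(pairs, out)
print(keys)  # ['B', 'F', 'N']
```

In the distributed setting each RDD partition would write its own `part-N` file per key directory, which is what `MultipleTextOutputFormat` arranges on the Hadoop side.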
[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711382#comment-14711382 ] Koert Kuipers commented on SPARK-3655: -- Glad to hear it worked well. Totally agree that a Guava dependency mismatch is a pain. spark-sorted does not have a dependency on Guava. Could it be that one of your other dependencies uses Guava? Support sorting of values in addition to keys (i.e. secondary sort) --- Key: SPARK-3655 URL: https://issues.apache.org/jira/browse/SPARK-3655 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0, 1.2.0 Reporter: koert kuipers Assignee: Koert Kuipers Now that Spark has a sort-based shuffle, can we expect a secondary sort soon? There are some use cases where getting a sorted iterator of values per key is helpful.
[jira] [Comment Edited] (SPARK-10226) Error occurred in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711410#comment-14711410 ] Herman van Hovell edited comment on SPARK-10226 at 8/25/15 3:11 PM: Apparently most databases support this: http://stackoverflow.com/questions/723195/should-i-use-or-for-not-equal-in-tsql I wouldn't call this a bug though. It is more of an improvement. was (Author: hvanhovell): Apparently most databases support this: http://stackoverflow.com/questions/723195/should-i-use-or-for-not-equal-in-tsql Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count( * ) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
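Editor's note: the {{[ERROR] Could not expand event}} above comes from JLine, not from the SQL parser. The CLI's {{ConsoleReader.expandEvents}} performs bash-style history expansion, so the {{!}} in {{!=}} is read as an event designator that matches nothing in history. The following is a minimal standalone mimic of that behavior (hypothetical code written for illustration, not JLine's actual implementation):

```java
// Sketch: why a bash-style history expander chokes on "!=".
// A '!' followed by a non-whitespace character is treated as a reference to
// a previous command ("event"); "!=" names no history entry, so expansion
// fails with IllegalArgumentException, mirroring the error in this issue.
import java.util.List;

public class EventExpansionDemo {
    // Returns the line unchanged if it contains no event reference; throws
    // if a '!'-prefixed token names an event missing from history.
    static String expandEvents(String line, List<String> history) {
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '!' && i + 1 < line.length()
                    && !Character.isWhitespace(line.charAt(i + 1))) {
                // The token up to the next whitespace is the event reference.
                int end = i + 1;
                while (end < line.length() && !Character.isWhitespace(line.charAt(end))) end++;
                String event = line.substring(i, end);
                // A real shell would search history here; "!=" never matches.
                throw new IllegalArgumentException(event + ": event not found");
            }
        }
        return line;
    }

    public static void main(String[] args) {
        try {
            expandEvents("select count(*) from src where id != '0';", List.of());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

JLine 2 exposes {{ConsoleReader.setExpandEvents(false)}} to disable this expansion entirely; whether the pull requests referenced in the follow-up comments take that route or handle {{!=}} elsewhere is not shown here.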
[jira] [Commented] (SPARK-10183) Expose the SparkR backend api
[ https://issues.apache.org/jira/browse/SPARK-10183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711417#comment-14711417 ] Amos commented on SPARK-10183: -- Pushed code up last night to link R to a running spark context. I'm not able to post a link from my iPhone but it's in elbamos/incubator-Zeppelin, branch reinterpreter, and the code you'll care about is in classes RContext (which does the work), RBackendHelper (because the Backend is private) and RStatics. Expose the SparkR backend api - Key: SPARK-10183 URL: https://issues.apache.org/jira/browse/SPARK-10183 Project: Spark Issue Type: Improvement Components: SparkR Reporter: Amos Priority: Minor The Backend class is currently scoped to the api.r package. I'm accessing it, for the Zeppelin project, so I can start SparkR against an already-running spark context. To do this I've had to create a helper class withing api.r. It would be better if the backend were exposed. It isn't a tremendous amount of functionality - create a backend, start it, stop it. (If we want to be really clever, it could also be passed a spark context and make that available to R clients, facilitate passing rdd's back and forth, etc. I'll be pushing code that does some of that to Zeppelin in a day or two if that helps.) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10055) San Francisco Crime Classification
[ https://issues.apache.org/jira/browse/SPARK-10055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711298#comment-14711298 ] Kai Sasaki commented on SPARK-10055: I submitted an initial version for this competition. Although the score is not good, I found several points while using the Spark ML API. Some of these may simply reflect my lack of knowledge of Spark ML, so if they can already be solved with existing code, please let me know. * There does not seem to be a {{Transformer}} that can cast column types. In this case, {{X}} and {{Y}} are Strings by default when read by [spark-csv|http://spark-packages.org/package/databricks/spark-csv]. In order to apply {{StandardScaler}} to {{X}} and {{Y}}, they must be numeric types, and I cannot do that cast with a Spark ML {{Transformer}}. Fortunately, {{spark-csv}} can infer the schema types by reading all the data once, but when a reading library has no such option, I think it would be better to be able to cast column types inside a Spark ML pipeline. * {{StringIndexer}} orders its labels by frequency, but this competition requires the output in alphabetical order, so some extra code is needed to convert frequency-ordered labels to alphabetical order. * {{StandardScaler}} only accepts vector data as input. In this case, I want to scale {{X}} and {{Y}}, but since they are plain doubles, they first have to be assembled into a feature vector. Is there a case for applying {{StandardScaler}} to plain Int or Double columns, or must such data always be assembled into a feature vector before scaling? The code is [here|https://github.com/Lewuathe/kaggle-jobs/blob/master/src/main/scala/com/lewuathe/SfCrimeClassification.scala]. Thank you. 
San Francisco Crime Classification -- Key: SPARK-10055 URL: https://issues.apache.org/jira/browse/SPARK-10055 Project: Spark Issue Type: Sub-task Components: ML Reporter: Xiangrui Meng Assignee: Xusen Yin Apply ML pipeline API to San Francisco Crime Classification (https://www.kaggle.com/c/sf-crime). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
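Editor's note: the second bullet in the comment above (frequency-ordered {{StringIndexer}} labels vs. the required alphabetical column order) only needs a small index remapping. A sketch in plain Java, independent of Spark, assuming the fitted model's labels are available as an array in the frequency order {{StringIndexer}} assigned (as the model's {{labels}} member exposes them):

```java
// Build a remapping from StringIndexer's frequency-based indices to
// alphabetical indices: remap[i] is where the label with frequency index i
// belongs in alphabetical order. No Spark dependency; labels are illustrative.
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class LabelReorder {
    /** freqOrdered[i] is the label that StringIndexer assigned index i. */
    static int[] freqToAlphaIndex(String[] freqOrdered) {
        String[] alpha = freqOrdered.clone();
        Arrays.sort(alpha); // alphabetical target order
        Map<String, Integer> alphaIndex = new HashMap<>();
        for (int i = 0; i < alpha.length; i++) alphaIndex.put(alpha[i], i);
        int[] remap = new int[freqOrdered.length];
        for (int i = 0; i < freqOrdered.length; i++) remap[i] = alphaIndex.get(freqOrdered[i]);
        return remap;
    }

    public static void main(String[] args) {
        // Hypothetical categories: LARCENY most frequent, so it got index 0.
        String[] labels = {"LARCENY", "ASSAULT", "VANDALISM"};
        System.out.println(Arrays.toString(freqToAlphaIndex(labels))); // [1, 0, 2]
    }
}
```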
[jira] [Assigned] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10226: Assignee: Apache Spark Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei Assignee: Apache Spark DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count(*) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10215) Div of Decimal returns null
[ https://issues.apache.org/jira/browse/SPARK-10215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711327#comment-14711327 ] Yi Zhou commented on SPARK-10215: - This issue causes cases involving the 'decimal' type to fail, so hopefully it can be fixed in Spark 1.5.0. Thanks in advance! Div of Decimal returns null --- Key: SPARK-10215 URL: https://issues.apache.org/jira/browse/SPARK-10215 Project: Spark Issue Type: Bug Components: SQL Reporter: Cheng Hao Priority: Blocker {code} val d = Decimal(1.12321) val df = Seq((d, 1)).toDF("a", "b") df.selectExpr("b * a / b").collect() => Array(Row(null)) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
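Editor's note: the computation itself is numerically unproblematic. The same round trip in plain {{java.math.BigDecimal}} is exact, which supports the reading that the null comes from Spark's precision/scale handling of the intermediate Decimal result rather than from the arithmetic. A minimal sketch (plain Java, not Spark code):

```java
// b * a / b should return a exactly when enough precision is available.
// DECIMAL128 (34 significant digits) is ample for these operands.
import java.math.BigDecimal;
import java.math.MathContext;

public class DecimalDivDemo {
    static BigDecimal roundTrip(BigDecimal a, BigDecimal b) {
        return b.multiply(a).divide(b, MathContext.DECIMAL128);
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("1.12321");
        System.out.println(roundTrip(a, new BigDecimal("1"))); // 1.12321, not null
    }
}
```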
[jira] [Comment Edited] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711400#comment-14711400 ] Herman van Hovell edited comment on SPARK-10226 at 8/25/15 3:05 PM: In what SQL dialect is {{!=}} a valid symbol for {{not equals}}? I thought pretty much all SQL environments use {{<>}} for this. was (Author: hvanhovell): In what SQL dialect is {{!=}} a valid symbol {{not equals}}? I thought pretty much all SQL environments use {{<>}} for this. Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count( * ) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711363#comment-14711363 ] Apache Spark commented on SPARK-10226: -- User 'small-wang' has created a pull request for this issue: https://github.com/apache/spark/pull/8420 Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count( * ) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711410#comment-14711410 ] Herman van Hovell commented on SPARK-10226: --- Apparently most databases support this: http://stackoverflow.com/questions/723195/should-i-use-or-for-not-equal-in-tsql Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count( * ) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6951) History server slow startup if the event log directory is large
[ https://issues.apache.org/jira/browse/SPARK-6951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711292#comment-14711292 ] Thomas Graves commented on SPARK-6951: -- Sorry, I was wrong. I just went and tested it, and it does start up fairly quickly now. I'm having problems with it getting stuck reading large files, which is a separate issue. History server slow startup if the event log directory is large --- Key: SPARK-6951 URL: https://issues.apache.org/jira/browse/SPARK-6951 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 1.3.0 Reporter: Matt Cheah I started my history server, then navigated to the web UI where I expected to be able to view some completed applications, but the webpage was not available. It turned out that the History Server was not finished parsing all of the event logs in the event log directory that I had specified. I had accumulated a lot of event logs from months of running Spark, so it would have taken a very long time for the History Server to crunch through them all. I purged the event log directory and started from scratch, and the UI loaded immediately. We should have a pagination strategy or parse the directory lazily to avoid needing to wait after starting the history server. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711351#comment-14711351 ] Apache Spark commented on SPARK-10226: -- User 'small-wang' has created a pull request for this issue: https://github.com/apache/spark/pull/8419 Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count( * ) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711400#comment-14711400 ] Herman van Hovell commented on SPARK-10226: --- In what SQL dialect is {{!=}} a valid symbol {{not equals}}? I thought pretty much all SQL environments use {{<>}} for this. Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count( * ) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711435#comment-14711435 ] Nick Xie commented on SPARK-3655: - It is in your api/java/GroupSorted.scala line 8: import com.google.common.collect.{ Ordering => GuavaOrdering } . line 29: private implicit def ordering[K]: Ordering[K] = comparatorToOrdering(GuavaOrdering.natural.asInstanceOf[Comparator[K]]) Support sorting of values in addition to keys (i.e. secondary sort) --- Key: SPARK-3655 URL: https://issues.apache.org/jira/browse/SPARK-3655 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0, 1.2.0 Reporter: koert kuipers Assignee: Koert Kuipers Now that spark has a sort based shuffle, can we expect a secondary sort soon? There are some use cases where getting a sorted iterator of values per key is helpful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711451#comment-14711451 ] Nick Xie commented on SPARK-3655: - For the record, the data file is 25 million rows and about 3000 unique keys, so that's about 8000 records on average to be sorted per key on the timestamp. Support sorting of values in addition to keys (i.e. secondary sort) --- Key: SPARK-3655 URL: https://issues.apache.org/jira/browse/SPARK-3655 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0, 1.2.0 Reporter: koert kuipers Assignee: Koert Kuipers Now that spark has a sort based shuffle, can we expect a secondary sort soon? There are some use cases where getting a sorted iterator of values per key is helpful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
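Editor's note: the use case described above (per key, iterate values sorted by timestamp) can be stated in a few lines of plain Java. This in-memory sketch only illustrates the contract; the point of a shuffle-based secondary sort is precisely to obtain this result without materializing each group in memory, so none of this is Spark's implementation:

```java
// Secondary sort, conceptually: group records by key, then sort the values
// within each key by timestamp. The record type and data are illustrative.
import java.util.*;
import java.util.stream.Collectors;

public class SecondarySortDemo {
    record Event(String key, long timestamp) {}

    static Map<String, List<Long>> sortedValuesPerKey(List<Event> events) {
        return events.stream().collect(Collectors.groupingBy(
                Event::key,
                TreeMap::new, // deterministic key order for display
                Collectors.mapping(Event::timestamp,
                        Collectors.collectingAndThen(Collectors.toList(), ts -> {
                            Collections.sort(ts); // the "secondary" sort
                            return ts;
                        }))));
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event("a", 30), new Event("b", 10),
                new Event("a", 10), new Event("a", 20));
        System.out.println(sortedValuesPerKey(events)); // {a=[10, 20, 30], b=[10]}
    }
}
```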
[jira] [Assigned] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10226: Assignee: (was: Apache Spark) Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count(*) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711302#comment-14711302 ] Apache Spark commented on SPARK-10226: -- User 'small-wang' has created a pull request for this issue: https://github.com/apache/spark/pull/8418 Error occured in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL: 1. create table src(id string, name string); 2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src; 3. select count(*) from src where id != '0'; [ERROR] Could not expand event java.lang.IllegalArgumentException: != 0;: event not found at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779) at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631) at jline.console.ConsoleReader.accept(ConsoleReader.java:2019) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666) at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118) at 
org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4223) Support * (meaning all users) as part of the acls
[ https://issues.apache.org/jira/browse/SPARK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711384#comment-14711384 ] Thomas Graves commented on SPARK-4223: -- [~srowen] [~rxin] do one of you have permissions to give 'zhuoliu' committer access so we can assign this jira to him? Support * (meaning all users) as part of the acls - Key: SPARK-4223 URL: https://issues.apache.org/jira/browse/SPARK-4223 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.2.0 Reporter: Thomas Graves Currently we support setting view and modify acls but you have to specify a list of users. It would be nice to support * meaning all users have access. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4223) Support * (meaning all users) as part of the acls
[ https://issues.apache.org/jira/browse/SPARK-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711574#comment-14711574 ] Zhuo Liu commented on SPARK-4223: - Thank you! [~sowen] [~tgraves] Support * (meaning all users) as part of the acls - Key: SPARK-4223 URL: https://issues.apache.org/jira/browse/SPARK-4223 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 1.2.0 Reporter: Thomas Graves Assignee: Zhuo Liu Currently we support setting view and modify acls but you have to specify a list of users. It would be nice to support * meaning all users have access. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10229) Wrong jline dependency when compiled against Scala 2.11
Cheng Lian created SPARK-10229: -- Summary: Wrong jline dependency when compiled against Scala 2.11 Key: SPARK-10229 URL: https://issues.apache.org/jira/browse/SPARK-10229 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.5.0 Reporter: Cheng Lian Priority: Blocker Scala migrated to the official jline in 2.11.0-M4, so the scala-specific fork of jline is gone and you can just depend on the official jline. The nonexistent org.scala-lang:jline:2.11.7 artifact is causing build failure when Spark is built against Scala 2.11. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-10228) Integer overflow in VertexRDDImpl.count
[ https://issues.apache.org/jira/browse/SPARK-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-10228. --- Resolution: Duplicate Always best to look at master first, since you'd see it was already fixed: https://github.com/apache/spark/blame/9e952ecbce670e9b532a1c664a4d03b66e404112/graphx/src/main/scala/org/apache/spark/graphx/impl/VertexRDDImpl.scala https://issues.apache.org/jira/browse/SPARK-3190 Integer overflow in VertexRDDImpl.count --- Key: SPARK-10228 URL: https://issues.apache.org/jira/browse/SPARK-10228 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 1.4.1 Reporter: Robin Cheng VertexRDDImpl overrides RDD.count() but aggregates Int instead of Long: /** The number of vertices in the RDD. */ override def count(): Long = { partitionsRDD.map(_.size).reduce(_ + _) } This causes Pregel to stop iterating when the number of messages is negative, giving incorrect results. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
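Editor's note: the failure mode is easy to reproduce outside Spark. Summing partition sizes as {{Int}} wraps past 2^31 - 1 and goes negative, which is what makes Pregel's message count look negative; widening to {{Long}} before summing (the approach the fix apparently takes) avoids it. A standalone demonstration:

```java
// Int vs. Long accumulation of per-partition sizes, as in the count() above.
public class CountOverflowDemo {
    // Int arithmetic, as in the buggy override: silently wraps past 2^31 - 1.
    static int intSum(int[] partitionSizes) {
        int total = 0;
        for (int size : partitionSizes) total += size;
        return total;
    }

    // Each size is widened to long before adding, so no wraparound occurs.
    static long longSum(int[] partitionSizes) {
        long total = 0L;
        for (int size : partitionSizes) total += size;
        return total;
    }

    public static void main(String[] args) {
        int[] sizes = {2_000_000_000, 2_000_000_000}; // ~4 billion vertices total
        System.out.println(intSum(sizes));  // -294967296: a negative "count"
        System.out.println(longSum(sizes)); // 4000000000
    }
}
```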
[jira] [Updated] (SPARK-10226) Error occured in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wangwei updated SPARK-10226: Description:
DataSource: src/main/resources/kv1.txt
SQL:
1. create table src(id string, name string);
2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src;
3. select count(*) from src where id != '0';
[ERROR] Could not expand event
{code}
java.lang.IllegalArgumentException: != 0;: event not found
	at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779)
	at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631)
	at jline.console.ConsoleReader.accept(ConsoleReader.java:2019)
	at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666)
	at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}
was:
DataSource: src/main/resources/kv1.txt
SQL:
1. create table src(id string, name string);
2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src;
3. select count(*) from src where id != '0';
[ERROR] Could not expand event
{code}
java.lang.IllegalArgumentException: != 0;: event not found
	at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779)
	at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631)
	at jline.console.ConsoleReader.accept(ConsoleReader.java:2019)
	at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666)
	at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666)
	at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178)
	at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{code}
Error occurred in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei
DataSource: src/main/resources/kv1.txt
SQL:
1. create table src(id string, name string);
2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src;
3. select count(*) from src where id != '0';
[ERROR] Could not expand event
{code}
java.lang.IllegalArgumentException: != 0;: event not found
	at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779)
	at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631)
	at jline.console.ConsoleReader.accept(ConsoleReader.java:2019)
	at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666)
	at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at
{code}
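The failure comes from jline's shell-style history expansion, not from the SQL parser: the `!` in `!=` is treated as a history event reference before the query ever reaches Spark. A rough pure-Scala approximation of the trigger (the real check lives in `jline.console.ConsoleReader.expandEvents`, and jline 2.x exposes `setExpandEvents(false)` to turn expansion off; the helper below is only a crude stand-in):

```scala
// Crude stand-in for jline's history-expansion trigger: a line is
// suspect if it contains a '!'. The real logic is more involved,
// but this is enough to see why "!=" trips it.
def looksLikeHistoryEvent(line: String): Boolean = line.contains('!')

val query = "select count(*) from src where id != '0';"
println(looksLikeHistoryEvent(query)) // true: "!=" is misread as an event
```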
[jira] [Commented] (SPARK-3655) Support sorting of values in addition to keys (i.e. secondary sort)
[ https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711367#comment-14711367 ] Nick Xie commented on SPARK-3655: - It worked really well on the cluster. :-) I did notice that it had a dependency on Google Guava classes. Any way to get rid of this dependency? Guava dependency mismatches are a pain across Spark and Hadoop versions. Support sorting of values in addition to keys (i.e. secondary sort) --- Key: SPARK-3655 URL: https://issues.apache.org/jira/browse/SPARK-3655 Project: Spark Issue Type: New Feature Components: Spark Core Affects Versions: 1.1.0, 1.2.0 Reporter: koert kuipers Assignee: Koert Kuipers Now that Spark has a sort-based shuffle, can we expect a secondary sort soon? There are some use cases where getting a sorted iterator of values per key is helpful. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
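For small data the desired semantics can be sketched in plain Scala (this in-memory version is only an illustration of the contract; the point of the feature is to push the per-key ordering into the sort-based shuffle instead of sorting each group in memory):

```scala
// Toy illustration of secondary-sort semantics: after grouping by
// key, the values for each key come back in sorted order.
val data = Seq(("a", 3), ("b", 2), ("a", 1), ("a", 2))
val sortedPerKey: Map[String, Seq[Int]] =
  data.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sorted }
println(sortedPerKey("a")) // List(1, 2, 3)
```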
[jira] [Resolved] (SPARK-10229) Wrong jline dependency when compiled against Scala 2.11
[ https://issues.apache.org/jira/browse/SPARK-10229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheng Lian resolved SPARK-10229. Resolution: Not A Problem I was using {{-Pscala-2.11}} since {{scala-2.11}} is a POM profile. But it should be {{-Dscala-2.11}}. Wrong jline dependency when compiled against Scala 2.11 --- Key: SPARK-10229 URL: https://issues.apache.org/jira/browse/SPARK-10229 Project: Spark Issue Type: Bug Components: Build Affects Versions: 1.5.0 Reporter: Cheng Lian Priority: Blocker Scala migrated to the official jline in 2.11.0-M4, so the scala-specific fork of jline is gone and you can just depend on the official jline. The nonexistent org.scala-lang:jline:2.11.7 artifact is causing build failure when Spark is built against Scala 2.11. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10253) Remove Guava dependencies in MLlib java tests
[ https://issues.apache.org/jira/browse/SPARK-10253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712443#comment-14712443 ] Feynman Liang commented on SPARK-10253: --- Working on this Remove Guava dependencies in MLlib java tests - Key: SPARK-10253 URL: https://issues.apache.org/jira/browse/SPARK-10253 Project: Spark Issue Type: Improvement Components: ML, MLlib Reporter: Feynman Liang Priority: Minor Many tests depend on Google Guava's {{Lists.newArrayList}} when {{java.util.Arrays.asList}} could be used instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10253) Remove Guava dependencies in MLlib java tests
Feynman Liang created SPARK-10253: - Summary: Remove Guava dependencies in MLlib java tests Key: SPARK-10253 URL: https://issues.apache.org/jira/browse/SPARK-10253 Project: Spark Issue Type: Improvement Components: ML, MLlib Reporter: Feynman Liang Priority: Minor Many tests depend on Google Guava's {{Lists.newArrayList}} when {{java.util.Arrays.asList}} could be used instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
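The replacement is mechanical; seen from Scala, the JDK call has the same shape as the Guava one used in the Java tests:

```scala
import java.util.Arrays

// Guava: Lists.newArrayList("a", "b", "c")  -- pulls in an extra dependency
// JDK:   Arrays.asList("a", "b", "c")       -- available everywhere
val xs: java.util.List[String] = Arrays.asList("a", "b", "c")
println(xs.size()) // 3
```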
[jira] [Created] (SPARK-10273) Add @since annotation to pyspark.mllib.feature
Xiangrui Meng created SPARK-10273: - Summary: Add @since annotation to pyspark.mllib.feature Key: SPARK-10273 URL: https://issues.apache.org/jira/browse/SPARK-10273 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10274) Add @since annotation to pyspark.mllib.fpm
Xiangrui Meng created SPARK-10274: - Summary: Add @since annotation to pyspark.mllib.fpm Key: SPARK-10274 URL: https://issues.apache.org/jira/browse/SPARK-10274 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10269) Add @since annotation to pyspark.mllib.classification
Xiangrui Meng created SPARK-10269: - Summary: Add @since annotation to pyspark.mllib.classification Key: SPARK-10269 URL: https://issues.apache.org/jira/browse/SPARK-10269 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10270) Add/Replace some Java friendly DataFrame API
Cheng Hao created SPARK-10270: - Summary: Add/Replace some Java friendly DataFrame API Key: SPARK-10270 URL: https://issues.apache.org/jira/browse/SPARK-10270 Project: Spark Issue Type: Improvement Components: SQL Reporter: Cheng Hao Currently in DataFrame, we have APIs like:
{code}
def join(right: DataFrame, usingColumns: Seq[String]): DataFrame
def dropDuplicates(colNames: Seq[String]): DataFrame
def dropDuplicates(colNames: Array[String]): DataFrame
{code}
Those APIs are not so friendly to Java programmers; change them to:
{code}
def join(right: DataFrame, usingColumns: String*): DataFrame
def dropDuplicates(colNames: String*): DataFrame
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
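A hedged sketch of why the varargs form helps (hypothetical methods, not the actual DataFrame source): a Scala `String*` parameter compiles to a Java-visible `String...` method, so Java callers can pass column names directly instead of constructing a Scala `Seq`:

```scala
object Api {
  // Seq-based shape: awkward from Java, which has no Seq literal.
  def dropDuplicatesSeq(colNames: Seq[String]): String = colNames.mkString(",")
  // Varargs shape: callable as dropDuplicates("a", "b") from Scala,
  // and as a String... method from Java.
  def dropDuplicates(colNames: String*): String = colNames.mkString(",")
}
println(Api.dropDuplicates("a", "b")) // a,b
```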
[jira] [Created] (SPARK-10277) Add @since annotation to pyspark.mllib.regression
Xiangrui Meng created SPARK-10277: - Summary: Add @since annotation to pyspark.mllib.regression Key: SPARK-10277 URL: https://issues.apache.org/jira/browse/SPARK-10277 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10276) Add @since annotation to pyspark.mllib.recommendation
Xiangrui Meng created SPARK-10276: - Summary: Add @since annotation to pyspark.mllib.recommendation Key: SPARK-10276 URL: https://issues.apache.org/jira/browse/SPARK-10276 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10279) Add @since annotation to pyspark.mllib.util
Xiangrui Meng created SPARK-10279: - Summary: Add @since annotation to pyspark.mllib.util Key: SPARK-10279 URL: https://issues.apache.org/jira/browse/SPARK-10279 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10271) Add @since annotation to pyspark.mllib.clustering
Xiangrui Meng created SPARK-10271: - Summary: Add @since annotation to pyspark.mllib.clustering Key: SPARK-10271 URL: https://issues.apache.org/jira/browse/SPARK-10271 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10275) Add @since annotation to pyspark.mllib.random
Xiangrui Meng created SPARK-10275: - Summary: Add @since annotation to pyspark.mllib.random Key: SPARK-10275 URL: https://issues.apache.org/jira/browse/SPARK-10275 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10272) Add @since annotation to pyspark.mllib.evaluation
Xiangrui Meng created SPARK-10272: - Summary: Add @since annotation to pyspark.mllib.evaluation Key: SPARK-10272 URL: https://issues.apache.org/jira/browse/SPARK-10272 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10278) Add @since annotation to pyspark.mllib.tree
Xiangrui Meng created SPARK-10278: - Summary: Add @since annotation to pyspark.mllib.tree Key: SPARK-10278 URL: https://issues.apache.org/jira/browse/SPARK-10278 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8360) Streaming DataFrames
[ https://issues.apache.org/jira/browse/SPARK-8360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712377#comment-14712377 ] Adrian Wang commented on SPARK-8360: https://github.com/intel-bigdata/spark-streamingsql Our streaming SQL project is highly related to this JIRA ticket. Streaming DataFrames Key: SPARK-8360 URL: https://issues.apache.org/jira/browse/SPARK-8360 Project: Spark Issue Type: Umbrella Components: SQL, Streaming Reporter: Reynold Xin Umbrella ticket to track what's needed to make streaming DataFrames a reality. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9964) PySpark DataFrameReader accept RDD of String for JSON
[ https://issues.apache.org/jira/browse/SPARK-9964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9964: --- Assignee: Apache Spark PySpark DataFrameReader accept RDD of String for JSON - Key: SPARK-9964 URL: https://issues.apache.org/jira/browse/SPARK-9964 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Joseph K. Bradley Assignee: Apache Spark Priority: Minor It would be nice (but not necessary) for the PySpark DataFrameReader to accept an RDD of Strings (like the Scala version does) for JSON, rather than only taking a path. If this JIRA is accepted, it should probably be duplicated to cover the other input types (not just JSON). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9964) PySpark DataFrameReader accept RDD of String for JSON
[ https://issues.apache.org/jira/browse/SPARK-9964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712409#comment-14712409 ] Apache Spark commented on SPARK-9964: - User 'yanboliang' has created a pull request for this issue: https://github.com/apache/spark/pull/8444 PySpark DataFrameReader accept RDD of String for JSON - Key: SPARK-9964 URL: https://issues.apache.org/jira/browse/SPARK-9964 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Joseph K. Bradley Priority: Minor It would be nice (but not necessary) for the PySpark DataFrameReader to accept an RDD of Strings (like the Scala version does) for JSON, rather than only taking a path. If this JIRA is accepted, it should probably be duplicated to cover the other input types (not just JSON). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-9964) PySpark DataFrameReader accept RDD of String for JSON
[ https://issues.apache.org/jira/browse/SPARK-9964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9964: --- Assignee: (was: Apache Spark) PySpark DataFrameReader accept RDD of String for JSON - Key: SPARK-9964 URL: https://issues.apache.org/jira/browse/SPARK-9964 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Joseph K. Bradley Priority: Minor It would be nice (but not necessary) for the PySpark DataFrameReader to accept an RDD of Strings (like the Scala version does) for JSON, rather than only taking a path. If this JIRA is accepted, it should probably be duplicated to cover the other input types (not just JSON). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10238) Update @Since annotation for mllib.linalg
[ https://issues.apache.org/jira/browse/SPARK-10238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712410#comment-14712410 ] DB Tsai commented on SPARK-10238: - Resolved in master and branch 1.5 Update @Since annotation for mllib.linalg - Key: SPARK-10238 URL: https://issues.apache.org/jira/browse/SPARK-10238 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10281) Add @since annotation to pyspark.ml.clustering
Xiangrui Meng created SPARK-10281: - Summary: Add @since annotation to pyspark.ml.clustering Key: SPARK-10281 URL: https://issues.apache.org/jira/browse/SPARK-10281 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10280) Add @since annotation to pyspark.ml.classification
Xiangrui Meng created SPARK-10280: - Summary: Add @since annotation to pyspark.ml.classification Key: SPARK-10280 URL: https://issues.apache.org/jira/browse/SPARK-10280 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10285) Add @since annotation to pyspark.ml.util
Xiangrui Meng created SPARK-10285: - Summary: Add @since annotation to pyspark.ml.util Key: SPARK-10285 URL: https://issues.apache.org/jira/browse/SPARK-10285 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10284) Add @since annotation to pyspark.ml.tuning
Xiangrui Meng created SPARK-10284: - Summary: Add @since annotation to pyspark.ml.tuning Key: SPARK-10284 URL: https://issues.apache.org/jira/browse/SPARK-10284 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10282) Add @since annotation to pyspark.ml.recommendation
Xiangrui Meng created SPARK-10282: - Summary: Add @since annotation to pyspark.ml.recommendation Key: SPARK-10282 URL: https://issues.apache.org/jira/browse/SPARK-10282 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10283) Add @since annotation to pyspark.ml.regression
Xiangrui Meng created SPARK-10283: - Summary: Add @since annotation to pyspark.ml.regression Key: SPARK-10283 URL: https://issues.apache.org/jira/browse/SPARK-10283 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10286) Add @since annotation to pyspark.ml.param and pyspark.ml.*
Xiangrui Meng created SPARK-10286: - Summary: Add @since annotation to pyspark.ml.param and pyspark.ml.* Key: SPARK-10286 URL: https://issues.apache.org/jira/browse/SPARK-10286 Project: Spark Issue Type: Sub-task Components: Documentation, ML, PySpark Reporter: Xiangrui Meng Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10269) Add @since annotation to pyspark.mllib.classification
[ https://issues.apache.org/jira/browse/SPARK-10269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10269: -- Target Version/s: 1.6.0 Add @since annotation to pyspark.mllib.classification - Key: SPARK-10269 URL: https://issues.apache.org/jira/browse/SPARK-10269 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor Labels: starter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10272) Add @since annotation to pyspark.mllib.evaluation
[ https://issues.apache.org/jira/browse/SPARK-10272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10272: -- Target Version/s: 1.6.0 Add @since annotation to pyspark.mllib.evaluation - Key: SPARK-10272 URL: https://issues.apache.org/jira/browse/SPARK-10272 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor Labels: starter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10271) Add @since annotation to pyspark.mllib.clustering
[ https://issues.apache.org/jira/browse/SPARK-10271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10271: -- Target Version/s: 1.6.0 Add @since annotation to pyspark.mllib.clustering - Key: SPARK-10271 URL: https://issues.apache.org/jira/browse/SPARK-10271 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor Labels: starter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10273) Add @since annotation to pyspark.mllib.feature
[ https://issues.apache.org/jira/browse/SPARK-10273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10273: -- Target Version/s: 1.6.0 Add @since annotation to pyspark.mllib.feature -- Key: SPARK-10273 URL: https://issues.apache.org/jira/browse/SPARK-10273 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib, PySpark Reporter: Xiangrui Meng Priority: Minor Labels: starter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10287) After processing a query using JSON data, Spark SQL continuously refreshes metadata of the table
Yin Huai created SPARK-10287: Summary: After processing a query using JSON data, Spark SQL continuously refreshes metadata of the table Key: SPARK-10287 URL: https://issues.apache.org/jira/browse/SPARK-10287 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: Yin Huai Priority: Blocker
{code}
val df = sqlContext.read.format("json").load(aPartitionedJsonData)
val columnStr = df.schema.map(_.name).mkString(",")
println(s"columns: $columnStr")
val hash = df
  .selectExpr(s"hash($columnStr) as hashValue")
  .groupBy()
  .sum("hashValue")
  .head()
  .getLong(0)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10287) After processing a query using JSON data, Spark SQL continuously refreshes metadata of the table
[ https://issues.apache.org/jira/browse/SPARK-10287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-10287: - Description: I have a partitioned json table with around 2000 partitions.
{code}
val df = sqlContext.read.format("json").load(aPartitionedJsonData)
val columnStr = df.schema.map(_.name).mkString(",")
println(s"columns: $columnStr")
val hash = df
  .selectExpr(s"hash($columnStr) as hashValue")
  .groupBy()
  .sum("hashValue")
  .head()
  .getLong(0)
{code}
was:
{code}
val df = sqlContext.read.format("json").load(aPartitionedJsonData)
val columnStr = df.schema.map(_.name).mkString(",")
println(s"columns: $columnStr")
val hash = df
  .selectExpr(s"hash($columnStr) as hashValue")
  .groupBy()
  .sum("hashValue")
  .head()
  .getLong(0)
{code}
After processing a query using JSON data, Spark SQL continuously refreshes metadata of the table Key: SPARK-10287 URL: https://issues.apache.org/jira/browse/SPARK-10287 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: Yin Huai Priority: Blocker I have a partitioned json table with around 2000 partitions.
{code}
val df = sqlContext.read.format("json").load(aPartitionedJsonData)
val columnStr = df.schema.map(_.name).mkString(",")
println(s"columns: $columnStr")
val hash = df
  .selectExpr(s"hash($columnStr) as hashValue")
  .groupBy()
  .sum("hashValue")
  .head()
  .getLong(0)
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10220) org.apache.spark.sql.jdbc.JDBCRDD could not parse mysql table column named reserved word
[ https://issues.apache.org/jira/browse/SPARK-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10220: Assignee: (was: Apache Spark) org.apache.spark.sql.jdbc.JDBCRDD could not parse mysql table column named reserved word Key: SPARK-10220 URL: https://issues.apache.org/jira/browse/SPARK-10220 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: fang fang chen Attachments: SPARK-10220.patch Reproduce steps:
{code}
var options: HashMap[String, String] = new HashMap
options.put("driver", "com.mysql.jdbc.Driver")
options.put("url", url_total)
options.put("dbtable", table) // one column named desc
options.put("lowerBound", lower_bound.toString())
options.put("upperBound", upper_bound.toString())
options.put("numPartitions", partitions.toString())
options.put("partitionColumn", "id")
val jdbcDF = sqlContext.load("jdbc", options)
jdbcDF.save("output")
{code}
Exception:
{code}
15/08/24 19:02:34 ERROR executor.Executor: Exception in task 0.3 in stage 0.0 (TID 3)
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'desc,warning_stat,money_limit,real_name,region_lv1,region_lv2,region_lv3,region_' at line 1
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10220) org.apache.spark.sql.jdbc.JDBCRDD could not parse mysql table column named reserved word
[ https://issues.apache.org/jira/browse/SPARK-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712394#comment-14712394 ] Apache Spark commented on SPARK-10220: -- User 'ffchenAtCloudera' has created a pull request for this issue: https://github.com/apache/spark/pull/8443 org.apache.spark.sql.jdbc.JDBCRDD could not parse mysql table column named reserved word Key: SPARK-10220 URL: https://issues.apache.org/jira/browse/SPARK-10220 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.3.0 Reporter: fang fang chen Attachments: SPARK-10220.patch Reproduce steps:
{code}
var options: HashMap[String, String] = new HashMap
options.put("driver", "com.mysql.jdbc.Driver")
options.put("url", url_total)
options.put("dbtable", table) // one column named desc
options.put("lowerBound", lower_bound.toString())
options.put("upperBound", upper_bound.toString())
options.put("numPartitions", partitions.toString())
options.put("partitionColumn", "id")
val jdbcDF = sqlContext.load("jdbc", options)
jdbcDF.save("output")
{code}
Exception:
{code}
15/08/24 19:02:34 ERROR executor.Executor: Exception in task 0.3 in stage 0.0 (TID 3)
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'desc,warning_stat,money_limit,real_name,region_lv1,region_lv2,region_lv3,region_' at line 1
{code}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
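The usual fix is to quote identifiers when generating the SELECT, so reserved words like `desc` survive; a minimal sketch with a hypothetical helper (MySQL quotes identifiers with backticks; the names below are illustrative, not the actual JDBCRDD code):

```scala
// Hypothetical helper: wrap each column name in backticks so MySQL
// reserved words like `desc` parse as identifiers. A literal backtick
// inside a name is escaped by doubling it.
def quoteMySql(col: String): String = "`" + col.replace("`", "``") + "`"

val columns = Seq("id", "desc", "real_name")
val sql = s"SELECT ${columns.map(quoteMySql).mkString(", ")} FROM t"
println(sql) // SELECT `id`, `desc`, `real_name` FROM t
```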
[jira] [Assigned] (SPARK-10254) Remove Guava dependencies in spark.ml.feature
[ https://issues.apache.org/jira/browse/SPARK-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10254: Assignee: Apache Spark Remove Guava dependencies in spark.ml.feature - Key: SPARK-10254 URL: https://issues.apache.org/jira/browse/SPARK-10254 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10254) Remove Guava dependencies in spark.ml.feature
[ https://issues.apache.org/jira/browse/SPARK-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10254: Assignee: (was: Apache Spark) Remove Guava dependencies in spark.ml.feature - Key: SPARK-10254 URL: https://issues.apache.org/jira/browse/SPARK-10254 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10254) Remove Guava dependencies in spark.ml.feature
[ https://issues.apache.org/jira/browse/SPARK-10254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712444#comment-14712444 ] Apache Spark commented on SPARK-10254: -- User 'feynmanliang' has created a pull request for this issue: https://github.com/apache/spark/pull/8445 Remove Guava dependencies in spark.ml.feature - Key: SPARK-10254 URL: https://issues.apache.org/jira/browse/SPARK-10254 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10255) Remove Guava dependencies in spark.ml.param
Feynman Liang created SPARK-10255: - Summary: Remove Guava dependencies in spark.ml.param Key: SPARK-10255 URL: https://issues.apache.org/jira/browse/SPARK-10255 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10202) Specify schema during KMeansModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712456#comment-14712456 ] Vinod KC commented on SPARK-10202: -- I'm working on this Specify schema during KMeansModel.save to avoid reflection -- Key: SPARK-10202 URL: https://issues.apache.org/jira/browse/SPARK-10202 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor [KMeansModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeansModel.scala#L110] currently infers a schema from a case class when the schema is known and should be manually provided. See parent JIRA for rationale.
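The "specify schema during save" tickets all share one shape: the schema never changes, yet it is rediscovered via runtime reflection on every save. A Spark-free miniature of the trade-off, with illustrative names (the real fix writes out a Spark SQL StructType rather than the strings used here):

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SchemaSketch {
    // Stand-in for the case class the save path currently reflects over.
    static class ClusterRow {
        int id;
        double[] point;
    }

    // Reflection-based: walks the fields at runtime on every save,
    // even though they are fixed at compile time.
    public static List<String> inferredSchema() {
        List<String> fields = new ArrayList<>();
        for (Field f : ClusterRow.class.getDeclaredFields()) {
            fields.add(f.getName() + ":" + f.getType().getSimpleName());
        }
        return fields;
    }

    // Explicit: the schema is known statically, so state it once.
    public static List<String> explicitSchema() {
        return Arrays.asList("id:int", "point:double[]");
    }
}
```

Both paths yield the same schema; the explicit one just skips the reflection machinery and makes the on-disk format visible in the source.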
[jira] [Commented] (SPARK-10226) Error occurred in SparkSQL when using !=
[ https://issues.apache.org/jira/browse/SPARK-10226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712457#comment-14712457 ] wangwei commented on SPARK-10226: - I tested the case in Spark 1.4 and found that the exclamation mark works, so != is supported in SparkSQL. Error occurred in SparkSQL when using != Key: SPARK-10226 URL: https://issues.apache.org/jira/browse/SPARK-10226 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: wangwei DataSource: src/main/resources/kv1.txt SQL:
1. create table src(id string, name string);
2. load data local inpath '${SparkHome}/examples/src/main/resources/kv1.txt' into table src;
3. select count( * ) from src where id != '0';
[ERROR] Could not expand event
java.lang.IllegalArgumentException: != 0;: event not found
at jline.console.ConsoleReader.expandEvents(ConsoleReader.java:779)
at jline.console.ConsoleReader.finishBuffer(ConsoleReader.java:631)
at jline.console.ConsoleReader.accept(ConsoleReader.java:2019)
at jline.console.ConsoleReader.readLine(ConsoleReader.java:2666)
at jline.console.ConsoleReader.readLine(ConsoleReader.java:2269)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:231)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:666)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:178)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:118)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
[jira] [Commented] (SPARK-10203) Specify schema during GLMClassificationModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712458#comment-14712458 ] Vinod KC commented on SPARK-10203: -- I'm working on this Specify schema during GLMClassificationModel.save to avoid reflection - Key: SPARK-10203 URL: https://issues.apache.org/jira/browse/SPARK-10203 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor [GLMClassificationModel.save|https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/mllib/src/main/scala/org/apache/spark/mllib/classification/impl/GLMClassificationModel.scala#L38] currently infers a schema from a case class when the schema is known and should be manually provided. See parent JIRA for rationale.
[jira] [Commented] (SPARK-10206) Specify schema during IsotonicRegression.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712462#comment-14712462 ] Vinod KC commented on SPARK-10206: -- I'm working on this Specify schema during IsotonicRegression.save to avoid reflection - Key: SPARK-10206 URL: https://issues.apache.org/jira/browse/SPARK-10206 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor [IsotonicRegression.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/regression/IsotonicRegression.scala#L184] currently infers a schema from a case class when the schema is known and should be manually provided. See parent JIRA for rationale.
[jira] [Commented] (SPARK-10204) Specify schema during NaiveBayes.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712459#comment-14712459 ] Vinod KC commented on SPARK-10204: -- I'm working on this Specify schema during NaiveBayes.save to avoid reflection - Key: SPARK-10204 URL: https://issues.apache.org/jira/browse/SPARK-10204 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor [NaiveBayes.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala#L181] currently infers a schema from a case class when the schema is known and should be manually provided. See parent JIRA for rationale.
[jira] [Assigned] (SPARK-10257) Remove Guava dependencies in spark.mllib JavaTests
[ https://issues.apache.org/jira/browse/SPARK-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10257: Assignee: Apache Spark Remove Guava dependencies in spark.mllib JavaTests -- Key: SPARK-10257 URL: https://issues.apache.org/jira/browse/SPARK-10257 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Assignee: Apache Spark Priority: Minor
[jira] [Assigned] (SPARK-10257) Remove Guava dependencies in spark.mllib JavaTests
[ https://issues.apache.org/jira/browse/SPARK-10257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10257: Assignee: (was: Apache Spark) Remove Guava dependencies in spark.mllib JavaTests -- Key: SPARK-10257 URL: https://issues.apache.org/jira/browse/SPARK-10257 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor
[jira] [Resolved] (SPARK-10243) Update @Since annotation for mllib.tree
[ https://issues.apache.org/jira/browse/SPARK-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-10243. --- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 8442 [https://github.com/apache/spark/pull/8442] Update @Since annotation for mllib.tree --- Key: SPARK-10243 URL: https://issues.apache.org/jira/browse/SPARK-10243 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib Affects Versions: 1.5.0 Reporter: Xiangrui Meng Assignee: Xiangrui Meng Priority: Minor Fix For: 1.5.0
[jira] [Assigned] (SPARK-10104) Consolidate different forms of table identifiers
[ https://issues.apache.org/jira/browse/SPARK-10104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10104: Assignee: (was: Apache Spark) Consolidate different forms of table identifiers Key: SPARK-10104 URL: https://issues.apache.org/jira/browse/SPARK-10104 Project: Spark Issue Type: Improvement Components: SQL Reporter: Yin Huai Right now, we have QualifiedTableName, TableIdentifier, and Seq[String] to represent table identifiers. We should have only one form, and TableIdentifier looks like the best choice because it provides methods to get the table name and database name and to return the unquoted or quoted string. There will be TODOs referencing SPARK-10104; those places need to be updated.
[jira] [Commented] (SPARK-10129) math function: stddev_samp
[ https://issues.apache.org/jira/browse/SPARK-10129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712439#comment-14712439 ] Yanbo Liang commented on SPARK-10129: - I'm working on it. math function: stddev_samp -- Key: SPARK-10129 URL: https://issues.apache.org/jira/browse/SPARK-10129 Project: Spark Issue Type: New Feature Components: SQL Reporter: Davies Liu Use the STDDEV_SAMP function to return the sample standard deviation (the square root of the sample variance). http://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.bigsql.doc/doc/bsql_stdev_samp.html
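For reference, STDDEV_SAMP is the square root of the sample variance: the sum of squared deviations from the mean, divided by n - 1 (Bessel's correction) rather than n. A minimal Java sketch of the computation, independent of whatever form the Spark SQL implementation takes:

```java
public class StddevSampSketch {
    // Sample standard deviation: sqrt( sum((x - mean)^2) / (n - 1) ).
    public static double stddevSamp(double[] xs) {
        int n = xs.length;
        if (n < 2) {
            throw new IllegalArgumentException("STDDEV_SAMP needs at least two values");
        }
        double mean = 0.0;
        for (double x : xs) mean += x;   // first pass: mean
        mean /= n;
        double ss = 0.0;
        for (double x : xs) ss += (x - mean) * (x - mean);  // second pass: squared deviations
        return Math.sqrt(ss / (n - 1));
    }
}
```

For example, stddevSamp(new double[]{1, 2, 3}) is 1.0: the mean is 2, the squared deviations sum to 2, and 2 / (3 - 1) = 1. (A production aggregate would use a single-pass streaming formulation such as Welford's algorithm instead of two passes.)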
[jira] [Assigned] (SPARK-10255) Remove Guava dependencies in spark.ml.param
[ https://issues.apache.org/jira/browse/SPARK-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10255: Assignee: (was: Apache Spark) Remove Guava dependencies in spark.ml.param --- Key: SPARK-10255 URL: https://issues.apache.org/jira/browse/SPARK-10255 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Priority: Minor
[jira] [Assigned] (SPARK-10255) Remove Guava dependencies in spark.ml.param
[ https://issues.apache.org/jira/browse/SPARK-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10255: Assignee: Apache Spark Remove Guava dependencies in spark.ml.param --- Key: SPARK-10255 URL: https://issues.apache.org/jira/browse/SPARK-10255 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Assignee: Apache Spark Priority: Minor
[jira] [Created] (SPARK-10256) Remove Guava dependencies in spark.ml.classification
Feynman Liang created SPARK-10256: - Summary: Remove Guava dependencies in spark.ml.classification Key: SPARK-10256 URL: https://issues.apache.org/jira/browse/SPARK-10256 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Priority: Minor
[jira] [Commented] (SPARK-10255) Remove Guava dependencies in spark.ml.param
[ https://issues.apache.org/jira/browse/SPARK-10255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712447#comment-14712447 ] Apache Spark commented on SPARK-10255: -- User 'feynmanliang' has created a pull request for this issue: https://github.com/apache/spark/pull/8446 Remove Guava dependencies in spark.ml.param --- Key: SPARK-10255 URL: https://issues.apache.org/jira/browse/SPARK-10255 Project: Spark Issue Type: Improvement Components: ML Reporter: Feynman Liang Priority: Minor
[jira] [Commented] (SPARK-10205) Specify schema during PowerIterationClustering.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712461#comment-14712461 ] Vinod KC commented on SPARK-10205: -- I'm working on this Specify schema during PowerIterationClustering.save to avoid reflection --- Key: SPARK-10205 URL: https://issues.apache.org/jira/browse/SPARK-10205 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor [PowerIterationClustering.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/clustering/PowerIterationClustering.scala#L82] currently infers a schema from a case class when the schema is known and should be manually provided. See parent JIRA for rationale.
[jira] [Commented] (SPARK-10211) Specify schema during MatrixFactorizationModel.save to avoid reflection
[ https://issues.apache.org/jira/browse/SPARK-10211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712466#comment-14712466 ] Vinod KC commented on SPARK-10211: -- I'm working on this Specify schema during MatrixFactorizationModel.save to avoid reflection --- Key: SPARK-10211 URL: https://issues.apache.org/jira/browse/SPARK-10211 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Feynman Liang Priority: Minor [MatrixFactorizationModel.save|https://github.com/apache/spark/blob/f5b028ed2f1ad6de43c8b50ebf480e1b6c047035/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala#L361] currently infers a schema from a RDD of tuples when the schema is known and should be manually provided. See parent JIRA for rationale.