[jira] [Created] (SPARK-8493) Fisher Vector Feature Transformer
Feynman Liang created SPARK-8493: Summary: Fisher Vector Feature Transformer Key: SPARK-8493 URL: https://issues.apache.org/jira/browse/SPARK-8493 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Fisher vectors provide a vocabulary-based encoding for images (see https://hal.inria.fr/hal-00830491/file/journal.pdf). This representation is useful because its reduced dimensionality provides regularization as well as increased scalability. An implementation of FVs in Spark ML should provide a way to both train a GMM vocabulary and compute Fisher kernel encodings of provided images. The vocabulary trainer can be implemented as a standalone GMM pipeline. The feature transformer can be implemented as an org.apache.spark.ml.UnaryTransformer. It should accept a vocabulary (Array[Array[Double]]) as well as an image (Array[Double]) and produce the Fisher kernel encoding (Array[Double]). See Enceval (http://www.robots.ox.ac.uk/~vgg/software/enceval_toolkit/) for a reference implementation in MATLAB/C++. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
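A hypothetical skeleton (not part of the ticket) of how such a transformer could be wired into the pipeline API, assuming the existing org.apache.spark.ml.UnaryTransformer contract; the vocabulary setter and the encoding body are placeholders rather than a real Fisher kernel implementation:
{code}
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{ArrayType, DataType, DoubleType}

class FisherVectorTransformer(override val uid: String)
  extends UnaryTransformer[Seq[Double], Seq[Double], FisherVectorTransformer] {

  def this() = this(Identifiable.randomUID("fisherVec"))

  // Simplified vocabulary holder: GMM component means, one Array[Double] per component.
  // A real implementation would model this as a Param and also carry weights/covariances.
  var vocabulary: Array[Array[Double]] = Array.empty

  def setVocabulary(v: Array[Array[Double]]): this.type = { vocabulary = v; this }

  override protected def createTransformFunc: Seq[Double] => Seq[Double] = { image =>
    // Placeholder encoding: a real Fisher vector accumulates GMM posterior-weighted
    // gradients with respect to the mixture parameters (see the Enceval reference).
    vocabulary.flatMap(mean => mean.zip(image).map { case (m, x) => x - m }).toSeq
  }

  override protected def outputDataType: DataType = ArrayType(DoubleType, containsNull = false)
}
{code}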
[jira] [Resolved] (SPARK-8452) expose jobGroup API in SparkR
[ https://issues.apache.org/jira/browse/SPARK-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman resolved SPARK-8452. -- Resolution: Fixed Fix Version/s: 1.5.0 Issue resolved by pull request 6889 [https://github.com/apache/spark/pull/6889] expose jobGroup API in SparkR - Key: SPARK-8452 URL: https://issues.apache.org/jira/browse/SPARK-8452 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 1.4.0 Reporter: Hossein Falaki Fix For: 1.5.0, 1.4.1 Following job management calls are missing in SparkR: {code} setJobGroup() cancelJobGroup() clearJobGroup() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
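For reference, the missing SparkR functions correspond to existing methods on the Scala-side SparkContext; a minimal sketch of those Scala counterparts (the R signatures in the pull request may differ slightly):
{code}
import org.apache.spark.{SparkConf, SparkContext}

object JobGroupExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("jobGroups"))

    // Tag all jobs submitted from this thread with a group id and description.
    sc.setJobGroup("nightly-etl", "nightly aggregation jobs", interruptOnCancel = true)
    sc.parallelize(1 to 1000, 4).map(_ * 2).count()

    // Cancel everything still running in the group, then stop tagging subsequent jobs.
    sc.cancelJobGroup("nightly-etl")
    sc.clearJobGroup()
    sc.stop()
  }
}
{code}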
[jira] [Updated] (SPARK-8452) expose jobGroup API in SparkR
[ https://issues.apache.org/jira/browse/SPARK-8452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shivaram Venkataraman updated SPARK-8452: - Assignee: Hossein Falaki expose jobGroup API in SparkR - Key: SPARK-8452 URL: https://issues.apache.org/jira/browse/SPARK-8452 Project: Spark Issue Type: Bug Components: SparkR Affects Versions: 1.4.0 Reporter: Hossein Falaki Assignee: Hossein Falaki Fix For: 1.4.1, 1.5.0 Following job management calls are missing in SparkR: {code} setJobGroup() cancelJobGroup() clearJobGroup() {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8315) Better error when saving to parquet with duplicate columns
[ https://issues.apache.org/jira/browse/SPARK-8315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594144#comment-14594144 ] Yin Huai commented on SPARK-8315: - I tried the following in 1.4 {code} import org.apache.spark.sql.functions._ val df1 = Seq((1, 1)).toDF("i", "j").as("t1") val df2 = Seq((1, 1)).toDF("i", "j").as("t2") val joined = df1.join(df2, col("t1.i") === col("t2.j")) joined.explain(true) joined.write.format("parquet").saveAsTable("yinParquetSameColumnNames") {code} and I got an analysis exception {{org.apache.spark.sql.AnalysisException: Reference 'i' is ambiguous, could be: i#30, i#34.;}}. It seems this is fixed in 1.4, but it would be good to add a regression test. Better error when saving to parquet with duplicate columns -- Key: SPARK-8315 URL: https://issues.apache.org/jira/browse/SPARK-8315 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Michael Armbrust Priority: Critical Parquet allows you to silently write out files with duplicate column names and then emits a very confusing error when trying to read the data back in: {code} Error in SQL statement: java.lang.RuntimeException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 4 in stage 21.0 failed 4 times, most recent failure: Lost task 4.3 in stage 21.0 (TID 2767, ...): parquet.io.ParquetDecodingException: Can not read value at 0 in block -1 in file ... {code} We should throw a better error before attempting to write out an invalid file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
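As a sketch of the kind of pre-write validation this ticket asks for (not the actual fix), a check along these lines could fail fast with a readable message before Parquet writes a file that cannot be read back; treating column names case-insensitively is an assumption here:
{code}
import org.apache.spark.sql.DataFrame

def checkNoDuplicateColumns(df: DataFrame): Unit = {
  // Assumes case-insensitive column resolution; drop .toLowerCase for a case-sensitive check.
  val names = df.schema.fieldNames.map(_.toLowerCase)
  val duplicates = names.groupBy(identity).collect { case (name, occurrences) if occurrences.length > 1 => name }
  require(duplicates.isEmpty,
    s"Duplicate column name(s) ${duplicates.mkString(", ")} found; rename them before saving to Parquet.")
}
{code}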
[jira] [Commented] (SPARK-8477) Add in operator to DataFrame Column in Python
[ https://issues.apache.org/jira/browse/SPARK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593895#comment-14593895 ] Yu Ishikawa commented on SPARK-8477: Oh, I'm sorry. I didn't know we already have {{inSet}}. It seems that the function of {{inSet}} is almost the same as that of {{in}}. Add in operator to DataFrame Column in Python - Key: SPARK-8477 URL: https://issues.apache.org/jira/browse/SPARK-8477 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Yu Ishikawa -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8470) MissingRequirementError for ScalaReflection on user classes
[ https://issues.apache.org/jira/browse/SPARK-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593893#comment-14593893 ] Michael Armbrust commented on SPARK-8470: - Additional info on how this is being run: {code} We're using the normal command line: --- bin/spark-submit --properties-file ./spark-submit.conf --class com.rr.data.visits.VisitSequencerRunner ./mvt-master-SNAPSHOT-jar-with-dependencies.jar --- Our jar contains both com.rr.data.visits.orc.OrcReadWrite (which you can see in the stack trace) and the unfound com.rr.data.Visit. {code} MissingRequirementError for ScalaReflection on user classes --- Key: SPARK-8470 URL: https://issues.apache.org/jira/browse/SPARK-8470 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Michael Armbrust Assignee: Andrew Or Priority: Blocker From the mailing list: {code} Since upgrading to Spark 1.4, I'm getting a scala.reflect.internal.MissingRequirementError when creating a DataFrame from an RDD. The error references a case class in the application (the RDD's type parameter), which has been verified to be present. Items of note: 1) This is running on AWS EMR (YARN). I do not get this error running locally (standalone). 2) Reverting to Spark 1.3.1 makes the problem go away 3) The jar file containing the referenced class (the app assembly jar) is not listed in the classpath expansion dumped in the error message. I have seen SPARK-5281, and am guessing that this is the root cause, especially since the code added there is involved in the stacktrace. That said, my grasp on scala reflection isn't strong enough to make sense of the change to say for sure. It certainly looks, though, that in this scenario the current thread's context classloader may not be what we think it is (given #3 above). Any ideas? App code: def registerTable[A : Product : TypeTag](name: String, rdd: RDD[A])(implicit hc: HiveContext) = { val df = hc.createDataFrame(rdd) df.registerTempTable(name) } Stack trace: scala.reflect.internal.MissingRequirementError: class comMyClass in JavaMirror with sun.misc.Launcher$AppClassLoader@d16e5d6 of type class sun.misc.Launcher$AppClassLoader with classpath [ lots and lots of paths and jars, but not the app assembly jar] not found at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) at com.ipcoop.spark.sql.SqlEnv$$typecreator1$1.apply(SqlEnv.scala:87) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) at org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:71) at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:59) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:410) {code} Another report: {code} Hi, I use spark 0.14. 
I tried to create dataframe from RDD below, but got scala.reflect.internal.MissingRequirementError val partitionedTestDF2 = pairVarRDD.toDF(column1,column2,column3) //pairVarRDD is RDD[Record4Dim_2], and Record4Dim_2 is a Case Class How can I fix this? Exception in thread main scala.reflect.internal.MissingRequirementError: class etl.Record4Dim_2 in JavaMirror with sun.misc.Launcher$AppClassLoader@30177039 of type class sun.misc.Launcher$AppClassLoader with classpath [file:/local/spark140/conf/,file:/local/spark140/lib/spark-assembly-1.4.0-SNAPSHOT-hadoop2.6.0.jar,file:/local/spark140/lib/datanucleus-core-3.2.10.jar,file:/local/spark140/lib/datanucleus-rdbms-3.2.9.jar,file:/local/spark140/lib/datanucleus-api-jdo-3.2.6.jar,file:/etc/hadoop/conf/] and parent being sun.misc.Launcher$ExtClassLoader@52c8c6d9 of type class sun.misc.Launcher$ExtClassLoader with classpath
[jira] [Updated] (SPARK-8485) Feature transformers for image processing
[ https://issues.apache.org/jira/browse/SPARK-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8485: - Component/s: ML Feature transformers for image processing - Key: SPARK-8485 URL: https://issues.apache.org/jira/browse/SPARK-8485 Project: Spark Issue Type: New Feature Components: ML Reporter: Feynman Liang Many transformers exist to convert from image representations into more compact descriptors amenable to standard ML techniques. We should implement these transformers in Spark to support machine learning on richer content types. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8484) Add TrainValidationSplit to ml.tuning
[ https://issues.apache.org/jira/browse/SPARK-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593905#comment-14593905 ] Martin Zapletal commented on SPARK-8484: I can work on this one. Can you please assign it to me? Add TrainValidationSplit to ml.tuning - Key: SPARK-8484 URL: https://issues.apache.org/jira/browse/SPARK-8484 Project: Spark Issue Type: New Feature Components: ML Reporter: Xiangrui Meng Add TrainValidationSplit for hyper-parameter tuning. It randomly splits the input dataset into train and validation sets and uses an evaluation metric on the validation set to select the best model. It should be similar to CrossValidator, but simpler and less expensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
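A rough sketch of the selection logic described above, using the existing Estimator/Evaluator/ParamMap interfaces (the eventual TrainValidationSplit API may differ): fit each candidate ParamMap on the training split and keep the model that scores best on the validation split.
{code}
import org.apache.spark.ml.{Estimator, Model}
import org.apache.spark.ml.evaluation.Evaluator
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.DataFrame

def trainValidationSplit[M <: Model[M]](
    estimator: Estimator[M],
    evaluator: Evaluator,
    paramMaps: Array[ParamMap],
    dataset: DataFrame,
    trainRatio: Double = 0.75): M = {
  // Single random split instead of CrossValidator's k folds.
  val Array(training, validation) = dataset.randomSplit(Array(trainRatio, 1 - trainRatio))
  training.cache(); validation.cache()
  val scored = paramMaps.map { params =>
    val model = estimator.fit(training, params)
    (model, evaluator.evaluate(model.transform(validation, params)))
  }
  // Assumes a metric where larger is better; a real implementation would handle both directions.
  scored.maxBy(_._2)._1
}
{code}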
[jira] [Created] (SPARK-8486) SIFT Feature Extractor
Feynman Liang created SPARK-8486: Summary: SIFT Feature Extractor Key: SPARK-8486 URL: https://issues.apache.org/jira/browse/SPARK-8486 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features that are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a [[org.apache.spark.ml.Transformer]]. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Numeric] of the SIFT features present in the image. Depending on performance, the Difference-of-Gaussian approximation to the Laplacian of Gaussian described by Lowe can be further sped up using box filters (Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
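To illustrate the Difference-of-Gaussians approximation mentioned above: blurring an image at two nearby scales and subtracting approximates the Laplacian of Gaussian that SIFT uses for keypoint detection. A minimal, plain-Scala sketch (border handling by clamping and the scale ratio k = 1.6 are arbitrary choices, not part of the ticket):
{code}
def gaussianKernel(sigma: Double): Array[Double] = {
  val radius = math.ceil(3 * sigma).toInt
  val raw = (-radius to radius).map(x => math.exp(-x * x / (2 * sigma * sigma)))
  val sum = raw.sum
  raw.map(_ / sum).toArray  // normalized 1D Gaussian
}

def blurRow(row: Array[Double], kernel: Array[Double]): Array[Double] = {
  val r = kernel.length / 2
  row.indices.map { i =>
    kernel.indices.map { k =>
      val j = math.min(math.max(i + k - r, 0), row.length - 1)  // clamp at the borders
      kernel(k) * row(j)
    }.sum
  }.toArray
}

// Separable blur: filter rows, then columns (via transpose).
def blur(img: Array[Array[Double]], sigma: Double): Array[Array[Double]] = {
  val kernel = gaussianKernel(sigma)
  val rows = img.map(blurRow(_, kernel))
  rows.transpose.map(blurRow(_, kernel)).transpose
}

// DoG(sigma) = G(k * sigma) * I - G(sigma) * I, an approximation to the Laplacian of Gaussian.
def differenceOfGaussians(img: Array[Array[Double]], sigma: Double, k: Double = 1.6): Array[Array[Double]] =
  blur(img, sigma * k).zip(blur(img, sigma)).map { case (a, b) => a.zip(b).map { case (x, y) => x - y } }
{code}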
[jira] [Closed] (SPARK-7360) Compare Pyrolite performance affected by useMemo
[ https://issues.apache.org/jira/browse/SPARK-7360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng closed SPARK-7360. Resolution: Done Fix Version/s: 1.4.0 Assignee: Xiangrui Meng (was: Nicholas Chammas) Target Version/s: 1.4.0 (was: 1.5.0) I marked this as done. [~davies] mentioned that we need to serialize a lot of classes without useMemo, which hurts performance. So turning useMemo off globally is not a good option. Compare Pyrolite performance affected by useMemo Key: SPARK-7360 URL: https://issues.apache.org/jira/browse/SPARK-7360 Project: Spark Issue Type: Task Components: PySpark Reporter: Xiangrui Meng Assignee: Xiangrui Meng Fix For: 1.4.0 As discussed in SPARK-6288, disabling useMemo shows a significant performance improvement on some ML tasks. We should test whether this is true across PySpark, and consider patching Pyrolite for Spark. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
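A minimal sketch of the comparison being discussed, assuming Pyrolite exposes a Pickler(useMemo: Boolean) constructor; memoization deduplicates repeated object references at the cost of extra bookkeeping on every value written, which is the trade-off the ticket set out to measure:
{code}
import net.razorvine.pickle.Pickler

def timePickle(useMemo: Boolean, batches: Int = 1000): Long = {
  val pickler = new Pickler(useMemo)  // assumption: Pyrolite's useMemo toggle
  val batch: java.util.List[Array[Double]] =
    java.util.Arrays.asList(Array.fill(100)(Array.fill(10)(1.0)): _*)
  val start = System.nanoTime()
  var i = 0
  while (i < batches) { pickler.dumps(batch); i += 1 }
  (System.nanoTime() - start) / 1000000  // elapsed milliseconds
}

// println(s"useMemo=true: ${timePickle(true)} ms, useMemo=false: ${timePickle(false)} ms")
{code}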
[jira] [Updated] (SPARK-8486) SIFT/SURF Feature Transformer
[ https://issues.apache.org/jira/browse/SPARK-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8486: - Description: Scale invariant feature transform (SIFT) is a scale and rotation invariant method to transform images into matrices describing local features. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as an org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Array[Numeric]] of the SIFT features for the provided image. The implementation should support computation of SIFT at predefined interest points, every kth pixel, and densely (over all pixels). Furthermore, the implementation should support various approximations to the Laplacian of Gaussian. In addition to the Difference of Gaussian approximation (as described by Lowe), we should support * SURF approximation using box filters (Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf) * DAISY was: Scale invariant feature transform (SIFT) is a scale and rotation invariant method to transform images into matrices describing local features. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as an org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Array[Numeric]] of the SIFT features for the provided image. The implementation should support computation of SIFT at predefined interest points, every kth pixel, and densely (over all pixels). Depending on performance, approximating the Laplacian of Gaussian by the Difference of Gaussian (traditional SIFT) as described by Lowe can be further improved using box filters (aka SURF, see Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). SIFT/SURF Feature Transformer - Key: SPARK-8486 URL: https://issues.apache.org/jira/browse/SPARK-8486 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Scale invariant feature transform (SIFT) is a scale and rotation invariant method to transform images into matrices describing local features. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as an org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Array[Numeric]] of the SIFT features for the provided image. The implementation should support computation of SIFT at predefined interest points, every kth pixel, and densely (over all pixels). Furthermore, the implementation should support various approximations to the Laplacian of Gaussian. In addition to the Difference of Gaussian approximation (as described by Lowe), we should support * SURF approximation using box filters (Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf) * DAISY -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8486) SIFT/SURF/DAISY Feature Transformer
[ https://issues.apache.org/jira/browse/SPARK-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8486: - Summary: SIFT/SURF/DAISY Feature Transformer (was: SIFT/SURF Feature Transformer) SIFT/SURF/DAISY Feature Transformer --- Key: SPARK-8486 URL: https://issues.apache.org/jira/browse/SPARK-8486 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Scale invariant feature transform (SIFT) is a scale and rotation invariant method to transform images into matrices describing local features. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as an org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Array[Numeric]] of the SIFT features for the provided image. The implementation should support computation of SIFT at predefined interest points, every kth pixel, and densely (over all pixels). Furthermore, the implementation should support various approximations to the Laplacian of Gaussian. In addition to the Difference of Gaussian approximation (as described by Lowe), we should support * SURF approximation using box filters (Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf) * DAISY -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-8494: -- Description: I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes but has a SPARK-1923 Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} was: I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes but has a Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3 --- Key: SPARK-8494 URL: https://issues.apache.org/jira/browse/SPARK-8494 Project: Spark Issue Type: Bug Components: Spark Core Reporter: PJ Fanning Assignee: Patrick Wendell I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. 
If I remove the spray 1.3.3 jars, the test case passes but has a SPARK-1923 Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native
[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-8494: -- Description: I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes but has a Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} was: I just wanted to document this for posterity. I had an issue when running a Spark 1.0 app locally with sbt. The issue was that if you both: 1. Reference a scala class (e.g. None) inside of a closure. 2. Run your program with 'sbt run' It throws an exception. Upgrading the scalaVersion to 2.10.4 in sbt solved this issue. Somehow scala classes were not being loaded correctly inside of the executors: Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3 --- Key: SPARK-8494 URL: https://issues.apache.org/jira/browse/SPARK-8494 Project: Spark Issue Type: Bug Components: Spark Core Reporter: PJ Fanning Assignee: Patrick Wendell I found a similar issue to SPARK-1923 but with Scala 2.10.4. 
I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes but has a Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354)
[jira] [Commented] (SPARK-6749) Make metastore client robust to underlying socket connection loss
[ https://issues.apache.org/jira/browse/SPARK-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594145#comment-14594145 ] Apache Spark commented on SPARK-6749: - User 'ericl' has created a pull request for this issue: https://github.com/apache/spark/pull/6912 Make metastore client robust to underlying socket connection loss - Key: SPARK-6749 URL: https://issues.apache.org/jira/browse/SPARK-6749 Project: Spark Issue Type: Improvement Components: SQL Reporter: Yin Huai Priority: Critical Right now, if the metastore gets restarted, we have to restart the driver to get a new connection to the metastore client because the underlying socket connection is gone. We should make the metastore client robust to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
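A sketch of the general pattern (not the actual patch linked above): wrap metastore calls so that a dropped connection triggers a reconnect and a bounded number of retries. The `connect` callback and the exception type below are stand-ins for whatever the Hive client actually surfaces on connection loss:
{code}
def withRetries[T](maxAttempts: Int = 3)(connect: () => Unit)(call: => T): T = {
  var attempt = 0
  var lastError: Throwable = null
  while (attempt < maxAttempts) {
    try {
      return call
    } catch {
      case e: java.net.SocketException =>  // assumption: connection loss surfaces as an IO-level error
        lastError = e
        attempt += 1
        connect()  // re-establish the underlying connection before retrying
    }
  }
  throw lastError
}

// Hypothetical usage: withRetries()(reconnectToMetastore) { client.getTable("db", "tbl") }
{code}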
[jira] [Assigned] (SPARK-6749) Make metastore client robust to underlying socket connection loss
[ https://issues.apache.org/jira/browse/SPARK-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6749: --- Assignee: (was: Apache Spark) Make metastore client robust to underlying socket connection loss - Key: SPARK-6749 URL: https://issues.apache.org/jira/browse/SPARK-6749 Project: Spark Issue Type: Improvement Components: SQL Reporter: Yin Huai Priority: Critical Right now, if the metastore gets restarted, we have to restart the driver to get a new connection to the metastore client because the underlying socket connection is gone. We should make the metastore client robust to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-6749) Make metastore client robust to underlying socket connection loss
[ https://issues.apache.org/jira/browse/SPARK-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-6749: --- Assignee: Apache Spark Make metastore client robust to underlying socket connection loss - Key: SPARK-6749 URL: https://issues.apache.org/jira/browse/SPARK-6749 Project: Spark Issue Type: Improvement Components: SQL Reporter: Yin Huai Assignee: Apache Spark Priority: Critical Right now, if the metastore gets restarted, we have to restart the driver to get a new connection to the metastore client because the underlying socket connection is gone. We should make the metastore client robust to this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8477) Add in operator to DataFrame Column in Python
[ https://issues.apache.org/jira/browse/SPARK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593884#comment-14593884 ] Reynold Xin commented on SPARK-8477: Maybe inSet ? Add in operator to DataFrame Column in Python - Key: SPARK-8477 URL: https://issues.apache.org/jira/browse/SPARK-8477 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Yu Ishikawa -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8484) Add TrainValidationSplit to ml.tuning
[ https://issues.apache.org/jira/browse/SPARK-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593933#comment-14593933 ] Xiangrui Meng commented on SPARK-8484: -- Assigned :) Add TrainValidationSplit to ml.tuning - Key: SPARK-8484 URL: https://issues.apache.org/jira/browse/SPARK-8484 Project: Spark Issue Type: New Feature Components: ML Reporter: Xiangrui Meng Assignee: Martin Zapletal Add TrainValidationSplit for hyper-parameter tuning. It randomly splits the input dataset into train and validation sets and uses an evaluation metric on the validation set to select the best model. It should be similar to CrossValidator, but simpler and less expensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-8470) MissingRequirementError for ScalaReflection on user classes
[ https://issues.apache.org/jira/browse/SPARK-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593954#comment-14593954 ] Andrew Or edited comment on SPARK-8470 at 6/19/15 9:19 PM: --- Closing this as a duplicate. We will add the regression test in SPARK-8489. was (Author: andrewor14): Closing this as FIXED. We will add the regression test in SPARK-8489. MissingRequirementError for ScalaReflection on user classes --- Key: SPARK-8470 URL: https://issues.apache.org/jira/browse/SPARK-8470 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Michael Armbrust Assignee: Andrew Or Priority: Blocker Fix For: 1.4.1, 1.5.0 From the mailing list: {code} Since upgrading to Spark 1.4, I'm getting a scala.reflect.internal.MissingRequirementError when creating a DataFrame from an RDD. The error references a case class in the application (the RDD's type parameter), which has been verified to be present. Items of note: 1) This is running on AWS EMR (YARN). I do not get this error running locally (standalone). 2) Reverting to Spark 1.3.1 makes the problem go away 3) The jar file containing the referenced class (the app assembly jar) is not listed in the classpath expansion dumped in the error message. I have seen SPARK-5281, and am guessing that this is the root cause, especially since the code added there is involved in the stacktrace. That said, my grasp on scala reflection isn't strong enough to make sense of the change to say for sure. It certainly looks, though, that in this scenario the current thread's context classloader may not be what we think it is (given #3 above). Any ideas? App code: def registerTable[A : Product : TypeTag](name: String, rdd: RDD[A])(implicit hc: HiveContext) = { val df = hc.createDataFrame(rdd) df.registerTempTable(name) } Stack trace: scala.reflect.internal.MissingRequirementError: class comMyClass in JavaMirror with sun.misc.Launcher$AppClassLoader@d16e5d6 of type class sun.misc.Launcher$AppClassLoader with classpath [ lots and lots of paths and jars, but not the app assembly jar] not found at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) at com.ipcoop.spark.sql.SqlEnv$$typecreator1$1.apply(SqlEnv.scala:87) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) at org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:71) at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:59) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:410) {code} Another report: {code} Hi, I use spark 0.14. 
I tried to create dataframe from RDD below, but got scala.reflect.internal.MissingRequirementError val partitionedTestDF2 = pairVarRDD.toDF(column1,column2,column3) //pairVarRDD is RDD[Record4Dim_2], and Record4Dim_2 is a Case Class How can I fix this? Exception in thread main scala.reflect.internal.MissingRequirementError: class etl.Record4Dim_2 in JavaMirror with sun.misc.Launcher$AppClassLoader@30177039 of type class sun.misc.Launcher$AppClassLoader with classpath [file:/local/spark140/conf/,file:/local/spark140/lib/spark-assembly-1.4.0-SNAPSHOT-hadoop2.6.0.jar,file:/local/spark140/lib/datanucleus-core-3.2.10.jar,file:/local/spark140/lib/datanucleus-rdbms-3.2.9.jar,file:/local/spark140/lib/datanucleus-api-jdo-3.2.6.jar,file:/etc/hadoop/conf/] and parent being sun.misc.Launcher$ExtClassLoader@52c8c6d9 of type class sun.misc.Launcher$ExtClassLoader with classpath [file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunec.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunjce_provider.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunpkcs11.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/zipfs.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/localedata.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/dnsns.jar] and parent being primordial
[jira] [Updated] (SPARK-8491) DAISY Feature Transformer
[ https://issues.apache.org/jira/browse/SPARK-8491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8491: - Description: DAISY (Tola et al, PAMI 2010, http://infoscience.epfl.ch/record/138785/files/tola_daisy_pami_1.pdf) is another local image descriptor utilizing histograms of local orientation similar to SIFT. However, one key difference is that the weighted sum of gradient norms used in SIFT's orientation assignment is replaced by convolution with Gaussian kernels. This provides a significant speedup in computing dense descriptors. We can implement DAISY in Spark ML pipelines as an org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the DAISY transformer should output an Array[Array[Numeric]] of the DAISY features for the provided image. The convolution operation can leverage GPU parallelism for efficiency. A C++/MATLAB reference implementation is available at http://cvlab.epfl.ch/software/daisy. was: DAISY is another local image descriptor utilizing histograms of local orientation similar to SIFT. However, one key difference is that the weighted sum of gradient norms used in SIFT's orientation assignment is replaced by convolution with Gaussian kernels. This provides a significant speedup in computing dense descriptors. We can implement DAISY in Spark ML pipelines as an org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the DAISY transformer should output an Array[Array[Numeric]] of the DAISY features for the provided image. The convolution operation can leverage GPU parallelism for efficiency. DAISY Feature Transformer - Key: SPARK-8491 URL: https://issues.apache.org/jira/browse/SPARK-8491 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang DAISY (Tola et al, PAMI 2010, http://infoscience.epfl.ch/record/138785/files/tola_daisy_pami_1.pdf) is another local image descriptor utilizing histograms of local orientation similar to SIFT. However, one key difference is that the weighted sum of gradient norms used in SIFT's orientation assignment is replaced by convolution with Gaussian kernels. This provides a significant speedup in computing dense descriptors. We can implement DAISY in Spark ML pipelines as an org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the DAISY transformer should output an Array[Array[Numeric]] of the DAISY features for the provided image. The convolution operation can leverage GPU parallelism for efficiency. A C++/MATLAB reference implementation is available at http://cvlab.epfl.ch/software/daisy. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-4438) Add HistoryServer RESTful API
[ https://issues.apache.org/jira/browse/SPARK-4438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594092#comment-14594092 ] Jonathan Kelly commented on SPARK-4438: --- This API was added in 1.4.0, right? Should this JIRA be resolved now? Add HistoryServer RESTful API - Key: SPARK-4438 URL: https://issues.apache.org/jira/browse/SPARK-4438 Project: Spark Issue Type: Improvement Components: Deploy Reporter: Gankun Luo Attachments: HistoryServer RESTful API Design Doc.pdf Spark HistoryServer currently only supports keeping track of all completed applications through the web UI and does not provide a RESTful API for external systems to query completed application information. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
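For context, the monitoring REST API that shipped in 1.4 exposes completed applications from the history server under /api/v1; a quick check from Scala (the host and port below are assumptions for a default local history server):
{code}
import scala.io.Source

val historyServer = "http://localhost:18080"  // assumed default history server address
val completedApps = Source.fromURL(s"$historyServer/api/v1/applications?status=completed").mkString
println(completedApps)  // JSON array of application summaries
{code}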
[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-8494: -- Attachment: spark-test-case.zip ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3 --- Key: SPARK-8494 URL: https://issues.apache.org/jira/browse/SPARK-8494 Project: Spark Issue Type: Bug Components: Spark Core Reporter: PJ Fanning Assignee: Patrick Wendell Attachments: spark-test-case.zip I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes but has a ClassNotFoundException otherwise. I have a spark-assembly jar built using Spark 1.3.2-SNAPSHOT. Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster("local[4]").setAppName("Test") val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count sc.stop() } } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} {code} name := "spark-test-case" version := "1.0" scalaVersion := "2.10.4" resolvers += "spray repo" at "http://repo.spray.io" resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases" val akkaVersion = "2.3.11" val sprayVersion = "1.3.3" libraryDependencies ++= Seq( "com.h2database" % "h2" % "1.4.187", "com.typesafe.akka" %% "akka-actor" % akkaVersion, "com.typesafe.akka" %% "akka-slf4j" % akkaVersion, "ch.qos.logback" % "logback-classic" % "1.0.13", "io.spray" %% "spray-can" % sprayVersion, "io.spray" %% "spray-routing" % sprayVersion, "io.spray" %% "spray-json" % "1.3.1", "com.databricks" %% "spark-csv" % "1.0.3", "org.specs2" %% "specs2" % "2.4.17" % "test", "org.specs2" %% "specs2-junit" % "2.4.17" % "test", "io.spray" %% "spray-testkit" % sprayVersion % "test", "com.typesafe.akka" %% "akka-testkit" % akkaVersion % "test", "junit" % "junit" % "4.12" % "test" ) scalacOptions ++= Seq( "-unchecked", "-deprecation", "-Xlint", "-Ywarn-dead-code", "-language:_", "-target:jvm-1.7", "-encoding", "UTF-8" ) testOptions += Tests.Argument(TestFrameworks.JUnit, "-v") {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8332) NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer
[ https://issues.apache.org/jira/browse/SPARK-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593898#comment-14593898 ] Olivier Girardot commented on SPARK-8332: - You're right, sorry. NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer -- Key: SPARK-8332 URL: https://issues.apache.org/jira/browse/SPARK-8332 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.4.0 Environment: spark 1.4 hadoop 2.3.0-cdh5.0.0 Reporter: Tao Li Priority: Critical Labels: 1.4.0, NoSuchMethodError, com.fasterxml.jackson I compiled the new Spark 1.4.0 version. But when I run a simple WordCount demo, it throws a NoSuchMethodError {code} java.lang.NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer {code} I found out that the default fasterxml.jackson.version is 2.4.4. Is there anything wrong or a conflict with the jackson version? Or does some project Maven dependency possibly contain the wrong version of jackson? -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
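One common mitigation for this class of error, assuming the conflict comes from a transitive dependency pulling in a different Jackson release (not necessarily the cause here): pin the Jackson artifacts to a single version in the application's build.sbt so that jackson-module-scala and jackson-databind agree.
{code}
// Hypothetical build.sbt override; the version should match whatever Spark ships with.
dependencyOverrides ++= Set(
  "com.fasterxml.jackson.core"    % "jackson-databind"      % "2.4.4",
  "com.fasterxml.jackson.module" %% "jackson-module-scala"  % "2.4.4"
)
{code}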
[jira] [Comment Edited] (SPARK-8470) MissingRequirementError for ScalaReflection on user classes
[ https://issues.apache.org/jira/browse/SPARK-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593927#comment-14593927 ] Andrew Or edited comment on SPARK-8470 at 6/19/15 8:35 PM: --- FYI, I was able to reproduce this locally. This allowed me to conclude two things: 1. It has nothing to do with YARN specifically. 2. It is caused by some code in the hive module; I could reproduce this only with HiveContext, but not with SQLContext. Small reproduction: {code} bin/spark-submit --master local --class FunTest app.jar {code} Inside app.jar: FunTest.scala {code} object FunTest { def main(args: Array[String]): Unit = { println(Runnin' my cool class) val conf = new SparkConf().setAppName(testing) val sc = new SparkContext(conf) val sqlContext = new HiveContext(sc) val coolClasses = Seq( MyCoolClass(ast, resent, uture), MyCoolClass(mamazing, papazing, fafazing)) val df = sqlContext.createDataFrame(coolClasses) df.collect() } } {code} Inside app.jar: MyCoolClass.scala {code} case class MyCoolClass(past: String, present: String, future: String) {code} Result: {code} Exception in thread main scala.reflect.internal.MissingRequirementError: class MyCoolClass not found. at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) at scala.reflect.internal.Mirrors$RootsBase.ensureClassSymbol(Mirrors.scala:90) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) at FunTest$$typecreator1$1.apply(FunTest.scala:13) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) at org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:71) at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:59) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:426) at FunTest$.main(FunTest.scala:13) at FunTest.main(FunTest.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code} was (Author: andrewor14): FYI, I was able to reproduce this locally. I was able to conclude two things: 1. It has nothing to do with YARN specifically. 2. It is caused by some code in the hive module; I could reproduce this only with HiveContext, but not with SQLContext. 
Small reproduction: {code} bin/spark-submit --master local --class FunTest app.jar {code} Inside app.jar: FunTest.scala {code} object FunTest { def main(args: Array[String]): Unit = { println(Runnin' my cool class) val conf = new SparkConf().setAppName(testing) val sc = new SparkContext(conf) val sqlContext = new HiveContext(sc) val coolClasses = Seq( MyCoolClass(ast, resent, uture), MyCoolClass(mamazing, papazing, fafazing)) val df = sqlContext.createDataFrame(coolClasses) df.collect() } } {code} Inside app.jar: MyCoolClass.scala {code} case class MyCoolClass(past: String, present: String, future: String) {code} Result: {code} Exception in thread main scala.reflect.internal.MissingRequirementError: class MyCoolClass not found. at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) at scala.reflect.internal.Mirrors$RootsBase.ensureClassSymbol(Mirrors.scala:90) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) at FunTest$$typecreator1$1.apply(FunTest.scala:13) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) at
[jira] [Commented] (SPARK-8477) Add in operator to DataFrame Column in Python
[ https://issues.apache.org/jira/browse/SPARK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593928#comment-14593928 ] Davies Liu commented on SPARK-8477: --- [~rxin] [~yuu.ishik...@gmail.com] We already have `inSet` to match the Scala API `in`; we could close this one. Add in operator to DataFrame Column in Python - Key: SPARK-8477 URL: https://issues.apache.org/jira/browse/SPARK-8477 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Yu Ishikawa -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-8470) MissingRequirementError for ScalaReflection on user classes
[ https://issues.apache.org/jira/browse/SPARK-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-8470. Resolution: Fixed Fix Version/s: 1.5.0 1.4.1 MissingRequirementError for ScalaReflection on user classes --- Key: SPARK-8470 URL: https://issues.apache.org/jira/browse/SPARK-8470 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Michael Armbrust Assignee: Andrew Or Priority: Blocker Fix For: 1.4.1, 1.5.0 From the mailing list: {code} Since upgrading to Spark 1.4, I'm getting a scala.reflect.internal.MissingRequirementError when creating a DataFrame from an RDD. The error references a case class in the application (the RDD's type parameter), which has been verified to be present. Items of note: 1) This is running on AWS EMR (YARN). I do not get this error running locally (standalone). 2) Reverting to Spark 1.3.1 makes the problem go away 3) The jar file containing the referenced class (the app assembly jar) is not listed in the classpath expansion dumped in the error message. I have seen SPARK-5281, and am guessing that this is the root cause, especially since the code added there is involved in the stacktrace. That said, my grasp on scala reflection isn't strong enough to make sense of the change to say for sure. It certainly looks, though, that in this scenario the current thread's context classloader may not be what we think it is (given #3 above). Any ideas? App code: def registerTable[A : Product : TypeTag](name: String, rdd: RDD[A])(implicit hc: HiveContext) = { val df = hc.createDataFrame(rdd) df.registerTempTable(name) } Stack trace: scala.reflect.internal.MissingRequirementError: class comMyClass in JavaMirror with sun.misc.Launcher$AppClassLoader@d16e5d6 of type class sun.misc.Launcher$AppClassLoader with classpath [ lots and lots of paths and jars, but not the app assembly jar] not found at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) at com.ipcoop.spark.sql.SqlEnv$$typecreator1$1.apply(SqlEnv.scala:87) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) at org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:71) at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:59) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:410) {code} Another report: {code} Hi, I use spark 0.14. I tried to create dataframe from RDD below, but got scala.reflect.internal.MissingRequirementError val partitionedTestDF2 = pairVarRDD.toDF(column1,column2,column3) //pairVarRDD is RDD[Record4Dim_2], and Record4Dim_2 is a Case Class How can I fix this? 
Exception in thread main scala.reflect.internal.MissingRequirementError: class etl.Record4Dim_2 in JavaMirror with sun.misc.Launcher$AppClassLoader@30177039 of type class sun.misc.Launcher$AppClassLoader with classpath [file:/local/spark140/conf/,file:/local/spark140/lib/spark-assembly-1.4.0-SNAPSHOT-hadoop2.6.0.jar,file:/local/spark140/lib/datanucleus-core-3.2.10.jar,file:/local/spark140/lib/datanucleus-rdbms-3.2.9.jar,file:/local/spark140/lib/datanucleus-api-jdo-3.2.6.jar,file:/etc/hadoop/conf/] and parent being sun.misc.Launcher$ExtClassLoader@52c8c6d9 of type class sun.misc.Launcher$ExtClassLoader with classpath [file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunec.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunjce_provider.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunpkcs11.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/zipfs.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/localedata.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/dnsns.jar] and parent being primordial classloader with boot classpath
[jira] [Commented] (SPARK-8470) MissingRequirementError for ScalaReflection on user classes
[ https://issues.apache.org/jira/browse/SPARK-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593954#comment-14593954 ] Andrew Or commented on SPARK-8470: -- Closing this as FIXED. We will add the regression test in SPARK-8489. MissingRequirementError for ScalaReflection on user classes --- Key: SPARK-8470 URL: https://issues.apache.org/jira/browse/SPARK-8470 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Michael Armbrust Assignee: Andrew Or Priority: Blocker Fix For: 1.4.1, 1.5.0 From the mailing list: {code} Since upgrading to Spark 1.4, I'm getting a scala.reflect.internal.MissingRequirementError when creating a DataFrame from an RDD. The error references a case class in the application (the RDD's type parameter), which has been verified to be present. Items of note: 1) This is running on AWS EMR (YARN). I do not get this error running locally (standalone). 2) Reverting to Spark 1.3.1 makes the problem go away 3) The jar file containing the referenced class (the app assembly jar) is not listed in the classpath expansion dumped in the error message. I have seen SPARK-5281, and am guessing that this is the root cause, especially since the code added there is involved in the stacktrace. That said, my grasp on scala reflection isn't strong enough to make sense of the change to say for sure. It certainly looks, though, that in this scenario the current thread's context classloader may not be what we think it is (given #3 above). Any ideas? App code: def registerTable[A : Product : TypeTag](name: String, rdd: RDD[A])(implicit hc: HiveContext) = { val df = hc.createDataFrame(rdd) df.registerTempTable(name) } Stack trace: scala.reflect.internal.MissingRequirementError: class comMyClass in JavaMirror with sun.misc.Launcher$AppClassLoader@d16e5d6 of type class sun.misc.Launcher$AppClassLoader with classpath [ lots and lots of paths and jars, but not the app assembly jar] not found at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) at com.ipcoop.spark.sql.SqlEnv$$typecreator1$1.apply(SqlEnv.scala:87) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) at org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:71) at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:59) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:410) {code} Another report: {code} Hi, I use spark 0.14. I tried to create dataframe from RDD below, but got scala.reflect.internal.MissingRequirementError val partitionedTestDF2 = pairVarRDD.toDF(column1,column2,column3) //pairVarRDD is RDD[Record4Dim_2], and Record4Dim_2 is a Case Class How can I fix this? 
Exception in thread main scala.reflect.internal.MissingRequirementError: class etl.Record4Dim_2 in JavaMirror with sun.misc.Launcher$AppClassLoader@30177039 of type class sun.misc.Launcher$AppClassLoader with classpath [file:/local/spark140/conf/,file:/local/spark140/lib/spark-assembly-1.4.0-SNAPSHOT-hadoop2.6.0.jar,file:/local/spark140/lib/datanucleus-core-3.2.10.jar,file:/local/spark140/lib/datanucleus-rdbms-3.2.9.jar,file:/local/spark140/lib/datanucleus-api-jdo-3.2.6.jar,file:/etc/hadoop/conf/] and parent being sun.misc.Launcher$ExtClassLoader@52c8c6d9 of type class sun.misc.Launcher$ExtClassLoader with classpath [file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunec.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunjce_provider.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunpkcs11.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/zipfs.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/localedata.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/dnsns.jar] and parent being primordial classloader with boot classpath
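The quoted snippet above lost its string quoting in the archive. A minimal reconstruction of the call with quoting restored is below; the field names and types of Record4Dim_2 are hypothetical stand-ins (the real etl.Record4Dim_2 is not shown), and running locally may not reproduce the classloader problem seen under spark-submit.

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical stand-in for the reporter's class; the real etl.Record4Dim_2 is not shown.
case class Record4Dim_2(a: String, b: Double, c: Double)

object ToDFRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("SPARK-8470-repro"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val pairVarRDD = sc.parallelize(Seq(Record4Dim_2("x", 1.0, 2.0)))
    // The call that fails with MissingRequirementError on 1.4.0 when the application
    // jar is missing from the classpath seen by ScalaReflection:
    val partitionedTestDF2 = pairVarRDD.toDF("column1", "column2", "column3")
    partitionedTestDF2.show()
    sc.stop()
  }
}
{code}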
[jira] [Resolved] (SPARK-8093) Spark 1.4 branch's new JSON schema inference has changed the behavior of handling inner empty JSON object.
[ https://issues.apache.org/jira/browse/SPARK-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-8093. - Resolution: Fixed Fix Version/s: 1.4.1 Issue resolved by pull request 6799 [https://github.com/apache/spark/pull/6799] Spark 1.4 branch's new JSON schema inference has changed the behavior of handling inner empty JSON object. -- Key: SPARK-8093 URL: https://issues.apache.org/jira/browse/SPARK-8093 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Harish Butani Assignee: Nathan Howell Priority: Critical Fix For: 1.4.1 Attachments: t1.json This is similar to SPARK-3365. Sample json is attached. Code to reproduce {code} var jsonDF = read.json(/tmp/t1.json) jsonDF.write.parquet(/tmp/t1.parquet) {code} The 'integration' object is empty in the json. StackTrace: {code} Caused by: java.io.IOException: Could not read footer: java.lang.IllegalStateException: Cannot build an empty group at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:238) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:369) at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache$lzycompute(newParquet.scala:154) at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache(newParquet.scala:152) at org.apache.spark.sql.parquet.ParquetRelation2.refresh(newParquet.scala:197) at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:134) ... 69 more Caused by: java.lang.IllegalStateException: Cannot build an empty group {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
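The reproduction in the report is shell shorthand; a self-contained sketch is below. It assumes /tmp/t1.json contains records with an empty inner "integration" object (the attached t1.json is not reproduced here).

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Assumes /tmp/t1.json contains records like {"a": 1, "integration": {}}.
object EmptyInnerObjectRepro {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("SPARK-8093"))
    val sqlContext = new SQLContext(sc)

    val jsonDF = sqlContext.read.json("/tmp/t1.json")
    jsonDF.printSchema()  // on 1.4.0 the empty 'integration' struct appears in the inferred schema
    // Writing that schema to Parquet fails with "Cannot build an empty group"
    jsonDF.write.parquet("/tmp/t1.parquet")
    sc.stop()
  }
}
{code}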
[jira] [Created] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
PJ Fanning created SPARK-8494: - Summary: ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3 Key: SPARK-8494 URL: https://issues.apache.org/jira/browse/SPARK-8494 Project: Spark Issue Type: Bug Components: Spark Core Reporter: PJ Fanning Assignee: Patrick Wendell I just wanted to document this for posterity. I had an issue when running a Spark 1.0 app locally with sbt. The issue was that if you both: 1. Reference a scala class (e.g. None) inside of a closure. 2. Run your program with 'sbt run' It throws an exception. Upgrading the scalaVersion to 2.10.4 in sbt solved this issue. Somehow scala classes were not being loaded correctly inside of the executors: Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
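The Test application in the description lost its string quoting, the arrow in the map closure, and a closing brace in the archive. A compilable version under those assumptions:

{code}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object Test {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[4]").setAppName("Test")
    val sc = new SparkContext(conf)
    // Referencing a Scala library class (Some/None) inside the closure is what
    // surfaces the ClassNotFoundException under 'sbt run' in the report above.
    sc.makeRDD(1 to 1000, 10).map(x => Some(x)).count()
    sc.stop()
  }
}
{code}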
[jira] [Updated] (SPARK-8486) SIFT/SURF Feature Transformer
[ https://issues.apache.org/jira/browse/SPARK-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8486: - Description: Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an ArrayArray[[Numeric]] of the SIFT features for the provided image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (traditional SIFT) as described by Lowe can be even further improved using box filters (aka SURF, see Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). was: Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Numeric] of the SIFT features present in the image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (traditional SIFT) as described by Lowe can be even further improved using box filters (aka SURF, see Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). SIFT/SURF Feature Transformer - Key: SPARK-8486 URL: https://issues.apache.org/jira/browse/SPARK-8486 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an ArrayArray[[Numeric]] of the SIFT features for the provided image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (traditional SIFT) as described by Lowe can be even further improved using box filters (aka SURF, see Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
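The proposed transformer maps one image to a collection of local descriptors. A rough skeleton against the ml pipeline API is below; SIFTTransformer and computeSift are hypothetical names, the descriptor computation is stubbed out, and depending on the Spark version a copy(extra: ParamMap) override may also be required.

{code}
import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{ArrayType, DataType, DoubleType}

// Hypothetical sketch, not an existing Spark class. Input: an image as rows of pixel
// values; output: one descriptor (e.g. a 128-d orientation histogram) per keypoint.
class SIFTTransformer(override val uid: String)
  extends UnaryTransformer[Seq[Seq[Double]], Seq[Seq[Double]], SIFTTransformer] {

  def this() = this(Identifiable.randomUID("sift"))

  override protected def createTransformFunc: Seq[Seq[Double]] => Seq[Seq[Double]] =
    image => computeSift(image)

  override protected def outputDataType: DataType =
    ArrayType(ArrayType(DoubleType, containsNull = false), containsNull = false)

  // Placeholder: a real implementation would build the Gaussian scale space, find
  // extrema in the Difference-of-Gaussian pyramid, and emit orientation histograms.
  private def computeSift(image: Seq[Seq[Double]]): Seq[Seq[Double]] = Seq.empty
}
{code}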
[jira] [Commented] (SPARK-8420) Inconsistent behavior with Dataframe Timestamp between 1.3.1 and 1.4.0
[ https://issues.apache.org/jira/browse/SPARK-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593915#comment-14593915 ] Michael Armbrust commented on SPARK-8420: - This is no longer true as we now special case equality. Inconsistent behavior with Dataframe Timestamp between 1.3.1 and 1.4.0 -- Key: SPARK-8420 URL: https://issues.apache.org/jira/browse/SPARK-8420 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Justin Yip Assignee: Michael Armbrust Priority: Blocker Labels: releasenotes I am trying out 1.4.0 and notice there are some differences in behavior with Timestamp between 1.3.1 and 1.4.0. In 1.3.1, I can compare a Timestamp with string. {code} scala val df = sqlContext.createDataFrame(Seq((1, Timestamp.valueOf(2015-01-01 00:00:00)), (2, Timestamp.valueOf(2014-01-01 00:00:00 ... scala df.filter($_2 = 2014-06-01).show ... _1 _2 2 2014-01-01 00:00:... {code} However, in 1.4.0, the filter is always false: {code} scala val df = sqlContext.createDataFrame(Seq((1, Timestamp.valueOf(2015-01-01 00:00:00)), (2, Timestamp.valueOf(2014-01-01 00:00:00 df: org.apache.spark.sql.DataFrame = [_1: int, _2: timestamp] scala df.filter($_2 = 2014-06-01).show +--+--+ |_1|_2| +--+--+ +--+--+ {code} Not sure if that is intended, but I cannot find any doc mentioning these inconsistencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
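Whatever the resolution on implicit coercion, an explicit cast keeps the comparison unambiguous on both 1.3 and 1.4. A sketch for the shell, assuming the reporter's sqlContext and data; the exact comparison operator was lost in the archived snippet, so <= is used here only for illustration:

{code}
import java.sql.Timestamp
import org.apache.spark.sql.functions.lit

val df = sqlContext.createDataFrame(Seq(
  (1, Timestamp.valueOf("2015-01-01 00:00:00")),
  (2, Timestamp.valueOf("2014-01-01 00:00:00"))))

// Cast the string literal explicitly instead of relying on string/timestamp coercion,
// whose behavior differs between 1.3.1 and 1.4.0.
df.filter(df("_2") <= lit("2014-06-01 00:00:00").cast("timestamp")).show()
{code}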
[jira] [Comment Edited] (SPARK-8477) Add in operator to DataFrame Column in Python
[ https://issues.apache.org/jira/browse/SPARK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593895#comment-14593895 ] Yu Ishikawa edited comment on SPARK-8477 at 6/19/15 8:27 PM: - Oh, I'm sorry. I didn't know we have already had {{inSet}}. It seems that the function of {{inSet}} is almost like that of {{in}}. was (Author: yuu.ishik...@gmail.com): Oh, I'm sorry. I didn't know we have already have {{inSet}}. It seems that the function of {{inSet}} is almost like that of {{in}}. Add in operator to DataFrame Column in Python - Key: SPARK-8477 URL: https://issues.apache.org/jira/browse/SPARK-8477 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Yu Ishikawa -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8093) Spark 1.4 branch's new JSON schema inference has changed the behavior of handling inner empty JSON object.
[ https://issues.apache.org/jira/browse/SPARK-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594097#comment-14594097 ] Yin Huai commented on SPARK-8093: - With https://github.com/apache/spark/pull/6799, we have changed the behavior back to Spark 1.3's behavior. Empty inner structs will not be in the schema. Spark 1.4 branch's new JSON schema inference has changed the behavior of handling inner empty JSON object. -- Key: SPARK-8093 URL: https://issues.apache.org/jira/browse/SPARK-8093 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Harish Butani Assignee: Nathan Howell Priority: Critical Fix For: 1.4.1, 1.5.0 Attachments: t1.json This is similar to SPARK-3365. Sample json is attached. Code to reproduce {code} var jsonDF = read.json(/tmp/t1.json) jsonDF.write.parquet(/tmp/t1.parquet) {code} The 'integration' object is empty in the json. StackTrace: {code} Caused by: java.io.IOException: Could not read footer: java.lang.IllegalStateException: Cannot build an empty group at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:238) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:369) at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache$lzycompute(newParquet.scala:154) at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache(newParquet.scala:152) at org.apache.spark.sql.parquet.ParquetRelation2.refresh(newParquet.scala:197) at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:134) ... 69 more Caused by: java.lang.IllegalStateException: Cannot build an empty group {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8485) Feature transformers for image processing
Feynman Liang created SPARK-8485: Summary: Feature transformers for image processing Key: SPARK-8485 URL: https://issues.apache.org/jira/browse/SPARK-8485 Project: Spark Issue Type: New Feature Reporter: Feynman Liang Many transformers exist to convert from image representations into more compact descriptors amenable to standard ML techniques. We should implement these transformers in Spark to support machine learning on richer content types. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
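As a sketch of how such image transformers would compose once implemented, here is a hypothetical pipeline built around the SIFTTransformer skeleton from the SPARK-8486 entry above; imagesDF is an assumed DataFrame with an "image" column, and none of the stage or column names are existing Spark classes.

{code}
import org.apache.spark.ml.Pipeline

// All stage and column names here are illustrative.
val sift = new SIFTTransformer().setInputCol("image").setOutputCol("descriptors")

val pipeline = new Pipeline().setStages(Array(sift))
val model = pipeline.fit(imagesDF)          // imagesDF: DataFrame with an "image" column
val described = model.transform(imagesDF)   // adds a "descriptors" column
{code}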
[jira] [Updated] (SPARK-8486) SIFT/SURF Feature Extractor
[ https://issues.apache.org/jira/browse/SPARK-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8486: - Description: Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Numeric] of the SIFT features present in the image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (traditional SIFT) as described by Lowe can be even further improved using box filters (aka SURF, see Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). was: Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a [[org.apache.spark.ml.Transformer]]. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Numeric] of the SIFT features present in the image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (traditional SIFT) as described by Lowe can be even further improved using box filters (aka SURF, see Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). SIFT/SURF Feature Extractor --- Key: SPARK-8486 URL: https://issues.apache.org/jira/browse/SPARK-8486 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Numeric] of the SIFT features present in the image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (traditional SIFT) as described by Lowe can be even further improved using box filters (aka SURF, see Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8470) MissingRequirementError for ScalaReflection on user classes
[ https://issues.apache.org/jira/browse/SPARK-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593927#comment-14593927 ] Andrew Or commented on SPARK-8470: -- FYI, I was able to reproduce this locally. I was able to conclude two things: 1. It has nothing to do with YARN specifically. 2. It is caused by some code in the hive module; I could reproduce this only with HiveContext, but not with SQLContext. Small reproduction: {code} bin/spark-submit --master local --class FunTest app.jar {code} Inside app.jar: FunTest.scala {code} object FunTest { def main(args: Array[String]): Unit = { println(Runnin' my cool class) val conf = new SparkConf().setAppName(testing) val sc = new SparkContext(conf) val sqlContext = new HiveContext(sc) val coolClasses = Seq( MyCoolClass(ast, resent, uture), MyCoolClass(mamazing, papazing, fafazing)) val df = sqlContext.createDataFrame(coolClasses) df.collect() } } {code} Inside app.jar: MyCoolClass.scala {code} case class MyCoolClass(past: String, present: String, future: String) {code} Result: {code} Exception in thread main scala.reflect.internal.MissingRequirementError: class MyCoolClass not found. at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) at scala.reflect.internal.Mirrors$RootsBase.ensureClassSymbol(Mirrors.scala:90) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) at FunTest$$typecreator1$1.apply(FunTest.scala:13) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) at org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:71) at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:59) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:426) at FunTest$.main(FunTest.scala:13) at FunTest.main(FunTest.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code} MissingRequirementError for ScalaReflection on user classes --- Key: SPARK-8470 URL: https://issues.apache.org/jira/browse/SPARK-8470 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Michael Armbrust Assignee: Andrew Or Priority: Blocker From the mailing list: {code} Since upgrading to Spark 1.4, I'm getting a scala.reflect.internal.MissingRequirementError when creating a DataFrame from an RDD. The error references a case class in the application (the RDD's type parameter), which has been verified to be present. Items of note: 1) This is running on AWS EMR (YARN). I do not get this error running locally (standalone). 
2) Reverting to Spark 1.3.1 makes the problem go away 3) The jar file containing the referenced class (the app assembly jar) is not listed in the classpath expansion dumped in the error message. I have seen SPARK-5281, and am guessing that this is the root cause, especially since the code added there is involved in the stacktrace. That said, my grasp on scala reflection isn't strong enough to make sense of the change to say for sure. It certainly looks, though, that in this scenario the current thread's context classloader may not be what we think it is (given #3 above). Any ideas? App code: def registerTable[A : Product : TypeTag](name: String, rdd: RDD[A])(implicit hc: HiveContext) = { val df = hc.createDataFrame(rdd) df.registerTempTable(name) } Stack trace: scala.reflect.internal.MissingRequirementError: class comMyClass in JavaMirror with sun.misc.Launcher$AppClassLoader@d16e5d6 of type class sun.misc.Launcher$AppClassLoader with
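The reproduction above lost its string quoting in the archive, and in the original it is split across FunTest.scala and MyCoolClass.scala inside app.jar, run with bin/spark-submit --master local --class FunTest app.jar. A compilable single-file reconstruction under those assumptions; the three string values passed to the first MyCoolClass are guesses, since they were mangled in the archive:

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

case class MyCoolClass(past: String, present: String, future: String)

object FunTest {
  def main(args: Array[String]): Unit = {
    println("Runnin' my cool class")
    val conf = new SparkConf().setAppName("testing")
    val sc = new SparkContext(conf)
    val sqlContext = new HiveContext(sc)
    // Creating a DataFrame from user case-class instances is what trips the
    // MissingRequirementError when the reflection classloader does not see app.jar.
    val coolClasses = Seq(
      MyCoolClass("past", "present", "future"),
      MyCoolClass("mamazing", "papazing", "fafazing"))
    val df = sqlContext.createDataFrame(coolClasses)
    df.collect()
  }
}
{code}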
[jira] [Created] (SPARK-8487) Update reduceByKeyAndWindow docs to highlight that filtering Function must be used
Tathagata Das created SPARK-8487: Summary: Update reduceByKeyAndWindow docs to highlight that filtering Function must be used Key: SPARK-8487 URL: https://issues.apache.org/jira/browse/SPARK-8487 Project: Spark Issue Type: Improvement Components: Documentation Reporter: Tathagata Das Assignee: Tathagata Das Priority: Critical -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
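For reference, the variant of reduceByKeyAndWindow the doc update targets is the one taking an inverse reduce function; without a filter function, keys never leave the state even after their windowed value decays to zero. A minimal sketch with an assumed source, durations, and checkpoint path:

{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(
  new SparkConf().setMaster("local[2]").setAppName("windowed-counts"), Seconds(1))
ssc.checkpoint("/tmp/windowed-counts-checkpoint")  // required by the inverse-reduce variant

val counts = ssc.socketTextStream("localhost", 9999)
  .flatMap(_.split(" "))
  .map(word => (word, 1))
  .reduceByKeyAndWindow(
    (a: Int, b: Int) => a + b,                     // add counts entering the window
    (a: Int, b: Int) => a - b,                     // subtract counts leaving the window
    Seconds(30), Seconds(10),
    numPartitions = 2,
    filterFunc = { case (_, count) => count > 0 }  // drop keys whose count fell to zero
  )

counts.print()
ssc.start()
ssc.awaitTermination()
{code}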
[jira] [Closed] (SPARK-8470) MissingRequirementError for ScalaReflection on user classes
[ https://issues.apache.org/jira/browse/SPARK-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or closed SPARK-8470. Resolution: Duplicate MissingRequirementError for ScalaReflection on user classes --- Key: SPARK-8470 URL: https://issues.apache.org/jira/browse/SPARK-8470 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Michael Armbrust Assignee: Andrew Or Priority: Blocker Fix For: 1.4.1, 1.5.0 From the mailing list: {code} Since upgrading to Spark 1.4, I'm getting a scala.reflect.internal.MissingRequirementError when creating a DataFrame from an RDD. The error references a case class in the application (the RDD's type parameter), which has been verified to be present. Items of note: 1) This is running on AWS EMR (YARN). I do not get this error running locally (standalone). 2) Reverting to Spark 1.3.1 makes the problem go away 3) The jar file containing the referenced class (the app assembly jar) is not listed in the classpath expansion dumped in the error message. I have seen SPARK-5281, and am guessing that this is the root cause, especially since the code added there is involved in the stacktrace. That said, my grasp on scala reflection isn't strong enough to make sense of the change to say for sure. It certainly looks, though, that in this scenario the current thread's context classloader may not be what we think it is (given #3 above). Any ideas? App code: def registerTable[A : Product : TypeTag](name: String, rdd: RDD[A])(implicit hc: HiveContext) = { val df = hc.createDataFrame(rdd) df.registerTempTable(name) } Stack trace: scala.reflect.internal.MissingRequirementError: class comMyClass in JavaMirror with sun.misc.Launcher$AppClassLoader@d16e5d6 of type class sun.misc.Launcher$AppClassLoader with classpath [ lots and lots of paths and jars, but not the app assembly jar] not found at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) at com.ipcoop.spark.sql.SqlEnv$$typecreator1$1.apply(SqlEnv.scala:87) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) at org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:71) at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:59) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:410) {code} Another report: {code} Hi, I use spark 0.14. I tried to create dataframe from RDD below, but got scala.reflect.internal.MissingRequirementError val partitionedTestDF2 = pairVarRDD.toDF(column1,column2,column3) //pairVarRDD is RDD[Record4Dim_2], and Record4Dim_2 is a Case Class How can I fix this? 
Exception in thread main scala.reflect.internal.MissingRequirementError: class etl.Record4Dim_2 in JavaMirror with sun.misc.Launcher$AppClassLoader@30177039 of type class sun.misc.Launcher$AppClassLoader with classpath [file:/local/spark140/conf/,file:/local/spark140/lib/spark-assembly-1.4.0-SNAPSHOT-hadoop2.6.0.jar,file:/local/spark140/lib/datanucleus-core-3.2.10.jar,file:/local/spark140/lib/datanucleus-rdbms-3.2.9.jar,file:/local/spark140/lib/datanucleus-api-jdo-3.2.6.jar,file:/etc/hadoop/conf/] and parent being sun.misc.Launcher$ExtClassLoader@52c8c6d9 of type class sun.misc.Launcher$ExtClassLoader with classpath [file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunec.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunjce_provider.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunpkcs11.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/zipfs.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/localedata.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/dnsns.jar] and parent being primordial classloader with boot classpath
[jira] [Reopened] (SPARK-8470) MissingRequirementError for ScalaReflection on user classes
[ https://issues.apache.org/jira/browse/SPARK-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or reopened SPARK-8470: -- MissingRequirementError for ScalaReflection on user classes --- Key: SPARK-8470 URL: https://issues.apache.org/jira/browse/SPARK-8470 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Michael Armbrust Assignee: Andrew Or Priority: Blocker Fix For: 1.4.1, 1.5.0 From the mailing list: {code} Since upgrading to Spark 1.4, I'm getting a scala.reflect.internal.MissingRequirementError when creating a DataFrame from an RDD. The error references a case class in the application (the RDD's type parameter), which has been verified to be present. Items of note: 1) This is running on AWS EMR (YARN). I do not get this error running locally (standalone). 2) Reverting to Spark 1.3.1 makes the problem go away 3) The jar file containing the referenced class (the app assembly jar) is not listed in the classpath expansion dumped in the error message. I have seen SPARK-5281, and am guessing that this is the root cause, especially since the code added there is involved in the stacktrace. That said, my grasp on scala reflection isn't strong enough to make sense of the change to say for sure. It certainly looks, though, that in this scenario the current thread's context classloader may not be what we think it is (given #3 above). Any ideas? App code: def registerTable[A : Product : TypeTag](name: String, rdd: RDD[A])(implicit hc: HiveContext) = { val df = hc.createDataFrame(rdd) df.registerTempTable(name) } Stack trace: scala.reflect.internal.MissingRequirementError: class comMyClass in JavaMirror with sun.misc.Launcher$AppClassLoader@d16e5d6 of type class sun.misc.Launcher$AppClassLoader with classpath [ lots and lots of paths and jars, but not the app assembly jar] not found at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) at com.ipcoop.spark.sql.SqlEnv$$typecreator1$1.apply(SqlEnv.scala:87) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) at org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:71) at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:59) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:410) {code} Another report: {code} Hi, I use spark 0.14. I tried to create dataframe from RDD below, but got scala.reflect.internal.MissingRequirementError val partitionedTestDF2 = pairVarRDD.toDF(column1,column2,column3) //pairVarRDD is RDD[Record4Dim_2], and Record4Dim_2 is a Case Class How can I fix this? 
Exception in thread main scala.reflect.internal.MissingRequirementError: class etl.Record4Dim_2 in JavaMirror with sun.misc.Launcher$AppClassLoader@30177039 of type class sun.misc.Launcher$AppClassLoader with classpath [file:/local/spark140/conf/,file:/local/spark140/lib/spark-assembly-1.4.0-SNAPSHOT-hadoop2.6.0.jar,file:/local/spark140/lib/datanucleus-core-3.2.10.jar,file:/local/spark140/lib/datanucleus-rdbms-3.2.9.jar,file:/local/spark140/lib/datanucleus-api-jdo-3.2.6.jar,file:/etc/hadoop/conf/] and parent being sun.misc.Launcher$ExtClassLoader@52c8c6d9 of type class sun.misc.Launcher$ExtClassLoader with classpath [file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunec.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunjce_provider.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunpkcs11.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/zipfs.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/localedata.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/dnsns.jar] and parent being primordial classloader with boot classpath
[jira] [Created] (SPARK-8490) SURF Feature Transformer
Feynman Liang created SPARK-8490: Summary: SURF Feature Transformer Key: SPARK-8490 URL: https://issues.apache.org/jira/browse/SPARK-8490 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Speeded up robust features (SURF) (Bay et al, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf) is an image descriptor transform very similar to SIFT (SPARK-8486) but can be computed more efficiently. One key difference is using box filters (Difference of Boxes) to approximate the Laplacian of the Gaussian. We can implement SURF in Spark ML pipelines as an org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SURF transformer should output an Array[Array[Numeric]] of the SURF features for the provided image. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8470) MissingRequirementError for ScalaReflection on user classes
[ https://issues.apache.org/jira/browse/SPARK-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594005#comment-14594005 ] Apache Spark commented on SPARK-8470: - User 'andrewor14' has created a pull request for this issue: https://github.com/apache/spark/pull/6909 MissingRequirementError for ScalaReflection on user classes --- Key: SPARK-8470 URL: https://issues.apache.org/jira/browse/SPARK-8470 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Michael Armbrust Assignee: Andrew Or Priority: Blocker Fix For: 1.4.1, 1.5.0 From the mailing list: {code} Since upgrading to Spark 1.4, I'm getting a scala.reflect.internal.MissingRequirementError when creating a DataFrame from an RDD. The error references a case class in the application (the RDD's type parameter), which has been verified to be present. Items of note: 1) This is running on AWS EMR (YARN). I do not get this error running locally (standalone). 2) Reverting to Spark 1.3.1 makes the problem go away 3) The jar file containing the referenced class (the app assembly jar) is not listed in the classpath expansion dumped in the error message. I have seen SPARK-5281, and am guessing that this is the root cause, especially since the code added there is involved in the stacktrace. That said, my grasp on scala reflection isn't strong enough to make sense of the change to say for sure. It certainly looks, though, that in this scenario the current thread's context classloader may not be what we think it is (given #3 above). Any ideas? App code: def registerTable[A : Product : TypeTag](name: String, rdd: RDD[A])(implicit hc: HiveContext) = { val df = hc.createDataFrame(rdd) df.registerTempTable(name) } Stack trace: scala.reflect.internal.MissingRequirementError: class comMyClass in JavaMirror with sun.misc.Launcher$AppClassLoader@d16e5d6 of type class sun.misc.Launcher$AppClassLoader with classpath [ lots and lots of paths and jars, but not the app assembly jar] not found at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) at com.ipcoop.spark.sql.SqlEnv$$typecreator1$1.apply(SqlEnv.scala:87) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) at org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:71) at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:59) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:410) {code} Another report: {code} Hi, I use spark 0.14. I tried to create dataframe from RDD below, but got scala.reflect.internal.MissingRequirementError val partitionedTestDF2 = pairVarRDD.toDF(column1,column2,column3) //pairVarRDD is RDD[Record4Dim_2], and Record4Dim_2 is a Case Class How can I fix this? 
Exception in thread main scala.reflect.internal.MissingRequirementError: class etl.Record4Dim_2 in JavaMirror with sun.misc.Launcher$AppClassLoader@30177039 of type class sun.misc.Launcher$AppClassLoader with classpath [file:/local/spark140/conf/,file:/local/spark140/lib/spark-assembly-1.4.0-SNAPSHOT-hadoop2.6.0.jar,file:/local/spark140/lib/datanucleus-core-3.2.10.jar,file:/local/spark140/lib/datanucleus-rdbms-3.2.9.jar,file:/local/spark140/lib/datanucleus-api-jdo-3.2.6.jar,file:/etc/hadoop/conf/] and parent being sun.misc.Launcher$ExtClassLoader@52c8c6d9 of type class sun.misc.Launcher$ExtClassLoader with classpath [file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunec.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunjce_provider.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunpkcs11.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/zipfs.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/localedata.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/dnsns.jar] and parent being primordial classloader with boot classpath
[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-8494: -- Description: I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes but has a ClassNotFoundException otherwise. Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} {code} name := spark-test-case version := 1.0 scalaVersion := 2.10.4 resolvers += spray repo at http://repo.spray.io; resolvers += Scalaz Bintray Repo at https://dl.bintray.com/scalaz/releases; val akkaVersion = 2.3.11 val sprayVersion = 1.3.3 libraryDependencies ++= Seq( com.h2database % h2 % 1.4.187, com.typesafe.akka %% akka-actor % akkaVersion, com.typesafe.akka %% akka-slf4j % akkaVersion, ch.qos.logback % logback-classic % 1.0.13, io.spray %% spray-can% sprayVersion, io.spray %% spray-routing% sprayVersion, io.spray %% spray-json % 1.3.1, com.databricks %% spark-csv% 1.0.3, org.specs2 %% specs2 % 2.4.17 % test, org.specs2 %% specs2-junit % 2.4.17 % test, io.spray %% spray-testkit% sprayVersion % test, com.typesafe.akka %% akka-testkit % akkaVersion% test, junit % junit% 4.12 % test ) scalacOptions ++= Seq( -unchecked, -deprecation, -Xlint, -Ywarn-dead-code, -language:_, -target:jvm-1.7, -encoding, UTF-8 ) testOptions += Tests.Argument(TestFrameworks.JUnit, -v) {code} was: I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. 
If I remove the spray 1.3.3 jars, the test case passes but has a SPARK-1923 Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.None$ java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3 --- Key: SPARK-8494 URL: https://issues.apache.org/jira/browse/SPARK-8494 Project: Spark Issue Type: Bug Components: Spark Core Reporter: PJ Fanning
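The build.sbt embedded in the description above lost its string quoting and line breaks in the archive. A reconstruction of the same settings with quoting restored (versions exactly as reported; the Spark dependency itself is provided via the reporter's locally built assembly jar):

{code}
name := "spark-test-case"

version := "1.0"

scalaVersion := "2.10.4"

resolvers += "spray repo" at "http://repo.spray.io"

resolvers += "Scalaz Bintray Repo" at "https://dl.bintray.com/scalaz/releases"

val akkaVersion  = "2.3.11"
val sprayVersion = "1.3.3"

libraryDependencies ++= Seq(
  "com.h2database"    %  "h2"              % "1.4.187",
  "com.typesafe.akka" %% "akka-actor"      % akkaVersion,
  "com.typesafe.akka" %% "akka-slf4j"      % akkaVersion,
  "ch.qos.logback"    %  "logback-classic" % "1.0.13",
  "io.spray"          %% "spray-can"       % sprayVersion,
  "io.spray"          %% "spray-routing"   % sprayVersion,
  "io.spray"          %% "spray-json"      % "1.3.1",
  "com.databricks"    %% "spark-csv"       % "1.0.3",
  "org.specs2"        %% "specs2"          % "2.4.17"     % "test",
  "org.specs2"        %% "specs2-junit"    % "2.4.17"     % "test",
  "io.spray"          %% "spray-testkit"   % sprayVersion % "test",
  "com.typesafe.akka" %% "akka-testkit"    % akkaVersion  % "test",
  "junit"             %  "junit"           % "4.12"       % "test"
)

scalacOptions ++= Seq("-unchecked", "-deprecation", "-Xlint", "-Ywarn-dead-code",
  "-language:_", "-target:jvm-1.7", "-encoding", "UTF-8")

testOptions += Tests.Argument(TestFrameworks.JUnit, "-v")
{code}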
[jira] [Commented] (SPARK-8492) Support BinaryType in UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594131#comment-14594131 ] Apache Spark commented on SPARK-8492: - User 'davies' has created a pull request for this issue: https://github.com/apache/spark/pull/6911 Support BinaryType in UnsafeRow --- Key: SPARK-8492 URL: https://issues.apache.org/jira/browse/SPARK-8492 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu Assignee: Davies Liu -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8492) Support BinaryType in UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8492: --- Assignee: Davies Liu (was: Apache Spark) Support BinaryType in UnsafeRow --- Key: SPARK-8492 URL: https://issues.apache.org/jira/browse/SPARK-8492 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu Assignee: Davies Liu -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8477) Add in operator to DataFrame Column in Python
[ https://issues.apache.org/jira/browse/SPARK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593888#comment-14593888 ] Yu Ishikawa commented on SPARK-8477: Should I rename the upper case {{In}} to {{inSet}}? Add in operator to DataFrame Column in Python - Key: SPARK-8477 URL: https://issues.apache.org/jira/browse/SPARK-8477 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Yu Ishikawa -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8486) SIFT/SURF Feature Extractor
[ https://issues.apache.org/jira/browse/SPARK-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8486: - Description: Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a [[org.apache.spark.ml.Transformer]]. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Numeric] of the SIFT features present in the image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (aka SURF) as described by Lowe can be even further improved using box filters (Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). was: Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a [[org.apache.spark.ml.Transformer]]. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Numeric] of the SIFT features present in the image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian as described by Lowe can be even further improved using box filters (Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). SIFT/SURF Feature Extractor --- Key: SPARK-8486 URL: https://issues.apache.org/jira/browse/SPARK-8486 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a [[org.apache.spark.ml.Transformer]]. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Numeric] of the SIFT features present in the image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (aka SURF) as described by Lowe can be even further improved using box filters (Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8486) SIFT/SURF Feature Extractor
[ https://issues.apache.org/jira/browse/SPARK-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8486: - Summary: SIFT/SURF Feature Extractor (was: SIFT Feature Extractor) SIFT/SURF Feature Extractor --- Key: SPARK-8486 URL: https://issues.apache.org/jira/browse/SPARK-8486 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a [[org.apache.spark.ml.Transformer]]. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Numeric] of the SIFT features present in the image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian as described by Lowe can be even further improved using box filters (Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8486) SIFT/SURF Feature Extractor
[ https://issues.apache.org/jira/browse/SPARK-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8486: - Description: Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a [[org.apache.spark.ml.Transformer]]. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Numeric] of the SIFT features present in the image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (traditional SIFT) as described by Lowe can be even further improved using box filters (aka SURF, see Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). was: Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a [[org.apache.spark.ml.Transformer]]. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Numeric] of the SIFT features present in the image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (aka SURF) as described by Lowe can be even further improved using box filters (Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). SIFT/SURF Feature Extractor --- Key: SPARK-8486 URL: https://issues.apache.org/jira/browse/SPARK-8486 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a [[org.apache.spark.ml.Transformer]]. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Numeric] of the SIFT features present in the image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (traditional SIFT) as described by Lowe can be even further improved using box filters (aka SURF, see Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8486) SIFT/SURF Feature Transformer
[ https://issues.apache.org/jira/browse/SPARK-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8486: - Description: Scale invariant feature transform (SIFT) is a scale and rotation invariant method to transform images into matrices describing local features. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an ArrayArray[[Numeric]] of the SIFT features for the provided image. The implementation should support computation of SIFT at predefined interest points, every kth pixel, and densely (over all pixels). Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (traditional SIFT) as described by Lowe can be even further improved using box filters (aka SURF, see Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). was: Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an ArrayArray[[Numeric]] of the SIFT features for the provided image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (traditional SIFT) as described by Lowe can be even further improved using box filters (aka SURF, see Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). SIFT/SURF Feature Transformer - Key: SPARK-8486 URL: https://issues.apache.org/jira/browse/SPARK-8486 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Scale invariant feature transform (SIFT) is a scale and rotation invariant method to transform images into matrices describing local features. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an ArrayArray[[Numeric]] of the SIFT features for the provided image. The implementation should support computation of SIFT at predefined interest points, every kth pixel, and densely (over all pixels). Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (traditional SIFT) as described by Lowe can be even further improved using box filters (aka SURF, see Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8486) SIFT Feature Transformer
[ https://issues.apache.org/jira/browse/SPARK-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8486: - Summary: SIFT Feature Transformer (was: SIFT/SURF/DAISY Feature Transformer) SIFT Feature Transformer Key: SPARK-8486 URL: https://issues.apache.org/jira/browse/SPARK-8486 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Scale invariant feature transform (SIFT) is a scale and rotation invariant method to transform images into matrices describing local features. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an ArrayArray[[Numeric]] of the SIFT features for the provided image. The implementation should support computation of SIFT at predefined interest points, every kth pixel, and densely (over all pixels). Furthermore, the implementation should support various approximations for approximating the Laplacian of Gaussian using Difference of Gaussian (as described by Lowe). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8486) SIFT/SURF/DAISY Feature Transformer
[ https://issues.apache.org/jira/browse/SPARK-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8486: - Description: Scale invariant feature transform (SIFT) is a scale and rotation invariant method to transform images into matrices describing local features. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an ArrayArray[[Numeric]] of the SIFT features for the provided image. The implementation should support computation of SIFT at predefined interest points, every kth pixel, and densely (over all pixels). Furthermore, the implementation should support various approximations for approximating the Laplacian of Gaussian using Difference of Gaussian (as described by Lowe). was: Scale invariant feature transform (SIFT) is a scale and rotation invariant method to transform images into matrices describing local features. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an ArrayArray[[Numeric]] of the SIFT features for the provided image. The implementation should support computation of SIFT at predefined interest points, every kth pixel, and densely (over all pixels). Furthermore, the implementation should support various approximations for approximating the Laplacian of Gaussian. In addition to approximating using Difference of Gaussian (as described by Lowe), we should support * SURF approximation using box filters (Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf) should also be supported. * DAISY SIFT/SURF/DAISY Feature Transformer --- Key: SPARK-8486 URL: https://issues.apache.org/jira/browse/SPARK-8486 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Scale invariant feature transform (SIFT) is a scale and rotation invariant method to transform images into matrices describing local features. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an ArrayArray[[Numeric]] of the SIFT features for the provided image. The implementation should support computation of SIFT at predefined interest points, every kth pixel, and densely (over all pixels). Furthermore, the implementation should support various approximations for approximating the Laplacian of Gaussian using Difference of Gaussian (as described by Lowe). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-8477) Add in operator to DataFrame Column in Python
[ https://issues.apache.org/jira/browse/SPARK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa closed SPARK-8477. -- Add in operator to DataFrame Column in Python - Key: SPARK-8477 URL: https://issues.apache.org/jira/browse/SPARK-8477 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Yu Ishikawa Fix For: 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8492) Support BinaryType in UnsafeRow
Davies Liu created SPARK-8492: - Summary: Support BinaryType in UnsafeRow Key: SPARK-8492 URL: https://issues.apache.org/jira/browse/SPARK-8492 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu Assignee: Davies Liu -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8491) DAISY Feature Transformer
Feynman Liang created SPARK-8491: Summary: DAISY Feature Transformer Key: SPARK-8491 URL: https://issues.apache.org/jira/browse/SPARK-8491 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang DAISY is another local image descriptor utilizing histograms of local orientation, similar to SIFT. However, one key difference is that the weighted sum of gradient norms used in SIFT's orientation assignment is replaced by convolution with Gaussian kernels. This provides a significant speedup in computing dense descriptors. We can implement DAISY in Spark ML pipelines as an org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the DAISY transformer should output an Array[Array[Numeric]] of the DAISY features for the provided image. The convolution operation can leverage GPU parallelism for efficiency. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
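As a rough illustration of the convolution-based idea described above, the sketch below builds DAISY-style orientation maps (the positive part of the image gradient projected onto each of H directions); in the full descriptor each map would then be convolved with Gaussian kernels of increasing size, which is omitted here. All names and the Array[Array[Double]] image type are assumptions for illustration, not an existing Spark API.
{code}
object DaisyOrientationMaps {
  type Image = Array[Array[Double]]

  // Central-difference gradient at (r, c), clamped at the image borders.
  private def gradient(img: Image, r: Int, c: Int): (Double, Double) = {
    val rows = img.length
    val cols = img(0).length
    val dx = img(r)(math.min(c + 1, cols - 1)) - img(r)(math.max(c - 1, 0))
    val dy = img(math.min(r + 1, rows - 1))(c) - img(math.max(r - 1, 0))(c)
    (dx, dy)
  }

  // One map per orientation o: G_o(r, c) = max(0, gradient projected onto direction o).
  def orientationMaps(img: Image, numOrientations: Int): Array[Image] = {
    Array.tabulate(numOrientations) { o =>
      val theta = 2 * math.Pi * o / numOrientations
      img.indices.map { r =>
        img(0).indices.map { c =>
          val (dx, dy) = gradient(img, r, c)
          math.max(0.0, dx * math.cos(theta) + dy * math.sin(theta))
        }.toArray
      }.toArray
    }
  }
}
{code}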
[jira] [Updated] (SPARK-8093) Spark 1.4 branch's new JSON schema inference has changed the behavior of handling inner empty JSON object.
[ https://issues.apache.org/jira/browse/SPARK-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated SPARK-8093: Fix Version/s: 1.5.0 Spark 1.4 branch's new JSON schema inference has changed the behavior of handling inner empty JSON object. -- Key: SPARK-8093 URL: https://issues.apache.org/jira/browse/SPARK-8093 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Harish Butani Assignee: Nathan Howell Priority: Critical Fix For: 1.4.1, 1.5.0 Attachments: t1.json This is similar to SPARK-3365. Sample json is attached. Code to reproduce {code} var jsonDF = read.json("/tmp/t1.json") jsonDF.write.parquet("/tmp/t1.parquet") {code} The 'integration' object is empty in the json. StackTrace: {code} Caused by: java.io.IOException: Could not read footer: java.lang.IllegalStateException: Cannot build an empty group at parquet.hadoop.ParquetFileReader.readAllFootersInParallel(ParquetFileReader.java:238) at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:369) at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache$lzycompute(newParquet.scala:154) at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$metadataCache(newParquet.scala:152) at org.apache.spark.sql.parquet.ParquetRelation2.refresh(newParquet.scala:197) at org.apache.spark.sql.sources.InsertIntoHadoopFsRelation.insert(commands.scala:134) ... 69 more Caused by: java.lang.IllegalStateException: Cannot build an empty group {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
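Until the inference change is fixed, one possible workaround (an assumption based on the stack trace, not something stated in this ticket) is to drop the empty inner object before writing, since Parquet cannot represent an empty group:
{code}
// Hypothetical workaround sketch: "integration" is the empty inner object mentioned above.
val jsonDF = sqlContext.read.json("/tmp/t1.json")
jsonDF.drop("integration").write.parquet("/tmp/t1.parquet")
{code}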
[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-8494: -- Description: I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes but has a ClassNotFoundException otherwise. I have a spark-assembly jar built using Spark 1.3.2-SNAPSHOT. Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} {code} name := spark-test-case version := 1.0 scalaVersion := 2.10.4 resolvers += spray repo at http://repo.spray.io; resolvers += Scalaz Bintray Repo at https://dl.bintray.com/scalaz/releases; val akkaVersion = 2.3.11 val sprayVersion = 1.3.3 libraryDependencies ++= Seq( com.h2database % h2 % 1.4.187, com.typesafe.akka %% akka-actor % akkaVersion, com.typesafe.akka %% akka-slf4j % akkaVersion, ch.qos.logback % logback-classic % 1.0.13, io.spray %% spray-can% sprayVersion, io.spray %% spray-routing% sprayVersion, io.spray %% spray-json % 1.3.1, com.databricks %% spark-csv% 1.0.3, org.specs2 %% specs2 % 2.4.17 % test, org.specs2 %% specs2-junit % 2.4.17 % test, io.spray %% spray-testkit% sprayVersion % test, com.typesafe.akka %% akka-testkit % akkaVersion% test, junit % junit% 4.12 % test ) scalacOptions ++= Seq( -unchecked, -deprecation, -Xlint, -Ywarn-dead-code, -language:_, -target:jvm-1.7, -encoding, UTF-8 ) testOptions += Tests.Argument(TestFrameworks.JUnit, -v) {code} was: I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes but has a ClassNotFoundException otherwise. 
Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} {code} name := spark-test-case version := 1.0 scalaVersion := 2.10.4 resolvers += spray repo at http://repo.spray.io; resolvers += Scalaz Bintray Repo at https://dl.bintray.com/scalaz/releases; val akkaVersion = 2.3.11 val sprayVersion = 1.3.3 libraryDependencies
[jira] [Commented] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594132#comment-14594132 ] PJ Fanning commented on SPARK-8494: --- [~pwendell] Apologies about the JIRA being assigned to you. I cloned SPARK-1923 and now can't change the Assignee. ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3 --- Key: SPARK-8494 URL: https://issues.apache.org/jira/browse/SPARK-8494 Project: Spark Issue Type: Bug Components: Spark Core Reporter: PJ Fanning Assignee: Patrick Wendell I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes but has a ClassNotFoundException otherwise. Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} {code} name := spark-test-case version := 1.0 scalaVersion := 2.10.4 resolvers += spray repo at http://repo.spray.io; resolvers += Scalaz Bintray Repo at https://dl.bintray.com/scalaz/releases; val akkaVersion = 2.3.11 val sprayVersion = 1.3.3 libraryDependencies ++= Seq( com.h2database % h2 % 1.4.187, com.typesafe.akka %% akka-actor % akkaVersion, com.typesafe.akka %% akka-slf4j % akkaVersion, ch.qos.logback % logback-classic % 1.0.13, io.spray %% spray-can% sprayVersion, io.spray %% spray-routing% sprayVersion, io.spray %% spray-json % 1.3.1, com.databricks %% spark-csv% 1.0.3, org.specs2 %% specs2 % 2.4.17 % test, org.specs2 %% specs2-junit % 2.4.17 % test, io.spray %% spray-testkit% sprayVersion % test, com.typesafe.akka %% akka-testkit % akkaVersion% test, junit % junit% 4.12 % test ) scalacOptions ++= Seq( -unchecked, -deprecation, -Xlint, -Ywarn-dead-code, -language:_, -target:jvm-1.7, -encoding, UTF-8 ) testOptions += Tests.Argument(TestFrameworks.JUnit, -v) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
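A workaround that is often suggested for this kind of ClassNotFoundException when launching Spark applications from sbt (an assumption here, not something confirmed in this ticket) is to run the application in a forked JVM so that sbt's own classloader is not involved:
{code}
// build.sbt (sketch)
fork := true
// or, to fork only when running the application:
fork in run := true
{code}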
[jira] [Assigned] (SPARK-8492) Support BinaryType in UnsafeRow
[ https://issues.apache.org/jira/browse/SPARK-8492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8492: --- Assignee: Apache Spark (was: Davies Liu) Support BinaryType in UnsafeRow --- Key: SPARK-8492 URL: https://issues.apache.org/jira/browse/SPARK-8492 Project: Spark Issue Type: Bug Components: SQL Reporter: Davies Liu Assignee: Apache Spark -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-7810) rdd.py _load_from_socket cannot load data from jvm socket if ipv6 is used
[ https://issues.apache.org/jira/browse/SPARK-7810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ai He updated SPARK-7810: - Issue Type: Improvement (was: Bug) rdd.py _load_from_socket cannot load data from jvm socket if ipv6 is used --- Key: SPARK-7810 URL: https://issues.apache.org/jira/browse/SPARK-7810 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 1.3.1 Reporter: Ai He Method _load_from_socket in rdd.py cannot load data from the JVM socket if IPv6 is used. The current method only works well with IPv4. The new modification should work with both protocols. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8486) SIFT/SURF Feature Transformer
[ https://issues.apache.org/jira/browse/SPARK-8486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8486: - Summary: SIFT/SURF Feature Transformer (was: SIFT/SURF Feature Extractor) SIFT/SURF Feature Transformer - Key: SPARK-8486 URL: https://issues.apache.org/jira/browse/SPARK-8486 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Scale invariant feature transform (SIFT) is a method to transform images into dense vectors describing local features which are invariant to scale and rotation. (Lowe, IJCV 2004, http://www.cs.ubc.ca/~lowe/papers/ijcv04.pdf) We can implement SIFT in Spark ML pipelines as a org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SIFT transformer should output an Array[Numeric] of the SIFT features present in the image. Depending on performance, approximating Laplacian of Gaussian by Difference of Gaussian (traditional SIFT) as described by Lowe can be even further improved using box filters (aka SURF, see Bay, ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8477) Add in operator to DataFrame Column in Python
[ https://issues.apache.org/jira/browse/SPARK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593910#comment-14593910 ] Yu Ishikawa commented on SPARK-8477: [~rxin] Should we support not only `inSet` but also the so-called `in` operator in Python? I think it is just an alias of {{inSet}}. https://github.com/apache/spark/blob/master/python%2Fpyspark%2Fsql%2Fcolumn.py#L248 Add in operator to DataFrame Column in Python - Key: SPARK-8477 URL: https://issues.apache.org/jira/browse/SPARK-8477 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Yu Ishikawa -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8484) Add TrainValidationSplit to ml.tuning
[ https://issues.apache.org/jira/browse/SPARK-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-8484: - Assignee: Martin Zapletal Add TrainValidationSplit to ml.tuning - Key: SPARK-8484 URL: https://issues.apache.org/jira/browse/SPARK-8484 Project: Spark Issue Type: New Feature Components: ML Reporter: Xiangrui Meng Assignee: Martin Zapletal Add TrainValidationSplit for hyper-parameter tuning. It randomly splits the input dataset into train and validation sets and uses an evaluation metric on the validation set to select the best model. It should be similar to CrossValidator, but simpler and less expensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
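The selection loop the description refers to can be sketched against the existing 1.4 Pipelines API as follows; the eventual TrainValidationSplit API may look different, and the estimator, parameter grid, and `training` DataFrame (with the usual label/features columns) are assumptions for illustration.
{code}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.tuning.ParamGridBuilder

// training: DataFrame with "label" and "features" columns (assumed to exist)
val lr = new LogisticRegression()
val grid = new ParamGridBuilder().addGrid(lr.regParam, Array(0.01, 0.1)).build()
val evaluator = new BinaryClassificationEvaluator()

// Single random split instead of k folds -- cheaper than CrossValidator.
val Array(train, validation) = training.randomSplit(Array(0.75, 0.25), seed = 11L)
val models = grid.map(paramMap => lr.fit(train, paramMap))
val metrics = models.map(model => evaluator.evaluate(model.transform(validation)))
val bestModel = models(metrics.indexOf(metrics.max))
{code}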
[jira] [Created] (SPARK-8488) HOG Feature Transformer
Feynman Liang created SPARK-8488: Summary: HOG Feature Transformer Key: SPARK-8488 URL: https://issues.apache.org/jira/browse/SPARK-8488 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Histogram of oriented gradients (HOG) is a method utilizing local orientation (gradients and edges) to transform images into dense image descriptors (Dalal and Triggs, CVPR 2005, http://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf). HOG in Spark ML pipelines can be implemented as an org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the HOG transformer should output an Array[Array[Numeric]] of the HOG features for the provided image. HOG and SIFT are similar in that they both represent images using local orientation histograms. In contrast to SIFT, however, HOG uses overlapping spatial blocks and is computed densely across all pixels. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
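For illustration, the core HOG building block (a magnitude-weighted histogram of gradient orientations for a single cell) can be sketched in plain Scala as below; a full HOG descriptor would tile the image into cells and normalize over overlapping blocks. Names and the Array[Array[Double]] image type are assumptions, not an existing Spark API.
{code}
object HogCell {
  type Image = Array[Array[Double]]

  def cellHistogram(img: Image, rowStart: Int, colStart: Int,
                    cellSize: Int, numBins: Int): Array[Double] = {
    val hist = new Array[Double](numBins)
    val rows = img.length
    val cols = img(0).length
    for {
      r <- rowStart until math.min(rowStart + cellSize, rows)
      c <- colStart until math.min(colStart + cellSize, cols)
    } {
      // Clamped central differences.
      val dx = img(r)(math.min(c + 1, cols - 1)) - img(r)(math.max(c - 1, 0))
      val dy = img(math.min(r + 1, rows - 1))(c) - img(math.max(r - 1, 0))(c)
      val magnitude = math.sqrt(dx * dx + dy * dy)
      // Unsigned orientation in [0, Pi), binned into numBins buckets, weighted by magnitude.
      val angle = (math.atan2(dy, dx) + math.Pi) % math.Pi
      val bin = math.min((angle / math.Pi * numBins).toInt, numBins - 1)
      hist(bin) += magnitude
    }
    hist
  }
}
{code}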
[jira] [Commented] (SPARK-8470) MissingRequirementError for ScalaReflection on user classes
[ https://issues.apache.org/jira/browse/SPARK-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593948#comment-14593948 ] Andrew Or commented on SPARK-8470: -- Update: I verified that this is actually already fixed through https://github.com/apache/spark/pull/6891. It is ultimately caused by the same issue as SPARK-8368! I will add a regression test shortly. MissingRequirementError for ScalaReflection on user classes --- Key: SPARK-8470 URL: https://issues.apache.org/jira/browse/SPARK-8470 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Michael Armbrust Assignee: Andrew Or Priority: Blocker From the mailing list: {code} Since upgrading to Spark 1.4, I'm getting a scala.reflect.internal.MissingRequirementError when creating a DataFrame from an RDD. The error references a case class in the application (the RDD's type parameter), which has been verified to be present. Items of note: 1) This is running on AWS EMR (YARN). I do not get this error running locally (standalone). 2) Reverting to Spark 1.3.1 makes the problem go away 3) The jar file containing the referenced class (the app assembly jar) is not listed in the classpath expansion dumped in the error message. I have seen SPARK-5281, and am guessing that this is the root cause, especially since the code added there is involved in the stacktrace. That said, my grasp on scala reflection isn't strong enough to make sense of the change to say for sure. It certainly looks, though, that in this scenario the current thread's context classloader may not be what we think it is (given #3 above). Any ideas? App code: def registerTable[A : Product : TypeTag](name: String, rdd: RDD[A])(implicit hc: HiveContext) = { val df = hc.createDataFrame(rdd) df.registerTempTable(name) } Stack trace: scala.reflect.internal.MissingRequirementError: class comMyClass in JavaMirror with sun.misc.Launcher$AppClassLoader@d16e5d6 of type class sun.misc.Launcher$AppClassLoader with classpath [ lots and lots of paths and jars, but not the app assembly jar] not found at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48) at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61) at scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) at com.ipcoop.spark.sql.SqlEnv$$typecreator1$1.apply(SqlEnv.scala:87) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) at org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:71) at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:59) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:410) {code} Another report: {code} Hi, I use spark 0.14. I tried to create dataframe from RDD below, but got scala.reflect.internal.MissingRequirementError val partitionedTestDF2 = pairVarRDD.toDF(column1,column2,column3) //pairVarRDD is RDD[Record4Dim_2], and Record4Dim_2 is a Case Class How can I fix this? 
Exception in thread main scala.reflect.internal.MissingRequirementError: class etl.Record4Dim_2 in JavaMirror with sun.misc.Launcher$AppClassLoader@30177039 of type class sun.misc.Launcher$AppClassLoader with classpath [file:/local/spark140/conf/,file:/local/spark140/lib/spark-assembly-1.4.0-SNAPSHOT-hadoop2.6.0.jar,file:/local/spark140/lib/datanucleus-core-3.2.10.jar,file:/local/spark140/lib/datanucleus-rdbms-3.2.9.jar,file:/local/spark140/lib/datanucleus-api-jdo-3.2.6.jar,file:/etc/hadoop/conf/] and parent being sun.misc.Launcher$ExtClassLoader@52c8c6d9 of type class sun.misc.Launcher$ExtClassLoader with classpath [file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunec.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunjce_provider.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/sunpkcs11.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/zipfs.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/localedata.jar,file:/usr/jdk64/jdk1.7.0_67/jre/lib/ext/dnsns.jar] and parent being primordial classloader with boot classpath
[jira] [Updated] (SPARK-8420) Inconsistent behavior with Dataframe Timestamp between 1.3.1 and 1.4.0
[ https://issues.apache.org/jira/browse/SPARK-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-8420: Shepherd: Yin Huai Inconsistent behavior with Dataframe Timestamp between 1.3.1 and 1.4.0 -- Key: SPARK-8420 URL: https://issues.apache.org/jira/browse/SPARK-8420 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Justin Yip Assignee: Michael Armbrust Priority: Blocker Labels: releasenotes I am trying out 1.4.0 and notice there are some differences in behavior with Timestamp between 1.3.1 and 1.4.0. In 1.3.1, I can compare a Timestamp with a string. {code} scala> val df = sqlContext.createDataFrame(Seq((1, Timestamp.valueOf("2015-01-01 00:00:00")), (2, Timestamp.valueOf("2014-01-01 00:00:00")))) ... scala> df.filter($"_2" <= "2014-06-01").show ... _1 _2 2 2014-01-01 00:00:... {code} However, in 1.4.0, the filter is always false: {code} scala> val df = sqlContext.createDataFrame(Seq((1, Timestamp.valueOf("2015-01-01 00:00:00")), (2, Timestamp.valueOf("2014-01-01 00:00:00")))) df: org.apache.spark.sql.DataFrame = [_1: int, _2: timestamp] scala> df.filter($"_2" <= "2014-06-01").show +--+--+ |_1|_2| +--+--+ +--+--+ {code} Not sure if that is intended, but I cannot find any doc mentioning these inconsistencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
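A workaround that keeps the comparison unambiguous in both versions (a sketch, not the fix being merged for this ticket) is to cast the string explicitly, continuing the spark-shell session above:
{code}
import org.apache.spark.sql.functions.lit

df.filter($"_2" <= lit("2014-06-01").cast("timestamp")).show()
{code}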
[jira] [Updated] (SPARK-8477) Add in operator to DataFrame Column in Python
[ https://issues.apache.org/jira/browse/SPARK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-8477: -- Fix Version/s: 1.3.0 Add in operator to DataFrame Column in Python - Key: SPARK-8477 URL: https://issues.apache.org/jira/browse/SPARK-8477 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Yu Ishikawa Fix For: 1.3.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-8470) MissingRequirementError for ScalaReflection on user classes
[ https://issues.apache.org/jira/browse/SPARK-8470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593927#comment-14593927 ] Andrew Or edited comment on SPARK-8470 at 6/19/15 8:36 PM: --- FYI, I was able to reproduce this locally. This allowed me to conclude two things: 1. It has nothing to do with YARN specifically. 2. It is caused by some code in the hive module; I could reproduce this only with HiveContext, but not with SQLContext. Small reproduction: {code} bin/spark-submit --master local --class FunTest app.jar {code} Inside app.jar: FunTest.scala {code} import org.apache.spark.{SparkConf, SparkContext} import org.apache.spark.sql.hive.HiveContext object FunTest { def main(args: Array[String]): Unit = { println(Runnin' my cool class) val conf = new SparkConf().setAppName(testing) val sc = new SparkContext(conf) val sqlContext = new HiveContext(sc) val coolClasses = Seq( MyCoolClass(ast, resent, uture), MyCoolClass(mamazing, papazing, fafazing)) val df = sqlContext.createDataFrame(coolClasses) df.collect() } } {code} Inside app.jar: MyCoolClass.scala {code} case class MyCoolClass(past: String, present: String, future: String) {code} Result: {code} Exception in thread main scala.reflect.internal.MissingRequirementError: class MyCoolClass not found. at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) at scala.reflect.internal.Mirrors$RootsBase.ensureClassSymbol(Mirrors.scala:90) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) at FunTest$$typecreator1$1.apply(FunTest.scala:13) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231) at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231) at org.apache.spark.sql.catalyst.ScalaReflection$class.localTypeOf(ScalaReflection.scala:71) at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:59) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:426) at FunTest$.main(FunTest.scala:13) at FunTest.main(FunTest.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {code} was (Author: andrewor14): FYI, I was able to reproduce this locally. This allowed me to conclude two things: 1. It has nothing to do with YARN specifically. 2. It is caused by some code in the hive module; I could reproduce this only with HiveContext, but not with SQLContext. 
Small reproduction: {code} bin/spark-submit --master local --class FunTest app.jar {code} Inside app.jar: FunTest.scala {code} object FunTest { def main(args: Array[String]): Unit = { println(Runnin' my cool class) val conf = new SparkConf().setAppName(testing) val sc = new SparkContext(conf) val sqlContext = new HiveContext(sc) val coolClasses = Seq( MyCoolClass(ast, resent, uture), MyCoolClass(mamazing, papazing, fafazing)) val df = sqlContext.createDataFrame(coolClasses) df.collect() } } {code} Inside app.jar: MyCoolClass.scala {code} case class MyCoolClass(past: String, present: String, future: String) {code} Result: {code} Exception in thread main scala.reflect.internal.MissingRequirementError: class MyCoolClass not found. at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16) at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17) at scala.reflect.internal.Mirrors$RootsBase.ensureClassSymbol(Mirrors.scala:90) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119) at scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21) at FunTest$$typecreator1$1.apply(FunTest.scala:13) at
[jira] [Resolved] (SPARK-8477) Add in operator to DataFrame Column in Python
[ https://issues.apache.org/jira/browse/SPARK-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-8477. --- Resolution: Implemented Target Version/s: 1.3.0 (was: 1.5.0) Add in operator to DataFrame Column in Python - Key: SPARK-8477 URL: https://issues.apache.org/jira/browse/SPARK-8477 Project: Spark Issue Type: New Feature Components: PySpark, SQL Reporter: Yu Ishikawa -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8489) Add regression tests for SPARK-8470
Andrew Or created SPARK-8489: Summary: Add regression tests for SPARK-8470 Key: SPARK-8489 URL: https://issues.apache.org/jira/browse/SPARK-8489 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Critical See SPARK-8470 for more detail. Basically the Spark Hive code silently overwrites the context class loader populated in SparkSubmit, resulting in certain classes missing when we do reflection in `SQLContext#createDataFrame`. That issue is already resolved in https://github.com/apache/spark/pull/6891, but we should add a regression test for the specific manifestation of the bug in SPARK-8470. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8485) Feature transformers for image processing
[ https://issues.apache.org/jira/browse/SPARK-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14593990#comment-14593990 ] Sean Owen commented on SPARK-8485: -- I think that before you opened all these JIRAs you should have established whether this is a fit for MLlib. Please read: https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark Some of these are of enough use, maybe, to be included, but they can start in a separate repo. Some I am not sure about myself. Please let's back up before opening more Feature transformers for image processing - Key: SPARK-8485 URL: https://issues.apache.org/jira/browse/SPARK-8485 Project: Spark Issue Type: New Feature Components: ML Reporter: Feynman Liang Many transformers exist to convert from image representations into more compact descriptors amenable to standard ML techniques. We should implement these transformers in Spark to support machine learning on richer content types. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8485) Feature transformers for image processing
[ https://issues.apache.org/jira/browse/SPARK-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8485: - Target Version/s: (was: 1.5.0) Feature transformers for image processing - Key: SPARK-8485 URL: https://issues.apache.org/jira/browse/SPARK-8485 Project: Spark Issue Type: New Feature Components: ML Reporter: Feynman Liang Many transformers exist to convert from image representations into more compact descriptors amenable to standard ML techniques. We should implement these transformers in Spark to support machine learning on richer content types. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8483) Remove commons-lang3 depedency from flume-sink
[ https://issues.apache.org/jira/browse/SPARK-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8483: --- Assignee: (was: Apache Spark) Remove commons-lang3 depedency from flume-sink -- Key: SPARK-8483 URL: https://issues.apache.org/jira/browse/SPARK-8483 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.4.0 Reporter: Hari Shreedharan flume-sink module uses only one method from commons-lang3. Since the build would become complex if we create an assembly and would likely make it more difficult for customers, let's just remove the dependency altogether. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8483) Remove commons-lang3 depedency from flume-sink
[ https://issues.apache.org/jira/browse/SPARK-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594058#comment-14594058 ] Apache Spark commented on SPARK-8483: - User 'harishreedharan' has created a pull request for this issue: https://github.com/apache/spark/pull/6910 Remove commons-lang3 depedency from flume-sink -- Key: SPARK-8483 URL: https://issues.apache.org/jira/browse/SPARK-8483 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.4.0 Reporter: Hari Shreedharan flume-sink module uses only one method from commons-lang3. Since the build would become complex if we create an assembly and would likely make it more difficult for customers, let's just remove the dependency altogether. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-8483) Remove commons-lang3 depedency from flume-sink
[ https://issues.apache.org/jira/browse/SPARK-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-8483: --- Assignee: Apache Spark Remove commons-lang3 depedency from flume-sink -- Key: SPARK-8483 URL: https://issues.apache.org/jira/browse/SPARK-8483 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.4.0 Reporter: Hari Shreedharan Assignee: Apache Spark flume-sink module uses only one method from commons-lang3. Since the build would become complex if we create an assembly and would likely make it more difficult for customers, let's just remove the dependency altogether. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8485) Feature transformers for image processing
[ https://issues.apache.org/jira/browse/SPARK-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594122#comment-14594122 ] Joseph K. Bradley commented on SPARK-8485: -- This is something which is going to come up in MLlib now that we have a better interface for feature transformers. I suspect a lot of people will look to Pipelines for existing transformers, including in major applications areas like NLP, vision, and audio. I think some of these are clearly useful (SIFT HOG are the ones I hear most about). For others, it would be good to look to other libraries and see what is most common. My feeling is that it would be nice to have a few such transformers in MLlib itself, but a full-fledged image processing library would belong in an external package for now. My main concerns are: * Interest/need: We should hold off on implementing these to see if the community has sufficient interest. * Data type: If we add image processing, we need to support actual images, including depth (data type) and multiple channels (e.g. RGB). This will be a significant commitment to create a UDT for images, but it would be important to lay the groundwork for further image processing work. Let's leave the JIRAs open for discussion to gather interest, use cases with Spark, and feedback. But people should discuss here before sending PRs. Feature transformers for image processing - Key: SPARK-8485 URL: https://issues.apache.org/jira/browse/SPARK-8485 Project: Spark Issue Type: New Feature Components: ML Reporter: Feynman Liang Many transformers exist to convert from image representations into more compact descriptors amenable to standard ML techniques. We should implement these transformers in Spark to support machine learning on richer content types. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8420) Inconsistent behavior with Dataframe Timestamp between 1.3.1 and 1.4.0
[ https://issues.apache.org/jira/browse/SPARK-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594200#comment-14594200 ] Apache Spark commented on SPARK-8420: - User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/6914 Inconsistent behavior with Dataframe Timestamp between 1.3.1 and 1.4.0 -- Key: SPARK-8420 URL: https://issues.apache.org/jira/browse/SPARK-8420 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Justin Yip Assignee: Michael Armbrust Priority: Blocker Labels: releasenotes I am trying out 1.4.0 and notice there are some differences in behavior with Timestamp between 1.3.1 and 1.4.0. In 1.3.1, I can compare a Timestamp with string. {code} scala val df = sqlContext.createDataFrame(Seq((1, Timestamp.valueOf(2015-01-01 00:00:00)), (2, Timestamp.valueOf(2014-01-01 00:00:00 ... scala df.filter($_2 = 2014-06-01).show ... _1 _2 2 2014-01-01 00:00:... {code} However, in 1.4.0, the filter is always false: {code} scala val df = sqlContext.createDataFrame(Seq((1, Timestamp.valueOf(2015-01-01 00:00:00)), (2, Timestamp.valueOf(2014-01-01 00:00:00 df: org.apache.spark.sql.DataFrame = [_1: int, _2: timestamp] scala df.filter($_2 = 2014-06-01).show +--+--+ |_1|_2| +--+--+ +--+--+ {code} Not sure if that is intended, but I cannot find any doc mentioning these inconsistencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8494) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3
[ https://issues.apache.org/jira/browse/SPARK-8494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-8494: --- Assignee: (was: Patrick Wendell) ClassNotFoundException when running with sbt, scala 2.10.4, spray 1.3.3 --- Key: SPARK-8494 URL: https://issues.apache.org/jira/browse/SPARK-8494 Project: Spark Issue Type: Bug Components: Spark Core Reporter: PJ Fanning Attachments: spark-test-case.zip I found a similar issue to SPARK-1923 but with Scala 2.10.4. I used the Test.scala from SPARK-1923 but used the libraryDependencies from a build.sbt that I am working on. If I remove the spray 1.3.3 jars, the test case passes but has a ClassNotFoundException otherwise. I have a spark-assembly jar built using Spark 1.3.2-SNAPSHOT. Application: {code} import org.apache.spark.SparkConf import org.apache.spark.SparkContext object Test { def main(args: Array[String]): Unit = { val conf = new SparkConf().setMaster(local[4]).setAppName(Test) val sc = new SparkContext(conf) sc.makeRDD(1 to 1000, 10).map(x = Some(x)).count sc.stop() } {code} Exception: {code} org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:1 failed 1 times, most recent failure: Exception failure in TID 1 on host localhost: java.lang.ClassNotFoundException: scala.collection.immutable.Range java.net.URLClassLoader$1.run(URLClassLoader.java:366) java.net.URLClassLoader$1.run(URLClassLoader.java:355) java.security.AccessController.doPrivileged(Native Method) java.net.URLClassLoader.findClass(URLClassLoader.java:354) java.lang.ClassLoader.loadClass(ClassLoader.java:425) java.lang.ClassLoader.loadClass(ClassLoader.java:358) java.lang.Class.forName0(Native Method) java.lang.Class.forName(Class.java:270) org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60) java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612) java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517) {code} {code} name := spark-test-case version := 1.0 scalaVersion := 2.10.4 resolvers += spray repo at http://repo.spray.io; resolvers += Scalaz Bintray Repo at https://dl.bintray.com/scalaz/releases; val akkaVersion = 2.3.11 val sprayVersion = 1.3.3 libraryDependencies ++= Seq( com.h2database % h2 % 1.4.187, com.typesafe.akka %% akka-actor % akkaVersion, com.typesafe.akka %% akka-slf4j % akkaVersion, ch.qos.logback % logback-classic % 1.0.13, io.spray %% spray-can% sprayVersion, io.spray %% spray-routing% sprayVersion, io.spray %% spray-json % 1.3.1, com.databricks %% spark-csv% 1.0.3, org.specs2 %% specs2 % 2.4.17 % test, org.specs2 %% specs2-junit % 2.4.17 % test, io.spray %% spray-testkit% sprayVersion % test, com.typesafe.akka %% akka-testkit % akkaVersion% test, junit % junit% 4.12 % test ) scalacOptions ++= Seq( -unchecked, -deprecation, -Xlint, -Ywarn-dead-code, -language:_, -target:jvm-1.7, -encoding, UTF-8 ) testOptions += Tests.Argument(TestFrameworks.JUnit, -v) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-8389) Expose KafkaRDDs offsetRange in Python
[ https://issues.apache.org/jira/browse/SPARK-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das closed SPARK-8389. Resolution: Duplicate Expose KafkaRDDs offsetRange in Python -- Key: SPARK-8389 URL: https://issues.apache.org/jira/browse/SPARK-8389 Project: Spark Issue Type: New Feature Components: Streaming Affects Versions: 1.4.0 Reporter: Tathagata Das Priority: Critical Probably requires creating a JavaKafkaPairRDD and also use that in the python APIs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8389) Expose KafkaRDDs offsetRange in Python
[ https://issues.apache.org/jira/browse/SPARK-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-8389: - Assignee: (was: Saisai Shao) Expose KafkaRDDs offsetRange in Python -- Key: SPARK-8389 URL: https://issues.apache.org/jira/browse/SPARK-8389 Project: Spark Issue Type: New Feature Components: Streaming Affects Versions: 1.4.0 Reporter: Tathagata Das Priority: Critical Probably requires creating a JavaKafkaPairRDD and also use that in the python APIs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7786) Allow StreamingListener to be specified in SparkConf and loaded when creating StreamingContext
[ https://issues.apache.org/jira/browse/SPARK-7786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594240#comment-14594240 ] Tathagata Das commented on SPARK-7786: -- [~397090770] This functionality can easily be done in user code, without actually losing any events, rather than being driven by a SparkConf setting. The user can very easily pass the name of the class by whatever means (cmdline args, etc.) into the process, and the user can use reflection to instantiate the right listener and attach it to the streaming context before starting it. The reason similar functionality was added for SparkListener is that attaching any listener after the SparkContext has been initialized will not catch all the initial events. So the system needs to attach any listener before any event has been generated, and that is why the SparkConf config was necessary. However, this is not the case for StreamingListener, as there are no events before starting the StreamingContext. So can you elaborate on scenarios where this is absolutely essential? Allow StreamingListener to be specified in SparkConf and loaded when creating StreamingContext -- Key: SPARK-7786 URL: https://issues.apache.org/jira/browse/SPARK-7786 Project: Spark Issue Type: New Feature Components: Streaming Affects Versions: 1.3.1 Reporter: yangping wu Priority: Minor As mentioned in [SPARK-5411|https://issues.apache.org/jira/browse/SPARK-5411], we can also allow users to register a StreamingListener through SparkConf settings, loaded when creating the StreamingContext. This would allow monitoring frameworks to be easily injected into Spark programs without having to modify those programs' code. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
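A minimal sketch of the reflection-based approach described in the comment above (the listener class name is a made-up placeholder, e.g. taken from a command-line argument):
{code}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.scheduler.StreamingListener

val conf = new SparkConf().setAppName("listener-example")
val ssc = new StreamingContext(conf, Seconds(1))

// Hypothetical user-supplied class name, e.g. parsed from the command line.
val listenerClassName = "com.example.MyStreamingListener"
val listener = Class.forName(listenerClassName).newInstance().asInstanceOf[StreamingListener]
ssc.addStreamingListener(listener)  // attached before start(), so no events are missed

// ... set up DStreams ...
ssc.start()
ssc.awaitTermination()
{code}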
[jira] [Commented] (SPARK-8483) Remove commons-lang3 depedency from flume-sink
[ https://issues.apache.org/jira/browse/SPARK-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594252#comment-14594252 ] Hari Shreedharan commented on SPARK-8483: - Well, we aren't adding a dependency - we are only removing one. So I don't see stuff breaking. We can push it out to 1.5 if this is risky. Remove commons-lang3 depedency from flume-sink -- Key: SPARK-8483 URL: https://issues.apache.org/jira/browse/SPARK-8483 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.4.0 Reporter: Hari Shreedharan Assignee: Hari Shreedharan flume-sink module uses only one method from commons-lang3. Since the build would become complex if we create an assembly and would likely make it more difficult for customers, let's just remove the dependency altogether. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8495) Add a `.lintr` file to validate the SparkR files
Yu Ishikawa created SPARK-8495: -- Summary: Add a `.lintr` file to validate the SparkR files Key: SPARK-8495 URL: https://issues.apache.org/jira/browse/SPARK-8495 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa https://issues.apache.org/jira/browse/SPARK-6813 As we discussed, we are planning to go with {{lintr}} to validate the SparkR files. So we should add the rules for it as a {{.lintr}} file. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-8497) Graph Clique(Complete Connected Sub-graph) Discovery Algorithm
Fan Jiang created SPARK-8497: Summary: Graph Clique (Complete Connected Sub-graph) Discovery Algorithm Key: SPARK-8497 URL: https://issues.apache.org/jira/browse/SPARK-8497 Project: Spark Issue Type: New Feature Components: GraphX, ML, MLlib, Spark Core Reporter: Fan Jiang In recent years the social network industry has had a high demand for complete connected sub-graph discovery, and so has telecom. Similar to the connection graph from Twitter, the calls and other activities in the telecom world form a huge social graph, and due to the nature of the communication medium it shows the strongest inter-person relationships; graph-based analysis will reveal tremendous value from telecom connections. We need an algorithm in Spark to figure out ALL the strongest completely connected sub-graphs (so-called cliques here) for EVERY person in the network, which will be one of the starting points for understanding users' social behaviour. At Huawei, we have many real-world use cases that involve telecom social graphs with tens of billions of edges and hundreds of millions of vertices, and the number of cliques will also be in the tens of millions. The graph changes quickly, which means we need to analyse the graph pattern very often (one result per day/week for a moving time window which spans multiple months). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8389) Expose KafkaRDDs offsetRange in Python
[ https://issues.apache.org/jira/browse/SPARK-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594199#comment-14594199 ] Tathagata Das commented on SPARK-8389: -- Aah, there is already discussion. This escaped my notice because it did not have the streaming component tag on it. I am going to close this JIRA as duplicate. Expose KafkaRDDs offsetRange in Python -- Key: SPARK-8389 URL: https://issues.apache.org/jira/browse/SPARK-8389 Project: Spark Issue Type: New Feature Components: Streaming Affects Versions: 1.4.0 Reporter: Tathagata Das Assignee: Saisai Shao Priority: Critical Probably requires creating a JavaKafkaPairRDD and also use that in the python APIs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8389) Expose KafkaRDDs offsetRange in Python
[ https://issues.apache.org/jira/browse/SPARK-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-8389: - Assignee: Saisai Shao (was: Cody Koeninger) Expose KafkaRDDs offsetRange in Python -- Key: SPARK-8389 URL: https://issues.apache.org/jira/browse/SPARK-8389 Project: Spark Issue Type: New Feature Components: Streaming Affects Versions: 1.4.0 Reporter: Tathagata Das Assignee: Saisai Shao Priority: Critical Probably requires creating a JavaKafkaPairRDD and also use that in the python APIs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8337) KafkaUtils.createDirectStream for python is lacking API/feature parity with the Scala/Java version
[ https://issues.apache.org/jira/browse/SPARK-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-8337: - Component/s: Streaming KafkaUtils.createDirectStream for python is lacking API/feature parity with the Scala/Java version -- Key: SPARK-8337 URL: https://issues.apache.org/jira/browse/SPARK-8337 Project: Spark Issue Type: Bug Components: PySpark, Streaming Affects Versions: 1.4.0 Reporter: Amit Ramesh Priority: Critical See the following thread for context. http://apache-spark-developers-list.1001551.n3.nabble.com/Re-Spark-1-4-Python-API-for-getting-Kafka-offsets-in-direct-mode-tt12714.html -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6813) SparkR style guide
[ https://issues.apache.org/jira/browse/SPARK-6813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594239#comment-14594239 ] Shivaram Venkataraman commented on SPARK-6813: -- Two things 1. For the variable and function names should be lowercase, this should not come up if the camelCase=NULL is being picked up correctly. I think the best way to get lintr to pick up options is to create `.lintr` file in `SPARK_HOME/R/pkg` -- I just tried this and this removed all the variable name errors. 2. The trailing whitespace is a valid problem. We should remove it. In fact [~rxin] has been doing this for all the other parts of the code recently. SparkR style guide -- Key: SPARK-6813 URL: https://issues.apache.org/jira/browse/SPARK-6813 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Shivaram Venkataraman We should develop a SparkR style guide document based on the some of the guidelines we use and some of the best practices in R. Some examples of R style guide are: http://r-pkgs.had.co.nz/r.html#style http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html A related issue is to work on a automatic style checking tool. https://github.com/jimhester/lintr seems promising We could have a R style guide based on the one from google [1], and adjust some of them with the conversation in Spark: 1. Line Length: maximum 100 characters 2. no limit on function name (API should be similar as in other languages) 3. Allow S4 objects/methods -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-7148) Configure Parquet block size (row group size) for ML model import/export
[ https://issues.apache.org/jira/browse/SPARK-7148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594261#comment-14594261 ] Joseph K. Bradley commented on SPARK-7148: -- Hm, if it's that simple, then I wonder if we can adjust parquet.block.size before saving/loading the ML models and reset the block size to its original value afterwards. I'll have to try that! Configure Parquet block size (row group size) for ML model import/export Key: SPARK-7148 URL: https://issues.apache.org/jira/browse/SPARK-7148 Project: Spark Issue Type: Improvement Components: MLlib, SQL Affects Versions: 1.3.0, 1.3.1, 1.4.0 Reporter: Joseph K. Bradley Priority: Minor It would be nice if we could configure the Parquet buffer size when using Parquet format for ML model import/export. Currently, for some models (trees and ensembles), the schema has 13+ columns. With a default buffer size of 128MB (I think), that puts the allocated buffer way over the default memory made available by run-example. Because of this problem, users have to use spark-submit and explicitly use a larger amount of memory in order to run some ML examples. Is there a simple way to specify {{parquet.block.size}}? I'm not familiar with this part of SparkSQL. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
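The idea in the comment above could be sketched like this (whether it actually resolves the memory issue is exactly what remains to be tested; `model` is a placeholder for any Saveable MLlib model and the path is made up):
{code}
val hadoopConf = sc.hadoopConfiguration
val previous = hadoopConf.get("parquet.block.size")   // may be null if unset
hadoopConf.setInt("parquet.block.size", 1024 * 1024)  // e.g. 1 MB row groups for small models

model.save(sc, "/tmp/my-model")                       // hypothetical Saveable model

// Restore the original setting afterwards.
if (previous != null) hadoopConf.set("parquet.block.size", previous)
{code}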
[jira] [Comment Edited] (SPARK-6813) SparkR style guide
[ https://issues.apache.org/jira/browse/SPARK-6813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594281#comment-14594281 ] Yu Ishikawa edited comment on SPARK-6813 at 6/20/15 5:21 AM: - That sounds good! I created an issue to add a {{.lintr}} file as folllows. https://issues.apache.org/jira/browse/SPARK-8495 was (Author: yuu.ishik...@gmail.com): That's sounds good! I created an issue to add a {{.lintr}} file as folllows. https://issues.apache.org/jira/browse/SPARK-8495 SparkR style guide -- Key: SPARK-6813 URL: https://issues.apache.org/jira/browse/SPARK-6813 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Shivaram Venkataraman We should develop a SparkR style guide document based on the some of the guidelines we use and some of the best practices in R. Some examples of R style guide are: http://r-pkgs.had.co.nz/r.html#style http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html A related issue is to work on a automatic style checking tool. https://github.com/jimhester/lintr seems promising We could have a R style guide based on the one from google [1], and adjust some of them with the conversation in Spark: 1. Line Length: maximum 100 characters 2. no limit on function name (API should be similar as in other languages) 3. Allow S4 objects/methods -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-8489) Add regression tests for SPARK-8470
[ https://issues.apache.org/jira/browse/SPARK-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-8489. - Resolution: Fixed Fix Version/s: 1.5.0 1.4.1 Issue resolved by https://github.com/apache/spark/pull/6909 (the pr used a wrong jira number). Add regression tests for SPARK-8470 --- Key: SPARK-8489 URL: https://issues.apache.org/jira/browse/SPARK-8489 Project: Spark Issue Type: Bug Components: SQL, Tests Affects Versions: 1.4.0 Reporter: Andrew Or Assignee: Andrew Or Priority: Critical Fix For: 1.4.1, 1.5.0 See SPARK-8470 for more detail. Basically the Spark Hive code silently overwrites the context class loader populated in SparkSubmit, resulting in certain classes missing when we do reflection in `SQLContext#createDataFrame`. That issue is already resolved in https://github.com/apache/spark/pull/6891, but we should add a regression test for the specific manifestation of the bug in SPARK-8470. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8389) Expose KafkaRDDs offsetRange in Python
[ https://issues.apache.org/jira/browse/SPARK-8389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-8389: - Summary: Expose KafkaRDDs offsetRange in Python (was: Expose KafkaRDDs offsetRange in Java and Python) Expose KafkaRDDs offsetRange in Python -- Key: SPARK-8389 URL: https://issues.apache.org/jira/browse/SPARK-8389 Project: Spark Issue Type: New Feature Components: Streaming Affects Versions: 1.4.0 Reporter: Tathagata Das Assignee: Cody Koeninger Priority: Critical Probably requires creating a JavaKafkaPairRDD and also using it in the Python APIs. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
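For reference, this is roughly the existing Scala-side pattern for reading offset ranges via {{HasOffsetRanges}}; the ticket asks for an equivalent path from Python. Sketch only: the broker address and topic name are illustrative and a running StreamingContext {{ssc}} is assumed. {code} import kafka.serializer.StringDecoder import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils} val kafkaParams = Map("metadata.broker.list" -> "localhost:9092") // illustrative val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder]( ssc, kafkaParams, Set("topic")) stream.foreachRDD { rdd => // Each KafkaRDD carries its offset ranges; Python currently has no way to see these. val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges ranges.foreach(r => println(s"${r.topic} ${r.partition}: ${r.fromOffset} -> ${r.untilOffset}")) } {code}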
[jira] [Commented] (SPARK-8420) Inconsistent behavior with Dataframe Timestamp between 1.3.1 and 1.4.0
[ https://issues.apache.org/jira/browse/SPARK-8420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594215#comment-14594215 ] Yin Huai commented on SPARK-8420: - Will resolve this one after the 1.4 backport is merged. Inconsistent behavior with Dataframe Timestamp between 1.3.1 and 1.4.0 -- Key: SPARK-8420 URL: https://issues.apache.org/jira/browse/SPARK-8420 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.4.0 Reporter: Justin Yip Assignee: Michael Armbrust Priority: Blocker Labels: releasenotes Fix For: 1.5.0 I am trying out 1.4.0 and noticed there are some differences in behavior with Timestamp between 1.3.1 and 1.4.0. In 1.3.1, I can compare a Timestamp with a string. {code} scala> val df = sqlContext.createDataFrame(Seq((1, Timestamp.valueOf("2015-01-01 00:00:00")), (2, Timestamp.valueOf("2014-01-01 00:00:00")))) ... scala> df.filter($"_2" <= "2014-06-01").show ... _1 _2 2 2014-01-01 00:00:... {code} However, in 1.4.0, the filter is always false: {code} scala> val df = sqlContext.createDataFrame(Seq((1, Timestamp.valueOf("2015-01-01 00:00:00")), (2, Timestamp.valueOf("2014-01-01 00:00:00")))) df: org.apache.spark.sql.DataFrame = [_1: int, _2: timestamp] scala> df.filter($"_2" <= "2014-06-01").show +--+--+ |_1|_2| +--+--+ +--+--+ {code} Not sure if that is intended, but I cannot find any doc mentioning these inconsistencies. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
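A possible workaround (not from the ticket) while the behavior difference is sorted out is to compare against an explicit {{Timestamp}} literal instead of relying on string coercion; a sketch, assuming the {{df}} from the report above and Spark 1.4 APIs: {code} import java.sql.Timestamp import org.apache.spark.sql.functions.lit val cutoff = Timestamp.valueOf("2014-06-01 00:00:00") df.filter(df("_2") <= lit(cutoff)).show() // should again return the 2014-01-01 row {code}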
[jira] [Updated] (SPARK-8483) Remove commons-lang3 dependency from flume-sink
[ https://issues.apache.org/jira/browse/SPARK-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-8483: - Target Version/s: 1.4.1, 1.5.0 (was: 1.5.0) Remove commons-lang3 dependency from flume-sink -- Key: SPARK-8483 URL: https://issues.apache.org/jira/browse/SPARK-8483 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.4.0 Reporter: Hari Shreedharan Assignee: Hari Shreedharan The flume-sink module uses only one method from commons-lang3. Since the build would become more complex if we created an assembly, which would likely make things more difficult for customers, let's just remove the dependency altogether. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
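The shape of the change would be to inline the single helper rather than depend on the whole library. A hypothetical illustration only; the method flume-sink actually uses may be different: {code} // Hypothetical: if the one commons-lang3 call were StringUtils.isBlank, // it could be replaced by a tiny local helper and the dependency dropped. // Before: org.apache.commons.lang3.StringUtils.isBlank(channelName) def isBlank(s: String): Boolean = s == null || s.trim.isEmpty val channelName: String = "" // illustrative input if (isBlank(channelName)) { println("channel name is not configured") // illustrative handling } {code}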
[jira] [Commented] (SPARK-8483) Remove commons-lang3 dependency from flume-sink
[ https://issues.apache.org/jira/browse/SPARK-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594228#comment-14594228 ] Tathagata Das commented on SPARK-8483: -- We generally do not make dependency changes between patch releases, because of potential conflicts between multiple versions of the same library at runtime, etc. But this sink runs only inside Flume, so do you think it is okay in this case? Are you sure? Remove commons-lang3 dependency from flume-sink -- Key: SPARK-8483 URL: https://issues.apache.org/jira/browse/SPARK-8483 Project: Spark Issue Type: Bug Components: Streaming Affects Versions: 1.4.0 Reporter: Hari Shreedharan Assignee: Hari Shreedharan The flume-sink module uses only one method from commons-lang3. Since the build would become more complex if we created an assembly, which would likely make things more difficult for customers, let's just remove the dependency altogether. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6813) SparkR style guide
[ https://issues.apache.org/jira/browse/SPARK-6813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594227#comment-14594227 ] Yu Ishikawa commented on SPARK-6813: [~shivaram] Those two rules look good! I modified my code to use the GitHub version of lintr instead of the CRAN version, and I tried to set the two rules in it. h3. The latest version of the script https://github.com/apache/spark/compare/master...yu-iskw:SPARK-6813 h3. The result of the script https://gist.github.com/yu-iskw/7a663dbea295ee767849 h3. Rules we should discuss - whether to enforce {{Variable and function names should be all lowercase}} - {{Trailing whitespace is superfluous}} SparkR style guide -- Key: SPARK-6813 URL: https://issues.apache.org/jira/browse/SPARK-6813 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Shivaram Venkataraman We should develop a SparkR style guide document based on some of the guidelines we use and some of the best practices in R. Some examples of R style guides are: http://r-pkgs.had.co.nz/r.html#style http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html A related issue is to work on an automatic style checking tool. https://github.com/jimhester/lintr seems promising. We could have an R style guide based on the one from Google [1], and adjust some of the rules based on the discussion in Spark: 1. Line length: maximum 100 characters 2. No limit on function names (the API should be similar to that in other languages) 3. Allow S4 objects/methods -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-6813) SparkR style guide
[ https://issues.apache.org/jira/browse/SPARK-6813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14594250#comment-14594250 ] Yu Ishikawa edited comment on SPARK-6813 at 6/20/15 1:25 AM: - Sounds great! h3. TODO Please let me know if you'd like to add anything. - We will fix the valid problems - You will create {{.lintr}} in {{SPARK_HOME/R/pkg}} - I will send a PR adding my {{lint-r}} script and merge it - We will fix the valid problems reported by my {{lint-r}} script again - We will add settings to run the {{lint-r}} script on the official Jenkins was (Author: yuu.ishik...@gmail.com): Sounds great! h3. TODO Please let me know if you'd like to add anything. - Modify valid problems - You will create {{.lintr}} in {{SPARK_HOME/R/pkg}} - I will send a PR about my {{lint-r}} script and merge it - Modify valid problems with my {{lint-r}} script again - We will add some settings to run the {{lint-r}} script on the official jenkins SparkR style guide -- Key: SPARK-6813 URL: https://issues.apache.org/jira/browse/SPARK-6813 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Shivaram Venkataraman We should develop a SparkR style guide document based on some of the guidelines we use and some of the best practices in R. Some examples of R style guides are: http://r-pkgs.had.co.nz/r.html#style http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html A related issue is to work on an automatic style checking tool. https://github.com/jimhester/lintr seems promising. We could have an R style guide based on the one from Google [1], and adjust some of the rules based on the discussion in Spark: 1. Line length: maximum 100 characters 2. No limit on function names (the API should be similar to that in other languages) 3. Allow S4 objects/methods -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8490) SURF Feature Transformer
[ https://issues.apache.org/jira/browse/SPARK-8490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8490: - Priority: Minor (was: Major) SURF Feature Transformer Key: SPARK-8490 URL: https://issues.apache.org/jira/browse/SPARK-8490 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Priority: Minor Speeded Up Robust Features (SURF) (Bay et al., ECCV 2006, http://www.vision.ee.ethz.ch/~surf/eccv06.pdf) is an image descriptor transform very similar to SIFT (SPARK-8486) but can be computed more efficiently. One key difference is the use of box filters (Difference of Boxes) to approximate the Laplacian of Gaussian. We can implement SURF in Spark ML pipelines as an org.apache.spark.ml.Transformer. Given an image Array[Array[Numeric]], the SURF transformer should output an Array[Array[Numeric]] of the SURF features for the provided image. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
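A placeholder sketch that only illustrates the proposed input/output shape (an image in, one descriptor vector per interest point out); the body is a dummy, not a real SURF implementation: {code} type Image = Array[Array[Double]] type Descriptors = Array[Array[Double]] def surfDescriptors(image: Image): Descriptors = { // A real implementation would approximate the Hessian with box filters over an // integral image, locate interest points, and emit e.g. one 64-dimensional vector each. Array(Array(image.map(_.sum).sum)) // dummy single "descriptor", illustration only } // Tiny example call on a 2x2 "image" val demo = surfDescriptors(Array(Array(0.0, 1.0), Array(1.0, 0.0))) {code}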
[jira] [Updated] (SPARK-8485) Feature transformers for image processing
[ https://issues.apache.org/jira/browse/SPARK-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8485: - Priority: Minor (was: Major) Feature transformers for image processing - Key: SPARK-8485 URL: https://issues.apache.org/jira/browse/SPARK-8485 Project: Spark Issue Type: New Feature Components: ML Reporter: Feynman Liang Priority: Minor Many transformers exist to convert from image representations into more compact descriptors amenable to standard ML techniques. We should implement these transformers in Spark to support machine learning on richer content types. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-8493) Fisher Vector Feature Transformer
[ https://issues.apache.org/jira/browse/SPARK-8493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feynman Liang updated SPARK-8493: - Priority: Minor (was: Major) Fisher Vector Feature Transformer - Key: SPARK-8493 URL: https://issues.apache.org/jira/browse/SPARK-8493 Project: Spark Issue Type: Sub-task Components: ML Reporter: Feynman Liang Priority: Minor Fisher vectors provide a vocabulary-based encoding for images (see https://hal.inria.fr/hal-00830491/file/journal.pdf). This representation is useful due to reduced dimensionality, providing regularization as well as increased scalability. An implementation of FVs in Spark ML should provide a way to both train a GMM vocabulary and compute Fisher kernel encodings of the provided images. The vocabulary trainer can be implemented as a standalone GMM pipeline. The feature transformer can be implemented as an org.apache.spark.ml.UnaryTransformer. It should accept a vocabulary (Array[Array[Double]]) as well as an image (Array[Double]) and produce the Fisher kernel encoding (Array[Double]). See Enceval (http://www.robots.ox.ac.uk/~vgg/software/enceval_toolkit/) for a reference implementation in MATLAB/C++. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
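To make the proposed I/O contract concrete, here is a heavily simplified, hypothetical Scala sketch: only the gradient with respect to the GMM means, assuming identity covariances and uniform weights, so it omits most of the normalization described in the referenced paper: {code} def fisherEncode(vocab: Array[Array[Double]], descriptor: Array[Double]): Array[Double] = { // Squared distance from the descriptor to each vocabulary word (GMM mean). val sq = vocab.map(mu => mu.zip(descriptor).map { case (m, x) => (x - m) * (x - m) }.sum) // Soft assignment (posterior) of the descriptor to each component. val weights = sq.map(d => math.exp(-0.5 * d)) val total = weights.sum val gamma = weights.map(_ / total) // Gradient w.r.t. the means: concatenate gamma_k * (x - mu_k) over all components k. vocab.zip(gamma).flatMap { case (mu, g) => mu.zip(descriptor).map { case (m, x) => g * (x - m) } } } // Example: a 2-word vocabulary over 2-dimensional descriptors yields a 4-dimensional encoding. val vocab = Array(Array(0.0, 0.0), Array(1.0, 1.0)) val fv = fisherEncode(vocab, Array(0.2, 0.9)) {code}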